Optimizing Nginx performance keeps latency stable and increases throughput by cutting per-request overhead, reusing connections efficiently, and avoiding repeated work during traffic spikes.

Nginx uses an event-driven worker model where each worker can serve many concurrent connections, so throughput is shaped by worker concurrency, keepalive reuse, upstream behavior, and protocol choices like HTTP/2 multiplexing or HTTP/3 over QUIC. When HTTPS is involved, TLS session reuse reduces handshake cost, while compression and caching reduce bytes-on-the-wire and upstream work.

Over-tuning can move bottlenecks instead of removing them: higher concurrency increases memory and file descriptor pressure, unsafe caching can leak personalized responses, and aggressive compression can trade bandwidth savings for CPU contention. Changes are safest when applied one at a time with repeatable benchmarks and metrics visible during the test window.

Nginx performance tuning checklist:

  1. Capture a throughput and latency baseline with wrk using fixed options.
    $ wrk -t2 -c50 -d30s https://example.com/
    Running 30s test @ https://example.com/
      2 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   18.21ms    7.44ms  120.33ms   90.12%
        Req/Sec     1.38k   103.22     1.62k    86.50%
      82389 requests in 30.10s, 1.10GB read
    Requests/sec:   2737.71
    Transfer/sec:    37.41MB

    Keep URL, concurrency, and duration unchanged across comparison runs.

  2. Capture stub_status or equivalent connection metrics during each test run.
    $ curl -s http://127.0.0.1/nginx_status
    Active connections: 2
    server accepts handled requests
      14562 14562 98341
    Reading: 0 Writing: 1 Waiting: 1

    Expose status endpoints to trusted networks only.
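
    A minimal sketch of the status location, matching the URL used in the example above; the listen address and allow scope are assumptions to adjust for your own trusted network:

    server {
        listen 127.0.0.1:80;
        location = /nginx_status {
            stub_status;        # ngx_http_stub_status_module; older releases use "stub_status on;"
            allow 127.0.0.1;    # trusted hosts only
            deny all;
        }
    }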

  3. Enable gzip or Brotli compression for text assets.

    Compression helps most for HTML, CSS, JSON, and JavaScript responses.
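
    A minimal http-context sketch; Brotli requires the third-party ngx_brotli module, and the compression levels and type list are illustrative starting points (text/html is always compressed when gzip is on):

    gzip on;
    gzip_vary on;                # emit "Vary: Accept-Encoding" for downstream caches
    gzip_comp_level 5;           # balance CPU cost against compression ratio
    gzip_min_length 256;         # skip responses too small to benefit
    gzip_types text/plain text/css application/json application/javascript image/svg+xml;

    # with ngx_brotli loaded:
    brotli on;
    brotli_comp_level 5;
    brotli_types text/plain text/css application/json application/javascript image/svg+xml;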

  4. Verify Content-Encoding negotiation on a representative text endpoint.
    $ curl -sI -H 'Accept-Encoding: br,gzip' https://example.com/ | grep -i '^content-encoding:'
    content-encoding: br

    No Content-Encoding header usually means no compression was applied or the content was already compressed.

  5. Enable HTTP/2 on HTTPS listeners where supported.

    HTTP/2 reduces socket churn by multiplexing many requests over a single TCP connection.
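
    A minimal server-block sketch; the listen parameter works on older releases, nginx 1.25.1+ prefers the standalone directive, and the certificate paths are illustrative:

    server {
        listen 443 ssl http2;    # nginx < 1.25.1
        # on nginx >= 1.25.1 use instead:
        #   listen 443 ssl;
        #   http2 on;
        ssl_certificate     /etc/nginx/tls/example.crt;
        ssl_certificate_key /etc/nginx/tls/example.key;
    }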

  6. Confirm HTTP/2 negotiation with curl.
    $ curl -I --http2 https://example.com/
    HTTP/2 200
    server: nginx
    ##### snipped #####
  7. Enable HTTP/3 only when QUIC support is available end-to-end.

    HTTP/3 depends on the Nginx build, TLS library support, and UDP reachability.
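
    A sketch assuming nginx built with --with-http_v3_module and a QUIC-capable TLS library; the Alt-Svc lifetime is illustrative:

    server {
        listen 443 quic reuseport;   # UDP listener for HTTP/3
        listen 443 ssl;              # keep TCP for HTTP/1.1 and HTTP/2 fallback
        http2 on;
        http3 on;
        ssl_protocols TLSv1.2 TLSv1.3;              # QUIC itself requires TLS 1.3
        add_header Alt-Svc 'h3=":443"; ma=86400';   # tell clients HTTP/3 is available on UDP 443
    }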

  8. Confirm HTTP/3 negotiation with a QUIC-capable curl build.
    $ curl -I --http3 https://example.com/
    HTTP/3 200
    server: nginx
    ##### snipped #####

    If testing fails with a missing-feature error, use a curl build compiled with HTTP/3 support.

  9. Enable TLS session caching for HTTPS traffic.

    Session reuse reduces CPU spent on repeated handshakes from short-lived clients.
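
    A minimal sketch for the http or server context; the size and timeout are illustrative (nginx documents roughly 4000 sessions per megabyte of shared cache):

    ssl_session_cache shared:SSL:10m;   # one cache shared across all workers
    ssl_session_timeout 1h;             # how long entries remain reusable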

  10. Tune keepalive settings to reduce connection churn under load.

    Overly large keepalive pools can consume worker_connections and memory with idle sockets.
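
    A minimal http-context sketch; the values are illustrative starting points to validate under your own load:

    keepalive_timeout 30s;      # close idle client connections sooner than the 75s default
    keepalive_requests 1000;    # cap requests served per kept-alive connection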

  11. Tune reverse-proxy behavior for proxied applications.

    Upstream latency, buffering, and connection reuse often dominate end-to-end response time.
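
    A sketch of upstream connection reuse, assuming a hypothetical backend named app_backend on 127.0.0.1:8080:

    upstream app_backend {
        server 127.0.0.1:8080;
        keepalive 32;                        # idle upstream connections cached per worker
    }

    server {
        location / {
            proxy_pass http://app_backend;
            proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
            proxy_set_header Connection "";  # don't forward "Connection: close" upstream
            proxy_buffering on;              # release the upstream once the response is buffered
        }
    }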

  12. Enable open_file_cache for high-volume static file serving.

    Open file cache is most effective when serving many small files from local disk.
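
    A minimal http-context sketch with illustrative sizes and timers:

    open_file_cache max=10000 inactive=30s;   # cache metadata and descriptors for up to 10000 files
    open_file_cache_valid 60s;                # revalidate each cached entry every 60s
    open_file_cache_min_uses 2;               # only cache files requested at least twice
    open_file_cache_errors on;                # cache lookup failures too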

  13. Enable caching only for responses that are safe to share, with clear invalidation.

    Caching personalized or authentication-dependent content can leak data between users.
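
    A sketch of a shared proxy cache, reusing the hypothetical app_backend upstream from step 11; the cache path, zone name, TTL, and session cookie name are assumptions:

    # http context:
    proxy_cache_path /var/cache/nginx/app keys_zone=app_cache:10m max_size=1g inactive=10m;

    location /static/ {
        proxy_pass http://app_backend;
        proxy_cache app_cache;
        proxy_cache_valid 200 10m;
        # neither cache nor serve cached copies for requests that look personalized:
        proxy_cache_bypass $cookie_session $http_authorization;
        proxy_no_cache     $cookie_session $http_authorization;
        add_header X-Cache-Status $upstream_cache_status;   # hit/miss visibility while testing
    }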

  14. Enable microcaching only for upstream responses that tolerate short staleness.

    Even sub-second microcache TTLs can break per-user pages, CSRF token flows, and real-time dashboards.
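
    A microcaching sketch building on the same hypothetical upstream; the 1s TTL is the defining choice, and the zone sizing is illustrative:

    # http context:
    proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m max_size=100m;

    location / {
        proxy_pass http://app_backend;
        proxy_cache microcache;
        proxy_cache_valid 200 1s;         # absorb bursts without serving old content for long
        proxy_cache_lock on;              # collapse concurrent misses into one upstream request
        proxy_cache_use_stale updating;   # serve the expired copy while one request refreshes it
    }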

  15. Tune worker_processes to match available CPU resources.

    Too many workers can increase context switching without improving throughput.
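
    A minimal main-context sketch:

    worker_processes auto;    # spawn one worker per CPU core detected at startup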

  16. Tune worker_connections for peak concurrent sockets.

    Each connection consumes a file descriptor; raising worker_connections requires higher file descriptor limits.
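
    A sketch with illustrative limits; worker_connections is per worker, and a proxied request consumes two connections (client side plus upstream side):

    worker_rlimit_nofile 16384;    # raise the per-worker descriptor limit from within nginx
    events {
        worker_connections 8192;
    }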

  17. Increase file descriptor limits before pushing high concurrency.
    $ ulimit -n
    1024

    Interactive shell limits can differ from the nginx service limit.
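
    On systemd-managed hosts, a drop-in raises the service limit; the value is an illustrative example, and the pid file path assumes the common default:

    $ sudo systemctl edit nginx
      [Service]
      LimitNOFILE=65536
    $ sudo systemctl restart nginx
    $ grep 'Max open files' /proc/$(cat /run/nginx.pid)/limits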

  18. Reduce access log overhead on known high-traffic endpoints when measured as a bottleneck.

    Disabling logs can hide incidents; reduce logging only where the impact is proven.
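
    A sketch of scoped log reduction; the endpoint path is hypothetical, and buffered logging is a gentler alternative to disabling:

    location = /healthz {
        access_log off;    # drop logging only for this endpoint
    }

    # or keep the log but batch disk writes:
    access_log /var/log/nginx/access.log combined buffer=64k flush=5s;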

  19. Re-run the same wrk benchmark per change to validate improvement.
    $ wrk -t2 -c50 -d30s https://example.com/
    ##### snipped #####

    Compare Requests/sec plus tail latency, not only averages.
