Optimizing Nginx performance keeps latency stable and increases throughput by cutting per-request overhead, reusing connections efficiently, and avoiding repeated work during traffic spikes.

Nginx uses an event-driven worker model where each worker can serve many concurrent connections, so throughput is shaped by worker concurrency, keepalive reuse, upstream behavior, and protocol choices like HTTP/2 multiplexing or HTTP/3 over QUIC. When HTTPS is involved, TLS session reuse reduces handshake cost, while compression and caching reduce bytes-on-the-wire and upstream work.

Over-tuning can move bottlenecks instead of removing them: higher concurrency increases memory and file descriptor pressure, unsafe caching can leak personalized responses, and aggressive compression can trade bandwidth savings for CPU contention. Changes are safest when applied one at a time with repeatable benchmarks and metrics visible during the test window.

Steps to optimize Nginx web server performance:

  1. Capture a throughput and latency baseline with wrk using fixed options.
    $ wrk -t2 -c50 -d30s http://127.0.0.1/
    Running 30s test @ http://127.0.0.1/
      2 threads and 50 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   140.54us  319.28us  18.38ms   98.18%
        Req/Sec   174.37k     9.22k  193.56k    78.70%
      10423925 requests in 30.10s, 8.60GB read
    Requests/sec: 346313.31
    Transfer/sec:    292.61MB

    Keep URL, concurrency, and duration unchanged across comparison runs.

  2. Capture stub_status or equivalent connection metrics during each test run.
    $ curl -s http://127.0.0.1/nginx_status
    Active connections: 1 
    server accepts handled requests
     10453 10453 10423957 
    Reading: 0 Writing: 1 Waiting: 0 

    Expose status endpoints to trusted networks only.

  3. Enable gzip or Brotli compression for text assets.

    Compression helps most for HTML, CSS, JSON, and JavaScript responses.
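
    A minimal http-block sketch, with illustrative level and type values that should be tuned per site; Brotli requires the separate ngx_brotli module:

    gzip on;
    gzip_comp_level 5;
    gzip_min_length 256;
    gzip_types text/plain text/css application/json application/javascript application/xml image/svg+xml;

    text/html is always compressed when gzip is on, so it does not need to appear in gzip_types.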

  4. Verify Content-Encoding negotiation on a representative text endpoint.
    $ curl -I --silent -H 'Accept-Encoding: gzip' http://127.0.0.1/ | grep -i '^content-encoding:'
    Content-Encoding: gzip

    No Content-Encoding header usually means no compression was applied or the content was already compressed.

  5. Enable HTTP/2 on HTTPS listeners where supported.

    HTTP/2 reduces socket churn by multiplexing many requests over a single TCP connection.
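
    A server-block sketch with placeholder certificate paths; nginx 1.25.1 and newer prefer the standalone http2 directive, while older builds use the listen parameter:

    server {
        listen 443 ssl http2;        # pre-1.25.1 syntax
        # listen 443 ssl;            # 1.25.1 and newer:
        # http2 on;
        ssl_certificate     /path/to/fullchain.pem;
        ssl_certificate_key /path/to/privkey.pem;
    }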

  6. Confirm HTTP/2 negotiation with curl.
    $ curl -I --http2 -k --silent https://127.0.0.1/
    HTTP/2 200 
    server: nginx/1.24.0 (Ubuntu)
    date: Mon, 29 Dec 2025 22:13:02 GMT
    content-type: text/html
    content-length: 10671
    last-modified: Sun, 28 Dec 2025 06:15:52 GMT
    etag: "6950cb18-29af"
    accept-ranges: bytes

  7. Enable HTTP/3 only when QUIC support is available end-to-end.

    HTTP/3 depends on the Nginx build, TLS library support, and UDP reachability.
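
    A sketch assuming an nginx build with QUIC support (mainline 1.25.0 and newer); UDP port 443 must be reachable, and the Alt-Svc header advertises HTTP/3 to clients on the TCP listener:

    server {
        listen 443 quic reuseport;   # HTTP/3 over UDP
        listen 443 ssl;              # TCP fallback (HTTP/1.1 or HTTP/2)
        http2 on;
        add_header Alt-Svc 'h3=":443"; ma=86400' always;
        # ssl_certificate and ssl_certificate_key as usual
    }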

  8. Confirm HTTP/3 negotiation with a QUIC-capable curl build.
    $ curl -I --http3 https://example.com/
    HTTP/3 200
    server: nginx
    ##### snipped #####

    If testing fails with a missing-feature error, use a curl build compiled with HTTP/3 support.

  9. Enable TLS session caching for HTTPS traffic.

    Session reuse reduces CPU spent on repeated handshakes from short-lived clients.
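
    A sketch with illustrative sizes; a shared cache is visible to all workers, and 1 MB holds roughly 4,000 sessions:

    ssl_session_cache   shared:SSL:10m;
    ssl_session_timeout 1h;
    ssl_session_tickets off;   # leave tickets off unless ticket keys are rotated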

  10. Tune keepalive settings to reduce connection churn under load.

    Overly large keepalive pools can consume worker_connections and memory with idle sockets.
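
    Illustrative values; a shorter timeout frees idle sockets sooner, while a higher request cap lets busy clients reuse one connection longer:

    keepalive_timeout  30s;
    keepalive_requests 1000;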

  11. Tune reverse-proxy behavior for proxied applications.

    Upstream latency, buffering, and connection reuse often dominate end-to-end response time.
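
    A sketch assuming a hypothetical app on 127.0.0.1:8080; the keepalive directive plus HTTP/1.1 with a cleared Connection header enables upstream connection reuse:

    upstream app_backend {
        server 127.0.0.1:8080;
        keepalive 32;                 # idle upstream connections kept per worker
    }

    server {
        location / {
            proxy_pass http://app_backend;
            proxy_http_version 1.1;
            proxy_set_header Connection "";
            proxy_buffering on;       # shield the upstream from slow clients
        }
    }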

  12. Enable open_file_cache for high-volume static file serving.

    Open file cache is most effective when serving many small files from local disk.
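
    Illustrative values; cached descriptors hold files open, so keep the inactive window short enough that deleted files are released promptly:

    open_file_cache          max=10000 inactive=30s;
    open_file_cache_valid    60s;
    open_file_cache_min_uses 2;
    open_file_cache_errors   on;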

  13. Enable caching only for responses that are safe to share, with clear invalidation.

    Caching personalized or authentication-dependent content can leak data between users.
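
    A sketch with hypothetical paths, zone, and upstream names; the bypass and no_cache lines skip the cache for requests carrying credentials, and the extra header makes HIT/MISS visible during testing:

    proxy_cache_path /var/cache/nginx keys_zone=static_cache:10m
                     max_size=1g inactive=10m;

    location /assets/ {
        proxy_pass http://app_backend;   # hypothetical upstream
        proxy_cache static_cache;
        proxy_cache_valid 200 301 10m;
        proxy_cache_bypass $http_authorization $cookie_sessionid;
        proxy_no_cache     $http_authorization $cookie_sessionid;
        add_header X-Cache-Status $upstream_cache_status;
    }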

  14. Enable microcaching only for upstream responses that tolerate short staleness.

    Even sub-second microcache TTLs can break per-user pages, CSRF token flows, and real-time dashboards.
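
    A sketch with a one-second TTL and hypothetical names; proxy_cache_lock collapses concurrent misses into a single upstream request, and use_stale updating serves the old copy while it refreshes:

    proxy_cache_path /var/cache/nginx/micro keys_zone=microcache:10m;

    location / {
        proxy_pass http://app_backend;   # hypothetical upstream
        proxy_cache microcache;
        proxy_cache_valid 200 1s;
        proxy_cache_lock on;
        proxy_cache_use_stale updating;
    }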

  15. Tune worker_processes to match available CPU resources.

    Too many workers can increase context switching without improving throughput.
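
    The auto value starts one worker per CPU core, which is the usual starting point:

    worker_processes auto;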

  16. Tune worker_connections for peak concurrent sockets.

    Each connection consumes a file descriptor; raising worker_connections requires higher file descriptor limits.
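
    An illustrative value in the events block; the theoretical client ceiling is roughly worker_processes times worker_connections, and each proxied request consumes two connections (client plus upstream):

    events {
        worker_connections 8192;
    }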

  17. Increase file descriptor limits before pushing high concurrency.
    $ ulimit -n
    1048576

    Interactive shell limits can differ from the nginx service limit.
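
    For a systemd-managed nginx, a drop-in override raises the service limit, and worker_rlimit_nofile raises the worker limit from nginx.conf (values illustrative):

    # /etc/systemd/system/nginx.service.d/limits.conf
    [Service]
    LimitNOFILE=1048576

    # nginx.conf, main context
    worker_rlimit_nofile 1048576;

    Run systemctl daemon-reload and restart nginx after changing the unit file.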

  18. Reduce access log overhead on known high-traffic endpoints when measured as a bottleneck.

    Disabling logs can hide incidents; reduce logging only where the impact is proven.
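
    A sketch that silences only a hypothetical health-check path; buffered logging is a gentler alternative that keeps every entry while reducing write syscalls:

    location = /healthz {
        access_log off;
    }

    # or buffer writes without dropping entries:
    access_log /var/log/nginx/access.log combined buffer=64k flush=5s;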

  19. Re-run the same wrk benchmark after each change to validate the improvement.
    $ wrk -t2 -c50 -d30s http://127.0.0.1/
    ##### snipped #####

    Compare Requests/sec plus tail latency, not only averages.