Performance problems on Linux can turn simple logins into laggy keystrokes, stretch API responses into timeouts, and push background jobs past their schedules, so quick triage prevents small hiccups from becoming full outages.
Most slowdowns come from one dominant bottleneck—CPU contention, memory pressure, storage I/O latency, or network delay—and the kernel surfaces each of those through lightweight counters in /proc plus a few standard tools that provide “snapshot” views of current conditions.
Metrics are noisy and short-lived, so capture evidence early, sample more than once, and avoid running heavy benchmarks on an already-struggling host; a high load average can be caused by blocked I/O just as easily as busy CPUs, and aggressive troubleshooting can make the situation worse.
$ uptime
 08:04:29 up 1 day, 18:54,  0 user,  load average: 0.29, 0.23, 0.20
$ nproc
10
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 0  0      0 22697260 143280 621428    0    0     5   256  527    0  1  0 99  0  0  0
 1  0      0 22705324 143280 621428    0    0     0     0 1389 2166  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0    96     0 1820 2812  0  0 99  0  0  0
 1  0      0 22705324 143292 621512    0    0     0     0  474  634  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0     0     0  202  202  0  0 100 0  0  0
Compare the load average to the CPU count (nproc) and watch the r column in vmstat: a run queue that stays above the number of CPUs means tasks are waiting for processor time.
Related: How to check load average in Linux
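That comparison is easy to script for a quick health check; this is a minimal sketch assuming a Linux /proc filesystem, using awk for the floating-point comparison that shell arithmetic cannot do:

```shell
# Compare the 1-minute load average to the CPU count; a sustained ratio
# above 1.0 means runnable (or uninterruptibly blocked) tasks are queuing.
load=$(cut -d ' ' -f1 /proc/loadavg)
cpus=$(nproc)
if awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "load $load exceeds $cpus CPU(s): investigate"
else
  echo "load $load within $cpus CPU(s)"
fi
```

Because blocked I/O also inflates the load average, still check vmstat's r and b columns before concluding the CPUs themselves are saturated.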
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%cpu | head
PID USER COMMAND %CPU %MEM
1 root systemd 0.0 0.0
1426 systemd+ systemd-timesyn 0.0 0.0
171 message+ dbus-daemon 0.0 0.0
174 root systemd-logind 0.0 0.0
23 root systemd-journal 0.0 0.0
7319 root cron 0.0 0.0
3365 syslog rsyslogd 0.0 0.0
3846 root sshd 0.0 0.0
8997 user bash 0.0 0.0
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%mem | head
PID USER COMMAND %CPU %MEM
23 root systemd-journal 0.0 0.0
1 root systemd 0.0 0.0
174 root systemd-logind 0.0 0.0
3846 root sshd 0.0 0.0
1426 systemd+ systemd-timesyn 0.0 0.0
171 message+ dbus-daemon 0.0 0.0
3365 syslog rsyslogd 0.0 0.0
9013 user ps 0.0 0.0
9006 user bash 0.0 0.0
Use COMMAND plus PID to pivot into deeper inspection (open files, threads, cgroups, logs) without guessing.
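Most of that deeper inspection needs nothing beyond /proc; this sketch uses the current shell's PID ($$) as a stand-in for the PID you identified with ps:

```shell
# Substitute the PID reported by ps; $$ (this shell) is used as a stand-in.
pid=$$
# Identity, thread count, and resident memory from the process status file.
grep -E '^(Name|Threads|VmRSS)' /proc/$pid/status
# Open file descriptor count (reading another user's fds may need root).
ls /proc/$pid/fd | wc -l
# Cgroup membership, useful for spotting container or slice resource limits.
cat /proc/$pid/cgroup
```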
$ free -h
total used free shared buff/cache available
Mem: 23Gi 1.1Gi 21Gi 13Mi 747Mi 22Gi
Swap: 1.0Gi 0B 1.0Gi
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
1 0 0 22736704 143292 621724 0 0 5 256 527 0 1 0 99 0 0 0
0 0 0 22745040 143292 621724 0 0 0 0 191 197 0 0 100 0 0 0
0 0 0 22745040 143300 621724 0 0 0 80 193 213 0 0 100 0 0 0
0 0 0 22745040 143300 621724 0 0 0 0 194 195 0 0 100 0 0 0
1 0 0 22745040 143300 621724 0 0 0 0 936 783 2 1 96 0 0 0
Sustained swapping (persistently non-zero si and so) can make a system feel frozen, and if memory pressure keeps climbing the kernel OOM killer may start terminating processes.
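To quantify swap pressure between vmstat samples, the totals can be read straight from /proc/meminfo; a minimal sketch assuming a Linux host (values are in kB):

```shell
# Swap totals from /proc/meminfo; persistent growth in "used" alongside
# non-zero si/so in vmstat confirms real swapping rather than a one-off.
awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
     END { printf "swap used: %d kB of %d kB total\n", t-f, t }' /proc/meminfo
```

If you suspect the OOM killer has already fired, `dmesg | grep -i oom` (often root-only) or the system journal will show which processes were killed.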
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22740500 143300 621724    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22740248 143300 621724    0    0     0     0 1413 2177  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0     4 1277 2010  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0   328  529  645  0  0 100 0  0  0
 0  0      0 22739996 143316 621740    0    0     8   212  493  623  0  0 100 0  0  0
High wa (I/O wait) typically means storage latency or saturation; correlate with disk errors and per-device utilization before blaming CPUs.
Related: How to check CPU I/O wait in Linux
Related: How to check disk errors in Linux
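Per-device utilization is easiest to read from `iostat -x 1` (sysstat package), but when that is not installed the raw counters are available in /proc/diskstats; in this sketch, field 13 is cumulative milliseconds spent doing I/O, so take two samples and diff them to estimate how busy each device is:

```shell
# Kernel per-device I/O counters; these are totals since boot, so sample
# twice and subtract to get rates. Loop and ram devices are filtered out.
awk '$3 !~ /^(loop|ram)/ { printf "%-8s reads=%s writes=%s io_ms=%s\n",
                                  $3, $4, $8, $13 }' /proc/diskstats
```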
$ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=63 time=7.37 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=63 time=7.17 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=63 time=7.55 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=63 time=7.68 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=63 time=8.22 ms

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4023ms
rtt min/avg/max/mdev = 7.166/7.596/8.215/0.353 ms
When DNS resolution may be part of the failure mode, test with hostnames as well as raw IPs: a ping that succeeds by IP but stalls or fails by name points at the resolver rather than the network path.
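To separate resolver latency from network latency directly, time a lookup through the same nsswitch/DNS path that applications use; in this sketch, localhost is a stand-in for the real service hostname:

```shell
# getent exercises /etc/nsswitch.conf (files, then DNS) exactly as most
# applications do; if this is slow while ping-by-IP is fast, blame DNS.
time getent hosts localhost
```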
$ uptime
 08:04:46 up 1 day, 18:55,  0 user,  load average: 0.22, 0.21, 0.19
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22730932 143316 621912    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22779764 143324 621848    0    0     0    52  268  318  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1164  783  4  1 94  0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1210 1886  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1352 2125  0  0 100 0  0  0
After any remediation, re-run the same baseline checks and confirm that the load average and run queue have actually settled rather than merely shifted.