Performance problems on Linux can turn simple logins into laggy keystrokes, stretch API responses into timeouts, and push background jobs past their schedules, so quick triage prevents small hiccups from becoming full outages.
Most slowdowns come from one dominant bottleneck—CPU contention, memory pressure, storage I/O latency, or network delay—and the kernel surfaces each of those through lightweight counters in /proc plus a few standard tools that provide “snapshot” views of current conditions.
Metrics are noisy and short-lived, so capture evidence early, sample more than once, and avoid running heavy benchmarks on an already-struggling host; a high load average can be caused by blocked I/O just as easily as busy CPUs, and aggressive troubleshooting can make the situation worse.
Steps to troubleshoot performance issues with uptime, vmstat, ps, free, and ping in Linux:
- Capture system load and run queue signals.
$ uptime
 12:43:09 up 10 days, 11:49,  0 user,  load average: 1.19, 1.14, 0.93
$ nproc
8
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 8389040 284412 2621904    0    0   107   267  551    0  0  0 99  0  0  0
 0  0      0 8389040 284412 2621904    0    0     0     0 1387 1704  0  1 99  0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  353  386  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  494  641  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  179  185  0  0 100 0  0  0
Compare the load average to the CPU count (nproc) and watch the r column in vmstat for a persistently non-zero run queue.
Related: How to check load average in Linux
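The load-to-core comparison can be scripted directly from kernel counters. A minimal sketch reading /proc/loadavg, where the 1.0-per-core threshold is an illustrative rule of thumb, not a hard limit:

```shell
# Read the 1-minute load average and compare it to the CPU count.
# A ratio persistently above 1.0 per core suggests CPU contention
# (or tasks blocked in uninterruptible I/O, which also counts).
load1=$(cut -d ' ' -f1 /proc/loadavg)
cores=$(nproc)
# awk handles the floating-point division that plain shell cannot
awk -v l="$load1" -v c="$cores" 'BEGIN {
    ratio = l / c
    printf "load1=%.2f cores=%d ratio=%.2f\n", l, c, ratio
    exit (ratio > 1.0) ? 1 : 0
}' && echo "CPU headroom available" || echo "load exceeds core count"
```

Because /proc/loadavg is a plain text file, this costs almost nothing to run repeatedly, which matters on a host that is already struggling.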
- Identify the top CPU and memory consumers.
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%cpu | head
    PID USER     COMMAND         %CPU %MEM
      1 root     python3          0.0  0.0
     21 root     python3          0.0  0.1
    511 root     sh               0.0  0.0
    512 root     ps               0.0  0.0
    513 root     head             0.0  0.0
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%mem | head
    PID USER     COMMAND         %CPU %MEM
     21 root     python3          0.0  0.1
      1 root     python3          0.0  0.0
    515 root     ps               0.0  0.0
    514 root     sh               0.0  0.0
    516 root     head             0.0  0.0
Use COMMAND plus PID to pivot into deeper inspection (open files, threads, cgroups, logs) without guessing.
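Once a suspect PID is known, /proc exposes those pivot points without extra tools. A minimal sketch, using this shell's own PID ($$) as a stand-in for a real suspect:

```shell
# Inspect a process directly through /proc; substitute a real PID
# from the ps output for $$ in practice.
pid=$$
tr '\0' ' ' < /proc/$pid/cmdline; echo   # full command line (NUL-separated args)
readlink /proc/$pid/cwd                  # working directory
ls /proc/$pid/task | wc -l               # thread count
ls /proc/$pid/fd | wc -l                 # open file descriptors
cat /proc/$pid/cgroup                    # cgroup (maps to a systemd unit or container)
```

Everything here is readable without stopping or tracing the process, so it is safe on a loaded system.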
- Check memory pressure and swap activity.
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            11Gi       1.2Gi       8.0Gi        48Mi       2.8Gi        10Gi
Swap:          4.0Gi          0B       4.0Gi
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st gu
 0  0      0 8385196 284412 2621904    0    0   107   267  551    0  0  0 99  0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  191  216  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  110  102  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0   97   89  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  783  611  2  2 96  0  0  0
Sustained swapping (non-zero si and so) can make systems feel frozen and may lead to the kernel OOM killer terminating processes.
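The si/so columns are derived from kernel counters that can also be sampled directly. A minimal sketch using the pswpin/pswpout counters in /proc/vmstat, with an illustrative 2-second window:

```shell
# Sum the cumulative swap-in/swap-out page counters, wait, and sample
# again; a growing delta means the system is actively paging right now,
# not just that it swapped at some point in the past.
read_swap() {
    awk '$1 == "pswpin" || $1 == "pswpout" { s += $2 } END { print s+0 }' /proc/vmstat
}
before=$(read_swap)
sleep 2
after=$(read_swap)
echo "swap pages moved in 2s: $((after - before))"
```

This distinguishes historical swap usage (swpd in vmstat, used swap in free) from active thrashing, which is what actually hurts latency.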
- Check CPU time spent waiting on storage.
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 8384944 284412 2621904    0    0   107   267  551    0  0  0 99  0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0  181  223  0  0 100 0  0  0
 0  0      0 8384944 284412 2621904    0    0     0     0   80   72  0  0 100 0  0  0
 0  0      0 8384944 284412 2621900    0    0     0     0 1235 1493  0  1 99  0  0  0
 1  0      0 8384944 284412 2621900    0    0     0     0  410  499  0  0 100 0  0  0
High wa (I/O wait) typically means storage latency or saturation; correlate with disk errors and per-device utilization before blaming CPUs.
Related: How to check CPU I/O wait in Linux
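Per-device utilization can be approximated without extra packages by sampling /proc/diskstats. A minimal sketch, where field positions follow the documented diskstats layout (field 3 is the device name, fields 6 and 10 are sectors read and written) and the 1-second window is illustrative:

```shell
# Take two samples of cumulative sectors transferred per block device
# and print the per-second delta, showing which disk is doing the work.
snapshot() { awk '{ print $3, $6 + $10 }' /proc/diskstats; }
snapshot > /tmp/io.before
sleep 1
snapshot | while read -r dev sectors; do
    prev=$(awk -v d="$dev" '$1 == d { print $2 }' /tmp/io.before)
    if [ -n "$prev" ] && [ "$sectors" -gt "$prev" ]; then
        echo "$dev: $((sectors - prev)) sectors/s"
    fi
done
```

Where iostat (from the sysstat package) is available, iostat -x gives the same picture with utilization percentages already computed; the sketch above is the fallback when nothing extra is installed.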
Related: How to check disk errors in Linux
- Measure network latency to critical endpoints.
$ ping -c 5 203.0.113.50
PING 203.0.113.50 (203.0.113.50) 56(84) bytes of data.
64 bytes from 203.0.113.50: icmp_seq=1 ttl=63 time=10.8 ms
64 bytes from 203.0.113.50: icmp_seq=2 ttl=63 time=9.90 ms
64 bytes from 203.0.113.50: icmp_seq=3 ttl=63 time=7.17 ms
64 bytes from 203.0.113.50: icmp_seq=4 ttl=63 time=7.30 ms
64 bytes from 203.0.113.50: icmp_seq=5 ttl=63 time=10.6 ms

--- 203.0.113.50 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4033ms
rtt min/avg/max/mdev = 7.171/9.154/10.785/1.595 ms
Prefer hostnames over raw IPs when DNS resolution is part of the failure mode.
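One way to keep the two failure modes separate is to resolve the name explicitly and then measure round-trip time. A minimal sketch, with example.com standing in for a real dependency:

```shell
# Resolve first so DNS cost (or failure) is visible on its own,
# then ping for RTT; example.com is a placeholder endpoint.
host=example.com
getent hosts "$host" || echo "DNS resolution failed for $host"
ping -c 3 -q "$host" || echo "ping failed (ICMP blocked or no route?)"
```

If getent is slow or fails while pinging the raw IP is fast, the problem is name resolution, not the network path.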
- Repeat the same snapshots to confirm the dominant bottleneck moved or cleared.
$ uptime
 12:43:26 up 10 days, 11:49,  0 user,  load average: 1.08, 1.11, 0.92
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 8385508 284412 2621900    0    0   107   267  551    0  0  0 99  0  0  0
 0  0      0 8385508 284412 2621900    0    0     0     0  165  176  0  0 100 0  0  0
 0  0      0 8367636 284412 2621900    0    0     0     0 1173  935  4  2 94  0  0  0
 0  0      0 8367636 284412 2621900    0    0     0     0  148  164  0  0 100 0  0  0
 0  0      0 8373296 284412 2621900    0    0     0     0  105   75  0  0 100 0  0  0
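The repeated snapshots can be collected into one timestamped log so before/after comparisons are not made from memory. A minimal sketch with illustrative iteration count and interval:

```shell
# Append the same triage snapshot every few seconds; each block is
# prefixed with a timestamp so samples can be correlated with
# user-reported symptoms later.
log=$(mktemp)
for i in 1 2 3; do
    {
        date '+--- %F %T ---'
        uptime
        vmstat 1 2 | tail -1                      # second sample = current rates
        ps -eo pid,comm,%cpu --sort=-%cpu | head -4
    } >> "$log"
    sleep 2
done
wc -l "$log"
```

Keeping the capture commands identical across runs is the point: if the r column, si/so, or wa moved between logs, the bottleneck moved too.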
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
