Performance problems on Linux can turn simple logins into laggy keystrokes, stretch API responses into timeouts, and push background jobs past their schedules, so quick triage prevents small hiccups from becoming full outages.
Most slowdowns come from one dominant bottleneck—CPU contention, memory pressure, storage I/O latency, or network delay—and the kernel surfaces each of those through lightweight counters in /proc plus a few standard tools that provide “snapshot” views of current conditions.
Metrics are noisy and short-lived, so capture evidence early, sample more than once, and avoid running heavy benchmarks on an already-struggling host; a high load average can be caused by blocked I/O just as easily as busy CPUs, and aggressive troubleshooting can make the situation worse.
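The tools used below are thin views over kernel counters, and reading a few of those counters directly shows where the numbers come from. A minimal sketch using standard /proc paths:

```shell
# The kernel counters behind uptime, free, and vmstat, read directly:
cat /proc/loadavg                                # 1/5/15-min load averages + runnable/total tasks
grep -E '^(MemAvailable|SwapFree)' /proc/meminfo # memory headroom, in kB
head -n 1 /proc/stat                             # aggregate CPU jiffies (user/sys/idle/iowait)
```

Because these are plain text files, they can be sampled cheaply and repeatedly even on a struggling host.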
Steps to troubleshoot performance issues with uptime, vmstat, and ps in Linux:
- Capture system load and run queue signals.
$ uptime
 08:04:29 up 1 day, 18:54,  0 user,  load average: 0.29, 0.23, 0.20
$ nproc
10
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 0  0      0 22697260 143280 621428    0    0     5   256  527    0  1  0 99  0  0  0
 1  0      0 22705324 143280 621428    0    0     0     0 1389 2166  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0    96     0 1820 2812  0  0 99  0  0  0
 1  0      0 22705324 143292 621512    0    0     0     0  474  634  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0     0     0  202  202  0  0 100 0  0  0
Compare the load averages to the CPU count from nproc, and watch the r column in vmstat for a persistently non-zero run queue (runnable tasks waiting for a CPU).
Related: How to check load average in Linux
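That comparison can be scripted so saturation is obvious at a glance; a small sketch (the "load per core" label is just an illustrative name):

```shell
# Divide the 1-minute load average by the core count; sustained values near or
# above 1.0 per core suggest saturation (CPU-bound or tasks blocked in I/O).
cores=$(nproc)
load1=$(cut -d ' ' -f 1 /proc/loadavg)
awk -v l="$load1" -v c="$cores" 'BEGIN { printf "load per core: %.2f\n", l / c }'
```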
- Identify the top CPU and memory consumers.
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%cpu | head
    PID USER     COMMAND         %CPU %MEM
      1 root     systemd          0.0  0.0
   1426 systemd+ systemd-timesyn  0.0  0.0
    171 message+ dbus-daemon      0.0  0.0
    174 root     systemd-logind   0.0  0.0
     23 root     systemd-journal  0.0  0.0
   7319 root     cron             0.0  0.0
   3365 syslog   rsyslogd         0.0  0.0
   3846 root     sshd             0.0  0.0
   8997 user     bash             0.0  0.0
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%mem | head
    PID USER     COMMAND         %CPU %MEM
     23 root     systemd-journal  0.0  0.0
      1 root     systemd          0.0  0.0
    174 root     systemd-logind   0.0  0.0
   3846 root     sshd             0.0  0.0
   1426 systemd+ systemd-timesyn  0.0  0.0
    171 message+ dbus-daemon      0.0  0.0
   3365 syslog   rsyslogd         0.0  0.0
   9013 user     ps               0.0  0.0
   9006 user     bash             0.0  0.0
Use COMMAND plus PID to pivot into deeper inspection (open files, threads, cgroups, logs) without guessing.
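Once ps names a suspect, /proc supports that pivot without extra tools. A sketch, using PID 1 only as a stand-in that always exists; substitute the PID ps surfaced:

```shell
# Inspect one process through /proc (PID 1 is a placeholder).
pid=1
cat /proc/$pid/comm                          # executable name
grep -E '^(Threads|VmRSS)' /proc/$pid/status # thread count and resident memory
ls /proc/$pid/fd 2>/dev/null | wc -l         # open descriptors (may need root)
```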
- Check memory pressure and swap activity.
$ free -h
               total        used        free      shared  buff/cache   available
Mem:            23Gi       1.1Gi        21Gi        13Mi       747Mi        22Gi
Swap:          1.0Gi          0B       1.0Gi
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22736704 143292 621724    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22745040 143292 621724    0    0     0     0  191  197  0  0 100 0  0  0
 0  0      0 22745040 143300 621724    0    0     0    80  193  213  0  0 100 0  0  0
 0  0      0 22745040 143300 621724    0    0     0     0  194  195  0  0 100 0  0  0
 1  0      0 22745040 143300 621724    0    0     0     0  936  783  2  1 96  0  0  0
Sustained swapping (non-zero si and so) can make systems feel frozen and may lead to the kernel OOM killer terminating processes.
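On kernels 4.20 and newer, pressure-stall information gives a more direct memory-pressure signal than swap counters alone; a guarded sketch in case the file is absent:

```shell
# PSI (kernel 4.20+): "some avg10" is the share of the last 10 seconds in
# which at least one task stalled waiting for memory.
if [ -r /proc/pressure/memory ]; then
    cat /proc/pressure/memory
else
    echo "PSI not available; fall back to the vmstat si/so columns"
fi
```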
- Check CPU time spent waiting on storage.
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22740500 143300 621724    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22740248 143300 621724    0    0     0     0 1413 2177  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0     4 1277 2010  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0   328  529  645  0  0 100 0  0  0
 0  0      0 22739996 143316 621740    0    0     8   212  493  623  0  0 100 0  0  0
High wa (I/O wait) typically means storage latency or saturation; correlate with disk errors and per-device utilization before blaming CPUs.
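Per-device counters can be pulled from /proc/diskstats even where the sysstat tools are not installed; a rough sketch (the device-name pattern is an assumption about common disk naming):

```shell
# Cumulative per-device I/O counters from the kernel; the 13th whitespace
# column (field 10 in the kernel docs) is total milliseconds spent doing
# I/O -- sample it twice over an interval to estimate utilization.
awk '$3 ~ /^(sd|nvme|vd)/ {
    printf "%-8s reads=%s writes=%s io_ms=%s\n", $3, $4, $8, $13
}' /proc/diskstats
```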
Related: How to check CPU I/O wait in Linux
Related: How to check disk errors in Linux
- Measure network latency to critical endpoints.
$ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=63 time=7.37 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=63 time=7.17 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=63 time=7.55 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=63 time=7.68 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=63 time=8.22 ms

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4023ms
rtt min/avg/max/mdev = 7.166/7.596/8.215/0.353 ms
Prefer hostnames over raw IPs when DNS resolution is part of the failure mode.
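To separate resolution cost from network round-trip time, resolve the name first and then ping with reverse lookups disabled. A sketch where example.com stands in for a hostname your service actually depends on:

```shell
# Resolve first, then ping the name with -n (no per-reply reverse lookups),
# so a slow answer points at DNS rather than the network path.
host=example.com        # placeholder endpoint
getent hosts "$host"    # resolution only; a stall here implicates DNS
ping -c 3 -n "$host"    # round-trip time without reverse-lookup noise
```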
- Repeat the same snapshots to confirm the dominant bottleneck moved or cleared.
$ uptime
 08:04:46 up 1 day, 18:55,  0 user,  load average: 0.22, 0.21, 0.19
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd     free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22730932 143316 621912    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22779764 143324 621848    0    0     0    52  268  318  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1164  783  4  1 94  0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1210 1886  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1352 2125  0  0 100 0  0  0
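The repeated snapshots can be appended to a log so before/after comparisons survive the terminal session; a sketch with an arbitrary interval, count, and /tmp path:

```shell
# Append timestamped snapshots for later comparison; the path, interval,
# and count are arbitrary choices.
log=/tmp/perf-triage.log
for i in 1 2 3; do
    # second vmstat sample avoids the since-boot averages in the first row
    { date '+%F %T'; uptime; vmstat 1 2 | tail -n 1; echo; } >> "$log"
    sleep 5
done
```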
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
