Performance problems on Linux can turn simple logins into laggy keystrokes, stretch API responses into timeouts, and push background jobs past their schedules, so quick triage prevents small hiccups from becoming full outages.
Most slowdowns come from one dominant bottleneck—CPU contention, memory pressure, storage I/O latency, or network delay—and the kernel surfaces each of those through lightweight counters in /proc plus a few standard tools that provide “snapshot” views of current conditions.
Metrics are noisy and short-lived, so capture evidence early, sample more than once, and avoid running heavy benchmarks on an already-struggling host; a high load average can be caused by blocked I/O just as easily as busy CPUs, and aggressive troubleshooting can make the situation worse.
$ uptime
 08:04:29 up 1 day, 18:54,  0 user,  load average: 0.29, 0.23, 0.20
$ nproc
10
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 0  0      0 22697260 143280 621428    0    0     5   256  527    0  1  0 99  0  0  0
 1  0      0 22705324 143280 621428    0    0     0     0 1389 2166  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0    96     0 1820 2812  0  0 99  0  0  0
 1  0      0 22705324 143292 621512    0    0     0     0  474  634  0  0 100 0  0  0
 0  0      0 22705324 143292 621512    0    0     0     0  202  202  0  0 100 0  0  0
Compare the load average to the CPU count (nproc) and watch the r column in vmstat: a run queue that stays above the number of CPUs means tasks are waiting for processor time.
Related: How to check load average in Linux
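That comparison is easy to script for a quick health check; this is a minimal sketch assuming a Linux /proc filesystem, using awk for the floating-point comparison that shell arithmetic cannot do:

```shell
# Compare the 1-minute load average to the CPU count; a sustained ratio
# above 1.0 means runnable (or uninterruptibly blocked) tasks are queuing.
load=$(cut -d ' ' -f1 /proc/loadavg)
cpus=$(nproc)
if awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }'; then
  echo "load $load exceeds $cpus CPU(s): investigate"
else
  echo "load $load within $cpus CPU(s)"
fi
```

Because blocked I/O also inflates the load average, still check vmstat's r and b columns before concluding the CPUs themselves are saturated.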
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%cpu | head
PID USER COMMAND %CPU %MEM
1 root systemd 0.0 0.0
1426 systemd+ systemd-timesyn 0.0 0.0
171 message+ dbus-daemon 0.0 0.0
174 root systemd-logind 0.0 0.0
23 root systemd-journal 0.0 0.0
7319 root cron 0.0 0.0
3365 syslog rsyslogd 0.0 0.0
3846 root sshd 0.0 0.0
8997 user bash 0.0 0.0
$ ps -eo pid,user,comm,%cpu,%mem --sort=-%mem | head
PID USER COMMAND %CPU %MEM
23 root systemd-journal 0.0 0.0
1 root systemd 0.0 0.0
174 root systemd-logind 0.0 0.0
3846 root sshd 0.0 0.0
1426 systemd+ systemd-timesyn 0.0 0.0
171 message+ dbus-daemon 0.0 0.0
3365 syslog rsyslogd 0.0 0.0
9013 user ps 0.0 0.0
9006 user bash 0.0 0.0
Use COMMAND plus PID to pivot into deeper inspection (open files, threads, cgroups, logs) without guessing.
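Most of that deeper inspection needs nothing beyond /proc; this sketch uses the current shell's PID ($$) as a stand-in for the PID you identified with ps:

```shell
# Substitute the PID reported by ps; $$ (this shell) is used as a stand-in.
pid=$$
# Identity, thread count, and resident memory from the process status file.
grep -E '^(Name|Threads|VmRSS)' /proc/$pid/status
# Open file descriptor count (reading another user's fds may need root).
ls /proc/$pid/fd | wc -l
# Cgroup membership, useful for spotting container or slice resource limits.
cat /proc/$pid/cgroup
```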
$ free -h
total used free shared buff/cache available
Mem: 23Gi 1.1Gi 21Gi 13Mi 747Mi 22Gi
Swap: 1.0Gi 0B 1.0Gi
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r b swpd free buff cache si so bi bo in cs us sy id wa st gu
1 0 0 22736704 143292 621724 0 0 5 256 527 0 1 0 99 0 0 0
0 0 0 22745040 143292 621724 0 0 0 0 191 197 0 0 100 0 0 0
0 0 0 22745040 143300 621724 0 0 0 80 193 213 0 0 100 0 0 0
0 0 0 22745040 143300 621724 0 0 0 0 194 195 0 0 100 0 0 0
1 0 0 22745040 143300 621724 0 0 0 0 936 783 2 1 96 0 0 0
Sustained swapping (persistently non-zero si and so) can make a system feel frozen, and if memory pressure keeps climbing the kernel OOM killer may start terminating processes.
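To quantify swap pressure between vmstat samples, the totals can be read straight from /proc/meminfo; a minimal sketch assuming a Linux host (values are in kB):

```shell
# Swap totals from /proc/meminfo; persistent growth in "used" alongside
# non-zero si/so in vmstat confirms real swapping rather than a one-off.
awk '/^SwapTotal:/ {t=$2} /^SwapFree:/ {f=$2}
     END { printf "swap used: %d kB of %d kB total\n", t-f, t }' /proc/meminfo
```

If you suspect the OOM killer has already fired, `dmesg | grep -i oom` (often root-only) or the system journal will show which processes were killed.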
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22740500 143300 621724    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22740248 143300 621724    0    0     0     0 1413 2177  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0     4 1277 2010  0  0 100 0  0  0
 0  0      0 22739996 143300 621724    0    0     0   328  529  645  0  0 100 0  0  0
 0  0      0 22739996 143316 621740    0    0     8   212  493  623  0  0 100 0  0  0
High wa (I/O wait) typically means storage latency or saturation; correlate with disk errors and per-device utilization before blaming CPUs.
Related: How to check CPU I/O wait in Linux
Related: How to check disk errors in Linux
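Per-device utilization is easiest to read from `iostat -x 1` (sysstat package), but when that is not installed the raw counters are available in /proc/diskstats; in this sketch, field 13 is cumulative milliseconds spent doing I/O, so take two samples and diff them to estimate how busy each device is:

```shell
# Kernel per-device I/O counters; these are totals since boot, so sample
# twice and subtract to get rates. Loop and ram devices are filtered out.
awk '$3 !~ /^(loop|ram)/ { printf "%-8s reads=%s writes=%s io_ms=%s\n",
                                  $3, $4, $8, $13 }' /proc/diskstats
```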
$ ping -c 5 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
64 bytes from 1.1.1.1: icmp_seq=1 ttl=63 time=7.37 ms
64 bytes from 1.1.1.1: icmp_seq=2 ttl=63 time=7.17 ms
64 bytes from 1.1.1.1: icmp_seq=3 ttl=63 time=7.55 ms
64 bytes from 1.1.1.1: icmp_seq=4 ttl=63 time=7.68 ms
64 bytes from 1.1.1.1: icmp_seq=5 ttl=63 time=8.22 ms

--- 1.1.1.1 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4023ms
rtt min/avg/max/mdev = 7.166/7.596/8.215/0.353 ms
When DNS resolution may be part of the failure mode, test with hostnames as well as raw IPs: a ping that succeeds by IP but stalls or fails by name points at the resolver rather than the network path.
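To separate resolver latency from network latency directly, time a lookup through the same nsswitch/DNS path that applications use; in this sketch, localhost is a stand-in for the real service hostname:

```shell
# getent exercises /etc/nsswitch.conf (files, then DNS) exactly as most
# applications do; if this is slow while ping-by-IP is fast, blame DNS.
time getent hosts localhost
```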
$ uptime
 08:04:46 up 1 day, 18:55,  0 user,  load average: 0.22, 0.21, 0.19
$ vmstat 1 5
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
 1  0      0 22730932 143316 621912    0    0     5   256  527    0  1  0 99  0  0  0
 0  0      0 22779764 143324 621848    0    0     0    52  268  318  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1164  783  4  1 94  0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1210 1886  0  0 100 0  0  0
 0  0      0 22779512 143324 621848    0    0     0     0 1352 2125  0  0 100 0  0  0
After any remediation, re-run the same baseline checks and confirm that the load average and run queue have actually settled rather than merely shifted.