How to troubleshoot a Linux service outage

Service outages on Linux can break web applications, background workers, and scheduled jobs, so a consistent triage routine reduces downtime and prevents guesswork during incidents.

On most modern Linux systems, systemd manages services as units and records state transitions and failures in the journal. A service can be unavailable because the process exited, the unit failed to start due to dependencies, or the daemon started but never bound to its port. With socket activation, the service process may also not start until the first connection arrives.

Collect unit state and log excerpts before applying fixes, since restarts add log noise and can change the failure mode. Rate limiting (start-limit-hit) and automatic restarts can mask the first error, so focus on the earliest failure lines and treat repeated restarts as a symptom, not a solution.
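
As a minimal sketch of collecting evidence first, the function below snapshots a unit's status, key properties, and current-boot journal into separate files. The name collect_evidence and the output file names are hypothetical; only the systemctl and journalctl invocations come from standard tooling, and reading the full journal may require root or membership in the systemd-journal group.

```shell
# Hypothetical helper: snapshot unit state and logs before changing anything.
# Arguments: unit name, output directory (both chosen by the caller).
collect_evidence() (
    unit=$1
    outdir=$2
    mkdir -p "$outdir" || exit 1
    # Each command writes to its own file; error text is captured as well.
    systemctl status --no-pager --full "$unit" > "$outdir/status.txt" 2>&1
    systemctl show "$unit" -p ActiveState -p SubState -p Result -p NRestarts > "$outdir/show.txt" 2>&1
    journalctl -u "$unit" -b --no-pager > "$outdir/journal.txt" 2>&1
    # Report success regardless: systemctl status exits non-zero for a failed unit by design.
    exit 0
)
```

A call such as collect_evidence ssh.service /tmp/ssh-outage leaves three text files that can be reviewed later, even if a subsequent restart changes the failure mode.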

Steps to troubleshoot a Linux service outage with systemctl, journalctl, and ss:

Confirm the unit name and current service state.

```
$ sudo systemctl status --no-pager --full ssh.service | head -n 12
* ssh.service - OpenBSD Secure Shell server
     Loaded: loaded (/usr/lib/systemd/system/ssh.service; disabled; preset: enabled)
     Active: active (running) since Mon 2026-01-12 22:27:01 UTC; 1h 35min ago
TriggeredBy: * ssh.socket
       Docs: man:sshd(8)
             man:sshd_config(5)
   Main PID: 1337 (sshd)
      Tasks: 1 (limit: 14999)
     Memory: 1.1M (peak: 2.7M)
        CPU: 31ms
     CGroup: /system.slice/ssh.service
             `-1337 "sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups"
```

List failing services with systemctl list-units --type=service --state=failed when the unit name is unknown.
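
When the failed-unit listing feeds a script, the unit names can be extracted as sketched here. The function name failed_unit_names is hypothetical, and the sketch assumes the listing was produced with --plain and --no-legend so that the unit name is the first field on each line.

```shell
# Read `systemctl list-units` output on stdin and print one unit name per line.
# Assumes --plain --no-legend, so the first whitespace-separated field is the unit.
failed_unit_names() {
    awk '$1 ~ /\.service$/ { print $1 }'
}

# Typical invocation on a systemd host:
#   systemctl list-units --type=service --state=failed --plain --no-legend | failed_unit_names
```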

Review recent service logs for failures or crash loops.

```
$ sudo journalctl -u ssh.service -b --no-pager -n 20
Jan 12 22:27:01 host.example.net systemd[1]: Starting ssh.service - OpenBSD Secure Shell server...
Jan 12 22:27:01 host.example.net sshd[1337]: Server listening on 0.0.0.0 port 22.
Jan 12 22:27:01 host.example.net sshd[1337]: Server listening on :: port 22.
Jan 12 22:27:01 host.example.net systemd[1]: Started ssh.service - OpenBSD Secure Shell server.
Jan 13 00:00:10 host.example.net sshd[3673]: Connection closed by 127.0.0.1 port 58638
Jan 13 00:01:27 host.example.net sshd[3796]: Connection closed by 127.0.0.1 port 59412
```

Narrow to a time window with --since "15 minutes ago" and prefer the first failure line, since later restarts often produce noisy duplicates.
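
One way to pull out the first failure line is a small filter over the journal output, sketched below. The function name first_failure_line and the keyword list are assumptions; adjust the keywords to match what the service actually logs. Because journalctl prints oldest entries first by default, the first match is the earliest one in the window.

```shell
# Print the first line on stdin that looks like a failure, then stop reading.
# The keyword list is an assumption; extend it for the service at hand.
first_failure_line() {
    awk 'tolower($0) ~ /fail|error|fatal|denied/ { print; exit }'
}

# Typical invocation on a systemd host:
#   journalctl -u ssh.service --since "15 minutes ago" --no-pager | first_failure_line
```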

Verify the service is listening on expected ports or sockets.

```
$ sudo ss -lntp '( sport = :22 )'
State  Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
LISTEN 0      4096         0.0.0.0:22        0.0.0.0:*    users:(("sshd",pid=1337,fd=3),("systemd",pid=1,fd=87))
LISTEN 0      4096            [::]:22           [::]:*    users:(("sshd",pid=1337,fd=4),("systemd",pid=1,fd=88))
```

A missing listener usually indicates a failed start; a socket-activated service may require checking the matching .socket unit.
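
For a socket-activated service, the matching socket unit can be inspected the same way as the service. The sketch below derives the conventional socket unit name; socket_unit_for is a hypothetical helper, and it assumes the default pairing in which foo.socket activates foo.service. A socket unit can point elsewhere with Service=, so the TriggeredBy line in systemctl status remains the authoritative answer.

```shell
# Derive the conventional socket unit name for a service unit.
# Assumes systemd's default pairing (foo.socket activates foo.service).
socket_unit_for() {
    printf '%s\n' "${1%.service}.socket"
}

# Typical invocations on a systemd host:
#   systemctl status --no-pager "$(socket_unit_for ssh.service)"
#   systemctl list-sockets --no-pager
```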

Check unit dependencies, environment files, and required paths.

```
$ sudo systemctl cat ssh.service
# /usr/lib/systemd/system/ssh.service
[Unit]
Description=OpenBSD Secure Shell server
Documentation=man:sshd(8) man:sshd_config(5)
After=network.target auditd.service
ConditionPathExists=!/etc/ssh/sshd_not_to_be_run

[Service]
EnvironmentFile=-/etc/default/ssh
ExecStartPre=/usr/sbin/sshd -t
ExecStart=/usr/sbin/sshd -D $SSHD_OPTS
ExecReload=/usr/sbin/sshd -t
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=on-failure
RestartPreventExitStatus=255
Type=notify
RuntimeDirectory=sshd
RuntimeDirectoryMode=0755

[Install]
WantedBy=multi-user.target
Alias=sshd.service
```

Environment files (for example /etc/example/example.env) can contain secrets; avoid pasting their contents into tickets, chat logs, or screenshots.
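
To confirm that environment files exist without exposing their contents, something like the following can help. Both function names are hypothetical. The sketch reads unit text on stdin, lists each EnvironmentFile= path, and reports only whether the path is readable by the current user, so a root-only file shows as unreadable when the check runs unprivileged.

```shell
# Read unit-file text on stdin and print each EnvironmentFile= path.
# A leading "-" (file is optional to systemd) is stripped from the path.
environment_file_paths() {
    sed -n 's/^EnvironmentFile=-\{0,1\}//p'
}

# Report whether each path on stdin is readable; never print file contents.
check_paths() {
    while IFS= read -r path; do
        if [ -r "$path" ]; then
            printf 'ok %s\n' "$path"
        else
            printf 'absent-or-unreadable %s\n' "$path"
        fi
    done
}

# Typical invocation on a systemd host:
#   systemctl cat ssh.service | environment_file_paths | check_paths
```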

Restart or reload the service after corrections.
```
$ sudo systemctl restart ssh.service
```
If unit files under /etc/systemd/system/ were edited, run systemctl daemon-reload before the restart so systemd loads the updated definition.

Repeated restarts without fixing the underlying error can trigger start-limit-hit and bury the original failure under log churn.
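
Once the underlying error is fixed, a unit that has hit the start limit can be cleared with systemctl reset-failed, which resets the failed state and the start rate limit counter. The wrapper below is a sketch of one reasonable ordering, not a prescribed procedure; run and restart_after_fix are hypothetical names, and DRY_RUN is an assumed variable that prints the commands instead of executing them.

```shell
# Hypothetical wrapper: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Apply a corrected unit definition, then restart the unit given as $1.
restart_after_fix() {
    # Load edited unit files first; stop here if that fails.
    run systemctl daemon-reload || return
    # Clear the failed state and the start rate limit counter.
    run systemctl reset-failed "$1"
    run systemctl restart "$1"
}
```

Running it once with DRY_RUN=1 shows the exact commands before anything changes; real execution needs root privileges.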

Confirm the service remains active and does not immediately restart.
```
$ sudo systemctl show ssh.service -p ActiveState -p SubState -p NRestarts
NRestarts=0
ActiveState=active
SubState=running
```
Re-run the same command after a few minutes; a rising NRestarts count usually indicates a crash loop, a watchdog kill, or a dependency that is still unstable.
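
The before-and-after comparison can be scripted roughly as follows. The function names are hypothetical, and the five-minute wait in the commented usage is an arbitrary choice.

```shell
# Extract the NRestarts value from `systemctl show` output on stdin.
nrestarts_from_show() {
    sed -n 's/^NRestarts=//p'
}

# Succeed (exit status 0) when the second sample is larger than the first.
restarts_increased() {
    [ "$2" -gt "$1" ]
}

# Typical use on a systemd host:
#   before=$(systemctl show ssh.service -p NRestarts | nrestarts_from_show)
#   sleep 300
#   after=$(systemctl show ssh.service -p NRestarts | nrestarts_from_show)
#   restarts_increased "$before" "$after" && echo "ssh.service restarted during the interval"
```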

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.