Service outages on Linux can break web applications, background workers, and scheduled jobs, so a consistent triage routine reduces downtime and prevents guesswork during incidents.

On most modern Linux systems, systemd manages services as units and records state transitions and failures in the journal. A service can be unavailable because the process exited, the unit failed to start due to dependencies, or the daemon started but never bound to its port, and socket activation can also delay startup until the first connection arrives.

Collect unit state and log excerpts before applying fixes, since each restart adds new log entries and can change the failure mode. Rate limiting (start-limit-hit) and automatic restarts can hide the first error, so focus on the earliest failure lines and treat repeated restarts as a symptom, not a solution.
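One way to capture that evidence in a single pass is a small bash function run as root before any restart. This is a minimal sketch: the function name snapshot_unit, its arguments, and the output layout are hypothetical, and it assumes systemctl, journalctl, and ss are installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save unit state, journal, and listeners before any restart.
# Assumes it runs as root (or under sudo) so journalctl can read system logs.
snapshot_unit() {
  local unit="$1"
  local base="${2:-/tmp}"
  # A timestamped directory keeps snapshots from separate attempts apart.
  local dir="${base}/${unit}-$(date +%Y%m%dT%H%M%S)"
  mkdir -p "$dir" || return 1

  # Each command writes stdout and stderr to its own file; "|| true" keeps
  # the snapshot going if a tool is missing or the unit is unknown.
  systemctl status --no-pager --full "$unit" > "$dir/status.txt" 2>&1 || true
  systemctl show "$unit" > "$dir/show.txt" 2>&1 || true
  journalctl -u "$unit" -b --no-pager > "$dir/journal.txt" 2>&1 || true
  ss -lntp > "$dir/listeners.txt" 2>&1 || true

  # Print the snapshot location for the incident notes.
  printf '%s\n' "$dir"
}
```

After sourcing the function in a root shell, snapshot_unit example.service /var/tmp prints the directory it wrote.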

Steps to troubleshoot a Linux service outage with systemctl, journalctl, and ss:

  1. Confirm the unit name and current service state.
    $ sudo systemctl status --no-pager --full example.service | head -n 12
    * example.service - Example API
         Loaded: loaded (/etc/systemd/system/example.service; enabled; preset: enabled)
         Active: activating (auto-restart) (Result: exit-code) since Sat 2026-01-10 07:30:15 +08; 792ms ago
        Process: 153130 ExecStart=/usr/local/bin/example-api --config /etc/example/example.conf (code=exited, status=1/FAILURE)
       Main PID: 153130 (code=exited, status=1/FAILURE)
            CPU: 24ms
    
    Jan 10 07:30:15 host.example.net systemd[1]: example.service: Main process exited, code=exited, status=1/FAILURE
    Jan 10 07:30:15 host.example.net systemd[1]: example.service: Failed with result 'exit-code'.

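    When the unit name is unknown, list failed services with systemctl list-units --type=service --state=failed. A unit stuck in a restart loop reports activating (auto-restart) rather than failed, as in the output above, so also try --state=auto-restart.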

  2. Review recent service logs for failures or crash loops.
    $ sudo journalctl -u example.service -b --no-pager -n 40
    Jan 10 07:30:14 host.example.net example-api[153125]: ERROR: Unable to read config file: /etc/example/example.conf (Permission denied)
    Jan 10 07:30:14 host.example.net systemd[1]: example.service: Main process exited, code=exited, status=1/FAILURE
    Jan 10 07:30:14 host.example.net systemd[1]: example.service: Failed with result 'exit-code'.
    Jan 10 07:30:15 host.example.net systemd[1]: example.service: Scheduled restart job, restart counter is at 1.
    Jan 10 07:30:15 host.example.net systemd[1]: Started example.service - Example API.
    Jan 10 07:30:15 host.example.net example-api[153130]: ERROR: Unable to read config file: /etc/example/example.conf (Permission denied)
    Jan 10 07:30:15 host.example.net systemd[1]: example.service: Main process exited, code=exited, status=1/FAILURE
    ##### snipped #####

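    The -n 40 limit shows only the newest lines, so the first failure may already be cut off. Narrow to a time window with --since "15 minutes ago" instead and read from the earliest failure line, since later restarts mostly produce noisy duplicates of it.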

  3. Verify the service is listening on expected ports or sockets.
    $ sudo ss -lntp '( sport = :9000 )'
    State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess

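    A missing listener usually indicates a failed start. A socket-activated service behaves differently: systemd itself owns the listening socket, so ss may show systemd as the process, and the matching .socket unit needs checking as well.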

  4. Check unit dependencies, environment files, and required paths.
    $ sudo systemctl cat example.service
    # /etc/systemd/system/example.service
    [Unit]
    Description=Example API
    After=network-online.target
    Wants=network-online.target
    
    [Service]
    Type=simple
    User=svcuser
    Group=svcuser
    EnvironmentFile=-/etc/default/example
    ExecStart=/usr/local/bin/example-api --config /etc/example/example.conf
    WorkingDirectory=/var/lib/example
    Restart=on-failure
    RestartSec=1s
    
    [Install]
    WantedBy=multi-user.target

    Environment files (here /etc/default/example; the leading - in EnvironmentFile= means a missing file is silently ignored) can contain secrets; avoid pasting their contents into tickets, chat logs, or screenshots.

  5. Restart or reload the service after corrections.
    $ sudo systemctl restart example.service

    After editing unit files or drop-ins under /etc/systemd/system/, run systemctl daemon-reload before restarting so systemd loads the updated definition.

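    Repeated restarts without fixing the underlying error can trigger start-limit-hit and bury the original failure under log churn. Once the cause is fixed, systemctl reset-failed example.service clears the failed state and the start rate limit counter.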

  6. Confirm the service remains active and does not immediately restart.
    $ sudo systemctl show example.service -p ActiveState -p SubState -p NRestarts
    NRestarts=1
    ActiveState=activating
    SubState=auto-restart

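    Re-run the same command after a few minutes. A healthy long-running service reports ActiveState=active and SubState=running with a steady NRestarts count; activating/auto-restart, as shown above, or a rising NRestarts count usually indicates a crash loop, a watchdog kill, or a dependency that is still unstable.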
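The advice in step 2 to read from the earliest failure line can be scripted with grep. The helper name first_failure and its match pattern (error, failed, or denied, case-insensitive) are assumptions for illustration; adjust the pattern to the application's own log format.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first line on stdin that looks like a failure.
# The pattern is an assumption; tune it to the application's log format.
# Returns non-zero when nothing matches.
first_failure() {
  grep -m 1 -E -i 'error|failed|denied'
}
```

Without -n or --reverse, journalctl prints the oldest entries first, so the first match is the earliest failure in the window: sudo journalctl -u example.service -b --no-pager --since "15 minutes ago" | first_failure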
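The listener check in step 3 reduces to asking whether ss printed any row below its header. In this sketch, has_listener and check_port are hypothetical names; for a socket-activated service, a row owned by systemd is expected rather than one owned by the daemon.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if ss output on stdin has at least one
# non-empty row after the header line.
has_listener() {
  awk 'NR > 1 && NF > 0 { found = 1 } END { exit (found ? 0 : 1) }'
}

# Sketch of a live check for one TCP port; root is needed for -p to show
# the owning process.
check_port() {
  local port="$1"
  if ss -lntp "( sport = :${port} )" | has_listener; then
    echo "port ${port}: listening"
  else
    echo "port ${port}: no listener"
    return 1
  fi
}
```

Run as root, check_port 9000 reports "no listener" for the header-only output shown in step 3.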
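For step 4, a permission error like the one in step 2 is often easiest to confirm by testing the path as the unit's own user. This sketch assumes util-linux runuser and namei are available and that it runs as root; unit_value and check_path_as are hypothetical helper names, and unit_value only handles simple Key=Value lines from systemctl cat output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the last value assigned to a key in
# systemctl cat output on stdin. For single-value settings such as User=,
# a later assignment (for example in a drop-in) overrides an earlier one.
# Prints nothing and fails if the key is absent or empty.
unit_value() {
  local key="$1"
  awk -v k="$key" '
    $0 ~ "^[ \t]*" k "[ \t]*=" {
      v = $0
      sub(/^[^=]*=[ \t]*/, "", v)   # drop "Key=" and keep the value
    }
    END { if (v != "") print v; else exit 1 }
  '
}

# Sketch: show permissions along every path component, then test read
# access as the given user. Requires root for runuser.
check_path_as() {
  local user="$1" path="$2"
  namei -l "$path"
  if runuser -u "$user" -- test -r "$path"; then
    echo "${user} can read ${path}"
  else
    echo "${user} cannot read ${path}"
    return 1
  fi
}
```

For the unit shown in step 4, sudo systemctl cat example.service | unit_value User prints svcuser; check_path_as svcuser /etc/example/example.conf, run in a root shell, then shows which directory or file blocks access.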
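The order of operations in step 5 matters: reload definitions first, then clear the failure state, then restart. A minimal sketch of that sequence follows; apply_fix and the DRY_RUN switch are hypothetical, and daemon-reload runs unconditionally here for simplicity even though it is only required after unit or drop-in edits.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the post-fix sequence. With DRY_RUN=1 (an
# assumption for this sketch) each command is echoed instead of executed.
apply_fix() {
  local unit="$1"
  local run=()
  if [ "${DRY_RUN:-0}" = 1 ]; then run=(echo); fi

  # Pick up any edited unit files or drop-ins.
  "${run[@]}" systemctl daemon-reload || return 1
  # Clear the failed state and start rate limit counter from earlier attempts;
  # not fatal if there is nothing to reset.
  "${run[@]}" systemctl reset-failed "$unit"
  # Start the unit again with the corrected configuration.
  "${run[@]}" systemctl restart "$unit"
}
```

DRY_RUN=1 apply_fix example.service previews the three commands; run it without DRY_RUN as root to apply them.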
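For step 6, the show output can be turned into a rough verdict and compared across a waiting period. The verdict labels, the helper names classify_state and watch_restarts, and the 60-second default interval are assumptions; the active/running mapping fits long-running daemons, not oneshot units.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "systemctl show -p ActiveState -p SubState
# -p NRestarts" output on stdin and print a rough verdict.
classify_state() {
  awk -F= '
    { val[$1] = $2 }
    END {
      if (val["ActiveState"] == "active" && val["SubState"] == "running") print "running"
      else if (val["SubState"] == "auto-restart") print "crash-loop"
      else if (val["ActiveState"] == "failed") print "failed"
      else print "other: " val["ActiveState"] "/" val["SubState"]
    }
  '
}

# Sketch: sample NRestarts twice and fail if it increased in between.
watch_restarts() {
  local unit="$1" interval="${2:-60}" before after
  before=$(systemctl show "$unit" -p NRestarts --value)
  sleep "$interval"
  after=$(systemctl show "$unit" -p NRestarts --value)
  echo "NRestarts: ${before} -> ${after}"
  [ "$after" = "$before" ]
}
```

Piping the step 6 command into classify_state prints crash-loop for the output shown there.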