How to troubleshoot a Nagios Core host down alert

A host alert in Nagios Core means the monitoring server could not prove that a configured host is reachable with its host check command. The same alert can come from a real outage, a wrong host address, a broken host check command, or a missing parent relationship, so start by matching the recorded state to the host object and plugin result.

Host checks use the host object's address, check_command, and optional parents fields. A failed host check becomes DOWN when at least one parent path remains reachable, while a child behind failed parents becomes UNREACHABLE because Nagios Core cannot prove whether the child itself is down.
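
The DOWN versus UNREACHABLE rule above can be sketched as a small shell function. This is an illustration only, not Nagios Core's implementation: `classify_host` and its argument convention are hypothetical names introduced here.

```shell
# Hypothetical helper illustrating the DOWN vs UNREACHABLE rule.
# $1: "ok" or "fail" for the host's own check result
# remaining arguments: state of each configured parent (UP, DOWN, UNREACHABLE)
classify_host() {
    check_result=$1
    shift
    if [ "$check_result" = "ok" ]; then
        echo "UP"
        return 0
    fi
    # No parents configured: nothing sits between the monitoring
    # server and the host, so a failed check is reported as DOWN.
    if [ "$#" -eq 0 ]; then
        echo "DOWN"
        return 0
    fi
    # At least one reachable parent path means the host itself is DOWN.
    for parent_state in "$@"; do
        if [ "$parent_state" = "UP" ]; then
            echo "DOWN"
            return 0
        fi
    done
    # Every parent path has failed, so the host's own state cannot be proven.
    echo "UNREACHABLE"
}
```

For example, `classify_host fail DOWN UP` prints DOWN, while `classify_host fail DOWN UNREACHABLE` prints UNREACHABLE.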

Ubuntu and Debian packages keep the nagios4 configuration under /etc/nagios4, status data under /var/lib/nagios4, and plugins under /usr/lib/nagios/plugins. Source installs commonly use a different prefix, such as /usr/local/nagios, so adjust the paths in the steps to match. Keep the first pass read-only until the manual check shows whether the alert reflects the real network path or the loaded object values.

Steps to troubleshoot a Nagios Core host down alert:

Check the recorded host state and plugin output.

$ sudo grep -A24 "host_name=web01.example.net" /var/lib/nagios4/status.dat
host_name=web01.example.net
check_command=check-host-alive
has_been_checked=1
current_state=1
plugin_output=PING CRITICAL - Packet loss = 100%
last_time_down=1782023108
##### snipped #####

In /var/lib/nagios4/status.dat, current_state=0 means UP, current_state=1 means DOWN, and current_state=2 means UNREACHABLE.
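
Because the same host_name= line also appears in the service status blocks of status.dat, a plain grep can match more than the host record. The sketch below reads only hoststatus blocks and translates the numeric code. It assumes the usual block layout (a `hoststatus {` line, indented `key=value` lines, and a closing brace); `host_current_state` and `host_state_name` are hypothetical helper names.

```shell
# Translate a numeric host state from status.dat into its name.
host_state_name() {
    case "$1" in
        0) echo "UP" ;;
        1) echo "DOWN" ;;
        2) echo "UNREACHABLE" ;;
        *) echo "UNKNOWN_CODE" ;;
    esac
}

# Print current_state for one host, reading only hoststatus blocks.
# $1: host_name, $2: path to status.dat
host_current_state() {
    awk -v host="$1" '
        /^hoststatus[ \t]*[{]/ { in_block = 1; name = ""; state = ""; next }
        in_block && /^[ \t]*[}]/ {
            if (name == host) { print state; exit }
            in_block = 0
            next
        }
        in_block {
            line = $0
            sub(/^[ \t]+/, "", line)
            if (index(line, "host_name=") == 1) name = substr(line, 11)
            if (index(line, "current_state=") == 1) state = substr(line, 15)
        }
    ' "$2"
}

# Example (commented out so that sourcing this file runs nothing):
# host_state_name "$(sudo cat /var/lib/nagios4/status.dat | host_current_state web01.example.net /dev/stdin)"
```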

Run the host check plugin manually as the nagios user.
```
$ sudo -u nagios /usr/lib/nagios/plugins/check_ping -H 192.0.2.10 -w 5000,100% -c 5000,100% -p 1
PING CRITICAL - Packet loss = 100%| rta=U;5000.000000;5000.000000;; pl=100%;100;100;0;
```
Use the host object's address value and the arguments from its check_command definition. A CRITICAL result with the same packet-loss output as the recorded plugin_output confirms that the alert reflects what the check actually sees from the monitoring server, not a stale or misrecorded state.
Related: How to run a Nagios plugin manually

Test a known reachable address with the same plugin.
```
$ sudo -u nagios /usr/lib/nagios/plugins/check_ping -H 127.0.0.1 -w 5000,100% -c 5000,100% -p 1
PING OK - Packet loss = 0%, RTA = 0.11 ms|rta=0.107000ms;5000.000000;5000.000000;0.000000 pl=0%;100;100;0;
```
If the known reachable address also fails, fix local ICMP permissions, plugin execution, firewall policy, or the monitoring server route before changing the host object.
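
The comparison between the alerting address and the known reachable address can be written down as a small decision helper. This sketch relies only on the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN); `triage_ping_results` is a hypothetical name, and the messages are illustrative.

```shell
# Hypothetical helper: decide where to look next from two plugin exit codes.
# $1: exit code of the check against the alerting host
# $2: exit code of the same check against a known reachable address
triage_ping_results() {
    target_rc=$1
    control_rc=$2
    if [ "$control_rc" -ne 0 ]; then
        # The control check failed too, so the monitoring server is suspect.
        echo "local: fix plugin execution, ICMP permissions, firewall policy, or routing on the monitoring server first"
    elif [ "$target_rc" -ne 0 ]; then
        # The plugin works locally, so the failure follows the target.
        echo "target: investigate the host, its configured address, or the network path"
    else
        echo "clear: both checks pass from the command line"
    fi
}

# Example (commented out so that sourcing this file runs nothing):
# sudo -u nagios /usr/lib/nagios/plugins/check_ping -H 192.0.2.10 -w 5000,100% -c 5000,100% -p 1; target_rc=$?
# sudo -u nagios /usr/lib/nagios/plugins/check_ping -H 127.0.0.1 -w 5000,100% -c 5000,100% -p 1; control_rc=$?
# triage_ping_results "$target_rc" "$control_rc"
```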

Inspect the loaded host object fields.

$ sudo grep -A8 "web01.example.net" /var/lib/nagios4/objects.cache
host_name	web01.example.net
alias	Web 01
address	192.0.2.10
parents	router01.example.net
check_period	24x7
check_command	check-host-alive
contact_groups	admins
notification_period	workhours

The loaded object cache is useful when the source file was edited but Nagios Core is still using an older or different object definition.
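
A bare host name can also match service or host group entries in objects.cache. As a hedged alternative, the sketch below prints one field from the `define host` block only. It assumes the cache layout shown in this step (a `define host {` line, whitespace-separated key and value lines, and a closing brace); `host_field` is a hypothetical helper name.

```shell
# Print one field of a host definition from objects.cache.
# $1: host_name, $2: field name (for example address), $3: path to objects.cache
host_field() {
    awk -v host="$1" -v field="$2" '
        /^define host[ \t]*[{]/ { in_host = 1; name = ""; value = ""; next }
        in_host && /^[ \t]*[}]/ {
            if (name == host) { print value; exit }
            in_host = 0
            next
        }
        in_host {
            line = $0
            sub(/^[ \t]+/, "", line)
            # The key is everything before the first space or tab.
            key = line
            sub(/[ \t].*$/, "", key)
            val = substr(line, length(key) + 1)
            sub(/^[ \t]+/, "", val)
            if (key == "host_name") name = val
            if (key == field) value = val
        }
    ' "$3"
}

# Example (commented out so that sourcing this file runs nothing):
# sudo cat /var/lib/nagios4/objects.cache | host_field web01.example.net address /dev/stdin
```

Comparing the printed address, parents, and check_command values with the source file shows quickly whether the daemon is running an older definition.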

Correct the host object when the address, parent, or host check command is wrong.

$ sudoedit /etc/nagios4/conf.d/web01-host.cfg

/etc/nagios4/conf.d/web01-host.cfg

define host {
    use                     linux-server
    host_name               web01.example.net
    alias                   Web 01
    address                 192.0.2.10
    parents                 router01.example.net
    check_command           check-host-alive
}

Do not add a parent only to hide a real host outage. Use parents only for the router, switch, firewall, or other hop that actually sits between the monitoring server and the child host.
Related: How to configure parent hosts in Nagios Core

Validate the Nagios Core configuration after changing the object.

$ sudo nagios4 -v /etc/nagios4/nagios.cfg
Nagios Core 4.4.6
##### snipped #####
Checked 3 hosts.
##### snipped #####
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check

Fix every reported object or command error before reloading the daemon.
Related: How to validate the Nagios Core configuration
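
To make the reload conditional on a clean pre-flight check, the summary line can be parsed before anything is reloaded. This is a minimal sketch that assumes the `Total Errors:` line format shown in the output above; `preflight_errors` is a hypothetical helper name.

```shell
# Read pre-flight output on stdin and print the number after "Total Errors:".
# Returns non-zero when no summary line is found, so a truncated or
# failed run is never mistaken for a clean one.
preflight_errors() {
    awk '
        /^Total Errors:/ { print $3; found = 1 }
        END { if (!found) exit 1 }
    '
}

# Example (commented out so that sourcing this file runs nothing):
# errors=$(sudo nagios4 -v /etc/nagios4/nagios.cfg | preflight_errors) \
#     && [ "$errors" -eq 0 ] \
#     && sudo systemctl reload nagios4
```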

Reload Nagios Core after the pre-flight check reports zero errors.
```
$ sudo systemctl reload nagios4
```
Ubuntu and Debian package installs use the nagios4 service name. Use the local service name or source-install control method when it differs.
Related: How to manage the Nagios Core system service

Retest the original host check after the object or network path is fixed.
```
$ sudo -u nagios /usr/lib/nagios/plugins/check_ping -H 192.0.2.10 -w 5000,100% -c 5000,100% -p 1
PING OK - Packet loss = 0%, RTA = 0.92 ms|rta=0.920000ms;5000.000000;5000.000000;0.000000 pl=0%;100;100;0;
```
If the manual result still returns CRITICAL while the parent path is UP, keep the incident on the host or network path instead of editing Nagios Core to mask the outage.

Confirm that Nagios Core records the updated state after the next active check.

$ sudo grep -A24 "host_name=web01.example.net" /var/lib/nagios4/status.dat
host_name=web01.example.net
check_command=check-host-alive
has_been_checked=1
current_state=0
plugin_output=PING OK - Packet loss = 0%, RTA = 0.92 ms
last_time_up=1782023500
##### snipped #####

Force a fresh active check when the web UI still shows the old result after a verified fix.
Related: How to reschedule an active check in Nagios Core
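
One way to force the check without the web UI is the SCHEDULE_FORCED_HOST_CHECK external command. The sketch below only builds the command line. The command file path in the commented example is an assumption for packaged installs and should be confirmed against the command_file setting in nagios.cfg, and external commands must be enabled. `forced_host_check_command` is a hypothetical helper name.

```shell
# Build a SCHEDULE_FORCED_HOST_CHECK external command line.
# $1: host_name, $2: optional check time in epoch seconds (defaults to now)
forced_host_check_command() {
    now=${2:-$(date +%s)}
    printf '[%s] SCHEDULE_FORCED_HOST_CHECK;%s;%s\n' "$now" "$1" "$now"
}

# Example (commented out so that sourcing this file runs nothing).
# The path below is an assumption; use the command_file value from nagios.cfg.
# forced_host_check_command web01.example.net | sudo tee /var/lib/nagios4/rw/nagios.cmd >/dev/null
```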

Author: Mohd Shakir Zakaria