Long-running alerts often need a wider audience than the first page. Nagios Core notification escalations let the monitoring scheduler add or change recipients after a host or service problem reaches a chosen notification number, so unresolved incidents can move from the first responder to a broader on-call path.
An escalation is a hostescalation or serviceescalation object definition. The object matches an existing host or service, sets the notification numbers where the escalation applies, and supplies the contacts or contact groups used while that range is active.
The example service escalation for the HTTP service on web01.example.net starts on the third CRITICAL notification and stays active for every later notification. While an escalation is active, Nagios Core notifies the escalation's contact groups in place of the service's own. Include both linux-admins and oncall-managers when the first-response group should keep receiving pages after escalation; list only the higher group when the escalation should replace the original recipients.
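The difference between adding and replacing recipients can be modelled in a few lines. This is a simplified illustration, not Nagios Core's own code: the function name `recipients_for` and its parameters are hypothetical, and the sketch assumes that an active escalation's contact groups are used instead of the service's own groups.

```python
def recipients_for(service_groups, escalation_groups, escalation_active):
    """Return the contact groups notified for one notification.

    Simplified model (an assumption, not Nagios Core source): while an
    escalation is active, its contact groups are used instead of the
    service's own groups.
    """
    groups = escalation_groups if escalation_active else service_groups
    # De-duplicate while preserving the configured order.
    return list(dict.fromkeys(groups))


service_groups = ["linux-admins"]

# Additive escalation: the first-response group is listed again.
additive = recipients_for(
    service_groups, ["linux-admins", "oncall-managers"], True
)

# Replacing escalation: only the higher group is listed.
replacing = recipients_for(service_groups, ["oncall-managers"], True)

print(additive)
print(replacing)
```

In the additive case linux-admins stays in the result only because the escalation lists it explicitly.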
$ grep '^cfg_dir=/etc/nagios4/conf.d' /etc/nagios4/nagios.cfg
cfg_dir=/etc/nagios4/conf.d
Current Ubuntu and Debian packages include /etc/nagios4/conf.d by default. Add a cfg_dir or cfg_file entry first when local objects are stored somewhere else.
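To see every object location the main configuration file loads, rather than grepping for a single path, a short script can list all cfg_dir and cfg_file entries. This is a minimal sketch that assumes the `key=value` line format of nagios.cfg. The function name `object_sources` is hypothetical, and the sample text is illustrative rather than copied from a real installation.

```python
def object_sources(main_cfg_text):
    """Return (directive, path) pairs for cfg_dir and cfg_file lines."""
    sources = []
    for raw in main_cfg_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in ("cfg_dir", "cfg_file"):
            sources.append((key.strip(), value.strip()))
    return sources


# Illustrative sample, not a real nagios.cfg.
sample = """\
# main configuration
log_file=/var/log/nagios4/nagios.log
cfg_file=/etc/nagios4/objects/commands.cfg
#cfg_dir=/etc/nagios4/servers
cfg_dir=/etc/nagios4/conf.d
"""

sources = object_sources(sample)
for directive, path in sources:
    print(directive, path)
```

On a real system the same function could be given the contents of /etc/nagios4/nagios.cfg instead of the sample string.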
Related: How to add a Nagios Core object configuration directory
The HTTP service must already notify its first-response group, and oncall-managers must contain contacts with working service notification commands.
Related: How to create a Nagios Core contact and contact group
Related: How to add a service check in Nagios Core
$ sudoedit /etc/nagios4/conf.d/http-escalation.cfg
define serviceescalation {
    host_name               web01.example.net
    service_description     HTTP
    first_notification      3
    last_notification       0
    notification_interval   15
    escalation_period       24x7
    escalation_options      c,r
    contact_groups          linux-admins,oncall-managers
}
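first_notification 3 starts the rule on the third notification, and last_notification 0 keeps it active for all later notifications. notification_interval 15 sets the re-notification interval while the escalation is active (15 minutes with the default interval_length of 60 seconds). escalation_period 24x7 must name an existing time period, and escalation_options c,r limits the escalation to CRITICAL and recovery notifications.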
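The range and state rules can be checked against a small model. This is a simplified sketch, not Nagios Core source code: the function name `escalation_matches` and the one-letter state codes passed to it are assumptions for illustration. It leaves out details such as escalation_period and how Nagios numbers recovery notifications.

```python
def escalation_matches(notification_number, state, first, last, options):
    """Simplified model of when an escalation applies.

    state   -- one-letter code: "w", "u", "c" or "r"
    last    -- 0 means "no upper limit"
    options -- set of state letters from escalation_options
    """
    if notification_number < first:
        return False
    if last != 0 and notification_number > last:
        return False
    return state in options


# Values from the serviceescalation definition in this guide.
FIRST, LAST, OPTIONS = 3, 0, {"c", "r"}

for n in (1, 2, 3, 10):
    print(n, escalation_matches(n, "c", FIRST, LAST, OPTIONS))
```

With these values the model rejects notifications 1 and 2, accepts CRITICAL notifications from number 3 onward, and rejects WARNING notifications at any number.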
$ sudo nagios4 -v /etc/nagios4/nagios.cfg
Nagios Core 4.4.6
##### snipped #####
Reading configuration data...
   Read main config file okay...
   Read object config files okay...
Running pre-flight check on configuration data...
Checking objects...
   Checked 9 services.
   Checked 2 hosts.
   Checked 3 contacts.
   Checked 3 contact groups.
   Checked 0 host escalations.
   Checked 1 service escalations.
##### snipped #####
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Zero errors confirm that the service, contact groups, time period, and escalation object resolve together.
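When the check runs from a script, the relevant counts can be pulled out of the output text. This is a minimal sketch that assumes the output wording shown in this guide. The function name `preflight_summary` is hypothetical, and the sample is an excerpt rather than complete output.

```python
import re


def preflight_summary(output):
    """Extract warning, error and service-escalation counts from the
    text printed by the configuration check."""
    def count(pattern):
        match = re.search(pattern, output)
        return int(match.group(1)) if match else None

    return {
        "warnings": count(r"Total Warnings:\s*(\d+)"),
        "errors": count(r"Total Errors:\s*(\d+)"),
        "service_escalations": count(r"Checked (\d+) service escalations"),
    }


# Excerpt of the output shown in this guide.
sample = """\
Checked 0 host escalations.
Checked 1 service escalations.
Total Warnings: 0
Total Errors: 0
"""

summary = preflight_summary(sample)
print(summary)
```

A service escalation count of 0 after adding the file would suggest that the new object file was not loaded, even if the error count is also 0.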
Related: How to validate the Nagios Core configuration
$ sudo systemctl reload nagios4
Ubuntu and Debian package installs use the nagios4 unit. Use the local service name or source-install control script when Nagios Core was installed outside the package layout.
Related: How to manage the Nagios Core system service
$ systemctl is-active nagios4
active
Check /var/log/nagios4/nagios.log or the journal if the service is not active after the reload.
Related: How to check Nagios Core logs
Use a non-production service or a planned notification test window. Acknowledging the problem, disabling notifications, scheduling downtime, or recovering before the third notification prevents the escalation from firing.
$ sudo grep 'web01.example.net;HTTP' /var/log/nagios4/nagios.log
[1782022036] SERVICE NOTIFICATION: ops-primary;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
[1782023836] SERVICE NOTIFICATION: ops-primary;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
[1782023836] SERVICE NOTIFICATION: ops-escalation;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
The later notification includes both the first-response contact and the escalation contact because the escalation definition lists both contact groups.
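Grouping the log entries by timestamp makes it easier to see which contacts were notified together. This is a minimal sketch that assumes the SERVICE NOTIFICATION line layout shown in the log excerpt. The function name `contacts_by_time` is hypothetical.

```python
import re
from collections import defaultdict

# [timestamp] SERVICE NOTIFICATION: contact;host;service;state;command;output
LINE = re.compile(
    r"^\[(\d+)\] SERVICE NOTIFICATION: "
    r"([^;]*);([^;]*);([^;]*);([^;]*);([^;]*);(.*)$"
)


def contacts_by_time(log_text, host, service):
    """Map each notification timestamp to the contacts notified then."""
    grouped = defaultdict(list)
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if not match:
            continue
        ts, contact, line_host, line_service = match.group(1, 2, 3, 4)
        if line_host == host and line_service == service:
            grouped[int(ts)].append(contact)
    return dict(grouped)


# Log lines from the excerpt in this guide.
sample = """\
[1782022036] SERVICE NOTIFICATION: ops-primary;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
[1782023836] SERVICE NOTIFICATION: ops-primary;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
[1782023836] SERVICE NOTIFICATION: ops-escalation;web01.example.net;HTTP;CRITICAL;notify-service-by-email;HTTP CRITICAL - escalation test
"""

result = contacts_by_time(sample, "web01.example.net", "HTTP")
for ts in sorted(result):
    print(ts, result[ts])
```

A timestamp that lists contacts from both groups marks a notification that the escalation covered.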