Rapid state changes can turn one unstable Nagios Core service into repeated problem and recovery alerts. Flap detection watches recent check results, marks the object as flapping when the state-change percentage crosses the high threshold, and suppresses normal notifications until the percentage falls below the low threshold.
On Debian and Ubuntu package installs, the main /etc/nagios4/nagios.cfg file controls whether flap detection runs at all. Host and service objects can then enable flap detection, override the global thresholds, and choose which states count toward the calculation.
Use object-specific thresholds when one noisy service needs different flap behavior from the rest of the monitoring server. Keep the high threshold greater than the low threshold, include the f notification option when contacts should receive flapping start and stop notifications, and avoid thresholds that hide a real fault behind long notification suppression.
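The state-change percentage can be approximated by hand to see how a threshold would behave. The sketch below is not part of Nagios Core; flap_percent is a hypothetical helper. It assumes the weighting described in the Nagios Core flap detection documentation: the last 21 check results are kept, which gives 20 possible state changes, and a change is weighted from about 0.75 in the oldest slot to about 1.25 in the newest. It does not filter states by flap_detection_options, so treat the result as an estimate.

```shell
# flap_percent: hypothetical helper, not part of Nagios Core.
# Prints the weighted state-change percentage for a list of check
# states given oldest first, for example: flap_percent 0 0 2 0
flap_percent() {
  printf '%s\n' "$*" | awk '{
    n = NF
    if (n < 2) { print "0.00"; exit }
    total = 0
    for (i = 2; i <= n; i++) {
      if ($i != $(i - 1)) {
        # Weight grows linearly from 0.75 (oldest slot) to 1.25 (newest slot).
        weight = (n > 2) ? 0.75 + (i - 2) * 0.5 / (n - 2) : 1
        total += weight
      }
    }
    printf "%.2f\n", total * 100 / (n - 1)
  }'
}
```

Passing 21 states that alternate on every check yields 100.00, while a single change in the newest slot of an otherwise stable history yields 6.25, which stays below a high threshold of 15.0.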
$ sudo grep '^enable_flap_detection=' /etc/nagios4/nagios.cfg
enable_flap_detection=1
Set enable_flap_detection=1 before relying on object-level flap settings. Source installs commonly use /usr/local/nagios/etc/nagios.cfg instead.
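The global switch and the program-wide thresholds can be listed together in one pass. The sketch below wraps grep in a hypothetical helper named show_flap_settings; the directive names low_service_flap_threshold, high_service_flap_threshold, low_host_flap_threshold, and high_host_flap_threshold are the standard main-config threshold settings, and the file path is passed as an argument so the same helper works for source installs.

```shell
# show_flap_settings: hypothetical helper that prints the global flap
# detection switch and any program-wide thresholds set in a main config file.
# Commented-out lines are skipped because the pattern is anchored at ^.
show_flap_settings() {
  grep -E '^(enable_flap_detection|(low|high)_(service|host)_flap_threshold)=' "$1"
}
```

On a package install this would be called as: sudo sh -c with the function defined, or simply by running the same grep -E pattern against /etc/nagios4/nagios.cfg.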
$ sudoedit /etc/nagios4/conf.d/web01-flap.cfg
Edit the existing service definition when one already exists. Creating a second service with the same host_name and service_description causes a duplicate-object error during validation.
define service {
    use                     generic-service
    host_name               web01.example.net
    service_description     HTTP
    check_command           check_http
    max_check_attempts      4
    check_interval          5
    retry_interval          1
    flap_detection_enabled  1
    low_flap_threshold      5.0
    high_flap_threshold     15.0
    flap_detection_options  o,w,c,u
    notification_options    w,c,r,f
    contact_groups          admins
}
low_flap_threshold and high_flap_threshold override the program-wide service thresholds for this service. flap_detection_options controls which service states count toward flapping, where o is OK, w is WARNING, c is CRITICAL, and u is UNKNOWN.
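A quick sanity check can catch a service whose low threshold is not below its high threshold before validation. The sketch below is a hypothetical helper, check_flap_thresholds, that scans one object file with awk. It assumes each directive sits on its own line, that both thresholds are set explicitly in the same definition, and that templates are not resolved, so inherited values are not seen.

```shell
# check_flap_thresholds: hypothetical helper, not a Nagios tool.
# Reports service definitions in the given file where low_flap_threshold
# is greater than or equal to high_flap_threshold. Exits 1 if any are found.
check_flap_thresholds() {
  awk '
    /^[[:space:]]*define[[:space:]]+service[[:space:]]*[{]/ {
      in_svc = 1; low = ""; high = ""; desc = ""; next
    }
    in_svc && $1 == "service_description" {
      desc = $0
      sub(/^[[:space:]]*service_description[[:space:]]+/, "", desc)
      next
    }
    in_svc && $1 == "low_flap_threshold"  { low = $2; next }
    in_svc && $1 == "high_flap_threshold" { high = $2; next }
    in_svc && /^[[:space:]]*}/ {
      # Compare numerically only when both values appear in this block.
      if (low != "" && high != "" && low + 0 >= high + 0) {
        printf "%s: low %s >= high %s\n", desc, low, high
        bad = 1
      }
      in_svc = 0
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Run against /etc/nagios4/conf.d/web01-flap.cfg with the definition above, the helper prints nothing and exits 0 because 5.0 is below 15.0.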
$ sudo nagios4 -v /etc/nagios4/nagios.cfg
Nagios Core 4.4.6
##### snipped #####
Reading configuration data...
Read main config file okay...
Read object config files okay...
Running pre-flight check on configuration data...
##### snipped #####
Total Warnings: 0
Total Errors: 0
Things look okay - No serious problems were detected during the pre-flight check
Do not reload Nagios Core while Total Errors is greater than 0. Fix the first reported object, command, or path error before applying the change.
Related: How to validate the Nagios Core configuration
$ sudo systemctl reload nagios4
Ubuntu and Debian package installs use the nagios4 service name. Use the local service name, init script, or direct SIGHUP method when Nagios Core was installed from source or runs without systemd.
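The validate-then-reload sequence can be gated so a reload never runs on a failed pre-flight check. The sketch below is one possible approach for package installs: total_errors and safe_reload are hypothetical helper names, and the parsing assumes the pre-flight output contains a line that starts with "Total Errors:" as in the sample output above.

```shell
# total_errors: hypothetical helper that reads pre-flight output on stdin
# and prints the number after "Total Errors:". Fails if the line is missing.
total_errors() {
  awk -F':[[:space:]]*' '
    /^Total Errors:/ { print $2; found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# safe_reload: hypothetical wrapper that reloads only when the count is 0.
# It reuses the validation and reload commands from the steps above.
safe_reload() {
  errors=$(sudo nagios4 -v /etc/nagios4/nagios.cfg | total_errors) || return 1
  [ "$errors" = "0" ] && sudo systemctl reload nagios4
}
```

Calling safe_reload returns a non-zero status without touching the running service when the error count is missing or greater than 0.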
Related: How to manage the Nagios Core system service
$ sudo cat /var/lib/nagios4/objects.cache
##### snipped #####
define service {
host_name web01.example.net
service_description HTTP
check_command check_http
check_interval 5.000000
retry_interval 1.000000
low_flap_threshold 5.000000
high_flap_threshold 15.000000
flap_detection_enabled 1
flap_detection_options o,w,u,c
notification_options r,w,c,f
}
##### snipped #####
The object cache path comes from object_cache_file in /etc/nagios4/nagios.cfg.
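Printing the whole object cache is noisy on a server with many objects. The sketch below is a hypothetical helper, cached_service, that prints only the cached definition for one host and service pair. It assumes the cache writes one directive per line inside "define service {" blocks that end with a closing brace on its own line, as in the excerpt above.

```shell
# cached_service: hypothetical helper, not a Nagios tool.
# Usage: cached_service <objects.cache path> <host_name> <service_description>
cached_service() {
  awk -v host="$2" -v desc="$3" '
    /^[[:space:]]*define[[:space:]]+service[[:space:]]*[{]/ {
      block = $0; in_svc = 1; h = ""; d = ""; next
    }
    in_svc {
      block = block "\n" $0
      if ($1 == "host_name") h = $2
      if ($1 == "service_description") {
        # Keep the full description, which may contain spaces.
        d = $0
        sub(/^[[:space:]]*service_description[[:space:]]+/, "", d)
        sub(/[[:space:]]+$/, "", d)
      }
      if ($1 == "}") {
        if (h == host && d == desc) print block
        in_svc = 0
      }
    }
  ' "$1"
}
```

For the example service, the call would take /var/lib/nagios4/objects.cache, web01.example.net, and HTTP as its three arguments, run with enough privileges to read the cache file.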
$ sudo cat /var/lib/nagios4/status.dat
##### snipped #####
servicestatus {
host_name=web01.example.net
service_description=HTTP
current_state=2
flap_detection_enabled=1
is_flapping=1
percent_state_change=100.00
}
servicecomment {
host_name=web01.example.net
service_description=HTTP
author=(Nagios Process)
comment_data=Notifications for this service are being suppressed because it was detected as having been flapping between different states (18.4% change >= 15.0% threshold). When the service state stabilizes and the flapping stops, notifications will be re-enabled.
}
##### snipped #####
A service that has not crossed the high threshold shows is_flapping=0. The Nagios Process comment appears only after enough included state changes trigger notification suppression.
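The flapping fields for a single service can be pulled out of the status file instead of reading the whole dump. The sketch below is a hypothetical helper, flap_status, that assumes the key=value layout inside servicestatus blocks shown above. On package installs the file location is normally set by the status_file directive in the main configuration file.

```shell
# flap_status: hypothetical helper, not a Nagios tool.
# Usage: flap_status <status.dat path> <host_name> <service_description>
# Prints is_flapping and percent_state_change for the matching service.
flap_status() {
  awk -v host="$2" -v desc="$3" '
    /^[[:space:]]*servicestatus[[:space:]]*[{]/ {
      in_blk = 1
      split("", kv)          # portable way to clear the array
      next
    }
    in_blk && /^[[:space:]]*}/ {
      if (kv["host_name"] == host && kv["service_description"] == desc)
        printf "is_flapping=%s percent_state_change=%s\n",
               kv["is_flapping"], kv["percent_state_change"]
      in_blk = 0
      next
    }
    in_blk {
      # Split each line at the first "=" so values may contain "=".
      line = $0
      sub(/^[[:space:]]+/, "", line)
      eq = index(line, "=")
      if (eq > 0) kv[substr(line, 1, eq - 1)] = substr(line, eq + 1)
    }
  ' "$1"
}
```

For the sample status block above, the helper would print is_flapping=1 percent_state_change=100.00 for web01.example.net and HTTP.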