How to configure Nagios Core flap detection

Rapid state changes can turn one unstable Nagios Core service into repeated problem and recovery alerts. Flap detection watches recent check results, marks the object as flapping when the state-change percentage crosses the high threshold, and suppresses normal notifications until the percentage falls below the low threshold.
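The calculation described above can be sketched in shell. This is a simplified illustration, not Nagios Core's implementation: it counts every state change equally, whereas Nagios Core weights recent changes more heavily than older ones, so real percentages will differ. The function names percent_state_change and next_flap_state are hypothetical.

```shell
# Unweighted sketch: share of consecutive check results that changed state.
percent_state_change() {
    # $@ = state codes, oldest first (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN)
    printf '%s\n' "$@" | awk '
        NR > 1 && $1 != prev { changes++ }
        { prev = $1 }
        END {
            if (NR < 2) { print "0.00"; exit }
            printf "%.2f\n", changes * 100 / (NR - 1)
        }'
}

# Hysteresis: start flapping at or above the high threshold,
# stop only once the percentage drops below the low threshold.
next_flap_state() {
    # $1 = current is_flapping (0 or 1), $2 = percent, $3 = low, $4 = high
    awk -v f="$1" -v p="$2" -v lo="$3" -v hi="$4" 'BEGIN {
        if (f == 0 && p >= hi) f = 1
        else if (f == 1 && p < lo) f = 0
        print f
    }'
}
```

With the thresholds used later in this guide, next_flap_state 0 18.4 5.0 15.0 prints 1, and the service keeps that state until the percentage drops below 5.0. The gap between the two thresholds is what stops the flapping flag itself from toggling on every check.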

On Debian and Ubuntu package installs, the main configuration file /etc/nagios4/nagios.cfg controls whether flap detection runs at all. Host and service objects can then enable flap detection individually, override the global thresholds, and choose which states count toward the calculation.

Use object-specific thresholds when one noisy service needs different flap behavior from the rest of the monitoring server. Keep the high threshold greater than the low threshold, include the f notification option when contacts should receive flapping start and stop notifications, and avoid thresholds that hide a real fault behind long notification suppression.

Steps to configure Nagios Core flap detection:

Confirm that global flap detection is enabled in the active Nagios Core configuration.
```
$ sudo grep '^enable_flap_detection=' /etc/nagios4/nagios.cfg
enable_flap_detection=1
```
Set enable_flap_detection=1 before relying on object-level flap settings. Source installs commonly use /usr/local/nagios/etc/nagios.cfg instead.
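Because object-level thresholds override program-wide ones, it can help to list every global flap directive in one pass. The following is a minimal sketch. The function name flap_settings is hypothetical, and the default path assumes a Debian or Ubuntu package install.

```shell
# Print the program-wide flap detection directives from a main config file.
flap_settings() {
    # $1 = path to nagios.cfg (defaults to the Debian/Ubuntu package path)
    grep -E '^(enable_flap_detection|(low|high)_(service|host)_flap_threshold)=' \
        "${1:-/etc/nagios4/nagios.cfg}"
}
```

Commented-out directives are skipped because the pattern is anchored to the start of the line. Pass /usr/local/nagios/etc/nagios.cfg as the argument on a source install.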
Open the object file that defines the noisy service.
```
$ sudoedit /etc/nagios4/conf.d/web01-flap.cfg
```
Edit the existing service definition when one already exists. Creating a second service with the same host_name and service_description causes a duplicate-object error during validation.

Add or adjust the flap detection directives on the service object.

/etc/nagios4/conf.d/web01-flap.cfg

```
define service {
    use                     generic-service
    host_name               web01.example.net
    service_description     HTTP
    check_command           check_http
    max_check_attempts      4
    check_interval          5
    retry_interval          1
    flap_detection_enabled  1
    low_flap_threshold      5.0
    high_flap_threshold     15.0
    flap_detection_options  o,w,c,u
    notification_options    w,c,r,f
    contact_groups          admins
}
```

low_flap_threshold and high_flap_threshold override the program-wide service thresholds for this service. flap_detection_options controls which service states count toward flapping, where o is OK, w is WARNING, c is CRITICAL, and u is UNKNOWN.
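Since the high threshold must stay above the low threshold, a quick scan of the object file can catch an inverted pair before validation. This is a rough sketch under a few assumptions: check_flap_thresholds is a hypothetical helper, each directive sits on its own line, and a value of 0 is skipped because the Nagios Core object documentation describes 0 as falling back to the program-wide threshold.

```shell
# Report object blocks whose low flap threshold is not below the high one.
check_flap_thresholds() {
    # $1 = object config file; prints one line per suspicious block
    awk '
        $1 == "define" { low = ""; high = ""; desc = "" }
        $1 == "service_description" { $1 = ""; sub(/^ +/, ""); desc = $0; next }
        $1 == "low_flap_threshold"  { low = $2 }
        $1 == "high_flap_threshold" { high = $2 }
        $1 == "}" {
            # Unset or 0 means the program-wide value applies, so skip those.
            if (low + 0 > 0 && high + 0 > 0 && low + 0 >= high + 0)
                printf "%s: low %s >= high %s\n", desc, low, high
        }
    ' "$1"
}
```

Running check_flap_thresholds /etc/nagios4/conf.d/web01-flap.cfg against the definition above prints nothing, because 5.0 is below 15.0.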

Validate the Nagios Core configuration before applying the change.

```
$ sudo nagios4 -v /etc/nagios4/nagios.cfg
Nagios Core 4.4.6
##### snipped #####
Reading configuration data...
   Read main config file okay...
   Read object config files okay...

Running pre-flight check on configuration data...
##### snipped #####
Total Warnings: 0
Total Errors:   0

Things look okay - No serious problems were detected during the pre-flight check
```

Do not reload Nagios Core while Total Errors is greater than 0. Fix the first reported object, command, or path error before applying the change.
Related: How to validate the Nagios Core configuration
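The rule above can be enforced in a script by reading the Total Errors line from the pre-flight output. This is a minimal sketch that relies only on the output format shown above. The function name preflight_ok is hypothetical.

```shell
# Succeed only when the pre-flight output on stdin reports zero errors.
preflight_ok() {
    awk -F: '
        /^Total Errors/ { found = 1; errors = $2 + 0 }
        END { exit !(found && errors == 0) }'
}
```

One way to use it is sudo nagios4 -v /etc/nagios4/nagios.cfg | preflight_ok && sudo systemctl reload nagios4, which skips the reload when errors are reported or when the Total Errors line is missing altogether.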

Reload Nagios Core after the pre-flight check reports zero errors.
```
$ sudo systemctl reload nagios4
```
Ubuntu and Debian package installs use the nagios4 service name. Use the local service name, init script, or direct SIGHUP method when Nagios Core was installed from source or runs without systemd.
Related: How to manage the Nagios Core system service

Confirm that Nagios Core loaded the service-specific flap settings.

```
$ sudo cat /var/lib/nagios4/objects.cache
##### snipped #####
define service {
	host_name	web01.example.net
	service_description	HTTP
	check_command	check_http
	check_interval	5.000000
	retry_interval	1.000000
	low_flap_threshold	5.000000
	high_flap_threshold	15.000000
	flap_detection_enabled	1
	flap_detection_options	o,w,u,c
	notification_options	r,w,c,f
}
##### snipped #####
```

The object cache path comes from object_cache_file in /etc/nagios4/nagios.cfg.
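The object cache holds every loaded object, so printing only the matching block is easier than paging through the whole file. The sketch below assumes the cache layout shown above, with one define service block per service and one directive per line. The function name show_service_block is hypothetical.

```shell
# Print the cached definition of one service.
show_service_block() {
    # $1 = host_name, $2 = service_description, $3 = cache file (optional)
    awk -v host="$1" -v svc="$2" '
        $1 == "define" && $2 == "service" { block = ""; inblock = 1; h = 0; s = 0 }
        inblock { block = block $0 "\n" }
        inblock && $1 == "host_name" && $2 == host { h = 1 }
        inblock && $1 == "service_description" {
            d = $0
            sub(/^[ \t]*service_description[ \t]+/, "", d)
            if (d == svc) s = 1
        }
        $1 == "}" {
            if (inblock && h && s) printf "%s", block
            inblock = 0
        }
    ' "${3:-/var/lib/nagios4/objects.cache}"
}
```

For the service in this guide, the call would be sudo sh -c with the function defined, or show_service_block web01.example.net HTTP run as a user that can read the cache file.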

Check the runtime service status after the service changes states enough for flap detection to evaluate it.

```
$ sudo cat /var/lib/nagios4/status.dat
##### snipped #####
servicestatus {
	host_name=web01.example.net
	service_description=HTTP
	current_state=2
	flap_detection_enabled=1
	is_flapping=1
	percent_state_change=100.00
}

servicecomment {
	host_name=web01.example.net
	service_description=HTTP
	author=(Nagios Process)
	comment_data=Notifications for this service are being suppressed because it was detected as having been flapping between different states (18.4% change >= 15.0% threshold).  When the service state stabilizes and the flapping stops, notifications will be re-enabled.
}
##### snipped #####
```

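A service that has not crossed the high threshold shows is_flapping=0. The Nagios Process comment appears only after enough included state changes trigger notification suppression. The percentage in the comment is the value recorded when flapping was detected, so it can differ from the current percent_state_change value.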
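To watch a single service without reading the whole status file, the two relevant fields can be pulled out of its servicestatus block. This is a sketch that assumes the key=value layout shown above. The function name flap_status is hypothetical.

```shell
# Print is_flapping and percent_state_change for one service.
flap_status() {
    # $1 = host_name, $2 = service_description, $3 = status file (optional)
    awk -F= -v host="$1" -v svc="$2" '
        { key = $1; sub(/^[ \t]+/, "", key) }
        key ~ /^servicestatus/ { inblock = 1; h = 0; s = 0; flap = ""; pct = ""; next }
        inblock && key == "host_name" && $2 == host { h = 1 }
        inblock && key == "service_description" && $2 == svc { s = 1 }
        inblock && key == "is_flapping" { flap = $2 }
        inblock && key == "percent_state_change" { pct = $2 }
        inblock && key == "}" {
            if (h && s) print "is_flapping=" flap, "percent_state_change=" pct
            inblock = 0
        }
    ' "${3:-/var/lib/nagios4/status.dat}"
}
```

Calling flap_status web01.example.net HTTP prints one line for the matching service and nothing when no servicestatus block matches.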