Checkmk marks a service as flapping when its state changes back and forth quickly enough to create notification noise. Troubleshooting a flapping service means confirming the state-change pattern, deciding whether the monitored system or the check definition is unstable, and retesting the same service after one focused correction.
Flapping is a monitoring state, not the root cause. The current service output shows the latest check result, while Events of host & services, notification history, metrics, and service rules show whether the object is really failing or only crossing a narrow threshold.
Keep the first pass scoped to the affected host and service. Avoid disabling flap detection globally before the evidence is clear, because that can hide real intermittent outages across unrelated objects.
Checkmk suppresses successive state-change notifications while an object is flapping, but it still records when the object enters or leaves the flapping state.
If the service has one long problem period instead of repeated state changes, handle it as a normal service problem rather than flap noise.
OMD[mysite]:~$ lq
GET statehist
Columns: host_name service_description state duration
Filter: host_name = web01
Filter: service_description = HTTP
Filter: time >= 1781942400
Limit: 6

web01;HTTP;0;65
web01;HTTP;2;58
web01;HTTP;0;71
web01;HTTP;2;43
web01;HTTP;0;88
web01;HTTP;2;49
Replace web01, HTTP, and the Unix timestamp with the affected object and incident start time. For service states, 0 means OK, 1 means WARN, 2 means CRIT, and 3 means UNKNOWN.
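To read such output at a glance, the rows can be summarized with a few lines of Python. This is a minimal sketch, not part of Checkmk: the names `parse_rows` and `summarize_states` are hypothetical, and it assumes the rows were pasted into a string in the `host;service;state;duration` layout shown above.

```python
from typing import NamedTuple

STATE_NAMES = {0: "OK", 1: "WARN", 2: "CRIT", 3: "UNKNOWN"}


class Period(NamedTuple):
    host: str
    service: str
    state: int
    duration: int  # seconds spent in this state


def parse_rows(text: str) -> list[Period]:
    """Parse 'host;service;state;duration' rows (hypothetical helper)."""
    periods = []
    for line in text.strip().splitlines():
        host, service, state, duration = line.strip().split(";")
        periods.append(Period(host, service, int(state), int(float(duration))))
    return periods


def summarize_states(periods: list[Period]) -> dict:
    """Count state changes and total the seconds spent in each state."""
    changes = sum(1 for a, b in zip(periods, periods[1:]) if a.state != b.state)
    seconds: dict[str, int] = {}
    for p in periods:
        name = STATE_NAMES.get(p.state, str(p.state))
        seconds[name] = seconds.get(name, 0) + p.duration
    return {"changes": changes, "seconds_per_state": seconds}


# Sample rows copied from the query output in this article.
SAMPLE = """\
web01;HTTP;0;65
web01;HTTP;2;58
web01;HTTP;0;71
web01;HTTP;2;43
web01;HTTP;0;88
web01;HTTP;2;49
"""

summary = summarize_states(parse_rows(SAMPLE))
print(summary)
```

For the sample rows this reports five state changes across six short periods, which fits a flapping pattern. A single long non-OK period would instead point to an ordinary service problem.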
A service can notify when it enters or leaves flapping even though additional state changes are suppressed while flapping remains active.
Small metric movements around a warning or critical boundary usually point to threshold tuning; matching application errors, packet loss, or agent failures point to a real intermittent problem.
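The distinction above can be sketched as a simple heuristic over exported metric samples. The function name `classify_metric`, the `band` tolerance, and the sample values are all assumptions made for illustration; they are not Checkmk settings or real measurements.

```python
def classify_metric(values: list[float], threshold: float, band: float) -> str:
    """Rough heuristic: do the samples hover around the threshold or swing widely?"""
    # A crossing happens when consecutive samples sit on opposite sides
    # of the threshold.
    crossings = sum(
        1
        for a, b in zip(values, values[1:])
        if (a >= threshold) != (b >= threshold)
    )
    # "Close" means every sample stays within +/- band of the threshold.
    all_close = all(abs(v - threshold) <= band for v in values)

    if crossings >= 2 and all_close:
        return "threshold noise"
    if crossings >= 2:
        return "large swings"
    return "stable"


# Illustrative values only: usage percentages against a hypothetical 80% level.
hovering = classify_metric([79.5, 80.2, 79.8, 80.4, 79.9], threshold=80.0, band=1.0)
swinging = classify_metric([12.0, 96.0, 10.0, 95.0], threshold=80.0, band=1.0)
print(hovering, "|", swinging)
```

A "threshold noise" result suggests reviewing the levels in the service rule, while "large swings" is a hint to look for a real intermittent fault before touching any threshold.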
Fix the application, network, agent, or data source when the check output shows real failures. Use a narrow service monitoring rule when the service is healthy but the threshold, discovery rule, or check parameter is too sensitive.
To absorb brief noise, raise the value in the rule set Maximum number of check attempts for service. More attempts delay the hard state and therefore the notification, so use this for short blips rather than sustained outages.
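The effect of the attempt count can be estimated with simple arithmetic. The sketch below assumes a simplified model in which the service is rechecked at a fixed retry interval after the first non-OK result and becomes hard once the configured number of consecutive non-OK results is reached. The function names and the 60-second retry interval are assumptions, and real checks sample at discrete times, so treat the result as an estimate only.

```python
def seconds_until_hard(max_attempts: int, retry_interval_s: int) -> int:
    """Estimated delay between the first non-OK result and the hard state."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # The first non-OK result is attempt 1; each further attempt waits one retry.
    return (max_attempts - 1) * retry_interval_s


def absorbed(problem_durations_s: list[int], max_attempts: int, retry_interval_s: int) -> list[bool]:
    """True for each problem period that ends before the estimated hard state."""
    delay = seconds_until_hard(max_attempts, retry_interval_s)
    return [duration < delay for duration in problem_durations_s]


# CRIT durations taken from the sample query output in this article.
crit_periods = [58, 43, 49]

delay_three = seconds_until_hard(3, 60)
with_three_attempts = absorbed(crit_periods, max_attempts=3, retry_interval_s=60)
with_one_attempt = absorbed(crit_periods, max_attempts=1, retry_interval_s=60)
print(delay_three, with_three_attempts, with_one_attempt)
```

Under these assumptions, three attempts with a 60-second retry would ride out all three sample CRIT periods, while a single attempt would turn each of them into a hard state immediately.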
Disabling flap detection globally affects unrelated hosts and services. If it must be turned off at all, use the rule set Enable/disable flapping detection for services with conditions that match only the affected host and service.
After the correction, the same service should stop alternating rapidly, and the flapping icon should disappear once the service has held one state long enough for Checkmk to clear the flapping state.
Related: How to acknowledge a problem in Checkmk
Related: How to schedule Checkmk downtime