How to create a Prometheus alert rule

A Prometheus alert rule defines a PromQL condition that Prometheus evaluates on a schedule; each series that satisfies the condition becomes an alert instance. Create one when a metric condition, such as a scrape target that is down, should appear in Prometheus and be available to Alertmanager for notification routing.

Prometheus reads alerting and recording rules from YAML rule files listed under rule_files in the active configuration file. On many Linux installations that file is /etc/prometheus/prometheus.yml, but a server started with --config.file may use another path.

The example alert, PrometheusTargetMissing, watches the Prometheus self-scrape job (named prometheus in the rule expression) and stays inactive while the target is up. When the steps are complete, the rule file passes promtool validation, the Prometheus configuration references the rule file, and the runtime rules API reports the alert rule with health set to ok after a reload.

Steps to create a Prometheus alert rule:

  1. Create a directory for Prometheus rule files.
    $ sudo install -d -m 0755 /etc/prometheus/rules
  2. Open a new alert rule file.
    $ sudoedit /etc/prometheus/rules/alert-rules.yml
  3. Add a rule group with one alerting rule.
    groups:
      - name: prometheus-self-checks
        interval: 30s
        rules:
          - alert: PrometheusTargetMissing
            expr: up{job="prometheus"} == 0
            for: 1m
            labels:
              severity: warning
            annotations:
              summary: "Prometheus target is down"
              description: "The Prometheus scrape target has been down for more than 1 minute."

    The for value keeps the alert pending until the expression has stayed true for the full duration. Remove or shorten it only for controlled tests, not for normal paging rules.
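
    The pending-to-firing behaviour can be checked without waiting on a live server by running the rule through a promtool unit test. The following is a minimal sketch, not part of the procedure itself: the helper name write_alert_test, the test file name alert-rules.test.yml, and the instance label localhost:9090 are assumptions made for illustration.

```shell
# Write a promtool unit test for PrometheusTargetMissing into the directory "$1".
# write_alert_test and the instance label localhost:9090 are illustrative assumptions.
write_alert_test() {
  cat > "$1/alert-rules.test.yml" <<'EOF'
rule_files:
  - /etc/prometheus/rules/alert-rules.yml

evaluation_interval: 30s

tests:
  - interval: 30s
    input_series:
      # One sample every 30s: up for two samples, then down from t=1m onward.
      - series: 'up{job="prometheus", instance="localhost:9090"}'
        values: '1 1 0 0 0 0 0'
    alert_rule_test:
      # At 1m the expression has only just become true, so the alert is pending, not firing.
      - eval_time: 1m
        alertname: PrometheusTargetMissing
        exp_alerts: []
      # At 3m the expression has been true for longer than "for: 1m", so the alert fires.
      - eval_time: 3m
        alertname: PrometheusTargetMissing
        exp_alerts:
          - exp_labels:
              severity: warning
              job: prometheus
              instance: localhost:9090
            exp_annotations:
              summary: "Prometheus target is down"
              description: "The Prometheus scrape target has been down for more than 1 minute."
EOF
}
```

    With promtool installed and the rule file from step 3 in place, calling write_alert_test /tmp and then running promtool test rules /tmp/alert-rules.test.yml should report success if the rule behaves as described.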

  4. Open the active Prometheus configuration file.
    $ sudoedit /etc/prometheus/prometheus.yml
  5. Add the rule file path under rule_files.
    global:
      evaluation_interval: 30s
    
    rule_files:
      - /etc/prometheus/rules/*.yml

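    Merge these keys into the existing global and rule_files sections instead of adding duplicate sections, and keep the existing scrape_configs, alerting, storage, and remote_write settings in the same file. The rule_files entry can name one file or a glob that matches several rule files.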
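
    A glob only picks up files whose names match the pattern, so a rule file saved with a .yaml extension would not be matched by *.yml. The sketch below previews which files such a pattern would select. It uses shell globbing as an approximation of how Prometheus expands the pattern (hidden files, for example, may be treated differently), and list_rule_files is a hypothetical helper name.

```shell
# Print the files in directory "$1" that a pattern like "$1"/*.yml would match.
# list_rule_files is a hypothetical helper, not a Prometheus or promtool command.
list_rule_files() {
  for f in "$1"/*.yml; do
    # When nothing matches, the shell leaves the pattern unexpanded; skip that case.
    if [ -f "$f" ]; then
      printf '%s\n' "$f"
    fi
  done
}
```

    Running list_rule_files /etc/prometheus/rules should print alert-rules.yml along with any other .yml rule files in that directory; each listed path can then be passed to promtool check rules.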

  6. Check the alert rule syntax.
    $ promtool check rules /etc/prometheus/rules/alert-rules.yml
    Checking /etc/prometheus/rules/alert-rules.yml
      SUCCESS: 1 rules found
  7. Check the full Prometheus configuration before reloading.
    $ promtool check config /etc/prometheus/prometheus.yml
    Checking /etc/prometheus/prometheus.yml
      SUCCESS: 1 rule files found
     SUCCESS: /etc/prometheus/prometheus.yml is valid prometheus config file syntax
    
    Checking /etc/prometheus/rules/alert-rules.yml
      SUCCESS: 1 rules found
  8. Reload Prometheus so it reads the new alert rule file.
    $ curl -sS -X POST http://localhost:9090/-/reload

    The HTTP reload endpoint requires Prometheus to run with --web.enable-lifecycle. If Prometheus was started without that flag, reload it by sending SIGHUP to the Prometheus process or through the service manager used on the host.
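
    Because curl -sS exits with status 0 even when the server answers with an HTTP error, a silent reload command does not by itself show that the reload worked. The sketch below maps the HTTP status code to a short verdict. The helper name reload_verdict and the variable PROM_URL are hypothetical, and the status codes (200 on success, 403 when the lifecycle API is disabled) are assumptions based on commonly observed Prometheus behaviour that should be verified against the running version.

```shell
# Map the HTTP status code of POST /-/reload (passed as "$1") to a short verdict.
# reload_verdict is a hypothetical helper; the code-to-meaning mapping is an assumption.
reload_verdict() {
  case "$1" in
    200) echo "reloaded" ;;
    403) echo "lifecycle API disabled: reload with SIGHUP or the service manager" ;;
    *)   echo "reload failed (HTTP $1): check the Prometheus log" ;;
  esac
}

# Usage against a live server (PROM_URL is an assumed variable):
#   code=$(curl -sS -o /dev/null -w '%{http_code}' -X POST "${PROM_URL:-http://localhost:9090}/-/reload")
#   reload_verdict "$code"
```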

  9. Check that Prometheus loaded the alert rule.
    $ curl -sS 'http://localhost:9090/api/v1/rules?type=alert'
    {"status":"success","data":{"groups":[
      {
        "name":"prometheus-self-checks",
        "file":"/etc/prometheus/rules/alert-rules.yml",
        "rules":[
          {
            "state":"inactive",
            "name":"PrometheusTargetMissing",
            "query":"up{job=\"prometheus\"} == 0",
            "duration":60,
            "health":"ok",
            "type":"alerting"
          }
        ],
        "interval":30
      }
    ]}}

    An inactive state is expected while the condition is false. A health value of ok, together with the expected name, file, and query values, confirms that Prometheus loaded the rule and evaluated it without error.
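
    The same check can be scripted instead of read by eye. The following is a rough sketch that assumes jq is installed; rule_is_healthy is a hypothetical helper name, and it relies only on the name and health fields shown in the response above.

```shell
# Exit 0 if the rules API response on stdin contains alert "$1" with health "ok".
# rule_is_healthy is a hypothetical helper; it assumes jq is installed.
rule_is_healthy() {
  jq -e --arg name "$1" \
    'any(.data.groups[].rules[]; .name == $name and .health == "ok")' >/dev/null
}

# Usage against a live server:
#   curl -sS 'http://localhost:9090/api/v1/rules?type=alert' \
#     | rule_is_healthy PrometheusTargetMissing && echo "rule loaded and healthy"
```

    Matching on the rule name and health together avoids a false positive when a different rule in the same response is healthy but the new one is not.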