Monitoring Elasticsearch cluster health exposes allocation failures, missing nodes, and cluster-state pressure before search latency climbs or indexing starts failing across dependent services.

The _cluster/health API returns a script-friendly summary of shard availability and pending coordination work, while the _cat/health, _cat/nodes, _cat/allocation, _cat/indices, and _cat/shards endpoints provide compact operator views that highlight where the problem sits. Together they show whether the issue is cluster-wide, limited to a node, tied to disk pressure, or isolated to specific indices and shards.

Secured deployments typically serve these curl requests over an authenticated HTTPS endpoint. Elastic's API documentation describes the CAT endpoints as intended for human consumption at the command line or in the Kibana console, not for applications. Use _cluster/health for monitoring integrations and keep CAT checks for fast triage.
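
The sketch below keeps the connection details for these checks in one place, so the same commands work against a secured cluster. It is written for a POSIX shell. It assumes API key authentication, which Elasticsearch accepts as an "Authorization: ApiKey <encoded key>" header, plus an optional private CA file for curl's --cacert option. The function name es_get and the ES_URL, ES_API_KEY, and ES_CACERT variables are hypothetical, not Elastic conventions.

```shell
# Hypothetical helper: GET an Elasticsearch path with optional API key and CA settings.
# ES_URL, ES_API_KEY and ES_CACERT are illustrative variable names, not Elastic defaults.
es_get() {
  local url
  url="${ES_URL:-http://localhost:9200}$1"

  # Build curl options in the function's own positional parameters.
  set -- -sS --fail

  # API key auth: Elasticsearch expects "Authorization: ApiKey <encoded id:key>".
  if [ -n "${ES_API_KEY:-}" ]; then
    set -- "$@" -H "Authorization: ApiKey ${ES_API_KEY}"
  fi

  # Trust a private CA, such as the one a secured cluster generates for HTTPS.
  if [ -n "${ES_CACERT:-}" ]; then
    set -- "$@" --cacert "$ES_CACERT"
  fi

  curl "$@" "$url"
}
```

With ES_URL and ES_API_KEY exported, a check reads as shown below. Clusters that use basic authentication would pass curl's -u option instead of the header.
    $ es_get "/_cluster/health?filter_path=status&pretty"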

Steps to monitor Elasticsearch cluster health:

  1. Request a concise cluster-wide summary from the cluster health API.
    $ curl -sS --fail "http://localhost:9200/_cluster/health?filter_path=cluster_name,status,number_of_nodes,number_of_data_nodes,active_primary_shards,active_shards,relocating_shards,initializing_shards,unassigned_shards,number_of_pending_tasks,active_shards_percent_as_number&pretty"
    {
      "cluster_name" : "docker-cluster",
      "status" : "yellow",
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 2,
      "active_shards" : 2,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 1,
      "number_of_pending_tasks" : 0,
      "active_shards_percent_as_number" : 66.66666666666666
    }

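    The cluster health API reports green when all primary and replica shards are assigned, yellow when all primary shards are assigned but one or more replicas are not, and red when one or more primary shards are unassigned. The sample cluster is yellow because one replica is unassigned. That is also why active_shards_percent_as_number is about 66.7: 2 of 3 shards are active.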

    For secured clusters, replace http://localhost:9200 with the cluster's HTTPS endpoint and add the credentials operators already use for that cluster.
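
Because the status field is the summary most monitoring tools alert on, a check script can map it to an exit code. This POSIX shell sketch assumes the Nagios-style plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The function name cluster_health_exit_code is hypothetical. The sed expression is a lightweight stand-in for a real JSON parser and relies on "status" appearing only once in the response.

```shell
# Hypothetical check: read a _cluster/health JSON body on stdin and print a
# Nagios-style exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).
cluster_health_exit_code() {
  local health
  # Extract the value of "status"; handles pretty and compact JSON,
  # assuming the key appears only once in the response.
  health=$(sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1)
  case "$health" in
    green)  echo 0 ;;  # all primary and replica shards assigned
    yellow) echo 1 ;;  # all primaries assigned, some replicas unassigned
    red)    echo 2 ;;  # at least one primary shard unassigned
    *)      echo 3 ;;  # empty input or unexpected value
  esac
}
```

If curl cannot reach the cluster, sed sees no input and the check reports UNKNOWN rather than OK:
    $ exit "$(curl -sS --fail "http://localhost:9200/_cluster/health?filter_path=status" | cluster_health_exit_code)"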

  2. Use the CAT health view for a one-line operator snapshot of shard and task counts.
    $ curl -sS --fail "http://localhost:9200/_cat/health?v=true&h=cluster,status,node.total,node.data,shards,pri,relo,init,unassign,pending_tasks,active_shards_percent"
    cluster        status node.total node.data shards pri relo init unassign pending_tasks active_shards_percent
    docker-cluster yellow          1         1      2   2    0    0        1             0                 66.7%

    The h= parameter selects which columns to return. Leaving out the default epoch and timestamp columns keeps repeated checks focused on the health signals that change during recovery.

    CAT endpoints are best used for human triage rather than application polling.
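
When the same CAT view is polled by hand during a recovery, pulling out one column by name avoids counting positions. This awk sketch assumes the output was requested with v=true, so the first line holds the column names. It also assumes every cell is populated. That holds for the _cat/health columns above but not for views such as _cat/shards, where some cells are blank. The function name cat_column is hypothetical.

```shell
# Hypothetical helper: print one column, selected by header name, from CAT
# output requested with v=true. Whitespace splitting assumes no blank cells.
cat_column() {
  awk -v col="$1" '
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) idx = i; next }  # locate the column in the header
    idx     { print $idx }                                              # print it from each data row
  '
}
```

For the yellow sample cluster, this would print 1:
    $ curl -sS --fail "http://localhost:9200/_cat/health?v=true&h=status,unassign,pending_tasks" | cat_column unassign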

  3. List nodes with resource indicators and role flags.
    $ curl -sS --fail "http://localhost:9200/_cat/nodes?v=true&h=ip,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master,name"
    ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
    172.17.0.3           21          99  38    5.85    4.22     2.39 cdfhilmrstw *      b64dd82513cb

    The node.role column abbreviates each role to a letter, such as m for master-eligible, d for data, i for ingest, and - for a coordinating-only node. A * in the master column marks the elected master node.

    Add format=json for ad hoc parsing. Monitoring systems that need stable, machine-readable fields should use the nodes info (_nodes) and nodes stats (_nodes/stats) APIs instead.
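
To turn the heap.percent column into a quick list of nodes that need attention, request only the fields you need and filter them with awk. This sketch assumes output from _cat/nodes?h=heap.percent,name without a header, and node names that contain no spaces. The function name nodes_over_heap is hypothetical, and the 75 percent threshold in the usage line is an illustrative value, not an Elastic recommendation.

```shell
# Hypothetical helper: print nodes whose heap usage is at or above a threshold.
# Input: lines of "<heap.percent> <name>" from _cat/nodes?h=heap.percent,name.
nodes_over_heap() {
  awk -v limit="$1" '($1 + 0) >= (limit + 0) { print $2, $1 "%" }'
}
```

    $ curl -sS --fail "http://localhost:9200/_cat/nodes?h=heap.percent,name" | nodes_over_heap 75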

  4. Review shard allocation and disk pressure across nodes.
    $ curl -sS --fail "http://localhost:9200/_cat/allocation?v=true&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent,host,ip,node"
    shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
         2       11.2kb    37.1gb     21.2gb     58.3gb           63 172.17.0.3 172.17.0.3 b64dd82513cb
         1                                                                                 UNASSIGNED

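    An UNASSIGNED row counts shards that are not placed on any node. Rising disk.percent matters as well. Under the default percentage watermarks, Elasticsearch behaves as follows:
    above 85% disk use (low watermark), it stops allocating shards to the node, although primaries of newly created indices are still allowed;
    above 90% (high), it tries to relocate shards off the node;
    above 95% (flood stage), it makes every index with a shard on that node read-only, apart from deletes.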
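
A quick way to see how close each node is to those limits is to label its disk.percent value. This awk sketch assumes the cluster keeps the default percentage values of cluster.routing.allocation.disk.watermark.low, .high, and .flood_stage (85, 90, 95). Newer releases can also cap these thresholds with absolute headroom values on large disks, which this sketch ignores. It reads _cat/allocation?h=disk.percent,node output without a header. The function name disk_watermark_report is hypothetical.

```shell
# Hypothetical helper: label each node's disk.percent against the default
# percentage watermarks (low 85, high 90, flood_stage 95). Adjust the numbers
# if the cluster overrides cluster.routing.allocation.disk.watermark.*.
# Input: lines from _cat/allocation?h=disk.percent,node (no header).
disk_watermark_report() {
  awk '
    $1 !~ /^[0-9]+$/ { next }   # skip the UNASSIGNED row, which has no disk.percent
    {
      level = "ok"
      if ($1 >= 95)      level = "flood_stage"
      else if ($1 >= 90) level = "high"
      else if ($1 >= 85) level = "low"
      print $2, $1 "%", level
    }
  '
}
```

    $ curl -sS --fail "http://localhost:9200/_cat/allocation?h=disk.percent,node" | disk_watermark_report

Comparing with >= treats a rounded value sitting exactly at a limit as crossed, which errs on the side of alerting early.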

  5. Check index-level health, replica counts, and store growth.
    $ curl -sS --fail "http://localhost:9200/_cat/indices?v=true&s=health,status,index&h=health,status,index,pri,rep,docs.count,store.size"
    health status index           pri rep docs.count store.size
    green  open   metrics-2026.04   1   0          1      5.4kb
    yellow open   logs-2026.04      1   1          1      5.8kb

    A single-node cluster commonly shows yellow here when an index still has one or more replicas configured, because replicas cannot be assigned to the same node as the primary shard.
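
To list only the indices that need attention, together with the replica count that often explains yellow on a single node, filter on the health column. This awk sketch assumes _cat/indices?h=health,index,rep output without a header and a populated health column. The function name non_green_indices is hypothetical. When a server-side filter is enough, _cat/indices also accepts a health query parameter, for example health=yellow.

```shell
# Hypothetical helper: print indices whose health is not green, with their
# configured replica count. Input: lines from _cat/indices?h=health,index,rep.
non_green_indices() {
  awk 'NF && $1 != "green" { print $2, $1, "rep=" $3 }'
}
```

    $ curl -sS --fail "http://localhost:9200/_cat/indices?h=health,index,rep" | non_green_indices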

  6. Inspect shard placement and the reason behind unassigned shards.
    $ curl -sS --fail "http://localhost:9200/_cat/shards?v=true&s=state,index,shard,prirep&h=index,shard,prirep,state,node,unassigned.reason"
    index           shard prirep state      node         unassigned.reason
    logs-2026.04    0     r      UNASSIGNED              INDEX_CREATED
    logs-2026.04    0     p      STARTED    b64dd82513cb
    metrics-2026.04 0     p      STARTED    b64dd82513cb

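    Focus first on rows where prirep is p and state is UNASSIGNED. An unassigned primary turns the cluster red and leaves that shard's data unavailable. In this output the only unassigned shard is a replica, and its reason, INDEX_CREATED, shows it has been unassigned since the index was created.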
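
Because an unassigned primary is the condition that turns the cluster red, it is worth checking for on its own. This awk sketch assumes _cat/shards?h=index,shard,prirep,state,unassigned.reason output without a header. Leaving out the node column keeps every row's field positions stable, since that cell is blank for unassigned shards. The function name unassigned_primaries is hypothetical.

```shell
# Hypothetical helper: print "<index> <shard> <reason>" for every unassigned
# primary shard. Input: lines from
# _cat/shards?h=index,shard,prirep,state,unassigned.reason (no header).
unassigned_primaries() {
  awk '$3 == "p" && $4 == "UNASSIGNED" { print $1, $2, $5 }'
}
```

    $ curl -sS --fail "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | unassigned_primaries

For any shard, primary or replica, the cluster allocation explain API returns a detailed reason why it cannot be assigned:
    $ curl -sS --fail -H "Content-Type: application/json" "http://localhost:9200/_cluster/allocation/explain?pretty" -d '{"index":"logs-2026.04","shard":0,"primary":false}'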

  7. Check pending cluster tasks when health stays degraded or changes lag behind node events.
    $ curl -sS --fail "http://localhost:9200/_cluster/pending_tasks?pretty"
    {
      "tasks" : [ ]
    }

    An empty tasks array means there is no queued cluster-state work at that moment, while non-empty results expose priority, source, and time_in_queue_millis for backlog triage.
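
For alerting on a growing backlog, the two most useful numbers are how many tasks are queued and how long the oldest has waited. This awk sketch pulls every time_in_queue_millis value out of the _cluster/pending_tasks response. It prints the task count followed by the longest wait in milliseconds. It is a regex-based stand-in for a real JSON parser and assumes that field name appears only inside task entries. The function name pending_task_summary is hypothetical.

```shell
# Hypothetical helper: read a _cluster/pending_tasks JSON body on stdin and
# print "<task count> <longest time_in_queue_millis>", or "0 0" when empty.
pending_task_summary() {
  awk '
    {
      line = $0
      # A compact response can hold several tasks on one line, so loop.
      while (match(line, /"time_in_queue_millis"[ \t]*:[ \t]*[0-9]+/)) {
        value = substr(line, RSTART, RLENGTH)
        sub(/.*:[ \t]*/, "", value)      # keep only the number
        count++
        if (value + 0 > max + 0) max = value + 0
        line = substr(line, RSTART + RLENGTH)
      }
    }
    END { print count + 0, max + 0 }
  '
}
```

    $ curl -sS --fail "http://localhost:9200/_cluster/pending_tasks" | pending_task_summary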