How to monitor Elasticsearch cluster health

Monitoring Elasticsearch cluster health surfaces shard allocation issues, node outages, and coordination backlogs before search and indexing become unreliable.

The cluster health API summarizes allocation state using a green/yellow/red status plus shard counters, while the cat APIs provide compact tables for nodes, indices, and shard placement.

Production clusters often require TLS plus authentication, so API requests may need an https:// endpoint along with a username and password or an API key. High-frequency polling of verbose endpoints can add overhead on large clusters, so prefer filtered responses and small column sets for repeated checks.

Steps to monitor Elasticsearch cluster health:

Query cluster health for overall status and allocation counters.

$ curl -sS "http://localhost:9200/_cluster/health?filter_path=cluster_name,status,number_of_nodes,number_of_data_nodes,active_primary_shards,active_shards,relocating_shards,initializing_shards,unassigned_shards,number_of_pending_tasks,active_shards_percent_as_number&pretty"
{
  "cluster_name" : "search-cluster",
  "status" : "green",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 3,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "active_shards_percent_as_number" : 100.0
}

green = all primary and replica shards allocated; yellow = all primaries allocated but one or more replicas unassigned; red = one or more primary shards unassigned. Secured clusters commonly use https://host:9200 plus -u user:pass or -H "Authorization: ApiKey BASE64VALUE".
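For scripted checks, the status value can be turned into an exit code that a scheduler or monitoring agent understands. The following is a minimal sketch, not a definitive implementation: the function names, the ES_URL variable, and the 0/1/2/3 exit-code convention are assumptions made for illustration and are not defined by Elasticsearch.

```shell
# Map a cluster status string to an exit code.
# The 0/1/2/3 convention is an assumption borrowed from common monitoring tools.
status_to_exit() {
  case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    red)    echo 2 ;;
    *)      echo 3 ;;  # unknown status or no response
  esac
}

# Extract the "status" value from cluster health JSON on stdin.
# Plain sed keeps the sketch dependency-free; a real JSON parser is more robust.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}

# Hypothetical wrapper: ES_URL is an assumed variable name, defaulting to the
# unsecured local endpoint used in the examples above.
check_cluster_health() {
  url="${ES_URL:-http://localhost:9200}"
  status=$(curl -sS --max-time 10 "$url/_cluster/health?filter_path=status" | extract_status)
  echo "cluster status: ${status:-unknown}"
  return "$(status_to_exit "$status")"
}
```

Calling check_cluster_health from a cron job or health probe then yields a non-zero exit code whenever the cluster is not green. On a secured cluster, the curl call would also need the credentials or API key header described above.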

List nodes with roles and resource indicators.

$ curl -sS "http://localhost:9200/_cat/nodes?v&h=ip,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master,name"
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
192.0.2.40           25          98   0    0.64    0.91     0.89 cdfhilmrstw *      node-01

master column shows the elected master as *. Add format=json on cat endpoints for scripting.
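A repeated check usually needs only the node count and the elected master, which can be read from a two-column cat response. This is a sketch under stated assumptions: summarize_nodes is a hypothetical helper name, and it assumes node names contain no whitespace.

```shell
# Summarize rows from _cat/nodes?h=master,name read on stdin.
# Omit the v parameter so the response has no header row.
# Prints "nodes=<count> master=<name>"; master is "none" when no row is marked with *.
summarize_nodes() {
  awk '
    { count++ }
    $1 == "*" { master = $2 }
    END { printf "nodes=%d master=%s\n", count, (master == "" ? "none" : master) }
  '
}
```

A possible invocation is curl -sS "http://localhost:9200/_cat/nodes?h=master,name" | summarize_nodes. Comparing the reported count against the expected cluster size gives a quick signal for node outages.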

Review node disk usage from allocation output.

$ curl -sS "http://localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent,ip,node"
shards disk.indices disk.used disk.avail disk.total disk.percent ip         node
     3       10.6kb   115.9gb      1.6tb      1.7tb            6 192.0.2.40 node-01

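Disk watermarks can block shard allocation and leave shards unassigned: by default, a node above the low watermark (85% disk used) no longer receives new shard allocations, apart from primaries of newly created indices.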
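The disk.percent column can be compared against the watermark thresholds to flag nodes before allocation is affected. The sketch below assumes the default low, high, and flood-stage watermarks of 85%, 90%, and 95%; classify_disk is a hypothetical helper, and because the cat API reports rounded percentages the result is approximate.

```shell
# Classify nodes by disk usage against watermark thresholds.
# Expects "disk.percent node" rows on stdin, from _cat/allocation?h=disk.percent,node.
# Arguments (optional): low, high, flood thresholds in percent.
# The 85/90/95 defaults are assumed to match the cluster's watermark settings.
classify_disk() {
  awk -v low="${1:-85}" -v high="${2:-90}" -v flood="${3:-95}" '
    $1 !~ /^[0-9]+$/    { next }                          # skip rows without a numeric percentage
    $1 + 0 >= flood + 0 { print $2, "flood_stage"; next }
    $1 + 0 >= high + 0  { print $2, "high"; next }
    $1 + 0 >= low + 0   { print $2, "low"; next }
                        { print $2, "ok" }
  '
}
```

For example, curl -sS "http://localhost:9200/_cat/allocation?h=disk.percent,node" | classify_disk prints one line per node; custom thresholds can be passed as classify_disk 80 85 90 if the cluster overrides the defaults.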

Review index health, document counts, and store sizes.

$ curl -sS "http://localhost:9200/_cat/indices?v&s=health,status,index&h=health,status,index,pri,rep,docs.count,store.size"
health status index           pri rep docs.count store.size
green  open   logs-2024.12      1   0          0       249b
green  open   logs-2026.01      1   0          1      5.4kb
green  open   metrics-2026.01   1   0          1      4.9kb

yellow indices often mean replica shards cannot be allocated. This is expected on a single-node cluster with replicas configured, because a replica is never placed on the same node as its primary.

Inspect shard placement for relocations, initializations, and unassigned shards.

$ curl -sS "http://localhost:9200/_cat/shards?v&s=index,shard,prirep&h=index,shard,prirep,state,docs,store,ip,node"
index           shard prirep state   docs store ip         node
logs-2024.12    0     p      STARTED    0  249b 192.0.2.40 node-01
logs-2026.01    0     p      STARTED    1 5.4kb 192.0.2.40 node-01
metrics-2026.01 0     p      STARTED    1 4.9kb 192.0.2.40 node-01

UNASSIGNED primary shards can make affected indices unavailable.
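Because unassigned primaries and unassigned replicas have different consequences, it can help to count them separately. The following sketch assumes the column order index,shard,prirep,state; count_unassigned is a hypothetical helper name.

```shell
# Count unassigned primary and replica shards.
# Expects rows from _cat/shards?h=index,shard,prirep,state on stdin (no header row).
count_unassigned() {
  awk '
    $4 == "UNASSIGNED" && $3 == "p" { primaries++ }
    $4 == "UNASSIGNED" && $3 == "r" { replicas++ }
    END { printf "unassigned_primaries=%d unassigned_replicas=%d\n", primaries, replicas }
  '
}
```

A possible invocation is curl -sS "http://localhost:9200/_cat/shards?h=index,shard,prirep,state" | count_unassigned. A non-zero primary count corresponds to red status; adding the unassigned.reason column to the cat request, or calling the _cluster/allocation/explain API, helps identify why a shard is not allocated.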

Check cluster pending tasks for coordination backlog.
```
$ curl -sS "http://localhost:9200/_cluster/pending_tasks?pretty"
{
  "tasks" : [ ]
}
```
A non-empty tasks array with rising time_in_queue_millis values indicates sustained cluster state pressure.
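To track whether queue time is rising, a check can record the largest time_in_queue_millis on each poll. This is a minimal sketch that extracts the value with grep and sed rather than a JSON parser; max_queue_millis is a hypothetical helper name, and the approach assumes the field appears as a plain integer in the response.

```shell
# Print the largest time_in_queue_millis found in pending-tasks JSON on stdin,
# or 0 when no task is queued.
max_queue_millis() {
  grep -o '"time_in_queue_millis"[[:space:]]*:[[:space:]]*[0-9]*' \
    | sed 's/.*:[[:space:]]*//' \
    | sort -n \
    | tail -n 1 \
    | { read -r max || max=0; echo "${max:-0}"; }
}
```

A possible invocation is curl -sS "http://localhost:9200/_cluster/pending_tasks?filter_path=tasks.time_in_queue_millis" | max_queue_millis. Logging the result at a modest interval shows whether the backlog is draining or growing without fetching the full task list each time.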

Author: Mohd Shakir Zakaria