Monitoring Elasticsearch cluster health surfaces shard allocation issues, node outages, and coordination backlogs before search and indexing become unreliable.
The cluster health API summarizes allocation state using a green/yellow/red status plus shard counters, while the cat APIs provide compact tables for nodes, indices, and shard placement.
Production clusters often require TLS plus authentication, so API requests may need https:// endpoints and either a username and password or an API key. High-frequency polling of verbose endpoints can add overhead on large clusters, so prefer filtered responses and small column sets for repeated checks.
Steps to monitor Elasticsearch cluster health:
- Query cluster health for overall status and allocation counters.
$ curl -sS "http://localhost:9200/_cluster/health?filter_path=cluster_name,status,number_of_nodes,number_of_data_nodes,active_primary_shards,active_shards,relocating_shards,initializing_shards,unassigned_shards,number_of_pending_tasks,active_shards_percent_as_number&pretty"
{
  "cluster_name" : "search-cluster",
  "status" : "green",
  "number_of_nodes" : 1,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 3,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "active_shards_percent_as_number" : 100.0
}
green = all shards allocated; yellow = all primary shards allocated but at least one replica shard unassigned; red = at least one primary shard unassigned. Secured clusters commonly use https://host:9200 plus -u user:pass or -H "Authorization: ApiKey BASE64VALUE".
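The status value can drive a scripted check. The following is a minimal sketch, not a definitive implementation: the function names health_exit_code and extract_status are hypothetical, the 0/1/2/3 exit codes are an assumption borrowed from common monitoring-plugin conventions, and the sed expression only handles the simple quoted layout of the health response shown above.

```shell
#!/bin/sh
# Map an Elasticsearch health status to a monitoring-style exit code.
# Assumption: 0 = OK, 1 = warning, 2 = critical, 3 = unknown.
health_exit_code() {
  case "$1" in
    green)  echo 0 ;;
    yellow) echo 1 ;;
    red)    echo 2 ;;
    *)      echo 3 ;;  # empty or unexpected value: the request probably failed
  esac
}

# Read a cluster health JSON document on stdin and print its "status" value.
# This is a text match, not a JSON parser; it prints nothing if no status is found.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

A possible use is status=$(curl -sS "http://localhost:9200/_cluster/health?filter_path=status" | extract_status) followed by exit "$(health_exit_code "$status")", which lets a cron job or monitoring agent react to the result.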
- List nodes with roles and resource indicators.
$ curl -sS "http://localhost:9200/_cat/nodes?v&h=ip,heap.percent,ram.percent,cpu,load_1m,load_5m,load_15m,node.role,master,name"
ip         heap.percent ram.percent cpu load_1m load_5m load_15m node.role   master name
192.0.2.40           25          98   0    0.64    0.91     0.89 cdfhilmrstw *      node-01
The master column marks the elected master with *; other nodes show -. Add format=json to cat endpoints for scripting.
- Review node disk usage from allocation output.
$ curl -sS "http://localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent,ip,node"
shards disk.indices disk.used disk.avail disk.total disk.percent ip         node
     3       10.6kb   115.9gb      1.6tb      1.7tb            6 192.0.2.40 node-01
When a node's disk usage crosses the disk watermarks, Elasticsearch stops allocating shards to it, which can leave shards unassigned.
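For repeated checks, the allocation output can be reduced to two columns and filtered against a threshold. This is a sketch under stated assumptions: the function name nodes_over_disk_threshold is hypothetical, the input is expected to come from _cat/allocation?h=disk.percent,node without the v header, and the default of 85 assumes the cluster still uses the default low disk watermark.

```shell
#!/bin/sh
# Print nodes whose disk usage is at or above a percentage threshold.
# Reads "disk.percent node" rows on stdin; rows without a numeric first
# column (for example a row describing unassigned shards) are skipped.
nodes_over_disk_threshold() {
  threshold="${1:-85}"  # assumption: default low watermark of 85%
  awk -v limit="$threshold" \
    '$1 ~ /^[0-9]+$/ && $1 + 0 >= limit + 0 { print $2 " " $1 "%" }'
}
```

For example, curl -sS "http://localhost:9200/_cat/allocation?h=disk.percent,node" | nodes_over_disk_threshold 80 would list only the nodes at 80% disk usage or more, and print nothing when every node is below that level.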
- Review index health, document counts, and store sizes.
$ curl -sS "http://localhost:9200/_cat/indices?v&s=health,status,index&h=health,status,index,pri,rep,docs.count,store.size"
health status index           pri rep docs.count store.size
green  open   logs-2024.12      1   0          0       249b
green  open   logs-2026.01      1   0          1      5.4kb
green  open   metrics-2026.01   1   0          1      4.9kb
On a single-node cluster, yellow indices usually mean that replicas are configured but cannot be allocated, because a replica is never placed on the same node as its primary.
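On clusters with many indices, a small filter keeps only the entries that need attention. The sketch below assumes input from _cat/indices?h=health,index without the v header; the function name non_green_indices is hypothetical.

```shell
#!/bin/sh
# Print only yellow and red indices.
# Reads "health index" rows on stdin and ignores everything else.
non_green_indices() {
  awk '$1 == "yellow" || $1 == "red" { print $1 " " $2 }'
}
```

A typical call would be curl -sS "http://localhost:9200/_cat/indices?h=health,index" | non_green_indices, which prints nothing when every index is green.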
- Inspect shard placement for relocations, initializations, and unassigned shards.
$ curl -sS "http://localhost:9200/_cat/shards?v&s=index,shard,prirep&h=index,shard,prirep,state,docs,store,ip,node"
index           shard prirep state   docs store ip         node
logs-2024.12    0     p      STARTED    0  249b 192.0.2.40 node-01
logs-2026.01    0     p      STARTED    1 5.4kb 192.0.2.40 node-01
metrics-2026.01 0     p      STARTED    1 4.9kb 192.0.2.40 node-01
UNASSIGNED primary shards can make affected indices unavailable.
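A per-state tally makes relocating, initializing, and unassigned shards easy to spot without reading the full table. This is a minimal sketch that assumes input from _cat/shards?h=state without the v header; the function name shard_state_counts is hypothetical.

```shell
#!/bin/sh
# Count shards per state.
# Reads one state per line on stdin (STARTED, RELOCATING, INITIALIZING,
# UNASSIGNED) and prints "STATE COUNT" rows sorted by state name.
shard_state_counts() {
  awk 'NF { count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

For example, curl -sS "http://localhost:9200/_cat/shards?h=state" | shard_state_counts gives one line per state that is currently present, so any line other than STARTED is a prompt to look closer.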
- Check cluster pending tasks for coordination backlog.
$ curl -sS "http://localhost:9200/_cluster/pending_tasks?pretty"
{
  "tasks" : [ ]
}
A non-empty tasks array whose time_in_queue_millis values keep rising indicates sustained cluster state pressure.
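To watch for a rising queue time, each poll can be reduced to a single number and compared with the previous sample. The sketch below is a text match rather than a JSON parser, the function name max_queue_millis is hypothetical, and it assumes a grep that supports -o (GNU, BSD, and BusyBox grep do).

```shell
#!/bin/sh
# Print the largest time_in_queue_millis value found in a pending tasks
# response on stdin, or 0 when the tasks array is empty.
max_queue_millis() {
  max="$(grep -o '"time_in_queue_millis"[[:space:]]*:[[:space:]]*[0-9]*' \
    | sed 's/.*:[[:space:]]*//' \
    | sort -n \
    | tail -n 1)"
  echo "${max:-0}"
}
```

Running curl -sS "http://localhost:9200/_cluster/pending_tasks" | max_queue_millis at a fixed interval and logging the result shows whether the oldest task is draining or staying in the queue longer with each sample.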
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
