Monitoring shard recovery in Elasticsearch shows whether replicas, relocated shards, or restored data are actually catching up after node restarts, node loss, disk rebalancing, or snapshot restores. Fast visibility here reduces the risk of assuming a cluster is healthy while data movement is still consuming bandwidth and leaving shards partially protected.
The quickest operator view is /_cat/recovery, which lists active shard copies with their recovery type, current stage, source and target nodes, and progress percentages. For a deeper read, /<index>/_recovery exposes per-shard file, byte, translog, and timing details, while cluster health counters show whether relocation and initialization are still draining across the cluster.
Secured deployments typically require HTTPS and authentication for these curl requests. Elastic positions the CAT endpoints as human-readable, operator-facing views rather than application APIs, and slow or stalled recoveries usually trace back to throttling, disk pressure, or allocation rules rather than to the recovery APIs themselves.
$ curl -sS --fail "http://localhost:9200/_cluster/health?filter_path=cluster_name,status,number_of_nodes,number_of_data_nodes,relocating_shards,initializing_shards,unassigned_shards,number_of_pending_tasks,active_shards_percent_as_number&pretty"
{
"cluster_name" : "es-cluster",
"status" : "yellow",
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"relocating_shards" : 2,
"initializing_shards" : 1,
"unassigned_shards" : 3,
"number_of_pending_tasks" : 0,
"active_shards_percent_as_number" : 98.8
}
Nonzero relocating_shards or initializing_shards counts confirm recovery work is still in flight, while persistent unassigned_shards usually point to an allocation problem rather than slow copying alone.
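The same filtered health response can drive a quick shell check. A minimal sketch: the inlined JSON stands in for the live curl response above, so swap the heredoc-style variable for the actual request in practice.

```shell
# Stand-in for the live response; replace with something like:
#   health=$(curl -sS --fail "http://localhost:9200/_cluster/health?filter_path=...")
health='{
  "status" : "yellow",
  "relocating_shards" : 2,
  "initializing_shards" : 1,
  "unassigned_shards" : 3
}'

# Extract a numeric field from the flat, filtered JSON with sed.
field() {
  printf '%s\n' "$health" | sed -n "s/.*\"$1\" *: *\([0-9][0-9]*\).*/\1/p"
}

relo=$(field relocating_shards)
init=$(field initializing_shards)
unassign=$(field unassigned_shards)

if [ "$((relo + init))" -gt 0 ]; then
  echo "recovery active: relocating=$relo initializing=$init"
else
  echo "no active recovery"
fi
if [ "$unassign" -gt 0 ]; then
  echo "warning: $unassign unassigned shards"
fi
```

The sed extraction only works because filter_path keeps the response flat; for nested responses a real JSON parser is the safer choice.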
For secured clusters, switch the URL to https://, add authentication such as --user elastic:password or -H "Authorization: ApiKey BASE64VALUE", and point --cacert at the CA certificate when the HTTPS endpoint uses a private CA.
$ curl -sS --fail "http://localhost:9200/_cat/health?v=true&h=cluster,status,node.total,node.data,shards,pri,relo,init,unassign,pending_tasks,active_shards_percent"
cluster    status node.total node.data shards pri relo init unassign pending_tasks active_shards_percent
es-cluster yellow          3         3    256 128    2    1        3             0                 98.8%
The CAT health view is handy for eyeballing recovery over time, but the CAT APIs are meant for human triage on the command line; monitoring integrations should consume the JSON cluster health API instead.
$ curl -sS --fail "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,type,stage,source_node,target_node,files_percent,bytes_percent,translog_ops_percent&s=index,shard"
index        shard time  type stage    source_node target_node files_percent bytes_percent translog_ops_percent
logs-2026.04 0     01:12 peer index    es-hot-1    es-hot-2    74.6%         68.3%         0.0%
logs-2026.04 1     00:19 peer translog es-hot-1    es-hot-3    100.0%        100.0%        42.7%
Because active_only=true filters to in-flight recoveries, empty output simply means no shard recoveries are running at that moment.
$ watch -n 2 'curl -sS --fail "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,stage,source_node,target_node,bytes_percent,translog_ops_percent&s=index,shard"'
Every 2.0s: curl -sS --fail "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,stage,source_node,target_node,bytes_percent,translog_ops_percent&s=index,shard"    Thu Apr  2 16:30:01 2026

index        shard time  stage    source_node target_node bytes_percent translog_ops_percent
logs-2026.04 0     01:14 index    es-hot-1    es-hot-2    71.1%         0.0%
logs-2026.04 1     00:21 translog es-hot-1    es-hot-3    100.0%        48.9%
Press Ctrl+C to stop watch. When percentages stay flat for multiple refreshes, compare the affected shards against throttling, disk, and allocation signals before forcing extra movement.
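The "flat percentages" check can also be automated by diffing two _cat/recovery snapshots. A minimal sketch, assuming the columns index, shard, and bytes_percent; the two inlined samples stand in for successive curl captures saved a few seconds apart.

```shell
# Stand-ins for two successive captures of something like:
#   curl -sS "http://localhost:9200/_cat/recovery?active_only=true&h=index,shard,bytes_percent"
f0=$(mktemp); f1=$(mktemp)
printf '%s\n' 'logs-2026.04 0 68.3%' 'logs-2026.04 1 100.0%' > "$f0"
printf '%s\n' 'logs-2026.04 0 68.3%' > "$f1"

# Join the snapshots on index+shard; an identical percentage across polls
# suggests the copy is stalled (shard 1 has finished and left the list).
stalled=$(awk '
  NR == FNR { prev[$1 FS $2] = $3; next }
  ($1 FS $2) in prev && prev[$1 FS $2] == $3 {
    print "possibly stalled:", $1, "shard", $2, "at", $3
  }
' "$f0" "$f1")

echo "$stalled"
rm -f "$f0" "$f1"
```

A shard that stays at 100% bytes but keeps appearing may simply be replaying translog, so cross-check the stage column before treating it as stuck.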
$ curl -sS --fail "http://localhost:9200/logs-2026.04/_recovery?active_only=true&detailed=true&human&pretty"
{
"logs-2026.04" : {
"shards" : [
{
"id" : 0,
"type" : "PEER",
"stage" : "TRANSLOG",
"primary" : false,
"source" : {
"name" : "es-hot-1"
},
"target" : {
"name" : "es-hot-2"
},
"index" : {
"files" : {
"total" : 152,
"recovered" : 152,
"percent" : "100.0%"
},
"size" : {
"total" : "100mb",
"total_in_bytes" : 104857600,
"recovered" : "100mb",
"recovered_in_bytes" : 104857600,
"percent" : "100.0%"
},
"source_throttle_time" : "0s",
"target_throttle_time" : "0s"
},
"translog" : {
"recovered" : 489,
"total" : 1000,
"percent" : "48.9%",
"total_time" : "4.2s"
},
"verify_index" : {
"check_index_time" : "0s"
}
}
##### snipped #####
]
}
}
The index recovery API accepts active_only=true and detailed=true together. The INDEX, VERIFY_INDEX, TRANSLOG, and DONE stages separate file copy, validation, translog replay, and completion.
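The pretty-printed response is regular enough to skim for just the stage per shard. A minimal sketch; the trimmed heredoc stands in for the live response above, and in practice you would pipe the curl output into the awk program instead.

```shell
# Trimmed stand-in for /<index>/_recovery?pretty output.
# The id rule remembers the shard number; the stage rule prints it.
stages=$(awk -F'"' '
  /"id" *:/    { split($0, a, ":"); gsub(/[ ,]/, "", a[2]); id = a[2] }
  /"stage" *:/ { print "shard", id, "stage:", $4 }
' <<'EOF'
{
  "logs-2026.04" : {
    "shards" : [
      {
        "id" : 0,
        "stage" : "TRANSLOG"
      },
      {
        "id" : 1,
        "stage" : "DONE"
      }
    ]
  }
}
EOF
)
echo "$stages"
```

This pattern-matching approach depends on the pretty layout putting one field per line; a JSON-aware tool is sturdier if one is available.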
$ curl -sS --fail "http://localhost:9200/_cat/shards/logs-2026.04?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=shard,prirep"
index        shard prirep state   node     unassigned.reason
logs-2026.04 0     p      STARTED es-hot-1
logs-2026.04 0     r      STARTED es-hot-2
logs-2026.04 1     p      STARTED es-hot-1
logs-2026.04 1     r      STARTED es-hot-3
The unassigned.reason column explains why a copy has no node. Focus first on replicas or primaries that remain UNASSIGNED, INITIALIZING, or RELOCATING after the rest of the index has settled.
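To surface only the copies that have not settled, filter the same CAT shards output on the state column. A minimal sketch; the inlined table stands in for live output, and the INITIALIZING and UNASSIGNED rows are illustrative rather than taken from the example above.

```shell
# Stand-in for _cat/shards output (h=index,shard,prirep,state,node);
# the non-STARTED rows are hypothetical, added to exercise the filter.
shards='logs-2026.04 0 p STARTED es-hot-1
logs-2026.04 0 r INITIALIZING es-hot-2
logs-2026.04 1 p STARTED es-hot-1
logs-2026.04 1 r UNASSIGNED'

# Keep only rows whose state column is not STARTED.
pending=$(printf '%s\n' "$shards" | awk '$4 != "STARTED"')
echo "$pending"
```

Empty output from this filter is the quick "all settled" signal for the index.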
$ curl -sS --fail "http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=60s&filter_path=cluster_name,status,relocating_shards,initializing_shards,unassigned_shards,active_shards_percent_as_number,timed_out&pretty"
{
"cluster_name" : "es-cluster",
"status" : "yellow",
"timed_out" : false,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 2,
"active_shards_percent_as_number" : 99.2
}
The cluster health API can block until relocating and initializing shards drain to zero (or the timeout expires). A yellow result can still be acceptable when recovery is complete but one or more replicas remain intentionally unavailable or cannot be placed.
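The waiting variant also lends itself to gating scripted maintenance on recovery having drained. A minimal sketch: fetch_health is a stand-in for the real curl call above, so the branch logic can be read (and tested) on its own.

```shell
# fetch_health stands in for the waiting health request, e.g.:
#   curl -sS --fail "http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=60s&filter_path=status,timed_out"
fetch_health() {
  printf '%s\n' '{"status":"yellow","timed_out":false}'
}

resp=$(fetch_health)
case "$resp" in
  *'"timed_out":true'*)  verdict="recovery still in progress after timeout" ;;
  *'"status":"red"'*)    verdict="cluster red: primaries missing" ;;
  *)                     verdict="recovery drained; proceeding" ;;
esac
echo "$verdict"
```

Treating timed_out and red as hard stops while tolerating yellow matches the point above: yellow after the wait often just means replicas that cannot currently be placed.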