Monitoring shard recovery in Elasticsearch shows how quickly data is copied and replayed after node loss, restarts, snapshot restores, or shard rebalancing, so slow or stalled recoveries are caught before they cause performance surprises or prolong the window of reduced redundancy.

Shard recovery progresses through phases that copy segment files and catch up recent operations, typically moving from index to translog before finalizing. The compact /_cat/recovery view is designed for fast, sortable snapshots of what is currently moving, while /<index>/_recovery provides per-shard details that pinpoint whether file transfer, checksum verification, or translog replay is the bottleneck.
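To see at a glance which stage dominates, the rows from /_cat/recovery can be tallied by stage. The following is a minimal sketch, assuming awk and sort are available; the function name count_by_stage is hypothetical, and it expects rows requested with h=index,shard,stage and without the v parameter so that no header row is present.

```shell
# Count recoveries per stage from "_cat/recovery" rows read on stdin.
# Assumes three whitespace-separated columns: index, shard, stage.
count_by_stage() {
  awk 'NF >= 3 { n[$3]++ } END { for (s in n) print s, n[s] }' | sort
}

# Example against a live cluster (the endpoint is an assumption):
# curl -s "http://localhost:9200/_cat/recovery?active_only=true&h=index,shard,stage" | count_by_stage
```

Many rows in the index stage point toward file transfer as the slow part, while many rows in translog point toward operation replay.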

Requests shown use http://localhost:9200 as an example endpoint and may require HTTPS plus authentication in secured clusters. Slow recoveries are commonly caused by recovery throttling (for example indices.recovery.max_bytes_per_sec), disk saturation, or noisy neighbors on the same nodes, and forcing extra relocations during recovery can amplify the load.
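When throttling is suspected, the effective value of indices.recovery.max_bytes_per_sec can be read from the cluster settings API. The sketch below is one possible way to do that, assuming the example endpoint from this guide and an unsecured cluster; the ES_URL variable and both function names are hypothetical, and the filter_path expression is meant to match the setting whether it is persistent, transient, or a default.

```shell
# Base URL; localhost is the example endpoint used in this guide.
ES_URL="${ES_URL:-http://localhost:9200}"

# Build the settings URL that returns only the recovery throttle,
# including the built-in default when nothing has been set explicitly.
throttle_url() {
  printf '%s/_cluster/settings?include_defaults=true&filter_path=*.indices.recovery.max_bytes_per_sec&pretty\n' "$ES_URL"
}

# Query the cluster for the effective throttle (requires a reachable cluster).
show_recovery_throttle() {
  curl -s "$(throttle_url)"
}
```

Running show_recovery_throttle prints the setting under whichever of persistent, transient, or defaults currently holds it.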

Steps to monitor shard recovery in Elasticsearch:

  1. Check cluster health for relocating, initializing, and unassigned shards.
    $ curl -s "http://localhost:9200/_cluster/health?pretty"
    {
      "cluster_name" : "es-cluster",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 3,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 128,
      "active_shards" : 253,
      "relocating_shards" : 2,
      "initializing_shards" : 1,
      "unassigned_shards" : 3,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 98.4
    }

    Secured clusters may require HTTPS, credentials or an API key, and a CA certificate for curl.

  2. List active shard recoveries with stage and progress columns.
    $ curl -s "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,type,stage,source_node,target_node,files_percent,bytes_percent,translog_ops_percent&s=index,shard"
    index        shard time  type  stage    source_node target_node files_percent bytes_percent translog_ops_percent
    logs-2025.01 0     00:18 peer  index    es-hot-1    es-hot-2    73.2%         68.4%         0.0%
    logs-2025.01 1     00:07 peer  translog es-hot-1    es-hot-3    100.0%        100.0%        42.7%

    Output that contains only the header row indicates there are no active recoveries.

  3. Refresh the recovery view periodically to track live progress.
    $ watch -n 2 'curl -s "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,stage,source_node,target_node,bytes_percent&s=index,shard"'
    Every 2.0s: curl -s "http://localhost:9200/_cat/recovery?v&active_only=true&h=index,shard,time,stage,source_node,target_node,bytes_percent&s=index,shard"  Fri Jan  2 10:15:01 2026
    
    index        shard time  stage    source_node target_node bytes_percent
    logs-2025.01 0     00:20 index    es-hot-1    es-hot-2    55.9%

    Press Ctrl+C to exit watch.

  4. Inspect detailed recovery metrics for a specific index.
    $ curl -s "http://localhost:9200/logs-2025.01/_recovery?active_only=true&detailed=true&pretty"
    {
      "logs-2025.01" : {
        "shards" : [
          {
            "id" : 0,
            "type" : "PEER",
            "stage" : "TRANSLOG",
            "primary" : false,
            "source" : {
              "name" : "es-hot-1"
            },
            "target" : {
              "name" : "es-hot-2"
            },
            "index" : {
              "files" : {
                "total" : 152,
                "recovered" : 152,
                "percent" : "100.0%"
              },
              "size" : {
                "total_in_bytes" : 104857600,
                "recovered_in_bytes" : 104857600,
                "percent" : "100.0%"
              }
            },
            "translog" : {
              "recovered" : 427,
              "total" : 1000,
              "percent" : "42.7%"
            }
          }
    ##### snipped #####
        ]
      }
    }

  5. Confirm shards for the affected index return to STARTED state.
    $ curl -s "http://localhost:9200/_cat/shards/logs-2025.01?v&h=index,shard,prirep,state,docs,store,node&s=shard,prirep"
    index        shard prirep state   docs store   node
    logs-2025.01 0     p      STARTED 9412 102.3mb es-hot-1
    logs-2025.01 0     r      STARTED 9412 102.3mb es-hot-2
    logs-2025.01 1     p      STARTED 9388 101.9mb es-hot-1
    logs-2025.01 1     r      STARTED 9388 101.9mb es-hot-3

    States like INITIALIZING and RELOCATING indicate recovery is still in progress.

  6. Wait until the cluster reports zero relocating and initializing shards.
    $ curl -s "http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&timeout=60s&pretty"
    {
      "cluster_name" : "es-cluster",
      "status" : "yellow",
      "timed_out" : false,
      "number_of_nodes" : 3,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 128,
      "active_shards" : 256,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 2,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 99.2
    }

    Cluster status stays yellow while replica shards remain unassigned, even after all recoveries and relocations have finished.
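The polling in the steps above can also be scripted. The following is a rough sketch rather than a definitive implementation, assuming the example endpoint from this guide and an unsecured cluster; the ES_URL variable and the function names are hypothetical. It omits the v parameter so that an empty response means no recoveries are active.

```shell
ES_URL="${ES_URL:-http://localhost:9200}"

# Fetch one line per active recovery; prints nothing when none are active
# because the header row is omitted (no "v" parameter).
# -f makes curl return a non-zero status on HTTP errors.
fetch_active_recoveries() {
  curl -sf "${ES_URL}/_cat/recovery?active_only=true&h=index,shard,stage,bytes_percent"
}

# Poll until no recoveries remain or the attempt limit is reached.
# $1 = maximum attempts (default 30), $2 = seconds between attempts (default 10).
wait_for_recoveries() {
  max_attempts="${1:-30}"
  interval="${2:-10}"
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Treat a failed request as an error instead of "nothing to recover".
    if ! rows="$(fetch_active_recoveries)"; then
      echo "request failed" >&2
      return 2
    fi
    if [ -z "$rows" ]; then
      echo "no active recoveries"
      return 0
    fi
    count="$(printf '%s\n' "$rows" | wc -l | tr -d ' ')"
    echo "attempt ${attempt}: ${count} active"
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "recoveries still active after ${max_attempts} attempts" >&2
  return 1
}
```

Calling wait_for_recoveries 60 5 would check every 5 seconds for up to 60 attempts; the return code distinguishes completion (0), a timeout (1), and a failed request (2).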