Shard recovery can saturate storage and network bandwidth during node restarts, failures, and shard rebalancing, causing noticeable spikes in search and indexing latency. Recovery throttling keeps the cluster usable while shard data is being copied and replicas are catching up.
Elasticsearch controls recovery speed with a per-node bandwidth cap and shard allocation concurrency limits. The indices.recovery.max_bytes_per_sec setting caps the total inbound and outbound recovery traffic on each node. The cluster.routing.allocation.node_concurrent_recoveries setting limits how many shard recoveries run at the same time on each node, and cluster.routing.allocation.cluster_concurrent_rebalance limits how many shard rebalancing moves run at the same time across the whole cluster.
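Because the bandwidth cap applies to all recovery traffic on a node, dividing the amount of shard data by the cap gives a rough lower bound on copy time. The sketch below assumes Elasticsearch's binary byte units (1gb = 1024mb), a single recovery using the node's whole cap, and no disk, network, or translog-replay bottleneck. estimate_recovery_seconds is a hypothetical helper, not an Elasticsearch tool.

```shell
#!/usr/bin/env bash
# estimate_recovery_seconds <data_gb> <max_bytes_per_sec_in_mb>
# Hypothetical helper: lower bound on copy time when the only limit is the
# per-node recovery cap. Assumes binary units (1gb = 1024mb), one recovery
# using the whole cap, and no disk, network, or translog-replay bottleneck.
estimate_recovery_seconds() {
  local data_gb=$1 rate_mb=$2
  # Convert gb to mb, then divide by the mb-per-second cap, rounding up.
  echo $(( (data_gb * 1024 + rate_mb - 1) / rate_mb ))
}

estimate_recovery_seconds 50 40   # 40mb cap: prints 1280
estimate_recovery_seconds 50 80   # 80mb cap: prints 640
```

Under these assumptions, 50gb of shard data needs at least about 21 minutes at 40mb and about 11 minutes at 80mb. Concurrent recoveries on the same node share the cap, so each one proceeds more slowly than this estimate.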
Higher limits shorten the time spent waiting for replicas and rebalances to finish, but aggressive values compete with normal search and indexing for disk I/O, CPU, and network capacity. Setting names should match the running Elasticsearch version: in current releases, cluster.routing.allocation.node_concurrent_recoveries is a shortcut that sets both cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries, and it takes effect only where the more specific setting is not configured. Secured clusters typically require HTTPS and authentication for API calls.
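One way to see which of these names the running version recognizes is to request them all with include_defaults=true and check which ones come back under defaults. The sketch below builds the long filter_path value for GET _cluster/settings from a list of setting names. recovery_filter_path and ES_URL are hypothetical names, and the curl line is commented out so the block runs without a cluster.

```shell
#!/usr/bin/env bash
# recovery_filter_path <setting-name>...
# Hypothetical helper: emit "defaults.<name>,persistent.<name>,transient.<name>"
# for every name, comma-separated, for use as a filter_path query value.
recovery_filter_path() {
  local out="" name section
  for name in "$@"; do
    for section in defaults persistent transient; do
      # ${out:+,} adds a comma separator only after the first entry.
      out="${out}${out:+,}${section}.${name}"
    done
  done
  printf '%s\n' "$out"
}

FILTER=$(recovery_filter_path \
  indices.recovery.max_bytes_per_sec \
  cluster.routing.allocation.node_concurrent_recoveries \
  cluster.routing.allocation.node_concurrent_incoming_recoveries \
  cluster.routing.allocation.node_concurrent_outgoing_recoveries \
  cluster.routing.allocation.cluster_concurrent_rebalance)
printf '%s\n' "$FILTER"

# Against a cluster (use https:// plus --user/--cacert on secured clusters):
# curl -s "${ES_URL:-http://localhost:9200}/_cluster/settings?include_defaults=true&filter_path=${FILTER}&pretty"
```

A name that is absent from the defaults section of the response is likely not registered in that version.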
Steps to configure Elasticsearch recovery throttling:
- Review current recovery throttling defaults and overrides in cluster settings.
$ curl -s "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.indices.recovery.max_bytes_per_sec,defaults.cluster.routing.allocation.node_concurrent_recoveries,defaults.cluster.routing.allocation.cluster_concurrent_rebalance,persistent.indices.recovery.max_bytes_per_sec,persistent.cluster.routing.allocation.*,transient.indices.recovery.max_bytes_per_sec,transient.cluster.routing.allocation.*&pretty"
{
  "defaults" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "2"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "40mb"
      }
    }
  }
}
- Set recovery throttle and concurrency settings as persistent cluster settings.
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "80mb",
    "cluster.routing.allocation.node_concurrent_recoveries": 2,
    "cluster.routing.allocation.cluster_concurrent_rebalance": 2
  }
}'
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "cluster_concurrent_rebalance" : "2",
          "node_concurrent_recoveries" : "2"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "80mb"
      }
    }
  },
  "transient" : { }
}
Secured clusters typically require https:// and authentication options (for example curl --user and --cacert) instead of unauthenticated http://localhost:9200.
Setting indices.recovery.max_bytes_per_sec too high or increasing concurrency too far can saturate storage and network, increasing query latency and triggering client timeouts.
- Verify the recovery throttling settings are present in the persistent cluster settings.
$ curl -s "http://localhost:9200/_cluster/settings?filter_path=persistent.indices.recovery.max_bytes_per_sec,persistent.cluster.routing.allocation.*&pretty"
{
  "persistent" : {
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "80mb"
      }
    },
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_recoveries" : "2",
          "cluster_concurrent_rebalance" : "2"
        }
      }
    }
  }
}
Use a unit suffix such as mb or gb for indices.recovery.max_bytes_per_sec, because a bare number such as 80 is rejected as a byte size with a missing unit. Use transient instead of persistent only for temporary tuning: transient cluster settings are deprecated as of Elasticsearch 7.16 and do not survive a full cluster restart.
- Remove the persistent overrides to revert to the default recovery throttling behavior.
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": null,
    "cluster.routing.allocation.node_concurrent_recoveries": null,
    "cluster.routing.allocation.cluster_concurrent_rebalance": null
  }
}'
{"acknowledged":true,"persistent":{},"transient":{}}
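The apply and revert steps above can be sketched as small shell helpers that build the same request bodies, refuse a throttle value without a unit suffix, and pass extra curl options (such as --user and --cacert on a secured cluster) through to the request. valid_rate, recovery_settings_body, recovery_reset_body, put_cluster_settings, and ES_URL are hypothetical names. The rate check accepts only whole numbers with a lowercase b, kb, mb, gb, or tb suffix, which is an assumption and may be narrower than what Elasticsearch itself accepts.

```shell
#!/usr/bin/env bash
# valid_rate <value>: hypothetical check that a byte size is a whole number
# with a lowercase unit suffix (b, kb, mb, gb, tb); "80" alone is rejected.
valid_rate() {
  local num
  case "$1" in
    *kb|*mb|*gb|*tb) num=${1%??} ;;
    *b) num=${1%b} ;;
    *) return 1 ;;
  esac
  case "$num" in
    ''|*[!0-9]*) return 1 ;;
  esac
  return 0
}

# recovery_settings_body <max_bytes_per_sec> <node_concurrent_recoveries> <cluster_concurrent_rebalance>
recovery_settings_body() {
  if ! valid_rate "$1"; then
    echo "invalid indices.recovery.max_bytes_per_sec: $1 (add a unit such as mb)" >&2
    return 1
  fi
  printf '{"persistent":{"indices.recovery.max_bytes_per_sec":"%s","cluster.routing.allocation.node_concurrent_recoveries":%d,"cluster.routing.allocation.cluster_concurrent_rebalance":%d}}\n' \
    "$1" "$2" "$3"
}

# recovery_reset_body: null removes each persistent override (back to defaults).
recovery_reset_body() {
  printf '%s\n' '{"persistent":{"indices.recovery.max_bytes_per_sec":null,"cluster.routing.allocation.node_concurrent_recoveries":null,"cluster.routing.allocation.cluster_concurrent_rebalance":null}}'
}

# put_cluster_settings <json-body> [extra curl options...]
# ES_URL (hypothetical) defaults to the unauthenticated URL used above.
put_cluster_settings() {
  local body=$1
  shift
  [ -n "$body" ] || { echo "empty request body" >&2; return 1; }
  curl -s "$@" -H "Content-Type: application/json" -X PUT \
    "${ES_URL:-http://localhost:9200}/_cluster/settings" -d "$body"
}

# Print the request bodies without contacting a cluster.
recovery_settings_body 80mb 2 2
recovery_reset_body

# Apply or revert (requires a reachable cluster; auth options are examples):
# put_cluster_settings "$(recovery_settings_body 80mb 2 2)" --user elastic --cacert ca.crt
# put_cluster_settings "$(recovery_reset_body)" --user elastic --cacert ca.crt
```

Keeping body construction separate from the curl call makes it possible to print and review the exact JSON before sending it, and the empty-body guard stops a failed validation from turning into a PUT with no payload.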
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
