Recovery traffic can overwhelm storage and network capacity after node restarts, failures, snapshot restores, or shard relocations, stretching repair windows and spiking search or indexing latency. Recovery throttling keeps the cluster responsive while shards are copied and replayed back into place.
Elasticsearch exposes recovery throttles through the dynamic /_cluster/settings API. The main bandwidth cap is indices.recovery.max_bytes_per_sec, which limits the total inbound and outbound recovery traffic on each node and applies to both peer recoveries and snapshot restores. Concurrency is controlled by cluster.routing.allocation.node_concurrent_recoveries, a shortcut that sets both directions at once, or by the more specific cluster.routing.allocation.node_concurrent_incoming_recoveries and cluster.routing.allocation.node_concurrent_outgoing_recoveries settings.
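How the shortcut and the specific settings interact can be sketched as a small shell function. The name `effective_recoveries` is hypothetical; it only models the documented fallback (an explicitly set incoming or outgoing limit wins, otherwise node_concurrent_recoveries applies, otherwise the default of 2) and does not query a cluster.

```shell
#!/bin/sh
# Hypothetical helper: models how the effective per-node limit for one
# direction (incoming or outgoing) of recoveries is resolved.
# $1 = explicit node_concurrent_incoming/outgoing_recoveries value (may be empty)
# $2 = explicit node_concurrent_recoveries value (may be empty)
effective_recoveries() {
  specific="$1"
  shortcut="$2"
  if [ -n "$specific" ]; then
    # The more specific setting wins whenever it is set.
    echo "$specific"
  elif [ -n "$shortcut" ]; then
    # Otherwise the shortcut supplies the value for this direction.
    echo "$shortcut"
  else
    # Neither is set: fall back to the default of 2.
    echo 2
  fi
}

# Shortcut set to 3, outgoing explicitly set to 1, incoming left unset.
echo "incoming: $(effective_recoveries "" 3)"   # incoming: 3
echo "outgoing: $(effective_recoveries 1 3)"    # outgoing: 1
```

Setting only the shortcut therefore changes both directions, while an explicit incoming or outgoing value pins that direction regardless of the shortcut.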
The default bandwidth cap is 40mb per node, except on dedicated cold and frozen nodes, where the default scales with the node's total memory. The recovery concurrency settings default to 2, and Elastic advises changing them cautiously: more aggressive recovery settings add disk, CPU, and network contention and may not make shard movements finish noticeably sooner. Prefer persistent settings over transient ones for routine tuning. When security is enabled, replace http://localhost:9200 in the commands below with the cluster's HTTPS endpoint and add authentication and CA options.
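To put the per-node cap in perspective, the sketch below picks a node's default rate and estimates how long copying a given amount of shard data takes at a given rate. The function names are hypothetical, the cold and frozen memory tiers are an assumption based on Elastic's index recovery settings reference for recent releases (verify them for your version), and the estimate ignores translog replay and per-file overhead.

```shell
#!/bin/sh
# Hypothetical helpers for back-of-the-envelope recovery planning.
# They do not query Elasticsearch; all inputs are supplied by hand.

# Default indices.recovery.max_bytes_per_sec, in mb, for a node.
# $1 = "cold_frozen" for a dedicated cold or frozen node, anything else otherwise
# $2 = total memory available to the node, in whole GB (integer input assumed)
# Tier boundaries are an assumption taken from Elastic's docs; check your version.
default_rate_mb() {
  role="$1"
  mem_gb="$2"
  if [ "$role" != "cold_frozen" ]; then
    echo 40
  elif [ "$mem_gb" -le 4 ]; then
    echo 40
  elif [ "$mem_gb" -le 8 ]; then
    echo 60
  elif [ "$mem_gb" -le 16 ]; then
    echo 90
  elif [ "$mem_gb" -le 32 ]; then
    echo 125
  else
    echo 250
  fi
}

# Rough lower bound on copy time, in seconds, for $1 GiB moved into or out of
# one node capped at $2 mb/s. Elasticsearch byte sizes are binary, so
# 1 GiB = 1024 mb. Integer division rounds down.
recovery_seconds() {
  data_gib="$1"
  rate_mb="$2"
  echo $(( data_gib * 1024 / rate_mb ))
}

echo "50 GiB at 40mb: $(recovery_seconds 50 40) s"            # 1280 s
echo "50 GiB at 20mb: $(recovery_seconds 50 20) s"            # 2560 s
echo "cold node, 64 GB: $(default_rate_mb cold_frozen 64)mb"  # 250mb
```

Halving the cap from 40mb to 20mb doubles the minimum copy time for the same data, which is the trade-off to weigh before lowering it.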
Steps to configure Elasticsearch recovery throttling:
- Review the current recovery bandwidth and concurrency defaults before setting an override.
$ curl -sS "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.indices.recovery.max_bytes_per_sec,defaults.cluster.routing.allocation.node_concurrent_incoming_recoveries,defaults.cluster.routing.allocation.node_concurrent_outgoing_recoveries,defaults.cluster.routing.allocation.node_concurrent_recoveries&pretty"
{
  "defaults" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "2",
          "node_concurrent_outgoing_recoveries" : "2",
          "node_concurrent_recoveries" : "2"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "40mb"
      }
    }
  }
}
GET /_cluster/settings returns only explicit overrides by default, so include_defaults=true is needed to expose the fallback values. Dedicated cold and frozen nodes can report a higher default for indices.recovery.max_bytes_per_sec based on available memory.
- Apply persistent recovery bandwidth and per-node concurrency limits.
$ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "20mb",
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": 1,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": 1
  }
}'
{
  "acknowledged" : true,
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "1",
          "node_concurrent_outgoing_recoveries" : "1"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "20mb"
      }
    }
  },
  "transient" : { }
}
The more specific incoming and outgoing settings override cluster.routing.allocation.node_concurrent_recoveries when they are set.
On a secured cluster, use HTTPS with authentication; updating cluster settings requires a user with the manage cluster privilege.
Lower recovery limits reduce pressure on busy nodes but lengthen replica catch-up and rebalancing. Setting them well above the defaults lets recoveries consume excess bandwidth and other resources, which can degrade or even destabilize a busy cluster instead of speeding it up.
- Read back the explicit cluster overrides to confirm the new throttle is active.
$ curl -sS "http://localhost:9200/_cluster/settings?pretty"
{
  "persistent" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "1",
          "node_concurrent_outgoing_recoveries" : "1"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "20mb"
      }
    }
  },
  "transient" : { }
}
This plain read keeps the confirmation concise because it returns only the persistent and transient overrides, not the full defaults tree.
- Remove the persistent overrides to return to the node defaults or to any value set in elasticsearch.yml.
$ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": null,
    "cluster.routing.allocation.node_concurrent_incoming_recoveries": null,
    "cluster.routing.allocation.node_concurrent_outgoing_recoveries": null
  }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : { }
}
Assigning null removes the explicit override instead of setting the literal string null.
- Confirm the cluster has fallen back to the default recovery throttling values.
$ curl -sS "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.indices.recovery.max_bytes_per_sec,defaults.cluster.routing.allocation.node_concurrent_incoming_recoveries,defaults.cluster.routing.allocation.node_concurrent_outgoing_recoveries,defaults.cluster.routing.allocation.node_concurrent_recoveries&pretty"
{
  "defaults" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "node_concurrent_incoming_recoveries" : "2",
          "node_concurrent_outgoing_recoveries" : "2",
          "node_concurrent_recoveries" : "2"
        }
      }
    },
    "indices" : {
      "recovery" : {
        "max_bytes_per_sec" : "40mb"
      }
    }
  }
}
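The apply and reset steps above can be wrapped in a small script so both use the same payload shape. This is a sketch: `ES_URL`, `ES_CURL_OPTS`, `DRY_RUN`, and the function names are hypothetical, and with `DRY_RUN=1` the script prints the request instead of sending it. On a secured cluster, point `ES_URL` at the HTTPS endpoint and pass authentication and CA options through `ES_CURL_OPTS` (for example curl's `-u` and `--cacert`).

```shell
#!/bin/sh
# Hypothetical wrapper around PUT /_cluster/settings for recovery throttling.
ES_URL="${ES_URL:-http://localhost:9200}"
# Extra curl options, e.g. "-u elastic:secret --cacert /path/to/ca.crt" (assumption).
ES_CURL_OPTS="${ES_CURL_OPTS:-}"

# Build the persistent settings body. Pass null for all three to reset.
# $1 = byte-size rate such as 20mb, $2 = incoming limit, $3 = outgoing limit
recovery_payload() {
  rate="$1"; incoming="$2"; outgoing="$3"
  # Quote the byte-size value as a JSON string unless it is JSON null.
  if [ "$rate" = "null" ]; then rate_json=null; else rate_json="\"$rate\""; fi
  printf '{"persistent":{"indices.recovery.max_bytes_per_sec":%s,"cluster.routing.allocation.node_concurrent_incoming_recoveries":%s,"cluster.routing.allocation.node_concurrent_outgoing_recoveries":%s}}' \
    "$rate_json" "$incoming" "$outgoing"
}

# Send (or, with DRY_RUN=1, only print) the settings update.
put_recovery_settings() {
  body="$(recovery_payload "$1" "$2" "$3")"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "PUT $ES_URL/_cluster/settings $body"
    return 0
  fi
  # $ES_CURL_OPTS is left unquoted on purpose so it splits into separate options.
  curl -sS $ES_CURL_OPTS -H "Content-Type: application/json" \
    -X PUT "$ES_URL/_cluster/settings?pretty" -d "$body"
}
```

For example, `DRY_RUN=1 put_recovery_settings 20mb 1 1` prints the apply request, and `put_recovery_settings null null null` sends the reset. Because the limits are inserted unquoted, passing null produces the JSON literal that removes each override rather than the string "null".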
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
