How to configure Elasticsearch shard rebalancing

Shard rebalancing keeps Elasticsearch from concentrating too many shards on the same nodes after node failures, rolling restarts, tier changes, or cluster growth. Keeping shard movement under control helps avoid hot spots, uneven disk pressure, and long recovery windows after a disruption.

Elasticsearch automatically rebalances shards to improve cluster balance, but it still obeys allocation filtering, forced awareness, and data tier boundaries when choosing where shards may move. The dynamic setting cluster.routing.rebalance.enable controls whether all shards, only primaries, only replicas, or none are eligible for rebalancing, while cluster.routing.allocation.allow_rebalance can still postpone movement until the cluster reaches the required health state.

Elastic documents all as the default value of cluster.routing.rebalance.enable and recommends persistent cluster settings for routine operations, because transient settings can be cleared unexpectedly on unstable clusters. These examples target self-managed clusters using the /_cluster/settings API. Elastic Cloud Hosted and Elastic Cloud Enterprise deployments are generally safer to manage through deployment user settings, and secured clusters typically need https plus authentication for every request.

Steps to configure Elasticsearch shard rebalancing:

  1. Review the current rebalance mode and the gate that allows rebalancing to start.
    $ curl -sS "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.rebalance.enable,defaults.cluster.routing.allocation.allow_rebalance,persistent.cluster.routing.rebalance.enable,persistent.cluster.routing.allocation.allow_rebalance,transient.cluster.routing.rebalance.enable,transient.cluster.routing.allocation.allow_rebalance&pretty"
    {
      "defaults" : {
        "cluster" : {
          "routing" : {
            "rebalance" : {
              "enable" : "all"
            },
            "allocation" : {
              "allow_rebalance" : "indices_all_active"
            }
          }
        }
      }
    }

    cluster.routing.rebalance.enable controls which shard copies may move, while cluster.routing.allocation.allow_rebalance decides when those moves are permitted to begin.
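
    The precedence across transient, persistent, and default values can be sketched in Python. The helper below is illustrative, not part of any Elasticsearch client; it resolves the effective rebalance mode from a settings response fetched with include_defaults=true.

```python
import json

def effective_rebalance_mode(settings: dict) -> str:
    """Resolve cluster.routing.rebalance.enable using Elasticsearch
    precedence: transient overrides persistent, which overrides defaults."""
    for scope in ("transient", "persistent", "defaults"):
        value = (
            settings.get(scope, {})
            .get("cluster", {})
            .get("routing", {})
            .get("rebalance", {})
            .get("enable")
        )
        if value is not None:
            return value
    return "all"  # documented default when no scope reports a value

# Applied to the step 1 response shape:
response = json.loads(
    '{"defaults": {"cluster": {"routing": {"rebalance": {"enable": "all"}}}}}'
)
print(effective_rebalance_mode(response))  # -> all
```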

  2. Apply a persistent shard rebalancing mode.
    $ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
      "persistent": {
        "cluster.routing.rebalance.enable": "none"
      }
    }'
    {
      "acknowledged" : true,
      "persistent" : {
        "cluster" : {
          "routing" : {
            "rebalance" : {
              "enable" : "none"
            }
          }
        }
      },
      "transient" : { }
    }

    Replace none with primaries or replicas when rebalancing should continue for only one shard type instead of pausing completely.

    Secured clusters need https plus an authenticated request that can update cluster settings.

    Elastic recommends setting this back to all as soon as the maintenance window or operational constraint ends so the cluster can return to a balanced and fully resilient state.
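
    When the update is scripted rather than typed by hand, the request body can be built and validated before sending. A minimal Python sketch; the helper name and up-front validation are assumptions, not part of the Elasticsearch API.

```python
import json

# Values accepted by cluster.routing.rebalance.enable.
VALID_MODES = {"all", "primaries", "replicas", "none"}

def rebalance_settings_body(mode: str) -> str:
    """Build the PUT /_cluster/settings body that sets a persistent
    rebalance mode, rejecting unknown values before any request is sent."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    return json.dumps({"persistent": {"cluster.routing.rebalance.enable": mode}})

print(rebalance_settings_body("none"))
```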

  3. Read back the explicit override to confirm the new mode is active.
    $ curl -sS "http://localhost:9200/_cluster/settings?flat_settings=true&pretty"
    {
      "persistent" : {
        "cluster.routing.rebalance.enable" : "none"
      },
      "transient" : { }
    }

    GET /_cluster/settings returns only explicit overrides when include_defaults=true is omitted, and flat_settings=true keeps the response concise for quick checks.
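
    Because flat_settings=true returns each setting as a single dotted key, a script can read the override with one dictionary lookup instead of a nested traversal. A small Python illustration using the step 3 response shape:

```python
import json

flat = json.loads(
    '{"persistent": {"cluster.routing.rebalance.enable": "none"}, "transient": {}}'
)
# With flat_settings=true the whole setting name is one key.
mode = flat["persistent"].get("cluster.routing.rebalance.enable")
print(mode)  # -> none
```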

  4. Clear the override after maintenance to return to the default rebalance policy.
    $ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
      "persistent": {
        "cluster.routing.rebalance.enable": null
      }
    }'
    {
      "acknowledged" : true,
      "persistent" : { },
      "transient" : { }
    }

    Assigning null removes the explicit override instead of writing the literal string null. Restore the recorded pre-maintenance value instead when the cluster was intentionally using primaries or replicas.
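
    The difference between JSON null and the string "null" is easy to check locally; in Python, for example, None serializes to the null literal that clears the override.

```python
import json

# None becomes the JSON null literal, which removes the setting;
# the Python string "null" would instead be stored as a literal value.
body = json.dumps({"persistent": {"cluster.routing.rebalance.enable": None}})
print(body)  # {"persistent": {"cluster.routing.rebalance.enable": null}}
```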

  5. Confirm that the cluster has returned to the default rebalance mode.
    $ curl -sS "http://localhost:9200/_cluster/settings?include_defaults=true&filter_path=defaults.cluster.routing.rebalance.enable,persistent.cluster.routing.rebalance.enable,transient.cluster.routing.rebalance.enable&pretty"
    {
      "defaults" : {
        "cluster" : {
          "routing" : {
            "rebalance" : {
              "enable" : "all"
            }
          }
        }
      }
    }

    When an explicit persistent or transient value still appears here, that override is taking precedence over the built-in default.

  6. Wait for shard relocations to drain after re-enabling regular balancing.
    $ curl -sS "http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&timeout=60s&pretty"
    {
      "cluster_name" : "docker-cluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 1,
      "number_of_data_nodes" : 1,
      "active_primary_shards" : 4,
      "active_shards" : 4,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "unassigned_primary_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }

    Zero relocating_shards confirms no rebalance moves are still in flight. If the cluster remains uneven, check allocation filters, forced awareness, data tier rules, and cluster.routing.allocation.allow_rebalance.
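
    When step 6 runs inside a script, the drain condition is worth expressing as a predicate over the parsed health response. A hedged sketch, assuming the JSON has already been fetched; the function name is illustrative, not part of any client library.

```python
import json

def relocations_drained(health: dict) -> bool:
    """True when cluster health reports no shard relocations in flight
    and the wait_for_no_relocating_shards wait did not time out."""
    return not health.get("timed_out", True) and health.get("relocating_shards", -1) == 0

# Applied to the step 6 response:
health = json.loads('{"timed_out": false, "relocating_shards": 0, "status": "green"}')
print(relocations_drained(health))  # -> True
```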