Removing a node from an Elasticsearch cluster is routine during host replacement, hardware retirement, and capacity changes. Draining shards before the service stops keeps primary and replica copies available so searches and indexing can continue through the maintenance window.

Shard placement is controlled by the cluster allocator. Applying a temporary cluster.routing.allocation.exclude rule tells Elasticsearch to move shard copies away from the departing data node while the node stays online, giving the cluster time to rebalance before the process is stopped.

Since Elasticsearch 8.0, self-managed deployments enable TLS and authentication by default, so the API calls below may require https, credentials, and the cluster CA file instead of plain http. When the departing node is master-eligible, confirm the remaining master-eligible nodes can still form a quorum; removing half or more of them in a short period requires a temporary voting configuration exclusion before the node is stopped.

Steps to remove a node from an Elasticsearch cluster:

  1. List cluster nodes and note the exact node.name to remove.
    $ curl -sS "http://localhost:9200/_cat/nodes?v&h=ip,name,node.role,master"
    ip         name    node.role   master
    192.0.2.41 node-02 cdfhilmrstw *
    192.0.2.43 node-03 cdfhilmrstw -
    192.0.2.40 node-01 cdfhilmrstw -

    If the cluster uses the default security setup, replace http with https and add credentials plus the cluster CA (generated at /etc/elasticsearch/certs/http_ca.crt on package installs), for example:

    $ curl -sS --cacert /etc/elasticsearch/certs/http_ca.crt -u "elastic:%%password%%" "https://localhost:9200/_cat/nodes?v&h=ip,name,node.role,master"
  2. Check cluster health before starting the drain.
    $ curl -sS "http://localhost:9200/_cluster/health?pretty"
    {
      "cluster_name" : "search-cluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 3,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 3,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }

    Proceeding while status is red, or while primary shards are unassigned, can make indices unavailable during the removal.
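    That check can be scripted as a pre-flight gate. The sketch below assumes a plain-http, unauthenticated cluster and pulls the two relevant fields out of the health JSON with sed (jq is cleaner if it is available):

```shell
# Pre-flight gate sketch: refuse to start the drain if the cluster is
# red or any shards are unassigned. Adjust the URL, credentials, and
# --cacert for a secured cluster.
health="$(curl -sS "http://localhost:9200/_cluster/health")"
status="$(printf '%s' "$health" | sed -n 's/.*"status"[ ]*:[ ]*"\([a-z]*\)".*/\1/p')"
unassigned="$(printf '%s' "$health" | sed -n 's/.*"unassigned_shards"[ ]*:[ ]*\([0-9]*\).*/\1/p')"
if [ "$status" = "red" ] || [ "$unassigned" -gt 0 ]; then
  echo "refusing to drain: status=$status, unassigned=$unassigned" >&2
  exit 1
fi
echo "cluster is $status with no unassigned shards; safe to proceed"
```

    Blocking on any unassigned shard is stricter than strictly necessary (yellow only means missing replicas), but it is a safer default before removing capacity.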

  3. Exclude the node from shard allocation if it stores data.
    $ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{
      "persistent": {
        "cluster.routing.allocation.exclude._name": "node-03"
      }
    }'
    {
      "acknowledged" : true,
      "persistent" : {
        "cluster" : {
          "routing" : {
            "allocation" : {
              "exclude" : {
                "_name" : "node-03"
              }
            }
          }
        }
      },
      "transient" : { }
    }

    Use cluster.routing.allocation.exclude._id or cluster.routing.allocation.exclude._ip when a persistent node identifier or IP address is more reliable than node.name. Dedicated master-only and coordinating-only nodes with no shards can usually skip this step.
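    For example, the same drain keyed on the node's publish address (using the 192.0.2.43 address shown in the earlier _cat/nodes output) would be:

```shell
# Exclude by IP instead of node.name; useful when node names are reused
# or not unique across the fleet.
curl -sS -H "Content-Type: application/json" -X PUT \
  "http://localhost:9200/_cluster/settings" -d '{
  "persistent": {
    "cluster.routing.allocation.exclude._ip": "192.0.2.43"
  }
}'
```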

  4. Wait for shard relocation to finish before stopping a drained data node.
    $ curl -sS "http://localhost:9200/_cluster/health?wait_for_no_relocating_shards=true&timeout=30m&pretty"
    {
      "cluster_name" : "search-cluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 3,
      "number_of_data_nodes" : 3,
      "active_primary_shards" : 3,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }

    Increase timeout for large nodes or slower storage. Nodes without data can move to the service stop step after the cluster health and master-eligibility checks are complete.
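    If the wait_for call times out, a simple polling loop against the relo column of _cat/health gives a rough progress view. A sketch, again assuming a plain-http cluster:

```shell
# Poll the relocating-shards count until it reaches zero, reporting
# progress every 30 seconds; relo is a standard _cat/health column.
while :; do
  relo="$(curl -sS "http://localhost:9200/_cat/health?h=relo" | tr -d '[:space:]')"
  if [ "$relo" = "0" ]; then
    echo "relocation finished"
    break
  fi
  echo "still relocating $relo shard(s)..."
  sleep 30
done
```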

  5. Confirm the excluded node now hosts zero shards.
    $ curl -sS "http://localhost:9200/_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail,disk.total,disk.percent,host,ip,node"
    shards disk.indices disk.used disk.avail disk.total disk.percent host       ip         node
    3      10kb         40gb      18.2gb     58.3gb     68           192.0.2.40 192.0.2.40 node-01
    0      0b           40gb      18.2gb     58.3gb     68           192.0.2.43 192.0.2.43 node-03
    3      10kb         40gb      18.2gb     58.3gb     68           192.0.2.41 192.0.2.41 node-02

    If the shard count does not reach 0, relocation is usually blocked by another allocation filter, by disk-watermark pressure on the remaining nodes, or by index settings that pin shards in place.
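    The cluster allocation explain API reports exactly which deciders are blocking a move. With a body naming a shard, it explains that copy's current placement; the index name below is a placeholder, so substitute one from _cat/shards:

```shell
# Ask the cluster why a specific shard copy cannot leave the excluded
# node. "my-index" is a placeholder index name.
curl -sS -H "Content-Type: application/json" \
  "http://localhost:9200/_cluster/allocation/explain?pretty" -d '{
  "index": "my-index",
  "shard": 0,
  "primary": true
}'
```

    Called with an empty body, the same endpoint explains the first unassigned shard it finds.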

  6. Stop the Elasticsearch service on the node being removed.
    $ sudo systemctl stop elasticsearch

    If the departing node is master-eligible and its removal would take half or more of the master-eligible nodes out of service, add a voting configuration exclusion before stopping it:

    $ curl -sS -X POST "http://localhost:9200/_cluster/voting_config_exclusions?node_names=node-03"

    Clear it after the node has left the cluster:

    $ curl -sS -X DELETE "http://localhost:9200/_cluster/voting_config_exclusions"

    If the host is being permanently decommissioned, follow the stop with sudo systemctl disable elasticsearch so the node does not rejoin on the next boot.

  7. Verify the node is no longer listed after the service stops.
    $ curl -sS "http://localhost:9200/_cat/nodes?v&h=ip,name,node.role,master"
    ip         name    node.role   master
    192.0.2.41 node-02 cdfhilmrstw *
    192.0.2.40 node-01 cdfhilmrstw -
  8. Clear the temporary allocation exclusion after the node is offline.
    $ curl -sS -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{
      "persistent": {
        "cluster.routing.allocation.exclude._name": null
      }
    }'
    {
      "acknowledged" : true,
      "persistent" : { },
      "transient" : { }
    }

    Clearing the exclusion prevents it from blocking a replacement node that reuses the same node.name or address. If a voting configuration exclusion was added earlier, clear it separately after the node has fully left the cluster.
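    To double-check that nothing lingers, read back both places an exclusion can hide; the persistent and transient blocks should be empty, and the voting exclusion list (which lives in cluster state metadata, not cluster settings) should be empty too:

```shell
# Allocation exclusions live in cluster settings.
curl -sS "http://localhost:9200/_cluster/settings?pretty"
# Voting exclusions live in cluster state metadata.
curl -sS "http://localhost:9200/_cluster/state/metadata?filter_path=metadata.cluster_coordination.voting_config_exclusions&pretty"
```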

  9. Verify cluster health after the removal completes.
    $ curl -sS "http://localhost:9200/_cluster/health?pretty"
    {
      "cluster_name" : "search-cluster",
      "status" : "green",
      "timed_out" : false,
      "number_of_nodes" : 2,
      "number_of_data_nodes" : 2,
      "active_primary_shards" : 3,
      "active_shards" : 6,
      "relocating_shards" : 0,
      "initializing_shards" : 0,
      "unassigned_shards" : 0,
      "delayed_unassigned_shards" : 0,
      "number_of_pending_tasks" : 0,
      "number_of_in_flight_fetch" : 0,
      "task_max_waiting_in_queue_millis" : 0,
      "active_shards_percent_as_number" : 100.0
    }