A rolling upgrade keeps a self-managed Elasticsearch cluster available while nodes move to a newer supported release, which is the safest way to apply security fixes, minor-version changes, or a planned major-version transition without taking the whole cluster offline.
During a rolling upgrade, one node at a time leaves the cluster, is upgraded, and rejoins before the next node is touched. Running mixed versions is supported only for the duration of the upgrade, and order matters: upgraded nodes can join a cluster with an older master, but older nodes are not guaranteed to join once the master-eligible nodes have moved to the newer version.
Current Elastic guidance requires a supported upgrade path, a recent recovery snapshot, and version-aware preparation before the first restart. For major upgrades to 9.x, run the Upgrade Assistant from the latest 8.19 patch release first; secured clusters usually need HTTPS, authentication, and a trusted CA for every API call; and any extra plugins must match the target Elasticsearch version before the upgraded node starts.
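As part of that preparation, the deprecation info API can surface settings and features the next major version removes. A minimal sketch, wrapped in a shell function and reusing the cluster address and credentials from the examples below; adjust both for your environment, and prefer an API key over a plaintext password:

```shell
# Sketch: list cluster-wide deprecation warnings before the first restart.
# The address and credentials match the other examples in this guide and
# are placeholders, not a recommendation.
check_deprecations() {
  curl -sS --fail --user elastic:password \
    "https://cluster.example.net:9200/_migration/deprecations?pretty"
}
```

Critical-level entries should be resolved before any node restarts; warnings can often wait, but note them for the post-upgrade cleanup.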
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cat/nodes?v=true&h=name,ip,version,master,node.role"
name         ip         version master node.role
es-frozen-01 192.0.2.31 8.19.7  -      f
es-cold-01   192.0.2.21 8.19.7  -      c
es-warm-01   192.0.2.22 8.19.7  -      w
es-hot-01    192.0.2.11 8.19.7  -      h
es-ingest-01 192.0.2.41 8.19.7  -      i
es-master-01 192.0.2.51 8.19.7  *      m
Current Elastic docs say to upgrade data nodes first, tier by tier: frozen, then cold, warm, hot, and any remaining data nodes. Dedicated ML, ingest, and coordinating nodes follow, and master-eligible nodes go last.
For a major upgrade to 9.x, run the Upgrade Assistant from the latest 8.19 patch release before restarting any node.
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&pretty"
{
"cluster_name" : "search-prod",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 4,
"active_primary_shards" : 184,
"active_shards" : 368,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"active_shards_percent_as_number" : 100.0
}
Keep the API base URL, authentication, and CA path consistent across the rest of the upgrade commands.
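One way to keep those constants in one place is a small wrapper function. The variable names and CA path below are assumptions for illustration, not part of the official procedure:

```shell
# Sketch: centralize the base URL, credentials, and CA bundle so every
# upgrade command talks to the cluster the same way. The CA path and
# variable names are assumptions; adjust for your deployment.
ES_URL="https://cluster.example.net:9200"
ES_AUTH="elastic:password"
ES_CA="/etc/elasticsearch/certs/http_ca.crt"

es_api() {
  # Usage: es_api "/_cluster/health?pretty" [extra curl arguments...]
  path="$1"; shift
  curl -sS --fail --cacert "$ES_CA" --user "$ES_AUTH" "$ES_URL$path" "$@"
}
```

With a wrapper like this, a credential rotation or endpoint change during the upgrade window only has to be made once.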
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cat/snapshots/prod_repo?v&s=end_epoch:desc&h=id,status,end_epoch,duration,indices,successful_shards,failed_shards"
id                    status  end_epoch  duration indices successful_shards failed_shards
upgrade-2026.04.02-01 SUCCESS 1775113028 27s      184     368               0
Once any node has joined on the new version, rollback means restoring from a supported snapshot rather than downgrading that upgraded node in place.
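If the most recent snapshot in the listing is stale, take a fresh one before proceeding. A sketch against the prod_repo repository shown above; the upgrade-YYYY.MM.DD-NN naming convention is an assumption:

```shell
# Sketch: create a pre-upgrade snapshot in prod_repo and block until it
# completes. The snapshot naming convention is an assumption; use your own.
take_upgrade_snapshot() {
  name="upgrade-$(date +%Y.%m.%d)-01"
  curl -sS --fail --user elastic:password -X PUT \
    "https://cluster.example.net:9200/_snapshot/prod_repo/$name?wait_for_completion=true&pretty"
}
```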
$ curl -sS --fail --user elastic:password \
-X POST "https://cluster.example.net:9200/_ml/set_upgrade_mode?enabled=true"
{
"acknowledged" : true
}
Skip this step when the cluster does not use ML. Current Elastic guidance prefers upgrade mode over manually stopping every job and datafeed when temporary suspension is acceptable.
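When suspension is not acceptable, the manual alternative is to stop every datafeed and close every job explicitly. A sketch using the standard ML endpoints, with datafeeds stopped first so closed jobs are not immediately fed again:

```shell
# Sketch: the manual alternative to ML upgrade mode. Stop all datafeeds,
# then close all anomaly detection jobs, so no ML work runs during restarts.
stop_ml_manually() {
  curl -sS --fail --user elastic:password -X POST \
    "https://cluster.example.net:9200/_ml/datafeeds/_all/_stop"
  curl -sS --fail --user elastic:password -X POST \
    "https://cluster.example.net:9200/_ml/anomaly_detectors/_all/_close"
}
```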
$ curl -sS --fail --user elastic:password \
-H "Content-Type: application/json" -X PUT "https://cluster.example.net:9200/_cluster/settings?pretty" -d '{
"persistent": {
"cluster.routing.allocation.enable": "primaries"
}
}'
{
"acknowledged" : true,
"persistent" : {
"cluster" : {
"routing" : {
"allocation" : {
"enable" : "primaries"
}
}
}
},
"transient" : { }
}
Do this only when the node being upgraded currently holds data. Dedicated ingest, coordinating, ML, and master-only nodes do not need replica allocation disabled first.
$ curl -sS --fail --user elastic:password \
-X POST "https://cluster.example.net:9200/_flush?pretty"
{
"_shards" : {
"total" : 368,
"successful" : 368,
"failed" : 0
}
}
The flush is optional, but current Elastic guidance notes that recovery is faster when recent writes have already been committed to disk.
$ sudo systemctl stop elasticsearch.service
$ sudo apt-get update
Hit:1 https://artifacts.elastic.co/packages/9.x/apt stable InRelease
##### snipped #####
$ sudo apt-get install --only-upgrade elasticsearch
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  elasticsearch
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
##### snipped #####
Setting up elasticsearch (9.3.2) ...
Review package prompts carefully. Replacing /etc/elasticsearch/elasticsearch.yml, JVM options, or keystore-adjacent files incorrectly can keep the node from rejoining the cluster.
$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin list
analysis-icu
$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin remove analysis-icu
-> Removing analysis-icu...
-> Removed analysis-icu
$ sudo /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch analysis-icu
-> Installing analysis-icu
-> Installed analysis-icu
Repeat the remove-and-install cycle for each listed plugin so every plugin matches the node's new Elasticsearch version. Skip this step when no extra plugins are installed.
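On nodes with several plugins the cycle can be scripted. A sketch that reinstalls whatever `elasticsearch-plugin list` reports, to be run while the node is still stopped:

```shell
# Sketch: reinstall every installed plugin so each one matches the node's
# new Elasticsearch version. Assumes the default Debian install path.
reinstall_plugins() {
  bin=/usr/share/elasticsearch/bin/elasticsearch-plugin
  for plugin in $(sudo "$bin" list); do
    sudo "$bin" remove "$plugin"
    sudo "$bin" install --batch "$plugin"
  done
}
```

The --batch flag suppresses the interactive permission prompts, which matters when this runs unattended.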
$ sudo systemctl start elasticsearch.service
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cat/nodes?v=true&h=name,version,master,node.role"
name         version master node.role
es-frozen-01 9.3.2   -      f
es-cold-01   8.19.7  -      c
es-warm-01   8.19.7  -      w
es-hot-01    8.19.7  -      h
es-ingest-01 8.19.7  -      i
es-master-01 8.19.7  *      m
Leave cluster.initial_master_nodes unset during a rolling upgrade. The upgraded node is rejoining an existing cluster, not bootstrapping a new one.
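A quick way to verify this before the restart is to check the node's configuration file. A sketch, assuming the default Debian config path:

```shell
# Sketch: succeed only when the given elasticsearch.yml does not set
# cluster.initial_master_nodes; it must stay unset during a rolling upgrade.
bootstrap_unset() {
  ! grep -q '^cluster\.initial_master_nodes' "$1"
}
```

Typical use: `bootstrap_unset /etc/elasticsearch/elasticsearch.yml || echo "remove cluster.initial_master_nodes before restarting"`.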
$ curl -sS --fail --user elastic:password \
-H "Content-Type: application/json" -X PUT "https://cluster.example.net:9200/_cluster/settings?pretty" -d '{
"persistent": {
"cluster.routing.allocation.enable": null
}
}'
{
"acknowledged" : true,
"persistent" : { },
"transient" : { }
}
Skip this step when the upgraded node was not a data node, because allocation was never restricted for that restart.
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cat/health?v=true&h=cluster,status,relo,init,unassign,pending_tasks,active_shards_percent"
cluster     status relo init unassign pending_tasks active_shards_percent
search-prod yellow 0    0    2        0             99.5%
If there are no relocating or initializing shards, it is safe to continue even when status remains yellow after the first upgraded data node. Current Elastic docs note that replicas may stay unassigned until another node on the new version is available to receive them.
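When a replica stays unassigned longer than expected, the cluster allocation explain API reports why. A sketch; the index name and shard number are placeholders, and omitting the request body asks about the first unassigned shard instead:

```shell
# Sketch: ask why one replica shard is still unassigned. "my-index" and
# shard 0 are placeholders; send no body to explain the first unassigned shard.
explain_unassigned() {
  curl -sS --fail --user elastic:password \
    -H "Content-Type: application/json" \
    "https://cluster.example.net:9200/_cluster/allocation/explain?pretty" \
    -d '{"index": "my-index", "shard": 0, "primary": false}'
}
```

During a rolling upgrade the expected explanation is a version-compatibility decision, which clears once another node reaches the new version.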
Upgrade all data nodes tier-by-tier first: frozen, cold, warm, hot, then any remaining data nodes. After that, upgrade dedicated ML, ingest, and coordinating nodes, and upgrade master-eligible nodes last.
Never stop half or more of the master-eligible nodes at the same time: the cluster can lose its voting quorum and become unavailable, and older nodes may no longer be able to rejoin.
$ curl -sS --fail --user elastic:password \
-X POST "https://cluster.example.net:9200/_ml/set_upgrade_mode?enabled=false"
{
"acknowledged" : true
}
Skip this step when ML upgrade mode was never enabled.
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cat/nodes?v=true&h=name,ip,version,master,node.role"
name         ip         version master node.role
es-frozen-01 192.0.2.31 9.3.2   -      f
es-cold-01   192.0.2.21 9.3.2   -      c
es-warm-01   192.0.2.22 9.3.2   -      w
es-hot-01    192.0.2.11 9.3.2   -      h
es-ingest-01 192.0.2.41 9.3.2   -      i
es-master-01 192.0.2.51 9.3.2   *      m
$ curl -sS --fail --user elastic:password \
"https://cluster.example.net:9200/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&pretty"
{
"cluster_name" : "search-prod",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 6,
"number_of_data_nodes" : 4,
"active_primary_shards" : 184,
"active_shards" : 368,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"active_shards_percent_as_number" : 100.0
}
After a major upgrade, review archived settings and continue with compatible upgrades for Kibana and any ingest components that still run older versions.
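Cluster settings the new version no longer recognizes are preserved under the archived.* prefix. A sketch for listing them, then clearing them with the documented archived.* wildcard once they are confirmed safe to drop:

```shell
# Sketch: list archived (no-longer-recognized) cluster settings after the
# major upgrade, then clear them. Review the list before clearing anything.
list_archived_settings() {
  curl -sS --fail --user elastic:password \
    "https://cluster.example.net:9200/_cluster/settings?flat_settings=true&pretty" \
    | grep '"archived\.' || true
}

clear_archived_settings() {
  curl -sS --fail --user elastic:password \
    -H "Content-Type: application/json" -X PUT \
    "https://cluster.example.net:9200/_cluster/settings" \
    -d '{"persistent": {"archived.*": null}}'
}
```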