Rolling upgrades keep an Elasticsearch cluster serving reads and writes while moving to a newer version, avoiding a full outage for routine maintenance and security patching.
A rolling upgrade takes one node out of the cluster, upgrades it, and brings it back before moving on to the next node. Shards and replicas keep data available across remaining nodes, and temporarily restricting allocation prevents the cluster from “helpfully” reshuffling replicas while a node is intentionally offline.
The commands below assume Elasticsearch installed from Ubuntu or Debian packages and managed by systemd, and a supported rolling-upgrade path between the current and target versions. Ensure a recent snapshot exists, expect temporarily reduced redundancy while each node is offline, and adjust the cluster API URL and authentication in the curl examples to match the deployment.
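One way to keep the API URL and authentication consistent is to wrap curl in a small shell function and set the connection details once. The following is a minimal sketch, not part of the original procedure: the names es_curl, ES_URL, ES_USER, ES_PASS, and ES_CACERT are hypothetical, and the sketch assumes HTTP basic authentication with an optional CA certificate file.

```shell
#!/bin/sh
# Illustrative helper; every name here is an assumption, not an Elasticsearch convention.
# ES_URL    base URL of the cluster API (defaults to the URL used in this guide)
# ES_USER   optional user name for HTTP basic authentication
# ES_PASS   optional password for ES_USER
# ES_CACERT optional path to the CA certificate used to verify TLS
ES_URL="${ES_URL:-http://localhost:9200}"

# es_curl PATH [extra curl options...]
# Sends a request to ES_URL + PATH with the same options every time.
es_curl() {
  es_path="$1"
  shift
  # Prepend TLS and authentication options only when they are configured.
  if [ -n "${ES_CACERT:-}" ]; then
    set -- --cacert "$ES_CACERT" "$@"
  fi
  if [ -n "${ES_USER:-}" ]; then
    set -- --user "${ES_USER}:${ES_PASS:-}" "$@"
  fi
  curl --silent --show-error "$@" "${ES_URL}${es_path}"
}

# Example calls (commented out so that sourcing this file sends no requests):
#   es_curl "/_cluster/health?pretty"
#   es_curl "/_flush?pretty" -X POST
```

With such a helper, a secured cluster only requires exporting the variables once per shell session, and each request in the steps below shortens to the path and any extra options. Keeping a password in an environment variable is a convenience for a short maintenance window, not a recommendation for long-lived automation.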
Steps to run a rolling upgrade for Elasticsearch:
- Confirm the cluster is stable before taking any node offline.
$ curl --silent --show-error "http://localhost:9200/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&pretty"
{
  "cluster_name" : "search-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 1,
  "active_shards" : 2,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}
Keep the API base URL and curl authentication consistent across all requests, especially on clusters using TLS and secured HTTP.
- Record the current node list and versions before upgrading the first node.
$ curl --silent --show-error "http://localhost:9200/_cat/nodes?h=name,ip,version,master,node.role"
node-01 192.0.2.11 8.12.2 - cdfhilmrstw
node-03 192.0.2.13 8.12.2 - cdfhilmrstw
node-02 192.0.2.12 8.12.2 * cdfhilmrstw
- Restrict shard allocation to primaries before stopping the selected node.
$ curl --silent --show-error -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
  "transient": {
    "cluster.routing.allocation.enable": "primaries"
  }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : {
    "cluster" : {
      "routing" : {
        "allocation" : {
          "enable" : "primaries"
        }
      }
    }
  }
}
Replica allocation pauses while the node is offline, reducing unnecessary shard movement during the upgrade window.
- Flush shards to reduce recovery time when the upgraded node restarts.
$ curl --silent --show-error -X POST "http://localhost:9200/_flush?pretty"
{
  "_shards" : {
    "total" : 2,
    "successful" : 2,
    "failed" : 0
  }
}
- Stop the Elasticsearch service on the selected node.
$ sudo systemctl stop elasticsearch
- Upgrade the Elasticsearch package on the stopped node.
$ sudo apt update
##### snipped #####
$ sudo apt install --assume-yes elasticsearch
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be upgraded:
  elasticsearch
1 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
##### snipped #####
Setting up elasticsearch (8.19.9) ...
Review any package prompts carefully, since overwriting /etc/elasticsearch/elasticsearch.yml or JVM options can prevent the node from rejoining the cluster.
- Start the Elasticsearch service on the upgraded node.
$ sudo systemctl start elasticsearch
- Confirm the Elasticsearch service reports active state on the upgraded node.
$ sudo systemctl is-active elasticsearch
active
- Confirm the upgraded node has rejoined the cluster.
$ curl --silent --show-error "http://localhost:9200/_cat/nodes?h=name,version,master"
node-01 8.19.9 -
node-02 8.12.2 *
node-03 8.12.2 -
- Wait for the cluster to settle at yellow with no relocating or initializing shards.
$ curl --silent --show-error "http://localhost:9200/_cluster/health?wait_for_status=yellow&wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&pretty"
{
  "status" : "yellow",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "initializing_shards" : 0,
  "relocating_shards" : 0,
  "active_primary_shards" : 3,
  "active_shards" : 4,
  "unassigned_shards" : 2,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 66.66666666666666
}
Yellow is expected while allocation is restricted to primaries, since replica shards may remain unassigned until allocation is restored.
- Clear the temporary shard allocation restriction after the node is stable.
$ curl --silent --show-error -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings?pretty" -d '{
  "transient": {
    "cluster.routing.allocation.enable": null
  }
}'
{
  "acknowledged" : true,
  "persistent" : { },
  "transient" : { }
}
Setting a transient value to null removes the override and returns allocation behavior to the configured default.
- Wait for cluster health to return to green after replicas are allocated.
$ curl --silent --show-error "http://localhost:9200/_cluster/health?wait_for_status=green&wait_for_no_relocating_shards=true&wait_for_no_initializing_shards=true&pretty"
{
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 3,
  "active_shards" : 3,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0
}
- Repeat the node upgrade process for each remaining node.
Prefer upgrading data and ingest nodes first, and upgrade master-eligible nodes last unless the target version documentation specifies a different order.
- Verify all nodes report the expected version after the final node upgrade.
$ curl --silent --show-error "http://localhost:9200/_cat/nodes?h=name,ip,version"
node-01 192.0.2.11 8.19.9
node-03 192.0.2.13 8.19.9
node-02 192.0.2.12 8.19.9
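The health waits and the final version check in the steps above are easy to misread by eye, so they can be turned into pass/fail checks. The following is a minimal sketch, assuming the response formats shown in this guide: health_settled and all_nodes_on_version are hypothetical helper names, and the version is assumed to be the last column of the _cat/nodes output.

```shell
#!/bin/sh
# Illustrative checks; the function names are assumptions made for this sketch.

# health_settled: reads a _cluster/health response on stdin.
# Succeeds only when "timed_out" is false, meaning the wait_for_* conditions
# in the request were met before the request's timeout expired.
health_settled() {
  grep -Eq '"timed_out"[[:space:]]*:[[:space:]]*false'
}

# all_nodes_on_version EXPECTED: reads _cat/nodes output on stdin, where the
# version is the last column (for example h=name,ip,version).
# Prints every node still on another version and fails if any exist,
# or if no node lines were read at all.
all_nodes_on_version() {
  awk -v want="$1" '
    NF == 0     { next }
    $NF != want { print "not on " want ": " $0; bad = 1 }
                { seen++ }
    END         { if (bad || seen == 0) exit 1 }
  '
}

# Example calls (commented out so that sourcing this file sends no requests):
#   curl --silent --show-error "http://localhost:9200/_cluster/health?wait_for_status=green&pretty" | health_settled
#   curl --silent --show-error "http://localhost:9200/_cat/nodes?h=name,ip,version" | all_nodes_on_version 8.19.9
```

Checking timed_out matters because the health API answers even when the requested status was not reached in time, so a response alone does not prove the cluster settled. Treating empty input as a failure keeps a failed or unauthenticated request from passing the version check silently.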
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
