Allocation awareness keeps Elasticsearch shard copies separated across racks or availability zones so a single failure domain is less likely to take out both a primary and its replica.
Awareness works by tagging nodes with custom attributes such as node.attr.zone and telling the cluster to consider that attribute during shard placement via cluster.routing.allocation.awareness.attributes. When enough eligible nodes exist across domains, replica copies are placed so they do not share an attribute value with their primaries.
Every node that can hold shards must report the attribute, and the values must accurately reflect the intended fault domains, or shards can remain UNASSIGNED. Enabling awareness can trigger shard relocation activity, and secured clusters typically require HTTPS plus authentication for the cluster settings API. Replica counts must also be sized appropriately to achieve full separation across all domains.
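As a rough illustration, in a two-zone cluster one replica per primary yields one copy per zone. The replica count is a per-index setting and can be adjusted with a request along these lines (logs-000001 is the example index used later in this guide):
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/logs-000001/_settings" -d '{ "index.number_of_replicas": 1 }'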
Steps to configure Elasticsearch allocation awareness:
- Identify the failure-domain attribute and expected values for the cluster.
Common choices are zone (availability zone) or rack (rack ID), and the attribute name must match between node.attr.* and cluster.routing.allocation.awareness.attributes.
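Both can be combined if needed; a node would then carry two attributes, as in this minimal sketch (the rack value r101 is purely illustrative):
node.attr.zone: az1
node.attr.rack: r101
The awareness setting configured later would then list both names, comma-separated, as "zone,rack".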
- Set the node attribute in /etc/elasticsearch/elasticsearch.yml on each node.
node.attr.zone: az1
Use a distinct value per fault domain (for example, az1 and az2), and keep the attribute spelling consistent across all data nodes.
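A node in the second fault domain carries the same attribute name with its own value, for example:
node.attr.zone: az2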
- Restart the Elasticsearch service on the node to load the new attribute.
$ sudo systemctl restart elasticsearch
Restarting multiple data nodes at the same time can reduce shard availability; use a rolling restart in production.
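One common precaution, sketched here with the standard cluster settings API, is to restrict allocation to primaries before restarting a data node and re-enable it afterwards:
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }'
$ sudo systemctl restart elasticsearch
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{ "persistent": { "cluster.routing.allocation.enable": null } }'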
- Check that the nodes are reporting the attribute values to the cluster.
$ curl -s "http://localhost:9200/_nodes?filter_path=nodes.*.name,nodes.*.attributes&pretty" { "nodes" : { "a1b2c3d4e5f6g7h8" : { "name" : "es-data-1", "attributes" : { "zone" : "az1" } }, "i9j0k1l2m3n4o5p6" : { "name" : "es-data-2", "attributes" : { "zone" : "az2" } } } }Only nodes that can hold shards need the attribute; coordinating-only nodes can omit it.
- Enable allocation awareness at the cluster level.
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{ "persistent": { "cluster.routing.allocation.awareness.attributes": "zone" } }' { "acknowledged" : true }Secured clusters typically require HTTPS and authentication (basic auth or an API key) for the same request.
- Force awareness to the full set of values when shard copies must not collapse into a single surviving domain.
$ curl -s -H "Content-Type: application/json" -X PUT "http://localhost:9200/_cluster/settings" -d '{ "persistent": { "cluster.routing.allocation.awareness.force.zone.values": "az1,az2" } }' { "acknowledged" : true }Forced awareness can intentionally keep replicas UNASSIGNED when a value is missing, which may reduce availability until capacity returns.
- Verify the allocation awareness settings.
$ curl -s "http://localhost:9200/_cluster/settings?pretty&flat_settings=true" { "persistent" : { "cluster.routing.allocation.awareness.attributes" : "zone", "cluster.routing.allocation.awareness.force.zone.values" : "az1,az2" }, "transient" : { } } - Check cluster health for unassigned shards during or after relocation.
$ curl -s "http://localhost:9200/_cluster/health?pretty" { "cluster_name" : "es-cluster", "status" : "green", "number_of_nodes" : 2, "active_primary_shards" : 5, "active_shards" : 10, "relocating_shards" : 0, "initializing_shards" : 0, "unassigned_shards" : 0 }Relocation activity appears as non-zero relocating_shards or initializing_shards until balancing completes.
- Confirm primary and replica shards are allocated to nodes in different zones.
$ curl -s "http://localhost:9200/_cat/shards?v" index shard prirep state docs store ip node logs-000001 0 p STARTED 1200 1.2mb 10.0.0.11 es-data-1 logs-000001 0 r STARTED 1200 1.2mb 10.0.0.12 es-data-2 logs-000001 1 p STARTED 1187 1.1mb 10.0.0.12 es-data-2 logs-000001 1 r STARTED 1187 1.1mb 10.0.0.11 es-data-1 ##### snipped #####
Compare the node names in the shard listing with the reported node zone attributes to confirm copies are separated by fault domain.
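To make that comparison easier, the listing can be trimmed to the relevant columns (logs-000001 is the example index used above):
$ curl -s "http://localhost:9200/_cat/shards/logs-000001?v&h=index,shard,prirep,state,node"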
