How to reindex data in Elasticsearch

Reindexing in Elasticsearch involves copying documents from one index to another, optionally transforming the data as it moves.

This procedure uses the _reindex API, enabling flexible modifications such as changing field names, updating mappings, or consolidating data from multiple indices.

Because the mapping of an existing field cannot be changed in place, reindexing into a new index is the usual way to apply schema changes and keep queries running against an up-to-date structure.
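
As an illustration of transforming data while it moves, the request body can carry a Painless script. The sketch below builds such a body to rename one field. The helper name rename_body and the field names flag and tag are hypothetical and should be replaced with real names from the source index.

```shell
#!/bin/sh
# Build a _reindex body that renames one field while copying.
# $1 = source index, $2 = target index, $3 = old field, $4 = new field
rename_body() {
  cat <<EOF
{
  "source": { "index": "$1" },
  "dest":   { "index": "$2" },
  "script": {
    "lang": "painless",
    "source": "ctx._source['$4'] = ctx._source.remove('$3')"
  }
}
EOF
}

# "flag" and "tag" are hypothetical field names; this only prints the body.
rename_body old_index new_index flag tag
```

The printed body can be passed to curl with --data "$(rename_body old_index new_index flag tag)" against the _reindex endpoint. Documents that lack the old field would end up with a null value in the new field, so a guard in the script may be needed for sparse data.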

Steps to reindex data in Elasticsearch:

Prepare a new target index with the desired settings and mappings.

$ curl --request PUT --header "Content-Type: application/json" --data '{
  "settings": {
    "number_of_shards": 1
  }
}' http://localhost:9200/new_index
{"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}

The _reindex API does not copy settings or mappings from the source index, so creating the target index first gives control over shards, replicas, and field types.
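
When the goal is to change field types, the create request can also include a mappings section. The following is a minimal sketch, assuming two hypothetical fields named title and created_at and an ES_URL variable that defaults to the address used in this article. The curl call only runs when RUN_CURL=1 is set, so the body can be reviewed first.

```shell
#!/bin/sh
# Base URL of the cluster; the default matches the examples in this article.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print the create-index body. The field names "title" and "created_at"
# are hypothetical placeholders for the real schema.
index_body() {
  cat <<'EOF'
{
  "settings": { "number_of_shards": 1, "number_of_replicas": 1 },
  "mappings": {
    "properties": {
      "title":      { "type": "text" },
      "created_at": { "type": "date" }
    }
  }
}
EOF
}

# Send the request only when explicitly asked to (RUN_CURL=1).
if [ "${RUN_CURL:-0}" = "1" ]; then
  curl --request PUT --header "Content-Type: application/json" \
       --data "$(index_body)" "$ES_URL/new_index"
fi
```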

Use the reindex endpoint to copy documents from the source to the target.

$ curl --request POST --header "Content-Type: application/json" --data '{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}' http://localhost:9200/_reindex
{"took":30,"total":10000,"updated":0,"created":10000,"deleted":0,"batches":10,"noops":0,"retries":{...},"throttled":0,"throttled_until":0,"failures":[]}

Check the failures array in the response to catch indexing errors early.
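
In a script, the check for an empty failures array can be automated. The sketch below uses plain string matching rather than a JSON parser, which is an assumption made to keep it dependency-free. The function name reindex_ok and the sample response are illustrative only.

```shell
#!/bin/sh
# Return 0 when a _reindex response reports no failures, 1 otherwise.
# Whitespace is stripped first so pretty-printed JSON is handled too.
reindex_ok() {
  compact=$(printf '%s' "$1" | tr -d ' \n\t')
  case "$compact" in
    *'"failures":[]'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Sample response shaped like the reindex output (values are illustrative).
response='{"took":30,"total":10000,"created":10000,"failures":[]}'

if reindex_ok "$response"; then
  echo "reindex finished without failures"
else
  echo "reindex reported failures" >&2
fi
```

A response that lacks the failures key altogether, such as an error body, is also treated as a failure by this check.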

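If the reindex was started asynchronously (with wait_for_completion=false), check the task list until the reindex task is no longer running.

$ curl --request GET "http://localhost:9200/_tasks?detailed=true&actions=*reindex"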
{
  "nodes": {...}
}
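
The reindex call shown earlier blocks until it finishes. For large indices it can instead be started with the wait_for_completion=false query parameter, which returns a task id that can be polled. The sketch below assumes that flow. The helper names task_id and wait_for_task, the five-second interval, and the RUN_CURL switch are choices made for this example.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"

# Extract the task id from the body returned by
# POST _reindex?wait_for_completion=false, e.g. {"task":"node:123"}.
task_id() {
  printf '%s\n' "$1" |
    sed -n 's/.*"task"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Poll GET _tasks/<id> until the response contains "completed":true.
# Returns 1 if the API answers with an error before that (e.g. unknown task).
wait_for_task() {
  while :; do
    status=$(curl --silent "$ES_URL/_tasks/$1" | tr -d ' \n\t')
    case "$status" in
      *'"completed":true'*) return 0 ;;
      *'"error"'*) return 1 ;;
    esac
    sleep 5
  done
}

# Start the background reindex only when RUN_CURL=1 is set.
if [ "${RUN_CURL:-0}" = "1" ]; then
  reply=$(curl --silent --request POST \
    --header "Content-Type: application/json" \
    --data '{"source":{"index":"old_index"},"dest":{"index":"new_index"}}' \
    "$ES_URL/_reindex?wait_for_completion=false")
  wait_for_task "$(task_id "$reply")" && echo "reindex task completed"
fi
```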

Refresh the new_index to make newly indexed documents immediately searchable.

```
$ curl --request POST http://localhost:9200/new_index/_refresh
{"_shards":{"total":1,"successful":1,"failed":0}}
```

Reindexing is I/O-intensive, so avoid running it frequently on a production cluster, or throttle it with the requests_per_second query parameter.

Verify the migrated documents by performing a search.

$ curl --request GET --silent http://localhost:9200/new_index/_search | jq
{
  "took":5,
  "hits":{"total":{"value":10000,"relation":"eq"},"hits":[...]}
}
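
Beyond a search, comparing document counts between the two indices gives a quick sanity check. The sketch below assumes the _count endpoint and a straight copy with no filtering query or script that drops documents. The helper name count_of and the RUN_CURL switch are specific to this example.

```shell
#!/bin/sh
ES_URL="${ES_URL:-http://localhost:9200}"

# Pull the number out of a _count response such as {"count":10000,...}.
count_of() {
  printf '%s\n' "$1" |
    sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Query both indices only when RUN_CURL=1 is set.
if [ "${RUN_CURL:-0}" = "1" ]; then
  old=$(count_of "$(curl --silent "$ES_URL/old_index/_count")")
  new=$(count_of "$(curl --silent "$ES_URL/new_index/_count")")
  if [ "$old" = "$new" ]; then
    echo "counts match: $new"
  else
    echo "count mismatch: old_index=$old new_index=$new" >&2
  fi
fi
```

The counts are only comparable if the source index received no writes during the copy and the target has been refreshed.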

Remove or decommission the old_index once the new one is validated.

Consider using reindex-from-remote if the source resides in a different cluster.
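
For reindex-from-remote, the source section of the request body gains a remote object with the address of the other cluster. The sketch below prints such a body. The hostname otherhost and the helper name remote_body are hypothetical, and the destination cluster is assumed to list the remote address in its reindex.remote.whitelist setting.

```shell
#!/bin/sh
# Build a reindex-from-remote body.
# $1 = remote cluster URL including scheme, host, and port
remote_body() {
  cat <<EOF
{
  "source": {
    "remote": { "host": "$1" },
    "index": "old_index"
  },
  "dest": { "index": "new_index" }
}
EOF
}

# "otherhost" is a hypothetical hostname for the source cluster.
remote_body "http://otherhost:9200"
```

The body is posted to the _reindex endpoint of the destination cluster, which then pulls documents from the remote source.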

For related configuration guidance, see: index-create

Author: Mohd Shakir Zakaria
Mohd Shakir Zakaria is a cloud architect with deep roots in software development and open-source advocacy. Certified in AWS, Red Hat, VMware, ITIL, and Linux, he specializes in designing and managing robust cloud and on-premises infrastructures.
