Reindexing in Elasticsearch involves copying documents from one index to another, optionally transforming the data as it moves.

This procedure uses the _reindex API, which supports modifications such as renaming fields with a script, applying new mappings through the target index, or consolidating data from multiple indices.
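As an illustration of the field-renaming and consolidation cases, the following sketch builds a _reindex request body that reads from two source indices and renames one field with a Painless script. The index names logs_2023, logs_2024 and logs_all, the fields user and username, the ES_URL variable, and both helper functions are hypothetical and exist only for this example.

```shell
# Hypothetical helper: prints a _reindex body that merges two indices
# and renames the field "user" to "username" while documents are copied.
reindex_rename_body() {
  cat <<'EOF'
{
  "source": {
    "index": ["logs_2023", "logs_2024"]
  },
  "dest": {
    "index": "logs_all"
  },
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.containsKey('user')) { ctx._source.username = ctx._source.remove('user'); }"
  }
}
EOF
}

# Hypothetical helper: sends the body to the _reindex endpoint.
# Assumes a node on localhost:9200 unless ES_URL is set.
submit_reindex_rename() {
  reindex_rename_body | curl --silent --request POST \
    --header "Content-Type: application/json" \
    --data @- "${ES_URL:-http://localhost:9200}/_reindex"
}
```

Calling submit_reindex_rename sends the request. Printing the body with reindex_rename_body first makes it easy to review the script before any documents are touched.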

Reindexing lets administrators apply schema changes that cannot be made in place, such as a different field type or primary shard count, so that the index structure keeps pace with evolving requirements and query patterns.

Steps to reindex data in Elasticsearch:

  1. Prepare a new target index with the desired settings and mappings.
    $ curl --request PUT --header "Content-Type: application/json" --data '{
      "settings": {
        "number_of_shards": 1
      }
    }' http://localhost:9200/new_index
    {"acknowledged":true,"shards_acknowledged":true,"index":"new_index"}

    The _reindex API does not copy settings or mappings from the source index, so creating the target first gives control over shards, replicas, and field types.

  2. Use the reindex endpoint to copy documents from the source to the target.
    $ curl --request POST --header "Content-Type: application/json" --data '{
      "source": {
        "index": "old_index"
      },
      "dest": {
        "index": "new_index"
      }
    }' http://localhost:9200/_reindex
    {"took":30,"total":10000,"updated":0,"created":10000,"deleted":0,"batches":10,"noops":0,"retries":{...},"throttled":0,"throttled_until":0,"failures":[]}

    Monitor the response for failures to catch indexing errors early.

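  3. Wait for the reindex task to complete if it runs asynchronously, that is, if the request in the previous step was sent with wait_for_completion=false.
    $ curl --request GET "http://localhost:9200/_tasks?detailed=true&actions=*reindex"
    {
      "nodes": {...}
    }
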
  4. Refresh the new_index to make newly indexed documents immediately searchable.
    $ curl --request POST http://localhost:9200/new_index/_refresh
    {"_shards":{"total":1,"successful":1,"failed":0}}

    Reindexing competes with regular traffic for cluster resources, so avoid running it frequently in production, or throttle it with the requests_per_second parameter.

  5. Verify the migrated documents by performing a search.
    $ curl --request GET --silent http://localhost:9200/new_index/_search | jq
    {
      "took":5,
      "hits":{"total":{"value":10000,"relation":"eq"},"hits":[...]}
    }

  6. Remove or decommission the old_index once the new one is validated.
  7. Consider using reindex-from-remote if the source resides in a different cluster.
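
The steps above can be combined into one script for large indices, where holding a single HTTP request open is impractical. The sketch below is one possible arrangement, not a definitive implementation. It assumes a node at http://localhost:9200 (overridable through the hypothetical ES_URL variable), an existing target index, and that jq is installed. All function names are made up for this example. It submits the reindex with wait_for_completion=false, polls the task API until the task reports completion, refreshes the target, and prints document counts from the _count API.

```shell
# Assumption: a single node reachable here; override by exporting ES_URL.
ES_URL="${ES_URL:-http://localhost:9200}"

# Print a minimal _reindex request body for a source and a target index.
reindex_body() {
  printf '{"source":{"index":"%s"},"dest":{"index":"%s"}}\n' "$1" "$2"
}

# Start the reindex as a background task and print its task ID.
# With wait_for_completion=false the response has the form
# {"task":"<node_id>:<task_number>"}.
start_reindex() {
  reindex_body "$1" "$2" | curl --silent --request POST \
    --header "Content-Type: application/json" --data @- \
    "$ES_URL/_reindex?wait_for_completion=false" | jq --raw-output '.task'
}

# Poll the task API every 5 seconds until the task reports "completed": true.
wait_for_task() {
  until curl --silent "$ES_URL/_tasks/$1" \
      | jq --exit-status '.completed == true' > /dev/null; do
    sleep 5
  done
}

# Print the number of documents in an index using the _count API.
doc_count() {
  curl --silent "$ES_URL/$1/_count" | jq '.count'
}

# Run the whole sequence: submit, wait, refresh, compare counts.
reindex_and_verify() {
  task="$(start_reindex "$1" "$2")"
  wait_for_task "$task"
  curl --silent --request POST "$ES_URL/$2/_refresh" > /dev/null
  echo "$1: $(doc_count "$1") documents, $2: $(doc_count "$2") documents"
}
```

Running reindex_and_verify old_index new_index prints both counts for a manual comparison. The polling loop has no timeout and does not inspect the failures reported in the finished task, so real use would need to add both checks.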

For related configuration guidance, see: index-create
