Wednesday, June 15, 2016

Elasticsearch document routing (or allocation)

When you index a document, it is stored on a single primary shard. How does Elasticsearch know which shard a document belongs to? When we create a new document, how does it know whether it should store that document on shard 1 or shard 2?

The process can’t be random, since we may need to retrieve the document in the future. In fact, it is determined by a simple formula:
 
    shard = hash(routing) % number_of_primary_shards

The routing value is an arbitrary string, which defaults to the document’s _id but can also be set to a custom value. This routing string is passed through a hashing function to generate a number, which is divided by the number of primary shards in the index to return the remainder. The remainder will always be in the range 0 to number_of_primary_shards - 1, and gives us the number of the shard where a particular document lives.
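To make the formula concrete, here is a minimal Python sketch of the routing step. It is an illustration, not Elasticsearch's code: `zlib.crc32` stands in for the real hash function (Elasticsearch uses Murmur3), and the helper names `shard_for` and `route` are hypothetical. Only the "hash, then modulo the primary shard count" logic and the default-to-`_id` rule come from the text above.

```python
import zlib
from typing import Optional


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.

    Assumption: zlib.crc32 is a stand-in for Elasticsearch's real hash
    (Murmur3). It is deterministic, which is the property that matters:
    the same routing value always maps to the same shard.
    """
    h = zlib.crc32(routing.encode("utf-8"))  # non-negative int in Python 3
    return h % number_of_primary_shards      # always in 0..number_of_primary_shards - 1


def route(doc_id: str, number_of_primary_shards: int,
          routing: Optional[str] = None) -> int:
    # The routing value defaults to the document's _id unless a custom one is given.
    value = routing if routing is not None else doc_id
    return shard_for(value, number_of_primary_shards)


if __name__ == "__main__":
    print(route("1", 5))                     # shard chosen from the _id "1"
    print(route("a", 5, routing="user-42"))  # custom routing value
    print(route("b", 5, routing="user-42"))  # same routing value, so same shard as "a"
```

Because the result depends only on the routing value and the shard count, a later GET for the same `_id` (or the same custom routing value) computes the same shard. Documents that share a custom routing value always end up on the same shard.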

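This is why the number of primary shards is fixed once an index has been created. Changing it would change the divisor in the formula, so most existing documents would now hash to a different shard than the one they were stored on, and lookups by routing value would go to the wrong shard.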
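A quick way to see the scale of the problem is to count how many documents would resolve to a different shard if the primary shard count changed. This is a hedged sketch: `zlib.crc32` again stands in for Elasticsearch's real Murmur3 hash, the document IDs are synthetic, and `shard_for` and `moved_fraction` are hypothetical helpers written for this illustration.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    # Stand-in hash (zlib.crc32), not Elasticsearch's real Murmur3 hash.
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose computed shard differs when the
    primary shard count changes from old_shards to new_shards."""
    moved = sum(
        1 for d in doc_ids
        if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(doc_ids)


ids = [str(i) for i in range(10_000)]  # synthetic _id values
print(f"{moved_fraction(ids, 5, 6):.0%} of documents would be looked up on the wrong shard")
```

Going from 5 to 6 primary shards, a hash keeps its shard only when `h % 5 == h % 6`, which for a roughly uniform hash is about one case in six. So the large majority of documents would no longer be found where the formula now points.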

That covers how Elasticsearch decides which shard a document goes to within an index. A separate mechanism decides where each primary shard and its replica shards are allocated across the nodes of the cluster.

Document routing, together with the allocation of primary and replica shards, makes up the complete picture of allocation in Elasticsearch. In my view, this is a bit confusing to grasp when you first start learning Elasticsearch.
