The process can’t be random, since we may need to retrieve the document in the future. In fact, it is determined by a simple formula:
shard = hash(routing) % number_of_primary_shards
The routing value is an arbitrary string, which defaults to the document’s _id but can also be set to a custom value. This routing string is passed through a hashing function to generate a number, which is divided by the number of primary shards in the index to return the remainder. The remainder will always be in the range 0 to number_of_primary_shards - 1, and gives us the number of the shard where a particular document lives. This is why the number of primary shards should not change after an index has been created: with a different shard count, the same routing value would resolve to a different shard, and previously indexed documents could no longer be found where the formula says they are.
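As a rough illustration of the formula, here is a minimal Python sketch. The function name shard_for and the sample document IDs are hypothetical, and zlib.crc32 is only a stand-in for a deterministic hash; it is not the hash function Elasticsearch itself uses, so the shard numbers it produces will not match a real cluster. The second half shows what happens when the same routing values are resolved against a different number of primary shards.

```python
import zlib


def shard_for(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value to a shard number (illustrative only)."""
    # Assumption: crc32 stands in for Elasticsearch's real hash function.
    # Any deterministic hash demonstrates the same idea.
    hashed = zlib.crc32(routing.encode("utf-8"))
    # The remainder is always in the range 0 .. number_of_primary_shards - 1.
    return hashed % number_of_primary_shards


# Hypothetical document IDs, used as the default routing value.
doc_ids = [f"doc-{i}" for i in range(1000)]

# Shard assignment with 5 primary shards, then with 6.
before = {doc_id: shard_for(doc_id, 5) for doc_id in doc_ids}
after = {doc_id: shard_for(doc_id, 6) for doc_id in doc_ids}

# Count the documents whose computed shard changes with the shard count.
moved = sum(1 for doc_id in doc_ids if before[doc_id] != after[doc_id])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Any document counted in moved would be looked up on a shard that does not hold it, which is the problem described above.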
This is how Elasticsearch decides which shard of an index a document goes to. Separately, there is a mechanism that determines where each primary shard and each replica shard is allocated across the cluster.
Document routing, together with primary and replica shard allocation, makes up the complete picture of allocation in Elasticsearch, which in my view is a bit confusing to grasp when first learning Elasticsearch.