ADBMS - Unit 3

Consistent Hashing

• Cassandra and other Dynamo-based databases distribute data throughout the cluster by using consistent hashing.

• The rowkey (analogous to a primary key in an RDBMS) is hashed.

• Each node is allocated a range of hash values, and the node that has the
specific range for a hashed key value takes responsibility for the initial
placement of that data.
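
As a minimal sketch of the hash-then-place step (Python, with MD5 standing in for Cassandra's actual Murmur3 partitioner hash; the names here are illustrative, not Cassandra's code):

    import hashlib

    def token_for(rowkey: str) -> int:
        # Hash the rowkey and fold the first 8 digest bytes into a
        # signed 64-bit token. MD5 is only a stand-in here; Cassandra's
        # default partitioner actually uses Murmur3.
        digest = hashlib.md5(rowkey.encode("utf-8")).digest()
        unsigned = int.from_bytes(digest[:8], "big")
        return unsigned - 2**63  # shift into -2^63 .. 2^63 - 1

    print(token_for("user:1001"))  # a token somewhere in -2^63 .. 2^63 - 1

Whichever node's range covers this token is responsible for placing the row.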
• In the default Cassandra partitioning scheme, the hash values range from -2^63 to 2^63 - 1.

• Therefore, if there were four nodes in the cluster and we wanted to assign equal numbers of hashes to each node, then the hash ranges for each would be approximately as follows:

Node 1: -2^63 to -2^62 - 1
Node 2: -2^62 to -1
Node 3: 0 to 2^62 - 1
Node 4: 2^62 to 2^63 - 1
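
These boundaries fall out of splitting the 2^64-value token space evenly; a quick sketch of the arithmetic (node count and names are illustrative):

    # Split the signed 64-bit token space evenly among N nodes.
    N = 4
    LOW = -2**63
    SPAN = 2**64 // N  # tokens per node
    for i in range(N):
        start = LOW + i * SPAN
        end = start + SPAN - 1
        print(f"Node {i + 1}: {start} .. {end}")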
• We usually visualize the cluster as a ring: the circumference of the ring represents all
the possible hash values, and the location of the node on the ring represents its area of
responsibility.

• Figure 8-10 illustrates simple consistent hashing: the value for a rowkey is hashed,
which determines its position on “the ring.”

• Nodes in the cluster take responsibility for ranges of values within the ring, and
therefore take ownership of specific rowkey values.
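
A sketch of that lookup, assuming a simplified ring with one token per node (tokens taken from the four ranges above): the owning node is the first one whose token is at or past the key's token, wrapping past the top of the ring.

    import bisect

    # One illustrative token per node; each node owns the region ending
    # at its own token (i.e., between its predecessor's token and its own).
    ring = [(-2**62 - 1, "node1"), (-1, "node2"),
            (2**62 - 1, "node3"), (2**63 - 1, "node4")]
    tokens = [t for t, _ in ring]

    def owner(token: int) -> str:
        # First node token >= the key's token; wrap to index 0 past the top.
        i = bisect.bisect_left(tokens, token)
        return ring[i % len(ring)][1]

    print(owner(42))  # token 42 falls in node3's range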
• The four-node cluster in Figure 8-10 is well balanced because every node is
responsible for hash ranges of similar magnitude. But we risk unbalancing the cluster
as we add nodes.

• If we double the number of nodes in the cluster, then we can assign the new nodes at
points on the ring between existing nodes and the cluster will remain balanced.

• However, doubling the cluster is usually impractical: it’s more economical to grow the
cluster incrementally.
• Early versions of Cassandra had two options when adding a new node.

• We could either remap all the hash ranges, or we could map the new node
within an existing range.

• In the first option we obtain a balanced cluster, but only after an expensive
rebalancing process.
• In the second option the cluster becomes unbalanced; since each node is
responsible for the region of the ring between itself and its predecessor,
adding a new node without changing the ranges of other nodes essentially
splits a region in half.

• Figure 8-11 shows how adding a node to the cluster can unbalance the
distribution of hash key ranges.
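
The effect is easy to reproduce with the single-token ring sketched earlier; adding one token (a made-up value here) halves one region and leaves the rest untouched:

    # Region size = distance from the predecessor's token to the node's own.
    def region_sizes(tokens):
        tokens = sorted(tokens)
        sizes = []
        for i, t in enumerate(tokens):
            prev = tokens[i - 1] if i > 0 else tokens[-1] - 2**64
            sizes.append(t - prev)
        return sizes

    before = [-2**62 - 1, -1, 2**62 - 1, 2**63 - 1]
    after = sorted(before + [2**61])  # new node lands inside node3's range
    print(region_sizes(before))  # four equal regions
    print(region_sizes(after))   # node3's old region split roughly in half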
• Virtual nodes, implemented in Cassandra, Riak, and many other Dynamo-based systems,
provide a solution to this issue.

• When using virtual nodes, the hash ranges are calculated for a relatively large number of
virtual nodes—256 virtual nodes per physical node, typically—and these virtual nodes are
assigned to physical nodes.

• Now when a new node is added, specific virtual nodes can be reallocated to the new node,
resulting in a balanced configuration with minimal overhead. Figure 8-12 illustrates the
relationship between virtual nodes and physical nodes.
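
A sketch of the vnode scheme under a simplified model (random token placement; the 256 figure matches Cassandra's classic num_tokens default, everything else is illustrative):

    import random

    VNODES_PER_NODE = 256  # Cassandra's classic num_tokens default

    def build_ring(node_names, seed=7):
        # Each physical node takes many randomly placed tokens on the ring.
        rng = random.Random(seed)
        ring = [(rng.randrange(-2**63, 2**63), name)
                for name in node_names
                for _ in range(VNODES_PER_NODE)]
        return sorted(ring)

    ring = build_ring(["node1", "node2", "node3", "node4"])

    # When node5 joins, hand it roughly 1/5 of the existing vnode tokens,
    # taken evenly from across the ring, so each node sheds a little load
    # and the cluster stays balanced with no wholesale remapping.
    for i in range(0, len(ring), 5):
        ring[i] = (ring[i][0], "node5")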
Order-Preserving Partitioning
• The Cassandra partitioner determines how keys are distributed across nodes.

• The default partitioner uses consistent hashing, as described in the previous section.

• Cassandra also supports order-preserving partitioners that distribute data across the nodes of the cluster as ranges of actual (i.e., not hashed) row keys.
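
A sketch of the contrast, using hypothetical key boundaries: with an order-preserving partitioner, a node's range is defined over the raw keys themselves, so lookups (and range scans) use plain sort order rather than a hash.

    import bisect

    # Hypothetical split points: node i owns raw keys up to boundaries[i],
    # in plain sort order rather than hash order.
    boundaries = ["g", "n", "t", "~"]
    nodes = ["node1", "node2", "node3", "node4"]

    def owner(rowkey: str) -> str:
        return nodes[bisect.bisect_left(boundaries, rowkey)]

    print(owner("apple"))  # node1
    print(owner("melon"))  # node2
    # Because keys stay sorted, a range scan such as "h" .. "m" touches
    # only node2; a hash partitioner would scatter those keys everywhere.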
