Consistent Hashing

• Cassandra and other Dynamo-based databases distribute data throughout the cluster by using consistent hashing.

• The rowkey (analogous to a primary key in an RDBMS) is hashed.

• Each node is allocated a range of hash values, and the node that has the
specific range for a hashed key value takes responsibility for the initial
placement of that data.
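
A minimal Python sketch of this hashing step (illustrative only; Cassandra's default partitioner actually uses the Murmur3 hash, not MD5, but any hash that maps a rowkey to a 64-bit token shows the idea):

```python
import hashlib

def token_for(rowkey: str) -> int:
    # Hash the rowkey and fold the first 8 bytes into a signed 64-bit token.
    digest = hashlib.md5(rowkey.encode("utf-8")).digest()
    unsigned = int.from_bytes(digest[:8], "big")
    return unsigned - 2**63          # shift into the signed range -2^63 .. 2^63 - 1

print(token_for("user:1001"))        # some token in [-2^63, 2^63 - 1]
```

Whichever node's hash range contains this token is responsible for the initial placement of the row.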
Consistent Hashing
• In the default Cassandra partitioning scheme, the hash values range from -2^63 to 2^63 - 1.

• Therefore, if there were four nodes in the cluster and we wanted to assign equal numbers of hashes to each node, then the hash ranges for each would be approximately as follows:
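
A small Python sketch that computes those four ranges (a hypothetical helper, not Cassandra's own code):

```python
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def hash_ranges(num_nodes: int):
    # Split the full token space into num_nodes contiguous, equal-sized ranges.
    span = 2**64 // num_nodes
    ranges = []
    for i in range(num_nodes):
        lo = MIN_TOKEN + i * span
        hi = MIN_TOKEN + (i + 1) * span - 1 if i < num_nodes - 1 else MAX_TOKEN
        ranges.append((lo, hi))
    return ranges

for node, (lo, hi) in enumerate(hash_ranges(4), start=1):
    print(f"Node {node}: {lo} .. {hi}")
```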
Consistent Hashing
• We usually visualize the cluster as a ring: the circumference of the ring represents all
the possible hash values, and the location of the node on the ring represents its area of
responsibility.

• Figure 8-10 illustrates simple consistent hashing: the value for a rowkey is hashed,
which determines its position on “the ring.”

• Nodes in the cluster take responsibility for ranges of values within the ring, and
therefore take ownership of specific rowkey values.
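
A sketch of that ring lookup, using hypothetical node names and the four equal ranges from above; the search walks "clockwise" to the first node position at or past the token:

```python
import bisect

# Each node's ring position is the upper end of its token range.
positions = [-2**62 - 1, -1, 2**62 - 1, 2**63 - 1]
nodes = ["node1", "node2", "node3", "node4"]

def owner(token: int) -> str:
    # First position >= token owns it; wrap around the ring if we fall off the end.
    i = bisect.bisect_left(positions, token)
    return nodes[i % len(nodes)]

print(owner(-42))           # node2
print(owner(2**63 - 1))     # node4
```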
Consistent Hashing
• The four-node cluster in Figure 8-10 is well balanced because every node is
responsible for hash ranges of similar magnitude. But we risk unbalancing the cluster
as we add nodes.

• If we double the number of nodes in the cluster, then we can assign the new nodes at
points on the ring between existing nodes and the cluster will remain balanced.

• However, doubling the cluster is usually impractical: it’s more economical to grow the
cluster incrementally.
Consistent Hashing
• Early versions of Cassandra had two options when adding a new node.

• We could either remap all the hash ranges, or we could map the new node
within an existing range.

• In the first option we obtain a balanced cluster, but only after an expensive
rebalancing process.
Consistent Hashing
• In the second option the cluster becomes unbalanced; since each node is
responsible for the region of the ring between itself and its predecessor,
adding a new node without changing the ranges of other nodes essentially
splits a region in half.

• Figure 8-11 shows how adding a node to the cluster can unbalance the
distribution of hash key ranges.
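
A hypothetical illustration of that imbalance: a fifth node placed at the midpoint of node2's range takes half of node2's load and nothing from the other nodes.

```python
# Node positions are the upper ends of their ranges (hypothetical layout).
positions = {"node1": -2**62 - 1, "node2": -1, "node3": 2**62 - 1, "node4": 2**63 - 1}
positions["node5"] = -2**61 - 1   # new node splits node2's former region in half

prev = -2**63
for name, pos in sorted(positions.items(), key=lambda kv: kv[1]):
    share = (pos - prev + 1) / 2**64
    print(f"{name}: {share:.1%} of the token space")   # node2 and node5 get 12.5% each
    prev = pos + 1
```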
Consistent Hashing
• Virtual nodes, implemented in Cassandra, Riak, and many other Dynamo-based systems,
provide a solution to this issue.

• When using virtual nodes, the hash ranges are calculated for a relatively large number of
virtual nodes—256 virtual nodes per physical node, typically—and these virtual nodes are
assigned to physical nodes.

• Now when a new node is added, specific virtual nodes can be reallocated to the new node,
resulting in a balanced configuration with minimal overhead. Figure 8-12 illustrates the
relationship between virtual nodes and physical nodes.
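
A rough sketch of the vnode idea (the figure of 256 vnodes per physical node comes from the text; the reassignment policy shown here is a simplification):

```python
import random
from collections import Counter

random.seed(0)
VNODES_PER_NODE = 256

# Each physical node starts with 256 randomly placed virtual-node tokens.
vnode_owner = {}
for node in ["node1", "node2", "node3", "node4"]:
    for _ in range(VNODES_PER_NODE):
        vnode_owner[random.randint(-2**63, 2**63 - 1)] = node

# When node5 joins, hand it roughly a fifth of the existing vnodes
# instead of recalculating every hash range.
for token in random.sample(list(vnode_owner), len(vnode_owner) // 5):
    vnode_owner[token] = "node5"

print(Counter(vnode_owner.values()))   # each node ends up with a similar vnode count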
Order-Preserving Partitioning
• The Cassandra partitioner determines how keys are distributed across nodes.

• The default partitioner uses consistent hashing, as described in the previous section.

• Cassandra also supports order-preserving partitioners that distribute data across the nodes of the cluster as ranges of actual (i.e., not hashed) row keys.
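
A hypothetical sketch of the contrast: with an order-preserving partitioner, nodes own contiguous ranges of the raw keys themselves, so lexically adjacent rowkeys land on the same node.

```python
import bisect

# Hypothetical key-range boundaries: node1 owns keys up to "g", node2 up to "n",
# node3 up to "t", and node4 takes everything after that.
boundaries = ["g", "n", "t"]
nodes = ["node1", "node2", "node3", "node4"]

def owner(rowkey: str) -> str:
    return nodes[bisect.bisect_right(boundaries, rowkey)]

print(owner("apple"))   # node1
print(owner("banana"))  # node1 -- adjacent keys stay together
print(owner("zebra"))   # node4
```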
