
CSE 3244: Data Management in the Cloud

Module 6: Replication and Partitioning

NoSQL

Some slides from Jimmy Lin (U. Waterloo) and Jack Conway (Harvard)
2PC is slow! So, what do RDBMSes provide?
 Relational model with schemas
 Powerful, flexible query language
 Transactional semantics: ACID
 Rich ecosystem, lots of tool support
 But: slow execution in a distributed context

What if we want a la carte?


Source: www.flickr.com/photos/vidiot/18556565/
Features a la carte?
 What if I’m willing to give up consistency for scalability?
 What if I’m willing to give up the relational model for something
more flexible?
 What if I just want a cheaper solution?

Enter… NoSQL!
NoSQL (Not only SQL)
1. Horizontally scale “simple operations”
2. Replicate/distribute data over many servers
3. Simple call interface
4. Weaker concurrency model than ACID
5. Efficient use of distributed indexes and RAM
6. Flexible schemas

BASE = Basically Available, Soft state, Eventually consistent

(Major) Types of NoSQL databases


 Key-value stores (Memcached, Redis, DynamoDB, RocksDB, ...)
 Column-oriented databases (Redshift, CosmosDB, Vertica, ...)
 Document stores (MongoDB, DynamoDB, CouchDB, ...)
 Graph databases (Neo4j, CosmosDB, Aerospike, ...)

Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.
https://dl.acm.org/doi/pdf/10.1145/1978915.1978919
https://db-engines.com
Key-Value Stores: Data Model
 Stores associations between keys and values
 Keys are usually primitives
 For example, ints, strings, raw bytes, etc.
 Values can be primitive or complex: usually opaque to store
 Primitives: ints, strings, etc.
 Complex: JSON, HTML fragments, etc.

Image from: https://aws.amazon.com/nosql/key-value/


Key-Value Stores: Operations

Very simple API:
 Get – fetch the value associated with a key
 Put – set the value associated with a key

Optional operations:
 Multi-get
 Multi-put
 Range queries

Consistency model:
 On a single machine, put operations are atomic
 Across multiple machines and cross-key operations: who knows?

Non-persistent: just a big in-memory hash table
Persistent: a wrapper around a traditional RDBMS
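A minimal sketch of this API (Python; the class and names are illustrative, not any particular store's interface). A non-persistent store really is just a big in-memory hash table:

import json

class KVStore:
    """A minimal, non-persistent key-value store: one big in-memory hash table."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Fetch the value associated with key (None if absent).
        return self._data.get(key)

    def put(self, key, value):
        # Set the value associated with key; atomic on a single machine.
        self._data[key] = value

    # Optional operations:
    def multi_get(self, keys):
        return {k: self._data.get(k) for k in keys}

    def multi_put(self, pairs):
        self._data.update(pairs)

store = KVStore()
store.put("user:42", json.dumps({"name": "Ada"}))  # values are opaque to the store
print(store.get("user:42"))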
What if data doesn’t fit on a single machine? Partition!

When data doesn’t fit on a single node, NoSQL systems make it easy to scale out by adding more nodes.
- Horizontal scaling

Image credit: https://simplyexplained.com on NoSQL


Replication

Replication is typically done at each partition as well as across partitions.
- Eventual consistency.
- It is possible that different copies hold different data at any given moment.

Image credit: https://simplyexplained.com on NoSQL


What if data doesn’t fit on a single machine? Partition!

 Partition the key space across multiple machines
 Let’s say, hash partitioning
 For n machines, store key k at machine h(k) mod n

CSE3244 Pseudo-Code
Val = get(Key k)           // Any server can be queried for Key k
Val = get(Key k, Server s) // Server s is queried for Key k
put(Key k, Val v)          // Any server can be used to initiate the put
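A concrete sketch of the pseudo-code above (assuming Python, MD5 as the hash function h, and a hypothetical three-machine cluster):

import hashlib

servers = ["mach0", "mach1", "mach2"]   # hypothetical cluster, n = 3
tables = {s: {} for s in servers}       # one hash table per machine

def owner(key: str) -> str:
    # Store key k at machine h(k) mod n.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

def put(key, value):
    tables[owner(key)][key] = value     # any server can initiate the put

def get(key):
    return tables[owner(key)].get(key)

put("user:42", "Ada")
assert get("user:42") == "Ada"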
 Okay… But:
1. How do we know which physical machine to contact?
2. How do we add a new machine to the cluster?
3. What happens if a machine fails?
Naive Solution: H(key) = int(key) mod num_servers
 Hash the keys
 Example: with 6 servers, key 44 lands on machine H(44) = 44 mod 6 = 2

Server #7 wants to join: how does it find its spot?

Problem: once server #7 joins, the number of servers has changed, and a lot of data is now in the wrong place.

Hash each element again? Too much data movement.
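The data movement is easy to quantify with a sketch (same hypothetical mod-n scheme): count how many keys change machines when server #7 joins a 6-server cluster.

import hashlib

def owner(key: int, n: int) -> int:
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % n

keys = range(10_000)
moved = sum(owner(k, 6) != owner(k, 7) for k in keys)
print(f"{moved / len(keys):.0%} of keys must move")   # roughly 6/7, i.e. ~86%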
Clever Solution: Consistent Hashing
H = int(key | MachName) mod BigPrime

 Hash the keys
 Hash the machines too, into the same space!
 Example: H(Key) = 52, H(MachName) = 30

A new machine wants to join: how does it find its spot? It just hashes its own name.

Problem: the number of servers has changed. Which data is now in the wrong place?

Solution: only the key-value pairs stored on the server with the closest but smaller hash value are remapped; every other server keeps its data.

Problem: how do we know which servers are active and what their hash values are?
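A minimal consistent-hashing sketch (Python; a 2^32-slot space stands in for “mod BigPrime”, and the machine names are illustrative):

import bisect
import hashlib

def h(name: str) -> int:
    # Hash keys and machine names into one shared space.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, machines):
        self.points = sorted((h(m), m) for m in machines)

    def owner(self, key: str) -> str:
        # The slide's rule: the server with the closest but smaller hash
        # value owns the key (wrapping past 0 to the highest machine).
        hashes = [p for p, _ in self.points]
        i = bisect.bisect_right(hashes, h(key)) - 1   # -1 wraps to the last node
        return self.points[i][1]

    def join(self, machine):
        # Only the keys on one arc of the ring move to the new machine.
        bisect.insort(self.points, (h(machine), machine))

ring = Ring(["mach1", "mach2", "mach3"])
print(ring.owner("user:42"))
ring.join("mach4")   # remaps a single arc, not ~6/7 of all keys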
Simple Solution: Service Registry

Active machine → h(name)
Mach2 → 10
Mach3 → 20
Mach5 → 30
Mach1 → 40

Routing: which machine holds the registry? What if that machine fails?
Routing: Which machine holds the key?

The ring spans hash values h = 0 to h = 2^n − 1. Each machine holds pointers to its predecessor and successor.

Send a request to any node; it gets routed to the correct one in O(n) hops.

Can we do better?
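With only predecessor/successor pointers, routing is a walk around the ring. A sketch (illustrative ring of integer machine hashes) shows why it costs O(n) hops:

import bisect

def route_linear(nodes, start_idx, key_hash):
    # Walk successor pointers until we reach the owner (the node with the
    # closest but smaller hash): one hop per pointer, O(n) in the worst case.
    owner_idx = (bisect.bisect_right(nodes, key_hash) - 1) % len(nodes)
    hops = (owner_idx - start_idx) % len(nodes)
    return nodes[owner_idx], hops

nodes = sorted([130, 900, 4100, 20000, 41234])
print(route_linear(nodes, start_idx=0, key_hash=25000))   # (20000, 3)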


Routing: Which machine holds the key?

Each machine holds pointers to its predecessor and successor, plus a “finger table” (+2, +4, +8, …).

Send a request to any node; it gets routed to the correct one in O(log n) hops.

Example: Finger table
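A sketch of finger-table routing in the Chord style (the hash-space size, node IDs, and helper names are all illustrative assumptions):

import bisect

M = 16                                   # identifiers live in [0, 2^M)
SPACE = 2**M

def successor(nodes, x):
    # First node clockwise at or after position x.
    i = bisect.bisect_left(nodes, x % SPACE)
    return nodes[i % len(nodes)]

def fingers(nodes, n):
    # Node n's finger table: the node responsible for n+1, n+2, n+4, ...
    return [successor(nodes, n + 2**k) for k in range(M)]

def in_arc(x, a, b):
    # True if x lies on the clockwise arc (a, b].
    return x != a and (x - a) % SPACE <= (b - a) % SPACE

def route(nodes, start, key):
    # Greedily jump to the farthest finger that does not pass the key;
    # each hop at least halves the remaining distance: O(log n) hops.
    target = successor(nodes, key)
    node, hops = start, 0
    while node != target:
        node = max((f for f in fingers(nodes, node) if in_arc(f, node, target)),
                   key=lambda f: (f - node) % SPACE)
        hops += 1
    return target, hops

nodes = sorted([5, 130, 900, 4100, 20000, 41234, 60123])
print(route(nodes, start=5, key=42000))   # reaches 60123 in 2 hops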
Machine fails: do we lose data?

Solution: Replication. With N = 3, each key is replicated to the owner’s +1 and -1 neighbors on the ring, so a failed machine’s key range is still covered.
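A sketch of this placement rule (same illustrative ring of integer machine hashes as above):

import bisect

def replicas(nodes, key_hash):
    # Owner = node with the closest but smaller hash; the key is also
    # copied to the owner's -1 and +1 neighbors, giving N = 3 copies.
    i = (bisect.bisect_right(nodes, key_hash) - 1) % len(nodes)
    return [nodes[(i - 1) % len(nodes)],   # -1 replica (predecessor)
            nodes[i],                      # owner
            nodes[(i + 1) % len(nodes)]]   # +1 replica (successor)

nodes = sorted([130, 900, 4100, 20000, 41234])
print(replicas(nodes, key_hash=5000))      # [900, 4100, 20000]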
Problem: Load imbalance
With only a few servers (~100s), machines are not evenly mapped across the hash space.

Some servers handle 2-3× the workload:
- Reads
- Failures/adding nodes
Another Refinement: Virtual Nodes
 Don’t hash the servers directly
 Create a large number of virtual nodes and map them to physical servers (see the sketch below)
 Better load redistribution in the event of machine failure
 When a new server joins, it evenly sheds load from the other servers
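A sketch of virtual-node placement (Python; the "mach#i" labels are just an illustrative way to give each virtual node its own hash):

import hashlib

def h(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def build_ring(servers, vnodes_per_server=100):
    # Place many virtual nodes per physical server on the ring.
    return sorted((h(f"{s}#{i}"), s)
                  for s in servers
                  for i in range(vnodes_per_server))

ring = build_ring(["mach1", "mach2", "mach3"])
# If mach2 fails, its ~100 small arcs are absorbed a little at a time by
# mach1 and mach3, instead of one neighbor inheriting one giant arc.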


What about failures during updates?

A dastardly sequence of events:

(1) RED: update A = 1
(2) RED: copy A to the -1 node
(3) RED: NIC dies
(3) GREEN: update A = 2

Current state:
(GREEN, A = 0)
(GREEN, req RED for A = 2)
(RED, A = 1)
(RED, req GREEN for A = 1)

What are our options?

(To be sure, a NIC is a network interface card, widely used, e.g., to access Wi-Fi on a laptop.)
Ensure DB consistency: all requests to A see updates in the same order.
 Option: wait for RED to recover; roll back A = 1 and let GREEN take over.

Ensure availability: all requests get processed immediately.
 Option: GREEN and RED diverge; merge state later.
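The diverge-and-merge option needs a way to detect conflicting writes. A minimal sketch using per-replica version counters (a tiny version vector; one common technique, not necessarily what any particular store uses):

def compare(vv_a, vv_b):
    # Compare two version vectors; if neither dominates, the replicas
    # (RED and GREEN here) wrote concurrently and state must be merged.
    a_ge = all(vv_a.get(r, 0) >= c for r, c in vv_b.items())
    b_ge = all(vv_b.get(r, 0) >= c for r, c in vv_a.items())
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "A is newer"
    if b_ge:
        return "B is newer"
    return "conflict: merge needed"

# RED applied A = 1 and GREEN applied A = 2 concurrently:
print(compare({"RED": 1, "GREEN": 0}, {"RED": 0, "GREEN": 1}))   # conflict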
