L19-mod6-ReplicationPartitioning-P2
NoSQL
Enter… NoSQL!
NoSQL (Not only SQL)
1. Horizontally scale “simple operations”
2. Replicate/distribute data over many servers
3. Simple call interface
4. Weaker concurrency model than ACID
   (BASE = Basically Available, Soft state, Eventually consistent)
5. Efficient use of distributed indexes and RAM
6. Flexible schemas
Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.
https://dl.acm.org/doi/pdf/10.1145/1978915.1978919
https://db-engines.com
Key-Value Stores: Data Model
Stores associations between keys and values
Keys are usually primitives
For example, ints, strings, raw bytes, etc.
Values can be primitive or complex: usually opaque to store
Primitives: ints, strings, etc.
Complex: JSON, HTML fragments, etc.
Non-persistent:
Just a big in-memory hash table
Persistent:
Wrapper around a traditional RDBMS
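The non-persistent case can be sketched as a thin wrapper around an in-memory hash table (a sketch; the class and method names are illustrative, chosen to match the course's get/put pseudo-code):

```python
class KVStore:
    """Non-persistent key-value store: just a big in-memory hash table."""

    def __init__(self):
        self._table = {}          # key -> opaque value

    def put(self, key, value):
        self._table[key] = value  # value may be a primitive or, e.g., a JSON blob

    def get(self, key):
        return self._table.get(key)  # None if the key is absent

store = KVStore()
store.put("user:42", '{"name": "Ada"}')  # complex (JSON) value, opaque to the store
print(store.get("user:42"))
```

The store treats values as opaque bytes; any interpretation (JSON, HTML fragments) happens in the client.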
What if data doesn’t fit on a single machine? Partition!
When data outgrows a single node, NoSQL stores make it easier to add nodes and spread the data across them.
- Horizontal scaling
CSE3244 Pseudo-Code
Val = get(Key k) //Any server is queried for Key k
Val = get(Key k, Server s) //Server s is queried for Key k
put(Key k, Val v) //Any server is used to initiate put
What if data doesn’t fit on a single machine?
Partition!
Partition the key space across multiple machines
Let’s say, hash partitioning
For n machines, store key k at machine h(k) mod n
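A minimal sketch of hash partitioning under this pseudo-code interface (an assumption-laden illustration: the `Cluster` class and the use of SHA-256 as the hash are mine, not from the slides):

```python
import hashlib

def h(key: str) -> int:
    # Stable hash: Python's built-in hash() is randomized per process,
    # so use a digest all machines can agree on.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")

class Cluster:
    """Hash-partitioned KV store: key k lives on machine h(k) mod n."""

    def __init__(self, n: int):
        self.shards = [dict() for _ in range(n)]  # one dict stands in for each machine

    def _owner(self, key: str) -> int:
        return h(key) % len(self.shards)

    def put(self, key, value):
        self.shards[self._owner(key)][key] = value

    def get(self, key):
        return self.shards[self._owner(key)].get(key)

c = Cluster(4)
c.put("a", 1)
assert c.get("a") == 1
```

Because every machine can compute `h(k) mod n` itself, any server can route a get or put, which is exactly what the pseudo-code's "any server" variants rely on.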
Okay… But:
1. How do we know which physical machine to contact?
2. How do we add a new machine to the cluster?
3. What happens if a machine fails?
Naive Solution: H(key) = int(key) mod #servers
Hash the keys. [Figure: keys mapped onto servers, e.g., Server #7]
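Why this is naive: when the server count changes, `mod` remaps almost every key, forcing a near-total data reshuffle. A small experiment (the 7-to-8 growth and the key range are arbitrary choices for illustration):

```python
def owner(key: int, n: int) -> int:
    # Naive placement: key k lives on server k mod n.
    return key % n

keys = range(10_000)
moved = sum(owner(k, 7) != owner(k, 8) for k in keys)
print(f"{moved}/{len(keys)} keys move when growing from 7 to 8 servers")
```

A key stays put only when k mod 7 == k mod 8, i.e., when k mod 56 < 7, so roughly 7/8 of all keys migrate after adding just one server. This is the motivation for consistent hashing below.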
Consistent hashing
Hash the keys
Hash the machines also!
Clever Solution: H = int(key|MachName) mod BigPrime
[Figure: hash ring from h = 0 around through h = BigPrime/2; e.g., H(Key) = 52, H(MachName) = 30]
Consistent hashing
Active machine → h(name):
Mach2 → 10
Mach3 → 20
Mach5 → 30
Mach1 → 40
A key k is stored on the first machine at or clockwise after h(k) on the ring (its successor).
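Using the example table above, a successor lookup can be sketched with a sorted list and binary search (this assumes the common "first machine clockwise" placement rule; the slide's ring positions are used directly):

```python
import bisect

# h(name) values from the slide's table
ring = sorted([(10, "Mach2"), (20, "Mach3"), (30, "Mach5"), (40, "Mach1")])
points = [p for p, _ in ring]

def lookup(key_hash: int) -> str:
    """Key goes to the first machine at or clockwise after its hash."""
    i = bisect.bisect_left(points, key_hash) % len(ring)  # % wraps around the ring
    return ring[i][1]

print(lookup(52))  # past the last machine (40), so it wraps to Mach2
print(lookup(15))  # next machine clockwise is Mach3
```

The payoff versus mod-N hashing: when a machine joins or leaves, only the keys in the arc between it and its predecessor move; all other keys keep their owners.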
Can we do better? Add a “finger table”: besides its successor, each node keeps pointers to the nodes +2, +4, +8, … positions ahead on the ring, so a lookup can halve the remaining distance each hop and finish in O(log n) hops instead of walking the ring.
Solution: Replication
N = 3: store each key on its owner and the owner’s ring neighbours at +1 and –1.
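The N = 3, replicate-to-±1 scheme can be sketched over a ring given as a list of machines in hash order (the function name and the reuse of the slide's machines are illustrative):

```python
def replica_set(ring, i):
    """N = 3: the owner at ring position i plus its neighbours at -1 and +1."""
    n = len(ring)
    return [ring[(i - 1) % n], ring[i], ring[(i + 1) % n]]  # % wraps the ring

machines = ["Mach2", "Mach3", "Mach5", "Mach1"]  # in ring order, as above
print(replica_set(machines, 0))  # owner Mach2 plus neighbours Mach1 and Mach3
```

If the owner fails, a neighbour already holds a copy and can serve reads while a new replica is created.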
Problem: Load imbalance
Few servers (~100s) are not mapped evenly across the hash space.
Solution: Virtual nodes (VNs)
Hash each physical server to many points on the ring, so its load is spread evenly.
[Figure: ring with multiple VNs per server; if a server’s Network Interface Card (NIC) dies, each of its VNs is taken over by a different successor.]
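A small experiment showing how virtual nodes smooth the load (the server names, key counts, and the choice of 100 VNs per server are arbitrary illustration values):

```python
import bisect
import collections
import hashlib

def h(s: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() is randomized per process).
    return int.from_bytes(hashlib.sha256(s.encode()).digest()[:8], "big")

def build_ring(servers, vnodes):
    # Each server appears at `vnodes` positions on the ring, e.g. "srv0#17".
    return sorted((h(f"{s}#{v}"), s) for s in servers for v in range(vnodes))

def owner(ring, key_hash):
    points = [p for p, _ in ring]
    i = bisect.bisect_left(points, key_hash) % len(ring)  # successor, wrapping
    return ring[i][1]

servers = [f"srv{i}" for i in range(8)]
for vnodes in (1, 100):
    ring = build_ring(servers, vnodes)
    load = collections.Counter(owner(ring, h(str(k))) for k in range(5000))
    ratio = max(load.values()) / (5000 / len(servers))
    print(f"{vnodes:3d} VNs per server: busiest server holds {ratio:.2f}x the mean load")
```

With one point per server, a lucky server can own a huge arc of the ring; with many virtual nodes each server owns many small arcs whose total length concentrates near the average.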
Current state:
(GREEN, A = 0)
(GREEN, req RED for A = 2)
(RED, A = 1)
(RED, req GREEN for A = 1)
Ensure DB Consistency: All requests to A see updates in the same order.
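One standard way to enforce a single update order is to route all writes through a primary that assigns sequence numbers, which replicas then apply in order (a sketch under that assumption; the `Primary`/`Replica` classes are illustrative, not from the slides):

```python
class Primary:
    """One replica orders all writes; others apply them in sequence order."""

    def __init__(self):
        self.seq = 0
        self.log = []  # totally ordered write log: (seq, key, value)

    def write(self, key, value):
        self.seq += 1
        self.log.append((self.seq, key, value))
        return self.seq

class Replica:
    def __init__(self):
        self.state = {}
        self.applied = 0  # highest sequence number applied so far

    def apply(self, log):
        # Apply updates strictly in sequence order, never skipping a number.
        for seq, key, value in log:
            if seq == self.applied + 1:
                self.state[key] = value
                self.applied = seq

p = Primary()
p.write("A", 1)   # RED's update, ordered first
p.write("A", 2)   # GREEN's update, ordered second
green, red = Replica(), Replica()
green.apply(p.log)
red.apply(p.log)
assert green.state == red.state == {"A": 2}
```

Because both replicas consume the same log in the same order, they can never end up in the conflicting GREEN/RED states shown above, at the cost of funnelling every write through one node.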