
CSE 3244: Data Management in the Cloud

Module 6: Replication and Partitioning

NoSQL

Some slides from Jimmy Lin (U. Waterloo) and Jack Conway (Harvard)
2PC is slow! So, what do RDBMSes provide?
 Relational model with schemas
 Powerful, flexible query language
 Transactional semantics: ACID
 Rich ecosystem, lots of tool support
 But: slow execution in a distributed context

What if we want a la carte?


Source: www.flickr.com/photos/vidiot/18556565/
Features a la carte?
 What if I’m willing to give up consistency for scalability?
 What if I’m willing to give up the relational model for something
more flexible?
 What if I just want a cheaper solution?

Enter… NoSQL!
NoSQL (Not only SQL)
1. Horizontally scale “simple operations”
2. Replicate/distribute data over many servers
3. Simple call interface
4. Weaker concurrency model than ACID
5. Efficient use of distributed indexes and RAM
6. Flexible schemas

BASE = Basically Available, Soft state, Eventually consistent

(Major) Types of NoSQL databases


 Key-value stores (Memcached, Redis, DynamoDB, RocksDB, ...)
 Column-oriented databases (Redshift, CosmosDB, Vertica, ...)
 Document stores (MongoDB, DynamoDB, CouchDB, ...)
 Graph databases (Neo4j, CosmosDB, Aerospike, ...)

Source: Cattell (2010). Scalable SQL and NoSQL Data Stores. SIGMOD Record.
https://dl.acm.org/doi/pdf/10.1145/1978915.1978919
https://db-engines.com
Key-Value Stores: Data Model
 Stores associations between keys and values
 Keys are usually primitives
 For example, ints, strings, raw bytes, etc.
 Values can be primitive or complex: usually opaque to store
 Primitives: ints, strings, etc.
 Complex: JSON, HTML fragments, etc.

Image from: https://aws.amazon.com/nosql/key-value/


Key-Value Stores: Operations

Very simple API:
 Get – fetch the value associated with a key
 Put – set the value associated with a key

Optional operations:
 Multi-get
 Multi-put
 Range queries

Consistency model:
 On a single machine, put operations are atomic
 Across multiple machines and cross-key operations: who knows?

Non-persistent: just a big in-memory hash table
Persistent: a wrapper around a traditional RDBMS
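A minimal sketch of this API (Python; the class and names are illustrative, not any particular store's interface). A non-persistent store really is just a big in-memory hash table:

import json

class KVStore:
    """A minimal, non-persistent key-value store: one big in-memory hash table."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Fetch the value associated with key (None if absent).
        return self._data.get(key)

    def put(self, key, value):
        # Set the value associated with key; atomic on a single machine.
        self._data[key] = value

    # Optional operations:
    def multi_get(self, keys):
        return {k: self._data.get(k) for k in keys}

    def multi_put(self, pairs):
        self._data.update(pairs)

store = KVStore()
store.put("user:42", json.dumps({"name": "Ada"}))  # values are opaque to the store
print(store.get("user:42"))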
What if data doesn’t fit on a single machine? Partition!

When data doesn’t fit on a single node, NoSQL systems make it easy to scale out by adding more nodes.
- Horizontal scaling

Image credit: https://simplyexplained.com on NoSQL


Replication

Replication is typically done at each partition as well as across partitions.
- Eventual consistency.
- It is possible that different copies hold different data at any given moment.

Image credit: https://simplyexplained.com on NoSQL


What if data doesn’t fit on a single machine? Partition!

 Partition the key space across multiple machines
 Let’s say, hash partitioning
 For n machines, store key k at machine h(k) mod n

CSE3244 Pseudo-Code
Val = get(Key k)           // Any server can be queried for Key k
Val = get(Key k, Server s) // Server s is queried for Key k
put(Key k, Val v)          // Any server can be used to initiate the put
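A concrete sketch of the pseudo-code above (assuming Python, MD5 as the hash function h, and a hypothetical three-machine cluster):

import hashlib

servers = ["mach0", "mach1", "mach2"]   # hypothetical cluster, n = 3
tables = {s: {} for s in servers}       # one hash table per machine

def owner(key: str) -> str:
    # Store key k at machine h(k) mod n.
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return servers[h % len(servers)]

def put(key, value):
    tables[owner(key)][key] = value     # any server can initiate the put

def get(key):
    return tables[owner(key)].get(key)

put("user:42", "Ada")
assert get("user:42") == "Ada"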
 Okay… But:
1. How do we know which physical machine to contact?
2. How do we add a new machine to the cluster?
3. What happens if a machine fails?
Naive Solution: H(key) = int(key) mod num_servers
 Hash the keys
 Example: with 6 servers, key 44 lands on machine H(44) = 44 mod 6 = 2

Server #7 wants to join: how does it find its spot?

Problem: once server #7 joins, the number of servers has changed, and a lot of data is now in the wrong place.

Hash each element again? Too much data movement.
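The data movement is easy to quantify with a sketch (same hypothetical mod-n scheme): count how many keys change machines when server #7 joins a 6-server cluster.

import hashlib

def owner(key: int, n: int) -> int:
    h = int(hashlib.md5(str(key).encode()).hexdigest(), 16)
    return h % n

keys = range(10_000)
moved = sum(owner(k, 6) != owner(k, 7) for k in keys)
print(f"{moved / len(keys):.0%} of keys must move")   # roughly 6/7, i.e. ~86%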
Clever Solution: Consistent Hashing
H = int(key | MachName) mod BigPrime

 Hash the keys
 Hash the machines too, into the same space!
 Example: H(Key) = 52, H(MachName) = 30

A new machine wants to join: how does it find its spot? It just hashes its own name.

Problem: the number of servers has changed. Which data is now in the wrong place?

Solution: only the key-value pairs stored on the server with the closest but smaller hash value are remapped; every other server keeps its data.

Problem: how do we know which servers are active and what their hash values are?
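A minimal consistent-hashing sketch (Python; a 2^32-slot space stands in for “mod BigPrime”, and the machine names are illustrative):

import bisect
import hashlib

def h(name: str) -> int:
    # Hash keys and machine names into one shared space.
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

class Ring:
    def __init__(self, machines):
        self.points = sorted((h(m), m) for m in machines)

    def owner(self, key: str) -> str:
        # The slide's rule: the server with the closest but smaller hash
        # value owns the key (wrapping past 0 to the highest machine).
        hashes = [p for p, _ in self.points]
        i = bisect.bisect_right(hashes, h(key)) - 1   # -1 wraps to the last node
        return self.points[i][1]

    def join(self, machine):
        # Only the keys on one arc of the ring move to the new machine.
        bisect.insort(self.points, (h(machine), machine))

ring = Ring(["mach1", "mach2", "mach3"])
print(ring.owner("user:42"))
ring.join("mach4")   # remaps a single arc, not ~6/7 of all keys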
Simple Solution: Service Registry

Active machine → h(name)
Mach2 → 10
Mach3 → 20
Mach5 → 30
Mach1 → 40

Routing: which machine holds the registry? What if that machine fails?
Routing: Which machine holds the key?

The ring spans hash values h = 0 to h = 2^n − 1. Each machine holds pointers to its predecessor and successor.

Send a request to any node; it gets routed to the correct one in O(n) hops.

Can we do better?
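With only predecessor/successor pointers, routing is a walk around the ring. A sketch (illustrative ring of integer machine hashes) shows why it costs O(n) hops:

import bisect

def route_linear(nodes, start_idx, key_hash):
    # Walk successor pointers until we reach the owner (the node with the
    # closest but smaller hash): one hop per pointer, O(n) in the worst case.
    owner_idx = (bisect.bisect_right(nodes, key_hash) - 1) % len(nodes)
    hops = (owner_idx - start_idx) % len(nodes)
    return nodes[owner_idx], hops

nodes = sorted([130, 900, 4100, 20000, 41234])
print(route_linear(nodes, start_idx=0, key_hash=25000))   # (20000, 3)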


Routing: Which machine holds the key?

Each machine holds pointers to its predecessor and successor, plus a “finger table” (+2, +4, +8, …).

Send a request to any node; it gets routed to the correct one in O(log n) hops.

Example: Finger table
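A sketch of finger-table routing in the Chord style (the hash-space size, node IDs, and helper names are all illustrative assumptions):

import bisect

M = 16                                   # identifiers live in [0, 2^M)
SPACE = 2**M

def successor(nodes, x):
    # First node clockwise at or after position x.
    i = bisect.bisect_left(nodes, x % SPACE)
    return nodes[i % len(nodes)]

def fingers(nodes, n):
    # Node n's finger table: the node responsible for n+1, n+2, n+4, ...
    return [successor(nodes, n + 2**k) for k in range(M)]

def in_arc(x, a, b):
    # True if x lies on the clockwise arc (a, b].
    return x != a and (x - a) % SPACE <= (b - a) % SPACE

def route(nodes, start, key):
    # Greedily jump to the farthest finger that does not pass the key;
    # each hop at least halves the remaining distance: O(log n) hops.
    target = successor(nodes, key)
    node, hops = start, 0
    while node != target:
        node = max((f for f in fingers(nodes, node) if in_arc(f, node, target)),
                   key=lambda f: (f - node) % SPACE)
        hops += 1
    return target, hops

nodes = sorted([5, 130, 900, 4100, 20000, 41234, 60123])
print(route(nodes, start=5, key=42000))   # reaches 60123 in 2 hops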
Machine fails: do we lose data?

Solution: Replication. With N = 3, each key is replicated to the owner’s +1 and -1 neighbors on the ring, so a failed machine’s key range is still covered.
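A sketch of this placement rule (same illustrative ring of integer machine hashes as above):

import bisect

def replicas(nodes, key_hash):
    # Owner = node with the closest but smaller hash; the key is also
    # copied to the owner's -1 and +1 neighbors, giving N = 3 copies.
    i = (bisect.bisect_right(nodes, key_hash) - 1) % len(nodes)
    return [nodes[(i - 1) % len(nodes)],   # -1 replica (predecessor)
            nodes[i],                      # owner
            nodes[(i + 1) % len(nodes)]]   # +1 replica (successor)

nodes = sorted([130, 900, 4100, 20000, 41234])
print(replicas(nodes, key_hash=5000))      # [900, 4100, 20000]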
Problem: Load imbalance
With only a few servers (~100s), machines are not evenly mapped across the hash space.

Some servers handle 2-3× the workload:
- Reads
- Failures/adding nodes
Another Refinement: Virtual Nodes
 Don’t hash the servers directly
 Create a large number of virtual nodes and map them to physical servers (see the sketch below)
 Better load redistribution in the event of machine failure
 When a new server joins, it evenly sheds load from the other servers
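A sketch of virtual-node placement (Python; the "mach#i" labels are just an illustrative way to give each virtual node its own hash):

import hashlib

def h(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % 2**32

def build_ring(servers, vnodes_per_server=100):
    # Place many virtual nodes per physical server on the ring.
    return sorted((h(f"{s}#{i}"), s)
                  for s in servers
                  for i in range(vnodes_per_server))

ring = build_ring(["mach1", "mach2", "mach3"])
# If mach2 fails, its ~100 small arcs are absorbed a little at a time by
# mach1 and mach3, instead of one neighbor inheriting one giant arc.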


What about failures during updates?

A dastardly sequence of events:

(1) RED: update A = 1
(2) RED: copy A to the -1 node
(3) RED: NIC dies
(3) GREEN: update A = 2

Current state:
(GREEN, A = 0)
(GREEN, req RED for A = 2)
(RED, A = 1)
(RED, req GREEN for A = 1)

What are our options?

(To be sure, a NIC is a network interface card, widely used, e.g., to access Wi-Fi on a laptop.)
Ensure DB consistency: all requests to A see updates in the same order.
 Option: wait for RED to recover; roll back A = 1 and let GREEN take over.

Ensure availability: all requests get processed immediately.
 Option: GREEN and RED diverge; merge state later.
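The diverge-and-merge option needs a way to detect conflicting writes. A minimal sketch using per-replica version counters (a tiny version vector; one common technique, not necessarily what any particular store uses):

def compare(vv_a, vv_b):
    # Compare two version vectors; if neither dominates, the replicas
    # (RED and GREEN here) wrote concurrently and state must be merged.
    a_ge = all(vv_a.get(r, 0) >= c for r, c in vv_b.items())
    b_ge = all(vv_b.get(r, 0) >= c for r, c in vv_a.items())
    if a_ge and b_ge:
        return "equal"
    if a_ge:
        return "A is newer"
    if b_ge:
        return "B is newer"
    return "conflict: merge needed"

# RED applied A = 1 and GREEN applied A = 2 concurrently:
print(compare({"RED": 1, "GREEN": 0}, {"RED": 0, "GREEN": 1}))   # conflict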
