Cassandra introduction mars jug

Introduction to Cassandra
DuyHai DOAN, Technical Advocate
@doanduyhai

Shameless self-promotion!
Duy Hai DOAN
Cassandra technical advocate
• talks, meetups, confs
• open-source devs (Achilles, …)
• OSS Cassandra point of contact
☞ duy_hai.doan@datastax.com
• production troubleshooting

Datastax!
• Founded in April 2010
• We contribute a lot to Apache Cassandra™
• 400+ customers (25 of the Fortune 100), 200+ employees
• Headquartered in the San Francisco Bay Area
• EU headquarters in London, offices in France and Germany
• Datastax Enterprise = OSS Cassandra + extra features

Agenda!
Architecture
• Cluster, Replication, Consistency
Data model
• Last Write Wins (LWW), CQL basics, From SQL to CQL
Dev Center Demo
DSE overview
CQL In Depth (time permitting)

Cassandra history!
NoSQL database
• created at Facebook
• open-sourced in 2008
• current version = 2.1
• column-oriented ☞ distributed table

Cassandra 5 key facts!
Linear scalability
• NetcoSports: 3 nodes, ≈3 GB
• Netflix: 1k+ nodes, PB+
• YOU

Rolling Upgrades!
[Diagram: nodes n1 to n8, live production]

Continuous availability (≈100% up-time)
• resilient architecture (Dynamo)
• rolling upgrades
• data backward compatible n/n+1 versions

Multi-data centers
• out-of-the-box (config only)
• AWS conf for multi-region DCs
• GCE/CloudStack support

Multi-DC usages!
Data-locality, disaster recovery
[Diagram: New York (DC1), nodes n1 to n8, and London (DC2), nodes n1 to n5, linked by async replication]

Multi-DC usages!
Workload segregation/virtual DC
[Diagram: Production (Live) ring, nodes n1 to n8, and Analytics (Spark/Hadoop) ring, nodes n1 to n5, in the same DC]

Multi-DC usages!
Prod data copy for testing/benchmarking
[Diagram: production ring, nodes n1 to n8, with a data copy to "My tiny test cluster", nodes n1 to n3]
• Use LOCAL consistency
• ❌ Never read back

Operational simplicity
• 1 node = 1 process + 1 config file
• deployment automation
• OpsCenter for monitoring

Analytics combo
• Cassandra + Spark = awesome!
• realtime streaming

Cassandra architecture!
Cluster
Replication
Consistency

Cluster layer
• Amazon Dynamo paper
• masterless architecture
Data-store layer
• Google Bigtable paper
• Columns/column families

[Diagram: client request; Node1 and Node2 each stack API (CQL & RPC), CLUSTER (DYNAMO), DATA STORE (BIG TABLES), DISKS]

Data distribution!
Random: hash of #partition → token = hash(#p)
Hash range: ]-X, X]
X = huge number (2^64/2)
[Diagram: ring of nodes n1 to n8]

Token Ranges!
A: ]0, X/8]
B: ] X/8, 2X/8]
C: ] 2X/8, 3X/8]
D: ] 3X/8, 4X/8]
E: ] 4X/8, 5X/8]
F: ] 5X/8, 6X/8]
G: ] 6X/8, 7X/8]
H: ] 7X/8, X]
[Diagram: ring of nodes n1 to n8, one token range A to H per node]
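A minimal sketch of how a partition key could be mapped to a token, a range and a node, following the eight ranges listed on this slide (tokens in ]0, X]). Assumptions: MD5 is only a stand-in hash for illustration, the helper names are hypothetical, and range A is assumed to belong to n1, B to n2, and so on.

```python
import bisect
import hashlib

X = 2**64 // 2                                   # the "huge number" from the slides
RANGES = "ABCDEFGH"
NODES = [f"n{i}" for i in range(1, 9)]           # assumption: n1 owns A, n2 owns B, ...
UPPER_BOUNDS = [k * X // 8 for k in range(1, 9)]  # X/8, 2X/8, ..., X


def token_for(partition_key: str) -> int:
    """Hypothetical stand-in hash: maps a partition key to a token in ]0, X]."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % X + 1


def owner(token: int):
    """Return (range label, node) for a token in ]0, X]; upper bounds are inclusive."""
    index = bisect.bisect_left(UPPER_BOUNDS, token)
    return RANGES[index], NODES[index]


if __name__ == "__main__":
    token = token_for("jdoe")
    print("jdoe ->", token, owner(token))
```

Because the hash spreads keys evenly over the token space, each node ends up with roughly the same share of partitions.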

Linear scalability!
[Diagram: the ring grows from 8 nodes (n1 to n8) to 10 nodes (n1 to n10)]

Failure tolerance!
Replication Factor (RF) = 3
[Diagram: ring of nodes n1 to n8 with ranges A to H; replicas 1, 2, 3 store {B, A, H}, {C, B, A}, {D, C, B}]
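The replica sets on this slide can be reproduced with a small sketch. It assumes simple ring placement: each range is copied to the node that owns it plus the next RF-1 nodes on the ring. Nodes are identified here by their primary range, and the function names are hypothetical.

```python
RANGES = list("ABCDEFGH")
RF = 3


def stored_ranges(primary: str, rf: int = RF):
    """Ranges held by the node whose primary range is `primary`:
    its own range plus the rf-1 ranges that precede it on the ring."""
    i = RANGES.index(primary)
    return [RANGES[(i - k) % len(RANGES)] for k in range(rf)]


def replica_holders(rng: str, rf: int = RF):
    """Primary ranges of the rf nodes that hold a copy of range `rng`."""
    i = RANGES.index(rng)
    return [RANGES[(i + k) % len(RANGES)] for k in range(rf)]


if __name__ == "__main__":
    for primary in "BCD":
        print(primary, stored_ranges(primary))
    print("copies of B live on the owners of", replica_holders("B"))
```

With RF = 3, any single node can fail and every range still has two live copies.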

Coordinator node!
Incoming requests (read/write)
Coordinator node handles the request
Every node can be coordinator → masterless
[Diagram: a request reaches the coordinator on the ring n1 to n8, which contacts replicas 1, 2, 3]

Consistency!
Tunable at runtime
• ONE
• QUORUM (strict majority w.r.t. RF)
• ALL
Applies to both reads & writes
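A small sketch of what each level means in number of replicas, assuming QUORUM is the strict majority floor(RF/2) + 1. The function names are illustrative only.

```python
def required_acks(level: str, rf: int) -> int:
    """Number of replicas that must answer for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1      # strict majority w.r.t. RF
    if level == "ALL":
        return rf
    raise ValueError(f"unknown consistency level: {level}")


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that may be down while the request can still succeed."""
    return rf - required_acks(level, rf)


if __name__ == "__main__":
    for level in ("ONE", "QUORUM", "ALL"):
        print(level, required_acks(level, 3), tolerated_failures(level, 3))
```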

Write consistency!
Write ONE
• write request to all replicas in parallel
• wait for ONE ack before returning to client
• other acks later, asynchronously
Write QUORUM
• wait for QUORUM acks before returning to client
• other acks later, asynchronously
[Diagram: the coordinator sends the write to replicas 1, 2, 3 on the ring n1 to n8; acks arrive after 5 μs, 10 μs and 120 μs]
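A toy calculation with the three ack latencies from the diagram (5, 10 and 120 μs): the write is sent to all replicas in parallel, so the client waits only for the slowest of the acks it requires. The function name is hypothetical.

```python
def client_wait_us(ack_latencies_us, acks_needed: int) -> int:
    """Time until the coordinator has collected `acks_needed` acks,
    given that all replicas are written to in parallel."""
    return sorted(ack_latencies_us)[acks_needed - 1]


if __name__ == "__main__":
    acks = [5, 10, 120]
    print("ONE   :", client_wait_us(acks, 1), "μs")
    print("QUORUM:", client_wait_us(acks, 2), "μs")
    print("ALL   :", client_wait_us(acks, 3), "μs")
```

The slow replica only hurts the client when the level is ALL.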

Read consistency!
Read ONE
• read from one node among all replicas
• contact the fastest node (stats)
Read QUORUM
• read from the fastest node
• AND request digests from other replicas to reach QUORUM
• return most up-to-date data to client
• repair if digest mismatch
[Diagram: the coordinator reads from replicas 1, 2, 3 on the ring n1 to n8]
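A simplified sketch of the QUORUM read steps listed above. Assumptions: replicas are plain dicts ordered fastest first, the digest is an MD5 of value and timestamp, and "repair" just overwrites the stale replicas that were contacted. All names are hypothetical.

```python
import hashlib


def digest(replica: dict) -> str:
    """Digest of a replica's data (value and write timestamp)."""
    return hashlib.md5(repr((replica["value"], replica["ts"])).encode()).hexdigest()


def quorum_read(replicas: list, quorum: int):
    """Read full data from the fastest replica and digests from the others
    needed to reach quorum; on mismatch return the newest value and repair."""
    contacted = replicas[:quorum]          # replicas are ordered fastest first
    fastest = contacted[0]
    if all(digest(r) == digest(fastest) for r in contacted[1:]):
        return fastest["value"]
    newest = max(contacted, key=lambda r: r["ts"])
    value, ts = newest["value"], newest["ts"]
    for r in contacted:                    # repair the contacted replicas
        r["value"], r["ts"] = value, ts
    return value


if __name__ == "__main__":
    replicas = [
        {"value": "A", "ts": 1},           # fastest, but stale
        {"value": "B", "ts": 2},
        {"value": "A", "ts": 1},           # not contacted at QUORUM (RF = 3)
    ]
    print(quorum_read(replicas, quorum=2), replicas)
```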

Consistency trade-off!

Consistency in action!
RF = 3, Write ONE, Read ONE
• replicas: B A A (data replication in progress …)
• Write ONE: B
• Read ONE: A

RF = 3, Write ONE, Read QUORUM
• replicas: B A A
• Write ONE: B
• Read QUORUM: A

RF = 3, Write ONE, Read ALL
• replicas: B A A
• Write ONE: B
• Read ALL: B

RF = 3, Write QUORUM, Read ONE
• replicas: B B A
• Write QUORUM: B
• Read ONE: A

RF = 3, Write QUORUM, Read QUORUM
• replicas: B B A
• Write QUORUM: B
• Read QUORUM: B
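The five scenarios above follow one rule: a read is guaranteed to see the latest acknowledged write only when the replicas written and the replicas read must overlap, that is when W + R > RF. A brute-force sketch (hypothetical function name) that checks every possible choice of replicas:

```python
from itertools import combinations


def always_overlaps(write_acks: int, read_replicas: int, rf: int) -> bool:
    """True if every set of `write_acks` replicas shares at least one
    replica with every set of `read_replicas` replicas."""
    nodes = range(rf)
    return all(
        set(written) & set(read)
        for written in combinations(nodes, write_acks)
        for read in combinations(nodes, read_replicas)
    )


if __name__ == "__main__":
    rf = 3
    levels = {"ONE": 1, "QUORUM": 2, "ALL": 3}
    for w_name, r_name in [("ONE", "ONE"), ("ONE", "QUORUM"), ("ONE", "ALL"),
                           ("QUORUM", "ONE"), ("QUORUM", "QUORUM")]:
        w, r = levels[w_name], levels[r_name]
        print(f"Write {w_name}, Read {r_name}:", always_overlaps(w, r, rf))
```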

Consistency level!
ONE
• Fast, may not read latest written value
QUORUM
• Strict majority w.r.t. Replication Factor
• Good balance
ALL
• Paranoid
• Slow, no high availability

Consistency summary!
ONE read + ONE write
☞ available for read/write even with (N-1) replicas down
QUORUM read + QUORUM write
☞ available for read/write even with 1+ replicas down

Data model!
Cassandra Write Path!
Last Write Wins!
CQL basics!
From SQL to CQL!

[Diagrams: Cassandra write path]
1. Each write is first appended to the commit log (Commit log1 … Commit logn)
2. The write is then applied in memory to the MemTable of its table (Table1 … TableN)
3. MemTables are flushed to disk as SSTables (SSTable1, SSTable2, SSTable3), which frees memory
• New writes fill fresh MemTables; later flushes add more SSTables for the same table
Last Write Wins (LWW)!
INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
#partition: jdoe
• age (t1) = 33
• name (t1) = John DOE
(t1 = auto-generated timestamp)

UPDATE users SET age = 34 WHERE login = 'jdoe';
• SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe → age (t2) = 34

DELETE age FROM users WHERE login = 'jdoe';
• SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe → age (t2) = 34
• SSTable3: jdoe → age (t3) = tombstone

SELECT age FROM users WHERE login = 'jdoe';
Which value is returned? The one with the latest timestamp:
• SSTable1: age (t1) = 33 ✕
• SSTable2: age (t2) = 34 ✕
• SSTable3: age (t3) = tombstone ✓

Compaction!
• SSTable1: jdoe → age (t1) = 33, name (t1) = John DOE
• SSTable2: jdoe → age (t2) = 34
• SSTable3: jdoe → age (t3) = tombstone
• New SSTable: jdoe → age (t3) = tombstone, name (t1) = John DOE

Historical data!
Want to keep data history?
• do not rely on the internally generated timestamp!
• ☞ time-series data modeling
[Diagram: table "history", partition "id" spread over SSTable1, with columns date1(t1), date2(t2) … date9(t9), and SSTable2, with columns date10(t10), date11(t11), …]

CRUD operations!
UPDATE users SET age = 34 WHERE login = 'jdoe';
DELETE age FROM users WHERE login = 'jdoe';

Simple Table!
CREATE TABLE users (
    login text,
    name text,
    age int,
    …
    PRIMARY KEY(login));
• login: partition key (#partition)

Clustered table (1 – N)!
CREATE TABLE mailbox (
    login text,
    message_id timeuuid,
    interlocutor text,
    message text,
    PRIMARY KEY((login), message_id));
• login: partition key
• message_id: clustering column (sorted), ensures uniqueness

Queries!
Get message by user and message_id (date)
SELECT * FROM mailbox WHERE login = 'jdoe'
    AND message_id = '2014-09-25 16:00:00';
Get message by user and date interval
SELECT * FROM mailbox WHERE login = 'jdoe'
    AND message_id <= '2014-09-25 16:00:00'
    AND message_id >= '2014-09-20 16:00:00';

Queries!
Get message by message_id only (#partition not provided)
SELECT * FROM mailbox WHERE message_id = '2014-09-25 16:00:00';
Get message by date interval (#partition not provided)
SELECT * FROM mailbox WHERE
    message_id <= '2014-09-25 16:00:00'
    AND message_id >= '2014-09-20 16:00:00';

Queries!
Get message by user range (range query on #partition)
SELECT * FROM mailbox WHERE login >= 'hsue' AND login <= 'jdoe';
Get message by user pattern (non exact match on #partition)
SELECT * FROM mailbox WHERE login LIKE '%doe%';

WHERE clause restrictions!
All queries (INSERT/UPDATE/DELETE/SELECT) must provide #partition
Only exact match (=) on #partition; range queries (<, ≤, >, ≥) are not allowed
• ☞ they would require a full cluster scan
On clustering columns, only range queries (<, ≤, >, ≥) and exact match
WHERE clause only possible on columns defined in PRIMARY KEY
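The four rules above can be expressed as a small checker. This is only an illustrative sketch of the rules as stated on this slide, not Cassandra's actual query validation; the function name and the (column, operator) representation are assumptions.

```python
RANGE_OPS = {"<", "<=", ">", ">="}


def check_where(restrictions, partition_key, clustering_columns):
    """restrictions: list of (column, operator) pairs from a WHERE clause.
    Returns the list of violated rules (empty list = accepted)."""
    errors = []
    partition_ops = [op for column, op in restrictions if column == partition_key]
    if not partition_ops:
        errors.append("#partition must be provided")
    elif any(op != "=" for op in partition_ops):
        errors.append("only exact match (=) on #partition")
    for column, op in restrictions:
        if column == partition_key:
            continue
        if column not in clustering_columns:
            errors.append(f"{column} is not in the PRIMARY KEY")
        elif op != "=" and op not in RANGE_OPS:
            errors.append(f"{op} not allowed on clustering column {column}")
    return errors


if __name__ == "__main__":
    # mailbox table: PRIMARY KEY((login), message_id)
    ok = [("login", "="), ("message_id", "<="), ("message_id", ">=")]
    print(check_where(ok, "login", ["message_id"]))
    print(check_where([("message_id", "=")], "login", ["message_id"]))
    print(check_where([("login", ">="), ("login", "<=")], "login", ["message_id"]))
```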

What if I want to perform an « arbitrary » WHERE clause?
• search form scenario, dynamic search fields
☞ Apache Solr (Lucene) integration (Datastax Enterprise)
SELECT * FROM users WHERE solr_query = 'age:[33 TO *] AND gender:male';
SELECT * FROM users WHERE solr_query = 'lastname:*schwei?er';

Collections & maps!
login text,
name text,
age int,
friends set<text>,
hobbies list<text>,
languages map<int, text>,
…

User Defined Type (UDT)!
Instead of
login text,
…
street_number int,
street_name text,
postcode int,
country text,
…

User Defined Type (UDT)!
CREATE TYPE address (
street_number int,
street_name text,
postcode int,
country text);
login text,
…
location frozen<address>,
…

UDT insert!
INSERT INTO users(login, name, location) VALUES (
    'jdoe',
    'John DOE',
    {
        street_number: 124,
        street_name: 'Congress Avenue',
        postcode: 95054,
        country: 'USA'
    });

UDT update!
UPDATE users SET location =
    {
        street_number: 125,
        street_name: 'Congress Avenue',
        postcode: 95054,
        country: 'USA'
    }
WHERE login = 'jdoe';
Can be nested ☞ store documents
• but no dynamic fields (or use map<text, blob>)

From SQL to CQL!
Remember…
• CQL is not SQL
• there is no join (do you want to scale?)
• there is no integrity constraint (do you want to read-before-write?)

From SQL to CQL!
Normalized
User (1) → (n) Comment
CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author_id text, // typical join id
    content text,
    PRIMARY KEY((article_id), comment_id));

From SQL to CQL!
De-normalized
User (1) → (n) Comment
CREATE TABLE comments (
    article_id uuid,
    comment_id timeuuid,
    author frozen<person>, // person is UDT
    content text,
    PRIMARY KEY((article_id), comment_id));

Data modeling best practices!
Start with queries
• identify core functional read paths
• 1 read scenario ≈ 1 SELECT
Denormalize
• wisely, only duplicate necessary & immutable data
• functional/technical trade-off

Person UDT
- firstname/lastname
- date of birth
- gender
- mood
- location

[Example: profile card built from the Person UDT]
John DOE, male
birthdate: 21/02/1981
subscribed since 03/06/2011
☉ San Mateo, CA
"Impossible is not John DOE"
Full detail read from User table on click

DSE (Datastax Enterprise)!
Security
Analytics (Spark & Hadoop)
Search (Solr)

Use Cases!
• Messaging
• Collections/Playlists
• Fraud detection
• Recommendation/Personalization
• Internet of things/Sensor data

CQL In Depth!
Simple Table!
Clustered Table!

Storage Engine!
#partition1
#col1 #col2 #col3 #col4
cell1 cell2 cell3 cell4
#partition2
#col1 #col2 #col3
cell1 cell2 cell3
#partition3
#col1 #col2
cell1 cell2
#partition4
#col1 #col2 #col3 #col4 …
cell1 cell2 cell3 cell4 …
Partition Key
Column Name
Cell

Data Model Abstraction!
Table ≈ Map<#p,SortedMap<#col,cell>>
• partitions: SortedMap<token,…>
• SortedMap<#col,cell>: uniqueness and sort order of #col
Static Data Type!
Partition Key Type Column Name Type Cell Type
Native types: bigint, blob, counter, decimal, double, float, inet, int,
timestamp, timeuuid, uuid.

Simple Table Mapping!
login text,
name text,
age int,
…
Map<login,SortedMap<column_label,value>>
• login: text, column_label: text, value: blob

INSERT INTO users(login, name, age) VALUES('jdoe', 'John DOE', 33);
INSERT INTO users(login, name, age) VALUES('hsue', 'Helen SUE', 28);
RowKey: jdoe
=> (name=, value=, timestamp=1412419763515000)
=> (name=age, value=00000021, timestamp=1412419763515000)
=> (name=name, value=4a6f686e20444f45, timestamp=1412419763515000)
RowKey: hsue
=> (name=age, value=0000001c, timestamp=1412419776578000)
=> (name=name, value=48656c656e20535545, timestamp=1412419776578000)
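The hex values in this dump can be reproduced with a short sketch, assuming an int is stored as 4 big-endian bytes and text as UTF-8 bytes. The helper names are hypothetical.

```python
import struct


def encode_int(value: int) -> str:
    """Hex of a 32-bit big-endian signed integer."""
    return struct.pack(">i", value).hex()


def encode_text(value: str) -> str:
    """Hex of the UTF-8 bytes of a string."""
    return value.encode("utf-8").hex()


if __name__ == "__main__":
    print("age  =", encode_int(33))
    print("name =", encode_text("John DOE"))
    print("name =", encode_text("Helen SUE"))
```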

Reading the dump (RowKey: jdoe, RowKey: hsue):
• Marker column
• Sorted column_label
• Values as bytes

Clustered Table Mapping!
CREATE TABLE daily_3g_quality_per_city (
    operator text,
    city text,
    date int, // date as YYYYMMdd
    latency_ms int,
    power_watt double,
    PRIMARY KEY((operator), city, date));
Map<operator,
SortedMap<city,
SortedMap<date,
SortedMap<column_label,value>>>>

RowKey: verizon
=> (name=Austin:20140910:, value=, timestamp=…)
=> (name=Austin:20140910:latency_ms, value=000000e6, timestamp=…)
=> (name=Austin:20140910:power_watt, value=3ff3333333333333, timestamp=…)
=> (name=Austin:20140911:latency_ms, value=000000d4, timestamp=…)
=> (name=New York:20140913:, value=, timestamp=1412422893832000)
=> (name=New York:20140913:latency_ms, value=0000007b, timestamp=…)
=> (name=New York:20140913:power_watt, value=3ffb333333333333, timestamp=…)
=> (name=New York:20140917:, value=, timestamp=…)
=> (name=New York:20140917:latency_ms, value=00000067, timestamp=…)
=> (name=New York:20140917:power_watt, value=3ffe666666666666, timestamp=…)

RowKey: verizon
• sort first by city
• … then by date
• … then by column_label
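A sketch of why this sort order matters, modeling the cells of the verizon partition as sorted (city, date, column_label) tuples. Marker columns are left out, and the function names are hypothetical. A restriction on city maps to one contiguous slice, while a restriction on date alone has to look at every cell.

```python
import bisect

# Cell names of the "verizon" partition, sorted by city, then date, then label
KEYS = sorted([
    ("New York", 20140917, "power_watt"),
    ("New York", 20140917, "latency_ms"),
    ("New York", 20140913, "power_watt"),
    ("New York", 20140913, "latency_ms"),
    ("Austin", 20140911, "latency_ms"),
    ("Austin", 20140910, "power_watt"),
    ("Austin", 20140910, "latency_ms"),
])


def slice_by_city(keys, city):
    """Cells of one city: a contiguous slice found by binary search."""
    lo = bisect.bisect_left(keys, (city,))
    hi = bisect.bisect_left(keys, (city, float("inf")))
    return keys[lo:hi]


def scan_by_date(keys, date):
    """Cells of one date without a city: every cell has to be inspected."""
    return [key for key in keys if key[1] == date]


if __name__ == "__main__":
    print(slice_by_city(KEYS, "Austin"))
    print(scan_by_date(KEYS, 20140913))
```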

Query With Clustered Table!
Select by operator and city for all dates
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND city = 'Austin';
Select by operator and city range for all dates
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND city >= 'Austin' AND city <= 'New York';

Select by operator and city and date
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND city = 'Austin' AND date = 20140910;
Select by operator and city and range of date
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND city = 'Austin'
    AND date >= 20140910 AND date <= 20140913;

Select by operator and city and date tuples
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND city = 'Austin'
    AND date IN (20140910, 20140913);

Select by operator and date without city
SELECT * FROM daily_3g_quality_per_city
    WHERE operator = 'verizon' AND date = 20140910;
Map<operator,
SortedMap<city,
SortedMap<date,
SortedMap<column_label,value>>>>

Thank You
@doanduyhai
duy_hai.doan@datastax.com
https://academy.datastax.com/
