Cassandra internals

Cassandra under the hood
Richard Low
rlow@acunu.com

Outline
• What happens when you write?
  • Commit logs
  • Memtables
  • SSTables
• What happens when you read?
  • Point queries
  • Range queries
• Repair and snapshots

Why should we care?

• Helps you understand performance
• Shows the performance implications of your data model
• Helps you fix things when something goes wrong
• Interesting!

Writes
[Figure: an insert is written to the commit log and the memtable; memtables are flushed to SSTables, each made up of a Bloom filter, an index and data.]

Commit log
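• Each insert is written to the commit log first
• Stored in insertion order
• Inserts are not acknowledged until written to the commit log
• Batch vs periodic sync: batch mode fsyncs before acknowledging; periodic mode fsyncs every commitlog_sync_period_in_ms
• After a crash, the commit log is replayed to recover the inserts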
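A minimal sketch of an append-only commit log with crash replay. The JSON-lines record format, the class and function names and the `sync_each_write` flag (standing in for batch vs periodic mode) are hypothetical; Cassandra's real commit log is a segmented binary format.

```python
import json
import os


class CommitLog:
    """Append-only log of inserts, one JSON record per line (hypothetical format)."""

    def __init__(self, path, sync_each_write=True):
        self.path = path
        # True mimics batch mode: fsync before acknowledging each insert.
        # False mimics periodic mode: a timed background fsync (not shown) is relied on.
        self.sync_each_write = sync_each_write
        self.file = open(path, "a", encoding="utf-8")

    def append(self, key, column, value, timestamp):
        record = {"key": key, "column": column, "value": value, "ts": timestamp}
        self.file.write(json.dumps(record) + "\n")
        self.file.flush()  # hand the bytes to the operating system
        if self.sync_each_write:
            os.fsync(self.file.fileno())  # force them onto the disk
        # Only after this point would the insert be acknowledged to the client.

    def close(self):
        self.file.close()


def replay(path):
    """After a crash, return the logged inserts in insertion order."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError:
                break  # torn final record from a crash mid-write: ignore it
    return records
```

Appending is purely sequential, which is why the log can sit on the write path without slowing inserts much; replay simply re-applies the records in order.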

Memtable

• In-memory store of recent insertions
• Implemented as a ConcurrentSkipListMap, so it is kept sorted by key
• When too large, flushed to disk as a new SSTable
• Ensures all writes to disk are sequential
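Python's standard library has no skip list, so this sketch substitutes a dict plus a sorted key list maintained with `bisect`; it keeps the property that matters here, namely that a flush emits rows already in key order. The `flush_threshold` (counted in entries) and the class name are hypothetical.

```python
import bisect


class Memtable:
    """In-memory, sorted store of recent inserts (bisect stands in for a skip list)."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold  # hypothetical limit, in entries
        self.data = {}
        self.sorted_keys = []

    def put(self, key, value):
        if key not in self.data:
            bisect.insort(self.sorted_keys, key)  # keep keys in sorted order
        self.data[key] = value  # an overwrite just replaces the value in memory

    def is_full(self):
        return len(self.data) >= self.flush_threshold

    def flush(self):
        """Return all rows in key order, ready for one sequential SSTable write."""
        rows = [(k, self.data[k]) for k in self.sorted_keys]
        self.data, self.sorted_keys = {}, []
        return rows
```

Because the rows come out sorted, writing the new SSTable is a single sequential pass with no seeks.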

SSTables

• Store the actual data, sorted by key
• Contain a Bloom filter and an index to help find keys
• Read only: never modified once written

Bloom ﬁlters
• Probabilistic data structure
• Answers membership queries:
  • ‘Does the set contain x?’
• Can give false positives, never false negatives
• Space efficient
• Typical size: 1 byte per key
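A minimal Bloom filter sketch using 8 bits (1 byte) per key. The choice of SHA-256 with double hashing to derive the k bit positions is an assumption for illustration; Cassandra uses a different hash function.

```python
import hashlib
import math


class BloomFilter:
    """Bit array plus k hash functions: false positives possible, false negatives not."""

    def __init__(self, expected_keys, bits_per_key=8):
        self.m = max(1, expected_keys * bits_per_key)  # total bits
        # Number of hash functions that minimises the false-positive rate.
        self.k = max(1, round(bits_per_key * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, key):
        # Double hashing: derive k positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(key.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        # False means definitely absent; True means probably present.
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))
```

With 8 bits per key this picks k = 6, giving an expected false-positive rate of roughly 2%.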

How it works together
[Figure: the three parts of one SSTable and the question each answers.]
• Bloom filter (memory), e.g. 011010111010010: Contains x?
• Index (memory), every 128th key mapped to its offset in the data, e.g. k_0 -> 0, k_128 -> 4582, k_256 -> 9242: Where is x?
• Data (disk), keys k_0, k_1, k_2, k_3, ... stored in sorted order: Retrieve x
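A minimal sketch of a single-SSTable lookup following the three steps: filter, index, data. To keep it short, an exact set stands in for the Bloom filter and list positions stand in for byte offsets in the data file; the class and its `index_interval` parameter (128, matching the k_0, k_128, k_256 sampling) are illustrative.

```python
import bisect


class SSTable:
    """Immutable run of (key, value) rows, which must be sorted by key."""

    def __init__(self, sorted_rows, index_interval=128):
        self.rows = list(sorted_rows)  # stands in for the data file on disk
        self.index_interval = index_interval
        # Sparse index: every index_interval-th key -> its position in the data.
        self.index_keys = [k for k, _ in self.rows[::index_interval]]
        self.index_pos = list(range(0, len(self.rows), index_interval))
        # An exact set stands in for the Bloom filter.
        self.filter = {k for k, _ in self.rows}

    def get(self, key):
        # 1. Query filter: if the key is absent, no disk access is needed.
        if key not in self.filter:
            return None
        # 2. Find location: the last sampled key <= key gives the scan start.
        i = bisect.bisect_right(self.index_keys, key) - 1
        if i < 0:
            return None
        start = self.index_pos[i]
        # 3. Read data: scan at most index_interval rows from that position.
        for k, v in self.rows[start:start + self.index_interval]:
            if k == key:
                return v
        return None
```

Only the small sampled index has to live in memory; the scan in step 3 touches one contiguous stretch of the data file.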

Point queries
[Figure: a read checks the memtables, then each SSTable via its Bloom filter, index (k_0 -> 0, k_128 -> 4582, k_256 -> 9242) and data.]
For each SSTable:
1. Query filter
2. Find location
3. Read data
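Because a key may have been written at different times, a point query can find versions in several memtables and SSTables. This sketch assumes each source is modelled as a dict mapping key to (timestamp, value), with `None` as a tombstone, and reconciles them by keeping the newest timestamp; a dict lookup stands in for the per-SSTable filter, index and data steps.

```python
def point_query(key, memtables, sstables):
    """Return the newest value for key across all sources, or None."""
    candidates = []
    for memtable in memtables:
        if key in memtable:
            candidates.append(memtable[key])
    for sstable in sstables:
        # In Cassandra the Bloom filter is consulted first, so most SSTables
        # that lack the key are skipped without touching the disk.
        if key in sstable:
            candidates.append(sstable[key])
    if not candidates:
        return None
    # The version with the highest timestamp wins; None means it was deleted.
    return max(candidates, key=lambda tv: tv[0])[1]
```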

Range queries
• Bloom filters are useless: they only answer single-key membership
• Use the index to locate the start of the range in each SSTable
• Read the data and merge the results
• Every SSTable's data file must be read
• Disk I/O is proportional to the number of SSTables
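A sketch of a range scan over several sorted SSTables, each modelled as a list of (key, timestamp, value) rows sorted by key with `None` as a tombstone. `bisect` plays the role of the index seek, and `heapq.merge` combines the per-SSTable streams; the function name and row layout are illustrative assumptions.

```python
import bisect
import heapq
import itertools


def range_query(start, end, sstables):
    """Return live (key, value) pairs with start <= key < end, in key order."""
    runs = []
    for table in sstables:
        # Every SSTable is visited: there is no filter to rule any of them out.
        keys = [row[0] for row in table]
        lo = bisect.bisect_left(keys, start)  # index seek to the range start
        runs.append(itertools.takewhile(lambda row: row[0] < end, table[lo:]))
    merged = heapq.merge(*runs, key=lambda row: row[0])
    result = []
    for key, group in itertools.groupby(merged, key=lambda row: row[0]):
        _, _, value = max(group, key=lambda row: row[1])  # newest write wins
        if value is not None:  # skip deleted keys
            result.append((key, value))
    return result
```

The number of streams being merged, and so the number of disk reads, grows with the number of SSTables.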

Compaction

• Merges SSTables into new ones
• Discards overwritten values and obsolete tombstones
• Improves range query performance: fewer SSTables to read
• Major compaction merges all SSTables into one
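A sketch of a major compaction over sorted SSTables of (key, timestamp, value) rows, where `None` marks a tombstone and its timestamp is the deletion time. The `gc_grace` parameter is modelled loosely on Cassandra's gc_grace_seconds setting; the function and row layout are hypothetical, and dropping tombstones this way is only safe because every SSTable takes part.

```python
import heapq
import itertools


def compact(sstables, now, gc_grace):
    """Merge sorted SSTables into one, keeping only the newest write per key."""
    merged = heapq.merge(*sstables, key=lambda row: row[0])
    output = []
    for key, group in itertools.groupby(merged, key=lambda row: row[0]):
        newest = max(group, key=lambda row: row[1])  # older overwrites are dropped
        _, timestamp, value = newest
        if value is None and now - timestamp > gc_grace:
            continue  # obsolete tombstone: old enough to forget
        output.append(newest)  # live value, or a tombstone still within grace
    return output
```

Recent tombstones are kept so that the deletion can still reach replicas that missed it.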

Write optimised
• All writes are sequential on disk
• Each write is rewritten several times by later compactions
• Bloom filters mean a point read costs approx. one I/O
• Avoid a read-modify-write data model: reads are more expensive than writes
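A back-of-the-envelope check of the one-I/O claim, assuming m bits of filter for n keys (1 byte per key, so m/n = 8), k = 6 hash functions, a key that lives in exactly one of S SSTables, and one disk read per SSTable whose filter answers yes (ignoring index reads and caches):

```latex
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k}
   = \left(1 - e^{-6/8}\right)^{6} \approx 0.022
   && \text{($m/n = 8$, $k = 6$)} \\
\mathbb{E}[\text{disk reads}] &\approx 1 + (S - 1)\,p
   = 1 + 9 \times 0.022 \approx 1.2
   && \text{($S = 10$ SSTables)}
\end{aligned}
```

Even with ten SSTables, false positives add only about a fifth of an extra read per lookup under these assumptions.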

Scaling
• Held in memory:
  • Buffers
  • Memtables
  • Bloom filters
  • Index
• If these do not fit in memory, performance suffers significantly
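A rough sizing helper for two of the in-memory structures. Every parameter is an illustrative assumption: about 1 byte of Bloom filter per key, and one in-memory index entry (the key plus an 8-byte offset) per 128 keys. It ignores object overhead, buffers and memtables.

```python
def heap_estimate_bytes(num_keys, avg_key_bytes, bloom_bytes_per_key=1,
                        index_interval=128, offset_bytes=8):
    """Rough memory needed for Bloom filters plus the sampled index."""
    bloom = num_keys * bloom_bytes_per_key
    index_entries = num_keys // index_interval
    index = index_entries * (avg_key_bytes + offset_bytes)
    return bloom + index
```

For example, a billion 24-byte keys come to about 1.25 GB under these assumptions, before any memtables or buffers are counted.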

Repair: Merkle Trees
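• Repair builds a Merkle tree of each replica's data
• The trees are compared between replicas
• Efficient: only hashes are exchanged, not the data itself
• If differences are found, only the affected portions of SSTables are streamed
• Building the tree requires a full scan of the data on disk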
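A sketch of Merkle-tree comparison. Each key is assigned to one of 2**depth leaf ranges by hashing it with SHA-256, standing in for Cassandra's token ranges; leaves hash their rows and parents hash their children. Comparing two trees from the root down finds the ranges to stream. The function names and the hashing scheme are assumptions.

```python
import hashlib


def digest(data):
    return hashlib.sha256(data).digest()


def build_tree(rows, depth):
    """Return tree levels, root first. rows: dict of str key -> str value."""
    n = 2 ** depth
    buckets = [[] for _ in range(n)]
    for key in sorted(rows):  # building the tree means scanning every row
        token = int.from_bytes(digest(key.encode())[:8], "big")
        buckets[token % n].append(f"{key}={rows[key]}".encode())
    level = [digest(b"".join(bucket)) for bucket in buckets]  # leaf hashes
    levels = [level]
    while len(level) > 1:
        level = [digest(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.insert(0, level)
    return levels


def differing_ranges(a, b):
    """Compare two trees level by level, descending only into differing nodes."""
    suspects = [0]  # node indices at the current level, starting at the root
    for depth, (level_a, level_b) in enumerate(zip(a, b)):
        suspects = [i for i in suspects if level_a[i] != level_b[i]]
        if depth < len(a) - 1:
            suspects = [c for i in suspects for c in (2 * i, 2 * i + 1)]
    return suspects  # leaf ranges whose data must be streamed
```

If the roots match, nothing more is exchanged; otherwise only the subtrees that differ are examined.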

Snapshot

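• For backups, you want a consistent set of SSTables
• nodetool snapshot does this
• Creates hard links to the existing SSTables, so initially no data is copied
• After a few compactions the original SSTables have been replaced, so the snapshot's files take up as much disk space as a full copy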
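In Cassandra this is done by nodetool snapshot; the sketch below only demonstrates the hard-link behaviour it relies on, using a hypothetical `snapshot` helper and file names. Deleting the original file (as compaction does) leaves the snapshot's link, and the data it points to, intact.

```python
import os


def snapshot(sstable_paths, snapshot_dir):
    """Hard-link each SSTable file into snapshot_dir; no data is copied."""
    os.makedirs(snapshot_dir, exist_ok=True)
    for path in sstable_paths:
        os.link(path, os.path.join(snapshot_dir, os.path.basename(path)))
```

Right after the snapshot both names share one copy of the data on disk; once compaction deletes the original, the snapshot link is the only reference left, so that space is no longer shared with live data.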

Summary
• How writes end up on disk
• How point queries and range queries find the data
• Implications
• Repair
• Snapshot
