Cassandra under the hood
Richard Low
rlow@acunu.com
Outline
• What happens when you write?
 • Commit logs
 • Memtables
 • SSTables
• What happens when you read?
 • Point queries
 • Range queries
• Repair and snapshots
[Slide graphic: example row "richard": { "email": "rlow@acunu.com" }]
Why should we care?

• Helps to understand performance
• Helps to understand the performance implications of the data model
• Helps to fix things when something goes wrong
• Interesting!
Writes
             Insert

Commit log            Memtable

                      SSTable
             { Bloom filter, Index, Data }
Commit log
• Each insert is written to the commit log first
• Entries are stored in insertion order
• Inserts are not acknowledged until written to the commit log
• Sync mode: batch vs periodic
• After a crash, the commit log can be replayed
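
A minimal sketch of the idea above, not Cassandra's actual implementation: an append-only log that is synced before each insert is acknowledged, plus a replay function that rebuilds state in insertion order. The class name `CommitLog`, the JSON line format and the file name are assumptions made for illustration.

```python
import json
import os
import tempfile


class CommitLog:
    """Toy append-only log (hypothetical format: one JSON entry per line)."""

    def __init__(self, path):
        self.path = path
        self.f = open(path, "a", encoding="utf-8")

    def append(self, key, value):
        # Entries are appended, so they sit on disk in insertion order.
        self.f.write(json.dumps([key, value]) + "\n")
        self.f.flush()
        os.fsync(self.f.fileno())  # sync before acknowledging (batch-like behaviour)
        return True                # the acknowledgement

    def close(self):
        self.f.close()


def replay(path):
    """Rebuild in-memory state by re-applying entries in insertion order."""
    state = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            key, value = json.loads(line)
            state[key] = value
    return state


def demo_commit_log():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "commit.log")
        log = CommitLog(path)
        log.append("richard", {"email": "rlow@acunu.com"})
        log.append("richard", {"email": "richard@example.com"})
        log.close()  # pretend the process crashed here
        return replay(path)
```

Calling `demo_commit_log()` replays both entries, so the later insert for the same key wins.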
Memtable

• In-memory store of recent inserts
• Implemented as a ConcurrentSkipListMap (kept sorted by key)
• When it grows too large, it is flushed to disk
• Ensures all writes to disk are sequential
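
A rough sketch of the flush behaviour, under simplifying assumptions: a plain dict stands in for the ConcurrentSkipListMap and is sorted only at flush time, the size limit is a number of entries rather than bytes, and the flushed run is a Python list rather than a file. The names `Memtable`, `max_entries` and `demo_memtable` are hypothetical.

```python
class Memtable:
    """Toy memtable: buffers inserts in memory, flushes them as one sorted run."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self.data = {}

    def put(self, key, value):
        self.data[key] = value  # an overwrite replaces the old value in memory

    def is_full(self):
        return len(self.data) >= self.max_entries

    def flush(self):
        # Emitting all entries in key order lets the run be written to disk
        # front to back, i.e. sequentially.
        run = sorted(self.data.items())
        self.data = {}
        return run


def demo_memtable():
    memtable = Memtable(max_entries=3)
    flushed_runs = []
    for key in ["k_2", "k_0", "k_1", "k_3"]:
        memtable.put(key, "v_" + key)
        if memtable.is_full():
            flushed_runs.append(memtable.flush())
    return flushed_runs, memtable.data
```

In the demo, the first three inserts arrive out of order but leave memory as a single sorted run; the fourth insert stays in the fresh memtable.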
SSTables

• Stores actual data, sorted by key
• Contains a Bloom filter and index to help
  find keys
• Read-only: never modified once written
Bloom filters
• Probabilistic data structure
• Answers membership queries:
 • ‘Does the set contain x?’
• Can give false positives, never false
  negatives
• Space efficient
• Typical size: 1 byte per key
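
To make the membership test concrete, here is a small illustrative Bloom filter. It is a sketch, not Cassandra's implementation: the bit count, the number of hash functions and the use of SHA-256 to derive bit positions are all assumptions chosen for readability.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k bit positions per key in a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

Every added key sets all of its bits, so a later lookup for it can never miss (no false negatives). A key that was never added may still find all of its bits set by other keys, which is where false positives come from.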
How it works together
          Memory                     |  Disk
   Bloom filter          Index       |  Data

011010111010010   k_0    ->  0       |  k_0....................
                  k_128  ->  4582    |  .....k_1...............
                  k_256  ->  9242    |  .........k_2...........k_3....

    Contains x?       Where is x?    |  Retrieve x
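
The index in the diagram holds only every 128th key (k_0, k_128, k_256), so "Where is x?" gives a starting point rather than an exact location. A sketch of that lookup, with assumptions: integer keys, list positions in place of byte offsets, and hypothetical function names `build_index` and `lookup`.

```python
import bisect


def build_index(rows, sample_every=128):
    """rows: list of (key, value) sorted by key. Samples every n-th key."""
    positions = list(range(0, len(rows), sample_every))
    keys = [rows[p][0] for p in positions]
    return keys, positions


def lookup(rows, index, key):
    index_keys, index_positions = index
    # Find the last sampled key that is <= the key we want.
    i = bisect.bisect_right(index_keys, key) - 1
    if i < 0:
        return None  # key sorts before everything in this SSTable
    pos = index_positions[i]
    # Scan forward through the data from that starting point.
    while pos < len(rows) and rows[pos][0] <= key:
        if rows[pos][0] == key:
            return rows[pos][1]
        pos += 1
    return None


rows = [(k, f"v{k}") for k in range(10)]
index = build_index(rows, sample_every=4)  # samples keys 0, 4 and 8
```

With this data, `lookup(rows, index, 5)` starts at the sampled key 4 and scans one row forward. A smaller sampling interval costs more memory but shortens the scan.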
Point queries
[Diagram: a row of memtables (key -> value) above several SSTables, each with an index (k_0 -> 0, k_128 -> 4582, k_256 -> 9242) and a data file]

1. Query filter
2. Find location
3. Read data
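
The three steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the real read path: a plain set stands in for the Bloom filter (so it gives no false positives), a dict stands in for the index and data file, and every value carries a timestamp so the newest version can be chosen. `SSTable`, `point_query` and `disk_reads` are hypothetical names.

```python
class SSTable:
    """Toy read-only SSTable. rows: dict of key -> (timestamp, value)."""

    def __init__(self, rows):
        self.data = dict(rows)
        self.key_filter = set(self.data)  # stand-in for a Bloom filter
        self.disk_reads = 0

    def get(self, key):
        if key not in self.key_filter:  # 1. query filter
            return None                 #    definitely absent: no disk access
        self.disk_reads += 1            # 2. find location, 3. read data
        return self.data.get(key)


def point_query(key, memtable, sstables):
    """Return the newest value for key across the memtable and all SSTables."""
    candidates = []
    if key in memtable:
        candidates.append(memtable[key])
    for sstable in sstables:
        hit = sstable.get(key)
        if hit is not None:
            candidates.append(hit)
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0])[1]  # highest timestamp wins


memtable = {"k_1": (3, "new")}
older = SSTable({"k_1": (1, "old"), "k_2": (1, "two")})
other = SSTable({"k_3": (2, "three")})
result = point_query("k_1", memtable, [older, other])
```

Here only the SSTable whose filter matches is read; the other one is skipped without touching its data.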
Range queries
• Bloom filters do not help: they only answer single-key membership
• Use the index to locate the relevant portion of each SSTable
• Read the data and merge the results
• Must look in every SSTable data file
• Disk I/O is proportional to the number of SSTables
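
A sketch of the read-and-merge step, assuming each SSTable is a sorted list of (key, timestamp, value) tuples with one row per key. `bisect` plays the role of the index, and `heapq.merge` combines the slices. The function name `range_query` and the half-open range convention are choices made for this example.

```python
import bisect
import heapq


def range_query(sstables, start, end):
    """Return (key, newest value) for start <= key < end across all SSTables."""
    slices = []
    for rows in sstables:
        # Locate the relevant portion of this SSTable. The 1-tuple (start,)
        # sorts before any row with that key, so lo is the first row >= start.
        lo = bisect.bisect_left(rows, (start,))
        hi = bisect.bisect_left(rows, (end,))
        slices.append(rows[lo:hi])  # every SSTable has to be consulted
    result = {}
    # Merged rows arrive ordered by key, then timestamp, so for a repeated
    # key the last row seen is the newest and overwrites the earlier one.
    for key, _timestamp, value in heapq.merge(*slices):
        result[key] = value
    return list(result.items())


sstable_1 = [("a", 1, "a1"), ("c", 1, "c1")]
sstable_2 = [("b", 2, "b2"), ("c", 2, "c2"), ("d", 2, "d2")]
merged = range_query([sstable_1, sstable_2], "b", "d")
```

The loop over `sstables` is the point of the slide: each extra SSTable adds another slice to locate and read.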
Compaction

• Merges SSTables
• Removes overwritten values and obsolete tombstones
• Improves range query performance
• Major compaction creates one SSTable
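
A simplified sketch of a merge like this, assuming SSTables are sorted lists of (key, timestamp, value) and that `None` marks a tombstone. For brevity it treats every tombstone as obsolete, which a real system cannot do. `compact` and `TOMBSTONE` are hypothetical names.

```python
import heapq

TOMBSTONE = None  # hypothetical deletion marker


def compact(sstables, drop_tombstones=True):
    """Merge sorted runs into one, keeping only the newest version of each key."""
    merged = []
    ordered = heapq.merge(*sstables, key=lambda row: (row[0], row[1]))
    for key, timestamp, value in ordered:
        if merged and merged[-1][0] == key:
            # Same key again with a newer timestamp: the overwrite replaces it.
            merged[-1] = (key, timestamp, value)
        else:
            merged.append((key, timestamp, value))
    if drop_tombstones:
        merged = [row for row in merged if row[2] is not TOMBSTONE]
    return merged


old_run = [("a", 1, "a1"), ("b", 1, "b1"), ("c", 1, "c1")]
new_run = [("b", 2, TOMBSTONE), ("c", 2, "c2")]
compacted = compact([old_run, new_run])
```

Two input runs with five rows become one run with two rows: `b` was deleted and `c` was overwritten, so a later range query reads less data from fewer files.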
Write optimised
• All writes are sequential on disk
• Each write is written multiple times during
  compactions
• Bloom filters mean approx. one I/O per
  read
• Avoid a read-modify-write data model
Scaling
• In memory:
 • Buffers
 • Memtables
 • Bloom filters
 • Index
• If these do not fit in memory, performance suffers significantly
Repair: Merkle Trees
• Repair builds a Merkle tree
• Compared with replicas
• Efficient
• If differences are found, portions of SSTables are streamed
• Requires full disk scan to build
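
A sketch of why the comparison is efficient: two replicas exchange hashes, and only subtrees whose hashes differ are explored. The assumptions here are a power-of-two number of leaves, one leaf per key range, SHA-256 as the hash, and the hypothetical function names `build_tree` and `differing_ranges`.

```python
import hashlib


def _hash(data):
    return hashlib.sha256(data).digest()


def build_tree(leaves):
    """leaves: list of bytes, one per key range. Returns levels, leaves first."""
    level = [_hash(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        # Each parent is the hash of its two children.
        level = [_hash(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_ranges(tree_a, tree_b):
    """Return leaf indexes that differ, descending only into unequal subtrees."""
    top = len(tree_a) - 1
    if tree_a[top][0] == tree_b[top][0]:
        return []  # roots match: the replicas agree, nothing to stream
    suspects = [0]
    for depth in range(top - 1, -1, -1):
        next_suspects = []
        for node in suspects:
            for child in (2 * node, 2 * node + 1):
                if tree_a[depth][child] != tree_b[depth][child]:
                    next_suspects.append(child)
        suspects = next_suspects
    return suspects


replica_a = build_tree([b"range0", b"range1", b"range2", b"range3"])
replica_b = build_tree([b"range0", b"range1", b"CHANGED", b"range3"])
out_of_sync = differing_ranges(replica_a, replica_b)
```

Only the range at index 2 would need to be streamed. Building the leaves is the expensive part, since every leaf hash depends on the data in its range.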
Snapshot

• For backup, want a consistent set of SSTables
• nodetool snapshot does this
• Creates hard links to the existing SSTables
• Implies the snapshot takes no extra space at first, but holds its own copy of the data once compactions have replaced the original SSTables
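
The hard-link behaviour can be demonstrated with Python's `os.link` on a filesystem that supports hard links. This is only an illustration of the mechanism, not what `nodetool snapshot` runs internally; the directory layout and the file name `sstable-1-Data.db` are hypothetical.

```python
import os
import tempfile


def snapshot(data_dir, snapshot_dir):
    """Hard-link every file in data_dir into snapshot_dir (no data is copied)."""
    os.makedirs(snapshot_dir, exist_ok=True)
    for name in os.listdir(data_dir):
        source = os.path.join(data_dir, name)
        if os.path.isfile(source):
            os.link(source, os.path.join(snapshot_dir, name))


def demo_snapshot():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "data")
        snapshot_dir = os.path.join(root, "snapshots", "s1")
        os.makedirs(data_dir)
        sstable = os.path.join(data_dir, "sstable-1-Data.db")  # hypothetical name
        with open(sstable, "w", encoding="utf-8") as f:
            f.write("k_0,k_1,k_2")
        snapshot(data_dir, snapshot_dir)
        links = os.stat(sstable).st_nlink  # two names, one set of blocks
        os.remove(sstable)  # as if compaction had replaced this SSTable
        linked = os.path.join(snapshot_dir, "sstable-1-Data.db")
        with open(linked, encoding="utf-8") as f:
            return links, f.read()
```

After the original name is removed, the snapshot's link keeps the data alive, which is the point at which the snapshot starts to cost disk space of its own.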
Summary
• How writes end up on disk
• How point queries and range queries find
  the data
• Implications
• Repair
• Snapshot

What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
HCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser EnvironmentsHCL Nomad Web – Best Practices and Managing Multiuser Environments
HCL Nomad Web – Best Practices and Managing Multiuser Environments
panagenda
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Technology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data AnalyticsTechnology Trends in 2025: AI and Big Data Analytics
Technology Trends in 2025: AI and Big Data Analytics
InData Labs
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 

Cassandra internals

  • 2. Outline • What happens when you write? • Commit logs • Memtables • SSTables • What happens when you read? • Point queries • Range queries • Repair and snapshots (example row shown on the slide: “richard”: { “email”: “rlow@acunu.com” })
  • 3. Why should we care? • Helps to understand performance • Shows the performance implications of the data model • Helps to fix things if something goes wrong • Interesting!
  • 5. Writes (2) (diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data })
  • 6. Commit log (same diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data })
  • 7. Commit log • Each insert written to commit log first • Stored in insertion order • Inserts not acknowledged until written to commit log • Batch vs periodic • In case of crash, can replay
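The commit log bullets above can be sketched as a toy write path. This is an illustration only, not Cassandra's implementation: the class `TinyStore`, the JSON-lines log format and the `batch` flag are all assumptions made for the example. Here `batch=True` means "fsync before acknowledging", which loosely mirrors the batch side of the batch vs periodic choice; with `batch=False` the sync is simply left to the operating system.

```python
import json
import os
import tempfile


class TinyStore:
    """Toy write path: append to a commit log first, then update memory."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.memtable = {}
        self.replay()

    def replay(self):
        # After a crash, rebuild in-memory state by re-reading the log
        # in insertion order.
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, "r", encoding="utf-8") as log:
            for line in log:
                record = json.loads(line)
                self.memtable[record["key"]] = record["value"]

    def insert(self, key, value, batch=True):
        # 1. Append to the commit log before anything else.
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"key": key, "value": value}) + "\n")
            if batch:
                # Force the record to disk before acknowledging.
                log.flush()
                os.fsync(log.fileno())
        # 2. Only then apply the write in memory and acknowledge it.
        self.memtable[key] = value
        return True


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = TinyStore(log_path)
store.insert("richard", {"email": "rlow@acunu.com"})

# Simulate a restart: a fresh instance recovers its state from the log.
recovered = TinyStore(log_path)
```

Because the log is append-only, every insert is a sequential write, and replaying it in order reproduces the same final state.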
  • 8. Memtable (diagram: Memtable)
  • 9. Memtable • In memory store of insertions • ConcurrentSkipListMap • When too large, flushed to disk • Ensures all writes to disk are sequential
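A minimal sketch of the memtable idea, assuming a tiny entry-count threshold in place of a real size limit. The slide names Java's ConcurrentSkipListMap; this Python version uses a sorted key list maintained with `bisect` as a stand-in, and the names `Memtable`, `max_entries` and `flush` are hypothetical.

```python
import bisect


class Memtable:
    """In-memory store that keeps keys sorted and is flushed when full."""

    def __init__(self, max_entries=3):
        self.max_entries = max_entries
        self.keys = []    # always kept in sorted order
        self.values = {}

    def put(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def is_full(self):
        return len(self.keys) >= self.max_entries

    def flush(self):
        # Emit everything in key order, so the on-disk write would be one
        # sequential pass, then start again with an empty table.
        run = [(key, self.values[key]) for key in self.keys]
        self.keys, self.values = [], {}
        return run


memtable = Memtable(max_entries=3)
sstables = []  # each flushed run stands in for one SSTable
for key, value in [("k_2", "c"), ("k_0", "a"), ("k_1", "b"), ("k_3", "d")]:
    memtable.put(key, value)
    if memtable.is_full():
        sstables.append(memtable.flush())
```

Inserts arrive in arbitrary order, but each flushed run comes out sorted by key, which is what lets the on-disk files be written sequentially.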
  • 10. SSTable (diagram: Insert, Commit log, Memtable, SSTable { Bloom filter, Index, Data })
  • 11. SSTables • Stores actual data, sorted by key • Contains a Bloom filter and index to help find keys • Read only
  • 12. Bloom filters • Probabilistic data structure • Answers membership queries: • ‘Does the set contain x?’ • Can give false positives, never false negatives • Space efficient • Typical size: 1 byte per key
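To make the membership-query behaviour concrete, here is a small Bloom filter sketch. The sizes, the use of SHA-256 to derive bit positions, and the method names are assumptions chosen for the example and are not taken from Cassandra. At the slide's figure of roughly one byte per key, the 1024 bits used here would suit about 128 keys.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: may report false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting the key with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bloom = BloomFilter()
for n in range(100):
    bloom.add(f"k_{n}")
```

A key that was added always sets all of its bits, so `might_contain` can never answer False for it. A key that was never added can still find all of its bits set by other keys, which is the false-positive case.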
  • 13. How it works together (diagram of one SSTable in three parts: Bloom filter 011010111010010; Index k_0 -> 0, k_128 -> 4582, k_256 -> 9242; Data k_0, k_1, k_2, k_3, ...)
  • 14. How it works together (same diagram, annotated in order: Contains x? Where is x? Retrieve x)
  • 15. How it works together (same diagram, with the parts additionally labelled Memory and Disk)
  • 16. Point queries (diagram: several Memtables, each mapping k_0, k_1, k_2 to values, and several SSTables, each with an index k_0 -> 0, k_128 -> 4582, k_256 -> 9242 and data k_0, k_1, k_2, k_3, ...)
  • 17. Point queries (same diagram) 1. Query filter
  • 18. Point queries (same diagram) 1. Query filter 2. Find location
  • 19. Point queries (same diagram) 1. Query filter 2. Find location 3. Read data
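The three steps on the point query slides (query filter, find location, read data) can be sketched as follows. Everything here is a simplification under stated assumptions: a Python set stands in for the Bloom filter, the sparse index samples every second key rather than every 128th, and the query returns the first hit from the newest table instead of merging results. The names `SSTable`, `index_interval` and `point_query` are hypothetical.

```python
import bisect


class SSTable:
    """Read-only sorted rows with a filter and a sparse index."""

    def __init__(self, rows, index_interval=2):
        self.data = sorted(rows, key=lambda row: row[0])
        # Stand-in for a Bloom filter (a real one could give false positives).
        self.filter = {key for key, _ in self.data}
        # Sparse index: every index_interval-th key and its position in data.
        self.index_pos = list(range(0, len(self.data), index_interval))
        self.index_keys = [self.data[pos][0] for pos in self.index_pos]

    def get(self, key):
        # 1. Query filter: skip this table if the key is definitely absent.
        if key not in self.filter:
            return None
        # 2. Find location: last indexed key that is <= the wanted key.
        slot = bisect.bisect_right(self.index_keys, key) - 1
        if slot < 0:
            return None
        # 3. Read data: scan forward from the indexed position.
        for row_key, value in self.data[self.index_pos[slot]:]:
            if row_key == key:
                return value
            if row_key > key:
                break
        return None


def point_query(key, memtable, sstables):
    """Check the memtable first, then SSTables ordered newest to oldest."""
    if key in memtable:
        return memtable[key]
    for table in sstables:
        value = table.get(key)
        if value is not None:
            return value
    return None


older = SSTable([("k_0", "old0"), ("k_1", "old1"), ("k_2", "old2"), ("k_3", "old3")])
newer = SSTable([("k_1", "new1")])
memtable = {"k_3", }  # placeholder, replaced on the next line
memtable = {"k_3": "mem3"}
```

With a working filter, most SSTables that do not hold the key are skipped without touching their data, which is where the "approximately one I/O per read" figure later in the deck comes from.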
  • 20. Range queries • Bloom filters useless • Use index to locate portion of SSTable • Read data, merge results • Necessary to look up in every SSTable data file • Disk I/O proportional to #SSTables
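A rough sketch of a range query across several SSTables, assuming each table is a sorted list of (key, value) pairs with unique keys and that tables are passed oldest first. A binary search stands in for the index lookup, and `heapq.merge` combines the per-table slices. The function name and the newest-wins rule are assumptions made for this example.

```python
import bisect
import heapq


def range_query(start, end, sstables):
    """Return rows with start <= key < end, newest value winning per key."""

    def slice_of(table, age):
        # Locate the first row with key >= start (stand-in for the index).
        pos = bisect.bisect_left(table, (start,))
        while pos < len(table) and table[pos][0] < end:
            key, value = table[pos]
            # Negated age makes the newest table sort first for equal keys.
            yield (key, -age, value)
            pos += 1

    # Every table has to be consulted: no filter can rule a table out.
    streams = [slice_of(table, age) for age, table in enumerate(sstables)]
    result = []
    for key, _, value in heapq.merge(*streams):
        if not result or result[-1][0] != key:
            result.append((key, value))
    return result


oldest = [("k_0", "a0"), ("k_1", "a1"), ("k_3", "a3")]
newest = [("k_1", "b1"), ("k_2", "b2")]
rows = range_query("k_1", "k_3", [oldest, newest])
```

The loop over `sstables` is the point of the slide: the work grows with the number of tables, since each one contributes a slice that must be read and merged.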
  • 21. Compaction • Merges SSTables • Removes overwrites and obsolete tombstones • Improves range query performance • Major compaction creates one SSTable
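Compaction can be illustrated with a small merge that keeps only the latest version of each key and drops old tombstones. This is a sketch under assumptions: rows are (key, timestamp, value) tuples, `None` marks a tombstone, and `gc_before` is a hypothetical cutoff below which tombstones count as obsolete. A real compaction would stream sorted files rather than build a dictionary in memory.

```python
TOMBSTONE = None  # assumed marker for a deleted value


def compact(sstables, gc_before):
    """Merge tables into one sorted run of the newest rows."""
    latest = {}
    for table in sstables:
        for key, timestamp, value in table:
            # Keep only the most recent write for each key (removes overwrites).
            if key not in latest or timestamp > latest[key][0]:
                latest[key] = (timestamp, value)

    merged = []
    for key in sorted(latest):
        timestamp, value = latest[key]
        # Drop tombstones that are older than the cutoff.
        if value is TOMBSTONE and timestamp < gc_before:
            continue
        merged.append((key, timestamp, value))
    return merged


older = [("k_0", 1, "a"), ("k_1", 1, "b"), ("k_2", 1, "c")]
newer = [("k_1", 5, "b2"), ("k_2", 3, TOMBSTONE)]
compacted = compact([older, newer], gc_before=4)
```

After the merge there is a single sorted run, so a range query over the same data reads one file instead of two.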
  • 22. Write optimised • All writes are sequential on disk • Each write is written multiple times during compactions • Bloom filters mean approx. one I/O per read • Avoid a read-modify-write data model
  • 23. Scaling • In memory: • Buffers • Memtables • Bloom filters • Index • If not enough memory, significant performance impact
  • 24. Repair: Merkle Trees • Repair builds a Merkle tree • Compared with replicas • Efficient • If differences are found, portions of SSTables are streamed • Requires full disk scan to build
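The Merkle tree comparison can be sketched like this. It is an illustration, not Cassandra's repair code: keys are assigned to a fixed number of leaf buckets by hashing (an assumption for this example), `num_leaves` must be a power of two, and the helper names are hypothetical. The comparison only descends into subtrees whose hashes differ, which is what makes it efficient when replicas mostly agree.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def bucket_of(key, num_leaves):
    return int.from_bytes(_h(key.encode("utf-8")), "big") % num_leaves


def build_tree(rows, num_leaves=4):
    """Return tree levels: levels[0] holds leaf hashes, levels[-1] the root."""
    buckets = [[] for _ in range(num_leaves)]
    for key in sorted(rows):  # building the tree scans every row
        buckets[bucket_of(key, num_leaves)].append(f"{key}={rows[key]}")
    level = [_h("\n".join(bucket).encode("utf-8")) for bucket in buckets]
    levels = [level]
    while len(level) > 1:
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def differing_leaves(tree_a, tree_b):
    """Walk down from the root, following only nodes whose hashes differ."""
    top = len(tree_a) - 1
    suspects = [0] if tree_a[top][0] != tree_b[top][0] else []
    for depth in range(top - 1, -1, -1):
        suspects = [
            child
            for node in suspects
            for child in (2 * node, 2 * node + 1)
            if tree_a[depth][child] != tree_b[depth][child]
        ]
    return suspects


replica_a = {"k_0": "a", "k_1": "b", "k_2": "c", "k_3": "d"}
replica_b = {"k_0": "a", "k_1": "b", "k_2": "STALE", "k_3": "d"}
to_stream = differing_leaves(build_tree(replica_a), build_tree(replica_b))
```

Only the buckets listed in `to_stream` would need their rows exchanged between the replicas; identical subtrees are skipped after a single hash comparison.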
  • 25. Snapshot • For backup, want consistent set of SSTables • nodetool snapshot does this • Creates hard links to existing SSTables • Implies the snapshot effectively becomes a full copy after a few compactions, once the original SSTables have been replaced
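The hard-link idea behind snapshots can be demonstrated with the standard library alone. This sketch assumes a filesystem that supports hard links; the directory layout, the `snapshot` helper and the file name are hypothetical and only illustrate the mechanism, not what nodetool does internally.

```python
import os
import tempfile


def snapshot(data_dir, name):
    """Hard-link every file in data_dir into data_dir/snapshots/name."""
    snap_dir = os.path.join(data_dir, "snapshots", name)
    os.makedirs(snap_dir)
    for filename in os.listdir(data_dir):
        source = os.path.join(data_dir, filename)
        if os.path.isfile(source):
            # A hard link is a second name for the same data: no bytes copied.
            os.link(source, os.path.join(snap_dir, filename))
    return snap_dir


data_dir = tempfile.mkdtemp()
sstable = os.path.join(data_dir, "users-1-Data.db")  # hypothetical file name
with open(sstable, "w", encoding="utf-8") as f:
    f.write("k_0,k_1,k_2")

snap_dir = snapshot(data_dir, "backup-1")
links_before = os.stat(sstable).st_nlink

# Stand-in for a compaction deleting the original file afterwards.
os.remove(sstable)
with open(os.path.join(snap_dir, "users-1-Data.db"), encoding="utf-8") as f:
    snapshot_contents = f.read()
```

Taking the snapshot is cheap because both names share the same data on disk. Once the original name is removed, the snapshot is the only thing keeping that data alive, so from then on it occupies space of its own.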
  • 26. Summary • How writes end up on disk • How point queries and range queries find the data • Implications • Repair • Snapshot

Editor's Notes