How To Size Up A Cassandra Cluster
Joe Chu, Technical Trainer
jchu@datastax.com
April 2014
©2014 DataStax Confidential. Do not distribute without consent.
What is Apache Cassandra?
• Distributed NoSQL database
• Linearly scalable
• Highly available with no single point of failure
• Fast writes and reads
• Tunable data consistency
• Rack and Datacenter awareness
Peer-to-peer architecture
• All nodes are the same
• No master / slave architecture
• Less operational overhead for better scalability.
• Eliminates single point of failure, increasing availability.
[Diagram: a master with two slaves, contrasted with a ring of five peers]
Linear Scalability
• Operation throughput increases linearly with the number of
nodes added.
Data Replication
• Cassandra can write copies of data on different nodes.
• The replication factor (RF) setting determines the number of copies.
• The replication strategy can replicate data to different racks and
different datacenters.
[Diagram: with RF = 3, the write below is stored on replicas R1, R2, and R3]
INSERT INTO user_table (id, first_name, last_name)
VALUES (1, 'John', 'Smith');
Node
• Instance of a running Cassandra process.
• Usually represents a single machine or server.
Rack
• Logical grouping of nodes.
• Allows data to be replicated across different racks.
Datacenter
• Grouping of nodes and racks.
• Each datacenter can have separate replication settings.
• May be in different geographical locations, but not always.
Cluster
• Grouping of datacenters, racks, and nodes that communicate
with each other and replicate data.
• Clusters are not aware of other clusters.
Consistency Models
• Immediate consistency
When a write is successful, subsequent reads are
guaranteed to return that latest value.
• Eventual consistency
When a write is successful, stale data may still be read, but
reads will eventually return the latest value.
Tunable Consistency
• Cassandra offers the ability to choose between immediate and
eventual consistency by setting a consistency level.
• Consistency level is set per read or write operation.
• Common consistency levels are ONE, QUORUM, and ALL.
• For multi-datacenter clusters, additional levels such as
LOCAL_QUORUM and EACH_QUORUM are available to control
cross-datacenter traffic.
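A widely used rule of thumb, not stated on the slide, is that reads are immediately consistent when the number of replicas that must acknowledge a write plus the number consulted on a read exceeds the replication factor (R + W > RF). The Python sketch below illustrates that rule for the three common levels. The function names are hypothetical, and the QUORUM size uses the majority formula given later in the deck.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a consistency level (illustrative helper)."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_immediately_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    w = replicas_required(write_level, rf)
    r = replicas_required(read_level, rf)
    return r + w > rf


for write_cl, read_cl in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(write_cl, read_cl, is_immediately_consistent(write_cl, read_cl, rf=3))
```

With RF = 3, writing and reading at ONE gives eventual consistency, while QUORUM/QUORUM or ONE/ALL gives immediate consistency.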
CL ONE
• Write: Success when at least one replica node has
acknowledged the write.
• Read: Only one replica node is given the read request.
[Diagram: client, coordinator, and replicas R1, R2, R3 with RF = 3]
CL QUORUM
• Write: Success when a majority of the replica nodes have
acknowledged the write.
• Read: A majority of the replica nodes are given the read request.
• Majority = floor(RF / 2) + 1
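The majority formula uses integer division, which matters for odd replication factors. A minimal Python sketch of the calculation (the function name `quorum` is chosen here for illustration):

```python
def quorum(rf: int) -> int:
    """Majority of replicas, using integer (floor) division."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    return rf // 2 + 1


for rf in (1, 2, 3, 4, 5):
    print(f"RF={rf} -> QUORUM needs {quorum(rf)} replica(s)")
```

For RF = 3 the quorum is 2, and for RF = 5 it is 3.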
[Diagram: client, coordinator, and replicas R1, R2, R3 with RF = 3]
CL ALL
• Write: Success when all of the replica nodes have
acknowledged the write.
• Read: All replica nodes are given the read request.
[Diagram: client, coordinator, and replicas R1, R2, R3 with RF = 3]
Log-Structured Storage Engine
• Cassandra's storage engine is inspired by Google Bigtable
• Key to fast write performance on Cassandra
[Diagram: commit log, memtable, and three SSTables]
Updates and Deletes
• SSTable files are immutable and cannot be changed.
• Updates are written as new data.
• Deletes write a tombstone, which marks a row or column(s) as
deleted.
• Updates and deletes are just as fast as inserts.
[Diagram: three SSTables holding successive versions of row id:1]
SSTable 1: id:1, first:John, last:Smith (timestamp: …405)
SSTable 2: id:1, first:John, last:Williams (timestamp: …621)
SSTable 3: id:1, deleted (timestamp: …999)
Compaction
• Periodically an operation is triggered that will merge the data
in several SSTables into a single SSTable.
• Helps limit the number of SSTables to read.
• Removes old data and tombstones.
• The source SSTables are deleted after compaction.
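The merge step can be pictured as "newest timestamp wins". The Python sketch below is a deliberately simplified model, not Cassandra's implementation: it treats each SSTable as a dictionary keyed by row id, resolves whole rows rather than individual columns, and uses a hypothetical `purge_tombstones` flag to stand in for the point at which tombstones become eligible for removal.

```python
from typing import Dict, Iterable, NamedTuple, Optional


class Version(NamedTuple):
    timestamp: int
    value: Optional[dict]  # None stands for a tombstone (deleted row)


def compact(sstables: Iterable[Dict[int, Version]],
            purge_tombstones: bool = False) -> Dict[int, Version]:
    """Merge several SSTables, keeping only the newest version of each row."""
    merged: Dict[int, Version] = {}
    for table in sstables:
        for key, version in table.items():
            current = merged.get(key)
            if current is None or version.timestamp > current.timestamp:
                merged[key] = version
    if purge_tombstones:
        # Drop rows whose newest version is a tombstone.
        merged = {k: v for k, v in merged.items() if v.value is not None}
    return merged


# The three versions of row id:1 from the slide's diagram.
old_sstables = [
    {1: Version(405, {"first": "John", "last": "Smith"})},
    {1: Version(621, {"first": "John", "last": "Williams"})},
    {1: Version(999, None)},
]
print(compact(old_sstables))
```

As in the slide's diagram, the new SSTable holds only the tombstone with timestamp 999; the two older versions are discarded.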
[Diagram: compaction merges three SSTables into one]
Before: id:1, first:John, last:Smith (timestamp: 405); id:1, first:John, last:Williams (timestamp: 621); id:1, deleted (timestamp: 999)
After (new SSTable): id:1, deleted (timestamp: 999)
Cluster Sizing
Cluster Sizing Considerations
• Replication Factor
• Data Size
“How many nodes would I need to store my data set?”
• Data Velocity (Performance)
“How many nodes would I need to achieve my desired
throughput?”
Choosing a Replication Factor
What Are You Using Replication For?
• Durability or Availability?
• Each node has local durability (Commit Log), but replication
can be used for distributed durability.
• For availability, a recommended setting is RF=3.
• RF=3 is the minimum necessary to achieve both consistency
and availability using QUORUM and LOCAL_QUORUM.
How Replication Can Affect Consistency Level
• When RF < 3, you do not have as much flexibility when
choosing consistency and availability.
• With RF = 2, a quorum is 2 replicas, so QUORUM = ALL: neither
level can be met if one replica is down.
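To see the loss of flexibility concretely, the Python sketch below lists which of the common consistency levels can still be satisfied when some replicas of a piece of data are down. The function name and the level table are illustrative assumptions; only ONE, QUORUM, and ALL are modeled.

```python
from typing import List


def satisfiable_levels(rf: int, replicas_down: int) -> List[str]:
    """Consistency levels that can still succeed with some replicas down."""
    alive = rf - replicas_down
    required = {"ONE": 1, "QUORUM": rf // 2 + 1, "ALL": rf}
    return [level for level, needed in required.items() if alive >= needed]


print("RF=2, one replica down:", satisfiable_levels(2, 1))
print("RF=3, one replica down:", satisfiable_levels(3, 1))
```

With RF = 2 and one replica down, only ONE remains available. With RF = 3, QUORUM still succeeds, which is why RF = 3 is the smallest setting that offers both immediate consistency and tolerance of a single failure.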
[Diagram: client, coordinator, and replicas R1 and R2 with RF = 2]
Using A Larger Replication Factor
• When RF > 3, there is more data usage and higher latency for
operations requiring immediate consistency.
• If using eventual consistency, a consistency level of ONE will
have consistent performance regardless of the replication
factor.
• High availability clusters may use a replication factor as high
as 5.
Data Size
Disk Usage Factors
• Data Size
• Replication Setting
• Old Data
• Compaction
• Snapshots
Data Sizing
• Row and Column Data
• Row and Column Overhead
• Indices and Other Structures
Replication Overhead
• A replication factor > 1 will effectively multiply your data size
by that amount.
[Diagram: relative data size at RF = 1, RF = 2, and RF = 3]
Old Data
• Updates and deletes do not actually overwrite or delete data.
• Older versions of data and tombstones remain in the SSTable
files until they are compacted.
• This becomes more important for heavy update and delete
workloads.
Compaction
• Compaction needs free disk space to write the new
SSTable, before the SSTables being compacted are removed.
• Leave enough free disk space on each node to allow
compactions to run.
• The worst-case free space needed for the Size-Tiered Compaction
Strategy is 50% of the total data capacity of the node.
• For the Leveled Compaction Strategy, it is about 10% of
the total data capacity.
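The headroom figures above translate directly into usable capacity per node. The Python sketch below applies the slide's 50% and 10% worst-case figures; the strategy keys and the function name are hypothetical labels chosen for this example.

```python
# Worst-case free space to reserve for compaction, as a percentage of disk capacity
# (figures taken from the slide; the dictionary keys are illustrative labels).
COMPACTION_HEADROOM_PCT = {"size_tiered": 50, "leveled": 10}


def usable_capacity_gb(disk_gb: float, strategy: str) -> float:
    """Disk space left for data after reserving compaction headroom."""
    headroom_pct = COMPACTION_HEADROOM_PCT[strategy]
    return disk_gb * (100 - headroom_pct) / 100


for strategy in COMPACTION_HEADROOM_PCT:
    print(strategy, usable_capacity_gb(1000, strategy), "GB usable of 1000 GB")
```

On a 1000 GB disk this leaves 500 GB for data under size-tiered compaction and 900 GB under leveled compaction, before accounting for snapshots or old data.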
Snapshots
• Snapshots are hard-links or copies of SSTable data files.
• After SSTables are compacted, the disk space may not be
reclaimed if a snapshot of those SSTables was created.
Snapshots are created when:
• Executing the nodetool snapshot command
• Dropping a keyspace or table
• Incremental backups
• During compaction
Recommended Disk Capacity
• For current Cassandra versions, the ideal disk capacity is
approximately 1 TB per node if using spinning disks and 3-5 TB
per node if using SSDs.
• Larger disk capacities are possible, but the resulting
performance may be the limiting factor.
• What works for you is still dependent on your data model
design and desired data velocity.
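A rough node count for storage can be estimated by multiplying the data set by the replication factor and dividing by the usable space per node. The Python sketch below is a back-of-the-envelope estimate under stated assumptions: the input figures are made up for illustration, usable space per node is whatever remains after compaction headroom, and the result is never allowed to drop below RF, since each replica of a row lives on a different node.

```python
import math


def nodes_for_storage(raw_data_gb: float, replication_factor: int,
                      usable_gb_per_node: float) -> int:
    """Estimate the nodes needed to hold the replicated data set."""
    replicated_gb = raw_data_gb * replication_factor
    by_capacity = math.ceil(replicated_gb / usable_gb_per_node)
    return max(replication_factor, by_capacity)


# Hypothetical workload: 2000 GB of raw data, RF = 3, and 500 GB usable per node
# (for example a 1 TB spinning disk with 50% reserved for compaction).
print(nodes_for_storage(2000, 3, 500))  # 12
```

The estimate ignores old data awaiting compaction and snapshots, so real deployments would normally add a margin on top.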
Data Velocity (Performance)
How to Measure Performance
• I/O Throughput
“How many reads and writes can be completed per
second?”
• Read and Write Latency
“How fast should I be able to get a response for my read and
write requests?”
Sizing for Failure
• Cluster must be sized taking into account the performance
impact caused by failure.
• When a node fails, the corresponding workload must be
absorbed by the other replica nodes in the cluster.
• Performance is further impacted when recovering a node.
Data must be streamed or repaired using the other replica
nodes.
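The effect of a failure on the remaining nodes can be estimated with simple arithmetic. The Python sketch below assumes, as a simplification, that the failed node's workload spreads evenly over all surviving nodes; in practice it lands on the other replicas of the affected data, so some nodes may see more. The function names and the throughput figures are hypothetical.

```python
import math


def per_node_load(total_ops_per_sec: float, nodes: int, failed: int = 0) -> float:
    """Average operations per second each surviving node must handle."""
    surviving = nodes - failed
    if surviving <= 0:
        raise ValueError("at least one node must survive")
    return total_ops_per_sec / surviving


def nodes_for_throughput(total_ops_per_sec: float, ops_per_node: float,
                         failures_to_tolerate: int = 1) -> int:
    """Nodes needed so the target throughput still fits after some failures."""
    return math.ceil(total_ops_per_sec / ops_per_node) + failures_to_tolerate


# Hypothetical cluster: 60,000 ops/s across 6 nodes rated at 10,000 ops/s each.
print(per_node_load(60_000, 6))            # 10000.0
print(per_node_load(60_000, 6, failed=1))  # 12000.0
print(nodes_for_throughput(60_000, 10_000))  # 7
```

In this example six nodes are fully loaded when healthy and overloaded after one failure, so a seventh node is needed to absorb a single failure. Recovery traffic (streaming or repair) adds further load that this sketch does not model.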
Hardware Considerations for Performance
CPU
• Operations are often CPU-intensive.
• More cores are better.
Memory
• Cassandra uses JVM heap memory.
• Additional memory used as off-heap memory by Cassandra,
or as the OS page cache.
Disk
• C* is optimized for spinning disks, but SSDs will perform better.
• Attached storage (SAN) is strongly discouraged.
Some Final Words…
Summary
• Cassandra allows flexibility when sizing your cluster, from a
single node to thousands of nodes.
• Your use case will dictate how you want to size and configure
your Cassandra cluster. Do you need availability? Immediate
consistency?
• The minimum number of nodes needed will be determined by
your data size, desired performance and replication factor.
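One way to read the last point is that the cluster must satisfy all three constraints at once, so the minimum is the largest of the three individual requirements. A minimal Python sketch of that reading, with a hypothetical function name and made-up inputs:

```python
def minimum_nodes(nodes_for_data: int, nodes_for_performance: int,
                  replication_factor: int) -> int:
    """Smallest cluster that meets the storage, throughput, and replication needs."""
    return max(nodes_for_data, nodes_for_performance, replication_factor)


# Hypothetical estimates: 12 nodes for storage, 7 for throughput, RF = 3.
print(minimum_nodes(12, 7, 3))  # 12
```

Here storage is the binding constraint; a write-heavy workload with a small data set could just as easily be bound by throughput.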
Additional Resources
• DataStax Documentation
http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html
• Planet Cassandra
http://planetcassandra.org/nosql-cassandra-education/
• Cassandra Users Mailing List
user-subscribe@cassandra.apache.org
http://mail-archives.apache.org/mod_mbox/cassandra-user/
Questions?
Thank You
We power the big data
apps that transform business.
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Data Modeling for Apache Cassandra
Data Modeling for Apache CassandraData Modeling for Apache Cassandra
Data Modeling for Apache Cassandra
DataStax Academy
 
Production Ready Cassandra
Production Ready CassandraProduction Ready Cassandra
Production Ready Cassandra
DataStax Academy
 
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & PythonCassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
Cassandra @ Netflix: Monitoring C* at Scale, Gossip and Tickler & Python
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1Cassandra @ Sony: The good, the bad, and the ugly part 1
Cassandra @ Sony: The good, the bad, and the ugly part 1
DataStax Academy
 
Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2Cassandra @ Sony: The good, the bad, and the ugly part 2
Cassandra @ Sony: The good, the bad, and the ugly part 2
DataStax Academy
 
Standing Up Your First Cluster
Standing Up Your First ClusterStanding Up Your First Cluster
Standing Up Your First Cluster
DataStax Academy
 
Real Time Analytics with Dse
Real Time Analytics with DseReal Time Analytics with Dse
Real Time Analytics with Dse
DataStax Academy
 
Introduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache CassandraIntroduction to Data Modeling with Apache Cassandra
Introduction to Data Modeling with Apache Cassandra
DataStax Academy
 
Enabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax EnterpriseEnabling Search in your Cassandra Application with DataStax Enterprise
Enabling Search in your Cassandra Application with DataStax Enterprise
DataStax Academy
 
Advanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache CassandraAdvanced Data Modeling with Apache Cassandra
Advanced Data Modeling with Apache Cassandra
DataStax Academy
 
Apache Cassandra and Drivers
Apache Cassandra and DriversApache Cassandra and Drivers
Apache Cassandra and Drivers
DataStax Academy
 
Ad

Recently uploaded (20)

AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
"PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System""PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System"
Jainul Musani
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Automation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From AnywhereAutomation Dreamin': Capture User Feedback From Anywhere
Automation Dreamin': Capture User Feedback From Anywhere
Lynda Kane
 
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
#AdminHour presents: Hour of Code2018 slide deck from 12/6/2018
Lynda Kane
 
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath MaestroDev Dives: Automate and orchestrate your processes with UiPath Maestro
Dev Dives: Automate and orchestrate your processes with UiPath Maestro
UiPathCommunity
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your UsersAutomation Dreamin' 2022: Sharing Some Gratitude with Your Users
Automation Dreamin' 2022: Sharing Some Gratitude with Your Users
Lynda Kane
 
Hands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordDataHands On: Create a Lightning Aura Component with force:RecordData
Hands On: Create a Lightning Aura Component with force:RecordData
Lynda Kane
 
Rusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond SparkRusty Waters: Elevating Lakehouses Beyond Spark
Rusty Waters: Elevating Lakehouses Beyond Spark
carlyakerly1
 
Leading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael JidaelLeading AI Innovation As A Product Manager - Michael Jidael
Leading AI Innovation As A Product Manager - Michael Jidael
Michael Jidael
 
"PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System""PHP and MySQL CRUD Operations for Student Management System"
"PHP and MySQL CRUD Operations for Student Management System"
Jainul Musani
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Learn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step GuideLearn the Basics of Agile Development: Your Step-by-Step Guide
Learn the Basics of Agile Development: Your Step-by-Step Guide
Marcel David
 
What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...What is Model Context Protocol(MCP) - The new technology for communication bw...
What is Model Context Protocol(MCP) - The new technology for communication bw...
Vishnu Singh Chundawat
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.Network Security. Different aspects of Network Security.
Network Security. Different aspects of Network Security.
gregtap1
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Asthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdfAsthma presentación en inglés abril 2025 pdf
Asthma presentación en inglés abril 2025 pdf
VanessaRaudez
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

How to size up an Apache Cassandra cluster (Training)

  • 1. How To Size Up A Cassandra Cluster Joe Chu, Technical Trainer jchu@datastax.com April 2014 ©2014 DataStax Confidential. Do not distribute without consent.
  • 2. What is Apache Cassandra? • Distributed NoSQL database • Linearly scalable • Highly available with no single point of failure • Fast writes and reads • Tunable data consistency • Rack and datacenter awareness
  • 3. Peer-to-peer architecture • All nodes are the same • No master / slave architecture • Less operational overhead for better scalability. • Eliminates single point of failure, increasing availability. (Diagram: a master with two slaves, contrasted with a ring of five peers.)
  • 4. Linear Scalability • Operation throughput increases linearly with the number of nodes added.
  • 5. Data Replication • Cassandra can write copies of data on different nodes. • Replication factor setting determines the number of copies. • Replication strategy can replicate data to different racks and different datacenters. (Diagram: INSERT INTO user_table (id, first_name, last_name) VALUES (1, 'John', 'Smith'); written to replicas R1, R2, and R3 with RF = 3.)
  • 6. Node • Instance of a running Cassandra process. • Usually represents a single machine or server.
  • 7. Rack • Logical grouping of nodes. • Allows data to be replicated across different racks.
  • 8. Datacenter • Grouping of nodes and racks. • Each datacenter can have separate replication settings. • May be in different geographical locations, but not always.
  • 9. Cluster • Grouping of datacenters, racks, and nodes that communicate with each other and replicate data. • Clusters are not aware of other clusters.
  • 10. Consistency Models • Immediate consistency: when a write is successful, subsequent reads are guaranteed to return that latest value. • Eventual consistency: when a write is successful, stale data may still be read, but reads will eventually return the latest value.
  • 11. Tunable Consistency • Cassandra offers the ability to choose between immediate and eventual consistency by setting a consistency level. • Consistency level is set per read or write operation. • Common consistency levels are ONE, QUORUM, and ALL. • For multi-datacenter clusters, additional levels such as LOCAL_QUORUM and EACH_QUORUM control cross-datacenter traffic.
  • 12. CL ONE • Write: Success when at least one replica node has acknowledged the write. • Read: Only one replica node is given the read request. (Diagram: client, coordinator, and replicas R1, R2, R3; RF = 3.)
  • 13. CL QUORUM • Write: Success when a majority of the replica nodes have acknowledged the write. • Read: A majority of the replica nodes are given the read request. • Majority = floor(RF / 2) + 1 (Diagram: client, coordinator, and replicas R1, R2, R3; RF = 3.)
  • 14. CL ALL • Write: Success when all of the replica nodes have acknowledged the write. • Read: All replica nodes are given the read request. (Diagram: client, coordinator, and replicas R1, R2, R3; RF = 3.)
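The three consistency levels above reduce to a count of replicas that must respond. The sketch below is illustrative only: the function names are made up for this example, and the overlap rule (read replicas + write replicas > RF) is the standard condition for immediate consistency, which the slides imply but do not state.

```python
def replicas_required(level: str, rf: int) -> int:
    """Replica responses needed for a consistency level at replication factor rf."""
    if rf < 1:
        raise ValueError("replication factor must be at least 1")
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # integer division: a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported consistency level: {level}")


def is_immediately_consistent(write_level: str, read_level: str, rf: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    writes = replicas_required(write_level, rf)
    reads = replicas_required(read_level, rf)
    return reads + writes > rf
```

With RF = 3, writing and reading at QUORUM satisfies the overlap rule (2 + 2 > 3), while ONE for both does not (1 + 1 is not greater than 3), which is the eventual-consistency case.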
  • 15. Log-Structured Storage Engine • Cassandra storage engine inspired by Google Bigtable • Key to fast write performance on Cassandra (Diagram: Memtable, Commit Log, and three SSTables.)
  • 16. Updates and Deletes • SSTable files are immutable and cannot be changed. • Updates are written as new data. • Deletes write a tombstone, which marks a row or column(s) as deleted. • Updates and deletes are just as fast as inserts. (Diagram: three SSTables holding, in order: id:1, first:John, last:Smith, timestamp …405; id:1, first:John, last:Williams, timestamp …621; id:1, deleted, timestamp …999.)
  • 17. Compaction • Periodically an operation is triggered that will merge the data in several SSTables into a single SSTable. • Helps limit the number of SSTables to read. • Removes old data and tombstones. • The original SSTables are deleted after compaction. (Diagram: three SSTables holding id:1, first:John, last:Smith, timestamp 405; id:1, first:John, last:Williams, timestamp 621; and id:1, deleted, timestamp 999 are merged into a new SSTable holding id:1, deleted, timestamp 999.)
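The merge shown on the compaction slide can be sketched as a newest-timestamp-wins pass over the input SSTables. This is a deliberately simplified model, not Cassandra's implementation: it assumes whole-row versions (Cassandra reconciles individual columns), represents an SSTable as a plain dict, and keeps the winning tombstone as the slide's diagram does. The `Version` type and `compact` function are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Optional


class Version(NamedTuple):
    timestamp: int
    value: Optional[dict]  # None stands in for a tombstone (row deleted)


def compact(sstables: List[Dict[int, Version]]) -> Dict[int, Version]:
    """Merge SSTables, keeping only the newest version of each row key."""
    merged: Dict[int, Version] = {}
    for table in sstables:
        for key, version in table.items():
            current = merged.get(key)
            if current is None or version.timestamp > current.timestamp:
                merged[key] = version
    return merged


# The three SSTables from the slide: an insert, an update, and a delete.
sstables = [
    {1: Version(405, {"first": "John", "last": "Smith"})},
    {1: Version(621, {"first": "John", "last": "Williams"})},
    {1: Version(999, None)},
]
new_sstable = compact(sstables)
```

Here `new_sstable` holds a single entry for id 1, the tombstone written at timestamp 999; the two older versions are the "old data" that stops occupying disk once the input SSTables are deleted.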
  • 18. Cluster Sizing
  • 19. Cluster Sizing Considerations • Replication Factor • Data Size: “How many nodes would I need to store my data set?” • Data Velocity (Performance): “How many nodes would I need to achieve my desired throughput?”
  • 20. Choosing a Replication Factor
  • 21. What Are You Using Replication For? • Durability or Availability? • Each node has local durability (Commit Log), but replication can be used for distributed durability. • For availability, a recommended setting is RF=3. • RF=3 is the minimum necessary to achieve both consistency and availability using QUORUM and LOCAL_QUORUM.
  • 22. How Replication Can Affect Consistency Level • When RF < 3, you do not have as much flexibility when choosing consistency and availability. • With RF = 2, QUORUM = ALL. (Diagram: client, coordinator, and replicas R1, R2; RF = 2.)
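The claim that RF=3 is the minimum for both consistency and availability at QUORUM follows from the majority formula. A minimal sketch, with hypothetical helper names, of how many replicas can be down while QUORUM operations still succeed:

```python
def quorum(rf: int) -> int:
    """Majority of rf replicas, using integer division."""
    return rf // 2 + 1


def tolerated_replica_failures(rf: int) -> int:
    """Replicas that may be unavailable while QUORUM can still be met."""
    return rf - quorum(rf)


# RF -> number of replica failures QUORUM survives
tolerance = {rf: tolerated_replica_failures(rf) for rf in (1, 2, 3, 5)}
```

At RF = 1 and RF = 2 the tolerance is zero (QUORUM needs every replica, so it behaves like ALL); RF = 3 is the first setting that survives one failed replica, and RF = 5 survives two.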
  • 23. Using A Larger Replication Factor • When RF > 3, there is more data usage and higher latency for operations requiring immediate consistency. • If using eventual consistency, a consistency level of ONE will have consistent performance regardless of the replication factor. • High availability clusters may use a replication factor as high as 5.
  • 24. Data Size
  • 25. Disk Usage Factors • Data Size • Replication Setting • Old Data • Compaction • Snapshots
  • 26. Data Sizing • Row and Column Data • Row and Column Overhead • Indices and Other Structures
  • 27. Replication Overhead • A replication factor > 1 will effectively multiply your data size by that amount. (Diagram: data stored at RF = 1, RF = 2, and RF = 3.)
  • 28. Old Data • Updates and deletes do not actually overwrite or delete data. • Older versions of data and tombstones remain in the SSTable files until they are compacted. • This becomes more important for heavy update and delete workloads.
  • 29. Compaction • Compaction needs free disk space to write the new SSTable, before the SSTables being compacted are removed. • Leave enough free disk space on each node to allow compactions to run. • Worst case for the Size-Tiered Compaction Strategy is 50% of the total data capacity of the node. • For the Leveled Compaction Strategy, the requirement is about 10% of the total data capacity.
  • 30. Snapshots • Snapshots are hard links or copies of SSTable data files. • After SSTables are compacted, the disk space may not be reclaimed if a snapshot of those SSTables was created. Snapshots are created when: • Executing the nodetool snapshot command • Dropping a keyspace or table • Incremental backups • During compaction
  • 31. Recommended Disk Capacity • For current Cassandra versions, the ideal disk capacity is approximately 1 TB per node if using spinning disks and 3-5 TB per node using SSDs. • Having a larger disk capacity may be limited by the resulting performance. • What works for you is still dependent on your data model design and desired data velocity.
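The disk usage factors above combine into a rough lower bound on node count. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a DataStax formula: it multiplies the data set by the replication factor, reserves a fraction of each node's disk for compaction (0.5 for size-tiered, 0.1 for leveled, per the slide), and ignores storage overhead, old data, and snapshots. The function name and parameters are hypothetical.

```python
import math


def min_nodes_for_data(data_tb: float, rf: int, node_capacity_tb: float,
                       compaction_headroom: float) -> int:
    """Rough minimum node count needed to hold a replicated data set."""
    if not 0 <= compaction_headroom < 1:
        raise ValueError("compaction_headroom must be in [0, 1)")
    replicated_tb = data_tb * rf  # every row is stored rf times
    usable_per_node_tb = node_capacity_tb * (1 - compaction_headroom)
    # A cluster cannot have fewer nodes than its replication factor.
    return max(rf, math.ceil(replicated_tb / usable_per_node_tb))


# 5 TB data set at RF = 3 on 1 TB spinning-disk nodes
stcs_nodes = min_nodes_for_data(5, 3, 1.0, 0.5)  # size-tiered worst case
lcs_nodes = min_nodes_for_data(5, 3, 1.0, 0.1)   # leveled compaction
```

For this example the estimate is 30 nodes with size-tiered compaction and 17 with leveled compaction. Treat the result as a floor for capacity only; the throughput and failure considerations later in the deck can raise it.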
  • 32. Data Velocity (Performance)
  • 33. How to Measure Performance • I/O Throughput: “How many reads and writes can be completed per second?” • Read and Write Latency: “How fast should I be able to get a response for my read and write requests?”
  • 34. Sizing for Failure • Cluster must be sized taking into account the performance impact caused by failure. • When a node fails, the corresponding workload must be absorbed by the other replica nodes in the cluster. • Performance is further impacted when recovering a node. Data must be streamed or repaired using the other replica nodes.
  • 35. Hardware Considerations for Performance • CPU: Operations are often CPU-intensive. More cores are better. • Memory: Cassandra uses JVM heap memory. Additional memory is used as off-heap memory by Cassandra, or as the OS page cache. • Disk: C* is optimized for spinning disks, but SSDs will perform better. Attached storage (SAN) is strongly discouraged.
  • 36. Some Final Words…
  • 37. Summary • Cassandra allows flexibility when sizing your cluster, from a single node to thousands of nodes. • Your use case will dictate how you want to size and configure your Cassandra cluster. Do you need availability? Immediate consistency? • The minimum number of nodes needed will be determined by your data size, desired performance, and replication factor.
  • 38. Additional Resources • DataStax Documentation http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html • Planet Cassandra http://planetcassandra.org/nosql-cassandra-education/ • Cassandra Users Mailing List [email protected] http://mail-archives.apache.org/mod_mbox/cassandra-user/
  • 39. Questions?
  • 40. Thank You • We power the big data apps that transform business.

Editor's Notes

  • #3: Discuss the main features and highlights of the Cassandra database. The features that are important to you will influence how you design, size, and configure your Cassandra cluster. For rack and datacenter awareness, mention that this includes deploying Cassandra in the cloud, such as Amazon EC2, Rackspace, Google Computing Cloud, etc.
  • #4: This should be self-explanatory for me as I go through this slide.
  • #5: Since Cassandra is linearly scalable, your cluster can be scaled as large as needed. The focus for this presentation is more on the minimum number of nodes you’d want or need, along with the replication setting. Based on data from a University of Toronto study: http://vldb.org/pvldb/vol5/p1724_tilmannrabl_vldb2012.pdf
  • #6: Replication is needed to achieve high availability when designing for failure. You can’t have a replication factor larger than the number of nodes in the cluster.
  • #7: Wait, there are people that didn’t understand what a rack or datacenter is? Well then, let’s backtrack a little and define some of these terms. Starting with the smallest unit, we have the node… By process, I mean a Java virtual machine. Cassandra is written in Java, and its binary code must run in a virtual machine. Nodes can also be represented as a cloud or virtual server instance.
  • #9: Datacenters can be geographically separated, but also logically separated as well. Additional use cases for datacenters include disaster recovery and workload separation.
  • #11: With single-node databases, when you write data you can expect to read back the same data. It’s not so easy with distributed systems though. The node that a client writes data to may not be the same node another client is trying to read the data from. In that case, a distributed system must implement a consistency model to determine when written data is saved to the relevant nodes. May want to mention BASE; make sure to clarify that eventual consistency usually occurs within milliseconds (thanks Netflix!).
  • #12: Tunable consistency is a key feature of Cassandra, and the type of consistency you want to use may affect your cluster design.
  • #14: Be sure to explain what happens if data returned from each node does not match.
  • #15: Be sure to explain what happens if data returned from each node does not match.
  • #16: Cross-datacenter latency vs. local consistency / consistency across datacenters
  • #17: Important to understand how the storage engine works, since that directly impacts data size. No reads before a write. Writes go to the commit log for durability, and to the memtable. Periodically memtable data is flushed to disk into an SSTable, or sorted strings table. This will destroy the memtable so that the memory can be reused. Relevant commit log entries are also marked as cleared.
  • #18: Important to understand how the storage engine works, since that directly impacts data size.
  • #19: Important to understand how the storage engine works, since that directly impacts data size.
  • #21: Now that you have a basic understanding of how Cassandra works and the possible benefits to select and use, we can talk about the primary factors for sizing your database. Although not as key, I will also discuss some considerations for the replication factor as well.
  • #23: If RF = 1, we are not making use of Cassandra’s advantages of being available. One node = single point of failure. If just using Cassandra for durability, you may use RF=2 just to ensure you have two copies of your data on separate nodes. Next slide will talk a bit more about RF < 3.
  • #25: Performance. For high availability use cases, there are clusters configured to use a replication factor as high as 5. Not very common.
  • #27: Each Cassandra node has a certain disk capacity, and only part of that capacity can be used for data. These are some of the factors.
  • #28: Of course your data set needs to be accounted for. In addition there is overhead for writing the data in Cassandra, as well as certain structures used for read optimizations (partition index, summary, Bloom filter).
  • #29: If using RF > 1, you must account for those additional copies. At RF=3, if your data set is 5 TB, it means C* will be saving 15 TB.
  • #30: One consequence of log-structured storage is that data that’s no longer needed will exist until a compaction cleans it up. That means additional space remains used until a compaction occurs.
  • #31: Free disk space must be reserved for compaction so that data can be merged into a new file. See above.
  • #32: Backing up your data is very easy with Cassandra. Since the data files are immutable, a snapshot can be taken which creates a hard link or copy of the relevant SSTables. Hard links in particular are pretty much zero cost, since it takes negligible disk space and time to create the hard link. However, just be careful. If you are creating snapshots, or have configured Cassandra to automatically create snapshots, that’s also going to eat up your disk space unless you do housekeeping.
  • #33: DataStax recommended disk capacity; size your cluster so that your data fits. Why can’t we just add more disks? Limited by the performance of each node handling that much data (contention from reads/writes, flushing, compaction, limit on JVM heap memory allocation).
  • #35: For cluster sizing, you want to have enough nodes so that read and write performance meet any SLAs, or are otherwise acceptable to users.
  • #36: Failure conditions must also be taken into account. If a node fails, the workload from that node must be absorbed by the other nodes in the cluster. When recovering the node, this can result in further impact to performance. Don’t size the cluster to fully utilize each node; leave room so that the cluster can still perform acceptably during failure. Rule of thumb: some Cassandra MVPs recommend having no fewer than 6 nodes in your cluster. With fewer than 6, if you lose one node, you lose a good chunk of your cluster’s throughput capability (at least 20%).
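The rule of thumb about losing one node can be checked with simple arithmetic. This sketch assumes identical nodes with load spread evenly across them, which is an idealization; the function names are hypothetical.

```python
def capacity_lost_fraction(nodes: int) -> float:
    """Share of cluster throughput lost when one of `nodes` equal nodes fails."""
    if nodes < 1:
        raise ValueError("cluster needs at least one node")
    return 1 / nodes


def max_safe_utilization(nodes: int, failures: int = 1) -> float:
    """Highest average per-node utilization that still fits after `failures` nodes are lost."""
    if not 0 <= failures < nodes:
        raise ValueError("failures must be between 0 and nodes - 1")
    return (nodes - failures) / nodes


loss_with_5 = capacity_lost_fraction(5)
loss_with_6 = capacity_lost_fraction(6)
```

A 5-node cluster loses 20% of its throughput when one node fails, which is the threshold quoted in the note; at 6 nodes the loss drops to about 17%, and each node should run at no more than about 83% of its capacity if the cluster is to absorb a single failure.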