Why does my choice of storage matter with Cassandra?
Johnny Miller, Solutions Architect
@CyanMiller
www.linkedin.com/in/johnnymiller
Quote
“The single biggest predictor of success or failure
with a Cassandra deployment is in storage choice”
Patrick McFadin, Chief Evangelist for Cassandra, @PatrickMcFadin
©2014 DataStax Confidential. Do not distribute without consent. 2
Cassandra Storage Engine
Inserts/Updates
•  Memtable: organized in sorted order by row key and flushed to SSTables sequentially (Read/Write)
•  SSTable: ordered map of key/value pairs (Immutable, Read Only)
•  Commit log: append-only file structure, providing interim durability for writes before they get flushed to SSTables (Write Only)
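The write path through these three structures can be sketched as a toy model in Python. This is an illustration only, not Cassandra's implementation: the class name `ToyStorageEngine`, the `flush_threshold` parameter, and the in-memory lists standing in for files on disk are all assumptions made for the example.

```python
class ToyStorageEngine:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []      # append-only, write-only durability log
        self.memtable = {}        # mutable in-memory map, read/write
        self.sstables = []        # immutable, sorted, read-only tables
        self.flush_threshold = flush_threshold  # hypothetical knob

    def write(self, key, value):
        # 1. Append to the commit log first so the write survives a crash.
        self.commit_log.append((key, value))
        # 2. Apply to the memtable; an update simply overwrites in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out in key order as one sequential pass.
        sstable = tuple(sorted(self.memtable.items()))
        self.sstables.append(sstable)
        # Flushed data no longer needs the commit log.
        self.memtable = {}
        self.commit_log = []

    def read(self, key):
        # Newest data wins: memtable first, then SSTables newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):
            for k, v in sstable:
                if k == key:
                    return v
        return None


engine = ToyStorageEngine(flush_threshold=2)
engine.write("b", 1)
engine.write("a", 2)   # reaches the threshold, triggers a flush
engine.write("a", 3)   # newer value lives in the memtable only
```

Note that an update never touches an existing SSTable: the old value for `"a"` stays on "disk" until a later compaction removes it, which is why reads may have to consult more than one table.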
Reads
Deletes
•  Unlike most DBs, deleted data is not immediately
removed from disk.
•  A marker called a tombstone is written to indicate that the
column is deleted
•  Tombstones exist for a configurable period of time, and
are only deleted from disk via compaction after that time
has expired.
•  Deletes are just as fast as inserts :)
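A minimal sketch of the tombstone behaviour described above, assuming a plain dictionary as the table and integer timestamps in seconds. The names `TOMBSTONE`, `GC_GRACE_SECONDS`, `delete`, `read` and `compact` are invented for this example; in Cassandra the configurable period is the table option `gc_grace_seconds`, and the value used here is purely illustrative.

```python
TOMBSTONE = object()          # marker standing in for a deleted value
GC_GRACE_SECONDS = 10         # illustrative value, not a recommendation

def delete(table, key, now):
    # A delete is just another write: record a tombstone with a timestamp.
    table[key] = (TOMBSTONE, now)

def read(table, key):
    value, _written_at = table.get(key, (None, None))
    # A tombstone hides the column from readers even though it is on disk.
    return None if value is TOMBSTONE else value

def compact(table, now):
    # Only tombstones older than the grace period are physically dropped.
    return {
        key: (value, written_at)
        for key, (value, written_at) in table.items()
        if not (value is TOMBSTONE and now - written_at >= GC_GRACE_SECONDS)
    }

table = {"user:1": ("alice", 0), "user:2": ("bob", 0)}
delete(table, "user:1", now=5)
early = compact(table, now=9)    # 4 s after the delete: tombstone kept
late = compact(table, now=15)    # 10 s after the delete: tombstone purged
```

Because `delete` performs the same kind of write as an insert, it costs the same; the space is only reclaimed later, when compaction runs after the grace period.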
Compaction
•  Regular compaction of data in Cassandra is essential for a healthy and performant
cluster.
•  SSTables are immutable
•  Get rid of duplicate/overwritten data
•  Drop deleted data and tombstones
•  Data in SSTables is sorted by partition key, so the disk I/O while the
SSTables are being consolidated is sequential rather than random.
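Why sorted SSTables make compaction sequential can be shown with a small merge sketch. It assumes each SSTable is a list of `(key, timestamp, value)` tuples already sorted by key; the function name `merge_sstables` and the tuple layout are assumptions for illustration, not Cassandra's on-disk format.

```python
import heapq

def merge_sstables(sstables):
    """Merge sorted SSTables into one, keeping the newest version per key.

    Each SSTable is a list of (key, timestamp, value) tuples sorted by key.
    """
    merged = []
    # heapq.merge streams the inputs in key order, so every input is read
    # front to back and the output is written front to back.
    for key, timestamp, value in heapq.merge(*sstables):
        if merged and merged[-1][0] == key:
            # Same key seen again: tuples sort by timestamp within a key,
            # so this entry is newer and overwrites the previous one.
            merged[-1] = (key, timestamp, value)
        else:
            merged.append((key, timestamp, value))
    return merged

older = [("a", 1, "x"), ("c", 1, "y")]
newer = [("a", 2, "z"), ("b", 2, "w")]
result = merge_sstables([older, newer])
```

Each input is consumed once from start to end, so the work grows linearly with the amount of data and no random seeks are needed.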
Compaction Strategies
•  There is a choice of three strategies Cassandra can use for compaction
and all have different disk I/O profiles and capacity requirements.
•  SizeTieredCompactionStrategy (default)
•  Using this strategy causes bursts in I/O activity while a compaction is in process
•  These I/O bursts can negatively affect read-heavy workloads, but typically do not impact
write performance.
•  Data is highly likely to be spread across multiple SSTables, i.e. multiple disk seeks per read
•  LeveledCompactionStrategy
•  ~90% of the time, data will be in only a single SSTable i.e. minimal disk seeks
•  However, there is significantly higher Disk I/O than size tiered compaction in order to
guarantee how many SSTables data may be spread across
•  Due to the high disk I/O, rarely appropriate on traditional HDDs
•  DateTieredCompactionStrategy (C* 2.0.11+ and 2.1.1+)
•  Stores data written within a certain period of time in the same SSTable.
•  Can store data that has been set to expire using TTL in an SSTable with other data
scheduled to expire at approximately the same time, so the whole SSTable can simply be
dropped without any compaction!
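The last point, dropping a whole SSTable once everything in it has expired, can be sketched as follows. This is a simplified model: the `SSTableInfo` record, its `max_expiry` field and the `split_expired` helper are hypothetical, and the sketch ignores details such as tombstone grace periods and overlap with other SSTables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SSTableInfo:
    name: str
    max_expiry: int   # latest TTL expiry time of any cell in the table

def split_expired(sstables, now):
    """Return (droppable, remaining) without reading any row data."""
    droppable = [s for s in sstables if s.max_expiry <= now]
    remaining = [s for s in sstables if s.max_expiry > now]
    return droppable, remaining

tables = [
    SSTableInfo("window-1", max_expiry=100),
    SSTableInfo("window-2", max_expiry=200),
    SSTableInfo("window-3", max_expiry=300),
]
droppable, remaining = split_expired(tables, now=250)
```

The decision uses only per-table metadata, which is why grouping data by write time can avoid the read and rewrite cost of a normal compaction for expiring data.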
Choice of storage matters
•  Most databases rewrite modified data in place and writes are
buffered and then flushed to disk as random writes.
•  With Cassandra:
•  Disk writes are typically sequential append only
operations
•  On-disk tables are written in a sorted order so compaction
running time increases linearly with the amount of data
•  So choice of storage is pretty important!!
Disks and Configuration Options
Quote
“For many applications, we are no longer constrained by hard drive
capacity, but by seek speeds.
Essentially, a 7200 RPM hard drive is capable of delivering approximately
100 seeks per second, and this has not changed in more than 10 years,
even as disk capacity has been doubling every 18–24 months.
In fact, if you have a big data application which required a half a petabyte of
storage, what had previously required 1024 disk drives when we were using
500 GB drives, now that 3 TB disks are available, only 171 disk drives are
needed.
So a storage array capable of storing half a petabyte is now capable of 80%
fewer seeks.”
- Ted Ts’o, maintainer of the ext4 file system in the Linux kernel
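The arithmetic behind the quote can be reproduced in a few lines. The sketch assumes "half a petabyte" means 512 TB with 1 TB = 1000 GB, which is the reading that yields the quoted drive counts; the constant and function names are invented for the example.

```python
import math

SEEKS_PER_DRIVE = 100            # approx. seeks/s for a 7200 RPM drive
CAPACITY_GB = 512 * 1000         # "half a petabyte", taken as 512 TB

def array_profile(drive_size_gb):
    """Return (drives needed, total seeks per second) for the array."""
    drives = math.ceil(CAPACITY_GB / drive_size_gb)
    return drives, drives * SEEKS_PER_DRIVE

old_drives, old_seeks = array_profile(500)     # 500 GB drives
new_drives, new_seeks = array_profile(3000)    # 3 TB drives
reduction = 1 - new_seeks / old_seeks          # fraction of seeks lost
```

Under these assumptions the array goes from 1024 drives to 171, and its aggregate seek capacity falls by roughly 83%, in line with the quote's rounded "80% fewer seeks".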
Hard Drive/Spinning Disk
This part actually has to move!
This bit spins around very fast
So what can we do?
•  Memory?
•  Caching can help, but the hit rate has to be extremely high to mitigate the mechanical
latency of spinning disks
•  Get rid of the moving parts!
•  Mechanical media will never be able to keep up under load
•  Today’s databases service multiple users with different access patterns
•  A relatively small number of concurrent disk reads can result in seconds of latency
•  SSDs don’t have moving parts
•  SSDs can eliminate entire classes of problems
•  With Cassandra in particular, you will save a lot of money on staff resources by investing
in SSDs up front
•  Compactions can be tough on flash, but it’s not as bad as you think
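To see how a small number of concurrent reads turns into seconds of latency on a spinning disk, here is a back-of-the-envelope sketch. It assumes the figure of roughly 100 seeks per second quoted earlier in the deck, one seek per read, and reads served strictly one after another by a single drive; the function name is hypothetical.

```python
def worst_case_wait_ms(queued_reads, seeks_per_second=100):
    """Time until the last queued read is served by a single spindle."""
    ms_per_seek = 1000 / seeks_per_second
    return queued_reads * ms_per_seek

for queued in (1, 10, 100, 200):
    print(queued, "queued reads ->", worst_case_wait_ms(queued), "ms")
```

With 200 reads queued on one drive, the last one waits about two seconds under this model, which is why caching alone rarely hides mechanical latency under load.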
The best way to do it – SSDs!
•  What is an SSD?
•  Solid State Drive
•  Bits stored in NAND Flash Memory
•  No moving parts
•  “Seeks” 2-3 orders of magnitude faster than spinning disk
•  What’s the catch?
•  Smaller capacity
•  More expensive
•  Flash wears out
In practice, this is not a problem – if it makes you nervous, keep spares
The IO Scheduler
•  NOOP - use this scheduler if you know another IO device (like a RAID
card) will be doing its own IO scheduling. The NOOP scheduler is just
a pass-through.
•  Deadline - otherwise, use the deadline scheduler
•  Tell the OS the drive is non-rotational
•  Tune read-ahead way down – start at 0 and work your way up
Don’t forget to tune the OS for SSDs
echo deadline > /sys/block/<drive>/queue/scheduler
echo 0 > /sys/block/<drive>/queue/rotational
blockdev --setra 0 /dev/<drive>
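Once the settings above have been applied, they can be read back from sysfs to confirm they took effect. The following Python sketch is one way to do that; the helper names `active_scheduler` and `check_drive` are made up for this example, and it assumes the usual Linux layout in which the `scheduler` file lists all schedulers with the active one in square brackets.

```python
from pathlib import Path

def active_scheduler(scheduler_file_text):
    """Extract the active scheduler from e.g. 'noop [deadline] cfq'."""
    for token in scheduler_file_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None

def check_drive(drive, sysfs_root="/sys/block"):
    """Read back the scheduler, rotational flag and read-ahead of a drive."""
    queue = Path(sysfs_root) / drive / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "rotational": (queue / "rotational").read_text().strip() == "1",
        "read_ahead_kb": int((queue / "read_ahead_kb").read_text()),
    }

# Example (run on the Cassandra host, substituting the real device name):
# print(check_drive("sda"))
```

Values written with `echo` do not survive a reboot, so a check like this is also useful after restarts to catch drives that have fallen back to their defaults.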
Use SSDs
•  More flexibility and substantial performance benefit
•  Typically 10x the performance for less than 2x the cost (potentially lower) when compared
with HDDs.
•  You can use LeveledCompactionStrategy
•  SSD drives can scale up to cope with larger compaction overheads while simultaneously
serving many random reads.
•  Netflix found that they could halve the total system cost to achieve the same level of
throughput.
•  Additionally the mean read request latency was reduced from 10ms to 2.2ms and 99th
percentile request latency was reduced from 65ms to 10ms.
•  http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
The worst way to do it
•  Shared Storage
•  Cassandra is a shared-nothing architecture with no single point of failure
•  Adding shared storage adds a single point of failure
•  Irrespective of the terrible performance – this alone is enough reason not to do
it.
Local storage configuration
•  RAID or JBOD?
•  RAID – Redundant Array of (Independent/Inexpensive) Disks
•  JBOD – Just a Bunch Of Disks
•  RAID
•  Common Cassandra RAID levels are 0, 1, 10
•  RAID-0 is most common, but means all data on node must be rebuilt
from other nodes when a drive fails
•  JBOD
•  Drives are listed individually in cassandra.yaml
•  Failed drives can be replaced individually
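In JBOD mode, each drive gets its own entry under the `data_file_directories` setting in cassandra.yaml. The small Python sketch below renders such a stanza from a list of mount points; the helper name `render_data_dirs`, the mount points and the `cassandra/data` subdirectory are all hypothetical and would need to match the actual host layout.

```python
def render_data_dirs(mount_points):
    """Render a data_file_directories stanza, one entry per drive."""
    lines = ["data_file_directories:"]
    lines += [f"    - {mount}/cassandra/data" for mount in mount_points]
    return "\n".join(lines)

# Hypothetical mount points, one per physical SSD.
print(render_data_dirs(["/mnt/disk1", "/mnt/disk2", "/mnt/disk3"]))
```

Replacing a failed drive then amounts to mounting the new device at the same path, or editing the one affected entry, rather than rebuilding an entire array.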
How to choose between RAID or JBOD?
•  Performance
•  For SSDs, not so much…
•  Compactions are usually throttled significantly below bus speed
•  So a single SSD usually has sufficient throughput
•  Throughput is really the only advantage RAID buys
•  Manageability
•  Pick the option that best fits the deployment scenario
•  If using SSD, and drives can be replaced, choose JBOD
•  Otherwise, RAID is probably the right choice.
How to choose between RAID or JBOD?
•  Cloud provider
•  EC2 ephemeral SSD can’t be replaced, use RAID (and don’t use EBS)
•  GCE persistent SSD volumes can be replaced, JBOD is useful
•  Not all drives are hot swappable
•  PCIe devices can’t conveniently be replaced
•  SSD Spares for JBOD mode
•  Keep spare SSDs online, but not in use
•  Allows the node to be easily brought back online
with a quick config change
Comparison Data
FusionIO ioDrive II
[Chart: read and write latency in microseconds]
PNY XLR8 SSD (consumer grade MLC)
Samsung 840 Pro SSD (consumer grade MLC)
7200RPM SATA
7200RPM SAS
10K SATA
15K SAS
All Drives
All SSDs
Conclusion
•  Using SSDs is a good idea
•  Better response times
•  Less variance in performance
•  Significantly higher throughput so fewer servers needed
Thank You
We power the big data apps
that transform business.
Marc Fielding
 
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red Hat Storage Day Dallas - Red Hat Ceph Storage Acceleration Utilizing Flas...
Red_Hat_Storage
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
xKinAnx
 
Presentation architecting a cloud infrastructure
Presentation   architecting a cloud infrastructurePresentation   architecting a cloud infrastructure
Presentation architecting a cloud infrastructure
solarisyourep
 
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red Hat Ceph Storage Acceleration Utilizing Flash Technology
Red_Hat_Storage
 
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDsSeagate Implementation of Dense Storage Utilizing HDDs and SSDs
Seagate Implementation of Dense Storage Utilizing HDDs and SSDs
Red_Hat_Storage
 
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
Everyday I'm Scaling... Cassandra (Ben Bromhead, Instaclustr) | C* Summit 2016
DataStax
 
Ad

Why does my choice of storage matter with Cassandra?

  • 1. Why does my choice of storage matter with Cassandra? Johnny Miller, Solutions Architect @CyanMiller www.linkedin.com/in/johnnymiller
  • 2. Quote “The single biggest predictor of success or failure with a Cassandra deployment is in storage choice” Patrick McFadin, Chief Evangelist for Cassandra, @PatrickMcFadin ©2014 DataStax Confidential. Do not distribute without consent. 2
  • 3. Cassandra Storage Engine
  • 4. Inserts/Updates
      • Memtable (read/write): organized in sorted order by row key and flushed to SSTables sequentially
      • SSTable (immutable, read only): an ordered map of key-value pairs
      • Commit log (write only): an append-only file structure that provides interim durability for writes before they are flushed to SSTables
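The write path on slide 4 can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not Cassandra's implementation: the class name `ToyNode`, the `flush_threshold` parameter, and the in-memory lists standing in for files are all hypothetical, and the memtable here is sorted at flush time rather than kept sorted continuously.

```python
import bisect


class ToyNode:
    """Toy model of the write path: commit log -> memtable -> SSTable."""

    def __init__(self, flush_threshold=3):
        self.commit_log = []   # append-only, written but not read on the normal path
        self.memtable = {}     # in-memory, read/write
        self.sstables = []     # each entry is an immutable, key-sorted tuple of pairs
        self.flush_threshold = flush_threshold  # hypothetical knob for this sketch

    def write(self, key, value):
        self.commit_log.append((key, value))  # interim durability comes first
        self.memtable[key] = value            # an update simply overwrites in memory
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memtable out in key order as one immutable SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # flushed writes no longer need the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sstable in reversed(self.sstables):  # newest SSTable first
            keys = [k for k, _ in sstable]
            i = bisect.bisect_left(keys, key)    # binary search works because keys are sorted
            if i < len(keys) and keys[i] == key:
                return sstable[i][1]
        return None
```

Because an update never modifies an existing SSTable, older versions of a key stay on disk until compaction merges them away.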
  • 5. Reads
  • 6. Deletes
      • Unlike most databases, Cassandra does not immediately remove deleted data from disk.
      • A marker called a tombstone is written to indicate that the column is deleted.
      • Tombstones exist for a configurable period of time, and are only removed from disk by compaction after that time has expired.
      • Deletes are just as fast as inserts :)
  • 7. Compaction
      • Regular compaction of data in Cassandra is essential for a healthy and performant cluster.
      • SSTables are immutable
      • Get rid of duplicate/overwritten data
      • Drop deleted data and tombstones
      • Data in SSTables is sorted by partition key, so while the SSTables are being consolidated the disk I/O is not random.
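To make the points on slides 6 and 7 concrete, here is a minimal sketch of a compaction pass, assuming each SSTable is a key-sorted list of `(key, value, write_time)` rows with one row per key. The `TOMBSTONE` marker, the `compact` function, and the `grace_seconds` parameter are illustrative names, not Cassandra APIs. Because every input is already sorted, the merge reads each one front to back, which is why the I/O is not random.

```python
import heapq
from itertools import groupby

TOMBSTONE = object()  # hypothetical marker standing in for a deleted value


def compact(sstables, now, grace_seconds):
    """Merge key-sorted SSTables into one, keeping only the newest row per key."""
    # Sequential k-way merge; rows for the same key arrive ordered by write time.
    stream = heapq.merge(*sstables, key=lambda row: (row[0], row[2]))
    result = []
    for _, versions in groupby(stream, key=lambda row: row[0]):
        *_, newest = versions  # the last version has the highest write time
        _, value, write_time = newest
        if value is TOMBSTONE and now - write_time > grace_seconds:
            continue  # tombstone has outlived its grace period, so drop it
        result.append(newest)
    return result


older = [("a", 1, 10), ("b", 2, 10), ("c", 3, 10)]
newer = [("a", 9, 20), ("b", TOMBSTONE, 20)]
print(compact([older, newer], now=100, grace_seconds=50))
```

In this run the overwritten value of `a` and the deleted row `b` both disappear, leaving only `a` (newest value) and `c`.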
  • 8. Compaction Strategies
      • There is a choice of three strategies Cassandra can use for compaction, and each has a different disk I/O profile and capacity requirement.
      • SizeTieredCompactionStrategy (default)
          • Using this strategy causes bursts in I/O activity while a compaction is in progress.
          • These I/O bursts can negatively affect read-heavy workloads, but typically do not impact write performance.
          • Data is highly likely to be spread across multiple SSTables, i.e. multiple disk seeks.
      • LeveledCompactionStrategy
          • ~90% of the time, data will be in only a single SSTable, i.e. minimal disk seeks.
          • However, disk I/O is significantly higher than with size-tiered compaction, in order to guarantee how many SSTables data may be spread across.
          • Due to the high disk I/O, it is rarely appropriate for traditional HDDs.
      • DateTieredCompactionStrategy (C* 2.0.11+ and 2.1.1+)
          • Stores data written within a certain period of time in the same SSTable.
          • Can store data that has been set to expire using TTL in an SSTable with other data scheduled to expire at approximately the same time, so the whole SSTable can just be dropped without any compaction!
  • 9. Choice of storage matters
      • Most databases rewrite modified data in place; writes are buffered and then flushed to disk as random writes.
      • With Cassandra:
          • Disk writes are typically sequential, append-only operations.
          • On-disk tables are written in sorted order, so compaction running time increases linearly with the amount of data.
      • So choice of storage is pretty important!!
  • 10. Disks and Configuration Options
  • 11. Quote “For many applications, we are no longer constrained by hard drive capacity, but by seek speeds. Essentially, a 7200 RPM hard drive is capable of delivering approximately 100 seeks per second, and this has not changed in more than 10 years, even as disk capacity has been doubling every 18–24 months. In fact, if you have a big data application which required a half a petabyte of storage, what had previously required 1024 disk drives when we were using 500 GB drives, now that 3 TB disks are available, only 171 disk drives are needed. So a storage array capable of storing half a petabyte is now capable of 80% fewer seeks.” Ted Ts’o, Maintainer of the ext4 file system in the Linux kernel
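The arithmetic behind the quote can be checked in a few lines. This sketch assumes "half a petabyte" means 512 TB at 1000 GB per TB, which is the reading that reproduces the quoted drive counts, and uses the quote's approximate figure of 100 seeks per second per drive.

```python
import math

SEEKS_PER_DRIVE = 100   # approximate seeks per second for a 7200 RPM drive, from the quote
CAPACITY_GB = 512_000   # assumption: half a petabyte taken as 512 TB, 1000 GB per TB


def drives_needed(drive_size_gb):
    """Number of whole drives required to hold CAPACITY_GB."""
    return math.ceil(CAPACITY_GB / drive_size_gb)


small = drives_needed(500)    # array built from 500 GB drives
large = drives_needed(3000)   # array built from 3 TB drives
reduction = 1 - (large * SEEKS_PER_DRIVE) / (small * SEEKS_PER_DRIVE)

print(small, large, f"{reduction:.0%}")  # prints: 1024 171 83%
```

The array drops from about 102,400 to about 17,100 seeks per second, a reduction of roughly 83%, which the quote rounds to 80%.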
  • 12. Hard Drive/Spinning Disk
      • This part actually has to move!
      • This bit spins around very fast
  • 13. So what can we do?
      • Memory?
          • Caching can help, but the hit rate has to be extremely high to mitigate the mechanical latency of spinning disks.
      • Get rid of the moving parts!
          • Mechanical media will never be able to keep up under load.
          • Today’s databases service multiple users with different access patterns.
          • A relatively small number of concurrent disk reads can result in seconds of latency.
      • SSDs don’t have moving parts
          • SSDs can eliminate entire classes of problems.
          • With Cassandra in particular, you will save a lot of money on staff resources by investing in SSDs up front.
          • Compactions can be tough on flash, but it’s not as bad as you think.
  • 14. The best way to do it – SSDs!
      • What is an SSD?
          • Solid State Drive
          • Bits stored in NAND flash memory
          • No moving parts
          • “Seeks” 2-3 orders of magnitude faster than spinning disk
      • What’s the catch?
          • Smaller capacity
          • More expensive
          • Flash wears out
      • In practice, this is not a problem; if it makes you nervous, keep spares.
  • 15. The IO Scheduler
      • NOOP: use this scheduler if you know another IO device (like a RAID card) will be doing its own IO scheduling. The NOOP scheduler is just a pass-through.
      • Deadline: otherwise, use the deadline scheduler.
      • Tell the OS the drive is non-rotational.
      • Tune read-ahead way down: start at 0 and work your way up.
      • Don’t forget to tune the OS for SSDs:
          echo deadline > /sys/block/<drive>/queue/scheduler
          echo 0 > /sys/block/<drive>/queue/rotational
          blockdev --setra 0 /dev/<drive>
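Before changing the settings on slide 15, it can help to see what a drive is currently using. The sketch below reads the same Linux sysfs queue files the commands write to. The function names and the `sysfs_root` parameter are hypothetical helpers for illustration. Note that `read_ahead_kb` is reported in kilobytes, whereas `blockdev --setra` takes 512-byte sectors.

```python
from pathlib import Path


def active_scheduler(scheduler_text):
    """Return the scheduler shown in [brackets], e.g. 'noop [deadline] cfq'."""
    for name in scheduler_text.split():
        if name.startswith("[") and name.endswith("]"):
            return name[1:-1]
    return None  # no bracketed entry found


def describe_drive(drive, sysfs_root="/sys/block"):
    """Read the current queue settings for one block device, e.g. 'sda'."""
    queue = Path(sysfs_root) / drive / "queue"
    return {
        "scheduler": active_scheduler((queue / "scheduler").read_text()),
        "rotational": (queue / "rotational").read_text().strip() == "1",
        "read_ahead_kb": int((queue / "read_ahead_kb").read_text()),
    }
```

On a Linux host, `describe_drive("sda")` would return a small dictionary of the three settings; an SSD tuned as the slide suggests should report the deadline scheduler and `rotational` as `False`.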
  • 16. Use SSDs
      • More flexibility and substantial performance benefit
          • Typically 10x the performance for less than 2x the cost (potentially lower) when compared with HDDs.
      • You can use LeveledCompactionStrategy
          • SSDs can scale up to cope with larger compaction overheads while simultaneously serving many random reads.
      • Netflix found that they could halve the total system cost to achieve the same level of throughput.
          • Additionally, the mean read request latency was reduced from 10ms to 2.2ms and the 99th percentile request latency was reduced from 65ms to 10ms.
          • http://techblog.netflix.com/2012/07/benchmarking-high-performance-io-with.html
  • 17. The worst way to do it
      • Shared Storage
          • Cassandra is a shared-nothing architecture with no single point of failure.
          • Adding shared storage adds a single point of failure.
          • Irrespective of the terrible performance, this alone is enough reason not to do it.
  • 18. Local storage configuration
      • RAID or JBOD?
          • RAID: Redundant Array of (Independent/Inexpensive) Disks
          • JBOD: Just a Bunch Of Disks
      • RAID
          • Common Cassandra RAID levels are 0, 1, 10.
          • RAID-0 is most common, but means all data on the node must be rebuilt from other nodes when a drive fails.
      • JBOD
          • Drives are listed individually in cassandra.yaml.
          • Failed drives can be replaced individually.
  • 19. How to choose between RAID or JBOD?
      • Performance
          • For SSDs, not so much…
          • Compactions are usually throttled significantly below bus speed.
          • So a single SSD usually has sufficient throughput.
          • Throughput is really the only advantage RAID buys.
      • Manageability
          • Pick the option that best fits the deployment scenario.
          • If using SSDs, and drives can be replaced, choose JBOD.
          • Otherwise, RAID is probably the right choice.
  • 20. How to choose between RAID or JBOD?
      • Cloud provider
          • EC2 ephemeral SSDs can’t be replaced, so use RAID (and don’t use EBS).
          • GCE persistent SSD volumes can be replaced, so JBOD is useful.
      • Not all drives are hot swappable
          • PCIe devices can’t conveniently be replaced.
      • SSD spares for JBOD mode
          • Keep spare SSDs online, but not in use.
          • Allows the node to be easily brought back online with a quick config change.
  • 21. Comparison Data
  • 22. FusionIO ioDrive II (chart: read and write latency in microseconds)
  • 23. PNY XLR8 SSD (consumer grade MLC)
  • 24. Samsung 840 Pro SSD (consumer grade MLC)
  • 25. 7200RPM SATA
  • 26. 7200RPM SAS
  • 27. 10K SATA
  • 28. 15K SAS
  • 29. All Drives
  • 30. All SSDs
  • 31. Conclusion
      • Using SSDs is a good idea
          • Better response times
          • Less variance in performance
          • Significantly higher throughput, so fewer servers needed
  • 32. Thank You We power the big data apps that transform business.