Apache Cassandra at Talkbits
Max Alexejev
Moscow Cassandra Users Group
25 April 2013
What is Talkbits?
Talkbits backend
Recursive call
Talkbits backend deployment diagram
Cassandra in EC2 at Talkbits

NetworkTopologyStrategy + Ec2MultiRegionSnitch

1 DC, 3 racks (availability zones in one EC2 region), N nodes per rack.
3N nodes total.

Data stored in 3 local copies, 1 per zone.

Writes use LOCAL_QUORUM; reads use ONE or TWO.

m1.large nodes (2 cores, 4 ECU, 7.5 GB RAM).

Transaction log (commit log) and data files both live on a RAID 0 array
of 2 ephemeral drives. This works for SSD or EC2 disks only!
Other typical setup options for EC2:

m1.xlarge (15 GB) / m2.4xlarge (68.4 GB) / hi1.4xlarge (SSD) nodes

EBS-backed data volumes (not recommended; use for
development only).
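
The replication setup on this slide (NetworkTopologyStrategy, 3 copies in a single DC) can be sketched as a keyspace definition. This is a minimal sketch, not the Talkbits schema: the keyspace name `talkbits` and the data center name `us-east` are assumptions for illustration. With the EC2 snitches the data center name is derived from the region, so the real value depends on where the cluster runs.

```python
def create_keyspace_cql(keyspace: str, datacenter: str, replicas: int) -> str:
    """Build a CQL CREATE KEYSPACE statement using NetworkTopologyStrategy.

    `keyspace` and `datacenter` are hypothetical names supplied by the caller.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return (
        f"CREATE KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{datacenter}': {replicas}}};"
    )


# 3 replicas in one DC; with 3 racks (zones) this gives one copy per zone.
statement = create_keyspace_cql("talkbits", "us-east", 3)
print(statement)
```

The generated statement would then be run through cqlsh or a driver of your choice.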
Cassandra consistency options
Definitions
N, R, W settings from Amazon Dynamo.
N – replication factor. Set per keyspace on keyspace creation.
Quorum: floor(N / 2) + 1, i.e. N / 2 rounded down, plus 1.
Read/write consistency levels:
ANY (writes only), ONE, TWO, THREE, QUORUM, LOCAL_QUORUM and
EACH_QUORUM (multi-DC), ALL.
Set per query.
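
The quorum formula from this slide, written out as a small sketch. The function name `quorum` is illustrative and not part of any Cassandra API.

```python
def quorum(replication_factor: int) -> int:
    """Return the quorum size for a given replication factor N."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    # Integer division rounds N / 2 down before adding 1.
    return replication_factor // 2 + 1


for n in (1, 2, 3, 5):
    print(f"N={n} -> quorum={quorum(n)}")
```

For N = 3, as in the Talkbits cluster, the quorum is 2, so a quorum operation tolerates one unavailable replica.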
Cassandra consistency semantics
W + R > N
Ensures strong consistency. A read always reflects the most recent
write.
R = W = [LOCAL_]QUORUM
Strong consistency. See the quorum definition and formula above.
W + R <= N
Eventual consistency.
W = 1
Good for fire-and-forget writes: logs, traces, metrics, page views, etc.
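
The W + R > N rule can be checked mechanically. The sketch below assumes a single data center, so LOCAL_QUORUM is treated the same as QUORUM. ANY and EACH_QUORUM are deliberately left out because they do not map to a simple replica count here. The function names are illustrative only.

```python
def replicas_required(level: str, n: int) -> int:
    """Number of replicas that must acknowledge at a given consistency level.

    Assumes one data center, so LOCAL_QUORUM behaves like QUORUM.
    """
    fixed = {"ONE": 1, "TWO": 2, "THREE": 3}
    if level in fixed:
        return fixed[level]
    if level in ("QUORUM", "LOCAL_QUORUM"):
        return n // 2 + 1
    if level == "ALL":
        return n
    raise ValueError(f"unsupported consistency level: {level}")


def is_strongly_consistent(write_level: str, read_level: str, n: int) -> bool:
    """True when W + R > N, so every read overlaps the latest write."""
    w = replicas_required(write_level, n)
    r = replicas_required(read_level, n)
    return w + r > n


# The combinations used at Talkbits, with N = 3.
print(is_strongly_consistent("LOCAL_QUORUM", "ONE", 3))  # W=2, R=1
print(is_strongly_consistent("LOCAL_QUORUM", "TWO", 3))  # W=2, R=2
```

With N = 3 and LOCAL_QUORUM writes, reading at ONE gives W + R = 3, which is eventual consistency; reading at TWO gives W + R = 4, which is strong consistency.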
Cassandra backups to S3
Full backups
• Periodic snapshots (daily, weekly)
• Remove snapshots from local disk after upload to S3 to prevent the disk
from filling up
Incremental backups
• SSTables are compressed and copied to S3
• Triggered by inotify IN_MOVED_TO and IN_CLOSE_WRITE events
• Don't enable with leveled compaction (it generates huge network traffic
to S3)
Continuous backups
• Compress and copy the transaction log to S3 at short time
intervals (for example, every 5, 30, or 60 minutes)
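
The incremental backup trigger described on this slide can be sketched as a filter over inotify event masks. The two constants are the Linux inotify values for these events. Everything else is an assumption for illustration: the `should_upload` helper is hypothetical, and skipping files with `-tmp-` in the name is an assumed convention for in-progress SSTables, not something the slide states. The actual upload step is omitted.

```python
# Linux inotify event bits (values from <sys/inotify.h>).
IN_CLOSE_WRITE = 0x00000008  # a file opened for writing was closed
IN_MOVED_TO = 0x00000080     # a file was moved into the watched directory
IN_CREATE = 0x00000100       # used below only as a non-matching example

UPLOAD_MASK = IN_CLOSE_WRITE | IN_MOVED_TO


def should_upload(event_mask: int, filename: str) -> bool:
    """Decide whether an inotify event should trigger an upload to S3."""
    if not event_mask & UPLOAD_MASK:
        return False
    # Assumption: temporary SSTables carry "-tmp-" in their file name.
    return "-tmp-" not in filename


print(should_upload(IN_CLOSE_WRITE, "ks-cf-ib-1-Data.db"))
print(should_upload(IN_CREATE, "ks-cf-ib-1-Data.db"))
print(should_upload(IN_MOVED_TO, "ks-cf-tmp-ib-2-Data.db"))
```

Reacting only to finished files avoids uploading SSTables that are still being written.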
Cassandra backups to S3 - tools
TableSnap from SimpleGeo
https://github.com/Instagram/tablesnap (most up-to-date fork)
The whole tool is 3 simple Python scripts (tablesnap, tableslurp,
tablechop). It uploads SSTables in real time, restores them, and removes
old backups from S3.
Priam from Netflix
https://github.com/Netflix/Priam
Full-blown web application. Requires a servlet container to run and
depends on the Amazon SimpleDB service for distributed token
management.
Contacts
Max Alexejev
http://ru.linkedin.com/pub/max-alexejev/51/820/ab9
http://www.slideshare.net/MaxAlexejev/
malexejev@gmail.com
