Getting to know Cassandra
by Michelle Darling
mdarlingcmt@gmail.com
August 2013
Agenda:
● What is Cassandra?
● Installation, CQL3
● Data Modelling
● Summary
Only 15 min to cover these, so
please hold questions til the
end, or email me :-) and I’ll
summarize Q&A for everyone.
Unfortunately, no time for:
● DB Admin
○ Detailed Architecture
○ Partitioning /
Consistent Hashing
○ Consistency Tuning
○ Data Distribution &
Replication
○ System Tables
● App Development
○ Using Python, Ruby etc
to access Cassandra
○ Using Hadoop to
stream data into
Cassandra
What is Cassandra?
“Fortuneteller of Doom”
from Greek Mythology. Tried to
warn others about future disasters,
but no one listened. Unfortunately,
she was 100% accurate.
NoSQL Distributed DB
● Consistency - A__ID
● Availability - High
● Point of Failure - none
● Good for Event
Tracking & Analysis
○ Time series data
○ Sensor device data
○ Social media analytics
○ Risk Analysis
○ Failure Prediction
Rackspace: “Which servers
are under heavy load
and are about to crash?”
The Evolution of Cassandra
2008: Open-Source Release / 2013: Enterprise & Community Editions
Data Model
● Wide rows, sparse arrays
● High performance through very
fast write throughput.
Infrastructure
● Peer-to-Peer Gossip
● Key-Value Pairs
● Tunable Consistency
2006
2005
● Originally for Inbox Search
● But now used for Instagram
Other NoSQL vs. Cassandra
NoSQL Taxonomy:
● Key-Value Pairs
○ Dynamo, Riak, Redis
● Column-Based
○ BigTable, HBase,
Cassandra
● Document-Based
○ MongoDB, Couchbase
● Graph
○ Neo4J
Big Data Capable
C* Differentiators:
● Production-proven at
Netflix, eBay, Twitter,
20 of Fortune 100
● “Clear Winner” in
Scalability,
Performance,
Availability
-- DataStax
Architecture
● Cluster (ring)
● Nodes (circles)
● Peer-to-Peer Model
● Gossip Protocol
Partitioner:
Consistent Hashing
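A minimal sketch of how a consistent-hashing partitioner can place row keys on a ring of nodes. The node names, the `Ring` class, the single token per node, and the use of MD5 are assumptions for illustration, not Cassandra's actual implementation. Replicas are taken as the next nodes clockwise, which is roughly what SimpleStrategy does.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Hash a key to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy ring: a key belongs to the first node whose token is >= the key's token."""

    def __init__(self, nodes):
        self._entries = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._entries]

    def replicas(self, row_key: str, replication_factor: int = 1):
        """Return the owning node followed by the next nodes clockwise."""
        size = len(self._entries)
        # Wrap around to the first node when the key hashes past the last token.
        start = bisect.bisect_left(self._tokens, token(row_key)) % size
        wanted = min(replication_factor, size)
        return [self._entries[(start + i) % size][1] for i in range(wanted)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("paul", replication_factor=2)
print(owners)
```

The property that matters: adding a node moves only the keys that now fall to the new node, and every other key stays where it was.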
Netflix
Streaming Video
● Personalized
Recommendations per
family member
● Built on Amazon Web
Services (AWS) +
Cassandra
Cloud installation using
● Amazon Web Services (AWS)
● Elastic Compute Cloud (EC2)
○ Free for the 1st year! Then pay only for what you use.
○ Sign up for an AWS EC2 account: see the Big Data University video (4:34 minutes).
● Amazon Machine Image (AMI)
○ Preconfigured installation template
○ Choose: “DataStax AMI for Cassandra
Community Edition”
○ Follow these *very good* step-by-step
instructions from DataStax.
○ AMIs also available for CouchBase, MongoDB
(make sure you pick the free tier community versions to avoid
monthly charge$$!!!).
AWS EC2 Dashboard
DataStax AMI Setup
DataStax AMI Setup
--clustername Michelle
--totalnodes 1
--version community
“Roll your Own” Installation
DataStax Community Edition
● Install instructions
for Linux, Windows,
MacOS:
http://www.datastax.com/2012/01/getting-started-with-cassandra
● Video: “Set up a 4-node
Cassandra cluster in
under 2 minutes”
http://www.screenr.com/5G6
Invoke CQLSH, CREATE KEYSPACE
./bin/cqlsh
cqlsh> CREATE KEYSPACE big_data
… WITH replication = {'class': 'SimpleStrategy',
… 'replication_factor': 1};
cqlsh> USE big_data;
cqlsh:big_data>
Tip: Skip Thrift -- use CQL3
Thrift RPC
// Your Column
Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumnOrSuperColumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
// Row key -> column family name -> list of mutations
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
cassandra.batch_mutate(mutations_map, consistency_level);
CQL3
- Uses cqlsh
- “SQL-like” language
- Runs on top of Thrift RPC
- Much more user-friendly.
The Thrift code above
equals this in CQL3:
INSERT INTO "Standard1" (id, name)
VALUES ('key', 'value');
CREATE TABLE
cqlsh:big_data> create table user_tags (
… user_id varchar,
… tag varchar,
… value counter,
… primary key (user_id, tag)
…);
● TABLE user_tags: “How many times has a user
mentioned a hashtag?”
● COUNTER datatype - Computes & stores counter value
at the time data is written. This optimizes query
performance.
UPDATE TABLE
cqlsh:big_data> UPDATE user_tags SET
value = value + 1 WHERE user_id = 'paul' AND tag =
'cassandra';
SELECT FROM TABLE
cqlsh:big_data> SELECT * FROM user_tags;
 user_id | tag       | value
---------+-----------+-------
    paul | cassandra |     1
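The counter table can be modelled in plain Python to show why reads are cheap: the count is maintained when data is written, so the SELECT is a lookup rather than an aggregation. This is only a sketch. The dictionary layout and the function names `mention` and `select_all` are hypothetical and say nothing about how Cassandra stores counters internally.

```python
from collections import defaultdict

# Toy model of user_tags: primary key (user_id, tag) -> counter value.
user_tags = defaultdict(int)


def mention(user_id: str, tag: str) -> None:
    """Mirror of: UPDATE user_tags SET value = value + 1 WHERE user_id = ? AND tag = ?"""
    user_tags[(user_id, tag)] += 1


def select_all():
    """Mirror of: SELECT * FROM user_tags (no counting happens at read time)."""
    return [(user, tag, value) for (user, tag), value in sorted(user_tags.items())]


mention("paul", "cassandra")
print(select_all())  # [('paul', 'cassandra', 1)]
```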
DATA MODELING
A Major Paradigm Shift!
RDBMS | Cassandra
------+----------
Structured Data, Fixed Schema | Unstructured Data, Flexible Schema
“Array of Arrays”. 2D: ROW x COLUMN | “Nested Key-Value Pairs”. 3D: ROW key x COLUMN key x COLUMN values
DATABASE | KEYSPACE
TABLE | TABLE a.k.a. COLUMN FAMILY
ROW | ROW a.k.a. PARTITION. Unit of replication.
COLUMN | COLUMN [Name, Value, Timestamp]. a.k.a. CLUSTER. Unit of storage. Up to 2 billion columns per row.
FOREIGN KEYS, JOINS, ACID Consistency | Referential Integrity not enforced, so A__ID. BUT relationships represented using COLLECTIONS.

RDBMS: 2D, rows x columns
Cassandra: 3D+, nested objects
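The “nested key-value pairs” idea (row key x column key x column values) can be sketched with Python dictionaries. The sample data and the `get` and `put` helpers are made up for illustration, and the timestamp rule shown (the newest write wins) is a simplification of how Cassandra resolves conflicting writes.

```python
# RDBMS view: every row is a fixed-width tuple, so unused columns still take a slot.
rdbms_rows = [
    ("paul", "Paul", None),
    ("mary", "Mary", "Austin"),
]

# Cassandra view: row key -> column key -> (value, timestamp). Rows are sparse.
column_family = {
    "paul": {"name": ("Paul", 1)},
    "mary": {"name": ("Mary", 1), "city": ("Austin", 2)},
}


def get(row_key, column_key):
    """Return the stored value, or None when the sparse row lacks that column."""
    cell = column_family.get(row_key, {}).get(column_key)
    return cell[0] if cell else None


def put(row_key, column_key, value, timestamp):
    """Keep the cell with the highest timestamp (newest write wins)."""
    row = column_family.setdefault(row_key, {})
    current = row.get(column_key)
    if current is None or timestamp >= current[1]:
        row[column_key] = (value, timestamp)


put("paul", "city", "Boston", 5)
print(get("paul", "city"))
```

Adding a column to one row needs no schema change and costs nothing for the other rows, which is what “flexible schema” means in practice.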
Example:
“Twissandra” Web App
Twitter-Inspired
sample application
written in Python +
Cassandra.
● Play with the app:
twissandra.com
● Examine & learn
from the code on
GitHub.
Features/Queries:
● Sign In, Sign Up
● Post Tweet
● Userline (User’s tweets)
● Timeline (All tweets)
● Following (Users being
followed by user)
● Followers (Users
following this user)
Twissandra.com vs Twitter.com
Twissandra - RDBMS Version
Entities
● USER, TWEET
● FOLLOWER, FOLLOWING
● FRIENDS
Relationships:
● USER has many TWEETs.
● USER is a FOLLOWER of many
USERs.
● Many USERs are FOLLOWING
USER.
Twissandra - Cassandra Version
Tip: Model tables to mirror queries.
TABLES or CFs
● TWEET
● USER, USERNAME
● FOLLOWERS, FOLLOWING
● USERLINE, TIMELINE
Notes:
● Extra tables mirror queries.
● Denormalized tables are
“pre-formed” for faster
performance.
TABLE
Tip: Remember,
Skip Thrift -- use CQL3
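The tip to model tables to mirror queries can be sketched as follows: one write fans out to every table that a query will later read, so each read becomes a single-key lookup with no join. The in-memory dictionaries stand in for the tables listed above and the function names are hypothetical. Treating TIMELINE as “tweets from the users I follow” is an assumption made for this sketch.

```python
from collections import defaultdict
from itertools import count

tweets = {}                    # TWEET: tweet_id -> (username, body)
followers = defaultdict(set)   # FOLLOWERS: username -> users who follow them
following = defaultdict(set)   # FOLLOWING: username -> users they follow
userline = defaultdict(list)   # USERLINE: username -> ids of their own tweets
timeline = defaultdict(list)   # TIMELINE: username -> ids of tweets from followed users
_next_id = count(1)


def follow(follower: str, followed: str) -> None:
    """Store the relationship twice, once per query direction."""
    followers[followed].add(follower)
    following[follower].add(followed)


def post_tweet(username: str, body: str) -> int:
    """Write to every query table up front so that reads need no joins."""
    tweet_id = next(_next_id)
    tweets[tweet_id] = (username, body)
    userline[username].append(tweet_id)
    for follower in followers[username]:
        timeline[follower].append(tweet_id)
    return tweet_id


follow("mary", "paul")
first = post_tweet("paul", "Hello, Twissandra")
print(userline["paul"], timeline["mary"])
```

The trade-off is more writes and duplicated data in exchange for cheap reads, which suits a store with very fast write throughput.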
What does C* data look like?
TABLE Userline
“List all of user’s Tweets”
*************
Row Key: user_id
Columns
● Column Key: tweet_id
● “at” Timestamp
● TTL (Time to Live) -
seconds til expiration
date.
*************
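A toy model of the Userline row described above: one row per user_id, one column per tweet_id, each column carrying a write timestamp and an optional TTL. The `Userline` class and the injectable clock are assumptions made so the expiry can be shown without waiting. Cassandra handles expiry internally.

```python
import time


class Userline:
    """Toy wide row: row key user_id, one column per tweet_id."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._rows = {}  # user_id -> {tweet_id: (written_at, expires_at or None)}

    def add(self, user_id, tweet_id, ttl_seconds=None):
        """Write a column; with a TTL it expires ttl_seconds after the write."""
        now = self._clock()
        expires_at = None if ttl_seconds is None else now + ttl_seconds
        self._rows.setdefault(user_id, {})[tweet_id] = (now, expires_at)

    def tweets(self, user_id):
        """List the user's unexpired tweet_ids, sorted by column key."""
        now = self._clock()
        row = self._rows.get(user_id, {})
        return sorted(t for t, (_, exp) in row.items() if exp is None or exp > now)


fake_now = [100.0]                      # mutable fake clock for the demo
line = Userline(clock=lambda: fake_now[0])
line.add("paul", "t1")                  # never expires
line.add("paul", "t2", ttl_seconds=60)  # expires 60 seconds after the write
print(line.tweets("paul"))
fake_now[0] = 161.0                     # move the clock past the TTL
print(line.tweets("paul"))
```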
Cassandra Data Model = LEGOs?
Flexible Schema
Summary:
● Go straight from SQL
to CQL3; skip Thrift, Column
Families, SuperColumns, etc
● Denormalize tables to
mirror important queries.
Roughly 1 table per important query.
● Choose wisely:
○ Partition Keys
○ Cluster Keys
○ Indexes
○ TTL
○ Counters
○ Collections
See DataStax Music Service
Example
● Consider hybrid
approach:
○ 20% - RDBMS for highly
structured, OLTP, ACID
requirements.
○ 80% - Scale Cassandra to
handle the rest of data.
Remember:
● Cheap: storage,
servers, open-source
software.
● Precious: User AND
Developer Happiness.
Resources
C* Summit 2013:
● Slides
● Cassandra at eBay Scale (slides)
● Data Modelers Still Have Jobs -
Adjusting For the NoSQL
Environment (slides)
● Real-time Analytics using
Cassandra, Spark and Shark
(slides)
● Cassandra By Example: Data
Modelling with CQL3 (slides)
● DATASTAX C*OLLEGE CREDIT:
DATA MODELLING FOR APACHE
CASSANDRA (slides)
I wish I found these 1st:
● How do I Cassandra?
(slides)
● Mobile version of
DataStax web docs
(link)
Cassandra NoSQL Tutorial

  • 1. Getting to know by Michelle Darling [email protected] August 2013
  • 2. Agenda: ● What is Cassandra? ● Installation, CQL3 ● Data Modelling ● Summary Only 15 min to cover these, so please hold questions til the end, or email me :-) and I’ll summarize Q&A for everyone. Unfortunately, no time for: ● DB Admin ○ Detailed Architecture ○ Partitioning / Consistent Hashing ○ Consistency Tuning ○ Data Distribution & Replication ○ System Tables ● App Development ○ Using Python, Ruby etc to access Cassandra ○ Using Hadoop to stream data into Cassandra
  • 3. What is Cassandra? “Fortuneteller of Doom” from Greek Mythology. Tried to warn others about future disasters, but no one listened. Unfortunately, she was 100% accurate. NoSQL Distributed DB ● Consistency - A__ID ● Availability - High ● Point of Failure - none ● Good for Event Tracking & Analysis ○ Time series data ○ Sensor device data ○ Social media analytics ○ Risk Analysis ○ Failure Prediction Rackspace: “Which servers are under heavy load and are about to crash?”
  • 4. The Evolution of Cassandra 2008: Open-Source Release / 2013: Enterprise & Community Editions Data Model ● Wide rows, sparse arrays ● High performance through very fast write throughput. Infrastructure ● Peer-Peer Gossip ● Key-Value Pairs ● Tunable Consistency 2006 2005 ● Originally for Inbox Search ● But now used for Instagram
  • 5. Other NoSQL vs. Cassandra NoSQL Taxonomy: ● Key-Value Pairs ○ Dynamo, Riak, Redis ● Column-Based ○ BigTable, HBase, Cassandra ● Document-Based ○ MongoDB, Couchbase ● Graph ○ Neo4J Big Data Capable C* Differentiators: ● Production-proven at Netflix, eBay, Twitter, 20 of Fortune 100 ● “Clear Winner” in Scalability, Performance, Availability -- DataStax
  • 6. Architecture ● Cluster (ring) ● Nodes (circles) ● Peer-to-Peer Model ● Gossip Protocol Partitioner: Consistent Hashing
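The ring and partitioner on this slide can be illustrated with a toy consistent-hash ring. This is a minimal sketch, not Cassandra's actual partitioner: the node names, the `Ring` class, and the choice of MD5 as the token function are assumptions made for illustration. It shows the property that matters: when a node joins, only the keys that land in the new node's range move.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 chosen only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy consistent-hash ring: each node owns the keys up to its own token."""

    def __init__(self, nodes):
        # Sort nodes by token so the ring can be binary-searched.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]

    def node_for(self, row_key: str) -> str:
        # First node whose token is >= the key's token, wrapping past the end.
        i = bisect.bisect_left(self._tokens, token(row_key)) % len(self._ring)
        return self._ring[i][1]


# Hypothetical node names and row keys.
ring = Ring(["node-a", "node-b", "node-c"])
bigger = Ring(["node-a", "node-b", "node-c", "node-d"])
keys = [f"user{i}" for i in range(1000)]

moved = [k for k in keys if ring.node_for(k) != bigger.node_for(k)]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

Every key that changes owner ends up on the new node; the other keys stay where they were, which is why a ring can grow without reshuffling all the data.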
  • 7. Netflix Streaming Video ● Personalized Recommendations per family member ● Built on Amazon Web Services (AWS) + Cassandra
  • 8. Cloud installation using ● Amazon Web Services (AWS) ● Elastic Compute Cloud (EC2) ○ Free for the 1st year! Then pay only for what you use. ○ Sign up for AWS EC2 account: Big Data University video (4:34 minutes) ● Amazon Machine Image (AMI) ○ Preconfigured installation template ○ Choose: “DataStax AMI for Cassandra Community Edition” ○ Follow these *very good* step-by-step instructions from DataStax. ○ AMIs also available for CouchBase, MongoDB (make sure you pick the free tier community versions to avoid monthly charge$$!!!).
  • 11. DataStax AMI Setup --clustername Michelle --totalnodes 1 --version community
  • 12. “Roll your Own” Installation DataStax Community Edition ● Install instructions for Linux, Windows, MacOS: http://www.datastax.com/2012/01/getting-started-with-cassandra ● Video: “Set up a 4-node Cassandra cluster in under 2 minutes” http://www.screenr.com/5G6
  • 13. Invoke CQLSH, CREATE KEYSPACE ./bin/cqlsh cqlsh> CREATE KEYSPACE big_data … WITH strategy_class = 'org.apache.cassandra.locator.SimpleStrategy' … AND strategy_options:replication_factor = '1'; cqlsh> use big_data; cqlsh:big_data>
  • 14. Tip: Skip Thrift -- use CQL3 Thrift RPC // Your Column Column col = new Column(ByteBuffer.wrap("name".getBytes())); col.setValue(ByteBuffer.wrap("value".getBytes())); col.setTimestamp(System.currentTimeMillis()); // Don't ask ColumnOrSuperColumn cosc = new ColumnOrSuperColumn(); cosc.setColumn(col); // Prepare to be amazed Mutation mutation = new Mutation(); mutation.setColumnOrSuperColumn(cosc); List<Mutation> mutations = new ArrayList<Mutation>(); mutations.add(mutation); Map mutations_map = new HashMap<ByteBuffer, Map<String, List<Mutation>>>(); Map cf_map = new HashMap<String, List<Mutation>>(); cf_map.put("Standard1", mutations); mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map); cassandra.batch_mutate(mutations_map, consistency_level); CQL3 - Uses cqlsh - “SQL-like” language - Runs on top of Thrift RPC - Much more user-friendly. The Thrift code above is equivalent to this in CQL3: INSERT INTO (id, name) VALUES ('key', 'value');
  • 15. CREATE TABLE cqlsh:big_data> create table user_tags ( … user_id varchar, … tag varchar, … value counter, … primary key (user_id, tag) … ); ● TABLE user_tags: “How many times has a user mentioned a hashtag?” ● COUNTER datatype - Computes & stores counter value at the time data is written. This optimizes query performance.
  • 16. UPDATE TABLE SELECT FROM TABLE cqlsh:big_data> UPDATE user_tags SET value=value+1 WHERE user_id = 'paul' AND tag = 'cassandra'; cqlsh:big_data> SELECT * FROM user_tags; user_id | tag | value --------+-----------+---------- paul | cassandra | 1
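The counter behaviour in slides 15 and 16 can be mimicked in plain Python to see why reads stay cheap. This is only a sketch: the in-memory `user_tags` dictionary and the `mention` helper are hypothetical stand-ins, not a Cassandra client. The point is that the count is maintained at write time, keyed by the same (user_id, tag) pair as the table's primary key, so a read is a lookup rather than an aggregation.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for user_tags: (user_id, tag) -> counter value.
user_tags = defaultdict(int)


def mention(user_id: str, tag: str) -> None:
    """Mirrors: UPDATE user_tags SET value = value + 1 WHERE user_id = ? AND tag = ?"""
    # A missing counter starts at zero, so the first update creates the row.
    user_tags[(user_id, tag)] += 1


mention("paul", "cassandra")
mention("paul", "cassandra")
mention("paul", "nosql")

# Reading is a direct lookup; nothing is summed at query time.
for (user_id, tag), value in sorted(user_tags.items()):
    print(user_id, tag, value)
```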
  • 17. DATA MODELING A Major Paradigm Shift! RDBMS | Cassandra: ● Structured Data, Fixed Schema | Unstructured Data, Flexible Schema ● “Array of Arrays” 2D: ROW x COLUMN | “Nested Key-Value Pairs” 3D: ROW Key x COLUMN key x COLUMN values ● DATABASE | KEYSPACE ● TABLE | TABLE a.k.a COLUMN FAMILY ● ROW | ROW a.k.a PARTITION. Unit of replication. ● COLUMN | COLUMN [Name, Value, Timestamp]. a.k.a CLUSTER. Unit of storage. Up to 2 billion columns per row. ● FOREIGN KEYS, JOINS, ACID Consistency | Referential Integrity not enforced, so A_CID. BUT relationships represented using COLLECTIONS.
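The “Nested Key-Value Pairs” idea (row key, then column key, then a value with its timestamp) maps naturally onto nested dictionaries. The sketch below is an illustration only; the `keyspace` dictionary, the `write` helper, and the sample tweets are all hypothetical. It also shows two consequences of the model: rows can hold different sets of columns, and when the same column is written twice the newest timestamp wins.

```python
import time

# keyspace -> table -> row key -> column key -> (value, timestamp)
keyspace = {"userline": {}}


def write(table, row_key, column_key, value, timestamp=None):
    """Upsert one column; the write carrying the newest timestamp wins."""
    ts = time.time() if timestamp is None else timestamp
    row = keyspace[table].setdefault(row_key, {})
    current = row.get(column_key)
    if current is None or ts >= current[1]:
        row[column_key] = (value, ts)


write("userline", "paul", "tweet-002", "Trying CQL3", timestamp=200)
write("userline", "paul", "tweet-001", "Hello C*", timestamp=100)
write("userline", "paul", "tweet-001", "stale edit", timestamp=50)  # older, ignored
write("userline", "anna", "tweet-003", "Sparse rows are fine", timestamp=300)

# Columns within one row are read back ordered by column key.
for column_key, (value, ts) in sorted(keyspace["userline"]["paul"].items()):
    print(column_key, value, ts)
```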
  • 19. Example: “Twissandra” Web App Twitter-Inspired sample application written in Python + Cassandra. ● Play with the app: twissandra.com ● Examine & learn from the code on GitHub. Features/Queries: ● Sign In, Sign Up ● Post Tweet ● Userline (User’s tweets) ● Timeline (All tweets) ● Following (Users being followed by user) ● Followers (Users following this user)
  • 21. Twissandra - RDBMS Version Entities ● USER, TWEET ● FOLLOWER, FOLLOWING ● FRIENDS Relationships: ● USER has many TWEETs. ● USER is a FOLLOWER of many USERs. ● Many USERs are FOLLOWING USER.
  • 22. Twissandra - Cassandra Version Tip: Model tables to mirror queries. TABLES or CFs ● TWEET ● USER, USERNAME ● FOLLOWERS, FOLLOWING ● USERLINE, TIMELINE Notes: ● Extra tables mirror queries. ● Denormalized tables are “pre-formed” for faster performance.
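“Model tables to mirror queries” means paying at write time so that each read touches one row. A rough sketch of that trade-off, using in-memory dictionaries as hypothetical stand-ins for the TWEET, FOLLOWERS, USERLINE and TIMELINE tables (this is not Twissandra's actual code, and it assumes a user's timeline holds tweets from the people they follow):

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the denormalized tables.
tweets = {}                    # tweet_id -> (username, body)
followers = defaultdict(set)   # username -> users who follow them
userline = defaultdict(list)   # username -> tweet_ids they posted
timeline = defaultdict(list)   # username -> tweet_ids from people they follow


def follow(follower: str, followee: str) -> None:
    followers[followee].add(follower)


def post_tweet(tweet_id: str, username: str, body: str) -> None:
    """Write once per query table so that every read is a single-row lookup."""
    tweets[tweet_id] = (username, body)
    userline[username].append(tweet_id)
    for follower in followers[username]:
        timeline[follower].append(tweet_id)


follow("anna", "paul")
follow("raj", "paul")
post_tweet("t1", "paul", "Denormalize to mirror queries")

print(userline["paul"])
print(timeline["anna"], timeline["raj"])
```

One tweet produces several writes, but “show Anna's timeline” needs no join: it reads one pre-formed row.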
  • 24. What does C* data look like? TABLE Userline “List all of user’s Tweets” ************* Row Key: user_id Columns ● Column Key: tweet_id ● “at” Timestamp ● TTL (Time to Live) - seconds til expiration date. *************
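The Userline layout on this slide, including the TTL, can be sketched with a small data class. Everything here is illustrative: the `Column` class, the `read_row` helper, and the sample tweets are assumptions, and time is passed in explicitly so the example is deterministic. It shows a column with a TTL dropping out of reads once its lifetime has elapsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Column:
    value: str
    written_at: float          # the "at" timestamp, in seconds
    ttl: Optional[int] = None  # seconds to live; None means never expire

    def is_live(self, now: float) -> bool:
        return self.ttl is None or now < self.written_at + self.ttl


# One Userline row: the row key is the user_id, column keys are tweet_ids.
userline = {
    "paul": {
        "tweet-001": Column("Hello C*", written_at=0.0),
        "tweet-002": Column("Flash sale today", written_at=0.0, ttl=60),
    }
}


def read_row(user_id: str, now: float) -> dict:
    """Return only the columns whose TTL has not elapsed at time `now`."""
    return {k: c.value for k, c in userline[user_id].items() if c.is_live(now)}


print(read_row("paul", now=30.0))  # both tweets are still live
print(read_row("paul", now=90.0))  # the 60-second tweet has expired
```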
  • 25. Cassandra Data Model = LEGOs? Flexible Schema
  • 26. Summary: ● Go straight from SQL to CQL3; skip Thrift, Column Families, SuperColumns, etc. ● Denormalize tables to mirror important queries. Roughly 1 table per important query. ● Choose wisely: ○ Partition Keys ○ Cluster Keys ○ Indexes ○ TTL ○ Counters ○ Collections See DataStax Music Service Example ● Consider hybrid approach: ○ 20% - RDBMS for highly structured, OLTP, ACID requirements. ○ 80% - Scale Cassandra to handle the rest of data. Remember: ● Cheap: storage, servers, OpenSource software. ● Precious: User AND Developer Happiness.
  • 27. Resources C* Summit 2013: ● Slides ● Cassandra at eBay Scale (slides) ● Data Modelers Still Have Jobs - Adjusting For the NoSQL Environment (Slides) ● Real-time Analytics using Cassandra, Spark and Shark slides ● Cassandra By Example: Data Modelling with CQL3 Slides ● DATASTAX C*OLLEGE CREDIT: DATA MODELLING FOR APACHE CASSANDRA slides I wish I found these 1st: ● How do I Cassandra? slides ● Mobile version of DataStax web docs (link)