Cassandra Tutorial
by Michelle Darling
August 2013
Agenda:
What is Cassandra?
Installation, CQL3
Data Modeling
Summary
What is Cassandra?
NoSQL Distributed DB
Consistency - A__ID
Availability - High
Point of Failure - none
Good for Event
Tracking & Analysis
Cassandra: Fortuneteller of Doom
from Greek mythology. She tried to
warn others about future disasters,
but no one listened. Unfortunately,
she was 100% accurate.
2006
Infrastructure
Peer-Peer Gossip
Key-Value Pairs
Tunable Consistency
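Tunable consistency comes down to simple arithmetic: a read is guaranteed to see the latest write when the replicas read (R) plus the replicas written (W) exceed the replication factor (N). A minimal sketch of that rule, with hypothetical function names chosen for illustration:

```python
def quorum(replication_factor):
    """Smallest majority of replicas, e.g. 2 out of 3."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor, write_replicas, read_replicas):
    """True when every read set must overlap every write set (R + W > N)."""
    return read_replicas + write_replicas > replication_factor


rf = 3
# QUORUM writes + QUORUM reads: the replica sets always overlap.
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))
# ONE write + ONE read: fast, but a read may hit a stale replica.
print(reads_see_latest_write(rf, 1, 1))
```

Lower levels trade consistency for latency and availability; the choice can be made per query.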
Key-Value Pairs
Dynamo, Riak, Redis
Column-Based
BigTable, HBase,
Cassandra
Document-Based
MongoDB, Couchbase
Graph
Neo4J
Cassandra
C* Differentiators:
Production-proven at
Netflix, eBay, Twitter,
20 of Fortune 100
Clear Winner in
Scalability,
Performance,
Availability
-- DataStax
Architecture
Cluster (ring)
Nodes (circles)
Peer-to-Peer Model
Gossip Protocol
Partitioner:
Consistent Hashing
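The partitioner idea can be sketched in a few lines: each node owns a token on the ring, and a row key belongs to the first node whose token is at or after the key's hash. This is an illustrative model only, assuming one token per node and MD5 as a stand-in hash; the class and node names are hypothetical.

```python
import bisect
import hashlib


class Ring:
    """Toy consistent-hash ring: one token per node, no replication."""

    def __init__(self, nodes):
        self._tokens = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(key):
        # Stand-in hash for illustration; any well-mixed hash would do.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First node token >= key token, wrapping around the ring.
        idx = bisect.bisect_left(self._tokens, (self._hash(key), ""))
        return self._tokens[idx % len(self._tokens)][1]

    def add_node(self, node):
        bisect.insort(self._tokens, (self._hash(node), node))


ring = Ring(["node-a", "node-b", "node-c"])
keys = ["user:%d" % i for i in range(100)]
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = [k for k in keys if ring.node_for(k) != before[k]]
# Only keys that now belong to node-d change owner; the rest stay put.
print(len(moved), "of", len(keys), "keys moved")
```

The point of the design is the last line: adding a node relocates only the keys in the range it takes over, not the whole data set.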
Netflix
Streaming Video
Personalized
Recommendations per
family member
Built on Amazon Web
Services (AWS) +
Cassandra
Free for the 1st year! Then pay only for what you use.
Sign up for AWS EC2 account: Big Data University Video (4:34 minutes)
--clustername Michelle
--totalnodes 1
--version community
Install instructions
For Linux, Windows,
MacOS:
http://www.datastax.com/2012/01/getting-started-with-cassandra
CQL3
- Uses cqlsh
- SQL-like language
- Runs on top of Thrift RPC
- Much more user-friendly.
The old Thrift API, for comparison:
// Your Column
Column col = new Column(ByteBuffer.wrap("name".getBytes()));
col.setValue(ByteBuffer.wrap("value".getBytes()));
col.setTimestamp(System.currentTimeMillis());
// Don't ask
ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
cosc.setColumn(col);
// Prepare to be amazed
Mutation mutation = new Mutation();
mutation.setColumnOrSuperColumn(cosc);
List<Mutation> mutations = new ArrayList<Mutation>();
mutations.add(mutation);
// Row key -> (column family -> mutations)
Map<ByteBuffer, Map<String, List<Mutation>>> mutations_map =
    new HashMap<ByteBuffer, Map<String, List<Mutation>>>();
Map<String, List<Mutation>> cf_map = new HashMap<String, List<Mutation>>();
cf_map.put("Standard1", mutations);
mutations_map.put(ByteBuffer.wrap("key".getBytes()), cf_map);
cassandra.batch_mutate(mutations_map, consistency_level);
CREATE TABLE
cqlsh:big_data> CREATE TABLE user_tags (
  user_id varchar,
  tag varchar,
  value counter,
  PRIMARY KEY (user_id, tag)
);
TABLE user_tags: How many times has a user
mentioned a hashtag?
COUNTER datatype - Computes & stores counter value
at the time data is written. This optimizes query
performance.
UPDATE TABLE
cqlsh:big_data> UPDATE user_tags SET value = value + 1
  WHERE user_id = 'paul' AND tag = 'cassandra';
SELECT FROM TABLE
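To make the user_tags table concrete, here is an in-memory stand-in that mimics how the primary key (user_id, tag) groups counters: user_id picks the partition, tag orders the rows inside it. It does not talk to Cassandra; the class and method names are hypothetical, and the CQL in the comments (including the SELECT) is an assumed equivalent, not taken from the slides.

```python
from collections import defaultdict


class UserTags:
    """Toy model of user_tags: user_id -> {tag: counter value}."""

    def __init__(self):
        self._partitions = defaultdict(dict)

    def increment(self, user_id, tag, by=1):
        # Roughly: UPDATE user_tags SET value = value + 1
        #          WHERE user_id = ? AND tag = ?
        row = self._partitions[user_id]
        row[tag] = row.get(tag, 0) + by

    def select(self, user_id):
        # Roughly: SELECT tag, value FROM user_tags WHERE user_id = ?
        # Rows come back ordered by the clustering column (tag).
        return sorted(self._partitions.get(user_id, {}).items())


table = UserTags()
table.increment("paul", "cassandra")
table.increment("paul", "cassandra")
table.increment("paul", "nosql")
print(table.select("paul"))  # [('cassandra', 2), ('nosql', 1)]
```

The counter is maintained at write time, so reading "how many times has paul mentioned cassandra?" is a lookup, not an aggregation.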
DATA MODELING
A Major Paradigm Shift!
RDBMS: Array of Arrays, 2D: ROW x COLUMN
Cassandra: 3D+: Nested Objects
DATABASE (RDBMS) corresponds to KEYSPACE (Cassandra)
TABLE
ROW
COLUMN
Example:
Twissandra Web App
Twitter-Inspired
sample application
written in Python +
Cassandra.
Play with the app:
twissandra.com
Examine & learn
from the code on
GitHub.
Features/Queries:
Sign In, Sign Up
Post Tweet
Userline (User's tweets)
Timeline (All tweets)
Following (Users being
followed by user)
Followers (Users
following this user)
Twissandra.com vs Twitter.com
Entities
USER, TWEET
FOLLOWER, FOLLOWING
FRIENDS
Relationships:
USER has many TWEETs.
USER is a FOLLOWER of many
USERs.
Many USERs are FOLLOWING
USER.
TABLES or CFs (Column Families)
TWEET
USER, USERNAME
FOLLOWERS, FOLLOWING
USERLINE, TIMELINE
Notes:
Extra tables mirror queries.
Denormalized tables are
pre-formed for faster
performance.
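"Extra tables mirror queries" can be sketched as fan-out on write: posting a tweet stores it once in TWEET, then copies its id into USERLINE and into each follower's TIMELINE, so every read is a single lookup. This is an in-memory illustration under assumed names, not Twissandra's actual code.

```python
from collections import defaultdict
import itertools


class MiniTwissandra:
    """Dicts standing in for the denormalized tables."""

    def __init__(self):
        self.tweets = {}                    # TWEET: id -> (username, body)
        self.followers = defaultdict(set)   # FOLLOWERS: user -> who follows them
        self.following = defaultdict(set)   # FOLLOWING: user -> who they follow
        self.userline = defaultdict(list)   # USERLINE: user -> own tweet ids
        self.timeline = defaultdict(list)   # TIMELINE: user -> ids from followed users
        self._ids = itertools.count(1)

    def follow(self, follower, followed):
        # The relationship is written twice, once per query direction.
        self.following[follower].add(followed)
        self.followers[followed].add(follower)

    def post_tweet(self, username, body):
        tweet_id = next(self._ids)
        self.tweets[tweet_id] = (username, body)
        self.userline[username].append(tweet_id)
        for reader in self.followers[username]:
            self.timeline[reader].append(tweet_id)  # fan-out on write
        return tweet_id

    def get_timeline(self, username):
        # Newest first; no joins needed at read time.
        return [self.tweets[t] for t in reversed(self.timeline[username])]


app = MiniTwissandra()
app.follow("bob", "alice")
app.post_tweet("alice", "hello")
app.post_tweet("alice", "skip Thrift")
print(app.get_timeline("bob"))  # [('alice', 'skip Thrift'), ('alice', 'hello')]
```

Writes cost more (one per follower), which is the trade the slides describe: storage is cheap, fast reads are not.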
Tip: Remember,
Skip Thrift -- use CQL3
TABLE: Flexible Schema
Summary:
Go straight from SQL
to CQL3; skip Thrift, Column
Choose wisely:
Partition Keys
Clustering Keys
Indexes
TTL
Counters
Collections
See DataStax Music Service
Example
Consider hybrid
approach:
Remember:
Cheap: storage,
servers, OpenSource
software.
Precious: User AND
Developer Happiness.
Resources
C* Summit 2013: Slides
- Cassandra at eBay Scale (slides)
- Data Modelers Still Have Jobs: Adjusting for the NoSQL Environment (slides)
- Real-time Analytics using Cassandra, Spark and Shark (slides)