Introduction to Apache Cassandra (for Java developers!)
Nate McCall, @zznate
Brief Intro
- NOT a "key/value store"
- Columns are dynamic inside a column family
- SSTables are immutable
- SSTables merged on reads
- All nodes share the same role (i.e. no single point of failure)
- Trading ACID compliance for scalability is a fundamental design decision
How does this impact development?
Substantially.
- For operations affecting the same data, that data will become consistent eventually, as determined by the timestamps.
- But you can trade availability for consistency. (More on this later)
- You can store whatever you want. It's all just bytes.
- You need to think about how you will query the data before you write it.
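To make "consistent eventually as determined by the timestamps" concrete, here is a minimal sketch of last-write-wins reconciliation. It is a toy model and not Cassandra's code: the `LastWriteWins` and `Versioned` names are invented for illustration, and ties between equal timestamps are not modelled.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of timestamp-based reconciliation (hypothetical names, not Cassandra code). */
final class LastWriteWins {

    /** One replica's view of a column value, stamped by the writing client. */
    static final class Versioned {
        final String value;
        final long timestampMicros;

        Versioned(String value, long timestampMicros) {
            this.value = value;
            this.timestampMicros = timestampMicros;
        }
    }

    /** Returns the answer carrying the highest timestamp, or null for an empty list. */
    static Versioned reconcile(List<Versioned> replicaAnswers) {
        Versioned winner = null;
        for (Versioned candidate : replicaAnswers) {
            if (winner == null || candidate.timestampMicros > winner.timestampMicros) {
                winner = candidate;
            }
        }
        return winner;
    }

    public static void main(String[] args) {
        List<Versioned> answers = Arrays.asList(
                new Versioned("Austin", 1000L),      // stale replica
                new Versioned("Round Rock", 2000L),  // most recent write
                new Versioned("Austin", 1000L));     // stale replica
        System.out.println(reconcile(answers).value); // prints "Round Rock"
    }
}
```

The point for development: the value that survives is decided by the timestamp, not by the order in which replicas happened to receive the writes.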
Neat. So Now What?
Like any database, you need a client!
Python:
- Telephus: http://github.com/driftx/Telephus (Twisted)
- Pycassa: http://github.com/pycassa/pycassa
Java:
- Hector: http://github.com/rantav/hector (Examples: https://github.com/zznate/hector-examples)
- Pelops: http://github.com/s7/scale7-pelops
- Kundera: http://code.google.com/p/kundera/
- Datanucleus JDO: http://github.com/tnine/Datanucleus-Cassandra-Plugin
Grails:
- grails-cassandra: https://github.com/wolpert/grails-cassandra
.NET:
- FluentCassandra: http://github.com/managedfusion/fluentcassandra
- Aquiles: http://aquiles.codeplex.com/
Ruby:
- Cassandra: http://github.com/fauna/cassandra
PHP:
- phpcassa: http://github.com/thobbs/phpcassa
- SimpleCassie: http://code.google.com/p/simpletools-php/wiki/SimpleCassie
... but do not roll your own
Thrift
- Fast, efficient serialization and network IO.
- Lots of clients available (you can probably use it in other places as well)
Why you don't want to work with the Thrift API directly:
- SuperColumn
- ColumnOrSuperColumn
- ColumnParent.super_column
- ColumnPath.super_column
- Map<ByteBuffer,Map<String,List<Mutation>>> mutationMap
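A rough sketch of what filling that nested `mutationMap` by hand looks like (row key, then column family name, then list of mutations). The `Mutation` class below is a hypothetical stand-in with made-up fields, not the Thrift-generated struct, and `MutationMapSketch` is an invented name; only the shape of the map comes from the slide.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MutationMapSketch {

    /** Hypothetical stand-in for the Thrift-generated Mutation struct. */
    static final class Mutation {
        final String columnName;
        final String value;

        Mutation(String columnName, String value) {
            this.columnName = columnName;
            this.value = value;
        }
    }

    /** Files one mutation under row key -> column family name -> list of mutations. */
    static void add(Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap,
                    String rowKey, String columnFamily, Mutation mutation) {
        // ByteBuffer compares by content, so equal row keys land in the same entry.
        ByteBuffer key = ByteBuffer.wrap(rowKey.getBytes(StandardCharsets.UTF_8));
        mutationMap.computeIfAbsent(key, k -> new HashMap<>())
                   .computeIfAbsent(columnFamily, cf -> new ArrayList<>())
                   .add(mutation);
    }

    public static void main(String[] args) {
        Map<ByteBuffer, Map<String, List<Mutation>>> mutationMap = new HashMap<>();
        add(mutationMap, "512202", "Npanxx", new Mutation("city", "Austin"));
        add(mutationMap, "512202", "Npanxx", new Mutation("state", "TX"));
        // One row key, one column family, two mutations.
        System.out.println(mutationMap.size());
    }
}
```

Every caller ends up writing a helper like `add`, which is the kind of bookkeeping a higher-level client takes off your hands.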
Higher Level Client
Hector
- JMX Counters
- Add/remove hosts: automatically, programmatically, via JMX
- Pluggable load balancing
- Complete encapsulation of Thrift API
- Type-safe approach to dealing with Apache Cassandra
- Lightweight ORM (supports JPA 1.0 annotations)
- Mavenized! http://repo2.maven.org/maven2/me/prettyprint/
"CQL"
- Currently in Apache Cassandra trunk
- Experimental
- Lots of possibilities
From test/system/test_cql.py:
  UPDATE StandardLong1 SET 1L="1", 2L="2", 3L="3", 4L="4" WHERE KEY="aa"
  SELECT "cd1", "col" FROM Standard1 WHERE KEY = "kd"
  DELETE "cd1", "col" FROM Standard1 WHERE KEY = "kd"
Avro??
- Gone. Added too much complexity after Thrift caught up.
- "None of the libraries distinguished themselves as being a particularly crappy choice for serialization." (See CASSANDRA-1765)
Thrift API Methods
- Retrieving
- Writing/Removing
- Meta Information
- Schema Manipulation
Thrift API Methods - Retrieving
- get: retrieve a single column for a key
- get_slice: retrieve a "slice" of columns for a key
- multiget_slice: retrieve a "slice" of columns for a list of keys
- get_count: count the columns of a key (you have to deserialize the row to do it)
- get_range_slices: retrieve a slice for a range of keys
- get_indexed_slices (FTW!)
Thrift API Methods - Writing/Removing
- insert
- batch_mutate (batch insertion AND deletion)
- remove
- truncate**
Thrift API Methods - Meta Information
- describe_cluster_name
- describe_version
- describe_keyspace
- describe_keyspaces
Thrift API Methods - Schema
- system_add_keyspace
- system_update_keyspace
- system_drop_keyspace
- system_add_column_family
- system_update_column_family
- system_drop_column_family
vs. RDBMS - Consistency Level
- Consistency is tunable per request!
- Cassandra provides consistency when R + W > N (read replica count + write replica count > replication factor).
- *** CONSISTENCY LEVEL FAILURE IS NOT A ROLLBACK ***
- Idempotent: an operation can be applied multiple times without changing the result
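The R + W > N rule is plain arithmetic, so a small sketch can check a few combinations. The class and method names are invented for illustration; the quorum size used here is the usual majority, floor(N / 2) + 1.

```java
/** Arithmetic behind "R + W > N" (hypothetical helper, not a client API). */
final class ConsistencyMath {

    /** True when any set of R read replicas must overlap any set of W written replicas. */
    static boolean readsOverlapWrites(int readReplicas, int writeReplicas, int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    /** Majority of the replicas: floor(N / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int n = 3;
        int q = quorum(n); // 2
        System.out.println(readsOverlapWrites(q, q, n)); // prints "true"  (2 + 2 > 3)
        System.out.println(readsOverlapWrites(1, 1, n)); // prints "false" (1 + 1 > 3 fails)
        System.out.println(readsOverlapWrites(1, n, n)); // prints "true"  (1 + 3 > 3)
    }
}
```

Reading and writing at a single replica is the fastest combination, but with N = 3 it gives no guarantee that a read touches a replica holding the latest write.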
vs. RDBMS - Append Only
- Proper data modelling will minimize seeks (go to Tyler's presentation for more!)
On to the Code...
https://github.com/zznate/cassandra-tutorial
- Uses Maven.
- Really basic.
- Modify/abuse/alter as needed.
- Descriptions of what is going on and how to run each example are in the Javadoc comments.
- Sample data is based on the North American Numbering Plan: http://en.wikipedia.org/wiki/North_American_Numbering_Plan
Data Shape
512 202 30.27 097.74 W TX Austin
512 203 30.27 097.74 L TX Austin
512 204 30.32 097.73 W TX Austin
512 205 30.32 097.73 W TX Austin
512 206 30.32 097.73 L TX Austin
Get a Single Column for a Key
GetCityForNpanxx.java
Retrieve a single column with:
- Name
- Value
- Timestamp
- TTL
Get the Contents of a Row
GetSliceForNpanxx.java
- Retrieves a list of columns (Hector wraps these in a ColumnSlice)
- "SlicePredicate" can be either an explicit set of columns OR a range (more on ranges soon)
- Another messy either/or choice encapsulated by Hector
Get the (sorted!) Columns of a Row
GetSliceForStateCity.java
- Shows why the choice of comparator is important (this is the order in which the columns hit the disk, so take advantage of it)
- Can be easily modified to return results in reverse order (but this is slightly slower)
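Why the comparator matters can be shown with a sorted map standing in for a row. This is only an analogy built on `java.util.TreeMap`, not how Hector or Cassandra stores columns; `AS_TEXT` and `AS_LONG` are invented names that loosely mimic a text comparator and a numeric comparator.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeMap;

/** A TreeMap as a stand-in for one row whose columns are kept sorted by a comparator. */
final class SortedColumns {

    /** Compares column names as text. */
    static final Comparator<String> AS_TEXT = Comparator.naturalOrder();

    /** Compares column names as numbers. */
    static final Comparator<String> AS_LONG =
            (a, b) -> Long.compare(Long.parseLong(a), Long.parseLong(b));

    /** Inserts the same three columns and returns their names in stored (or reversed) order. */
    static List<String> columnNames(Comparator<String> comparator, boolean reversed) {
        TreeMap<String, String> columns = new TreeMap<>(comparator);
        columns.put("2", "two");
        columns.put("10", "ten");
        columns.put("1", "one");
        NavigableSet<String> names =
                reversed ? columns.descendingKeySet() : columns.navigableKeySet();
        return new ArrayList<>(names);
    }

    public static void main(String[] args) {
        System.out.println(columnNames(AS_TEXT, false)); // prints "[1, 10, 2]"
        System.out.println(columnNames(AS_LONG, false)); // prints "[1, 2, 10]"
        System.out.println(columnNames(AS_LONG, true));  // prints "[10, 2, 1]"
    }
}
```

The same three columns come back in a different order depending only on the comparator, which is why it has to be chosen with the queries in mind.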
Get the Same Slice from Several Rows
MultigetSliceForNpanxx.java
- Very similar to get_slice examples, except we provide a list of keys
Get Slices From a Range of Rows
GetRangeSlicesForStateCity.java
- Like multiget_slice, except we can specify a KeyRange (encapsulated by RangeSlicesQuery#setKeys(start, end))
- The results of this query will be significantly more meaningful with OrderPreservingPartitioner (try this at home!)
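A sketch of why key ranges are "more meaningful" when rows are kept in key order. The MD5-based token below is only a rough stand-in for what a hash-based partitioner does, and the row keys other than the Texas ones are invented sample values; none of this is Cassandra or Hector code.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

final class RowOrdering {

    /** Hash-based token: a rough stand-in for a random partitioner's placement. */
    static BigInteger md5Token(String rowKey) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(rowKey.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Rows from start to end (inclusive) when rows are stored in key order. */
    static List<String> rangeInKeyOrder(List<String> rowKeys, String start, String end) {
        TreeSet<String> sorted = new TreeSet<>(rowKeys);
        return new ArrayList<>(sorted.subSet(start, true, end, true));
    }

    /** All rows in the order their hashed tokens sort. */
    static List<String> allInTokenOrder(List<String> rowKeys) {
        List<String> byToken = new ArrayList<>(rowKeys);
        byToken.sort(Comparator.comparing(RowOrdering::md5Token));
        return byToken;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList(
                "TX Houston", "CA Fresno", "TX Austin", "NY Albany", "TX Dallas");
        // Key order keeps the Texas rows adjacent, so a start/end pair selects them.
        System.out.println(rangeInKeyOrder(keys, "TX Austin", "TX Houston"));
        // Token order scatters them; neighbours in this list are unrelated keys.
        System.out.println(allInTokenOrder(keys));
    }
}
```

With hashed placement a start/end pair still bounds a range, but the rows inside it are neighbours by hash rather than by meaning.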
Get Slices From a Range of Rows - 2
GetSliceForAreaCodeCity.java
- Compound column name for controlling ranges
- Comparator at work on text field
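One way to picture a compound column name: put the city first, then a separator, then the exchange, so that a text comparator keeps each city's columns together. The `city:nxx` naming, the non-Austin entries, and the class name are assumptions made for this sketch; the tutorial's actual naming scheme may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** One area-code row whose column names are "city:nxx" (a hypothetical naming scheme). */
final class CompoundColumnNames {

    static TreeMap<String, String> areaCodeRow() {
        TreeMap<String, String> columns = new TreeMap<>(); // text ordering
        columns.put("Austin:202", "W");
        columns.put("Austin:203", "L");
        columns.put("Austin:204", "W");
        columns.put("Bastrop:303", "W");     // invented entry for contrast
        columns.put("Round Rock:218", "W");  // invented entry for contrast
        return columns;
    }

    /** Slices every column whose name starts with "city:" by using range bounds only. */
    static List<String> sliceForCity(TreeMap<String, String> columns, String city) {
        String start = city + ":";
        String end = city + ":" + Character.MAX_VALUE; // sorts after any real suffix
        return new ArrayList<>(columns.subMap(start, true, end, true).keySet());
    }

    public static void main(String[] args) {
        System.out.println(sliceForCity(areaCodeRow(), "Austin"));
        // prints "[Austin:202, Austin:203, Austin:204]"
    }
}
```

No filtering happens on the client: the start and end of the range do all the work, which only holds because the comparator sorts the names as text.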
Get Slices from Indexed Columns
GetIndexedSlicesForCityState.java
- You only need to index a single column to apply clauses on other columns (BUT: the indexed column must be present with an EQUALS clause!)
- (It's just another ColumnFamily maintained automatically)
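The idea of "one indexed column with an EQUALS clause, other clauses applied on top" can be sketched with two maps: the rows, and a second structure maintained on every write. This is a toy model with invented names (`IndexedRows`, `stateIndex`) and partly invented sample rows, not the real index implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

final class IndexedRows {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();
    /** Index on the "state" column: state value -> row keys. Updated on every put. */
    private final Map<String, Set<String>> stateIndex = new HashMap<>();

    void put(String rowKey, String state, String city) {
        Map<String, String> previous = rows.get(rowKey);
        if (previous != null) {
            // Drop the old index entry before re-indexing the row.
            stateIndex.get(previous.get("state")).remove(rowKey);
        }
        Map<String, String> columns = new HashMap<>();
        columns.put("state", state);
        columns.put("city", city);
        rows.put(rowKey, columns);
        stateIndex.computeIfAbsent(state, s -> new TreeSet<>()).add(rowKey);
    }

    /** The state clause is the indexed EQUALS lookup; the city clause filters its candidates. */
    List<String> find(String stateEquals, String cityEquals) {
        List<String> matches = new ArrayList<>();
        Set<String> candidates =
                stateIndex.getOrDefault(stateEquals, Collections.<String>emptySet());
        for (String rowKey : candidates) {
            if (cityEquals.equals(rows.get(rowKey).get("city"))) {
                matches.add(rowKey);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        IndexedRows table = new IndexedRows();
        table.put("512202", "TX", "Austin");
        table.put("512203", "TX", "Austin");
        table.put("214201", "TX", "Dallas");   // invented row
        table.put("212555", "NY", "New York"); // invented row
        System.out.println(table.find("TX", "Austin")); // prints "[512202, 512203]"
    }
}
```

Without the equality lookup there is no candidate set to narrow, which is the intuition behind requiring an EQUALS clause on the indexed column.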
Insert, Update and Delete
... are effectively the same operation.
- InsertRowsForColumnFamilies.java
- DeleteRowsForColumnFamily.java
- Run each in succession (in whichever combination you like) and verify your results on the CLI
- Hint: watch the timestamps
bin/cassandra-cli --host localhost
use Tutorial;
list AreaCode;
list Npanxx;
list StateCity;
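A minimal sketch of why these are "effectively the same operation": each one is a timestamped write to a column, and a delete simply writes a marker instead of a value. The class is a toy model with invented names, and it deliberately ignores how ties between equal timestamps are resolved.

```java
/** Toy model of a single column: every change is a timestamped write. */
final class TimestampedColumn {
    private String value;                       // null stands for a deletion marker
    private long timestamp = Long.MIN_VALUE;    // nothing written yet

    /** Insert and update are the same call. */
    void write(String newValue, long writeTimestamp) {
        apply(newValue, writeTimestamp);
    }

    /** A delete is a write of a marker, carrying its own timestamp. */
    void delete(long deleteTimestamp) {
        apply(null, deleteTimestamp);
    }

    private void apply(String newValue, long newTimestamp) {
        // Only a strictly newer timestamp replaces the current state (ties not modelled).
        if (newTimestamp > timestamp) {
            value = newValue;
            timestamp = newTimestamp;
        }
    }

    /** Returns the current value, or null if the newest write was a delete. */
    String read() {
        return value;
    }

    public static void main(String[] args) {
        TimestampedColumn city = new TimestampedColumn();
        city.write("Austin", 100L);
        city.delete(50L);                 // older than the write: has no effect
        System.out.println(city.read());  // prints "Austin"
        city.delete(200L);                // newer: the column now reads as deleted
        city.write("Dallas", 150L);       // older than the delete: has no effect
        System.out.println(city.read());  // prints "null"
    }
}
```

This is the behaviour the "watch the timestamps" hint points at: an operation that arrives later does not win unless its timestamp is also later.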
Stuff I Punted on for the Sake of Brevity
- meta_* methods: CassandraClusterTest.java: L43-81 @hector
- system_* methods: SchemaManipulation.java @ hector-examples; CassandraClusterTest.java: L84-157 @hector
- ORM (it works and is in production): see ORM Documentation
- multiple nodes, failure scenarios
- Data modelling (go see Tyler's presentation)
Things to Remember
- deletes and timestamp granularity
- "range ghosts"
- using the wrong column comparator and InvalidRequestException
- deletions actually write data
- use column-level TTL to automate deletion
- "how do I iterate over all the rows in a column family"?
  - get_range_slices, but don't do that
  - a good sign your data model is wrong
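Column-level TTL can be pictured as an expiry check made at read time. The sketch below is a simplified model with invented names; it assumes the TTL is expressed in seconds and treats 0 as "never expires", which is a convention of this example only.

```java
/** Toy model of a column carrying a time-to-live (hypothetical, not a client class). */
final class ExpiringColumn {
    final String value;
    final long writtenAtSeconds;
    final int ttlSeconds; // 0 means "no expiry" in this sketch

    ExpiringColumn(String value, long writtenAtSeconds, int ttlSeconds) {
        this.value = value;
        this.writtenAtSeconds = writtenAtSeconds;
        this.ttlSeconds = ttlSeconds;
    }

    /** A column is readable until its write time plus its TTL has passed. */
    boolean isLive(long nowSeconds) {
        return ttlSeconds == 0 || nowSeconds < writtenAtSeconds + ttlSeconds;
    }

    public static void main(String[] args) {
        ExpiringColumn session = new ExpiringColumn("token", 1000L, 60);
        System.out.println(session.isLive(1059L)); // prints "true"
        System.out.println(session.isLive(1060L)); // prints "false"
    }
}
```

The application never issues a delete here; the column simply stops being returned once its time is up.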
Dealing with *Lots* of Data (Briefly)
Two biggest headaches have been addressed:
- Compaction pollutes the OS page cache (CASSANDRA-1470)
- More than 143 million keys in a single SSTable means more Bloom filter (BF) false positives (CASSANDRA-1555)
Hadoop integration: Yes. (Go see Jeremy's presentation)
Bulk loading: Yes. CASSANDRA-1278
For more information: http://wiki.apache.org/cassandra/LargeDataSetConsiderations
