Senior Engineer, Ruckus Wireless

       Boris Yen
Expert Session B:
A Brief Introduction to Apache Cassandra
Outline
•   Cassandra vs SQL Server
•   Overview
•   Data in Cassandra
•   Data Partitioning
•   Data Replication
•   Data Consistency
•   Client Libraries
Cassandra vs SQL Server
•   Cassandra
    o More servers = More capacity.
    o Scaling concerns are transparent to the application.
    o No single point of failure.
    o Horizontal scale.

•   SQL Server
    o More powerful machine = More capacity.
    o Adding capacity requires manual labor from ops people
      and substantial downtime.
    o There is a limit on how big you can go.
    o Vertical scale, Moore’s law scaling.
Overview
•   Features are drawn from Dynamo and BigTable
•   Distributed
    o   Data partitioned among all nodes
•   Extremely Scalable
    o Add new node = Add more capacity
    o Easy to add new node
•   Fault tolerant
    o All nodes are the same
    o Read/Write anywhere
    o Automatic Data replication
•   High Performance
Overview
[Figures: performance and scalability benchmark charts. Sources:]
http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
Data in Cassandra
•   Keyspace ~ Database in RDBMS
•   Column Family ~ Table in RDBMS

    Keyspace
      ColumnFamily
        Key: Boris
                    ID     Addr         Phone
                    1      ... Taiwan   09.....

    Each column is stored as:
        {
            column: Phone,
            value: 09...,
            timestamp: 1000
        }

    The timestamp is used to resolve conflicts.
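The role of the timestamp can be sketched in a few lines of Java. This is only an illustrative model, not Cassandra's API: the class names ColumnModelSketch, Column, and Row are hypothetical, the phone values are made up, and ties between equal timestamps are ignored for simplicity.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative model only. Names are hypothetical and are not Cassandra's API.
final class ColumnModelSketch {

    /** A column is a (name, value, timestamp) triple. */
    static final class Column {
        final String name;
        final String value;
        final long timestamp;

        Column(String name, String value, long timestamp) {
            this.name = name;
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    /** A row: the columns stored under one row key, sorted by column name. */
    static final class Row {
        final String key;
        final Map<String, Column> columns = new TreeMap<>();

        Row(String key) {
            this.key = key;
        }

        /** Keeps the incoming column only if it is newer than the stored one. */
        void put(Column incoming) {
            Column existing = columns.get(incoming.name);
            if (existing == null || incoming.timestamp > existing.timestamp) {
                columns.put(incoming.name, incoming);
            }
        }

        /** Returns the current value of a column, or null if it is absent. */
        String valueOf(String columnName) {
            Column c = columns.get(columnName);
            return c == null ? null : c.value;
        }
    }

    public static void main(String[] args) {
        Row boris = new Row("Boris");
        boris.put(new Column("Phone", "0911", 1000L));
        boris.put(new Column("Phone", "0922", 2000L)); // newer timestamp replaces the old value
        boris.put(new Column("Phone", "0933", 1500L)); // older timestamp is ignored
        System.out.println(boris.valueOf("Phone"));
    }
}
```

Whichever order the three writes arrive in, the row ends up holding the value with the highest timestamp, which is the point of carrying a timestamp on every column.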
Data in Cassandra
•    Keyspace
      o   Where the replication strategy and replication factor
          are defined.
           CREATE KEYSPACE keyspace_name WITH
           strategy_class = 'SimpleStrategy'
           AND strategy_options:replication_factor=2;



•    ColumnFamily
    CREATE COLUMNFAMILY user (
      id uuid PRIMARY KEY, address text, userName text )
      WITH comment='' AND comparator=text AND read_repair_chance=0.100000
      AND gc_grace_seconds=864000 AND default_validation=text
      AND min_compaction_threshold=4 AND max_compaction_threshold=32
      AND replicate_on_write=True
      AND compaction_strategy_class='SizeTieredCompactionStrategy'
      AND compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
Data in Cassandra
•   Commit log
    o   Captures every write before it is applied, which
        ensures data durability.
•   Memtable
    o   In-memory store for the most recent writes.
•   SSTable
    o   When a memtable is flushed to disk, it becomes an
        SSTable.
Data Read/Write
•   Write

        Data -> Commitlog -> Memtable --(flushed)--> SSTable

•   Read
    o Search the row cache. If there is a hit, return the
      result; no further action is needed.
    o If there is no hit in the row cache, read from the Memtable(s)
      and SSTable(s) that might contain the requested key, collate the
      results, and return them.
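The write and read paths above can be sketched as a toy single-node store. Everything here is an assumption made for illustration: the class StoragePathSketch, the count-based flush threshold, and the use of whole string values per key. Real Cassandra collates individual columns by timestamp and persists both the commit log and the SSTables on disk.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy single-node storage engine. All names are hypothetical; this is not Cassandra's code.
final class StoragePathSketch {
    private final List<String> commitLog = new ArrayList<>();             // append-only record of every write
    private final Map<String, String> rowCache = new HashMap<>();         // results of earlier reads
    private final List<Map<String, String>> sstables = new ArrayList<>(); // flushed memtables, oldest first
    private Map<String, String> memtable = new TreeMap<>();               // most recent writes, in memory
    private final int flushThreshold;                                     // assumed trigger: number of entries

    StoragePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Write path: commit log first, then memtable; a full memtable is flushed to a new SSTable. */
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memtable.put(key, value);
        rowCache.remove(key); // a cached result for this key is now stale
        if (memtable.size() >= flushThreshold) {
            sstables.add(memtable);
            memtable = new TreeMap<>();
        }
    }

    /** Read path: row cache, then memtable, then SSTables from newest to oldest. */
    String read(String key) {
        String cached = rowCache.get(key);
        if (cached != null) {
            return cached;
        }
        String found = memtable.get(key);
        for (int i = sstables.size() - 1; found == null && i >= 0; i--) {
            found = sstables.get(i).get(key);
        }
        if (found != null) {
            rowCache.put(key, found);
        }
        return found;
    }

    int sstableCount() {
        return sstables.size();
    }

    int commitLogSize() {
        return commitLog.size();
    }

    public static void main(String[] args) {
        StoragePathSketch store = new StoragePathSketch(2);
        store.write("k1", "a");
        store.write("k2", "b"); // memtable reaches the threshold and is flushed
        store.write("k1", "c"); // newer value stays in the fresh memtable
        System.out.println(store.read("k1") + " " + store.read("k2") + " " + store.sstableCount());
    }
}
```

Checking the memtable before the SSTables, and newer SSTables before older ones, is a simplification that stands in for the "collate the results" step on the slide.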
Data Compaction
(t2 > t1)

sstable1     Boris:{
               name: boris (t1)
               phone: 092xxx (t1)
               addr: tainan (t1)
             }

sstable2     Boris:{
               name: boris.yen (t2)
               sex: male (t2)
               email: y@gmail (t2)
             }
  .
  .
  .

Compacted into:

sstableX     Boris:{
               addr: tainan (t1)
               email: y@gmail (t2)
               name: boris.yen (t2)
               phone: 092xxx (t1)
               sex: male (t2)
             }
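The merge shown on this slide can be sketched as follows. This is a simplified model under stated assumptions: an SSTable is represented as an in-memory map from row key to columns, the names CompactionSketch, Cell, singleRow, and compact are hypothetical, and deletes (tombstones), which real compaction also has to handle, are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified compaction: merge several SSTables, keeping the newest cell per column.
// Names are hypothetical; tombstones and on-disk formats are not modeled.
final class CompactionSketch {

    /** A column value together with its write timestamp. */
    static final class Cell {
        final String value;
        final long timestamp;

        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return value + " (t" + timestamp + ")";
        }
    }

    /** Builds a one-row SSTable whose columns all share the same timestamp. */
    static Map<String, Map<String, Cell>> singleRow(String rowKey, long timestamp, String... namesAndValues) {
        Map<String, Cell> columns = new TreeMap<>();
        for (int i = 0; i + 1 < namesAndValues.length; i += 2) {
            columns.put(namesAndValues[i], new Cell(namesAndValues[i + 1], timestamp));
        }
        Map<String, Map<String, Cell>> table = new TreeMap<>();
        table.put(rowKey, columns);
        return table;
    }

    /** Merges the inputs into one table; for each column the cell with the highest timestamp wins. */
    static Map<String, Map<String, Cell>> compact(List<Map<String, Map<String, Cell>>> sstables) {
        Map<String, Map<String, Cell>> merged = new TreeMap<>();
        for (Map<String, Map<String, Cell>> sstable : sstables) {
            for (Map.Entry<String, Map<String, Cell>> row : sstable.entrySet()) {
                Map<String, Cell> target = merged.get(row.getKey());
                if (target == null) {
                    target = new TreeMap<>();
                    merged.put(row.getKey(), target);
                }
                for (Map.Entry<String, Cell> column : row.getValue().entrySet()) {
                    Cell existing = target.get(column.getKey());
                    if (existing == null || column.getValue().timestamp > existing.timestamp) {
                        target.put(column.getKey(), column.getValue());
                    }
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Cell>> sstable1 =
            singleRow("Boris", 1L, "name", "boris", "phone", "092xxx", "addr", "tainan");
        Map<String, Map<String, Cell>> sstable2 =
            singleRow("Boris", 2L, "name", "boris.yen", "sex", "male", "email", "y@gmail");
        System.out.println(compact(Arrays.asList(sstable1, sstable2)).get("Boris"));
    }
}
```

Because the columns are kept in a sorted map, the merged row lists addr, email, name, phone, and sex in the same order as sstableX on the slide.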
Data Partitioning
•   The total data managed by the cluster is
    represented as a circular space or ring.
•   Before a node can join the ring, it must be assigned
    a token.
•   The token determines the node’s position on the
    ring and the range of data it is responsible for.
•   Partitioning strategy
     o Random Partitioning
         Default and recommended
     o Ordered Partitioning
         Sequential writes can cause hot spots
         More administrative overhead to load balance the
          cluster
Data Partitioning
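[Figure: Random Partitioning. A ring of node tokens t1 to t5. Keys are
placed on the ring by their hashes: hash(k1), hash(k2), hash(k3), and
hash(k4). The labels "Data: k1" and "Data: k3" appear beside t5 and t2.]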
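A minimal sketch of how a token ring can map a key to a node under random partitioning. It assumes that a key's token is the absolute value of its MD5 digest and that a node owns the range ending at its own token, with wrap-around past the highest token. The class TokenRingSketch and its methods are hypothetical names, not Cassandra's API.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

// Token ring lookup sketch. Names are hypothetical; hashing details are an assumption.
final class TokenRingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>(); // token -> node name

    /** A node joins the ring at the position given by its token. */
    void addNode(String node, BigInteger token) {
        ring.put(token, node);
    }

    /** Hashes a key to a non-negative token using MD5. */
    static BigInteger tokenFor(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(md5.digest(key.getBytes(StandardCharsets.UTF_8))).abs();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** The owner is the first node whose token is >= the given token, wrapping to the lowest token. */
    String ownerOfToken(BigInteger token) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(token);
        return entry != null ? entry.getValue() : ring.firstEntry().getValue();
    }

    String ownerOfKey(String key) {
        return ownerOfToken(tokenFor(key));
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        BigInteger space = BigInteger.ONE.shiftLeft(127); // upper bound of the assumed token space
        for (int i = 0; i < 5; i++) {
            // Evenly spaced tokens keep the load balanced across the five nodes.
            ring.addNode("t" + (i + 1), space.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(5)));
        }
        for (String key : new String[] {"k1", "k2", "k3", "k4"}) {
            System.out.println(key + " -> " + ring.ownerOfKey(key));
        }
    }
}
```

Hashing spreads keys evenly regardless of their natural order, which is why random partitioning avoids the hot spots that sequential writes cause under ordered partitioning.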
Data Replication
•   Replication ensures fault tolerance and avoids a single
    point of failure.
•   Replication is controlled by two keyspace parameters:
    replication factor and replication strategy.
•   Replication factor controls how many copies
    of a row are stored in the cluster.
•   Replication strategy controls how the data
    is replicated.
Data Replication
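[Figure: Random Partitioning, RF=3. A ring of node tokens t1 to t5, with
hash(k1) placed between t1 and t2. "Data: k1" enters the ring at the node
labeled coordinator, beside t5.]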
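One simple way to pick replicas, in the spirit of the SimpleStrategy named earlier, is to take the node that owns the key's token and then continue clockwise around the ring until RF nodes are chosen. The sketch below assumes that behavior, uses small integer tokens for readability, and ignores racks and data centers. ReplicaPlacementSketch and replicasFor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Replica selection sketch: owner of the token plus the next nodes clockwise.
// Names are hypothetical; racks and data centers are not modeled.
final class ReplicaPlacementSketch {

    /** Returns up to rf distinct nodes, starting at the first node whose token is >= the given token. */
    static List<String> replicasFor(TreeMap<Long, String> ring, long token, int rf) {
        List<String> clockwise = new ArrayList<>(ring.tailMap(token, true).values());
        clockwise.addAll(ring.headMap(token, false).values()); // wrap around the ring
        int wanted = Math.min(rf, clockwise.size());           // cannot exceed the node count
        return new ArrayList<>(clockwise.subList(0, wanted));
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(10L, "t1");
        ring.put(20L, "t2");
        ring.put(30L, "t3");
        ring.put(40L, "t4");
        ring.put(50L, "t5");
        // A key whose token falls between t1 and t2, stored with RF=3.
        System.out.println(replicasFor(ring, 15L, 3));
    }
}
```

The coordinator, which can be any node in the ring, would forward the write to each node in the returned list.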
Data Consistency
•   Cassandra supports tunable data
    consistency.
•   Choose between strong and eventual
    consistency depending on the need.
•   Can be done on a per-operation basis, and
    for both reads and writes.
•   Handles multi-data center operations
Consistency Level


    Write           Read
    Any             -
    One             One
    Quorum          Quorum
    Local_Quorum    Local_Quorum
    Each_Quorum     Each_Quorum
    All             All
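The arithmetic behind these levels can be sketched briefly. A quorum is a majority of the replicas, RF / 2 + 1, and a read is guaranteed to overlap the latest write when the replicas acknowledging the write plus the replicas answering the read exceed RF. The sketch covers Any, One, Quorum, and All only, because Local_Quorum and Each_Quorum depend on the data center layout. ConsistencySketch and its method names are hypothetical, and treating Any as zero guaranteed replica acknowledgments is a simplification based on the fact that a stored hint can satisfy it.

```java
// Consistency-level arithmetic sketch. Names are hypothetical, not Cassandra's API.
final class ConsistencySketch {

    enum Level { ANY, ONE, QUORUM, ALL }

    /** A quorum is a strict majority of the replicas. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** Number of replicas guaranteed to have responded at the given level. */
    static int replicaAcks(Level level, int replicationFactor) {
        switch (level) {
            case ANY:
                return 0; // assumption: a hint stored on a non-replica is enough
            case ONE:
                return 1;
            case QUORUM:
                return quorum(replicationFactor);
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
    }

    /** True when every read at the given level must touch a replica that saw the latest write. */
    static boolean readSeesLatestWrite(Level write, Level read, int replicationFactor) {
        if (read == Level.ANY) {
            throw new IllegalArgumentException("ANY is a write-only level");
        }
        return replicaAcks(write, replicationFactor) + replicaAcks(read, replicationFactor) > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        System.out.println("quorum = " + quorum(rf));
        System.out.println("QUORUM/QUORUM strong: " + readSeesLatestWrite(Level.QUORUM, Level.QUORUM, rf));
        System.out.println("ONE/ONE strong: " + readSeesLatestWrite(Level.ONE, Level.ONE, rf));
    }
}
```

With RF=3, quorum writes and quorum reads give 2 + 2 > 3, so they overlap; One with One gives 1 + 1, which does not, and the result is eventual consistency.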
Built-in Consistency Repair Features

•   Read Repair
•   Hinted Handoff
•   Anti-Entropy Node Repair

http://www.datastax.com/docs/0.8/dml/data_consistency#builtin-consistency
Client Library for Java
•   Hector
    o https://github.com/hector-client/hector.git
    o https://github.com/hector-client/hector/wiki/User-Guide
•   Astyanax
    o https://github.com/Netflix/astyanax.git
•   CQL + JDBC
    o http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
Hector
•   High-level, simple object-oriented
    interface to Cassandra
•   Failover behavior on the client side
•   Connection pooling for improved
    performance and scalability
•   Automatic retry of downed hosts
.
.
.
Hector
// Assumed to be defined elsewhere: ko and keyspace (Keyspace), se and
// stringSerializer (string serializers), cf (column family name).

// slice query: read three named columns from one row
SliceQuery<String, String, String> q = HFactory.createSliceQuery(ko, se, se, se);
q.setColumnFamily(cf).setKey("jsmith").setColumnNames("first", "last", "middle");
Result<ColumnSlice<String, String>> r = q.execute();

// multi-get: read up to 3 columns from each of several rows
MultigetSliceQuery<String, String, String> multigetSliceQuery =
    HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
multigetSliceQuery.setColumnFamily("Standard1");
multigetSliceQuery.setKeys("fake_key_0", "fake_key_1",
    "fake_key_2", "fake_key_3", "fake_key_4");
multigetSliceQuery.setRange("", "", false, 3);
Result<Rows<String, String, String>> result = multigetSliceQuery.execute();

// batch operation: queue three insertions, then send them together
Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
mutator.addInsertion("jsmith", "Standard1", HFactory.createStringColumn("first", "John"))
    .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("last", "Smith"))
    .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("middle", "Q"));
mutator.execute();
https://github.com/hector-client/hector/wiki/User-Guide
CQL+JDBC
// con is assumed to be a java.sql.Connection declared elsewhere;
// HOST, PORT, and KEYSPACE are constants defined by the test.
Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
String URL = String.format("jdbc:cassandra://%s:%d/%s", HOST, PORT, "system");
System.out.println("Connection URL = '" + URL + "'");

con = DriverManager.getConnection(URL);
Statement stmt = con.createStatement();

// Create KeySpace
String createKS = String.format(
    "CREATE KEYSPACE %s WITH strategy_class = SimpleStrategy "
    + "AND strategy_options:replication_factor = 1;", KEYSPACE);
stmt.execute(createKS);

// Create the target Column family
String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARY KEY,"
    + "bValue boolean, "
    + "iValue int "
    + ") WITH comparator = ascii AND default_validation = bigint;";
stmt.execute(createCF);

https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
CQL+JDBC
Statement statement = con.createStatement();

String truncate = "TRUNCATE RegressionTest;";
statement.execute(truncate);

String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES ('key0',true, 2000);";
statement.executeUpdate(insert1);

String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES( 'key1',false);";
statement.executeUpdate(insert2);

String select = "SELECT * from RegressionTest;";
ResultSet result = statement.executeQuery(select);
ResultSetMetaData metadata = result.getMetaData();
.
.
.


https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
Useful Tools
•   cassandra-cli
    o <cassandra-dir>/bin
    o http://www.datastax.com/docs/1.0/dml/using_cli

•   cqlsh
    o <cassandra-dir>/bin
    o http://www.datastax.com/docs/1.0/references/cql/index

•   nodetool
    o <cassandra-dir>/bin
    o http://www.datastax.com/docs/1.0/references/nodetool

•   stress
    o <cassandra-dir>/tools/bin
    o http://www.datastax.com/docs/1.0/references/stress_java
Useful Tools
•   OpsCenter
    o    http://www.datastax.com/products/opscenter
•   sstableloader
    o    <cassandra-dir>/bin
    o    http://www.datastax.com/dev/blog/bulk-loading
•   More tools
    o    http://en.wikipedia.org/wiki/Apache_Cassandra#Tools_for_Cassandra
Questions?
Introduce Apache Cassandra - JavaTwo Taiwan, 2012

 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven InsightsAndrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell: Transforming Business Strategy Through Data-Driven Insights
Andrew Marnell
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
Cybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure ADCybersecurity Identity and Access Solutions using Azure AD
Cybersecurity Identity and Access Solutions using Azure AD
VICTOR MAESTRE RAMIREZ
 
Semantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AISemantic Cultivators : The Critical Future Role to Enable AI
Semantic Cultivators : The Critical Future Role to Enable AI
artmondano
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 

Introduce Apache Cassandra - JavaTwo Taiwan, 2012

  • 3. Outline
      • Cassandra vs SQL Server
      • Overview
      • Data in Cassandra
      • Data Partitioning
      • Data Replication
      • Data Consistency
      • Client Libraries
  • 4. Cassandra vs SQL Server
      • Cassandra
          o More servers = more capacity.
          o Scaling concerns are transparent to the application.
          o No single point of failure.
          o Scales horizontally.
      • SQL Server
          o A more powerful machine = more capacity.
          o Adding capacity requires manual labor from ops people and substantial downtime.
          o There is a limit to how big a single machine can get.
          o Scales vertically, following Moore's law.
  • 5. Overview
      • Features are derived from Dynamo and BigTable
      • Distributed
          o Data is partitioned among all nodes
      • Extremely scalable
          o Adding a new node adds capacity
          o Adding a new node is easy
      • Fault tolerant
          o All nodes are the same
          o Read/write anywhere
          o Automatic data replication
      • High performance
  • 6. Overview
      • http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-0-performance
      • http://www.cubrid.org/blog/dev-platform/nosql-benchmarking/
      • http://techblog.netflix.com/2011/11/benchmarking-cassandra-scalability-on.html
  • 7. Data in Cassandra
      • Keyspace ~ Database in RDBMS
      • Column Family ~ Table in RDBMS
      • [Diagram: a Keyspace contains a ColumnFamily; the row with Key: Boris holds the columns ID = 1, Addr = ... Taiwan, Phone = 09.....]
      • Each column is stored as { column: Phone, value: 09..., timestamp: 1000 }
      • The timestamp is used to resolve conflicts.
  • 8. Data in Cassandra
      • Keyspace
          o Where the replication strategy and replication factor are defined.

            CREATE KEYSPACE keyspace_name
              WITH strategy_class = 'SimpleStrategy'
              AND strategy_options:replication_factor=2;

      • ColumnFamily

            CREATE COLUMNFAMILY user (
              id uuid PRIMARY KEY,
              address text,
              userName text
            ) WITH comment=''
              AND comparator=text
              AND read_repair_chance=0.100000
              AND gc_grace_seconds=864000
              AND default_validation=text
              AND min_compaction_threshold=4
              AND max_compaction_threshold=32
              AND replicate_on_write=True
              AND compaction_strategy_class='SizeTieredCompactionStrategy'
              AND compression_parameters:sstable_compression='org.apache.cassandra.io.compress.SnappyCompressor';
  • 9. Data in Cassandra
      • Commit log
          o Captures every write so that data durability is assured.
      • Memtable
          o Stores the most recent writes in memory.
      • SSTable
          o When a memtable is flushed to disk, it becomes an SSTable.
  • 10. Data Read/Write
      • Write
          o Data → Commit log → Memtable → (flushed) → SSTable
      • Read
          o Search the row cache. On a hit, return the result; no further action is needed.
          o On a miss, fetch the data from the Memtable(s) and SSTable(s) that might contain the requested key, collate the results, and return them.
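The write and read paths on this slide can be sketched as a toy in-memory store. This is only an illustration under simplifying assumptions, not how Cassandra is implemented: the class name `WritePathSketch`, the `flushThreshold` parameter, and the use of plain Java collections in place of on-disk files are all hypothetical, and the row cache is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of the write path: commit log, then memtable, then flushed SSTables. */
final class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();              // append-only record of writes
    private TreeMap<String, String> memtable = new TreeMap<>();            // most recent writes, sorted by key
    private final Deque<Map<String, String>> sstables = new ArrayDeque<>(); // immutable tables, newest first
    private final int flushThreshold;                                      // hypothetical flush trigger

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    /** Appends to the commit log first, then updates the memtable. */
    void write(String key, String value) {
        commitLog.add(key + "=" + value);
        memtable.put(key, value);
        if (memtable.size() >= flushThreshold) {
            flush();
        }
    }

    /** Turns the current memtable into an immutable SSTable and starts a fresh memtable. */
    void flush() {
        if (memtable.isEmpty()) {
            return;
        }
        sstables.addFirst(Collections.unmodifiableMap(memtable));
        memtable = new TreeMap<>();
    }

    /** Checks the memtable, then the SSTables from newest to oldest. */
    String read(String key) {
        String value = memtable.get(key);
        if (value != null) {
            return value;
        }
        for (Map<String, String> sstable : sstables) {
            value = sstable.get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int sstableCount() {
        return sstables.size();
    }

    public static void main(String[] args) {
        WritePathSketch store = new WritePathSketch(2);
        store.write("boris", "tainan");
        store.write("john", "taipei");   // second write reaches the threshold and triggers a flush
        store.write("boris", "taichung"); // newer value lives in the memtable
        System.out.println(store.read("boris") + ", sstables=" + store.sstableCount());
    }
}
```

Reading newest-first stands in for the "collate the results" step: in this toy each key has a single value, so the first hit is always the latest write.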
  • 11. Data Compaction (t2 > t1)
      • sstable1
          Boris: { name: boris (t1), phone: 092xxx (t1), addr: tainan (t1) }
      • sstable2
          Boris: { name: boris.yen (t2), sex: male (t2), email: y@gmail (t2) }
      • sstableX (result of compacting sstable1 and sstable2)
          Boris: { addr: tainan (t1), email: y@gmail (t2), name: boris.yen (t2), phone: 092xxx (t1), sex: male (t2) }
      . . . .
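The merge shown on this slide can be sketched in a few lines of Java. This is a minimal illustration, not Cassandra's compaction code: the `CompactionSketch` and `Column` names are hypothetical, SSTables are modeled as in-memory maps for a single row key, and ties on timestamp simply keep the first column seen.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model of compaction: merge the columns of one row key across SSTables. */
final class CompactionSketch {

    /** A column value plus the write timestamp used to resolve conflicts. */
    static final class Column {
        final String value;
        final long timestamp;

        Column(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return value + " (t=" + timestamp + ")";
        }
    }

    /** Merges row fragments; for each column name the newest timestamp wins. */
    static Map<String, Column> merge(List<Map<String, Column>> fragments) {
        Map<String, Column> merged = new TreeMap<>();
        for (Map<String, Column> fragment : fragments) {
            for (Map.Entry<String, Column> entry : fragment.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(),
                        (old, cur) -> cur.timestamp > old.timestamp ? cur : old);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        long t1 = 1L;
        long t2 = 2L;
        Map<String, Column> sstable1 = Map.of(
                "name", new Column("boris", t1),
                "phone", new Column("092xxx", t1),
                "addr", new Column("tainan", t1));
        Map<String, Column> sstable2 = Map.of(
                "name", new Column("boris.yen", t2),
                "sex", new Column("male", t2),
                "email", new Column("y@gmail", t2));
        System.out.println(merge(List.of(sstable1, sstable2)));
    }
}
```

With the slide's data, `name` resolves to the t2 value while `phone` and `addr` keep their t1 values, which matches the merged row shown for sstableX.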
  • 12. Data Partitioning
      • The total data managed by the cluster is represented as a circular space, or ring.
      • Before a node can join the ring, it must be assigned a token.
      • The token determines the node's position on the ring and the range of data it is responsible for.
      • Partitioning strategy
          o Random partitioning
              ▪ Default and recommended
          o Ordered partitioning
              ▪ Sequential writes can cause hot spots
              ▪ More administrative overhead to load-balance the cluster
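  • 13. Data Partitioning
      • [Diagram: Random Partitioning. Nodes with tokens t1 to t5 sit on a ring; keys k1 to k4 are placed at hash(k1) to hash(k4), and each key's data (for example Data: k1, Data: k3) is stored on the node that owns that range.]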
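The ring lookup in this diagram can be sketched as follows. This is a simplified illustration, not Cassandra's partitioner code: the `TokenRingSketch` class, the node names, and the token values are hypothetical. It assumes an MD5-based hash, in the spirit of the random partitioner, and assigns each position to the first node whose token is greater than or equal to it, wrapping around at the end of the ring.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

/** Toy token ring: each node owns the range ending at its token. */
final class TokenRingSketch {
    private final TreeMap<BigInteger, String> ring = new TreeMap<>();

    void addNode(BigInteger token, String node) {
        ring.put(token, node);
    }

    /** Hashes a row key to a non-negative position on the ring using MD5. */
    static BigInteger hash(String key) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md5.digest(key.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Returns the first node whose token is >= position, wrapping to the lowest token. */
    String ownerOf(BigInteger position) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("ring has no nodes");
        }
        Map.Entry<BigInteger, String> entry = ring.ceilingEntry(position);
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    String ownerOfKey(String key) {
        return ownerOf(hash(key));
    }

    public static void main(String[] args) {
        TokenRingSketch ring = new TokenRingSketch();
        BigInteger step = BigInteger.ONE.shiftLeft(128).divide(BigInteger.valueOf(5));
        for (int i = 1; i <= 5; i++) {
            // Evenly spaced hypothetical tokens for nodes t1..t5.
            ring.addNode(step.multiply(BigInteger.valueOf(i)), "t" + i);
        }
        System.out.println("k1 is stored on " + ring.ownerOfKey("k1"));
    }
}
```

Because the hash spreads keys uniformly over the ring, evenly spaced tokens give each node a roughly equal share of the data, which is why the random strategy avoids the hot spots of ordered partitioning.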
  • 14. Data Replication
      • Ensures fault tolerance and no single point of failure.
      • Replication is controlled by two parameters of a keyspace: the replication factor and the replication strategy.
      • The replication factor controls how many copies of a row are stored in the cluster.
      • The replication strategy controls how those copies are placed on the nodes.
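  • 15. Data Replication
      • [Diagram: Random Partitioning ring with tokens t1 to t5 and RF=3. A coordinator node receives the request for key k1, which is placed at hash(k1); Data: k1 is stored on three replica nodes.]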
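Replica selection for a case like the RF=3 diagram can be sketched as below. The sketch assumes the behavior of SimpleStrategy, where the first replica goes to the node that owns the key's position and the remaining replicas go to the next nodes clockwise on the ring. The `ReplicaPlacementSketch` class, the node names, and the small integer tokens are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy replica placement: walk the ring clockwise from the key's position. */
final class ReplicaPlacementSketch {

    /** Returns up to rf nodes, starting at the owner of the position and moving clockwise. */
    static List<String> replicasFor(TreeMap<Long, String> ring, long position, int rf) {
        // Nodes with token >= position come first, in ring order...
        List<String> clockwise = new ArrayList<>(ring.tailMap(position, true).values());
        // ...followed by the nodes reached after wrapping around.
        clockwise.addAll(ring.headMap(position, false).values());
        return new ArrayList<>(clockwise.subList(0, Math.min(rf, clockwise.size())));
    }

    public static void main(String[] args) {
        TreeMap<Long, String> ring = new TreeMap<>();
        ring.put(10L, "t1");
        ring.put(20L, "t2");
        ring.put(30L, "t3");
        ring.put(40L, "t4");
        ring.put(50L, "t5");
        // A key hashed to position 15 with RF=3 lands on three consecutive nodes.
        System.out.println(replicasFor(ring, 15L, 3));
    }
}
```

Any node can act as coordinator for a request; it forwards the operation to the replicas computed this way.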
  • 16. Data Consistency
      • Cassandra supports tunable data consistency.
      • Choose between strong and eventual consistency, depending on the need.
      • The consistency level can be set per operation, for both reads and writes.
      • Handles multi-data-center operations.
  • 17. Consistency Level

      Write            Read
      Any              (not available)
      One              One
      Quorum           Quorum
      Local_Quorum     Local_Quorum
      Each_Quorum      Each_Quorum
      All              All
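How the Quorum level relates to strong consistency can be shown with a little arithmetic. A quorum is floor(RF / 2) + 1 replicas, and a read is guaranteed to overlap the most recent write when the number of replicas written (W) plus the number read (R) exceeds the replication factor (RF). The class and method names below are hypothetical helpers for illustration, not part of any Cassandra client API.

```java
/** Helper arithmetic for tunable consistency. */
final class ConsistencySketch {

    /** Number of replicas that make up a quorum for the given replication factor. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set (W + R > RF). */
    static boolean isStronglyConsistent(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println("quorum for RF=3: " + q);
        System.out.println("Quorum write + Quorum read: " + isStronglyConsistent(q, q, rf));
        System.out.println("One write + One read: " + isStronglyConsistent(1, 1, rf));
    }
}
```

With RF=3, writing and reading at Quorum touches 2 + 2 replicas, so at least one replica in every read has seen the latest write. Writing and reading at One gives only eventual consistency, in exchange for lower latency.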
  • 18. Built-in Consistency Repair Features
      • Read Repair
      • Hinted Handoff
      • Anti-Entropy Node Repair
      http://www.datastax.com/docs/0.8/dml/data_consistency#builtin-consistency
  • 19. Client Library for Java
      • Hector
          o https://github.com/hector-client/hector.git
          o https://github.com/hector-client/hector/wiki/User-Guide
      • Astyanax
          o https://github.com/Netflix/astyanax.git
      • CQL + JDBC
          o http://code.google.com/a/apache-extras.org/p/cassandra-jdbc/
  • 20. Hector
      • High-level, simple object-oriented interface to Cassandra
      • Failover behavior on the client side
      • Connection pooling for improved performance and scalability
      • Automatic retry of downed hosts
      . . .
  • 21. Hector

      // slice query
      SliceQuery<String, String> q = HFactory.createSliceQuery(ko, se, se, se);
      q.setColumnFamily(cf).setKey("jsmith").setColumnNames("first", "last", "middle");
      Result<ColumnSlice<String, String>> r = q.execute();

      // multi-get
      MultigetSliceQuery<String, String, String> multigetSliceQuery =
          HFactory.createMultigetSliceQuery(keyspace, stringSerializer, stringSerializer, stringSerializer);
      multigetSliceQuery.setColumnFamily("Standard1");
      multigetSliceQuery.setKeys("fake_key_0", "fake_key_1", "fake_key_2", "fake_key_3", "fake_key_4");
      multigetSliceQuery.setRange("", "", false, 3);
      Result<Rows<String, String, String>> result = multigetSliceQuery.execute();

      // batch operation
      Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
      mutator.addInsertion("jsmith", "Standard1", HFactory.createStringColumn("first", "John"))
             .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("last", "Smith"))
             .addInsertion("jsmith", "Standard1", HFactory.createStringColumn("middle", "Q"));
      mutator.execute();

      https://github.com/hector-client/hector/wiki/User-Guide
  • 22. CQL+JDBC

      Class.forName("org.apache.cassandra.cql.jdbc.CassandraDriver");
      String URL = String.format("jdbc:cassandra://%s:%d/%s", HOST, PORT, "system");
      System.out.println("Connection URL = '" + URL + "'");
      con = DriverManager.getConnection(URL);
      Statement stmt = con.createStatement();

      // Create KeySpace
      String createKS = String.format(
          "CREATE KEYSPACE %s WITH strategy_class = SimpleStrategy AND strategy_options:replication_factor = 1;",
          KEYSPACE);
      stmt.execute(createKS);

      // Create the target Column family
      String createCF = "CREATE COLUMNFAMILY RegressionTest (keyname text PRIMARY KEY,"
          + "bValue boolean, "
          + "iValue int "
          + ") WITH comparator = ascii AND default_validation = bigint;";
      stmt.execute(createCF);

      https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  • 23. CQL+JDBC

      Statement statement = con.createStatement();

      String truncate = "TRUNCATE RegressionTest;";
      statement.execute(truncate);

      String insert1 = "INSERT INTO RegressionTest (keyname,bValue,iValue) VALUES ('key0',true, 2000);";
      statement.executeUpdate(insert1);

      String insert2 = "INSERT INTO RegressionTest (keyname,bValue) VALUES( 'key1',false);";
      statement.executeUpdate(insert2);

      String select = "SELECT * from RegressionTest;";
      ResultSet result = statement.executeQuery(select);
      ResultSetMetaData metadata = result.getMetaData();
      . . .

      https://code.google.com/a/apache-extras.org/p/cassandra-jdbc/source/browse/src/test/java/org/apache/cassandra/cql/jdbc/JdbcRegressionTest.java
  • 24. Useful Tools
      • cassandra-cli
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/dml/using_cli
      • cqlsh
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/references/cql/index
      • nodetool
          o <cassandra-dir>/bin
          o http://www.datastax.com/docs/1.0/references/nodetool
      • stress
          o <cassandra-dir>/tools/bin
          o http://www.datastax.com/docs/1.0/references/stress_java
  • 25. Useful Tools
      • OpsCenter
          o http://www.datastax.com/products/opscenter
      • sstableloader
          o <cassandra-dir>/bin
          o http://www.datastax.com/dev/blog/bulk-loading
      • More tools
          o http://en.wikipedia.org/wiki/Apache_Cassandra#Tools_for_Cassandra