Cassandra for Python Developers

     Tyler Hobbs
History

    Open-sourced by Facebook   2008

    Apache Incubator           2009

    Top-level Apache project   2010

    DataStax founded           2010
Strengths

    Scalable
    – Roughly linear scaling: 2x nodes ≈ 2x throughput

    Reliable (Available)
    – Replication that works
    – Multi-DC support
    – No single point of failure
Strengths

    Fast
    – 10-30k writes/sec, 1-10k reads/sec

    Analytics
    – Integrated Hadoop support
Weaknesses

    No ACID transactions
    – Don't need these as often as you'd think

    Limited support for ad-hoc queries
    – You'll give these up anyway when sharding an RDBMS


    Generally complements another system
    – Not intended to be one-size-fits-all
Clustering

    Every node plays the same role
    – No masters, slaves, or special nodes
Clustering
[Figure: a ring of six nodes with tokens 0, 10, 20, 30, 40, 50. The row key “www.google.com” is hashed, md5(“www.google.com”), to token 14 and placed on the ring at that position; the final build of the slide adds “Replication Factor = 3”.]
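The ring diagram can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Cassandra's code: the six tokens come from the slides, the token space of 60 is hypothetical (real partitioners use a far larger MD5-derived range, so `token_for_key` will not reproduce the slide's illustrative 14), and replicas go to the owning node plus the next nodes clockwise, in the style of SimpleStrategy. A node owns every token after its predecessor's token, up to and including its own.

```python
import bisect
import hashlib

# Tokens of the six nodes in the slides' toy ring.
RING = [0, 10, 20, 30, 40, 50]
# Hypothetical token space for the toy ring; real partitioners use a much larger range.
TOKEN_SPACE = 60


def token_for_key(key, space=TOKEN_SPACE):
    """Hash a row key with MD5 and fold the digest into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % space


def replicas_for_token(token, ring=RING, rf=3):
    """Return the nodes holding `token`: the owner, then the next rf - 1 clockwise.

    The owner is the first node whose token is >= the key's token, wrapping
    around to the lowest token when the key hashes past the last node.
    """
    nodes = sorted(ring)
    first = bisect.bisect_left(nodes, token) % len(nodes)
    return [nodes[(first + i) % len(nodes)] for i in range(min(rf, len(nodes)))]


# Token 14 is the value the slides show for md5("www.google.com").
print(replicas_for_token(14))  # [20, 30, 40]
```

Because placement depends only on the hash, any node can compute where a key lives, which is what lets every node play the same role.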
Clustering
   Client can talk to any node
Data Model

    Keyspace
    – A collection of Column Families
    – Controls replication settings

    Column Family
    – Kinda resembles a table
Column Families

    Static
    – Object data

    Dynamic
    – Pre-calculated query results
Static Column Families
                   Users
   zznate    password: *    name: Nate
   driftx    password: *    name: Brandon
   thobbs    password: *    name: Tyler
   jbellis   password: *    name: Jonathan    site: riptano.com
Dynamic Column Families
                     Following
zznate    driftx:   thobbs:
driftx
thobbs    zznate:
jbellis   driftx:   mdennis:   pcmanus:   thobbs:   xedin:   zznate:
Dynamic Column Families

    Timeline of tweets by a user

    Timeline of tweets by all of the people a
    user is following

    List of comments sorted by score

    List of friends grouped by state
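One common way to serve the timeline use cases is to copy each tweet into a pre-computed row per reader when it is written (fan-out on write), then read that row newest-first. The sketch below uses plain dicts as hypothetical stand-ins for column families: `FOLLOWING` copies the slide's Following rows, while `TIMELINE`, `followers_of`, `post_tweet` and `latest` are illustrative names, not pycassa APIs. In Cassandra the newest-first read would correspond to a reversed column slice with a column count.

```python
# Hypothetical in-memory stand-ins for two dynamic column families.
# FOLLOWING copies the slide's "Following" rows: row key -> users they follow.
FOLLOWING = {
    "zznate": ["driftx", "thobbs"],
    "driftx": [],
    "thobbs": ["zznate"],
    "jbellis": ["driftx", "mdennis", "pcmanus", "thobbs", "xedin", "zznate"],
}

# TIMELINE: row key (reader) -> {timestamp column name: tweet text}.
TIMELINE = {}


def followers_of(author):
    """Users whose Following row contains `author`."""
    return [user for user, followed in FOLLOWING.items() if author in followed]


def post_tweet(author, timestamp, text):
    """Fan out on write: copy the tweet into every follower's timeline row."""
    for follower in followers_of(author):
        TIMELINE.setdefault(follower, {})[timestamp] = text


def latest(user, count):
    """Newest-first read, analogous to a reversed slice limited to `count` columns."""
    row = TIMELINE.get(user, {})
    return [(ts, row[ts]) for ts in sorted(row, reverse=True)[:count]]


post_tweet("thobbs", 100, "first tweet")
post_tweet("zznate", 105, "second tweet")
print(latest("jbellis", 2))  # [(105, 'second tweet'), (100, 'first tweet')]
```

The trade-off is extra writes per tweet in exchange for a single-row read per timeline, which suits Cassandra's fast writes.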
Pycassa


    Python client library for Cassandra

    Open Source (MIT License)
    – www.github.com/pycassa/pycassa

    Users
    – Reddit
    – ~10k GitHub downloads of every version
Installing Pycassa

    easy_install pycassa
    – or pip
Basic Layout

    pycassa.pool
    – Connection pooling

    pycassa.columnfamily
    – Primary module for the data API

    pycassa.system_manager
    – Schema management
The Data API

    RPC-based API

    Rows are like a sorted list of (name, value) tuples
    – Like a dict, but sorted by the names
    – OrderedDicts are used to preserve sorting
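To make the "sorted list of (name, value) tuples" idea concrete, here is a toy, pure-Python model of one row. It is a hypothetical stand-in, not pycassa's implementation: writing an existing column name overwrites it, and reads return an OrderedDict sorted by column name, matching the behaviour the insert examples show.

```python
from collections import OrderedDict


class SortedRow:
    """Hypothetical in-memory model of one row: columns kept sorted by name."""

    def __init__(self):
        self._columns = {}

    def insert(self, columns):
        # Writing an existing column name overwrites its value (an upsert).
        self._columns.update(columns)

    def get(self, columns=None):
        # Return all columns, or just the requested names, sorted by name.
        if columns is None:
            names = list(self._columns)
        else:
            names = [n for n in columns if n in self._columns]
        return OrderedDict((name, self._columns[name]) for name in sorted(names))


row = SortedRow()
row.insert({"aaa": 1, "ccc": 3})
row.insert({"aaa": 42})
row.insert({"bbb": 2, "ddd": 4})
print(list(row.get().items()))  # [('aaa', 42), ('bbb', 2), ('ccc', 3), ('ddd', 4)]
```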
Inserting Data
>>> from pycassa.pool import ConnectionPool
>>> from pycassa.columnfamily import ColumnFamily
>>>
>>> pool = ConnectionPool("MyKeyspace")
>>> cf = ColumnFamily(pool, "MyCF")
>>>
>>> cf.insert("key", {"col_name": "col_value"})
>>> cf.get("key")
{"col_name": "col_value"}
Inserting Data
>>> columns = {"aaa": 1, "ccc": 3}
>>> cf.insert("key", columns)
>>> cf.get("key")
{"aaa": 1, "ccc": 3}
>>>
>>> # Updates are the same as inserts
>>> cf.insert("key", {"aaa": 42})
>>> cf.get("key")
{"aaa": 42, "ccc": 3}
>>>
>>> # We can insert anywhere in the row
>>> cf.insert("key", {"bbb": 2, "ddd": 4})
>>> cf.get("key")
{"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
Fetching Data
>>> cf.get("key")
{"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
>>>
>>> # Get a set of columns by name
>>> cf.get("key", columns=["bbb", "ddd"])
{"bbb": 2, "ddd": 4}
Fetching Data
>>> # Get a slice of columns
>>> cf.get("key", column_start="bbb",
...               column_finish="ccc")
{"bbb": 2, "ccc": 3}
>>>
>>> # Slice from "ccc" to the end
>>> cf.get("key", column_start="ccc")
{"ccc": 3, "ddd": 4}
>>>
>>> # Slice from "bbb" to the beginning
>>> cf.get("key", column_start="bbb",
...               column_reversed=True)
{"bbb": 2, "aaa": 42}
Fetching Data
>>> # Get the first two columns in the row
>>> cf.get("key", column_count=2)
{"aaa": 42, "bbb": 2}
>>>
>>> # Get the last two columns in the row
>>> cf.get("key", column_reversed=True,
...               column_count=2)
{"ddd": 4, "ccc": 3}
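The slice parameters above can be emulated over a plain dict to make their semantics explicit. This is a hypothetical re-implementation for illustration, not pycassa code: it reuses the parameter names from the slides, compares names as Python strings in place of the column family's comparator, treats an empty string as "no bound", and, as in the reversed examples, treats `column_start` as the upper bound when `column_reversed=True`. The default `column_count` of 100 is an assumption here.

```python
from collections import OrderedDict


def _before_start(name, start, reverse):
    # An empty start means "from the first column in slice order".
    return bool(start) and (name > start if reverse else name < start)


def _past_finish(name, finish, reverse):
    # An empty finish means "through the last column in slice order".
    return bool(finish) and (name < finish if reverse else name > finish)


def get_slice(row, column_start="", column_finish="",
              column_reversed=False, column_count=100):
    """Return a slice of `row` (column name -> value) as an OrderedDict."""
    result = OrderedDict()
    for name in sorted(row, reverse=column_reversed):
        if _before_start(name, column_start, column_reversed):
            continue
        if _past_finish(name, column_finish, column_reversed):
            break
        result[name] = row[name]
        if len(result) == column_count:
            break
    return result


row = {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
print(dict(get_slice(row, column_start="bbb", column_reversed=True)))  # {'bbb': 2, 'aaa': 42}
```

Because columns are stored sorted, every one of these slices is a contiguous walk from one point in the row, which is why they are cheap.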
Fetching Multiple Rows
>>> columns = {"col": "val"}
>>> cf.batch_insert({"k1": columns,
...                  "k2": columns,
...                  "k3": columns})
>>>
>>> # Get multiple rows by name
>>> cf.multiget(["k1", "k2"])
{"k1": {"col": "val"},
 "k2": {"col": "val"}}
>>>
>>> # You can get slices of each row, too
>>> cf.multiget(["k1", "k2"], column_start="bbb") …
Fetching a Range of Rows
>>> # Get a generator over all of the rows
>>> for key, columns in cf.get_range():
...     print key, columns
"k1" {"col": "val"}
"k2" {"col": "val"}
"k3" {"col": "val"}
>>>
>>> # You can get slices of each row
>>> cf.get_range(column_start="bbb") …
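Since `cf.get_range()` hands back a generator, a client can walk a large column family without loading it all at once. One way such a generator can work is to fetch rows in fixed-size pages and restart each request at the last key seen. The sketch below is hypothetical: `fetch_page` stands in for the server call and `iter_range`/`buffer_size` are illustrative names; rows come back in key order here, whereas with Cassandra's RandomPartitioner they come back in token order. Because the restart key is inclusive, the repeated first row of each later page is skipped.

```python
def fetch_page(rows, start_key, count):
    """Hypothetical server call: up to `count` rows with key >= start_key, in key order."""
    keys = sorted(key for key in rows if key >= start_key)[:count]
    return [(key, rows[key]) for key in keys]


def iter_range(rows, buffer_size=1024):
    """Yield every (key, columns) pair, requesting buffer_size rows at a time."""
    if buffer_size < 2:
        raise ValueError("buffer_size must be at least 2 to make progress")
    start_key, skip_first = "", False
    while True:
        page = fetch_page(rows, start_key, buffer_size)
        for key, columns in (page[1:] if skip_first else page):
            yield key, columns
        if len(page) < buffer_size:
            return  # a short page means there are no more rows
        # The start key is inclusive, so the next page begins with this key again.
        start_key, skip_first = page[-1][0], True


rows = {"k1": {"col": "val"}, "k2": {"col": "val"}, "k3": {"col": "val"}}
for key, columns in iter_range(rows, buffer_size=2):
    print(key, columns)
```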
Fetching Rows by Secondary Index
>>> from pycassa.index import create_index_expression, create_index_clause
>>>
>>> users = ColumnFamily(pool, "Users")
>>>
>>> # Build up our index clause to match
>>> exp = create_index_expression("name", "Joe")
>>> clause = create_index_clause([exp])
>>> matches = users.get_indexed_slices(clause)
>>>
>>> # matches is a generator over matching rows
>>> for key, columns in matches:
...     print key, columns
"13" {"name": "Joe", "nick": "thatguy2"}
"257" {"name": "Joe", "nick": "flowers"}
"98" {"name": "Joe", "nick": "fr0d0"}
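Conceptually, an equality index expression asks "which row keys have this value in this column?" A toy inverted index answers that directly. The class and method names below (`IndexedColumnFamily`, `find_equal`) are hypothetical and only illustrate the idea; this is not Cassandra's storage format, only the EQ operator is modeled, and the extra row for "Ann" is made-up sample data.

```python
from collections import defaultdict


class IndexedColumnFamily:
    """Hypothetical in-memory column family with equality indexes on chosen columns."""

    def __init__(self, indexed_columns):
        self.rows = {}
        self.indexed = set(indexed_columns)
        self.index = defaultdict(set)  # (column name, value) -> set of row keys

    def insert(self, key, columns):
        row = self.rows.setdefault(key, {})
        for name, value in columns.items():
            if name in self.indexed:
                if name in row:
                    # Drop the index entry for the value being overwritten.
                    self.index[(name, row[name])].discard(key)
                self.index[(name, value)].add(key)
            row[name] = value

    def find_equal(self, name, value):
        """Yield (key, columns) for rows whose `name` column equals `value`."""
        for key in sorted(self.index.get((name, value), ())):
            yield key, self.rows[key]


users = IndexedColumnFamily(["name"])
users.insert("13", {"name": "Joe", "nick": "thatguy2"})
users.insert("257", {"name": "Joe", "nick": "flowers"})
users.insert("98", {"name": "Joe", "nick": "fr0d0"})
users.insert("7", {"name": "Ann", "nick": "annie"})
for key, columns in users.find_equal("name", "Joe"):
    print(key, columns)
```

Keeping the index up to date on every write is what makes the read a direct lookup instead of a scan over all rows.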
Deleting Data
>>> # Delete a whole row
>>> cf.remove("key1")
>>>
>>> # Or selectively delete columns
>>> cf.remove("key2", columns=["name", "date"])
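In Cassandra a delete does not erase data in place; it writes a timestamped marker called a tombstone, and reads reconcile versions by timestamp. The sketch below is a simplified, hypothetical model of that idea (a single dict instead of replicas and on-disk files, and a made-up tie rule) showing why a late-arriving older write cannot resurrect a deleted column.

```python
TOMBSTONE = object()  # marker stored in place of a deleted column's value


def write(store, key, column, value, timestamp):
    """Keep only the newest version of each cell (last write wins)."""
    current = store.get((key, column))
    # Simplified conflict rule: a newer timestamp wins; on a tie, a tombstone wins.
    if (current is None or timestamp > current[1]
            or (timestamp == current[1] and value is TOMBSTONE)):
        store[(key, column)] = (value, timestamp)


def delete(store, key, column, timestamp):
    """A delete is just another write, of a tombstone."""
    write(store, key, column, TOMBSTONE, timestamp)


def read(store, key, column):
    """Return the live value, or None if the column is missing or deleted."""
    cell = store.get((key, column))
    return None if cell is None or cell[0] is TOMBSTONE else cell[0]


store = {}
write(store, "key2", "name", "Joe", timestamp=1)
delete(store, "key2", "name", timestamp=2)
write(store, "key2", "name", "Ann", timestamp=1)  # late, older write: ignored
print(read(store, "key2", "name"))  # None
```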
Connection Management

    pycassa.pool.ConnectionPool
    – Takes a list of servers
        • Can be any set of nodes in your cluster
    – pool_size, max_retries, timeout
    – Automatically retries operations against other nodes
        • Writes are idempotent!
    – Individual node failures are transparent
    – Thread safe
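Retrying against another node is safe because replaying the same write leaves the same data behind. The sketch below is a hypothetical, simplified pool: `ToyPool`, `execute`, `insert` and the server addresses are made up for illustration and are not pycassa's internals. It tries servers in round-robin order and gives up after `max_retries` failed attempts.

```python
import itertools


class ToyPool:
    """Hypothetical pool: tries servers round-robin, up to max_retries failed attempts."""

    def __init__(self, servers, max_retries=5):
        self.max_retries = max_retries
        self._servers = itertools.cycle(list(servers))

    def execute(self, operation):
        failures = 0
        while True:
            server = next(self._servers)
            try:
                return operation(server)
            except ConnectionError:
                failures += 1
                if failures > self.max_retries:
                    raise  # every retry failed: surface the last error


down = {"10.0.0.1:9160"}
data = {}


def insert(server):
    """Pretend to send an insert to `server`; unreachable servers raise."""
    if server in down:
        raise ConnectionError(server)
    data["key"] = {"col": "val"}  # same write, same result, however often it runs
    return server


pool = ToyPool(["10.0.0.1:9160", "10.0.0.2:9160", "10.0.0.3:9160"], max_retries=2)
print(pool.execute(insert))  # 10.0.0.2:9160
```

If a node fails after applying a write but before acknowledging it, the pool simply sends the same write elsewhere; the result is unchanged, so the failure stays invisible to the caller.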
Async Options

    eventlet
    – Just need to monkeypatch socket and threading

    Twisted
    – Use Telephus instead of Pycassa
    – www.github.com/driftx/telephus
    – Less friendly and less thoroughly documented
Tyler Hobbs
        @tylhobbs
tyler@datastax.com
Ad

Cassandra for Python Developers

  • 1. Cassandra for Python Developers (Tyler Hobbs)
  • 2. History
      - Open-sourced by Facebook: 2008
      - Apache Incubator: 2009
      - Top-level Apache project: 2010
      - DataStax founded: 2010
  • 3. Strengths
      - Scalable
          – 2x Nodes == 2x Performance
      - Reliable (Available)
          – Replication that works
          – Multi-DC support
          – No single point of failure
  • 4. Strengths
      - Fast
          – 10-30k writes/sec, 1-10k reads/sec
      - Analytics
          – Integrated Hadoop support
  • 5. Weaknesses
      - No ACID transactions
          – Don't need these as often as you'd think
      - Limited support for ad-hoc queries
          – You'll give these up anyway when sharding an RDBMS
      - Generally complements another system
          – Not intended to be one-size-fits-all
  • 6. Clustering
      - Every node plays the same role
          – No masters, slaves, or special nodes
  • 7-12. Clustering (diagram built up over six slides): six nodes sit on a ring at tokens 0, 10, 20, 30, 40, and 50. The row key "www.google.com" is hashed with md5("www.google.com") to token 14, which falls on the ring between the nodes at 10 and 20. The final frame adds "Replication Factor = 3".
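The ring in these diagrams can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Cassandra's code: it uses a toy token space of 0-59 so the slide's node tokens fit (Cassandra's RandomPartitioner uses a range of 0 to 2**127), it assumes each node owns the tokens from just after the previous node's token up to and including its own, and it places the extra replicas on the next nodes clockwise, in the style of SimpleStrategy. `RING_SIZE`, `NODE_TOKENS`, `key_to_token`, and `replicas_for_token` are hypothetical names.

```python
from __future__ import annotations

import bisect
import hashlib

# Toy ring from the slides: six nodes at tokens 0, 10, ..., 50.
# Assumption: a 0-59 token space; real partitioners use a far larger range.
RING_SIZE = 60
NODE_TOKENS = [0, 10, 20, 30, 40, 50]


def key_to_token(key: str, ring_size: int = RING_SIZE) -> int:
    """Hash a row key with md5 and fold the digest into the toy token space."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % ring_size


def replicas_for_token(token: int, node_tokens: list[int], rf: int) -> list[int]:
    """Return the tokens of the nodes that hold `token`, primary first.

    A node owns (previous node's token, its own token], so the primary is
    the first node whose token is >= `token`, wrapping past the highest
    token back to the start of the ring. The other rf - 1 replicas are the
    next nodes clockwise.
    """
    ring = sorted(node_tokens)
    start = bisect.bisect_left(ring, token) % len(ring)
    return [ring[(start + i) % len(ring)] for i in range(min(rf, len(ring)))]
```

Plugging in the slide's token rather than a computed digest, `replicas_for_token(14, NODE_TOKENS, 3)` gives `[20, 30, 40]`: the node at 20 owns the range (10, 20], and the nodes at 30 and 40 hold the other two copies. A token of 55 wraps around to `[0, 10, 20]`.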
  • 13. Clustering
      - Client can talk to any node
  • 14. Data Model
      - Keyspace
          – A collection of Column Families
          – Controls replication settings
      - Column Family
          – Roughly resembles a table
  • 15. Column Families
      - Static
          – Object data
      - Dynamic
          – Pre-calculated query results
  • 16. Static Column Families
        Users
          zznate    password: *   name: Nate
          driftx    password: *   name: Brandon
          thobbs    password: *   name: Tyler
          jbellis   password: *   name: Jonathan   site: riptano.com
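  • 17. Dynamic Column Families (diagram: a "Following" column family in which each row is keyed by a user name and holds one column per followed user, named after that user, with an empty value; the names shown are zznate, driftx, thobbs, jbellis, mdennis, pcmanus, and xedin)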
  • 18. Dynamic Column Families
      - Timeline of tweets by a user
      - Timeline of tweets by all of the people a user is following
      - List of comments sorted by score
      - List of friends grouped by state
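A rough sketch of the "pre-calculated query results" idea, using plain dicts in place of column families: a hypothetical `followers` map (each user to the users following them) and a `timelines` map whose rows have one column per tweet, named by timestamp. Posting a tweet writes it into every follower's timeline row (fan-out on write), so reading a home timeline becomes one sorted slice of one row. The names and the fan-out approach are illustrative assumptions, not something the slides or pycassa define.

```python
from collections import defaultdict

# Hypothetical dynamic column family: row key = user,
# column names = the users who follow them.
followers = {
    "zznate": ["thobbs", "driftx"],
    "driftx": ["thobbs"],
}

# Hypothetical timeline column family: row key = reader,
# column name = timestamp, value = (author, text).
timelines = defaultdict(dict)


def post_tweet(author, timestamp, text):
    """Fan out on write: copy the tweet into each follower's timeline row."""
    for reader in followers.get(author, []):
        timelines[reader][timestamp] = (author, text)


def latest(reader, count):
    """Newest `count` tweets: a reversed slice of the row's sorted column names."""
    row = timelines[reader]
    return [(ts, row[ts]) for ts in sorted(row, reverse=True)[:count]]
```

After `post_tweet("zznate", 100, "hello")` and `post_tweet("driftx", 105, "hi")`, `latest("thobbs", 2)` returns `[(105, ("driftx", "hi")), (100, ("zznate", "hello"))]`. The cost moves to write time (one insert per follower) so that the read is a single row slice.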
  • 19. Pycassa
      - Python client library for Cassandra
      - Open Source (MIT License)
          – www.github.com/pycassa/pycassa
      - Users
          – Reddit
          – ~10k GitHub downloads of every version
  • 20. Installing Pycassa
      - easy_install pycassa
          – or pip install pycassa
  • 21. Basic Layout
      - pycassa.pool
          – Connection pooling
      - pycassa.columnfamily
          – Primary module for the data API
      - pycassa.system_manager
          – Schema management
  • 22. The Data API
      - RPC-based API
      - Rows are like a sorted list of (name, value) tuples
          – Like a dict, but sorted by the names
          – OrderedDicts are used to preserve sorting
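To make the "dict sorted by name" model concrete, here is a small in-memory stand-in for a single row. It does not talk to Cassandra and is not part of pycassa; the `SortedRow` class is hypothetical, though its keyword arguments are named after the `columns`, `column_start`, `column_finish`, `column_reversed`, and `column_count` options that `cf.get()` takes in the following slides. It assumes an empty string means "no bound" and that a reversed slice treats `column_start` as the upper bound.

```python
from collections import OrderedDict


class SortedRow:
    """In-memory model of one row: a dict whose columns are read in name order."""

    def __init__(self):
        self._columns = {}

    def insert(self, columns):
        """Upsert: new names are added, existing names are overwritten."""
        self._columns.update(columns)

    def get(self, columns=None, column_start="", column_finish="",
            column_reversed=False, column_count=100):
        """Return an OrderedDict of columns using slice-style arguments.

        "" for column_start / column_finish means "unbounded". With
        column_reversed=True the scan runs from high names to low, so
        column_start is the upper bound and column_finish the lower one.
        """
        if columns is not None:
            # Fetch by name; results still come back sorted by name.
            names = sorted(n for n in columns if n in self._columns)
        else:
            names = sorted(self._columns, reverse=column_reversed)
            if column_reversed:
                names = [n for n in names
                         if (not column_start or n <= column_start)
                         and (not column_finish or n >= column_finish)]
            else:
                names = [n for n in names
                         if (not column_start or n >= column_start)
                         and (not column_finish or n <= column_finish)]
            names = names[:column_count]
        return OrderedDict((n, self._columns[n]) for n in names)
```

Replaying the slides' inserts (`{"aaa": 1, "ccc": 3}`, then `{"aaa": 42}`, then `{"bbb": 2, "ddd": 4}`), `row.get(column_start="bbb", column_reversed=True)` yields `bbb` then `aaa`, and `row.get(column_reversed=True, column_count=2)` yields `ddd` then `ccc`.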
  • 23. Inserting Data
        >>> from pycassa.pool import ConnectionPool
        >>> from pycassa.columnfamily import ColumnFamily
        >>>
        >>> pool = ConnectionPool("MyKeyspace")
        >>> cf = ColumnFamily(pool, "MyCF")
        >>>
        >>> cf.insert("key", {"col_name": "col_value"})
        >>> cf.get("key")
        {"col_name": "col_value"}
  • 24. Inserting Data
        >>> columns = {"aaa": 1, "ccc": 3}
        >>> cf.insert("key", columns)
        >>> cf.get("key")
        {"aaa": 1, "ccc": 3}
        >>>
        >>> # Updates are the same as inserts
        >>> cf.insert("key", {"aaa": 42})
        >>> cf.get("key")
        {"aaa": 42, "ccc": 3}
        >>>
        >>> # We can insert anywhere in the row
        >>> cf.insert("key", {"bbb": 2, "ddd": 4})
        >>> cf.get("key")
        {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
  • 25. Fetching Data
        >>> cf.get("key")
        {"aaa": 42, "bbb": 2, "ccc": 3, "ddd": 4}
        >>>
        >>> # Get a set of columns by name
        >>> cf.get("key", columns=["bbb", "ddd"])
        {"bbb": 2, "ddd": 4}
  • 26. Fetching Data
        >>> # Get a slice of columns
        >>> cf.get("key", column_start="bbb",
        ...        column_finish="ccc")
        {"bbb": 2, "ccc": 3}
        >>>
        >>> # Slice from "ccc" to the end
        >>> cf.get("key", column_start="ccc")
        {"ccc": 3, "ddd": 4}
        >>>
        >>> # Slice from "bbb" to the beginning
        >>> cf.get("key", column_start="bbb",
        ...        column_reversed=True)
        {"bbb": 2, "aaa": 42}
  • 27. Fetching Data
        >>> # Get the first two columns in the row
        >>> cf.get("key", column_count=2)
        {"aaa": 42, "bbb": 2}
        >>>
        >>> # Get the last two columns in the row
        >>> cf.get("key", column_reversed=True,
        ...        column_count=2)
        {"ddd": 4, "ccc": 3}
  • 28. Fetching Multiple Rows
        >>> columns = {"col": "val"}
        >>> cf.batch_insert({"k1": columns,
        ...                  "k2": columns,
        ...                  "k3": columns})
        >>>
        >>> # Get multiple rows by name
        >>> cf.multiget(["k1", "k2"])
        {"k1": {"col": "val"}, "k2": {"col": "val"}}
        >>>
        >>> # You can get slices of each row, too
        >>> cf.multiget(["k1", "k2"], column_start="bbb")
        …
  • 29. Fetching a Range of Rows
        >>> # Get a generator over all of the rows
        >>> for key, columns in cf.get_range():
        ...     print key, columns
        ...
        "k1" {"col": "val"}
        "k2" {"col": "val"}
        "k3" {"col": "val"}
        >>>
        >>> # You can get slices of each row
        >>> cf.get_range(column_start="bbb")
        …
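The generator returned by `get_range()` hides paging. One common way to page through rows, sketched here against a hypothetical `fetch_batch(start_key, count)` callable rather than pycassa's internals, is to request a fixed-size batch, then start the next request at the last key seen. Because the assumed range start is inclusive, each later request asks for one extra row and skips the key it already yielded. The fake store below orders rows by key for simplicity; under RandomPartitioner a real cluster returns rows in token order.

```python
def paged_range(fetch_batch, buffer_size=2):
    """Lazily yield every (key, columns) pair, fetching buffer_size rows at a time.

    fetch_batch(start_key, count) is assumed to return up to `count` rows,
    in ring order, starting at start_key inclusive ("" means the beginning).
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    start_key = ""
    skip_key = None  # last key of the previous batch, already yielded
    while True:
        # Ask for one extra row after the first batch to cover the overlap.
        count = buffer_size if skip_key is None else buffer_size + 1
        batch = fetch_batch(start_key, count)
        for key, columns in batch:
            if key != skip_key:
                yield key, columns
        if len(batch) < count:
            return  # the store has no more rows
        start_key = skip_key = batch[-1][0]


# A hypothetical five-row store, sorted by key, for trying the generator out.
ROWS = {k: {"col": "val"} for k in ["k1", "k2", "k3", "k4", "k5"]}


def fake_fetch(start_key, count):
    keys = sorted(k for k in ROWS if k >= start_key)[:count]
    return [(k, ROWS[k]) for k in keys]
```

With the default `buffer_size=2`, iterating `paged_range(fake_fetch)` yields all five keys in order using three requests: `("", 2)`, `("k2", 3)`, and `("k4", 3)`.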
  • 30. Fetching Rows by Secondary Index
        >>> from pycassa.index import *
        >>>
        >>> # Assumes a "Users" column family with a secondary index on "name"
        >>> users = ColumnFamily(pool, "Users")
        >>>
        >>> # Build up our index clause to match
        >>> exp = create_index_expression("name", "Joe")
        >>> clause = create_index_clause([exp])
        >>> matches = users.get_indexed_slices(clause)
        >>>
        >>> # matches is a generator over matching rows
        >>> for key, columns in matches:
        ...     print key, columns
        ...
        "13" {"name": "Joe", "nick": "thatguy2"}
        "257" {"name": "Joe", "nick": "flowers"}
        "98" {"name": "Joe", "nick": "fr0d0"}
  • 31. Deleting Data
        >>> # Delete a whole row
        >>> cf.remove("key1")
        >>>
        >>> # Or selectively delete columns
        >>> cf.remove("key2", columns=["name", "date"])
  • 32. Connection Management
      - pycassa.pool.ConnectionPool
          – Takes a list of servers
              • Can be any set of nodes in your cluster
          – Tunable via pool_size, max_retries, and timeout
          – Automatically retries operations against other nodes
              • Safe because writes are idempotent!
          – Individual node failures are transparent
          – Thread safe
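The retry-on-another-node behaviour can be pictured with a toy loop. This is not pycassa's ConnectionPool: `NodeDown`, `call_with_failover`, and the `operation(server)` callable are hypothetical, and the round-robin order is an assumption. It makes up to `1 + max_retries` attempts across the server list and only surfaces an error if every attempt fails, which is reasonable for writes because sending the same write twice leaves the data as if it had been sent once.

```python
class NodeDown(Exception):
    """Raised by a hypothetical operation when its node cannot be reached."""


def call_with_failover(servers, operation, max_retries=5):
    """Run operation(server), moving to the next server after each failure.

    Makes at most 1 + max_retries attempts, cycling through `servers` in
    order, and re-raises the last NodeDown if none of them succeed.
    """
    if not servers:
        raise ValueError("need at least one server")
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    last_error = None
    for attempt in range(1 + max_retries):
        server = servers[attempt % len(servers)]
        try:
            return operation(server)
        except NodeDown as exc:
            # The failure stays invisible to the caller; try the next node.
            last_error = exc
    raise last_error
```

For example, if a write raises `NodeDown` on a hypothetical `"10.0.0.1"` but succeeds on `"10.0.0.2"`, `call_with_failover(["10.0.0.1", "10.0.0.2"], write)` returns the second node's result without the caller ever seeing the first failure.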
  • 33. Async Options
      - eventlet
          – Just need to monkeypatch socket and threading
      - Twisted
          – Use Telephus instead of Pycassa
          – www.github.com/driftx/telephus
          – Less friendly, less documented, etc.