©2013 DataStax Confidential. Do not distribute without consent.
Python & Cassandra
Jon Haddad (@rustyrazorblade)
Technical Evangelist, DataStax
This should be boring
• Talking to a database should not be any of the following:
  • Exciting
  • "AH HA!"
  • Confusing
git@github.com:rustyrazorblade/python-presentation.git
Agenda
• Go over basic driver concepts
  • Connecting
  • Performing queries
• Introduce the object mapper (cqlengine)
• Application integration
DataStax Native Python Driver
• Talks to Cassandra
• Connection pooling
• Aware of cluster topology
• Automatic retries / failure management
• Load balancing
• Will include the object mapper (cqlengine) in the next release
• Fully open source (Apache License)
Connect to Cassandra
• Import and create a Cluster instance
• Cluster takes options such as a load balancing policy, reconnection policy and retry policy
• On connect, the driver discovers the entire cluster automatically from the contact points
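A reconnection policy decides how long the driver waits between attempts to reach a node that went down. As a rough, stdlib-only illustration (assuming a plain doubling schedule with no jitter, which may differ from the driver's own `ExponentialReconnectionPolicy` in newer releases), the delays could be generated like this:

```python
from itertools import islice
from typing import Iterator


def exponential_delays(base_delay: float, max_delay: float) -> Iterator[float]:
    """Yield reconnection delays that double each attempt, capped at max_delay.

    A simplified model of an exponential reconnection policy (no jitter).
    """
    if base_delay <= 0 or max_delay < base_delay:
        raise ValueError("need 0 < base_delay <= max_delay")
    delay = base_delay
    while True:
        yield delay
        # stop growing once the cap is reached (also avoids float overflow)
        delay = min(delay * 2, max_delay)


# first six waits for base_delay=1s, max_delay=20s
print(list(islice(exponential_delays(1.0, 20.0), 6)))  # [1.0, 2.0, 4.0, 8.0, 16.0, 20.0]
```

Capping the delay keeps a long outage from pushing retries out indefinitely, while the doubling avoids hammering a node that is still restarting.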
Executing queries
• CQL: similar to SQL
• session.execute()
  • Create tables, insert, select
  • Accepts simple query strings
  • Simple string queries are not token aware
Prepared Statements
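• Use them for all queries (selects / inserts / updates / deletes)
• Decrease server load: the query is parsed once, then only the bound values are sent
• Increase security: values are bound, never spliced into the query string
• Allow token-aware routing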
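Token awareness works because a prepared statement tells the driver which bound values form the partition key, so the driver can hash them to a token and send the request straight to a node that owns that token. The sketch below models only that lookup. It hashes with MD5 in the style of Cassandra's older RandomPartitioner (the default Murmur3Partitioner uses a different hash), and the four-node ring, node names and tokens are hypothetical.

```python
import bisect
import hashlib
import uuid
from typing import Dict


def md5_token(partition_key: bytes) -> int:
    """Hash a serialized partition key to a token, RandomPartitioner-style."""
    digest = hashlib.md5(partition_key).digest()
    return abs(int.from_bytes(digest, "big", signed=True))


def owner(ring: Dict[int, str], token: int) -> str:
    """Each node owns (previous token, its token]; wrap past the last node."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]


# hypothetical 4-node ring with evenly spaced tokens up to 2**127
ring = {i * 2**125: f"node{i}" for i in range(1, 5)}

# a uuid partition key is serialized as its 16 raw bytes
key = uuid.UUID("f472d5ff-0c76-404a-8044-038db416685e").bytes
print(owner(ring, md5_token(key)))
```

With a simple string query the driver never sees the partition key as a separate value, so it cannot compute the token and has to pick a coordinator without this shortcut.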
Async Queries
• Use prepared statements! (they keep async queries token aware)
• Much higher throughput than running queries synchronously, one at a time
• Utilize the entire cluster
• The driver can help us here
  • We can use futures
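The futures pattern itself is not specific to Cassandra. As a stdlib-only analogy (not the driver's `ResponseFuture`), the sketch below fires several simulated queries at once with `concurrent.futures` and attaches a callback to each, much like `execute_async()` followed by `add_callback()`. `fake_query` is a hypothetical stand-in for a network round trip.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from functools import partial
from typing import List


def fake_query(sensor_id: int) -> str:
    """Hypothetical stand-in for a query: sleeps briefly, then returns a row."""
    time.sleep(0.01)
    return f"row for sensor {sensor_id}"


completed: List[str] = []


def on_done(sensor_id: int, future: Future) -> None:
    # runs when the future finishes; the extra argument is bound with partial()
    completed.append(f"{sensor_id}: {future.result()}")


with ThreadPoolExecutor(max_workers=4) as pool:
    for sensor_id in range(10):
        future = pool.submit(fake_query, sensor_id)  # returns immediately
        future.add_done_callback(partial(on_done, sensor_id))
# leaving the with-block waits for every submitted query to finish

print(len(completed))  # 10
```

The loop never blocks on a single result; all ten requests are outstanding at once and each reports back through its callback.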
Async Queries w/ Callbacks
import uuid
from datetime import datetime

# `session` is an already connected Session (see "Connect to Cassandra")
statement = """INSERT INTO sensor
               (sensor_id, name, created_at)
               VALUES (?, ?, ?)"""

insert_sensor = session.prepare(statement)

# callback function: receives the query result, then any extra arguments
def create_sensor_entries_callback(response, sensor_id):
    print("CALLBACK", sensor_id)

for x in range(10):
    sensor_id = uuid.uuid4()
    sensor_data = (sensor_id, "sensor %d" % x, datetime.now())
    future = session.execute_async(insert_sensor, sensor_data)
    # add callback: extra arguments are passed through to the callback
    future.add_callback(create_sensor_entries_callback, sensor_id)
Async Queries (managed)
from uuid import UUID
from cassandra.concurrent import execute_concurrent_with_args

# prepared statement
stmt = """SELECT * FROM sensor_data WHERE sensor_id = ?
          ORDER BY created_at DESC LIMIT 1"""

select_statement = session.prepare(stmt)

# one parameter list per execution; uuid columns take uuid.UUID values
sensor_ids = [[UUID("f472d5ff-0c76-404a-8044-038db416685e")],
              [UUID("940cb741-d5b5-4c5d-82f5-bf1aa61c6d47")],
              [UUID("497d4b2c-cba2-4d0f-bd80-42de612690fd")],
              [UUID("1bdeac75-7e12-43ba-80b5-2d38405f9843")]]

# the driver automatically manages concurrency and returns
# (success, result_or_exception) pairs in the same order as sensor_ids
results = execute_concurrent_with_args(session, select_statement, sensor_ids)
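`execute_concurrent_with_args` keeps a bounded number of requests in flight (its `concurrency` parameter) and hands back `(success, result_or_exception)` pairs in input order. A rough asyncio model of that behaviour, not the driver's implementation, might look like this; `fake_select` and the limit of 3 are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, List, Sequence, Tuple


async def run_concurrent(
    func: Callable[..., Awaitable[Any]],
    args_list: Sequence[Sequence[Any]],
    concurrency: int = 100,
) -> List[Tuple[bool, Any]]:
    """Run func(*args) for each args, at most `concurrency` at a time.

    Returns (success, result_or_exception) pairs in input order.
    """
    limit = asyncio.Semaphore(concurrency)

    async def one(args: Sequence[Any]) -> Tuple[bool, Any]:
        async with limit:  # wait for a free slot
            try:
                return True, await func(*args)
            except Exception as exc:  # report the failure, keep going
                return False, exc

    return list(await asyncio.gather(*(one(a) for a in args_list)))


in_flight = 0
peak = 0


async def fake_select(sensor_id: str) -> str:
    """Hypothetical query that records how many calls overlap."""
    global in_flight, peak
    in_flight += 1
    peak = max(peak, in_flight)
    await asyncio.sleep(0.01)
    in_flight -= 1
    if sensor_id == "bad":
        raise ValueError("no such sensor")
    return f"latest reading for {sensor_id}"


ids = [["a"], ["b"], ["bad"], ["c"], ["d"]]
results = asyncio.run(run_concurrent(fake_select, ids, concurrency=3))
print(peak)  # 3
```

Bounding the number of in-flight requests keeps a large parameter list from flooding the cluster, while still using far more parallelism than sequential calls.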
Performance Considerations
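• Like SQL, CQL features IN(), but on the partition key it is generally terrible for performance: a single coordinator must fetch every listed partition and hold all the results
• This results in more GC pressure and other performance problems
• A multi-partition BATCH has the same issue
• Failure to get a single result causes the entire IN() or batch to be retried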
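To see why one missing result hurts a whole IN() query, here is a toy model in plain Python (no driver; the partitions and the unavailable key are invented). An IN()-style read succeeds only if every partition answers, while separate per-key reads, like the concurrent queries above, fail independently and can be retried one by one.

```python
from typing import Dict, List, Tuple

# hypothetical partitions; "s3" lives on a replica that is currently down
PARTITIONS: Dict[str, str] = {"s1": "row1", "s2": "row2", "s4": "row4"}
UNAVAILABLE = {"s3"}


class PartitionUnavailable(Exception):
    pass


def read_partition(key: str) -> str:
    if key in UNAVAILABLE:
        raise PartitionUnavailable(key)
    return PARTITIONS[key]


def read_with_in(keys: List[str]) -> List[str]:
    """IN()-style: one request; any failing partition fails the whole thing."""
    return [read_partition(k) for k in keys]


def read_individually(keys: List[str]) -> List[Tuple[bool, object]]:
    """One request per key: failures stay isolated."""
    results: List[Tuple[bool, object]] = []
    for k in keys:
        try:
            results.append((True, read_partition(k)))
        except PartitionUnavailable as exc:
            results.append((False, exc))
    return results


keys = ["s1", "s2", "s3", "s4"]
try:
    read_with_in(keys)
except PartitionUnavailable as exc:
    print("IN() failed because of", exc)  # IN() failed because of s3
print([ok for ok, _ in read_individually(keys)])  # [True, True, False, True]
```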
Object Mapper
Defining Models
• Each model maps to a single table
• Every model inherits from cassandra.cqlengine.models.Model
• Define the fields of your table programmatically
• Collections map to native Python types (list, set, dict)
• Table management included via sync_table() (no need to write ALTER)
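Table management means the mapper compares the model with the live table and adds whatever columns are missing (cqlengine's `sync_table()` does this). A much-simplified illustration of that idea follows; the plain dictionaries standing in for the model and the table schema are hypothetical, and the real function handles far more than additions.

```python
from typing import Dict, List


def missing_column_ddl(table: str,
                       model_columns: Dict[str, str],
                       existing_columns: Dict[str, str]) -> List[str]:
    """Return ALTER TABLE ... ADD statements for columns the table lacks.

    Only additions are handled; dropped or retyped columns are ignored.
    """
    return [
        f"ALTER TABLE {table} ADD {name} {cql_type}"
        for name, cql_type in model_columns.items()
        if name not in existing_columns
    ]


# hypothetical model that gained an `email` column since the table was created
model = {"name": "text", "twitter": "text", "email": "text"}
live = {"name": "text", "twitter": "text"}
for stmt in missing_column_ddl("users", model, live):
    print(stmt)  # ALTER TABLE users ADD email text
```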
Model with Collections
• Sets & Maps are most useful
• Use to denormalize
• Lists can have performance issues if misused
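from uuid import uuid1, uuid4
from cassandra.cqlengine.columns import Map, Set, Text, TimeUUID, UUID
from cassandra.cqlengine.models import Model

class Message(Model):
    message_id = TimeUUID(primary_key=True, default=uuid1)
    subject = Text()
    body = Text()
    addressed_to = Set(UUID)

class Photo(Model):
    photo_id = UUID(primary_key=True, default=uuid4)
    title = Text()
    likes = Map(UUID, Text)  # CQL map<uuid, text>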
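One reason sets and maps are the safer collections: adding an element to a set is idempotent, whereas appending to a list is not, so a write that gets retried (for example after a timeout) can leave duplicate list entries. Python's own `set` and `list` share those semantics, so they serve as an analogy here; this is not driver code, and the CQL in the comments only indicates which update each line mirrors.

```python
from uuid import UUID, uuid4

recipient: UUID = uuid4()

addressed_to_set: set = set()
addressed_to_list: list = []

# the same "add recipient" write delivered twice, as a retry would do
for _attempt in range(2):
    addressed_to_set.add(recipient)      # like: addressed_to = addressed_to + {?}
    addressed_to_list.append(recipient)  # like: addressed_to = addressed_to + [?]

print(len(addressed_to_set), len(addressed_to_list))  # 1 2
```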
Clustering Keys
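• Automatically determined by column ordering in the model
• The first primary key column is the partition key
• The remaining primary key columns are clustering keys
• Mark several columns with partition_key=True for a composite partition key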
from cassandra.cqlengine.columns import Boolean, Text, UUID
from cassandra.cqlengine.models import Model

class UsersInGroup(Model):
    group_id = UUID(primary_key=True)  # partition key
    user_id = UUID(primary_key=True)   # clustering key
    is_admin = Boolean()

class UsersInGroupByState(Model):
    group_id = UUID(primary_key=True, partition_key=True)  # composite
    state = Text(primary_key=True, partition_key=True)     # partition key
    user_id = UUID(primary_key=True)                        # clustering key
    is_admin = Boolean(default=False)
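The ordering rule can be spelled out as a small function. This is a sketch of the rule described above, not cqlengine's code: it takes hypothetical `(name, primary_key, partition_key)` tuples in declaration order and builds the matching CQL `PRIMARY KEY` clause.

```python
from typing import List, Tuple

Column = Tuple[str, bool, bool]  # (name, primary_key, partition_key)


def primary_key_clause(columns: List[Column]) -> str:
    """Build PRIMARY KEY ((partition...), clustering...) from column order."""
    keys = [(name, part) for name, pk, part in columns if pk or part]
    if not keys:
        raise ValueError("a model needs at least one primary key column")
    partition = [name for name, part in keys if part]
    if not partition:
        # no explicit partition keys: the first primary key column is it
        partition = [keys[0][0]]
    clustering = [name for name, _ in keys if name not in partition]
    parts = [f"({', '.join(partition)})"] + clustering
    return f"PRIMARY KEY ({', '.join(parts)})"


users_in_group = [("group_id", True, False), ("user_id", True, False),
                  ("is_admin", False, False)]
by_state = [("group_id", True, True), ("state", True, True),
            ("user_id", True, False), ("is_admin", False, False)]
print(primary_key_clause(users_in_group))  # PRIMARY KEY ((group_id), user_id)
print(primary_key_clause(by_state))        # PRIMARY KEY ((group_id, state), user_id)
```

Rows sharing a partition key live together on the same replicas, sorted by the clustering columns, which is why the order of declaration matters.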
Inserting Data
• Model.create(**kwargs)
• Performs validation
• Supports custom validation
• Supports TTLs
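A TTL makes a written value expire a fixed number of seconds after the write. The toy store below models only that rule, using an injected clock so the behaviour is deterministic; `TTLStore` is a hypothetical illustration, not how Cassandra or cqlengine implement TTLs.

```python
from typing import Callable, Dict, Optional, Tuple


class TTLStore:
    """Toy key/value store where each write may carry a TTL in seconds."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}

    def insert(self, key: str, value: str, ttl: Optional[int] = None) -> None:
        if ttl is not None and ttl <= 0:
            raise ValueError("ttl must be a positive number of seconds")
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        value, expires_at = self._data.get(key, (None, None))
        if expires_at is not None and self._clock() >= expires_at:
            return None  # expired: reads behave as if the value is gone
        return value


now = [1000.0]  # fake clock we move forward by hand
store = TTLStore(clock=lambda: now[0])
store.insert("session:jon", "active", ttl=60)
print(store.get("session:jon"))  # active
now[0] += 61
print(store.get("session:jon"))  # None
```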
Lightweight Transactions
• Uses Paxos for consensus
• IF NOT EXISTS for INSERT
• IF column = value for UPDATE
• Use sparingly: each one requires several round trips
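A lightweight transaction is a compare-and-set: the write is applied only if its condition holds, and the result carries an `[applied]` column saying whether it was. The dictionary-based model below mimics only those semantics inside one process. The Paxos rounds that make it safe across replicas are exactly what it leaves out, and they are where the extra round trips come from. The table and function names are hypothetical.

```python
from typing import Any, Dict, Optional, Tuple

Row = Dict[str, Any]
table: Dict[str, Row] = {}


def insert_if_not_exists(key: str, row: Row) -> Tuple[bool, Optional[Row]]:
    """INSERT ... IF NOT EXISTS: returns (applied, existing_row_if_not)."""
    if key in table:
        return False, table[key]
    table[key] = dict(row)
    return True, None


def update_if(key: str, column: str, expected: Any,
              changes: Row) -> Tuple[bool, Optional[Row]]:
    """UPDATE ... IF column = expected: apply only when the condition holds."""
    current = table.get(key)
    if current is None or current.get(column) != expected:
        return False, current
    current.update(changes)
    return True, None


print(insert_if_not_exists("jon", {"email": "jon@example.com"}))  # (True, None)
print(insert_if_not_exists("jon", {"email": "x@example.com"})[0])  # False
print(update_if("jon", "email", "jon@example.com",
                {"email": "new@example.com"})[0])  # True
```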
Batches
• Use only to maintain multiple views (for consistency purposes)
from cassandra.cqlengine.columns import Text
from cassandra.cqlengine.models import Model
from cassandra.cqlengine.query import BatchQuery

class User(Model):
    name = Text(primary_key=True)
    twitter = Text()
    email = Text()

class TwitterToUser(Model):
    twitter = Text(primary_key=True)
    name = Text()

(twitter, name) = ("rustyrazorblade", "jon")

# both views of the same user are written in one batch
with BatchQuery() as b:
    User.batch(b).create(name=name, twitter=twitter)
    TwitterToUser.batch(b).create(twitter=twitter, name=name)
Fetching a Row
• Model.get() fetches a single row
• Raises a DoesNotExist exception if no row is found
Fetching Many Rows
• Model.objects() accepts any filter acceptable to Cassandra
Table Properties
• Every table option is supported, for example:
  • compaction
  • gc_grace_seconds
  • read_repair_chance
  • caching
Table Inheritance
• Multiple tables with similar fields
• Query Pattern: filtering
Table Polymorphism
• Similar to inheritance
• Uses a single table
• Query pattern: select all types
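Single-table polymorphism relies on a discriminator column that records each row's type, so one query can return every type and each row can be rebuilt as the right class. cqlengine supports this through a discriminator column on the model (the exact option names vary between versions). The plain-Python sketch below shows only the dispatch idea, with hypothetical `Pet`, `Cat` and `Dog` classes and a `pet_type` column.

```python
from dataclasses import dataclass
from typing import Dict, List, Type


@dataclass
class Pet:
    pet_id: int
    name: str


@dataclass
class Cat(Pet):
    pass


@dataclass
class Dog(Pet):
    pass


# discriminator value stored in each row -> class to rebuild it as
PET_TYPES: Dict[str, Type[Pet]] = {"cat": Cat, "dog": Dog}


def from_row(row: Dict[str, object]) -> Pet:
    cls = PET_TYPES[str(row["pet_type"])]
    return cls(pet_id=int(str(row["pet_id"])), name=str(row["name"]))


# one table, one query, mixed types
rows = [
    {"pet_id": 1, "pet_type": "cat", "name": "Tom"},
    {"pet_id": 2, "pet_type": "dog", "name": "Rex"},
]
pets: List[Pet] = [from_row(r) for r in rows]
print([type(p).__name__ for p in pets])  # ['Cat', 'Dog']
```

Table inheritance, by contrast, would give `Cat` and `Dog` separate tables, which suits queries that filter to one type.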
Application Development
Virtual Environments
• virtualenv is your friend!
• mkvirtualenv (from virtualenvwrapper) is also your friend!
• pip install virtualenvwrapper
Example pinned requirements:
Flask==0.10.1
blist==1.3.6
cassandra-driver==2.1.2
rednose==0.4.1
ipdb==0.7
ipdbplugin==1.2
ipython==2.3.1
mock==1.0.1
nose==1.3.4
Every project gets its own sandboxed environment
Integrations
• Django
  • django-cassandra-engine
  • Integrates with manage.py
• Flask
  • Use @app.before_first_request
• General rule: connect post-fork (driver connections and threads don't survive a fork)
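Connecting after the fork matters because each worker process needs its own driver connections and background threads. One common pattern is to create the connection lazily and rebuild it whenever the process ID changes. The sketch below shows that pattern with a hypothetical `connect` factory standing in for `Cluster([...]).connect()`; `PerProcess` is an illustrative name, not part of any library.

```python
import os
from typing import Callable, Generic, List, Optional, TypeVar

T = TypeVar("T")


class PerProcess(Generic[T]):
    """Create a resource on first use, and again in every forked child."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._pid: Optional[int] = None
        self._value: Optional[T] = None

    def get(self) -> T:
        pid = os.getpid()
        if self._value is None or self._pid != pid:
            # first call in this process, or we are a fresh fork: connect now
            self._value = self._factory()
            self._pid = pid
        return self._value


calls: List[int] = []


def connect() -> object:
    """Hypothetical stand-in for Cluster([...]).connect()."""
    calls.append(os.getpid())
    return object()


lazy_session = PerProcess(connect)
first = lazy_session.get()
print(first is lazy_session.get(), len(calls))  # True 1
```

Request handlers would call `lazy_session.get()` instead of using a module-level session created before the server forked its workers.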
Go build stuff!
Cassandra Day Atlanta 2015: Python & Cassandra
