NoSQL Database: Apache Cassandra
www.folio3.com @folio_3
Folio3 – Overview
Who We Are
 We are a Development Partner for our customers
 Design software solutions, not just implement them
 Focus on the solution – Platform and technology agnostic
 Expertise in building applications that are mobile, social, cloud-based, and gamified
What We Do
 Areas of Focus
 Enterprise
 Custom enterprise applications
 Product development targeting the enterprise
 Mobile
 Custom mobile apps for iOS, Android, Windows Phone, BB OS
 Mobile platform (server-to-server) development
 Social Media
 CMS-based websites for consumers and enterprises (corporate, consumer, community & social networking)
 Social media platform development (enterprise & consumer)
Folio3 At a Glance
 Founded in 2005
 Over 200 full time employees
 Offices in the US, Canada, Bulgaria & Pakistan
 Palo Alto, CA
 Sofia, Bulgaria
 Karachi, Pakistan
 Toronto, Canada
Areas of Focus: Enterprise
 Automating workflows
 Cloud based solutions
 Application integration
 Platform development
 Healthcare
 Mobile Enterprise
 Digital Media
 Supply Chain
Some of Our Enterprise Clients
Areas of Focus: Mobile
 Serious enterprise applications for banks and businesses
 Fun consumer apps for app discovery,
interaction, exercise gamification and play
 Educational apps
 Augmented Reality apps
 Mobile Platforms
Some of Our Mobile Clients
Areas of Focus: Web & Social Media
 Community Sites based on
Content Management Systems
 Enterprise Social Networking
 Social Games for Facebook &
Mobile
 Companion Apps for games
Some of Our Web Clients
Agenda
 What is NOSQL?
 Motivations for NOSQL
 Brewer’s CAP Theorem
 Taxonomy of NOSQL databases
 Apache Cassandra
 Features
 Data Model
 Consistency
 Operations
 Cluster Membership
 What Does NOSQL Mean for RDBMS?
What is NOSQL?
 Refers to databases that differ from traditional relational database management systems (RDBMS)
 Distributed, flexible, horizontally scalable data stores
 Confusion with the term NOSQL
 NOSQL != No SQL (or Anti-SQL)
 NOSQL = Not Only SQL
 NOSQL is an imprecise term, since it is commonly used to mean "non-relational" databases, but the term has stuck
Motivations for NOSQL
 Classical RDBMSs are often unsuitable for today's web applications because of their limitations in:
 Performance (Latency): Variable
 Flexibility: Low
 Scalability: Variable
 Functionality
Brewer's CAP Theorem
 Consistency (C)
 Availability (A)
 Partition Tolerance (P)
 Pick any two
 Most NOSQL databases sacrifice Consistency
in favor of high Availability and Performance
Taxonomy of NOSQL
 Key/Value Stores - Distributed Hash Tables (DHT)
 Memcached, Amazon’s Dynamo, Redis, PStore
 Document Stores
 Semi structured data (stores entire documents)
 CouchDB, MongoDB, RDDB, Riak
 Graph Databases *
 Based on graph theory
 ActiveRDF, AllegroGraph, Neo4J
 Object Databases *
 Versant, Objectivity
 Column-oriented Stores
 * These are considered "soft" NOSQL databases; they are usually placed in the NOSQL category because they are "non-relational".
Column-Oriented Data Stores
 Semi-structured column-based data stores
 Stores each column separately so that aggregate operations for one column
of the entire table are significantly quicker than the traditional row storage
model
 Popular examples
 Hadoop/HBase
 Apache Cassandra
 Google's BigTable
 HyperTable
 Amazon's SimpleDB
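To make the row-versus-column distinction concrete, here is a minimal conceptual sketch in Python. It assumes a toy in-memory table with hypothetical `name` and `age` columns and only illustrates why a single-column aggregate touches less data in a column layout; it does not reflect how Cassandra or any other listed system lays out data on disk.

```python
from typing import Dict, List

# Row-oriented layout: each record keeps all of its fields together.
rows: List[Dict[str, object]] = [
    {"name": "alice", "age": 34},
    {"name": "bob", "age": 27},
    {"name": "carol", "age": 41},
]

# Column-oriented layout: each column is stored as its own contiguous list.
columns: Dict[str, list] = {
    "name": ["alice", "bob", "carol"],
    "age": [34, 27, 41],
}


def avg_age_row_store(table: List[Dict[str, object]]) -> float:
    # Every whole record is visited, even though only one field is needed.
    return sum(int(r["age"]) for r in table) / len(table)


def avg_age_column_store(table: Dict[str, list]) -> float:
    # Only the "age" column is read; the other columns are never touched.
    ages = table["age"]
    return sum(ages) / len(ages)


if __name__ == "__main__":
    print(avg_age_row_store(rows))        # 34.0
    print(avg_age_column_store(columns))  # 34.0
```

Both functions compute the same answer; the difference is how much unrelated data each layout forces the aggregate to read.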
Apache Cassandra
 Fully distributed, column-oriented data store
 Also supports MapReduce through Hadoop integration (increased performance)
 Based on Google's BigTable (Data Model) and Amazon's Dynamo
(Consistency & Partition Tolerance)
 Cassandra favors Availability and Partition tolerance (AP) while providing tunable consistency levels
History
 Developed at Facebook
 Released as open source project on Google Code in July 2008
 Became an Apache Incubator Project in March 2009
 Became a top-level Apache project in February 2010
 Facebook is rumored to have started working on its own separate version of Cassandra
Features
 Fully Distributed
 Highly Scalable
 Fault Tolerant (No single point of failure)
 Tunable Consistency (Eventually Consistent)
 Semi-structured key-value store
 High Availability
 No Referential Integrity
 No Joins
Data Model
 KeySpace (Uppermost namespace)
 Column Family / Super Column Family (analogous to a table)
 Super Column
 Column (Name, Value, Timestamp)
 Rows are referenced through keys
 Each column family is stored in separate physical files
Standard Column Family
Super Column Family
Super Column Family: Static/Static
Super Column Family: Static/Dynamic
Super Column Family: Dynamic/Static
Super Column Family: Dynamic/Dynamic
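The keyspace / column family / super column / column hierarchy can be pictured as nested maps. The Python sketch below is only a mental model, not Cassandra's storage format or a client API; the column family names `Users` and `UserAddresses`, the row keys, and the use of microsecond timestamps are all illustrative assumptions.

```python
import time
from typing import Dict, NamedTuple


class Column(NamedTuple):
    # A column is a (name, value, timestamp) triple.
    name: str
    value: str
    timestamp: int


# Standard column family: row key -> {column name -> Column}
StandardCF = Dict[str, Dict[str, Column]]

# Super column family: row key -> {super column name -> {column name -> Column}}
SuperCF = Dict[str, Dict[str, Dict[str, Column]]]


def make_column(name: str, value: str) -> Column:
    # Microsecond timestamps are an illustrative assumption used to order writes.
    return Column(name, value, int(time.time() * 1_000_000))


# Hypothetical standard column family holding one user row.
users: StandardCF = {
    "user:1": {
        "email": make_column("email", "alice@example.com"),
        "city": make_column("city", "Karachi"),
    },
}

# Hypothetical super column family grouping columns under "home" and "work".
user_addresses: SuperCF = {
    "user:1": {
        "home": {"city": make_column("city", "Karachi")},
        "work": {"city": make_column("city", "Toronto")},
    },
}

# Keyspace: the uppermost namespace, mapping column family names to their rows.
keyspace: Dict[str, dict] = {"Users": users, "UserAddresses": user_addresses}

if __name__ == "__main__":
    print(keyspace["Users"]["user:1"]["email"].value)
    print(keyspace["UserAddresses"]["user:1"]["work"]["city"].value)
```

The only difference between the two families is one extra level of nesting: a super column is itself a named map of columns.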
Apache Cassandra: Consistency
 Consistency refers to whether a system is left in a consistent state
after an operation. In distributed data systems like Cassandra, this
usually means that once a writer has written, all readers will see that
write.
 If W + R > N, you get strongly consistent behavior; that is, readers will always see the most recent write
 W is the number of replicas that must acknowledge a write before it succeeds
 R is the number of replicas that must respond to a read before it returns
 N is the replication factor (number of replicas)
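The W + R > N rule is a pigeonhole argument: if the W replicas that acknowledged a write and the R replicas consulted by a read together number more than N, the two sets must share at least one replica. The Python sketch below checks the rule against a brute-force enumeration of replica sets; the function names are hypothetical, and it models only set overlap, not timing, failures, or clock skew.

```python
from itertools import combinations


def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    # The rule from the slide: read and write sets must overlap when W + R > N.
    return w + r > n


def every_read_sees_write(n: int, w: int, r: int) -> bool:
    # Brute force: does every possible write set of size W share at least
    # one replica with every possible read set of size R?
    replicas = range(n)
    return all(
        set(ws) & set(rs)
        for ws in combinations(replicas, w)
        for rs in combinations(replicas, r)
    )


if __name__ == "__main__":
    for n, w, r in [(3, 2, 2), (3, 1, 1), (3, 3, 1), (5, 2, 3)]:
        print(n, w, r, is_strongly_consistent(n, w, r), every_read_sees_write(n, w, r))
```

For every small N the two functions agree, which is the whole content of the rule.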
Apache Cassandra: Consistency
 Relational databases provide strong consistency (ACID)
 Cassandra provides eventual consistency (BASE), meaning the database will eventually reach a consistent state
 QUORUM reads and writes give strong consistency while still allowing availability
 Q = floor(N / 2) + 1 (simple majority)
 If latency is more important than consistency, you can lower values
for either or both W and R.
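As a worked example of the quorum formula (reading N / 2 as integer division, i.e. rounded down), the lines below show why QUORUM writes combined with QUORUM reads always satisfy W + R > N, using the fact that 2⌊N/2⌋ ≥ N − 1, and what happens when W and R are lowered for latency:

```latex
Q = \left\lfloor \frac{N}{2} \right\rfloor + 1,
\qquad
W + R = 2Q = 2\left\lfloor \frac{N}{2} \right\rfloor + 2 \ge (N - 1) + 2 = N + 1 > N

N = 3:\quad Q = \lfloor 3/2 \rfloor + 1 = 2,\quad W + R = 2 + 2 = 4 > 3

N = 5:\quad Q = \lfloor 5/2 \rfloor + 1 = 3,\quad W + R = 3 + 3 = 6 > 5

N = 3,\ W = R = 1:\quad W + R = 2 \le 3 \quad\text{(a read may miss the latest write)}
```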
Apache Cassandra: Consistency Levels
 Write
 ZERO
 ANY
 ONE
 QUORUM
 ALL
 Read
 ONE
 QUORUM
 ALL
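Each level translates into how many responses the coordinator waits for. The Python sketch below encodes that mapping as a simplified model of the older Thrift-era semantics: `required_acks` is a hypothetical helper, not part of any Cassandra client library, and QUORUM uses integer division.

```python
from enum import Enum


class ConsistencyLevel(Enum):
    ZERO = "ZERO"      # writes only: the coordinator does not wait for any replica
    ANY = "ANY"        # writes only: one node, even one that merely stored a hint
    ONE = "ONE"
    QUORUM = "QUORUM"
    ALL = "ALL"


def required_acks(level: ConsistencyLevel, n: int) -> int:
    """Responses the coordinator waits for at the given level, with replication factor n."""
    if level is ConsistencyLevel.ZERO:
        return 0
    if level in (ConsistencyLevel.ANY, ConsistencyLevel.ONE):
        return 1
    if level is ConsistencyLevel.QUORUM:
        return n // 2 + 1  # simple majority
    return n  # ALL: every replica


if __name__ == "__main__":
    for level in ConsistencyLevel:
        print(level.name, required_acks(level, 3))
```

Moving down the list trades latency and availability for consistency: ALL fails if any single replica is down, while ONE succeeds as long as one replica answers.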
Write Operation
 Client sends a write request to a random node; the random node
forwards the request to the proper node (1st replica responsible for
the partition - coordinator)
 Coordinator sends requests to N replicas
 If W replicas confirm the write operation then OK
 Always writable thanks to hinted handoff: if a replica node for the key is down, Cassandra writes a hint to a live replica node indicating that the write needs to be replayed to the unavailable node once it comes back
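The write path can be sketched as a toy, single-process simulation in Python. This is an illustration under stated assumptions, not Cassandra's implementation: `Node`, `coordinate_write`, and `replay_hints` are hypothetical names, a "down" node is just a boolean flag, and conflicts are resolved by last-write-wins on timestamps.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    up: bool = True
    # key -> (value, timestamp)
    data: Dict[str, Tuple[str, int]] = field(default_factory=dict)
    # pending hints: (target node name, key, value, timestamp)
    hints: List[Tuple[str, str, str, int]] = field(default_factory=list)

    def apply(self, key: str, value: str, ts: int) -> None:
        # Last write wins: never overwrite a newer version with an older one.
        current = self.data.get(key)
        if current is None or ts >= current[1]:
            self.data[key] = (value, ts)


def coordinate_write(replicas: List[Node], hint_holder: Node,
                     key: str, value: str, ts: int, w: int) -> bool:
    """Send the write to all N replicas; report success if at least W acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.apply(key, value, ts)
            acks += 1
        else:
            # Hinted handoff: a live node stores the write on behalf of the down replica.
            hint_holder.hints.append((replica.name, key, value, ts))
    # Replicas that accepted the write keep it even when acks < W (no rollback).
    return acks >= w


def replay_hints(holder: Node, cluster: Dict[str, Node]) -> None:
    """Deliver stored hints to replicas that are back up; keep the rest for later."""
    remaining = []
    for target, key, value, ts in holder.hints:
        node = cluster[target]
        if node.up:
            node.apply(key, value, ts)
        else:
            remaining.append((target, key, value, ts))
    holder.hints = remaining


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c", up=False)
    print(coordinate_write([a, b, c], a, "user:1", "alice", 1, w=2))  # True: 2 of 3 acked
    c.up = True
    replay_hints(a, {"a": a, "b": b, "c": c})
    print(c.data)  # {'user:1': ('alice', 1)}
```

The hint lets replica `c` catch up after it recovers without the client retrying, which is what keeps the cluster "always writable".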
Read Operation
 Coordinator sends requests to N replicas, if R replicas respond then
OK
 If different versions are returned then reconcile and write back the
reconciled version (Read Repair)
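Reconciliation and read repair can be sketched in a few lines of Python. Assumptions: each replica is a plain dict of key to (value, timestamp), `None` stands for a replica that is down, the newest timestamp wins, and `coordinate_read` is a hypothetical name. For simplicity the sketch consults every live replica before answering, rather than returning as soon as R have responded.

```python
from typing import Dict, List, Optional, Tuple

Version = Tuple[str, int]               # (value, timestamp)
Replica = Optional[Dict[str, Version]]  # None models a replica that is down


def coordinate_read(replicas: List[Replica], key: str, r: int) -> Optional[str]:
    # Collect answers from every live replica; a live replica without the key answers None.
    answers = [(rep, rep.get(key)) for rep in replicas if rep is not None]
    if len(answers) < r:
        raise RuntimeError("fewer than R replicas responded")
    versions = [v for _, v in answers if v is not None]
    if not versions:
        return None
    # Reconcile: the version with the highest timestamp wins.
    newest = max(versions, key=lambda v: v[1])
    # Read repair: write the reconciled version back to every stale live replica.
    for rep, v in answers:
        if v != newest:
            rep[key] = newest
    return newest[0]


if __name__ == "__main__":
    stale = {"user:1": ("old@example.com", 1)}
    fresh = {"user:1": ("new@example.com", 2)}
    print(coordinate_read([stale, fresh, None], "user:1", r=2))  # new@example.com
    print(stale)  # {'user:1': ('new@example.com', 2)}
```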
Cluster Membership
 Gossip Protocol
 Every T seconds each node increments its heartbeat counter
and gossips to another node about the state of the cluster;
the receiving node merges the cluster info with its own copy
 Cluster state (node in/out, failure) propagates quickly: in O(log N) gossip rounds, where N is the number of nodes in the cluster
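The gossip loop above can be simulated with simplified push-style gossip in Python. This is a sketch under assumptions: real Cassandra gossip carries more state (such as generation numbers) and uses a multi-message exchange, both omitted here; `merge`, `gossip_round`, and `rounds_to_converge` are hypothetical names.

```python
import random
from typing import Dict

State = Dict[str, int]  # node name -> highest heartbeat counter seen


def merge(local: State, remote: State) -> None:
    # For every node, keep whichever heartbeat is larger (newer information wins).
    for node, beat in remote.items():
        if beat > local.get(node, -1):
            local[node] = beat


def gossip_round(states: Dict[str, State], rng: random.Random) -> None:
    names = list(states)
    for name in names:
        states[name][name] += 1  # each node increments its own heartbeat
    for name in names:
        # Push this node's view to one randomly chosen peer, which merges it.
        peer = rng.choice([n for n in names if n != name])
        merge(states[peer], states[name])


def rounds_to_converge(n: int, seed: int = 0, max_rounds: int = 1000) -> int:
    """Rounds until every node has heard of every other node (requires n >= 2)."""
    rng = random.Random(seed)
    # Initially each node knows only about itself.
    states = {f"node{i}": {f"node{i}": 0} for i in range(n)}
    for round_no in range(1, max_rounds + 1):
        gossip_round(states, rng)
        if all(len(view) == n for view in states.values()):
            return round_no
    raise RuntimeError("cluster view did not converge")


if __name__ == "__main__":
    for n in (8, 64, 256):
        print(n, rounds_to_converge(n))
```

Multiplying the cluster size grows the number of rounds only slowly, which is the practical meaning of O(log N) propagation.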
Storage Ring
 Cassandra cluster nodes are organized in a virtual ring.
 Each node has a single unique token that defines its place in the ring
and which keys it is responsible for
 Key ranges are adjusted when the nodes join or leave
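A token ring with one token per node, as the slide describes, can be sketched in Python with the standard `bisect` module. Assumptions: keys are hashed with MD5 into integer tokens, loosely in the spirit of a random partitioner; replicas are simply the next nodes clockwise; `Ring` and its methods are hypothetical names, not Cassandra's API.

```python
import bisect
import hashlib
from typing import List


def token_for(key: str) -> int:
    # MD5-derived integer token, loosely modelled on a random partitioner (assumption).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Nodes sorted by token; each node owns the range (previous token, own token]."""

    def __init__(self) -> None:
        self._tokens: List[int] = []
        self._nodes: List[str] = []

    def add_node(self, name: str, token: int) -> None:
        # A joining node takes over part of its successor's key range.
        i = bisect.bisect_left(self._tokens, token)
        self._tokens.insert(i, token)
        self._nodes.insert(i, name)

    def remove_node(self, name: str) -> None:
        # A leaving node's range is absorbed by its successor.
        i = self._nodes.index(name)
        del self._tokens[i]
        del self._nodes[i]

    def replicas_for_token(self, token: int, n: int) -> List[str]:
        if not self._nodes:
            raise ValueError("ring is empty")
        # The first node clockwise whose token is >= the key's token owns it.
        start = bisect.bisect_left(self._tokens, token) % len(self._nodes)
        # The next n - 1 distinct nodes clockwise hold the other replicas.
        count = min(n, len(self._nodes))
        return [self._nodes[(start + k) % len(self._nodes)] for k in range(count)]

    def replicas(self, key: str, n: int) -> List[str]:
        return self.replicas_for_token(token_for(key), n)


if __name__ == "__main__":
    ring = Ring()
    for name, token in [("A", 100), ("B", 200), ("C", 300)]:
        ring.add_node(name, token)
    print(ring.replicas_for_token(150, 2))  # ['B', 'C']
    print(ring.replicas_for_token(350, 2))  # ['A', 'B'] (wraps around)
```

Only the neighbours of a joining or leaving node change ownership, which is why key ranges can be adjusted without reshuffling the whole cluster.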
Apache Cassandra: MySQL Comparison
 MySQL (> 50 GB data)
 Read Average: ~ 350 ms
 Write Average: ~ 300 ms
 Cassandra (> 50 GB data)
 Read Average: 15 ms
 Write Average: 0.12 ms
Apache Cassandra: Client API
 Low level API
 Thrift
 High Level API
 Java
 Hector, Pelops, Kundera
 .NET
 FluentCassandra, Aquiles
 Python
 Telephus, Pycassa
 PHP
 phpcassa, SimpleCassie
Apache Cassandra: Where to Use?
 Use Cassandra if you need
 High write throughput
 Near-linear scalability
 Automated replication and fault tolerance
 And if you can tolerate
 Weaker (eventual) consistency
 Missing RDBMS features
Apache Cassandra: Users
 Facebook (of course)
 To power inbox search (previously)
 Twitter
 To handle user relationships, analytics (but not for tweets)
 Digg & Reddit
 Both use Cassandra to handle user comments and votes
 Rackspace
 IBM
 To build a scalable email system
 Cisco's WebEx
 To store user feed and activity in near real time
What does NOSQL mean for the future of RDBMS?
 No worries! RDBMSs are here to stay for the foreseeable future
 NOSQL data stores can be used in combination with RDBMS in some
situations
 NOSQL still has a long way to go to reach the widespread (mainstream) use and support of RDBMSs
Weaknesses of NOSQL
 No or limited support for complex queries
 No multi-operation transactions (only individual operations are atomic)
 No standard interface for NOSQL databases (like SQL in relational
databases)
 No or limited administrative features available for NOSQL databases
 Not suitable (yet) for mainstream use
Why Still Use RDBMS?
 All the weaknesses of NOSQL
 Relational databases are widely used and understood
 RDBMS DBAs and developers are readily available in the market
 For big businesses, relational databases are a safe choice because those businesses have already invested heavily in relational technology
 Many database design and development tools available
Contact
 For more details about our services, please get in touch with us.
contact@folio3.com
US Office: (408) 365-4638
www.folio3.com
