SlideShare a Scribd company logo
Distributed Data Stores and
No SQL Databases
S. Sudarshan
IIT Bombay
Parallel Databases and Data Stores
 Relational Databases – mainstay of business
 Web-based applications caused spikes
 Especially true for public-facing e-Commerce sites
 Many application servers, one database
 Easy to parallelize application servers to 1000s of
servers, harder to parallelize databases to same scale
 First solution: memcache or other caching
mechanisms to reduce database access
Scaling Up
 What if the dataset is huge, and very high
number of transactions per second
 Use multiple servers to host database
 Parallel databases have been around for a
while
 But expensive, and designed for decision
support not OLTP
Scaling RDBMS – Master/Slave
 Master-Slave
 All writes are written to the master. All reads
performed against the replicated slave
databases
 Good for mostly read, very few update
applications
 Critical reads may be incorrect as writes may
not have been propagated down
 Large data sets can pose problems as
master needs to duplicate data to slaves
Scaling RDBMS - Partitioning
 Partitioning
 Divide the database across many machines
 E.g. hash or range partitioning
 Handled transparently by parallel databases
 but they are expensive
 “Sharding”
 Divide data amongst many cheap databases
(MySQL/PostgreSQL)
 Manage parallel access in the application
 Scales well for both reads and writes
 Not transparent, application needs to be partition-aware
What is NoSQL?
 Stands for Not Only SQL
 Class of non-relational data storage systems
 E.g. BigTable, Dynamo, PNUTS/Sherpa, ..
 Usually do not require a fixed table schema nor
do they use the concept of joins
 All NoSQL offerings relax one or more of the
ACID properties (will talk about the CAP
theorem)
 Not a backlash/rebellion against RDBMS
 SQL is a rich query language that cannot be
rivaled by the current list of NoSQL offerings
Why Now?
 Explosion of social media sites (Facebook,
Twitter) with large data needs
 Explosion of storage needs in large web
sites such as Google, Yahoo
 Much of the data is not files
 Rise of cloud-based solutions such as
Amazon S3 (simple storage solution)
 Shift to dynamically-typed data with frequent
schema changes
 Open-source community
Distributed Key-Value Data Stores
 Distributed key-value data storage systems allow
key-value pairs to be stored (and retrieved on key)
in a massively parallel system
 E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon
Dynamo, ..
 Partitioning, high availability etc completely
transparent to application
 Sharding systems and key-value stores don’t
support many relational features
 No join operations (except within partition)
 No referential integrity constraints across partitions
 etc.
Typical NoSQL API
 Basic API access:
 get(key) -- Extract the value given a key
 put(key, value) -- Create or update the value
given its key
 delete(key) -- Remove the key and its
associated value
 execute(key, operation, parameters) -- Invoke
an operation to the value (given its key)
which is a special data structure (e.g. List,
Set, Map .... etc).
Flexible Data Model
ColumnFamily: Rockets
Key Value
1
2
3
Name Value
toon
inventoryQty
brakes
Rocket-Powered Roller Skates
Ready, Set, Zoom
5
false
name
Name Value
toon
inventoryQty
brakes
Little Giant Do-It-Yourself Rocket-Sled Kit
Beep Prepared
4
false
Name Value
toon
inventoryQty
wheels
Acme Jet Propelled Unicycle
Hot Rod and Reel
1
1
name
name
NoSQL Data Storage: Classification
 Uninterpreted key/value or ‘the big hash
table’.
 Amazon S3 (Dynamo)
 Flexible schema
 BigTable, Cassandra, HBase (ordered keys,
semi-structured data),
 Sherpa/PNuts (unordered keys, JSON)
 MongoDB (based on JSON)
 CouchDB (name/value in text)
PNUTS Data Storage Architecture
CAP Theorem
 Three properties of a system
 Consistency (all copies have same value)
 Availability (system can run even if parts have failed)
 Partitions (network can break into two or more parts,
each with active systems that can’t talk to other parts)
 Brewer’s CAP “Theorem”: You can have at most
two of these three properties for any system
 Very large systems will partition at some point
 Choose one of consistency or availablity
 Traditional database choose consistency
 Most Web applications choose availability
 Except for specific parts such as order processing
Availability
 Traditionally, thought of as the
server/process available five 9’s (99.999 %).
 However, for large node system, at almost
any point in time there’s a good chance that
a node is either down or there is a network
disruption among the nodes.
 Want a system that is resilient in the face of
network disruption
Eventual Consistency
 When no updates occur for a long period of time, eventually
all updates will propagate through the system and all the
nodes will be consistent
 For a given accepted update and a given node, eventually
either the update reaches the node or the node is removed
from service
 Known as BASE (Basically Available, Soft state, Eventual
consistency), as opposed to ACID
 Soft state: copies of a data item may be inconsistent
 Eventually Consistent – copies becomes consistent at some
later time if there are no more updates to that data item
Common Advantages
 Cheap, easy to implement (open source)
 Data are replicated to multiple nodes (therefore
identical and fault-tolerant) and can be
partitioned
 When data is written, the latest version is on at least
one node and then replicated to other nodes
 Down nodes easily replaced
 No single point of failure
 Easy to distribute
 Don't require a schema
What does NoSQL Not Provide?
 Joins
 Group by
 But PNUTS provides interesting
materialized view approach to
joins/aggregation.
 ACID transactions
 SQL
 Integration with applications that are based
on SQL
Should I be using NoSQL Databases?
 NoSQL Data storage systems makes sense for
applications that need to deal with very very large
semi-structured data
 Log Analysis
 Social Networking Feeds
 Most of us work on organizational databases,
which are not that large and have low
update/query rates
 regular relational databases are THE correct
solution for such applications
Further Reading
 Lots of material on the Web

E.g. nice presentation on NoSQL by Perry Hoekstra
E.g. nice presentation on NoSQL by Perry Hoekstra
(Perficient)
(Perficient)
 Some material in this talk is from above presentation
Some material in this talk is from above presentation

Use a search engine to find information on data
Use a search engine to find information on data
storage systems such as
storage systems such as
 BigTable (Google), Dynamo (Amazon), Cassandra
BigTable (Google), Dynamo (Amazon), Cassandra
(Facebook/Apache), Pnuts/Sherpa (Yahoo),
(Facebook/Apache), Pnuts/Sherpa (Yahoo),
CouchDB, MongoDB, …
CouchDB, MongoDB, …
 Several of above are open source
Several of above are open source

More Related Content

Similar to No SQL Databases as modern database concepts (20)

PPTX
NOSQL PRESENTATION ON INTRRODUCTION Intro.pptx
plvdravikumarit
 
PPTX
NoSQL(NOT ONLY SQL)
Rahul P
 
PPT
6269441.ppt
Swapna Jk
 
PPTX
NOSQL
akbarashaikh
 
PPTX
No sq lv2
Nusrat Sharmin
 
PDF
NOSQL- Presentation on NoSQL
Ramakant Soni
 
PDF
NOSQL in big data is the not only structure langua.pdf
ajajkhan16
 
PDF
Nosql Presentation.pdf for DBMS understanding
HUSNAINAHMAD39
 
PPTX
To SQL or NoSQL, that is the question
Krishnakumar S
 
PPTX
No sql database
vishal gupta
 
PPTX
NoSQL databases - An introduction
Pooyan Mehrparvar
 
PPTX
cours database pour etudiant NoSQL (1).pptx
ssuser1fde9c
 
PPTX
NoSQL.pptx
RithikRaj25
 
PDF
1. Lecture1_NOSQL_Introduction.pdf
ShaimaaMohamedGalal
 
PPTX
Master.pptx
KarthikR780430
 
PPTX
NoSQL and Couchbase
Sangharsh agarwal
 
PPTX
NoSQL A brief look at Apache Cassandra Distributed Database
Joe Alex
 
PPT
No sql
Murat Çakal
 
PDF
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
IJCERT JOURNAL
 
PDF
NoSQL Basics - A Quick Tour
Bikram Sinha. MBA, PMP
 
NOSQL PRESENTATION ON INTRRODUCTION Intro.pptx
plvdravikumarit
 
NoSQL(NOT ONLY SQL)
Rahul P
 
6269441.ppt
Swapna Jk
 
No sq lv2
Nusrat Sharmin
 
NOSQL- Presentation on NoSQL
Ramakant Soni
 
NOSQL in big data is the not only structure langua.pdf
ajajkhan16
 
Nosql Presentation.pdf for DBMS understanding
HUSNAINAHMAD39
 
To SQL or NoSQL, that is the question
Krishnakumar S
 
No sql database
vishal gupta
 
NoSQL databases - An introduction
Pooyan Mehrparvar
 
cours database pour etudiant NoSQL (1).pptx
ssuser1fde9c
 
NoSQL.pptx
RithikRaj25
 
1. Lecture1_NOSQL_Introduction.pdf
ShaimaaMohamedGalal
 
Master.pptx
KarthikR780430
 
NoSQL and Couchbase
Sangharsh agarwal
 
NoSQL A brief look at Apache Cassandra Distributed Database
Joe Alex
 
No sql
Murat Çakal
 
SURVEY ON IMPLEMANTATION OF COLUMN ORIENTED NOSQL DATA STORES ( BIGTABLE & CA...
IJCERT JOURNAL
 
NoSQL Basics - A Quick Tour
Bikram Sinha. MBA, PMP
 

More from debasisdas225831 (10)

PPT
javascript arrays in details for udergaduate studenets .ppt
debasisdas225831
 
PPT
C++ STL standard template librairy in OOPs.ppt
debasisdas225831
 
PPTX
slides11-objects_and_classes in python.pptx
debasisdas225831
 
PPT
Introduction to HTML for under-graduate studnets
debasisdas225831
 
PPT
Exception and Error Handling in C++ - A detailed Presentation
debasisdas225831
 
PPT
GraphTerminology in Data Structure using C++
debasisdas225831
 
PPTX
Online Food Ordering System Project Presentation
debasisdas225831
 
PPT
Supply Chain Management in Business Technology
debasisdas225831
 
PPT
How to calculate complexity in Data Structure
debasisdas225831
 
PPT
Introduction to Hashing in Data Structure using C++
debasisdas225831
 
javascript arrays in details for udergaduate studenets .ppt
debasisdas225831
 
C++ STL standard template librairy in OOPs.ppt
debasisdas225831
 
slides11-objects_and_classes in python.pptx
debasisdas225831
 
Introduction to HTML for under-graduate studnets
debasisdas225831
 
Exception and Error Handling in C++ - A detailed Presentation
debasisdas225831
 
GraphTerminology in Data Structure using C++
debasisdas225831
 
Online Food Ordering System Project Presentation
debasisdas225831
 
Supply Chain Management in Business Technology
debasisdas225831
 
How to calculate complexity in Data Structure
debasisdas225831
 
Introduction to Hashing in Data Structure using C++
debasisdas225831
 
Ad

Recently uploaded (20)

PDF
Stokey: A Jewish Village by Rachel Kolsky
History of Stoke Newington
 
PDF
Geographical diversity of India short notes by sandeep swamy
Sandeep Swamy
 
PPTX
Quarter 1_PPT_PE & HEALTH 8_WEEK 3-4.pptx
ronajadolpnhs
 
PDF
Chapter-V-DED-Entrepreneurship: Institutions Facilitating Entrepreneurship
Dayanand Huded
 
PPTX
How to Handle Salesperson Commision in Odoo 18 Sales
Celine George
 
PDF
Dimensions of Societal Planning in Commonism
StefanMz
 
PPTX
PPT-Q1-WK-3-ENGLISH Revised Matatag Grade 3.pptx
reijhongidayawan02
 
PDF
Characteristics, Strengths and Weaknesses of Quantitative Research.pdf
Thelma Villaflores
 
PDF
Governor Josh Stein letter to NC delegation of U.S. House
Mebane Rash
 
PDF
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
PDF
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
PDF
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
PDF
Women's Health: Essential Tips for Every Stage.pdf
Iftikhar Ahmed
 
PPTX
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
PDF
Mahidol_Change_Agent_Note_2025-06-27-29_MUSEF
Tassanee Lerksuthirat
 
PPT
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
PPTX
PPT-Q1-WEEK-3-SCIENCE-ERevised Matatag Grade 3.pptx
reijhongidayawan02
 
PPTX
EDUCATIONAL MEDIA/ TEACHING AUDIO VISUAL AIDS
Sonali Gupta
 
PDF
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
PPTX
Controller Request and Response in Odoo18
Celine George
 
Stokey: A Jewish Village by Rachel Kolsky
History of Stoke Newington
 
Geographical diversity of India short notes by sandeep swamy
Sandeep Swamy
 
Quarter 1_PPT_PE & HEALTH 8_WEEK 3-4.pptx
ronajadolpnhs
 
Chapter-V-DED-Entrepreneurship: Institutions Facilitating Entrepreneurship
Dayanand Huded
 
How to Handle Salesperson Commision in Odoo 18 Sales
Celine George
 
Dimensions of Societal Planning in Commonism
StefanMz
 
PPT-Q1-WK-3-ENGLISH Revised Matatag Grade 3.pptx
reijhongidayawan02
 
Characteristics, Strengths and Weaknesses of Quantitative Research.pdf
Thelma Villaflores
 
Governor Josh Stein letter to NC delegation of U.S. House
Mebane Rash
 
Biological Bilingual Glossary Hindi and English Medium
World of Wisdom
 
Isharyanti-2025-Cross Language Communication in Indonesian Language
Neny Isharyanti
 
Exploring the Different Types of Experimental Research
Thelma Villaflores
 
Women's Health: Essential Tips for Every Stage.pdf
Iftikhar Ahmed
 
Universal immunization Programme (UIP).pptx
Vishal Chanalia
 
Mahidol_Change_Agent_Note_2025-06-27-29_MUSEF
Tassanee Lerksuthirat
 
Talk on Critical Theory, Part One, Philosophy of Social Sciences
Soraj Hongladarom
 
PPT-Q1-WEEK-3-SCIENCE-ERevised Matatag Grade 3.pptx
reijhongidayawan02
 
EDUCATIONAL MEDIA/ TEACHING AUDIO VISUAL AIDS
Sonali Gupta
 
0725.WHITEPAPER-UNIQUEWAYSOFPROTOTYPINGANDUXNOW.pdf
Thomas GIRARD, MA, CDP
 
Controller Request and Response in Odoo18
Celine George
 
Ad

No SQL Databases as modern database concepts

  • 1. Distributed Data Stores and No SQL Databases S. Sudarshan IIT Bombay
  • 2. Parallel Databases and Data Stores  Relational Databases – mainstay of business  Web-based applications caused spikes  Especially true for public-facing e-Commerce sites  Many application servers, one database  Easy to parallelize application servers to 1000s of servers, harder to parallelize databases to same scale  First solution: memcache or other caching mechanisms to reduce database access
  • 3. Scaling Up  What if the dataset is huge, and very high number of transactions per second  Use multiple servers to host database  Parallel databases have been around for a while  But expensive, and designed for decision support not OLTP
  • 4. Scaling RDBMS – Master/Slave  Master-Slave  All writes are written to the master. All reads performed against the replicated slave databases  Good for mostly read, very few update applications  Critical reads may be incorrect as writes may not have been propagated down  Large data sets can pose problems as master needs to duplicate data to slaves
  • 5. Scaling RDBMS - Partitioning  Partitioning  Divide the database across many machines  E.g. hash or range partitioning  Handled transparently by parallel databases  but they are expensive  “Sharding”  Divide data amongst many cheap databases (MySQL/PostgreSQL)  Manage parallel access in the application  Scales well for both reads and writes  Not transparent, application needs to be partition-aware
  • 6. What is NoSQL?  Stands for Not Only SQL  Class of non-relational data storage systems  E.g. BigTable, Dynamo, PNUTS/Sherpa, ..  Usually do not require a fixed table schema nor do they use the concept of joins  All NoSQL offerings relax one or more of the ACID properties (will talk about the CAP theorem)  Not a backlash/rebellion against RDBMS  SQL is a rich query language that cannot be rivaled by the current list of NoSQL offerings
  • 7. Why Now?  Explosion of social media sites (Facebook, Twitter) with large data needs  Explosion of storage needs in large web sites such as Google, Yahoo  Much of the data is not files  Rise of cloud-based solutions such as Amazon S3 (simple storage solution)  Shift to dynamically-typed data with frequent schema changes  Open-source community
  • 8. Distributed Key-Value Data Stores  Distributed key-value data storage systems allow key-value pairs to be stored (and retrieved on key) in a massively parallel system  E.g. Google BigTable, Yahoo! Sherpa/PNUTS, Amazon Dynamo, ..  Partitioning, high availability etc completely transparent to application  Sharding systems and key-value stores don’t support many relational features  No join operations (except within partition)  No referential integrity constraints across partitions  etc.
  • 9. Typical NoSQL API  Basic API access:  get(key) -- Extract the value given a key  put(key, value) -- Create or update the value given its key  delete(key) -- Remove the key and its associated value  execute(key, operation, parameters) -- Invoke an operation to the value (given its key) which is a special data structure (e.g. List, Set, Map .... etc).
  • 10. Flexible Data Model ColumnFamily: Rockets Key Value 1 2 3 Name Value toon inventoryQty brakes Rocket-Powered Roller Skates Ready, Set, Zoom 5 false name Name Value toon inventoryQty brakes Little Giant Do-It-Yourself Rocket-Sled Kit Beep Prepared 4 false Name Value toon inventoryQty wheels Acme Jet Propelled Unicycle Hot Rod and Reel 1 1 name name
  • 11. NoSQL Data Storage: Classification  Uninterpreted key/value or ‘the big hash table’.  Amazon S3 (Dynamo)  Flexible schema  BigTable, Cassandra, HBase (ordered keys, semi-structured data),  Sherpa/PNuts (unordered keys, JSON)  MongoDB (based on JSON)  CouchDB (name/value in text)
  • 12. PNUTS Data Storage Architecture
  • 13. CAP Theorem  Three properties of a system  Consistency (all copies have same value)  Availability (system can run even if parts have failed)  Partitions (network can break into two or more parts, each with active systems that can’t talk to other parts)  Brewer’s CAP “Theorem”: You can have at most two of these three properties for any system  Very large systems will partition at some point  Choose one of consistency or availablity  Traditional database choose consistency  Most Web applications choose availability  Except for specific parts such as order processing
  • 14. Availability  Traditionally, thought of as the server/process available five 9’s (99.999 %).  However, for large node system, at almost any point in time there’s a good chance that a node is either down or there is a network disruption among the nodes.  Want a system that is resilient in the face of network disruption
  • 15. Eventual Consistency  When no updates occur for a long period of time, eventually all updates will propagate through the system and all the nodes will be consistent  For a given accepted update and a given node, eventually either the update reaches the node or the node is removed from service  Known as BASE (Basically Available, Soft state, Eventual consistency), as opposed to ACID  Soft state: copies of a data item may be inconsistent  Eventually Consistent – copies becomes consistent at some later time if there are no more updates to that data item
  • 16. Common Advantages  Cheap, easy to implement (open source)  Data are replicated to multiple nodes (therefore identical and fault-tolerant) and can be partitioned  When data is written, the latest version is on at least one node and then replicated to other nodes  Down nodes easily replaced  No single point of failure  Easy to distribute  Don't require a schema
  • 17. What does NoSQL Not Provide?  Joins  Group by  But PNUTS provides interesting materialized view approach to joins/aggregation.  ACID transactions  SQL  Integration with applications that are based on SQL
  • 18. Should I be using NoSQL Databases?  NoSQL Data storage systems makes sense for applications that need to deal with very very large semi-structured data  Log Analysis  Social Networking Feeds  Most of us work on organizational databases, which are not that large and have low update/query rates  regular relational databases are THE correct solution for such applications
  • 19. Further Reading  Lots of material on the Web  E.g. nice presentation on NoSQL by Perry Hoekstra E.g. nice presentation on NoSQL by Perry Hoekstra (Perficient) (Perficient)  Some material in this talk is from above presentation Some material in this talk is from above presentation  Use a search engine to find information on data Use a search engine to find information on data storage systems such as storage systems such as  BigTable (Google), Dynamo (Amazon), Cassandra BigTable (Google), Dynamo (Amazon), Cassandra (Facebook/Apache), Pnuts/Sherpa (Yahoo), (Facebook/Apache), Pnuts/Sherpa (Yahoo), CouchDB, MongoDB, … CouchDB, MongoDB, …  Several of above are open source Several of above are open source

Editor's Notes

  • #9: -> http://horicky.blogspot.com/2009/11/nosql-patterns.html
  • #11: .
  • #17: -> No JDBC -> Data integrity at the application layer