Dan Gunter, Computational Research Division, Lawrence Berkeley National Laboratory
Introduction About this talk It is not “hands-on” (sorry) Most of it is history and overview It’s about databases, not explicitly “clouds” Relation to cloud computing Cloud computing and scalable databases go hand-in-hand There are a  lot  of open-source NOSQL projects right now Understanding what they do, and what features of the commercial implementations they’re imitating, gives insight into scalability issues for distributed computing in general
Terminology: NOSQL and “Schemaless” First: not terribly important or deep in meaning But “NOSQL” has gained currency Original, and best, meaning:  Not Only SQL Wikipedia credits it to Carlo Strozzi in 1998, re-introduced in 2009 by Eric Evans of Rackspace May use non-SQL, typically simpler, access methods Don’t need to follow all the rules for RDBMS’es Lends itself to “No (use of) SQL”, but this is misleading Also referred to as “schemaless” databases Implies dynamic schema evolution
NOSQL past and present Pre-RDBMS RDBMS era NOSQL
Pre-relational structured storage systems
Hierarchical storage and sparse multi-dimensional arrays
MUMPS (Massachusetts General Hospital Utility Multi-Programming System), later ANSI M
  sparse multi-dimensional array
  global variables, prefixed with “^”, are automatically persisted: ^Car("Door","Color") = "Blue"
“Pick” OS/database
  everything is a hash table
IBM Information Management System (IMS), [DB1]
Computer Systems News, 11/28/83
The relational model
Introduced with E. F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks”
Relational algebra provided declarative means of reasoning about data sets
SQL is loosely based on relational algebra
[Figure: a relation R (table) with a heading of attributes A1 ... An (columns, unordered) above tuples of values Value1 ... Valuen (rows, unordered); R is the relation variable (table name)]
Recent NOSQL database products
Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable, SimpleDB
Document Store: CouchDB, MongoDB, Lotus Domino
Graph DB: Neo4j, FlockDB, InfiniteGraph
Key/Value Store: Mnesia, Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak
Why NOSQL?
Renewed interest originated with global internet companies (Google, Amazon, Yahoo!, Facebook, etc.) that hit limitations of standard RDBMS solutions for one or more of:
  Extremely high transaction rates
  Dynamic analysis of huge volumes of data
  Rapidly evolving and/or semi-structured data
At the same time, these companies – unlike the financial and health services industries using M and friends –
  did not particularly need “ACID” transactional guarantees
  didn’t want to run z/OS on mainframes
  and had to deal with the ugly reality of distributed computing: networks break your $&#!
CAP Theorem
Introduced by Eric Brewer in a PODC keynote in July 2000, thus also known as “Brewer’s Theorem”
CAP = Consistency, Availability, Partition-tolerance
Theorem states that in any “shared data” system, i.e. any distributed system, you can have at most 2 out of 3 of CAP (at the same time)
This was later proved formally (w/asynchronous model)
Three possibilities:
  Forfeit partition-tolerance: single-site databases, cluster databases, LDAP
  Forfeit availability: distributed databases w/pessimistic locking, majority protocols
  Forfeit consistency: Coda, web caching, DNS, Dynamo
[Slide annotation: “All robust distributed systems live here”]
CAP, ACID, and BASE
RDBMS systems and research focus on ACID: Atomicity, Consistency, Isolation, and Durability
  concurrent operations act as if they are serialized
Brewer’s point is that this is one end of a spectrum, one that sacrifices Partition-tolerance and Availability for Consistency
So, at the other end of the spectrum we have BASE: Basically Available Soft-state with Eventual consistency
  Stale data may be returned
  Optimistic locking (e.g., versioned writes)
  Simpler, faster, easier evolution
[Figure: a spectrum running from ACID to BASE]
Pioneers Google BigTable Amazon Dynamo These implementations are  not  publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
Google BigTable Internal Google back-end, scaling to thousands of nodes, for web indexing,  Google Earth, Google Finance Scales to petabytes of data, with highly varied data size & latency requirements Data model is (3D) sparse, multi-dimensional, sorted map (row_key, column_key, timestamp) -> string Technologies: Google File System, to store data across 1000’s of nodes 3-level indexing with  Tablets SSTable  for efficient lookup and high throughput Distributed locking with  Chubby
BigTable’s Data Model
“Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require.” - Robin Harris, StorageMojo (blog), 2006-09-08
[Figure: one row of a sparse table, named “com.cnn.www”. The “contents:” column holds three “<html>...” versions at timestamps t3, t5, and t6; the “anchor:cnnsi.com” column holds “CNN” and the “anchor:my.look.ca” column holds “CNN.com”.]
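The (row_key, column_key, timestamp) -> string map can be pictured with a small in-memory sketch. This is only an illustration of the data model; the class and method names (`SparseTable`, `put`, `get`, `scan`) are made up here and are not Bigtable's API, and the stored values are placeholders.

```python
from bisect import insort


class SparseTable:
    """Toy (row_key, column_key, timestamp) -> string map. Hypothetical API."""

    def __init__(self):
        # {row_key: {column_key: [(timestamp, value), ...] kept sorted}}
        self._rows = {}

    def put(self, row_key, column_key, timestamp, value):
        versions = self._rows.setdefault(row_key, {}).setdefault(column_key, [])
        insort(versions, (timestamp, value))

    def get(self, row_key, column_key, timestamp=None):
        """Newest value at or before `timestamp` (newest overall if None)."""
        versions = self._rows.get(row_key, {}).get(column_key, [])
        for ts, value in reversed(versions):
            if timestamp is None or ts <= timestamp:
                return value
        return None

    def scan(self, start_row, end_row):
        """Row keys in sorted order within [start_row, end_row)."""
        return [k for k in sorted(self._rows) if start_row <= k < end_row]


table = SparseTable()
table.put("com.cnn.www", "contents:", 3, "<html>v3")
table.put("com.cnn.www", "contents:", 6, "<html>v6")
table.put("com.cnn.www", "contents:", 5, "<html>v5")
table.put("com.cnn.www", "anchor:cnnsi.com", 9, "CNN")
```

Only cells that were written take up space, which is what "sparse" means here; keeping rows sorted by key is what makes range scans over neighbouring rows cheap.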
Tablets and SSTables
Tablets represent contiguous groups of rows
  Automatically split when they grow too big
  One “tablet server” holds many tablets
3-level indexing scheme similar to B+-tree
  Root tablet -> Metadata tablets -> Data (leaf) tablets
  With 128MB metadata tablets, can address 2^34 leaves
Client communicates directly with tablet server, so data does not go through root (i.e. locate, then transfer)
  Client also caches information
Values written to memory, to disk in a commit log; periodically dumped into read-only SSTables.
  Better throughput at the expense of some latency
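The 2^34 figure can be reproduced with a little arithmetic. The slide gives only the 128MB tablet size; the roughly 1 KB per metadata row used below is an assumption taken from the Bigtable paper, not from this talk.

```python
METADATA_TABLET_BYTES = 128 * 2**20  # 128 MB, from the slide
METADATA_ROW_BYTES = 2**10           # ~1 KB per row: assumption (Bigtable paper)

# Each metadata tablet can describe this many child tablets.
rows_per_metadata_tablet = METADATA_TABLET_BYTES // METADATA_ROW_BYTES

# The root tablet points at metadata tablets, and each of those points at
# data (leaf) tablets, so the two index levels multiply.
addressable_leaves = rows_per_metadata_tablet ** 2
```

Here `rows_per_metadata_tablet` is 2^17 and `addressable_leaves` is 2^17 x 2^17 = 2^34.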
Use of Bloom Filters to optimize lookups
Review: What is a Bloom filter?
  Can test whether an element is a member of a set
  probabilistic: can only say “no” with certainty
Here, tests if an SSTable has a row/column pair
  NO: Stop
  YES: Need to load & retrieve data anyways
Useful optimization in this space..
[Figure: a Bloom filter bit array into which x, y, and z have been hashed; w is not in {x, y, z} because it hashes to one position with a 0]
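A minimal Bloom filter sketch follows, to make the "no with certainty, yes only maybe" behaviour concrete. The sizes, the salted SHA-256 hashing, and the key format are illustrative assumptions, not what BigTable uses.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # Any 0 bit means "definitely absent"; all 1s means only "maybe".
        return all(self.bits[pos] for pos in self._positions(item))


# Hypothetical row/column keys held by one SSTable.
sstable_filter = BloomFilter()
for key in ("row1/contents:", "row2/contents:", "row3/anchor:x"):
    sstable_filter.add(key)
```

A reader would call `might_contain` before touching the SSTable on disk: a `False` skips the read entirely, while a `True` still requires loading the data, since it may be a false positive.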
Chubby and Paxos
Chubby is a distributed locking service. Requests go to the current Master.
If the Master fails, Paxos is used to elect a new one
Each “DB” is a replica
Each server runs on its own host
Google tends to run 5 servers, with only one being the “master” at any one time
[Figure: five Chubby servers, each with its own DB replica; one server is marked Master]
What about CAP? For bookkeeping tasks, Chubby’s replication allows tolerance of node failures ( P ) and consistency ( C ) at the price of availability ( A ), during time to elect a new master and synchronize the replicas. Tablets have “relaxed consistency” of storage, GFS: A single master that maps files to servers Multiple replicas of the data Versioned writes Checksums to detect corruption (with periodic handshakes)
Amazon’s Dynamo
Used by Amazon’s “core services”, for very high A and P at the price of C (“eventual consistency”)
Data is stored and retrieved solely by key (key/value store)
Techniques used:
  Consistent hashing – for partitioning
  Vector clocks – to allow MVCC and read repairs rather than write contention
  Merkle trees – a data structure that can diff large amounts of data quickly using a tree of hash values
  Gossip – a decentralized information sharing approach that allows clusters to be self-maintaining
Techniques not new, but their synthesis at this scale, in a real system, was
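Of the techniques listed, vector clocks are the easiest to show in a few lines. The sketch below represents a clock as a plain dict of per-node counters; the function names are hypothetical and the node names "A" and "B" are placeholders.

```python
def increment(clock, node):
    """Return a copy of `clock` with `node`'s counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock `a` has seen every update that clock `b` has."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    a_ge_b, b_ge_a = descends(a, b), descends(b, a)
    if a_ge_b and b_ge_a:
        return "equal"
    if a_ge_b:
        return "a supersedes b"
    if b_ge_a:
        return "b supersedes a"
    return "concurrent"  # neither saw the other's write: needs reconciliation


v1 = increment({}, "A")   # {"A": 1}
v2 = increment(v1, "A")   # {"A": 2}, written through node A again
v3 = increment(v1, "B")   # {"A": 1, "B": 1}, written through node B
```

`compare(v2, v1)` reports that v2 supersedes v1, so the older version can be dropped; `compare(v2, v3)` reports a conflict, which is the case a read repair has to resolve instead of blocking the writers.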
Dynamo data partitioning and replication
[Figure: hash ring using consistent hashing. Each host “node” owns several virtual nodes spread around the ring. An item hashes to a spot on the ring; labels mark the coordinator node and the replicas for that item.]
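The ring in the figure can be sketched as a sorted list of virtual-node positions. This is a simplified illustration, assuming SHA-256 for placement and 8 virtual nodes per host; the class name, host names, and key are made up, and Dynamo's real placement scheme has more detail than this.

```python
import bisect
import hashlib


def ring_position(key):
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class HashRing:
    def __init__(self, hosts, vnodes_per_host=8):
        # Each host appears on the ring several times as "virtual nodes".
        self._ring = sorted(
            (ring_position(f"{host}#{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )
        self._positions = [pos for pos, _ in self._ring]

    def preference_list(self, key, n=3):
        """Walk clockwise from the key's spot, collecting n distinct hosts."""
        start = bisect.bisect_right(self._positions, ring_position(key))
        found = []
        for step in range(len(self._ring)):
            _, host = self._ring[(start + step) % len(self._ring)]
            if host not in found:
                found.append(host)
            if len(found) == n:
                break
        return found


hosts = ["host-a", "host-b", "host-c", "host-d"]
ring = HashRing(hosts)
prefs = ring.preference_list("cart:42")  # first entry acts as coordinator
```

The point of the scheme is that membership changes move few keys: dropping a host that is not in a key's preference list leaves that list unchanged.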
Eventual consistency and sloppy quorum
R = Number of healthy nodes from the preference list (roughly, list of “next” nodes on hash ring) needed for a read
W = Number of healthy nodes from preference list needed for a write
N = number of replicas of each data item
You can tune your performance
  R << N, high read availability
  W << N, high write availability
  R + W > N, consistent, but sloppy quorum
  R + W < N, at best, eventual consistency
Hinted handoff keeps track of the data “missed” by nodes that go down, and updates them when they come back online
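Why R + W > N matters can be checked by brute force: a read is guaranteed to see the latest write only if every possible set of R readers shares at least one node with every possible set of W writers. The sketch below assumes a fixed set of N replicas (which a sloppy quorum does not strictly guarantee, hence "sloppy"); the function name is hypothetical.

```python
from itertools import combinations


def always_overlaps(n, r, w):
    """True if every choice of r read nodes meets every choice of w write nodes."""
    nodes = range(n)
    return all(
        set(reads) & set(writes)
        for writes in combinations(nodes, w)
        for reads in combinations(nodes, r)
    )


strict = always_overlaps(3, 2, 2)  # R + W > N
loose = always_overlaps(3, 1, 1)   # R + W < N
edge = always_overlaps(3, 1, 2)    # R + W == N
```

`strict` is True while `loose` and `edge` are both False: with R + W equal to N the read and write sets can still be disjoint, so that boundary case behaves like the eventual-consistency case.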
Replica synchronization with Merkle trees
When things go really bad, the “hinted” replicas may be lost and nodes may need to synchronize their replicas
To make synchronization efficient, all the keys for a given virtual node are stored in a hash tree or Merkle tree which stores data at the leaves and recursive hashes in the nodes
  Same hash => Same data at leaves
  For Dynamo, the “data” are the keys stored in a given virtual node
  Each node is a hash of its children
  If two top hashes match, then the trees are the same
Infrastructure (at scale) is fractal This ability to  be effective at multiple scales  is crucial to the rise in NOSQL (schemaless) database popularity Why didn’t Amazon or Google just run a big machine with something like GT.M, Vertica, or KDB (etc.)? The answer must be partially to do something new, but partially that it wasn’t  just  shopping carts or search
The Gold Rush
Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable, SimpleDB
Document Store: CouchDB, MongoDB, Lotus Domino
Graph DB: Neo4j, FlockDB, InfiniteGraph
Key/Value Store: Mnesia, Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Key/Value Store: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari
Basic operations are simply get, put, and delete
All systems can distribute keys over nodes
Vector clocks are used as in Dynamo (or just locks)
Replication: common
Transactions: not common
Multiple storage engines: common
Project Voldemort
Type: Key/Value Store
License: Apache 2.0
Language: Java
Company: LinkedIn
Web: project-voldemort.com
Dynamo-like features:
  Automatic partitioning with consistent hashing
  MVCC with vector clocks
  Eventual consistency (N, R, and W)
Also:
  combines cache with storage to avoid a separate cache layer
  pluggable storage layer: RAM, disk, other..
Riak
Type: Key/Value Store
License: Open-Source
Language: Erlang
Company: Basho
Web: wiki.basho.com/display/RIAK/Riak/
Dynamo-like features:
  Consistent hashing
  MVCC with vector clocks
  Eventual consistency (N, R, and W)
Also:
  Hadoop-like M/R queries in either JS or Erlang
  REST access API
Example: Map/reduce with the Python API
  result = self.client\
      .add(bucket.get_name())\
      .map("Riak.mapValuesJson")\
      .reduce("Riak.reduceSum")\
      .run()
Hibari
Type: Key/Value Store
License: Open-Source
Language: Erlang
Company: Gemini Mobile
Web: sourceforge.net/projects/hibari/
Dynamo-like features: consistent hashing
Unique features:
  Chain replication
    Each node may function as head, middle, or end of a chain associated with a position on the hash ring; head gets requests & tail services them.
    See http://www.slideshare.net/geminimobile/hibari
  Durability (fsync) in exchange for slower writes
Columnar or Extensible record: Google BigTable, HBase, Cassandra, HyperTable
All share BigTable data model
  rows and columns
  “column families” that can have new columns added
Consistency models vary: MVCC, distributed locking
Need to run on a different back-end than BigTable (GFS ain’t for sale)
Cassandra
Type: Extensible column store
License: Apache 2.0
Language: Java
Company: Apache Software Foundation
Web: cassandra.apache.org
Used at: Facebook, Twitter, Digg, Reddit, Rackspace
Marriage of BigTable and Dynamo
  Consistent hashing
  Structured values
  Columns / column families
  Slicing with predicates
  Tunable consistency:
    W = 0, Any, 1, Quorum, All
    R = 1, Quorum, All
  Uses a write commit log, memtable, and SSTables
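The commit log, memtable, and SSTable write path that Cassandra borrows from BigTable can be sketched in memory as below. This is a toy under stated assumptions: the class name and flush threshold are made up, Python lists stand in for files on disk, and compaction and recovery are left out.

```python
class TinyLSM:
    def __init__(self, memtable_limit=2):
        self.commit_log = []   # stand-in for an append-only file on disk
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # immutable sorted runs, oldest first
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.commit_log.append((key, value))  # log first, so a crash can replay
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Dump the memtable as a sorted, read-only table and start afresh.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []   # flushed entries no longer need replaying

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newest table wins
            for k, v in table:
                if k == key:
                    return v
        return None


db = TinyLSM(memtable_limit=2)
db.put("a", "1")
db.put("b", "2")   # reaches the limit: first SSTable is written
db.put("a", "3")   # newer value for "a" lives in the memtable
```

Writes never modify an existing table, which is where the throughput comes from; the cost is that a read may have to consult the memtable and several SSTables, newest first.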
Document Store: CouchDB, MongoDB, Lotus Domino (also highlighted on this slide: SimpleDB, Mnesia)
Store objects (not really documents)
  think: nested maps
Varying degrees of consistency, but not ACID
Allow queries on data contents (M/R or other)
May provide atomic read-and-set operations
CouchDB
Type: Document store
License: Apache 2.0
Language: Erlang
Company: Apache Software Foundation
Web: couchdb.org
Objects are grouped in “collections”
REST API
  not very efficient for throughput
Read scalability through asynchronous replication with eventual consistency
No sharding
Incrementally updated M/R “views”
ACID? Uses MVCC and flush on commit. So, kinda..
MongoDB
Type: Document store
License: AGPL
Language: C++
Company: 10gen
Web: mongodb.org
(Also) groups objects in “collections”, within a “database”
Data stored in binary JSON called BSON
Replication just for failover
  As of v1.6, can also do limited replication with replica sets: http://www.slideshare.net/mongodb/mongodb-replica-sets
Automatic sharding
M/R queries, and simple filters
User-defined indexes on fields of the objects
Atomic update “modifiers” can
  increment value
  modify-if-current
  ..others
Mnesia
Type: Document store
License: EPL*
Language: Erlang
Company: Ericsson
Web: www.erlang.org
Papers: http://www.erlang.se/publications/mnesia_overview.pdf
Stores data in “tables”
Data stored in memory
  Logged to selected disks
Replication and sharding
Queries are performed using Erlang list comprehensions (!)
User-defined indexes on fields of the objects
Transactions are supported (but optional)
Optimizing query compiler and dynamic “rule” tables
Embedded in Erlang OTP platform (similar to Pick)
* Mozilla Public License modified to conform with laws of Sweden (more herring)
Why do we care about Mnesia / OTP?
Database for RabbitMQ (distributed messaging behind S3)
Erlang seems to be gaining popularity in the distributed-computing space
Erlang query for “all females” in company*
  females() ->
      F = fun() ->
              Q = query [E.name || E <- table(employee),
                                   E.sex = female] end,
              mnemosyne:eval(Q)
          end,
      mnesia:transaction(F).
*I know, but it’s not my example. This is right out of the manual.
Comparison of MongoDB and CouchDB
Domain is monitoring a set of ongoing managed data transfers
  initial concern is handling the data in real-time
So, did some very simple 1-node benchmarks of MongoDB and CouchDB load times (i.e., on my laptop) for 200K records
Of course this is just one (lame) test
There is a need for a standard NOSQL benchmark suite; so far YCSB is the closest (from Yahoo!)

Database         Inserts/sec
MongoDB          16,000
CouchDB          70
CouchDB, batch   1,800
Schemaless data modeling
http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
Example from distributed monitoring
Consider semi-structured input like:
  ts=2010-02-20T23:14:06Z event=job.state level=Info wf_uuid=8bae72f2-31b9-45f4-bdd3-ce8032081a28 state=JOB_SUCCESS name=create_dir_montage_0_viz_glidein job_submit_seq=1
If the fields are likely to change, or new types of data will appear, how to model this kind of data?
  Blob
  Placeholders
  Entity-Attribute-Value
All of these are data modeling “anti-patterns” for relational DBs
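For contrast with the relational options above, here is a sketch of how the same log line maps onto the "nested maps" that a document store holds. The parser is a hypothetical helper that assumes values contain no spaces, as in the example line; no particular database client is used.

```python
import json

LINE = (
    "ts=2010-02-20T23:14:06Z event=job.state level=Info "
    "wf_uuid=8bae72f2-31b9-45f4-bdd3-ce8032081a28 state=JOB_SUCCESS "
    "name=create_dir_montage_0_viz_glidein job_submit_seq=1"
)


def parse_event(line):
    """Split a 'key=value key=value ...' log line into a dict of fields."""
    event = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if sep:  # ignore tokens that are not key=value pairs
            event[key] = value
    return event


event = parse_event(LINE)
document = json.dumps(event, sort_keys=True)  # what a document store would keep
```

A new field in tomorrow's log lines just becomes a new key in the dict; nothing like an ALTER TABLE is needed, which is the appeal of the schemaless approach for this kind of data.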
What’s wrong with EAV? It’s terrible, I should know, I tried it You end up with queries that look like this to just extract a bunch of fields that started out in the same log line:
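The query shown on the original slide did not survive in this transcript. As a stand-in, the sketch below builds a hypothetical EAV table in an in-memory SQLite database (the table and column names are invented here, not the author's schema) and reassembles four fields of a single log line.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EAV layout: one row per (entity, attribute, value) triple.
conn.execute("CREATE TABLE eav (entity_id INTEGER, attr TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO eav VALUES (?, ?, ?)",
    [
        (1, "ts", "2010-02-20T23:14:06Z"),
        (1, "event", "job.state"),
        (1, "state", "JOB_SUCCESS"),
        (1, "name", "create_dir_montage_0_viz_glidein"),
    ],
)

# One self-join per field, just to put one log line back together.
QUERY = """
SELECT t.value AS ts, e.value AS event, s.value AS state, n.value AS name
FROM eav AS t
JOIN eav AS e ON e.entity_id = t.entity_id AND e.attr = 'event'
JOIN eav AS s ON s.entity_id = t.entity_id AND s.attr = 'state'
JOIN eav AS n ON n.entity_id = t.entity_id AND n.attr = 'name'
WHERE t.attr = 'ts'
"""
result = conn.execute(QUERY).fetchall()
```

Every additional field costs another join, and every value is stored as text, so type checking and indexing by field are lost as well.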
What about queries?
SQL vs. M/R and other models You need to think about this going in; you are throwing away much of the elegance of relational query optimization need to weigh against costs of static schemata Holistic approach: Spend lots of time on  logical  model, understand problem! What degree of normalization makes sense? Is your data well-represented as a hash table? Is it hierarchical? Graph-like? What degree of consistency do you really need? Or maybe multiple ones?
Google’s interactive analysis tool: Dremel
see http://research.google.com/pubs/archive/36632.pdf
Uses a parallel “nested columnar storage” DB
SQL-like query language
  SELECT A, COUNT(B) FROM T GROUP BY A
Interactive query times (seconds) on “trillions of records”
Of course it’s not released open-source, but the glove has been thrown
Now if we could only combine with visualization.. and link it all up to the cloud.. and make it free.. with ponies..
Conclusions Anyone who says RDBMS is dead (and means it) is an idiot SQL is mostly a red herring Can be layered on top of NOSQL, e.g. BigQuery and Hive What really is interesting about NOSQL is scalability (given relaxed consistency) and lack of static schemas incremental scalability from local disk to large degrees of parallelism in the face of distributed failure easier schema evolution, esp. important at the “development” phase, which is often longer than anyone wants to admit Whether we should move towards the One True Database or a Unix-like ecosystem of tools is mostly a matter of philosophical bent; certainly both directions hold promise
Selected references
Cattell’s overview of “scalable datastores”: http://cattell.net/datastores/
BigTable: http://labs.google.com/papers/bigtable.html
Stonebraker et al. on columnar vs. map/reduce: http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf
NOSQL “summer reading”: http://nosqlsummer.org/
  “path through them”: http://doubleclix.wordpress.com/2010/06/12/a-path-throug-nosql-summer-reading/
Varley’s Master’s Thesis on non-relational db’s (modeling): http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
Datastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptxDatastucture-Unit 4-Linked List Presentation.pptx
Datastucture-Unit 4-Linked List Presentation.pptx
kaleeswaric3
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Ad

Schemaless Databases

  • 1. Computational Research Division Lawrence Berkeley National Laboratory Dan Gunter
  • 2. Introduction. About this talk: it is not “hands-on” (sorry); most of it is history and overview; it’s about databases, not explicitly “clouds”. Relation to cloud computing: cloud computing and scalable databases go hand-in-hand; there are a lot of open-source NOSQL projects right now; understanding what they do, and what features of the commercial implementations they’re imitating, gives insight into scalability issues for distributed computing in general.
  • 3. Terminology: NOSQL and “Schemaless”. First: not terribly important or deep in meaning, but “NOSQL” has gained currency. Original, and best, meaning: Not Only SQL. Wikipedia credits it to Carlo Strozzi in 1998, re-introduced in 2009 by Eric Evans of Rackspace. May use non-SQL, typically simpler, access methods; don’t need to follow all the rules for RDBMSes. Lends itself to “No (use of) SQL”, but this is misleading. Also referred to as “schemaless” databases, which implies dynamic schema evolution.
  • 4. NOSQL past and present: Pre-RDBMS, RDBMS era, NOSQL.
  • 5. Pre-relational structured storage systems. Hierarchical storage and sparse multi-dimensional arrays. MUMPS (Massachusetts General Hospital Utility Multi-Programming System), later ANSI M: sparse multi-dimensional array; global variables, prefixed with “^”, are automatically persisted: ^Car(“Door”,“Color”) = “Blue”. “Pick” OS/database: everything is a hash table. IBM Information Management System (IMS), [DB1]. Computer Systems News, 11/28/83
  • 6. The relational model. Introduced with E. F. Codd’s 1970 paper “A Relational Model of Data for Large Shared Data Banks”. Relational algebra provided a declarative means of reasoning about data sets. SQL is loosely based on relational algebra. [Figure: a relation (table) R, named by its relation variable (table name); the heading is an unordered set of attributes (columns) A1 ... An, and the body is an unordered set of tuples (rows) of values Value1 ... Valuen.]
  • 7. Recent NOSQL database products Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak
  • 8. Why NOSQL? Renewed interest originated with global internet companies (Google, Amazon, Yahoo!, Facebook, etc.) that hit limitations of standard RDBMS solutions for one or more of: extremely high transaction rates; dynamic analysis of huge volumes of data; rapidly evolving and/or semi-structured data. At the same time, these companies (unlike the financial and health services industries using M and friends) did not particularly need “ACID” transactional guarantees, didn’t want to run z/OS on mainframes, and had to deal with the ugly reality of distributed computing: networks break your $&#!
  • 9. CAP Theorem. Introduced by Eric Brewer in a PODC keynote in July 2000, thus also known as “Brewer’s Theorem”. CAP = Consistency, Availability, Partition-tolerance. The theorem states that in any “shared data” system, i.e. any distributed system, you can have at most 2 out of 3 of CAP (at the same time). This was later proved formally (w/asynchronous model). Three possibilities: forfeit partition-tolerance (single-site databases, cluster databases, LDAP); forfeit availability (distributed databases w/pessimistic locking, majority protocols); forfeit consistency (Coda, web caching, DNS, Dynamo). Slide annotation: “All robust distributed systems live here”.
  • 10. CAP, ACID, and BASE. RDBMS systems and research focus on ACID: Atomicity, Consistency, Isolation, and Durability; concurrent operations act as if they are serialized. Brewer’s point is that this is one end of a spectrum, one that sacrifices Partition-tolerance and Availability for Consistency. So, at the other end of the spectrum we have BASE: Basically Available, Soft-state, with Eventual consistency. Stale data may be returned; optimistic locking (e.g., versioned writes); simpler, faster, easier evolution. [Figure: spectrum from ACID to BASE]
  • 11. Pioneers: Google BigTable and Amazon Dynamo. These implementations are not publicly available, but the distributed-system techniques that they integrated to build huge databases have been imitated, to a greater or lesser extent, by every implementation that followed.
  • 12. Google BigTable. Internal Google back-end, scaling to thousands of nodes, for web indexing, Google Earth, Google Finance. Scales to petabytes of data, with highly varied data size & latency requirements. Data model is a (3D) sparse, multi-dimensional, sorted map: (row_key, column_key, timestamp) -> string. Technologies: Google File System, to store data across 1000’s of nodes; 3-level indexing with Tablets; SSTable for efficient lookup and high throughput; distributed locking with Chubby.
  • 13. BigTable’s Data Model. “Google’s Bigtable is essentially a massive, distributed 3-D spreadsheet. It doesn’t do SQL, there is limited support for atomic transactions, nor does it support the full relational database model. In short, in these and other areas, the Google team made design trade-offs to enable the scalability and fault-tolerance Google apps require.” (Robin Harris, StorageMojo (blog), 2006-09-08) [Figure: row “com.cnn.www” with a “contents:” column holding three versions of “<html>...” (timestamps t3, t5, t6) and anchor columns “anchor:cnnsi.com” = “CNN” and “anchor:my.look.ca” = “CNN.com”.]
  • 14. Tablets and SSTables. Tablets represent contiguous groups of rows; they are automatically split when they grow too big; one “tablet server” holds many tablets. 3-level indexing scheme similar to a B+-tree: root tablet -> metadata tablets -> data (leaf) tablets. With 128MB metadata tablets, can address 2^34 leaves. Client communicates directly with the tablet server, so data does not go through the root (i.e. locate, then transfer); the client also caches information. Values are written to memory and to disk in a commit log, then periodically dumped into read-only SSTables: better throughput at the expense of some latency.
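The 2^34 figure on this slide can be checked with a few lines of arithmetic. The sketch below assumes roughly 1 KB per metadata row, which is the figure given in the Bigtable paper rather than on the slide; the constant names are made up for illustration.

```python
# Assumption (from the Bigtable paper, not the slide): each metadata row,
# which records the location of one tablet, takes about 1 KB.
METADATA_TABLET_BYTES = 128 * 2**20  # one 128 MB metadata tablet
ROW_BYTES = 2**10                    # assumed ~1 KB per location row

# How many tablets a single metadata tablet can point at.
entries_per_tablet = METADATA_TABLET_BYTES // ROW_BYTES

# The root tablet points at metadata tablets, and each metadata tablet
# points at data (leaf) tablets, so two levels of fan-out multiply.
addressable_leaves = entries_per_tablet ** 2

print(entries_per_tablet == 2**17)   # True
print(addressable_leaves == 2**34)   # True
```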
  • 15. Use of Bloom Filters to optimize lookups. Review: what is a Bloom filter? It can test whether an element is a member of a set, but it is probabilistic: it can only say “no” with certainty. Here, it tests if an SSTable has a row/column pair. NO: stop. YES: need to load & retrieve the data anyway. A useful optimization in this space. [Figure: Bloom filter bit array for the set { x, y, z }; w is not in { x, y, z } because it hashes to one position with a 0.]
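To make the "can only say no with certainty" property concrete, here is a minimal Bloom filter sketch in Python. It is not BigTable's implementation: the class name, the default sizes, and the trick of salting SHA-256 with an index to get several hash functions are all assumptions chosen for brevity.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter; sizes and hashing scheme are illustrative only."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive several bit positions by salting one hash with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(item))


sstable_filter = BloomFilter()
for key in ("x", "y", "z"):
    sstable_filter.add(key)

print(sstable_filter.might_contain("x"))  # True: added keys are never missed
```

A lookup for a key that was never added usually returns False, which is the case where the SSTable read can be skipped; it can occasionally return True (a false positive), in which case the data is loaded and the key simply is not found.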
  • 16. Chubby and Paxos. Chubby is a distributed locking service. Requests go to the current Master; if the Master fails, Paxos is used to elect a new one. Each “DB” is a replica, and each server runs on its own host. Google tends to run 5 servers, with only one being the “master” at any one time. [Figure: five Chubby servers, each with its own DB replica; one is the Master.]
  • 17. What about CAP? For bookkeeping tasks, Chubby’s replication allows tolerance of node failures (P) and consistency (C) at the price of availability (A) during the time needed to elect a new master and synchronize the replicas. Tablets have the “relaxed consistency” of their storage, GFS: a single master that maps files to servers; multiple replicas of the data; versioned writes; checksums to detect corruption (with periodic handshakes).
  • 18. Amazon’s Dynamo. Used by Amazon’s “core services”, for very high A and P at the price of C (“eventual consistency”). Data is stored and retrieved solely by key (key/value store). Techniques used: consistent hashing, for partitioning; vector clocks, to allow MVCC and read repairs rather than write contention; Merkle trees, a data structure that can diff large amounts of data quickly using a tree of hash values; gossip, a decentralized information-sharing approach that allows clusters to be self-maintaining. The techniques were not new, but their synthesis at this scale, in a real system, was.
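Of the techniques listed, vector clocks are perhaps the least obvious, so a small sketch may help. It assumes a clock is a plain dict from node name to counter; the function names and the node names "sx", "sy", "sz" are illustrative, not Dynamo's API.

```python
def increment(clock, node):
    """Return a copy of the clock with this node's counter advanced by one."""
    updated = dict(clock)
    updated[node] = updated.get(node, 0) + 1
    return updated


def descends(a, b):
    """True if clock a has seen every update recorded in clock b."""
    return all(a.get(node, 0) >= count for node, count in b.items())


def compare(a, b):
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a supersedes b"
    if descends(b, a):
        return "b supersedes a"
    # Neither version saw the other: a reader has to reconcile both.
    return "concurrent"


v1 = increment({}, "sx")   # first write, handled by node sx
v2 = increment(v1, "sx")   # overwrite handled by the same node
v3 = increment(v2, "sy")   # two later writes handled by different
v4 = increment(v2, "sz")   # nodes, each starting from v2

print(compare(v2, v1))  # a supersedes b
print(compare(v3, v4))  # concurrent
```

The "concurrent" outcome is where read repair comes in: both versions are kept and returned to the client, which merges them and writes back a clock that descends from both.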
  • 19. Dynamo data partitioning and replication. [Figure: hash ring using consistent hashing. Each host “node” maps to several virtual nodes placed around the ring. An item hashes to a spot on the ring; the virtual node responsible for that spot belongs to the coordinator node, and the following nodes hold the replicas.]
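The ring in this figure can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Dynamo's code: `HashRing`, `ring_position`, and the host names are hypothetical, SHA-256 stands in for whatever hash the real system uses, and every host gets the same number of virtual nodes (Dynamo varies that number per host).

```python
import bisect
import hashlib


def ring_position(label):
    """Map a string to a point on the ring (a 64-bit integer)."""
    return int.from_bytes(hashlib.sha256(label.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, hosts, vnodes_per_host=8):
        # Each host appears at several points on the ring as virtual nodes.
        self.ring = sorted(
            (ring_position(f"{host}#{i}"), host)
            for host in hosts
            for i in range(vnodes_per_host)
        )
        self.positions = [pos for pos, _ in self.ring]

    def preference_list(self, key, n=3):
        """Coordinator first, then the next distinct hosts walking clockwise."""
        start = bisect.bisect_left(self.positions, ring_position(key))
        hosts = []
        for step in range(len(self.ring)):
            host = self.ring[(start + step) % len(self.ring)][1]
            if host not in hosts:
                hosts.append(host)
            if len(hosts) == n:
                break
        return hosts


ring = HashRing(["host-a", "host-b", "host-c", "host-d"])
coordinator, *replicas = ring.preference_list("cart:42", n=3)
print(coordinator, replicas)
```

Skipping virtual nodes that belong to an already chosen host is what keeps the N replicas on N different physical machines.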
  • 20. Eventual consistency and sloppy quorum. R = number of healthy nodes from the preference list (roughly, the list of “next” nodes on the hash ring) needed for a read; W = number of healthy nodes from the preference list needed for a write; N = number of replicas of each data item. You can tune your performance: R << N, high read availability; W << N, high write availability; R + W > N, consistent, but sloppy quorum; R + W < N, at best, eventual consistency. Hinted handoff keeps track of the data “missed” by nodes that go down, and updates them when they come back online.
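The reason R + W > N matters is that any set of R replicas must then share at least one member with any set of W replicas, so a read always touches a replica that saw the latest write. The brute-force check below illustrates this for small N; the function name is hypothetical, and it models a fixed set of N replicas, ignoring the "sloppy" substitution of healthy nodes.

```python
from itertools import combinations


def always_overlap(n, r, w):
    """True if every R-subset of N replicas intersects every W-subset."""
    replicas = range(n)
    return all(
        set(reads) & set(writes)
        for reads in combinations(replicas, r)
        for writes in combinations(replicas, w)
    )


print(always_overlap(3, 2, 2))  # True:  R + W > N
print(always_overlap(3, 1, 2))  # False: R + W = N, a read can miss the write
print(always_overlap(3, 1, 1))  # False: R + W < N
```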
  • 21. Replica synchronization with Merkle trees. When things go really bad, the “hinted” replicas may be lost and nodes may need to synchronize their replicas. To make synchronization efficient, all the keys for a given virtual node are stored in a hash tree, or Merkle tree, which stores data at the leaves and recursive hashes in the nodes. Same hash => same data at the leaves. For Dynamo, the “data” are the keys stored in a given virtual node. Each node is a hash of its children; if two top hashes match, then the trees are the same.
  • 22. Infrastructure (at scale) is fractal. This ability to be effective at multiple scales is crucial to the rise in NOSQL (schemaless) database popularity. Why didn’t Amazon or Google just run a big machine with something like GT.M, Vertica, or KDB (etc.)? The answer must be partially to do something new, but partially that it wasn’t just shopping carts or search.
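A minimal sketch of the top-hash comparison follows. It assumes SHA-256, hashes each key into a leaf, and duplicates the last hash when a level has an odd number of entries; these are illustrative choices, not details taken from the Dynamo paper, and `merkle_root` is a hypothetical helper name.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(keys):
    """Top hash of a Merkle tree built over a set of keys."""
    # Leaves: one hash per key, in sorted order so replicas agree on layout.
    level = [h(k.encode()) for k in sorted(set(keys))] or [h(b"")]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # pad odd levels by repeating the last hash
        # Each parent is the hash of its two children.
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


replica_1 = ["cart:1", "cart:2", "cart:3"]
replica_2 = ["cart:3", "cart:1", "cart:2"]   # same keys, different order
replica_3 = ["cart:1", "cart:2", "cart:9"]   # one key differs

print(merkle_root(replica_1) == merkle_root(replica_2))  # True
print(merkle_root(replica_1) == merkle_root(replica_3))  # False
```

When the roots differ, a real implementation would descend into the children whose hashes disagree, so only the differing key ranges are exchanged; this sketch stops at the root comparison.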
  • 23. The Gold Rush Columnar or Extensible record Google BigTable HBase Cassandra HyperTable SimpleDB Document Store CouchDB MongoDB Lotus Domino Graph DB Neo4j FlockDB InfiniteGraph Key/Value Store Mnesia Memcached Redis Tokyo Cabinet Dynamo Project Voldemort Dynomite Riak Hibari
  • 24. Key/Value Store. Basic operations are simply get, put, and delete. All systems can distribute keys over nodes. Vector clocks are used as in Dynamo (or just locks). Replication: common. Transactions: not common. Multiple storage engines: common. Products: Memcached, Redis, Tokyo Cabinet, Dynamo, Project Voldemort, Dynomite, Riak, Hibari.
  • 25. Project Voldemort. Dynamo-like features: automatic partitioning with consistent hashing; MVCC with vector clocks; eventual consistency (N, R, and W). Also: combines cache with storage to avoid a separate cache layer; pluggable storage layer (RAM, disk, other..). Type: Key/Value Store. License: Apache 2.0. Language: Java. Company: LinkedIn. Web: project-voldemort.com
  • 26. Riak. Dynamo-like features: consistent hashing; MVCC with vector clocks; eventual consistency (N, R, and W). Also: Hadoop-like M/R queries in either JS or Erlang; REST access API. Example, map/reduce with the Python API: result = self.client.add(bucket.get_name()).map("Riak.mapValuesJson").reduce("Riak.reduceSum").run() Type: Key/Value Store. License: Open-Source. Language: Erlang. Company: Basho. Web: wiki.basho.com/display/RIAK/Riak/
  • 27. Hibari. Dynamo-like features: consistent hashing. Unique features: chain replication. Each node may function as head, middle, or end of a chain associated with a position on the hash ring; head gets requests & tail services them. See http://www.slideshare.net/geminimobile/hibari Durability (fsync) in exchange for slower writes. Type: Key/Value Store. License: Open-Source. Language: Erlang. Company: Gemini Mobile. Web: sourceforge.net/projects/hibari/
  • 28. Columnar or Extensible record. All share the BigTable data model: rows and columns; “column families” that can have new columns added. Consistency models vary: MVCC; distributed locking. Need to run on a different back-end than BigTable (GFS ain’t for sale). Products: Google BigTable, HBase, Cassandra, HyperTable.
  • 29. Cassandra. Marriage of BigTable and Dynamo: consistent hashing; structured values; columns / column families; slicing with predicates. Tunable consistency: W = 0, Any, 1, Quorum, All; R = 1, Quorum, All. Writes go to a commit log and memtable, and it uses SSTables. Used at: Facebook, Twitter, Digg, Reddit, Rackspace. Type: Extensible column store. License: Apache 2.0. Language: Java. Company: Apache Software Foundation. Web: cassandra.apache.org
  • 30. Document Store. Store objects (not really documents); think: nested maps. Varying degrees of consistency, but not ACID. Allow queries on data contents (M/R or other). May provide atomic read-and-set operations. Products: SimpleDB, CouchDB, MongoDB, Lotus Domino, Mnesia.
  • 31. CouchDB. Objects are grouped in “collections”. REST API, not very efficient for throughput. Read scalability through asynchronous replication with eventual consistency. No sharding. Incrementally updated M/R “views”. ACID? Uses MVCC and flush on commit. So, kinda.. Type: Document store. License: Apache 2.0. Language: Erlang. Company: Apache Software Foundation. Web: couchdb.org
  • 32. MongoDB. (Also) groups objects in “collections”, within a “database”. Data stored in binary JSON called BSON. Replication just for failover. Automatic sharding. M/R queries, and simple filters. User-defined indexes on fields of the objects. Atomic update “modifiers”: can increment value; modify-if-current; ..others. As of v1.6, can also do limited replication with replica sets: http://www.slideshare.net/mongodb/mongodb-replica-sets Type: Document store. License: GPL. Language: C++. Company: 10gen. Web: mongodb.org
  • 33. Mnesia. Stores data in “tables”. Data stored in memory, logged to selected disks. Replication and sharding. Queries are performed using Erlang list comprehensions (!). User-defined indexes on fields of the objects. Transactions are supported (but optional). Optimizing query compiler and dynamic “rule” tables. Embedded in the Erlang OTP platform (similar to Pick). Type: Document store. License: EPL*. Language: Erlang. Company: Ericsson. Web: www.erlang.org Papers: http://www.erlang.se/publications/mnesia_overview.pdf * Mozilla Public License modified to conform with laws of Sweden (more herring)
  • 34. Why do we care about Mnesia / OTP? Database for RabbitMQ (distributed messaging behind S3). Erlang seems to be gaining popularity in the distributed-computing space. Erlang query for “all females” in company*: females() -> F = fun() -> Q = query [E.name || E <- table(employee), E.sex = female] end, mnemosyne:eval(Q) end, mnesia:transaction(F). *I know, but it’s not my example. This is right out of the manual.
  • 35. Comparison of MongoDB and CouchDB. Domain is monitoring a set of ongoing managed data transfers; the initial concern is handling the data in real-time. So, did some very simple 1-node benchmarks of MongoDB and CouchDB load times (i.e. on my laptop) for 200K records. Of course this is just one (lame) test. There is a need for a standard NOSQL benchmark suite; so far YCSB (from Yahoo!) is the closest. Results (inserts/sec): MongoDB 16,000; CouchDB 70; CouchDB, batch 1,800.
  • 36. Schemaless data modeling http://labs.mudynamics.com/2010/04/01/why-nosql-is-bad-for-startups/
  • 37. Example from distributed monitoring. Consider semi-structured input like: ts=2010-02-20T23:14:06Z event=job.state level=Info wf_uuid=8bae72f2-31b9-45f4-bdd3-ce8032081a28 state=JOB_SUCCESS name=create_dir_montage_0_viz_glidein job_submit_seq=1 If the fields are likely to change, or new types of data will appear, how to model this kind of data? Options: blob; placeholders; Entity-Attribute-Value. All of these are data modeling “anti-patterns” for relational DBs.
  • 38. What’s wrong with EAV? It’s terrible; I should know, I tried it. You end up with queries that look like this to just extract a bunch of fields that started out in the same log line:
  • 40. SQL vs. M/R and other models. You need to think about this going in; you are throwing away much of the elegance of relational query optimization, and need to weigh that against the costs of static schemata. Holistic approach: spend lots of time on the logical model, understand the problem! What degree of normalization makes sense? Is your data well-represented as a hash table? Is it hierarchical? Graph-like? What degree of consistency do you really need? Or maybe multiple ones?
  • 41. Google’s interactive analysis tool: Dremel. See http://research.google.com/pubs/archive/36632.pdf Uses a parallel “nested columnar storage” DB. SQL-like query language: SELECT A, COUNT(B) FROM T GROUP BY A. Interactive query times (seconds) on “trillions of records”. Of course it’s not released open-source, but the glove has been thrown. Now if we could only combine with visualization.. and link it all up to the cloud.. and make it free.. with ponies..
  • 42. Conclusions. Anyone who says RDBMS is dead (and means it) is an idiot. SQL is mostly a red herring: it can be layered on top of NOSQL, e.g. BigQuery and Hive. What really is interesting about NOSQL is scalability (given relaxed consistency) and lack of static schemas: incremental scalability from local disk to large degrees of parallelism in the face of distributed failure; easier schema evolution, esp. important at the “development” phase, which is often longer than anyone wants to admit. Whether we should move towards the One True Database or a Unix-like ecosystem of tools is mostly a matter of philosophical bent; certainly both directions hold promise.
  • 43. Selected references. Cattell’s overview of “scalable datastores”: http://cattell.net/datastores/ BigTable: http://labs.google.com/papers/bigtable.html Stonebraker et al. on columnar vs. map/reduce: http://database.cs.brown.edu/sigmod09/benchmarks-sigmod09.pdf NOSQL “summer reading”: http://nosqlsummer.org/ “path through them”: http://doubleclix.wordpress.com/2010/06/12/a-path-throug-nosql-summer-reading/ Varley’s Master’s Thesis on non-relational db’s (modeling): http://ianvarley.com/UT/MR/Varley_MastersReport_Full_2009-08-07.pdf

Editor's Notes

  • #3: This talk really comes out of my attempt to orient myself in this space. Background is in monitoring distributed systems, concerned with scalable collection and data analysis. But also want to know what I can use for semi-structured data “in the small”.
  • #4: Where it applies, the distinction between relatively fixed schemas and dynamic ones is more technically significant than what query syntax is used to access the data, as has been shown by a number of products that provided a dialect of SQL as an alternative query language either alongside or on top of their native syntax.
  • #5: PICK -- MultiValue (aka PICK) databases are developed at TRW in 1965. M[umps] -- According to comment from Scott Jones M[umps] is developed at Mass General Hospital in 1966. It is a programming language that incorporates a hierarchical database with B+ tree storage. IBM IMS -- IBM IMS, a hierarchical database, is developed with Rockwell and Caterpillar for the Apollo space program in 1966. ISM -- InterSystems develops the ISM product family succeeded by the Open M product, all M[umps] implementations. See comment from Scott Jones below. ANSI M -- M[umps] is approved as an ANSI standard language in 1977. AT&T DBM -- in 1979 Ken Thompson creates DBM which is released by AT&T. At its core it is a file-based hash. TDBM -- TDBM supporting atomic transactions. NDBM -- NDBM was the Berkeley version of DBM supporting having multiple databases open at the same time. SDBM -- SDBM, another clone of DBM mainly for licensing reasons. GT.M -- GT.M is the first version of a key-value store with focus on high performance transaction processing. It is open sourced in 2000. BerkeleyDB -- BerkeleyDB is created at Berkeley in the transition from 4.3BSD to 4.4BSD. Sleepycat Software is started as a company in 1996 when Netscape needed new features for BerkeleyDB. Later acquired by Oracle, which still sells and maintains BerkeleyDB. Lotus Domino -- Lotus Notes, or rather the server part, Lotus Domino, which really is a document database, has its initial release in 1989, now sold by IBM. It has evolved a lot from the early versions and is now a full office and collaboration suite. GDBM -- GDBM is the GNU project clone of DBM. Mnesia -- Mnesia is developed by Ericsson as a soft real-time database to be used in telecom. It is relational in nature but does not use SQL as query language but rather Erlang itself. Cache -- InterSystems Caché launched in 1997 and is a hybrid so-called post-relational database.
It has object interfaces, SQL, PICK/MultiValue and direct manipulation of data structures. It is a M[umps] implementation. See Scott Jones comment below for more on the history of InterSystems. Metakit -- Metakit is started in 1997 and is probably the first document oriented database. Supports smaller datasets than the ones in vogue nowadays. Neo4j -- Graph database Neo4j is started in 2000. db4o -- db4o, an object database for Java and .NET, is started in 2000. QDBM -- QDBM is a re-implementation of DBM with better performance by Mikio Hirabayashi. Memcached -- Memcached is started in 2003 by Danga to power LiveJournal. Memcached isn't really a database since it's memory-only but there is soon a version with file storage called memcachedb. Infogrid graph DB -- Infogrid graph database is started as closed source in 2005, open sourced in 2008. CouchDB -- CouchDB is started in 2005 and provides a document database inspired by Lotus Notes. The project moves to the Apache Foundation in 2008. Google BigTable -- Google BigTable is started in 2004 and the research paper is released in 2006. JackRabbit -- JackRabbit is started in 2006 as an implementation of JSR 170 and 283. Tokyo Cabinet -- Tokyo Cabinet is a successor to QDBM (by Mikio Hirabayashi) started in 2006. Dynamo -- The research paper on Amazon Dynamo is released in 2007. MongoDB -- The document database MongoDB is started in 2007 as a part of an open source cloud computing stack, with the first standalone release in 2009. Cassandra -- Facebook open sources the Cassandra project in 2008. Voldemort -- Project Voldemort is a replicated database with no single point-of-failure. Started in 2008. Dynomite -- Dynomite is a Dynamo clone written in Erlang. Terrastore -- Terrastore is a scalable elastic document store started in 2009. Redis -- Redis is a persistent key-value store started in 2009. Riak -- Riak, another Dynamo-inspired database, started in 2009.
HBase -- HBase is a BigTable clone for the Hadoop project while Hypertable is another BigTable type database also from 2009. Vertexdb -- Vertexdb, another graph database, is started in 2009. Term: NOSQL -- Eric Evans of Rackspace, a committer on the Cassandra project, introduces the term NoSQL, often used in the sense of Not only SQL, to describe the surge of new projects and products.
  • #6: Both of these systems are still used. An open-source version of M, called GT.M, is available (since 2000). M is still used by the US Dept of Veterans Affairs, and also by Ameritrade (Caché: 12B transactions a day), ING Direct, and others in the financial industry. The IBM IMS system is still very actively used today, in particular for the US Federal Reserve. According to Wikipedia, odds are good your ATM transaction hits an IMS database. Chinese banks have purchased IMS technology. IMS includes a separate “transaction management” (TM) system.
  • #7: E. F. Codd’s seminal 1970 paper, “A Relational Model of Data for Large Shared Data Banks”, laid out a solid mathematical basis for databases. In contrast to the hierarchical and network models of the time, relational algebra, an offshoot of first-order logic, provided a declarative means of reasoning about the data that did not depend on the implementation. SQL is “loosely based” on relational algebra.
  • #8: This taxonomy will be explored in more detail later. The point for now is that there are several different types of datastores and a number of examples of each and, referring back to the timeline, most of these implementations have occurred in the past few years.
  • #9: Corporations (once again) found themselves at the forefront of systems research. But what was that research? (Read on..)
  • #10: If nothing else, being able to refer to the “CAP theorem” the next time your networked demo breaks..
  • #11: In his talk, Brewer said “there is almost no work in this area”. I think that the existence of scalable (schemaless) database systems is proof that this has changed.
  • #12: Pictured is Parliament, pioneers of funk!
  • #13: Trivia: what major movie was about producing a script called “Chubby Rain”?
  • #14: Example of a BigTable that stores web pages (directly out of the paper). The row names are reversed URLs (so sorted rows tend to group things by the same domain). There are two column families, “contents” and “anchor”. In this example, each anchor cell has one version, and the contents column has 3.
  • #17: Paxos is an old and well-known algorithm. The Chubby “Database” is really a set of directories with small “lockfiles”. Each tablet server gets one Chubby directory, and each of its tablets is a lockfile.
  • #19: These core services included the Amazon e-commerce shopping cart.
  • #20: Each virtual node is responsible for keys between itself and its predecessor on the ring. The mapping of a single node to a variable number of virtual nodes on the hash ring accounts for heterogeneity (host “power”) in the system.
  • #21: The quorum is “sloppy” because R and W refer to the number of healthy nodes, which may change between the write and subsequent read of the key.
  • #23: (Who knows what this is?) The picture is a close-up of a vegetable: the “Chou Romanesco” cauliflower.
  • #24: Particularly appropriate analogy because of the industry’s tendency to rush towards shiny new technologies! Following sections will examine each of these categories and walk through one publicly available product (or more) for each. With the exception of graph databases, which I simply haven’t taken the time to grok yet.
  • #26: Both Voldemort and the next database, Riak, claim they were “inspired” by the early Dynamo paper
  • #28: In the diagram, the green nodes are head; orange middle; red are tails. The white arrows are write requests, grey read requests, and red are (all) replies.
  • #30: Developed by former engineers from BigTable and Dynamo projects, in heavy use at Facebook. For consistency level: zero = totally async.; Any = 1 node, including hinted handoff; Quorum = R/2+1, where R = #replicas. Reads of 0 or Any don’t make sense: 0 = no data, Any = wrong node; can’t do read-repairs, just the handed-off version.
  • #32: Has a nice Web UI called “Futon”. Yes, everything is a reclining furniture pun.
  • #36: Obviously, this is at best a micro-benchmark. YCSB stands for Yahoo! Cloud Serving Benchmark
  • #40: I won’t attempt to actually cover Map/Reduce, and don’t know Erlang. Instead: what impact do these databases have on data modeling efforts?