NO SQL 
10/20/2014 @ Surabhi Dwivedi 1
Contents 
 Introduction and Features of NoSQL 
 CAP Theorem 
 RDBMS VS NoSQL 
 NoSQL Database family 
Features- Not Only SQL 
 Not an RDBMS 
◦ Non-relational data model 
 Distributed Data Store 
◦ Horizontally scalable 
 Schema-free / Flexible schema 
◦ Database JOINs generally not supported 
 A huge amount of data 
◦ e.g. Google and Facebook, which collect terabytes of data 
 BASE properties 
◦ Basically Available 
◦ Soft state 
 It does not have to be consistent all the time 
◦ Eventually consistent 
 The system eventually becomes consistent once the updates have 
propagated, in particular when updates are not too frequent 
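The "soft state" and "eventually consistent" properties above can be illustrated with a toy model. This is a minimal sketch, not how any particular database implements replication; the names `Replica`, `write` and `propagate` are hypothetical, and a simple version counter is assumed to decide which copy is newest.

```python
# Toy model of eventual consistency (all names are illustrative assumptions).
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0


def write(replica, value, version):
    """Store a value on a single replica together with its version."""
    replica.value = value
    replica.version = version


def propagate(replicas):
    """One synchronisation pass: every replica adopts the newest version seen."""
    newest = max(replicas, key=lambda r: r.version)
    for r in replicas:
        if r.version < newest.version:
            write(r, newest.value, newest.version)


replicas = [Replica() for _ in range(3)]
write(replicas[0], "v1", version=1)         # the update reaches one replica only
stale_reads = [r.value for r in replicas]   # soft state: the copies disagree
propagate(replicas)                         # the update propagates
fresh_reads = [r.value for r in replicas]   # all copies now agree
```

Between the write and the propagation pass a reader may see either the old or the new value, which is the window that BASE systems accept in exchange for availability.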
NoSQL 
 Provides a mechanism for 
◦ storage and retrieval of data that is modeled in 
means other than the tabular relations used in 
relational databases 
 Used in big data and real-time web 
applications 
 NoSQL isn’t a single product or technology, 
but an umbrella term for a category of 
databases 
NoSQL does not Provide 
 Joins 
 Group by 
 ACID transactions 
 SQL 
 NoSQL databases reject: 
◦ Overhead of ACID transactions 
◦ “Complexity” of SQL 
◦ Burden of up-front schema design 
◦ Declarative query expression 
Requirement of NoSQL 
NoSQL - Users 
CAP Theorem 
 Three properties of a system 
◦ Consistency 
 all copies have the same value 
◦ Availability 
 the system keeps running even if parts have failed, via replication 
◦ Partitions 
 network can break into two or more parts, each with active 
systems that can’t talk to other parts 
 Very large systems will partition at some point 
◦ When a partition occurs, choose one of consistency or availability 
◦ Traditional databases choose consistency 
◦ Most Web applications choose availability 
 Except for specific parts such as order processing 
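The choice described above can be sketched as a node that behaves differently once it is cut off from its peers. This is a simplified illustration under stated assumptions: the class `ReplicaNode`, the exception `Unavailable` and the mode labels "CP" and "AP" are names invented for this sketch, not part of any product.

```python
# Hypothetical sketch of the consistency-versus-availability choice.
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""


class ReplicaNode:
    def __init__(self, mode):
        self.mode = mode          # "CP" favours consistency, "AP" availability
        self.data = {}
        self.partitioned = False  # True when peers cannot be reached
        self.pending = []         # writes to reconcile once the partition heals

    def write(self, key, value):
        if self.partitioned:
            if self.mode == "CP":
                # Refuse rather than let the copies diverge.
                raise Unavailable("peers unreachable, write refused")
            # Stay available and remember the write for later reconciliation.
            self.pending.append((key, value))
        self.data[key] = value


cp_node = ReplicaNode("CP")
ap_node = ReplicaNode("AP")
cp_node.partitioned = True
ap_node.partitioned = True

try:
    cp_node.write("order-1", "paid")
    cp_refused = False
except Unavailable:
    cp_refused = True

ap_node.write("cart-1", "3 items")
```

An order-processing component would typically lean toward the first behaviour, while a shopping cart or a news feed can tolerate the second.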
RDBMS VS NoSQL database 
RDBMS                                                     | NoSQL
Structured and organized data                             | Stands for Not Only SQL
Structured query language (SQL)                           | No declarative query language
Data and its relationships are stored in separate tables  | No predefined schema
Data Manipulation Language, Data Definition Language      | Variants: Key-Value Pair Store, Column Store, Document Store, Graph Store
Tight consistency                                         | Eventual consistency rather than ACID properties
ACID transactions                                         | CAP theorem
                                                          | Prioritizes high performance, high availability and scalability
Example –NoSQL Databases 
NoSQL Database Family 
NoSQL Database Types 
• Distributed Key-Value Systems 
– Hash table of keys 
– Lookup a single value for a key 
– Amazon’s Dynamo 
• Document-based Systems 
– Stores documents made up of tagged elements 
– Access data by key or by search of “document” data 
– CouchDB, MongoDB 
• Column-based Systems 
– Each storage block contains data from only one column 
– Google’s BigTable 
– Facebook’s Cassandra 
• Graph-based Systems 
– Use a graph structure 
– Google’s Pregel, Neo4j 
Column-oriented databases 
• One of the most popular types of non-relational databases 
• Column-family stores allow you to store data with keys mapped to 
values and the values grouped into multiple column families, 
each column family being a map of data 
• Column-family databases store data in column families as rows 
• They have many columns associated with a row key 
• Column families are groups of related data that is often 
accessed together 
• The basic unit of storage in column-family databases is a column 
• Examples 
• Hadoop / HBase 
• Cassandra: Apache Cassandra was initially developed at Facebook to 
power their Inbox Search feature 
• Cloudata: a clone of Google's Bigtable, like HBase 
Column-Oriented Databases Cont … 
 Data tables are stored as sections of columns of 
data, rather than as rows of data. 
 The column is used as a store for the value, and 
has a timestamp that is used to differentiate the 
valid content from stale ones. 
 Application will use the timestamp to find out 
which of the stored values in the backup nodes 
are up-to-date. 
 Column Family 
◦ A container for columns, analogous to table in a 
relational database. 
◦ The column family has a name, a map with a key and 
a value (which is a map containing columns). 
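The timestamped column described above can be sketched as a nested map. This is a minimal illustration, not the storage format of any specific product; the helper names `put` and `get_latest` and the integer timestamps are assumptions made for the example.

```python
from collections import defaultdict

# Assumed layout: row key -> column name -> list of (timestamp, value) versions.
column_family = defaultdict(lambda: defaultdict(list))


def put(row_key, column, value, timestamp):
    """Append a new version of a column value instead of overwriting it."""
    column_family[row_key][column].append((timestamp, value))


def get_latest(row_key, column):
    """Return the value with the highest timestamp, i.e. the non-stale one."""
    versions = column_family[row_key][column]
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("row_key_1", "zip", "94301", timestamp=1)
put("row_key_1", "zip", "94305", timestamp=2)   # a later update
latest_zip = get_latest("row_key_1", "zip")
```

Keeping every version with its timestamp is what lets a reader tell valid content from stale content when replicas disagree.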
Example 
 Cassandra 
 HBase 
 Hypertable 
 Amazon SimpleDB 
{ 
  "row_key_1" : { 
    "name" : { 
      ... 
    }, 
    "location" : { 
      ... 
    }, 
    "preferences" : { 
      ... 
    } 
  }, 
  "row_key_2" : { 
    "name" : { 
      ... 
    }, 
    "location" : { 
      ... 
    }, 
    "preferences" : { 
      ... 
    } 
  }, 
  "row_key_3" : { 
    ... 
  } 
} 
• Row key (row_key_1, row_key_2, ...): uniquely identifies a record in a 
column database 
• Column-family identifier (name, location, preferences): second-level key 
{ 
  "row_key_1" : { 
    "name" : { 
      "first_name" : "Jolly", 
      "last_name" : "Goodfellow" 
    }, 
    "location" : { 
      "zip" : "94301" 
    }, 
    "preferences" : { 
      "d/r" : "D" 
    } 
  }, 
  "row_key_2" : { 
    "name" : { 
      "first_name" : "Very", 
      "middle_name" : "Happy", 
      "last_name" : "Guy" 
    }, 
    "location" : { 
      "zip" : "10001" 
    }, 
    "preferences" : { 
      "v/nv" : "V" 
    } 
  }, 
  ... 
} 
Each row may have a different set of 
columns within a column-family 
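The point that rows need not share the same columns can be checked with the sample data from this slide loaded into nested dictionaries. The helper `columns_of` is a hypothetical name introduced only for this sketch.

```python
# The two sample rows from the slide, as nested dictionaries.
rows = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
        "preferences": {"d/r": "D"},
    },
    "row_key_2": {
        "name": {"first_name": "Very", "middle_name": "Happy", "last_name": "Guy"},
        "location": {"zip": "10001"},
        "preferences": {"v/nv": "V"},
    },
}


def columns_of(row_key, family):
    """List the column names a given row stores in a given column family."""
    return sorted(rows[row_key].get(family, {}))


name_columns_row_1 = columns_of("row_key_1", "name")
name_columns_row_2 = columns_of("row_key_2", "name")
```

No placeholder or null is stored for the missing `middle_name` in the first row; the column simply does not exist there.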
Contrasting Column Databases with 
RDBMS 
• Column-oriented database 
– minimal need for schema definition 
– easily accommodate newer columns 
– predefined column-family 
– set of columns grouped together into a bundle 
– A column family (no data type) corresponds to a column in an 
RDBMS (which has a data type) 
– Column databases designed to scale and can easily 
accommodate millions of columns and billions of 
rows 
Hadoop distributed filesystem (HDFS) – 
Background for Distributed Storage 
• Apache Hadoop is an open source software 
project 
• Enables the distributed processing of large data 
sets across clusters of servers 
• Designed to scale up from a single server to 
thousands of machines, with a very high degree 
of fault tolerance. 
• Data in a Hadoop cluster is broken down into 
smaller pieces (called blocks) and distributed 
throughout the cluster. 
• The map and reduce functions can be executed 
on smaller subsets of larger data sets 
Hadoop distributed filesystem 
(HDFS) 
 A MapReduce program consists of two procedures 
◦ Map() procedure - performs filtering and sorting 
(such as sorting students by first name into 
queues, one queue for each name) 
◦ Reduce() procedure performs a summary 
operation (such as counting the number of 
students in each queue, yielding name 
frequencies). 
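The student example above can be sketched in plain Python. This is a single-process illustration of the map, group and reduce steps, not Hadoop's API; the function names and the sample student list are assumptions made for the example.

```python
from collections import defaultdict


def map_phase(students):
    """Map: emit a (first_name, student) pair, routing each student to a queue."""
    for student in students:
        first_name = student.split()[0]
        yield first_name, student


def shuffle(pairs):
    """Group the emitted pairs by key, one queue per first name."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return queues


def reduce_phase(queues):
    """Reduce: summarise each queue by counting its members."""
    return {name: len(members) for name, members in queues.items()}


students = ["Asha Rao", "Ravi Kumar", "Asha Mehta", "Ravi Shah", "Neha Jain"]
name_frequencies = reduce_phase(shuffle(map_phase(students)))
```

In a real cluster the map calls run in parallel on the nodes holding the data and the grouped queues are shipped to the reducers, but the data flow is the same.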
Hadoop distributed filesystem 
(HDFS) - Example 
 Consider a file containing the phone numbers for everyone in the 
United States. 
 The people with a last name starting with A might be 
stored on server 1, B on server 2, and so on. 
 In a Hadoop world, pieces of this phonebook would be 
stored across the cluster 
 To reconstruct the entire phonebook, your program 
would need the blocks from every server in the cluster. 
 To achieve availability as components fail, HDFS 
replicates these smaller pieces onto two additional 
servers by default. 
◦ This redundancy offers multiple benefits, 
 Higher availability. 
 Scalability: a Hadoop cluster breaks work into smaller chunks and runs 
those jobs on all the servers in the cluster 
 Data locality, which is critical when working with large data sets. 
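The block splitting and replication described above can be sketched as follows. This is a simplified model under stated assumptions: placement here is plain round-robin, which is not HDFS's actual placement policy, and the function names, server names and tiny block size are invented for the illustration. Only the replication factor of three (one copy plus two additional servers) comes from the slide.

```python
def split_into_blocks(data, block_size):
    """Cut the data into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(num_blocks, servers, replication=3):
    """Assign each block to `replication` distinct servers (round-robin assumption)."""
    if replication > len(servers):
        raise ValueError("not enough servers for the requested replication")
    return {
        block_id: [servers[(block_id + k) % len(servers)] for k in range(replication)]
        for block_id in range(num_blocks)
    }


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live server."""
    return all(any(s not in failed for s in replicas)
               for replicas in placement.values())


blocks = split_into_blocks("ABCDEFGHIJ", block_size=4)
placement = place_blocks(len(blocks), ["s1", "s2", "s3", "s4"])
survives_two_failures = readable_after_failure(placement, {"s1", "s2"})
survives_three_failures = readable_after_failure(placement, {"s1", "s2", "s3"})
```

With three copies of every block, any two servers can fail without data becoming unreadable; losing all three holders of a block is what breaks the file.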
HBase - Distributed Storage 
 HBase is a column-oriented database 
management system that runs on top of 
HDFS. 
 HBase’s distributed architecture is designed 
for applications storing up to billions of 
rows and millions of columns 
 A good option to replace a relational 
database that cannot support such large data 
sets. 
HBase Distributed Storage Architecture 
• HBase follows a master-worker pattern 
• A master and a set of workers (range servers) 
• When HBase starts, the master allocates a set of ranges to a range 
server. 
• Each range stores an ordered set of rows, where each row is 
identified by a unique row-key. 
• As the number of rows stored in a range grows beyond a 
configured threshold, the range is split into two and the rows are 
divided between the two new ranges. 
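The range-splitting behaviour above can be sketched with a small in-memory model. This is an illustration only, not HBase code; the class `Range`, the helpers `find_range` and `put`, and the threshold of four rows are assumptions chosen to keep the example short.

```python
import bisect

MAX_ROWS_PER_RANGE = 4  # hypothetical threshold, kept tiny for demonstration


class Range:
    """A contiguous slice of the row-key space, starting at start_key."""

    def __init__(self, start_key, rows=None):
        self.start_key = start_key
        self.rows = dict(rows or {})

    def sorted_keys(self):
        return sorted(self.rows)


def find_range(ranges, row_key):
    """Locate the index of the range whose key interval contains row_key."""
    starts = [r.start_key for r in ranges[1:]]
    return bisect.bisect_right(starts, row_key)


def put(ranges, row_key, value):
    """Store a row and split its range in two once it exceeds the threshold."""
    idx = find_range(ranges, row_key)
    target = ranges[idx]
    target.rows[row_key] = value
    if len(target.rows) > MAX_ROWS_PER_RANGE:
        keys = target.sorted_keys()
        mid = len(keys) // 2
        left = Range(target.start_key, {k: target.rows[k] for k in keys[:mid]})
        right = Range(keys[mid], {k: target.rows[k] for k in keys[mid:]})
        ranges[idx:idx + 1] = [left, right]


ranges = [Range(start_key="")]          # a single range covers all keys at first
for i in range(1, 7):
    put(ranges, f"row{i:02d}", i)
layout = [r.sorted_keys() for r in ranges]
```

After a split, the master in a real deployment can hand the two halves to different range servers, which is how the table grows beyond one machine.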
write-ahead-log (WAL) 
• WAL is a common technique for providing atomicity 
and durability (two of the ACID properties). 
• When data is written to a region, it’s first written to the 
write-ahead-log, if enabled. 
• Later, it’s written to the region’s in-memory store. 
• If the in-memory store is full, data is flushed to disk 
and persisted in the underlying distributed storage. 
• In HBase a client program could decide to turn WAL 
on or switch it off. 
• Switching it off would boost performance but reduce 
reliability and recovery, in case of failure. 
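The write path above can be sketched as a toy region with an optional log. This is a minimal model, not HBase's implementation; the class `Region`, its method names and the in-memory lists standing in for durable storage are assumptions made for illustration.

```python
class Region:
    def __init__(self, memstore_limit=3, wal_enabled=True):
        self.wal = []        # stands in for an append-only log on durable storage
        self.memstore = {}   # the in-memory store
        self.disk = {}       # stands in for data flushed to distributed storage
        self.memstore_limit = memstore_limit
        self.wal_enabled = wal_enabled

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))   # 1. log the edit first
        self.memstore[key] = value          # 2. then apply it in memory
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                    # 3. flush when the store is full

    def flush(self):
        self.disk.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()                    # flushed edits need no replay

    def crash_and_recover(self):
        self.memstore.clear()               # memory contents are lost
        for key, value in self.wal:         # replay whatever was logged
            self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key, self.disk.get(key))


with_wal = Region(wal_enabled=True)
with_wal.put("a", 1)
with_wal.crash_and_recover()
recovered = with_wal.get("a")

without_wal = Region(wal_enabled=False)
without_wal.put("a", 1)
without_wal.crash_and_recover()
lost = without_wal.get("a")
```

The second region skips the log append on every write, which is the performance gain, and loses the unflushed edit after the crash, which is the reliability cost.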
Document Model 
 Notion of a schema is dynamic: each 
document can contain different fields. 
◦ Helpful for modeling unstructured and 
polymorphic data. 
◦ It also makes it easier to evolve an application 
during development, such as adding new fields. 
◦ Data can be queried based on any fields in a 
document 
DOCUMENT STORE 
• Documents are grouped together into collections 
• Collections are analogous to relational tables. 
• Collections don’t impose strict schema 
constraints 
• Documents here are records, not documents in the sense of a 
word processing document 
• Structure of any document can be modified 
• By adding and removing members from the document 
- by reading the document into program, modifying it 
and re-saving it 
• By using various update commands. 
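The read, modify and re-save route described above can be sketched with plain dictionaries standing in for a collection. This does not use a real database driver; the `save` and `load` helpers and the sample fields are assumptions made for the example.

```python
import copy

collection = {}  # assumed stand-in for a collection: _id -> document


def save(document):
    """Store (or overwrite) a document under its _id."""
    collection[document["_id"]] = copy.deepcopy(document)


def load(doc_id):
    """Read a document into the program as an independent copy."""
    return copy.deepcopy(collection[doc_id])


# Two documents in the same collection with different fields.
save({"_id": 1, "first_name": "Jolly", "nickname": "JG"})
save({"_id": 2, "title": "Meeting notes", "tags": ["draft"]})

doc = load(1)
doc["address"] = {"zip": "94301"}   # add a member
del doc["nickname"]                 # remove a member
save(doc)                           # re-save the modified document

fields_after_update = sorted(collection[1])
```

No schema change is needed for the new `address` field, and the second document is unaffected by it.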
DOCUMENT STORE 
• Each document is stored in BSON format. 
• Binary data (using BSON format) can be stored 
in any of the fields in the document. 
• BSON is a binary-encoded representation of a JSON-type 
document format 
– nested set of key/value pairs. 
– JSON – JavaScript Object Notation 
• BSON is a superset of JSON 
– supports additional types 
• regular expression, 
• binary data, 
• date. 
• Each document has a unique identifier, which 
MongoDB can generate automatically as an object id 
DOCUMENT STORE 
 Document databases – 
◦ Good for storing and managing Big Data-size 
collections of literal documents 
 like text documents, email messages, and XML 
documents 
 conceptual “documents” like de-normalized 
(aggregate) representations of a database entity 
 Good for storing “sparse” data 
◦ irregular (semi-structured) data that would 
require an extensive use of “nulls” in an 
RDBMS. 
DOCUMENT STORE 
 “Documents” are encoded in a standard data exchange 
format 
◦ XML, JSON (JavaScript Object Notation) or BSON (Binary 
JSON). 
 Unlike the simple key-value stores, the value column in 
document databases contains semi-structured data 
◦ specifically attribute name/value pairs. 
 A single column can house hundreds of such attributes 
 Number and type of attributes recorded can vary from 
row to row. 
 Both keys and values are fully searchable in document 
databases. 
DOCUMENT STORE 
 Records within a single table can have different structures. 
 An example record from Mongo, using JSON format, might 
look like 
{ 
  "_id" : ObjectId("4fccbf281168a6aa3c215443"), 
  "first_name" : "Thomas", 
  "last_name" : "Jefferson", 
  "address" : { 
    "street" : "1600 Pennsylvania Ave NW", 
    "city" : "Washington", 
    "state" : "DC" 
  } 
} 
Document Store - Internals 
 Document Stores 
◦ Like Key-Value Stores, except Value is a “Document” 
 Data model: (key, “document”) pairs 
 Basic operations: 
◦ Insert (key, document), 
◦ Fetch(key), Update(key), 
◦ Delete(key) 
 Also Fetch() based on document contents 
 Example systems 
◦ CouchDB, MongoDB 
 Document stores 
◦ Store arbitrary/extensible structures as a “value” 
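The operations listed above, including the fetch based on document contents, can be sketched as a small in-memory class. This is an illustration of the data model only, not the API of CouchDB or MongoDB; the class `DocumentStore` and its method names are hypothetical.

```python
class DocumentStore:
    """(key, document) pairs, where each document is a dictionary."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = dict(document)

    def fetch(self, key):
        return self._docs.get(key)

    def update(self, key, changes):
        self._docs[key].update(changes)

    def delete(self, key):
        self._docs.pop(key, None)

    def fetch_where(self, **criteria):
        """Fetch by contents: keys of documents whose fields match all criteria."""
        return [key for key, doc in self._docs.items()
                if all(doc.get(field) == value for field, value in criteria.items())]


store = DocumentStore()
store.insert("u1", {"first_name": "Thomas", "state": "DC"})
store.insert("u2", {"first_name": "Very", "state": "NY", "middle_name": "Happy"})
store.update("u1", {"last_name": "Jefferson"})
dc_residents = store.fetch_where(state="DC")
store.delete("u2")
```

The content-based lookup is what separates this model from a plain key-value store, where the value is opaque and only the key can be used to find it. A real system would back `fetch_where` with indexes rather than a full scan.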
Advantages of the Document Model 
 More natural to represent data at the database level 
 An aggregated document can be accessed with a 
single call to the database 
◦ rather than having to JOIN multiple tables to respond to a 
query. 
 The MongoDB document is physically stored as a 
single object, requiring only a single read from 
memory or disk. 
◦ RDBMS JOINs require multiple reads from multiple 
physical locations. 
 Distributing the database across multiple nodes (a 
process called sharding) is easier 
◦ horizontal scalability 
◦ documents are self-contained 
MongoDB- Features 
 MongoDB provides high performance data persistence. 
◦ Support for embedded data models reduces I/O activity on database 
system. 
◦ Indexes support faster queries and can include keys from embedded 
documents and arrays. 
 High Availability 
◦ automatic failover. 
◦ data redundancy. 
 A replica set is a group of MongoDB servers that maintain the 
same data set, providing redundancy and increasing data 
availability. 
 Automatic Scaling 
◦ MongoDB provides horizontal scalability as part of its core 
functionality. 
◦ Automatic sharding distributes data across a cluster of machines. 
◦ Replica sets can provide eventually-consistent reads for low-latency 
high throughput deployments. 
MongoDB - Sharding 
• Data is distributed across multiple range servers 
• MongoDB allows ordered collections to be saved across 
multiple machines. 
• Shards are replicated to allow failover. 
• Large collection could be split into four shards 
• Each shard in turn may be replicated three times. 
• This would create 12 units of a MongoDB server. 
• The two additional copies of each shard serve as failover units. 
• Sharding addresses the challenge of scaling to support 
high throughput and large data sets: 
• Each shard processes fewer operations as the cluster grows. 
• As a result, a cluster can increase capacity and throughput 
horizontally. 
• For example, to insert data, the application only needs to access 
the shard responsible for that record. 
• Sharding reduces the amount of data that each server needs to 
store. Each shard stores less data as the cluster grows. 
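The four-shard, three-replica arrangement above can be modelled in a few lines. This is a sketch under stated assumptions, not MongoDB's implementation: routing here uses fixed key ranges with made-up split points, each server is a plain dictionary, and the names `shard_for`, `insert` and `find` are invented for the example.

```python
import bisect

NUM_SHARDS = 4
REPLICAS_PER_SHARD = 3
SPLIT_POINTS = ["g", "n", "t"]   # assumed key-range boundaries between shards

# cluster[shard][replica] is one server, modelled as a dictionary.
cluster = [[{} for _ in range(REPLICAS_PER_SHARD)] for _ in range(NUM_SHARDS)]
total_servers = sum(len(shard) for shard in cluster)


def shard_for(key):
    """Pick the shard whose key range contains the key."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def insert(key, document):
    """Only the responsible shard is touched; its replicas all get a copy."""
    for replica in cluster[shard_for(key)]:
        replica[key] = document


def find(key, failed_replicas=()):
    """Read from the first replica of the shard that has not failed."""
    for index, replica in enumerate(cluster[shard_for(key)]):
        if index not in failed_replicas:
            return replica.get(key)
    return None


insert("mike", {"city": "Pune"})
target_shard = shard_for("mike")
read_after_failover = find("mike", failed_replicas={0})
```

The other three shards hold nothing after this insert, which is the sense in which each shard processes fewer operations and stores less data as the cluster grows.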
• The data set is divided and 
distributed over 
multiple servers, or shards. 
• Each shard is an 
independent database, and 
collectively, the shards make 
up a single logical database. 
Distributed Key-Value Systems 
 Key-Value Pair (KVP) Stores 
◦ Access data (values) by strings called keys. 
◦ Data has no required format – data may have any format 
◦ Extremely simple interface 
 Data model: (key, value) pairs 
 NoSQL Key-Value store is a single table with two 
columns: 
◦ one being the (Primary) Key, and the other being the Value. 
 Basic Operations: Insert (key, value), Fetch (key), 
Update (key), Delete (key) 
◦ Implementation: efficiency, scalability, fault-tolerance 
 Records distributed to nodes based on key 
 Replication 
 Single-record transactions, “eventual consistency” 
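The key-value interface and the key-based distribution above can be sketched as follows. This is a simplified model, not the design of any listed product: the class `KeyValueCluster`, the MD5-based placement and the choice of two copies per key are assumptions made for illustration.

```python
import hashlib


class KeyValueCluster:
    def __init__(self, num_nodes=3, replicas=2):
        self.nodes = [{} for _ in range(num_nodes)]   # each node is a hash table
        self.replicas = replicas

    def _nodes_for(self, key):
        """Choose the nodes for a key from a stable hash of the key."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        first = int(digest, 16) % len(self.nodes)
        return [(first + i) % len(self.nodes) for i in range(self.replicas)]

    def insert(self, key, value):
        for node in self._nodes_for(key):
            self.nodes[node][key] = value

    update = insert   # in this sketch an update simply overwrites the value

    def fetch(self, key):
        return self.nodes[self._nodes_for(key)[0]].get(key)

    def delete(self, key):
        for node in self._nodes_for(key):
            self.nodes[node].pop(key, None)


kv = KeyValueCluster()
kv.insert("user:42", b"any bytes, no required format")
kv.update("user:42", "now a string")
value = kv.fetch("user:42")
copies = sum("user:42" in node for node in kv.nodes)
kv.delete("user:42")
```

The store never inspects the value, so each operation touches a single record on a known set of nodes, which is why single-record transactions are the natural unit in this model.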
Example- Key Value 
 Riak 
 Redis 
 MemcacheDB 
 Berkeley DB 
 HamsterDB (especially suited for 
embedded use) 
 Amazon DynamoDB (not open source) 
 Project Voldemort (open source 
implementation of Amazon's Dynamo) 
Ad

More Related Content

What's hot (20)

Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
DATAVERSITY
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
Cloudera, Inc.
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Sql vs NoSQL-Presentation
 Sql vs NoSQL-Presentation Sql vs NoSQL-Presentation
Sql vs NoSQL-Presentation
Shubham Tomar
 
NoSQL Seminer
NoSQL SeminerNoSQL Seminer
NoSQL Seminer
Partha Das
 
Nosql
NosqlNosql
Nosql
Muluken Sholaye Tesfaye
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
Pooyan Mehrparvar
 
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational Databases
Chris Baglieri
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
phanleson
 
NoSql
NoSqlNoSql
NoSql
AnitaSenthilkumar
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
Jared Winick
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
Bhaskar Gunda
 
Compaction and Splitting in Apache Accumulo
Compaction and Splitting in Apache AccumuloCompaction and Splitting in Apache Accumulo
Compaction and Splitting in Apache Accumulo
Hortonworks
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
Fabio Fumarola
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
Hbase: an introduction
Hbase: an introductionHbase: an introduction
Hbase: an introduction
Jean-Baptiste Poullet
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
Fabio Fumarola
 
NoSql
NoSqlNoSql
NoSql
Girish Khanzode
 
Cloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBaseCloud Deployments with Apache Hadoop and Apache HBase
Cloud Deployments with Apache Hadoop and Apache HBase
DATAVERSITY
 
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, SalesforceHBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
HBaseCon 2012 | HBase Schema Design - Ian Varley, Salesforce
Cloudera, Inc.
 
Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)Intro to HBase Internals & Schema Design (for HBase users)
Intro to HBase Internals & Schema Design (for HBase users)
alexbaranau
 
Sql vs NoSQL-Presentation
 Sql vs NoSQL-Presentation Sql vs NoSQL-Presentation
Sql vs NoSQL-Presentation
Shubham Tomar
 
NoSQL databases - An introduction
NoSQL databases - An introductionNoSQL databases - An introduction
NoSQL databases - An introduction
Pooyan Mehrparvar
 
Column db dol
Column db dolColumn db dol
Column db dol
poojabi
 
Non Relational Databases
Non Relational DatabasesNon Relational Databases
Non Relational Databases
Chris Baglieri
 
HBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table designHBase In Action - Chapter 04: HBase table design
HBase In Action - Chapter 04: HBase table design
phanleson
 
HBase: Just the Basics
HBase: Just the BasicsHBase: Just the Basics
HBase: Just the Basics
HBaseCon
 
Introduction to Apache Accumulo
Introduction to Apache AccumuloIntroduction to Apache Accumulo
Introduction to Apache Accumulo
Jared Winick
 
NoSQL-Database-Concepts
NoSQL-Database-ConceptsNoSQL-Database-Concepts
NoSQL-Database-Concepts
Bhaskar Gunda
 
Compaction and Splitting in Apache Accumulo
Compaction and Splitting in Apache AccumuloCompaction and Splitting in Apache Accumulo
Compaction and Splitting in Apache Accumulo
Hortonworks
 
7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth7. Key-Value Databases: In Depth
7. Key-Value Databases: In Depth
Fabio Fumarola
 
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, ClouderaHadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Cloudera, Inc.
 
8. column oriented databases
8. column oriented databases8. column oriented databases
8. column oriented databases
Fabio Fumarola
 

Viewers also liked (20)

Statistical terms for classification
Statistical terms for classificationStatistical terms for classification
Statistical terms for classification
surabhi_dwivedi
 
CareerGuide.com
CareerGuide.comCareerGuide.com
CareerGuide.com
Tina Arora
 
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |ThaneBuy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Surabhi Realtors
 
Career counselling impact story -gautam sharma
Career counselling impact story -gautam sharmaCareer counselling impact story -gautam sharma
Career counselling impact story -gautam sharma
CareerGuide.com
 
Career experts
Career expertsCareer experts
Career experts
Raju Verma
 
Careerguide.com
Careerguide.comCareerguide.com
Careerguide.com
Nikhil Agrawal
 
Stochastic Modeling for Valuation and Risk Management
Stochastic Modeling for Valuation and Risk ManagementStochastic Modeling for Valuation and Risk Management
Stochastic Modeling for Valuation and Risk Management
Roderick Powell
 
Snapshot feature of network storage
Snapshot feature of network storageSnapshot feature of network storage
Snapshot feature of network storage
qsantechnology
 
Introduction to Bag of Little Bootstrap
Introduction to Bag of Little Bootstrap Introduction to Bag of Little Bootstrap
Introduction to Bag of Little Bootstrap
Wayne Lee
 
Introduction to Mathematical Probability
Introduction to Mathematical ProbabilityIntroduction to Mathematical Probability
Introduction to Mathematical Probability
Solo Hermelin
 
Cross-Validation
Cross-ValidationCross-Validation
Cross-Validation
guestfee8698
 
Gamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared DistributionsGamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared Distributions
mathscontent
 
Btrfs current status and_future_prospects
Btrfs current status and_future_prospectsBtrfs current status and_future_prospects
Btrfs current status and_future_prospects
fj_staoru_takeuchi
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Marina Santini
 
Work ethics by baskaran
Work ethics by baskaranWork ethics by baskaran
Work ethics by baskaran
baskaranpaf
 
Stochastic modelling and its applications
Stochastic modelling and its applicationsStochastic modelling and its applications
Stochastic modelling and its applications
Kartavya Jain
 
INTRODUCTION TO UML DIAGRAMS
INTRODUCTION TO UML DIAGRAMSINTRODUCTION TO UML DIAGRAMS
INTRODUCTION TO UML DIAGRAMS
Ashita Agrawal
 
Career development ppt
Career development pptCareer development ppt
Career development ppt
Nancy Erskine-Sackey
 
basics of computer system ppt
basics of computer system pptbasics of computer system ppt
basics of computer system ppt
Suaj
 
Introduction to computer network
Introduction to computer networkIntroduction to computer network
Introduction to computer network
Ashita Agrawal
 
Statistical terms for classification
Statistical terms for classificationStatistical terms for classification
Statistical terms for classification
surabhi_dwivedi
 
CareerGuide.com
CareerGuide.comCareerGuide.com
CareerGuide.com
Tina Arora
 
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |ThaneBuy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Buy | Sell |Rent | Property | Surabhi Realtors | Brahmand |Thane
Surabhi Realtors
 
Career counselling impact story -gautam sharma
Career counselling impact story -gautam sharmaCareer counselling impact story -gautam sharma
Career counselling impact story -gautam sharma
CareerGuide.com
 
Career experts
Career expertsCareer experts
Career experts
Raju Verma
 
Stochastic Modeling for Valuation and Risk Management
Stochastic Modeling for Valuation and Risk ManagementStochastic Modeling for Valuation and Risk Management
Stochastic Modeling for Valuation and Risk Management
Roderick Powell
 
Snapshot feature of network storage
Snapshot feature of network storageSnapshot feature of network storage
Snapshot feature of network storage
qsantechnology
 
Introduction to Bag of Little Bootstrap
Introduction to Bag of Little Bootstrap Introduction to Bag of Little Bootstrap
Introduction to Bag of Little Bootstrap
Wayne Lee
 
Introduction to Mathematical Probability
Introduction to Mathematical ProbabilityIntroduction to Mathematical Probability
Introduction to Mathematical Probability
Solo Hermelin
 
Gamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared DistributionsGamma, Expoential, Poisson And Chi Squared Distributions
Gamma, Expoential, Poisson And Chi Squared Distributions
mathscontent
 
Btrfs current status and_future_prospects
Btrfs current status and_future_prospectsBtrfs current status and_future_prospects
Btrfs current status and_future_prospects
fj_staoru_takeuchi
 
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & EvaluationLecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Lecture 3: Basic Concepts of Machine Learning - Induction & Evaluation
Marina Santini
 
Work ethics by baskaran
Work ethics by baskaranWork ethics by baskaran
Work ethics by baskaran
baskaranpaf
 
Stochastic modelling and its applications
Stochastic modelling and its applicationsStochastic modelling and its applications
Stochastic modelling and its applications
Kartavya Jain
 
INTRODUCTION TO UML DIAGRAMS
INTRODUCTION TO UML DIAGRAMSINTRODUCTION TO UML DIAGRAMS
INTRODUCTION TO UML DIAGRAMS
Ashita Agrawal
 
basics of computer system ppt
basics of computer system pptbasics of computer system ppt
basics of computer system ppt
Suaj
 
Introduction to computer network
Introduction to computer networkIntroduction to computer network
Introduction to computer network
Ashita Agrawal
 
Ad

Similar to No SQL introduction (20)

Introduction to NoSQL
Introduction to NoSQLIntroduction to NoSQL
Introduction to NoSQL
balwinders
 
Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?Why no sql ? Why Couchbase ?
Why no sql ? Why Couchbase ?
Ahmed Rashwan
 
NoSQL Basics and MongDB
NoSQL Basics and  MongDBNoSQL Basics and  MongDB
NoSQL Basics and MongDB
Shamima Yeasmin Mukta
 
Dsm project-h base-cassandra
Dsm project-h base-cassandraDsm project-h base-cassandra
Dsm project-h base-cassandra
Shantanu Deshpande
 
my no sql introductiobkjhikjhkjhkhjhgchjvbbnn.ppt
my no sql introductiobkjhikjhkjhkhjhgchjvbbnn.pptmy no sql introductiobkjhikjhkjhkhjhgchjvbbnn.ppt
my no sql introductiobkjhikjhkjhkhjhgchjvbbnn.ppt
wondimagegndesta
 
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdfNoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
NoSQL BIg Data Analytics Mongo DB and Cassandra .pdf
SharmilaChidaravalli
 
Research on vector spatial data storage scheme based
Research on vector spatial data storage scheme basedResearch on vector spatial data storage scheme based
Research on vector spatial data storage scheme based
Anant Kumar
 
Managing Big data with Hadoop
Managing Big data with HadoopManaging Big data with Hadoop
Managing Big data with Hadoop
Nalini Mehta
 
DSM - Comparison of Hbase and Cassandra
DSM - Comparison of Hbase and CassandraDSM - Comparison of Hbase and Cassandra
DSM - Comparison of Hbase and Cassandra
Shrikant Samarth
 
2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx2.Introduction to NOSQL (Core concepts).pptx
2.Introduction to NOSQL (Core concepts).pptx
RushikeshChikane2
 
Apache HBase™
Apache HBase™Apache HBase™
Apache HBase™
Prashant Gupta
 
unit2-ppt1.pptx
unit2-ppt1.pptxunit2-ppt1.pptx
unit2-ppt1.pptx
revathigollu23
 
Presentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMSPresentation on NoSQL Database related RDBMS
Presentation on NoSQL Database related RDBMS
abdurrobsoyon
 
Hbase 20141003
Hbase 20141003Hbase 20141003
Hbase 20141003

No SQL introduction

  • 1. NO SQL 10/20/2014 @ Surabhi Dwivedi 1
  • 2. Contents
      ◦ Introduction and features of NoSQL
      ◦ CAP theorem
      ◦ RDBMS vs. NoSQL
      ◦ NoSQL database family
  • 3. Features: Not Only SQL
      ◦ Non-relational: not an RDBMS
      ◦ Distributed data store: horizontally scalable
      ◦ Schema-free or flexible schema; database JOINs are generally not supported
      ◦ Built for huge amounts of data, e.g. Google and Facebook, which collect terabytes of data
      ◦ BASE properties
          - Basically Available
          - Soft state: the data does not have to be consistent all the time
          - Eventually consistent: the system becomes consistent once updates have propagated, in particular when there are not too many updates
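The "eventually consistent" property can be illustrated with a small simulation. This is only a sketch under stated assumptions: the `Replica` and `Cluster` classes, the version counter, and the explicit `propagate()` step are hypothetical and do not model any particular NoSQL product.

```python
class Replica:
    """One copy of the data; stores key -> (version, value)."""

    def __init__(self):
        self.data = {}

    def apply(self, key, version, value):
        # Keep an update only if it is newer than what we already hold.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


class Cluster:
    """Writes land on replica 0 and reach the others only on propagate()."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]
        self.pending = []  # undelivered updates: (replica index, key, version, value)
        self.version = 0

    def write(self, key, value):
        self.version += 1
        self.replicas[0].apply(key, self.version, value)
        for i in range(1, len(self.replicas)):
            self.pending.append((i, key, self.version, value))

    def propagate(self):
        for i, key, version, value in self.pending:
            self.replicas[i].apply(key, version, value)
        self.pending.clear()


cluster = Cluster(3)
cluster.write("x", "v1")
stale = [r.read("x") for r in cluster.replicas]      # soft state: copies disagree
cluster.propagate()
converged = [r.read("x") for r in cluster.replicas]  # all copies agree again
```

Between `write` and `propagate` the replicas return different answers, which is the "soft state"; once no updates remain in flight, every replica returns the same value.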
  • 4. NoSQL
      ◦ Provides a mechanism for storing and retrieving data that is modeled by means other than the tabular relations used in relational databases
      ◦ Used in big data and real-time web applications
      ◦ NoSQL isn't a single product or technology but an umbrella term for a category of databases
  • 5. What NoSQL Typically Does Not Provide
      ◦ Joins
      ◦ GROUP BY
      ◦ ACID transactions
      ◦ SQL
      ◦ NoSQL databases reject:
          - the overhead of ACID transactions
          - the "complexity" of SQL
          - the burden of up-front schema design
          - declarative query expression
  • 7. Requirement of NoSQL
  • 8. NoSQL: Users
  • 9. CAP Theorem
  • 10. CAP Theorem
      ◦ Three properties of a system
          - Consistency: all copies have the same value
          - Availability: the system keeps running even if parts have failed, via replication
          - Partition tolerance: the network can break into two or more parts, each with active systems that can't talk to the other parts
      ◦ Very large systems will partition at some point
          - When that happens, choose either consistency or availability
          - Traditional databases choose consistency
          - Most web applications choose availability, except for specific parts such as order processing
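The choice between consistency and availability during a partition can be sketched with two replicas. The `TwoReplicaStore` class and its `"CP"`/`"AP"` modes are hypothetical names used only for illustration; real systems make this trade-off with far more nuance.

```python
class TwoReplicaStore:
    """Two copies of one value; a write arrives at replica A and is copied to B."""

    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.replica_a = "v0"
        self.replica_b = "v0"
        self.partitioned = False  # True when A cannot talk to B

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Consistency first: refuse the write rather than let copies diverge.
                raise RuntimeError("write rejected: replica B is unreachable")
            # Availability first: accept the write on A only; B is now stale.
            self.replica_a = value
            return
        self.replica_a = value
        self.replica_b = value


cp = TwoReplicaStore("CP")
cp.partitioned = True
try:
    cp.write("v1")
    cp_accepted = True
except RuntimeError:
    cp_accepted = False

ap = TwoReplicaStore("AP")
ap.partitioned = True
ap.write("v1")
ap_consistent = ap.replica_a == ap.replica_b
```

The CP store stays consistent but turns the client away; the AP store keeps serving but its two copies disagree until the partition heals.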
  • 11. RDBMS vs. NoSQL
      RDBMS | NoSQL
      Structured and organized data | Stands for "Not Only SQL"
      Structured Query Language (SQL) | No declarative query language
      Data and its relationships are stored in separate tables | No predefined schema
      Data Manipulation Language, Data Definition Language | Variants: key-value pair store, column store, document store, graph store
      Tight consistency | Eventual consistency rather than ACID properties
      ACID transactions | CAP theorem: prioritizes high performance, high availability, and scalability
  • 12. Example: NoSQL Databases
  • 13. NoSQL Database Family
  • 14. NoSQL Database Types
      ◦ Distributed key-value systems
          - Hash table of keys
          - Look up a single value for a key
          - Example: Amazon's Dynamo
      ◦ Document-based systems
          - Store documents made up of tagged elements
          - Access data by key or by searching the "document" data
          - Examples: CouchDB, MongoDB
      ◦ Column-based systems
          - Each storage block contains data from only one column
          - Examples: Google's BigTable, Facebook's Cassandra
      ◦ Graph-based systems
          - Use a graph structure
          - Examples: Google's Pregel, Neo4j
  • 15. Column-Oriented Databases
      ◦ Column-family stores let you store data with keys mapped to values, with the values grouped into multiple column families, each column family being a map of data
      ◦ Among the most popular types of non-relational databases
      ◦ Column-family databases store data in column families as rows
      ◦ Many columns are associated with each row key
      ◦ Column families are groups of related data that are often accessed together
      ◦ The basic unit of storage in a column-family database is a column
      ◦ Examples
          - Hadoop / HBase
          - Cassandra: Apache Cassandra was initially developed at Facebook to power its Inbox Search feature
          - Cloudata: a clone of Google's Bigtable, like HBase
  • 16. Column-Oriented Databases (cont.)
      ◦ Data tables are stored as sections of columns of data rather than as rows of data
      ◦ A column stores a value together with a timestamp, which is used to differentiate valid content from stale content
      ◦ The application uses the timestamp to find out which of the values stored on the backup nodes are up to date
      ◦ Column family
          - A container for columns, analogous to a table in a relational database
          - A column family has a name and a map with a key and a value (the value is itself a map containing columns)
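The role of the timestamp can be sketched as follows. The `Column` dataclass and the `freshest` helper are hypothetical, and the integer timestamps are made up for the example; real column stores define their own cell formats and reconciliation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    value: str
    timestamp: int  # larger means written later


def freshest(copies):
    """Pick the most recently written copy of a column among several nodes."""
    return max(copies, key=lambda column: column.timestamp)


# The same column as seen on three nodes; two of them hold stale content.
copies = [
    Column("zip", "94301", timestamp=100),
    Column("zip", "10001", timestamp=250),
    Column("zip", "94301", timestamp=180),
]
latest = freshest(copies)
```

Here `latest` is the copy with timestamp 250, so a reader treats the other two values as stale.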
  • 17. Examples
      ◦ Cassandra
      ◦ HBase
      ◦ Hypertable
      ◦ Amazon SimpleDB
  • 18.
      {
        "row_key_1" : { "name" : { ... }, "location" : { ... }, "preferences" : { ... } },
        "row_key_2" : { "name" : { ... }, "location" : { ... }, "preferences" : { ... } },
        "row_key_3" : { ... }
      }
      ◦ Row key ("row_key_1", ...): uniquely identifies a record in a column database
      ◦ Column-family identifier ("name", "location", "preferences"): the second-level key
  • 19.
      {
        "row_key_1" : {
          "name" : { "first_name" : "Jolly", "last_name" : "Goodfellow" },
          "location" : { "zip" : "94301" },
          "preferences" : { "d/r" : "D" }
        },
        "row_key_2" : {
          "name" : { "first_name" : "Very", "middle_name" : "Happy", "last_name" : "Guy" },
          "location" : { "zip" : "10001" },
          "preferences" : { "v/nv" : "V" }
        },
        ...
      }
      ◦ Each row may have a different set of columns within a column family
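The same two rows can be written as nested Python dictionaries to check the point about differing column sets. This is only an in-memory illustration; the `columns` helper is a hypothetical name, not part of any database API.

```python
# row key -> column family -> column name -> value
rows = {
    "row_key_1": {
        "name": {"first_name": "Jolly", "last_name": "Goodfellow"},
        "location": {"zip": "94301"},
        "preferences": {"d/r": "D"},
    },
    "row_key_2": {
        "name": {"first_name": "Very", "middle_name": "Happy", "last_name": "Guy"},
        "location": {"zip": "10001"},
        "preferences": {"v/nv": "V"},
    },
}


def columns(row_key, family):
    """Return the set of column names a row holds in one column family."""
    return set(rows[row_key].get(family, {}))


# Columns present in row 2's "name" family but absent from row 1's.
extra_in_row_2 = columns("row_key_2", "name") - columns("row_key_1", "name")
```

Both rows share the column families `name`, `location`, and `preferences`, but the columns inside each family differ from row to row.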
  • 20. Contrasting Column Databases with RDBMS
      ◦ Column-oriented databases
          - need minimal schema definition
          - easily accommodate new columns
          - use predefined column families: sets of columns grouped together into a bundle
          - have column families with no data type, whereas a column in an RDBMS has a data type
          - are designed to scale and can easily accommodate millions of columns and billions of rows
  • 21. Contrasting Column Databases with RDBMS (cont.)
  • 22. Hadoop Distributed File System (HDFS): Background for Distributed Storage
      ◦ Apache Hadoop is an open-source software project
      ◦ It enables the distributed processing of large data sets across clusters of servers
      ◦ It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance
      ◦ Data in a Hadoop cluster is broken down into smaller pieces (called blocks) and distributed throughout the cluster
      ◦ The map and reduce functions can then be executed on smaller subsets of the larger data set
  • 23. Hadoop Distributed File System (HDFS): MapReduce
      ◦ The Map() procedure performs filtering and sorting (such as sorting students by first name into queues, one queue for each name)
      ◦ The Reduce() procedure performs a summary operation (such as counting the number of students in each queue, yielding name frequencies)
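The student example can be written out in a few lines of plain Python. This is a single-process sketch of the idea, not Hadoop code: `map_phase`, `reduce_phase`, and the sample student list are assumptions made for illustration.

```python
from collections import defaultdict


def map_phase(students):
    """Sort students into queues, one queue per first name."""
    queues = defaultdict(list)
    for first_name, last_name in students:
        queues[first_name].append((first_name, last_name))
    return queues


def reduce_phase(queues):
    """Summarize each queue by counting its students."""
    return {name: len(queue) for name, queue in queues.items()}


students = [("Ana", "Lopez"), ("Ben", "Ito"), ("Ana", "Khan"), ("Cara", "Diaz")]
name_frequencies = reduce_phase(map_phase(students))
```

In a real cluster the map step would run in parallel on the blocks held by each server, and the queues would be shuffled to reducers by key before being counted.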
  • 24. Hadoop Distributed File System (HDFS): Example
      ◦ Consider a file containing the phone numbers of everyone in the United States
      ◦ The people with a last name starting with A might be stored on server 1, B on server 2, and so on
      ◦ In a Hadoop world, pieces of this phonebook are stored across the cluster
      ◦ To reconstruct the entire phonebook, your program needs the blocks from every server in the cluster
      ◦ To stay available as components fail, HDFS by default replicates each of these pieces onto two additional servers
      ◦ This redundancy offers multiple benefits:
          - Higher availability
          - Scalability: a Hadoop cluster breaks work into smaller chunks and runs those jobs on all the servers in the cluster
          - Data locality, which is critical when working with large data sets
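Block splitting and three-way replication can be sketched as below. The round-robin placement, the tiny 16-byte block size, and the server names are assumptions chosen to keep the example small; HDFS uses much larger blocks and its own placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, servers, copies=3):
    """Assign each block to `copies` distinct servers, round-robin (illustrative only)."""
    return {
        block: [servers[(block + r) % len(servers)] for r in range(copies)]
        for block in range(num_blocks)
    }


def readable_blocks(placement, failed):
    """Blocks that still have at least one replica on a live server."""
    return [b for b, hosts in placement.items() if any(h not in failed for h in hosts)]


phonebook = b"Adams 555-0100\nBaker 555-0101\nClark 555-0102\nDavis 555-0103\n"
blocks = split_into_blocks(phonebook, 16)
servers = ["s1", "s2", "s3", "s4", "s5"]
placement = place_replicas(len(blocks), servers)
survivors = readable_blocks(placement, failed={"s1", "s2"})
```

With one original and two additional copies per block, every block remains readable here even after two servers fail.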
  • 25. HBase: Distributed Storage
      ◦ HBase is a column-oriented database management system that runs on top of HDFS
      ◦ HBase's distributed architecture is designed for applications storing up to billions of rows and millions of columns
      ◦ It is a good option for replacing a relational database that cannot support such large data sets
  • 26. HBase Distributed Storage Architecture
  • 28. Master-Worker Pattern
      ◦ A master and a set of workers (range servers)
      ◦ When HBase starts, the master allocates a set of ranges to each range server
      ◦ Each range stores an ordered set of rows, where each row is identified by a unique row key
      ◦ When the number of rows stored in a range grows beyond a configured threshold, the range is split into two and the rows are divided between the two new ranges
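The split rule can be sketched with sorted Python lists standing in for ranges. The `insert_row` function, the threshold of 4, and the split-at-the-middle policy are assumptions for illustration; HBase's actual split logic works on storage size and is more involved.

```python
import bisect


def insert_row(ranges, row_key, threshold):
    """Insert a row key into the range that covers it; split the range if it grows too large.

    `ranges` is a list of sorted lists of row keys, ordered by their first key.
    """
    starts = [r[0] for r in ranges[1:]]          # start key of every range but the first
    index = bisect.bisect_right(starts, row_key)  # range responsible for this key
    target = ranges[index]
    bisect.insort(target, row_key)                # keep rows ordered within the range
    if len(target) > threshold:
        middle = len(target) // 2
        ranges[index:index + 1] = [target[:middle], target[middle:]]


ranges = [[]]  # start with a single empty range
for key in ["a", "b", "c", "d", "e"]:
    insert_row(ranges, key, threshold=4)
after_split = [list(r) for r in ranges]

insert_row(ranges, "bb", threshold=4)  # lands in the first of the two new ranges
```

The fifth insert pushes the single range past the threshold, so it is divided into two ranges that together still hold all rows in order.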
  • 29. Write-Ahead Log (WAL)
      ◦ WAL is a common technique for providing atomicity and durability (two of the ACID properties)
      ◦ When data is written to a region, it is first written to the write-ahead log, if enabled
      ◦ It is then written to the region's in-memory store
      ◦ When the in-memory store is full, the data is flushed to disk and persisted in the underlying distributed storage
      ◦ In HBase, a client program can turn the WAL on or off
      ◦ Switching it off boosts performance but reduces reliability and the ability to recover after a failure
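The write path can be sketched in a few lines. The `Region` class below is a hypothetical in-memory model: plain Python containers stand in for the durable log and for files on distributed storage, and none of the names correspond to the HBase client API.

```python
class Region:
    def __init__(self, memstore_limit, wal_enabled=True):
        self.wal_enabled = wal_enabled
        self.memstore_limit = memstore_limit
        self.wal = []       # stands in for the durable write-ahead log
        self.memstore = {}  # in-memory store, lost on a crash
        self.disk = {}      # stands in for files on distributed storage

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))  # 1. log the write first
        self.memstore[key] = value         # 2. then update the in-memory store
        if len(self.memstore) >= self.memstore_limit:
            self.flush()                   # 3. flush when the store is full

    def flush(self):
        self.disk.update(self.memstore)
        self.memstore.clear()
        self.wal.clear()  # flushed entries no longer need to be replayed

    def crash_and_recover(self):
        self.memstore = {}                 # in-memory state is gone
        for key, value in self.wal:        # replay whatever the log still holds
            self.memstore[key] = value

    def get(self, key):
        return self.memstore.get(key, self.disk.get(key))


with_wal = Region(memstore_limit=3, wal_enabled=True)
with_wal.put("row1", "a")
with_wal.crash_and_recover()

without_wal = Region(memstore_limit=3, wal_enabled=False)
without_wal.put("row1", "a")
without_wal.crash_and_recover()
```

After the simulated crash, the region with the log enabled still returns `row1`, while the region without it has lost the unflushed write.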
  • 30. Write-Ahead Log (WAL)
  • 31. Document Model
      ◦ The notion of a schema is dynamic: each document can contain different fields
          - This is helpful for modeling unstructured and polymorphic data
          - It also makes it easier to evolve an application during development, for example by adding new fields
          - Data can be queried based on any field in a document
  • 32. Document Store
      ◦ Documents are grouped together into collections
      ◦ Collections are analogous to relational tables
      ◦ Collections don't impose strict schema constraints
      ◦ The records are not documents in the sense of a word-processing document
      ◦ The structure of any document can be modified by adding and removing members:
          - by reading the document into a program, modifying it, and re-saving it
          - by using various update commands
  • 33. Document Store
      ◦ In MongoDB, each document is stored in BSON format
      ◦ Binary data (using the BSON format) can be stored in any field of the document
      ◦ BSON is a binary-encoded representation of a JSON-like document format
          - a nested set of key/value pairs
          - JSON stands for JavaScript Object Notation
      ◦ BSON is a superset of JSON and supports additional types:
          - regular expressions
          - binary data
          - dates
      ◦ Each document has a unique identifier, which MongoDB can generate automatically as an object ID
  • 34. Document Store
      ◦ Document databases are good for storing and managing Big Data-size collections of:
          - literal documents, like text documents, email messages, and XML documents
          - conceptual "documents", like denormalized (aggregate) representations of a database entity
      ◦ They are also good for storing "sparse" data: irregular (semi-structured) data that would require extensive use of nulls in an RDBMS
  • 35. Document Store
      ◦ "Documents" are encoded in a standard data exchange format: XML, JSON (JavaScript Object Notation), or BSON (Binary JSON)
      ◦ Unlike in simple key-value stores, the value column in a document database contains semi-structured data, specifically attribute name/value pairs
      ◦ A single column can house hundreds of such attributes
      ◦ The number and type of attributes recorded can vary from row to row
      ◦ Both keys and values are fully searchable in document databases
  • 36. Document Store
      ◦ Records within a single table can have different structures
      ◦ An example record from MongoDB, in JSON format, might look like:
          {
            "_id" : ObjectId("4fccbf281168a6aa3c215443"),
            "first_name" : "Thomas",
            "last_name" : "Jefferson",
            "address" : {
              "street" : "1600 Pennsylvania Ave NW",
              "city" : "Washington",
              "state" : "DC"
            }
          }
  • 37. Document Store: Internals
      ◦ Document stores are like key-value stores, except that the value is a "document"
      ◦ Data model: (key, "document") pairs
      ◦ Basic operations: Insert(key, document), Fetch(key), Update(key), Delete(key)
      ◦ Also Fetch() based on document contents
      ◦ Example systems: CouchDB, MongoDB
      ◦ Document stores hold arbitrary, extensible structures as the "value"
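The basic operations can be sketched with an in-memory class. `DocumentStore` and its method names are hypothetical; in particular, passing a dictionary of changes to `update` and the single-field `fetch_where` query are assumptions, not the interface of CouchDB or MongoDB.

```python
import copy


class DocumentStore:
    """Toy (key, document) store where documents are plain dictionaries."""

    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        if key in self._docs:
            raise KeyError(f"duplicate key: {key}")
        self._docs[key] = copy.deepcopy(document)

    def fetch(self, key):
        # Return a copy so callers cannot change stored data by accident.
        return copy.deepcopy(self._docs.get(key))

    def update(self, key, changes):
        # Add or overwrite members of an existing document.
        self._docs[key].update(changes)

    def delete(self, key):
        self._docs.pop(key, None)

    def fetch_where(self, field, value):
        """Fetch based on document contents: keys of documents with field == value."""
        return [k for k, doc in self._docs.items() if doc.get(field) == value]


store = DocumentStore()
store.insert("p1", {"first_name": "Thomas", "last_name": "Jefferson"})
store.insert("p2", {"first_name": "Very", "middle_name": "Happy"})  # different fields
store.update("p1", {"city": "Washington"})                          # add a member
in_washington = store.fetch_where("city", "Washington")
store.delete("p2")
```

The two documents carry different fields, and a new field is added to one of them without touching any schema, which is what distinguishes this model from a plain key-value store with opaque values.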
  • 39. Advantages of the Document Model
      ◦ Data is represented more naturally at the database level
      ◦ An aggregated document can be accessed with a single call to the database, rather than by JOINing multiple tables to answer a query
      ◦ A MongoDB document is physically stored as a single object, requiring only a single read from memory or disk, whereas RDBMS JOINs require multiple reads from multiple physical locations
      ◦ Distributing the database across multiple nodes (a process called sharding) is easier
          - horizontal scalability
          - documents are self-contained
  • 40. MongoDB: Features
      ◦ High performance: MongoDB provides high-performance data persistence
          - Support for embedded data models reduces I/O activity on the database system
          - Indexes support faster queries and can include keys from embedded documents and arrays
      ◦ High availability
          - Automatic failover
          - Data redundancy
          - A replica set is a group of MongoDB servers that maintain the same data set, providing redundancy and increasing data availability
      ◦ Automatic scaling
          - MongoDB provides horizontal scalability as part of its core functionality
          - Automatic sharding distributes data across a cluster of machines
          - Replica sets can provide eventually consistent reads for low-latency, high-throughput deployments
  • 41. MongoDB: Sharding
      ◦ Data is distributed across multiple range servers
      ◦ MongoDB allows ordered collections to be saved across multiple machines
      ◦ Shards are replicated to allow failover
          - A large collection could be split into four shards
          - Each shard in turn may be replicated three times
          - This would create 12 units of a MongoDB server
          - The two additional copies of each shard serve as failover units
      ◦ Sharding addresses the challenge of scaling to support high throughput and large data sets
          - Each shard processes fewer operations as the cluster grows, so the cluster can increase capacity and throughput horizontally
          - For example, to insert data, the application only needs to access the shard responsible for that record
          - Sharding reduces the amount of data that each server needs to store: each shard stores less data as the cluster grows
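The four-shard, three-copy layout can be sketched with range-based routing. The split points `"g"`, `"n"`, `"t"`, the `shard_for` function, and the dictionaries standing in for servers are all assumptions for illustration; they are not how MongoDB configures or routes shards.

```python
import bisect

SPLIT_POINTS = ["g", "n", "t"]  # hypothetical shard-key boundaries, giving 4 shards
COPIES_PER_SHARD = 3            # one primary plus two failover copies

# shards[i] is a list of replicas; each replica is a dict of key -> document
shards = [[{} for _ in range(COPIES_PER_SHARD)] for _ in range(len(SPLIT_POINTS) + 1)]


def shard_for(key):
    """Index of the shard whose key range contains `key`."""
    return bisect.bisect_right(SPLIT_POINTS, key)


def insert(key, document):
    """Only the shard responsible for the key is touched; all its copies get the write."""
    for replica in shards[shard_for(key)]:
        replica[key] = document


insert("alice", {"plan": "free"})
insert("mike", {"plan": "pro"})
insert("zoe", {"plan": "pro"})

total_units = sum(len(replicas) for replicas in shards)
```

Four shards with three copies each give the 12 server units mentioned on the slide, and each insert reaches only one of the four shards.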
  • 42. The data set is divided and distributed over multiple servers, or shards. Each shard is an independent database, and collectively the shards make up a single logical database.
  • 43. Distributed Key-Value Systems
      ◦ Key-value pair (KVP) stores
          - Access data (values) by strings called keys
          - Data has no required format and may take any form
          - Extremely simple interface
      ◦ Data model: (key, value) pairs
      ◦ A NoSQL key-value store is a single table with two columns: one is the (primary) key and the other is the value
      ◦ Basic operations: Insert(key, value), Fetch(key), Update(key), Delete(key)
      ◦ Implementation: efficiency, scalability, fault tolerance
          - Records are distributed to nodes based on the key
          - Replication
          - Single-record transactions, "eventual consistency"
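Distributing records to nodes by key can be sketched with a hash of the key. The `KeyValueCluster` class is hypothetical and keeps everything in memory; hashing with SHA-256 modulo the node count is one simple choice made for this example, and replication is left out.

```python
import hashlib


class KeyValueCluster:
    """Toy key-value store that spreads records over nodes by hashing the key."""

    def __init__(self, num_nodes):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash, so the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return self.nodes[int.from_bytes(digest[:8], "big") % len(self.nodes)]

    def insert(self, key, value):
        node = self._node_for(key)
        if key in node:
            raise KeyError(f"duplicate key: {key}")
        node[key] = value

    def fetch(self, key):
        return self._node_for(key).get(key)

    def update(self, key, value):
        node = self._node_for(key)
        if key not in node:
            raise KeyError(f"unknown key: {key}")
        node[key] = value

    def delete(self, key):
        self._node_for(key).pop(key, None)


kv = KeyValueCluster(num_nodes=4)
kv.insert("user:1", b"any bytes at all")  # values have no required format
kv.insert("user:2", {"name": "Jolly"})
kv.update("user:2", {"name": "Very"})
kv.delete("user:1")
records_stored = sum(len(node) for node in kv.nodes)
```

Each operation touches exactly one node, which is why single-record operations are cheap while multi-record transactions are typically not offered.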
  • 44. Examples: Key-Value Stores
      ◦ Riak
      ◦ Redis
      ◦ MemcacheDB
      ◦ Berkeley DB
      ◦ HamsterDB (especially suited for embedded use)
      ◦ Amazon DynamoDB (not open source)
      ◦ Project Voldemort (open-source implementation of Amazon's Dynamo design)
  • 45. References
      ◦ Shashank Tiwari, Professional NoSQL
      ◦ MongoDB Manual, http://docs.mongodb.org
      ◦ http://docs.mongodb.org/manual/core/sharding-introduction/
      ◦ Wikipedia references
      ◦ Alex Baranau, Intro to HBase Internals & Schema Design (for HBase Users), Sematext International, 2012