 Akbar Shaikh | Monocept
[Chart: data volume by year, 2002 to 2012]
Data
 Facebook had 60k servers in 2010
 Google had 450k servers in 2006 (speculated)
 Microsoft: between 100k and 500k servers (since Azure)
 Amazon: likely has similar numbers, too (S3)
 Atomicity: Everything in a transaction succeeds, or else all of it is rolled back.
 Consistency: A transaction cannot leave the database in an inconsistent state.
 Isolation: One transaction cannot interfere with another.
 Durability: A completed transaction persists, even after applications restart.
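The atomicity property above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the accounts table, the names, and the transfer() helper are hypothetical, and an in-memory database is used, so durability is not demonstrated.

```python
import sqlite3


def open_bank():
    """Create a throwaway in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates happen or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
            row = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if row is None or row[0] < 0:
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
        return True
    except ValueError:
        return False


def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

A failed transfer leaves both balances exactly as they were, because the first UPDATE is rolled back together with the rest of the transaction.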
The point I am trying to make here is that we may have to look beyond ACID to
something called BASE, a term coined by Eric Brewer:
 Basic availability: Each request is guaranteed a response, whether the execution
succeeded or failed.
 Soft state: The state of the system may change over time, at times without any
input (because of eventual consistency).
 Eventual consistency: The database may be momentarily inconsistent but will be
consistent eventually.
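A minimal sketch of soft state and eventual consistency, assuming a hypothetical last-write-wins scheme with explicit version numbers. The Replica class and the anti_entropy() function are illustrative names, not part of any real database.

```python
class Replica:
    """One copy of the data; each key maps to (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        # Keep the write only if it is newer than what this replica already has.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None


def anti_entropy(replicas):
    """Background sync: every replica ends up with the newest version of every key."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)
```

Between a write and the next sync, two replicas can return different answers for the same key. After the sync their state changes without any client input, and they agree again.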
 Consistency: Data access in a distributed database is considered to be consistent when an
update written on one node is immediately available on another node.
 Availability: The system guarantees availability for requests even though one or more
nodes are down.
 Partition tolerance: Nodes can be physically separated from each other at any given
point and for any length of time. The period during which they cannot reach each other, due to
routing problems, network interface troubles, or firewall issues, is called a network
partition. During the partition, all nodes should still be able to serve both read and write
requests. Ideally the system automatically reconciles updates as soon as every node can
reach every other node again.
Eric Brewer also noted that it is impossible for a distributed computer system to provide
consistency, availability, and partition tolerance simultaneously: when a partition occurs, the
system has to give up either consistency or availability. This is more commonly referred
to as the CAP theorem.
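The trade-off can be made concrete with a toy two-node store. This is a sketch under simplifying assumptions: TwoNodeStore and its "CP"/"AP" modes are hypothetical, and real systems offer many intermediate choices.

```python
class TwoNodeStore:
    """Two replicas of a single value, with a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = [None, None]  # one slot per node
        self.partitioned = False

    def write(self, node_index, value):
        if not self.partitioned:
            # Healthy network: the write reaches both nodes.
            self.values = [value, value]
            return True
        if self.mode == "CP":
            # Refuse the write: replicas stay consistent, availability is lost.
            return False
        # Accept the write locally: still available, but the replicas diverge.
        self.values[node_index] = value
        return True

    def read(self, node_index):
        return self.values[node_index]
```

During a partition the CP store rejects writes and both nodes keep returning the same value, while the AP store keeps accepting writes and the two nodes may disagree until they are reconciled.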
ACID
 Strong consistency for transactions is the highest priority
 Availability is less important
 Pessimistic
 Complex mechanisms
BASE
 Availability and scaling are the highest priorities
 Weak consistency
 Optimistic
 Simple and fast
{ "customer" : { "billingAddress" : [ { "city" : "Chicago" } ],
    "id" : 1,
    "name" : "Martin",
    "orders" : [ { "customerId" : 1,
        "id" : 99,
        "orderItems" : [ { "price" : 32.45,
            "productId" : 27,
            "productName" : "NoSQL Distilled"
          } ],
        "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" },
            "ccinfo" : "1000-1000-1000-1000",
            "txnId" : "abelif879rft"
          } ],
        "shippingAddress" : [ { "city" : "Chicago" } ]
      } ]
  }
}
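A small sketch of why such an aggregate is convenient for application code: the customer, the orders, and the order items arrive as one unit and map directly onto in-memory structures. The order_total() helper is a hypothetical name, and the document is a trimmed copy of the one above.

```python
import json

# Trimmed copy of the customer aggregate shown above.
RAW = """
{ "customer" : { "id" : 1,
    "name" : "Martin",
    "billingAddress" : [ { "city" : "Chicago" } ],
    "orders" : [ { "id" : 99,
        "customerId" : 1,
        "orderItems" : [ { "productId" : 27,
            "price" : 32.45,
            "productName" : "NoSQL Distilled" } ],
        "shippingAddress" : [ { "city" : "Chicago" } ] } ] } }
"""

customer = json.loads(RAW)["customer"]


def order_total(order):
    """Sum the item prices of one order taken from the aggregate."""
    return sum(item["price"] for item in order["orderItems"])
```

No join or object-relational mapping is needed: customer["orders"][0] is already the complete order.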
We see two primary reasons why people consider using a NoSQL database.
 Application development productivity.
A lot of application development effort is spent on mapping data between in-memory
data structures and a relational database. A NoSQL database may provide a data model
that better fits the application’s needs, thus simplifying that interaction and resulting in
less code to write, debug, and evolve.
 Large-scale data.
Organizations are finding it valuable to capture more data and process it more quickly.
They are finding it expensive, if even possible, to do so with relational databases. The
primary reason is that a relational database is designed to run on a single machine, but
it is usually more economical to run large data and computing loads on clusters of many
smaller and cheaper machines. Many NoSQL databases are designed explicitly to run
on clusters, so they make a better fit for big data scenarios.
 For almost as long as we’ve been in the software profession, relational databases
have been the default choice for serious data storage, especially in the world of
enterprise applications.
 If you’re an architect starting a new project, your only choice is likely to be which
relational database to use.
 After such a long period of dominance, the current excitement about NoSQL
databases comes as a surprise.
NoSQL databases have a lot more to offer than just solving the problems of scale. The main benefits are as follows:
 Schemaless data representation: Almost all NoSQL implementations offer schemaless data representation. This
means that you don’t have to think too far ahead to define a structure, and you can continue to evolve it over time,
including adding new fields or even nesting the data, for example, in the case of a JSON representation.
 Development time: I have heard stories about reduced development time because one doesn’t have to deal with
complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to
create your final view?
 Speed: Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of
milliseconds, especially over mobile and other intermittently connected devices, you have a much higher probability
of winning users over.
 Plan ahead for scalability: You read it right. Why fall into the ditch and then try to get out of it? Why not just plan
ahead so that you never fall into one? In other words, your application can be quite elastic: it can handle sudden
spikes of load. Of course, you win users over straightaway.
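As a rough illustration of schemaless evolution, the hypothetical profile documents below were written at different times and carry different fields. The names and the city_of() helper are made up for this sketch.

```python
# Three documents in the same collection, each written at a different stage.
profiles = [
    {"name": "Martin"},                                 # original shape
    {"name": "Pramod", "email": "pramod@example.com"},  # a field added later
    {"name": "Ana", "address": {"city": "Chicago"}},    # nested data added later
]


def city_of(profile):
    """Read an optional nested field, tolerating documents that predate it."""
    return profile.get("address", {}).get("city")
```

No migration was run when the new fields appeared. The price is that the reading code has to cope with fields that may be missing.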
Some NoSQL use cases
1. Massive data volumes
 Massively distributed architecture required to store the data
 Google, Amazon, Yahoo, Facebook…
2. Extreme query workload
 Impossible to efficiently do joins at that scale with an RDBMS
3. Schema evolution
 Schema flexibility (migration) is not trivial at large scale
 Schema changes can be gradually introduced with NoSQL
Key-value stores
The main idea here is using a hash table where there is a unique key and a pointer
to a particular item of data. The key/value model is the simplest and easiest to
implement. But it is inefficient when you are only interested in querying or updating
part of a value, among other disadvantages.
One key -> one value, very fast
Key: hash (no duplicates)
Value: binary object ("BLOB"); the DB does not understand your content
[Figure: the key customer_22 maps to an opaque binary value that the database cannot interpret]
Key Value Databases
 A key-value store is a simple hash table
 Primarily used when all access to the database is via primary key
 Simplest NoSQL data stores to use from an API perspective: PUT, GET, DELETE (matches REST)
 The value is a blob; the data store neither knows nor cares what is inside
 Aggregate-oriented
Suitable Use Cases
 Storing Session Information
 User Profiles, Preferences
 Shopping Cart Data
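The PUT/GET/DELETE interface can be sketched in a few lines. This is a toy in-memory stand-in, not the API of any particular product. KeyValueStore and add_to_cart() are hypothetical names, and JSON is just one possible encoding of the blob.

```python
import json


class KeyValueStore:
    """Toy key-value store: a hash table from keys to opaque bytes."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        if not isinstance(value, bytes):
            raise TypeError("the store only accepts opaque bytes")
        self._table[key] = value

    def get(self, key):
        return self._table.get(key)

    def delete(self, key):
        # Returns True if the key existed.
        return self._table.pop(key, None) is not None


def add_to_cart(store, key, item):
    """Partial update: fetch the whole blob, decode, change, and write it all back."""
    blob = store.get(key)
    cart = json.loads(blob) if blob is not None else []
    cart.append(item)
    store.put(key, json.dumps(cart).encode("utf-8"))
    return cart
```

add_to_cart() shows the inefficiency mentioned earlier: because the store cannot look inside the value, changing one item means rewriting the entire blob.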
Document databases
These were inspired by Lotus Notes and are similar to key-value stores. The model is
basically versioned documents that are collections of other key-value collections.
The semi-structured documents are stored in formats like JSON.
Document databases are essentially the next level of key/value, allowing nested values
associated with each key. Because the store understands the structure of a document,
document databases support more efficient querying.
Document Databases
 Documents are the main concept
 Stores and retrieves documents, which can be XML, JSON, BSON, …
 Documents are self-describing, hierarchical tree data structures which can
consist of maps, collections and scalar values
 Documents stored are similar to each other but do not have to be exactly the same
 Aggregate-oriented
Suitable Use Cases
 Event Logging
 Content Management Systems
 Web Analytics or Real-Time Analytics
 Product Catalog
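Unlike a key-value store, a document store can query inside the value. The sketch below imitates that idea over plain Python dictionaries. get_path(), find(), and the CATALOG data are hypothetical, and the dotted-path syntax is only an illustration, not the query language of a specific database.

```python
def get_path(document, path):
    """Follow a dotted path such as 'details.format'; return None if it is absent."""
    current = document
    for part in path.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def find(collection, criteria):
    """Return the documents whose fields match every path/value pair in criteria."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == expected for path, expected in criteria.items())
    ]


# Similar documents that do not share exactly the same fields.
CATALOG = [
    {"sku": 27, "name": "NoSQL Distilled", "details": {"format": "paperback"}},
    {"sku": 31, "name": "Coffee mug", "details": {"color": "red"}},
]
```

For example, find(CATALOG, {"details.format": "paperback"}) selects the book without the application having to decode every document itself.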
Wide-column stores
Often referred to as "BigTable clones": "a sparse, distributed multi-dimensional sorted map"
These were created to store and process very large amounts of data distributed over many machines.
There are still keys, but they point to multiple columns.
The columns are arranged by column family.
Wide-column Databases
Column stores can greatly improve the performance of queries that only touch a small number of columns
 This is because they only access the data of those particular columns
 Simple math: table t has a total of 10 GB of data, with
 column a: 4 GB
 column b: 2 GB
 column c: 3 GB
 column d: 1 GB
If a query only uses column d, at most 1 GB of data will be processed by a column store
In a row store, the full 10 GB will be processed
 Aggregate-oriented
Suitable Use Cases
• Event Logging
• Content Management Systems
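The 10 GB example can be written out as a tiny calculation. The function names are hypothetical, and the model ignores compression and indexes.

```python
# Column sizes of table t from the example, in GB.
COLUMN_SIZES_GB = {"a": 4, "b": 2, "c": 3, "d": 1}


def gb_scanned_row_store(columns_used):
    """Rows are stored whole, so every column is read regardless of the query."""
    return sum(COLUMN_SIZES_GB.values())


def gb_scanned_column_store(columns_used):
    """Only the columns that the query touches are read."""
    return sum(COLUMN_SIZES_GB[name] for name in columns_used)
```

A query on column d alone scans 1 GB in the column layout and 10 GB in the row layout. A query on a and d scans 5 GB versus 10 GB.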
Graph stores
 Used to store information about networks, such as social connections.
Graph Databases
 Allow you to store entities and the relationships between them
 Entities are known as nodes, which have properties
 Relations are known as edges, which also have properties
 A query on the graph is also known as traversing the graph
 Traversing the relationships is very fast
Suitable Use Cases
 Connected Data
 Routing, Dispatch and Location-Based Services
 Recommendation Engines
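Traversal can be sketched with a breadth-first search over an adjacency map. The GRAPH data and the within_hops() function are hypothetical, and a real graph database would use its own storage and query language.

```python
from collections import deque

# Hypothetical social graph: node -> {neighbor: edge properties}.
GRAPH = {
    "alice": {"bob": {"type": "friend"}, "carol": {"type": "colleague"}},
    "bob": {"dave": {"type": "friend"}},
    "carol": {"dave": {"type": "friend"}},
    "dave": {},
}


def within_hops(graph, start, max_hops):
    """Breadth-first traversal: map each reachable node to its distance from start."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, {}):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    del distance[start]
    return distance
```

A simple recommendation query is "people exactly two hops away", that is, friends of friends who are not yet direct connections.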
POLYGLOT PERSISTENCE
 In 2006, Neal Ford coined the term Polyglot Programming
 Applications should be written in a mix of languages to take advantage of the fact that
different languages are suitable for tackling different problems
 Polyglot Persistence defines a hybrid approach to persistence
 Using multiple data storage technologies
 Selected based on the way data is being used by individual applications
 Why store binary images in relational databases, when there are better storage
systems?
 Can occur both across the enterprise and within a single application
POLYGLOT PERSISTENCE
"Traditional": today we use the same database for all kinds of data
• Business transactions, session management data, reporting, logging information,
content information, ...
• Not all of this data needs the same availability, consistency, or backup properties
Polyglot data storage usage allows you to mix and match relational and NoSQL data stores
[Diagram: shopping cart data, user sessions, completed orders, product catalog, and recommendations, all stored in a single RDBMS]
POLYGLOT PERSISTENCE – CHALLENGES
 Decisions
• Have to decide what data storage technology to use
• Today it is easier to go with relational
 New Data Access APIs
• Each data store has its own mechanisms for
accessing the data
• Different APIs
 Solution: Wrap the data access code into services
(Data/Entity Service) exposed to applications
 This also enforces a contract/schema on a schemaless database
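One way to read the "wrap the data access code into services" advice is sketched below. All names here (Cart, CartStore, KeyValueCartStore, CartService) are hypothetical, and the backend is an in-memory stand-in for a real key-value store.

```python
import json
from dataclasses import dataclass
from typing import Optional, Protocol, Tuple


@dataclass(frozen=True)
class Cart:
    """The contract that applications see, whatever the backend stores."""
    customer_id: int
    items: Tuple[str, ...] = ()


class CartStore(Protocol):
    def save(self, cart: Cart) -> None: ...
    def load(self, customer_id: int) -> Optional[Cart]: ...


class KeyValueCartStore:
    """Stand-in for a schemaless key-value backend that holds JSON blobs."""

    def __init__(self) -> None:
        self._blobs = {}

    def save(self, cart: Cart) -> None:
        self._blobs[f"cart_{cart.customer_id}"] = json.dumps(list(cart.items))

    def load(self, customer_id: int) -> Optional[Cart]:
        blob = self._blobs.get(f"cart_{customer_id}")
        if blob is None:
            return None
        return Cart(customer_id, tuple(json.loads(blob)))


class CartService:
    """Data service: applications call this and never touch the store's own API."""

    def __init__(self, store: CartStore) -> None:
        self._store = store

    def add_item(self, customer_id: int, item: str) -> Cart:
        if not item:
            raise ValueError("item must be a non-empty string")
        current = self._store.load(customer_id) or Cart(customer_id)
        updated = Cart(customer_id, current.items + (item,))
        self._store.save(updated)
        return updated
```

Swapping the key-value backend for a relational one would only require another class with the same save() and load() methods. The validation in add_item() is where the contract gets enforced.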
Replica Sets: High Availability
Replication is the process of synchronizing data across multiple servers.
Purpose of Replication
Replication provides redundancy and increases data availability.
With multiple copies of data on different database servers, replication protects a database from the loss of
a single server.
Replication also allows you to recover from hardware failure and service interruptions.
With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
In some cases, you can use replication to increase read capacity.
Clients have the ability to send read and write operations to different servers.
You can also maintain copies in different data centers to increase the locality and availability of data for
distributed applications.
Replica Sets: High Availability
The primary accepts all write operations from clients. A replica set can have only one primary.
Because only one member can accept write operations, replica sets provide strict consistency
for all reads from the primary.
The secondaries replicate the primary’s oplog and apply the operations to their data sets, so that
the secondaries’ data sets reflect the primary’s data set.
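The primary/secondary relationship can be sketched as replaying an operation log. This is a simplified model and not MongoDB's actual oplog format: Member, Primary, and replicate() are hypothetical names, and only "set" operations are modeled.

```python
class Member:
    """A replica set member holding a copy of the data set."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.applied = 0  # number of oplog entries already applied


class Primary(Member):
    """The only member that accepts writes; it records each one in its oplog."""

    def __init__(self, name):
        super().__init__(name)
        self.oplog = []

    def write(self, key, value):
        self.oplog.append(("set", key, value))
        self.data[key] = value
        self.applied = len(self.oplog)


def replicate(primary, secondary):
    """Apply to the secondary every oplog entry it has not seen yet."""
    for operation, key, value in primary.oplog[secondary.applied:]:
        if operation == "set":
            secondary.data[key] = value
        secondary.applied += 1
```

Until replicate() runs, a read from the secondary can return stale data, which is why the strict-consistency guarantee applies to reads from the primary.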
Replica Sets: High Availability
Automatic Failover
When a primary does not communicate with the other members of the set for more than 10 seconds, the
replica set will attempt to elect another member to become the new primary. The first secondary that
receives a majority of the votes becomes primary.
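The majority rule can be illustrated with a short sketch. It is a deliberate simplification and not MongoDB's election protocol: primary_is_down() and elect_primary() are hypothetical helpers, and the 10-second timeout is the figure quoted above.

```python
def primary_is_down(last_heartbeat, now, timeout_seconds=10.0):
    """True when the primary has been silent for longer than the timeout."""
    return (now - last_heartbeat) > timeout_seconds


def elect_primary(members, votes):
    """Return the first candidate to reach a strict majority of all members, else None.

    members: names of every voting member of the set, reachable or not.
    votes:   mapping of voter name -> candidate name, for the voters that answered.
    """
    needed = len(members) // 2 + 1
    tally = {}
    for voter, candidate in votes.items():
        if voter not in members or candidate not in members:
            continue  # ignore votes from or for unknown members
        tally[candidate] = tally.get(candidate, 0) + 1
        if tally[candidate] >= needed:
            return candidate
    return None
```

In a three-member set, two votes are enough. In a five-member set where only two members can reach each other, nobody reaches the required three votes and no primary is elected.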
Sharding: High Scalability And Throughput
Sharding is a method for storing data across multiple machines.
Purpose of Sharding
Database systems with large data sets and high throughput applications can challenge the capacity of a
single server.
High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage
capacity of a single machine.
Finally, working set sizes larger than the system’s RAM stress the I/O capacity of disk drives.
Sharding: High Scalability And Throughput
Sharding, or horizontal scaling, divides the data set and distributes the data over multiple
servers, or shards, in contrast to vertical scaling, which adds more capacity to a single server.
Each shard is an independent database, and collectively, the shards make up a single
logical database.
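A minimal sketch of how a router might decide which shard owns a key, assuming hash-based partitioning. That is only one possible strategy, ShardedStore is a hypothetical name, and each shard is modeled as a plain dictionary.

```python
import hashlib


class ShardedStore:
    """One logical database spread over several independent shards."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash, so the same key always routes to the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)
```

Each key lives on exactly one shard, so storage and query load are divided among the machines. Note that changing the number of shards in this naive scheme would move most keys.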
Map-Reduce
The map-reduce pattern is a way to organize processing in such a way as to take advantage of multiple
machines on a cluster while keeping as much processing and the data it needs together on the same
machine.
It first gained prominence with Google’s MapReduce framework.
"Map" step: The master node takes the input,
divides it into smaller sub-problems, and distributes
them to worker nodes. A worker node may do this again
in turn, leading to a multi-level tree structure.
The worker node processes the smaller problem,
and passes the answer back to its master node.
"Reduce" step: The master node then collects the answers to all the sub-problems and combines them
in some way to form the output – the answer to the problem it was originally trying to solve.
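The two steps can be sketched with the classic word-count example. This version runs sequentially in one process, whereas a real framework would hand each chunk to a different worker node. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Worker side: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the values of all mapped pairs by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all the values collected for one key."""
    return key, sum(values)


def word_count(chunks):
    """Master side: map every chunk, group by word, then reduce each group."""
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because map_phase() only needs its own chunk, the chunks can be processed on the machines that already hold them, which is the data-locality point made above.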
Advantages of MongoDB over RDBMS
Schemaless: MongoDB is a document database in which one collection can hold different
documents. The number of fields, content, and size can differ from one document to
another.
The structure of a single object is clear
No complex joins
Deep query-ability: MongoDB supports dynamic queries on documents using a
document-based query language that's nearly as powerful as SQL
Ease of scale-out: MongoDB is easy to scale
 Why use MongoDB?
  Document Oriented Storage : Data is stored in the form of JSON style documents
  Index on any attribute
  Replication & High Availability
  Auto-Sharding
  Rich Queries
  Fast In-Place Updates
  Professional Support By MongoDB
 Where to use MongoDB?
  Big Data
  Content Management and Delivery
  Mobile and Social Infrastructure
  User Data Management
  Data Hub
Storage Type: Document
 http://www.mongodb.com/scale
 http://www.mongodb.com/partners/cloud/microsoft
 http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/
 http://www.mongodb.com/leading-nosql-database
 http://nosql.findthebest.com/
 http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis
 http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb
Azure offered as a Service:
 https://mongolab.com/welcome/
MongoDB offered as a Service:
 http://www.objectrocket.com/
 https://www.mongohq.com/
Thank You

NOSQL

  • 1. 1  Akbar Shaikh | Monocept
  • 2. 2 2002 2004 2006 2008 2010 2012 Data
  • 3. 3 Data  Facebook had 60k servers in 2010  Google had 450k servers in 2006 (speculated)  Microsoft: between 100k and 500k servers (since Azure)  Amazon: likely has a similar numbers, too (S3)
  • 4.  Atomicity: Everything in a transaction succeeds lest it is rolled back.  Consistency: A transaction cannot leave the database in an inconsistent state.  Isolation: One transaction cannot interfere with another.  Durability: A completed transaction persists, even after applications restart. 4
  • 5.  Basic availability: Each request is guaranteed a response—successful or failed execution.  Soft state: The state of the system may change over time, at times without any input (for eventual consistency).  Eventual consistency: The database may be momentarily inconsistent but will be consistent eventually. 5 The point I am trying to make here is, we may have to look beyond ACID to something called BASE, coined by Eric Brewer:
  • 6.  Consistency : Data access in a distributed database is considered to be consistent when an update written on one node is immediately available on another node.  Availability : The system guarantees availability for requests even though one or more nodes are down.  Partition Tolerance : Nodes can be physically separated from each other at any given point and for any length of time. The time they're not able to reach each other, due to routing problems, network interface troubles, or firewall issues, is called a network partition. During the partition, all nodes should still be able to serve both read and write requests. Ideally the system automatically reconciles updates as soon as every node can reach every other node again. 6 Eric Brewer also noted that it is impossible for a distributed computer system to provide consistency, availability and partition tolerance simultaneously. This is more commonly referred to as the CAP theorem.
  • 7. ACID  Strong consistency for transactions highest priority  Availability less important  Pessimistic  Complex Mechanisms BASE  Availability and Scaling highest priorities  Weak consistency  Optimistic  Simple and Fast 7
  • 8. 8
  • 9. 9
  • 10. 10
  • 11. 11 { "customer" : "billingAddress" : [ { "city" : "Chicago" } ], "id" : 1, "name" : "Martin", "orders" : [ { "customerId" : 1, "id" : 99, "orderItems" : [ { "price" : 32.450000000000003, "productId" : 27, "productName" : "NoSQL Distilled" } ], "orderPayment" : [ { "billingAddress" : { "city" : "Chicago" }, "ccinfo" : "1000-1000-1000-1000", "txnId" : "abelif879rft" } ], "shippingAddress" : [ { "city" : "Chicago" } ] } ] }
  • 12. We see two primary reasons why people consider using a NoSQL database.  Application development productivity. A lot of application development effort is spent on mapping data between in-memory data structures and a relational database. A NoSQL database may provide a data model that better fits the application’s needs, thus simplifying that interaction and resulting in less code to write, debug, and evolve.  Large-scale data. Organizations are finding it valuable to capture more data and process it more quickly. They are finding it expensive, if even possible, to do so with relational databases. The primary reason is that a relational database is designed to run on a single machine, but it is usually more economic to run large data and computing loads on clusters of many smaller and cheaper machines. Many NoSQL databases are designed explicitly to run on clusters, so they make a better fit for big data scenarios. 12
  • 13.  For almost as long as we’ve been in the software profession, relational databases have been the default choice for serious data storage, especially in the world of enterprise applications.  If you’re an architect starting a new project, your only choice is likely to be which relational database to use.  After such a long period of dominance, the current excitement about NoSQL databases comes as a surprise. 13
  • 14.  Schemaless : data representation: Almost all NoSQL implementations offer schemaless data representation. This means that you don’t have to think too far ahead to define a structure and you can continue to evolve over time— including adding new fields or even nesting the data, for example, in case of JSON representation.  Development time : I have heard stories about reduced development time because one doesn’t have to deal with complex SQL queries. Do you remember the JOIN query that you wrote to collate the data across multiple tables to create your final view?  Speed : Even with the small amount of data that you have, if you can deliver in milliseconds rather than hundreds of milliseconds—especially over mobile and other intermittently connected devices—you have much higher probability of winning users over.  Plan ahead for scalability : You read it right. Why fall into the ditch and then try to get out of it? Why not just plan ahead so that you never fall into one. Or in other words, your application can be quite elastic—it can handle sudden spikes of load. Of course, you win users over straightaway. 14 NoSQL databases have a lot more to offer than just solving the problems of scale which are mentioned as follows:
  • 15. Some NoSQL use cases 1. Massive data volumes  Massively distributed architecture required to store the data  Google, Amazon, Yahoo, Facebook… 2. Extreme query workload  Impossible to efficiently do joins at that scale with an RDBMS 3. Schema evolution  Schema flexibility (migration) is not trivial at large scale  Schema changes can be gradually introduced with NoSQL 15
  • 16. 16
  • 17. 17
  • 18. The main idea here is using a hash table where there is a unique key and a pointer to a particular item of data. The Key/value model is the simplest and easiest to implement. Key-value stores But it is inefficient when you are only interested in querying or updating part of a value, among other disadvantages. One key  one value, very fast Key: Hash (no duplicates) Value: binary object („BLOB“) (DB does not understand your content) customer_22 ?=PQ)“§VN? =§(Q$U%V§W=(BN W§(=BU&W§$()= W§$(=% GIVE ME A MEANING! Key Value 18
  • 19.  A key-value store is a simple hash table  Primarily used when all access to the database is via primary key  Simplest NoSQL data stores to use (from an API perspective) PUT, GET, DELETE (matches REST)  Value is a blob with the data store not caring or knowing what is inside  Aggregate-Oriented Suitable Use Cases  Storing Session Information  User Profiles, Preferences  Shopping Cart Data 19 Key Value Databases
  • 20. These were inspired by Lotus Notes and are similar to key-value stores. The model is basically versioned documents that are collections of other key-value collections. The semi-structured documents are stored in formats like JSON. Document databases are essentially the next level of Key/value, allowing nested values associated with each key. Document databases support querying more efficiently. Document databases 20
  • 21.  Documents are the main concept  Stores and retrieves documents, which can be XML, JSON, BSON, …  Documents are self-describing, hierarchical tree data structures which can consist of maps, collections and scalar values  Documents stored are similar to each other but do not have to be exactly the same  Aggregate-Oriented Suitable Use Cases  Event Logging  Content Management Systems  Web Analytics or Real-Time Analytics  Product Catalog 21 Documents Databases
  • 22. Often referred as “BigTable clones” • "a sparse, distributed multi-dimensional sorted map“ These were created to store and process very large amounts of data distributed over many machines. There are still keys but they point to multiple columns. The columns are arranged by column family. Wide-column stores 22
  • 23. Column stores can greatly improve the performance of queries that only touch a small amount of columns  This is because they will only access these columns' particular data  Simple math: table t has a total of 10 GB data, with  column a: 4 GB  column b: 2 GB  column c: 3 GB  column d: 1 GB If a query only uses column d, at most 1 GB of data will be processed by a column store n a row store, the full 10 GB will be processed  Aggregate-Oriented Suitable Use Cases • Event Logging • Content Management Systems 23 Wide-column Databases
  • 24.  Are used to store information about networks, such as social connections. Graph stores 24
  • 25.  Allow to store entities and relationships between these entities  Entities are known as nodes, which have properties  Relations are known as edges, which also have properties  A query on the graph is also known as traversing the graph  Traversing the relationships is very fast Suitable Use Cases  Connected Data  Routing, Dispatch and Location-Based Services  Recommendation Engines 25 Graph Databases
  • 26. POLYGLOT PERSISTENCE  In 2006, Neal Ford coined the term Polyglot Programming  Applications should be written in a mix of languages to take advantage of the fact that different languages are suitable for tackling different problems Polyglot Persistence defines a hybrid approach to persistence  Using multiple data storage technologies  Selected based on the way data is being used by individual applications  Why store binary images in relational databases, when there are better storage systems?  Can occur both over the enterprise as well as within a single application 26
  • 27. Polyglot Persistence. "Traditional" approach: today we use the same database (RDBMS) for all kinds of data, such as business transactions, session management data, reporting, logging information, and content information (for example shopping cart data, user sessions, completed orders, product catalog, recommendations). With one database, all of this data is held to the same availability, consistency, and backup requirements. Polyglot data storage allows relational and NoSQL data stores to be mixed and matched.
  • 28. Polyglot Persistence: Challenges. Decisions: you have to decide which data storage technology to use; today it is easier to go with relational. New data access APIs: each data store has its own mechanisms for accessing the data, so the APIs differ. Solution: wrap the data access code into services (Data/Entity Service) exposed to applications; this enforces a contract/schema on a schemaless database.
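The "wrap the data access code into a service" idea might look like the sketch below. Everything here is an assumption made for illustration: the `ProductService` class, the `ContractError` exception, the required fields, and the plain dictionary standing in for a schemaless store.

```python
class ContractError(ValueError):
    """Raised when a document does not satisfy the service contract."""


# The contract the service enforces, regardless of the store behind it.
REQUIRED_FIELDS = {"sku": str, "name": str, "price": (int, float)}


class ProductService:
    """Hypothetical data service that hides the store from applications."""

    def __init__(self, store=None):
        # A dict stands in for a schemaless database in this sketch.
        self._store = {} if store is None else store

    def save(self, product):
        # Validate before writing, so the schemaless store only ever
        # receives documents that follow the contract.
        for field, expected_type in REQUIRED_FIELDS.items():
            if field not in product:
                raise ContractError(f"missing field: {field}")
            if not isinstance(product[field], expected_type):
                raise ContractError(f"wrong type for field: {field}")
        self._store[product["sku"]] = dict(product)

    def find(self, sku):
        document = self._store.get(sku)
        return None if document is None else dict(document)
```

Applications call `save` and `find` only, so swapping the dictionary for a different data store would not change their code.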
  • 29. Replica Sets: High Availability. Replication is the process of synchronizing data across multiple servers. Purpose of replication: replication provides redundancy and increases data availability. With multiple copies of data on different database servers, replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup. In some cases, you can use replication to increase read capacity: clients can send read and write operations to different servers. You can also maintain copies in different data centers to increase the locality and availability of data for distributed applications.
  • 30. Replica Sets: High Availability. The primary accepts all write operations from clients. A replica set can have only one primary. Because only one member can accept write operations, replica sets provide strict consistency for all reads from the primary. The secondaries replicate the primary's oplog and apply the operations to their own data sets, so that the secondaries' data sets reflect the primary's data set.
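The primary/secondary relationship described on this slide can be modeled in a few lines. This is a simplified sketch, not MongoDB's replication code: the `Primary` and `Secondary` classes and the tuple format of the log entries are assumptions for the example.

```python
class Primary:
    """Accepts writes and records each one in an operation log (oplog)."""

    def __init__(self):
        self.data = {}
        self.oplog = []

    def write(self, key, value):
        self.oplog.append(("set", key, value))
        self.data[key] = value


class Secondary:
    """Replays the primary's oplog to keep its own copy of the data."""

    def __init__(self):
        self.data = {}
        self.applied = 0  # number of oplog entries already applied

    def sync_from(self, primary):
        # Apply only the entries this secondary has not seen yet.
        for operation, key, value in primary.oplog[self.applied:]:
            if operation == "set":
                self.data[key] = value
            self.applied += 1


primary = Primary()
secondary = Secondary()
primary.write("cart:42", ["book"])
primary.write("cart:42", ["book", "pen"])
lagging_copy = dict(secondary.data)  # still empty before the sync
secondary.sync_from(primary)
```

Before `sync_from` runs, the secondary lags behind, which is why strict consistency is only stated for reads from the primary.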
  • 31. Replica Sets: High Availability. Automatic failover: when a primary does not communicate with the other members of the set for more than 10 seconds, the replica set attempts to select another member to become the new primary. The first secondary that receives a majority of the votes becomes primary.
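The two conditions on this slide, the 10-second silence and the majority of votes, can be expressed as simple checks. The function names and the use of plain numbers for timestamps are assumptions; the actual election protocol involves considerably more than this.

```python
ELECTION_TIMEOUT_SECONDS = 10  # value taken from the slide


def primary_unreachable(last_heartbeat, now, timeout=ELECTION_TIMEOUT_SECONDS):
    """True once the primary has been silent for more than the timeout."""
    return (now - last_heartbeat) > timeout


def wins_election(votes_received, voting_members):
    """A candidate needs a strict majority of all voting members."""
    return votes_received > voting_members // 2
```

With three voting members, two votes are enough; with four, two votes are not, which is one reason replica sets are usually sized with an odd number of voters.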
  • 32. Sharding: High Scalability and Throughput. Sharding is a method for storing data across multiple machines. Purpose of sharding: database systems with large data sets and high-throughput applications can challenge the capacity of a single server. High query rates can exhaust the CPU capacity of the server. Larger data sets exceed the storage capacity of a single machine. Finally, working set sizes larger than the system's RAM stress the I/O capacity of disk drives.
  • 33. Sharding: High Scalability and Throughput. In contrast to vertical scaling, which adds capacity to a single server, sharding (horizontal scaling) divides the data set and distributes the data over multiple servers, or shards. Each shard is an independent database, and collectively the shards make up a single logical database.
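One simple way to divide a data set over shards is to hash each key and take the result modulo the number of shards. The sketch below assumes that scheme; it is not a description of how MongoDB routes data, and `shard_for` and `ShardedStore` are hypothetical names.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


class ShardedStore:
    """Several independent stores presented as one logical store."""

    def __init__(self, shard_count):
        # Each dict stands in for an independent database.
        self.shards = [dict() for _ in range(shard_count)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count(self):
        # The logical database is the union of all shards.
        return sum(len(shard) for shard in self.shards)
```

A caveat of plain modulo hashing is that changing the number of shards remaps most keys, so production systems tend to use range-based or consistent-hashing schemes instead.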
  • 34. Map-Reduce. The map-reduce pattern organizes processing so that it takes advantage of multiple machines in a cluster while keeping as much of the processing, and the data it needs, together on the same machine. It first gained prominence with Google's MapReduce framework. "Map" step: the master node takes the input, divides it into smaller sub-problems, and distributes them to worker nodes. A worker node may do this again in turn, leading to a multi-level tree structure. The worker node processes the smaller problem and passes the answer back to its master node. "Reduce" step: the master node then collects the answers to all the sub-problems and combines them in some way to form the output, the answer to the problem it was originally trying to solve.
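The classic word-count example shows the two steps in miniature. This sketch runs in a single process, so the "workers" are just function calls, and the grouping step between map and reduce (often called shuffle) is an addition that the slide does not name explicitly.

```python
from collections import defaultdict


def map_phase(chunk):
    """Worker: turn one chunk of text into (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_pairs):
    """Group the intermediate pairs by key so each word is reduced once."""
    groups = defaultdict(list)
    for word, count in mapped_pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Combine all partial counts for one word."""
    return word, sum(counts)


def word_count(chunks):
    """Master: distribute chunks, collect the answers, combine them."""
    mapped = []
    for chunk in chunks:  # each chunk could go to a different worker
        mapped.extend(map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


totals = word_count(["NoSQL scales out", "SQL scales up", "nosql"])
```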
  • 37. Advantages of MongoDB over RDBMS. Schemaless: MongoDB is a document database in which one collection holds different documents; the number of fields, content, and size can differ from one document to another. The structure of a single object is clear. No complex joins. Deep query-ability: MongoDB supports dynamic queries on documents using a document-based query language that is nearly as powerful as SQL. Ease of scale-out: MongoDB is easy to scale.
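To make "one collection holds different documents" and "deep query-ability" concrete, here is a plain-Python imitation of querying nested fields by a dotted path. It is not MongoDB's API or query engine; `get_path`, `find`, and the sample documents are assumptions made for the sketch.

```python
_MISSING = object()  # marker for "this document has no such field"


def get_path(document, dotted_path):
    """Follow a dotted path such as 'address.city' into nested dicts."""
    current = document
    for part in dotted_path.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def find(collection, query):
    """Return the documents whose fields equal every value in the query."""
    return [
        doc
        for doc in collection
        if all(get_path(doc, path) == value for path, value in query.items())
    ]


# Documents in one collection with different fields and sizes.
users = [
    {"name": "Asha", "address": {"city": "Jaipur", "zip": "302001"}},
    {"name": "Ravi", "address": {"city": "Pune"}, "phones": ["555-0100"]},
    {"name": "Meera"},
]

in_jaipur = find(users, {"address.city": "Jaipur"})
```

The third document has no `address` at all, and the query simply does not match it; no schema change or join is involved.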
  • 38. Why use MongoDB? Document-oriented storage (data is stored as JSON-style documents); index on any attribute; replication and high availability; auto-sharding; rich queries; fast in-place updates; professional support by MongoDB. Where to use MongoDB? Big data; content management and delivery; mobile and social infrastructure; user data management; data hub.
  • 42. References: http://www.mongodb.com/scale ; http://www.mongodb.com/partners/cloud/microsoft ; http://azure.microsoft.com/en-us/gallery/store/mongodb/mongodb-inc/ ; http://www.mongodb.com/leading-nosql-database ; http://nosql.findthebest.com/ ; http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis ; http://stackoverflow.com/questions/5252577/how-much-faster-is-redis-than-mongodb . Azure offered as a service: https://mongolab.com/welcome/ . MongoDB offered as a service: http://www.objectrocket.com/ ; https://www.mongohq.com/

Editor's Notes

  • #4: http://downloadsquad.switched.com/2010/06/29/facebook-doubles-its-server-count-from-30-000-to-60-000-in-just-6-months/ by Sebastian Anthony on June 29, 2010 at 10:00 AM