adsu4
NoSQL
Database Part 1
RDBMS | NoSQL
integrity is mission-critical | OK as long as most data is correct
data format consistent, well-defined | data format unknown or inconsistent
data is of long-term value | data are expected to be replaced
data updates are frequent | write-once, read multiple (no updates, or at least not often)
You can have consistent data, and you can have a high-availability
database, but transactions will then require longer times to execute.
The time it takes to update all copies depends on several factors, such
as the load on the system and the speed of the network.
2. Read-Your-Writes Consistency
Once you have updated a record, all of your reads of that record will return the
updated value. You would never retrieve a value inconsistent with the value you
had written.
For example, Alice updates a customer’s outstanding balance to $1,500. The
update is written to one server and the replication process begins updating
other copies. During the replication process, Alice queries the customer’s
balance. She is guaranteed to see $1,500 when the database supports
read-your-writes consistency.
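The Alice example can be sketched as a small simulation. This is only an illustration under simple assumptions: the class `ReplicatedStore`, its methods, and the client names are hypothetical, and the store routes a client's reads to the primary copy for any key that client wrote last, which is one possible way to provide the guarantee.

```python
class ReplicatedStore:
    """Toy store: one primary copy plus replicas that lag behind (hypothetical)."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = []       # writes not yet copied to the replicas
        self.last_writer = {}   # key -> client that wrote it most recently

    def write(self, client, key, value):
        self.primary[key] = value
        self.pending.append((key, value))
        self.last_writer[key] = client

    def replicate(self):
        """Copy every pending write to every replica."""
        for key, value in self.pending:
            for replica in self.replicas:
                replica[key] = value
        self.pending.clear()

    def read(self, client, key):
        # Read-your-writes: the last writer of a key is served from the primary,
        # so that client never sees a value older than its own write.
        if self.last_writer.get(key) == client:
            return self.primary.get(key)
        # Other clients read a replica and may see a stale value.
        return self.replicas[0].get(key)


store = ReplicatedStore()
store.write("bob", "balance", 1200)
store.replicate()

store.write("alice", "balance", 1500)                # replication not finished yet
alice_view = store.read("alice", "balance")          # Alice sees her own write
carol_view = store.read("carol", "balance")          # Carol still sees the old value
store.replicate()
carol_view_later = store.read("carol", "balance")    # replicas have caught up
```

During replication only Alice is guaranteed the new balance; other readers catch up once the copies are updated.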
Distribution Models:
Single Server
Sharding
Master-Slave Replication
Peer-to-Peer Replication
Combining Sharding and Replication
Data Distribution
➢ NoSQL systems: data distributed over large clusters
➢ Multiple servers
➢ master-slave and
➢ peer-to-peer
Single Server
The first and the simplest distribution option is the one we
would most often recommend—no distribution at all.
Run the database on a single machine that handles all the
reads and writes to the data store.
It eliminates all the complexities.
It’s easy for operations people to manage and easy for
application developers to handle.
Graph databases are the obvious category here—these work
best in a single-server configuration.
If your data usage is mostly about processing aggregates, then
a single-server document or key-value store may well be
worthwhile because it’s easier on application developers.
Sharding
➢ When different people access different parts of the dataset, different parts of
the data can be placed on different servers, a technique that’s called sharding.
➢ This allows for larger datasets to be split into smaller chunks and stored in
multiple data nodes, increasing the total storage capacity of the system.
➢ Horizontal scalability: additional nodes are brought on to share the load.
Horizontal scaling lets the system handle big data and intense workloads
beyond the limits of a single machine.
➢ Ideal case: different users all talking to different server nodes. Each user only has
to talk to one server, so gets rapid responses from that server.
➢ Keep data that is accessed together on the same node: the aggregate is the
unit of distribution.
With a single server it’s easier to spend the effort and cost needed to keep
that server up and running.
Clusters usually try to use less reliable machines, so you’re more
likely to get a node failure.
So in practice, sharding alone is likely to decrease resilience
(the ability to recover immediately from a failure).
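The idea of placing each aggregate on exactly one node can be sketched as follows. This is a minimal illustration, assuming hash-based placement and hypothetical node names (`node-0` to `node-2`); real systems choose their own placement strategy.

```python
import hashlib

SHARDS = ["node-0", "node-1", "node-2"]   # hypothetical node names


def shard_for(key: str) -> str:
    """Map a key to one node using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]


# Each node holds only its own part of the dataset.
cluster = {name: {} for name in SHARDS}


def put(key, aggregate):
    cluster[shard_for(key)][key] = aggregate


def get(key):
    # A lookup by key talks to one node only.
    return cluster[shard_for(key)].get(key)


for customer_id in ["c1", "c2", "c3", "c4", "c5", "c6"]:
    put(customer_id, {"customerId": customer_id, "orders": []})

total_stored = sum(len(node) for node in cluster.values())
```

Because the whole customer aggregate lives under one key, a read of that customer touches a single node. If that node fails, its part of the data is unavailable, which is why sharding alone does not improve resilience.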
Sharding
Disadvantages:
Sharding does come with several drawbacks, namely overhead in
query result compilation, complexity of
administration, and increased infrastructure costs.
Query Overhead — Each sharded database must have a separate machine or
service which understands how to route a querying operation to the appropriate
shard. This introduces additional latency on every operation. Furthermore, if the
data required for the query is horizontally partitioned across multiple shards, the
router must then query each shard and merge the results together. This can make an
otherwise simple operation quite expensive and slow down response times.
Complexity of Administration — With a single unsharded database, only the
database server itself requires upkeep and maintenance. With every sharded
database, on top of managing the shards themselves, there are additional service
nodes to maintain. Plus, in cases where replication is being used, any data updates
must be mirrored across each replicated node. Overall, a sharded database is a
more complex system which requires more administration.
Increased Infrastructure Costs — Sharding by its nature requires additional
machines and compute power over a single database server. While this allows your
database to grow beyond the limits of a single machine, each additional shard
comes with higher costs. The cost of a distributed database system, especially if it is
missing the proper optimization, can be significant.
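The query overhead described above can be illustrated with a toy router. This is a sketch under stated assumptions: the shard contents, the `directory` lookup table, and both function names are hypothetical, and the data is sharded by customer key.

```python
# Three shards, each holding some customers keyed by customer id (made-up data).
shards = [
    {"c1": {"city": "Pune", "balance": 1500}, "c4": {"city": "Mumbai", "balance": 300}},
    {"c2": {"city": "Pune", "balance": 700}},
    {"c3": {"city": "Delhi", "balance": 50}},
]

# The router's routing table: which shard holds which key.
directory = {key: index for index, shard in enumerate(shards) for key in shard}


def find_by_key(key):
    """Targeted query: the router contacts exactly one shard."""
    shard_index = directory[key]
    return shards[shard_index][key], 1


def find_by_city(city):
    """Scatter-gather: city is not the shard key, so every shard is queried
    and the partial results are merged by the router."""
    merged = []
    contacted = 0
    for shard in shards:
        contacted += 1
        merged.extend(key for key, doc in shard.items() if doc["city"] == city)
    return sorted(merged), contacted


doc, shards_for_key_query = find_by_key("c2")
pune_customers, shards_for_city_query = find_by_city("Pune")
```

The key lookup touches one shard, while the query on a non-key field touches all three and pays for the merge.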
Sharding Architectures and Types
Ranged/Dynamic Sharding
Algorithmic/Hashed Sharding
Entity-/Relationship-Based Sharding
Geography-Based Sharding
1. Ranged/Dynamic Sharding
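In ranged sharding, each shard is responsible for a contiguous range of key values. A minimal sketch, assuming non-empty string keys split by their first letter; the boundaries and shard names are hypothetical.

```python
import bisect

# Each boundary is the first letter that belongs to the NEXT shard.
BOUNDARIES = ["h", "p"]
RANGE_SHARDS = ["shard-a-g", "shard-h-o", "shard-p-z"]   # hypothetical names


def range_shard_for(key: str) -> str:
    """Pick the shard whose range contains the key's first letter."""
    position = bisect.bisect_right(BOUNDARIES, key[0].lower())
    return RANGE_SHARDS[position]


placements = {name: range_shard_for(name) for name in ["Alice", "Harish", "Priya"]}
```

Keys that sort near each other land on the same shard, which helps range scans but can overload one shard if the keys are unevenly distributed.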
[Figure: master-slave and peer-to-peer replication]
Replication
Master-Slave Replication
Advantages:
• Most helpful for scaling when you have a read-intensive dataset.
• More read requests can be handled by adding more slave nodes.
Disadvantages (inconsistency):
• Slow propagation of changes to copies on different nodes.
• Inconsistencies on read lead to problems but are relatively transient.
• Two people can update different copies of the same record stored on
different nodes at the same time: a write-write conflict.
• Inconsistent writes are forever.
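The read-scaling benefit and the transient read inconsistency listed above can be sketched with a toy cluster. All names here (`MasterSlaveCluster`, `propagate`, and so on) are hypothetical, and propagation is triggered by hand to make the lag visible.

```python
class MasterSlaveCluster:
    """Toy model: writes go to the master, reads are spread over the slaves."""

    def __init__(self, slave_count=2):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self.log = []         # changes not yet propagated to the slaves
        self._next_read = 0   # round-robin counter for reads

    def write(self, key, value):
        # Only the master accepts writes.
        self.master[key] = value
        self.log.append((key, value))

    def read(self, key):
        # Reads are served by the slaves in turn; more slaves, more read capacity.
        slave = self.slaves[self._next_read % len(self.slaves)]
        self._next_read += 1
        return slave.get(key)

    def propagate(self):
        """Apply the master's pending changes to every slave."""
        for key, value in self.log:
            for slave in self.slaves:
                slave[key] = value
        self.log.clear()


ms = MasterSlaveCluster()
ms.write("balance", 1500)
stale_read = ms.read("balance")   # slaves have not received the change yet
ms.propagate()
fresh_read = ms.read("balance")   # the inconsistency was transient
```

Because a single master handles every write, this model does not produce write-write conflicts; those arise when different nodes accept writes to the same record.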
Combining Sharding and Replication
[Figure: example systems MemcachedDB, Project Voldemort, and Google’s BigTable, annotated as "open-source version" or "not open-source"]
The document type is mostly used for content management systems, blogging
platforms, real-time analytics, and e-commerce applications.
It should not be used for complex transactions that require multiple operations
or queries against varying aggregate structures.
Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular
document-oriented DBMSs.
Mrs. Deepali Jaadhav 84
Document Databases
Documents
Instead of storing each attribute of an entity with a separate key,
document databases store multiple attributes in a single document.
One of the most important characteristics of document databases is you
do not have to define a fixed schema before you add data to the
database.
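The absence of a fixed schema can be illustrated with plain Python dictionaries standing in for documents. This is only a sketch: the field names and values are made up, and a list stands in for a collection.

```python
# A "collection" with no schema declared up front.
customers = []

# The first document has two attributes.
customers.append({"customerId": "1", "firstName": "Asha"})

# A later document adds a new attribute without any schema change.
customers.append({"customerId": "2", "firstName": "Ravi", "loyaltyPoints": 120})

# Queries must tolerate documents that lack a field.
with_points = [c["customerId"] for c in customers if "loyaltyPoints" in c]
field_sets = [sorted(c) for c in customers]
```

Each document carries all of its own attributes, so the two documents can have different field sets and still live in the same collection.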
[Figure: Lotus Notes Storage Facility; FlockDB]
Document Databases
Document databases, also called document-oriented databases, use a
key-value approach to store data but with important differences from
key-value databases. A document database stores values as documents.
Documents are semi-structured entities, typically in a standard format
such as JavaScript Object Notation (JSON) or Extensible Markup
Language (XML).
When the term document is used in this context, it refers to data
structures that are stored as strings or binary representations of strings.
Benefits of Document Databases
Flexibility
Document databases offer a flexible schema, allowing you to
adapt to changing data requirements without rigid constraints.
You can easily add new fields or modify existing ones without
impacting the structure of your data.
Scalability
These databases are designed to handle large volumes of data,
making them ideal for applications with rapidly growing data
sets. They can scale horizontally by adding more servers to the
cluster, ensuring consistent performance even under heavy
load.
Performance
Document databases prioritize read and write operations,
offering fast and efficient data access. This makes them well-
suited for applications that require real-time updates and low
latency, such as e-commerce platforms and social media
applications.
Code Example: Illustrating Document Structures
Let's examine a practical example of a customer collection to understand how document structures can vary
within a collection.
Customer Collection Example
{
  "customerId": "12345",
  "firstName": "John",
  "lastName": "Doe",
  "email": "[email protected]",
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "state": "CA",
    "zip": "91234"
  },
  "orders": [
    { "orderId": "A123", "date": "2023-03-15" },
    { "orderId": "B456", "date": "2023-04-20" }
  ]
}
Explanation
• Each document in the collection represents a customer, identified by a unique `customerId`.
• The document structure includes basic information like `firstName`, `lastName`, and `email`.
• An embedded `address` object provides detailed address information.
• The `orders` array stores information about the customer's previous orders, demonstrating a one-to-many relationship.
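As a rough illustration of how an application might read this customer document, the sketch below parses it with Python's standard `json` module. The `email` field is left out here, and the variable names are assumptions made for the example.

```python
import json

raw = """
{
  "customerId": "12345",
  "firstName": "John",
  "lastName": "Doe",
  "address": {
    "street": "123 Main Street",
    "city": "Anytown",
    "state": "CA",
    "zip": "91234"
  },
  "orders": [
    { "orderId": "A123", "date": "2023-03-15" },
    { "orderId": "B456", "date": "2023-04-20" }
  ]
}
"""

customer = json.loads(raw)

# Embedded object: reached by nested key access, no join needed.
city = customer["address"]["city"]

# Array of embedded objects: the one-to-many relationship lives in the document.
order_ids = [order["orderId"] for order in customer["orders"]]

# ISO-formatted dates compare correctly as strings.
latest_order = max(customer["orders"], key=lambda order: order["date"])["orderId"]
```

Everything about the customer, including the address and the order history, arrives in a single read of one document.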
Basic Operations on Document Databases
Inserting
Deleting
Updating
Retrieving
db.createCollection("books")
> db.user.findOne({age:39})
{
  "_id" : ObjectId("5114e0bd42…"),
  "first" : "John",
  "last" : "Doe",
  "age" : 39,
  "interests" : [
    "Reading",
    "Mountain Biking"
  ],
  "favorites" : {
    "color" : "Blue",
    "sport" : "Soccer"
  }
}
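The four basic operations can be sketched without a database server using a small in-memory stand-in. The `Collection` class below is hypothetical and is not the MongoDB driver API; it only mimics the idea of matching documents against a query of field/value pairs.

```python
class Collection:
    """Toy in-memory collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = []
        self._next_id = 1

    def insert_one(self, doc):
        # Inserting: store a copy and assign a generated _id.
        stored = dict(doc, _id=self._next_id)
        self._next_id += 1
        self.docs.append(stored)
        return stored["_id"]

    def _matches(self, doc, query):
        return all(doc.get(field) == value for field, value in query.items())

    def find_one(self, query):
        # Retrieving: return the first document matching every query field.
        for doc in self.docs:
            if self._matches(doc, query):
                return doc
        return None

    def update_one(self, query, changes):
        # Updating: overwrite or add the given fields on the first match.
        doc = self.find_one(query)
        if doc is not None:
            doc.update(changes)
        return doc is not None

    def delete_one(self, query):
        # Deleting: remove the first match.
        doc = self.find_one(query)
        if doc is not None:
            self.docs.remove(doc)
        return doc is not None


users = Collection()
users.insert_one({"first": "John", "last": "Doe", "age": 39,
                  "interests": ["Reading", "Mountain Biking"]})

found_name = users.find_one({"age": 39})["first"]
users.update_one({"first": "John"}, {"age": 40})
age_after_update = users.find_one({"first": "John"})["age"]
deleted = users.delete_one({"age": 40})
remaining = len(users.docs)
```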
RDBMS | MongoDB
Not suitable for hierarchical data storage. | Suitable for hierarchical data storage.
It is vertically scalable, i.e. by increasing RAM. | It is horizontally scalable, i.e. we can add more servers.
It centers around ACID properties (Atomicity, Consistency, Isolation, and Durability). | It centers around the CAP theorem (Consistency, Availability, and Partition tolerance).
It is row-based. | It is document-based.
It is slower in comparison with MongoDB. | It is claimed to be almost 100 times faster than RDBMS.
It is column-based. | It is field-based.
It does not provide a JavaScript client for querying. | It provides a JavaScript client for querying.
It supports SQL query language only. | It supports JSON query language along with SQL.
Document (embedding or nesting)
Array of JSON objects
Unlike RDBMS: no integrity constraints in MongoDB
Reference: Insert Documents, MongoDB Manual
In RDBMS | In MongoDB
Either insert the 1st document
Equivalent to in SQL:
Two operators
Query condition
New doc
For the document having item = “BE10”, replace it with the given document.
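Replacing a whole document, as opposed to changing individual fields, can be sketched as below. The inventory data, the replacement document, and the helper `replace_one` are hypothetical and only mirror the "query condition plus new doc" idea in plain Python.

```python
inventory = [
    {"item": "BE10", "qty": 5},
    {"item": "ZX20", "qty": 9},
]


def replace_one(docs, query, new_doc):
    """Replace the first document matching the query with new_doc."""
    for index, doc in enumerate(docs):
        if all(doc.get(field) == value for field, value in query.items()):
            docs[index] = dict(new_doc)   # the old fields are discarded entirely
            return True
    return False


replaced = replace_one(
    inventory,
    {"item": "BE10"},                                        # query condition
    {"item": "BE10", "stock": [{"size": "M", "qty": 50}]},   # new document
)
```

After the replacement, the old `qty` field is gone because the new document did not include it; documents that do not match the condition are untouched.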
Manual: http://docs.mongodb.org/master/MongoDB-manual.pdf
Dataset: http://docs.mongodb.org/manual/reference/bios-example-collection/
Online Execution: https://docs.mongodb.com/manual/tutorial/insert-documents/
Key Value Database