Chapter-2_NoSQL_Databases_part1

Chapter 2 discusses NoSQL databases, tracing their history from hierarchical models to the rise of NoSQL in the mid-2000s due to the limitations of relational databases in handling big data. It covers key concepts such as ACID and BASE, data models, and the CAP theorem, highlighting the characteristics and scalability of NoSQL systems. The chapter emphasizes the shift towards distributed, schema-less, and horizontally scalable database solutions to manage large volumes of data.

Uploaded by

Srinivas Boini
Copyright
© All Rights Reserved

Chapter 2:

NoSQL Databases

Big Data Management and Analytics 50


DATABASE
SYSTEMS
NoSQL Database Systems
GROUP

Outline

• History

• Concepts
• ACID
• BASE
• CAP

• Data Models
• Key-Value
• Document
• Column-based
• Graph

History

1960s: IBM developed the Hierarchical Database Model

• Tree-like structure
• Data stored as records connected by links
• Supports only one-to-one and one-to-many relationships

Mid-1980s: Rise of the Relational Database Model

• Data stored in a collection of tables (rows and columns)
→ Strict relational schema
• SQL became the standard language (based on relational algebra)
→ Impedance Mismatch!

History – Impedance Mismatch

Object Supply:
• Supplier: LNR = L1, Lname = Meier, Status = 20, Sitz (location) = Wetter
• Project: PNR = P2, Pname = Pleite, Ort (city) = Bonn
• Pieces: TNR = T6, Tname = Schraube (screw), Farbe (color) = rot (red), Gewicht (weight) = 03, Menge (quantity) = 700

Relational tables (still empty):
• Suppliers: LNR, Lname, Status, Sitz
• Projects: PNR, Pname, Ort
• Pieces: TNR, Tname, Farbe, Gewicht
• Supplies: LNR, PNR, TNR, Menge

Given the LTP schema from Datenbanksysteme I (Database Systems I) and an object of type Supply:
How can the data bundled in the Supply object be stored in the DB?

History – Impedance Mismatch

The supplier, project, and piece data of the Supply object each become one row in a separate table:

• Supplier row: (L1, Meier, 20, Wetter)
• Project row: (P2, Pleite, Bonn)
• Piece row: (T6, Schraube, rot, 03)

INSERT INTO L VALUES (Supply.getSupplier().getLNR(), ...);

INSERT INTO P VALUES (Supply.getProject().getPNR(), ...);

...

History – Impedance Mismatch

The link between supplier, project, and piece, together with the quantity, becomes one more row in the table LTP:

• LTP row: (L1, P2, T6, 700)

INSERT INTO LTP VALUES (...);

• Object-oriented encapsulation vs. storing data distributed among several tables
→ The programmer has to do a lot of data type maintenance
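
The mismatch above can be sketched in a few lines of Python with the standard `sqlite3` module. This is only an illustration under assumptions: the dataclasses, the column types, and the table name `T` for the pieces are hypothetical (the slides only name `L`, `P`, and `LTP` explicitly), and `store_supply` is an invented helper, not part of the lecture material.

```python
import sqlite3
from dataclasses import dataclass

# Hypothetical classes mirroring the Supply object from the slides.
@dataclass
class Supplier:
    lnr: str
    lname: str
    status: int
    sitz: str

@dataclass
class Project:
    pnr: str
    pname: str
    ort: str

@dataclass
class Piece:
    tnr: str
    tname: str
    farbe: str
    gewicht: str

@dataclass
class Supply:
    supplier: Supplier
    project: Project
    piece: Piece
    menge: int

# Assumed schema: the column types and the table name T are illustrative guesses.
SCHEMA = """
CREATE TABLE L   (LNR TEXT PRIMARY KEY, Lname TEXT, Status INTEGER, Sitz TEXT);
CREATE TABLE P   (PNR TEXT PRIMARY KEY, Pname TEXT, Ort TEXT);
CREATE TABLE T   (TNR TEXT PRIMARY KEY, Tname TEXT, Farbe TEXT, Gewicht TEXT);
CREATE TABLE LTP (LNR TEXT, PNR TEXT, TNR TEXT, Menge INTEGER);
"""

def store_supply(conn, s):
    """Take the object apart by hand and write one row per table."""
    sup, pro, pie = s.supplier, s.project, s.piece
    with conn:  # commits on success, rolls back on an exception
        conn.execute("INSERT INTO L VALUES (?, ?, ?, ?)",
                     (sup.lnr, sup.lname, sup.status, sup.sitz))
        conn.execute("INSERT INTO P VALUES (?, ?, ?)",
                     (pro.pnr, pro.pname, pro.ort))
        conn.execute("INSERT INTO T VALUES (?, ?, ?, ?)",
                     (pie.tnr, pie.tname, pie.farbe, pie.gewicht))
        conn.execute("INSERT INTO LTP VALUES (?, ?, ?, ?)",
                     (sup.lnr, pro.pnr, pie.tnr, s.menge))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
supply = Supply(Supplier("L1", "Meier", 20, "Wetter"),
                Project("P2", "Pleite", "Bonn"),
                Piece("T6", "Schraube", "rot", "03"),
                700)
store_supply(conn, supply)
ltp_row = conn.execute("SELECT * FROM LTP").fetchone()
```

One in-memory object turns into four INSERT statements, and every field has to be mapped to a column by hand, which is the maintenance burden the slide refers to.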

History

Mid-1990s: Trend of the Object-Relational Database Model

• Data stored as objects (including data and methods)
• Avoidance of object-relational mapping
→ Programmer-friendly
• But relational databases still prevailed in the 1990s

Mid-2000s: Rise of Web 2.0

• Lots of user-generated data through web applications
→ Storage systems had to be scaled up

History

Approaches to scale up storage systems

• Two options for meeting the rising storage demand:
• Vertical scaling
Enlarge a single machine
– Limited in space
– Expensive

• Horizontal scaling
Use many commodity machines and form computer clusters or grids
– Cluster maintenance

History

Mid-2000s: Birth of the NoSQL Movement

• Problem of computer clusters:
Relational databases do not scale well horizontally

→ Big players like Google or Amazon developed their own storage systems: NoSQL ("Not-Only SQL") databases were born

Today: Age of NoSQL

• Several different NoSQL systems available (>225)

Characteristics of NoSQL Databases

There is no single definition, but NoSQL databases share some characteristics:

• Horizontal scalability (cluster-friendliness)
• Non-relational
• Distributed
• Schema-less
• Open-source (at least most of the systems)

About the concepts behind NoSQL Databases

ACID – The holy grail of RDBMSs:


• Atomicity: Transactions happen entirely or not at all. If a transaction fails (even partly), the state of the database is unchanged.
• Consistency: Any transaction brings the database from one valid state to another and does not break any of the predefined rules (like constraints).
• Isolation: Concurrent execution of transactions results in a system state that would be obtained if the transactions were executed serially.
• Durability: Once a transaction has been committed, it will remain so.
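
Atomicity and consistency can be observed with a minimal sketch using Python's standard `sqlite3` module. The `account` table, the names, and the `transfer` helper are assumptions made up for this illustration. The transfer deliberately credits the receiver first, so that a failing debit has to undo an update that already happened.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed example table; the CHECK constraint is the "predefined rule".
conn.execute(
    "CREATE TABLE account ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between accounts; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was rolled back.
        return False

ok_small = transfer(conn, "alice", "bob", 30)   # fits into alice's balance
ok_large = transfer(conn, "alice", "bob", 500)  # would make alice negative
balances = dict(conn.execute("SELECT name, balance FROM account").fetchall())
```

After the failed second transfer, the balances are the same as after the first one: the partial credit to `bob` does not survive.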

About the concepts behind NoSQL Databases

BASE – An artificial concept for NoSQL databases:

• Basically Available: The system is generally available, but some data might be temporarily unavailable (e.g. due to node failures).

• Soft State: The system's state changes over time. Stale data may expire if not refreshed.

• Eventual Consistency: The system is not consistent at every moment. Updates are propagated through the system over time, so the replicas converge if there is enough time.

→ BASE sits at the opposite end from ACID on a "consistency-availability spectrum".
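
Eventual consistency can be illustrated with a toy simulation. This is a sketch under assumptions, not how any particular NoSQL system works: the class names, the single global version counter, and the explicit `propagate` step (standing in for asynchronous replication) are all invented for the example.

```python
from collections import deque

class Replica:
    def __init__(self):
        self.data = {}  # key -> (version, value)

class EventuallyConsistentStore:
    """Toy store: a write lands on one replica and is queued for the others."""

    def __init__(self, n_replicas=3):
        self.replicas = [Replica() for _ in range(n_replicas)]
        self.pending = deque()  # (target replica index, key, version, value)
        self.version = 0

    def write(self, key, value, at=0):
        self.version += 1
        self.replicas[at].data[key] = (self.version, value)
        for i in range(len(self.replicas)):
            if i != at:
                self.pending.append((i, key, self.version, value))

    def read(self, key, at):
        entry = self.replicas[at].data.get(key)
        return entry[1] if entry else None

    def propagate(self):
        """Deliver all queued updates; the newer version wins."""
        while self.pending:
            i, key, version, value = self.pending.popleft()
            current = self.replicas[i].data.get(key)
            if current is None or current[0] < version:
                self.replicas[i].data[key] = (version, value)

store = EventuallyConsistentStore()
store.write("x", "v1", at=0)
stale = store.read("x", at=1)   # replica 1 has not seen the write yet
store.propagate()
fresh = store.read("x", at=1)   # after propagation the replicas agree
```

Between the write and the call to `propagate`, different replicas give different answers for the same key; afterwards they agree.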

About the concepts behind NoSQL Databases

Levels of Consistency (from weakest to strongest):

• Eventual Consistency
• Monotonic Read Consistency (M.R.C.) / Read-Your-Own-Writes (R.Y.O.W.)
• M.R.C. + R.Y.O.W.
• Immediate Consistency
• Strong Consistency
• Transactions

About the concepts behind NoSQL Databases

Levels of Consistency:

• Eventual Consistency: Write operations are not spread across all servers/partitions immediately

• Monotonic Read Consistency: A client who has read an object once will never read an older version of this object

• Read Your Own Writes: A client who has written an object will never read an older version of this object

• Immediate Consistency: Updates are propagated immediately, but not atomically
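
One common way to obtain monotonic reads and read-your-own-writes on top of lagging replicas is to let the client remember the newest version it has seen. The sketch below assumes a single object with integer versions; `Replica`, `Session`, and their methods are hypothetical names for this illustration only.

```python
class Replica:
    """Holds one object together with its version number."""

    def __init__(self):
        self.version = 0
        self.value = None

    def apply(self, version, value):
        if version > self.version:
            self.version, self.value = version, value

class Session:
    """Client-side session that tracks the newest version it has seen or written."""

    def __init__(self, replicas):
        self.replicas = replicas
        self.min_version = 0

    def write(self, replica_idx, version, value):
        self.replicas[replica_idx].apply(version, value)
        # Read-your-own-writes: never accept anything older than this write.
        self.min_version = max(self.min_version, version)

    def read(self):
        for replica in self.replicas:
            if replica.version >= self.min_version:
                # Monotonic reads: later reads must be at least this new.
                self.min_version = replica.version
                return replica.value
        raise RuntimeError("no replica is recent enough")

replicas = [Replica(), Replica()]
session = Session(replicas)
session.write(1, version=1, value="new")  # only replica 1 has the write so far
seen = session.read()                     # skips the stale replica 0
```

The stale replica is skipped rather than answered from, so the session pays with a possible error or delay instead of reading an older version.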

About the concepts behind NoSQL Databases

Levels of Consistency:

• Strong Consistency: Updates are propagated immediately + support of atomic operations on single data entities (usually on master nodes)

• Transactions: Full support of the ACID transaction model
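
An atomic operation on a single data entity is often exposed as compare-and-set. The following is a minimal single-process sketch, assuming one master node guarded by a lock; `MasterNode` and its methods are invented for this example and do not model replication.

```python
import threading

class MasterNode:
    """Toy master node offering an atomic compare-and-set per key."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key, expected, new):
        """Set key to new only if its current value equals expected."""
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True

node = MasterNode()
created = node.compare_and_set("stock", None, 10)  # key does not exist yet
lost = node.compare_and_set("stock", 5, 4)         # stale expectation
won = node.compare_and_set("stock", 10, 9)         # matches current value
final = node.get("stock")
```

The check and the update happen under the same lock, so two clients cannot both succeed with the same expected value. This covers a single entity only; spanning several entities is what the Transactions level adds.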

About the concepts behind NoSQL Databases

[Figure: a document under data sharding vs. data replication]

The two types of consistency:


• Logical consistency:
Data is consistent within itself (Data Integrity)

• Replication consistency:
Data is consistent across multiple replicas (on multiple
machines)
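
The difference between sharding and replication can be sketched as two ways of placing the same documents on several nodes. Hash-based placement is only one possible sharding scheme and is an assumption here, as are the function names and the sample documents.

```python
import hashlib

def shard_for(key, n_nodes):
    """Pick a node by hashing the key (one possible placement scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % n_nodes

def shard(documents, n_nodes):
    """Sharding: every document is stored on exactly one node."""
    nodes = [dict() for _ in range(n_nodes)]
    for key, doc in documents.items():
        nodes[shard_for(key, n_nodes)][key] = doc
    return nodes

def replicate(documents, n_nodes):
    """Replication: every node stores a full copy of all documents."""
    return [dict(documents) for _ in range(n_nodes)]

docs = {f"doc{i}": {"n": i} for i in range(10)}
sharded = shard(docs, 3)
replicated = replicate(docs, 3)
```

With sharding the copies of a document cannot disagree because there is only one, while replication is exactly the situation in which replication consistency becomes a concern.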

About the concepts behind NoSQL Databases

Brewer's CAP Theorem:

[Figure: the three properties Consistency, Availability, and Partition Tolerance]

Any networked shared-data system can have at most two of the three desired properties!

About the concepts behind NoSQL Databases

DB-Systems allowed by CAP Theorem:

• CP-Systems: Fully consistent and partition-tolerant systems renounce availability. Only consistent nodes are available.

• AP-Systems: Fully available and partition-tolerant systems renounce consistency. All nodes answer queries all the time, even if the answers are inconsistent.

• AC-Systems: Fully available and consistent systems renounce partition tolerance. Only possible if the system is not distributed.
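
The trade-off between CP and AP behavior can be made concrete with a toy two-node store. This is a deliberately simplified sketch: the `TwoNodeStore` class, the `partitioned` flag, and the choice to reject only writes in CP mode are assumptions for illustration, not a description of a real system.

```python
class Unavailable(Exception):
    """Raised when the store refuses a request to stay consistent."""

class TwoNodeStore:
    """Toy store with two replicas and a switchable network partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [{}, {}]
        self.partitioned = False

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.nodes:  # both replicas are reachable
                replica[key] = value
        elif self.mode == "CP":
            # Give up availability: refuse writes that cannot reach both nodes.
            raise Unavailable("other node unreachable")
        else:
            # Give up consistency: accept locally, the replicas diverge.
            self.nodes[node][key] = value

    def read(self, node, key):
        return self.nodes[node].get(key)

cp = TwoNodeStore("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_rejected = False
except Unavailable:
    cp_rejected = True
cp_views = (cp.read(0, "x"), cp.read(1, "x"))

ap = TwoNodeStore("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
ap_views = (ap.read(0, "x"), ap.read(1, "x"))
```

Under the partition, the CP store keeps both views equal by rejecting the write, while the AP store keeps answering and ends up with two different values for the same key.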

Big Picture

CAP Theorem:

• C (Consistency): All clients always have the same view of the data
• A (Availability): Each client can always read and write
• P (Partition tolerance): The system works well despite physical network partitions

Big Picture

CAP Theorem: system classes between the corners C, A, and P

• AC-Systems (between C and A): ACID, e.g. RDBMSs (MySQL, Postgres, …)
• CP-Systems (between C and P)
• AP-Systems (between A and P): BASE
