The Database Revolution:
   An Historical Perspective

Brandon Byars
   bbyars@thoughtworks.com
John Finlay
   jfinlay@thoughtworks.com
Topics to be Covered
1. Historical Background
  – How Data Management brought us to today
2. Current State
  – RDBMS and recent catalysts for change
3. New-world Data Management problems
  – How NoSQL is being applied
4. NoSQL Issues
  – Hey, nobody’s perfect …
Historical Background
To understand the transition database engines are undergoing,
we need to acknowledge how they originated, and how they evolved.

Let’s go back to the dawn of time…
THE LATE ’60s
Big Box Computing
• 360/85: 4 MB “core”; 1 MIPS; ~$1-5M
• Plus building,
  water cooling
• 29 MB disk x 8,
  $250K
• 4MB drum
• CICS, TSO
Data in Flat Files
• Common file formats: ISAM, later VSAM
• Single-user processing
  – e.g. batch job
• Multi-user processing
  – Generally queued by TP monitor
• Log-based Transaction management
  – Supporting COMMIT/ROLLBACK
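A minimal sketch of the log-based idea (illustrative Python, not any particular TP monitor’s interface): changes are recorded in an undo log before they are applied, so COMMIT discards the log and ROLLBACK replays it in reverse.

# Minimal undo-log sketch: old values are logged before writes are applied,
# so COMMIT makes the changes permanent and ROLLBACK restores the old state.
class Transaction:
    def __init__(self, store):
        self.store = store          # the "data file": a plain dict
        self.log = []               # append-only undo log: (key, old_value)

    def write(self, key, value):
        self.log.append((key, self.store.get(key)))  # log old value first
        self.store[key] = value                      # then apply the change

    def commit(self):
        self.log.clear()            # changes stand

    def rollback(self):
        for key, old in reversed(self.log):          # undo in reverse order
            if old is None:
                self.store.pop(key, None)            # key didn't exist before
            else:
                self.store[key] = old
        self.log.clear()

store = {"balance": 100}
tx = Transaction(store)
tx.write("balance", 50)
tx.rollback()
print(store)  # {'balance': 100}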
Hierarchical Model
[Diagram: an IMS-style hierarchy – a Course record (1101 Calculus) with child segments: Prereqs (1026 Trig, 1024 Algebra), Offerings (110913 Thorvaldsen 2112; 110912 Hayes 127A), an Instructor (22628 Smith), Students (10274699 Barney, 10274484 Finlay, 10228437 Byars), and per-student Marks (Assign 1: 87, Assign 2: 37).]

• 1968: IBM introduces IMS/DB
  – Primary use: BOM (e.g. Apollo)
  – “Programmer as navigator” thru physical pointers
  – What other classes does Byars take?
Inverted List Model
[Diagram: compressed B-tree index blocks (Allen, Atkins, Byggles; Bygum, Chen, Eggers; Finlay, Myers, Rex) over data records (Austin 1625, Benson 1938, Bindle 1493, Byars 1266), with an Address Converter mapping ISNs to block numbers (e.g. ISN 1266 → block 48265).]

   • 1970: Software AG introduces Adabas
   • Similar to modern RDBMS
         – Normalized data with optional MU’s/PE’s
         – Multiple compressed B-tree Indexes per table
         – (Single table) search result sets
   • Data retrieval by record
   • FK references managed, followed in code
Network Model
[Diagram: a CODASYL set – a Parent record chained to its 1st, 2nd … nth Child records through Prior/Next pointers, with Direct pointers and links into a second parent’s chain.]

• 1971: CODASYL navigational model
• 1983: Cullinet IDMS
   – Navigation slightly easier but still in code
   – A little late in the game
Relational Model
• 1970: Codd introduces Relational Calculus
• 1977: IBM spikes System/R
  – Origin of SQL
• 1979: Oracle; 1983, DB2
  – Acceptable performance on “modern” hardware
• Others follow: Informix (1980),
  Sybase/SQL Server (1984), MySQL (1995)
SQL Effect
• Perceived advantages of SQL:
  – “One size fits all”
  – No lock-in
  – Less reliance on code navigation
  – Cheaper cycles can do more
• non-RDBMS vendors forced to support SQL
  – Set results vs. sequential processing
  – “Tuna fish with your cottage cheese”
Today:
    Does a relational model always work?

•   Predefined data structures
•   Expensive / Slow
•   ACID not always required / desired
•   “Standard language” ain’t so standard
•   Not always TPC-A accounting/inventory apps
Example: EAV The RDBMS Way
EntityId        Attribute       Value
10              Name            Brandon Byars
10              Age             34
10              GoodLooking     Yes
10              LikesEAV        No
10              NetWorth        34.57
10              BirthDate       April 29, 1977


• “5th NF” to circumvent rigid data structures
• No data types, no FK’s
• All the RDBMS overhead without the benefits
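To make the pain concrete, here is an illustrative sketch of the pattern in Python with SQLite (the schema and names are invented for the example): every value is stored as text, and reassembling one entity means pivoting rows back into columns by hand, with no type checking or foreign keys along the way.

import sqlite3

# Illustrative EAV schema: one generic table holds every attribute of every
# entity as an untyped string -- the engine can't enforce types or FKs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT)")
rows = [
    (10, "Name", "Brandon Byars"),
    (10, "Age", "34"),
    (10, "NetWorth", "34.57"),
    (10, "BirthDate", "April 29, 1977"),
]
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", rows)

# Reassembling one entity means pivoting rows back into columns by hand.
entity = dict(
    conn.execute("SELECT attribute, value FROM eav WHERE entity_id = ?", (10,))
)
age = int(entity["Age"])  # type conversion is the application's problem
print(entity, age)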
Growth of Processing Power
[Chart: processor speed over time on a logarithmic scale; growth levels off over the last decade.]

                     Used with permission of Ben Klemens
                     “Moore’s law won’t save you”
                     http://modelingwithdata.org
How the World Has Changed
 Every 2 days we create as much information
as we did from the dawn of civilization to 2003
   - Paraphrasing Eric Schmidt, 2011
                        Old World    New World
Processor (MIPS/core)        1         10,000
Processors x Nodes          1x1      8 x 1,000
Nodes / $M                   1            100
Memory/Node (MB)             1        100,000
Disk Data (GB)               1      1,000,000
Disk Storage (KB/$)          1     10,000,000
Users (1000’s)               1        100,000
Support Staff (100’s)        1              1
The NoSQL Argument

ONE SIZE DOESN’T FIT ANYBODY


SINGLE CPU/NODE UNREALISTIC


  IT’S TIME FOR A REWRITE
CAP Theorem

[Diagram: the CAP triangle – a system can sit on only one edge:]
• Consistency + Availability: RDBMS, Neo4J
• Consistency + Partition Tolerance: Bigtable, MongoDB, Redis, quorum systems
• Availability + Partition Tolerance: Dynamo, Cassandra, CouchDB, DNS
• All three at once: unicorns
ACID versus BASE
• ACID: The RDBMS keystone
  – Atomic
  – Consistent
  – Isolated
  – Durable

• BASE: A new alternative
  – Basically Available
  – Soft State
  – Eventually Consistent
Distribution: Types of Failures
•   Memory and network corruption
•   Large clock skew
•   Hung machines
•   Extended and asymmetric network partitions
•   Bugs in other systems used
•   Overflow of file system quotas
•   Planned and unplanned maintenance
•   Disk failure
Catalyst to Change
Problem:
Index the Interwebs
Problem: Index the Interwebs

Inverted Index:
                 Word                 URL
                 nosql                http://nosql.com
                 nosql                http://mapreduce.com
                 nosql                http://hadoop.com

PageRank:
                 URL                  IncomingLinks
                 http://nosql.com     5763
                 http://mapreduce.com 100346
                 http://hadoop.com    87234
MapReduce: Inverted Index
[Diagram: HTML documents flow into the Map phase, which emits (word, URL) pairs; the Reduce phase groups them into (word, list(URL)) – the inverted index:]

                 Word    URL
                 nosql   http://nosql.com
                 nosql   http://mapreduce.com
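A toy single-process version of the flow in Python (the real thing runs distributed over an append-only filesystem; the documents here are invented):

from itertools import groupby

# Map: emit a (word, url) pair for every distinct word in every document.
def map_phase(docs):
    for url, text in docs.items():
        for word in set(text.lower().split()):
            yield (word, url)

# Shuffle + Reduce: sort the pairs, then group them by word
# into (word, list(url)) -- the inverted index.
def reduce_phase(pairs):
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield (word, [url for _, url in group])

docs = {
    "http://nosql.com": "nosql databases",
    "http://mapreduce.com": "nosql at scale",
}
print(dict(reduce_phase(map_phase(docs))))
# {'at': [...], 'nosql': ['http://mapreduce.com', 'http://nosql.com'], ...}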
MapReduce: Incoming Links
[Diagram: HTML documents flow into the Map phase, which emits (targetURL, 1) for every outgoing link; the Reduce phase sums these into (targetURL, count):]

                 URL                    IncomingLinks
                 http://nosql.com       5763
                 http://mapreduce.com   100346
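The incoming-links job is the same shape with a summing reducer – map emits (targetURL, 1) per link, and reduce adds them up. A sketch, with link extraction faked:

from collections import Counter

# Map: emit (target_url, 1) for every outgoing link on every page.
# Reduce: sum the ones per target -- Counter plays the reducer here.
def incoming_links(pages):
    counts = Counter()
    for links in pages.values():
        for target in links:
            counts[target] += 1
    return counts

pages = {
    "http://a.com": ["http://nosql.com", "http://mapreduce.com"],
    "http://b.com": ["http://nosql.com"],
}
print(incoming_links(pages))  # Counter({'http://nosql.com': 2, ...})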
Problem:
Quick Search Results
Bigtable
“A Bigtable is a sparse, distributed, persistent
multidimensional sorted map”
Bigtable
• Sorted map abstraction
• Allows quick random read/write of massive
  amounts of structured data
• Single row transactions
• Unlimited columns per row
Facebook Messages
RowKey                 Message:Offset   Message:Subject
bbyars:hbase:17        34               FB messages
bbyars:nosql:17        56               FB messages
jfinlay:oracle:19      5                Geospatial
jfinlay:oracle:24      87               RAC
jfinlay:postgres:19    32               Geospatial
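The trick is the row-key design: rows are stored in lexicographic order, so searching one user’s inbox for one term is a prefix range scan. A sketch of the behavior with an ordinary sorted list (Bigtable itself is distributed; bisect only mimics the sorted-map abstraction):

import bisect

# Rows kept sorted lexicographically by key, as in Bigtable.
rows = sorted([
    ("bbyars:hbase:17", 34),
    ("bbyars:nosql:17", 56),
    ("jfinlay:oracle:19", 5),
    ("jfinlay:oracle:24", 87),
    ("jfinlay:postgres:19", 32),
])
keys = [k for k, _ in rows]

def prefix_scan(prefix):
    # Range scan: every row whose key starts with prefix, located in O(log n).
    start = bisect.bisect_left(keys, prefix)
    end = bisect.bisect_left(keys, prefix + "\xff")  # just past the prefix
    return rows[start:end]

print(prefix_scan("jfinlay:oracle:"))  # both of jfinlay's 'oracle' messages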
Problem:
e-Commerce at Web Scale
Dynamo
“In particular, applications have received successful
responses (without timing out) for 99.9995% of its
requests and no data loss event has occurred to date.”

•   Incremental scalability
•   Symmetry
•   Decentralization
•   Heterogeneity
Dynamo: Ring Partitioning
[Diagram: nodes A through G placed around a hash ring; Hash(key) lands on the ring and the item is stored on the first node found walking clockwise.]
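A minimal consistent-hashing sketch (replication, virtual nodes, and vector clocks omitted): each node hashes to a position on the ring, and a key belongs to the first node clockwise from its own hash, so adding or removing a node only remaps the keys in one arc.

import bisect
import hashlib

def ring_hash(s):
    # Stable hash mapped onto the ring's key space.
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

nodes = ["A", "B", "C", "D", "E", "F", "G"]
ring = sorted((ring_hash(n), n) for n in nodes)
positions = [pos for pos, _ in ring]

def node_for(key):
    # Walk clockwise from Hash(key) to the next node; wrap past the top.
    i = bisect.bisect_right(positions, ring_hash(key)) % len(ring)
    return ring[i][1]

print(node_for("cart:10274699"))  # deterministic owner for this key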
Dynamo: Tuning Knobs

N – nodes that store each item
R – nodes that must participate in a read
W – nodes that must participate in a write
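The knobs trade latency and availability against consistency. With the typical configuration (N, R, W) = (3, 2, 2), every read quorum overlaps every write quorum because R + W > N, so a read touches at least one up-to-date replica. A tiny check of the arithmetic:

def quorum_properties(n, r, w):
    # R + W > N  -> read and write quorums overlap (read-your-writes).
    # W > N / 2  -> two concurrent writes can't both reach a quorum.
    return {"overlapping": r + w > n, "no_write_conflict": w > n / 2}

print(quorum_properties(3, 2, 2))
# {'overlapping': True, 'no_write_conflict': True}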
Big Data: Clones
Map Reduce           Bigtable        Dynamo
[Logos: the 2nd-generation open-source clones of each system.]
Problem:
Social Networking
Graph Databases
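Graph engines make relationship traversal the cheap operation, where an RDBMS would need recursive joins. An illustrative sketch over an invented friends graph: the “who’s most likely to buy you a beer” query is just a friends-of-friends walk.

# A toy graph as adjacency lists; graph databases make this traversal cheap.
friends = {
    "brandon": ["john", "ben"],
    "john": ["brandon", "eric"],
    "ben": ["brandon"],
    "eric": ["john"],
}

def friends_of_friends(person):
    direct = set(friends.get(person, []))
    fof = {f2 for f in direct for f2 in friends.get(f, [])}
    return fof - direct - {person}   # exclude self and direct friends

print(friends_of_friends("brandon"))  # {'eric'}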
Problem:
Flexible Schema
Document Databases
{
    _id:ObjectId("4c4ba5c0672c685e5e8aabf3"),
    name: "Brandon Byars”,
    children: [
      {
         name: "Jackson Byars",
         birthDate: "February 15, 1999"
      },
      {
         name: "Zachary Byars",
         birthdate: "January 15, 2009"
      }]
}
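Querying into the nested structure is native. For example, with MongoDB’s Python driver (a sketch – it assumes a local server, and the database/collection names are invented):

from pymongo import MongoClient

# Assumes a local mongod; the database and collection names are illustrative.
people = MongoClient()["demo"]["people"]
people.insert_one({
    "name": "Brandon Byars",
    "children": [
        {"name": "Jackson Byars", "birthDate": "February 15, 1999"},
        {"name": "Zachary Byars", "birthDate": "January 15, 2009"},
    ],
})

# Dot notation reaches into the embedded array -- no joins, no schema change.
doc = people.find_one({"children.name": "Zachary Byars"})
print(doc["name"])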
Document Databases
Problem:
NoSQL?
Summary
          One size really doesn’t fit all

                PreSQL      RDBMS           NoSQL
Overhead          low         high            low
Data Model       rigid     changeable    rigid/flexible
Code Lock-in     high        medium           high
Distribution      n/a       achievable      easy/hard
Extras            few         many            few
Cost             high      medium-high   low-medium
Future Choices
• RDBMS still the “go-to” solution in most cases
• But, look at new alternatives

Keep Business, not Fashion, in mind
The Database Revolution:
   An Historical Perspective

Brandon Byars
   bbyars@thoughtworks.com
John Finlay
   jfinlay@thoughtworks.com

Editor's Notes

  • #3: TW – Questions welcome
  • #4: Survey – where we’ve been and where we’re headed
  • #5: How did data management evolve to produce our current “universal database” (RDBMS) world? And how did that world influence current NoSQL development efforts? Let’s cover a little history so you’ll understand how we got to the current state, to bring to light what data processing was like before “Relational” became the norm, and to explain that, in some ways, we’ve sorta been there before. So let’s go back to when it all got started. The dawn of time…
  • #6: Hey, this is what it was really like. These were some of my co-workers…
  • #7: Big iron (IBM, Burroughs, Univac) single-mainframe hardware was the ONLY choice. Minimal memory (1–16 MB) meant marginal data retention, so everything needed to be written to disk. 1 MIP meant efficiency was paramount; software was commonly built in Assembler. Compare to my phone: 32 GB plus “external storage” in the form of SD cards/cloud/etc., and a 40 MIP processor in here just for the A/D conversion. SO: How did anyone do data management?
  • #8: The first data management facilities: flat sequential files, with a later improvement of (single-key) indexing. One-user-at-a-time required no synchronization. TP monitors (e.g. CICS) allowed many users “simultaneous” access. The user interface was restricted to 24x80 character screens, with capabilities somewhat equivalent to Internet Explorer… The monitor used logging & 2PC to synchronize across files & guarantee state; thus logging (then as now) was a major limiting factor.
  • #9: In 1968, IMS (from IBM) arrived, bundling data management into a separate engine – specifically, at the time, to manage hierarchical data (e.g. BOM). IMS moved multi-file transaction management from the TP monitor to the data engine. Many new mechanisms for organizing and traversing data were introduced: not just VSAM, but HDAM, HIDAM, HISAM… All were fairly complex to use, because the programmer was the navigator through pointers from record to record – meaning all traversal, logical “JOINs” etc., were explicitly done in code (usually COBOL). Functionality was limited by the data structures. Obviously, some queries were much more expensive & code intensive than others.
  • #10: In 1970, a new style of data management appeared courtesy of Software AG. Flat tables appeared, with multiple indices. Data retrieval and buffering was managed via the data engine itself – no more physical address pointers for the programmer to handle. “Fifth Generation” languages were commonly used. These data structures are identical to relational database structures in a number of ways: multiple B-tree indices means flexible SEARCH capability <describe the B-tree structure and ISN’s>, and data is stored in nth NF relations (denormalized for performance, e.g. aggregates in MU/PE). BUT “join” was still done by the programmer (given this record, go get that child record…) because data retrieval was still a record-by-record request to the engine; foreign key relationships were not constrained by the engine; there was no engine assistance on the “best means of traversal”; and tables were limited to 16M records. Since the data management overhead was minimal (relative to today), this was (and continues to be) an extremely fast engine.
  • #11: The Network data model appeared in 1983, based on work by Charles Bachman, who developed the CODASYL model as extensions to the COBOL language. IMS is actually a restrictive form of this model. The ability to traverse data in many dimensions removed many of the limitations of the hierarchical model. Oddly, this was a full return to the programmer as navigator, albeit with supportive tools/language. It appeared a little late in the game: today it’s almost nonexistent.
  • #12: In 1970, Codd invented Relational Calculus, a mathematically sound means of describing what is wanted, not how to get it. System/R and all following RDBMS engines follow only a subset of Codd’s original “12 Rules”. What is this Relational Model? Take the Inverted List model and add: data statistics generated and used by the engine to decide itself what the “best” traversal method is; retrieval of data as a single “set”; and enforcement of constraints (values, FK’s, etc.) on the data. Of course, this increased overhead dramatically – up to 95% of CPU demand is logging, locking/latching, 2-3PC, enforcing constraints, etc. By about 1988, along with processors that achieved approx 10 MIPS, performance was acceptable; but people knew it would only get better, given Moore’s Law, and were willing to accept the cost for the perceived advantages. So the marketplace for RDBMS exploded, and all other engines became passé. SQL SERVER! There’s our tie-in to this conference.
  • #13: Those “perceived advantages”: One size – data is normalized, you can do anything with it; traversal is no longer a programmer issue, meaning, again, the programmer just specifies what is wanted, not how to get it, and the data structures could be altered with no change in how that data was accessed. Lock-in – hey, SQL is a universal language! No longer forced to stay with a single vendor! Prices will come down! Less reliance on code navigation – once you’ve understood SQL… Cheaper cycles – CPUs becoming powerful enough to take ever more responsibility away from the developer. Tuna fish – quote from Peter Pagé, CTO of Software AG after 3 years of fighting: “There is a massive impedance mismatch between set logic and the sequential processing that is the norm for almost all languages. But OK, if you want <>, we’ll give it to you.”
  • #14: So for the last 20 years, data processing has been largely handled via relational databases. CLICK But data structures are still fairly rigid: columns are strictly predefined, and it can’t easily handle BIG data/documents/key-value/entity-attribute-value columns. CLICK But enterprise-level engines are still very expensive – in absolute $ cost, speed and overhead. CLICK But absolute data accessibility, consistency, isolation, etc. is often unnecessary. Even the management of “optimal traversal” is not required most of the time, since most queries are now compiled once as stored procedures, obviating the original premise of users directly and dynamically using the database engine in an ad hoc fashion. CLICK But SQL has a different flavor on every engine, so there is still a high degree of engine lock-in. CLICK But we’re not always doing TPC-A type transactions (a standard benchmark (now defunct) that WAS used to measure db performance).
  • #15: An entity (a “table”) is defined purely by a dictionary’s definition of what columns are contained therein. Columns are merely named attributes with defined value types, so a new column can be added in no time. Conceivably, beyond the simple dictionary definitions, all data can be maintained in one “table”. CLICK The so-called “5th Normal Form” data architecture was designed to circumvent the RDBMS limitation on dynamic metadata definitions, allowing new entity attributes to be added on-the-fly. It’s the architecture used by, e.g., NetCracker and OpenMRS. Theoretically, this is an interesting idea. In reality, CLICK no real data types (all data stored as characters), no foreign keys, no constraint management, and no truly effective indexing CLICK means we are still incurring ALL the RDBMS overhead without any of the benefits. Now of course, this design works, and is “future-proof” assuming Moore’s Law continues to hold. BUT…
  • #16: Here’s a chart of CPU processing power over time. There are some interesting things to note. The scale is logarithmic, meaning incredible growth in processing power has occurred. Notice how we’ve pretty well leveled out over the last 10 years? Sneaky tricks to make it look like processors are continuing the growth trend have given way to improvements in memory bandwidth, power consumption, etc. And all current database engine architectures were designed when 1 MIP CPUs were the norm and the engine ran on one node. Now processor speed isn’t the whole story; the data processing world has changed over time in numerous ways…
  • #17: Here’s a chart (normalized to 1 for approximately 1970) listing some interesting growth factors, with the results of the previous slide shown as the first entry. These are rough sizes … what a company with “large data processing needs” would see. I tried not to be picky … an order of magnitude comparison was good enough for these purposes. Drastic reductions in cost; unbelievable improvements in capacity and power. Are we taking advantage of this? With RDBMS, not universally. Consider how many physical nodes we can set up for the same cost as before: 3 orders of magnitude growth. Now imagine spreading an RDBMS over 1,000 nodes. It would be an abomination. Or, for example, let’s look at “Disk Data”, or how much data was/is retained on disk in a large data processing shop. I worked for the Los Angeles County Justice Department from ’87 to about ’91. The data center was like an airplane hangar; disk storage boxes – 3330’s and 3350’s – were arrayed in a vast area like the closing scene of Indiana Jones. People’s jaws would drop when they heard LA County had 1 TB of disk storage, and needed more. How could anyone need that much storage? Like, 100K per county resident! LA County’s problem was not that they needed more storage; their problem was that they couldn’t get more electricity to run the storage. They had maxed out the county’s electrical infrastructure. In contrast: about a month ago, I bought 4 TB of storage, 2 drives, for a dictionary-size RAID-1 NAS in my home, for $250. I told a friend of mine about it. He said, in all seriousness, “Oooh, that won’t last very long.” How could an RDBMS provide acceptable response times when managing a petabyte? Or (soon) an exabyte? Imagine searching that. Without a decent index. Which is what Data Mining is all about.
  • #18: So some of the problems with relational databases are obvious. CLICK The mass migration to RDBMS is like one of those “In Soviet Russia” jokes: In Database Management, RDBMS runs you! CLICK Today, medium size datacenters house dozens if not hundreds of servers, whereas a relational database generally lives on one server only. Data volumes are rapidly outstripping a single engine’s ability to navigate effectively. Enterprise database engines still cost huge amounts. Procuring open source software is becoming the norm. CLICK What happens when we take old and/or new data models and apply them to modern h/w? Let’s start with a couple of useful concepts...
  • #19: Eric Brewer’s 2000 ACM keynote speech introduced the CAP theorem. He argued that these three concepts – C, A, and P – in various combinations, were what was possible for a data engine to provide. Consistency = strong, “read your writes” consistency – all clients see the same view (sometimes you have to say no). Availability = if you can talk to a node, you can get an answer. Partition Tolerance = data on multiple nodes: anything less than total network failure still functions. His theory goes on to say that no data engine can satisfy all three requirements simultaneously. See the unicorns? Example: CA requires all nodes to be in constant contact with each other. An RDBMS is generally focused on a single node; partitioning-without-pain is out of the question, because a network failure can lose a shard, which can kill the whole. To achieve combinations other than CA, to take advantage of modern hardware, and to handle problems that don’t fit nicely in the relational world, we need to alter some of our “requirements”, specifically those dealing with ACID transactions.
  • #20: These acronyms provide an approximate description of the difference between RDBMS and NoSQL engines. ACID represents the “rules” we’ve come to know and live by wrt relational databases. They present a pessimistic approach, consistent at the end of every operation; it looks like a serial rather than parallel process. Atomic: all-or-nothing transactions. Consistent: data always agrees. Isolated: no inter-transaction interference. Durable: no lost data. This is what demands the 95% overhead, and mostly forces the engine to remain in a single server. There’s an alternative model that opens the doors for other types of data processing: CLICK BASE is an artificial acronym (to counteract ACID): some parts are always basically available (even though not all parts may be), providing soft-state services (i.e. the possibility of data inconsistencies or versioned data), with eventual consistency guaranteed. It’s an optimistic approach, accepting the fact that consistency is always in a state of flux. Basically Available, Soft State, Eventually Consistent – sounds like a poorly worded personal entry on a dating website. And with that, I’ll hand the presentation over to my esteemed colleague, BB. Thank you.
  • #21: Much of nosql attacks Brewer’s P in CAP; but look out!
  • #24: Data warehouse problem, not latency sensitive
  • #25: Note file icon, distributed FS. Append-only FS.
  • #26: Google’s golden hammer. FB uses Hadoop for all data warehouse ops (15 TB/day). Yahoo: 82 PB of data, 40k machines, large clusters of 4000 machines.
  • #27: * Low latency search of entire web, look-ahead
  • #28: Column families – versioning, compression, bloom filter policies. Reversed DNS name, two column families, different timestamped values. Rows stored in lexicographic order, partitioned automatically – allows efficient range queries with smart key selection.
  • #29: * Google Earth, Google Analytics. * Facebook Messages, Yahoo web crawl cache, StumbleUpon, Twitter (analytics takes most storage, not tweets).
  • #30: * Inverted index – can only search your inbox; userId first. Works for type-ahead, partial word searches. Looking for multiple words – joined at app layer. * Google Analytics -> key (website name, time) -> allows efficient chronological queries, augmented by a table with predefined summaries populated by M/R. * Google Earth -> spatial key guarantees contiguity between rows, preprocessing via M/R, serves low latency. Key design – range queries + auto sharding.
  • #31: * Always let customers add items to shopping cart
  • #32: Scaling with minimal impact to operators and system. No distinguished nodes (like in Hadoop); P2P techniques. Not just different hardware – work distribution must be proportional to server capability. Allows updating sections of infrastructure at a time.
  • #33: Consistent hashing – the output range of the hash function is treated as a ring (the largest hash value wraps around). Each node in the system is assigned a random value that represents its position on the ring. Each data item identified by a key is assigned to a node by hashing the key to find its position on the ring, then walking clockwise. Vector clocks allow identifying inconsistencies at READ time – developers deal with it. Conflict resolution can be done by the app (“merge”) or the data store, which has fewer options (e.g. “last write wins”).
  • #34: N = nodes to store data. R = nodes that must participate in a read (increase for consistency, decrease for latency). W = nodes that must participate in a write (decrease for availability, increase for consistency). Typical configuration = (3, 2, 2). Can optimize (e.g. product catalog).
  • #35: * Facebook doesn’t have a data warehouse – they use Hadoop and HBase for all analytics.
  • #37: CODASYL network model of the ’70s. An RDBMS can tell you the avg salary of everyone; a graph can tell you who’s most likely to buy you a beer.
  • #38: Also: PLM, Fraud detection, intelligence activities, genomics
  • #40: Hierarchical model (IMS). JavaScript API. Couchbase = mobile support.
  • #43: http://browsertoolkit.com/fault-tolerance.png
  • #45: M/R problems: * Debugging
  • #46: So: in some ways we’ve returned to the pre-RDBMS era. NoSQL brings us full-circle, back to engines that fit today’s hardware configurations, apply to particular problem domains, and demand high development lock-in.
  • #47: Don’t be fooled into thinking RDBMS is dead.
  • #48: TW – Questions welcome