NoSQL Database Technology - A Survey and Comparison of Systems
James Phillips
Changes in interactive software - NoSQL driver
Modern interactive software architecture
Database Scales Up
Get a bigger, more complex server
Note - Relational database technology is great for what it is great for, but it is not great for this.
Survey: Schema inflexibility #1 adoption driver
Costs 16%
Other 11%
Extending the scope of RDBMS technology
Lacking market solutions, users forced to invent
Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.
NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance.
The Key-Value Store - the foundation of NoSQL
[Figure: a key mapped to an opaque binary value]
Memcached - the NoSQL precursor
[Figure: a key mapped to an opaque binary value held by memcached]
In-memory only
Limited set of operations
Blob storage: Set, Add, Replace, CAS
Retrieval: Get
Structured data: Append, Increment
“Simple and fast.”
Challenges: cold cache, disruptive elasticity
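The operations listed on the memcached slide can be sketched with a toy in-memory store. This is only an illustration of the semantics (store, conditional store, compare-and-swap, append, increment), not memcached's implementation or its client API; the class name `TinyCache` and its methods are hypothetical.

```python
class TinyCache:
    """Toy in-memory key-value cache (hypothetical, not memcached itself)."""

    def __init__(self):
        self._data = {}    # key -> (value bytes, version token)
        self._version = 0  # monotonically increasing source of CAS tokens

    def _store(self, key, value):
        self._version += 1
        self._data[key] = (value, self._version)
        return True

    def set(self, key, value):
        """Store unconditionally."""
        return self._store(key, value)

    def add(self, key, value):
        """Store only if the key is absent."""
        return False if key in self._data else self._store(key, value)

    def replace(self, key, value):
        """Store only if the key is already present."""
        return self._store(key, value) if key in self._data else False

    def get(self, key):
        entry = self._data.get(key)
        return None if entry is None else entry[0]

    def gets(self, key):
        """Return (value, cas_token), or None on a miss."""
        return self._data.get(key)

    def cas(self, key, value, token):
        """Compare-and-swap: store only if nobody wrote since `token` was read."""
        entry = self._data.get(key)
        if entry is None or entry[1] != token:
            return False
        return self._store(key, value)

    def append(self, key, suffix):
        entry = self._data.get(key)
        return False if entry is None else self._store(key, entry[0] + suffix)

    def incr(self, key, delta=1):
        """Treat the stored bytes as a decimal integer and add `delta`."""
        entry = self._data.get(key)
        if entry is None:
            return None
        new_value = int(entry[0]) + delta
        self._store(key, str(new_value).encode())
        return new_value


cache = TinyCache()
cache.set("hits", b"41")
cache.incr("hits")                   # value is now b"42"
value, token = cache.gets("hits")
cache.set("hits", b"0")              # a concurrent writer gets in first
stale = cache.cas("hits", b"99", token)  # False: the token is out of date
```

The value stays an opaque blob to the store; only `append` and `incr` interpret it, which is the small step toward "structured data" that the slide points out.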
Redis - More “Structured Data” commands
[Figure: a key mapped to redis “data structures”: blob, list, set, hash, …]
In-memory only
Vast set of operations
Blob storage: Set, Add, Replace, CAS
Retrieval: Get, Pub-Sub
Structured data: Strings, Hashes, Lists, Sets, Sorted sets
Example operations for a Set: add, count, subtract sets, intersection, is member?, atomic move from one set to another
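The example Set operations can be tried out with plain Python sets. The sketch below mirrors the semantics only; the comments name the corresponding Redis commands (SADD, SCARD, SDIFF, SINTER, SISMEMBER, SMOVE), but no Redis client is used, and the helper `move_member` and the sample names are made up for illustration.

```python
def move_member(source, destination, member):
    """Move `member` from one set to another; False if it was not in `source`.

    In Redis the server performs this as one atomic step (SMOVE); this
    single-threaded helper only illustrates the resulting state.
    """
    if member not in source:
        return False
    source.remove(member)
    destination.add(member)
    return True


online = {"alice", "bob"}
premium = {"bob", "carol"}

online.add("dave")                   # add           (SADD)
count = len(online)                  # count         (SCARD)
free_online = online - premium       # subtract sets (SDIFF)
premium_online = online & premium    # intersection  (SINTER)
is_member = "alice" in online        # is member?    (SISMEMBER)
moved = move_member(online, premium, "alice")  # move (SMOVE)
```

The point of doing this server-side is that the application sends one command instead of reading a blob, modifying it, and writing it back.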
NoSQL catalog
[Figure: catalog chart; labels: memcached, redis]
Membase - From key-value cache to database
[Figure: a key mapped to an opaque binary value held by membase]
Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to live cluster
“Simple, fast, elastic.”
NoSQL catalog
[Figure: catalog chart; labels: memcached, redis, (memory/disk), membase, Database]
Couchbase - document-oriented database
[Figure: a key mapped to a JSON object (“document”) of the form { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
Auto-sharding
Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to live cluster
NoSQL catalog
[Figure: catalog chart; labels: memcached, redis, (memory/disk), membase, couchbase, Database]
MongoDB - Document-oriented database
[Figure: a key mapped to a BSON object (“document”) of the form { “string” : “string”, “string” : value, “string” : { “string” : “string”, “string” : value }, “string” : [ array ] }]
Disk-based with in-memory “caching”
BSON (“binary JSON”) format and wire protocol
Master-slave replication
Auto-sharding
Values are BSON objects
Supports ad hoc queries - best when indexed
NoSQL catalog
[Figure: catalog chart; labels: memcached, redis, (memory/disk), membase, couchbase, Database, mongoDB]
Cassandra - Column overlays
[Figure: Cassandra rows with Column 1, Column 2 and Column 3 (not present)]
Disk-based system
Clustered
External caching required for low-latency reads
“Columns” are overlaid on the data
Not all rows must have all columns
Supports efficient queries on columns
Restart required when adding columns
Good cross-datacenter support
NoSQL catalog
[Figure: catalog chart; labels: memcached, redis, (memory/disk), mongoDB]
Neo4j - Graph database
Disk-based system
External caching required for low-latency reads
Nodes, relationships and paths
Properties on nodes
Delete, Insert, Traverse, etc.
Document-oriented database advantages
Performance. The document data model keeps related data in a single physical location in memory and on disk (a document). This allows consistently low-latency access to the data: reads and writes happen with very little delay. Database latency can result in perceived “lag” by the player of a game, and avoiding it is a key success criterion.
Dynamic elasticity. Because the document approach keeps records “in one place” (a single document in a contiguous physical location), it is much easier to move the data from one server to another while maintaining consistency, and without requiring any game downtime. Moving data between servers is required to add and remove cluster capacity, so that the performance capability of the database cost-effectively matches the aggregate performance needs of the application. Doing this at any time without stopping the revenue flow of the game can make a material difference in game profitability.
Schema flexibility. While all NoSQL databases provide schema flexibility, key-value and document-oriented databases enjoy the most. Column-oriented databases still require maintenance to add new columns and to group them. A key-value or document-oriented database requires no database maintenance to change the database schema (to add and remove “fields” or data elements from a given record).
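A minimal sketch of what "no database maintenance to change the schema" means in practice, assuming a document store modeled as a plain Python dictionary of JSON strings. The keys, field names and helper functions are invented for illustration and do not correspond to any particular product's API.

```python
import json

store = {}  # key -> serialized JSON document (stand-in for a document database)


def save(key, document):
    store[key] = json.dumps(document)


def load(key):
    return json.loads(store[key])


# Version 1 of the application writes two fields.
save("player::1", {"name": "Ann", "level": 3})

# Version 2 starts writing an extra nested field. Nothing is declared or
# migrated first; older documents simply lack the field.
save("player::2", {"name": "Bo", "level": 1, "farm": {"animals": ["black sheep"]}})


def animals(key):
    """Read the new field, tolerating documents written before it existed."""
    return load(key).get("farm", {}).get("animals", [])
```

The cost of this flexibility moves into the application, which has to tolerate records of different shapes, as `animals` does here.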
Query flexibility. Balancing schema flexibility with query expressiveness (the ability to ask the database questions, for example “return me a list of all the farms in which a player purchased a black sheep last month”) is important. While a key-value database is completely flexible, allowing a user to put any desired value in the “value” part of the key-value pair, it doesn’t provide the ability to ask questions. It only permits accessing the data record associated with a given key. I can ask for the farm data for user A, B, C … and see if they have a black sheep, but I can’t ask the database to do that work on my behalf. Document databases provide the best balance: schema flexibility without giving up the ability to do sophisticated queries.
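The black-sheep example can be made concrete with a small sketch. The farm records, key names and month strings below are made up, and `find` is only a stand-in for a document database's query facility, not any real query language.

```python
# Made-up farm records keyed by user; every value here is illustrative.
farms = {
    "farm::A": {"owner": "A", "purchases": [{"item": "black sheep", "month": "2012-03"}]},
    "farm::B": {"owner": "B", "purchases": [{"item": "white sheep", "month": "2012-03"}]},
    "farm::C": {"owner": "C", "purchases": [{"item": "black sheep", "month": "2012-01"}]},
}


def bought(doc, item, month):
    """True if the document records a purchase of `item` in `month`."""
    return any(p["item"] == item and p["month"] == month for p in doc["purchases"])


def kv_get(key):
    """Key-value access: the store only answers 'what is stored at this key?'."""
    return farms[key]


def scan_by_key(keys, item, month):
    """Key-value style: the application fetches every record and inspects it."""
    return [key for key in keys if bought(kv_get(key), item, month)]


def find(predicate):
    """Document style: the predicate is handed to the 'database' to evaluate."""
    return [key for key, doc in farms.items() if predicate(doc)]


by_scan = scan_by_key(["farm::A", "farm::B", "farm::C"], "black sheep", "2012-03")
by_query = find(lambda doc: bought(doc, "black sheep", "2012-03"))
```

Both calls return the same keys; the difference is where the work happens. With `scan_by_key` the application must already know every key and pull every record across the network, while `find` leaves the filtering to the database.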
Big data and NoSQL - Hadoop and Couchbase
[Figure labels: profiles, campaigns; click stream events]
COUCHBASE
Couchbase Server
Representative user list
Typical Couchbase production environment
[Figure labels: Application users, Load Balancer, Application Servers, Servers]
Couchbase architecture
[Figure labels: Database Operations, Rebalance orchestrator, Configuration manager, Heartbeat, Data Manager, Cluster Manager, storage interface, Erlang/OTP, Cluster Management]
Couchbase deployment
[Figure labels: Web Application, Couchbase Client Library, Data Flow, Cluster Management]
Clustering With Couchbase
[Figure labels: Listener-Sender, RAM, Replica Server 1 for KEY, Master server for KEY, Replica Server 2 for KEY]
Basic Operation
[Figure: APP SERVER 1 and APP SERVER 2, each with a COUCHBASE CLIENT LIBRARY and CLUSTER MAP, read/write/update docs on SERVER 1, SERVER 2 and SERVER 3; each server holds Active Docs and Replica Docs drawn from Doc 1 to Doc 9]
Docs distributed evenly across servers in the cluster
Each server stores both active & replica docs
Only one server active at a time
Client library provides app with simple interface to database
Cluster map provides map to which server doc is on
App never needs to know
App reads, writes, updates docs
Multiple App Servers can access same document at same time
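The role of the cluster map can be sketched as a two-step lookup: hash the document key to a partition, then look the partition up in a table naming its active and replica servers. This is a toy model under stated assumptions (8 partitions, CRC32 hashing, round-robin placement, one replica); the constants and function names are hypothetical and this is not presented as Couchbase's actual algorithm.

```python
import zlib

NUM_PARTITIONS = 8                          # assumed; chosen small for readability
SERVERS = ["server1", "server2", "server3"]


def build_cluster_map(servers, num_partitions):
    """Map each partition to (active server, replica server), round-robin."""
    if len(servers) < 2:
        raise ValueError("need at least two servers to place a replica")
    cluster_map = {}
    for partition in range(num_partitions):
        active = servers[partition % len(servers)]
        replica = servers[(partition + 1) % len(servers)]  # never the active one
        cluster_map[partition] = (active, replica)
    return cluster_map


def partition_for(key, num_partitions):
    """Hash the document key to a partition number."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def locate(key, cluster_map):
    """Return (active, replica) for a key, as the client library would."""
    return cluster_map[partition_for(key, len(cluster_map))]


cluster_map = build_cluster_map(SERVERS, NUM_PARTITIONS)
active, replica = locate("doc5", cluster_map)
```

Because every app server holds the same map and uses the same hash, each one sends reads and writes for a given doc to the same active server without the application code knowing where the doc lives.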
Reading and Writing
[Figure labels: Server, Server]
Reading Data
Keeping working data set in RAM is key to read performance … or else! (Because you don’t want the “else” part happening very often: it is MUCH slower than a memory read, and you could have to wait in line an indeterminate amount of time for the read to happen.)
Working set ratio depends on your application
working/total set = .01
working/total set = .33
working/total set = 1
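A rough back-of-the-envelope model shows why the working set ratio matters for sizing RAM. The sketch assumes reads are spread uniformly over the working set and that RAM holds working-set documents first; the latency figures (0.1 ms for RAM, 10 ms for disk) are placeholder assumptions, not measurements from the talk.

```python
def hit_ratio(ram_fraction, working_set_ratio):
    """Fraction of reads served from RAM.

    ram_fraction:      share of the total data set that fits in RAM (0..1)
    working_set_ratio: working set size / total data set size (0..1]
    """
    if working_set_ratio <= 0:
        raise ValueError("working_set_ratio must be positive")
    return min(1.0, ram_fraction / working_set_ratio)


def average_read_latency_ms(hit, ram_ms=0.1, disk_ms=10.0):
    """Weighted average of RAM hits and disk misses (latencies are assumed)."""
    return hit * ram_ms + (1.0 - hit) * disk_ms


# RAM sized at 10% of the total data, for the three ratios on the slide.
for ratio in (0.01, 0.33, 1.0):
    hit = hit_ratio(0.10, ratio)
    print(ratio, round(hit, 3), round(average_read_latency_ms(hit), 3))
```

Under these assumptions the same amount of RAM covers the whole working set when the ratio is .01 but only a fraction of it when the ratio is .33 or 1, so the miss penalty dominates the average.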
Writing Data
[Figure labels: Server, network]
Queues build if aggregate arrival rate exceeds drain rates
Scaling out permits matching of aggregate flow rates so queues do not grow
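The queueing argument can be checked with a few lines of arithmetic. The sketch below tracks the backlog of pending writes second by second; the rates are invented round numbers, and the model assumes every server drains at the same fixed rate and that writes are spread evenly across servers.

```python
def queue_depths(arrival_per_sec, drain_per_server, servers, seconds):
    """Backlog of pending writes at the end of each second."""
    depth = 0
    history = []
    aggregate_drain = drain_per_server * servers
    for _ in range(seconds):
        # New writes arrive, the cluster drains what it can; never below zero.
        depth = max(0, depth + arrival_per_sec - aggregate_drain)
        history.append(depth)
    return history


# 1000 writes/s arriving, each server draining 400 writes/s (assumed figures).
two_servers = queue_depths(1000, 400, servers=2, seconds=3)    # [200, 400, 600]
three_servers = queue_depths(1000, 400, servers=3, seconds=3)  # [0, 0, 0]
```

With two servers the aggregate drain rate (800/s) is below the arrival rate, so the queue grows without bound; adding a third server raises the drain rate above the arrival rate and the queue stays empty.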