NoSQL database technology: A survey and comparison of systems

James Phillips

Changes in interactive software – NoSQL driver

Modern interactive software architecture

Application scales out
Just add more commodity web servers

Database scales up
Get a bigger, more complex server

Note – Relational database technology is great for what it is great for, but it is not great for this.

Survey: Schema inflexibility #1 adoption driver

What is the biggest data management problem driving your use of NoSQL in the coming year?

Lack of flexibility/rigid schemas   49%
Inability to scale out data         35%
High latency/low performance        29%
Costs                               16%
All of these                        12%
Other                               11%

Source: Couchbase NoSQL Survey, December 2011, n=1351

Extending the scope of RDBMS technology

• Data partitioning (“sharding”)
– Disruptive to reshard – impacts application
– No cross-shard joins
– Schema management on every shard
• Denormalizing
– Increases speed
– At the limit, provides complete flexibility
– Eliminates relational query benefits
• Distributed caching
– Accelerate reads
– Scale out
– Another tier, no write acceleration, coherency management

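The first drawback, “disruptive to reshard”, is easy to quantify for naive hash-modulo placement, a common application-side scheme that the slide does not name and that is assumed here for illustration. This Python sketch, using hypothetical keys user:0 to user:9999, counts how many keys land on a different shard when a fourth shard becomes a fifth:

```python
import zlib


def shard_for(key: str, num_shards: int) -> int:
    # Naive application-side placement (assumed scheme): hash the key, take it modulo the shard count.
    return zlib.crc32(key.encode("utf-8")) % num_shards


def fraction_moved(keys, old_shards: int, new_shards: int) -> float:
    # Fraction of keys whose shard assignment changes when the shard count changes.
    moved = sum(1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards))
    return moved / len(keys)


keys = [f"user:{i}" for i in range(10_000)]  # hypothetical keys
print(f"{fraction_moved(keys, 4, 5):.0%} of keys move going from 4 to 5 shards")
```

With a uniform hash, a key keeps its shard only when h mod 4 equals h mod 5, which holds for 4 of every 20 hash values. Roughly 80% of the data therefore has to be copied, and the application has to coordinate every one of those moves.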
Lacking market solutions, users forced to invent

Bigtable (November 2006)
Dynamo (October 2007)
Cassandra (August 2008)
Voldemort (February 2009)

• No schema required before inserting data
• No schema change required to change data format
• Auto-sharding without application participation
• Distributed queries
• Integrated main memory caching
• Data synchronization (mobile, multi-datacenter)

Very few organizations want to (fewer can) build and maintain database software technology. But every organization building interactive web applications needs this technology.

NoSQL database matches application logic tier architecture
Data layer now scales with linear cost and constant performance.

Application scales out
Just add more commodity web servers

NoSQL database servers

Database scales out
Just add more commodity data servers

Scaling out flattens the cost and performance curves.

NOSQL TAXONOMY

The Key-Value Store – the foundation of NoSQL

[Figure: each key maps to an opaque binary value]

Memcached – the NoSQL precursor

[Figure: each key maps to an opaque binary value]

memcached
In-memory only
Limited set of operations
Blob storage: Set, Add, Replace, CAS
Retrieval: Get
Structured data: Append, Increment
“Simple and fast.”
Challenges: cold cache, disruptive elasticity

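The operation set above is small enough to model in a few lines. This minimal single-process Python sketch models the semantics only. The method names mirror memcached's text-protocol commands (set, add, replace, get, gets, cas, append, incr), but `ToyCache` itself is a hypothetical illustration, not a memcached client or server. It also shows why a cold cache hurts: a miss simply returns nothing.

```python
from typing import Dict, Optional, Tuple


class ToyCache:
    """In-memory model of memcached-style command semantics (illustrative only)."""

    def __init__(self) -> None:
        self._data: Dict[str, Tuple[bytes, int]] = {}  # key -> (opaque value, CAS token)
        self._next_token = 1

    def _store(self, key: str, value: bytes) -> None:
        # Every successful write gets a fresh CAS token.
        self._data[key] = (value, self._next_token)
        self._next_token += 1

    def set(self, key: str, value: bytes) -> bool:
        self._store(key, value)  # unconditional write
        return True

    def add(self, key: str, value: bytes) -> bool:
        if key in self._data:  # only succeeds if the key is absent
            return False
        self._store(key, value)
        return True

    def replace(self, key: str, value: bytes) -> bool:
        if key not in self._data:  # only succeeds if the key exists
            return False
        self._store(key, value)
        return True

    def get(self, key: str) -> Optional[bytes]:
        entry = self._data.get(key)  # a miss (e.g. a cold cache) returns None
        return entry[0] if entry else None

    def gets(self, key: str) -> Optional[Tuple[bytes, int]]:
        return self._data.get(key)  # value plus its current CAS token

    def cas(self, key: str, value: bytes, token: int) -> bool:
        entry = self._data.get(key)
        if entry is None or entry[1] != token:  # someone else wrote first
            return False
        self._store(key, value)
        return True

    def append(self, key: str, suffix: bytes) -> bool:
        entry = self._data.get(key)
        if entry is None:
            return False
        self._store(key, entry[0] + suffix)
        return True

    def incr(self, key: str, delta: int = 1) -> Optional[int]:
        entry = self._data.get(key)
        if entry is None:
            return None
        new_value = int(entry[0]) + delta  # value must hold a decimal number
        self._store(key, str(new_value).encode())
        return new_value


cache = ToyCache()
cache.add("hits", b"41")
print(cache.incr("hits"))  # 42
```

CAS is the one operation here that coordinates concurrent writers: a client reads a token with `gets` and its later `cas` succeeds only if nobody has written the key in between.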
Redis – More “structured data” commands

[Figure: each key maps to a data structure: blob, list, set, hash, …]

redis
In-memory only
Vast set of operations
Blob storage: Set, Add, Replace, CAS
Retrieval: Get, Pub-Sub
Structured data: Strings, Hashes, Lists, Sets, Sorted lists

Example operations for a Set
Add, count, subtract sets, intersection, is member?, atomic move from one set to another

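The example Set operations correspond to Redis commands such as SADD, SCARD, SDIFF, SINTER, SISMEMBER and SMOVE. This Python sketch models their behavior with built-in sets. It illustrates the semantics under that assumption, not the Redis implementation, and the lock stands in for Redis executing each command as a single atomic step. The class name and keys are hypothetical.

```python
import threading
from collections import defaultdict
from typing import DefaultDict, Hashable, Set


class ToySetStore:
    """Key -> set store modeling a few Redis set commands (illustrative only)."""

    def __init__(self) -> None:
        self._sets: DefaultDict[str, Set[Hashable]] = defaultdict(set)
        self._lock = threading.Lock()  # each command runs as one atomic step

    def sadd(self, key: str, *members: Hashable) -> int:
        with self._lock:
            before = len(self._sets[key])
            self._sets[key].update(members)
            return len(self._sets[key]) - before  # number of newly added members

    def scard(self, key: str) -> int:
        with self._lock:
            return len(self._sets.get(key, ()))

    def sismember(self, key: str, member: Hashable) -> bool:
        with self._lock:
            return member in self._sets.get(key, ())

    def sdiff(self, key: str, *others: str) -> Set[Hashable]:
        with self._lock:
            result = set(self._sets.get(key, ()))
            for other in others:
                result -= self._sets.get(other, set())
            return result

    def sinter(self, *keys: str) -> Set[Hashable]:
        with self._lock:
            sets = [self._sets.get(k, set()) for k in keys]
            return set.intersection(*sets) if sets else set()

    def smove(self, src: str, dst: str, member: Hashable) -> bool:
        # Remove from src and add to dst in one step, so no reader sees the member in neither or both.
        with self._lock:
            if member not in self._sets.get(src, ()):
                return False
            self._sets[src].discard(member)
            self._sets[dst].add(member)
            return True


store = ToySetStore()
store.sadd("farm:A", "cow", "sheep", "pig")
store.sadd("farm:B", "sheep", "goat")
print(store.sinter("farm:A", "farm:B"))  # {'sheep'}
```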
NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis

Membase – From key-value cache to database

[Figure: each key maps to an opaque binary value]

membase
Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to live cluster
“Simple, fast, elastic.”

NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis
Database (memory/disk)  membase

Couchbase – document-oriented database

Key → JSON object (“document”):
{
    "string" : "string",
    "string" : value,
    "string" : { "string" : "string", "string" : value },
    "string" : [ array ]
}

Couchbase
Auto-sharding
Disk-based with built-in memcached cache
Cache refill on restart
Memcached compatible (drop-in replacement)
Highly available (data replication)
Add or remove capacity to live cluster

When values are JSON objects (“documents”): create indices, views and query against the views

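“Create indices, views and query against the views” means the database derives a secondary index from the JSON documents and answers key-range queries from it. Couchbase defines views as map functions that emit key/value rows. The Python below is only a minimal sketch of that idea, assuming a simple sorted index. The names `build_view`, `query_view` and `by_level` and the document fields are hypothetical, and this sketch rebuilds the whole index on every call.

```python
import json
from bisect import bisect_left, bisect_right
from typing import Callable, Dict, Iterator, List, Tuple

Row = Tuple[object, str, object]  # (emitted key, doc id, emitted value)
MapFn = Callable[[str, dict], Iterator[Tuple[object, object]]]


def build_view(docs: Dict[str, str], map_fn: MapFn) -> List[Row]:
    """Run map_fn over every JSON document and keep the emitted rows sorted by key."""
    rows: List[Row] = []
    for doc_id, raw_json in docs.items():
        for key, value in map_fn(doc_id, json.loads(raw_json)):
            rows.append((key, doc_id, value))
    rows.sort(key=lambda row: (row[0], row[1]))
    return rows


def query_view(rows: List[Row], start, end) -> List[Row]:
    """Return rows whose key lies in [start, end], using binary search on the sorted index."""
    keys = [row[0] for row in rows]
    return rows[bisect_left(keys, start):bisect_right(keys, end)]


# Hypothetical map function: index player documents by level; other doc types emit nothing.
def by_level(doc_id: str, doc: dict) -> Iterator[Tuple[object, object]]:
    if doc.get("type") == "player":
        yield doc["level"], doc["name"]


docs = {
    "p1": '{"type": "player", "name": "ann", "level": 7}',
    "p2": '{"type": "player", "name": "bob", "level": 3}',
    "p3": '{"type": "player", "name": "cy", "level": 9, "guild": "north"}',
    "f1": '{"type": "farm", "owner": "p1"}',
}
index = build_view(docs, by_level)
print(query_view(index, 5, 10))  # [(7, 'p1', 'ann'), (9, 'p3', 'cy')]
```

Note that "p3" carries an extra `guild` field and "f1" has a different shape entirely; neither needs a schema change, and the map function simply ignores what it does not index.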
NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis
Database (memory/disk)  membase                      couchbase

MongoDB – Document-oriented database

Key → BSON object (“document”):
{
    "string" : "string",
    "string" : value,
    "string" : { "string" : "string", "string" : value },
    "string" : [ array ]
}

MongoDB
Disk-based with in-memory “caching”
BSON (“binary JSON”) format and wire protocol
Master-slave replication
Auto-sharding
Values are BSON objects
Supports ad hoc queries – best when indexed

NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis
Database (memory/disk)  membase                      couchbase
                                                     mongoDB

Cassandra – Column overlays

[Figure: rows with Column 1, Column 2 and Column 3; Column 3 is not present in every row]

Cassandra
Disk-based system
Clustered
External caching required for low-latency reads
“Columns” are overlaid on the data
Not all rows must have all columns
Supports efficient queries on columns
Restart required when adding columns
Good cross-datacenter support

NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis
Database (memory/disk)  membase                      couchbase   cassandra
                                                     mongoDB

Neo4j – Graph database

Neo4j
Disk-based system
External caching required for low-latency reads
Nodes, relationships and paths
Properties on nodes
Delete, Insert, Traverse, etc.

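The graph model consists of nodes carrying properties, typed relationships between them, and paths found by traversal. It can be sketched with an adjacency list. This is a minimal breadth-first traversal in Python, not Neo4j's API or query language; the node names, properties and the `FRIEND` relationship type are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

# Nodes carry properties; relationships are typed, directed edges.
nodes: Dict[str, dict] = {
    "alice": {"age": 31},
    "bob": {"age": 27},
    "carol": {"age": 45},
    "dave": {"age": 22},
}
relationships: List[Tuple[str, str, str]] = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("carol", "FRIEND", "dave"),
    ("alice", "FRIEND", "carol"),
]


def shortest_path(start: str, goal: str, rel_type: str) -> Optional[List[str]]:
    """Breadth-first traversal over relationships of one type; returns a shortest path."""
    adjacency: Dict[str, List[str]] = {n: [] for n in nodes}
    for src, rtype, dst in relationships:
        if rtype == rel_type:
            adjacency[src].append(dst)
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            path = []  # walk the predecessor links back to the start
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in adjacency[current]:
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None


print(shortest_path("alice", "dave", "FRIEND"))  # ['alice', 'carol', 'dave']
```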
NoSQL catalog

                        Key-Value   Data Structure   Document    Column      Graph
Cache (memory only)     memcached   redis
Database (memory/disk)  membase                      couchbase   cassandra   Neo4j
                                                     mongoDB

Document-oriented database advantages

Performance. The document data model keeps related data in a single physical location in memory and on disk (a document). This allows consistently low-latency access to the data: reads and writes happen with very little delay. Database latency can result in perceived “lag” by the player of a game, and avoiding it is a key success criterion.

Dynamic elasticity. Because the document approach keeps records “in one place” (a single document in a contiguous physical location), it is much easier to move the data from one server to another while maintaining consistency, and without requiring any game downtime. Moving data between servers is required to add and remove cluster capacity, so that the database's aggregate performance capability can be matched cost-effectively to the performance needs of the application. Doing this at any time without stopping the revenue flow of the game can make a material difference in game profitability.

Schema flexibility. While all NoSQL databases provide schema flexibility, key-value and document-oriented databases enjoy the most. Column-oriented databases still require maintenance to add new columns and to group them. A key-value or document-oriented database requires no database maintenance to change the database schema (to add and remove “fields” or data elements from a given record).

Query flexibility. Balancing schema flexibility with query expressiveness (the ability to ask the database questions, for example “return me a list of all the farms in which a player purchased a black sheep last month”) is important. While a key-value database is completely flexible, allowing a user to put any desired value in the “value” part of the key-value pair, it doesn’t provide the ability to ask questions. It only permits accessing the data record associated with a given key. I can ask for the farm data for user A, B, C … and see if they have a black sheep, but I can’t ask the database to do that work on my behalf. Document databases provide the best balance of schema flexibility without giving up the ability to do sophisticated queries.

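The black-sheep example can be made concrete. In the key-value case the application must already know the keys, fetch each farm, and inspect it itself; a document database can evaluate the same predicate on the application's behalf. This Python sketch simulates both sides in one process. The document fields (`player`, `purchases`, `item`, `month`) and the month values are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical farm documents, stored as JSON strings under their keys.
store: Dict[str, str] = {
    "farm:A": json.dumps({"player": "A", "purchases": [{"item": "black sheep", "month": "2012-05"}]}),
    "farm:B": json.dumps({"player": "B", "purchases": [{"item": "cow", "month": "2012-05"}]}),
    "farm:C": json.dumps({"player": "C", "purchases": [{"item": "black sheep", "month": "2012-03"}]}),
}


def bought_black_sheep(farm: dict, month: str) -> bool:
    return any(p["item"] == "black sheep" and p["month"] == month for p in farm["purchases"])


# Key-value style: the application supplies the keys, fetches every value and decodes it itself.
def kv_style(keys: List[str], month: str) -> List[str]:
    matches = []
    for key in keys:
        farm = json.loads(store[key])  # one round trip per key in a real client/server setup
        if bought_black_sheep(farm, month):
            matches.append(key)
    return matches


# Document style: the caller states the question; the "database" side does the work.
def document_query(month: str) -> List[str]:
    return [key for key, raw in store.items() if bought_black_sheep(json.loads(raw), month)]


print(kv_style(["farm:A", "farm:B", "farm:C"], "2012-05"))  # ['farm:A']
print(document_query("2012-05"))  # ['farm:A']
```

Both functions return the same answer; the difference is who does the work and how many requests cross the network, which is the trade-off the paragraph describes.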
Big data and NoSQL – Hadoop and Couchbase

[Figure: numbered data flow: (1) click stream events; (2) profiles, campaigns; (3) profiles, real-time campaign statistics. Requirement: 40 milliseconds to respond with the decision.]

COUCHBASE

Couchbase Server

Simple. Fast. Elastic. NoSQL.

Couchbase automatically distributes data across commodity servers. Built-in caching enables apps to read and write data with sub-millisecond latency. And with no schema to manage, Couchbase effortlessly accommodates changing data management requirements.

Representative user list

Typical Couchbase production environment

[Figure: application users → load balancer → application servers → servers]

Couchbase architecture

[Figure: Couchbase Server node architecture. Data Manager: database operations, Membase EP Engine (built-in memcached), storage interface, CouchDB. Cluster Manager, built on Erlang/OTP and reached over http via the REST management API/Web UI: vBucket state and replication manager, global singleton supervisor, rebalance orchestrator, configuration manager, node health monitor, process monitor, heartbeat. Some cluster-manager components run on each node, others once per cluster.]

Couchbase deployment

[Figure: web application using the Couchbase client library, with separate data flow and cluster management connections to the cluster]

Clustering With Couchbase

1. SET request arrives at KEY’s master server.
2. SET acknowledgement returned to application.
3. Document is replicated (via the listener-sender) to replica server 1 and replica server 2 for KEY.
4. Document is written from RAM to disk by the Couchbase storage engine.

[Figure: replica server 1, master server and replica server 2 for KEY, each with RAM and disks]

Basic Operation

[Figure: app servers 1 and 2, each running the Couchbase client library with its cluster map, read/write/update docs on servers 1–3 of the Couchbase Server cluster. Server 1: active docs 5, 2, 9; replica docs 4, 1, 8. Server 2: active docs 4, 7, 8; replica docs 6, 3, 2. Server 3: active docs 1, 3, 6; replica docs 7, 9, 5. User-configured replica count = 1.]

• Docs distributed evenly across servers in the cluster
• Each server stores both active & replica docs
– Only one server holds a doc's active copy at a time
• Client library provides app with simple interface to database
• Cluster map provides map to which server doc is on
– App never needs to know
• App reads, writes, updates docs
• Multiple app servers can access same document at same time
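The cluster map can be sketched as two lookups. First the document key is hashed to one of a fixed number of virtual buckets (the “vBuckets” named on the architecture slide), then that vBucket is looked up in a table listing its active and replica servers. The CRC32 hash, the count of 16 vBuckets, the round-robin layout and the server names are assumptions chosen for illustration, not details taken from the slides.

```python
import zlib
from typing import Dict, List

NUM_VBUCKETS = 16  # assumption: a small fixed count so the table is easy to inspect


def vbucket_for(key: str) -> int:
    # Hash the key to a vBucket; this mapping never changes as servers come and go.
    return zlib.crc32(key.encode("utf-8")) % NUM_VBUCKETS


def build_cluster_map(servers: List[str], replicas: int = 1) -> Dict[int, List[str]]:
    """vBucket -> [active server, replica server(s)], laid out round-robin.

    Assumes replicas + 1 <= len(servers) so every copy lands on a different server.
    """
    return {
        vb: [servers[(vb + i) % len(servers)] for i in range(replicas + 1)]
        for vb in range(NUM_VBUCKETS)
    }


def server_for(key: str, cluster_map: Dict[int, List[str]]) -> str:
    # The client library does this lookup; the application just asks for the key.
    return cluster_map[vbucket_for(key)][0]


cmap = build_cluster_map(["server1", "server2", "server3"], replicas=1)
for doc in ["doc1", "doc2", "doc3"]:
    vb = vbucket_for(doc)
    print(doc, "-> vBucket", vb, "active on", cmap[vb][0], "replica on", cmap[vb][1])
```

Because keys map to vBuckets rather than directly to servers, adding or removing servers only changes the table, never the key hashing.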


Add Nodes

[Figure: servers 4 and 5 join servers 1–3 in the Couchbase Server cluster; some active and replica docs move onto the new servers. User-configured replica count = 1.]

• Two servers added to cluster
– One-click operation
• Docs automatically rebalanced across cluster
– Even distribution of docs
– Minimum doc movement
• Cluster map updated
• App database calls now distributed over larger # of servers
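“Even distribution of docs” with “minimum doc movement” follows from moving whole vBuckets instead of rehashing keys. This Python sketch is a simple greedy rebalance over active vBuckets only (replicas are ignored for brevity). It shows one way to meet both goals under those assumptions; it is not Couchbase's rebalance algorithm, and the 16-vBucket layout and server names are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rebalance(active: Dict[int, str], servers: List[str]) -> Dict[int, str]:
    """Return a new vBucket -> server map with even counts and few moves (greedy sketch).

    Keys are never rehashed: only whole vBuckets change owner, and only servers
    holding more than their target give any away. Servers missing from `servers`
    get a target of 0, so the same function also drains a departing server.
    """
    new_map = dict(active)
    load = Counter(new_map.values())
    base, extra = divmod(len(new_map), len(servers))
    # Give the "+1" slots to the currently busiest servers so they shed less.
    by_load = sorted(servers, key=lambda s: -load[s])
    target = {s: base + (1 if i < extra else 0) for i, s in enumerate(by_load)}
    for vb in sorted(new_map):
        owner = new_map[vb]
        if load[owner] > target.get(owner, 0):
            dest = min(servers, key=lambda s: load[s] - target[s])  # most underloaded server
            new_map[vb] = dest
            load[owner] -= 1
            load[dest] += 1
    return new_map


old_servers = ["server1", "server2", "server3"]
new_servers = old_servers + ["server4", "server5"]
before = {vb: old_servers[vb % 3] for vb in range(16)}  # 6/5/5 vBuckets per server
after = rebalance(before, new_servers)
moved = [vb for vb in before if before[vb] != after[vb]]
print(Counter(after.values()), "moved:", len(moved))
```

Here 6 of the 16 vBuckets move, exactly the share the two new servers must end up holding, and no key is ever rehashed.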


Fail Over Node

[Figure: servers 1–5 of the Couchbase Server cluster with active and replica docs; server 3 fails and its docs are served from replicas promoted on the remaining servers. User-configured replica count = 1.]

• App servers happily accessing docs on Server 3
• Server fails
• App server requests to server 3 fail
• Cluster detects server has failed
– Promotes replicas of docs to active
– Updates cluster map
• App server requests for docs now go to appropriate server
• Typically rebalance would follow
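Failover can be expressed as an update to a cluster map of the form vBucket → [active, replica]. Every vBucket whose active copy lived on the failed server gets its replica promoted. This is a minimal Python sketch with one replica per vBucket, matching “replica count = 1”. The map layout, the ten vBuckets and the server names are assumptions for illustration.

```python
from typing import Dict, List


def fail_over(cluster_map: Dict[int, List[str]], failed: str) -> Dict[int, List[str]]:
    """Promote replicas for every vBucket whose active copy was on `failed`.

    cluster_map: vBucket -> [active, replica, ...]. Returns an updated copy in which the
    failed server no longer appears; affected vBuckets run with fewer replicas until a
    rebalance re-creates them.
    """
    updated = {}
    for vb, owners in cluster_map.items():
        survivors = [s for s in owners if s != failed]
        if not survivors:
            raise RuntimeError(f"vBucket {vb} has no surviving copy")
        updated[vb] = survivors  # the first survivor is now the active copy
    return updated


servers = ["server1", "server2", "server3", "server4", "server5"]
# vBucket -> [active, replica], laid out round-robin (assumption for illustration).
cmap = {vb: [servers[vb % 5], servers[(vb + 1) % 5]] for vb in range(10)}
after = fail_over(cmap, "server3")
print(cmap[2], "->", after[2])  # ['server3', 'server4'] -> ['server4']
```

The promoted vBuckets are left without a replica, which is why a rebalance typically follows.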


COUCHBASE SOLUTION
OPERATING A CLUSTER

Reading and Writing

Reading data: “Give me document A.” → Server: “Here is document A.”
Writing data: “Please store document A.” → Server: “OK, I stored document A.”

(We’ll save the arithmetic for the sizing section :) )

Reading data

Give me document A → Here is document A

If document A is in memory
    return document A to the application
Else
    add document to read queue
    reader eventually loads document from disk into memory
    return document A to the application

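The read path above can be sketched as RAM in front of slower storage, with misses served through a read queue by a background reader. This Python sketch uses a thread and `queue.Queue`; the dictionary standing in for disk and the class name `ToyReadPath` are assumptions, not Couchbase internals.

```python
import queue
import threading
from typing import Dict, Optional


class ToyReadPath:
    def __init__(self, disk: Dict[str, bytes]) -> None:
        self.ram: Dict[str, bytes] = {}
        self.disk = disk  # stand-in for on-disk storage
        self.read_queue: "queue.Queue[tuple]" = queue.Queue()
        threading.Thread(target=self._reader, daemon=True).start()

    def _reader(self) -> None:
        # Background reader: load requested docs from "disk" into RAM, then wake the waiter.
        while True:
            key, done = self.read_queue.get()
            value = self.disk.get(key)
            if value is not None:
                self.ram[key] = value
            done.set()

    def get(self, key: str) -> Optional[bytes]:
        if key in self.ram:  # fast path: document already in memory
            return self.ram[key]
        done = threading.Event()
        self.read_queue.put((key, done))  # slow path: wait in line behind other misses
        done.wait()
        return self.ram.get(key)


reads = ToyReadPath(disk={"A": b"document A"})
print(reads.get("A"))  # b'document A' (a miss, served via the read queue)
print("A" in reads.ram)  # True: later reads take the fast path
```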
Keeping working data set in RAM is key to read performance

Your application’s working set should fit in RAM…

…or else! (because you don’t want the “else” part happening very often – it is MUCH slower than a memory read and you could have to wait in line an indeterminate amount of time for the read to happen.)

Working set ratio depends on your application

Late stage social game (working/total set = .01): many users no longer active; few logged in at any given time.
Business application (working/total set = .33): users logged in during the day; day moves around the globe.
Ad network (working/total set = 1): any cookie can show up at any time.

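The ratio translates into a first rough RAM estimate: the working set is about ratio × total data size, and it has to fit in the cluster's aggregate RAM. This back-of-the-envelope Python sketch assumes a 500 GB dataset and 64 GB of usable RAM per server and ignores replicas and metadata overhead; all of these are assumptions, not sizing guidance from the deck.

```python
import math


def servers_for_working_set(total_gb: float, working_ratio: float, ram_per_server_gb: float) -> int:
    """Servers needed so the working set fits in aggregate RAM (ignores replicas and overhead)."""
    working_set_gb = total_gb * working_ratio
    return max(1, math.ceil(working_set_gb / ram_per_server_gb))


for name, ratio in [("late stage social game", 0.01), ("business application", 0.33), ("ad network", 1.0)]:
    print(f"{name}: {servers_for_working_set(500, ratio, 64)} server(s)")
```

Under these assumptions the same 500 GB of data needs 1, 3 or 8 servers depending only on how much of it is hot.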
Couchbase in operation: Writing data

Store document A → OK, it is stored

If there is room for the document in RAM
    Store the document in RAM
Else
    Eject other document(s) from RAM
    Store the document in RAM
Add the document to the replication queue
    Replicator eventually transmits document
Add the document to write queue
    Writer eventually writes document to disk

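The write path above can be sketched the same way. The write is acknowledged once the document is in RAM, and replication and persistence happen later from two queues. In this Python sketch the RAM capacity (counted in documents), the choice to eject the oldest document first, the `drain` step standing in for the replicator and writer, and the class name are all assumptions. An ejected document is not lost here because its queued copy is still waiting for the writer.

```python
from collections import OrderedDict, deque
from typing import Deque, Dict, Tuple


class ToyWritePath:
    def __init__(self, ram_capacity: int) -> None:
        self.ram_capacity = ram_capacity  # max docs held in RAM (assumption)
        self.ram: "OrderedDict[str, bytes]" = OrderedDict()
        self.replication_queue: Deque[Tuple[str, bytes]] = deque()
        self.write_queue: Deque[Tuple[str, bytes]] = deque()
        self.replica: Dict[str, bytes] = {}  # stand-in for the replica server
        self.disk: Dict[str, bytes] = {}  # stand-in for on-disk storage

    def store(self, key: str, value: bytes) -> str:
        if key not in self.ram and len(self.ram) >= self.ram_capacity:
            self.ram.popitem(last=False)  # no room: eject the oldest document from RAM
        self.ram[key] = value
        self.ram.move_to_end(key)
        self.replication_queue.append((key, value))  # replicator transmits it later
        self.write_queue.append((key, value))  # writer persists it later
        return "OK, it is stored"  # acknowledged before replica or disk write

    def drain(self, max_items: int) -> None:
        # Stand-in for the replicator and writer each draining up to max_items entries.
        for _ in range(min(max_items, len(self.replication_queue))):
            k, v = self.replication_queue.popleft()
            self.replica[k] = v
        for _ in range(min(max_items, len(self.write_queue))):
            k, v = self.write_queue.popleft()
            self.disk[k] = v


node = ToyWritePath(ram_capacity=2)
for key in ("A", "B", "C"):
    node.store(key, key.encode())
print(list(node.ram), len(node.write_queue))  # ['B', 'C'] 3
node.drain(max_items=10)
print(sorted(node.disk))  # ['A', 'B', 'C']
```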


Flow of data when writing

[Figure: applications writing to Couchbase Server; the server transmits replicas over the network and writes to disk]

Queues build if aggregate arrival rate exceeds drain rates

[Figure: a server with a replication queue draining to the network and a disk write queue draining to disk]

Scaling out permits matching of aggregate flow rates so queues do not grow

[Figure: three servers, each with its own network link, sharing the load]

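The last two slides make a rate argument. Whenever arrivals exceed drain capacity, a queue grows by (arrival rate minus drain rate) every second. Splitting the same aggregate load across more servers lowers each server's arrival rate until it drops below the drain rate. This discrete-time Python sketch uses invented rates (12,000 writes/s offered in total, 5,000 writes/s disk drain per server) and assumes writes spread evenly across servers.

```python
def queue_length_after(seconds: int, offered_per_s: float, drain_per_s: float, servers: int) -> float:
    """Disk write queue length per server after `seconds`, assuming load splits evenly."""
    arrival_per_server = offered_per_s / servers
    backlog = 0.0
    for _ in range(seconds):
        # Each second the queue gains the arrivals and loses what the disk can drain.
        backlog = max(0.0, backlog + arrival_per_server - drain_per_s)
    return backlog


for n in (1, 2, 3):
    print(n, "server(s):", queue_length_after(60, 12_000, 5_000, n), "writes queued per server")
```

Under these assumptions one server falls 420,000 writes behind in a minute and two servers fall 60,000 behind each, while three servers keep their queues empty.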
QUESTIONS?  

[email protected]  
