2.1.SummerSOC2015 Tutorial NoSQL
Holger Schwarz
Matthias Wieland
IPVS, Universität Stuttgart, Germany
SummerSOC 2015
NoSQL Data Management
Overview
• Introduction to NoSQL
History of NoSQL
• Relational Databases
Standard SQL databases are available for cloud environments as a virtual machine image or as a service, depending on the vendor
Not cloud-ready: difficult to scale
• NoSQL databases
Databases designed for the cloud
Built to serve heavy read/write loads
Good ability to scale up and down
Applications built on the SQL data model require a complete rewrite
E.g., Apache Cassandra, CouchDB, and MongoDB
• Typical characteristics of NoSQL databases are:
Non-relational
Schema-free
Open Source
Simple API
Distributed
Eventual consistency
Source: https://clt.vtc.edu.hk/what-happens-online-in-60-seconds/
Non-relational
Schema-free
• Most NoSQL databases are schema-free or have relaxed schemas
• No need to define any schema for the data
• Allows heterogeneous data structures within the same domain
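The following minimal Python sketch illustrates the schema-free idea. Plain dictionaries stand in for the documents of a single collection. The customer records and the helpers `field_names` and `cities` are hypothetical, and no particular NoSQL product or API is assumed.

```python
from __future__ import annotations

from typing import Any

# Hypothetical sample data: two records in the same collection with different
# structures. No schema was declared, and each record carries only its own fields.
customers: list[dict[str, Any]] = [
    {"id": 1, "name": "Alice", "email": "alice@example.org"},
    {"id": 2, "name": "Bob", "phones": ["+49 711 0000"],
     "address": {"city": "Stuttgart"}},
]


def field_names(docs: list[dict[str, Any]]) -> set[str]:
    """Collect the top-level field names actually used across all documents."""
    return {key for doc in docs for key in doc}


def cities(docs: list[dict[str, Any]]) -> list[str]:
    """The application, not the database, has to cope with missing fields."""
    return [doc["address"]["city"] for doc in docs if "address" in doc]


print(sorted(field_names(customers)))  # ['address', 'email', 'id', 'name', 'phones']
print(cities(customers))               # ['Stuttgart']
```

Because no schema exists, the second record can add `phones` and a nested `address` without any migration. The trade-off is that any code reading the data must handle absent fields itself.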
Simple API
Distributed
• Several NoSQL databases can run in a distributed fashion
• They provide auto-scaling and fail-over capabilities
• ACID is often sacrificed for scalability and throughput
• Often no synchronous replication between distributed nodes is possible, e.g., asynchronous multi-master replication, peer-to-peer replication, HDFS replication
• Only eventual consistency is provided
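The sketch below shows how asynchronous replication leads to eventual consistency. A write is acknowledged by one replica and copied to the others later, so a read from another replica can return stale data until propagation has happened. The classes `Replica` and `AsyncCluster` are hypothetical and do not correspond to any product's API. Conflicting concurrent writes, which real systems must resolve, are ignored here.

```python
from __future__ import annotations

from collections import deque


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.data: dict[str, str] = {}


class AsyncCluster:
    """Hypothetical cluster: a write is acknowledged by one replica and copied later."""

    def __init__(self, names: list[str]) -> None:
        self.replicas = [Replica(n) for n in names]
        self.pending: deque[tuple[Replica, str, str]] = deque()  # queued updates

    def write(self, coordinator: int, key: str, value: str) -> None:
        origin = self.replicas[coordinator]
        origin.data[key] = value  # acknowledged at once, no synchronous replication
        for replica in self.replicas:
            if replica is not origin:
                self.pending.append((replica, key, value))

    def read(self, replica: int, key: str) -> str | None:
        return self.replicas[replica].data.get(key)

    def propagate(self) -> None:
        """Deliver all queued updates; a real system does this in the background."""
        while self.pending:
            replica, key, value = self.pending.popleft()
            replica.data[key] = value

    def converged(self) -> bool:
        return all(r.data == self.replicas[0].data for r in self.replicas)


cluster = AsyncCluster(["A", "B", "C"])
cluster.write(0, "cart:42", "book")
print(cluster.read(2, "cart:42"))  # None: replica C has not received the update yet
cluster.propagate()
print(cluster.read(2, "cart:42"), cluster.converged())  # book True
```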
• Facebook
Social network
• Amazon
DynamoDB and SimpleDB
• CERN
• GitHub
CAP theorem (Consistency, Availability, Partition tolerance): choose two!
Consistent Hashing
• Technique for efficiently distributing replicas to nodes
• Consistent hashing is a special kind of hashing
When the hash table is resized, only K/n keys need to be remapped on average
K is the number of keys, and n is the number of slots
In traditional hash tables, nearly all keys have to be remapped
• Insert servers on the ring
Hash based, e.g., on IP address, name, …
Each server takes over the objects between its predecessor's hash and its own hash
• Insert objects on the ring
Hash based on the key
Each object walks around the circle until it falls into the first bucket (server)
• Delete servers
Copy their objects to the next server
• Virtual servers
More than one hash per server
• Replication
Place objects multiple times
Improves reliability
[Figure: hash ring with Server 1, Server 2 (virtual servers Server 2/1 and Server 2/2), and Server 3]
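The ring described above can be sketched in Python. This is a minimal sketch under several assumptions: MD5 (truncated to 32 bits) is the hash function, each server gets two virtual servers named like "Server 2/1", and the names `ConsistentHashRing`, `ring_hash`, and `servers_for` are hypothetical. Production systems differ in hash function, ring size, and replica placement rules.

```python
from __future__ import annotations

import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Position on the ring (0 .. 2**32 - 1), from the first 4 bytes of MD5."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:4], "big")


class ConsistentHashRing:
    def __init__(self, virtual_servers: int = 2) -> None:
        self.virtual_servers = virtual_servers
        self.positions: list[int] = []   # sorted ring positions
        self.owner: dict[int, str] = {}  # ring position -> server name

    def _virtual_names(self, server: str) -> list[str]:
        # More than one hash per server, e.g. "Server 2/1" and "Server 2/2".
        return [f"{server}/{i}" for i in range(1, self.virtual_servers + 1)]

    def add_server(self, server: str) -> None:
        for name in self._virtual_names(server):
            pos = ring_hash(name)
            bisect.insort(self.positions, pos)
            self.owner[pos] = server

    def remove_server(self, server: str) -> None:
        # The removed server's objects now fall to the next server on the ring.
        for name in self._virtual_names(server):
            pos = ring_hash(name)
            self.positions.remove(pos)
            del self.owner[pos]

    def servers_for(self, key: str, replicas: int = 1) -> list[str]:
        """Walk around the ring from the key's hash, collecting distinct servers."""
        if replicas <= 0 or not self.positions:
            return []
        wanted = min(replicas, len(set(self.owner.values())))
        start = bisect.bisect(self.positions, ring_hash(key))
        result: list[str] = []
        for step in range(len(self.positions)):
            server = self.owner[self.positions[(start + step) % len(self.positions)]]
            if server not in result:
                result.append(server)
                if len(result) == wanted:
                    break
        return result


ring = ConsistentHashRing()
for name in ["Server 1", "Server 2", "Server 3"]:
    ring.add_server(name)
keys = [f"object{i}" for i in range(1000)]
before = {k: ring.servers_for(k)[0] for k in keys}
ring.add_server("Server 4")
moved = [k for k in keys if ring.servers_for(k)[0] != before[k]]
# Only objects that now belong to Server 4 move; all others keep their server.
print(len(moved), "of", len(keys), "objects moved")
```

Passing `replicas=3` to `servers_for` yields three distinct servers, which corresponds to the "place objects multiple times" bullet.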
• MapReduce
Available in many NoSQL databases
Can run fully distributed
It is functional programming, not query writing!
Map phase: performs filtering and sorting
Reduce phase: performs a summary operation
Roughly corresponds to SELECT and GROUP BY in a relational database
More details later!
Source: @tgrall
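The two phases can be illustrated with a single-process Python sketch. A real framework runs the map and reduce functions in parallel on many nodes. The sales records and function names are hypothetical. The sort/group step between the phases is shown as a separate `shuffle` function for clarity.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Iterator

# Hypothetical input records: (product, amount) per order.
orders = [("book", 12), ("pen", 2), ("book", 8), ("lamp", 30), ("pen", 3)]


def map_phase(records: Iterable[tuple[str, int]]) -> Iterator[tuple[str, int]]:
    """Filter records (keep amount >= 3) and emit (key, value) pairs."""
    for product, amount in records:
        if amount >= 3:
            yield product, amount


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Sort and group the emitted pairs by key (done by the framework)."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in sorted(pairs):
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Summary operation per key: here, the sum of the amounts."""
    return {key: sum(values) for key, values in groups.items()}


totals = reduce_phase(shuffle(map_phase(orders)))
print(totals)  # {'book': 20, 'lamp': 30, 'pen': 3}
# Roughly: SELECT product, SUM(amount) FROM orders WHERE amount >= 3 GROUP BY product
```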
• Session data
• User profiles
[Figure: key/value examples: key sessionid=A08154711 with value userlogin="xyz", date_of_expiry=2015/12/31; key timestamp with value x, y, z, temperature]
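A key-value store exposes little more than put, get, and delete by key, and treats the value as opaque. The minimal sketch below stores the session example from the figure. `KeyValueStore` and its `scan` helper are hypothetical. In this sketch there is no index on values, so finding entries by value content requires a full scan.

```python
from __future__ import annotations

from collections.abc import Callable


class KeyValueStore:
    """Hypothetical in-memory key-value store; values are opaque to the store."""

    def __init__(self) -> None:
        self._data: dict[str, object] = {}

    def put(self, key: str, value: object) -> None:
        self._data[key] = value

    def get(self, key: str) -> object | None:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)

    def scan(self, predicate: Callable[[object], bool]) -> list[str]:
        """Lookup by value content: every entry has to be inspected."""
        return [k for k, v in self._data.items() if predicate(v)]


store = KeyValueStore()
store.put("sessionid=A08154711", {"userlogin": "xyz", "date_of_expiry": "2015/12/31"})
print(store.get("sessionid=A08154711"))
```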
Key Functionality
bucket types
docs.basho.com
Configure Replication
docs.basho.com
Riak Search
• For Riak KV, a value is just a value, possibly associated with a type
• Riak Search 2.0
Based on Solr, the search platform built on Apache Lucene
Define extractors, i.e., modules responsible for pulling out a list of fields and values from a Riak object
Define a Solr schema to instruct Riak/Solr how to index a value
Queries:
exact match, globs, inclusive/exclusive range queries, AND/OR/NOT, prefix matching, proximity searches, term boosting, sorting, …
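The extractor-plus-index idea can be illustrated with a small Python sketch. It is a simplification built on assumptions and does not use Riak's or Solr's API. `extract_json` plays the role of an extractor that pulls (field, value) pairs out of a JSON object. `FieldIndex` is a toy inverted index that supports only exact match, prefix matching, and AND via set intersection. All names and data are hypothetical.

```python
from __future__ import annotations

import json
from collections import defaultdict


def extract_json(value: str) -> list[tuple[str, str]]:
    """Hypothetical extractor: turn a JSON object into (field, value) pairs."""
    pairs: list[tuple[str, str]] = []

    def walk(path: str, node: object) -> None:
        if isinstance(node, dict):
            for name, child in node.items():
                walk(f"{path}.{name}" if path else name, child)
        elif isinstance(node, list):
            for child in node:
                walk(path, child)  # multi-valued field
        else:
            pairs.append((path, str(node)))

    walk("", json.loads(value))
    return pairs


class FieldIndex:
    """Inverted index: (field, lower-cased term) -> set of object keys."""

    def __init__(self) -> None:
        self.postings: dict[tuple[str, str], set[str]] = defaultdict(set)

    def add(self, key: str, value: str) -> None:
        for field, term in extract_json(value):
            self.postings[(field, term.lower())].add(key)

    def exact(self, field: str, term: str) -> set[str]:
        return set(self.postings.get((field, term.lower()), set()))

    def prefix(self, field: str, start: str) -> set[str]:
        start = start.lower()
        return {key for (f, term), keys in self.postings.items()
                if f == field and term.startswith(start) for key in keys}


index = FieldIndex()
index.add("user1", '{"name": "Anna", "address": {"city": "Stuttgart"}}')
index.add("user2", '{"name": "Ben", "address": {"city": "Stuttgart"}, "tags": ["nosql", "riak"]}')
# AND query = intersection of the matching key sets
print(index.exact("address.city", "stuttgart") & index.exact("tags", "nosql"))  # {'user2'}
```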
Document Stores
Data Organization
Key Functionality
insert document
update document(s)
create index
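The three key operations can be sketched on an in-memory collection of dictionaries. This is a hypothetical illustration, not the MongoDB driver API. `update` sets fields on every document that matches a simple equality filter. `create_index` builds a value-to-ids map, which `find` uses instead of a full scan. Indexed values are assumed to be hashable.

```python
from __future__ import annotations

import itertools
from collections import defaultdict
from typing import Any


class Collection:
    """Hypothetical in-memory document collection (not the MongoDB driver API)."""

    def __init__(self) -> None:
        self.docs: dict[int, dict[str, Any]] = {}
        self.indexes: dict[str, defaultdict[Any, set[int]]] = {}
        self._next_id = itertools.count(1)

    def insert(self, doc: dict[str, Any]) -> int:
        doc_id = next(self._next_id)
        self.docs[doc_id] = {"_id": doc_id, **doc}
        self._index(doc_id)
        return doc_id

    def update(self, match: dict[str, Any], changes: dict[str, Any]) -> int:
        """Set `changes` on all documents whose fields equal `match`; return the count."""
        hits = [i for i, d in self.docs.items()
                if all(d.get(f) == v for f, v in match.items())]
        for doc_id in hits:
            self._unindex(doc_id)
            self.docs[doc_id].update(changes)
            self._index(doc_id)
        return len(hits)

    def create_index(self, field: str) -> None:
        self.indexes[field] = defaultdict(set)
        for doc_id in self.docs:
            self._index(doc_id)

    def find(self, field: str, value: Any) -> list[dict[str, Any]]:
        if field in self.indexes:  # index lookup instead of a full scan
            ids = sorted(self.indexes[field].get(value, set()))
        else:
            ids = [i for i, d in self.docs.items() if d.get(field) == value]
        return [self.docs[i] for i in ids]

    def _index(self, doc_id: int) -> None:
        doc = self.docs[doc_id]
        for field, entries in self.indexes.items():
            if field in doc:
                entries[doc[field]].add(doc_id)

    def _unindex(self, doc_id: int) -> None:
        doc = self.docs[doc_id]
        for field, entries in self.indexes.items():
            if field in doc:
                entries[doc[field]].discard(doc_id)


people = Collection()
people.insert({"name": "Anna", "city": "Stuttgart"})
people.insert({"name": "Ben", "city": "Berlin", "tags": ["nosql"]})
people.create_index("city")
people.update({"name": "Ben"}, {"city": "Stuttgart"})
print([d["name"] for d in people.find("city", "Stuttgart")])  # ['Anna', 'Ben']
```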
Querying Documents
www.mongodb.com
Availability
• Specific secondaries: priority 0 member, delayed member, hidden member, arbiter
Scalability
Sparse table
• Metadata
There is no catalog that provides the set of all columns of a given table
This is left to the user/application
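A sparse wide-column table and the missing column catalog can be pictured with nested dictionaries. In this minimal sketch, every row stores only the columns it has. Because no catalog exists, `all_columns` has to scan all rows to find out which columns are in use. The row keys, columns, and values are hypothetical sample data.

```python
from __future__ import annotations

# Hypothetical wide-column rows: row key -> {"family:qualifier": value}.
# Each row stores only the columns it actually has, so the table is sparse.
webtable: dict[str, dict[str, str]] = {
    "com.cnn.www": {"contents:html": "<html>...</html>", "anchor:cnnsi.com": "CNN"},
    "org.example.www": {"contents:html": "<html>...</html>", "anchor:example.net": "Example"},
    "org.example.blog": {"contents:html": "<html>...</html>"},
}


def all_columns(table: dict[str, dict[str, str]]) -> set[str]:
    """No catalog lists the columns, so the application has to scan every row."""
    return {column for row in table.values() for column in row}


def cell(table: dict[str, dict[str, str]], row: str, column: str) -> str | None:
    """A missing column is simply absent; nothing is stored for it."""
    return table.get(row, {}).get(column)


print(sorted(all_columns(webtable)))
# ['anchor:cnnsi.com', 'anchor:example.net', 'contents:html']
```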
• Key/Value class
Layout: keylength, valuelength, key, value
E.g., row com.cnn.www, column anchor:cnnsi.com, timestamp t9, type put
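Python's `struct` module can make the keylength/valuelength/key/value layout concrete. This sketch assumes the byte layout of HBase's `KeyValue` as we understand it: a 2-byte row length, a 1-byte family length, a qualifier without its own length field, an 8-byte timestamp, and a 1-byte type. The type code 4 for a put is also an assumption. The value `CNN` and the numeric timestamp 9 (standing in for t9) are hypothetical.

```python
from __future__ import annotations

import struct

PUT = 4  # assumed type code for a put (as in HBase's KeyValue.Type)


def encode_key_value(row: bytes, family: bytes, qualifier: bytes,
                     timestamp: int, value: bytes, key_type: int = PUT) -> bytes:
    """Serialize one cell as <keylength><valuelength><key><value>, big-endian."""
    key = (struct.pack(">H", len(row)) + row
           + struct.pack(">B", len(family)) + family
           + qualifier
           + struct.pack(">qB", timestamp, key_type))
    return struct.pack(">II", len(key), len(value)) + key + value


def decode_key_value(buf: bytes) -> tuple[bytes, bytes, bytes, int, int, bytes]:
    key_len, value_len = struct.unpack_from(">II", buf, 0)
    key = buf[8:8 + key_len]
    value = buf[8 + key_len:8 + key_len + value_len]
    row_len = struct.unpack_from(">H", key, 0)[0]
    row = key[2:2 + row_len]
    fam_len = key[2 + row_len]
    family = key[3 + row_len:3 + row_len + fam_len]
    # The qualifier fills the gap before the 9-byte timestamp + type trailer.
    qualifier = key[3 + row_len + fam_len:key_len - 9]
    timestamp, key_type = struct.unpack_from(">qB", key, key_len - 9)
    return row, family, qualifier, timestamp, key_type, value


cell = encode_key_value(b"com.cnn.www", b"anchor", b"cnnsi.com", 9, b"CNN")
print(len(cell), decode_key_value(cell))
```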
Key Functionality
namespaces
Backup HMaster
…
• Failover of the MasterServer
HBase clients talk directly to the RegionServers, hence they may continue without the MasterServer (at least for some time)
The catalog table META exists as an HBase table, i.e., it is not resident in the MasterServer
• Failover of a RegionServer
A region immediately becomes unavailable when its RegionServer is down
The Master detects that the RegionServer has failed
The region assignments are considered invalid
The regions are assigned to a new RegionServer
Storage Structure
[Figure: two HRegionServers, each with main memory, StoreFiles, a log, and blocks]
Write Data
[Figure: writing table_T.family_a.field_f to an HRegionServer with main memory, StoreFiles, and a log]
Read Data
[Figure: reading table_T.family_a.field_f from an HRegionServer with main memory, StoreFiles, and a log]
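The write and read paths can be approximated in a few lines of Python. This is a hypothetical simplification of one HRegionServer store for a single column family. A write is appended to the log first and then applied in main memory. A full memory buffer is flushed into a new immutable StoreFile. A read checks main memory first and then the StoreFiles from newest to oldest. Log truncation, block indexes, and many other details of real HBase are omitted. The class name and flush threshold are assumptions.

```python
from __future__ import annotations


class RegionStore:
    """Hypothetical, simplified store of one HRegionServer for one column family."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.log: list[tuple[str, str]] = []         # log, kept for crash recovery
        self.memstore: dict[str, str] = {}           # buffer in main memory
        self.store_files: list[dict[str, str]] = []  # immutable files, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.log.append((key, value))  # 1. append to the log first
        self.memstore[key] = value     # 2. then update main memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        """Persist main memory as a new StoreFile, sorted by key."""
        if self.memstore:
            self.store_files.append(dict(sorted(self.memstore.items())))
            self.memstore = {}

    def read(self, key: str) -> str | None:
        if key in self.memstore:  # most recent writes live in main memory
            return self.memstore[key]
        for store_file in reversed(self.store_files):  # then newest StoreFile first
            if key in store_file:
                return store_file[key]
        return None


region = RegionStore(flush_threshold=2)
region.write("table_T.family_a.field_f", "v1")
region.write("table_T.family_a.field_g", "v1")  # second write triggers a flush
region.write("table_T.family_a.field_f", "v2")
print(region.read("table_T.family_a.field_f"), len(region.store_files))  # v2 1
```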
StoreFile Reorganisation
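Reorganising StoreFiles means merging several sorted files into one, which HBase calls compaction. This minimal sketch assumes each StoreFile is a list of (key, timestamp, value) cells sorted by key, with at most one version per key per file. It also assumes that only the newest version of each key is retained. HBase can be configured to keep several versions and also removes deleted cells during compaction; both behaviours are omitted here.

```python
from __future__ import annotations

import heapq

Cell = tuple[str, int, str]  # (key, timestamp, value)


def compact(store_files: list[list[Cell]]) -> list[Cell]:
    """Merge sorted StoreFiles into one, keeping only the newest version per key."""
    merged: list[Cell] = []
    # heapq.merge streams the sorted inputs; for equal keys, newer timestamps come first.
    for key, ts, value in heapq.merge(*store_files, key=lambda c: (c[0], -c[1])):
        if merged and merged[-1][0] == key:
            continue  # an older version of a key that is already kept
        merged.append((key, ts, value))
    return merged


older = [("row1", 1, "a"), ("row3", 1, "c")]
newer = [("row1", 5, "a2"), ("row2", 5, "b")]
print(compact([older, newer]))  # [('row1', 5, 'a2'), ('row2', 5, 'b'), ('row3', 1, 'c')]
```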
CQL in Cassandra
CREATE KEYSPACE demodb WITH REPLICATION =
{'class' : 'SimpleStrategy', 'replication_factor': 3};
cassandra.apache.org
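With `'replication_factor': 3`, every partition of the keyspace is stored on three nodes. Under SimpleStrategy, the first replica goes to the node that owns the partition's token on the ring, and the remaining replicas go to the next nodes clockwise, without regard to racks or data centers. The sketch below shows this placement rule. It assumes one token per node and a toy token range of 0..99; real partitioners use a much larger range. The node names are hypothetical.

```python
from __future__ import annotations

import bisect


def simple_strategy_replicas(tokens: dict[int, str], partition_token: int,
                             replication_factor: int) -> list[str]:
    """Replica nodes for one partition under a SimpleStrategy-like rule.

    The first replica is the first node whose token is >= the partition token
    (wrapping around the ring); further replicas are the next nodes clockwise.
    """
    ring = sorted(tokens)
    start = bisect.bisect_left(ring, partition_token) % len(ring)
    count = min(replication_factor, len(ring))
    return [tokens[ring[(start + i) % len(ring)]] for i in range(count)]


# Hypothetical 4-node cluster with one token per node on a toy 0..99 range.
cluster = {0: "node1", 25: "node2", 50: "node3", 75: "node4"}
print(simple_strategy_replicas(cluster, partition_token=30, replication_factor=3))
# ['node3', 'node4', 'node1']
```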
Graph Databases
smartblogs.com
• Location-based services
[Figure: graph with distances]
• Recommendation systems, e.g., bought products, often-visited attractions
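A typical location-based query on a graph with distances is the shortest route between two places. This minimal sketch uses Dijkstra's algorithm and assumes the graph is held as a Python adjacency dictionary; a graph database would evaluate such a query natively. The road network is hypothetical and is not the graph shown in the figure.

```python
from __future__ import annotations

import heapq


def shortest_distance(graph: dict[str, dict[str, int]], source: str,
                      target: str) -> int | None:
    """Dijkstra's algorithm: length of the shortest path, or None if unreachable."""
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if node == target:
            return d
        if d > dist[node]:
            continue  # outdated queue entry
        for neighbour, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return None


# Hypothetical road network; edge weights are distances.
roads = {
    "A": {"B": 4, "C": 10},
    "B": {"A": 4, "C": 3, "D": 12},
    "C": {"A": 10, "B": 3, "D": 6},
    "D": {"B": 12, "C": 6},
    "E": {},
}
print(shortest_distance(roads, "A", "D"))  # 13 (A -> B -> C -> D)
```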
[Figure: property graph with nodes, relationships, and properties]
neo4j.com
Basic Functionality
graph
remove property
query index
create index
Cypher Example
MATCH (n {name:"Holger"})-[:KNOWS]->(m)
RETURN m
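The Cypher pattern finds every node m reached from a node whose name property is "Holger" via an outgoing KNOWS relationship. The sketch below evaluates the same pattern over a tiny property graph held in Python data classes. Nodes and relationships both carry properties. The sample graph and the function `match_knows` are hypothetical, and this is not how Neo4j executes Cypher internally.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set[str]
    properties: dict[str, object] = field(default_factory=dict)


@dataclass
class Relationship:
    start: int
    rel_type: str
    end: int
    properties: dict[str, object] = field(default_factory=dict)


# Hypothetical sample graph: node id -> Node, plus directed relationships.
nodes = {
    1: Node({"Person"}, {"name": "Holger"}),
    2: Node({"Person"}, {"name": "Anna"}),
    3: Node({"Person"}, {"name": "Ben"}),
}
relationships = [
    Relationship(1, "KNOWS", 2, {"since": 2010}),
    Relationship(3, "KNOWS", 1),
]


def match_knows(name: str) -> list[Node]:
    """Nodes m such that (n {name: name})-[:KNOWS]->(m)."""
    start_ids = {i for i, n in nodes.items() if n.properties.get("name") == name}
    return [nodes[r.end] for r in relationships
            if r.rel_type == "KNOWS" and r.start in start_ids]


print([m.properties["name"] for m in match_knows("Holger")])  # ['Anna']
```

Direction matters: Ben knows Holger, but because the pattern follows outgoing KNOWS edges from Holger, Ben is not returned.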
Scalability
node memory
c) application-level sharding
High Availability
[Figure: write on master: the write is committed on master M and propagated to slaves S1, S2, S3]
High Availability
[Figure: write on slave: the write is propagated to master M and committed; slaves S1, S2, S3 pull the update asynchronously]
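The two write paths can be simulated with a master M and slaves S1 to S3 that each keep an ordered log of committed transactions. This is a hypothetical sketch under simplifying assumptions. A write on the master commits there, and the slaves catch up later. A write on a slave first brings that slave up to date, is then committed on the master, and finally on the slave. The other slaves only see the write after pulling, which is simulated here as one explicit `pull_all` round. Failure handling and master election are omitted.

```python
from __future__ import annotations


class Instance:
    def __init__(self, name: str) -> None:
        self.name = name
        self.log: list[tuple[str, str]] = []  # committed transactions, in order

    def pull_from(self, master: Instance) -> None:
        """Fetch the transactions this instance has not seen yet."""
        self.log.extend(master.log[len(self.log):])


class MasterSlaveCluster:
    """Hypothetical master M with slaves S1..Sn."""

    def __init__(self, slave_count: int = 3) -> None:
        self.master = Instance("M")
        self.slaves = [Instance(f"S{i}") for i in range(1, slave_count + 1)]

    def write_on_master(self, key: str, value: str) -> None:
        self.master.log.append((key, value))  # commit on M; slaves catch up later

    def write_on_slave(self, index: int, key: str, value: str) -> None:
        slave = self.slaves[index]
        slave.pull_from(self.master)          # bring the slave up to date with M
        self.master.log.append((key, value))  # propagate to M, which commits
        slave.log.append((key, value))        # then the slave commits locally

    def pull_all(self) -> None:
        """Asynchronous propagation, simulated as one explicit round."""
        for slave in self.slaves:
            slave.pull_from(self.master)


ha = MasterSlaveCluster()
ha.write_on_master("a", "1")
ha.write_on_slave(0, "b", "2")
print([len(s.log) for s in ha.slaves])  # [2, 0, 0]
ha.pull_all()
print([len(s.log) for s in ha.slaves])  # [2, 2, 2]
```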
Hadoop Ecosystem
• Hive, Pig, …
• MapReduce Framework
• HBase
• Zookeeper (Coordination)
• YARN (Cluster Resource Management)
• HDFS (Hadoop Distributed File System)
[Figure: MAP and RED phases]
Fault Tolerance
[Figure: task tracker m3, file, k/v 1, k/v 2]
HDFS Architecture
Is that all?
Conclusion
• Relational databases provide
Data spread over many tables
A schema that needs to be defined
Structured query language (SQL)
Transactions
Strong consistency
General-purpose applicability
• NoSQL
Aggregated data in one object (identified by a key)
No predefined schema
No declarative query language
Limited transactional capability
Eventual consistency rather than ACID properties
Focus on scalability and availability
Often selected and customized for a concrete application scenario
To make a proper decision, carefully examine your application:
• the data model that is most appropriate
• the query complexity
• the consistency needs
• the transactional requirements