The document provides an overview of Couchbase, a distributed document database. It describes Couchbase as a leading NoSQL database company that provides a more flexible, higher-performance, and more scalable alternative to relational databases. Couchbase uses a document-oriented data model and scales out easily by adding more commodity servers. It has over 5,000 paid production deployments worldwide, with customers spanning internet companies and enterprises.
Couchbase 101 provides an overview of Couchbase including:
- Key concepts of Couchbase such as its use as a key-value store and document store using JSON documents.
- Single node and cluster-wide operations for reading, writing and updating documents.
- Cross data center replication (XDCR) to replicate data between geographically distributed clusters.
- Indexing and querying features including secondary indexes, views, and the new N1QL query language.
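As a purely illustrative sketch of the key-value and JSON-document model listed above, the following toy bucket stores JSON documents under string keys. It is a hypothetical stand-in for a Couchbase bucket, not the real SDK API:

```python
import json

# Toy stand-in for a Couchbase bucket: keys map to JSON documents.
# Hypothetical illustration only, not the Couchbase SDK.
class ToyBucket:
    def __init__(self):
        self._store = {}

    def upsert(self, key, doc):
        # Documents are stored as JSON text, mirroring the document model.
        self._store[key] = json.dumps(doc)

    def get(self, key):
        raw = self._store.get(key)
        return json.loads(raw) if raw is not None else None

bucket = ToyBucket()
bucket.upsert("user::1001", {"name": "Ada", "type": "user", "visits": 3})
doc = bucket.get("user::1001")
doc["visits"] += 1                  # read-modify-write update
bucket.upsert("user::1001", doc)
```

Real Couchbase SDKs expose analogous upsert/get operations against a cluster-backed bucket rather than an in-process dict.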
Hyper-Converged Infrastructure: Concepts (Nick Scuola)
This document provides an overview of various appliance and software offerings from different vendors. It lists vendors that provide niche offerings for specific uses like VDI or video surveillance. It also outlines standard appliance options from vendors like Dell, HP, and Lenovo that are optimized for Hyper-V or KVM, as well as software-defined storage solutions and reference architectures from vendors including VMware, HP, Cisco, and Lenovo. Reference architectures provide validated designs and support and some solutions include subscription models for software. The document covers a wide range of preconfigured hardware and software options from major IT vendors.
This document provides an introduction and overview of Couchbase Server, a NoSQL document database. It describes Couchbase Server as the leading open source project focused on distributed database technology. It outlines key features such as easy scalability, always-on availability, flexible data modeling using JSON documents, and core features including clustering, replication, indexing and querying. The document also provides examples of basic write, read and update operations on a single node and cluster, adding nodes, handling node failures, indexing and querying capabilities, and cross data center replication.
This talk covers Kafka cluster sizing, instance type selections, scaling operations, replication throttling and more. Don’t forget to check out the Kafka-Kit repository.
https://www.youtube.com/watch?time_continue=2613&v=7uN-Vlf7W5E
Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It scales to very large namespaces (over 100 million files) and is optimized for batch processing of huge datasets across large clusters (over 10,000 nodes). HDFS stores multiple replicas of each data block on different nodes to handle failures. It provides high aggregate bandwidth and allows computations to move to where the data resides.
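The replica-placement idea can be sketched as follows. This is a simplified illustration that just spreads replicas across distinct racks where possible; HDFS's actual rack-aware policy differs in its details (for example, it may place two replicas on one remote rack):

```python
# Simplified sketch of HDFS-style block replica placement.
def place_replicas(nodes, replication=3):
    """nodes: list of (node_name, rack) tuples. Returns chosen node names."""
    chosen, used_racks = [], set()
    # First pass: prefer nodes on racks we have not used yet,
    # so losing one rack cannot destroy every replica.
    for name, rack in nodes:
        if len(chosen) == replication:
            break
        if rack not in used_racks:
            chosen.append(name)
            used_racks.add(rack)
    # Second pass: fill any remaining slots from nodes not yet chosen.
    for name, rack in nodes:
        if len(chosen) == replication:
            break
        if name not in chosen:
            chosen.append(name)
    return chosen

cluster = [("n1", "rackA"), ("n2", "rackA"), ("n3", "rackB"), ("n4", "rackC")]
print(place_replicas(cluster))  # ['n1', 'n3', 'n4']
```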
Virtualized environments have become standard for organizations seeking benefits like reduced costs and flexibility. However, infrastructure elements often remain separated. Hyper-converged infrastructure (HCI) integrates compute, storage, and networking through software to provide these benefits. This document examines the pros and cons of HCI for small and medium-sized businesses, discussing how HCI simplifies management but may also create challenges around security, staffing needs, and scalability.
Rolling presentation during Couchbase Day, including:
Introduction to NoSQL
Why NoSQL?
Introduction to Couchbase
Couchbase Architecture
Single Node Operations
Cluster Operations
HA and DR
Availability and XDCR
Backup/Restore
Security
Developing with Couchbase
Couchbase SDKs
Couchbase Indexing
Couchbase GSI and Views
Indexing and Query
Couchbase Mobile
Talk held at DevOps Gathering 2019 in Bochum on 2019-03-13.
Abstract: This talk will address one of the most common challenges of organizations adopting Kubernetes on a medium to large scale: how to keep cloud costs under control without babysitting each and every deployment and cluster configuration? How to operate 80+ Kubernetes clusters in a cost-efficient way for 200+ autonomous development teams?
This talk provides insights on how Zalando approaches this problem with central cost optimizations (e.g. Spot), cost monitoring/alerting, active measures to reduce resource slack, and automated cluster housekeeping. We will focus on how to ingrain cost efficiency in tooling and developer workflows while balancing rigid cost control with developer convenience and without impacting availability or performance. We will show our use case running Kubernetes on AWS, but all shown tools are open source and can be applied to most other infrastructure environments.
Database mirroring allows for high availability and protection of SQL Server databases. It requires at least two SQL servers - a principal database and mirror database. A witness server is also used to automate failover between the principal and mirror databases. The document outlines the implementation steps, which include preparing the servers, backing up the principal database and transaction log, restoring the backup on the mirror server, and configuring security and mirroring settings between the principal, mirror and witness servers. Once setup is complete, the databases are mirrored and failover can occur automatically using the witness server.
This document provides an overview of non-relational (NoSQL) databases. It discusses the history and characteristics of NoSQL databases, including that they do not require rigid schemas and can automatically scale across servers. The document also categorizes major types of NoSQL databases, describes some popular NoSQL databases like Dynamo and Cassandra, and discusses benefits and limitations of both SQL and NoSQL databases.
The document provides information about Hadoop, its core components, and MapReduce programming model. It defines Hadoop as an open source software framework used for distributed storage and processing of large datasets. It describes the main Hadoop components like HDFS, NameNode, DataNode, JobTracker and Secondary NameNode. It also explains MapReduce as a programming model used for distributed processing of big data across clusters.
Building a robust CDC pipeline with Apache Hudi and Debezium (Tathastu.ai)
We cover the need for CDC and the benefits of building a CDC pipeline, compare various CDC streaming and reconciliation frameworks, and discuss the architecture and the challenges we faced while running this system in production. Finally, we conclude the talk by covering Apache Hudi, Schema Registry and Debezium in detail, along with our contributions to the open-source community.
This document discusses virtualization, containers, and hyperconvergence. It provides an overview of virtualization and its benefits including hardware abstraction and multi-tenancy. However, virtualization also has challenges like significant overhead and repetitive configuration tasks. Containers provide similar benefits with less overhead by abstracting at the operating system level. The document then discusses how hyperconvergence combines compute, storage, and networking to simplify deployment and operations. It notes that many hyperconverged solutions still face virtualization challenges. The presentation argues that combining containers and hyperconvergence can provide both the benefits of containers' efficiency and hyperconvergence's scale. Stratoscale is presented as a solution that provides containers as a service with multi-tenancy, SLA-driven performance
Deploying MongoDB sharded clusters easily with Terraform and Ansible (All Things Open)
Presented by: Ivan Groenewold
Presented at All Things Open 2021
Raleigh, NC, USA
Raleigh Convention Center
Abstract: Installing big clusters is always a challenge, and can be a very time-consuming task. At a high level, we need to provision the hardware, install the software, configure monitoring, and set up a backup process.
In this talk we will see how to develop a complete pipeline that deploys MongoDB sharded clusters at the push of a button and accomplishes all of these tasks for you.
By combining Terraform for the hardware provisioning, and Ansible for the software installation, we can completely automate the process, saving time and providing a standardized reusable solution.
Can and should Apache Kafka replace a database? How long can and should I store data in Kafka? How can I query and process data in Kafka? These are common questions that come up more and more. This session explains the idea behind databases and different features like storage, queries, transactions, and processing to evaluate when Kafka is a good fit and when it is not.
The discussion includes different Kafka-native add-ons like Tiered Storage for long-term, cost-efficient storage and ksqlDB as event streaming database. The relation and trade-offs between Kafka and other databases are explored to complement each other instead of thinking about a replacement. This includes different options for pull and push-based bi-directional integration.
Key takeaways:
- Kafka can store data forever in a durable and highly available manner
- Kafka has different options to query historical data
- Kafka-native add-ons like ksqlDB or Tiered Storage make Kafka more powerful than ever before to store and process data
- Kafka provides exactly-once semantics rather than database-style transactions
- Kafka is not a replacement for existing databases like MySQL, MongoDB or Elasticsearch
- Kafka and other databases complement each other; the right solution has to be selected for a problem
- Different options are available for bi-directional pull and push-based integration between Kafka and databases to complement each other
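The takeaways above can be illustrated with a toy in-memory commit log (a conceptual sketch only, not the Kafka client API): records are appended durably, and any consumer can replay history from an earlier offset, which is why historical data remains queryable.

```python
# Toy in-memory commit log illustrating why a Kafka-like log can serve
# historical reads: records are appended, never overwritten, and readers
# replay from any offset. Illustration only, not the Kafka client API.
class ToyLog:
    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1      # the record's offset

    def read_from(self, offset):
        # Replaying from offset 0 "queries" the full history.
        return self._records[offset:]

log = ToyLog()
for event in ["created", "updated", "deleted"]:
    log.append(event)

print(log.read_from(0))   # ['created', 'updated', 'deleted']
print(log.read_from(2))   # ['deleted']
```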
Video Recording:
https://youtu.be/7KEkWbwefqQ
Blog post:
https://www.kai-waehner.de/blog/2020/03/12/can-apache-kafka-replace-database-acid-storage-transactions-sql-nosql-data-lake/
Unique course notes for the Certified Kubernetes Administrator (CKA) exam, one for each section. Designed to be engaging and to serve as a future reference for Kubernetes concepts.
Introduction and Overview of Apache Kafka, TriHUG July 23, 2013 (mumrah)
Apache Kafka is a distributed publish-subscribe messaging system that allows both publishing and subscribing to streams of records. It uses a distributed commit log that provides low latency and high throughput for handling real-time data feeds. Key features include persistence, replication, partitioning, and clustering.
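The partitioning feature mentioned above can be sketched in a few lines: records with the same key always land in the same partition, preserving per-key ordering. This is a conceptual illustration (Kafka's default partitioner hashes keys with murmur2; crc32 here is a stand-in):

```python
import zlib

# Sketch of keyed partitioning in a Kafka-like system: same key, same
# partition, so per-key order is preserved. crc32 stands in for the
# real hash function; this is not the Kafka client API.
def partition_for(key: str, num_partitions: int) -> int:
    return zlib.crc32(key.encode("utf-8")) % num_partitions

NUM_PARTITIONS = 4
events = [("user-1", "login"), ("user-2", "click"), ("user-1", "logout")]
partitions = {}
for key, value in events:
    partitions.setdefault(partition_for(key, NUM_PARTITIONS), []).append(value)

# All of user-1's events share one partition, so their order is preserved.
```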
Ceph: Open Source Storage Software Optimizations on Intel® Architecture for C... (Odinot Stanislas)
After a short introduction to distributed storage and a description of Ceph, Jian Zhang presents some interesting benchmarks: sequential tests, random tests, and above all a comparison of results before and after optimizations. The configuration parameters touched and the optimizations applied (large page numbers, omap data on a separate disk, ...) yield at least a 2x performance improvement.
Confluent REST Proxy and Schema Registry (Concepts, Architecture, Features) (Kai Wähner)
High-level introduction to Confluent REST Proxy and Schema Registry (leveraging Apache Avro under the hood), two components of the Apache Kafka open source ecosystem. See the concepts, architecture and features.
Stream Processing with Apache Kafka and .NET (Confluent)
Presentation from South Bay.NET meetup on 3/30.
Speaker: Matt Howlett, Software Engineer at Confluent
Apache Kafka is a scalable streaming platform that forms a key part of the infrastructure at many companies including Uber, Netflix, Walmart, Airbnb, Goldman Sachs and LinkedIn. In this talk Matt will give a technical overview of Kafka, discuss some typical use cases (from surge pricing to fraud detection to web analytics) and show you how to use Kafka from within your C#/.NET applications.
Cassandra Day NY 2014: Apache Cassandra & Python for The New York Times ⨍... (DataStax Academy)
In this session, you’ll learn how Apache Cassandra is used with Python in the NY Times ⨍aбrik messaging platform. Michael will start his talk with an overview of the NYT⨍aбrik global message bus platform and its “memory” features, then discuss their use of the open source Apache Cassandra Python driver by DataStax. Progressive benchmarks testing features and performance will be presented, from naive and synchronous to asynchronous with multiple IO loops; these benchmarks are tailored to usage at the NY Times. Code snippets, followed by beer, for those who survive. All code available on GitHub!
Containerization is operating-system-level virtualization in which applications run in isolated user spaces called containers.
Everything an application needs, all its libraries, binaries, resources, and dependencies, is maintained by the container.
The container itself is abstracted away from the host OS, with only limited access to underlying resources, much like a lightweight virtual machine (VM).
This document discusses using Apache Kafka as a data hub to capture changes from various data sources using change data capture (CDC). It outlines several common CDC patterns, such as using modification dates, database triggers, or log files to identify changes. It then discusses using Kafka Connect to integrate data sources like MongoDB and PostgreSQL and replicate changes. The document provides examples of open source CDC connectors and concludes with suggestions for getting involved in the Apache Kafka community.
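The modification-date pattern mentioned above can be sketched as a simple polling loop: fetch rows whose timestamp is newer than a stored watermark, then advance the watermark. This is a toy illustration with an in-memory table standing in for a real database source:

```python
# Sketch of modification-date CDC: poll for rows newer than a watermark.
# The "table" is a plain list of dicts; a real source would be a database.
def poll_changes(rows, last_seen):
    changes = [r for r in rows if r["updated_at"] > last_seen]
    new_watermark = max((r["updated_at"] for r in changes), default=last_seen)
    return changes, new_watermark

table = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 210},
]
changes, watermark = poll_changes(table, last_seen=200)
print([r["id"] for r in changes], watermark)  # [2, 3] 210
```

A known limitation of this pattern, compared to log-based CDC, is that deletes and intermediate states between polls are not captured.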
Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB (YugabyteDB)
Slides for the webinar "Distributed Databases Deconstructed: CockroachDB, TiDB and YugaByte DB" by Amey Banarse, Principal Data Architect at Yugabyte, recorded on Oct 30, 2019 at 11 AM Pacific.
Playback here: https://vimeo.com/369929255
1. The document discusses implementing distributed mclock in Ceph for quality of service (QoS). It describes implementing QoS units at the pool, RBD image, and universal levels.
2. It covers inserting delta/rho/phase parameters into Ceph classes for distributed mclock. Issues addressed include number of shards and background I/O.
3. An outstanding I/O based adaptive throttle is introduced to suspend mclock scheduling if the I/O load is too high. Testing showed it effectively maintained maximum throughput.
4. Future plans include improving the mclock algorithm, extending QoS to individual RBDs, adding metrics, and testing in various environments. Collaboration with the community is
vSAN provides software-defined storage that pools server storage resources and delivers them as a shared datastore for VMs. It integrates deeply with VMware stacks for simplified management and supports a variety of use cases. vSAN leverages new hardware technologies to provide high performance at low cost through space efficiency techniques and storage policies that control availability, capacity reservation, and QoS.
Data Lakehouse, Data Mesh, and Data Fabric (r1) (James Serra)
So many buzzwords of late: Data Lakehouse, Data Mesh, and Data Fabric. What do all these terms mean and how do they compare to a data warehouse? In this session I’ll cover all of them in detail and compare the pros and cons of each. I’ll include use cases so you can see what approach will work best for your big data needs.
The document discusses the rise of NoSQL databases. It notes that NoSQL databases are designed to run on clusters of commodity hardware, making them better suited than relational databases for large-scale data and web-scale applications. The document also discusses some of the limitations of relational databases, including the impedance mismatch between relational and in-memory data structures and their inability to easily scale across clusters. This has led many large websites and organizations handling big data to adopt NoSQL databases that are more performant and scalable.
This video explains the problems that led to the emergence of this type of database,
the kinds of projects in which it can be used,
and a brief overview of its history, advantages, and disadvantages.
https://youtu.be/I9zgrdCf0fY
Distributed RDBMS: Data Distribution Policy: Part 2 - Creating a Data Distrib... (ScaleBase)
Distributed RDBMSs provide many scalability, availability and performance advantages.
This presentation examines steps to create a customized data distribution policy for your RDBMS that best suits your application’s needs to provide maximum scalability.
We will discuss:
1. The different approaches to data distribution
2. How to create your own data distribution policy, whether you are scaling an existing application or creating a new app.
3. How ScaleBase can help you create your policy
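The approaches in point 1 can be sketched as two toy routing functions, hash-based and range-based sharding. Shard counts and range boundaries here are hypothetical, purely to illustrate the trade-off:

```python
# Toy sketch of two common data distribution approaches.
def hash_shard(customer_id: int, num_shards: int) -> int:
    # Even spread across shards, but range scans must hit every shard.
    return customer_id % num_shards

def range_shard(customer_id: int, boundaries: list) -> int:
    # boundaries [1000, 2000]: id < 1000 -> shard 0, < 2000 -> shard 1, ...
    # Keeps adjacent ids together, but hot ranges can overload one shard.
    for shard, bound in enumerate(boundaries):
        if customer_id < bound:
            return shard
    return len(boundaries)

print(hash_shard(1234, 4))              # 2
print(range_shard(1500, [1000, 2000]))  # 1
```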
Data management in cloud: study of existing systems and future opportunities (Editor Jacotech)
This document discusses data management in cloud computing and provides an overview of existing NoSQL database systems and their advantages over traditional SQL databases. It begins by defining cloud computing and the need for scalable data storage. It then discusses key goals for cloud data management systems including availability, scalability, elasticity and performance. Several popular NoSQL databases are described, including BigTable, MongoDB and Dynamo. The advantages of NoSQL systems like elastic scaling and easier administration are contrasted with some limitations like limited transaction support. The document concludes by discussing opportunities for future research to improve scalability and queries in cloud data management systems.
This document discusses relational and non-relational databases. It begins by introducing NoSQL databases and some of their key characteristics like not requiring a fixed schema and avoiding joins. It then discusses why NoSQL databases became popular for companies dealing with huge data volumes due to limitations of scaling relational databases. The document covers different types of NoSQL databases like key-value, column-oriented, graph and document-oriented databases. It also discusses concepts like eventual consistency, ACID properties, and the CAP theorem in relation to NoSQL databases.
Modern databases and its challenges (SQL, NoSQL, NewSQL) (Mohamed Galal)
Nowadays the amount of data has become very large; every organization produces a huge amount of data daily.
Thus we need new technology to help store and query huge amounts of data in acceptable time.
The old relational model helps with consistency, but it was not designed to deal with the big data problem.
In these slides, I describe the relational model, the NoSQL models, and the NewSQL models, with some examples.
This document provides an introduction to NoSQL databases, including the motivation behind them, where they fit, types of NoSQL databases like key-value, document, columnar, and graph databases, and an example using MongoDB. NoSQL databases are a new way of thinking about data that is non-relational, schema-less, and can be distributed and fault tolerant. They are motivated by the need to scale out applications and handle big data with flexible and modern data models.
Relational databases store data in tables with rows and columns, enforcing strict relationships between data points. NoSQL databases use various models like documents, key-value pairs, or graphs, providing a more flexible structure for diverse data types.
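The modeling difference described above can be made concrete with a tiny example: the same order represented as normalized relational rows versus a single nested document. The table and field names are hypothetical:

```python
# Same order, two representations (illustrative names and values).

# Relational: normalized rows across two tables, linked by order_id.
relational = {
    "orders":      [{"order_id": 1, "customer_id": 7}],
    "order_items": [{"order_id": 1, "sku": "A-100", "qty": 2},
                    {"order_id": 1, "sku": "B-200", "qty": 1}],
}

# Document: one self-contained, nested JSON-style document.
document = {
    "order_id": 1,
    "customer_id": 7,
    "items": [{"sku": "A-100", "qty": 2}, {"sku": "B-200", "qty": 1}],
}

# Reassembling the relational form requires a join-like lookup,
# whereas the document already carries its items inline.
items = [i for i in relational["order_items"] if i["order_id"] == 1]
assert len(items) == len(document["items"])
```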
Distributed RDBMS: Data Distribution Policy: Part 1 - What is a Data Distribu... by ScaleBase
Distributed RDBMSs provide many scalability, availability and performance advantages.
But how do you “distribute” data? This presentation gives you a practical understanding of the key issues in building a successful distributed RDBMS.
The presentation explores:
1. What a data distribution policy is
2. The challenges faced when data is distributed via sharding
3. What defines a good data distribution policy
4. The best way to distribute data for your application and workload
Expert IT analyst groups like Wikibon forecast that NoSQL database usage will grow at a compound rate of 60% per year over the next five years, and Gartner says NoSQL databases are one of the top trends impacting information management in 2013. But is NoSQL right for your business? How do you know which business applications will benefit from NoSQL and which won't? What questions do you need to ask in order to make such decisions?
If you're wondering what NoSQL is and whether your business can benefit from NoSQL technology, join DataStax for the webinar "How to Tell if Your Business Needs NoSQL". This to-the-point presentation provides practical litmus tests to help you understand whether NoSQL is right for your use case, and supplies examples of NoSQL technology in action with leading businesses that demonstrate how and where NoSQL databases can have the greatest impact.
Speaker: Robin Schumacher, Vice President of Products at DataStax
Robin Schumacher has spent the last 20 years working with databases and big data. He comes to DataStax from EnterpriseDB, where he built and led a market-driven product management group. Previously, Robin started and led the product management team at MySQL for three years before it was bought by Sun (the largest open source acquisition in history), and then by Oracle. He also started and led the product management team at Embarcadero Technologies, which was the #1 IPO in 2000. Robin is the author of three database performance books and a frequent speaker at industry events. Robin holds BS, MA, and Ph.D. degrees from various universities.
NoSQL databases were developed to address the need for databases that can handle big data and scale horizontally to support massive amounts of data and high user loads. NoSQL databases are non-relational and support high availability through horizontal scaling and replication across commodity servers to allow for continuous availability. Popular types of NoSQL databases include key-value stores, document stores, column-oriented databases, and graph databases, each suited for different use cases depending on an application's data model and query requirements.
Module 2.2 Introduction to NoSQL Databases.pptx by NiramayKolalle
This presentation explores NoSQL databases, a modern alternative to traditional relational database management systems (RDBMS). NoSQL databases are designed to handle large-scale data storage and high-speed processing with a focus on flexibility, scalability, and performance. Unlike SQL databases, NoSQL solutions do not rely on structured tables, schemas, or joins, making them ideal for handling Big Data applications and distributed systems.
Introduction to NoSQL Databases:
NoSQL databases are built on the following core principles:
Schema-Free Structure: No predefined table structures, allowing dynamic data storage.
Horizontal Scalability: Unlike SQL databases that scale vertically (by increasing hardware power), NoSQL databases support horizontal scaling, distributing data across multiple servers.
Distributed Computing: Data is stored across multiple nodes, preventing single points of failure and ensuring high availability.
Simple APIs: NoSQL databases often use simpler query mechanisms instead of complex SQL queries.
Optimized for Performance: NoSQL databases eliminate joins and support faster read/write operations.
Key Theoretical Concepts:
CAP Theorem (Brewer’s Theorem)
The CAP theorem states that a distributed system can provide only two out of three guarantees:
Consistency (C) – Ensures that all database nodes show the same data at any given time.
Availability (A) – Guarantees that every request receives a response.
Partition Tolerance (P) – The system continues to operate even if network failures occur.
Most NoSQL databases prioritize Availability and Partition Tolerance (AP) while relaxing strict consistency constraints, unlike SQL databases that focus on Consistency and Availability (CA).
BASE vs. ACID Model
SQL databases follow the ACID (Atomicity, Consistency, Isolation, Durability) model, ensuring strict transactional integrity. NoSQL databases use the BASE model (Basically Available, Soft-state, Eventually consistent), allowing flexibility in distributed environments where eventual consistency is preferred over immediate consistency.
Types of NoSQL Databases:
Key-Value Stores – Store data as simple key-value pairs, making them highly efficient for caching, session management, and real-time analytics.
Examples: Amazon DynamoDB, Redis, Riak
Column-Family Stores – Store data in columns rather than rows, optimizing analytical queries and batch processing workloads.
Examples: Apache Cassandra, HBase, Google Bigtable
Document Stores – Use JSON, BSON, or XML documents to represent data, making them ideal for content management systems, catalogs, and flexible data models.
Examples: MongoDB, CouchDB, ArangoDB
Graph Databases – Focus on relationships between data, allowing high-performance queries for connected data such as social networks, fraud detection, and recommendation engines.
Examples: Neo4j, Oracle NoSQL Graph, Amazon Neptune
Business Drivers for NoSQL Adoption:
Volume: The ability to process large datasets efficiently.
The document discusses the history and concepts of NoSQL databases. It notes that traditional single-processor relational database management systems (RDBMS) struggled to handle the increasing volume, velocity, variability, and agility of data due to various limitations. This led engineers to explore scaled-out solutions using multiple processors and NoSQL databases, which embrace concepts like horizontal scaling, schema flexibility, and high performance on commodity hardware. Popular NoSQL database models include key-value stores, column-oriented databases, document stores, and graph databases.
This document provides an overview of Couchbase, a NoSQL database. It begins with an agenda that covers an introduction to NoSQL, getting started with Couchbase, administration, best practices, and a case study. The presenter is then introduced as having 15 years of IT experience and being well-versed in relational databases. Key aspects of NoSQL and Couchbase are then summarized, including that Couchbase is a distributed, non-relational database designed for large-scale data storage and high performance. The document dives deeper into data models, use cases for NoSQL, and considerations like the CAP theorem.
Big Data Storage Concepts from the "Big Data concepts Technology and Architec..." by raghdooosh
The document discusses big data storage concepts including cluster computing, distributed file systems, and different database types. It covers cluster structures like symmetric and asymmetric, distribution models like sharding and replication, and database types like relational, non-relational and NewSQL. Sharding partitions large datasets across multiple machines while replication stores duplicate copies of data to improve fault tolerance. Distributed file systems allow clients to access files stored across cluster nodes. Relational databases are schema-based while non-relational databases like NoSQL are schema-less and scale horizontally.
The CAP theorem states that a distributed system can only provide two of three properties: consistency, availability, and partition tolerance. NoSQL databases can be classified based on which two CAP properties they support. For example, MongoDB is a CP database that prioritizes consistency and partition tolerance over availability. Cassandra is an AP database that focuses on availability and partition tolerance over consistency. When designing microservices, the CAP theorem can help determine which databases are best suited to the application's consistency and scalability requirements.
3. Relational Databases
• MySQL, PostgreSQL, SQLite, Oracle, etc.
• Good at
–Schemas
–Strong Consistency
–Transactions
–“Mature” and well tested
–Availability of Expertise
4. What is NoSQL?
• It's not anti-SQL or "NO" SQL.
• It means (N)ot (O)nly SQL.
• A more precise name would be "non-relational database".
5. What is NoSQL?
• Carlo Strozzi used the term NoSQL in 1998 to name his lightweight, open-source relational database that did not expose the standard SQL interface.
• A NoSQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases.
• Motivations for NoSQL include simplicity of design, horizontal scaling and finer control over availability.
• Data structures in NoSQL (e.g. key-value, graph, or document) differ from those of the RDBMS, and therefore some operations are faster in NoSQL and some in the RDBMS.
6. “Is NoSQL a complete replacement of RDBMS?”
“NO”
7. Common Features of NoSQL
• Open source
• Schema-less
• Scalability via scale-out, not scale-up
• Distribution via sharding
• Eventual consistency
• Commodity-class nodes
• Parallel queries with MapReduce
• Cloud readiness
• High availability
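The sharding feature listed above can be sketched with a simple hash-based placement function. This is a minimal, hypothetical helper for illustration, not any particular database's implementation:

```python
import hashlib

# Hash-based sharding sketch: each key is hashed and mapped to one of a
# fixed number of shards, so data spreads roughly evenly across nodes.
NUM_SHARDS = 4

def shard_for(key: str) -> int:
    """Map a key deterministically to a shard number in [0, NUM_SHARDS)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Every node agrees on placement because the mapping is deterministic.
placement = {k: shard_for(k) for k in ["user:1", "user:2", "order:42"]}
```

Because the mapping depends only on the key, any node can compute where a document lives without a central lookup service.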
10. Why NoSQL (1/2)
• Interactive applications have changed dramatically over the last 15 years. In the late '90s, large web companies emerged with dramatic increases in scale along many dimensions:
– The number of concurrent users skyrocketed. (Big Users)
– The amount of data collected and processed soared. (IoT)
– The amount of unstructured or semi-structured data exploded. (Big Data/Cloud)
• Dealing with these issues became more and more difficult using relational database technology.
• Relational databases are essentially architected to run on a single machine and use a rigid, schema-based approach to modeling data.
11. Why NoSQL (2/2)
• Schema-less: ALTER operations in an RDBMS are costly.
• RDBMSs are less capable of dealing with Big Data.
• RDBMSs map poorly to object-oriented programming models.
• RDBMSs favor scale-up over scale-out.
• RDBMSs cannot handle unstructured or semi-structured data well.
12. Big Users
• Not that long ago, 1,000 daily users of an application was a lot and 10,000 was an extreme case.
• Today, with the growth in global Internet use, the increased number of hours users spend online, and the growing popularity of smartphones and tablets, it's not uncommon for apps to have millions of users a day.
13. Internet of Things
• The amount of machine-generated data is increasing with the proliferation of digital telemetry.
• There are 14 billion things connected to the Internet.
– By 2020, 32 billion things will be connected to the Internet.
– By 2020, 10% of data will be generated by embedded systems.
– By 2020, 20% of target-rich data will be generated by embedded systems.
• Telemetry data is small, semi-structured and continuous. It's a challenge for relational databases.
• To address this challenge, the innovative enterprise is relying on NoSQL technology to scale concurrent data access to millions of connected things.
14. Big Data
• The amount of data is growing rapidly, and the nature of data is changing as well, as developers find new data types – most of which are unstructured or semi-structured – that they want to incorporate into their applications.
• Data is becoming easier to capture and access through third parties such as Facebook, Dun & Bradstreet, and others.
• NoSQL provides a data model that maps better to the application's organization of data and simplifies the interaction between the application and the database.
15. The Cloud
• Three-tier Internet architecture: applications today are increasingly developed using a three-tier internet architecture, are cloud-based, and use a Software-as-a-Service business model that needs to support the collective needs of thousands of customers.
• This approach requires a horizontally scalable architecture that easily scales with the number of users and the amount of data the application has.
• NoSQL technologies have been built from the ground up to be distributed, scale-out technologies, and therefore fit better with the highly distributed nature of the three-tier Internet architecture.
16. Data Models
• Relational and NoSQL data models are very different.
• The relational model takes data and separates it into many interrelated tables.
• Tables reference each other through foreign keys, which are stored in columns as well.
• NoSQL databases have a very different model.
• For example, a document-oriented NoSQL database takes the data you want to store and aggregates it into documents using the JSON format.
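The contrast between the two models can be sketched with a small example. The schema and values here are hypothetical, chosen only to show normalized tables versus a single JSON aggregate:

```python
import json

# Relational model: the same entity is split across two tables,
# linked by a foreign key and reassembled at query time with a join.
breweries = [{"id": 1, "name": "21st Amendment"}]
beers = [{"id": 10, "name": "Brew Free! or Die IPA", "abv": 7.0, "brewery_id": 1}]

# Document model: related data is aggregated into one self-contained
# JSON document, so a single read returns the whole logical entity.
beer_doc = {
    "type": "beer",
    "name": "Brew Free! or Die IPA",
    "abv": 7.0,
    "brewery": {"name": "21st Amendment"},  # nested instead of joined
}
serialized = json.dumps(beer_doc)
```

Because the document carries its own structure, two documents of the same `type` can have different fields without any schema change.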
17. The CAP Theorem
Published by Eric Brewer in 2000, the theorem is a set of basic requirements that describe any distributed system (not just storage/database systems).
• Consistency - All the servers in the system will have the same data, so anyone using the system will get the same copy regardless of which server answers their request.
• Availability - The system will always respond to a request (even if it's not the latest data, or consistent across the system, or just a message saying the system isn't working).
• Partition Tolerance - The system continues to operate as a whole even if individual servers fail or can't be reached.
It is theoretically impossible for a distributed system to meet all three requirements at once, so a combination of two must be chosen, and this is usually the deciding factor in which technology is used.
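The trade-off can be made concrete with a toy two-replica store. This is an illustrative sketch only, not a real database: during a partition, a CP-leaning system refuses the write to stay consistent, while an AP-leaning one accepts it and lets replicas diverge.

```python
class Replica:
    """One node's local copy of the data."""
    def __init__(self):
        self.data = {}

class TinyCluster:
    """Two replicas, run in either 'CP' or 'AP' mode."""
    def __init__(self, mode):
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False  # True when the replicas cannot talk

    def write(self, key, value, via):
        replica = self.a if via == "a" else self.b
        if self.partitioned:
            if self.mode == "CP":
                # Choose consistency: refuse the write rather than diverge.
                raise RuntimeError("unavailable: cannot reach peer")
            # Choose availability: accept locally, replicas now disagree.
            replica.data[key] = value
        else:
            self.a.data[key] = value
            self.b.data[key] = value

cp = TinyCluster("CP")
cp.partitioned = True
cp_rejected = False
try:
    cp.write("k", 1, via="a")
except RuntimeError:
    cp_rejected = True  # CP sacrifices availability during the partition

ap = TinyCluster("AP")
ap.partitioned = True
ap.write("k", 1, via="a")  # accepted, but replica b has not seen it
```

The same write request thus gets opposite treatment depending on which pair of CAP guarantees the system was designed around.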
19. ACID Properties
ACID is a set of properties that apply specifically to database transactions, defined as follows:
• Atomicity - Everything in a transaction must happen successfully, or none of the changes are committed. This prevents a transaction that changes multiple pieces of data from failing halfway and making only some of the changes.
• Consistency - The data will only be committed if it passes all the rules in place in the database (i.e. data types, triggers, constraints, etc.).
• Isolation - Transactions won't affect other transactions by changing data that another operation is counting on, and other users won't see partial results of a transaction in progress (depending on the isolation mode).
• Durability - Once data is committed, it is durably stored and safe against errors, crashes or any other (software) malfunctions within the database.
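Atomicity in particular is easy to demonstrate with SQLite, used here purely as a convenient relational example; the table and values are hypothetical. Two statements run in one transaction, and when the second violates a constraint, the rollback undoes the first as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("INSERT INTO accounts VALUES (1, 50)")  # duplicate key -> fails
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back, including the UPDATE

# The failed transaction left the original balance untouched.
balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
```

Without atomicity, the UPDATE alone would have been applied and the account would be left in a half-finished state.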
20. The BASE Model
• Basically Available - This constraint states that the system does guarantee the availability of the data, in the sense of the CAP theorem: there will be a response to any request. But that response could still be 'failure' to obtain the requested data, or the data may be in an inconsistent or changing state, much like waiting for a check to clear in your bank account.
• Soft state - The state of the system could change over time, so even during times without input there may be changes going on due to 'eventual consistency'; thus the state of the system is always 'soft'.
• Eventual consistency - The system will eventually become consistent once it stops receiving input. The data will propagate to everywhere it should sooner or later, but the system keeps receiving input and does not check the consistency of every transaction before it moves on to the next one.
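Eventual consistency can be sketched with a toy last-write-wins merge between two replicas. This is an illustrative model, not a real replication protocol: writes land on one replica with a timestamp, replicas disagree for a while, and a periodic anti-entropy pass converges them.

```python
def merge(*replicas):
    """Last-write-wins anti-entropy: keep the newest (timestamp, value) per key."""
    combined = {}
    for replica in replicas:
        for key, (ts, value) in replica.items():
            if key not in combined or ts > combined[key][0]:
                combined[key] = (ts, value)
    for replica in replicas:  # bring every replica to the merged state
        replica.clear()
        replica.update(combined)

r1, r2 = {}, {}
r1["user:1"] = (1, "alice")   # a write arrives at replica 1
r2["user:1"] = (2, "alicia")  # a later write arrives at replica 2

# Before the merge the replicas disagree ('soft state'); afterwards both
# hold the newest value and the system is consistent again.
merge(r1, r2)
```

The system stays available for writes the whole time; consistency is restored only after input stops and the merge runs, which is exactly the BASE bargain.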
23. Couchbase - The NoSQL document database
• Couchbase Server, originally known as Membase, is an open source, distributed (shared-nothing architecture) NoSQL document-oriented database that is optimized for interactive applications. These applications must service many concurrent users: creating, storing, retrieving, aggregating, manipulating and presenting data.
• Couchbase is designed to provide easy-to-scale key-value or document access with low latency and high sustained throughput. It is designed to be clustered, from a single machine to very large-scale deployments.
• In the parlance of Eric Brewer's CAP theorem, Couchbase is a CP-type system.
24. Couchbase Features
Easy Scalability
It's easy to scale your database layer with Couchbase Server, whether within a cluster or across clusters in multiple data centers. With one click of a button, no downtime, and no changes to your app, you can grow your cluster from 1 to 25 to 100s of servers while keeping the workload evenly distributed.
Consistent High Performance
Couchbase Server's consistent sub-millisecond response times mean an awesome experience for your app users. Consistent, high throughput lets you serve more users with fewer servers. Data and workload are spread equally across all servers.
Always On
With Couchbase Server, your application is always online, 24x365. Whether you are upgrading your database, system software or hardware – or recovering from a disaster – you can count on zero app downtime with Couchbase Server.
Flexible Data Model
You shouldn't have to worry about the database when you change your application. With Couchbase Server, there is no fixed schema, so records can have different structures and be changed at any time, without modification to other documents in the database.
25. Couchbase Features (contd.)
Flexible Data Model
1. JSON Support
2. Indexing and Querying
3. Incremental MapReduce
Easy Scalability
1. Clone to Grow with Auto-Sharding
2. Cross-Cluster Replication (XDCR)
Consistent High Performance
1. Built-In Object-Level Cache (memcached)
Always On 24x365
1. Zero Downtime Maintenance
2. Data Replication with Auto-Failover
3. Management and Monitoring UI
4. Reliable Storage Architecture
26. Why Couchbase?
• Couchbase provides the world's most complete, most scalable and best performing NoSQL database.
• Couchbase provides a shared-nothing architecture, a single node type, a built-in caching layer, true auto-sharding and the world's first NoSQL mobile offering.
28. Couchbase Architecture (2/3)
• In Couchbase Server, the data manager stores and retrieves data in response to data operation requests from applications.
• Every server in a Couchbase cluster includes a built-in multi-threaded object-managed cache, which provides consistent low latency for read and write operations.
• The cluster manager supervises server configuration and interaction between servers within a Couchbase cluster.
Node architecture diagram of Couchbase Server
29. Couchbase Architecture (3/3)
Data flow within Couchbase during a write operation
1. The client writes a document into the cache, and the server sends the client a confirmation.
2. The document is added to the intra-cluster replication queue to be replicated to other servers within the cluster.
3. The document is also added to the disk-write queue to be asynchronously persisted to disk. The document is persisted to disk after the disk-write queue is flushed.
4. After the document is persisted to disk, it is replicated to other Couchbase Server clusters using cross datacenter replication (XDCR) and eventually indexed.
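The write flow above can be modeled with a small sketch. This is a toy illustration of the described sequence, not Couchbase internals; the class and queue names are hypothetical. The key behavior is that the client is acknowledged as soon as the document reaches the cache, while replication and persistence drain asynchronously:

```python
from collections import deque

class WritePath:
    """Toy model of one node's write path: cache first, then async queues."""
    def __init__(self):
        self.cache = {}
        self.replication_queue = deque()
        self.disk_queue = deque()
        self.disk = {}      # stand-in for persisted storage
        self.replicas = {}  # stand-in for intra-cluster replica copies

    def write(self, key, doc):
        self.cache[key] = doc               # step 1: cached, client is acked
        self.replication_queue.append(key)  # step 2: queued for replication
        self.disk_queue.append(key)         # step 3: queued for persistence
        return "ok"                         # acknowledgement to the client

    def flush(self):
        while self.replication_queue:       # step 2 completes asynchronously
            k = self.replication_queue.popleft()
            self.replicas[k] = self.cache[k]
        while self.disk_queue:              # step 3 completes on queue flush
            k = self.disk_queue.popleft()
            self.disk[k] = self.cache[k]

node = WritePath()
ack = node.write("doc::1", {"type": "beer"})
# Acked before persistence: the document is in cache but not yet on disk.
node.flush()  # now replicated within the cluster and persisted
```

This ordering is what gives the low-latency acknowledgement described on the slide: durability and replication happen after the client has already been answered.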
30. Couchbase's Elasticsearch Connector
• Together, Couchbase and Elasticsearch enable you to build richer and more powerful apps with full-text search, indexing and querying, and real-time analytics for use cases such as content stores or aggregating data from varied data sources.
“The plug-in for Elasticsearch extends Couchbase Server's flexibility even further, allowing users to build self-adapting interactive applications.”