A Brief Overview On Apache CouchDB
Submitted by
Harshith V
Trainee Software Engineer
Custom Solution Services
Excelsoft Technologies Inc.
Apache CouchDB
Catalog
1.0 DBMS - Database Management System
1.1. RDBMS
1.2. OLAP
1.3. NoSQL
1. Document databases
2. Key-Value stores
3. Column-Family stores
4. Graph databases
1. RDBMS
RDBMS stands for Relational Database Management System.
It is a type of DBMS software in which data is stored
in the form of tables (rows and columns).
2. OLAP
OLAP stands for Online Analytical Processing.
It is a category of software tools that enable users to interactively analyze and explore
multidimensional data for business intelligence purposes. OLAP systems are designed to handle
complex queries and support analytical processing, allowing users to gain insights from large
volumes of data.
3. NoSQL
NoSQL is a type of database management system (DBMS) that is designed to handle and store large
volumes of unstructured and semi-structured data.
1. Document databases:
These databases store data as semi-structured documents, such as JSON, XML, and binary forms like
BSON.
Each document is addressed in the database via a unique key that represents the document.
Software Systems:
MongoDB
CouchDB
Elasticsearch
2. Key-Value stores:
Every data element in the database is stored as a key-value pair.
The data can be retrieved by using the unique key allotted to each element in the database.
The values can be simple data types such as strings and numbers, or complex objects.
Software Systems:
Redis
Amazon DynamoDB
Riak
3. Column-Family stores:
A column-oriented database is a non-relational database that stores the data in columns instead of
rows.
Software Systems:
Apache Cassandra
HBase
4. Graph databases:
Graph databases focus on the relationships between elements. They store the data in the
form of nodes.
The connections between the nodes are called links or relationships.
Software Systems:
Neo4j
Amazon Neptune
ArangoDB
2.1 Technical Overview
A CouchDB server hosts named databases, which store documents. Each document is uniquely
named in the database, and CouchDB provides a RESTful HTTP API for reading and updating
(add, edit, delete, fetch) database documents.
Documents are the primary unit of data in CouchDB and consist of any number of fields and
attachments. Documents also include metadata that is maintained by the database system. Document
fields are uniquely named and contain values of varying types (text, number, boolean, lists, etc.),
and there is no set limit to text size or element count.
The CouchDB API is the primary method of interfacing with a CouchDB instance. Requests are made
over HTTP and are used to retrieve information from the database, store new data, and
perform views and formatting of the information stored within the documents.
GET:
Requests the specified item. As with normal HTTP requests, the format of the URL defines what is
returned. With CouchDB this can include static items, database documents, and configuration and
statistical information. In most cases the information is returned in the form of a JSON document.
URL: GET http://localhost:5984/mydatabase/mydocument
HEAD:
The HEAD method is used to get the HTTP headers of a GET request without the body of the
response.
POST
POST uploads data to the server.
Within CouchDB, POST is used to set values, including uploading documents and setting document
values.
URL: POST http://localhost:5984/mydatabase
PUT
Used to store the specified resource at the given URL.
In CouchDB, PUT is used to create new objects, including databases, documents, views, and design
documents.
URL: PUT http://localhost:5984/mydatabase/mydocument
DELETE
Deletes the specified resource, including documents, views, and design documents.
Delete URL: http://admin:[email protected]:5984/sample/0011
COPY
A special method that can be used to copy documents and objects.
Example URL:
COPY http://localhost:5984/mydatabase/mydocument
In the request headers, include a Destination header that names the target document ID
(Destination: newdocument).
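The request methods described above can be sketched with Python's standard library. This is only an illustration: it builds the requests without sending them, so no running server is needed. The address http://localhost:5984 and the names mydatabase, mydocument, and newdocument are placeholders, the helper build_request is hypothetical, and the revision in the DELETE example is a sample value. The sketch assumes that the Destination header of a COPY request carries a target document ID and that a DELETE must name the revision being deleted.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5984"  # assumed local CouchDB address (placeholder)


def build_request(method, path, body=None, extra_headers=None):
    """Build (but do not send) an HTTP request for a CouchDB resource."""
    headers = {"Accept": "application/json"}
    data = None
    if body is not None:
        # JSON bodies are sent as UTF-8 bytes with a matching Content-Type.
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if extra_headers:
        headers.update(extra_headers)
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


sample_doc = {"title": "Churumuri", "servings": 4}
sample_rev = "1-6a0c974072b3e621eb22a387033e83d5"  # placeholder revision

requests_by_name = {
    "fetch": build_request("GET", "/mydatabase/mydocument"),
    "headers_only": build_request("HEAD", "/mydatabase/mydocument"),
    "create_in_db": build_request("POST", "/mydatabase", body=sample_doc),
    "create_by_id": build_request("PUT", "/mydatabase/mydocument", body=sample_doc),
    "delete": build_request("DELETE", "/mydatabase/mydocument?rev=" + sample_rev),
    "copy": build_request(
        "COPY", "/mydatabase/mydocument",
        extra_headers={"Destination": "newdocument"},
    ),
}

for name, req in requests_by_name.items():
    print(name, req.get_method(), req.full_url)
```

To send one of these against a real server, the request object can be passed to urllib.request.urlopen; the response body is then normally a JSON document.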
Because CouchDB uses HTTP for all communication, we need to ensure that the correct HTTP
headers are supplied.
An HTTP header is a field of an HTTP request or response that passes additional context and
metadata about that request or response.
Request Headers
Accept:
Indicates the media types that the client is willing to accept from the server in the response.
Example: Accept: application/json tells the server that the client prefers to receive JSON
data.
Content-Type:
Specifies the format of the data in the request or response body. It tells the server or client
how to interpret the data.
Example: Content-Type: application/json indicates that the data is in JSON format.
For the majority of requests this will be JSON (application/json).
Response Headers
Cache-Control:
The Cache-Control header in HTTP requests and responses provides instructions on how
caching should be handled by browsers, proxies, and other intermediary servers.
Content-Length:
Indicates the size of the request or response body in bytes.
Example: Content-Length: 1024 tells the recipient how many bytes of data to expect in the
body of the request or response.
Content-Type:
The Content-Type header in an HTTP response specifies the media type of the data being sent
in the response body.
ETag:
Contains the entity tag of the resource, which is a unique identifier representing the current
state of the resource.
Example: ETag: "1-6a0c974072b3e621eb22a387033e83d5" is used to compare the current
state of a document with a previously obtained ETag value.
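As a small illustration of how a client might use these headers, the sketch below builds the request headers for JSON traffic and compares ETag values. The helper names are hypothetical. It assumes that, for a document, the ETag value is the document revision wrapped in double quotes, as in the example above.

```python
def json_request_headers(has_body):
    """Headers a client would typically send when talking JSON to CouchDB."""
    headers = {"Accept": "application/json"}
    if has_body:
        # Only requests that carry a body need to describe its format.
        headers["Content-Type"] = "application/json"
    return headers


def revision_from_etag(etag):
    """Strip the surrounding double quotes from an ETag header value."""
    return etag.strip('"')


def is_unchanged(cached_etag, current_etag):
    """A cached copy is still current if both ETag values are identical."""
    return cached_etag == current_etag


etag = '"1-6a0c974072b3e621eb22a387033e83d5"'
print(json_request_headers(has_body=True))
print(revision_from_etag(etag))
print(is_unchanged(etag, etag))
```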
405 - Resource Not Allowed: This status is issued when the HTTP request type is invalid.
The majority of requests and responses to CouchDB use the JavaScript Object Notation (JSON) for
formatting the content and structure of the data and responses.
JSON is used because it is the simplest and easiest solution for working with data within a web
browser.
JSON supports a small set of basic types, including:
• Array - a list of values enclosed in square brackets.
For example: ["one", "two", "three"]
• Boolean - a true or false value. You can use these strings directly.
For example: { "value": true}
For example, a JSON object:
{
"servings" : 4,
"subtitle" : "Cooking without fire",
"cooktime" : 30,
"title" : "Churumuri"
}
In CouchDB, the JSON object is used to represent a variety of structures in the document, including
the main CouchDB document.
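The JSON values shown above map directly onto native data types in most languages. As a brief sketch using Python's standard json module, the recipe object becomes a dictionary, the array becomes a list, and the boolean becomes True:

```python
import json

recipe_text = """
{
    "servings": 4,
    "subtitle": "Cooking without fire",
    "cooktime": 30,
    "title": "Churumuri"
}
"""

# Decode JSON text into Python values.
recipe = json.loads(recipe_text)              # object  -> dict
numbers = json.loads('["one", "two", "three"]')  # array   -> list
flag = json.loads('{"value": true}')          # boolean -> True

print(recipe["title"], recipe["servings"])
print(numbers)
print(flag["value"])

# Encode the dictionary back into JSON text, e.g. for a request body.
encoded = json.dumps(recipe, sort_keys=True)
print(encoded)
```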
1. CouchDB Fauxton:
Fauxton is a web-based administration interface built into CouchDB. It provides a simple GUI for
interacting with CouchDB, for example to create and delete databases and documents.
Fig:4.0.1 Fauxton UI
2. CouchDB cURL:
cURL is a command-line utility for transferring data to or from a server. It can be used to
communicate with a CouchDB database over HTTP.
http://localhost:5984/mydatabase
3. We can keep the _id as is or change it, add more fields to the JSON document, and then click
the Create Document button.
The CAP Theorem describes a few different strategies for distributing application logic across the
network. CouchDB's solution uses replication to propagate application changes across the
participating nodes.
1. Consistency:
All database clients see the same data, even with concurrent updates.
In other words, all nodes in the system have the same data at the same time.
2. Availability:
All database clients are able to access some version of the data, although it may not contain the
most recent write.
3. Partition Tolerance:
The database can be split over multiple servers.
The system continues to operate despite arbitrary message loss or failure of part of the system.
A B-tree is a self-balancing tree data structure that is commonly used in database systems and file
systems to store and manage large sets of data in sorted order for efficient search, insertion, and
deletion operations.
The B-tree structure is designed to maintain balance, ensuring that all leaf nodes are at the same
level, which helps in maintaining efficient performance.
In the context of databases, a "B-tree engine" may refer to the
underlying storage engine or to the index structure used by the DBMS.
Many relational databases, such as MySQL and PostgreSQL, use B-trees to implement indexes,
providing fast and balanced access to data.
CouchDB implements ACID properties for data storage and document updates.
Atomicity: CouchDB ensures document updates are "all or nothing." Either the entire update
succeeds and gets committed, or if any failure occurs, everything rolls back, leaving the document
untouched.
Consistency: Once data in CouchDB has been committed, it is not modified or
overwritten in place.
Isolation: Concurrent writes to the same document are prevented. Only one client can modify a
document at a time, ensuring no conflicts occur due to simultaneous edits.
Durability: CouchDB ensures durability by writing data to disk and maintaining multiple copies
(replicas) of data across nodes in a cluster.
Once a write operation is acknowledged, CouchDB guarantees that the data will persist even
in the event of hardware failures or crashes.
MVCC (Multi-Version Concurrency Control) is a core concept that plays a crucial role in handling
concurrent access to the database.
The MVCC mechanism in CouchDB enables multiple users or transactions to work with the database
concurrently while maintaining consistency and avoiding conflicts.
6.5 Replication
Replication is a fundamental feature of CouchDB that provides data distribution, fault tolerance,
and scalability.
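The idea behind MVCC can be illustrated with a toy in-memory store. This is a simplified model, not CouchDB's implementation: the class ToyStore, the ConflictError exception, and the way revision strings are generated are all assumptions made for the example. Each update must name the revision it was based on; an update based on a stale revision is rejected instead of silently overwriting newer data.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update is based on a stale revision."""


class ToyStore:
    """Toy document store that checks revisions on every write."""

    def __init__(self):
        self._docs = {}  # document ID -> (revision, body)

    @staticmethod
    def _next_rev(prev_rev, body):
        # Revision = generation counter plus a digest of the body (toy scheme).
        generation = int(prev_rev.split("-")[0]) + 1 if prev_rev else 1
        text = json.dumps(body, sort_keys=True).encode("utf-8")
        return f"{generation}-{hashlib.sha256(text).hexdigest()[:32]}"

    def get(self, doc_id):
        rev, body = self._docs[doc_id]
        return rev, dict(body)  # hand out a copy, never the stored object

    def put(self, doc_id, body, rev=None):
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise ConflictError(f"{doc_id}: expected {current_rev}, got {rev}")
        new_rev = self._next_rev(current_rev, body)
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = ToyStore()
rev1 = store.put("recipe", {"title": "Churumuri"})

# Two clients read the same revision, then both try to write.
rev_a, doc_a = store.get("recipe")
rev_b, doc_b = store.get("recipe")

doc_a["servings"] = 4
rev2 = store.put("recipe", doc_a, rev=rev_a)  # first writer succeeds

doc_b["servings"] = 2
conflict_detected = False
try:
    store.put("recipe", doc_b, rev=rev_b)     # second writer is stale
except ConflictError:
    conflict_detected = True

print(rev1.split("-")[0], rev2.split("-")[0], conflict_detected)
```

In this sketch the second client would have to fetch the latest revision, merge its change, and retry, which mirrors how readers are never blocked while writers are kept from overwriting each other.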
Replication Basics:
1. One-way replication
Data is copied from a source database to a target database.
One of CouchDB's strengths is the ability to synchronize two copies of the same database.
This makes it possible to distribute data across several nodes or data centers, and also to move
data closer to the clients.
Replication involves a source and a destination database, which can be on the same or on different
CouchDB instances.
The aim of replication is that at the end of the process, all active documents in the source
database are also in the destination database, and all documents that were deleted in the source
database are also deleted in the destination database.
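The stated aim of one-way replication can be sketched with two plain dictionaries standing in for the source and destination databases. This is a toy model under stated assumptions: the function replicate is hypothetical, revisions and conflict handling are ignored, and a deleted document is represented in the source by a marker document with "_deleted" set to true.

```python
def replicate(source, target):
    """Toy one-way replication: copy active docs, propagate deletions.

    Returns the number of documents written to the target.
    """
    written = 0
    for doc_id, doc in source.items():
        if doc.get("_deleted"):
            # A deletion in the source removes the document from the target.
            target.pop(doc_id, None)
        elif target.get(doc_id) != doc:
            # Copy documents that are missing or different in the target.
            target[doc_id] = dict(doc)
            written += 1
    return written


source = {
    "a": {"title": "Churumuri"},
    "b": {"_deleted": True},
    "c": {"title": "Masala puri"},
}
target = {
    "b": {"title": "old copy"},
    "z": {"title": "only on target"},
}

first_run = replicate(source, target)
second_run = replicate(source, target)  # nothing left to copy

print(first_run, second_run, sorted(target))
```

Because the replication is one-way, the document that exists only in the target is left alone; copying it back would require a second replication in the opposite direction.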
6.6 Indexes
In CouchDB, indexes are known as "views". Views are special functions defined within design
documents that generate indexes for querying documents in a database. These views are created
using map functions, which define how documents should be indexed.
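Map Function: all
function (doc) {
  // Index only documents whose type is "people".
  if (doc.type === "people") {
    // emit(key, value): the document ID is the key; no value is stored.
    emit(doc._id, null);
  }
}
doc: the document being processed. Inside the map function, our logic determines whether the doc
needs to be mapped. If yes, it is emitted into the index.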
Reduce Function
In CouchDB, a reduce function is used to aggregate data emitted by the map function of a view.
It takes a set of key-value pairs and produces a single result, such as summing up values or finding
maximum/minimum values.
The function iterates over the values and computes the desired result, returning a single value as
the output.
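The interplay of map and reduce can be imitated in a few lines of Python. This is only a simulation of the idea, not how CouchDB executes views: the functions map_people, reduce_sum, and run_view are hypothetical, the sample documents are invented, and the map step assumes each document carries its ID in an "_id" field.

```python
def map_people(doc):
    """Yield (key, value) pairs, playing the role of emit(key, value)."""
    if doc.get("type") == "people":
        yield doc["_id"], 1


def reduce_sum(values):
    """Collapse all emitted values into a single result."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn=None):
    """Apply the map function to every document; optionally reduce the rows."""
    # Rows are kept sorted by key, as view results are.
    rows = sorted(pair for doc in docs for pair in map_fn(doc))
    if reduce_fn is None:
        return rows
    return reduce_fn([value for _key, value in rows])


docs = [
    {"_id": "p2", "type": "people", "name": "Asha"},
    {"_id": "r1", "type": "recipe", "title": "Churumuri"},
    {"_id": "p1", "type": "people", "name": "Ravi"},
]

rows = run_view(docs, map_people)
count = run_view(docs, map_people, reduce_sum)
print(rows)
print(count)
```

Emitting 1 for every matching document and summing the values is a common way to count documents; other reduce functions could compute a maximum or minimum in the same manner.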
Mango is designed to provide a flexible and powerful querying mechanism that goes beyond basic
views.
There are two parts to a Mango Query: the index and the selector.
Example of a simple selector Mango query:
{
  "selector": {
    "field1": "value1",
    "field2": {"$gt": 42}
  },
  "fields": ["field1", "field2"],
  "sort": [{"field1": "asc"}],
  "limit": 10,
  "skip": 0
}
"selector": This is the main part of the query, where you define the conditions that documents must
meet to be included in the result set.
In the above example, it selects only documents where
field1 equals "value1" and
field2 is greater than 42.
"sort": Defines the sorting order of the result set. It is an array of field and direction pairs.
In this example, the result set will be sorted in ascending order on "field1".
"skip": Specifies the number of documents to skip before starting to include documents in the result
set.
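To make the meaning of each query part concrete, the following sketch evaluates the example query against a handful of in-memory documents. It is a toy evaluator written for illustration, not CouchDB's query engine: the functions matches and find are hypothetical, only plain equality and the $gt operator are handled, and the sample documents are invented.

```python
def matches(doc, selector):
    """Return True if the document satisfies every condition in the selector."""
    for field, condition in selector.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            for operator, operand in condition.items():
                if operator == "$gt":
                    if not value > operand:
                        return False
                else:
                    raise ValueError(f"unsupported operator: {operator}")
        elif value != condition:
            return False  # plain values mean equality
    return True


def find(docs, query):
    """Apply selector, sort, skip, limit, and fields, in that order."""
    rows = [doc for doc in docs if matches(doc, query["selector"])]
    # Apply sort keys from last to first so the first key has priority.
    for sort_spec in reversed(query.get("sort", [])):
        (field, direction), = sort_spec.items()
        rows.sort(key=lambda d: d[field], reverse=(direction == "desc"))
    skip = query.get("skip", 0)
    limit = query.get("limit", len(rows))
    rows = rows[skip:skip + limit]
    fields = query.get("fields")
    if fields:
        rows = [{f: d[f] for f in fields if f in d} for d in rows]
    return rows


query = {
    "selector": {"field1": "value1", "field2": {"$gt": 42}},
    "fields": ["field1", "field2"],
    "sort": [{"field1": "asc"}],
    "limit": 10,
    "skip": 0,
}

docs = [
    {"_id": "a", "field1": "value1", "field2": 50, "other": 1},
    {"_id": "b", "field1": "value1", "field2": 42},
    {"_id": "c", "field1": "other", "field2": 99},
    {"_id": "d", "field1": "value1", "field2": 43},
]

result = find(docs, query)
print(result)
```

Document "b" is excluded because 42 is not greater than 42, and document "c" because its field1 does not equal "value1"; the "fields" list then trims each remaining document down to the two named fields.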
In CouchDB, operators are used in various contexts such as querying documents, filtering results,
and defining views.
Operators are identified by the dollar ($) prefix in the name field.