NoSQL Big Data Management
2. Flexibility
3. Sharding
4. Speed
5. Scalability
6. Resource Sharing
7. Performance
Demerits of Distributed Computing
• Analytical vs Operational
• Batch vs Interactive
SQL- RDBMS
NoSQL
NoSQL: flexible data models and multiple schemas; considered semi-structured
What is NoSQL?
• A NoSQL database is a non-relational data management system that does not
require a fixed schema.
NoSQL is a new class of databases.
A NoSQL database system includes a wide range of database technologies that can store
structured, semi-structured, unstructured and polymorphic data.
Big Data Solutions:
One solution is to store Big Data in HDFS files. HDFS data is accessed sequentially.
NoSQL databases have the following properties:
• higher scalability.
Apache's CouchDB: An Apache project and a widely used database for the web.
CouchDB consists of a document store.
It uses the JSON data exchange format to store its documents, JavaScript for indexing,
combining and transforming documents, and HTTP APIs.
Oracle NoSQL: A step towards a NoSQL data store; a distributed key-value data store that provides transactional
semantics for data manipulation, horizontal scalability, and simple administration and monitoring.
CAP Theorem
It states that it is impossible for a distributed data store to offer more than
two out of three guarantees:
• Consistency
• Availability
• Partition Tolerance
• Ability to store any format of data, including documents with missing fields
• Most technologies (e.g. Cassandra, Hadoop, MongoDB) allow for rapid and easy scaling of servers
(sharding/clustering).
• Some technologies allow for indexing, but at that point you are not really schemaless, so you can
have a nearly schema-less design with one primary key (say a documentid) and required fields (like a
timestamp) … and still allow nearly anything else to be loaded in.
• A developer can build their own objects (schema) easily and change them on the fly (think Agile)
without engaging a DBA.
Increasing Flexibility for Data Manipulation
Soft State: Due to the lack of immediate consistency, data values may change over time.
Eventual Consistency: The system becomes consistent eventually, some time after the application
input. The data is replicated to different nodes and eventually reaches a consistent
state, but consistency is not guaranteed at the transaction level.
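The soft state and eventual consistency described above can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions: the class names `Replica` and `EventuallyConsistentStore`, the `pending` queue, and the explicit `replicate()` step are all hypothetical, and a real system replicates in the background over a network.

```python
class Replica:
    """One node holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class EventuallyConsistentStore:
    """Toy model: a write lands on one replica first and reaches the rest later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []  # (replica, key, value) updates not yet applied

    def write(self, key, value):
        # The write is applied to the first replica immediately...
        first, *others = self.replicas
        first.data[key] = value
        # ...and only queued for the remaining replicas (soft state).
        for replica in others:
            self.pending.append((replica, key, value))

    def read(self, replica_index, key):
        # A read sees whatever the chosen replica currently holds.
        return self.replicas[replica_index].data.get(key)

    def replicate(self):
        # Apply every queued update; afterwards all replicas agree.
        for replica, key, value in self.pending:
            replica.data[key] = value
        self.pending.clear()


store = EventuallyConsistentStore(["A", "B", "C"])
store.write("cart", ["book"])
before = [store.read(i, "cart") for i in range(3)]  # replicas B and C are stale
store.replicate()
after = [store.read(i, "cart") for i in range(3)]   # all replicas now agree
print(before)
print(after)
```

Before `replicate()` runs, only the first replica returns the new value; afterwards every replica returns it, which is the "eventually" in eventual consistency.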
Key-Value Store:
The data store characteristics are high performance, scalability & flexibility.
A simple string, called a key, maps to a large data string or
BLOB (Binary Large Object).
• Put (key, value): associates the value with the key and updates a value
if this key is already present.
• Delete (key) : removes a key and its value from the data store
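The operations listed above can be sketched as a thin wrapper around a Python dictionary. This is a minimal illustration, not any product's API: the class name `KeyValueStore` is hypothetical, and a `get` method is added here only so the demo can read values back.

```python
class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque byte strings (BLOBs)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Associates the value with the key, overwriting any existing value.
        self._data[key] = value

    def get(self, key):
        # Returns the stored value, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key):
        # Removes the key and its value; a missing key is ignored.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("session:42", b"first payload")
kv.put("session:42", b"second payload")  # same key: the value is updated
current = kv.get("session:42")
kv.delete("session:42")
after_delete = kv.get("session:42")
```

The store never looks inside the value, which is what keeps these operations simple and fast.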
Limitations of key-value store architectural pattern are:
• No indexes are maintained on values, thus a subset of values is not
searchable.
• Key-value store does not provide traditional database capabilities
• Maintaining unique values as keys may become more difficult when the
volume of data increases.
• Queries cannot be performed on individual values. There is no clause like the
relational 'where' that filters a result set.
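To make the missing 'where' clause concrete, the sketch below filters on a field stored inside the values. Because the store keeps no index on values, the application has to read and decode every value itself. The sample records, the JSON encoding of the values, and the function name `find_keys_by_city` are assumptions made for the illustration.

```python
import json

# Values are opaque blobs to the store; here they happen to contain JSON.
store = {
    "user:1": b'{"name": "Asha", "city": "Pune"}',
    "user:2": b'{"name": "Ravi", "city": "Delhi"}',
    "user:3": b'{"name": "Meera", "city": "Pune"}',
}


def find_keys_by_city(kv, city):
    """Emulates WHERE city = ... by scanning and decoding every value."""
    matches = []
    for key, blob in kv.items():
        record = json.loads(blob)  # the application, not the store, parses the value
        if record.get("city") == city:
            matches.append(key)
    return matches


pune_users = find_keys_by_city(store, "Pune")
```

The cost of this scan grows with the total number of keys, whereas a relational database could answer the same question from an index.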
Dynamic schema means that documents in the same collection do not need to have the same set of
fields or structure, and common fields in a collection's documents may hold different types of data.
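As a rough illustration of dynamic schema, the sketch below models a collection as a plain Python list of dictionaries and records which value types appear under each field. The sample documents and the variable names are made up for the example; no MongoDB driver is involved.

```python
# Three documents in the same "collection", each with a different shape.
collection = [
    {"_id": 1, "title": "MongoDB Overview", "likes": 100},
    {"_id": 2, "title": "Replica sets", "tags": ["mongodb"], "likes": "many"},
    {"_id": 3, "name": "untitled draft"},
]

# Collect, for every field name, the set of value types seen across documents.
field_types = {}
for doc in collection:
    for field, value in doc.items():
        field_types.setdefault(field, set()).add(type(value).__name__)

for field in sorted(field_types):
    print(field, sorted(field_types[field]))
```

The `likes` field holds an integer in one document and a string in another, and the `tags` and `name` fields each exist in only one document.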
{
  _id: ObjectId('7df78ad8902c'),
  title: 'MongoDB Overview',
  description: 'MongoDB is no sql database',
  by: 'tutorials point',
  url: 'http://www.tutorialspoint.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100,
  comments: [
    {
      user: 'user1',
      message: 'My first comment',
      dateCreated: new Date(2011,1,20,2,15),
      like: 0
    },
    {
      user: 'user2',
      message: 'My second comments',
      dateCreated: new Date(2011,1,25,7,45),
      like: 5
    }
  ]
}
_id is a 12-byte value written as a hexadecimal number. Of these 12 bytes, the first 4 bytes hold the current timestamp, the next 3 bytes
the machine id, the next 2 bytes the process id of the MongoDB server, and the remaining 3 bytes a simple incremental value.
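The byte layout described above can be sketched as a small decoder. The function name `parse_object_id` and the sample id `507f1f77bcf86cd799439011` are illustrative and do not come from the slides. The split follows the 4 + 3 + 2 + 3 layout given here; newer MongoDB versions, as far as I know, replace the machine id and process id with a single 5-byte random value.

```python
from datetime import datetime, timezone


def parse_object_id(hex_id):
    """Splits a 24-character hex ObjectId into the four parts described above."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # bytes 0-3: Unix timestamp
    return {
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),                    # bytes 4-6
        "process_id": int.from_bytes(raw[7:9], "big"),   # bytes 7-8
        "counter": int.from_bytes(raw[9:12], "big"),     # bytes 9-11
    }


parts = parse_object_id("507f1f77bcf86cd799439011")
print(parts)
```

Because the timestamp occupies the leading bytes, ObjectIds sort roughly in creation order.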
Replication :
• Replication is the process of synchronizing data across multiple servers.
• Replication provides redundancy and increases data availability with multiple copies
of data on different database servers.
Command        Description
rs.initiate()  To initiate a new replica set
rs.conf()      To check the replica set configuration
rs.status()    To check the status of a replica set
rs.add()       To add members to a replica set
Node: Place where data is stored for processing
Data Center: Collection of many related nodes
Cluster: Collection of many data centers
Mem-table: Memory-resident data structure; after data is written to the commit log, it is written to the mem-table temporarily
SSTable: When the mem-table reaches a certain threshold, its data is flushed into an SSTable disk file
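The write path described above (commit log, then mem-table, then a flush to an SSTable) can be sketched as a toy model. This is not Cassandra code: the class `ToyNode`, the entry-count threshold `memtable_limit`, and the in-memory lists standing in for disk files are all assumptions for illustration, and the `read` method is a simplified addition that the slides do not describe.

```python
class ToyNode:
    """Toy model of one node's write path: commit log -> mem-table -> SSTable."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only record of every write
        self.memtable = {}     # memory-resident, holds recent writes
        self.sstables = []     # flushed tables, oldest first (stand-in for disk files)
        self.memtable_limit = memtable_limit  # assumed threshold: number of entries

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write in the commit log
        self.memtable[key] = value            # 2. then store it in the mem-table
        if len(self.memtable) >= self.memtable_limit:
            self.flush()                      # 3. threshold reached: flush

    def flush(self):
        # An SSTable is written once, sorted by key, and never modified afterwards.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}

    def read(self, key):
        # Check the mem-table first, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for stored_key, value in table:
                if stored_key == key:
                    return value
        return None


node = ToyNode(memtable_limit=3)
for key, value in [("c", 3), ("a", 1), ("b", 2), ("d", 4)]:
    node.write(key, value)
print(node.sstables)
print(node.memtable)
```

After the four writes, the first three entries sit in one sorted SSTable and the fourth is still in the mem-table, while the commit log holds all four.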
Cassandra Query Language(CQL)
Data Replication in Cassandra