
NoSQL Big Data Management

This document discusses NoSQL databases and key concepts of distributed computing. It gives an overview of the features of distributed computing, such as fault tolerance, flexibility and scalability, and of its demerits, such as troubleshooting issues in large networks and security risks. The document then contrasts Hadoop and NoSQL databases and gives the definition and properties of NoSQL databases. It describes NoSQL data stores such as MongoDB, Cassandra, CouchDB and HBase, and how they differ in data models, scalability and consistency.

Uploaded by Navaneeth Krish


Module - 3

NoSQL Big Data Management

Features of Distributed Computing

1. Fault Tolerance and Reliability

2. Flexibility

3. Sharding

4. Speed

5. Scalability

6. Resource Sharing

7. Performance

Demerits of Distributed Computing

1. Troubleshooting issues in a large network infrastructure

2. Additional software requirements

3. Security risks for data and resources

Hadoop or NoSQL? What is the difference?

• Analytical (Hadoop) vs Operational (NoSQL)

• Volume, petabytes (Hadoop) vs Velocity, terabytes (NoSQL)

• Batch (Hadoop) vs Interactive (NoSQL)
SQL: RDBMS
NoSQL: flexible data models and multiple schemas; considered semi-structured
What is NoSQL?
• A NoSQL database is a non-relational data management system that does not require a fixed schema.
• NoSQL is a new class of database: a flexible database model used for Big Data and real-time web apps.
• A NoSQL database system includes a wide range of database technologies that can store structured, semi-structured, unstructured and polymorphic data.

Big Data Solutions:
• Require a scalable, distributed computing model with a shared-nothing architecture.
• One solution is to store Big Data in HDFS files; accesses to HDFS data are sequential.
NoSQL databases have the following properties:

• Support for multiple data models (schema-free)

• Simple application programming interface (API)

• Higher scalability

• Distributed computing and cost-effectiveness

Types of NoSQL Databases
NoSQL data stores and their characteristic features
HBase (Apache): Open-source, non-relational data store written in Java; a column-family based NoSQL data store providing BigTable-like capabilities; scalability and strong consistency.

MongoDB: Master-slave (primary-secondary) distribution model; document-oriented data store with JSON-like documents and dynamic schemas; open-source, NoSQL, scalable, non-relational database; used at the backend by websites such as Craigslist, eBay and Foursquare.

Cassandra (Apache): Decentralized, peer-to-peer distribution model; open-source, NoSQL, scalable, non-relational, column-family based; fault-tolerant with tunable consistency; used by Facebook and Instagram.

CouchDB (Apache): An Apache project and a widely used database for the web. CouchDB is a document store: it uses the JSON data-exchange format to store its documents, JavaScript for indexing, combining and transforming documents, and HTTP APIs.

Oracle NoSQL: A step towards NoSQL data stores; a distributed key-value data store providing transactional semantics for data manipulation, horizontal scalability, and simple administration and monitoring.
CAP Theorem
It states that it is impossible for a distributed data store to simultaneously provide more than two of the following three guarantees:
• Consistency
• Availability
• Partition Tolerance

When a network partition occurs, the database must choose between:
• Answering, even though the answer may be old or wrong data (AP).
• Not answering until it has the latest copy of the data (CP).
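To make the AP/CP choice concrete, here is a toy Python model of two replicas of one value. It assumes a write lands on replica A and is copied to replica B only while the network link is up; the class and method names are hypothetical. The model only shows how each side of the trade-off answers a read from B during a partition, and it ignores the fact that real CP systems may also reject writes.

```python
class Replica:
    """One copy of a single value plus a version counter."""

    def __init__(self):
        self.value = None
        self.version = 0


class TwoReplicaStore:
    """Hypothetical two-replica store illustrating the AP vs CP choice."""

    def __init__(self, mode):
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False

    def write_to_a(self, value):
        # The write lands on A; it is copied to B only while the link is up.
        self.a.value = value
        self.a.version += 1
        if not self.partitioned:
            self.b.value = self.a.value
            self.b.version = self.a.version

    def read_from_b(self):
        if not self.partitioned:
            return self.b.value
        if self.mode == "AP":
            # Stay available: answer with B's local copy, which may be stale.
            return self.b.value
        # CP: B cannot confirm it holds the latest copy, so it refuses to answer.
        raise RuntimeError("unavailable during partition")


if __name__ == "__main__":
    for mode in ("AP", "CP"):
        store = TwoReplicaStore(mode)
        store.write_to_a("v1")
        store.partitioned = True
        store.write_to_a("v2")
        try:
            print(mode, "read from B:", store.read_from_b())
        except RuntimeError as err:
            print(mode, "read from B failed:", err)
```

Run as a script, the AP store answers "v1" (stale, because "v2" never reached B), while the CP store raises an error instead of answering.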
Schema-Less Models
Advantages of schema-less:
• Speed for whole-document requests

• Ability to store any format of data, including documents with missing fields

• Most technologies (e.g. Cassandra, Hadoop, MongoDB) allow rapid and easy scaling of servers (sharding/clustering).

• Some technologies allow indexing, but at that point the design is no longer truly schema-less. You can still have a nearly schema-less design with one primary key (say, a document ID) and a few required fields (like a timestamp), and still allow nearly anything else to be loaded.

• A developer can build their own objects (schema) easily and change them on the fly (think Agile) without engaging a DBA.
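A minimal Python sketch of the "nearly schema-less" design described above, using a plain list as a stand-in for a collection. The field names doc_id and timestamp and the insert helper are hypothetical; only those two fields are enforced, so documents of very different shapes can sit in the same collection.

```python
from datetime import datetime, timezone

# Hypothetical rule for a "nearly schema-less" collection: only these fields are required.
REQUIRED_FIELDS = ("doc_id", "timestamp")


def insert(collection, doc):
    """Add a document if it has the required fields; all other fields are free-form."""
    missing = [name for name in REQUIRED_FIELDS if name not in doc]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if any(existing["doc_id"] == doc["doc_id"] for existing in collection):
        raise ValueError(f"duplicate doc_id: {doc['doc_id']}")
    collection.append(doc)


orders = []
now = datetime.now(timezone.utc)
insert(orders, {"doc_id": 1, "timestamp": now, "item": "pen", "qty": 3})
# A second document with a completely different shape is accepted as well.
insert(orders, {"doc_id": 2, "timestamp": now, "customer": {"name": "Ana"}, "tags": ["gift"]})
```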
Increasing Flexibility for Data Manipulation

BASE Principles (Properties)

Basically Available: Rather than enforcing immediate consistency, BASE-modelled NoSQL databases ensure availability of data by spreading and replicating it across the nodes of the database cluster.

Soft State: Due to the lack of immediate consistency, data values may change over time.

Eventual Consistency: The system will eventually become consistent once application input stops. Data is replicated to different nodes and will eventually reach a consistent state, but consistency is not guaranteed at the transaction level.
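Soft state and eventual consistency can be seen in a small Python simulation. It assumes last-write-wins conflict resolution with integer timestamps and a hypothetical anti_entropy round in which replicas exchange their data; real BASE systems use a variety of replication and conflict-resolution schemes.

```python
class Node:
    """One replica holding key -> (timestamp, value) pairs."""

    def __init__(self, name):
        self.name = name
        self.data = {}

    def put(self, key, value, ts):
        self.data[key] = (ts, value)

    def get(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def anti_entropy(nodes):
    """One sync round: every node adopts the newest version of every key (last write wins)."""
    all_keys = set().union(*(node.data for node in nodes))
    for key in all_keys:
        newest = max((node.data[key] for node in nodes if key in node.data),
                     key=lambda entry: entry[0])
        for node in nodes:
            node.data[key] = newest


nodes = [Node("n1"), Node("n2"), Node("n3")]
nodes[0].put("cart", "book", ts=1)
nodes[1].put("cart", "book+pen", ts=2)
print([n.get("cart") for n in nodes])  # ['book', 'book+pen', None]: replicas disagree
anti_entropy(nodes)
print([n.get("cart") for n in nodes])  # ['book+pen', 'book+pen', 'book+pen']
```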
Key-Value Store:
The data store's characteristics are high performance, scalability and flexibility.
A simple string, called the key, maps to a large data string or BLOB (Binary Large Object).

A key-value store uses a primary key to access the values. Therefore, the store can easily be scaled up for very large data.

The key is flexible and can be represented in many formats:

(i) Artificially generated strings created from a hash of a value,
(ii) Logical path names to images or files,
(iii) REST web-service calls (request-response cycles),
(iv) SQL queries.
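Two of the key formats listed above can be sketched in Python. The helper names are hypothetical, SHA-256 is just one possible hash for format (i), and a plain dict stands in for the store.

```python
import hashlib


def hash_key(value: bytes) -> str:
    """Format (i): an artificially generated key, here the SHA-256 hash of the value."""
    return hashlib.sha256(value).hexdigest()


def path_key(*parts: str) -> str:
    """Format (ii): a logical path name to an image or file."""
    return "/" + "/".join(parts)


store = {}
photo = b"example image bytes"  # stand-in for real file content
store[hash_key(photo)] = photo
store[path_key("images", "2024", "cat.png")] = photo
```

A hash-based key has the useful property that identical content always produces the same key, while a path-based key stays readable for humans.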
The key-value store provides clients with the following operations to read and write values using a key:

• Get (key): returns the value associated with the key.

• Put (key, value): associates the value with the key, updating the value if the key is already present.

• Multi-get (key1, key2, …, keyN): returns the list of values associated with the listed keys.

• Delete (key): removes a key and its value from the data store.
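A minimal in-memory Python sketch of the four operations, with a dict doing the work. The method names mirror the list above but are hypothetical and do not belong to any product's API; returning None for a missing key and silently ignoring deletes of absent keys are design choices of this sketch, and real stores differ.

```python
from typing import Any, Dict, Hashable, List, Optional


class KeyValueStore:
    """In-memory sketch of Get, Put, Multi-get and Delete."""

    def __init__(self) -> None:
        self._data: Dict[Hashable, Any] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        # Returns None when the key is absent.
        return self._data.get(key)

    def put(self, key: Hashable, value: Any) -> None:
        # Inserts a new pair, or overwrites the value if the key already exists.
        self._data[key] = value

    def multi_get(self, *keys: Hashable) -> List[Optional[Any]]:
        return [self._data.get(key) for key in keys]

    def delete(self, key: Hashable) -> None:
        # Deleting a missing key is a no-op here.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("user:1", "Ana")
kv.put("user:2", "Raj")
kv.put("user:1", "Ana Silva")  # update of an existing key
print(kv.multi_get("user:1", "user:2", "user:3"))  # ['Ana Silva', 'Raj', None]
kv.delete("user:2")
```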
Limitations of the key-value store architectural pattern are:
• No indexes are maintained on values, so a subset of values is not searchable.
• A key-value store does not provide traditional database capabilities.
• Maintaining unique values as keys may become more difficult as the volume of data increases.
• Queries cannot be performed on individual values. There is no clause, like 'where' in a relational database, that filters a result set.
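Because values are opaque to the store, a filter such as "all users older than 30" has to be done by the client, reading every value. The Python sketch below, with a dict as the store and a hypothetical scan_where helper, shows that full scan; its cost grows with the number of keys, which is why such queries fit a key-value store poorly.

```python
def scan_where(store, predicate):
    """Emulate a WHERE clause on values: with no value index, every value must be read."""
    return {key: value for key, value in store.items() if predicate(value)}


users = {
    "user:1": {"name": "Ana", "age": 34},
    "user:2": {"name": "Raj", "age": 27},
    "user:3": {"name": "Mei", "age": 41},
}
older = scan_where(users, lambda user: user["age"] > 30)
print(sorted(older))  # ['user:1', 'user:3']
```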

Typical uses of a key-value store are:

(i) Image store, (ii) Document or file store, (iii) Lookup table, (iv) Query cache.
Document Store:

• A document store holds unstructured data.

• Storage is similar to an object store.

• Querying is easy. For example, section number, sub-section number, figure caption and table headings can be used to retrieve document partitions.

• Data is stored in nested hierarchies, for example:
  - JSON-format data model
  - XML document object model (DOM)
  - Machine-readable data as one BLOB

Typical uses of a document store are: (i) office documents, (ii) inventory store, (iii) forms data, (iv) document exchange and (v) document search.
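Retrieving a document partition by section and sub-section number amounts to walking a nested hierarchy. A minimal Python sketch, assuming the document is held as JSON-like nested dicts; the report content and the get_partition helper are hypothetical.

```python
import json

# Hypothetical report stored as one nested document (sections -> sub-sections).
report = {
    "title": "Annual Report",
    "sections": {
        "1": {
            "heading": "Overview",
            "subsections": {
                "1.1": {"heading": "Scope", "figures": ["Figure 1: Sales by region"]},
            },
        },
        "2": {"heading": "Inventory", "tables": ["Table 1: Stock levels"]},
    },
}


def get_partition(doc, *path):
    """Walk the nested hierarchy one key at a time and return that part of the document."""
    node = doc
    for key in path:
        node = node[key]
    return node


part = get_partition(report, "sections", "1", "subsections", "1.1")
print(json.dumps(part))
```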
Document Collection:
A collection can be used in many ways for
managing a large document store.
Three uses of a document collection are:
1. Group the documents together, similar to a
directory structure in a file system.
2. Enables navigating through document
hierarchies, logically grouping similar
documents and storing business rules such
as permissions, indexes and triggers
3. A collection can contain other collections
as well.
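The three uses of a collection can be sketched with a small Python class. This is a hypothetical in-memory model, not a particular database's API: navigate walks collections like directories (use 1), readers stands in for a permission rule (use 2), and children lets a collection contain other collections (use 3).

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Collection:
    """Hypothetical collection: documents, nested collections and a business rule."""
    name: str
    documents: Dict[str, dict] = field(default_factory=dict)
    children: Dict[str, "Collection"] = field(default_factory=dict)  # use 3: nested collections
    readers: Set[str] = field(default_factory=set)                   # use 2: a permission rule

    def subcollection(self, name: str) -> "Collection":
        if name not in self.children:
            self.children[name] = Collection(name)
        return self.children[name]


def navigate(root: Collection, path: str) -> Collection:
    """Use 1: find a collection by a directory-like path such as 'hr/contracts'."""
    node = root
    for part in path.strip("/").split("/"):
        node = node.children[part]
    return node


root = Collection("company")
contracts = root.subcollection("hr").subcollection("contracts")
contracts.documents["c-001"] = {"employee": "Ana", "type": "full-time"}
contracts.readers.add("hr-team")
print(navigate(root, "hr/contracts").documents["c-001"]["type"])  # full-time
```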
MONGODB DATABASE

• MongoDB is an open-source document database and a leading NoSQL database. MongoDB is written in C++.
• MongoDB is a cross-platform, document-oriented database that provides high performance, high availability and easy scalability.
• MongoDB works on the concept of collections and documents.
A document is a set of key-value pairs. Documents have a dynamic schema.

Dynamic schema means that documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

{
  _id: ObjectId("7df78ad8902c"),  // shortened placeholder: a real ObjectId has 24 hex characters
  title: 'MongoDB Overview',
  description: 'MongoDB is no sql database',
  by: 'tutorials point',
  url: 'http://www.tutorialspoint.com',
  tags: ['mongodb', 'database', 'NoSQL'],
  likes: 100,
  comments: [
    {
      user: 'user1',
      message: 'My first comment',
      dateCreated: new Date(2011, 1, 20, 2, 15),  // JavaScript months are 0-based: 1 = February
      like: 0
    },
    {
      user: 'user2',
      message: 'My second comment',
      dateCreated: new Date(2011, 1, 25, 7, 45),
      like: 5
    }
  ]
}

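_id is a 12-byte value, usually displayed as 24 hexadecimal characters. In the classic layout, the first 4 bytes hold the creation timestamp, the next 3 bytes a machine ID, the next 2 bytes the process ID of the MongoDB server, and the remaining 3 bytes an incrementing counter. (Newer MongoDB versions replace the machine and process IDs with a 5-byte random value.)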
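The byte layout of _id can be checked with a short Python sketch that splits a 24-character hex ObjectId into its parts. It assumes the classic 4/3/2/3-byte layout; for IDs generated by newer drivers only the 4-byte timestamp is meaningful, and bytes 4 to 8 are a random value rather than machine and process IDs. The decode_object_id name and the sample ID, built from made-up parts, are hypothetical.

```python
from datetime import datetime, timezone


def decode_object_id(hex_id: str) -> dict:
    """Split a 24-hex-character ObjectId using the classic 4/3/2/3-byte layout."""
    raw = bytes.fromhex(hex_id)
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # big-endian Unix timestamp
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine_id": raw[4:7].hex(),
        "process_id": int.from_bytes(raw[7:9], "big"),
        "counter": int.from_bytes(raw[9:12], "big"),
    }


# Build a sample ID from made-up parts: timestamp, machine, process, counter.
sample = ((1300000000).to_bytes(4, "big") + bytes.fromhex("aabbcc")
          + (42).to_bytes(2, "big") + (7).to_bytes(3, "big")).hex()
print(sample, decode_object_id(sample))
```

The 12-character placeholder in the example document would be rejected here, since it encodes only 6 bytes.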
Replication:
• Replication is the process of synchronizing data across multiple servers.
• Replication provides redundancy and increases data availability with multiple copies of data on different database servers.

Command          Description
rs.initiate()    Initiates a new replica set
rs.conf()        Shows the replica set configuration
rs.status()      Shows the status of a replica set
rs.add()         Adds members to a replica set

Replica Set Features

• A cluster of N nodes
• Any one node can be primary
• All write operations go to the primary
• Automatic failover
• Automatic recovery
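A toy Python simulation of these replica set features, assuming three nodes with an in-memory dict each. The ReplicaSet class is hypothetical and deliberately simplified: it does not model MongoDB's actual election protocol, its oplog or any driver API.

```python
class ReplicaSet:
    """Toy replica set: one primary takes all writes and copies them to live secondaries.

    The election below simply promotes the first healthy node; MongoDB's real
    election protocol (votes, priorities, terms) is much more involved.
    """

    def __init__(self, n: int):
        self.nodes = [{"id": i, "up": True, "data": {}} for i in range(n)]
        self.primary = 0

    def write(self, key, value):
        primary = self.nodes[self.primary]
        primary["data"][key] = value  # all writes go to the primary
        for node in self.nodes:
            if node["up"] and node is not primary:
                node["data"][key] = value  # replicate to healthy secondaries

    def fail(self, node_id: int):
        self.nodes[node_id]["up"] = False
        if node_id == self.primary:
            self.elect()  # automatic failover

    def recover(self, node_id: int):
        # Automatic recovery: a returning node resyncs its data from the primary.
        node = self.nodes[node_id]
        node["up"] = True
        node["data"] = dict(self.nodes[self.primary]["data"])

    def elect(self):
        for node in self.nodes:
            if node["up"]:
                self.primary = node["id"]
                return
        raise RuntimeError("no healthy node can become primary")


rs = ReplicaSet(3)
rs.write("x", 1)
rs.fail(0)        # the primary fails; node 1 takes over
rs.write("y", 2)
rs.recover(0)     # node 0 rejoins as a secondary and catches up
print(rs.primary, rs.nodes[0]["data"])  # 1 {'x': 1, 'y': 2}
```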
Auto-Sharding:
• Sharding is the process of storing data records across multiple machines; it is MongoDB's approach to meeting the demands of data growth.

• MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.

• A query router (mongos) provides an interface between client applications and the sharded cluster.

• MongoDB uses the shard key to distribute the collection's documents across shards.
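A minimal Python sketch of routing documents to shards by a shard key. It assumes hashed routing with a simple modulo over a fixed number of shards; MongoDB itself divides data into chunks by ranges of the shard key (or of its hash) and rebalances those chunks, so this illustrates the idea rather than its algorithm. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def shard_for(shard_key_value, num_shards: int) -> int:
    """Map a shard-key value to a shard number using a stable hash."""
    digest = hashlib.sha256(str(shard_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def distribute(documents: List[Dict], shard_key: str, num_shards: int) -> List[List[Dict]]:
    """Act like a query router: send each document to the shard its key maps to."""
    shards: List[List[Dict]] = [[] for _ in range(num_shards)]
    for doc in documents:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


users = [{"user_id": i, "name": f"user{i}"} for i in range(10)]
for number, shard in enumerate(distribute(users, "user_id", 3)):
    print(number, [doc["user_id"] for doc in shard])
```

A stable hash such as SHA-256 is used instead of Python's built-in hash(), which is randomized per process for strings, so the same key always routes to the same shard. Plain modulo routing has a known drawback: changing the number of shards moves most keys, which is one reason real systems use chunks or ranges instead.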
CASSANDRA DATABASES

• Cassandra is a column-oriented (column-family) database.

• Cassandra is scalable and fault-tolerant, with tunable consistency.
• Cassandra's distribution design is based on Amazon's Dynamo and its data model on Google's Bigtable.
• Cassandra was created at Facebook. It is very different from relational database management systems.
• Cassandra follows a Dynamo-style replication model with no single point of failure, but adds a more powerful "column family" data model.
• Cassandra is used by some of the biggest companies, such as Facebook, Twitter, Cisco, Rackspace, eBay and Netflix.
Components of Cassandra
• Node: the place where data is stored for processing
• Data Center: a collection of many related nodes
• Cluster: a collection of many data centers
• Commit log: used for crash recovery; each write operation is written to the commit log
• Mem-table: a memory-resident data structure; after data is written to the commit log, it is written to the mem-table temporarily
• SSTable: when the mem-table reaches a certain threshold, its data is flushed to an SSTable disk file
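The commit log, mem-table and SSTable steps form a write path that can be sketched in Python. This is a hypothetical, heavily simplified model: real Cassandra manages commit-log segments, compaction, bloom filters and much more, none of which appears here.

```python
class ToyCassandraNode:
    """Hypothetical, simplified model of the write path: commit log -> mem-table -> SSTable."""

    def __init__(self, memtable_threshold: int = 3):
        self.commit_log = []   # append-only record, replayed after a crash
        self.memtable = {}     # memory-resident table
        self.sstables = []     # immutable, sorted tables "on disk", oldest first
        self.threshold = memtable_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # 1. record the write for crash recovery
        self.memtable[key] = value            # 2. then update the mem-table
        if len(self.memtable) >= self.threshold:
            self.flush()

    def flush(self):
        # 3. When the mem-table is full, write it out as a sorted SSTable.
        self.sstables.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log = []  # flushed data no longer needs the log

    def read(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):  # newer SSTables take precedence
            for k, v in table:
                if k == key:
                    return v
        return None

    def crash_and_recover(self):
        # The mem-table is lost in a crash; replaying the commit log rebuilds it.
        self.memtable = {}
        for key, value in self.commit_log:
            self.memtable[key] = value


node = ToyCassandraNode(memtable_threshold=2)
node.write("a", 1)
node.write("b", 2)      # threshold reached: flushed to an SSTable
node.write("a", 3)      # newer value, still only in the mem-table
node.crash_and_recover()
print(node.read("a"), node.read("b"))  # 3 2
```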
Cassandra Query Language (CQL)
Data Replication in Cassandra

In Cassandra, one or more of the nodes in a cluster act as replicas for a given piece of data. If Cassandra detects that some of the nodes responded with an out-of-date value, it returns the most recent value to the client.
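A minimal Python sketch of that rule: collect the replicas' answers, return the one with the newest write timestamp, and bring out-of-date replicas up to date. It assumes every replica is contacted and that each stored value carries a write timestamp; in Cassandra the number of replicas consulted depends on the consistency level, and the repair step here is only a simplified stand-in for its read repair. The coordinator_read name is hypothetical.

```python
def coordinator_read(replicas, key):
    """Return the newest value for key across replicas and repair stale copies.

    Each replica is a dict mapping key -> (write_timestamp, value).
    """
    answers = [replica[key] for replica in replicas if key in replica]
    if not answers:
        return None
    newest = max(answers, key=lambda answer: answer[0])
    for replica in replicas:
        if key not in replica or replica[key][0] < newest[0]:
            replica[key] = newest  # bring out-of-date replicas up to date
    return newest[1]


r1 = {"city": (10, "Pune")}
r2 = {"city": (25, "Chennai")}   # most recent write
r3 = {}
print(coordinator_read([r1, r2, r3], "city"))  # Chennai
```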

Cassandra uses the Gossip Protocol in the background to allow the nodes to communicate with each other and detect any faulty nodes in the cluster.
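A toy Python simulation of heartbeat gossip, assuming each live node increments its own heartbeat counter every round, exchanges counters with one randomly chosen peer, and suspects any node whose counter has not grown for more than timeout rounds. All names are hypothetical; Cassandra's actual gossiper exchanges richer endpoint state and uses an accrual failure detector rather than a fixed timeout.

```python
import random


class GossipNode:
    """Hypothetical node that gossips heartbeat counters to detect faulty peers."""

    def __init__(self, name, peers):
        self.name = name
        self.peers = peers             # names of the other nodes
        self.heartbeats = {name: 0}    # node -> highest heartbeat counter seen
        self.last_update = {name: 0}   # node -> round in which that counter last grew
        self.alive = True

    def tick(self, round_no):
        self.heartbeats[self.name] += 1
        self.last_update[self.name] = round_no

    def merge(self, other_heartbeats, round_no):
        for node, beat in other_heartbeats.items():
            if beat > self.heartbeats.get(node, -1):
                self.heartbeats[node] = beat
                self.last_update[node] = round_no

    def suspected_down(self, round_no, timeout):
        return {node for node, seen in self.last_update.items() if round_no - seen > timeout}


def gossip_round(nodes, round_no, rng):
    for node in nodes.values():
        if node.alive:
            node.tick(round_no)
    for node in nodes.values():
        if not node.alive:
            continue
        peer = nodes[rng.choice(node.peers)]
        if peer.alive:
            # Push-pull exchange: both sides end up with the newer counters.
            peer.merge(node.heartbeats, round_no)
            node.merge(peer.heartbeats, round_no)


rng = random.Random(42)
names = ["a", "b", "c"]
cluster = {n: GossipNode(n, [p for p in names if p != n]) for n in names}
for r in range(1, 11):
    gossip_round(cluster, r, rng)
cluster["c"].alive = False       # node c fails silently
for r in range(11, 21):
    gossip_round(cluster, r, rng)
print(cluster["a"].suspected_down(20, timeout=5))
```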
