Unit-3
INTRODUCTION TO NOSQL
• Non-relational
• Flexible schema
• Query languages other than, or in addition to, SQL
• Distributed: horizontal scaling
• Less structured data
• Supports big data
Compared to relational databases, NoSQL databases are often easier to scale out and can perform better for some workloads, and their data model addresses several issues that the relational model was not designed to address:
◦ Geographically distributed architecture instead of expensive, monolithic architecture
◦ Large volumes of rapidly changing structured, semi-structured, and unstructured data
◦ Agile sprints, quick schema iteration, and frequent code pushes
◦ Object-oriented programming that is easy to use and flexible
• NoSQL does not mean "no SQL"; it means "not only SQL".
• It is not a replacement for an RDBMS.
We store far more data than we used to, and the connections between data items keep growing. These two trends call for an architecture designed to handle both.
MongoDB is a cross-platform, document-oriented database that provides
• High performance.
• High availability.
• Easy scalability.
MongoDB is built on the concepts of collections and documents.
Architecture: a database holds containers (collections), and each collection holds documents.
A NoSQL store such as MongoDB fits when your requirements have these properties:
• You absolutely must store unstructured data, for example data coming from a third-party API you don't control, logs whose format may change at any minute, or user-entered metadata, and you still want indexes on a subset of it.
• You need to handle more reads/writes than a single server can deal with, and a master-slave architecture won't work for you.
• You change your schema very often on a large dataset.
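The first property above (documents of varying shape, with an index on only a subset of fields) can be sketched in plain Python. This is an illustrative in-memory model, not MongoDB's indexing API; the sample documents, field names, and the `build_index` helper are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Documents with different shapes, as they might arrive from an API or a log.
# All field names here are hypothetical.
documents: List[Dict[str, Any]] = [
    {"_id": 1, "source": "api", "payload": {"temp": 21}},
    {"_id": 2, "source": "log", "line": "GET /index"},
    {"_id": 3, "source": "api", "user_note": "free text"},
]


def build_index(docs: List[Dict[str, Any]], field: str) -> Dict[Any, List[int]]:
    """Map each value of `field` to the _ids of the documents that contain it."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for doc in docs:
        if field in doc:  # documents lacking the field are simply skipped
            index[doc[field]].append(doc["_id"])
    return dict(index)


source_index = build_index(documents, "source")
print(source_index)
```

Only the `source` field is indexed; the remaining fields stay schema-free and are never inspected.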
• NoSQL stands for "No SQL" or, more commonly, "Not Only SQL"
• A class of non-relational data storage systems
◦ E.g. BigTable, Dynamo, PNUTS/Sherpa
• Usually do not require a fixed table schema, nor do they use the concept of joins
• Distributed data storage systems
• NoSQL offerings typically relax one or more of the ACID properties (see the CAP theorem)
• Basic API access:
◦ get(key) -- Extract the value given a key
◦ put(key, value) -- Create or update the value given its key
◦ delete(key) -- Remove the key and its associated value
◦ execute(key, operation, parameters) -- Invoke an operation on the value (given its key), which is a special data structure (e.g. List, Set, Map, etc.)
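The four calls above can be sketched as a toy in-memory store. This is a minimal illustration under the assumption that values are ordinary Python objects and that `execute` simply calls a named method on the stored value; the `KeyValueStore` class is hypothetical and does not correspond to any particular product's API.

```python
from typing import Any, Dict


class KeyValueStore:
    """Toy in-memory store exposing get, put, delete and execute."""

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}

    def get(self, key: str) -> Any:
        # Extract the value given a key (None if the key is absent).
        return self._data.get(key)

    def put(self, key: str, value: Any) -> None:
        # Create or update the value for the key.
        self._data[key] = value

    def delete(self, key: str) -> None:
        # Remove the key and its associated value, if present.
        self._data.pop(key, None)

    def execute(self, key: str, operation: str, *parameters: Any) -> Any:
        # Invoke a named method on the stored value (a list, set, dict, ...).
        value = self._data[key]
        return getattr(value, operation)(*parameters)


store = KeyValueStore()
store.put("user:1:tags", ["student"])
store.execute("user:1:tags", "append", "masters")
print(store.get("user:1:tags"))
store.delete("user:1:tags")
print(store.get("user:1:tags"))
```

The point of `execute` is that the list is modified where it is stored, instead of being fetched, changed by the client, and written back.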
NoSQL Data Storage: Classification
• Uninterpreted key/value, or 'the big hash table'
◦ Amazon Dynamo, Amazon S3
• Flexible schema
◦ BigTable, Cassandra, HBase (ordered keys, semi-structured data)
◦ PNUTS/Sherpa (unordered keys, JSON)
◦ MongoDB (based on JSON)
◦ CouchDB (name/value in text)
• Cheap and easy to implement (many are open source)
• Data are replicated to multiple nodes (so copies are identical and the system is fault-tolerant) and can be partitioned
◦ When data is written, the latest version is on at least one node and is then replicated to the other nodes
◦ No single point of failure
• Easy to distribute
• Do not require a schema
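Partitioning and replication as described above can be sketched as follows. This is a simplified model, assuming a fixed list of nodes, hash-based placement, and a replication factor of 3; the node names and helper functions are hypothetical, and real systems replicate asynchronously and handle node membership changes, which this sketch does not.

```python
import hashlib
from typing import AbstractSet, Dict, List, Optional

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical node names
REPLICATION_FACTOR = 3  # assumed number of copies kept for every key


def replica_nodes(key: str) -> List[str]:
    """Hash the key to pick a first node, then take the next nodes in ring order."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]


def write(cluster: Dict[str, Dict[str, str]], key: str, value: str) -> None:
    # The first replica receives the write; this loop stands in for the
    # later replication of the value to the remaining replicas.
    for node in replica_nodes(key):
        cluster[node][key] = value


def read(cluster: Dict[str, Dict[str, str]], key: str,
         failed: AbstractSet[str] = frozenset()) -> Optional[str]:
    # Any surviving replica can answer, so a single failed node loses no data.
    for node in replica_nodes(key):
        if node not in failed and key in cluster[node]:
            return cluster[node][key]
    return None


cluster: Dict[str, Dict[str, str]] = {name: {} for name in NODES}
write(cluster, "tom", "student")
first_replica = replica_nodes("tom")[0]
print(read(cluster, "tom", failed={first_replica}))
```

Each key lives on three of the four nodes, so the data is partitioned (no node holds everything) and still readable when one replica is down.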
What does NoSQL not provide?
• Joins
• Group by
◦ But PNUTS provides an interesting materialized-view approach to joins/aggregation.
• ACID transactions
• SQL
• Integration with applications that are based on SQL
Key/value (Dynamo)
Columnar/tabular (HBase)
Document (MongoDB)
Big data technology unit 3
MongoDB                    SQL
Document                   Tuple
Collection                 Table/View
PK: _id field              PK: any attribute(s)
Uniformity not required    Uniform relation schema
Index                      Index
Embedded structure         Joins
Shard                      Partition
RDBMS MongoDB
Database ➜ Database
Table ➜ Collection
Row ➜ Document
Index ➜ Index
Join ➜ Embedded Document
Foreign Key ➜ Reference
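The last two rows of the mapping (join to embedded document, foreign key to reference) can be illustrated with plain Python dictionaries standing in for documents. The field names, sample values, and the `resolve_university` helper are assumptions made for this sketch; no database driver is involved.

```python
from typing import Any, Dict

# Embedded document: the address lives inside the student document,
# so a single lookup returns everything a join would assemble in an RDBMS.
student_embedded: Dict[str, Any] = {
    "_id": 1,
    "name": "Tom",
    "address": {"city": "Ottawa"},
}

# Reference: the student stores only the _id of a document kept in another
# collection, which plays the role of a foreign key.
universities: Dict[int, Dict[str, Any]] = {10: {"_id": 10, "name": "CU"}}
student_referenced: Dict[str, Any] = {"_id": 1, "name": "Tom", "university_id": 10}


def resolve_university(student: Dict[str, Any],
                       universities_by_id: Dict[int, Dict[str, Any]]) -> Dict[str, Any]:
    """Follow the reference by hand: the application does the second lookup."""
    return universities_by_id[student["university_id"]]


print(student_embedded["address"]["city"])
print(resolve_university(student_referenced, universities)["name"])
```

Embedding favours reads of data that is always used together; references avoid duplicating data that is shared by many documents.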
• Map-reduce has two phases:
◦ A map stage that processes each document and emits one or more objects for each input document
◦ A reduce phase that combines the output of the map operation
◦ An optional finalize stage makes final modifications to the result
• Uses custom JavaScript functions
◦ Provides greater flexibility but is less efficient and more complex than the aggregation pipeline
• Can have output sets that exceed the 16 megabyte output limitation of the aggregation pipeline.
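The map, reduce, and optional finalize stages can be sketched in plain Python. This is not MongoDB's map-reduce command (which, as noted above, takes JavaScript functions); the `map_reduce` helper, the sample `orders` documents, and their field names are assumptions chosen only to show the data flow.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Optional, Tuple

Document = Dict[str, Any]


def map_reduce(
    documents: Iterable[Document],
    map_fn: Callable[[Document], Iterable[Tuple[Any, Any]]],
    reduce_fn: Callable[[Any, List[Any]], Any],
    finalize_fn: Optional[Callable[[Any, Any], Any]] = None,
) -> Dict[Any, Any]:
    # Map: each document emits zero or more (key, value) pairs.
    grouped: Dict[Any, List[Any]] = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    # Reduce: combine all values emitted for the same key.
    results = {key: reduce_fn(key, values) for key, values in grouped.items()}
    # Finalize (optional): post-process each reduced value.
    if finalize_fn is not None:
        results = {key: finalize_fn(key, value) for key, value in results.items()}
    return results


orders = [
    {"customer": "tom", "amount": 20},
    {"customer": "ann", "amount": 5},
    {"customer": "tom", "amount": 10},
]

totals = map_reduce(
    orders,
    map_fn=lambda doc: [(doc["customer"], doc["amount"])],
    reduce_fn=lambda key, values: sum(values),
    finalize_fn=lambda key, total: {"total": total},
)
print(totals)
```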
• Key-value pair
◦ Examples: DynamoDB, Azure Table Storage (ATS)
◦ (#key, #value): (Name, Tom), (Age, 25), (Role, Student), (University, CU)
• Document based
◦ Examples: MongoDB, Amazon SimpleDB, CouchDB
◦ [ { "Name": "Tom", "Age": 25, "Role": "Student", "University": "CU" } ]
• Column oriented database
◦ Examples: Bigtable (Google), HBase
◦ Row Id 1, columns: Name = Tom, Age = 25, Role = Student
• Graph database
◦ Examples: Neo4j, InfoGrid
◦ Figure labels: Student, Tom, CU, 25, Masters, Ottawa, Location
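The same record can be written in each of the four models using ordinary Python structures, which may make the differences easier to compare. The values follow the examples above; the edge labels in the graph model and the `neighbours` helper are assumptions, since the figure only lists node names.

```python
import json
from typing import Any, Dict, List, Tuple

# Key-value: a flat set of (key, value) pairs.
key_value: Dict[str, Any] = {"Name": "Tom", "Age": 25, "Role": "Student", "University": "CU"}

# Document: one self-describing JSON document.
document = json.loads('{"Name": "Tom", "Age": 25, "Role": "Student", "University": "CU"}')

# Column-oriented: a row id mapping to named columns.
column_family: Dict[int, Dict[str, Any]] = {1: {"Name": "Tom", "Age": 25, "Role": "Student"}}

# Graph: (node, edge label, node) triples. The edge labels are hypothetical.
graph_edges: List[Tuple[str, str, str]] = [
    ("Tom", "ROLE", "Student"),
    ("Tom", "STUDIES_AT", "CU"),
    ("CU", "LOCATION", "Ottawa"),
]


def neighbours(node: str, edges: List[Tuple[str, str, str]]) -> List[Tuple[str, str]]:
    """Return (edge label, target node) for every edge leaving `node`."""
    return [(label, target) for source, label, target in edges if source == node]


print(column_family[1]["Name"])
print(neighbours("Tom", graph_edges))
```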
MongoDB is a cross-platform, document-oriented database that provides high performance, high availability, and easy scalability. It is built on the concepts of collections and documents.
• Modern applications deal with huge amounts of data.
• Development is easy with MongoDB.
• Flexibility in deployment.
• Rich queries.
• Older database systems may not be compatible with this design.
It is also document-oriented storage: data is stored as JSON-style documents.
XML                                         JSON
A markup language.                          A way of representing objects.
More verbose than JSON.                     Uses fewer words.
Used to describe structured data.           Used to describe semi-structured data, including arrays.
JavaScript functions such as eval()         eval(), or more safely JSON.parse(), applied to
and JSON.parse() do not work on it.         JSON text returns the described object.
Example:                                    Example:
<car>                                       {
<company>Volkswagen</company>               "company": "Volkswagen",
...                                         "name": "Vento",
                                            ...
Cassandra
• What is it?
• How does it work?
• Hadoop
• Tools
• Architecture
What is it?
• Distributed database management system
• Designed for big data
• Scalable
• Fault tolerant
• No single point of failure
• Has an SQL-like query language
• NoSQL
How does it work?
• Organises data into tables
• Uses Cassandra Query Language (CQL)
• Does not allow subqueries or joins
• Supports Hadoop MapReduce
• Uses asynchronous masterless replication
◦ Gives low latency
• Allows indexing
• Allows batch analysis via Hadoop
How does Cassandra integrate with Hadoop?
• Support for MapReduce
• Integration with
◦ Apache Pig
◦ Apache Hive
• Can also act as a back end for Solr
Tools
• User interface (GUI)
◦ Cassandra GUI
◦ Toad for Cloud Databases
• Administration
◦ OpsCenter
◦ Cassandra Cluster Admin
• Other
◦ Client libraries for Java, Python, .NET, Perl, etc.
Architecture
• A peer-to-peer cluster
• No single point of failure
• Tunable consistency
◦ Is performance or accuracy more important?
• Query by key or key range
• Row-oriented data storage
• Rows can hold up to 2 billion columns
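Two of the points above, tunable consistency and key-range queries, can be illustrated with a little arithmetic and a sorted list. This is a sketch of the general quorum rule (a read is guaranteed to see the latest write when the number of replicas read plus the number written exceeds the replication factor), not Cassandra driver code; the function names are hypothetical, and the sorted list merely stands in for keys kept in order.

```python
import bisect
from typing import List


def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_replicas: int,
                           read_replicas: int) -> bool:
    # If read and write sets must overlap in at least one replica,
    # every read touches a replica that holds the latest write.
    return read_replicas + write_replicas > replication_factor


def key_range(sorted_keys: List[str], start: str, end: str) -> List[str]:
    """Return the keys k with start <= k <= end from an ordered key list."""
    lo = bisect.bisect_left(sorted_keys, start)
    hi = bisect.bisect_right(sorted_keys, end)
    return sorted_keys[lo:hi]


rf = 3
print(quorum(rf))
print(reads_see_latest_write(rf, quorum(rf), quorum(rf)))  # accuracy favoured
print(reads_see_latest_write(rf, 1, 1))                    # latency favoured
print(key_range(["a", "c", "e", "g"], "b", "e"))
```

Choosing fewer replicas per read or write lowers latency at the cost of possibly stale reads, which is the performance-versus-accuracy question raised in the list.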