Overview of Big Data Architecture, Hadoop Ecosystem & NoSQL Databases
Khanderao Kand
CTO GloMantra Inc.
Entrepreneur and Technologist
Twitter @khanderao
Big Data Use Cases
Predictive Analytics, Recommendations, Brand/Product Management
Social CRM: Brand Analytics, Consumer Sentiment Analysis, Competition Analysis
Risk and Fraud Reduction: Financial, Intrusion, Anti-Money Laundering
Text Analytics: Patent Search
Network and Log Analysis: Intrusion Analysis
Health Analysis: Epidemics, Communicable Diseases
Intelligence Analysis: CIA, Homeland Security
Societal: Social Movements Analysis, Political Campaign Analysis
Big Data Characteristics
3 V (Volume, Velocity and Variety)
Variety: Text, Images, Videos, Social Web, Web Logs, ERPs, CRM
Volume: Petabytes, millions of people, billions / trillions of records
Velocity: Speed of incoming data (likes, mobile, RFID, …)
Loosely structured and distributed data
Often involves time stamped events
Incomplete / non-perfect data
[Diagram: the three Vs: Velocity, Volume, Variety]
Big Data is not just Hadoop
…Processing Algorithms
Log processing for frauds / intrusions / anomalies
Behavioral analysis of consumers: for Ads / targeting
Pattern recognition e.g. stock trades / weather
Machine learning / correlating events
Text Processing / Text mining / Sentiment analysis
Search
Predictive Analytics
Typical Tools
Statistical Processing (e.g. R)
Machine Learning (Apache Mahout, UIMA)
Text Processing (WEKA, Mallet)
Complex Event Processing (S4, Esper)
Data Mining / Warehousing (JDM)
Big Data vs Traditional Architecture
Three Tier Architecture (separate Application Tier and Data Tier):
1. User requests a report
2. App requests data from Data Tier
3. Data Tier sends data to the App Tier
4. App Tier sends the report
Big Data Architecture (combined Application & Data Tier):
1. User launches a batch job
2. Master distributes application
3. Master launches app on nodes
5. User downloads results
Examples
Select top 10 [cor(SMP500)] from STOCKS from US
Select SUM(mentions) from twitter where hashtag = 'coke'
when ad = 'coke' in period 1 day over 5 year
If buyer age=45, gender=male,
past: Nike, sports: 49ers, drinks: Budweiser
currently searching: Harley Davidson
what would he buy?
Ads Correlation
Goal: Cluster users based on their past responses to ads (not on any known or learned attributes) and use those clusters to serve new ads to the users in each cluster
Approach:
AdClicked events are processed by the CF engine
(UserId, AdId, click) -> logged
The CF engine runs batch jobs to cluster users with similar responses to past ads
A CF-based optimization algorithm computes each user's predicted score for a given ad
Issues:
User click data is very sparse
Ads may be short-lived, so the CF batch must be rerun frequently (like indexing)
Mitigation:
Is there a way to correlate user demographics with click response (currently the correlation is low)? Can we infer a user's cluster from a demographics-based cluster?
Collaborative Filtering
Basic Concept:
Leverage information provided by the interactions of users to predict items of interest for a user
Motivation:
What to recommend to a user
Based on:
the user's past actions / feedback (clicks)
and
users who acted similarly to this user
Advantage:
Very good results
Content / language agnostic
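A minimal sketch of the idea above, in Python, assuming clicks are kept as a sparse mapping from user to the set of ads clicked. The `clicks` data, the `cosine` and `recommend` helpers and the scoring rule (similarity-weighted votes from other users) are illustrative assumptions, not the CF engine described in this deck.

```python
from math import sqrt

# Hypothetical click log: user id -> set of ad ids the user clicked.
clicks = {
    "u1": {"ad1", "ad2"},
    "u2": {"ad1", "ad2", "ad3"},
    "u3": {"ad4"},
}

def cosine(a, b):
    """Cosine similarity between two sets treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))

def recommend(user, clicks, top_n=3):
    """Score ads the user has not clicked by similarity-weighted votes."""
    seen = clicks[user]
    scores = {}
    for other, ads in clicks.items():
        if other == user:
            continue
        sim = cosine(seen, ads)
        if sim == 0.0:
            continue  # users with no overlap contribute nothing
        for ad in ads - seen:
            scores[ad] = scores.get(ad, 0.0) + sim
    # Highest score first; ties broken by ad id for a stable order.
    return sorted(scores, key=lambda ad: (-scores[ad], ad))[:top_n]

print(recommend("u1", clicks))
```

Storing only the clicked ads per user keeps the representation compact when click data is sparse; nothing in the sketch inspects ad content, which is why the approach is content and language agnostic.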
CF Recommendation
[Diagram: a user-by-ad matrix (users U1 … Uj, ads Ad1 … Adn) is fed to the CF algorithm, which outputs the top recommended ads for the user]
Serving Best Ad that user May click or view
[Diagram: ad-serving pipeline. Recoverable labels:]
Inputs: Site Content, User Clicks, Ad Data
MapReduce jobs: MR1, MR2, MR3
Stores: Cassandra / MongoDB, MySQL / Apache Jena
External services across the DMZ: Freebase, OpenCalais
Stages: Site Content Analysis and Classifier, Content Analysis, User Behavior, User-Interest, User Cluster Based AdReco
Algorithms: CF; Text Analytics + Classifier (SVM/Bayes); Classifiers + Statistical
Types of Big Data Platforms
Type | Concepts | Size | Vendors
In-Memory Databases | Specialized I/O and flash memory for faster I/O; specialized HW; locked in | Order of TBs | Oracle Exalytics, SAP HANA, Scaleout, Kognitio
Massively Parallel Computing (MPP) | Massive nodes; organized data; distributed query; special HW | Order of 10s of TB | Greenplum, Netezza, Teradata Aster, Sybase IQ
Map Reduce | Map and Reduce; horizontally scalable; commodity hardware | 100s of TB to petabytes | Hadoop
The image was taken from the Atacama desert in western South America by Yuri
Beletsky (Las Campanas Observatory, Carnegie Institution for Science) on July 11,
2012. Copyright Yuri Beletsky
Alignment…
Explosion of data from site logs, search engines, social media…
Google published papers on MapReduce and the Google File System, which inspired Doug Cutting, then working on Apache Lucene/Nutch; Hadoop was born
Yahoo took it further, with 1000 nodes in 2007-2008
It became possible to process very large data sets on commodity hardware
Apache Open source
Main Stars
Availability: Explosion of Data
Technology:
Hadoop
Cheaper storage and hardware
Scalability with Cloud
Requirement: Business need for intelligence out of the data
Hadoop
Apache Java Open Source
Google Idea, Yahoo original implementation, open sourced
Two Components:
HDFS distributed File System and
Map-Reduce Engine
Commodity Hardware
Very High Scalability
HDFS
Large Data Set
Write Once – Read Many
Fault Tolerant
Distributed File System
Name Node – Data Node
Fixed Size Data Blocks
Checksum
Files – Sequence of blocks
Replicated over Balanced Cluster
Heartbeat Report from Nodes
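The block, checksum and replication ideas above can be sketched in a few lines of Python. This is a toy model under stated assumptions: the block size is tiny for readability, CRC32 from `zlib` stands in for whatever checksum the file system uses, and `place_replicas` uses simple round-robin placement (real HDFS placement is rack-aware). All function names are hypothetical.

```python
import zlib

BLOCK_SIZE = 4    # bytes; tiny for illustration, real blocks are far larger
REPLICATION = 2   # copies per block; illustrative value

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Cut a file into fixed-size blocks, each paired with a CRC32 checksum."""
    return [
        (data[i:i + block_size], zlib.crc32(data[i:i + block_size]))
        for i in range(0, len(data), block_size)
    ]

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Round-robin placement: block index -> list of distinct data nodes.

    Assumes replication <= len(nodes) so the replicas land on different nodes.
    """
    placement = {}
    for b in range(num_blocks):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

def verify(block, checksum):
    """A reader recomputes the checksum to detect a corrupted block."""
    return zlib.crc32(block) == checksum

blocks = split_into_blocks(b"hello hdfs")
placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3"])
print(placement)
```

A file is then just the ordered sequence of its blocks, and losing one data node still leaves a copy of every block on another node.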
[Diagram: Client 1 and Client 2 read from and write to data nodes across Rack 1 … Rack N, coordinated by the NameNode; blocks are replicated across racks]
Hadoop Jobs-Tasks
[Diagram: Client 1 and Client 2 submit jobs to the JobTracker, which assigns tasks to TaskTrackers on nodes up to Rack N]
• Move the processing (Code) to Data instead of Data to Code
• JobTracker distributes and tracks tasks
• TaskTrackers on processing nodes communicate task status to the JobTracker
• If a task does not respond, it is marked as failed and relaunched on another node
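The relaunch behaviour in the bullets above can be illustrated with a small Python simulation. It is a sketch only: `run_job`, the node names and the `is_healthy` callback (standing in for TaskTracker status reports) are hypothetical and do not reflect Hadoop's scheduler code.

```python
def run_job(tasks, nodes, is_healthy):
    """Assign each task to a node; if that node fails, relaunch on another.

    is_healthy(node) stands in for the TaskTracker status report.
    Returns a dict mapping each task to the node that completed it.
    """
    completed = {}
    for i, task in enumerate(tasks):
        # Start from a different node per task to spread the load.
        offset = i % len(nodes)
        candidates = nodes[offset:] + nodes[:offset]
        for node in candidates:
            if is_healthy(node):
                completed[task] = node
                break
            # Unresponsive node: the task counts as failed there and the
            # loop relaunches it on the next candidate.
        else:
            raise RuntimeError(f"no healthy node for {task}")
    return completed

down = {"node2"}  # pretend this node stopped responding
result = run_job(["t0", "t1", "t2"], ["node1", "node2", "node3"],
                 lambda n: n not in down)
print(result)
```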
Map Reduce
• Two-step approach to solving a problem: Map, then Reduce
• Move the code to the data
• Map step processes data on the nodes
• Reduce step aggregates results from all Map nodes with reduce algorithm
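As a rough single-process illustration of the two steps, here is word count written as map, sort/shuffle and reduce phases in plain Python. The function names are hypothetical; on a real cluster the map calls would run on the nodes holding the data and the shuffle would move data between nodes.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Sort/shuffle: group all emitted values by key, in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reduce_phase(key, values):
    """Reduce: aggregate the values collected for one key."""
    return key, sum(values)

def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

counts = word_count(["big data", "Big Hadoop"])
print(counts)
```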
[Diagram: Input -> Map -> Sort / Shuffle -> Reduce -> Output]
Big Data Process: MR Job Train
[Diagram: a chain of five MapReduce jobs, each shown as Map -> Reduce -> Output]
Big Data Stack
[Diagram: big data stack in three layers, arranged along Speed and Scale axes]
Infrastructure Technology Layer: Hadoop; Esper, S4; kdb; HBase; MongoDB; MySQL
Processing Algorithms: Mahout, Matlab, R, SciPy, SAS, SPSS
Applications: Patents
Big Data Logical Architecture
[Diagram: logical architecture. Recoverable labels: Hadoop Map Reduce; Unstructured Data; Lucene; Nutch; Structured Data; RDBMS; Data logs; Streams; ETL; Data Integration; Workflow & Scheduler; System Admin; Monitoring; No-SQL; Hadoop Based; RDBMS; SOLR; Apps; BI; Visualization; Analytics Products; BI Tools - Dev]
Hadoop Ecosystem (Basic)
[Diagram: Hadoop ecosystem layers. Recoverable labels: Storage; Processing; Data Access; Workflow Orchestration; Network; HDFS; Map Reduce; HBase; HCatalog; Sqoop; Hive; Pig; Avro / Thrift; Zookeeper; Knox; Chukwa / Flume; Oozie; Ambari, Nagios, Ganglia; BI, Analytics, Apps, RDBMS]
Apache AVRO
RPC and serialization framework
Programming language independent
Schemas defined in JSON
Primarily used in Hadoop
Communication between Nodes and In/Out Hadoop
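Avro schemas are written in JSON, so a schema can be sketched with nothing more than Python's `json` module. The record name, namespace and fields below are made up for illustration, and `matches` is a rough structural check written for this example; it is not the Avro library's validation and covers only the three primitive types used here.

```python
import json

# Hypothetical record schema for one ad-log entry.
ad_event_schema = {
    "type": "record",
    "name": "AdEvent",
    "namespace": "example.ads",
    "fields": [
        {"name": "adtime", "type": "long"},
        {"name": "id", "type": "int"},
        {"name": "action", "type": "string"},
    ],
}

def matches(record, schema):
    """Rough check that a dict has exactly the schema's fields and types."""
    python_types = {"long": int, "int": int, "string": str}
    fields = schema["fields"]
    return (set(record) == {f["name"] for f in fields}
            and all(isinstance(record[f["name"]], python_types[f["type"]])
                    for f in fields))

# The JSON text is what a reader or writer in any language would consume.
schema_json = json.dumps(ad_event_schema, indent=2)
print(schema_json)
```

Because the schema travels as plain JSON, readers and writers in different programming languages can agree on the same record layout.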
Apache Thrift
Interface Definition Language for RPC
Language Independent
Binary Communication format
Layered Stack enabling debugging and monitoring
No config / No centralization
Developed by Facebook
IDL needs code generation for schema change
[Diagram: Thrift layered stack, top to bottom: Code, Service Client, Read/Write, TProtocol, TTransport]
Apache Hive
SQL-like HiveQL
Warehousing Apps
Compiles to MapReduce Tasks
Facebook, Netflix, etc.
hive> CREATE TABLE ADLOG (adtime timestamp, id int, action string);
hive> SHOW TABLES;
hive> DESCRIBE ADLOG;
hive> ALTER TABLE …
hive> FROM rawlog r INSERT OVERWRITE TABLE ADLOG
SELECT TRANSFORM(r.time, r.id, r.input)
AS (adtime, id, action) USING '/bin/log' WHERE r.time > '2008-08-09';
Apache Pig Latin
Higher Level scripting above Map Reduce
Procedural (unlike SQL) but easy like SQL
Constructs like FOREACH, GROUP
Supports User Defined Functions
From Yahoo
Good for Integrating and writing Hadoop Jobs
A = LOAD 'WordcountInput.txt';
B = MAPREDUCE 'wordcount.jar' STORE A INTO 'inputDir'
LOAD 'outputDir'
AS (word:chararray, count: int) `my.outputDir`;
Sqoop
Data Bulk Load
Data Import Export
RDBMS and NoSQL
HDFS, Hbase
Data is sliced
Slices are transferred via map-only jobs
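To make the slicing step concrete, the sketch below divides an inclusive key range into contiguous slices, one per map task. The `split_ranges` function and its even-split rule are assumptions made for illustration, not Sqoop's own code.

```python
def split_ranges(min_id, max_id, num_mappers):
    """Divide the inclusive key range [min_id, max_id] into contiguous slices."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges, start = [], min_id
    for m in range(num_mappers):
        # The first `extra` slices take one additional key each.
        size = base + (1 if m < extra else 0)
        if size == 0:
            continue  # more mappers than keys: skip empty slices
        ranges.append((start, start + size - 1))
        start += size
    return ranges

slices = split_ranges(1, 10, 4)
print(slices)
```

Each slice can then be read and written independently by its own map task, which is why no reduce step is needed.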
Chukwa
Hadoop Subproject
Large scale log processing
In/Out HDFS
Collection and analysis
Batch Oriented
Components:
Agents
Collectors
MR Jobs for Parsing & Archiving
HICC : Hadoop Infra Care Center Web App
Flume
Apache project
Large scale log processing
Supported by Cloudera
Log Stream
Components:
Agents
Channel
Clients: Log4JAppender, HTTP ..
Compared with Chukwa:
Near Real time (seconds) vs Minutes
No Central Config
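The source, channel and sink roles can be mimicked with a standard-library queue. This is a conceptual sketch only: the `source` and `sink` functions are hypothetical, and an in-memory `Queue` stands in for a channel; none of this is Flume's API.

```python
from queue import Queue

def source(events, channel):
    """Source side: put incoming log events onto the channel."""
    for event in events:
        channel.put(event)

def sink(channel, store):
    """Sink side: drain the channel into a destination (a list here)."""
    while not channel.empty():
        store.append(channel.get())

channel = Queue()   # buffers events between source and sink
delivered = []      # stands in for the final store, such as a file system
source(["GET /a", "GET /b"], channel)
sink(channel, delivered)
print(delivered)
```

The channel decouples the two sides, so a slow sink does not block the client that produces the log events.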
[Diagram: Flume data flow: a client sends events to an agent's source; events pass through the channel to sinks (Sink1, Sink2), which can feed further agents]
Big 'Fast' Data
Real-time ad hoc query:
Inspired by Google Percolator and Dremel
Cloudera: Impala
SQL-like query on HDFS
Lower latency
Bypasses MapReduce
Apache Drill
Apache Storm
High Volume Stream Processing
Twitter (acquired BackType)
Uses ZeroMQ
Concepts:
Spout
Bolt (like Map or Reduce)
Topology
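The three concepts above can be imitated with Python generators: a spout produces the stream, bolts consume and transform it, and the topology is the way they are wired together. The function names and sample sentences are hypothetical, and this single-process sketch is not Storm's API.

```python
from collections import Counter

def sentence_spout():
    """Spout: source of the stream (a fixed list here; unbounded in practice)."""
    for sentence in ["storm processes streams", "storm uses bolts"]:
        yield sentence

def split_bolt(stream):
    """Bolt (transform): turn each sentence tuple into word tuples."""
    for sentence in stream:
        for word in sentence.split():
            yield word

def count_bolt(stream):
    """Bolt (reduce-like): keep a running count per word."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
    return counts

# Topology: spout -> split bolt -> count bolt.
totals = count_bolt(split_bolt(sentence_spout()))
print(totals)
```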
[Diagram: Storm topology: two spouts feed a transform bolt, followed by further bolts including a reduce bolt]
Storm + Fusion Convergence –
Twitter Model
NoSQL & Map Reduce
NoSQL databases provide:
Schema flexibility
Aligned programming models
High volume and scalability on commodity hardware
Eventual consistency
Ability to interact with real-time applications and high-velocity data
Hadoop / HDFS caters more to batch processing; its gap with operational apps can be bridged by using NoSQL, avoiding duplication and latency of data
Such integration gives NoSQL high-performing MapReduce functionality
HBase is natively Hadoop-based
Cassandra has been augmented to work with Hadoop
MongoDB had MapReduce functionality, but not HDFS-based
MongoDB added HadoopBridge
Q & A
Andrew Leo
 
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes Partner Innovation Updates for May 2025
ThousandEyes
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx2025-05-Q4-2024-Investor-Presentation.pptx
2025-05-Q4-2024-Investor-Presentation.pptx
Samuele Fogagnolo
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

Big Data Architectures and Ecosystem + NoSQL

  • 1. Overview of Big Data Architecture, Hadoop Ecosystem & NoSQL Databases. Khanderao Kand, CTO, GloMantra Inc.; Entrepreneur and Technologist; Twitter @khanderao
  • 2. Big Data Use Cases. Predictive analytics, recommendations, brand/product management. Social CRM: brand analytics, consumer sentiment analysis, competition analysis. Risk and fraud reduction: financial, intrusion, anti-money laundering. Text analytics: patent search. Network and log analysis: intrusion analysis. Health analysis: epidemics, communicable diseases. Intelligence analysis: CIA, Homeland Security. Societal: social movements analysis, political campaign analysis.
  • 3. Big Data Characteristics. The 3 Vs (Volume, Velocity and Variety). Variety: text, images, videos, social web, web logs, ERPs, CRM. Volume: petabytes, millions of people, billions or trillions of records. Velocity: speed of incoming data (likes, mobile, RFID, …). Loosely structured and distributed data. Often involves time-stamped events. Incomplete or imperfect data.
  • 4. Big Data is not just Hadoop … Processing Algorithms. Log processing for fraud, intrusions and anomalies. Behavioral analysis of consumers for ads and targeting. Pattern recognition, e.g. stock trades, weather. Machine learning and event correlation. Text processing, text mining, sentiment analysis. Search. Predictive analytics.
  • 5. Typical Tools. Statistical processing (e.g. R). Machine learning (Apache Mahout, UIMA). Text processing (WEKA, Mallet). Complex event processing (S4, Esper). Data mining / warehousing (JDM).
  • 6. Big Data vs Traditional Architecture (diagram). Three-tier architecture, with separate application and data tiers: (1) user requests a report; (2) app tier requests data from the data tier; (3) data tier sends data to the app tier; (4) app tier sends the report. Big data architecture, with a combined application and data tier: (1) user launches a batch job; (2) master distributes the application; (3) master launches the app on nodes; (5) user downloads results.
  • 7. Examples. Select top 10 [cor(SMP500)] from STOCKS from US. Select SUM(mentions) from twitter where hashtag = 'coke' when ad = 'coke' in period 1 day over 5 years. If buyer age=45, gender=male, past: Nike, sports: 49ers, drinks: Budweiser, currently searching: Harley Davidson, what would he buy?
  • 8. Ads Correlation. Goal: cluster users based on their past responses to ads (not on any known or learned attributes) and use that knowledge to serve new ads to users in each cluster. Approach: AdClicked events are processed by the CF engine; (UserId, AdId, click) is logged; the CF engine runs batch processing to cluster users with similar responses to past ads; a CF-based optimization algorithm produces each user's predicted score for a given ad. Issues: user click data is very sparse; ads may be short-lived, so frequent CF batch runs (like indexing) are needed. Mitigation: is there any way to correlate user demographics with click response (currently the correlation is low)? Can we infer a user's cluster from a demographics-based cluster?
  • 9. Collaborative Filtering. Basic concept: leverage information from user interactions to predict items of interest for a user. Motivation: what to recommend to a user. Based on: the user's past actions and feedback (clicks), and users who acted similarly to this user. Advantages: very good results; content and language agnostic.
  • 10. CF Recommendation (diagram). Users U1 … Uj and ads Ad1 … Adn feed the CF algorithm, which outputs the recommended top ads for the user.
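The collaborative filtering idea on slides 9 and 10 can be sketched in a few lines. This is a minimal user-based variant, not the deck's actual CF engine: the click log, the user and ad ids, and the cosine similarity over binary click sets are all assumptions made for illustration.

```python
from math import sqrt

# Hypothetical click log: user id -> set of ad ids the user clicked.
clicks = {
    "u1": {"ad1", "ad2", "ad3"},
    "u2": {"ad1", "ad2", "ad4"},
    "u3": {"ad5"},
}


def similarity(a, b):
    """Cosine similarity between two users' binary click sets."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(user, clicks, top_n=3):
    """Rank ads the user has not clicked by the similarity of users who did."""
    seen = clicks[user]
    scores = {}
    for other, other_ads in clicks.items():
        if other == user:
            continue
        sim = similarity(seen, other_ads)
        if sim == 0.0:
            continue  # users with no overlapping clicks contribute nothing
        for ad in other_ads - seen:
            scores[ad] = scores.get(ad, 0.0) + sim
    # Highest score first; ties broken by ad id for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [ad for ad, _ in ranked[:top_n]]
```

A user with no overlapping clicks (u3 here) gets no recommendations, which is the sparsity issue slide 8 raises.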
  • 11. Serving the Best Ad That a User May Click or View (diagram). Components: site content; ad data; user clicks; MR1, MR2, MR3; Cassandra / MongoDB stores; DMZ; Freebase and OpenCalais content analysis; site content analysis and classifier; user behavior; user interest; user-cluster-based ad recommendation; mySQL / Apache Jena. Algorithms: CF; text analytics + classifier (SVM/Bayes); classifiers + statistical.
  • 12. Types of Big Data Platforms (type: concepts; size; vendors).
      In-Memory Databases: specialized I/O and flash memory for faster I/O, specialized hardware, locked in; order of TBs; Oracle Exalytics, SAP HANA, ScaleOut, Kognitio.
      Massively Parallel Computing (MPP): massive number of nodes, organized data, distributed query, special hardware; order of 10s of TB; Greenplum, Netezza, Teradata Aster, Sybase IQ.
      Map Reduce: map and reduce, horizontally scalable, commodity hardware; 100s of TB to petabytes; Hadoop.
  • 13. The image was taken from the Atacama desert in western South America by Yuri Beletsky (Las Campanas Observatory, Carnegie Institution for Science) on July 11, 2012. Copyright Yuri Beletsky
  • 14. Alignment… Explosion of data from site logs, search engines, social media… Google published papers on MapReduce and the Google File System, which inspired Doug Cutting, then working on Apache Lucene/Nutch; Hadoop was born. Yahoo took it further, with 1000 nodes in 2007-2008. It became possible to process very large data on commodity hardware. Apache open source.
  • 15. Main Stars. Availability: explosion of data. Technology: Hadoop; cheaper storage and hardware; scalability with cloud. Requirement: business need for intelligence from the data.
  • 16. Hadoop. Apache, Java, open source. Google's idea, Yahoo's original implementation, open sourced. Two components: HDFS (distributed file system) and the MapReduce engine. Commodity hardware. Very high scalability.
  • 17. HDFS. Large data sets. Write once, read many. Fault-tolerant distributed file system. NameNode and DataNodes. Fixed-size data blocks. Checksums. Files are sequences of blocks, replicated over a balanced cluster. Heartbeat reports from nodes. (Diagram labels: NameNode; Client 1; Client 2; Read; Write; Replication; Rack 1 … Rack N.)
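The "fixed-size blocks, replicated over the cluster" idea can be illustrated with a small sketch. It is not HDFS code: the function names and node names are hypothetical, and the round-robin placement is a deliberate simplification (real HDFS placement also takes racks into account).

```python
def split_into_blocks(file_size, block_size):
    """Return (offset, length) pairs; only the last block may be shorter."""
    return [
        (offset, min(block_size, file_size - offset))
        for offset in range(0, file_size, block_size)
    ]


def place_replicas(num_blocks, nodes, replication=3):
    """Assumed round-robin placement: each block lands on distinct nodes."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


# Example: a 150-unit file with 64-unit blocks on a 4-node cluster.
blocks = split_into_blocks(150, 64)
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
```

Because every block has several replicas on different nodes, losing one node leaves each block readable from another, which is the fault tolerance the slide refers to.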
  • 18. Hadoop Jobs and Tasks. Move the processing (code) to the data instead of the data to the code. The JobTracker distributes and tracks tasks. TaskTrackers on processing nodes communicate task status to the JobTracker. If a task does not respond, it is marked as failed and relaunched on another node. (Diagram labels: JobTracker; Client 1; Client 2; TaskTracker; TaskTracker 2; Rack N.)
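The "mark as failed and relaunch on another node" behavior can be sketched as a simple retry loop. This is an illustration only, not the JobTracker implementation: `run_job`, `flaky_runner`, and the node names are hypothetical, and a raised exception stands in for a task that stops responding.

```python
def run_job(tasks, nodes, run_task, max_attempts=3):
    """Run each task on a node; on failure, relaunch it on the next node."""
    results = {}
    for i, task in enumerate(tasks):
        for attempt in range(max_attempts):
            node = nodes[(i + attempt) % len(nodes)]
            try:
                results[task] = (node, run_task(task, node))
                break
            except RuntimeError:
                continue  # attempt marked as failed; try another node
        else:
            raise RuntimeError(f"task {task!r} failed {max_attempts} times")
    return results


def flaky_runner(task, node):
    """Hypothetical task runner in which node n1 never responds."""
    if node == "n1":
        raise RuntimeError("no response")
    return f"{task}@{node}"


outcome = run_job(["t0", "t1"], ["n1", "n2"], flaky_runner)
```

Here task t0 is first assigned to n1, fails, and completes on n2, while t1 runs on n2 directly.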
  • 19. Map Reduce. A two-step approach (Map, then Reduce) to solving a problem. Move the code to the data. The Map step processes data on the nodes. The Reduce step aggregates results from all Map nodes with a reduce algorithm. (Diagram: Input → Map → Sort / Shuffle → Reduce → Output.)
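The Map, Sort / Shuffle, and Reduce steps can be sketched with a word count run in a single process. This illustrates only the programming model; it does not use the Hadoop API, and the function names are chosen for this example.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.split():
        yield word.lower(), 1


def shuffle(pairs):
    """Sort / shuffle: group all values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key, values):
    """Reduce: aggregate all values emitted for one key."""
    return key, sum(values)


def word_count(lines):
    pairs = [kv for line in lines for kv in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

In a real cluster, the map calls run on the nodes that hold the input blocks, and only the grouped intermediate pairs travel over the network to the reducers.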
  • 20. Big Data Process: MR Job Train (diagram). Five Map → Reduce → Output stages.
  • 21. Big Data Stack (diagram). Layers: Infrastructure; Technology Layer; Processing Algorithms; Applications. Axes: Speed and Scale. Tools shown: Hadoop; Esper, S4; kdb; HBase; MongoDB; MySQL; Mahout; Matlab; R; SciPy; SAS; SPSS. Also labeled: Patents.
  • 22. Big Data Logical Architecture (diagram). Labels: Hadoop MapReduce; unstructured data; Lucene, Nutch; structured data; RDBMS; data logs; streams; ETL / data integration; workflow & scheduler; system admin / monitoring; NoSQL; Hadoop-based; RDBMS; NoSQL; Solr; apps; BI; visualization; analytics products; BI tools; dev.
  • 23. Hadoop Ecosystem (Basic) (diagram). Layer labels: Storage; Processing; Data Access; Workflow; Orchestration; Network. Components: HDFS; MapReduce; HCatalog; HBase; Sqoop; Hive; Pig; Avro / Thrift; ZooKeeper; Knox; Chukwa / Flume; Oozie; Ambari, Nagios, Ganglia. Also shown: BI; analytics; apps; RDBMS.
  • 25. Apache Avro. RPC and serialization framework. Programming-language independent. JSON format (schemas are defined in JSON). Primary use: Hadoop, for communication between nodes and for data going into and out of Hadoop.
  • 26. Apache Thrift. Interface Definition Language (IDL) for RPC. Language independent. Binary communication format. Layered stack enabling debugging and monitoring. No configuration, no centralization. Developed by Facebook. The IDL requires code generation on schema change. (Diagram stack: Code; Service Client; Read/Write; TProtocol; TTransport.)
  • 27. Apache Hive. SQL-like HiveQL. Warehousing apps. Compiles to MapReduce tasks. Used by Facebook, Netflix, etc. Example session:
      hive> CREATE TABLE ADLOG (adtime timestamp, id int, action string);
      hive> SHOW TABLES;
      hive> DESCRIBE ADLOG;
      hive> ALTER TABLE …
      hive> FROM rawlog r INSERT OVERWRITE TABLE ADLOG SELECT TRANSFORM(r.time, r.id, r.input) AS (adtime, id, action) USING '/bin/log' WHERE r.time > '2008-08-09';
  • 28. Apache Pig (Pig Latin). Higher-level scripting above MapReduce. Procedural (unlike SQL) but as easy as SQL. Constructs like FOREACH, GROUP. Supports user-defined functions. From Yahoo. Good for integrating and writing Hadoop jobs. Example:
      A = LOAD 'WordcountInput.txt';
      B = MAPREDUCE 'wordcount.jar' STORE A INTO 'inputDir' LOAD 'outputDir' AS (word:chararray, count: int) `my.outputDir`;
  • 29. Sqoop. Bulk data load. Data import and export between RDBMS / NoSQL and HDFS / HBase. Data is sliced, and the slices are transferred via map-only jobs.
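The slicing step can be sketched as dividing a key range among mappers, so that each map task transfers one slice independently and no reduce step is needed. This is an assumption-laden illustration, not Sqoop's code: it assumes an integer key column with a known minimum and maximum, and `split_key_range` is a hypothetical name.

```python
def split_key_range(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into contiguous slices."""
    total = hi - lo + 1
    size, extra = divmod(total, num_mappers)
    slices = []
    start = lo
    for i in range(num_mappers):
        # The first `extra` slices take one additional key each.
        length = size + (1 if i < extra else 0)
        if length == 0:
            continue  # fewer keys than mappers: skip empty slices
        slices.append((start, start + length - 1))
        start += length
    return slices


# Example: keys 1..10 split across 4 map tasks.
slices = split_key_range(1, 10, 4)
```

Each slice would then correspond to one range query against the source table, for example rows whose key falls between the slice's two bounds.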
  • 30. Chukwa. Hadoop subproject. Large-scale log processing. In/out of HDFS. Collection and analysis. Batch oriented. Components: agents; collectors; MR jobs for parsing and archiving; HICC (Hadoop Infrastructure Care Center) web app.
  • 31. Flume. Apache project. Large-scale log processing. Supported by Cloudera. Log streams. Components: agents; channels; clients (Log4jAppender, HTTP, ..). Compared with Chukwa: near real time (seconds) vs minutes; no central config. (Diagram labels: Client; Source; Flume Channel; Agent; Sink 1; Sink 2.)
  • 32. Big 'Fast' Data. Real-time ad hoc query, inspired by Google Percolator and Dremel. Cloudera Impala: SQL-like queries on HDFS, lower latency, bypasses MapReduce. Apache Drill.
  • 33. Apache Storm. High-volume stream processing. From Twitter (which acquired BackType). Uses ZeroMQ. Concepts: spout; bolt (like Map or Reduce); topology. (Diagram labels: Spout; Spout; Bolt (Transform); Bolt; Bolt (Reduce); Bolt.)
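The spout, bolt, and topology concepts can be sketched with Python generators chained in one process. This does not use the Storm API; the function names are hypothetical and a plain list stands in for an unbounded stream.

```python
from collections import Counter


def sentence_spout(sentences):
    """Spout: the source of the stream, emitting one tuple at a time."""
    for sentence in sentences:
        yield sentence


def split_bolt(stream):
    """Bolt (transform): split each incoming sentence into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt (reduce-like): keep a running count and emit each update."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


def run_topology(sentences):
    """Topology: wire the spout into the chain of bolts."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))
```

Unlike a MapReduce job, each bolt emits results as tuples arrive instead of waiting for the whole input, which is what makes the model suitable for continuous streams.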
  • 34. Storm + Fusion Convergence – Twitter Model
  • 35. NoSQL & Map Reduce. NoSQL databases provide: schema flexibility and aligned programming models; high volume and scalability on commodity hardware; eventual consistency; interaction with real-time applications and high-velocity data. Hadoop / HDFS caters more to batch processing; its gap with operational apps can be bridged by using NoSQL, which avoids duplication and latency of data. Such integration gives NoSQL high-performing MapReduce functionality. HBase is natively Hadoop based. Cassandra was augmented to work with Hadoop. MongoDB had MapReduce functionality, but it was not HDFS based; MongoDB added HadoopBridge.
  • 36. Q & A