Slide 1 www.edureka.in/hadoop
Slide 2
Hello There!!
My name is Annie.
Let me test your Hadoop 1.x
knowledge?
Annie’s Introduction
Slide 3
Hello There!!
My name is Annie.
I love quizzes and
puzzles and I am here to
make you guys think and
answer my questions.
Can you store 1 billion files in a Hadoop 1.x cluster?
- Yes
- No
Annie’s Question
Slide 4
No. Even though you have hundreds of DataNodes in the cluster,
the NameNode keeps all its metadata in memory, so you are limited
to a maximum of only 50-100M files in the entire cluster because of
a Single NameNode in Hadoop 1.x.
Annie’s Answer
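The file-count ceiling in this answer follows from NameNode heap size. As a rough sketch only, the estimate below assumes the commonly quoted rule of thumb of about 150 bytes of heap per namespace object (file, directory or block); that constant and the function name are illustrative assumptions, not figures from the slides.

```python
# Rule-of-thumb sizing only: about 150 bytes of NameNode heap per namespace
# object (file, directory, or block). This constant is an assumption.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files: int, blocks_per_file: int = 1) -> int:
    """Estimate the heap needed to hold metadata for num_files files."""
    # Each file contributes one file object plus one object per block.
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


if __name__ == "__main__":
    for files in (100_000_000, 1_000_000_000):
        gb = estimate_namenode_heap_bytes(files) / 1e9
        print(f"{files:,} files -> about {gb:.0f} GB of NameNode heap")
```

Under these assumptions, 100 million single-block files need roughly 30 GB of heap and 1 billion need roughly 300 GB, which is why a single NameNode becomes the limit long before the DataNodes do.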
Slide 5
A Hadoop 1.x cluster can have multiple HDFS Namespaces.
- True
- False
Annie’s Question
Slide 6
False. Not possible with Hadoop 1.x.
Annie’s Answer
Slide 7
Which of the following is (are) a significant disadvantage in Hadoop
1.0?
- ‘Single Point Of Failure’ of NameNode
- Too much burden on Job Tracker
Annie’s Question
Slide 8
Single Point of Failure of NameNode and too much burden
on Job Tracker.
Annie’s Answer
Slide 9
Can you use hundreds of Hadoop DataNodes for any processing other
than MapReduce in Hadoop 1.x?
- Yes
- No
Annie’s Question
Slide 10
No. Hadoop 1.x dedicates all the DataNode resources to Map and
Reduce slots, with little or no room for processing any other
workload.
Annie’s Answer
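The slot model behind this answer can be illustrated with a small sketch. In MRv1 each TaskTracker is configured with a fixed number of map slots and reduce slots, and one kind of slot cannot run the other kind of task. The function name and the numbers below are hypothetical and only illustrate the idea.

```python
def run_on_tasktracker(map_slots: int, reduce_slots: int,
                       pending_maps: int, pending_reduces: int) -> dict:
    """Model one scheduling round on a node with fixed, typed slots."""
    # A map task can only use a map slot; a reduce task only a reduce slot.
    running_maps = min(map_slots, pending_maps)
    running_reduces = min(reduce_slots, pending_reduces)
    idle_slots = (map_slots - running_maps) + (reduce_slots - running_reduces)
    waiting_tasks = (pending_maps - running_maps) + (pending_reduces - running_reduces)
    return {
        "running_maps": running_maps,
        "running_reduces": running_reduces,
        "idle_slots": idle_slots,
        "waiting_tasks": waiting_tasks,
    }


if __name__ == "__main__":
    # Six map tasks are pending, but only four map slots exist; the two
    # reduce slots stay idle even though work is waiting.
    print(run_on_tasktracker(map_slots=4, reduce_slots=2,
                             pending_maps=6, pending_reduces=0))
```

In this example two tasks wait while two slots sit idle, and nothing other than a map or reduce task can use the spare capacity.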
Slide 11
Can you use Hadoop for Real-time processing?
- Yes
- No
Annie’s Question
Slide 12
No. Hadoop is designed and developed for massively parallel batch
processing.
Annie’s Answer
Limitations of Hadoop 1.x
- No horizontal scalability of NameNode
- Does not support NameNode High Availability
- Overburdened JobTracker
- Not possible to run non-MapReduce Big Data applications on HDFS
- Does not support multi-tenancy
Slide 14 www.edureka.in/hadoop
Hadoop 1.x – In Summary
[Figure: Hadoop 1.x architecture. A Client talks to HDFS (NameNode, Secondary NameNode, and DataNodes holding data blocks) and to MapReduce (one Job Tracker, with a Task Tracker running Map and Reduce tasks alongside each DataNode).]
Slide 15 www.edureka.in/hadoop
Problem | Description
NameNode – No Horizontal Scalability | Single NameNode and single namespace, limited by NameNode RAM
NameNode – No High Availability (HA) | NameNode is a single point of failure; needs manual recovery using the Secondary NameNode in case of failure
Job Tracker – Overburdened | Spends a significant portion of time and effort managing the life cycle of applications
MRv1 – Only Map and Reduce tasks | Large volumes of data stored in HDFS remain unutilized and cannot be used for other workloads such as graph processing
Hadoop 1.x - Challenges
[Figure: NameNode - No High Availability, NameNode - No Horizontal Scale. A single NameNode holds the namespace (NS) and block management; the Client gets block locations from the NameNode and reads data from the DataNodes.]
Slide 16 www.edureka.in/hadoop
NameNode – Scale and HA
Slide 17 www.edureka.in/hadoop
NameNode – Single Point of Failure
- Secondary NameNode:
  - "Not a hot standby" for the NameNode
  - Connects to NameNode every hour*
  - Housekeeping, backup of NameNode metadata
  - Saved metadata can be used to rebuild a failed NameNode
[Figure: the NameNode (single point of failure) hands its metadata to the Secondary NameNode, which says: "You give me metadata every hour, I will make it secure."]
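The housekeeping done by the Secondary NameNode amounts to periodically merging the edit log into the saved namespace image (a checkpoint). The sketch below is a toy model of that merge under simplifying assumptions: the namespace is a plain dictionary, and the edit tuples and function names are hypothetical, not Hadoop's on-disk formats.

```python
def apply_edits(fsimage: dict, edits: list) -> dict:
    """Return a new namespace image with the logged edits applied in order."""
    image = dict(fsimage)  # copy so the previous checkpoint is left untouched
    for op, path, value in edits:
        if op == "create":
            image[path] = value
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown edit op: {op}")
    return image


def checkpoint(fsimage: dict, edits: list):
    """Merge the edit log into the image and start a fresh, empty edit log."""
    return apply_edits(fsimage, edits), []


if __name__ == "__main__":
    old_image = {"/a": 1}
    edit_log = [("create", "/b", 2), ("delete", "/a", None)]
    new_image, new_log = checkpoint(old_image, edit_log)
    print(new_image, new_log)
```

Because the checkpoint is only taken periodically, edits made after the last merge are not in the saved image, which is one reason the Secondary NameNode is not a hot standby.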
Slide 18 www.edureka.in/hadoop
Job Tracker – Overburdened
CPU
- Spends a very significant portion of time and effort managing the life cycle of applications
Network
- Single listener thread to communicate with thousands of Map and Reduce jobs
[Figure: one Job Tracker communicating with many Task Trackers.]
Slide 19 www.edureka.in/hadoop
MRv1 – Unpredictability in Large Clusters
As the cluster size grows and reaches 4000 nodes:
- Cascading Failures
  DataNode failures result in a serious deterioration of overall cluster performance, because attempts to re-replicate data overload the live nodes through network flooding.
- Multi-tenancy
  As clusters increase in size, you may want to employ them for a variety of models. MRv1 dedicates its nodes to Hadoop, and they cannot be re-purposed for other applications and workloads in an organization. With the growing popularity and adoption of cloud computing among enterprises, this becomes more important.
Unutilized Data in HDFS
- Terabytes and petabytes of data in HDFS can only be used for MapReduce processing
Slide 11 www.edureka.in/hadoop
Introducing Hadoop 2.0
Features | Hadoop 1.x | Hadoop 2.0
HDFS Federation | One NameNode and one namespace | Multiple NameNodes and namespaces
NameNode High Availability | Not present | Highly available
YARN - Processing Control and Multi-tenancy | JobTracker, TaskTracker | Resource Manager, Node Manager, App Master, Capacity Scheduler
Other important Hadoop 2.0 features
- HDFS Snapshots
- NFSv3 access to data in HDFS
- Support for running Hadoop on MS Windows
- Binary compatibility for MapReduce applications built on Hadoop 1.0
- Substantial amount of integration testing with the rest of the projects (such as Pig and Hive) in the Hadoop ecosystem
Slide 12 www.edureka.in/hadoop
[Figure: HDFS in Hadoop 1.0 vs. Hadoop 2.0. Hadoop 1.0: one NameNode holds the namespace (NS) and block management over DataNode storage. Hadoop 2.0: NameNodes NN-1 … NN-k … NN-n manage namespaces NS1 … NSk … NSn, each with its own block pool (Pool 1 … Pool k … Pool n), over common storage on DataNodes 1 … m.]
Slide 22 www.edureka.in/hadoop
http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html
Hadoop 2.0 Cluster Architecture - Federation
Slide 23 www.edureka.in/hadoop
Annie’s Question
How does HDFS Federation help HDFS Scale horizontally?
A) Reduces the load on any single NameNode by using multiple
independent NameNodes to manage individual parts of the file system
namespace.
B) Provides cross-data centre (non-local) support for HDFS, allowing a cluster
administrator to split the Block Storage outside the local cluster.
Slide 24 www.edureka.in/hadoop
Annie’s Answer
(A). In order to scale the name service horizontally, HDFS federation uses
multiple independent NameNodes. The NameNodes are federated, that is, the
NameNodes are independent and do not require coordination with each other.
Slide 25
Annie’s Question
You have configured two NameNodes to manage /marketing and /finance
namespaces respectively. What will happen if you try to ‘put’ a file in
/accounting directory?
Slide 26
Annie’s Answer
The ‘put’ will fail. None of the namespaces manages that path, so you will get
an IOException with a “No such file or directory” error.
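The behaviour in this answer can be sketched as a client-side mount table that maps path prefixes to NameNodes. This is a toy model, not the Hadoop API: the class name, the NameNode labels `nn1` and `nn2`, and the use of Python's `FileNotFoundError` in place of Hadoop's IOException are all assumptions made for illustration.

```python
class MountTable:
    """Maps namespace prefixes to the NameNode that manages them."""

    def __init__(self, mounts: dict):
        self.mounts = mounts

    def resolve(self, path: str) -> str:
        """Return the NameNode responsible for path, or fail if none is."""
        for prefix, namenode in self.mounts.items():
            # Match the prefix itself or anything beneath it, but not
            # look-alike siblings such as /marketing2.
            if path == prefix or path.startswith(prefix + "/"):
                return namenode
        raise FileNotFoundError(f"No such file or directory: {path}")


if __name__ == "__main__":
    table = MountTable({"/marketing": "nn1", "/finance": "nn2"})
    print(table.resolve("/marketing/q1.csv"))
    try:
        table.resolve("/accounting/ledger.csv")
    except FileNotFoundError as err:
        print("put failed:", err)
```

Each NameNode only sees its own part of the namespace, so a path outside every configured prefix has no owner and the write cannot proceed.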
Slide 27
[Figure: Hadoop 2.0 cluster with NameNode High Availability (HDFS) and Next Generation MapReduce (YARN). The Active NameNode logs all namespace edits to shared edit logs on NFS storage, with a single writer (fencing); the Standby NameNode reads the edit logs and applies them to its own namespace. A Resource Manager coordinates Node Managers, which run Containers and App Masters alongside the DataNodes; a Client connects to the cluster.]
Hadoop 2.0 Cluster Architecture - HA
HDFS HIGH AVAILABILITY
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html
Slide 28
Hadoop 2.0 Cluster Architecture - HA
[Figure: NameNode recovery in Hadoop 1.0 vs. High Availability in Hadoop 2.0. Hadoop 1.0: NameNode and Secondary NameNode; manual recovery using the Secondary NameNode. Hadoop 2.0: Active NameNode and Standby NameNode; automatic failover to the Standby NameNode. Other labels: edit logs, metadata, FSImage.]
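The Active/Standby arrangement with a single-writer shared edit log can be sketched as follows. This is a simplified model under stated assumptions: fencing is represented by an epoch counter on the shared log, which is an illustrative choice, and the class and method names are hypothetical rather than Hadoop's.

```python
class SharedEditLog:
    """Shared edit storage that accepts writes only from the current epoch."""

    def __init__(self):
        self.epoch = 0
        self.entries = []

    def new_epoch(self) -> int:
        self.epoch += 1
        return self.epoch

    def append(self, writer_epoch, edit):
        # Single writer: anything holding an older epoch is fenced off.
        if writer_epoch != self.epoch:
            raise PermissionError("fenced: stale writer epoch")
        self.entries.append(edit)


class NameNode:
    def __init__(self, name: str, log: SharedEditLog):
        self.name = name
        self.log = log
        self.epoch = None      # None means this node has never been active
        self.namespace = []    # edits applied to the local namespace
        self.applied = 0       # how many shared-log entries have been applied

    def catch_up(self):
        """Standby behaviour: read new edits and apply them locally."""
        new_edits = self.log.entries[self.applied:]
        self.namespace.extend(new_edits)
        self.applied += len(new_edits)

    def become_active(self):
        self.catch_up()
        self.epoch = self.log.new_epoch()

    def write(self, edit):
        self.log.append(self.epoch, edit)  # raises if this node is fenced
        self.catch_up()


if __name__ == "__main__":
    log = SharedEditLog()
    nn1, nn2 = NameNode("nn1", log), NameNode("nn2", log)
    nn1.become_active()
    nn1.write("mkdir /a")
    nn2.catch_up()          # standby tails the shared log
    nn2.become_active()     # failover: nn2 takes over, nn1 is now stale
    try:
        nn1.write("mkdir /b")
    except PermissionError as err:
        print("old active rejected:", err)
    nn2.write("mkdir /c")
    print(log.entries)
```

The point of the design is that the standby already holds an up-to-date namespace at failover time, and the old active cannot corrupt the log after it has been replaced.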
Slide 29
Annie’s Question
Which of the following disadvantages of Hadoop 1.0 was NameNode HA
developed to overcome?
a) Single Point Of Failure of NameNode
b) Too much burden on Job Tracker
Slide 30
Annie’s Answer
Single Point of Failure of NameNode.
[Figure: Hadoop ecosystem stack, without and with YARN. Both stacks: Apache Oozie (Workflow) over Pig Latin (Data Analysis), Hive (DW System), the MapReduce Framework and HBase, on HDFS (Hadoop Distributed File System). The second stack adds YARN and other YARN frameworks (MPI, Giraph).]
Slide 23 www.edureka.in/hadoop
YARN
Cluster Resource Management
YARN adds a more general interface to run non-MapReduce jobs (such as Graph Processing) within the Hadoop framework
YARN and Hadoop Ecosystem
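The more general interface described above can be pictured as a resource manager that hands out containers by resource size rather than by task type. The sketch below is a toy model with hypothetical class names and a first-fit placement rule chosen for simplicity; it does not reproduce the YARN API or its real scheduling policy.

```python
class NodeManager:
    def __init__(self, name: str, memory_mb: int):
        self.name = name
        self.free_mb = memory_mb


class ResourceManager:
    def __init__(self, nodes: list):
        self.nodes = nodes

    def allocate(self, app: str, memory_mb: int):
        """Grant a container on the first node with enough free memory, else None."""
        for node in self.nodes:
            if node.free_mb >= memory_mb:
                node.free_mb -= memory_mb
                return {"app": app, "node": node.name, "memory_mb": memory_mb}
        return None


if __name__ == "__main__":
    rm = ResourceManager([NodeManager("nm1", 4096), NodeManager("nm2", 2048)])
    # The resource manager does not care what kind of framework is asking.
    for app, mb in [("mapreduce-job", 3072), ("giraph-job", 2048), ("mpi-job", 2048)]:
        print(app, "->", rm.allocate(app, mb))
```

Because a container is just a slice of a node's resources, a graph or MPI application can use capacity that MRv1 would have reserved for map or reduce slots.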
[Figure: workloads that can run on YARN]
- BATCH (MapReduce)
- INTERACTIVE (Text)
- ONLINE (HBase)
- STREAMING (Storm, S4, …)
- GRAPH (Giraph)
- IN-MEMORY (Spark)
- HPC MPI (OpenMPI)
- OTHER (Search, Weave..)
Slide 32 www.edureka.in/hadoop
http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html
YARN – Moving beyond MapReduce
Slide 33 www.edureka.in/hadoop
- Organizes jobs into queues
- Queue shares as percentages of the cluster
- FIFO scheduling within each queue
- Data locality-aware scheduling
- Hierarchical Queues: to manage resources within an organization.
- Capacity Guarantees: a fraction of the total available capacity is allocated to each queue.
- Security: to safeguard applications from other users.
- Elasticity: resources are available to queues in a predictable and elastic manner.
- Multi-tenancy: a set of limits prevents over-utilization of resources by a single application.
- Operability: runtime configuration of queues.
- Resource-based scheduling: if needed, applications can request more resources than the default.
Multi-tenancy - Capacity Scheduler
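Hierarchical queues and capacity guarantees can be illustrated with a little arithmetic: each queue receives a percentage of its parent's share, so a leaf queue's guarantee is the product of the percentages along its path. The queue names, percentages and function names below are hypothetical, and the sketch ignores elasticity and per-user limits.

```python
from fractions import Fraction


def absolute_capacities(tree: dict, parent_share=Fraction(1), prefix="root") -> dict:
    """Flatten a queue tree into absolute cluster shares for the leaf queues.

    tree maps a queue name to (percent_of_parent, children), where children
    is another tree of the same shape (empty for a leaf queue).
    """
    result = {}
    for name, (percent, children) in tree.items():
        share = parent_share * Fraction(percent, 100)
        path = f"{prefix}.{name}"
        if children:
            result.update(absolute_capacities(children, share, path))
        else:
            result[path] = share
    return result


def guaranteed_containers(total: int, shares: dict) -> dict:
    """Convert fractional shares into whole containers (rounded down)."""
    return {queue: int(total * share) for queue, share in shares.items()}


if __name__ == "__main__":
    queues = {
        "marketing": (60, {"ads": (50, {}), "web": (50, {})}),
        "finance": (40, {}),
    }
    shares = absolute_capacities(queues)
    print(guaranteed_containers(1000, shares))
```

With 1000 containers in the cluster, the two marketing sub-queues are each guaranteed 300 and finance is guaranteed 400 in this example.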
Slide 34
Annie’s Question
Which of the following disadvantages of the Hadoop 1.0 MapReduce
framework was YARN developed to overcome?
a) Single Point Of Failure Of NameNode
b) Too much burden on Job Tracker
Slide 35
Annie’s Answer
Too much burden on Job Tracker.
Slide 36
Hadoop 2.0 – In Summary
[Figure: Hadoop 2.0 architecture with NameNode High Availability and Next Generation MapReduce. A Client talks to HDFS (distributed data storage) and YARN (distributed data processing). Masters: Active NameNode and Standby NameNode, sharing edit logs or using a Journal Node, plus the Resource Manager with its Scheduler and Applications Manager (AsM). Slaves: each node runs a DataNode and a Node Manager hosting Containers and App Masters.]
Slide 37
Can you use Hadoop 2.0 for Real-time processing?
- Yes
- No
Annie’s Question
Slide 38
No. Even though YARN in Hadoop 2.0 supports multiple frameworks
for different workloads other than batch, you need Storm or S4 for
real-time processing.
Annie’s Answer
Slide 39 www.edureka.in/hadoop
What about Real-time Processing?
Hadoop is good for batch processing, but how do I process Big Data in real time?
Slide 40 www.edureka.in/hadoop
Storm is coming….
APACHE STORM
The Real-time Hadoop
• Continuous computation system
Distributed, Reliable, Fault-tolerant,
Scalable and Robust
• Suitable for Big Data processing
• Guarantees no data loss
Programming Language agnostic
• JSON-based for Ruby, Python etc.
Use case
• Stream processing
• Distributed RPC
• Continuous Computation
Hadoop Vs. Storm
Differences
Hadoop | Storm
Fundamentally a batch processing system | Real-time processing system; processes unterminated streams of data (e.g. Twitter feeds) as the data arrives
MapReduce jobs run to completion | Topologies (computation graphs) run forever
Stateful nodes | Stateless nodes
Similarities
Hadoop | Storm
Scalable | Scalable
Guarantees no data loss | Guarantees no data loss
Open source | Open source
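The difference between a job that runs to completion and a topology that runs forever can be sketched with Python generators. This is only an analogy under stated assumptions: it is not the Storm API, the spout and bolt function names are hypothetical, and a repeating list stands in for a live feed.

```python
import itertools
from collections import Counter


def sentence_spout(sentences):
    """Spout: emit tuples endlessly (a cycling list stands in for a live feed)."""
    for sentence in itertools.cycle(sentences):
        yield sentence


def split_bolt(stream):
    """Bolt: split each incoming sentence into words as it arrives."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt: keep a running count and emit an update for every word seen."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


if __name__ == "__main__":
    topology = count_bolt(split_bolt(sentence_spout(["big data", "big stream"])))
    # The topology never finishes on its own, so take only the first results.
    for update in itertools.islice(topology, 5):
        print(update)
```

Each result is available as soon as its input arrives, whereas a batch job would only report counts after reading the whole input.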
Storm Use Cases
- Data Normalization
  • Groupon uses Storm to build real-time data integration systems.
- Analytics
  • Storm powers Twitter’s publisher analytics product, processing every tweet and click that happens on Twitter to provide analytics for Twitter's publisher partners.
  • Flipboard uses Storm across a wide range of services, ranging from content search to real-time analytics to generating custom magazine feeds.
- Log processing
  • Alibaba uses Storm to process application logs and data changes in databases to supply real-time data stats for data apps.
  • NaviSite uses Storm in its server log monitoring and auditing system.
Thank You
See You in Next Class
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdfComplete Guide to Advanced Logistics Management Software in Riyadh.pdf
Complete Guide to Advanced Logistics Management Software in Riyadh.pdf
Software Company
 
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
Transcript: #StandardsGoals for 2025: Standards & certification roundup - Tec...
BookNet Canada
 
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptxSpecial Meetup Edition - TDX Bengaluru Meetup #52.pptx
Special Meetup Edition - TDX Bengaluru Meetup #52.pptx
shyamraj55
 
Role of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered ManufacturingRole of Data Annotation Services in AI-Powered Manufacturing
Role of Data Annotation Services in AI-Powered Manufacturing
Andrew Leo
 
Procurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptxProcurement Insights Cost To Value Guide.pptx
Procurement Insights Cost To Value Guide.pptx
Jon Hansen
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
How analogue intelligence complements AI
How analogue intelligence complements AIHow analogue intelligence complements AI
How analogue intelligence complements AI
Paul Rowe
 
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdfSAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
SAP Modernization: Maximizing the Value of Your SAP S/4HANA Migration.pdf
Precisely
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
Drupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy ConsumptionDrupalcamp Finland – Measuring Front-end Energy Consumption
Drupalcamp Finland – Measuring Front-end Energy Consumption
Exove
 
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul Shares 5 Steps to Implement AI Agents for Maximum Business Efficien...
Noah Loul
 
tecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdftecnologias de las primeras civilizaciones.pdf
tecnologias de las primeras civilizaciones.pdf
fjgm517
 
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc Webinar: Consumer Expectations vs Corporate Realities on Data Broker...
TrustArc
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025Splunk Security Update | Public Sector Summit Germany 2025
Splunk Security Update | Public Sector Summit Germany 2025
Splunk
 
AI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global TrendsAI and Data Privacy in 2025: Global Trends
AI and Data Privacy in 2025: Global Trends
InData Labs
 

Hadoop 2.0 Architecture | HDFS Federation | NameNode High Availability |

  • 2. Slide 2 Hello There!! My name is Annie. Let me test your Hadoop 1.x knowledge. Annie’s Introduction
  • 3. Slide 3 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Can you store 1 billion files in a Hadoop 1.x cluster? - Yes - No Annie’s Question
  • 4. Slide 4 No. Even though you have hundreds of DataNodes in the cluster, the NameNode keeps all its metadata in memory, so you are limited to a maximum of only 50-100M files in the entire cluster because of a Single NameNode in Hadoop 1.x. Annie’s Answer
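The memory limit in Annie's answer can be illustrated with a back-of-the-envelope calculation. This is only a rough sketch: the figure of about 150 bytes of NameNode heap per namespace object (file, directory, or block) is a commonly quoted rule of thumb and is an assumption here, not a number from the slides. The function name `namenode_heap_gb` is hypothetical.

```python
# Rule-of-thumb estimate only. The per-object cost below is an assumption,
# not a measured value and not a figure taken from the slides.
BYTES_PER_OBJECT = 150  # assumed average heap cost of one file or block entry


def namenode_heap_gb(num_files: int, blocks_per_file: float = 1.0) -> float:
    """Estimate the NameNode heap (in GB) needed to hold file and block metadata."""
    objects = num_files * (1 + blocks_per_file)  # one file entry plus its blocks
    return objects * BYTES_PER_OBJECT / 1e9


for files in (50_000_000, 100_000_000, 1_000_000_000):
    print(f"{files:>13,} files -> ~{namenode_heap_gb(files):.0f} GB of heap")
```

Under these assumptions, one billion single-block files would need roughly 300 GB of heap on one machine, which is why a single NameNode becomes the bottleneck long before the DataNodes run out of disk.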
  • 5. Slide 5 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. A Hadoop 1.x cluster can have multiple HDFS Namespaces. - True - False Annie’s Question
  • 6. Slide 6 False. Not possible with Hadoop 1.x. Annie’s Answer
  • 7. Slide 7 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Which of the following is (are) a significant disadvantage in Hadoop 1.0? - ‘Single Point Of Failure’ of NameNode - Too much burden on Job Tracker Annie’s Question
  • 8. Slide 8 Single Point of Failure of NameNode and too much burden on Job Tracker. Annie’s Answer
  • 9. Slide 9 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Can you use hundreds of Hadoop DataNodes for any processing other than MapReduce in Hadoop 1.x? - Yes - No Annie’s Question
  • 10. Slide 10 No. Hadoop 1.x dedicates all the DataNode resources to Map and Reduce slots with little or no room for processing any other workload. Annie’s Answer
  • 11. Slide 11 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Can you use Hadoop for Real-time processing? - Yes - No Annie’s Question
  • 12. Slide 12 No. Hadoop is designed and developed for massively parallel batch processing. Annie’s Answer
  • 13. Limitations of Hadoop 1.x
      - No horizontal scalability of NameNode
      - Does not support NameNode High Availability
      - Overburdened JobTracker
      - Not possible to run Non-MapReduce Big Data Applications on HDFS
      - Does not support Multi-tenancy
  • 14. Slide 14 www.edureka.in/hadoop Hadoop 1.x – In Summary [Diagram: a Client talks to two layers. HDFS: NameNode, Secondary NameNode, and DataNodes holding the data blocks. MapReduce: Job Tracker, plus a Task Tracker running Map and Reduce tasks on each DataNode.]
  • 15. Slide 15 www.edureka.in/hadoop Hadoop 1.x - Challenges
      Problem | Description
      NameNode – No Horizontal Scalability | Single NameNode and single Namespace, limited by NameNode RAM
      NameNode – No High Availability (HA) | NameNode is a Single Point of Failure; needs manual recovery using the Secondary NameNode in case of failure
      Job Tracker – Overburdened | Spends a significant portion of time and effort managing the life cycle of applications
      MRv1 – Only Map and Reduce tasks | Humongous data stored in HDFS remains unutilized and cannot be used for other workloads such as graph processing
  • 16. Slide 16 www.edureka.in/hadoop NameNode – Scale and HA [Diagram: a Client gets block locations from a single NameNode (namespace NS plus block management) and reads data from the DataNodes. The NameNode is labeled "No Horizontal Scale" and "No High Availability".]
  • 17. Slide 17 www.edureka.in/hadoop NameNode – Single Point of Failure
      Secondary NameNode:
      - “Not a hot standby” for the NameNode
      - Connects to NameNode every hour*
      - Housekeeping, backup of NameNode metadata
      - Saved metadata can rebuild a failed NameNode
      [Diagram: the NameNode (single point of failure) passes metadata to the Secondary NameNode, which says: “You give me metadata every hour, I will make it secure”.]
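The "housekeeping" done by the Secondary NameNode is essentially a checkpoint: it replays the logged edits over the last saved image of the namespace. The following is a minimal sketch of that idea, not HDFS code. The dictionary-based image, the tuple-based edits, and the function name `checkpoint` are all hypothetical simplifications.

```python
from typing import Dict, List, Tuple

FsImage = Dict[str, int]     # path -> file size; toy stand-in for namespace metadata
Edit = Tuple[str, str, int]  # (operation, path, size)


def checkpoint(fsimage: FsImage, edits: List[Edit]) -> FsImage:
    """Replay logged edits over the last image to produce a new image."""
    merged = dict(fsimage)  # leave the old image untouched
    for op, path, size in edits:
        if op == "create":
            merged[path] = size
        elif op == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return merged


old_image = {"/logs/a.log": 128}
edit_log = [("create", "/logs/b.log", 64), ("delete", "/logs/a.log", 0)]
new_image = checkpoint(old_image, edit_log)
print(new_image)
```

Because the merge only happens periodically, any edits made after the last checkpoint are not in the saved copy, which is one reason the Secondary NameNode cannot act as a hot standby.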
  • 18. Slide 18 www.edureka.in/hadoop Job Tracker – Overburdened
      - CPU: Spends a very significant portion of time and effort managing the life cycle of applications
      - Network: Single listener thread to communicate with thousands of Map and Reduce jobs
      [Diagram: one Job Tracker connected to many Task Trackers.]
  • 19. Slide 19 www.edureka.in/hadoop MRv1 – Unpredictability in Large Clusters
      As the cluster size grows and reaches 4000 nodes:
      - Cascading Failures: DataNode failures result in a serious deterioration of overall cluster performance, because attempts to re-replicate data overload the live nodes through network flooding.
      - Multi-tenancy: As clusters increase in size, you may want to employ them for a variety of models. MRv1 dedicates its nodes to Hadoop, so they cannot be re-purposed for other applications and workloads in an organization. With the growing popularity and adoption of cloud computing among enterprises, this becomes more important.
  • 20. Slide 11 www.edureka.in/hadoop Unutilized Data in HDFS
      - Terabytes and petabytes of data in HDFS can only be used for MapReduce processing
  • 21. Slide 12 www.edureka.in/hadoop Introducing Hadoop 2.0
      Features | Hadoop 1.x | Hadoop 2.0
      HDFS Federation | One NameNode and one Namespace | Multiple NameNodes and Namespaces
      NameNode High Availability | Not present | Highly available
      YARN - Processing Control and Multi-tenancy | JobTracker, TaskTracker | Resource Manager, Node Manager, App Master, Capacity Scheduler
      Other important Hadoop 2.0 features:
      - HDFS Snapshots
      - NFSv3 access to data in HDFS
      - Support for running Hadoop on MS Windows
      - Binary compatibility for MapReduce applications built on Hadoop 1.0
      - Substantial amount of integration testing with the rest of the projects (such as Pig and Hive) in the Hadoop ecosystem
  • 22. Slide 22 www.edureka.in/hadoop Hadoop 2.0 Cluster Architecture - Federation http://hadoop.apache.org/docs/stable2/hadoop-project-dist/hadoop-hdfs/Federation.html [Diagram: in Hadoop 1.0, one NameNode holds a single namespace (NS) and block management over DataNode storage. In Hadoop 2.0, NameNodes NN-1 … NN-k … NN-n each hold their own namespace (NS1 … NSk … NSn) and block pool (Pool 1 … Pool k … Pool n) over common storage on Datanode 1 … Datanode m.]
  • 23. Slide 23 www.edureka.in/hadoop Annie’s Question How does HDFS Federation help HDFS scale horizontally?
      A) Reduces the load on any single NameNode by using multiple, independent NameNodes to manage individual parts of the file system namespace.
      B) Provides cross-data centre (non-local) support for HDFS, allowing a cluster administrator to split the Block Storage outside the local cluster.
  • 24. Slide 24 www.edureka.in/hadoop Annie’s Answer (A). In order to scale the name service horizontally, HDFS federation uses multiple independent NameNodes. The NameNodes are federated, that is, the NameNodes are independent and do not require coordination with each other.
  • 25. Slide 25 Annie’s Question You have configured two NameNodes to manage /marketing and /finance namespaces respectively. What will happen if you try to ‘put’ a file in /accounting directory? www.edureka.in/hadoop
  • 26. Slide 26 Annie’s Answer The ‘put’ will fail. None of the namespaces will manage the file and you will get an IOException with a “No such file or directory” error. www.edureka.in/hadoop
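Annie's /marketing and /finance scenario can be modeled as a client-side lookup from path prefix to NameNode. This is a toy sketch of the idea only, not the HDFS client or its configuration. The NameNode addresses, the `MOUNT_TABLE` name, and the `resolve` function are hypothetical, and Python's `FileNotFoundError` stands in for the IOException mentioned in the answer.

```python
# Hypothetical mapping of namespace roots to the NameNode that manages each one.
MOUNT_TABLE = {
    "/marketing": "hdfs://nn-marketing",
    "/finance": "hdfs://nn-finance",
}


def resolve(path: str) -> str:
    """Return the full URI on the NameNode that owns the path, or fail."""
    for mount, namenode in MOUNT_TABLE.items():
        # Match the mount point itself or anything beneath it.
        if path == mount or path.startswith(mount + "/"):
            return namenode + path
    # No namespace manages this path, so the operation cannot proceed.
    raise FileNotFoundError(f"No such file or directory: {path}")


print(resolve("/finance/q1.csv"))

try:
    resolve("/accounting/report.csv")
except FileNotFoundError as err:
    print("put failed:", err)
```

Each NameNode only knows about its own part of the namespace, so a path that falls under neither root has no owner to route the request to.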
  • 27. Slide 27 Hadoop 2.0 Cluster Architecture - HA http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailabilityWithNFS.html [Diagram: HDFS High Availability (NameNode High Availability) alongside YARN (Next Generation MapReduce). The Active NameNode logs all namespace edits to shared edit logs on NFS storage, with a single writer enforced by fencing. The Standby NameNode reads the edit logs and applies them to its own namespace. A Client and the DataNodes sit on the HDFS side; on the YARN side, a Resource Manager works with Node Managers, each running Containers and App Masters next to a DataNode.]
  • 28. Slide 28 www.edureka.in/hadoop Hadoop 2.0 Cluster Architecture - HA [Diagram comparing two setups. NameNode recovery in Hadoop 1.0: NameNode and Secondary NameNode; manual recovery using the Secondary NameNode. High Availability in Hadoop 2.0: Active NameNode and Standby NameNode; automatic failover to the Standby NameNode. Other labels in the diagram: Secondary NameNode, Edit logs, Meta-Data, FSImage.]
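The shared edit log with a single writer, and the failover to the standby, can be sketched as a small simulation. This is an illustrative model only, not how HDFS implements fencing. The class names, the node names `nn1` and `nn2`, and the `fail_over` function are hypothetical.

```python
class SharedEditLog:
    """Toy shared edit log that accepts appends from one writer only."""

    def __init__(self, writer: str):
        self.writer = writer  # name of the NameNode currently allowed to write
        self.edits = []

    def append(self, node: str, edit: str) -> None:
        if node != self.writer:
            # Fencing: a node that is no longer active must not write.
            raise PermissionError(f"{node} is fenced off from the edit log")
        self.edits.append(edit)


class StandbyNameNode:
    def __init__(self, name: str):
        self.name = name
        self.namespace = []  # edits applied so far

    def catch_up(self, log: SharedEditLog) -> None:
        # Apply only the edits this node has not seen yet.
        self.namespace.extend(log.edits[len(self.namespace):])


def fail_over(log: SharedEditLog, standby: StandbyNameNode) -> None:
    """Bring the standby up to date, then make it the only writer."""
    standby.catch_up(log)
    log.writer = standby.name


log = SharedEditLog(writer="nn1")
standby = StandbyNameNode("nn2")
log.append("nn1", "mkdir /finance")
log.append("nn1", "create /finance/q1.csv")

fail_over(log, standby)
log.append("nn2", "delete /finance/q1.csv")  # the new active node can write
print(standby.namespace)
```

After the failover, any further write attempt by `nn1` raises `PermissionError`, which is the single-writer guarantee the slide refers to as fencing.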
  • 29. Slide 29 Annie’s Question NameNode HA was developed to overcome the following disadvantage in Hadoop 1.0? a) Single Point Of Failure of NameNode b) Too much burden on Job Tracker www.edureka.in/hadoop
  • 30. Slide 30 Annie’s Answer Single Point of Failure of NameNode. www.edureka.in/hadoop
  • 31. Slide 23 www.edureka.in/hadoop YARN and Hadoop Ecosystem YARN adds a more general interface to run non-MapReduce jobs (such as Graph Processing) within the Hadoop framework. [Diagram: two ecosystem stacks, each with Apache Oozie (Workflow), Pig Latin (Data Analysis), Hive (DW System), and HBase over the MapReduce Framework and HDFS (Hadoop Distributed File System). The second stack adds YARN (Cluster Resource Management) and Other YARN Frameworks (MPI, GIRAPH).]
  • 32. Slide 32 www.edureka.in/hadoop YARN – Moving beyond MapReduce http://hadoop.apache.org/docs/stable2/hadoop-yarn/hadoop-yarn-site/YARN.html [Diagram of workload types: BATCH (MapReduce), INTERACTIVE (Text), ONLINE (HBase), STREAMING (Storm, S4, …), GRAPH (Giraph), IN-MEMORY (Spark), HPC MPI (OpenMPI), OTHER (Search) (Weave..).]
  • 33. Slide 33 www.edureka.in/hadoop Multi-tenancy - Capacity Scheduler
      - Organizes jobs into queues
      - Queue shares as percentages of the cluster
      - FIFO scheduling within each queue
      - Data locality-aware scheduling
      - Hierarchical Queues: To manage resources within an organization.
      - Capacity Guarantees: A fraction of the total available capacity is allocated to each queue.
      - Security: To safeguard applications from other users.
      - Elasticity: Resources are available in a predictable and elastic manner to queues.
      - Multi-tenancy: A set of limits to prevent over-utilization of resources by a single application.
      - Operability: Runtime configuration of queues.
      - Resource-based scheduling: If needed, applications can request more resources than the default.
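The first three points (queues, percentage shares, FIFO within a queue) can be sketched in a few lines. This is a simplified model under stated assumptions, not the YARN Capacity Scheduler: resources are counted in abstract "slots", elasticity (borrowing idle capacity from other queues) is deliberately left out, and the queue names, job names, and `schedule` function are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Job = Tuple[str, int]  # (job name, slots requested)


def schedule(cluster_slots: int,
             shares: Dict[str, int],
             queues: Dict[str, List[Job]]) -> Dict[str, List[str]]:
    """Give each queue its guaranteed share, then start its jobs in FIFO order."""
    plan = {}
    for name, percent in shares.items():
        slots = cluster_slots * percent // 100  # capacity guarantee for this queue
        pending = deque(queues.get(name, []))
        running = []
        # Strict FIFO: stop at the first job that does not fit.
        while pending and pending[0][1] <= slots:
            job, needed = pending.popleft()
            running.append(job)
            slots -= needed
        plan[name] = running
    return plan


plan = schedule(
    cluster_slots=100,
    shares={"marketing": 60, "finance": 40},
    queues={
        "marketing": [("m1", 30), ("m2", 40), ("m3", 10)],
        "finance": [("f1", 40)],
    },
)
print(plan)
```

In this run `m3` would fit in the remaining marketing capacity, but it waits behind `m2` because ordering within a queue is FIFO. The finance queue is unaffected by the backlog in marketing, which is the point of per-queue capacity guarantees.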
  • 34. Slide 34 Annie’s Question YARN was developed to overcome the following disadvantage in Hadoop 1.0 MapReduce framework? a) Single Point Of Failure Of NameNode b) Too much burden on Job Tracker www.edureka.in/hadoop
  • 35. Slide 35 Annie’s Answer Too much burden on Job Tracker. www.edureka.in/hadoop
  • 36. Slide 36 www.edureka.in/hadoop Hadoop 2.0 – In Summary [Diagram: a Client talks to two layers. HDFS (Distributed Data Storage, NameNode High Availability): Active NameNode and Standby NameNode connected through shared edit logs OR Journal Node. YARN (Distributed Data Processing, Next Generation MapReduce): Resource Manager containing the Scheduler and Applications Manager (AsM). These are the Masters. The Slaves each run a DataNode and a Node Manager with Containers and App Masters.]
  • 37. Slide 37 Hello There!! My name is Annie. I love quizzes and puzzles and I am here to make you guys think and answer my questions. Can you use Hadoop 2.0 for Real-time processing? - Yes - No Annie’s Question
  • 38. Slide 38 No. Even though YARN in Hadoop 2.0 supports multiple frameworks for different workloads other than batch, you need Storm or S4 for real-time processing. Annie’s Answer
  • 39. Slide 39 www.edureka.in/hadoop What about Real-time Processing? Hadoop is good for batch, but how do I process Big Data in real time?
  • 40. Slide 40 www.edureka.in/hadoop Storm is coming…. APACHE STORM: The Real-time Hadoop
      - Continuous computation system
      Distributed, Reliable, Fault-tolerant, Scalable and Robust
      - Suitable for Big Data processing
      - Guarantees no data loss
      Programming Language agnostic
      - JSON-based for Ruby, Python etc.
      Use case
      - Stream processing
      - Distributed RPC
      - Continuous Computation
  • 41. Hadoop Vs. Storm
      Differences
      Hadoop | Storm
      Fundamentally a batch processing system | Real-time processing; processes unterminated streams of data (e.g. Twitter feeds) as the data arrives
      MapReduce jobs run to completion | Topologies (computation graphs) run forever
      Stateful nodes | Stateless nodes
      Similarities
      Hadoop | Storm
      Scalable | Scalable
      Guarantees no data loss | Guarantees no data loss
      Open Source | Open Source
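The "runs to completion" versus "runs forever" distinction can be shown with plain Python, without using either framework. This is a conceptual sketch only: it uses no Hadoop or Storm API, and the function names and the sample feed are hypothetical.

```python
from collections import Counter
from itertools import islice
from typing import Iterable, Iterator


def batch_word_count(lines: Iterable[str]) -> Counter:
    """Batch style: read a finite data set, then return one final result."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def streaming_word_count(stream: Iterable[str]) -> Iterator[Counter]:
    """Streaming style: emit an updated result as each record arrives."""
    counts = Counter()
    for line in stream:
        counts.update(line.split())
        yield Counter(counts)  # snapshot, so later updates do not alter it


def endless_feed() -> Iterator[str]:
    """Stand-in for an unterminated stream such as a feed of tweets."""
    while True:
        yield "storm hadoop"


final = batch_word_count(["storm hadoop", "hadoop"])
print(final)

# The streaming job never finishes by itself; here we only look at the first 3 results.
snapshots = list(islice(streaming_word_count(endless_feed()), 3))
print(snapshots[-1])
```

The batch function cannot return anything until its input ends, so it would never produce a result on `endless_feed()`. The streaming version yields a usable answer after every record, which is the behaviour the table describes for Storm topologies.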
  • 42. Storm Use Cases
      Data Normalization
      - Groupon uses Storm to build real-time data integration systems.
      Analytics
      - Storm powers Twitter’s publisher analytics product, processing every tweet and click that happens on Twitter to provide analytics for Twitter's publisher partners.
      - Flipboard uses Storm across a wide range of services, from content search to real-time analytics to generating custom magazine feeds.
      Log processing
      - Alibaba uses Storm to process application logs and data changes in databases to supply real-time data stats for data apps.
      - NaviSite uses Storm in its server log monitoring and auditing system.
  • 43. Thank You See You in Next Class