BigData
Syed
Solutions Engineer - Big Data
mail.syed786@gmail.com
info.syedacademy@gmail.com
+91-9030477368
Need For A New Processing Platform (Big Data)
What is Big Data?
• Twitter (over ~7 TB/day)
• Facebook (over ~10 TB/day)
• Google (over ~20 PB/day)
Where does it come from?
Existing systems (vertical scalability)
Why Hadoop (horizontal scalability)?
Companies Using Hadoop
Yahoo
Google
Facebook
LinkedIn
IBM
Amazon
Hortonworks
Cloudera
NY Times
… the list goes on.
What is Hadoop?
• Flexible infrastructure for large-scale computation and data processing on a network of commodity hardware.
• Written mostly in Java.
• Open source, distributed under the Apache License.
• Hadoop core components: HDFS and MapReduce.
• The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.
What Hadoop is Not
• A database
• An online transaction processing (OLTP) system
• A replacement for every programming language
Hadoop Introduction and Architecture
Hadoop High-Level Architecture
HDFS - Hadoop Distributed File System
Design of HDFS
Where HDFS is not a good fit
Why Is a Block in HDFS So Large?
HDFS Architecture
Let us Zoom into HDFS
NameNode
• Deeper things about the NameNode
Secondary NameNode
DataNode
• What is a DataNode?
NameNode and DataNodes
Feature Matrix
Yahoo Study
Still need to be fixed
How Do We Fix a Single NameNode
Feature
HDFS Architecture
NameNode HA (V2)
NameNode HA – Shared Storage
NameNode HA
HDFS Federation
Hadoop JournalNode
JournalNode machines - the machines on which you run the JournalNodes. The
JournalNode daemon is relatively lightweight, so these daemons may reasonably be
collocated on machines with other Hadoop daemons, for example NameNodes, the
JobTracker, or the YARN ResourceManager. Note: There must be at least 3 JournalNode
daemons, since edit log modifications must be written to a majority of JNs. This will allow
the system to tolerate the failure of a single machine. You may also run more than 3
JournalNodes, but in order to actually increase the number of failures the system can
tolerate, you should run an odd number of JNs (3, 5, 7, and so on). Note that when running
with N JournalNodes, the system can tolerate at most (N - 1) / 2 failures, rounded down,
and continue to function normally.
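The quorum arithmetic above can be checked with a short sketch. The function names `majority` and `tolerated_failures` are hypothetical helpers introduced here for illustration; they are not part of any Hadoop API.

```python
def majority(journal_nodes: int) -> int:
    """Smallest number of JournalNodes that forms a majority of N."""
    return journal_nodes // 2 + 1


def tolerated_failures(journal_nodes: int) -> int:
    """Number of JournalNode failures a group of N can survive."""
    if journal_nodes < 1:
        raise ValueError("need at least one JournalNode")
    # Edits must reach a majority, so at most floor((N - 1) / 2) nodes may fail.
    return (journal_nodes - 1) // 2


for n in (3, 4, 5, 6, 7):
    print(f"N={n}: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

Going from 3 to 4 JournalNodes raises the majority from 2 to 3 but leaves the tolerated failures at 1, which is why an odd count is recommended.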
Hadoop 1
Limited to about 4,000 nodes per cluster
O(# of tasks in a cluster)
JobTracker bottleneck: resource management, job scheduling, and monitoring
Only one namespace for managing HDFS
Map and Reduce slots are static
MapReduce is the only type of job that can run
Hadoop 1 - Reading Files
[Diagram: Hadoop Client, NameNode, and Secondary NameNode (SNameNode) above DataNode/TaskTracker (DN | TT) nodes spread across Rack1, Rack2, Rack3 ... RackN. Labels: read file; return DNs, block ids, etc.; read blocks; heartbeat/block report; checkpoint (fsimage/edit).]
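The read flow labelled in the diagram (ask the NameNode for locations, then read blocks from DataNodes) can be sketched as a toy in-memory model. This is only an illustration: `ToyNameNode`, `ToyDataNode`, and `read_file` are hypothetical names and do not correspond to the real Hadoop client API.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNameNode:
    # file path -> ordered list of (block id, DataNodes holding a replica)
    block_map: dict = field(default_factory=dict)

    def get_block_locations(self, path):
        return self.block_map[path]


@dataclass
class ToyDataNode:
    name: str
    blocks: dict = field(default_factory=dict)  # block id -> bytes

    def read_block(self, block_id):
        return self.blocks[block_id]


def read_file(name_node, data_nodes, path):
    """Ask the NameNode for block locations, then read each block from a DataNode."""
    content = b""
    for block_id, replicas in name_node.get_block_locations(path):
        # Toy choice: take the first listed replica.
        content += data_nodes[replicas[0]].read_block(block_id)
    return content


dn1 = ToyDataNode("dn1", {"blk_1": b"hello "})
dn2 = ToyDataNode("dn2", {"blk_1": b"hello ", "blk_2": b"hadoop"})
nn = ToyNameNode({"/demo.txt": [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2"])]})
print(read_file(nn, {"dn1": dn1, "dn2": dn2}, "/demo.txt"))
```

The point of the sketch is that the NameNode only hands out metadata; the file bytes travel directly between the client and the DataNodes.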
Hadoop 1 - Writing Files
[Diagram: Hadoop Client, NameNode, and Secondary NameNode (SNameNode) above DataNode/TaskTracker (DN | TT) nodes spread across Rack1, Rack2, Rack3 ... RackN. Labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; checkpoint (fsimage/edit).]
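The "replication pipelining" label can be illustrated with a minimal sketch in which each DataNode stores a block and forwards it to the next node in the list. The function `write_block` and the dictionary layout are assumptions made for illustration, not Hadoop code.

```python
def write_block(pipeline, stores, block_id, data):
    """Store the block on the first DataNode, which forwards it down the pipeline.

    Returns the names of the DataNodes that stored the block, in pipeline order.
    """
    if not pipeline:
        return []
    head, rest = pipeline[0], pipeline[1:]
    stores.setdefault(head, {})[block_id] = data
    # The head DataNode passes the same data on to the remaining nodes.
    return [head] + write_block(rest, stores, block_id, data)


stores = {}
stored_on = write_block(["dn1", "dn2", "dn3"], stores, "blk_1", b"payload")
print(stored_on)
print(sorted(stores))
```

In this model the client sends the data once, to the head of the pipeline, rather than once per replica.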
Hadoop 2
Potentially up to 10,000 nodes per cluster
O(cluster size)
Supports multiple namespaces for managing HDFS
Efficient cluster utilization (YARN)
Backward and forward compatible with MRv1
Any application can integrate with Hadoop
Hadoop 2 - Basics
Hadoop 2 - Reading Files (w/ NN Federation)
[Diagram: Hadoop Client and federated NameNodes NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4, each with an SNameNode or a Backup NN, above DataNode/NodeManager (DN | NM) nodes spread across Rack1, Rack2, Rack3 ... RackN. Labels: read file; return DNs, block ids, etc.; read blocks; register/heartbeat/block report; fsimage/edit copy; checkpoint; fs sync. Block Pools: ns1 (dn1, dn2), ns2 (dn1, dn3), ns3 (dn4, dn5), ns4 (dn4, dn5).]
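The Block Pools labels can be read as a mapping from each namespace to the DataNodes that hold its blocks. The sketch below assumes that reading of the labels (ns1 on dn1 and dn2, ns2 on dn1 and dn3, ns3 and ns4 on dn4 and dn5) and inverts it to show that one DataNode can serve several namespaces; `pools_per_datanode` is a hypothetical helper.

```python
from collections import defaultdict

# Assumed reading of the diagram labels: namespace -> DataNodes holding its block pool.
block_pools = {
    "ns1": ["dn1", "dn2"],
    "ns2": ["dn1", "dn3"],
    "ns3": ["dn4", "dn5"],
    "ns4": ["dn4", "dn5"],
}


def pools_per_datanode(pools):
    """Invert the mapping: DataNode -> namespaces whose block pools it stores."""
    served = defaultdict(list)
    for namespace, data_nodes in pools.items():
        for dn in data_nodes:
            served[dn].append(namespace)
    return dict(served)


for dn, namespaces in sorted(pools_per_datanode(block_pools).items()):
    print(dn, namespaces)
```

Under this reading, dn1 stores blocks for both ns1 and ns2, so the namespaces are independent while the storage layer is shared.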
Hadoop 2 - Writing Files
[Diagram: Hadoop Client and federated NameNodes NN1/ns1, NN2/ns2, NN3/ns3, NN4/ns4, each with an SNameNode or a Backup NN, above DataNode/NodeManager (DN | NM) nodes spread across Rack1, Rack2, Rack3 ... RackN. Labels: request write; return DNs, etc.; write blocks; replication pipelining; block report; fsimage/edit copy; checkpoint; fs sync.]
Thank you!
www.syedacademy.com
Hadoop Architecture in Depth
