INTRODUCTION TO HADOOP
Introduction to Hadoop
Learning Objectives
1. To study the features of Hadoop.
2. To learn the basic concepts of HDFS and MapReduce programming.
3. To study the HDFS architecture.
4. To study the MapReduce programming model.
5. To study the Hadoop ecosystem.
Learning Outcomes
a) To comprehend the reasons behind the popularity of Hadoop.
b) To be able to perform HDFS operations.
c) To comprehend the MapReduce framework.
d) To understand how reads and writes work in HDFS.
e) To be able to understand the Hadoop ecosystem.
Agenda
• Hadoop - An Introduction
• RDBMS versus Hadoop
• Distributed Computing Challenges
• History of Hadoop
• Hadoop Overview
• Key Aspects of Hadoop
• Hadoop Components
• High Level Architecture of Hadoop
• Use Case for Hadoop
• ClickStream Data
• Hadoop Distributors
• HDFS
• HDFS Daemons
• Anatomy of File Read
• Anatomy of File Write
• Replica Placement Strategy
• Working with HDFS Commands
• Special Features of HDFS
Agenda (contd.)
• Processing Data with Hadoop
• What is MapReduce Programming?
• How Does MapReduce Work?
• MapReduce Word Count Example
• Managing Resources and Applications with Hadoop YARN
• Limitations of Hadoop 1.0 Architecture
• Hadoop 2 YARN: Taking Hadoop Beyond Batch
• Hadoop Ecosystem
• Pig
• Hive
• Sqoop
• HBase
Hadoop – An Introduction
• Hadoop is an open-source distributed computing framework used for storing and processing large volumes of data.
• It is designed to run on a cluster of commodity hardware, and its main components are a distributed file system (the Hadoop Distributed File System, or HDFS) and a parallel processing framework (MapReduce).
• Its capability to handle massive amounts of data, different categories of …
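The split-process-combine idea behind the parallel processing framework (MapReduce) can be illustrated with the classic word count. The sketch below simulates the map, shuffle, and reduce phases in plain Python on a single machine; it shows the programming model only, not the Hadoop Java API, and the function names are hypothetical. Lowercasing the words is an assumption made for this example.

```python
from collections import defaultdict

def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    """Group every value by its key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(word, counts):
    """Reducer: add up all the 1s emitted for one word."""
    return word, sum(counts)

def word_count(lines):
    # In Hadoop, each input split could be handled by a map task on a different node.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    # Hadoop sorts map output by key before reduce; sorted() mirrors that step here.
    return dict(reduce_phase(word, counts) for word, counts in sorted(grouped.items()))

if __name__ == "__main__":
    print(word_count(["Deer Bear River", "Car Car River", "Deer Car Bear"]))
    # {'bear': 2, 'car': 3, 'deer': 2, 'river': 2}
```

Because each mapper only looks at its own line and each reducer only at one key's values, both phases can run on many nodes at once; only the shuffle needs to move data between them.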
What is Hadoop?
Hadoop is an open-source, Java-based framework from Apache used for storing, processing, and analyzing very large volumes of data.
Hadoop is used for batch (offline) processing.
It is a collection of software utilities that uses a network of many computers to solve problems involving large amounts of data and computation.
Hadoop Overview
• Key Aspects of Hadoop
History of Hadoop
Hadoop was created by Doug Cutting and Mike Cafarella in 2005, inspired by Google's MapReduce and Google File System (GFS) technologies.
Does HADOOP have a full form?
• No.
• Doug Cutting named the project after his son's toy elephant. He used the name because it was relatively easy to spell and pronounce, meaningless, and not used elsewhere.
RDBMS versus HADOOP
Distributed Computing Challenges
Hadoop Components
HBase is (mostly) a key-value store, Hive executes SQL-like queries on data stored in Hadoop, and Pig provides a high-level language (Pig Latin) for processing big data. Apache Sqoop is a tool widely used to transfer large amounts of data between Hadoop and relational database servers, in both directions.
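To make "key-value store" concrete: HBase keeps rows sorted by row key, so it can serve both point lookups and range scans over a span of row keys. The toy in-memory model below mimics that behavior, with each cell addressed by a (row key, column) pair. `ToyKeyValueTable`, the row keys, and the column names are hypothetical; this is not the HBase client API, and it has no versioning, persistence, or distribution.

```python
from bisect import bisect_left, insort

class ToyKeyValueTable:
    """Cells kept sorted by (row_key, column) so range scans are cheap.
    A teaching model only, not the HBase client API."""

    def __init__(self):
        self._keys = []    # sorted list of (row_key, column) tuples
        self._values = {}  # (row_key, column) -> value

    def put(self, row_key, column, value):
        key = (row_key, column)
        if key not in self._values:
            insort(self._keys, key)  # keep the key list sorted on insert
        self._values[key] = value    # overwrite replaces the old value

    def get(self, row_key, column):
        """Point lookup; returns None if the cell does not exist."""
        return self._values.get((row_key, column))

    def scan(self, start_row, stop_row):
        """Yield (row, column, value) for rows in [start_row, stop_row)."""
        # (start_row,) sorts before every (start_row, column) tuple.
        i = bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            key = self._keys[i]
            yield key[0], key[1], self._values[key]
            i += 1

if __name__ == "__main__":
    table = ToyKeyValueTable()
    table.put("user#002", "info:name", "Ravi")
    table.put("user#001", "info:name", "Asha")
    table.put("user#001", "info:city", "Pune")
    table.put("user#003", "info:name", "Meera")
    print(list(table.scan("user#001", "user#003")))
```

Designing row keys so that related rows sort next to each other is what makes such range scans efficient.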
Hadoop Core Components:
• HDFS:
(a) Storage component.
(b) Distributes data across several nodes.
(c) Natively redundant: each block is replicated on several nodes.
• MapReduce:
(a) Computational framework.
(b) Splits a task across multiple nodes.
(c) Processes data in parallel.
[Figure: Hadoop, built on HDFS and MapReduce]
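The two HDFS properties listed above, distributing data across nodes and native redundancy, can be sketched in a few lines. The code assumes the Hadoop 2.x+ defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`). The round-robin placement is a deliberate simplification, since real HDFS replica placement is rack-aware; the function names and DataNode names are hypothetical.

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # assumed Hadoop 2.x+ default for dfs.blocksize
DEFAULT_REPLICATION = 3        # assumed default for dfs.replication

def split_into_blocks(file_size, block_size=DEFAULT_BLOCK_SIZE):
    """Sizes of the blocks a file occupies; only the last block may be smaller."""
    if file_size < 0:
        raise ValueError("file size cannot be negative")
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])

def place_replicas(num_blocks, datanodes, replication=DEFAULT_REPLICATION):
    """Put each block on `replication` distinct DataNodes, round-robin.
    Simplified on purpose: real HDFS placement is rack-aware."""
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds the number of DataNodes")
    n = len(datanodes)
    return {
        block: [datanodes[(block + k) % n] for k in range(replication)]
        for block in range(num_blocks)
    }

def readable_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica on a live DataNode."""
    return {
        block for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }

if __name__ == "__main__":
    sizes = split_into_blocks(300 * MB)
    print([s // MB for s in sizes])  # [128, 128, 44]
    placement = place_replicas(len(sizes), ["dn1", "dn2", "dn3", "dn4"])
    print(placement)
    # {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
    print(readable_blocks(placement, {"dn1", "dn2"}))  # {0, 1, 2}
```

With three replicas per block, any two DataNodes can fail and every block in this example stays readable; the cost is that the 300 MB file consumes 900 MB of raw disk across the cluster.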
Hadoop High Level Architecture
• A small Hadoop cluster consists of a single master node and multiple worker nodes.
• In Hadoop 1.x, the master node runs a JobTracker, TaskTracker, NameNode, and DataNode, while a slave (worker) node can act as both a DataNode and a TaskTracker.
• It is also possible to have data-only and compute-only worker nodes.
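In this layout the NameNode on the master knows which worker nodes hold each block, and the JobTracker tries to run each map task on a TaskTracker located on one of those nodes (data locality), so the input does not have to cross the network. The sketch below shows that preference in simplified form; the real JobTracker also weighs rack locality and other factors, and `assign_map_tasks`, the block ids, and the node names are hypothetical.

```python
def assign_map_tasks(block_locations, free_slots):
    """Toy JobTracker-style assignment (Hadoop 1.x terminology).

    block_locations: block id -> worker nodes holding a replica (NameNode metadata)
    free_slots: worker node -> free map slots on that node's TaskTracker
    Returns block id -> (node, is_data_local).
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is left untouched
    assignment = {}
    for block, replicas in block_locations.items():
        # Prefer a node that already stores the block.
        local = [node for node in replicas if slots.get(node, 0) > 0]
        if local:
            node, is_local = local[0], True
        else:
            # Fall back to any worker with a free slot; it reads the block remotely.
            remote = [node for node, free in slots.items() if free > 0]
            if not remote:
                raise RuntimeError("no free map slots left in the cluster")
            node, is_local = remote[0], False
        slots[node] -= 1
        assignment[block] = (node, is_local)
    return assignment

if __name__ == "__main__":
    locations = {"blk_1": ["w1", "w2"], "blk_2": ["w1"], "blk_3": ["w3"]}
    slots = {"w1": 1, "w2": 1, "w3": 0, "w4": 1}
    print(assign_map_tasks(locations, slots))
    # {'blk_1': ('w1', True), 'blk_2': ('w2', False), 'blk_3': ('w4', False)}
```

Here only `blk_1` runs data-local: `w1` is already busy when `blk_2` arrives, and `w3` has no free slot for `blk_3`, so both fall back to remote reads.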
Modules of Hadoop
The Hadoop framework is composed of the following modules:
• Hadoop Distributed File System (HDFS): Files are broken into blocks, and the blocks are stored on nodes across a distributed architecture. Using a distributed file system provides very high aggregate bandwidth across the cluster.
• Hadoop YARN (Yet Another Resource Negotiator): Used for job scheduling and for managing the computing resources in clusters.
• Hadoop MapReduce: A programming model and framework that splits a job into small tasks, assigns those tasks to many computers joined over the network, and combines their results to form the final dataset.
• Hadoop Common: Includes the Java libraries used to start Hadoop and the utilities needed by the other Hadoop modules.
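The YARN module listed above schedules jobs and manages cluster resources. As a rough illustration only, the sketch below places container requests (memory in MB and virtual cores) on worker nodes using a first-fit rule. Real YARN schedulers, such as the Capacity Scheduler and the Fair Scheduler, are far more elaborate (queues, priorities, locality); `NodeCapacity` and `allocate_containers` are hypothetical names, not YARN APIs.

```python
from dataclasses import dataclass

@dataclass
class NodeCapacity:
    """Free resources that one worker node reports (hypothetical model)."""
    name: str
    free_memory_mb: int
    free_vcores: int

def allocate_containers(nodes, requests):
    """First-fit placement of container requests given as (memory_mb, vcores).

    Mutates the free resources in `nodes`. Returns the chosen node name for each
    request, or None when no node has enough free memory and vcores.
    """
    placements = []
    for memory_mb, vcores in requests:
        chosen = None
        for node in nodes:
            if node.free_memory_mb >= memory_mb and node.free_vcores >= vcores:
                node.free_memory_mb -= memory_mb
                node.free_vcores -= vcores
                chosen = node.name
                break
        placements.append(chosen)
    return placements

if __name__ == "__main__":
    cluster = [NodeCapacity("nm1", 4096, 4), NodeCapacity("nm2", 2048, 2)]
    print(allocate_containers(cluster, [(2048, 1), (2048, 1), (2048, 2), (1024, 1)]))
    # ['nm1', 'nm1', 'nm2', None]
```

The last request gets `None` because both nodes have run out of memory; a real scheduler would keep such a request pending until resources are released.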
ClickStream Data Analysis
• ClickStream data (mouse clicks) helps you understand the purchasing behavior of customers. ClickStream analysis helps online marketers optimize their product web pages, promotional content, etc. to improve their business.
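As a hedged illustration of ClickStream analysis in MapReduce style, the sketch below assumes a hypothetical log format with one click per line, `<user_id> <page_url>`. The mapper emits (page, user) pairs and the reducer computes total views and distinct visitors per page, the kind of per-page metric a marketer might use to decide which pages to optimize. The log format and function names are assumptions, not a Hadoop API.

```python
from collections import defaultdict

def map_click(line):
    """Mapper: parse one hypothetical log line "<user_id> <page_url>" into (page, user)."""
    user, page = line.split()
    return page, user

def reduce_page(page, users):
    """Reducer: total views and distinct visitors for one page."""
    return page, {"views": len(users), "unique_visitors": len(set(users))}

def analyze_clicks(lines):
    grouped = defaultdict(list)  # shuffle step: collect every user seen per page
    for page, user in map(map_click, lines):
        grouped[page].append(user)
    return dict(reduce_page(page, users) for page, users in grouped.items())

if __name__ == "__main__":
    log = ["u1 /home", "u1 /product/42", "u2 /home", "u2 /home", "u1 /checkout"]
    for page, stats in analyze_clicks(log).items():
        print(page, stats)
```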
Hadoop Distributors