
Hadoop Ecosystem

Unit II, Chapter 4

Mr. M.S.Emmi,
Faculty,
Department of MCA,
KLS GIT, Belagavi.
Contents
• Understanding Hadoop Ecosystem
• Hadoop Distributed File System
• HDFS Architecture
• Concept of Blocks in HDFS Architecture
• NameNodes and DataNodes
• The Command-Line Interface
• Using HDFS Files
• Hadoop-Specific File System Types
• HDFS Commands
• The org.apache.hadoop.io package
• HDFS High availability: Features of HDFS.
Traditional Approach
In the traditional approach, an enterprise uses a single computer to store and process big data. For storage, programmers rely on the database vendor of their choice, such as Oracle or IBM. In this approach, the user interacts with the application, which in turn handles data storage and analysis.

Limitation
This approach works fine for applications that process less voluminous data, i.e. data that can be accommodated by standard database servers, or up to the limit of the processor that is processing the data. But when it comes to dealing with huge amounts of scalable data, pushing everything through a single database server becomes a bottleneck.
Google’s Solution
Google solved this problem using an algorithm called MapReduce. This algorithm divides a task into small parts, assigns them to many computers, and collects the results from them; when integrated, these form the result dataset.
Hadoop
Using the solution provided by Google, Doug Cutting and his team developed an open-source project called HADOOP.
Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel across the nodes of a cluster. In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
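To make the MapReduce model concrete, here is a minimal sketch of the classic word-count job written against Hadoop's Java MapReduce API (org.apache.hadoop.mapreduce). The class name and the input/output paths taken from the command line are illustrative; the point is the structure: a Mapper that emits (word, 1) pairs, a Reducer that sums them, and a driver that configures the Job.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: split each input line into words and emit (word, 1) pairs.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reduce phase: sum the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configures and submits the job to the cluster.
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);  // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Such a job would typically be packaged into a jar and submitted with hadoop jar wordcount.jar WordCount <input dir> <output dir> (the jar name and paths here are placeholders).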
Advantages of Hadoop

• The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatically distributes the data and work across the machines, in turn utilizing the underlying parallelism of the CPU cores.

• Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself has been designed to detect and handle failures at the application layer.

• Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.

• Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java-based.
How are distributed databases and Hadoop different?

➢ Distributed databases
• Deal with tables and relations
• Must have a schema for the data
• Implement data fragmentation and partitioning
• Use the notion of a transaction
• Implement ACID transaction properties
• Allow distributed transactions

➢ Hadoop
• Deals with flat files in any format
• Operates with no schema for the data
• Divides files automatically into blocks (see the sketch after this list)
• Uses the notion of a job divided into tasks
• Implements the MapReduce computing model
• Considers every task as either a map or a reduce
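To illustrate the point about blocks: the sketch below (a minimal example; the file path is hypothetical and a reachable, configured HDFS cluster is assumed) asks the NameNode which blocks make up a file and on which machines their replicas live, using Hadoop's Java FileSystem API.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfoExample {
  public static void main(String[] args) throws Exception {
    // Assumes an HDFS cluster configured via core-site.xml on the classpath.
    FileSystem fs = FileSystem.get(new Configuration());
    Path file = new Path("/user/demo/big-input.csv"); // hypothetical file

    FileStatus status = fs.getFileStatus(file);
    // Ask the NameNode which blocks make up the file and where their replicas live.
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());

    for (BlockLocation block : blocks) {
      System.out.printf("offset=%d length=%d hosts=%s%n",
          block.getOffset(), block.getLength(),
          String.join(",", block.getHosts()));
    }
    fs.close();
  }
}
```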
Understanding Hadoop Ecosystem
• So exactly what is Hadoop?
• “Hadoop is a framework that allows for the distributed processing of data sets across clusters of computers using simple programming models.”
• Hadoop is an Apache open-source framework, written in Java, that allows distributed processing of large datasets across clusters of computers using simple programming models.
Understanding Hadoop Ecosystem

The Hadoop ecosystem can be defined as a “comprehensive collection of tools and technologies that can be effectively implemented and deployed to provide Big Data solutions in a cost-effective manner.”

MapReduce and the Hadoop Distributed File System (HDFS) are two components of the Hadoop ecosystem. Along with these two, the ecosystem provides a collection of various elements to support the complete development and deployment of Big Data solutions.

The figure depicts the elements of the Hadoop ecosystem.
❖ HDFS: The storage layer of Hadoop; data is stored across different machines in a distributed fashion (a small Java example of reading an HDFS file follows this list).
❖ MapReduce: Helps in processing data and deriving valuable results. Since Hadoop 2, resource management for MapReduce jobs has been handled by YARN (Yet Another Resource Negotiator).
❖ Sqoop: A mechanism to get data from relational databases into Hadoop (and back). It provides import and export utilities.
❖ Flume: Helps to get unstructured data, such as logs, into Hadoop.
❖ Hive: A high-level language, or wrapper, on top of MapReduce, based on writing logic-driven, SQL-like queries. (Created by Facebook)
❖ Pig: Provides a high-level API to process data, speed up coding and make it handier. (An English-like language, created by Yahoo)
❖ Mahout: The machine learning component.
❖ R connectors: Provide support for statistical and mathematical calculations.
❖ Ambari: An open-source mechanism to create, provision, manage and monitor clusters.
❖ ZooKeeper: Provides coordination and synchronization between the tools and components of Hadoop.
❖ Oozie: Schedules jobs, i.e., manages workflows.
❖ HBase: Structures data into columns and sits on top of HDFS, providing a reference to HDFS data.
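As a concrete illustration of the HDFS entry above, here is a minimal sketch that reads a file stored in HDFS through Hadoop's Java FileSystem API. The NameNode address and the file path are assumptions for the example; in a real deployment they would come from the cluster's configuration files.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed NameNode address; normally picked up from core-site.xml.
    conf.set("fs.defaultFS", "hdfs://namenode:9000");

    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/user/demo/input/sample.txt"); // hypothetical path

    // Open the file and print it line by line; the client fetches the blocks
    // from whichever DataNodes hold their replicas.
    try (BufferedReader reader = new BufferedReader(
        new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
      String line;
      while ((line = reader.readLine()) != null) {
        System.out.println(line);
      }
    }
    fs.close();
  }
}
```

The same file could be inspected from the command line with hdfs dfs -cat /user/demo/input/sample.txt.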
All these elements enable users to process large datasets in real time and provide tools to support various types of Hadoop projects, schedule jobs and manage cluster resources.

The figure depicts how the various elements of Hadoop are involved at the various stages of processing data.

MapReduce and HDFS provide the necessary services and basic structure to deal with the core requirements of Big Data solutions. Other services and tools of the ecosystem provide the environment and components required to build and manage purpose-driven Big Data applications.
