
CS 4407 Discussion Forum Unit 2


Hadoop is an open-source framework that allows for the distributed storage and processing of large datasets across clusters of computers using simple programming models. It is designed to scale from a single computer to thousands of clustered computers, each offering local computation and storage. In this way, Hadoop can efficiently store and process large datasets ranging in size from gigabytes to petabytes (What Is Hadoop and What Is It Used for? | Google Cloud, n.d.).

How Hadoop Functions

Hadoop works by breaking large datasets into smaller chunks and distributing them across the nodes in a cluster. Each node then processes its chunk of data in parallel, and the results are combined to produce the final output. This parallel processing approach makes Hadoop very efficient for processing large datasets (What Is Hadoop and What Is It Used for? | Google Cloud, n.d.).
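To make the split-process-combine idea concrete, here is a minimal, illustrative Python sketch that mimics the pattern on a single machine. The sample text, the naive two-way split, and the count_words helper are hypothetical stand-ins for this example only; they are not part of Hadoop itself.

# Illustrative sketch only: plain Python standing in for Hadoop's
# split / parallel-process / combine idea on one machine.
from collections import Counter
from multiprocessing import Pool


def count_words(chunk):
    # "Map"-like step: each worker counts words in its own chunk.
    return Counter(chunk.split())


def main():
    text = "big data needs big clusters and big storage"  # hypothetical sample data
    # Split the dataset into smaller chunks (here, two naive halves).
    words = text.split()
    mid = len(words) // 2
    chunks = [" ".join(words[:mid]), " ".join(words[mid:])]

    # Process the chunks in parallel, then combine the partial results.
    with Pool(processes=2) as pool:
        partial_counts = pool.map(count_words, chunks)

    total = sum(partial_counts, Counter())
    print(total.most_common())


if __name__ == "__main__":
    main()

In a real deployment, Hadoop performs the splitting, scheduling, and combining automatically across many nodes rather than across local processes.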

Hadoop consists of four main modules:

1. Hadoop Distributed File System (HDFS): This module stores data. It splits the data into blocks and stores them across the nodes in the cluster (Zhasa, 2024).

2. Yet Another Resource Negotiator (YARN): This module manages the resources in the cluster, allocating them to the different applications running on the cluster (Zhasa, 2024).

3. MapReduce: This module processes the data. It divides the processing into two stages: the map stage and the reduce stage. The map stage processes the data in parallel, and the reduce stage combines the results (Zhasa, 2024). A small word-count sketch of these two stages follows this list.

4. Hadoop Common: This module provides common libraries and utilities that are used by the other Hadoop modules (Zhasa, 2024).
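As a rough illustration of the map and reduce stages, the following Python sketch follows the style of a Hadoop Streaming word count. The file layout and the map/reduce command-line switch are assumptions made for this example; in practice the two functions would be supplied to Hadoop Streaming as its -mapper and -reducer commands.

# Hadoop Streaming-style word count sketch (illustrative, not the
# official Hadoop example). Assumes Python is available on the nodes.
import sys


def mapper():
    # Map stage: emit "word<TAB>1" for every word on every input line.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")


def reducer():
    # Reduce stage: input arrives sorted by key, so counts for the
    # same word are adjacent and can be summed in one pass.
    current_word, current_count = None, 0
    for line in sys.stdin:
        word, count = line.rstrip("\n").split("\t")
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print(f"{current_word}\t{current_count}")
            current_word, current_count = word, int(count)
    if current_word is not None:
        print(f"{current_word}\t{current_count}")


if __name__ == "__main__":
    # Hypothetical usage: `python wordcount.py map` or
    # `python wordcount.py reduce`.
    mapper() if sys.argv[1] == "map" else reducer()

The reducer can rely on same-word counts being adjacent because Hadoop sorts the map output by key before the reduce stage begins.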

Importance of Hadoop as an Analytics Technology

Hadoop is an important analytics technology because it can store and process large datasets that are too big for traditional databases. It is also very scalable, so it can handle the ever-increasing amounts of data being generated today. Additionally, Hadoop is fault-tolerant, so it can continue to operate even if some of the nodes in the cluster fail (Ashwin, 2024).

Hadoop is used by many organizations to analyze large datasets for a variety of purposes, such as:

• Customer relationship management (CRM): Hadoop can be used to store and analyze customer data to improve marketing and sales efforts (Hadoop: What It Is and Why It Matters, n.d.).

• Fraud detection: Hadoop can be used to identify fraudulent activity by analyzing large datasets of financial transactions (What Is Hadoop and What Is It Used for? | Google Cloud, n.d.).

• Risk management: Hadoop can be used to assess and manage risk by analyzing large datasets of financial and operational data.

• Machine learning: Hadoop can be used to train machine learning models on large datasets.

In conclusion, Hadoop is a powerful analytics technology that can be used to store and process large datasets. It is scalable, fault-tolerant, and can be used for a variety of purposes. As the amount of data being generated continues to grow, Hadoop will become even more important as an analytics technology.

References:

Ashwin. (2024, May 15). Introduction to Apache Hadoop for big data. Medium. https://medium.com/@ashwin_kumar_/introduction-to-apache-hadoop-for-big-data-30c85460580f

Hadoop: What it is and why it matters. (n.d.). SAS. https://www.sas.com/en_us/insights/big-data/hadoop.html

What is Hadoop and what is it used for? | Google Cloud. (n.d.). Google Cloud. https://cloud.google.com/learn/what-is-hadoop

Zhasa, M. (2024, August 13). What is Hadoop? Components of Hadoop and how does it work. Simplilearn.com. https://www.simplilearn.com/tutorials/hadoop-tutorial/what-is-Hadoop
