
Hadoop file system

Ayesha Arshad (Roll no. 11)
Afia Faryad (Roll no. 29)
Qadsia Hashmi (Roll no. 08)
Sehar Zaheer (Roll no. 48)
Table of contents
 Hadoop file system
 Who uses Hadoop
 History of Hadoop
 Components of Hadoop
 HDFS
 Components of HDFS
 File blocks in HDFS
 Architecture of HDFS
 MapReduce
 Architecture of MapReduce
 Working of Hadoop
 YARN and Hadoop Common
 Architecture of Hadoop
 Advantages of Hadoop
 Disadvantages of Hadoop
Hadoop file system

 Apache Hadoop is an open source framework that is used to efficiently store and process large datasets, ranging in size from gigabytes to petabytes.
 Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, more quickly.
Who uses Hadoop

 Hadoop is an open source framework from Apache, used to store, process, and analyze data that is very large in volume.
 Hadoop is written in Java.
 It is used by
• Facebook
• Yahoo
• Twitter
• LinkedIn and many more.
History of Hadoop

 Hadoop was started by Doug Cutting and Mike Cafarella in 2002, as part of the Apache Nutch web-search project; it was split out as its own Apache project in 2006.
Components/modules of Hadoop

 HDFS
 MapReduce
 Hadoop Common
 YARN
HDFS

 HDFS (Hadoop Distributed File System) is the storage component of Hadoop. It is mainly designed to run on commodity hardware (inexpensive devices), following a distributed file system design.
 HDFS is designed to store data in a small number of large blocks rather than in many small blocks.
Data storage nodes in HDFS

 NameNode (Master)
 The NameNode is mainly used for storing the metadata, i.e. the data about the data. The metadata includes items such as the transaction logs that keep track of user activity in a Hadoop cluster.
 DataNode (Slave)
 DataNodes work as slaves and are mainly used for storing the actual data in a Hadoop cluster; the number of DataNodes can range from 1 to 500 or even more.
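
As a concrete illustration of how a client interacts with these two kinds of nodes, below is a minimal Java sketch using Hadoop's FileSystem API (the file path is a placeholder, and the example is ours, not from the slides). The client consults the NameNode for metadata, while the file bytes themselves are streamed to and from DataNodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Cluster settings come from core-site.xml / hdfs-site.xml on the
        // classpath; fs.defaultFS would normally point at the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hello.txt");  // placeholder path

        // Write: the NameNode decides where the blocks go; the bytes are
        // streamed to DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: the NameNode returns block locations; the data itself is
        // served by DataNodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}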
File blocks in HDFS

 Data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each; this is the default size, and it can also be changed manually.
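
For example, the block size can be overridden per file when it is created, and the NameNode can be asked how a file was split into blocks. A small sketch (the path, block size, and replication factor are illustrative values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/big-file.dat");  // placeholder path

        long blockSize = 256L * 1024 * 1024;  // 256 MB instead of the 128 MB default
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 3, blockSize)) {
            // ... write the file's contents here ...
        }

        // Ask the NameNode how the file was split into blocks and where they live.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(loc);
        }
    }
}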
HDFS architecture
MapReduce

 The major feature of MapReduce is that it performs distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop work so fast. When you are dealing with Big Data, serial processing is no longer of any use.
 MapReduce mainly has 2 tasks, which are divided phase-wise:
 Map
 Reduce
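
The classic illustration of the two phases is word counting. The sketch below is written against Hadoop's Java MapReduce API; the class names are our own and the logic is a minimal example, not taken from the slides. The map phase emits a (word, 1) pair for every word, and the reduce phase sums the counts per word:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every word in the input line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce: sum the counts emitted for each word.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
    }
}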
Components of MapReduce

 Job Tracker
 The role of the Job Tracker is to accept MapReduce jobs from clients and process the data by using the NameNode.
 In response, the NameNode provides metadata to the Job Tracker.
 Task Tracker
 It works as a slave node for the Job Tracker.
 It receives the task and code from the Job Tracker and applies that code to the file. This process can also be called a Mapper.
 (In Hadoop 2.x and later, the Job Tracker's and Task Tracker's roles are taken over by YARN, discussed below.)
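
From the client's side, submitting a job looks roughly like the driver sketch below. It assumes the hypothetical WordCountMapper and WordCountReducer classes from the previous sketch, with input and output paths taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);  // local pre-aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}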
MapReduce architecture
Working of Hadoop
 Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs:
 Data is initially divided into directories and files. Files are divided into uniformly sized blocks of 128 MB (the default since Hadoop 2.x; older versions used 64 MB).
 These files are then distributed across various cluster nodes for further processing.
 HDFS, being on top of the local file system, supervises the processing.
 Blocks are replicated to handle hardware failure (see the sketch after this list).
 Checking that the code was executed successfully.
 Performing the sort that takes place between the map and reduce stages.
 Sending the sorted data to a certain computer.
 Writing the debugging logs for each job.
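
As a small illustration of the replication step, the replication factor of a file can be changed through the same Java FileSystem API used earlier (the path and factor are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Keep 3 copies of each block so the data survives node failures.
        fs.setReplication(new Path("/tmp/important.dat"), (short) 3);
    }
}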
Architecture of Hadoop
YARN and Hadoop Common

 YARN (Yet Another Resource Negotiator), as the name implies, is the component that helps manage resources across the cluster. In short, it performs scheduling and resource allocation for the Hadoop system.
 Hadoop Common contains the packages and libraries that are used by the other modules.
Advantages of Hadoop

 It is efficient: it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores.
 Hadoop provides fault tolerance and high availability (FTHA).
 Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
 Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms, since it is Java based.
Disadvantages of Hadoop
 Not very effective for small data.
 Hard cluster management.
 Has stability issues.
 Security concerns.
 Complexity: Hadoop can be complex to set up and maintain, especially for
organizations without a dedicated team of experts.
 Latency: Hadoop is not well-suited for low-latency workloads and may not be the best choice for real-time data processing.
 Data Security: Hadoop does not provide built-in security features such as data
encryption or user authentication, which can make it difficult to secure sensitive data.
 Cost: Hadoop can be expensive to set up and maintain, especially for organizations
with large amounts of data.
 Data Loss: In the event of a hardware failure, the data stored in a single node may be lost permanently (for example, if the replication factor is set to 1).
Thank you for listening!
Any questions?
