
Hadoop file system

Ayesha Arshad (Roll no. 11)
Afia Faryad (Roll no. 29)
Qadsia Hashmi (Roll no. 08)
Sehar Zaheer (Roll no. 48)
Table of contents
 Hadoop file system
 Who uses Hadoop
 History of Hadoop
 Components of Hadoop
 HDFS
 Components of HDFS
 File blocks in HDFS
 Architecture of HDFS
 MapReduce
 Architecture of MapReduce
 Working of Hadoop
 YARN and Hadoop Common
 Architecture of Hadoop
 Advantages of Hadoop
 Disadvantages of Hadoop
Hadoop file system

 Apache Hadoop is an open source framework that is used to efficiently store and process large datasets, ranging in size from gigabytes to petabytes.
 Instead of using one large computer to store and process the data, Hadoop clusters multiple computers together so that massive datasets can be analyzed in parallel, more quickly.
Who uses Hadoop

 Hadoop is an open source framework from Apache, used to store, process, and analyze data that is very large in volume.
 Hadoop is written in Java.
 It is used by
• Facebook
• Yahoo
• Twitter
• LinkedIn and many more.
History of Hadoop

 Hadoop was started by Doug Cutting and Mike Cafarella in 2002, as part of the Apache Nutch web-search project; it was split out as its own Apache project in 2006.
Components/modules of Hadoop

 HDFS
 MapReduce
 Hadoop Common
 YARN
HDFS

 HDFS (Hadoop Distributed File System) is the storage component of Hadoop. It is mainly designed to run on commodity hardware (inexpensive devices), following a distributed file system design.
 HDFS is designed to store data in a small number of large blocks rather than in many small blocks.
Data storage nodes in HDFS

 NameNode (Master)
 The NameNode is mainly used for storing the metadata, i.e. the data about the data. The metadata includes items such as the transaction logs that keep track of user activity in a Hadoop cluster.
 DataNode (Slave)
 DataNodes work as slaves and are mainly used for storing the actual data in a Hadoop cluster; the number of DataNodes can range from 1 to 500 or even more.
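
As a concrete illustration of how a client interacts with these two kinds of nodes, below is a minimal Java sketch using Hadoop's FileSystem API (the file path is a placeholder, and the example is ours, not from the slides). The client consults the NameNode for metadata, while the file bytes themselves are streamed to and from DataNodes:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Cluster settings come from core-site.xml / hdfs-site.xml on the
        // classpath; fs.defaultFS would normally point at the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hello.txt");  // placeholder path

        // Write: the NameNode decides where the blocks go; the bytes are
        // streamed to DataNodes.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeUTF("Hello, HDFS!");
        }

        // Read: the NameNode returns block locations; the data itself is
        // served by DataNodes.
        try (FSDataInputStream in = fs.open(file)) {
            System.out.println(in.readUTF());
        }
    }
}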
File blocks in HDFS

 Data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each; this is the default size, and it can also be changed manually.
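
For example, the block size can be overridden per file when it is created, and the NameNode can be asked how a file was split into blocks. A small sketch (the path, block size, and replication factor are illustrative values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSizeDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path file = new Path("/tmp/big-file.dat");  // placeholder path

        long blockSize = 256L * 1024 * 1024;  // 256 MB instead of the 128 MB default
        try (FSDataOutputStream out =
                 fs.create(file, true, 4096, (short) 3, blockSize)) {
            // ... write the file's contents here ...
        }

        // Ask the NameNode how the file was split into blocks and where they live.
        FileStatus status = fs.getFileStatus(file);
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println(loc);
        }
    }
}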
HDFS architecture
MapReduce

 The major feature of MapReduce is that it performs distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop work so fast. When you are dealing with Big Data, serial processing is no longer of any use.
 MapReduce mainly has 2 tasks, which are divided phase-wise:
 Map
 Reduce
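
The classic illustration of the two phases is word counting. The sketch below is written against Hadoop's Java MapReduce API; the class names are our own and the logic is a minimal example, not taken from the slides. The map phase emits a (word, 1) pair for every word, and the reduce phase sums the counts per word:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Map: emit (word, 1) for every word in the input line.
public class WordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, ONE);
            }
        }
    }
}

// Reduce: sum the counts emitted for each word.
class WordCountReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) sum += v.get();
        context.write(key, new IntWritable(sum));
    }
}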
Components of MapReduce

 Job Tracker
 The role of the Job Tracker is to accept MapReduce jobs from clients and process the data by using the NameNode.
 In response, the NameNode provides metadata to the Job Tracker.
 Task Tracker
 It works as a slave node for the Job Tracker.
 It receives the task and code from the Job Tracker and applies that code to the file. This process can also be called a Mapper.
 (In Hadoop 2.x and later, the Job Tracker's and Task Tracker's roles are taken over by YARN, discussed below.)
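
From the client's side, submitting a job looks roughly like the driver sketch below. It assumes the hypothetical WordCountMapper and WordCountReducer classes from the previous sketch, with input and output paths taken from the command line:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setCombinerClass(WordCountReducer.class);  // local pre-aggregation
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}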
MapReduce architecture
Working of Hadoop
 Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs:
 Data is initially divided into directories and files. Files are divided into uniformly sized blocks of 128 MB (the default since Hadoop 2.x; older versions used 64 MB).
 These files are then distributed across various cluster nodes for further processing.
 HDFS, being on top of the local file system, supervises the processing.
 Blocks are replicated to handle hardware failure (see the sketch after this list).
 Checking that the code was executed successfully.
 Performing the sort that takes place between the map and reduce stages.
 Sending the sorted data to a certain computer.
 Writing the debugging logs for each job.
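
As a small illustration of the replication step, the replication factor of a file can be changed through the same Java FileSystem API used earlier (the path and factor are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationDemo {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        // Keep 3 copies of each block so the data survives node failures.
        fs.setReplication(new Path("/tmp/important.dat"), (short) 3);
    }
}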
Architecture of Hadoop
YARN and Hadoop Common

 YARN (Yet Another Resource Negotiator), as the name implies, is the component that helps manage resources across the cluster. In short, it performs scheduling and resource allocation for the Hadoop system.
 Hadoop Common contains the packages and libraries that are used by the other modules.
Advantages of Hadoop

 It is efficient: it automatically distributes the data and work across the machines and, in turn, utilizes the underlying parallelism of the CPU cores.
 Hadoop provides fault tolerance and high availability (FTHA).
 Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
 Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms, since it is Java based.
Disadvantages of Hadoop
 Not very effective for small data.
 Hard cluster management.
 Has stability issues.
 Security concerns.
 Complexity: Hadoop can be complex to set up and maintain, especially for
organizations without a dedicated team of experts.
 Latency: Hadoop is not well-suited for low-latency workloads and may not be the best choice for real-time data processing.
 Data Security: Hadoop does not provide built-in security features such as data
encryption or user authentication, which can make it difficult to secure sensitive data.
 Cost: Hadoop can be expensive to set up and maintain, especially for organizations
with large amounts of data.
 Data Loss: In the event of a hardware failure, the data stored in a single node may be lost permanently (for example, if the replication factor is set to 1).
Thank you for listening!
Any questions?
