Cloud Computing
Ayesha Arshad (Roll no 11)
Afia Faryad (Roll no 29)
Qadsia Hashmi (Roll no 08)
Sehar Zaheer (Roll no 48)
Table of Contents
Hadoop file system
Who uses Hadoop
History of Hadoop
Components of the Hadoop file system
HDFS
Components of HDFS
File blocks in HDFS
Architecture of HDFS
MapReduce
Architecture of MapReduce
Working of Hadoop
YARN and Hadoop Common
Architecture of Hadoop
Advantages of Hadoop
Disadvantages of Hadoop
Hadoop file system
Hadoop is an open-source framework from Apache that is used to store, process,
and analyze data that is very large in volume.
Hadoop is written in Java.
It is being used by
• Facebook
• Yahoo
• Twitter
• LinkedIn and many more.
History of Hadoop
Hadoop was started by Doug Cutting and Mike Cafarella in 2002 as part of the
Apache Nutch project; it was inspired by Google's GFS and MapReduce papers and
became a separate Apache project in 2006.
Components/modules of Hadoop
HDFS
MapReduce
Hadoop Common
YARN
HDFS
NameNode (Master)
The NameNode mainly stores the metadata, i.e., the data about the data.
Metadata includes the transaction logs that keep track of user activity
in a Hadoop cluster.
DataNode (Slave)
The DataNode works as a slave. DataNodes are mainly used for storing the
actual data in a Hadoop cluster; the number of DataNodes can range from 1
to 500 or even more.
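The NameNode/DataNode split above can be sketched as a toy lookup, where the NameNode only knows which blocks make up a file and where their replicas live. This is a minimal illustration, not real HDFS code; all file, block, and node names here are invented for the example.

```python
# Toy sketch of the NameNode/DataNode roles (illustrative only, not HDFS code).
# The NameNode holds metadata: which blocks make up a file, and which
# DataNodes hold each block. The DataNodes hold the actual block contents.

namenode_metadata = {
    "/logs/app.log": ["blk_1", "blk_2"],  # file -> ordered list of its blocks
}
block_locations = {
    "blk_1": ["datanode1", "datanode2", "datanode3"],  # 3 replicas per block
    "blk_2": ["datanode2", "datanode4", "datanode5"],
}

def locate_file(path):
    """Ask the 'NameNode' which DataNodes each block of a file can be read from."""
    return [(blk, block_locations[blk]) for blk in namenode_metadata[path]]

print(locate_file("/logs/app.log"))
```

A client would first consult this metadata, then fetch the block contents directly from the listed DataNodes.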
File blocks in HDFS:
Data in HDFS is always stored in terms of blocks. A single file is divided
into multiple blocks of 128 MB each, which is the default, and this block
size can also be changed manually.
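The splitting described above can be sketched with a small helper that computes the block sizes a file would occupy, assuming the 128 MB default:

```python
# Minimal sketch of how HDFS splits a file into fixed-size blocks.
# 128 MB is the HDFS default block size; only the last block may be smaller.

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB in bytes

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file of file_size bytes needs."""
    blocks = []
    remaining = file_size
    while remaining > 0:
        blocks.append(min(block_size, remaining))
        remaining -= block_size
    return blocks

# A 300 MB file becomes two full 128 MB blocks plus one 44 MB block.
print(split_into_blocks(300 * 1024 * 1024))
```

Note that a block smaller than 128 MB only consumes as much disk space as its actual contents.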
HDFS architecture
MapReduce
Job Tracker
The role of the Job Tracker is to accept MapReduce jobs from clients and
process the data by using the NameNode.
In response, the NameNode provides metadata to the Job Tracker.
Task Tracker
It works as a slave node for the Job Tracker.
It receives tasks and code from the Job Tracker and applies that code to the
file blocks. This process is also called the Mapper.
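The Job Tracker / Task Tracker interaction above can be sketched as follows. This is a hedged, in-process simulation, not real Hadoop code: the function names and the round-robin assignment are assumptions made for the example.

```python
# Toy sketch of the Job Tracker / Task Tracker roles (illustrative only).
# The Job Tracker hands each Task Tracker a piece of code (the map function)
# plus the block it should run on; the Task Tracker applies the code locally.

def task_tracker(map_code, block):
    """A 'slave' node: simply applies the received code to its block."""
    return map_code(block)

def job_tracker(map_code, blocks, task_trackers):
    """Assign one block per Task Tracker (round-robin) and collect the results."""
    results = []
    for i, block in enumerate(blocks):
        tracker = task_trackers[i % len(task_trackers)]
        results.append(tracker(map_code, block))
    return results

blocks = ["the cat", "the dog"]
out = job_tracker(lambda text: len(text.split()), blocks, [task_tracker, task_tracker])
print(out)  # one word count per block
```

In real Hadoop the code is shipped to the nodes holding the data, rather than the data being shipped to the code.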
MapReduce architecture
Working of Hadoop
Hadoop runs code across a cluster of computers. This process includes the
following core tasks that Hadoop performs:
Data is initially divided into directories and files. Files are divided into
uniform-sized blocks of 128 MB or 64 MB (preferably 128 MB).
These files are then distributed across various cluster nodes for further
processing.
HDFS, sitting on top of the local file system, supervises the processing.
Blocks are replicated to handle hardware failure.
Checking that the code was executed successfully.
Performing the sort that takes place between the map and reduce stages.
Sending the sorted data to a certain computer.
Writing the debugging logs for each job.
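The map, sort, and reduce steps listed above can be sketched end to end with the classic word-count example. This is an in-process simulation under simplified assumptions; real Hadoop runs each phase on different machines.

```python
from itertools import groupby

def map_phase(block):
    """Map: emit a (word, 1) pair for every word in a block of text."""
    return [(word, 1) for word in block.split()]

def sort_phase(pairs):
    """The sort between map and reduce: order pairs by key and group them."""
    pairs.sort(key=lambda kv: kv[0])
    return {key: [v for _, v in group]
            for key, group in groupby(pairs, key=lambda kv: kv[0])}

def reduce_phase(grouped):
    """Reduce: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

blocks = ["the cat sat", "the dog sat"]
pairs = [kv for b in blocks for kv in map_phase(b)]
result = reduce_phase(sort_phase(pairs))
print(result)  # {'cat': 1, 'dog': 1, 'sat': 2, 'the': 2}
```

The sort phase is what lets the reducer see all values for one key together, which is why it sits between map and reduce.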
Architecture of Hadoop
YARN and Hadoop Common
YARN (Yet Another Resource Negotiator), as the name implies, helps manage the
resources across the cluster. In short, it performs job scheduling and
resource allocation for the Hadoop system.
Hadoop Common contains the packages and libraries that are used by the other
modules.
Advantages of Hadoop
It is efficient: it automatically distributes the data and work across the
machines and, in turn, utilizes the underlying parallelism of the CPU cores.
Hadoop provides fault tolerance and high availability (FTHA).
Servers can be added to or removed from the cluster dynamically, and Hadoop
continues to operate without interruption.
Another big advantage of Hadoop is that, apart from being open source, it is
compatible with all platforms since it is Java based.
Disadvantages of Hadoop
Not very effective for small data.
Hard cluster management.
Has stability issues.
Security concerns.
Complexity: Hadoop can be complex to set up and maintain, especially for
organizations without a dedicated team of experts.
Latency: Hadoop is not well-suited for low-latency workloads and may not be the best
choice for real-time data processing
Data Security: Hadoop does not enable security features such as data
encryption or user authentication by default, which can make it difficult to
secure sensitive data.
Cost: Hadoop can be expensive to set up and maintain, especially for organizations
with large amounts of data.
Data Loss: In the event of a hardware failure, data stored on a single node
may be lost permanently if replication is not configured.
Thank you for listening!
Any questions?