Document 4 HDFS

HDFS is a distributed file system designed for Hadoop that provides scalable and reliable data storage across large clusters. It uses a master-slave architecture where the NameNode manages the file system metadata and namespace as the master, and DataNodes store and retrieve blocks of data as slaves. Files are split into blocks that are replicated across multiple DataNodes for fault tolerance, with the NameNode monitoring replication and rebalancing as needed.

Uploaded by Piyali

Document#4

Topics
HDFS
HDFS Architecture
NameNodes and DataNodes
NameNode in Hadoop-2
Replication

HDFS
Hadoop Distributed File System (HDFS) is a Java-based file system that provides scalable and
reliable data storage and is designed to span large clusters of commodity servers. HDFS was
designed to be a scalable, fault-tolerant, distributed storage system that works closely with
MapReduce, and it is built to keep working under a wide variety of physical and systemic
conditions. By distributing storage and computation across many servers, the combined
storage resource can grow with demand while remaining economical at every size.
These features help keep Hadoop clusters highly functional and highly available:
- Rack awareness: the physical location of a node is taken into account when allocating
  storage and scheduling tasks.
- Minimal data motion: MapReduce moves compute processes to the data on HDFS, not the
  other way around. Processing tasks can run on the physical node where the data resides.
  This significantly reduces network I/O, keeps most of the I/O on the local disk or within
  the same rack, and provides very high aggregate read/write bandwidth.
- Utilities diagnose the health of the file system and can rebalance data across
  nodes.
- Rollback allows system operators to restore the previous version of HDFS after an
  upgrade, in case of human or system errors.
- A Standby NameNode provides redundancy and supports high availability.
- High operability: Hadoop handles many types of cluster events (such as node failures)
  that might otherwise require operator intervention. This design allows a single operator
  to maintain a cluster of thousands of nodes.
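Rack awareness and minimal data motion work together when a scheduler decides where to run a task. The sketch below assumes a hypothetical topology map (`TOPOLOGY`, node name to rack) and a hypothetical `choose_node` helper. For a task that reads one block, it prefers a node that stores a replica (node-local), then a node on the same rack as a replica (rack-local), and only then any free node. It illustrates the idea and is not Hadoop's scheduler code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical topology: node name -> rack id (a real cluster derives this
# from its rack-awareness configuration).
TOPOLOGY: Dict[str, str] = {
    "dn1": "/rack1", "dn2": "/rack1",
    "dn3": "/rack2", "dn4": "/rack2",
}


def choose_node(replica_nodes: List[str], free_nodes: List[str],
                topology: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Pick a free node for a task reading one block.

    Returns (node, locality) with locality "node-local", "rack-local"
    or "off-rack", or None when no node is free.
    """
    # 1. Best case: run where a replica already lives (no network transfer).
    for node in free_nodes:
        if node in replica_nodes:
            return node, "node-local"
    # 2. Next best: same rack as some replica (traffic stays in the rack).
    replica_racks = {topology[n] for n in replica_nodes}
    for node in free_nodes:
        if topology.get(node) in replica_racks:
            return node, "rack-local"
    # 3. Otherwise the data has to cross racks.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None


if __name__ == "__main__":
    # Block replicated on dn1 and dn3; only dn2 and dn4 are free.
    print(choose_node(["dn1", "dn3"], ["dn2", "dn4"], TOPOLOGY))  # ('dn2', 'rack-local')
```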

HDFS Architecture
An HDFS cluster consists of a NameNode, which manages the cluster metadata, and
DataNodes, which store the data. Files and directories are represented on the NameNode by inodes.
Inodes record attributes such as permissions, modification and access times, and namespace and
disk space quotas.
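To make the inode attributes concrete, here is a minimal sketch of a directory inode that enforces a namespace quota (a limit on the number of names in its subtree, where the directory counts against its own quota) and a disk space quota (a limit on bytes, where every replica counts). The class and field names are hypothetical and are not the NameNode's internal types. The sketch also simplifies one behavior: it checks the whole file size up front, whereas HDFS checks the quota as each block is allocated.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class INode:
    """Hypothetical inode holding the attributes named in the text."""
    name: str
    owner: str
    permission: int                      # e.g. 0o755
    mtime: float = field(default_factory=time.time)
    atime: float = field(default_factory=time.time)


@dataclass
class DirectoryINode(INode):
    ns_quota: Optional[int] = None       # max number of names in the subtree
    ds_quota: Optional[int] = None       # max bytes in the subtree, all replicas
    names_used: int = 1                  # the directory counts against itself
    bytes_used: int = 0

    def add_file(self, size: int, replication: int) -> None:
        """Account for a new file, rejecting it if a quota would be exceeded."""
        needed = size * replication      # each replica counts against the space quota
        if self.ns_quota is not None and self.names_used + 1 > self.ns_quota:
            raise RuntimeError("namespace quota exceeded")
        if self.ds_quota is not None and self.bytes_used + needed > self.ds_quota:
            raise RuntimeError("disk space quota exceeded")
        self.names_used += 1
        self.bytes_used += needed
        self.mtime = time.time()         # adding a child modifies the directory


if __name__ == "__main__":
    d = DirectoryINode("data", "hdfs", 0o755, ns_quota=3, ds_quota=1000)
    d.add_file(size=300, replication=3)  # uses 900 bytes of quota
    try:
        d.add_file(size=100, replication=3)
    except RuntimeError as e:
        print(e)                         # disk space quota exceeded
```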
The file content is split into large blocks (typically 128 megabytes), and each block of the
file is independently replicated at multiple DataNodes. The blocks are stored on the local file
system of the DataNodes. The NameNode actively monitors the number of replicas of each block.
When a replica is lost because of a DataNode failure or a disk failure, the NameNode creates
another replica of the block. The NameNode maintains the namespace tree and the mapping of
blocks to DataNodes, holding the entire namespace image in RAM.
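The arithmetic of block splitting and the NameNode's replica bookkeeping fit in a few lines. The sketch below makes several assumptions. It uses the 128 MB block size mentioned above and a replication factor of 3. A plain dictionary stands in for the NameNode's in-memory block map. The round-robin `place_replicas` and the `handle_datanode_failure` helper are hypothetical simplifications and do not implement HDFS's rack-aware placement.

```python
from typing import Dict, Iterable, List, Set

BLOCK_SIZE = 128 * 1024 * 1024   # 128 MB, as in the text
REPLICATION = 3                  # assumed replication factor


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    if file_size == 0:
        return []
    n = -(-file_size // block_size)                   # ceiling division
    return [block_size] * (n - 1) + [file_size - block_size * (n - 1)]


def place_replicas(block_ids: Iterable[int], datanodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, Set[str]]:
    """Hypothetical round-robin placement: block id -> DataNodes holding it."""
    return {b: {datanodes[(i + k) % len(datanodes)] for k in range(replication)}
            for i, b in enumerate(block_ids)}


def handle_datanode_failure(block_map: Dict[int, Set[str]], dead: str,
                            live: List[str],
                            replication: int = REPLICATION) -> List[int]:
    """Forget the dead node's replicas and re-replicate the affected blocks.

    Returns the ids of blocks that lost a replica."""
    repaired = []
    for b, nodes in block_map.items():
        if dead not in nodes:
            continue
        nodes.discard(dead)
        candidates = [n for n in live if n not in nodes]
        while len(nodes) < replication and candidates:
            nodes.add(candidates.pop(0))              # create another replica
        repaired.append(b)
    return repaired


if __name__ == "__main__":
    mb = 1024 * 1024
    sizes = split_into_blocks(300 * mb)
    print([s // mb for s in sizes])                   # [128, 128, 44]
    bmap = place_replicas(range(len(sizes)), ["dn1", "dn2", "dn3", "dn4"])
    print(handle_datanode_failure(bmap, "dn2", ["dn1", "dn3", "dn4"]))  # [0, 1]
```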
The NameNode does not directly send requests to DataNodes. It sends instructions to the
DataNodes by replying to heartbeats sent by those DataNodes. The instructions include
commands to replicate blocks to other nodes, remove local block replicas, re-register and send
an immediate block report, or shut down the node.
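The pull-style control loop described above, where commands travel only as replies to heartbeats, can be modeled with a per-DataNode command queue. The NameNode fills the queue and drains it whenever that DataNode's heartbeat arrives. The class, method and command names below are hypothetical. The real protocol is an RPC interface with its own command types.

```python
from collections import defaultdict, deque
from enum import Enum
from typing import Deque, Dict, List, Tuple


class Cmd(Enum):
    """The four kinds of instruction listed in the text."""
    REPLICATE = "replicate block to other nodes"
    REMOVE = "remove local block replica"
    REREGISTER = "re-register and send block report"
    SHUTDOWN = "shut down node"


class NameNodeStub:
    def __init__(self) -> None:
        # Pending commands per DataNode. They are never pushed; they are only
        # handed out in the reply to that node's next heartbeat.
        self.pending: Dict[str, Deque[Tuple[Cmd, object]]] = defaultdict(deque)
        self.last_heartbeat: Dict[str, float] = {}

    def schedule(self, datanode: str, cmd: Cmd, arg: object = None) -> None:
        self.pending[datanode].append((cmd, arg))

    def heartbeat(self, datanode: str, now: float) -> List[Tuple[Cmd, object]]:
        """Record liveness and return whatever is queued for this node."""
        self.last_heartbeat[datanode] = now
        queue = self.pending[datanode]
        replies = list(queue)
        queue.clear()
        return replies


if __name__ == "__main__":
    nn = NameNodeStub()
    nn.schedule("dn1", Cmd.REPLICATE, ("blk_1", "dn3"))
    nn.schedule("dn1", Cmd.REMOVE, "blk_7")
    print(nn.heartbeat("dn1", now=0.0))   # two (Cmd, arg) pairs
    print(nn.heartbeat("dn1", now=3.0))   # []
```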
According to 'Hadoop: The Definitive Guide': "The namenode manages the filesystem
namespace. It maintains the filesystem tree and the metadata for all the files and directories in
the tree."
Essentially, a namespace is a container. In this context it means the grouping, or hierarchy,
of file and directory names.
Metadata includes things like file owners, permission bits, block locations, and sizes.
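A namespace in this sense is a tree of names, with metadata attached to each file. The toy model below uses hypothetical class and field names. It shows a path being resolved one component at a time from the root, which is the kind of lookup the NameNode performs against its in-memory tree. It does not reproduce the NameNode's actual data structures.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class FileMeta:
    owner: str
    permission: int
    size: int
    block_locations: Dict[str, List[str]]    # block id -> DataNodes


@dataclass
class Dir:
    children: Dict[str, Union["Dir", FileMeta]] = field(default_factory=dict)


class Namespace:
    def __init__(self) -> None:
        self.root = Dir()

    def _walk(self, parts: List[str], create: bool) -> Dir:
        """Follow directory names from the root, optionally creating them."""
        node = self.root
        for p in parts:
            child = node.children.get(p)
            if child is None and create:
                child = node.children[p] = Dir()
            if not isinstance(child, Dir):   # missing, or a file in the way
                raise FileNotFoundError("/" + "/".join(parts))
            node = child
        return node

    def create(self, path: str, meta: FileMeta) -> None:
        *dirs, name = path.strip("/").split("/")
        self._walk(dirs, create=True).children[name] = meta

    def lookup(self, path: str) -> FileMeta:
        *dirs, name = path.strip("/").split("/")
        entry = self._walk(dirs, create=False).children.get(name)
        if not isinstance(entry, FileMeta):
            raise FileNotFoundError(path)
        return entry


if __name__ == "__main__":
    ns = Namespace()
    ns.create("/user/alice/logs.txt",
              FileMeta("alice", 0o644, 200, {"blk_1": ["dn1", "dn2", "dn3"]}))
    print(ns.lookup("/user/alice/logs.txt").owner)   # alice
```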

NameNodes and DataNodes

[Figure: New File Creation. HDFS Client, NameNode and DataNodes; arrows labeled AddBlock Request, DataNode Details, Write, Blocks Received.]

[Figure: File Writing. HDFS Client, NameNode and DataNodes; arrows labeled AddBlock Request, DataNode Details, Write, Acknowledgement, Blocks Received.]

[Figure: File Reading. HDFS Client, NameNode and DataNodes; arrows labeled Get Block Location, DataNode Details, Read.]
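The write and read flows in the diagrams can be simulated end to end. In this sketch the client sends an AddBlock request and receives DataNode details. It then writes to the first DataNode, each DataNode forwards the data to the next one in the pipeline, and the acknowledgement travels back. Each DataNode also reports the block as received. Reading asks for block locations and fetches each block from the first listed DataNode. All classes and methods are hypothetical stand-ins rather than the HDFS client API. Real HDFS streams data in packets and chooses targets with its placement policy, both of which are omitted here.

```python
from typing import Dict, List, Tuple


class NameNode:
    """Hypothetical metadata server: allocates blocks and records locations."""

    def __init__(self, datanodes: List["DataNode"], replication: int = 3) -> None:
        self.datanodes = datanodes
        self.replication = replication
        self.files: Dict[str, List[str]] = {}       # path -> block ids
        self.locations: Dict[str, List[str]] = {}   # block id -> DataNode names
        self._next_id = 0

    def add_block(self, path: str) -> Tuple[str, List["DataNode"]]:
        """AddBlock request: a new block id plus the DataNodes to write to."""
        block_id = f"blk_{self._next_id}"
        self._next_id += 1
        self.files.setdefault(path, []).append(block_id)
        return block_id, self.datanodes[: self.replication]  # naive target choice

    def block_received(self, block_id: str, dn_name: str) -> None:
        self.locations.setdefault(block_id, []).append(dn_name)

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        return [(b, self.locations[b]) for b in self.files[path]]


class DataNode:
    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: Dict[str, bytes] = {}

    def write(self, block_id: str, data: bytes,
              downstream: List["DataNode"], nn: NameNode) -> bool:
        """Store the block, forward it down the pipeline, then acknowledge."""
        self.blocks[block_id] = data
        nn.block_received(block_id, self.name)       # "Blocks Received"
        if downstream:
            # The ack from the rest of the pipeline flows back through here.
            return downstream[0].write(block_id, data, downstream[1:], nn)
        return True                                  # last node acks first


def client_write(nn: NameNode, path: str, chunks: List[bytes]) -> None:
    for chunk in chunks:                             # one chunk per block
        block_id, targets = nn.add_block(path)
        if not targets[0].write(block_id, chunk, targets[1:], nn):
            raise IOError(f"pipeline write failed for {block_id}")


def client_read(nn: NameNode, path: str, by_name: Dict[str, DataNode]) -> bytes:
    # Get Block Location, then read each block from the first listed DataNode.
    return b"".join(by_name[holders[0]].blocks[b]
                    for b, holders in nn.get_block_locations(path))


if __name__ == "__main__":
    dns = [DataNode(f"dn{i}") for i in range(1, 5)]
    nn = NameNode(dns)
    client_write(nn, "/f.txt", [b"hello ", b"hdfs"])
    print(client_read(nn, "/f.txt", {d.name: d for d in dns}))  # b'hello hdfs'
```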

NameNode in Hadoop-2

Hadoop clusters' storage resources were previously available only to HDFS. The new
storage architecture generalizes the block storage layer so that it can be used not only by HDFS
but also by other storage services. The first use of this feature is HDFS federation, which allows
multiple HDFS namespaces to share the underlying storage. Future versions of Hadoop were
expected to let other storage services (such as key-value storage) use the same storage layer.
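In HDFS federation, each namespace owns a block pool, and every DataNode stores blocks for all block pools. The rough model below keys each DataNode's storage by block pool ID. This lets two independent namespaces share the same DataNodes without their block IDs colliding. The class names and the short pool IDs (`BP-1`, `BP-2`) are hypothetical. For simplicity, every DataNode stores every block in this toy model.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SharedDataNode:
    """One DataNode serving every namespace (block pool) in the cluster."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.pools: Dict[str, Set[int]] = defaultdict(set)  # pool id -> block ids

    def store(self, pool_id: str, block_id: int) -> None:
        self.pools[pool_id].add(block_id)

    def block_report(self, pool_id: str) -> Set[int]:
        """Each NameNode only sees the blocks of its own pool."""
        return set(self.pools.get(pool_id, set()))


class FederatedNameNode:
    """Manages one namespace and allocates blocks in its own pool."""

    def __init__(self, namespace: str, pool_id: str,
                 datanodes: List[SharedDataNode]) -> None:
        self.namespace = namespace
        self.pool_id = pool_id
        self.datanodes = datanodes
        self.next_block = 0

    def allocate(self) -> int:
        block_id = self.next_block
        self.next_block += 1
        for dn in self.datanodes:            # toy model: store on every node
            dn.store(self.pool_id, block_id)
        return block_id


if __name__ == "__main__":
    shared = [SharedDataNode("dn1"), SharedDataNode("dn2")]
    nn_a = FederatedNameNode("/user", "BP-1", shared)
    nn_b = FederatedNameNode("/tmp", "BP-2", shared)
    nn_a.allocate()
    nn_b.allocate()                          # both namespaces use block id 0
    print(shared[0].block_report("BP-1"), shared[0].block_report("BP-2"))  # {0} {0}
```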

Replication
The NameNode endeavors to ensure that each block always has the intended number of
replicas. The NameNode detects that a block has become under- or over-replicated when a block
report from a DataNode arrives. When a block becomes over-replicated, the NameNode chooses
a replica to remove. The NameNode first prefers not to reduce the number of racks that host
replicas, and second prefers to remove a replica from the DataNode with the least amount of
available disk space. The goal is to balance storage utilization across DataNodes without
reducing the block's availability.
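The two-level preference for removing an excess replica can be expressed as a filter followed by a minimum. The filter keeps the number of racks unchanged, and the minimum frees space on the fullest DataNode. The sketch assumes that each replica is described by a hypothetical `Replica(node, rack, free_bytes)` record.

```python
from collections import Counter
from typing import List, NamedTuple


class Replica(NamedTuple):
    node: str
    rack: str
    free_bytes: int   # remaining disk space on that DataNode


def choose_replica_to_remove(replicas: List[Replica]) -> Replica:
    """Pick the excess replica to delete from an over-replicated block."""
    per_rack = Counter(r.rack for r in replicas)
    # Preference 1: removal must not reduce the number of racks, so the
    # replica's rack must still hold at least one other replica.
    safe = [r for r in replicas if per_rack[r.rack] > 1]
    candidates = safe or replicas
    # Preference 2: remove from the DataNode with the least available space.
    return min(candidates, key=lambda r: r.free_bytes)


if __name__ == "__main__":
    reps = [Replica("dn1", "/rack1", 500), Replica("dn2", "/rack1", 800),
            Replica("dn3", "/rack2", 100), Replica("dn4", "/rack3", 50)]
    # dn4 has the least space, but removing it would drop /rack3.
    print(choose_replica_to_remove(reps).node)   # dn1
```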
When a block becomes under-replicated, it is put in the replication priority queue. A
block with only one replica has the highest priority, while a block whose number of replicas is
greater than two thirds of its replication factor has the lowest priority. A background thread
periodically scans the head of the replication queue to decide where to place new replicas. Block
replication follows a policy similar to that for new block placement. If the number of existing
replicas is one, HDFS places the next replica on a different rack. If the block has two
existing replicas and both are on the same rack, the third replica is placed on a
different rack; otherwise, the third replica is placed on a different node in the same rack as an
existing replica. Here the goal is to reduce the cost of creating new replicas.
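Both rules in this paragraph can be sketched briefly. The priority function follows the two levels the text names: a single replica is most urgent, and more than two thirds of the target is least urgent. It assumes one middle level for everything else, whereas HDFS itself distinguishes more cases. The placement function follows the rack rules above and uses hypothetical node and rack names. It picks the first eligible node for determinism, while a real placement policy also weighs factors such as node load.

```python
import heapq
from typing import Dict, List, Optional, Tuple

HIGHEST, MIDDLE, LOWEST = 0, 1, 2   # smaller number = more urgent


def replication_priority(live_replicas: int, target: int) -> int:
    if live_replicas < 1:
        raise ValueError("no live replica to copy from")
    if live_replicas == 1:
        return HIGHEST
    if live_replicas * 3 > target * 2:          # more than two thirds of target
        return LOWEST
    return MIDDLE


class ReplicationQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._seq = 0                           # FIFO order within a level

    def add(self, block_id: str, live: int, target: int) -> None:
        entry = (replication_priority(live, target), self._seq, block_id)
        heapq.heappush(self._heap, entry)
        self._seq += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]


def choose_target(existing: List[str], topology: Dict[str, str]) -> Optional[str]:
    """Pick a node for the next replica, given the nodes already holding it."""
    racks_used = {topology[n] for n in existing}
    free = [n for n in topology if n not in existing]
    if len(existing) == 1 or (len(existing) == 2 and len(racks_used) == 1):
        # One replica, or two on the same rack: go to a different rack.
        pool = [n for n in free if topology[n] not in racks_used]
    elif len(existing) == 2:
        # Already on two racks: a new node on an existing replica's rack.
        pool = [n for n in free if topology[n] in racks_used]
    else:
        pool = free                             # case not covered by the text
    return pool[0] if pool else None


if __name__ == "__main__":
    q = ReplicationQueue()
    q.add("blk_a", live=2, target=3)            # exactly 2/3: middle
    q.add("blk_b", live=1, target=3)            # one replica: highest
    q.add("blk_c", live=9, target=10)           # above 2/3: lowest
    print([q.pop(), q.pop(), q.pop()])          # ['blk_b', 'blk_a', 'blk_c']
    topo = {"dn1": "/r1", "dn2": "/r1", "dn3": "/r2", "dn4": "/r2"}
    print(choose_target(["dn1", "dn3"], topo))  # dn2
```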
The NameNode also makes sure that not all replicas of a block are located on one rack. If
the NameNode detects that a block's replicas have all ended up on one rack, it treats the block
as mis-replicated and replicates the block to a different rack using the same block placement
policy described above. After the NameNode receives the notification that the replica is created,
the block becomes over-replicated. The NameNode then decides to remove an old replica,
because the over-replication policy prefers not to reduce the number of racks.
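The mis-replication sequence has three steps: detect the problem, add a cross-rack replica, then trim the now over-replicated block without losing a rack. A small simulation can trace it. Node and rack names are hypothetical. For brevity, the removal step keeps only the rack-count preference and ignores free disk space.

```python
from collections import Counter
from typing import Dict, List


def is_mis_replicated(nodes: List[str], topology: Dict[str, str]) -> bool:
    """True if a block with more than one replica has them all on one rack."""
    return len(nodes) > 1 and len({topology[n] for n in nodes}) == 1


def fix_mis_replication(nodes: List[str], topology: Dict[str, str],
                        target: int) -> List[str]:
    nodes = list(nodes)
    if not is_mis_replicated(nodes, topology):
        return nodes
    rack = topology[nodes[0]]
    # Step 1: copy the block to a node on a different rack.
    off_rack = [n for n in topology if topology[n] != rack]
    if not off_rack:
        return nodes                  # single-rack cluster: nothing to do
    nodes.append(off_rack[0])
    # Step 2: the block is now over-replicated; drop a replica whose rack
    # still has another copy, so the number of racks does not shrink.
    while len(nodes) > target:
        per_rack = Counter(topology[n] for n in nodes)
        victim = next((n for n in nodes if per_rack[topology[n]] > 1), None)
        if victim is None:
            break                     # every remaining replica is on its own rack
        nodes.remove(victim)
    return nodes


if __name__ == "__main__":
    topo = {"dn1": "/r1", "dn2": "/r1", "dn3": "/r1", "dn4": "/r2"}
    print(fix_mis_replication(["dn1", "dn2", "dn3"], topo, 3))  # ['dn2', 'dn3', 'dn4']
```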
