Apache Spark is an open-source cluster computing framework for large-scale data processing. It supports batch processing, streaming analytics, machine learning, interactive queries, and graph processing. Spark Core provides distributed task dispatching and scheduling. A driver program connects to a cluster manager to run tasks on executors in worker nodes. Spark also introduces Resilient Distributed Datasets (RDDs), which allow immutable, parallel data processing. Common RDD transformations include map, flatMap, groupByKey, and reduceByKey, while common actions include reduce.
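As a quick illustration of these transformations and actions, here is a minimal word-count sketch using Spark's Java API; the input path is an assumption, and mapToPair is the Java-specific pair variant of map:

```java
import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import scala.Tuple2;

public class WordCount {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<String> lines = sc.textFile("hdfs:///data/input.txt"); // assumed input path
            JavaPairRDD<String, Integer> counts = lines
                .flatMap(line -> Arrays.asList(line.split(" ")).iterator()) // transformation
                .mapToPair(word -> new Tuple2<>(word, 1))                   // transformation
                .reduceByKey(Integer::sum);                                 // transformation (lazy)
            System.out.println("distinct words: " + counts.count());        // action triggers execution
        }
    }
}
```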
Apache Sqoop efficiently transfers bulk data between Apache Hadoop and structured datastores such as relational databases. Sqoop helps offload certain tasks (such as ETL processing) from the EDW to Hadoop for efficient execution at a much lower cost. Sqoop can also be used to extract data from Hadoop and export it into external structured datastores. Sqoop works with relational databases such as Teradata, Netezza, Oracle, MySQL, Postgres, and HSQLDB.
Apache Kafka Fundamentals for Architects, Admins and Developers – confluent
This document summarizes a presentation about Apache Kafka. It introduces Apache Kafka as a modern, distributed platform for data streams made up of distributed, immutable, append-only commit logs. It describes Kafka's scalability similar to a filesystem and guarantees similar to a database, with the ability to rewind and replay data. The document discusses Kafka topics and partitions, partition leadership and replication, and provides resources for further information.
Docker Networking with New Ipvlan and Macvlan Drivers – Brent Salisbury
This document introduces new Docker network drivers called Macvlan and Ipvlan. It provides information on setting up and using these drivers. Some key points:
- Macvlan and Ipvlan allow containers to have interfaces directly on the host network instead of going through NAT or VPN. This provides better performance and no NAT issues.
- The drivers can be used in bridge mode to connect containers to an existing network, or in L2/L3 modes for more flexibility in assigning IPs and routing.
- Examples are given for creating networks with each driver mode and verifying connectivity between containers on the same network.
- Additional features covered include IP address management, VLAN trunking, and dual-stack IPv4/IPv6 addressing.
Geneos is real-time management software developed by ITRS, a software company focused on financial markets. Geneos represents over 200 man-years of development and is used strategically by 8 of the top 10 investment banks. It provides a holistic, real-time view of systems and applications to help manage ongoing health and communicate potential issues. Unlike traditional monitoring, which is event-based, Geneos takes a proactive, value-based approach to identify issues before they impact the business.
Kafka is an open-source distributed commit log service that provides high-throughput messaging functionality. It is designed to handle large volumes of data and different use cases like online and offline processing more efficiently than alternatives like RabbitMQ. Kafka works by splitting topics into partitions spread across a cluster of machines and replicating each partition for fault tolerance. It can be used as a central data hub or pipeline for collecting, transforming, and streaming data between systems and applications.
This document provides an overview of Apache Flink, an open-source stream processing framework. It discusses Flink's capabilities in supporting streaming, batch, and iterative processing natively through a streaming dataflow model. It also describes Flink's architecture including the client, job manager, task managers, and various execution setups like local, remote, YARN, and embedded. Finally, it compares Flink to other stream and batch processing systems in terms of their APIs, fault tolerance guarantees, and strengths.
- Apache Spark is an open-source cluster computing framework for large-scale data processing. It was originally developed at the University of California, Berkeley in 2009 and is used for distributed tasks like data mining, streaming and machine learning.
- Spark utilizes in-memory computing to optimize performance. It keeps data in memory across tasks to allow for faster analytics compared to disk-based computing. Spark also supports caching data in memory to optimize repeated computations.
- Proper configuration of Spark's memory options is important to avoid out of memory errors. Options like storage fraction, execution fraction, on-heap memory size and off-heap memory size control how Spark allocates and uses memory across executors.
Apache Spark in Depth: Core Concepts, Architecture & Internals – Anton Kirillov
Slides cover core Apache Spark concepts such as RDDs, the DAG, the execution workflow, how tasks are formed into stages, and the shuffle implementation, and also describe the architecture and main components of the Spark driver. The workshop part covers Spark execution modes and provides a link to a GitHub repo which contains example Spark applications and a dockerized Hadoop environment to experiment with.
Extreme Apache Spark: how in 3 months we created a pipeline that can process ... – Josef A. Habdank
The presentation is a bundle of pro tips and tricks for building a highly scalable Apache Spark and Spark Streaming based data pipeline.
It consists of 4 parts:
* Quick intro to Spark
* N-billion rows/day system architecture
* Data Warehouse and Messaging
* How to deploy Spark so it does not backfire
Introduction to Apache Flink - Fast and reliable big data processing – Till Rohrmann
This presentation introduces Apache Flink, a massively parallel data processing engine which currently undergoes the incubation process at the Apache Software Foundation. Flink's programming primitives are presented and it is shown how easily a distributed PageRank algorithm can be implemented with Flink. Intriguing features such as dedicated memory management, Hadoop compatibility, streaming and automatic optimisation make it a unique system in the world of Big Data processing.
ZFS is a file system, volume manager, and RAID controller combined. It uses a copy-on-write design and checksums data for integrity. ZFS has advantages like speed, simplicity, self-healing capabilities, and built-in features like snapshots and data sharing. ZFS achieves these feats through its layered architecture including the ZPL, DMU, and SPA layers which handle I/O, transactions, block allocation and integrity protection.
Resilient Distributed DataSets - Apache SPARK – Taposh Roy
RDDs (Resilient Distributed Datasets) provide a fault-tolerant abstraction for data reuse across jobs in distributed applications. They allow data to be persisted in memory and manipulated using transformations like map and filter. This enables efficient processing of iterative algorithms. RDDs achieve fault tolerance by logging the transformations used to build a dataset rather than the actual data, enabling recovery of lost partitions through recomputation.
Oracle Recovery Manager (Oracle RMAN) has evolved since being released in version 8i. With the newest version of Oracle 12c, RMAN has great new features that will allow you to reduce your downtime in case of a disaster. In this session you will learn about the new features that were introduced in Oracle 12c and how you can take advantage of them from the first day you upgrade to this version.
Apache Spark is an in-memory data processing solution that can work with existing data sources like HDFS and can make use of your existing computation infrastructure like YARN/Mesos. This talk will cover a basic introduction of Apache Spark with its various components like MLlib, Shark, GraphX and with a few examples.
A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets with Jules ... – Databricks
Of all the developers’ delight, none is more attractive than a set of APIs that make developers productive, that are easy to use, and that are intuitive and expressive. Apache Spark offers these APIs across components such as Spark SQL, Streaming, Machine Learning, and Graph Processing to operate on large data sets in languages such as Scala, Java, Python, and R for doing distributed big data processing at scale. In this talk, I will explore the evolution of three sets of APIs-RDDs, DataFrames, and Datasets-available in Apache Spark 2.x. In particular, I will emphasize three takeaways: 1) why and when you should use each set as best practices 2) outline its performance and optimization benefits; and 3) underscore scenarios when to use DataFrames and Datasets instead of RDDs for your big data distributed processing. Through simple notebook demonstrations with API code examples, you’ll learn how to process big data using RDDs, DataFrames, and Datasets and interoperate among them. (this will be vocalization of the blog, along with the latest developments in Apache Spark 2.x Dataframe/Datasets and Spark SQL APIs: https://databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
XFS is a file system designed for large storage needs and high performance. It supports large files and directories through its use of extents to track file data locations. XFS provides features like dynamic inode allocation, extended attributes, disk quotas, and crash recovery through write-ahead logging to enable quick recovery of metadata after an unclean shutdown.
Apache Spark Introduction and Resilient Distributed Dataset basics and deep dive – Sachin Aggarwal
We will give a detailed introduction to Apache Spark and why and how Spark can change the analytics world. Apache Spark's memory abstraction is the RDD (Resilient Distributed DataSet). One of the key reasons why Apache Spark is so different is the introduction of RDDs. You cannot do anything in Apache Spark without knowing about RDDs. We will give a high level introduction to RDDs and in the second half we will have a deep dive into RDDs.
[Pgday.Seoul 2018] DB2PG for migrating from heterogeneous databases to PostgreSQL – PgDay.Seoul
This document discusses DB2PG, a tool for migrating data between different database management systems. It began as an internal project in 2016 and has expanded its supported migration paths over time. It can now migrate schemas, tables, data types and more between Oracle, SQL Server, DB2, MySQL and other databases. The tool uses Java and supports multi-threaded imports for faster migration. Configuration files allow customizing the data type mappings and queries used during migration. The tool is open source and available on GitHub under the GPL v3 license.
Bringing Kafka Without Zookeeper Into Production with Colin McCabe | Kafka Su... – HostedbyConfluent
The document discusses bringing Apache Kafka clusters into production without using ZooKeeper for coordination and metadata storage. It describes how Kafka uses ZooKeeper currently and the problems with this approach. It then introduces KRaft, which replaces ZooKeeper by using Raft consensus to replicate cluster metadata within Kafka. The key aspects of deploying, operating and troubleshooting KRaft-based Kafka clusters are covered, including formatting storage, controller setup, rolling upgrades, and examining the replicated metadata log.
Upgrading HDFS to 3.3.0 and deploying RBF in production #LINE_DM – Yahoo!デベロッパーネットワーク
Slides from LINE Developer Meetup #68 - Big Data Platform. They cover the HDFS major version upgrade and the adoption of Router-based Federation (RBF). Event page: https://line.connpass.com/event/188176/
It's Time To Stop Using Lambda Architecture | Yaroslav Tkachenko, Shopify – HostedbyConfluent
Lambda Architecture has been a common way to build data pipelines for a long time, despite difficulties in maintaining two complex systems. An alternative, Kappa Architecture, was proposed in 2014, but many companies are still reluctant to switch to Kappa. And there is a reason for that: even though Kappa generally provides a simpler design and similar or lower latency, there are a lot of practical challenges in areas like exactly-once delivery, late-arriving data, historical backfill and reprocessing.
In this talk, I want to show how you can solve those challenges by embracing Apache Kafka as a foundation of your data pipeline and leveraging modern stream-processing frameworks like Apache Kafka Streams and Apache Flink.
Hadoop Meetup Jan 2019 - HDFS Scalability and Consistent Reads from Standby Node – Erik Krogen
Konstantin Shvachko and Chen Liang of LinkedIn team up with Chao Sun of Uber to present on the current state of and future plans for HDFS scalability, with an extended discussion of the newly introduced read-from-standby feature.
This is taken from the Apache Hadoop Contributors Meetup on January 30, hosted by LinkedIn in Mountain View.
Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming apps. It provides a unified, scalable, and durable platform for handling real-time data feeds. Kafka works by accepting streams of records from one or more producers and organizing them into topics. It allows both storing and forwarding of these streams to consumers. Producers write data to topics which are replicated across clusters for fault tolerance. Consumers can then read the data from the topics in the order it was produced. Major companies like LinkedIn, Yahoo, Twitter, and Netflix use Kafka for applications like metrics, logging, stream processing and more.
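As a minimal illustration of the producer side of this model, here is a sketch using Kafka's Java client; the broker address, topic name, and record contents are assumptions:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class PageViewProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker1:9092");                 // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key go to the same partition, so per-key order is preserved.
            producer.send(new ProducerRecord<>("page-views", "user-42", "/home"));
        }
    }
}
```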
We will see the internal architecture of a Spark cluster, i.e. what the driver, worker, executor and cluster manager are, how a Spark program runs on the cluster, and what jobs, stages and tasks are.
YARN is a framework for job scheduling and cluster resource management. It improves on classic MapReduce by separating resource management from job scheduling and tracking. In YARN, a resource manager allocates containers for tasks from applications and monitors containers. An application master negotiates container resources and coordinates tasks within the application. Tasks execute in containers managed by node managers. The application progress and completion is tracked and reported by the application master.
1) A job is first submitted to the Hadoop cluster by a client calling the Job.submit() method. This generates a unique job ID and copies the job files to HDFS.
2) The JobTracker then initializes the job by splitting it into tasks like map and reduce tasks. It assigns tasks to TaskTrackers based on data locality.
3) Each TaskTracker executes tasks by copying job files, running tasks in a child JVM, and reporting progress back to the JobTracker.
4) The JobTracker tracks overall job status and progress by collecting task status updates from TaskTrackers. It reports this information back to clients.
5) Once all tasks complete successfully, the JobTracker marks the job as successful, and the client learns of the completion when it next polls for status. A sketch of the client-side submission call is shown below.
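This is a minimal driver sketch using the standard MapReduce Job API; the mapper/reducer classes and the input/output paths are illustrative assumptions, and waitForCompletion() calls submit() internally before polling for progress:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCountDriver {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("/data/in"));     // assumed input path
        FileOutputFormat.setOutputPath(job, new Path("/data/out"));  // assumed output path
        // waitForCompletion() submits the job (generating the job ID and copying
        // job resources to HDFS) and then polls for progress until it finishes.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```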
In-memory Caching in HDFS: Lower Latency, Same Great Taste – DataWorks Summit
This document discusses in-memory caching in HDFS to improve query latency. The implementation caches important datasets in the DataNode memory and allows clients to directly access cached blocks via zero-copy reads without checksum verification. Evaluation shows the zero-copy reads approach provides significant performance gains over short-circuit and TCP reads for both microbenchmarks and Impala queries, with speedups of up to 7x when the working set fits in memory. MapReduce jobs see more modest gains as they are often not I/O bound.
Building end to end streaming application on Spark – datamantra
This document discusses building a real-time streaming application on Spark to analyze sensor data. It describes collecting data from servers through Flume into Kafka and processing it using Spark Streaming to generate analytics stored in Cassandra. The stages involve using files, then Kafka, and finally Cassandra for input/output. Testing streaming applications and redesigning for testability is also covered.
Improving Mobile Payments With Real time Spark – datamantra
This document discusses improving mobile payments by implementing real-time analytics using Apache Spark streaming. The initial solution involved batch processing of mobile payment event data. The new solution uses Spark streaming to analyze data in real-time from sources like Amazon Kinesis. This allows for automatic alerts and a closed feedback loop. Challenges in moving from batch to streaming processing and optimizing the Python code are also covered.
This document provides an overview and introduction to Apache Flink, a stream-based big data processing engine. It discusses the evolution of big data frameworks to platforms and the shortcomings of Spark's RDD abstraction for streaming workloads. The document then introduces Flink, covering its history, key differences from Spark like its use of streaming as the core abstraction, and examples of using Flink for batch and stream processing.
Interactive Data Analysis in Spark Streaming – datamantra
This document discusses strategies for building interactive streaming applications in Spark Streaming. It describes using Zookeeper as a dynamic configuration source to allow modifying a Spark Streaming application's behavior at runtime. The key points are:
- Zookeeper can be used to track configuration changes and trigger Spark Streaming context restarts through its watch mechanism and Curator library.
- This allows building interactive applications that can adapt to configuration updates without needing to restart the whole streaming job.
- Examples are provided of using Curator caches like node and path caches to monitor Zookeeper for changes and restart Spark Streaming contexts in response.
Apache Spark is a fast, general engine for large-scale data processing. It provides a unified analytics engine for batch, interactive, and stream processing using an in-memory abstraction called resilient distributed datasets (RDDs). Spark's speed comes from its ability to run computations directly on data stored in cluster memory and optimize performance through caching. It also integrates well with other big data technologies like HDFS, Hive, and HBase. Many large companies are using Spark for its speed, ease of use, and support for multiple workloads and languages.
Python in the Hadoop Ecosystem (Rock Health presentation) – Uri Laserson
A presentation covering the use of Python frameworks on the Hadoop ecosystem. Covers, in particular, Hadoop Streaming, mrjob, luigi, PySpark, and using Numba with Impala.
Many believe Big Data is a brand new phenomenon. It isn't; it is part of an evolution that reaches far back in history. Here are some of the key milestones in this development.
HBase and HDFS: Understanding FileSystem Usage in HBase – enissoz
This document discusses file system usage in HBase. It provides an overview of the three main file types in HBase: write-ahead logs (WALs), data files, and reference files. It describes durability semantics, IO fencing techniques for region server recovery, and how HBase leverages data locality through short circuit reads, checksums, and block placement hints. The document is intended to help understand HBase's interactions with HDFS for tuning IO performance.
This document provides an overview of Hadoop architecture. It discusses how Hadoop uses MapReduce and HDFS to process and store large datasets reliably across commodity hardware. MapReduce allows distributed processing of data through mapping and reducing functions. HDFS provides a distributed file system that stores data reliably in blocks across nodes. The document outlines components like the NameNode, DataNodes and how Hadoop handles failures transparently at scale.
In this session you will learn:
History of Hadoop
Hadoop Ecosystem
Hadoop Animal Planet
What is Hadoop?
Distinctions of Hadoop
Hadoop Components
The Hadoop Distributed Filesystem
Design of HDFS
When Not to use Hadoop?
HDFS Concepts
Anatomy of a File Read
Anatomy of a File Write
Replication & Rack awareness
Mapreduce Components
Typical Mapreduce Job
To know more, click here: https://www.mindsmapped.com/courses/big-data-hadoop/big-data-and-hadoop-training-for-beginners/
HDFS is Hadoop's distributed file system that stores large files across multiple machines. It splits files into blocks and replicates them across the cluster for reliability. The NameNode manages the file system metadata and DataNodes store the actual blocks. In the event of a failure, replicated blocks allow the data to be recovered. The NameNode aims to place replicas in different racks to avoid single points of failure and improve read performance. HDFS is best for large, immutable files while other options may be better for small files or low-latency access.
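As a brief client-side illustration, here is a minimal read sketch using the standard Hadoop FileSystem API (the file path is an assumption); block lookup and replica selection happen beneath the open() call:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();        // reads core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path("/data/sfo_crimes.csv");    // assumed path
        // open() asks the NameNode for block locations; the stream then reads
        // each block from a DataNode holding a replica, preferring close ones.
        try (FSDataInputStream in = fs.open(path);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            System.out.println(reader.readLine());
        }
    }
}
```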
Apache Hadoop India Summit 2011 Keynote talk "HDFS Federation" by Sanjay Radia – Yahoo Developer Network
This document discusses scaling HDFS through federation. HDFS currently uses a single namenode that limits scalability. Federation allows multiple independent namenodes to each manage a subset of the namespace, improving scalability. It also generalizes the block storage layer to use block pools, separating block management from namenodes. This paves the way for horizontal scaling of both namenodes and block storage in the future. Federation preserves namenode robustness while requiring few code changes. It also provides benefits like improved isolation and availability when scaling to extremely large clusters with billions of files and blocks.
Hadoop is an open-source software framework for distributed storage and processing of large datasets. It has three core components: HDFS for storage, MapReduce for processing, and YARN for resource management. HDFS stores data as blocks across clusters of commodity servers. MapReduce allows distributed processing of large datasets in parallel. YARN improves on MapReduce and provides a general framework for distributed applications beyond batch processing.
Hadoop is an open source framework for distributed storage and processing of large datasets across commodity hardware. It has two main components - the Hadoop Distributed File System (HDFS) for storage, and MapReduce for processing. HDFS stores data across clusters in a redundant and fault-tolerant manner. MapReduce allows distributed processing of large datasets in parallel using map and reduce functions. The architecture aims to provide reliable, scalable computing using commodity hardware.
This presentation about Hadoop architecture will help you understand the architecture of Apache Hadoop in detail. In this video, you will learn what Hadoop is, the components of Hadoop, what HDFS is, the HDFS architecture, Hadoop MapReduce, a Hadoop MapReduce example, Hadoop YARN and finally, a demo on MapReduce. Apache Hadoop offers a versatile, adaptable and reliable distributed computing big data framework for a group of systems with limited capacity and local computing power. After watching this video, you will also understand the Hadoop Distributed File System and its features along with the practical implementation.
Below are the topics covered in this Hadoop Architecture presentation:
1. What is Hadoop?
2. Components of Hadoop
3. What is HDFS?
4. HDFS Architecture
5. Hadoop MapReduce
6. Hadoop MapReduce Example
7. Hadoop YARN
8. Demo on MapReduce
What are the course objectives?
This course will enable you to:
1. Understand the different components of Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Who should take up this Big Data and Hadoop Certification Training Course?
Big Data career opportunities are on the rise, and Hadoop is quickly becoming a must-know technology for the following professionals:
1. Software Developers and Architects
2. Analytics Professionals
3. Senior IT professionals
4. Testing and Mainframe professionals
5. Data Management Professionals
6. Business Intelligence Professionals
7. Project Managers
8. Aspiring Data Scientists
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
A simple replication-based mechanism has been used to achieve high data reliability in the Hadoop Distributed File System (HDFS). However, replication-based mechanisms have a high disk storage requirement, since they make copies of full blocks without consideration of storage size. Studies have shown that an erasure-coding mechanism can provide more storage space when used as an alternative to replication. It can also increase write throughput compared to a replication mechanism. To improve both space efficiency and I/O performance of HDFS while preserving the same data reliability level, we propose HDFS+, an erasure-coding-based Hadoop Distributed File System. The proposed scheme writes a full block on the primary DataNode and then performs erasure coding with a Vandermonde-based Reed-Solomon algorithm that divides data into m fragments and encodes them into n fragments (n>m), which are saved in n distinct DataNodes such that the original object can be reconstructed from any m fragments. The experimental results show that our scheme can save up to 33% of storage space while outperforming the original scheme in write performance by 1.4 times. Our scheme provides the same read performance as the original scheme as long as data can be read from the primary DataNode, even under single-node or double-node failure. Otherwise, the read performance of HDFS+ decreases to some extent. However, as the number of fragments increases, we show that the performance degradation becomes negligible.
Hadoop Interview Questions And Answers Part-1 | Big Data Interview Questions ... – Simplilearn
This video on Hadoop interview questions part-1 will take you through the general Hadoop questions and questions on HDFS, MapReduce and YARN, which are very likely to be asked in any Hadoop interview. It covers all the topics on the major components of Hadoop. This Hadoop tutorial will give you an idea about the different scenario-based questions you could face and some multiple-choice questions as well. Now, let us dive into this Hadoop interview questions video and gear up for your next Hadoop interview.
What is this Big Data Hadoop training course about?
The Big Data Hadoop and Spark developer course has been designed to impart an in-depth knowledge of Big Data processing using Hadoop and Spark. The course is packed with real-life projects and case studies to be executed in the CloudLab.
What are the course objectives?
This course will enable you to:
1. Understand the different components of the Hadoop ecosystem such as Hadoop 2.7, Yarn, MapReduce, Pig, Hive, Impala, HBase, Sqoop, Flume, and Apache Spark
2. Understand Hadoop Distributed File System (HDFS) and YARN as well as their architecture, and learn how to work with them for storage and resource management
3. Understand MapReduce and its characteristics, and assimilate some advanced MapReduce concepts
4. Get an overview of Sqoop and Flume and describe how to ingest data using them
5. Create database and tables in Hive and Impala, understand HBase, and use Hive and Impala for partitioning
6. Understand different types of file formats, Avro Schema, using Avro with Hive and Sqoop, and schema evolution
7. Understand Flume, Flume architecture, sources, flume sinks, channels, and flume configurations
8. Understand HBase, its architecture, data storage, and working with HBase. You will also understand the difference between HBase and RDBMS
9. Gain a working knowledge of Pig and its components
10. Do functional programming in Spark
11. Understand resilient distributed datasets (RDDs) in detail
12. Implement and build Spark applications
13. Gain an in-depth understanding of parallel processing in Spark and Spark RDD optimization techniques
14. Understand the common use-cases of Spark and the various interactive algorithms
15. Learn Spark SQL, creating, transforming, and querying Data frames
Learn more at https://www.simplilearn.com/big-data-and-analytics/big-data-and-hadoop-training
Updated version of my talk about Hadoop 3.0 with the newest community updates.
Talk given at the codecentric Meetup Berlin on 31.08.2017 and on Data2Day Meetup on 28.09.2017 in Heidelberg.
1. The HDFS client write flow involves the client calling DistributedFileSystem.create() to create a file, which performs an RPC call to the namenode to add the file. A DFSOutputStream is created and a DataStreamer thread is started.
2. The client writes data by filling buffers that are flushed and grouped into packets. Packets are enqueued for asynchronous processing by the DataStreamer thread.
3. The DataStreamer reads packets and writes the data to datanodes, which write it to local disk and forward it to their mirrors. For the last packet of a block, a finalize-block call is made to the namenode.
Data correlation using PySpark and HDFS – John Conley
This document discusses using PySpark and HDFS to correlate different types of data at scale. It describes some challenges with correlating out-of-order and high-volume data. It then summarizes three approaches tried: using Redis, RDD joins, and writing bindings to HDFS. The current recommended approach reads relevant binding buckets from HDFS to correlate records in small windows, supporting different temporal models. Configuration and custom logic can be plugged in at various points in the correlation process. While scalable, further improvements in latency and throughput are still needed.
Some of the common interview questions asked during a Big Data Hadoop interview. Be prepared with answers for the questions below, and have an example ready to explain how you worked through similar problems. Hadoop developers are expected to have references and be able to explain from their past experiences. All the best for a successful career as a Hadoop developer!
The goal is to develop an indexing system which helps build unsupervised indexing for Big Data. With this indexing system, one can search for data files not only based on keywords and file names but also by the closest meaningful data to the input content (a clustering approach).
This project report describes implementing a peer-to-peer DNS service using Chord, a distributed hash table. Key points:
1. A P2P DNS eliminates hierarchy and single points of failure in traditional DNS, improving fault tolerance and load balancing.
2. The implementation maps DNS records to nodes in a Chord ring using consistent hashing. Record lookups are routed through the ring in O(logN) hops on average.
3. An evaluation compares the P2P DNS to traditional DNS, finding improvements in average query time and number of hops due to the lack of hierarchy. Network latency is simulated for fair comparison on a single machine testbed.
With Hadoop-3.0.0-alpha2 being released in January 2017, it's time to have a closer look at the features and fixes of Hadoop 3.0.
We will have a look at Core Hadoop, HDFS and YARN, and answer the emerging question whether Hadoop 3.0 will be an architectural revolution like Hadoop 2 was with YARN & Co. or will it be more of an evolution adapting to new use cases like IoT, Machine Learning and Deep Learning (TensorFlow)?
A brief introduction to Hadoop distributed file system. How a file is broken into blocks, written and replicated on HDFS. How missing replicas are taken care of. How a job is launched and its status is checked. Some advantages and disadvantages of HDFS-1.x
The document discusses Hadoop, its components, and how they work together. It covers HDFS, which stores and manages large files across commodity servers; MapReduce, which processes large datasets in parallel; and other tools like Pig and Hive that provide interfaces for Hadoop. Key points are that Hadoop is designed for large datasets and hardware failures, HDFS replicates data for reliability, and MapReduce moves computation instead of data for efficiency.
HBaseConAsia2018 Track1-7: HDFS optimizations for HBase at Xiaomi – Michael Stack
This document discusses optimizations made to HDFS for HBase at Xiaomi. It addresses issues like shared memory allocation causing full GC in datanodes, listen drops on SSD clusters causing delays, peer cache bucket adjustment, and connection timeouts. Changes like preallocating shared memory, increasing the socket backlog, reducing client and datanode timeouts, and adjusting datanode dead-node detection help improve performance and availability. The overall goal is to maintain local data, return fast responses from HDFS to HBase, and reduce GC overhead from both systems.
3. [Diagram: example cluster in data center D1 with one Name Node and two racks, R1 (nodes R1N1-R1N4) and R2 (nodes R2N1-R2N4).]
1. This is our example Hadoop cluster.
2. It has one name node and two racks named R1 and R2 in a data center D1. Each rack has 4 nodes and they are uniquely identified as R1N1, R1N2 and so on.
3. Replication factor is 3.
4. HDFS block size is 64 MB.
5. 1. The name node saves part of the HDFS metadata, like file locations and permissions, in files called the namespace image and edit logs. Files are stored in HDFS as blocks, but this block information is not saved in any file. Instead it is gathered every time the cluster is started, and it is held in the name node's memory.
2. Replica placement: assuming the replication factor is 3, when a file is written from a data node (say R1N1), Hadoop attempts to save the first replica on that same data node (R1N1). The second replica is written to a node (R2N2) in a different rack (R2). The third replica is written to another node (R2N1) in the same rack (R2) where the second replica was saved.
3. Hadoop takes a simple approach in which the network is represented as a tree and the distance between two nodes is the sum of their distances to their closest common ancestor. The levels can be: "Data Center" > "Rack" > "Node". For example, '/d1/r1/n1' represents a node named n1 on rack r1 in data center d1. Distance calculation has 4 possible scenarios:
   1. distance(/d1/r1/n1, /d1/r1/n1) = 0 [processes on the same node]
   2. distance(/d1/r1/n1, /d1/r1/n2) = 2 [different nodes in the same rack]
   3. distance(/d1/r1/n1, /d1/r2/n3) = 4 [nodes in different racks in the same data center]
   4. distance(/d1/r1/n1, /d2/r3/n4) = 6 [nodes in different data centers]
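To make the distance rule concrete, here is a small self-contained sketch (an illustration only, not Hadoop's actual NetworkTopology implementation) that computes this distance from two '/datacenter/rack/node' path strings:

```java
// Illustrative sketch: tree distance between two nodes identified by paths like "/d1/r1/n1",
// following the rule above (sum of each node's distance to the closest common ancestor).
public class TopologyDistance {
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");   // e.g. ["d1", "r1", "n1"]
        String[] pb = b.substring(1).split("/");
        int depth = pa.length;                     // 3 levels: data center > rack > node
        int common = 0;
        while (common < depth && pa[common].equals(pb[common])) {
            common++;
        }
        // Each node is (depth - common) hops from the closest common ancestor.
        return (depth - common) * 2;
    }

    public static void main(String[] args) {
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n1")); // 0: same node
        System.out.println(distance("/d1/r1/n1", "/d1/r1/n2")); // 2: same rack
        System.out.println(distance("/d1/r1/n1", "/d1/r2/n3")); // 4: same data center
        System.out.println(distance("/d1/r1/n1", "/d2/r3/n4")); // 6: different data centers
    }
}
```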
7. [Diagram: the HDFS client on the R1N1 JVM calls create() on DistributedFileSystem, which makes an RPC call to the Name Node to create the new file sfo_crimes.csv and returns an FSDataOutputStream wrapping a DFSOutputStream.]
• Let's say we are trying to write the "sfo_crimes.csv" file from R1N1.
• So an HDFS client program will run on R1N1's JVM.
• First the HDFS client program calls the method create() on a Java class DistributedFileSystem (a subclass of FileSystem).
• DFS makes an RPC call to the name node to create a new file in the file system's namespace. No blocks are associated with the file at this stage.
• The name node performs various checks: it ensures the file doesn't already exist and that the user has the right permissions to create the file. Then the name node creates a record for the new file.
• Then DFS creates an FSDataOutputStream for the client to write data to. FSDOS wraps a DFSOutputStream, which handles communication with the DNs and the NN.
• In response to 'FileSystem.create()', the HDFS client receives this FSDataOutputStream.
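On the client side, this whole create-and-write flow is driven through the standard FileSystem API; a minimal sketch, assuming a configured HDFS client and reusing the example file name:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();     // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);         // a DistributedFileSystem when fs.defaultFS is hdfs://
        Path path = new Path("/data/sfo_crimes.csv"); // assumed directory
        // create() performs the RPC to the name node described above and returns
        // an FSDataOutputStream that wraps a DFSOutputStream.
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeBytes("IncidntNum,Category,Descript\n"); // assumed CSV header
        } // close() flushes the remaining packets and informs the name node
    }
}
```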
8. [Diagram: the HDFS client on the R1N1 JVM calls write() on the FSDataOutputStream; inside the DFSOutputStream, a data queue, an ack queue and a DataStreamer thread coordinate with the Name Node.]
• From now on the HDFS client deals with the FSDataOutputStream.
• The HDFS client invokes write() on the stream.
• The following are the important components involved in a file write:
• Data Queue: when the client writes data, DFSOS splits it into packets and writes them into this internal queue.
• DataStreamer: the data queue is consumed by this component, which also communicates with the name node for block allocation.
• Ack Queue: packets consumed by the DataStreamer are temporarily saved in this internal queue.
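The interplay of the data queue, ack queue and DataStreamer can be pictured with a small producer-consumer sketch; this illustrates the pattern only and is not the real DFSOutputStream implementation:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class PacketQueuesSketch {
    static class Packet {
        final long seqno;
        final byte[] data;
        Packet(long seqno, byte[] data) { this.seqno = seqno; this.data = data; }
    }

    private final BlockingQueue<Packet> dataQueue = new LinkedBlockingQueue<>();
    private final BlockingQueue<Packet> ackQueue = new LinkedBlockingQueue<>();

    // Client write path: data is split into packets and enqueued.
    void write(Packet p) throws InterruptedException {
        dataQueue.put(p);
    }

    // DataStreamer path: consume a packet, keep a copy until it is acknowledged.
    void streamOnePacket() throws InterruptedException {
        Packet p = dataQueue.take();
        ackQueue.put(p);
        sendToPipeline(p);
    }

    // Ack path: the last datanode's acknowledgement lets us drop the saved copy.
    void onAck(long seqno) {
        ackQueue.removeIf(p -> p.seqno == seqno);
    }

    private void sendToPipeline(Packet p) { /* stream the packet to the first datanode */ }
}
```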
9. [Diagram: packets P1-P6 queued in the data queue inside the DFSOutputStream on the R1N1 JVM; the DataStreamer streams them through a pipeline of datanodes (R1N1, R2N1, R1N2) allocated by the Name Node.]
• As said, data written by the client will be converted into packets and stored in the data queue.
• The DataStreamer communicates with the NN to allocate new blocks by picking a list of suitable DNs to store the replicas. The NN uses 'Replica Placement' as the strategy to pick DNs for a block.
• The list of DNs forms a pipeline. Since the replication factor is assumed to be 3, there are 3 nodes picked by the NN.
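The pipeline length follows directly from the replication factor, which is an ordinary client-side setting; a minimal sketch (dfs.replication is the standard property name, the path is an assumption):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationSetting {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setInt("dfs.replication", 3);   // files created by this client get 3 replicas
        FileSystem fs = FileSystem.get(conf);
        // Replication can also be changed per file after the fact:
        fs.setReplication(new Path("/data/sfo_crimes.csv"), (short) 3);
    }
}
```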
10. [Diagram: packets P3-P8 wait in the data queue while P1 and P2 sit in the ack queue; the DataStreamer streams P1 through the pipeline (R1N1, R2N1, R1N2) and acknowledgements flow back from each datanode; the client finally calls close() on the FSDataOutputStream.]
• The DataStreamer consumes a few packets from the data queue. A copy of the consumed data is stored in the 'ack queue'.
• The DataStreamer streams the packet to the first node in the pipeline. Once the data is written on DN1, the data is forwarded to the next DN. This repeats till the last DN.
• Once the packet is written to the last DN, an acknowledgement is sent from each DN to the DFSOS. The packet P1 is then removed from the ack queue.
• The whole process continues till a block is filled. After that, the pipeline is closed and the DataStreamer asks the NN for a fresh set of DNs for the next block. And the cycle repeats.
• The HDFS client calls the close() method once the write is finished. This flushes all the remaining packets to the pipeline and waits for acks before informing the NN that the write is complete.
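For callers that need data to be visible or durable before close(), the FSDataOutputStream also exposes explicit flush calls; a small sketch, assuming an HDFS-backed FileSystem is already available:

```java
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class DurableWrite {
    // Writes one record and forces it out to the pipeline before close().
    static void writeRecord(FileSystem fs, Path path, String line) throws Exception {
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeBytes(line);
            out.hflush(); // flush client buffers so new readers can see the data
            out.hsync();  // additionally ask the datanodes to persist it to disk
        }                 // close() flushes remaining packets, waits for acks, informs the NN
    }
}
```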
12. [Diagram: a write in progress, with packets queued in the data queue and P1, P2 in the ack queue; the pipeline is R1N1, R2N1, R1N2, and an error occurs at datanode R2N1 while P1 is being streamed.]
• A normal write begins with a write() method call from the HDFS client on the stream. Let's say an error occurred while writing to R2N1.
• The pipeline will be closed.
• Packets in the ack queue are moved to the front of the data queue.
• The current block on the good DNs is given a new identity, and this is communicated to the NN, so the partial block on the failed DN will be deleted if the failed DN recovers later.
• The failed data node is removed from the pipeline and the remaining data is written to the remaining two DNs.
• The NN notices that the block is under-replicated, and it arranges for a further replica to be created on another node.
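How aggressively the client repairs a broken pipeline is tunable on the client side; the sketch below shows the standard replace-datanode-on-failure properties (the values are illustrative and defaults vary by Hadoop version):

```java
import org.apache.hadoop.conf.Configuration;

public class PipelineFailurePolicy {
    public static Configuration clientConf() {
        Configuration conf = new Configuration();
        // Allow the client to ask the name node for a replacement datanode
        // when one member of the write pipeline fails.
        conf.setBoolean("dfs.client.block.write.replace-datanode-on-failure.enable", true);
        // DEFAULT only replaces failed datanodes for larger pipelines or appends;
        // ALWAYS and NEVER are the alternative policies.
        conf.set("dfs.client.block.write.replace-datanode-on-failure.policy", "DEFAULT");
        return conf;
    }
}
```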