Hadoop Cluster
[Figure: layered architecture of a Hadoop cluster]
• Application Layer
• Resource Management Layer
• Storage Layer
HDFS
HDFS stands for Hadoop Distributed File System. It provides the data storage layer of Hadoop. HDFS splits each file into smaller units called blocks and stores them in a distributed manner across the cluster.
It runs two types of nodes:
• Master node – NameNode
• Slave nodes – DataNodes
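The master/slave split above can be sketched with a toy model (not the real Hadoop API): a stand-in "NameNode" that splits a file into fixed-size blocks and assigns each block's replicas to "DataNodes" round-robin. The node names and the round-robin policy are illustrative assumptions; real HDFS placement is rack-aware.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the HDFS default block size

def place_blocks(file_size, datanodes, replication=3):
    """Toy NameNode logic: map each block index to the DataNodes
    holding a replica of that block (round-robin placement)."""
    num_blocks = math.ceil(file_size / BLOCK_SIZE)
    return {
        b: [datanodes[(b + r) % len(datanodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

# A 300 MB file becomes 3 blocks (128 MB + 128 MB + 44 MB).
mapping = place_blocks(300 * 1024 * 1024, ["dn1", "dn2", "dn3", "dn4"])
print(len(mapping))   # 3
print(mapping[0])     # ['dn1', 'dn2', 'dn3']
```

The NameNode keeps only this block-to-node metadata; the DataNodes store the actual block contents.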
Block in HDFS
A block is the smallest unit of storage on a computer system: the smallest contiguous storage allocated to a file. In Hadoop, the default block size is 128 MB, and it is often configured to 256 MB.
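A quick worked example of the block split, using the 128 MB default (file sizes here are illustrative):

```python
import math

block_size_mb = 128  # HDFS default block size

# A 1 GB (1024 MB) file divides evenly into 8 blocks.
print(math.ceil(1024 / block_size_mb))  # 8

# A 200 MB file needs 2 blocks: one full 128 MB block plus a 72 MB
# final block. HDFS does not pad the last block to the full size,
# so it occupies only 72 MB on disk.
print(math.ceil(200 / block_size_mb))   # 2
print(200 - block_size_mb)              # 72
```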
What is a Cluster?
⮚ A Hadoop cluster is a collection of computers, known as nodes, that are networked together to perform parallel computations on big data sets.
⮚ Worked example: assuming each DataNode provides roughly 72 TB of usable storage, the number of DataNodes required to store 500 TB of data equals 500/72, or approximately 7.
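The sizing arithmetic above can be written out as follows; the 72 TB per-node figure is the assumption the example divides by, and the replication variant is an extra sketch using HDFS's default replication factor of 3 (not part of the original example):

```python
import math

data_tb = 500          # data to store
node_capacity_tb = 72  # assumed usable storage per DataNode

# Without replication: 500 / 72 ≈ 6.94, rounded up to whole nodes.
nodes = math.ceil(data_tb / node_capacity_tb)
print(nodes)  # 7

# With HDFS's default replication factor of 3, the raw storage
# requirement (and hence the node count) triples.
replication = 3
nodes_replicated = math.ceil(data_tb * replication / node_capacity_tb)
print(nodes_replicated)  # 21
```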
Communication Protocols Used in Hadoop Clusters
⮚ The HDFS communication protocols work on top of the TCP/IP protocol.
⮚ A client establishes a connection to a configurable TCP port on the NameNode and talks to it using the ClientProtocol; DataNodes talk to the NameNode using the DataNode Protocol.
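The layering over TCP/IP can be illustrated with a toy request/response exchange. This is not Hadoop's actual RPC code; the request string, the reply, and the "NameNode stub" are made up for the sketch, but like the ClientProtocol, the exchange rides on a plain TCP connection.

```python
import queue
import socket
import threading

port_q = queue.Queue()  # hands the OS-chosen port to the client

def namenode_stub():
    """Stand-in 'NameNode': answer one request over a TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", 0))        # let the OS pick a free port
        port_q.put(srv.getsockname()[1])
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            request = conn.recv(1024)
            if request == b"getBlockLocations /data/file":
                conn.sendall(b"block-0 -> dn1,dn2,dn3")

t = threading.Thread(target=namenode_stub)
t.start()

# "Client protocol": connect over TCP, send a request, read the reply.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect(("127.0.0.1", port_q.get()))
    cli.sendall(b"getBlockLocations /data/file")
    reply = cli.recv(1024)
t.join()
print(reply.decode())
```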