Computer Science Apprenticeship Bigdata Assignement3

The Hadoop Distributed File System (HDFS) is a distributed file system designed to store large amounts of data across clusters of machines and make it available for parallel processing. HDFS uses a master/slave architecture: the NameNode is the master that manages file metadata, and DataNodes store the data blocks, which are replicated for fault tolerance. The Secondary NameNode periodically checkpoints the NameNode's metadata; it is not a standby NameNode. HDFS scales by adding machines and tolerates machine failures through replication, though it is poorly suited to many small files or frequent small writes. Careful configuration of resources and data placement is important for good HDFS performance.

Uploaded by

abood jallad

Computer Science Apprenticeship (CAP)

An-Najah National University

Big Data Course - 2022-2023 (FALL)

Assignment-3: A report about Hadoop Distributed File System (HDFS)

Instructor: Dr. Hamed Abdelhaq

Due Date: 24/12/2022

-----------------------------------------------------------------------------------------------------------------

The Hadoop Distributed File System (HDFS) is a distributed file
system designed to run on a cluster of machines. It was
developed as part of the Apache Hadoop project and is
designed to store and manage large amounts of data in a
distributed manner, allowing it to be accessed and processed
concurrently by multiple machines.
The main purpose of HDFS is to enable the efficient storage and
processing of large amounts of data in a distributed
environment. This is particularly useful in big data scenarios,
where the volume of data is too large to be stored or processed
on a single machine. HDFS splits each file into blocks and
distributes those blocks across multiple machines, so that
different parts of the data can be read and processed in
parallel and the cluster's resources are used more efficiently.
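
The idea of spreading one large file over many machines can be sketched with a little arithmetic. The snippet below is only an illustration: the function name is hypothetical, and the 128 MiB block size is an assumption (it is a configurable setting, not something stated in this report).

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (128 MiB)
MIB = 1024 * 1024


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file would occupy."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    # Only the last block may be smaller than the block size.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


sizes = split_into_blocks(300 * MIB)
print(len(sizes), [size // MIB for size in sizes])  # 3 [128, 128, 44]
```

Each of the three blocks could be stored on, and read from, a different machine.
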
The main components of HDFS include the NameNode, the
DataNode, and the Secondary NameNode. The NameNode is
the master node in the HDFS architecture and is responsible for
managing the file system namespace and maintaining metadata
about the files and directories in the file system, including
which DataNodes hold each block. The DataNodes store the
actual data blocks and create, delete, and replicate blocks
when instructed to by the NameNode. The Secondary NameNode,
despite its name, is not a standby for the NameNode. It
periodically merges the NameNode's edit log into the file
system image to create checkpoints of the metadata, which
keeps the edit log small and shortens NameNode restarts.
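
The split between metadata and data can be made concrete with a toy model. This is a minimal sketch, not how Hadoop is implemented: the class and function names are hypothetical, the block size of 4 bytes is chosen only to keep the output small, and the round-robin placement is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDataNode:
    """Holds block contents only; knows nothing about file names."""
    name: str
    blocks: dict = field(default_factory=dict)  # block ID -> bytes


@dataclass
class ToyNameNode:
    """Holds metadata only; never stores file contents."""
    files: dict = field(default_factory=dict)      # path -> list of block IDs
    locations: dict = field(default_factory=dict)  # block ID -> DataNode names


def write_file(namenode, datanodes, path, data, block_size=4, replication=2):
    """Split data into blocks and store each block on `replication` nodes."""
    if replication > len(datanodes):
        raise ValueError("replication exceeds the number of DataNodes")
    block_ids = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"{path}#blk_{index}"
        # Simplified round-robin placement (an assumption of this sketch).
        targets = [datanodes[(index + r) % len(datanodes)]
                   for r in range(replication)]
        for node in targets:
            node.blocks[block_id] = data[offset:offset + block_size]
        namenode.locations[block_id] = [node.name for node in targets]
        block_ids.append(block_id)
    namenode.files[path] = block_ids


def read_file(namenode, datanodes, path):
    """Ask the NameNode where the blocks are, then fetch them from DataNodes."""
    by_name = {node.name: node for node in datanodes}
    parts = []
    for block_id in namenode.files[path]:
        holder = by_name[namenode.locations[block_id][0]]
        parts.append(holder.blocks[block_id])
    return b"".join(parts)


namenode = ToyNameNode()
datanodes = [ToyDataNode("dn1"), ToyDataNode("dn2"), ToyDataNode("dn3")]
write_file(namenode, datanodes, "/demo.txt", b"hello hdfs!")
print(namenode.files["/demo.txt"])
print(read_file(namenode, datanodes, "/demo.txt"))
```

The toy NameNode never touches the file's bytes; a client only consults it to learn which DataNodes to contact.
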
One of the key characteristics of HDFS is its fault tolerance.
HDFS is designed to withstand the failure of individual
machines within the cluster without losing data. This is
achieved through the use of replicas, which are copies of each
data block stored on different machines in the cluster (three
copies by default). In the event of a machine failure, the data
can still be read from one of the remaining replicas.
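
A small simulation shows why replicas keep data readable after a failure. It is a sketch under stated assumptions: the helper names are hypothetical, the replication factor of 3 is assumed, and replicas are placed round-robin rather than by HDFS's real placement policy.

```python
REPLICATION = 3  # assumed replication factor


def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: {nodes[(i + r) % len(nodes)] for r in range(replication)}
        for i, block in enumerate(block_ids)
    }


def surviving_replicas(placement, failed):
    """Return, per block, the nodes that still hold a replica."""
    return {block: holders - failed for block, holders in placement.items()}


nodes = ["dn1", "dn2", "dn3", "dn4"]
placement = place_replicas(["blk_0", "blk_1", "blk_2", "blk_3"], nodes)

after = surviving_replicas(placement, failed={"dn2"})
all_readable = all(after.values())  # an empty set would mean a lost block
under_replicated = sorted(b for b, h in after.items() if len(h) < REPLICATION)

print(all_readable)       # True
print(under_replicated)   # ['blk_0', 'blk_1', 'blk_3']
```

Every block stays readable after one node fails, but three blocks now have fewer copies than intended. In a real cluster the NameNode notices the missing DataNode and schedules new copies of such blocks.
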
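Another key characteristic of HDFS is its ability to scale. As the
volume of data increases, additional machines can be added to
the cluster to handle the increased load. Storage capacity and
aggregate read and write throughput grow with the number of
DataNodes, which allows HDFS to handle very large amounts of
data. The main limit is the NameNode, which keeps the file
system metadata in memory and therefore bounds how many
files and blocks a single namespace can track.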
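
How capacity grows with cluster size can be estimated with one formula. This is a rough sketch: the function name and the figures are made up for illustration, and it ignores space reserved for the operating system, temporary data, and other non-HDFS use.

```python
def usable_capacity_tb(nodes, disk_tb_per_node, replication=3):
    """Approximate capacity left for data once every block is replicated."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    return nodes * disk_tb_per_node / replication


print(usable_capacity_tb(10, 12))  # 40.0
print(usable_capacity_tb(20, 12))  # 80.0
```

Doubling the number of machines doubles the usable capacity, while the replication factor divides the raw disk space available.
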
There are various commands that can be used to operate on
HDFS, allowing users to perform actions such as creating and
deleting files and directories, reading and writing data, and
copying data between HDFS and other file systems. For
example, the "hdfs dfs -mkdir" command can be used to create
a new directory on HDFS, while the "hdfs dfs -put" command
can be used to copy a file from the local file system to HDFS.
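
A typical sequence of these commands can be assembled as follows. The sketch only builds and prints the command lines, so it runs without a Hadoop installation; the helper name, the paths, and the file names are hypothetical.

```python
import shlex


def hdfs_dfs(*args):
    """Build the argument list for one `hdfs dfs` invocation."""
    return ["hdfs", "dfs", *args]


commands = [
    hdfs_dfs("-mkdir", "-p", "/user/student/reports"),                 # create a directory
    hdfs_dfs("-put", "report.txt", "/user/student/reports/"),          # local -> HDFS
    hdfs_dfs("-ls", "/user/student/reports"),                          # list the directory
    hdfs_dfs("-cat", "/user/student/reports/report.txt"),              # print a file
    hdfs_dfs("-get", "/user/student/reports/report.txt", "copy.txt"),  # HDFS -> local
    hdfs_dfs("-rm", "-r", "/user/student/reports"),                    # delete recursively
]

for argv in commands:
    print(shlex.join(argv))
```

On a machine where the `hdfs` client is installed and configured, each list could be passed to `subprocess.run(argv, check=True)` to execute it.
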
Despite its many advantages, HDFS does have some
disadvantages. One drawback is that it is not well suited to
handling large numbers of small files: the NameNode keeps the
metadata for every file and block in memory, so millions of
small files consume NameNode memory out of proportion to the
data they hold. Additionally, HDFS follows a write-once,
read-many model. Files can be appended to but not modified in
place, so workloads with many small or random writes perform
poorly.
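
The cost of small files can be estimated by counting the metadata objects the NameNode must track. This is a back-of-the-envelope sketch: the function name is hypothetical, the 128 MiB block size is assumed, and the figure of about 150 bytes per object is a commonly quoted rule of thumb rather than an exact value.

```python
BYTES_PER_OBJECT = 150          # rule-of-thumb assumption, not an exact figure
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB)
MIB, GIB, TIB = 1024**2, 1024**3, 1024**4


def namenode_objects(num_files, file_size, block_size=BLOCK_SIZE):
    """Count one object per file plus one per block (directories ignored)."""
    blocks_per_file = max(1, -(-file_size // block_size))  # ceiling division
    return num_files * (1 + blocks_per_file)


# The same 1 TiB of data stored as 1 MiB files versus 1 GiB files.
small = namenode_objects(TIB // MIB, MIB)
large = namenode_objects(TIB // GIB, GIB)

print(small, large)  # 2097152 9216
print(small * BYTES_PER_OBJECT // MIB, "MiB of metadata for the small files")
```

Under these assumptions the small-file layout needs over two hundred times as many metadata objects for the same amount of data.
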
In order to operate effectively, HDFS requires a well-configured
cluster of machines. This includes ensuring that the DataNodes
have sufficient storage capacity for the data and that the
NameNode has enough memory for the file system metadata, as
well as configuring the network connections between the
machines to ensure good performance. The placement of data
blocks also matters: HDFS chooses block locations itself, but
it relies on the administrator to describe which rack each
machine belongs to, so that replicas are spread across racks
and reads can be served from a nearby copy.
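
Rack-aware placement of three replicas can be sketched as below. It follows the commonly described default behaviour (first replica on the writer's node, the other two on two different nodes of one remote rack), but it is a simplification: the function name and topology are hypothetical, and unlike real HDFS it raises an error instead of falling back when the constraints cannot be met.

```python
import random


def choose_replica_nodes(topology, writer, rng=random):
    """Pick three nodes for a block.

    topology: dict mapping rack name -> list of node names.
    writer:   name of the node writing the block (must be in topology).
    """
    local_rack = next(
        (rack for rack, members in topology.items() if writer in members), None
    )
    if local_rack is None:
        raise ValueError("writer is not part of the topology")
    remote_racks = [
        rack for rack, members in topology.items()
        if rack != local_rack and len(members) >= 2
    ]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote], 2)  # two distinct nodes
    return [writer, second, third]


topology = {
    "rack1": ["dn1", "dn2", "dn3"],
    "rack2": ["dn4", "dn5", "dn6"],
}
print(choose_replica_nodes(topology, writer="dn2"))
```

With this layout the block survives the loss of an entire rack, while only one of the three copies has to cross between racks during the write.
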
Overall, HDFS is a powerful tool for storing and processing large
amounts of data in a distributed manner. Its fault tolerance and
ability to scale make it well-suited for big data applications, and
its various commands allow for a range of actions to be
performed on the data stored within it. However, it is
important to carefully consider the configuration of the cluster
and the placement of data blocks in order to ensure optimal
performance.

References:

https://www.techtarget.com/searchdatamanagement/definition/Hadoop-Distributed-File-System-HDFS
https://www.tutorialspoint.com/hadoop/hadoop_hdfs_overview.htm
https://www.tutorialspoint.com/hadoop/hadoop_hdfs_operations.htm
