BDA LAB MANUAL
Software Required:
1. Hadoop : https://hadoop.apache.org/release/2.7.6.html
2. Java : https://www.oracle.com/java/technologies/javase/javase8u211-later-archive-downloads.html
3. Eclipse : https://www.eclipse.org/downloads/
List of Experiments:
Experiment 1: Week 1, 2:
1. Implement the following Data structures in Java
a) Linked Lists b) Stacks c) Queues d) Set e) Map
Experiment 2: Week 3:
2. (i) Set up and install Hadoop in its three operating modes: standalone, pseudo-distributed, and fully distributed.
(ii) Use web-based tools to monitor your Hadoop setup.
Experiment 3: Week 4:
3. Implement the following file management tasks in Hadoop:
Adding files and directories
Retrieving files
Deleting files
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into
HDFS using one of the above command line utilities.
Experiment 4: Week 5:
4. Run a basic Word Count MapReduce program to understand the MapReduce paradigm.
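A minimal Word Count sketch is shown below, written against the org.apache.hadoop.mapreduce API shipped with the Hadoop versions used in this manual; the class name and the command-line input/output paths are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input line.
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts collected for each word.
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // must not already exist
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Package the class into a jar and run it as, for example, hadoop jar wordcount.jar WordCount <input dir> <output dir>.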
Experiment 5: Week 6:
5. Write a MapReduce program that mines weather data.
Weather sensors collecting data every hour at many locations across the globe gather a large volume
of log data, which is a good candidate for analysis with MapReduce, since it is semi-structured and record-oriented.
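As a starting point, the sketch below computes the maximum temperature per station. It assumes a simplified comma-separated record layout of stationId,timestamp,temperatureCelsius per line; real sensor logs (for example NCDC records) will need their own parsing in the mapper, and the class name is illustrative.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  // Mapper: emit (stationId, temperature) for every well-formed record.
  public static class TemperatureMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] fields = value.toString().split(",");
      if (fields.length < 3) {
        return; // skip malformed lines
      }
      try {
        int temperature = Integer.parseInt(fields[2].trim());
        context.write(new Text(fields[0].trim()), new IntWritable(temperature));
      } catch (NumberFormatException e) {
        // ignore records with a non-numeric temperature
      }
    }
  }

  // Reducer (also used as combiner): keep the maximum temperature seen for each station.
  public static class MaxReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int max = Integer.MIN_VALUE;
      for (IntWritable val : values) {
        max = Math.max(max, val.get());
      }
      context.write(key, new IntWritable(max));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "max temperature");
    job.setJarByClass(MaxTemperature.class);
    job.setMapperClass(TemperatureMapper.class);
    job.setCombinerClass(MaxReducer.class);
    job.setReducerClass(MaxReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}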
Experiment 6: Week 7:
6. Use MapReduce to find the shortest path between two people in a social graph.
Hint: Use an adjacency list to model a graph, and for each node store the distance from the original
node, as well as a back pointer to the original node. Use the mappers to propagate the distance to the
original node, and the reducer to restore the state of the graph. Iterate until the target node has been
reached.
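A sketch of one such iteration is given below, following the hint. It assumes graph records of the form nodeId<TAB>distance<TAB>neighbour1,neighbour2,... with -1 meaning "not reached yet"; the back pointers mentioned in the hint are omitted for brevity, and the main method runs a single pass, so an outer driver would rerun the job until the target node's distance stops changing.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ShortestPathIteration {

  // Mapper: re-emit the node record itself and propose (distance + 1) to each neighbour.
  public static class BfsMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      String node = parts[0];
      int distance = Integer.parseInt(parts[1]);
      String adjacency = parts.length > 2 ? parts[2] : "";

      // Pass the graph structure through so the reducer can rebuild it.
      context.write(new Text(node), new Text("NODE\t" + distance + "\t" + adjacency));

      // Only nodes that have already been reached can relax their neighbours.
      if (distance >= 0 && !adjacency.isEmpty()) {
        for (String neighbour : adjacency.split(",")) {
          context.write(new Text(neighbour), new Text("DIST\t" + (distance + 1)));
        }
      }
    }
  }

  // Reducer: keep the smallest proposed distance and restore the adjacency list.
  public static class BfsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      int best = -1; // -1 means "still unreachable"
      String adjacency = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t");
        if (parts[0].equals("NODE")) {
          adjacency = parts.length > 2 ? parts[2] : "";
          int current = Integer.parseInt(parts[1]);
          if (current >= 0 && (best < 0 || current < best)) {
            best = current;
          }
        } else {
          int proposed = Integer.parseInt(parts[1]);
          if (best < 0 || proposed < best) {
            best = proposed;
          }
        }
      }
      context.write(key, new Text(best + "\t" + adjacency));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "bfs iteration");
    job.setJarByClass(ShortestPathIteration.class);
    job.setMapperClass(BfsMapper.class);
    job.setReducerClass(BfsReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Each pass moves the search frontier one hop outward, so the number of iterations needed equals the length of the shortest path.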
Experiment 7: Week 8:
7. Implement the Friends-of-Friends (FoF) algorithm in MapReduce.
Hint: Two MapReduce jobs are required to calculate the FoFs for each user in a social network. The
first job calculates the common friends for each user, and the second job sorts the common friends
by the number of connections to your friends.
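The sketch below outlines only the first of the two jobs, assuming input lines of the form user<TAB>friend1,friend2,...; the owner of each friend list is, by definition, a common friend of every pair on that list. Feeding this job's output into a second job that counts the entries per pair and sorts on that count completes the hint above.

import java.io.IOException;
import java.util.Arrays;
import java.util.StringJoiner;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class CommonFriends {

  // Mapper: the owner of a friend list is a common friend of every pair on that list.
  public static class PairMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      if (parts.length < 2) {
        return; // skip users with no friend list
      }
      String user = parts[0];
      String[] friends = parts[1].split(",");
      Arrays.sort(friends); // canonical order so (a,b) and (b,a) meet in one reduce call
      for (int i = 0; i < friends.length; i++) {
        for (int j = i + 1; j < friends.length; j++) {
          context.write(new Text(friends[i] + "," + friends[j]), new Text(user));
        }
      }
    }
  }

  // Reducer: collect everyone who listed both members of the pair as a friend.
  public static class CommonFriendsReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      StringJoiner common = new StringJoiner(",");
      for (Text value : values) {
        common.add(value.toString());
      }
      context.write(key, new Text(common.toString()));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "common friends");
    job.setJarByClass(CommonFriends.class);
    job.setMapperClass(PairMapper.class);
    job.setReducerClass(CommonFriendsReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}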
Experiment 8: Week 9:
8. Implement an iterative PageRank graph algorithm in MapReduce.
Hint: PageRank can be implemented by iterating a MapReduce job until the graph has converged. The
mappers are responsible for propagating node PageRank values to their adjacent nodes, and the
reducers are responsible for calculating new PageRank values for each node, and for re-creating the
original graph with the updated PageRank values.
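The sketch below shows one such iteration, assuming records of the form node<TAB>rank<TAB>neighbour1,neighbour2,...; the damping factor of 0.85 and the convergence test performed by an outer driver are illustrative choices, not part of the hint.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class PageRankIteration {

  private static final double DAMPING = 0.85; // illustrative damping factor

  // Mapper: send an equal share of this node's rank to every adjacent node,
  // and pass the adjacency list through so the reducer can rebuild the graph.
  public static class RankMapper extends Mapper<LongWritable, Text, Text, Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split("\t");
      String node = parts[0];
      double rank = Double.parseDouble(parts[1]);
      String adjacency = parts.length > 2 ? parts[2] : "";

      context.write(new Text(node), new Text("GRAPH\t" + adjacency));

      if (!adjacency.isEmpty()) {
        String[] neighbours = adjacency.split(",");
        double share = rank / neighbours.length;
        for (String neighbour : neighbours) {
          context.write(new Text(neighbour), new Text("RANK\t" + share));
        }
      }
    }
  }

  // Reducer: new rank = (1 - d) + d * sum(incoming shares); re-emit the adjacency list.
  public static class RankReducer extends Reducer<Text, Text, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context)
        throws IOException, InterruptedException {
      double sum = 0.0;
      String adjacency = "";
      for (Text value : values) {
        String[] parts = value.toString().split("\t");
        if (parts[0].equals("GRAPH")) {
          adjacency = parts.length > 1 ? parts[1] : "";
        } else {
          sum += Double.parseDouble(parts[1]);
        }
      }
      double newRank = (1.0 - DAMPING) + DAMPING * sum;
      context.write(key, new Text(newRank + "\t" + adjacency));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "pagerank iteration");
    job.setJarByClass(PageRankIteration.class);
    job.setMapperClass(RankMapper.class);
    job.setReducerClass(RankReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

The driver would compare ranks from successive iterations and stop once the total change falls below a chosen threshold.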
Experiment 9: Week 10:
9. Perform an efficient semi-join in MapReduce.
Hint: Perform a semi-join by having the mappers load a Bloom filter from the Distributed Cache,
and then filter results from the actual MapReduce data source by performing membership queries
against the Bloom filter to determine which data source records should be emitted to the reducers.
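The mapper sketch below shows the map-side half of this pattern. It assumes the driver has already built a org.apache.hadoop.util.bloom.BloomFilter over the join keys of the smaller data set, serialized it to HDFS, and shipped it with job.addCacheFile(...#bloom.filter) so it appears in each task's working directory as "bloom.filter"; that file name and the record layout joinKey<TAB>rest-of-record are assumptions for illustration.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.bloom.BloomFilter;
import org.apache.hadoop.util.bloom.Key;

public class SemiJoinMapper extends Mapper<LongWritable, Text, Text, Text> {

  // Populated in setup() from the serialized filter shipped via the distributed cache.
  private final BloomFilter filter = new BloomFilter();

  @Override
  protected void setup(Context context) throws IOException {
    // "bloom.filter" is the symlink name given in the #fragment of the cache file URI (an assumption).
    try (DataInputStream in = new DataInputStream(new FileInputStream("bloom.filter"))) {
      filter.readFields(in);
    }
  }

  // Only records whose join key *may* exist in the other data set are sent on to the
  // reducers; everything else is dropped map-side, which is the point of the semi-join.
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String[] parts = value.toString().split("\t", 2);
    String joinKey = parts[0];
    if (filter.membershipTest(new Key(joinKey.getBytes(StandardCharsets.UTF_8)))) {
      context.write(new Text(joinKey), value);
    }
  }
}

The reduce-side join itself, and the small job that builds and stores the Bloom filter, follow the same structure as the earlier MapReduce programs.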
Experiment 10: Week 11:
10. Install and run Pig, then write Pig Latin scripts to sort, group, join, project, and filter your data.
Experiment 11: Week 12:
11. Install and run Hive, then use Hive to create, alter, and drop databases, tables, views, functions, and indexes.
Experiment 2: Week 3:
AIM: To set up and install Hadoop in its two operating modes: pseudo-distributed and fully distributed.
About HADOOP
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets
across clusters of computers using simple programming models. It is designed to scale up from single servers
to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver
high-availability, the library itself is designed to detect and handle failures at the application layer, so
delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
Setting up HADOOP
Pre-requisites:
1. Java
2. SSH
Before any other steps, we need to set the JAVA_HOME environment variable. On Windows this is done from the
System Variables dialog; on Linux, add the following line to the environment variables file (e.g. ~/.bashrc):
export JAVA_HOME=/usr/java/latest
Download and extract the HADOOP binaries
wget http://apache.claz.org/hadoop/common/hadoop-3.1.2/hadoop-3.1.2.tar.gz
tar xzf hadoop-3.1.2.tar.gz
mv hadoop-3.1.2/* hadoop/
Pseudo-distributed mode
1. Add the following variables to the system variable file
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
2. Configure HADOOP files
a) Change to the $HADOOP_HOME/etc/hadoop directory
b) Add the following to the hadoop-env.sh file:
export JAVA_HOME=/usr/local/jdk1.7.0_71
c) Edit the following config files:
Program – 1
core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>file:///home/<user_name>/hadoopinfra/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>file:///home/<user_name>/hadoopinfra/hdfs/datanode</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
Fully distributed mode
1. Add the cluster hosts to the /etc/hosts file on every node:
192.0.2.1 node-master
192.0.2.2 node1
192.0.2.3 node2
2. Distribute the SSH authentication key-pairs for the Hadoop user to every node.
Configure HDFS in hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/data/nameNode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/data/dataNode</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
7. Set the Job scheduler (same as pseudo-distributed)
8. Configure YARN in yarn-site.xml
<configuration>
  <property>
    <name>yarn.acl.enable</name>
    <value>0</value>
  </property>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node-master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
Experiment 3: Week 4:
3. Implement the following file management tasks in Hadoop:
Adding files and directories
Retrieving files
Deleting files
Hint: A typical Hadoop workflow creates data files (such as log files) elsewhere and copies them into
HDFS using one of the above command line utilities.
Open a terminal window to the current working directory.
# 1. Print the Hadoop version
hadoop version
# 2. List the contents of the root directory in HDFS
hadoop fs -ls /
# 3. Report the amount of space used and
# available on currently mounted filesystem
hadoop fs -df hdfs:/
# 4. Count the number of directories,files and bytes under
# the paths that match the specified file pattern
hadoop fs -count hdfs:/
# 5. Run a DFS filesystem checking utility
hadoop fsck /
# 6. Run a cluster balancing utility
hadoop balancer
# 7. Create a new directory named “hadoop” below the
# /user/training directory in HDFS. Since you’re
# currently logged in with the “training” user ID,
# /user/training is your home directory in HDFS.
hadoop fs -mkdir /user/training/hadoop
# 8. Add a sample text file from the local directory
# named “data” to the new directory you created in HDFS
# during the previous step.
hadoop fs -put data/sample.txt /user/training/hadoop
# 9. List the contents of this new directory in HDFS.
hadoop fs -ls /user/training/hadoop
# 10. Add the entire local directory called “retail” to the
# /user/training/hadoop directory in HDFS.
hadoop fs -put data/retail /user/training/hadoop
# 11. Since /user/training is your home directory in HDFS,
# any command that does not have an absolute path is
# interpreted as relative to that directory. The next
# command will therefore list your home directory, and
# should show the items you’ve just added there.
hadoop fs -ls
# 12. See how much space this directory occupies in HDFS.
hadoop fs -du -s -h hadoop/retail
# 13. Delete a file ‘customers’ from the “retail” directory.
hadoop fs -rm hadoop/retail/customers
# 14. Ensure this file is no longer in HDFS.
hadoop fs -ls hadoop/retail/customers
# 15. Delete all files from the “retail” directory using a wildcard.
hadoop fs -rm hadoop/retail/*
# 16. To empty the trash
hadoop fs -expunge
# 17. Finally, remove the entire retail directory and all
# of its contents in HDFS.
hadoop fs -rm -r hadoop/retail
# 18. List the hadoop directory again
hadoop fs -ls hadoop
# 19. Add the purchases.txt file from the local directory
# named “/home/training/” to the hadoop directory you created in HDFS
hadoop fs -copyFromLocal /home/training/purchases.txt hadoop/
# 20. To view the contents of your text file purchases.txt
# which is present in your hadoop directory.
hadoop fs -cat hadoop/purchases.txt
# 21. Add the purchases.txt file from “hadoop” directory which is present in HDFS directory
# to the directory “data” which is present in your local directory
hadoop fs -copyToLocal hadoop/purchases.txt /home/training/data
# 22. cp is used to copy files between directories present in HDFS
hadoop fs -cp /user/training/*.txt /user/training/hadoop
# 23. The ‘-get’ command can be used as an alternative to the ‘-copyToLocal’ command
hadoop fs -get hadoop/sample.txt /home/training/
# 24. Display last kilobyte of the file “purchases.txt” to stdout
hadoop fs -tail hadoop/purchases.txt
# 25. Default file permissions are 666 in HDFS
# Use ‘-chmod’ command to change permissions of a file
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chmod 600 hadoop/purchases.txt
# 26. Default names of owner and group are training,training
# Use ‘-chown’ to change owner name and group name simultaneously
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chown root:root hadoop/purchases.txt
# 27. Default name of group is training
# Use ‘-chgrp’ command to change group name
hadoop fs -ls hadoop/purchases.txt
sudo -u hdfs hadoop fs -chgrp training hadoop/purchases.txt
# 28. Move a directory from one location to other
hadoop fs -mv hadoop apache_hadoop
# 29. The default replication factor for a file is 3.
# Use ‘-setrep’ command to change replication factor of a file
hadoop fs -setrep -w 2 apache_hadoop/sample.txt
# 30. Copy a directory from one cluster to another
# Use the ‘distcp’ command to copy,
# the -overwrite option to overwrite existing files,
# and the -update option to synchronize both directories
hadoop distcp hdfs://namenodeA/apache_hadoop hdfs://namenodeB/hadoop
# 31. Command to make the name node leave safe mode
sudo -u hdfs hdfs dfsadmin -safemode leave
# 32. List all the hadoop file system shell commands
hadoop fs
# 33. Last but not least, always ask for help!
hadoop fs -help
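The same add, retrieve, and delete tasks can also be driven from Java through the HDFS FileSystem API. The sketch below mirrors a few of the shell commands above; the paths and class name are illustrative, and the fs.defaultFS URI must match the value in your core-site.xml.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsFileManagement {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.defaultFS", "hdfs://localhost:9000"); // must match core-site.xml
    FileSystem fs = FileSystem.get(conf);

    Path dir = new Path("/user/training/hadoop");

    // Adding files and directories
    fs.mkdirs(dir);
    fs.copyFromLocalFile(new Path("data/sample.txt"), new Path(dir, "sample.txt"));

    // Listing the directory contents
    for (FileStatus status : fs.listStatus(dir)) {
      System.out.println(status.getPath() + "  " + status.getLen() + " bytes");
    }

    // Retrieving a file back to the local file system
    fs.copyToLocalFile(new Path(dir, "sample.txt"), new Path("/tmp/sample.txt"));

    // Deleting a file (second argument: recursive)
    fs.delete(new Path(dir, "sample.txt"), false);

    fs.close();
  }
}

Compile it against the Hadoop client libraries (for example from Eclipse with the Hadoop jars on the build path) and run it with the hadoop jar command.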