
Lab 1 - Hadoop HDFS and MapReduce

LAB GUIDE

Hadoop Getting Started | Big Data Technologies | Oct 16, 2017


Login and Environment Setup
1. Start PuTTY on your system and enter the given IP address to connect to the Linux
server with Hadoop installed.

2. Log in with user ID hadoopX and password huX, where X is the number assigned to you (e.g. hadoop1, hu1).

You can set the Hadoop environment variables by appending the following lines to the
~/.bashrc file.

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

export HADOOP_INSTALL=$HADOOP_HOME

To do this, perform the following steps:

1. Type nano ~/.bashrc to open the file in the nano editor.

2. You will see that there are a number of lines of text already present in the file. Take
care that you don't accidentally modify the existing content.

3. Press and hold the down arrow key to go to the end of the file. Press Enter.

4. Copy the lines above (beginning with export) and paste them into the nano window
(in PuTTY, pasting is done with a right-click). These lines should appear at the end of
the file in the editor.

5. Press ctrl-x to exit the editor, press y when prompted to save the changes, and press Enter to confirm the file name.

6. To apply the changes to the shell environment, type the following command at the
bash prompt:

$ source ~/.bashrc

7. To verify that the changes have taken effect, type the following command at the
bash prompt:

$ hadoop version

This should show the version of Hadoop running (2.8.1) on the Linux server.
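For example, the first line of the output should read Hadoop 2.8.1, followed by build details such as the source repository, compile time, and checksum.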

Familiarizing yourself with HDFS


1. First format the HDFS file system:

$ hdfs namenode -format

(The older form hadoop namenode -format also works, but is deprecated in Hadoop 2.x.)
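Note that formatting creates a new, empty filesystem namespace and erases the metadata of any previous HDFS installation, so it should normally be run only once, when the cluster is first set up.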

2. Start the distributed file system. The following command will start the namenode as
well as the data nodes as a cluster.

$ start-dfs.sh
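To confirm that the daemons have started, you can list the running Java processes with the jps command (included with the JDK). On a single-node setup such as this one, you would typically expect to see NameNode, DataNode, and SecondaryNameNode among the entries.

$ jps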

3. Listing Files in HDFS

$ hadoop fs -ls
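Note that with no path argument, -ls lists your HDFS home directory (/user/<username>). On a freshly formatted filesystem this directory does not exist yet, so the command may report that there is no such file or directory. You can list the root of HDFS instead:

$ hadoop fs -ls /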

4. Make the HDFS directories required to execute MapReduce jobs:

$ hadoop fs -mkdir /user

$ hadoop fs -mkdir /user/<username>

5. Create a data file, data.txt, in your home directory to serve as input for the sample
program:

$ cat /usr/local/hadoop/etc/hadoop/*.xml >> ~/data.txt
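You can confirm that the file was created, and check its size, with:

$ ls -lh ~/data.txt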

6. Inserting Data into HDFS

Copy the file data.txt from the home directory of the local filesystem to the directory
/user/<username>/input in the HDFS filesystem.

a) Create an input directory in HDFS:

$ hadoop fs -mkdir /user/<username>/input

b) Copy the file from the local filesystem:

$ hadoop fs -put ~/data.txt /user/<username>/input

c) Verify that the file has been copied:

$ hadoop fs -ls /user/<username>/input

7. Run a MapReduce program from the set of example programs provided:

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep /user/<username>/input /user/<username>/output 'dfs[a-z.]+'
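When the job completes, the output directory holds the results; it typically contains a _SUCCESS marker file and one or more part-r-* files with the matched strings and their counts:

$ hadoop fs -ls /user/<username>/output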

8. Retrieving Data from HDFS

The grep job writes its results to the directory /user/<username>/output in HDFS. Given below is a
simple demonstration of retrieving those results from the Hadoop file system.

Step 1

First, view the data in HDFS using the cat command.

$ hadoop fs -cat /user/<username>/output/*

Step 2

Get the files from HDFS to the local file system using the get command.

$ mkdir ~/output

$ hadoop fs -get /user/<username>/output/* ~/output
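The retrieved results can now be viewed with ordinary local commands, for example:

$ cat ~/output/*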

Shutting Down HDFS

You can shut down HDFS by using the following command.

$ stop-dfs.sh
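Running jps again should confirm that the NameNode, DataNode, and SecondaryNameNode processes are no longer listed:

$ jps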

Additional Reading:
1. You can find the complete list of HDFS commands here.

2. A detailed explanation of MapReduce and a complete description of the steps in developing a MapReduce program can be found here.

