Lab 1 - Hadoop HDFS and MapReduce
LAB GUIDE
2. Log in with user id hadoopx and password hux (e.g. hadoop1, hu1).
You can set the Hadoop environment variables by appending the following commands to
the ~/.bashrc file.
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export HADOOP_INSTALL=$HADOOP_HOME
1. Type nano ~/.bashrc to open the file in the nano editor.
2. You will see that there are a number of lines of text already present in the file. Take
care that you don't accidentally modify this existing content.
3. Press and hold the down arrow key to go to the end of the file. Press Enter.
4. Copy (ctrl-c) the lines above (beginning with export) and paste (ctrl-v) them into the
nano window. These lines should appear at the end of the file in the editor.
5. Press ctrl-x to exit the editor, press y when prompted to save, and press Enter to confirm the file name.
6. To apply the changes to the shell environment, type the following command at the
bash prompt:
$ source ~/.bashrc
7. To verify that the changes have taken effect, type the following command at the
bash prompt:
a. hadoop version
This should show the version of Hadoop running (2.8.1) on the Linux server.
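You can also confirm that the variables themselves were picked up; for example, echoing HADOOP_HOME should print the path set in ~/.bashrc above (/usr/local/hadoop).
$ echo $HADOOP_HOME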
2. Start the distributed file system. The following command will start the namenode as
well as the datanodes as a cluster.
$ start-dfs.sh
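As an optional check (assuming the JDK's jps tool is on your PATH), you can list the running Hadoop daemons; after a successful start you should see processes such as NameNode, DataNode, and SecondaryNameNode.
$ jps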
List the contents of your HDFS home directory.
$ hadoop fs -ls
5. Create a data file, data.txt, containing input data for a program, in the home
directory of the local filesystem.
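For example, you could create the file with nano, or with a quick echo (the sample text below is only a placeholder; any text will do):
$ echo "Hello Hadoop. Hello HDFS. Hello MapReduce." > ~/data.txt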
Copy the file data.txt from the home directory of the local filesystem to the directory
/input in the HDFS filesystem.
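One way to do this, assuming the /input directory does not already exist in HDFS, is with the mkdir and put commands:
$ hadoop fs -mkdir /input
$ hadoop fs -put ~/data.txt /input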
Run the grep example from the Hadoop MapReduce examples jar. It extracts the strings matching the given regular expression from the files in the input directory and writes the matching strings, with their counts, to the output directory.
$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep user/hadoop1 user/hadoop1/output 'dfs[a-z.]+'
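After the job completes, you can inspect the result with the cat command (the output path here is the same one passed to the job above):
$ hadoop fs -cat user/hadoop1/output/*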
Assume we have a file in HDFS called outfile. Given below is a simple demonstration of
retrieving the required file from the Hadoop file system.
Step 1
Create a local directory to hold the retrieved file.
$ mkdir ~/output
Step 2
Get the file from HDFS to the local file system using the get command.
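A minimal example, assuming outfile sits in your HDFS home directory, is to copy it into the ~/output directory created in Step 1:
$ hadoop fs -get outfile ~/output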
You can shut down HDFS using the following command.
$ stop-dfs.sh
Additional Reading:
1. You can find the complete list of HDFS commands here.