
Lab 1 - Hadoop HDFS and MapReduce

LAB GUIDE

Hadoop Getting Started | Big Data Technologies | Oct 16, 2017


Login and Environment Setup
1. Start PuTTY on your system and enter the given IP address to connect to the Linux
server with Hadoop installed.

2. Log in with user ID hadoopX and password huX, where X is the number assigned to you (e.g. hadoop1, hu1).

You can set the Hadoop environment variables by appending the following lines to the
~/.bashrc file.

export HADOOP_HOME=/usr/local/hadoop

export HADOOP_MAPRED_HOME=$HADOOP_HOME

export HADOOP_COMMON_HOME=$HADOOP_HOME

export HADOOP_HDFS_HOME=$HADOOP_HOME

export YARN_HOME=$HADOOP_HOME

export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native

export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

export HADOOP_INSTALL=$HADOOP_HOME

To do this, perform the following steps:

1. Type nano ~/.bashrc to open the file in the nano editor.

2. You will see that there are a number of lines of text already present in the file. Take
care that you don't accidentally modify the existing content.

3. Press and hold the down arrow key to go to the end of the file. Press Enter.

4. Copy the lines above (beginning with export) and paste them into the nano window
(in PuTTY, pasting is done with a right-click). These lines should appear at the end of
the file in the editor.

5. Press ctrl-x to exit the editor, press y when prompted to save the changes, and press Enter to confirm the file name.

6. To apply the changes to the shell environment, type the following command at the
bash prompt:

$ source ~/.bashrc

7. To verify that the changes have taken effect, type the following command at the
bash prompt:

$ hadoop version

This should show the version of Hadoop running (2.8.1) on the Linux server.
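For example, the first line of the output should read Hadoop 2.8.1, followed by build details such as the source repository, compile time, and checksum.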

Familiarizing yourself with HDFS


1. First format the HDFS file system:

$ hdfs namenode -format

(The older form hadoop namenode -format also works, but is deprecated in Hadoop 2.x.)
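Note that formatting creates a new, empty filesystem namespace and erases the metadata of any previous HDFS installation, so it should normally be run only once, when the cluster is first set up.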

2. Start the distributed file system. The following command will start the namenode as
well as the data nodes as a cluster.

$ start-dfs.sh
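To confirm that the daemons have started, you can list the running Java processes with the jps command (included with the JDK). On a single-node setup such as this one, you would typically expect to see NameNode, DataNode, and SecondaryNameNode among the entries.

$ jps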

3. Listing Files in HDFS

$ hadoop fs -ls
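Note that with no path argument, -ls lists your HDFS home directory (/user/<username>). On a freshly formatted filesystem this directory does not exist yet, so the command may report that there is no such file or directory. You can list the root of HDFS instead:

$ hadoop fs -ls /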

4. Make the HDFS directories required to execute MapReduce jobs:

$ hadoop fs -mkdir /user

$ hadoop fs -mkdir /user/<username>

5. Create a data file, data.txt, in your home directory to serve as input for the sample
program:

$ cat /usr/local/hadoop/etc/hadoop/*.xml >> ~/data.txt
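You can confirm that the file was created, and check its size, with:

$ ls -lh ~/data.txt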

6. Inserting Data into HDFS

Copy the file data.txt from the home directory of the local filesystem to the directory
/user/<username>/input in the HDFS filesystem.

a) Create an input directory in HDFS:

$ hadoop fs -mkdir /user/<username>/input

b) Copy the file from the local filesystem:

$ hadoop fs -put ~/data.txt /user/<username>/input

c) Verify that the file has been copied:

$ hadoop fs -ls /user/<username>/input

7. Run a MapReduce program from the set of example programs provided:

$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.1.jar grep /user/<username>/input /user/<username>/output 'dfs[a-z.]+'
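When the job completes, the output directory holds the results; it typically contains a _SUCCESS marker file and one or more part-r-* files with the matched strings and their counts:

$ hadoop fs -ls /user/<username>/output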

8. Retrieving Data from HDFS

The grep job writes its results to the directory /user/<username>/output in HDFS. Given below is a
simple demonstration of retrieving those results from the Hadoop file system.

Step 1

First, view the data in HDFS using the cat command.

$ hadoop fs -cat /user/<username>/output/*

Step 2

Get the files from HDFS to the local file system using the get command.

$ mkdir ~/output

$ hadoop fs -get /user/<username>/output/* ~/output
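The retrieved results can now be viewed with ordinary local commands, for example:

$ cat ~/output/*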

Shutting Down HDFS

You can shut down HDFS by using the following command.

$ stop-dfs.sh
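Running jps again should confirm that the NameNode, DataNode, and SecondaryNameNode processes are no longer listed:

$ jps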

Additional Reading:
1. You can find the complete list of HDFS commands here.

2. A detailed explanation of MapReduce and a complete description of the steps in developing a MapReduce program can be found here.

