
Hadoop Hands-On Exercises

Lawrence Berkeley National Lab


Oct 2011
We will cover:
Training accounts/User Agreement forms
Test access to carver
HDFS commands
Monitoring
Run the word count example
Simple streaming with Unix commands
Streaming with simple scripts
Streaming Census example
Pig Examples
Additional Exercises
Instructions
http://tinyurl.com/nerschadoopoct
Login and Environment
ssh [username]@carver.nersc.gov
echo $SHELL
should be bash
Remote Participants
Visit: http://maghdp01.nersc.gov:50030/
http://magellan.nersc.gov
(Go to Using Magellan -> Creating a SOCKS proxy)
Environment Setup
$ ssh [username]@carver.nersc.gov
$ echo $SHELL
If your shell doesn't show /bin/bash, change your shell:
$ bash
Set up your environment to use Hadoop on the Magellan system:
$ module load tig hadoop
Hadoop Command
hadoop command [genericOptions] [commandOptions]
Examples:
command: fs, jar, job
[genericOptions]: -conf, -D, -files, -libjars, -archives
[commandOptions]: -ls, -submit
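For example, this listing combines all three parts (the path here is only illustrative):
$ hadoop fs -Dfs.default.name=file:/// -ls /tmp
Here fs is the command, -Dfs.default.name=file:/// is a generic option, and -ls /tmp is the command option.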
HDFS Commands [1]
$ hadoop fs -ls
If you see an error, do the following, where [username] is your training account username:
$ hadoop fs -mkdir /user/[username]
$ vi testfile1 [ Repeat for testfile2]
This is file 1
This is to test HDFS
$ hadoop fs -mkdir input
$ hadoop fs -put testfile* input
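To confirm the upload, list the HDFS directory:
$ hadoop fs -ls input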
You can get help on commands:
$ hadoop fs -help
HDFS Commands [2]
$ hadoop fs -cat input/testfile1
$ hadoop fs -cat input/testfile*
Download the files from HDFS into a local directory called input and check that there is an input directory:
$ hadoop fs -get input input
$ ls input/
Monitoring
http://maghdp01.nersc.gov:50030/
http://maghdp01.nersc.gov:50070/
$ hadoop job -list
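To inspect a single job, pass the job id printed by -list:
$ hadoop job -status [job-id]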
Wordcount Example
Input in HDFS
$ hadoop fs -mkdir wordcount-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/wordcount/* wordcount-in/
Run example
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount wordcount-in wordcount-op
View output
$ hadoop fs -ls wordcount-op
$ hadoop fs -cat wordcount-op/part-r-00000
$ hadoop fs -cat wordcount-op/p* | grep Darcy
Wordcount: Number of reduces
$ hadoop dfs -rmr wordcount-op
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dmapred.reduce.tasks=4 wordcount-in wordcount-op
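With four reduces the job should produce four part files, one per reduce task:
$ hadoop fs -ls wordcount-op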
http://maghdp01.nersc.gov:50030/
Wordcount: GPFS
Set up permissions for the Hadoop user [ONE-TIME]:
$ mkdir /global/scratch/sd/[username]/hadoop
$ chmod -R 755 /global/scratch/sd/[username]
$ chmod -R 777 /global/scratch/sd/[username]/hadoop/
Run Job
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/hadoop-0.20.2+228-examples.jar wordcount -Dfs.default.name=file:/// /global/scratch/sd/lavanya/hadooptutorial/wordcount/ /global/scratch/sd/[username]/hadoop/wordcount-gpfs/
Set perms for yourself
$ fixperms.sh /global/scratch/sd/[username]/hadoop/wordcount-gpfs/
Streaming with Unix Commands
$ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop*-streaming.jar -input wordcount-in -output wordcount-streaming-op -mapper /bin/cat -reducer /usr/bin/wc
$ hadoop fs -cat wordcount-streaming-op/p*
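Here /bin/cat is an identity mapper and /usr/bin/wc counts the lines, words, and characters each reducer receives. A single-reduce run can be sketched locally with ordinary pipes:
$ hadoop fs -cat wordcount-in/* | /bin/cat | sort | /usr/bin/wc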
Streaming with Unix Commands: GPFS
$ hadoop jar $HADOOP_HOME/contrib/streaming/hadoop*-streaming.jar -Dfs.default.name=file:/// -input /global/scratch/sd/lavanya/hadooptutorial/wordcount/ -output /global/scratch/sd/[username]/hadoop/wordcount-streaming-op -mapper /bin/cat -reducer /usr/bin/wc
$ fixperms.sh /global/scratch/sd/[username]/hadoop/wordcount-streaming-op
Streaming with Scripts
$ mkdir simple-streaming-example
$ cd simple-streaming-example
$ vi cat.sh
#!/bin/sh
cat
$ chmod 755 cat.sh
Now let us test this:
$ hadoop fs -mkdir cat-in
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/cat/in/* cat-in/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -mapper cat.sh -input cat-in -output cat-op -file cat.sh
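As before, the part files hold the output; they should mirror the input that cat.sh echoed back:
$ hadoop fs -cat cat-op/p*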
Streaming with Scripts: Number of Mappers and Reducers
Run a map-only job by setting the number of reduces to zero (delete the previous output directory first):
$ hadoop fs -rmr cat-op
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -Dmapred.reduce.tasks=0 -mapper cat.sh -input cat-in -output cat-op -file cat.sh
Force fewer map tasks by raising the minimum split size:
$ hadoop fs -rmr cat-op
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -Dmapred.min.split.size=91212121212 -mapper cat.sh -input cat-in -output cat-op -file cat.sh
Census sample
$ mkdir census
$ cd census
$ cp /global/scratch/sd/lavanya/hadooptutorial/census/censusdata.sample .
Mapper
$ vi mapper.sh
#!/bin/bash
# Emit "<state><tab>1" for every input line that mentions the state
while read line; do
  if [[ "$line" == *Alabama* ]]; then
    echo -e "Alabama\t1"
  fi
  if [[ "$line" == *Alaska* ]]; then
    echo -e "Alaska\t1"
  fi
done
$ chmod 755 mapper.sh
$ cat censusdata.sample | ./mapper.sh
Census Run
$ hadoop fs -mkdir census
$ hadoop fs -put /global/scratch/sd/lavanya/hadooptutorial/census/censusdata.sample census/
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -mapper mapper.sh -input census -output census-op -file mapper.sh -reducer /usr/bin/wc
$ hadoop fs -cat census-op/p*
Census Run: Mappers and Reducers
$ hadoop fs -rmr census-op
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -Dmapred.map.tasks=10 -Dmapred.reduce.tasks=2 -mapper mapper.sh -input census -output census-op/ -file mapper.sh -reducer /usr/bin/wc
Census: Custom Reducer
$ vi reducer.sh
#!/bin/bash
# Count the 1s for each key; streaming delivers reducer input sorted by key
last_key="Alabama"
count=0
while read line; do
  key=`echo $line | cut -f1 -d' '`
  val=`echo $line | cut -f2 -d' '`
  if [[ "$last_key" = "$key" ]]; then
    let "count=count+1"
  else
    echo "**" $last_key $count
    last_key=${key}
    count=1
  fi
done
echo "**" $last_key $count
$ chmod 755 reducer.sh
Census Run with custom reducer
$ hadoop fs -rmr census-op
$ hadoop jar /usr/common/tig/hadoop/hadoop-0.20.2+228/contrib/streaming/hadoop*streaming*.jar -Dmapred.map.tasks=10 -Dmapred.reduce.tasks=2 -mapper mapper.sh -input census -output census-op -file mapper.sh -reducer reducer.sh -file reducer.sh
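View the output as before:
$ hadoop fs -cat census-op/p*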