
Apache Hadoop

Labs, Lecture 2

Lab: Running a MapReduce Job
Files and Directories Used in this Exercise

Source directory: ~/workspace/wordcount/src/stubs

Files:
WordCount.java: A simple MapReduce driver class.
WordMapper.java: A mapper class for the job.
SumReducer.java: A reducer class for the job.
wc.jar: The compiled, assembled WordCount program

In this lab you will compile Java files, create a JAR, and run MapReduce jobs.

In addition to manipulating files in HDFS, the wrapper program hadoop is used to launch MapReduce jobs. The code for a job is contained in a compiled JAR file. Hadoop loads the JAR into HDFS and distributes it to the worker nodes, where the individual tasks of the MapReduce job are executed.

One simple example of a MapReduce job is to count the number of occurrences of each word in a file or set of files. In this lab you will compile and submit a MapReduce job to count the number of occurrences of every word in the works of Shakespeare.
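For reference, a word-count mapper and reducer written against the Hadoop MapReduce API (org.apache.hadoop.mapreduce) typically look roughly like the sketch below. The WordMapper and SumReducer stubs used in this lab may differ in their details; this is only an illustration of the usual shape of the code, not the exact contents of the lab files.

// WordMapper.java (sketch): for each input line, emit (word, 1) for every word found.
package stubs;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Split the line on non-word characters and emit each word with a count of 1.
        for (String token : value.toString().split("\\W+")) {
            if (!token.isEmpty()) {
                word.set(token);
                context.write(word, one);
            }
        }
    }
}

// SumReducer.java (sketch): sum the counts emitted for each word and write the total.
package stubs;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}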

Compiling and Submitting a MapReduce Job


1. In a terminal window, change to the lab source directory, and list the contents:

$ cd ~/workspace/wordcount/src
$ ls

List the files in the stubs package directory:

$ ls stubs

The package contains the following Java files:

WordCount.java: A simple MapReduce driver class.
WordMapper.java: A mapper class for the job.
SumReducer.java: A reducer class for the job.

Examine these files if you wish, but do not change them. Remain in this
directory while you execute the following commands.

2. Before compiling, examine the classpath Hadoop is configured to use:

$ hadoop classpath

This lists the locations where the Hadoop core API classes are installed.

3. Compile the three Java classes:

$ javac -classpath `hadoop classpath` stubs/*.java

Note: in the command above, the quotes around hadoop classpath are
backquotes. This runs the hadoop classpath command and uses its
output as part of the javac command.

The compiled (.class) files are placed in the stubs directory.

4. Collect your compiled Java files into a JAR file:

$ jar cvf wc.jar stubs/*.class
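If you want to verify what went into the archive, you can optionally list its contents with the same jar tool:

$ jar tf wc.jar

This shows the stubs/*.class files you just packaged.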

5. Submit a MapReduce job to Hadoop using your JAR file to count the occurrences
of each word in Shakespeare:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare wordcounts

This hadoop jar command names the JAR file to use (wc.jar), the class
whose main method should be invoked (stubs.WordCount), and the HDFS
input and output directories to use for the MapReduce job.

Your job reads all the files in your HDFS shakespeare directory, and places its
output in a new HDFS directory called wordcounts.
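A driver class like WordCount.java typically configures and submits the job along the following lines. Again, this is a sketch of the usual pattern, assuming the standard Hadoop Job API; the actual stub provided in the lab may differ in detail.

// WordCount.java (sketch): configures the job and submits it to the cluster.
package stubs;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input dir> <output dir>");
            System.exit(-1);
        }

        Job job = Job.getInstance();
        job.setJarByClass(WordCount.class);
        job.setJobName("Word Count");

        // The two command-line arguments become the HDFS input and output paths.
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setMapperClass(WordMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}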

6. Try running this same command again without any change:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare wordcounts

Your job halts right away with an exception, because Hadoop refuses to run a job whose output directory already exists. This is by design; since the result of a MapReduce job may be expensive to reproduce, Hadoop prevents you from accidentally overwriting previously existing files.

7. Review the result of your MapReduce job:

$ hadoop fs -ls wordcounts

This lists the output files for your job. (Your job ran with only one Reducer, so
there should be one file, named part-r-00000, along with a _SUCCESS file
and a _logs directory.)

8. View the contents of the output for your job:

$ hadoop fs -cat wordcounts/part-r-00000 | less

You can page through a few screens to see words and their frequencies in the
works of Shakespeare. (The spacebar will scroll the output by one screen; the
letter 'q' will quit the less utility.) Note that you could have specified
wordcounts/* just as well in this command.

Wildcards in HDFS file paths

Take care when using wildcards (e.g. *) in HDFS filenames: the shell expands wildcards against the local filesystem before invoking hadoop, so it may pass incorrect references to local files instead of HDFS files. You can prevent this by enclosing the wildcarded HDFS path in single quotes, e.g. hadoop fs -cat 'wordcounts/*'

9. Try running the WordCount job against a single file:

$ hadoop jar wc.jar stubs.WordCount \
    shakespeare/poems pwords

When the job completes, inspect the contents of the pwords HDFS directory.

10. Clean up the output files produced by your job runs:

$ hadoop fs -rm -r wordcounts pwords

Stopping MapReduce Jobs


It is important to be able to stop jobs that are already running. This is useful if, for
example, you accidentally introduced an infinite loop into your Mapper. An
important point to remember is that pressing ^C to kill the current process (which
is displaying the MapReduce job's progress) does not actually stop the job itself.

A MapReduce job, once submitted to Hadoop, runs independently of the initiating process, so losing the connection to the initiating process does not kill the job. Instead, you need to tell the Hadoop JobTracker to stop the job.

1. Start another word count job like you did in the previous section:

$ hadoop jar wc.jar stubs.WordCount shakespeare \
    count2

2. While this job is running, open another terminal window and enter:

$ mapred job -list

This lists the job ids of all running jobs. A job id looks something like:
job_200902131742_0002

3. Copy the job id, and then kill the running job by entering:

$ mapred job -kill jobid

The JobTracker kills the job, and the program running in the original terminal
completes.
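For example, using the sample job id shown above, the command would look like:

$ mapred job -kill job_200902131742_0002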

This is the end of the lab.
