Big Data

LAB-11-A
Timings: 11:30 am - 2:30 pm

Lab Protocols:

1. This lab will include tasks at the end. Cheating will result in a straight zero.
2. Making noise in the lab during the demonstration will result in immediate termination
of the session and the start of tasks.
3. For queries, contact me by email: [email protected]
MapReduce for the word count problem on Hadoop:
In the MapReduce word count example, we find the frequency of each word. Here, the role
of the Mapper is to emit a (word, 1) key-value pair for every word it reads, and the role of the
Reducer is to aggregate the values of each common key into that word's total count. So,
everything is represented in the form of key-value pairs.
Example:
Let’s solve a word count problem using MapReduce on Hadoop.
Step 1: Open Cloudera Quickstart VM.

Step 2: Create a .txt data file inside the /home/cloudera directory that will be passed as input to
the MapReduce program. For simplicity, we name it word_count_data.txt.
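One way to create the file is shown below; the sample sentences are only an illustration (any plain-text content works):

```shell
# Create the input file (run inside /home/cloudera); the two sample
# lines are placeholders -- substitute any text you like.
printf 'hello world\nhello hadoop\n' > word_count_data.txt
cat word_count_data.txt
```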

Step 3: Create mapper.py and reducer.py files inside /home/cloudera directory.
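The handout does not reproduce the two scripts, so here is a minimal sketch of the logic they would contain. The real mapper.py and reducer.py would loop over sys.stdin and print tab-separated lines (as Hadoop Streaming requires); the functions below capture the same logic in a testable form, and the sample input is hypothetical.

```python
def mapper(lines):
    # The Mapper's job: emit a (word, 1) pair for every word it reads.
    # In the real mapper.py this would read sys.stdin and print
    # "word\t1" lines instead of yielding tuples.
    for line in lines:
        for word in line.strip().split():
            yield (word, 1)

def reduce_sorted(pairs):
    # The Reducer's job: sum the counts for each word. It assumes the
    # pairs arrive sorted by key, which Hadoop's shuffle/sort phase
    # (or `sort -k1,1` when testing locally) guarantees.
    current, total = None, 0
    for word, count in pairs:
        if word == current:
            total += count
        else:
            if current is not None:
                yield (current, total)
            current, total = word, count
    if current is not None:
        yield (current, total)

if __name__ == "__main__":
    sample = ["hello world", "hello hadoop"]  # illustrative input only
    for word, count in reduce_sorted(sorted(mapper(sample))):
        print("%s\t%d" % (word, count))
```

Running the demo prints each distinct word with its count, one per line, exactly the shape the Hadoop job will later produce.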


Step 4: Test the MapReduce programs locally to check that everything works properly before
running them on Hadoop.
cat word_count_data.txt | python mapper.py | sort -k1,1 | python reducer.py
For the above example, the output obtained is exactly as expected.
If you see all the words correctly mapped, sorted, and reduced to their respective counts, then
your program is ready to be tested on Hadoop.
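The sort -k1,1 in the middle of the pipeline stands in for Hadoop's shuffle/sort phase: it groups identical keys (first field) together so the reducer sees each word's values contiguously. A small demonstration with hand-written mapper output (the word\t1 lines are illustrative):

```shell
# sort -k1,1 orders lines by their first field only, grouping
# repeated keys together -- the same guarantee Hadoop's shuffle
# phase gives the Reducer.
printf 'world\t1\nhello\t1\nhello\t1\n' | sort -k1,1
```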
Step 5: Configure Hadoop services and settings.
Now, we need to configure certain settings on Hadoop before we run the MapReduce program
for word count.
5a: Log in to Cloudera Quickstart.
Open a browser on the Cloudera Quickstart VM and go to quickstart.cloudera:7180/cmf/login.
Log in by entering cloudera as both the username and password.
Note: If you see the error “Unable to connect” while logging in to
quickstart.cloudera:7180/cmf/login, try restarting the CDH services.

Restart the CDH services by typing the following command:
sudo /home/cloudera/cloudera-manager --express --force

5b: Start HDFS and YARN services.


Click the dropdown arrow and choose Start option for HDFS and YARN services.

You’ll see the following if both the HDFS and YARN services started successfully.

HDFS service started successfully.


Step 6: Create a directory on HDFS
Now, we create a directory named word_count_map_reduce on HDFS where our input data
and its resulting output would be stored. Use the following command for it.
hdfs dfs -mkdir /word_count_map_reduce

Note: If the directory already exists, then either create a directory with new name or delete
the existing directory using the following command.
export HADOOP_USER_NAME=hdfs
hdfs dfs -rmr /word_count_map_reduce
List HDFS directory items using the following command.
hdfs dfs -ls /

Step 7: Move input data file to HDFS.


Copy the word_count_data.txt file to word_count_map_reduce directory on HDFS using the
following command.
hdfs dfs -put /home/cloudera/word_count_data.txt /word_count_map_reduce
Check that the file was copied successfully to the desired location.
hdfs dfs -ls /word_count_map_reduce

Step 8: Download hadoop-streaming JAR 2.7.3.


Open a browser, go to https://jar-download.com/artifacts/org.apache.hadoop/hadoop-
streaming?p=4, and download the hadoop-streaming JAR 2.7.3 file.

Once the file is downloaded, unzip it inside /home/cloudera directory.


Double-check that the JAR file was unzipped successfully and is present inside the
/home/cloudera directory.
ls

Step 9: Configure permissions to run MapReduce on Hadoop.


We’re almost ready to run our MapReduce job on Hadoop, but before that, we need to grant
read, write, and execute permissions on the Mapper and Reducer programs.
We also need to give the default user (cloudera) permission to write the output file inside
HDFS.
Run the following commands to do so:
chmod 777 mapper.py reducer.py
hdfs dfs -chown cloudera /word_count_map_reduce
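The execute bit matters because Hadoop Streaming runs the scripts directly, which also requires each script to begin with a shebang line such as #!/usr/bin/env python. A hypothetical sanity check in a scratch directory (the stub file below is for illustration only):

```shell
# Create a stub script in a temporary directory, grant rwx to
# everyone as in the step above, and confirm the execute bit took.
cd "$(mktemp -d)"
printf '#!/usr/bin/env python\n' > mapper.py
chmod 777 mapper.py
test -x mapper.py && echo "mapper.py is executable"
```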

Step 10: Run MapReduce on Hadoop.


We’re at the final step of this lab. Run the MapReduce job on Hadoop using the
following command.
hadoop jar /home/cloudera/hadoop-streaming-2.7.3.jar \
> -input /word_count_map_reduce/word_count_data.txt \
> -output /word_count_map_reduce/output \
> -mapper /home/cloudera/mapper.py \
> -reducer /home/cloudera/reducer.py
If the terminal output reports the job as completed successfully, then the MapReduce job was
executed successfully.
Step 11: Read the MapReduce output.
Now, finally, run the following command to read the MapReduce word count output for the
input data file you created.
hdfs dfs -cat /word_count_map_reduce/output/part-00000
Congratulations, the output for MapReduce on Hadoop is obtained exactly as expected. All the
words in the input data file have been mapped, sorted, and reduced to their respective counts.
