Practical 2-1

Hadoop

Uploaded by

warlord 56

Practical 2-1: Write a MapReduce program for word count in a file.

MapReduce consists of two steps:

Map Function – Takes a set of data and converts it into another set of data, in which individual elements are
broken down into tuples (key-value pairs).

Example – (Map function in Word Count)

Input
Set of data
Bus, Car, bus, car, train, car, bus, car, train, bus, TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN

Output
Convert into another set of data
(Key,Value)
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1),
(TRAIN,1),(BUS,1), (buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)
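
The map step above can be sketched in plain Java, without any Hadoop dependency. This is only an illustration of the idea: the class name MapSketch and its map method are hypothetical helpers, and the comma delimiter is assumed from the sample input.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration; not part of the Hadoop API.
class MapSketch {

    // Emits one (word, 1) pair per comma-separated token, keeping the original case.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.split(",")) {
            String word = token.trim();
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Prints each pair on its own line, starting with (Bus,1).
        for (Map.Entry<String, Integer> pair : map("Bus, Car, bus, car, train")) {
            System.out.println("(" + pair.getKey() + "," + pair.getValue() + ")");
        }
    }
}
```

Every token becomes its own pair with the value 1; no counting happens at this stage.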

Reduce Function – Takes the output from Map as an input and combines those data tuples into a smaller set
of tuples.

Example – (Reduce function in Word Count)

Input
(output of Map function)

Set of Tuples
(Bus,1), (Car,1), (bus,1), (car,1), (train,1), (car,1), (bus,1), (car,1), (train,1), (bus,1), (TRAIN,1),(BUS,1),
(buS,1), (caR,1), (CAR,1), (car,1), (BUS,1), (TRAIN,1)

Output
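Converts into a smaller set of tuples (the program below converts each word to upper case in the mapper, so
case variants such as Bus, bus and BUS are counted under one key)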
(BUS,7), (CAR,7), (TRAIN,4)
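
The reduce step can be sketched the same way in plain Java. The class ReduceSketch and its reduce method are hypothetical names used only for this illustration, and the upper-casing of keys is done here inside the sketch so that it reproduces the counts shown above.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of the Hadoop API.
class ReduceSketch {

    // Groups the pairs by upper-cased key and sums the values of each group.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey().toUpperCase(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String input = "Bus, Car, bus, car, train, car, bus, car, train, bus, "
                + "TRAIN, BUS, buS, caR, CAR, car, BUS, TRAIN";
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : input.split(",")) {
            pairs.add(new SimpleEntry<>(token.trim(), 1));
        }
        // Prints {BUS=7, CAR=7, TRAIN=4}
        System.out.println(reduce(pairs));
    }
}
```

The 18 input tuples collapse to three, one per distinct key.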

Work Flow of the Program

The MapReduce workflow consists of 5 steps:

Splitting – The input is split on a chosen delimiter, e.g. a space, comma, semicolon, or new line ('\n').

Mapping – as explained above.

Intermediate splitting – The map tasks run in parallel on different nodes of the cluster. So that the tuples can
be grouped in the Reduce phase, all tuples with the same KEY must end up on the same node.

Reduce – Essentially a group-by phase: the values for each key are aggregated.

Combining – The last phase, in which the individual result sets from each node are combined to form the
final result.
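
The intermediate splitting step can be sketched as a hash-based routing of keys to reducers. This is a simplified illustration and not the code Hadoop runs: the class ShuffleSketch and its methods are hypothetical, and String.hashCode() stands in for the hash of Hadoop's key type. The formula follows the same idea as hash partitioning: take a non-negative hash of the key modulo the number of reducers, so equal keys always land on the same reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of the Hadoop API.
class ShuffleSketch {

    // Picks a reducer index in [0, numReducers) from the key's hash.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Routes every key to its reducer; identical keys share one reducer.
    static Map<Integer, List<String>> shuffle(List<String> keys, int numReducers) {
        Map<Integer, List<String>> byReducer = new TreeMap<>();
        for (String key : keys) {
            byReducer.computeIfAbsent(partitionFor(key, numReducers), p -> new ArrayList<>())
                     .add(key);
        }
        return byReducer;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("BUS", "CAR", "BUS", "TRAIN", "CAR", "BUS");
        // Prints the reducer index followed by the keys routed to it.
        System.out.println(shuffle(keys, 2));
    }
}
```

Because the reducer index depends only on the key, each reducer can sum its keys without consulting any other node.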

Steps
1. Open Eclipse > File > New > Java Project > (name it MRProgramsDemo) > Finish.

2. Right-click the project > New > Package (name it PackageDemo) > Finish.

3. Right-click the package > New > Class (name it WordCount).

4. Add the following reference libraries:

a. Right-click the project > Build Path > Add External Archives

i. /usr/lib/hadoop-0.20/hadoop-core.jar

ii. /usr/lib/hadoop-0.20/lib/commons-cli-1.2.jar

5. Type the following code in Java:

package PackageDemo;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    // Driver: configures the job and submits it.
    public static void main(String[] args) throws Exception {
        Configuration c = new Configuration();
        // Strip generic Hadoop options; the remaining arguments are <input path> <output path>.
        String[] files = new GenericOptionsParser(c, args).getRemainingArgs();
        Path input = new Path(files[0]);
        Path output = new Path(files[1]);
        Job j = new Job(c, "wordcount");
        j.setJarByClass(WordCount.class);
        j.setMapperClass(MapForWordCount.class);
        j.setReducerClass(ReduceForWordCount.class);
        j.setOutputKeyClass(Text.class);
        j.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(j, input);
        FileOutputFormat.setOutputPath(j, output);
        // Exit code 0 if the job succeeds, 1 otherwise.
        System.exit(j.waitForCompletion(true) ? 0 : 1);
    }

    // Mapper: emits (WORD, 1) for every comma-separated token of an input line.
    public static class MapForWordCount extends Mapper<LongWritable, Text, Text, IntWritable> {
        @Override
        public void map(LongWritable key, Text value, Context con)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(",");
            for (String word : words) {
                // Upper-case and trim so that "Bus", "bus" and " BUS" share one key.
                Text outputKey = new Text(word.toUpperCase().trim());
                IntWritable outputValue = new IntWritable(1);
                con.write(outputKey, outputValue);
            }
        }
    }

    // Reducer: sums the 1s emitted for each word.
    public static class ReduceForWordCount extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        public void reduce(Text word, Iterable<IntWritable> values, Context con)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            con.write(word, new IntWritable(sum));
        }
    }
}

The above program consists of three classes:

Driver class – WordCount, which contains the public static void main method; this is the entry point.
Map class – MapForWordCount, which extends the public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT> and
implements the map function.
Reduce class – ReduceForWordCount, which extends the public class Reducer<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
and implements the reduce function.

6. Make a jar file:

Right-click the project > Export > select Jar File as the export destination > Next > Finish.

7. Take a text file and copy it into HDFS:

To copy the file into Hadoop, open the terminal and enter the following command:

hadoop fs -put wordcountFile wordCountFile

8. Run the jar file:

(hadoop jar jarfilename.jar packageName.ClassName PathToInputTextFile PathToOutputDirectory)

hadoop jar MRProgramsDemo.jar PackageDemo.WordCount wordCountFile MRDir1

9. Open the result:

[rahul@localhost ~]$ hadoop fs -ls MRDir1

Found 3 items

-rw-r--r-- 1 training supergroup 0 2023-07-21 09:50 /user/rahul/MRDir1/_SUCCESS

drwxr-xr-x - training supergroup 0 2023-07-21 09:50 /user/rahul/MRDir1/_logs
-rw-r--r-- 1 training supergroup 20 2023-07-21 09:50 /user/rahul/MRDir1/part-r-00000

[rahul@localhost ~]$ hadoop fs -cat MRDir1/part-r-00000

BUS 7
CAR 4
TRAIN 6
