
Name: Vivek Kumar    Roll No: 11212652

EXPERIMENT NO: 04

AIM: Run a Java program based on parallel programming to implement the concept of the
MapReduce paradigm.

DESCRIPTION:

MapReduce is the heart of Hadoop. It is this programming paradigm that allows for massive
scalability across hundreds or thousands of servers in a Hadoop cluster. The MapReduce concept
is fairly simple to understand for those who are familiar with clustered scale-out data-processing
solutions. The term MapReduce actually refers to two separate and distinct tasks that Hadoop
programs perform. The first is the map job, which takes a set of data and converts it into another
set of data in which individual elements are broken down into tuples (key/value pairs). The reduce
job takes the output from a map as its input and combines those data tuples into a smaller set of
tuples. As the order of the words in the name MapReduce implies, the reduce job is always
performed after the map job.
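As a concrete illustration (a hypothetical one-line input, used here only for illustration), word
counting flows through the two jobs as follows; between them, the framework groups the map
output by key:

    Input line:       "hello world hello"
    Map output:       (hello, 1), (world, 1), (hello, 1)
    Grouped by key:   (hello, [1, 1]), (world, [1])
    Reduce output:    (hello, 2), (world, 1)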

ALGORITHM: MAPREDUCE PROGRAM

WordCount is a simple program that counts the number of occurrences of each word in a
given text input data set. WordCount fits very well with the MapReduce programming model,
making it a great example for understanding the Hadoop Map/Reduce programming style. Our
implementation consists of three main parts:

1. Mapper: The mapper is a function that processes the input data and emits its
   output as many small chunks of intermediate data. The input to the map
   function is in the form of (key, value) pairs, even though the input to a
   MapReduce program as a whole is a file or directory (which is stored in HDFS).
2. Reducer: The reducer receives the intermediate (key, value) pairs emitted by
   the mappers, grouped by key, and combines the list of values for each key
   into a smaller set of output pairs: (key, [values]) => (key, result). In
   WordCount, it sums the counts recorded for each word.
3. Driver: There is one final component of a Hadoop MapReduce program,
   called the Driver. The driver initializes the job, instructs the
   Hadoop platform to execute your code on a set of input files, and controls where
   the output files are placed.
Step-1. Write a Mapper: A Mapper overrides the "map" function from the class
org.apache.hadoop.mapreduce.Mapper, which provides <key, value> pairs as the input. A
Mapper implementation may output <key, value> pairs using the provided Context. The input
value for the WordCount map task is a line of text from the input data file, and the key is
the offset of that line within the file: <line_offset, line_of_text>. The map task outputs
<word, one> for each word in the line of text. Pseudo-code:

void Map(key, value) {
    for each word x in value:
        output.collect(x, 1);
}
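A minimal runnable version of this mapper, sketched against the org.apache.hadoop.mapreduce
API (the class name WordCountMapper and the whitespace tokenization are illustrative choices,
not taken from the original report):

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // key: offset of the line within the file; value: one line of text
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);  // emit <word, 1> for each word
        }
    }
}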
Step-2. Write a Reducer: A Reducer collects the intermediate <key, value> output from
multiple map tasks and assembles a single result. Here, the WordCount program will sum up
the occurrences of each word into pairs of the form <word, occurrence>. Pseudo-code:

void Reduce(keyword, <list of value>) {
    sum = 0;
    for each x in <list of value>:
        sum += x;
    output.collect(keyword, sum);
}
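A runnable counterpart to this pseudo-code (again a sketch; the class name WordCountReducer
is illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;                   // running total for this word
        for (IntWritable v : values) {
            sum += v.get();
        }
        result.set(sum);
        context.write(key, result);    // emit <word, total occurrences>
    }
}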

Step-3. Write a Driver: The Driver program configures and runs the MapReduce job. We use the
main program to perform basic configuration such as:
• Job Name: the name of this job.
• Executable (Jar) Class: the main executable class; here, WordCount.
• Mapper Class: the class that overrides the "map" function; here, Map.
• Reducer Class: the class that overrides the "reduce" function; here, Reduce.
• Output Key: the type of the output key; here, Text.
• Output Value: the type of the output value; here, IntWritable.
• File Input Path
• File Output Path
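A driver covering these configuration points might look as follows (a sketch assuming the
mapper and reducer classes named above; the class and path names are placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");           // Job Name
        job.setJarByClass(WordCount.class);                      // Executable (Jar) Class
        job.setMapperClass(WordCountMapper.class);               // Mapper Class
        job.setReducerClass(WordCountReducer.class);             // Reducer Class
        job.setOutputKeyClass(Text.class);                       // Output Key: Text
        job.setOutputValueClass(IntWritable.class);              // Output Value: IntWritable
        FileInputFormat.addInputPath(job, new Path(args[0]));    // File Input Path
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // File Output Path
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Once packaged into a jar, the job would typically be launched with something like
hadoop jar WordCount.jar WordCount <hdfs_input_path> <hdfs_output_path>, where the two
paths are placeholders for the HDFS input and output directories.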

INPUT: A set of text data related to Shakespeare's comedies, glossary, and poems.



OUTPUT:
