BDC Output 3
        throws IOException, InterruptedException
    {
        int sum = 0;
        for (IntWritable x : values)
        {
            sum += x.get();
        }
        context.write(key, new IntWritable(sum));
    }
}

public static void main(String[] args) throws Exception
{
    Configuration conf = new Configuration();
    Job job = new Job(conf, "My Word Count Program");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    Path outputPath = new Path(args[1]);
    // Configuring the input/output path from the filesystem into the job
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    // Deleting the output path automatically from HDFS so that we don't have to delete it explicitly
    outputPath.getFileSystem(conf).delete(outputPath);
    // Exiting with status 0 only if the job completes successfully
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
The entire MapReduce program can be fundamentally divided into three parts:
A. Mapper Phase Code
B. Reducer Phase Code
C. Driver Code
We will walk through the code for each of these three parts in turn.
Mapper Code:
public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets.
• Input:
◦ The key is nothing but the byte offset of each line in the text file: LongWritable
◦ The value is each individual line of the file: Text
• Output:
◦ The key is the tokenized word: Text
◦ The value is hardcoded to 1 in our case: IntWritable
◦ Example – Dear 1, Bear 1, etc.
• We have written Java code that tokenizes each word and assigns it a hardcoded value of 1 (a sketch of this map() method is given below).
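The body of the map() method is not reproduced on this page. A minimal sketch of what it looks like, assuming the usual StringTokenizer approach (and that java.util.StringTokenizer is imported along with the Hadoop classes), is:
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException
{
    // Break the incoming line into individual words
    StringTokenizer tokenizer = new StringTokenizer(value.toString());
    while (tokenizer.hasMoreTokens())
    {
        // Emit each word with the hardcoded count of 1
        context.write(new Text(tokenizer.nextToken()), new IntWritable(1));
    }
}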
Reducer Code:
public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
    {
        // Add up all the 1s emitted by the mapper for this word
        int sum = 0;
        for (IntWritable x : values)
        {
            sum += x.get();
        }
        // Emit the word together with its total count
        context.write(key, new IntWritable(sum));
    }
}
• We have created a class Reduce which extends the class Reducer, just as we did for the Mapper.
• We define the data types of the input and output key/value pairs after the class declaration using angle brackets, as done for the Mapper.
• Both the input and the output of the Reducer are key/value pairs.
• Input:
◦ The key is nothing but the unique word that has been generated after the sorting and shuffling phase: Text
◦ The value is the list of counts (the 1s) emitted by the mapper for that word: Iterable<IntWritable>
• Output:
◦ The key is the word and the value is its aggregated count, e.g. Bear 2: Text, IntWritable
Driver Code:
Configuration conf = new Configuration();
Job job = new Job(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
Path outputPath = new Path(args[1]);
// Configuring the input/output path from the filesystem into the job
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
• In the driver class, we set the configuration of our MapReduce job so that it can run on Hadoop.
• We specify the name of the job and the data types of the input/output of the mapper and reducer.
• We also specify the names of the mapper and reducer classes.
• The paths of the input and output folders are also specified.
• The method setInputFormatClass() specifies how the Mapper will read the input data, i.e. what the unit of work will be. Here we have chosen TextInputFormat, so that a single line is read by the mapper at a time from the input text file.
• The main() method is the entry point for the driver. In this method, we instantiate a new Configuration object for the job.
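Note that on Hadoop 2.x and later the Job(Configuration, String) constructor used above is deprecated; a minimal sketch of the same driver setup using the Job.getInstance() factory method (everything else stays exactly as in the listing above) might look like this:
// Sketch only: same driver configuration via the non-deprecated factory method
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "My Word Count Program");
job.setJarByClass(WordCount.class);
job.setMapperClass(Map.class);
job.setReducerClass(Reduce.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
// Remaining input/output format and path configuration is unchanged.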