0% found this document useful (0 votes)
18 views

BDA3

1) The document describes an experiment to write code for a MapReduce word count program using Hadoop. 2) It involves creating driver, mapper, and reducer classes in Eclipse and adding Hadoop libraries. 3) The mapper class splits input lines by spaces and emits (word, 1) pairs, the reducer class sums the 1s for each word, and the driver class runs the MapReduce job.

Uploaded by

nikithakatta0
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
18 views

BDA3

1) The document describes an experiment to write code for a MapReduce word count program using Hadoop. 2) It involves creating driver, mapper, and reducer classes in Eclipse and adding Hadoop libraries. 3) The mapper class splits input lines by spaces and emits (word, 1) pairs, the reducer class sums the 1s for each word, and the driver class runs the MapReduce job.

Uploaded by

nikithakatta0
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 7

Big data Analytics LAB

Experiment_no-03

Aim: Write driver code, mapper code, reducer code to count number of words in a
given file. (Hint: WordCount Map- Reduce Program)

Description:

1) Open Oracle VM VirtualBox->export cloudera->start

2) Cloudera->settings->system->set processors to “2”, by default it is “1”.

3) To launch “cloudera Express”

(i) Open terminal in cloudera and start the server by using

“sudo service cloudera-sdh-server start”

(ii) After successful completion click on Cloudera Express->Cloud Manager

(iii) Both username and password is “cloudera” and then click on Login

4) In browser type “localhost:50070/dfshealth.jsp”


5) In eclipse->File->New->Java project->Project name “WordCount”->Finish

6) Create three classes.

Right click on WordCount -> New->class->Name “WCDriver”

Right click on WordCount -> New->class->Name “WCMapper”

Right click on WordCount -> New->class->Name “WCReducer”

7) Add Hadoop libraries.

Right click on WordCount ->Build path->Configure Build path->Add external


JARS. (usr\lib\hadoop\hadoop-common-2.6.0-cdh 5.13.0 jar,

Usr\lib\hadoop\hadoop-core-2.6.0-cdh 5.13.0 jar)

Program:

WCMapper.java

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class WCMapper extends MapReduceBase implements Mapper<LongWritable,


Text, Text, IntWritable> {

// Map function
public void map(LongWritable key, Text value, OutputCollector<Text,
IntWritable> output, Reporter rep) throws IOException
{

String line = value.toString();

// Splitting the line on spaces


for (String word : line.split(" "))
{
if (word.length() > 0)
{
output.collect(new Text(word), new IntWritable(1));
}
}
}
}

WCReducer.java

import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WCReducer extends MapReduceBase implements Reducer<Text,


IntWritable, Text, IntWritable> {

// Reduce function
public void reduce(Text key, Iterator<IntWritable> value,
OutputCollector<Text, IntWritable> output,
Reporter rep) throws IOException
{

int count = 0;
// Counting the frequency of each words
while (value.hasNext())
{
IntWritable i = value.next();
count += i.get();
}

output.collect(key, new IntWritable(count));


}
}

WCDriver.java

import java.io.IOException;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WCDriver extends Configured implements Tool {

public int run(String args[]) throws IOException


{
if (args.length < 2)
{
System.out.println("Please give valid inputs");
return -1;
}

JobConf conf = new JobConf(WCDriver.class);


FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setMapperClass(WCMapper.class);
conf.setReducerClass(WCReducer.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(IntWritable.class);
conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);
JobClient.runJob(conf);
return 0;
}

// Main Method
public static void main(String args[]) throws Exception
{
int exitCode = ToolRunner.run(new WCDriver(), args);
System.out.println(exitCode);
}
}

Right click on WordCount->Export->java->jar file->JAR file:” WordCount”->Finish


OUTPUT:

In Terminal

You might also like