18mcs35e U4
What is MapReduce?
MapReduce is a processing technique and a programming model for distributed computing
based on Java. The MapReduce algorithm contains two important tasks, namely Map
and Reduce. Map takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key/value pairs). The Reduce task
then takes the output from a map as its input and combines those data tuples into a
smaller set of tuples. As the name MapReduce implies, the reduce task is always
performed after the map job.
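As a rough illustration of these two steps (independent of Hadoop; the class name MapReduceIdea and the toy data below are made up for illustration), the following plain-Java sketch maps lines of text into (word, 1) tuples and then reduces the tuples of each key into a count:

import java.util.*;
import java.util.stream.*;

public class MapReduceIdea {
   public static void main(String[] args) {
      List<String> lines = Arrays.asList("a b a", "b c");

      //Map: break each element into (key, value) tuples
      List<Map.Entry<String, Integer>> tuples = lines.stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .map(word -> Map.entry(word, 1))
            .collect(Collectors.toList());

      //Reduce: combine the tuples for each key into a smaller set
      Map<String, Integer> counts = tuples.stream()
            .collect(Collectors.groupingBy(e -> e.getKey(),
                     Collectors.summingInt(e -> e.getValue())));

      System.out.println(counts);   //e.g. {a=2, b=2, c=1}
   }
}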
The major advantage of MapReduce is that it is easy to scale data processing over
multiple computing nodes. Under the MapReduce model, the data processing
primitives are called mappers and reducers. Decomposing a data processing
application into mappers and reducers is sometimes nontrivial. But once an
application is written in the MapReduce form, scaling it to run over hundreds,
thousands, or even tens of thousands of machines in a cluster is merely a configuration
change. This simple scalability is what has attracted many programmers to the
MapReduce model.
The Algorithm
• The MapReduce paradigm is generally based on sending the computation to where
the data resides, rather than moving the data to the computation.
• A MapReduce program executes in three stages, namely the map stage, the shuffle
stage, and the reduce stage.
o Map stage − The map or mapper’s job is to process the input data.
Generally the input data is in the form of a file or directory and is stored in
the Hadoop Distributed File System (HDFS). The input file is passed to the
mapper function line by line. The mapper processes the data and creates
several small chunks of data.
o Reduce stage − This stage is the combination of the Shuffle stage and
the Reduce stage. The shuffle groups the intermediate key/value pairs by
key (a small sketch of this grouping follows this list), and the Reducer’s
job is to process the data that comes from the mapper. After processing,
it produces a new set of output, which will be stored in HDFS.
• During a MapReduce job, Hadoop sends the Map and Reduce tasks to the
appropriate servers in the cluster.
• The framework manages all the details of data-passing such as issuing tasks,
verifying task completion, and copying data around the cluster between the
nodes.
• Most of the computing takes place on nodes with data on local disks, which
reduces the network traffic.
• After completion of the given tasks, the cluster collects and reduces the data to
form an appropriate result, and sends it back to the Hadoop server.
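As referenced in the reduce-stage bullet above, the following plain-Java sketch imitates what the shuffle stage does (the class name ShuffleSketch and the intermediate pairs are made up, not part of Hadoop): it groups the intermediate key/value pairs emitted by the mappers by key, so that each reduce call receives one key together with all of its values.

import java.util.*;

public class ShuffleSketch {
   public static void main(String[] args) {
      //Intermediate pairs as the mappers might emit them (made-up values)
      String[][] mapOutput = { {"1980", "26"}, {"1981", "34"}, {"1980", "29"} };

      //Shuffle: group the values of each key together
      Map<String, List<Integer>> grouped = new TreeMap<>();
      for (String[] pair : mapOutput) {
         grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                .add(Integer.parseInt(pair[1]));
      }

      //Reduce: each key and its list of values go to one reduce call
      grouped.forEach((year, values) ->
            System.out.println(year + " -> " + values));
      //1980 -> [26, 29]
      //1981 -> [34]
   }
}

In Hadoop this grouping, along with sorting and moving data between nodes, is performed by the framework itself; the programmer only supplies the map and reduce functions.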
(Figure: MapReduce data flow from input to output.)
Terminology
• PayLoad − Applications implement the Map and the Reduce functions, and form
the core of the job.
• Mapper − Mapper maps the input key/value pairs to a set of intermediate
key/value pairs (see the interface sketch after this list).
• NameNode − Node that manages the Hadoop Distributed File System (HDFS).
• DataNode − Node where data is presented in advance before any processing
takes place.
• MasterNode − Node where JobTracker runs and which accepts job requests
from clients.
• SlaveNode − Node where the Map and Reduce programs run.
• JobTracker − Schedules jobs and tracks the assigned jobs on the Task Tracker.
• Task Tracker − Tracks the task and reports status to the JobTracker.
• Job − An execution of a Mapper and a Reducer across a dataset.
• Task − An execution of a Mapper or a Reducer on a slice of data.
• Task Attempt − A particular instance of an attempt to execute a task on a
SlaveNode.
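For orientation, the Mapper and Reducer terms above correspond to interfaces in the classic org.apache.hadoop.mapred API used later in this unit. The following is a simplified, self-contained imitation of those interfaces, not the real declarations (the real ones also take a Reporter argument and extend further interfaces):

import java.io.IOException;
import java.util.Iterator;

//Simplified stand-ins for the classic org.apache.hadoop.mapred interfaces
interface OutputCollector<K, V> {
   void collect(K key, V value) throws IOException;
}

interface Mapper<K1, V1, K2, V2> {
   //Maps one input (key, value) pair to intermediate pairs via the collector
   void map(K1 key, V1 value, OutputCollector<K2, V2> output) throws IOException;
}

interface Reducer<K2, V2, K3, V3> {
   //Combines all intermediate values that share one key
   void reduce(K2 key, Iterator<V2> values, OutputCollector<K3, V3> output)
         throws IOException;
}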
Example Scenario
Given below is the data regarding the electrical consumption of an organization. It
contains the monthly electrical consumption and the annual average for various years.
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Avg
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
If the above data is given as input, we have to write applications to process it and
produce results such as finding the year of maximum usage, the year of minimum
usage, and so on. This is a walkover for programmers with a finite number of records:
they will simply write the logic to produce the required output and pass the data to
the application they have written.
But think of the data representing the electrical consumption of all the large-scale
industries of a particular state since its formation.
When we write applications to process such bulk data,
• They will take a lot of time to execute.
• There will be heavy network traffic when we move the data from its source to
the network server, and so on.
To solve these problems, we have the MapReduce framework.
Input Data
The above data is saved as sample.txt and given as input. The input file looks as
shown below.
1979 23 23 2 43 24 25 26 26 26 26 25 26 25
1980 26 27 28 28 28 30 31 31 31 30 30 30 29
1981 31 32 32 32 33 34 35 36 36 34 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38 40
1985 38 39 39 39 39 41 41 41 00 40 39 39 45
Example Program
Given below is the program to process the sample data using the MapReduce framework.
package hadoop;

import java.util.*;
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class ProcessUnits {

   //Mapper class
   public static class E_EMapper extends MapReduceBase implements
   Mapper<LongWritable, Text, Text, IntWritable> {

      //Map function: emits a (year, annual average) pair for each input line
      public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output,
      Reporter reporter) throws IOException {
         String line = value.toString();
         String lasttoken = null;
         StringTokenizer s = new StringTokenizer(line, "\t");
         String year = s.nextToken();

         while (s.hasMoreTokens()) {
            lasttoken = s.nextToken();
         }

         //The last token on each line is the annual average
         int avgprice = Integer.parseInt(lasttoken);
         output.collect(new Text(year), new IntWritable(avgprice));
      }
   }

   //Reducer class
   public static class E_EReduce extends MapReduceBase implements
   Reducer<Text, IntWritable, Text, IntWritable> {

      //Reduce function: emits the years whose average exceeds the threshold
      public void reduce(Text key, Iterator<IntWritable> values,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
         int maxavg = 30;
         int val = Integer.MIN_VALUE;

         while (values.hasNext()) {
            if ((val = values.next().get()) > maxavg) {
               output.collect(key, new IntWritable(val));
            }
         }
      }
   }

   //Main function
   public static void main(String args[]) throws Exception {
      JobConf conf = new JobConf(ProcessUnits.class);

      conf.setJobName("max_electricityunits");
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);
      conf.setMapperClass(E_EMapper.class);
      conf.setCombinerClass(E_EReduce.class);
      conf.setReducerClass(E_EReduce.class);
      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);

      //Input and output paths are taken from the command-line arguments
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));

      JobClient.runJob(conf);
   }
}
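Note that the reducer, which also serves as the combiner here, emits a year only when its value exceeds the hard-coded threshold of 30 units; the final output therefore lists the years of comparatively high average consumption.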
Save the above program as ProcessUnits.java. The compilation and execution of the
program are explained below.
Compilation and Execution of Process Units Program
Let us assume we are in the home directory of a Hadoop user (e.g. /home/hadoop).
Follow the steps given below to compile and execute the above program.
Step 1
The following command creates a directory to store the compiled Java classes.
$ mkdir units
Step 2
Download hadoop-core-1.2.1.jar, which is used to compile and execute the MapReduce
program, and place it in the current directory.
Step 3
The following commands are used for compiling the ProcessUnits.java program and
creating a jar for the program.
$ javac -classpath hadoop-core-1.2.1.jar -d units ProcessUnits.java
$ jar -cvf units.jar -C units/ .
Step 4
The following command is used to create an input directory in HDFS.
$HADOOP_HOME/bin/hadoop fs -mkdir input_dir
Step 5
The following command is used to copy the input file named sample.txt to the input
directory of HDFS.
$HADOOP_HOME/bin/hadoop fs -put /home/hadoop/sample.txt input_dir
Step 6
The following command is used to verify the files in the input directory.
$HADOOP_HOME/bin/hadoop fs -ls input_dir/
Step 7
The following command is used to run the ProcessUnits application by taking the
input files from the input directory.
$HADOOP_HOME/bin/hadoop jar units.jar hadoop.ProcessUnits input_dir output_dir
Wait for a while until the job is executed. After execution, as shown below, the output
will contain the number of input splits, the number of Map tasks, the number of
reducer tasks, etc.
14/10/31 06:02:52 INFO mapreduce.Job: Job job_1414748220717_0002 completed successfully
14/10/31 06:02:52 INFO mapreduce.Job: Counters: 49
   File System Counters
   Map-Reduce Framework
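Once the job completes, the output can be verified with the usual HDFS commands; note that the file name part-00000 assumes the default output naming with a single reducer.
$HADOOP_HOME/bin/hadoop fs -ls output_dir/
$HADOOP_HOME/bin/hadoop fs -cat output_dir/part-00000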