
MAP REDUCE

Mrs. P. Sridevi, CSE-IT Dept.


UNIT -III
• Writing MapReduce Programs:
• A Weather Dataset,
• Understanding Hadoop API for MapReduce Framework
(Old and New),
• Basic programs of Hadoop MapReduce: WordCount
• Driver code,
• Mapper code,
• Reducer code,
• RecordReader,
• Combiner, Partitioner
Hadoop components
• Hadoop has two core components: HDFS (distributed storage) and MapReduce (distributed processing).

MR Design pattern
• Used in
• Indexing and Search
• Classification
• Summarization
• Recommendation Systems
• Analytics
MR Features
• It’s a Programming Model
• Large Scale Distributed Model
• Parallel Programming
Functions in MR
• Mapper Map()
• Reducer Reduce()
Implementation in Hadoop
File input formats
• TextInputFormat: <k, v> = <byte offset, line>
• KeyValueTextInputFormat: <k, v> = <key field, rest of line>
• SequenceFileInputFormat: <k, v> read from a sequence file
• Binary data (images, video, audio) is packed as <filename, file contents> pairs and stored in sequence files
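A minimal driver fragment for choosing the input format (a sketch, assuming a Job object named job as in the WordCount driver shown later in this unit):

// Pick how the input file is turned into <key, value> records for the mapper.
// TextInputFormat (the default): key = byte offset (LongWritable), value = the line (Text).
// KeyValueTextInputFormat: each line is split at the first tab into a Text key and a Text value.
// SequenceFileInputFormat: reads binary sequence files of <key, value> pairs.
job.setInputFormatClass(
        org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);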
Architecture overview
• The user submits a job to the master node, which runs the Job Tracker.
• Slave node 1 … Slave node N each run a Task Tracker.
• Each Task Tracker manages the map/reduce worker tasks on its node.


Map + Reduce
• Data flow: very big data → MAP → partitioning function → REDUCE → result
• Map: accepts an input key/value pair and emits intermediate key/value pairs.
• Reduce: accepts an intermediate key/value pair and emits output key/value pairs.
Diagram (1)
SCALING OUT
• A small input can be processed on a standalone system: the data sits in the
local file system and the computation runs there. This is useful for testing the
MR programming model.
• When the data grows, we move to a distributed system, where each data node
processes its part of the data in parallel.
Diagram (2)
Input file (in GBs)
• split into INPUT SPLIT 0, INPUT SPLIT 1, INPUT SPLIT 2, INPUT SPLIT 3, each a 64 MB block
• a RECORD READER (RR0 … RR3) turns each split into <key, value> records
• MAPPER 0 … MAPPER 3 tokenize the records and emit <key, value> pairs
• SHUFFLE AND SORT groups the intermediate data into <key2, list[ ]>


Points need to be emphasized
• No reduce can begin until map is complete
• Master must communicate locations of
intermediate files
• Tasks scheduled based on location of data
• If map worker fails any time before reduce
finishes, task must be completely rerun
• MapReduce library does most of the hard
work.
INPUT FILE (read via TextInputFormat)

MapReduce programs process data in two phases:
1. MAP PHASE: input <key, value> → output <key, value>
2. REDUCE PHASE: input <key, value> → output <key, value>

OUTPUT FILE

A MapReduce program has two functions:
1. map()
2. reduce()
INPUT FILE → INPUT SPLIT (blocks) → MAPPER
• Each input split is a 64/128 MB block
• One map task is applied to each split
• Each line is considered a record
• The number of records = the number of times the mapper runs within the split
• There are as many mappers as there are input splits
INPUT FILE → INPUT SPLIT → RECORD READER → MAPPER
• The record reader presents each record to the mapper as <key, value> = <byte offset, line>
Working of the MapReduce flow
INPUT FILE (in GBs) → INPUT SPLIT (128 MB block) → RECORD READER (RR)
<key, value> = <byte offset, line> → MAPPER (tokenizes) → <key, value>, e.g. <How, 1>
→ shuffle and sort → REDUCER (aggregation) → RECORD WRITER → result/output file

• Hadoop runs the map task on the node where the input data resides in HDFS.
This is called data locality optimization.
• The output of the mapper is written to local disk, because it is only
intermediate output; it is then processed by the reduce task.
• The final output comes from the reducer. There can be a single reducer or
multiple reducers.
• If there are multiple reducers, the output of the mapper is partitioned across
the various reducer nodes.
NCDC weather dataset (sample records)
0029029170999991909010106004+62900+027667FM-12+009099999V0202001N001019999999N0000001N9-01061+99999102241ADDGF108991999999999999999999
0029029170999991909010113004+62900+027667FM-12+009099999V0209991C000019999999N0000001N9-00781+99999102031ADDGF108991999999999999999999
0029029170999991909010120004+62900+027667FM-12+009099999V0202001N001019999999N0000001N9-00501+99999101781ADDGF108991999999999999999999
0029029170999991909010206004+62900+027667FM-12+009099999V0202301N001019999999N0000001N9-00721+99999101141ADDGF108991999999999999999999
0035029170999991909010213004+62900+027667FM-12+009099999V0202301N002619999999N0000001N9-00221+99999100081ADDGF108991999999999999999999
0029029170999991909010220004+62900+027667FM-12+009099999V0202301N001019999999N0000001N9+00061+99999099671ADDGF104991999999999999999999
0029029170999991909010306004+62900+027667FM-12+009099999V0202701N001019999999N0000001N9-00111+99999099861ADDGF100991999999999999999999
0029029170999991909010313004+62900+027667FM-12+009099999V0209991C000019999999N0000001N9-00221+99999100111ADDGF104991999999999999999999
0029029170999991909010320004+62900+027667FM-12+009099999V0202501N002619999999N0000001N9+00061+99999099321ADDGF108991999999999999999999
0029029170999991909010406004+62900+027667FM-12+009099999V0202701N001019999999N0000001N9+00111+99999099101ADDGF100991999999999999999999
0029029170999991909010413004+62900+027667FM-12+009099999V0202901N001019999999N0000001N9+00111+99999098951ADDGF100991999999999999999999
0029029170999991909010420004+62900+027667FM-12+009099999V0203201N002619999999N0000001N9+00001+99999098941ADDGF100991999999999999999999
0029029170999991909010506004+62900+027667FM-12+009099999V0202901N002619999999N0000001N9-00171+99999099341ADDGF100991999999999999999999
0029029170999991909010513004+62900+027667FM-12+009099999V0203201N001019999999N0000001N9-00221+99999099961ADDGF100991999999999999999999
0029029170999991909010520004+62900+027667FM-12+009099999V0203201N001019999999N0000001N9-00391+99999100181ADDGF100991999999999999999999
0029029170999991909010606004+62900+027667FM-12+009099999V0209991C000019999999N0000001N9-00561+99999100431ADDGF108991999999999999999999
FILE SIZE IS 2 GB, BLOCK SIZE IS 64 MB
(2048 MB / 64 MB = 32 blocks, so 32 input splits and 32 map tasks)
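A hedged sketch of a mapper for this weather dataset (the classic max-temperature
example): it pulls the year and the air temperature out of each fixed-width NCDC
record and emits <year, temperature>. The character offsets used here (year at
15-19, signed temperature at 87-92, quality flag at 92) follow the commonly
documented NCDC layout and should be verified against the actual records; a
matching reducer would simply keep the maximum value per year.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable>
{
  private static final int MISSING = 9999;   // value used in the data for a missing reading

  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException
  {
    String line = value.toString();
    String year = line.substring(15, 19);              // assumed year field
    int airTemperature;
    if (line.charAt(87) == '+') {                      // temperature is signed, in tenths of a degree
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);           // assumed quality-code position
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}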
Sample text input
• Hi students
• How are you
• How is your BDA class
• How many students are there in the class
• How are the students feeling about online
class
• Many students are present today.
• I love teaching BDA
Shuffle
Input to the Reducer is the sorted output
of the mappers. In this phase the
framework fetches the relevant partition
of the output of all the mappers.
Sort
The framework groups Reducer inputs by
keys (since different mappers may have
output the same key) in this stage.
The shuffle and sort phases occur
simultaneously; while map-outputs are
being fetched they are merged.
File output formats
TextOutputFormat - plain text files
SequenceFileOutputFormat - sequence files
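A minimal driver fragment for choosing the output format (a sketch, assuming the
Job object job from the WordCount driver later in this unit):

// TextOutputFormat (the default) writes one plain-text "key <TAB> value" line per record.
// SequenceFileOutputFormat writes compact binary sequence files, handy as input to a later MR job.
job.setOutputFormatClass(
        org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat.class);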

MAPPER PHASE: <k1, v1> → list(<k2, v2>)
COMBINER: <k2, list(v2)> → list(<k2, v2>)
REDUCER PHASE: <k2, list(v2)> → list(<k3, v3>)
PACKAGES AND LIBRARIES
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCountMapper extends
    Mapper<LongWritable, Text, Text, IntWritable>
{
  // Called once per input record: key = byte offset of the line, value = the line itself
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException
  {
    Text word = new Text();
    String line = value.toString();
    StringTokenizer s = new StringTokenizer(line);   // split the line into words
    while (s.hasMoreTokens())
    {
      word.set(s.nextToken());
      context.write(word, new IntWritable(1));       // emit <word, 1> for every token
    }
  }
}
public class WordCountReducer extends
    Reducer<Text, IntWritable, Text, IntWritable>
{
  // Called once per key with the list of counts emitted for that word
  public void reduce(Text key, Iterable<IntWritable> value,
      Context context) throws IOException, InterruptedException
  {
    int sum = 0;
    for (IntWritable val : value)
    {
      sum = sum + val.get();   // add up all the 1s for this word
    }
    context.write(key, new IntWritable(sum));
  }
}
public class WordCountDriver
{
  public static void main(String[] args)
      throws IOException, InterruptedException, ClassNotFoundException
  {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountDriver.class);
    job.setMapperClass(WordCountMapper.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input file or directory
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
    if (!job.waitForCompletion(true))
      return;
  }
}
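As a usage sketch (the jar name and HDFS paths here are placeholders): compile the
three classes, package them into a jar, and submit the job with
hadoop jar wordcount.jar WordCountDriver /user/input /user/output
where the output directory must not already exist.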
Mapper output
• INPUT (two lines):
• Hello World Bye World
• Hello Hadoop Goodbye Hadoop
• For the given sample input the first map
emits:
• < Hello, 1>
• < World, 1>
• < Bye, 1>
• < World, 1>
For the second input line (Hello Hadoop Goodbye Hadoop) the second map emits:
• < Hello, 1>
• < Hadoop, 1>
• < Goodbye, 1>
• < Hadoop, 1>
Input to the reducer (the map output after shuffle and sort, grouped by key):
• < Bye, 1>
• < Goodbye, 1>
• < Hadoop, [1,1]>
• < Hello, [1,1]>
• < World, [1,1]>
OUTPUT OF REDUCER
• < Bye, 1>
• < Goodbye, 1>
• < Hadoop, 2>
• < Hello, 2>
• < World, 2>
How Many Maps?
The number of maps is usually driven by the total size
of the inputs, that is, the total number of blocks of the
input files.
The right level of parallelism for maps seems to be around 10-100 maps per node.
Thus, if you expect 10 TB of input data and have a block size of 128 MB, you'll
end up with about 82,000 maps.
Configuration.set(MRJobConfig.NUM_MAPS, int) can be used to hint the desired
number of maps.
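A worked sketch of the arithmetic and of the hint (assuming the driver's
Configuration object conf; the value is only a hint, since the real count is
driven by the input splits):

// 10 TB of input at a 128 MB block size:
// 10 * 1024 * 1024 MB / 128 MB per block = 81,920 blocks, i.e. roughly 82,000 map tasks.
// MRJobConfig.NUM_MAPS ("mapreduce.job.maps") only hints the desired number of maps:
conf.setInt(org.apache.hadoop.mapreduce.MRJobConfig.NUM_MAPS, 100);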
Reducer
• Reducer has 3 primary phases: shuffle, sort and reduce.
• Reducer reduces a set of intermediate values which share a key to a smaller set
of values.
• The number of reducers for the job is set by the user via
Job.setNumReduceTasks(int).
• The framework then calls the reduce(WritableComparable, Iterable<Writable>,
Context) method for each <key, (list of values)> pair in the grouped inputs.
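A one-line sketch in the driver (assuming the Job object job):

// Ask for 4 reduce tasks; the framework will call reduce() once for every
// distinct key in each reducer's grouped input.
job.setNumReduceTasks(4);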
• a MapReduce job takes a set of input key-value pairs
and produces a set of output key-value pairs by
passing the data through map and reduce functions.
• Map tasks deal with splitting and mapping of the data.
• Reduce tasks shuffle and reduce the data; the reduce tasks consolidate the data
into the final results.
• MapReduce programs are parallel in nature, thus are
very useful for performing large-scale data analysis
• The input to each phase is key-value pairs.
• Map Reduce programs are performed by multiple
machines in a cluster.
Map only job
• Consider a scenario where we just need to perform an operation and no
aggregation is required; in such a case we prefer a 'Map-Only job' in Hadoop.
• In a Hadoop Map-Only job, the map does all the work on its InputSplit and no
work is done by the reducer. Here the map output is the final output.
• This is done by calling job.setNumReduceTasks(0) in the driver, which sets the
number of reducers to 0.
• With 0 reducers, no reducer executes and no aggregation takes place.
• In a Map-Only job, the map does all the work on its InputSplit and the reducer
does no work; the mapper output is the final output.
• In a normal job, between the map and reduce phases there is a sort and shuffle
phase, which sorts the keys in ascending order and then groups values by key.
This phase is very expensive.
Map only job cont...
• Avoiding the reduce phase eliminates the sort and shuffle phase as well, which
also avoids network congestion: during shuffling the output of the mapper travels
to the reducer, and when the data size is huge a lot of data has to move.
• In a normal MapReduce job, the mapper output is written to local disk before
being sent to the reducer.
• In a map-only job, this output is written directly to HDFS, which further saves
time and reduces cost.
• In between the map and reduce phases there is a sort and shuffle phase, which
sorts the keys in ascending order and then groups values by the same key.
• The output of the mapper is written to local disk before being sent to the
reducer, but in a map-only job this output is written directly to HDFS, which
further saves time and reduces cost.
• There is no need for a partitioner or combiner in a Hadoop map-only job, which
makes the process fast.
• A map-only job in Hadoop reduces network congestion by avoiding the shuffle,
sort and reduce phases. The mapper takes care of the overall processing and
produces the output. We achieve this by calling job.setNumReduceTasks(0), as
sketched below.
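A minimal map-only driver sketch, reusing the WordCountMapper from this unit
(the class name and paths are otherwise placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MapOnlyDriver
{
  public static void main(String[] args) throws Exception
  {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "map only word count");
    job.setJarByClass(MapOnlyDriver.class);
    job.setMapperClass(WordCountMapper.class);   // reuse the mapper shown earlier
    job.setNumReduceTasks(0);                    // no reducer: map output is written straight to HDFS
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}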
Combiners & Partitioners
Combiners
• Combiners are used to reduce the amount of data being transferred over the
network.
• They are used to optimize the usage of network bandwidth.
• Combiners are also called local reducers.
• They run on the mapper output, on the same machine where the mapper executed.
• Hadoop may call the combiner function zero, one or many times for a particular
map output record.
WORD COUNT PROCESS (with a combiner)
• 1st map output:
<Hadoop, 1>  <Hadoop, 1>  <Hadoop, 1>  <Hadoop, 1>
• The combiner function is called with <Hadoop, [1,1,1,1]> and emits <Hadoop, 4>.
• The reduce method would then be called with <Hadoop, 4> (merged with the values
from the other maps).
• A Combiner, also known as a semi-reducer, is an optional class that accepts the
inputs from the Map class and passes its output key-value pairs on to the Reducer
class.
• The main function of a Combiner is to summarize the map output records with the
same key.
• The output (key-value collection) of the combiner is sent over the network to
the actual Reducer task as input.
• The combiner function does not replace the reduce method.
• A combiner is implemented by extending the Reducer class and is wired in as
shown below.
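Because WordCount's summation is associative and commutative, the existing
WordCountReducer can also serve as the combiner. A minimal sketch of wiring it
into the WordCountDriver shown earlier:

// In WordCountDriver.main(), after setMapperClass/setReducerClass:
job.setCombinerClass(WordCountReducer.class);   // runs a local reduce on each mapper's output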
Partitioner
• A partitioner works like a condition applied while processing the input dataset.
• The partition phase takes place after the Map phase and before the Reduce phase.
• The number of partitions is equal to the number of reducers.
• That means a partitioner will divide the data according to the number of
reducers.
• Therefore, the data in a single partition is processed by a single Reducer.
• The partitioner decides which reducer each output from the map stage is sent
to; it partitions the key space.
• A partitioner partitions the key-value pairs of the intermediate map outputs.
• It partitions the data using a user-defined condition, which works like a hash
function.
• The total number of partitions is the same as the number of Reducer tasks for
the job.
• The difference between a partitioner and a combiner is that the partitioner
divides the data according to the number of reducers, so that all the data in a
single partition gets processed by a single reducer. A sketch of a user-defined
partitioner follows.
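A hedged sketch of a user-defined partitioner (the routing rule used here, the
first letter of the word, is only an illustration): it must be registered in the
driver, and the number of reduce tasks should match the partitions it produces.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Sends words starting with a-m to reducer 0 and all other words to reducer 1.
public class FirstLetterPartitioner extends Partitioner<Text, IntWritable>
{
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions)
  {
    char first = Character.toLowerCase(key.toString().charAt(0));
    int partition = (first >= 'a' && first <= 'm') ? 0 : 1;
    return partition % numPartitions;   // stay in range even if fewer reducers are configured
  }
}

// In the driver:
//   job.setPartitionerClass(FirstLetterPartitioner.class);
//   job.setNumReduceTasks(2);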
DIFFERENCE BETWEEN COMBINERS AND PARTITIONERS
• Combiners can be viewed as mini-reducers in the map phase. They perform a local
reduce on the mapper results before those results are distributed further. Once
the combiner has run, its output is passed on to the Reducer for further work.
• Partitioners come into the picture when we are working with more than one
Reducer. The partitioner decides which reducer is responsible for a particular
key: it takes the mapper result (or the combiner result, if a combiner is used)
and sends it to the responsible reducer.
MAPREDUCE FEATURES
• MapReduce is a popular programming model
for processing large datasets in parallel across
a distributed cluster of computers.
• Developed by Google, it has become an
essential component of the Hadoop
ecosystem, enabling efficient data processing
and analysis.
• MapReduce Fundamentals
• MapReduce is designed to address the challenges associated
with processing massive amounts of data by breaking the
problem into smaller, more manageable tasks. It consists of two
primary functions: Map and Reduce, which work together to
process and analyze data.
• The Map Function
• The Map function takes input data and processes it into
intermediate key-value pairs. It applies a user-defined function
to each input record, generating output pairs that are then
sorted and grouped by key.
• The Reduce Function
• The Reduce function processes the intermediate key-value pairs
generated by the Map function. It aggregates, filters, or
combines the data based on a user-defined function, generating
the final output.
