Map Reduce Design and Execution Framework Part 1
MAP REDUCE
■ Idea:
– Bring computation close to the data
– Provide unified programming model to simplify parallelism
– Store data redundantly for reliability
[Figure: data chunks (C2, C3, C5, …, D0, D1) stored redundantly across Machine 1 through Machine N]
MAP REDUCE – KEY IDEA
■ Count the number of times each distinct word appears in the file
■ Sample application:
– Analyze web server logs to find popular URLs
MapReduce Example - WordCount
map(key, value):
// key: document name; value: text of the document
for each word w in value:
emit(w, 1)
reduce(key, values):
// key: a word; values: an iterator over counts
result = 0
for each count v in values:
result += v
emit(key, result)
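The pseudocode above can be simulated end to end in plain Python. The sketch below is an illustration, not a real MapReduce runtime: the helper names (`map_fn`, `reduce_fn`, `run_mapreduce`) are hypothetical, and the "shuffle" phase that groups intermediate pairs by key is done here with an in-memory dictionary.

```python
from collections import defaultdict

def map_fn(doc_name, text):
    # emit (word, 1) for every word in the document
    for word in text.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # sum the partial counts for one word
    yield (word, sum(counts))

def run_mapreduce(documents, map_fn, reduce_fn):
    # Shuffle phase: group all intermediate values by key
    groups = defaultdict(list)
    for doc_name, text in documents.items():
        for key, value in map_fn(doc_name, text):
            groups[key].append(value)
    # Reduce phase: one reduce call per distinct key
    results = {}
    for key, values in groups.items():
        for out_key, out_value in reduce_fn(key, values):
            results[out_key] = out_value
    return results

docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
print(run_mapreduce(docs, map_fn, reduce_fn))
# {'the': 2, 'quick': 1, 'brown': 1, 'fox': 1, 'lazy': 1, 'dog': 1}
```

In a real framework the shuffle is distributed: each intermediate (key, value) pair is routed over the network to the reducer responsible for that key.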
Word Count Using MapReduce
from mrjob.job import MRJob
class WordCount(MRJob):

    def mapper(self, _, line):
        # key is ignored; line is one line of the input text
        for word in line.split():
            yield (word, 1)

    def reducer(self, word, counts):
        # counts is an iterator over the partial counts for word
        yield (word, sum(counts))

if __name__ == '__main__':
    WordCount.run()
https://mrjob.readthedocs.io/en/latest/
MapReduce “inverted index” example
• Map:
▫ parses a document and emits
<word, docId> pairs
• Reduce:
▫ takes all pairs for a given word,
sorts the docId values, and emits
a <word,list(docId)> pair
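The inverted-index map and reduce steps above can be sketched in plain Python. This is a minimal in-memory illustration, assuming hypothetical helper names (`map_fn`, `reduce_fn`); the grouping dictionary stands in for the framework's shuffle phase.

```python
from collections import defaultdict

def map_fn(doc_id, text):
    # emit <word, doc_id> for each word in the document
    for word in text.split():
        yield (word, doc_id)

def reduce_fn(word, doc_ids):
    # sort (and deduplicate) the document ids for this word
    yield (word, sorted(set(doc_ids)))

docs = {"d1": "map reduce", "d2": "reduce example"}

# Shuffle: group emitted doc ids by word
groups = defaultdict(list)
for doc_id, text in docs.items():
    for word, d in map_fn(doc_id, text):
        groups[word].append(d)

# Reduce: one <word, list(docId)> pair per distinct word
inverted = dict(pair for word, ids in groups.items()
                for pair in reduce_fn(word, ids))
print(inverted)
# {'map': ['d1'], 'reduce': ['d1', 'd2'], 'example': ['d2']}
```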
Example: Distributed grep
■ Find all occurrences of a given pattern in a very large set of files.
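This pattern-search task (often called distributed grep) maps naturally onto the framework: the map function emits a line whenever it matches the pattern, and the reduce function simply passes the matches through. The sketch below is an illustration with hypothetical names (`PATTERN`, `map_fn`, and the sample log data are all made up).

```python
import re

PATTERN = re.compile(r"error")  # hypothetical pattern to search for

def map_fn(file_name, line):
    # emit <file_name, line> for every line that matches the pattern
    if PATTERN.search(line):
        yield (file_name, line)

files = {
    "log1": ["ok", "error: disk full", "ok"],
    "log2": ["error: timeout"],
}

# Shuffle: group matching lines by file; reduce is the identity here
matches = {}
for name, lines in files.items():
    for line in lines:
        for key, match in map_fn(name, line):
            matches.setdefault(key, []).append(match)
print(matches)
# {'log1': ['error: disk full'], 'log2': ['error: timeout']}
```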
■ Input and output are stored on the distributed file system (DFS).
■ The output is often an input to another MapReduce task.
■ The scheduler tries to schedule each map task close to the physical storage location of its input data.
■ Intermediate results are stored on the local file system of the Map and Reduce workers to avoid network traffic.