SlideShare a Scribd company logo
Lecture 2 – MapReduce CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.http://creativecommons.org/licenses/by/2.5
Outline MapReduce: Programming Model MapReduce Examples A Brief History  MapReduce Execution Overview Hadoop MapReduce Resources
MapReduce “ A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs.” Dean and Ghermawat, “MapReduce: Simplified Data Processing on Large Clusters”,  Google Inc.
MapReduce More simply, MapReduce is: A parallel programming model and associated implementation.
Programming Model Description The mental model the programmer has about the detailed execution of their application. Purpose Improve programmer productivity Evaluation Expressibility Simplicity Performance
Programming Models von Neumann model Execute a stream of instructions (machine code) Instructions can specify Arithmetic operations Data addresses Next instruction to execute Complexity Track billions of data locations and millions of instructions Manage with: Modular design High-level programming languages (isomorphic)
Programming Models Parallel Programming Models Message passing Independent tasks encapsulating local data Tasks interact by exchanging messages Shared memory Tasks share a common address space Tasks interact by reading and writing this space asynchronously Data parallelization Tasks execute a sequence of independent operations Data usually evenly partitioned across tasks Also referred to as “Embarrassingly parallel”
MapReduce: Programming Model Process data using special  map () and  reduce () functions The map() function is called on every item in the input and emits a series of intermediate key/value pairs All values associated with a given key are grouped together The reduce() function is called on every unique key, and its value list, and emits a value that is added to the output
MapReduce: Programming Model How now Brown cow How does It work now brown 1 cow 1 does 1 How 2 it 1 now 2 work 1 M M M M R R <How,1> <now,1> <brown,1> <cow,1> <How,1> <does,1> <it,1> <work,1> <now,1> <How,1 1> <now,1 1> <brown,1> <cow,1> <does,1> <it,1> <work,1> Input Output Map Reduce MapReduce Framework
MapReduce: Programming Model More formally, Map(k1,v1) --> list(k2,v2) Reduce(k2, list(v2)) --> list(v2)
MapReduce Runtime System Partitions input data Schedules execution across a set of machines Handles machine failure Manages interprocess communication
MapReduce Benefits Greatly reduces parallel programming complexity Reduces synchronization complexity Automatically partitions data Provides failure transparency Handles load balancing Practical Approximately 1000 Google MapReduce jobs run everyday.
MapReduce Examples Word frequency Map doc Reduce <word,3> <word,1> <word,1> <word,1> Runtime System <word,1,1,1>
MapReduce Examples Distributed grep Map function emits <word, line_number> if word matches search criteria Reduce function is the identity function URL access frequency Map function processes web logs, emits <url, 1> Reduce function sums values and emits <url, total>
A Brief History Functional programming (e.g., Lisp) map() function Applies a function to each value of a sequence reduce() function Combines all elements of a sequence using a binary operator
MapReduce Execution Overview The user program, via the MapReduce library, shards the input data User Program Input Data Shard 0 Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 * Shards are typically 16-64mb in size
MapReduce Execution Overview The user program creates process copies distributed on a machine cluster. One copy will be the “Master” and the others will be worker threads. User Program Master Workers Workers Workers Workers Workers
MapReduce Resources The master distributes M map and  R reduce tasks to idle workers. M == number of shards R == the intermediate key space is divided into R parts Master Idle Worker Message(Do_map_task)
MapReduce Resources Each map-task worker reads assigned input shard and outputs intermediate key/value pairs. Output buffered in RAM. Map worker Shard 0 Key/value pairs
MapReduce Execution Overview Each worker flushes intermediate values, partitioned into R regions, to disk and notifies the Master process.  Master Map worker Disk locations Local Storage
MapReduce Execution Overview Master process gives disk locations to an available reduce-task worker who reads all associated intermediate data.  Master Reduce worker Disk locations remote Storage
MapReduce Execution Overview Each reduce-task worker sorts its intermediate data. Calls the reduce function, passing in unique keys and associated key values. Reduce function output appended to reduce-task’s partition output file. Reduce worker Sorts data Partition Output file
MapReduce Execution Overview Master process wakes up user process when all tasks have completed.  Output contained in R output files. wakeup User Program Master Output files
MapReduce Execution Overview Fault Tolerance Master process periodically pings workers Map-task failure Re-execute All output was stored locally Reduce-task failure Only re-execute partially completed tasks All output stored in the global file system
Hadoop Open source MapReduce implementation http: //hadoop .apache.org/core/index.html Uses  Hadoop Distributed Filesytem (HDFS) http: //hadoop .apache. org/core/docs/current/hdfs_design .html Java ssh
References Introduction to Parallel Programming and MapReduce, Google Code University http://code.google.com/edu/parallel/mapreduce-tutorial.html Distributed Systems http://code. google . com/edu/parallel/index .html MapReduce: Simplified Data Processing on Large Clusters http://labs. google . com/papers/mapreduce .html Hadoop http: //hadoop .apache.org/core/
Ad

More Related Content

What's hot (20)

Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop
Rajesh Ananda Kumar
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
SujaAldrin
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
Cloudera, Inc.
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
GauravBiswas9
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Apex
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
Hadoop Ecosystem
Hadoop EcosystemHadoop Ecosystem
Hadoop Ecosystem
Sandip Darwade
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
Manish Borkar
 
Hadoop Tutorial For Beginners
Hadoop Tutorial For BeginnersHadoop Tutorial For Beginners
Hadoop Tutorial For Beginners
Dataflair Web Services Pvt Ltd
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
Chirag Ahuja
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Simplilearn
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
ateeq ateeq
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
PPT on Hadoop
PPT on HadoopPPT on Hadoop
PPT on Hadoop
Shubham Parmar
 
Cloud Computing: Hadoop
Cloud Computing: HadoopCloud Computing: Hadoop
Cloud Computing: Hadoop
darugar
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 
Hadoop And Their Ecosystem ppt
 Hadoop And Their Ecosystem ppt Hadoop And Their Ecosystem ppt
Hadoop And Their Ecosystem ppt
sunera pathan
 
Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop Anatomy of classic map reduce in hadoop
Anatomy of classic map reduce in hadoop
Rajesh Ananda Kumar
 
Mining Data Streams
Mining Data StreamsMining Data Streams
Mining Data Streams
SujaAldrin
 
Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2Introduction to YARN and MapReduce 2
Introduction to YARN and MapReduce 2
Cloudera, Inc.
 
Map reduce in BIG DATA
Map reduce in BIG DATAMap reduce in BIG DATA
Map reduce in BIG DATA
GauravBiswas9
 
Hadoop Overview & Architecture
Hadoop Overview & Architecture  Hadoop Overview & Architecture
Hadoop Overview & Architecture
EMC
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Apache Apex
 
Hadoop File system (HDFS)
Hadoop File system (HDFS)Hadoop File system (HDFS)
Hadoop File system (HDFS)
Prashant Gupta
 
Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component Introduction to Hadoop and Hadoop component
Introduction to Hadoop and Hadoop component
rebeccatho
 
Introduction to Hadoop Technology
Introduction to Hadoop TechnologyIntroduction to Hadoop Technology
Introduction to Hadoop Technology
Manish Borkar
 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
Chirag Ahuja
 
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Hadoop YARN | Hadoop YARN Architecture | Hadoop YARN Tutorial | Hadoop Tutori...
Simplilearn
 
Seminar Presentation Hadoop
Seminar Presentation HadoopSeminar Presentation Hadoop
Seminar Presentation Hadoop
Varun Narang
 
Map reduce presentation
Map reduce presentationMap reduce presentation
Map reduce presentation
ateeq ateeq
 
Big Data Analytics with Hadoop
Big Data Analytics with HadoopBig Data Analytics with Hadoop
Big Data Analytics with Hadoop
Philippe Julio
 
Cloud Computing: Hadoop
Cloud Computing: HadoopCloud Computing: Hadoop
Cloud Computing: Hadoop
darugar
 
HADOOP TECHNOLOGY ppt
HADOOP  TECHNOLOGY pptHADOOP  TECHNOLOGY ppt
HADOOP TECHNOLOGY ppt
sravya raju
 

Similar to Map Reduce (20)

MapReduce
MapReduceMapReduce
MapReduce
ahmedelmorsy89
 
Big Data- process of map reducing MapReduce- .ppt
Big Data- process of map reducing MapReduce- .pptBig Data- process of map reducing MapReduce- .ppt
Big Data- process of map reducing MapReduce- .ppt
sunilsoni446112
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
softwarequery
 
MapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory PresentationMapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory Presentation
ssuserb91a20
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
M Baddar
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
HARIKRISHNANU13
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
Map reduce
Map reduceMap reduce
Map reduce
Shahbaz Sidhu
 
Mapreduce Osdi04
Mapreduce Osdi04Mapreduce Osdi04
Mapreduce Osdi04
Jyotirmoy Dey
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Lecture 1 mapreduce
Lecture 1  mapreduceLecture 1  mapreduce
Lecture 1 mapreduce
Shubham Bansal
 
map Reduce.pptx
map Reduce.pptxmap Reduce.pptx
map Reduce.pptx
habibaabderrahim1
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
Urvashi Kataria
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
Abhishek Singh
 
The google MapReduce
The google MapReduceThe google MapReduce
The google MapReduce
Romain Jacotin
 
Hadoop File System was developed using distributed file system design.
Hadoop File System was developed using distributed file system design.Hadoop File System was developed using distributed file system design.
Hadoop File System was developed using distributed file system design.
JSujatha2
 
E031201032036
E031201032036E031201032036
E031201032036
ijceronline
 
MapReduce-Notes.pdf
MapReduce-Notes.pdfMapReduce-Notes.pdf
MapReduce-Notes.pdf
AnilVijayagiri
 
Map reduce
Map reduceMap reduce
Map reduce
대호 김
 
Big Data- process of map reducing MapReduce- .ppt
Big Data- process of map reducing MapReduce- .pptBig Data- process of map reducing MapReduce- .ppt
Big Data- process of map reducing MapReduce- .ppt
sunilsoni446112
 
Introduction To Map Reduce
Introduction To Map ReduceIntroduction To Map Reduce
Introduction To Map Reduce
rantav
 
Hadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.comHadoop interview questions - Softwarequery.com
Hadoop interview questions - Softwarequery.com
softwarequery
 
MapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory PresentationMapReduce and Hadoop Introcuctory Presentation
MapReduce and Hadoop Introcuctory Presentation
ssuserb91a20
 
Introduction to map reduce
Introduction to map reduceIntroduction to map reduce
Introduction to map reduce
M Baddar
 
MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
HARIKRISHNANU13
 
2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)2004 map reduce simplied data processing on large clusters (mapreduce)
2004 map reduce simplied data processing on large clusters (mapreduce)
anh tuan
 
Meethadoop
MeethadoopMeethadoop
Meethadoop
IIIT-H
 
Report Hadoop Map Reduce
Report Hadoop Map ReduceReport Hadoop Map Reduce
Report Hadoop Map Reduce
Urvashi Kataria
 
Mapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large ClustersMapreduce - Simplified Data Processing on Large Clusters
Mapreduce - Simplified Data Processing on Large Clusters
Abhishek Singh
 
Hadoop File System was developed using distributed file system design.
Hadoop File System was developed using distributed file system design.Hadoop File System was developed using distributed file system design.
Hadoop File System was developed using distributed file system design.
JSujatha2
 
Ad

More from Sri Prasanna (20)

Qr codes para tech radar
Qr codes para tech radarQr codes para tech radar
Qr codes para tech radar
Sri Prasanna
 
Qr codes para tech radar 2
Qr codes para tech radar 2Qr codes para tech radar 2
Qr codes para tech radar 2
Sri Prasanna
 
assds
assdsassds
assds
Sri Prasanna
 
assds
assdsassds
assds
Sri Prasanna
 
asdsa
asdsaasdsa
asdsa
Sri Prasanna
 
dsd
dsddsd
dsd
Sri Prasanna
 
About stacks
About stacksAbout stacks
About stacks
Sri Prasanna
 
About Stacks
About  StacksAbout  Stacks
About Stacks
Sri Prasanna
 
About Stacks
About  StacksAbout  Stacks
About Stacks
Sri Prasanna
 
About Stacks
About  StacksAbout  Stacks
About Stacks
Sri Prasanna
 
About Stacks
About  StacksAbout  Stacks
About Stacks
Sri Prasanna
 
About Stacks
About  StacksAbout  Stacks
About Stacks
Sri Prasanna
 
About Stacks
About StacksAbout Stacks
About Stacks
Sri Prasanna
 
About Stacks
About StacksAbout Stacks
About Stacks
Sri Prasanna
 
Network and distributed systems
Network and distributed systemsNetwork and distributed systems
Network and distributed systems
Sri Prasanna
 
Introduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clustersIntroduction & Parellelization on large scale clusters
Introduction & Parellelization on large scale clusters
Sri Prasanna
 
Mapreduce: Theory and implementation
Mapreduce: Theory and implementationMapreduce: Theory and implementation
Mapreduce: Theory and implementation
Sri Prasanna
 
Other distributed systems
Other distributed systemsOther distributed systems
Other distributed systems
Sri Prasanna
 
Ad

Recently uploaded (20)

Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 
Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.Greenhouse_Monitoring_Presentation.pptx.
Greenhouse_Monitoring_Presentation.pptx.
hpbmnnxrvb
 
Build Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For DevsBuild Your Own Copilot & Agents For Devs
Build Your Own Copilot & Agents For Devs
Brian McKeiver
 
Cyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of securityCyber Awareness overview for 2025 month of security
Cyber Awareness overview for 2025 month of security
riccardosl1
 
Big Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur MorganBig Data Analytics Quick Research Guide by Arthur Morgan
Big Data Analytics Quick Research Guide by Arthur Morgan
Arthur Morgan
 
Mobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi ArabiaMobile App Development Company in Saudi Arabia
Mobile App Development Company in Saudi Arabia
Steve Jonas
 
TrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business ConsultingTrsLabs - Fintech Product & Business Consulting
TrsLabs - Fintech Product & Business Consulting
Trs Labs
 
How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?How Can I use the AI Hype in my Business Context?
How Can I use the AI Hype in my Business Context?
Daniel Lehner
 
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager APIUiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPath Community Berlin: Orchestrator API, Swagger, and Test Manager API
UiPathCommunity
 
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep DiveDesigning Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
Designing Low-Latency Systems with Rust and ScyllaDB: An Architectural Deep Dive
ScyllaDB
 
Generative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in BusinessGenerative Artificial Intelligence (GenAI) in Business
Generative Artificial Intelligence (GenAI) in Business
Dr. Tathagat Varma
 
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Enhancing ICU Intelligence: How Our Functional Testing Enabled a Healthcare I...
Impelsys Inc.
 
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
IEDM 2024 Tutorial2_Advances in CMOS Technologies and Future Directions for C...
organizerofv
 
Quantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur MorganQuantum Computing Quick Research Guide by Arthur Morgan
Quantum Computing Quick Research Guide by Arthur Morgan
Arthur Morgan
 
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-UmgebungenHCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
HCL Nomad Web – Best Practices und Verwaltung von Multiuser-Umgebungen
panagenda
 
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
#StandardsGoals for 2025: Standards & certification roundup - Tech Forum 2025
BookNet Canada
 
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
AI Changes Everything – Talk at Cardiff Metropolitan University, 29th April 2...
Alan Dix
 
Heap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and DeletionHeap, Types of Heap, Insertion and Deletion
Heap, Types of Heap, Insertion and Deletion
Jaydeep Kale
 
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdfThe Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
The Evolution of Meme Coins A New Era for Digital Currency ppt.pdf
Abi john
 
Linux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdfLinux Professional Institute LPIC-1 Exam.pdf
Linux Professional Institute LPIC-1 Exam.pdf
RHCSA Guru
 
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
AI EngineHost Review: Revolutionary USA Datacenter-Based Hosting with NVIDIA ...
SOFTTECHHUB
 

Map Reduce

  • 1. Lecture 2 – MapReduce CPE 458 – Parallel Programming, Spring 2009 Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.http://creativecommons.org/licenses/by/2.5
  • 2. Outline MapReduce: Programming Model MapReduce Examples A Brief History MapReduce Execution Overview Hadoop MapReduce Resources
  • 3. MapReduce “ A simple and powerful interface that enables automatic parallelization and distribution of large-scale computations, combined with an implementation of this interface that achieves high performance on large clusters of commodity PCs.” Dean and Ghermawat, “MapReduce: Simplified Data Processing on Large Clusters”, Google Inc.
  • 4. MapReduce More simply, MapReduce is: A parallel programming model and associated implementation.
  • 5. Programming Model Description The mental model the programmer has about the detailed execution of their application. Purpose Improve programmer productivity Evaluation Expressibility Simplicity Performance
  • 6. Programming Models von Neumann model Execute a stream of instructions (machine code) Instructions can specify Arithmetic operations Data addresses Next instruction to execute Complexity Track billions of data locations and millions of instructions Manage with: Modular design High-level programming languages (isomorphic)
  • 7. Programming Models Parallel Programming Models Message passing Independent tasks encapsulating local data Tasks interact by exchanging messages Shared memory Tasks share a common address space Tasks interact by reading and writing this space asynchronously Data parallelization Tasks execute a sequence of independent operations Data usually evenly partitioned across tasks Also referred to as “Embarrassingly parallel”
  • 8. MapReduce: Programming Model Process data using special map () and reduce () functions The map() function is called on every item in the input and emits a series of intermediate key/value pairs All values associated with a given key are grouped together The reduce() function is called on every unique key, and its value list, and emits a value that is added to the output
  • 9. MapReduce: Programming Model How now Brown cow How does It work now brown 1 cow 1 does 1 How 2 it 1 now 2 work 1 M M M M R R <How,1> <now,1> <brown,1> <cow,1> <How,1> <does,1> <it,1> <work,1> <now,1> <How,1 1> <now,1 1> <brown,1> <cow,1> <does,1> <it,1> <work,1> Input Output Map Reduce MapReduce Framework
  • 10. MapReduce: Programming Model More formally, Map(k1,v1) --> list(k2,v2) Reduce(k2, list(v2)) --> list(v2)
  • 11. MapReduce Runtime System Partitions input data Schedules execution across a set of machines Handles machine failure Manages interprocess communication
  • 12. MapReduce Benefits Greatly reduces parallel programming complexity Reduces synchronization complexity Automatically partitions data Provides failure transparency Handles load balancing Practical Approximately 1000 Google MapReduce jobs run everyday.
  • 13. MapReduce Examples Word frequency Map doc Reduce <word,3> <word,1> <word,1> <word,1> Runtime System <word,1,1,1>
  • 14. MapReduce Examples Distributed grep Map function emits <word, line_number> if word matches search criteria Reduce function is the identity function URL access frequency Map function processes web logs, emits <url, 1> Reduce function sums values and emits <url, total>
  • 15. A Brief History Functional programming (e.g., Lisp) map() function Applies a function to each value of a sequence reduce() function Combines all elements of a sequence using a binary operator
  • 16. MapReduce Execution Overview The user program, via the MapReduce library, shards the input data User Program Input Data Shard 0 Shard 1 Shard 2 Shard 3 Shard 4 Shard 5 Shard 6 * Shards are typically 16-64mb in size
  • 17. MapReduce Execution Overview The user program creates process copies distributed on a machine cluster. One copy will be the “Master” and the others will be worker threads. User Program Master Workers Workers Workers Workers Workers
  • 18. MapReduce Resources The master distributes M map and R reduce tasks to idle workers. M == number of shards R == the intermediate key space is divided into R parts Master Idle Worker Message(Do_map_task)
  • 19. MapReduce Resources Each map-task worker reads assigned input shard and outputs intermediate key/value pairs. Output buffered in RAM. Map worker Shard 0 Key/value pairs
  • 20. MapReduce Execution Overview Each worker flushes intermediate values, partitioned into R regions, to disk and notifies the Master process. Master Map worker Disk locations Local Storage
  • 21. MapReduce Execution Overview Master process gives disk locations to an available reduce-task worker who reads all associated intermediate data. Master Reduce worker Disk locations remote Storage
  • 22. MapReduce Execution Overview Each reduce-task worker sorts its intermediate data. Calls the reduce function, passing in unique keys and associated key values. Reduce function output appended to reduce-task’s partition output file. Reduce worker Sorts data Partition Output file
  • 23. MapReduce Execution Overview Master process wakes up user process when all tasks have completed. Output contained in R output files. wakeup User Program Master Output files
  • 24. MapReduce Execution Overview Fault Tolerance Master process periodically pings workers Map-task failure Re-execute All output was stored locally Reduce-task failure Only re-execute partially completed tasks All output stored in the global file system
  • 25. Hadoop Open source MapReduce implementation http: //hadoop .apache.org/core/index.html Uses Hadoop Distributed Filesytem (HDFS) http: //hadoop .apache. org/core/docs/current/hdfs_design .html Java ssh
  • 26. References Introduction to Parallel Programming and MapReduce, Google Code University http://code.google.com/edu/parallel/mapreduce-tutorial.html Distributed Systems http://code. google . com/edu/parallel/index .html MapReduce: Simplified Data Processing on Large Clusters http://labs. google . com/papers/mapreduce .html Hadoop http: //hadoop .apache.org/core/