Map Reduce Exercise

This document poses four problems on data analysis with MapReduce: 1) implement the DISTINCT operator, which returns the unique values of a column, in one MapReduce stage; 2) implement a SHUFFLE operator that randomly reorders a dataset; 3) calculate the communication cost of a DISTINCT query on one column, restricted to rows where another column meets a condition; 4) design a MapReduce job that computes the average sales price per supplier from product sales records. It also asks four true/false questions about MapReduce properties.

Uploaded by Ashwin Ajmera

Cloud Computing for Data Analysis

Assignment – 1
1. The DISTINCT(X) operator returns only the distinct (unique) values of a data type (or
column) X across the entire dataset.

As an example, for the following table A:

A.ID   A.ZIPCODE   A.AGE
1      12345       30
2      12345       40
3      78910       10
4      78910       10
5      78910       20

DISTINCT(A.ID) = (1, 2, 3, 4, 5)
DISTINCT(A.ZIPCODE) = (12345, 78910)
DISTINCT(A.AGE) = (30, 40, 10, 20)

Implement the DISTINCT(X) operator using Map-Reduce. Provide the algorithm pseudocode.
You should use only one Map-Reduce stage, i.e. the algorithm should make only one pass
over the data.
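
One possible single-stage approach, offered as a sketch rather than the expected answer: the mapper emits the value of column X as the key, and the reducer emits each key once. The driver `run_mapreduce` below is a hypothetical single-process stand-in for the framework's grouping step, and the function names are illustrative assumptions.

```python
from collections import defaultdict

# Table A from the example above.
TABLE_A = [
    {"ID": 1, "ZIPCODE": 12345, "AGE": 30},
    {"ID": 2, "ZIPCODE": 12345, "AGE": 40},
    {"ID": 3, "ZIPCODE": 78910, "AGE": 10},
    {"ID": 4, "ZIPCODE": 78910, "AGE": 10},
    {"ID": 5, "ZIPCODE": 78910, "AGE": 20},
]


def map_distinct(record, column):
    # Emit the column value as the key; the value carries no information.
    yield record[column], None


def reduce_distinct(key, values):
    # All duplicates of a value reach the same reduce call, so emit the key once.
    yield key


def run_mapreduce(records, mapper, reducer):
    # Hypothetical in-memory driver: groups mapper output by key, then reduces.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    output = []
    for key, values in groups.items():
        output.extend(reducer(key, values))
    return output


def distinct(records, column):
    return run_mapreduce(records, lambda r: map_distinct(r, column), reduce_distinct)


print(sorted(distinct(TABLE_A, "ZIPCODE")))
```

The data is read once by the mappers, which matches the one-pass requirement under the assumption that grouping by key is done by the framework.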

2. The SHUFFLE operator takes a dataset as input and randomly re-orders it.

Hint: Assume that we have a function rand(m) that returns a random integer in the range
[1, m] (both ends inclusive).
Implement the SHUFFLE operator using Map-Reduce. Provide the algorithm pseudocode.
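
A minimal sketch of one way to do this, assuming the framework delivers keys to reducers in sorted order: the mapper tags each record with a random key from rand(m), and the reducer emits the records for each key. Records that draw the same key are reordered inside the reducer with Python's `random.shuffle`, which is a convenience swapped in here and not part of the hint. The driver `shuffle_dataset` and the `seed` parameter are illustrative assumptions.

```python
import random
from collections import defaultdict


def rand(m, rng):
    # Random integer in [1, m], both ends inclusive, as in the hint.
    return rng.randint(1, m)


def map_shuffle(record, m, rng):
    # Tag each record with a random key.
    yield rand(m, rng), record


def reduce_shuffle(key, values, rng):
    # Records that drew the same key are reordered randomly before output.
    values = list(values)
    rng.shuffle(values)
    yield from values


def shuffle_dataset(records, m, seed=None):
    # Hypothetical in-memory driver standing in for the framework.
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in records:
        for key, value in map_shuffle(record, m, rng):
            groups[key].append(value)
    output = []
    for key in sorted(groups):  # assumed: reducers see keys in sorted order
        output.extend(reduce_shuffle(key, groups[key], rng))
    return output


print(shuffle_dataset(list(range(10)), m=5, seed=42))
```

A larger m reduces the number of records that share a key, which spreads the work more evenly across reducers.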

3. What is the communication cost (in terms of total data flow on the network between mappers and
reducers) for the following query using Map-Reduce:

Get DISTINCT(A.ID) from A WHERE A.AGE > 30

The dataset A has 1000M rows, and 400M of these rows have A.AGE <= 30. DISTINCT(A.ID)
has 1M elements. A tuple emitted from any mapper is 1 KB in size.
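
The arithmetic can be sketched as follows, under assumptions the problem does not state: the filter is applied in the mapper, each surviving row produces exactly one emitted tuple, no combiner is used, and 1 GB is taken as 10^6 KB. The function name `shuffle_cost_kb` is illustrative.

```python
def shuffle_cost_kb(total_rows, filtered_out_rows, tuple_size_kb):
    # Rows with A.AGE > 30 are the rows left after removing those with A.AGE <= 30.
    surviving_rows = total_rows - filtered_out_rows
    # Assumed: one tuple per surviving row crosses the network, no combiner.
    return surviving_rows * tuple_size_kb


cost_kb = shuffle_cost_kb(1_000_000_000, 400_000_000, 1)
cost_gb = cost_kb / 1_000_000  # decimal units: 1 GB = 10^6 KB
print(f"{cost_kb:,} KB (about {cost_gb:.0f} GB)")
# prints "600,000,000 KB (about 600 GB)"
```

Under these assumptions the 1M distinct IDs do not affect the result. With a combiner, each mapper would send at most one tuple per distinct ID it sees, so the cost could be lower, but that depends on the number of mappers, which the problem does not give.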

4. Consider the checkout counters at a large supermarket chain. For each item sold, a counter
generates a record of the form [ProductId, Supplier, Price]. Here, ProductId is the unique
identifier of a product, Supplier is the name of the product's supplier, and Price is the sales
price of the item. Assume that the supermarket chain has accumulated many terabytes of data
over a period of several months.

The CEO wants a list of suppliers, listing for each supplier the average sales price of items
provided by the supplier. How would you organize the computation using the Map-Reduce
computation model?
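
One way to organize this, shown as a sketch under stated assumptions: the mapper emits (Supplier, (Price, 1)), an optional combiner adds up prices and counts per mapper, and the reducer divides the total price by the total count. Sums and counts are carried instead of partial averages because averages of averages are not correct in general. The driver `average_price_by_supplier`, the sample records, and the use of one inner list per input split are illustrative assumptions.

```python
from collections import defaultdict


def map_sale(record):
    # record has the form [ProductId, Supplier, Price].
    product_id, supplier, price = record
    yield supplier, (price, 1)


def combine(supplier, pairs):
    # Per-mapper pre-aggregation: partial sum of prices and partial count.
    total = sum(price for price, _ in pairs)
    count = sum(n for _, n in pairs)
    yield supplier, (total, count)


def reduce_average(supplier, pairs):
    # Merge the partial sums and counts, then compute the average.
    total = sum(price for price, _ in pairs)
    count = sum(n for _, n in pairs)
    yield supplier, total / count


def average_price_by_supplier(splits):
    # Hypothetical in-memory driver; each inner list stands for one mapper's split.
    shuffled = defaultdict(list)
    for split in splits:
        local = defaultdict(list)
        for record in split:
            for key, value in map_sale(record):
                local[key].append(value)
        for key, values in local.items():
            for out_key, out_value in combine(key, values):
                shuffled[out_key].append(out_value)
    result = {}
    for key, pairs in shuffled.items():
        for supplier, average in reduce_average(key, pairs):
            result[supplier] = average
    return result


# Made-up sample records for illustration only.
SPLITS = [
    [("p1", "Acme", 2.0), ("p2", "Acme", 4.0)],
    [("p3", "Acme", 6.0), ("p4", "Bolt", 5.0)],
]
print(average_price_by_supplier(SPLITS))
```

The combiner is optional for correctness; it only reduces the amount of data sent from mappers to reducers.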

***************************************************************************

For the following questions give short explanations of your answers.

5. True or False: Each mapper/reducer must generate the same number of output key/value pairs
as it receives on the input.
6. True or False: The output type of keys/values of mappers/reducers must be of the same type as
their input.
7. True or False: The input to reducers is grouped by key.
8. True or False: It is possible to start reducers while some mappers are still running.
