MapReduce Advanced
MapReduce in Heterogeneous Environments
Eva Kalyvianaki
[email protected]
Contents
Motivation: MapReduce is becoming popular
Open-source implementation, Hadoop, used by Yahoo!,
Facebook, Last.fm, …
Scale: 20 PB/day at Google, O(10,000) nodes at Yahoo, 3000
jobs/day at Facebook
Stragglers in MapReduce
A straggler is a node that performs poorly or not at all.
The original MapReduce mitigation approach was:
Run a speculative copy of the task (called a backup task)
Whichever of the original or the backup copy finishes first is used
Modern Clusters: Heterogeneity is the norm
Cloud computing providers like Amazon’s Elastic Compute
Cloud (EC2) provide cheap on-demand computing:
Price: 2 cents / VM / hour
Scale: thousands of VMs
Caveat: less control of performance
MapReduce Revised
MapReduce Implementation, Hadoop
Scheduling in MapReduce
When a node has an empty task slot, Hadoop chooses a task from three categories, in the following priority order (a small sketch follows the list):
1. Failed tasks are given the highest priority.
2. Non-running (unscheduled) tasks. For maps, tasks with data local to the node are chosen first.
3. Otherwise, Hadoop looks for a speculative task to run.
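To make the priority order concrete, here is a minimal sketch in Python; the task representation, the has_local_data and looks_slow helpers, and the list arguments are illustrative assumptions, not Hadoop's actual (Java) scheduler code.

    # Sketch of Hadoop's task-selection priority for a node with a free slot.
    # Hypothetical data model: a task is a dict with "hosts" (nodes holding its
    # input) and "progress" in [0, 1].

    def has_local_data(task, node):
        # A map task is node-local if the node holds a replica of its input split.
        return node in task.get("hosts", [])

    def looks_slow(task):
        # Placeholder straggler test; the real criteria are discussed on the
        # following slides (progress scores and, in LATE, estimated time left).
        return task["progress"] < 0.5

    def choose_task(node, failed, unscheduled_maps, unscheduled_reduces, running):
        # 1. Failed tasks get the highest priority.
        if failed:
            return failed[0]
        # 2. Unscheduled tasks; for maps, prefer tasks with data local to the node.
        local_maps = [t for t in unscheduled_maps if has_local_data(t, node)]
        for pool in (local_maps, unscheduled_maps, unscheduled_reduces):
            if pool:
                return pool[0]
        # 3. Otherwise, look for a running task worth backing up speculatively.
        slow = [t for t in running if looks_slow(t)]
        return slow[0] if slow else None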
Deciding on Speculative Tasks
Which task to execute speculatively?
Hadoop monitors task progress using a progress score: a number between 0 and 1 (sketched in code below)
For mappers: the score is the fraction of input data read
For reducers: the execution is divided into three equal phases, each worth 1/3 of the score:
Copy phase: the percentage of map outputs that have been copied
Sort phase: map outputs are sorted by key; the percentage of data merged
Reduce phase: the percentage of data passed through the reduce function
Example: a task halfway through the copy phase has progress score = 1/2 × 1/3 = 1/6
Example: a task halfway through the reduce phase has progress score = 1/3 + 1/3 + 1/2 × 1/3 = 5/6
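As a small sketch of how these fractions combine into a single score (the function and phase names are illustrative, not Hadoop's code):

    def map_progress(fraction_input_read):
        # A mapper's progress score is simply the fraction of its input read.
        return fraction_input_read

    def reduce_progress(phase, fraction_of_phase_done):
        # A reducer's score: three equal phases, each worth 1/3 of the score.
        completed_phases = {"copy": 0, "sort": 1, "reduce": 2}[phase]
        return completed_phases / 3 + fraction_of_phase_done / 3

    # The two examples from the slide:
    assert abs(reduce_progress("copy", 0.5) - 1/6) < 1e-9     # = 1/6
    assert abs(reduce_progress("reduce", 0.5) - 5/6) < 1e-9   # = 5/6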
Deciding on Speculative Tasks (cont'd)
Scheduler’s Assumptions
1. Nodes can perform work at roughly the same rate
2. Tasks progress at a constant rate over time
3. There is no cost to starting a speculative task
4. A task's progress score is roughly equal to the fraction of its total work done
5. Tasks tend to finish in waves, so a task with a low progress score is likely a slow task
6. Different tasks of the same category (map or reduce) require roughly the same amount of work
Revising Scheduler's Assumptions
In heterogeneous (e.g., virtualized) environments, the first two assumptions break down:
1. Nodes can perform work at roughly the same rate
2. Tasks progress at a constant rate over time
Heterogeneity in Virtualized Environments
[Figure: performance measured as the number of VMs per physical host varies from 1 to 7]
Revising Scheduler's Assumptions (cont'd)
Assumptions 3-5 also break down in practice:
3. There is no cost to starting a speculative task
4. A task's progress score is roughly equal to the fraction of its total work done
5. Tasks tend to finish in waves, so a task with a low progress score is likely a slow task
Progress Rate Example
[Figure: timeline over the first two minutes; Node 1 completes tasks at 1 task/min, Node 2 runs 3x slower]
Progress Rate Example
[Figure: timeline comparing task progress on Nodes 1, 2, and 3]
Node 2 is slowest, but should back up Node 3’s task!
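The exact numbers behind the figure are not preserved here, but a hypothetical scenario in the same spirit shows why picking the task with the lowest progress rate can back up the wrong task; Node 3's rate (1.9x slower) and its late start time are assumptions for illustration only.

    # Hypothetical timeline, 2 minutes into the job. Node 1 (1 task/min) is
    # omitted; it finishes its tasks quickly. Rates are tasks per minute.
    now = 2.0
    tasks = {
        "Node 2": {"rate": 1 / 3.0, "started": 0.0},  # 3x slower, as on the slide
        "Node 3": {"rate": 1 / 1.9, "started": 1.9},  # assumed: faster, but started late
    }

    for node, t in tasks.items():
        progress = min(1.0, t["rate"] * (now - t["started"]))
        time_left = (1.0 - progress) / t["rate"]
        print(f"{node}: rate={t['rate']:.2f}/min, "
              f"progress={progress:.2f}, time left={time_left:.1f} min")

    # Node 2 has the lower progress rate (0.33 vs 0.53 tasks/min), so a
    # rate-based heuristic would back it up -- yet Node 2 finishes at t=3.0
    # while Node 3's task runs until about t=3.8, so Node 3's task is the one
    # worth backing up.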
Our Scheduler: LATE (Longest Approximate Time to End)
Sanity thresholds:
Cap number of backup tasks
Launch backups on fast nodes
Only back up tasks that are sufficiently slow
LATE Details
estimated time left = (1 − progress score) / progress rate
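A one-function sketch of this estimate, assuming the progress rate is simply the progress score divided by the task's elapsed running time (the names are illustrative):

    def estimated_time_left(progress_score, elapsed_seconds):
        # LATE's per-task estimate: (1 - progress score) / progress rate.
        progress_rate = max(progress_score, 1e-9) / elapsed_seconds
        return (1.0 - progress_score) / progress_rate

    # Example: a task 30% done after 60 s is estimated to need another 140 s.
    print(estimated_time_left(0.3, 60.0))  # 140.0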
LATE Scheduler
Threshold values:
A 10% cap on backup tasks; 25th percentile thresholds for slow nodes and slow tasks (see the sketch below)
Validated by sensitivity analysis
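Putting the thresholds together, the following sketch shows one way the LATE decision could be wired up; the data model (per-node work rates, per-task progress and rate) and the percentile helper are simplifying assumptions, not the paper's implementation.

    SPECULATIVE_CAP = 0.10  # at most 10% of task slots may run backup copies
    SLOW_NODE_PCTL = 25     # never launch backups on the slowest 25% of nodes
    SLOW_TASK_PCTL = 25     # only back up tasks among the slowest 25% by rate

    def percentile(values, pct):
        ordered = sorted(values)
        return ordered[int(len(ordered) * pct / 100)]

    def pick_backup(free_node, node_rates, running_tasks, backups, total_slots):
        # Return the running task to back up on free_node, or None.
        if backups >= SPECULATIVE_CAP * total_slots:
            return None  # cap on the number of simultaneous backup tasks
        if node_rates[free_node] < percentile(list(node_rates.values()), SLOW_NODE_PCTL):
            return None  # free_node itself is too slow to host a backup
        # Only consider tasks progressing slower than the slow-task threshold.
        cutoff = percentile([t["rate"] for t in running_tasks], SLOW_TASK_PCTL)
        slow = [t for t in running_tasks if t["rate"] < cutoff]
        if not slow:
            return None
        # Back up the candidate with the longest estimated time left.
        return max(slow, key=lambda t: (1.0 - t["progress"]) / t["rate"])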
LATE Example
[Figure: timeline of the example at the 2-minute mark, with Node 3's task chosen for backup]
Evaluation environments:
EC2 (3 job types, 200-250 nodes)
Small local testbed
Self-contention through VM placement
Stragglers through background processes
EC2 Sort without Stragglers (Sec 5.2.1)
106 machines, 7-8 VMs per machine, 243 VMs in total
128 MB data per host, 30 GB in total
486 map tasks and 437 reduce tasks
average 27% speedup over native, 31% over no backups
[Figure: normalized response time (worst, best, and average runs) for No Backups, Hadoop Native, and the LATE Scheduler]
EC2 Sort with Stragglers (Sec 5.2.2)
8 VMs out of 100 VMs in total are manually slowed down by running CPU- and disk-intensive background jobs
Average 58% speedup over native, 220% over no backups
93% maximum speedup over native
[Figure: normalized response time (worst, best, and average runs) for No Backups, Hadoop Native, and the LATE Scheduler]
Conclusion
Summary