Probe Sharing: A Simple Technique to Improve on Sparrow
Abstract—As big data analytics frameworks develop towards larger degrees of parallelism and shorter task durations to provide lower latency, millions of scheduling decisions per second pose a great challenge to centralized schedulers. Therefore, increasing effort is devoted to the study of distributed scheduling approaches that avoid the throughput limitation of centralized designs. Among these approaches, Sparrow is a leading design. However, due to Sparrow's sample-based techniques, some tasks in subsequent jobs may be scheduled earlier than those in the head-of-line job, which results in scheduling disorder and inevitably causes poor response times and unfairness. To address these problems, this paper proposes a simple algorithm called probe sharing: jobs that arrive at the same Sparrow scheduler share their probes to ensure that all tasks in the head-of-line job are scheduled earlier than those of subsequent jobs. We have performed a theoretical analysis and proved that probe sharing makes a good improvement on Sparrow. We have implemented probe sharing in Sparrow and shown that it reduces scheduling delays by 2.2x and provides 100% fairness. Trace-driven simulations have also been used to evaluate probe sharing when scaling to large clusters. In addition, the simplicity of probe sharing makes it applicable to many schedulers that use Sparrow's techniques (e.g., Hopper, Tarcil and Eagle).

Keywords-big data; distributed scheduling; probe sharing; scheduling delay; fairness

I. INTRODUCTION

Big data analytics frameworks (e.g., MapReduce [1], Dremel [12], Spark [3]) play an important role in driving many services, including web search indexing, business intelligence and spam detection. These frameworks are now shifting towards larger degrees of parallelism and shorter task durations, which poses a great challenge to centralized schedulers. As a result, many researchers have turned to distributed scheduling approaches (e.g., Omega [4], Apollo [5], Sparrow [6]) to avoid the throughput limitation of centralized designs. Such approaches can handle millions of scheduling decisions per second and provide sub-second response times by scheduling from a set of machines. To our knowledge, Omega, Apollo and Sparrow are three well-known approaches that perform distributed scheduling in a cluster: Omega uses shared-state scheduling and lock-free optimistic concurrency control to achieve both implementation extensibility and performance scalability; Apollo makes scheduling decisions in a distributed manner by utilizing global cluster information via a loosely coordinated mechanism; Sparrow introduces batch sampling and late binding to adapt the power of two choices load balancing technique [7] to the domain of distributed scheduling, providing near-optimal performance (note that Sparrow is open source, and many recent schedulers, e.g., Hopper [19], Tarcil [20] and Eagle [11], use Sparrow's sample-based techniques). On further study, we found that both Apollo and Omega can guarantee strict scheduling orders according to their scheduling policies while Sparrow cannot (this paper was originally motivated by this observation when running Sparrow). This causes several problems when performing scheduling:

(1) Schedulers maintain job queues according to scheduling policies (e.g., FIFO, earliest deadline first, shortest job first and max-min fairness). However, Sparrow's sample-based techniques may result in scheduling disorder because some tasks in subsequent jobs may be scheduled earlier than those in the head-of-line job (e.g., probes for a subsequent job may get lucky and sample some lightly loaded worker machines; consequently, some tasks in the subsequent job will be scheduled earlier than tasks in the head-of-line job).

(2) The disorder in scheduling jobs that arrive at the same scheduler may cause poor response times. Since a job cannot complete until its last task finishes, the job response time is determined by the last task's completion time, while earlier-scheduled tasks have little effect on it. Due to Sparrow's sample-based techniques, some tasks in the head-of-line job may be scheduled late because tasks of subsequent jobs are scheduled first, and this delayed scheduling increases the response time of the head-of-line job. Meanwhile, the tasks in subsequent jobs that are scheduled early do not reduce the response times of those jobs. The rest of the paper uses scheduling delay to quantify this influence; the definition is introduced in Section 2.

(3) The disorder in scheduling jobs that arrive at the same scheduler may cause unfairness. We use FIFO as an example to illustrate the problem (unfairness also occurs under other scheduling policies). According to [6], Sparrow enforces FIFO in this way: jobs that arrive earlier are assigned
higher priority; each worker machine maintains one queue per priority; a worker machine responds to the probe from the highest-priority non-empty queue when it gets a free slot. However, this is not strict FIFO. Due to Sparrow's sample-based techniques, probes for a subsequent job may get lucky and sample some lightly loaded worker machines that have not queued probes for the higher-priority head-of-line job. Consequently, some tasks in the subsequent job will be scheduled earlier than tasks in the head-of-line job, which violates FIFO.

In this paper, considering these problems with Sparrow, we turn to a simple algorithm called probe sharing to improve on Sparrow: jobs that arrive at the same Sparrow scheduler share their probes to ensure that all tasks in the head-of-line job are scheduled earlier than those of subsequent jobs. That is, jobs that arrive at the same Sparrow scheduler are scheduled in a strict order according to the job queue (i.e., a specific scheduling policy). In this manner, we avoid scheduling disorder within a single Sparrow scheduler, reducing scheduling delays (e.g., the last task in the head-of-line job can be scheduled earlier to reduce that job's response time, while delaying the first task of the subsequent job has little effect on that job's response time) and providing better fairness. Note that all scheduling policies (e.g., FIFO, earliest deadline first, shortest job first and max-min fairness) reduce to job queues, which means probe sharing works for all scheduling policies.

The rest of the paper is organized as follows. Section 2 introduces our terminology, job model and the basic concepts of Sparrow. Section 3 gives a detailed introduction to probe sharing and the pseudocode for the algorithm. Section 4 analyses the scheduling delay and fairness of Sparrow's native scheduling approach and of probe sharing on the basis of probability theory and queuing theory. Section 5 describes our implementation and trace-driven simulations. We survey related work in Section 6 and end with a discussion of this research in Section 7.

II. BACKGROUND

This section introduces our terminology, job model and the basic concepts of Sparrow.

A. Terminology and job model

In order to prove that probe sharing improves on Sparrow, we use the same job model as Sparrow: we consider a cluster of worker machines that execute tasks and schedulers that assign tasks to worker machines; each job runs as a parallel stage consisting of a list of tasks and can be handled by any scheduler; worker machines each run tasks in a fixed number of slots (see [6] for details).

In this paper, task response time describes the time from when a task is submitted to the scheduler until the task is completed. Job response time describes the time from when a job is submitted to the scheduler until the last task in the job is completed. Scheduling delay describes the total delay within a job caused by both scheduling and queuing. Unfairness describes the fraction of tasks in a job that are scheduled later than any task in subsequent jobs that arrive at the same scheduler. Fairness describes the fraction of tasks in a job that are scheduled earlier than any task in subsequent jobs that arrive at the same scheduler. We compute scheduling delay as the difference between the job response time under a given scheduling approach and the ideal job response time (i.e., the job is scheduled with zero wait time). We compute unfairness by dividing the number of disordered tasks (i.e., the tasks that are scheduled later than any task in subsequent jobs) by the total number of tasks, and fairness as one minus unfairness.
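To make these metric definitions concrete, the following small Python sketch computes them for a single job; the function names and the list-of-timestamps representation are our own illustration, not part of Sparrow.

    def scheduling_delay(job_response_time, ideal_response_time):
        # Total delay within a job caused by scheduling and queuing.
        return job_response_time - ideal_response_time

    def unfairness(job_schedule_times, later_jobs_schedule_times):
        # Fraction of the job's tasks scheduled later than ANY task of a
        # subsequent job that arrived at the same scheduler.
        earliest_later = min(later_jobs_schedule_times)
        disordered = sum(1 for t in job_schedule_times if t > earliest_later)
        return disordered / len(job_schedule_times)

    def fairness(job_schedule_times, later_jobs_schedule_times):
        # Fairness is one minus unfairness.
        return 1.0 - unfairness(job_schedule_times, later_jobs_schedule_times)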
B. Sparrow scheduling approach

Sparrow is a distributed, low-latency approach that performs scheduling from a set of schedulers rather than a single one. The design goals of Sparrow are to provide fine-grained task scheduling and to achieve sub-second response times, which is complementary to the functionality provided by cluster resource managers. It builds on the power of two choices load balancing technique, which schedules each task by probing two random servers and placing the task on the server with fewer queued tasks. Sparrow introduces batch sampling [8] and late binding [6] to adapt the power of two choices to the domain of parallel job scheduling.

Batch sampling and late binding: With batch sampling and late binding, Sparrow schedulers perform scheduling as follows: when a job consisting of m tasks arrives at a Sparrow scheduler, the scheduler randomly selects dm worker machines (where d denotes Sparrow's probe ratio) and sends each a probe (a lightweight RPC). To guarantee that the job's m tasks are placed on the m least loaded worker machines, a probed worker machine does not reply immediately to the probe; instead it places the probe at the end of its probe queue. When the probe reaches the front of the queue, the worker machine replies to the corresponding scheduler. The scheduler assigns the job's m tasks to the first m worker machines to reply, and then proactively sends a cancellation RPC to all worker machines with outstanding probes for the job. Pseudocode for the Sparrow scheduling approach is shown below:

Algorithm 1 Sparrow Scheduling Approach
Each scheduler:
  when a job j that consists of m tasks arrives:
    randomly select dm workers to send each a probe for j
  when a probe reply to a job j is received from worker n:
    if j has unscheduled tasks then
      assign an unscheduled task to n
      if all tasks in j are scheduled then
        cancel the outstanding probes for j
      end if
    end if
Each worker:
  when a probe p for a job j arrives:
    place p at the end of the probe queue
  when a slot is free:
    reply to the head-of-line probe
In addition, Sparrow uses strict priorities [9] and weighted fair sharing [6] to support a useful set of scheduling policies. Many scheduling policies (e.g., FIFO, earliest deadline first, and shortest job first) reduce to strict priorities. For instance, with FIFO, jobs that arrive earlier are assigned higher priority; each worker machine maintains one queue per priority; when a worker machine gets a free slot, it responds to the probe from the highest-priority non-empty queue. To enforce max-min fairness [10], each worker machine maintains a separate probe queue for each user and enforces weighted fair sharing over those probe queues.
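The per-priority probe queues described above can be sketched as follows (again our own illustration in Python; a weighted fair sharing variant would pick among per-user queues instead):

    from collections import deque

    class PriorityWorker:
        # One probe queue per priority level; index 0 is the highest priority.
        def __init__(self, num_priorities):
            self.queues = [deque() for _ in range(num_priorities)]

        def enqueue_probe(self, probe, priority):
            self.queues[priority].append(probe)

        def next_probe_to_answer(self):
            # Strict priorities: when a slot frees up, reply from the
            # highest-priority non-empty queue.
            for q in self.queues:
                if q:
                    return q.popleft()
            return None  # no queued probes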
III. SHARING PROBES IN SPARROW

To solve the scheduling disorder problems caused by Sparrow's sample-based techniques, we turn to probe sharing.

A. Probe sharing

In practice, scheduling policies [21, 13] can be complex, involving FIFO, earliest deadline first and shortest job first. For multiple users, max-min fairness or dominant resource fairness [14] can be enforced. Fortunately, all these policies reduce to a job queue (to enforce max-min fairness, multiple queues are used [6]). Therefore, we can use probe sharing to ensure a strict scheduling order according to the job queue (we perform probe sharing on the multiple queues separately when Sparrow enforces max-min fairness), which resolves the scheduling disorder problems. Besides, because probe sharing shares probes within a single Sparrow scheduler, it adds little overhead to Sparrow, as our implementation confirms.

Assume a job consists of m tasks and Sparrow's probe ratio is set to d. Probe sharing works as follows: each time a job arrives at a Sparrow scheduler, the scheduler randomly selects dm worker machines and sends each a probe for this job; when the scheduler receives a probe reply for a job from a worker machine and that job's probe reply count is less than m, the scheduler assigns a task of the head-of-line job to this worker machine, no matter which job the reply belongs to. Meanwhile, the scheduler maintains each job's probe reply count: once m worker machines have replied to the probes for a job, the scheduler cancels the outstanding (d-1)m probes for that job. Note that we use the probe reply count to cancel outstanding probes fairly across all jobs. To make the algorithm easy to understand, we give the following example: assume each job consists of 100 tasks and Sparrow's probe ratio is set to 2. When job 1 first arrives at scheduler s, scheduler s randomly selects 200 worker machines and sends each a probe for job 1; when job 2 then arrives at scheduler s, scheduler s operates in the same way. Finally, for the first 100 worker machines to reply (70 replying to the probe for job 1 and 30 to the probe for job 2), scheduler s assigns job 1's 100 tasks to them; for the next 80 worker machines to reply (30 replying to the probe for job 1 and 50 to the probe for job 2), scheduler s assigns job 2's first 80 tasks to them and cancels the outstanding probes for job 1; for the next 20 worker machines to reply (all 20 replying to the probe for job 2), scheduler s assigns job 2's remaining 20 tasks to them and cancels the outstanding probes for job 2. For more than two jobs, probe sharing still works because it always schedules the head-of-line job first. Pseudocode for probe sharing is shown below:

Algorithm 2 Probe Sharing
Each scheduler:
  when a job j that consists of m tasks arrives:
    initialize j.probereplycount to 0
    randomly select dm workers to send each a probe for j
  when a probe reply to a job j is received from worker n:
    j.probereplycount += 1
    if j.probereplycount = m then
      cancel the outstanding probes for j
    end if
    if j.probereplycount <= m then
      assign an unscheduled task in the head-of-line job to n
      if all tasks in the head-of-line job are scheduled then
        remove the head-of-line job from the job queue
      end if
    end if
Each worker:
  when a probe p for a job j arrives:
    place p at the end of the probe queue
  when a slot is free:
    reply to the head-of-line probe
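The following Python sketch mirrors Algorithm 2; like the earlier sketch, it is our own simplification (the dictionary bookkeeping and names are ours), not the authors' implementation. The essential difference from Sparrow is that a probe reply is always spent on the head-of-line job:

    import random
    from collections import deque

    PROBE_RATIO = 2  # d in the paper

    class Worker:
        def __init__(self):
            self.probe_queue = deque()

    class ProbeSharingScheduler:
        def __init__(self, workers):
            self.workers = workers
            self.job_queue = deque()  # head-of-line job at the left
            self.num_tasks = {}       # job -> m
            self.tasks_left = {}      # job -> unscheduled task count
            self.reply_count = {}     # job -> probe replies received
            self.outstanding = {}     # job -> workers still holding a probe

        def submit(self, job, m):
            self.job_queue.append(job)
            self.num_tasks[job] = self.tasks_left[job] = m
            self.reply_count[job] = 0
            sampled = random.sample(self.workers, PROBE_RATIO * m)
            self.outstanding[job] = set(sampled)
            for w in sampled:
                w.probe_queue.append(job)

        def on_probe_reply(self, job, worker):
            self.reply_count[job] += 1
            self.outstanding[job].discard(worker)
            if self.reply_count[job] == self.num_tasks[job]:
                # m replies received: cancel this job's (d-1)m leftover probes.
                for w in self.outstanding.pop(job):
                    w.probe_queue.remove(job)
            if self.reply_count[job] <= self.num_tasks[job] and self.job_queue:
                # Shared probe: assign a task of the HEAD-OF-LINE job to this
                # worker, no matter which job the probe was sent for.
                head = self.job_queue[0]
                self.tasks_left[head] -= 1  # launch one task of `head` here
                if self.tasks_left[head] == 0:
                    self.job_queue.popleft()  # head-of-line job fully placed

Replaying the 100-task example above against this sketch gives the same assignment: the first 100 replies, whichever job's probes they answer, all launch job 1's tasks.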
B. An example that probe sharing improves on Sparrow

We use an example to show how probe sharing improves on Sparrow. Assume each job consists of 2 tasks and task duration is constant. Assume further that the probe ratio is set to 2 (these assumptions only serve to make probe sharing easy to understand). As shown in Figures 1 and 2, after job 1 arrives at a Sparrow scheduler, job 2 arrives at the same scheduler after a short arrival interval. With Sparrow's native scheduling approach, job 1 randomly selects workers 2, 3, 4, 5 to send each a probe, and workers 4 and 5 are the first to reply to the probe for job 1; job 2 randomly selects workers 1, 2, 3, 6 to send each a probe, and workers 1 and 6 are the first to reply to the probe for job 2. Therefore, as shown in the left side of Figure 3, Sparrow's native scheduling approach assigns job 1's tasks to workers 4 and 5 and job 2's tasks to workers 1 and 6. With probe sharing, since workers 1 and 5 are the first two of all probed workers to reply, the scheduler places job 1's tasks on workers 1 and 5 (note that worker 1 actually replies to the probe for job 2). Therefore, as shown in the right side of Figure 3, probe sharing assigns job 1's tasks to workers 1 and 5 and job 2's tasks to workers 4 and 6, which reduces job 1's scheduling delay and leaves job 2's scheduling delay unchanged. Besides, the scheduling decisions made by probe sharing avoid scheduling disorder, offering better fairness. Note that this example covers the case where two jobs arrive; for more than two jobs, probe sharing still works because it shares probes across all jobs.

Figure 1. Job 1 randomly selects workers 2, 3, 4, 5 to send each a probe, and workers 4 and 5 are the first to reply to the probe for job 1. Empty boxes represent tasks in earlier jobs that are supposed to launch on the workers.

C. Problems with probe sharing

Probe sharing improves on Sparrow only slightly at low cluster loads, for two reasons. First, at low cluster loads, tasks in the head-of-line job can be scheduled instantly on idle worker machines, so there is little chance that tasks in a subsequent job are scheduled earlier than those of the head-of-line job. Second, low cluster loads usually mean low arrival rates, so the arrival interval between jobs is often large enough that Sparrow's native scheduling approach rarely schedules tasks in subsequent jobs earlier than those in the head-of-line job. Though probe sharing improves on Sparrow only slightly at low cluster loads, it cannot harm Sparrow, because probe sharing at worst degenerates into Sparrow's native scheduling approach.

Like Sparrow's native scheduling approach, probe sharing cannot guarantee fairness across a whole cluster. Jobs that arrive at different Sparrow schedulers may suffer from tasks in a subsequent job being scheduled earlier than those in an earlier job that arrived at a different scheduler. Consider a case where two jobs arrive at different Sparrow schedulers: the earlier job may get unlucky and probe some heavily loaded worker machines, so there is a chance that some tasks in the subsequent job are scheduled earlier. This is because probe sharing is unable to share probes between Sparrow schedulers.

IV. ANALYSIS

Given a few simplifying assumptions, this section analytically shows that probe sharing improves on Sparrow by sharing probes across jobs that arrive at the same Sparrow scheduler, reducing scheduling delays and providing better fairness.
We assume that there are n worker machines and s schedulers (s << n) in a cluster; that the probability that a worker machine is idle is independent of whether the other worker machines are idle; that jobs consisting of m tasks arrive at a scheduler as a Poisson process with mean rate λ per second (i.e., the mean job arrival interval is 1/λ seconds); and that task duration is exponentially distributed with mean 1/µ seconds (i.e., the service rate is µ per second).

Let λ denote the mean job arrival rate of each scheduler and $\lambda_w$ denote the mean task arrival rate of each worker machine. According to Jackson's Theorem [15], $\lambda_w = \frac{sm\lambda}{n}$, because each worker has an equal probability of launching a task. On the basis of queuing theory [16], we obtain $\rho = \frac{sm\lambda}{n\mu}$ and analyze each worker machine as a birth-and-death process:

$\mu_k = \begin{cases} 0, & 0 \le k < 1 \\ \mu, & k \ge 1 \end{cases}$   (1)

$\eta'_k = \rho^k \eta'_0$   (2)

Solving $\sum_{k=0}^{\infty} \eta'_k = 1$, we find that

$\eta'_k = \rho^k (1 - \rho)$   (3)

Because Sparrow randomly selects dm worker machines for a job that consists of m tasks and places the tasks on the m least loaded worker machines, the queue length distribution in Sparrow is derived as

$\eta_k = d \rho^k (1 - \rho)$   (4)

where $k < k_{max}$ and $k_{max}$ is the smallest $k'$ subject to $\sum_{k=0}^{k'} d \rho^k (1 - \rho) > 1$.

A. Scheduling Delay

We prove that probe sharing reduces scheduling delays in a special case where job 2 shares only its first-replied probe with job 1 (as the number of shared probes increases, probe sharing reduces scheduling delays further) and the arrival interval between job 1 and job 2 is 1/λ. Let $t_{ij}^{k}$ denote the response time of the j-th task in the i-th job when it is scheduled on a worker machine with queue length k.

In this case, job 1 and job 2's response times using Sparrow's native scheduling approach can be derived as

$j_i' = \max_j t_{ij}^{k} \approx \max_j t_{ij}^{k_{max}}$   (5)

Meanwhile, job 1 and job 2's response times using probe sharing can be derived as

$j_1 = E[\max(t_{21}^{0}, t_{11}^{k_{max}}, t_{12}^{k_{max}}, \ldots, t_{1(m-1)}^{k_{max}})]$   (6)

$j_2 = E[\max(t_{11}^{k_{max}} - \tfrac{1}{\lambda}, t_{21}^{k_{max}}, t_{22}^{k_{max}}, \ldots, t_{2(m-1)}^{k_{max}})]$   (7)

Therefore, probe sharing reduces the mean response time of job 1 and job 2 by

$j_{average} = \frac{j_1' + j_2' - j_1 - j_2}{2}$   (8)

B. Fairness

Consider a case where job 1 arrives at a Sparrow scheduler and job 2 arrives at the same scheduler 1/λ seconds later (more subsequent jobs would increase the unfairness of Sparrow's native scheduling approach). Let $u_{unfairness}$ denote the fraction of tasks in job 1 that are scheduled later than any task in job 2, and let $f_{fairness} = 1 - u_{unfairness}$.

As long as ρ < 1, (1 − ρ)m tasks in job 2 can be scheduled instantly. Let the selection variable $X_j$ be defined as

$X_j = \begin{cases} 1, & t_{1j}^{k} > \frac{1}{\lambda} \\ 0, & t_{1j}^{k} \le \frac{1}{\lambda} \end{cases}$   (9)

where $t_{1j}^{k} \sim \mathrm{Gamma}(k, 1/\mu)$ and k is the queue length of the worker machine. Then we obtain $u_{unfairness}$ and $f_{fairness}$ of Sparrow's native scheduling approach (note that $u_{unfairness}$ and $f_{fairness}$ of probe sharing are 0% and 100%, respectively):

$u_{unfairness} = \frac{\sum_{j=1}^{m} E[X_j]}{m}$   (10)

$f_{fairness} = 1 - \frac{\sum_{j=1}^{m} E[X_j]}{m}$   (11)

C. Evaluation

Setting n to 5000, m to 100, µ to 1 (i.e., mean task duration is 1000 ms), d to 2, s to 10, and ρ to 0.9, we obtain the queue length distribution of worker machines from (4). Table 2 shows the queue length distribution of the 100 worker machines selected to launch tasks for a job.

TABLE II. QUEUE LENGTH DISTRIBUTION OF SELECTED WORKER MACHINES

Queue length   Number of worker machines
0              20
1              18
2              16
3              14
4              13
5              12
6              7

According to Table 2 and (8), when job 2 shares its first-replied probe with job 1, the mean response time is reduced by 138 ms (the result of sharing only one probe), which means a lower scheduling delay. We believe probe sharing is an effective algorithm because, as the number of shared probes increases, scheduling delays are reduced further. Meanwhile, according to Table 2 and (11), we find that $f_{fairness}$ of Sparrow's native scheduling approach is around 24% while $f_{fairness}$ of probe sharing is 100%, demonstrating that probe sharing provides better fairness.
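As a numerical sanity check on (4) and (9)-(11), the following Python sketch (our own; it uses SciPy's Gamma distribution and treats the wait behind k queued tasks as Gamma(k, 1/µ), as in (9)) approximately reproduces Table 2 and the 24% fairness figure:

    from scipy.stats import gamma

    n, m, mu, d, s, rho = 5000, 100, 1.0, 2, 10, 0.9
    lam = rho * n * mu / (s * m)  # jobs/s per scheduler, from rho = s*m*lam/(n*mu)

    # Queue length distribution (4): eta_k = d * rho^k * (1 - rho),
    # truncated once the probabilities for the selected workers sum to 1.
    counts, cumulative, k = [], 0.0, 0
    while cumulative < 1.0:
        eta_k = min(d * rho**k * (1 - rho), 1.0 - cumulative)
        counts.append(round(eta_k * m))
        cumulative += d * rho**k * (1 - rho)
        k += 1
    print(counts)  # ~[20, 18, 16, 15, 13, 12, 6], close to Table 2

    # Unfairness (9)-(10): a task of job 1 on a worker with k queued tasks
    # is "disordered" if its wait exceeds the arrival interval 1/lam.
    u = sum(cnt * (1.0 - gamma.cdf(1.0 / lam, a=k, scale=1.0 / mu))
            for k, cnt in enumerate(counts) if k > 0) / m
    print(1.0 - u)  # fairness of the native approach, roughly 0.24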
Figure 4. Scheduling delays using Sparrow's native scheduling approach and probe sharing in a cluster of 20 worker machines with a probe ratio of 1 (default probe ratio = 1.05).

Figure 5. Fairness using Sparrow's native scheduling approach and probe sharing in a cluster of 20 worker machines with a probe ratio of 1.

V. EVALUATION

This section evaluates probe sharing through both implementation and trace-driven simulations.

A. Implementation

We have implemented probe sharing based on the Sparrow scheduling platform [22] (U.C. Berkeley). It is available at https://github.com/workbylwz/sparrow_probe_sharing. We run Sparrow's native scheduling approach and probe sharing on a cluster of 20 worker machines. Each worker machine is constrained to 2 CPU cores and 2 GB RAM and runs Ubuntu 14.04. We use 4 slots on each worker machine and set the probe ratio to 1, similar to the default configuration.

In the experiments, users keep submitting jobs with a task duration of 1000 ms to sustain different cluster loads (probe sharing also shows a good improvement with other task duration distributions). Each experiment lasts over 600 seconds. By analyzing the logs, we plot scheduling delays at cluster loads of 75%, 80%, 85%, 90% and 95% in Figure 4. It shows that probe sharing makes a good improvement on Sparrow, and the heavier the cluster load, the greater the improvement. At 75% cluster load, probe sharing reduces scheduling delays by 2.2x; at 95% cluster load, it reduces scheduling delays by 2.0x.

We plot fairness using Sparrow's native scheduling approach and probe sharing in Figure 5. It shows that less than half of the tasks in a job are scheduled fairly using Sparrow's native scheduling approach (e.g., fairness falls to 13.7% within a single scheduler at 95% cluster load), while all jobs that arrive at the same Sparrow scheduler are scheduled in a strict order using probe sharing (i.e., fairness is 100% within a single scheduler). This meets our expectations because probe sharing always schedules the head-of-line job's tasks first.

B. Trace-driven simulations

Since Sparrow aims to work in clusters of thousands of nodes providing sub-second response times, we use trace-driven simulations to evaluate probe sharing when scaling to large clusters. The traces are provided by Google and contain data from a 12.5k-machine cell over an approximately month-long period in May 2011. The data consist of timestamps, job events, task events and other detailed records, and are available at https://github.com/google/cluster-data. We adapt the traces to our simulations (the mean task duration in the selected traces is 771 seconds, which we change to 771 ms; jobs are given a larger degree of parallelism).

Figure 6. Scheduling delays using Sparrow's native scheduling approach and probe sharing in a simulated cluster of 5,000 worker machines with a probe ratio of 2.

In the trace-driven simulations, we simulate Sparrow's native scheduling approach and probe sharing on a cluster of 5000 4-core worker machines and 10 schedulers. We use 4 slots on each worker machine. We enforce FIFO in each scheduler (scheduling policies like shortest job first and earliest deadline first show similar results). The network round trip time (RTT) is set to 1 millisecond, which is the mean network RTT on EC2 with an un-optimized network stack [2]. Besides, we vary the mean job inter-arrival delay to reach various utilization levels, so that we can evaluate probe sharing at different cluster loads. The simulation results demonstrate that probe sharing reduces scheduling delays by up to 5.0x at 95% cluster load and ensures 100% fairness within a single Sparrow scheduler.
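To reach a target cluster load in a simulation like this, the mean job inter-arrival delay per scheduler can be derived from the relation ρ = smλ/(nµ) of Section IV. The Python helper below is our own sketch, and folding the 4 slots per machine into the service capacity is our assumption:

    def inter_arrival_ms(target_load, n=5000, s=10, m=100,
                         task_ms=771.0, slots=4):
        # Mean job inter-arrival delay (ms) per scheduler such that
        # rho = s*m*lambda / (n*slots*mu) equals the target load.
        mu = 1.0 / task_ms                             # tasks/ms per slot
        lam = target_load * n * slots * mu / (s * m)   # jobs/ms per scheduler
        return 1.0 / lam

    for load in (0.75, 0.85, 0.95):
        print(load, round(inter_arrival_ms(load), 1))  # e.g., 0.95 -> ~40.6 ms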
Figure 7. Scheduling delays using Sparrow's native scheduling approach and probe sharing in a simulated cluster of 5,000 worker machines with a probe ratio of 2. Boxes depict median, 25th, and 75th percentiles.

Figure 8. Scheduling delays using Sparrow's native scheduling approach and probe sharing in a simulated cluster of 5,000 worker machines with a probe ratio of 1.5.

Figure 9. Scheduling delays using Sparrow's native scheduling approach and probe sharing in a simulated cluster of 5,000 worker machines with a probe ratio of 1.

Figure 10. Fairness using Sparrow's native scheduling approach and probe sharing in a simulated cluster of 5,000 worker machines with a probe ratio of 2.

In the simulations, we first set the probe ratio to 2, where probe sharing shows the least improvement (probe ratios above 2 lead to performance degradation [6]). As shown in Figure 6 (see previous page), probe sharing clearly outperforms Sparrow's native scheduling approach. Scheduling delays using both approaches stay around 1.5 ms (equal to the 1.5 RTT communication overhead of launching a task) when the cluster load is at or below 75%. From 75% to 95% cluster load, scheduling delays using Sparrow's native scheduling approach increase sharply while scheduling delays using probe sharing increase at a lower rate. At 95% cluster load, probe sharing reduces Sparrow's scheduling delays by 1.4x. We also plot the candlestick chart in Figure 7 to show the rough distribution of scheduling delays at cluster loads of 80%, 85%, 90%, and 95%. We can see that the median scheduling delay using probe sharing is much lower than that using Sparrow's native scheduling approach, which also demonstrates that probe sharing makes a good improvement on Sparrow.

We repeated the simulations with probe ratios of 1.5 and 1. Figure 8 plots scheduling delays using a probe ratio of 1.5 and shows that probe sharing reduces scheduling delays by 1.6x at 95% cluster load. Figure 9 plots scheduling delays using a probe ratio of 1 and shows that probe sharing improves scheduling delays on Sparrow significantly, reducing them by 5.0x at 95% cluster load. Overall, probe sharing reduces scheduling delays by up to 5.0x at 95% cluster load, which is a good improvement since Sparrow already provides near-optimal performance.

We evaluate fairness using Sparrow's native scheduling approach and probe sharing in the aforementioned simulation with a probe ratio of 2. As shown in Figure 10, fairness using Sparrow's native scheduling approach decreases sharply as the cluster load increases, and less than half of the tasks in a job are scheduled fairly at 95% cluster load (i.e., fairness falls to 34.9% within a single scheduler); probe sharing, by contrast, ensures that all jobs that arrive at the same Sparrow scheduler are scheduled in a strict order (i.e., fairness is 100% within a single scheduler).

VI. RELATED WORK

Scheduling in big data analytics frameworks has been extensively studied in earlier work. However, little of this research focuses
on distributed scheduling. Probe sharing is the first technique to improve on Sparrow by sharing probes across jobs.

Our idea comes from Sparrow's batch sampling [8]. Batch sampling improves on per-task sampling by sharing information across all probes for a particular job. However, with batch sampling, information cannot be shared across the jobs that arrive at the same Sparrow scheduler. We therefore turn to probe sharing to allow sharing probes across jobs, reducing scheduling delays and providing better fairness. We have shown that probe sharing improves on batch sampling, taking information sharing a step further.

Sharing information between jobs is also adopted by delay scheduling [17]: when a job that should be scheduled next according to fairness cannot launch a local task, it waits for a small amount of time, letting subsequent jobs launch tasks instead. The application area of delay scheduling is different from that of our work. Besides, delay scheduling aims to achieve data locality while preserving fairness in YARN, whereas our work tries to improve on Sparrow, reducing scheduling delays and providing better fairness.

Yaq [18] focuses on queue management to achieve lower response times, which is similar to probe sharing. It introduces several techniques (e.g., prioritizing task execution via queue reordering and carefully placing tasks into queues) to improve system performance. These techniques pursue lower response times at the cost of fairness and require estimating task execution times, which may be inaccurate. Probe sharing, in contrast, provides better fairness and avoids complex estimations.

VII. CONCLUSION

As distributed scheduling approaches like Omega, Apollo and Sparrow gain popularity in big data analytics frameworks, there is a strong need to optimize them. This paper proposes a simple algorithm called probe sharing to improve on Sparrow by sharing probes across all jobs that arrive at the same Sparrow scheduler. Our theoretical analysis proves that probe sharing makes a good improvement on Sparrow. Our implementation shows that probe sharing reduces scheduling delays by 2.2x and provides 100% fairness. Trace-driven simulations further show that probe sharing provides good performance when scaling to large clusters. In addition, the simplicity of probe sharing makes it applicable to other schedulers that use Sparrow's techniques (e.g., Hopper [19], Tarcil [20] and Eagle [11]).

ACKNOWLEDGMENT

This work is supported by the National Natural Science Foundation of China (No. 61472199) and the National Basic Research Program of China under grant 2010CB328105.

REFERENCES

[1] Dean J, Ghemawat S. MapReduce: simplified data processing on large clusters[C]//Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI). 2004.
[2] Amazon EC2. http://aws.amazon.com/ec2.
[3] Zaharia M, Chowdhury M, Das T, et al. Resilient distributed datasets: a fault-tolerant abstraction for in-memory cluster computing[C]//Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. USENIX Association, 2012: 2-2.
[4] Schwarzkopf M, Konwinski A, Abd-El-Malek M, et al. Omega: flexible, scalable schedulers for large compute clusters[C]//Proceedings of the 8th ACM European Conference on Computer Systems. ACM, 2013: 351-364.
[5] Boutin E, Ekanayake J, Lin W, et al. Apollo: scalable and coordinated scheduling for cloud-scale computing[C]//11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14). 2014: 285-300.
[6] Ousterhout K, Wendell P, Zaharia M, et al. Sparrow: distributed, low latency scheduling[C]//Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, 2013: 69-84.
[7] Mitzenmacher M. The power of two choices in randomized load balancing[J]. IEEE Transactions on Parallel and Distributed Systems, 2001, 12(10): 1094-1104.
[8] Ousterhout K, Wendell P, Zaharia M, et al. Batch sampling: low overhead scheduling for sub-second parallel jobs[J]. University of California, Berkeley, 2012.
[9] Georges J P, Divoux T, Rondeau E. Strict priority versus weighted fair queueing in switched Ethernet networks for time critical applications[C]//Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium. IEEE, 2005.
[10] Demers A, Keshav S, Shenker S. Analysis and simulation of a fair queueing algorithm[C]//ACM SIGCOMM Computer Communication Review. ACM, 1989, 19(4): 1-12.
[11] Delgado P, Dinu F, Didona D, et al. Eagle: a better hybrid data center scheduler[R]. Tech. Rep., 2016.
[12] Melnik S, Gubarev A, Long J J, et al. Dremel: interactive analysis of web-scale datasets[J]. Proceedings of the VLDB Endowment, 2010, 3(1-2): 330-339.
[13] Avi-Itzhak B, Levy H. On measuring fairness in queues[J]. Advances in Applied Probability, 2004: 919-936.
[14] Ghodsi A, Zaharia M, Hindman B, et al. Dominant resource fairness: fair allocation of multiple resource types[C]//NSDI. 2011, 11: 24-24.
[15] Jackson J R. Jobshop-like queueing systems[J]. Management Science, 1963, 10(1): 131-142.
[16] Kendall D G. Stochastic processes occurring in the theory of queues and their analysis by the method of the imbedded Markov chain[J]. The Annals of Mathematical Statistics, 1953: 338-354.
[17] Zaharia M, Borthakur D, Sen Sarma J, et al. Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling[C]//Proceedings of the 5th European Conference on Computer Systems. ACM, 2010: 265-278.
[18] Rasley J, Karanasos K, Kandula S, et al. Efficient queue management for cluster scheduling[C]//Proceedings of the Eleventh European Conference on Computer Systems. ACM, 2016: 36.
[19] Ren X, Ananthanarayanan G, Wierman A, et al. Hopper: decentralized speculation-aware cluster scheduling at scale[J]. ACM SIGCOMM Computer Communication Review, 2015, 45(4): 379-392.
[20] Delimitrou C, Sanchez D, Kozyrakis C. Tarcil: reconciling scheduling speed and quality in large shared clusters[C]//Proceedings of the Sixth ACM Symposium on Cloud Computing. ACM, 2015: 97-110.
[21] Panwalkar S S, Iskander W. A survey of scheduling rules[J]. Operations Research, 1977, 25(1): 45-61.
[22] Sparrow scheduling platform. https://github.com/radlab/sparrow.