CME 323: Distributed Algorithms and Optimization, Lecture 2
http://stanford.edu/~rezab/dao.
Instructor: Reza Zadeh, Matroid and Stanford.
Definition 2.2 (Weakly Scalable) If SpeedUp(p, np) = T1,n / Tp,np = Ω(1), then our algorithm is
weakly scalable.
This metric characterizes the case where, for each processor we add, we add more data as well.
This is a useful metric in practice, because oftentimes the only time we can afford to add more
processors or machines is when we are burdened with more data than our infrastructure can handle.
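As a quick sanity check, consider the parallel sum algorithm from the previous lecture, assuming (as recalled from that lecture rather than derived here) that summing m numbers with p processors takes Θ(m/p + log p) time. Then

T1,n = Θ(n)   and   Tp,np = Θ(np/p + log p) = Θ(n + log p),

so

SpeedUp(p, np) = T1,n / Tp,np = Θ(n / (n + log p)) = Ω(1)   whenever log p = O(n).

That is, parallel sum is weakly scalable as long as the per-processor data size n grows at least as fast as log p.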
Definition 2.3 (Embarrassingly Parallel) When the DAG representing an algorithm has 0-depth,
the algorithm is said to be embarrassingly parallel.
That is, there is no dependency between our operations. It’s scalable in the most trivial sense,
e.g. flipping as many coins as possible at the same time.
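As a toy sketch of the coin-flipping example (using Python's multiprocessing purely as an illustration; any parallel framework would do, and the names here are not from the notes), each worker flips its own batch of coins and no task depends on another:

import random
from multiprocessing import Pool

def flip_batch(n):
    """Flip n fair coins and count heads; needs no input from any other task."""
    return sum(random.random() < 0.5 for _ in range(n))

if __name__ == "__main__":
    # Four independent batches: the computation DAG has depth 0.
    with Pool(processes=4) as pool:
        heads_per_batch = pool.map(flip_batch, [250_000] * 4)
    print(sum(heads_per_batch), "heads out of 1,000,000 flips")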
We note here that we have used Brent’s theorem to derive the scaling bounds for the parallel
sum algorithm. In the previous lecture, we alluded to the fact that Brent’s theorem assumes
optimal scheduling, which is NP-hard. Fortunately, the existence of a polynomial time constant
approximation algorithm for optimal scheduling implies that these bounds still hold.
Figure 1: An Embarrassingly Parallel DAG
2.2 Scheduling
In addition to building algorithms with low depth, clever scheduling is just as important to par-
allelism. Given a DAG of computations, at any level in the DAG there are a certain number of
computations that are ready to execute (at the same time). This number is not necessarily equal
to the number of processors available to you, so you need to decide how to assign computations
to processors; this is what is referred to as scheduling.
Ideally, you wish for all your processors to be busy; however, depending on how jobs are assigned
to processors, you might end up with processors that are idle. Depending on the size and
dependencies of the jobs to be scheduled, it may not be possible for all processors to be busy
all the time. This turns into an optimization problem in which we try to schedule jobs in a way
that minimizes the idle time of processors. This problem turns out to be NP-hard.
It is the scheduler’s task to assign jobs in such a way that minimizes the idle time of processors.
We could do this greedily, i.e., as soon as there is any computation ready to be done, we assign
it to a processor. Or we can look ahead in our DAG to see if we can plan more efficiently.
Spark has a scheduler. Every distributed computing setup has a scheduler. Your operating
system and your phone have schedulers. Every computer has processes, and every computer runs
them in parallel. Your computer might have fifty Chrome tabs open and must decide which one
to give priority to in order to optimize the performance of your machine.
An important problem in any parallel or distributed computing setting is figuring out how to
schedule jobs optimally: a scheduler must be able to assign sequential computations to processors
or machines so as to minimize the total time necessary to process all jobs.
Notation We assume that the processors are identical (i.e. each job takes the same amount of
time to run on any of the machines). More formally, we are given p processors and an unordered set
of n jobs with processing times J1 , . . . , Jn ∈ R. Say that the final schedule for processor i is defined
by a set of indices of jobs assigned to processor i. We call this set Si . The load for processor i is
therefore, Li = ∑_{k∈Si} Jk. The goal is to minimize the makespan, defined as Lmax = max_{i∈{1,...,p}} Li.
The intuition behind the greedy algorithm discussed here is simple: in order to minimize the
makespan we don’t want to give a job to a machine that already has a large load. Therefore, we
consider the following algorithm. Take the jobs one by one and assign each job to the processor
that has the least load at that time. This algorithm is simple and is online.
for each job that comes in (streaming) do
    Assign the job to the least-burdened machine
end
Algorithm 1: Simple scheduler
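For concreteness, here is a minimal Python sketch of Algorithm 1 (the function and variable names are illustrative, not from the notes); it keeps a min-heap of machine loads so each assignment takes O(log p) time:

import heapq

def greedy_schedule(jobs, p):
    """Assign each incoming job to the currently least-loaded of p machines.

    Illustrative sketch of Algorithm 1; returns per-machine loads and the makespan.
    """
    # Min-heap of (current load, machine index); every machine starts empty.
    heap = [(0, i) for i in range(p)]
    heapq.heapify(heap)
    loads = [0] * p

    for processing_time in jobs:              # jobs arrive one at a time (streaming)
        load, machine = heapq.heappop(heap)   # least-burdened machine
        loads[machine] = load + processing_time
        heapq.heappush(heap, (loads[machine], machine))

    return loads, max(loads)

if __name__ == "__main__":
    print(greedy_schedule([3, 5, 3, 1, 6], p=2))  # -> ([12, 6], 12); OPT here is 9, within the factor-2 bound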
Other variants of scheduling We note there are many other variants of scheduling. Jobs can
have dependencies, i.e. one job must finish before another job can start. Here, the problem is pre-
specified by a computational DAG that is known before the time of scheduling. Another variant
is that scheduling must happen online, i.e. jobs come at you in an order where you cannot look
into the future. As jobs come in, you have to schedule them, and you cannot go back and change
the schedule. For a comprehensive treatment of variants of scheduling, see Handbook of Scheduling.¹
In either of the above cases, where jobs have dependencies or must be scheduled online, the problem
is NP-hard, so we use approximation algorithms. We claim that the simple (greedy, online)
algorithm above actually has an approximation ratio of 2. In other words, in the worst case the
algorithm is 2 times worse than the optimal, which is fairly good. For this analysis, we define the
optimal makespan (the minimal makespan possible) to be OPT and compare the output of the
greedy algorithm to it. We also define Lmax as above to be the makespan of the greedy schedule.
Claim: Greedy algorithm has an approximation ratio of 2.
Proof: We first want to get a handle (a lower bound) on OPT. We know that the optimal makespan
must be at least the sum of the processing times of the jobs divided evenly among the p processors,² i.e.

OPT ≥ (1/p) ∑_{i=1}^{n} Ji .    (1)

A second lower bound is that OPT is at least as large as the processing time of the longest job:³

OPT ≥ maxi Ji .    (2)
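As a small numerical illustration (using a made-up job list, not one from the notes): for jobs with processing times 3, 5, 3, 1, 6 and p = 2 processors, bound (1) gives OPT ≥ 18/2 = 9 and bound (2) gives OPT ≥ 6, so OPT ≥ 9; indeed the split {3, 5, 1} and {3, 6} achieves makespan 9.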
¹ To give an idea of another variant, consider the case of distributed computing, where each machine houses a set
of local data, and shuffling data across the network is a bottleneck. We may consider scheduling jobs to machines
such that no data are shuffled; this is called locality-sensitive scheduling. We’ll consider this more in the latter part
of the course.
² To see this, assume toward contradiction that OPT is able to schedule jobs such that OPT < (1/p) ∑_{i=1}^{n} Ji. Suppose
that instead of assigning jobs to p processors in parallel, we assigned all the work to one processor sequentially.
Then the total time required is p · OPT < ∑_{i=1}^{n} Ji, which is a contradiction, since ∑_{i=1}^{n} Ji is exactly
the amount of work required to process all n jobs on a single processor.
³ The reason for this is simple: the longest job must be scheduled at some point to be run sequentially on one
processor, at which point it will require maxi Ji time to compute. There may be other processors which bottleneck
our makespan, but OPT must take at least as long as any single job, and in particular this holds for the
largest job.
Now consider running the greedy algorithm and identify the processor responsible for the makespan
of the greedy schedule (i.e. k = argmaxi Li). Let Jt be the processing time of the last job placed on
this processor. Before this last job was placed, the load of this processor was thus Lmax − Jt. By
the definition of the greedy algorithm, this processor must have had the least load at the moment
the last job was assigned. Therefore, every processor at that time must have had load at least
Lmax − Jt, i.e. Lmax − Jt ≤ L′i for all i, where L′i denotes the load of processor i at that moment.
Hence, summing this inequality over all i,
p(Lmax − Jt) ≤ ∑_{i=1}^{p} L′i ≤ ∑_{i=1}^{p} Li = ∑_{i=1}^{n} Ji .    (3)
For the second inequality, note that although Jt was the last job placed on the bottleneck processor,
there may still have been other jobs yet to be assigned at that moment; hence the total load placed
on each machine cannot decrease between that moment and the end of the algorithm. The last
equality comes from the fact that the sum of the loads must equal the sum of the processing times
of the jobs. Rearranging the terms in this expression lets us write:
Lmax ≤ (1/p) ∑_{i=1}^{n} Ji + Jt .    (4)
Now, note that our greedy algorithm has makespan exactly equal to Lmax, by definition. Combining
equation (4) with equations (1) and (2) and the fact that Jt ≤ maxi Ji, we get that our greedy
approximation algorithm has makespan

APX = Lmax ≤ OPT + OPT = 2 × OPT.    (5)
This shows that the greedy algorithm produces a schedule whose makespan is no more than 2 times
the optimal.
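To see that the factor of 2 is essentially tight for the greedy rule (a standard example, not worked out in these notes), consider p machines, p(p − 1) jobs of length 1 arriving first, followed by a single job of length p. Greedy spreads the unit jobs evenly, giving every machine load p − 1, and then the long job lands on one of them, so Lmax = (p − 1) + p = 2p − 1. The optimal schedule instead reserves one machine for the long job and spreads the unit jobs over the remaining p − 1 machines, so OPT = p. The ratio is (2p − 1)/p = 2 − 1/p, which approaches 2 as p grows.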
What if we could see the future? We note that if we first sort the jobs in descending order
and assign the larger jobs first, a straightforward analysis gives a 3/2 approximation. The intuition
is that if we schedule the large jobs first, we can use the smaller jobs to “fill in the gaps” that
remain, i.e. to balance the loads. The same algorithm with a tighter analysis gives a 4/3
approximation. We’ll see later in the course how Spark uses lazy evaluation for exactly this reason:
by deferring computation until the user takes an action, Spark can reorder jobs to obtain a more
efficient schedule.
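A sketch of the sorted variant (sometimes called longest processing time first); it differs from the greedy scheduler above only in sorting the jobs before assigning them, and again the names are illustrative rather than from the notes:

import heapq

def sorted_greedy_schedule(jobs, p):
    """Sort jobs in descending order, then assign each to the least-loaded machine.

    Illustrative sketch of the sorted (offline) variant discussed above.
    """
    heap = [(0, i) for i in range(p)]
    heapq.heapify(heap)
    loads = [0] * p
    # Sorting requires knowing all jobs up front, so this variant is offline.
    for processing_time in sorted(jobs, reverse=True):
        load, machine = heapq.heappop(heap)
        loads[machine] = load + processing_time
        heapq.heappush(heap, (loads[machine], machine))
    return loads, max(loads)

if __name__ == "__main__":
    print(sorted_greedy_schedule([3, 5, 3, 1, 6], p=2))  # -> ([9, 9], 9); sorting recovers the optimum here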
What’s realistic? It may seem that our above assumption, that we can look into the future and
know which jobs are going to be scheduled, is quite unreasonable. In reality, we don’t even know
how long each job will take. However, we often have a pretty good idea (based on historical data
or expectations) of how long a particular job will take to run. Further, we may have statistics
or expectations on how many jobs of a particular type are going to come in. Hence, it may not be
such an unrealistic scenario to know (within a certain tolerance) the expected amount of time each
job will take, as well as what jobs might be in the pipeline.
We close with a problem to think about: AllPrefixSum. Given an array A, AllPrefixSum(A) is the
array whose k-th entry is the sum of the first k elements of A (starting with 0 for the empty prefix).
As an example, if A = [3, 5, 3, 1, 6], then AllPrefixSum(A) = [0, 3, 8, 11, 12, 18].
This feels like an inherently sequential task. The obvious way to do it on a single machine is to
keep a running sum and write out the intermediate sums as we iterate through the array, in linear
time. However, this does not parallelize at all. How can we parallelize it so that it has low depth?
We’ll take a look at this problem in more detail next lecture. For now, try to come up with your
own algorithm.
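For reference, a sketch of the sequential running-sum baseline described above (the function name is illustrative); it does linear work but has depth n, since every step depends on the previous one:

def all_prefix_sum(values):
    """Sequential AllPrefixSum: keep a running sum and record it after adding each element."""
    sums = [0]            # sum of the empty prefix
    running = 0
    for v in values:
        running += v
        sums.append(running)
    return sums

if __name__ == "__main__":
    print(all_prefix_sum([3, 5, 3, 1, 6]))  # -> [0, 3, 8, 11, 12, 18]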