Advanced Algorithms Course. Lecture Notes. Part 1
These notes are based on Kleinberg, Tardos, Algorithm Design and also
influenced by other material.
Many details are omitted (we suppose that you already have a good
basic understanding of algorithms), and no diagrams or calculation
examples are included.
The contents follow the lectures, but they may differ from exactly
what was said in class.
Approximation Algorithms
Load Balancing
Suppose that $n$ jobs with processing times $t_j$ have to be done, and
every job must be assigned to one of $m$ machines. Let $T_i$ be the load,
i.e., the total processing time of machine $i$. The goal is to compute a
fair assignment that minimizes the makespan $T := \max_i T_i$.
This problem is NP-complete already for m = 2 machines, as can be
shown by a simple polynomial-time reduction from Subset Sum. (We assume
that you already know that Subset Sum is NP-complete.) Therefore we look
for algorithms that give good approximate solutions in polynomial time.
A natural greedy algorithm passes through the jobs and assigns every
job to a machine with currently smallest load. Since the problem is
NP-complete, we cannot expect this to be always optimal, and in fact,
there are strikingly small explicit counterexamples: Consider m = 2
machines and processing times 3, 3, 2, 2, 2. Here the optimal makespan
is 6, whereas the greedy solution yields 7. Still this might be
acceptable in practice.
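The greedy rule above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name greedy_assign and the use of a binary heap to find a machine with the smallest load are not taken from the lecture, and ties are broken by machine index.

```python
import heapq

def greedy_assign(times, m):
    """Assign each job, in the given order, to a machine whose current
    load is smallest. Returns (makespan, jobs per machine)."""
    heap = [(0, i) for i in range(m)]        # entries are (load, machine index)
    assignment = [[] for _ in range(m)]
    for j, t in enumerate(times):
        load, i = heapq.heappop(heap)        # a machine with smallest load
        assignment[i].append(j)
        heapq.heappush(heap, (load + t, i))
    makespan = max(load for load, _ in heap)
    return makespan, assignment

# The counterexample from the text: m = 2, processing times 3, 3, 2, 2, 2.
makespan, assignment = greedy_assign([3, 3, 2, 2, 2], 2)
print(makespan)     # 7, whereas the optimum is 6 (3+3 and 2+2+2)
print(assignment)   # [[0, 2, 4], [1, 3]] (job indices per machine)
```

With the heap, each assignment costs O(log m) time, so the whole pass takes O(n log m).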
The question arises how far the greedy solution is from an optimal
solution in the worst case. An obvious way to analyze the quality of such
approximation algorithms (for minimization problems) is to prove a lower
bound for the optimal solution $T^*$ and an upper bound for the algorithmic
solution $T$. Then, clearly, $T/T^*$ is at most the ratio of these bounds.
For our Load Balancing problem, trivial lower bounds on $T^*$ are any
$t_j$, and $\frac{1}{m}\sum_j t_j$. Now we use them to prove $T \le 2T^*$.
The idea is to consider a machine $i$ that finally has the maximum load
$T_i = T$, and the job $j$ assigned last to this machine $i$. Before job
$j$ was assigned, all machines had a load at least $T_i - t_j$ (because
job $j$ has been assigned to machine $i$ with the currently smallest
load). It follows $\sum_k T_k \ge m(T_i - t_j)$, hence
$T - t_j \le \frac{1}{m}\sum_k T_k \le T^*$ (the loads sum up to the total
processing time), and finally $T = (T - t_j) + t_j \le T^* + T^*$.
This analysis is as good as it could be: There exist instances where $T$
is actually nearly $2T^*$. A nasty case is many short jobs (that the greedy
algorithm assigns in a balanced way) followed by one long job (that must
be assigned to one machine, thereby destroying the balance). The obvious
weakness of the greedy algorithm is that it considers the jobs in the given
order. We can easily improve that: First sort the jobs so that
$t_1 \ge \ldots \ge t_n$, then apply the greedy algorithm. In fact, we can
now prove a better approximation guarantee: $T \le 1.5T^*$.
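To make the nasty case concrete, here is a small Python sketch. The instance (m(m-1) jobs of length 1 followed by one job of length m) is our own choice of an instance of the described kind, and the function names are hypothetical.

```python
import heapq

def greedy_makespan(times, m):
    """Makespan of the greedy rule applied to the jobs in the given order."""
    loads = [0] * m                     # a list of zeros is already a valid heap
    for t in times:
        smallest = heapq.heappop(loads)
        heapq.heappush(loads, smallest + t)
    return max(loads)

def sorted_greedy_makespan(times, m):
    """Greedy rule applied after sorting the jobs by decreasing length."""
    return greedy_makespan(sorted(times, reverse=True), m)

m = 4
nasty = [1] * (m * (m - 1)) + [m]        # many short jobs, then one long job
print(greedy_makespan(nasty, m))         # 7 = 2m - 1
print(sorted_greedy_makespan(nasty, m))  # 4 = m, which is optimal here
```

On this instance the ratio (2m - 1)/m approaches 2 as m grows. Sorting does not make the greedy rule optimal in general: on the earlier example 3, 3, 2, 2, 2 with two machines it still returns 7.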
Again we try to find lower and upper bounds on $T^*$ and $T$, respectively.
As the algorithm became more clever, the (better) bounds are a little more
tricky as well. First we can suppose $m < n$, otherwise the Load Balancing
problem is trivial. Since at least two of the $m + 1$ longest jobs must be
put on the same machine, we have $T^* \ge 2t_{m+1}$. From now on we reuse
notations from the previous proof. If machine $i$ with the maximum final
load $T$ does job $j$ only, then the solution is optimal, since
$T = t_j \le T^*$. There remains the case that our algorithm has assigned
two or more jobs to machine $i$. Since job $j$ was assigned last to machine
$i$, we have $j \ge m + 1$. (Think a little: the first $m$ jobs are placed
on $m$ distinct, initially empty machines.) Hence
$t_j \le t_{m+1} \le 0.5T^*$. And the previous analysis, which is still
true, gave $T - t_j \le T^*$. Together this yields $T \le 1.5T^*$.
Center Selection
Let $S$ be a set of $n$ sites (points) in a metric space equipped with a
distance function dist, and $k$ a given number. The goal is to select a set
$C$ of $k$ centers (which are also points in the same metric space) so as to
minimize the maximum distance $r(C)$ of a site to the nearest center. Think
of placing shops, fire stations, radio stations, etc., in a region. We call
$r(C)$ the covering radius and define
$r^* = \min_C \max_{s \in S} \min_{c \in C} \mathrm{dist}(s, c)$, where $C$
varies over all sets of $k$ points.
In contrast to Load Balancing, it is not so obvious how to come up with
a good greedy approximation algorithm. But, surprisingly, a little
extra information would be extremely helpful: Assume for the moment that
a little bird comes and tells us the optimal value $r^*$ (but not the
solution $C$). As we will see, this would make the problem much easier.
(Of course, later we will have to drop this crazy assumption.) We say that
a center $c$ covers a site $s$, with covering radius $r$, if
$\mathrm{dist}(c, s) \le r$.
Now we devise a simple greedy algorithm and analyze it at the same
time. In the beginning all sites are uncovered. Consider any uncovered site
$s$. We know that, in an optimal solution, some center $c$ covers $s$.
Instead of this unknown $c$ we choose $s$ itself as a center! By the
triangle inequality, all sites covered by $c$ are also covered by $s$ if we
enlarge the covering radius to $2r^*$. We repeat this step until all sites
are covered (now with radius $2r^*$). Since the unknown optimal solution
needed $k$ centers, our greedy solution uses at most $k$ centers as well.
If some sites remain uncovered after $k$ steps,
this only indicates that our assumed value of $r^*$ was too small. Thus we
have an algorithm with approximation ratio 2, under the preliminary
assumption of a known optimal covering radius $r^*$.
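The preliminary algorithm might be sketched as follows in Python. The function name, the parameter r (standing for the value the little bird told us), and the choice of always taking the first uncovered site are our own assumptions. The example uses points in the plane with the Euclidean distance math.dist.

```python
import math

def greedy_centers_known_radius(sites, k, r, dist):
    """Pick uncovered sites as centers until every site is within 2*r of
    some center. Returns the list of centers, or None if more than k
    centers would be needed (which signals that r was too small)."""
    centers = []
    uncovered = list(sites)
    while uncovered:
        if len(centers) == k:
            return None                  # sites remain uncovered after k steps
        s = uncovered[0]                 # any uncovered site will do
        centers.append(s)
        uncovered = [p for p in uncovered if dist(s, p) > 2 * r]
    return centers

sites = [(0, 0), (1, 0), (10, 0), (11, 0)]
# For k = 2 the optimal radius is 0.5 (centers at (0.5, 0) and (10.5, 0)).
print(greedy_centers_known_radius(sites, 2, 0.5, math.dist))  # [(0, 0), (10, 0)]
print(greedy_centers_known_radius(sites, 2, 0.4, math.dist))  # None
```

Note that the chosen centers are sites, while the optimal centers in this example are not. This is where the factor 2 comes from.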
But in reality we do not have this extra information. So how do we
determine $r^*$? A tempting idea is binary search, but since the search
space consists of real numbers, it is not clear how many search steps we
would need. The trick is much simpler: Instead of doing binary search we
revise the above algorithm a little. Note that our preliminary algorithm may
choose, in each step, an arbitrary site $s$ whose distance to all centers
already selected is larger than $2r^*$. And this gives the idea:
In each step, we consider the site $s$ having the largest minimum distance
to all centers already selected. As long as some site is farther than
$2r^*$ from all selected centers, this farthest site is such a site, so the
above analysis is still correct. But we do not need prior knowledge of
$r^*$, since the modified algorithm does not use the value $r^*$ in any way.
Now we have a real algorithm for Center Selection, with approximation
ratio 2.
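A minimal Python sketch of this modified algorithm is given below. The function name is hypothetical, and we assume a nonempty list of sites, k at least 1, and that the first site in the list serves as the (arbitrary) first center. The list nearest stores, for every site, its distance to the nearest center selected so far.

```python
import math

def farthest_first_centers(sites, k, dist):
    """Select up to k sites as centers, always taking the site that is
    farthest from the centers chosen so far.
    Returns (centers, covering radius of these centers)."""
    centers = [sites[0]]                          # arbitrary first center
    nearest = [dist(s, centers[0]) for s in sites]
    while len(centers) < k:
        idx = max(range(len(sites)), key=nearest.__getitem__)
        if nearest[idx] == 0:                     # every site is already a center
            break
        centers.append(sites[idx])
        nearest = [min(d, dist(s, sites[idx])) for d, s in zip(nearest, sites)]
    return centers, max(nearest)

sites = [(0, 0), (1, 0), (10, 0), (11, 0)]
centers, radius = farthest_first_centers(sites, 2, math.dist)
print(centers, radius)   # [(0, 0), (11, 0)] 1.0
```

In this example the optimal covering radius is 0.5, so the computed radius 1.0 meets the factor 2 exactly. Each step updates all n distances once, so the sketch uses O(nk) distance computations.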