Algorithm Engineering
Selected Results and Surveys
Lecture Notes in Computer Science 9220
Commenced Publication in 1973
Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen
Editorial Board
David Hutchison
Lancaster University, Lancaster, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Friedemann Mattern
ETH Zurich, Zurich, Switzerland
John C. Mitchell
Stanford University, Stanford, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
TU Dortmund University, Dortmund, Germany
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max Planck Institute for Informatics, Saarbrücken, Germany
More information about this series at http://www.springer.com/series/7407
Lasse Kliemann and Peter Sanders (Eds.)
Algorithm Engineering
Selected Results and Surveys
Editors
Lasse Kliemann, Kiel University, Kiel, Germany
Peter Sanders, Karlsruhe Institute of Technology, Karlsruhe, Germany
which started in 2007 and lasted six years. In total, 28 projects received funding
through this program. In addition there were six associated projects. We gratefully
acknowledge this support.
Each submission for this volume was peer-reviewed. We sincerely thank the authors
and the reviewers for their work, diligence, and cooperation.
In total, we have 12 chapters, including extensive surveys and case studies:
Chapter 1 presents and analyzes a simple but powerful stochastic local search
algorithm for the SAT problem. Experiments are used for tuning and for
comparison with other algorithms. It is concluded that when flipping a variable,
it is more important to pay attention to the number of newly unsatisfied clauses
than to the number of newly satisfied ones.
Chapter 2 is a survey on practical algorithms for routing in transportation networks,
including road networks, schedule-based public transport networks, as well as
multimodal scenarios. Experiments show that it is possible to find good jour-
neys within milliseconds in large-scale networks. Several of the described
approaches have been included in mainstream production systems.
Chapter 3 surveys different ways to theoretically analyze the k-means clustering
algorithm. Several of the theoretical activities, e.g., smoothed analysis, were
motivated by observations in experiments.
Chapter 4 surveys practical algorithms for balanced graph partitioning. A large
variety of different approaches are presented, and implementation aspects and
benchmarking are discussed.
Chapter 5 In randomized and derandomized rounding, many applications require
that the solution satisfy certain constraints with probability one. Two very
different algorithmic approaches exist for obtaining such solutions, which
nevertheless have very similar theoretical properties. This chapter surveys
theoretical foundations, experimental studies, and applications for these two
original approaches and for new ones derived from them.
Chapter 6 is a review of external-memory search for state space problems, giving
detailed descriptions of algorithms and data structures, complemented by
concrete examples. Implementation on a GPU is discussed and speedups are
substantiated by experiment.
Chapter 7 presents a framework for the development and evaluation of real-time
rendering algorithms. A central concept is a meta rendering algorithm that
automatically selects an algorithm for the visualization of highly complex
scenes.
Chapter 8 applies the Algorithm Engineering cycle of design, analysis, imple-
mentation, and experimentation to robust optimization. In such problems, the
exact data is not known but bounded by a set of possible realizations. The
importance of considering real-world applications is demonstrated.
Chapter 9 gives a survey on concepts and algorithms for finding clusters in net-
works that change over time. Data sets for experimentation, comprised of
real-world and synthetic data, are thoroughly discussed.
1 Introduction
The SAT problem is one of the most studied NP-complete problems in computer
science. One reason is the wide range of SAT's practical applications, ranging
from hardware verification to planning and scheduling. Given a propositional
formula in CNF over variables {x_1, ..., x_n}, the SAT problem consists of finding
an assignment of the variables such that all clauses are satisfied.
Stochastic local search (SLS) solvers operate on complete assignments and
try to find a solution by flipping variables according to a given heuristic. Most
SLS solvers are based on the following scheme: Initially, a random assignment
is chosen. If the formula is satisfied by the assignment the solution is found. If
not, a variable is chosen according to a (possibly probabilistic) variable selection
heuristic, which we refer to as pickVar in the following. The heuristics mostly depend on some
score, which counts the number of satisfied/unsatisfied clauses, as well as on other
aspects such as the “age” of variables. It was widely believed that a good
flip heuristic must be designed in a very sophisticated way to obtain a really
efficient solver. We show in the following that it is worthwhile to “come back to
the roots”, since a very elementary and (as we think) elegant design principle
for the pickVar heuristic, based solely on probability distributions, will do the job
extraordinarily well.
It is especially popular (and successful) to pick the flip variable from an
unsatisfied clause. This is called focused local search in [14]. In each round, the
selected variable is flipped and the process starts over again until a solution is
eventually found.
Most important for the flip heuristic seems to be the score of an
assignment, i.e. the number of satisfied clauses. Considering the process of flip-
ping one variable, we get the relative score change produced by a candidate
variable for flipping as: (score after flipping minus score before flipping) which
is equal to make minus break. Here make means the number of newly satisfied
clauses which come about by flipping the variable, and break means the number
of clauses which become false by flipping the respective variable. To be more
precise, we will denote make(x, α) and break(x, α) as functions of the respec-
tive flip variable x and the actual assignment α (before flipping). Notice that
in case of focused flipping mentioned above the value of make is always
at least 1.
Most of the SLS solvers so far, if not all, follow the strategy that whenever
the score improves by flipping a certain variable from an unsatisfied clause, they
will indeed flip this variable without resorting to probabilistic decisions. Only if
no improvement is possible, as is the case in local minima, is a probabilistic
strategy applied. The winner of the SAT Competition 2011 category random
SAT, Sparrow, mainly follows this strategy but when it comes to a probabilistic
strategy it uses a probability distribution function [2]. The probability distribu-
tion in Sparrow is defined as an exponential function of the score value. In this
chapter we analyze several simple SLS solvers which are based only on proba-
bility distributions.
We propose a new class of solvers here, called probSAT, which base their prob-
ability distributions for selecting the next flip variable solely on the make and
break values, but not necessarily on the value of the score = make − break, as
was the case in Sparrow. Our experiments indicate that the influence of make
should be kept rather weak – it is even reasonable to ignore make completely,
like in implementations of WalkSAT [13]. The role of make and break in these
SLS-type algorithms should be seen in a new light. The new type of algorithm
presented here can also be applied for general constraint satisfaction problems
and works as follows.
Algorithm 1. ProbSAT
Input: formula F, maxTries, maxFlips
Output: satisfying assignment α or UNKNOWN
for i = 1 to maxTries do
    α ← randomly generated assignment
    for j = 1 to maxFlips do
        if α is a model for F then
            return α
        Cu ← randomly selected unsatisfied clause
        for x in Cu do
            compute f(x, α)
        var ← random variable x chosen with probability f(x, α) / Σ_{z ∈ Cu} f(z, α)
        α ← flip var in α
return UNKNOWN
The idea here is that the function f should give a high value to variable
x if flipping x seems to be advantageous, and a low value otherwise. Using f
the probability distribution for the potential flip variables is calculated. The flip
probability for x is proportional to f(x, α). Letting f be a constant function
leads in the k-SAT case to the probabilities (1/k, ..., 1/k), morphing the probSAT
algorithm into the random walk algorithm that is theoretically analyzed in [15]. In
all our experiments with various functions f we made f depend on break(x, α)
and possibly on make(x, α), but on no other properties of x and α, nor on the
history of the previous search course. In the following we analyze experimentally the effect
of several functions to be plugged in for f .
f(x, α) = cm^make(x,α) / cb^break(x,α)
The parameters of the function are cb and cm. Because of the exponential
functions used here (think of c^x = e^(x/T)), this is reminiscent of the way Metropolis-
like algorithms (see [17]) select a variable. Also, this is similar to the Softmax
probabilistic decision for actions in reinforcement learning [19]. We call this the
exp-algorithm. The separation into the two base constants cm and cb will allow
us to find out whether there is a different influence of the make and the break
value – and there is one, indeed.
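To make the selection step concrete, the exp-based pickVar rule can be sketched as follows (a minimal C++ illustration, not the original probSAT code; the data layout is assumed, and the poly variant discussed below would only change the formula for f):

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Sketch of the exp-based pickVar step: choose a variable of the selected
// unsatisfied clause with probability proportional to
//   f(x, alpha) = cm^make(x, alpha) / cb^break(x, alpha).
// makeVal[i] and breakVal[i] hold make/break of the i-th variable of the clause.
int pickVarExp(const std::vector<int>& clauseVars,
               const std::vector<int>& makeVal,
               const std::vector<int>& breakVal,
               double cm, double cb, std::mt19937& rng) {
    std::vector<double> f(clauseVars.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < clauseVars.size(); ++i) {
        f[i] = std::pow(cm, makeVal[i]) / std::pow(cb, breakVal[i]);
        sum += f[i];
    }
    // Draw r uniformly from [0, sum) and walk along the prefix sums.
    double r = std::uniform_real_distribution<double>(0.0, sum)(rng);
    std::size_t i = 0;
    while (i + 1 < clauseVars.size() && (r -= f[i]) > 0.0) ++i;
    return clauseVars[i];
}
```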
It seems reasonable to try to maximize make and to minimize break. There-
fore, we expect cm > 1 and cb > 1 to be good choices for these parameters.
Actually, one might expect that cm should be identical to cb, so that the above
formula simplifies to c^(make−break) = c^score for an appropriate parameter c.
To get a picture on how the performance of the solver varies for different
values of cm and cb , we have done a uniform sampling of cb ∈ [1.0, 4.0] and
of cm ∈ [0.1, 2.0] for this exponential function and of cm ∈ [−1.0, 1.0] for the
polynomial function (see below). We have then run the solver with the different
parameter settings on a set of randomly generated 3-SAT instances with 1000
variables at a clause to variable ratio of 4.26. The cutoff limit was set to 10 s.
As a performance measure we use PAR10: penalized average runtime, where a
timeout of the solver is penalized with 10·(cutoff limit). A parameter setting
where the solver is not able to solve anything has a PAR10 value of 100 in
our case.
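For illustration (a minimal sketch with made-up runtimes, not code from our evaluation framework), PAR10 can be computed as follows:

```cpp
#include <vector>

// PAR10 over a set of runs: a run that hit the cutoff counts as 10 * cutoff.
// With a 10 s cutoff, a setting that solves nothing therefore gets PAR10 = 100.
double par10(const std::vector<double>& runtimes, double cutoff) {
    double sum = 0.0;
    for (double t : runtimes)
        sum += (t >= cutoff) ? 10.0 * cutoff : t;
    return sum / runtimes.size();  // assumes at least one run
}
```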
In the case of 3-SAT a very good choice of the parameters is cb > 1 (as
expected) and cm < 1 (totally unexpected), for example, cb = 3.6 and cm = 0.5
(see Fig. 1, upper left diagram, and the survey in Table 1); parameter values close to these perform similarly well.
Fig. 1. Parameter space performance plot: The left plots show the performance
of different combinations of cb and cm for the exponential (upper left corner) and the
polynomial (lower left corner) functions. The darker the area the better the runtime of
the solver with those parameter settings. The right plots show the performance variation
if we ignore the make values (corresponding to the cuts in the left plots) by setting cm = 1
for the exponential function and cm = 0 for the polynomial function.
Our experiments showed that the exponential decay in probability with growing
break value might be too strong in the case of 3-SAT. The above formulas have
an exponential decay in probability comparing different (say) break values. The
relative decay is the same when we compare break = 0 with break = 1, and
when we compare, say, break = 5 with break = 6. A “smoother” function for
high values would be a polynomial decay function. This led us to consider the
following 2-parameter function (with ε = 1 in all experiments):
f(x, α) = (make(x, α))^cm / (ε + break(x, α))^cb
We call this the poly-algorithm. The best parameters in case of 3-SAT turned
out to be cm = −0.8 (notice the minus sign!) and cb = 3.1 (See Fig. 1, lower
part). In the interval cm ∈ [−1.0, 1.0] the optimal choice of parameters can be
described by the linear function cb + 0.9cm = 2.3. Without harm one can set
cm = 0, and then take cb = 2.3, and thus ignore the make value completely.
Ignoring the make value (i.e., setting cm = 0) brings us to the break-only function f(x, α) = (ε + break(x, α))^(−cb).
As mentioned above, in both cases, the exp- and the poly-algorithm, it was a
good choice to ignore the make value completely (by setting cm = 1 in the
exp-algorithm, or by setting cm = 0 in the poly-algorithm). This corresponds to
the vertical lines in Fig. 1, left diagrams. Nevertheless, the optimal choice in
both cases was to set cm = 0.5 and cb = 3.6 in the case of the exp-algorithm
(and similarly for the poly-algorithm). We have
0.5^make / 3.6^break ≈ 3.6^(−(break + make/2)).
This can be interpreted as follows: instead of the usual score = make − break a
better score measure is −(break + make/2).
The value of cb determines the greediness of the algorithm. We concentrate
on cb in this discussion since it seems to be the more important parameter.
The higher the value of cb , the more greedy is the algorithm. A low value of
cb (in the extreme, cb = 1 in the exp-algorithm) morphs the algorithm to a
random walk algorithm with flip probabilities (1/k, ..., 1/k) like the one considered
in [15]. Examining Fig. 1, almost a phase-transition can be observed. If cb falls
under some critical value, like 2.0, the expected run time increases tremendously.
Turning towards the other side of the scale, increasing the value of cb , i.e. making
the algorithm more greedy, also degrades the performance but not with such an
abrupt rise of the running time as in the other case. These observations have
also been made empirically by Hoos in [9], where he proposed to approximate
the noise value from above rather than from below.
All random instances used in our settings are uniform random k-SAT problems
generated with different clause to variable ratios, denoted with r. The class
of random 3-SAT problems is the best studied class of random problems and
for this reason we have four different sets of 3-SAT instances.
The 5-SAT and 7-SAT problems used in our experiments come from [21]: 5sat500
(500 variables at r = 20) and 7sat90 (90 variables at r = 85). The 3sat1k,
3sat10k, 5sat500 and 7sat90 instance classes are divided into two equal sized
classes called train and test. The train set is used to determine good parameters
for cb and cm and the second class is used to report the performance. Further
we also include the set of satisfiable random and crafted instances from the SAT
Competition 2011.
1. www.satcompetition.org
The problem that every solver designer is confronted with is the determination
of good parameters for their solvers. We have avoided accomplishing this task by
manual tuning and have instead used an automated procedure.
As our parameter search space is relatively small, we have opted to use a mod-
ified version of the iterated F-Race [5] configurator, which we have implemented
in Java. The idea of F-race is relatively simple: good configurations should be
evaluated more often than poor ones which should be dropped as soon as possi-
ble. F-Race uses a familywise Friedman test (see Test 25 in [18] for more details
about the test) to check if there is a significant performance difference between
solver configurations. The test is conducted every time the solvers have run on
an instance. If the test is positive, poor configurations are dropped, and only
the good ones are further evaluated. The configurator ends when the number of
solvers left in the race is less than 2 times the number of parameters or if there
are no more instances to evaluate on.
To determine good values for cb and cm we have run our modified version
of F-Race on the training sets 3sat1k, 3sat10k, 5sat500 and 7sat90. The cutoff
time for the solvers was set to 10 s for 3sat1k and to 100 s for the rest. The best
parameter values returned by this procedure are listed in Table 1. Values for the
class of 3sat1k problems were also included, because the preliminary analysis of
the parameter search space was done on this class. The best parameter of the
break-only-exp-algorithm for k-SAT can be roughly described by the formula
cb = k^0.8.
Table 1. Parameter setting for cb and cm : Each cell represents a good setting for
cb and cm dependent on the function used by the solver. Parameter values close to
these values have similar good performance.
4 Empirical Evaluation
In the second part of our experiments we compare the performance of our solvers
to that of the SAT Competition 2011 winners and also to WalkSAT [13]. An
additional comparison to a survey propagation algorithm will show how far our
probSAT local search solver can get.
Soft- and Hardware. The solvers were run on a part of the bwGrid clusters [8]
(Intel Harpertown quad-core CPUs with 2.83 GHz and 8 GByte RAM). The
operating system was Scientific Linux. All experiments were conducted with
EDACC, a platform that distributes solver execution on clusters [1].
The Competitors. The WalkSAT solver is implemented within our own code
basis. We use our own implementation and not the original code (version 48)
provided by Henry Kautz (see footnote 2), because our implementation is approximately 1.35
times faster (see footnote 3).
We have used version 1.4 of the survey propagation solver provided by
Zecchina (see footnote 4), which was adapted to conform to the DIMACS format. For all other solvers
we have used the binaries from the SAT Competition 2011 (see footnote 5).
Results. We have evaluated our solvers and the competitors on the test set of the
instance sets 3sat1k, 3sat10k, 5sat500 and 7sat90 (note that the training set was
used only for finding good parameters for the solvers). The parameter setting
for cb and cm are those from Table 1 (in case of 3-SAT we have always used the
parameters for 3sat10k). The results of the evaluations are listed in Table 2.
On the 3-SAT instances, the polynomial function yields the overall best per-
formance. On the 3-SAT competition set all of our solver variants exhibited the
most stable performance, being able to solve all problems within cutoff time.
The survey propagation solver has problems with the 3sat10k and the 3sat-
Comp problems (probably because of the relatively small number of variables).
The performance of the break-only-poly solver remains surprisingly good
even on the 3satExtreme set, where the number of variables reaches 5·10^5 (ten
times larger than that from the SAT Competition 2011). From the class of SLS
solvers it exhibits the best performance on this set and is only approx. 2 times
slower than survey propagation. Note that a value of cb = 2.165 for the break-
only-poly solver further improved the runtime of the solver by approximately
30 % on the 3satExtreme set.
2. http://www.cs.rochester.edu/u/kautz/walksat/
3. The latest version 50 of WalkSAT has been significantly improved, but was not available at the time we performed the experiments.
4. http://users.ictp.it/~zecchina/SP/
5. http://www.cril.univ-artois.fr/SAT11/solvers/SAT2011-static-binaries.tar.gz
Table 2. Evaluation results: Each cell represents the PAR10 (Penalized average
runtime with penalization factor 10 - every unsuccessful run is penalized with 10 times
the maximum runtime.) runtime and the number of successful runs for the solvers on
the given instance set. Results are highlighted if the solver succeeded in solving all
instances within the cutoff time, or if it has the best PAR10 runtime. Cutoff times are
600 s for 3sat10k, 5sat500 and 7sat90, and 5000 s for the rest. Blank cells indicate
that we have no parameter setting worth evaluating.
[Figure 2 plot: curves for Sparrow2011, EagleUP, WalkSAT adapt, sattime2011, and probSAT adapt; y-axis CPU time (s), x-axis number of solved instances.]
Fig. 2. Results on the “large” set of the SAT Competition 2011 random instances
represented as a cactus plot. The x-axis represents the number of problems a solver
was able to solve ordered by runtime; the y-axis is the runtime. The lower a curve
(low runtimes) and the more it gets to the right (more problems solved) the better the
solver.
Table 3. Results on the crafted satisfiable instances: Each cell reports the
number of solved instances within the cutoff time (5000 s). The first line shows the
results on the original instances and the second on the preprocessed instances.
The high greediness level needed for WalkSAT and probSAT to solve the
crafted instances indicates that these instances might be more similar to the
7-SAT instances (generally to higher k-SAT). A confirmation of this conjec-
ture is that Sparrow with fixed parameters for 7-SAT instances could solve 103
instances vs. 104 in the default setting (which adapts the parameters according
to the maximum clause length found in the problem). We suppose that improv-
ing SLS solvers for random instances with large clause length would also yield
improvements for non-random instances.
To check whether the performance of SLS solvers can be improved by pre-
processing the instances first, we have run the preprocessor of lingeling [4],
which incorporates all main preprocessing techniques, to simplify the instances.
The results unfortunately show the contrary of what we expected (see
Table 3). None of the SLS solvers could benefit from the preprocessing step,
solving equally many or fewer instances. These results motivated a more detailed
analysis of preprocessing techniques, which was performed in [3]. It turns out that
bounded variable elimination, which eliminates variables through resolution
up to a certain bound, is a good preprocessing technique for SLS solvers and can
indeed improve their performance.
[Figure 3 plot: curves for Sparrow2011, sattime2011, sattime2012, SATzilla2012 Rand, EagleUP, CCASat, pfolioUZK, ppfolio2012, SAT Solver Selector, SATzilla2012 All, probSAT, and WalkSAT; y-axis CPU time (s), x-axis number of solved instances.]
Fig. 3. Results of the best performing solvers on the SAT Challenge 2012 random
instances as a cactus plot. For details about cactus plot see Fig. 2.
k     function   cb     ε
3     poly       2.06   0.9
4     exp        2.85   –
5     exp        3.7    –
6     exp        5.1    –
≥ 7   exp        5.4    –
where k is the size of the longest clause found in the problem during pars-
ing. These parameter values have been determined in different configuration
experiments.
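In code, this default-parameter selection might look as follows (a sketch; the struct and function name are illustrative, the values are those from the table above):

```cpp
// Default parameters of probSAT chosen from the maximum clause length k
// found during parsing (values as in the table above; eps is only used
// by the polynomial function).
struct ProbSatParams { bool usePoly; double cb; double eps; };

ProbSatParams defaultParams(int k) {
    if (k <= 3) return {true, 2.06, 0.9};   // poly, cb = 2.06, eps = 0.9
    if (k == 4) return {false, 2.85, 0.0};  // exp
    if (k == 5) return {false, 3.7, 0.0};   // exp
    if (k == 6) return {false, 5.1, 0.0};   // exp
    return {false, 5.4, 0.0};               // exp, k >= 7
}
```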
All array data structures are terminated by a sentinel (see footnote 6), i.e., the last element
in the array is a stop value; in our case we have used 0. All for-loops have
been changed into while-loops that have no counter but only a sentinel check,
allowing us to save several memory dereferences and variables. As most of the
operations performed by SLS solvers are loops over small arrays,
this optimization turns out to improve the performance of the solver by
10%–25% (depending on the instance). A sketch of the idea is shown below.
6. We would like to thank Armin Biere for this suggestion.
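The following sketch contrasts a counter-based loop over the literals of a clause with the sentinel-terminated variant (illustrative code, assuming non-zero literals and a trailing 0 sentinel as in the DIMACS convention; not the original solver source):

```cpp
// A literal is a non-zero integer: +v means variable v, -v its negation.
inline bool isTrue(int lit, const bool* value) {
    return lit > 0 ? value[lit] : !value[-lit];
}

// Counter-based loop: needs the clause length and an index variable.
int countTrueLiteralsCounted(const int* lits, int len, const bool* value) {
    int cnt = 0;
    for (int i = 0; i < len; ++i)
        if (isTrue(lits[i], value)) ++cnt;
    return cnt;
}

// Sentinel-based loop: the array ends with 0, so neither the length nor a
// counter is needed, saving an index variable and memory dereferences.
int countTrueLiteralsSentinel(const int* lits, const bool* value) {
    int cnt = 0;
    for (; *lits != 0; ++lits)
        if (isTrue(*lits, value)) ++cnt;
    return cnt;
}
```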
[Figure 4 plot: curves for probSAT SC13, WalkSATlm2013, CScoreSAT2013, vflipnum, FrwCB2013, CCA2013, BalancedZ, Ncca+, and sattime2013; y-axis CPU time (s), x-axis number of solved instances.]
Fig. 4. Results of the best performing solvers on the SAT Competition 2013 random
satisfiable instances.
probSAT solved 81 out of the 109 instances that could be solved by any solver. Altogether this
shows that the solving approach (and the parameter settings) used by probSAT
has an overall good performance.
In principle, WalkSAT [13] also uses a certain pattern of probabilities for flipping
one of the variables within a non-satisfied clause. But the probability distribution
does not depend on a single continuous function f as in our algorithms described
above, but uses some non-continuous if-then-else decisions as described in [13].
In Table 5 we compare the flipping probabilities in WalkSAT (setting the wp
parameter, i.e., the noise value, to wp = 0.567) with the break-only-poly-algorithm
(with cb = 2.06 and ε = 0.9), using several examples of break-value combinations
that might occur within a 3-CNF clause.
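As an illustration (the break values below are made up and not taken from Table 5), the break-only-poly flip probabilities for one clause are obtained by normalizing f(x) = (ε + break(x))^(−cb):

```cpp
#include <cmath>
#include <cstdio>

// Flip probabilities of the break-only-poly rule (cb = 2.06, eps = 0.9) for
// one 3-CNF clause; the break values are illustrative, not taken from Table 5.
int main() {
    const double cb = 2.06, eps = 0.9;
    const int breakVal[3] = {0, 1, 2};
    double f[3], sum = 0.0;
    for (int i = 0; i < 3; ++i) {
        f[i] = std::pow(eps + breakVal[i], -cb);
        sum += f[i];
    }
    for (int i = 0; i < 3; ++i)
        std::printf("break=%d -> p=%.3f\n", breakVal[i], f[i] / sum);
    // The variable with break = 0 dominates: roughly p = 0.77, 0.16, 0.07.
    return 0;
}
```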
Even though the probabilities look very similar, we think that the small
differences render our approach more robust. Further, probSAT has the
PAC property [10, p. 153]: in each step every variable has a probability greater
than zero of being picked for flipping. This is, however, not the case for WalkSAT. A variable
occurring in a clause where another variable has a score of zero cannot be
chosen for flipping. There is no published example for which WalkSAT gets
trapped in cycles. However, during a talk given by Donald Knuth in Trento at the
SAT Conference in 2012, where he presented details about his implementation
of WalkSAT, he mentioned that Bram Cohen, the designer of WalkSAT, has
provided such an example.
6 Implementation Variations
In the previous sections we have compared the solvers based on their runtime. As
a consequence the efficiency of the implementation plays a crucial role and the
best available implementation should be taken for comparison. Another possible
comparison measure is the number of flips the solver needs to perform to find
a solution. From a practical point of view this is not optimal. The number
of flips per second (denoted flips/sec) is a key measure of SLS solvers
when it comes to comparing algorithm implementations or two similar
algorithms. In this section we address this problem by comparing
two different implementations of probSAT and WalkSAT on a set of very large
3-SAT problems.
All efficient implementations of SLS solvers compute the scores of variables
from scratch only in the initialization phase. During the search, the scores are
only updated. This is possible because only the scores of variables in the
neighborhood of the flipped variable can change. This method is also known as
caching (the scores of the variables are being cached) in [10, p. 273] or as the
incremental approach in [7].
The other method would be to compute the score of variables on the fly before
taking them into consideration for flipping. This method is called non-caching
or the non-incremental approach. In the case of probSAT or WalkSAT, only the scores of
variables from one single clause have to be computed, as opposed to other solvers
where all variables from all unsatisfied clauses are taken into consideration for
flipping.
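A minimal sketch of the non-caching (on-the-fly) computation of break for one variable (illustrative data layout; the caching variant would instead maintain a break counter per variable and update it after each flip):

```cpp
#include <vector>

// On-the-fly computation of break(x, alpha). Assumed layout:
//   value[v]   : current truth value of variable v
//   occ[lit]   : clauses containing literal lit (encoded as 2*v / 2*v+1)
//   numTrue[c] : number of satisfied literals in clause c (maintained anyway
//                to keep the list of unsatisfied clauses up to date)
int breakOnTheFly(int v,
                  const std::vector<bool>& value,
                  const std::vector<std::vector<int>>& occ,
                  const std::vector<int>& numTrue) {
    // The literal of v that is true under the current assignment.
    int trueLit = 2 * v + (value[v] ? 0 : 1);
    int breakCnt = 0;
    for (int c : occ[trueLit])
        if (numTrue[c] == 1)  // v's literal is the only one satisfying c
            ++breakCnt;
    return breakCnt;
}
```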
We have implemented two different versions of probSAT and WalkSAT within
the same code basis (i.e. the solvers are identical with exception of the pickVar
method), one that uses caching and one that does not. We have evaluated the
four different solvers on a set of 100 randomly generated 3-SAT problems with
10^6 variables and a ratio of 4.2. The results can be seen in Fig. 5.
[Figure 5 plot: curves for probSAT caching, probSAT non-caching, WalkSAT (UBCSAT), WalkSAT non-caching, and WalkSAT caching; y-axis CPU time (s), x-axis number of solved instances.]
Within the time limit of 1.5·10^4 s only the variants not using caching were able
to solve all problems. The implementations with caching solved only 72 (probSAT)
and 65 (WalkSAT) instances, respectively. Note that all solvers started with
the same seed (i.e., they search along exactly the same trajectory).
The difference in performance between the implementations
can be explained by the different number of flips/sec. While the version with
caching performs around 1.4·10^5 flips/sec, the version without caching is able to
perform around 2.2·10^5 flips/sec. This explains the difference in runtime between
the two different implementations. Similar findings have also been observed in
[20, p. 27] and in [7].
The advantage of non-caching decreases with increasing k (for randomly
generated k-SAT problems) and even becomes a disadvantage for 5-SAT problems
and upwards. As a consequence the latest version of probSAT uses caching for
3-SAT problems and non-caching for the other types of problems.
We introduced a simple algorithmic design principle for an SLS solver which does
its job without heuristics and “tricks”. It just relies on the concept of a probability
distribution and focused search. It is nevertheless flexible enough to allow plugging
in various functions f which guide the search.
Using this concept we were able to discover an asymmetry regarding the
importance of the break and make values: the break value is the more important
one; one can even do without the make value completely.
Acknowledgments. We would like to thank the BWGrid [8] project for providing
the computational resources. This project was funded by the Deutsche Forschungsge-
meinschaft (DFG) under the number SCHO 302/9-1. We thank Daniel Diepold and
Simon Gerber for implementing the F-race configurator and providing different analy-
sis tools within the EDACC framework. We would also like to thank Andreas Fröhlich
for fruitful discussions on this topic and Armin Biere for helpful suggestions regarding
code optimizations.
References
1. Balint, A., Diepold, D., Gall, D., Gerber, S., Kapler, G., Retz, R.: EDACC - an
advanced platform for the experiment design, administration and analysis of empir-
ical algorithms. In: Coello, C.A.C. (ed.) LION 2011. LNCS, vol. 6683, pp. 586–599.
Springer, Heidelberg (2011). doi:10.1007/978-3-642-25566-3 46
2. Balint, A., Fröhlich, A.: Improving stochastic local search for SAT with a new
probability distribution. In: Strichman, O., Szeider, S. (eds.) SAT 2010. LNCS, vol.
6175, pp. 10–15. Springer, Heidelberg (2010). doi:10.1007/978-3-642-14186-7 3
3. Balint, A., Manthey, N.: Analysis of preprocessing techniques and their utility for
CDCL and SLS solver. In: Proceedings of POS2013 (2013)
4. Biere, A.: Lingeling and friends at the SAT competition 2011. Technical report,
FMV Reports Series, Institute for Formal Models and Verification, Johannes
Kepler University, Altenbergerstr. 69, 4040 Linz, Austria (2011)
5. Birattari, M., Yuan, Z., Balaprakash, P., Stützle, T.: F-Race and iterated
F-Race: an overview. In: Bartz-Beielstein, T., Chiarandini, M., Paquete, L.,
Preuss, M. (eds.) Experimental Methods for the Analysis of Optimization Algo-
rithms, pp. 311–336. Springer, Heidelberg (2010). http://dx.doi.org/10.1007/
978-3-642-02538-9 13
6. Braunstein, A., Mézard, M., Zecchina, R.: Survey propagation: an algorithm for
satisfiability. Random Structures & Algorithms 27(2), 201–226 (2005)
7. Fukunaga, A.: Efficient implementations of SAT local search. In: Seventh Interna-
tional Conference on Theory and Applications of Satisfiability Testing (SAT 2004),
pp. 287–292 (2004, this volume)
8. bwGRiD (http://www.bwgrid.de/): Member of the German D-Grid initiative,
funded by the Ministry of Education and Research (Bundesministerium für Bildung
und Forschung) and the Ministry for Science, Research and Arts Baden-Württemberg
(Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg). Technical
report, Universities of Baden-Württemberg (2007-2010)
9. Hoos, H.H.: An adaptive noise mechanism for WalkSAT. In: Proceedings of the
Eighteenth National Conference in Artificial Intelligence (AAAI 2002), pp. 655–
660 (2002)
10. Hoos, H.H., Stützle, T.: Stochastic Local Search: Foundations and Applications.
Morgan Kaufmann, San Francisco (2005)
11. Kroc, L., Sabharwal, A., Selman, B.: An empirical study of optimal noise and
runtime distributions in local search. In: Strichman, O., Szeider, S. (eds.) SAT
2010. LNCS, vol. 6175, pp. 346–351. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-14186-7 31
12. Luby, M., Sinclair, A., Zuckerman, D.: Optimal speedup of Las Vegas algorithms.
In: ISTCS, pp. 128–133 (1993). http://dblp.uni-trier.de/db/conf/istcs/istcs1993.
html#LubySZ93
13. McAllester, D., Selman, B., Kautz, H.: Evidence for invariants in local search.
In: Proceedings of the Fourteenth National Conference on Artificial Intelligence
(AAAI 1997), pp. 321–326 (1997)
14. Papadimitriou, C.H.: On selecting a satisfying truth assignment. In: Proceedings of
the 32nd Annual Symposium on Foundations of Computer Science (FOCS 1991),
pp. 163–169 (1991)
15. Schöning, U.: A probabilistic algorithm for k-SAT and constraint satisfaction prob-
lems. In: Proceedings of the Fortieth Annual Symposium on Foundations of Com-
puter Science (FOCS 1999), p. 410 (1999)
16. Schöning, U.: Principles of stochastic local search. In: Akl, S.G., Calude, C.S.,
Dinneen, M.J., Rozenberg, G., Wareham, H.T. (eds.) UC 2007. LNCS, vol. 4618,
pp. 178–187. Springer, Heidelberg (2007). doi:10.1007/978-3-540-73554-0 17
17. Seitz, S., Alava, M., Orponen, P.: Focused local search for random 3-satisfiability.
CoRR abs/cond-mat/0501707 (2005)
18. Sheskin, D.J.: Handbook of Parametric and Nonparametric Statistical Procedures,
4th edn. Chapman & Hall/CRC, Boca Raton (2007)
19. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press,
Cambridge (1998). http://www.cs.ualberta.ca/%7Esutton/book/ebook/the-book.
html
20. Tompkins, D.A.D.: Dynamic local search for SAT: design, insights and analysis.
Ph.D. thesis, University of British Columbia, October 2010
21. Tompkins, D.A.D., Balint, A., Hoos, H.H.: Captain jack: new variable selection
heuristics in local search for SAT. In: Sakallah, K.A., Simon, L. (eds.) SAT
2011. LNCS, vol. 6695, pp. 302–316. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-21581-0 24
Route Planning in Transportation Networks
1 Introduction
This survey is an introduction to the state of the art in the area of practical algo-
rithms for routing in transportation networks. Although a thorough survey by
Delling et al. [94] has appeared fairly recently, it has become outdated due to sig-
nificant developments in the last half-decade. For example, for continent-sized
This work was mostly done while the authors Daniel Delling, Andrew Goldberg, and
Renato F. Werneck were at Microsoft Research Silicon Valley.
Fig. 1. Schematic search spaces of Dijkstra’s algorithm (left), bidirectional search (mid-
dle), and the A* algorithm (right).
avoiding the scans of vertices that are not in the direction of t. They either
exploit the (geometric) embedding of the network or properties of the graph
itself, such as the structure of shortest path trees toward (compact) regions of
the graph.
picking well-spaced landmarks close to the boundary of the graph leads to the
best results, with acceptable query times on average [112,150]. For a small (but
noticeable) fraction of the queries, however, speedups relative to bidirectional
Dijkstra are minor.
Arc Flags. The Arc Flags approach [157,178] is somewhat similar to Geometric
Containers, but does not use geometry. During preprocessing, it partitions the
graph into K cells that are roughly balanced (have similar number of vertices)
and have a small number of boundary vertices. Each arc maintains a vector of K
bits (arc flags), where the i-th bit is set if the arc lies on a shortest path to some
vertex of cell i. The search algorithm then prunes arcs which do not have the
bit set for the cell containing t. For better query performance, arc flags can be
extended to nested multilevel partitions [197]. Whenever the search reaches the
cell that contains t, it starts evaluating arc flags with respect to the (finer) cells
of the level below. This approach works best in combination with bidirectional
search [157].
The arc flags for a cell i are computed by growing a backward shortest path
tree from each boundary vertex (of cell i), setting the i-th flag for all arcs of
the tree. Alternatively, one can compute arc flags by running a label-correcting
algorithm from all boundary vertices simultaneously [157]. To reduce preprocess-
ing space, one can use a compression scheme that flips some flags from zero to
one [58], which preserves correctness. As Sect. 3 will show, Arc Flags currently
have the fastest query times among purely goal-directed methods for road net-
works. Although high preprocessing times (of several hours) have long been a
drawback of Arc Flags, the recent PHAST algorithm (cf. Sect. 2.7) can make
this method more competitive with other techniques [75].
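A minimal sketch of the pruning step (illustrative types; it assumes one bitset of flags per arc and cells numbered 0 to K−1):

```cpp
#include <bitset>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

constexpr int K = 128;  // number of cells in the partition (illustrative)
struct FlaggedArc { int head; uint32_t length; std::bitset<K> flags; };
using MinQueue = std::priority_queue<std::pair<uint64_t, int>,
                                     std::vector<std::pair<uint64_t, int>>,
                                     std::greater<>>;

// Relax the out-arcs of u in a Dijkstra search toward a target in cell
// targetCell; arcs whose flag for that cell is not set cannot lie on a
// shortest path to the target and are skipped.
void relaxWithArcFlags(int u, int targetCell,
                       const std::vector<std::vector<FlaggedArc>>& out,
                       std::vector<uint64_t>& dist, MinQueue& queue) {
    for (const FlaggedArc& a : out[u]) {
        if (!a.flags[targetCell]) continue;  // pruned by the arc flag
        const uint64_t d = dist[u] + a.length;
        if (d < dist[a.head]) {
            dist[a.head] = d;
            queue.push({d, a.head});
        }
    }
}
```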
Fig. 3. Left: Multilevel overlay graph with two levels. The dots depict separator vertices
in the lower and upper level. Right: Overlay graph constructed from arc separators.
Each cell contains a full clique between its boundary vertices, and cut arcs are thicker.
Arc Separators. The second class of algorithms we consider uses arc sepa-
rators to build the overlay graphs. In a first step, one computes a parti-
tion C = (C1 , . . . , Ck ) of the vertices into balanced cells while attempting to
minimize the number of cut arcs (which connect boundary vertices of different
cells). Shortcuts are then added to preserve the distances between the boundary
vertices within each cell.
An early version of this approach is the Hierarchical MulTi (HiTi)
method [165]. It builds an overlay graph containing all boundary vertices and
all cut arcs. In addition, for each pair u, v of boundary vertices in Ci , HiTi adds
to the overlay a shortcut (u, v) representing the shortest path from u to v in G
restricted to Ci . The query algorithm then (implicitly) runs Dijkstra’s algorithm
on the subgraph induced by the cells containing s and t plus the overlay. This
approach can be extended to use nested multilevel partitions. HiTi has only been
tested on grid graphs [165], leading to modest speedups. See also Fig. 3.
The recent Customizable Route Planning (CRP) [76,78] algorithm uses a
similar approach, but is specifically engineered to meet the requirements of real-
world systems operating on road networks. In particular, it can handle turn costs
and is optimized for fast updates of the cost function (metric). Moreover, it uses
PUNCH [79], a graph partitioning algorithm tailored to road networks. Finally,
CRP splits preprocessing in two phases: metric-independent preprocessing and
customization. The first phase computes, besides the multilevel partition, the
pruning. Nevertheless, Geisberger et al. [142] prove that the highest-ranked ver-
tex u∗ on the original s–t path will be visited by both searches, and that both its
labels will be exact, i.e., ds(u*) = dist(s, u*) and dt(u*) = dist(u*, t). Therefore,
among all vertices u visited by both searches, the one minimizing ds(u) + dt(u)
represents the shortest path. Note that, since u∗ is not necessarily the first vertex
that is scanned by both searches, they cannot stop as soon as they meet.
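The following sketch illustrates this query rule (a simplified version without stopping criteria or pruning; it assumes the upward forward and backward search graphs have already been built and simply runs both searches to completion):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int head; uint32_t length; };
using Graph = std::vector<std::vector<Arc>>;  // upward arcs only
constexpr uint64_t INF = std::numeric_limits<uint64_t>::max();

static void upwardDijkstra(int src, const Graph& up, std::vector<uint64_t>& dist) {
    using QItem = std::pair<uint64_t, int>;
    std::priority_queue<QItem, std::vector<QItem>, std::greater<>> pq;
    dist.assign(up.size(), INF);
    dist[src] = 0;
    pq.push({0, src});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;  // stale queue entry
        for (const Arc& a : up[u])
            if (d + a.length < dist[a.head]) {
                dist[a.head] = d + a.length;
                pq.push({dist[a.head], a.head});
            }
    }
}

// Among all vertices reached by both upward searches, the minimum of
// ds(u) + dt(u) is the shortest-path distance; the searches cannot stop
// as soon as they first meet.
uint64_t chQuery(int s, int t, const Graph& upFwd, const Graph& upBwd) {
    std::vector<uint64_t> ds, dt;
    upwardDijkstra(s, upFwd, ds);
    upwardDijkstra(t, upBwd, dt);
    uint64_t best = INF;
    for (std::size_t u = 0; u < ds.size(); ++u)
        if (ds[u] != INF && dt[u] != INF)
            best = std::min(best, ds[u] + dt[u]);
    return best;
}
```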
Query times depend on the vertex order. During preprocessing, the ver-
tex order is usually determined online and bottom-up. The overall (heuristic)
goal is to minimize the number of edges added during preprocessing. One typ-
ically selects the vertex to be contracted next by considering a combination
of several factors, including the net number of shortcuts added and the num-
ber of nearby vertices already contracted [142,168]. Better vertex orders can
be obtained by combining the bottom-up algorithm with (more expensive) top-
down offline algorithms that explicitly classify vertices hitting many shortest
paths as more important [5,77]. Since road networks have very small separa-
tors [79], one can use nested dissection to obtain reasonably good orders that
work for any length function [100,107]. Approximate CH has been considered as
a way to accommodate networks with less inherent hierarchy [143].
CH is actually a successor of Highway Hierarchies [225] and Highway Node
Routing [234], which are based on similar ideas. CH is not only faster, but also
conceptually simpler. This simplicity has made it quite versatile, serving as a
building block not only for other point-to-point algorithms [4,15,40,100], but
also for extended queries (cf. Sect. 2.7) and applications (cf. Sect. 3.2).
does not even have to look at all the hubs in a label [4]. As a result, HL has
the fastest known queries for road networks, taking roughly the time needed
for five accesses to main memory (see Sect. 3.1). One drawback is space usage,
which, although not prohibitive, is significantly higher than for competing meth-
ods. By combining common substructures that appear in multiple labels, Hub
Label Compression (HLC) [82] (see also [77]) reduces space usage by an order of
magnitude, at the expense of higher query times.
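A minimal query sketch, assuming each label is stored as a vector of (hub, distance) pairs sorted by hub id (actual implementations use flat arrays and further low-level optimizations):

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// (hub id, distance) pairs, sorted by hub id.
using Label = std::vector<std::pair<int, uint32_t>>;

// Hub-label query: dist(s, t) = min over common hubs h of dist(s, h) + dist(h, t).
// Both labels are scanned once, as in a sorted-list merge.
uint64_t hlQuery(const Label& fwdLabelS, const Label& bwdLabelT) {
    uint64_t best = std::numeric_limits<uint64_t>::max();
    std::size_t i = 0, j = 0;
    while (i < fwdLabelS.size() && j < bwdLabelT.size()) {
        if (fwdLabelS[i].first == bwdLabelT[j].first) {
            best = std::min<uint64_t>(best,
                     (uint64_t)fwdLabelS[i].second + bwdLabelT[j].second);
            ++i; ++j;
        } else if (fwdLabelS[i].first < bwdLabelT[j].first) {
            ++i;
        } else {
            ++j;
        }
    }
    return best;  // remains "infinity" if the labels share no hub
}
```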
Locality filters are less straightforward in such cases: although one can still
use geographical distances [142,224], a graph-based approach considering the
Voronoi regions [189] induced by transit nodes tends to be significantly more
accurate [15]. A theoretically justified TNR variant [3] also picks important
vertices as transit nodes and has a natural graph-based locality filter, but is
impractical for large networks.
Pruned Highway Labeling. The Pruned Highway Labeling (PHL) [11] algorithm
can be seen as a hybrid between pure labeling and transit nodes. Its preprocessing
routine decomposes the input into disjoint shortest paths, then computes a label
for each vertex v containing the distance from v to vertices in a small subset of
such paths. The labels are such that any shortest s–t path can be expressed as
s–u–w–t, where u–w is a subpath of a path P that belongs to the labels of
s and t. Queries are thus similar to HL, finding the lowest-cost intersecting path.
For efficient preprocessing, the algorithm uses the pruned labeling technique [12].
Although this method has some similarity with Thorup’s distance oracle for
planar graphs [245], it does not require planarity. PHL has only been evaluated
on undirected graphs, however.
2.6 Combinations
Since the individual techniques described so far exploit different graph proper-
ties, they can often be combined for additional speedups. This section describes
such hybrid algorithms. In particular, early results [161,235] considered the com-
bination of Geometric Containers, multilevel overlay graphs, and (Euclidean-
based) A* on transportation networks, resulting in speedups of one or two orders
of magnitude over Dijkstra’s algorithm.
More recent studies have focused on combining hierarchical methods (such as
CH or Reach) with fast goal-directed techniques (such as ALT or Arc Flags). For
instance, the REAL algorithm combines Reach and ALT [149]. A basic combina-
tion is straightforward: one simply runs an ALT query with additional pruning
by reach (using the ALT lower bounds themselves for reach evaluations). A more
sophisticated variant uses reach-aware landmarks: landmarks and their distances
are only precomputed for vertices with high reach values. This saves space (only
a small fraction of the graph needs to store landmark distances), but requires
two-stage queries (goal direction is only used when the search is far enough from
both source and target).
A similar space-saving approach is used by Core-ALT [40,88]. It first com-
putes an overlay graph for the core graph, a (small) subset (e. g., 1 %) of ver-
tices (which remain after “unimportant” ones are contracted), then computes
landmarks for the core vertices only. Queries then work in two stages: first plain
bidirectional search, then ALT is applied when the search is restricted to the core.
The (earlier) HH* approach [95] is similar, but uses Highway Hierarchies [225]
to determine the core.
Another approach with two-phase queries is ReachFlags [40]. During pre-
processing, it first computes (approximate) reach values for all vertices in G,
then extracts the subgraph H induced by all vertices whose reach value exceeds
a certain threshold. Arc flags are then only computed for H, to be used in the
second phase of the query.
The SHARC algorithm [39] combines the computation of shortcuts with
multilevel arc flags. The preprocessing algorithm first determines a partition
of the graph and then computes shortcuts and arc flags in turn. Shortcuts are
obtained by contracting unimportant vertices with the restriction that shortcuts
never span different cells of the partition. The algorithm then computes arc
flags such that, for each cell C, the query uses a shortcut arc if and only if the
target vertex is not in C. Space usage can be reduced with various compression
techniques [58]. Note that SHARC is unidirectional and hierarchical: arc flags not
only guide the search toward the target, but also vertically across the hierarchy.
This is useful when the backward search is not well defined, as in time-dependent
route planning (discussed in Sect. 2.7).
Combining CH with Arc Flags results in the CHASE algorithm [40]. Dur-
ing preprocessing, a regular contraction hierarchy is computed and the search
graph that includes all shortcuts is assembled. The algorithm then extracts the
subgraph H induced by the top k vertices according to the contraction order.
Bidirectional arc flags (and the partition) are finally computed on the restricted
subgraph H. Queries then run in two phases. Since computing arc flags was
somewhat slow, k was originally set to a small fraction (about 5 %) of the total
number |V | of vertices [40]. More recently, Delling et al. showed that PHAST (see
Sect. 2.7) can compute arc flags fast enough to allow k to be set to |V |, making
CHASE queries much simpler (single-pass), as well as faster [75].
Finally, Bauer et al. [40] combine Transit Node Routing with Arc Flags to
obtain the TNR+AF algorithm. Recall that the bottleneck of the TNR query is
performing the table lookups between pairs of access nodes from A(s) and A(t).
To reduce the number of lookups, TNR+AF’s preprocessing decomposes the set
of transit nodes T into k cells. For each vertex s and access node u ∈ A(s), it
stores a k-bit vector, with bit i indicating whether there exists a shortest path
from s to cell i through u. A query then only considers the access nodes from s
that have their bits set with respect to the cells of A(t). A similar pruning is
done at the target.
2.7 Extensions
In various applications, one is often interested in more than just the length of
the shortest path between two points in a static network. Most importantly,
one should also be able to retrieve the shortest path itself. Moreover, many of
the techniques considered so far can be adapted to compute batched shortest
paths (such as distance tables), to more realistic scenarios (such as dynamic
networks), or to deal with multiple objective functions. In the following, we
briefly discuss each of these extensions.
Path Retrieval. Our descriptions so far have focused on finding only the length
of the shortest path. The algorithms we described can easily be augmented to
provide the actual list of edges or vertices on the path. For techniques that do
not use shortcuts (such as Dijkstra’s algorithm, A* search, or Arc Flags), one
can simply maintain a parent pointer for each vertex v, updating it whenever
the distance label of v changes. When shortcuts are present (such as in CH,
SHARC, or CRP), this approach gives only a compact representation of the
shortest path (in terms of shortcuts). The shortcuts then need to be unpacked.
If each shortcut is the concatenation of two other arcs (or shortcuts), as in CH,
storing the middle vertex [142] of each shortcut allows for an efficient (linear-
time) recursive unpacking of all shortcuts on the output path. If shortcuts are
built from multiple arcs (as for CRP or SHARC), one can either store the entire
sequence for each shortcut [225] or run a local (bidirectional) Dijkstra search from
its endpoints [78]. These two techniques can be used for bounded-hop algorithms
as well.
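For CH-style shortcuts, the middle-vertex unpacking mentioned above can be sketched as follows (illustrative interface; middle(u, v) is assumed to return the middle vertex stored with shortcut (u, v), or −1 if (u, v) is an original arc):

```cpp
#include <functional>
#include <vector>

// Recursively expand one (possibly shortcut) arc (u, v) of the compact path
// into the sequence of original vertices, appending everything after u to path.
void unpack(int u, int v,
            const std::function<int(int, int)>& middle,
            std::vector<int>& path) {
    int m = middle(u, v);
    if (m < 0) {                  // original arc: emit its endpoint
        path.push_back(v);
        return;
    }
    unpack(u, m, middle, path);   // expand the first half
    unpack(m, v, middle, path);   // expand the second half
}

// Usage: start with path = {s} and call unpack for each arc of the compact
// (shortcut) s-t path returned by the query, in order.
```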
to find a Pareto set, i. e., a maximum set of incomparable paths. Such sets of
shortest paths can be computed by extensions of Dijkstra’s algorithm; see [117]
for a survey on multicriteria combinatorial optimization. More specifically, the
Multicriteria Label-Setting (MLS) algorithm [155,187,196,243] extends Dijk-
stra’s algorithm by keeping, for each vertex, a bag of nondominated labels. Each
label is represented as a tuple, with one entry per optimization criterion. The
priority queue maintains labels instead of vertices, typically ordered lexicograph-
ically. In each iteration, it extracts the minimum label L and scans the incident
arcs a = (u, v) of the vertex u associated with L. It does so by adding the cost
of a to L and then merging L into the bag of v, eliminating possibly dominated
labels on the fly. In contrast, the Multi-Label-Correcting (MLC) algorithm [68,98]
considers the whole bag of nondominated labels associated with u at once when
scanning the vertex u. Hence, individual labels of u may be scanned multiple
times during one execution of the algorithm.
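For two criteria, the dominance test and the bag merge at the heart of MLS can be sketched as follows (a minimal illustration; practical implementations keep the bags sorted so that dominance can be checked faster):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct Label2 { uint64_t c1, c2; };  // one entry per optimization criterion

// L dominates M if it is no worse in both criteria.
bool dominates(const Label2& L, const Label2& M) {
    return L.c1 <= M.c1 && L.c2 <= M.c2;
}

// Merge a new label into a vertex's bag of nondominated labels.
// Returns false if the new label is dominated and therefore discarded.
bool mergeIntoBag(std::vector<Label2>& bag, const Label2& newLabel) {
    for (const Label2& L : bag)
        if (dominates(L, newLabel)) return false;
    // Remove labels dominated by the new one, then insert it.
    bag.erase(std::remove_if(bag.begin(), bag.end(),
                             [&](const Label2& L) { return dominates(newLabel, L); }),
              bag.end());
    bag.push_back(newLabel);
    return true;
}
```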
Both MLS and MLC are fast enough as long as the Pareto sets are
small [109,204]. Unfortunately, Pareto sets may contain exponentially many solu-
tions, even for the restricted case of two optimization criteria [155], which makes
it hard to achieve large speedups [47,97]. To reduce the size of Pareto sets, one
can relax domination. In particular, (1 + ε)-Pareto sets have provable polyno-
mial size [212] and can be computed efficiently [182,246,253]. Moreover, large
Pareto sets open up a potential for parallelization that is not present for a single
objective function [124,222].
A reasonable alternative [138] to multicriteria search is to optimize a linear
combination αc1 + (1 − α)c2 of two criteria (c1 , c2 ), with the parameter α set at
query time. Moreover, it is possible to efficiently compute the values of α where
the path actually changes. Funke and Storandt [133] show that CH can handle
such functions with polynomial preprocessing effort, even with more than two
criteria.
respect to the vertex order), and Hub Labels (with respect to the hubs) [252].
In fact, minimizing the number of shortcuts for CH is APX-hard [36,194]. For
SHARC, however, a greedy factor-k approximation algorithm exists [38]. Decid-
ing which k shortcuts (for fixed k) to add to a graph in order to minimize the
SHARC search space is also NP-hard [38]. Bauer et al. [35] also analyze the pre-
processing of Arc Flags in more detail and on restricted graph classes, such as
paths, trees, and cycles, and show that finding an optimal partition is NP-hard
even for binary trees.
Besides complexity, theoretical performance bounds for query algorithms,
which aim to explain their excellent practical performance, have also been con-
sidered. Proving better running time bounds than those of Dijkstra’s algorithm
is unlikely for general graphs; in fact, there are inputs for which most algorithms
are ineffective. That said, one can prove nontrivial bounds for specific graph
classes. In particular, various authors [37,194] have independently observed a
natural relationship between CH and the notions of filled graphs [214] and elim-
ination trees [232]. For planar graphs, one can use nested dissection [180] to
build a CH order leading to O(|V | log |V |) shortcuts [37,194]. More generally,
for minor-closed graph classes with balanced O(√|V|)-separators, the search
space is bounded by O(√|V|) [37]. Similarly, on graphs with treewidth k, the
search space of CH is bounded by O(k log |V |) [37].
Road networks have motivated a large amount of theoretical work on algo-
rithms for planar graphs. In particular, it is known that planar graphs have sepa-
rators of size O(√|V|) [180,181]. Although road networks are not strictly planar,
they do have small separators [79,123], so theoretically efficient algorithms for
planar graphs are likely to also perform well in road networks. Sommer [238]
surveys several approximate methods with various trade-offs. In practice, the
observed performance of most speedup techniques is much better on actual road
networks than on arbitrary planar graphs (even grids). A theoretical explanation
of this discrepancy thus requires a formalization of some property related to key
features of real road networks.
One such graph property is Highway Dimension, proposed by Abraham
et al. [3] (see also [1,7]). Roughly speaking, a graph has highway dimension h if,
at any scale r, one can hit all shortest paths of length at least r by a hitting set S
that is locally sparse, in the sense that any ball of radius r has at most h elements
from S. Based on previous experimental observations [30], the authors [7] con-
jecture that road networks have small highway dimension. Based on this notion,
they establish bounds on the performance of (theoretically justified versions of)
various speedup techniques in terms of h and the graph diameter D, assuming
the graph is undirected and that edge lengths are integral. More precisely, after
running a polynomial-time preprocessing routine, which adds O(h log h log D)
shortcuts to G, Reach and CH run in O((h log h log D)2 ) time. Moreover, they
also show that HL runs in O(h log h log D) time and long-range TNR queries
take O(h2 ) time. In addition, Abraham et al. [3] show that a graph with high-
way dimension h has doubling dimension log(h + 1), and Kleinberg et al. [171]
show that landmark-based triangulation yields good bounds for most pairs of
vertices of graphs with small doubling dimension. This gives insight into the
good performance of ALT in road networks.
The notion of highway dimension is an interesting application of the scientific
method. It was originally used to explain the good observed performance of
CH, Reach, and TNR, and ended up predicting that HL (which had not been
implemented yet) would have good performance in practice.
Generative models for road networks have also been proposed and analyzed.
Abraham et al. [3,7] propose a model that captures some of the properties of
road networks and generates graphs with provably small highway dimension.
Bauer et al. [42] show experimentally that several speedup techniques are indeed
effective on graphs generated according to this model, as well as according to
a new model based on Voronoi diagrams. Models with a more geometric flavor
have been proposed by Eppstein and Goodrich [123] and by Eisenstat [118].
Besides these results, Rice and Tsotras [220] analyze the A* algorithm and
obtain bounds on the search space size that depend on the underestimation
error of the potential function. Also, maintaining and updating multilevel overlay
graphs have been theoretically analyzed in [57]. For Transit Node Routing, Eisner
and Funke [120] propose instance-based lower bounds on the size of the transit
node set. For labeling algorithms, bounds on the label size for different graph
classes are given by Gavoille et al. [135]. Approximation algorithms to compute
small labels have also been studied [16,64,80]; although they can find slightly
better labels than faster heuristics [5,77], their running time is prohibitive [80].
Because the focus of this work is on algorithm engineering, we refrain from
going into more detail about the available theoretical work. Instead, we refer the
interested reader to overview articles with a more theoretical emphasis, such as
those by Sommer [238], Zwick [262], and Gavoille and Peleg [134].
We give data for two models. The simplified model ignores turn restrictions
and penalties, while the realistic model includes the turn information [255]. There
are two common approaches to deal with turns. The arc-based representation [59]
blows up the graph so that roads become vertices and feasible turns become arcs.
In contrast, the compact representation [76,144] keeps intersections as vertices,
but with associated turn tables. One can save space by sharing turn tables among
many vertices, since the number of intersection types in a road network is rather
limited. Most speedup techniques can be used as is for the arc-based represen-
tation, but may need modification to work on the compact model.
Most experimental studies are restricted to the simplified model. Since some
algorithms are more sensitive to how turns are modeled than others, it is hard
to extrapolate these results to more realistic networks. We therefore consider
experimental results for each model separately.
Fig. 7. Preprocessing and average query time performance for algorithms with available
experimental data on the road network of Western Europe, using travel times as edge
weights. Connecting lines indicate different trade-offs for the same algorithm. The figure
is inspired by [238].
model that includes turn costs. For reference, the plot includes unidirectional and
bidirectional implementations of Dijkstra’s algorithm using a 4-heap. (Note that
one can obtain a 20 % improvement when using a multilevel bucket queue [147].)
Finally, the table-lookup figure is based on the time of a single memory access
in our reference machine and the precomputation time of |V | shortest path trees
using PHAST [75]. Note that a machine with more than one petabyte of RAM (as
required by this algorithm) would likely have slower memory access times.
Times in the plot are on a single core of an Intel X5680 3.33 GHz CPU,
a mainstream server at the time of writing. Several of the algorithms in
the plot were originally run on this machine [5,75,78,82]; for the remaining,
we divide by the following scaling factors: 2.322 for [40,83], 2.698 for [142],
1.568 for [15], 0.837 for [107], and 0.797 for [112]. These were obtained from
a benchmark (developed for this survey) that measures the time of computing
several shortest path trees on the publicly available USA road network with
travel times [101]. For the machines we did not have access to, we asked the
authors to run the benchmark for us [112]. The benchmark is available from
Table 1 has additional details about the methods in the Pareto set, includ-
ing two versions of Dijkstra’s algorithm, one Dijkstra-based hierarchical tech-
nique (CH), three non-graph-based algorithms (TNR, HL, HLC), and two com-
binations (CHASE and TNR+AF). For reference, the table also includes a goal-
directed technique (Arc Flags) and a separator-based algorithm (CRP), even
though they are dominated by other methods. All algorithms were rerun for
this survey on the reference machine (Intel X5680 3.33 GHz CPU), except those
based on TNR, for which we report scaled results. All runs are single-threaded for
this experiment, but note that all preprocessing algorithms could be accelerated
using multiple cores (and, in some cases, even GPUs) [75,144].
For each method, Table 1 reports the total amount of space required by all
data structures (including the graph, if needed, but excluding extra information
needed for path unpacking), the total preprocessing time, the number of vertices
scanned by an average query (where applicable) and the average query time.
Once again, queries consist of pairs of vertices picked uniformly at random. We
note that all methods tested can be parametrized (typically within a relatively
narrow band) to achieve different trade-offs between query time, preprocessing
time, and space. For simplicity, we pick a single “reasonable” set of parameters
for each method. The only exception is HL-∞, which achieves the fastest reported
query times but whose preprocessing is unreasonably slow.
Observe that algorithms based on any one of the approaches considered in
Sect. 2 can answer queries in milliseconds or less. Separator-based (CRP), hierar-
chical (CH), and goal-directed (Arc Flags) methods do not use much more space
than Dijkstra’s algorithm, but are three to four orders of magnitude faster. By
combining hierarchy-based pruning and goal direction, CHASE improves query
times by yet another order of magnitude, visiting little more than the shortest
path itself. Finally, when a higher space overhead is acceptable, non-graph-based
methods can be more than a million times faster than the baseline. In particular,
HL-∞ is only 5 times slower than the trivial table-lookup method, where a query
consists of a single access to main memory. Note that the table-lookup method
itself is impractical, since it would require more than one petabyte of RAM.
The experiments reported so far consider only random queries, which tend to
be long-range. In a real system, however, most queries tend to be local. For that
reason, Sanders and Schultes [223] introduced a methodology based on Dijkstra
ranks. When running Dijkstra’s algorithm from a vertex s, the rank of a vertex u
is the order in which it is taken from the priority queue. By evaluating pairs of
vertices for Dijkstra ranks 2^1, 2^2, . . . , 2^⌊log |V|⌋ for some randomly chosen sources,
all types (local, mid-range, global) of queries are evaluated. Figure 8 reports
the median running times for all techniques from Table 1 (except TNR+AF, for
which such numbers have never been published) for 1 000 random sources and
Dijkstra ranks ≥ 2^6. As expected, algorithms based on graph searches (including
Dijkstra, CH, CRP, and Arc Flags) are faster for local queries. This is not true
for bounded-hop algorithms. For TNR, in particular, local queries must actually
use a (significantly slower) graph-based approach. HL is more uniform overall
because it never uses a graph.
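The Dijkstra-rank methodology is simple to implement on top of a plain Dijkstra run; the sketch below (on a basic adjacency-list graph, with illustrative names) records the extraction order of every vertex, from which query pairs for ranks 2^1, 2^2, . . . can then be sampled.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int head; uint32_t weight; };
using Graph = std::vector<std::vector<Arc>>;   // adjacency list

// Runs Dijkstra's algorithm from s and records, for every vertex, the order
// (rank) in which it is extracted from the priority queue.
std::vector<int> dijkstraRanks(const Graph& g, int s) {
    const uint32_t INF = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> dist(g.size(), INF);
    std::vector<int> rank(g.size(), -1);
    using Item = std::pair<uint32_t, int>;      // (distance, vertex)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[s] = 0; pq.push({0, s});
    int nextRank = 0;
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;              // stale entry
        rank[u] = nextRank++;                   // u is settled as the nextRank-th vertex
        for (const Arc& a : g[u])
            if (d + a.weight < dist[a.head]) {
                dist[a.head] = d + a.weight;
                pq.push({dist[a.head], a.head});
            }
    }
    return rank;
}

// Following Sanders and Schultes, one would pick random sources, compute their
// ranks, and then query each source against vertices of rank 2^1, 2^2, ...,
// reporting median query times per rank.
```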
Realistic Setting. Although useful, the results shown in Table 1 do not capture
all features that are important for real-world systems. First, systems providing
actual driving directions must account for turn costs and restrictions, which the
simplified graph model ignores. Second, systems must often support multiple
metrics (cost functions), such as shortest distances, avoid U-turns, avoid/prefer
freeways, or avoid ferries; metric-specific data structures should therefore be
as small as possible. Third, query times should be robust to the choice of cost
functions: the system should not time out if an unfriendly cost function is chosen.
Finally, one should be able to incorporate a new cost function quickly to account
for current traffic conditions (or even user preferences).
CH has the fastest preprocessing among the algorithms in Table 1 and its
queries are fast enough for interactive applications. Its performance degrades
under realistic constraints [78], however. In contrast, CRP was developed with
these constraints in mind. As explained in Sect. 2.3, it splits its preprocessing
phase in two: although the initial metric-independent phase is relatively slow (as
shown in Table 1), only the subsequent (and fast) metric-dependent customiza-
tion phase must be rerun to incorporate a new metric. Moreover, since CRP is
based on edge separators, its performance is (almost) independent of the cost
function.
Table 2 (reproduced from [78]) compares CH and CRP with and without turn
costs, as well as for travel distances. The instance tested is the same in Table 1,
augmented by turn costs (set to 100 seconds for U-turns and zero otherwise).
This simple change makes it almost as hard as fully realistic (proprietary) map
data used in production systems [78]. The table reports metric-independent pre-
processing and metric-dependent customization separately; “DS” refers to the
data structures shared by all metrics, while “custom” refers to the additional
space and time required by each individual metric. Unlike in Table 1, space con-
sumption also includes data structures used for path unpacking. For queries,
we report the time to get just the length of the shortest path (dist), as well
as the total time to retrieve both the length and the full path (path). More-
over, preprocessing (and customization) times refer to multi-threaded executions
on 12 cores; queries are still sequential.
As the table shows, CRP query times are very robust to the cost function and
the presence of turns. Also, a new cost function can be applied in roughly 370 ms,
fast enough to even support user-specific cost functions. Customization times can
even be reduced to 36 ms with GPUs [87], also reducing the amount of data stored
in main memory by a factor of 6. This is fast enough for setting the cost function
at query time, enabling realistic personalized driving directions on continental
scale. If GPUs are not available or space consumption is an issue, one can drop
the contraction-based customization. This yields customization times of about
one second on a 12-core CPU, which is still fast enough for many scenarios.
In contrast, CH performance is significantly worse on metrics other than travel
times without turn costs.
We stress that not all applications have the same requirements. If only good
estimates on travel times (and not actual paths) are needed, ignoring turn costs
and restrictions is acceptable. In particular, ranking POIs according to travel
times (but ignoring turn costs) already gives much better results than ranking
based on geographic distances. Moreover, we note that CH has fast queries even
with fully realistic turn costs. If space (for the expanded graph) is not an issue,
it can still provide a viable solution to the static problem; the same holds for
related methods such as HL and HLC [82]. For more dynamic scenarios, CH
preprocessing can be made parallel [144] or even distributed [168]; even if run
sequentially, it is fast enough for large metropolitan areas.
3.2 Applications
As discussed in Sect. 2.7, many speedup techniques can handle more than plain
point-to-point shortest path computations. In particular, hierarchical techniques
such as CH or CRP tend to be quite versatile, with many established extensions.
Some applications may involve more than one path between a source and
a target. For example, one may want to show the user several “reasonable”
paths (in addition to the shortest one) [60]. In general, these alternative paths
should be short, smooth, and significantly different from the shortest path (and
other alternatives). Such paths can either be computed directly as the concate-
nation of partial shortest paths [6,60,78,173,184] or compactly represented as
a small graph [17,174,213]. A related problem is to compute a corridor [86]
of paths between source and target, which allows deviations from the best
route (while driving) to be handled without recomputing the entire path.
These robust routes can be useful in mobile scenarios with limited connectivity.
Another useful tool to reduce communication overhead in such cases is route
compression [31].
Extensions that deal with nontrivial cost functions have also been considered.
In particular, one can extend CH to handle flexible arc restrictions [140] (such
as height or weight limitations) or even multiple criteria [133,138] (such as opti-
mizing costs and travel time). Minimizing the energy consumption of electric
vehicles [43,44,122,152,240,241] is another nontrivial application, since batteries
are recharged when the car is going downhill. Similarly, optimal cycling routes
must take additional constraints (such as the amount of uphill cycling) into
account [239].
The ability of computing many (batched) shortest paths fast enables inter-
esting new applications. By quickly analyzing multiple candidate shortest paths,
one can efficiently match GPS traces to road segments [119,121]. Traffic simu-
lations also benefit from acceleration techniques [183], since they must consider
the likely routes taken by all drivers in a network. Another application is route
prediction [177]: one can estimate where a vehicle is (likely) headed by mea-
suring how good its current location is as a via point towards each candidate
destination. Fast routing engines allow more locations to be evaluated more fre-
quently, leading to better predictions [2,121,162,176]. Planning placement of
charging stations can also benefit from fast routing algorithms [132]. Another
important application is ride sharing [2,110,139], in which one must match a
ride request with the available offer in a large system, typically by minimizing
drivers’ detours.
Finally, batched shortest-path computations enable a wide range of point-of-
interest queries [2,99,114,119,137,179,221,260]. Typical examples include find-
ing the closest restaurant to a given location, picking the best post office to stop
on the way home, or finding the best meeting point for a group of friends. Typ-
ically using the bucket-based approach (cf. Sect. 2.7), fast routing engines allow
POIs to be ranked according to network-based cost functions (such as travel
time) rather than geographic distances. This is crucial for accuracy in areas
with natural (or man-made) obstacles, such as mountains, rivers, or rail tracks.
Note that more elaborate POI queries must consider concatenations of shortest
paths. One can handle these efficiently using an extension of the bucket-based
approach that indexes pairs of vertices instead of individual ones [2,99].
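To illustrate the bucket-based approach, the following sketch stores, for every vertex of a POI's backward search space, an entry (POI, distance) in a bucket at that vertex; a point-to-POI query then scans the buckets of its forward search space. For brevity the sketch uses plain Dijkstra searches; in practice the searches would be pruned (e.g., CH upward) searches with much smaller search spaces, and all names are illustrative.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

struct Arc { int head; uint32_t weight; };
using Graph = std::vector<std::vector<Arc>>;

// Plain Dijkstra returning the full distance array; a real engine would use a
// pruned search whose (small) search space defines the bucket entries.
std::vector<uint32_t> dijkstra(const Graph& g, int s) {
    const uint32_t INF = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> dist(g.size(), INF);
    using Item = std::pair<uint32_t, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    dist[s] = 0; pq.push({0, s});
    while (!pq.empty()) {
        auto [d, u] = pq.top(); pq.pop();
        if (d > dist[u]) continue;
        for (const Arc& a : g[u])
            if (d + a.weight < dist[a.head]) {
                dist[a.head] = d + a.weight; pq.push({dist[a.head], a.head});
            }
    }
    return dist;
}

// Preprocessing: for every POI, store (poi, distance) entries in buckets
// attached to the vertices of its backward search space.
std::vector<std::vector<std::pair<int, uint32_t>>>
buildBuckets(const Graph& reverseGraph, const std::vector<int>& pois) {
    std::vector<std::vector<std::pair<int, uint32_t>>> bucket(reverseGraph.size());
    for (int p : pois) {
        auto dist = dijkstra(reverseGraph, p);
        for (int v = 0; v < (int)dist.size(); ++v)
            if (dist[v] != std::numeric_limits<uint32_t>::max())
                bucket[v].push_back({p, dist[v]});
    }
    return bucket;
}

// Query: run the forward search from s and scan the buckets of all reached
// vertices; the best combination s -> v -> poi gives the closest POI.
std::pair<int, uint64_t> closestPoi(
    const Graph& g, int s,
    const std::vector<std::vector<std::pair<int, uint32_t>>>& bucket) {
    auto dist = dijkstra(g, s);
    std::pair<int, uint64_t> best{-1, std::numeric_limits<uint64_t>::max()};
    for (int v = 0; v < (int)dist.size(); ++v) {
        if (dist[v] == std::numeric_limits<uint32_t>::max()) continue;
        for (auto [poi, d] : bucket[v]) {
            uint64_t total = (uint64_t)dist[v] + d;
            if (total < best.second) best = {poi, total};
        }
    }
    return best;
}
```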
4.1 Modeling
The first challenge is to model the timetable in order to enable algorithms that
compute optimal journeys. Since the shortest-path problem is well understood
in the literature, it seems natural to build a graph G = (V, A) from the timetable
such that shortest paths in G correspond to optimal journeys. This section
reviews the two main approaches to do so (time-expanded and time-dependent),
as well as the common types of problems one is interested to solve. For a more
detailed overview of these topics, we refer the reader to an overview article by
Müller-Hannemann et al. [203].
In addition, the realistic time-expanded model connects each arrival vertex to
the first transfer vertex that obeys the minimum
change time constraints. See Fig. 9 for an illustration. If there is a footpath from
stop pi to stop pj , then for each arrival event at stop pi one adds an arc to
the earliest reachable transfer vertex at pj . This model has been further engi-
neered [90] to reduce the number of arcs that are explored “redundantly” during
queries.
A timetable is usually valid for a certain period of time (up to one year).
Since the timetables of different days of the year are quite similar, a space-
saving technique (compressed model) is to consider events modulo their traffic
days [202,219].
Many connections in a timetable operate periodically, for example every
5 min during rush hour, and every 10 min otherwise. Bast and Storandt [27]
exploit this fact in the frequency-based model: as in the time-dependent approach,
vertices correspond to stops, and an arc between a pair of stops (u, v) is added
if there is at least one elementary connection from u to v. However, instead of
storing the departures of an arc explicitly, those with coinciding travel times are
compressed into a set of tuples consisting of an initial departure time τdep , a time
interval Δ, and a frequency f . The corresponding original departures can thus be
reconstructed by computing each τdep + f·i for those i ∈ ℤ≥0 that satisfy
τdep + f·i ≤ τdep + Δ. Bast and Storandt compute these tuples by covering the set
of departure times by a small set of overlapping arithmetic progressions, then
discarding duplicate entries (occurring after decompression) at query time [27].
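A minimal sketch of this compression scheme follows, with illustrative field names: a tuple stores the initial departure time, the length of the covered interval, the gap between consecutive departures, and the shared travel time; decompression enumerates exactly the departures τdep + f·i with τdep + f·i ≤ τdep + Δ.

```cpp
#include <vector>

// A frequency tuple compresses the departures tau_dep, tau_dep + f, tau_dep + 2f, ...
// that fall into [tau_dep, tau_dep + delta] and share the same travel time.
struct FrequencyTuple {
    int firstDeparture;  // tau_dep, e.g., in seconds since midnight
    int interval;        // delta: length of the covered time window
    int frequency;       // f: gap between consecutive departures
    int travelTime;      // travel time shared by all compressed connections
};

// Reconstructs the original departure times encoded by one tuple, i.e., all
// tau_dep + f*i with i >= 0 and tau_dep + f*i <= tau_dep + delta.
std::vector<int> decompress(const FrequencyTuple& t) {
    std::vector<int> departures;
    for (int dep = t.firstDeparture; dep <= t.firstDeparture + t.interval;
         dep += t.frequency)
        departures.push_back(dep);
    return departures;
}
```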
the optimal shortest path (namely for different consecutive days). One possible
solution is to use a bag of labels for each vertex as in the multicriteria variants
described below. Another solution is described in Pyrga et al. [219].
On time-dependent graphs, Dijkstra’s algorithm can be augmented to com-
pute shortest paths [65,111], as long as the cost functions are nonnegative and
FIFO [208,209]. The only modification is that, when the algorithm scans an
arc (u, v), the arc cost is evaluated at time τ + dist(s, u). Note that the algo-
rithm retains the label-setting property, i. e., each vertex is scanned at most once.
In the time-dependent public transit model, the query is run from the stop vertex
corresponding to ps and the algorithm may stop as soon as it extracts pt from
the priority queue. The algorithm is called TDD (time-dependent Dijkstra).
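The modification to Dijkstra's algorithm is small, as the following sketch illustrates; arc travel-time functions are modeled as callables and assumed to be nonnegative and FIFO, and all names are illustrative.

```cpp
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Each arc carries a travel-time function evaluated at the departure time.
struct TDArc {
    int head;
    std::function<uint32_t(uint32_t)> travelTime;  // departure time -> duration
};
using TDGraph = std::vector<std::vector<TDArc>>;

// Time-dependent Dijkstra (TDD): identical to plain Dijkstra, except that the
// cost of arc (u, v) is evaluated at time tau + dist(s, u).
std::vector<uint32_t> timeDependentDijkstra(const TDGraph& g, int s, uint32_t tau) {
    const uint32_t INF = std::numeric_limits<uint32_t>::max();
    std::vector<uint32_t> arrival(g.size(), INF);   // earliest known arrival times
    using Item = std::pair<uint32_t, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    arrival[s] = tau; pq.push({tau, s});
    while (!pq.empty()) {
        auto [t, u] = pq.top(); pq.pop();
        if (t > arrival[u]) continue;               // label-setting: skip stale entries
        for (const TDArc& a : g[u]) {
            uint32_t tArr = t + a.travelTime(t);    // evaluate arc cost at the current time
            if (tArr < arrival[a.head]) {
                arrival[a.head] = tArr; pq.push({tArr, a.head});
            }
        }
    }
    return arrival;
}
```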
Another approach is to exploit the fact that the time-expanded graph is
directed and acyclic. (Note that overnight connections can be handled by
unrolling the timetable for several consecutive periods.) By scanning vertices
in topological order, arbitrary queries can be answered in linear time. This sim-
ple and well-known observation has been applied for journey planning by Mellouli
and Suhl [191], for example. While this idea saves the relatively expensive pri-
ority queue operations of Dijkstra’s algorithm, one can do even better by not
maintaining the graph structure explicitly, thus improving locality and cache
efficiency. The recently developed Connection Scan Algorithm (CSA) [105] orga-
nizes the elementary connections of the timetable in a single array, sorted by
departure time. The query then only scans this array once, which is very effi-
cient in practice. Note that CSA requires footpaths in the input to be closed
under transitivity to ensure correctness.
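The following sketch shows the core loop of a basic earliest-arrival CSA; it ignores minimum change times, footpaths, and journey reconstruction, assumes any reachable connection can be boarded, and uses illustrative names.

```cpp
#include <limits>
#include <vector>

// An elementary connection: a vehicle leaves depStop at depTime and arrives
// at arrStop at arrTime without intermediate stops.
struct Connection { int depStop, arrStop, depTime, arrTime; };

// Connection Scan Algorithm (basic earliest-arrival variant): the connections
// are stored in a single array sorted by departure time and scanned once.
int connectionScan(const std::vector<Connection>& sortedByDepTime,
                   int numStops, int source, int target, int depTime) {
    const int INF = std::numeric_limits<int>::max();
    std::vector<int> earliestArrival(numStops, INF);
    earliestArrival[source] = depTime;
    for (const Connection& c : sortedByDepTime) {
        if (c.depTime >= earliestArrival[target]) break;   // target can no longer improve
        // The connection can be used if we can be at its departure stop in time.
        if (earliestArrival[c.depStop] <= c.depTime &&
            c.arrTime < earliestArrival[c.arrStop])
            earliestArrival[c.arrStop] = c.arrTime;
    }
    return earliestArrival[target];                        // INF if unreachable
}
```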
Range Problem. The range problem can be solved on the time-dependent model
by variants of Dijkstra’s algorithm. The first variant [68,206] maintains, at each
vertex u, a travel-time function (instead of a scalar label) representing the opti-
mal travel times from s to u for the considered time range. Whenever the algo-
rithm relaxes an arc (u, v), it first links the full travel-time function associated
with u to the (time-dependent) cost function of the arc (u, v), resulting in a func-
tion that represents the times to travel from s to v via u. This function is then
merged into the (tentative) travel time function associated with v, which corre-
sponds to taking the element-wise minimum of the two functions. The algorithm
loses the label-setting property, since travel time functions cannot be totally
ordered. As a result the algorithm may reinsert vertices into the priority queue
whenever it finds a journey that improves the travel time function of an already
scanned vertex.
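For discrete profiles, the merge step can be sketched as follows: a profile is a Pareto set of (departure, arrival) pairs, and merging keeps exactly the pairs that are not dominated by a pair departing no earlier and arriving no later. The representation and names are illustrative; actual implementations work on (piecewise-linear) travel-time functions.

```cpp
#include <algorithm>
#include <limits>
#include <utility>
#include <vector>

// A discrete travel-time profile: Pareto-optimal (departure, arrival) pairs.
using Profile = std::vector<std::pair<int, int>>;

// Element-wise minimum of two profiles: keep every pair that is not dominated
// by a pair departing no earlier and arriving no later.
Profile mergeProfiles(const Profile& a, const Profile& b) {
    Profile all(a);
    all.insert(all.end(), b.begin(), b.end());
    // Latest departure first; for equal departures, the better arrival first.
    std::sort(all.begin(), all.end(), [](const auto& x, const auto& y) {
        return x.first != y.first ? x.first > y.first : x.second < y.second;
    });
    Profile result;
    int bestArrival = std::numeric_limits<int>::max();
    for (auto [dep, arr] : all) {
        if (arr < bestArrival) {          // not dominated by any later departure
            result.push_back({dep, arr});
            bestArrival = arr;
        }
    }
    std::reverse(result.begin(), result.end());   // restore ascending departure order
    return result;
}
```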
Another algorithm [34] exploits the fact that trips depart at discrete points
in time, which helps to avoid redundant work when propagating travel time
functions. When it relaxes an arc, it does not consider the full function, but
each of its encoded connections individually. It then only propagates the parts
of the function that have improved.
The Self-Pruning Connection Setting algorithm (SPCS) [85] is based on the
observation that any optimal journey from s to t has to start with one of the trips
departing from s. It therefore runs, for each such trip, Dijkstra’s algorithm from s
at its respective departure time. SPCS performs these runs simultaneously using
a shared priority queue whose entries are ordered by arrival time. Whenever
the algorithm scans a vertex u, it checks whether u has already been scanned for an
associated (departing) trip with a later departure time (at s), in which case it
prunes u. Moreover, SPCS can be parallelized by assigning different subsets of
departing trips from s to different CPU cores.
Bast and Storandt [27] propose an extension of Dijkstra’s algorithm that
operates on the (compressed) frequency-based model directly. It maintains with
every vertex u a set of tuples consisting of a time interval, a frequency, and the
travel time. Hence, a single tuple may represent multiple optimal journeys, each
departing within the tuple’s time interval. Whenever the algorithm relaxes an
arc (u, v), it first extends the tuples from the bag at u with the ones stored at the
arc (u, v) in the compressed graph. The resulting tentative bag of tuples (rep-
resenting all optimal journeys to v via u) is then merged into the bag of tuples
associated with v. The main challenge of this algorithm is efficiently merging
tuples with incompatible frequencies and time intervals [27].
Finally, the Connection Scan Algorithm has been extended to the range prob-
lem [105]. It uses the same array of connections, ordered by departure time, as
for earliest arrival queries. It still suffices to scan this array once, even to obtain
optimal journeys to all stops of the network.
In each round, RAPTOR scans the routes served by the stops whose arrival times
improved in the previous round. To scan route r, RAPTOR traverses its stops in order of
travel, keeping track of the earliest possible trip (of r) that can be taken. This
trip may improve the tentative arrival times at subsequent stops of route r. Note
that RAPTOR scans each route at most once per round, which is very efficient in
practice (even faster than Dijkstra’s algorithm with a single criterion). Moreover,
RAPTOR can be parallelized by distributing non-conflicting routes to different
CPU cores. It can also be extended to handle range queries (rRAPTOR) and
additional optimization criteria (McRAPTOR). Note that, like CSA, RAPTOR
also requires footpaths in the input to be closed under transitivity.
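A compact sketch of the route-scanning step follows; for simplicity it assumes that a trip's arrival and departure times at a stop coincide, omits target pruning and explicit stop marking, and uses illustrative names.

```cpp
#include <limits>
#include <vector>

const int INF = std::numeric_limits<int>::max();

// One route: an ordered sequence of stops and its trips, sorted from earliest
// to latest; trips[i][j] is the time at which trip i serves the j-th stop.
struct Route {
    std::vector<int> stops;
    std::vector<std::vector<int>> trips;
};

// Scans a single route in one RAPTOR round: traverse the stops in order of
// travel, keep track of the earliest trip that can currently be taken, and
// improve the tentative arrival times at subsequent stops.
void scanRoute(const Route& r, const std::vector<int>& prevRound,
               std::vector<int>& arrival) {
    int trip = -1;                                     // no trip boarded yet
    for (int j = 0; j < (int)r.stops.size(); ++j) {
        int stop = r.stops[j];
        // Improve the arrival time at this stop using the trip we are on.
        if (trip >= 0 && r.trips[trip][j] < arrival[stop])
            arrival[stop] = r.trips[trip][j];
        // Hop onto an earlier trip if the previous round lets us reach this
        // stop before that trip departs here.
        while (true) {
            int earlier = (trip == -1) ? (int)r.trips.size() - 1 : trip - 1;
            if (earlier < 0 || prevRound[stop] > r.trips[earlier][j]) break;
            trip = earlier;
        }
    }
}
```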
Trip-Based Routing [256] accelerates RAPTOR by executing a BFS-like
search on a network of trips and precomputed sensible transfers.
ALT. The (unidirectional) ALT [148] algorithm has been adapted to both the
time-expanded [90] and the time-dependent [207] models for computing earliest
arrival queries. In both cases, landmark selection and distance precomputation
is performed on an auxiliary stop graph, in which vertices correspond to stops
and an arc is added between two stops pi , pj if there is an elementary connection
from pi to pj in the input. Arc costs are lower bounds on the travel time between
their endpoints.
Arc Flags and SHARC. Delling et al. [90] have adapted Arc Flags [157,178] to
the time-expanded model as follows. First, they compute a partition on the stop
graph (defined as in ALT). Then, for each boundary stop p of cell C, and each of
its arrival vertices, a backward search is performed on the time-expanded graph.
The authors observe that public transit networks have many paths of equal
length between the same pair of vertices [90], making the choice of tie-breaking
rules important. Furthermore, Delling et al. [90] combine Arc Flags, ALT, and
a technique called Node Blocking, which avoids exploring multiple arcs from the
same route.
SHARC, which combines Arc Flags with shortcuts [39], has been tested on
the time-dependent model with earliest arrival queries by Delling [72]. Moreover,
Arc Flags with shortcuts for the Multi-Label-Setting algorithm (MLS) have been
considered for computing full (i. e., using strict domination) Pareto sets using
arrival time and number of transfers as criteria [47]. In time-dependent graphs,
a flag must be set if its arc appears on a shortest path toward the correspond-
ing cell at least once during the time horizon [72]. For better performance, one
can use different sets of flags for different time periods (e. g., every two hours).
The resulting total speedup is still below 15, from which it is concluded that
“accelerating time-dependent multicriteria timetable information is harder than
expected” [47]. Slight additional speedups can be obtained if one restricts the
search space to only those solutions in the Pareto set for which the travel time
is within an interval defined by the earliest arrival time and some upper bound.
Berger et al. [49] observed that in such a scenario optimal substructure in com-
bination with lower travel time bounds can be exploited to yield additional
pruning during search. It is worth noting that this method does not require any
preprocessing and is therefore well-suited for a dynamic scenario.
Overlay Graphs. To accelerate earliest arrival queries, Schulz et al. [235] compute
single-level overlays between “important” hub stations in the time-expanded
model, with importance values given as input. More precisely, given a subset
of important stations, the overlay graph consists of all vertices (events) that
are associated with these stations. Edges in the overlay are computed such that
distances between any pair of vertices (events) are preserved. Extending this
approach to overlay graphs over multiple levels of hub stations (selected by
importance or degree) results in speedups of about 11 [236].
TRANSIT. Finally, Transit Node Routing [28,30,224] has been adapted to pub-
lic transit journey planning in [14]. Preprocessing of the resulting TRANSIT
algorithm uses the (small) stop graph to determine a set of transit nodes (with
a similar method as in [28]), between which it maintains a distance table that
contains sets of journeys with minimal travel time (over the day). Each stop p
maintains, in addition, a set of access nodes A(p), which is computed on the
time-expanded graph by running local searches from each departure event of p
toward the transit stops. The query then uses the access nodes of ps and pt and
the distance table to resolve global requests. For local requests, it runs goal-
directed A* search. Queries are slower than for Transfer Patterns.
Besides computing journeys according to one of the problems from Sect. 4.1,
extended scenarios (such as incorporating delays) have been studied as well.
Uncertainty and Delays. Trains, buses and other means of transport are often
prone to delays in the real world. Thus, handling delays (and other sources
of uncertainty) is an important aspect of a practical journey planning system.
Firmani et al. [125] recently presented a case study for the public transport
network of the metropolitan area of Rome. They provide strong evidence that
computing journeys according to the published timetable often fails to deliver
optimal or even high-quality solutions. However, incorporating real-time GPS
location data of vehicles into the journey planning algorithm helps improve the
journey quality (e. g., in terms of the experienced delay) [13,84].
Müller-Hannemann and Schnee [201] consider the online problem where
delays, train cancellations, and extra trains arrive as a continuous stream of
information. They present an approach which quickly updates the time-expanded
model to enable queries according to current conditions. Delling et al. [74] also
discuss updating the time-dependent model and compare the required effort with
the time-expanded model. Cionini et al. [63] propose a new graph-based model
which is tailored to handle dynamic updates, and they experimentally show its
effectiveness in terms of both query and update times. Berger et al. [48] pro-
pose a realistic stochastic model that predicts how delays propagate through
the network. In particular, this model is evaluated using real (delay) data from
Deutsche Bahn. Bast et al. [25] study the robustness of Transfer Patterns with
respect to delays. They show that the transfer patterns computed for a scenario
without any delays give optimal results for 99 % of queries, even when large and
area-wide (random) delays are injected into the networks.
Disser et al. [109] and Delling et al. [93] study the computation of reliable
journeys via multicriteria optimization. The reliability of a transfer is defined as a
function of the available buffer time for the transfer. Roughly speaking, the larger
the buffer time, the more likely it is that the transfer will be successful. According
to this notion, transfers with a high chance of success are still considered reliable
even if there is no backup alternative in case they fail.
To address this issue, Dibbelt et al. [105] minimize the expected arrival
time (with respect to a simple model for the probability that a transfer breaks).
Instead of journeys, their method (which is based on the CSA algorithm) outputs
a decision graph representing optimal instructions to the user at each point of
their journey, including cases in which a connecting trip is missed. Interestingly,
minimizing the expected arrival time implicitly helps to minimize the number of
transfers, since each “unnecessary” transfer introduces additional uncertainty,
hurting the expected arrival time.
Finally, Goerigk et al. [146] study the computation of robust journeys, con-
sidering both strict robustness (i. e., computing journeys that are always feasible
for a given set of delay scenarios) and light robustness (i. e., computing journeys
that are most reliable when given some extra slack time). While strict robustness
turns out to be too conservative in practice, the notion of light robustness seems
more promising. Recoverable robust journeys (which can always be updated when
delays occur) have recently been considered in [145]. A different, new robustness
concept has been proposed by Böhmová et al. [51]. In order to propose solutions
that are robust for typical delays, past observations of real traffic situations are
used. Roughly speaking, a route is more robust the better it has performed in
the past under different scenarios.
Night Trains. Gunkel et al. [153] have considered the computation of overnight
train journeys, whose optimization goals are quite different from regular “day-
time” journeys. From a customer’s point of view, the primary objective is usu-
ally to have a reasonably long sleeping period. Moreover, arriving too early in
the morning at the destination is often not desired. Gunkel et al. present two
approaches to compute overnight journeys. The first approach explicitly enu-
merates all overnight trains (which are given by the input) and computes, for
each such train, the optimal feeding connections. The second approach runs
multicriteria search with sleeping time as a maximization criterion.
Fares. Müller-Hannemann and Schnee [199] have analyzed several pricing sche-
mes, integrating them as an optimization criterion (cost) into MOTIS, a mul-
ticriteria search algorithm that works on the time-expanded model. In general,
however, optimizing exact monetary cost is a challenging problem, since real-
world pricing schemes are hard to capture by a mathematical model [199].
Delling et al. [92] consider computing Pareto sets of journeys that optimize
fare zones with the McRAPTOR algorithm. Instead of using (monetary) cost
as an optimization criterion directly, they compute all nondominated journeys
that traverse different combinations of fare zones, which can then be evaluated
by cost in a quick postprocessing step.
Guidebook Routing. Bast and Storandt [26] introduce Guidebook Routing, where
the user specifies only source and target stops, but neither a day nor a time of
departure. The desired answer is then a set of routes, each of which is given
by a sequence of train or bus numbers and transfer stations. For example, an
answer may read like “take bus number 11 towards the bus stop at X, then change
to bus number 13 or 14 (whichever comes first) and continue to the bus stop
at Y”. Guidebook routes can be computed by first running a multicriteria range
query, and then extracting from the union of all Pareto-optimal time-dependent
paths a subset of routes composed by arcs which are most frequently used. The
Transfer Patterns algorithm lends itself particularly well to the computation
of such guidebook routes. For practical guidebook routes (excluding “exotic”
connections at particular times), the preprocessing space and query times of
Transfer Patterns can be reduced by a factor of 4 to 5.
This section compares the performance of some of the journey planning algo-
rithms discussed above. As in road networks, all algorithms have been
carefully implemented in C++ using mostly custom-built data structures.
Table 3 summarizes the results. Running times are obtained from a sequential
execution on one core of a dual 8-core Intel Xeon E5-2670 machine clocked at
2.6 GHz with 64 GiB of DDR3-1600 RAM. The exceptions are Transfer Patterns
and Contraction Hierarchies, for which we reproduce the values reported in the
original publication (obtained on a comparable machine).
For each algorithm, we report the instance on which it has been evaluated,
as well as its total number of elementary connections (a proxy for size) and the
number of consecutive days covered by the connections. Unfortunately, realistic
benchmark data of country scale (or larger) has not been widely available to the
research community. Some metropolitan transit agencies have recently started
making their timetable data publicly available, mostly using the General Transit
Feed Specification (GTFS)². Still, research groups often interpret the data differently, making it
hard to compare the performance of different algorithms. The largest metropoli-
tan instance currently available is the full transit network of London³. It contains
approximately 21 thousand stops, 2.2 thousand routes, 133 thousand trips, 46
thousand footpaths, and 5.1 million elementary connections for one full day. We
therefore use this instance for the evaluation of most algorithms. The instances
representing Germany and long-distance trains in Europe are generated in a
similar way, but from proprietary data.
The table also contains the preprocessing time (where applicable), the aver-
age number of label comparisons per stop, the average number of journeys com-
puted by the algorithm, and its running time in milliseconds. Note that the
number of journeys can be below 1 because some stops are unreachable for cer-
tain late departure times. References indicate the publications from which the
figures are taken (which may differ from the first publication); TED was run by
the authors for this survey. (Our TED implementation uses a single-level bucket
queue [104] and stops as soon as a vertex of the target stop has been extracted.)
The columns labeled “criteria” indicate whether the algorithm minimizes arrival
time (arr), number of transfers (tran), fare zones (fare), reliability (rel), and
whether it computes range queries (rng) over the full timetable period of 1, 2,
or 7 days. Methods with multiple criteria compute Pareto sets.
Among algorithms without preprocessing, we observe that those that do not
use a graph (RAPTOR and CSA) are consistently faster than their graph-based
counterparts. Moreover, running Dijkstra’s algorithm on the time-expanded
graph model (TED) is significantly slower than running it on the time-dependent
graph model (TDD), since time-expanded graphs are much larger. For earliest
arrival queries on metropolitan areas, CSA is the fastest algorithm without pre-
processing, but preprocessing-based methods (such as Transfer Patterns) can
² https://developers.google.com/transit/gtfs/.
³ http://data.london.gov.uk/.
be faster by about an order of magnitude. Note that in public transit networks the optimization crite-
ria are often positively correlated (such as arrival time and number of transfers),
which keeps the Pareto sets at a manageable size. Still, as the number of criteria
increases, exact real-time queries become harder to achieve.
The reported figures for Transfer Patterns are based on preprocessing lever-
aging the frequency-based model with traffic days compression, which makes
quadratic (in the number of stops) preprocessing effort feasible. Consequently,
hub stops and the three-leg heuristic are not required, and the algorithm is guar-
anteed to find the optimal solution. The data produced by the preprocessing is
shown to be robust against large and area-wide delays, resulting in much less
than 1 % of suboptimal journeys [25] (not shown in the table).
For range queries, preprocessing-based techniques (CH, ACSA, Transfer Pat-
terns) scale better than CSA or SPCS. For full multicriteria range queries (con-
sidering transfers), Transfer Patterns is by far the fastest method, thanks to
preprocessing. Among search-based methods, CSA is faster than rRAPTOR by
a factor of two, although it does twice the amount of work in terms of label
comparisons. Note, however, that while CSA cannot scale to smaller time ranges
by design [105], the performance of rRAPTOR depends linearly on the num-
ber of journeys departing within the time range [92]. For example, for 2-hour
range queries rRAPTOR computes 15.9 journeys taking only 61.3 ms on aver-
age [93] (not reported in the table). Guidebook routes covering about 80 % of
the optimal results (for the full period) can be computed in a fraction of a
millisecond [26].
To aim for journeys that reasonably combine different transport modes, one
may use penalties in the objective function of the algorithm. These penalties
are often considered as a linear combination with the primary optimization
goal (typically travel time). Examples for this approach include Aifadopoulou
et al. [10], who present a linear program that computes multimodal journeys.
The TRANSIT algorithm [14] also uses a linear utility function and incorporates
travel time, ticket cost, and “inconvenience” of transfers. Finally, Modesti and
Sciomachen [195] consider a combined network of unrestricted walking, unre-
stricted car travel, and public transit, in which journeys are optimized according
to a linear combination of several criteria, such as cost and travel time. More-
over, their utility function incorporates user preferences on the transportation
modes.
The label-constrained shortest paths [21] approach computes journeys that explic-
itly obey certain constraints on the modes of transportation. It defines an alpha-
bet Σ of modes of transportation and labels each arc of the graph by the
appropriate symbol from Σ. Then, given a language L over Σ as additional
input to the query, any journey (path) must obey the constraints imposed by
the language L, i. e., the concatenation of the labels along the path must sat-
isfy L. The problem of computing shortest label-constrained paths is tractable
for regular languages [21], which suffice to model reasonable transport mode
constraints in multimodal journey planning [18,20]. Even restricted classes of
regular languages can be useful, such as those that impose a hierarchy of trans-
port modes [50,89,169,170,210,258] or Kleene languages that can only globally
exclude (and include) certain transport modes [140].
Barrett et al. [21] have proven that the label-constrained shortest path prob-
lem is solvable in deterministic polynomial time. The corresponding algorithm,
called label-constrained shortest path problem Dijkstra (LCSPP-D), first builds a
product network of the input (the multimodal graph) and the (possibly non-
deterministic) finite automaton that accepts the regular language L. For given
source and target vertices s, t (referring to the original input), the algorithm
determines origin and destination sets of product vertices, containing
those product vertices that refer to s/t and an initial/final state of the automa-
ton. Dijkstra’s algorithm is then run on the product network between these two sets of product
vertices. In a follow-up experimental study, Barrett et al. [20] evaluate this algo-
rithm using linear regular languages, a special case.
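One way to implement the search is to keep the product implicit: a Dijkstra label is simply a (vertex, automaton state) pair, as in the following sketch (illustrative names; the automaton is given by its transition relation and its initial and accepting states).

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// A multimodal arc carries a transport-mode label from the alphabet Sigma.
struct LabeledArc { int head; uint32_t weight; int label; };
using LabeledGraph = std::vector<std::vector<LabeledArc>>;

// A (possibly nondeterministic) finite automaton for the language L:
// delta[state][label] lists the successor states.
struct Automaton {
    int numStates;
    std::vector<std::vector<std::vector<int>>> delta;  // state x label -> states
    std::vector<int> initial, accepting;
};

// Dijkstra on the (implicit) product of graph and automaton: an arc may only
// be relaxed if the automaton can take a transition on the arc's mode label.
uint32_t lcsppDijkstra(const LabeledGraph& g, const Automaton& a, int s, int t) {
    const uint32_t INF = std::numeric_limits<uint32_t>::max();
    auto id = [&](int v, int q) { return v * a.numStates + q; };
    std::vector<uint32_t> dist(g.size() * a.numStates, INF);
    using Item = std::pair<uint32_t, int>;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> pq;
    for (int q : a.initial) { dist[id(s, q)] = 0; pq.push({0, id(s, q)}); }
    while (!pq.empty()) {
        auto [d, x] = pq.top(); pq.pop();
        if (d > dist[x]) continue;
        int v = x / a.numStates, q = x % a.numStates;
        for (const LabeledArc& arc : g[v])
            for (int q2 : a.delta[q][arc.label])       // automaton transition on the mode
                if (d + arc.weight < dist[id(arc.head, q2)]) {
                    dist[id(arc.head, q2)] = d + arc.weight;
                    pq.push({dist[id(arc.head, q2)], id(arc.head, q2)});
                }
    }
    uint32_t best = INF;                               // best over all accepting states at t
    for (int q : a.accepting) best = std::min(best, dist[id(t, q)]);
    return best;
}
```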
Basic speedup techniques, such as bidirectional search [67], A* [156], and
heuristic A* [237] have been evaluated in the context of multimodal journey
planning in [159] and [19]. Also, Pajor [210] combines the LCSPP-D algorithm
with time-dependent Dijkstra [65] to compute multimodal journeys that con-
tain a time-dependent subnetwork. He also adapts and analyzes bidirectional
search [67], ALT [148], Arc Flags [157,178], and shortcuts [249] with respect to
LCSPP.
Delling et al. [73] study a multimodal scenario whose optimization criteria include
walking duration for the pedestrian network, and monetary cost for taxis. They
observe that simply applying the MLS algorithm [155,187,196,243] to a compre-
hensive multimodal graph turns out to be slow, even when partial contraction
is applied to the road and pedestrian networks, as in UCCH [106]. To get bet-
ter query performance, they extend RAPTOR [92] to the multimodal scenario,
which results in the multimodal multicriteria RAPTOR algorithm (MCR) [73].
Like RAPTOR, MCR operates in rounds (one per transfer) and computes Pareto
sets of optimal journeys with exactly i transfers in round i. It does so by running,
in each round, a dedicated subalgorithm (RAPTOR for public transit; MLS for
walking and taxi) which obtains journeys with the respective transport mode as
their last leg.
Since the resulting Pareto sets tend to get very large as the number of optimization
criteria increases, Delling et al. identify the most significant journeys in
a quick postprocessing step, using a scoring method based on fuzzy logic [259]. For
faster queries, MCR-based heuristics (which relax domination during the algo-
rithm) successfully find the most significant journeys while avoiding the compu-
tation of insignificant ones in the first place.
Bast et al. [23] use MLS with contraction to compute multimodal multicri-
teria journeys at a metropolitan scale. To identify the significant journeys of the
Pareto set, they propose a method called Types aNd Thresholds (TNT). The
method is based on a set of simple axioms that summarize what most users
would consider as unreasonable multimodal paths. For example, if one is willing
to take the car for a large fraction of the trip, one might as well take it for
the whole trip. Three types of reasonable trips are deduced from the axioms:
(1) only car, (2) arbitrarily much transit and walking with no car, and (3) arbi-
trarily much transit with little or no walking and car. With a concrete threshold
for “little” (such as 10 min), the rules can then be applied to filter the reasonable
journeys. As in [73], filtering can be applied during the algorithm to prune the
search space and reduce query time. The resulting sets are fairly robust with
respect to the choice of threshold.
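The filtering rule itself is simple, as the following sketch shows; the per-mode durations and the 10-minute default threshold are illustrative and not the exact formulation of [23].

```cpp
// Hypothetical per-journey mode durations (in minutes) used to illustrate the
// three TNT types of reasonable trips.
struct ModeDurations { double car = 0, transit = 0, walking = 0; };

// A journey is kept if it matches one of the three types: (1) car only,
// (2) transit and walking without car, or (3) transit with little or no
// walking and car (here: at most 'threshold' minutes each).
bool isReasonable(const ModeDurations& j, double threshold = 10.0) {
    bool carOnly      = j.transit == 0 && j.walking == 0;
    bool noCar        = j.car == 0;
    bool littleExtras = j.walking <= threshold && j.car <= threshold;
    return carOnly || noCar || littleExtras;
}
```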
6 Final Remarks
The last decade has seen astonishing progress in the performance of shortest
path algorithms on transportation networks. For routing in road networks, in
particular, modern algorithms can be up to seven orders of magnitude faster
than standard solutions. Successful approaches exploit different properties of
road networks that make them easier to deal with than general graphs, such
as goal direction, a strong hierarchical structure, and the existence of small
separators. Although some early acceleration techniques relied heavily on geom-
etry (road networks are after all embedded on the surface of the Earth), no
current state-of-the-art algorithm makes explicit use of vertex coordinates (see
Table 1). While one still sees the occasional development (and publication) of
geometry-based algorithms, they are consistently dominated by established tech-
niques. In particular, the recent Arterial Hierarchies [261] algorithm is compared
to CH (which has slightly slower queries), but not to other previously published
techniques (such as CHASE, HL, and TNR) that would easily dominate it. This
shows that results in this rapidly-evolving area are often slow to reach some
communities; we hope this survey will help improve this state of affairs.
Note that experiments on real data are very important, as properties of
production data are not always accurately captured by simplified models and
folklore assumptions. For example, the common belief that an algorithm can
be augmented to include turn penalties without significant loss in performance
turned out to be wrong for CH [76].
Another important lesson from recent developments is that careful engineer-
ing is essential to unleash the full computational power of modern computer
architectures. Algorithms such as CRP, CSA, HL, PHAST, and RAPTOR, for
example, achieve much of their good performance by carefully exploiting locality
of reference and parallelism (at the level of instructions, cores, and even GPUs).
The ultimate validation of several of the approaches described here is that
they have found their way into systems that serve millions of users every day.
Several authors of papers cited in this survey have worked on routing-related
projects for companies like Apple, Esri, Google, MapBox, Microsoft, Nokia,
PTV, TeleNav, TomTom, and Yandex. Although companies tend to be secretive
about the actual algorithms they use, in some cases this is public knowledge.
TomTom uses a variant of Arc Flags with shortcuts to perform time-dependent
queries [231]. Microsoft’s Bing Maps⁴ use CRP for routing in road networks.
OSRM [185], a popular route planning engine using OpenStreetMap data, uses
CH for queries. The Transfer Patterns [24] algorithm has been in use for public-
transit journey planning on Google Maps⁵ since 2010. RAPTOR is currently in
use by OpenTripPlanner⁶.
These recent successes do not mean that all problems in this area are solved.
The ultimate goal, a worldwide multimodal journey planner, has not yet been
reached. Systems like Rome2Rio⁷ provide a simplified first step, but a more useful
system would take into account real-time traffic and transit information, historic
patterns, schedule constraints, and monetary costs. Moreover, all these elements
should be combined in a personalized manner. Solving such a general problem
efficiently seems beyond the reach of current algorithms. Given the recent pace
of progress, however, a solution may be closer than expected.
⁴ http://www.bing.com/blogs/site blogs/b/maps/archive/2012/01/05/bing-maps-new-routing-engine.aspx.
⁵ http://www.google.com/transit.
⁶ http://opentripplanner.com.
⁷ http://www.rome2rio.com.
References
1. Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: VC-dimension
and shortest path algorithms. In: Aceto, L., Henzinger, M., Sgall, J. (eds.) ICALP
2011. LNCS, vol. 6755, pp. 690–699. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-22006-7 58
2. Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: HLDB:
Location-based services in databases. In: Proceedings of the 20th ACM SIGSPA-
TIAL International Symposium on Advances in Geographic Information Systems
(GIS 2012), pp. 339–348. ACM Press (2012). Best Paper Award
3. Abraham, I., Delling, D., Fiat, A., Goldberg, A.V., Werneck, R.F.: Highway
dimension and provably efficient shortest path algorithms. Technical report MSR-
TR-2013-91, Microsoft Research (2013)
4. Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: A hub-based labeling
algorithm for shortest paths in road networks. In: Pardalos, P.M., Rebennack, S.
(eds.) SEA 2011. LNCS, vol. 6630, pp. 230–241. Springer, Heidelberg (2011).
doi:10.1007/978-3-642-20662-7 20
5. Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Hierarchical hub label-
ings for shortest paths. In: Epstein, L., Ferragina, P. (eds.) ESA 2012. LNCS, vol.
7501, pp. 24–35. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33090-2 4
6. Abraham, I., Delling, D., Goldberg, A.V., Werneck, R.F.: Alternative routes in
road networks. ACM J. Exp. Algorithm. 18(1), 1–17 (2013)
7. Abraham, I., Fiat, A., Goldberg, A.V., Werneck, R.F.: Highway dimension, short-
est paths, and provably efficient algorithms. In: Proceedings of the 21st Annual
ACM-SIAM Symposium on Discrete Algorithms (SODA 2010), pp. 782–793.
SIAM (2010)
8. Ahuja, R.K., Mehlhorn, K., Orlin, J.B., Tarjan, R.: Faster algorithms for the
shortest path problem. J. ACM 37(2), 213–223 (1990)
9. Ahuja, R.K., Orlin, J.B., Pallottino, S., Scutellà, M.G.: Dynamic shortest paths
minimizing travel times and costs. Networks 41(4), 197–205 (2003)
10. Aifadopoulou, G., Ziliaskopoulos, A., Chrisohoou, E.: Multiobjective optimum
path algorithm for passenger pretrip planning in multimodal transportation net-
works. J. Transp. Res. Board 2032(1), 26–34 (2007). doi:10.3141/2032-04
11. Akiba, T., Iwata, Y., Kawarabayashi, K., Kawata, Y.: Fast shortest-path distance
queries on road networks by pruned highway labeling. In: Proceedings of the 16th
Meeting on Algorithm Engineering and Experiments (ALENEX 2014), pp. 147–
154. SIAM (2014)
12. Akiba, T., Iwata, Y., Yoshida, Y.: Fast exact shortest-path distance queries on
large networks by pruned landmark labeling. In: Proceedings of the 2013 ACM
SIGMOD International Conference on Management of Data (SIGMOD 2013), pp.
349–360. ACM Press (2013)
13. Allulli, L., Italiano, G.F., Santaroni, F.: Exploiting GPS data in public
transport journey planners. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 295–306. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-07959-2 25
14. Antsfeld, L., Walsh, T.: Finding multi-criteria optimal paths in multi-modal pub-
lic transportation networks using the transit algorithm. In: Proceedings of the
19th ITS World Congress (2012)
15. Arz, J., Luxen, D., Sanders, P.: Transit node routing reconsidered. In: Bonifaci, V.,
Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933,
pp. 55–66. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38527-8 7
16. Babenko, M., Goldberg, A.V., Gupta, A., Nagarajan, V.: Algorithms for hub label
optimization. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds.)
ICALP 2013. LNCS, vol. 7965, pp. 69–80. Springer, Heidelberg (2013). doi:10.
1007/978-3-642-39206-1 7
17. Bader, R., Dees, J., Geisberger, R., Sanders, P.: Alternative route graphs in road
networks. In: Marchetti-Spaccamela, A., Segal, M. (eds.) TAPAS 2011. LNCS, vol.
6595, pp. 21–32. Springer, Heidelberg (2011). doi:10.1007/978-3-642-19754-3 5
18. Barrett, C., Bisset, K., Holzer, M., Konjevod, G., Marathe, M., Wagner, D.: Engi-
neering label-constrained shortest-path algorithms. In: Fleischer, R., Xu, J. (eds.)
AAIM 2008. LNCS, vol. 5034, pp. 27–37. Springer, Heidelberg (2008). doi:10.
1007/978-3-540-68880-8 5
19. Barrett, C., Bisset, K., Holzer, M., Konjevod, G., Marathe, M.V., Wagner, D.:
Engineering label-constrained shortest-path algorithms. In: The Shortest Path
Problem: Ninth DIMACS Implementation Challenge, DIMACS Book, vol. 74,
pp. 309–319. American Mathematical Society (2009)
20. Barrett, C., Bisset, K., Jacob, R., Konjevod, G., Marathe, M.: Classical and con-
temporary shortest path problems in road networks: implementation and exper-
imental analysis of the TRANSIMS router. In: Möhring, R., Raman, R. (eds.)
ESA 2002. LNCS, vol. 2461, pp. 126–138. Springer, Heidelberg (2002). doi:10.
1007/3-540-45749-6 15
21. Barrett, C., Jacob, R., Marathe, M.V.: Formal-language-constrained path prob-
lems. SIAM J. Comput. 30(3), 809–837 (2000)
22. Bast, H.: Car or public transport—two worlds. In: Albers, S., Alt, H., Näher, S.
(eds.) Efficient Algorithms. LNCS, vol. 5760, pp. 355–367. Springer, Heidelberg
(2009). doi:10.1007/978-3-642-03456-5 24
23. Bast, H., Brodesser, M., Storandt, S.: Result diversity for multi-modal route
planning. In: Proceedings of the 13th Workshop on Algorithmic Approaches for
Transportation Modeling, Optimization, and Systems (ATMOS 2013), OpenAc-
cess Series in Informatics (OASIcs), pp. 123–136 (2013)
24. Bast, H., Carlsson, E., Eigenwillig, A., Geisberger, R., Harrelson, C., Raychev, V.,
Viger, F.: Fast routing in very large public transportation networks using transfer
patterns. In: Berg, M., Meyer, U. (eds.) ESA 2010. LNCS, vol. 6346, pp. 290–301.
Springer, Heidelberg (2010). doi:10.1007/978-3-642-15775-2 25
25. Bast, H., Sternisko, J., Storandt, S.: Delay-robustness of transfer patterns in
public transportation route planning. In: Proceedings of the 13th Workshop on
Algorithmic Approaches for Transportation Modeling, Optimization, and Systems
(ATMOS 2013), OpenAccess Series in Informatics (OASIcs), pp. 42–54 (2013)
26. Bast, H., Storandt, S.: Flow-based guidebook routing. In: Proceedings of the 16th
Meeting on Algorithm Engineering and Experiments (ALENEX 2014), pp. 155–
165. SIAM (2014)
27. Bast, H., Storandt, S.: Frequency-based search for public transit. In: Proceed-
ings of the 22nd ACM SIGSPATIAL International Conference on Advances in
Geographic Information Systems, pp. 13–22. ACM Press, November 2014
28. Bast, H., Funke, S., Matijevic, D.: Ultrafast shortest-path queries via transit
nodes. In: The Shortest Path Problem: Ninth DIMACS Implementation Chal-
lenge, DIMACS Book, vol. 74, pp. 175–192. American Mathematical Society
(2009)
29. Bast, H., Funke, S., Matijevic, D., Sanders, P., Schultes, D.: In transit to con-
stant shortest-path queries in road networks. In: Proceedings of the 9th Workshop
on Algorithm Engineering and Experiments (ALENEX 2007), pp. 46–59. SIAM
(2007)
30. Bast, H., Funke, S., Sanders, P., Schultes, D.: Fast routing in road networks with
transit nodes. Science 316(5824), 566 (2007)
31. Batz, G.V., Geisberger, R., Luxen, D., Sanders, P., Zubkov, R.: Efficient route
compression for hybrid route planning. In: Even, G., Rawitz, D. (eds.) MedAlg
2012. LNCS, vol. 7659, pp. 93–107. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-34862-4 7
32. Batz, G.V., Geisberger, R., Sanders, P., Vetter, C.: Minimum time-dependent
travel times with contraction hierarchies. ACM J. Exp. Algorithm. 18(1.4), 1–43
(2013)
33. Batz, G.V., Sanders, P.: Time-dependent route planning with generalized objec-
tive functions. In: Epstein, L., Ferragina, P. (eds.) ESA 2012. LNCS, vol. 7501,
pp. 169–180. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33090-2 16
34. Bauer, A.: Multimodal profile queries. Bachelor thesis, Karlsruhe Institute of
Technology, May 2012
35. Bauer, R., Baum, M., Rutter, I., Wagner, D.: On the complexity of partitioning
graphs for arc-flags. J. Graph Algorithms Appl. 17(3), 265–299 (2013)
36. Bauer, R., Columbus, T., Katz, B., Krug, M., Wagner, D.: Preprocess-
ing speed-up techniques is hard. In: Calamoneri, T., Diaz, J. (eds.) CIAC
2010. LNCS, vol. 6078, pp. 359–370. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-13073-1 32
37. Bauer, R., Columbus, T., Rutter, I., Wagner, D.: Search-space size in contraction
hierarchies. In: Fomin, F.V., Freivalds, R., Kwiatkowska, M., Peleg, D. (eds.)
ICALP 2013. LNCS, vol. 7965, pp. 93–104. Springer, Heidelberg (2013). doi:10.
1007/978-3-642-39206-1 9
38. Bauer, R., D’Angelo, G., Delling, D., Schumm, A., Wagner, D.: The shortcut
problem - complexity and algorithms. J. Graph Algorithms Appl. 16(2), 447–481
(2012)
39. Bauer, R., Delling, D.: SHARC: Fast and robust unidirectional routing. ACM J.
Exp. Algorithm. 14(2.4), 1–29 (2009). Special Section on Selected Papers from
ALENEX 2008
40. Bauer, R., Delling, D., Sanders, P., Schieferdecker, D., Schultes, D., Wagner, D.:
Combining hierarchical, goal-directed speed-up techniques for Dijkstra’s algo-
rithm. ACM J. Exp. Algorithm. 15(2.3), 1–31 (2010). Special Section devoted
to WEA 2008
41. Bauer, R., Delling, D., Wagner, D.: Experimental study on speed-up techniques
for timetable information systems. Networks 57(1), 38–52 (2011)
42. Bauer, R., Krug, M., Meinert, S., Wagner, D.: Synthetic road networks. In:
Chen, B. (ed.) AAIM 2010. LNCS, vol. 6124, pp. 46–57. Springer, Heidelberg
(2010). doi:10.1007/978-3-642-14355-7 6
43. Baum, M., Dibbelt, J., Hübschle-Schneider, L., Pajor, T., Wagner, D.: Speed-
consumption tradeoff for electric vehicle route planning. In: Proceedings of
the 14th Workshop on Algorithmic Approaches for Transportation Modeling,
Optimization, and Systems (ATMOS 2014), OpenAccess Series in Informatics
(OASIcs), pp. 138–151 (2014)
44. Baum, M., Dibbelt, J., Pajor, T., Wagner, D.: Energy-optimal routes for electric
vehicles. In: Proceedings of the 21st ACM SIGSPATIAL International Conference
on Advances in Geographic Information Systems, pp. 54–63. ACM Press (2013)
45. Baumann, N., Schmidt, R.: Buxtehude-Garmisch in 6 Sekunden. die elektronis-
che Fahrplanauskunft (EFA) der Deutschen Bundesbahn. Zeitschrift für aktuelle
Verkehrsfragen 10, 929–931 (1988)
46. Bellman, R.: On a routing problem. Q. Appl. Math. 16, 87–90 (1958)
47. Berger, A., Delling, D., Gebhardt, A., Müller-Hannemann, M.: Accelerating time-
dependent multi-criteria timetable information is harder than expected. In: Pro-
ceedings of the 9th Workshop on Algorithmic Approaches for Transportation
Modeling, Optimization, and Systems (ATMOS 2009), OpenAccess Series in
Informatics (OASIcs) (2009)
48. Berger, A., Gebhardt, A., Müller-Hannemann, M., Ostrowski, M.: Stochastic
delay prediction in large train networks. In: Proceedings of the 11th Workshop
on Algorithmic Approaches for Transportation Modeling, Optimization, and Sys-
tems (ATMOS 2011), OpenAccess Series in Informatics (OASIcs), vol. 20, pp.
100–111 (2011)
49. Berger, A., Grimmer, M., Müller-Hannemann, M.: Fully dynamic speed-up tech-
niques for multi-criteria shortest path searches in time-dependent networks.
In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 35–46. Springer, Heidelberg
(2010). doi:10.1007/978-3-642-13193-6 4
50. Bielli, M., Boulmakoul, A., Mouncif, H.: Object modeling and path computation
for multimodal travel systems. Eur. J. Oper. Res. 175(3), 1705–1730 (2006)
51. Böhmová, K., Mihalák, M., Pröger, T., Šrámek, R., Widmayer, P.: Robust rout-
ing in urban public transportation: how to find reliable journeys based on past
observations. In: Proceedings of the 13th Workshop on Algorithmic Approaches
for Transportation Modeling, Optimization, and Systems (ATMOS 2013), Ope-
nAccess Series in Informatics (OASIcs), pp. 27–41 (2013)
52. Botea, A.: Ultra-fast optimal pathfinding without runtime search. In: Proceedings
of the Seventh AAAI Conference on Artificial Intelligence and Interactive Digital
Entertainment (AIIDE 2011), pp. 122–127. AAAI Press (2011)
53. Botea, A., Harabor, D.: Path planning with compressed all-pairs shortest paths
data. In: Proceedings of the 23rd International Conference on Automated Plan-
ning and Scheduling, AAAI Press (2013)
54. Brandes, U., Erlebach, T.: Network Analysis: Methodological Foundations. The-
oretical Computer Science and General Issues, vol. 3418. Springer, Heidelberg
(2005)
55. Brandes, U., Schulz, F., Wagner, D., Willhalm, T.: Travel planning with self-made
maps. In: Buchsbaum, A.L., Snoeyink, J. (eds.) ALENEX 2001. LNCS, vol. 2153,
pp. 132–144. Springer, Heidelberg (2001). doi:10.1007/3-540-44808-X 10
56. Brodal, G., Jacob, R.: Time-dependent networks as models to achieve fast exact
time-table queries. In: Proceedings of the 3rd Workshop on Algorithmic Methods
and Models for Optimization of Railways (ATMOS 2003), Electronic Notes in
Theoretical Computer Science, vol. 92, pp. 3–15 (2004)
57. Bruera, F., Cicerone, S., D’Angelo, G., Di Stefano, G., Frigioni, D.: Dynamic
multi-level overlay graphs for shortest paths. Math. Comput. Sci. 1(4), 709–736
(2008)
58. Brunel, E., Delling, D., Gemsa, A., Wagner, D.: Space-efficient SHARC-routing.
In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 47–58. Springer, Heidelberg
(2010). doi:10.1007/978-3-642-13193-6 5
59. Caldwell, T.: On finding minimum routes in a network with turn penalties. Com-
mun. ACM 4(2), 107–108 (1961)
60. Cambridge Vehicle Information Technology Ltd.: Choice routing (2005). http://
www.camvit.com
61. Cherkassky, B.V., Goldberg, A.V., Radzik, T.: Shortest paths algorithms. Math.
Programm. Ser. A 73, 129–174 (1996)
62. Cherkassky, B.V., Goldberg, A.V., Silverstein, C.: Buckets, heaps, lists, and
monotone priority queues. In: Proceedings of the 8th Annual ACM-SIAM Sympo-
sium on Discrete Algorithms (SODA 1997), pp. 83–92. IEEE Computer Society
Press (1997)
63. Cionini, A., D’Angelo, G., D’Emidio, M., Frigioni, D., Giannakopoulou, K.,
Paraskevopoulos, A.: Engineering graph-based models for dynamic timetable
information systems. In: Proceedings of the 14th Workshop on Algorithmic
Approaches for Transportation Modeling, Optimization, and Systems (ATMOS
2014), OpenAccess Series in Informatics (OASIcs), pp. 46–61 (2014)
64. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries
via 2-hop labels. SIAM J. Comput. 32(5), 1338–1355 (2003)
65. Cooke, K., Halsey, E.: The shortest route through a network with time-dependent
intermodal transit times. J. Math. Anal. Appl. 14(3), 493–498 (1966)
66. D’Angelo, G., D’Emidio, M., Frigioni, D., Vitale, C.: Fully dynamic maintenance
of arc-flags in road networks. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276,
pp. 135–147. Springer, Heidelberg (2012). doi:10.1007/978-3-642-30850-5 13
67. Dantzig, G.B.: Linear Programming and Extensions. Princeton University Press,
Princeton (1962)
68. Dean, B.C.: Continuous-time dynamic shortest path algorithms. Master’s thesis,
Massachusetts Institute of Technology (1999)
69. Dean, B.C.: Algorithms for minimum-cost paths in time-dependent networks with
waiting policies. Networks 44(1), 41–46 (2004)
70. Dean, B.C.: Shortest paths in FIFO time-dependent networks: theory and algo-
rithms. Technical report, Massachusetts Institute Of Technology (2004)
71. Dehne, F., Omran, M.T., Sack, J.-R.: Shortest paths in time-dependent FIFO
networks. Algorithmica 62, 416–435 (2012)
72. Delling, D.: Time-dependent SHARC-routing. Algorithmica 60(1), 60–94 (2011)
73. Delling, D., Dibbelt, J., Pajor, T., Wagner, D., Werneck, R.F.: Computing
multimodal journeys in practice. In: Bonifaci, V., Demetrescu, C., Marchetti-
Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 260–271. Springer,
Heidelberg (2013). doi:10.1007/978-3-642-38527-8 24
74. Delling, D., Giannakopoulou, K., Wagner, D., Zaroliagis, C.: Timetable informa-
tion updating in case of delays: modeling issues. Technical report 133, Arrival
Technical report (2008)
75. Delling, D., Goldberg, A.V., Nowatzyk, A., Werneck, R.F.: PHAST: Hardware-
accelerated shortest path trees. J. Parallel Distrib. Comput. 73(7), 940–952 (2013)
76. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route plan-
ning. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp.
376–387. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20662-7 32
77. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Robust distance queries on
massive networks. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014. LNCS, vol. 8737,
pp. 321–333. Springer, Heidelberg (2014). doi:10.1007/978-3-662-44777-2 27
78. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route plan-
ning in road networks. Transp. Sci. (2015)
79. Delling, D., Goldberg, A.V., Razenshteyn, I., Werneck, R.F.: Graph partition-
ing with natural cuts. In: 25th International Parallel and Distributed Processing
Symposium (IPDPS 2011), pp. 1135–1146. IEEE Computer Society (2011)
80. Delling, D., Goldberg, A.V., Savchenko, R., Werneck, R.F.: Hub labels: theory and
practice. In: Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504,
pp. 259–270. Springer, Heidelberg (2014). doi:10.1007/978-3-319-07959-2 22
81. Delling, D., Goldberg, A.V., Werneck, R.F.: Faster batched shortest paths in road
networks. In: Proceedings of the 11th Workshop on Algorithmic Approaches for
Transportation Modeling, Optimization, and Systems (ATMOS 2011), OpenAc-
cess Series in Informatics (OASIcs), vol. 20, pp. 52–63 (2011)
82. Delling, D., Goldberg, A.V., Werneck, R.F.: Hub label compression. In: Bonifaci,
V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933,
pp. 18–29. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38527-8 4
83. Delling, D., Holzer, M., Müller, K., Schulz, F., Wagner, D.: High-performance
multi-level routing. In: The Shortest Path Problem: Ninth DIMACS Implemen-
tation Challenge, DIMACS Book, vol. 74, pp. 73–92. American Mathematical
Society (2009)
84. Delling, D., Italiano, G.F., Pajor, T., Santaroni, F.: Better transit routing by
exploiting vehicle GPS data. In: Proceedings of the 7th ACM SIGSPATIAL
International Workshop on Computational Transportation Science. ACM Press,
November 2014
85. Delling, D., Katz, B., Pajor, T.: Parallel computation of best connections in public
transportation networks. ACM J. Exp. Algorithm. 17(4), 4.1–4.26 (2012)
86. Delling, D., Kobitzsch, M., Luxen, D., Werneck, R.F.: Robust mobile route plan-
ning with limited connectivity. In: Proceedings of the 14th Meeting on Algorithm
Engineering and Experiments (ALENEX 2012), pp. 150–159. SIAM (2012)
87. Delling, D., Kobitzsch, M., Werneck, R.F.: Customizing driving directions
with GPUs. In: Silva, F., Dutra, I., Santos Costa, V. (eds.) Euro-Par 2014.
LNCS, vol. 8632, pp. 728–739. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-09873-9 61
88. Delling, D., Nannicini, G.: Core routing on dynamic time-dependent road net-
works. Informs J. Comput. 24(2), 187–201 (2012)
89. Delling, D., Pajor, T., Wagner, D.: Accelerating multi-modal route planning by
access-nodes. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp.
587–598. Springer, Heidelberg (2009). doi:10.1007/978-3-642-04128-0 53
90. Delling, D., Pajor, T., Wagner, D.: Engineering time-expanded graphs for faster
timetable information. In: Ahuja, R.K., Möhring, R.H., Zaroliagis, C.D. (eds.)
Robust and Online Large-Scale Optimization. LNCS, vol. 5868, pp. 182–206.
Springer, Heidelberg (2009). doi:10.1007/978-3-642-05465-5 7
91. Delling, D., Pajor, T., Wagner, D., Zaroliagis, C.: Efficient route planning in flight
networks. In: Proceedings of the 9th Workshop on Algorithmic Approaches for
Transportation Modeling, Optimization, and Systems (ATMOS 2009), OpenAc-
cess Series in Informatics (OASIcs) (2009)
92. Delling, D., Pajor, T., Werneck, R.F.: Round-based public transit routing. In:
Proceedings of the 14th Meeting on Algorithm Engineering and Experiments
(ALENEX 2012), pp. 130–140. SIAM (2012)
93. Delling, D., Pajor, T., Werneck, R.F.: Round-based public transit routing. Transp.
Sci. 49, 591–604 (2014)
94. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Engineering route planning
algorithms. In: Lerner, J., Wagner, D., Zweig, K.A. (eds.) Algorithmics of Large
and Complex Networks. LNCS, vol. 5515, pp. 117–139. Springer, Heidelberg
(2009). doi:10.1007/978-3-642-02094-0 7
95. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Highway hierarchies star. In:
The Shortest Path Problem: Ninth DIMACS Implementation Challenge, DIMACS
Book, vol. 74, pp. 141–174. American Mathematical Society (2009)
96. Delling, D., Wagner, D.: Landmark-based routing in dynamic graphs. In: Deme-
trescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 52–65. Springer, Heidelberg
(2007). doi:10.1007/978-3-540-72845-0 5
97. Delling, D., Wagner, D.: Pareto paths with SHARC. In: Vahrenhold, J. (ed.)
SEA 2009. LNCS, vol. 5526, pp. 125–136. Springer, Heidelberg (2009). doi:10.
1007/978-3-642-02011-7 13
98. Delling, D., Wagner, D.: Time-dependent route planning. In: Ahuja, R.K.,
Möhring, R.H., Zaroliagis, C.D. (eds.) Robust and Online Large-Scale Optimiza-
tion. LNCS, vol. 5868, pp. 207–230. Springer, Heidelberg (2009). doi:10.1007/
978-3-642-05465-5 8
99. Delling, D., Werneck, R.F.: Customizable point-of-interest queries in road net-
works. In: Proceedings of the 21st ACM SIGSPATIAL International Symposium
on Advances in Geographic Information Systems (GIS 2013), pp. 490–493. ACM
Press (2013)
100. Delling, D., Werneck, R.F.: Faster customization of road networks. In: Bonifaci,
V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933,
pp. 30–42. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38527-8 5
101. Demetrescu, C., Goldberg, A.V., Johnson, D.S. (eds.): The Shortest Path Prob-
lem: Ninth DIMACS Implementation Challenge, DIMACS Book, vol. 74. Ameri-
can Mathematical Society, Providence (2009)
102. Demiryurek, U., Banaei-Kashani, F., Shahabi, C.: A case for time-dependent
shortest path computation in spatial networks. In: Proceedings of the 18th ACM
SIGSPATIAL International Conference on Advances in Geographic Information
Systems (GIS 2010), pp. 474–477 (2010)
103. Denardo, E.V., Fox, B.L.: Shortest-route methods: 1. reaching, pruning, and buck-
ets. Oper. Res. 27(1), 161–186 (1979)
104. Dial, R.B.: Algorithm 360: shortest-path forest with topological ordering [H].
Commun. ACM 12(11), 632–633 (1969)
105. Dibbelt, J., Pajor, T., Strasser, B., Wagner, D.: Intriguingly simple and fast tran-
sit routing. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.)
SEA 2013. LNCS, vol. 7933, pp. 43–54. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-38527-8 6
106. Dibbelt, J., Pajor, T., Wagner, D.: User-constrained multi-modal route planning.
In: Proceedings of the 14th Meeting on Algorithm Engineering and Experiments
(ALENEX 2012), pp. 118–129. SIAM (2012)
107. Dibbelt, J., Strasser, B., Wagner, D.: Customizable contraction hierarchies. In:
Gudmundsson, J., Katajainen, J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 271–282.
Springer, Heidelberg (2014). doi:10.1007/978-3-319-07959-2 23
108. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numer. Math.
1, 269–271 (1959)
109. Disser, Y., Müller–Hannemann, M., Schnee, M.: Multi-criteria shortest paths
in time-dependent train networks. In: McGeoch, C.C. (ed.) WEA 2008.
LNCS, vol. 5038, pp. 347–361. Springer, Heidelberg (2008). doi:10.1007/
978-3-540-68552-4 26
110. Drews, F., Luxen, D.: Multi-hop ride sharing. In: Proceedings of the 5th Interna-
tional Symposium on Combinatorial Search (SoCS 2012), pp. 71–79. AAAI Press
(2013)
111. Dreyfus, S.E.: An appraisal of some shortest-path algorithms. Oper. Res. 17(3),
395–412 (1969)
112. Efentakis, A., Pfoser, D.: Optimizing landmark-based routing and preprocessing.
In: Proceedings of the 6th ACM SIGSPATIAL International Workshop on Com-
putational Transportation Science, pp. 25:25–25:30. ACM Press, November 2013
113. Efentakis, A., Pfoser, D.: GRASP. Extending graph separators for the single-
source shortest-path problem. In: Schulz, A.S., Wagner, D. (eds.) ESA 2014.
LNCS, vol. 8737, pp. 358–370. Springer, Heidelberg (2014). doi:10.1007/
978-3-662-44777-2 30
114. Efentakis, A., Pfoser, D., Vassiliou, Y.: SALT: a unified framework for all
shortest-path query variants on road networks. CoRR, abs/1411.0257 (2014)
115. Efentakis, A., Pfoser, D., Voisard, A.: Efficient data management in support of
shortest-path computation. In: Proceedings of the 4th ACM SIGSPATIAL Inter-
national Workshop on Computational Transportation Science, pp. 28–33. ACM
Press (2011)
116. Efentakis, A., Theodorakis, D., Pfoser, D.: Crowdsourcing computing resources
for shortest-path computation. In: Proceedings of the 20th ACM SIGSPATIAL
International Symposium on Advances in Geographic Information Systems (GIS
2012), pp. 434–437. ACM Press (2012)
117. Ehrgott, M., Gandibleux, X.: Multiple Criteria Optimization: State of the Art
Annotated Bibliographic Surveys. Kluwer Academic Publishers Group, New York
(2002)
118. Eisenstat, D.: Random road networks: the quadtree model. In: Proceedings of the
Eighth Workshop on Analytic Algorithmics and Combinatorics (ANALCO 2011),
pp. 76–84. SIAM, January 2011
119. Eisner, J., Funke, S.: Sequenced route queries: getting things done on the way back
home. In: Proceedings of the 20th ACM SIGSPATIAL International Symposium
on Advances in Geographic Information Systems (GIS 2012), pp. 502–505. ACM
Press (2012)
120. Eisner, J., Funke, S.: Transit nodes - lower bounds and refined construction.
In: Proceedings of the 14th Meeting on Algorithm Engineering and Experiments
(ALENEX 2012), pp. 141–149. SIAM (2012)
121. Eisner, J., Funke, S., Herbst, A., Spillner, A., Storandt, S.: Algorithms for match-
ing and predicting trajectories. In: Proceedings of the 13th Workshop on Algo-
rithm Engineering and Experiments (ALENEX 2011), pp. 84–95. SIAM (2011)
122. Eisner, J., Funke, S., Storandt, S.: Optimal route planning for electric vehicles in
large network. In: Proceedings of the Twenty-Fifth AAAI Conference on Artificial
Intelligence. AAAI Press, August 2011
123. Eppstein, D., Goodrich, M.T.: Studying (non-planar) road networks through
an algorithmic lens. In: Proceedings of the 16th ACM SIGSPATIAL Interna-
tional Conference on Advances in Geographic Information Systems (GIS 2008),
pp. 1–10. ACM Press (2008)
124. Erb, S., Kobitzsch, M., Sanders, P.: Parallel bi-objective shortest paths using
weight-balanced B-trees with bulk updates. In: Gudmundsson, J., Katajainen,
J. (eds.) SEA 2014. LNCS, vol. 8504, pp. 111–122. Springer, Heidelberg (2014).
doi:10.1007/978-3-319-07959-2 10
125. Firmani, D., Italiano, G.F., Laura, L., Santaroni, F.: Is timetabling routing always
reliable for public transport? In: Proceedings of the 13th Workshop on Algo-
rithmic Approaches for Transportation Modeling, Optimization, and Systems
(ATMOS 2013), OpenAccess Series in Informatics (OASIcs), pp. 15–26 (2013)
126. Floyd, R.W.: Algorithm 97: shortest path. Commun. ACM 5(6), 345 (1962)
127. Ford, Jr., L.R.: Network flow theory. Technical report P-923, Rand Corporation,
Santa Monica, California (1956)
128. Foschini, L., Hershberger, J., Suri, S.: On the complexity of time-dependent short-
est paths. Algorithmica 68(4), 1075–1097 (2014)
129. Fredman, M.L., Tarjan, R.E.: Fibonacci heaps and their uses in improved network
optimization algorithms. J. ACM 34(3), 596–615 (1987)
130. Fu, L., Sun, D., Rilett, L.R.: Heuristic shortest path algorithms for transportation
applications: state of the art. Comput. Oper. Res. 33(11), 3324–3343 (2006)
131. Funke, S., Nusser, A., Storandt, S.: On k-path covers and their applications.
In: Proceedings of the 40th International Conference on Very Large Databases
(VLDB 2014), pp. 893–902 (2014)
132. Funke, S., Nusser, A., Storandt, S.: Placement of loading stations for electric
vehicles: no detours necessary! In: Proceedings of the Twenty-Eighth AAAI Con-
ference on Artificial Intelligence. AAAI Press (2014)
133. Funke, S., Storandt, S.: Polynomial-time construction of contraction hierarchies
for multi-criteria objectives. In: Proceedings of the 15th Meeting on Algorithm
Engineering and Experiments (ALENEX 2013), pp. 31–54. SIAM (2013)
134. Gavoille, C., Peleg, D.: Compact and localized distributed data structures. Dis-
trib. Comput. 16(2–3), 111–120 (2003)
135. Gavoille, C., Peleg, D., Pérennes, S., Raz, R.: Distance labeling in graphs. J.
Algorithms 53, 85–112 (2004)
136. Geisberger, R.: Contraction of timetable networks with realistic transfers. In:
Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 71–82. Springer, Heidelberg (2010).
doi:10.1007/978-3-642-13193-6 7
137. Geisberger, R.: Advanced route planning in transportation networks. Ph.D. thesis,
Karlsruhe Institute of Technology, February 2011
138. Geisberger, R., Kobitzsch, M., Sanders, P.: Route planning with flexible objective
functions. In: Proceedings of the 12th Workshop on Algorithm Engineering and
Experiments (ALENEX 2010), pp. 124–137. SIAM (2010)
139. Geisberger, R., Luxen, D., Sanders, P., Neubauer, S., Volker, L.: Fast detour com-
putation for ride sharing. In: Proceedings of the 10th Workshop on Algorithmic
Approaches for Transportation Modeling, Optimization, and Systems (ATMOS
2010), OpenAccess Series in Informatics (OASIcs), vol. 14, pp. 88–99 (2010)
140. Geisberger, R., Rice, M., Sanders, P., Tsotras, V.: Route planning with flexible
edge restrictions. ACM J. Exp. Algorithm. 17(1), 1–20 (2012)
141. Geisberger, R., Sanders, P.: Engineering time-dependent many-to-many short-
est paths computation. In: Proceedings of the 10th Workshop on Algorithmic
Approaches for Transportation Modeling, Optimization, and Systems (ATMOS
2010), OpenAccess Series in Informatics (OASIcs), vol. 14 (2010)
142. Geisberger, R., Sanders, P., Schultes, D., Vetter, C.: Exact routing in large road
networks using contraction hierarchies. Transp. Sci. 46(3), 388–404 (2012)
143. Geisberger, R., Schieferdecker, D.: Heuristic contraction hierarchies with approx-
imation guarantee. In: Proceedings of the 3rd International Symposium on Com-
binatorial Search (SoCS 2010). AAAI Press (2010)
144. Geisberger, R., Vetter, C.: Efficient routing in road networks with turn costs. In:
Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp. 100–111.
Springer, Heidelberg (2011). doi:10.1007/978-3-642-20662-7 9
145. Goerigk, M., Heße, S., Müller-Hannemann, M., Schmidt, M.: Recoverable robust
timetable information. In: Proceedings of the 13th Workshop on Algorithmic
Approaches for Transportation Modeling, Optimization, and Systems (ATMOS
2013), OpenAccess Series in Informatics (OASIcs), pp. 1–14 (2013)
146. Goerigk, M., Knoth, M., Müller-Hannemann, M., Schmidt, M., Schöbel, A.: The
price of strict and light robustness in timetable information. Transp. Sci. 48,
225–242 (2014)
147. Goldberg, A.V.: A practical shortest path algorithm with linear expected time.
SIAM J. Comput. 37, 1637–1655 (2008)
148. Goldberg, A.V., Harrelson, C.: Computing the shortest path: A* search meets
graph theory. In: Proceedings of the 16th Annual ACM-SIAM Symposium on
Discrete Algorithms (SODA 2005), pp. 156–165. SIAM (2005)
149. Goldberg, A.V., Kaplan, H., Werneck, R.F.: Reach for A*: shortest path algo-
rithms with preprocessing. In: The Shortest Path Problem: Ninth DIMACS Imple-
mentation Challenge, DIMACS Book, vol. 74, pp. 93–139. American Mathemat-
ical Society (2009)
150. Goldberg, A.V., Werneck, R.F.: Computing point-to-point shortest paths from
external memory. In: Proceedings of the 7th Workshop on Algorithm Engineering
and Experiments (ALENEX 2005), pp. 26–40. SIAM (2005)
151. Goldman, R., Shivakumar, N.R., Venkatasubramanian, S., Garcia-Molina, H.:
Proximity search in databases. In: Proceedings of the 24th International Con-
ference on Very Large Databases (VLDB 1998), pp. 26–37. Morgan Kaufmann,
August 1998
152. Goodrich, M.T., Pszona, P.: Two-phase bicriterion search for finding fast and
efficient electric vehicle routes. In: Proceedings of the 22nd ACM SIGSPATIAL
International Conference on Advances in Geographic Information Systems. ACM
Press, November 2014
153. Gunkel, T., Schnee, M., Müller-Hannemann, M.: How to find good night train
connections. Networks 57(1), 19–27 (2011)
154. Gutman, R.J.: Reach-based routing: a new approach to shortest path algorithms
optimized for road networks. In: Proceedings of the 6th Workshop on Algorithm
Engineering and Experiments (ALENEX 2004), pp. 100–111. SIAM (2004)
155. Hansen, P.: Bicriterion path problems. In: Fandel, G., Gal, T. (eds.) Multiple
Criteria Decision Making - Theory and Application. LNEMS, vol. 177, pp. 109–
127. Springer, Heidelberg (1979). doi:10.1007/978-3-642-48782-8 9
156. Hart, P.E., Nilsson, N., Raphael, B.: A formal basis for the heuristic determination
of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4, 100–107 (1968)
157. Hilger, M., Köhler, E., Möhring, R.H., Schilling, H.: Fast point-to-point shortest
path computations with arc-flags. In: The Shortest Path Problem: Ninth DIMACS
Implementation Challenge, DIMACS Book, vol. 74, pp. 41–72. American Mathe-
matical Society (2009)
158. Hliněný, P., Moriš, O.: Scope-based route planning. In: Demetrescu, C.,
Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 445–456. Springer,
Heidelberg (2011). doi:10.1007/978-3-642-23719-5 38
159. Holzer, M.: Engineering planar-separator and shortest-path algorithms. Ph.D.
thesis, Karlsruhe Institute of Technology (KIT) - Department of Informatics
(2008)
160. Holzer, M., Schulz, F., Wagner, D.: Engineering multilevel overlay graphs for
shortest-path queries. ACM J. Exp. Algorithm. 13(2.5), 1–26 (2008)
161. Holzer, M., Schulz, F., Wagner, D., Willhalm, T.: Combining speed-up techniques
for shortest-path computations. ACM J. Exp. Algorithm. 10(2.5), 1–18 (2006)
162. Horvitz, E., Krumm, J.: Some help on the way: opportunistic routing under uncer-
tainty. In: Proceedings of the 2012 ACM Conference on Ubiquitous Computing
(Ubicomp 2012), pp. 371–380. ACM Press (2012)
163. Ikeda, T., Hsu, M.-Y., Imai, H., Nishimura, S., Shimoura, H., Hashimoto, T.,
Tenmoku, K., Mitoh, K.: A fast algorithm for finding better routes by AI search
techniques. In: Proceedings of the Vehicle Navigation and Information Systems
Conference (VNSI 1994), pp. 291–296. ACM Press (1994)
164. Jing, N., Huang, Y.-W., Rundensteiner, E.A.: Hierarchical encoded path views for
path query processing: an optimal model and its performance evaluation. IEEE
Trans. Knowl. Data Eng. 10(3), 409–432 (1998)
165. Jung, S., Pramanik, S.: An efficient path computation model for hierarchically
structured topographical road maps. IEEE Trans. Knowl. Data Eng. 14(5), 1029–
1046 (2002)
166. Kaindl, H., Kainz, G.: Bidirectional heuristic search reconsidered. J. Artif. Intell.
Res. 7, 283–317 (1997)
167. Kaufmann, H.: Towards mobile time-dependent route planning. Bachelor thesis,
Karlsruhe Institute of Technology (2013)
168. Kieritz, T., Luxen, D., Sanders, P., Vetter, C.: Distributed time-dependent con-
traction hierarchies. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 83–93.
Springer, Heidelberg (2010). doi:10.1007/978-3-642-13193-6 8
169. Kirchler, D., Liberti, L., Wolfler Calvo, R.: A label correcting algorithm for the
shortest path problem on a multi-modal route network. In: Klasing, R. (ed.) SEA
2012. LNCS, vol. 7276, pp. 236–247. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-30850-5 21
170. Kirchler, D., Liberti, L., Pajor, T., Calvo, R.W.: UniALT for regular language con-
straint shortest paths on a multi-modal transportation network. In: Proceedings
of the 11th Workshop on Algorithmic Approaches for Transportation Modeling,
Optimization, and Systems (ATMOS 2011), OpenAccess Series in Informatics
(OASIcs), vol. 20, pp. 64–75 (2011)
171. Kleinberg, J.M., Slivkins, A., Wexler, T.: Triangulation and embedding using
small sets of beacons. In: Proceedings of the 45th Annual IEEE Symposium on
Foundations of Computer Science (FOCS 2004), pp. 444–453. IEEE Computer
Society Press (2004)
172. Knopp, S., Sanders, P., Schultes, D., Schulz, F., Wagner, D.: Computing many-
to-many shortest paths using highway hierarchies. In: Proceedings of the 9th
Workshop on Algorithm Engineering and Experiments (ALENEX 2007), pp. 36–
45. SIAM (2007)
173. Kobitzsch, M.: HiDAR: an alternative approach to alternative routes. In: Bod-
laender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 613–624.
Springer, Heidelberg (2013). doi:10.1007/978-3-642-40450-4 52
174. Kobitzsch, M., Radermacher, M., Schieferdecker, D.: Evolution and evaluation
of the penalty method for alternative graphs. In: Proceedings of the 13th Work-
shop on Algorithmic Approaches for Transportation Modeling, Optimization, and
Systems (ATMOS 2013), OpenAccess Series in Informatics (OASIcs), pp. 94–107
(2013)
175. Kontogiannis, S., Zaroliagis, C.: Distance oracles for time-dependent networks.
In: Esparza, J., Fraigniaud, P., Husfeldt, T., Koutsoupias, E. (eds.) ICALP
2014. LNCS, vol. 8572, pp. 713–725. Springer, Heidelberg (2014). doi:10.1007/
978-3-662-43948-7 59
176. Krumm, J., Gruen, R., Delling, D.: From destination prediction to route predic-
tion. J. Locat. Based Serv. 7(2), 98–120 (2013)
177. Krumm, J., Horvitz, E.: Predestination: where do you want to go today? IEEE
Comput. 40(4), 105–107 (2007)
230. Sankaranarayanan, J., Samet, H.: Roads belong in databases. IEEE Data Eng.
Bull. 33(2), 4–11 (2010)
231. Schilling, H.: TomTom navigation - How mathematics help getting through traffic
faster (2012). Talk given at ISMP
232. Schreiber, R.: A new implementation of sparse Gaussian elimination. ACM Trans.
Math. Softw. 8(3), 256–276 (1982)
233. Schultes, D.: Route planning in road networks. Ph.D. thesis, Universität Karlsruhe
(TH), February 2008
234. Schultes, D., Sanders, P.: Dynamic highway-node routing. In: Demetrescu, C. (ed.)
WEA 2007. LNCS, vol. 4525, pp. 66–79. Springer, Heidelberg (2007). doi:10.1007/
978-3-540-72845-0 6
235. Schulz, F., Wagner, D., Weihe, K.: Dijkstra’s algorithm on-line: an empirical case
study from public railroad transport. ACM J. Exp. Algorithm. 5(12), 1–23 (2000)
236. Schulz, F., Wagner, D., Zaroliagis, C.: Using multi-level graphs for timetable
information in railway systems. In: Mount, D.M., Stein, C. (eds.) ALENEX
2002. LNCS, vol. 2409, pp. 43–59. Springer, Heidelberg (2002). doi:10.1007/
3-540-45643-0 4
237. Sedgewick, R., Vitter, J.S.: Shortest paths in Euclidean graphs. Algorithmica
1(1), 31–48 (1986)
238. Sommer, C.: Shortest-path queries in static networks. ACM Comput. Surv. 46(4),
1–31 (2014)
239. Storandt, S.: Route planning for bicycles - exact constrained shortest paths made
practical via contraction hierarchy. In: Proceedings of the Twenty-Second Inter-
national Conference on Automated Planning and Scheduling, pp. 234–242 (2012)
240. Storandt, S., Funke, S.: Cruising with a battery-powered vehicle and not getting
stranded. In: Proceedings of the Twenty-Sixth AAAI Conference on Artificial
Intelligence. AAAI Press (2012)
241. Storandt, S., Funke, S.: Enabling e-mobility: facility location for battery loading
stations. In: Proceedings of the Twenty-Seventh AAAI Conference on Artificial
Intelligence. AAAI Press (2013)
242. Strasser, B., Wagner, D.: Connection scan accelerated. In: Proceedings of the
16th Meeting on Algorithm Engineering and Experiments (ALENEX 2014), pp.
125–137. SIAM (2014)
243. Theune, D.: Robuste und effiziente Methoden zur Lösung von Wegproblemen.
Ph.D. thesis, Universität Paderborn (1995)
244. Thorup, M.: Integer priority queues with decrease key in constant time and the
single source shortest paths problem. In: 35th ACM Symposium on Theory of
Computing, pp. 149–158. ACM, New York (2003)
245. Thorup, M.: Compact oracles for reachability and approximate distances in planar
digraphs. J. ACM 51(6), 993–1024 (2004)
246. Tsaggouris, G., Zaroliagis, C.: Multiobjective optimization: improved FPTAS for
shortest paths and non-linear objectives with applications. Theory Comput. Syst.
45(1), 162–186 (2009)
247. Tulp, E., Siklóssy, L.: TRAINS, an active time-table searcher. ECAI 88, 170–175
(1988)
248. Tulp, E., Siklóssy, L.: Searching time-table networks. Artif. Intell. Eng. Des. Anal.
Manuf. 5(3), 189–198 (1991)
249. van Vliet, D.: Improved shortest path algorithms for transport networks. Transp.
Res. Part B: Methodol. 12(1), 7–20 (1978)
250. Wagner, D., Willhalm, T.: Drawing graphs to speed up shortest-path computa-
tions. In: Proceedings of the 7th Workshop on Algorithm Engineering and Exper-
iments (ALENEX 2005), pp. 15–24. SIAM (2005)
251. Wagner, D., Willhalm, T., Zaroliagis, C.: Geometric containers for efficient
shortest-path computation. ACM J. Exp. Algorithm. 10(1.3), 1–30 (2005)
252. Weller, M.: Optimal hub labeling is NP-complete. CoRR, abs/1407.8373 (2014)
253. White, D.J.: Epsilon efficiency. J. Optim. Theory Appl. 49(2), 319–337 (1986)
254. Williams, J.W.J.: Algorithm 232: heapsort. Commun. ACM 7(6), 347–348 (1964)
255. Winter, S.: Modeling costs of turns in route planning. GeoInformatica 6(4), 345–
361 (2002)
256. Witt, S.: Trip-based public transit routing. In: Bansal, N., Finocchi, I. (eds.) ESA
2015. LNCS, vol. 9294, pp. 1025–1036. Springer, Heidelberg (2015). doi:10.1007/
978-3-662-48350-3 85
257. Wu, L., Xiao, X., Deng, D., Cong, G., Zhu, A.D., Zhou, S.: Shortest path
and distance queries on road networks: an experimental evaluation. Proc. VLDB
Endow. 5(5), 406–417 (2012)
258. Yu, H., Lu, F.: Advanced multi-modal routing approach for pedestrians. In: 2nd
International Conference on Consumer Electronics, Communications and Net-
works, pp. 2349–2352 (2012)
259. Zadeh, L.A.: Fuzzy logic. IEEE Comput. 21(4), 83–93 (1988)
260. Zhong, R., Li, G., Tan, K.-L., Zhou, L.: G-tree: an efficient index for KNN search
on road networks. In: Proceedings of the 22nd International Conference on Infor-
mation and Knowledge Management, pp. 39–48. ACM Press (2013)
261. Zhu, A.D., Ma, H., Xiao, X., Luo, S., Tang, Y., Zhou, S.: Shortest path, distance
queries on road networks: towards bridging theory and practice. In: Proceedings
of the 2013 ACM SIGMOD International Conference on Management of Data
(SIGMOD 2013), pp. 857–868. ACM Press (2013)
262. Zwick, U.: Exact and approximate distances in graphs — a survey. In: Heide,
F.M. (ed.) ESA 2001. LNCS, vol. 2161, pp. 33–48. Springer, Heidelberg (2001).
doi:10.1007/3-540-44676-1 3
Theoretical Analysis of the k-Means
Algorithm – A Survey
1 Introduction
Clustering is a basic process in data analysis. It aims to partition a set of objects
into groups called clusters such that, ideally, objects in the same group are
similar and objects in different groups are dissimilar to each other. There are
many scenarios where such a partition is useful. It may, for example, be used to
structure the data to allow efficient information retrieval, to reduce the data by
replacing a cluster by one or more representatives or to extract the main ‘themes’
in the data. There are many surveys on clustering algorithms, including well-
known classics [45,48] and more recent ones [24,47]. Notice that the title of [47] is
Data clustering: 50 years beyond K-means in reference to the k-means algorithm,
probably the most widely used clustering algorithm of all time. It was proposed
in 1957 by Lloyd [58] (and independently in 1956 by Steinhaus [70]) and is the
topic of this survey.
The k-means algorithm solves the problem of clustering to minimize the sum
of squared errors (SSE). In this problem, we are given a set of points P ⊂ Rd in a
Euclidean space, and the goal is to find a set C ⊂ Rd of k points (not necessarily
included in P ) such that the sum of the squared distances of the points in P
to their nearest center in C is minimized. Thus, the objective function to be
minimized is
    \mathrm{cost}(P, C) := \sum_{p \in P} \min_{c \in C} \| p - c \|^2 ,
where ‖·‖² is the squared Euclidean distance. The points in C are called cen-
ters. The objective function may also be viewed as the attempt to minimize the
variance of the Euclidean distance of the points to their nearest cluster centers.
Also notice that when given the centers, the partition of the data set is implicitly
defined by assigning each point to its nearest center.
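To make the objective concrete, here is a minimal sketch (in Python with NumPy; the function names are ours and purely illustrative) that evaluates cost(P, C) and the implicitly defined partition for a given set of points and centers.

    import numpy as np

    def sse_cost(points, centers):
        # points: (n, d) array, centers: (k, d) array of candidate centers
        # squared Euclidean distance from every point to every center, shape (n, k)
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return sq_dists.min(axis=1).sum()          # each point pays for its nearest center

    def induced_partition(points, centers):
        # the partition implied by a center set: index of the nearest center per point
        sq_dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return sq_dists.argmin(axis=1)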
The above problem formulation assumes that the number of centers k is
known in advance. How to choose k might be apparent from the application at
hand, or from a statistical model that is assumed to be true. If it is not, then the
k-means algorithm is typically embedded into a search for the correct number
of clusters. It is then necessary to specify a measure that allows one to compare clusterings with different k (the SSE criterion is monotonically decreasing in k
and thus not a good measure). A good introduction to the topic is the overview by
Venkatasubramanian [75] as well as Sect. 5 in the paper by Tibshirani et al. [71]
and the summary by Gordon [39]. In this survey, we assume that k is provided
with the input.
As Jain [47] also notices, the k-means algorithm is still widely used for clus-
tering and in particular for solving the SSE problem. That is true despite a
variety of alternative options that have been developed in fifty years of research,
and even though the k-means algorithm has known drawbacks.
In this survey, we review the theoretical analysis that has been developed
for the k-means algorithm. Our aim is to give an overview on the properties of
the k-means algorithm and to understand its weaknesses, but also to point out
what makes the k-means algorithm such an attractive algorithm. In this survey
we mainly review theoretical aspects of the k-means algorithm, i.e., we focus on the deduction part of the algorithm engineering cycle, but we also discuss some implementations with a focus on scalability for big data.
The question is whether and how one can avoid always computing the distances between all points and all centers, even if this does not lead to an asymptotic improvement. Imagine the following pruning rule: Let ci be a center in the
current iteration. Compute the minimum distance Δi between ci and any other
center in time Θ(kd). Whenever the distance between a point p and ci is smaller
than Δi /2, then the closest center to p is ci and computing the other k − 1
distances is not necessary. A common observation is that points often stay with
the same cluster as in the previous iteration. Thus, check first whether the point
is within the safe zone of its old center. More complicated pruning rules take
the movement of the points into account. If a point has not moved far compared
to the center movements, it keeps its center allocation. Rules like this aim at
accelerating the k-means algorithm while computing the same clustering as a
naı̈ve implementation. The example pruning rules are from [50].
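The following sketch illustrates how these two pruning rules could be combined in one assignment pass; it is our own illustration (names and interface are hypothetical) and not the implementation from [50].

    import numpy as np

    def assign_with_pruning(points, centers, old_assignment):
        # delta[i]: distance from center i to its closest other center (Theta(kd) per center)
        pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        np.fill_diagonal(pairwise, np.inf)
        delta = pairwise.min(axis=1)

        assignment = old_assignment.copy()
        for idx, p in enumerate(points):
            i = old_assignment[idx]                        # rule 2: test the old center first
            if np.linalg.norm(p - centers[i]) < delta[i] / 2:
                continue                                   # rule 1: p lies in the safe zone of c_i
            assignment[idx] = np.linalg.norm(centers - p, axis=1).argmin()
        return assignment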
Accelerating the algorithm can also be done by assigning groups of points
together using sufficient statistics. Assume that a subset P′ of points is assigned to the same center. Then finding this center and later updating it based on the new points can be done by only using three statistics on P′. These are
the sum of the points (which is a point itself), the sum of the squared lengths
of the points (and thus a constant) and the number of points. However, this
is only useful if the statistic is already precomputed. For low-dimensional data
sets, the precomputation can be done using kd-trees. These provide a hierarchical
subdivision of a point set. The idea now is to equip each inner node with sufficient
statistics on the point set represented by it. When reassigning points to centers,
pruning techniques can be used to decide whether all points belonging to an
inner node have the same center, or whether it is necessary to proceed to the
child nodes to compute the assignment. Different algorithms based on this idea
are given in [10,54,68]. Notice that sufficient statistics are used in other contexts,
too, e.g. as a building block of the well-known data stream clustering algorithm
BIRCH [76].
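As a concrete illustration of sufficient statistics, the sketch below keeps the three quantities named above for a group of points and derives the centroid and the group's SSE contribution from them; the class and its interface are our own and are not taken from [10,54,68] or BIRCH.

    import numpy as np

    class SufficientStats:
        def __init__(self, d):
            self.linear_sum = np.zeros(d)   # sum of the points (a point itself)
            self.squared_sum = 0.0          # sum of squared Euclidean lengths (a constant)
            self.count = 0                  # number of points

        def add(self, p):
            self.linear_sum += p
            self.squared_sum += float(p @ p)
            self.count += 1

        def merge(self, other):
            # statistics of the union of two groups: add componentwise
            self.linear_sum += other.linear_sum
            self.squared_sum += other.squared_sum
            self.count += other.count

        def centroid(self):
            return self.linear_sum / self.count

        def cost_to(self, c):
            # sum_p ||p - c||^2 = sum_p ||p||^2 - 2 <c, sum_p p> + |P'| * ||c||^2
            return self.squared_sum - 2.0 * float(c @ self.linear_sum) + self.count * float(c @ c)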
There are many more ways to accelerate the k-means algorithm.
For an extensive overview and more pointers to the literature, see [41].
can be bounded by O(n^{dk²}) because, given a set of k centers, we can move each of the O(k²) bisectors such that they coincide with d linearly independent
points without changing the partition. For the special case of d = 1 and k < 5,
Dasgupta [31] proved an upper bound of O(n) iterations. Later, for d = 1 and
any k, Har-Peled and Sadri [44] showed an upper bound of O(nΔ2 ) iterations,
where Δ is the ratio between the diameter and the smallest pairwise distance of
the input points.
Fig. 1. Illustration of the upper bound for the k-means algorithm [44].
In the following, we will explain the idea to obtain the upper bound given
in [44]. The input is a set P of n points with spread Δ from the Euclidean line
R. W.l.o.g., we can assume that the minimum pairwise distance in P is 1 and
the diameter of P is Δ. For any natural number k and for any partition of P
into k sets, the clustering cost of P with the means of the subsets as centers is
bounded by O(nΔ2 ). In particular, this holds for the solution of the k-means
algorithm after the first iteration. Additionally, the clustering cost of P certainly
is ω(1) as we assumed that the minimum pairwise distance in P is 1. Thus, if
we can show that each following iteration decreases the cost by at least some
constant amount, then we are done. Let us now consider the point of time in any
iteration of the k-means algorithm when the cluster centers have been moved to
the means of their respective clusters and the next step is to assign each point
to the new closest cluster center. In this step, there has to be a cluster that is
extended or shrunk from its right end. W.l.o.g. and as illustrated in Fig. 1, let
us assume that the leftmost cluster Q1 is extended from its right end. Let S be
the set of points that join cluster Q1 to obtain cluster Q′1. Since the minimum
pairwise distance is 1, the distance of the mean of S to the leftmost point in S
is at least (|S| − 1)/2. Similarly, the distance of the mean of Q1 to the rightmost
point in Q1 is at least (|Q1 | − 1)/2. Furthermore, the distance between any point
in Q1 and any point in S is at least 1. Let μ(X) be the mean of any point set X.
Then, we have μ(Q1 ) − μ(S) ≥ (|Q1 | − 1)/2 + (|S| − 1)/2 + 1 = (|Q1 | + |S|)/2.
The movement of the mean of the leftmost cluster is at least
    \| \mu(Q_1') - \mu(Q_1) \| = \left\| \mu(Q_1) - \frac{|Q_1|\,\mu(Q_1) + |S|\,\mu(S)}{|Q_1| + |S|} \right\|
                               = \frac{|S|}{|Q_1| + |S|}\, \| \mu(Q_1) - \mu(S) \| \;\ge\; \frac{|S|}{2} \;\ge\; \frac{1}{2} .
We will now need the following fact, which is proved in Sect. 6.
Fact 1. Let
    \mu := \frac{1}{|P|} \sum_{p \in P} p
be the mean of a point set P, and let y ∈ R^d be any point. Then, we have
    \sum_{p \in P} \| p - y \|^2 = \sum_{p \in P} \| p - \mu \|^2 + |P| \cdot \| y - \mu \|^2 .
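For the reader's convenience, a one-line derivation of Fact 1 (the formal proof appears in Sect. 6): writing p − y = (p − μ) + (μ − y) and using that the deviations from the mean sum to zero,

    \sum_{p \in P} \|p - y\|^2
      = \sum_{p \in P} \|p - \mu\|^2
        + 2\,(\mu - y)^\top \sum_{p \in P} (p - \mu)
        + |P| \cdot \|\mu - y\|^2
      = \sum_{p \in P} \|p - \mu\|^2 + |P| \cdot \|y - \mu\|^2 ,

    since \sum_{p \in P} (p - \mu) = 0.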
Lower Bounds. Lower bounds on the worst-case running time of the k-means
algorithm have been studied in [13,31,72]. Dasgupta [31] proved that the k-
means algorithm has a worst-case running time of Ω(n) iterations. Using a construction in some Ω(√n)-dimensional space, Arthur and Vassilvitskii [13] were able to improve this result to obtain a super-polynomial worst-case running time of 2^{Ω(√n)} iterations. This has been simplified and further improved by Vattani [72], who proved an exponential lower bound on the worst-case running time of the k-means algorithm, showing that k-means requires 2^{Ω(n)} iterations
even in the plane. A modification of the construction shows that the k-means
algorithm has a worst-case running time that, besides being exponential in n, is
also exponential in the spread Δ of the d-dimensional input points for any d ≥ 3.
In the following, we will give a high-level view on the construction presented
in [72]. Vattani uses a special set of n input points in R2 and a set of k =
Θ(n) cluster centers adversarially chosen among the input points. The points
are arranged in a sequence of t = Θ(n) gadgets G0 , G1 , . . . , Gt−1 . Except from
some scaling, the gadgets are identical. Each gadget contains a constant number
of points, has two clusters and hence two cluster centers, and can perform two
stages reflected by the positions of the two centers. In one stage, gadget Gi ,
0 ≤ i < t, has one center in a certain position c∗i , and, in the other stage, the
same center has left the position c∗i and has moved a little bit towards gadget
Gi+1 . Once triggered by gadget Gi+1 , Gi performs both of these stages twice in
a row. Performing these two stages happens as follows. The two centers of gadget
Gi+1 are assigned to the center of gravity of their clusters, which results in some points of Gi+1 being temporarily assigned to the center c∗i of Gi. Now, the center
of Gi located at c∗i and the centers of Gi+1 move, so that the points temporarily
assigned to a center of Gi are again assigned to the centers of Gi+1 . Then, again
triggered by Gi+1 , gadget Gi performs the same two stages once more. There
is only some small modification in the arrangement of the two clusters of Gi+1 .
Now, assume that all gadgets except Gt−1 are stable and the centers of Gt−1 are
moved to the centers of gravity of their clusters. This triggers a chain reaction,
in which the gadgets perform 2^{Ω(t)} stages in total. Since each stage of a gadget corresponds to one iteration of the k-means algorithm, the algorithm needs 2^{Ω(n)}
iterations on the set of points contained in the gadgets.
Smoothed Analysis. Concerning the above facts, one might wonder why
k-means works so well in practice. To close this gap between theory and practice,
the algorithm has also been studied in the model of smoothed analysis [12,15,62].
This model is especially useful when both worst-case and average-case analysis
are not realistic and reflects the fact that real-world datasets are likely to con-
tain measurement errors or imprecise data. In case an algorithm has a low time
complexity in the smoothed setting, it is likely to have a small running time on
real-world datasets as well.
Next, we explain the model in more detail. For given parameters n and σ, an
adversary chooses an input instance of size n. Then, each input point is perturbed
by adding some small amount of random noise using a Gaussian distribution with
mean 0 and standard deviation σ. The maximum expected running time of the
algorithm executed on the perturbed input points is measured.
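A minimal sketch of the perturbation step in this model (Python/NumPy, names illustrative): the adversarial instance is perturbed by independent Gaussian noise with mean 0 and standard deviation σ, and the algorithm is then run on the perturbed instance.

    import numpy as np

    def smoothed_instance(adversarial_points, sigma, rng=None):
        # add independent N(0, sigma^2) noise to every coordinate of every point
        rng = np.random.default_rng() if rng is None else rng
        return adversarial_points + rng.normal(0.0, sigma, size=adversarial_points.shape)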
Arthur and Vassilvitskii [15] showed that, in the smoothed setting, the
number of iterations of the k-means algorithm is at most poly(n^k, σ^{−1}). This was improved by Manthey and Röglin [62], who proved the upper bounds poly(n^{√k}, 1/σ) and k^{kd} · poly(n, 1/σ) on the number of iterations. Finally, Arthur et al. [12] showed that the k-means algorithm has polynomial smoothed running time.
Fig. 2. Example illustrating the fact that no approximation guarantee can be given for the k-means algorithm [55].
Often one simply chooses the starting centers uniformly at random, but this
can lead to problems, for example, when there is a cluster that is far away from
the remaining points and that is so small that it is likely that no point of it is
randomly drawn as one of the initial centers. In such a case one must hope to
eventually converge to a solution that has a center in this cluster as otherwise
we would end up with a bad solution. Unfortunately, it is not clear that this
happens (in fact, one can assume that it will not).
Therefore, a better idea is to start with a solution that already satisfies some
approximation guarantees and let the k-means algorithm refine the solution. In
this section we will present methods that efficiently pick a relatively good initial
solution. As discussed later in Sect. 6 there are better approximation algorithms,
but they are relatively slow, and the algorithms presented in this section offer
a better trade-off between running time and quality of the initial solution.
1 Notice that though we present these results after [14] and [7] for reasons of presentation, the work of Ostrovsky et al. [67] appeared first.
Note that ε-separability scales with the number of clusters. Imagine k optimal
clusters with the same clustering cost C, i. e., the total clustering cost is k · C.
Then, ε-separability requires that clustering with k − 1 clusters instead of k
clusters costs at least k · C/ε2 . Thus, for more clusters, the pairwise separation
has to be higher.
for each r ≠ s with r, s ∈ {1, . . . , k}, where c is some constant. The term ‖A − C‖_S is the spectral norm of the matrix A − C, defined by ‖A − C‖_S = max_{v: ‖v‖=1} ‖(A − C)v‖.
A point p from cluster Tr satisfies the proximity condition if, for any s ≠ r, the
projection of p onto the line between μr and μs is at least Δrs closer to μr than
to μs .
We have a closer look at the definition. The term A − C is the matrix con-
sisting of the difference vectors, i. e., it gives the deviations of the points to their
centers. The term ‖(A − C) · v‖² is the projection of these distance vectors into direction v, i.e., a measure on how much the data is scattered in this direction. Thus, ‖A − C‖²_S/n is the largest average distance to the mean in any direction. It is an upper bound on the variance of the optimal clusters. Assume that n_i = n/k for all i. Then, Δ²_rs = (2c)² k² ‖A − C‖²_S/n_i is close to being the maximal average variance of the two clusters in any direction. It is actually larger, because ‖A − C‖_S includes all clusters, so Δ_rs and thus the separation of the points in
Tr and Ts depends on all clusters even though it differs for different r, s.
Seeding Method. Given an input that is assumed to satisfy the above separa-
tion condition, Kumar and Kannan compute an initial solution by projecting
the points onto a lower-dimensional subspace and approximately solving the
low-dimensional instance. The computed centers form the seed to the k-means
method.
The lower-dimensional subspace is the best-fit subspace Vk, i.e., it minimizes the expression ∑_{p∈P} min_{v∈V} ‖p − v‖² among all k-dimensional subspaces V. It is known that Vk is the subspace spanned by the first k right singular vectors of A, which can be calculated by singular value decomposition (SVD)2, and that project-
ing points to Vk and solving the SSE optimally on the projected points yields
a 2-approximation. Any constant-factor approximation thus gives a constant
approximation for the original input.
In addition to these known facts, the result by Kumar and Kannan shows
that initializing the k-means algorithm with this solution even yields an optimal
solution as long as the optimal partition satisfies the proximity condition.
2 The computation of the SVD is a well-studied field of research. For an in-depth introduction to spectral algorithms and singular value decompositions, see [52].
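A rough sketch of this kind of spectral seeding (our own illustration, assuming NumPy's SVD and some constant-factor approximation routine for the projected instance; this is not the exact procedure analyzed by Kumar and Kannan): project the points onto the best-fit subspace V_k and seed the k-means algorithm with an approximate solution computed there.

    import numpy as np

    def project_to_best_fit_subspace(A, k):
        # rows of A are the points; V_k is spanned by the top k right singular vectors
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        Vk = Vt[:k]                    # (k, d) orthonormal basis of V_k
        return A @ Vk.T @ Vk           # points projected back into R^d

    def spectral_seed(A, k, approx_kmeans):
        # approx_kmeans: any constant-factor approximation routine (an assumption here)
        projected = project_to_best_fit_subspace(A, k)
        return approx_kmeans(projected, k)   # k centers, used to seed the k-means algorithm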
Fig. 3. Illustration of the ε-Apollonius ball for a point c′ with respect to a point c.
Fig. 4. Illustration of the fact that each center can serve as a switch center for at most one weakly misclassified point.
q to all the other points in P is at least 1, no other center can move closer to
q than c due to a reassignment of a weakly misclassified point. This means in
the next iteration c will still be the closest cluster center to q and q will not be
(1 + ε)-misclassified. As a result, either there are no (1 + ε)-misclassified points
left and the algorithm terminates or there are some strongly misclassified points.
Thus, at least every second iteration reassigns some strongly misclassified points,
which completes the proof.
than the k-means algorithm (and the constant approximation algorithms), and
this interest increases with the availability of larger and larger amounts of data.
The problem of solving the SSE problem for big data has been researched from a
practical as well as from a theoretical side and in this section, we are interested
in the intersection.
The theoretical model of choice is streaming. The data stream model assumes
that the data can only be read once and in a given order, and that the algorithm
is restricted to small space, e.g. polylogarithmic in the input it processes, but
still computes an approximation. One-pass algorithms and low memory usage are
certainly also desirable from a practical point of view, since random access to the
data is a major slowdown for algorithms, and small memory usage might mean
that all stored information actually fits into the main memory. The k-means
algorithm reads the complete data set in each iteration, and a straightforward
implementation of the k-means++ reads the data about k times for the seeding
alone, and these are reasons why the algorithms do not scale so well for large
inputs.
An old variant of the k-means algorithm, proposed independently of Lloyd’s
work by MacQueen [59], gives a very fast alternative to the k-means algorithm.
It processes the data once, assigns each new data point to its closest center
and updates this center to be the centroid of the points assigned to it. Thus, it
never reassigns points. MacQueen’s k-means algorithm clearly satisfies the first
two requirements for a streaming algorithm, but not the third. Indeed, it is not
surprising that MacQueen’s algorithm does not necessarily converge to a good
solution, and that the solution depends heavily on the start centers and the
order of the input points. The famous streaming algorithm BIRCH [76] is also
very fast and is perceived as producing better clusterings, yet, it still shares the
property that there is no approximation guarantee [37].
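A sketch of MacQueen's single-pass variant (our formulation; one common convention, assumed here, is to take the first k points as start centers and count each of them as one assigned point):

    import numpy as np

    def macqueen_kmeans(stream, initial_centers):
        centers = np.array(initial_centers, dtype=float)
        counts = np.ones(len(centers))              # each start center counts as one point
        for p in stream:
            i = np.linalg.norm(centers - p, axis=1).argmin()
            counts[i] += 1
            centers[i] += (p - centers[i]) / counts[i]   # running mean; points are never reassigned
        return centers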
Various data stream algorithms for the SSE problem have been proposed, see
for example [29,34,35,38,42,43], achieving (1 + ε)-approximations in one pass
over the data for constant k (and constant d, for some of the algorithms). We
now look at algorithms which lie in between practical and theoretical results.
Local Search and the Stream Framework. Guha et al. [40] develop a frame-
work for clustering algorithms in the data stream setting that they call Stream.
They combine it with a constant factor approximation based on local search.
The resulting algorithm is named StreamLS3 . It computes a constant approx-
imation in the data stream setting. StreamLS was originally designed for
the variant of the SSE problem where the distances are not squared (also called
the k-median problem), but it is stated to work for the SSE problem as well with
worse constants.
The Stream framework reads data in blocks of size m. For each block, it
computes a set of c · k centers that are a constant factor approximation for
the SSE problem with k centers (c is a constant) by using an approximation
3 http://infolab.stanford.edu/~loc/
Adaptations of k-Means++. Ailon et al. [8] use the Stream framework and
combine it with different approximation algorithms. The main idea is to extend
the seeding part of the k-means++ algorithm to an algorithm called k-means#
and to use this algorithm within the above Stream framework description. Recall
that the seeding in k-means++ is done by D2 -sampling. This method iteratively
samples k centers. The first one is sampled uniformly at random. For the ith
center, each input point p is sampled with probability D²(p)/∑_{q∈P} D²(q), where P is the input point set, D²(p) = min_{j<i} ‖p − c_j‖² is the cost of p in the current solution, and c_1, . . . , c_{i−1} are the centers chosen so far.
chosen in this way is an expected O(log k)-approximation.
The algorithm k-means# starts with choosing 3 log k centers uniformly at
random and then performs k − 1 iterations, each of which samples 3 log k centers
according to the above given probability distribution. This is done to ensure that
for an arbitrary optimal clustering of the points, each of the clusters is ‘hit’ with
constant probability by at least one center. Ailon et al. show that the O(k log k)
centers computed by k-means# are a constant factor approximation for the SSE
criterion with high probability4 .
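For concreteness, a sketch of plain D²-sampling as used for k-means++ seeding (our own code with illustrative names); k-means# differs in that it first draws 3 log k centers uniformly and then, in each of k − 1 rounds, draws 3 log k centers from the same distribution instead of one.

    import numpy as np

    def d2_sampling(points, k, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        n = len(points)
        centers = [points[rng.integers(n)]]                  # first center uniform at random
        d2 = ((points - centers[0]) ** 2).sum(axis=1)        # current cost D^2(p) of every point
        for _ in range(k - 1):
            j = rng.choice(n, p=d2 / d2.sum())               # p drawn with prob. D^2(p) / sum_q D^2(q)
            centers.append(points[j])
            d2 = np.minimum(d2, ((points - points[j]) ** 2).sum(axis=1))
        return np.array(centers)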
To obtain the final algorithm, the Stream framework is used. Recall that
the framework uses two approximation algorithms A and A′. While A can be a bicriteria approximation that computes a constant factor approximation with c · k centers, A′ has to compute an approximate solution with k centers. The approximation guarantee of the final algorithm is the guarantee provided by A′. Ailon et al. sample k centers by D²-sampling for A′; thus, the overall result is an expected O(log k)-approximation. For A, k-means# is run 3 log n times to
reduce the error probability sufficiently and then the best clustering is reported.
The overall algorithm needs n^ε memory for a constant ε > 0.
The overall algorithm is compared to the k-means algorithm and to
MacQueen’s k-means algorithm on data sets with up to ten thousand points
in up to sixty dimensions. While it produces solutions of better quality than the
two k-means versions, it is slower than both.
Ackermann et al. [6] develop a streaming algorithm based on k-means++
motivated by a different line of work5. The ingredients of their algorithm look very much like the basic building blocks of the algorithm by Ailon et al.:
sampling more than k points according to the k-means++ sampling method,
organizing the computations in a binary tree and computing the final clustering
with k-means++. There are key differences, though.
Firstly, their work is motivated from the point of view of coresets for the
SSE problem. A coreset S for a point set P is a smaller and weighted set of
points that has approximately the same clustering cost as P for any choice of
k centers. It thus satisfies a very strong property. Ackermann et al. show that
sampling sufficiently many points according to the k-means++ sampling results
in a coreset. For constant dimension d, they show that O(k · (log n)^{O(1)}) points
guarantee that the clustering cost of the sampled points is within an ε-error from
the true cost of P for any set of k centers6 .
Coresets can be embedded into a streaming setting very nicely by using a
technique called merge-and-reduce. It works similar as the computation tree of
4 As briefly discussed in Sect. 3.1, it is sufficient to sample O(k) centers to obtain a constant factor approximation, as later discovered by Aggarwal et al. [7].
5 http://www.cs.uni-paderborn.de/fachgebiete/ag-bloemer/forschung/abgeschlossene/clustering-dfg-schwerpunktprogramm-1307/streamkmpp.html
6 This holds with constant probability and for any constant ε.
the Stream framework: It reads blocks of data, computes a coreset for each
block and merges and reduces these coresets in a binary computation tree. Now
the advantage is that this tree can have superconstant height since this can be
cancelled out by adjusting the error ε of each coreset computation. A maximum
height of Θ(log n) means that the block size on the lowest level can be much
smaller than above (recall that in the algorithm by Ailon et al., the block size
was n^ε). For the above algorithm, a height of Θ(log n) would mean that the approximation ratio would be Ω(c^{log n}) ∈ Ω(n). By embedding their coreset
construction into the merge-and-reduce technique, Ackermann et al. provide a
streaming algorithm that needs O(k · (log n)^{O(1)}) space and computes a coreset of similar size for the SSE problem. They obtain a solution for the problem by
running k-means++ on the coreset. Thus, the solution is an expected O(log k)-
approximation.
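The merge-and-reduce technique itself can be sketched as follows (our own generic formulation; build_coreset stands in for any coreset construction, e.g. the k-means++-based one, and a "coreset" here is simply a list of weighted points). The buckets behave like the digits of a binary counter, which yields the binary computation tree of height O(log n) described above.

    def merge_and_reduce(stream, block_size, build_coreset):
        # buckets[i] is either None or a coreset standing in for 2^i input blocks
        buckets, block = [], []

        def insert(coreset):
            i = 0
            while True:
                if i == len(buckets):
                    buckets.append(None)
                if buckets[i] is None:
                    buckets[i] = coreset
                    return
                # two coresets on level i collide: merge them, reduce, and carry to level i+1
                coreset = build_coreset(buckets[i] + coreset)
                buckets[i] = None
                i += 1

        for p in stream:
            block.append((p, 1.0))                  # raw input points carry unit weight
            if len(block) == block_size:
                insert(build_coreset(block))
                block = []
        if block:
            insert(build_coreset(block))
        # final summary: union of all remaining bucket coresets
        return [wp for b in buckets if b is not None for wp in b]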
Secondly, Ackermann et al. significantly speed up the k-means++ sampling
approach. Since the sampling is applied again and again, this has a major impact
on the running time. Notice that it is necessary for the sampling to compute D(p)
for all p and to update it after each drawn center. When computing
a coreset of m points for a point set, a vanilla implementation of this
sampling needs Θ(dm) time. Ackermann et al. develop a data structure called
coreset tree, which allows the sampling to be performed much faster. It does, however,
change the sampling procedure slightly, such that the theoretically proven bound
does not necessarily hold any more.
In the actual implementation, the sample size and thus the coreset size is set
to 200k, which is much smaller than what the theoretical analysis supports.
However, experiments support that the algorithm still produces solutions of high
quality, despite these two heuristic changes. The resulting algorithm is called
StreamKM++.
Ackermann et al. test their algorithm on data sets with up to eleven
million points in up to 68 dimensions and compare the performance to BIRCH,
StreamLS, the k-means algorithm and k-means++. They find that StreamLS
and StreamKM++ compute solutions of comparable quality, and much better
than BIRCH. BIRCH is the fastest algorithm. However, StreamKM++ beats
the running time of StreamLS by far and can e.g. compute a solution for the
largest data set and k = 30 in 27 % of the running time of StreamLS. For small
dimensions or higher k, the speed up is even larger. The k-means algorithm and
k-means++ are much slower than StreamLS and thus also than StreamKM++.
It is to be expected that StreamKM++ is faster than the variant by Ailon et al.
as well.
point, BIRCH decides whether to add the point to an existing subset or not. If
so, then it applies a rule to choose one of the subsets and to add the point to it
by updating the sufficient statistics. This can be done in constant time. If not,
then the tree grows and represents a partitioning with one more subset.
BIRCH has a parameter for the maximum size of the tree. If the size of the
tree exceeds this threshold, then it rebuilds the tree. Notice that a subset repre-
sented by its sufficient statistics cannot be split up. Thus, rebuilding means that
some subsets are merged to obtain a smaller tree. After reading the input data,
BIRCH represents each subset in the partitioning by a weighted point (which is
obtained from the sufficient statistics) and then runs a clustering algorithm on
the weighted point set.
The algorithm is very fast since updating the sufficient statistics is highly
efficient and rebuilding does not occur too often. However, the solutions com-
puted by BIRCH are not guaranteed to have a low cost with respect to the SSE
cost function.
Fichtenberger et al. [37] develop the algorithm BICO8 . The name is a combi-
nation of the words BIRCH and coreset. BICO also maintains a tree which stores
a representation of a partitioning. Each node of this tree represents a subset by
its sufficient statistics.
The idea of BICO is to improve the decision if and where to add a point to a
subset in order to decrease the error of the summary. For this, BICO maintains
a maximum error value T . A subset is forbidden to induce more error than T .
The error of a subset is measured by the sum of the squared distances of all points in the
subset to its centroid because, at the end of the computation, the subset will be
represented by the centroid.
For a new point, BICO searches for the subset whose centroid is closest to
the point. BICO first checks whether the new point lies within a certain radius
of this centroid, since it wants to avoid using all the allowed error of a subset for
one point. If the point lies outside of the radius, a new node is created directly
beneath the root of the tree for the new point. Otherwise, the point is added to
this subset if the error remains bounded by T. If the point does not pass this
check, then it is passed on to the child node of the current node whose centroid
is closest. If no child node exists or the point lies outside the node's radius, then
a new child node is created based on the new point.
If the tree gets too large, then T is doubled and the tree is rebuilt by merging
subsets whose error as a combined subset is below the new T .
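A simplified sketch of the insertion logic just described is given below. It is not the reference implementation of BICO; the node layout, the choice of the radius (which in BICO depends on the level), and the rebuilding step are deliberately naive.

```python
import numpy as np

class Node:
    """One subset, stored via its sufficient statistics (count, sum, sum of squares)."""
    def __init__(self, point):
        self.n, self.s, self.q = 1, point.copy(), float(point @ point)
        self.children = []

    def centroid(self):
        return self.s / self.n

    def error_with(self, point):
        # SSE of the subset around its centroid if 'point' were added
        n, s, q = self.n + 1, self.s + point, self.q + float(point @ point)
        return q - (s @ s) / n

def insert(root_children, point, T, radius):
    """Insert 'point' into a BICO-like tree (a sketch of the procedure above)."""
    nodes = root_children
    while True:
        if not nodes:                                   # no candidate on this level
            nodes.append(Node(point)); return           # open a new node here
        nearest = min(nodes, key=lambda nd: float(np.sum((nd.centroid() - point) ** 2)))
        if np.sum((nearest.centroid() - point) ** 2) > radius ** 2:
            nodes.append(Node(point)); return           # outside the radius: new node
        if nearest.error_with(point) <= T:              # error stays bounded by T
            nearest.n += 1; nearest.s += point; nearest.q += float(point @ point); return
        nodes = nearest.children                        # otherwise pass the point down
```

If the number of nodes exceeds the allowed maximum, T is doubled and the tree is rebuilt by merging subsets, as described in the text.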
For constant dimension d, Fichtenberger et al. show that the altered method
is guaranteed to compute a summary that satisfies the coreset property for a
threshold value that lies in Θ(k · log n). Combined with k-means++, BICO gives
an expected O(log k)-approximation.
The implementation of BICO faces the same challenge as StreamKM++,
k-means or k-means++, namely, it needs to again and again compute the dis-
tance between a point and its closest neighbor in a stored point set. BICO
has one advantage, though, since it is only interested in this neighbor if it
8 http://ls2-www.cs.uni-dortmund.de/bico.
lies within a certain radius of the new point. This helps in developing heuris-
tics to speed up the insertion process. The method implemented in BICO has
the same worst case behavior as iterating through all stored points but can be
much faster.
Fichtenberger et al. compare BICO to StreamKM++, BIRCH and Mac-
Queen’s k-means algorithm on the same data sets as in [6] and one additional
128-dimensional data set. In all experiments, the summary size of BICO is set
to 200 k, thus the summary is not necessarily a coreset. The findings are that
BICO and StreamKM++ compute the best solutions, while BIRCH and Mac-
Queen are the fastest algorithms. However, for small k, the running time of
BICO is comparable to BIRCH and MacQueen. The running time of BICO is
O(ndm), where m is the chosen summary size, thus, the increase in the running
time for larger k stems from the choice m = 200 k. For larger k, the running
time can be decreased to lie below the running time of BIRCH by reducing m
at the cost of worse solutions. In the tested instances, the quality was then still
higher than for BIRCH and MacQueen.
6 Complexity of SSE
Before we consider variants of the k-means algorithm that deal with objective
functions different from SSE, we conclude our SSE related study by looking at
the complexity of SSE in general. We start by giving a proof of the following
fact, which we already used above. We also reflect on the insights that it gives
us on the structure of optimal solutions of the SSE problem.
Fact 3. Let μ := (1/|P|) · Σ_{p∈P} p be the mean of a point set P, and let y ∈ R^d be
any point. Then, we have

    Σ_{p∈P} ‖p − y‖² = Σ_{p∈P} ‖p − μ‖² + |P| · ‖y − μ‖².

Proof. The result is well known and the proof is contained in many papers. We
in particular follow [55]. First note that

    Σ_{p∈P} ‖p − y‖² = Σ_{p∈P} ‖p − μ + μ − y‖²
                     = Σ_{p∈P} ‖p − μ‖² + 2(μ − y)^T Σ_{p∈P} (p − μ) + |P| · ‖y − μ‖².

The middle term vanishes because Σ_{p∈P} (p − μ) = 0 by the definition of μ, which proves the claim.
The first consequence of Fact 3 is that the SSE problem can be solved analyt-
ically for k = 1: The mean μ minimizes the cost function, and the optimal cost
is Σ_{p∈P} ‖p − μ‖². For k ≥ 2, the optimal solution induces a partitioning of the
input point set P into subsets of P with the same closest center. These subsets
are called clusters. The center of a cluster is the mean of the points contained
in the cluster (otherwise, exchanging the center by the mean would improve the
solution). At the same time, every partitioning of the point set induces a feasible
solution by computing the mean of each subset of the partitioning. This gives
a new representation of an optimal solution as a partitioning of the input point
set that minimizes the induced clustering cost.
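As a quick sanity check of Fact 3 and of the k = 1 case, the identity can be verified numerically; the snippet below is only an illustration and not part of the surveyed algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)
P = rng.normal(size=(100, 3))          # a random point set in R^3
y = rng.normal(size=3)                 # an arbitrary candidate center
mu = P.mean(axis=0)                    # the mean of P

lhs = np.sum((P - y) ** 2)                                    # cost of center y
rhs = np.sum((P - mu) ** 2) + len(P) * np.sum((y - mu) ** 2)  # Fact 3 decomposition
assert np.isclose(lhs, rhs)            # hence mu is the optimal center for k = 1
```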
Notice that we cannot easily enumerate all possible centers as there are infi-
nitely many possibilities. By our new view on optimal solutions, we can instead
iterate over all possible partitionings. However, the number of possible parti-
tionings is exponential in n for every constant k ≥ 2. We get the intuition that
the problem is hard, probably even for small k. Next, we see a proof that this is
indeed the case. Notice that there exist different proofs for the fact that SSE is
NP-hard [9,32,60] and the proof presented here is the one due to Aloise et al. [9].
(a) A simple example graph on nodes v1, . . . , v4 with edges e1, . . . , e4, (b) its corresponding matrix, and (c) the cut X = {v1, v3}.
Fig. 5. An example for the reduction from our densest cut problem to SSE.
points, and a squared distance of (1 − 1/|X|)² for the one endpoint that is in X.
Thus, the total cost of P(X) is

    Σ_{e_j=(x,y)∈E: x,y∈X} 2 + Σ_{e_j∈E(X)} ( (1/|X|²)·(|X| − 1) + (1 − 1/|X|)² )
    = Σ_{e_j=(x,y)∈E: x,y∈X} 2 + |E(X)| · (1 − 1/|X|).

This analysis holds analogously for the clustering cost of P(V\X). Addition-
ally, every edge is either in E(X), or it has both endpoints in either P(X) or
P(V\X). Thus, the total cost of the 2-clustering induced by X is

    2(|E| − |E(X)|) + |E(X)| · (2 − 1/|X| − 1/|V\X|) = 2|E| − |E(X)| · |V| / (|X| · |V\X|).
Finding the optimal 2-clustering means that we minimize the above term.
As 2|E| and |V | are the same for all possible 2-clusterings, this corresponds to
finding the clustering which maximizes |E(X)|/(|X| · |V \X|). Thus, finding the
best 2-clustering is equivalent to maximizing the density.
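The correspondence can be checked numerically. The sketch below builds, for a small example graph, the signed incidence-style points suggested by the matrix of Fig. 5 (one +1 and one −1 entry per edge; this reading of the figure is our assumption, since the construction itself falls into a gap of the surviving text) and compares the SSE cost of the 2-clustering induced by every cut X with the closed form derived above.

```python
import numpy as np
from itertools import combinations

def sse(points):
    c = points.mean(axis=0)
    return float(np.sum((points - c) ** 2))

# a small example graph (the 4-cycle of Fig. 5)
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]

# one |E|-dimensional point per vertex: +1 for edges leaving it, -1 for edges entering it
P = np.zeros((len(V), len(E)))
for j, (u, v) in enumerate(E):
    P[u, j], P[v, j] = 1.0, -1.0

for r in range(1, len(V)):
    for Xs in combinations(V, r):
        X, Y = set(Xs), set(V) - set(Xs)
        cost = sse(P[sorted(X)]) + sse(P[sorted(Y)])
        cut = sum(1 for u, v in E if (u in X) != (v in X))      # |E(X)|
        closed_form = 2 * len(E) - cut * len(V) / (len(X) * len(Y))
        assert np.isclose(cost, closed_form)
```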
Notice that the above transformation produces clustering inputs which are
|E|-dimensional. Thus, SSE is hard for constant k and arbitrary dimension. It
is also hard for constant dimension d and arbitrary k [60]. For small dimension
and a small number of clusters k, the problem can be solved in polynomial time
by the algorithm of Inaba et al. [46].
A constant factor approximation algorithm for SSE was given by Jain and Vazirani [49] who developed a primal dual approxima-
tion algorithm for a related problem and extended it to the SSE setting. Inaba
et al. [46] developed the first polynomial-time (1+ε)-approximation algorithm for
the case of k = 2 clusters. Matoušek [63] improved this and obtained a polynomial-
time (1 + ε)-approximation algorithm for constant k and d with running time
O(n log^k n) if ε is also fixed. Further (1 + ε)-approximations were for example
given by [29,34,38,43,57,73]. Notice that all cited (1 + ε)-approximation algo-
rithms are exponential in the number of clusters k and in some cases additionally
in the dimension d.
(Illustration: the Bregman divergence dΦ(x, c) between a point x and a center c.)
k-Means with Bregman Divergences. Similar to SSE we can define the minimum
sum-of-Bregman-errors clustering problem (SBE). In this problem we are given
a fixed Bregman divergence dΦ with domain D and a set of points P ⊂ D. The
Moreover, for all Bregman divergences, any set of input points P , and any set of
k centers {μ1 , . . . , μk }, the optimal partitions for SBE induced by the centers μj
can be separated by hyperplanes. This was first explicitly stated
in [20]. More
precisely, the Bregman bisector {x ∈ D | dΦ(x, c1) = dΦ(x, c2)} between any two
Therefore, SBE with dΦ is just SSE for a linearly transformed input set. This
immediately implies that for Mahalanobis divergences SBE is NP-hard. Next, if
Φ is sufficiently smooth, the Hessian ∇2 Φt of Φ at point t ∈ ri(D) is a symmetric,
positive definite matrix. Therefore, dΦ locally behaves like a Mahalanobis diver-
gence. This can be used to show that, with appropriate restrictions on the strictly
convex function Φ, SBE is NP-hard.
Some Bregman divergences are (trivially) μ-similar. Others, like the Kullback-
Leibler divergence or the Itakura-Saito divergence, become μ-similar if one
restricts the domain on which they are defined. For example, if we restrict
the Kullback-Leibler divergence to D = [λ, ν]^d for 0 < λ < ν ≤ 1, then
the Kullback-Leibler divergence is (λ/ν)-similar. This can be shown by looking
at the first order Taylor series expansion of the negative Shannon entropy
Φ(x1, . . . , xd) = Σ_i xi ln(xi).
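To make the definition concrete, the following sketch computes a Bregman divergence directly from its generating convex function, dΦ(x, y) = Φ(x) − Φ(y) − ∇Φ(y)ᵀ(x − y), and instantiates it with the negative Shannon entropy, which yields the (generalized) Kullback-Leibler divergence. This is a textbook identity used as an illustration, not code from the surveyed papers.

```python
import numpy as np

def bregman(phi, grad_phi, x, y):
    """d_Phi(x, y) = Phi(x) - Phi(y) - <grad Phi(y), x - y>."""
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

# negative Shannon entropy as the generating convex function
neg_entropy = lambda x: np.sum(x * np.log(x))
neg_entropy_grad = lambda x: np.log(x) + 1.0

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])

d = bregman(neg_entropy, neg_entropy_grad, x, y)
kl = np.sum(x * np.log(x / y))      # for probability vectors this equals d_Phi(x, y)
assert np.isclose(d, kl)
```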
μ-similar Bregman divergences approximately behave like Mahalanobis diver-
gences. Due to (2) Mahalanobis divergences behave like the squared Euclid-
ean distance. Hence, one can hope that μ-similar Bregman divergences behave
roughly like the squared Euclidean distance. In fact, it is not too difficult to
show that the swapping algorithm of Kanungo et al. [55] can be generalized to
μ-similar Bregman divergences to obtain approximation algorithms with approx-
imation factor 18/μ² + ε for arbitrary ε > 0. Whether one can combine the tech-
nique of Kanungo et al. with Matoušek’s technique [63] to obtain better constant
factor approximation algorithms is not known.
In the work of Ackermann et al. [5], μ-similarity has been used to obtain
a probabilistic (1 + ε)-approximation algorithm for SBE, whose running time
is exponential in k, d, 1/ε, and 1/μ, but linear in |P|. Building upon results
in [57], Ackermann et al. describe and analyze an algorithm to solve the
k-median problem for metric and non-metric distance measures D that satisfy
the following conditions.
(1) For k = 1, optimal solutions to the k-median problem with respect to dis-
tance D can be computed efficiently.
(2) For every δ, γ > 0 there is a constant mδ,γ such that for any set P , with
probability 1 − δ the optimal 1-median of a random sample S of size mδ,γ
from P is a (1 + γ)-approximation to the 1-median for set P .
Together, (1) and (2) are called the [γ, δ]-sampling property. Using the same
algorithm as in [57] but a combinatorial rather than geometric analysis,
Ackermann et al. show that for any distance measure D satisfying the [γ, δ]-
sampling property and any ε > 0 there is an algorithm that with constant prob-
ability returns a (1 + ε)-approximation to the k-median problem with distance
measure D. The running time of the algorithm is linear in n, the number of input
points, and exponential in k, 1/ε, and the parameter mδ,ε/3 from the sampling
property. Finally, Ackermann et al. show that any μ-similar Bregman divergence
satisfies the [γ, δ]-sampling property with parameter mδ,γ = 1/(γδμ). Overall, this
yields a (1 + ε)-approximation algorithm for SBE for μ-similar Bregman divergences with running
time linear in n, and exponential in k, 1/ε, and 1/μ.
The k-Means Algorithm for Bregman Divergences. The starting point for much
of the recent research on SBE for Bregman divergences is the work by Banerjee
et al. [20]. They were the first to explicitly state Fact 4 and describe the k-means
algorithm (see page 2) as a generic algorithm to solve SBE for arbitrary Bregman
divergences. Surprisingly, the k-means algorithm cannot be generalized beyond
Bregman divergences. In [19] it is shown that, under some mild smoothness con-
ditions, any divergence that satisfies Fact 4 is a Bregman divergence. Of course,
this does not imply that variants or modifications of the k-means algorithm can-
not be used for distance measures other than Bregman divergences. However,
in these generalizations cluster centroids cannot be used as optimizers in the
second step, the re-estimation step.
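Since the centroid minimizes the Bregman error of a cluster, the generic k-means (Lloyd) loop carries over verbatim: assign each point to the center with smallest dΦ, then replace each center by the mean of its cluster. The sketch below illustrates this; it is a generic illustration with a hypothetical divergence argument d_phi, not the implementation of Banerjee et al.

```python
import numpy as np

def bregman_kmeans(points, centers, d_phi, iters=100):
    """Generic k-means (Lloyd) loop for a Bregman divergence d_phi(x, c).
    The re-estimation step uses the plain centroid, which is optimal for SBE.
    'points' and 'centers' are NumPy arrays of shape (n, d) and (k, d)."""
    labels = np.zeros(len(points), dtype=int)
    for _ in range(iters):
        # assignment step: closest center with respect to the Bregman divergence
        labels = np.array([np.argmin([d_phi(x, c) for c in centers]) for x in points])
        # re-estimation step: each center becomes the mean of its cluster
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(len(centers))
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

For d_phi(x, c) = ‖x − c‖² this is exactly the classical k-means algorithm.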
Banerjee et al. already showed that for any Bregman divergence the k-means
algorithm terminates after a finite number of steps. In fact, using the linear
separability of intermediate solutions computed by the k-means algorithm (see
Eq. 1), for any Bregman divergence the number of iterations of the k-means
algorithm can be bounded by O(n^(k²d)). Since the squared Euclidean distance is
a Bregman divergence it is clear that no approximation guarantees can be given
for the solutions the k-means algorithm finds for SBE.
1. Lower Bounds. Manthey and Röglin extended Vattani’s exponential lower
bound for the running time of the k-means algorithm to any Bregman diver-
gence dΦ defined by a sufficiently smooth function Φ. In their proof they use an
approach similar to the one used by Ackermann et al. to show that SBE is
NP-hard. Using (2) Manthey and Röglin first extend Vattani’s lower bound to
any Mahalanobis divergence. Then, using the fact that any Bregman divergence
dΦ with sufficiently smooth Φ locally resembles some Mahalanobis divergence
dA , Manthey and Röglin show that a lower bound for the Mahalanobis diver-
gence dA carries over to a lower bound for the Bregman divergence dΦ . Hence,
for any smooth Bregman divergence the k-means algorithm has exponential run-
ning time. Moreover, Manthey and Röglin show that for the k-means algorithm
the squared Euclidean distance, and more generally Mahalanobis divergences,
are the easiest Bregman divergences.
2. Smoothed Analysis. Recall that the smoothed complexity of the k-means algo-
rithm is polynomial in n and 1/σ, when each input point is perturbed by random
noise generated using a Gaussian distribution with mean 0 and standard devia-
tion σ, a result due to Arthur, Manthey, and Röglin [12]. So far, this result has not
been generalized to Bregman divergences. For almost any Bregman divergence
dΦ Manthey and Röglin [61] prove two upper bounds on the smoothed complex-
ity of the k-means algorithm. The first bound is of the form poly(n^√k, 1/σ), the
second is of the form k^(kd) · poly(n, 1/σ). These bounds match bounds that Man-
they and Röglin achieved for the squared Euclidean distance in [62]. Instead of
reviewing their proofs, we will briefly review two technical difficulties Manthey
and Röglin had to account for.
Bregman divergences dΦ : D × ri(D) → R≥0 ∪ {∞} like the Kullback-Leibler
divergence are defined on a bounded subset of some Rd . Therefore perturb-
ing a point in D may yield a point for which the Bregman divergence is not
defined. Moreover, whereas the Gaussian noise is natural for the squared Euclid-
ean distance, this is by no means clear for all Bregman divergences. In fact,
Banerjee et al. [20] already showed a close connection between Bregman diver-
gences and exponential families, indicating that noise chosen according to an
exponential distribution may be appropriate for some Bregman divergences.
Manthey and Röglin deal with these issues by first introducing a general and
abstract perturbation model parametrized by some σ ∈ (0, 1]. Then Manthey
and Röglin give a smoothed analysis of the k-means algorithm for Bregman
divergences with respect to this abstract model. It is important to note that as
in the squared Euclidean case, the parameter σ measures the amount of random-
ness in the perturbation. Finally, for Bregman divergences like the Mahalanobis
divergences, the Kullback-Leibler divergence, or the Itakura-Saito divergence, Manthey and
Röglin instantiate the abstract perturbation model with concrete perturbation
schemes using explicit distributions.
References
1. Achlioptas, D., McSherry, F.: On spectral learning of mixtures of distributions.
In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 458–469.
Springer, Heidelberg (2005). doi:10.1007/11503415 31
2. Ackermann, M.R., Blömer, J.: Coresets and approximate clustering for Breg-
man divergences. In: Proceedings of the 20th Annual ACM-SIAM Symposium
on Discrete Algorithms (SODA 2009), pp. 1088–1097. Society for Industrial and
Applied Mathematics (SIAM) (2009). http://www.cs.uni-paderborn.de/uploads/
tx sibibtex/CoresetsAndApproximateClusteringForBregmanDivergences.pdf
3. Ackermann, M.R., Blömer, J.: Bregman clustering for separable instances. In:
Kaplan, H. (ed.) SWAT 2010. LNCS, vol. 6139, pp. 212–223. Springer, Heidelberg
(2010). doi:10.1007/978-3-642-13731-0 21
4. Ackermann, M.R., Blömer, J., Scholz, C.: Hardness and non-approximability of
Bregman clustering problems. In: Electronic Colloquium on Computational Com-
plexity (ECCC), vol. 18, no. 15, pp. 1–20 (2011). http://eccc.uni-trier.de/report/
2011/015/, report no. TR11-015
5. Ackermann, M.R., Blömer, J., Sohler, C.: Clustering for metric and non-metric dis-
tance measures. ACM Trans. Algorithms 6(4), Article No. 59:1–26 (2010). Special
issue on SODA 2008
6. Ackermann, M.R., Märtens, M., Raupach, C., Swierkot, K., Lammersen, C., Sohler,
C.: Streamkm++: a clustering algorithm for data streams. ACM J. Exp. Algorith-
mics 17, Article No. 4, 1–30 (2012)
7. Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k -means clus-
tering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX/RANDOM
-2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009). doi:10.1007/
978-3-642-03685-9 2
8. Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: Pro-
ceedings of the 22nd Annual Conference on Neural Information Processing Systems,
pp. 10–18 (2009)
9. Aloise, D., Deshpande, A., Hansen, P., Popat, P.: NP-hardness of Euclidean sum-
of-squares clustering. Mach. Learn. 75(2), 245–248 (2009)
10. Alsabti, K., Ranka, S., Singh, V.: An efficient k-means clustering algorithm. In:
Proceeding of the First Workshop on High-Performance Data Mining (1998)
11. Arora, S., Kannan, R.: Learning mixtures of separated nonspherical Gaussians.
Ann. Appl. Probab. 15(1A), 69–92 (2005)
12. Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity.
In: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer
Science (FOCS 2009), pp. 405–414. IEEE Computer Society (2009)
13. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of
the 22nd ACM Symposium on Computational Geometry (SoCG 2006), pp. 144–153
(2006)
14. Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In:
Proceedings of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms
(SODA 2007), pp. 1027–1035. Society for Industrial and Applied Mathematics
(2007)
15. Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algo-
rithm, with an application to the k-means method. SIAM J. Comput. 39(2), 766–
782 (2009)
16. Awasthi, P., Blum, A., Sheffet, O.: Stability yields a PTAS for k-median and k-
means clustering. In: FOCS, pp. 309–318 (2010)
17. Awasthi, P., Charikar, M., Krishnaswamy, R., Sinop, A.K.: The hardness of approx-
imation of Euclidean k-means. In: SoCG 2015 (2015, accepted)
18. Balcan, M.F., Blum, A., Gupta, A.: Approximate clustering without the approxi-
mation. In: SODA, pp. 1068–1077 (2009)
19. Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as
a Bregman predictor. IEEE Trans. Inf. Theory 51(7), 2664–2669 (2005)
20. Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman diver-
gences. J. Mach. Learn. Res. 6, 1705–1749 (2005)
21. Belkin, M., Sinha, K.: Toward learning Gaussian mixtures with arbitrary separa-
tion. In: COLT, pp. 407–419 (2010)
22. Belkin, M., Sinha, K.: Learning Gaussian mixtures with arbitrary separation.
CoRR abs/0907.1054 (2009)
23. Belkin, M., Sinha, K.: Polynomial learning of distribution families. In: FOCS, pp.
103–112 (2010)
24. Berkhin, P.: A survey of clustering data mining techniques. In: Kogan, J., Nicholas,
C., Teboulle, M. (eds.) Grouping Multidimensional Data, pp. 25–71. Springer, Hei-
delberg (2006)
25. Braverman, V., Meyerson, A., Ostrovsky, R., Roytman, A., Shindler, M., Tagiku,
B.: Streaming k-means on well-clusterable data. In: SODA, pp. 26–40 (2011)
26. Brubaker, S.C., Vempala, S.: Isotropic PCA and affine-invariant clustering. In:
FOCS, pp. 551–560 (2008)
27. Chaudhuri, K., McGregor, A.: Finding metric structure in information theoretic
clustering. In: COLT, pp. 391–402. Citeseer (2008)
28. Chaudhuri, K., Rao, S.: Learning mixtures of product distributions using correla-
tions and independence. In: COLT, pp. 9–20 (2008)
29. Chen, K.: On coresets for k-median and k-means clustering in metric and Euclidean
spaces and their applications. SIAM J. Comput. 39(3), 923–947 (2009)
30. Dasgupta, S.: Learning mixtures of Gaussians. In: FOCS, pp. 634–644 (1999)
31. Dasgupta, S.: How fast Is k -means? In: Schölkopf, B., Warmuth, M.K. (eds.)
COLT-Kernel 2003. LNCS (LNAI), vol. 2777, p. 735. Springer, Heidelberg (2003).
doi:10.1007/978-3-540-45167-9 56
32. Dasgupta, S.: The hardness of k-means clustering. Technical report CS2008-0916,
University of California (2008)
33. Dasgupta, S., Schulman, L.J.: A probabilistic analysis of EM for mixtures of sep-
arated, spherical Gaussians. J. Mach. Learn. Res. 8, 203–226 (2007)
34. Feldman, D., Langberg, M.: A unified framework for approximating and clustering
data. In: Proceedings of the 43th Annual ACM Symposium on Theory of Comput-
ing (STOC), pp. 569–578 (2011)
35. Feldman, D., Monemizadeh, M., Sohler, C.: A PTAS for k-means clustering based
on weak coresets. In: Proceedings of the 23rd ACM Symposium on Computational
Geometry (SoCG), pp. 11–18 (2007)
36. Feldman, J., O’Donnell, R., Servedio, R.A.: Learning mixtures of product distrib-
utions over discrete domains. SIAM J. Comput. 37(5), 1536–1564 (2008)
37. Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO:
BIRCH meets coresets for k -means clustering. In: Bodlaender, H.L., Italiano, G.F.
(eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). doi:10.
1007/978-3-642-40450-4 41
38. Frahling, G., Sohler, C.: Coresets in dynamic geometric data streams. In: Proceed-
ings of the 37th STOC, pp. 209–217 (2005)
39. Gordon, A.: Null models in cluster validation. In: Gaul, W., Pfeifer, D. (eds.)
From Data to Knowledge: Theoretical and Practical Aspects of Classification, Data
Analysis, and Knowledge Organization, pp. 32–44. Springer, Heidelberg (1996)
40. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering
data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528
(2003)
41. Hamerly, G., Drake, J.: Accelerating Lloyd’s algorithm for k-means clustering. In:
Celebi, M.E. (ed.) Partitional Clustering Algorithms, pp. 41–78. Springer, Cham
(2015)
42. Har-Peled, S., Kushal, A.: Smaller coresets for k-median and k-means clustering.
Discrete Comput. Geom. 37(1), 3–19 (2007)
43. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering.
In: Proceedings of the 36th Annual ACM Symposium on Theory of Computing
(STOC 2004), pp. 291–300 (2004)
44. Har-Peled, S., Sadri, B.: How fast is the k-means method? In: SODA, pp. 877–885
(2005)
45. Hartigan, J.A.: Clustering Algorithms. Wiley, Hoboken (1975)
46. Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and
randomization to variance-based k-clustering (extended abstract). In: Symposium
on Computational Geometry (SoCG 1994), pp. 332–339 (1994)
47. Jain, A.K.: Data clustering: 50 years beyond k-means. Pattern Recogn. Lett. 31(8),
651–666 (2010)
48. Jain, A.K., Murty, M.N., Flynn, P.J.: Data clustering: a review. ACM Comput.
Surv. 31(3), 264–323 (1999)
49. Jain, K., Vazirani, V.V.: Approximation algorithms for metric facility location and
k-median problems using the primal-dual schema and Lagrangian relaxation. J.
ACM 48(2), 274–296 (2001)
50. Judd, D., McKinley, P.K., Jain, A.K.: Large-scale parallel data clustering. IEEE
Trans. Pattern Anal. Mach. Intell. 20(8), 871–876 (1998)
51. Kalai, A.T., Moitra, A., Valiant, G.: Efficiently learning mixtures of two Gaussians.
In: STOC, pp. 553–562 (2010)
52. Kannan, R., Vempala, S.: Spectral algorithms. Found. Trends Theoret. Comput.
Sci. 4(3–4), 157–288 (2009)
53. Kannan, R., Salmasian, H., Vempala, S.: The spectral method for general mixture
models. SIAM J. Comput. 38(3), 1141–1156 (2008)
54. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu,
A.Y.: An efficient k-means clustering algorithm: analysis and implementation.
IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 881–892 (2002)
55. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu,
A.Y.: A local search approximation algorithm for k-means clustering. Comput.
Geom. 28(2–3), 89–112 (2004)
56. Kumar, A., Kannan, R.: Clustering with spectral norm and the k-means algorithm.
In: Proceedings of the 51st Annual Symposium on Foundations of Computer Sci-
ence (FOCS 2010), pp. 299–308. IEEE Computer Society (2010)
57. Kumar, A., Sabharwal, Y., Sen, S.: Linear-time approximation schemes for clus-
tering problems in any dimensions. J. ACM 57(2), Article No. 5 (2010)
58. Lloyd, S.P.: Least squares quantization in PCM. Bell Laboratories Technical Mem-
orandum (1957)
59. MacQueen, J.B.: Some methods for classification and analysis of multivariate obser-
vations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics
and Probability, vol. 1, pp. 281–297. University of California Press (1967)
60. Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is
NP-hard. In: Das, S., Uehara, R. (eds.) WALCOM 2009. LNCS, vol. 5431, pp.
274–285. Springer, Heidelberg (2009). doi:10.1007/978-3-642-00202-1 24
61. Manthey, B., Röglin, H.: Worst-case and smoothed analysis of k-means clustering
with Bregman divergences. JoCG 4(1), 94–132 (2013)
62. Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In:
Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algo-
rithms, pp. 461–470. Society for Industrial and Applied Mathematics (2009)
63. Matoušek, J.: On approximate geometric k-clustering. Discrete Comput. Geom.
24(1), 61–84 (2000)
64. Matula, D.W., Shahrokhi, F.: Sparsest cuts and bottlenecks in graphs. Discrete
Appl. Math. 27, 113–123 (1990)
65. Moitra, A., Valiant, G.: Settling the polynomial learnability of mixtures of Gaus-
sians. In: FOCS 2010 (2010)
66. Nock, R., Luosto, P., Kivinen, J.: Mixed Bregman clustering with approximation
guarantees. In: Daelemans, W., Goethals, B., Morik, K. (eds.) ECML PKDD 2008.
LNCS (LNAI), vol. 5212, pp. 154–169. Springer, Heidelberg (2008). doi:10.1007/
978-3-540-87481-2 11
67. Ostrovsky, R., Rabani, Y., Schulman, L.J., Swamy, C.: The effectiveness of Lloyd-
type methods for the k-means problem. In: FOCS, pp. 165–176 (2006)
68. Pelleg, D., Moore, A.W.: Accelerating exact k-means algorithms with geometric
reasoning. In: Proceedings of the Fifth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 277–281 (1999)
69. Selim, S.Z., Ismail, M.A.: k-means-type algorithms: a generalized convergence the-
orem and characterization of local optimality. IEEE Trans. Pattern Anal. Mach.
Intell. (PAMI) 6(1), 81–87 (1984)
70. Steinhaus, H.: Sur la division des corps matériels en parties. Bulletin de l’Académie
Polonaise des Sciences IV(12), 801–804 (1956)
71. Tibshirani, R., Walther, G., Hastie, T.: Estimating the number of clusters in a
dataset via the gap statistic. J. R. Stat. Soc. Ser. B (Stat. Methodol.) 63, 411–423
(2001)
72. Vattani, A.: k-means requires exponentially many iterations even in the plane. In:
Proceedings of the 25th ACM Symposium on Computational Geometry (SoCG
2009), pp. 324–332. Association for Computing Machinery (2009)
73. de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes
for clustering problems. In: Proceedings of the 35th Annual ACM Symposium on
Theory of Computing (STOC 2003), pp. 50–58 (2003)
74. Vempala, S., Wang, G.: A spectral algorithm for learning mixture models. J. Com-
put. Syst. Sci. 68(4), 841–860 (2004)
75. Venkatasubramanian, S.: Choosing the number of clusters I-III (2010). http://blog.
geomblog.org/p/conceptual-view-of-clustering.html. Accessed 30 Mar 2015
76. Zhang, T., Ramakrishnan, R., Livny, M.: BIRCH: a new data clustering algorithm
and its applications. Data Min. Knowl. Disc. 1(2), 141–182 (1997)
Recent Advances in Graph Partitioning
1 Introduction
Graphs are frequently used by computer scientists as abstractions when mod-
eling an application problem. Cutting a graph into smaller pieces is one of the
fundamental algorithmic operations. Even if the final application concerns a dif-
ferent problem (such as traversal, finding paths, trees, and flows), partitioning
large graphs is often an important subproblem for complexity reduction or par-
allelization. With the advent of ever larger instances in applications such as
scientific simulation, social networks, or road networks, graph partitioning (GP)
therefore becomes more and more important, multifaceted, and challenging. The
purpose of this paper is to give a structured overview of the rich literature, with
a clear emphasis on explaining key ideas and discussing recent work that is
missing in other overviews. For a more detailed picture on how the field has
evolved previously, we refer the interested reader to a number of surveys. Bichot
and Siarry [22] cover studies on GP within the area of numerical analysis. This
includes techniques for GP, hypergraph partitioning and parallel methods. The
book discusses studies from a combinatorial viewpoint as well as several applica-
tions of GP such as the air traffic control problem. Schloegel et al. [191] focus on
fast graph partitioning techniques for scientific simulations. In their account of
the state of the art in this area around the turn of the millennium, they describe
geometric, combinatorial, spectral, and multilevel methods and how to combine
them for static partitioning. Load balancing of dynamic simulations, parallel
aspects, and problem formulations with multiple objectives or constraints are
also considered. Monien et al. [156] discuss heuristics and approximation algo-
rithms used in the multilevel GP framework. In their description they focus
mostly on coarsening by matching and local search by node-swapping heuristics.
Kim et al. [119] cover genetic algorithms.
Our survey is structured as follows. First, Sect. 2 introduces the most impor-
tant variants of the problem and their basic properties such as NP-hardness.
Then Sect. 3 discusses exemplary applications including parallel processing, road
networks, image processing, VLSI design, social networks, and bioinformatics.
The core of this overview concerns the solution methods explained in Sects. 4, 5,
6 and 7. They involve a surprising variety of techniques. We begin in Sect. 4 with
basic, global methods that “directly” partition the graph. This ranges from very
simple algorithms based on breadth first search to sophisticated combinatorial
optimization methods that find exact solutions for small instances. Also meth-
ods from computational geometry and linear algebra are being used. Solutions
obtained in this or another way can be improved using a number of heuristics
described in Sect. 5. Again, this ranges from simple-minded but fast heuristics
for moving individual nodes to global methods, e.g., using flow or shortest path
computations. The most successful approach to partitioning large graphs – the
multilevel method – is presented in Sect. 6. It successively contracts the graph to
a more manageable size, solves the base instance using one of the techniques from
Sect. 4, and – using techniques from Sect. 5 – improves the obtained partition
when uncontracting to the original input. Metaheuristics are also important. In
Sect. 7 we describe evolutionary methods that can use multiple runs of other
algorithms (e.g., multilevel) to obtain high quality solutions. Thus, the best
GP solvers orchestrate multiple approaches into an overall system. Since all of
this is very time consuming and since the partitions are often used for parallel
computing, parallel aspects of GP are very important. Their discussion in Sect. 8
includes parallel solvers, mapping onto a set of parallel processors, and migration
minimization when repartitioning a dynamic graph. Section 9 describes issues of
implementation, benchmarking, and experimentation. Finally, Sect. 10 points to
future challenges.
2 Preliminaries
The graph partitioning problem (GPP) asks for a partition of the node set V of a graph G = (V, E) into k blocks V1, . . . , Vk, i.e.,

1. V1 ∪ · · · ∪ Vk = V and
2. Vi ∩ Vj = ∅ for all i ≠ j.
A balance constraint demands that all blocks have about equal weights. More
precisely, it requires that ∀i ∈ {1, . . . , k} : |Vi| ≤ Lmax := (1 + ε)⌈|V|/k⌉ for
some imbalance parameter ε ∈ R≥0. In the case of ε = 0, one also uses the term
perfectly balanced. Sometimes we also use weighted nodes with node weights
c : V → R>0 . Weight functions on nodes and edges are extended to sets of such
objects by summing their weights. A block Vi is overloaded if |Vi | > Lmax . A clus-
tering is also a partition of the nodes. However, k is usually not given in advance,
and the balance constraint is removed. Note that a partition is also a clustering of a
graph.
In practice, one often seeks to find a partition that minimizes (or maximizes) an
objective. Probably the most prominent objective function is to minimize the
total cut

    Σ_{i<j} ω(Eij),    (1)

where Eij denotes the set of edges with one endpoint in Vi and the other in Vj.
Other formulations of GPP exist. For instance when GP is used in parallel com-
puting to map the graph nodes to different processors, the communication volume
is often more appropriate than the cut [100]. For a block Vi, the communica-
tion volume is defined as comm(Vi) := Σ_{v∈Vi} c(v)·D(v), where D(v) denotes the
number of different blocks in which v has a neighbor node, excluding Vi. The
maximum communication volume is then defined as max_i comm(Vi), whereas
the total communication volume is defined as Σ_i comm(Vi). The maximum com-
munication volume was used in one subchallenge of the 10th DIMACS Challenge
on Graph Partitioning and Graph Clustering [13]. Although some applications
profit from other objective functions such as the communication volume or block
shape (formalized by the block's aspect ratio [56]), minimizing the cut size has
been adopted as a kind of standard. One reason is that cut optimization seems to
be easier in practice. Another one is that for graphs with high structural locality
the cut often correlates with most other formulations but other objectives make
it more difficult to use a multilevel approach.
There are also GP formulations in which balance is not directly encoded in the
problem description but integrated into the objective function. For example, the
expansion of a non-trivial cut (V1, V2) is defined as ω(E12)/min(c(V1), c(V2)). Sim-
ilarly, the conductance of such a cut is defined as ω(E12)/min(vol(V1), vol(V2)),
where vol(S) := Σ_{v∈S} d(v) denotes the volume of the set S.
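The objective functions above are straightforward to evaluate for a given partition. The snippet below is a small illustration with unit node weights and unweighted edges; it is not taken from any of the surveyed solvers, and the expansion and conductance lines assume a bipartition into blocks 0 and 1.

```python
from collections import defaultdict

def partition_metrics(edges, block):
    """edges: list of (u, v); block[v]: block id of node v (unit weights assumed)."""
    cut = sum(1 for u, v in edges if block[u] != block[v])
    # D(v): number of other blocks in which v has a neighbor
    neigh_blocks = defaultdict(set)
    for u, v in edges:
        if block[u] != block[v]:
            neigh_blocks[u].add(block[v])
            neigh_blocks[v].add(block[u])
    comm = defaultdict(int)
    for v, bs in neigh_blocks.items():
        comm[block[v]] += len(bs)              # communication volume of v's block
    # expansion and conductance for the bipartition case (blocks 0 and 1)
    size = defaultdict(int)
    vol = defaultdict(int)
    for v in block:
        size[block[v]] += 1
    for u, v in edges:
        vol[block[u]] += 1
        vol[block[v]] += 1
    expansion = cut / min(size[0], size[1])
    conductance = cut / min(vol[0], vol[1])
    return cut, dict(comm), expansion, conductance
```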
As an extension to the problem, when the application graph changes over
time, repartitioning becomes necessary. Due to changes in the underlying appli-
cation, a graph partition may become gradually imbalanced due to the introduc-
tion of new nodes (and edges) and the deletion of others. Once the imbalance
exceeds a certain threshold, the application should call the repartitioning rou-
tine. This routine is to compute a new partition Π′ from the old one, Π. In
many applications it is favorable to keep the changes between Π and Π′ small.
Minimizing these changes while simultaneously optimizing Π′ with respect to the
cut (or a similar objective) leads to multiobjective optimization. To avoid the
complexity of the latter, a linear combination of both objectives seems feasible
in practice [193].
The minimum weight k-cut problem asks for a partition of the nodes into k non-
empty blocks without enforcing a balance constraint. Goldschmidt et al. [88]
proved that, for a fixed k, this problem can be solved optimally in O(n^(k²)). The
problem is NP-complete [88] if k is not part of the input.
For the unweighted minimum bisection
problem, Feige and Krauthgamer [68]
have shown that there is an O(log^1.5 n) approximation algorithm and an O(log n)
approximation for minimum bisection on planar graphs. The bisection problem
is efficiently solvable if the balance constraint is dropped – in this case it is
the minimum cut problem. Wagner et al. [211] have shown that the minimum
bisection problem becomes harder the more the balance constraint is tightened
towards the perfectly balanced case. More precisely, if the block weights are
bounded from below by a constant, i. e., |Vi | ≥ C, then the problem is solvable
in polynomial time. The problem is NP-hard if the block weights are constrained
by |Vi| ≥ αn^δ for some α, δ > 0 or if |Vi| = n/2. The case |Vi| ≥ α log n for some
α > 0 is open. Note that the case |Vi| ≥ αn^δ also implies that the general GPP
with similar lower bounds on the block weights is NP-hard.
If the balance constraint of the problem is dropped and one uses a different
objective function such as sparsest cut, then there are better approximation
algorithms. The sparsest cut objective combines cut and balance into a single
objective function. For general graphs and the sparsest cut metric, Arora et al.
[7,8] achieve an approximation ratio of O(√(log n)) in Õ(n²) time.
Being of high theoretical importance, most of the approximation algorithms
are not implemented, and the approaches that implement approximation algo-
rithms are too slow to be used for large graphs or are not able to compete with
state-of-the-art GP solvers. Hence, mostly heuristics are used in practice.
Power Grids. Disturbances and cascading failures are among the central prob-
lems in power grid systems that can cause catastrophic blackouts. Splitting a
power network area into self-sufficient islands is an approach to prevent the
propagation of cascading failures [132]. Often the cut-based objectives of the
partitioning are also combined with the load shedding schemes that enhance the
robustness of the system and minimize the impact of cascading events [133].
Finding vulnerabilities of power systems by GPP has an additional difficulty.
In some applications, one may want to find more than one (nearly) minimum
partitioning because of the structural difference between the solutions. Spectral
GP (see Sect. 4.2) is also used to detect contingencies in power grid vulnerability
analysis by splitting the network into regions with excess generation and excess
load [60].
the conductance objective described in Sect. 2.1. Many efficient algorithms were
proposed for solving GPP with the normalized cut objective. Among the most
successful are spectral and multilevel approaches. Another relevant formulation
of the partitioning objective which is useful for image segmentation is given by
optimizing the isoperimetric ratio for sets [89]. For more information on graph
partitioning and image segmentation see [32,169].
4 Global Algorithms
We begin our discussion of the wide spectrum of GP algorithms with methods
that work with the entire graph and compute a solution directly. These algo-
rithms are often used for smaller graphs or are applied as subroutines in more
complex methods such as local search or multilevel algorithms. Many of these
methods are restricted to bipartitioning but can be generalized to k-partitioning
for example by recursion.
After discussing exact methods in Sect. 4.1 we turn to heuristic algorithms.
Spectral partitioning (Sect. 4.2) uses methods from linear algebra. Graph growing
(Sect. 4.3) uses breadth first search or similar ways to directly add nodes to a
block. Flow computations are discussed in Sect. 4.4. Section 4.5 summarizes a
wide spectrum of geometric techniques. Finally, Sect. 4.6 introduces streaming
algorithms which work with a very limited memory footprint.
111,134,197] and some methods that solve the general GPP [71,198]. Most of
the methods rely on the branch-and-bound framework [126].
Bounds are derived using various approaches: Karisch et al. [111] and
Armbruster [5] use semi-definite programming, and Sellman et al. [197] and
Sensen [198] employ multi-commodity flows. Linear programming is used by
Brunetta et al. [28], Ferreira et al. [71], Lisser and Rendl [134] and by Arm-
bruster et al. [6]. Hager et al. [93,94] formulate GPP in form of a continuous
quadratic program on which the branch and bound technique is applied. The
objective of the quadratic program is decomposed into convex and concave com-
ponents. The more complicated concave component is then tackled by an SDP
relaxation. Felner [70] and Delling et al. [49,51] utilize combinatorial bounds.
Delling et al. [49,51] derive the bounds by computing minimum s-t cuts between
partial assignments (A, B), i. e., A, B ⊆ V and A ∩ B = ∅. The method can
partition road networks with more than a million nodes, but its running time
highly depends on the bisection width of the graph.
In general, depending on the method used, two alternatives can be observed.
Either the bounds derived are very good and yield small branch-and-bound trees
but are hard to compute. Or the bounds are somewhat weaker and yield larger
trees but are faster to compute. The latter is the case when using combinatorial
bounds. On finite connected subgraphs of the two dimensional grid without holes,
the bipartitioning problem can be solved optimally in O(n⁴) time [69]. Recent
work by Bevern et al. [19] looks at the parameterized complexity for computing
balanced partitions in graphs.
All of these methods can typically solve only very small problems while having
very large running times, or, if they can solve large bipartitioning instances using
a moderate amount of time [49,51], they highly depend on the bisection width of
the graph. Methods that solve the general GPP [71,198] have immense running
times for graphs with up to a few hundred nodes. Moreover, the experimental
evaluation of these methods only considers small block numbers k ≤ 4.
method explained in Sect. 6, but their method coarsens with independent node
sets and performs local improvement with Rayleigh quotient iteration. Hendrick-
son and Leland [98] extend the spectral method to partition a graph into more
than two blocks by using multiple eigenvectors; these eigenvectors are compu-
tationally inexpensive to obtain. The method produces better partitions than
recursive bisection, but is only useful for the partitioning of a graph into four
or eight blocks. The authors also extended the method to graphs with node and
edge weights.
4.4 Flows
The well-known max-flow min-cut theorem [75] can be used to separate two node
sets in a graph by computing a maximum flow and hence a minimum cut between
them. This approach completely ignores balance, and it is not obvious how to
apply it to the balanced GPP. However, at least for random regular graphs with
small bisection width this can be done [29]. Maximum flows are also often used
as a subroutine. Refer to Sect. 5.4 for applications to improve a partition and
to Sect. 6.4 for coarsening in the context of the multilevel framework. There are
also applications of flow computations when quality is measured by expansion
or conductance [3,127].
The bisecting plane is orthogonal to the coordinate axis, which can create par-
titions with large separators in case of meshes with skewed dimensions. Inertial
partitioning can be interpreted as an improvement over RCB in terms of worst
case performance because its bisecting plane is orthogonal to a plane L that
minimizes the moments of inertia of nodes. In other words, the projection plane
L is chosen such that it minimizes the sum of squared distances to all nodes.
The random spheres algorithm of Miller et al. [83,152] generalizes the RCB
algorithm by stereographically projecting the d dimensional nodes to a random
d + 1 dimensional sphere which is bisected by a plane through its center point.
This method gives performance guarantees for planar graphs, k-nearest neighbor
graphs, and other “well-behaved” graphs.
Other representatives of geometry-based partitioning algorithms are space-
filling curves [14,105,171,223] which reduce d-dimensional partitioning to the
one-dimensional case. Space filling curves define a bijective mapping from V
to {1, . . . , |V |}. This mapping aims at the preservation of the nodes’ locality in
space. The partitioning itself is simpler and cheaper than RCB once the bijective
mapping is constructed. A generalization of space-filling curves to general graphs
can be done by so-called graph-filling curves [190].
A recent work attempts to bring information on the graph structure into the
geometry by embedding arbitrary graphs into the coordinate space using a mul-
tilevel graph drawing algorithm [121]. For a more detailed, albeit not very recent,
treatment of geometric methods, we refer the interested reader to Schloegel
et al. [191].
the neighborhood and the selection strategy allows a wide variety of techniques.
Having the improvement of paging properties of computer programs in mind,
Kernighan and Lin [117] were probably the first to define GPP and to provide
a local search method for this problem. The selection strategy finds the swap of
node assignments that yields the largest decrease in the total cut size. Note that
this “decrease” is also allowed to be negative. A round ends when all nodes have
been moved in this way. The solution is then reset to the best solution encoun-
tered in this round. The algorithm terminates when a round has not found an
improvement.
A major drawback of the KL method is that it is expensive in terms of
asymptotic running time. The implementation assumed in [117] takes time
O(n² log n) and can be improved to O(m max(log n, Δ)), where Δ denotes the
maximum degree [64]. A major breakthrough is the modification by Fiduccia and
Mattheyses [72]. Their carefully designed data structures and adaptations yield
the KL/FM local search algorithm, whose asymptotic running time is O(m).
Bob Darrow was the first who implemented the KL/FM algorithm [72].
Karypis and Kumar [114] further accelerated KL/FM by only allowing
boundary nodes to move and by stopping a round when the edge cut does not
decrease after x node moves. They improve quality by random tie breaking and
by allowing additional rounds even when no improvements have been found.
A highly localized version of KL/FM is considered in [161]. Here, the search
spreads from a single boundary node. The search stops when a stochastic model
of the search predicts that a further improvement has become unlikely. This
strategy has a better chance to climb out of local minima and yields improved
cuts for the GP solvers KaSPar [161] and KaHIP [183].
Rather than swapping nodes, Holtgrewe et al. move a single node at a
time, allowing more flexible tradeoffs between reducing the cut and improving
balance [102].
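The common core of these KL/FM-style refinements is the notion of a node's gain: the reduction in cut weight obtained by moving it to another block. A minimal single-node-move sketch for a bipartition (unit weights, no bucket priority queue, no balance handling) might look as follows; the real implementations cited above are considerably more careful and faster.

```python
def gain(adj, block, v):
    """Cut reduction if node v is moved to the other block (bipartition, unit edge weights)."""
    ext = sum(1 for u in adj[v] if block[u] != block[v])   # edges that stop being cut
    internal = len(adj[v]) - ext                           # edges that become cut
    return ext - internal

def fm_pass(adj, block):
    """One FM-like pass: move each node at most once, always taking the best
    remaining move, and roll back to the best prefix of moves seen."""
    moved, history, best_prefix, total, best = set(), [], 0, 0, 0
    while len(moved) < len(adj):
        v = max((u for u in adj if u not in moved), key=lambda u: gain(adj, block, u))
        total += gain(adj, block, v)
        block[v] ^= 1                       # tentatively move v to the other block
        moved.add(v)
        history.append(v)
        if total > best:
            best, best_prefix = total, len(history)
    for v in history[best_prefix:]:         # undo the moves after the best prefix
        block[v] ^= 1
    return best                             # total cut improvement of this pass
```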
Helpful Sets by Diekmann et al. [55,155] introduce a more general neigh-
borhood relation in the bipartitioning case. These algorithms are inspired by a
proof technique of Hromkovič and Monien [103] for proving upper bounds on
the bisection width of a graph. Instead of migrating single nodes, whole sets of
nodes are exchanged between the blocks to improve the cut. The running time
of the algorithm is comparable to the KL/FM algorithm, while solution quality
is often better than other methods [155].
It has been shown by Simon and Teng [201] that, due to the lack of global
knowledge, recursive bisection can create partitions that are very far away from
the optimal partition so that there is a need for k-way local search algorithms.
There are multiple ways of extending the KL/FM algorithm to get a local search
algorithm that can improve a k-partition.
One early extension of the KL/FM algorithm to k-way local search uses
k(k − 1) priority queues, one for each type of move (source block, target block)
[97,182]. For a single movement one chooses the node that maximizes the gain,
breaking ties by the improvement in balance.
Karypis and Kumar [114] present a k-way version of the KL/FM algorithm
that runs in linear time O(m). They use a single global priority queue for all
types of moves. The priority used is the maximum local gain, i. e., the maximum
reduction in the cut when the node is moved to one of its neighboring blocks.
The node that is selected for movement yields the maximum improvement for
the objective and maintains or improves upon the balance constraint.
Most current local search algorithms exchange nodes between blocks of the
partition trying to decrease the cut size while also maintaining balance. This
highly restricts the set of possible improvements. Sanders and Schulz [186,195]
relax the balance constraint for node movements but globally maintain (or
improve) balance by combining multiple local searches. This is done by reducing
the combination problem to finding negative cycles in a graph, exploiting the
existence of efficient algorithms for this problem.
A more expensive k-way local search algorithm is based on tabu search [86,87],
which has been applied to GP by [16–18,78,175]. We briefly outline the method
reported by Galinier et al. [78]. Instead of moving a node exactly once per
round, as in the traditional versions of the KL/FM algorithms, specific types of
moves are excluded only for a number of iterations. The number of iterations
that a move (v, block) is excluded depends on an aperiodic function f and the
current iteration i. The algorithm always moves a non-excluded node with the
highest gain. If the node is in block A, then the move (v, A) is excluded for f (i)
iterations after the node is moved to the block yielding the highest gain, i. e.,
the node cannot be put back to block A for f (i) iterations.
Diekmann et al. [57] extend graph growing and previous ideas [216] to obtain
an iterative procedure called Bubble framework, which is capable of partitioning
into k > 2 well-shaped blocks. Some applications profit from good geometric
block shapes, e. g., the convergence rate of certain iterative linear solvers.
Graph growing is extended first by carefully selecting k seed nodes that
are evenly distributed over the graph. The key property for obtaining a good
quality, however, is an iterative improvement within the second and the third
step – analogous to Lloyd’s k-means algorithm [135]. Starting from the k seed
nodes, k breadth-first searches grow the blocks analogous to Sect. 4.3, only that
the breadth-first searches are scheduled such that the smallest block receives the
next node. Local search algorithms are further used within this step to balance
the load of the blocks and to improve the cut of the resulting partition, which may
result in unconnected blocks. The final step of one iteration computes new seed
nodes for the next round. The new center of a block is defined as the node that
minimizes the sum of the distances to all other nodes within its block. To avoid
their expensive computation, approximations are used. The second and the third
step of the algorithm are iterated until either the seed nodes stop changing or no
improved partition was found for more than 10 iterations. Figure 1 illustrates the
three steps of the algorithm. A drawback of the algorithm is its computational
complexity O(km).
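A compressed sketch of one Bubble iteration (seed selection omitted, unit weights, BFS growth scheduled by current block size) is given below; it only illustrates the control flow described above and leaves out the local search step and the approximations used for recomputing block centers in practice.

```python
from collections import deque

def bfs_dist(adj, src, allowed):
    """BFS distances from src restricted to the nodes in 'allowed'."""
    allowed = set(allowed)
    dist, q = {src: 0}, deque([src])
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u in allowed and u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def bubble_iteration(adj, seeds):
    """Grow blocks around the seed nodes by scheduled BFS, then recompute seeds."""
    k = len(seeds)
    block = {s: i for i, s in enumerate(seeds)}
    frontiers = [deque([s]) for s in seeds]
    sizes = [1] * k
    while any(frontiers):
        # the currently smallest block with a non-empty frontier grows next
        i = min((j for j in range(k) if frontiers[j]), key=lambda j: sizes[j])
        v = frontiers[i].popleft()
        for u in adj[v]:
            if u not in block:
                block[u] = i
                sizes[i] += 1
                frontiers[i].append(u)
    # new seed of a block: node minimizing the sum of distances within the block
    new_seeds = []
    for i in range(k):
        members = [v for v in block if block[v] == i]
        new_seeds.append(min(members, key=lambda v: sum(bfs_dist(adj, v, members).values())))
    return block, new_seeds
```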
Subsequently, this approach has been improved by using distance measures
that better reflect the graph structure [144,151,189]. For example, Schamberger
[189] introduced the usage of diffusion as a growing mechanism around the initial
seeds and extended the method to weighted graphs. More sophisticated diffusion
schemes, some of which have been employed within the Bubble framework, are
discussed in Sect. 5.6.
A random walk on a graph starts on a node v and then chooses randomly the
next node to visit from the set of neighbors (possibly including v itself) based
on transition probabilities. The latter can for instance reflect the importance of
an edge. This iterative process can be repeated an arbitrary number of times.
Fig. 1. The three steps of the Bubble framework. Black nodes indicate the seed nodes.
On the left hand side, seed nodes are found. In the middle, a partition is found by
performing breadth-first searches around the seed nodes and on the right hand side
new seed nodes are found.
in Fig. 2: coarsening, initial partitioning, and uncoarsening. The main goal of the
coarsening (in many multilevel approaches implemented as contraction) phase is
to gradually approximate the original problem and the input graph with fewer
degrees of freedom. In multilevel GP solvers this is achieved by creating a hier-
archy of successively coarsened graphs with decreasing sizes in such a way that
cuts in the coarse graphs reflect cuts in the fine graph. There are multiple pos-
sibilities to create graph hierarchies. Most methods used today contract sets of
nodes on the fine level. Contracting U ⊂ V amounts to replacing it with a single
node u with c(u) := Σ_{w∈U} c(w). Contraction (and other types of coarsening)
might produce parallel edges which are replaced by a single edge whose weight
accumulates the weights of the parallel edges (see Fig. 3). This implies that bal-
anced partitions on the coarse level represent balanced partitions on the fine
level with the same cut value.
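Contraction of disjoint node groups can be written down compactly; the sketch below contracts a given mapping of fine nodes to coarse nodes, sums node weights, and accumulates the weights of parallel edges, exactly as described above. It is a generic illustration, not code from a particular multilevel solver.

```python
from collections import defaultdict

def contract(node_weight, edge_weight, coarse_of):
    """Contract a weighted graph. coarse_of[v] is the coarse node that v maps to;
    node_weight[v] and edge_weight[(u, v)] are the fine weights."""
    c_node = defaultdict(float)
    for v, w in node_weight.items():
        c_node[coarse_of[v]] += w                       # c(u) = sum of fine node weights
    c_edge = defaultdict(float)
    for (u, v), w in edge_weight.items():
        cu, cv = coarse_of[u], coarse_of[v]
        if cu != cv:                                    # edges inside a group disappear
            c_edge[(min(cu, cv), max(cu, cv))] += w     # parallel edges are accumulated
    return dict(c_node), dict(c_edge)
```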
Coarsening is usually stopped when the graph is sufficiently small to be
initially partitioned using some (possibly expensive) algorithm. Any of the basic
algorithms from Sect. 4 can be used for initial partitioning as long as they are
able to handle general node and edge weights. The high quality of more expensive
methods that can be applied at the coarsest level does not necessarily translate
into quality at the finest level, and some GP multilevel solvers rather run several
faster but diverse methods repeatedly with different random tie breaking instead
of applying expensive global optimization techniques.
Uncoarsening consists of two stages. First, the solution obtained on the coarse
level graph is mapped to the fine level graph. Then the partition is improved, typ-
ically by using some variants of the improvement methods described in Sect. 5.
This process of uncoarsening and local improvement is carried on until the
finest hierarchy level has been processed. One run of this simple coarsening-
uncoarsening scheme is also called a V-cycle (see Fig. 2).
There are at least three intuitive reasons why the multilevel approach works
so well: First, at the coarse levels we can afford to perform a lot of work per node without increasing the overall execution time by much. Furthermore, a single node
move at a coarse level corresponds to a big change in the final solution. Hence,
we might be able to find improvements easily that would be difficult to find on
the finest level. Finally, fine level local improvements are expected to run fast
since they already start from a good solution inherited from the coarse level. Also
Fig. 2. The multilevel approach to GP. The left figure shows a two-level contraction-
based scheme. The right figure shows different chains of coarsening-uncoarsening in the
multilevel frameworks.
multilevel methods can benefit from their iterative application (such as chains
of V-cycles) when the previous iteration’s solution is used to improve the qual-
ity of coarsening. Moreover, (following the analogy to multigrid schemes) the inter-hierarchical coarsening-uncoarsening iteration can also be reconstructed in such a way that more work will be done at the coarser levels (see the F- and W-cycles in Fig. 2, and [183,212]). An important technical advantage of multilevel approaches is related to parallelization. Because multilevel approaches achieve a global solution by local processing only (though applied at different levels of coarseness), they are naturally amenable to parallelization.
The most widely used contraction strategy contracts (large) matchings, i. e., the
contracted sets are pairs of nodes connected by edges and these edges are not
allowed to be incident to each other. The idea is that this leads to a geomet-
rically decreasing size of the graph and hence a logarithmic number of levels,
while subsequent levels are “similar” so that local improvement can quickly find
good solutions. Assuming linear-time algorithms on all levels, one then gets
linear overall execution time. Conventional wisdom is that a good matching con-
tains many high weight edges since this decreases the weight of the edges in the
coarse graph and will eventually lead to small cuts. However, one also wants a
certain uniformity of node weights, so it is not quite clear what the objective of the matching algorithm should be. A successful recent approach is to
delegate this tradeoff between edge weights and uniformity to an edge rating
function [1,102]. For example, the function f(u, v) = ω({u, v}) / (c(u) c(v)) works very well
[102,183] (also for n-level partitioning [161]). The concept of algebraic distance
yields further improved edge ratings [179].
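To illustrate how such a rating can drive coarsening, the following sketch greedily builds a matching by considering edges in order of decreasing rating f(u, v) = ω({u, v}) / (c(u) c(v)); this simple greedy pass is only a stand-in, not the specific matching algorithms used in the systems cited above.

```python
def greedy_rating_matching(edges, node_weight):
    """Greedy matching driven by the edge rating
    f(u, v) = omega({u, v}) / (c(u) * c(v)): examine edges in order of
    decreasing rating and take an edge whenever both endpoints are still free."""
    def rating(e):
        u, v = e
        return edges[e] / (node_weight[u] * node_weight[v])
    matched, matching = set(), []
    for u, v in sorted(edges, key=rating, reverse=True):
        if u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching
```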
The weighted matching problem itself has attracted a lot of interest, motivated to a large extent by its application for coarsening.
Using max-flow computations, Delling et al. [50] find “natural cuts” separating
heuristically determined regions from the remainder of the graph. Components
cut by none of these cuts are then contracted, reducing the graph size by up to
two orders of magnitude. They use this as the basis of a two-level GP solver that
quickly gives very good solutions for road networks.
Aggregation-based coarsening identifies nodes on the fine level that survive in the
coarsened graph. All other nodes are assigned to these coarse nodes. In the general
case of weighted aggregation, nodes on a fine level belong to nodes on the coarse
level with some probability. This approach is derived from a class of hierarchical
linear solvers called Algebraic Multigrid (AMG) methods [41,144]. First results
on the bipartitioning problem were obtained by Ron et al. in [176]. As AMG lin-
ear solvers have shown, weighted aggregation is important in order to express the
likelihood of nodes to belong together. The accumulated likelihoods “smooth the
solution space” by eliminating from it local minima that will be detected instantaneously by the local processing at the uncoarsening phase. This enables a relaxed
formulation of coarser levels and avoids making hardened local decisions, such
as edge contractions, before accumulating relevant global information about the
graph.
Weighted aggregation can lead to significantly denser coarse graphs. Hence,
only the most efficient AMG approaches can be adapted to graph partitioning
successfully. Furthermore one has to avoid unbalanced node weights. In [179]
algebraic distance [38] is used as a measure of connectivity between nodes to
obtain sparse and balanced coarse levels of high quality. These principles and
their relevance to AMG are summarized in [178].
Lafon and Lee [124] present a related coarsening framework whose goal is
to retain the spectral properties of the graph. They use matrix-based arguments involving random walks (for partitioning methods based on random walks see Sect. 5.6) to derive approximation guarantees on the eigenvectors of the coarse
graph. The disadvantage of this approach is the rather expensive computation
of eigenvectors.
are computed. This is done such that the number of nodes that are grouped together, i. e., Σ_{j=1}^{k} |Vj ∩ Wσ(j)|, is maximum among all permutations σ of {1, . . . , k}. An offspring is created as follows. Sets of nodes in U, i. e., the overlaps Vj ∩ Wσ(j), will be grouped within a block of the offspring. That means if a node is in one of the sets of U, then it is assigned to the same block to which it was assigned in P1. Otherwise, it is assigned to a random block, such that the balance constraint remains
fulfilled. Local search is then used to improve the computed offspring before it is
inserted into the population. Benlic and Hao [17] combine their approach with
tabu search. Their algorithms produce partitions of very high quality, but cannot
guarantee that the output partition fulfills the desired balance constraint.
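A simplified sketch of such a combine step is given below (Python; the greedy block alignment stands in for the exact maximum-overlap permutation, and all names are illustrative rather than taken from the cited implementations).

```python
import random

def combine(P1, P2, k, node_weight, max_block_weight, rng=random.Random(0)):
    """Offspring construction from two partitions P1, P2 (dicts node -> block id
    in {0, ..., k-1}): align the block numbering of P2 with P1, keep the block of
    P1 wherever the aligned parents agree, and assign the remaining nodes to
    random blocks that still respect the balance constraint."""
    # Greedy surrogate for the maximum-overlap permutation sigma.
    overlap = [[0] * k for _ in range(k)]
    for v in P1:
        overlap[P1[v]][P2[v]] += 1
    sigma, used_p2 = {}, set()            # sigma[j] = P1-block matched with P2-block j
    for i in sorted(range(k), key=lambda i: -max(overlap[i])):
        j = max((j for j in range(k) if j not in used_p2), key=lambda j: overlap[i][j])
        sigma[j] = i
        used_p2.add(j)
    offspring, load, undecided = {}, [0] * k, []
    for v in P1:
        if sigma[P2[v]] == P1[v]:                     # both parents agree on v
            offspring[v] = P1[v]
            load[P1[v]] += node_weight[v]
        else:
            undecided.append(v)
    for v in undecided:                               # random block, balance permitting
        feasible = [b for b in range(k) if load[b] + node_weight[v] <= max_block_weight]
        b = rng.choice(feasible) if feasible else min(range(k), key=lambda b: load[b])
        offspring[v] = b
        load[b] += node_weight[v]
    return offspring
```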
Sanders and Schulz introduced a distributed evolutionary algorithm, KaFF-
PaE (KaFFPaEvolutionary) [184]. They present a general combine operator
framework, which means that a partition P can be combined with another
partition or an arbitrary clustering of the graph, as well as multiple mutation operators.
In the era of stalling CPU clock speeds, exploiting parallelism is probably the
most important way to accelerate computer programs from a hardware perspec-
tive. When executing parallel graph algorithms without shared memory, a good
distribution of the graph onto the PEs is very important. Since parallel com-
puting is a major purpose for GP, we discuss in this section several techniques
beneficial for parallel scenarios. (i) Parallel GP algorithms are often necessary
due to memory constraints: Partitioning a huge distributed graph on a single PE
is often infeasible. (ii) When different PEs communicate with each other at different speeds, techniques for mapping the blocks communication-efficiently
onto the PEs become important. (iii) When the graph changes over time (as
in dynamic simulations), so does its partition. Once the imbalance becomes too
large, one should find a new partition that unifies three criteria for this purpose:
balance, low communication, and low migration.
Parallel GP algorithms are becoming more and more important since parallel
hardware is now ubiquitous and networks grow. If the underlying application is
in parallel processing, finding the partitions in parallel is even more compelling.
The difficulty of parallelization very much depends on the circumstances. It is
relatively easy to run sequential GP solvers multiple times with randomized tie
breaking in all available decisions. Completely independent runs quickly lead to a point of diminishing returns but are a useful strategy for very simple initial partitioners such as the one described in Sect. 4.3. Evolutionary GP solvers are more
effective (thanks to very good combination operators) and scale very well, even
on loosely coupled distributed machines [184].
Most of the geometry-based algorithms from Sect. 4.5 are parallelizable and
perhaps this is one of the main reasons for using them. In particular, one can use
them to find an initial distribution of nodes to processors in order to improve the
locality of a subsequent graph based parallel method [102]. If such a “reasonable”
distribution of a large graph over the local memories is available, distributed
memory multilevel partitioners using MPI can be made to scale [40,102,112,213].
However, loss of quality compared to the sequential algorithms is a constant
concern. A recent parallel matching algorithm allows high quality coarsening,
though [24]. If k coincides with the number of processors, one can use parallel
edge coloring of the quotient graph to do pairwise refinement between neighbor-
ing blocks. At least for mesh-like graphs this scales fairly well [102] and gives
quality comparable to sequential solvers. This comparable solution quality also
holds for parallel Jostle as described by Walshaw and Cross [214].
Parallelizing local search algorithms like KL/FM is much more difficult since
local search is inherently sequential and since recent results indicate that it
achieves best quality when performed in a highly localized way [161,183]. When
restricting local search to improving moves, parallelization is possible, though
[2,116,128,149]. In a shared memory context, one can also use speculative
parallelism [205]. The diffusion-based improvement methods described in
Sect. 5.6 are also parallelizable without loss of quality since they are formulated
in a naturally data parallel way [147,168].
Let Gc = (Vc, Ec, ωc) be the communication graph of the application, where the weight ωc(u, v) of an edge (u, v) ∈ Ec denotes how much data process u sends to process v. Let furthermore Gp = (Vp, Ep, ωp) be the processor graph, where ωp(i, j) for (i, j) ∈ Ep specifies the bandwidth (or the latency) between PE i and PE j. We now assume that a partition
has already been computed, inducing a communication graph Gc . The task after
partitioning is then to find a communication-optimal mapping π : Vc → Vp .
Different objective functions have been proposed for this mapping problem.
Since it is difficult to capture the deciding hardware characteristics, most authors
concentrate on simplified cost functions – similar to the simplification of the edge
cut for graph partitioning. Apparently small variations in the cost functions
rarely lead to drastic variations in application running time. For details we refer
to Pellegrini’s survey on static mapping [167] (which we wish to update with this
section, not to replace) and the references therein. Global-sum-type cost functions do not have the drawback of requiring global updates. Moreover, discontinuities in their search space, which may prevent metaheuristics from being effective, are usually less pronounced than for maximum-based cost functions. Commonly used is the sum, over all edges of Gc, of their weight multiplied by the cost of a unit-weight communication in Gp [167]: f(Gc, Gp, π) := Σ_{(u,v)∈Ec} ωc(u, v) · ωp(π(u), π(v)).
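In code, this global-sum objective is a plain weighted sum of pairwise PE costs (a sketch; comm_edges, proc_cost, and pi are illustrative stand-ins for ωc, ωp, and π):

```python
def mapping_cost(comm_edges, proc_cost, pi):
    """Sum over all communication edges (u, v) of their volume omega_c(u, v)
    times the cost omega_p of a unit-weight message between PEs pi(u) and pi(v)."""
    return sum(w * proc_cost[pi[u]][pi[v]] for (u, v), w in comm_edges.items())
```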
The accuracy of the distance function ωp depends on several factors, one
of them being the routing algorithm, which determines the paths a message
takes. The maximum length over all these paths is called the dilation of the
embedding π. One simplifying assumption can be that the routing algorithm is oblivious [101] and, for example, always uses shortest paths. When multiple messages are exchanged at the same time, the same communication link may be requested by several of them. This congestion of edges in Gp can therefore be another important factor to consider; its maximum (or average) over all edges should be minimized. Minimizing the maximum congestion is NP-hard, cf.
Garey and Johnson [80] or more recent work [101,120].
Algorithms. Due to the problem’s complexity, exact mapping methods are only
practical in special cases. Leighton’s book [130] discusses embeddings between
arrays, trees, and hypercubic topologies. One can apply a wide range of optimization techniques to the mapping problem, including multilevel algorithms. Their general structure is very similar to that described in Sect. 6. The precise differences in the individual stages are beyond our scope. Instead we focus on very
recent results – some of which also use hierarchical approaches. For pointers to
additional methods we refer the reader to Pellegrini [167] and Aubanel’s short
summary [9] on resource-aware load balancing.
Greedy approaches such as the one by Brandfass et al. [27] map the node vc
of Gc with the highest total communication cost w. r. t. the already mapped nodes onto the node vp of Gp with the smallest total distance w. r. t. the
already mapped nodes. Some variations exist that improve this generic approach
in certain settings [84,101].
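A sketch of this greedy construction (Python; names and tie-breaking are illustrative and differ from the cited implementations, and it assumes at least as many PEs as tasks):

```python
def greedy_mapping(comm, proc_dist, start_task, start_pe):
    """Map tasks to PEs one by one: take the unmapped task with the largest
    communication volume to already-mapped tasks and put it on the free PE
    with the smallest total distance to the PEs already in use.
    comm: dict {(u, v): volume}; proc_dist: PE-by-PE distance matrix."""
    tasks = {u for edge in comm for u in edge}
    pi = {start_task: start_pe}
    while len(pi) < len(tasks):
        def volume_to_mapped(t):
            return sum(w for (a, b), w in comm.items()
                       if (a == t and b in pi) or (b == t and a in pi))
        t = max((x for x in tasks if x not in pi), key=volume_to_mapped)
        free = [p for p in range(len(proc_dist)) if p not in pi.values()]
        pi[t] = min(free, key=lambda p: sum(proc_dist[p][q] for q in pi.values()))
    return pi
```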
Hoefler and Snir [101] employ the reverse Cuthill-McKee (RCM) algorithm
as a mapping heuristic. Originally, RCM was conceived for the problem of minimizing the bandwidth of a sparse matrix [81]. In case both Gc and Gp are sparse, the simultaneous optimization of both graph layouts can lead to reasonable mapping results, cf. also Pellegrini [166].
Many metaheuristics have been used to solve the mapping problem. Uçar et
al. [210] implement a large variety of methods within a clustering approach, among
them genetic algorithms, simulated annealing, tabu search, and particle swarm
optimization. Brandfass et al. [27] present local search and evolutionary algo-
rithms. Their experiments confirm that metaheuristics are significantly slower
than problem-specific heuristics, but obtain high-quality solutions [27,210].
Another common approach is to partition Gc – or the application graph itself
– simultaneously together with Gp into the same number of blocks k′. This is for example done in Scotch [164]. For this approach k′ is chosen small enough so that it is easy to test which block in Gc is mapped onto which block in Gp. Since this often implies k′ < k, the partitioning is repeated recursively. When the number of nodes in each block is small enough, the mapping within each block is computed by brute force. If k′ = 2 and the two graphs to be partitioned are
the application graph and Gp , the method is called dual recursive bipartitioning.
Recently, schemes that model the processor graph as a tree have emerged [36]
in this algorithmic context and in similar ones [107].
Hoefler and Snir [101] compare the greedy, RCM, and dual recursive (bi)par-
titioning mapping techniques experimentally. On a 3D torus and two other real
architectures, their results do not show a clear winner. However, they confirm
previous studies [167] in that performing mapping at all is worthwhile. Bhatele
et al. [21] discuss topology-aware mappings of different communication patterns
to the physical topology in the context of MPI on emerging architectures. Better
mappings avoid communication hot spots and reduce communication times sig-
nificantly. Geometric information can also be helpful for finding good mappings
on regular architectures such as tori [20].
Repartitioning involves a tradeoff between the quality of the new partition and
the migration volume. Larger changes between the old partition Π and the new one Π′, necessary to obtain a small communication volume in Π′, result in a
higher migration volume. Different strategies have been explored in the literature
to address this tradeoff. Two simple ones and their limitations are described by
Schloegel et al. [192]. One approach is to compute a new partition Π′ from scratch and determine a migration-minimal mapping between Π and Π′. This
approach delivers good partitions, but the migration volume is often very high.
Another strategy simply migrates nodes from overloaded blocks to underloaded
ones, until a new balanced partition is reached. While this leads to optimal
migration costs, it often delivers poor partition quality. To improve these simple
schemes, Schloegel et al. [193] combine the two and get the best of both in their
tool ParMetis.
Migration minimization with virtual nodes has been used in the repartition-
ing case by, among others, Hendrickson et al. [99]. For each block, an additional
node is added, which may not change its affiliation. It is connected to each node
v of the block by an edge whose weight is proportional to the migration cost for
v. Thus, one can account for migration costs and partition quality at the same
time. A detailed discussion of this general technique is given by Walshaw [217].
Recently, this technique has been extended to heterogeneous architectures by
Fourestier and Pellegrini [77].
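A sketch of this augmentation (Python; the tuple ids of the virtual nodes and the migration_cost callback are illustrative assumptions):

```python
def add_migration_nodes(edges, node_weight, old_partition, k, migration_cost):
    """Add one fixed, zero-weight virtual node per block and connect it to every
    node of that block with an edge weighted by the node's migration cost.
    Cutting such an edge (moving the node out of its old block) is then charged
    to the partitioning objective together with the ordinary edge cut."""
    edges, node_weight = dict(edges), dict(node_weight)
    fixed = {}                                   # virtual node -> block it must stay in
    for b in range(k):
        virtual = ('virtual', b)
        node_weight[virtual] = 0                 # carries no load itself
        fixed[virtual] = b
    for v, b in old_partition.items():
        edges[(('virtual', b), v)] = migration_cost(v)
    return edges, node_weight, fixed
```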
Diffusion-based partitioning algorithms are particularly strong for repar-
titioning. PDibaP yields about 30–50% edge cut improvement compared to ParMetis and about 15% improvement over parallel Jostle with a comparable migration volume [147] (a short description of these tools can be found in
Sect. 9.3). Hypergraph-based repartitioning is particularly important when the
underlying problem has a rather irregular structure [34].
The graph data structure used by most partitioning software is the Compressed
Sparse Rows (CSR) format, also known as adjacency arrays. CSR is a cache and
storage efficient data structure for representing static graphs. The CSR represen-
tation of a graph can be composed of two, three, or four arrays, depending upon
whether edges or nodes are weighted. The node array (V) is of size n + 1 and
holds the node pointers. The edge array and the edge weights array, if present,
are of size m each. Each entry in the edge array (E) holds the node id of the
target node, while the corresponding entry in the edge weights array (W) holds
the weight of the edge. The node array holds the offsets to the edge array, mean-
ing that the target nodes of the outgoing edges of the ith node are accessible
from E(V(i)) to E(V(i + 1) − 1) and their respective weights are accessible from
W(V(i)) to W(V(i+1)−1). Both Metis and Scotch use a CSR-like data structure.
Since nodes can also be weighted in graph partitioning, an additional vector of
size n is often used to store node weights in that case. The CSR format can
further be improved and reinforced by rearranging the nodes with one of the
cache-oblivious layouts such as the minimum logarithmic arrangement [42,180].
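A tiny sketch of the access pattern just described (Python lists standing in for the arrays V, E, and W):

```python
def neighbors(V, E, W, i):
    """Yield (target, weight) for the outgoing edges of node i: V holds the n + 1
    offsets, E the target ids, W the edge weights, exactly as described above."""
    for idx in range(V[i], V[i + 1]):
        yield E[idx], W[idx]

# Example: a triangle on nodes 0, 1, 2 with unit edge weights.
V = [0, 2, 4, 6]
E = [1, 2, 0, 2, 0, 1]
W = [1, 1, 1, 1, 1, 1]
assert list(neighbors(V, E, W, 1)) == [(0, 1), (2, 1)]
```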
Among distributed-memory GP solvers, ParMetis and PT-Scotch use a 1D
node distribution where each processor owns approximately n/p nodes and their
corresponding edges. By contrast, Zoltan uses a 2D edge distribution that has
lower communication requirements in theory.
9.2 Benchmarking
The Walshaw benchmark (http://staffweb.cms.gre.ac.uk/∼wc06/partition/) was created in 2000 by Soper et al. [202]. This public
domain archive, maintained by Chris Walshaw, contains 34 real-world graphs
stemming from applications such as finite element computations, matrix compu-
tations, VLSI Design and shortest path computations. More importantly, it also
contains for each graph the partitions with the smallest cuts found so far. Sub-
missions are sought that achieve improved cut values for k ∈ {2, 4, 8, 16, 32, 64}
and balance parameters ε ∈ {0, 0.01, 0.03, 0.05}, while running time is not an
issue. Currently, solutions of over 40 algorithms have been submitted to the
archive. It is the most popular GP benchmark in the literature.
There are many other very valuable sources of graphs for experimental
evaluations: the 10th DIMACS Implementation Challenge [12,13], the Florida
Sparse Matrix Collection [46], the Laboratory of Web Algorithms [220], the
Koblenz Network Collection [123], and the Stanford Large Network Dataset
Collection [131]. Many of the graphs are available at the website of the 10th
DIMACS Implementation Challenge [12,13] in the graph format that is used by
many GP software tools.
Aubanel et al. [82] present a different kind of partitioning benchmark. Instead
of measuring the edge cut of the partitions, the authors evaluate the execution
time of a parallel PDE solver to benchmark the partitions produced by differ-
ent GP solvers. The crucial module of the benchmark is parallel matrix-vector
multiplication, which is meaningful for other numerical routines as well.
Many fast methods for GPP are based on approaches in which finding a
global solution is done by local operations only. Testing whether such methods are
robust against falling into local optima obtained by the local processing is a
very important task. In [179] a simple strategy for checking the quality of such
methods was presented. To construct a potentially hard instance, one may con-
sider a mixture of graphs with very different structures that are weakly connected
with each other. For example, in multilevel algorithms these graphs can force the
algorithm to contract incorrect edges that lead to uneven coarsening; also, they
can attract a “too strong” refinement to reach a local optimum, which can con-
tradict better optima at finer levels. Examples of real graphs that contain such
mixtures of structures include multi-mode networks [206] and logistics multi-
stage system networks [204]. Hardness of particular structures for GP solvers is
confirmed by generating graphs that are similar to the given ones at coarse and/or fine resolutions [91].
There are a number of software packages that implement the described algo-
rithms. One of the first publicly available software packages called Chaco is
due to Hendrickson and Leland [95]. Like most of the publicly available software packages, Chaco implements the multilevel approach outlined in Sect. 6
and basic local search algorithms. Moreover, they implement spectral partition-
ing techniques. Probably the fastest and best known system is the Metis family
by Karypis and Kumar [113,114]. kMetis [114] is focused on partitioning speed
and hMetis [115], which is a hypergraph partitioner, aims at partition quality.
PaToH [35] is also a widely used hypergraph partitioner that produces high qual-
ity partitions. ParMetis is a widely used parallel implementation of the Metis
GP algorithm [112]. Scotch [39,40,163] is a GP framework by Pellegrini. It uses
recursive multilevel bisection and includes sequential as well as parallel partition-
ing techniques. Jostle [213,215] is a well-known sequential and parallel GP solver
developed by Chris Walshaw. The commercialised version of this partitioner is
known as NetWorks. It has been able to hold most of the records in the Walshaw
Benchmark for a long period of time. If a model of the communication network
is available, then Jostle and Scotch are able to take this model into account for
the partitioning process. Party [57,155] implements the Bubble/shape-optimized
framework and the Helpful Sets algorithm. The software packages DibaP and its
MPI-parallel variant PDibaP by Meyerhenke [143,147] implement the Bubble
framework using diffusion; DibaP also uses AMG-based techniques for coarsen-
ing and solving linear systems arising in the diffusive approach. Recently, Sanders
and Schulz [186,187] released the GP package KaHIP (Karlsruhe High Quality
Partitioning), which implements, for example, flow-based methods, more localized local searches, and several parallel and sequential metaheuristics. KaHIP scored
most of the points in the GP subchallenge of the 10th DIMACS Implemen-
tation Challenge [13] and currently holds most of the entries in the Walshaw
Benchmark.
To address the load balancing problem in parallel applications, distrib-
uted versions of the established sequential partitioners Metis, Jostle and
Scotch [168,194,215] have been developed. The tools Parkway by Trifunovic and
Knottenbelt [208] as well as Zoltan by Devine et al. [53] focus on hypergraph
partitioning. Recent results of the 10th DIMACS Implementation Challenge [13]
suggest that scaling current hypergraph partitioners to very large systems is
even more challenging than scaling graph partitioners.
10 Future Challenges
It is an interesting question to what extent the multitude of results sketched
above has reached a state of maturity where future improvements become less
and less likely. On the one hand, if you consider the Walshaw benchmark with
its moderately sized static graphs with mostly regular structure, the quality
obtained using the best current systems is very good and unlikely to improve
much in the future. One can already get very good quality with a careful appli-
cation of decades-old techniques like KL/FM local search and the multilevel
approach. On the other hand, as soon as you widen your view in some direction,
there are plenty of important open problems.
Bridging Gaps Between Theory and Practice. We are far from understanding
why (or when) the heuristic methods used in practice produce solutions very close to optimal.
Multilevel Approach. While the multilevel paradigm has been extremely suc-
cessful for GP, there are still many algorithmic challenges ahead. The variety
of continuous systems multilevel algorithms (such as various types of multigrid)
has turned into a separate field of applied mathematics and optimization. Yet, mul-
tilevel algorithms for GPP still consist in practice of a very limited number
of multilevel techniques. The situation with other combinatorial optimization
problems is not significantly different. One very promising direction is bridging
the gaps between the theory and practice of multiscale computing and multi-
level GP such as introducing nonlinear coarsening schemes. For example, a novel
multilevel approach for the minimum vertex separator problem was recently pro-
posed using the continuous bilinear quadratic program formulation [92], and a
hybrid of geometric multigrid and the full approximation scheme for continuous problems was used for graph drawing and VLSI placement problems [45,177].
The development of more sophisticated coarsening schemes, edge ratings, and metrics of node similarity that can be propagated throughout the hierarchies is among the future challenges for graph partitioning, as is any attempt at their rigorous analysis.
Parallelism and Other Hardware Issues. Scalable high quality GP (with qual-
ity comparable to sequential partitioners) remains an open problem. With the
advent of exascale machines with millions of processors and possibly billions
of threads, the situation is further aggravated. Traditional “flat” partitions of
graphs for processing on such machines imply a huge number of blocks. It is
unclear how even sequential partitioners perform for such instances. Resorting to
recursive partitioning brings down k and also addresses the hierarchical nature
of such machines. However, this means that we need parallel partitioners where
the number of available processors is much bigger than k. It is unclear how to
do this with high quality. Approaches like the band graphs from PT-Scotch are
interesting but likely to fail for complex networks.
Efficient implementation is also a big issue since complex memory hierar-
chies and heterogeneity (e.g., GPUs or FPGAs) make the implementation com-
plicated. In particular, there is a mismatch between the fine-grained discrete
computations predominant in the best sequential graph partitioners and the
massive data parallelism (SIMD instructions, GPUs, etc.) in high-performance computing, which better fits highly regular numeric computations. It is therefore
likely that high quality GP will only be used for the higher levels of the machine
hierarchy, e.g., down to cluster nodes or CPU sockets. At lower levels of the
architectural hierarchy, we may use geometric partitioning or even regular grids
with dummy values for non-existing cells (e.g. [74]).
While exascale computing is a challenge for high-end applications, many
more applications can profit from GP in cloud computing and when using high-productivity tools such as Map/Reduce [47], Pregel [139], GraphLab [137], Com-
binatorial BLAS [30], or Parallel Boost Graph Library [90]. Currently, none of
these systems uses sophisticated GP software.
These changes in architecture also imply that we are no longer interested in
algorithms with little computation but rather in data access with high locality
and good energy efficiency.
(Re)partitioning (with changed k) and (re)mapping will therefore become more important. Even
running time as the bottom-line performance goal might be replaced by energy
consumption [199].
References
1. Abou-Rjeili, A., Karypis, G.: Multilevel algorithms for partitioning power-law
graphs. In: 20th International Parallel and Distributed Processing Symposium
(IPDPS). IEEE (2006)
2. Akhremtsev, Y., Sanders, P., Schulz, C.: (Semi-)external algorithms for graph
partitioning and clustering. In: 15th Workshop on Algorithm Engineering and
Experimentation (ALENEX), pp. 33–43 (2015)
3. Andersen, R., Lang, K.J.: An algorithm for improving graph partitions. In: 19th
ACM-SIAM Symposium on Discrete Algorithms, pp. 651–660 (2008)
4. Andreev, K., Räcke, H.: Balanced graph partitioning. Theory Comput. Syst.
39(6), 929–939 (2006)
5. Armbruster, M.: Branch-and-cut for a semidefinite relaxation of large-scale min-
imum bisection problems. Ph.D. thesis, U. Chemnitz (2007)
6. Armbruster, M., Fügenschuh, M., Helmberg, C., Martin, A.: A comparative
study of linear and semidefinite branch-and-cut methods for solving the mini-
mum graph bisection problem. In: Lodi, A., Panconesi, A., Rinaldi, G. (eds.)
IPCO 2008. LNCS, vol. 5035, pp. 112–124. Springer, Heidelberg (2008). doi:10.
1007/978-3-540-68891-4 8
7. Arora, S., Hazan, E., Kale, S.: O(√log n) approximation to sparsest cut in Õ(n²) time. SIAM J. Comput. 39(5), 1748–1771 (2010)
8. Arora, S., Rao, S., Vazirani, U.: Expander flows, geometric embeddings and graph
partitioning. In: 36th ACM Symposium on the Theory of Computing (STOC),
pp. 222–231 (2004)
9. Aubanel, E.: Resource-aware load balancing of parallel applications. In: Udoh, E.,
Wang, F.Z. (eds.) Handbook of Research on Grid Technologies and Utility Com-
puting: Concepts for Managing Large-Scale Applications, pp. 12–21. Information
Science Reference - Imprint of: IGI Publishing, May 2009
10. Auer, B.F., Bisseling, R.H.: Graph coarsening and clustering on the GPU. In:
Bader et al. [13], pp. 19–36
11. Aykanat, C., Cambazoglu, B.B., Findik, F., Kurc, T.: Adaptive decomposi-
tion and remapping algorithms for object-space-parallel direct volume render-
ing of unstructured grids. J. Parallel Distrib. Comput. 67(1), 77–99 (2007).
http://dx.doi.org/10.1016/j.jpdc.2006.05.005
12. Bader, D.A., Meyerhenke, H., Sanders, P., Schulz, C., Kappes, A., Wagner, D.:
Benchmarking for graph clustering and graph partitioning. In: Encyclopedia of
Social Network Analysis and Mining (to appear)
13. Bader, D.A., Meyerhenke, H., Sanders, P., Wagner, D. (eds.): Graph Partitioning
and Graph Clustering – 10th DIMACS Impl. Challenge, Contemporary Mathe-
matics, vol. 588. AMS, Boston (2013)
14. Bader, M.: Space-Filling Curves. Springer, Heidelberg (2013)
15. Barnard, S.T., Simon, H.D.: A fast multilevel implementation of recursive spectral
bisection for partitioning unstructured problems. In: 6th SIAM Conference on
Parallel Processing for Scientific Computing, pp. 711–718 (1993)
16. Benlic, U., Hao, J.K.: An effective multilevel memetic algorithm for balanced
graph partitioning. In: 22nd IEEE International Conference on Tools with Arti-
ficial Intelligence (ICTAI), pp. 121–128 (2010)
17. Benlic, U., Hao, J.K.: A multilevel memetic approach for improving graph k-
partitions. IEEE Trans. Evol. Comput. 15(5), 624–642 (2011)
18. Benlic, U., Hao, J.K.: An effective multilevel tabu search approach for balanced
graph partitioning. Comput. Oper. Res. 38(7), 1066–1075 (2011)
19. van Bevern, R., Feldmann, A.E., Sorge, M., Suchý, O.: On the parameterized com-
plexity of computing balanced partitions in graphs. CoRR abs/1312.7014 (2013).
http://arxiv.org/abs/1312.7014
20. Bhatele, A., Kale, L.: Heuristic-based techniques for mapping irregular commu-
nication graphs to mesh topologies. In: 13th Conference on High Performance
Computing and Communications (HPCC), pp. 765–771 (2011)
21. Bhatele, A., Jain, N., Gropp, W.D., Kale, L.V.: Avoiding hot-spots on two-level
Direct networks. In: ACM/IEEE Conference for High Performance Computing,
Networking, Storage and Analysis (SC), pp. 76:1–76:11. ACM (2011)
22. Bichot, C., Siarry, P. (eds.): Graph Partitioning. Wiley, Hoboken (2011)
23. Bichot, C.E.: A new method, the fusion fission, for the relaxed k-way graph par-
titioning problem, and comparisons with some multilevel algorithms. J. Math.
Model. Algorithms 6(3), 319–344 (2007)
24. Birn, M., Osipov, V., Sanders, P., Schulz, C., Sitchinava, N.: Efficient paral-
lel and external matching. In: Wolf, F., Mohr, B., Mey, D. (eds.) Euro-Par
2013. LNCS, vol. 8097, pp. 659–670. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-40047-6 66
25. Boman, E.G., Devine, K.D., Rajamanickam, S.: Scalable matrix computations on
large scale-free graphs using 2D graph partitioning. In: ACM/IEEE Conference
for High Performance Computing, Networking, Storage and Analysis (SC) (2013)
26. Boppana, R.B.: Eigenvalues and graph bisection: an average-case analysis. In:
28th Symposium on Foundations of Computer Science (FOCS), pp. 280–285
(1987)
27. Brandfass, B., Alrutz, T., Gerhold, T.: Rank reordering for MPI
communication optimization. Comput. Fluids 80, 372–380 (2013).
http://www.sciencedirect.com/science/article/pii/S004579301200028X
28. Brunetta, L., Conforti, M., Rinaldi, G.: A branch-and-cut algorithm for the equi-
cut problem. Math. Program. 78(2), 243–263 (1997)
29. Bui, T., Chaudhuri, S., Leighton, F., Sipser, M.: Graph bisection algorithms with
good average case behavior. Combinatorica 7, 171–191 (1987)
30. Buluç, A., Gilbert, J.R.: The combinatorial BLAS: design, implementation, and
applications. Int. J. High Perform. Comput. Appl. 25(4), 496–509 (2011)
31. Buluç, A., Madduri, K.: Graph partitioning for scalable distributed graph com-
putations. In: Bader et al. [13], pp. 83–102
32. Camilus, K.S., Govindan, V.K.: A review on graph based segmentation. IJIGSP
4, 1–13 (2012)
33. Catalyurek, U., Aykanat, C.: A hypergraph-partitioning approach for coarse-
grain decomposition. In: ACM/IEEE Conference on Supercomputing (SC). ACM
(2001)
34. Catalyurek, U., Boman, E., et al.: Hypergraph-based dynamic load balancing for
adaptive scientific computations. In: 21st International Parallel and Distributed
Processing Symposium (IPDPS). IEEE (2007)
35. Çatalyürek, Ü., Aykanat, C.: PaToH: partitioning tool for hypergraphs. In: Padua,
D. (ed.) Encyclopedia of Parallel Computing. Springer, Heidelberg (2011)
36. Chan, S.Y., Ling, T.C., Aubanel, E.: The impact of heterogeneous multi-core
clusters on graph partitioning: an empirical study. Cluster Comput. 15(3), 281–
302 (2012)
37. Chardaire, P., Barake, M., McKeown, G.P.: A PROBE-based heuristic for graph
partitioning. IEEE Trans. Comput. 56(12), 1707–1720 (2007)
38. Chen, J., Safro, I.: Algebraic distance on graphs. SIAM J. Sci. Comput. 33(6),
3468–3490 (2011)
39. Chevalier, C., Pellegrini, F.: Improvement of the efficiency of genetic algorithms
for scalable parallel graph partitioning in a multi-level framework. In: Nagel, W.E.,
Walter, W.V., Lehner, W. (eds.) Euro-Par 2006. LNCS, vol. 4128, pp. 243–252.
Springer, Heidelberg (2006). doi:10.1007/11823285 25
40. Chevalier, C., Pellegrini, F.: PT-Scotch: a tool for efficient parallel graph ordering.
Parallel Comput. 34(6), 318–331 (2008)
41. Chevalier, C., Safro, I.: Comparison of coarsening schemes for multi-level graph
partitioning. In: Proceedings Learning and Intelligent Optimization (2009)
42. Chierichetti, F., Kumar, R., Lattanzi, S., Mitzenmacher, M., Panconesi, A.,
Raghavan, P.: On compressing social networks. In: 15th ACM SIGKDD Interna-
tional Conference on Knowledge Discovery and Data Mining, pp. 219–228 (2009)
43. Chu, S., Cheng, J.: Triangle listing in massive networks and its applications. In:
17th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp.
672–680 (2011)
44. Comellas, F., Sapena, E.: A multiagent algorithm for graph partitioning. In: Roth-
lauf, F., Branke, J., Cagnoni, S., Costa, E., Cotta, C., Drechsler, R., Lutton, E.,
Machado, P., Moore, J.H., Romero, J., Smith, G.D., Squillero, G., Takagi, H.
(eds.) EvoWorkshops 2006. LNCS, vol. 3907, pp. 279–285. Springer, Heidelberg
(2006). doi:10.1007/11732242 25
45. Cong, J., Shinnerl, J.: Multilevel Optimization in VLSICAD. Springer, Heidelberg
(2003)
46. Davis, T.: The University of Florida Sparse Matrix Collection (2008). http://
www.cise.ufl.edu/research/sparse/matrices/
47. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters.
In: 6th Symposium on Operating System Design and Implementation (OSDI), pp.
137–150. USENIX (2004)
48. Delling, D., Goldberg, A.V., Pajor, T., Werneck, R.F.: Customizable route plan-
ning. In: Pardalos, P.M., Rebennack, S. (eds.) SEA 2011. LNCS, vol. 6630, pp.
376–387. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20662-7 32
49. Delling, D., Goldberg, A.V., Razenshteyn, I., Werneck, R.F.: Exact combina-
torial branch-and-bound for graph bisection. In: 12th Workshop on Algorithm
Engineering and Experimentation (ALENEX), pp. 30–44 (2012)
50. Delling, D., Goldberg, A.V., et al.: Graph partitioning with natural cuts. In:
25th International Parallel and Distributed Processing Symposium (IPDPS), pp.
1135–1146 (2011)
51. Delling, D., Werneck, R.F.: Better bounds for graph bisection. In: Epstein, L.,
Ferragina, P. (eds.) ESA 2012. LNCS, vol. 7501, pp. 407–418. Springer, Heidelberg
(2012). doi:10.1007/978-3-642-33090-2 36
52. Delling, D., Werneck, R.F.: Faster customization of road networks. In: Bonifaci,
V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933,
pp. 30–42. Springer, Heidelberg (2013). doi:10.1007/978-3-642-38527-8 5
53. Devine, K.D., Boman, E.G., Heaphy, R.T., Bisseling, R.H., Catalyurek, U.V.: Par-
allel hypergraph partitioning for scientific computing. In: Proceedings of the IEEE
International Parallel and Distributed Processing Symposium, p. 124. IPDPS 2006
(2006). http://dl.acm.org/citation.cfm?id=1898953.1899056
54. Guo, D., Ke Liao, H.J.: Power system reconfiguration based on multi-level graph
partitioning. In: 7th International Conference, GIScience 2012 (2012)
55. Diekmann, R., Monien, B., Preis, R.: Using helpful sets to improve graph bisec-
tions. In: Interconnection Networks and Mapping and Scheduling Parallel Com-
putations, vol. 21, pp. 57–73 (1995)
56. Diekmann, R., Preis, R., Schlimbach, F., Walshaw, C.: Shape-optimized mesh
partitioning and load balancing for parallel adaptive FEM. Parallel Comput. 26,
1555–1581 (2000)
57. Diekmann, R., Preis, R., Schlimbach, F., Walshaw, C.: Shape-optimized mesh par-
titioning and load balancing for parallel adaptive FEM. Parallel Comput. 26(12),
1555–1581 (2000)
58. Donath, W.E., Hoffman, A.J.: Algorithms for partitioning of graphs and computer
logic based on eigenvectors of connection matrices. IBM Tech. Discl. Bull. 15(3),
938–944 (1972)
59. Donath, W.E., Hoffman, A.J.: Lower bounds for the partitioning of graphs. IBM
J. Res. Dev. 17(5), 420–425 (1973)
60. Donde, V., Lopez, V., Lesieutre, B., Pinar, A., Yang, C., Meza, J.: Identification
of severe multiple contingencies in electric power networks. In: 37th N. A. Power
Symposium, pp. 59–66. IEEE (2005)
61. Drake, D., Hougardy, S.: A simple approximation algorithm for the weighted
matching problem. Inf. Process. Lett. 85, 211–213 (2003)
62. Drake Vinkemeier, D.E., Hougardy, S.: A linear-time approximation algorithm
for weighted matchings in graphs. ACM Trans. Algorithms 1(1), 107–122 (2005)
63. Duan, R., Pettie, S., Su, H.H.: Scaling Algorithms for Approximate and Exact
Maximum Weight Matching. CoRR abs/1112.0790 (2011)
64. Dutt, S.: New faster Kernighan-Lin-type graph-partitioning algorithms. In: 4th
IEEE/ACM Conference on Computer-Aided Design, pp. 370–377 (1993)
65. Even, G., Naor, J.S., Rao, S., Schieber, B.: Fast approximate graph partitioning
algorithms. SIAM J. Comput. 28(6), 2187–2214 (1999)
66. Fagginger Auer, B.O., Bisseling, R.H.: Abusing a hypergraph partitioner for
unweighted graph partitioning. In: Bader et al. [13], pp. 19–35
67. Farhat, C., Lesoinne, M.: Automatic partitioning of unstructured meshes for the
parallel solution of problems in computational mechanics. J. Numer. Methods
Eng. 36(5), 745–764 (1993). http://dx.doi.org/10.1002/nme.1620360503
68. Feige, U., Krauthgamer, R.: A polylogarithmic approximation of the minimum
bisection. SIAM J. Comput. 31(4), 1090–1118 (2002)
69. Feldmann, A.E., Widmayer, P.: An O(n⁴) time algorithm to compute the bisection
width of solid grid graphs. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA
2011. LNCS, vol. 6942, pp. 143–154. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-23719-5 13
70. Felner, A.: Finding optimal solutions to the graph partitioning problem with
heuristic search. Ann. Math. Artif. Intell. 45, 293–322 (2005)
71. Ferreira, C.E., Martin, A., De Souza, C.C., Weismantel, R., Wolsey, L.A.: The
node capacitated graph partitioning problem: a computational study. Math. Pro-
gram. 81(2), 229–256 (1998)
72. Fiduccia, C.M., Mattheyses, R.M.: A linear-time heuristic for improving network
partitions. In: 19th Conference on Design Automation, pp. 175–181 (1982)
73. Fiedler, M.: A property of eigenvectors of nonnegative symmetric matrices and
its application to graph theory. Czech. Math. J. 25(4), 619–633 (1975)
74. Fietz, J., Krause, M.J., Schulz, C., Sanders, P., Heuveline, V.: Optimized
hybrid parallel lattice Boltzmann fluid flow simulations on complex geome-
tries. In: Kaklamanis, C., Papatheodorou, T., Spirakis, P.G. (eds.) Euro-Par
2012. LNCS, vol. 7484, pp. 818–829. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-32820-6 81
75. Ford, L.R., Fulkerson, D.R.: Maximal flow through a network. Can. J. Math. 8(3),
399–404 (1956)
76. Fortunato, S.: Community Detection in Graphs. CoRR abs/0906.0612 (2009)
77. Fourestier, S., Pellegrini, F.: Adaptation au repartitionnement de graphes d’une
méthode d’optimisation globale par diffusion. In: RenPar’20 (2011)
78. Galinier, P., Boujbel, Z., Fernandes, M.C.: An efficient memetic algorithm for the
graph partitioning problem. Ann. Oper. Res. 191(1), 1–22 (2011)
79. Garey, M.R., Johnson, D.S., Stockmeyer, L.: Some simplified NP-complete prob-
lems. In: 6th ACM Symposium on Theory of Computing, pp. 47–63. STOC, ACM
(1974)
80. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman & Co., New York (1979)
81. George, A., Liu, J.W.H.: Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall, Upper Saddle River (1981)
82. Ghazinour, K., Shaw, R.E., Aubanel, E.E., Garey, L.E.: A linear solver for bench-
marking partitioners. In: 22nd IEEE International Symposium on Parallel and
Distributed Processing (IPDPS), pp. 1–8 (2008)
83. Gilbert, J.R., Miller, G.L., Teng, S.H.: Geometric mesh partitioning: implemen-
tation and experiments. SIAM J. Sci. Comput. 19(6), 2091–2110 (1998)
84. Glantz, R., Meyerhenke, H., Noe, A.: Algorithms for mapping parallel processes
onto grid and torus architectures. In: Proceedings of the 23rd Euromicro Interna-
tional Conference on Parallel, Distributed and Network-Based Processing (2015,
to appear). Preliminary version: http://arxiv.org/abs/1411.0921
85. Glantz, R., Meyerhenke, H., Schulz, C.: Tree-based coarsening and partition-
ing of complex networks. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 364–375. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-07959-2 31
86. Glover, F.: Tabu search – part I. ORSA J. Comput. 1(3), 190–206 (1989)
87. Glover, F.: Tabu search – part II. ORSA J. Comput. 2(1), 4–32 (1990)
88. Goldschmidt, O., Hochbaum, D.S.: A polynomial algorithm for the k-cut problem
for fixed k. Math. Oper. Res. 19(1), 24–37 (1994)
89. Grady, L., Schwartz, E.L.: Isoperimetric graph partitioning for image segmenta-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 28, 469–475 (2006)
90. Gregor, D., Lumsdaine, A.: The parallel BGL: a generic library for distributed
graph computations. In: Parallel Object-Oriented Scientific Computing (POOSC)
(2005)
91. Gutfraind, A., Meyers, L.A., Safro, I.: Multiscale Network Generation. CoRR
abs/1207.4266 (2012)
92. Hager, W.W., Hungerford, J.T., Safro, I.: A multilevel bilinear program-
ming algorithm for the vertex separator problem. CoRR abs/1410.4885 (2014).
arXiv:1410.4885
93. Hager, W.W., Krylyuk, Y.: Graph partitioning and continuous quadratic pro-
gramming. SIAM J. Discrete Math. 12(4), 500–523 (1999)
94. Hager, W.W., Phan, D.T., Zhang, H.: An exact algorithm for graph partitioning.
Math. Program. 137(1–2), 531–556 (2013)
95. Hendrickson, B.: Chaco: Software for Partitioning Graphs. http://www.cs.sandia.gov/bahendr/chaco.html
96. Hendrickson, B.: Graph partitioning and parallel solvers: has the emperor no
clothes? In: Ferreira, A., Rolim, J., Simon, H., Teng, S.-H. (eds.) IRREGULAR
1998. LNCS, vol. 1457, pp. 218–225. Springer, Heidelberg (1998). doi:10.1007/
BFb0018541
97. Hendrickson, B., Leland, R.: A multilevel algorithm for partitioning graphs. In:
ACM/IEEE Conference on Supercomputing 1995 (1995)
98. Hendrickson, B., Leland, R.: An improved spectral graph partitioning algorithm
for mapping parallel computations. SIAM J. Sci. Comput. 16(2), 452–469 (1995)
99. Hendrickson, B., Leland, R., Driessche, R.V.: Enhancing data locality by using
terminal propagation. In: 29th Hawaii International Conference on System Sci-
ences (HICSS 1996), vol. 1, p. 565. Software Technology and Architecture (1996)
100. Hendrickson, B., Kolda, T.G.: Graph partitioning models for parallel computing.
Parallel Comput. 26(12), 1519–1534 (2000)
101. Hoefler, T., Snir, M.: Generic topology mapping strategies for large-scale parallel
architectures. In: ACM International Conference on Supercomputing (ICS 2011),
pp. 75–85. ACM (2011)
102. Holtgrewe, M., Sanders, P., Schulz, C.: Engineering a scalable high quality graph
partitioner. In: 24th IEEE International Parallel and Distributed Processing Sym-
posium (IPDPS), pp. 1–12 (2010)
103. Hromkovič, J., Monien, B.: The bisection problem for graphs of degree 4 (config-
uring transputer systems). In: Tarlecki, A. (ed.) MFCS 1991. LNCS, vol. 520, pp.
211–220. Springer, Heidelberg (1991). doi:10.1007/3-540-54345-7 64
104. Huang, S., Aubanel, E., Bhavsar, V.C.: PaGrid: a mesh partitioner for computa-
tional grids. J. Grid Comput. 4(1), 71–88 (2006)
105. Hungershöfer, J., Wierum, J.-M.: On the quality of partitions based on space-
filling curves. In: Sloot, P.M.A., Hoekstra, A.G., Tan, C.J.K., Dongarra, J.J. (eds.)
ICCS 2002. LNCS, vol. 2331, pp. 36–45. Springer, Heidelberg (2002). doi:10.1007/
3-540-47789-6 4
106. Hyafil, L., Rivest, R.: Graph partitioning and constructing optimal decision trees
are polynomial complete problems. Technical report 33, IRIA - Laboratoire de
Recherche en Informatique et Automatique (1973)
107. Jeannot, E., Mercier, G., Tessier, F.: Process placement in multicore clusters:
algorithmic issues and practical techniques. IEEE Trans. Parallel Distrib. Syst.
PP(99), 1–1 (2013)
108. Jerrum, M., Sorkin, G.B.: The metropolis algorithm for graph bisection. Discret.
Appl. Math. 82(1–3), 155–175 (1998)
109. Junker, B., Schreiber, F.: Analysis of Biological Networks. Wiley, Hoboken (2008)
110. Kahng, A.B., Lienig, J., Markov, I.L., Hu, J.: VLSI Physical Design - From Graph
Partitioning to Timing Closure. Springer, Heidelberg (2011)
111. Karisch, S.E., Rendl, F., Clausen, J.: Solving graph bisection problems with semi-
definite programming. INFORMS J. Comput. 12(3), 177–191 (2000)
112. Karypis, G., Kumar, V.: Parallel multilevel k-way partitioning scheme for irreg-
ular graphs. In: ACM/IEEE Supercomputing 1996 (1996)
113. Karypis, G., Kumar, V.: A fast and high quality multilevel scheme for partitioning
irregular graphs. SIAM J. Sci. Comput. 20(1), 359–392 (1998)
114. Karypis, G., Kumar, V.: Multilevel k-way partitioning scheme for irregular graphs.
J. Parallel Distrib. Comput. 48(1), 96–129 (1998)
115. Karypis, G., Kumar, V.: Multilevel k-way hypergraph partitioning. In: 36th
ACM/IEEE Design Automation Conference, pp. 343–348. ACM (1999)
116. Karypis, G., Kumar, V.: Parallel multilevel series k-way partitioning scheme for
irregular graphs. SIAM Rev. 41(2), 278–300 (1999)
117. Kernighan, B.W., Lin, S.: An efficient heuristic procedure for partitioning graphs.
Bell Syst. Tech. J. 49(1), 291–307 (1970)
118. Kieritz, T., Luxen, D., Sanders, P., Vetter, C.: Distributed time-dependent con-
traction hierarchies. In: Festa, P. (ed.) SEA 2010. LNCS, vol. 6049, pp. 83–93.
Springer, Heidelberg (2010). doi:10.1007/978-3-642-13193-6 8
119. Kim, J., Hwang, I., Kim, Y.H., Moon, B.R.: Genetic approaches for graph par-
titioning: a survey. In: 13th Genetic and Evolutionary Computation (GECCO),
pp. 473–480. ACM (2011). http://doi.acm.org/10.1145/2001576.2001642
120. Kim, Y.M., Lai, T.H.: The complexity of congestion-1 embed-
ding in a hypercube. J. Algorithms 12(2), 246–280 (1991).
http://www.sciencedirect.com/science/article/pii/019667749190004I
121. Kirmani, S., Raghavan, P.: Scalable parallel graph partitioning. In: High Perfor-
mance Computing, Networking, Storage and Analysis, SC 2013. ACM (2013)
122. Korosec, P., Silc, J., Robic, B.: Solving the mesh-partitioning problem with an
ant-colony algorithm. Parallel Comput. 30(5–6), 785–801 (2004)
123. Kunegis, J.: KONECT - the Koblenz network collection. In: Web Observatory
Workshop, pp. 1343–1350 (2013)
124. Lafon, S., Lee, A.B.: Diffusion maps and coarse-graining: a unified framework for
dimensionality reduction, graph partioning and data set parametrization. IEEE
Trans. Pattern Anal. Mach. Intell. 28(9), 1393–1403 (2006)
125. Lanczos, C.: An iteration method for the solution of the eigenvalue problem of
linear differential and integral operators. J. Res. Natl Bur. Stand. 45(4), 255–282
(1950)
126. Land, A.H., Doig, A.G.: An automatic method of solving discrete programming
problems. Econometrica 28(3), 497–520 (1960)
127. Lang, K., Rao, S.: A flow-based method for improving the expansion or
conductance of graph cuts. In: Bienstock, D., Nemhauser, G. (eds.) IPCO
2004. LNCS, vol. 3064, pp. 325–337. Springer, Heidelberg (2004). doi:10.1007/
978-3-540-25960-2 25
128. Lasalle, D., Karypis, G.: Multi-threaded graph partitioning. In: 27th International
Parallel and Distributed Processing Symposium (IPDPS), pp. 225–236 (2013)
129. Lauther, U.: An extremely fast, exact algorithm for finding shortest paths in static
networks with geographical background. In: Münster GI-Days (2004)
130. Leighton, F.T.: Introduction to Parallel Algorithms and Architectures: Arrays,
Trees, Hypercubes. Morgan Kaufmann Publishers, Burlington (1992)
131. Leskovec, J.: Stanford network analysis package (SNAP). http://snap.stanford.edu/index.html
132. Li, H., Rosenwald, G., Jung, J., Liu, C.C.: Strategic power infrastructure defense.
Proc. IEEE 93(5), 918–933 (2005)
133. Li, J., Liu, C.C.: Power system reconfiguration based on multilevel graph parti-
tioning. In: PowerTech, pp. 1–5 (2009)
134. Lisser, A., Rendl, F.: Graph partitioning using linear and semidefinite program-
ming. Math. Program. 95(1), 91–101 (2003)
135. Lloyd, S.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2),
129–137 (1982)
136. Lovász, L.: Random walks on graphs: a survey. Comb. Paul Erdös is Eighty 2,
1–46 (1993)
137. Low, Y., Gonzalez, J., Kyrola, A., Bickson, D., Guestrin, C., Hellerstein, J.M.:
Distributed GraphLab: a framework for machine learning in the cloud. PVLDB
5(8), 716–727 (2012)
138. Luxen, D., Schieferdecker, D.: Candidate sets for alternative routes in road net-
works. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 260–270. Springer,
Heidelberg (2012). doi:10.1007/978-3-642-30850-5 23
139. Malewicz, G., Austern, M.H., Bik, A.J.C., Dehnert, J.C., Horn, I., Leiser, N., Cza-
jkowski, G.: Pregel: a system for large-scale graph processing. In: ACM SIGMOD
International Conference on Management of Data (SIGMOD), pp. 135–146. ACM
(2010)
140. Maue, J., Sanders, P.: Engineering algorithms for approximate weighted matching.
In: Demetrescu, C. (ed.) WEA 2007. LNCS, vol. 4525, pp. 242–255. Springer,
Heidelberg (2007). doi:10.1007/978-3-540-72845-0 19
141. Maue, J., Sanders, P., Matijevic, D.: Goal directed shortest path queries using
precomputed cluster distances. ACM J. Exp. Algorithmics 14, 3.2:1–3.2:27 (2009)
142. Meuer, H., Strohmaier, E., Simon, H., Dongarra, J.: June 2013 — TOP500 super-
computer sites. http://top500.org/lists/2013/06/
143. Meyerhenke, H., Monien, B., Sauerwald, T.: A new diffusion-based multilevel
algorithm for computing graph partitions. J. Parallel Distrib. Comput. 69(9),
750–761 (2009)
144. Meyerhenke, H., Monien, B., Schamberger, S.: Accelerating shape optimizing load
balancing for parallel FEM simulations by algebraic multigrid. In: 20th IEEE
International Parallel and Distributed Processing Symposium (IPDPS), p. 57
(CD) (2006)
145. Meyerhenke, H., Sanders, P., Schulz, C.: Partitioning complex networks via
size-constrained clustering. In: Gudmundsson, J., Katajainen, J. (eds.) SEA
2014. LNCS, vol. 8504, pp. 351–363. Springer, Heidelberg (2014). doi:10.1007/
978-3-319-07959-2 30
146. Meyerhenke, H.: Disturbed diffusive processes for solving partitioning problems
on graphs. Ph.D. thesis, Universität Paderborn (2008)
147. Meyerhenke, H.: Shape optimizing load balancing for MPI-parallel adaptive
numerical simulations. In: Bader et al. [13], pp. 67–82
148. Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning and disturbed
diffusion. Parallel Comput. 35(10–11), 544–569 (2009)
149. Meyerhenke, H., Sanders, P., Schulz, C.: Parallel graph partitioning for complex
networks. In: Proceeding of the 29th IEEE International Parallel & Distributed
Processing Symposium, (IPDPS 2015) (2015 to appear). Preliminary version:
http://arxiv.org/abs/1404.4797
150. Meyerhenke, H., Sauerwald, T.: Beyond good partition shapes: an analysis of
diffusive graph partitioning. Algorithmica 64(3), 329–361 (2012)
151. Meyerhenke, H., Schamberger, S.: Balancing parallel adaptive FEM computations
by solving systems of linear equations. In: Cunha, J.C., Medeiros, P.D. (eds.)
Euro-Par 2005. LNCS, vol. 3648, pp. 209–219. Springer, Heidelberg (2005). doi:10.
1007/11549468 26
152. Miller, G., Teng, S.H., Vavasis, S.: A unified geometric approach to graph sep-
arators. In: 32nd Symposium on Foundations of Computer Science (FOCS), pp.
538–547 (1991)
153. Möhring, R.H., Schilling, H., Schütz, B., Wagner, D., Willhalm, T.: Partitioning
graphs to speedup Dijkstra’s algorithm. ACM J. Exp. Algorithmics 11, 1–29
(2006, 2007)
154. Mondaini, R.: Biomat 2009: International Symposium on Mathematical and Com-
putational Biology, Brasilia, Brazil, 1–6. World Scientific (2010). http://books.google.es/books?id=3tiLMKtXiZwC
155. Monien, B., Schamberger, S.: Graph partitioning with the party library: helpful-
sets in practice. In: 16th Symposium on Computer Architecture and High Perfor-
mance Computing, pp. 198–205 (2004)
156. Monien, B., Preis, R., Schamberger, S.: Approximation algorithms for multilevel
graph partitioning. In: Gonzalez, T.F. (ed.) Handbook of Approximation Algo-
rithms and Metaheuristics, chap. 60, pp. 60-1–60-15. Taylor & Francis, Abingdon
(2007)
157. Moulitsas, I., Karypis, G.: Architecture aware partitioning algorithms. In: Bour-
geois, A.G., Zheng, S.Q. (eds.) ICA3PP 2008. LNCS, vol. 5022, pp. 42–53.
Springer, Heidelberg (2008). doi:10.1007/978-3-540-69501-1 6
158. Newman, M.E.J.: Community detection and graph partitioning. CoRR
abs/1305.4974 (2013)
159. Newman, M.: Networks: An Introduction. Oxford University Press Inc., New York
(2010)
160. Nishimura, J., Ugander, J.: Restreaming graph partitioning: simple versatile algo-
rithms for advanced balancing. In: 19th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD) (2013)
161. Osipov, V., Sanders, P.: n-level graph partitioning. In: Berg, M., Meyer, U. (eds.)
ESA 2010. LNCS, vol. 6346, pp. 278–289. Springer, Heidelberg (2010). doi:10.
1007/978-3-642-15775-2 24
162. Papa, D.A., Markov, I.L.: Hypergraph partitioning and clustering. In: Gonzalez,
T.F. (ed.) Handbook of Approximation Algorithms and Metaheuristics, chap. 61,
pp. 61-1–61-19. CRC Press, Boca Raton (2007)
163. Pellegrini, F.: Scotch home page. http://www.labri.fr/pelegrin/scotch
164. Pellegrini, F.: Static mapping by dual recursive bipartitioning of process and archi-
tecture graphs. In: Scalable High-Performance Computing Conference (SHPCC),
pp. 486–493. IEEE, May 1994
165. Pellegrini, F.: A parallelisable multi-level banded diffusion scheme for computing
balanced partitions with smooth boundaries. In: Kermarrec, A.-M., Bougé, L.,
Priol, T. (eds.) Euro-Par 2007. LNCS, vol. 4641, pp. 195–204. Springer, Heidelberg
(2007). doi:10.1007/978-3-540-74466-5 22
166. Pellegrini, F.: Scotch and libScotch 5.0 user’s guide. Technical report, LaBRI,
Université Bordeaux I, December 2007
167. Pellegrini, F.: Static mapping of process graphs. In: Bichot, C.E., Siarry, P. (eds.)
Graph Partitioning, chap. 5, pp. 115–136. Wiley, Hoboken (2011)
168. Pellegrini, F.: Scotch and PT-Scotch graph partitioning software: an overview. In:
Naumann, U., Schenk, O. (eds.) Combinatorial Scientific Computing, pp. 373–406.
CRC Press, Boca Raton (2012)
169. Peng, B., Zhang, L., Zhang, D.: A survey of graph theoretical approaches to image
segmentation. Pattern Recognit. 46(3), 1020–1038 (2013)
170. Pettie, S., Sanders, P.: A simpler linear time 2/3 − approximation for maximum
weight matching. Inf. Process. Lett. 91(6), 271–276 (2004)
156 A. Buluç et al.
171. Pilkington, J.R., Baden, S.B.: Partitioning with space-filling curves. Technical
report CS94-349, UC San Diego, Department of Computer Science and Engineer-
ing (1994)
172. Pothen, A., Simon, H.D., Liou, K.P.: Partitioning sparse matrices with eigenvec-
tors of graphs. SIAM J. Matrix Anal. Appl. 11(3), 430–452 (1990)
173. Preis, R.: Linear time 1/2-approximation algorithm for maximum weighted
matching in general graphs. In: Meinel, C., Tison, S. (eds.) STACS 1999. LNCS,
vol. 1563, pp. 259–269. Springer, Heidelberg (1999). doi:10.1007/3-540-49116-3 24
174. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect
community structures in large-scale networks. Phys. Rev. E 76(3) (2007)
175. Rolland, E., Pirkul, H., Glover, F.: Tabu search for graph partitioning. Ann. Oper.
Res. 63(2), 209–232 (1996)
176. Ron, D., Wishko-Stern, S., Brandt, A.: An algebraic multigrid based algorithm
for bisectioning general graphs. Technical report MCS05-01, Department of Com-
puter Science and Applied Mathematics, The Weizmann Institute of Science
(2005)
177. Ron, D., Safro, I., Brandt, A.: A fast multigrid algorithm for energy minimization
under planar density constraints. Multiscale Model. Simul. 8(5), 1599–1620 (2010)
178. Ron, D., Safro, I., Brandt, A.: Relaxation-based coarsening and multiscale graph
organization. Multiscale Model. Simul. 9(1), 407–423 (2011)
179. Safro, I., Sanders, P., Schulz, C.: Advanced coarsening schemes for graph parti-
tioning. In: Klasing, R. (ed.) SEA 2012. LNCS, vol. 7276, pp. 369–380. Springer,
Heidelberg (2012)
180. Safro, I., Temkin, B.: Multiscale approach for the network compression-friendly
ordering. J. Discret. Algorithms 9(2), 190–202 (2011)
181. Salihoglu, S., Widom, J.: GPS: a graph processing system. In: Proceedings of
the 25th International Conference on Scientific and Statistical Database Man-
agement, SSDBM, pp. 22:1–22:12. ACM (2013). http://doi.acm.org/10.1145/
2484838.2484843
182. Sanchis, L.A.: Multiple-way network partitioning. IEEE Trans. Comput. 38(1),
62–81 (1989)
183. Sanders, P., Schulz, C.: Engineering multilevel graph partitioning algorithms. In:
Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 469–
480. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23719-5 40
184. Sanders, P., Schulz, C.: Distributed evolutionary graph partitioning. In: 12th
Workshop on Algorithm Engineering and Experimentation (ALENEX), pp. 16–29
(2012)
185. Sanders, P., Schulz, C.: High quality graph partitioning. In: Bader et al. [13], pp.
19–36
186. Sanders, P., Schulz, C.: Think locally, act globally: highly balanced graph parti-
tioning. In: Bonifaci, V., Demetrescu, C., Marchetti-Spaccamela, A. (eds.) SEA
2013. LNCS, vol. 7933, pp. 164–175. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-38527-8 16
187. Sanders, P., Schulz, C.: KaHIP - Karlsruhe High Quality Partitioning Homepage.
http://algo2.iti.kit.edu/documents/kahip/index.html
188. Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64. http://dx.doi.
org/10.1016/j.cosrev.2007.05.001
189. Schamberger, S.: On partitioning FEM graphs using diffusion. In: HPGC Work-
shop of the 18th International Parallel and Distributed Processing Symposium
(IPDPS 2004). IEEE Computer Society (2004)
Recent Advances in Graph Partitioning 157
190. Schamberger, S., Wierum, J.M.: A locality preserving graph ordering approach
for implicit partitioning: graph-filling curves. In: 17th International Conference
on Parallel and Distributed Computing Systems (PDCS), ISCA, pp. 51–57 (2004)
191. Schloegel, K., Karypis, G., Kumar, V.: Graph partitioning for high-performance
scientific simulations. In: Dongarra, J., Foster, I., Fox, G., Gropp, W., Kennedy,
K., Torczon, L., White, A. (eds.) Sourcebook of parallel computing, pp. 491–541.
Morgan Kaufmann Publishers, Burlington (2003)
192. Schloegel, K., Karypis, G., Kumar, V.: Multilevel diffusion schemes for reparti-
tioning of adaptive meshes. J. Parallel Distrib. Comput. 47(2), 109–124 (1997)
193. Schloegel, K., Karypis, G., Kumar, V.: A unified algorithm for load-balancing
adaptive scientific simulations. In: Supercomputing 2000, p. 59 (CD). IEEE Com-
puter Society (2000)
194. Schloegel, K., Karypis, G., Kumar, V.: Parallel static and dynamic multi-
constraint graph partitioning. Concurr. Comput.: Pract. Exp. 14(3), 219–240
(2002)
195. Schulz, C.: High quality graph partititioning. Ph.D. thesis. epubli GmbH (2013)
196. Schulz, F., Wagner, D., Zaroliagis, C.: Using multi-level graphs for timetable
information in railway systems. In: Mount, D.M., Stein, C. (eds.) ALENEX
2002. LNCS, vol. 2409, pp. 43–59. Springer, Heidelberg (2002). doi:10.1007/
3-540-45643-0 4
197. Sellmann, M., Sensen, N., Timajev, L.: Multicommodity flow approximation
used for exact graph partitioning. In: Battista, G., Zwick, U. (eds.) ESA
2003. LNCS, vol. 2832, pp. 752–764. Springer, Heidelberg (2003). doi:10.1007/
978-3-540-39658-1 67
198. Sensen, N.: Lower bounds and exact algorithms for the graph partitioning problem
using multicommodity flows. In: Heide, F.M. (ed.) ESA 2001. LNCS, vol. 2161,
pp. 391–403. Springer, Heidelberg (2001). doi:10.1007/3-540-44676-1 33
199. Shalf, J., Dosanjh, S., Morrison, J.: Exascale computing technology challenges.
In: Palma, J.M.L.M., Daydé, M., Marques, O., Lopes, J.C. (eds.) VECPAR
2010. LNCS, vol. 6449, pp. 1–25. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-19328-6 1
200. Simon, H.D.: Partitioning of unstructured problems for parallel processing. Com-
put. Syst. Eng. 2(2), 135–148 (1991)
201. Simon, H.D., Teng, S.H.: How good is recursive bisection? SIAM J. Sci. Comput.
18(5), 1436–1445 (1997)
202. Soper, A.J., Walshaw, C., Cross, M.: A combined evolutionary search and multi-
level optimisation approach to graph-partitioning. J. Glob. Optim. 29(2), 225–241
(2004)
203. Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs.
In: 18th ACM SIGKDD International Conference on Knowledge discovery and
data mining (KDD), pp. 1222–1230. ACM (2012)
204. Stock, L.: Strategic logistics management. Cram101 Textbook Outlines, Lightning
Source Inc. (2006). http://books.google.com/books?id=1LyCAQAACAAJ
205. Sui, X., Nguyen, D., Burtscher, M., Pingali, K.: Parallel graph partitioning on
multicore architectures. In: Cooper, K., Mellor-Crummey, J., Sarkar, V. (eds.)
LCPC 2010. LNCS, vol. 6548, pp. 246–260. Springer, Heidelberg (2011). doi:10.
1007/978-3-642-19595-2 17
206. Tang, L., Liu, H., Zhang, J., Nazeri, Z.: Community evolution in dynamic multi-
mode networks. In: 14th ACM SIGKDD International Conference on Knowledge
discovery and data mining (KDD), pp. 677–685. ACM (2008)
158 A. Buluç et al.
207. Teresco, J., Beall, M., Flaherty, J., Shephard, M.: A hierarchical partition
model for adaptive finite element computation. Comput. Method. Appl. Mech.
Eng. 184(2–4), 269–285 (2000). http://www.sciencedirect.com/science/article/
pii/S0045782599002315
208. Trifunović, A., Knottenbelt, W.J.: Parallel multilevel algorithms for hypergraph
partitioning. J. Parallel Distrib. Comput. 68(5), 563–581 (2008)
209. Tsourakakis, C.E., Gkantsidis, C., Radunovic, B., Vojnovic, M.: Fennel: streaming
graph partitioning for massive scale graphs. Technical report MSR-TR-2012-113,
Microsoft Research (2000)
210. Ucar, B., Aykanat, C., Kaya, K., Ikinci, M.: Task assignment in heterogeneous
computing systems. J. Parallel Distrib. Comput. 66(1), 32–46 (2006). http://
www.sciencedirect.com/science/article/pii/S0743731505001577
211. Wagner, D., Wagner, F.: Between min cut and graph bisection. In: Borzyszkowski,
A.M., Sokolowski, S. (eds.) MFCS 1993. LNCS, vol. 711, pp. 744–750. Springer,
Heidelberg (1993). doi:10.1007/3-540-57182-5 65
212. Walshaw, C.: Multilevel refinement for combinatorial optimisation problems. Ann.
Oper. Res. 131(1), 325–372 (2004)
213. Walshaw, C., Cross, M.: Mesh partitioning: a multilevel balancing and refinement
algorithm. SIAM J. Sci. Comput. 22(1), 63–80 (2000)
214. Walshaw, C., Cross, M.: Parallel mesh partitioning on distributed memory sys-
tems. In: Topping, B. (ed.) Computational Mechanics Using High Performance
Computing, pp. 59–78. Saxe-Coburg Publications, Stirling (2002). Invited chapter
215. Walshaw, C., Cross, M.: JOSTLE: parallel multilevel graph-partitioning software -
an overview. In: Mesh Partitioning Techniques and Domain Decomposition Tech-
niques, pp. 27–58. Civil-Comp Ltd. (2007)
216. Walshaw, C., Cross, M., Everett, M.G.: A localized algorithm for optimizing
unstructured mesh partitions. J. High Perform. Comput. Appl. 9(4), 280–295
(1995)
217. Walshaw, C.: Variable partition inertia: graph repartitioning and load balancing
for adaptive meshes. In: Parashar, M., Li, X. (eds.) Advanced Computational
Infrastructures for Parallel and Distributed Adaptive Applications, pp. 357–380.
Wiley Online Library, Hoboken (2010)
218. Walshaw, C., Cross, M.: Multilevel mesh partitioning for heterogeneous commu-
nication networks. Future Gener. Comp. Syst. 17(5), 601–623 (2001)
219. Walshaw, C., Cross, M., Everett, M.G.: Dynamic load-balancing for parallel adap-
tive unstructured meshes. In: Proceedings of the 8th SIAM Conference on Parallel
Processing for Scientific Computing (PPSC 1997) (1997)
220. Laboratory of Web Algorithms, University of Macedonia: Datasets. http://law.
dsi.unimi.it/datasets.php, http://law.dsi.unimi.it/datasets.php
221. Williams, R.D.: Performance of dynamic load balancing algorithms for unstruc-
tured mesh calculations. Concurr.: Pract. Exp. 3(5), 457–481 (1991)
222. Zhou, M., Sahni, O., et al.: Controlling unstructured mesh partitions for massively
parallel simulations. SIAM J. Sci. Comput. 32(6), 3201–3227 (2010)
223. Zumbusch, G.: Parallel Multilevel Methods: Adaptive Mesh Refinement and Load-
balancing. Teubner, Stuttgart (2003)
How to Generate Randomized Roundings with
Dependencies and How to Derandomize Them
1 Introduction
Randomized rounding is a core primitive of randomized algorithmics (see, e.g.,
the corresponding chapter in the textbook [40]). One central application going
back to Raghavan and Thompson [47,48] is to round non-integral solutions of
linear systems to integer ones. By rounding the variables independently, large
deviations bounds of Chernoff-Hoeffding type can be exploited, leading to good
performance guarantees and low rounding errors. This has been successfully
applied to a broad set of algorithmic problems.
More recently, a need for roundings that also satisfy certain hard constraints
was observed. Here, independent randomized rounding performs not so well—
the chance
√ that a single such constraint is satisfied can easily be as low as
O(1/ n), where n is the number of variables. Repeatedly generating indepen-
dent randomized √ roundings, even for a single constraint and when one is will-
ing to pay an O( n) runtime loss, is surprisingly not admissible as noted by
Srinivasan [56]. Consequently, the better solution is to generate the random-
ized roundings not independently, but in a way that they immediately satisfy
the desired constraints. (Work done while both authors were affiliated with the Max Planck
Institute for Informatics, Saarbrücken, Germany. Supported by the German Science Foundation
(DFG) through grants DO 749/4-1, DO 749/4-2, and DO 749/4-3 in the priority programme
SPP 1307 “Algorithm Engineering”.) This was most successfully done by Srinivasan in his
seminal paper [56], who showed a way to generate randomized roundings that
satisfy the constraint that the sum of all variables is not changed in the rounding
process (provided, of course, that the sum of the original variables is integral).1
These roundings provably satisfy the same large deviation bounds that were
known to hold for independent randomized rounding. This work extended to
hard constraints of the bipartite edge weight rounding type in [35,36], how-
ever for restricted applications of large deviation bounds. A completely different
approach to generating randomized roundings respecting hard constraints was
proposed in [16]. It satisfies the same large deviation bounds, hence yields the
same guarantees on rounding errors and approximation ratios as the previous
approach, but had the additional feature that it could be derandomized easily.
Further extensions followed, see, e.g., Chekuri et al. [9,10]. Throughout these
works, several applications of the roundings were given, in particular to LP-
rounding based approximation algorithms.
The existence of two very different algorithms for this important problem that,
judging from the proven performance guarantees, look very similar spurred a sequence
of algorithm engineering works. While mostly experimental in nature, both con-
cerning test problems and classic algorithmic problems, these works also led to
a derandomization of the approach of [36] and to the invention of a hybrid app-
roach (both for the randomized and derandomized setting) combining features
of both previous ones. The aim of this work is to survey these results, which cur-
rently are spread mostly over several conference papers. By presenting them in
a concise and coherent manner, we hope to make these methods easily accessible
also to the non-expert. To complete the picture, we also review some applica-
tions of the tools to concrete problems (as opposed to studying the roundings
in isolation). Furthermore, we also give an elementary and unified proof that all
three approaches to generate randomized roundings with cardinality constraints
are actually correct. For this, only separate proofs, all quite technical, existed
so far.
The field of non-independent randomized rounding and related topics has
seen several other breakthrough results in the last years. We mention them
here, but for reasons of brevity have to point the reader to the relevant litera-
ture. These include the algorithmic breakthroughs for the Lovász local lemma by
Moser and Tardos [42,58] and for Spencer’s combinatorial discrepancy result [54]
by Bansal [5,6], both of which represent efficient algorithms for computing
objects whose existence was previously only guaranteed by non-constructive
probabilistic methods. There are also several variants of rounding procedures
which are out of scope for the present chapter, including the entropy round-
ing method [51], iterative rounding [33,41] and the problem-specific polytope
rounding used by Saha and Srinivasan for resource allocation problems [52].
(Footnote 1) Note that some earlier solutions for special cases exist, e.g., for sums of variables
adding up to one [47] or the hypergraph discrepancy problem [14, 15], which is the
rounding problem with all variables being 1/2 and the rounding errors defined by a
binary matrix.
Often, we can assume without loss of generality that x ∈ [0, 1]. In this case, a
randomized rounding y of x is one with probability x and zero otherwise. As
said already, we have E[y] = x in any case.2
For a family x = (x_1, . . . , x_n) of numbers, we say that y = (y_1, . . . , y_n)
is a randomized rounding of x when each y_j is a randomized rounding of x_j.
By linearity of expectation, this implies E[∑_{j∈[n]} a_j y_j] = ∑_{j∈[n]} a_j x_j for all
coefficients a_j ∈ ℝ.
When thinking of x as a solution of a linear system Ax = b, then our aim is
to keep the rounding errors (Ay)i − (Ax)i small. Note first that these rounding
errors are independent of the integral part of x, which is why we often assume
x ∈ [0, 1]n . When y is an independent randomized rounding of x, that is, the
random variables y1 , . . . , yn are mutually independent, then the usual Chernoff-
Hoeffding large deviation bounds can be used to bound the rounding errors. For
example, when A ∈ [0, 1]^{m×n} and δ ∈ [0, 1], we have

Pr[(Ay)_i ≥ (1 + δ)(Ax)_i] ≤ exp(−δ²(Ax)_i/3)

for every row i. By the union bound, this implies that with constant probability the round-
ing errors are bounded by O(max{√((Ax)_i log m), log m}) for all rows i
simultaneously.
(Footnote 2) Note that, in fact, E[y] = x and y ∈ {⌊x⌋, ⌈x⌉} is equivalent to saying that y is a
randomized rounding of x.
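To make the basic primitive concrete, here is a minimal Python sketch of independent randomized rounding together with the row-wise rounding errors of a system Ax = b. It is our own illustration (names such as `independent_rounding` are not from the surveyed works), not an implementation used in the experiments discussed later.

```python
import random

def independent_rounding(x, rng=random.Random(0)):
    """Round each fractional x_i to 1 with probability x_i, independently."""
    return [1 if rng.random() < xi else 0 for xi in x]

def rounding_errors(A, x, y):
    """Row-wise rounding errors |(Ay)_i - (Ax)_i| for A with entries in [0, 1]."""
    return [abs(sum(a * (yj - xj) for a, xj, yj in zip(row, x, y))) for row in A]

if __name__ == "__main__":
    x = [0.3, 0.7, 0.5, 0.5]
    A = [[1, 1, 0, 0], [0, 1, 1, 1]]
    y = independent_rounding(x)
    print(y, rounding_errors(A, x, y))
```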
On the other hand, we can easily create systems of hard constraints where
randomized rounding is possible, but which only allow for extremely limited
concentration bounds. Consider an integral polytope P and a fractional point
x ∈ P , and let x be expressed as a convex combination i αi pi over vertices pi of
P (i.e., 0 ≤ αi ≤ 1 for each i, and i αi = 1); note that such an expression always
exists. If we can compute such an expression for any x ∈ P , then we can produce
a randomized rounding by simply letting x = pi with probability αi for each i.
Such a blunt rounding algorithm would in general not allow for any interesting
concentration bounds. (See [23] for the corresponding statement in the setting
of hard cardinality constraints.) Concretely, we may consider a polytope with
only two integral points (0, 1, 0, 1, . . .), (1, 0, 1, 0, . . .) ∈ [0, 1]n ; this may also be
described via cardinality constraints (xi + xi+1 = 1) for 1 ≤ i < n. Given a
fractional point x = (ξ, 1 − ξ, ξ, . . .), we can create a randomized rounding y of x
by letting y = (1, 0, 1, 0, . . .) with probability ξ, and y = (0, 1, 0, 1, . . .) otherwise.
It is clear that this produces a randomized rounding of x, but the procedure
only allows for very specific and restricted concentration bounds. To get useful
concentration bounds, we must consider weaker classes of hard constraints.
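For the cycle-polytope example above, the decomposition-based rounding is a one-liner; the following sketch (ours, for illustration) makes the complete coupling of the coordinates explicit: all of y is determined by a single coin flip, so no per-subset concentration beyond the trivial one can hold.

```python
import random

def round_alternating_cycle(xi, n, rng=random.Random(0)):
    """Round x = (xi, 1-xi, xi, 1-xi, ...) over the polytope spanned by the two
    vertices (1,0,1,0,...) and (0,1,0,1,...): pick a vertex with matching probability."""
    if rng.random() < xi:
        return [1 - (i % 2) for i in range(n)]  # (1,0,1,0,...) with probability xi
    return [i % 2 for i in range(n)]            # (0,1,0,1,...) with probability 1-xi
```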
Cases with Complete Negative Correlation. Having seen that when no
hard constraints are present, independent randomized rounding allows large
deviation inequalities on all variables, a natural question is for which systems of
hard constraints in general we can obtain unrestricted large deviation bounds.
The standard approach to this is via negative correlation. A set of variables
y = {y1 , . . . , yn } ∈ {0, 1}n are negatively correlated (over all subsets, also
referred to as complete negative correlation) if, for each S ⊆ [n] and each b = 0, 1
it holds that
Pr[⋀_{i∈S} (y_i = b)] ≤ ∏_{i∈S} Pr[y_i = b].
Since negative correlation suffices for the classic large deviation bounds to
hold [46], the question is which hard constraints allow randomized rounding
in a way that the rounded variables are negatively correlated. Chekuri et al. [9]
showed that this is possible for every point in a polytope P exactly when P is a
kind of matroid polytope. Specifically, they show the following.
Theorem 1 ( [9]). Let P be a polytope with vertices in {0, 1}V . Then the fol-
lowing two properties are equivalent:
1. For any x ∈ P , there exists a probability distribution over vertices of P such
that a random vertex y drawn from this distribution satisfies E[y] = x and
the coordinates {yi }i∈V are negatively correlated.
2. P is a projection of a matroid base polytope, in the sense that there is a
matroid M = (V′, I) such that V ⊆ V′ and p is a vertex of P iff p = 1_{B∩V}
for some base B of M.
implementation and running time); see Sect. 4.4. Interesting special cases include
spanning trees (e.g., [4]) and cardinality constraints. The latter is covered in
detail in Sect. 4.
Partial Negative Correlation. To gain more expressive power in the hard
constraints, we have to give up some generality for the soft constraints. The first
result in this direction was by Gandhi et al. [35,36], who covered the case of edge-
rounding in bipartite graphs, with hard cardinality constraints (and negative
correlation) over sets of edges incident on a common vertex; see Sect. 5 for details.
This was generalized by Chekuri et al. to matroid intersection constraints, with
negative correlation over subsets of variables corresponding to equivalence classes
of the matroids; see [9]. Unlike for complete negative correlation, we have no
complete characterization for this case (i.e., no “only if” statement corresponding
to the second part of Theorem 1).
Further Extensions. Chekuri et al. [10] showed that by relaxing the condi-
tions slightly, one can achieve roundings that are in a sense almost randomized
roundings (up to a factor (1 − ε) for a given ε > 0), which satisfy a set of
hard constraints generalizing all cases above, and such that Chernoff-Hoeffding
concentration bounds apply for any linear function over the variables (i.e., not
restricted to certain variable subsets). In particular, they show the following.
See [10] for details. They mention that the results can be further generalized
to non-bipartite b-matching. Since cardinality constraints are special cases of
bipartite b-matchings, and since both matroid intersection and non-bipartite b-
matching are covered by the above result, this result properly generalizes all the
above-given results (except for the factor (1 − ε) and the exact factors involved
in the concentration bounds).
sum of all variables (this implies that we assume that ∑_{i=1}^{n} x_i is integral). It will
be immediately clear how to extend all of the below to disjoint cardinality con-
straints (that is, more than one and possibly not covering all variables), and even
to cardinality constraints forming a laminar system (for each two constraints,
the two sets of variables concerned are disjoint or one is a subset of the other).
4.1 Algorithms
We now describe three algorithms that have been proposed for generating ran-
domized roundings with a global cardinality constraint [16,30,56]. All three
approaches (as well as the preliminary works [14,15]) use the same basic idea
of breaking down the rounding process to suitably rounding pairs of variables.
We thus first describe this common core, then fill in the details of how each
algorithm works. (The rounding algorithm for matroid constraints of Chekuri
et al. [9] can also be phrased in this framework, though it is not the perspective
taken in [9].)
Pair Roundings. Let (xi , xj ) be a pair of fractional variables in x. A pair-
rounding step is to take such a pair (xi , xj ) and modify their values as follows. Let
δ + , δ − > 0 be two values chosen by the respective algorithm, and adjust (xi , xj )
to (xi + δ, xj − δ), with δ ∈ {δ + , −δ − } chosen randomly so that E[δ] = 0. The
values δ + , δ − and the choice of the pair (xi , xj ) vary according to the algorithm;
see below. Clearly, each pair-rounding step preserves the sum of all values, keeps
xi in [0, 1] and does not change the expectation of xi (hence the final yi is a
randomized rounding of xi ). Negative correlation also follows, as shown next.
Proof. The first two claims are clear; we need to show that for any S ⊆ [n] we
have Pr[⋀_{i∈S} (y_i = b)] ≤ ∏_{i∈S} Pr[y_i = b] for b = 0, 1. We give a proof via induction
over the number of pair-rounding steps. As a base case, assume that no pair-
rounding steps are taken. In that case x is integral, y = x, and for each choice
of S and b, Pr[⋀_{i∈S} (y_i = b)] = ∏_{i∈S} [x_i = b]; thus the statements hold. For the
inductive case, let S ⊆ [n] be an arbitrary set and consider P := Pr[⋀_{t∈S} (x_t = 1)].
Let (x_i, x_j) be the pair of variables in the first pair-rounding step, and observe
Pr[δ = δ⁺] = δ⁻/(δ⁺ + δ⁻). Also let S′ = S \ {x_i, x_j} and P′ = Pr[⋀_{t∈S′} (x_t = 1)] ≤
∏_{t∈S′} x_t, by the inductive hypothesis. Now the statement follows by simple
manipulations. If |S ∩ {x_i, x_j}| = 1, say x_i ∈ S, then

P = (δ⁻/(δ⁺ + δ⁻) · (x_i + δ⁺) + δ⁺/(δ⁺ + δ⁻) · (x_i − δ⁻)) · P′ = x_i · P′;

and if both x_i and x_j belong to S, then

P = (δ⁻/(δ⁺ + δ⁻) · (x_i + δ⁺)(x_j − δ⁺) + δ⁺/(δ⁺ + δ⁻) · (x_i − δ⁻)(x_j + δ⁻)) · P′
  = (x_i x_j − (δ⁺(δ⁻)² + δ⁻(δ⁺)²)/(δ⁺ + δ⁻)) · P′ ≤ x_i x_j · P′.

The case of ⋀_{t∈S} (x_t = 0) is analogous, replacing each x_t by 1 − x_t.
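All three concrete algorithms below instantiate the same elementary step; a minimal Python helper (our own naming, with the pair and the step sizes supplied by the caller) could look as follows.

```python
import random

def pair_rounding_step(x, i, j, delta_plus, delta_minus, rng):
    """One pair-rounding step: x[i] += delta and x[j] -= delta, with
    delta in {+delta_plus, -delta_minus} chosen so that E[delta] = 0.
    The sum x[i] + x[j] (and hence any cardinality constraint) is preserved."""
    p_plus = delta_minus / (delta_plus + delta_minus)  # Pr[delta = +delta_plus]
    delta = delta_plus if rng.random() < p_plus else -delta_minus
    x[i] += delta
    x[j] -= delta
```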
Srinivasan’s Method. In [56], the details of the above scheme are filled in as
follows. Let (xi , xj ) be a pair of fractional variables (chosen arbitrarily), and
let δ + = min(1 − xi , xj ) and δ − = min(xi , 1 − xj ). Working through the above
description of the choice of δ, we find that δ = δ + with probability δ − /(δ + +δ − ),
and δ = −δ − with complementary probability. Observe that in each case, at
least one of the new values xi + δ, xj − δ is integral, and will hence not be chosen
for further rounding steps. In particular, this implies that there are only O(n)
rounding steps. While the choice of pairs (xi , xj ) has no impact on the theoretical
behavior of the algorithm, in practice it had some importance; see below.
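A compact sketch of the full loop with Srinivasan's choice of δ⁺ and δ⁻, built on the `pair_rounding_step` helper above (our simplification; it rebuilds the list of fractional variables in each iteration and therefore runs in quadratic rather than linear time).

```python
import random

def srinivasan_rounding(x, rng=random.Random(0), eps=1e-12):
    """Round x with integral sum(x); every step makes at least one variable
    integral, and the sum of all variables is preserved throughout."""
    x = list(x)
    while True:
        frac = [k for k, v in enumerate(x) if eps < v < 1 - eps]
        if len(frac) < 2:
            break
        i, j = frac[0], frac[1]
        delta_plus = min(1 - x[i], x[j])
        delta_minus = min(x[i], 1 - x[j])
        pair_rounding_step(x, i, j, delta_plus, delta_minus, rng)
    return [int(round(v)) for v in x]
```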
Bitwise Roundings. A different approach to dependent rounding was taken
by Doerr [16]. For this method, we must assume that the variables {x1 , . . . , xn }
have finite bit-depth ℓ, i.e., that each variable x_i can be written as c_i · 2^{−ℓ} for
some integers c_i and ℓ. In this case, we round variables as follows. Let (x_i, x_j) be
a pair of variables with the least significant bit (LSB) set to 1 (i.e., c_i mod 2 =
c_j mod 2 = 1). If no such variables exist, we may rewrite x with a shorter bit-
depth ℓ′ < ℓ; also note that under the assumption that the total cardinality
∑_{i=1}^{n} x_i is integral, there cannot be only a single variable x_i with non-zero LSB.
We round (x_i, x_j) as described above by letting δ = ±2^{−ℓ} with equal probability;
note that after this step, both variables will have a LSB of 0, regardless of choice
of δ. Hence, after O(n) rounding steps there will be no further variables with
non-zero LSB, and we may consider our variables to have a smaller bit-depth
ℓ − 1. After ℓ such phases, and consequently O(nℓ) rounding steps, all variables
will be integral.
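The phase structure can be written down directly on the integer representations c_i; the sketch below is our illustration (it pairs the variables with non-zero current bit arbitrarily within each phase, which is all the scheme requires).

```python
import random

def bitwise_rounding(c, ell, rng=random.Random(0)):
    """Round x_i = c[i] * 2**(-ell) with integral sum(x).  In phase b, every
    variable whose bit b is 1 gets paired with another such variable and the
    pair is shifted by +/- 2**b (i.e. +/- 2**(b-ell) in terms of x), after
    which bit b is 0 for all variables."""
    c = list(c)
    for b in range(ell):
        odd = [i for i, ci in enumerate(c) if (ci >> b) & 1]
        # an integral total sum guarantees an even number of such variables
        for i, j in zip(odd[0::2], odd[1::2]):
            sign = 1 if rng.random() < 0.5 else -1
            c[i] += sign * (1 << b)
            c[j] -= sign * (1 << b)
    return [ci >> ell for ci in c]  # every c_i is now 0 or 2**ell
```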
The advantage and original motivation of this scheme is that the rounding
phases are (arguably) simpler than the previous case, both to implement and to
analyze. In [16], each individual rounding phase was performed in a plug-in fash-
ion by the independent randomized rounding method of Raghavan and Thomp-
son, allowing for the first announced derandomized algorithm for this problem.
However, later it was observed (in [31]) that the standard method of pessimistic
estimators can be applied directly to all these schemes (see below). The complex-
ity of O(nℓ) is noticeably worse than the O(n) of Srinivasan's method, except for
variables with small, constant bit-depth, but the approach of bit-wise rounding
turned out useful for the more general case of bipartite graphs; see Sect. 5.
A Hybrid Scheme. Motivated by differences observed in running time and
solution quality for the case of bipartite graphs (see Sect. 5), a third variant of
rounding scheme was considered in [30]. In brief, this variant consists of picking
4.2 Derandomization
All of the above rounding schemes can be derandomized using the methods
developed by Raghavan for classical randomized rounding [47]. Let us outline
how these methods work for the independent rounding case before we review
how they can be adapted to cases of dependent rounding.
The first ingredient is known as method of conditional probabilities [32,47,55].
Let x be as above, and let P (x) be the probability of some undesirable event, e.g.,
the probability that an independent randomized rounding y of x has a rounding
error larger than some bound μ. Assume that P can be efficiently computed,
and that P (x) < 1. We can then produce a rounding y of x by iteratively
rounding each variable xi in turn, at each step picking the value yi ∈ {0, 1} that
minimizes P (·). Let x resp. x be x modified as xi ← 1 resp. xi ← 0. Then
P (x) = xi P (x ) + (1 − xi )P (x ), as P (x ) and P (x ) are simply the conditional
probabilities of failure given xi . Hence min(P (x ), P (x )) ≤ P (x) < 1, and we
maintain the invariant that P (x∗ ) < 1 for every generated point x∗ . By induction
we have P (y) < 1, where y is the final rounding of x generated this way, and
since y is integral we conclude that P (y) = 0, e.g., y produces a rounding error
less than our bound μ, and we are done.
To extend this to cases where P (x) is unknown or too expensive to compute
(as is the case for rounding errors in a linear system Ax = b), we may use a
pessimistic estimator F (x) in place of P (x). Such an estimator is an efficiently
computable function F (x) such that F (x) ≥ P (x) for all x, F (x) < 1 for the
initial point x, and for every two modifications x′, x″ of a point x as above,
min(F(x′), F(x″)) ≤ F(x). By identical arguments as above, using a pessimistic
estimator F (x) in place of the probability P (x), we may deterministically pro-
duce a rounding y of x which satisfies our condition. The art, of course, is
finding such pessimistic estimators. Raghavan [47] showed that certain technical
expressions occurring in the proof of Chernoff-Hoeffding bounds are pessimistic
estimators. This has the advantage that they can be applied to systems Ax = b
of soft linear constraints whenever the corresponding Chernoff-Hoeffding bound
shows that with positive probability a solution with a certain rounding error
exists; see Sect. 2, and [47] for details.
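The skeleton of the method is independent of the concrete estimator; in the sketch below (ours), F is passed in as a callable standing in for Raghavan's Chernoff-based expressions, and the invariant F(x) < 1 is maintained while the variables are fixed one at a time.

```python
def conditional_probabilities(x, F):
    """Derandomize independent rounding with a pessimistic estimator F:
    assumes F(x) < 1 initially and min(F(x'), F(x'')) <= F(x) whenever
    x', x'' fix one fractional coordinate of x to 1 resp. 0."""
    x = list(x)
    for i, v in enumerate(x):
        if v in (0, 1):
            continue
        hi, lo = list(x), list(x)
        hi[i], lo[i] = 1, 0
        x = hi if F(hi) <= F(lo) else lo  # greedily keep the estimator below 1
    return x
```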
To adapt the above to the dependent cases, we proceed as follows. Let x
be a point, and consider a pair-rounding step on variables xi , xj . Recall that
here we adjust x ← x + δ(ei − ej) for some δ ∈ {δ⁺, −δ⁻}. Let F(x) be the
above pessimistic estimator, and define f (δ) = F (x + δ(ei − ej )). It was shown
in [31] that f (δ) is a concave function, meaning that for any pair δ + , δ − ≥ 0,
at least one of the values f (δ + ), f (−δ − ) is at most F (x). We may now proceed
greedily, as above, at every pair-rounding step selecting that value of δ which
minimizes F (x). As before, this can be done in O(mn) time, for n variables and
m soft constraints. (Similarly to Theorem 3, this can be used to derandomize
any pair-rounding-based algorithm with the same guarantee for the rounding
errors.)
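In code, the adaptation is a small variation of the randomized pair-rounding step: both admissible moves are evaluated and the one with the smaller estimator value is kept (again our sketch; by the concavity argument above, the chosen move never increases F).

```python
def derandomized_pair_step(x, i, j, delta_plus, delta_minus, F):
    """Deterministic pair-rounding step: pick delta in {+delta_plus, -delta_minus}
    minimizing the pessimistic estimator F over the adjusted point."""
    x_plus = list(x);  x_plus[i] += delta_plus;   x_plus[j] -= delta_plus
    x_minus = list(x); x_minus[i] -= delta_minus; x_minus[j] += delta_minus
    return x_plus if F(x_plus) <= F(x_minus) else x_minus
```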
Historically, the derandomization of the bit-wise method progressed through
several generations, from the initial derandomization in [16] with significantly
worse constant factors in the rounding error guarantees, via a partial improve-
ment given in [31], until the general form of the above method was realized [30].
In practice, though the pessimistic estimators are far from perfect (e.g., due
to the use of a union bound), the greedy aspect of the derandomization process
makes for a powerful heuristic, as points with smaller value F (x) also tend to lead
to smaller rounding errors. Although the theoretical guarantees for the resulting
rounding error are comparable to the expected outcome of a randomized process,
in applications and experiments we repeatedly find that derandomized, greedy
methods significantly outperform randomized ones. (See the experiments in this
section for more.)
Implementation Notes. A few potential issues suggest themselves with respect
to implementation of the above. The first is the source of randomness for the
randomized methods. While we did not have access to a “real” (hardware) ran-
domness source, we found no indication in our experiments that the choice of
pseudo-random number generator would have a significant impact on the
results. The second potential issue lies in the use of floating-point arithmetic.
As noted in Sect. 2, exact computation of pessimistic estimators is only possi-
ble in the Real RAM model, and alternatives for the standard model are very
costly. Instead, our implementation (as is usual) uses CPU-native floating-point
arithmetic. While this “usually” works “reasonably” well, there are potential
issues of accumulated imprecision (in particular since the pessimistic estimators
become repeatedly adjusted throughout the process). However, in experiments
we found no indication of such problems within the scope of this and the next
section.
1,000,000 variables in 0.05–0.14 s, with the bit-wise method being slower at
approximately one second. For the derandomized versions, rounding 10,000
variables subject to 10,000 soft constraints took 52 s for independent round-
ing, 75 s for Srinivasan's method, and in excess of ten minutes with bit-wise
rounding. Later engineering of the code base reduced these times, eventually
allowing instances of a geometric discrepancy problem with 2^15 = 32,768
variables and an equal number of soft constraints to be derandomized in
37 s (with special-purpose code) [28]; see Sect. 6.1. No issues of numerical stabil-
ity were encountered. The hybrid method was not tested here, but based on the
results in [30] there is no reason to expect that the outcome would be noticeably
different from the other applications of Srinivasan’s method tested here.
Next, we consider solution quality (i.e., rounding errors). All considered
methods have identical theoretical concentration bounds (in the randomized
case) respectively identical theoretical upper bounds (in the derandomized case),
including the classical independent, non-constraint-preserving roundings. For the
bit-wise method, as noted above, the derandomization used in [31] had a worse
constant factor than the latest versions, thus we focus first on the other meth-
ods. Taking the performance of independent randomized rounding as a reference
(100%), the experiments of [31] showed that adding a cardinality constraint led
to no worse rounding errors, and in some cases to a reduction of rounding errors if
the soft constraints have large cardinality (e.g., on instances with a dense random
matrix of soft constraints, a hard cardinality constraint reduced rounding errors
by 15%). No clear difference between the dependent randomized methods was
found. Using a derandomization reduced the rounding error by approximately
50% on random instances; more on structured instances stemming from experi-
ments reported in Sect. 6.1. Comparing the independent derandomized rounding
with Srinivasan’s method revealed no clear difference, though perhaps an advan-
tage for Srinivasan’s method of a few percent. In particular, there seemed to be
no significant “price of hard constraints” in terms of solution quality. All algo-
rithms outperformed their theoretical bounds on rounding error by a factor of
2–3 (presumably due to the latter’s use of union bounds).
This data supports the general expectation that derandomized methods pro-
duce significantly smaller rounding errors, a conclusion that was consistently
arrived at in all our experiments. This advantage persisted when compared to
generating a large number of random solutions and keeping the best one (note
that computing the rounding error requires O(nm) time).
Finally, regarding the derandomized bit-wise method, the version used in [31]
performed worse than the other two (with rounding errors at 55–65% of those of
randomized rounding). Experiments in [29,30] (see later) using newer versions
tend to confirm a (modest) advantage of the derandomization of Srinivasan’s
method over that of the bit-wise method, though we have no good explanation
for this. However, we did find that particular combinations of soft constraints and
order of variable comparison led to very poor quality solutions for Srinivasan’s
method; see [31] regarding tree shape (but note that in later investigations, the
effect has been found to be less general than originally implied). In this respect,
As noted in Sect. 3, the above methods can be extended to the setting of matroid
constraints [9]. Matroids are powerful objects, whose usage unifies many results
from combinatorics (see, e.g., [45,53]); hence this extension is a powerful result.
However, it comes with a significant impact to practicality. While the algorithm
of [9] is reasonable (being combinatorial in nature), it works on the basis of a
decomposition of the input point x ∈ Rn into a convex combination of matroid
bases, and such a decomposition is difficult to obtain, both in terms of compu-
tational complexity (the best bound for the general case being O(n^6) time [11])
and in terms of implementation difficulty. Using a more traditional pair-rounding
approach (in line with the algorithms of Sect. 4.1; see [8]) would save us from
having to provide an explicit decomposition, but instead requires the ability to
test the fractional membership of a point x in the corresponding polytope; for the
general case, this is again as difficult as obtaining a decomposition [11]. (Chekuri
et al. [9] note that in some applications, such a decomposition is provided along
with the point x, in which case these objections do not apply.)
One particularly interesting special case of this result are spanning tree con-
straints, i.e., creating a random spanning tree for a graph according to some
given edge probabilities. This was used in the breakthrough O(log n/ log log n)-
approximation result for Asymmetric TSP of Asadpour et al. [4] (although [4]
used the heavy machinery of maximum entropy sampling). However, the cost
of the above-noted primitives for the spanning tree polytope is still non-trivial,
e.g., decomposition requires O(n^2) calls to a max-flow algorithm [34]. The best
present bound for max-flow is O(nm) time due to Orlin [44].
we assume that each vertex constraint δ(w) is integral, by adding two dummy
vertices u0 , v0 and up to |U | + |V | + 1 dummy edges [30].
We will not be able to guarantee complete negative correlation, as may be
realized by considering, e.g., an even cycle C_2n of 1/2-edges. There are exactly two
rounded solutions for this instance, and if the edges are numbered e1 , e2 , . . . , e2n
in order, in each solution we have x1 = x3 = . . . = x2n−1 and x2 = x4 = . . . =
x2n . However, the above results show that one can generate roundings subject to
the above, with negative correlation within all subsets δ(w) for w ∈ U ∪ V . We
will review the above algorithms, and recall the conclusions of some experiments,
both in general terms and for particular applications. We also briefly report on
theoretical results that extend the above situation, again to a matroid setting.
5.1 Algorithms
As in Sect. 4, three different rounding schemes are available for solving the prob-
lem, corresponding to the three schemes of Sect. 4.1, with a common algorithmic
core referred to as pipage rounding. Thus we first describe this common core,
then review how each rounding scheme can be applied to it.
Pipage Rounding. The common principle behind these algorithms is the idea
of pipage rounding, due to Ageev and Sviridenko [1]. Let C ⊆ E be a set of edges
that induce a simple cycle in E; w.l.o.g. assume that C = {e1 , . . . , e2t }, numbered
in order along the cycle C, and let xC = {x1 , . . . , x2t } be the corresponding set
of variables. We will perform a pair-rounding step at every vertex incident to
C, similarly as in Sect. 4, but this time, in order to maintain all cardinality
constraints δ(w) the adjustments need to cascade. Concretely, for some δ ∈
(−1, 1) we will adjust the values xi for all edges ei ∈ C so that x2i−1 ← x2i−1 + δ
and x2i ← x2i − δ; the adjustment δ is chosen randomly as δ ∈ {δ + , −δ − } with
E[δ] = 0. The choice of δ + , δ − and the cycle C is algorithm-specific. We refer to
such an adjustment as a pipage rounding step. We review the effects of applying
the various rounding schemes to this outline. Note that when considering δ(w)
in isolation, w ∈ U ∪ V , the above scheme acts exactly like a pair-rounding
algorithm, implying both negative correlation and derandomization as in Sect. 4.
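A minimal sketch of one pipage rounding step along a cycle of edge variables (our illustration; the cycle and the step sizes δ⁺, δ⁻ are chosen by the surrounding algorithm): alternating the sign of the adjustment along the cycle keeps every vertex constraint δ(w) intact.

```python
import random

def pipage_step(x, cycle, delta_plus, delta_minus, rng):
    """cycle: indices of the edge variables e_1, ..., e_{2t} in order along an
    even cycle.  Edges e_1, e_3, ... get +delta and e_2, e_4, ... get -delta,
    so the edge sum at every vertex of the cycle stays unchanged; E[delta] = 0."""
    p_plus = delta_minus / (delta_plus + delta_minus)
    delta = delta_plus if rng.random() < p_plus else -delta_minus
    for pos, e in enumerate(cycle):
        x[e] += delta if pos % 2 == 0 else -delta
```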
Gandhi et al. In [35,36], the details are chosen much as in Srinivasan’s method
for pair-rounding. That is, δ + and δ − are defined as the largest values such that
using an adjustment of δ = δ + (resp. δ = −δ − ) leaves all variables xi ∈ [0, 1];
necessarily, at least one variable xi must become integral in such an adjustment.
Each time, the cycle C is chosen arbitrarily among the edges that still have
fractional values. (By the integrality of each vertex constraint, no vertex is inci-
dent to exactly one fractional edge, hence such a cycle always exists.) We get an
upper bound of O(m) pipage rounding steps; as each step may involve a cycle
C of O(n) edges, the total running time (pessimistically) becomes O(nm). (A
better bound may be O(mp), where p is the average cycle length, but this is
hard to estimate theoretically.)
Bit-Wise. In [16], the bit-wise rounding scheme is applied to the above. Con-
cretely, we assume that each variable xi has a finite bit-depth of ℓ. Let E be
differences in running times; however, this was not attempted.) Naturally, the
value of p depends on the graph structure; in experiments with random 5-regular
bipartite graphs with n vertices, we found that the total number of edge visits
for the method of Gandhi et al. scaled as O(n^1.37), while it remained linear for
the two other methods. (The difference in running time scaled proportionally
to this.) In concrete numbers, for 5-regular graphs on 1000 vertices, the average
running times for the derandomized versions were 14.6 s, 10.2 s, resp. 8 s for the
method of Gandhi et al., bit-wise, resp. the hybrid method; for 20-regular graphs
on 1000 vertices the times were 109 s, 67 s, resp. 65 s; and for random graphs with
m = 20, 000 and n = 400, the times were 56 s, 46 s, resp. 33 s. In other words,
for this range of m, the order of Gandhi - bit-wise - hybrid is stable, with a
total gap of roughly a factor of two. The randomized methods were roughly two
orders of magnitude faster on these instances, which though noteworthy is a less
drastic difference than in Sect. 4. In terms of rounding error, the general order
was that the method of Gandhi et al. produced smaller rounding errors, and the
bit-wise method larger errors, e.g., for the random graphs with m = 20, 000 and
n = 400, the average rounding errors were 4.38 for Gandhi et al., 6.09 for the
bit-wise method, and 5.43 for the hybrid method. However, in the application
experiments (reported in Sect. 6.3), this order was not preserved (there, instead,
the hybrid was both fastest and produced the best-quality solutions). The ran-
domized methods again produced rounding errors similar to each other, up to
twice as large as the derandomized methods.
6 Some Applications
To get a feeling for the behavior of the algorithms “in practice,” we now review
some work on applying the above methods to (real or artificial) instances of con-
crete optimization problems. We cover three topics: Low-discrepancy pointsets
(in Sect. 6.1), routing and covering problems (in Sect. 6.2), and problems of
broadcast scheduling (in Sect. 6.3). These represent various areas where meth-
ods of dependent randomized rounding have been proposed for approximation
by the difficulty of computing discrepancies. Note that the formula for d^*_∞(P)
discretizes into O(n^d) tests, which is highly impractical. Unfortunately, though
improvements exist [13,38], no practical method for upper-bounding d^*_∞(P) for
larger d is known, and it is now known that computing d^*_∞(P) in time O(n^{o(d)})
would contradict certain complexity-theoretical assumptions [37]. Therefore, the
final conclusions of the experiments remain tentative.
for each i ∈ U; the optimization goal is max ∑_i w_i e_i. For the basic variant, for
the rounding case we can simply treat the values x_i as values to be rounded,
subject to a hard constraint ∑_i x_i = L. Let y ∈ {0, 1}^n be the rounded version
of x. For a single element i ∈ U, the probability that i is covered by y equals

1 − Pr[⋀_{j: i∈S_j} (y_j = 0)] ≥ 1 − ∏_{j: i∈S_j} Pr[y_j = 0] = 1 − ∏_{j: i∈S_j} (1 − x_j);
greedy achieves 28,054, derandomized Srinivasan 27,397, and the gradient deran-
domization 28,448. Optimum was found via exhaustive search to be 28,709.
We also considered an alternate way of combining greedy and LP-rounding, of
seeding the LP-algorithm by using some fraction εL of the budget for greedy
pre-selection before solving the remaining instance as above; this was found in
some cases to further improve solution quality.
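For reference, the per-element coverage bound used above is straightforward to evaluate from a fractional solution; the following sketch (ours, with hypothetical input conventions) computes ∑_i w_i (1 − ∏_{j: i∈S_j} (1 − x_j)), a lower bound on the expected covered weight under negatively correlated rounding.

```python
from math import prod

def expected_coverage_bound(x, sets, weights):
    """x[j]: fractional value of set S_j; sets[j]: iterable of elements of S_j;
    weights: dict mapping element i to its weight w_i."""
    covering = {}  # element -> indices of the sets containing it
    for j, S in enumerate(sets):
        for i in S:
            covering.setdefault(i, []).append(j)
    return sum(w * (1 - prod(1 - x[j] for j in covering.get(i, [])))
               for i, w in weights.items())
```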
for each internal window (and possibly up to twice more). It is shown in [36]
that each request satisfied in the LP-solution has a chance of at least 3/4 to be
satisfied in the integral solution, leading in particular to a 3/4-approximation
assuming that there is a fractional solution which satisfies all requests.
In [30], this algorithm was implemented and tested, on broadcast scheduling
instances derived from Wikipedia access logs (see the paper for details). We com-
plemented the randomized algorithm above with a simple greedy algorithm and
a derandomization of the above. The greedy algorithm simply proceeds time slot
by time slot, and in each time slot broadcasts that page which would satisfy the
greatest number of remaining requests. For the derandomization, observe that
there are two randomized aspects to the above algorithm, namely the choice of
shifts and the decisions in the pipage rounding. The latter can be derandomized
via ad-hoc pessimistic estimators; the former can be derandomized by select-
ing for each page p that value zp which maximizes the sum of its associated
pessimistic estimators. In the experiments, we found that the greedy algorithm
performed the worst (unlike in Sect. 6.2, where it was quite competitive), and
that the two aspects of the derandomization (choice of z and using derandom-
ized pipage rounding) both strongly improved the solution quality. Concretely,
for the larger instance tested in [30], the greedy algorithm and the randomized
algorithms all achieve a value of 24.6, while derandomizing both aspects gives
value 26.6 (bitwise), 27 (Gandhi et al.), resp. 27.3 (hybrid). The LP has a value
of 27.5. (However, it should be noted that in all derandomized versions, the orig-
inal “fairness” condition that each request has a 3/4 chance of being satisfied
naturally no longer holds.)
We also tested the goal of minimum average delay, which was also covered
in [36]. However, for this goal, the LP-rounding approach does not seem to be
warranted, as the greedy algorithm was found to be both much faster and to
produce better solutions.
of the process, each node knowing the rumor already calls a random neighbor
and gossips the rumor to it. This process has been observed to be a very robust
and scalable method to disseminate information, consequently it found many
applications both in replicated databases [12] and wireless sensor networks [2].
Randomized rumor spreading has a natural interpretation as a randomized
rounding problem. Note that for a node u of degree d, at each time step and for each
neighbor v, the probability that u calls v is x_tv = 1/d. An actual run of the
protocol leads to a rounding y_tv defined by y_tv = 1 if and only if u actually
called v in round t. This rounding problem comes with the natural dependency
∑_v y_tv = ∑_v x_tv = 1, but as we shall see, adding further dependencies can be
useful.
In [24,25], it was suggested that nodes should not take independent actions
over time, but rather it should be avoided, e.g., that a node calls the same other
node twice in a row. To keep the bookkeeping effort low, it was proposed that
each node has a cyclic permutation of its neighbors. When first informed, it
chooses a random starting point in this cyclic order, but from then on determin-
istically follows the order of the list. Note that this also massively reduced the
number of random bits needed by the process. Despite using much less random-
ness, this process was proven to have a mostly similar or slightly better perfor-
mance than the classic independent rumor spreading. In [21], an experimental
investigation was undertaken that confirmed speed-ups for several settings where
the theoretical works could not prove a difference of the protocols. Also, it was
observed that the particular choice of the lists can make a difference, e.g., for 2D
grids with diagonal adjacencies a low-discrepancy order to serve the directions
was shown to be much better than a clock-wise order.
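A compact simulation of the list-based protocol (our sketch, assuming a connected graph given as adjacency lists whose order already encodes the chosen cyclic permutations): only the starting position in each list is random, all later calls follow the list deterministically.

```python
import random

def quasirandom_rumor_spreading(adj, start, rng=random.Random(0)):
    """adj: dict mapping each node to the cyclic order of its neighbors.
    Returns the number of rounds until every node is informed."""
    informed = {start}
    pointer = {start: rng.randrange(len(adj[start]))}  # random entry point
    rounds = 0
    while len(informed) < len(adj):
        rounds += 1
        for u in list(informed):                       # nodes informed in earlier rounds
            v = adj[u][pointer[u] % len(adj[u])]
            pointer[u] += 1                            # then follow the list deterministically
            if v not in informed:
                informed.add(v)
                pointer[v] = rng.randrange(len(adj[v]))
    return rounds
```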
Interestingly, the most significant improvement stemming from dependen-
cies (and in fact very low dependencies) was found on preferential attachment
graphs [18,20]. These graphs were introduced by Barabási and Albert [7] as
a model for real-world networks. For these graphs, surprisingly, a very minor
fine-tuning turned out to change the asymptotic runtime [18,20]. While the classic
protocol with high probability needs Ω(log n) rounds to inform all vertices, this
changes to O(log n/ log log n) when the independent choice is replaced by talk-
ing to a neighbor chosen uniformly at random from all neighbors except the one
called in the very previous round. That this asymptotic improvement is visible
also for realistic network sizes was shown in [19]. We are not aware of previous
results showing that such a minor fine-tuning of a randomized algorithm can
lead to such gains for real-world network structures.
7 Conclusions
All results presented in the article indicate that randomized rounding and its
derandomization can be adapted to respect additional hard cardinality con-
straints without incurring significant losses compared to classical independent
randomized rounding as introduced by Raghavan and Thompson [47,48]. For
disjoint cardinality constraints, when using Srinivasan’s approach or the hybrid
approach, we did not observe that generating the roundings or the derandom-
izations took more time or was significantly more complicated. Also, we gen-
erally did not observe larger rounding errors when additional hard constraints
were present (rather the opposite, in particular, adding a global cardinality con-
straint may in fact slightly decrease the rounding errors). For the choice of
the rounding method to be used, the experimental results clearly indicate that
for disjoint cardinality constraints, Srinivasan’s or the hybrid approach should
be preferred, whereas for the bipartite edge weight setting, the bit-wise or the
hybrid approach are more efficient.
Acknowledgements. The authors are grateful to the German Science Foundation for
generously supporting this research through their priority programme Algorithm Engi-
neering, both financially and by providing scientific infrastructure. We are thankful to
our colleagues in the priority programme for many stimulating discussions. Particular
thanks go to our collaborators and associated members of the project, namely
Carola Doerr née Winzen (University of Kiel, then MPI Saarbrücken, now Université
Pierre et Marie Curie—Paris 6), Tobias Friedrich (MPI Saarbrücken, now University
of Jena), Michael Gnewuch (University of Kiel, now University of Kaiserslautern),
Peter Kritzer (University of Linz), Marvin Künnemann (MPI Saarbrücken), Friedrich
Pillichshammer (University of Linz), and Thomas Sauerwald (MPI Saarbrücken, now
University of Cambridge).
References
1. Ageev, A.A., Sviridenko, M.: Pipage rounding: a new method of constructing algo-
rithms with proven performance guarantee. J. Comb. Optim. 8(3), 307–328 (2004)
2. Al-Karaki, J.N., Kamal, A.E.: Routing techniques in wireless sensor networks: a
survey. Wirel. Commun. IEEE 11(6), 6–28 (2004)
3. Arora, S., Rao, S., Vazirani, U.V.: Expander flows, geometric embeddings and
graph partitioning. J. ACM 56(2) (2009)
4. Asadpour, A., Goemans, M.X., Madry, A., Gharan, S.O., Saberi, A.: An O(log n/
log log n)-approximation algorithm for the asymmetric traveling salesman problem.
In: SODA, pp. 379–389 (2010)
5. Bansal, N.: Constructive algorithms for discrepancy minimization. In: FOCS, pp.
3–10 (2010)
6. Bansal, N., Spencer, J.: Deterministic discrepancy minimization. Algorithmica
67(4), 451–471 (2013)
7. Barabási, A.L., Albert, R.: Emergence of scaling in random networks. Science 286,
509–512 (1999)
8. Chekuri, C., Vondrák, J., Zenklusen, R.: Dependent randomized rounding for
matroid polytopes and applications (2009). http://arxiv.org/pdf/0909.4348v2.pdf
9. Chekuri, C., Vondrák, J., Zenklusen, R.: Dependent randomized rounding via
exchange properties of combinatorial structures. In: FOCS, pp. 575–584 (2010)
10. Chekuri, C., Vondrák, J., Zenklusen, R.: Multi-budgeted matchings and matroid
intersection via dependent rounding. In: SODA, pp. 1080–1097 (2011)
11. Cunningham, W.H.: Testing membership in matroid polyhedra. J. Comb. Theory
Ser. B 36(2), 161–188 (1984)
12. Demers, A.J., Greene, D.H., Hauser, C., Irish, W., Larson, J., Shenker, S., Sturgis,
H.E., Swinehart, D.C., Terry, D.B.: Epidemic algorithms for replicated database
maintenance. Oper. Syst. Rev. 22, 8–32 (1988)
13. Dobkin, D.P., Eppstein, D., Mitchell, D.P.: Computing the discrepancy with appli-
cations to supersampling patterns. ACM Trans. Graph. 15(4), 354–376 (1996)
14. Doerr, B.: Multi-color discrepancies. Dissertation, Christian-Albrechts-Universität
zu Kiel (2000)
15. Doerr, B.: Structured randomized rounding and coloring. In: Freivalds, R. (ed.)
FCT 2001. LNCS, vol. 2138, pp. 461–471. Springer, Heidelberg (2001). doi:10.
1007/3-540-44669-9 53
16. Doerr, B.: Generating randomized roundings with cardinality constraints and
derandomizations. In: Durand, B., Thomas, W. (eds.) STACS 2006. LNCS, vol.
3884, pp. 571–583. Springer, Heidelberg (2006). doi:10.1007/11672142 47
17. Doerr, B.: Randomly rounding rationals with cardinality constraints and deran-
domizations. In: Thomas, W., Weil, P. (eds.) STACS 2007. LNCS, vol. 4393, pp.
441–452. Springer, Heidelberg (2007). doi:10.1007/978-3-540-70918-3 38
18. Doerr, B., Fouz, M., Friedrich, T.: Social networks spread rumors in sublogarithmic
time. In: STOC, pp. 21–30. ACM (2011)
19. Doerr, B., Fouz, M., Friedrich, T.: Experimental analysis of rumor spreading in
social networks. In: MedAlg, pp. 159–173 (2012)
20. Doerr, B., Fouz, M., Friedrich, T.: Why rumors spread so quickly in social networks.
Commun. ACM 55, 70–75 (2012)
21. Doerr, B., Friedrich, T., Künnemann, M., Sauerwald, T.: Quasirandom rumor
spreading: an experimental analysis. ACM J. Exp. Algorithmics 16, Article 3.3 (2011)
22. Doerr, B., Gnewuch, M.: Construction of low-discrepancy point sets of small size
by bracketing covers and dependent randomized rounding. In: Keller, A., Heinrich,
S., Niederreiter, H. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2006, pp.
299–312. Springer, Heidelberg (2008)
23. Doerr, B.: Non-independent randomized rounding. In: SODA, pp. 506–507 (2003)
24. Doerr, B., Friedrich, T., Sauerwald, T.: Quasirandom rumor spreading. In: SODA,
pp. 773–781 (2008)
25. Doerr, B., Friedrich, T., Sauerwald, T.: Quasirandom rumor spreading: expanders,
push vs. pull, and robustness. In: ICALP, pp. 366–377 (2009)
26. Doerr, B., Gnewuch, M., Kritzer, P., Pillichshammer, F.: Component-by-
component construction of low-discrepancy point sets of small size. Monte Carlo
Meth. Appl. 14(2), 129–149 (2008)
27. Doerr, B., Gnewuch, M., Wahlström, M.: Implementation of a component-by-
component algorithm to generate small low-discrepancy samples. In: L’Ecuyer,
P., Owen, A.B. (eds.) Monte Carlo and Quasi-Monte Carlo Methods 2008, pp.
323–338. Springer, Heidelberg (2009)
28. Doerr, B., Gnewuch, M., Wahlström, M.: Algorithmic construction of low-
discrepancy point sets via dependent randomized rounding. J. Complex. 26(5),
490–507 (2010)
29. Doerr, B., Künnemann, M., Wahlström, M.: Randomized rounding for routing
and covering problems: experiments and improvements. In: Festa, P. (ed.) SEA
2010. LNCS, vol. 6049, pp. 190–201. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-13193-6 17
30. Doerr, B., Künnemann, M., Wahlström, M.: Dependent randomized rounding: the
bipartite case. In: ALENEX, pp. 96–106 (2011)
31. Doerr, B., Wahlström, M.: Randomized rounding in the presence of a cardinality
constraint. In: ALENEX, pp. 162–174 (2009)
Randomized Roundings with Dependencies 183
32. Erdős, P., Selfridge, J.L.: On a combinatorial game. J. Combinatorial Theory Ser.
A 14, 298–301 (1973)
33. Fleischer, L., Jain, K., Williamson, D.P.: Iterative rounding 2-approximation algo-
rithms for minimum-cost vertex connectivity problems. J. Comput. Syst. Sci.
72(5), 838–867 (2006)
34. Gabow, H.N., Manu, K.S.: Packing algorithms for arborescences (and spanning
trees) in capacitated graphs. Math. Program. 82, 83–109 (1998)
35. Gandhi, R., Khuller, S., Parthasarathy, S., Srinivasan, A.: Dependent rounding in
bipartite graphs. In: FOCS, pp. 323–332 (2002)
36. Gandhi, R., Khuller, S., Parthasarathy, S., Srinivasan, A.: Dependent rounding
and its applications to approximation algorithms. J. ACM 53, 324–360 (2006)
37. Giannopoulos, P., Knauer, C., Wahlström, M., Werner, D.: Hardness of discrepancy
computation and epsilon-net verification in high dimension. J. Complexity 28(2),
162–176 (2012)
38. Gnewuch, M., Wahlström, M., Winzen, C.: A new randomized algorithm to approx-
imate the star discrepancy based on threshold accepting. SIAM J. Numerical Anal.
50(2), 781–807 (2012)
39. Goemans, M.X., Williamson, D.P.: Improved approximation algorithms for max-
imum cut and satisfiability problems using semidefinite programming. J. ACM
42(6), 1115–1145 (1995)
40. Hromkovič, J.: Design and Analysis of Randomized Algorithms. Introduction to
Design Paradigms. Texts in Theoretical Computer Science An EATCS Series.
Springer, Berlin (2005)
41. Jain, K.: A factor 2 approximation algorithm for the generalized Steiner network
problem. Combinatorica 21(1), 39–60 (2001)
42. Moser, R.A., Tardos, G.: A constructive proof of the general Lovász local lemma.
J. ACM 57(2) (2010)
43. Niederreiter, H.: Random number generation and Quasi-Monte Carlo methods. In:
CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 63. Society
for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (1992)
44. Orlin, J.B.: Max flows in O(nm) time, or better. In: STOC, pp. 765–774 (2013)
45. Oxley, J.: Matroid Theory. Oxford Graduate Texts in Mathematics. OUP Oxford,
Oxford (2011)
46. Panconesi, A., Srinivasan, A.: Randomized distributed edge coloring via an exten-
sion of the Chernoff-Hoeffding bounds. SIAM J. Comput. 26, 350–368 (1997)
47. Raghavan, P.: Probabilistic construction of deterministic algorithms: approximat-
ing packing integer programs. J. Comput. Syst. Sci. 37, 130–143 (1988)
48. Raghavan, P., Thompson, C.D.: Randomized rounding: a technique for provably
good algorithms and algorithmic proofs. Combinatorica 7, 365–374 (1987)
49. Raghavendra, P.: Optimal algorithms and inapproximability results for every CSP?
In: STOC, pp. 245–254 (2008)
50. Raghavendra, P., Steurer, D.: How to round any CSP. In: FOCS, pp. 586–594
(2009)
51. Rothvoß, T.: The entropy rounding method in approximation algorithms. In:
SODA, pp. 356–372 (2012)
52. Saha, B., Srinivasan, A.: A new approximation technique for resource-allocation
problems. In: ICS, pp. 342–357 (2010)
53. Schrijver, A.: Combinatorial Optimization: Polyhedra and Efficiency. Algorithms
and Combinatorics, vol. 24. Springer, Heidelberg (2003)
54. Spencer, J.: Six standard deviations suffice. Trans. Amer. Math. Soc. 289, 679–706
(1985)
184 B. Doerr and M. Wahlström
55. Spencer, J.: Ten Lectures on the Probabilistic Method. SIAM, Philadelphia (1987)
56. Srinivasan, A.: Distributions on level-sets with applications to approximations algo-
rithms. In: FOCS, pp. 588–597 (2001)
57. Srivastav, A., Stangier, P.: Algorithmic Chernoff-Hoeffding inequalities in integer
programming. Random Struct. Algorithms 8, 27–58 (1996)
58. Szegedy, M.: The Lovász local lemma - a survey. In: CSR, pp. 1–11 (2013)
External-Memory State Space Search
Stefan Edelkamp
Abstract. Many state spaces are so big that even in compressed form
they fail to fit into main memory. As a result, during the execution of a
search algorithm, only a part of the state space can be processed in main
memory at a time; the remainder is stored on a disk.
In this paper we survey research efforts in external-memory search
for solving state space problems, where the state space is generated by
applying rules. We study different forms of expressiveness and the effect of guiding the search toward the goal. We consider outsourcing
the search to disk as well as its additional parallelization to many-core
processing units. We take the sliding-tile puzzle as a running example.
1 Introduction
A multitude of algorithmic tasks in a variety of application domains can be
formalized as a state space problem. A typical example is the sliding-tile puzzle –
in square arrangement called the (n^2 − 1)-puzzle (see Fig. 1). Numbered tiles in a
rectangular grid have to be moved to a designated goal location by successively
sliding tiles into the only empty square. The state space grows rapidly: the
8-puzzle has 181,440, the 15-puzzle 20,922,789,888,000/2 ≈ 10 trillion, and the
24-puzzle 15,511,210,043,330,985,984,000,000/2 ≈ 7.75 × 10^24 states.
More generally, a state space problem P = (S, A, s, T ) consists of a set of
states S, an initial state s ∈ S, a set of goal states T ⊆ S, and a finite set
of actions A where each a ∈ A transforms a state into another one. Usually,
a subset of actions A(u) ⊆ A is applicable in each state u. A solution π is an
ordered sequence of actions ai ∈ A, i ∈ {1, . . . , k} that transforms the initial
state s into one of the goal states t ∈ T , i.e., there exists a sequence of states
ui ∈ S, i ∈ {0, . . . , k}, with u0 = s, uk = t and ui is the outcome of applying ai
to ui−1 , i ∈ {1, . . . , k}. A cost (or weight) function w : A → IR≥0 induces the cost of a solution consisting of actions a1 , . . . , ak as Σ_{i=1}^{k} w(ai). In the usual
case of unit-cost domains, for all a ∈ A we have w(a) = 1. A solution is optimal
if it has minimum cost among all solutions.
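To make the definitions concrete, the following small Python sketch (purely illustrative, not code from the chapter; the function name and parameters are made up) checks that a candidate action sequence is a solution and returns its cost as the sum of the action weights.

def solution_cost(s, goal_states, w, plan):
    # plan is an ordered sequence of actions; each action is a function that
    # maps a state to its successor state.
    u = s
    for a in plan:
        u = a(u)
    if u not in goal_states:
        raise ValueError("the action sequence does not reach a goal state")
    return sum(w(a) for a in plan)

# Example with unit-cost actions on integer states.
inc = lambda x: x + 1
print(solution_cost(0, {3}, lambda a: 1, [inc, inc, inc]))   # prints 3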
A state space problem graph G = (V, E, s, T ) for the state space problem
P = (S, A, s, T ) is defined by V = S as the set of nodes, s ∈ S as the initial
node, T as the set of goal nodes, and E ⊆ V × V as the set of edges that connect
nodes to nodes with (u, v) ∈ E if and only if there exists an a ∈ A with a(u) = v.
Solving state space problems, however, is best characterized as a search in an
implicit graph. The difference is that not all edges have to be explicitly stored, but
are generated by a set of rules (such as in games). We have an initial node s ∈ V ,
a set of goal nodes determined by a predicate Goal : V → IB = {false, true}.
The basic operation is called node expansion (a.k.a. node exploration), which
means generation of all neighbors of a node u. The resulting nodes are called
successors (a.k.a. children) of u, and u is called a parent or predecessor. We will
write Succ(u) = {v ∈ S | ∃a ∈ A(u) : a(u) = v} for the successor set.
In more general state-space search models, applying one action no longer yields a unique successor of a state. For the non-deterministic case, we have Succ(u, a) = {v ∈ S | a ∈ A(u)}. For a Markov decision process (MDP) with probabilities p(v | u, a) we additionally impose Σ_{v ∈ Succ(u,a)} p(v | u, a) = 1.
All nodes have to be reached at least once on a path from the initial node
through successor generation. Consequently, we can divide the set of reached
nodes into the set of expanded nodes and the set of generated nodes that are not
yet expanded. In the AI literature the former set is often referred to as the Closed list, and the latter set as the Open list or the search frontier. The denotation
as a list refers to the legacy of the first implementation, namely as a simple
linked list. However, realizing them using the right data structures (e.g., a hash
table for duplicate elimination and a priority queue for best-first exploration) is
crucial for the search algorithm’s characteristics and performance.
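As an illustration of this point, the following Python sketch (illustrative only, not code from the chapter) realizes Open as a binary heap and Closed as a hash set, the data-structure combination recommended above for best-first exploration with duplicate elimination.

import heapq
from itertools import count

def best_first_search(s, succ, is_goal, f):
    # Open: priority queue of (priority, tie-breaker, state);
    # Closed: hash set of already expanded states for duplicate elimination.
    tie = count()
    open_list = [(f(s), next(tie), s)]
    closed = set()
    while open_list:
        _, _, u = heapq.heappop(open_list)
        if u in closed:
            continue                       # a stale duplicate entry
        if is_goal(u):
            return u
        closed.add(u)                      # node expansion
        for v in succ(u):                  # node generation
            if v not in closed:
                heapq.heappush(open_list, (f(v), next(tie), v))
    return None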
Refined algorithms have led to first optimal solutions for challenging com-
binatorial games. Besides computation time, space is a crucial computational
resource. For the Rubik’s Cube with 43,252,003,274,489,856,000 states the exact
diameter is 20 moves [73]. The computation for the lower bound took 35 CPU
years on several computers using (pattern) databases [62]. Rokicki et al. [83] par-
titioned the states into 2,217,093,120 sets of 19,508,428,800 states each, and reduced the number of sets that had to be solved to 55,882,296 using symmetry and set covering. Solutions of length at most 20 were found for every set with a program that solved
a single set in about 20 s. The Towers-of-Hanoi problem (with 4 pegs and 30
disks) spawns a space of 1,152,921,504,606,846,976 states and was solved in 17
days using 400 GB of disk space [67]. To show that Checkers is a draw (assuming optimal play) [86,87], endgame databases of up to 10 pieces were built, for any combination of kings and checkers. The databases cover 39 trillion states. Computing the proof for a particular opening took about one month on an average of 7 processors, with a longest line of 67 moves.
The standard problem for Connect 4 has 4,531,985,219,092 reachable states [31].
It is won by the first player [4,5]. Most other states have been classified via an
external-memory hybrid of explicit-state and symbolic retrograde analysis [33].
Current domain-independent action planning systems solve Blocksworld
problems with 50 blocks and more, and produce close-to-cost-optimal plans in Logistics with hundreds of steps [24,49,50,52,82]. For planning with numbers, potentially infinite search spaces have to be explored [48,52]. With external-memory search, optimal plans can be obtained in some cases [54].
External-memory search algorithms have also helped in finding bugs in software [8,26,34,35,54,74,92]. Different model checkers have been externalized and
enhanced by directing the search toward system errors. Search heuristics accel-
erate symbolic model checkers for analyzing hardware, on-the-fly verifiers for
analyzing compiled software units, and industrial tools for exploring real-time
domains and finding resource-optimal schedules. Given a large and dynamically
changing state vector, external-memory and parallel exploration scaled best.
A sweep-line approach scans the search space according to a given partial order [71]. The work of [55] implements a model-checking algorithm on top of external-memory A*, [56] provides a distributed implementation of the algorithm of [55] for model checking safety properties, and [27] extends the approach to general (LTL) proper-
ties. Iterative broadening has been suggested in the context of model checking
real-time domains by [26], and some recent algorithms include perfect hash func-
tions [12,13] in what has been denoted as semi-external-memory search [35,36].
External-memory search is also among the best-known methods for opti-
mally solving multiple sequence alignment problems [69,88,96]. The graphs for some challenging problems required days of CPU time to be explored [30].
Monte-Carlo tree search [17,60,84] is effective especially for post-hoc optimiza-
tion [42].
The text starts by introducing external-memory search algorithms (Sect. 2) and continues with engineering the delayed detection (and elimination) of duplicates (Sect. 3). It then turns to pattern databases (Sect. 4), before
addressing more general state space formalisms (Sect. 5), as well as paralleliza-
tion options on CPUs (Sect. 6) and GPUs (Sect. 7). The work refers to prior pub-
lications of the author. E.g., Sects. 2.1 and 2.5 contain content from [29], Sect. 5
refers to [28], Sects. 6.1 and 6.3 are based on [56], Sect. 6.4 is based on [32], and
Sect. 7 contains content from [37,40].
2 External-Memory Search
The commonly used model for comparing the performance of external algorithms
consists of a single processor, a small internal memory that can hold up to M
data items, and an unlimited secondary memory. The size of the input problem
(in terms of the number of records) is abbreviated by N . Moreover, the block
size B governs the bandwidth of memory transfers. It is usually assumed that
at the beginning of the algorithm, the input data is stored in contiguous blocks
on external memory, and the same must hold for the output. Only the number of block reads and writes is counted; computations in internal memory do not incur any cost (see Fig. 2).
Fig. 2. The external-memory model: a CPU with an internal memory of size M and an external memory, transferring blocks of size B.
the buffer that is flushed to disk. The outcome of this phase is k pre-sorted files. Note that duplicate elimination can be improved by using hash tables for the blocks before they are flushed to disk. Since the node set in the hash table has to be stored anyway, the savings by early duplicate detection are rather small.
In the next step, external-memory (multi-way) merging is applied to unify
the files into Open(i) by a simultaneous scan. The size of the output files is
chosen such that a single pass suffices. Duplicates are eliminated (for ease of notation the files are not renamed into Closed, even though this would be semantically more insightful). Since the files were pre-sorted, the complexity is given by the scanning
time of all files. One also has to eliminate Open(i − 1) and Open(i − 2) from
Open(i) to avoid re-computations; that is, nodes extracted from the disk-based
queue are not immediately deleted, but kept until the layer has been completely
generated and sorted, at which point duplicates can be eliminated using a parallel
scan. The process is repeated until Open(i − 1) becomes empty, or the goal has
been found. The algorithm applies O(sort(|Succ(Open(i − 1))|) + scan(|Open(i − 1)| + |Open(i − 2)|)) I/Os. By Σ_i |Succ(Open(i))| = O(|E|) and Σ_i |Open(i)| = O(|V |), the total execution time is O(sort(|E|) + scan(|V |)) I/Os.
In search problems with bounded branching factor we have |E| = O(|V |), and
thus the complexity for external-memory BFS reduces to O(sort(|V |)) I/Os. If we
keep each Open(i) in a separate file for sparse problem graphs (e.g. simple chains)
file opening and closing would accumulate to O(|V |) I/Os. The solution for this
case is to store the nodes in Open(i), Open(i + 1), and so forth consecutively in
internal memory. Therefore, I/O is needed, only if a level has at most B nodes.
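A minimal in-memory sketch of this sorting-based delayed duplicate detection follows (Python; illustrative only: sorted lists stand in for the pre-sorted files on disk, and heapq.merge for the multi-way merge).

import heapq

def merge_and_subtract(presorted_buffers, prev_layers):
    # presorted_buffers: the individually sorted successor buffers of layer i
    # prev_layers: the duplicate-free layers Open(i-1) and Open(i-2) to subtract
    forbidden = set().union(*map(set, prev_layers)) if prev_layers else set()
    result, last = [], object()
    for v in heapq.merge(*presorted_buffers):   # simultaneous scan of all buffers
        if v != last and v not in forbidden:    # duplicate elimination and subtraction
            result.append(v)
        last = v
    return result

# Two pre-sorted successor buffers, subtracting the two previous layers.
print(merge_and_subtract([[2, 4, 7], [2, 5, 7, 9]], [[1, 4], [0, 2]]))   # prints [5, 7, 9]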
Let s be the initial node, and Succ be the successor generation function.
The algorithm extends to integer weighted graphs G = (V, E, w) with bounded
locality locG = max{δ(s, u) − δ(s, v) + w(u, v) | u ∈ S, v ∈ Succ(u)}, where
δ(s, u) is the shortest path distance from s to u. The locality determines the
thickness of the search frontier needed to prevent duplicates in the search.
In external-memory search the exploration fully resides on disk. As pointers
are not available solutions are reconstructed by saving the predecessor together
with every state, by scanning with decreasing depth the stored files, and by
looking for matching predecessors. Any reached node that is a predecessor of
the current node is its predecessor on an optimal solution path. This results in an I/O complexity of O(scan(|V |)). Even though it is conceptually simpler, there is no need to store the search frontiers Open(i), i ∈ {0, 1, . . . , k}, in different files.
By completely enumerating the state space with external-memory BFS, it has been shown that any instance of the 15-puzzle requires at most 80 steps [68]. The result has been validated in [80] on a distributed-memory system with 32 nodes (128 CPUs) in 66 h.
2.4 External-Memory A*
In the following, we study how to extend external-memory BFS to A* [47]. The
main advantage of A* with respect to BFS is that, due to the use of a lower bound
on the goal distance, it often traverses a much smaller part of the search space
to establish an optimal solution. Since A* only changes the traversal ordering,
it is advantageous over BFS only if both algorithms terminate at a goal node.
In A*, the cost for node u is f (u) = g(u) + h(u), with g being the cost of the
path from the initial node to u and h(u) being the estimate of the remaining costs
from u to the goal. In each step, a node u with minimum f -value is removed
Fig. 3. Schematic view of enforced hill climbing, incrementally queuing down to better
goal distance values, restarting each time the exit of a plateau is reached (left). Typical
memory profile in external-memory enforced hill climbing of a particular benchmark
planning problem (right): the x-axis provides an index for the concatenation of all
the BFS-layers encountered during the search, while the y-axis denotes the number of
states stored and expanded (height of bars), for the according index (on log scale).
from Open, and the new value f (v) of a successor v of u is updated to the
minimum of its current value and f (v) = g(v) + h(v) = g(u) + w(u, v) + h(v) =
f (u) + w(u, v) − h(u) + h(v); in this case, it is inserted into Open itself.
In our algorithm, we first assume a consistent heuristic, where for all u and
v we have w(u, v) ≥ h(u) − h(v), and a uniformly weighted undirected prob-
lem graph. These conditions are often met in practice, since many problem
graphs in single-agent search, e.g., in Rubik’s cube and sliding-tile puzzles are
uniformly weighted and undirected and many heuristics, e.g., pattern database
estimates [66] are consistent. Under these assumptions, we have h(u) ≤ h(v) + 1
for every node u and every successor v of u. Since the problem graph is undirected
this implies |h(u) − h(v)| ≤ 1 and h(v) − h(u) ∈ {−1, 0, 1}. If the heuristic is
consistent, then on each search path, the evaluation function f is non-decreasing.
No successor will have a smaller f -value than the current one. Therefore, A*,
which traverses the node set in f -order, expands each node at most once.
In the (n2 − 1)-puzzle, for example, the Manhattan distance is defined as the
sum of the horizontal and vertical differences between actual and goal configu-
rations, for all tiles. The heuristic is consistent, since for two successive nodes u
and v the difference of the corresponding heuristic estimates h(v) − h(u) is either −1 or 1. The f -values of a node u and of a successor node v are either the same or f (v) = f (u) + 2.
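For concreteness, a small Python sketch of the Manhattan distance for the (n^2 − 1)-puzzle follows (illustrative, not code from the chapter); states are assumed to be tuples in row-major order with 0 encoding the blank.

def manhattan(state, goal, n):
    # Sum of horizontal and vertical displacements over all tiles; the blank (0)
    # is not counted, which keeps the estimate admissible and consistent.
    goal_pos = {tile: i for i, tile in enumerate(goal)}
    d = 0
    for i, tile in enumerate(state):
        if tile == 0:
            continue
        j = goal_pos[tile]
        d += abs(i // n - j // n) + abs(i % n - j % n)
    return d

# Example: one move away from the goal of the 8-puzzle.
goal = (1, 2, 3, 4, 5, 6, 7, 8, 0)
print(manhattan((1, 2, 3, 4, 5, 6, 7, 0, 8), goal, 3))   # prints 1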
As above, external-memory A* [29] maintains the search frontier on disk,
possibly partitioned into main-memory-sized sequences. In fact, the disk files
correspond to a bucket implementation of a priority queue data structure. In
the course of the algorithm, each bucket addressed with index i contains all
nodes u in the set Open that have priority f (u) = i. A disk-based representation
of this data structure will store each bucket in a different file [64].
(Figure: memory profile of External-A*, plotting bytes per bucket over the g-value and the h-value.)
one sorted file. This file can then be scanned to remove the duplicate nodes from
it. In fact, both the merging and the duplicate removal can be done simultaneously.
Another special case of duplicate nodes arises when nodes that have already been evaluated in upper layers are generated again (see Fig. 5). These
duplicate nodes have to be removed by a file subtraction process for the next
active bucket Open(gmin + 1, hmax − 1) by removing any node that has appeared
in Open(gmin , hmax − 1) and Open(gmin − 1, hmax − 1) (Buckets shaded in light
gray). This file subtraction can be done by a mere parallel scan of the pre-sorted
files and by using a temporary file in which the intermediate result is stored.
It suffices to remove duplicates only in the bucket that is expanded next, i.e.,
Open(gmin + 1, hmax − 1).
When merging the pre-sorted sets with the previously existing Open buckets
(both residing on disk), duplicates are eliminated, leaving the sets Open(gmin +
1, hmax − 1), Open(gmin + 1, hmax ) and Open(gmin + 1, hmax + 1) duplicate-free.
Then the next active bucket Open(gmin +1, hmax −1) is refined not to contain any
node in Open(gmin − 1, hmax − 1) or Open(gmin , hmax − 1). This can be achieved
through a parallel scan of the pre-sorted files and by using a temporary file
in which the intermediate result is stored, before Open(gmin + 1, hmax − 1) is
updated. It suffices to perform file subtraction lazily only for the bucket that is
194 S. Edelkamp
expanded next. Since external-memory A* only modifies the order of states with
the same f -value, completeness and optimality are inherited from internal A*.
By simulating internal A*, DDD ensures that each edge in the problem graph is considered at most once, so that O(sort(|Succ(Open(gmin + 1, hmax − 1))|)) I/Os are needed to eliminate duplicates in the successor lists. Since each node is expanded
at most once, this adds O(sort(|E|)) I/Os to the overall run time. Filtering,
evaluating nodes, and merging lists can be done in scanning time of all buckets under consideration. During the exploration, each bucket Open will be referred to
at most six times, once for expansion, at most three times as a successor bucket
and at most two times for duplicate elimination as a predecessor of the same
h-value as the currently active bucket. Therefore, evaluating, merging and file
subtraction add O(scan(|V |) + scan(|E|)) I/Os to the overall run time.
If |E| = O(|V |) the complexity reduces to O(sort(|V |)) I/Os. It is not difficult
to generalize the result to directed graphs with bounded locality, since in this
case subtraction amounts to O(locG · scan(|V |)) = O(scan(|V |)) I/Os.
By setting the weight of all edges (u, v) to h(v) − h(u) + 1 for a consistent heuristic h, A* can be cast as a variant of Dijkstra's algorithm. To reconstruct
a solution path, we store predecessor information with each node on disk (thus
doubling the state vector size), and apply backward chaining, starting with the
target node. However, this is not strictly necessary: For a node in depth g, we
intersect the set of possible predecessors with the buckets of depth g − 1. Any
node that is in the intersection is reachable on an optimal solution path, so that
we can iterate the construction process. Time is bounded by O(scan(|V |)) I/Os.
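The following Python sketch mirrors this reconstruction idea (illustrative only; sets stand in for the layer files on disk, and predecessors is an assumed helper enumerating the possible predecessors of a state).

def reconstruct(goal, layers, predecessors):
    # layers[g] is the set of states stored for depth g; the goal lies in the last layer.
    path, current = [goal], goal
    for g in range(len(layers) - 1, 0, -1):
        # any stored state in layer g-1 that can generate the current state
        # lies on an optimal solution path
        current = next(iter(predecessors(current) & layers[g - 1]))
        path.append(current)
    path.reverse()
    return path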
Let us consider how to externally solve 15-puzzle problem instances that can-
not be solved internally with A* and the Manhattan distance estimate. Internal
sorting is implemented by applying Quicksort [51]. Multi-way external-memory
merging maintains file pointers for every flushed buffer and joins them into a
single sorted file. Internally, a heap is used (its engineered implementation is
crucial for the efficiency of the sorting). Duplicate removal and bucket subtrac-
tion are performed on single passes through the bucket file. Table 1 illustrates
the impact of duplicate removal (dr) and bucket subtraction (sub) on the num-
ber of generated states for problem instances of increasing complexity. In some
cases, the experiment is terminated because of the limited hard disk capacity.
One interesting feature of our approach from a practical point of view is the
ability to pause and resume the program execution in large problem instances.
This is desirable, e.g. in the case when the limits of secondary storage are reached,
as one can resume the execution with more disk space. External sorting can be
avoided to some extent, by a single or a selection of hash functions that splits
larger files into smaller pieces until they fit into main memory. As with the h-
value in the above case a node and its duplicate will have the same hash address.
While external-memory A* requires a constant amount of memory for the internal read and write buffers, iterative-deepening A* (IDA*) [61], which applies depth-first bounded searches with an increasing solution cost threshold, requires very little memory, scaling linearly with the search depth. External-memory A* removes all duplicates from the search, but requires slow disk accesses to succeed. Moreover, in search practice disk space is limited, too. Therefore, one
3 Duplicate Detection
Fig. 6. Example for structured duplicate detection; problem instance (left) is mapped
to one node in the abstract graph (right). For expanding all states mapped to an
abstract node, for the elimination of duplicates only states stored in the abstract suc-
cessors nodes need to be loaded in main memory.
bmax = max_{v ∈ φ(S)} |Succ(v)| in the abstract state space φ(S). If there are differ-
ent abstractions to choose from, we take those that have the smallest ratio of
maximum node branching factor bmax and abstract state space size |φ(S)|. The
idea is that smaller abstract state space sizes should be preferred but usually
lead to larger branching factors.
In the example of the 15-puzzle (see Fig. 6), the projection is based on nodes
that have the same blank position. This state-space abstraction also preserves
the additional property that the successor set and the expansion sets are disjoint,
yielding no self-loops in the abstract problem graph. The duplicate scope defines
the successor buckets that have to be read into main memory.
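A small Python sketch of this blank-position abstraction follows (illustrative only; states are tuples in row-major order with 0 for the blank). It shows both the projection φ and the duplicate scope, i.e., the abstract successor buckets that must be kept in main memory.

def phi(state, n):
    # abstraction: project a puzzle state to the position of the blank
    return state.index(0)

def duplicate_scope(blank, n):
    # abstract successors of an abstract node: all positions reachable by one blank move
    r, c = divmod(blank, n)
    neighbors = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [rr * n + cc for rr, cc in neighbors if 0 <= rr < n and 0 <= cc < n]

# Example for the 15-puzzle (n = 4): blank in the top-left corner.
print(duplicate_scope(0, 4))   # prints [4, 1]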
The method is crucially dependent on the availability and selection of suit-
able abstraction functions φ that adapt to the internal memory constraints. In
contrast, DDD does not rely on any partitioning beside the heuristic function
and it does not require the duplicate scope to fit in main memory. A time-space
trade-off refinement called edge partitioning [97] generates successors only along
one edge at a time.
SDD is compatible with ordinary and hash-based duplicate detection: in case the files that have to be loaded into main memory no longer fit, we have to delay duplicate detection. However, the structured partitioning may have truncated the file sizes
for duplicate detection to a manageable number. Each heuristic or hash function
defines a partitioning of the search space but not all partitions provide a good
locality with respect to the successor or predecessor states.
To keep the pattern database partitioned, we assume that the number of files
that can be opened simultaneously does not exceed Δ = max{h(v) − h(u) + 1 | u ∈ S, v ∈ Succ(u)}, i.e., Δ matches the locality of the abstract state space graph.
If a heuristic estimate is needed as soon as a node is generated, an appro-
priate choice for creating external-memory pattern databases is a backwards
BFS with SDD, as SDD already provides locality with respect to a state space
abstraction function. After the construction, patterns are arranged according to pattern blocks, one for each abstract state. When a concrete heuristic search algorithm expands nodes, it must check whether the patterns from the pattern-lookup scope are in main memory, and, if not, it reads them from disk. Pattern blocks that do not belong to the current pattern-lookup scope are removed. When its part of internal memory is full, the search algorithm must decide which pattern block to remove, e.g., by adopting a least-recently-used strategy.
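The eviction policy just mentioned can be sketched in a few lines of Python (illustrative only; the class name is made up and load_block stands in for reading one pattern block from disk).

from collections import OrderedDict

class PatternBlockCache:
    def __init__(self, capacity, load_block):
        self.capacity = capacity        # number of pattern blocks kept in main memory
        self.load_block = load_block    # reads the block of one abstract state from disk
        self.blocks = OrderedDict()     # access order tracks recency

    def heuristic(self, abstract_state, pattern):
        if abstract_state not in self.blocks:
            if len(self.blocks) >= self.capacity:
                self.blocks.popitem(last=False)        # evict the least-recently-used block
            self.blocks[abstract_state] = self.load_block(abstract_state)
        self.blocks.move_to_end(abstract_state)        # mark block as recently used
        return self.blocks[abstract_state][pattern]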
Larger pattern databases provide better bounds and thus allow more guid-
ance in the search. For the 15-puzzle, a 10-tile 28 GB pattern database has
been built [81], while [23] computed 9-9-6, 9-8-7, and 8-8-8 pattern database sets
for the 24-puzzle that are up to three orders of magnitude larger (up to 1.4 TB)
than the standard 6-6-6-6 pattern database set. This was possible by perform-
ing a parallel breadth-first search in the compressed pattern space. Experiments
indicate an average 8-fold improvement of the 9-9-6 set over the 6-6-6-6 set. Com-
bining several large pattern databases yielded on average a 13-fold improvement.
A massive parallel search based on the map-and-reduce paradigm [21] using these
databases was proposed by [89].
If we consider the example of the 35-puzzle with x tiles in the pattern, the
abstract state space consists of 36!/(36 − x)! states. A perfect hash-table for the
35-puzzle has space requirements of 43.14 MB (x = 5), 1.3 GB (x = 6), and 39.1
GB (x = 7). The latter has successfully been constructed on disk by [32].
The algorithm exits if an error bound on the policy evaluation falls below a user-supplied threshold ε, or if a maximum number of iterations has been executed. If the optimal cost f ∗ is known for each state, the optimal policy can
be easily extracted by choosing an operation according to a single application
of the Bellman equation. The procedure takes a heuristic h for initializing the
value function as an additional parameter.
The error bound on the value function is also called the residual, and can for example be computed as max_{u∈S} |ft(u) − ft−1(u)|. A residual of zero denotes that the process has converged. An advantage of other methods like policy iteration is that they converge to the exact optimum, while value iteration usually only reaches an approximation. On the other hand, the latter technique is usually more efficient on large state spaces.
For implicit search graphs, value iteration proceeds in two phases. In the first phase, the whole state space is generated from the initial state s. In this process, an entry in a hash table (or vector) is allocated in order to store the f -value for each state u; this value is initialized to the cost of u if u ∈ T , or to a given (not necessarily admissible) heuristic estimate (or zero if no estimate is available) if u is non-terminal. In the second phase, iterative scans of the state space are performed, updating the values of non-terminal states u according to the Bellman equation.
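An internal-memory sketch of this value iteration scheme in Python is given below (illustrative only; the chapter's point is precisely that the state space is streamed from disk instead, but the update rule and the residual-based termination are the same). The parameter names are assumptions: actions(u) enumerates A(u), succ_prob(u, a) yields (v, p) pairs, w(u, a) is the action cost, and terminal states keep their given cost.

def value_iteration(states, actions, succ_prob, w, terminal_cost, h, eps=1e-4, tmax=10000):
    # initialize f with the terminal costs and the heuristic estimate h
    f = {u: terminal_cost.get(u, h(u)) for u in states}
    for _ in range(tmax):
        g, residual = dict(f), 0.0
        for u in states:
            if u in terminal_cost:
                continue                       # terminal values stay fixed
            g[u] = min(w(u, a) + sum(p * f[v] for v, p in succ_prob(u, a))
                       for a in actions(u))
            residual = max(residual, abs(g[u] - f[u]))
        f = g
        if residual < eps:                     # residual-based termination
            break
    return f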
Backward Phase: Update of Values. This is the most critical part of the approach
and deserves more attention. To perform the update on the value of state v, we have to bring together the values of its successor states. As predecessors and successors are contained in one file, and there is no arrangement that can bring all successor states close to their predecessor states, we make a copy of the entire graph (file) and deal with the current state and its successors differently. To establish the
adjacencies, the second copy, called Temp, is sorted with respect to the node u. Remember that Open is sorted with respect to the node v. A parallel scan of the files Open and Temp gives us access to all the successors
and values needed to perform the update on the value of v. This scenario is
shown in Fig. 8 for the graph in the example. The contents of Temp and Opent ,
for t = 0, are shown along with the heuristic values computed so far for each
edge (u, v). The arrows show the flow of information (alternation between dotted
and dashed arrows is just for clarity). The results of the updates are written to
the file Opent+1 containing the new values for each state after t + 1 iterations.
Once Opent+1 is computed, the file Opent can be removed as it is no longer
needed.
(Figure: an example graph with initial node 1 and terminal states 8 and 10; the files Temp (sorted on the predecessor u) and Open0, Open1 (sorted on the state v) list the edges (u, v) together with their current h-values.)
Fig. 8. An example graph with initial f -values and one backward phase in external-
memory value iteration. A parallel scan of Open0 and T emp is done from left to right.
The file Open1 is the result of the first update; f -values that changed in the first update
are shown with bold underline typeface.
The backward update algorithm first copies the Opent list to Temp using buffered I/O operations, and sorts the new Temp list according to the predecessor states u. The algorithm then iterates on all edges from Opent and searches for the successors in Temp. Since Opent is sorted with respect to states v, the algorithm never goes back and forth in any of the Opent or Temp files. Note
that all reads and writes are buffered and thus can be carried out very efficiently
by always doing I/O operations in blocks. Four different cases arise when an edge
(u, v, a, f (v)) is read from Opent . (States from Fig. 8 are referred in parentheses.)
– Case I: v is terminal (states 8 and 10). Since no update is necessary, the edge can be written to Opent+1.
– Case II: v is the same as the last updated state (state 3). Write the edge to Opent+1 with that last value. (Case shown in Fig. 8 with curved arrows.)
– Case III: v has no successors. That means that v is a terminal state and so is handled by Case I.
– Case IV: v has one or more successors (remaining states). For each action a ∈ A(v), compute the value q(a, v) by summing the products of the probabilities and the stored values. This value is kept in the array entry q(a).
For edges (x, y, a′, f′) read from Temp, we have
– Case A: y is the initial state, implying x = ∅. Skip this edge since there is nothing to do. By taking ∅ as the smallest element, the sorting of Temp brings all such edges to the front of the file. (Case not shown.)
– Case B: x = v, i.e. the predecessor of this edge matches the current state from
Opent . This calls for an update in the q(a)-value.
The array q : A → IR is initialized to the edge weight w(a, v), for each a ∈ A(v).
Once all the successors are processed, the new value for v is the minimum of the
values stored in the q-array for all applicable actions.
The backward phase performs at most tmax iterations. Each iteration consists
of one sorting and two scanning operations for a total of O(tmax ·sort(|E|)) I/Os.
For the sliding-tile puzzles we performed two experiments: one with determin-
istic moves, and the other with noisy actions that achieve their intended effects
with probability p = 0.9 and no effect with probability 1 − p. Table 4 shows the
results for random instances of the 8-puzzle for both experiments. The rectangular 3 × 4 sliding-tile puzzle with p = 0.9 cannot be solved with internal value iteration because the state space does not fit in RAM. External-memory value iteration generated a total of 1,357,171,197 edges taking 45 GB of disk space. The backward update finished successfully after 21 days in 72 iterations using 1.4 GB of RAM. The value function for the initial state converged to 28.8889 with a residual smaller than ε = 10^-4.
Fig. 9. Externally stored state space with parent and children files.
Procedure Parallel-External-Memory-BFS
Input: Undirected problem graph with start node s, number of processes N, hash function ψ
Output: Partitioned BFS layers Openj (i), i ∈ {0, 1, . . . , k}, j ∈ {0, 1, . . . , N }
Algorithm 1.2. Parallel external-memory breadth-first search for state space enu-
meration.
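To illustrate the idea behind Algorithm 1.2, the following Python sketch simulates it in memory: the hash function ψ decides which of the N partitions owns a state, and plain sets stand in for the per-process files (an illustration under these simplifications, not the chapter's implementation).

def parallel_external_bfs(s, succ, psi, N, max_layers=100):
    owner = lambda u: psi(u) % N
    seen = [set() for _ in range(N)]        # per-partition duplicate stores ("files")
    frontier = [set() for _ in range(N)]
    frontier[owner(s)].add(s)
    seen[owner(s)].add(s)
    layers = [[sorted(p) for p in frontier]]
    while any(frontier) and len(layers) < max_layers:
        nxt = [set() for _ in range(N)]
        for part in frontier:               # each partition belongs to one process
            for u in part:
                for v in succ(u):
                    j = owner(v)
                    if v not in seen[j]:    # duplicate detection stays local to partition j
                        nxt[j].add(v)
                        seen[j].add(v)
        frontier = nxt
        if any(frontier):
            layers.append([sorted(p) for p in frontier])
    return layers

# BFS layers of a 6-cycle, split over N = 2 partitions.
print(parallel_external_bfs(0, lambda u: [(u + 1) % 6, (u - 1) % 6], hash, 2))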
Fig. 10. Example for parallel SDD with 4 processes: before P1 releases its work, after
P1 has released its work, after P1 has allocated new work.
process) for more memory that might have been released using a completed
exploration by some other process. For a proper (conflict-free) distribution of work, numbers I(φ(u)) were assigned to each abstract node φ(u), denoting the accumulated influence currently imposed on this node by the running processes. If I(φ(u)) = 0, the abstract node φ(u) can be picked for expansion by any processor that is currently idle. The function I is updated as follows. In a first step, for all φ(v) ≠ φ(u) with φ(u) ∈ Succ(φ(v)), the value I(φ(v)) is incremented by one: all abstract nodes that include φ(u) in their scope cannot be expanded, since φ(u) is chosen for expansion. In a second step, for all φ(v) ≠ φ(u) with φ(v) ∈ Succ(φ(u)) and all φ(w) ≠ φ(v) with φ(w) ∈ Succ(φ(v)), the value I(φ(v)) is incremented by one: all abstract nodes that include any φ(v) as a successor of φ(u) cannot be expanded, since they are also assigned to the processor.
Figure 10 illustrates the working of parallel structured duplicate detection for the 15-puzzle with the currently expanded abstract nodes shaded. The leftmost part of the figure shows the abstract problem graph together with 4 processes working independently at expanding abstract states. The numbers I(φ(u)) are associated with each abstract node φ(u). The middle part of the figure depicts the situation after one process has finished, and the right part shows the situation after this process has been assigned to a new abstract state.
join and leave the exploration, the size of the state space partition does not
necessarily have to match the number of processes. By utilizing a queue, one
also may expect a process to access a bucket multiple times. However, for the
ease of a first understanding, it is simpler to assume that the jobs are distrib-
uted uniformly among the processes.) For improving the efficiency, we assume
a distributed environment with one master and several slave processes. In the
implementation, the master is in fact an ordinary process defined as the one that
finalized the work for a bucket. The applies to both the cases when each slave
has its own hard disk or if they work together on one hard disk e.g. residing on
the master. We do not expect all processes to run on one machine, but allow
slaves to log-on the master machine, suitable for workstation clusters. Message
passing between the master and slave processes is purely done on files, so that
all processes are fully autonomous. Even if slave processes are killed, their work
can be re-done by any other idle process that is available.
One file, which we call the expand-queue, contains all current requests for exploring a node set that is contained in a file. The filename consists of the current g- and h-value. In case of larger files, file pointers for processing parts of a file are provided, to allow for better load balancing. There are different strategies to split a file into equidistant parts or into chunks depending on the number and performance of logged-on slaves. As we want to keep the exploration process distributed, we split the file pointer windows into equidistant parts of a fixed number of C bytes for the nodes to be expanded. For improved I/O, the number
C is supposed to divide the system’s block size B. As concurrent read operations
are allowed for most operating systems, multiple processes reading the same file
impose no concurrency conflicts.
The expand-queue is generated by the master process and is initialized with
the first block to be expanded. Additionally, we maintain the total number of
requests, i.e., the size of the queue, and the current number of satisfied requests.
Any logged-on slave reads a request and increases the count once it finishes.
During the expansion process, in a subdirectory indexed by the slave’s name it
generates files that are indexed by the g- and h-value of the successor nodes.
The other queue is the refine-queue, also generated by the master process once all processes are done. It is organized in a similar fashion as the expand-queue and allows slaves to request work. The refine-queue contains filenames that have been generated above, namely the slave name (which does not have to match the one of the current process), the block number, and the g- and h-value. For suitable processing, the master process moves the files from subdirectories indexed by the slave's name to ones that are indexed by the block number. As this is a sequential operation executed by the master thread, changing the file locations is fast in practice. To avoid redundant work, each process eliminates its requests from the queue. Moreover, after finishing the job, it writes an acknowledgment to an associated file, so that each process can
access the current status of the exploration, and determine if a bucket has been
completely explored or sorted.
All communication between different processes can be realized via shared files, so that a message passing unit is not required. However, a mechanism for mutual exclusion is necessary. A rather simple but efficient method to avoid concurrent write accesses is the following. Whenever a process has to write to a shared file, it
issues an operating system command to rename the file. If the command fails,
it implies that the file is currently being used by another process.
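This rename trick can be expressed in a few lines of Python (illustrative only; the file names are placeholders). os.rename either succeeds atomically or raises an error when another process has already renamed the file.

import os

def try_acquire(shared_name, private_name):
    # Returns True if this process obtained exclusive access to the shared file.
    try:
        os.rename(shared_name, private_name)
        return True
    except OSError:
        return False       # some other process is currently using the file

def release(private_name, shared_name):
    os.rename(private_name, shared_name)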
For each bucket that is under consideration, we establish four stages in the
algorithm with a pseudo-code shown in Algorithm 1.3. The four phases are visu-
alized in Fig. 12 (top to bottom). Zig-zag curves illustrate the order of the nodes
in the files wrt. the comparison function used. As the states are presorted in internal memory, every peak corresponds to a flushed buffer. The sorting criterion itself is defined first by the node's hash key and then by the low-level comparison based on the (compressed) state vector.
– In the exploration stage (generating the first row in the figure), each process p flushes the successors with a particular g- and h-value to its own file (g, h, p). Each process has its own hash table and eliminates some duplicates already in main memory. The hash table is based on chaining, with chains sorted along the node comparison function. However, if the output buffer exceeds memory capacity, it writes the entire hash table to disk. By the use of the sorting criterion as given above, this can be done using a mere scan of the hash table.
– In the first sorting stage (generating the second row in the figure), each process
sorts its own file. In the distributed setting we exploit the advantage that the
files can be sorted in parallel, reducing internal processing time. Moreover, the
number of file pointers needed is restricted by the number of flushed buffers,
illustrated by the number of peaks in the figure. Based on this restriction, we
only need a merge of different sorted buffers.
– In the distribution stage (generating the third row in the figure), all nodes in
the presorted files are distributed according to the hash value’s range. As all
input files are presorted, this is a mere scan. No all-inclusive file is generated,
keeping the individual file sizes small. This stage can be a bottleneck to the
parallel execution, as processes have to wait until the distribution stage is
completed. However, if we expect the files to reside on different hard drives,
traffic for file copying can be parallelized.
– In the second sorting stage (generating the last row in the figure), processes
resort the files (with buffers presorted wrt. the hash value’s range), to find
further duplicates. The number of peaks in each individual file is limited by
the number of input files (=number of processes), and the number of output
files is determined by the selected partitioning of the hash index range. Using
the hash index as the sorting key we establish that the concatenation of files
is sorted.
Procedure Parallel-External-Memory-A*
Input: Undirected problem graph with start node s, predicate Goal, N processes, hash function ψ
Output: Optimal solution path
one tile i for i ∈ {1, . . . , 35}. All client processes operate individually on different
processing nodes and communicate via shared files.
During the expansion of a bucket (see Fig. 14), the master writes a file Ti
for each client process Pi , i ∈ {1, . . . , 35}. Once it has finished the expansion
of a bucket, the master Pm announces that each Pi should start evaluating Ti .
Additionally, the client is informed on the current g- and h-value. After that,
the master Pm is suspended, and waits for all Pi ’s to complete their task. To
relieve the master from load, no sorting takes place during distribution. Next,
the client processes start evaluating Ti , putting their results into Ei (h − 1) or
Ei (h + 1), depending on the observed difference in the h-values. All files Ei are
additionally sorted to eliminate duplicates; internally (when a buffer is flushed)
and externally (for each generated buffer). As only 3 buckets are opened at a
time (1 for reading and 2 for writing) the associated internal buffers can be large.
After the evaluation phase is completed, each process Pi is suspended. When
all clients are done, the master Pm is resumed and merges the Ei (h − 1) and
Ei (h + 1) files into Em (h − 1) and Em (h + 1). The merging preserves the order in
the files Ei (h − 1) and Ei (h + 1), so that the files Em (h − 1) and Em (h + 1) are
already sorted with all duplicates within the bucket eliminated. The subtraction
In the last few years there has been a remarkable increase in the performance
and capabilities of the graphics processing unit (GPU). Modern GPUs are not
only powerful, but also parallel programmable processors featuring high arith-
metic capabilities and memory bandwidths. High-level programming interfaces
have been designed for using GPUs as ordinary computing devices. These efforts in general-purpose GPU programming (GPGPU) have positioned the GPU as a compelling alternative to traditional microprocessors in high-performance com-
puting. The GPU’s rapid increase in both programmability and capability has
inspired researchers to map computationally demanding, complex problems to
it. Since the memory transfer between the card and main board on the express
bus is extremely fast, GPUs have become an apparent candidate to speed up large-scale computations. GPUs have several cores, but the programming and
computational model are different from the ones on the CPU. A core is a stream-
ing processor with some floating point and arithmetic logic units. Together with
some special function units, streaming processors are grouped together to form streaming multiprocessors. Programming a GPU requires a special compiler, which translates the code to native GPU instructions. The GPU architecture mimics a single-instruction multiple-data (SIMD) computer with the same instructions
running on all processors. It supports different layers for accessing memory.
GPUs forbid simultaneous writes to a memory cell but support concurrent reads.
GPUs have outpaced CPUs in numerical algorithms [46,72]. Applications
include studying the folding behavior of proteins by [57] and the simulation of
bio-molecular systems by [79]. Since the memory transfer between the card and main board on the express bus is in the order of gigabytes per second, GPUs have become an apparent candidate to speed up large-scale computations like sorting numerical data on disk [18,44]. Its application for sorting-based delayed duplicate
detection is apparent. By using perfect hash functions there is work on explor-
ing single-agent search problems on the GPU [41], and on solving two-player
games [39]. Moreover, explicit-state and probabilistic model checking problems
have been ported to the GPU [11,38].
On the GPU, memory is structured hierarchically, starting with the GPU’s
global memory called video RAM, or VRAM. Access to this memory is slow,
but can be accelerated through coalescing, where adjacent accesses of less than word width are combined into full word-width accesses. Each streaming multiprocessor includes a small amount of memory called SRAM, which is shared among all its streaming processors and can be accessed at the same speed as registers. Additional registers are also located in each streaming multiprocessor.
Fig. 15. External-memory search utilizing the GPU and the CPU, arrows indicate
movements of sets of states between different sorts of memory.
Procedure GPU-BFS
Input: State space problem with initial state s
Output: State space partitioned into layers
Fig. 16. Hash-based partitioning, total order for set of states is the combination of
hash address (computed on the CPU) and sorting index (computed on the GPU).
operating on the pairs (h′(s), s). The sorted vector is copied back from VRAM
to RAM, and the array is compacted by eliminating duplicates with another
scan through the elements. Subtracting visited states is made possible through
scanning all previous layers residing on disk. Finally, we flush the duplicate-free
file for the current BFS level to disk and iterate. To accelerate discrimination
and to obey the imposed order on disk, the hash bucket value h′(s) is added to the front of the state vector s.
If a BFS level becomes too large to be sorted on the GPU, we split the search
frontier into parts that fit in the VRAM. This yields some additional state vector
files to be subtracted to obtain a duplicate-free layer, but in practice time perfor-
mance is still dominated by expansion and sorting. For the case that subtraction
becomes harder, we can exploit the hash-partitioning, inserting previous states
into files partitioned by the same hash value. States that have a matching hash
value are mapped to the same file. Provided that the sorting order is first on
the hash value then on the state, after the concatenation of files (even if sorted
separately) we obtain a total order on the sets of states. This implies that we
can restrict duplicate elimination to states that have matching hash values.
On the GPU, we have a fixed amount of O(|VRAM|/|SRAM|) group opera-
tions, where each group is sorted by Bitonic Sort. Hence, the sorting complexity
is independent of the number of elements to be sorted, as in each iteration the entire vector is processed. With a good distribution function, we assure that on average each bucket is at least 50% filled with successor states, such that we lose less than a factor of 2 by not dealing with entirely filled buckets. As an example, in our case, we have |VRAM| = 1 GB and |SRAM| = (16 − c) KB, where c is a small constant, imposed by the internal memory requirements of the graphics card. For a state vector of 32 bytes, we arrive at k = 256 elements in one group. Within each group Bitonic Sort is applied, known to induce O(k log^2 k) work and O(log k) iterations. In each iteration the number of comparisons that
can be executed in parallel depends on the number of available threads, which
in turn depends on the graphics card chosen.
Instead of sorting the buckets after they have been filled, it is possible to use
chaining right away, checking each individual successor for having a duplicate
against the states stored in its bucket. Keeping the list of states sorted, as in ordered hashing, accelerates the search, but requires additional work for insertion and does not speed up the computation compared to sorting the buckets in parallel on the GPU. We only implemented a refinement that checks the state to be inserted in a bucket against the bucket's top element to detect some duplicates quickly.
State Compression. With a 64-bit hash address we do not encounter any collision
even in very large state spaces. Henceforth, given hash function h, we compress
the state vector for u to (h(u), i(u)), where i(u) is the index of the state vector
residing in RAM that is needed for expansion. We sort the pairs on the GPU
with respect to the lexicographic ordering of h. The shorter the state vector, the
more elements fit into one group, and the better the expected speed-up.
To estimate the probability of an error, assume a state space of n = 2^30 elements uniformly hashed to the m = 2^64 possible bit-vectors of length 64. We have m!/(m^n (m − n)!) ≥ ((m − n + 1)/m)^n ≥ (1 − n/m)^n. For our case this resolves to (1 − 2^-34)^(2^30) = (0.99999999994179233909)^1073741824, and a confidence of at least 93.94% that no duplicate arises while hashing the entire state space
to 64 bits. Recall that missing a duplicate is harmful only if the missed state is the only way to reach the error in the system. If the above confidence appears still to be too low, one may re-run the experiment with another independent hash function, showing with confidence ≥ 99.6% that no false positive has been produced during the traversal of the state space.
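The two probabilities quoted above can be reproduced with a few lines of Python (a quick numerical check, not part of the original chapter):

n, m = 2 ** 30, 2 ** 64
p = (1.0 - n / m) ** n          # lower bound on the no-collision probability, about 0.9394
print(p)
p_two = 1.0 - (1.0 - p) ** 2    # confidence with a second, independent hash function, about 0.996
print(p_two)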
Bitvector GPU Search. Static perfect hashing has been devised in the early 1970s [22,43]. Practical perfect hashing has been analyzed by [12] and an external-
memory perfect hash function variant has been proposed by [13].
For the design of a minimum perfect hash function for the sliding-tile puzzles we observe that in a lexicographic ordering every two successive permutations have an alternating signature and differ by exactly one transposition. For minimal perfect hashing of an (n^2 − 1)-puzzle state to {0, . . . , n^2!/2 − 1} we consequently compute the lexicographic rank and divide it by 2. For unranking, we now have to determine which one of the two uncompressed permutations of the puzzle is reachable. This amounts to finding the signature of the permutation, which allows separating solvable from unsolvable states.
There is one subtle problem with the blank. Simply taking the minimum perfect hash value for the alternating group in S_{n^2} does not suffice, as swapping a tile with the blank does not necessarily toggle the solvability status (e.g., it may be a move). To resolve this problem, we partition the state space along the position of the blank. Let B0 , . . . , B_{n^2−1} denote the sets of blank-projected states. Then each Bi contains (n^2 − 1)!/2 elements. Given index i and the rank inside Bi , it is simple to reconstruct the state.
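A small Python sketch of the rank-halving idea for plain permutations follows (illustrative only; it ignores the blank-partitioning refinement just described). It relies on the observation that the permutations at lexicographic ranks 2k and 2k + 1 differ only by swapping their last two elements and hence have opposite signatures, so the halved rank is a minimal perfect hash for either parity class.

from itertools import permutations
from math import factorial

def lex_rank(p):
    n = len(p)
    return sum(sum(p[j] < p[i] for j in range(i + 1, n)) * factorial(n - 1 - i)
               for i in range(n))

def lex_unrank(r, n):
    elems, out = list(range(n)), []
    for i in range(n):
        idx, r = divmod(r, factorial(n - 1 - i))
        out.append(elems.pop(idx))
    return tuple(out)

def parity(p):            # 0 = even, 1 = odd (number of inversions mod 2)
    return sum(p[i] > p[j] for i in range(len(p)) for j in range(i + 1, len(p))) % 2

def perfect_hash(p):      # halved lexicographic rank
    return lex_rank(p) // 2

def unhash(h, n, par):    # the signature decides between the two candidates
    a, b = lex_unrank(2 * h, n), lex_unrank(2 * h + 1, n)
    return a if parity(a) == par else b

# Self-test on the even permutations of four elements.
evens = [p for p in permutations(range(4)) if parity(p) == 0]
assert sorted(map(perfect_hash, evens)) == list(range(factorial(4) // 2))
assert all(unhash(perfect_hash(p), 4, 0) == p for p in evens)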
Korf and Schultze [68] used lookup tables to compute lexicographic ranks,
while Bonet [9] discussed different time-space trade-offs. Mares and Straka
[75] proposed a linear-time algorithm for lexicographic ranking, which relies
on bitvector operations in constant time. Applications of perfect hashing for
bitvector state space search include Peg Solitaire [41], Nine Men's Morris [39], and Chinese Checkers [90,91]. Bitvector-compressed pattern databases result in log_2 3 ≈ 1.6 bits per state [14]. Efficient permutation indices have been proposed by Myrvold and Ruskey [78]. The basic motivation is the generation of a random permutation of size N by swapping πi with πr, where r is a random number uniformly chosen in {0, . . . , i} and i decreases from N − 1 to 1. The (recursive) procedure Rank is shown in Algorithm 1.5. The permutation π and its inverse π^-1 are initialized with the permutation for which a rank has to be determined. The inverse π^-1 of π can be computed by setting π^-1_{πi} = i, for all i ∈ {0, . . . , k − 1}. Take as an example the permutation π = π^-1 = (1, 0, 3, 2). Then its rank is 2 · 3! + Rank(102). This unrolls to 2 · 3! + 2 · 2! + 0 · 1! + 0 · 0! = 16. It is also possible to compile a rank back into a permutation in linear time.
Fig. 17. GPU exploration of the 15-puzzle stored as a bitvector in RAM (GPU sorting
indices is optional and was not used in the experiments).
The inverse procedure Unrank is initialized with the identity permutation and
shown in Algorithm 1.5.
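For illustration, a compact Python version of the ranking scheme used in the worked example above, together with its inverse, is given below; it is a sketch consistent with that example, and Algorithm 1.5 may differ in details:

```python
from math import factorial

def rank(k, pi, pinv):
    # Rank the permutation pi (with inverse pinv) of {0, ..., k-1}; both are modified.
    if k == 1:
        return 0
    s = pi[k - 1]
    # One transposition moves the value k-1 to position k-1.
    pi[k - 1], pi[pinv[k - 1]] = pi[pinv[k - 1]], pi[k - 1]
    pinv[s], pinv[k - 1] = pinv[k - 1], pinv[s]
    return s * factorial(k - 1) + rank(k - 1, pi, pinv)

def unrank(n, r):
    # Rebuild the permutation of size n that has rank r, starting from the identity.
    pi, pinv = list(range(n)), list(range(n))
    for k in range(2, n + 1):
        s = (r // factorial(k - 1)) % k
        i, j = k - 1, pinv[s]
        pi[i], pi[j] = pi[j], pi[i]
        pinv[pi[i]], pinv[pi[j]] = i, j
    return pi

pi = [1, 0, 3, 2]                        # self-inverse example permutation from the text
print(rank(4, pi[:], pi[:]))             # 16 = 2*3! + 2*2! + 0*1! + 0*0!
print(unrank(4, 16))                     # [1, 0, 3, 2]
```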
References
1. Aggarwal, A., Vitter, J.S.: The input/output complexity of sorting and related problems. Commun. ACM 31(9), 1116–1127 (1988)
2. Ajwani, D., Dementiev, R., Meyer, U.: A computational study of external-memory
BFS algorithms. In: SODA, pp. 601–610 (2006)
3. Ajwani, D., Malinger, I., Meyer, U., Toledo, S.: Graph search on flash memory.
MPI-TR (2008)
4. Allen, J.D.: The Complete Book of CONNECT 4: History, Strategy, Puzzles.
Sterling Publishing, New York (2011)
5. Allis, L.V.: A knowledge-based approach to connect-four. The game is solved: white wins. Master's thesis, Vrije Universiteit, The Netherlands (1988)
6. Barnat, J., Brim, L., Edelkamp, S., Sulewski, D., Šimeček, P.: Can flash memory
help in model checking? In: Cofer, D., Fantechi, A. (eds.) FMICS 2008. LNCS, vol.
5596, pp. 150–165. Springer, Heidelberg (2009). doi:10.1007/978-3-642-03240-0 14
7. Barto, A., Bradtke, S., Singh, S.: Learning to act using real-time dynamic pro-
gramming. Artif. Intell. 72(1), 81–138 (1995)
8. Bloem, R., Ravi, K., Somenzi, F.: Symbolic guided search for CTL model checking.
In: DAC, pp. 29–34 (2000)
9. Bonet, B.: Efficient algorithms to rank and unrank permutations in lexicographic
order. In: AAAI-Workshop on Search in AI and Robotics (2008)
10. Bonet, B., Geffner, H.: Learning depth-first: a unified approach to heuristic search
in deterministic and non-deterministic settings, and its application to MDPs. In:
ICAPS, pp. 142–151 (2006)
11. Bošnački, D., Edelkamp, S., Sulewski, D.: Efficient probabilistic model checking on
general purpose graphics processors. In: Păsăreanu, C.S. (ed.) SPIN 2009. LNCS,
vol. 5578, pp. 32–49. Springer, Heidelberg (2009). doi:10.1007/978-3-642-02652-2 7
12. Botelho, F.C., Pagh, R., Ziviani, N.: Simple and space-efficient minimal perfect
hash functions. In: WADS, pp. 139–150 (2007)
13. Botelho, F.C., Ziviani, N.: External perfect hashing for very large key sets. In:
CIKM, pp. 653–662 (2007)
14. Breyer, T.M., Korf, R.E.: 1.6-bit pattern databases. In: AAAI (2010)
15. Burns, E., Lemons, S., Ruml, W., Zhou, R.: Suboptimal and anytime heuristic
search on multi-core machines. In: ICAPS (2009)
16. Burns, E., Lemons, S., Zhou, R., Ruml, W.: Best-first heuristic search for multi-
core machines. In: IJCAI, pp. 449–455 (2009)
17. Cazenave, T.: Nested monte-carlo search. In: IJCAI, pp. 456–461 (2009)
18. Cederman, D., Tsigas, P.: A practical quicksort algorithm for graphics processors.
Technical report 2008–01, Chalmers University of Technology (2008)
19. Cormen, T., Leiserson, C., Rivest, R.: Introduction to Algorithms. MIT Press,
Cambridge (1990)
20. Culberson, J.C., Schaeffer, J.: Pattern databases. Comput. Intell. 14(4), 318–334
(1998)
21. Dean, J., Ghemawat, S.: MapReduce: simplified data processing on large clusters.
In: OSDI (USENIX Association, Berkeley, CA, USA) (2004)
22. Dietzfelbinger, M., Karlin, A., Mehlhorn, K., auf der Heide, F.M., Rohnert, H.,
Tarjan, R.E.: Dynamic perfect hashing upper and lower bounds. SIAM J. Comput.
23, 738–761 (1994)
23. Döbbelin, R., Schütt, T., Reinefeld, A.: Building large compressed PDBs for the
sliding tile puzzle. In: Computer Games, pp. 16–27 (2013)
24. Edelkamp, S.: Planning with pattern databases. In: ECP, pp. 13–24 (2001). Reprint 2013 by AAAI Press. http://www.aaai.org/ocs/index.php/ECP/ECP01
25. Edelkamp, S.: External symbolic heuristic search with pattern databases. In:
ICAPS, pp. 51–60 (2005)
26. Edelkamp, S., Jabbar, S.: Externalizing real-time model checking. In: MOCHART,
pp. 67–83 (2006)
27. Edelkamp, S., Jabbar, S.: Large-scale directed model checking LTL. In: Valmari, A.
(ed.) SPIN 2006. LNCS, vol. 3925, pp. 1–18. Springer, Heidelberg (2006). doi:10.
1007/11691617 1
28. Edelkamp, S., Jabbar, S., Bonet, B.: External memory value iteration. In: ICAPS,
pp. 414–429 (2007)
29. Edelkamp, S., Jabbar, S., Schrödl, S.: External A*. In: Biundo, S., Frühwirth,
T., Palm, G. (eds.) KI 2004. LNCS (LNAI), vol. 3238, pp. 226–240. Springer,
Heidelberg (2004). doi:10.1007/978-3-540-30221-6 18
30. Edelkamp, S., Kissmann, P.: Externalizing the multiple sequence alignment prob-
lem with affine gap costs. In: Hertzberg, J., Beetz, M., Englert, R. (eds.) KI 2007.
LNCS (LNAI), vol. 4667, pp. 444–447. Springer, Heidelberg (2007). doi:10.1007/
978-3-540-74565-5 36
31. Edelkamp, S., Kissmann, P.: Symbolic classification of general two-player games.
In: Dengel, A.R., Berns, K., Breuel, T.M., Bomarius, F., Roth-Berghofer, T.R.
(eds.) KI 2008. LNCS (LNAI), vol. 5243, pp. 185–192. Springer, Heidelberg (2008).
doi:10.1007/978-3-540-85845-4 23
32. Edelkamp, S., Kissmann, P., Jabbar, S.: Scaling search with pattern databases. In:
MOCHART, pp. 49–64 (2008)
33. Edelkamp, S., Kissmann, P., Rohte, M.: Symbolic and explicit search hybrid
through perfect hash functions - a case study in connect four. In: ICAPS (2014)
34. Edelkamp, S., Leue, S., Lluch-Lafuente, A.: Directed explicit-state model checking
in the validation of communication protocols. Int. J. Softw. Tools Technol. 5(2–3),
247–267 (2004)
35. Edelkamp, S., Sanders, P., Simecek, P.: Semi-external LTL model checking. In:
CAV, pp. 530–542 (2008)
36. Edelkamp, S., Sulewski, D.: Flash-efficient LTL model checking with minimal coun-
terexamples. In: SEFM, pp. 73–82 (2008)
37. Edelkamp, S., Sulewski, D.: Model checking via delayed duplicate detection on the
GPU. Technical report 821, TU Dortmund (2008)
38. Edelkamp, S., Sulewski, D.: Efficient probabilistic model checking on general pur-
pose graphics processors. In: SPIN (2010)
39. Edelkamp, S., Sulewski, D.: GPU exploration of two-player games with perfect
hash functions. In: SOCS (2010)
40. Edelkamp, S., Sulewski, D.: External memory breadth-first search with delayed
duplicate detection on the GPU. In: MOCHART, pp. 12–31 (2011)
41. Edelkamp, S., Sulewski, D., Yücel, C.: Perfect hashing for state space exploration
on the GPU. In: ICAPS, pp. 57–64 (2010)
42. Edelkamp, S., Tang, Z.: Monte-carlo tree search for the multiple sequence alignment
problem. In: SOCS, pp. 9–17 (2015)
43. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31(3), 538–544 (1984)
44. Govindaraju, N.K., Gray, J., Kumar, R., Manocha, D.: GPUTeraSort: high perfor-
mance graphics coprocessor sorting for large database management. In: SIGMOD,
pp. 325–336 (2006)
45. Hansen, E., Zilberstein, S.: LAO*: a heuristic search algorithm that finds solutions
with loops. Artif. Intell. 129, 35–62 (2001)
46. Harris, M., Sengupta, S., Owens, J.D.: Parallel prefix sum (scan) with CUDA.
In: Nguyen, H. (ed.) GPU Gems 3, pp. 851–876. Addison-Wesley, Salt Lake City
(2007)
47. Hart, P.E., Nilsson, N.J., Raphael, B.: A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 4(2), 100–107 (1968)
48. Helmert, M.: Decidability and undecidability results for planning with numerical
state variables. In: AIPS, pp. 303–312 (2002)
49. Helmert, M., Domshlak, C.: Landmarks, critical paths, abstractions: what’s the
difference anyway? In: ICAPS (2009)
50. Helmert, M., Haslum, P., Hoffmann, J.: Flexible abstraction heuristics for optimal
sequential planning. In: ICAPS, pp. 176–183 (2007)
51. Hoare, C.A.R.: Algorithm 64: quicksort. Commun. ACM 4(7), 321 (1961)
52. Hoffmann, J.: The metric FF planning system: translating “Ignoring the delete
list” to numerical state variables. J. Artif. Intell. Res. 20, 291–341 (2003)
53. Hoffmann, J., Nebel, B.: Fast plan generation through heuristic search. J. Artif.
Intell. Res. 14, 253–302 (2001)
54. Jabbar, S.: External memory algorithms for state space exploration in model check-
ing and action planning. PhD thesis, TU Dortmund (2008)
55. Jabbar, S., Edelkamp, S.: I/O efficient directed model checking. In: Cousot, R.
(ed.) VMCAI 2005. LNCS, vol. 3385, pp. 313–329. Springer, Heidelberg (2005).
doi:10.1007/978-3-540-30579-8 21
56. Jabbar, S., Edelkamp, S.: Parallel external directed model checking with linear
I/O. In: Emerson, E.A., Namjoshi, K.S. (eds.) VMCAI 2006. LNCS, vol. 3855, pp.
237–251. Springer, Heidelberg (2005). doi:10.1007/11609773 16
57. Jaychandran, G., Vishal, V., Pande, V.S.: Using massively parallel simulations,
Markovian models to study protein folding: examining the Villin head-piece. J.
Chem. Phys. 124(6), 164 903–164 914 (2006)
58. Kishimoto, A., Fukunaga, A., Botea, A.: On the scaling behavior of HDA*. In:
SOCS (2010)
59. Kishimoto, A., Fukunaga, A.S., Botea, A.: Scalable, parallel best-first search for
optimal sequential planning. In: ICAPS (2009)
60. Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: ICML, pp.
282–293 (2006)
61. Korf, R.E.: Linear-space best-first search. Artif. Intell. 62(1), 41–78 (1993)
62. Korf, R.E.: Finding optimal solutions to Rubik’s cube using pattern databases. In:
AAAI, pp. 700–705 (1997)
63. Korf, R.E.: Breadth-first frontier search with delayed duplicate detection. In:
MOCHART, pp. 87–92 (2003)
64. Korf, R.E.: Best-first frontier search with delayed duplicate detection. In: AAAI,
pp. 650–657 (2004)
65. Korf, R.E.: Minimizing disk I/O in two-bit breadth-first search. In: AAAI, pp.
317–324 (2008)
66. Korf, R.E., Felner, A.: Disjoint pattern database heuristics. In: Chips Challeng-
ing Champions: Games, Computers and Artificial Intelligence, pp. 13–26. Elsevier
(2002)
67. Korf, R.E., Felner, A.: Recent progress in heuristic search: a case study of the
four-peg towers of Hanoi problem. In: IJCAI, pp. 2324–2329 (2007)
68. Korf, R.E., Schultze, T.: Large-scale parallel breadth-first search. In: AAAI, pp.
1380–1385 (2005)
69. Korf, R.E., Zhang, W.: Divide-and-conquer frontier search applied to optimal
sequence alignment. In: AAAI, pp. 910–916 (2000)
70. Korf, R.E., Zhang, W., Thayer, I., Hohwald, H.: Frontier search. J. ACM 52(5),
715–748 (2005)
71. Kristensen, L., Mailund, T.: Path finding with the sweep-line method using external
storage. In: ICFEM, pp. 319–337 (2003)
72. Krueger, J., Westermann, R.: Linear algebra operators for GPU implementation
of numerical algorithms. ACM Trans. Graph. 22(3), 908–916 (2003)
73. Kunkle, D., Cooperman, G.: Solving Rubik’s cube: disk is the new RAM. Commun.
ACM 51(4), 31–33 (2008)
74. Kupferschmid, S., Dräger, K., Hoffmann, J., Finkbeiner, B., Dierks, H., Podelski,
A., Behrmann, G.: Uppaal/DMC - abstraction-based Heuristics for directed model
checking. In: TACAS, pp. 679–682 (2007)
75. Mareš, M., Straka, M.: Linear-time ranking of permutations. In: Arge, L.,
Hoffmann, M., Welzl, E. (eds.) ESA 2007. LNCS, vol. 4698, pp. 187–193. Springer,
Heidelberg (2007). doi:10.1007/978-3-540-75520-3 18
76. Mehlhorn, K., Meyer, U.: External-memory breadth-first search with sublinear
I/O. In: Möhring, R., Raman, R. (eds.) ESA 2002. LNCS, vol. 2461, pp. 723–735.
Springer, Heidelberg (2002). doi:10.1007/3-540-45749-6 63
77. Munagala, K., Ranade, A.: I/O-complexity of graph algorithms. In: SODA, pp.
687–694 (1999)
78. Myrvold, W., Ruskey, F.: Ranking and unranking permutations in linear time. Inf.
Process. Lett. 79(6), 281–284 (2001)
79. Phillips, J.C., Braun, R., Wang, W., Gumbart, J., Tajkhorshid, E., Villa, E.,
Chipot, C., Skeel, R.D., Kale, L., Schulten, K.: Scalable molecular dynamics with
NAMD. J. Comp. Chem. 26, 1781–1802 (2005)
80. Reinefeld, A., Schütt, T.: Out-of-core parallel frontier search with MapReduce.
In: Mewhort, D.J.K., Cann, N.M., Slater, G.W., Naughton, T.J. (eds.) HPCS
2009. LNCS, vol. 5976, pp. 323–336. Springer, Heidelberg (2010). doi:10.1007/
978-3-642-12659-8 24
81. Reinefeld, A., Schütt, T., Döbbelin, R.: Very large pattern databases for heuristic
search. In: Hariri, S., Keahey, K. (eds.), HPDC, pp. 803–809 (2010)
82. Richter, S., Helmert, M., Westphal, M.: Landmarks revisited. In: AAAI, pp. 975–
982 (2008)
83. Rokicki, T., Kociemba, H., Davidson, M., Dethridge, J.: The diameter of the
Rubik’s cube group is twenty. SIAM J. Discrete Math. 27(2), 1082–1105 (2013)
84. Rosin, C.D.: Nested rollout policy adaptation for Monte-Carlo tree search. In:
IJCAI, pp. 649–654 (2011)
85. Meyer, U., Sanders, P., Sibeyn, J. (eds.): Algorithms for Memory Hierarchies.
LNCS, vol. 2625. Springer, Heidelberg (2003)
86. Schaeffer, J., Björnsson, Y., Burch, N., Kishimoto, A., Müller, M.: Solving checkers.
In: IJCAI, pp. 292–297 (2005)
87. Schaeffer, J., Burch, N., Björnsson, Y., Kishimoto, A., Müller, M., Lake, R., Lu, P., Sutphen, S.: Checkers is solved. Science 317(5844), 1518–1522 (2007)
88. Schroedl, S.: An improved search algorithm for optimal multiple sequence align-
ment. J. Artif. Intell. Res. 23, 587–623 (2005)
89. Schütt, T., Reinefeld, A., Maier, R.: MR-search: massively parallel heuristic search.
Concurr. Comput.: Pract. Exp. 25(1), 40–54 (2013)
90. Sturtevant, N.R.: External memory PDBs: initial results. In: SARA (2013)
91. Sturtevant, N.R., Rutherford, M.J.: Minimizing writes in parallel external memory
search. In: IJCAI (2013)
92. Wijs, A.: What to do Next? Analysing and optimising system behaviour in time.
PhD thesis, Vrije Universiteit Amsterdam (1999)
93. Zhou, R., Hansen, E.: Breadth-first heuristic search. In: ICAPS, pp. 92–100 (2004)
94. Zhou, R., Hansen, E.: Structured duplicate detection in external-memory graph
search. In: AAAI, pp. 683–689 (2004)
95. Zhou, R., Hansen, E.: External-memory pattern databases using structured dupli-
cate detection. In: AAAI (2005)
96. Zhou, R., Hansen, E.A.: Multiple sequence alignment using A*. In: AAAI (2002).
Student abstract
97. Zhou, R., Hansen, E.A.: Edge partitioning in external-memory graph search. In:
IJCAI, pp. 2410–2417 (2007)
98. Zhou, R., Hansen, E.A.: Parallel structured duplicate detection. In: AAAI, pp.
1217–1222 (2007)
Algorithm Engineering Aspects of Real-Time
Rendering Algorithms
1 Introduction
In the area of computer graphics, rendering describes the process of visualizing
a data set. One important aspect of rendering is, of course, how the data is pre-
sented to serve the desired application. Besides that, an algorithmic challenge
arises from the complexity of the rendered data set. Especially if the visualiza-
tion has to be performed in real time, the amount of data can easily exceed
the capabilities of state of the art hardware, if only simple rendering techniques
are applied. In this paper, we focus on tools and techniques for the develop-
ment of algorithms for rendering three-dimensional virtual scenes in real-time
walkthrough applications. Although the algorithmic challenges induced by com-
plex virtual scenes traditionally play an important role in this area of computer
graphics, explicitly considering techniques that support the development process, or that provide a sound empirical evaluation, has so far received only marginal attention.
while the current view of the scene is rendered. The rendering process is normally
supported by dedicated graphics hardware. Such hardware nowadays supports
rendering of several million polygons at interactive frame rates (e.g., at least 10
frames per second). Considering complex virtual environments (like the CAD
data of an air plane, or of a complete construction facility), the complexity of
such scenes can however still exceed the capabilities of the hardware by orders
of magnitude. Thus, many real-world applications require specialized rendering
algorithms to reduce the amount of data processed by the graphics hardware.
The problem of rendering complex three-dimensional scenes exhibits several properties that distinguish it from many other problem areas dealing with large data sets, even influencing the process of designing, implementing, and evaluat-
ing algorithms in this area. In our opinion, the three most relevant properties are
the use of dedicated graphics hardware, the influence of the input’s geometric
structure, and the relevance of the image quality perceived by a human observer.
Dedicated graphics hardware provides a large amount of computational power,
but also requires adaptations to its parallel mode of operation and the partic-
ularities of the rendering pipeline. On the one hand, the geometric structure of
the virtual scene offers the opportunity to speed up the rendering process by
exploiting the mutual occlusion of objects in the scene. On the other hand, the
view on the scene changes for every observer position in the scene, which has to
be considered in order to acquire any reasonable evaluation results on the gen-
eral efficiency of a rendering algorithm. The human perception of the rendered
images allows the rendering process to be sped up by replacing complex parts of the scene with similar-looking, but much simpler approximations. Challenges for the development of such algorithms are to actually create good-looking approxima-
tions and to reasonably measure the image quality for an objective experimental
evaluation.
1.2 Overview
First, we give an overview of the state of the art concerning different aspects influ-
encing the evaluation process of rendering algorithms (Sect. 2). Then, we present
the PADrend framework (Sect. 3), developed to provide a common basis for the
development, evaluation, and usage of rendering algorithms. The behavior of a rendering algorithm does not only depend on the visualized scene, but also on
the observer’s view on the scene – which is often only insufficiently considered
by existing evaluation methods. We developed a special evaluation technique
that tackles this issue based on globally approximated scene properties (Sect. 4).
As an example, we present a meta rendering algorithm (Sect. 5.2) that uses the
presented software framework and techniques to automatically assess and select
other rendering algorithms for the visualization of highly complex scenes.
processes all triangles, whereby first the vertices of each triangle are projected
into the 2-dimensional screenspace (geometric transformation), and second, each
triangle is filled by coloring its pixels (rasterization).
To provide smooth navigation through a scene, about 10 images (frames) per second are necessary. Current graphics hardware can render scenes consisting of up to 15 million triangles at 10 fps. Rendering algorithms try to
overcome this limit by working like a filter [1,21] in front of the z-buffer algo-
rithm: They reduce the number of polygons sent to the pipeline either by exclud-
ing invisible objects (visibility culling) or by approximating complex objects by
less complex replacements.
In the following three sections, we discuss the challenges of objectively eval-
uating and comparing different rendering algorithms. An algorithm’s efficiency,
as well as the achieved image quality, is not only influenced by the virtual scene
used as input (Sect. 2.1), but also by the user’s movement through the scene
(Sect. 2.2) and by the used graphics hardware (Sect. 2.3).
1. http://graphics.stanford.edu/data/3Dscanrep/.
2. http://www.cs.unc.edu/%7Ewalk/.
3. http://www.boeing.com.
4. http://www.esri.com/software/cityengine/resources/demos.
If certain properties of the input scene can be assumed (e.g., all the polygons
are uniformly distributed), it can be shown that the rendering time of several
rendering algorithms is logarithmic in the complexity of the scene for any position
in the scene (e.g., [7,30]).
Modern graphics cards combine a set of massively parallel processing units with
dedicated memory. Normally, the high level rendering process is controlled by
the CPU while the actual geometric and color calculations are performed on the
graphics processing unit (GPU) in parallel. This heterogeneous design has to be
reflected by real-time rendering systems, which reduce the number of primitives that are sent to the graphics card to the extent that rendering with a fixed frame rate is possible. For this purpose a run-time prediction for the rendering of the primitives sent to the graphics card is necessary. As today's graphics
cards do not provide hard guarantees of the execution time of elementary graph-
ics operations, the runtime predictions are imprecise and depend mainly on the
chosen modeling of the hardware:
For the linear runtime estimation of the z-buffer algorithm we assume that
geometric transformation and rasterization are both possible in constant time
for each triangle. Practical implementations show that the runtime depends on
the projected size of the triangles (counted in number of rasterized pixels). If we
use an additional parameter a counting the number of all rasterized pixels, we
estimate the runtime by O(n + a) [15].
Funkhouser and Séquin [11] refined the model of basic graphics hardware
operations in order to predict the rendering time of single polygons. The ren-
dering time estimation take into account the number of polygons, the number
of pixels and the number of pixels in the projection.
Wimmer and Wonka [33] model the graphics hardware by four major components: system tasks, and the maximum of CPU tasks and GPU tasks, where the latter two are each a sum of frame setup, memory management, rendering code, and idle time.
In this respect, no hard real-time rendering is possible as in the case of real-time operating systems, which can reliably provide specific results within a predetermined time period. A real-time rendering system rather aims to provide statistical guarantees for a fixed frame rate [33].
– Allow rapid development by providing a set of high level interfaces and mod-
ules for common tasks.
5. http://www.mozilla.org/MPL/.
6. https://github.com/EScript.
The basis for the high level rendering processes is the Minimalistic Scene Graph
(MinSG) library. “Minimalistic” represents the idea that the scene graph’s
core functionality is designed to be as compact and simplistic as possible. The
library’s core contains a small set of basic node types (a geometry node, a con-
tainer node, and a few others) and a set of different properties (called states)
that can be attached to nodes. The structure of the virtual scene is represented
as a tree of nodes, in which the leaf nodes contain the scene’s geometry as trian-
gle meshes. Material and lighting properties are represented by states attached
to the nodes. Specialized functions, data structures, and rendering algorithms
can be created by building upon those core components.
A characteristic of MinSG is the way in which rendering algorithms are
implemented. In most rendering engines, the rendering is defined by an external
process getting the scene graph as input. In MinSG, rendering algorithms are
implemented as states that are attached to inner nodes of the scene graph (like
material properties). All nodes in the corresponding subtree may be influenced
by this rendering state. The rendering state may use custom traversal techniques,
call arbitrary rendering functions using the Rendering library, and dynamically
store arbitrary data at any node. In this way, the rendering algorithm rather
becomes a property of the scene than an external process.
Some algorithms can be seen as a combination of different high level func-
tions. For instance, the color cubes algorithm (based on [7]) can be split up into
two steps: First, identify the nodes in the scene graph whose current projected
size is smaller than a given value. Then, approximate the corresponding subtrees
by rendering colored cubes mimicking the original geometry (trade image quality
for speed). In MinSG, these two steps (node selection and approximating) can
be handled by different cooperating rendering states using rendering channels.
Both states are assigned to a node in the scene graph for which the algorithm
should be applied. Upon activation, the approximating rendering state registers
itself as handler for a rendering channel. The selection rendering state performs
the actual traversal of the subtree. Nodes with a large projected size are tra-
versed further, leaves are rendered normally. Nodes with a small projected size
are passed on to the rendering channel for which the approximation renderer
is registered. The approximation renderer now handles the node by displaying
the colored cube. Other implemented approximation states are based on mesh
reduction techniques (like discrete Level of Detail [13,21]), image-based tech-
niques (like impostors [28]), or point-based techniques (like Progressive Blue
Surfels [17]). Furthermore, the state controlling which nodes are to be approxi-
mated can consider not only a node's projected size, but also its estimated visible size
(e.g., based on an occlusion culling algorithm), or adjust the size dynamically
to fulfill a given frame rate constraint (budget rendering). The main benefits
of such a modular decomposition of algorithms are the possibility to reuse the
parts for related techniques (decrease duplicated code and increase robustness)
and the possibility to experiment with the recombination of different parts even
at runtime. Further techniques that are implemented using rendering channels
are, for example, distance sorting of transparent objects and multi-pass render-
ing. Techniques requiring complete control over the traversal process itself can, however, easily leave out rendering channels completely.
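To illustrate this decomposition, the following schematic Python sketch mimics the cooperation of a selection state and an approximation state via a rendering channel; all class, method, and attribute names are invented for this illustration and do not reflect MinSG's actual C++ interface.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    size: float                      # stand-in for the projected size
    mesh: str = ""                   # leaf geometry (just a name here)
    children: list = field(default_factory=list)

class Context:
    """Collects draw calls; the channel dictionary decouples the two states."""
    def __init__(self):
        self.channels, self.calls = {}, []
    def render_mesh(self, mesh):
        self.calls.append(f"mesh:{mesh}")
    def draw_colored_cube(self, node):
        self.calls.append("colored cube")

class ApproximationState:
    # Registers itself as handler for the approximation channel.
    def enable(self, ctx):
        ctx.channels["approximation"] = lambda node: ctx.draw_colored_cube(node)

class SelectionState:
    # Traverses the subtree; small subtrees are handed over to the channel.
    def __init__(self, threshold):
        self.threshold = threshold
    def display(self, ctx, node):
        if node.size < self.threshold:
            ctx.channels["approximation"](node)       # hand over small subtrees
        elif not node.children:
            ctx.render_mesh(node.mesh)
        else:
            for child in node.children:
                self.display(ctx, child)

scene = Node(10, children=[Node(8, "facade"), Node(1, children=[Node(0.5, "detail")])])
ctx = Context()
ApproximationState().enable(ctx)
SelectionState(threshold=2).display(ctx, scene)
print(ctx.calls)                    # ['mesh:facade', 'colored cube']
```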
Scripted rendering states are another possibility to influence the rendering
process, especially during the early development phase of an algorithm. Such
states are implemented in EScript, resulting in noticeably lower performance, but with the advantage that they can be deployed and altered at runtime. The
effects of a change in the algorithm can be observed immediately even without
reloading the scene. Later in the development process, such a state is normally
re-implemented by translating it to C++ with little effort.
Frame Statistics. For each rendered image (frame), different parameters are
measured automatically and combined into the frame statistics. Typical para-
meter types include the number of rendered polygons, the time needed for ren-
dering the frame, or the number of certain operations performed by a rendering
algorithm (see the middle window in Fig. 2). Using the Waypoints plugin, the
parameters can be measured along a predefined camera path and exported for
later analysis. Similar features are common to most rendering frameworks.
Fig. 2. Screen shot of PADrend showing the user interface of several plugins. The
left window shows the configuration of a node of the scene graph with an attached
rendering state. The middle window shows several frame statistics measured while
moving through the scene. The right window shows a visualization of the last frame’s
logged rendering events: Blue lines represent rendered geometry, yellow and red lines
represent occlusion queries, and empty regions show idle waiting times of the CPU –
possible starting points for optimizations. (Color figure online)
rendering process (see the right window in Fig. 2). This feature is used best in
combination with additional external profiler tools that are provided by major
GPU manufacturers.
Image Quality Evaluation. The image compare plugin offers several func-
tions for automatically measuring image quality. These functions require two
images as input: one original image created using an exact rendering algorithm
and the approximated image that is created using an approximate algorithm.
Simple and widely used functions are the number of different pixels in the two
images or some metrics on the color distances. We additionally support other
techniques originating from the area of image compression algorithms (like JPEG
compression). The structural similarity technique (SSIM) [32] detects changes
in the structure of an image by comparing the pixel neighborhoods of all pixels
in the two images.
To reduce the impact of small image errors compared to the dimensions of
the image, like noise or aliasing artifacts, an image pyramid [5] is used addi-
tionally. SSIM is calculated for the full-size images and for multiple versions
with reduced resolution. The arithmetic mean of the multiple SSIM calculations
for the different resolutions is used as the final image quality value. The numerical value produced by SSIM offers little direct meaning, but the ordering of different values (e.g., image A looks better than image B) aligns better with human perception than that of non-structure-oriented techniques like PSNR (for a discussion, see [31]).
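As an illustration, the following Python sketch computes such a pyramid-averaged SSIM value using scikit-image; the number of pyramid levels and the downscaling factor of one half per level are assumptions of this sketch:

```python
import numpy as np
from skimage.metrics import structural_similarity
from skimage.transform import rescale

def pyramid_ssim(reference, approximation, levels=4):
    # Mean SSIM over an image pyramid (grayscale float images in [0, 1]).
    scores = []
    a, b = reference, approximation
    for _ in range(levels):
        scores.append(structural_similarity(a, b, data_range=1.0))
        a = rescale(a, 0.5, anti_aliasing=True)      # next, coarser pyramid level
        b = rescale(b, 0.5, anti_aliasing=True)
    return float(np.mean(scores))

rng = np.random.default_rng(0)
img = rng.random((256, 256))
noisy = np.clip(img + 0.05 * rng.standard_normal(img.shape), 0, 1)
print(pyramid_ssim(img, noisy))
```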
The basic observation underlying our method is that many aspects of a rendering
algorithm’s behavior can be expressed as position-dependent scene properties.
Such a scene property can be expressed as a function defined over R³ mapping
to a property value from an arbitrary co-domain, like R or N. In the following
section, we give an overview of some property functions that proved to be useful
in the evaluation of rendering algorithms.
Number of Visible Objects: One basic property is the number of the scene’s
objects that are visible from a position in the scene (on a pixel basis and not
geometrically). This property is not bound to a specific algorithm, but can
give important insight into the structure of the used scene. In our experience,
almost all other properties are influenced by visibility.
A practical and efficient way of determining this property at a specific posi-
tion is to use the graphics hardware for the visibility tests. The scene is
projected onto the six sides of a cube surrounding the observed position
by rendering. Each object contributing at least one pixel to the final image
on one of the sides is counted as visible. This can easily be measured by
using hardware-assisted occlusion queries. The resolution used for the rendering
process should resemble the screen resolution of the walkthrough system and
is an important parameter for the evaluation. If the property is evaluated in a
system supporting efficient ray-casts (like a real-time ray tracing system), an
omnidirectional observer can alternatively be implemented using a spherical
projection, avoiding distortions in the corners of the cube.
Rendering Time: The rendering time property of an algorithm describes the
time needed for an algorithm to render one frame. This value is clearly not
only dependent on the position in the scene, but also on the viewing direction.
To express the rendering time as meaningful position-dependent scene prop-
erty, we abstract the viewing direction by taking the maximum of the values
for six different directions – the six sides of a cube. The camera aperture
angle is set to 120◦ to produce an overlap between adjacent directions. This
overlap is required to reduce artifacts occurring if complex parts of the scene
are not completely covered by any projection surface, although this area
could be completely in view if the camera were rotated differently.
Number of Operations: Other meaningful scene properties can be defined
by the number of various operations performed by an algorithm to render
a frame. This includes, for example, the number of rendered objects, the
number of state changes in the graphics pipeline, or the number of issued
occlusion queries. The measurement is similar to the rendering time mea-
surement in that we can take the maximum value of the six directions.
The sampling method works as follows: Beginning with the initial region, the
active region is subdivided into eight (3D case) or four (2D case) equally-sized
new regions. For each of the new regions, new sample points are chosen (see
description below) and the property function is evaluated at those positions.
Two values are then calculated for each new region: The property value is the
average value of all sample values that lie inside or at the border of the region.
The quality-gain is defined by the variance of the region’s sample values divided
by the region’s diameter – large regions with a high value fluctuation get high
values, while small regions with almost uniform values get low values. The new
regions are inserted into a priority queue based on their quality-gain values. The
region with the highest quality-gain value is extracted from the queue and the
algorithm starts over splitting this region next. The algorithm stops when the
number of evaluated sample points exceeds the chosen value (or when another
user-defined condition is reached).
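The following Python sketch illustrates the subdivision scheme for the 2D case; the choice of random sample positions inside each region and the interface of the evaluate function are assumptions of this illustration, not PADrend code:

```python
import heapq, itertools, random, statistics
from math import hypot

def adaptive_sampling(evaluate, initial_region, budget, samples_per_region=5):
    # evaluate((x, y)) returns the property value at a position (assumed interface).
    # Regions are axis-aligned boxes (xmin, xmax, ymin, ymax); 2D case with four
    # subregions per split.  Returns a dict mapping regions to their property value.
    counter = itertools.count()               # tie-breaker for the priority queue
    queue, result, used = [], {}, 0

    def process(region):
        nonlocal used
        xmin, xmax, ymin, ymax = region
        points = [(random.uniform(xmin, xmax), random.uniform(ymin, ymax))
                  for _ in range(samples_per_region)]
        values = [evaluate(p) for p in points]
        used += len(values)
        result[region] = statistics.mean(values)              # property value of the region
        gain = statistics.pvariance(values) / hypot(xmax - xmin, ymax - ymin)
        heapq.heappush(queue, (-gain, next(counter), region))  # highest quality-gain first

    process(initial_region)
    while queue and used < budget:
        _, _, (xmin, xmax, ymin, ymax) = heapq.heappop(queue)
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        for sub in ((xmin, xm, ymin, ym), (xm, xmax, ymin, ym),
                    (xmin, xm, ym, ymax), (xm, xmax, ym, ymax)):
            process(sub)
    return result

approx = adaptive_sampling(lambda p: hypot(*p), (0.0, 100.0, 0.0, 100.0), budget=2000)
```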
Fig. 3. Visualization of the rendering time of the CHC++ algorithm [22] evaluated for a
2D cutting plane of the Power Plant. White regions (low height): rendering times down
to 5 ms (behind the chimney, high occlusion). Red regions (large height): rendering
times up to 16 ms (inside the main building, little occlusion). The approximation is
based on 2 k sample points evaluated in 160 s. (image source [18]) (Color figure online)
value should be chosen in such a way that important areas can be seen from the
outside. A 3D example is shown in Fig. 4(b).
The visualization makes it easy for the user to understand the algorithm’s
behavior and how it corresponds to the characteristics of different scene regions.
Because it is intuitive and easy to use, it is a valuable tool during the development
of a new algorithm. It can also be used to comprehensibly present aspects of a
new algorithm in a scientific publication.
In Sect. 5.1 we use the well-known CHC++ occlusion culling method [22] to
demonstrate our proposed evaluation methods. In Sect. 5.2 we present a new
rendering method that is developed with the PADrend system and show how
the image quality can easily be evaluated by the system.
Fig. 4. (a) Violin plots for the distribution of the CHC++ rendering time in the Power
Plant scene for different loose octree’s maximum depths (4 k samples per value com-
puted in 40 min overall). (b) Visualization of the property rendering time: CHC++
minus simple z-buffer. Only the negative values are shown as darker (blue) regions
(two times 4 k samples). (c) Violin plot for the same property. (image source [18])
(Color figure online)
depth values allows for a local search for the best value. In our setting, the
objectively best value is five (see Fig. 4(a)).
5.2 Multi-Algorithm-Rendering
Many large virtual 3D scenes are not structured evenly, for example, because they
exhibit a high variety in their polygons’ spatial distribution. For such heteroge-
neous data, there may be no single algorithm that consistently performs well with
any type of scene and that is able to render the scene at each position fast and
with high image quality. For a small set of scenes, this situation can be improved,
if an experienced user is able to manually assign different rendering algorithms to
particular parts of the scene. Based on the set of rendering algorithms and auto-
matic evaluation techniques implemented in PADrend, we developed the meta
rendering algorithm Multi-Algorithm-Rendering (MAR) [24], which automati-
cally deploys different rendering algorithms simultaneously for a broad range of
scene types.
In a preprocessing step, MAR first divides the scene into suitable regions.
Then, at randomly chosen sample positions the expected behavior (rendering
time and image quality) of all available rendering algorithms for all regions is
evaluated. During runtime, this data is utilized to compute estimates for the
running time and image quality for the actual observer’s point of view. By solving
an optimization problem embedded in a control cycle, the frame rate can be kept
almost constant, while the image quality is optimized.
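The following greedy Python sketch illustrates the kind of selection problem that has to be solved in each control cycle; the interface of the estimates and the greedy strategy are assumptions of this illustration and do not reproduce MAR's actual optimization model:

```python
def choose_algorithms(estimates, time_budget):
    # estimates: {region: [(algo, est_time, est_quality), ...]} -- an assumed interface.
    # Start with the fastest algorithm in every region, then spend the remaining
    # budget on the upgrades with the best quality gain per additional time.
    choice = {r: min(opts, key=lambda o: o[1]) for r, opts in estimates.items()}
    spent = sum(o[1] for o in choice.values())
    while True:
        best = None
        for r, opts in estimates.items():
            cur = choice[r]
            for o in opts:
                dt, dq = o[1] - cur[1], o[2] - cur[2]
                if dq > 0 and spent + dt <= time_budget:
                    ratio = dq / max(dt, 1e-9)
                    if best is None or ratio > best[0]:
                        best = (ratio, r, o)
        if best is None:
            break
        _, r, o = best
        spent += o[1] - choice[r][1]
        choice[r] = o
    return {r: o[0] for r, o in choice.items()}

regions = {"facade": [("ColorCubes", 1.0, 0.6), ("LoD", 3.0, 0.8), ("zBuffer", 9.0, 1.0)],
           "interior": [("ColorCubes", 0.5, 0.5), ("CHC++", 4.0, 1.0)]}
print(choose_algorithms(regions, time_budget=8.0))  # {'facade': 'LoD', 'interior': 'CHC++'}
```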
Figure 5 shows an example from a highly complex and heterogeneous scene
consisting of more than 100 million polygons. It is composed of models emerging
from CAD, laser scans, and a procedural scene generator. The second screen
shot shows the algorithm assignment MAR chose for that position from the
following rendering algorithms: CHC++, normal z-buffer [6,27], Spherical Vis-
ibility Sampling [9], Color Cubes [7], discrete Level of Detail [13,21], and two
variants of Progressive Blue Surfels [18]. Figure 6 shows the distribution of the
achieved image quality as a scene property function. If the chosen target frame
rate is increased, the faster algorithms will be preferred – resulting in a decreased
image quality.
Fig. 6. Evaluation of MAR’s image quality as scene property function. The observer
positions are sampled in a slice slightly above the city.
6 Conclusion
References
1. Akenine-Möller, T., Haines, E., Hoffman, N.: Real-Time Rendering, 3rd edn. A K
Peters Ltd., Wellesley (2008)
2. Baxter III., W.V., Sud, A., Govindaraju, N.K., Manocha, D.: GigaWalk: interactive
walkthrough of complex environments. In: Proceedings of the 13th Eurographics
Workshop on Rendering (EGWR 2002), pp. 203–214 (2002)
3. Bittner, J., Wimmer, M., Piringer, H., Purgathofer, W.: Coherent hierarchical
culling: hardware occlusion queries made useful. Comput. Graph. Forum Proc. of
Eurographics 2004 23(3), 615–624 (2004)
4. Brüderlin, B., Heyer, M., Pfützner, S.: Interviews3D: a platform for interactive
handling of massive data sets. IEEE Comput. Graph. Appl. 27(6), 48–59 (2007)
5. Burt, P.J.: Fast filter transform for image processing. Comput. Graph. Image
Process. 16(1), 20–51 (1981)
6. Catmull, E.E.: A subdivision algorithm for computer display of curved surfaces.
Ph.D. thesis, Department of Computer Science, University of Utah, Salt Lake City,
UT, USA (1974)
7. Chamberlain, B., DeRose, T., Lischinski, D., Salesin, D., Snyder, J.: Fast rendering
of complex environments using a spatial hierarchy. In: Proceedings of Graphics
Interface (GI 1996), pp. 132–141. Canadian Information Processing Society (1996)
8. Cook, R.L.: Stochastic sampling in computer graphics. ACM Trans. Graph. (TOG)
5(1), 51–72 (1986)
9. Eikel, B., Jähn, C., Fischer, M., auf der Heide, F.M.: Spherical visibility sam-
pling. Comput. Graph. Forum Proc. of 24th Eurographics Symposium on Render-
ing 32(4), 49–58 (2013)
10. Eikel, B., Jähn, C., Petring, R.: PADrend: platform for algorithm development
and rendering. In: Augmented & Virtual Reality in der Produktentstehung. HNI-
Verlagsschriftenreihe, vol. 295, pp. 159–170. University Paderborn, Heinz Nixdorf
Institute (2011)
11. Funkhouser, T.A., Séquin, C.H.: Adaptive display algorithm for interactive frame
rates during visualization of complex virtual environments. In: Proceedings of the
20th Conference on Computer Graphics and Interactive Techniques (SIGGRAPH
1993), pp. 247–254 (1993)
12. Funkhouser, T.A., Séquin, C.H., Teller, S.J.: Management of large amounts of data
in interactive building walkthroughs. In: Proceedings of the 1992 Symposium on
Interactive 3D Graphics (I3D 1992), pp. 11–20 (1992)
13. Garland, M., Heckbert, P.S.: Simplifying surfaces with color and texture using
quadric error metrics. In: Proceedings of the Conference on Visualization (VIS
1998), pp. 263–269 (1998)
14. Guthe, M., Balázs, Á., Klein, R.: Near optimal hierarchical culling: performance
driven use of hardware occlusion queries. In: Akenine-Möller, T., Heidrich, W.
(eds.) Proceedings of the 17th Eurographics Symposium on Rendering (EGSR
2006), pp. 207–214 (2006)
15. Heckbert, P., Garland, M.: Multiresolution modeling for fast rendering. In: Pro-
ceedings of Graphics Interface (GI 1994), pp. 43–50 (1994)
16. Hintze, J.L., Nelson, R.D.: Violin plots: a box plot-density trace synergism. Am.
Stat. 52(2), 181–184 (1998)
17. Jähn, C.: Progressive blue surfels. Technical report, Heinz Nixdorf Institute, University of Paderborn (2013). CoRR. http://arxiv.org/abs/1307.0247
18. Jähn, C., Eikel, B., Fischer, M., Petring, R., Meyer auf der Heide, F.: Evaluation
of rendering algorithms using position-dependent scene properties. In: Bebis, G.,
et al. (eds.) ISVC 2013. LNCS, vol. 8033, pp. 108–118. Springer, Heidelberg (2013).
doi:10.1007/978-3-642-41914-0 12
19. Kovalčík, V., Sochor, J.: Occlusion culling with statistically optimized occlusion
queries. In: The 13th International Conference in Central Europe on Computer
Graphics, Visualization and Computer Vision (WSCG 2005), pp. 109–112 (2005)
20. Li, L., Yang, X., Xiao, S.: Efficient visibility projection on spherical polar coordi-
nates for shadow rendering using geometry shader. In: IEEE International Confer-
ence on Multimedia and Expo, pp. 1005–1008 (2008)
21. Luebke, D., Reddy, M., Cohen, J.D., Varshney, A., Watson, B., Huebner, R.: Level
of Detail for 3D Graphics. Computer Graphics and Geometric Modeling. Morgan
Kaufman Publishers, Burlington (2003)
22. Mattausch, O., Bittner, J., Wimmer, M.: CHC++: coherent hierarchical culling
revisited. Comput. Graph. Forum Proc. of Eurographics 2008 27(2), 221–230
(2008)
23. Meruvia-Pastor, O.E.: Visibility preprocessing using spherical sampling of polygo-
nal patches. In: Short Presentations of Eurographics 2002 (2002)
24. Petring, R., Eikel, B., Jähn, C., Fischer, M., Meyer auf der Heide, F.: Real-time 3D
rendering of heterogeneous scenes. In: Bebis, G., et al. (eds.) ISVC 2013. LNCS, vol.
8033, pp. 448–458. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41914-0 44
25. Sander, P.V., Nehab, D., Barczak, J.: Fast triangle reordering for vertex locality
and reduced overdraw. ACM Trans. Graph. (TOG) ACM SIGGRAPH 2007 26(3),
89:1–89:9 (2007)
26. Staadt, O.G., Walker, J., Nuber, C., Hamann, B.: A survey and performance analy-
sis of software platforms for interactive cluster-based multi-screen rendering. In:
Proceedings of the Workshop on Virtual Environments (EGVE 2003), pp. 261–270
(2003)
27. Straßer, W.: Schnelle Kurven- und Flächendarstellung auf graphischen Sicht-
geräten. Ph.D. thesis, Technische Universität Berlin, Berlin, Germany (1974)
28. Süß, T., Jähn, C., Fischer, M.: Asynchronous parallel reliefboard computation for
scene object approximation. In: Proceedings of the 10th Eurographics Symposium
on Parallel Graphics and Visualization (EGPGV 2010), pp. 43–51 (2010)
29. Teller, S.J., Séquin, C.H.: Visibility preprocessing for interactive walkthroughs.
In: Proceedings of the 18th Conference on Computer Graphics and Interactive
Techniques (SIGGRAPH 1991), pp. 61–70 (1991)
30. Wand, M., Fischer, M., Peter, I., Meyer auf der Heide, F., Straßer, W.: The ran-
domized z-buffer algorithm: interactive rendering of highly complex scenes. In:
Proceedings of the 28th Conference on Computer Graphics and Interactive Tech-
niques (SIGGRAPH 2001), pp. 361–370 (2001)
31. Wang, Z., Bovik, A.C.: Mean squared error: love it or leave it? A new look at signal
fidelity measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
32. Wang, Z., Bovik, A.C., Sheikh, H.R., Simoncelli, E.P.: Image quality assessment:
from error visibility to structural similarity. IEEE Trans. Image Process. 13(4),
600–612 (2004)
33. Wimmer, M., Wonka, P.: Rendering time estimation for real-time rendering. In:
Rendering Techniques 2003, Proceedings of the Eurographics Symposium on Ren-
dering 2003, pp. 118–129 (2003)
34. Yuan, P., Green, M., Lau, R.W.H.: A framework for performance evaluation of
real-time rendering algorithms in virtual reality. In: Proceedings of the ACM Sym-
posium on Virtual Reality Software and Technology (VRST 1997), pp. 51–58 (1997)
35. Zhang, H., Manocha, D., Hudson, T., Hoff III., K.E.: Visibility culling using hierar-
chical occlusion maps. In: Proceedings of the 24th Conference on Computer Graph-
ics and Interactive Techniques (SIGGRAPH 1997), pp. 77–88 (1997)
Algorithm Engineering in Robust Optimization
1 Introduction
Partially supported by grant SCHO 1140/3-2 within the DFG programme Algorithm
Engineering.
F(ξ) = {x ∈ X : F (x, ξ) ≤ 0}
Common Uncertainty Sets. There are some types of uncertainty sets that are frequently used in current literature. These include:
1. Finite uncertainty U = {ξ^1, . . . , ξ^N}
2. Interval-based uncertainty U = [ξ̲_1, ξ̄_1] × . . . × [ξ̲_M, ξ̄_M]
3. Polytopic uncertainty U = conv{ξ^1, . . . , ξ^N}
4. Norm-based uncertainty U = {ξ ∈ R^M : ‖ξ − ξ̂‖ ≤ α} for some parameter α ≥ 0
5. Ellipsoidal uncertainty U = {ξ ∈ R^M : (Σ_{i=1}^{M} ξ_i²/σ_i²)^{1/2} ≤ Ω} for some parameter Ω ≥ 0
6. Constraint-wise uncertainty U = U_1 × . . . × U_m, where U_i only affects constraint i,
where conv{ξ^1, . . . , ξ^N} = {Σ_{i=1}^{N} λ_i ξ^i : Σ_{i=1}^{N} λ_i = 1, λ ∈ R^N_+} denotes the convex hull of a set of points. Note that this classification is not exclusive, i.e., a
given uncertainty set can belong to multiple types at the same time.
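For illustration, membership in the interval-based, norm-based, and ellipsoidal sets can be checked with a few lines of Python (a sketch following the definitions above, with made-up data):

```python
import numpy as np

def in_interval_set(xi, lower, upper):
    return bool(np.all(lower <= xi) and np.all(xi <= upper))

def in_norm_set(xi, xi_hat, alpha):
    return bool(np.linalg.norm(xi - xi_hat) <= alpha)

def in_ellipsoidal_set(xi, sigma, omega):
    return bool(np.sqrt(np.sum((xi / sigma) ** 2)) <= omega)

xi = np.array([0.3, -0.2])
print(in_interval_set(xi, np.array([-1.0, -1.0]), np.array([1.0, 1.0])),   # True
      in_norm_set(xi, np.zeros(2), alpha=0.5),                             # True
      in_ellipsoidal_set(xi, sigma=np.array([1.0, 0.1]), omega=1.0))       # False
```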
Fig. 1. The algorithm engineering cycle for robust optimization following [113].
SR(U) = ⋂_{ξ∈U} F(ξ),
The first to consider this type of problem from the perspective of generalized
linear programs was Soyster [118] for uncertainty sets U of type
U = K1 × . . . × K n ,
where the set Ki contains possible column vectors Ai of the coefficient matrix A.
Subsequent works on this topic include [55,123].
However, building this approach into a strong theoretical framework is due
to a series of papers by Ben-Tal, Nemirovski, El Ghaoui and co-workers
[18–20,64]. A summary of their results can be found in the book [14]. Their
basic underlying idea is to hedge against all scenarios that may occur. As they
argue, such an approach makes sense in many settings, e.g., when constructing
a bridge which must be stable, no matter which traffic scenario occurs, or for
airplanes or nuclear power plants. However, this high degree of conservatism of
strict robustness is not applicable to all situations which call for robust solu-
tions. An example for this is timetabling in public transportation: being strictly
robust for a timetable means that all announced arrival and departure times have
to be met, no matter what happens. This may require adding large buffer times,
depending on the uncertainty set used, and thus would not result in a practically
applicable timetable. Such applications triggered research in robust optimization
on ways to relax the concept. We now describe some of these approaches.
a_1 x_1 + . . . + a_n x_n ≤ b
for a given parameter Γ ∈ {0, . . . , n}. Any solution x to this model hence hedges
against all scenarios in which at most Γ many uncertain coefficients may deviate
from their nominal values at the same time.
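For illustration, the worst case of such a constraint for a fixed solution x can be evaluated directly: the adversary spends its Γ allowed deviations on the largest terms. The following Python sketch assumes that each coefficient a_i may deviate from its nominal value by at most d_i (the standard setting of this concept; the deviation bounds are not shown in the excerpt above):

```python
def worst_case_lhs(a_hat, d, x, gamma):
    # Worst-case value of sum_i a_i x_i when each coefficient a_i may deviate from
    # its nominal value a_hat[i] by at most d[i], and at most gamma coefficients
    # deviate simultaneously.  The deviation bounds d are an assumption of this sketch.
    nominal = sum(ah * xi for ah, xi in zip(a_hat, x))
    # The adversary spends its gamma deviations on the largest terms d_i * |x_i|.
    deviations = sorted((di * abs(xi) for di, xi in zip(d, x)), reverse=True)
    return nominal + sum(deviations[:gamma])

# x is feasible for the cardinality constrained counterpart of a*x <= b exactly if
# worst_case_lhs(a_hat, d, x, gamma) <= b.
print(worst_case_lhs([2.0, 3.0, 1.0], [1.0, 1.0, 0.5], [1, 1, 0], gamma=1))  # 5 + 1 = 6
```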
When fixing the here-and-now variables, one has to make sure that for any
possible scenario ξ ∈ U there exists v ∈ X² such that (u, v) is feasible for ξ. The set of adjustable robust solutions is therefore given by the intersection ⋂_{ξ∈U} Pr_{X¹}(F(ξ)), where Pr_{X¹} denotes the projection onto the here-and-now variables X¹.
where w_i models a penalty weight for the violation of constraint i and ρ determines the required nominal quality. We denote by ξ̂ the nominal scenario, as introduced on page 3. In its first application [58], this approach was used as a further development of the concept of cardinality constrained robustness (see Sect. 2.2).
Note that a constraint of the form F(x, ξ) ≤ 0 is equivalent to a constraint λF(x, ξ) ≤ 0 for any λ > 0; therefore, the coefficients w_i play an important role in balancing the allowed violation of the given constraints.
It can be extended by including the recovery costs of a solution x: Let d(A(x, ξ))
be a possible vector-valued function that measures the costs of the recovery, and
let λ ∈ Λ be a limit on the recovery costs, i.e., λ ≥ d(A(x, ξ)) for all ξ ∈ U.
Assume that there is some cost function g : Λ → R associated with λ.
Setting
The concept of regret robustness differs from the other presented robustness
concepts insofar as it usually only considers uncertainty in the objective function.
Instead of minimizing the worst-case performance of a solution, it minimizes the
difference to the objective function of the best solution that would have been
possible in a scenario. In some publications, it is also called deviation robustness.
Let f ∗ (ξ) denote the best objective value in scenario ξ ∈ U. The min-max
regret counterpart of an uncertain optimization problem with uncertainty in the
objective is then given by
(Regret) min sup_{ξ∈U} ( f(x, ξ) − f*(ξ) )
         s.t. F(x) ≤ 0
              x ∈ X.
(UFO) vecmax μ(x)
      s.t. F(x) ≤ 0
           f(x) ≤ (1 + ρ) f*(ξ̂)
           x ∈ X,
where f*(ξ̂) denotes the best objective value of the nominal problem. The authors show that this approach generalizes both stochastic optimization and the concept of cardinality constrained robustness of Bertsimas and Sim.
2.8 Summary
Not only the development of robustness concepts, but also their analysis is data-
driven. This becomes in particular clear when looking at the structure of the
(SR) min z
     s.t. f(x, ξ^i) ≤ z   for i = 1, . . . , N
          F(x, ξ^i) ≤ 0   for i = 1, . . . , N
          x ∈ X.
as the strictly robust counterpart. Reliability and light robustness can be treated
analogously. In all three cases, the robust counterpart keeps many properties
of the original (non-robust) problem formulation: If the original formulation
was, e.g., a linear program, then so is its robust counterpart. The same holds for
differentiability, convexity, and many other properties.
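As a small illustration of this reformulation, the following Python sketch builds and solves (SR) for an uncertain linear program min c(ξ)ᵀx s.t. A(ξ)x ≤ b(ξ), x ≥ 0 with finitely many scenarios, using scipy; the problem data at the end are made up for the example:

```python
import numpy as np
from scipy.optimize import linprog

def strictly_robust_lp(scenarios):
    # scenarios: list of (c, A, b) for min c^T x s.t. A x <= b, x >= 0.
    # Builds (SR): min z  s.t.  c_i^T x - z <= 0  and  A_i x <= b_i  for all i.
    n = len(scenarios[0][0])
    obj = np.zeros(n + 1)            # variables (x_1, ..., x_n, z)
    obj[-1] = 1.0                    # minimize z
    A_ub, b_ub = [], []
    for c, A, b in scenarios:
        A_ub.append(np.append(np.asarray(c, float), -1.0))   # c^T x - z <= 0
        b_ub.append(0.0)
        for row, rhs in zip(A, b):
            A_ub.append(np.append(np.asarray(row, float), 0.0))
            b_ub.append(rhs)
    res = linprog(obj, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=[(0, None)] * n + [(None, None)])
    return res.x[:n], res.x[-1]

# Two scenarios of a small uncertain LP (made-up numbers).
scen = [([1.0, 2.0], [[-1.0, -1.0]], [-1.0]),     # min x1 + 2 x2, s.t. x1 + x2 >= 1
        ([2.0, 1.0], [[-1.0, -2.0]], [-1.0])]     # min 2 x1 + x2, s.t. x1 + 2 x2 >= 1
x, z = strictly_robust_lp(scen)
print(x, z)                                       # approx. [0.5 0.5] and 1.5 for this data
```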
For regret robustness one needs to precompute the best objective function
value for each scenario ξ^i, i = 1, . . . , N in order to again obtain a straightfor-
ward reformulation. Also in adjustable and recoverable robustness mathematical
programming formulations can be derived by adding a wait and see variable, or
a group of recovery variables for each of the scenarios. This usually leads to
a high number of additional variables but is (at least for linear programming)
often still solvable.
Note that the concept of cardinality constrained robustness does not make
much sense for a finite set of scenarios since it concerns the restriction which
scenarios might occur. For a finite set, scenarios in which too many parameters
change can be removed beforehand.
Polytopic Uncertainty. Let f (x, ·) and F (x, ·) be quasiconvex in ξ for any fixed
x ∈ X . Then there are robustness concepts in which the following reduction
result holds: The robust counterpart w.r.t. an uncertainty set U is equiva-
lent to the robust counterpart w.r.t. U′ := conv(U). In such cases the robust
counterpart w.r.t. a polytopic uncertainty set U = conv{ξ 1 , . . . , ξ N } is equiv-
alent to the robust counterpart w.r.t. the finite uncertainty set {ξ 1 , . . . , ξ N },
hence the formulations for finite uncertainty sets can be used to treat polytopic
uncertainties.
We now review for which robustness concepts the reduction result holds. First
of all, this is true for strict robustness. For affine and convex uncertainty this was
mentioned in [18]; the generalization to quasiconvex uncertainty is straightfor-
ward. One of the direct consequences, namely that the robust counterpart of an
uncertain linear program under these conditions is again a linear program was
mentioned in [20]. The same result holds for reliability since the reliable robust
counterpart can be transformed to a strictly robust counterpart by defining
F̃ (x, ξ) = F (x, ξ) − γ. For light robustness, the result is also true, see [115]. For
the case of adjustable robustness, [16] showed that the result holds for prob-
lems with fixed recourse. Otherwise, counterexamples can be constructed. The
generalization to nonlinear two-stage problems and quasiconvex uncertainty is
due to [121]. For recoverable robustness there exist special cases in which the
recovery robust counterpart is equivalent to an adjustable robust counterpart
with fixed recourse. In these cases, the result of [16] may be applied. However,
in general, recoverable robustness does not allow this property. This also holds
for recovery-to-optimality.
For light robustness, it has been shown in [115] that the lightly robust coun-
terpart of a linear program with ellipsoidal uncertainty becomes a quadratic pro-
gram. Ellipsoidal uncertainty could receive more attention also for other robust-
ness concepts (e.g., for regret robustness, which usually only considers finite or
interval-based uncertainty, see [4]), or for adjustable robustness, see [14].
Duality in uncertain programs has been considered as early as 1980, see [123].
In [10], it is shown that “the primal worst equals the dual best”, i.e., under
quite general constraints, the dual of a strictly robust counterpart (a min-max
problem) amounts to optimization under the best case instead of the worst-case
(a max-min problem). Since then, duality in robust optimization has been a vivid
field of research, see, e.g., [82,120]. In the following, we highlight two applications
of duality for robust optimization: One for constraints, and one for objectives.
f (x, ξ) ≤ 0 ∀ξ ∈ U
can be rewritten as
max_{ξ∈U} f(x, ξ) ≤ 0.
This is used in [17]. With a concave function f (x, ·) and an uncertainty set
U = {ξ̂ + Aζ : ζ ∈ Z} with a nonempty, convex and compact set Z, applying duality yields
ξ̂ᵀv + δ*(Aᵀv | Z) − f_*(v, x) ≤ 0,
where δ* is the support function, f_* is a conjugate function, and other technical requirements are met. This gives a very general tool to compute robust counterparts; e.g., a linear constraint of the form f(x, ξ) = ξᵀx − β and Z = {ζ : ‖ζ‖₂ ≤ ρ} yields the counterpart ξ̂ᵀx + ρ‖Aᵀx‖₂ ≤ β.
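For illustration, robust feasibility of such a linear constraint under ellipsoidal uncertainty can be checked directly from the reformulated inequality (a small numpy sketch with made-up data):

```python
import numpy as np

def robustly_feasible(x, xi_hat, A, rho, beta):
    # Checks the reformulated constraint  xi_hat^T x + rho * ||A^T x||_2 <= beta,
    # i.e., xi^T x <= beta for all xi = xi_hat + A zeta with ||zeta||_2 <= rho.
    return float(xi_hat @ x + rho * np.linalg.norm(A.T @ x)) <= beta

x = np.array([1.0, 1.0])
xi_hat = np.array([0.5, 0.5])
A = 0.1 * np.eye(2)
print(robustly_feasible(x, xi_hat, A, rho=1.0, beta=1.2))   # 1.0 + 0.1*sqrt(2) <= 1.2 -> True
```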
min cᵀx
s.t. x ∈ X = {x ∈ {0, 1}ⁿ : Ax ≥ b}
where c^wc(x) denotes the worst-case regret scenario for x, given componentwise as the upper interval bound c̄_i if x_i = 1, and the lower interval bound c̲_i if x_i = 0. Using that the duality gap is zero, we can insert the dual of the inner optimization problem, and get the following equivalent problem:
min c̄ᵀx − bᵀy
s.t. Ax ≥ b
     Aᵀy ≤ (c̄ − c̲)x + c̲
     x ∈ {0, 1}ⁿ, y ∈ Rⁿ₊
This reformulation can then be solved using, e.g., a branch and bound approach.
s.t. F(x, ξ^i) ≤ 0   ∀i = 1, . . . , N
     x ∈ X
Due to the lack of structure in the uncertainty set, these instances can be hard to solve, even though they have a similar structure as the nominal problem.
s.t. F(x) ≤ 0
     x ∈ X
Note that the structure of the nominal problem is preserved, which allows the use of specialized algorithms that are already known. Furthermore, the optimal objective value SRC*(μ) of this problem is a lower bound on the optimal objective value SR* of the strictly robust counterpart; and as the set of feasible solutions is the same, an upper bound is also provided by solving SRC(μ).
This approach is further extended by solving the problem
i.e., by finding the multiplier μ that yields the strongest lower bound. This can
be done using a sub-gradient method.
The lower and upper bounds generated by the surrogate relaxation are then
used within a branch and bound framework on the x variables. The approach
was further improved for the knapsack problem in [79,122].
The cooperative local search algorithm (CLS) works as follows: It first con-
structs a heuristic starting solution, e.g., by a greedy approach. In every iteration,
a set of moves M is constructed using either the generalized search for sets with
small cardinality, or the restricted search for sets with large cardinality. When
a maximum number of iterations is reached, the best feasible solution found so
far is returned.
that the left-hand side is the probability of a probability; this is due to the fact
that V (x) is a random variable in the sampled scenarios. In other words, if a
desired probability of infeasibility is given, the accordingly required sample size
can be determined. This result holds under the assumption that every subset of
scenarios is feasible, and is independent of the probability distribution which is
used for sampling over U.
As the number of scenarios sampled this way may be large, the sequential
optimization approach [61–63] uses sampled scenarios one by one. Using the
above probability estimates, a solution generated by this method is feasible for
(SR) only within a certain probability. The basic idea is the following: We con-
sider the set S(γ) of feasible solutions with respect to a given quality level γ,
i.e., the set of all $x \in X$ with $\nu(\gamma, x, \xi) = 0$ for every $\xi \in U$, where
\[ \nu(\gamma, x, \xi) = \left( \max\{0, f(x) - \gamma\}^2 + \max\{0, F(x, \xi)\}^2 \right)^{1/2}. \]
Using a subgradient on ν, the current solution is updated in every iteration using
the sampled scenario ξ. Lower bounds on the number of required iterations are
given to reach a desired level of solution quality and probability of feasibility.
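The following toy sketch (with made-up data and a single scalar constraint) illustrates the flavor of such a sequential update: in each iteration one scenario is sampled and the current solution is moved along a subgradient of ν. The step size and the iteration count are arbitrary, and no attempt is made to reproduce the bounds of [61–63].

```python
import numpy as np

# Toy data: objective f(x) = c^t x, uncertain constraint F(x, xi) = xi^t x - d <= 0,
# uncertainty set U = [-1, 1]^n, target quality level gamma. All values are made up.
rng = np.random.default_rng(0)
c = np.array([1.0, -0.5, 0.3])
d, gamma, n = 1.0, -0.2, 3

def nu(x, xi):
    # Violation measure combining objective quality and constraint feasibility.
    a = max(0.0, c @ x - gamma)
    b = max(0.0, xi @ x - d)
    return np.sqrt(a * a + b * b)

x = np.zeros(n)
for _ in range(1000):
    xi = rng.uniform(-1.0, 1.0, size=n)        # sample one scenario per iteration
    v = nu(x, xi)
    if v > 0.0:
        # Subgradient of nu at x for the sampled scenario.
        g = (max(0.0, c @ x - gamma) * c + max(0.0, xi @ x - d) * xi) / v
        x = x - 0.05 * g                       # simple fixed-step update

print(x, c @ x)
```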
5.1 Libraries
In the following, we present some libraries that are designed for robust optimiza-
tion. A related overview can also be found in [68].
AIMMS for Robust Optimization. AIMMS [108], which stands for “Advanced
Interactive Multidimensional Modeling System”, is a proprietary software that
contains an algebraic modeling language (AML) for optimization problems.
AIMMS supports most well-known solvers, including Cplex, Xpress, and Gurobi
(see the links collected below).
Since 2010, AIMMS has offered a robust optimization add-on, which was
developed in a partnership with A. Ben-Tal. The extension only considers the
concepts of strict and adjustable robustness as introduced in Sects. 2.1 and 2.3.
As uncertainty sets, interval-based uncertainty sets, polytopic uncertainty sets,
or ellipsoidal uncertainty sets are supported and transformed to mathematical
programs as described in Sect. 3.1. The respective transformations are automat-
ically done when the model is translated from the algebraic modeling language
to the solver.
ROME. While AIMMS focuses on the work of Ben-Tal and co-workers, ROME
[77] (“Robust Optimization Made Easy”) takes its origins in the work of
Bertsimas, Sim and co-workers. ROME is built in the MATLAB environment,
which on the one hand makes it intuitive to use for MATLAB users, but on the
other hand means that it lacks the versatility of an AML. As a research project, ROME is free
to use. It currently supports Cplex, Xpress, and SDPT3 as solver engines.
ROME considers polytopic and ellipsoidal uncertainty sets, that can be fur-
ther specified using the mean support, the covariance matrix, or directional
deviations. Assuming an affine dependence of the wait-and-see variables, it then
transforms the uncertain optimization problem to an adjustable robust counter-
part. The strictly robust counterpart is included as a special case.
Solver and software links: Cplex: http://www-03.ibm.com/software/products/en/ibmilogcpleoptistud; Xpress: http://www.fico.com/en/products/fico-xpress-optimization-suite; Gurobi: http://www.gurobi.com/; MATLAB: http://www.mathworks.com/products/matlab/; SDPT3: http://www.math.nus.edu.sg/~mattohkc/sdpt3.html
5.2 Applications
As already stated, robust optimization has been application-driven; thus, there
are abundant papers dealing with applications of some robustness approach to
real-world or at least realistic problems. Presenting an exhaustive list would go
far beyond the scope of this paper; examples include circuit design [94], emer-
gency logistics [13], and load planning [32] for adjustable robustness; supply chain
optimization [29] and furniture planning [5] for cardinality constrained robust-
ness; inventory control for comprehensive robustness [15]; timetabling [58,60],
and timetable information [71] for light robustness; shunting [43], timetabling
[44,73], and railway rolling stock planning [36] for recoverable robustness; and
airline scheduling for UFO [52].
Hence, we can state:
Remark 4. Robust optimization is application-driven.
In this section we consider research that either compares two robustness concepts
applied to the same problem, or two algorithms for the same problem and robustness
concept. We present a list of papers on the former aspect in Table 1, and a list of
papers on the latter aspect in Table 2. We do not claim completeness for these
tables; rather, they should be considered as giving a general impression on recent
directions of research.
We conclude the following from these tables and the accompanying litera-
ture: Firstly, papers considering real-world applications that compare different
robustness concepts are relatively rare. Applied studies are too often satisfied
with considering only one of the many possible approaches. Secondly, algorithmic
comparisons predominantly stem from the field of min-max regret, where
at the same time mostly academic problems are considered. The efficient cal-
culation of solutions for other robustness concepts is still a relatively open and
promising field of research. Summarizing, we claim that:
Table 2. Papers presenting experiments comparing at least two algorithms for the
same robustness concept. “cc” abbreviates “cardinality constrained”.
improved behavior for the application at hand. This is very similar to frequently
published papers on optimization problems which compare a problem-specific
method to a generic MIP solver, usually observing a better performance of the
former compared to the latter.
However, while a standard MIP solver is often still competitive with problem-
tailored algorithms, a robustness concept which does not capture the problem
specifics at hand will nearly always be the second choice to one which uses the
full problem potential.
What becomes immediately obvious is that these limits are much smaller than
for their nominal problem counterparts, which can easily go into the millions.
students who are new to the field have a hard time identifying the state of
the art.
3. Performance guarantees are not sufficiently researched in robust optimiza-
tion. This point, too, can be regarded as related to robust optimization being
application-driven and non-unified. Performance guarantees are of special
importance when comparing algorithms; hence, with a lack of comparison,
there also comes a lack of performance guarantees. This includes the compar-
ison of robust optimization concepts, of robust optimization algorithms, and
even the general evaluation of a robust solution compared to a non-robust
solution.
4. There is no robust optimization library available with specifically designed
algorithms other than reformulation approaches. While libraries for robust
optimization exist, they concentrate on the modeling aspects of uncertainty,
and less on different algorithmic approaches. Having such a library avail-
able would prove extremely helpful not only for practitioners, but also for
researchers who develop new algorithms and try to compare them to the state
of the art.
5. There are too few comparative studies in robust optimization. All the above
points culminate in the lack of comparative studies; however, we argue that
here also lies a chance to tackle these problems. This paper is a humble step
to motivate such research, and we hope for many more publications to come.
References
1. Adasme, P., Lisser, A., Soto, I.: Robust semidefinite relaxations for a quadratic
OFDMA resource allocation scheme. Comput. Oper. Res. 38(10), 1377–1399
(2011)
2. Agra, A., Christiansen, M., Figueiredo, R., Hvattum, L.M., Poss, M., Requejo,
C.: The robust vehicle routing problem with time windows. Comput. Oper. Res.
40(3), 856–866 (2013)
3. Aissi, H., Bazgan, C., Vanderpooten, D.: Approximation of min–max and min–
max regret versions of some combinatorial optimization problems. Eur. J. Oper.
Res. 179(2), 281–290 (2007)
4. Aissi, H., Bazgan, C., Vanderpooten, D.: Min–max and min–max regret versions
of combinatorial optimization problems: a survey. Eur. J. Oper. Res. 197(2),
427–438 (2009)
5. Alem, D.J., Morabito, R.: Production planning in furniture settings via robust
optimization. Comput. Oper. Res. 39(2), 139–150 (2012)
6. Arrival project under contract no. FP6-021235-2. http://arrival.cti.gr/index.php/
Main/HomePage
7. Atamtürk, A.: Strong formulations of robust mixed 0–1 programming. Math. Pro-
gram. 108(2), 235–250 (2006)
8. Averbakh, I.: The minmax regret permutation flow-shop problem with two jobs.
Eur. J. Oper. Res. 169(3), 761–766 (2006)
9. Averbakh, I., Berman, O.: Algorithms for the robust 1-center problem on a tree.
Eur. J. Oper. Res. 123(2), 292–302 (2000)
10. Beck, A., Ben-Tal, A.: Duality in robust optimization: primal worst equals dual
best. Oper. Res. Lett. 37(1), 1–6 (2009)
11. Ben-Tal, A., Bertsimas, D., Brown, D.B.: A soft robust model for optimization
under ambiguity. Oper. Res. 58(4–Part–2), 1220–1234 (2010)
12. Ben-Tal, A., Boyd, S., Nemirovski, A.: Extending scope of robust optimiza-
tion: comprehensive robust counterparts of uncertain problems. Math. Program.
107(1–2), 63–89 (2006)
13. Ben-Tal, A., Chung, B.D., Mandala, S.R., Yao, T.: Robust optimization for emer-
gency logistics planning: risk mitigation in humanitarian relief supply chains.
Transp. Res. Part B: Methodol. 45(8), 1177–1189 (2011)
14. Ben-Tal, A., Ghaoui, L.E., Nemirovski, A.: Robust Optimization. Princeton Uni-
versity Press, Princeton (2009)
15. Ben-Tal, A., Golany, B., Shtern, S.: Robust multi-echelon multi-period inventory
control. Eur. J. Oper. Res. 199(3), 922–935 (2009)
16. Ben-Tal, A., Goryashko, A., Guslitzer, E., Nemirovski, A.: Adjustable robust
solutions of uncertain linear programs. Math. Program. A 99, 351–376 (2003)
17. Ben-Tal, A., den Hertog, D., Vial, J.P.: Deriving robust counterparts of nonlinear
uncertain inequalities. Math. Program. 149(1), 265–299 (2015)
18. Ben-Tal, A., Nemirovski, A.: Robust convex optimization. Math. Oper. Res.
23(4), 769–805 (1998)
19. Ben-Tal, A., Nemirovski, A.: Robust solutions of uncertain linear programs. Oper.
Res. Lett. 25, 1–13 (1999)
20. Ben-Tal, A., Nemirovski, A.: Robust solutions of linear programming problems
contaminated with uncertain data. Math. Program. A 88, 411–424 (2000)
21. Berman, O., Wang, J.: The minmax regret gradual covering location problem on
a network with incomplete information of demand weights. Eur. J. Oper. Res.
208(3), 233–238 (2011)
22. Bertsimas, D., Brown, D., Caramanis, C.: Theory and applications of robust opti-
mization. SIAM Rev. 53(3), 464–501 (2011)
23. Bertsimas, D., Caramanis, C.: Finite adaptability in multistage linear optimiza-
tion. IEEE Trans. Autom. Control 55(12), 2751–2766 (2010)
24. Bertsimas, D., Goyal, V.: On the power of robust solutions in two-stage stochastic
and adaptive optimization problems. Math. Oper. Res. 35(2), 284–305 (2010)
25. Bertsimas, D., Goyal, V., Sun, X.A.: A geometric characterization of the power
of finite adaptability in multistage stochastic and adaptive optimization. Math.
Oper. Res. 36(1), 24–54 (2011)
26. Bertsimas, D., Pachamanova, D.: Robust multiperiod portfolio management in
the presence of transaction costs. Comput. Oper. Res. 35(1), 3–17 (2008)
27. Bertsimas, D., Pachamanova, D., Sim, M.: Robust linear optimization under gen-
eral norms. Oper. Res. Lett. 32(6), 510–516 (2004)
28. Bertsimas, D., Sim, M.: The price of robustness. Oper. Res. 52(1), 35–53 (2004)
29. Bertsimas, D., Thiele, A.: A robust optimization approach to inventory theory.
Oper. Res. 54(1), 150–168 (2006)
30. Bohle, C., Maturana, S., Vera, J.: A robust optimization approach to wine grape
harvesting scheduling. Eur. J. Oper. Res. 200(1), 245–252 (2010)
31. Bouman, P.C., Akker, J.M., Hoogeveen, J.A.: Recoverable robustness by col-
umn generation. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011.
LNCS, vol. 6942, pp. 215–226. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-23719-5 19
32. Bruns, F., Goerigk, M., Knust, S., Schöbel, A.: Robust load planning of trains in
intermodal transportation. OR Spectr. 36(3), 631–668 (2014)
33. Bürger, M., Notarstefano, G., Allgöwer, F.: A polyhedral approximation frame-
work for convex and robust distributed optimization. IEEE Trans. Autom. Con-
trol 59(2), 384–395 (2014)
34. Büsing, C., Koster, A.M.C.A., Kutschka, M.: Recoverable robust knapsacks: the
discrete scenario case. Optim. Lett. 5(3), 379–392 (2011)
35. Büsing, C., Koster, A., Kutschka, M.: Recoverable robust knapsacks: Γ -scenarios.
In: Pahl, J., Reiners, T., Voß, S. (eds.) Network Optimization. Lecture Notes in
Computer Science, vol. 6701, pp. 583–588. Springer, Heidelberg (2011)
36. Cacchiani, V., Caprara, A., Galli, L., Kroon, L., Maroti, G., Toth, P.: Railway
rolling stock planning: robustness against large disruptions. Transp. Sci. 46(2),
217–232 (2012)
37. Calafiore, G., Campi, M.: Uncertain convex programs: randomized solutions and
confidence levels. Math. Program. 102(1), 25–46 (2005)
38. Calafiore, G.C.: Random convex programs. SIAM J. Optim. 20(6), 3427–3464
(2010)
39. Calafiore, G., Campi, M.: The scenario approach to robust control design. IEEE
Trans. Autom. Control 51(5), 742–753 (2006)
40. Campi, M.C., Garatti, S.: The exact feasibility of randomized solutions of uncer-
tain convex programs. SIAM J. Optim. 19(3), 1211–1230 (2008)
41. Catanzaro, D., Labbé, M., Salazar-Neumann, M.: Reduction approaches for
robust shortest path problems. Comput. Oper. Res. 38(11), 1610–1619 (2011)
42. Chassein, A.B., Goerigk, M.: A new bound for the midpoint solution in minmax
regret optimization with an application to the robust shortest path problem. Eur.
J. Oper. Res. 244(3), 739–747 (2015)
43. Cicerone, S., D’Angelo, G., Stefano, G.D., Frigioni, D., Navarra, A.: Robust algo-
rithms and price of robustness in shunting problems. In: Proceedings of the 7th
Workshop on Algorithmic Approaches for Transportation Modeling, Optimiza-
tion, and Systems (ATMOS 2007) (2007)
44. Cicerone, S., D’Angelo, G., Stefano, G., Frigioni, D., Navarra, A., Schachtebeck,
M., Schöbel, A.: Recoverable robustness in shunting and timetabling. In: Ahuja,
R.K., Möhring, R.H., Zaroliagis, C.D. (eds.) Robust and Online Large-Scale Opti-
mization. LNCS, vol. 5868, pp. 28–60. Springer, Heidelberg (2009). doi:10.1007/
978-3-642-05465-5 2
45. Conde, E.: Minmax regret location–allocation problem on a network under uncer-
tainty. Eur. J. Oper. Res. 179(3), 1025–1039 (2007)
46. Conde, E.: A minmax regret approach to the critical path method with task
interval times. Eur. J. Oper. Res. 197(1), 235–242 (2009)
47. Conde, E.: On a constant factor approximation for minmax regret problems using
a symmetry point scenario. Eur. J. Oper. Res. 219(2), 452–457 (2012)
48. Conde, E., Candia, A.: Minimax regret spanning arborescences under uncertain
costs. Eur. J. Oper. Res. 182(2), 561–577 (2007)
49. D’Angelo, G., Di Stefano, G., Navarra, A.: Recoverable-robust timetables for
trains on single-line corridors. In: Proceedings of the 3rd International Seminar
on Railway Operations Modelling and Analysis - Engineering and Optimisation
Approaches (RailZurich 2009) (2009)
50. Delling, D., Sanders, P., Schultes, D., Wagner, D.: Engineering route planning
algorithms. In: Lerner, J., Wagner, D., Zweig, K.A. (eds.) Algorithmics of Large
and Complex Networks. Lecture Notes in Computer Science, vol. 5515, pp. 117–
139. Springer, Heidelberg (2009)
51. Dhamdhere, K., Goyal, V., Ravi, R., Singh, M.: How to pay, come what may:
approximation algorithms for demand-robust covering problems. In: 46th Annual
IEEE Symposium on Foundations of Computer Science, FOCS 2005, pp. 367–376.
IEEE (2005)
52. Eggenberg, N.: Combining robustness and recovery for airline schedules. Ph.D.
thesis, EPFL (2009)
53. Eggenberg, N., Salani, M., Bierlaire, M.: Uncertainty feature optimization: an
implicit paradigm for problems with noisy data. Networks 57(3), 270–284 (2011)
54. Erera, A., Morales, J., Savelsbergh, M.: Robust optimization for empty reposi-
tioning problems. Oper. Res. 57(2), 468–483 (2009)
55. Falk, J.E.: Exact solutions of inexact linear programs. Oper. Res. 24(4), 783–787
(1976)
56. de Farias, J.R., Zhao, H., Zhao, M.: A family of inequalities valid for the robust
single machine scheduling polyhedron. Comput. Oper. Res. 37(9), 1610–1614
(2010)
57. Feige, U., Jain, K., Mahdian, M., Mirrokni, V.: Robust combinatorial optimiza-
tion with exponential scenarios. In: Fischetti, M., Williamson, D.P. (eds.) IPCO
2007. LNCS, vol. 4513, pp. 439–453. Springer, Heidelberg (2007). doi:10.1007/
978-3-540-72792-7 33
58. Fischetti, M., Monaci, M.: Light robustness. In: Ahuja, R.K., Möhring, R.H.,
Zaroliagis, C.D. (eds.) Robust and Online Large-Scale Optimization. LNCS, vol.
5868, pp. 61–84. Springer, Heidelberg (2009). doi:10.1007/978-3-642-05465-5 3
59. Fischetti, M., Monaci, M.: Cutting plane versus compact formulations for uncer-
tain (integer) linear programs. Math. Program. Comput. 4(3), 239–273 (2012)
60. Fischetti, M., Salvagnin, D., Zanette, A.: Fast approaches to improve the robust-
ness of a railway timetable. Transp. Sci. 43, 321–335 (2009)
61. Fujisaki, Y., Wada, T.: Sequential randomized algorithms for robust optimization.
In: Proceedings of the 46th IEEE Conference on Decision and Control, pp. 6190–
6195 (2007)
62. Fujisaki, Y., Wada, T.: Robust optimization via probabilistic cutting plane tech-
nique. In: Proceedings of the 40th ISCIE International Symposium on Stochastic
Systems Theory and its Applications, pp. 137–142 (2009)
63. Fujisaki, Y., Wada, T.: Robust optimization via randomized algorithms. In:
ICCAS-SICE 2009, pp. 1226–1229 (2009)
64. Ghaoui, L.E., Lebret, H.: Robust solutions to least-squares problems with uncer-
tain data. SIAM J. Matrix Anal. Appl. 18, 1034–1064 (1997)
65. Goerigk, M.: Algorithms and concepts for robust optimization. Ph.D. thesis,
Georg-August Universität Göttingen (2012)
66. Goerigk, M.: ROPI homepage (2013). http://optimierung.mathematik.uni-kl.de/
∼goerigk/ropi/
67. Goerigk, M.: A note on upper bounds to the robust knapsack problem with dis-
crete scenarios. Ann. Oper. Res. 223(1), 461–469 (2014)
68. Goerigk, M.: ROPI - a robust optimization programming interface for C++.
Optim. Methods Softw. 29(6), 1261–1280 (2014)
69. Goerigk, M., Deghdak, K., T’Kindt, V.: A two-stage robustness approach to evac-
uation planning with buses. Transp. Res. Part B: Methodol. 78, 66–82 (2015)
70. Goerigk, M., Heße, S., Müller-Hannemann, M., Schmidt, M., Schöbel, A.: Recov-
erable robust timetable information. In: Frigioni, D., Stiller, S. (eds.) 13th Work-
shop on Algorithmic Approaches for Transportation Modelling, Optimization, and
Systems. OpenAccess Series in Informatics (OASIcs), vol. 33, pp. 1–14. Schloss
Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2013)
71. Goerigk, M., Knoth, M., Müller-Hannemann, M., Schmidt, M., Schöbel, A.: The
price of robustness in timetable information. In: Caprara, A., Kontogiannis, S.
(eds.) 11th Workshop on Algorithmic Approaches for Transportation Modelling,
Optimization, and Systems. OpenAccess Series in Informatics (OASIcs), vol. 20,
pp. 76–87. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2011)
72. Goerigk, M., Schmidt, M., Schöbel, A., Knoth, M., Müller-Hannemann, M.: The
price of strict and light robustness in timetable information. Transp. Sci. 48(2),
225–242 (2014)
73. Goerigk, M., Schöbel, A.: An empirical analysis of robustness concepts for
timetabling. In: Erlebach, T., Lübbecke, M. (eds.) 10th Workshop on Algorithmic
Approaches for Transportation Modelling, Optimization, and Systems (ATMOS
2010). OpenAccess Series in Informatics (OASIcs), vol. 14, pp. 100–113. Schloss
Dagstuhl-Leibniz-Zentrum fuer Informatik, Dagstuhl (2010)
74. Goerigk, M., Schöbel, A.: A scenario-based approach for robust linear
optimization. In: Marchetti-Spaccamela, A., Segal, M. (eds.) TAPAS 2011.
LNCS, vol. 6595, pp. 139–150. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-19754-3 15
75. Goerigk, M., Schöbel, A.: Recovery-to-optimality: a new two-stage approach to
robustness with an application to aperiodic timetabling. Comput. Oper. Res.
52(Part A), 1–15 (2014)
76. Goetzmann, K.S., Stiller, S., Telha, C.: Optimization over integers with robustness
in cost and few constraints. In: Solis-Oba, R., Persiano, G. (eds.) Approximation
and Online Algorithms (WAOA 2011). Lecture Notes in Computer Science, vol.
7164, pp. 89–101. Springer, Heidelberg (2012)
77. Goh, J., Sim, M.: Robust optimization made easy with ROME. Oper. Res. 59(4),
973–985 (2011)
78. Golovin, D., Goyal, V., Polishchuk, V., Ravi, R., Sysikaski, M.: Improved approx-
imations for two-stage min-cut and shortest path problems under uncertainty.
Math. Program. 149(1), 167–194 (2015)
79. Iida, H.: A note on the max-min 0–1 knapsack problem. J. Comb. Optim. 3(1),
89–94 (1999)
80. Inuiguchi, M., Sakawa, M.: Minimax regret solution to linear programming prob-
lems with an interval objective function. Eur. J. Oper. Res. 86(3), 526–536 (1995)
81. Jenkins, L.: Selecting scenarios for environmental disaster planning. Eur. J. Oper.
Res. 121(2), 275–286 (2000)
82. Jeyakumar, V., Li, G., Srisatkunarajah, S.: Strong duality for robust minimax
fractional programming problems. Eur. J. Oper. Res. 228(2), 331–336 (2013)
83. Kalai, R., Lamboray, C., Vanderpooten, D.: Lexicographic α-robustness: an alter-
native to min–max criteria. Eur. J. Oper. Res. 220(3), 722–728 (2012)
84. Kasperski, A., Kurpisz, A., Zieliński, P.: Approximating a two-machine flow shop
scheduling under discrete scenario uncertainty. Eur. J. Oper. Res. 217(1), 36–43
(2012)
85. Kasperski, A., Makuchowski, M., Zieliński, P.: A tabu search algorithm for the
minmax regret minimum spanning tree problem with interval data. J. Heuristics
18(4), 593–625 (2012)
86. Kasperski, A., Zieliński, P.: An approximation algorithm for interval data minmax
regret combinatorial optimization problems. Inf. Process. Lett. 97(5), 177–180
(2006)
87. Kasperski, A., Zieliński, P.: Minmax regret approach and optimality evaluation
in combinatorial optimization problems with interval and fuzzy weights. Eur. J.
Oper. Res. 200(3), 680–687 (2010)
88. Khandekar, R., Kortsarz, G., Mirrokni, V., Salavatipour, M.R.: Two-stage robust
network design with exponential scenarios. Algorithmica 65(2), 391–408 (2013)
89. Kouvelis, P., Yu, G.: Robust Discrete Optimization and Its Applications. Kluwer
Academic Publishers, Boston (1997)
90. Liebchen, C., Lübbecke, M., Möhring, R., Stiller, S.: The concept of recoverable
robustness, linear programming recovery, and railway applications. In: Ahuja,
R.K., Möhring, R.H., Zaroliagis, C.D. (eds.) Robust and Online Large-Scale Opti-
mization. LNCS, vol. 5868, pp. 1–27. Springer, Heidelberg (2009). doi:10.1007/
978-3-642-05465-5 1
91. Lin, J., Ng, T.S.: Robust multi-market newsvendor models with interval demand
data. Eur. J. Oper. Res. 212(2), 361–373 (2011)
92. Löfberg, J.: Automatic robust convex programming. Optim. Methods Softw.
27(1), 115–129 (2012)
93. López, M., Still, G.: Semi-infinite programming. Eur. J. Oper. Res. 180(2), 491–
518 (2007)
94. Mani, M., Sing, A.K., Orshansky, M.: Joint design-time and post-silicon mini-
mization of parametric yield loss using adjustable robust optimization. In: Pro-
ceedings of the 2006 IEEE/ACM International Conference on Computer-Aided
Design, ICCAD 2006, pp. 19–26. ACM, New York (2006)
95. Mausser, H.E., Laguna, M.: A heuristic to minimax absolute regret for linear
programs with interval objective function coefficients. Eur. J. Oper. Res. 117(1),
157–174 (1999)
96. Mausser, H., Laguna, M.: A new mixed integer formulation for the maximum
regret problem. Int. Trans. Oper. Res. 5(5), 389–403 (1998)
97. Monaci, M., Pferschy, U.: On the robust knapsack problem. SIAM J. Optim.
23(4), 1956–1982 (2013)
98. Monaci, M., Pferschy, U., Serafini, P.: Exact solution of the robust knapsack
problem. Comput. Oper. Res. 40(11), 2625–2631 (2013)
99. Montemanni, R.: A Benders decomposition approach for the robust spanning tree
problem with interval data. Eur. J. Oper. Res. 174(3), 1479–1490 (2006)
100. Montemanni, R., Gambardella, L.: An exact algorithm for the robust shortest
path problem with interval data. Comput. Oper. Res. 31(10), 1667–1680 (2004)
101. Montemanni, R., Gambardella, L.: A branch and bound algorithm for the robust
spanning tree problem with interval data. Eur. J. Oper. Res. 161(3), 771–779
(2005)
102. Müller-Hannemann, M., Schirra, S. (eds.): Algorithm Engineering. LNCS, vol.
5971. Springer, Heidelberg (2010)
103. Mulvey, J.M., Vanderbei, R.J., Zenios, S.A.: Robust optimization of large-scale
systems. Oper. Res. 43(2), 264–281 (1995)
104. Mutapcic, A., Boyd, S.: Cutting-set methods for robust convex optimization with
pessimizing oracles. Optim. Methods Softw. 24(3), 381–406 (2009)
105. Ng, T.S., Sun, Y., Fowler, J.: Semiconductor lot allocation using robust optimiza-
tion. Eur. J. Oper. Res. 205(3), 557–570 (2010)
106. Nikulin, Y.: Simulated annealing algorithm for the robust spanning tree problem.
J. Heuristics 14(4), 391–402 (2008)
107. Ouorou, A.: Tractable approximations to a robust capacity assignment model in
telecommunications under demand uncertainty. Comput. Oper. Res. 40(1), 318–
327 (2013)
108. Paragon Decision Company: AIMMS - The Language Reference, Version 3.12,
March 2012
109. Pereira, J., Averbakh, I.: Exact and heuristic algorithms for the interval data
robust assignment problem. Comput. Oper. Res. 38(8), 1153–1163 (2011)
110. Pérez-Galarce, F., Álvarez-Miranda, E., Candia-Véjar, A., Toth, P.: On exact
solutions for the minmax regret spanning tree problem. Comput. Oper. Res. 47,
114–122 (2014)
111. Reemtsen, R.: Some outer approximation methods for semi-infinite optimization
problems. J. Comput. Appl. Math. 53(1), 87–108 (1994)
112. Roy, B.: Robustness in operational research and decision aiding: a multi-faceted
issue. Eur. J. Oper. Res. 200(3), 629–638 (2010)
113. Sanders, P.: Algorithm engineering – an attempt at a definition. In: Albers, S.,
Alt, H., Näher, S. (eds.) Efficient Algorithms. LNCS, vol. 5760, pp. 321–340.
Springer, Heidelberg (2009). doi:10.1007/978-3-642-03456-5 22
114. Sbihi, A.: A cooperative local search-based algorithm for the multiple-scenario
max–min knapsack problem. Eur. J. Oper. Res. 202(2), 339–346 (2010)
115. Schöbel, A.: Generalized light robustness and the trade-off between robustness
and nominal quality. Math. Methods Oper. Res. 80(2), 161–191 (2014)
116. Siddiqui, S., Azarm, S., Gabriel, S.: A modified Benders decomposition method
for efficient robust optimization under interval uncertainty. Struct. Multidiscip.
Optim. 44(2), 259–275 (2011)
117. Song, X., Lewis, R., Thompson, J., Wu, Y.: An incomplete m-exchange algorithm
for solving the large-scale multi-scenario knapsack problem. Comput. Oper. Res.
39(9), 1988–2000 (2012)
118. Soyster, A.: Convex programming with set-inclusive constraints and applications
to inexact linear programming. Oper. Res. 21, 1154–1157 (1973)
119. Stiller, S.: Extending concepts of reliability. Network creation games, real-time
scheduling, and robust optimization. Ph.D. thesis, TU Berlin (2008)
120. Suzuki, S., Kuroiwa, D., Lee, G.M.: Surrogate duality for robust optimization.
Eur. J. Oper. Res. 231(2), 257–262 (2013)
121. Takeda, A., Taguchi, S., Tütüncü, R.: Adjustable robust optimization models for
a nonlinear two-period system. J. Optim. Theory Appl. 136, 275–295 (2008)
122. Taniguchi, F., Yamada, T., Kataoka, S.: Heuristic and exact algorithms for the
max–min optimization of the multi-scenario knapsack problem. Comput. Oper.
Res. 35(6), 2034–2048 (2008)
123. Thuente, D.J.: Duality theory for generalized linear programs with computational
methods. Oper. Res. 28(4), 1005–1011 (1980)
124. Velarde, J.L.G., Martí, R.: Adaptive memory programing for the robust capaci-
tated international sourcing problem. Comput. Oper. Res. 35(3), 797–806 (2008)
125. Xu, J., Johnson, M.P., Fischbeck, P.S., Small, M.J., VanBriesen, J.M.: Robust
placement of sensors in dynamic water distribution systems. Eur. J. Oper. Res.
202(3), 707–716 (2010)
126. Yaman, H., Karaşan, O.E., Pınar, M.Ç.: The robust spanning tree problem with
interval data. Oper. Res. Lett. 29(1), 31–40 (2001)
127. Yin, Y., Madanat, S.M., Lu, X.Y.: Robust improvement schemes for road networks
under demand uncertainty. Eur. J. Oper. Res. 198(2), 470–479 (2009)
128. Zanjani, M.K., Ait-Kadi, D., Nourelfath, M.: Robust production planning in a
manufacturing environment with random yield: a case in sawmill production plan-
ning. Eur. J. Oper. Res. 201(3), 882–891 (2010)
129. Zieliński, P.: The computational complexity of the relative robust shortest path
problem with interval data. Eur. J. Oper. Res. 158(3), 570–576 (2004)
Clustering Evolving Networks
1 Introduction
Clustering is a powerful tool to examine the structure of various data. Since in
many fields data often entails an inherent network structure or directly derives
from physical or virtual networks, clustering techniques that explicitly build on
the information given by links between entities recently received great atten-
tion [58,130]. Moreover, many real world networks are continuously evolving,
which makes it even more challenging to explore their structure. Examples for
evolving networks include networks based on mobile communication data, sci-
entific publication data, and data on human interaction.
The structure that is induced by the entities of a network together with the
links between them is often called a graph; the entities are called vertices and the links
are called edges. However, the terms graph and network are often used inter-
changeably. The structural features that are classically addressed by graph clus-
tering algorithms are subsets of vertices that are linked significantly more strongly to
each other than to vertices outside the subset. In the context of mobile com-
munication networks this could be, for example, groups of cellphone users that
call each other more frequently than others. Depending on the application and
the type of the underlying network, searching for this kind of subsets has many
different names. Sociologists usually speak about community detection or com-
munity mining in social networks, in the context of communication networks
like Twitter, people aim at detecting emerging topics while in citations networks
the focus is on the identification of research areas, to name but a few. All these
issues can be solved by modeling the data as an appropriate graph and apply-
ing graph clustering. The found sets (corresponding to communities, topics or
research areas) are then called clusters and the set of clusters is called a cluster-
ing. We further remark that also beyond sociology the term community is often
used instead of cluster [106]. The notion of clusters or communities as densely
connected subgroups that are only sparsely connected to each other has led to
the paradigm of intracluster density versus intercluster sparsity in the field of
graph clustering.
Nevertheless, the notion of a clustering given so far still leaves room for many
different formal definitions. Most commonly, a clustering of a graph is defined
as a partition of the vertex set into subsets, which form the clusters. In some
scenarios (e.g., outlier detection) it is, however, undesirable that each vertex is
assigned to a cluster. In this case, a clustering not necessarily forms a partition
of the vertex set, but leaves some vertices unclustered. Yet both concepts are
based on disjoint vertex sets, and the latter can be easily transformed into the
former one by just considering each vertex that is not a member of a cluster
as a cluster consisting of exactly one vertex. Other applications further admit
overlapping clusters, again with or without a complete assignment of the vertices
to clusters.
In this survey we give an overview of recent graph clustering approaches that
aim at finding disjoint or overlapping clusters in evolving graphs. The evolution
of the graph is usually modeled following one of two common concepts: The
first concept is based on a series of snapshots of the graph, where each snapshot
corresponds to a time step, and the difference between two consecutive snapshots
results from a bunch of edge and vertex changes. The second concept considers
a given stream of atomic edge and vertex changes, where each change induces
a new snapshot and a new time step. The primary objective of clustering such
networks is to find a meaningful clustering for each snapshot. Some algorithms
further aim at a particularly fast computation of these clusterings, others assume
that changes have only a small impact on the community structure in each
time step, and thus, aim at clusterings that do not differ too much between consecutive
time steps. The latter was introduced as temporal smoothness by Chakrabarti
et al. [32] in the context of clustering evolving attributed data (instead of graphs).
In order to achieve these goals, online algorithms explicitly exploit information
about the graph structure and the community structure of previous time steps.
Algorithms that further use structural information from following time steps
are called offline. In this survey, we consider only online algorithms that can be
roughly separated into two classes. The first class contains clustering approaches
that incorporate temporal smoothness inspired by Chakrabarti et al. Most of
these approaches are based on an existing static clustering algorithm, which is
executed from scratch in each time step (cf. Fig. 1). In contrast, the approaches
in the second class dynamically update clusterings found in previous time steps
without a computation from scratch (cf. Fig. 2).
Apart from finding an appropriate clustering in each snapshot of an evolv-
ing graph, many applications require further steps in order to make the found
clusterings interpretable and usable for further analysis. A first natural question
directly resulting from the evolution of the graph is how the found clusters or
Fig. 2. Dynamic update strategy for clusterings in evolving graphs. Vertical dashed
arrows indicate the use of information, horizontal arrows indicate the dynamic update
strategy based on previous time steps.
communities evolve over time and at what time steps events like cluster merging
or cluster splitting occur. In order to answer this question, the clusters need to be
tracked over time, thereby finding sequences of snapshots where certain clusters
remain stable while other clusters may split or merge. In this context, clusters
or communities of a single snapshot are often called local in order to distinguish
them from sequences of associated (local) communities in consecutive snapshots,
which describe the evolution of a certain (meta)community over time. When the
evolution of the clusters is supposed to be interpreted by human experts, it is
further necessary to present the algorithmic results in a clear and readable form.
Hence, the visualization of evolving clusters is another central issue in the con-
text of clustering evolving graphs. The evaluation of found clusterings is finally
an issue regarding the design of good clustering algorithms. There are many
open questions on how to choose an appropriate evaluation scheme in order to
get credible and comparable results. We discuss these issues in more detail in
Sect. 1.1 and give a brief idea on applications based on clustering evolving graphs
in Sect. 1.3.
Delimitation. Apart from clustering approaches that follow the intracluster den-
sity versus intercluster sparsity paradigm, there exist various further approaches
that look very similar at a first glance but turn out to have a different focus.
Very closely related to graph clustering are algorithms for graph partition-
ing [20]. In contrast to many graph clustering algorithms, graph partitioning
always assumes that the number of clusters is an input parameter, most often, a
power of 2, and seeks to minimize the number of edges cut by a partition, such
that the parts have (almost) equal size. Its main application is not network analy-
sis but the preprocessing of graphs for parallel computing tasks. The dynamic
counterpart to static graph partitioning is often called repartitioning or load bal-
ancing [29,39,96]. Another area similar to clustering evolving graphs is clustering
graph streams [4,160]. Similar to consecutive graph snapshots, a graph stream is
a sequence of consecutively arriving graphs, but instead of finding a clustering
of the vertices in each graph or snapshot, the aim is to detect groups of similar
graphs. The term streaming algorithm usually refers to algorithms that process
the data in one or few passes under the restriction of limited memory availability,
like for example the partitioning algorithm by Stanton and Kliot [138]. However,
some authors also use the adjective streaming or the term stream model in the
context of graph changes in order to describe consecutive atomic changes [9,49].
A further task is the search for stable subgraphs in a given time interval in
an evolving network, i.e., subgraphs that change only slightly during the whole
interval. Depending on the formal definition of stability these subgraphs have
various other names, like heavy subgraphs or high-score subgraphs [23]. Pattern
Mining in evolving graphs is focused on frequently occurring subgraphs, independent
of their density [24].
Intention and Outline. In this survey, we introduce some of the current graph
clustering approaches for evolving networks that operate in an online scenario.
All approaches have in common that they use structural information from the
previous time steps in order to generate a meaningful clustering for the snap-
shot of the current time step. In doing so, some approaches focus on temporal
smoothness, while other approaches aim at a fast running time and a few even
achieve both.
In contrast to existing surveys on graph clustering, we focus on online algo-
rithms in evolving networks. A very detailed and well-founded presentation of
algorithmic aspects in static (non-evolving) and evolving graphs is further given
in the thesis of Görke [66]. For an overview on clustering techniques in static
graphs see also Schaeffer [130] and Fortunato [58]. The latter also provides a
short abstract on clustering evolving graphs. Aynaud et al. [12] explicitly con-
sider clustering approaches in evolving graphs, however, they do not focus on
the algorithmic aspect of reusing structural information in an online scenario.
Aggarwal and Subbian [2] give an overview on analysis methods where clustering
of slowly and fast evolving networks is one subject among others. Finally, Bilgin
and Yener [21] consider evolving networks from a more general perspective. They
also provide a section on “Clustering Dynamic Graphs”, however, the emphasis
of this section is the above mentioned idea of clustering graph streams.
This survey is organized as follows. In Sect. 1.1, we discuss the above men-
tioned main issues related to clustering evolving networks in more detail. In
Sect. 1.2 we provide an overview on popular quality and distance measures for
clusterings. The former are of course used for the evaluation of clusterings but
also in the context of algorithm design. The latter are applied for the evaluation
as well as for cluster tracking and event detection. We conclude the introduc-
tion by a brief idea on applications in Sect. 1.3. The main part of this survey is
presented in Sect. 2, where we introduce current clustering approaches according
to our focus described above. Moreover, we provide an overview on the main
features of the presented algorithms in Table 1. Section 3 further lists a selection
of data sets and graph generators used for the evaluation of the approaches pre-
sented in Sect. 2 and briefly discusses the difficulties in choosing appropriate data
for evaluating clustering approaches on evolving graphs. We finally conclude in
Sect. 4.
Notation. Unless noted otherwise, we will assume that graphs are simple, i.e.,
they do not contain loops or parallel edges. A dynamic or evolving graph
$G = (G_0, \ldots, G_{t_{\max}})$ is a sequence of graphs with $G_t = (V_t, E_t)$ being the state
of the dynamic graph at time step $t$. $G_t$ is also called a snapshot of $G$. A clustering
$C_t = \{C_1, \ldots, C_k\}$ of $G_t$ is a set of subsets of $V_t$ called clusters or communities.
If these subsets are pairwise disjoint, the clustering is called disjoint, otherwise
it is overlapping. A disjoint clustering that further has the property that each
vertex is contained in a cluster, i.e., that corresponds to a partition of Vt , is called
complete. Complete clusterings are often represented by storing a cluster id for
each vertex that encodes the corresponding cluster. A pair {u, v} of vertices
such that there is a cluster that contains both u and v is called intracluster
pair, otherwise {u, v} is called intercluster pair. An edge between the vertices
of an intracluster pair is called intracluster edge; intercluster edges are defined
analogously. A singleton clustering is a complete clustering where each cluster
contains only one vertex; such clusters are called singleton clusters. The other
extreme, i.e., a clustering consisting of only one cluster containing all vertices, is
called 1-clustering. Each cluster C ⊂ Vt further induces a cut in Gt . A cut in a
graph G = (V, E) is defined by a set S ⊂ V , which indicates one side of the cut.
The other side is implicitly given by V \ S. A cut is thus denoted by (S, V \ S).
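As a minimal illustration of this notation (the representation and the names are illustrative choices, not taken from any of the surveyed implementations), a snapshot sequence and a complete clustering could be stored as follows:

```python
# Snapshots G_t = (V_t, E_t) of an evolving graph, stored as (vertex set, edge set);
# edges are frozensets, so there are no loops or parallel edges.
evolving_graph = [
    ({1, 2, 3}, {frozenset((1, 2)), frozenset((2, 3))}),                        # G_0
    ({1, 2, 3, 4}, {frozenset((1, 2)), frozenset((2, 3)), frozenset((3, 4))}),  # G_1
]

# A complete, disjoint clustering of G_0, stored as a cluster id per vertex.
clustering_0 = {1: "A", 2: "A", 3: "B"}

def is_intracluster(clustering, u, v):
    """True if {u, v} is an intracluster pair of a complete clustering."""
    return clustering[u] == clustering[v]

print(is_intracluster(clustering_0, 1, 2), is_intracluster(clustering_0, 2, 3))
```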
Cluster Tracking and Event Detection. Assuming the cluster structure of the
network is already given for each snapshot by an arbitrary clustering approach,
detecting the evolution of the clusters over time becomes a task independent
from finding the clusters. Most approaches that address this task describe a
framework of two subproblems. On the one hand, they seek for series of similar
clusters in consecutive snapshots (often called meta communities, meta groups or
time-lines), and on the other hand, they aim at identifying critical events where
clusters, for instance, appear, survive, disappear, split or merge. In particular,
deciding if a cluster that has just disappeared reappears in future time steps,
and thus, actually survives, requires future information, which is not available
in an online scenario. Hence, whether a framework is applicable in an online
scenario depends on the defined events. The frameworks of Takaffoli et al. [143]
and Green et al. [74] are offline frameworks since they compare the structure
of the clusters of the current snapshot to all previous and future snapshots in
order to also find clusters that disappear and reappear after a while. While the
framework by Takaffoli et al. requires disjoint clusters, the approach of Green et
al. also allows overlapping clusters.
In order to compare clusters of consecutive snapshots, many approaches
define a similarity measure considering two clusters as similar if the similar-
ity value exceeds a given threshold. Asur et al. [11] detect similar clusters by the
size of their intersection. Takaffoli et al. [143] normalize this value by the size
of the larger cluster. Berger-Wolf and Saia [19] admit the use of any similarity
measure that is efficiently computable and satisfies some mathematical proper-
ties, as it does the standard Jaccard similarity measure [78] that normalizes the
size of the intersection by the size of the union.
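The following sketch shows two of these similarity notions and a simple greedy matching of clusters across two snapshots; the threshold value and the greedy matching rule are illustrative choices and not taken from the cited frameworks.

```python
def jaccard(a, b):
    """Jaccard similarity of two clusters given as sets of vertices."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)

def overlap_by_larger(a, b):
    """Intersection size normalized by the larger cluster (as used by Takaffoli et al.)."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a), len(b))

def match_clusters(old_clustering, new_clustering, sim=jaccard, threshold=0.5):
    """Greedily associate each old cluster with its most similar new cluster."""
    matches = {}
    for i, old in enumerate(old_clustering):
        best = max(range(len(new_clustering)),
                   key=lambda j: sim(old, new_clustering[j]))
        if sim(old, new_clustering[best]) >= threshold:
            matches[i] = best
    return matches

# Example: two snapshots of a toy clustering.
C_prev = [{1, 2, 3}, {4, 5}]
C_curr = [{1, 2, 3, 6}, {4, 5, 7}]
print(match_clusters(C_prev, C_curr))
```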
The frameworks mentioned so far can be applied to any cluster structure
in a given graph, regardless which clustering approach was used to find this
structure. Other tracking approaches, however, exploit special properties of the
given cluster structures in the snapshots, and thus, require that the clusters
are constructed by a designated (static) clustering method. Palla et al. [113]
require clusterings found by the clique percolation method (PCM) [42], which
can be also seen as a special case of a clustering method proposed by Everett
and Borgatti [52]. For a brief description of PCM see the part in Sect. 2 where
algorithms maintaining auxiliary structures are introduced. In order to identify
evolving clusters in two consecutive time steps, Palla et al. construct the union
of the two corresponding snapshots and apply again PCM to this union graph.
Due to the properties of the clusters found by PCM, it holds that each cluster
found in one of the snapshots is contained in exactly one cluster in the union
graph. A cluster C in the snapshot at time t − 1 is then associated with the
cluster C in the snapshot at time t that is contained in the same cluster in the
union graph and has the most vertices in common with C. Falkowski et al. [54]
consider clusters that result from a hierarchical divisive edge betweenness clus-
tering algorithm [63]. In contrast to Palla et al. who map clusters only between
two consecutive time steps, Falkowski et al. present an offline approach. They
construct an auxiliary graph that consists of all clusters found in any snapshot
and two clusters are connected by an edge if and only if the relative overlap of
both clusters exceeds a given threshold. On this graph the authors apply the
In this section, we will give a short overview of quality measures assessing the
goodness of clusterings, followed by a discussion on what has to be additionally
considered in the dynamic scenario, and an introduction to some frequently used
distance measures that can be applied to evaluate the similarity of two cluster-
ings. To give a comprehensive overview of all quality and distance measures used
in the literature is beyond the scope of this survey; further information can be
found for example in the surveys of Fortunato [58] and Wagner et al. [148].
\[ \mathrm{cond}(S, V \setminus S) = \frac{e(S, V \setminus S)}{\min\{\mathrm{vol}(S), \mathrm{vol}(V \setminus S)\}} \]
Many variations thereof exist, most of which either replace the volume by
other notions of cluster size, or use a slightly different tradeoff between the cut
size and the sizes of the two induced parts. We give the definition of two of these
variations, namely expansion and normalized cut; the latter is especially popular
in the field of image segmentation [133]:
\[ \exp(S, V \setminus S) = \frac{e(S, V \setminus S)}{\min\{|S|, |V \setminus S|\}} \]
\[ \mathrm{ncut}(S, V \setminus S) = \frac{e(S, V \setminus S)}{\mathrm{vol}(S)} + \frac{e(S, V \setminus S)}{\mathrm{vol}(V \setminus S)} \]
Finding a cut that is minimal with respect to any of the three definitions
above is N P-hard [90,133,135], which is why divisive algorithms are usually
based on approximation algorithms or heuristics. It remains to mention that
cut-based measures are closely related to spectral clustering techniques [95].
It is not immediately clear how the above measures can be used to evaluate
whole clusterings. One possibility is to associate two values with each cluster C,
one that evaluates the cut that separates the cluster from the rest of the graph
and another evaluating all cuts within the subgraph that is induced by C. In
the context of conductance, this leads to the following definition of inter- and
intracluster conductance of a cluster C [27]:
intercluster conductance(C) = cond(C, V \ C)
intracluster conductance(C) = min {cond(S, C \ S)}
S⊂C
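For unweighted graphs, these cut-based measures are straightforward to evaluate for a given vertex subset; the following sketch uses a plain adjacency-set representation and is only a toy illustration, not an implementation from the literature.

```python
def cut_weight(adj, S):
    """Number of edges crossing the cut (S, V \\ S); adj maps each vertex to its neighbor set."""
    S = set(S)
    return sum(1 for u in S for v in adj[u] if v not in S)

def volume(adj, S):
    return sum(len(adj[u]) for u in S)

def conductance(adj, S):
    V, S = set(adj), set(S)
    return cut_weight(adj, S) / min(volume(adj, S), volume(adj, V - S))

def expansion(adj, S):
    V, S = set(adj), set(S)
    return cut_weight(adj, S) / min(len(S), len(V - S))

def ncut(adj, S):
    V, S = set(adj), set(S)
    c = cut_weight(adj, S)
    return c / volume(adj, S) + c / volume(adj, V - S)

# Toy graph: two triangles joined by a single edge.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(conductance(adj, {1, 2, 3}), expansion(adj, {1, 2, 3}), ncut(adj, {1, 2, 3}))
```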
Roughly speaking, the idea behind this is to compare the number of edges within clusters
to the expected number of edges in the same partition, if edges are randomly
rewired. The most popular measure in this context is the modularity of a cluster-
ing as defined by Girvan and Newman [107] in 2004. Let e(C) denote the number
of edges connecting the vertices in cluster C. Then, the modularity mod(C) of a
(complete) clustering C can be defined as
e(C) vol(C)2
mod(C) = − .
m 4m2
C∈C C∈C
Here, the first term measures the actual fraction of edges within clusters
and the second the expectation of this value after random rewiring, given that
the probability that a rewired edge is incident to a particular vertex is propor-
tional to the degree of this vertex in the original graph. The larger the difference
between these terms, the better the clustering is adjusted to the graph struc-
ture. The corresponding optimization problem is N P-hard [25]. Modularity can
be generalized to weighted [105] and directed [10,89] graphs, to overlapping or
fuzzy clusterings [109,132], and to a local scenario, where the goal is to evalu-
ate single clusters [34,36,94]. Part of its popularity stems from the existence of
heuristic algorithms that optimize modularity and that are able to cluster very
large graphs in short time [22,112,124]. In Sect. 2, we will describe some gener-
alizations of these algorithms to the dynamic setting. Furthermore, in contrast
to many other measures and definitions, modularity does not depend on any
parameters. This might explain why it is still widely used, despite some recent
criticism [59].
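For a complete, disjoint clustering of an unweighted graph, modularity can be computed directly from this definition, as the following sketch (toy data, adjacency-set representation) illustrates.

```python
def modularity(adj, clustering):
    """Modularity of a complete, disjoint clustering; adj maps each vertex to its
    neighbor set, clustering is a list of vertex sets."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2             # number of edges
    total = 0.0
    for C in clustering:
        e_C = sum(1 for u in C for v in adj[u] if v in C) / 2   # intracluster edges
        vol_C = sum(len(adj[u]) for u in C)
        total += e_C / m - (vol_C ** 2) / (4 * m * m)
    return total

adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(modularity(adj, [{1, 2, 3}, {4, 5, 6}]))
```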
also be a good clustering for the snapshot at time step t − 1. This is based on
the underlying assumption that fundamental structural changes are rare. Hence,
linearly combining the snapshot quality of the current clustering with respect to
the current snapshot G_t and with respect to the previous snapshot G_{t−1} yields a dynamic quality
measure, which can be built from any static quality measure.
This causes the clustering at time step t to also take the structure of snapshot
Gt−1 into account, which implicitly enforces smoothness. Takaffoli et al. [144]
apply this approach in the context of modularity, and Chi et al. [35] in the
context of spectral clustering; both will be discussed in Sect. 2.
the following notion of normalized mutual information (NMI), which maps the
mutual information to the interval [0, 1]:
\[ \mathrm{NMI}(\mathcal{C}, \mathcal{D}) = \frac{2\, I(\mathcal{C}, \mathcal{D})}{H(\mathcal{C}) + H(\mathcal{D})} \]
This corresponds to counting the number of vertex pairs where both cluster-
ings disagree in their classification as intracluster or intercluster pair, followed
by a normalization. Delling et al. [41] argue that this measure is not appropriate
in the context of graph clustering, as it does not consider the topology of the
underlying graph. They propose to only consider vertex pairs connected by an
edge, which leads to the graph based Rand index. This graph based version is
used by Görke et al. [70] to measure the distance between clusterings at adjacent
time steps.
Chi et al. [35] apply the chi square statistic to enforce and measure the
similarity between adjacent clusterings. The chi square statistic was suggested
by Pearson [117] in 1900 to test for independence in a bivariate distribution.
In the context of comparing partitions, different variants exist [99]; the version
used by Chi et al. is the following:
\[ \chi^2(\mathcal{C}, \mathcal{D}) = n \cdot \left( \sum_{C \in \mathcal{C}} \sum_{D \in \mathcal{D}} \frac{|C \cap D|^2}{|C| \cdot |D|} - 1 \right) \]
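The following sketch computes NMI and the chi square statistic for two complete clusterings given as lists of vertex sets; it uses natural logarithms (the choice of base cancels in NMI) and is only meant to make the formulas concrete.

```python
from math import log

def nmi(C, D):
    """Normalized mutual information between two complete clusterings of the same vertex set."""
    n = sum(len(c) for c in C)
    I = 0.0
    for c in C:
        for d in D:
            ncd = len(c & d)
            if ncd > 0:
                I += ncd / n * log(n * ncd / (len(c) * len(d)))
    H = lambda P: -sum(len(p) / n * log(len(p) / n) for p in P if p)
    return 2 * I / (H(C) + H(D))

def chi_square(C, D):
    n = sum(len(c) for c in C)
    s = sum(len(c & d) ** 2 / (len(c) * len(d)) for c in C for d in D)
    return n * (s - 1)

C = [{1, 2, 3}, {4, 5, 6}]
D = [{1, 2}, {3, 4, 5, 6}]
print(nmi(C, D), chi_square(C, D))
```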
1.3 Applications
GraphScope. The GraphScope approach by Sun et al. [140] is one of the first
and most cited dynamic clustering approaches so far. However, contrary to the
notion of communities as densely connected subgraphs, GraphScope follows
the idea of block modeling, which is another common technique in sociology.
The aim is to group actors in social networks by their role, i.e., structural equiv-
alence. Two actors are equivalent if they interact in the same way with the same
actors (not necessarily with each other). That is, the subgraph induced by such a
group may be disconnected or even consist of an independent set of vertices.
The latter is the case in approaches like GraphScope that consider bipar-
tite graphs of source and destination vertices and seek for groups of equivalent
vertices in each part, i.e., groups consisting either of source or destination ver-
tices. Furthermore, instead of independent snapshots, GraphScope considers
whole graph segments, which are sequences of similar consecutive snapshots that
(w.l.o.g.) have all the same number of sources and destinations. The main idea is
the following. Given a graph segment and a partition of the vertices in each part
(the same partition for all snapshots in the graph segment), the more similar the
vertices are per group the cheaper are the encoding costs for the graph segment
using an appropriate encoding scheme based on a form of Minimum Description
Length (MDL) [123]. That is, GraphScope seeks two partitions, one for each
part of the bipartite input graph, that minimize the encoding costs with respect
to the current graph segment. It computes good partitions in that sense by an
iterative greedy approach. Based on the same idea, the MDL is further used
to decide whether a newly arriving snapshot belongs to the current graph seg-
ment or starts a new segment. If the new snapshot belongs to the current graph
segment, the two partitions for the graph segment are updated, initialized by
the previous partitions. If the new snapshot differs too much from the previous
snapshots, a new segment is started. In order to find new partitions in the new
segment, the iterative greedy approach is either initialized with the partitions of
the previous graph segment or the iterations are done from scratch. The latter
can be seen as a static version of GraphScope. An experimental comparison on
real-world data shows a much better running time of the dynamic approach compared
with the static approach. Additional experiments further illustrate that the
found source and destination partitions correspond to semantically meaningful
clusters. Although this approach focuses on bipartite graphs, it can be easily
modified to deal with unipartite graphs, by constraining the source partitions to
be the same as the destination partitions [31].
to compute the set of new events only every 48 to 96 min, compared to a real
time event identification performed by the dynamic approach. Instead of dense
subgraphs, Agarwal et al. seek subgraphs that possess the property that
each edge in the subgraph is part of a cycle of length at most 4. This property is
highly local, and thus, can be updated efficiently. An experimental study on real-
world data comparing the dynamic approach to a static algorithm that computes
biconnected subgraphs confirms the efficiency of the local updates.
costs of the induced minimum s-t-cut. For two non-adjacent vertices u and v
in T , the minimum u-v-cut is given by a lightest edge on the path from u to v
in T . In order to obtain the final complete clustering, the artificial vertex q is
deleted from T , resulting in a set of subtrees inducing the clusters. Due to the
special properties of the minimum s-q-cuts that separate the resulting clusters,
Flake et al. are able to prove a guarantee (depending on α) of the intercluster
expansion and the intracluster expansion of the resulting clustering, which in
general is NP-hard to compute (cf. the cut-based quality measures introduced
in Sect. 1.2). The dynamic version of the cut-clustering algorithm determines
which parts of the current Gomory-Hu tree of Gα become invalid due to an
atomic change in G and describes how to update these parts depending on the
type of the atomic change. The result is a cut clustering of the current graph G
with respect to the same parameter value α as in the previous time step. The
most difficult and also (in theory) most time consuming type of an update is the
update after an edge deletion. However, in most real world instances the actual
effort for this operation is still low, as shown by an experimental evaluation on
real world data. We stress that there also exists another attempt [126,127] that
claims to be a dynamic version of the cut clustering algorithm of Flake et al.,
however, Görke et al. showed that this attempt is erroneous beyond straightforward
correction. Doll et al. [48] further propose a dynamic version of the
hierarchical cut-clustering algorithm that results from varying the parameter
value α, as shown by Flake et al. [57].
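The static construction underlying this approach can be sketched in a few lines on top of a generic Gomory-Hu tree routine; the sketch below assumes NetworkX's gomory_hu_tree, unit edge capacities, and an illustrative name for the artificial vertex, and the chosen value of α only controls the granularity on the toy instance.

```python
import networkx as nx

def cut_clustering(G, alpha):
    """Static cut-clustering in the spirit of Flake et al.: augment G with an
    artificial vertex q connected to every vertex with capacity alpha, build a
    Gomory-Hu tree, remove q, and return the remaining subtrees as clusters."""
    H = nx.Graph()
    H.add_edges_from(G.edges(), capacity=1.0)   # unit capacities; weighted graphs work analogously
    q = "q_artificial"
    H.add_edges_from(((q, v) for v in G.nodes()), capacity=alpha)
    T = nx.gomory_hu_tree(H, capacity="capacity")
    T.remove_node(q)
    return [set(c) for c in nx.connected_components(T)]

# Toy example: two triangles joined by one edge.
G = nx.Graph([(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)])
print(cut_clustering(G, alpha=0.4))
```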
Spectral Graph Clustering Methods. The main idea of static spectral graph clus-
tering is to find an r-dimensional placement of the vertices such that vertices
that form a cluster in an appropriate clustering with respect to a given objective
are close to each other while vertices that are assigned to different clusters are
further away from each other. This can be done by considering the spectrum of
a variation of the adjacency matrix, like for example the Laplacian matrix in
the context of the normalized cut objective [95]. More precisely, many desirable
objectives result in optimization problems that are solved by the eigenvectors
associated with the top-r eigenvalues of a variation of the adjacency matrix
that represents the objective. The rows of the n-by-r matrix formed by these
eigenvectors then represent r-dimensional coordinates of the vertices that favor
the objective. The final clustering is then obtained by applying, for example,
k-means to these data points.
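As a hedged illustration of this pipeline (our own sketch, not a specific method from the survey; it assumes a dense symmetric adjacency matrix given as a NumPy array and uses the normalized Laplacian, for which the relevant eigenvectors are those belonging to the smallest eigenvalues):

```python
import numpy as np
from scipy.sparse.csgraph import laplacian
from sklearn.cluster import KMeans

def spectral_embedding_clustering(A, r, k):
    """Embed the n vertices via the first r eigenvectors of the normalized
    Laplacian of the adjacency matrix A (n-by-n, symmetric) and cluster the
    resulting r-dimensional coordinates with k-means."""
    L = laplacian(A, normed=True)
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    coords = eigvecs[:, :r]               # r-dimensional placement of the vertices
    return KMeans(n_clusters=k, n_init=10).fit_predict(coords)
```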
The EvolSpec algorithm by Chi et al. [35] conveys this concept to a
dynamic scenario by introducing objectives that incorporate temporal smooth-
ness. Inspired by Chakrabarti et al. [32], the authors linearly combine snapshot
costs and temporal costs of a clustering at time step t, where the temporal costs
either describe how well the current clustering clusters historic data in time
step t − 1 or how different the clusterings in time step t and t − 1 are. For both
quality measures, they give the matrices that represent the corresponding objec-
tives, and thus, allow the use of these measures in the context of spectral graph
clustering.
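In our notation (α a user-chosen trade-off parameter, CS_t the snapshot cost and CT_t the temporal cost at time step t), the combined objective has the form cost_t = α · CS_t + (1 − α) · CT_t, so that α = 1 recovers purely static clustering of the current snapshot.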
Ning et al. [111] show how to efficiently update the eigenvalues and the
associated eigenvectors for established objectives if an edge or a vertex in the
underlying graph changes. Compared to static spectral clustering, which takes
O(n^{3/2}) time, this linear incremental approach saves a factor of n^{1/2}. An
experimental evaluation of the running times on web-blog data (collected by
the NEC laboratories) confirms this theoretical result. The fact that the updates
yield only approximations of the desired values is not an issue, as further exper-
iments on the approximation error and an analysis of the keywords in the found
clusters show.
A concept that is closely related to spectral graph clustering is low-rank
approximation of the adjacency matrix of a graph. Tong et al. [145] do not
provide a stand-alone community detection algorithm but a fast algorithm that
returns a good low-rank approximation of the adjacency matrix of a graph that
requires only little space. Additionally, they propose efficient updates of these
matrix approximations, which may enable many clustering methods that use
low-rank adjacency matrix approximations to also operate on evolving graphs.
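As a generic illustration of what such a rank-r approximation looks like (our own sketch via truncated SVD; Tong et al. use a different, more space-efficient construction):

```python
import numpy as np

def rank_r_approximation(A, r):
    """Best rank-r approximation of the (dense) adjacency matrix A in the
    least-squares sense, via truncated singular value decomposition."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]
```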
breadth-first search starting from the set of affected vertices. In their experi-
ments, considering these slightly larger subsets instead of only directly affected
vertices improves modularity and yields a good tradeoff between running time
and quality. The second variant not only stores the clustering from the last
time step but the whole sequence of merge operations in the form of a den-
drogram. A dendrogram is a binary forest where leaves correspond to vertices
in the original graph and vertices on higher levels correspond to merge oper-
ations. Additionally, if a vertex in the dendrogram is drawn in a level above
another vertex, this encodes that the corresponding merge has been performed
later in the algorithm. Figure 3a shows an example of a dendrogram produced
by the static CNM algorithm whose resulting clustering consists of two clusters.
Storing the whole dendrogram across time steps makes backtracking strategies
applicable. To update the clustering for the next time step, the backtracking
procedure first retracts a minimum number of merges such that certain require-
ments are met, which depend on the type of change. In case an intracluster edge
has been inserted, the requirement is that its incident vertices are in separate
clusters after the backtracking procedure. If an intercluster edge is inserted or
an intracluster edge deleted, merges are retracted until both affected vertices
are in singleton clusters. If an intercluster edge is deleted, the dendrogram stays
unchanged. Afterwards, CNM is used to complete this preliminary clustering.
Bansal et al. [16] use a similar approach. Instead of backtracking merges in the
dendrogram, their algorithm repeats all merge operations from the last time step
until an affected vertex is encountered. Again, this preliminary clustering is com-
pleted with the static CNM algorithm. Figure 3 illustrates the difference between
the two approaches. Both studies report a speedup in running time compared to
the static algorithm; Görke et al. additionally show that their approach improves
smoothness significantly. In the experiments of Bansal et al., quality in terms of
modularity is comparable to the static algorithm, while Görke et al. even observe
an improvement of quality on synthetic graphs and excerpts of coauthor graphs
derived from arXiv. Görke et al. additionally compare the backtracking variant of
dGlobal to the variant freeing subsets of vertices; for the test instances, back-
tracking was consistently faster but yielded worse smoothness values.
Fig. 4. Illustration of the Louvain method and the corresponding dendrogram. In the
left part, the depicted edge structures show the graphs before the vertex moves, while
the colored subsets depict the resulting clusters after the vertex moves on the particular
level.
The second static algorithm that has been modified for the dynamic scenario
is a local greedy algorithm often called Louvain method [22]. Similar to CNM,
the algorithm starts with a singleton clustering. Now, vertices of the graph are
considered in a random order. If there is at least one cluster such that moving the
current vertex v to it improves the overall modularity, v is moved to the cluster
that yields the maximal gain in modularity. This process is repeated in sev-
eral rounds until a local maximum is attained. Then, clusters are contracted to
supernodes and edges between clusters summarized as weighted edges, whereas
edges within clusters are mapped to (weighted) self loops. The local moving pro-
cedure is then repeated on the abstracted graph taking edge weights into account.
Contractions and vertex moves are iterated until the graph stays unchanged.
Then, the clustering is projected down to the lowest level, which represents the
original graph, to get the final result. Figure 4 illustrates this procedure.
Among the modifications of the Louvain method to the dynamic scenario,
the one by Aynaud and Guillaume [13] is the most direct. In their study, instead
of the singleton clustering, the clustering from the last time step is used to
initialize the clustering on the lowest level. Using a dynamic network of weblogs,
they demonstrate that this modification improves smoothness significantly. In
terms of modularity, the modified version follows the static version quite well
and yields better quality than a reference algorithm based on random walks
called Walktrap [118]. The authors further propose to use a tradeoff between
modularity and smoothness by removing a fixed percentage of randomly chosen
vertices from their cluster in each time step, in order to give the algorithm more
freedom to perform necessary changes in the clustering.
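A minimal sketch of this warm-start idea (our own; it assumes the third-party python-louvain package, whose best_partition function accepts an initial partition, and it assigns vertices that are new in the current snapshot to fresh singleton clusters):

```python
import community as community_louvain   # python-louvain package (assumption)

def cluster_snapshots(snapshots):
    """Run the Louvain method on each snapshot, initialized with the clustering
    of the previous time step instead of the singleton clustering."""
    prev = None
    for G in snapshots:
        init = None
        if prev is not None:
            nxt = max(prev.values()) + 1
            init = {}
            for v in G.nodes():
                if v in prev:
                    init[v] = prev[v]
                else:
                    init[v] = nxt    # vertices new in this snapshot start alone
                    nxt += 1
        prev = community_louvain.best_partition(G, partition=init)
        yield prev
```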
An evolutionary version of the Louvain method is proposed by Görke
et al. [70], called tdLocal. Here, the clustering is again reinitialized by the sin-
gleton clustering in each time step. Inspired by Chakrabarti et al. [32], smooth-
ness is encouraged by optimizing a linear combination of modularity and the
graph-based Rand index [41]. It is possible to optimize this modified objective
with the Louvain algorithm without increasing the asymptotic running time of
one round.
scenario [114,152], roughly based on the idea of only updating labels/label vectors
of vertices affected by changes in the graph. The dynamic version of LabelRank
is called LabelRankT.
A concept that is very similar to LabelRank and has been developed in
the context of graph partitioning is diffusion [73,97,98]. Similar to the above
algorithm, each vertex maintains a vector of size k indicating to which extent it
is connected to the vertices of each of the k clusters. The entries of these vectors
are called loads; loads are distributed through the network along the edges in
rounds, which explains the origin of the name diffusion. Based on this concept,
Gehweiler and Meyerhenke [61] propose a distributed graph clustering algorithm
called DiDiC, which is motivated by the task of clustering the nodes of a peer-to-
peer-based virtual distributed supercomputer. The weight of edges between nodes in
this network corresponds to the bandwidth between the associated peers. The
idea is to find clusters of highly connected peers that can be exploited to solve
a common task in parallel. In contrast to LabelRankT, they use a second
diffusion system drawing the loads associated with cluster i back to the vertices
in cluster i, which accelerates the formation of large, connected clusters. In the
first time step, the process starts with a random clustering and distributes the
load of each cluster to the vertices it contains. After the diffusion process has been
run for a certain number of rounds, clusters are reassigned such that each vertex
moves to the cluster from which it obtained the highest load value, leading to a
complete clustering. The algorithm is made dynamic by initializing the clusters
and load vectors with the values obtained in the previous time step, instead of
random initialization.
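The following sketch (our own simplification; DiDiC's second diffusion system and its distributed execution are omitted) illustrates the basic load-diffusion step and the final cluster assignment:

```python
import numpy as np

def diffusion_round(neighbors, loads, alpha=0.1):
    """One synchronous diffusion round: every vertex exchanges a fraction alpha
    of the per-cluster load difference with each of its neighbors.
    neighbors: list of neighbor lists; loads: n-by-k array of cluster loads.
    alpha should be small relative to the maximum degree to keep the iteration stable."""
    new = loads.copy()
    for v, nbrs in enumerate(neighbors):
        for u in nbrs:
            new[v] += alpha * (loads[u] - loads[v])
    return new

def assign_clusters(loads):
    # each vertex joins the cluster from which it obtained the highest load
    return loads.argmax(axis=1)
```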
Yu et al. [157] that assumes “soft community membership”, i.e., vertices belong
to different clusters to a greater or lesser extent. This results in an overlapping clus-
tering. However, these clusters can easily be converted to a complete clustering
in a postprocessing step by assigning each vertex to the cluster it participates
in to the largest extent. For this reason and the fact that this is often done when
comparing complete clusterings to the clusterings produced by FacetNet, we
list the algorithm both under overlapping and non-overlapping clusterings in
Table 1. In the generative model, the probability of a certain cluster assignment
at time step t depends on the cluster assignment at step t − 1. Depending on a
parameter ν, the transitions will be more or less smooth. It can be shown that
under certain assumptions, the MAP estimation of this model is equivalent to
the framework of Chakrabarti et al. [32]. In this context, the Kullback-Leibler diver-
gence [82] between the observed weight matrix and an approximation of it based
on cluster assignments is applied as the snapshot cost and the Kullback-Leibler
divergence between the clustering at time step t and at time step t − 1 as history
cost. For the inference step, an expectation maximization algorithm is used that
is guaranteed to converge towards a locally optimal solution of the correspond-
ing MAP problem. In the FacetNet framework, the number of clusters can
change over time. To determine the best number of clusters for each time step,
an extension of modularity to soft community memberships is proposed. In the
experimental part, synthetic and real world networks are used to evaluate the
performance of FacetNet and to compare it to its static counterpart as well
as a static and evolutionary (EvolSpec) version of spectral clustering [35,133].
With respect to quality, the FacetNet approach compares favorably.
In the algorithm of Yang et al., the number of clusters is given as input.
Given the hidden clustering at a certain time step, the conditional probability
for a link between two vertices is determined by the linking probabilities associ-
ated with their respective clusters. These linking probabilities are in turn random
variables such that their prior distribution causes higher linking probabilities for
intracluster edges. The whole generative model corresponds to a Bayesian net
where the latent variables associated with a certain time step depend on the
clustering from the last time step, a matrix A specifying the probability that
a vertex moves from a certain cluster to another in the current time step, and
the prior distribution for the linking probabilities between clusters. Again, the
prior distribution for A biases the moving probabilities in such a way that the
probability for each vertex to move to another community k is smaller than the
probability to stay in its own cluster, which implicitly biases the model towards
smoothness. The model can be generalized to weighted graphs in a straight-
forward way. For the inference step, the authors evaluate both the online and
the offline scenario. In the online scenario, the variables are sampled from time
step to time step using the observations seen so far. In the offline scenario, all
variables are sampled together by taking both past and future observations into
account. In both cases, a Gibbs sampler is used to infer the latent variables.
In the offline scenario, additionally, an expectation maximization algorithm
is proposed. These two variants are then compared against each other, static
that each entry in the resulting vector is randomly taken from one of the parent
vectors. The dynamization is trivially done by initializing the population of the
current time step by the result from the last time step. Different evolutionary
metaheuristics are compared with respect to both criteria on a dynamic graph
representing YouTube videos.
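A hedged sketch of such a crossover on cluster-assignment vectors (our own illustration of the operator described above):

```python
import random

def uniform_crossover(parent_a, parent_b):
    """Each entry of the offspring assignment vector is taken at random
    from one of the two parent vectors."""
    return [random.choice(pair) for pair in zip(parent_a, parent_b)]

# toy usage: two clusterings of five vertices, encoded as cluster labels
print(uniform_crossover([0, 0, 1, 1, 2], [0, 1, 1, 2, 2]))
```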
While the former approach uses evolutionary metaheuristics, which have
nothing to do with evolutionary clustering according to Chakrabarti et al. [32],
the next approach is again an evolutionary clustering approach. In contrast to
other evolutionary clustering approaches, which most often incorporate tempo-
ral smoothness into a particular clustering algorithm, the framework introduced
by Xu et al. [154] can be applied with any static clustering method. In their
publication the authors use the normalized cut spectral clustering approach by
Yu and Shi [158]. Although the idea of Xu et al. is inspired by Chakrabarti et
al., the main difference is that they do not incorporate temporal smoothness
by optimizing a linear combination of snapshot quality and history quality, but
adapt the input data for the chosen clustering algorithm based on the commu-
nity structure found in the previous snapshot. This adaptation is done as follows.
The adjacency matrices of the snapshots are considered as realizations of a non-
stationary random process, which allows one to define an expected adjacency matrix
for the current snapshot. Based on this expected matrix, a smoothed adjacency
matrix can be approximated that also takes into account the previous time step.
The smoothed adjacency matrix is a convex combination of the smoothed adja-
cency matrix of the previous time step and the actual adjacency matrix of the
current time step. The parameter that balances the two terms of the convex
combination is estimated such that it minimizes a mean squared error criterion.
The chosen clustering algorithm is then applied to the estimated smoothed adja-
cency matrix, thus incorporating temporal smoothness to stabilize the variation
of the found clusters over time.
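The core of this adaptation can be sketched as follows (our own rendering; the rule for estimating α from the mean squared error criterion is omitted):

```python
def smoothed_adjacency(A_t, S_prev, alpha):
    """Convex combination of the current adjacency matrix A_t and the smoothed
    matrix S_prev of the previous time step; the result is fed to an arbitrary
    static clustering algorithm."""
    if S_prev is None:               # first time step: nothing to smooth against
        return A_t
    return alpha * S_prev + (1.0 - alpha) * A_t
```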
All the above clustering approaches use the same objective for the whole
graph to get good clusterings. In this way, these approaches treat the input
graph as a homogeneous structure, regardless of whether parts of the graph are
sparser than others and thus possibly require a different notion of density for
reasonable communities than denser parts. Wang et al. [84] follow Aggarwal et
al. [3], who claim that considering networks as homogeneous structures is not
an appropriate approach. This is why Wang et al. introduce patterns describing
homogeneous regions that are consolidated in a second step to generate non-
overlapping clusters. In contrast to density, which depends on the number or
the weight of edges within a subgraph or cluster, homogeneity means that all
vertices in a pattern have similarly weighted neighbors. In order to efficiently
compute the patterns in a dynamic scenario, the authors maintain, by incre-
mental updates, a top-k neighbor list and a top-k candidate list as auxiliary
structures. These updates are able to deal with atomic changes as well as with
several changes (of vertices and edge weights) in one time step. In compari-
son with FacetNet [93] and the evolutionary clustering method by Kim and
Han [81], experiments on the DBLP, ACM, and IBM data sets show a
better processing rate (number of vertices processed per second) and a better
accuracy of the found clusterings in locally heterogeneous graphs.
3 Data Sets
Often, researchers developing new or enhanced algorithms are faced with the
question of which data sets to use to illustrate the advantages and validity of
their approach. In the context of clustering evolving graphs, the data needs to
have some temporal aspects and, ideally, should come with some well-motivated
ground truth clustering. Additionally, using data that has been previously used
in the literature makes the comparison to other methods less cumbersome. To
simplify the search for suitable test data, this section aims to give an overview
of what kinds of data sets have been used in current publications regarding the
clustering of evolving graphs. In the first part, we concentrate on real-world
instances, i.e., instances that correspond to data collected from observed rela-
tionships between objects or persons. In the second part, we briefly talk about
models and generators for evolving networks, with a special focus on synthetic
data incorporating a hidden ground truth clustering.
Most networks described in this category are based on human interaction and
can therefore be classified as social networks in the wider sense. We tried to
assign them to more fine-grained subcategories depending on their structure and
interpretation.
Cellphone Data. Very similar to email networks are data about phone calls. Palla
et al. [113] cluster a network of phone calls between the customers of a mobile
phone company containing data of over 4 million users. They consider edges to
be weighted; a phone call contributes to the weight between the participating
customers for some time period around the actual time of the call. As metadata
to evaluate their community finding approach, they consider zip code and age of
customers. Similar data is considered by Greene et al. [74]; however, they do not
consider edge weights. The Reality Mining Dataset [50] is provided by the MIT
Human Dynamics Lab4 and was collected during a social science experiment in
2004. It includes information about call logs, Bluetooth devices in proximity, cell
tower IDs, application usage, and phone status of 94 subjects over the course
of an academic year. In the context of dynamic graph clustering, it is possible
to extract test data in various ways. Xu et al. [154] construct a dynamic graph
where the edge weight between two participants in a time step corresponds to the
number of intervals in which they were in close physical proximity. As ground
truth clustering, they use the affiliations of the participants. Sun et al. [140]
additionally consider the cellphone activity to construct a second dynamic graph.

2 Available at http://www.cs.cmu.edu/∼enron/.
3 For further details and for downloading the whole dataset, please visit http://i11www.iti.uni-karlsruhe.de/en/projects/spp1307/emaildata.
4 http://realitycommons.media.mit.edu/realitymining.html.
Online Social Networks and Blogs. Another prime example of social networks
are online social networks like Facebook or Flickr. In the context of clustering
algorithms, they are particularly interesting due to their size and the fact that
friendship links are explicit and not implicitly assumed with the help of other
metadata. Viswanath et al. [147] crawled the regional network of Facebook in
New Orleans. Only data from public profiles was collected, giving information
about approximately 63,000 users and 1.5 million friendship links, together with
their evolution. Nguyen et al. [108] and Dinh et al. [44] use these data to evalu-
ate their clustering algorithms. Kumar et al. [83] analyze data from Flickr5 and
Yahoo! 360◦ . Whereas Yahoo! 360◦ was a typical social network that does not
exist anymore, Flickr has a focus on the sharing of photos, although friendship
links exist as well. Both datasets are used by the authors in anonymized form and
are not publicly available. Closely related to online social networks are networks
derived from blogging platforms; here, the edges correspond to entry-to-entry
links between different blogs [13,93,111,156]. Angel et al. [9] use sampled data
obtained via Twitter’s restricted access to its data stream6 . LiveJournal7 is some-
where in between a network of blogs and an online social network. Interestingly,
users can explicitly create friendship links as well as join groups. In contrast
to the usual way dynamic networks are built from blogs, edges do not neces-
sarily correspond to links but can also depend on friendship links. Backstrom
et al. [14] study the evolution of communities in LiveJournal using friendship
links as edges and group membership as (overlapping) ground truth clustering.
5 http://www.flickr.com/.
6 https://dev.twitter.com/docs/streaming-apis#sampling.
7 http://www.livejournal.com/.
Small Examples. Many publications about static graph clustering include the
analysis of small networks to illustrate some properties of the clusterings pro-
duced by their algorithm. A famous example for that is the karate network
collected by Zachary in 1977 [159], which describes friendship links between
members of a karate club before the club split up due to an internal dispute;
a typical question is whether a clustering algorithm is able to predict the split
given the network structure. Usually these networks are small enough to be
visualized entirely in an article, which enables readers to compare different clus-
terings of these networks across several publications. The closest evolving analog
to Zachary’s karate network is the Southern Women data set collected in 1933 by
Davis et al. [40]. It contains data on the social activities of 18 women observed
over a nine-month period. Within this period, they recorded for 14 informal
events whether these women participated or not. It has been used as a test set
by Berger et al. [18], Berger-Wolf and Saia [19], and Yang et al. [156]. Another
interesting small example is the Grevy's zebra data set [142] used by Berger et al. [18].
It consists of information about the spatial proximity of members of a zebra herd
observed over three months, corresponding to 44 observations or time steps.
Other Data Sets. In the following, we will list some further sources for dynamic
graph data already used to evaluate dynamic graph clustering algorithms.
Xie et al. [152] have used graphs representing the topology of the internet at
the level of autonomous systems (AS Graphs) based on data collected by the
University of Oregon Route Views Project [92]. These data are available from
the Stanford Large Network Dataset Collection12. Xu et al. [154] try to identify
communities of spammers in data from Project Honey Pot13, an ongoing
project to identify spammers. Sun et al. [141] use data extracted from the social
bookmarking web service Delicious14, which naturally comes with a plenitude
of metadata. Kim et al. [80] use data from YouTube crawls15 in their evaluation.
Pang et al. [114] cluster a dynamic network of players of World of Warcraft,
where edges are based on whether they take part in the same group.

8 http://www.informatik.uni-trier.de/∼ley/db/.
9 http://arxiv.org/.
10 http://i11www.iti.uni-karlsruhe.de/en/projects/spp1307/dyneval.
11 http://www.cs.cornell.edu/projects/kddcup/datasets.html.
Static Networks with Artificial Dynamics. Apart from real-world data with a
naturally given temporal evolution, it is also possible to artificially incorporate
some dynamics into originally static data. Riedy et al. [121], for example, consider
static real-world networks that become dynamic by generating random edge
deletions and insertions.
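A minimal sketch of such artificial dynamics (our own; the numbers of steps and changes per step are free parameters):

```python
import random
import networkx as nx

def artificial_dynamics(G, steps, changes_per_step, seed=0):
    """Yield a sequence of snapshots obtained from a static graph by random
    edge deletions and insertions."""
    rng = random.Random(seed)
    H = G.copy()
    nodes = list(H.nodes())
    for _ in range(steps):
        for _ in range(changes_per_step):
            if H.number_of_edges() > 0 and rng.random() < 0.5:
                H.remove_edge(*rng.choice(list(H.edges())))
            else:
                u, v = rng.sample(nodes, 2)
                H.add_edge(u, v)
        yield H.copy()
```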
Depending on the aim of designing a certain clustering algorithm, there are good
reasons to use synthetic data as well as good reasons not to use only synthetic
data for the evaluation. Synthetic data refers to graphs that are artificially gen-
erated with the help of a graph generator. Given a number of vertices, these gen-
erators decide which vertices are connected by an edge based on the probability
of such an edge. The edge probabilities are derived for example from a prefer-
ential attachment process [17], where vertices that already have a high degree
are connected with higher probability than others, or from other rules that are
characteristic for the particular generator. In the context of evolving graphs,
graph generators usually not only have to decide which vertices are linked but
also which vertices or edges are added or deleted. Furthermore, if the generator
incorporates a hidden ground truth clustering, this usually evolves randomly as
well, which in turn influences edge probabilities.
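As a toy example of a generator with a hidden ground truth clustering (our own sketch of a planted-partition-style snapshot; evolving the ground truth over time, e.g. by random membership switches, would be layered on top):

```python
import random
import networkx as nx

def planted_partition_snapshot(clusters, p_in, p_out, seed=0):
    """Draw one snapshot: vertices in the same ground-truth cluster are linked
    with probability p_in, vertices in different clusters with probability p_out."""
    rng = random.Random(seed)
    G = nx.Graph()
    label = {v: i for i, c in enumerate(clusters) for v in c}
    G.add_nodes_from(label)
    nodes = list(label)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            p = p_in if label[u] == label[v] else p_out
            if rng.random() < p:
                G.add_edge(u, v)
    return G

# toy usage: two hidden clusters of five vertices each
G = planted_partition_snapshot([range(5), range(5, 10)], p_in=0.8, p_out=0.05)
```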
One reason to include real-world instances, i.e., instances that stem from typ-
ical applications, in the experimental evaluation is that they frequently exhibit
very specific properties and symmetries that are difficult to analyze and rebuild
in synthetic data. Hence, to predict the performance of an algorithm in a certain
application, using only synthetic data is unrewarding, since experiments involv-
ing sample instances stemming from this application are often more accurate.
This raises the question of why to use synthetic data at all. There are some
good arguments that justify the use of synthetic data, at least together with real-
world data:
12 http://snap.stanford.edu/data/index.html.
13 http://www.projecthoneypot.org/.
14 https://delicious.com/.
15 http://netsg.cs.sfu.ca/youtubedata/.
share a certain fraction of their links with other vertices in their cluster and
the remaining links with random vertices in other parts of the graph. The LFR
benchmark has been generalized to weighted and directed graphs, as well as to
overlapping clusters [85]. Among the clustering algorithms described in Sect. 2,
Dinh et al. [44] have used a modification of this benchmark to a dynamic setting,
whereas Cazabet et al. [30] only use it in a static scenario. Greene et al. [74] use
dynamic benchmarks based on LFR graphs that incorporate different cluster
events, including membership switching, cluster growth, shrinkage, birth and
death, and the merge and split of clusters. After the ground truth clustering has
been adapted, a new random graph is drawn according to the mechanisms of the
LFR benchmark, which results in large differences between adjacent time steps.
Aldecoa and Marín [6] finally suggest interpolating between two graphs with
a significant clustering structure by rewiring edges at random. This is proposed
as an alternative to benchmarks like the GN or LFR benchmark in the context
of static clustering algorithms. Here, the assumption is that clusterings of the
intermediate states of the graph during the rewiring process should have low
distance to both the ground truth clustering of the initial and the final state.
The rewiring process could be seen as a model for community evolution. In the
context of tracking clusterings over time, Berger et al. [18] do not consider models
for dynamic graphs but two scenarios for the evolution of clusters that are more
sophisticated than random vertex moves or cluster splits and merges. It remains
to mention that, in principle, all generative models used to infer clusterings
via a Bayesian approach discussed in Sect. 2 might also be used as benchmark
instances, as they naturally come with a dynamic ground truth clustering.
3.3 Summary
Nowadays, a lot of large real-world networks have been collected and made
available by projects like the Stanford Large Network Dataset Collection16 . One
problem in the context of evaluating clustering algorithms for evolving networks
is that even if the original data itself has a temporal aspect, this information
is often missing in the networks constructed from it that are readily provided in many
benchmark sets. On the other hand, the listing in Sect. 3.1 reveals that there
is no real lack of dynamic data that is publicly available. A downside of these
data is that converting them to dynamic graphs is often laborious and leaves
many degrees of freedom. As discussed in the context of the Enron network,
data from the same origin can lead to quite different dynamic graphs, depending
on the design choices taken. This makes the comparison of results across different
publications cumbersome. For static graph clustering, a set of very frequently
used networks mostly taken from the websites of Newman17 and Arenas18 gained
some popularity in the orbit of modularity-based methods. It would be nice to
have a similar set of common benchmark graphs that are evolving over time.
16 http://snap.stanford.edu/data/.
17 http://www-personal.umich.edu/∼mejn/netdata/.
18 http://deim.urv.cat/∼aarenas/data/welcome.htm.
4 Conclusion
Clustering evolving networks is at least as difficult as clustering static networks
since it inherits all the difficulties from the static case and is further faced with
additional problems that arise from the evolution of the considered networks. The
difficulties inherited from static graph clustering are the many different ideas of
what a good clustering is and what a good clustering algorithm is supposed to
do, as well as the absence of widely accepted benchmark instances to evaluate and com-
pare the performance of clustering algorithms. Additional tasks arise whenever
we seek temporal smoothness or want to detect and visualize the evolution of
clusters over time. Among the vast number of algorithms designed for detecting
clusters in evolving graphs, in this survey we only considered graph clustering
approaches in online scenarios with an algorithmic focus on the exploitation of
structural information from previous time steps. We presented several state-of-
the-art algorithms in different categories and summarized the main features of
these algorithms in Table 1. As a first step towards common benchmark sets for
the evaluation of clustering algorithms also in evolving networks, we explicitly
listed data and graph generators that were used by the authors of the publi-
cations presented in this survey. With this list we aim at showing the variety
of available data and providing a collection to other authors in order to help
them find reasonable test instances for their particular algorithm. Further-
more, we discussed tasks like cluster mapping, event detection, and visualization,
which make the found cluster information beneficial for further analysis. We
gave a brief overview of state-of-the-art approaches that also solve these problems
and gave some further references where the reader can find more information
regarding these issues.
References
1. Agarwal, M.K., Ramamritham, K., Bhide, M.: Real time discovery of dense clus-
ters in highly dynamic graphs: identifying real world events in highly dynamic
environments. In: Proceedings of the 38th International Conference on Very Large
Databases (VLDB 2012), pp. 980–991 (2012)
2. Aggarwal, C.C., Subbian, K.: Evolutionary network analysis: a survey. ACM Com-
put. Surv. 47(10), 10:1–10:36 (2014)
3. Aggarwal, C.C., Xie, Y., Yu, P.S.: Towards community detection in locally het-
erogeneous networks. In: Proceedings of the Fifth SIAM International Conference
on Data Mining, pp. 391–402. SIAM (2011)
4. Aggarwal, C.C., Zhao, Y., Yu, P.: A framework for clustering massive
graph streams. Stat. Anal. Data Min. 3(6), 399–416 (2010). http://dx.doi.
org/10.1002/sam.10090
19. Berger-Wolf, T., Saia, J.: A framework for analysis of dynamic social networks. In:
Proceedings of the 12th ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, pp. 523–528. ACM Press (2006)
20. Bichot, C.E., Siarry, P. (eds.): Graph Partitioning. Wiley, Hoboken (2011).
http://onlinelibrary.wiley.com/book/10.1002/9781118601181
21. Bilgin, C.C., Yener, B.: Dynamic network evolution: models, clustering, anomaly
detection. Technical report, Rensselaer University, NY (2008). http://citeseerx.
ist.psu.edu/viewdoc/download?rep=rep1&type=pdf&doi=10.1.1.161.6375
22. Blondel, V., Guillaume, J.L., Lambiotte, R., Lefebvre, E.: Fast unfolding of com-
munities in large networks. J. Stat. Mech. Theory Exp. 2008(10) (2008). http://
dx.doi.org/10.1088/1742-5468/2008/10/P10008
23. Bogdanov, P., Mongiovi, M., Singh, A.K.: Mining heavy subgraphs in time-
evolving networks. In: Proceedings of the 2011 IEEE International Conference
on Data Mining, pp. 81–90. IEEE Computer Society (2011)
24. Borgwardt, K.M., Kriegel, H.P., Wackersreuther, P.: Pattern mining in frequent
dynamic subgraphs. In: Proceedings of the 2006 IEEE International Conference
on Data Mining, pp. 818–822. IEEE Computer Society (2006)
25. Brandes, U., Delling, D., Gaertler, M., Görke, R., Höfer, M., Nikoloski, Z., Wag-
ner, D.: On modularity clustering. IEEE Trans. Knowl. Data Eng. 20(2), 172–188
(2008). http://doi.ieeecomputersociety.org/10.1109/TKDE.2007.190689
26. Brandes, U., Gaertler, M., Wagner, D.: Experiments on graph clustering algo-
rithms. In: Battista, G., Zwick, U. (eds.) ESA 2003. LNCS, vol. 2832, pp. 568–579.
Springer, Heidelberg (2003). doi:10.1007/978-3-540-39658-1 52, http://www.
springerlink.com/openurl.asp?genre=article&issn=0302-9743&volume=2832&
spage=568
27. Brandes, U., Gaertler, M., Wagner, D.: Engineering graph clustering: models
and experimental evaluation. ACM J. Exp. Algorithmics 12(1.1), 1–26 (2007).
http://portal.acm.org/citation.cfm?id=1227161.1227162
28. Bron, C., Kerbosch, J.A.G.M.: Algorithm 457: finding all cliques of an undirected
graph. Commun. ACM 16(9), 575–577 (1973)
29. Catalyurek, U., Boman, E., Devine, K., Bozdag, D., Heaphy, R., Riesen, L.A.:
Hypergraph-based dynamic load balancing for adaptive scientific computations.
In: 21th International Parallel and Distributed Processing Symposium (IPDPS
2007), pp. 1–11. IEEE Computer Society (2007)
30. Cazabet, R., Amblard, F., Hanachi, C.: Detection of overlapping communities in
dynamical social networks. In: Proceedings of the 2010 IEEE Second International
Conference on Social Computing, pp. 309–314. IEEE (2010)
31. Chakrabarti, D.: AutoPart: parameter-free graph partitioning and outlier detec-
tion. In: Proceedings of the 8th European Conference on Principles and Practice
of Knowledge Discovery in Databases, pp. 112–124. ACM Press (2004)
32. Chakrabarti, D., Kumar, R., Tomkins, A.S.: Evolutionary clustering. In: Proceed-
ings of the 12th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pp. 554–560. ACM Press (2006). http://doi.acm.org/10.1145/
1150402.1150467
33. Chen, J., Fagnan, J., Goebel, R., Rabbany, R., Sangi, F., Takaffoli, M., Verbeek,
E., Zaı̈ane, O.R.: Meerkat: community mining with dynamic social networks. In:
Proceedings in the 10th IEEE International Conference on Data Mining - Work-
shops, pp. 1377–1380. IEEE Computer Society, December 2010
34. Chen, J., Zaı̈ane, O.R., Goebel, R.: Detecting communities in large networks by
iterative local expansion. In: Proceedings of the 2009 IEEE International Confer-
ence on Computational Aspects of Social Networks, pp. 105–112. IEEE Computer
Society (2009)
35. Chi, Y., Song, X., Zhou, D., Hino, K., Tseng, B.L.: Evolutionary spectral clus-
tering by incorporating temporal smoothness. In: Proceedings of the 13th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
pp. 153–162. ACM Press (2007). http://doi.acm.org/10.1145/1281192.1281212
36. Clauset, A.: Finding local community structure in networks. Phys. Rev. E 72(2),
026132 (2005). http://link.aps.org/doi/10.1103/PhysRevE.72.026132
37. Clauset, A., Newman, M.E.J., Moore, C.: Finding community structure in very
large networks. Phys. Rev. E 70(066111) (2004). http://link.aps.org/abstract/
PRE/v70/e066111
38. Condon, A., Karp, R.M.: Algorithms for graph partitioning on the planted par-
tition model. Random Struct. Algorithms 18(2), 116–140 (2001). http://dx.doi.
org/10.1002/1098-2418(200103)18:2<116::AID-RSA1001>3.0.CO;2-2
39. Cybenko, G.: Dynamic load balancing for distributed memory mul-
tiprocessors. J. Parallel Distrib. Comput. 7(2), 279–301 (1989).
http://dx.doi.org/10.1016/0743-7315(89)90021-X
40. Davis, A., Gardner, B., Gardner, M.R.: Deep South. University of Chicago Press,
Chicago (1941)
41. Delling, D., Gaertler, M., Görke, R., Wagner, D.: Engineering comparators for
graph clusterings. In: Fleischer, R., Xu, J. (eds.) AAIM 2008. LNCS, vol. 5034,
pp. 131–142. Springer, Heidelberg (2008). doi:10.1007/978-3-540-68880-8 14
42. Derényi, I., Palla, G., Vicsek, T.: Clique percolation in random networks. Phys.
Rev. Lett. 94, 160202 (2005). http://link.aps.org/abstract/PRL/v94/e160202
43. Ding, C.H.Q., He, X., Zha, H., Gu, M., Simon, H.D.: A min-max cut algorithm
for graph partitioning and data clustering. In: Proceedings of the 2001 IEEE
International Conference on Data Mining, pp. 107–114. IEEE Computer Society
(2001). http://dx.doi.org/10.1109/ICDM.2001.989507
44. Dinh, T.N., Nguyen, N.P., Thai, M.T.: An adaptive approximation algorithm for
community detection in dynamic scale-free networks. In: Proceedings of the 32th
Annual Joint Conference of the IEEE Computer and Communications Societies
(Infocom). IEEE Computer Society Press (2013, to appear)
45. Dinh, T.N., Shin, I., Thai, N.K., Thai, M.T., Znati, T.: A general approach
for modules identification in evolving networks. In: Hirsch, M.J., Pardalos,
P.M., Murphey, R. (eds.) Dynamics of Information Systems. Springer Opti-
mization and Its Applications, vol. 40, pp. 83–100. Springer, New York (2010).
http://dx.doi.org/10.1007/978-1-4419-5689-7 4
46. Dinh, T.N., Thai, M.T.: Community detection in scale-free networks: approxima-
tion algorithms for maximizing modularity. IEEE J. Sel. Areas Commun. 31(6),
997–1006 (2013)
47. Dinh, T.N., Ying, X., Thai, M.T.: Towards social-aware routing in dynamic com-
munication networks. In: Proceedings of the 28th International Performance Com-
puting and Communications Conference (IPCCC), pp. 161–168 (2009)
48. Doll, C., Hartmann, T., Wagner, D.: Fully-dynamic hierarchical graph clus-
tering using cut trees. In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS
2011. LNCS, vol. 6844, pp. 338–349. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-22300-6 29
49. Duan, D., Li, Y., Li, R., Lu, Z.: Incremental k-clique clustering in dynamic social
networks. Artif. Intell. 38(2), 129–147 (2012)
50. Eagle, N., Pentland, A.: Reality mining: sensing complex social systems. J. Pers.
Ubiquit. Comput. 10(4), 255–268 (2006)
51. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for dis-
covering clusters in large spatial databases with noise. In: Proceedings of the
2nd ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining, pp. 226–231. ACM Press (1996)
52. Everett, M.G., Borgatti, S.P.: Analyzing clique overlap. Connections 21(1), 49–61
(1998)
53. Falkowski, T.: Community analysis in dynamic social networks. Ph.D. thesis,
Otto-von-Guericke-Universität Magdeburg (2009)
54. Falkowski, T., Bartelheimer, J., Spiliopoulou, M.: Mining and visualizing the evo-
lution of subgroups in social networks. In: IEEE/WIC/ACM International Con-
ference on Web Intelligence, pp. 52–58. IEEE (2006)
55. Falkowski, T., Barth, A., Spiliopoulou, M.: Dengraph: A density-based commu-
nity detection algorithm. In: IEEE/WIC/ACM International Conference on Web
Intelligence, pp. 112–115. IEEE (2007)
56. Fan, Y., Li, M., Zhang, P., Wu, J., Di, Z.: Accuracy and precision of methods for
community identification in weighted networks. Phys. A 377(1), 363–372 (2007).
http://www.sciencedirect.com/science/article/pii/S0378437106012386
57. Flake, G.W., Tarjan, R.E., Tsioutsiouliklis, K.: Graph clustering and minimum
cut trees. Internet Math. 1(4), 385–408 (2004). http://www.internetmathematics.
org/volumes/1.htm
58. Fortunato, S.: Community detection in graphs. Phys. Rep. 486(3–5), 75–174
(2010). http://www.sciencedirect.com/science/journal/03701573
59. Fortunato, S., Barthélemy, M.: Resolution limit in community detection. Proc.
Natl. Acad. Sci. U.S.A. 104(1), 36–41 (2007). http://www.pnas.org/content/
104/1/36.full.pdf
60. Gaertler, M., Görke, R., Wagner, D.: Significance-driven graph clustering. In:
Kao, M.-Y., Li, X.-Y. (eds.) AAIM 2007. LNCS, vol. 4508, pp. 11–26. Springer,
Heidelberg (2007). doi:10.1007/978-3-540-72870-2 2, http://www.springerlink.
com/content/nrq6tlm286808887/?p=65f77ccbb2674a16b9a67da6bb370dc7&pi=5
61. Gehweiler, J., Meyerhenke, H.: A distributed diffusive heuristic for clustering a
virtual P2P supercomputer. In: Proceedings of the 7th High-Performance Grid
Computing Workshop (HGCW 2010) in Conjunction with 24th International Par-
allel and Distributed Processing Symposium (IPDPS 2010), pp. 1–8. IEEE Com-
puter Society (2010)
62. Gilbert, E.N.: Random graphs. Ann. Math. Stat. 30(4), 1141–1144 (1959)
63. Girvan, M., Newman, M.E.J.: Community structure in social and biological net-
works. Proc. Natl. Acad. Sci. U.S.A. 99(12), 7821–7826 (2002)
64. Gloor, P.A., Zhao, Y.: TeCFlow - a temporal communication flow visualizer for
social network analysis. In: ACM CSCW Workshop on Social Networks (2004)
65. Gomory, R.E., Hu, T.: Multi-terminal network flows. J. Soc. Ind. Appl. Math.
9(4), 551–570 (1961)
66. Görke, R.: An algorithmic walk from static to dynamic graph clustering. Ph.D.
thesis, Fakultät für Informatik, February 2010. http://digbib.ubka.uni-karlsruhe.
de/volltexte/1000018288
67. Görke, R., Hartmann, T., Wagner, D.: Dynamic graph clustering using minimum-
cut trees. In: Dehne, F., Gavrilova, M., Sack, J.-R., Tóth, C.D. (eds.) WADS
2009. LNCS, vol. 5664, pp. 339–350. Springer, Heidelberg (2009). doi:10.1007/
978-3-642-03367-4 30. http://dx.doi.org/10.1007/978-3-642-03367-4 30
68. Görke, R., Hartmann, T., Wagner, D.: Dynamic graph clustering using minimum-
cut trees. J. Graph Algorithms Appl. 16(2), 411–446 (2012)
69. Görke, R., Kluge, R., Schumm, A., Staudt, C., Wagner, D.: An efficient generator
for clustered dynamic random networks. In: Even, G., Rawitz, D. (eds.) MedAlg
2012. LNCS, vol. 7659, pp. 219–233. Springer, Heidelberg (2012). doi:10.1007/
978-3-642-34862-4 16
70. Görke, R., Maillard, P., Schumm, A., Staudt, C., Wagner, D.: Dynamic graph clus-
tering combining modularity and smoothness. ACM J. Exp. Algorithmics 18(1),
1.5:1.1–1.5:1.29 (2013). http://dl.acm.org/citation.cfm?doid=2444016.2444021
71. Görke, R., Schumm, A., Wagner, D.: Density-constrained graph clustering.
In: Dehne, F., Iacono, J., Sack, J.-R. (eds.) WADS 2011. LNCS, vol. 6844,
pp. 679–690. Springer, Heidelberg (2011). doi:10.1007/978-3-642-22300-6 58.
http://link.springer.com/chapter/10.1007/978-3-642-22300-6 58?null
72. Görke, R., Staudt, C.: A generator for dynamic clustered random graphs.
Technical report, ITI Wagner, Faculty of Informatics, Universität Karlsruhe
(TH) (2009). http://i11www.iti.uni-karlsruhe.de/projects/spp1307/dyngen,
informatik, Uni Karlsruhe, TR 2009-7
73. Grady, L., Schwartz, E.I.: Isoperimetric graph partitioning for image segmenta-
tion. IEEE Trans. Pattern Anal. Mach. Intell. 28(3), 469–475 (2006)
74. Greene, D., Doyle, D., Cunningham, P.: Tracking the evolution of communities in
dynamic social networks. In: Proceedings of the 2010 IEEE/ACM International
Conference on Advances in Social Networks Analysis and Mining, pp. 176–183.
IEEE Computer Society (2010)
75. Guimerà, R., Sales-Pardo, M., Amaral, L.A.N.: Module identification
in bipartite and directed networks. Phys. Rev. E 76, 036102 (2007).
http://link.aps.org/doi/10.1103/PhysRevE.76.036102
76. Held, P., Kruse, R.: Analysis and visualization of dynamic clusterings. In: Pro-
ceedings of the 46th Hawaii International Conference on System Sciences, pp.
1385–1393 (2013)
77. Hopcroft, J.E., Khan, O., Kulis, B., Selman, B.: Tracking evolving communities
in large linked networks. Proc. Natl. Acad. Sci. U.S.A. 101, 5244–5253 (2004).
http://www.pnas.org/content/101/suppl.1/5249.abstract
78. Jaccard, P.: The distribution of flora in the alpine zone. New Phytol. 11(2), 37–50
(1912)
79. Kannan, R., Vempala, S., Vetta, A.: On clusterings: good, bad, spectral. J. ACM
51(3), 497–515 (2004)
80. Kim, K., McKay, R.I., Moon, B.R.: Multiobjective evolutionary algorithms for
dynamic social network clustering. In: Proceedings of the 12th Annual Conference
on Genetic and Evolutionary Computation, pp. 1179–1186. ACM Press (2010)
81. Kim, M.S., Han, J.: A particle-and-density based evolutionary clustering method
for dynamic networks. In: Proceedings of the 35th International Conference on
Very Large Databases (VLDB 2009), pp. 622–633 (2009)
82. Kullback, S., Leibler, R.A.: On information and sufficiency. Ann. Math. Stat.
22(1), 79–86 (1951)
83. Kumar, R., Novak, J., Tomkins, A.S.: Structure and evolution of online social
networks. In: Proceedings of the 12th ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining, pp. 611–617. ACM Press (2006). http://
doi.acm.org/10.1145/1150402.1150476
84. Lai, J.H., Wang, C.D., Yu, P.: Dynamic community detection in weighted graph
streams. In: Proceedings of the 2013 SIAM International Conference on Data
Mining, pp. 151–161. SIAM (2013)
85. Lancichinetti, A., Fortunato, S.: Benchmarks for testing community detection
algorithms on directed and weighted graphs with overlapping communities. Phys.
Rev. E 80(1), 016118 (2009)
86. Lancichinetti, A., Fortunato, S., Kertész, J.: Detecting the overlapping and hier-
archical community structure of complex networks. New J. Phys. 11(033015)
(2009). http://www.iop.org/EJ/njp
87. Lancichinetti, A., Fortunato, S., Radicchi, F.: Benchmark graphs for testing com-
munity detection algorithms. Phys. Rev. E 78(4), 046110 (2008)
88. Lee, C., Cunningham, P.: Benchmarking community detection methods on social
media data. Preprint, arXiv:1302.0739 [cs.SI] (2013)
89. Leicht, E.A., Newman, M.E.J.: Community structure in directed net-
works. Phys. Rev. Lett. 100(11), 118703+ (2008). http://dx.doi.org/10.1103/
PhysRevLett.100.118703
90. Leighton, F.T., Rao, S.: Multicommodity max-flow min-cut theorems and their
use in designing approximation algorithms. J. ACM 46(6), 787–832 (1999).
http://portal.acm.org/citation.cfm?doid=331524.331526
91. Leskovec, J., Backstrom, L., Kumar, R., Tomkins, A.S.: Microscopic evolution of
social networks. In: Proceedings of the 14th ACM SIGKDD International Confer-
ence on Knowledge Discovery and Data Mining, pp. 462–470. ACM Press (2008)
92. Leskovec, J., Kleinberg, J.M., Faloutsos, C.: Graphs over time: densification laws,
shrinking diameters and possible explanations. In: Proceedings of the 11th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
pp. 177–187. ACM Press (2005). http://portal.acm.org/citation.cfm?id=1081893
93. Lin, Y.R., Chi, Y., Zhu, S., Sundaram, H., Tseng, B.L.: Analyzing communities
and their evolutions in dynamic social networks. ACM Trans. Knowl. Discov.
Data 3(2), 8:1–8:31 (2009)
94. Luo, F., Wang, J.Z., Promislow, E.: Exploring local community structures in
large networks. In: IEEE/WIC/ACM International Conference on Web Intelli-
gence, pp. 233–239. IEEE (2006). http://ieeexplore.ieee.org/xpl/articleDetails.
jsp?arnumber=4061371
95. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416
(2007). http://www.springerlink.com/content/jq1g17785n783661/
96. Meyerhenke, H.: Dynamic load balancing for parallel numerical simulations based
on repartitioning with disturbed diffusion. In: 15th International Conference on
Parallel and Distributed Systems (ICPADS), pp. 150–157. IEEE (2009)
97. Meyerhenke, H., Monien, B., Sauerwald, T.: A new diffusion-based multilevel
algorithm for computing graph partitions. J. Parallel Distrib. Comput. 69(9),
750–761 (2009). http://dx.doi.org/10.1016/j.jpdc.2009.04.005
98. Meyerhenke, H., Monien, B., Schamberger, S.: Graph partitioning
and disturbed diffusion. Parallel Comput. 35(10–11), 544–569 (2009).
http://dx.doi.org/10.1016/j.parco.2009.09.006
99. Mirkin, B.: Eleven ways to look at the chi-squared coefficient for contingency
tables. Am. Stat. 55(2), 111–120 (2001). http://www.jstor.org/stable/2685997
100. Misue, K., Eades, P., Lai, W., Sugiyama, K.: Layout adjustment
and the mental map. J. Vis. Lang. Comput. 6(2), 183–210 (1995).
http://www.sciencedirect.com/science/article/pii/S1045926X85710105
101. Moody, J., McFarland, D., Bender-deMoll, S.: Dynamic network visualization.
Am. J. Sociol. 110(4), 1206–1241 (2005)
102. Muelder, C., Ma, K.L.: Rapid graph layout using space filling curves. IEEE Trans.
Vis. Comput. Graph. 14(6), 1301–1308 (2008)
103. Muelder, C., Ma, K.L.: A treemap based method for rapid layout of large graphs.
In: Proceedings of IEEE Pacific Visualization Symposium (PacificVis 2008), pp.
231–238 (2008)
104. Newman, M.E.J.: The structure and function of complex networks. SIAM Rev.
45(2), 167–256 (2003). http://dx.doi.org/10.1137/S003614450342480
105. Newman, M.E.J.: Analysis of weighted networks. Phys. Rev. E 70(056131), 1–9
(2004). http://link.aps.org/abstract/PRE/v70/e056131
106. Newman, M.E.J.: Detecting community structure in networks. Eur. Phys.
J. B 38(2), 321–330 (2004). http://www.springerlink.com/content/5GTDACX
17BQV6CDC
107. Newman, M.E.J., Girvan, M.: Finding and evaluating commu-
nity structure in networks. Phys. Rev. E 69(026113), 1–16 (2004).
http://link.aps.org/abstract/PRE/v69/e026113
108. Nguyen, N.P., Dinh, T.N., Ying, X., Thai, M.T.: Adaptive algorithms for detecting
community structure in dynamic social networks. In: Proceedings of the 30th
Annual Joint Conference of the IEEE Computer and Communications Societies
(Infocom), pp. 2282–2290. IEEE Computer Society Press (2011)
109. Nicosia, V., Mangioni, G., Carchiolo, V., Malgeri, M.: Extending the definition
of modularity to directed graphs with overlapping communities. J. Stat. Mech.:
Theory Exp. 2009(03), p03024 (23pp) (2009). http://stacks.iop.org/1742-5468/
2009/P03024
110. Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.: Incremental spectral clustering
with application to monitoring of evolving blog communities. In: Proceedings of
the 2007 SIAM International Conference on Data Mining, pp. 261–272. SIAM
(2007)
111. Ning, H., Xu, W., Chi, Y., Gong, Y., Huang, T.: Incremental spectral clustering
by efficiently updating the eigen-system. Pattern Recogn. 43, 113–127 (2010)
112. Ovelgönne, M., Geyer-Schulz, A.: An ensemble learning strategy for graph clus-
tering. In: Graph Partitioning and Graph Clustering: Tenth DIMACS Implemen-
tation Challenge. DIMACS Book, vol. 588, pp. 187–206. American Mathematical
Society (2013). http://www.ams.org/books/conm/588/11701
113. Palla, G., Barabási, A.L., Vicsek, T.: Quantifying social group evolution.
Nature 446, 664–667 (2007). http://www.nature.com/nature/journal/v446/
n7136/abs/nature05670.html
114. Pang, S., Chen, C., Wei, T.: A realtime community detection algorithm: incremen-
tal label propagation. In: First International Conference on Future Information
Networks (ICFIN 2009), pp. 313–317. IEEE (2009)
115. Park, Y., Song, M.: A genetic algorithm for clustering problems. In: Proceedings
of the 3rd Annual Conference on Genetic Programming, pp. 568–575 (1998)
116. Patro, R., Duggal, G., Sefer, E., Wang, H., Filippova, D., Kingsford, C.: The miss-
ing models: a data-driven approach for learning how networks grow. In: Proceed-
ings of the 18th ACM SIGKDD International Conference on Knowledge Discovery
and Data Mining, pp. 42–50. ACM Press (2012)
117. Pearson, K.: On the criterion that a given system of deviations from the probable
in the case of a correlated system of variables is such that it can be reasonably
supposed to have arisen from random sampling. Philos. Mag. Ser. 5 50(302),
157–175 (1900)
118. Pons, P., Latapy, M.: Computing communities in large networks using
random walks. J. Graph Algorithms Appl. 10(2), 191–218 (2006).
http://www.cs.brown.edu/publications/jgaa/
119. Raghavan, U.N., Albert, R., Kumara, S.: Near linear time algorithm to detect
community structures in large-scale networks. Phys. Rev. E 76(3), 036106 (2007).
http://link.aps.org/doi/10.1103/PhysRevE.76.036106
120. Rand, W.M.: Objective criteria for the evaluation of clustering methods.
J. Am. Stat. Assoc. 66(336), 846–850 (1971). http://www.jstor.org/stable/
2284239?origin=crossref
121. Riedy, J., Bader, D.A.: Multithreaded community monitoring for massive stream-
ing graph data. In: Workshop on Multithreaded Architectures and Applications
(MTAAP 2013) (2013, to appear)
122. Riedy, E.J., Meyerhenke, H., Ediger, D., Bader, D.A.: Parallel commu-
nity detection for massive graphs. In: Wyrzykowski, R., Dongarra, J.,
Karczewski, K., Waśniewski, J. (eds.) PPAM 2011. LNCS, vol. 7203,
pp. 286–296. Springer, Heidelberg (2012). doi:10.1007/978-3-642-31464-3 29.
http://dx.doi.org/10.1007/978-3-642-31464-3 29
123. Rissanen, J.: Modeling by shortest data description. Automatica 14(5), 465–471
(1978)
124. Rotta, R., Noack, A.: Multilevel local search algorithms for modular-
ity clustering. ACM J. Exp. Algorithmics 16, 2.3:2.1–2.3:2.27 (2011).
http://doi.acm.org/10.1145/1963190.1970376
125. Rousseeuw, P.J.: Silhouettes: a graphical aid to the interpretation and val-
idation of cluster analysis. J. Comput. Appl. Math. 20, 53–65 (1987).
http://www.sciencedirect.com/science/article/pii/0377042787901257
126. Saha, B., Mitra, P.: Dynamic algorithm for graph clustering using minimum cut
tree. In: Proceedings of the Sixth IEEE International Conference on Data Min-
ing - Workshops, pp. 667–671. IEEE Computer Society, December 2006. http://
ieeexplore.ieee.org/xpls/abs all.jsp?arnumber=4063709
127. Saha, B., Mitra, P.: Dynamic algorithm for graph clustering using minimum cut
tree. In: Proceedings of the 2007 SIAM International Conference on Data Mining,
pp. 581–586. SIAM (2007). http://www.siam.org/proceedings/datamining/2007/
dm07.php
128. Sallaberry, A., Muelder, C., Ma, K.-L.: Clustering, visualizing, and navigat-
ing for large dynamic graphs. In: Didimo, W., Patrignani, M. (eds.) GD
2012. LNCS, vol. 7704, pp. 487–498. Springer, Heidelberg (2013). doi:10.1007/
978-3-642-36763-2 43. http://dx.doi.org/10.1007/978-3-642-36763-2 43
129. Sawardecker, E.N., Sales-Pardo, M., Amaral, L.A.N.: Detection of node group
membership in networks with group overlap. Eur. Phys. J. B 67, 277–284 (2009).
http://dx.doi.org/10.1140/epjb/e2008-00418-0
130. Schaeffer, S.E.: Graph clustering. Comput. Sci. Rev. 1(1), 27–64 (2007).
http://dx.doi.org/10.1016/j.cosrev.2007.05.001
131. Schuetz, P., Caflisch, A.: Efficient modularity optimization by multistep
greedy algorithm and vertex mover refinement. Phys. Rev. E 77(046112)
(2008). http://scitation.aip.org/getabs/servlet/GetabsServlet?prog=normal&
id=PLEEE8000077000004046112000001&idtype=cvips&gifs=yes
132. Shen, H., Cheng, X., Cai, K., Hu, M.B.: Detect overlapping and hierarchical
community structure in networks. Phys. A: Stat. Mech. Appl. 388(8), 1706–1712
(2009). http://www.sciencedirect.com/science/article/pii/S0378437108010376
133. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans.
Pattern Anal. Mach. Intell. 22(8), 888–905 (2000). http://doi.ieeecs.org/
10.1109/34.868688
328 T. Hartmann et al.
134. Sibson, R.: Slink: an optimally efficient algorithm for the single-link cluster
method. Comput. J. 16(1), 30–34 (1973). http://dx.doi.org/10.1093/comjnl/
16.1.30
135. Šı́ma, J., Schaeffer, S.E.: On the NP-completeness of some graph cluster measures.
In: Wiedermann, J., Tel, G., Pokorný, J., Bieliková, M., Štuller, J. (eds.) SOFSEM
2006. LNCS, vol. 3831, pp. 530–537. Springer, Heidelberg (2006). doi:10.1007/
11611257 51. http://dx.doi.org/10.1007/11611257 51
136. Snijders, T.A., Nowicki, K.: Estimation and prediction of stochastic blockmodels
for graphs with latent block structure. J. Classif. 14, 75–100 (1997)
137. Spiliopoulou, M., Ntoutsi, I., Theodoridis, Y., Schult, R.: MONIC: modeling and
monitoring cluster transitions. In: Proceedings of the 12th ACM SIGKDD Inter-
national Conference on Knowledge Discovery and Data Mining, pp. 706–711.
ACM Press (2006). http://doi.acm.org/10.1145/1150402.1150491
138. Stanton, I., Kliot, G.: Streaming graph partitioning for large distributed graphs.
In: Proceedings of the 18th ACM SIGKDD International Conference on Knowl-
edge Discovery and Data Mining, pp. 1222–1230. ACM Press (2012)
139. Staudt, C., Meyerhenke, H.: Engineering high-performance community detection
heuristics for massive graphs. In: Proceedings of the 2013 International Conference
on Parallel Processing. Conference Publishing Services (CPS) (2013)
140. Sun, J., Yu, P.S., Papadimitriou, S., Faloutsos, C.: Graphscope: parameter-
free mining of large time-evolving graphs. In: Proceedings of the 13th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining,
pp. 687–696. ACM Press (2007). http://portal.acm.org/citation.cfm?id=1281192.
1281266&coll=Portal&dl=GUIDE&CFID=54298929&CFTOKEN=41087406
141. Sun, Y., Tang, J., Han, J., Gupta, M., Zhao, B.: Community evolution detection
in dynamic heterogeneous information networks. In: Proceedings of the Eighth
Workshop on Mining and Learning with Graphs, pp. 137–146. ACM Press (2010).
http://doi.acm.org/10.1145/1830252.1830270
142. Sundaresan, S.R., Fischhoff, I.R., Dushoff, J.: Network metrics reveal differ-
ences in social organization between two fission-fusion species, Grevy’s zebra and
onager. Oecologia 151(1), 140–149 (2007)
143. Takaffoli, M., Fagnan, J., Sangi, F., Zaı̈ane, O.R.: Tracking changes in dynamic
information networks. In: Proceedings of the 2011 IEEE International Confer-
ence on Computational Aspects of Social Networks, pp. 94–101. IEEE Computer
Society (2011)
144. Takaffoli, M., Rabbany, R., Zaı̈ane, O.R.: Incremental local community identifi-
cation in dynamic social networks. In: Proceedings of the 2013 IEEE/ACM Inter-
national Conference on Advances in Social Networks Analysis and Mining, IEEE
Computer Society (2013, to appear)
145. Tong, H., Papadimitriou, S., Sun, J., Yu, P.S., Faloutsos, C.: Colibri: fast mining
of large static and dynamic graphs. In: Proceedings of the 14th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 686–694.
ACM Press (2008). http://doi.acm.org/10.1145/1401890.1401973
146. Vázquez, A.: Growing network with local rules: preferential attachment, clus-
tering hierarchy, and degree correlations. Phys. Rev. E 67, 056104 (2003).
http://link.aps.org/doi/10.1103/PhysRevE.67.056104
147. Viswanath, B., Mislove, A., Cha, M., Gummadi, P.K.: On the evolution of user
interaction in facebook. In: Proceedings of the 2nd ACM Workshop on Online
Social Networks, pp. 37–42. ACM Press (2009). http://doi.acm.org/10.1145/
1592665.1592675
Clustering Evolving Networks 329
148. Wagner, S., Wagner, D.: Comparing clusterings - an overview. Technical report
2006-04, ITI Wagner, Faculty of Informatics, Universität Karlsruhe (TH) (2007).
http://digbib.ubka.uni-karlsruhe.de/volltexte/1000011477
149. Wang, Y.J., Wong, G.Y.: Stochastic blockmodels for directed graphs. J. Am. Stat.
Assoc. 82, 8–19 (1987)
150. Watts, D.J.: Networks, dynamics, and the small-world phenomenon. Am. J.
Sociol. 105, 493–527 (1999)
151. Watts, D.J., Strogatz, S.H.: Collective dynamics of ‘small-world’ networks. Nature
393(6684), 440–442 (1998)
152. Xie, J., Chen, M., Szymanski, B.K.: LabelRankT: incremental community detec-
tion in dynamic networks via label propagation. CoRR abs/1305.2006 (2013).
http://arxiv.org/abs/1305.2006
153. Xie, J., Szymanski, B.K.: LabelRank: a stabilized label propagation algorithm for
community detection in networks. CoRR abs/1303.0868 (2013). http://arxiv.org/
abs/1303.0868
154. Xu, K.S., Kliger, M., Hero, A.O.: Tracking communities in dynamic social
networks. In: Salerno, J., Yang, S.J., Nau, D., Chai, S.-K. (eds.) SBP
2011. LNCS, vol. 6589, pp. 219–226. Springer, Heidelberg (2011). doi:10.1007/
978-3-642-19656-0 32
155. Xu, X., Yuruk, N., Feng, Z., Schweiger, T.A.J.: Scan: a structural clustering algo-
rithm for networks. In: Proceedings of the 13th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, pp. 824–833. ACM Press
(2007)
156. Yang, T., Chi, Y., Zhu, S., Jin, R.: Detecting communities and their evolutions
in dynamic social networks - a Bayesian approach. Mach. Learn. 82(2), 157–189
(2011)
157. Yu, K., Yu, S., Tresp, V.: Soft clustering on graphs. In: Advances in Neural
Information Processing Systems 18, p. 5. MIT Press (2006)
158. Yu, S.X., Shi, J.: Multiclass spectral clustering. In: Proceedings of the 9th IEEE
International Conference on Computer Vision, pp. 313–319 (2003)
159. Zachary, W.W.: An information flow model for conflict and fission in small groups.
J. Anthropol. Res. 33, 452–473 (1977)
160. Zhao, Y., Yu, P.S.: On graph stream clustering with side information. In: Proceed-
ings of the Seventh SIAM International Conference on Data Mining, pp. 139–150.
SIAM (2013)
161. Zheleva, E., Sharara, H., Getoor, L.: Co-evolution of social and affiliation net-
works. In: Proceedings of the 15th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, pp. 1007–1016. ACM Press (2009).
http://doi.acm.org/10.1145/1557019.1557128
162. Zhou, H.: Network landscape from a Brownian particle’s perspective. Phys. Rev.
E 67, 041908 (2003). http://link.aps.org/doi/10.1103/PhysRevE.67.041908
Integrating Sequencing and Scheduling:
A Generic Approach with Two Exemplary
Industrial Applications
Wiebke Höhn and Rolf H. Möhring

Institut für Mathematik, Technische Universität Berlin, Straße des 17. Juni 136,
10623 Berlin, Germany
{hoehn,moehring}@math.tu-berlin.de
1 Introduction
During the industrial revolution in the late 19th century, production processes
were widely reorganized from individual manufacturing to assembly line produc-
tion. The high efficiency of this new concept helped to drastically increase the
productivity so that nowadays, it is one of the standard manufacturing concepts
in several branches of industry. While over the years, assembly lines improved and
got technically more advanced, production planning at the same time became
more and more complex. Awareness of the central importance of elaborate schedules, though, grew only slowly. (This work was supported by the German Research Foundation (DFG) as part of the Priority Program "Algorithm Engineering".) Still, surprisingly many companies do not
exploit the full potential of their production lines due to suboptimal planning. It
is not unusual that even today production plans are designed manually without
any support of planning or optimization tools. Of course, this does not neces-
sarily imply poor plans. On the contrary, due to the long-term experience of the
planners, these schedules very often have an impressive quality. However, if the problem becomes too complex, it is basically impossible for a human being to oversee the problem as a whole, and hence, to produce good plans manually.
The addressed complex scheduling problems share major problem charac-
teristics. One of the most central components is the sequencing aspect which
arises naturally from the architecture of assembly lines. Due to complex side
constraints, however, the overall scheduling problem usually goes far beyond
basic sequencing problems known from theoretic literature. In most of the cases,
the actual processing sequence is just one of many strongly interdependent deci-
sions. Once a sequence is chosen, still certain allocations have to be made—e.g.,
the allocation of resources or the assignment of time slots for work related to
security regulations. Such decisions may depend not only on pairs of adjacently
scheduled jobs but also on whole subsequences or even the entire sequence in the
worst case. As a consequence, the quest for a good processing sequence can only
be answered accurately by taking into account the ensuing allocation problem
as well.
Currently, for most of those problems (sufficiently fast) exact algorithms
are out of reach. Computationally expensive heuristics are usually also not an
option, since planning tools are utilized on a daily basis and are therefore subject to strict runtime limits. When designing simple heuristics, on the other
hand, one is often faced with problems stemming from tangled constraints. Often,
the sequencing is tailored to only some of the many side constraints, leading in
the end to rather poor results.
In this paper, we investigate a generic integrated approach which aims to
avoid these difficulties. Based on a black box sequence evaluation, the algorithm
elaborately examines a large number of processing sequences. In this way, the
highly problem-specific allocation problem is separated from the problem-unspecific sequencing task. From a user's perspective, this allows one to focus solely on
the allocation problem which is usually far more manageable than the overall
scheduling problem. This makes the approach very user-friendly, and, as we will
see in examples in the following sections, also very successful. As a nice side
effect, the approach generates not only one but several good solutions such that
expert planners can take the final decision according to criteria possibly not even
covered by the actual model. Since a priori performance guarantees are unlikely to be attainable, we assess the performance of the algorithms via instance-based lower
bounds.
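The chapter's Algorithm 1.1 is not reproduced here, but its overall shape can be illustrated with a minimal, hypothetical sketch: an evolutionary loop over processing sequences (the chapter later speaks of an initial population, recombinations, and a genetic algorithm) in which the problem-specific allocation subroutine is accessed only through a black-box evaluation function. All names and parameters below are illustrative assumptions, not the authors' implementation.

```python
# Minimal, hypothetical sketch of the generic framework: an evolutionary loop
# over job sequences; the allocation problem is hidden behind evaluate().
import random
from typing import Callable, List, Sequence

def order_crossover(a: Sequence[int], b: Sequence[int], rng: random.Random) -> List[int]:
    """OX-style recombination: keep a slice of `a`, fill the rest in the order of `b`."""
    i, j = sorted(rng.sample(range(len(a)), 2))
    middle = list(a[i:j])
    rest = [job for job in b if job not in middle]
    return rest[:i] + middle + rest[i:]

def evolve_sequences(initial_population: List[List[int]],
                     evaluate: Callable[[Sequence[int]], float],
                     iterations: int = 1000,
                     mutation_rate: float = 0.2,
                     seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    # Cache the black-box evaluation; it may be expensive, since it solves the
    # allocation problem for the given sequence.
    cost = {tuple(seq): evaluate(seq) for seq in initial_population}
    population = sorted(initial_population, key=lambda s: cost[tuple(s)])
    for _ in range(iterations):
        parent_a, parent_b = rng.sample(population, 2)
        child = order_crossover(parent_a, parent_b, rng)
        if rng.random() < mutation_rate:          # small mutation: swap two jobs
            i, j = rng.sample(range(len(child)), 2)
            child[i], child[j] = child[j], child[i]
        cost[tuple(child)] = evaluate(child)
        # Keep the child if it beats the currently worst sequence in the population.
        if cost[tuple(child)] < cost[tuple(population[-1])]:
            population[-1] = child
            population.sort(key=lambda s: cost[tuple(s)])
    return population[0]
```

A user instantiating such a framework would only have to supply evaluate, i.e., an allocation algorithm that turns a sequence into a cost; the sequencing machinery itself stays untouched.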
Before addressing related algorithmic challenges, we will first give an idea
of what different planning problems and the underlying allocation problems
look like.
A first example is car sequencing: cars that require different options have to be arranged in a processing sequence on assembly belts. Due to the constant speed of the lines, and hence, the lim-
ited time for each working step, it is not possible to install certain options like
sunroofs or special navigation devices at every car in the sequence but only
at a certain fraction. Violations of the constraints are measured by a penalty
function, representing the cost for additional workers required or the additional
time needed. This classic problem was introduced in [29] as early as the 1980s. It prominently attracted attention when it was addressed in the 2005 ROADEF Challenge in cooperation with Renault. The results of this competition are summarized in [33]. In this problem, the underlying allocation problem is trivial. Once the processing sequence is chosen, only the penalty cost needs to be computed, and no further decisions have to be taken.
In applications from steel industry, there are usually further side constraints that go far beyond ATSP. A
class of such constraints is due to the temperature of the processed items which
plays a crucial role in many production steps. To avoid the items cooling down
too much, it is necessary to limit the waiting times between different processing
stages, and hence, it may be necessary to take into account preceding stages
when planning.
When implementing the generic algorithmic approach for a concrete problem,
the most elaborate part is usually the design of algorithms for the allocation
problem which—to serve as a proper subroutine—have to be sufficiently fast.
Due to the large number of sequence evaluation calls, we cannot afford allocation
algorithms with more than linear or quadratic runtime. In many cases, this
permits again only heuristics. Their quality goes very often hand in hand with
a good understanding of the problem, not only from the experimental but also
from the theoretical perspective. Structural insights gained in the analysis can
provide essential ideas for the design of algorithms.
Given a processing sequence π and an allocation A(π), the makespan decomposes as

C_max(π, A(π)) = Σ_{j∈J} p_j + Σ_{j=1}^{n−1} s_{π(j)π(j+1)} + c_π(A(π)).

The term Σ_{j=1}^{n−1} s_{π(j)π(j+1)} + c_π(A(π)) is referred to as the cost of the schedule. We typically assume that the allocation problem involves decisions that incorporate not only neighboring jobs but also larger parts of the processing sequence π. Consequently, also the additional allocation cost c_π(A(π)) will be non-local in this sense. In contrast, the term Σ_{j=1}^{n−1} s_{π(j)π(j+1)} is purely local, while Σ_{j∈J} p_j is even independent of the schedule.
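As a small illustration, this decomposition can be transcribed directly into code. The data layout below is an assumption for the sketch: p maps jobs to processing times, s maps ordered job pairs to setup durations, and allocation_cost stands in for the black-box term c_π(A(π)).

```python
# Direct transcription of the makespan decomposition above (assumed data layout).
def makespan(pi, p, s, allocation_cost):
    processing = sum(p[j] for j in pi)                               # schedule-independent
    setups = sum(s[(pi[k], pi[k + 1])] for k in range(len(pi) - 1))  # purely local
    return processing + setups + allocation_cost(pi)                 # non-local allocation cost
```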
The ATSP is NP-hard, even in the symmetric case when all edge lengths are 1 or 2 [28]. Moreover, for
the symmetric case with general edge lengths, it follows that there exists no con-
stant factor approximation algorithm unless P = NP [31]. Excellent references on
TSP include the textbook by Applegate et al. [4] and the more recent collection
by Gutin and Punnen [13]. Despite the hardness of ATSP, Helsgaun’s efficient
implementation of the Lin-Kernighan Heuristic [15] (LKH) can be expected to
compute optimal solutions for instances of up to roughly 100 and 1000 cities in
approximately one second and one minute, respectively.
Initial Population. The initial population plays a central role in the success of
our approach. Choosing an appropriate and diverse population usually increases
the chance to find good solutions and moreover, to do so much faster. In contrast
to the final sequence we aim to compute, for the initial population we rather look for sequences that are each specifically good with respect to only some of the many constraints. One natural such sequence is the one
we obtain when ignoring all side constraints and focusing only on minimizing
the total sequence-dependent setup durations sij . In this restricted form, the
problem can be formulated as ATSP with distances sij , and a solution can
be computed utilizing LKH. If necessary, we add further randomly generated
sequences. After all, a highly diverse initial population provides a good starting
point for later recombinations in the algorithm.
Fig. 1. (a) Schematic view of a coil coating line with chem, primer, and finish coater. (b) Coils of different widths as they are processed in the coating line (from left to right). The example shows the coils' colors at one particular coater.
Due to the extremely diverse product portfolio of coils, the coil coating
process plays an important role in the overall planning in steel industry. As
is typical for paint jobs, it may be subject to long setup times, mainly for the
cleaning of equipment, and thus very high setup cost. This problem has been
introduced and studied in detail in [18]. However, at this point our focus is
rather on illustrating the general approach taken in Algorithm 1.1, and so the
description remains high level.
Fig. 2. Interval model for the tank assignment and setup work schedule.
Related Work. Literature regarding optimization in the planning process for coil
coating in general is scarce at best: To the best of our knowledge, only Tang
and Wang [35] consider planning for a coil coating line. They apply tabu search
to a rather basic model without shuttle coaters. The present work is the first
incorporating shuttles and concurrent setup work in a thorough mathematical
investigation.
Model for the Underlying Allocation Problem. Due to the shuttle coaters, even
fixing the coil sequence leaves open the non-trivial question of deciding the
tank assignment and the scheduling of concurrent setup work. We develop a
representation of solutions of this allocation problem as a family of weighted
2-dimensional intervals or rectangles, where the first dimension is related to a
tank assignment and the second dimension to performing concurrent setup work.
More specifically, the x-axis is divided into disjoint segments, one for each
shuttle coater. Each of the segments covers the fixed chosen coil sequence, and
an “interval” in that sequence corresponds to a maximal sequence of consecu-
tive coils run from the same tank on the corresponding coater. Consequently,
intersecting intervals conflict in the sense of representing an infeasible tank
assignment. The y-axis is similarly divided into segments for each team that
can perform setup work. Here, an “interval” in the coil sequence corresponds to
a time interval during which setup work is performed concurrently to production.
If every team has the ability to perform all possible setup work, we have identi-
cal rows of the different teams in the y-direction. In order to properly perform
setup work concurrently, the tank on the respective coater must not be changed
during this interval, i.e., the rectangle’s setup interval must be contained in its
tank interval with respect to the segment-based time axis. Intersecting setup
intervals conflict since one team can only perform concurrent setup work at one
coater at a time. See Fig. 2 for an example.
Finally, we assign weights to the rectangles which represent the (possibly
negative) cost savings by the corresponding partial tank assignment and the
concurrent setup work performed, compared to running all coils on the same tank
without concurrent setup work. The fixed-sequence allocation problem is then
equivalent to finding pairwise non-conflicting rectangles with maximum total
weight (in a set of properly defined rectangles). This problem is closely related—
however, neither being a special case nor a generalization—to the Maximum
Weight Independent Set Problem on a restricted class of so called 2-union graphs,
which are of interest in their own right. Due to the limited space, here we only
focus on the tank assignment and concurrent setup work scheduling. In [18], we
provide additional results for the latter problem.
Algorithm for the Underlying Allocation Problem. Studying the allocation prob-
lem, we observed that even this fixed-sequence subproblem is strongly NP-hard
when the number of shuttle coaters is assumed to be part of the problem input.
Theorem 1 ([18]). For the number of shuttle coaters m being part of the prob-
lem input, the fixed sequence tank assignment and concurrent setup work schedul-
ing problem is strongly NP-hard for any fixed number r < m of work teams for
performing setup work.
This result shows that we cannot assume a polynomial time algorithm for the
allocation problem with an arbitrary number of shuttle coaters m. However, in
practice, this number is usually rather small so that it is reasonable to assume m
to be constant. Under this assumption, we could show that the problem is in P.
Theorem 2 ([18]). When the number of shuttle coaters m is constant, and the
ratio maxj∈C pj /τ is polynomial in the size of the input I, denoting by pj the
duration of a coil j and by τ the greatest common divisor of all setup durations,
then the fixed sequence tank assignment and concurrent setup work scheduling
problem can be solved in polynomial time, even if the number of work teams r is
part of the input.
Even though the designed algorithm has a polynomial runtime, it is far too
slow to be used as subroutine in our overall framework. Due to its dynamic
programming nature, even a single run of the algorithm for a very small number
of coils and shuttle coaters exceeds our overall runtime limit.
Still, the graph theoretical model inspires a fast and good heuristic for the
allocation problem which would not have been possible without the above inves-
tigations. The complexity of our exact algorithm stems from the need to consider
interval selections for all coaters simultaneously in order to ensure that savings
from all selected setup intervals can be realized by the scarce work resource(s).
Intuitively, the probability that concurrent setup work on different cells can be
scheduled feasibly, i.e., one setup at a time, increases with the length of the
associated tank interval. This is our core idea for computing good tank assign-
ments heuristically. Instead of considering all coaters at once, we consider them
separately. Recall that savings from tank intervals for different coaters can be
realized independently in any case. Now, instead of explicitly considering all pos-
sible saving rectangles belonging to some tank interval I, we assume that during
a certain fraction α of I's length setup work can be performed; at this point, we do not care when exactly it is performed or even whether it can be performed that way at all.
Modeling this idea, we define new weights for the intervals. With these mod-
ified weights, it suffices to consider tank intervals alone. As a consequence, com-
puting a tank assignment reduces to finding a Maximum Weight Independent
Set in an interval graph, which can be dealt with very efficiently; see e.g., [26]. In
order to compute a feasible concurrent setup allocation for this tank assignment,
we use an earliest-deadline-first strategy as a simple scheduling rule for properly
defined deadlines.
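For the per-coater subproblem that results from the modified weights, a standard weighted-interval-scheduling dynamic program suffices. The following is a minimal sketch under the assumption that each tank interval is given by its start, finish, and (possibly negative) modified weight, and that intervals touching only at their endpoints do not conflict.

```python
# Maximum Weight Independent Set on an interval graph (weighted interval
# scheduling), as it would be applied to the tank intervals of one coater.
from bisect import bisect_right

def max_weight_independent_intervals(intervals):
    """intervals: list of (start, finish, weight); returns the best total weight."""
    intervals = sorted(intervals, key=lambda iv: iv[1])       # sort by finish time
    finishes = [iv[1] for iv in intervals]
    # best[k] = optimal weight using only the first k intervals in sorted order
    best = [0.0] * (len(intervals) + 1)
    for k, (start, finish, weight) in enumerate(intervals, start=1):
        # Last interval (among the first k-1) that finishes no later than `start`.
        j = bisect_right(finishes, start, 0, k - 1)
        best[k] = max(best[k - 1], best[j] + weight)          # skip it or take it
    return best[len(intervals)]
```

In the heuristic described above, such a computation would be run once per shuttle coater, followed by the earliest-deadline-first rule for the concurrent setup work.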
Additionally, we also consider the allocation rule which was previously in use
at Salzgitter Flachstahl: Whenever subsequent coils have different colors, switch
the tank. If the new tank does not contain the required color, a color change on
that tank becomes necessary. We refer to this rule as FIFO.
Results. Embedding the above algorithm for the allocation problem into the
generic Algorithm 1.1, altogether, we develop a practical heuristic which solves
the overall planning problem and computes detailed production schedules for
the coil coating line. Besides the allocation algorithm, also the construction of
the initial population plays a major role for the overall quality of the algorithm.
It generates several very different solutions which are good with respect to only
some of the many side constraints. The quality of our schedules is assessed with
the help of an integer program which we solve by branch-and-price.
Our algorithm has been added to PSI Metals’ planning software suite, and
is currently in use for a coil coating line with shuttle coaters at Salzgitter
Flachstahl, Germany. At the time of its introduction, it yielded an average
reduction in makespan by over 13% as compared to the previous manual plan-
ning process. In addition, our lower bounds suggest that the makespan of the
solutions computed by our algorithm is within 10% of the optimal makespan
Fig. 3. Comparison of the allocation subroutine based on Independent Set (for the best α and for fixed α = 0.5) with the FIFO rule on the 16 long-term instances.

Fig. 4. Comparison of the lower bounds LBtriv, LBTSP, and LBIP with the FIFO results and the expert solutions on the short-term instances s1–s10.
for typical instances.2 Since most setup cost calculations are incorporated into
our methods as a black box, our algorithm can be adapted easily to other coil
coating lines with different setup rules and a different number of shuttle coaters.
We close this section with further details on our computational study which
was based on instances from daily production at Salzgitter Flachstahl. For
long-term instances which cover roughly 72 h of production, we compared the
allocation subroutine based on Independent Set for the best α in
2 This success has made this contribution a finalist of the 2009 EURO Excellence in Practice Award.
{0.1, 0.2, . . . , 0.8} and for fixed α = 0.5, and the FIFO rule; see Fig. 3 and Table 1.
The Independent Set Heuristic outperformed FIFO on 12 of the 16 instances,
reducing cost by up to 30% (over FIFO). This translates to makespan savings of
up to 6%. When fixing α to 0.5, the Independent Set Heuristic remains similarly
superior to FIFO on 8 instances, while incurring an increase in makespan of at
most 1% in four cases.
For short-term instances of roughly 24 h, we succeeded in computing lower
bounds with respect to the FIFO rule by our IP approach. Yet, we did not solve
all instances to integer optimality, so the lower bound is certainly improvable. In
Fig. 4 and Table 1 we compare different lower bounds—a trivial bound LBtriv , a
TSP-based bound LBTSP, and the mentioned IP bound LBIP, see [18] for further
details—with our FIFO results and the solutions we were provided with by
Salzgitter Flachstahl. The results show makespan reductions of up to more than
30%. The superiority of the Independent Set Heuristic to FIFO is less significant
in short-term planning, so that we focused on FIFO for the short instances.
Fig. 5. The schedule (a) contains only jobs and sequence-dependent setups, two of them
being cleanings. The time between the cleanings exceeds the maximum time lag Δclean
between cleanings. The schedules (b) and (c) below are feasible with respect to cleaning
time lags. The former replaces two existing sequence-dependent setups while the latter
preempts the middle job to require only one additional cleaning.
possibly requiring more additional cleanings in total; see Fig. 5. Thus, setups are
not only determined by their neighboring jobs, but also by scheduling decisions
taken in view of the entire sequence.
Moreover, there are waiting periods which are caused by filling constraints of
certain products, or other technical constraints. A typical example for the latter
is the production of cream. The pretreated milk is processed in a special tank
before the finished cream can be filled into cups. Due to the limited tank size,
cream jobs have to be partitioned into different tank fillings. Resulting from the
preparation time in the tank and the later cleaning time, the filling machine may
be idle, awaiting the first cream job of a new tank filling. We refer to constraints
of this type as capacity constraints.
Related Work. Previous work on the planning of filling lines in dairy industry
mainly focuses on mixed-integer linear programming (MILP) models to minimize
weighted cost functions comprising setup cost, storage cost and others. Recently,
Kopanos et al. [19] proposed a MILP model for parallel filling machines, taking
into account machine- and sequence-dependent setups, due dates, and certain
tank capacities. Further MILP models were proposed for different problem set-
tings, e.g., by [11,22,23]. For the common planning horizon of one week, the mod-
els in [11,19] could compute optimal solutions at very low computational cost.
Based on different relaxations of their model, Marinelli et al. [23] also proposed
an algorithm which is heuristically almost optimal, but at a high computational
expense. However, different to our problem, regular cleanings of the filling line
are performed at the end of the day in all these papers. In combination with their
very restricting sequencing constraints, this turns these problems into packing
problems rather than into sequencing problems as in our case. The results cannot
be adapted to our problem, and, to the best of our knowledge, flexible cleaning
scheduling has not been considered in the scheduling literature yet.
Model for the Underlying Allocation Problem. The side constraints in the dairy
problem exhibit a very similar structure. In fact, we can formulate all of them
as generalized capacity constraints. Such a constraint is defined in a similar way
as normal capacity constraints. For the normal capacity constraints, one is look-
ing for consecutive subgroups of jobs of a certain characteristic, in total not
exceeding a given capacity limit—e.g., cream jobs of at most 10000 liters—
where consecutive means that there are no other jobs of the same characteristic
between the jobs of the subgroup. However, jobs of other characteristics may
occur in between; see the bottom part of Fig. 6. While for the normal capacity
constraints there are certain time lags to be observed between the subgroups,
in the generalized case, there are general cost. Moreover, generalized capacity
constraints do not only take into account jobs but also setup work or waiting
periods, jointly referred to as tasks.
Note that the dairy problem involves multiple generalized capacity con-
straints, and a subgroup’s cost with respect to a certain constraint strongly
depends on the chosen subgroups corresponding to other constraints, since the
related costs may represent setup or waiting tasks that have to be added to
the sequence. E.g., if inserting cleanings into the schedule for the cleaning con-
straint, actual waiting times of other constraints may decrease, and so would
the cost of the associated subgroups with respect to that constraint. Thus, by
optimally satisfying one constraint after the other, in general, we do not obtain
an optimum schedule.
Algorithm for the Underlying Allocation Problem. The special case in which the
allocation problem consists of a single generalized capacity constraint can be
solved with standard dynamic programming. In order to make use of efficient
software libraries, we formulate the problem as shortest path problem. For a
given sequence of tasks, we define a graph whose vertices lie on the natural time
axis of the sequence; see Fig. 6. For each characteristic task, we place a vertex
at its start time. Additionally, a vertex is placed at the completion time of the
last characteristic task. We insert an arc pointing from one vertex to another if
and only if it is a forward arc in the natural sense of time, and the tasks below
that arc, i.e., the tasks that lie in the time interval covered by the arc, form
a feasible subgroup with respect to the constraint under consideration. More
precisely, the subgroup is formed by the subsequence of tasks, starting with
the first characteristic job below that arc, and ending with the last such job,
respectively. Since any feasible subgroup is represented by an arc in the graph,
this yields a one-to-one correspondence between feasible sets of subgroups and
paths in the graph. By using the subgroups costs as arc weights, a shortest
path in this graph corresponds to a minimum cost choice of subgroups. Since
the number of arcs is quadratic in the number of characteristic tasks, Dijkstra's shortest path algorithm solves the problem in polynomial time.
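A minimal sketch of this construction follows, with hypothetical callbacks: feasible(u, v) reports whether the tasks spanned by the arc from vertex u to vertex v form a feasible subgroup, and cost(u, v) is that subgroup's cost. Since every arc points forward along the time axis, a plain dynamic program over the ordered vertices already finds the shortest path; Dijkstra's algorithm would of course work as well.

```python
# Cheapest decomposition into feasible subgroups as a shortest path in a DAG.
# Vertex 0 sits at the first characteristic task, vertex num_vertices - 1 at the
# completion of the last one; all arcs point forward in time.
import math

def min_cost_subgroups(num_vertices: int, feasible, cost) -> float:
    dist = [math.inf] * num_vertices
    dist[0] = 0.0
    for v in range(1, num_vertices):
        for u in range(v):                       # all forward arcs (u, v)
            if dist[u] < math.inf and feasible(u, v):
                dist[v] = min(dist[v], dist[u] + cost(u, v))
    return dist[-1]
```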
If there is more than one generalized capacity constraint, it might as well
be possible to solve the allocation problem (under some restrictions) with an
adapted dynamic programming approach. However, since we do not expect such
an algorithm to be sufficiently fast for being used as subroutine in Algorithm 1.1,
our practical focus was more on fast heuristic algorithms. Aiming for at most a
quadratic runtime, we examined algorithmic variants which satisfy the different
generalized capacity constraints one after the other. The variants differ both in the order in which the constraints are satisfied and in the subroutine that is used in each constraint step. While in the first setting, we use the above
shortest path approach (SP) which satisfies each constraint at minimum cost,
in the second variant we use a greedy subroutine (Greedy) which adds tasks
to subgroups as long as the given capacity is not exceeded. The keywords First
and Last indicate that the cleaning constraint is satisfied before and after the
generalized capacity constraints, respectively. In our tests we consider all four
variants GreedyFirst, GreedyLast, SPFirst and SPLast.
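One plausible reading of the Greedy subroutine is sketched below, under assumed interfaces (a per-task demand towards the capacity and a fixed cost for closing a subgroup, e.g., an inserted cleaning or waiting task); the chapter's actual cost structure is richer than this.

```python
def greedy_subgroups(tasks, demand, capacity, close_cost):
    # Scan the fixed task sequence once; open a new subgroup whenever adding the
    # next characteristic task would exceed the capacity, paying close_cost each time.
    total_cost, used = 0.0, 0.0
    for task in tasks:
        d = demand(task)              # zero for tasks without the relevant characteristic
        if d == 0:
            continue
        if used + d > capacity:       # close the current subgroup first
            total_cost += close_cost
            used = 0.0
        used += d
    if used > 0:
        total_cost += close_cost      # assuming the final subgroup also incurs a cost
    return total_cost
```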
Currently, the complexity of the allocation problem is still open. For the
overall dairy problem with its sequencing and allocation components, already a
very restricted special case can be shown to be strongly NP-hard [12].
Results. We evaluate all of the above variants of the allocation algorithm as sub-
routine in the framework of Algorithm 1.1. For the initial population, we choose
a sequence which is optimal with respect to local setup work but disregards all
other constraints, and in addition, we choose several random sequences. The
former is computed utilizing LKH. Ignoring all cost due to generalized capacity
constraints, i.e., accounting only for the job’s processing times and local setup
times, the LKH solution provides a lower bound for the optimum cost of the
general problem. This bound is used to evaluate our solutions.
Our industrial partner Sachsenmilch provided us with actual production data
which we used to generate 2880 additional realistic, but in different ways more
extreme, problem instances. For the data of Sachsenmilch’s current production,
our approach achieved an optimality gap of only 2%, and of roughly 15% on
average for the generated instances. Due to the presumed weakness of the lower
bound, the gap of 2% makes us believe that the former solution is in fact optimal.
Comparing the Greedy and the optimal SP subroutine, it turned out that
Greedy generates in fact better results. A reason for this might be that in
contrast to the schedules computed with the shortest path subroutine SP, the
Greedy schedules never preempt jobs. This property seems to make it easier to adapt to "later" constraints.
In the remainder of this section, we will provide some more detailed com-
putational results. In our evaluations, we filter the instances for one particular
parameter, i.e., an instance belongs to such a setting if this parameter is satisfied,
no matter how the remaining parameters are set. We examined different base
types (all, yoghurt, cream), processing times (original, small, large), volumes of
the cream tank (original, small), and lengths of the cleaning intervals (original,
small). The setting original comprises all instances that were generated accord-
ing to the actual parameters by Sachsenmilch. The results are shown in Fig. 7
and the corresponding Table 2.
We compared the better result of the two greedy algorithms GreedyLast
and GreedyFirst (BestGreedy) with the better result of the two short-
est path variants SPLast and SPFirst (BestSP). BestGreedy always performed better than BestSP, by up to 4% in average makespan. In fact, for the original test setting, BestSP is never better than BestGreedy. If utilizing
BestSP, the worst case gap compared to BestGreedy is 6%, whereas con-
versely this gap may grow up to 210%. As expected, also the runtimes differ
greatly; see again Table 2. While BestSP produces better solutions for the allo-
cation subproblem, the faster running time of BestGreedy allows for more
iterations of the genetic algorithm and better overall solutions.
When comparing the two greedy and the two shortest path approaches with each other, the difference in performance is not that striking. The average
performance is almost identical. However, considering the worst case behavior, in
both cases the Last-algorithm is only about 6% worse than the First-algorithm,
where conversely, the gap can attain 15%. This may be due to the advantage of
scheduling the inflexible cleanings as late as possible.
(a) Ratio BestSP/BestGreedy. (b) Ratio GreedyLast/LB.
Fig. 7. Computational results for the different parameter classes. In the box plots, the average value ∅ is marked as a fat line and the surrounding rectangle shows the interval of the standard deviation σ. The minimum and maximum values are marked at the lower and upper ends of the thin vertical line, respectively.
The optimality gap computed with the LKH lower bound is roughly 15%,
and it is slightly better for the greedy algorithms than for the shortest path
algorithms. As an example, for our best algorithm GreedyLast the results are
shown in Fig. 7(b). For all algorithms, the optimality gap is much worse for a
small tank size. However, in this case the lower bound performs very poorly, so
that one may assume that this gap is far away from being tight.
The objective here is to minimize Σ_{j∈J} f(C_j), where C_j denotes the completion time of job j and f is a given cost function. In the literature, scheduling problems with this objective are often referred to as generalized min-sum scheduling. For instance, setting the cost function to f(t) := t^p for some p ≥ 1 and applying the p-th root to the optimal objective function value, we minimize the L_p-norm, a classic fairness measure.
6 Conclusions
Our work shows that good algorithm engineering requires theory and practice
to go hand in hand. Problems from industry inspire new interesting theoretical
questions which, on the one hand, naturally enhance problems that have been
studied in theory, while on the other hand, they provide essential insights for
designing better practical algorithms. In the end, algorithm engineering makes it possible to achieve results which would not have been possible with standard techniques
and from which in fact both theory and practice benefit.
References
1. Aarts, E., Lenstra, J.K. (eds.): Local Search in Combinatorial Optimization. Wiley,
Hoboken (1997)
2. Akkerman, R., van Donk, D.P.: Analyzing scheduling in the food-processing indus-
try: structure and tasks. Cogn. Technol. Work 11, 215–226 (2009)
3. Allahverdi, A., Ng, C.T., Cheng, T.C.E., Kovalyov, M.Y.: A survey of scheduling
problems with setup times or costs. Eur. J. Oper. Res. 187, 985–1032 (2008)
4. Applegate, D.L., Bixby, R.E., Chvátal, V., Cook, W.J.: The Traveling Salesman
Problem: A Computational Study. Princeton University Press, Princeton (2006)
5. Balas, E., Simonetti, N., Vazacopoulos, A.: Job shop scheduling with setup times,
deadlines and precedence constraints. J. Sched. 11, 253–262 (2008)
6. Bampis, E., Guinand, F., Trystram, D.: Some models for scheduling parallel pro-
grams with communication delays. Discrete Appl. Math. 72, 5–24 (1997)
7. Bellabdaoui, A., Teghem, J.: A mixed-integer linear programming model for the
continuous casting planning. Int. J. Prod. Econ. Theor. Issues Prod. Sched. Control
Plan. Control Supply Chains Prod. 104, 260–270 (2006)
8. Claassen, G.D.H., van Beek, P.: Planning and scheduling. Eur. J. Oper. Res. 70,
150–158 (1993)
9. Cowling, P.: A flexible decision support system for steel hot rolling mill scheduling.
Comput. Ind. Eng. 45, 307–321 (2003)
10. Delucchi, M., Barbucci, A., Cerisola, G.: Optimization of coil coating systems by
means of electrochemical impedance spectroscopy. Electrochim. Acta 44, 4297–
4305 (1999)
11. Doganis, P., Sarimveis, H.: Optimal production scheduling for the dairy industry.
Ann. Oper. Res. 159, 315–331 (2008)
12. Gellert, T., Höhn, W., Möhring, R.H.: Sequencing and scheduling for filling lines in
dairy production. Optim. Lett. 5, 491–504 (2011). Special issue of SEA 2011
13. Gutin, G., Punnen, A.P.: The Traveling Salesman Problem and Its Variations.
Springer, New York (2002)
14. Hall, N.G., Potts, C.N., Sriskandarajah, C.: Parallel machine scheduling with a
common server. Discrete Appl. Math. 102, 223–243 (2000)
15. Helsgaun, K.: An effective implementation of the Lin-Kernighan traveling salesman
heuristic. Eur. J. Oper. Res. 126, 106–130 (2000)
16. Höhn, W., Jacobs, T.: An experimental and analytical study of order constraints
for single machine scheduling with quadratic cost. In: Proceedings of the 14th
Meeting on Algorithm Engineering and Experiments (ALENEX), pp. 103–117.
SIAM (2012)
17. Höhn, W., Jacobs, T.: On the performance of Smith’s rule in single-machine
scheduling with nonlinear cost. ACM Trans. Algorithms (2014, to appear)
18. Höhn, W., König, F.G., Lübbecke, M.E., Möhring, R.H.: Integrated sequencing
and scheduling in coil coating. Manage. Sci. 57, 647–666 (2011)
19. Kopanos, G.M., Puigjaner, L., Georgiadis, M.C.: Optimal production scheduling
and lot-sizing in dairy plants: the yogurt production line. Ind. Eng. Chem. Res.
49, 701–718 (2010)
20. Kopanos, G.M., Puigjaner, L., Georgiadis, M.C.: Efficient mathematical frame-
works for detailed production scheduling in food processing industries. Comput.
Chem. Eng. 42, 206–216 (2012)
21. Koulamas, C., Kyparisis, G.J.: Single-machine scheduling problems with past-
sequence-dependent setup times. Eur. J. Oper. Res. 187, 1045–1049 (2008)
22. Lütke Entrup, M., Günther, H.-O., van Beek, P., Grunow, M., Seiler, T.: Mixed-
integer linear programming approaches to shelf-life-integrated planning and scheduling
in yoghurt production. Int. J. Prod. Res. 43, 5071–5100 (2005)
23. Marinelli, F., Nenni, M., Sforza, A.: Capacitated lot sizing and scheduling with
parallel machines and shared buffers: a case study in a packaging company. Ann.
Oper. Res. 150, 177–192 (2007)
24. Meloni, C., Naso, D., Turchiano, B.: Multi-objective evolutionary algorithms for
a class of sequencing problems in manufacturing environments. In: Proceedings of
the IEEE International Conference on Systems, Man and Cybernetics, pp. 8–13
(2003)
25. Meuthen, B., Jandel, A.-S.: Coil Coating. Vieweg, Wiesbaden (2005)
26. Möhring, R.H.: Algorithmic aspects of comparability graphs and interval graphs.
In: Rival, I. (ed.) Graphs and Order. NATO ASI Series, vol. 147, pp. 41–101.
Springer, Heidelberg (1985)
27. Mühlenbein, H., Gorges-Schleuter, M., Krämer, O.: Evolution algorithms in com-
binatorial optimization. Parallel Comput. 7, 65–85 (1988)
28. Papadimitriou, C.H., Yannakakis, M.: The traveling salesman problem with dis-
tances one and two. Math. Oper. Res. 18, 1–11 (1993)
29. Perrello, B.D., Kabat, W.C.: Job-shop scheduling using automated reasoning: a
case study of the car-sequencing problem. J. Autom. Reason. 2, 1–42 (1986)
30. Rekieck, B., De Lit, P., Delchambre, A.: Designing mixed-product assembly lines.
IEEE Trans. Robot. Autom. 16, 268–280 (2000)
31. Sahni, S., Gonzalez, T.F.: P-complete approximation problems. J. ACM 23, 555–
565 (1976)
32. Smith, W.E.: Various optimizers for single-stage production. Naval Res. Logist. Q.
3, 59–66 (1956)
33. Solnon, C., Cung, V.D., Nguyen, A., Artigues, C.: The car sequencing prob-
lem: overview of state-of-the-art methods and industrial case-study of the
ROADEF’2005 challenge problem. Eur. J. Oper. Res. 191, 912–927 (2008)
34. Tang, L., Liu, J., Rong, A., Yang, Z.: A review of planning and scheduling systems
and methods for integrated steel production. Eur. J. Oper. Res. 133, 1–20 (2001)
35. Tang, L., Wang, X.: Simultaneously scheduling multiple turns for steel color-
coating production. Eur. J. Oper. Res. 198, 715–725 (2009)
36. van Dam, P., Gaalman, G., Sierksma, G.: Scheduling of packaging lines in the
process industry: an empirical investigation. Int. J. Prod. Econ. 30–31, 579–589
(1993)
Engineering a Bipartite Matching Algorithm
in the Semi-Streaming Model
Lasse Kliemann
Outline. The first three sections are an introduction to the matching problem
in the semi-streaming model. In Sect. 4, a previous semi-streaming algorithm
is described, which grows augmenting paths using the position limiting tech-
nique [11] and is the basis for all further development. Experimental studies start
in Sect. 5, with a description of the experimental setup. In Sect. 6, the algorithm
from [11] is analyzed experimentally, which marks the start of the Algorithm
Engineering process. The remaining sections describe how this process leads to
a new semi-streaming algorithm [21], which experimentally shows much smaller
pass counts than the previous one but maintains many of the (good) theoretical
properties.
1 Bipartite Matching
Let G = (A, B, E) be a bipartite graph, i. e., V := A ∪ B is the set of vertices,
A ∩ B = ∅, and E ⊆ {{a, b}; a ∈ A ∧ b ∈ B} are the edges. Denote n := |V |
and N := |E| the number of vertices and edges, respectively. The density of the graph is D := |E| / (|A||B|) ∈ [0, 1], i. e., the ratio of the number of edges to the maximum possible number of edges given the two sets A and B. A matching of G is a set M ⊆ E such that m ∩ m′ = ∅ for all m, m′ ∈ M with m ≠ m′. A matching M of G is called inclusion-maximal a.k.a. maximal if M ∪ {e} is not a matching for all e ∈ E \ M, i. e., if we cannot add another edge without destroying the matching property. A matching M* of G is called a cardinality-maximal a.k.a. maximum matching if |M| ≤ |M*| for all matchings M of G. If M* is a maximum matching and ρ ≤ 1 is a real number, then a matching M is called a ρ-approximation if |M| ≥ ρ|M*|.
A maximal matching is easy to find algorithmically: just start with M = ∅
then consider all the edges in an arbitrary order and add an edge to M if this
does not destroy the matching property. However, a maximal matching need not be a maximum matching, although it is a well-known fact that it is a 1/2-approximation, i. e., |M| ≥ (1/2)|M*| for all maximal matchings M and a maximum matching M* of G. Devising polynomial-time algorithms computing
maximum matchings in bipartite graphs is a classical and important problem in
Combinatorial Optimization.
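The greedy procedure just described fits in a few lines (a sketch; vertex identifiers from A and B are assumed to be distinct objects). It stores nothing but the matching itself, which is why, as discussed in Sect. 2, a maximal matching can also be computed in a single pass over an edge stream.

```python
# Greedy maximal matching: scan the edges once in arbitrary order and keep an
# edge whenever both of its endpoints are still free.
def maximal_matching(edges):
    matched = set()
    matching = []
    for a, b in edges:
        if a not in matched and b not in matched:
            matching.append((a, b))
            matched.add(a)
            matched.add(b)
    return matching
```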
We introduce some terminology relative to a matching M . A vertex v is
called matched or covered if there is m ∈ M such that v ∈ m, and v is called free
otherwise. For X ⊆ V , denote free(X) the free vertices in X. We often denote
free vertices with lower-case Greek letters. If v ∈ V is matched, then there is
a unique vertex Mv with {v, Mv } ∈ M ; we call Mv the mate of v. An edge
e ∈ E is called a matching edge if e ∈ M and free otherwise. Note that the end-
vertices of a free edge need not to be free vertices. A path is called alternating
if it traverses matching edges and free edges alternately. An alternating path
where both of the end-vertices are free, is called an augmenting path. Clearly,
augmenting paths have an odd number of edges and in the bipartite case can
be written in the form (α, e1 , b1 , m1 , a1 , . . . , mt , at , et+1 , β), where α ∈ free(A),
β ∈ free(B), a1 , . . . , at ∈ A, b1 , . . . , bt ∈ B, m1 , . . . , mt ∈ M , and e1 , . . . , et+1 ∈
E \ M for some t ∈ N. The length (number of traversed edges) of this path is
2t + 1. We always denote paths as a sequence of vertices and edges; this notation
has redundancy but will be helpful.
By exchanging matching edges for free edges along an augmenting path, a
new matching is obtained from M with cardinality |M | + 1. Hence a maximum
matching does not admit any augmenting paths. By Berge’s theorem [5], the
converse also holds: when M admits no augmenting paths, then M is a maximum
matching. This theorem also holds for general (not necessarily bipartite) graphs.
Finding augmenting paths, or determining that there exist none, is algorith-
mically easy in the bipartite case. The simplest algorithm starts at an arbitrary
free vertex α ∈ free(A) and does a modified breadth-first search (BFS) which
only considers free edges when moving from A to B and only matching edges
when moving from B to A.1 As soon as another free vertex β is found (it will be
in B then), an augmenting path (α, . . . , β) is constructed by following the BFS
layers. In each such iteration, the matching grows by 1, so in O (n) iterations
a maximum matching is reached, resulting in a bound of O(nN) = O(n^3) on
the total runtime. If no free vertex β is found, then the next free vertex from
free(A) is tried as a starting point for BFS. It can be seen easily that if free(B)
is not reachable from free(A) by modified BFS, then there are no augmenting
paths. A similar argument cannot be made for general graphs, causing the situa-
tion there to be much more complicated. We will not consider general graphs in
this work.
1 This can also be achieved by orienting the edges (free edges from A to B, matching edges from B to A) and then using normal BFS in the resulting directed bipartite graph (A, B, E′).
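A compact sketch of the simple algorithm described above follows. It assumes that adjacency is available as a dictionary adj from A-vertices to their neighbors and that vertex identifiers of the two sides are distinct; both are assumptions for the sketch.

```python
# Repeated modified BFS from a free A-vertex: free edges from A to B, matching
# edges from B to A; augment as soon as a free B-vertex is reached.
from collections import deque

def maximum_bipartite_matching(a_vertices, adj):
    mate = {}                                   # mate[v] = matched partner of v
    def augment_from(alpha):
        parent = {alpha: None}                  # BFS tree over visited vertices
        queue = deque([alpha])
        while queue:
            a = queue.popleft()                 # a is always an A-vertex here
            for b in adj[a]:
                if b in parent:
                    continue
                parent[b] = a
                if b not in mate:               # free B-vertex: augmenting path found
                    while b is not None:        # flip edges along the path
                        a = parent[b]
                        prev_b = parent[a]
                        mate[a], mate[b] = b, a
                        b = prev_b
                    return True
                nxt = mate[b]                   # follow the matching edge back to A
                parent[nxt] = b
                queue.append(nxt)
        return False
    for alpha in a_vertices:
        if alpha not in mate:
            augment_from(alpha)
    return {a: mate[a] for a in a_vertices if a in mate}
```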
For the bipartite case, the total runtime bound can be reduced to O(√n · N) = O(n^{5/2}) using the algorithm by Hopcroft and Karp [18] (a good description can also be found in [24]). Modified BFS is done starting from free(A), more precisely, we start with the BFS queue containing all vertices from free(A). This BFS constructs a layered graph, with free(A) being one of the end-layers. As soon as a layer with vertices from free(B) is found, depth-first search (DFS) in the layered graph is used to find an inclusion-maximal set A of pairwise vertex-disjoint shortest augmenting paths. We say disjoint in the following meaning "pairwise vertex-disjoint". Since the paths in A are disjoint, they can be used simultaneously to improve the matching; the new matching will be of cardinality |M| + |A|. One such BFS and then DFS is called a phase and takes O(N) time. The achievement of Hopcroft and Karp lies in recognizing and proving that there are only O(√n) phases. An implementation of their algorithm particularly suited for dense graphs, with a runtime bound of O(n^{1.5} √(N / log n)), was later given by Alt et al. [3].
There is also a randomized algorithm by Mucha and Sankowski [27], which
runs in O(n^ω), where ω depends on the running time of the best known matrix
multiplication algorithm; currently known is ω < 2.38.
On the experimental side, two early studies by Setubal [30] and Cherkassky
et al. [7] are particularly important since they introduce families of bipartite
graphs where solving the maximum matching problem is difficult. In practice,
the choice of the initial (usually inclusion-maximal) matching can make a differ-
ence. Different such initialization heuristics are known and were combined with
different algorithms in the more recent study by Langguth et al. [25]. One possi-
ble initialization heuristic repeatedly picks a vertex v with minimum degree and
matches it to one of its neighbors w, either randomly chosen or also chosen with
minimum degree. Then the two matched vertices are removed from all further
considerations and the degrees of the remaining vertices updated accordingly,
i. e., all remaining neighbors of v and w have their degree reduced.
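A sketch of this initialization heuristic in the classical random-access setting is given below (hypothetical interface: adj maps every vertex of either side to the set of its neighbors); a real implementation would maintain the degrees in a priority queue instead of rescanning.

```python
# Minimum-degree initialization heuristic: repeatedly match a vertex of minimum
# degree to a minimum-degree neighbor and remove both from the graph.
def min_degree_initial_matching(adj):
    adj = {v: set(nbrs) for v, nbrs in adj.items()}   # work on a copy
    matching = []
    alive = {v for v, nbrs in adj.items() if nbrs}
    while alive:
        v = min(alive, key=lambda u: len(adj[u]))     # vertex of minimum degree
        w = min(adj[v], key=lambda u: len(adj[u]))    # its neighbor of minimum degree
        matching.append((v, w))
        for x in (v, w):                              # remove v and w, update degrees
            for nbr in adj[x]:
                adj[nbr].discard(x)
            adj[x] = set()
        alive = {u for u in alive if adj[u]}
    return matching
```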
2 Semi-Streaming Model
Traditionally, random access to the problem instance (here the graph) is assumed
to be cheap. For example, the following assumptions are typically made, where
Δ := max_{v∈V} deg(v) is the maximum degree in the graph:
– when using an adjacency matrix, whether two vertices are adjacent or not can
be tested in O (1) time and the neighborhood of a vertex can be collected in
O (n);
– when using adjacency lists, adjacency can be tested in O (Δ) time (or O (log Δ)
if vertices are ordered) and the neighborhood of a vertex v can be traversed
in O (deg(v)).
BFS and DFS, heavily used in the algorithms discussed in the previous
section, rely on this. However, when the input becomes very large, perhaps larger
than the amount of random access memory (RAM) available on the machine,
then those assumptions are no longer realistic. Large data calls for a different
access model; a popular class of such models are streaming models. In a streaming
model, the input is given as a sequence of items, e. g., numbers or pairs of num-
bers (which could represent edges in a graph). Such a sequence is called a stream.
An algorithm can request to see the stream once or multiple times, and each time
each of the items is presented to the algorithm, one by one. Seeing the stream
once is called a pass or a pass over the input. No assumption is made on the
order in which the items are presented, and often algorithms are designed so
that the order of items is allowed to change from one pass to the other. A pass is
assumed to be costly, and a bound on the number of passes, the pass guarantee,
is an important characteristic of a streaming algorithm. It is generally accepted
that the pass guarantee should be independent of the input size, but is allowed
to depend on approximation parameters.
The first streaming algorithms were devised starting in the late 1970s (see,
e. g., [15,28]), with one of the most influential works published in 1996 by Alon
et al. [2]. The term “streaming” was coined shortly after by Henzinger et al. [17].
Besides the pass guarantee, another important characteristic is the amount
of RAM an algorithm requires. RAM should be substantially smaller than the
input size for the streaming model to make any sense. On the other hand,
RAM should be large enough in order that something useful can be done. For
graph problems, the graph is given as a stream of its edges, i. e., as a sequence
e1 , . . . , eN where each ei is a pair (interpreted as an unordered pair) of num-
bers from [n] = {1, . . . , n} when n is the number of vertices. Feigenbaum et al.
[13] showed that O (poly log n) bits2 is not enough to even determine whether a
path of length 2 exists between two given vertices, i. e., if their neighborhoods
are disjoint or not, unless an input-size-dependent number of passes is allowed.
The argument is based on the fact that set disjointness has Ω (n) communica-
tion complexity [19], and a p(n)-pass streaming algorithm with O (poly log n)
bits of RAM, say O(log^c n) for some c > 0, would allow the problem to be solved with only O(p(n) · log^c n) bits of communication. So any pass guarantee of p(n) = o(n / log^c n) is ruled out; in particular p(n) = O(1) is impossible. It follows that logarithmic space does not make much sense for graph problems.
Shortly before the work by Feigenbaum et al., in 2003, Muthukrishnan
had proposed the semi-streaming model [29], where RAM is restricted to
O (n · poly log n) bits, meaning that we can store a linear (in n) number of
edges at a time.3 In this model, they investigate [13] several graph problems; in particular they devise a semi-streaming (2/3 − ε)-approximation algorithm, 0 < ε < 1/3, for the bipartite maximum matching problem with a pass guarantee of O(ε^{-1} log ε^{-1}). This is an impressive bound, but on the other hand a
2 By poly x we denote a polynomial in x; another way to write O(poly x) is x^{O(1)}.
3 The “semi” attribute was chosen since the term “streaming model” is generally associated with logarithmic space. The semi-streaming model is considered “between” logarithmic and quadratic space, the latter being equivalent to the RAM model since a graph can be stored in O(n^2) bits of space.
1/2-approximation, not that far from 2/3, can already be obtained in just one pass, since in one pass a maximal matching can be computed.
The exact algorithms from the previous section cannot be directly applied
to the streaming situation. If the order of edges is unfortunate, a naively implemented BFS can require as many passes as the number of layers. Feigenbaum et al. [14] prove that indeed, any BFS algorithm computing the first k layers with probability at least 2/3 requires more than k/2 passes if staying within the limits of the semi-streaming model (see Guruswami and Onak [16] for improved lower bounds). In the next section, we will see that in order to obtain a (1 + ε)^{-1}-approximation, it is sufficient to consider augmenting paths of length O(ε^{-1}), so BFS can be concluded when reaching this depth. But then we still need a bound on the number of augmentation steps (for the simple algorithm), but even for Hopcroft-Karp we only know the input-size-dependent O(√n) bound on the number of phases. It is worth noting that even the simple initialization
heuristic described at the end of Sect. 1 cannot be carried over to the streaming
situation, since each degree update takes one pass.
Now suppose that in Hopcroft-Karp, we could bring the number of phases
down to O(poly ε^{-1}). This still leaves us with a problem, namely how to do
the DFS in a pass-efficient way. The breakthrough idea for this was given by
McGregor [26], namely to perform a “blend” of BFS and DFS: depending on
which edge comes along in the stream, we either grow in breadth or in depth.
Using this technique, McGregor gave a randomized (1 + ε)^{-1}-approximation
algorithm for general graphs, but with an exponential dependence on ε^{-1} in
the pass guarantee. Eggert et al. [11] showed that this dependence remains even
if restricting to bipartite input; namely we have a worst-case lower bound of
Ω(ε^{−(ε^{−1})}) on the number of passes required by McGregor’s algorithm. This is
due mainly to the randomized nature of the algorithm, requiring a large number
of iterations in order to attain a useful success probability. Using the concept of
position limiting, Eggert et al. [11] gave a new BFS/DFS blend for the bipartite
case, with a pass guarantee of O(ε^{-5}). Subsequently, Algorithm Engineering
was performed on this algorithm yielding an experimentally much faster deriva-
tive [21]. This algorithm and the engineering process that led to its creation will
be described in Sect. 7 and experimentally analyzed in Sect. 8.
In a different line of research, Ahn and Guha devised linear-programming-based algorithms for a variety of matching-type graph problems [1].
For the bipartite matching problem, an algorithm with pass guarantee
O(ε^{-2} log log ε^{-1}) is presented (with the number of passes being a factor in
the RAM requirement). An experimental evaluation of these techniques is still
an open task. Konrad et al. [23] gave algorithms for the bipartite maximum
matching problem which work in one or two passes with approximation guarantees slightly above the known 1/2 of an inclusion-maximal matching. For further
work on graph streams (connectivity, spanning trees, weighted matching, cuts),
the work by Zelke [31] is a good starting point.
More recent results include the following; we restrict the discussion to upper
bounds. In dynamic graph streams, edges that have been announced can also
be removed from the graph at a later point in the stream, and edges may be
inserted multiple times. Assadi et al. [4] give a one-pass randomized algorithm
for approximating a maximum matching in a bipartite dynamic graph stream,
with a parameter controlling a trade-off between approximation and required
memory. Konrad [22] gives a similar result for the case of a slightly more gen-
eral dynamic streaming model. A constant-factor randomized approximation for
the size of a maximum matching in planar graphs is achieved in sub-linear memory and with a single pass by the algorithm of Esfandiari et al. [12].
They generalize to graphs with bounded arboricity. For extensions to dynamic
graph streams, see Bury and Schwiegelshohn [6]. Chitnis et al. [8,9], for dynamic
graph streams, give a one-pass algorithm for maintaining an inclusion-maximal
matching using O (k · poly log n) bits, provided that at any point in time, no
inclusion-maximal matching of size greater than k exists. For computing a poly-
logarithmic approximation to the size of a maximum matching, Kapralov et al.
[20] give a one-pass algorithm that needs only polylogarithmic space but requires the stream to be presented in random order. Extensions are given in [6]. Much
has been done for weighted matching, and a recent approximation algorithm was
presented by Crouch and Stubbs [10]. Again, for extensions, see [6].
Following the pattern of [11, Lemmas 4.1 and 4.2] we can prove:
Lemma 1. Let M be an inclusion-maximal matching. Let D be a (λ1, λ2) DAP set such that |D| ≤ 2δ|M| with δ = δ(k, λ1, λ2). Then M is a (1 + 1/k)^{-1}-approximation.

The lemma yields the (1 + 1/k)^{-1} approximation guarantee for Algorithm 1 when
δinn = δout = δ(k, λ1, λ2). What are desirable values for λ1 and λ2? The DAP
approximation algorithms presented in later sections (the path-based and the
tree-based one) can work with any allowable setting for λ1 and λ2, so we have
some freedom of choice. We assume that constructing longer paths is more expensive, so we would like to have those values small, and in particular λ1 = λ2. (We
will later encounter situations where it is conceivable that λ1 < λ2, in other
words s > 1, may be beneficial.) On the other hand, we would like to have δ
large in order to terminate quickly. The function λ ↦ δ(k, λ, λ) climbs until
λ = k − 1 + √(k² − 1) < 2k − 1 and falls after that. Since we only use integral
values for λ1, the largest value to consider is λ1 = λ2 = 2k − 1. The smallest one
is λ1 = λ2 = k. We parametrize the range in between by defining λ = k(1 + k^{−γ̃}) − 1 for a parameter γ̃ ∈ [0, 1], so that γ̃ = 0 gives λ = 2k − 1 and γ̃ = 1 gives λ = k.
The following paragraphs describe how we find a (λ1, λ2, δinn) DAP approximation with λ1 = λ2, following [11]. Since both length parameters are the same, we
write λ = λ1 = λ2. Pseudocode is given as Algorithm 2.
[Of Algorithm 2, only the last two pseudocode lines survive in this excerpt: line 23 closes the repeat loop with a termination test comparing a counter c against the threshold δinn|M|, and line 24 returns A.]
(i) The procedure described above is a (λ, λ, δinn) DAP approximation algorithm.
(ii) Termination occurs after at most 2λ·δinn^{-1} + 1 passes.
[Figure: constructed paths with positions i = 4 and k + 1 = 8, and a dotted edge arriving in the stream.]
On the other hand, if the other end of the dotted edge is matched, so there is
a matching edge m there, then it is checked whether we can use m to extend
the constructed path. Since we are talking about position i = 4, it is checked
whether 4 < ℓ(m). If so, the dotted edge and the matching edge m are appended
to the path, and the position limit of m is updated to ℓ(m) := 4:
[Figure: the extended path; positions i = 4 and k + 1 = 8.]
If m is not part of any constructed path, then this is all that has to be done.
On the other hand, let us consider that an edge comes along in the stream
connecting the end of the 4th constructed path to some matching edge inside of
the 1st constructed path:
[Figure: the 1st and 4th constructed paths with matching edges m1, m2, m3; positions i = 4 and k + 1 = 8.]
Then it is checked whether we may use m1 and its right wing (containing matching edges m2 and m3) to extend the 4th path. For this, only the position limit of
m1 is relevant. It is checked whether 4 < ℓ(m1), which is true since ℓ(m1) = 5.
Hence m1 and its right wing are migrated from the 1st constructed path to the
4th one, and position limits are updated, namely ℓ(m1) := 4, ℓ(m2) := 5, and
ℓ(m3) := 6:
[Figure: the situation after the migration of m1, m2, m3; positions i = 4 and k + 1 = 8.]
Finally, note that the following situation cannot occur, since the graph is
bipartite:
[Figure: example of the excluded situation; positions i = 4 and k + 1 = 8.]
5 Experimental Setup
In order to perform Algorithm Engineering, we need an implementation and a
way to generate difficult test instances. All algorithms and instance generators
were implemented in C++. Each instance is kept completely in RAM while the
algorithms are working on it, as an array of pairs of 32 bit integers. So the
streaming situation is only simulated (although one might argue that even in
RAM, sequential access can be beneficial). When randomizing the order of edges
for the sake of experiments, it is important to really randomize the array and not
just access it using randomized indices. If s [] is the array holding the stream,
it should be first randomized by permuting its entries so that during a pass
we access it using s [ i ] with i = 1, . . . , N . Another possibility would be to use
an array p [] being a random permutation of [N ] and then obtain a random
permutation of the stream by accessing it using s [p[ i ]] with i = 1, . . . , N .
However, the latter is substantially less efficient and would allow far fewer experiments to be conducted in the same amount of time.
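A sketch of the two alternatives (function names are ours): shuffling the array itself keeps every subsequent pass sequential in memory, whereas indirection through an index permutation makes every access effectively random.

```cpp
#include <algorithm>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

using Edge = std::pair<uint32_t, uint32_t>;

// Preferred: permute the entries of the stream array itself once;
// afterwards every pass reads s[0], s[1], ... sequentially.
void randomizeInPlace(std::vector<Edge>& s, std::mt19937_64& rng) {
    std::shuffle(s.begin(), s.end(), rng);
}

// Slower alternative from the text: keep s fixed and access s[p[i]] through
// a random index permutation p, causing random memory access in every pass.
std::vector<std::size_t> randomIndexPermutation(std::size_t n, std::mt19937_64& rng) {
    std::vector<std::size_t> p(n);
    std::iota(p.begin(), p.end(), std::size_t{0});
    std::shuffle(p.begin(), p.end(), rng);
    return p;
}
```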
We generate instances with various structure:
rand: Random bipartite graph; each edge in {{a, b} ; a ∈ A ∧ b ∈ B} occurs with
probability p ∈ [0, 1], which is a parameter.
degm: The degrees in one partition, say A, are a linear function of the vertex
index, which runs from 1 to |A|. The neighbors in partition B are chosen
uniformly at random. A parameter p ∈ [0, 1] is used to scale degrees.
The following three classes were introduced in [7,30], see also [25]. The constructions work by dividing vertices into groups of equal size and connecting them following certain rules.
hilo: Parameters are l, k, d ∈ N, with d ≤ k and |A| = |B| = lk. Denote A =
{a_0, . . . , a_{lk−1}} and B = {b_0, . . . , b_{lk−1}}. Define the groups by
A^i := {a_j ; ki ≤ j < k(i + 1)} and B^i := {b_j ; ki ≤ j < k(i + 1)}
for each 0 ≤ i < l. This makes l groups in each partition, each group being
of size k. Denote A^i = {a^i_0, . . . , a^i_{k−1}} and B^i = {b^i_0, . . . , b^i_{k−1}} for each
0 ≤ i < l.
Edges run as follows. For each 0 ≤ i < l, each 0 ≤ j < k, and each 0 ≤ t < d
with 0 ≤ j − t, we add the edge {a^i_j, b^i_{j−t}}, and if i + 1 < l, then we also add
{a^i_j, b^{i+1}_{j−t}}. That is, each a^i_j is connected with its “direct counterpart” b^i_j and
with the d − 1 vertices in B^i located before b^i_j; and then the same with B^{i+1}
instead of B^i, provided we have not yet reached the last group. Such a graph
has a unique perfect matching. (A generator sketch for this class is given after the class descriptions below.)
For l = 3, k = 5, and d = 2 this looks as follows, where A is shown at the top.
The unique perfect matching is highlighted (all the vertically drawn edges).
rbg: Parameters are l, k ∈ N and p ∈ [0, 1], where again |A| = |B| = lk. Groups
A^0, . . . , A^{l−1} and B^0, . . . , B^{l−1} are defined as for hilo. For each 0 ≤ i < l, each
j ∈ {i − 1, i, i + 1} (where the arithmetic is modulo l, hence −1 = l − 1 and
l = 0), each vertex v ∈ A^i, and each vertex w ∈ B^j, we add {v, w} with
probability p. That is, we have a random bipartite graph between each group
of A and its three “nearest” groups in B, with wrap-around. This class is also
known as fewg and manyg, depending on the size of parameter l.
rope: Parameters and definition of groups are as in rbg. Edges run as follows. For
each 0 ≤ i < l, we add a perfect matching between A^i and B^i. For each
1 ≤ i < l, we add each possible edge between A^i and B^{i−1} with probability p.
Such a graph has a unique perfect matching. The following picture gives an
example for l = 3 and k = 4, with the unique perfect matching highlighted.
From left to right, we have A^0, B^0, A^1, B^1, A^2, B^2.
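As referenced in the hilo description above, a generator sketch for that class could look as follows (function and variable names are ours); vertices of A and of B are simply numbered 0, . . . , lk − 1 within their respective partitions.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Generate the edges of a hilo instance with parameters l, k, d (d <= k).
// The pair (first, second) denotes an edge between A-vertex `first` and
// B-vertex `second`; group i of a partition holds indices ki .. k(i+1)-1.
std::vector<std::pair<uint32_t, uint32_t>>
generateHilo(uint32_t l, uint32_t k, uint32_t d) {
    std::vector<std::pair<uint32_t, uint32_t>> edges;
    for (uint32_t i = 0; i < l; ++i) {
        for (uint32_t j = 0; j < k; ++j) {
            const uint32_t a = i * k + j;                  // index of a^i_j in A
            for (uint32_t t = 0; t < d && t <= j; ++t) {
                edges.emplace_back(a, i * k + (j - t));    // edge to b^i_{j-t}
                if (i + 1 < l)
                    edges.emplace_back(a, (i + 1) * k + (j - t)); // edge to b^{i+1}_{j-t}
            }
        }
    }
    return edges;
}
```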
We impose a hard limit of 1 × 10^9 on |E|, meaning about 7.5 GiB (each
vertex is stored as a 32 bit unsigned integer). A series is specified by a density
limit Dmax and a set of values for n. For each n of a series and for each class,
we generate 256 instances on n vertices. For hilo, rbg, and rope, parameter l is
chosen randomly from the set of divisors of |A| = n/2. For all classes, a parameter
controlling the (expected) number of edges (e. g., p for rand) is being moved
through a range such that we start with very few (expected) edges and go up to
(or close to) the maximum number of edges possible, given the hard limit, the
limit Dmax on the density (allowing some overstepping due to randomness), and
any limit resulting from structural properties (e. g., number of groups l). This
way we produce instances of different densities. For rand and degm, we use 16
different densities and generate 16 instances each. For hilo, rbg, and rope, we use
64 random choices of l and for each 4 different densities. This amounts to 256
instances per n and class. After an instance is generated, its edges are brought
into random order. Then each algorithm is run on it once, and then again with
partitions A and B swapped. During one run of an algorithm, the order of edges
in the stream is kept fixed. We use the term pass count to refer to the number of
passes occurring until the algorithm terminates; clearly the pass guarantee is an
upper bound on any pass count.
where
    δ = (λ − k + 1) / (2kλ(λ + 2))   and   λ = k(1 + k^{−γ̃}) − 1.     (1)
Concrete numbers for k = 9 are given in the following table (Table 1).

                                          γ̃ = 0        γ̃ = 1/2      γ̃ = 1
How pass guarantee depends on k:          O(k^5)       O(k^6)       O(k^7)
Length parameter λ for k = 9:             17           11           9
Termination parameter δ^{-1} for k = 9:   646          858          1 782
Concrete pass guarantee for k = 9:        14 211 355   16 215 343   57 193 291
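As a sanity check, the following snippet (ours) evaluates λ and δ from (1) for k = 9 and γ̃ ∈ {0, 1/2, 1}; it reproduces the λ values 17, 11, 9 and the δ^{-1} values 646, 858, 1 782 from the table. The concrete pass guarantees are not recomputed here, since the full pass-guarantee formula is not reproduced in this excerpt.

```cpp
#include <cmath>
#include <iostream>

int main() {
    const int k = 9;
    const double gammas[] = {0.0, 0.5, 1.0};
    for (double g : gammas) {
        // Length parameter lambda = k(1 + k^{-gamma}) - 1, rounded to the nearest integer.
        const long lambda = std::lround(k * (1.0 + std::pow(k, -g)) - 1.0);
        // Termination parameter delta = (lambda - k + 1) / (2 k lambda (lambda + 2)).
        const double delta =
            double(lambda - k + 1) / (2.0 * k * lambda * (lambda + 2));
        std::cout << "gamma = " << g << ": lambda = " << lambda
                  << ", 1/delta = " << 1.0 / delta << "\n";
    }
}
```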
γ̃ (pass guarantee)    Maximum                                     Mean
                      rand    degm    hilo   rbg    rope          rand  degm  hilo   rbg  rope
0   (O(k^5))          11 886  14 180  7 032  4 723  2 689         107   145   3 337  257  378
1/2 (O(k^6))           7 817  31 491  7 971  4 383  3 843          80   127   2 071  500  541
1   (O(k^7))           7 121  32 844  9 106  5 687  5 126          74   166   2 033  844  790
It is not only interesting to note that these numbers are much smaller than
the pass guarantees, but also that there is no best γ̃ setting. When only looking
at how the pass guarantee depends on k and also when looking at the concrete
pass guarantees (as per Table 1), the setting γ̃ = 0 is superior. However, in
experiments it turns out to be inferior to γ̃ = 1 in the following cases: for the
maximum and mean for rand, and for the mean for hilo. Especially for hilo this
is interesting since this instance class shows by far the highest mean. This is yet
another reminder that the performance of an algorithm observed in practice will
not necessarily be predicted by theoretical analysis, even if constants otherwise
hidden in O (·) notation are taken into account.
[Figure: constructed paths with positions i = 4 and k + 1 = 8, and a dotted edge.]
Then nothing will happen; the edge will be ignored. Moreover, assume that the
5th constructed path as shown above has a “dead end”, i. e., there is no way
to complete it to an augmenting path, not until the last two matching edges
have been removed by backtracking. (Recall that after each pass, we backtrack
conditionally: each constructed path that was not modified during that preceding
pass has its last two edges removed.) After backtracking was performed twice on
the 5th path – which will take at least 2 more passes – the dotted edge shown
above can finally become effective and complete the path to an augmenting path.
The question arises: would it not be a good idea to complete a path as soon as
it is possible instead?
Another question is raised if the completion is not imminent, but we could
complete for example using one intermediate matching edge m:
[Figure: a potential completion using an intermediate matching edge m; positions i = 4 and k + 1 = 8.]
To benefit from this without having to go through all the backtracking, we would
have to first remember m when the first dotted edge comes along in the stream
and later complete when the second dotted edge comes along. So in fact, we
would not be growing paths, but growing trees. These considerations give rise
to the first version of our tree-based DAP approximation algorithm, described
formally in the following.
First Version
The results (i. e., the augmenting paths found) will be stored in a set A, which is initialized to A := ∅.
Trees grow over time, and there may also emerge non-properly rooted trees.
When a free edge {a, b} between two remaining vertices goes by in the stream
with b being covered, the algorithm checks whether to extend any of the trees.
Conditions are: the tree has to be properly rooted, say T(α), it must contain a,
and i < ℓ({b, M_b}), where i is the position that the matching edge {b, M_b}
would take in T(α). If all those conditions are met, an extension step occurs:
the two edges {a, b} and {b, M_b} are added to T(α), and, if {b, M_b} is already
part of a tree T(b′), then T(b′)[b] is removed from T(b′) and connected to T(α)
via {a, b}. The tree T(b′) is not required to be properly rooted, but it may be.
Bipartiteness ensures that M_b ∈ V(T(b′)[b]). Position limits for all inserted or
migrated edges are updated to reflect their new positions. The following figures
show an example. There is a properly rooted tree T(α) and a non-properly rooted
tree T(b′). Assume that the dotted edge {a, b} comes along in the stream:
[Figure: a properly rooted tree T(α) containing a, a non-properly rooted tree T(b′) containing the matching edge {b, M_b}, and the dotted edge {a, b}.]
Provided that position limits allow, part of T(b′), namely the subtree
T(b′)[b], is migrated to T(α). The migrated edges receive new position limits, e. g., ℓ({b, M_b}) := 2. Only 3 edges are left in tree T(b′): two matching
edges and one free edge:
[Figure: the situation after the migration; T(b′) retains two matching edges and one free edge.]
When a free edge {a, β} with a, β ∈ remain(V) goes by in the stream with β
being free, then we check whether we can build an augmenting path. If there is
a properly rooted tree T(α) with a ∈ V(T(α)), then the path P from α through T(α) to a, extended by the edge {a, β}, is
augmenting. In that case, a completion step occurs: we store P into the result
set A, and mark all vertices on P as used. Also, we adjust our forest as follows.
For each a ∈ V(P) ∩ A and each of its neighbors in T(α) that are not in P, i. e., for
each b ∈ N_{T(α)}(a) \ V(P), we set T(b) := T(α)[b]. In other words, we “cut” P
out of T(α) and make each of the resulting subtrees that “fall off” a new tree of
its own. None of those is properly rooted, and they are rooted at vertices
of partition B, not A as the properly rooted ones are. However, they – or parts of
them – can subsequently be connected to remaining properly rooted trees by an
extension step as described before.
After each pass, it is checked whether it is time to terminate and return the
result A. We terminate when any of the following two conditions is met:
(T1) During the last pass, no extension or completion occurred. In other words,
the forest did not change. (It then would not change during further passes.)
(T2) The number of properly rooted trees (which is also the number of remaining
free vertices of A) is at or below δinn|M|.
A backtracking step as for the path-based algorithm makes no sense here
since its purpose was to free up the ends of constructed paths in order that
other edges can be attached there – which obviously is not necessary for the
tree-based algorithm.
⁴ The statement (i) can also be formulated as: the algorithm with threshold 0 is a (λ1, λ2, 0) DAP approximation algorithm.
It immediately follows that for τ = δinn|M|, there exists a (λ1, λ2) DAP set D
with |D| ≤ |A| + δinn|M|, namely we can take for D the set that would have
been constructed for threshold 0. By definition, we thus have a (λ1, λ2, δinn)
DAP approximation algorithm, as required. The most complicated step in this
proof is (i). An attempt to follow through the same program for the first version
of the tree-based algorithm failed, and it must fail due to the following simple
example. Let the following graph and matching be given:
[Figure: free vertices α1, α2 and β1, β2, with matching edges {b1, a1} and {b2, a2}.]
Assume {α1 , b1 } and {α1 , b2 } come first in the stream. Then the two matching
edges {b1 , a1 } and {b2 , a2 } are built into T (α1 ) and their position limits set to 1,
so they can never migrate to T (α2 ). Thus, at most one of the four augmenting
paths is found (namely either (α1 , b1 , a1 , β1 ) or (α1 , b2 , a2 , β2 ), depending on
whether {a1, β1} or {a2, β2} comes next in the stream), leaving behind one that
is disjoint from the found one.
This not only spoils the proof attempt; it is a serious flaw in the
algorithm, which can lead to poor approximation. To see this, we generalize the
above example to |free(A)| = t for some t ∈ N.
[Figure: the generalized example with free vertices α1, . . . , αt and β1, . . . , βt and matching edges {a_i, b_i}, i ∈ [t].]
Due to the order in the stream, the initial matching will be {{a_i, b_i} ; i ∈ [t]}
as shown in the picture above, since all these edges come first and are hence
picked. This is just a 1/2-approximation, not better. During the first pass of the
DAP approximation, the tree T(α1) will first grab all the matching edges, and
when {a1, β1} comes along in the stream, we will have the augmenting path
(α1, b1, a1, β1). Due to position limits, nothing more will happen in this phase,
so the DAP approximation terminates, delivering just one augmenting path. It
will be used to improve the matching from size t to size t + 1, but then the
whole algorithm, which is Algorithm 1, will terminate since |A| = 1. Strictly,
this requires t to be sufficiently large compared to k, but these requirements
are easily met. For example, let k = 9 and λ1 = λ2 = k (i. e., γ̃ = 1). Then
δ(k, λ1, λ2) = 1/1782 as per (1). So if t ≥ 1782 and k = 9, then the algorithm will
terminate after one invocation of the DAP approximation and, as a result, it will
miss its goal of a 90%-approximation by far.
In order to remedy the flaw that has become evident above, we add a feature to
the completion step. Recall that in the completion step, an augmenting path is
“cut” out of a properly rooted tree, perhaps leaving some non-properly rooted
trees behind. The second version introduces position limit release: we reset posi-
tion limits to λ1 + 1 on edges of the new (non-properly rooted) trees; we say
that the position limits on those edges are released. This can be considered an
implicit form of backtracking.
Pass counts in experiments went up only moderately after position limit
release was implemented. However, something else unexpectedly happened: the
batch system at the Computing Center where the experiments were run killed
many of the jobs after some time because they exceeded their memory limit.
The batch system requires a bound on the memory requirements of a job to be specified and will kill the job if this bound is exceeded. One way is to just give the total
amount of memory available on the desired compute nodes, but this would be a
waste of computing power since the nodes have multiple cores and the memory
limit set to total memory would mean that only one (single-threaded) job would
run on it. So the memory requirement of the program was estimated by looking
at its data structures and this estimate was given to the batch system. With
this estimate, jobs were killed, even after introduction of an extra safety margin.
Additional assert() statements finally revealed that the data structure used for
storing trees grew beyond all expectations; in particular, paths much longer than
2λ2 + 1 were constructed (i. e., they were not λ2 paths anymore).⁵ Indeed, a
review of the algorithm showed that this is to be expected once position limit
release is introduced, as explained in the following.
In an extension step, although position limits are initially not higher than
λ1 + 1, edges can be included in a tree at positions beyond λ1. Assume m =
{b, M_b} is inserted at position i ≤ λ1 into a properly rooted tree T(α) and
subsequently, more edges are inserted behind m. Then an augmenting path is
found in T(α) not incorporating m, hence the position limit of m is released.
Later m can be inserted at a position j with λ1 ≥ j > i in another properly
rooted tree T(α′). When m carries a sufficiently deep subtree with it, then T(α′)
could grow beyond λ1, even though j ≤ λ1. This is problematic since we expect a
DAP approximation algorithm to deliver λ2 paths for a parameter λ2 (which so
far was chosen to be equal to λ1); cf. Sect. 3.
As a solution, the second length parameter λ2 takes on a special role. The
third version of the tree-based DAP approximation includes the following feature:
when the migrated subtree is too deep, we trim its branches just so that it can be
migrated without making the destination tree reach beyond λ2. The trimmed-off
branches become non-properly rooted trees of their own. We control a trade-off
this way: a higher λ2 means less trimming and hence that we destroy less of our
previously built structure. But a higher λ2 reduces δ(k, λ1, λ2) and so may delay
termination. Choosing λ2 := λ1 is possible, so we may stick to a single length
parameter as before, but we can also experiment with a larger λ2. Recall that the
stretch s = λ2/λ1 is used as a measure of how far beyond λ1 our structures may
stretch.
This experience shows that appropriate technical restrictions during experi-
mentation, such as memory limits, can be helpful not only to find flaws in the
implementation but also to find flaws in the algorithm design.
We are finally able to give an approximation guarantee:
Lemma 2. The third version of the tree-based algorithm (with position limit
release and trimming) is a (λ1 , λ2 , δinn ) DAP approximation algorithm.
Proof. Recall the termination conditions (T1) and (T2) above. When the algorithm terminates via condition (T2), it could have, by carrying on, found at most δinn|M|
additional augmenting paths. We show that when we restrict to termination
condition (T1), we have a (λ1 , λ2 , 0) DAP approximation algorithm. Clearly,
by trimming, only λ2 paths can be returned. It is also obvious that all paths
returned are augmenting paths and disjoint.
It remains to show that we cannot add an augmenting λ1 path to A without
hitting at least one of the paths already included. Suppose there is an augmenting
⁵ Recall that so far we have always used λ1 = λ2. A distinction between the two parameters will be made shortly.
Pass Guarantee
By the previous lemma, we have an approximation guarantee for the third ver-
sion. What about a pass guarantee? Unfortunately, no input-size-independent
pass guarantee can be made. This is seen by Example 1. First the tree T (α1 )
will grab all the matching edges, then one augmenting path is found and posi-
tion limits on all the matching edges not in that path are released. During the
next pass, tree T (α2 ) will grab all the matching edges, and so on. There will be
Ω (t) = Ω (n) passes.
On the upside, this is easily fixed: we simply let an algorithm using the
path-based DAP approximation run in parallel, feeding it the same edges from
the stream. We terminate when one of the two terminates. Since both have an
approximation guarantee of (1 + 1/k)^{-1}, we know that we have a good approximation no matter from which of the two algorithms we take the solution. Since
the path-based algorithm has a pass guarantee of O(k^{O(1)}), we know that termination will certainly occur after that many passes – in practice it is of course
reasonable to expect that termination will occur much earlier than the pass
guarantee predicts and also that termination will be triggered by the tree-based
algorithm.
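A minimal sketch of this combination (the interfaces are ours, not the original code): both DAP approximation algorithms are driven in lockstep over the same passes, and the loop stops as soon as either of them reports termination.

```cpp
// Run the path-based and the tree-based algorithm on the same passes and
// stop as soon as one of them has terminated; both guarantee a
// (1 + 1/k)^{-1} approximation, so either result can be used.
template <class SolverA, class SolverB, class Stream>
void runLockstep(SolverA& pathBased, SolverB& treeBased, const Stream& stream) {
    while (!pathBased.finished() && !treeBased.finished()) {
        stream.pass([&](const auto& edge) {   // one pass over the input
            if (!pathBased.finished()) pathBased.processEdge(edge);
            if (!treeBased.finished()) treeBased.processEdge(edge);
        });
        pathBased.endOfPass();   // per-pass bookkeeping, e. g., backtracking
        treeBased.endOfPass();
    }
}
```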
On the other hand, in the special case of Example 1, the path-based algo-
rithm would require at most 4 passes: one pass to establish the initial matching,
one pass to construct the paths (αi , bi , ai ) in the order i = 1, . . . , t, one pass to
complete each such path to (αi , bi , ai , βi ), then the DAP approximation termi-
nates and an augmentation step occurs, and then there is at most one more final
pass to realize that nothing more can be done; the DAP approximation returns
A = ∅. The result is the perfect matching {{αi , bi } , {ai , βi } ; i ∈ [t]}.
Except for one case (the maximum value for hilo and γ̃ = 0), there is no improvement
of s = 2 over s = 1; on the contrary, the higher stretch lets the maximum for rope
and γ̃ = 0 jump from 79 to 94. Among the s = 1 results, γ̃ = 1 is the best except
for the maximum for rbg, which is one less for γ̃ = 1/2. But γ̃ = 1/2 shows inferior
results for several other classes.
γ̃     s   Maximum                          Mean
          rand  degm  hilo  rbg  rope      rand  degm  hilo  rbg  rope
0     1   6     9     75    41   79        3     3     51    5    22
0     2   6     9     74    52   94        3     3     51    5    26
1/2   1   6     9     59    37   63        3     3     38    5    20
1/2   2   6     9     59    44   70        3     3     38    5    22
1     1   6     9     54    38   61        3     3     35    5    20
1     2   6     9     55    40   67        3     3     36    6    21
All the following experiments are done with the good (and almost always
best) settings γ̃ = 1 and s = 1. The highest pass count we have seen for this
setting in Table 3 is 61. We increase the number of vertices up to a million and first
keep the density limit at Dmax = 1/10. The following table shows the development
for growing n. The number of edges ranges up to about the hard limit of |E| =
1 × 10^9, which takes about 7.5 GiB of space (Table 4).
The previous highest pass count of 61 is exceeded, for n = 1 000 000 and
hilo we observe 65 in this new series. However, this is only a small increase, and
moreover the mean values show no increase. The linear worst-case dependence
on n, as seen by Example 1, is not reflected by these results.
Next, we lower the density limit. The following table is based on two series:
one with Dmax = 1 × 10^{-3} and the other with Dmax = 1 × 10^{-4} (Table 5).
n Maximum Mean
rand degm hilo rbg rope rand degm hilo rbg rope
100 000 3 8 53 30 62 2.5 3.2 35.0 5.1 19.8
200 000 3 7 56 31 63 2.5 2.8 37.6 4.7 19.1
300 000 3 7 55 29 64 2.5 2.9 38.6 3.9 18.2
400 000 3 8 56 33 63 2.5 2.9 36.3 5.3 15.6
500 000 3 7 58 34 64 2.5 3.0 36.7 4.4 19.4
600 000 3 9 58 30 64 2.5 3.5 38.4 3.3 18.1
700 000 6 9 56 35 62 2.5 3.6 37.4 3.9 18.5
800 000 3 8 58 31 63 2.5 3.5 37.9 3.1 16.2
900 000 7 8 61 32 62 2.6 3.3 37.0 3.7 14.5
1 000 000 6 9 60 34 65 2.5 3.1 33.4 4.6 18.2
n Maximum Mean
rand degm hilo rbg rope rand degm hilo rbg rope
100 000 40 41 53 48 46 10.3 12.0 28.3 18.9 24.0
200 000 43 43 54 48 46 8.3 11.1 28.5 14.6 21.9
300 000 41 42 56 52 55 6.5 8.4 29.1 11.9 21.2
400 000 44 42 56 48 55 6.1 8.2 29.6 8.6 18.2
500 000 48 45 59 41 56 4.9 7.1 28.9 8.2 18.5
600 000 48 40 58 42 56 5.3 8.3 29.2 6.0 19.1
700 000 40 42 57 32 55 4.3 6.6 29.4 4.7 16.4
800 000 30 42 57 34 57 3.8 6.3 30.9 4.8 16.7
900 000 46 45 58 48 60 4.3 6.6 30.0 4.6 16.6
1 000 000 48 45 59 42 60 4.1 7.5 31.3 4.5 17.3
For several classes, in particular rand and degm, lower density elicits higher
pass counts. But still the previous maximum of 65 is not exceeded. This remains
true even for the following series going up to two million vertices and Dmax =
1 × 10^{-4} (Table 6).
n Maximum Mean
rand degm hilo rbg rope rand degm hilo rbg rope
1 000 000 48 43 62 41 48 5.5 8.8 29.6 4.5 17.7
1 100 000 47 49 60 42 50 5.1 8.6 29.6 4.4 17.0
1 200 000 45 51 60 33 52 4.4 8.4 29.0 5.2 14.5
1 300 000 31 30 59 41 47 3.9 8.7 30.0 3.9 15.5
1 400 000 32 35 61 35 51 4.5 7.6 28.5 4.6 14.9
1 500 000 28 29 57 33 51 3.9 8.5 28.7 4.5 15.5
1 600 000 25 27 58 34 52 4.1 6.9 26.7 4.5 15.9
1 700 000 28 42 60 35 52 3.6 7.7 28.8 4.6 16.2
1 800 000 31 29 60 35 54 4.1 6.8 28.1 3.2 15.2
1 900 000 23 26 56 34 50 3.2 6.3 27.7 4.6 14.0
2 000 000 32 21 60 35 49 3.4 6.4 28.9 4.5 15.7
References
1. Ahn, K.J., Guha, S.: Linear programming in the semi-streaming model with appli-
cation to the maximum matching problem. Inf. Comput. 222, 59–79 (2013). Con-
ference version at ICALP 2011
2. Alon, N., Matias, Y., Szegedy, M.: The space complexity of approximating the
frequency moments. J. Comput. Syst. Sci. 58, 137–147 (1999). Conference version
at STOC 1996
3. Alt, H., Blum, N., Mehlhorn, K., Paul, M.: Computing a maximum cardinality matching in a bipartite graph in time O(n^{1.5} √(m/ log n)). Inf. Process. Lett. 37, 237–240 (1991)
4. Assadi, S., Khanna, S., Li, Y., Yaroslavtsev, G.: Tight bounds for linear sketches
of approximate matchings (2015). http://arxiv.org/abs/1505.01467
5. Berge, C.: Two theorems in graph theory. Proc. Natl. Acad. Sci. United States
Am. 43(9), 842–844 (1957). http://www.pnas.org/content/43/9/842.short
6. Bury, M., Schwiegelshohn, C.: Sublinear estimation of weighted matchings in
dynamic data streams. In: Bansal, N., Finocchi, I. (eds.) ESA 2015. LNCS, vol.
9294, pp. 263–274. Springer, Heidelberg (2015). doi:10.1007/978-3-662-48350-3 23
7. Cherkassky, B.V., Goldberg, A.V., Martin, P.: Augment or push: a compu-
tational study of bipartite matching and unit-capacity flow algorithms. ACM
J. Exp. Algorithms 3, Article No. 8 (1998). http://www.jea.acm.org/1998/
CherkasskyAugment/
8. Chitnis, R., Cormode, G., Esfandiari, H., Hajiaghayi, M.T., Monemizadeh, M.:
Brief announcement: new streaming algorithms for parameterized maximal match-
ing and beyond. In: Proceedings of the 27th ACM Symposium on Parallel Algo-
rithms and Architectures, Portland, Oregon, USA, June 2015 (SPAA 2015) (2015)
26. McGregor, A.: Finding graph matchings in data streams. In: Chekuri, C.,
Jansen, K., Rolim, J.D.P., Trevisan, L. (eds.) APPROX/RANDOM -2005. LNCS,
vol. 3624, pp. 170–181. Springer, Heidelberg (2005). doi:10.1007/11538462 15
27. Mucha, M., Sankowski, P.: Maximum matchings via Gaussian elimination.
In: Proceedings of the 45th Annual IEEE Symposium on Foundations of
Computer Science, Rome, Italy, (FOCS 2004), pp. 248–255 (2004). http://
doi.ieeecomputersociety.org/10.1109/FOCS.2004.40, http://www.mimuw.edu.pl/
mucha/pub/mucha sankowski focs04.pdf
28. Munro, J.I., Paterson, M.: Selection and sorting with limited storage. Theoret.
Comput. Sci. 12, 315–323 (1980). Conference version at FOCS 1978
29. Muthukrishnan, S.: Data streams: algorithms and applications. Found. Trends Theoret. Comput. Sci. 1(2), 1–67 (2005). http://algo.research.googlepages.com/eight.ps
30. Setubal, J.C.: Sequential and parallel experimental results with bipartite match-
ing algorithms. Technical report IC-96-09, Institute of Computing, University of
Campinas, Brazil (1996). http://www.dcc.unicamp.br/ic-tr-ftp/1996/96-09.ps.gz
31. Zelke, M.: Algorithms for streaming graphs. Ph.D. thesis, Mathematisch-
Naturwissenschaftliche Fakultät II, Humboldt-Universität zu Berlin (2009). http://
www.tks.informatik.uni-frankfurt.de/getpdf?id=561
Engineering Art Galleries
Abstract. The Art Gallery Problem (AGP) is one of the most well-
known problems in Computational Geometry (CG), with a rich history
in the study of algorithms, complexity, and variants. Recently there has
been a surge in experimental work on the problem. In this survey, we
describe this work, show the chronology of developments, and compare
current algorithms, including two unpublished versions, in an exhaus-
tive experiment. Furthermore, we show what core algorithmic ingredients
have led to recent successes.
1 Introduction
The Art Gallery Problem (AGP) is one of the classic problems in Computational
Geometry (CG). Originally it was posed forty years ago, as recalled by Ross
Honsberger [37, p. 104]:
It should be noted that a slightly different definition is used today, where not only
the walls of the gallery have to be guarded, but also the interior (this is indeed
a different problem, see Fig. 1a). AGP has received enormous attention from the
CG community, and today no CG textbook is complete without a treatment
of it. We give an overview on the most relevant developments in Sect. 2, after
introducing the problem more formally.
Fig. 1. Edge cover and vertex guard variants have better and worse solutions than the classic AGP, respectively. (a) Three guards suffice to cover the walls, but not the interior. (b) One point guard covers the interior, but a vertex guard cannot.
Besides theoretical interest, there are practical problems that turn out to
be AGP. Some of these are straightforward, such as guarding a shop with
security cameras, or illuminating an environment with few lights. For another
example, consider a commercial service providing indoors laser scanning: Given
an architectural drawing of an environment, say, a factory building, a high-
resolution scan needs to be obtained. For that matter, the company brings in
a scanner, places it on a few carefully chosen positions, and scans the building.
As scanning takes quite a while, often in the range of several hours per position,
the company needs to keep the number of scans as low as possible to stay com-
petitive — this is exactly minimizing the number of guards (scan positions) that
still survey (scan) the whole environment.
In this paper, we provide a thorough survey on experimental work in this area,
i.e., algorithms that compute optimal or good solutions for AGP, including some
problem variants. We only consider algorithms that have been implemented, and
that underwent an experimental evaluation. During the past seven years, there
have been tremendous improvements, from being able to solve instances with tens of vertices under simplifying assumptions, to algorithm implementations that find optimal solutions for instances with several thousand vertices, in reason-
able time on standard PCs. We avoid quoting experimental results from the lit-
erature, which are difficult to compare to each other due to differences in bench-
mark instances, machines used, time limits, and reported statistics. Instead, we
conducted a massive unified experiment with 900 problem instances with up to
5000 vertices, comparing six different implementations that were available to
us. This allows us to pinpoint benefits and drawbacks of each implementation,
and to exactly identify where the current barrier in problem complexity lies.
Given that all benchmarks are made available, this allows future work to compare
against the current state. Furthermore, for this paper, the two leading implemen-
tations were improved in a joint work between their respective authors, using
what is better in each. The resulting implementation significantly outperforms
any previous work, and constitutes the current frontier in solving AGP.
The remainder of this paper is organized as follows. In the next section,
we formalize the problem and describe related work. In Sect. 3, we turn our
attention to the sequence of experimental results that have been presented in
the past few years, with an emphasis on the chronology of developments. This
is followed by an experimental cross-comparison of these algorithms in Sect. 4,
showing speedups over time, and the current frontier. In Sect. 5, we take an
orthogonal approach and analyze common and unique ingredients of the algo-
rithms, discussing which core ideas have been most successful. This is followed
by a discussion on closely related problem variants and current trends in Sect. 6,
and a conclusion in Sect. 7.
We are given a polygon P, possibly with holes, in the plane with vertices V and
|V| = n. P is simple if and only if its boundary, denoted by ∂P, is connected. For
p ∈ P, V(p) ⊆ P denotes the set of all points seen by p, referred to as the visibility region
of p, i.e., all points p′ ∈ P that can be connected to p using the line segment
pp′ ⊂ P. We call P star-shaped if and only if P = V(p) for some p ∈ P; the set
of all such points p is the kernel of P. For any G ⊆ P, we denote
V(G) = ∪_{g∈G} V(g). A finite G ⊂ P with V(G) = P is called a guard set of P;
g ∈ G is a guard. We say that g covers all points in V(g). The AGP asks for
such a guard set of minimum cardinality.
Note that visibility is symmetric, i.e., p ∈ V(q) ⟺ q ∈ V(p). The inverse
of V(·) describes all points that can see a given point p. This is easily confirmed
to be
V^{-1}(p) := {q ∈ P : p ∈ V(q)} = V(p).
We use two terms to refer to points of P, making the discussion easier to
follow. We call a point a guard position or guard candidate when we want to
stress its role as a potential member of a guard set. The second term comes from
the fact that in a feasible solution, every point w ∈ P needs to be covered by
some visibility polygon. We refer to such a point as a witness when we use it as a
certificate for coverage.
Let G, W ⊆ P be sets of guard candidates and witnesses such that W ⊆
V(G). The AGP variant where W has to be covered with a minimum number of guards chosen from G is denoted by AGP(G, W).
[Figure: the arrangement induced by the visibility regions of two guards g1 and g2; its faces are labeled ∅, {g1}, {g2}, and {g1, g2} according to which of the guards see them.]
They gave a lower bound of Ω(log n) for polygons with holes. For vertex,
edge and point guards in simple polygons, they established APX-hardness. For
restricted versions, approximation algorithms have been presented. Efrat and
Har-Peled [24] gave a randomized approximation algorithm with logarithmic
approximation ratio for vertex guards. Ghosh [34] presented algorithms for ver-
tex and edge guards only, with an approximation ratio of O(log n). For point
guards, Nilsson [48] gave O(OPT²)-approximation algorithms for monotone and
simple rectilinear polygons. Also for point guards, Deshpande et al. [23] pro-
posed one of the few existing approximation algorithms which is not constrained
to a few polygon classes. See Ghosh [34] for an overview of approximation algo-
rithms for the AGP. The first known exact algorithm for the point guard problem
was proposed by Efrat and Har-Peled [24] and has complexity O((nc)^{3(2c+1)}),
where c is the size of the optimal solution. No experimental results with this
algorithm have been reported so far. The exponential growth of the running time
with c probably makes it unsuitable for solving large non-trivial instances.
3 Timeline
After receiving mainly a theoretical treatment for over thirty years, several
groups have started working on solving the AGP using the Algorithm Engi-
neering methodology, aiming at providing efficient implementations to obtain
optimal, or near-optimal, solutions.
Two groups in particular, the Institute of Computing at the University of Campinas, Brazil, and the Algorithms Group at TU Braunschweig, Germany, developed a series of algorithms that substantially improved the kind of instances that can be solved efficiently. In this section, we give a chronological overview of
these efforts, and describe the algorithms that were developed. It should be
noted that all these approaches follow similar core ingredients, e.g., the AGP
is treated as an infinite Set Covering Problem (SCP). As finite SCP instances
can be solved reasonably fast in practice, the AGP is reduced to finite sets, and
different techniques are employed to connect the finite and infinite cases.
Amit et al. [2] were among the first to experiment with a solver for AGP(P, P ),
see the journal version [3] and the PhD thesis by Packer [52] for extended
presentations.
In this work, greedy algorithms are considered, following the same setup:
A large set G of guard candidates is constructed, with the property that P
can be guarded using G. Algorithms pick guards one after the other from G,
using a priority function μ, until P is fully guarded. Both G and μ are heuristic
in nature. The authors present 13 different strategies (i.e., choices for G and
μ), and identify the three that are the best: In A1 , G consists of the polygon
vertices, and of one additional point in every face of the arrangement obtained
by adding edge extensions to the polygon. Priority is given to guards that can see
the most of the currently unguarded other positions in G. The second strategy,
A2 follows the same idea. Additionally, after selecting a guard g, it adds V (g)
to the arrangements and creates additional candidate positions in the newly
created faces. Finally, A13 employs a weight function ω on G, used as a random
distribution. In each step, a point from G is selected following ω. Then, a random
uncovered point p is generated, and all guard candidates seeing p get their weight
doubled.
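The weight-doubling strategy A13 can be sketched as follows (the geometric primitives `sees`, `fullyCovered`, and `sampleUncoveredPoint` are stand-ins of ours for what a real implementation would provide via visibility computations):

```cpp
#include <cstddef>
#include <functional>
#include <random>
#include <vector>

struct Point { double x, y; };

// Sketch of strategy A13: maintain a weight per guard candidate, repeatedly
// pick a candidate with probability proportional to its weight, then double
// the weight of every candidate that sees a random uncovered point.
std::vector<std::size_t> weightDoublingGuards(
    const std::vector<Point>& candidates,
    std::function<bool(std::size_t guard, const Point&)> sees,
    std::function<bool(const std::vector<std::size_t>&)> fullyCovered,
    std::function<Point()> sampleUncoveredPoint,
    std::mt19937& rng) {
    std::vector<double> weight(candidates.size(), 1.0);
    std::vector<std::size_t> guards;
    while (!fullyCovered(guards)) {
        // Select a candidate following the current weight distribution.
        std::discrete_distribution<std::size_t> pick(weight.begin(), weight.end());
        guards.push_back(pick(rng));
        // Double the weight of every candidate seeing a random uncovered point.
        const Point p = sampleUncoveredPoint();
        for (std::size_t g = 0; g < candidates.size(); ++g)
            if (sees(g, p)) weight[g] *= 2.0;
    }
    return guards;
}
```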
To produce lower bounds, greedy heuristics for independent witnesses (i.e.,
witnesses whose visibility regions do not overlap) are considered. Using a pool
of witness candidates, consisting of the polygon’s convex vertices and points
on reflex-reflex edges, a witness set is constructed iteratively. In every step,
the witness seeing the fewest other witness candidates is added, and dependent
candidates are removed.
The authors conducted experiments with 40 input sets, including randomly
generated as well as hand-crafted instances, with up to 100 vertices. Both simple
polygons and ones with holes are considered. By comparing upper and lower
bounds, it was found that the three algorithms mentioned above always produced
solutions that are at most a factor 2 from the optimum. Algorithm A1 was most
successful in finding optimal solutions, which happened in 12 out of 37 reported
cases.
AGP is then reduced to an SCP instance and modeled as an ILP. The resulting
formulation is subsequently solved using an ILP solver, in their case, XPRESS.
If the solution to the discretized version covers the whole polygon, then an opti-
mal solution has been found. Otherwise, additional points are added to W and
the procedure is iterated. The authors prove that the algorithm converges in a
polynomial number of iterations, O(n^3) in the worst case.
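In our notation, the set-cover ILP solved in each iteration for the current finite sets G of guard candidates and W of witnesses can be written as follows (a sketch; its LP relaxation replaces the integrality constraints by bounds such as (4), which appears later in this chapter):

```latex
\begin{align*}
\min\ & \sum_{g \in G} x_g \\
\text{s.t.}\ & \sum_{g \in G \,:\, w \in V(g)} x_g \ \ge\ 1 & & \forall\, w \in W,\\
 & x_g \in \{0, 1\} & & \forall\, g \in G.
\end{align*}
```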
An important step of this exact algorithm is to decide how to construct the
set of witnesses W . Couto et al. study various alternatives and investigated the
impact on the performance of the algorithm. In the first version [20], a single
method for selecting the initial discretization is considered, which is based on
the creation of a regular grid in the interior of P . In the journal version [21], four
new discretizations are proposed: Induced grid (obtained by extending the lines
of support of the edges of the polygon), just vertices (comprised of all vertices of
P ), complete AVP (consisting of exactly one point in the interior of each AVP),
and reduced AVP (formed by one point from each shadow AVP). The authors prove that, with the shadow AVP discretization, the algorithm takes only one iteration to converge to an optimal solution for the orthogonal AGP.
The first experimental results were largely surpassed by those reported in
the journal version. Besides introducing new discretizations, it increased the polygon sizes fivefold (to 1000 vertices). In total,
almost 2000 orthogonal polygons were tested, including von Koch polygons,
which give rise to high density visibility arrangements and, as a consequence, to
larger and harder to solve SCP instances.
The authors highlight that, despite the fact that the visibility polygons and
the remaining geometric operations executed by the algorithm can be computed
in polynomial time, in practice, the preprocessing phase (i.e. geometric opera-
tions such as visibility polygon computation) is responsible for the majority of
the running time. At first glance, this is surprising since the SCP is known to
be NP-hard and one instance of this problem has to be solved at each iteration.
However, many SCP instances are easily handled by modern ILP solvers, as is
the case for those arising from the AGP. Furthermore, the authors also observe
that, when reasonable initial discretizations of the polygon are used, the number
of iterations of the algorithm is actually quite small.
Knowing that the reduced AVP discretization requires a single iteration,
albeit an expensive one timewise, the authors remark that a trade-off between
the number of iterations and the hardness of the SCP instances handled by the ILP solver should be sought. Extensive tests led to the conclusion that the
fastest results were achieved using the just vertices discretization since, although
many more iterations may be required, the SCP instances are quite small.
required. Despite being less constrained than the original AGP, the AGP(P, ∂P )
was proven to be NP-hard [44]. In this context, the authors presented an algo-
rithm capable of optimally solving AGP(P, ∂P ) for polygons with and without
holes provided the method converges in a finite number of steps. This represents
a significant improvement in the search for optimal solutions for the AGP.
The algorithm by Bottino and Laurentini works iteratively. First, a lower
bound specific for P is computed. The second step consists of solving an instance
of the so called Integer Edge Covering Problem (IEC). In this problem, the
objective is also to cover the whole boundary of the polygon with one additional
restriction: each edge must be seen entirely by at least one of the selected guards.
It is easy to see that a solution to the IEC is also viable for AGP(P, ∂P ) and,
consequently, its cardinality is an upper bound for the latter. After obtaining a
viable solution, the gap between the upper and lower bounds is checked. If it
is zero (or less than a predefined threshold) the execution is halted. Otherwise,
a method is used to find indivisible edges, which are edges that are entirely
observed by one guard in some or all optimal solutions of AGP(P, ∂P ). The
identification of these edges can be done in polynomial time from the visibility
arrangement. After identifying them, those classified as not indivisible are split
and the process starts over.
Tests were performed on approximately 400 random polygons with up to 200
vertices. The instances were divided into four classes: simple, orthogonal, random
polygons with holes and random orthogonal polygons with holes. Reasonable
optimality percentages were obtained using the method. For instance, on random
polygons with holes, optimal results were achieved for 65% of the instances with
60 vertices. In cases where the program did not reach an optimal solution (due
to the optimality gap threshold or to timeout limits), the final upper bound
was, on average, very close to the lower bound computed by the algorithm. On
average, for all classes of polygons, the upper bound exceeded the lower bound
by ∼ 7%.
initial discretization techniques were considered. The first one, called single ver-
tex, consists in the extreme case where just one vertex of the polygon forms
the initial discretized set W . As the second strategy, named convex vertices, W
comprises all convex vertices of P .
The authors made a thorough analysis of the trade-off between the number
and nature of the alternative discretization methods and the number of itera-
tions. Their tests were run on a huge benchmark set of more than ten thousand
polygons with up to 2500 vertices. The conclusion was that the decision over the
best discretization strategy deeply depends on the polygon class being solved. As
anticipated, the fraction of time spent in the preprocessing phase was confirmed
to be large for sizable non-orthogonal polygons and even worse in the case of
von Koch and random von Koch polygons. Moreover, while using shadow AVPs
as the initial discretization produces convergence after just one iteration of the
algorithm, the resulting discretization set can, in this case, be so large that the
time cost of the preprocessing phase overshadows the solution of the ensued SCP.
For this reason, the just vertices strategy led to the most efficient version of the algorithm in practice, as the small SCP instances created counterbalanced the larger number of iterations for many polygon classes.
0 ≤ xg ≤ 1 ∀g ∈ G. (4)
optimal solutions for the AGP(P, ∂P ). These are then tested in the search for a
coverage of the entire polygon. According to the authors, if such solution exists,
it is automatically a nearly optimal one for AGP(P, P ). If a viable solution is
not among those, guards are then added using a greedy strategy until a feasible
solution is found.
It should be noted that there are worst-case instances for AGP(P, P ) that
only possess a single optimal solution, where no characterization of the guard
positions is known. Therefore neither this algorithm nor any of the subsequent ones presented in this paper can guarantee to find optimal solutions. This common issue is discussed in more detail in Sect. 6.2.
For the experiments presented in [10], 400 polygons with sizes ranging from
30 to 60 vertices were examined. As in the previous work, the following classes
were tested: simple, orthogonal, random polygons with holes and also random
orthogonal polygons with holes. Guaranteed optimal solutions were found in
about 68% of the polygons tested. Note that, in about 96% of the cases, the
solution found for AGP(P, ∂P ) in the first step of this algorithm was also viable
for AGP(P, P ). In addition, the authors also implemented the most promising
techniques by Amit et al. (see Sect. 3.1), in order to enable comparison between
both works. As a result, this technique was more successful than the method by
Amit et al. considering the random polygons tested.
issue of closing the integrality gap between them. This may lead to terminating
with a suboptimal solution, however with a provided lower bound. As a lever-
age against this shortcoming, cutting planes are employed to raise the lower
bounds [33]. Two classes of facet-defining inequalities for the convex hull of all
feasible integer solution of AGP(G, W ) are identified. While the NP-hardness
of AGP indicates that it is hopeless to find a complete polynomial-size facet
description, it is shown that the new inequalities contain a large set of facets,
including all with coefficients in {0, 1, 2}, see also [6]. The dual phase is enhanced
with separation routines for the two classes, consequently improving the lower
bounds, and often allowing the algorithm to terminate with provably optimal
solutions.
To evaluate this work, experiments were conducted on four different poly-
gon classes, sized between 60 and 1000 vertices. These included both orthogonal
and non-orthogonal instances, both with and without holes, and polygons where
optimal solutions cannot use vertex guards. Different parametrizations of the
algorithms were tested, and it was found that the ILP-based algorithm itself
(without applying cutting planes) could identify good integer solutions, some-
times even optimal ones, and considerably surpassed the previous 2010 version.
The algorithm was able to find optimal solutions for 500-vertex instances quite
often. Instances with 1000 vertices were out of reach though.
After presenting an algorithm for AGP with point guards in spring 2013 (see
Sect. 3.8), the research group from Campinas continued working on the subject.
In this context, improvements were implemented, including the development of
their own visibility algorithm that was also able to handle polygons with holes,
giving rise to a new version of the algorithm [59]. The resulting implementation is
location [62, Sect. 3]¹ of all already existing guards (or witnesses) with respect to a new visibility polygon at once.
The new code now runs substantially faster, allowing it to solve much larger
instances than the previous one. This paper contains the first experimental evalu-
ation of this new algorithm. Section 4 contains results from running the algorithm
and comparing it to the other approaches presented here. Section 5 discusses the
speedup obtained by the new subroutines.
This implementation is the result of a joint effort by the Braunschweig and the
Campinas groups. With the intent of achieving robustness, its core is the algo-
rithm from Campinas (Sect. 3.9), refitted with optimizations from Braunschweig
that greatly improved its efficiency.
The new code now also uses the lazy exact kernel (cf. Sect. 3.10) of CGAL
and the triangular expansion algorithm [11] of the new visibility package [36]
of CGAL. While the impact of the new visibility polygon algorithm was huge for both approaches, the usage of the lazy kernel was also significant, since the overlays in the approach of Campinas contain significantly more intersection points. To see more about how changes in kernel and visibility affect the solver,
consult Sect. 5.
Moreover, the current version of Campinas also includes new approaches on
the algorithm side. One of the ideas developed was to postpone the computation
of an upper bound (solving AGP(G, P )) to the time that a good lower bound,
and, consequently, a “good” set of guard candidates is obtained. This can be
done by repeatedly solving only AGP(P, W ) instances until an iteration where
the lower bound is not improved is reached. This situation possibly means that
the value obtained will not change much in the next iterations. It also increases
the chances that the first viable solution found is also provably optimal, which
automatically reduces the number of AGP(G, P ) instances which must be re-
solved.
Other changes are the inclusion of a new strategy for guard positioning, where only one interior point from each light AVP is chosen to be part of the guard candidate set (instead of all its vertices), and the possibility of using the IBM ILOG CPLEX Optimization Studio [22] (CPLEX) solver instead of XPRESS.
This new version was tested in experiments conducted for this paper, using
900 problem instances ranging from 200 to 5000 vertices. Section 4 presents the
results in detail. The implementation proved to be efficient and robust for all
polygon classes tested.
1
This is an O((n + m) log n) sweep-line algorithm, where n is the number of polygon
vertices and m the number of query points.
4 Experimental Evaluation
To assess how well the AGP can be solved using current algorithms, and how
their efficiency has developed over the last years, we have run exhaustive exper-
iments. The experiments involve all algorithms for which we could get working
implementations, and were conducted on the same set of instances and on the
same machines; see Sect. 4.2. We refrain from providing comparisons based on
numbers reported in the literature.
We had several snapshots of the Braunschweig and Campinas code available;
these are:
– For Braunschweig, the versions from 2010 (Sect. 3.5), 2012 (Sect. 3.7), and
2013 (Sect. 3.10). These will be referred to as BS-2010, BS-2012, and BS-2013,
respectively.
– For Campinas, the version from 2009 (Sect. 3.4), and the two snapshots from
2013 (Sects. 3.8 and 3.9). These will be referred to as C-2009, C-2013.1 and
C-2013.2, respectively.
– The latest version is the combined approach from Campinas and
Braunschweig that was obtained during a visit of Davi C. Tozoni to
Braunschweig (Sect. 3.11), which we refer to as C+BS-2013.
The older versions have already been published; for these we provide a unified
evaluation. The versions BS-2013 and C+BS-2013 are as yet unpublished.
4.1 AGPLib
For the performed experiments, several classes of polygons were considered. The
majority of them were collected from AGPLib [16], which is a library of sample
instances for the AGP, consisting of various classes of polygons of multiple sizes.
They include the test sets from many previously published papers [7,19–21,41,
59,60].
To find out more about how each of the classes was generated, see [20] and
[59]. Below, we show a short description of the six classes of instances considered
in this survey; all of them are randomly generated:
“von Koch”: Random polygons inspired by randomly pruned Koch curves, see
Fig. 3e.
“spike”: Random polygons with holes as in Fig. 3f. Note that this class is specif-
ically designed to provide polygons that encourage placing point guards in the
intersection centers. It has been published along with the BS-2010 algorithm
(Sect. 3.5), which was the first capable of placing point guards.
4.3 Results
Historically, the two lines of algorithms have approached the AGP from
different angles. Campinas focused on binary solutions, which initially came at
the expense of being limited to a given guard discretization, such as vertex guards:
AGP(V, P). The Braunschweig work started with point guards, but the price was
fractional solutions: AGPR(P, P). Reliably solving the binary AGP with point
guards, AGP(P, P), only became possible with the BS-2012 and C-2013.1
algorithms.
Therefore, we first sketch progress for the AGP with vertex guards and the
fractional AGP before discussing the experimental results for AGP with point
guards itself.
Table 1. Optimality rates for vertex guards. Notice that BS-2010 finds fractional
vertex guard solutions, whereas the others find integer ones.
Table 1 shows optimality rates, i.e., how many of the instances each implementation
could solve, given a 20-min time limit per instance. The polygons were
grouped into two categories: those without holes, comprising the instance classes
simple, ortho and von Koch, and those with holes, comprising the classes
simple-simple, ortho-ortho and spike. The Campinas versions
prior to C-2013.2 could not deal with holes in input polygons, so these entries are
empty. It should also be noted that BS-2010 solves the easier case of fractional
vertex guards. It is clearly visible how all algorithms (including the five-year-old
C-2009) can solve all simple polygons with up to 2000 vertices as well as most
simple 5000-vertex polygons. For instances with holes, however, the solution
percentages of all algorithms (except BS-2010, which solves an easier problem) start
significantly dropping at 1000 vertices. This demonstrates two effects: first, for
smaller sizes, the problem is easier to solve, as the search for good guard candidates
is unnecessary; second, for larger sizes, finding optimal solutions to large
instances of the NP-hard SCP dominates, resulting in a computational barrier.
The difficulty of handling large SCP instances also shows up when we consider the
results of the Campinas codes C-2013.1 and C-2013.2. As the size of the polygons
increases and the SCPs to be solved grow in complexity, the Lagrangian
heuristic employed by the C-2013.2 version uses more computational time but does
not help the ILP solver to find optimal solutions for the AGP(G, W) instances,
due to the deterioration of the primal bounds. This inefficiency causes a decrease
in the solver's performance, as can be seen in the optimality rates shown in Table 1
for simple polygons with 5000 vertices. In this case, if C-2013.2 did not use
the Lagrangian heuristic by default, a result at least similar to that obtained by
C-2013.1 would be expected.
The high solution rates allow us to directly analyze the speedup achieved over
time. Table 2 shows how much faster than C-2009 later algorithms could solve
the problem. The shown numbers are log-averages over the speedup against
C-2009 for all instances solved by both versions. As C-2009 cannot process
holes, this analysis is restricted to simple polygons. It is clearly visible that
Table 2. Speedup for vertex guards. Numbers indicate how many times faster than
C-2009 later implementations became, computed as log-average. The comparison is
only possible when there is at least one instance of the group that was solved by all
considered solvers. This table is restricted to simple polygons, since C-2009 does not
support polygons with holes.
BS-2013 is about five times faster than C-2009, and the changes from C-2013.2
to C+BS-2013 led to a speedup factor of about seven. These gains stem from a number
of changes between versions; however, roughly a factor of 5 can be attributed to
improvements in the geometric subroutines: faster visibility algorithms, the lazy-exact
CGAL kernel, and reduced point constructions. We discuss the influence of the geometry
routines in Sect. 5.1.
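As a side remark, the log-average used in Table 2 (and in the later speedup tables) is
simply the geometric mean of the per-instance speedups; the following minimal sketch,
with made-up numbers and a helper name of our choosing, illustrates the aggregation.

#include <cmath>
#include <iostream>
#include <vector>

// Geometric mean of per-instance speedups (runtime of C-2009 divided by the runtime
// of the newer solver), computed as the exponential of the mean of logarithms.
double logAverageSpeedup(const std::vector<double>& speedups) {
    double sumLog = 0.0;
    for (double s : speedups) sumLog += std::log(s);
    return std::exp(sumLog / speedups.size());
}

int main() {
    // Hypothetical per-instance speedups on instances solved by both versions.
    std::vector<double> speedups = {3.2, 7.5, 4.8, 6.1, 5.0};
    std::cout << "log-average speedup: " << logAverageSpeedup(speedups) << "\n";
}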
Fractional Guards. The Braunschweig line of work started with solving the
fractional point guard variant AGPR(P, P), and all Braunschweig versions, even
those designed for binary solutions, still support the fractional AGP. Table 3
shows how often the three implementations could find optimal solutions, and
how often they achieved a 5% gap within the 20-min runtime limit. Here again,
the polygons have been grouped into those with holes and those without holes.
Point Guards. We turn our attention to the classic AGP, AGP(P, P): finding
integer solutions with point guards. We report optimality in three different ways:
Table 4 reports the percentage of instances that could be solved optimally with a
matching lower bound (i.e., proven optimality); Table 5 shows for how many
instances an optimal solution was found, whether or not a matching bound was
established; and Table 6 reports the percentage of solutions that were no more
than 5% away from the optimum. This allows us to
distinguish between cases where BS-2013 does not converge, and cases where
the integrality gap prevents it from detecting optimality.
The C+BS-2013 implementation solves the vast majority of instances from
our test set to proven optimality, the only notable exception being some classes
of very large polygons with holes and the 5000-vertex Koch polygons. Given
how the best known implementation by 2011, the Torino one from Sect. 3.6,
had an optimality rate of about 70% for 60-vertex instances, it is clearly visible
how the developments of the last few years have pushed the frontier. With C+BS-2013,
instances with 2000 vertices are usually solved to optimality, an increase of about
two orders of magnitude. The success of C+BS-2013 is multifactorial: it
contains improved combinatorial algorithms as well as faster geometry routines,
most notably a fast visibility implementation. Section 5 discusses its key success
factors.
It can be seen from Table 6 that many algorithms are able to find near-optimal
solutions (5% gap) for most instances, indicating that, for practical purposes, all
2013 algorithms perform very well. The frontier for instance sizes that can be
solved with a small gap lies between 2000 and 5000 vertices for most polygons with
holes and beyond 5000 vertices for simple polygons.
5 Success Factors
As seen in Sect. 3, the most effective algorithms for the AGP can be decomposed
into four elements:
Both groups use the 2D Arrangements package [62] of CGAL, which follows the
generic programming paradigm [5]. For instance, in the case of arrangements
it is possible to exchange the curve type that is used to represent the planar
subdivisions, or the kernel that provides the essential geometric operations and
also determines the number type used. In the context of this work, the curves
used are simply segments2. However, the choice of the geometric kernel can have
a significant impact on the runtime.
First of all, it should be noted that, among the different kernels that CGAL
offers, only kernels providing exact constructions should be considered, as any
inexact construction is likely to induce inconsistencies in the data structures of
the arrangements package. This already holds for seemingly simple scenarios,
as the code of the arrangement package heavily relies on the assumption that
constructions are exact.
This essentially leaves two kernels: the Cartesian kernel and the lazy-exact
kernel. For both kernels it is possible to exchange the underlying exact rational
number type, but CGAL::Gmpq [35] is the recommended one3.
The Cartesian kernel is essentially the naive application of exact rational
arithmetic (using the number type it is instantiated with, in this case CGAL::Gmpq).
Thus, coordinates are represented by a numerator and a denominator, each being
an integer using as many bits as required. This implies that even basic geometric
constructions and predicates are not of constant cost, but depend on the bit-size
of their input. For instance, the intersection point of two segments is likely
to require significantly more bits than the endpoints of the segments. This is
even more relevant in the case of cascaded constructions, as the bit growth is
cumulative. This effect is very relevant in both approaches due to their iterative
nature, e.g., when such a point is chosen to be a new guard or witness position.
The lazy-exact kernel [54] tries to attenuate these effects by using exact
arithmetic only when necessary. Every arithmetic operation and construction is
first carried out using only double interval arithmetic, that is, using directed
rounding, an upper and a lower bound on the exact value are computed. The hope is
that in most cases this is already sufficient to give the correct and certified
answer, for instance whether a point is above or below a line. However, when
this is not sufficient, each constructed object also knows its history,
which makes it possible to carry out the exact rational arithmetic, as in the
Cartesian kernel, in order to determine the correct result. This idea is
implemented by the lazy kernel not only on the number-type level4, but also for
predicates and constructions, which reduces the overhead (memory and time)
induced by maintaining the history.
2
In the context of fading [43] circular arcs may also be required.
3
Other options are, for instance, leda::rational [46] or CORE::BigRat [39], but,
compared to Gmpq, both imply some overhead and are only recommended in case
the usage of the more complex number types of these libraries is required.
4
This can be achieved by instantiating the Cartesian kernel with
CGAL::Lazy_exact_nt<CGAL::Gmpq>.
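As an illustration of the two kernel configurations, the following minimal sketch
(not taken from the solvers discussed here) performs a cascaded midpoint construction:
with CGAL::Cartesian<CGAL::Gmpq>, every level is computed with exact rationals whose
bit-size keeps growing, whereas the lazy-exact kernel works with double intervals and
falls back to exact arithmetic only when needed.

#include <CGAL/Cartesian.h>
#include <CGAL/Gmpq.h>
#include <CGAL/Exact_predicates_exact_constructions_kernel.h>
#include <iostream>

// Naive exact kernel: every construction is carried out with exact rationals.
typedef CGAL::Cartesian<CGAL::Gmpq>                       ExactKernel;
// Lazy-exact kernel of CGAL: predicates and constructions are first evaluated with
// double-interval arithmetic and fall back to exact rationals only when needed.
typedef CGAL::Exact_predicates_exact_constructions_kernel LazyKernel;

template <class K>
typename K::Point_2 cascade(int depth) {
    typename K::Point_2 p(0, 0), q(1, 1);
    // Cascaded construction: each midpoint depends on the previous one, so the exact
    // rational representation needs more and more bits per level.
    for (int i = 0; i < depth; ++i) p = CGAL::midpoint(p, q);
    return p;
}

int main() {
    std::cout << CGAL::to_double(cascade<ExactKernel>(50).x()) << "\n";
    std::cout << CGAL::to_double(cascade<LazyKernel>(50).x())  << "\n";
}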
Table 8. The speedup factor of C+BS-2013 using the Cartesian kernel and the lazy-
exact kernel. Similar numbers were obtained for BS-2012. The lazy-exact kernel is now
the standard configuration in BS-2013 and C+BS-2013.
Fig. 4. Split-up of the average total time for different configurations on all simple instances
with 1000 vertices. (left) The update time, which is dominated by the visibility polygon
computation, almost vanishes in BS-2013 compared to BS-2012. (right) The time
spent on visibility in C+BS-2013 is almost negligible compared to the time spent in
C-2013.2.
in building the constraint matrices for the ILPs, denoted by Mat Time, also suffered
a huge reduction relative to C-2013.2. As commented in Sect. 3.11, this
was mostly due to performing the visibility tests from the perspective of
the witnesses rather than the guards.
(witnesses) of the Boolean constraint matrix of the ILP that models the SCP
instance.
Furthermore, their algorithm employs a Lagrangian heuristic in order to
obtain good, ideally optimal, feasible starting solutions for the SCP, so as to speed up
the convergence towards an optimum. See [8] for a comprehensive introduction
to this technique; the implemented heuristic is based on the work presented
there. Figure 5 shows how the use of this technique positively influenced the
average run time of the approach.
Fig. 5. Average time needed for the current Campinas version to solve von Koch
polygons with 1000 vertices, with and without the Lagrangian heuristic.
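The core idea of such a Lagrangian approach for the SCP can be sketched as follows.
This is a generic, textbook-style fragment in the spirit of [8], not the code used in
C-2013.2; the real heuristic additionally repairs the subproblem solution into a feasible
cover to obtain primal bounds, which is omitted here. The covering constraints are dualized
with multipliers lambda >= 0, the resulting subproblem is solvable by inspection and yields
lower bounds, and a subgradient step updates the multipliers.

#include <algorithm>
#include <iostream>
#include <vector>

// Set cover: minimize the total cost of selected columns (guard candidates) such that
// every row (witness) is covered. rowsOfCol[j] lists the rows covered by column j.
struct SCP {
    std::vector<double> cost;                    // column costs (all 1 for the AGP)
    std::vector<std::vector<int>> rowsOfCol;
    int numRows;
};

// Subgradient optimization on the Lagrangian dual: returns the best lower bound found.
double lagrangianLowerBound(const SCP& scp, double upperBound, int iterations) {
    const int m = scp.numRows, n = static_cast<int>(scp.cost.size());
    std::vector<double> lambda(m, 0.0);          // multipliers of the covering constraints
    double bestLb = 0.0;
    for (int it = 0; it < iterations; ++it) {
        // Lagrangian subproblem: select exactly the columns with negative reduced cost.
        std::vector<char> x(n, 0);
        double lb = 0.0;
        for (int i = 0; i < m; ++i) lb += lambda[i];
        for (int j = 0; j < n; ++j) {
            double reduced = scp.cost[j];
            for (int i : scp.rowsOfCol[j]) reduced -= lambda[i];
            if (reduced < 0.0) { x[j] = 1; lb += reduced; }
        }
        bestLb = std::max(bestLb, lb);           // every L(lambda) is a valid lower bound
        // Subgradient: 1 minus the number of selected columns covering each row.
        std::vector<double> g(m, 1.0);
        for (int j = 0; j < n; ++j)
            if (x[j]) for (int i : scp.rowsOfCol[j]) g[i] -= 1.0;
        double norm2 = 0.0;
        for (double gi : g) norm2 += gi * gi;
        if (norm2 < 1e-12) break;                // subproblem solution is already feasible
        // Simple fixed step-size rule for brevity; [8] discusses better schedules.
        double step = (upperBound - lb) / norm2;
        for (int i = 0; i < m; ++i) lambda[i] = std::max(0.0, lambda[i] + step * g[i]);
    }
    return bestLb;
}

int main() {
    // Tiny made-up instance: 3 witnesses, 3 guard candidates, optimum value 2.
    SCP scp{{1.0, 1.0, 1.0}, {{0, 1}, {1, 2}, {0, 2}}, 3};
    std::cout << "Lagrangian lower bound: " << lagrangianLowerBound(scp, 2.0, 50) << "\n";
}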
i.e., the SCP instance is reduced to minimizing the difference of two non-linear
convex functions. For such optimization problems, the DCA algorithm [53] can be
used. In experiments, it was shown that solutions for AGP(G, W ) could be found
very quickly; however, at the time, the large runtime overhead of the geometric
subroutines led to inconclusive results on the benefits. Revisiting this approach
with the new BS-2013 and C+BS-2013 implementations, which no longer suffer
from this overhead, will be an interesting experiment left for future work.
this success is probably related to the fact that, with the winning strategy, there
is a reduced number of visibility tests between witnesses and guard candidates,
as well as a smaller size of SCP instances to be solved.
For AGPR (P, W ), as solved by the Braunschweig line of algorithms, this
observation can be extended further: If an optimal dual solution for AGPR (G, W )
is available, selecting additional guards corresponds to a column generation
process. Therefore, the BS algorithms place guards only in light AVPs where
the dual solution guarantees an improvement in the objective function. To avoid
cycling in the column generation process, G is monotonically growing, leading
over time to a large number of guard positions.
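In LP terms, this guard-placement rule is the pricing test of column generation: a
candidate guard is worth adding only if the dual values of the witnesses it sees sum to
more than its (unit) cost. The following is a minimal sketch with hypothetical input
data, not the Braunschweig code.

#include <iostream>
#include <vector>

// Pricing step of column generation for AGPR(G, W): given an optimal dual solution y
// (one value per witness) of the current restricted LP, a new guard candidate improves
// the LP iff the witnesses it sees carry a total dual value greater than 1. Each
// candidate (e.g., one point per light AVP) is described here simply by the indices of
// the witnesses visible from it.
int bestImprovingCandidate(const std::vector<std::vector<int>>& visibleWitnesses,
                           const std::vector<double>& y) {
    int best = -1;
    double bestDualSum = 1.0 + 1e-9;   // only accept a strictly negative reduced cost
    for (std::size_t g = 0; g < visibleWitnesses.size(); ++g) {
        double dualSum = 0.0;
        for (int w : visibleWitnesses[g]) dualSum += y[w];
        if (dualSum > bestDualSum) { bestDualSum = dualSum; best = static_cast<int>(g); }
    }
    return best;  // -1: no candidate prices out, so no guard is added in this round
}

int main() {
    std::vector<std::vector<int>> visibleWitnesses = {{0, 1}, {1, 2, 3}, {3}};
    std::vector<double> y = {0.4, 0.3, 0.5, 0.4};   // made-up dual values
    std::cout << "add candidate " << bestImprovingCandidate(visibleWitnesses, y) << "\n";
}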
Initial Set. The selection of the first candidates for guards and witnesses, i.e.,
the initial choice of G and W, can have a tremendous impact on the algorithm's runtime.
In principle, a good heuristic here could pick an almost optimal set G and a
matching W to prove it, reducing the algorithm afterwards to a few or even
no iterations.
Chwa et al. [14] provide a partial answer to this problem: They attempt to
find a finite set of witnesses with the property that guarding this set guarantees
guarding the whole polygon. If such a set exists, the polygon is called witnessable.
Unfortunately this is not always the case. However, for a witnessable polygon,
the set can be characterized and quickly computed. Current algorithms do not
Fig. 7. Average time needed to solve ortho-ortho (left) and spike (right) polygons with
1000 vertices using the Convex Vertices and the Chwa Points discretizations.
Table 10. Speedup factors in BS-2010 obtained by varying initial guards and wit-
nesses [41].
and witnesses on every (or every other) vertex of the polygon, putting guards
on all reflex vertices, and putting a witness on every edge adjacent to a reflex
vertex. The Chwa-inspired combination allowed for a speedup of around two.
Table 11. Percentage of instances solved to binary optimality, comparing two variants
of the 2013 Braunschweig code, one with and one without cutting planes, for 500-vertex
instances.
For the Campinas approach, the quality of the computed lower bound is a
very important issue. For AGP(P, P), the lower bound is obtained by solving
an AGP(P, W) instance, where W is a discretized set of witness points within
P. Therefore, it is fair to say that the quality of the computed value depends
directly on the strategy applied to select the points that comprise the set
W. For more information on how the witness set is managed and how it affects
the convergence of the Campinas method, see Sect. 5.3.
An interesting variant for the AGP was proposed by Joe O’Rourke in 2005: What
if visibility suffers from fading effects, just like light in the real world does? To
be precise, we assume that for a guard g with intensity x_g, a witness w ∈ V(g) is
illuminated with a value of ℓ(d(g, w)) · x_g, where d(g, w) is the Euclidean distance
between g and w, and ℓ is a fading function, usually assumed to be

  \ell(d) := \begin{cases} 1 & \text{if } d < 1 \\ d^{-\alpha} & \text{if } 1 \le d < R \\ 0 & \text{if } d \ge R \end{cases} \qquad (8)
instead of (2). Two algorithms for vertex guards were proposed and tested [29],
based on the BS-2013 implementation. The first approximates ℓ with a step
function and uses updated primal and dual separation routines that operate on
overlays of visibility polygons and circular arcs, resulting in an FPTAS for the
fractional AGP(V, P). The other is based on continuous optimization techniques,
namely a simplex-partitioning approach. In an experimental evaluation using
polygons with up to 700 vertices, it was found that most polygons can be solved
(to a 1.2-approximation in the case of the discrete approach) within 20 min on a
standard PC. The continuous algorithm turned out to be much faster, very
often finishing with an almost-optimal solution with a gap under 0.01%. In an
experimental work by Kokemüller [40], AGP(P, P) with fading was analyzed. It
was found that placing guards makes the problem substantially more difficult.
This is mainly due to an effect where moving one guard requires moving chains
of other guards as well to compensate for the decreased illumination. It was also
found that scaling an input polygon has an impact on the structure of solutions
and the number of required guards, resulting in a dramatic runtime impact.
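For concreteness, the fading model (8) and the resulting illumination of a witness can
be written as in the following sketch; alpha and R are model parameters, and visibility
as well as the distances d(g, w) are assumed to be given (this is not code from [29]
or [40]).

#include <cmath>
#include <iostream>
#include <vector>

// Fading function ell(d) from (8): full intensity up to distance 1, polynomial decay
// d^(-alpha) up to the cut-off radius R, and zero beyond.
double ell(double d, double alpha, double R) {
    if (d < 1.0) return 1.0;
    if (d < R)   return std::pow(d, -alpha);
    return 0.0;
}

// Illumination of a witness: sum of ell(d(g, w)) * x_g over all guards g that see w.
// Visibility and the distances are assumed to be precomputed here.
double illumination(const std::vector<double>& intensities,
                    const std::vector<double>& distances,   // d(g, w) for visible guards
                    double alpha, double R) {
    double total = 0.0;
    for (std::size_t g = 0; g < intensities.size(); ++g)
        total += ell(distances[g], alpha, R) * intensities[g];
    return total;
}

int main() {
    // Two visible guards with made-up intensities and distances, alpha = 2, R = 10.
    std::cout << illumination({0.8, 1.5}, {0.5, 3.0}, 2.0, 10.0) << "\n";
}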
6.2 Degeneracies
7 Conclusion
In this paper, we have surveyed recent developments on solving the Art Gallery
Problem (AGP) in a practically efficient manner. After over thirty years of mostly
theoretical work, several approaches have been proposed and evaluated over the
last few years, resulting in dramatic improvements. The size of instances for
which optimal solutions can be found in reasonable time has improved from tens
to thousands of vertices in just a few years.
References
1. Aigner, M., Ziegler, G.M.: Proofs from THE BOOK, 4th edn. Springer Publishing
Company Incorporated, Heidelberg (2009)
2. Amit, Y., Mitchell, J.S.B., Packer, E.: Locating guards for visibility coverage of
polygons. In: ALENEX, pp. 120–134 (2007)
3. Amit, Y., Mitchell, J.S.B., Packer, E.: Locating guards for visibility coverage of
polygons. Int. J. Comput. Geom. Appl. 20(5), 601–630 (2010)
4. Asano, T.: An efficient algorithm for finding the visibility polygon for a polygonal
region with holes. IEICE Trans. 68(9), 557–559 (1985)
5. Austern, M.H.: Generic Programming and the STL. Addison-Wesley (1999)
6. Balas, E., Ng, S.M.: On the set covering polytope: II. Lifting the facets with coefficients
in {0, 1, 2}. Math. Program. 45, 1–20 (1989). doi:10.1007/BF01589093
7. Baumgartner, T., Fekete, S.P., Kröller, A., Schmidt, C.: Exact solutions and
bounds for general art gallery problems. In: Proceedings of the SIAM-ACM Work-
shop on Algorithm Engineering and Experiments, ALENEX 2010, pp. 11–22. SIAM
(2010)
8. Beasley, J.E.: Lagrangian relaxation. In: Reeves, C.R. (ed.) Modern Heuristic
Techniques for Combinatorial Problems, pp. 243–303. Wiley, New York (1993).
http://dl.acm.org/citation.cfm?id=166648.166660
9. Bottino, A., Laurentini, A.: A nearly optimal sensor placement algorithm for
boundary coverage. Pattern Recogn. 41(11), 3343–3355 (2008)
10. Bottino, A., Laurentini, A.: A nearly optimal algorithm for covering
the interior of an art gallery. Pattern Recogn. 44(5), 1048–1056 (2011).
http://www.sciencedirect.com/science/article/pii/S0031320310005376
11. Bungiu, F., Hemmer, M., Hershberger, J., Huang, K., Kröller, A.: Efficient compu-
tation of visibility polygons. CoRR abs/1403.3905 (2014). http://arxiv.org/abs/
1403.3905
30. Fekete, S.P., Friedrichs, S., Kröller, A., Schmidt, C.: Facets for art gallery problems.
In: Du, D.-Z., Zhang, G. (eds.) COCOON 2013. LNCS, vol. 7936, pp. 208–220.
Springer, Heidelberg (2013). doi:10.1007/978-3-642-38768-5 20
31. Fekete, S.P., Friedrichs, S., Kröller, A., Schmidt, C.: Facets for art gallery problems.
Algorithmica 73(2), 411–440 (2014)
32. Fisk, S.: A short proof of Chvátal’s watchman theorem. J. Comb. Theory Ser. B
24(3), 374–375 (1978)
33. Friedrichs, S.: Integer solutions for the art gallery problem using linear program-
ming. Master’s thesis, TU Braunschweig (2012)
34. Ghosh, S.K.: Approximation algorithms for art gallery problems in polygons and
terrains. In: Rahman, M.S., Fujita, S. (eds.) WALCOM 2010. LNCS, vol. 5942, pp.
21–34. Springer, Heidelberg (2010). doi:10.1007/978-3-642-11440-3 3
35. GNU Multiple Precision Arithmetic Library (2013). http://gmplib.org
36. Hemmer, M., Huang, K., Bungiu, F.: 2D visibility. In: CGAL User and Reference
Manual. CGAL Editorial Board (2014, to appear)
37. Honsberger, R.: Mathematical Gems II. Mathematical Association of America,
Washington, DC (1976)
38. Kahn, J., Klawe, M., Kleitman, D.: Traditional art galleries require fewer watch-
men. SIAM J. Algebr. Discrete Methods 4(2), 194–206 (1983)
39. Karamcheti, V., Li, C., Pechtchanski, I., Yap, C.: A core library for robust numeric
and geometric computation. In: Proceedings of the 15th Annual ACM Symposium
of Computational Geometry (SCG), pp. 351–359 (1999)
40. Kokemüller, J.: Variants of the art gallery problem. Master’s thesis, TU Braun-
schweig (2014)
41. Kröller, A., Baumgartner, T., Fekete, S.P., Schmidt, C.: Exact solutions and
bounds for general art gallery problems. ACM J. Exp. Algorithmics 17, Article
ID 2.3 (2012)
42. Kröller, A., Moeini, M., Schmidt, C.: A novel efficient approach for solving the art
gallery problem. In: Ghosh, S.K., Tokuyama, T. (eds.) WALCOM 2013. LNCS,
vol. 7748, pp. 5–16. Springer, Heidelberg (2013). doi:10.1007/978-3-642-36065-7 3
43. Kröller, A., Schmidt, C.: Energy-aware art gallery illumination. In: Proceedings of
the 28th European Workshop on Computational Geometry (EuroCG 2012), pp.
93–96 (2012)
44. Laurentini, A.: Guarding the walls of an art gallery. Vis. Comput. 15(6), 265–278
(1999)
45. Lee, D.T., Lin, A.K.: Computational complexity of art gallery problems. IEEE
Trans. Inf. Theory 32(2), 276–282 (1986)
46. Mehlhorn, K., Näher, S.: LEDA: A Platform for Combinatorial and Geometric Computing.
Cambridge University Press, Cambridge (2000)
47. Mitchell, J.S.B.: Approximating watchman routes. In: Proceedings of the Twenty-
Fourth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2013,
New Orleans, Louisiana, USA, 6–8 January 2013, pp. 844–855 (2013)
48. Nilsson, B.J.: Approximate guarding of monotone and rectilinear polygons. In:
Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds.) ICALP
2005. LNCS, vol. 3580, pp. 1362–1373. Springer, Heidelberg (2005). doi:10.1007/
11523468 110
49. O’Rourke, J.: Art Gallery Theorems and Algorithms. International Series of Mono-
graphs on Computer Science. Oxford University Press, New York (1987)
50. O’Rourke, J., Supowit, K.: Some NP-hard polygon decomposition problems. IEEE
Trans. Inf. Theory 29(2), 181–190 (1983)
51. Packer, E.: Computing multiple watchman routes. In: McGeoch, C.C. (ed.) WEA
2008. LNCS, vol. 5038, pp. 114–128. Springer, Heidelberg (2008). doi:10.1007/
978-3-540-68552-4 9
52. Packer, E.: Robust geometric computing and optimal visibility coverage. Ph.D.
thesis, SUNY Stony Brook (2008)
53. Tao, P.D., An, L.T.H.: Convex analysis approach to D.C. programming: theory,
algorithms and applications. Acta Mathematica Vietnamica 22(1), 289–355 (1997)
54. Pion, S., Fabri, A.: A generic lazy evaluation scheme for exact geometric computations.
In: 2nd Workshop on Library-Centric Software Design (LCSD) (2006). http://www.citebase.org/abstract?id=oai:arXiv.org:cs/0608063
55. Schuchardt, D., Hecker, H.D.: Two NP-hard art-gallery problems for ortho-
polygons. Math. Log. Q. 41, 261–267 (1995)
56. Shermer, T.C.: Recent results in art galleries (geometry). Proc. IEEE 80(9), 1384–
1399 (1992)
57. Tomás, A.P., Bajuelos, A.L., Marques, F.: Approximation algorithms to minimum
vertex cover problems on polygons and terrains. In: Sloot, P.M.A., Abramson,
D., Bogdanov, A.V., Dongarra, J.J., Zomaya, A.Y., Gorbachev, Y.E. (eds.) ICCS
2003. LNCS, vol. 2657, pp. 869–878. Springer, Heidelberg (2003). doi:10.1007/
3-540-44860-8 90
58. Tomás, A.P., Bajuelos, A.L., Marques, F.: On visibility problems in the plane - solv-
ing minimum vertex guard problems by successive approximations. In: Proceedings
of the 9th International Symposium on Artificial Intelligence and Mathematics (AI
& MATH 2006) (2006, to appear)
59. Tozoni, D.C., de Rezende, P.J., de Souza, C.C.: Algorithm 966: a practical iterative
algorithm for the art gallery problem using integer linear programming. ACM
Trans. Math. Softw. 43(2), 16:1–16:27 (2016). doi:10.1145/2890491. Article no. 16
60. Tozoni, D.C., Rezende, P.J., Souza, C.C.: The quest for optimal solutions for the
art gallery problem: a practical iterative algorithm. In: Bonifaci, V., Demetrescu,
C., Marchetti-Spaccamela, A. (eds.) SEA 2013. LNCS, vol. 7933, pp. 320–336.
Springer, Heidelberg (2013). doi:10.1007/978-3-642-38527-8 29
61. Urrutia, J.: Art gallery and illumination problems. In: Sack, J.R., Urrutia, J. (eds.)
Handbook on Computational Geometry, pp. 973–1026. Elsevier Science Publishers,
Amsterdam (2000)
62. Wein, R., Berberich, E., Fogel, E., Halperin, D., Hemmer, M., Salzman, O., Zuker-
man, B.: 2D arrangements. In: CGAL User and Reference Manual, 4.0 edn., CGAL
Editorial Board (2012)
Author Index