Mehlhorn K., Sanders P. Concise Algorithmics, The Basic Toolbox 124ñ PDF
Mehlhorn K., Sanders P. Concise Algorithmics, The Basic Toolbox 124ñ PDF
Concise Algorithmics
or
Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf-Entwurf
Foreword
Buy me not [25].
iii
iv
Contents
1 Amuse Geule: Integer Arithmetics
1.1 Addition . . . . . . . . . . . . . . . . . . .
1.2 Multiplication: The School Method . . . .
1.3 A Recursive Version of the School Method .
1.4 Karatsuba Multiplication . . . . . . . . . .
1.5 Implementation Notes . . . . . . . . . . . .
1.6 Further Findings . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
3
4
4
6
8
10
11
2 Introduction
2.1 Asymptotic Notation . . . . . . . . . . . . . . . . .
2.2 Machine Model . . . . . . . . . . . . . . . . . . . .
2.3 Pseudocode . . . . . . . . . . . . . . . . . . . . . .
2.4 Designing Correct Programs . . . . . . . . . . . . .
2.5 Basic Program Analysis . . . . . . . . . . . . . . . .
2.6 Average Case Analysis and Randomized Algorithms
2.7 Data Structures for Sets and Sequences . . . . . . . .
2.8 Graphs . . . . . . . . . . . . . . . . . . . . . . . . .
2.9 Implementation Notes . . . . . . . . . . . . . . . . .
2.10 Further Findings . . . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
13
14
16
19
23
25
29
32
32
37
38
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
39
40
45
51
54
56
57
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
vi
CONTENTS
4 Hash Tables
4.1 Hashing with Chaining . . . . .
4.2 Universal Hash Functions . . . .
4.3 Hashing with Linear Probing . .
4.4 Chaining Versus Linear Probing
4.5 Implementation Notes . . . . . .
4.6 Further Findings . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
59
62
63
67
70
70
72
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
75
. 78
. 80
. 83
. 84
. 90
. 92
. 96
. 98
. 101
6 Priority Queues
6.1 Binary Heaps . . . . . . . .
6.2 Addressable Priority Queues
6.3 Implementation Notes . . . .
6.4 Further Findings . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
105
107
112
120
121
7 Sorted Sequences
7.1 Binary Search Trees . . . . . .
7.2 Implementation by (a, b)-Trees
7.3 More Operations . . . . . . .
7.4 Augmenting Search Trees . . .
7.5 Implementation Notes . . . . .
7.6 Further Findings . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
123
126
128
134
138
140
143
8 Graph Representation
8.1 Edge Sequences . . . . . . . . . . .
8.2 Adjacency Arrays Static Graphs .
8.3 Adjacency Lists Dynamic Graphs
8.4 Adjacency Matrix Representation .
8.5 Implicit Representation . . . . . . .
8.6 Implementation Notes . . . . . . . .
8.7 Further Findings . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
147
148
149
150
151
152
153
154
CONTENTS
9 Graph Traversal
9.1 Breadth First Search .
9.2 Depth First Search . .
9.3 Implementation Notes .
9.4 Further Findings . . . .
vii
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
157
158
159
165
166
10 Shortest Paths
10.1 Introduction . . . . . . . . . . . . . . . . . . . .
10.2 Arbitrary Edge Costs (Bellman-Ford Algorithm) .
10.3 Acyclic Graphs . . . . . . . . . . . . . . . . . .
10.4 Non-Negative Edge Costs (Dijkstras Algorithm)
10.5 Monotone Integer Priority Queues . . . . . . . .
10.6 All Pairs Shortest Paths and Potential Functions .
10.7 Implementation Notes . . . . . . . . . . . . . . .
10.8 Further Findings . . . . . . . . . . . . . . . . . .
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
167
167
171
172
173
176
181
182
183
.
.
.
.
.
.
185
186
187
188
190
191
192
.
.
.
.
.
.
.
.
195
196
199
201
204
207
214
217
217
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
viii
CONTENTS
A Notation
225
A.1 General Mathematical Notation . . . . . . . . . . . . . . . . . . . . . 225
A.2 Some Probability Theory . . . . . . . . . . . . . . . . . . . . . . . . 227
A.3 Useful Formulas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
Bibliography
230
CONTENTS
[amuse geule arithmetik. Bild von Al Chawarizmi]
1
=
CONTENTS
Chapter 1
We assume throughout this section that a and b are n-digit integers. We refer to
the digits of a as an1 to a0 with an1 being the most significant (also called leading)
digit and a0 being the least significant digit. [consistently replaced bit (was used
= mixed with digit) by digit]
5
T
0.3
1.18
4.8
20.34
1.1 Addition
Table 1.1: The running time of the school method for the multiplication of n-bit integers. The running time grows quadratically.
We all know how to add two integers a and b. We simply write them on top of each
other with the least significant digits aligned and sum digit-wise, carrying a single bit
= from one position to the next. [picture!]
c=0 : Digit
// Variable for the carry digit
for i := 0 to n 1 do add ai , bi , and c to form si and a new carry c
sn = c
We need one primitive operation for each position and hence a total of n primitive
operations.
Lemma 1.1 Two n-digit integers can be added with n primitive operations.
[todo: proper alignment of numbers in tables] Table 1.1 shows the execution
time of the school method using a C++ implementation and 32 bit digits. The time
given is the average execution time over ??? many random inputs on a ??? machine.
The quadratic growth of the running time is clearly visible: Doubling n leads to a
four-fold increase in running time. We can interpret the table in different ways:
(1) We can take the table to confirm our theoretical analysis. Our analysis predicts quadratic growth and we are measuring quadratic growth. However, we analyzed the number of primitive operations and we measured running time on a ???
computer.Our analysis concentrates on primitive operations on digits and completely
ignores all book keeping operations and all questions of storage. The experiments
show that this abstraction is a useful one. We will frequently only analyze the number of representative operations. Of course, the choice of representative operations
requires insight and knowledge. In Section 2.2 we will introduce a more realistic computer model to have a basis for abstraction. We will develop tools to analyze running
time of algorithms on this model. We will also connect our model to real machines, so
that we can take our analysis as a predictor of actual performance. We will investigate
the limits of our theory. Under what circumstances are we going to concede that an
experiment contradicts theoretical analysis?
(2) We can use the table to strengthen our theoretical analysis. Our theoretical
analysis tells us that the running time grows quadratically in n. From our table we
may conclude that the running time on a ??? is approximately ??? n 2 seconds. We
can use this knowledge to predict the running time for our program on other inputs.
[todo: redo numbers.] Here are three sample outputs. For n =100 000, the running =
time is 1.85 seconds and the ratio is 1.005, for n = 1000, the running time is 0.0005147
seconds and the ratio is 2.797, for n = 200, the running time is 3.3 10 5 seconds and
the ratio is 4.5, and for n =1 000 000, the running time is 263.6 seconds and the ratio
is 1.433. We see that our predictions can be far off. We simply were too careless. We
started with the assumption that the running is cn2 for some constant c, estimated c,
and then predicted. Starting from the assumption that the running time is cn 2 + dn + e
would have lead us to different conclusions. We need to do our theory more carefully 1
in order to be able to predict. Also, when we made the prediction that the running
time is approximately ??? 1010 n2 seconds, we did not make any restrictions on
n. However, my computer has a finite memory (albeit large) and this memory is
organized into a complex hierarchy of registers, first and second level cache, main
memory, and disk memory. The access times to the different levels of the memory
differ widely and this will have an effect on running times. None of the experiments
reported so far requires our program to use disk memory. We should analyze the space
requirement of our program in order to be able to predict for how large a value of n
the program is able to run in core2. In our example, we ran into both traps. The
prediction is off for small n because we ignored the linear and constant terms in the
running time and the prediction is off for large n because our prediction ignores the
effects of the memory hierarchy.
(3) We can use the table to conjecture quadratic growth of the running time of
our algorithm. Again, we need to be careful. What are we actually conjecturing?
That the running time grows quadratically on random numbers? After all, we ran the
algorithm only on random numbers (and even on very few of them). That the running
time grows quadratically in the worst case, i.e., that there are no instances that lead to
higher than quadratic growth? That the running grows quadratically in the best case,
i.e., that there are no instances that lead to less than quadratic growth? We see that we
need to develop more concepts and a richer language.
[dropped checking for now. If we use it we should do more in that direction
= later, e.g., pqs, flows, sorting]
and b = b1 2k + b0
Figure 1.1: todo: visuzlization of the school method and its recursive variant.
and hence
a b = a1 b1 22k + (a1 b0 + a0 b1 ) 2k + a0 b0 .
This formula suggests the following algorithm for computing a b:
a) Split a and b into a1 , a0 , b1 , and b0 .
b) Compute the four products a1 b1 , a1 b0 , a0 b1 , and a0 b0 .
c) Add the suitably aligned products to obtain a b.
Observe that the numbers a1 , a0 , b1 , and b0 are dn/2e-bit numbers and hence the
multiplications in step (2) are simpler than the original multiplication if dn/2e < n,
i.e., n > 1. The complete algorithm is now as follows: To multiply 1-bit numbers, use
our multiplication primitive, and to multiply n-bit numbers for n 2, use the three
step approach above. [picture!]
=
It is clear why this approach is called divide-and-conquer. We reduce the problem
of multiplying a b into some number of simpler problems of the same kind. A divide
and conquer algorithm always consists of three parts: In the first part, we split the
original problem into simpler problems of the same kind (our step (1)), in the second
part we solve the simpler problems using the same method (our step (2)), and in the
third part, we obtain the solution to the original problem from the solutions to the
subproblems. The following program implements the divide-and-conquer approach
to integer multiplication.
What is the connection of our recursive integer multiplication to the school method?
It is really the same. Figure ?? shows that the products a1 b1 , a1 b0 , a0 b1 , and a0 b0
are also computed by the school method. Knowing that our recursive integer multiplication is just the school method in disguise tells us that the recursive algorithms uses
a quadratic number of primitive operations. Let us also derive this from first principles. This will allow us to introduce recurrence relations, a powerful concept for the
analysis of recursive algorithm.
Lemma 1.3 Let T (n) be the maximal number of primitive operations required by our
recursive multiplication algorithm when applied to n-bit integers. Then
(
1
if n = 1,
T (n)
4 T (dn/2e) + 3 2 n if n 2.
Proof: Multiplying two 1-bit numbers requires one primitive multiplication. This
justifies the case n = 1. So assume n 2. Splitting a and b into the four pieces a 1 ,
a0 , b1 , and b0 requires no primitive operations3. Each piece has at most dn/2e bits
and hence the four recursive multiplications require at most 4 T (dn/2e) primitive
operations. Finally, we need three additions to assemble the final result. Each addition involves two numbers of at most 2n bits and hence requires at most 2n primitive
operations. This justifies the inequality for n 2.
In Section 2.5 we will learn that such recurrences are easy to solve and yield the
already conjectured quadratic execution time of the recursive algorithm. At least if n
is a power of two we get even the same constant factors. [exercise: induction proof
= for n power of two?] [some explanation how this introduces the concept for the
= next section?]
and b = b1 2k + b0
will require work, but it is work that we do not account for in our analysis.
will require work, but it is work that we do not account for in our analysis.
10
TK
5.85
17.51
52.54
161
494
1457
4310
TK /TK0
2.993
3.001
3.065
3.068
2.95
2.957
TS
1.19
4.73
19.77
99.97
469.6
1907
7803
TS /TS0
3.975
4.18
5.057
4.698
4.061
4.091
TK /TS
4.916
3.702
2.658
1.611
1.052
0.7641
0.5523
Table 1.2: The running time of the Karatsuba and the school method for integer multiplication: TK is the running time of the Karatsuba method and TS is the running time
of the school method. TK0 and TS0 denote the running time of the preceding iteration.
For an algorithm with running time T (n) = cn we have T /T 0 = T (2n)/T (n) = 2 .
For = log3 we have 2 = 3 and for = 2 we have 2 = 4. The table was produced
on a 300 MHz SUN ULTRA-SPARC.
bits and hence the three recursive multiplications require at most 3 T (dn/2e + 1)
primitive operations. Finally, we need two additions to form a0 + a1 and b0 + b1 and
four additions to assemble the final result. Each addition involves two numbers of
at most 2n bits and hence requires at most 2n primitive operations. This justifies the
inequality for n 4.
The techniques introduced in Section 2.5 allow us to conclude that T (n) ???n log3 ).
= [fill in constant factor and lower order terms!]
11
[report on switching time in LEDA] Note that this number of bits makes the Karat- =
suba algorithm useful to applications in crytography where multiplying numbers up
to 2048 bits is the most time consuming operation in some approaches [?].
12
13
Chapter 2
Introduction
[Alan Turing und John von Neumann? gothische Kathedrale (Chartres)?]
=
When you want to become a sculptor, you first have to learn your basic trade.
Where to get the right stones, how to move them, how to handle the chisel, erecting
scaffolding. . . These things alone do not make you a famous artist but even if you are
a really exceptional talent, it will be very difficult to develop if you do not find a
craftsman who teaches you the basic techniques from the first minute.
This introductory chapter attempts to play a similar role. It introduces some very
basic concepts that make it much simpler to discuss algorithms in the subsequent
chapters.
We begin in Section 2.1 by introducing notation and terminology that allows us
to argue about the complexity of alorithms in a concise way. We then introduce a
simple abstract machine model in Section 2.2 that allows us to argue about algorithms
independent of the highly variable complications introduced by real hardware. Section 2.3 than introduces a high level pseudocode notation for algorithms that is much
more convenient than machine code for our abstract machine. Pseudocode is also
more convenient than actual programming languages since we can use high level concepts borrowed from mathematics without having to worry about how exactly they
can be compiled to run on actual hardware. The Pseudocode notation also includes
annotations of the programs with specifications of what makes a consistent state of the
system. Section 2.4 explains how this can be used to make algorithms more readable
and easier to prove correct. Section 2.5 introduces basic mathematical techniques for
analyzing the complexity of programs, in particular, for analyzing nested loops and
recursive procedure calls. Section 2.6 gives an example why concepts from probability theory can help analyzing existing algorithms or designing new ones. Finally,
Section 2.8 introduces graphs as a convenient language to describe many concepts in
14
Introduction
discrete algorithmics.
15
: n n0 : g(n) c f (n)}
+ : n n0 : g(n) c f (n)}
+
+
+
: n n0 : g(n) c f (n)}
: n n0 : g(n) c f (n)}
(2.1)
(2.2)
(2.3)
(2.4)
(2.5)
O ( f (n)) is the set of all functions that eventually grow no faster than f (n) except
for constant factors. Similarly, ( f (n)) is the set of all functions that eventually
grow at least as fast as f (n) except for constant factors. For example, the Karatsuba
algorithm for integer multiplication has worst case running
time in O n1.58 whereas
the school algorithm has worst case running time in n2 so that we can say that the
Karatsuba algorithm is asymptotically faster than the school algorithm. The little-o
notation o( f (n)) denotes the set of all functions that grow stricly more slowly than
f (n). Its twin ( f (n)) is rarely used and only shown for completeness.
Handling the definitions of asymtotic notation directly requires some manipulations but is relatively simple. Let us consider one example that [abhaken] rids us of =
future manipulations for polynomial functions.
k
i
Lemma
2.1 Let p(n) = i=0 ai n denote any polynomial with ak > 0. Then p(n)
k
n .
Proof: It suffices to show that p(n) O nk and p(n) nk .
First observe that for n > 0,
k
i=0
i=0
ak k
ak
ak
n + nk1 ( n A) nk
2
2
2
16
Introduction
17
(2.6)
(2.7)
(2.8)
= [more rules!]
Exercise 2.1 Prove Lemma 2.2.
To summarize this section we want to stress that there are at least three orthogonal
choices in algorithm analysis:
What complexity measure is analyzed? Perhaps time is most important. But we
often also consider space consumption, solution quality, (e.g., in Chapter 12).
Many other measures may be important. For example, energy consumption
of a computation, the number of parallel processors used, the amount of data
transmitted over a network,. . .
Are we interested in worst case, best case, or average case?
Are we simplifying bounds using O (), (), (), o(), or ()?
Exercise 2.2 Sharpen Lemma 2.1 and show that p(n) = ak nk + o(nk ).
Figure 2.1: John von Neumann born Dec. 28 1903 in Budapest, died Feb. 8, 1957,
Washington DC.
the model in mind also work well on the vastly more complex hardware of todays
machines.
The variant of von Neumanns model we consider is the RAM (random access
machine) model. The most important features of this model are that it is sequential,
i.e., there is a single processing unit, and that it has a uniform memory, i.e., all memory
accesses cost the same amount of time. The memory consists of cells S[0], S[1], S[2],
. . . The . . . means that there are potentially infinitely many cells although at any
point of time only a finite number of them will be in use.
The memory cells store small integers. In Chapter 1 we assumed that small
means one or two bits. It is more convenient to assume that reasonable functions
of the input size n can be stored in a single cell. Our default assumption will be that
polynomials in n like n2 or perhaps 100n3 are still reasonable. Lifting this restriction
could lead to absurdly overoptimistic algorithms. For example by repeated squaring,
we could generate a number with 2n bits in n steps.[mehr zu den komplexitaetstheoretischen Konsequenzen hier oder in further findings?] We should keep in =
mind however, that that our model allows us a limited form of parallelism. We can
perform simple operations on logn bits in constant time.
In addition to the main memory, there is a small number of registers R 1 , . . . , Rk .
Our RAM can execute the following machine instructions.
Ri := S[R j ] loads the content of the memory cell with index R j into register Ri .
S[R j ]:= Ri stores register Ri in memory cell S[R j ].
Ri := R j R` is a binary register operation where is a placeholder for a variety
of operations. Arithmetic operations can be the usual +, , and but also
18
Introduction
the bit-wise operations |, &, >> <<, and for exclusive-or. Operations div
and mod stand for integer division and remainder respectively. Comparison
operations , <, >, encode true as 1 and false as 0. Logical operations
and further manipulate the truth values 0 and 1. We may also assume that
there are operations which interpret the bits stored in a register as floating point
numbers, i.e., finite precision approximations of real numbers.
2.3 Pseudocode
19
2.3 Pseudocode
Our RAM model is an abstraction and simplification of the machine programs executed on microprocessors. But the model is still too low level for an algorithms
textbook. Our programs would get too long and hard to read. Our algorithms will
therefore be formulated in pseudocode that is an abstraction and simplification of imperative programming languages like C, C++, Java, Pascal. . . combined with a liberal
use of mathematical notation. We now describe the conventions used in this book and
give a rough idea how these high level descriptions could be converted into RAM machine instructions. But we do this only to the extend necessary to grasp asymptotic
behavior of our programs. This would be the wrong place to worry about compiler
optimization techniques since a real compiler would have to target real machines that
are much more complex. The syntax of our pseudocode is similar to Pascal [47] because this is typographically nicer for a book than the more widely known Syntax of
C and its descendents C++ and Java.
A variable declaration v=x : T introduces a variable v of type T that is initialized to value x. For example, answer=42 : . When the type of a variable is clear
from the context we sometimes omit the declaration. We can also extend numeric
types by values and . Similarly, we use the symbol to denote an undefined
value which we sometimes assume to be distinguishable from a proper element of
T.
An declaration a : Array [i.. j] of T yields an array a consisting of j 1 + 1
elements of type T stored in a[i], a[i + 1], . . . , a[ j]. Arrays are implemented as con-
20
Introduction
tiguous pieces of memory. To find element a[i], it suffices to know the starting address
of a. For example, if register Ra stores the starting address of the array a[0..k] and
elements of a have unit size, then the instruction sequence R 1 := Ra + 42; R2 := S[R1 ]
loads a[42] into register R2 . Section 3.1 gives more details on arrays in particular if
they have variable size.
Arrays and objects referenced by pointers can be allocate d and dispose d of. For
example, p:= allocate Array [1..n] of allocates an array of p floating point numbers.
dispose p frees this memory and makes it available for later reuse. allocate and
dispose cut the single memory array S into disjoint pieces that accommodate all data.
= These functions can be implemented to run in constant time.[more in appendix?]
For all algorithms we present they are also space efficient in the sense that a program
execution that needs n words of memory for dynmically allocated objects will only
touch physical memory locations S[0..O (n)].1
From mathematics we borrow the composite data structures tuples, sequences,
and sets. Pairs, Triples, and, more generally, tuples are written in round brackets,
e.g., (3, 1), (3, 1, 4) or (3, 1, 4, 1, 5). Since tuples only contain a constant number of
elements, operations on them can be broken into operations on their constituents in an
obvious way. Sequences store elements in a specified order, e.g.,
s=h3, 1, 4, 1i : Sequence of declares a sequence s of integers and initializes it
to contain the numbers 3, 1, 4, and 1 in this exact order. Sequences are a natural
abstraction for many data structures like files, strings, lists, stacks, queues,. . . but our
default assumption is that a sequence s is synonymous to an array s[1..|s|]. In Chapter 3 we will learn many additional ways to represent sequences. Therefore, we later
make extensive use of sequences as a mathematical abstraction with little further reference to implementation details. We extend mathematical notation for sets to a notation
for sequences
in the obvious way, e.g., e s if e occurs somewhere in s or
2
i : 1 i 9 = h1, 4, 9, 16, 25, 36, 49, 64, 81i. The empty sequence is written as hi.
Sets play a pivotal rule in the expressive power of mathematical language and
hence we also use them in high level pseudocode. In particular, you will see declarations like M={3, 1, 4} : set of that are analogous to array or sequence declarations. Sets are usually implemented as sequences.
Numerical expressions can be directly translated into a constant number of machine instructions. For example, the pseudocode statement a:= a + bc can be translated into the RAM instructions R1 := Rb Rc ; Ra := Ra + R1 if Ra , Rb , and Rv stand
for the registers storing a, b, and c respectively. From C we borrow the shorthands
= ++ , and . etc.[+= etc needed?] Assignment is also allowed for composite
objects. For example, (a,b):= (b,a) swaps a and b.
The conditional statement
1
Unfortunately, no memory management routines are known that are space efficient for all possible
sequences of allocations and deallocations. It seems that there can be a factor (log n) waste of space [?].
2.3 Pseudocode
21
if C then
I
else I 0
stands for the instruction sequence
C; JZ sElse, Rc ; I; JZ sEnd, R0 ; I0
where C is a sequence of instructions evaluating the expression C and storing its result
in register Rc , I is a sequence of instructions implementing the statement I, I0 implements I 0 , sElse addresses[Pfeile einbauen] the first instruction in I0 , sEnd addresses =
the first instruction after I0 , and R0 is a register storing the value zero.
Note that the statements affected by the then part are shown by indenting them.
There is no need for the proliferation of brackets observed in programming languages
like C that are designed as a compromise of readability for humans and computers.
Rather, our pseudocode is designed for readability by humans on the small pages of
a book. For the same reason, a line break can replace a ; for separating statements.
The statement if C then I can be viewed as a shorthand for if C then I else ;, i.e.,
an if-then-else with an empty else part.
The loop repeat I until C is equivalent to the instruction sequence I; C; JZ sI, R c
where I is an instruction sequence implementing the pseudocode in I, C computes the
condition C and stores its truth value in register Rc , and sI adresses the first instruction
in I. To get readable and concise programs, we use many other types of loops that can
be viewed as shorthands for repeat-loops. For example:[todo: nicer alignment]
=
while C do I
for i := a to b do I
for i := a downto b step s do I
for i := a to while C do I
foreach e s do I
[do we need a break loop construct? (so far not, Dec28,2003)] How exactly =
loops are translated into efficient code is a complicated part of compiler construction
lore. For us it only matters that the execution time of a loop can be bounded by
summing the execution times of each of its iterations including the time needed for
evaluating conditions.
Often we will also use mathematical notation for sequences or sets to express loops
implicitly. For example, assume the set A is represented as an array and s is its size.
Then A:= {e B : C(e)} would be a shorthand for
s:= 0; foreach e B if C(e) then A[++ s] = e. Similarly, the use of the logical
quantifiers and can implicitly describe loops. For example, e s : e > 3 could be
a shorthand for foreach e s do if e 3 then return 0 endfor return 1.
A subroutine with name foo is declared in the form Procedure foo(D) I where
I is the body of the procedure and D is a sequence of variable declarations specifying
22
Introduction
declares a recursive function that returns n!. [picture. with a factorial example
= showing the state of the machine, after several levels of recursion.]
Our pseudocode also allows a simple form of object oriented programming because this is needed to separate the interface and implementation of data structures.
We will introduce our notation by an example. The definition
Class Complex(x, y : Element) of Number
2 If
there are not enough registers, parameters can also be passed on the stack.
23
Number r:= x
Number i:= y
24
Introduction
Function power(a : ; n0 : ) :
assert (a = 0 n0 = 0)
// It is not so clear what 00 should be
p=a : ; r=1 : ; n=n0 :
while n > 0 do
invariant pn r = an0
if n is odd then n ; r:= r p
// invariant violated between assignments
else (n, p):= (n/2, p p)
// parallel assignment maintains invariant
// This is a consequence of the invariant and n = 0
assert r = an0
return r
Figure 2.2: An algorithm that computes integer powers of real numbers.
When a program solves a problem, it massages the state of the system the
input state is gradually transformed into an output state. The program usually walks
on a narrow path of consistent intermediate states that allow it to function correctly.
To understand why a program works, it is therefore crucial to characterize what is a
consistent state. Pseudocode is already a big advantage over RAM machine code since
it shapes the system state from a sea of machine words into a collection of variables
with well defined types. But usually this is not enough since there are consistency
properties involving the value of several variables. We explain these concepts for the
algorithm in Figure 2.2 that computes powers.
We can require certain properties of the system state using an assert-statement.
In an actual program we would usually check the condition at run time and signal an
error if it is violated. For our pseudocode however, an assertion has no effect on the
computation it just declares that we expect the given property to hold. In particular,
we will freely use assertions that would be expensive to check. As in our example,
we often need preconditions, i.e., assertions describing requirements on the input of
a function. Similarly, postconditions describe the result of a function or the effect of
a procedure on the system state. One can view preconditions and postconditions as
a contract between the caller and the called routine: If the caller passes consistent
paramters, the routine produces a result with guaranteed properties. For conciseness,
we will use assertions sparingly assuming that certain obvious conditions are implicit from the textual description of the algorithm. Much more elaborate assertions
may be required for high quality programs or even formal verification.
Some particularly important consistency properties should hold at many places in
the program. We use two kinds of such invariants: A loop invariant holds before
and after each loop iteration. Algorithm 2.2 and many of the simple algorithms explained in this book have a very simple structure: A couple of variables are declared
25
and initialized to establish the loop invariant. Then a main loop manipulates the state
of the program. When the loop terminates, the loop invariant together with the termination condition of the loop imply that the correct result has been computed. The
loop invariant therefore plays a pivotal role in understanding why a program works
correctly. Once we understand the loop invariant, it suffices to check that the loop
invariant is true initially and after each loop iteration. This is particularly easy, if
only a small number of statemtents are executed between points where the invariant
is maintained. For example, in Figure 2.2 the parallel assignment (n, p):= (n/2, p p)
helps to document that the invariant is only maintained after both assignments are
executed.[forward references to examples?]
=
More complex programs encapsulate their state in objects whose consistent representation is also governed by invariants. Such data structure invariants are declared
together with the data type. They are true after an object is constructed and they are
preconditions and postconditions of all methods of the class. For example, in order to
have a unique representation of a two-element set of integers we might declare
Class IntSet2 a, b : ; invariant a < b . . . [forward references to examples?] =
[more on programming by contract?]
=
26
Introduction
when we make an average case analysis or when we are interested in the expected
execution time of the randomized algorithms introduced in Section 2.6.
For example, the insertion sort algorithm introduced in Section 5.1[move that
= here?] has two nested loops. The outer loop counts i from 2 to n. The inner loop
performs at most i 1 iterations. Hence, the total number of iterations of the inner
loop is at most
n
i=2
i=1
i=1
1 = (i 1) = (i 1) = n + i =
i=2 j=2
n(n + 1)
n(n 1)
n =
.
2
2
i=2
n
i=2
i=2
i=n/2+1
(i 1)
d=2, b=4
d=b=2
(2.9)
Since the time for one execution of the inner loop is O (1), we get a worst case execution time of n2 . All nested loops with an easily predictable number of iterations
can be analyzed in an analogous fashion: Work your way inside out by repeatedly
finding a closed form expression for the innermost loop. Using simple manipulations
like i cai = c i ai , i (ai + bi ) = i ai + i bi , or ni=2 ai = a1 + ni=1 ai one can
often reduce the sums to very simple forms that can be looked up in a formulary. A
small sample of such formulas can be found in Appendix A. Since we are often only
interested in the asymptotic behavior, one can also get rid of constant factors or lower
order terms quickly. For example, the chain of equations 2.9 could be rewritten
n
27
n/2 = n2 /4 .
[more tricks, like telescoping, etc? but then we need good examples
here. Alternative: forward references to places where these tricks are actu= ally needed]
d=3, b=2
Figure 2.3: Examples for the three cases of the master theorem. Arrows give the size
of subproblems. Thin lines denote recursive calls.
values make the function well defined. Solving recurrences, i.e., giving nonrecursive,
closed form expressions for them is an interesting subject of mathematics.[sth more
in appendix?] Here we focus on recurrence relations that typically emerge from =
divide-and-conquer algorithms. We begin with a simple case that already suffices to
understand the main ideas:
Theorem 2.3 (Master Theorem (Simple Form)) For positive constants a, b, and d,
and n = bk for some integer k, consider the recurrence
(
a
if n = 1
r(n) =
cn + dT(n/b) else.
then
2.5.2 Recurrences
In our rules for analyzing programs we have so far neglected subroutine calls. Subroutines may seem to be easy to handle since we can analyze the subroutine separately
and then substitute the obtained bound into the expression for the running time of the
calling routine. But this gets interesting for recursive calls. For example, for the recursive variant of school multiplication and n a power of two we obtained T (1) = 1,
T (n) = 6n + 4T (n/2) for the number of primitive operations. For the Karatzuba algorithm, the corresponding expression was T (1) = 1, T (n/2) = 12n + 3T (n/2). In
general, a recurrence relation defines a function (directly or indirectly) in terms of
the same function using smaller arguments. Direct definitions for small parameter
(n)
r(n) = (n logn)
nlogb d
if d < b
if d = b
if d > b
Figure 2.3 illustrates the main insight behind Theorem 2.3: We consider the amount
of work done at each level of recursion. If d < b, the amount of work in a recursive
call is a constant factor smaller than in the previous level. Hence, the total work
decreases geometrically with the level of recursion. Informally speaking, the first
level of recursion already accounts for a constant factor of the execution time. If
d = b, we have the same amount of work at every level of recursion. Since there are
logarithmically many levels, the total amount of work is (n logn). Finally, if d > b
28
Introduction
we have a geometrically growing amount of work in each level of recursion so that the
last level accounts for a constant factor of the running time. Below we formalize this
reasoning.
Proof: At level k, the subproblems have size one and hence we have to account cost
an = ad k for the base case.
At the i-th level of recursion, there are d i recursive calls for subproblems of size
n/bi = bk /bi = bki . The total cost for the divide-and-conquer steps is
k1
i=0
cb
ki
= cb
k1
i=0
d
b
i
k1
= cn
i=0
d
b
i
Case d = b: we have cost ad k = abk = an = (n) for the base case and cnk =
cn logb n = (n logn) for the divide-and-conquer steps.
Case d < b: we have cost ad k < abk = an = O (n) for the base case. For the cost of
divide-and-conquer steps we use Formula A.9 for a geometric series and get
cn
1 (d/b)k
1
< cn
= O (n) and
1 d/b
1 d/b
1 (d/b)k
> cn = (n) .
cn
1 d/b
b
k log
log b logd
=b
d
k log
log b
= bk logb d = nlogb d .
Hence we get time anlogb d = nlogb d for the base case. For the divide-and-conquer
steps we use the geometric series again and get
cbk
(d/b)k 1
d/b 1
=c
d k bk
d/b 1
= cd k
1 (b/d)k
d/b 1
= d k = nlogb d .
How can one generalize Theorem 2.3 to arbitrary n? The simplest way is semantic
reasoning. It is clear3 that it is more difficult to solve larger inputs than smaller inputs
and hence the cost for input size n will be less than the time needed when we round to
bdlogb ne the next largest power of b. Since b is a constant, rounding can only affect
the running time by a constant factor. Lemma [todo. The analysis of merge sort
= uses C(n) C(bn/2c) +C(dn/2e) + n 1 C(n) n dlogne 2 dlogne + 1 n logn]
in the appendix makes this reasoning precise.
3 Be
29
There are many further generalizations of the Master Theorem: We might break
the recursion earlier, the cost for dividing and conquering may be nonlinear, the size of
the subproblems might vary within certain bounds, the number of subproblems may
depend on the input size, . . . [how much of this are we doing? Theorem here,
proof in appendix]
=
[linear recurrences? mention here, math in appendix?]
=
[amortization already here or perhaps in section of its own?]
=
*Exercise 2.4 Suppose you have a divide-and-conquer algorithm whose running time
is governed by the recurrence
T (1) = a, T (n) = cn + n T ( n/ n ) .
Exercise 2.5 Access to data structures is often governed by the following recurrence:
T (1) = a, T (n) = c + T (n/2).
T (1) = a, T (n) = c + T (dn/be) .
Show that T (n) = O (log n).
aware that most errors in mathematical arguments are near occurrences of the word clearly.
[Hier section on basic data structures and their set theoretic implementation?
move sth from summary? Also search trees, sorting, priority queues. Then
perhaps do binary search here?] [moved section Generic techniques into a =
summary chapter.]
=
30
Introduction
7 Euro in the fourth box and after opening and skipping boxes 57, you swap with
the 9 Euro in the eigth box. The trick is that you are only allowed to swap money ten
times. If you have not found the maximum amount after that, you lose everything.
Otherwise you are allowed to keep the money you hold. Your Aunt, who is addicted
to this show, tells you that only few candidates win. Now you ask yourself whether it
is worth participating in this game. Is there a strategy that gives you a good chance to
win? Are the hints of the show master useful?
Let us first analyze the obvious algorithm you always follows the show master.
The worst case is that he makes you open the boxes in order of increasing weight. You
would have to swap money 100 times to find the maximum and you lose after box 10.
The candidates and viewers, would hate the show master and he would be fired soon.
Hence, worst case analysis is quite unrealistic in this case. The best case is that the
show master immediately tells you the best box. You would be happy but there would
be no time to place advertisements so that the show master would again be fired. Best
case analysis is also unrealistic here.
So let us try average case analysis. Mathematically speaking, we are inspecting
a sequence hm1 , . . . , mn i from left to right and we look for the number X of left-right
maxima, i.e., those elements mi such that j < i : mi > m j . In Theorem 10.11 and
Exercise 11.6 we will see algorithmic applications.
For small n we could use a brute force approach: Try all n! permutations, sum up
the number of left right maxima, and divide by n! to obtain the average case number
of left right maxima. For larger n we could use combinatorics to count the number
of permutations ci that lead to X = i. The average value of X is then ni=1 ci /n!. We
use a slightly different approach that simplfies the task: We reformulate the problem
in terms of probability theory. We will assume here that you are familiar with basic
concepts of probability theory but you can find a short review in Appendix A.2.
The set of all possible permutations becomes the probability space . Every particular order has a probability of 1/n! to be drawn. The number of left-right maxima
X becomes a random variable and the average number of left-right maxima X is the
expectation E[X]. We define indicator random variables Xi = 1 if mi is a left-right maximum and Xi = 0 otherwise. Hence, X = ni=1 Xi . Using the linearity of expectation
we get
n
i=1
i=1
i=1
31
n
1
1
lnn
+
1.
Hint:
first
show
that
k
k
k=1
k=2
Z n
1
1
dx.
Let us apply these results to our game show example: For n = 100 we get less than
E[X] < 6, i.e., with ten opportunities to change your mind, you should have a quite
good chance to win. Since most people lose, the hints given by the show master are
actually contraproductive. You would fare much better by ignoring him. Here is the
algorithm: Choose a random permutation of the boxes initially and stick to that order
regardless what the show master says. The expected number of left-right maxima for
this random choice will be below six and you have a good chance to win. You have
just seen a randomized algorithm. In this simple case it was possible to permute the
input so that an average case analysis applies. Note that not all randomized algorithms
follow this pattern. In particular, for many applications there is no way to obtain an
equivalent average case input from an arbitrary (e.g. worst case) input.[preview of
randomized algorithms in this book. define Las Vegas and Monte Carlo? do
we have any Monte Carlo algs?]
=
Randomized algorithms come in two main varieties. We have just seen a Las Vegas
algorithm, i.e., an algorithms that always computes the correct output but where the
run time is a random variable. In contrast, a Monte Carlo algorithm always has the
same run time yet there is a nonzero probability that it gives an incorrect answer.
[more?] For example, in Exercise 5.5 we outline an algorithm for checking whether =
two sequences are permutations of each other that with small probability may may
errorneously answer yes.[primality testing in further findings?] By running a =
Monte Calo algorithm several times using different random bit, the error probability
can be made aribrarily small.
Exercise 2.7 Suppose you have a Las Vegas algorithm with expected execution time
t(n). Explain how to convert it to a Monte Carlo algorithm that gives no answer
with
probability at most p and has a deterministic execution time guarantee of O t(n) log 1p .
Hint: Use an algorithm that is easy to frustrate and starts from scratch rather than waiting for too long.
Exercise 2.8 Suppose you have a Monte Carlo algorithm with execution time m(n)
that gives a correct answer with probability p and a deterministic algorithm that verifies in time v(n) whether the Monte Carlo algorithm has given the correct answer.
Explain how to use these two algorithms to obtain a Las Vegas algorithm with ex1
pected execution time 1p
(m(n) + v(n)).
32
Introduction
2.8 Graphs
Exercise 2.9 Can you give a situation where it might make sense to combine the
two previous algorithm to transform a Las Vegas algorithm first into a Monte Carlo
algorithm and then back into a Las Vegas Algorithm?
= [forward refs to probabilistic analysis in this book]
2.8 Graphs
[Aufmacher? Koenigsberger Brueckenproblem ist historisch nett aber didaktisch bloed weil es ein Multigraph ist und Eulertouren auch sonst nicht unbed= ingt gebraucht werden.]
Graphs are perhaps the single most important concept of algorithmics because they
are a useful tool whenever we want to model objects (nodes) and relations between
them (edges). Besides obvious applications like road maps or communication networks, there are many more abstract applications. For example, nodes could be tasks
to be completed when building a house like build the walls or put in the windows
and edges model precedence relations like the walls have to be build before the windows can be put in. We will also see many examples of data structures where it
1
v
1
v
U
1
1
1
selfloop
K5
33
1
x
K 3,3
2
u
undirected
bidirected
34
Introduction
2.8 Graphs
35
undirected
directed
u
v
expression
r
+
u
rooted v
/
2
36
Introduction
We finish this barrage of definitions by giving at least one nontrivial graph algorithm. Suppose we want to test whether a directed graph is acyclic. We use the simple
observation that a node v with outdegree zero cannot appear in any cycle. Hence, by
deleting v (and its incoming edges) from the graph, we obtain a new graph G 0 that is
acyclic if and only if G is acyclic. By iterating this transformation, we either arrive at
the empty graph which is certainly acyclic, or we obtain a graph G 1 where every node
has outdegree at least one. In the latter case, it is easy to find a cycle: Start at any node
v and construct a path p by repeatedly choosing an arbitrary outgoing edge until you
reach a node v0 that you have seen before. Path p will have the form (v, . . . , v0 , . . . , v0 ),
i.e., the part (v0 , . . . , v0 ) of p forms a cycle. For example, in Figure 2.4 graph G has
no node with outdegree zero. To find a cycle, we might start at node z and follow the
walk hz, w, x, u, v, wi until we encounter w a second time. Hence, we have identified
the cycle hw, x, u, v, wi. In contrast, if edge (w, x) is removed, there is no cycle. Indeed,
our algorithm will remove all nodes in the order w, v, u, t, s, z, y, x. [repeat graph
G, with dashed (w, x), mark the walk hw, x, u, v, wi, and give the order in which
= nodes are removed?] In Chapter 8 we will see how to represent graphs such that
this algorithm can be implemented to run in linear time. See also Exercise 8.3.
Ordered Trees
Trees play an important role in computer science even if we are not dealing with
graphs otherwise. The reason is that they are ideally suited to represent hierarchies.
For example, consider the expression a + 2/b. We have learned that this expression
means that a and 2/b are added. But deriving this from the sequence of characters
ha, +, 2, /, bi is difficult. For example, it requires knowledge of the rule that division
binds more tightly than addition. Therefore computer programs isolate this syntactical
knowledge in parsers and produce a more structured representation based on trees.
Our example would be transformed into the expression tree given in Figure 2.5. Such
trees are directed and in contrast to graph theoretic trees, they are ordered, i.e., the
order of successors matters. For example, if we would swap the order of 2 and b
in our example, we would get the expression a + b/2.
To demonstrate that trees are easy to process, Figure 2.6 gives an algorithm for
evaluating expression trees whose leaves are numbers and whose interior nodes are
binary operators (say +,-,*,/).
We will see many more examples of ordered trees in this book. Chapters 6 and
7 use them to represent fundamental data structures and Chapter 12 uses them to
systematically explore a space of possible solutions of a problem.
=
[what about tree traversal, in/pre/post order?]
Function eval(r) :
if r is a leaf then return the number stored in r
else
v1 := eval(first child of r)
v2 := eval(second child of r)
return v1 operator(r)v2
37
// r is an operator node
Exercises
Exercise 2.11 Model a part of the street network of your hometown as a directed
graph. Use an area with as many one-way streets as possible. Check whether the
resulting graph is strongly connected.[in dfs chapter?]
=
Exercise 2.12 Describe ten sufficiently different applications that can be modelled
using graphs (e.g. not car, bike, and pedestrian networks as different applications). At
least five should not be mentioned in this book.
Exercise 2.13 Specify an n node DAG that has n(n 1)/2 edges.
Exercise 2.14 A planar graph is a graph that you can draw on a sheet of paper such
that no two edges cross each other. Argue that a street network is not necessarily
planar. Show that the graphs K5 and K33 in Figure 2.4 are not planar.
38
Introduction
39
convenience only and a robust implementation should circumvent using them. [give
= example for shortest path or the like]
C++
Our pseudocode can be viewed as a concise notation for a subset of C++.
The memory management operations allocate and dispose are similar to the C++
operations new and delete. Note that none of these functions necessarily guarantees constant time execution. For example, C++ calls the default constructor of each
element of a new array, i.e., allocating an array of n objects takes time (n) whereas
allocating an array n of ints may be done in constant time. In contrast, we assume
that all arrays which are not explicitly initialized contain garbage. In C++ you can
obtain this effect using the C functions malloc and free. However, this is a deprecated practice and should only be used when array initialization would be a severe
performance bottleneck. If memory management of many small objects is performance critical, you can customize it using the allocator class of the C++ standard
library.[more refs to good implementations? Lutz fragen. re-crosscheck with
= impl. notes of sequence chapter]
Our parameterization of classes using of is a special case of the C++-template
mechanism. The parameters added in brackets after a class name correspond to parameters of a C++ constructor.
Assertions are implemented as C-macros in the include file assert.h. By default, violated assertions trigger a runtime error and print the line number and file
where the assertion was violated. If the macro NDEBUG is defined, assertion checking
is disabled.
Java
= [what about genericity?] [what about assertions?] Java has no explicit memory
= mangement in particular. Rather, a garbage collector periodically recycles pieces of
memory that are no longer referenced. While this simplfies programming enormously,
it can be a performance problem in certain situations. Remedies are beyond the scope
of this book.
For a more detailed abstract machine model refer to the recent book of Knuth [58].
[results that say P=PSPACE for arbitrary number of bits?]
[verification, some good algorithms books, ]
[some compiler construction textbooks R. Wilhelm fragen]
Chapter 3
Representing Sequences by
Arrays and Linked Lists
Perhaps the worlds oldest data structures were tablets in cuneiform script used
more than 5000 years ago by custodians in Sumerian temples. They kept lists of goods,
their quantities, owners and buyers. The left picture shows an example. Possibly this
was the first application of written language. The operations performed on such lists
have remained the same adding entries, storing them for later, searching entries and
changing them, going through a list to compile summaries, etc. The Peruvian quipu
you see in the right picture served a similar purpose in the Inca empire using knots
in colored strings arranged sequentially on a master string. Probably it is easier to
maintain and use data on tablets than using knotted string, but one would not want to
haul stone tablets over Andean mountain trails. Obviously, it makes sense to consider
different representations for the same kind of logical data.
The abstract notion of a sequence, list, or table is very simple and is independent
of its representation in a computer. Mathematically, the only important property is
that the elements of a sequence s = he0 , . . . , en1 i are arranged in a linear order
in contrast to the trees and graphs in Chapters 7 and 8, or the unordered hash tables
discussed in Chapter 4. There are two basic ways to specify elements of a sequence.
One is to specify the index of an element. This is the way we usually think about arrays
where s[i] returns the i-th element of sequence s. Our pseudocode supports bounded
arrays. In a bounded data structure, the memory requirements are known in advance,
at the latest when the data structure is created. Section 3.1 starts with unbounded
arrays that can adaptively grow and shrink as elements are inserted and removed. The
analysis of unbounded arrays introduces the concept of amortized analysis.
40
Why are unbounded arrays important? Often, because we do not know in advance how
large an array should be. Here is a typical example: Suppose you want to implement
= the Unix command sort for sorting[explain somewhere in intro?] the lines of a
file. You decide to read the file into an array of lines, sort the array internally, and
finally output the sorted array. With unbounded arrays this is easy. With bounded
arrays, you would have to read the file twice: once to find the number of lines it
contains and once to actually load it into the array.
In principle, implementing such an unbounded array is easy. We emulate an unbounded array u with n elements by a dynamically allocated bounded array b with
w n entries. The first n entries of b are used to store the elements of b. The last
w n entries of b are unused. As long as w > n, pushBack simply increments n and
uses one of the previously unused entries of b for the new element. When w = n, the
next pushBack allocates a new bounded array b0 that is a constant factor larger (say a
factor two). To reestablish the invariant that u is stored in b, the content of b is copied
to the new array so that the old b can be deallocated. Finally, the pointer defining b is
redirected to the new array. Deleting the last element with popBack looks even easier
since there is no danger that b may become too small. However, we might waste a
41
lot of space if we allow b to be much larger than needed. The wasted space can be
kept small by shrinking b when n becomes too small. Figure 3.1 gives the complete
pseudocode for an unbounded array class. Growing and shrinking is performed using
the same utility procedure reallocate.
Amortized Analysis of Unbounded Arrays
Our implementation of unbounded arrays follows the algorithm design principle make
the common case fast. Array access with [] is as fast as for bounded arrays. Intuitively, pushBack and popBack should usually be fast we just have to update n.
However, a single insertion into a large array might incur a cost of n. Exercise 3.2
asks you to give an example, where almost every call of pushBack and popBack is
expensive if we make a seemingly harmless change in the algorithm. We now show
that such a situation cannot happen for our implementation. Although some isolated
procedure calls might be expensive, they are always rare, regardless of what sequence
of operations we execute.
Lemma 3.1 Consider an unbounded array u that is initially empty. Any sequence
= h1 , . . . , m i of pushBack or popBack operations on u is executed in time O (m).
If we divide the total cost for the operations in by the number of operations,
we get a constant. Hence, it makes sense to attribute constant cost to each operation.
Such costs are called amortized costs. The usage of the term amortized is similar to
its general usage, but it avoids some common pitfalls. I am going to cycle to work
every day from now on and hence it is justified to buy a luxury bike. The cost per
ride is very small this investment will amortize Does this kind of reasoning sound
familiar to you? The bike is bought, it rains, and all good intentions are gone. The
bike has not amortized.
In computer science we insist on amortization. We are free to assign arbitrary
amortized costs to operations but they are only correct if the sum of the amortized
costs over any sequence of operations is never below the actual cost. Using the notion
of amortized costs, we can reformulate Lemma 3.1 more elegantly to allow direct
comparisons with other data structures.
Corollary 3.2 Unbounded arrays implement the operation [] in worst case constant
time and the operations pushBack and popBack in amortized constant time.
To prove Lemma 3.1, we use the accounting method. Most of us have already used
this approach because it is the basic idea behind an insurance. For example, when you
rent a car, in most cases you also have to buy an insurance that covers the ruinous costs
you could incur by causing an accident. Similarly, we force all calls to pushBack and
42
return n
// Example for n = w = 4:
// b 0 1 2 3
// b 0 1 2 3
// b 0 1 2 3 e
// b 0 1 2 3 e
Procedure reallocate(w0 : )
// Example for w = 4, w0 = 8:
0
w := w
// b 0 1 2 3
b0 := allocate Array [0..w 1] of Element
// b0
(b0 [0], . . . , b0 [n 1]) := (b[0], . . . , b[n 1])
// b0 0 1 2 3
dispose b
// b 0 1 2 3
0
b := b
// pointer assignment b 0 1 2 3
43
popBack to buy an insurance against a possible call of reallocate. The cost of the
insurance is put on an account. If a reallocate should actually become necessary, the
responsible call to pushBack or popBack does not need to pay but it is allowed to use
previous deposits on the insurance account. What remains to be shown is that the
account will always be large enough to cover all possible costs.
Proof: Let m0 denote the total number of elements copied in calls of reallocate. The
total cost incurred by calls in the operation sequence is O (m + m0). Hence, it suffices
to show that m0 = O (m). Our unit of cost is now the cost of one element copy.
For = 2 and = 4, we require an insurance of 3 units from each call of pushBack
and claim that this suffices to pay for all calls of reallocate by both pushBack and
popBack. (Exercise 3.4 asks you to prove that for general and = 2 an insurance
of +1
1 units is sufficient.)
We prove by induction over the calls of reallocate that immediately after the call
there are at least n units left on the insurance account.
First call of reallocate: The first call grows w from 1 to 2 after at least one[two?] =
call of pushBack. We have n = 1 and 3 1 = 2 > 1 units left on the insurance account.
For the induction step we prove that 2n units are on the account immediately before
the current call to reallocate. Only n elements are copied leaving n units on the account
enough to maintain our invariant. The two cases in which reallocate may be called
are analyzed separately.
pushBack grows the array: The number of elements n has doubled since the last
reallocate when at least n/2 units were left on the account by the induction hypothesis.[forgot the case where the last reallocate was a shrink.] The n/2 new ele- =
ments paid 3n/2 units giving a total of 2n units.
popBack shrinks the array: The number of elements has halved since the last
reallocate when at least 2n units were left on the account by the induction hypothesis.
Exercises
Exercise 3.1 (Amortized analysis of binary counters.) Consider a nonnegative integer c represented by an array of binary digits and a sequence of m increment and
decrement operations. Initially, c = 0.
a) What is the worst case execution time of an increment or a decrement as a
function of m? Assume that you can only work at one bit per step.
44
45
Exercise 3.8 (Sparse arrays) Implement a bounded array with constant time for allocating the array and constant amortized time for operation []. As in the previous
exercise, a read access to an array entry that was never written should return . Note
that you cannot make any assumptions on the contents of a freshly allocated array.
Hint: Never explicitly write default values into entries with undefined value. Store
the elements that actually have a nondefault value in arbitrary order in a separate data
structure. Each entry in this data structure also stores its position in the array. The
array itself only stores references to this secondary data structure. The main issue is
now how to find out whether an array element has ever been written.
Exercise 3.2 Your manager asks you whether it is not too wasteful to shrink an array
only when already three fourths of b are unused. He proposes to shrink it already when
w = n/2. Convince him that this is a bad idea by giving a sequence of m pushBack and
popBack operations that would need time m2 if his proposal were implemented.
Exercise 3.4 (General space time tradeoff) Generalize the proof of Lemma 3.1 for
general and = 2 . Show that an insurance of +1
1 units paid by calls of pushBack
suffices to pay for all calls of reallocate.
*Exercise 3.5 We have not justified the relation = 2 in our analysis. Prove that
any other choice of leads to higher insurance costs for calls of pushBack. Is = 2
still optimal if we also require an insurance from popBack? (Assume that we now
want to minimize the maximum insurance of any operation.)
Exercise 3.6 (Worst case constant access time) Suppose for a real time application
you need an unbounded array data structure with worst case constant execution time
for all operations. Design such a data structure. Hint: Store the elements in up to
two arrays. Start moving elements to a larger array well before the small array is
completely exhausted.
Exercise 3.7 (Implicitly growing arrays) Implement an unbounded array where the
operation [i] allows any positive index. When i n, the array is implicitly grown to
size n = i + 1. When n w, the array is reallocated as for UArray. Initialize entries
that have never been written with some default value .
46
47
b0prev := a0
//
Y
t
a
b
t0
// insert ha, . . . , bi after t
R
-
t 0 := tnext
//
Y
bnext := t
aprev := t
//
//
Y
tnext := a
t 0prev := b
//
//
-
R
-
-
it after some target item. The target can be either in the same list or in a different list
but it must not be inside the sublist.
Since splice never changes the number of items in the system, we assume that there
is one special list freeList that keeps a supply of unused elements. When inserting
new elements into a list, we take the necessary items from freeList and when deleting
elements we return the corresponding items to freeList. The function checkFreeList
allocates memory for new items when necessary. We defer its implementation to
Exercise 3.11 and a short discussion in Section 3.5.
It remains to decide how to simulate the start and end of a list. The class List in
Figure 3.3 introduces a dummy item h that does not store any element but seperates
the first element from the last element in the cycle formed by the list. By definition
of Item, h points to the first proper item as a successor and to the last item as a
predecessor. In addition, a handle head pointing to h can be used to encode a position
before the first element or after the last element. Note that there are n + 1 possible
positions for inserting an element into an list with n elements so that an additional
item is hard to circumvent if we want to code handles as pointers to items.
With these conventions in place, a large number of useful operations can be implemented as one line functions that all run in constant time. Thanks to the power of
splice, we can even manipulate arbitrarily long sublists in constant time. Figure 3.3
gives typical examples.
The dummy header can also be useful for other operations. For example consider
the following code for finding the next occurence of x starting at item from. If x is not
present, head should be returned.
Function findNext(x : Element; from : Handle) : Handle
h.e = x
// Sentinel
x
while frome6= x do
from := fromnext
return from
-
48
: Item
// empty list hi with dummy item onlyh= this
this
// Simple access functions
Function head() : Handle; return address of h// Pos. before any proper element
Function isEmpty : {0, 1}; return h.next = this
Function first : Handle; assert isEmpty; return h.next
Function last : Handle; assert isEmpty; return h.prev
// hi?
// h. . . , a, b, . . .i 7 h. . . , a, e, b, . . .i
Function insertAfter(x : Element; a : Handle) : Handle
checkFreeList
// make sure freeList is nonempty. See also Exercise 3.11
a0 := freeList.first
moveAfter(a0 , a)
ae=x
return a
Function insertBefore(x : Element; b : Handle) : Handle return insertAfter(e, pred(b))
Procedure pushFront(x : Element) insertAfter(x, head)
Procedure pushBack(x : Element) insertAfter(x, last)
// Manipulations of entire lists
// (ha, . . . , bi, hc, . . . , di) 7 (ha, . . . , b, c, . . . , di, hi)
Procedure concat(o : List)
splice(o.first, o.last, head)
//
-
7
49
We use the header as a sentinel. A sentinel is a dummy element in a data structure that
makes sure that some loop will terminate. By storing the key we are looking for in the
header, we make sure that the search terminates even if x is originally not present in
the list. This trick saves us an additional test in each iteration whether the end of the
list is reached.
Maintaining the Size of a List
In our simple list data type it not possible to find out the number of elements in constant time. This can be fixed by introducing a member variable size that is updated
whenever the number of elements changes. Operations that affect several lists now
need to know about the lists involved even if low level functions like splice would
only need handles of the items involved. For example, consider the following code for
moving an element from one list to another:
// (h. . . , a, b, c . . .i, h. . . , a0 , c0 , . . .i) 7 (h. . . , a, c . . .i, h. . . , a0 , b, c0 , . . .i)
Procedure moveAfter(b, a0 : Handle; o : List)
splice(b,b,a0)
size
o.size++
// ha, . . . , bi 7 hi
Procedure makeEmpty
freeList.concat(this )
Interfaces of list data types should require this information even if size is not maintained so that the data type remains interchangable with other implementations.
A more serious problem is that operations that move around sublists beween lists
cannot be implemented in constant time any more if size is to be maintained. Exercise 3.15 proposes a compromise.
50
51
We can adopt the implementation approach from doubly linked lists. SItems form
collections of cycles and an SList has a dummy SItem h that precedes the first proper
element and is the successor of the last proper element. Many operations of Lists can
still be performed if we change the interface. For example, the following implementation of splice needs the predecessor of the first element of the sublist to be moved.
Exercise 3.14 findNext using sentinels is faster than an implementation that checks
for the end of the list in each iteration. But how much faster? What speed difference do
you predict for many searches in a small list with 100 elements, or for a large list with
10 000 000 elements respectively? Why is the relative speed difference dependent on
the size of the list?
Similarly, findNext should not return the handle of the SItem with the next fit but
its predecessor. This way it remains possible to remove the element found. A useful
addition to SList is a pointer to the last element because then we can support pushBack
in constant time. We leave the details of an implementation of singly linked lists to
= Exercise 3.17. [move some exercises]
Exercises
Exercise 3.9 Prove formally that items of doubly linked lists fulfilling the invariant
next prev = prev next = this form a collection of cyclic chains.
Exercise 3.10 Implement a procudure swap similar to splice that swaps two sublists
in constant time
(h. . . , a0 , a, . . . , b, b0 , . . .i, h. . . , c0 , c, . . . , d, d 0 , . . .i) 7
(h. . . , a0 , c, . . . , d, b0 , . . .i, h. . . , c0 , a, . . . , b, d 0 , . . .i) .
Can you view splice as a special case of swap?
Exercise 3.11 (Memory mangagement for lists) Implement the function checkFreelist
called by insertAfter in Figure 3.3. Since an individual call of the programming
language primitive allocate for every single item might be too slow, your function
should allocate space for items in large batches. The worst case execution time of
checkFreeList should be independent of the batch size. Hint: In addition to freeList
use a small array of free items.
Exercise 3.15 Design a list data type that allows sublists to be moved between lists
in constant time and allows constant time access to size whenever sublist operations
have not been used since the last access to the list size. When sublist operations have
been used size is only recomputed when needed.
Exercise 3.16 Explain how the operations remove, insertAfter, and concat have to be
modified to keep track of the length of a List.
Exercise 3.17 Implement classes SHandle, SItem, and SList for singly linked lists in
analogy to Handle, Item, and List. Support all functions that can be implemented
to run in constant time. Operations head, first, last, isEmpty, popFront, pushFront,
pushBack, insertAfter, concat, and makeEmpty should have the same interface as
before. Operations moveAfter, moveToFront, moveToBack, remove, popFront, and
findNext need different interfaces.
Sequences are often used in a rather limited way. Let us again start with examples
from precomputer days. Sometimes a clerk tends to work in the following way: There
is a stack of files on his desk that he should work on. New files are dumped on the top
of the stack. When he processes the next file he also takes it from the top of the stack.
The easy handling of this data structure justifies its use as long as no time-critical
jobs are forgotten. In the terminology of the preceeding sections, a stack is a sequence
that only supports the operations pushBack, popBack, and last. In the following we
will use the simplified names push, pop, and top for these three operations on stacks.
Exercise 3.12 Give a constant time implementation for rotating a list right: ha, . . . , b, ci 7
hc, a, . . . , bi. Generalize your algorithm to rotate sequence ha, . . . , b, c, . . . , di to hc, . . . , d, a, . . . , bi We get a different bahavior when people stand in line waiting for service at a
in constant time.
post office. New customers join the line at one end and leave it at the other end.
52
stack
...
53
port worst case constant access time and are very space efficient. Exercise 3.20 asks
you to design stacks and queues that even work if the data will not fit in main memory.
It goes without saying that all implementations of stacks and queues described here
can easily be augmented to support size in constant time.
FIFO queue
...
deque
...
popFront pushFront
pushBack popBack
n0
h
Such sequences are called FIFO queues (First In First Out) or simply queues. In the
terminology of the List class, FIFO queues only use the operations first, pushBack and
popFront.
The more general deque1, or double ended queue, that allows operations first, last,
pushFront, pushBack, popFront and popBack might also be observed at a post office,
when some not so nice guy jumps the line, or when the clerk at the counter gives
priority to a pregnant woman at the end of the queue. Figure 3.4 gives an overview of
the access patterns of stacks, queues and deques.
Why should we care about these specialized types of sequences if List can implement them all? There are at least three reasons. A program gets more readable and
easier to debug if special usage patterns of data structures are made explicit. Simple
interfaces also allow a wider range of implementations. In particular, the simplicity
of stacks and queues allows for specialized implementions that are more space efficient than general Lists. We will elaborate this algorithmic aspect in the remainder of
this section. In particular, we will strive for implementations based on arrays rather
than lists. Array implementations may also be significantly faster for large sequences
because sequential access patterns to stacks and queues translate into good reuse of
cache blocks for arrays. In contrast, for linked lists it can happen that each item access
= causes a cache miss.[klar? Verweis?]
Bounded stacks, where we know the maximal size in advance can easily be implemented with bounded arrays. For unbounded stacks we can use unbounded arrays.
Stacks based on singly linked lists are also easy once we have understood that we can
use pushFront, popFront, and first to implement push, pop, and top respectively.
Exercise 3.19 gives hints how to implement unbounded stacks and queues that sup1 Deque
; return (t h + n + 1) mod (n + 1)
54
Operator [i :
Exercises
Exercise 3.18 (The towers of Hanoi) In the great temple of Brahma in Benares, on
a brass plate under the dome that marks the center of the world, there are 64 disks of
pure gold that the priests carry one at a time between these diamond needles according
to Brahmas immutable law: No disk may be placed on a smaller disk. In the beginning
of the world, all 64 disks formed the Tower of Brahma on one needle. Now, however,
the process of transfer of the tower from one needle to another is in mid course. When
the last disk is finally in place, once again forming the Tower of Brahma but on a
different needle, then will come the end of the world and all will turn to dust. [42]. 3
Describe the problem formally for any number k of disks. Write a program that
uses three stacks for the poles and produces a sequence of stack operations that transform the state (hk, . . . , 1i, hi, hi) into the state (hi, hi, hk, . . . , 1i).
Exercise 3.19 (Lists of arrays) Here we want to develop a simple data structure for
stacks, FIFO queues, and deques that combines all the advantages of lists and unbounded arrays and is more space efficient for large queues than either of them. Use a
list (doubly linked for deques) where each item stores an array of K elements for some
large constant K. Implement such a data structure in your favorite programming language. Compare space consumption and execution time to linked lists and unbounded
arrays for large stacks and some random sequence of pushes and pops
Exercise 3.20 (External memory stacks and queues) Design a stack data structure
that needs O (1/B) I/Os per operation in the I/O model from Section ??. It suffices to
keep two blocks in internal memory. What can happen in a naive implementaiton with
only one block in memory? Adapt your data structure to implement FIFOs, again
using two blocks of internal buffer memory. Implement deques using four buffer
blocks.
Exercise 3.21 Explain how to implement a FIFO queue using two stacks so that each
FIFO operations takes amortized constant time.
fact, this mathematical puzzle was invented by the French mathematician Edouard Lucas in 1883.
55
Table 3.1: Running times of operations on sequences with n elements. Entries have
an implicit O () around them.
Operation
List SList UArray CArray explanation of
[]
n
n
1
1
||
1
1
1
1
not with inter-list splice
1
1
1
1
first
last
1
1
1
1
insert
1
1
n
n
insertAfter only
remove
1
1
n
n
removeAfter only
pushBack
1
1
1
1
amortized
1
1
n
1
amortized
pushFront
popBack
1
n
1
1
amortized
popFront
1
1
n
1
amortized
1
1
n
n
concat
splice
1
1
n
n
findNext,. . .
n
n
n
n
cache efficient
56
57
for unbounded stacks, and FIFO queues implemented via linked lists. It also offers
bounded variants that are implemented as arrays.
Iterators are a central concept of the C++ standard library that implement our abstract view of sequences independent of the particular representation.[Steven: more?] =
Java
The util package of the Java 2 platform provides Vector for unbounded arrays, LinkedList
for doubly linked lists, and Stack for stacks. There is a quite elaborate hierarchy of
abstractions for sequence like data types.[more?]
=
Many Java books proudly announce that Java has no pointers so that you might
wonder how to implement linked lists. The solution is that object references in Java are
essentially pointers. In a sense, Java has only pointers, because members of nonsimple
type are always references, and are never stored in the parent object itself.
Explicit memory management is optional in Java, since it provides garbage collections of all objects that are not needed any more.
Java does not allow the specification of container classes like lists and arrays for
a particular class Element. Rather, containers always contain Objects and the application program is responsible for performing appropriate casts. Java extensions for
better support of generic programming are currently a matter of intensive debate.[Im
Auge behalten]
=
58
59
Chapter 4
Hash Tables
[Cannabis Blatt als Titelbild?]
=
If you want to get a book from the central library of the University of Karlsruhe,
you have to order the book an hour in advance. The library personnel take the book
from the magazine. You can pick it up in a room with many shelves. You find your
book in a shelf numbered with the last digits of your library card. Why the last digits
and not the leading digits? Probably because this distributes the books more evenly
about the shelves. For example, many first year students are likely to have the same
leading digits in their card number. If they all try the system at the same time, many
books would have to be crammed into a single shelf.
The subject of this chapter is the robust and efficient implementation of the above
delivery shelf data structure known in computer science as a hash table. The definition of to hash that best expresses what is meant is to bring into complete disorder.
Elements of a set are intentionally stored in disorder to make them easier to find. Although this sounds paradoxical, we will see that it can make perfect sense. Hash table
accesses are among the most time critical parts in many computer programs. For example, scripting languages like awk [2] or perl [95] use hash tables as their only
data structures. They use them in the form of associative arrays that allow arrays to
be used with any kind of index type, e.g., strings. Compilers use hash tables for their
symbol table that associates identifiers with information about them. Combinatorial
search programs often use hash tables to avoid looking at the same situation multiple
times. For example, chess programs use them to avoid evaluating a position twice that
can be reached by different sequences of moves. One of the most widely used implementations of the join[ref some database book] operation in relational databases =
temporarily stores one of the participating relations in a hash table. (Exercise 4.4
gives a special case example.) Hash tables can often be used to replace applications
60
Hash Tables
of sorting (Chapter 5) or of search trees (Chapter 7). When this is possible, one often
gets a significant speedup.
Put more formally, hash tables store a set M of elements where each element e has
a unique key key(e). To simplify notation, we extend operation on keys to operations
on elements. For example, the comparison e = k is a abbreviation for key(e) = k. The
following dictionary operations are supported: [einheitliche Def. der Ops. Bereits
= in intro? move from summary?]
M.insert(e : Element): M := M {e}
M.remove(k : Key): M := M \ {e} where e is the unique element with e = k, i.e., we
assume that key is a one-to-one function.
M.find(k : Key) If there is an e M with e = k return e otherwise return a special
element .
In addition, we assume a mechanism that allows us to retrieve all elements in M. Since
this forall operation is usually easy to implement and is not severely affected by the
details of the implementation, we only discuss it in the exercises.
In the library example, Keys are the library card numbers and elements are the
book orders. Another pre-computer example is an English-German dictionary. The
keys are English words and an element is an English word together with its German
translations. There are many ways to implement dictionaries (for example using the
search trees discussed in Chapter 7). Here, we will concentrate on particularly efficient implementations using hash tables.
The basic idea behind a hash table is to map keys to m array positions using a hash
function h : Key 0..m 1. In the library example, h is the function extracting the
least significant digits from a card number. Ideally, we would like to store element
e in a table entry t[h(e)]. If this works, we get constant execution time for all three
operations insert, remove, and find.1
Unfortunately, storing e in t[h(e)] will not always work as several elements might
collide, i.e., they might map to the same table entry. A simple fix of this problem
in the library example allows several book orders to go to the same shelf. Then the
entire shelf has to be searched to find a particular order. In a computer we can store
a sequence of elements in each table entry. Because the sequences in each table entry are usually implemented using singly linked lists, this hashing scheme is known
as hashing with chaining. Section 4.1 analyzes hashing with chaining using rather
1 Strictly speaking, we have to add additional terms to the execution time for moving elements and for
evaluating the hash function. To simplify notation, we assume in this chapter that all this takes constant
time. The execution time in the more detailed model can be recovered by adding the time for one hash
function evaluation and for a constant number of element moves to insert and remove.
Hash Tables
61
optimistic assumptions about the properties of the hash function. In this model, we
achieve constant expected time for dictionary operations.
In Section 4.2 we drop these assumptions and explain how to construct hash functions that come with (probabilistic) performance guarantees. Note that our simple examples already show that finding good hash functions is nontrivial. For example, if we
applied the least significant digit idea from the library example to an English-German
dictionary, we might come up with a hash function based on the last four letters of a
word. But then we would have lots of collisions for words ending on tion, able,
etc.
We can simplify hash tables (but not their analysis) by returning to the original
idea of storing all elementsn in the table itself. When a newly inserted element e finds
entry t[h(x)] occupied, it scans the table until a free entry is found. In the library
example, this would happen if the shelves were too small to hold more than one book.
The librarians would then use the adjacent shelves to store books that map to the same
delivery shelf. Section 4.3 elaborates on this idea, which is known as hashing with
open addressing and linear probing.
Exercise 4.1 (Scanning a hash table.) Implement the forall operation for your favorite hash table data type. The total execution time for accessing all elements should
be linear in the size of the hash table. Your implementation should hide the representation of the hash table from the application. Here is one possible interface: Implement
an iterator class that offers a view on the hash table as a sequence. It suffices to
support initialization of the iterator, access to the current element and advancing the
iterator to the next element.
Exercise 4.2 Assume you are given a set M of pairs of integers. M defines a binary
relation RM . Give an algorithm that checks in expected time O (|M|) whether RM is
symmetric. (A relation is symmetric if (a, b) M : (b, a) M.) Space consumption
should be O (|M|).
Exercise 4.3 Write a program that reads a text file and outputs the 100 most frequent
words in the text. Expected execution time should be linear in the size of the file.
Exercise 4.4 (A billing system:) Companies like Akamai2 deliver millions and millions of files every day from their thousands of servers. These files are delivered for
other E-commerce companies who pay for each delivery (fractions of a cent). The
servers produce log files with one line for each access. Each line contains information
that can be used to deduce the price of the transaction and the customer ID. Explain
2 http://www.akamai.com/index_flash.html
62
Hash Tables
63
The proof is very simple once we know the probabilistic concepts of random variables,
their expectation, and the linearity of expectation described in Appendix ??.
Proof: Consider the execution time of remove or find for a key k. Both need constant
time plus the time for scanning the sequence t[h(k)]. Hence, even if this sequence
has to be scanned completely, the expected execution time is O (1 + E[X]) where the
random variable X stands for the length of sequence t[h(k)]. Let e i denote element i in
the table. Let X1 ,. . . ,Xn denote indicator random variables where Xi = 1 if h(ei ) = h(k)
and otherwise Xi = 0. We have X = ni=1 Xi . Using the linearity of expectation, we get
n
i=1
i=1
i=1
We can achieve linear space requirements and constant expected execution time
of all three operations if we keep m = (n) at all times. This can be achieved using
adaptive reallocation analogous to the unbounded arrays described in Section 3.1.
Exercise 4.5 (Unbounded Hash Tables) Explain how to implement hashing with chaining in such a way that m = (n) at all times. Assume that there is a hash function
h0 : Key and that you set h(k) = h0 (k) mod m.
Exercise 4.6 (Waste of space) Waste of space in hashing with chaining is due to
empty table entries. Assuming a random hash function, compute the expected number of empty table entries as a function of m and n. Hint: Define indicator random
variables Y0 , . . . , Ym1 where Yi = 1 if t[i] is empty.
64
Hash Tables
Definition 4.2 A family H {0..m 1}Key of functions from keys to table entries is
c-universal if for all x, y in Key with x 6= y and random h H ,
c
.
prob(h(x) = h(y))
m
For c-universal families of hash functions, we can easily generalize the proof of
Theorem 4.1 for fully random hash functions.
Theorem 4.3 If n elements are stored in a hash table with m entries using hashing
with chaining, the expected execution time of remove or find is O (1 + cn/m) if we
assume a hash function taken randomly from a c-universal family.
Now it remains to find c-universal families of hash functions that are easy to compute. We explain a simple and quite practical 1-universal family in detail and give
further examples in the exercises. In particular, Exercise 4.8 gives an family that
is perhaps even easier to understand since it uses only simple bit operations. Exercise 4.10 introduces a simple family that is fast for small keys and gives an example
where we have c-universality only for c > 1.
Assume that the table size m is a prime number. Set w = blogmc and write keys as
bit strings. Subdivide each bit string into pieces of at most w bits each, i.e., view keys
as k-tuples of integers between 0 and 2w 1. For a = (a1 , . . . , ak ) define
ha (x) = a x mod m
where a x = ki=1 ai xi denotes the scalar product.
Theorem 4.4
n
o
H = ha : a {0..m 1}k
In other words, we get a good hash function if we compute the scalar product between
a tuple representation of a key and a random vector.[bild dotprod.eps. redraw in
= latex?]
Proof: Consider two distinct keys x = (x1 , . . . , xk ) and y = (y1 , . . . , yk ). To determine
prob(ha (x) = ha (y)), we count the number of choices for a such that ha (x) = ha (y).
Fix an index j such that x j 6= y j . We claim that for each choice of the ai s with
i 6= j there is exacly one choice of a j such that ha (x) = ha (y). To show this, we solve
the equation for a j :
ha (x) = ha (y)
65
a i xi
1ik
a j (y j x j )
aj
( mod m)
a i yi
1ik
i6= j,1ik
ai (xi yi )
(y j x j )1
i6= j,1ik
( mod m)
ai (xi yi )
( mod m)
mk1
1
= .
mk
m
Is it a serious restriction that we need prime table sizes? On the first glance, yes.
We cannot burden users with the requirement to specify prime numbers for m. When
we adaptively grow or shrink an array, it is also not obvious how to get prime numbers
for the new value of m. But there is a simple way out. Number theory tells us that there
is a prime number just around the corner [41]. On the average it suffices to add an extra
O (logm) entries to the table to get a prime m. Furthermore, we can afford to look for
m using a simple brute force
approach: Check the desired m for primality by trying to
divide by all integers in 2.. b mc. If a divisor is found,
increment m. Repeat until a
prime number is found. This takes average time O ( m logm) much less than the
time needed to initialize the table with or to move elements at a reallocation.
Exercise 4.7 (Strings as keys.) Implement the universal family H for strings of 8
bit characters. You can assume that the table size is at least m = 257. The time for
evaluating a hash function should be proportional to the length of the string being
processed. Input strings may have arbitrary lengths not known in advance. Hint:
compute the random vector a lazily, extending it only when needed.
Exercise 4.8 (Hashing using bit matrix multiplication.) [Literatur? Martin fragen]=
For table size m = 2w and Key = {0, 1}k consider the family of hash functions
o
n
H = hM : M {0, 1}wk
66
Hash Tables
where hM (x) = Mx computes a matrix product using arithmetics mod 2. The resulting bit-vector is interpreted as an integer from 0..2w 1. Note that multiplication
mod 2 is the logical and-operation, and that addition mod 2 is the logical exclusiveor operation .
a) Explain how hM (x) can be evaluated using k bit-parallel exclusive-or operations.
b) Explain how hM (x) can be evaluated using w bit-parallel and operations and w
parity operations. Many machines support a machine instruction parity(y) that
is one if the number of one bits in y is odd and zero otherwise.
c) We now want to show that H is 1-universal. As a first step show that for any
two keys x 6= y, any bit position j where x and y differ, and any choice of the
columns Mi of the matrix with i 6= j, there is exactly one choice of column M j
such that hM (x) = hM (y).
with ha (x) = (ax mod 2k ) 2k` is 2-universal. (Due to Keller and Abolhassan.)
Exercise 4.13 (Table lookup made practical.) Let m = 2w and view keys as k + 1tuples where the 0-th elements is a w-bit number and the remaining elements are a-bit
numbers for some small constant a. The idea is to replace the single lookup in a huge
table from Section 4.1 by k lookups in smaller tables of size 2a . Show that
o
n
H [] = h(t1 ,...,tk ) : ti {0..m 1}{0..w1}
where
that generalizes class H by using arithmetics mod p for some prime number p.
Show that H is 1-universal. Explain how H is also a special case of H .
Exercise 4.10 (Simple linear hash functions.) Assume Key = 0..p1 for some prime
number p. Show that the following family of hash functions is (d|Key|/me/(|Key|/m)) 2 universal.
H = h(a,b) : a, b 0..p 1
where h(a,b) (x) = ax + b mod p mod m.
with h(a,b) (x) = ax + b mod m. Show that there is a set of b|Key|/mc keys M such that
x, y M : h(a,b) H fool : h(a,b)(x) = h(a,b) (y)
even if m is prime.
67
k
M
ti [xi ] .
i=1
is 1-universal.
68
Hash Tables
t = [. . . , x , y, z, . . .] .
h(z)
// no overflow
// not found
69
What happens if we reach the end of the table during insertion? We choose a
very simple fix by allocating m0 table entries to the right of the largest index produced
by the hash function h. For benign hash functions it should be sufficient to choose
m0 much smaller than m in order to avoid table overflows. Exercise 4.14 asks you
to develop a more robust albeit slightly slower variant where the table is treated as a
cyclic array.
A more serious question is how remove should be implemented. After finding the
element, we might be tempted to replace it with . But this could violate the invariant
for elements further to the right. Consider the example
When we naively remove element y and subsequently try to find z we would stumble
over the hole left by removing y and think that z is not present. Therefore, most
other variants of linear probing disallow remove or mark removed elements so that a
subsequent find will not stop there. The problem with marking is that the number of
nonempty cells (occupied or marked) keeps increasing, so searches eventually become
very slow. This can only be mitigated by introducing the additional complication of
periodic reorganizations of the table.
Here we give a different solution that is faster and needs no reorganizations [57,
Algorithm R][check]. The idea is rather obviouswhen an invariant may be violated, =
reestablish it. Let i denote the index of the deleted element. We scan entries t[ j] to
the right of i to check for violations of the invariant. If h(t[ j]) > i the invariant still
holds even if t[i] is emptied. If h(t[k]) i we can move t[ j] to t[i] without violating
the invariant for the moved element. Now we can pretend that we want to remove the
duplicate copy at t[ j] instead of t[i], i.e., we set i := j and continue scanning. Figure ??
depicts this case[todo???]. We can stop scanning, when t[ j] = because elements =
to the right that violate the invariant would have violated it even before the deletion.
Exercise 4.14 (Cyclic linear probing.) Implement a variant of linear probing where
the table size is m rather than m + m0 . To avoid overflow at the right end of the array,
make probing wrap around.
Adapt insert and delete by replacing increments with i := i + 1 mod m.
Specify a predicate between(i, j, k) that is true if and only if j is cyclically between i and j.
Reformulate the invariant using between.
Adapt remove.
70
Hash Tables
Exercise 4.15 (Unbounded linear probing.) Implement unbounded hash tables using linear probing and universal hash functions. Pick a new random hash function
whenever the table is reallocated. Let 1 < < < denote constants we are free to
choose. Keep track of the number of stored elements n. Grow the table to m = n if
n > m/. Shrink the table to m = n if n < m/. If you do not use cyclic probing as
in Exercise 4.14, set m0 = m for some < 1 and reallocate the table if the right end
should overflow.
71
bit representation of many data types and allow a reusable dictionary data type that
hides the existence of hash functions from the user. However, one usually wants at
least an option of a user specified hash function. Also note that many families of hash
functions have to be adapted to the size of the table so that need some internal state
that is changed when the table is resized or when access times get too large.
Most applications seem to use simple very fast hash functions based on xor, shifting, and table lookups rather than universal hash functions 3. Although these functions
seem to work well in practice, we are not so sure whether universal hashing is really
slower. In particular, family H [] from Exercise 4.13 should be quite good for integer
keys and Exercise 4.7 formulates a good function for strings. It might be possible to
implement the latter function particularly fast using the SIMD-instructions in modern
processors that allow the parallel execution of several small precision operations.
example http://burtleburtle.net/bob/hash/evahash.html
72
Hash Tables
C++
The current C++ standard library does not define a hash table data type but the popular implementation by SGI (http://www.sgi.com/tech/stl/) offers several
variants: hashs et, hashm ap, hashm ultiset, hashm ultimap. Here set stands for the kind
of interfaces used in this chapter wheras a map is an associative array indexed Keys.
The term multi stands for data types that allow multiple elements with the same
key. Hash functions are implemented as function objects, i.e., the class hash<T>
overloads the operator () so that an object can be used like a function. The reason
for this charade is that it allows the hash function to store internal state like random
coefficients.
LEDA offers several hashing based implementations of dictionaries. The class
ha rrayhKey, Elementi implements an associative array assuming that a hash function
intHash(Key&) is defined by the user and returns an integer value that is then mapped
to a table index by LEDA. The implementation grows adaptively using hashing with
= chaining.[check. Kurt, was ist der Unterschied zu map?]
Java
The class java.util.hashtable implements unbounded hash tables using the function
hashCode defined in class Object as a hash function.
*Exercise 4.16 (Associative arrays.) Implement a C++-class for associative arrays.
Support operator[] for any index type that supports a hash function. Make sure
that H[x]=... works as expected if x is the key of a new element.
= of a table entry is ( m). [reference? Martin und Rasmus fragen], i.e., there
can be be some elements where access takes much longer than constant time. This is
undesirable for real time applications and for parallel algorithms where many processors work together and have to wait for each other to make progress. Dietzfelbinger
73
and Meyer auf der Heide [31][check ref] give a family of hash functions that [which =
bound, outline trick.]. [m vs n dependence?]
=
Another key property of hash functions is in what sense sets of elements are hashed =
independently. A family H {0..m 1}Key is k-way independent if prob(h(x1 ) =
a1 h(xk ) = ak ) = mk for any set of k keys and k hash function values. One
application of higher independence is the reduction of the maximum
occupancy be
cause k-wise independence implies maximum occupancy O m1/k [check bound][?]. =
Below we will see an application requiring O (logm)-wise independence. A simple
k-wise independent family of hash functions are polynomials of degree k 1 with
random coefficients [fill in details][?].
=
[strongly universal hashing]
=
[cryptographic hash functions]
=
Many hash functions used in practice are not universal and one can usually construct input where they behave very badly. But empirical tests indicate that some of
them are quite robust in practice.[hash function studies http://www.cs.amherst.edu/=
ccm/challenge5/]
It is an interesting question whether universal families can completely replace hash
functions without theoretical performance guarantees. There seem to be two main
counterarguments. One is that not all programmers have the mathematical to background to implement universal families themselves. However, software libraries can
hide the complications from the everyday user. Another argument is speed but it seems
that for most applications there are very fast universal families. For example, if |Key|
is not too large, the families from Exercise 4.12 and 4.13 seem to be hard to beat in
terms of speed.
74
Hash Tables
75
Chapter 5
A telephone directory book is alphabetically sorted by last name because this makes
it very easy to find an entry even in a huge city. A naive view on this chapter could
be that it tells us how to make telophone books. An early example of even more
massive data processing were the statistical evaluation of census data. 1500 people
needed seven years to manually process the US census in 1880. The engineer Herman
Hollerith1 who participated in this evaluation as a statistician, spend much of the ten
years to the next census developing counting and sorting machines (the small machine
in the left picture) for mechanizing this gigantic endeavor. Although the 1890 census
had to evaluate more people and more questions, the basic evaluation was finished in
1891. Holleriths company continued to play an important role in the development of
the information processing industry; since 1924 is is known as International Business
Machines (IBM). Sorting is important for census statistics because one often wants
1
DC.
The picuture to the right. Born February 29 1860, Buffalo NY; died November 17, 1929, Washington
76
to group people by an attribute, e.g., age and then do further processing for persons
which share an attribute. For example, a question might be how many people between
20 and 30 are living on farms. This question is relatively easy to answer if the database
entries (punch cards in 1890) are sorted by age. You take out the section for ages 20
to 30 and count the number of people fulfilling the condition to live on a farm.
Although we probably all have an intuitive feeling what sorting means, let us
look at a formal definition. To sort a sequence s = he1 , . . . , en i, we have to produce a sequence s0 = he01 , . . . , e0n i such that s0 is a permutation of s and such that
e01 e02 e0n . As in Chapter 4 we distinguish between an element e and its key
key(e) but extend the comparison operations between keys to elements so that e e 0
if and only if key(e) key(e0 ). Any key comparison relation can be used as long as it
defines a strict weak order, i.e., a reflexive, transitive, and antisymmetric with respect
to some equivalence relation . This all sounds a bit cryptic, but all you really need to
remember is that all elements must be comparable, i.e., a partial order will not do, and
for some elements we may not care how they are ordered. For example, two elements
may have the same key or may have decided that upper and lower case letters should
not be distinguished.
Although different comparison relations for the same data type may make sense,
the most frequent relations are the obvious order for numbers and the lexicographic
order (see Appendix A) for tuples, strings, or sequences.
Exercise 5.1 (Sorting words with accents.) Ordering strings is not always as obvious as it looks. For example, most European languages augment the Latin alphabet
with a number of accented characters. For comparisons, accents are essentially omitted. Only ties are broken so that MullMull etc.
a) Implement comparison operations for strings that follow these definitions and
can be used with your favorite library routine for sorting.2
b) In German telephone books (but not in Encyclopedias) a second way to sort
words is used. The umlauts a , o , and u are identified with their circumscription
a =ae, o =oe, and u =ue. Implement this comparison operation.
c) The comparison routines described above may be too slow for time critical applications. Outline how strings with accents can be sorted according to the
above rules using only plain lexicographic order within the sorting routines. In
particular, explain how tie breaking rules should be implemented.
Exercise 5.2 Define a total order for complex numbers where x y implies |x| |y|.
2
For West European languages like French, German, or Spanish you can assume the character set ISO
LATIN-1.
77
Sorting is even more important than it might seem since it is not only used to
produce sorted output for human consumption but, more importantly, as an ubiquitous
algorithmic tool:
Preprocessing for fast search: Not only humans can search a sorted directory faster
than an unsorted one. Although hashing might be a faster way to find elements, sorting
allows us additional types of operations like finding all elements which lie in a certain
range. We will discuss searching in more detail in Chapter 7.
Grouping: Often we want to bring equal elements together to count them, eliminate
duplicates, or otherwise process them. Again, hashing might be a faster alternative.
But sorting has advantages since we will see rather fast deterministic algorithms for it
that use very little space.
Spatial subdivision: Many divide-and-conquer algorithms first sort the inputs according to some criterion and then split the sorted input list.[ref to example? graham scan (but where? vielleicht als appetizer in further findings? or in einem
special topic chapter?)???]
=
Establishing additional invariants: Certain algorithms become very simple if the
inputs are processed in sorted order. Exercise 5.3 gives an example. Other examples
are Kruskals algorithm in Section 11.3, several of the algorithms for the knapsack
problem in Chapter 12, or the scheduling algorithm proposed in Exercise 12.7. You
may also want to remember sorting when you solve Exercise ?? on interval graphs.
Sorting has a very simple problem statement and in Section 5.1 we see correspondingly simple sorting algorithms. However, it is less easy to make these simple
approaches efficient. With mergesort, Section 5.2 introduces a simple divide-andconquer sorting algorithm that runs in time O (n logn). Section 5.3 establishes that
this bound is optimal for all comparison based algorithms, i.e., algorithms that treat
elements as black boxes that can only be compared and moved around. The quicksort
algorithm described in Section 5.4 it is also based on the divide-and-conquer principle and perhaps the most frequently used sorting algorithm. Quicksort is also a good
example for a randomized algorithm. The idea behind quicksort leads to a simple algorithm for a problem related to sorting. Section 5.5 explains how the k-th smallest
from n elements can be found in time O (n). Sorting can be made even faster than the
lower bound from Section 5.3 by looking into the bit pattern of the keys as explained
in Section 5.6. Finally, Section 5.7 generalizes quicksort and mergesort to very good
algorithms for sorting huge inputs that do not fit into internal memory.
Exercise 5.3 (A simple scheduling problem.) Assume you are a hotel manager who
has to consider n advance bookings of rooms for the next season. Your hotel has k
identical rooms. Bookings contain arrival date and departure date. You want to find
78
out whether there are enough rooms in your hotel to satisfy the demand. Design an
algorithm that solves this problem in time O (n logn). Hint: Set up a sequence of
events containing all arrival and departure dates. Process this list in sorted order.
Exercise 5.4 (Sorting with few different keys.) Design an algorithm that sorts n elements in O (k logk + n) expected time if there are only k different keys appearing in
the input. Hint: use universal hashing.
*Exercise 5.5 (Checking.) It is easy to check whether a sorting routine produces
sorted output. It is less easy to check whether the output is also a permutation of
the input. But here is a fast and simple Monte Carlo algorithm for integers: Show
that he1 , . . . , en i is a permutation of he01 , . . . , e0n i if and only if the polynomial identity
(z e1 ) (z en ) (z e01 ) (z e0n ) = 0 holds for all z. For any > 0 let p denote a
prime such that p > max{n/, e1 , . . . , en , e01 , . . . , e0n }. Now the idea is to evaluate the
above polynomial mod p for a random value z 0..p 1. Show that if he 1 , . . . , en i is
not a permutation of he01 , . . . , e0n i then the result of the evaluation is zero with probability at most . Hint: A nonzero polynomial of degree n has at most n zeroes.
79
i=2
i=1
(i 1) = n + i =
n(n + 1)
n(n 1)
n =
= n2
2
2
80
sorting. Furthermore, in some applications the input is already almost sorted and in
that case insertion sort can also be quite fast.
Exercise 5.7 (Almost sorted inputs.) Prove that insertion sort executes in time O (kn)
if for all elements ei of the input, |r(ei ) i| k where r defines the rank of ei (see Section 5.5 or Appendix A).
Exercise 5.8 (Average case analysis.) Assume the input to the insertion sort algorithm in Figure 5.1 is a permutation of the numbers 1,. . . ,n. Show that the average
execution time over all possible permutations is n2 . Hint: Argue formally that
about one third of the input elements in the right third of the array have to be moved
to the left third of the array. Using a more accurate analysis you can even show that
on the average n2 /4 O (n) iterations of the inner loop are needed.
Exercise 5.9 (Insertion sort with few comparisons.) Modify the inner loops of the
array based insertion sort algorithm from Figure 5.1 so that it needs only O (n logn)
= comparisons between elements. Hint: Use binary search[ref].
Exercise 5.10 (Efficient insertion sort?) Use the data structure for sorted sequences
from Chapter 7 to derive a variant of insertion sort that runs in time O (n logn).
Exercise 5.11 (Formal verification.) Insertion sort has a lot of places where one can
make errors. Use your favorite verification formalism, e.g. Hoare calculus, to prove
that insertion sort is correct. Do not forget to prove that the output is a permutation of
the input.
2718281
split
271
8281
split
2 71
82
81
split
81
7 1
8 2 8 1
merge
17
28
18
merge
127
1288
merge
1222788
c ab
1288
127
1 288
127
11 288
27
112 88
27
1122 88
7
11227 88
concat
1122788
// base case
a
b
a
b
a
b
a
b
82
Note that no allocation and deallocation of list items is needed. Each iteration of the
inner loop of merge performs one element comparison and moves one element to the
output. Each iteration takes constant time. Hence merging runs in linear time.
Theorem 5.1 Function merge applied to sequences of total length n executes in time
O (n) and performs at most n 1 element comparisons.
83
Exercise 5.15 Give an efficient array based implementation of mergesort in your favorite imperative programming language. Besides the input array, allocate one auxiliary array of size n at the beginning and then use these two arrays to store all intermediate results. Can you improve running time by switching to insertion sort for small
inputs? If so, what is the optimal switching point in your implementation?
Exercise 5.16 The way we describe merge, there are three comparisons for each loop
iteration one element comparison and two termination tests. Develop a variant
using sentinels that needs only one termination test. How can you do it without appending dummy elements to the sequences?
Exercise 5.17 Exercise 3.19 introduces a list-of-blocks representation for sequences.
Implement merging and mergesort for this data structure. In merging, reuse emptied
input blocks for the output sequence. Compare space and time efficiency of mergesort
for this data structure, plain linked lists, and arrays. (Constant factors matter.)
84
5.4 Quicksort
85
Theorem 5.3 Any comparison based sorting algorithm needs n logn O (n) comparisons in the worst case.
Using similar arguments, Theorem 5.3 can be strengthened further. The same
bound (just with different constants hidden in the linear term) applies on the average,
i.e., worst case sorting problems are not much more difficult than randomly permuted
inputs. Furthermore, the bound even applies if we only want to solve the seemingly
simpler problem of checking whether an element appears twice in a sequence.
Exercise 5.18 Exercise 4.3 asks you to count occurences in a file.
a) Argue that any comparison based algorithm needs time at least O (n log n) to
solve the problem on a file of length n.
b) Explain why this is no contradiction to the fact that the problem can be solved
in linear time using hashing.
// base case
// pivot key
// (A)
// (B)
// (C)
Exercise 5.19 (Sorting small inputs optimally.) Give an algorithm for sorting k element using at most dlogk!e element comparisons.
a) For k {2, 3, 4}. Hint: use mergesort.
*b) For k = 5 (seven comparisons). Implement this algorithm efficiently and use it
as the base case for a fast implementation of mergesort.
c) For k {6, 7, 8}. Hint: use the case k = 5 as a subroutine.
= [ask Jyrky for sources]
=
5.4 Quicksort
[Can the figures for mergesort and quicksort be drawn so that they look dual to each other?]
Quicksort is a divide-and-conquer algorithm that is complementary to the mergesort algorithm we have seen in Section 5.2. Quicksort does all the difficult work before the recursive calls. The idea is to distribute the input elements to two or more sequences that represent disjoint ranges of key values. Then it suffices to sort the shorter sequences recursively and to concatenate the results. To make the symmetry to mergesort complete, we would like to split the input into two sequences of equal size. Since this would be rather difficult, we use the easier approach of picking a random splitter element, or pivot, p. Elements are classified into three sequences a, b, and c of elements that are smaller than, equal to, or larger than p, respectively. Figure 5.4 gives a high-level realization of this idea and Figure 5.5 depicts an example.

Figure 5.5: Execution of quickSort (Figure 5.4) and qSort (Figure 5.6) on ⟨3, 6, 8, 1, 0, 7, 2, 4, 5, 9⟩ using the first element of a subsequence as the pivot. The right block shows the first execution of the repeat loop for partitioning the input in qSort.

This simple algorithm is already enough to show expected execution time O(n log n) in Section 5.4.1. In Section 5.4.2 we then discuss refinements that make quicksort the most widely used sorting algorithm in practice.
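For concreteness, here is a minimal C++ sketch of this high-level scheme (our own illustration, not the tuned array version of Section 5.4.2; the function name and the use of std::list are our choices):

#include <cstdlib>
#include <iterator>
#include <list>

// High-level quicksort: split s into elements smaller than, equal to, and
// larger than a random pivot, sort the outer parts recursively, concatenate.
std::list<int> quickSort(std::list<int> s) {
    if (s.size() <= 1) return s;                      // base case
    auto it = s.begin();
    std::advance(it, std::rand() % s.size());         // random pivot
    int p = *it;
    std::list<int> a, b, c;
    for (int e : s) {
        if (e < p) a.push_back(e);
        else if (e == p) b.push_back(e);
        else c.push_back(e);
    }
    std::list<int> result = quickSort(a);
    result.splice(result.end(), b);                   // concatenate a, b, and c
    std::list<int> right = quickSort(c);
    result.splice(result.end(), right);
    return result;
}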
5.4.1 Analysis

To analyze the running time of quicksort for an input sequence s = ⟨e1, . . . , en⟩ we focus on the number of element comparisons performed. Other operations contribute only constant factors and small additive terms to the execution time.

Let C(n) denote the worst case number of comparisons needed for any input sequence of size n and any choice of random pivots. The worst case performance is easily determined. Lines (A), (B), and (C) in Figure 5.4 can be implemented in such a way that all elements except for the pivot are compared with the pivot once (we allow three-way comparisons here, with possible outcomes smaller, equal, and larger). This makes n − 1 comparisons. Assume there are k elements smaller than the pivot and k′ elements larger than the pivot. We get C(0) = C(1) = 0 and

  C(n) = n − 1 + max{C(k) + C(k′) : 0 ≤ k ≤ n − 1, 0 ≤ k′ < n − k} .

By induction it is easy to verify that

  C(n) ≤ n(n − 1)/2 ≤ n² .

The worst case occurs if all elements are different and we are always so unlucky as to pick the largest or smallest element as the pivot.
The expected performance is much better.

Theorem 5.4 The expected number of comparisons performed by quicksort is C̄(n) ≤ 2n ln n ≤ 1.4n log n.

Let e′1, . . . , e′n denote the elements of s in sorted order, and let the indicator random variable Xij be one if e′i and e′j are compared during a run of quicksort and zero otherwise. Then the expected number of comparisons is

  C̄(n) = E[ ∑_{i=1}^{n} ∑_{j=i+1}^{n} Xij ] = ∑_{i=1}^{n} ∑_{j=i+1}^{n} E[Xij] = ∑_{i=1}^{n} ∑_{j=i+1}^{n} prob(Xij = 1) .

The middle transformation follows from the linearity of expectation (Equation (A.2)). The last equation uses the definition of the expectation of an indicator random variable, E[Xij] = prob(Xij = 1). Before we can further simplify the expression for C̄(n), we need to determine this probability.

Lemma 5.5 For any i < j, prob(Xij = 1) = 2/(j − i + 1).
Proof: Consider the (j − i + 1)-element set M = {e′i, . . . , e′j}. As long as no pivot from M is selected, e′i and e′j are not compared, but all elements from M are passed to the same recursive calls. Eventually, a pivot p from M is selected. Each element in M has the same chance 1/|M| of being selected. If p = e′i or p = e′j we have Xij = 1. The probability for this event is 2/|M| = 2/(j − i + 1). Otherwise, e′i and e′j are passed to different recursive calls so that they will never be compared.
Now we can finish the proof of Theorem 5.4 using relatively simple calculations.
  C̄(n) = ∑_{i=1}^{n} ∑_{j=i+1}^{n} prob(Xij = 1) = ∑_{i=1}^{n} ∑_{j=i+1}^{n} 2/(j − i + 1)
        = ∑_{i=1}^{n} ∑_{k=2}^{n−i+1} 2/k ≤ ∑_{i=1}^{n} ∑_{k=2}^{n} 2/k = 2n ∑_{k=2}^{n} 1/k = 2n(Hn − 1) ≤ 2n ln n .

For the last steps, recall the properties of the harmonic number Hn := ∑_{k=1}^{n} 1/k ≤ ln n + 1 (Equation (A.8)).
Note that the calculations in Section 2.6 for left-right maxima were very similar
although we had a quite different problem at hand.
5.4.2 Refinements
[implement]
Figure 5.6 gives pseudocode for an array based quicksort that works in-place and
uses several implementation tricks that make it faster and very space efficient.
To make a recursive algorithm compatible to the requirement of in-place sorting
of an array, quicksort is called with a reference to the array and the range of array
indices to be sorted. Very small subproblems with size up to n0 are sorted faster using
a simple algorithm like the insertion sort from Figure 5.1.3 The best choice for the
3 Some books propose to leave small pieces unsorted and clean up at the end using a single insertion
sort that will be fast according to Exercise 5.7. Although this nice trick reduces the number of instructions
executed by the processor, our solution is faster on modern machines because the subarray to be sorted will
already be in cache.
88
[Figure 5.6: Procedure qSort — in-place quicksort of the subarray a[ℓ..r]. The accompanying pictures show a before, during, and after partitioning around the pivot p; the scanning loops (A) and (B) move i and j over elements that are already on the correct side, until partitioning is done.]
5.4 Quicksort
89
constant n0 depends on many details of the machine and the compiler. Usually one should expect values between 10 and 40.
The pivot element is chosen by a function pickPivotPos that we have not specified here. The idea is to find a pivot that splits the input more accurately than just choosing a random element. A method frequently used in practice chooses the median (middle) of three elements. An even better method would choose the exact median of a random sample of elements. [crossref to a more detailed explanation of this concept?]
The repeat-until loop partitions the subarray into two smaller subarrays. Elements equal to the pivot can end up on either side or between the two subarrays. Since quicksort spends most of its time in this partitioning loop, its implementation details are important. Index variable i scans the input from left to right and j scans from right to left. The key invariant is that elements left of i are no larger than the pivot, whereas elements right of j are no smaller than the pivot. Loops (A) and (B) scan over elements that already satisfy this invariant. When a[i] ≥ p and a[j] ≤ p, scanning can be continued after swapping these two elements. Once indices i and j meet, the partitioning is completed. Now, a[ℓ..j] represents the left partition and a[i..r] represents the right partition. This sounds simple enough, but for a correct and fast implementation, some subtleties come into play.
To ensure termination, we verify that no single piece represents all of a[ℓ..r] even if p is the smallest or largest array element. So, suppose p is the smallest element. Then loop A first stops at i = ℓ; loop B stops at the last occurrence of p. Then a[i] and a[j] are swapped (even if i = j) and i is incremented. Since i is never decremented, the right partition a[i..r] will not represent the entire subarray a[ℓ..r]. The case that p is the largest element can be handled by a symmetric argument.
The scanning loops A and B are very fast because they make only a single test. At first glance, that looks dangerous. For example, index i could run beyond the right boundary r if all elements in a[i..r] were smaller than the pivot. But this cannot happen. Initially, the pivot is in a[i..r] and serves as a sentinel that can stop scanning loop A. Later, the elements swapped to the right are large enough to play the role of a sentinel. Invariant 3 expresses this requirement, which ensures termination of scanning loop A. Symmetric arguments apply for Invariant 4 and scanning loop B.
Our array quicksort handles recursion in a seemingly strange way: it is semi-recursive. The smaller partition is sorted recursively, while the larger partition is sorted iteratively by adjusting ℓ and r. This measure ensures that the recursion can never go deeper than ⌈log(n/n0)⌉ levels. Hence, the space needed for the recursion stack is O(log n). Note that a completely recursive algorithm could reach a recursion depth of n − 1, so that the space needed for the recursion stack could be considerably larger than for the input array itself.
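The following C++ sketch mirrors the structure just described (insertion sort below a cutoff, recursion only on the smaller partition). It is our own illustration, not the code of Figure 5.6: it uses the common Hoare-style partition with an explicit i ≤ j test instead of the sentinel-only scanning loops, and the cutoff n0 = 16 and the middle-element pivot are placeholder choices.

#include <algorithm>
#include <cstddef>

const std::ptrdiff_t n0 = 16;                 // cutoff for insertion sort (assumed value)

void insertionSort(int* a, std::ptrdiff_t l, std::ptrdiff_t r) {   // sort a[l..r]
    for (std::ptrdiff_t i = l + 1; i <= r; ++i) {
        int e = a[i];
        std::ptrdiff_t j = i;
        while (j > l && a[j - 1] > e) { a[j] = a[j - 1]; --j; }
        a[j] = e;
    }
}

// Semi-recursive in-place quicksort of a[l..r]: partition, recurse on the
// smaller part, continue iteratively on the larger part.
void qSortSketch(int* a, std::ptrdiff_t l, std::ptrdiff_t r) {
    while (r - l + 1 > n0) {
        std::ptrdiff_t i = l, j = r;
        int p = a[l + (r - l) / 2];           // middle element as pivot (placeholder choice)
        do {                                  // partition a[l..r] around p
            while (a[i] < p) ++i;
            while (a[j] > p) --j;
            if (i <= j) { std::swap(a[i], a[j]); ++i; --j; }
        } while (i <= j);
        if (j - l < r - i) { qSortSketch(a, l, j); l = i; }   // recurse on smaller part
        else               { qSortSketch(a, i, r); r = j; }   // loop on the larger part
    }
    insertionSort(a, l, r);                   // small ranges: insertion sort
}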
*Exercise 5.20 (Sorting Strings using Multikey Quicksort [12]) Explain why mkqSort(s, 1) below correctly sorts a sequence s consisting of n different strings. Assume that for any e ∈ s, e[|e| + 1] is an end-marker character that is different from all normal characters. What goes wrong if s contains equal strings? Fix this problem. Show that the expected execution time of mkqSort is O(N + n log n) if N = ∑_{e∈s} |e|.

Function mkqSort(s : Sequence of String, i : ℕ) : Sequence of String
  assert ∀e, e′ ∈ s : e[1..i − 1] = e′[1..i − 1]
  if |s| ≤ 1 then return s                                   // base case
  pick p ∈ s uniformly at random                             // pivot character
  return concatenation of mkqSort(⟨e ∈ s : e[i] < p[i]⟩, i),
                          mkqSort(⟨e ∈ s : e[i] = p[i]⟩, i + 1), and
                          mkqSort(⟨e ∈ s : e[i] > p[i]⟩, i)

Function select(s : Sequence of Element; k : ℕ) : Element    // Find an element with rank k
  assert |s| ≥ k
  pick p ∈ s uniformly at random                             // pivot key
  a := ⟨e ∈ s : e < p⟩
  if |a| ≥ k then return select(a, k)
  b := ⟨e ∈ s : e = p⟩
  if |a| + |b| ≥ k then return p
  c := ⟨e ∈ s : e > p⟩
  return select(c, k − |a| − |b|)

Figure 5.7: Quickselect
5.5 Selection
Often we want to solve problems that are related to sorting but do not need the complete sorted sequence. We can then look for specialized algorithms that are faster. For example, when you want to find the smallest element of a sequence, you would hardly sort the entire sequence and then take the first element. Rather, you would just scan the sequence once and remember the smallest element. You could quickly come up with algorithms for finding the second smallest element, etc. More generally, we could ask for the k-th smallest element or for all the k smallest elements in arbitrary order. In particular, in statistical problems, we are often interested in the median, i.e., the ⌈n/2⌉-th smallest element. Is this still simpler than sorting?
To define the term k-th smallest when elements can be equal, it is useful to formally introduce the notion of the rank of an element. A ranking function r is a one-to-one mapping of the elements of a sequence ⟨e1, . . . , en⟩ to the range 1..n such that r(x) < r(y) if x < y. Equal elements are ranked arbitrarily. A k-th smallest element is then an element that has rank k for some ranking function. [is this the common definition? I could not find this anywhere]
Once we know quicksort, it is remarkably easy to modify it to obtain an efficient selection algorithm. This algorithm is therefore called quickselect. The key observation is that it always suffices to follow at most one of the recursive calls. Figure 5.7 gives an adaptation of the simple sequence based quicksort from Figure 5.4. As before, the sequences a, b, and c are defined to contain the elements smaller than the pivot, equal to the pivot, and larger than the pivot, respectively. If |a| ≥ k, it suffices to restrict selection to a. In the borderline case that |a| < k but |a| + |b| ≥ k, the pivot is an element with rank k and we are done. Note that this also covers the case |s| = k = 1, so that no separate base case is needed. Finally, if |a| + |b| < k, the elements in a and b are too small for a rank-k element, so that we can restrict our attention to c. We have to search for an element with rank k − |a| − |b|.
The table below illustrates the levels of recursion entered by select(⟨3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8⟩, 6) = 5, assuming that the middle element of the current s is used as the pivot p.

  s                                   k   p   a                    b           c
  ⟨3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8⟩   6   2   ⟨1⟩                  ⟨2⟩         ⟨3, 4, 5, 9, 6, 5, 3, 5, 8⟩
  ⟨3, 4, 5, 9, 6, 5, 3, 5, 8⟩         4   6   ⟨3, 4, 5, 5, 3, 5⟩   ⟨6⟩         ⟨9, 8⟩
  ⟨3, 4, 5, 5, 3, 5⟩                  4   5   ⟨3, 4, 3⟩            ⟨5, 5, 5⟩   ⟨⟩
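A compact C++ sketch of this procedure (our own; it follows the select pseudocode of Figure 5.7 but replaces the tail recursion by a loop, and the function name is ours):

#include <cstddef>
#include <cstdlib>
#include <vector>

// Quickselect: returns an element of rank k (1 <= k <= s.size()).
int quickSelect(std::vector<int> s, std::size_t k) {
    for (;;) {
        int p = s[std::rand() % s.size()];             // random pivot
        std::vector<int> a, b, c;                      // smaller, equal, larger
        for (int e : s) {
            if (e < p) a.push_back(e);
            else if (e == p) b.push_back(e);
            else c.push_back(e);
        }
        if (a.size() >= k) s.swap(a);                  // rank-k element lies in a
        else if (a.size() + b.size() >= k) return p;   // the pivot has rank k
        else { k -= a.size() + b.size(); s.swap(c); }  // continue in c
    }
}

Called as quickSelect({3, 1, 4, 5, 9, 2, 6, 5, 3, 5, 8}, 6) it returns 5, matching the table above (although with random rather than middle pivots).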
As for quicksort, the worst case execution time of quickselect is quadratic. But the expected execution time is only linear, saving us a logarithmic factor over the execution time of quicksort.
Theorem 5.6 Algorithm quickselect runs in expected time O (n) for an input of size
|s| = n.
Proof: An analysis that does not care about constant factors is remarkably easy to obtain. Let T(n) denote the expected execution time of quickselect. Call a pivot good if neither |a| nor |c| is larger than 2n/3. Let γ ≥ 1/3 denote the probability that p is good. We now make the conservative assumption that the problem size in the recursive call is only reduced for good pivots, and that even then it is only reduced by a factor of 2/3. Since the work outside the recursive call is linear in n, there is an appropriate constant c such that

  T(n) ≤ cn + γ · T(2n/3) + (1 − γ) · T(n)   or, equivalently,

  T(n) ≤ cn/γ + T(2n/3) ≤ 3cn + T(2n/3) .
Now the master theorem ?? [todo: need the version for arbitrary n] for recurrence relations yields the desired linear bound for T(n).
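For readers who prefer not to wait for the master theorem, the recurrence can also be unrolled directly; this short LaTeX derivation is our addition, not part of the draft:
\[
T(n) \;\le\; 3cn + T\!\left(\tfrac{2}{3}n\right)
     \;\le\; 3cn\sum_{i\ge 0}\left(\tfrac{2}{3}\right)^{i} + O(1)
     \;=\; 9cn + O(1) \;=\; O(n).
\]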
class of algorithms. Evading this class of algorithms is then a guideline for getting
faster. For sorting, we devise algorithms that are not comparison based but extract
more information about keys in constant time. Hence, the Ω(n log n) lower bound for
comparison based sorting does not apply.
Let us start with a very simple algorithm that is fast if the keys are small integers in the range 0..K − 1. This algorithm runs in time O(n + K). We use an array b[0..K − 1] of buckets that are initially empty. Then we scan the input and insert an element with key k into bucket b[k]. This can be done in constant time per element, for example using linked lists for the buckets. Finally, we append all the nonempty buckets to obtain a sorted output. Figure 5.8 gives pseudocode. For example, if elements are pairs whose first element is a key in the range 0..3 and
s = ⟨(3, a), (1, b), (2, c), (3, d), (0, e), (0, f), (3, g), (2, h), (1, i)⟩
we get b = [⟨(0, e), (0, f)⟩, ⟨(1, b), (1, i)⟩, ⟨(2, c), (2, h)⟩, ⟨(3, a), (3, d), (3, g)⟩]
and the sorted output ⟨(0, e), (0, f), (1, b), (1, i), (2, c), (2, h), (3, a), (3, d), (3, g)⟩ .
Procedure KSort(s : Sequence of Element)
  b = ⟨⟨⟩, . . . , ⟨⟩⟩ : Array [0..K − 1] of Sequence of Element
  foreach e ∈ s do b[key(e)].pushBack(e)
  s := concatenation of b[0], . . . , b[K − 1]

Figure 5.8: Sorting with keys in the range 0..K − 1 (KSort).

Figure 5.9: Sorting with keys in the range 0..K^d − 1 using least significant digit radix sort.
KSort can be used as a building block for sorting larger keys. The idea behind radix sort is to view integer keys as numbers represented by digits in the range 0..K − 1. Then KSort is applied once for each digit. Figure 5.9 gives a radix sorting algorithm for keys in the range 0..K^d − 1 that runs in time O(d(n + K)). The elements are sorted first by their least significant digit, then by the second least significant digit, and so on, until the most significant digit is used for sorting. It is not obvious why this works. LSDRadixSort exploits the property of KSort that elements with the same key(e) retain their relative order. Such sorting algorithms are called stable. Since KSort is stable, the elements with the same i-th digit remain sorted with respect to digits 0..i − 1 during the sorting process by digit i. For example, if K = 10, d = 3, and
s = ⟨017, 042, 666, 007, 111, 911, 999⟩, we successively get
s = ⟨111, 911, 042, 666, 017, 007, 999⟩,
s = ⟨007, 111, 911, 017, 042, 666, 999⟩, and
s = ⟨007, 017, 042, 111, 666, 911, 999⟩ .
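As a concrete, non-authoritative illustration of LSD radix sort, the following C++ sketch performs one stable, KSort-like bucket pass per digit; the function name and the use of bucket vectors are our choices:

#include <vector>

// LSD radix sort for non-negative integer keys in the range 0..K^d - 1:
// one stable bucket pass per digit, least significant digit first.
void lsdRadixSort(std::vector<unsigned>& s, unsigned K, unsigned d) {
    unsigned divisor = 1;
    for (unsigned pass = 0; pass < d; ++pass) {
        std::vector<std::vector<unsigned>> b(K);       // K initially empty buckets
        for (unsigned e : s)
            b[(e / divisor) % K].push_back(e);         // stable: preserves input order
        s.clear();
        for (const auto& bucket : b)                   // concatenate b[0], ..., b[K-1]
            s.insert(s.end(), bucket.begin(), bucket.end());
        divisor *= K;                                  // move on to the next digit
    }
}

With K = 10, d = 3, and s = {17, 42, 666, 7, 111, 911, 999} it reproduces the intermediate sequences shown above.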
The mechanical sorting machine shown on Page 75 basically implemented one pass
of radix sort and was most likely used to run LSD radix sort.
Procedure uniformSort(s : Sequence of Element)
  n := |s|
  b = ⟨⟨⟩, . . . , ⟨⟩⟩ : Array [0..n − 1] of Sequence of Element
  foreach e ∈ s do b[⌊key(e) · n⌋].pushBack(e)
  for i := 0 to n − 1 do sort b[i] in time O(|b[i]| log |b[i]|)
  s := concatenation of b[0], . . . , b[n − 1]

Figure 5.10: Sorting random keys in the range [0, 1).
Radix sort starting with the most significant digit (MSD radix sort) is also possible. We apply KSort to the most significant digit and then sort each bucket recursively. The only problem is that the buckets might be much smaller than K, so that it would be expensive to apply KSort to small buckets. We then have to switch to another algorithm. This works particularly well if we can assume that the keys are uniformly distributed. More specifically, let us now assume that keys are real numbers with 0 ≤ key(e) < 1. Algorithm uniformSort from Figure 5.10 scales these keys to integers between 0 and n − 1 = |s| − 1, and groups them into n buckets where bucket b[i] is responsible for keys in the range [i/n, (i + 1)/n). For example, if s = ⟨0.8, 0.4, 0.7, 0.6, 0.3⟩ we get five buckets responsible for intervals of size 0.2,
b = [⟨⟩, ⟨0.3⟩, ⟨0.4⟩, ⟨0.7, 0.6⟩, ⟨0.8⟩] ,
and only b[3] = ⟨0.7, 0.6⟩ represents a nontrivial sorting subproblem.
We now show that uniformSort is very efficient for random keys.
Theorem 5.7 If the keys are independent uniformly distributed random values in the range [0, 1), Algorithm uniformSort from Figure 5.10 sorts n keys in expected time O(n) and worst case time O(n log n).
Proof: We leave the worst case bound as an exercise and concentrate on the average case analysis. The total execution time T is O(n) for setting up the buckets and concatenating the sorted buckets, plus the time for sorting the buckets. Let Ti denote the time for sorting the i-th bucket. We get

  E[T] = O(n) + E[ ∑_{i<n} Ti ] = O(n) + ∑_{i<n} E[Ti] = O(n) + n·E[T0] .

The first equality exploits the linearity of expectation (Equation (A.2)) and the second exploits that all bucket sizes have the same distribution for uniformly distributed inputs. Hence, it remains to show that E[T0] = O(1). We prove the stronger claim that E[T0] = O(1) even if a quadratic time algorithm such as insertion sort is used for sorting the buckets. [cross ref with perfect hashing?]

Let B0 = |b[0]|. We get E[T0] = O(E[B0²]). The random variable B0 obeys a binomial distribution with n trials and success probability 1/n. Using the definition of expected values (Equation (A.1)) and the binomial distribution defined in Equation (A.5) we get

  E[B0²] = ∑_{i≤n} i² · prob(B0 = i) = ∑_{i≤n} i² · (n choose i) · (1/n)^i · (1 − 1/n)^{n−i} .

It remains to show that this value is bounded by a constant independent of n. Using Inequality (A.6), (n choose i) ≤ (ne/i)^i, and estimating (1 − 1/n)^{n−i} ≤ 1, we get

  E[B0²] ≤ ∑_{i≤n} i² · (ne/i)^i · (1/n)^i = ∑_{i≤n} i² · (e/i)^i .

Finally, we can drop the restriction i ≤ n and get an infinite sum that is independent of n. It remains to show that this sum is bounded. By Cauchy's n-th root test, the sum is bounded if

  ( i² · (e/i)^i )^{1/i} = i^{2/i} · e/i ≤ q

for some constant q < 1 and sufficiently large i. For i ≥ 6 we get

  i^{2/i} · e/i ≤ i^{1/3} · e/i = e/i^{2/3} ≤ e/6^{2/3} ≤ 0.83 < 1 .

[More elegant proof? Kurt had something nicer?]
make_things_|as_simple_as|_possible_bu|t_no_simpler
  form run   |  form run  |  form run  |  form run
__aeghikmnst|__aaeilmpsss|__bbeilopssu|__eilmnoprst
        \ merge /                \ merge /
____aaaeeghiiklmmnpsssst | ____bbeeiillmnoopprssstu
                    \ merge /
________aaabbeeeeghiiiiklllmmmnnooppprsssssssttu

Figure 5.11: An example of two-way mergesort with runs of length 12.
*Exercise 5.24 Implement an efficient sorting algorithm for elements with keys in the range 0..K − 1 that uses the data structure from Exercise 3.19 as input and output. Space consumption should be at most n + O(n/B + KB) elements for n elements and blocks of size B.
internal memory. When the buffer block of an input sequence runs out of entries, we fetch the next block. When the buffer block of the output sequence fills up, we write it to the external memory. In each phase, we need n/B block reads and n/B block writes. Summing over all phases, we get (2n/B)(1 + ⌈log(n/M)⌉) I/Os. The only requirement is that M ≥ 3B.
Multiway Mergesort
We now generalize the external Mergesort to take full advantage of the available internal memory during merging. We can reduce the number of phases by merging as
many runs as possible in a single phase. In k-way merging, we merge k sorted sequences into a single output sequence. Binary merging (k = 2) is easy to generalize.
In each step we find the input sequence with the smallest first element. This element
is removed and appended to the output sequence. External memory implementation is
easy as long as we have enough internal memory for k input buffer blocks, one output
buffer block, and a small amount of additional storage.
For each sequence, we need to remember which element we are currently considering. To find the smallest element among all k sequences, we keep their current
element keys and positions in a priority queue. A priority queue maintains a set of
elements supporting the operations insertion and deletion of the minimum. Chapter 6
explains how priority queues can be implemented so that insertion and deletion take
time O(log k) if the queue contains k elements. Figure 5.12 gives pseudocode for this algorithm. Figure 5.13 gives a snapshot of an execution of 4-way merging. This two-pass sorting algorithm sorts n elements using 4n/B I/Os; during run formation and during merging, everything is read and written exactly once. A single merging phase works if there is enough internal memory to store ⌈n/M⌉ input buffer blocks, one output buffer block, and a priority queue with ⌈n/M⌉ entries, i.e., we can sort up to n ≤ M²/B elements. If internal memory stands for DRAM and external memory stands for disks, this bound on n is no real restriction for all practical system configurations. For comparison with the I/O cost of binary mergesort it is nevertheless instructive to look at arbitrarily large inputs. Run formation works as before, but we now need ⌈log_{M/B}(n/M)⌉ merging phases to arrive at a single sorted run. Overall we need

  (2n/B) · (1 + ⌈log_{M/B}(n/M)⌉)                                    (5.1)
I/Os. The difference to binary merging is the much larger base of the logarithm.
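To see what the larger base buys, consider some illustrative values of our own choosing (not from the text): internal memory M = 2^{30} elements, block size B = 2^{16} elements, and input size n = 2^{40} elements. In LaTeX:
\[
\left\lceil \log_{2}\frac{n}{M}\right\rceil = 10
\qquad\text{versus}\qquad
\left\lceil \log_{M/B}\frac{n}{M}\right\rceil=\left\lceil\frac{10}{14}\right\rceil=1 ,
\]
so binary mergesort needs eleven passes over the data (22·n/B I/Os, counting run formation), whereas multiway mergesort gets by with the two passes (4·n/B I/Os) counted by (5.1).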
Exercise 5.25 (Huge inputs.) Describe a generalization of twoPassSort in Figure 5.12 that also works for n > M²/B, using ⌈log_{M/B}(n/M)⌉ merging phases.
Exercise 5.26 (Balanced systems.) Study the current market prices of computers, internal memory, and mass storage (currently hard disks). Also estimate the block size
needed to achieve good bandwidth for I/O. Can you find any configuration where
multi-way mergesort would require more than one merging phase for sorting an input
filling all the disks in the system? If so, which fraction of the system cost would you
have to spend on additional internal memory to go back to a single merging phase?
If there are M/B cache blocks, this does not mean that we can use k = M/B − 1. A discussion of this issue can be found in [70].
[Figure 5.12: External k-way mergesort (pseudocode). After run formation, one buffer block per run (runBuffer), an output buffer out of B elements, and a priority queue next of (key, position) pairs are kept in internal memory. The main loop repeatedly performs (x, ℓ) := next.deleteMin, appends the element to out, refills the corresponding run buffer from external memory when ℓ crosses a block boundary (inserting ∞ as a sentinel once a run is exhausted), reinserts the run's next key into next, and writes out to the output file t whenever it is full.]

Figure 5.12: External k-way mergesort.
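To make the k-way merging loop concrete, here is a small in-memory C++ sketch of our own that merges sorted runs using a priority queue of (key, run index) pairs; the buffer-block management and disk I/O of the external algorithm in Figure 5.12 are deliberately omitted:

#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// k-way merging of sorted runs into a single sorted output sequence.
std::vector<int> multiwayMerge(const std::vector<std::vector<int>>& runs) {
    using Entry = std::pair<int, std::size_t>;                 // (key, run index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> next;
    std::vector<std::size_t> pos(runs.size(), 0);              // next unread position per run
    for (std::size_t r = 0; r < runs.size(); ++r)
        if (!runs[r].empty()) next.push({runs[r][0], r});
    std::vector<int> out;
    while (!next.empty()) {
        auto [x, r] = next.top();                              // globally smallest element
        next.pop();
        out.push_back(x);
        if (++pos[r] < runs[r].size())                         // advance within run r
            next.push({runs[r][pos[r]], r});
    }
    return out;
}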
[Figure 5.13: A snapshot of the execution of four-way merging on the runs from Figure 5.11, showing the run buffers (runBuffer), the priority queue next, the output buffer out, and the block size B relative to the internal memory of size M.]
This function has to be called for every single element comparison. In contrast, sort
uses the template mechanism of C++ to figure out at compile time how comparisons
are performed so that the code generated for comparisons is often a single machine
instruction. The parameters passed to sort are an iterator pointing to the start of the
sequence to be sorted and an iterator pointing after the end of the sequence. Hence,
sort can be applied to lists, arrays, etc. In our experiments on an Intel Pentium III
and gcc 2.95, sort on arrays runs faster than our manual implementation of quicksort.
One possible reason is that compiler designers may tune their code optimizers until they find that good code for the library version of quicksort is generated.
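As an illustration of the difference discussed above, here is a minimal C++ fragment of our own that sorts the same array once with C's qsort, which calls a comparison function through a pointer, and once with std::sort, whose comparison is resolved at compile time:

#include <algorithm>
#include <cstddef>
#include <cstdlib>

// qsort calls this function through a pointer for every single comparison.
int cmpInt(const void* a, const void* b) {
    int x = *static_cast<const int*>(a), y = *static_cast<const int*>(b);
    return (x > y) - (x < y);
}

void sortBothWays(int* a, std::size_t n) {
    std::qsort(a, n, sizeof(int), cmpInt);   // comparison via function pointer
    std::sort(a, a + n);                     // comparison chosen at compile time,
                                             // typically inlined by the compiler
}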
Java
Exercises
Exercise 5.27 Give a C or C++ implementation of the quicksort in Figure 5.6 that uses only two parameters: a pointer to the (sub)array to be sorted and its size.
Figure 5.14: Array based sorting with keys in the range 0..K − 1. The input is an unsorted array a. The output is b with the elements of a in sorted order.
In this book you will find several generalizations of sorting. Chapter 6 discusses
priority queues, a data structure that allows insertion of elements and deletion of the smallest element. In particular, by inserting n elements and then deleting them we get the elements in sorted order. It turns out that this approach yields a quite good sorting algorithm. A further generalization is the search trees introduced in Chapter 7
that can be viewed as a data structure for maintaining a sorted list supporting inserting,
finding, and deleting elements in logarithmic time.
Generalizations beyond[check] the scope of this book are geometric problems on =
higher dimensional point sets that reduce to sorting for special inputs (e.g., convex
hulls or Delaunay triangulations [26]). Often, sorting by one coordinate is an important ingredient in algorithms solving such problems.
We have seen several simple, elegant, and efficient randomized algorithms in this chapter. An interesting theoretical question is whether these algorithms can be replaced by deterministic ones. Blum et al. [15] describe a deterministic median selection algorithm that is similar to the randomized algorithm from Section 5.5. This algorithm makes pivot selection more reliable using recursion: The pivot is the median of the ⌊n/5⌋ medians of the groups ⟨e5i+1, e5i+2, e5i+3, e5i+4, e5i+5⟩ for 0 ≤ i < ⌊n/5⌋. Working out the resulting recurrences yields a linear worst case execution time, but the constant factors involved make this algorithm impractical. There are quite practical ways to reduce the expected number of comparisons required by quicksort. Using the median of three random elements yields an algorithm with about 1.188n log n comparisons. The median of three three-medians brings this down to 1.094n log n [10]. A perfect implementation makes the number of elements considered for pivot selection dependent on the size of the subproblem. Martinez and Roura [62] show that for a subproblem of size m, the median of Θ(√m) elements is a good choice for the pivot. The total number of comparisons required is then (1 + o(1))n log n, i.e., it matches the lower bound of n log n − O(n) up to lower order terms. A deterministic variant of quicksort that might be practical is proportion extend sort [21].
A classical sorting algorithm of some historical interest is Shellsort [48, 85], a quite simple generalization of insertion sort that gains efficiency by also comparing nonadjacent elements. It is still open whether there is a variant of Shellsort that achieves O(n log n) running time on the average [48, 63].
There are some interesting tricks to improve external multiway mergesort. The snow plow heuristic [57, Section 5.4.1] forms runs of size 2M on the average using a fast memory of size M: When an element e is read from disk, make room by writing the smallest element e′ of the current run to disk. If e ≥ e′, insert e into the current run; otherwise, remember it for the next run. Multiway merging can be slightly sped up by using a tournament tree rather than a general priority queue [57]. [discuss in PQ chapter?]
To sort large data sets, we may also want to use parallelism. Multiway mergesort and distribution sort can be adapted to D parallel disks by striping, i.e., every
D consecutive blocks in a run or bucket are evenly distributed over the disks. Using
randomization, this idea can be developed into almost optimal algorithms that also
overlap I/O and computation [28]. Perhaps the best sorting algorithm for large inputs
on P parallel processors is a parallel implementation of the sample sort algorithm from
Section ?? [14].
[more lower bounds, e.g., selection, I/O? Or do it in detail?]
We have seen linear time algorithms for rather specialized inputs. A quite general model in which the Ω(n log n) lower bound can be broken is the word model. If keys are integers that can be stored in a memory word, then they can be sorted in time O(n log log n) regardless of the word size, as long as we assume that simple operations on [what exactly?] words can be performed in constant time [5]. A possibly practical implementation of the distribution based algorithms from Section 5.6 that works
almost in-place is flash sort [75].
Exercise 5.28 (Unix spell checking) One of the authors still finds the following spell checker most effective: Assume you have a dictionary consisting of a sorted sequence of correctly spelled words. To check a text, convert it to a sequence of words, sort it, scan the text and the dictionary simultaneously, and output the words in the text that do not appear in the dictionary. Implement this spell checker using any Unix tools in as few lines as possible (one longish line might be enough).
[ssssort? skewed qsort? cache oblivious funnel sort? Cole's merge sort? sort 13 elements with more than ⌈log n!⌉ comparisons?]
Chapter 6
Priority Queues
Suppose you work for a company that markets tailor-made first-rate garments. Your business model works as follows: You organize marketing, measurements, etc., and get 20% of the money paid for each order. Actually executing an order is subcontracted to an independent master tailor. When your company was founded in the 19th century there were five subcontractors in the home town of your company. Now you control 15% of the world market and there are thousands of subcontractors worldwide.
Your task is to assign orders to the subcontractors. The contracts demand that an order is assigned to the tailor who has so far (this year) been assigned the smallest total amount of orders. Your ancestors used a blackboard with the current sum of orders for each tailor. But for such a large number of subcontractors it would be prohibitive to go through the entire list every time you get a new order. Can you come up with a more scalable solution where you have to look only at a small number of values to decide who will be assigned the next order?
In the following year the contracts are changed. In order to encourage timely delivery, the orders are now assigned to the tailor with the smallest amount of unfinished orders, i.e., whenever a finished order arrives, you have to deduct the value of the order from the backlog of the tailor who executed it. Is your strategy for assigning orders flexible enough to handle this efficiently?
[References to the summary and the intro correct?] The data structure needed for the above problem is a priority queue, and it shows up in many applications. But first let us look at a more formal specification. We maintain a set M of Elements with Keys. Every priority queue supports the following operations:

Procedure build({e1, . . . , en})   M := {e1, . . . , en}
Procedure insert(e)   M := M ∪ {e}
Function min   return min M
Function deleteMin   e := min M; M := M \ {e}; return e
106
Priority Queues
[replaced findMin by min] [do we need min and size? Then propagate this through all chapters? For the moment I have left out size because it is trivial] This is enough for the first part of our tailored example: Every year we build a new priority queue containing an Element with Key zero for each contract tailor. To assign an order, we delete the smallest Element, add the order value to its Key, and reinsert it. Section 6.1 presents a simple and efficient implementation of this basic functionality.
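As an illustration of this usage pattern, here is a small C++ sketch of our own that uses std::priority_queue as the (non-addressable) priority queue; the tailor names and order values in a program using it would be invented:

#include <functional>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Each subcontractor is kept in a min-priority queue keyed by the total
// value of orders assigned so far (Key zero for everyone at the start).
using Entry = std::pair<double, std::string>;   // (assigned total, tailor)
using TailorQueue =
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>>;

TailorQueue buildQueue(const std::vector<std::string>& tailors) {
    TailorQueue pq;
    for (const auto& t : tailors) pq.push({0.0, t});
    return pq;
}

// Assign one order: deleteMin, add the order value to the Key, reinsert.
std::string assignOrder(TailorQueue& pq, double orderValue) {
    Entry e = pq.top();
    pq.pop();
    e.first += orderValue;
    pq.push(e);
    return e.second;                            // the tailor who got the order
}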
Addressable priority queues additionally support operations on arbitrary elements addressed by an element handle:

Function remove(h : Handle)   e := h; M := M \ {e}; return e
Procedure decreaseKey(h : Handle, k : Key)   assert key(h) ≥ k; key(h) := k
Procedure merge(M′)   M := M ∪ M′

[index terms: delete: see also remove, meld: see also merge]
In our example, operation remove might be helpful when a contractor is fired because it delivers poor quality. Together with insert we can also implement the new
contract rules: When an order is delivered, we remove the Element for the contractor who executed the order, subtract the value of the order from its Key value, and
reinsert the Element. DecreaseKey streamlines this process to a single operation. In
Section 6.2 we will see that this is not just convenient but that decreasing a Key can be
implemented more efficiently than arbitrary element updates.
Priority queues support many important applications. For example, in Section 12.2
we will see that our tailored example can also be viewed as a greedy algorithm for a very
natural machine scheduling problem. Also, the rather naive selection sort algorithm
from Section 5.1 can be implemented efficiently now: First insert all elements to be
sorted into a priority queue. Then repeatedly delete the smallest element and output
it. A tuned version of this idea is described in Section 6.1. The resulting heapsort
algorithm is one of the most robust sorting algorithms because it is efficient for all
inputs and needs no additional space.
In a discrete event simulation one has to maintain a set of pending events. Each
event happens at some scheduled point in time [was execution time which can be
misunderstood as the time the task/event takes to execute in the real world
= or on the simulated computer.] and creates zero or more new events scheduled to
happen at some time in the future. Pending events are kept in a priority queue. The
main loop of the simulation deletes the next event from the queue, executes it, and
inserts newly generated events into the priority queue. Note that priorities (times) of
the deleted elements (simulated events) are monotonically increasing during the simulation. It turns out that many applications of priority queues have this monotonicity
property. Section 10.5 explains how to exploit monotonicity for queues with integer
keys.
Another application of monotone priority queues is the best-first branch-and-bound approach to optimization described in Section 12.4.1. Here elements are partial solutions of an optimization problem and the keys are optimistic estimates of the obtainable solution quality. The algorithm repeatedly removes the best looking partial solution, refines it, and inserts zero or more new partial solutions.
We will see two applications of addressable priority queues in the chapters on graph algorithms. In both applications the priority queue stores nodes of a graph. Dijkstra's algorithm for computing shortest paths in Section 10.4 uses a monotone priority queue where the keys are path lengths. The Jarník-Prim algorithm for computing minimum spanning trees in Section 11.2 uses a (nonmonotone) priority queue where the keys are the weights of edges connecting a node to the spanning tree. In both algorithms, each edge can lead to a decreaseKey operation, whereas there is at most one insert and deleteMin for each node. Hence, decreaseKey operations can dominate the running time if m ≫ n. [moved this discussion here since it fits the application overview]
Exercise 6.1 Show how to implement bounded non-addressable priority queues by
arrays. The maximal size of the queue is w and when the queue has size n the first
n entries of the array are used. Compare the complexity of the queue operations for
two naive implementations. In the first implementation the array is unsorted and in
the second implementation the array is sorted.
Exercise 6.2 Show how to implement addressable priority queues by doubly linked
lists. Each list item represents an element in the queue and a handle is a handle of a list
item. Compare the complexity of the queue operations for two naive implementations.
In the first implementation the list is unsorted and in the second implementation the
list is sorted.
[Figure 6.1: Class declaration for a priority queue based on binary heaps whose size is bounded by w. The heap h is initially empty and has the heap property, which implies that the root is the minimum.]
What does this mean? The key to understanding this definition is a bijection between positive integers and the nodes of a complete binary tree, as illustrated in Figure 6.2. In a heap the minimum element is stored in the root (= array position 1). Thus min takes time O(1). Creating an empty heap with space for w elements also takes constant time [ps: was O(n)???] as it only needs to allocate an array of size w. Although finding the minimum of a heap is as easy as for a sorted array, the heap property is much less restrictive. For example, there is only one way to sort the set {1, 2, 3} but both ⟨1, 2, 3⟩ and ⟨1, 3, 2⟩ are legal representations of {1, 2, 3} by a heap.
Figure 6.2: The top part shows a heap with n = 12 elements stored in an array h with w = 13 entries. The root has number one. The children of the root have numbers 2 and 3. The children of node i have numbers 2i and 2i + 1 (if they exist). The parent of a node i, i ≥ 2, has number ⌊i/2⌋. The elements stored in this implicitly defined tree fulfill the invariant that parents are no larger than their children, i.e., the tree is heap-ordered. The left part shows the effect of inserting b. The fat edges mark a path from the rightmost leaf to the root. The new element b is moved up this path until its parent is smaller. The remaining elements on the path are moved down to make room for b. The right part shows the effect of deleting the minimum. The fat edges mark the path p starting at the root and always proceeding to the child with smaller Key. Element q is provisionally moved to the root and then moves down path p until its successors are larger. The remaining elements move up to make room for q.
  assert the heap property holds except for j = i
  swap(h[i], h[⌊i/2⌋])
  assert the heap property holds except maybe for j = ⌊i/2⌋
  siftUp(⌊i/2⌋)
Correctness follows from the stated invariants.
Exercise 6.4 Show that the running time of siftUp(n) is O (log n) and hence an insert
takes time O (logn)
A deleteMin returns the contents of the root and replaces it by the contents of node
n. Since h[n] might be larger than h[1] or h[2], this manipulation may violate the heap
property at positions 1 or 2. This possible violation is repaired using siftDown.
Procedure siftDown(1) moves the new contents of the root down the tree until the heap property [ps: was condition] holds. More precisely, consider the path p starting at the root and always proceeding to a child with minimal [was smaller, which is wrong for equal keys] Key, cf. Figure 6.2. We are allowed to move up elements along path p because the heap property with respect to the other successors (with maximal Key) will be maintained. [new sentence] The proper place for the root on this path is the highest position where both its successors [was: where it also fulfills the heap property. This is wrong because this would only require it to be larger than the parent.] fulfill the heap property. In the code below, we explore the path p by a recursive procedure.

Procedure siftDown(i) repairs the heap property at the successors of heap node i without destroying it elsewhere. In particular, if the heap property originally held for the subtrees rooted at 2i and 2i + 1, then it now holds for the subtree rooted at i. (Let us say that the heap property holds at the subtree rooted at node i if it holds for all descendants [check whether this is defined in intro] of i but not necessarily for i itself.) [new somewhat longwinded discussion. But some redundancy might help and it looks like this is needed to explain what siftDown does in other circumstances like buildHeap.]
[changed the precondition and postcondition so that the correctness proof of buildHeap works.]

Procedure siftDown(i : ℕ)
  assert the heap property holds for the subtrees rooted at j = 2i and j = 2i + 1
  if 2i ≤ n then                                            // i is not a leaf
    if 2i + 1 > n ∨ h[2i] ≤ h[2i + 1] then m := 2i else m := 2i + 1
    assert the sibling of m does not exist or does not have a smaller priority than m
    if h[i] > h[m] then                                     // the heap property is violated
      swap(h[i], h[m])
      siftDown(m)
  assert the heap property holds for the subtree rooted at i

Exercise 6.5 Our current implementation of siftDown needs about 2 log n element comparisons. Show how to reduce this to log n + O(log log n). Hint: use binary search. Section 6.4 has more on variants of siftDown.

We can obviously build a heap from n elements by inserting them one after the other in O(n log n) total time. Interestingly, we can do better by establishing the heap property in a bottom-up fashion: siftDown allows us to establish the heap property for a subtree of height k + 1 provided the heap property holds for its subtrees of height k. The following exercise asks you to work out the details of this idea:
Exercise 6.6 (buildHeap) Assume that we are given an arbitrary array h[1..n] and
want to establish the heap property on it by permuting its entries. We will consider
two procedures for achieving this:
Procedure buildHeapBackwards
  for i := ⌊n/2⌋ downto 1 do siftDown(i)

Procedure buildHeapRecursive(i : ℕ)
  if 4i ≤ n then
    buildHeapRecursive(2i)
    buildHeapRecursive(2i + 1)
  siftDown(i)
[smaller itemsep for our enumerations?]
Theorem 6.1 [reformulated theorem such that ALL results on heaps are summarized] With the heap implementation of bounded non-addressable priority queues, creating an empty heap and finding min take constant time, deleteMin and insert take logarithmic time O(log n), and build takes linear time.
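For concreteness, here is a minimal C++ sketch of such a binary heap (our own illustration, not the class of Figure 6.1; it is backed by std::vector rather than a fixed array of size w, and omits build):

#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// A minimal binary min-heap stored in h[1..n] (index 0 unused):
// parents are never larger than their children.
class BinaryHeap {
    std::vector<int> h;
public:
    BinaryHeap() : h(1) {}                             // dummy entry at index 0
    std::size_t size() const { return h.size() - 1; }
    int min() const { assert(size() > 0); return h[1]; }

    void insert(int e) {                               // append, then sift up
        h.push_back(e);
        for (std::size_t i = size(); i > 1 && h[i / 2] > h[i]; i /= 2)
            std::swap(h[i], h[i / 2]);
    }

    int deleteMin() {                                  // move last element to root, sift down
        assert(size() > 0);
        int result = h[1];
        h[1] = h.back();
        h.pop_back();
        std::size_t n = size(), i = 1;
        while (2 * i <= n) {
            std::size_t m = 2 * i;                     // child with smaller key
            if (m + 1 <= n && h[m + 1] < h[m]) m = m + 1;
            if (h[i] <= h[m]) break;                   // heap property restored
            std::swap(h[i], h[m]);
            i = m;
        }
        return result;
    }
};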
Heaps are the basis of heapsort. We first build a heap from the elements and then repeatedly perform deleteMin. Before the i-th deleteMin operation the i-th smallest element is stored at the root h[1]. We swap h[1] and h[n − i + 1] and sift the new root down to its appropriate position. At the end, h stores the elements sorted in decreasing order. Of course, we can also sort in increasing order by using a max-priority queue, i.e., a data structure supporting the operations insert and deletion of the maximum. [moved deleteMin by binary search up to the general deleteMin since I see no reason to use it only for heapsort. Moved the bottom-up part to further findings since new experiments indicate that it does not really help if the top-down algorithm is properly implemented.]
Heaps do not immediately implement the ADT addressable priority queue, since
elements are moved around in the array h during insertion and deletion. Thus the array
indices cannot be used as handles.
Exercise 6.7 (Addressable binary heaps.) Extend heaps to an implementation of addressable priority queues. Compared to nonaddressable heaps your data structure
= should only consume two additional pointers per element.[new requirement]
Exercise 6.8 (Bulk insertion.) We want to design an algorithm for inserting k new elements into an n element heap.
*a) Give an algorithm that runs in time O(k + log² n). Hint: Use a similar bottom-up approach as for heap construction.
**b) Can you even achieve time O(min(log n + k log k, k + log(n) · log log n))?
Figure 6.4: A heap ordered forest representing the set {0, 1, 3, 4, 5, 7, 8}.
single tree is replaced by a collection of trees, a forest. Each tree is still heap-ordered, i.e., no child is smaller than its parent. In other words, the sequence of keys
along any root to leaf path is non-decreasing. Figure 6.4 shows a heap-ordered forest.
But now there is no direct restriction on the height of the trees or the degrees of their
nodes. The elements of the queue are stored in heap items that have a persistent
location in memory. Hence, we can use pointers to address particular elements at
any time. Using the terminology from Section 3, handles for priority queue elements
are implemented as pointers to heap items. The tree structure is explicitly defined
using pointers between items. However we first describe the algorithms on an abstract
level independent of the details of representation. In order to keep track of the current
minimum, we maintain the handle to the root where it is stored.
Figure 6.3 gives pseudocode that expresses the common aspects of several addressable priority queues. The forest is manipulated using three simple operations:
adding a new tree (and keeping minPtr up to date), combining two trees into a single
one, and cutting out a subtree making it a tree of its own.
An insert adds a new single node tree to the forest. So a sequence of n inserts into
an initially empty heap will simply create n single node trees. The cost of an insert is
clearly O (1).
A deleteMin operation removes the node indicated by minPtr. This turns all children of the removed node into roots. To find the new minimum, we need to inspect all roots (old and new), a potentially very costly process. We make the process even more expensive
(by a constant factor) by doing some useful work on the side, namely combining some
trees into larger trees. We will use the concept of amortized analysis to charge the cost
depending on the number of trees to the operations that called newTree. Hence, to
prove a low complexity for deleteMin, it suffices to make sure that no tree root has too
many children.
We turn to the decreaseKey operation next. It is given a handle h and a new key
k and decreases the key value of h to k. Of course, k must not be larger than the old
key stored with h. Decreasing the information associated with h may destroy the heap
property because h may now be smaller than its parent. In order to maintain the heap
property, we cut the subtree rooted at h and turn h into a root. Cutting out subtrees
causes the more subtle problem that it may leave trees that have an awkward shape.
Figure 6.5: The deleteMin operation of pairing heaps combines pairs of root nodes.
*Exercise 6.9 (Three pointer items.) Explain how to implement pairing heaps using
three pointers per heap item: One to the youngest child, one to the next older sibling
(if any), and one that either goes to the next younger sibling, or, if there is no younger
sibling, to the parent. Figure 6.8 gives an example.
*Exercise 6.10 (Two pointer items.) The three pointer representation from Exercise 6.9 can be viewed as a binary tree with the invariant that the items stored in left subtrees contain no smaller elements. Here is a more compact representation of this binary tree: Each item stores a pointer to its right child. In addition, a right child stores a pointer to its sibling; left children, and right children without a sibling, store a pointer to their parent. Explain how to implement pairing heaps using this representation. Figure 6.8 gives an example.
In addition, every heap item contains a field rank. The rank of an item is the number of its children. In Fibonacci heaps, deleteMin links roots of equal rank r. The surviving root will then get rank r + 1. An efficient method to combine trees of equal rank is as follows. Let maxRank be the maximal rank of a node after the execution of deleteMin. Maintain a set of buckets, initially empty and numbered from 0 to maxRank. Then scan the list of old and new roots. When a root of rank i is considered, inspect the i-th bucket. If the i-th bucket is empty, put the root there. If the bucket is non-empty, combine the two trees into one. This empties the i-th bucket and creates a root of rank i + 1. Try to throw the new tree into the (i + 1)-st bucket. If it is occupied, combine . . . . When all roots have been processed in this way, we have a collection of trees whose roots have pairwise distinct ranks. Figure 6.6 gives an example.

Figure 6.6: An example of the development of the bucket array while the deleteMin of Fibonacci heaps is combining the roots. The arrows indicate which roots have been scanned. Note that scanning d leads to a cascade of three combination operations.

A deleteMin can be very expensive if there are many roots. However, we now show that in an amortized sense, maxRank is more important.

Lemma 6.2 By charging a constant amount of additional work to every newTree operation, the amortized complexity of deleteMin can be made O(maxRank).

Proof: A deleteMin first calls newTree at most maxRank times and then initializes an array of size maxRank. The remaining time is proportional to the number of combine operations performed. From now on we therefore only count combines. To make things more concrete, let us say that one peanut pays for a combine. [intro here or in search tree?] We make newTree pay one peanut, stored with the new tree root. With these conventions in place, a combine operation performed by deleteMin is free, since the peanut stored with the item that ceases to be a root can pay for the combine.

[Figure 6.7: The binomial trees B0, B1, B2, B3, B4, and B5.]

Figure 6.8: Three ways to represent trees of nonuniform degree. The binomial tree of rank three, B3, is used as an example. (In the picture: four pointers per item for Fibonacci heaps; three pointers per item for pairing heaps and binomial heaps; two pointers per item for Exercise 6.10.)
Lemma 6.2 tells us that in order to make deleteMin fast we should make sure that
maxRank remains small. Let us consider a very simple situation first. Suppose that
we perform a sequence of inserts followed by a single deleteMin. In this situation, we
start with a certain number of single node trees and all trees formed by combining are
so-called binomial trees as shown in Figure 6.7. The binomial tree B 0 consists of a
single node and the binomial tree Bi+1 is obtained by joining two copies of the tree
Bi . This implies that the root of the tree Bi has rank i and that the tree Bi contains
exactly 2^i nodes. We conclude that the rank of a binomial tree is logarithmic in the size of the tree. If we could guarantee in general that the maximal rank of any node is logarithmic in the size of the tree, then the amortized cost of the deleteMin operation would be logarithmic.
Unfortunately, decreaseKey may destroy the nice structure of binomial trees. Suppose item v is cut out. We now have to decrease the rank of its parent w. The problem
is that the size of the subtrees rooted at the ancestors of w has decreased but their rank
has not changed. Therefore, we have to perform some balancing to keep the trees in
shape.
An old solution suggested by Vuillemin [94] is to keep all trees in the heap binomial. However, this causes logarithmic cost for a decreaseKey.
*Exercise 6.11 (Binomial heaps.) Work out the details of this idea. Hint: you have to cut v's ancestors and their younger siblings.
Fredman and Tarjan showed how to decrease its cost to O (1) without increasing the
cost of the other operations. Their solution is surprisingly simple and we describe it
next.
Figure 6.9: An example of cascading cuts. Marks are drawn as crosses. Note that roots are never marked.
When a non-root item x loses a child because decreaseKey is applied to the child, x is marked; this assumes that x has not already been marked. When a marked node x loses a child, we cut x, remove the mark from x, and attempt to mark x's parent. If x's parent is marked already then . . . . This technique is called cascading cuts. In other words, suppose that we apply decreaseKey to an item v and that the k nearest ancestors of v are marked; then we turn v and the k nearest ancestors of v into roots and mark the (k + 1)-st nearest ancestor of v (if it is not a root). Also, we unmark all the nodes that were turned into roots. Figure 6.9 gives an example.
Lemma 6.3 The amortized complexity of decreaseKey is constant.
Proof: We generalize the proof of Lemma 6.2 to take the cost of decreaseKey operations into account. These costs are proportional to the number of cut operations performed. Since cut calls newTree, which is in turn charged for the cost of a combine, we can as well ignore all costs except the peanuts needed to pay for combines. [alternative: also account for cuts; then marked items store two peanuts etc.] We assign one peanut to every marked item. We charge two peanuts for a decreaseKey: one pays for the cut and the other for marking an ancestor that has not been marked before. The peanuts stored with the ancestors that become unmarked pay for the additional cuts.
How do cascading cuts affect the maximal rank? We show that it stays logarithmic. In order to do so we need some notation. Let F0 = 0, F1 = 1, and Fi = Fi−1 + Fi−2 for i ≥ 2 be the sequence of Fibonacci numbers. It is well known that Fi+2 ≥ ((1 + √5)/2)^i ≥ 1.618^i for all i ≥ 0.

Lemma 6.4 Let v be any item in a Fibonacci heap and let i be the rank of v. Then the subtree rooted at v contains at least Fi+2 nodes. In a Fibonacci heap with n items, all ranks are bounded by 1.4404 log n.
Proof: [start counting from zero here?] Consider an arbitrary item v of rank i. Order the children of v by the time at which they were made children of v. Let wj be the j-th child, 1 ≤ j ≤ i. When wj was made a child of v, both nodes had the same rank. Also, since at least the nodes w1, . . . , wj−1 were children of v at that time, the rank of v was at least j − 1 at the time when wj was made a child of v. The rank of wj has decreased by at most 1 since then, because otherwise wj would be a root. Thus the current rank of wj is at least j − 2.
We can now set up a recurrence for the minimal number Si of nodes in a tree whose root has rank i. Clearly S0 = 1, S1 = 2, and Si ≥ 2 + S0 + S1 + · · · + Si−2. The last inequality follows from the fact that for j ≥ 2, the number of nodes in the subtree with root wj is at least Sj−2, and that we can also count the nodes v and w1. The recurrence above (with = instead of ≥) generates the sequence 1, 2, 3, 5, 8, . . . , which is identical to the Fibonacci sequence (minus its first two elements).
Let us verify this by induction. Let T0 = 1, T1 = 2, and Ti = 2 + T0 + · · · + Ti−2 for i ≥ 2. Then, for i ≥ 2, Ti+1 − Ti = 2 + T0 + · · · + Ti−1 − 2 − T0 − · · · − Ti−2 = Ti−1, i.e., Ti+1 = Ti + Ti−1. This proves Ti = Fi+2.
For the second claim, we only have to observe that Fi+2 ≤ n implies i · log((1 + √5)/2) ≤ log n, which in turn implies i ≤ 1.4404 log n.
This concludes our treatment of Fibonacci heaps. We have shown:
Theorem 6.5 The following time bounds hold for Fibonacci heaps: min, insert, and
merge take worst case constant time; decreaseKey takes amortized constant time and
remove and deleteMin take amortized time logarithmic in the size of the queue.
C++
The STL class priority_queue offers non-addressable priority queues implemented using binary heaps. LEDA implements a wide variety of addressable priority queues, including pairing heaps and Fibonacci heaps.
Java
The class java.util.PriorityQueue supports addressable priority queues to the extent that remove is implemented. However, decreaseKey and merge are not supported. Also, it seems that the current implementation of remove needs time Θ(n)! JDSL offers an addressable priority queue jdsl.core.api.PriorityQueue which is currently implemented as a binary heap. [where do we explain what JDSL is? In the implementation notes of the intro?]
= sort?] the algorithm is efficient in the external memory model even though it
does not explicitly use the block size or cache size.
Pairing heaps have amortized constant complexity for insert and merge [44] and logarithmic amortized complexity for deleteMin. There is also a logarithmic upper bound for decreaseKey, but it is not known whether this bound is tight. Fredman [35] has given operation sequences consisting of O(n) insertions and deleteMins and O(n log n) decreaseKeys that take time Ω(n log n log log n) for a family of addressable priority queues that includes all previously proposed variants of pairing heaps.
The family of addressable priority queues from Section 6.2 has many additional interesting members. Høyer describes additional balancing operations that can be used together with decreaseKey and that look similar to operations better known for search trees. One such operation yields thin heaps [50], which have performance guarantees similar to Fibonacci heaps but do not need a parent pointer or a mark bit. It is likely that thin heaps are faster in practice than Fibonacci heaps. There are also priority queues with worst case bounds asymptotically as good as the amortized bounds we have seen for Fibonacci heaps [17]. The basic idea is to tolerate violations of the heap property and to continuously invest some work in reducing the violations. Fat heaps [50] seem to be simple enough to be implementable, at the cost of allowing merge only in logarithmic time.
Many applications only need priority queues for integer keys. For this special case there are more efficient priority queues. The best theoretical bounds so far are constant time for decreaseKey and insert and O(log log n) time for deleteMin [92, 71]. Using randomization the time bound can even be reduced to O(√(log log n)) [97]. These algorithms are fairly complex, but by additionally exploiting monotonicity of the queues one gets simple and practical priority queues. Section 10.5 will give examples. [move here?] The calendar queues [19] popular in the discrete event simulation community are even simpler variants of these integer priority queues. A practical monotone priority queue for integers is described in [6].
Chapter 7
Sorted Sequences
Searching for a name in a telephone book is easy if the person you are looking for has had the telephone number long enough. It would be nicer to have a telephone book that is updated immediately when something changes. The manual data structure used for this purpose is the file card box. Some libraries still have huge collections with hundreds of thousands of cards collecting dust.
[binary search proper probably goes into the intro first]
People have looked for a computerized version of file cards from the first days of
commercial computer use. Interestingly, there is a quite direct computer implementation of file cards that is not widely known. The starting point is our telephone book
data structure from Chapter 5 a sorted array a[1..n]. We can search very efficiently
by binary search. To remove an element, we simply mark it as removed. Note that
we should not overwrite the key because it is still needed to guide the search for other
elements. The hard part is insertion. If the new element e belongs between a[i 1] and
a[i] we somehow have to squeeze it in. In a file card box we make room by pushing
other cards away. Suppose a[ j + 1] is a free entry of the array to the right of a[i]. Perhaps a[ j + 1] previously stored a removed element or it is some free space at the end
of the array. We can make room for element e in a[i] by moving the elements in a[i.. j]
one position to the right. Figure 7.1 gives an example. Unfortunately, this shifting
can be very time consuming. For example, if we insert n elements with decreasing
insert(10)
2
11 13 17 19 19 23
shift
10 11 13 17 19 23
Figure 7.1: A representation of the sorted sequence h2, 3, 5, 7, 11, 13, 17, 19i using a
sparse table. Situation before and after a 10 is inserted.
124
Sorted Sequences
11
13
17
19
Figure 7.2: A sorted sequence as a doubly linked list plus a navigation data structure.
key value, we always have to shift all the other elements so that a total of n(n 1)/2
elements are moved. But why does the shifting technique work well with file cards?
The trick is that one leaves some empty space well dispersed over the entire file. In the
array implementation, we can achieve that by leaving array cells empty. Binary search
will still work as long as we make sure that empty cells carry meaningful keys, e.g.,
the keys of the next nonempty cell. Itai et al. [45] have developed this idea of sparse
tables into a complete data structure including rebalancing operations when things get
too tight somewhere. One gets constant amortized insertion time on the average,
i.e., if insertions happen everywhere
with equal probability. In the worst case, the best
known strategies give O log2 n amortized insertion time into a data structure with
n elements. This is much slower than searching and this is the reason why we will
follow another approach for now. But look at Exercise 7.10 for a refinement of sparse
tables.
Formally, we want to maintain a sorted sequence, i.e., a sequence of Elements
sorted by their Key value. The three basic operations we want to support are
Function locate(k : Key) return min {e M : e k}
Procedure insert(e : Element) M:= M {e}
Procedure remove(k : Key) M:= M \ {e M : key(e) = k}
where M is the set of elements stored in the sequence. It will turn out that these
basic operations can be implemented to run in time O (logn). Throughout this section,
n denotes the size of the sequence. It is instructive to compare this operation set
with previous data structures we have seen. Sorted sequences are more flexible than
sorted arrays because they efficiently support insert and remove. Sorted sequences are
slower but also more powerful than hash tables since locate also works when there is
no element with key k in M. Priority queues are a special case of sorted sequences
because they can only locate and remove the smallest element.
Most operations we know from doubly linked lists (cf. Section 3.2.1) can also be
Sorted Sequences
125
implemented efficiently for sorted sequences. Indeed, our basic implementation will
represent sorted sequences as a sorted doubly linked list with an additional structure
supporting locate. Figure 7.2 illustrates this approach. Even the dummy header element we used for doubly linked lists has a useful role here: We define the result of
locate(k) as the handle of the smallest list item e k or the dummy item if k is larger
than any element of the list, i.e., the dummy item is treated as if it stores an element
with infinite key value. With the linked list representation we inherit constant time
implementations for first, last, succ, and pred. We will see constant amortized time
implementations for remove(h : Handle), insertBefore, and insertAfter and logarithmic time algorithms for concatenating and splitting sorted sequences. The indexing
operator [] and finding the position of an element in the sequence also takes logarithmic time.
Before we delve into implementation details, let us look at a few concrete applications where sorted sequences are important because all three basic operations locate,
insert, and remove are needed frequently.
Sweep-Line Algorithms: [todo: einheitlich myparagraph mit kleinerem Abstand] Assume you hava a set of horizontal and vertical line segments in the plane =
and want to find all points where two segments intersect. A sweep line algorithm
moves a vertical line over the plane from left to right and maintains the set of horizontal lines that intersect the sweep line in sorted sequence s. When the left endpoint
of a horizontal segment is reached, it is inserted into s and when its right endpoint is
reached, it is removed from s. When a vertical line segment is reached at position x
that spans the vertical range [y, y0 ], we call s.locate(y) and scan s until we reach key
y0 . All the horizontal line segments discovered during this scan define an intersection.
The sweeping algorithm can be generalized for arbitrary line segments [11], curved
objects, and many other geometric problems[cite sth?].
=
126
Sorted Sequences
Data Base Indexes: A variant of the (a, b)-tree data structure explained in Section 7.2 is perhaps the most important data structure used in data bases.
The most popular way to accelerate locate is using a search tree. We introduce
search tree algorithms in three steps. As a warm-up, Section 7.1 introduces a simple
variant of binary search trees that allow locate in O (logn) time under certain circumstances. Since binary search trees are somewhat difficult to maintain under insertions
and removals, we switch to a generalization, (a, b)-trees that allows search tree nodes
of larger degree. Section 7.2 explains how (a, b)-trees can be used to implement all
three basic operations in logarthmic worst case time. Search trees can be augmented
with additional mechanisms that support further operations using techniques introduced in Section 7.3.
= [already in intro?]
Navigating a search tree is a bit like asking your way around a foreign city. At
every street corner you show the address to somebody. But since you do not speak the
language, she can only point in the right direction. You follow that direction and at
the next corner you have to ask again.
More concretely, a search tree is a tree whose leaves store the elements of the
sorted sequence.1 To speed up locating a key k, we start at the root of the tree and
traverse the unique path to the leaf we are looking for. It remains to explain how we
find the correct path. To this end, the nonleaf nodes of a search tree store keys that
guide the search. In a binary search tree that has n 2 leaves, a nonleaf node has
exactly two children a left child and a right child. Search is guided by one splitter
key s for each nonleaf node. The elements reachable through the left subtree have
keys k s. All the elements that are larger than s are reachable via the right subtree.
With these definitions in place, it is clear how to ask for the way. Go left if k s.
Otherwise go right. Figure 7.5 gives an example. The length of a path from the root
the a node of the tree is called the depth of this node. The maximum depth of a leaf is
the height of the tree. The height therefore gives us the maximum number of search
steps needed to locate a leaf.
Exercise 7.1 Prove that a binary search tree with n 2 leaves can be arranged such
that it has height dlogne.
A search tree with height dlog ne is perfectly balanced. Compared to the (n) time
needed for scanning a list, this is a dramatic improvement. The bad news is that it is
1 There
is also a variant of search trees where the elements are stored in all nodes of the tree.
insert e
u
Figure 7.3: Naive insertion into a binary search tree. A triangles indicates an entire
subtree.
insert 17
19
17
17
19
17
17
13
13
19
13
19
insert 11
insert 13
19
19
127
11
17
19
11
13
17
19
128
Sorted Sequences
7
rotate right
13
y
2
2
11
7
11
19
13
17
19
rotate left
Figure 7.5: Left: Sequence h2, 3, 5, 7, 11, 13, 17, 19i represented by a binary search
tree. Right: Rotation of a binary search tree.
height=2
2 3
Exercise 7.3 Prove that an (a, b)-tree with n 2 leaves has height at most bloga n/2c+
1. Also prove that this bound is tight: For every height h give an (a, b)-tree with height
h = 1 + loga n/2.
Searching an (a, b)-tree is only slightly more complicated than searching a binary
tree. Instead of performing a single comparison at a nonleaf node, we have to find the
correct child among up to b choices. If the keys in the node are stored in sorted order,
we can do that using binary search, i.e., we need at most dlogbe comparisons for each
node we are visiting.
5 17
7 11 13
129
17
11
13
19
17
19
Figure 7.6: Sequence h2, 3, 5, 7, 11, 13, 17, 19i represented by a (2, 4)-tree.
compared to a pointer based implementation? Compare search in an implicit binary
tree to binary search in a sorted array. We claim that the implicit tree might be slightly
faster. Why? Try it.
7 11 13
h=1
Class ABTree(a 1 : ,b 2a 1 :
`=hi : List of Element
r : ABItem(hi, h`.head()i)
height=1 :
k=12
h>1
12
13
//
r
) of Element
//
k0
130
Sorted Sequences
// Example:
// h2, 3, 5i.insert(12)
Procedure ABTree::insert(e : Element)
(k,t):= r.insertRec(e, height, `)
//
if t 6= null then
// root was split
r:= new ABItem(hki, hr,ti)
height++
k=3, t=
2 3 5
5 12
3
12
//
5 12
12
131
dummy item is the rightmost leaf in the search tree. Hence, there is no need to treat
the special case of root degree 0 and a handle of the dummy item can serve as a return
value when locating a key larger than all values in the sequence.
To insert an element e, the routine in Figure 7.8 first descends the tree recursively
to find the smallest sequence element e0 that is not smaller than e. If e and e0 have
equal keys, e replaces e0 . Otherwise, e is inserted into the sorted list ` before e0 . If e0
was the i-th child c[i] of its parent node v then e will become the new c[i] and key(e)
becomes the corresponding splitter element s[i]. The old children c[i..d] and their
corresponding splitters s[i..d 1] are shifted one position to the right.
The difficult part is what to do when a node v already had degree d = b and now
would get degree b+1. Let s0 denote the splitters of this illegal node and c0 its children.
The solution is to split v in the middle. Let d = d(b + 1)/2e denote the new degree of
v. A new node t is allocated that accommodates the b + 1 d leftmost child pointers
c0 [1..b + 1 d] and the corresponding keys s0 [1..b d]. The old node v keeps the d
rightmost child pointers c0 [b + 2 d..b + 1] and the corresponding splitters s0 [b + 2
d..b].
The leftover middle key k = s0 [b+1d] is an upper bound for the keys reachable
from t now. Key k and the pointer to t is needed in the predecessor u of v. The situation
in u is analogous to the situation in v before the insertion: if v was the ith child of u,
t is displacing it to the right. Now t becomes the ith child and k is inserted as the
i-th splitter. This may cause a split again etc. until some ancestor of v has room to
accommodate the new child or until the root is split.
In the latter case, we allocate a new root node pointing to the two fragments of the
old root. This is the only situation where the height of the tree can increase. In this
case, the depth of all leaves increases by one, i.e., we maintain the invariant that all
leaves have the same depth. Since the height of the tree is O (logn) (cf. Exercise 7.3),
we get a worst case execution time of O (logn) for insert.
Exercise 7.4 It is tempting to streamline insert by calling locate to replace the initial descent of the tree. Why does this not work with our representation of the data
structure?2
Exercise 7.5 Prove that for a 2 and b 2a 1 the nodes v and t resulting from
splitting a node of degree b + 1 have degree between a and b.
[consistently concat, concatenate ?] The basic idea behind remove is similar =
to insertion it locates the element to be removed, removes it from the sorted list,
and on the way back up repairs possible violations of invariants. Figure 7.9 gives
2
Note that this approach becomes the method of choice when the more elaborate representation from
Section 7.4.1 is used.
132
Sorted Sequences
//
2
2 3
...
k
5
//
i
s x y z
c
//
a b c d
i
x z
133
the invariant. There are two cases. If the neighbor has degree larger than a, we can
balance the degrees by transferring some nodes from the neighbor. If the neighbor has
degree a, balancing cannot help since both nodes together have only 2a 1 children
so that we cannot give a children to both of them. However, in this case we can fuse
them to a single node since the requirement b 2a 1 ensures that the fused node has
degree at most b.
To fuse a node c[i] with its right neighbor c[i + 1] we concatenate their children
arrays. To obtain the corresponding splitters, we need to place the splitter s[i] from
the parent between the splitter arrays. The fused node replaces c[i + 1], c[i] can be
deallocated, and c[i] together with the splitter s[i] can be removed from the parent
node.
Exercise 7.6 Suppose a node v has been produced by fusing two nodes as described
above. Prove that the ordering invariant is maintained: Elements e reachable through
child v.c[i] have keys v.s[i 1] < key(e) v.s[i] for 1 i v.d.
Balancing two neighbors works like first fusing them and then splitting the result
in an analogous way to the splitting operation performed during an insert.
Since fusing two nodes decreases the degree of their parent, the need to fuse or
balance might propagate up the tree. If the degree of the root drops to one this only
makes sense if the tree has height one and hence contains only a single element. Otherwise a root of degree one is deallocated an replaced by its sole child the height
of the tree decreases.
As for insert, the execution time of remove is proportional to the height of the tree
and hence logarithmic in the size of data structure. We can summarize the performance
of (a, b)-tree in the following theorem:
Theorem 7.1 For any constants 2 a and b 2a 1, (a, b)-trees support the operations insert, remove, and locate on n element sorted sequences in time O (logn).
Exercise 7.7 Give a more detailed implementation of locateLocally based on binary
search that needs at most dlogbe Key comparisons. Your code should avoid both
explicit use of infinite key values and special case treatments for extreme cases.
a c d
pseudocode. When a parent u notices that the degree of its child c[i] has dropped
to a 1, it combines this child with one of its neighbors c[i 1] or c[i + 1] to repair
Exercise 7.8 Suppose a = 2k and b = 2a. Show that (1 + 1k ) logn + 1 element comparisons suffice to execute a locate operation in an (a, b)-tree. Hint, it is not quite
sufficient to combine Exercise 7.3 with Exercise 7.7 since this would give you an
additional term +k.
*Exercise 7.9 (Red-Black Trees.) A red-black tree is a binary search tree where the
edges are colored either red or black. The black depth of a node v is the number of
black edges on the path from the root to v. The following invariants have to hold:
134
Sorted Sequences
may be more space efficient than a direct representation, in particular if keys are large.
135
search tree in Figure 7.6 will find the 5, subsequently outputs 7, 11, 13, and stops
when it sees the 17.
Build/Rebuild: Exercise 7.11 asks you to give an algorithm that converts a sorted
list or array into an (a, b)-tree in linear time. Even if we first have to sort an unsorted
list, this operation is much faster than inserting the elements one by one. We also get
a more compact data structure this way.
Exercise 7.11 Explain how to construct an (a, b)-tree from a sorted list in linear time.
Give the (2, 4)-tree that your routine yields for h1..17i. Finally, give the trees you get
after subsequently removing 4, 9, and 16.
Concatenation: Two (a, b)-trees s1 = hw, . . . , xi and s2 = hy, . . . , zi can be concatenated in time O (logmax(|s1 |, |s2 |)) if x < y. First, we remove the dummy item from
s1 and concatenate s1 .` and s2 .`. Now the main idea is to fuse the root of one tree
with a node of the other in tree in such a way that the resulting tree remains sorted
and balanced. If s1 .height s2 .height, we descend s1 .height s2 .height levels from
the root of s1 by following pointers to the rightmost children. The node v we reach is
then fused with the root of s2 . The required new splitter key is the largest key in s1 . If
the degree of v now exceeds b, v is split. From that point, the concatenation proceeds
like an insert operation propagating splits up the tree until the invariant is fulfilled or
a new root node is created. Finally, the lists s1 .` and s2 .` are concatenated. The case
for s1 .height < s2 .height is a mirror image. We descend s2 .height s1 .height levels
from the root of s2 by following pointers to the leftmost children. These operations
can be implemented to run in time O (1 + |s1.height s2 .height|) = O (log n). All in
all, a concat works in O (logn) time. Figure 7.10 gives an example.
s2 17
5:insert 5 17
s1
4:split
2 3 5
11 13
19
2 3
7 11 13
19
3:fuse
2
1:delete
11
13
17
19
11
13
17
19
2:concatenate
Figure 7.10: Concatenating (2, 4)-trees for h2, 3, 5, 7i and h11, 13, 17, 19i.
136
Sorted Sequences
2 3
13
11
19
13
17
13
19
5 7
11
11
17 19
13
137
Exercise 7.13 Give a sequence of n operations on (2, 3)-trees that requires (n logn)
split operations.
split <2.3.5.7.11.13.17.19> at 11
3
17
19
Figure 7.11: Splitting the (2, 4)-tree for h2, 3, 5, 7, 11, 13, 17, 19i from Figure 7.6
yields the subtrees shown on the left. Subsequently concatenating the trees surrounded
by the dashed lines leads to the (2, 4)-trees shown on the right side.
Splitting: A sorted sequence s = hw, . . . , x, y, . . . , zi can be split into sublists s1 =
hw, . . . , xi and s2 = hy, . . . , zi in time O (logn) by specifying the first element y of the
second list. Consider the path from the root to the leaf containing the splitting element y. We split each node v on this path into two nodes. Node v ` gets the children of
v that are to the left of the path and vr gets the children that are to the right of the path.
Some of these nodes may be empty. Each of the nonempty nodes can be viewed as
the root of an (a, b)-tree. Concatenating the left trees and a new dummy list element
yields the elements up to x. Concatenating hyi and the right trees yields the list of elements starting from y. We can do these O (log n) concatenations in total time O (logn)
by exploiting that the left trees have strictly decreasing height and the right trees have
strictly increasing height. By concatenating starting from the trees with least height,
= the total time (the sum of the height differences) is O (logn).[more gory details?]
Figure 7.11 gives an example.
Exercise 7.12 Explain how to delete a subsequence he s : e i from an (a, b)tree s in time O (logn).
We now show that the amortized complexity is much closer to the best case except
if b has the minimum feasible value of 2a 1. In Section 7.4.1 we will see variants of
insert and remove that turn out to have constant amortized complexity in the light of
the analysis below.
Theorem 7.2 Consider an (a, b)-tree with b 2a that is initially empty. For any
sequence of n insert or remove operations. the total number of split or fuse operations
is O (n).
Proof: We only give the proof for (2, 4)-trees and leave the generalization to Exercise 7.14. In contrast to the global insurance account that we used in Section 3.1 we
now use a very local way of accounting costs that can be viewed as a pebble game
using peanuts.[here or in priority queue section?] We pay one peanut for a split or =
fuse. We require a remove to pay one peanut and an insert to pay two peanuts. We
claim that this suffices to feed all the split and fuse operations. We impose an additional peanut invariant that requires nonleaf nodes to store peanuts according to the
following table:
degree
peanuts
Note that we have included the cases of degree 1 and 5 that violate the degree invariant
and hence are only temporary states of a node.
operand
balance:
or
operation cost
insert
split:
remove
fuse:
+ for split +
+ for fuse +
=leftover
peanut
for parent
for parent
Figure 7.12: The effect of (a, b)-tree operations on the peanut invariant.
Since creating the tree inserts the dummy item, we have to pay two peanuts to get
things going. After that we claim that the peanut invariant can be maintained. The
peanuts paid by an insert or remove operation suffice to maintain the peanut invariant
for the nonleaf node immediately above the affected list entries. A balance operation can only decrease the total number of peanuts needed at the two nodes that are
involved.
138
Sorted Sequences
A split operation is only performed on nodes of (temporary) degree five and results
in left node of degree three and a right node of degree two. The four peanuts stored
on the degree five node are spent as follows: One peanut is fed to the split operation
itself. Two peanuts are used to maintain the peanut invariant at the parent node. One
peanut is used to establish the peanut invariant of the newly created degree two node
to the left. No peanut is needed to maintain the peanut invariant of the old (right) node
that now has degree three.
A fuse operation fuses a degree one node with a degree two node into a degree
three node. The 2 + 1 = 3 peanuts available are used to feed one peanut to the fuse
operation itself and to maintain the peanut invariant of the parent with one peanut. 4
Figure 7.12 summarizes all peanut pushing transformations.
*Exercise 7.14 Generalize the proof for arbitrary b 2a. Show that n insert or
remove operations cause only O (n/(b 2a + 1)) fuse or split operations.
Exercise 7.15 (Weight balanced trees.) Consider the following variant of (a, b)-trees:
The node-by-node invariant d a is relaxed to the global invariant that the tree leads
to at least 2aheight1 elements. Remove does not perform any fuse or balance operations. Instead, the whole tree is rebuild using the routine from Exercise 7.11 when
the invariant is violated. Show that remove operations execute in O (log n) amortized
= time. [check]
139
140
Sorted Sequences
select 6th element
subtree
size
9
0+7>6 17
i=0
0+4<6
i=4
3
4+2>6 13
7 7
3 4
i=4
11 2 4+1<6
5 2
2 2
5
11
i=5
13
17
19 2
19
Figure 7.13: Selecting the 6th-smallest element from h2, 3, 5, 7, 11, 13, 17, 19i represented by a binary search tree.
size of the left subtree of t. If i + i0 k then we set t to its left successor. Otherwise
t is set to its right successor and i is incremented by i0 . When a leaf is reached, the
invariant ensures that the k-th element is reached. Figure 7.13 gives an example.
Exercise 7.21 Generalize the above selection algorithm for (a, b)-trees. Develop two
variants. One that needs time O (b loga n) and stores only the subtree size. Another
variant needs only time O (loga n) and stores d 1 sums of subtree sizes in a node of
degree d.
Exercise 7.22 Explain how to determine the rank of a sequence element with key k
in logarithmic time.
Exercise 7.23 A colleague suggests to support both logarithmic selection time and
constant amortized update time by combining the augmentations from Sections 7.4.1
and 7.4.2. What goes wrong?
141
of inserting a new list element and splitting a node are no longer the same from the
point of view of their parent.
For large b, locateLocally should use binary search. For small b, linear search
might be OK. Furthermore, we might want to have a specialized implementation for
small, fixed values of a and b that unrolls5 all the inner loops. Choosing b to be a
power of two might simplify this task.
Of course, the crucial question is how a and b should be chosen. Let us start with
the cost of locate. There are two kinds of operations that (indirectly) dominate the
execution time of locate: element comparisons (because they may cause branch mispredictions[needed anywhere else? mention ssssort in sort.tex?]6 ) and pointer =
dereferences (because they may cause cache faults). Exercise 7.8 indicates that element comparisons are minimized by choosing a as large as possible and b 2a should
be a power of two. Since the number of pointer dereferences is proportional to the
height of the tree (cf. Exercise 7.3), large values of a are also good for this measure.
Taking this reasoning to the extreme, we would get best performance for a n, i.e.,
a single sorted array. This is not so astonishing. By neglecting update operations, we
are likely to end up with a static search data structure looking best.
Insertions and deletions have the amortized cost of one locate plus a constant
number of node reorganizations (split, balance, or fuse) with cost O (b) each. We get
logarithmic amortized cost for update operations for b = O (logn). A more detailed
analysis [64, Section 3.5.3.2] would reveal that increasing b beyond 2a makes split and
fuse operations less frequent and thus saves expensive calls to the memory manager
associated with them. However, this measure has a slightly negative effect on the
performance of locate and it clearly increases space consumption. Hence, b should
remain close to 2a.
Finally, let us have a closer look at the role of cache faults. It is likely that (M/b)
nodes close to the root fit in the cache. Below that, every pointer dereference is associated with a cache miss, i.e., we will have about loga (bn/(M)) cache misses in a
cache of size M provided that a complete search tree node fits into a single cache block.
[some experiments?] Since cache blocks of processor caches start at addresses that =
are a multiple of the block size, it makes sense to align the starting address of search
tree nodes to a cache block, i.e., to make sure that they also start at an address that is
a multiple of the block size. Note that (a, b)-trees might well be more efficient than
binary search for large data sets because we may save a factor loga cache misses.
5 Unrolling a loop for i := 1 to K do body means replacing it by the straight line program
i
body 1 ,. . . ,bodyK . This saves the overhead for loop control and may give other opportunities for simplifications.
6 Modern microprocessors attempt to execute many (up to a hundred or so) instructions in parallel. This
works best if they come from a linear, predictable sequence of instructions. The branches in search trees
have a 50 % chance of going either way by design and hence are likely to disrupt this scheme. This leads to
large delays when many partially executed instructions have to be discarded.
142
Sorted Sequences
Very large search trees are stored on disks. Indeed, under the name BTrees [8],
(a, b)-tree are the working horse of indexing data structures in data bases. In that case,
internal nodes have a size of several KBytes. Furthermore, the linked list items are also
replaced by entire data blocks that store between a0 and b0 elements for appropriate
values of a0 and b0 (See also Exercise 3.19). These leaf blocks will then also be subject
to splitting, balancing and fusing operations. For example, assume we have a = 2 10 ,
the internal memory is large enough (a few MBytes) to cache the root and its children,
and data blocks store between 16 and 32 KBytes of data. Then two disk accesses are
sufficient to locate any element in a sorted sequence that takes 16 GBytes of storage.
Since putting elements into leaf blocks dramatically decreases the total space needed
for the internal nodes and makes it possible to perform very fast range queries, this
measure can also be useful for a cache efficient internal memory implementation.
However, note that update operations may now move an element in memory and thus
will invalidate element handles stored outside the data structure.
There are many more tricks for implementing (external memory) (a, b)-trees that
are beyond the scope of this book. Refer to [40] and [72, Chapters 2,14] for overviews.
Even from the augmentations discussed in Section 7.4 and the implementation tradeoffs discussed here you have hopefully learned that the optimal implementation of
sorted sequences does not exist but depends on the hardware and the operation mix
relevant for the actual application. We conjecture however, that in internal memory
(a, b)-trees with b = 2k = 2a = O (logn) augmented with parent pointers and a doubly
linked list of leaves will yield a sorted sequence data structure that supports a wide
range of operations efficiently.
143
Exercise 7.25 Explain how our implementation of (a, b)-trees can be generalized to
implement multisets. Element with identical key should be treated like a FIFO, i.e.,
remove(k) should remove the least recently inserted element with key k.
The most widespread implementation of sorted sequences in STL uses a variant of
red-black trees with parent pointers where elements are stored in all nodes rather than
in the leaves. None of the STL data types supports efficient splitting or concatenation
of sorted sequences.
LEDA offers a powerful interface sortseq that supports all important (and many
not so important) operations on sorted sequences including finger search, concatenation, and splitting. Using an implementation parameter, there is a choice between
(a, b)-trees, red-black trees, randomized search trees, weight balanced trees, and skip
lists.8 [todo: perhaps fix that problem in LEDA? Otherwise explain how to
declare this in LEDA. PS was not able to extract this info from the LEDA documentation.]
=
Java
The Java library java.util[check wording/typesetting in other chapters] offers the =
interface classes SortedMap and SortedSet which correspond to the STL classes set
and map respectively. There are implementation classes, namely TreeMap and TreeSet
respectively based on red-black trees.
The STL has four container classes set, map, multiset, and multimap. The prefix multi
means that there may be several elements with the same key. A map offers an arraylike interface. For example, someMap[k]:= x would insert or update the element with
key k and associated information x.
There is an entire zoo of sorted sequence data structures. If you just want to support insert, remove, and locate in logarithmic time, just about any of them might do.
Performance differences are often more dependent on implementation details than on
fundamental properties of the underlying data structures. We nevertheless want to
give you a glimpse on some of the more interesting species. However, this zoo displays many interesting specimens some of which have distinctive advantages for more
specialized applications.
The first sorted sequence data structure that supports insert, remove, and locate in
logarithmic time were AVL trees [1]. AVL trees are binary search trees which maintain
the invariant that the heights of the subtrees of a node differ by at most one. Since this
is a strong balancing condition, locate is probably a bit faster than in most competitors.
On the other hand, AVL trees do not support constant amortized update costs. Another
small disadvantage is that storing the heights of subtrees costs additional space. In
7
We are committing a slight oversimplification here since in practice one will use much smaller block
sizes for organizing the tree than for sorting.
8
Currently, the default implementation is a remarkably inefficient variant of skip lists. In most cases you
are better off choosing, e.g., (4, 8)-trees [27].
C++
144
Sorted Sequences
comparison, red-black trees have slightly higher cost for locate but they have faster
updates and the single color bit can often be squeezed in somewhere. For example,
pointers to items will always store even addresses so that their least significant bit
could be diverted to storing color information.
Splay trees [90] and some variants of randomized search trees [86] even work
without any additional information besides one key and two successor pointers. A
more interesting advantage of these data structures is their adaptivity to nonuniform
access frequencies. If an element e is accessed with probabilityp then these search
trees will over time be reshaped to allow an access to e in time O log 1p . This can be
shown to be asymptotically optimal for any comparison based data structure.
=
[sth about the advantages of weight balance?]
There are so many search tree data structures for sorted sequences that these two
terms are sometimes used as synonyms. However, there are equally interesting data
structures for sorted sequences that are not based on search trees. In the introduction,
we have already seen sorted arrays as a simple static data structure and sparse tables
[45] as a simple way to make sorted arrays dynamic. Together with the observation
from Exercise 7.10 [9] this yield an data structure which is asymptotically optimal in
an amortized sense. Moreover, this data structure is a crucial ingredient for a sorted
sequence data structure [9] which is cache oblivious [37], i.e., cache efficient on any
two levels of a memory hierarchy without even knowing the size of caches and cache
blocks. The other ingredient are cache oblivious static search trees [37] perfectly
balance binary search trees stored in an array such that any search path will exhibit
good cache locality in any cache. We describe the van Emde Boas layout used for this
k
purpose for the case that there are n = 22 leaves for some integer k: Store the top
2k1 levels of the tree in the beginning of the array. After that, store the 2 k1 subtrees
of depth 2k1 allocating consecutive blocks of memory for them. Recursively allocate
the resulting 1+2k1 subtrees. At least static cache oblivious search trees are practical
in the sense that they can outperform binary search in a sorted array.
Skip lists [80] are based on another very simple idea. The starting point is a sorted
linked list `. The tedious task of scanning ` during locate can be accelerated by producing a shorter list `0 that only contains some of the elements in `. If corresponding
elements of ` and `0 are linked, it suffices to scan `0 and only descend to ` when approaching the searched element. This idea can be iterated by building shorter and
shorter lists until only a single element remains in the highest level list. This data
structure supports all important operations efficiently in an expected sense. Randomness comes in because the decision which elements to lift to a higher level list is made
randomly. Skip lists are particularly well suited for supporting finger search.
Yet another familie of sorted sequence data structures comes into play when we no
longer consider keys as atomic objects that can only be compared. If keys are numbers
145
in binary representation, we get faster data structures using ideas similar to the fast
numeric sorting algorithms from Section 5.6. For example, sorted sequences with w
bit integer keys support all operations in time O (logw) [93, 66]. At least for 32 bit
keys these ideas bring considerable speedup also in practice [27]. Not astonishingly,
string keys are important. For example, suppose we want to adapt (a, b)-trees to use
variable length strings as keys. If we want to keep a fixed size for node objects, we
have to relax the condition on the minimal degree of a node. Two ideas can be used
to avoid storing long string keys in many nodes: common prefixes of keys need to
be stored only once, often in the parent nodes. Furthermore, it suffices to store the
distinguishing prefixes of keys in inner nodes, i.e., just enough characters to be able
to distinguish different keys in the current node. Taking these ideas to the extreme
we get tries [34][check], a search tree data structure specifically designed for strings =
keys: Tries are trees whose edges are labelled by characters or strings. The characters
along a root leaf path represent a key. Using appropriate data structures for the inner
nodes, a trie can be searched in time O (s) for a string of size s. [suffix tree and array
wrden zu weit fhren?]
=
We get a more radical generalization of the very idea of sorted sequences when
looking at more complex objects such as intervals or pointd in d-dimensional space.
We refer to textbooks on geometry for this wide subject [26][cite more books?].
=
Another interesting extension of the idea of sorted sequences is the concept of
persistence. A data structure is persistent if it allows us to update a data structure and
at the same time keep arbitrary old versions around. [more? what to cite?]
=
[what else]
=
146
Sorted Sequences
147
Chapter 8
Graph Representation
[Definition grundlegender Begriffe schon in Intro. Weitere Begriffe dort wo
sie das erste mal gebraucht werden. Zusaetzlich Zusammenfassung der Defs.
im Anhang.]
=
Nowadays scientific results are mostly available in the form of articles in journals,
conference proceedings, and on various web resources. These articles are not self
contained but they cite previous articles with related content. However, when you
read an article from 1975 with an interesting partial result, you often ask yourselves
what is the current state of the art. In particular, you would like to know which newer
papers cite the old paper. Projects like citeseer1 work on providing this functionality
by building a database of articles that efficiently support looking up articles citing a
given article.
We can view articles as the nodes in a directed graph where[which terms need
definition] an edge (u, v) means u cites v. In the paper representation, we know the =
outgoing edges of a node u (the articles cited by u) but not the incoming edges (the
articles citing u). We can see that even the most elementary operations on graphs can
be quite costly if we do not have the right representation.
This chapter gives an introduction into the various possibilities of graph representation in a computer. We mostly focus on directed graphs and assume that an
undirected graph G = (V, E) is represented in the same way as the (bi)directed graph
S
G0 = (V, {u,v}E {(u, v), (v, u)}). Figure 8.1 gives an example. Most of the presented
data structures also allow us to represent parallel edges and self-loops. The most
important question we have to ask ourselves is what kind of operations we want to
support.
Accessing associated information: Given a node or an edge, we want to access
information directly associated to it, e.g., the edge weight. In many representations
1 http://citeseer.nj.nec.com/cs
Update: Sometimes we want to add or remove nodes or edges one at a time. Exactly
what kind of updates are desired is what can make the representation of graphs a
nontrivial topic.
=
[somewhere a more comprehensive overview of operations?]
3
4
2 4 1 3 4 2 4 1 2 3
1
m
Construction, Conversion and Output: The representation most suitable for the
algorithmic problem at hand is not always the representation given initially. This is
not a big problem since most graph representations can be translated into each other
= in linear time. However, we will see examples[do we? in mst chapter?], where
conversion overhead dominates the execution time.
0
1
0
1
4 2
1
0
1
1
0
1
0
1
1
1
1
0
Edge Queries: Given a pair of nodes (u, v) we may want to know whether this edge
is in the graph. This can always be implemented using a hash table but we may want
to have something even faster. A more specialized but important query is to find the
reverse edge (v, u) of a directed edge (u, v) E if it exists. This operation can be
implemented by storing additional pointers connecting edges with their reverse edge.
1
n
1 3 6 8 11
Navigation: Given a node we want to access its outgoing edges. It turns out that this
operation is at the heart of most graph algorithms. As we have seen in the scientific
article example, we sometimes also want to know the incoming edges.
149
nodes and edges are objects and we can directly store this information as a member
of these objects. If not otherwise mentioned, we assume that V = {1, . . . , n} so that
information associated with nodes can be stored in arrays. When all else fails, we can
always store node or edge information in a hash table. Hence, elementary accesses
can always be implemented to run in constant time. In the remainder of this book we
abstract from these very similar options by using data types NodeArray and EdgeArray
to indicate an array-like data strucure that can be addressed by node or edge labels
respectively.
Graph Representation
148
Figure 8.1: The top row shows an undirected graph, its interpretation as a bidirected
graph, and representations of this bidirected graph by adjacency array and by adjancency list. The bottom part shows a direct representation of the undirected graph using
linked edge objects and the adjacency matrix.
150
Graph Representation
O(1) auxiliary space. Hint: View the problem as the task to sort edges by their source
node and adapt the integer sorting algorithm from Figure 5.14.
151
v. The bottom part of Figure 8.1 gives an example. A node u now stores a pointer to
one incident edge e = {u, w}. To find the identity of w we have to inspect the node
information stored at e. To find further nodes incident to u we have to inspect the
adjacency information with respect to u which is also stored at e. Note that in order
to find the right node or adjacency information we either need to compare u with the
node information stored at e, or we need a bit stored with u that tells us where to look.
(We can then use this bit as an index into two-element arrays stored at e.)
152
Graph Representation
is completely defined by the two parameters k and ` . Figure 8.2 shows G 3,4 . Edge
weights could be stored in two two-dimensional arrays, one for the vertical edges and
one for the horizontal edges.
=
[refs to examples for geometric graphs?]
Exercise 8.6 Nodes of interval graphs can be represented by real intervals [v l , vr ].
Two nodes are adjancent iff their intervals overlap. You may assume that these intervals are part of the input.
a) Devise a representation of interval graphs that needs O (n) storage and supports navigation in constant expected time. You may use preprocessing time
O (n logn).
2 In practice, the situation is more complicated since we rarely get disconnected Matrices. Still, more
sophisticated graph theoretic concepts like cuts can be helpful for exploiting the structure of the matrix.
Figure 8.2: The grid graph G34 (left) and an interval graph with 5 nodes and 6 edges
(right).
153
b) As the first part but you additionally want to support node insertion and removal
in time O (log n).
c) Devise an algorithm using your data structure that decides in linear time whether
the interval graph is connected.
154
Graph Representation
C++ library for our unbounded array data structure from Section 3.1, we only have to
implement the data type iterator which is basically an abstraction of a pointer into the
sequence.
=
[sth on standard formats?]
C++
LEDA [67] has a very powerful graph data type that is space consuming but supports
a large variety of operations in constant time. It also supports several more space
efficient adjacency array based representations.
The Boost graph library 3 emphasizes a consistent separation of representation
and interface. In particular, by implementing the Boost interface, a user can run Boost
graph algorithms on her own (legacy) graph representation. With adjacency list Boost
also has its own graph representatotion class. A large number of parameters allow to
choose between variants of graphs (directed, undirected, multigraph) type of available
navigation (in-edges, out-edges,. . . ) and representations of vertex and edge sequences
(arrays, linked lists, sorted sequences,. . . ). However, it should be noted that even the
array representation is not what we call adjacency array representation here because
one array is used for the adjacent edges of each vertex. [some qualified criticism?
= Is this still easy to use?]
Java
addition, the graph data structure should efficiently support iterating over the edges
along a face of the graph a cycle that does not enclose any other nodes.
[move bipartite and hypergraphs into intro chapter?] Bipartite graphs are =
special graphs where the node set V = L R can be decomposed into two disjoint
subsets L and R so that edges are only between nodes in L and R.
Hypergraphs H = (V, E) are generalizations of graphs where edges can connect
more than two nodes. Often hypergraphs are represented as the bipartite graph B H =
(E V, {(e, v) : e E, v e}).
Cayley graphs 5 are an interesting example for implicitly defined graphs. Recall
that a set V is a group if it has a associative multiplication operation , a neutral
element, and a multiplicative inverse operation. The Cayley graph (V, E) with respect
to a set S V has the edge set {(u, u s) : u V, s S}. Cayley graphs are useful
because graph theoretic concepts can be useful in group theory. On the other hand,
group theory yields concise definitions of many graphs with interesting properties.
For example, Cayley graphs have been proposed as the interconnection networks for
parallel computers [7].
In this book we have concentrated on convenient data structures for processing
graphs. There is also a lot work on storing graphs in a flexible, portable, and space
efficient way. Significant compression is possible if we have a priori information on
the graphs. For example, the edges of a triangulation of n points in the plan can be
representd with about 6n bits [24, 82].
[OBDDs??? or is that too difficult] [what else?]
=
=
155
5 We
156
Graph Representation
157
Chapter 9
Graph Traversal
Suppose you are working in the traffic planning department of a nice small town with
a medieval old town full of nooks and crannies. An unholy coalition of the shop
owners who want more streeside parking opportunities and the green party that wants
to discourage car traffic all together has decided to turn most streets into one-way
streets. To avoid the worst, you want to be able to quickly find out whether the current
plan is at least feasible in the sense that one can still drive from every point in the
town to every other point.
Using the terminology of graph theory, the above problem asks whether the directed graph formed by the streets is strongly connected. The same question is important in many other applications. For example, if we have a communication network
with unidirectional channels (e.g., radio transceivers with different ranges) we want
to know who can communicate with whom. Bidirectional communication is possible
within components of the graph that are strongly connected. Computing strongly connected components (SCCs) is also an important subroutine. For example, if we ask
for the minimal number of new edges in order to make a graph G = (V, E) strongly
connected, we can regard each SCC as a single node of the smaller graph G = (V 0 , E 0 )
whose nodes are the SCCs of G and with the edge set
E = (u0 , v0 ) V 0 V 0 : (u, v) E : u u0 v v0 .
In G0 we have contracted SCCs to a single node. Since G0 cannot contain any directed
cycles (otherwise we could have build larger SCCs), it is a directed acyclic graph
(DAG). This might further simplify our task. Exercise 9.9 gives an example, where a
graph theoretic problem turns out to be easy since it is easy to solve on SCCs and easy
to solve on DAGs.
158
Graph Traversal
159
forward
s
tree
backward
cross
Figure 9.1: Classification of graph edges into tree edges, forward edges, backward
edges, and cross edges.
We present a simple and efficient algorithm for computing SCCs in Section 9.2.2.
The algorithm systematically explores the graph, inspecting each edge exactly once
and thus gathers global information. Many graph problems can be solved using a
small number of basic traversal strategies. We look at the two most important of
them: Breadth first search in Section 9.1 and depth first search in Section 9.2. Both algorithms have in common that they construct trees that provide paths from a root node
s to all nodes reachable from s. These trees can be used to distinguish between four
classes of edges (u, v) in the graph: The tree edges themselves, backward edges leading to an ancestor of v on the path, forward edges leading to an indirect descendent of
v, and cross edges that connect two different branches of the tree. Figure 9.1 illustrates
the four types of edges. This classification helps us to gather global information about
the graph.
160
Graph Traversal
make them very useful. Figure 9.3 therefore does not just give one algorithm but an
algorithm template. By filling in the routines init, root, traverse, and backtrack, we
can solve several interesting problems.
Lemma 9.1 The nodes on the DFS recursion stack are sorted with respect to .
init
foreach s V do
if s is not marked then
mark s
root(s)
recursiveDFS(s, s)
// finish v u
Finishing times have an even more useful property for directed acyclic graphs:
Proof: We consider for each edge e = (v, w) the event that traverse(v, w) is called.
If w is already finished, v will finish later and hence, finishTime[v] > finishTime[w].
Exercise 9.5 asks you to prove that this case covers forward edges and cross edges.
Similarly, if e is a tree edge, recursiveDFS[w] will be called immediately and w gets
a smaller finishing time than v. Finally, backward edges are illegal for DAGs because
together with the tree edges leading from w to v we get a cycle.
161
An ordering of the nodes in a DAG by decreasing finishing times is known as topological sorting. Many problems on DAGs can be solved very efficiently by iterating
through the nodes in topological order. For example, in Section 10.3 we will get a very
simple algorithm for computing shortest paths in acyclic graphs.[more examples?] =
Figure 9.3: A template for depth first search of a graph G = (V, E).
Exercise 9.4 Give a nonrecursive formulation of DFS. You will need to maintain a
stack of unexplored nodes and for each node on the stack you have to keep track of
the edges that have already been traversed.
Exercise 9.5 (Classification of Edges) Prove the following relations between edge
types and possible predicates when edge (v, w) is traversed. Explain how to compute
each of the predicates in constant time. (You may have to introduce additional flags
for each node that are set during the execution of DFS.)
w marked? w finished? v w w on recursion stack?
type
tree
no
no
yes
no
yes
yes
yes
no
forward
backward
yes
no
no
yes
cross
yes
yes
no
no
v
u
dfsNum records the order in which nodes are marked and finishTime records the order
in which nodes are finished. Both numberings are frequently used to define orderings
of the nodes. In this chapter we will define based on dfsNum, i.e., u v
dfsNum[u] < dfsNum[v]. Later we will need the following invariant of DFS:
Exercise 9.6 (Topological sorting) Design a DFS based algorithm that outputs the
nodes as a sequence in topological order if G is a DAG. Otherwise it should output a
cycle.
Exercise 9.7 Design a BFS based algorithm for topological sorting.
[rather use markTime, finishTime?]
Exercise 9.8 (Nesting property of DFS numbers and finishing times) Show that
6 u, v V : dfsNum[u] < dfsNum[v] < finishTime[u] < finishTime[v]
162
Graph Traversal
traverse(c,a)
a b c d e f g h i j k
traverse(e,g) traverse(e,h) traverse(h,i)
traverse(i,e)
traverse(i,j) traverse(j,c)
backtrack(b,c) backtrack(a,b)
traverse(j,k)
traverse(k,d)
backtrack(a,a)
backtrack(j,k) backtrack(i,j) backtrack(h,i)
backtrack(e,h) backtrack(d,e)
root(d) traverse(d,e) traverse(e,f) traverse(f,g)
backtrack(f,g)
unmarked
backtrack(e,f)
marked
backtrack(d,d)
finished
nonrepresentative node
representative node
163
We now come back to the problem posed at the beginning of this chapter. Computing
connected components of an undirected graph is easy. Exercise 9.3 outlines how to do
it using BFS and adapting this idea to DFS is equally simple. For directed graphs it
is not sufficient to find a path from u to v in order to conclude that u and v are in the
same SCC. Rather we have to find a directed cycle containing both u and v.
Our approach is to find the SCCs using a single pass of DFS. At any time during DFS, we will maintain a representation of all the SCCs of the subgraph defined
by marked nodes and traversed edges. We call such an SCC open if it contains any
unfinished nodes and closed otherwise. We call a node open if it belongs to an open
component and closed if it belongs to a closed component. Figure 9.4 gives an example for the development of open and closed SCCs during DFS. DFS is so well suited
for computing SCCs because it maintains the following invariants at all times:
a
closed SCC
open SCC
Figure 9.4: An example for the development of open and closed SCCs during DFS.
164
init
component=h, . . ., i : NodeArray of Node
oReps=hi : Stack of Node
oNodes=hi : Stack of Node
Graph Traversal
// SCC representatives
// representatives of open SCCs
// all nodes in open SCCs
root(s)
oReps.push(s)
oNodes.push(s)
traverse(v, w)
if (v, w) is a tree edge then
oReps.push(w)
oNodes.push(w)
else if w oNodes then
while w oReps.top do
oReps.pop
backtrack(u, v)
if v = oReps.top then
oReps.pop
repeat
w:= oNodes.pop
component[w]:= v
until w = v
Figure 9.5: An instantiation of the DFS template that computes strongly connected
components of a graph G = (V, E).
The invariants guaranteed by Lemma 9.3 come for free with DFS without any
additional implementation measures. All what remains to be done is to design data
structures that keep track of open components and allow us to record information on
closed components. The first node marked in any open or closed component is made
its representative. For a node v in a closed component, we record its representative
in component[v]. This will be our output. Since the sequence of open components
only changes at its end, it can be managed using stacks. We maintain a stack oReps
of representatives of the open components. A second stack oNodes stores all the open
nodes ordered by . By Invariant 2, the sequence of open components will correspond
to intervals of nodes in oNodes in the same order.
Figure 9.5 gives pseudocode. When a new root is marked or a tree edge is ex-
165
plored, a new single node open component is created by pushing this node on both
stacks. When a cycle of open components is created, these components can be merged
by popping representatives off oReps while the top representative is not left of the node
w closing the cycle. Since an SCC S is represented by its node v with smallest dfsNum,
S is closed when v is finished. In that case, all nodes of S are stored on top of oNodes.
Operation backtrack then pops v from oReps and the nodes w S from oNodes setting
their component to the representative v.
Note that the test w oNodes in traverse can be done in constant time by keeping
a flag for open nodes that is set when a node is first marked and that is reset when its
component is closed. Furthermore, the while and the repeat loop can make at most n
iterations during the entire execution of the algorithm since each node is pushed on
the stacks exactly once. Hence, the exeuction time of the algorithm is O (m + n). We
get the following theorem:
Theorem 9.4 The DFS based algorithm in Figure 9.5 computes strongly connected
components in time O (m + n).
Exercises
*Exercise 9.9 (transitive closure) The transitive closure G = (V, E )[check notation] of a graph G = (V, E) has an edge (u, v) E whenever there is a path from u to =
v in E. Design an algorithm that computes E in time O (n + |E |). Hint: First solve
the problem for the DAG of SCCs of G. Also note that S S 0 E if S and S0 are
SCCs connected by an edge.
Exercise 9.10 (2-edge connected components) Two nodes of an undirected graph
are in the same 2-edge connected component (2ECC) iff they lie on a cycle. Show that
the SCC algorithm from Figure 9.5 computes 2-edge connected components. Hint:
first show that DFS of an undirected graph never produces any cross edges.
Exercise 9.11 (biconnected components) Two edges of an undirected graph are in
the same biconnected component (BCC) iff they lie on a simple cycle. Design an
algorithm that computes biconnected components using a single pass of DFS. You can
use an analogous approach as for SCC and 2ECC but you need to make adaptations
since BCCs are defined for edges rather than nodes.[explain why important?]
=
166
Graph Traversal
depth d and nodes at depth d + 1, mainly because it allows a simple loop invariant that makes correctness immediately evident. However, our formulation is also likely to be somewhat more efficient. If q and q′ are organized as stacks, we get fewer cache faults than with a queue, in particular if the nodes of a layer do not quite fit into the cache. Memory management becomes very simple and efficient by allocating just a single array a of n nodes for both stacks q and q′: one stack grows from a[1] to the right and the other grows from a[n] towards smaller indices. When switching to the next layer, the two memory areas switch their roles.
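A C++ sketch of this single-array layout (our own naming; the graph is again assumed to be given as adjacency lists, and 0-based indexing replaces a[1..n]):

#include <vector>

// Sketch of layered BFS with both node stacks packed into one array of n entries.
// The current layer grows from one end, the next layer from the other end; after
// each layer the two roles are swapped. Returns BFS depths (-1 = unreached).
std::vector<int> bfsDepths(const std::vector<std::vector<int>>& adj, int s) {
    int n = (int)adj.size();
    std::vector<int> depth(n, -1);
    std::vector<int> a(n);             // shared storage for both stacks
    int lo = 0, hi = n;                // a[0..lo) = left stack, a[hi..n) = right stack
    bool currentOnLeft = true;         // which stack holds the current layer
    int d = 0;
    a[lo++] = s; depth[s] = 0;
    while ((currentOnLeft && lo > 0) || (!currentOnLeft && hi < n)) {
        int v = currentOnLeft ? a[--lo] : a[hi++];          // pop from the current layer
        for (int w : adj[v]) {
            if (depth[w] != -1) continue;
            depth[w] = d + 1;
            if (currentOnLeft) a[--hi] = w; else a[lo++] = w;   // push on the *other* stack
        }
        if ((currentOnLeft && lo == 0) || (!currentOnLeft && hi == n)) {
            currentOnLeft = !currentOnLeft;                 // current layer exhausted:
            ++d;                                            // switch roles
        }
    }
    return depth;
}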
[unify marks and dfsnumbers]
[graph iterators]
C++
Chapter 10
Shortest Paths
The shortest, quickest, or cheapest path problem is ubiquitous; you solve it all the time when traveling. [Give more examples] The most common variants are:
– the shortest path between two given nodes s and t (single source, single sink),
– the shortest paths from a given node s to all other nodes (single source),
– the shortest paths between any pair of nodes (all-pairs problem).
10.1 Introduction
Abstractly, we are given a directed graph G = (V, E) and a cost function c that maps
edges to costs. For simplicity, we will assume that edge costs are real numbers,
although most of the algorithms presented in this chapter will work under weaker
assumptions (see Exercise 10.13); some of the algorithms require edge costs to be
integers. We extend the cost function to paths in the natural way: the cost of a path is the sum of the costs of its constituent edges, i.e., if p = [e_1, e_2, . . . , e_k] then c(p) = Σ_{1≤i≤k} c(e_i). The empty path has cost zero. For two nodes v and w, we define
[todo: redraw]
Figure 10.1: Single source shortest path problem: source node s = fat blue node; the yellow node has distance +∞ from s; blue nodes have finite distance from s; the square blue node has distance 1 from s (there are paths of length 1, 4, 9, . . . ); green nodes have distance −∞ from s.
the least cost path distance μ(v, w) from v to w as the minimal cost of any path from v to w:
μ(v, w) := inf {c(p) : p is a path from v to w} ∈ ℝ ∪ {−∞, +∞} .
The distance is +∞ if there is no path from v to w, it is −∞ if there are paths of arbitrarily small cost¹, and it is a proper number otherwise; cf. Figure 10.1 for an example. A path from v to w realizing μ(v, w) is called a shortest or least cost path from v to w.
The following lemma tells us when shortest paths exist.
[PS: avoid proofs early in the chapters? These properties we could just state here and give similar arguments in a less formal setting.]
Lemma 10.1 (Properties of Shortest Path Distances)
a) μ(s, v) = +∞ iff v is not reachable from s.
b) μ(s, v) = −∞ iff v is reachable from a negative cycle C which in turn is reachable from s.
c) If −∞ < μ(s, v) < +∞ then μ(s, v) is the cost of a simple path from s to v.
Proof: If v is not reachable from s, μ(s, v) = +∞, and if v is reachable from s, μ(s, v) < +∞. This proves part a). For part b), assume first that v is reachable from a negative cycle C which in turn is reachable from s. Let p be a path from s to some node u on C and let q be a path from u to v. Consider the paths p^(i) which first use p to go from s to u, then go around the cycle i times, and finally follow q from u to v. The cost of p^(i) is c(p) + i · c(C) + c(q) and hence c(p^(i+1)) < c(p^(i)). Thus there are paths of arbitrarily small cost from s to v and hence μ(s, v) = −∞. This proves part b) in the direction from right to left.
For the direction from left to right, let C be the minimal cost of a simple path from s to v and assume that there is a path p from s to v of cost strictly less than C. Then p is non-simple and hence we can write p = p_1 ◦ p_2 ◦ p_3, where p_2 is a cycle and p_1 ◦ p_3 is a simple path. Then
C ≤ c(p_1 ◦ p_3) = c(p) − c(p_2) < C
and hence c(p_2) < 0. Thus v is reachable from a negative cycle which in turn is reachable from s.
We turn to part c). If −∞ < μ(s, v) < +∞, then v is reachable from s but not reachable through a negative cycle, by parts a) and b). Let p be any path from s to v. We decompose p as in the preceding paragraph. Then c(p_1 ◦ p_3) = c(p) − c(p_2) ≤ c(p) since the cost of the cycle p_2 must be non-negative. Thus for every path from s to v there is a simple path from s to v of no larger cost. This proves part c).
Exercise 10.1 Let p be a shortest path from u to v for some nodes u and v, and let q be a subpath of p. Show that q is a shortest path from its source node to its target node.
Exercise 10.2 Assume that all nodes are reachable from s and that there are no negative cycles. Show that there is an n-node tree T rooted at s such that all tree paths are shortest paths. Hint: Assume first that shortest paths are unique and consider the subgraph T consisting of all shortest paths starting at s. Use the preceding exercise to prove that T is a tree. Extend the argument to the case when shortest paths are not unique.
The natural way to learn about distances is to propagate distance information
across edges. If there is a path from s to u of cost d(u) and e = (u, v) is an edge
out of u, then there is a path from s to v of cost d(u) + c(e). If this cost is smaller than
the best cost previously known, we remember that the currently best way to reach v is
through e. Remembering the last edges of shortest paths will allow us to trace shortest
paths.
More precisely, we maintain for every node v a label ⟨d(v), in(v)⟩, where d(v) is the cost of the currently best known path from s to v and in(v) is the last edge of this path. [PS: In a real implementation you store a predecessor node rather than an edge. Note this somewhere?] We call d(v) the tentative distance of v. If no path from s to v is known yet, d(v) = ∞ and in(v) has the special value ⊥. If the currently best path is the empty path (this is only possible for v = s), d(v) = 0 and in(v) = ⊥.
A function relax(e : Edge) is used to propagate distance information.
Procedure relax(e = (u, v) : Edge)
    if d(u) + c(e) < d(v) then set the label of v to ⟨d(u) + c(e), e⟩
At the beginning of a shortest path computation we know very little. There is a path of length zero from s to s and no other paths are known.
Initialization of the Single Source Shortest Path Calculation:
⟨d(s), in(s)⟩ := ⟨0, ⊥⟩
⟨d(v), in(v)⟩ := ⟨+∞, ⊥⟩ for v ≠ s
Once the node labels are initialized, we propagate distance information.
Generic Single Source Algorithm
initialize the labels as described above
relax edges until d(v) = μ(s, v) for all v
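A minimal C++ sketch of the labels, the initialization, and relax (the flat edge array and the struct name Labels are our own illustration, not a prescribed interface):

#include <vector>
#include <limits>

// Sketch of the node labels <d(v), in(v)> and the relax procedure.
// Edges are stored in a flat array; in[v] is an index into it (-1 stands for ⊥).
struct Edge { int u, v; double c; };

const double INF = std::numeric_limits<double>::infinity();

struct Labels {
    std::vector<double> d;   // tentative distances, +infinity = no path known yet
    std::vector<int> in;     // last edge of the currently best path, -1 = ⊥
    Labels(int n, int s) : d(n, INF), in(n, -1) { d[s] = 0.0; }   // initialization
};

// relax(e): improve the label of e.v if the path over e is cheaper
void relax(const std::vector<Edge>& E, int e, Labels& L) {
    const Edge& ed = E[e];
    if (L.d[ed.u] + ed.c < L.d[ed.v]) { L.d[ed.v] = L.d[ed.u] + ed.c; L.in[ed.v] = e; }
}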
The following lemma gives a sufficient condition for the termination condition to hold. We will later make the condition algorithmic.
Lemma 10.2 (Sufficient Condition for Correctness) We have d(v) = μ(s, v) if for some shortest path p = [e_1, e_2, . . . , e_k] from s to v there are times t_1, . . . , t_k such that t_1 < t_2 < · · · < t_k and e_i is relaxed at time t_i.
Proof: We have μ(s, v) = Σ_{j=1}^{k} c(e_j). Let t_0 = 0, let v_0 = s, and let v_i = target(e_i). Then d(v_i) ≤ Σ_{1≤j≤i} c(e_j) after time t_i. This is clear for i = 0 since d(s) is initialized to zero and d-values only decrease. After the relaxation of e_i at time t_i for i > 0, we have d(v_i) ≤ d(v_{i−1}) + c(e_i) ≤ Σ_{j=1}^{i} c(e_j).
The lemma above paves the way for the specific shortest path algorithms which we discuss in subsequent sections. Before doing so, we discuss properties of the graph defined by the in-edges. The set {in(v) : in(v) ≠ ⊥} of in-edges forms a graph on our node set with maximal indegree one; we call it the in-graph. The in-graph changes over time. Unreached nodes and s (except if it lies on a negative cycle) have indegree zero. Thus the in-graph consists of a tree rooted at s, isolated nodes, cycles, and trees emanating from these cycles, cf. Figure 10.1. The tree rooted at s may be empty. We call the tree rooted at s the shortest path tree and use T to denote it. The next lemma justifies the name.
[PS: This gets a bit difficult here. Pictures? Formulate in more detail? Dijkstra and Bellman-Ford without negative cycles can be done more simply, but then it becomes less elegant. . . .]
Lemma 10.3 (Properties of the In-Graph)
a) If p is a path of in-edges from u to v then d(v) ≤ d(u) + c(p). [needed anywhere outside the proof?]
b) If v lies on a cycle or is reachable from a cycle of the in-graph, then μ(s, v) = −∞. [already follows from the invariant]
c) If −∞ < μ(s, v) < +∞ and d(v) = μ(s, v), the tree path from s to v is a shortest path from s to v.
If d(v) is set to −∞, there is an edge e = (x, y) with d(x) + c(e) < d(y) after termination of the do-loop and such that v is reachable from y. The edge allows us to decrease d(y) further and hence d(y) > μ(s, y) when the do-loop terminates. Thus μ(s, y) = −∞ by the second paragraph; the same is true for μ(s, v), since v is reachable from y.
We assume that all edge costs are non-negative. Thus there are no negative cycles and
shortest paths exist for all nodes reachable from s. We will show that if the edges are
relaxed in a judicious order, every edge needs to be relaxed only once.
What is the right order? Along any shortest path, the shortest path distances increase (more precisely, do not decrease). This suggests scanning nodes (to scan a node means to relax all edges out of the node) in order of increasing shortest path distance.
Of course, in the algorithm we do not know shortest path distances, we only know
tentative distances. Fortunately, it can be shown that for the unscanned node with
minimal tentative distance, the true distance and tentative distance agree. This leads
to the following algorithm.
Dijkstra's Algorithm
initialize node labels and declare all nodes unscanned
while there is an unscanned node with tentative distance < +∞ do
    u := the unscanned node with minimal tentative distance
    relax all edges out of u and declare u scanned
Theorem 10.6 Dijkstra's algorithm solves the single source shortest path problem for graphs with non-negative edge costs.
[PS: a picture here?]
Proof: Assume that the algorithm is incorrect and consider the first time that we scan a node whose tentative distance is larger than its shortest path distance. Say at time t we scan node v with μ(s, v) < d(v). Let p = [s = v_1, v_2, . . . , v_k = v] be a shortest path from s to v and let i be minimal such that v_i is unscanned just before time t. Then i > 1, since s is the first node scanned (in the first iteration s is the only node whose tentative distance is less than +∞) and since μ(s, s) = 0 = d(s) when s is scanned. Thus v_{i−1} was scanned before time t and hence d(v_{i−1}) = μ(s, v_{i−1}) when v_{i−1} was scanned, by the definition of t. When v_{i−1} was scanned, d(v_i) was set to μ(s, v_i) since any prefix of a shortest path is a shortest path. Thus d(v_i) = μ(s, v_i) ≤ μ(s, v_k) < d(v_k) just before time t and hence v_i is scanned instead of v_k, a contradiction.
Exercise 10.6 Let v_1, v_2, . . . be the order in which nodes are scanned. Show that μ(s, v_1) ≤ μ(s, v_2) ≤ · · ·, i.e., nodes are scanned in order of increasing shortest path distance.
We come to the implementation of Dijkstra's algorithm. The key operation is to find the unscanned node with minimum tentative distance. Addressable priority queues (see Section ??) are the appropriate data structure. We store all unscanned reached nodes (= tentative distance less than +∞) in an addressable priority queue PQ. The entries in PQ are pairs (d(u), u) with d(u) being the priority. Every reached unscanned node stores a handle to its entry in the priority queue. We obtain the following implementation of Dijkstra's algorithm.
[todo: make Dijkstra and Prim more similar.]
PQ : PriorityQueue of Node
// Init
initialize the node labels              // this sets d(s) = 0 and d(v) = ∞ for v ≠ s
declare all nodes unscanned             // s is the only reached unscanned node at this point
PQ.insert(s, 0)
while PQ ≠ ∅ do
    select u ∈ PQ with d(u) minimal and remove it; declare u scanned   // deleteMin
    forall edges e = (u, v) do
        if D = d(u) + c(e) < d(v) then
            if d(v) = ∞ then PQ.insert(v, D)        // insert
            else PQ.decreaseKey(v, D)               // decreaseKey
            set the label of v to ⟨D, e⟩
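A C++ sketch of this implementation. The standard library has no addressable priority queue with decreaseKey, so the sketch uses a std::set of (distance, node) pairs as a stand-in; erase followed by insert plays the role of decreaseKey. All names are our own.

#include <vector>
#include <set>
#include <limits>
#include <utility>

// Sketch of Dijkstra's algorithm. adj[u] = list of (v, cost) with cost >= 0.
std::vector<double> dijkstra(const std::vector<std::vector<std::pair<int,double>>>& adj, int s) {
    const double INF = std::numeric_limits<double>::infinity();
    int n = (int)adj.size();
    std::vector<double> d(n, INF);
    std::set<std::pair<double,int>> pq;        // ordered by (d(u), u)
    d[s] = 0.0;
    pq.insert({0.0, s});
    while (!pq.empty()) {
        auto [du, u] = *pq.begin();            // u = unscanned node with minimal d(u)
        pq.erase(pq.begin());                  // deleteMin; u is now scanned
        for (auto [v, c] : adj[u]) {           // relax all edges out of u
            double D = du + c;
            if (D < d[v]) {
                if (d[v] != INF) pq.erase({d[v], v});  // decreaseKey = erase + insert
                d[v] = D;
                pq.insert({D, v});
            }
        }
    }
    return d;
}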
Proof: Every reachable node is removed from the priority queue exactly once and hence we consider each edge at most once in the body of the while loop. We conclude that the running time is O(n + m) plus the time spent on the priority queue operations. The queue needs to be initialized. Every node is inserted into and deleted from the queue at most once and we perform one emptiness test in each iteration of the while loop. The number of decreaseKey operations is at most m − (n − 1): for every node v ≠ s we have at most indeg(v) − 1 decreaseKey operations and for s we have none.
Exercise 10.7 Design a graph and a non-negative cost function such that the relaxation of m − (n − 1) edges causes a decreaseKey operation.
In his original paper [32], Dijkstra proposed the following implementation of the priority queue: maintain the number of reached unscanned nodes and two arrays indexed by nodes, an array d storing the tentative distances and an array storing, for each node, whether it is unscanned and reached. Then init takes time O(n), and the emptiness test, insert, and decreaseKey take time O(1). A deleteMin takes time O(n) since it requires scanning the arrays in order to find the minimum tentative distance of any reached unscanned node. Thus the total running time is O(m + n²).
Theorem 10.8 With Dijkstra's proposed implementation of the priority queue, Dijkstra's algorithm runs in time O(m + n²).
Much better priority queue implementations have been invented since Dijkstra's original paper; cf. the section on addressable priority queues (Section ??).
Theorem 10.9 With the heap implementation of priority queues, Dijkstra's algorithm runs in time O(m log n + n log n).
Theorem 10.10 With the Fibonacci heap implementation of priority queues, Dijkstra's algorithm runs in time O(m + n log n).
Asymptotically, the Fibonacci heap implementation is superior except for sparse graphs with m = O(n). In practice (see [22, 67]), Fibonacci heaps are usually not the fastest implementation because they involve larger constant factors and because the actual number of decreaseKey operations tends to be much smaller than what the worst case predicts. An average case analysis [77] sheds some light on this.
Theorem 10.11 Let G be an arbitrary directed graph, let s be an arbitrary node of G, and for each node v let C(v) be a set of non-negative real numbers of cardinality indeg(v). For each v, the assignment of the costs in C(v) to the edges into v is made at random, i.e., our probability space consists of the ∏_v indeg(v)! possible assignments of edge costs to edges. Then the expected number of decreaseKey operations is O(n log(m/n)).
[PS: todo similar thing for analysis of Prim.]
Proof: Consider a particular node v and let k = indeg(v). Let e_1, . . . , e_k be the order in which the edges into v are relaxed in a particular run of Dijkstra's algorithm and let u_i = source(e_i). Then d(u_1) ≤ d(u_2) ≤ · · · ≤ d(u_k) since nodes are removed from U in increasing order of tentative distances. Edge e_i causes a decreaseKey operation iff i ≥ 2 and d(u_i) + c(e_i) < min{d(u_j) + c(e_j) : j < i}. Thus the number of operations decreaseKey(v, ·) is bounded by the number of indices i such that
i ≥ 2 and c(e_i) < min{c(e_j) : j < i} .
Since the order in which the edges into v are relaxed is independent of the costs assigned to them, the expected number of such i is the expected number of left-to-right minima in a random permutation of size k, minus one (since i = 1 is not counted). By Theorem ?? this expectation is H_k − 1, and hence the expected total number of decreaseKey operations is bounded by
Σ_v (H_{indeg(v)} − 1) ≤ Σ_v ln indeg(v) ≤ n ln(m/n), [KM: CHECK first ≤.]
where the last inequality follows from the concavity of the ln-function (see Appendix ??).
We conclude that the expected running time is O(m + n log(m/n) log n) with the heap implementation of priority queues. This is asymptotically more than O(m + n log n) only for m = ω(n) and m = o(n log n log log n).
Exercise 10.8 When is n log(m/n) log n = O(m + n log n)? Hint: Let m = nd. Then the question is equivalent to log d log n = O(d + log n).
In a monotone priority queue, every inserted priority and every priority set by a decreaseKey is at least as large as the priority returned by the last deleteMin operation (at least as large as the priority of the first insertion for the operations preceding the first deleteMin). Dijkstra's algorithm uses its queue in a monotone way.
It is not known whether monotonicity of use can be exploited in the case of general edge costs. However, for integer edge costs significant savings are possible. We therefore assume for this section that edge costs are integers in the range [0 .. C] for some integer C. C is assumed to be known at initialization time of the queue.
Since a shortest path can consist of at most n − 1 edges, shortest path distances are at most (n − 1)·C. The range of values in the queue at any one time is even smaller. Let min be the last value deleted from the queue (zero before the first deletion). Then all values in the queue are contained in [min .. min + C]. This is easily shown by induction on the number of queue operations. It is certainly true after the first insertion. A deleteMin does not decrease min, and insert and decreaseKey operations only insert priorities in [min .. min + C].
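This invariant already suffices for a very simple integer priority queue: a cyclic array of C + 1 buckets (Dial's implementation). This is not the radix heap developed below, merely a minimal C++ sketch of the idea, assuming integer keys in [min .. min + C] and a known C:

#include <vector>
#include <list>
#include <utility>

// Sketch of a monotone bucket queue (Dial). All stored priorities lie in
// [min .. min+C], so bucket (key mod (C+1)) is unambiguous.
class BucketQueue {
    std::vector<std::list<int>> bucket;   // bucket[k mod (C+1)] holds nodes with key k
    int C, minKey = 0, size_ = 0;
public:
    explicit BucketQueue(int C_) : bucket(C_ + 1), C(C_) {}
    bool empty() const { return size_ == 0; }
    void insert(int v, int key) { bucket[key % (C + 1)].push_back(v); ++size_; }
    // decreaseKey: remove from the old bucket (caller knows oldKey) and reinsert.
    // A real implementation would store list handles instead of scanning the bucket.
    void decreaseKey(int v, int oldKey, int newKey) {
        bucket[oldKey % (C + 1)].remove(v);
        bucket[newKey % (C + 1)].push_back(v);
    }
    std::pair<int,int> deleteMin() {      // precondition: !empty(); returns (node, key)
        while (bucket[minKey % (C + 1)].empty()) ++minKey;   // min never decreases
        int v = bucket[minKey % (C + 1)].front();
        bucket[minKey % (C + 1)].pop_front();
        --size_;
        return {v, minKey};
    }
};

Over a whole run of Dijkstra's algorithm, the scan for the next non-empty bucket only moves forward, so its total cost is bounded by the largest distance, i.e., O(nC); the radix heaps developed next reduce this part to O(n log C).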
TODO
Lemma 10.13 Let i be minimal such that B_i is non-empty and assume i ≠ −1. Let min be the smallest element in B_i. Then msd(min, x) < i for all x ∈ B_i.
Figure 10.3: The path represents the binary representation of min with the least significant digit on the right. A node v ∈ U is stored in bucket B_i if its binary representation differs from that of min in the i-th bit when the binary representations are scanned starting at the most significant bit. Distinguishing indices i with i ≥ K are lumped together.
Proof: We distinguish the cases i < K and i = K. Let min′ be the old value of min. If i < K, the most significant distinguishing index of min′ and any x ∈ B_i is i, i.e., min′ has a zero in bit position i and all x ∈ B_i have a one in bit position i. They agree in all positions with index larger than i. Thus the most significant distinguishing index of min and x is smaller than i.
Let us next assume that i = K and consider any x ∈ B_K. Then min′ < min ≤ x ≤ min′ + C. Let j = msd(min′, min) and h = msd(min, x). Then j ≥ K. We want to show that h < K. Observe first that h ≠ j since min has a one bit in position j and a zero bit in position h. Let min′ = Σ_l α_l 2^l.
Assume first that h < j and let A = Σ_{l>j} α_l 2^l. Then min′ ≤ A + Σ_{l<j} 2^l ≤ A + 2^j − 1 since the j-th bit of min′ is zero. On the other hand, x has a one bit in positions j and h and hence x ≥ A + 2^j + 2^h. Thus 2^h ≤ C and hence h ≤ ⌊log C⌋ < K.
Assume next that h > j and let A = Σ_{l>h} α_l 2^l. We will derive a contradiction. min′ has a zero bit in positions h and j and hence min′ ≤ A + 2^h − 1 − 2^j. On the other hand, x has a one bit in position h and hence x ≥ A + 2^h. Thus x − min′ > 2^j ≥ 2^K > C, a contradiction.
Radix heaps exploit the binary representation of tentative distances. For numbers a and b with binary representations a = Σ_{i≥0} α_i 2^i and b = Σ_{i≥0} β_i 2^i, define the most significant distinguishing index msd(a, b) as the largest i with α_i ≠ β_i, and let it be −1 if a = b. If a < b then a has a zero bit in position i = msd(a, b) and b has a one bit. A radix heap consists of a sequence of buckets B_{−1}, B_0, . . . , B_K where K = 1 + ⌊log C⌋. A node v ∈ U is stored in bucket B_i where i = min(msd(min, d(v)), K). Buckets are organized as doubly linked lists and every node keeps a handle to the list item representing it. Figure 10.3 illustrates this definition. We assume that most significant distinguishing indices can be computed² in time O(1) and justify this assumption in Exercise 10.10.
² For the built-in type int it is a machine instruction on many architectures.
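On current hardware, msd is indeed a single instruction away; a sketch for 32-bit unsigned keys (using a GCC/Clang builtin; C++20 offers std::bit_width as a portable alternative):

#include <cstdint>

// msd(a, b): position of the most significant bit in which a and b differ,
// or -1 if a == b. Uses the count-leading-zeros instruction via a builtin.
int msd(std::uint32_t a, std::uint32_t b) {
    if (a == b) return -1;
    return 31 - __builtin_clz(a ^ b);
}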
Exercise 10.9 There is another way to describe the distribution of nodes over buckets. Let min = Σ_j α_j 2^j, let i_0 be the smallest index greater than K with α_{i_0} = 0, and let M_i = Σ_{j>i} α_j 2^j. B_{−1} contains all nodes v ∈ U with d(v) = min; for 0 ≤ i < K, B_i = ∅ if α_i = 1, and B_i = {v ∈ U : M_i + 2^i ≤ d(v) < M_i + 2^{i+1}} if α_i = 0; and B_K = {v ∈ U : M_{i_0} + 2^{i_0} ≤ d(v)}. Prove that this description is correct.
We turn to the realization of the queue operations. Initialization amounts to creating K + 1 empty lists, an insert(v, d(v)) inserts v into the appropriate list, and a decreaseKey(v, d(v)) removes v from the list containing it and inserts it into the appropriate list. Thus insert and decreaseKey take constant time.
A deleteMin first finds the minimal i such that B_i is non-empty. If i = −1, an arbitrary element in B_{−1} is removed and returned. If i ≥ 0, the bucket B_i is scanned and min is set to the smallest tentative distance contained in the bucket. Afterwards, all elements in B_i are moved to the appropriate new bucket. Thus a deleteMin takes constant time if i = −1 and takes time O(i + |B_i|) = O(K + |B_i|) if i ≥ 0. The crucial observation is now that every node in bucket B_i is moved to a bucket with smaller index. [somewhere show that the other buckets need not be touched]
Theorem 10.14 With the radix heap implementation of priority queues, Dijkstra's algorithm runs in time O(m + n log C). This assumes that edge costs are integers in the range [0 .. C].
Exercise 10.10 [PS: I do not understand this exercise at the moment. I thought Exercise 10.9 was a different presentation.] The purpose of this exercise is to show that the assumption that the msd function can be computed in amortized constant time is warranted. We assume inductively that we have the binary representation of min and the description of bucket ranges given in Exercise 10.9 available to us. When we need to move a node from bucket i to a smaller bucket, we simply scan through buckets B_{i−1}, B_{i−2}, . . . until we find the bucket where to put the node. When min is increased, we compute the binary representation of min from the binary representation of the old minimum min′ by adding min − min′ to the old minimum. This takes amortized time O(K + 1) by Theorem ??.
Exercise 10.11 Radix heaps can also be based on number representations with base b for any b ≥ 2. In this situation we have buckets B_{i,j} for i = −1, 0, 1, . . . , K and 0 ≤ j < b, where K = 1 + ⌊log C / log b⌋. An unscanned reached node x is stored in bucket B_{i,j} if msd(min, d(x)) = i and the i-th digit of d(x) is equal to j. We also store, for each i, the number of nodes contained in the buckets ⋃_j B_{i,j}. Discuss the implementation of the priority queue operations and show that a shortest path algorithm with running time O(m + n(b + log C / log b)) results. What is the optimal choice of b?
If the edge costs are random integers in the range [0 .. C], a small change of the algorithm guarantees linear running time [?, 38, ?]. For every node v let minInCost(v) be the minimum cost of an edge into v. We divide U into two parts: a part F which contains nodes whose tentative distance label is known to be equal to their exact distance from s, and a part B which contains all other labeled nodes. B is organized as a radix heap. We also maintain a value min. We scan nodes as follows.
When F is non-empty, an arbitrary node in F is removed and its outgoing edges are relaxed. When F is empty, the minimum node is selected from B and min is set to its distance label. When a node is selected from B, the nodes in the first non-empty bucket B_i are redistributed if i ≥ 0. There is a small change in the redistribution process: when a node v is to be moved and d(v) ≤ min + minInCost(v), we move v to F instead. Observe that any future relaxation of an edge into v cannot decrease d(v), and hence d(v) is known to be exact at this point.
The algorithm is correct since it is still true that d(v) = μ(s, v) when v is scanned. For nodes removed from F this was argued in the previous paragraph, and for nodes removed from B it follows from the fact that they have the smallest tentative distance among all unscanned reached nodes.
Theorem 10.15 Let G be an arbitrary graph and let c be a random function from E
to [0 ..C]. Then the single source shortest path problem can be solved in expected time
O(n + m).
Proof: We still need to argue the bound on the running time. As before, nodes start out in B_K. When a node v is moved to a new bucket but not yet to F, we have d(v) > min + minInCost(v) and hence v is moved to a bucket B_i with i ≥ log minInCost(v). We conclude that the total charge to nodes in deleteMin and decreaseKey operations is
Σ_v (K − log minInCost(v) + 1) .
Next observe that minInCost(v) is the minimum over c(e) of the edges e into v, and hence Σ_v (K − log minInCost(v)) ≤ Σ_e (K − log c(e)). Now K − log c(e) is the number of leading zeros in the binary representation of c(e) when written as a K-bit number. Our edge costs are uniform random numbers in [0 .. C] and K = 1 + ⌊log C⌋. Thus prob(K − log c(e) = i) ≤ 2^{−i} and hence
E[Σ_e (K − log c(e))] ≤ Σ_e Σ_{i≥0} i · 2^{−i} = O(m) .
We conclude that the total expected cost of the deleteMin and decreaseKey operations is O(n + m). The time spent outside these operations is also O(n + m).
For a path p = [e_0, . . . , e_{k−1}] from v_0 to v_k, the reduced costs telescope:
c̄(p) = Σ_{0≤i<k} c̄(e_i) = pot(v_0) + Σ_{0≤i<k} c(e_i) − pot(v_k) .
Exercise 10.12 Potential functions can be used to generate graphs with negative edge costs but no negative cycles: generate a (random) graph, assign to every edge e a (random) non-negative (!!!) weight c(e), assign to every node v a (random) potential pot(v), and set the cost of e = (u, v) to c̄(e) = pot(u) + c(e) − pot(v). Show that this rule does not generate negative cycles. Hint: the cost of a cycle with respect to c̄ is the same as with respect to c.
Lemma 10.17 Assume that G has no negative cycles and that all nodes can be reached from s. Let pot(v) = μ(s, v) for v ∈ V. With this potential function the reduced edge costs are non-negative.
Proof: Since all nodes are reachable from s and since there are no negative cycles, μ(s, v) ∈ ℝ for all v. Thus the reduced costs are well defined. Consider an arbitrary edge e = (v, w). We have μ(s, v) + c(e) ≥ μ(s, w) and hence c̄(e) = μ(s, v) + c(e) − μ(s, w) ≥ 0.
All-Pairs Shortest Paths in the Absence of Negative Cycles
add a new node s and zero-cost edges (s, v) for all v            // no new cycles, time O(m)
compute μ(s, v) for all v with Bellman-Ford                      // time O(nm)
set pot(v) = μ(s, v) and compute the reduced costs               // time O(m)
forall nodes x do                                                // time O(n(m + n log n))
    solve the single source problem with source x and the
    reduced edge costs, using Dijkstra's algorithm
translate the distances back to the original cost function:     // time O(m)
    μ(v, w) = μ̄(v, w) + pot(w) − pot(v)
We have thus shown:
Theorem 10.18 The all-pairs shortest path problem in graphs without negative cycles can be solved in time O(nm + n² log n).
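Written out in C++, this scheme (known as Johnson's algorithm) might look as follows; the Bellman-Ford and Dijkstra routines are compact textbook versions, and all names are our own:

#include <vector>
#include <set>
#include <limits>
#include <utility>

struct WEdge { int u, v; double c; };
static const double INF = std::numeric_limits<double>::infinity();

// textbook Bellman-Ford: n-1 rounds of relaxing every edge (no negative cycles assumed)
static std::vector<double> bellmanFord(int n, const std::vector<WEdge>& E, int s) {
    std::vector<double> d(n, INF); d[s] = 0.0;
    for (int round = 1; round < n; ++round)
        for (const WEdge& e : E)
            if (d[e.u] + e.c < d[e.v]) d[e.v] = d[e.u] + e.c;
    return d;
}

// Dijkstra on adjacency lists with non-negative costs (std::set as addressable PQ)
static std::vector<double> dijkstra(const std::vector<std::vector<std::pair<int,double>>>& adj, int s) {
    int n = (int)adj.size();
    std::vector<double> d(n, INF); d[s] = 0.0;
    std::set<std::pair<double,int>> pq{{0.0, s}};
    while (!pq.empty()) {
        auto [du, u] = *pq.begin(); pq.erase(pq.begin());
        for (auto [v, c] : adj[u])
            if (du + c < d[v]) {
                if (d[v] != INF) pq.erase({d[v], v});
                d[v] = du + c; pq.insert({d[v], v});
            }
    }
    return d;
}

std::vector<std::vector<double>> allPairs(int n, const std::vector<WEdge>& edges) {
    std::vector<WEdge> aug = edges;                            // auxiliary source n with
    for (int v = 0; v < n; ++v) aug.push_back({n, v, 0.0});    // zero-cost edges to all v
    std::vector<double> pot = bellmanFord(n + 1, aug, n);      // pot(v) = mu(aux, v)

    std::vector<std::vector<std::pair<int,double>>> adj(n);
    for (const WEdge& e : edges)                               // reduced, non-negative costs
        adj[e.u].push_back({e.v, pot[e.u] + e.c - pot[e.v]});

    std::vector<std::vector<double>> mu(n);
    for (int x = 0; x < n; ++x) {
        mu[x] = dijkstra(adj, x);
        for (int w = 0; w < n; ++w)                            // translate back:
            if (mu[x][w] < INF) mu[x][w] += pot[w] - pot[x];   // mu = mubar + pot(w) - pot(x)
    }
    return mu;
}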
in(v) = ⊥ iff d(v) = +∞, i.e., when in(v) = ⊥, we have d(v) = +∞ and ignore the number stored in d(v).
[PS: More implementation notes: heuristics? store PQ items with nodes? Implementations in LEDA? BOOST?]
Chapter 11
Minimum Spanning Trees
small subtrees. This algorithm applies a generally useful data structure explained in Section 11.4: maintain a partition of a set of elements. The operations are to find out whether two elements are in the same subset and to join two subsets.
Exercises
Exercise 11.1 Develop an efficient way to find minimum spanning forests using a
single call of a minimum spanning tree routine. Do not find connected components
first. Hint: insert n − 1 additional edges.
Exercise 11.2 Explain how to find minimum spanning sets of edges when zero and
negative weights are allowed. Do these edge sets necessarily form trees?
Exercise 11.3 Explain how to reduce the problem of finding maximum weight spanning trees to the minimum spanning tree problem.
Lemma 11.2 (Cycle Property) Consider any cycle C ⊆ E and an edge e ∈ C with maximal weight. Then any MST of G′ = (V, E \ {e}) is also an MST of G.
Proof: Consider any MST T of G. If e ∉ T, then T is also an MST of G′ and we are done. Otherwise, T \ {e} consists of two components, and walking along the cycle C from one endpoint of e to the other we find an edge e′ ∈ C \ T that connects these two components. Then T′ = {e′} ∪ T \ {e} forms another spanning tree and, since c(e′) ≤ c(e), T′ must also be an MST of G; in particular, some MST of G avoids e, and hence every MST of G′ is also an MST of G.
Using the cut property, we easily obtain a greedy algorithm for finding a minimum spanning tree: start with an empty set of edges T; while T is not a spanning tree, add an edge fulfilling the cut property.
There are many ways to implement this generic algorithm. In particular, we are free to choose the set S, and we have to work out how to find the smallest edge in the cut efficiently. We discuss two approaches in detail in the following sections and outline a third approach in Section 11.6.
Exercises
Exercise 11.4 Show that the MST is uniquely defined if all edge weights are different.
Show that in this case the MST does not change if each edge weight is replaced by its
rank among all edge weights.
The Jarník-Prim (JP) algorithm for MSTs is very similar to Dijkstra's algorithm for shortest paths.4 Starting from an (arbitrary) source node s, the JP-algorithm grows a minimum spanning tree by adding one node after the other. The set S from the cut property is the set of nodes already added to the tree. This choice of S guarantees that the smallest edge leaving S is not in the tree yet. The main challenge is to find this edge efficiently. To this end, the algorithm maintains the cheapest connection between each node v ∈ V \ S and S in a priority queue q. The smallest element in q gives the desired edge. To add a new node to S, we check for each of its incident edges whether it gives an improved connection for a node in V \ S. Figure 11.1 gives pseudocode for the JP-algorithm. [example. (JP plus Kruskal)] [harmonize with the description of Dijkstra's algorithm] Note that by setting the distance of nodes in S to zero, edges connecting the newly added node with a node v ∈ S are ignored, as required by the cut property. This small trick saves a comparison in the innermost loop.
The only important difference to Dijkstra's algorithm is that the priority queue stores edge weights rather than path lengths. The analysis of Dijkstra's algorithm transfers to the JP-algorithm, i.e., using a Fibonacci heap priority queue, O(n log n + m) execution time can be achieved.
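A C++ sketch of the JP-algorithm, analogous to the Dijkstra code earlier; instead of the zero-distance trick it simply marks the nodes already in S (all names are our own):

#include <vector>
#include <set>
#include <limits>
#include <utility>

// Sketch of the Jarnik-Prim algorithm. adj[u] = list of (v, weight); the graph is
// assumed connected. Returns parent[v] = other endpoint of v's MST edge (-1 for s).
std::vector<int> jarnikPrim(const std::vector<std::vector<std::pair<int,double>>>& adj, int s) {
    const double INF = std::numeric_limits<double>::infinity();
    int n = (int)adj.size();
    std::vector<double> d(n, INF);        // cheapest known connection to the tree
    std::vector<int> parent(n, -1);
    std::vector<bool> inTree(n, false);
    std::set<std::pair<double,int>> pq;   // (d(v), v); decreaseKey = erase + insert
    d[s] = 0.0; pq.insert({0.0, s});
    while (!pq.empty()) {
        int u = pq.begin()->second; pq.erase(pq.begin());
        inTree[u] = true;                              // add u to S
        for (auto [v, w] : adj[u])
            if (!inTree[v] && w < d[v]) {              // improved connection of v to S?
                if (d[v] != INF) pq.erase({d[v], v});
                d[v] = w; parent[v] = u;
                pq.insert({w, v});
            }
    }
    return parent;
}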
Exercises
Exercise 11.5 Dijkstra's algorithm for shortest paths can use monotone priority queues that are sometimes faster than general priority queues. Give an example to show that monotone priority queues do not suffice for the JP-algorithm.
**Exercise 11.6 (Average case analysis of the JP-algorithm) Assume the edge weights 1, . . . , m are randomly assigned to the edges of G. Show that the expected number of decreaseKey operations performed by the JP-algorithm is then bounded by O(??). [Reference? Relation to SSSP.]
Exercises
Exercise 11.7 Explain how Kruskal's algorithm fits into the framework of the generic greedy algorithm based on the cut property, i.e., explain which set S must be chosen in each iteration of the generic algorithm to find the MST edges in the same order as Kruskal's algorithm.
Exercise 11.8 (Streaming MST) Suppose the edges of a graph are presented to you only once (for example over a network connection) and you do not have enough memory to store all of them. The edges do not necessarily arrive in sorted order.
a) Outline an algorithm that nevertheless computes an MST using space O(n).
*b) Refine your algorithm to run in time O (m log n). Hint: Use the dynamic tree
data structure by Sleator and Tarjan [89].
The union-find data structure maintains a partition of the set 1..n and supports these two operations. Initially, each element is in its own subset. Each subset is assigned a leader element[term OK? representative (CLR) or canonical element are such long words. too much confusion between leader and parent?]. The function find(i) finds the leader of the subset containing i; link(i, j), applied to the leaders of different subsets, joins these two subsets.
Figure 11.3 gives an efficient implementation of this idea. The most important part of the data structure is the array parent. Leaders are their own parents; following parent references leads to the leaders. The parent references of a subset form a rooted tree[where else], i.e., a tree with all edges directed towards the root.5 Additionally, each root has a self-loop. Hence, find is easy to implement by following the parent references until a self-loop is encountered.
Linking two leaders i and j is also easy to implement by promoting one of the leaders to overall leader and making it the parent of the other. What we have said so far yields a correct but inefficient union-find data structure: the parent references could form long chains that are traversed again and again during find operations. Therefore, Figure 11.3 makes two optimizations. The link operation uses the array gen to limit the depth of the parent trees. Promotion in leadership is based on the seniority principle: the older generation is always promoted. We will see that this measure alone limits the time for find to O(log n). The second optimization is path compression: a long chain of parent references is never traversed twice. Rather, find redirects all nodes it traverses directly to the leader. We will see that these two optimizations together make the union-find data structure breathtakingly efficient: the amortized cost of any operation is almost constant.
Figure 11.3: An efficient Union-Find data structure maintaining a partition of the set
{1, . . . , n}.
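A compact C++ sketch of the data structure of Figure 11.3 (our own naming; gen plays the role of the generation array):

#include <vector>
#include <numeric>

// Union-find with union by generation (seniority) and path compression.
class UnionFind {
    std::vector<int> parent, gen;
public:
    explicit UnionFind(int n) : parent(n), gen(n, 0) {
        std::iota(parent.begin(), parent.end(), 0);   // every element is its own leader
    }
    int find(int i) {
        if (parent[i] == i) return i;                 // leader: self-loop
        int leader = find(parent[i]);
        parent[i] = leader;                           // path compression
        return leader;
    }
    void link(int i, int j) {                         // i, j must be leaders
        if (i == j) return;
        if (gen[i] < gen[j]) parent[i] = j;           // the older generation is promoted
        else if (gen[i] > gen[j]) parent[j] = i;
        else { parent[j] = i; ++gen[i]; }             // equal age: i is promoted and ages
    }
    void unite(int i, int j) { link(find(i), find(j)); }
};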
Analysis
[todo. do we dare to use Raimund's new analysis?]
Exercises
time much less than for solving the all-pairs shortest path problem.
A related and even more frequently used application is clustering based on the MST [4, Application 13.5]: by dropping k − 1 edges from the MST it can be split into k subtrees. Nodes in a subtree T′ are far away from the other nodes in the sense that all paths to nodes in other subtrees use edges that are at least as heavy as the edges used to cut T′ out of the MST.
Many applications of MSTs define complete graphs with n(n − 1)/2 edges using a compact implicit description of the graph. Then it is an important concern whether one can rule out most of the edges as too heavy without actually looking at them. For example, if the nodes represent points in the plane and edge weights are Euclidean distances, one can exploit the geometric structure of the problem. It can be proven that the MST of the complete geometric graph is contained in various well-known subgraphs that have size O(n) and can be computed in time O(n log n) (Delaunay triangulation, Gabriel graph [79], Yao graph [?]). In particular, the Delaunay triangulation of the point set [79] is a subset of O(n) edges that can be found in time O(n log n). Hence, MSTs of 2D point sets under the Euclidean distance function can be found in time O(n log n). We will see another example of implicitly defined complete graphs below.
Although we introduced MSTs as a network design problem, most network design problems of practical interest ask more general questions that are very hard to solve exactly. For example, a Minimum Weight Steiner Tree (MWST) T ⊆ E connects a set of terminals U ⊆ V. The Steiner nodes V \ U need not be connected but can be used to reduce the overall weight. In our cabling example, the Steiner nodes could represent uninhabited islands that do not need a telephone but can host a switch. MSTs can be used to approximate MWSTs. Consider the complete graph G′ = (U, E′) where c((s, t)) is the shortest path distance between s and t in G = (V, E). An MST of G′ yields a Steiner tree spanning U in G by taking the edges in E from all paths used to connect nodes in G′. This Steiner tree has at most twice the weight of an optimal Steiner tree. Although G′ can have Θ(n²) edges, its MST can be found efficiently in time O(n log n + m) [65].
[overview paper of other generalizations]
[TSP approximation? Held-Karp lower bound?]
[checkers for MST?]
Chapter 12
Generic Approaches to Optimization
A smuggler in the mountainous region of Profitania has n items in his cellar. If he sells item i across the border, he makes a profit p_i. However, the smugglers' trade union only allows him to carry knapsacks with a maximum load of M. If item i has weight w_i, which items should he pack into the knapsack to maximize the profit of his next trip?
This knapsack problem has many less romantic applications like [61, 55][look this up]. In this chapter we use it as a model problem to illustrate several generic approaches to solving optimization problems. These approaches are quite flexible and can be adapted to complicated situations that are ubiquitous in practical applications.
In the previous chapters we looked at specific, very efficient solutions for frequently occurring simple problems such as finding shortest paths or minimum spanning trees. Now we look at generic solutions that work for a much larger range of applications but may be less efficient.
More formally, an optimization problem can be described by a set U of potential solutions, a set L ⊆ U of feasible solutions, and an objective function f : L → ℝ. In a maximization problem, we are looking for a feasible solution x ∈ L that maximizes f(x) among all feasible solutions. In a minimization problem, we look for a solution minimizing f. For example, the knapsack problem is a maximization problem with U = {0, 1}^n, L = {x = (x_1, . . . , x_n) ∈ U : Σ_{i=1}^{n} x_i w_i ≤ M}, and f(x) = Σ_{i=1}^{n} x_i p_i. Note that the distinction between minimization and maximization problems could be avoided because setting f := −f converts a maximization problem into a minimization problem and vice versa. We will use maximization as our default simply because
with f(x) = c · x, where c is called the cost vector and · stands for the scalar product of two n-vectors. Constraints have the form a_i · x ⋈_i b_i where ⋈_i ∈ {≤, ≥, =} and a_i ∈ ℝ^n for i ∈ 1..m. We have
L = {x ∈ ℝ^n : x ≥ 0 and a_i · x ⋈_i b_i for all i ∈ 1..m} .
Exercises
[Figure: the feasible region of a linear program, bounded by constraints such as y ≤ 6, x + y ≤ 8, and 2x − y ≤ 8, with an arrow pointing towards better solutions.]
Exercise 12.1 (Optimizing versus deciding.) Suppose you have a routine P that outputs a feasible solution x with f(x) ≤ a if such a solution exists and signals failure otherwise.
a) Assume that objective function values are positive integers. Explain how to find a minimal solution x* using O(log f(x*)) calls of P.
*b) Assume that objective function values are reals larger than one. Explain how to find a solution that is within a factor (1 + ε) of minimal using O(log log f(x*) + log(1/ε)) calls of P. Hint: use the geometric mean √(a · b).
Find the smallest index j such that Σ_{i=1}^{j} w_i > M (if there is no such index, we can take all knapsack items). Now set
x_1 = · · · = x_{j−1} = 1,  x_j = (M − Σ_{i=1}^{j−1} w_i)/w_j,  and  x_{j+1} = · · · = x_n = 0 .
Figure ?? gives an example. For the knapsack problem we only have to deal with a single fractional variable x_j. Rounding x_j to zero yields a feasible solution that is quite good if p_j ≪ Σ_{i=1}^{j−1} p_i. This fractional solution is the starting point for many good algorithms for the knapsack problem.
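A C++ sketch of this greedy computation of the fractional solution (sorting by profit density is done inside the function; names are our own):

#include <vector>
#include <algorithm>
#include <numeric>

// Greedy solution of the *fractional* knapsack problem: take items in order of
// decreasing profit density p/w; at most one item is taken fractionally.
// Returns the fractions x_i in the original input order.
std::vector<double> fractionalKnapsack(const std::vector<double>& p,
                                       const std::vector<double>& w, double M) {
    int n = (int)p.size();
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(),
              [&](int a, int b) { return p[a] * w[b] > p[b] * w[a]; });  // by p/w, descending
    std::vector<double> x(n, 0.0);
    double cap = M;
    for (int i : order) {
        if (w[i] <= cap) { x[i] = 1.0; cap -= w[i]; }   // item fits entirely
        else { x[i] = cap / w[i]; break; }              // the single fractional item x_j
    }
    return x;
}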
Exercises
Exercise 12.2 (More animal food.) How do you model the farmer's own supplies, like hay, which are cheap but limited? Also explain how to specify additional upper bounds, like no more than 0.1 mg of cadmium contamination per cow and day.
[min cost flow?, multi-commodity flow?]
Exercise 12.3 Explain how to replace any ILP by an ILP that uses only 0/1 variables.
If U is an upper bound on the bi values, your 0/1-ILP should be at most a factor
O (logU) larger than the original ILP.
Exercise 12.4 Formulate the following set covering problem as an ILP: Given a set M = 1..m, n subsets M_i ⊆ M for i ∈ 1..n, and a cost c_i for each set M_i. Assume ⋃_{i=1}^{n} M_i = 1..m. Select F ⊆ 1..n such that ⋃_{i∈F} M_i = 1..m and Σ_{i∈F} c_i is minimized.
Exercise 12.5 (Linear time fractional knapsacks.) Explain how to solve the fractional knapsack problem in linear expected time. Hint: use a similar idea as in Section 5.5.
algorithms are often the best choice for getting a reasonable solution quickly. Let us look at another typical example.
Suppose you have m identical machines that can be used to process n weighted jobs of sizes t_1, . . . , t_n. We are looking for a mapping x : 1..n → 1..m from jobs to machines such that the makespan
L_max = max_{j∈1..m} Σ_{i : x(i)=j} t_i
is minimized. This is one of the simplest among an important class of optimization problems known as scheduling problems.
We give a simple greedy algorithm for scheduling independent weighted jobs on identical machines that has the advantage that we do not need to know the job sizes in advance. We assign jobs in the order they arrive; algorithms with this property are known as online algorithms. When job i arrives, we inspect the current machine loads ℓ_j = Σ{t_{i′} : i′ < i, x(i′) = j} and assign the new job to the most lightly loaded machine, i.e., x(i) := arg min_{j∈1..m} ℓ_j. This shortest queue algorithm does not guarantee optimal solutions, but at least we can give the following performance guarantee:
Theorem 12.3 The shortest queue algorithm ensures that
L_max ≤ (1/m) Σ_{i=1}^{n} t_i + ((m − 1)/m) max_{i=1}^{n} t_i .
Proof: We focus on the job ℓ that is assigned last to the machine with maximum load. When job ℓ is scheduled, all m machines have load at least L_max − t_ℓ, i.e.,
Σ_{i≠ℓ} t_i ≥ (L_max − t_ℓ) · m .
Solving for L_max and putting both contributions together, we get
L_max ≤ t_ℓ + (1/m) Σ_{i≠ℓ} t_i = ((m − 1)/m) t_ℓ + (1/m) Σ_i t_i ≤ (1/m) Σ_{i=1}^{n} t_i + ((m − 1)/m) max_{i=1}^{n} t_i .
When an approximation algorithm for a minimization problem guarantees solutions that are at most a factor a larger than an optimal solution, we say that the algorithm achieves approximation ratio a. Hence, we have shown that the shortest queue algorithm achieves an approximation ratio of 2 − 1/m. This is tight: for n = m(m − 1) + 1, t_n = m, and t_i = 1 for i < n, the optimal solution has makespan L_max = m, whereas the shortest queue algorithm produces a solution with makespan L_max = 2m − 1. Figure ?? gives an example for m = 4.
Similarly, when an online algorithm for a minimization problem guarantees solutions that are at most a factor a larger than the solutions produced by an algorithm that knows the entire input beforehand, we say that the algorithm has competitive ratio a; i.e., the shortest queue algorithm has competitive ratio 2 − 1/m.
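A C++ sketch of the shortest queue algorithm; a binary min-heap over the machine loads yields the lightest machine in O(log m) per job (names are our own):

#include <vector>
#include <queue>
#include <functional>
#include <utility>
#include <cstddef>

// Online list scheduling: each arriving job i of size t[i] goes to the currently
// least loaded of m machines. Returns x[i] = machine assigned to job i.
std::vector<int> shortestQueue(const std::vector<double>& t, int m) {
    using Entry = std::pair<double,int>;                  // (load, machine index)
    std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> load;
    for (int j = 0; j < m; ++j) load.push({0.0, j});
    std::vector<int> x(t.size());
    for (std::size_t i = 0; i < t.size(); ++i) {
        auto [l, j] = load.top(); load.pop();             // lightest machine
        x[i] = j;
        load.push({l + t[i], j});                         // its load grows by t[i]
    }
    return x;
}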
Exercises
Exercise 12.6 Prove Corollary ??. Hint: distinguish the cases t_ℓ ≤ Σ_i t_i / m and t_ℓ > Σ_i t_i / m.
*Exercise 12.7 Show that the shortest queue algorithm achieves approximation ratio
4/3 if the jobs are sorted by decreasing size.
*Exercise 12.8 (Bin packing.) Suppose a smuggler boss has perishable goods in her
cellar. She has to hire enough porters to ship all items this night. Develop a greedy
algorithm that tries to minimize the number of people she needs to hire assuming that
they can all carry maximum weight M. Try to show an approximation ratio for your
bin packing algorithm.
For many optimization problems the following principle of optimality holds: An optimal solution can be viewed as constructed from optimal solutions of subproblems.
Furthermore, for a given subproblem size it does not matter which optimal solution is
used.
The idea behind dynamic programming is to build an exhaustive table of optimal
solutions starting with very small subproblems. Then we build new tables of optimal
solutions for increasingly larger problems by constructing them from the tabulated
solutions of smaller problems.
Table 12.1: A dynamic programming table for the knapsack problem maximize (10, 20, 15, 20) · x subject to (1, 3, 2, 4) · x ≤ 5. Table entries have the form P(i,C), (x_i); entries marked * contribute to the optimal solution.

i \ C |    0        1        2        3        4        5
------+---------------------------------------------------------
  0   |    0        0        0        0        0        0
  1   |    0,(0)*   10,(1)   10,(1)   10,(1)   10,(1)   10,(1)
  2   |    0,(0)    10,(0)   10,(0)   20,(1)*  30,(1)   30,(1)
  3   |    0,(0)    10,(0)   15,(1)   25,(1)   30,(0)   35,(1)*
  4   |    0,(0)    10,(0)   15,(0)   25,(0)   30,(0)   35,(0)*
Again, we use the knapsack problem as an example. Define P(i,C) as the maximum profit achievable using only items 1 through i and total weight at most C. Set P(0,C) = 0 for all C ≥ 0 and P(i,C) = −∞ for C < 0.
Lemma 12.5 For i ∈ 1..n: P(i,C) = max(P(i − 1, C), P(i − 1, C − w_i) + p_i).
Proof: Certainly, P(i,C) ≥ P(i − 1, C) since having a wider choice of items can only increase the obtainable profit. Furthermore, P(i,C) ≥ P(i − 1, C − w_i) + p_i, since a solution obtaining profit P(i − 1, C − w_i) using items 1 through i − 1 can be improved by putting item i into the knapsack. Hence,
P(i,C) ≥ max(P(i − 1, C), P(i − 1, C − w_i) + p_i)
and it remains to show the other direction. So assume there is a solution x with P(i,C) > P(i − 1, C) and P(i,C) > P(i − 1, C − w_i) + p_i. We distinguish two cases:
Case x_i = 0: The solution x does not use item i and hence it is also a solution of the problem using items 1 through i − 1 only. Hence, by the definition of P, P(i,C) ≤ P(i − 1, C), contradicting the assumption P(i,C) > P(i − 1, C).
Case x_i = 1: If we remove item i from the solution, we get a feasible solution of a knapsack problem using items 1 through i − 1 and capacity at most C − w_i. By our assumption its profit is larger than P(i − 1, C − w_i) + p_i − p_i = P(i − 1, C − w_i). This contradicts the definition of P.
Using Lemma 12.5 we can compute optimal solutions by filling a table top-down
and left to right. Table 12.1 shows an example computation.
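A C++ sketch of this table-based dynamic program, including the backtracking step that extracts an optimal solution from the table (names are our own):

#include <vector>
#include <algorithm>

// P[i][C'] = best profit using items 1..i with capacity C'. Returns the 0/1 vector x.
std::vector<int> dpKnapsackTable(const std::vector<int>& p, const std::vector<int>& w, int M) {
    int n = (int)p.size();
    std::vector<std::vector<int>> P(n + 1, std::vector<int>(M + 1, 0));
    for (int i = 1; i <= n; ++i)
        for (int C = 0; C <= M; ++C) {
            P[i][C] = P[i - 1][C];                                     // x_i = 0
            if (w[i - 1] <= C)                                         // x_i = 1 feasible?
                P[i][C] = std::max(P[i][C], P[i - 1][C - w[i - 1]] + p[i - 1]);
        }
    std::vector<int> x(n, 0);                                          // backtrack decisions
    for (int i = n, C = M; i >= 1; --i)
        if (w[i - 1] <= C && P[i][C] == P[i - 1][C - w[i - 1]] + p[i - 1]) {
            x[i - 1] = 1; C -= w[i - 1];
        }
    return x;   // x[i] = 1 iff item i+1 is packed; the optimal profit is P[n][M]
}

On the instance of Table 12.1 this returns x = (0, 1, 1, 0) with profit 35.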
Figure 12.2 gives a more clever list-based implementation of the same basic idea. Instead of computing the maximum profit for items 1..i and every capacity up to M, it
Function dpKnapsack : L
    L := ⟨∅⟩                                    // candidate solutions; initially only the trivial empty solution
    for i := 1 to n do
        invariant L is sorted by total weight w(x) = Σ_{k<i} x_k w_k
        invariant L contains all Pareto optimal solutions for items 1..i
        L′ := ⟨x ∪ {i} : x ∈ L, w(x) + w_i ≤ M⟩     // treats a solution as a set of items
        L := prune(merge(L, L′))                     // merge by w(x), see Figure 5.2
    return L.last

Function prune(L : Sequence of Solution) : Sequence of Solution
    p̂ := −1                                     // best profit seen so far
    L′ := ⟨⟩                                     // Pareto optimal solutions
    foreach x ∈ L do
        if p(x) > p̂ then                         // p(x) = Σ_{k<i} x_k p_k is the total profit of the items in x
            p̂ := p(x)
            L′.pushBack(x)
    return L′

Figure 12.2: A dynamic programming algorithm for the knapsack problem.
only computes Pareto optimal solutions. A solution x is Pareto optimal if there is no other solution that achieves higher profit using no more knapsack capacity than x. We cannot overlook optimal solutions by this omission: a solution of a subproblem that is not Pareto optimal can always be replaced by a Pareto optimal solution with at least the same profit and no larger weight.
Algorithm dpKnapsack needs O(nM) worst case time. This is quite good if M is not too large. Since the running time is polynomial in n and M, dpKnapsack is a pseudopolynomial algorithm. The "pseudo" means that this is not necessarily polynomial in the input size measured in bits: we can encode an exponentially large M in a polynomial number of bits. [say something about average case complexity]
Exercises
Exercise 12.9 (Making Change.) Suppose you have to program a vending machine
that should give exact change using a minimum number of coins.
a) Develop an optimal greedy algorithm that works in the Euro zone with coins
worth 1, 2, 5, 10, 20, 50, 100, and 200 cents and in the Dollar zone with coins
worth 1, 5, 10, 25, 50, and 100 cents.
b) Show that this algorithm would not be optimal if there were a 4 cent coin.
c) Develop a dynamic programming algorithm that gives optimal change for any
currency system.
Exercise 12.10 (Chained matrix multiplication.) We want to compute the matrix product M_1 · M_2 · · · M_n where M_i is a k_{i−1} × k_i matrix. Assume that a pairwise matrix product is computed in the straightforward way using m·k·s element multiplications for the product of an m × k matrix with a k × s matrix. Exploit the associativity of the matrix product to minimize the number of element multiplications needed. Use dynamic programming to find an optimal evaluation order in time O(n³).
Exercise 12.11 (Minimum edit distance.) Use dynamic programming to find the minimum edit distance between two strings s and t. The minimum edit distance is the
minimum number of character deletions, insertions, and replacements applied to s
that produces string t.
Exercise 12.12 Does the principle of optimality hold for minimum spanning trees?
Check the following three possibilities for definitions of subproblems: Subsets of
nodes, arbitrary subsets of edges, and prefixes of the sorted sequence of edges.
Exercise 12.13 (Constrained shortest path.) Consider a graph G = (V, E) where edges e ∈ E have a length ℓ(e) and a cost c(e). We want to find a path from node s to node t that minimizes the total cost of the path subject to the constraint that the total length of the path is at most L. Show that subpaths from s′ to t′ of optimal solutions are not necessarily optimal paths from s′ to t′.
Exercise 12.14 Implement a table-based dynamic programming algorithm for the knapsack problem that needs nM + O(M) bits of space.
Exercise 12.15 Implement algorithm dpKnapsack from Figure 12.2 so that all required operations on solutions (w(x), p(x), x ∪ {i}) work in constant time.
12.4.1 Branch-and-Bound
Function bbKnapsack(⟨p_1, . . . , p_n⟩, ⟨w_1, . . . , w_n⟩, M) : L
    assert p_1/w_1 ≥ p_2/w_2 ≥ · · · ≥ p_n/w_n    // assume input is sorted by profit density
    x̂ := heuristicKnapsack(⟨p_1, . . . , p_n⟩, ⟨w_1, . . . , w_n⟩, M) : L   // best solution so far
    x : L                                          // current partial solution
    recurse(1, M, 0)
    return x̂

// Find solutions assuming x_1, . . . , x_{i−1} are fixed, M′ = M − Σ_{k<i} x_k w_k, P = Σ_{k<i} x_k p_k.
Procedure recurse(i, M′, P : ℝ)
    if P + upperBound(⟨p_i, . . . , p_n⟩, ⟨w_i, . . . , w_n⟩, M′) > Σ_{k=1}^{n} x̂_k p_k then
        if i > n then x̂ := x
        else                                       // branch on variable x_i
            if w_i ≤ M′ then x_i := 1; recurse(i + 1, M′ − w_i, P + p_i)
            x_i := 0; recurse(i + 1, M′, P)

Figure 12.3: A branch-and-bound algorithm for the knapsack problem. Function heuristicKnapsack constructs a feasible solution using some heuristic algorithm. Function upperBound computes an upper bound for the possible profit.
Figure 12.3 gives pseudocode for a systematic search routine for the knapsack problem. The algorithm follows a pattern known as branch-and-bound. Branching is the most fundamental ingredient of systematic search routines. Branching tries all sensible settings of a part of the result (here the values 0 and 1 for x_i) and solves the resulting subproblems recursively. [picture with an example search tree] Algorithms based on branching systematically explore the resulting tree of subproblems. Branching decisions are the internal nodes of this search tree.
Bounding is a more specific method to prune subtrees that cannot contain promising solutions. A branch-and-bound algorithm keeps an incumbent x̂ for the best solution found so far. Initially, the incumbent is found using a heuristic routine. In the knapsack example we could use a greedy heuristic that scans the items by decreasing profit density and includes items as capacity permits. Later, x̂ contains the best solution found at any leaf of the search tree. The profit of the incumbent is a lower bound on the optimal profit; it is complemented by an upper bound that can be computed quickly. In our example the upper bound could be the profit of the fractional knapsack problem with items i..n and capacity M − Σ_{j<i} x_j w_j. In Section 12.1.1 we have seen that the fractional knapsack problem can be solved quickly; in Exercise 12.16 we even outline an algorithm that runs in time O(log n). Upper and lower bound together allow us to stop searching as soon as the upper bound for the solutions obtainable from x is no better than the lower bound already known.
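A C++ sketch of the branch-and-bound scheme of Figure 12.3, with the greedy incumbent and the fractional upper bound spelled out. It is a sketch under the assumption that the items are already sorted by decreasing profit density, and it returns only the optimal profit, not the solution vector; all names are our own.

#include <vector>
#include <cstddef>

struct BBKnapsack {
    const std::vector<double>& p, & w;   // sorted so that p[0]/w[0] >= p[1]/w[1] >= ...
    double M, best = 0.0;                // best = profit of the incumbent
    BBKnapsack(const std::vector<double>& p_, const std::vector<double>& w_, double M_)
        : p(p_), w(w_), M(M_) {}

    // fractional knapsack on items i..n-1 with remaining capacity cap (LP upper bound)
    double upperBound(std::size_t i, double cap) const {
        double ub = 0.0;
        for (; i < p.size() && w[i] <= cap; ++i) { ub += p[i]; cap -= w[i]; }
        if (i < p.size()) ub += p[i] * (cap / w[i]);
        return ub;
    }
    void recurse(std::size_t i, double cap, double profit) {
        if (profit + upperBound(i, cap) <= best) return;   // bound: cannot beat incumbent
        if (i == p.size()) { best = profit; return; }      // leaf: incumbent improved
        if (w[i] <= cap) recurse(i + 1, cap - w[i], profit + p[i]);   // branch x_i = 1
        recurse(i + 1, cap, profit);                                  // branch x_i = 0
    }
    double solve() {
        double cap = M;                   // greedy incumbent: take items in density order
        for (std::size_t i = 0; i < p.size(); ++i)
            if (w[i] <= cap) { best += p[i]; cap -= w[i]; }
        recurse(0, M, 0.0);
        return best;
    }
};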
Exercises
Exercise 12.16 (Logarithmic time upper bounds for the knapsack problem.) Explain how to implement the function upperBound in Figure 12.3 so that it runs in time O(log n). Hint: precompute the prefix sums Σ_{k≤i} w_k and Σ_{k≤i} p_k and use binary search[rather golden ratio search?].
Exercise 12.17 (15-puzzle)
The 15-puzzle is a popular mechanical puzzle. You have to shift 15 scrambled squares in a 4 × 4 frame into the right order. Define a move as the action of moving one square into the hole. Implement a systematic search algorithm that finds a shortest move sequence from a given starting configuration to the ordered configuration shown in the picture to the right. Use iterative deepening depth first search [59]: try all one-move sequences first, then all two-move sequences, . . . This should already work for the simpler 8-puzzle. For the 15-puzzle use the following optimizations: Never undo the immediately preceding move. Maintain a lower bound on the number of moves that would be needed if all pieces could be moved freely and stop exploring a subtree if this bound proves that the current search depth is too small. Decide beforehand whether the number of moves is odd or even. Implement your algorithm to run in constant time per move tried.
[Picture: the 4 × 4 goal configuration of the 15-puzzle with tiles 1 .. 15 in order.]
The optimization algorithms we have seen so far are only applicable in special
circumstances. Dynamic programming needs a special structure of the problem and
may require a lot of space and time. Systematic search is usually too slow for large
inputs. Greedy algorithms are fast but often do not give very good solutions. Local
search can be viewed as a generalization of greedy algorithms. We still solve the problem incrementally, but we are allowed to change the solution as often as we want, possibly reconsidering earlier decisions.
Figure 12.4 gives the basic framework, which we will later refine. Local search maintains a current feasible solution x and the best solution x̂ seen so far. We will see in Section 12.5.2 that the restriction to work only with feasible solutions can be circumvented. The idea behind local search is to incrementally modify the current solution x. The main application-specific design choice for a local search algorithm is to define how a solution can be modified; the neighborhood N(x) formalizes this concept. The second important design decision is which element from the neighborhood is chosen. Finally, some heuristic decides when to stop searching.
solutions that make f (x) worse. This implies a dilemma. We must be able to make
many downhill steps to escape from wide local optima and at the same time we must
be sufficiently goal directed to find a global optimum at the end of a long narrow ridge.
[Picture: the annealing analogy. Shock cooling a liquid yields a glass; slow annealing yields a crystal.]
The first term penalizes illegal edges and the second favors large color classes. Exercise 12.18 asks you to show that this cost function ensures feasible colorings at local optima. Hence, simulated annealing is guaranteed to find a feasible solution even if it starts with an illegal coloring. [picture!]
The Kempe Chain Approach
Now only legal colorings are allowed as solutions. The objective function simplifies to f(x) = Σ_i |C_i|². To find a candidate for a new solution, randomly choose two colors i and j and a node v with color x(v) = i. Consider the maximal connected component K of G that contains v and consists only of nodes with colors i and j. Such a component is called a Kempe chain. Now exchange the colors i and j in all nodes of K. If we start with a legal coloring, the result will again be a legal coloring. [picture!]
Experimental Results
Johnson et al. [49] have made a detailed study of algorithms for graph coloring with
particular emphasis on simulated annealing. The results depend a lot on the structure
of the graph. Many of the experiments use random graphs. The usual model for an
undirected random graph picks each possible edge {u, v} with probability p. The edge
probability p can be used to control the expected number p·n(n − 1)/2 of edges in the graph.
For random graphs with 1000 nodes and edge probability 0.5, Kempe chain annealing produced very good colorings given enough time. However, a sophisticated
and expensive greedy algorithm, XRLF, produces even better solutions in less time.
Penalty function annealing performs rather poorly. For very dense random graphs
with p = 0.9, Kempe chain annealing overtakes XRLF.
For sparser random graphs with edge probability 0.1, penalty function annealing
overtakes Kempe chain annealing and can sometimes compete with XRLF.
Another interesting class of random inputs is random geometric graphs: associate the nodes of a graph with random, uniformly distributed positions in the unit square [0, 1] × [0, 1] and add an edge (u, v) whenever the Euclidean distance between u and v is less than some range r. Such instances might be a good model for an application
where nodes are radio transmitters, colors are frequency bands, and edges indicate
possible interference between neighboring senders that use the same frequency. For
this model, Kempe chain annealing is outclassed by a third annealing strategy not
described here.
Interestingly, the following simple greedy heuristic is quite competitive:
Given a graph G = (V, E), we keep a subset V′ of nodes that are already colored. Initially, V′ = ∅.
In every step we pick a node v ∈ V \ V′ that maximizes |{(u, v) ∈ E : u ∈ V′}|. Node v is then colored with the smallest legal color.
To obtain better colorings than from a single run, one simply takes the best coloring produced by repeated calls of the heuristic, using a random way to break ties when selecting the node to be colored.
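A C++ sketch of this greedy heuristic (ties are broken by node index here; for repeated runs one would randomize the tie breaking as described above; names are our own):

#include <vector>

// Greedy coloring: repeatedly pick an uncolored node with the largest number of
// already colored neighbors and give it the smallest color not used by a neighbor.
// Colors are 0, 1, 2, ...
std::vector<int> greedyColoring(const std::vector<std::vector<int>>& adj) {
    int n = (int)adj.size();
    std::vector<int> color(n, -1), coloredNeighbors(n, 0);
    for (int step = 0; step < n; ++step) {
        int v = -1;
        for (int u = 0; u < n; ++u)                   // pick an uncolored node maximizing
            if (color[u] == -1 && (v == -1 || coloredNeighbors[u] > coloredNeighbors[v]))
                v = u;                                // |{(u,w) in E : w already colored}|
        std::vector<bool> used(n, false);             // colors used by v's neighbors
        for (int w : adj[v]) if (color[w] != -1) used[color[w]] = true;
        int c = 0;
        while (used[c]) ++c;                          // smallest legal color
        color[v] = c;
        for (int w : adj[v]) ++coloredNeighbors[w];   // v is now colored
    }
    return color;
}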
Exercises
Exercise 12.18 Show that the objective function for graph coloring given in Section 12.5.2 has the property that any local optimum is a correct coloring. Hint: What happens to f(x) if one end of an illegally colored edge is recolored with a fresh color? Also prove that the cost function of the penalty function approach does not necessarily have its global optimum at a solution that minimizes the number of colors used.
solutions at once.
The individuals in a population produce offspring. Because there is only a limited
amount of resources, only the individuals best adapted to the environment survive. For
optimization this means that feasible solutions are evaluated using the cost function f
and only solutions with lower value are likely to be kept in the population.
Even in bacteria, which reproduce by cell division, no offspring is identical to its parent. The reason is mutation: while a genome is copied, small errors happen. Although mutations usually have an adverse effect on fitness, some improve it. Survival of the fittest means that individuals with useful mutations produce more offspring, so that in the long run the average fitness of the population increases. An optimization algorithm based on mutation produces new feasible solutions by selecting a solution x with large fitness f(x), copying it, and applying a (more or less random) mutation operator to it. To keep the population size constant, a solution with small fitness is removed from the population. Such an optimization algorithm can
be characterized as many parallel local searches. These local searches are indirectly
coupled by survival of the fittest.
In natural evolution, an even more important ingredient is mating: offspring are produced by combining the genetic information of two individuals. The importance of mating is easy to understand if one considers how rare useful mutations are: it takes much longer for two new useful mutations to appear in a single individual than it takes to combine two individuals that each carry one of the two useful mutations.
We now have all the ingredients needed for an evolutionary algorithm. There are
many ideas to brew an optimization algorithm from these ingredients. Figure 12.8
presents just one possible framework. The algorithm starts by creating an initial
population. Besides the population size N, it must be decided how to build the initial
individuals. This process should involve randomness but it might also be useful to use
heuristics for constructing some reasonable solutions from the beginning.
To put selection pressure on the population, it is important to base reproductive success on the fitness of the individuals. However, it is usually not desirable to draw a hard line and use only the fittest individuals, because this might lead to an overly uniform population and inbreeding. Instead, one draws reproduction partners randomly and only biases the selection by assigning a higher selection probability to fitter individuals. An important design decision is how to fix these probabilities. One choice is to sort the individuals by fitness and then to define p(xi) as some decreasing function of the rank of xi in the sorted order. This indirect approach has the advantage that it is independent of the shape of f and that it is equally discriminating at the beginning, when fitness differences are large, and at the end, when fitness differences are usually small.
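One concrete way to realize such rank-based selection probabilities is sketched below; the linear ranking weight N − rank is our own assumption, the text does not prescribe a particular decreasing function:

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Rank-based selection sketch: the selection probability of an individual
// depends only on its rank in fitness order (best = lowest cost f), not on the
// absolute fitness values. Here p is proportional to (N - rank).
int selectParent(const std::vector<double>& cost, std::mt19937& gen) {
  int N = static_cast<int>(cost.size());
  std::vector<int> order(N);
  std::iota(order.begin(), order.end(), 0);
  std::sort(order.begin(), order.end(),
            [&](int a, int b) { return cost[a] < cost[b]; });   // best individual first
  std::vector<double> weight(N);
  for (int rank = 0; rank < N; ++rank) weight[order[rank]] = N - rank;
  std::discrete_distribution<int> pick(weight.begin(), weight.end());
  return pick(gen);   // index of the chosen parent
}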
The most critical operation is mate(xi, xj), which produces two new offspring from two ancestors. The canonical mating operation is called crossover: individuals are assumed to be represented by a string of k bits in such a way that every k-bit string
Figure 12.9: The crossover operator.
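One plausible reading of Figure 12.9 is one-point crossover: cut both k-bit parents at a random position and swap the tails to obtain two offspring. A sketch under this assumption (bit strings as std::vector<bool>, k ≥ 2; names are ours):

#include <random>
#include <utility>
#include <vector>

// One-point crossover sketch (cf. Figure 12.9): choose a cut position c and
// exchange the tails of the two parents.
std::pair<std::vector<bool>, std::vector<bool>>
crossover(const std::vector<bool>& xi, const std::vector<bool>& xj, std::mt19937& gen) {
  std::size_t k = xi.size();                            // both parents have k bits, k >= 2
  std::uniform_int_distribution<std::size_t> cut(1, k - 1);
  std::size_t c = cut(gen);
  std::vector<bool> childA(xi), childB(xj);
  for (std::size_t b = c; b < k; ++b) { childA[b] = xj[b]; childB[b] = xi[b]; }
  return {childA, childB};
}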
Chapter 13
this divide-and-conquer approach. These two quite different solutions for the same problem are also a good example that many roads lead to Rome[check]. For example, quicksort has an expensive divide strategy and a trivial conquer strategy, and it is the other way round for mergesort.
Dynamic Programming:
Whereas divide-and-conquer proceeds in a top-down fashion, dynamic programming (Section 12.3) works bottom-up: it systematically constructs solutions for small problem instances and assembles them into solutions for larger and larger inputs. Since dynamic programming is less goal-directed than divide-and-conquer, it can get expensive in time and space. We might ask why we should not always use a top-down approach. The reason is that a naive top-down approach may solve the same subproblems over and over again. [example: Fibonacci numbers?]
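The Fibonacci numbers suggested in the note above make the point concrete: the naive top-down recursion recomputes the same subproblems exponentially often, whereas the bottom-up version solves each subproblem exactly once. A small sketch:

#include <algorithm>
#include <cstdint>
#include <vector>

// Naive top-down recursion: exponential time because fib(n-2), fib(n-3), ...
// are recomputed many times. Assumes n >= 0.
std::uint64_t fibNaive(int n) {
  return n <= 1 ? n : fibNaive(n - 1) + fibNaive(n - 2);
}

// Bottom-up dynamic programming: each subproblem is solved once,
// O(n) time and O(n) space.
std::uint64_t fibBottomUp(int n) {
  std::vector<std::uint64_t> f(std::max(n + 1, 2));
  f[0] = 0; f[1] = 1;
  for (int i = 2; i <= n; ++i) f[i] = f[i - 1] + f[i - 2];
  return f[n];
}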
Randomization:
Making random decisions in an algorithm helps when there are many good answers
and not too many bad ones and when it would be expensive to definitively distinguish
good and bad. For example, in the analysis of the quicksort-like selection algorithm from Section 5.5, one third of all possible splitters are good. Moreover, finding out
whether a splitter is good is not much cheaper than simply using it for the divide step
of the divide-and-conquer algorithm.
Precomputation:
[preconditioning for APSP?]
[nice example from Brassard & Bratley]
Table 13.1: Basic operations on a Set M of Elements and their complexity (with an implicit O(·)) for six different implementations. Assumptions: e is an Element, h is an Element Handle, k a Key, and n = |M|. An entry "a" stands for an amortized bound, "r" for a randomized algorithm. A "-" means that the representation is not helpful for implementing the operation. All data structures support "forall e ∈ M ..." in time O(n) and |·| in constant time.
Procedure insert(e)              M := M ∪ {e}
Procedure insertAfter(h, e)      assert h = max{e′ ∈ M : e′ < e}; M := M ∪ {e}
Procedure build({e1, . . . , en})  M := {e1, . . . , en}
Function deleteMin               e := min M; M := M \ {e}; return e
Function remove(k)               {e} := {e ∈ M : key(e) = k}; M := M \ {e}; return e
Function remove(h)               e := h; M := M \ {e}; return e
Procedure decreaseKey(h, k)      assert key(h) ≥ k; key(h) := k
Function find(k) : Handle        {h} := {e ∈ M : key(e) = k}; return h
Function locate(k) : Handle      h := min{e ∈ M : key(e) ≥ k}; return h
Procedure merge(M′)              M := M ∪ M′
Procedure concat(M′)             assert max M < min M′; M := M ∪ M′
Function split(h) : Set          M′ := {e ∈ M : e ≤ h}; M := M \ M′; return M′
Function findNext(h) : Handle    return min{e ∈ M : e > h}
Function select(i : ℕ)           {e1, . . . , en} := M; return ei
Operation        List   HashTable  sort-array  PQ      APQ     (a,b)-tree
insert           1      1r         -           log n   1       log n
insertAfter      1      -          -           -       -       1a
build            n      n          n log n     n       n       n log n
deleteMin        -      -          1           log n   log n   1a
remove(Key)      -      1r         -           -       -       log n
remove(Handle)   1      1          -           -       log n   1a
decreaseKey      1      1          -           -       1a      log n
find             -      1r         log n       -       -       log n
locate           -      -          log n       -       -       log n
merge            -      -          n           -       log n   n
concat           1      -          n           -       -       log n
split            1      -          1           -       -       log n
findNext         -      -          1           -       -       1
select           -      -          1           -       -       log n
representations just accumulates inserted elements in the order they arrive using a sequence data structure like (cyclic) (unbounded) arrays or (doubly) linked lists (Chapter 3). Table 13.1 contains a column for doubly linked lists but often even arrays do
the job. For a more detailed comparison and additional operations for sequences you
should refer to Section 3.4.
Hash tables (Chapter 4) are better if you frequently need to find, remove, or change arbitrary elements of the set without knowing their position in the data structure. Hash tables are very fast but have a few restrictions: they give you only probabilistic performance guarantees for some operations (there is some flexibility as to which operations), and you need a hash function for the Element type (or better, a universal family of hash functions). Hash tables are not useful if the application exploits a linear ordering of the elements given by a key.
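For illustration only (the chapter's own implementations behave analogously), here is typical hash table usage with the C++ standard library's std::unordered_map:

#include <string>
#include <unordered_map>

// Hash table usage sketch: expected constant time insert, find, and remove by
// key, but no order on the keys.
int main() {
  std::unordered_map<std::string, int> age;
  age.insert({"alice", 30});              // insert
  age["bob"] = 25;                        // insert or overwrite
  auto it = age.find("alice");            // find by key, expected O(1)
  if (it != age.end()) it->second += 1;   // change the element in place
  age.erase("bob");                       // remove by key
}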
If you need to process elements in the order of key values or if you want to find the elements closest to a given key value, the simplest solution is a sorted array (Chapter 5). Now you can find or locate elements in logarithmic time using binary search[where]. You can also merge sorted arrays in linear time, find elements of given rank easily, and split the array into subarrays with disjoint key ranges. The main restriction of sorted arrays is that insertion and deletion are expensive.
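For example, locate(k) on a sorted std::vector reduces to a single binary search; a minimal sketch:

#include <algorithm>
#include <vector>

// locate(k) on a sorted array: position of the smallest element >= k,
// or a.size() if no such element exists. std::lower_bound performs the
// O(log n) binary search.
std::size_t locate(const std::vector<int>& a, int k) {
  return static_cast<std::size_t>(std::lower_bound(a.begin(), a.end(), k) - a.begin());
}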
Priority queues (Chapter 6) are a somewhat specialized yet frequently needed dynamic data structure. They support insertion of arbitrary elements and deletion of the smallest element. You can additionally remove elements from addressable priority queues (column APQ in Table 13.1) and some variants allow you to merge two priority queues in logarithmic time. Fibonacci heaps support decreaseKey in constant amortized time.
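As an illustration, std::priority_queue (a binary heap) covers the non-addressable case: insert and deleteMin in O(log n), but no decreaseKey or merge; those require addressable variants.

#include <functional>
#include <initializer_list>
#include <iostream>
#include <queue>
#include <vector>

// Priority queue usage sketch: with std::greater the smallest element comes first.
int main() {
  std::priority_queue<int, std::vector<int>, std::greater<int>> pq;
  for (int x : {5, 1, 4, 2}) pq.push(x);   // insert, O(log n) each
  while (!pq.empty()) {
    std::cout << pq.top() << '\n';         // current minimum: 1 2 4 5
    pq.pop();                              // deleteMin, O(log n)
  }
}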
Search trees like (a, b)-trees (Chapter 7) support almost all conceivable element-wise operations in logarithmic time or faster. They even support the operations split and concat in logarithmic time, although these affect the entire sorted sequence. Search trees can be augmented with additional information to support further operations. For example, it is easy to support extraction of the k-th smallest element (Function select) in logarithmic time if each subtree knows its size.
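A sketch of select for a binary search tree in which every node stores its subtree size; this simplified node layout is our own assumption and not the (a, b)-trees of Chapter 7, where the elements sit in the leaves:

// Select the k-th smallest element (1-based) using stored subtree sizes.
struct Node {
  int key;
  int size;            // number of nodes in this subtree
  Node* left;
  Node* right;
};

int subtreeSize(const Node* t) { return t ? t->size : 0; }

const Node* select(const Node* t, int k) {   // assumes 1 <= k <= subtreeSize(t)
  int leftSize = subtreeSize(t->left);
  if (k <= leftSize) return select(t->left, k);    // k-th smallest is on the left
  if (k == leftSize + 1) return t;                 // it is this node itself
  return select(t->right, k - leftSize - 1);       // skip left subtree and this node
}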
[so far the only place where this is needed?] [todo: nicer alignment of setops]
Appendix A
Notation
A.1 Mathematical Symbols
ℕ: Positive integers, ℕ = {1, 2, . . .}.
|, &, <<, >>, ⊕: Bit-wise or, and, left-shift, right-shift, and exclusive-or, respectively.
∑_{i=1}^{n} a_i = ∑_{i ∈ {1,...,n}} a_i := a_1 + a_2 + ··· + a_n
∏_{i=1}^{n} a_i = ∏_{i ∈ {1,...,n}} a_i := a_1 · a_2 ··· a_n
n! := ∏_{i=1}^{n} i, the factorial of n.
div: Integer division; c = m div n is the largest nonnegative integer such that m − c·n ≥ 0. [replace by ⌊m/n⌋?]
a ≡ b (mod m): There is an integer i such that a + i·m = b.
≺: Some ordering relation. In Section 9.2 it denotes the order in which nodes are marked during depth-first search.
i..j: Shorthand for {i, i + 1, . . . , j}.
prime number: An integer n ≥ 2 is prime if there are no integers a, b > 1 such that n = a · b.
A^B: When A and B are sets, this is the set of all functions mapping B to A.
strict weak order: A relation that is like a total order except that antisymmetry only needs to hold with respect to some equivalence relation that is not necessarily the identity (see also http://www.sgi.com/tech/stl/LessThanComparable.html).
symmetric relation: A relation ∼ is symmetric if for all a and b, a ∼ b implies b ∼ a.
total order: A reflexive, transitive, antisymmetric relation in which any two elements are comparable.
transitive: A relation ∼ is transitive if for all a, b, c, a ∼ b and b ∼ c imply a ∼ c.
true: A shorthand for the value 1.
A.2 Probability Theory
The basis of any argument in probability theory is a sample space Ω. For example, if we want to describe what happens if we roll two dice, we would probably use the 36-element sample space {1, . . . , 6} × {1, . . . , 6}. In a random experiment, any element of Ω is chosen with the elementary probability p = 1/|Ω|. More generally, the probability of an event E ⊆ Ω is the sum of the probabilities of its elements, i.e., prob(E) = |E|/|Ω|. [conditional probability needed?] A random variable is a mapping that assigns to each element of the sample space the value we obtain when this element of the sample space is drawn. For example, X could give the number shown by the first die[check: die vs. dice] and Y could give the number shown by the second die. Random variables are usually denoted by capital letters to differentiate them from plain values.
We can define new random variables as expressions involving other random variables and ordinary values. For example, if X and Y are random variables, then (X + Y)(ω) = X(ω) + Y(ω), (X · Y)(ω) = X(ω) · Y(ω), and (X + 3)(ω) = X(ω) + 3.
Events are often specified by predicates involving random variables. For example, we have prob(X ≤ 2) = 1/3 or prob(X + Y = 11) = prob({(5, 6), (6, 5)}) = 1/18.
Indicator random variables are random variables that can only take the values
zero and one. Indicator variables are a useful tool for the probabilistic analysis of
algorithms because they encode the behavior of complex algorithms into very simple
mathematical objects.
The expected value of a random variable X : Ω → A is
    E[X] = ∑_{x ∈ A} x · prob(X = x).                        (A.1)
Expected values are linear even for dependent random variables, i.e.,
    E[X_1 + ··· + X_k] = E[X_1] + ··· + E[X_k].              (A.2)
In contrast, E[X · Y] = E[X] · E[Y] holds if X and Y are independent. Random variables X_1, . . . , X_k are independent if and only if
    ∀x_1, . . . , x_k : prob(X_1 = x_1 ∧ ··· ∧ X_k = x_k) = prob(X_1 = x_1) ··· prob(X_k = x_k).   (A.3)
[exercise?: let A, B denote independent indicator random variables. Let X = A ⊕ B. Show that X, A, B are pairwise independent, yet not independent.]
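A quick check of the claim in this note, assuming A and B are independent fair bits and X = A ⊕ B, can simply enumerate the four equally likely outcomes:

#include <iostream>

// Enumerate the four outcomes (a, b), each with probability 1/4, and compare
// joint probabilities with products of marginals: A and X are pairwise
// independent, but A, B, X together are not independent.
int main() {
  double pA1 = 0, pB1 = 0, pX1 = 0, pAX11 = 0, pABX111 = 0;
  for (int a = 0; a <= 1; ++a)
    for (int b = 0; b <= 1; ++b) {
      int x = a ^ b;                           // X = A xor B
      pA1 += a / 4.0; pB1 += b / 4.0; pX1 += x / 4.0;
      pAX11 += (a && x) / 4.0;                 // prob(A = 1 and X = 1)
      pABX111 += (a && b && x) / 4.0;          // prob(A = 1 and B = 1 and X = 1)
    }
  std::cout << "prob(A=1,X=1) = " << pAX11
            << " equals prob(A=1)*prob(X=1) = " << pA1 * pX1 << '\n';
  std::cout << "prob(A=1,B=1,X=1) = " << pABX111
            << " differs from prob(A=1)*prob(B=1)*prob(X=1) = " << pA1 * pB1 * pX1 << '\n';
}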
We will often work with a sum X = X_1 + ··· + X_n of n indicator random variables X_1, . . . , X_n. The sum X is easy to handle if X_1, . . . , X_n are independent. In particular, there are strong tail bounds that bound the probability of large deviations from the expectation of X. We will only use the following variant of a Chernoff bound:
    prob(X < (1 − ε)E[X]) ≤ e^{−ε² E[X]/2}.                  (A.4)
If the indicator random variables are also identically distributed with prob(X_i = 1) = p, then X is binomially distributed, i.e.,
    prob(X = i) = \binom{n}{i} p^i (1 − p)^{n−i}.            (A.5)
A.3 Useful Formulas

    ∑_{i=1}^{n} i = n(n + 1)/2                               (A.6)
    [∑_{i=1}^{n} i²]                                         (A.7)
    ln n ≤ H_n = ∑_{k=1}^{n} 1/k ≤ ln n + 1                  (A.8)
    ∑_{i=0}^{n−1} q^i = (1 − q^n)/(1 − q)  for q ≠ 1         (A.9)
Stirling's equation:
    n! = (1 + O(1/n)) √(2πn) (n/e)^n, in particular (n/e)^n ≤ n!   (A.10)
Bibliography
[1] G. M. Adelson-Velskii and E. M. Landis. An algorithm for the organization of information. Soviet Mathematics Doklady, 3:1259–1263, 1962.
[2] A. V. Aho, B. W. Kernighan, and P. J. Weinberger. The AWK Programming Language. Addison-Wesley, 1988.
[3] R. Ahuja, K. Mehlhorn, J. Orlin, and R. Tarjan. Faster algorithms for the shortest path problem. Journal of the ACM, 37(2):213–223, 1990.
[4] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice Hall, 1993.
[5] A. Andersson, T. Hagerup, S. Nilsson, and R. Raman. Sorting in linear time? Journal of Computer and System Sciences, pages 74–93, 1998.
[6] A. Andersson and M. Thorup. A pragmatic implementation of monotone priority queues. In DIMACS'96 Implementation Challenge, 1996.
[7] F. Annexstein, M. Baumslag, and A. Rosenberg. Group action graphs and parallel architectures. SIAM Journal on Computing, 19(3):544–569, 1990.
[8] R. Bayer and E. M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica, 1(3):173–189, 1972.
[9] M. A. Bender, E. D. Demaine, and M. Farach-Colton. Cache-oblivious B-trees. In 41st IEEE Symposium on Foundations of Computer Science, pages 399–409, 2000.
[10] J. L. Bentley and M. D. McIlroy. Engineering a sort function. Software: Practice and Experience, 23(11):1249–1265, 1993.
[11] J. L. Bentley and T. A. Ottmann. Algorithms for reporting and counting geometric intersections. IEEE Transactions on Computers, pages 643–647, 1979.
[12] J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In 8th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 360–369, 1997.
[13] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.
[14] G. E. Blelloch, C. E. Leiserson, B. M. Maggs, C. G. Plaxton, S. J. Smith, and M. Zagha. A comparison of sorting algorithms for the Connection Machine CM-2. In ACM Symposium on Parallel Architectures and Algorithms, pages 3–16, 1991.
[15] M. Blum, R. W. Floyd, V. R. Pratt, R. L. Rivest, and R. E. Tarjan. Time bounds for selection. Journal of Computer and System Sciences, 7(4):448–461, 1973.
[16] O. Borůvka. O jistém problému minimálním. Práce Moravské Přírodovědecké Společnosti, pages 1–58, 1926.
[17] G. S. Brodal. Worst-case efficient priority queues. In 7th Symposium on Discrete Algorithms, pages 52–58, 1996.
[18] M. Brown and R. Tarjan. Design and analysis of a data structure for representing sorted lists. SIAM Journal on Computing, 9:594–614, 1980.
[19] R. Brown. Calendar queues: A fast O(1) priority queue implementation for the simulation event set problem. Communications of the ACM, 31(10):1220–1227, 1988.
[31] M. Dietzfelbinger and F. Meyer auf der Heide. Simple, efficient shared memory simulations. In 5th ACM Symposium on Parallel Algorithms and Architectures, pages 110–119, 1993.
[64] K. Mehlhorn. Data Structures and Algorithms, Vol. 1: Sorting and Searching. EATCS Monographs on Theoretical Computer Science. Springer-Verlag, 1984.
[65] K. Mehlhorn. A faster approximation algorithm for the Steiner problem in graphs. Information Processing Letters, 27(3):125–128, 1988.
[66] K. Mehlhorn and S. Näher. Bounded ordered dictionaries in O(log log N) time and O(n) space. Information Processing Letters, 35(4):183–189, 1990.
[67] K. Mehlhorn and S. Näher. The LEDA Platform for Combinatorial and Geometric Computing. Cambridge University Press, 1999. 1018 pages.
[68] K. Mehlhorn and S. Näher. The LEDA Platform of Combinatorial and Geometric Computing. Cambridge University Press, 1999.
[81] P. Sanders and S. Winkel. Super scalar sample sort. In 12th European Symposium on Algorithms (ESA), volume 3221 of LNCS, pages 784–796. Springer, 2004.
[82] F. Santos and R. Seidel. A better upper bound on the number of triangulations of a planar point set. Journal of Combinatorial Theory, Series A, 102(1):186–193, 2003.
[92] M. Thorup. Integer priority queues with decrease key in constant time and the single source shortest paths problem. In 35th ACM Symposium on Theory of Computing, pages 149–158, 2003.
[93] P. van Emde Boas. Preserving order in a forest in less than logarithmic time. Information Processing Letters, 6(3):80–82, 1977.
[94] J. Vuillemin. A data structure for manipulating priority queues. Communications of the ACM, 21:309–314, 1978.
[95] L. Wall, T. Christiansen, and J. Orwant. Programming Perl. O'Reilly, 3rd edition, 2000.
[96] I. Wegener. BOTTOM-UP-HEAPSORT, a new variant of HEAPSORT beating, on an average, QUICKSORT (if n is not very small). Theoretical Computer Science, 118:81–98, 1993.
[97] Y. Han and M. Thorup. Integer sorting in O(n √(log log n)) expected time and linear space. In 43rd Symposium on Foundations of Computer Science, pages 135–144, 2002.