Massively Parallel Tabu Search for the Quadratic Assignment Problem
MASTER'S THESIS
Samuel Gabrielsson
September 2007
Abstract
A parallel version of the tabu search algorithm is implemented and used to optimize
the solutions for a quadratic assignment problem (QAP). The instances are taken from
the qaplib website1 and we mainly concentrate on solving and optimizing the instances
announced by Sergio Carvalho derived from the “Microarray Placement Problem”2
where one wants to find an arrangement of the probes (small DNA fragments) on
specific locations of a microarray chip.
We briefly explain combinatorics, including graph theory, as well as the theory
behind combinatorial optimization, heuristics and metaheuristics. Some network
optimization problems are also introduced before we apply our parallel tabu
search algorithm to the quadratic assignment problem.
Different approaches, such as a Boltzmann selection procedure and random restarts,
are used to optimize the solutions. Through our experiments, we show that our parallel
version of tabu search does indeed manage to further optimize, and even improve on,
the best solutions found so far in the literature.
We try out a communication protocol based on sequentially generated graphs,
where each node in the graph corresponds to a CPU or tabu search thread. One of the
main goals is to find out whether communication helps to further optimize the best
known solution found so far for each instance.
1 http://www.opt.math.tu-graz.ac.at/qaplib/
2 http://gi.cebitec.uni-bielefeld.de/comet/chiplayout/qap
Acknowledgements
This thesis is the final part of the Master of Science programme in Computer Science
and Engineering. It has been carried out during the spring semester of 2007 in the
Toronto Intelligent Decision Engineering Lab (TIDEL), at the University of Toronto
(UofT), Ontario, Canada.
I was one of the lucky few students from abroad to work for Prof. J. Christopher
Beck in the Department of Mechanical and Industrial Engineering at UofT as a research
trainee. It was truly an honor, and I most definitely had lots of fun learning and working
on the different projects. All thanks to Prof. Beck and the IAESTE organization
in Luleå, Sweden for giving me that chance and exposure to the wonderful world of
research.
I would also like to thank the people in TIDEL, especially Lei Duan for all his hard
work, and Ivan Heckman for helping out on making the fundy cluster behave nicely.
Finally, I would like to thank my thesis supervisor Inge Söderkvist in Luleå, Sweden
for his help in improving this thesis.
The thesis is meant to be a spin-off of the parallel tabu search part of a published
paper [1] by Lei Duan, the author of this thesis, and Professor J. Christopher Beck at
the University of Toronto3 .
3 http://tidel.mie.utoronto.ca/publications.php
Contents
1 Introduction 3
2 Combinatorics 5
2.1 The Rule of Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Permutations - When Order Matters . . . . . . . . . . . . . . . . . . . . 6
2.2.1 Permutations Without Repetition . . . . . . . . . . . . . . . . . 6
2.2.2 Permutations With Repetition . . . . . . . . . . . . . . . . . . . 7
2.3 Combinations - When Order Does Not Matter . . . . . . . . . . . . . . 7
2.3.1 Combinations Without Repetition . . . . . . . . . . . . . . . . . 7
2.3.2 Combinations With Repetition . . . . . . . . . . . . . . . . . . . 8
2.4 Graph Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4.1 Introducing Graphs . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4.2 Directed Graphs and Undirected Graphs . . . . . . . . . . . . . . 9
2.4.3 Adjacency and Incidence . . . . . . . . . . . . . . . . . . . . . . 10
2.4.4 Paths and Cycles . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.4.5 Subgraphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.6 Vertex Degrees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.7 Graph Representation . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4.8 Graph Applications . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Combinatorial Optimization 19
3.1 Combinatorial Optimization Problems . . . . . . . . . . . . . . . . . . . 20
3.1.1 The Traveling Salesman Problem . . . . . . . . . . . . . . . . . . 21
3.2 Computational Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.1 The Class P . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.2 The Class N P . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.3 The Classes N P-Hard and N P-Complete . . . . . . . . . . . . . 25
3.3 Heuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4 Heuristic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.1 Local Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.4.2 Hill Climbing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.4.3 Local Improvement Procedures . . . . . . . . . . . . . . . . . . . 31
3.5 Metaheuristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.6 Metaheuristic Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.6.1 Greedy Algorithms and Greedy Satisfiability . . . . . . . . . . . 34
3.6.2 Simulated Annealing . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.6.3 Hybrid Evolutionary Algorithm . . . . . . . . . . . . . . . . . . . 35
4 Network Optimization Problems 39
4.1 Network Flow Problem Terminology . . . . . . . . . . . . . . . . . . . . 39
4.2 The Minimum Cost Network Flow Problem . . . . . . . . . . . . . . . . 40
4.3 The Transportation Problem . . . . . . . . . . . . . . . . . . . . . . . . 40
4.4 The Assignment Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.5 The Quadratic Assignment Problem . . . . . . . . . . . . . . . . . . . . 41
4.5.1 Mathematical Formulation of the QAP . . . . . . . . . . . . . . . 42
4.5.2 Location Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.5.3 A Quadratic Assignment Problem Library . . . . . . . . . . . . . 46
List of Figures
2.1 The seven bridges of Königsberg with four land areas interconnected
by the seven bridges. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2 The graph representation of the seven bridges problem. . . . . . . . . . 9
2.3 The difference between an undirected and a directed graph is indicated
by drawing an arrow to show the direction. . . . . . . . . . . . . . . . . 10
2.4 The two edges (e, c) and (c, e) joining the same pair of vertices in Figure
2.4(a) form a graph with multiple edges. Figure 2.4(b) has a loop at node
b and Figure 2.4(c) is just a simple graph with no loops or multiple edges. 11
2.5 A graph showing the relationship between adjacency and incidence. . . . 11
2.6 An example of a large graph. . . . . . . . . . . . . . . . . . . . . . . . . 12
2.7 A disconnected graph. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.8 A graph G and one of its subgraphs G1 with an isolated node d. . . . . 13
2.9 The complete graph with n vertices denoted by Kn . . . . . . . . . . . . 14
2.10 The null graph of 1, 2, 3, 4, 5 and 6 vertices denoted Nn where n is the
number of vertices or nodes. . . . . . . . . . . . . . . . . . . . . . . . . 14
2.11 The graphs show two different ways to represent the weighted graph. . . 16
List of Tables
6.1 The best values are shown in bold. These values are even better than
those found in the literature. . . . . . . . . . . . . . . . . . . . . . . . . 57
Chapter 1
Introduction
Chapter 2
Combinatorics
Combinatorics [2], which is a collective name for the fundamental principles of count-
ing, combinations and permutations, was first presented by Thomas Kirkman in a
paper from 1857 to the Historic Society of Lancashire and Cheshire. Combinato-
rial methods are today very important in statistics and computer science. In many
computer applications, determining the efficiency of algorithms requires some skill in
counting.
In this chapter we will be counting choices or distributions which may be ordered
or unordered and in which repetitions may or may not be allowed. Counting is thus
a very important activity in combinatorial mathematics. Graph theory is also
included because we are often concerned with counting the number of objects of a given
type in particular graphs.
Definition 1 (The Rule of Product). If a procedure can be broken down into first and
second stages, and if there are m possible outcomes for the first stage and if, for each
of these outcomes, there are n possible outcomes for the second stage, then the total
procedure can be carried out, in the designated order, in m × n ways.
Solution. By the rule of product, the director can cast his leading couple in 2 × 3 = 6
ways.
2.2 Permutations - When Order Matters
Linear arrangements of distinct objects are often called permutations. We
give an example adapted from [3].
Example 3. A computer science class at Luleå University of Technology consists of 15
students. Four are to be chosen and seated in a row for a picture during the start of
the new school year. How many such linear arrangements are possible?
Solution. The most important word in this example is arrangement, which implies
order. Let A, B, C, . . . , N and O denote the 15 students, then CAGO, EGBO, and
OBGE are three different arrangements, even though the last two involve the same
four students. Each of the 15 students can occupy the first position in the row. To fill
the second position, we can only select one of the fourteen remaining students because
repetitions are not allowed. Continuing in this way, we find only twelve students to
select from in order to fill the fourth and final position as shown in Table 2.1.
This gives us a total of 32760 possible arrangements of four students selected from
the class of 15.
The following notation allows us to express our answers in a more convenient form.
Definition 2. For an integer n ≥ 0, n factorial (denoted n!) is defined by
0! = 1, (2.1)
n! = (n)(n − 1)(n − 2) · · · (3)(2)(1), for n ≥ 1. (2.2)
Solution. 5! = 5 × 4 × 3 × 2 × 1 = 120.
The values of n! increase very fast. To better appreciate how fast n! grows we
calculate 10! = 3628800 which happens to be the number of seconds in six weeks. In
the same way 11! exceeds the number of seconds in one year, 12! in twelve years and
13! surpasses the number of seconds in a century.
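To make these growth claims concrete, here is a short Python check (the six-week figure is in fact exact):

```python
import math

# 10! is exactly the number of seconds in six weeks:
# 6 weeks * 7 days * 24 hours * 3600 seconds = 3628800.
assert math.factorial(10) == 6 * 7 * 24 * 3600 == 3628800

# 11! exceeds the seconds in a (365-day) year, 12! the seconds in
# twelve years, and 13! the seconds in a century.
year = 365 * 24 * 3600
assert math.factorial(11) > year
assert math.factorial(12) > 12 * year
assert math.factorial(13) > 100 * year
```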
(n)(n − 1)(n − 2) · · · (n − r + 1) × [(n − r)(n − r − 1) · · · (3)(2)(1)] / [(n − r)(n − r − 1) · · · (3)(2)(1)] (2.3)
which in factorial notation results in
n!/(n − r)!. (2.4)
We denote Equation 2.4 by P (n, r). When r = n we find that P (n, n) = n!/0! = n!.
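The formula P (n, r) = n!/(n − r)! is easy to verify in Python; `permutations_count` is an illustrative helper name of our own, not one used elsewhere in this thesis:

```python
import math

def permutations_count(n: int, r: int) -> int:
    """P(n, r) = n! / (n - r)!: ordered selections of r out of n objects."""
    return math.factorial(n) // math.factorial(n - r)

# Example 3: four of the 15 students seated in a row.
assert permutations_count(15, 4) == 15 * 14 * 13 * 12 == 32760
# When r = n, P(n, n) = n!.
assert permutations_count(5, 5) == math.factorial(5) == 120
```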
n^r (2.5)
The binomial coefficient symbol (n over r) is also used instead of C(n, r) and is
sometimes read as “n choose r”. Note that C(n, 0) = 1 for all n ≥ 0.
Example 7. Luleå Academic Computer Society (LUDD) is hosting the yearly taco
party, but there is only room for 40 members and 45 want to get in. In how many
ways can the chairman invite the lucky 40 members? The order is not important.
Solution. The chairman can invite the lucky 40 members in C(45, 40) = 45!/(5! 40!) =
1221759 ways. However, once the 40 members arrive, how the chairman arranges them
around the table becomes an arrangement problem.
Example 8. An ice cream shop offers five flavors of ice cream: vanilla, chocolate,
strawberry, banana and lemon. You can only have three scoops. How many variations
will there be?
Solution. There will be (5 + 3 − 1)!/(3! (5 − 1)!) = 7!/(3! 4!) = 35 variations.
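Both counting formulas can be checked directly in Python; the helper names below are our own illustrative choices:

```python
import math

def combinations_count(n: int, r: int) -> int:
    """C(n, r) = n! / (r! (n - r)!): unordered selections, no repetition."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

def multiset_count(n: int, r: int) -> int:
    """Combinations with repetition: C(n + r - 1, r)."""
    return combinations_count(n + r - 1, r)

# Example 7: choosing 40 of the 45 members for the taco party.
assert combinations_count(45, 40) == 1221759
# Example 8: three scoops from five ice cream flavors.
assert multiset_count(5, 3) == 35
```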
Figure 2.1: The seven bridges of Königsberg with four land areas interconnected by
the seven bridges.
Figure 2.2: The graph representation of the seven bridges problem.
9
a to b are two-way roads, we also get the relation (b ℜ a). If all the roads are two-way,
we get a symmetric relation.
Definition 5. Let V be a finite nonempty set, and let E ⊆ V × V . The pair (V, E) is
then called a directed graph on V , or digraph on V , where V is the set of vertices or
nodes and E is its set of directed edges or arcs. We write the graph as G = (V, E).
When there is no concern about the direction of any edge, we still write G = (V, E).
But now E is a set of undirected pairs of elements taken from V , and G is called an
undirected graph. In general, if a graph G is not specified as directed or undirected, it
is assumed to be undirected. Whether G = (V, E) is directed or undirected, we often
call V the vertex set of G and E the edge set of G.
(a) Undirected graph. (b) Directed graph or digraph.
Figure 2.3: The difference between an undirected and a directed graph is indicated by
drawing an arrow to show the direction.
The graph in Figure 2.3(a) has six vertices and seven edges:
V = {a, b, c, d, e, f } (2.9)
E = {(a, b), (a, d), (b, c), (b, e), (c, f ), (d, e), (e, f )} (2.10)
The directed graph in Figure 2.3(b) has six vertices and eight directed edges:
V = {a, b, c, d, e, f } (2.11)
E = {(a, b), (b, e), (e, b), (c, b), (c, f ), (e, f ), (d, e), (d, a)} (2.12)
Definition 6. In a graph, two or more edges joining the same pair of vertices are called
multiple edges. An edge joining a vertex to itself is called a loop, see Figure 2.4(b).
(a) Multiple edges. (b) A loop. (c) A simple graph.
Figure 2.4: The two edges (e, c) and (c, e) joining the same pair of vertices in Figure
2.4(a) form a graph with multiple edges. Figure 2.4(b) has a loop at node b and Figure
2.4(c) is just a simple graph with no loops or multiple edges.
Figure 2.5: A graph showing the relationship between adjacency and incidence.
We show an example from [4] to clarify Definition 9 above.
Example 10. In the graph of Figure 2.6 (a) Find a walk that is not a trail. (b) Find
a trail that is not a path. (c) Find five b-d paths. (d) Find the length for each path
of part c. (e) Find a circuit that is not a cycle. (f) Find all distinct cycles that are
present.
Figure 2.6: An example of a large graph.
Definition 10. Let G = (V, E) be an undirected graph. If there is a path from any
vertex to any other vertex in the graph, i.e., every pair of its vertices is connected by a
path, then G is called a connected graph. A graph that is not connected is said to be
a disconnected graph.
Definition 11. Let G = (V, E) be a directed graph. Its associated undirected graph
is the graph obtained from G by ignoring the directions on the edges. If more than one
undirected edge results for a pair of distinct vertices in G, then only one of these edges
is drawn in the associated undirected graph. When this associated graph is connected,
we consider G connected.
For example, Figure 2.7 is a disconnected graph because there is no path from a
to c. However, the graph is composed of pieces with vertex sets V1 = {a, b, d, e},
V2 = {c, f } and edge sets E1 = {{a, b}, {a, d}, {b, e}, {d, e}}, E2 = {{c, f }} that are
connected. These pieces are called the (connected) components of the graph. Hence
an undirected graph G = (V, E) is disconnected if and only if V can be partitioned
into at least two subsets V1 , V2 such that there is no edge in E of the form {x, y} where
x ∈ V1 and y ∈ V2 . A graph is connected if and only if it has only one component.
Definition 12. For any graph G = (V, E) , κ(G) denotes the number of components
of G.
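The quantity κ(G) can be computed with a standard depth-first search; the sketch below (our own helper, not part of the thesis code) counts the components of the disconnected graph of Figure 2.7:

```python
def components(vertices, edges):
    """Count the connected components of an undirected graph, i.e. kappa(G)."""
    adj = {v: set() for v in vertices}
    for x, y in edges:
        adj[x].add(y)
        adj[y].add(x)
    seen, count = set(), 0
    for v in vertices:
        if v in seen:
            continue
        count += 1
        stack = [v]                  # iterative depth-first search
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend(adj[u] - seen)
    return count

# Figure 2.7: pieces {a, b, d, e} and {c, f}, so kappa(G) = 2.
V = {"a", "b", "c", "d", "e", "f"}
E = [("a", "b"), ("a", "d"), ("b", "e"), ("d", "e"), ("c", "f")]
assert components(V, E) == 2
```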
Figure 2.7: A disconnected graph.
So far we have allowed at most one edge between two vertices. We now extend our
concept of a graph.
Definition 13. Let V be a finite nonempty set. Then the pair (V, E) determines
a multigraph G with vertex set V and edge set E if, for some x, y ∈ V , there are
two or more edges in E of the form (a) (x, y) (for directed multigraph), or (b) {x, y}
(for an undirected multigraph). In both cases we write G = (V, E) to designate the
multigraph.
2.4.5 Subgraphs
We often want to solve complicated problems by looking at simpler objects of the same
type. We do that in mathematics by sometimes studying subsets of sets, subgroups of
groups and so on. In graph theory we define subgraphs of graphs.
(a) G (b) G1
Figure 2.8: A graph G and one of its subgraphs G1 with an isolated node d.
Definition 16. Let G = (V, E) be a directed or undirected graph. If ∅ ≠ S ⊆ V
then the subgraph induced by S, denoted ⟨S⟩, is the subgraph whose vertex set is S and
whose edge set contains all the edges from G whose endpoints both lie in S.
Definition 17. Let V be a set of n vertices. The complete graph on V , denoted Kn ,
is a loop-free undirected graph where for all a, b ∈ V , a ≠ b, there is an edge {a, b}.
In a more readable form, the definition above means that a complete graph is a
graph where each vertex is connected to each of the others by exactly one edge.
Since there are n vertices, this implies that the number of edges satisfies
|E(Kn )| = C(n, 2) = n(n − 1)/2. (2.14)
It also follows that this number is an upper bound on the number of edges of any
graph on n vertices:
|V (G)| = n =⇒ |E(G)| ≤ n(n − 1)/2. (2.15)
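The edge count of Kn can be confirmed by simply enumerating all vertex pairs:

```python
from itertools import combinations

def complete_graph_edges(n: int):
    """All C(n, 2) edges of the complete graph K_n on vertices 0..n-1."""
    return list(combinations(range(n), 2))

# |E(K_n)| = n(n - 1) / 2 for a few small n.
for n in range(1, 8):
    assert len(complete_graph_edges(n)) == n * (n - 1) // 2
```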
Figure 2.10: The null graph of 1, 2, 3, 4, 5 and 6 vertices denoted Nn where n is the
number of vertices or nodes.
3 A single isolated node with no edges, i.e., the null graph on 1 node.
2.4.6 Vertex Degrees
It is convenient to define a term for the number of edges meeting at a vertex, for
example when we wish to specify the number of roads meeting at a particular intersection
or the number of chemical bonds joining an atom to its neighbors.
Definition 19. Let G be an undirected graph or multigraph. The number of edges
incident at vertex v in G is called the degree or valence of v in G, written dG (v) or
simply deg(v) when G requires no explicit reference.
A loop at v is counted twice in computing the degree of v. The minimum of
the degrees of the vertices of a graph G is denoted δ(G), and the maximum ∆(G).
An undirected graph or multigraph where each vertex has the same
degree is called a regular graph; if deg(v) = k for all vertices v, then the graph is called
k-regular. In particular, a vertex of degree 0 is an isolated vertex of G. A vertex of
degree 1 is called a pendant vertex.
As mentioned before, the very first theorem of graph theory was due to Leonhard
Euler.
Theorem 1 (Euler). The sum of the degrees of the vertices of a graph is equal to
twice the number of its edges:
∑_{v∈V} deg(v) = 2|E|. (2.16)
Proof. An edge e = {a, b} of G is counted once while counting the degrees of each of
a and b, even when a = b. Consequently each edge contributes 2 to the sum of the
degrees of the vertices. Thus 2|E| accounts for deg(v), for all v ∈ V ,
and ∑_{v∈V} deg(v) = 2|E|.
Corollary 1. For any graph G, the number of vertices of odd degree is even.
Proof. Let V1 and V2 be the subsets of vertices of G with odd and even degrees respec-
tively. By Theorem 1,
2|E| = ∑_{v∈V} deg(v) = ∑_{v∈V1} deg(v) + ∑_{v∈V2} deg(v). (2.17)
The numbers 2|E| and ∑_{v∈V2} deg(v) are even, so ∑_{v∈V1} deg(v) is also even.
Since deg(v) is odd for each vertex v ∈ V1 , |V1 | must be even.
Definition 20. Let G = (V, E) be an undirected graph or multigraph with no isolated
vertices. If there is a circuit in G that traverses every edge of the graph exactly once
then G has an Euler circuit. An open trail that traverses each edge in G exactly once
is called an Euler trail or Euler path.
So far we have presented a lot of definitions. We can now finally conclude that
the seven bridges problem actually requires us to find an Euler circuit. The question
remains: is there an easy way to find out if a graph G has an Euler circuit or an Euler
trail without trying to traverse every single edge by hand?
Theorem 2. Let G = (V, E) be an undirected graph or multigraph with no isolated
vertices. Then G has an Euler circuit if and only if G is connected and every vertex
has even degree.
Proof. Can be found in [4].
Remark. An Euler trail in G must begin at one of the odd vertices and end at the
other.
We return once again to the seven bridges problem. We observe from Figure 2.2
that each vertex has an odd number of edges. For example, deg(B) = deg(C) =
deg(D) = 3 and deg(A) = 5. Therefore the citizens of Königsberg could not find a
solution, as each edge can be used only once and all the vertices are odd. It is impossible
to re-enter any vertex again after leaving it, and this makes starting and ending at
the same point impossible, as described in the beginning of this chapter.
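The degree argument can be replayed in a few lines of Python; the edge list below encodes the seven bridges as an undirected multigraph:

```python
def degrees(vertices, edges):
    """Vertex degrees of an undirected multigraph given as an edge list."""
    deg = {v: 0 for v in vertices}
    for x, y in edges:
        deg[x] += 1
        deg[y] += 1          # a loop (x == y) is counted twice, as required
    return deg

# The Königsberg multigraph: two A-B bridges, two A-C bridges, and one
# bridge each for A-D, B-D and C-D (seven bridges in total).
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]
deg = degrees("ABCD", bridges)
assert deg == {"A": 5, "B": 3, "C": 3, "D": 3}

# Theorem 2 requires every degree to be even; here all four are odd,
# so no Euler circuit exists.
assert all(d % 2 == 1 for d in deg.values())
```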
Definition 21. Let G be an undirected graph with n vertices. The adjacency matrix
A(G) of G is the n × n boolean matrix with one row and one column for each of the
graph’s vertices, in which the entry in row i and column j is equal to 1 if there is an
edge joining the ith vertex to the jth vertex and equal to 0 if there is no such edge.
(a) Weighted graph: edges {a, b} = 1, {a, c} = 2, {a, d} = 5, {b, d} = 3, {c, d} = 4.
(b) Its adjacency matrix:
      a  b  c  d
  a   0  1  2  5
  b   1  0  0  3
  c   2  0  0  4
  d   5  3  4  0
(c) Its adjacency linked list:
  a: (b, 1), (c, 2), (d, 5)
  b: (a, 1), (d, 3)
  c: (a, 2), (d, 4)
  d: (a, 5), (b, 3), (c, 4)
Figure 2.11: The graphs show two different ways to represent the weighted graph.
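Both representations of Figure 2.11 are straightforward to build in code; the helper names in this Python sketch are our own:

```python
def to_matrix(n, weighted_edges, index):
    """Adjacency matrix of an undirected weighted graph (0 = no edge)."""
    m = [[0] * n for _ in range(n)]
    for x, y, w in weighted_edges:
        i, j = index[x], index[y]
        m[i][j] = m[j][i] = w
    return m

def to_adj_list(weighted_edges):
    """Adjacency list: vertex -> list of (neighbour, weight) pairs."""
    adj = {}
    for x, y, w in weighted_edges:
        adj.setdefault(x, []).append((y, w))
        adj.setdefault(y, []).append((x, w))
    return adj

# The weighted graph of Figure 2.11.
edges = [("a", "b", 1), ("a", "c", 2), ("a", "d", 5),
         ("b", "d", 3), ("c", "d", 4)]
idx = {"a": 0, "b": 1, "c": 2, "d": 3}
assert to_matrix(4, edges, idx) == [[0, 1, 2, 5],
                                    [1, 0, 0, 3],
                                    [2, 0, 0, 4],
                                    [5, 3, 4, 0]]
assert to_adj_list(edges)["a"] == [("b", 1), ("c", 2), ("d", 5)]
```

The matrix costs O(n^2) space regardless of how many edges exist, while the adjacency list costs space proportional to the number of edges, which is why the choice matters for dense versus sparse graphs.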
2.4.8 Graph Applications
A chemist named Cayley found a good use for graph theory; the earliest application
to chemistry was found by him in 1857. A chemical molecule can be represented by a
graph by mapping each atom of the molecule to a vertex of the graph and making
the edges represent atomic bonds. The degree of each vertex gives the valence of the
corresponding atom.
Graphs are crucial in computer science and parallel programming. When work-
ing on parallel computers, one defines and models the communication protocol using
graphs. The experiments in later chapters would be impossible without graph the-
ory and combinatorics. Each CPU represents a node or vertex, and each edge defines
the intercommunication between the CPUs when sending and receiving solutions from
and to each CPU. For example, if two CPUs or processors, say p1 and p2 , are able to
communicate directly with one another, we draw the edge {p1 , p2 } to represent this
line of possible communication. Note that a graph with relatively few edges missing is
called a dense graph and a graph with few edges relative to the number of its vertices
is called a sparse graph. The running time of an algorithm is heavily dependent on
whether we are dealing with a dense or a sparse graph. How to decide on a model
for the communication, i.e., the graph, in order to speed up the processing time becomes an
optimization problem.
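As a sketch of this dense-versus-sparse trade-off (illustrative only, not the communication protocol used later in this thesis), compare a ring topology with an all-to-all topology for p CPUs:

```python
def ring_edges(p):
    """Sparse ring topology: each CPU talks to two neighbours, p edges."""
    return [(i, (i + 1) % p) for i in range(p)]

def complete_edges(p):
    """Dense all-to-all topology: every CPU pair communicates directly."""
    return [(i, j) for i in range(p) for j in range(i + 1, p)]

# For 8 CPUs: 8 links in the ring versus 28 = 8 * 7 / 2 in the dense graph.
assert len(ring_edges(8)) == 8
assert len(complete_edges(8)) == 28
```

A ring keeps the number of communication links linear in the number of CPUs, at the price of solutions taking many hops to propagate; the complete graph propagates in one hop but with quadratically many links.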
Chapter 3
Combinatorial Optimization
f : A → R1 . (3.1)
Such a point is called a globally optimum solution to the given instance or simply an
optimal solution.
Definition 23. An optimization problem is a set I of instances of an optimization
problem.
Note the difference between a problem and an instance of a problem. In an instance
we are given the “input data” and have enough information to obtain a solution. A
problem is a collection of instances. For example, an instance of the traveling salesman
problem (in Section 3.1.1) has a given distance matrix, but we speak in general of the
traveling salesman problem as the collection of all the instances associated with all
distance matrices.
Definition 24. A point x0 is a locally optimal solution to an instance I if it is optimal
within a neighborhood N of x0 , where N is a neighborhood defined for each instance
in the following way.
Definition 25.
Nε (x0 ) = {x : x ∈ A and |x − x0 | ≤ ε}. (3.4)
Over the past few decades major subfields of optimization have emerged, together
with a corresponding collection of techniques for their solution. The first subfield is
the nonlinear programming problem, where the main goal is to
min_{x∈Rn} f (x) (3.5)
subject to
gi (x) ≥ 0, i = 1, . . . , m (3.6)
hj (x) = 0, j = 1, . . . , n (3.7)
where f is an objective function and gi and hj are general functions of x ∈ Rn . If f is
convex, the gi concave, and the hj linear, we arrive at a new subfield of optimization where
the problem is called the convex programming problem. If f , the gi and the hj are all linear, we
come to another major subfield of optimization called the linear programming problem.
A widely used algorithm, the simplex algorithm of G. B. Dantzig, finds an optimal
solution to a linear programming problem in a finite number of steps. After thirty years
of improvement, it now solves problems with hundreds of variables and thousands of
constraints. When dealing with problems where the set of feasible solutions is finite
and discrete, or can be reduced to a discrete one, we call the optimization problem
combinatorial. To mention a few more, there exist subfields like the quadratic pro-
gramming problem, where the objective is a quadratic function of the decision variables
and the constraints are all linear functions of the variables.
Applying the structures of trees1 and graphs, we mainly work with optimization
techniques that arise in the area of operations research. These techniques can be applied
to graphs and multigraphs with a positive integer weight associated to each edge of the
graph or multigraph. The weights relate information such as the distance between the
vertices that are endpoints of the edge or the amount of material that can be shipped
from one vertex to another along an edge that represents a highway or air route.
where f is an objective function (Nn → N) that associates a performance measure with
a variable assignment, x is a vector of n discrete decision variables, and C1 , . . . , Cn are
constraints defining the solution space.
Definition 26. A solution to P is an assignment of values to the variables in x. The
set of solutions to P is denoted by LP .
Definition 27. A feasible solution to P is a solution x̂ that satisfies all constraints
Ci (x̂) for i = 1, . . . , n. The set of all feasible solutions to P is denoted by L̃P .
Definition 28. The set of optimal solutions to P, denoted by L∗P is defined as
Some algorithms take into account only solutions that satisfy some of the con-
straints. The set of solutions over which the algorithm is defined is called the search
space.
Definition 29. A search space of P is a set L̂P such that LP ⊆ L̂P ⊆ N^n . Elements
of the set L̂P often satisfy a subset of {C1 , . . . , Cn }.
It is useful in many situations to define a set N (s) of points that are “close” in
some sense to the solution s.
Definition 30. A neighborhood is a pair (L̂P , N ), where L̂P is a search space and N
is a mapping
N : L̂P → 2^{L̂P} (3.10)
that defines for each solution s the set of adjacent solutions N (s) ⊆ L̂P . If the relation
s1 ∈ N (s2 ) ⇔ s2 ∈ N (s1 ) holds, then the neighborhood is symmetric.
To find a globally optimal solution to an instance of P can be very difficult and
requires in many cases a lot of computational time. But it is often possible to find a
solution s which is best in the sense that there is nothing better in its neighborhood
N (s).
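A concrete neighborhood of this kind, commonly used for permutation problems such as the QAP, is the pairwise-swap neighborhood; the Python sketch below is illustrative:

```python
def swap_neighbourhood(s):
    """N(s): all solutions obtained by swapping two positions of s.

    The neighbourhood is symmetric: swapping the same pair (i, j)
    in a neighbour restores s.
    """
    n = len(s)
    neighbours = []
    for i in range(n):
        for j in range(i + 1, n):
            t = list(s)
            t[i], t[j] = t[j], t[i]
            neighbours.append(tuple(t))
    return neighbours

s = (0, 1, 2, 3)
N = swap_neighbourhood(s)
assert len(N) == 6                                   # C(4, 2) pairwise swaps
assert all(s in swap_neighbourhood(t) for t in N)    # symmetry
```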
Definition 31. A solution s in LP is locally optimal with respect to N if
There is an integer cost cij to travel from city i to city j, and the salesman wishes to make
the tour with minimal total cost, where the total cost is the sum of the individual
costs along the edges of the tour. The travel costs are symmetric in the sense that
traveling from city i to city j costs just as much as traveling from city j to city i.
The problem can be modeled as a complete graph with n vertices. Each vertex
represents a city and the edge weights specify the distances. This is closely related to
the Hamiltonian cycle problem, and can be stated as the problem of finding the shortest
Hamiltonian circuit 2 of the graph. If there are n cities to visit, the number of possible
tours or paths is finite; to be precise, it is (n − 1)!. Hence an algorithm can
easily be designed that systematically examines all tours in order to find the shortest
tour. This is done by generating all the permutations of the n − 1 intermediate cities,
computing the tour lengths and finding the shortest among them. Mathematically, the
cost is represented as
cost is represented as
Xn
c(π) = djπ(j) , (3.12)
j=1
where a cyclic permutation π represents a tour if we interpret π(j) to be the city visited
after city j, j = 1, . . . , n. Then the cost c maps π to the total sum of the costs. The
objective is to minimize the cost function:
min {c(π) : π ∈ F }, (3.13)
where F = {all cyclic permutations π on n objects} and dij denotes the distance be-
tween city ci and cj . We assume that dii = 0 and dij = dji for all i, j, meaning
that the graph's n × n adjacency matrix [dij ] is loop free and symmetric. Note that
dij ∈ Z+ .
Example 12. We show here a small instance with four cities. The objective is to find
the optimal tour or the minimal cost and minimize the cost function in Equation 3.12.
(Edge weights: {a, b} = 2, {a, c} = 5, {a, d} = 7, {b, c} = 8, {b, d} = 3, {c, d} = 1.)
Figure 3.1: A small instance with size 4 of the traveling salesman problem.
Tour                 Total cost
a → b → c → d → a    2 + 8 + 1 + 7 = 18
a → b → d → c → a    2 + 3 + 1 + 5 = 11 (optimal)
a → c → b → d → a    5 + 8 + 3 + 7 = 23
a → c → d → b → a    5 + 1 + 3 + 2 = 11 (optimal)
a → d → b → c → a    7 + 3 + 8 + 5 = 23
a → d → c → b → a    7 + 1 + 8 + 2 = 18
We notice that three pairs of tours differ only by the tour's direction, so we can cut the
number of vertex permutations by half. This improvement reduces the total number
of permutations needed to (n − 1)!/2. Notice, however, that the number of permutations
increases so rapidly that cutting it in half doesn't make a
big difference when it comes to computational complexity.
Our brute-force approach used to solve the example above is called exhaustive search
and is very useful when working with small instances because of its simple, unsophisticated
implementation. The algorithm generates each and every element of the problem's
domain, selects those of them that satisfy the problem's constraints and then finds
a desired element that optimizes the objective function.
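Exhaustive search for the four-city instance of Example 12 can be sketched as follows; the distance matrix matches the tour table above:

```python
from itertools import permutations

# Distance matrix of the four-city instance of Figure 3.1
# (order a, b, c, d): ab = 2, ac = 5, ad = 7, bc = 8, bd = 3, cd = 1.
D = [[0, 2, 5, 7],
     [2, 0, 8, 3],
     [5, 8, 0, 1],
     [7, 3, 1, 0]]

def exhaustive_tsp(d):
    """Try all (n - 1)! tours starting at city 0 and keep the cheapest."""
    n = len(d)
    best_cost, best_tour = None, None
    for perm in permutations(range(1, n)):
        tour = (0,) + perm
        cost = sum(d[tour[i]][tour[(i + 1) % n]] for i in range(n))
        if best_cost is None or cost < best_cost:
            best_cost, best_tour = cost, tour
    return best_cost, best_tour

cost, tour = exhaustive_tsp(D)
assert cost == 11    # a -> b -> d -> c -> a, as in the tour table
```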
When dealing with combinatorial objects such as permutations, combinations and
subsets of a given set, we often don't have the computational power to use exhaustive
search to find the optimal value as the instances grow in size. There are just too many
tours to be examined. For our modest problem of 4 cities we get only 6 tours to examine,
so the computations can easily be done by hand. A problem of 10 cities requires us to
examine 9! = 362880 tours. This can easily be carried out by a computer today with
its multi-core architecture. What if we had 40 cities to visit? The number of tours
becomes gigantic and grows to about 10^45 different permutations. Even if we could examine
10^15 tours per second, which is really fast for the most powerful supercomputers today,
the required time for completing this calculation would be several billion lifetimes of
the universe!
Exhaustive search is not the best way to go and is impractical for all but very small
instances of the problem. Fortunately there exist much more efficient algorithms for solving
problems like this.
Definition 32. Problems that can be solved in polynomial time are called tractable
or easy problems. Problems that cannot be solved in polynomial time are called
intractable or hard problems.
3 If T (n) is the time for an algorithm on n inputs, then we write T (n) = O(p(n)) to mean that
T (n) is bounded above by a constant multiple of the polynomial p(n) for all sufficiently large n.
Table 3.1 shows that we cannot solve arbitrary instances of intractable problems in
a reasonable amount of time unless such instances are very small.
n      log2 n   n      n log2 n     n^2     n^3     2^n           n!
10^1   3.3      10^1   3.3 × 10^1   10^2    10^3    1.0 × 10^3    3.6 × 10^6
10^2   6.6      10^2   6.6 × 10^2   10^4    10^6    1.3 × 10^30   9.3 × 10^157
10^3   10.0     10^3   1.0 × 10^4   10^6    10^9
10^4   13.0     10^4   1.3 × 10^5   10^8    10^12
10^5   17.0     10^5   1.7 × 10^6   10^10   10^15
10^6   20.0     10^6   2.0 × 10^7   10^12   10^18
Table 3.1: Values of several functions important for the analysis of algorithms. Entries
that grow too large are left empty.
Definition 33. Class P is a class of decision problems that can be solved in polynomial
time by a deterministic algorithm.
Stage two (The Deterministic Stage), or verification stage, verifies in polynomial
time whether this solution is correct, by taking the instance I and the arbitrarily
generated string S as its input; it outputs yes if S represents a solution to
instance I, otherwise the algorithm either returns no or is allowed not to halt at
all.
The algorithm can behave in a nondeterministic way when its operation is timing
sensitive, for example when multiple processors write to the same data at the same
time: the precise order in which each processor writes its data will affect the result.
Another cause is the use of external state other than the input, such as a hardware
timer value or a random value determined by a random number generator.
Definition 35. A nondeterministic polynomial algorithm is an algorithm whose
verification stage runs in polynomial time.
We can now define the class N P.
Definition 36. Class N P is the class of decision problems that can be solved by
nondeterministic polynomial algorithms.
Any problem in class P is also in N P:
P ⊆ NP (3.14)
The open question that still remains today is whether class P is a proper subset of
N P, or whether the two classes P and N P are actually equivalent. If the classes P and
N P are not the same, then solving N P problems may, in the worst case, require
something close to an exhaustive search.
Nobody has yet been able to prove whether N P-complete problems are solvable in
polynomial time, making this one of the great unsolved problems of mathematics. An
award of $1 million is offered by the Clay Mathematics Institute in Cambridge, MA to
anyone who can formally prove either that P = N P or that P ≠ N P.
Class N P contains the Hamiltonian circuit problem, the partition problem, the
knapsack problem, graph coloring, and many hundreds of other difficult combinatorial
optimization problems. If class P = N P then many hundreds of difficult combinatorial
decision problems can be solved by a polynomial time algorithm.
Satisfiability
The satisfiability problem is to determine if a formula F is true for some assignment of
truth values to the variables.
In an algorithm, we may use boolean logic to express compound statements.
We use boolean variables x1 , x2 , . . . , xi and their negations x̄1 , x̄2 , . . . , x̄j to denote the
individual statements.4 Each statement can be true or false independently of the truth
value of the others. We then use boolean connectives to combine boolean variables into
a boolean formula. For example,
4 A clause is a disjunction of literals.
F = x̄3 ∨ (x1 ∧ x2 ∧ x3 ) (3.17)
is a boolean formula; given a value t(x) for each variable x, we can evaluate the
boolean formula in the same way we would evaluate an algebraic expression. The
truth assignment t(x1 ) = true, t(x2 ) = true, and t(x3 ) = false gives the value true to
F in Equation 3.17, thus the boolean formula is satisfiable.
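Deciding satisfiability of such a small formula can be brute-forced over all 2^n assignments. The sketch below is our own illustration, reading the first literal of Equation 3.17 as the negation x̄3:

```python
from itertools import product

# Brute-force satisfiability check of F = x̄3 ∨ (x1 ∧ x2 ∧ x3)
# over all 2^3 truth assignments.
def F(x1, x2, x3):
    return (not x3) or (x1 and x2 and x3)

assignments = list(product([False, True], repeat=3))
satisfying = [t for t in assignments if F(*t)]
print(F(True, True, False))   # True: the assignment from the text satisfies F
print(len(satisfying) > 0)    # True: F is satisfiable
```

For n variables this check takes 2^n evaluations, which is exactly the exponential cost that makes satisfiability the canonical hard problem in N P.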
Reducibility
A problem (or a language) L1 can be reduced to another problem L2 if any instance
I of L1 can be “easily rephrased” as an instance of L2 with the solution s to which
provides a solution to the instance of L1 . For example, the problem of solving linear
equations where x reduces to the problem of solving quadratic equations. Given an
instance ax + b = 0, we transform it to 0x2 + ax + b = 0, whose solution provides a
solution to ax + b = 0. Thus, if a problem L1 reduces to another problem L2 then L1
is, “no harder to solve” than L2 .
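The rephrasing step is trivial to code. The sketch below is our own toy illustration, with a hypothetical `solve_quadratic` helper standing in for the "algorithm for L2":

```python
import math

# Solve the linear instance a*x + b = 0 by handing it to a general
# quadratic solver for p*x^2 + q*x + r = 0, encoding it as p = 0.
def solve_quadratic(p, q, r):
    if p == 0:                      # degenerate case: the equation is linear
        return [-r / q]
    disc = q * q - 4 * p * r
    if disc < 0:
        return []
    return [(-q + math.sqrt(disc)) / (2 * p),
            (-q - math.sqrt(disc)) / (2 * p)]

# Reduce the linear instance 3x + 6 = 0 to the quadratic 0x^2 + 3x + 6 = 0.
print(solve_quadratic(0, 3, 6))  # [-2.0]
```

The transformation (here, prepending a zero coefficient) takes constant time, so a fast solver for the harder problem immediately yields a fast solver for the easier one.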
Definition 37. Let L1 and L2 be decision problems. We say that L1 reduces in
polynomial time to L2 , also written L1 ∝ L2 , if and only if there is a way to solve L1
by a deterministic polynomial time algorithm A1 using a deterministic algorithm A2
that solves L2 in polynomial time.
This definition implies that if we have a polynomial time algorithm for L2 then we
can solve L1 in polynomial time.
Definition 38. A problem or language L is N P-hard 5 if and only if it is at least as
hard as any problem in N P. A problem or language L is N P-complete if and only if
L is N P-hard and L ∈ N P.
Alternative definitions of the N P-hard class do exist, based on satisfiability and on
reducibility computable by a deterministic Turing machine in polynomial time. The
Venn diagram in Figure 3.2 depicts the relationship between the different classes.
So, an N P-complete problem is a problem that is in N P and is as difficult (as “hard”)
as any problem in this class. Only a decision problem can be N P-complete, but N P-hard
problems may be of any type: decision problems, search problems or optimization
problems. An example of an N P-hard decision problem that is not N P-complete is the
halting problem [9].
Finally, the following theorem shows that a fast algorithm for the traveling salesman
problem is unlikely to exist.
Theorem 3. The traveling salesman problem is N P-hard.
Proof. A proof can be found in [10].
A list of more than 200 N P-hard optimization problems can be found in [11].
5 N P-hard = nondeterministic polynomial time hard. A common mistake is to think that N P in this context stands for “non-polynomial”.
Figure 3.2: The Venn diagram for the P, N P, N P-complete, and N P-hard sets of
problems, in the two cases (a) P ≠ N P and (b) P = N P (where P = N P = N P-complete).
3.3 Heuristics
The term heuristic comes from the Greek “heurisko”, which means “I find”. You may
recognize it as a form of the same verb as heureka, the word Archimedes is said to have
shouted as he ran naked through the streets of Syracuse.
Because of the complexity of a combinatorial optimization problem P, it may not
always be possible to search the whole search space using conventional algorithms to
find an optimal solution. In such situations, it is still important to find a good feasible
solution that is at least reasonably close to being globally optimal. Heuristic methods
are used on N P-hard problems to search for such a solution. A heuristic method is
a procedure that is likely to find a very good feasible solution, but not necessarily an
optimal solution, for the specific instance. A well designed heuristic method can usually
provide a solution that is at least nearly optimal, or conclude that no such solution exists,
but no guarantee can be given about the quality of the solution obtained. It should also
be efficient enough to deal with larger instances, and is often an iterative algorithm where
each iteration involves conducting a search for a new solution that might be better
than the best solution found in a previous search. When the algorithm is terminated,
after a reasonable amount of time or simply when it reaches a predefined number of
iterations, the solution it provides is the best one that was found during any iteration.
exhaustive search method applied to the traveling salesman problem in the previous
section. Exhaustive search is based on the primitive brute-force method: we generate
all solutions and search among them for the globally optimal one. Local search, in
contrast, is non-exhaustive in the sense that it does not guarantee to find a feasible or
optimal solution; it searches non-systematically until a specific stop criterion is satisfied.
This is actually one of the reasons that makes local search so successful, compared to
exhaustive search, on a variety of difficult combinatorial optimization problems.
The local search algorithm operates in a simple way. Given an instance I of a
combinatorial optimization problem P, we associate a search space L̂P with it. Each
element s ∈ L̂P corresponds to a potential solution of I, and is called a state of
I. The local search algorithm relies on a function N which assigns to each s ∈ L̂P
its neighborhood N (s) ⊆ L̂P . Each state s′ ∈ N (s) is called a neighbor of s. The
neighborhood is composed of the states that are obtained by local changes called
moves. The local search algorithm starts from an initial state s0 and enters a loop that
navigates the search space, moving from one state si to one of its neighbors si+1 in the
hope of improving (minimizing) a function f which measures the quality of solutions.
A move from a solution to a neighboring solution is defined by the concepts of
neighborhood, local optimality and transition graph. The move is controlled by a
legality condition function L and a selection rule function S that help local search
escape local minima and find a high-quality local optimum.
Definition 39. The transition graph G(L̂P , N ) associated to a neighborhood (L̂P , N )
is the graph whose nodes are solutions in L̂P and where an arc a → b exists if b ∈ N (a).
The reflexive and transitive closure of → is denoted by →∗ .
Definition 40. A legality condition L is a function (2^L̂P × L̂P ) → 2^L̂P that filters
sets of solutions from the search space. A selection rule S(M, s) is a function S :
(2^L̂P × L̂P ) → 2^L̂P that picks an element s′ from M according to some strategy and
decides whether to accept it or to select the current solution s instead.
Definition 41. A local search algorithm for P is a path

s0 → s1 → . . . → sk (3.18)

in the transition graph. A local search produces a final computation state sk that
belongs to the set of locally optimal solutions with respect to N , i.e., sk ∈ L̂+P.
At a specific iteration, some of the neighbors may be forbidden, and therefore may not
be selected, while the others are legal. Once the legal neighbors are identified by operation
L, the local search selects one of them and decides whether to move to this neighbor
or to stay at s (operation S). We illustrate these concepts in Figure 3.3.
Algorithm 1 depicts a simple generic local search template parameterized by the
objective cost function f , the neighborhood N , and the functions L and S
specifying legal moves and selecting the next neighbor, for different initial solutions
s0 .
The search starts from any initial solution s and stores it as the best solution
found so far (line 2), then performs a number of iterations (line 3). Line 4 checks if the
Figure 3.3: The local search move, showing the solution s, its neighborhood N (s), its
set L(N (s), s) of legal moves, and the selected solution as a bold thick circle.
solution satisfies the constraints and if there exists a neighbor solution si better than
sbest . If such a solution exists, it is stored as the new best solution sbest (line 5), so that
we always keep track of the best solution encountered so far. The move is performed
in line 6 and consists of selecting a new solution by composing the operations N , L
and S.
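The generic template can be sketched in a few lines. The code below is our own rendering (not the thesis implementation); `neighbors`, `legal` and `select` stand in for N , L and S, and the toy landscape is invented:

```python
# A generic local-search sketch in the spirit of Algorithm 1:
# f is the cost function; the search composes S(L(N(s), s), s) at each step.
def local_search(f, neighbors, legal, select, s0, max_iters=1000):
    s, s_best = s0, s0
    for _ in range(max_iters):
        if f(s) < f(s_best):                      # remember the best state seen
            s_best = s
        s = select(legal(neighbors(s), s), s)     # move = S(L(N(s), s), s)
    return s_best

# Toy usage: minimize f(x) = x^2 over integer states with +/-1 moves.
f = lambda x: x * x
neighbors = lambda s: [s - 1, s + 1]
legal = lambda ns, s: ns                          # all moves legal
select = lambda ns, s: min(ns, key=f)             # greedy selection
print(local_search(f, neighbors, legal, select, s0=7, max_iters=50))  # 0
```

Different instantiations of `legal` and `select` yield the variants discussed below, from plain hill climbing to tabu search.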
As you may have noticed, different implementations of local search arise from very
simple changes to some of the N , L and S operations. For example, some local
search algorithms may set all moves to legal or some to forbidden, while in others
these operations may be rather complex and rely on sophisticated data structures and
algorithms as well as on randomization. In some cases the local search algorithm is
executed from several different starting points to diversify the search for the global
optimum. In such cases we must decide how many starting points we need to try and
how to distribute them in the search space L̂P .
progress towards a solution, because it is usually quite easy to improve a bad solution
at the beginning of a search, assuming we start from a relatively bad one. Unfortunately,
hill climbing often finds a local optimum, or gets “stuck” on ridges7 or plateaus8 , as
shown in Figure 3.4. In each case, the algorithm reaches a point where no progress is
being made.
Figure 3.4: A possible landscape of a search space: the objective function over the
search space, with a global optimum, a local optimum and a plateau marked. The hill
climbing algorithm can get stuck on plateaus or in local optima.
Algorithm 2: The best neighbor heuristic function S-Best(N ,s).
Input: Neighborhood N , Solution s
Output: Best Neighbor n
1 begin function S-Best(N ,s)
2 Nbest := {n ∈ N | f (n) = min n′ ∈N f (n′ )} ;
3 return n ∈ Nbest chosen uniformly with probability 1/|Nbest | ;
4 end
A best improvement local search can then be specified as the following implementation
(Algorithm 5) of the generic local search, using the functions previously defined.
that depends on the distance between f (n) and f (s) and on a parameter t called the
temperature; otherwise it rejects n. The Metropolis heuristic is specified by Algorithm 6.
Other procedures define different variants of local search, such as stochastic hill
climbing, which chooses at random from among the uphill moves. The probability of
selection can vary with the steepness of the uphill move. This usually converges more
slowly than steepest ascent, but in some cases it finds better solutions.
Random restart hill climbing conducts a series of hill climbing searches from ran-
domly generated initial solutions, stopping when a good solution is found. The restart
method can restart the local improvement procedure a number of times from randomly
Algorithm 5: The heuristic function L-BestImprovement(s).
Input: Solution s
1 begin function L-BestImprovement(s)
2 return LocalSearch(f ,N ,L-Improvement,sbest)
3 end
selected initial trial solutions, each leading to a new local optimum. If we repeat
restarting the search a number of times, we increase the chance that the best of the
local optima obtained will actually be the global optimum. This approach works
well on small instances but is much less successful on larger instances with many
variables and a rugged landscape with complicated feasible regions.
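The restart idea can be sketched in a few lines. The code below is our own illustration (the landscape f and all names are invented for the example):

```python
import random

# Random-restart hill climbing: repeat a greedy descent from random
# starting points and keep the best local optimum found.
def hill_climb(f, neighbors, s):
    while True:
        best = min(neighbors(s), key=f)
        if f(best) >= f(s):           # no improving neighbor: local optimum
            return s
        s = best

def random_restart(f, neighbors, random_start, restarts=20, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):
        s = hill_climb(f, neighbors, random_start(rng))
        if best is None or f(s) < f(best):
            best = s
    return best

# Toy usage on a 1-D integer landscape with many local minima.
f = lambda x: (x % 7) + abs(x - 50) // 10
neighbors = lambda s: [s - 1, s + 1]
start = lambda rng: rng.randint(0, 100)
print(f(random_restart(f, neighbors, start)))
```

Each restart is independent, so on large rugged landscapes the probability of any one descent landing in the global optimum's basin shrinks rapidly, which is exactly the weakness noted above.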
The local search ideas need to be carefully tweaked and tailored to fit the problem.
Each method is usually designed to fit a specific problem type rather than a variety of
applications. This means that we always have to implement it from scratch each time we
want to develop a heuristic method for a specific problem. To overcome the drawbacks
of local improvement procedures, we need sophisticated strategies and procedures with
a more structured approach that use the information being gathered to guide the
search toward the global optimum. This is the role that a metaheuristic plays.
3.5 Metaheuristics
The term metaheuristic is also written meta-heuristic, where “meta” is a Greek prefix
meaning “beyond”. It was first introduced in the same paper [15] that also
introduced the term tabu search [16]. A metaheuristic is a general solution heuristic
method with a master strategy that guides and modifies other heuristics. The method
collects information on the execution stages and aims primarily at escaping local
minima and at directing the search towards global optimality to produce feasible solutions
beyond those that are normally generated in a search for local optimality. This is
fundamentally quite different from a heuristic method which focuses on choosing the
next solution from the neighborhood using only local information on the quality of the
neighbors. Heuristics typically drive the search toward high quality local minima. As
a consequence, heuristics are often characterized as memoryless, while metaheuristics
typically include some form of memory or learning which explains the great diversity
and the wealth of results in this field.
Most algorithms for optimization problems go through a sequence of steps, with a set
of choices at each step. This behavior applies to greedy algorithms. A greedy algorithm
is any algorithm that follows the problem-solving heuristic of obtaining a locally
optimal solution by making a choice at each stage, in the hope that this sequence of
choices leads to a globally optimal solution. At each decision point in the algorithm,
the choice that looks best at the moment is chosen. The choice made at each step must
fulfill the following three properties: feasible – it has to satisfy the problem's constraints,
locally optimal – it has to be the best local choice among all feasible choices available
at that step, and irrevocable – once made, it cannot be changed in subsequent steps
of the algorithm. Greedy algorithms are very powerful and work quite well for a wide
range of problems, but they do not always yield optimal solutions. Examples of greedy
algorithms are Prim's algorithm and Kruskal's algorithm [18] for finding the minimum
spanning tree [19] [20], Dijkstra's algorithm [18] for finding single source shortest paths,
and Chvátal's greedy heuristic [21] for the set covering problem.
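As a concrete illustration of the three properties, here is a minimal sketch of Kruskal's algorithm (our own rendering, using a simple union-find): each greedy choice is feasible (no cycle), locally optimal (cheapest remaining edge) and irrevocable (a kept edge is never dropped).

```python
# Minimal Kruskal's algorithm: returns the total weight of a minimum
# spanning tree of an n-node graph given as (weight, u, v) edges.
def kruskal(n, edges):
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]     # path compression
            x = parent[x]
        return x
    total = 0
    for w, u, v in sorted(edges):             # locally optimal: cheapest first
        ru, rv = find(u), find(v)
        if ru != rv:                          # feasible: no cycle is created
            parent[ru] = rv                   # irrevocable: the edge is kept
            total += w
    return total

edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 2, 3), (5, 1, 3)]
print(kruskal(4, edges))  # 6 (edges of weight 1, 2 and 3 form the MST)
```

For the minimum spanning tree this greedy strategy happens to be provably optimal; for problems like set covering it is only a heuristic.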
Greedy Satisfiability
3.6.2 Simulated Annealing
Simulated annealing is another metaheuristic that enables the search process to escape
local optima. It is based on the Metropolis algorithm with a sequence of decreasing
temperatures, starting with a relatively large value of t. A large value of t makes
the probability of accepting a solution relatively large, which enables the search to
proceed in almost random directions. Gradually decreasing the value of t as the search
continues decreases the probability of acceptance. Thus the choice of the values of t
over time controls the degree of randomness in the process. This random component
provides more flexibility for moving toward another part of the feasible region in the
hope of finding a global optimum. The template for simulated annealing is shown in
Algorithm 7.
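A self-contained sketch of this template (our own illustration, not the thesis implementation; the geometric cooling schedule and toy landscape are invented) could look like:

```python
import math
import random

# Simulated annealing: Metropolis acceptance with a decreasing temperature t.
def simulated_annealing(f, neighbor, s0, t0=10.0, alpha=0.95,
                        iters=2000, seed=0):
    rng = random.Random(seed)
    s, best, t = s0, s0, t0
    for _ in range(iters):
        n = neighbor(s, rng)
        delta = f(n) - f(s)
        # accept improving moves; accept worse ones with prob exp(-delta/t)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            s = n
        if f(s) < f(best):
            best = s
        t *= alpha                    # cooling schedule
    return best

# Toy usage: minimize a bumpy 1-D function over the integers.
f = lambda x: (x - 3) ** 2 + 5 * (x % 2)
neighbor = lambda s, rng: s + rng.choice([-1, 1])
print(f(simulated_annealing(f, neighbor, s0=40)))
```

Early on, t is large and nearly every move is accepted (random walk); as t shrinks, the acceptance test degenerates into plain hill climbing, which matches the description above.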
Some of the members are randomly paired and become parents, who then have children
(new trial solutions) sharing some of the genes of both parents. As the algorithm
proceeds, the fittest members of the population generate improving populations of trial
solutions. When parents reproduce, there is no guarantee that the resulting children
will be perfect, and occasionally a mutation occurs. This happens through a random
process and helps the algorithm to explore new, perhaps better, parts of the feasible
region.
A Hybrid Algorithm
To further increase the chance of finding high quality solutions, we can combine
different algorithms into a hybrid algorithm. A hybrid evolutionary algorithm exploits
the strengths of both local search and evolutionary search by applying local search to
the solutions in the population before combining them. We benefit from the
effectiveness of local search in finding high quality solutions, while the evolutionary
aspects provide novel ways of diversifying the search and escaping local minima. The
hybrid local search algorithm is depicted in Algorithm 8. It starts by generating an
initial population. We then apply local search to find an improved solution (line 7),
and update the population in line 11 using s and its parents.
The heuristics and metaheuristics presented so far can all be applied to solve N P-hard
and N P-complete combinatorial optimization problems like the traveling salesman
problem. Another important example is the fundamental optimization problem called
the quadratic assignment problem, which concerns the placement of facilities in order
to minimize transportation costs, avoid placing hazardous materials near housing, or,
in the ambulance location problem, save lives in a big city by minimizing the path
from an ambulance to a patient in need.
Chapter 4
Network Optimization
Problems
4.2 The Minimum Cost Network Flow Problem
The minimum cost network flow problem (MCNFP) is to send flow from a set of supply
nodes, through the arcs of a network, to a set of demand nodes, at minimum total cost,
and without violating the lower and upper bounds on flows through the arcs.
Let G = (V, E) be a directed graph consisting of a finite set of nodes V =
{1, 2, . . . , n}, and a set of directed arcs, E = {1, 2, . . . , m}, linking pairs of nodes
in V . We associate with every arc (i, j) ∈ E a flow xij , a cost per unit flow cij ,
a lower bound on the flow lij and a capacity uij . To each node i ∈ V we assign an
integer number bi representing the available supply of, or demand for, flow at that node.
If bi > 0 then node i is a supply node, if bi < 0 then node i is a demand node, and
otherwise, if bi = 0, node i is referred to as a transshipment node. The total supply
must equal the total demand.
The minimum cost network flow problem N = (i, j, V, E, b) is to determine the flows
xij ≥ 0 in each arc of the network so that the net flow into each node i is bi while
minimizing the total cost. In mathematical terms, the linear programming formulation
of the problem becomes
min Σ_{i=1}^{n} Σ_{j=1}^{n} cij xij (4.1)

subject to

Σ_{j=1}^{n} xij − Σ_{j=1}^{n} xji = bi for each node i, (4.2)

and

0 ≤ xij ≤ uij for each arc i → j. (4.3)
The decision variables for the linear programming formulation are the flows xij through
the arcs (i, j) ∈ E.
The last value bi has a sign convention that depends on the nature of the node i: bi > 0
for a supply node, bi < 0 for a demand node, and bi = 0 for a transshipment node.
node for each destination, but no transshipment nodes are included in the network.
The conservation constraints have one of the following forms
Σ_{j=1}^{n} xij = bi (4.11)
for a sink with bi < 0. The transportation problem is used to model the movement of
goods from suppliers to customers, with some cost associated with the shipments.
subject to

Σ_{j=1}^{n} xij = 1 for each worker (node) i, (4.14)

Σ_{i=1}^{n} xij = 1 for each job j, (4.15)

and

xij ∈ {0, 1} for each arc i → j. (4.16)
The variables xij must take the value 0 or 1; otherwise the solution is not meaningful,
because it is not possible to make fractional assignments of a person to a job.
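For a tiny instance the assignment problem can simply be brute-forced over all n! permutations. The sketch below uses our own 3 × 3 cost matrix (not from the thesis), with p(i) the job assigned to worker i:

```python
from itertools import permutations

# Brute-force assignment: cost[i][j] is the cost of giving worker i job j.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
n = len(cost)
best_p, best_c = min(
    ((p, sum(cost[i][p[i]] for i in range(n)))
     for p in permutations(range(n))),
    key=lambda t: t[1])
print(best_p, best_c)  # (1, 0, 2) 5
```

Each permutation automatically satisfies constraints 4.14-4.16, so the enumeration only has to compare objective values; for large n, of course, the n! permutations make this approach hopeless.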
of optimization problems in graphs such as the maximum clique problem, the graph
partitioning problem and the minimum feedback arc set problem.
From a computational point of view, quadratic assignment problems are very difficult
problems. They are in fact N P-hard combinatorial optimization problems. It is proven
in [25] that the QAP is N P-hard, and even with today's fast multi-core CPUs it is still
considered hard to solve problems of a modest size such as n = 30 within reasonable
time limits.
The QAP can be described as the problem of assigning a set of facilities to a set
of locations with given distances between the locations and given flows between the
facilities. The goal then is to place the facilities on locations in such a way that the
sum of the product between flows and distances is minimal.
The QAP can also be formulated as a “0-1” integer optimization problem.1 The term
“quadratic” actually comes from the formulation of the problem as an optimization
problem with a quadratic objective function. From a mathematical point of view, an
assignment is a one-to-one correspondence (i.e. a bijection) of the finite set N onto
itself, in the sense that permutation p assigns some j = p(i) to each i ∈ N . Every
permutation p of the set N corresponds uniquely to a permutation matrix Xp = (xij )n×n
with xij = 1 for j = p(i) and xij = 0 for j ≠ p(i). The entries of such a permutation
matrix must satisfy assignment constraints similar to Equation 4.14, Equation 4.15 and
Equation 4.16, in the following way:
Σ_{j=1}^{n} xij = 1, for all i = 1, . . . , n (4.18)

Σ_{i=1}^{n} xij = 1, for all j = 1, . . . , n (4.19)
With the above constraints on x, we have our equivalent formulation of the QAP
in terms of permutation matrices, QAP (A, B, C):

min Σ_{i=1}^{n} Σ_{j=1}^{n} Σ_{k=1}^{n} Σ_{l=1}^{n} aij bkl xik xjl + Σ_{i=1}^{n} Σ_{j=1}^{n} cij xij . (4.22)
[Figure 4.1 here: an undirected graph of the four locations and the facilities assigned
to them, e.g. Facility 2 at Location 1 and Facility 4 at Location 3.]
We are also given in advance the flow matrix F = (fij )n×n between facilities

F = (fij )4×4 =
    | 0  3  0  2 |
    | 3  0  0  1 |
    | 0  0  0  4 |    (4.24)
    | 2  1  4  0 |

and the distance matrix D = (dkl )n×n between locations

D = (dkl )4×4 =
    |  0  22  53   0 |
    | 22   0  40   0 |
    | 53  40   0  55 |    (4.25)
    |  0   0  55   0 |
The assigned cost f (with cost matrix C = 0) of the randomly chosen permutation
with
p(1) = 2 (4.26)
p(2) = 1 (4.27)
p(3) = 4 (4.28)
p(4) = 3 (4.29)
becomes

Σ_{i=1}^{n} Σ_{j=1}^{n} fij dp(i)p(j) =
    (0 × 0 + 3 × 22 + 0 × 0 + 2 × 40
   + 3 × 22 + 0 × 0 + 0 × 0 + 1 × 53
   + 0 × 0 + 0 × 0 + 0 × 0 + 4 × 55
   + 2 × 40 + 1 × 53 + 4 × 55 + 0 × 0)/2
  = 838/2
  = 419
We divided the right side of the equation above by two because the cost between each
location and facility is calculated twice. Looking at Figure 4.1, we can keep track of
the direction of the flows, ignore the zero flows, the zero distances and the loops. In
this way our calculation becomes a bit more compact thanks to the symmetry in the
flow and distance matrices. This actually helps to reduce the time to obtain a single
solution.
Σ_{i=1}^{n} Σ_{j=1}^{n} fij dp(i)p(j) = f12 × d21 + f14 × d23 + f24 × d13 + f34 × d43
                                  = 3 × 22 + 2 × 40 + 1 × 53 + 4 × 55
                                  = 419.
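The calculation can be checked mechanically. The sketch below (our own code) recomputes the cost of p = (2, 1, 4, 3) with the halved double-sum convention used here, and then brute-forces all 24 permutations:

```python
from itertools import permutations

# The 4 x 4 example instance from Equations 4.24-4.25.
F = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
D = [[0, 22, 53, 0], [22, 0, 40, 0], [53, 40, 0, 55], [0, 0, 55, 0]]

def qap_cost(p):
    # Halve the symmetric double sum, as in the worked example above.
    n = len(p)
    return sum(F[i][j] * D[p[i] - 1][p[j] - 1]
               for i in range(n) for j in range(n)) // 2

print(qap_cost((2, 1, 4, 3)))  # 419

# Exhaustively checking all 4! = 24 permutations finds the best assignment.
best = min(permutations(range(1, 5)), key=qap_cost)
print(best, qap_cost(best))  # (2, 3, 1, 4) 175
```

The brute-force loop confirms that the randomly chosen permutation of the text is far from optimal; at n = 30 the same loop would face about 2.7 × 10^32 permutations.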
This is not the best possible permutation; by trying out the rest of the 23 permutations
you can find it. Note that because of the symmetries of the undirected graph in
Figure 4.1, you only need to check 12 permutations, including the one we just solved.
The major concern when dealing with larger instances is that the number of
permutations grows extremely fast, just as with the traveling salesman problem. This
means that it is very time consuming to generate and evaluate every single permutation,
which is one of the reasons why different algorithms have been developed for QAPs.
Other examples of problems expressed in terms of the QAP include: Computer
Aided Design (CAD), more precisely the placement of logical modules in a chip such
that the total length of the connections on a board (chip) is minimized; in this case, aij
is the number of connections between electronic modules i and j, and bkl is the
distance between locations k and l on which modules can be placed. The assignment
of specialized rooms in a building, where aij is the flow of people that go from service
i to service j and bkl is the time for going from room k to room l. The assignment
of gates to airplanes in an airport, where aij is the number of passengers going from
airplane i to airplane j and bkl is the walking distance between gates k and l.
The algorithm fitted for solving these types of problems is the one we have developed
for the QAP. It is a simple metaheuristic algorithm called parallel tabu search. We
mainly test it to solve problems taken from the quadratic assignment problem library
on the Internet.
where n is the size of the instance, sol is the objective function value (or solution) and
p is the corresponding permutation, i.e.
sol = Σ_{i=1}^{n} Σ_{j=1}^{n} fij dp(i)p(j) . (4.31)
i=1 j=1
Our work is based on solving the quadratic assignment problem called the “Microarray
Placement Problem” taken from the QAPLIB home page. We use our own
implementation of a parallel version of tabu search, with some unique local improvement
procedures and a communication system based on graphs.
2 http://www.opt.math.tu-graz.ac.at/qaplib/
Chapter 5
The word tabu (or taboo) means “a prohibition imposed by social custom as a protective
measure” (Webster's dictionary). The word comes from the Polynesian language Tongan,
and is used by the natives of the island of Tonga to indicate sacred things that cannot
be touched.
Tabu search was originally proposed by Fred Glover [16] in 1986 to allow local search
methods to overcome local optima. It is a widely used metaheuristic for solving
combinatorial optimization problems that classical optimization methods have great
difficulty solving within practical time limits. The algorithm can be viewed as a
deterministic alternative to simulated annealing, in which memory, rather than
probability, guides the intelligent search. It uses some common sense ideas, like a short
term memory, to enable the search process to escape from a local optimum. This yields
solutions whose quality often significantly surpasses that obtained by other methods,
typically within 1% of the best known solutions found in the literature.
Just like in the ascent method, each iteration selects the available move that goes
furthest up the hill, or a move that drops least down the hill if an upward move is not
available in the hill climbing process. If all goes well, the process will follow a pattern
like that shown in Figure 3.4. A local optimum is left behind in order to climb to the
global optimum. The opposite applies when minimizing the objective function (cost
function) with the descent method.
Tabu search can be viewed as the combination of a greedy strategy with a definition
of legal moves ensuring that a solution is never visited twice. The generic local search in
Algorithm 1 is modified to be consistent with handling a sequence of solutions as shown
in Algorithm 9. It consists mainly of maintaining the sequence τ = {s0 , s1 , . . . , sk } of
solutions found so far. The sequence τ is declared and initialized with solution s0 in
line 3 and new solutions are sequentially explored and added to τ in line 8. Functions
L and S are also modified to handle a sequence of solutions. Function S is similar
to the heuristic function in Algorithm 2. The heuristic function L is implemented as
L-NotTabu in Algorithm 10 and later called from the tabu search implementation in
Algorithm 11.
Different heuristic search strategies are usually implemented in tabu search to increase
the chance of finding the global optimum. The next section describes some typical
search strategies applied in our implementation of parallel tabu search.
Algorithm 11: Tabu search algorithm.
Input: Objective Function f , Neighborhood N , Solution s0
Output: Best Solution sbest
1 begin function TabuSearch(f , N , s0 )
2 return TabuSearch(f , N , L-NotTabu, S-Best);
3 end
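Algorithm 11 only wires together the generic pieces. A self-contained sketch of the same idea for the QAP (our own illustration, not the thesis code) uses a swap neighborhood, a fixed tabu tenure and an aspiration criterion:

```python
import random
from collections import deque

# Tabu search for the QAP: swap two positions of the permutation, forbid
# recently swapped pairs for `tenure` iterations, and let a tabu move pass
# if it beats the best cost found so far (aspiration criterion).
def tabu_search_qap(F, D, iters=500, tenure=4, seed=0):
    rng = random.Random(seed)
    n = len(F)
    cost = lambda p: sum(F[i][j] * D[p[i]][p[j]]
                         for i in range(n) for j in range(n))
    p = list(range(n))
    rng.shuffle(p)
    best, best_c = p[:], cost(p)
    tabu = deque(maxlen=tenure)                 # short-term memory of swaps
    moves = [(i, j) for i in range(n) for j in range(i + 1, n)]
    for _ in range(iters):
        candidates = []
        for i, j in moves:
            p[i], p[j] = p[j], p[i]             # try the swap ...
            c = cost(p)
            p[i], p[j] = p[j], p[i]             # ... and undo it
            if (i, j) not in tabu or c < best_c:
                candidates.append((c, i, j))
        if candidates:
            _, i, j = min(candidates)           # best legal neighbor
        else:
            i, j = rng.choice(moves)            # everything tabu: random move
        p[i], p[j] = p[j], p[i]
        tabu.append((i, j))
        c = cost(p)
        if c < best_c:
            best, best_c = p[:], c
    return best, best_c

# The 4 x 4 example instance from Chapter 4 (full double-sum convention).
F = [[0, 3, 0, 2], [3, 0, 0, 1], [0, 0, 0, 4], [2, 1, 4, 0]]
D = [[0, 22, 53, 0], [22, 0, 40, 0], [53, 40, 0, 55], [0, 0, 55, 0]]
print(tabu_search_qap(F, D)[1])
```

Note how the tabu list makes deteriorating moves acceptable, unlike plain hill climbing: after a swap is taken, its reversal is forbidden for a few iterations, which forces the search out of local optima.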
Diversification involves forcing or directing the search toward other unexplored regions
of the search space. There are many ways to achieve this goal. One way is to perturb
or restart the search when it reaches a predefined or random number of iterations.
Another way, which we will not implement in our algorithm, is strategic oscillation,
which consists of changing the objective function to balance the time spent in the
feasible and infeasible regions.
where e is the number of edges and n is the number of nodes (or CPUs in this case).
Note that n(n − 1)/2 gives the total number of possible edges, as shown in Equation 2.15. Let
us examine the four-solver, eight-solver, and twelve-solver interaction with different
communication graph structures created when increasing the density by adding more
edges.
In the four-solver communication graph, we can choose not to communicate at all
by simply starting with density 0 and no edges. Then no communication takes place
and each solver attempts the problem independently. By adding three edges, we get a
path connecting all four nodes, with density 0.50, as shown in Figure 5.1(a).
Sequentially adding more edges, we form the other graph structures of density 0.67 in
Figure 5.1(b), 0.83 in Figure 5.1(c) and 1.00 in Figure 5.1(d).
Figure 5.1: Communication graphs for four solvers with different densities.
The no-communication graph for eight solvers consists of eight independent nodes,
starting with density 0 and no edges. By adding seven edges to form a path connecting
the eight solvers, we get density 0.25 as shown in Figure 5.2(a). Adding one extra edge
to connect all the nodes in a circle, we get density 0.29 in Figure 5.2(b). Sequentially
adding four more edges in the same process as with the four-solver communication
graph topology, we get different densities ranging from 0.43 to 1.00 in Figure 5.2(c) to
Figure 5.2(g).
Figure 5.2: Communication graph for eight solvers with different densities.
Figure 5.3: Communication graphs for twelve solvers with different densities:
(a) 0.17, (b) 0.18, (c) 0.27, (d) 0.36, . . .
By adding eleven edges to form a path connecting the twelve solvers, we begin with
density 0.17 in Figure 5.3(a). Adding one extra edge to form a complete circle, we arrive
at density 0.18 in Figure 5.3(b). By sequentially adding six more edges to each graph,
we get different densities ranging from 0.27 to 1.00 in Figure 5.3(c) to Figure 5.3(k).
Note that the empty graphs for Figure 5.1, Figure 5.2 and Figure 5.3 are not depicted.
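The densities quoted for these graphs all follow from density = e/(n(n − 1)/2); a quick check (our own snippet):

```python
# Communication-graph density: the fraction of possible edges present.
def density(n, e):
    return e / (n * (n - 1) / 2)

print(round(density(4, 3), 2))    # 0.5   (path on four solvers)
print(round(density(8, 7), 2))    # 0.25  (path on eight solvers)
print(round(density(8, 8), 2))    # 0.29  (circle on eight solvers)
print(round(density(12, 12), 2))  # 0.18  (circle on twelve solvers)
```

Sequentially adding edges therefore sweeps the density from 0 (independent solvers) up to 1.00 (the complete graph, where every solver talks to every other).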
The Boltzmann selection procedure can be described in the following way. Given a set
of N solutions {s1 , s2 , . . . , sN } with their corresponding objective values
{f1 , f2 , . . . , fN }, where the goal is minimization, the probability of choosing a
solution si is given by the Boltzmann selection equation
p(si) = exp(Qt(si)/T) / Σ_{k=1}^{N} exp(Qt(sk)/T),    (5.2)

where Qt(si) = fi − min_{k=1,...,N} {fk}, and the temperature T = −(max_k {fk} − min_k {fk}).
The numerator contains the Boltzmann weighting term and the denominator is a nor-
malization factor. If T = 0, then we define p(si) = 1/N. This mapping assigns a
greater probability to a solution with fewer unassigned variables or a better objective
value, ensuring that a better solution is more likely to be selected by import-softmax.
It is easy to verify that Σ_{i=1}^{N} p(si) = 1.
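Equation (5.2) can be sketched as follows. The function names are hypothetical, but the weighting term, the negative temperature, and the T = 0 special case follow the definitions above:

```python
import math
import random

def boltzmann_probabilities(objective_values):
    """Selection probabilities from Equation (5.2), for minimization."""
    fmin, fmax = min(objective_values), max(objective_values)
    T = -(fmax - fmin)                     # negative temperature
    if T == 0:                             # all objective values equal: uniform
        n = len(objective_values)
        return [1.0 / n] * n
    # Q_t(s_i) = f_i - min_k f_k, so the best solution gets weight exp(0) = 1
    weights = [math.exp((f - fmin) / T) for f in objective_values]
    total = sum(weights)
    return [w / total for w in weights]

def boltzmann_select(solutions, objective_values, rng=random):
    """Draw one solution according to the Boltzmann probabilities."""
    probs = boltzmann_probabilities(objective_values)
    return rng.choices(solutions, weights=probs)[0]
```

Because T is negative, a smaller objective value yields a larger weight, so better solutions are more likely to be imported, as the text requires.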
An import policy has a local impact at each solver, while a communication graph
impacts the global distribution of elite solutions. We call a combination of an import
policy and a communication graph a cooperation configuration. How a configuration
affects the solver performance will be studied in the experiments.
Chapter 6
In this chapter we apply parallel cooperative solvers with tabu search for the quadratic
assignment problem. The main objective of the experiments in this thesis is to find out
whether adding more solvers helps to find a solution more quickly and how different
communication graphs affect the performance.
• Proportion of searching guided by elite solutions: After 500 iterations without
improvement, a solver restarts either from a randomly generated solution, or
from the elite solution. The probability of restarting from the elite solution is set
to 0.5.
• Communication: A solver sends the best solution to its neighboring solvers every
50 iterations and imports a solution at the end of each individual search. The
imported solution will replace the current elite solution, if it is better.
where Obj is the objective value obtained by the solvers and ObjB is the best known
value in the literature [33]. A bold entry indicates a new best solution. The wall time
is the duration between the start of executing the solvers and the moment at which all
the solvers have terminated.
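The relative error column can be reproduced with a small sketch (the function name is hypothetical; by the convention above, a negative value means an improvement over the best known solution):

```python
def relative_error(obj, obj_best):
    """Relative error (%) of a solver's objective value Obj w.r.t. the best
    known value ObjB: negative when the solver improves on the best known."""
    return 100.0 * (obj - obj_best) / obj_best
```

For instance, the 07 × 07 bl result 4,560 against the best known 4,564 gives approximately −0.09%, matching Table 6.1.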
The 06 × 06 ci instance shows the best relative error of −0.24%: our PTS algorithm
found a value 0.24% better than the best value reported so far for this instance. The
other interesting instances are 07 × 07 ci with an improvement of −0.20%, 08 × 08 bl
with −0.13%, 08 × 08 ci with −0.10%, and finally 07 × 07 bl with an improvement
of −0.09%.
In Figure 6.1, the speedup in mean concurrent iterations is shown. It is calculated
as follows.
1. The best solution ever found is identified and a narrow target range around its
objective value, e.g. within 0.1% of it, is chosen.
2. For each run, the iteration at which the objective value first reaches the target
range is recorded. If a run fails to reach the range, the maximum iteration (10000)
is used instead.
1 Details at http://gi.cebitec.uni-bielefeld.de/comet/chiplayout/qap/index.html
Instance      Best Known     Cooperative Solvers   Relative Error (%)   Wall Time (min.)
06 × 06 bl    3,296          3,296                  0.00                  5.0
07 × 07 bl    4,564          4,560*                −0.09                 15.6
08 × 08 bl    6,048          6,040*                −0.13                 29.7
09 × 09 bl    7,644          7,648                 +0.05                 71.8
10 × 10 bl    9,432          9,452                 +0.21                128.2
11 × 11 bl    11,640         11,676                +0.31                181.8
12 × 12 bl    13,832         13,848                +0.12                429.1
06 × 06 ci    169,016,907    168,611,971*          −0.24                  5.1
07 × 07 ci    237,077,377    236,613,631*          −0.20                 17.6
08 × 08 ci    326,696,412    326,376,790*          −0.10                 29.8
09 × 09 ci    428,682,120    430,224,089           +0.36                 53.6
10 × 10 ci    525,401,670    528,471,212           +0.58                118.6
11 × 11 ci    658,317,466    662,898,977           +0.70                247.6
12 × 12 ci    803,379,686    809,244,786           +0.73                411.4
Table 6.1: The best values are marked with *. These values are even better than those
found in the literature.
[Figure 6.1: Speedup of mean concurrent iterations vs. number of solvers (CPUs),
2 to 12, for instance 06x06 ci, with no-communication and import-softmax curves.]
3. The mean iteration from step 2 for a single solver, averaged over all runs, is
divided by the corresponding mean iteration for multiple solvers.
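The three steps above can be sketched as follows. The run traces and helper names are hypothetical: each trace is assumed to record the best objective value found so far at every iteration of one run.

```python
def mean_first_hit(runs, target, max_iter=10000):
    """Step 2: mean iteration at which each run first reaches the target value
    (minimization). Runs that never reach it count as max_iter."""
    hits = []
    for trace in runs:  # trace: best objective value per iteration, 1-indexed
        hit = next((i for i, v in enumerate(trace, 1) if v <= target), max_iter)
        hits.append(hit)
    return sum(hits) / len(hits)

def speedup(single_solver_runs, multi_solver_runs, best_value, range_pct=0.1):
    """Steps 1 and 3: target a range around the best value ever found, then
    divide the single-solver mean iterations by the multi-solver mean."""
    target = best_value * (1 + range_pct / 100.0)
    return (mean_first_hit(single_solver_runs, target)
            / mean_first_hit(multi_solver_runs, target))
```

With a 0% range, a single solver that needs three iterations to reach the best value against a parallel configuration that needs one would report a speedup of 3.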
Figure 6.1 shows the speedup (the dotted line corresponds to the theoretical speedup) for
the 0.1% target range within the best overall solution on the chip size 6 × 6 conflict in-
dex instance. Other instances show similar speedup results. Adding more solvers helps
to reach a high-quality solution more quickly, and communication with the best config-
uration outperforms non-communication. However, the speedup grows more slowly than
the number of solvers: with the best cooperation configuration, the 12 solvers achieve a
speedup of only 2.7, and without communication only 2.1.
Figure 6.2: Comparing the mean concurrent iterations across different communication
graphs for 12 solvers. The best solution found has an objective value of 168,611,971.
The target ranges are 0%, 0.1%, and 0.25% away from this objective value.
Figure 6.2 presents the mean concurrent iterations for the 0%, 0.1%, and 0.25% target
ranges across different communication graphs for 12 solvers. The solvers do not always
find the optimal solutions, so we also target solutions 0.10% and 0.25% away from the
optimum; this way we can plot targets that the set of solvers reaches at least once.
Looking at the 0.00% and 0.10% ranges, we notice a large improvement in the mean
concurrent number of iterations at 64% communication density. Clearly, communication
has some positive impact on the twelve solvers at 64% communication density, because
the best solution is found more quickly than without communication (0% communication
density). But the 0.25% range shows that communication does not generally help: only
a few topologies improve performance in the PTS case.
Nevertheless, the cooperative solvers outperform several best known solutions on this
QAP instance set2. We beat the GRASP-PR algorithm [33] on all the instances, and did
so with communication at every density. With PTS we also found better solutions than
the GATS algorithm [33] on instances 07x07 and 08x08 of the border length minimization
problem, with values 4,560 and 6,040, beating it by 0.09% and 0.13%. For the conflict
index minimization problem we obtained much better values than the GATS algorithm:
we beat instance 06x06 with our solution (168,611,971) by 0.24%, instance 07x07
(236,613,631) by 0.20%, and instance 08x08 (326,376,790) by 0.10%.
2 Please visit http://gi.cebitec.uni-bielefeld.de/comet/chiplayout/qap for more details about the
compared algorithms.
Bibliography
[1] Lei Duan, Samuel Gabrielsson, and J. Christopher Beck. Solving combinatorial
problems with parallel cooperative solvers, 2007.
[4] Fred Buckley and Marty Lewinter. A Friendly Introduction to Graph Theory,
pages 57–58, 145–146. Pearson Education, Inc., Upper Saddle River, New Jersey,
USA, 2003.
[5] Joan M. Aldous and Robin J. Wilson. Graphs and Applications: An Introductory
Approach, pages 64–65. Springer-Verlag, Berlin, 2000.
[8] Nils J. Nilsson. Artificial Intelligence: A New Synthesis, pages 215–238. Morgan
Kaufmann Publishers, Inc., 1998.
[9] Ellis Horowitz and Sartaj Sahni. Fundamentals of Computer Algorithms, pages
501–604. Computer Science Press, Inc., 1978.
[10] Alan Gibbons. Algorithmic Graph Theory, page 234. Cambridge University Press,
1985.
[12] Stuart Russell and Peter Norvig. Artificial Intelligence: A Modern Approach.
Pearson Education, Inc., 2. edition, 2003.
[14] H. Yoshizawa and S. Hashimoto. Landscape analyses and global search of knapsack
problems. In Systems, Man, and Cybernetics, 2000 IEEE International Conference
on, volume 3, pages 2311–2315, 2000.
[15] Fred Glover. Future paths for integer programming and links to artificial intelli-
gence. Comput. Operations Research, 13(5):533–549, 1986.
[16] Fred Glover and Manuel Laguna. Tabu Search. Kluwer Academic Publishers,
1997.
[17] Helena R. Lourenço, Olivier Martin, and Thomas Stützle. Iterated local search.
[18] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.
Introduction to Algorithms, pages 595–601. MIT Press and McGraw-Hill, 2 edition,
2001.
[19] Jaroslav Nešetřil, Eva Milková, and Helena Nešetřilová. Otakar Borůvka on
minimum spanning tree problem: Translation of both the 1926 papers, comments,
history. Discrete Mathematics, 233, 2001.
[20] Seth Pettie and Vijaya Ramachandran. An optimal minimum spanning tree algo-
rithm. J. ACM, 49(1):16–34, 2002.
[21] Václav (Vašek) Chvátal. A greedy heuristic for the set covering problem. Mathe-
matics of Operations Research, 4(3):233–235, 1979.
[22] Bart Selman, Hector J. Levesque, and David Mitchell. A new method for solving
hard satisfiability problems. In Paul Rosenbloom and Peter Szolovits, editors,
Proceedings of the Tenth National Conference on Artificial Intelligence, pages 440–
446, Menlo Park, California, 1992. AAAI Press.
[23] Charles Darwin. On the Origin of Species. London John Murray, 1. edition, 1859.
[24] Tjalling C. Koopmans and Martin J. Beckmann. Assignment problems and the
location of economic activities. Cowles Foundation Discussion Papers 4, Cowles
Foundation, Yale University, 1957.
[25] Sartaj Sahni and Teofilo Gonzalez. P-complete approximation problems. J. ACM,
23(3):555–565, 1976.
[26] Rainer E. Burkard, Stefan E. Karisch, and Franz Rendl. QAPLIB – a quadratic
assignment problem library. European Journal of Operational Research, 55:115–119,
1991.
[27] Teodor Gabriel Crainic and Michel Gendreau. Cooperative parallel tabu search
for capacitated network design. Journal of Heuristics, 8(6):601–627, 2002.
[29] S. Porto and C. Ribeiro. Parallel tabu search message-passing synchronous strate-
gies for task scheduling under precedence constraints, 1995.
[31] Michael de la Maza and Bruce Tidor. An analysis of selection procedures with
particular attention paid to proportional and Boltzmann selection. In Proceedings
of the Fifth International Conference on Genetic Algorithms, pages 124–131, San
Francisco, California, USA, 1993. Morgan Kaufmann Publishers Inc.
[32] Holger H. Hoos and Thomas Stützle. Stochastic Local Search: Foundations and
Applications, chapter 9, pages 372–373. Morgan Kaufmann, 1 edition, 2004.
[33] Sérgio A. de Carvalho Jr. and Sven Rahmann. Microarray layout as a quadratic
assignment problem. In Proceedings of the German Conference on Bioinformatics
(GCB), volume P-83, pages 11–20, 2006.
Index
2-opt, 55
NP-complete, 26, 27
NP-complete problem, 27
NP-hard, 26
adjacency list, 16
adjacency matrix, 16
adjacent, 10
arc, 9, 39
arc capacity, 39
artificial intelligence, 30
Boltzmann selection procedure, 53
boolean connectivity, 26
boolean formula, 26
boolean logic, 26
boolean variables, 26
brute-force, 28
children, 36
Chvátal's greedy heuristic, 35
circuit, 11
class NP, 25
closed, 11
closed walk, 11
combination, 7
combinatorial, 20
combinatorial optimization problem, 20
combinatorics, 5
communication graphs, 50
communication strategy, 50
complement, 14
complete graph, 14
Computer Aided Design (CAD), 45
conjunctive normal form, 26
connected graph, 12
conservation of flow, 39
constraints, 19
continuous problems, 20
convex programming problem, 20
cost function, 19
costs, 16
cycle, 12
decision algorithm, 24
decision problem, 24
degree, 15
demand node, 39
dense graph, 17
deterministic algorithm, 24
digraph, 10
Dijkstra's algorithm, 35
directed graph, 9
disconnected graph, 12
disjunctive normal form, 26
diversification, 50
diversify, 30
edge, 9
edge capacity, 39
edge set, 10
elevation, 31
elite set, 50
Euler circuit, 15
Euler path, 15
Euler trail, 15
evolutionary algorithm, 35
evolutionary algorithms, 34
exhaustive search, 23
feasible, 35
feasible solution, 21
feasible solutions, 19
flow, 39
forbidden, 29
formula, 25
fundamental principles of counting, 5
generation, 36
genetic algorithm, 34, 35
globally optimum, 19
gradient ascent, 30
gradient descent, 30
graph, 8
greedy algorithm, 34
greedy algorithms, 34
greedy local search, 31
greedy satisfiability, 35
Hamiltonian circuit, 22
heuristic, 27
heuristic method, 28
heuristic methods, 27
hill climbing, 30
hybrid algorithm, 36
hybrid evolutionary algorithm, 34, 36
import-best policy, 53
import-hardmax, 53
import-policy, 53
import-softmax, 53
incident, 10
instance, 19
intensification, 49
intractable, 24
irrevocable, 35
isolated vertex, 15
iterated local search, 34
Kruskal's algorithm, 35
legal, 29
legality condition, 28
length, 11
linear programming problem, 20
literal, 26
local improvement procedures, 31
local search, 28
local search algorithm, 29
locally optimal, 19, 21, 35
location, 31
location theory, 43
loop, 10
mathematical programming, 19
metaheuristic, 34
Metropolis heuristic, 32
minimum cost network flow problem, 40
move, 47
moves, 28
multigraph, 13
multiple edges, 10
mutation, 36
negations, 26
neighborhood, 20, 21
network, 39
node, 39
nodes, 9
nondeterministic algorithm, 24, 25
nondeterministic polynomial algorithm, 25
nonlinear programming problem, 20
null graph, 14
objective function, 19
open walk, 11
operations research, 20, 39
optimal solutions, 21
optimal tabu list size, 49
optimization, 19
optimization problem, 19
optimization problems, 20
optimum, 19
parallel computing, 3
parallel metaheuristics, 50
parallel tabu search, 46, 50
parents, 36
path, 11
pendant vertex, 15
permutation, 6
polynomial time, 24
population, 35
Prim's algorithm, 35
propositional calculus, 25
quadratic assignment problem, 36, 41
quadratic assignment problem library, 46
quadratic programming, 20
random restart hill climbing, 33
reduced, 26
reduces in polynomial time, 26
regular graph, 15
restart, 50
restart method, 33
rugged landscape, 31
satisfiability problem, 26
satisfiable, 26
search space, 19, 21
search space landscape, 31
search strategies, 48
selection, 7
selection method, 53
selection rule, 28, 29
simplex algorithm, 20
simulated annealing, 34, 35
singleton graph, 14
solution, 21
solver, 50
spanning subgraph, 13
sparse graph, 17
species, 35
state, 28, 31
steepest ascent, 30
steepest descent, 30
stochastic hill climbing, 33
strategic oscillation, 50
subgraph, 13
subgraph induced, 14
subset, 19
supply node, 39
tabu, 47
tabu list, 49
tabu list length, 49
tabu list size, 49
tabu search, 34, 47
the assignment of gates to airplanes in an airport, 46
the assignment of specialized rooms in a building, 46
the assignment problem, 41
the rule of product, 5
the traveling salesman problem, 22
theory of evolution, 35
tour, 22
tractable, 24
trail, 11, 12
transition graph, 28
transportation problem, 40
transshipment node, 39
traveling salesman problem, 19
trial and error, 28
trivial walk, 11
truth assignment, 26
valence, 15
verification stage, 25
vertex set, 10
vertices, 9
walk in a graph, 11
weighted digraph, 16
weighted graph, 16
weights, 16