
Ant Algorithms for Discrete Optimization∗
Marco Dorigo and Gianni Di Caro
IRIDIA, Université Libre de Bruxelles
Brussels, Belgium
{mdorigo,gdicaro}@ulb.ac.be
Luca M. Gambardella
IDSIA, Lugano, Switzerland
[email protected]

Abstract
This paper overviews recent work on ant algorithms, that is, algorithms for discrete
optimization that took inspiration from the observation of ant colonies' foraging
behavior, and introduces the ant colony optimization (ACO) meta-heuristic. In the
first part of the paper the basic biological findings on real ants are overviewed, and
their artificial counterparts as well as the ACO meta-heuristic are defined. In the
second part of the paper a number of applications to combinatorial optimization and
routing in communications networks are described. We conclude with a discussion of
related work and of some of the most important aspects of the ACO meta-heuristic.

1 Introduction
Ant algorithms were first proposed by Dorigo and colleagues [33, 39] as a multi-agent ap-
proach to difficult combinatorial optimization problems like the traveling salesman problem
(TSP) and the quadratic assignment problem (QAP). There is currently a lot of ongoing
activity in the scientific community to extend/apply ant-based algorithms to many differ-
ent discrete optimization problems [5, 21]. Recent applications cover problems like vehicle
routing, sequential ordering, graph coloring, routing in communications networks, and so
on.
Ant algorithms were inspired by the observation of real ant colonies. Ants are social
insects, that is, insects that live in colonies and whose behavior is directed more to the
survival of the colony as a whole than to that of a single individual component of the
colony. Social insects have captured the attention of many scientists because of the high
structuration level their colonies can achieve, especially when compared to the relative sim-
plicity of the colony’s individuals. An important and interesting behavior of ant colonies
is their foraging behavior, and, in particular, how ants can find shortest paths between
food sources and their nest.
While walking from food sources to the nest and vice versa, ants deposit on the ground
a substance called pheromone, forming in this way a pheromone trail. Ants can smell
pheromone and, when choosing their way, they tend to choose, in probability, paths marked
by strong pheromone concentrations. The pheromone trail allows the ants to find their
way back to the food source (or to the nest). Also, it can be used by other ants to find
the location of the food sources found by their nestmates.

∗ To appear in Artificial Life, MIT Press, 1999.

Figure 1. Single bridge experiment. (a) Experimental setup (bridge branches of equal length, 15 cm). (b) Results for a typical single trial, showing the percentage of passages on each of the two branches per unit of time as a function of time. Eventually, after an initial short transitory phase, the upper branch becomes the most used. After Deneubourg et al., 1990 [25].
It has been shown experimentally that this pheromone trail following behavior can give
rise, once employed by a colony of ants, to the emergence of shortest paths. That is, when
more paths are available from the nest to a food source, a colony of ants may be able to
exploit the pheromone trails left by the individual ants to discover the shortest path from
the nest to the food source and back.
In order to study the ants' foraging behavior in controlled conditions, the binary bridge
experiment was set up by Deneubourg et al. [25] (see Figure 1a). The nest of a
colony of ants of the species Linepithema humile and a food source were separated
by a double bridge in which each branch has the same length. Ants were then left free
to move between the nest and the food source, and the percentage of ants choosing
one or the other of the two branches was observed over time. The result (see Figure 1b) is
that after an initial transitory phase in which some oscillations can appear, ants tend to
converge on the same path.
In the above experiment initially there is no pheromone on the two branches, which
are therefore selected by the ants with the same probability. Nevertheless, random fluc-
tuations, after an initial transitory phase, cause a few more ants to randomly select one
branch, the upper one in the experiment shown in Figure 1a, over the other. Because
ants deposit pheromone while walking, the greater number of ants on the upper branch
determines a greater amount of pheromone on it, which in turn stimulates more ants to
choose it, and so on. The probabilistic model that describes this phenomenon, which
closely matches the experimental observations, is the following [58]. We first make the
assumption that the amount of pheromone on a branch is proportional to the number of
ants that used the branch in the past. This assumption implies that pheromone evapo-
ration is not taken into account. Given that an experiment typically lasts approximately
one hour, it is plausible to assume that the amount of pheromone evaporated in this time
period is negligible. In the model, the probability of choosing a branch at a certain time
depends on the total amount of pheromone on the branch, which in turn is proportional
to the number of ants that used the branch until that time. More precisely, let U_m and
L_m be the numbers of ants that have used the upper and lower branch after m ants have
crossed the bridge, with U_m + L_m = m. The probability P_U(m) with which the (m + 1)-th
ant chooses the upper branch is

P_U(m) = \frac{(U_m + k)^h}{(U_m + k)^h + (L_m + k)^h} \qquad (1)
while the probability P_L(m) that it chooses the lower branch is P_L(m) = 1 − P_U(m). This
functional form of the probability of choosing a branch over the other was obtained from
experiments on trail-following [80]; the parameters h and k allow the model to be fitted to
experimental data. The ant choice dynamics follows from the above equation: U_{m+1} = U_m + 1
if ψ ≤ P_U(m), and U_{m+1} = U_m otherwise, where ψ is a random variable uniformly
distributed over the interval [0,1].

Figure 2. Double bridge experiment. (a) Ants start exploring the double bridge. (b) Eventually most of the ants choose the shortest path. (c) Distribution of the percentage of ants that selected the shorter path. After Goss et al. 1989 [58].
Monte Carlo simulations were run to test the correspondence between this model and
the real data: results of simulations were in agreement with the experiments with real ants
when parameters were set to k ≈ 20 and h ≈ 2 [80].
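As an illustration, the following minimal sketch simulates the choice dynamics of Equation 1 with the reported values k = 20 and h = 2 (the function name and the number of ants are our own illustrative choices, not part of the original model):

    import random

    def simulate_bridge(n_ants=1000, k=20, h=2):
        # Each crossing ant picks the upper branch with probability
        # P_U(m) = (U+k)^h / ((U+k)^h + (L+k)^h)  (Equation 1).
        upper, lower = 0, 0
        for _ in range(n_ants):
            p_upper = (upper + k) ** h / ((upper + k) ** h + (lower + k) ** h)
            if random.random() <= p_upper:
                upper += 1
            else:
                lower += 1
        return upper / n_ants

    # Independent runs tend to end near 0 or 1: the colony breaks the initial
    # symmetry and converges on one branch, as in the real experiment.
    print([round(simulate_bridge(), 2) for _ in range(5)])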
It is easy to modify the experiment above to the case in which the bridge’s branches are
of different length [58], and to extend the model of Equation 1 so that it can describe this
new situation. In this case, because of the same pheromone laying mechanism as in the
previous situation, the shortest branch is most often selected: The first ants to arrive at the
food source are those that took the two shortest branches, so that, when these ants start
their return trip, more pheromone is present on the short branch than on the long branch,
stimulating successive ants to choose the short branch. In this case, the importance of
initial random fluctuations is much reduced, and the stochastic pheromone trail following
behavior of the ants coupled to differential branch length is the main mechanism at work.
Figure 2 shows the experimental apparatus and the typical result of an experiment
with a double bridge with branches of different lengths.
It is clear that what is going on in the above described process is a kind of distributed
optimization mechanism to which each single ant gives only a very small contribution. It
is interesting that, although a single ant is in principle capable of building a solution (i.e.,
of finding a path between nest and food reservoir), it is only the ensemble of ants, that
is the ant colony, which presents the “shortest path finding” behavior.1 In a sense, this
behavior is an emergent property of the ant colony. It is also interesting to note that ants
can perform this specific behavior using a simple form of indirect communication mediated
by pheromone laying, known as stigmergy [60].
As defined by Grassé in his work on Bellicositermes Natalensis and Cubitermes [60],
stigmergy is the “stimulation of workers2 by the performance they have achieved.”
1 The above described experiments have been run in strongly constrained conditions. A formal proof of the pheromone-driven shortest path finding behavior in the general case is missing. Bruckstein et al. [9, 10] consider the shortest path finding problem in absence of obstacles for ants driven by visual clues and not by pheromones, and prove the convergence of the ants' path to the straight line.
2 Workers are one of the castes in termite colonies. Although Grassé introduced the term stigmergy to explain the behavior of termite societies, the same term has been used to describe indirect communication mediated by modifications of the environment that can be observed also in other social insects.
In fact, Grassé (1946) [59] observed that insects are capable of responding to so-called
“significant stimuli” which activate a genetically encoded reaction. In social insects, of
which termites and ants are some of the best known examples, the effects of these reactions
can act as new significant stimuli for both the insect that produced them and for other
insects in the colony. The production of a new significant stimulus as a consequence of the
reaction to a significant stimulus determines a form of coordination of the activities and
can be interpreted as a form of indirect communication. For example, Grassé [60] observed
that Bellicositermes Natalensis as well as Cubitermes, when building a new nest, start with
a random, non-coordinated activity of earth-pellet depositing. But once the earth-pellets
reach a certain density in a restricted area, they become a new significant stimulus which
causes more termites to add earth-pellets, so that pillars and arches, and eventually the
whole nest, are built.
What distinguishes stigmergy from other means of communication is (i) the physical
nature of the information released by the communicating insects, which corresponds to
a modification of physical environmental states visited by the insects, and (ii) the local
nature of the released information, which can only be accessed by insects that visit the
state in which it was released (or some neighborhood of that state).
Accordingly, in this paper we take the stance that it is possible to talk of stigmergetic
communication whenever there is an “indirect communication mediated by physical mod-
ifications of environmental states which are only locally accessible by the communicating
agents.”
One of the main tenets of this paper is that the stigmergetic model of communication
in general, and the one inspired by ants foraging behavior in particular, is an interesting
model for artificial multi-agent systems applied to the solution of difficult optimization
problems. In fact, the above-mentioned characteristics of stigmergy can easily be extended
to artificial agents by (i) associating to problem states appropriate state variables, and (ii)
by giving the artificial agents only local access to these variables’ values.
For example, in the above described foraging behavior of ants, stigmergetic communi-
cation is at work via the pheromone that ants deposit on the ground while walking. Cor-
respondingly, our artificial ants will simulate pheromone laying by modifying appropriate
“pheromone variables” associated to problem states they visit while building solutions to
the optimization problem to which they are applied. Also, according to the stigmergetic
communication model, our artificial ants will have only local access to these pheromone
variables.
Another important aspect of real ants’ foraging behavior that is exploited by artificial
ants is the coupling between the autocatalytic (positive feedback ) mechanism [39] and
the implicit evaluation of solutions. By implicit solution evaluation we mean the fact that
shorter paths (which correspond to lower cost solutions in artificial ants) will be completed
earlier than longer ones, and therefore they will receive pheromone reinforcement more
quickly. Implicit solution evaluation coupled with autocatalysis can be very effective: the
shorter the path, the sooner pheromone is deposited by the ants, and the more ants
use the shorter path. If appropriately used, autocatalysis can be a powerful mechanism in
population-based optimization algorithms (e.g., in evolutionary computation algorithms
[44, 64, 82, 88] autocatalysis is implemented by the selection/reproduction mechanism). In
fact, it quickly favors the best individuals, so that they can direct the search process. When
using autocatalysis some care must be taken to avoid premature convergence (stagnation),
that is, the situation in which some not very good individual takes over the population
just because of a contingent situation (e.g., because of a local optimum, or just because
of initial random fluctuations which caused a not very good individual to be much better
than all the other individuals in the population) impeding further exploration of the search
space. We will see that pheromone trail evaporation and stochastic state transitions are
the needed complements to counter these drawbacks of autocatalysis.
In the remainder of this paper we discuss a number of ant algorithms based on the above
ideas. We start by defining, in Section 2, the characterizing aspects of ant algorithms and
the Ant Colony Optimization (ACO) meta-heuristic.3 Section 3 overviews most of the
applications of ACO algorithms. In Section 4 we briefly discuss related work, and in
Section 5 we discuss some of the characteristics of implemented ACO algorithms. Finally,
we draw some conclusions in Section 6.

3 It is important here to clarify briefly the terminology used. We talk of the ACO meta-heuristic to refer to the general procedure presented in Section 2. The term ACO algorithm will be used to indicate any generic instantiation of the ACO meta-heuristic. Alternatively, we will also talk more informally of ant algorithms to indicate any algorithm that, while following the general guidelines set above, does not necessarily follow all the aspects of the ACO meta-heuristic. Therefore, all ACO algorithms are also ant algorithms, though the vice versa is not true (e.g., we will see that HAS-QAP is an ant, but not strictly an ACO, algorithm).

2 The ant colony optimization approach


In the ant colony optimization (ACO) meta-heuristic a colony of artificial ants cooperate
in finding good solutions to difficult discrete optimization problems. Cooperation is a
key design component of ACO algorithms: The choice is to allocate the computational
resources to a set of relatively simple agents (artificial ants) that communicate indirectly by
stigmergy. Good solutions are an emergent property of the agents’ cooperative interaction.
Artificial ants have a double nature. On the one hand, they are an abstraction of those
behavioral traits of real ants which seemed to be at the heart of the shortest path finding
behavior observed in real ant colonies. On the other hand, they have been enriched with
some capabilities which do not find a natural counterpart. In fact, we want ant colony
optimization to be an engineering approach to the design and implementation of software
systems for the solution of difficult optimization problems. It is therefore reasonable to
give artificial ants some capabilities that, although not corresponding to any capacity of
their real ants counterparts, make them more effective and efficient. In the following we
discuss first the nature inspired characteristics of artificial ants, and then how they differ
from real ants.

2.1 Similarities and differences with real ants


Most of the ideas of ACO stem from real ants. In particular, the use of: (i) a colony of
cooperating individuals, (ii) an (artificial) pheromone trail for local stigmergetic communi-
cation, (iii) a sequence of local moves to find shortest paths, and (iv) a stochastic decision
policy using local information and no lookahead.

Colony of cooperating individuals. Like real ant colonies, ant algorithms are composed
of a population, or colony, of concurrent and asynchronous entities globally coop-
erating to find a good “solution” to the task under consideration. Although the
complexity of each artificial ant is such that it can build a feasible solution (as a real
ant can somehow find a path between the nest and the food), high quality solutions
are the result of the cooperation among the individuals of the whole colony. Ants co-
operate by means of the information they concurrently read/write on the problem’s
states they visit, as explained in the next item.
Pheromone trail and stigmergy. Artificial ants modify some aspects of their environment
as real ants do. While real ants deposit a chemical substance, the pheromone, on the
world's states they visit, artificial ants change some numeric information locally
stored in the problem's state they visit. This information takes into account
the ant’s current history/performance and can be read/written by any ant accessing
the state. By analogy, we call this numeric information artificial pheromone trail,
pheromone trail for short in the following. In ACO algorithms local pheromone
trails are the only communication channels among the ants. This stigmergetic form
of communication plays a major role in the utilization of collective knowledge. Its
main effect is to change the way the environment (the problem landscape) is locally
perceived by the ants as a function of all the past history of the whole ant colony.
Usually, in ACO algorithms an evaporation mechanism, similar to real pheromone
evaporation, modifies pheromone information over time. Pheromone evaporation
allows the ant colony to slowly forget its past history so that it can direct its search
towards new directions without being over-constrained by past decisions.
Shortest path searching and local moves. Artificial and real ants share a common
task: to find a shortest (minimum cost) path joining an origin (nest) to destina-
tion (food) sites. Real ants do not jump, they just walk through adjacent terrain’s
states, and so do artificial ants, moving step-by-step through “adjacent states” of the
problem. Of course, exact definitions of state and adjacency are problem-specific.
Stochastic and myopic state transition policy. Artificial ants, as real ones, build so-
lutions applying a probabilistic decision policy to move through adjacent states. As
for real ants, the artificial ants’ policy makes use of local information only and it
does not make use of lookahead to predict future states. Therefore, the applied
policy is completely local, in space and time. The policy is a function of both the
a priori information represented by the problem specifications (equivalent to the
terrain’s structure for real ants), and of the local modifications in the environment
(pheromone trails) induced by past ants.
As we said, artificial ants also have some characteristics which do not find their coun-
terpart in real ants.
• Artificial ants live in a discrete world and their moves consist of transitions from
discrete states to discrete states.
• Artificial ants have an internal state. This private state contains the memory of the
ant's past actions.
• Artificial ants deposit an amount of pheromone which is a function of the quality of
the solution found.4
• The timing of artificial ants' pheromone laying is problem-dependent and often does
not reflect real ants' behavior. For example, in many cases artificial ants update
pheromone trails only after having generated a solution.
• To improve overall system efficiency, ACO algorithms can be enriched with extra
capabilities like lookahead, local optimization, backtracking, and so on, that cannot
be found in real ants. In many implementations ants have been hybridized with local
optimization procedures (see for example [37, 50, 92]), while, so far, only Michel and
Middendorf [76] have used a simple one-step lookahead function and there are no
examples of backtracking procedures added to the basic ant capabilities, except for
simple recovery procedures used by Di Caro and Dorigo [26, 28].5
In the following section we will show how artificial ants can be put to work in an
algorithmic framework so that they can be applied to discrete optimization problems.
4 In reality, some real ants have a similar behavior: they deposit more pheromone in case of richer food sources.
5 Usually, backtracking strategies are suitable to solve constraint satisfaction problems (e.g., n-queens), and lookahead is very useful when the cost of making a local prediction about the effect of future moves is much lower than the cost of the real execution of the move sequence (e.g., mobile robotics). To our knowledge, until now ACO algorithms have not been applied to these classes of problems.
2.2 The ACO meta-heuristic
In ACO algorithms a finite size colony of artificial ants with the above described char-
acteristics collectively searches for good quality solutions to the optimization problem
under consideration. Each ant builds a solution, or a component of it6 , starting from an
initial state selected according to some problem dependent criteria. While building its
own solution, each ant collects information on the problem characteristics and on its own
performance, and uses this information to modify the representation of the problem, as
seen by the other ants. Ants can act concurrently and independently, showing a coopera-
tive behavior. They do not use direct communication: it is the stigmergy paradigm that
governs the information exchange among the ants.

6 To make more intuitive what we mean by component of a solution, we can consider, as an example, a transportation routing problem: given a set of n cities, {c_i}, i ∈ {1, . . . , n}, and a network of interconnection roads, we want to find all the shortest paths s_ij connecting each city pair (c_i, c_j). In this case, a complete solution is represented by the set of all the n(n − 1) shortest path pairs, while a component of a solution is a single path s_ij.
An incremental constructive approach is used by the ants to search for a feasible
solution.
A solution is expressed as a minimum cost (shortest) path through the states of the
problem in accordance with the problem’s constraints. The complexity of each ant is such
that even a single ant is able to find a (probably poor quality) solution. High quality
solutions are only found as the emergent result of the global cooperation among all the
agents of the colony concurrently building different solutions.
According to the assigned notion of neighborhood (problem-dependent), each ant builds
a solution by moving through a (finite) sequence of neighbor states. Moves are selected
by applying a stochastic local search policy directed (i) by ant private information (the
ant internal state, or memory) and (ii) by publicly available pheromone trail and a priori
problem-specific local information.
The ant's internal state stores information about the ant's past history. It can be
used to carry useful information to compute the value/goodness of the generated solution
and/or the contribution of each executed move. Moreover, it can play a fundamental
role in managing the feasibility of the solutions. In fact, in some problems, typically in
combinatorial optimization, some of the moves available to an ant in a state can take the
ant to an infeasible state. This can be avoided by exploiting the ant's memory. Ants therefore
can build feasible solutions using only knowledge about the local state and about the effects
of actions that can be performed in the local state.
The local, public information comprises both some problem-specific heuristic infor-
mation, and the knowledge, coded in the pheromone trails, accumulated by all the ants
from the beginning of the search process. This time-global pheromone knowledge built-up
by the ants is a shared local long-term memory that influences the ants’ decisions. The
decisions about when the ants should release pheromone on the “environment” and how
much pheromone should be deposited depend on the characteristics of the problem and on
the design of the implementation. Ants can release pheromone while building the solution
(online step-by-step), or after a solution has been built, moving back to all the visited
states (online delayed), or both. As we said, autocatalysis plays an important role in ACO
algorithms functioning: the more ants choose a move, the more the move is rewarded (by
adding pheromone) and the more interesting it becomes for the next ants. In general, the
amount of pheromone deposited is made proportional to the goodness of the solution an
ant has built (or is building). In this way, if a move contributed to generate a high-quality
solution its goodness will be increased proportionally to its contribution.
A functional composition of the locally available pheromone and heuristic values defines
ant-decision tables, that is, probabilistic tables used by the ants’ decision policy to direct
their search towards the most interesting regions of the search space. The stochastic
component of the move choice decision policy and the previously discussed pheromone
evaporation mechanism avoid a rapid drift of all the ants towards the same part of the
search space. Of course, the level of stochasticity in the policy and the strength of the
updates in the pheromone trail determine the balance between the exploration of new
points in the state space and the exploitation of accumulated knowledge. If necessary and
feasible, the ants’ decision policy can be enriched with problem-specific components like
backtracking procedures or lookahead.
Once an ant has accomplished its task, consisting of building a solution and depositing
pheromone information, the ant “dies”, that is, it is deleted from the system.
The overall ACO meta-heuristic, besides the two above-described components acting
from a local perspective (that is, ants generation and activity, and pheromone evaporation),
can also comprise some extra components which use global information and that go under
the name of daemon actions in the algorithm reported in Figure 3. For example, a daemon
can be allowed to observe the ants’ behavior, and to collect useful global information to
deposit additional pheromone information, biasing, in this way, the ant search process from
a non-local perspective. Or, it could, on the basis of the observation of all the solutions
generated by the ants, apply problem-specific local optimization procedures and deposit
additional pheromone “offline” with respect to the pheromone the ants deposited online.
The three main activities of an ACO algorithm (ants generation and activity, pheromone
evaporation, and daemon actions) may need some kind of synchronization, performed by
the schedule activities construct of Figure 3. In general, a strictly sequential schedul-
ing of the activities is particularly suitable for non-distributed problems, where the global
knowledge is easily accessible at any instant and the operations can be conveniently syn-
chronized. On the contrary, some form of parallelism can be easily and efficiently exploited
in distributed problems like routing in telecommunications networks, as will be discussed
in Section 3.2.
In Figure 3, a high-level description of the ACO meta-heuristic is reported in pseudo-
code. As pointed out above, some described components and behaviors are optional,
like daemon activities, or strictly implementation-dependent, like when and how the
pheromone is deposited. In general, the online step-by-step pheromone update and the
online delayed pheromone update components (respectively, lines 24-27 and 30-34 in the
new active ant() procedure) are mutually exclusive, and only in a few cases are they both
present or both absent (when both are absent, the pheromone is deposited by the
daemon).
ACO algorithms, as a consequence of their concurrent and adaptive nature, are partic-
ularly suitable for distributed stochastic problems where the presence of exogenous sources
determines a non-stationarity in the problem representation (in terms of costs and/or en-
vironment). For example, many problems related to communications or transportation
networks are intrinsically distributed and non-stationary and it is often not possible to
have an exact model of the underlying variability. On the contrary, because stigmergy
is both the only inter-ant communication method and spatially localized, ACO al-
gorithms may not perform at their best on problems where each state has a large
neighborhood. In fact, an ant that visits a state with a large neighborhood has a huge
number of possible moves among which to choose. Therefore, the probability that many
ants visit the same state is very small, and consequently there is little, if any, difference
between using or not using pheromone trails.

3 Applications of ACO algorithms


Numerous successful implementations of the ACO meta-heuristic (Figure 3) are now
available, applied to a number of different combinatorial optimization problems. Looking
at these implementations it is possible to distinguish between two classes of applications:
those to static combinatorial optimization problems, and those to dynamic ones.

1  procedure ACO Meta heuristic()
2    while (termination criterion not satisfied)
3      schedule activities
4        ants generation and activity();
5        pheromone evaporation();
6        daemon actions();   {optional}
7      end schedule activities
8    end while
9  end procedure

10 procedure ants generation and activity()
11   while (available resources)
12     schedule the creation of a new ant();
13     new active ant();
14   end while
15 end procedure

16 procedure new active ant()   {ant lifecycle}
17   initialize ant();
18   M = update ant memory();
19   while (current state ≠ target state)
20     A = read local ant-routing table();
21     P = compute transition probabilities(A, M, problem constraints);
22     next state = apply ant decision policy(P, problem constraints);
23     move to next state(next state);
24     if (online step-by-step pheromone update)
25       deposit pheromone on the visited arc();
26       update ant-routing table();
27     end if
28     M = update internal state();
29   end while
30   if (online delayed pheromone update)
31     evaluate solution();
32     deposit pheromone on all visited arcs();
33     update ant-routing table();
34   end if
35   die();
36 end procedure

Figure 3. The ACO meta-heuristic in pseudo-code. Comments are enclosed in braces. All the procedures at the first level of
indentation inside the schedule activities construct are executed concurrently. The procedure daemon actions() at line 6 is optional and
refers to centralized actions executed by a daemon possessing global knowledge. The target state (line 19) refers to a complete
solution built by the ant. The step-by-step and delayed pheromone updating procedures at lines 24-27 and 30-34 are often mutually
exclusive. When both of them are absent the pheromone is deposited by the daemon.
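
To make the ant lifecycle concrete, here is a minimal Python rendering of the new active ant() procedure (lines 16-36), under simplifying assumptions of our own: the problem is a graph given as an adjacency map, the ant-routing table is reduced to pheromone and heuristic weights stored in dictionaries keyed by arcs, and the helper names are hypothetical rather than taken from any published implementation.

    import random

    def new_active_ant(start, target, neighbors, tau, eta):
        # One ant lifecycle: build a path from start to target by repeated
        # stochastic local moves biased by pheromone (tau) and heuristic (eta).
        state, memory, path = start, {start}, []
        while state != target:
            options = [s for s in neighbors[state] if s not in memory]
            if not options:
                return None  # dead end; a real implementation would recover
            weights = [tau[(state, s)] * eta[(state, s)] for s in options]
            nxt = random.choices(options, weights=weights)[0]
            path.append((state, nxt))
            tau[(state, nxt)] += 0.1  # online step-by-step pheromone update
            memory.add(nxt)
            state = nxt
        return path  # after depositing its pheromone the ant "dies"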

Static problems are those in which the characteristics of the problem are given once
and for all when the problem is defined, and do not change while the problem is being
solved. A paradigmatic example of such problems is the classic traveling salesman problem
[65, 69, 83], in which city locations and their relative distances are part of the problem
definition and do not change at run-time. On the contrary, dynamic problems are defined
as a function of some quantities whose value is set by the dynamics of an underlying
system. The problem changes therefore at run-time and the optimization algorithm must
be capable of adapting online to the changing environment. The paradigmatic example
discussed in the following of this section is network routing.
Topological modifications (e.g., adding or removing a node), which are not considered
by the above classification, can be seen as transitions between problems belonging to the
same class.
Tables 1 and 2 list the available implementations of ACO algorithms. The main char-
acteristics of the listed algorithms are discussed in the following two subsections. We then
conclude with a brief review of existing parallel implementations of ACO algorithms.

Table 1. List of applications of ACO algorithms to static combinatorial optimization problems. Classification by application and
chronologically ordered.

Problem name Authors Year Main references Algorithm name


Traveling salesman Dorigo, Maniezzo & Colorni 1991 [33, 39, 40] AS
Gambardella & Dorigo 1995 [48] Ant-Q
Dorigo & Gambardella 1996 [36, 37, 49] ACS & ACS-3-opt
Stützle & Hoos 1997 [92, 93] MMAS
Bullnheimer, Hartl & Strauss 1997 [12] ASrank
Quadratic assignment Maniezzo, Colorni & Dorigo 1994 [75] AS-QAP
Gambardella, Taillard & Dorigo 1997 [52] HAS-QAPa
Stützle & Hoos 1998 [94] MMAS-QAP
Maniezzo & Colorni 1998 [74] AS-QAPb
Maniezzo 1998 [73] ANTS-QAP
Job-shop scheduling Colorni, Dorigo & Maniezzo 1994 [20] AS-JSP
Vehicle routing Bullnheimer, Hartl & Strauss 1996 [15, 11, 13] AS-VRP
Gambardella, Taillard & Agazzi 1999 [51] HAS-VRP
Sequential ordering Gambardella & Dorigo 1997 [50] HAS-SOP
Graph coloring Costa & Hertz 1997 [22] ANTCOL
Shortest common supersequence Michel & Middendorf 1998 [76] AS-SCS
a HAS-QAP is an ant algorithm which does not follow all the aspects of the ACO meta-heuristic.
b This is a variant of the original AS-QAP.

3.1 Applications of ACO algorithms to static combinatorial optimization problems
The application of the ACO meta-heuristic to a static combinatorial optimization problem
is relatively straightforward, once one has defined a mapping of the problem which allows
the incremental construction of a solution, a neighborhood structure, and a stochastic state
transition rule to be locally used to direct the constructive procedure.
A strictly implementation dependent aspect of the ACO meta-heuristic regards the
timing of pheromone updates (lines 24-27 and 30-34 of the algorithm in Figure 3). In ACO
algorithms for static combinatorial optimization the way ants update pheromone trails

Table 2. List of applications of ACO algorithms to dynamic combinatorial optimization problems. Classification by application
and chronologically ordered.

Problem name Authors Year Main references Algorithm name


Connection-oriented Schoonderwoerd, Holland, 1996 [87, 86] ABC
network routing Bruten & Rothkrantz
White, Pagurek & Oppacher 1998 [100] ASGA
Di Caro & Dorigo 1998 [31] AntNet-FS
Bonabeau, Henaux, Guérin, 1998 [6] ABC-smart ants
Snyers, Kuntz & Théraulaz
Connection-less Di Caro & Dorigo 1997 [26, 28, 32] AntNet & AntNet-FA
network routing Subramanian, Druschel & Chen 1997 [95] Regular ants
Heusse, Guérin, Snyers & Kuntz 1998 [62] CAF
van der Put & Rothkrantz 1998 [97, 98] ABC-backward

changes across algorithms: Any combination of online step-by-step pheromone updates
and online delayed pheromone updates is possible.
Another important implementation dependent aspect concerns the daemon actions()
component of the ACO meta-heuristic (line 6 of the algorithm in Figure 3). Daemon
actions implement actions which require some kind of global knowledge about the problem.
Examples are offline pheromone updates and local optimization of solutions built by ants.
Most of the ACO algorithms presented in this subsection are strongly inspired by Ant
System (AS), the first work on ant colony optimization [33, 39]. Many of the successive
applications of the original idea are relatively straightforward applications of AS to the
specific problem under consideration. We start therefore the description of ACO algo-
rithms with AS. Following AS, for each ACO algorithm listed in Table 1 we give a short
description of the algorithm’s main characteristics and of the results obtained.

3.1.1 Traveling salesman problem


The first application of an ant colony optimization algorithm was done using the traveling
salesman problem (TSP) as a test problem. The main reasons why the TSP, one of the
most studied NP -hard [69, 83] problems in combinatorial optimization, was chosen are
that it is a shortest path problem to which the ant colony metaphor is easily adapted and
that it is a didactic problem (that is, it is very easy to understand and explanations of the
algorithm behavior are not obscured by too many technicalities).
A general definition of the traveling salesman problem is the following. Consider a set
N of nodes, representing cities, and a set E of arcs fully connecting the nodes N . Let
d_ij be the length of the arc (i, j) ∈ E, that is, the distance between cities i and j, with
i, j ∈ N. The TSP is the problem of finding a minimal length Hamiltonian circuit on the
graph G = (N, E), where a Hamiltonian circuit of graph G is a closed tour visiting once
and only once all the n = |N| nodes of G, and its length is given by the sum of the lengths
of all the arcs of which it is composed.7
In the following we will briefly overview the ACO algorithms that have been proposed
for the TSP, starting with Ant System.
7 Note that distances need not be symmetric: in an asymmetric TSP (ATSP), d_ij may differ from d_ji. Also, the graph need not be fully connected. If it is not, it suffices to add the missing arcs, giving them a very high length.
3.1.1.1 Ant System (AS)

Ant System (AS) was the first (1991) [33, 39] ACO algorithm. Its importance resides
mainly in being the prototype of a number of ant algorithms which have found many
interesting and successful applications.
In AS artificial ants build solutions (tours) of the TSP by moving on the problem graph
from one city to another. The algorithm executes t_max iterations, in the following indexed
by t. During each iteration m ants build a tour executing n steps in which a probabilistic
decision (state transition) rule is applied. In practice, when in node i, an ant chooses the
node j to move to, and the arc (i, j) is added to the tour under construction. This step is
repeated until the ant has completed its tour.
Three AS algorithms have been defined [19, 33, 39, 40], which differ by the way
pheromone trails are updated. These algorithms are called ant-density, ant-quantity, and
ant-cycle. In ant-density and ant-quantity ants deposit pheromone while building a solu-
tion,8 while in ant-cycle ants deposit pheromone after they have built a complete tour.
Preliminary experiments run on a set of benchmark problems [33, 39, 40] have shown
that ant-cycle’s performance was much better than that of the other two algorithms.
Consequently, research on AS was directed towards a better understanding of the charac-
teristics of ant-cycle, which is now known as Ant System, while the other two algorithms
were abandoned.

8 These two algorithms differ by the amount of pheromone ants deposit at each step: in ant-density ants deposit a constant amount of pheromone, while in ant-quantity they deposit an amount of pheromone inversely proportional to the length of the chosen arc.
As we said, in AS after ants have built their tours, each ant deposits pheromone on
pheromone trail variables associated to the visited arcs to make the visited arcs become
more desirable for future ants (that is, online delayed pheromone update is at work). Then
the ants die. In AS no daemon activities are performed, while the pheromone evaporation
procedure, which happens just before ants start to deposit pheromone, is interleaved with
the ants activity.
The amount of pheromone trail τij (t) associated to arc (i, j) is intended to represent
the learned desirability of choosing city j when in city i (which also corresponds to the
desirability that the arc (i, j) belongs to the tour built by an ant). The pheromone trail
information is changed during problem solution to reflect the experience acquired by ants
during problem solving. Ants deposit an amount of pheromone proportional to the quality
of the solutions they produced: the shorter the tour generated by an ant, the greater the
amount of pheromone it deposits on the arcs which it used to generate the tour. This choice
helps to direct search towards good solutions. The main role of pheromone evaporation is
to avoid stagnation, that is, the situation in which all ants end up doing the same tour.
The memory (or internal state) of each ant contains the already visited cities and is
called tabu list (in the following we will continue to use the term tabu list9 to indicate the
ant’s memory). The memory is used to define, for each ant k, the set of cities that an ant
located on city i still has to visit. By exploiting the memory therefore an ant k can build
feasible solutions by an implicit state-space graph generation (in the TSP this corresponds
to visiting a city exactly once). Also, memory allows the ant to cover the same path to
deposit online delayed pheromone on the visited arcs.

9 The term tabu list is used here to indicate a simple memory that contains the set of already visited cities, and has no relation with tabu search [55, 56].
The ant-decision table A_i = [a_{ij}(t)]_{|N_i|} of node i is obtained by the composition of the
local pheromone trail values with the local heuristic values as follows:
a_{ij}(t) = \frac{[\tau_{ij}(t)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{l \in N_i} [\tau_{il}(t)]^{\alpha}\,[\eta_{il}]^{\beta}} \quad \forall j \in N_i \qquad (2)

where τ_ij(t) is the amount of pheromone trail on arc (i, j) at time t, η_ij = 1/d_ij is the
heuristic value of moving from node i to node j, N_i is the set of neighbors of node i, and α
and β are two parameters that control the relative weight of pheromone trail and heuristic
value.
The probability with which an ant k chooses to go from city i to city j ∈ N_i^k while
building its tour at the t-th algorithm iteration is:
p_{ij}^{k}(t) = \frac{a_{ij}(t)}{\sum_{l \in N_i^k} a_{il}(t)} \qquad (3)

where N_i^k ⊆ N_i is the set of nodes in the neighborhood of node i that ant k has not
visited yet (nodes in N_i^k are selected from those in N_i by using the ant's private memory
M_k).
The role of the parameters α and β is the following. If α = 0, the closest cities are
more likely to be selected: this corresponds to a classical stochastic greedy algorithm (with
multiple starting points since ants are initially randomly distributed on the nodes). If on
the contrary β = 0, only pheromone amplification is at work: this method will lead to
the rapid emergence of a stagnation situation with the corresponding generation of tours
which, in general, are strongly sub-optimal [40]. A trade-off between heuristic value and
trail intensity therefore appears to be necessary.
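
As an illustration, here is a minimal sketch of the AS decision rule of Equations 2 and 3. The names and data layout are our own assumptions: tau and eta are dictionaries keyed by arcs (i, j), and unvisited stands for N_i^k. The defaults α = 1 and β = 5 are the values reported below for AS.

    import random

    def as_choose_next(i, unvisited, tau, eta, alpha=1.0, beta=5.0):
        # Ant-decision values a_ij (Eq. 2) restricted to ant k's unvisited
        # cities, then sampled with the random-proportional rule (Eq. 3).
        a = {j: (tau[(i, j)] ** alpha) * (eta[(i, j)] ** beta) for j in unvisited}
        r, cumulative = random.random() * sum(a.values()), 0.0
        for j, value in a.items():
            cumulative += value
            if r <= cumulative:
                return j
        return j  # guard against floating-point round-off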
After all ants have completed their tour, pheromone evaporation on all arcs is triggered,
and then each ant k deposits a quantity of pheromone Δτ_ij^k(t) on each arc that it has used:

\Delta\tau_{ij}^{k}(t) =
\begin{cases}
1/L^{k}(t) & \text{if } (i,j) \in T^{k}(t) \\
0 & \text{if } (i,j) \notin T^{k}(t)
\end{cases} \qquad (4)
where T^k(t) is the tour done by ant k at iteration t, and L^k(t) is its length. Note
that in the symmetric TSP arcs are considered to be bidirectional, so that arcs (i, j) and
(j, i) are always updated contemporaneously (in fact, they are the same arc). The case of
the asymmetric TSP is different: there arcs have a directionality, and the pheromone
trail level on the arcs (i, j) and (j, i) can be different. In this case, therefore,
when an ant moves from node i to node j only arc (i, j), and not (j, i), is updated.
It is clear from Equation 4 that the value Δτ_ij^k(t) depends on how well the ant has
performed: the shorter the tour done, the greater the amount of pheromone deposited.
In practice, the addition of new pheromone by ants and pheromone evaporation are
implemented by the following rule applied to all the arcs:

\tau_{ij}(t) \leftarrow (1 - \rho)\,\tau_{ij}(t) + \Delta\tau_{ij}(t) \qquad (5)

where \Delta\tau_{ij}(t) = \sum_{k=1}^{m} \Delta\tau_{ij}^{k}(t), m is the number of ants at each iteration (maintained
constant), and ρ ∈ (0, 1] is the pheromone trail decay coefficient. The initial amount of
pheromone τ_ij(0) is set to the same small positive constant value τ_0 on all arcs, the total
number of ants is set to m = n, and α, β and ρ are set to 1, 5 and 0.5 respectively; these
values were experimentally found to be good by Dorigo [33]. Dorigo et al. [39] also introduced
elitist ants, that is, a daemon action by which the arcs used by the ant that generated
the best tour from the beginning of the trial get extra pheromone.
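
A minimal sketch of this global update (Equations 4 and 5), assuming tours are given as lists of arcs and tau as a dictionary keyed by arcs; the function name is our own:

    def as_global_update(tau, tours, lengths, rho=0.5):
        # Offline AS update: evaporation on all arcs (Eq. 5), then each ant
        # deposits 1/L^k on the arcs of its tour T^k (Eq. 4).
        for arc in tau:
            tau[arc] *= (1.0 - rho)
        for tour, length in zip(tours, lengths):
            for (i, j) in tour:
                tau[(i, j)] += 1.0 / length
                tau[(j, i)] += 1.0 / length  # symmetric TSP: same arc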
Ant System was compared with other general purpose heuristics on some relatively
small TSP problems (these were problems ranging from 30 to 75 cities). The results [39, 40]

were at the same time very interesting and disappointing. AS was able to find and improve the
best solution found by a genetic algorithm for Oliver30 [101], a 30-city problem, and it
had a performance similar or better than that of some general purpose heuristics with
which it was compared. Unfortunately, for problems of growing dimensions AS never
reached the best known solutions within the allowed 3,000 iterations, although it exhibited
quick convergence to good solutions. These encouraging, although not exceptional, results
stimulated a number of researchers to further study the ACO approach to optimization.
These efforts have resulted in numerous successful applications, listed in the following
sections.

3.1.1.2 Other AS-like approaches

Stützle and Hoos (1997) [92, 93] have introduced Max-Min AS (MMAS), which is the
same as AS, but (i) pheromone trails are only updated offline by the daemon (the arcs
which were used by the best ant in the current iteration receive additional pheromone),
(ii) pheromone trail values are restricted to an interval [τmin , τmax ], and (iii) trails are
initialized to their maximum value τmax .
Putting explicit limits on the trail strength restricts the range of possible values for
the probability of choosing a specific arc according to Equation 3. This helps avoid
stagnation, which was one of the reasons why AS performed poorly when an elitist strategy,
like allowing only the best ant to update pheromone trails, was used. To avoid stagnation,
which may occur when some pheromone trails are close to τ_max while most others are close
to τ_min, Stützle and Hoos have added what they call a "trail smoothing mechanism"; that
is, pheromone trails are updated using a proportional mechanism: Δτ_ij ∝ (τ_max − τ_ij(t)).
In this way the relative difference between the trail strengths gets smaller, which obviously
favors the exploration of new paths. They found that, when applied to the TSP, MMAS
finds significantly better tours than AS, although comparable to those obtained with ACS
(ACS is an extension of AS discussed in Section 3.1.1.3).
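
A sketch of an MMAS-style offline update with trail limits follows; the parameter values are illustrative assumptions of ours, not those used by Stützle and Hoos:

    def mmas_update(tau, best_tour, best_length, rho=0.02, tau_min=0.1, tau_max=5.0):
        # Evaporate, reinforce only the best ant's arcs, then clamp each
        # trail into [tau_min, tau_max] so choice probabilities stay bounded.
        for arc in tau:
            tau[arc] = max(tau_min, (1.0 - rho) * tau[arc])
        for arc in best_tour:
            tau[arc] = min(tau_max, tau[arc] + 1.0 / best_length)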
Bullnheimer, Hartl and Strauss [12] proposed yet another modification of AS, called
ASrank. In ASrank, as was the case in MMAS, the only pheromone updates are per-
formed by the daemon, which implements the following activities: (i) the m ants are ranked
by tour length (L1 (t), L2 (t), . . . , Lm (t)) and the arcs which were visited by one of the first
σ − 1 ants in the ranking receive an amount of pheromone proportional to the visiting ant
rank, and (ii) the arcs used by the ant that generated the best tour from beginning of the
trial also receive additional pheromone (this is equivalent to AS’s elitist ants pheromone
updating). These are both forms of offline pheromone update. In their implementation
the contribution of the best tour so far was multiplied by σ. The dynamics of the amount
of pheromone τij (t) is given by:

\tau_{ij}(t) \leftarrow (1 - \rho)\,\tau_{ij}(t) + \sigma\,\Delta\tau_{ij}^{+}(t) + \Delta\tau_{ij}^{r}(t) \qquad (6)

where \Delta\tau_{ij}^{+}(t) = 1/L^{+}(t), L^{+} being the length of the best solution from the beginning of
the trial, and \Delta\tau_{ij}^{r}(t) = \sum_{\mu=1}^{\sigma-1} \Delta\tau_{ij}^{\mu}(t), with \Delta\tau_{ij}^{\mu}(t) = (\sigma - \mu)/L^{\mu}(t) if the ant
with rank \mu has used arc (i, j) in its tour and \Delta\tau_{ij}^{\mu}(t) = 0 otherwise. L^{\mu}(t) is the length
of the tour performed by the ant with rank \mu at iteration t. Equation 6 is applied to all arcs
and therefore implements both pheromone evaporation and offline pheromone updating. They
found that this new procedure significantly improves the quality of the results obtained
with AS.
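
A sketch of the ranked deposit of Equation 6; the tour representation, function name, and the default σ are our own illustrative assumptions:

    def as_rank_update(tau, ranked, best_tour, best_length, rho=0.5, sigma=6):
        # ranked: list of (tour, length) pairs sorted by increasing tour length.
        for arc in tau:
            tau[arc] *= (1.0 - rho)                  # evaporation (Eq. 6)
        for mu, (tour, length) in enumerate(ranked[:sigma - 1], start=1):
            for arc in tour:
                tau[arc] += (sigma - mu) / length    # rank-weighted deposit
        for arc in best_tour:
            tau[arc] += sigma / best_length          # best-so-far, weight sigma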

3.1.1.3 Ant colony system (ACS), ACS-3-opt, and Ant-Q

The Ant Colony System (ACS) algorithm was introduced by Dorigo and Gambardella
(1996) [36, 37, 49] to improve the performance of AS, which was able to find good solutions
within a reasonable time only for small problems. ACS is based on AS but presents some
important differences.
First, the daemon updates pheromone trails offline: at the end of an iteration of the
algorithm, once all the ants have built a solution, pheromone trail is added to the arcs
used by the ant that found the best tour from the beginning of the trial. In ACS-3-opt
the daemon first activates a local search procedure based on a variant of the 3-opt local
search procedure [71] to improve the solutions generated by the ants and then performs
offline pheromone trail update. The offline pheromone trail update rule is:

\tau_{ij}(t) \leftarrow (1 - \rho)\,\tau_{ij}(t) + \rho\,\Delta\tau_{ij}(t) \qquad (7)


where ρ ∈ (0, 1] is a parameter governing pheromone decay, Δτ_ij(t) = 1/L^+, and L^+
is the length of T^+, the best tour since the beginning of the trial. Equation 7 is applied
only to the arcs (i, j) belonging to T^+.
Second, ants use a different decision rule, called pseudo-random-proportional rule, in
which an ant k on city i chooses the city j ∈ N_i^k to move to as follows. Let A_i = [a_{ij}(t)]_{|N_i|}
be the ant-decision table:

a_{ij}(t) = \frac{\tau_{ij}(t)\,[\eta_{ij}]^{\beta}}{\sum_{l \in N_i} \tau_{il}(t)\,[\eta_{il}]^{\beta}} \quad \forall j \in N_i \qquad (8)

Let q be a random variable uniformly distributed over [0, 1], and q_0 ∈ [0, 1] be a
tunable parameter. The pseudo-random-proportional rule, used by ant k located in node
i to choose the next node j ∈ N_i^k, is the following: if q ≤ q_0 then

p_{ij}^{k}(t) = \begin{cases} 1 & \text{if } j = \arg\max a_{ij} \\ 0 & \text{otherwise} \end{cases} \qquad (9)
otherwise, when q > q_0

p_{ij}^{k}(t) = \frac{a_{ij}(t)}{\sum_{l \in N_i^k} a_{il}(t)} \qquad (10)

This decision rule has a double function: when q ≤ q0 the decision rule exploits the
knowledge available about the problem, that is, the heuristic knowledge about distances
between cities and the learned knowledge memorized in the form of pheromone trails, while
when q > q_0 it operates a biased exploration (equivalent to AS's Equation 3). Tuning q_0
allows the degree of exploration to be modulated, choosing whether to concentrate the
activity of the system on the best solutions or to explore the search space.
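
A sketch of the pseudo-random-proportional rule of Equations 9 and 10 follows; the function name and the default values of beta and q0 are illustrative assumptions, not taken from the original experiments:

    import random

    def acs_choose_next(i, unvisited, tau, eta, beta=2.0, q0=0.9):
        # With probability q0 exploit the best arc (Eq. 9); otherwise
        # sample with the random-proportional rule (Eq. 10).
        a = {j: tau[(i, j)] * (eta[(i, j)] ** beta) for j in unvisited}
        if random.random() <= q0:
            return max(a, key=a.get)       # exploitation
        cities = list(a)                   # biased exploration
        return random.choices(cities, weights=[a[j] for j in cities])[0]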
Third, in ACS ants perform only online step-by-step pheromone updates. These up-
dates are performed to favor the emergence of other solutions than the best so far. The
pheromone updates are performed by applying the rule:

\tau_{ij}(t) \leftarrow (1 - \varphi)\,\tau_{ij}(t) + \varphi\,\tau_0 \qquad (11)


where 0 < ϕ ≤ 1.

Equation 11 says that an ant moving from city i to city j ∈ N_i^k updates the pheromone
trail on arc (i, j). The value τ_0 is the same as the initial value of pheromone trails and it
was experimentally found that setting τ_0 = (n L_nn)^{-1}, where n is the number of cities and
L_nn is the length of a tour produced by the nearest neighbor heuristic [54], produces good
results. When an ant moves from city i to city j, the application of the local update rule
makes the corresponding pheromone trail τij diminish. The rationale for decreasing the
pheromone trail on the path an ant is using to build a solution is the following. Consider
an ant k2 starting in city 2 and moving to city 3, 4, and so on, and an ant k1 starting
in city 1 and choosing city 2 as the first city to move to. Then, there are good chances
that ant k1 will follow ant k2 with one step delay. The trail decreasing effect obtained
by applying the local update rule reduces the risk of such a situation. In other words,
ACS’s local update rule has the effect of making the visited arcs less and less attractive as
they are visited by ants, indirectly favoring the exploration of not yet visited arcs. As a
consequence, ants tend not to converge to a common path. This fact, which was observed
experimentally [37], is a desirable property given that if ants explore different paths then
there is a higher probability that one of them will find an improving solution than there is
in the case that they all converge to the same tour (which would make the use of m ants
pointless).
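
The local update itself is a one-liner; a minimal sketch (the function name and the default φ are our own, with τ_0 as defined above):

    def acs_local_update(tau, i, j, tau0, phi=0.1):
        # Eq. 11: pull the traversed arc's trail back toward tau0, making it
        # less attractive for the ants that follow.
        tau[(i, j)] = (1.0 - phi) * tau[(i, j)] + phi * tau0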
Last, ACS exploits a data structure called candidate list which provides additional
local heuristic information. A candidate list is a list of preferred cities to be visited from a
given city. In ACS when an ant is in city i, instead of examining all the unvisited neighbors
of i, it chooses the city to move to among those in the candidate list; other cities are
examined only if all the cities in the candidate list have already been visited. The candidate
list of a city contains cl cities ordered by increasing distance (cl is a parameter), and the list
is scanned sequentially and according to the ant tabu list to avoid already visited cities.
ACS was tested (see [36, 37] for detailed results) on standard problems, both symmetric
and asymmetric, of various sizes and compared with many other meta-heuristics. In all
cases its performance, both in terms of quality of the solutions generated and of CPU time
required to generate them, was the best one.
ACS-3-opt performance was compared to that of the genetic algorithm (with local opti-
mization) [46, 47] that won the First International Contest on Evolutionary Optimization
[2]. The two algorithms showed similar performance with the genetic algorithm behaving
slightly better on symmetric problems and ACS-3-opt on asymmetric ones.
To conclude, we mention that ACS was the direct successor of Ant-Q (1995) [35, 48],
an algorithm that tried to merge AS and Q-learning [99] properties. In fact, Ant-Q differs
from ACS only in the value τ0 used by ants to perform online step-by-step pheromone
updates. The idea was to update pheromone trails with a value which was a prediction of
the value of the next state. In Ant-Q, an ant k implements online step-by-step pheromone
updates by the following equation which replaces Equation 11:

\tau_{ij}(t) \leftarrow (1 - \varphi)\,\tau_{ij}(t) + \varphi\,\gamma \cdot \max_{l \in N_j^k} \tau_{jl} \qquad (12)

Unfortunately, it was later found that setting the complicated prediction term to a small
constant value, as is done in ACS, resulted in approximately the same performance.
Therefore, despite its good performance, Ant-Q was abandoned in favor of the equally
good but simpler ACS.
Also, other versions of ACS were studied. They differ from the one described above
because of (i) the way online step-by-step pheromone updates were implemented (in [37]
experiments were run disabling them, or setting the update term in Equation 11 to the
value τ_0 = 0), (ii) the way the decision rule was implemented (in [48] the pseudo-random-
proportional rule of Equations 9 and 10 was compared to the random-proportional rule
of Ant System, and to a pseudo-random rule which differs from the pseudo-random-
proportional rule because random choices are made uniformly at random), and (iii) the type
of solution used by the daemon to update the pheromone trails (in [37] the use of the best
solution found in the current iteration was compared with ACS's use of the best solution
found from the beginning of the trial). ACS as described above is the best performing of
all the algorithms obtained by combinations of the above-mentioned choices.

3.1.2 Quadratic assignment problem


The quadratic assignment problem (QAP) is the problem of assigning n facilities to n
locations so that the cost of the assignment, which is a function of the way facilities have
been assigned to locations, is minimized [67]. The QAP was, after the TSP, the first
problem to be attacked by an AS-like algorithm. This was a reasonable choice, since the
QAP is a generalization of the TSP.10 Maniezzo, Colorni and Dorigo (1994) [75] applied
exactly the same algorithm as AS using the QAP-specific min-max heuristic to compute
the η values used in Equation 3. The resulting algorithm, AS-QAP, was tested on a set
of standard problems and produced solutions of the same quality as meta-heuristic approaches
like simulated annealing and evolutionary computation. More recently, Maniezzo and
Colorni [74] and Maniezzo [73] developed two variants of AS-QAP and added to them a
local optimizer. The resulting algorithms were compared with some of the best heuristics
available for the QAP with very good results: their versions of AS-QAP gave the best
results on all the tested problems.
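For concreteness, in the standard (Koopmans–Beckmann) formulation of the QAP the cost of a complete assignment, represented as a permutation perm mapping facilities to locations, can be computed as sketched below; F and D denote the flow and distance matrices, and the code is only an illustration, not any of the cited implementations.

    def qap_cost(perm, F, D):
        # Cost of assigning facility i to location perm[i]: the sum over all
        # facility pairs of flow(i, j) times distance(perm[i], perm[j]).
        n = len(perm)
        return sum(F[i][j] * D[perm[i]][perm[j]]
                   for i in range(n) for j in range(n))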
Similar results were obtained by Stützle and Hoos (1998) with their MMAS-QAP
algorithm [94] (MMAS-QAP is a straightforward application of MMAS, see Section
3.1.1.2, to the QAP), and by Gambardella, Taillard and Dorigo (1997) with their HAS-
QAP.11 MMAS-QAP and HAS-QAP were compared with some of the best heuristics
available for the QAP on two classes of QAP problems: random and structured QAPs,
where structured QAPs are instances of problems taken from real world applications.
These ant algorithms were found to be the best performing on structured problems [52].

10 In fact, the TSP can be seen as the problem of assigning to each of n cities a different number chosen between 1 and n. The QAP, like the TSP, is NP-hard [89].
11 HAS-QAP is an ant algorithm which, although initially inspired by AS, does not strictly belong to the ACO meta-heuristic because of some peculiarities, such as ants which modify solutions rather than build them, and pheromone trails used to guide the modification of solutions rather than as an aid to direct their construction.

3.1.3 Job-shop scheduling problem


Colorni, Dorigo and Maniezzo (1994) [20] applied AS to the job-shop scheduling problem
(JSP), which is formulated as follows. Given a set M of machines and a set J of jobs, each
consisting of an ordered sequence of operations to be executed on these machines, the
job-shop scheduling problem is the problem of assigning operations to machines and time
intervals so that the maximum of the completion times of all operations is minimized and
no two jobs are processed at the same time on the same machine. JSP is NP -hard [53].
The basic algorithm they applied was exactly the same as AS, where the η heuristic value
was computed using the longest remaining processing time heuristic. Due to the different
nature of the constraints with respect to the TSP they also defined a new way of building
the ant’s tabu list. AS-JSP was applied to problems of dimensions up to 15 machines and
15 jobs, always finding solutions within 10% of the optimal value [20, 40]. These results,
although not exceptional, are encouraging and suggest that further work could lead to a
workable system. Also, a comparison with other approaches is necessary.
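A rough sketch of how such an η might look (our guess at a direct reading of the longest-remaining-processing-time idea; the names are invented and normalization is omitted):

    def eta_lrpt(candidate_ops, job_of, remaining_time):
        # Longest remaining processing time: score each candidate operation
        # by the total unprocessed time of the job it belongs to, so that
        # jobs with much work left are scheduled earlier.
        return {op: remaining_time[job_of[op]] for op in candidate_ops}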

3.1.4 Vehicle routing problem


There are many types of vehicle routing problems (VRPs). Bullnheimer, Hartl and Strauss
[11, 13, 15] applied an AS-like algorithm to the following instance. Let G = (N, A, d) be
a complete weighted directed graph, where N = {n0 , . . . , nn } is the set of nodes, A =
{(i, j) : i ≠ j} is the set of arcs, and each arc (i, j) has an associated weight dij ≥ 0
which represents the distance between ni and nj . Node n0 represents a depot, where M
vehicles are located, each one of capacity D, while the other nodes represent customer
locations. A demand di ≥ 0 and a service time δi ≥ 0 are associated with each customer
ni (d0 = 0 and δ0 = 0). The objective is to find minimum cost vehicle routes such that
(i) every customer is visited exactly once by exactly one vehicle, (ii) for every vehicle the
total demand does not exceed the vehicle capacity D, (iii) the total tour length of each
vehicle does not exceed a bound L, and (iv) every vehicle starts and ends its tour in the
depot.12 AS-VRP, the algorithm defined by Bullnheimer, Hartl and Strauss for the above
problem, is a direct extension of AS based on their ASrank algorithm discussed in Section
3.1.1.2. They used various standard heuristics for the VRP [18, 79], and added a simple
local optimizer based on the 2-opt heuristic [24]. They also adapted the way the tabu list
is built by taking into consideration the constraints on the maximum total tour length L of a
vehicle and its maximum capacity D. Comparisons on a set of standard problems showed
that AS-VRP performance is at least interesting: it outperforms simulated annealing and
neural networks, while it has a slightly lower performance than tabu search.

12 Also in this case it can easily be seen that the VRP is closely related to the TSP: a VRP consists of the solution of many TSPs with common start and end cities. As such, the VRP is an NP-hard problem.
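The constraint handling in AS-VRP mentioned above amounts to filtering an ant’s feasible moves; a minimal sketch (our own illustration, with invented names) could look like this:

    def feasible_customers(current, unvisited, load, length, demand, dist, D, L):
        # A customer is a feasible next move only if serving it violates
        # neither the vehicle capacity D nor the tour length bound L
        # (the return trip to the depot, node 0, must stay possible).
        return [j for j in unvisited
                if load + demand[j] <= D
                and length + dist[current][j] + dist[j][0] <= L]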
Gambardella, Taillard and Agazzi (1999) [51] have also attacked the VRP by means
of an ACO algorithm. They first reformulate the problem by adding to the city set M − 1
depots, where M is the number of vehicles. Using this formulation, the VRP problem be-
comes a TSP with additional constraints. Therefore they can define an algorithm, called
HAS-VRP, which is inspired by ACS: Each ant builds a complete tour without violating
vehicle capacity constraints (each vehicle has an associated maximum transportable
weight). A complete tour comprises many subtours connecting depots, and each subtour
corresponds to the tour associated with one of the vehicles. Pheromone trail updates are done
offline as in ACS. Also, a local optimization procedure based on edge exchanges is applied
by the daemon. Results obtained with this approach are competitive with those of the best
known algorithms and new upper bounds have been computed for well known problem
instances. They also study the vehicle routing problem with time windows (VRPTW),
which extends the VRP by introducing a time window [bi , ei ] within which a customer i
must be served. Therefore, a vehicle visiting customer i before time bi will have to wait.
In the literature the VRPTW is solved considering two objective functions: the first one
is to minimize the number of vehicles and the second one is to minimize the total travel
time. A solution with a lower number of vehicles is always preferred to a solution with
a higher number of vehicles but lower travel time. In order to optimize both objectives
simultaneously, a two-colony ant algorithm has been designed. The first colony tries to
minimize the number of vehicles, while the other one uses V vehicles, where V is the num-
ber of vehicles computed by the first colony, to minimize travel time. The two colonies
work using different sets of pheromone trails, but the best ants are allowed to update the
pheromone trails of the other colony. This approach has proved to be competitive
with the best known methods in the literature.
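The reformulation used by HAS-VRP can be pictured through the distance matrix it induces; the sketch below (our illustration, not the authors’ code) duplicates the depot row and column M − 1 times so that a single TSP tour encodes M vehicle routes:

    def add_depot_copies(dist, M):
        # dist: (n+1) x (n+1) matrix of the original VRP, depot at index 0.
        # Return a matrix with M depot copies placed first, so that a single
        # TSP tour through all nodes encodes M vehicle routes (each subtour
        # between consecutive depot copies is one vehicle's route).
        n1 = len(dist)
        idx = [0] * M + list(range(1, n1))
        # Note: how depot-to-depot distances are set (here taken from
        # dist[0][0]) is a design choice; an implementation may penalize
        # them to discourage empty routes.
        return [[dist[a][b] for b in idx] for a in idx]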

3.1.5 Shortest common supersequence problem


Given a set L of strings over an alphabet Σ, find a string of minimal length that is
a supersequence of each string in L, where a string S is a supersequence of a string
A if S can be obtained from A by inserting in A zero or more characters.13 This is
the problem known as shortest common supersequence problem (SCS) that Michel and
Middendorf (1998) [76] attacked by means of AS-SCS. AS-SCS differs from AS in that it
uses a lookahead function which takes into account the influence of the choice of the next
symbol to append at the next iteration. The value returned by the lookahead function
takes the place of the heuristic value η in the probabilistic decision rule (Equation 3).
Also, in AS-SCS the value returned by a simple heuristic called LM [7] is factored into the
pheromone trail term. Michel and Middendorf further improved their algorithm by the
use of an island model of computation (that is, different colonies of ants work on the same
problem concurrently using private pheromone trail distributions; every fixed number of
iterations they exchange the best solution found).

13 Consider for example the set L = {bcab, bccb, baab, acca}. The string baccab is a shortest supersequence. The shortest common supersequence (SCS) problem is NP-hard even for an alphabet of cardinality two [81].
AS-SCS-LM (i.e., AS-SCS with LM heuristic, lookahead, and island model of computa-
tion) was compared in [76] with the MM [45] and LM heuristics, as well as with a recently
proposed genetic algorithm specialized for the SCS problem. On the great majority of the
test problems AS-SCS-LM turned out to be the best-performing algorithm.
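The defining property is easy to state operationally; the small check below (ours, for illustration) confirms whether a candidate string S is a supersequence of every string in L, using the example set from the footnote above.

    def is_supersequence(S, A):
        # S is a supersequence of A if A can be read off S in order,
        # skipping (i.e., "inserting") zero or more characters of S.
        it = iter(S)
        return all(c in it for c in A)

    def is_common_supersequence(S, strings):
        return all(is_supersequence(S, A) for A in strings)

    # baccab is a supersequence of every string in L.
    L = ["bcab", "bccb", "baab", "acca"]
    assert is_common_supersequence("baccab", L)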

3.1.6 Graph coloring problem


Costa and Hertz (1997) [22] have proposed the AS-ATP algorithm for assignment type
problems (examples of assignment type problems are the QAP, the TSP, graph coloring,
set covering, and so on). The AS-ATP algorithm they define is basically the same as AS except that
ants need to make two choices: first they choose an item, then they choose a resource to
assign to the previously chosen item. These two choices are made by means of two
probabilistic rules that are functions of two distinct pheromone trails τ1 and τ2 and of two appropriate
heuristic values η1 and η2 . In fact, the use of two pheromone trails is the main novelty
introduced by AS-ATP. They exemplify their approach by means of an application to the
graph coloring problem (GCP). Given a graph G = (N, E), a q-coloring of G is a mapping
c : N → {1, . . . , q} such that c(i) ≠ c(j) if (i, j) ∈ E. The GCP is the problem of finding
a coloring of the graph G so that the number q of colors used is minimum. The algorithm
they propose, called ANTCOL, makes use of well-known graph coloring heuristics like re-
cursive large first (RLF) [70] and DSATUR [8]. Costa and Hertz tested ANTCOL on a set
of random graphs and compared it with some of the best available heuristics. Results have
shown that ANTCOL performance is comparable to that obtained by the other heuristics:
on 20 randomly generated graphs of 100 nodes with any two nodes connected with prob-
ability 0.5 the average number of colors used by ANTCOL was 15.05, whereas the best
known result [23, 43] is 14.95. More research will be necessary to establish whether the
proposed use of two pheromone trails can be a useful addition to ACO algorithms.
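As an aside, the q-coloring condition defined above translates directly into code; this small check (an illustration of the definition, not of ANTCOL) verifies a candidate coloring:

    def is_valid_coloring(coloring, edges):
        # coloring maps each node to a color in {1, ..., q}; a q-coloring
        # is valid iff adjacent nodes never share a color.
        return all(coloring[i] != coloring[j] for (i, j) in edges)

    def num_colors(coloring):
        return len(set(coloring.values()))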

3.1.7 Sequential ordering problem


The sequential ordering problem (SOP) [41] consists of finding a minimum weight Hamil-
tonian path on a directed graph with weights on the arcs and on the nodes, subject to
precedence constraints among nodes. It is very similar to an asymmetric TSP in which
the end city is not directly connected to the start city. The SOP, which is NP -hard,
models real-world problems like single-vehicle routing problems with pick-up and delivery
constraints, production planning, and transportation problems in flexible manufacturing
systems and is therefore an important problem from an applications point of view.
Gambardella and Dorigo (1997) [50] attacked the SOP by HAS-SOP, an extension of
ACS. In fact, HAS-SOP is the same as ACS except for the set of feasible nodes, which
is built taking into consideration the additional precedence constraints, and for the local
optimizer, which was a specifically designed variant of the well-known 3-opt procedure.
Results obtained with HAS-SOP are excellent. Tests were run on a great number of
standard problems15 and comparisons were done with the best available heuristic methods.
In all cases HAS-SOP was the best performing method in terms of solution quality and of
computing time. Also, on the set of problems tested it improved many of the best known
results.

15 In fact, on all the problems registered in the TSPLIB, a well-known repository for TSP-related benchmark problems, available on the Internet at http://www.iwr.uni-heidelberg.de/iwr/comopt/soft/TSPLIB95/TSPLIB.html.

3.2 Applications of ACO algorithms to dynamic combinatorial optimization


problems
Research on the applications of ACO algorithms to dynamic combinatorial optimization
problems has focused on communications networks. This is mainly due to the fact that
network optimization problems have characteristics like inherent information and com-
putation distribution, non-stationary stochastic dynamics, and asynchronous evolution of
the network status, which match well those of the ACO meta-heuristic. In particular, the
ACO approach has been applied to routing problems.
Routing is one of the most critical components of network control, and concerns the
network-wide distributed activity of building and using routing tables to direct data traffic.
The routing table of a generic node i is a data structure that tells data packets entering
node i which node, among the set Ni of neighbors of i, they should move to next. In
the applications presented in this section routing tables for data packets are obtained by
some functional transformation of ant-decision tables.
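A simple way to picture this transformation (our illustration; the concrete ACO routing algorithms discussed below differ in the exact mapping) is a per-node table of next-hop probabilities derived from the ant-decision table:

    def routing_table_from_decisions(a_i):
        # a_i[d][n] is the ant-decision value at node i for destination d
        # and neighbor n; normalize each row into next-hop probabilities
        # for data packets (values are assumed positive).
        table = {}
        for d, row in a_i.items():
            total = sum(row.values())
            table[d] = {n: v / total for n, v in row.items()}
        return table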
Let G = (N, A) be a directed weighted graph, where each node in the set N represents
a network node with processing/queuing and forwarding capabilities, and each oriented
arc in A is a transmission system (link) with an associated weight defined by its physical
properties. Network applications generate data flows from source to destination nodes.
For each node in the network, the local routing component uses the local routing table to
choose the best outgoing link to direct incoming data towards their destination nodes.
The generic routing problem can be informally stated as the problem of building routing
tables so that some measure of network performance is maximized.16

16 The choice of which particular measure of network performance to use depends on the type of network considered and on which aspects of the provided services are most interesting. For example, in a telephone network performance can be measured by the percentage of accepted calls and by the mean waiting time to set up or refuse a call, while in an Internet-like network performance can be scored by the amount of correctly delivered bits per time unit (throughput) and by the distribution of data packet delays.
Most of the ACO implementations for routing problems match well the general guide-
lines of the meta-heuristic shown in Figure 3. Ants are launched from each network
node towards heuristically selected destination nodes (launching follows some random or
problem-specific schedule). Ants, like data packets, travel across the network and build
paths from source to destination nodes by applying a probabilistic transition rule that
makes use of information maintained in pheromone trail variables associated to links and,
in some cases, of additional local information. Algorithm-specific heuristics and informa-
tion structures are used to score the discovered paths and to set the amount of pheromone
ants deposit.
A common characteristic of ACO algorithms for routing is that the role of the daemon
(line 6 of the ACO meta-heuristic of Figure 3) is much reduced: in the majority of the
implementations it simply does not perform any actions.
ACO implementations for communications networks are grouped in two classes, those
for connection-oriented and those for connection-less networks. In connection-oriented
networks all the packets of a same session follow a common path selected by a preliminary
setup phase. On the contrary, in connection-less, or datagram, networks data packets of a
same session can follow different paths. At each intermediate node along the route from the
source to the destination node a packet-specific forwarding decision is taken by the local
routing component. In both types of networks best-effort routing, that is routing without
any explicit network resource reservation, can be delivered. Moreover, in connection-
oriented networks an explicit reservation (software or hardware) of the resources can be
done. In this way, services requiring specific characteristics (in terms of bandwidth, delay,
and so on) can be delivered.17

17 For example, telephone calls need connection-oriented networks able to guarantee the necessary bandwidth during all the call time.

3.2.1 Connection-oriented networks routing


The work by Schoonderwoerd, Holland, Bruten and Rothkrantz (1996) [86, 87] has been,
to our knowledge, the first attempt to apply an ACO algorithm to a routing problem.
Their algorithm, called ant-based control (ABC), was applied to a model of the British
Telecom (BT) telephone network. The network is modeled by a graph G = (N, A), where
each node i has the same functionalities as a crossbar switch with limited connectivity
(capacity) and links have infinite capacity (that is, they can carry a potentially infinite
number of connections). Each node i has a set Ni of neighbors, and is characterized
by a total capacity Ci , and a spare capacity Si . Ci represents the maximum number of
connections node i can establish, while Si represents the percentage of capacity which is
still available for new connections. Each link (i, j) connecting node i to node j has an
associated vector of pheromone trail values τijd , d = 1, . . . , i − 1, i + 1, . . . , n. The value τijd
represents a measure of the desirability of choosing link (i, j) when the destination node
is d.
Because the algorithm does not make use of any additional local heuristics, only
pheromone values are used to define the ant-decision table values: aind (t) = τind (t). The
ant stochastic decision policy uses the pheromone values as follows: τind (t) gives the
probability that a given ant, the destination of which is node d, will be routed at time t
from node i to neighbor node n. An exploration mechanism is added to the ant’s decision
policy: with some low probability, ants can choose the neighbor to move to following a
uniformly random scheme over all the current neighbors. The ant internal state keeps
track only of the ant’s source node and launching time; no memory of the visited nodes,
which could be used to avoid cycles, is maintained. Routing tables for calls are obtained using ant-decision tables in a deter-
ministic way: at setup time a route from node s to node d is built by choosing sequentially
and deterministically, starting from node s, the neighbor node with the highest probability
value until node d is reached. Once the call is set up in this way, the spare capacity Si of
each node i on the selected route is decreased by a fixed amount. If at call set up time any
of the nodes along the route under construction has no spare capacity left, then the call
is rejected. When a call terminates, the corresponding reserved capacity of nodes on its
route is made available again for other calls. Ants are launched at regular temporal inter-
vals from all the nodes towards destination nodes selected in a uniform random way. Ants
deposit pheromone only online, step-by-step, on the links they visit (they do not deposit
pheromone after the path has been built, that is, they do not implement the procedure
of lines 30-34 of the ACO meta-heuristic of Figure 3). An ant k originated in node s and
arriving at time t in node j from node i adds an amount ∆τ k (t) of pheromone to the value
τjis (t) stored on link (j, i). The updated value τjis (t), which represents the desirability to go
from node j to destination node s via node i, will be used by ants moving in the opposite
direction of the updating ant. This updating strategy can be applied when the network
is (approximately) cost-symmetric, that is, when it is reasonable to use an estimate of
the cost from node i to node j as an estimate of the cost from node j to node i. In the
network model used by Schoonderwoerd et al. cost-symmetry is a direct consequence of
the assumptions made on the switches and transmission link structure. The pheromone
trail update formula is:
τjis (t) ← τjis (t) + ∆τ k (t) (13)
After the visited entry has been updated, the pheromone value of all the entries relative
to the destination s decays. Pheromone decay, as usual, corresponds to the evaporation
of real pheromone.18 In this case the decay factor is set to 1/(1 + ∆τ k (t)) so that it
operates a normalization of the pheromone values which continue therefore to be usable
as probabilities:
τins (t) ← τins (t) / (1 + ∆τ k (t)),   ∀n ∈ Ni                  (14)
The value ∆τ k (t) is a function of the ant’s age. Ants move over a control network isomorphic
to the real one. They grow older after each node hop and they are virtually delayed in
nodes as a function of the node spare capacity. By this simple mechanism, the amount
of pheromone deposited by an ant is made inversely proportional to the length and to
the degree of congestion of the selected path. Therefore, the overall effect of ants on
pheromone trail values is such that routes which are visited frequently and by “young”
ants will be favored when building paths to route new calls.
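Putting Equations 13 and 14 together, the per-hop update at a node can be sketched as follows (a minimal sketch with invented names; tau_j[s][i] stands for τjis):

    def abc_update(tau_j, s, i, delta):
        # Equation 13: reinforce the entry for the updating ant's source s
        # and the neighbor i it arrived from ...
        tau_j[s][i] += delta
        # ... then Equation 14: renormalize all entries for destination s
        # so they keep summing to one and remain usable as probabilities.
        for n in tau_j[s]:
            tau_j[s][n] /= 1.0 + delta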
In ABC no daemon actions are included into the algorithm. Each new call is accepted
or rejected on the basis of a setup packet that looks for a path with spare capacity by
probing the deterministically best path as indicated by the routing tables.
ABC has been tested on the above-described model of the British Telecom (BT) tele-
phone network (30 nodes) using a sequential discrete time simulator and compared, in
terms of percentage of accepted calls, to an agent-based algorithm developed by BT re-
searchers. Results were encouraging19: ABC always performed significantly better than
its competitor in a variety of different traffic situations.

18 Schoonderwoerd et al. developed ABC independently from previous ant colony optimization work. Therefore, they do not explicitly speak of pheromone evaporation, even if their probability re-normalization mechanism plays the same role.
19 In the following we use the word “encouraging” whenever interesting results were obtained but the algorithm was compared only on simple problems, or with no state-of-the-art algorithms, or under limited experimental conditions.
White, Pagurek and Oppacher (1998) [100] use an ACO algorithm for routing in
connection-oriented point-to-point and point-to-multipoint networks. The algorithm follows a
scheme very similar to AS (see Section 3.1.1.1): the ant-decision table has the same form
as AS’s (Eq. 2) and the decision rule (Eq. 3) is identical. The heuristic information η
is locally estimated by link costs. From the source node of each incoming connection,
a group of ants is launched to search for a path. At the beginning of the trip each ant
k sets a private path cost variable Ck to 0, and after each link crossing the path cost
is incremented by the link cost lij : Ck ← Ck + lij . Upon arrival at the destination, ants
move backward to their source node and at each node they use a simple additive rule to
deposit pheromone on the visited links. The amount of deposited pheromone is a function
of the whole cost Ck of the path, and only this online step-by-step updating is used to
update pheromone. When all the spooled ants arrive back at the source node, a simple
local daemon algorithm decides whether to allocate a path, based on the percentage of
ants that followed a same path. Moreover, during all the connection lifetime, the local
daemon launches and coordinates exploring ants to re-route the connection paths in case
of network congestion or failures. A genetic algorithm [57, 64] is used online to evolve the
parameters α and β of the transition rule formula, which determine the relative weight
of pheromone and link costs (because of this mechanism the algorithm is called ASGA,
ant system plus genetic algorithm). Some preliminary results were obtained testing the
algorithm on several networks and using several link cost functions. Results are promising:
the algorithm is able to compute shortest paths and the genetic adaptation of the rule
parameters considerably improves the algorithm’s performance.
Bonabeau, Henaux, Guérin, Snyers, Kuntz and Théraulaz (1998) [6] improved the ABC
algorithm by the introduction of a dynamic programming mechanism. They update the
pheromone trail values of all the links on an ant path not only with respect to the ant’s
origin node, but also with respect to all the intermediate nodes on the sub-path between
the origin and the ant current node.
Di Caro and Dorigo (1998) [31] are currently investigating the use of AntNet-FS to man-
age fair-share best-effort routing in high-speed connection-oriented networks. AntNet-FS
is a derivation of AntNet, an algorithm for best-effort routing in connection-less networks
the same authors developed. Therefore, we postpone the description of AntNet-FS to the
next sub-subsection, where AntNet and its extensions are described in detail.

3.2.2 Connection-less networks routing


Several ACO algorithms have been developed for routing on connection-less networks
taking inspiration both from AS (Sec. 3.1.1.1) in general and from ABC (Sec. 3.2.1) in
particular.
Di Caro and Dorigo (1997) [26, 27, 28, 29, 30] developed several versions of AntNet, an
ACO algorithm for distributed adaptive routing in best-effort connection-less (Internet-
like) data networks. The main differences between ABC and AntNet are that (i) in AntNet
real trip times experienced by ants (ants move over the same, real network as data pack-
ets) and local statistical models are used to evaluate path goodness, (ii) pheromone is
deposited once a complete path is built (this is a choice dictated by a more general as-
sumption of cost asymmetry made on the network), and (iii) the ant decision rule makes
use of local heuristic information η about the current traffic status.
In AntNet the ant-decision table Ai = [aind (t)]|Ni |,|N |−1 of node i is obtained by the com-
position of the local pheromone trail values with the local heuristic values as follows:

aind (t) = [ω τind (t) + (1 − ω) ηn ] / [ω + (1 − ω)(|Ni | − 1)]                  (15)

where Ni is the set of neighbors of node i, n ∈ Ni , d is the destination node, ηn is


a [0,1]-normalized heuristic value inversely proportional to the length of the local link
queue towards neighbor n, ω ∈ [0, 1] is a weighting factor and the denominator is a
normalization term. The decision rule of the ant located at node i and directed towards
destination node d (at time t) uses the entry aind (t) of the ant-decision table as follows:
aind (t) is simply the probability of choosing neighbor n. This probabilistic selection is
applied over all the not yet visited neighbors by the ant, or over all the neighbors in case
all the neighbors have already been visited by the ant. While building the path to the
destination, ants move using the same link queues as data. In this way ants experience
the same delays as data packets and the time Tsd elapsed while moving from the source
node s to the destination node d can be used as a measure of the path quality. The
overall “goodness” of a path is evaluated by an heuristic function of the trip time Tsd and
of local adaptive statistical models. In fact, paths need to be evaluated relative to the
network status because a trip time T judged of low quality under low congestion conditions
could be an excellent one under high traffic load. Once a path has been completed ants
deposit on the visited nodes an amount of pheromone proportional to the goodness of the
path they built. AntNet’s ants use only this online delayed way to update pheromone,
differently from ABC which uses only the online step-by-step strategy (lines 30-34 and
24-27 respectively of the ACO meta-heuristic of Figure 3). To this purpose, after reaching
their destination nodes, ants move back to their source nodes along the same path but
backward and using high priority queues, to allow a fast propagation of the collected
information (in AntNet the term “forward ant” is used for ants moving from source to
destination nodes, and the term “backward ant” for ants going back to their source nodes).
During the backward path, the pheromone value of each visited link is updated with a
rule similar to ABC’s. AntNet differs from ABC also in a number of minor details, the
most important of which are: (i) ants are launched from each node towards destinations
chosen to probabilistically match the traffic patterns, (ii) all the pheromone values on an
ant path are updated with respect to all the successor nodes of the (forward) path (as is
done also in [6]), (iii) cycles are removed online from the ant’s paths (a simple form of
backtracking), and (iv) data
packets are routed probabilistically using routing tables obtained by means of a simple
functional transformation of the ant-decision tables. AntNet was tested using a continuous
time discrete event network simulator, on a wide variety of different spatial and temporal
traffic conditions, and on several real and randomly generated network configurations
(ranging from 8 to 150 nodes). State-of-the-art static and adaptive routing algorithms
have been used for comparison. Results were excellent: AntNet showed strikingly superior
performance in terms of both throughput and packet delays. Moreover, it appears to
be very robust to the ant production rate, and its impact on network resources is almost
negligible.
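To make the ant-decision table of Equation 15 concrete, here is a rough sketch (ours; the names are invented) of its computation at a node for one destination d:

    def ant_decision_row(tau_i, eta, d, omega):
        # tau_i[d][n]: pheromone at this node for destination d via
        # neighbor n; eta[n]: [0,1]-normalized queue-based heuristic for
        # neighbor n. The denominator keeps each value in [0,1], as in
        # Equation 15.
        k = len(eta)  # |N_i|, the number of neighbors of this node
        denom = omega + (1.0 - omega) * (k - 1)
        return {n: (omega * tau_i[d][n] + (1.0 - omega) * eta[n]) / denom
                for n in eta}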
Di Caro and Dorigo (1998) [31, 32] recently developed an enhanced version of AntNet,
called AntNet-FA.21 AntNet-FA is the same as AntNet except for the following two aspects.
First, forward ants are substituted by so-called “flying ants”: while building a path from
source to destination node, flying ants make use of high priority queues and do not store
the trip times T . Second, each node maintains a simple local model of the local link queue
depletion process. By using this model, rough estimates of the missing forward ant trip
times are built. Backward ants read online these estimates and use them to evaluate the
path quality and consequently to compute the amount of pheromone to deposit. AntNet-
FA appears to be more reactive: the information collected by ants is more up-to-date and
is propagated faster than in the original AntNet. AntNet-FA has been observed to perform
much better than the original AntNet in best-effort connection-less networks [32].

21 In the original paper [32] the same algorithm was called AntNet-CO, because the algorithm was developed in the perspective of connection-oriented routing, while here and in the future the authors choose to use the name AntNet-FA, to emphasize the “flying ant” nature of forward ants.
Starting from AntNet-FA, Di Caro and Dorigo (1998) [31] are currently developing AntNet-
FS, a routing and flow control system to manage multi-path adaptive fair-share routing
in connection-oriented high-speed networks. In AntNet-FS, some ants have some extra
functionalities to support the search and allocation of multi-paths for each new incoming
user session. Forward setup ants fork to search for convenient multi-paths (virtual circuits)
to allocate the session. A daemon component local to the session end-points decides
whether to accept the virtual circuits discovered by the forward setup ants. Accepted virtual
circuits are allocated by the backward setup ants, reserving at the same time the session’s
bandwidth in a fair-share [72] fashion over the circuit nodes. The allocated bandwidth is
dynamically redistributed and adapted after the arrival of a new session or the departure
of an old one. The AntNet-FS approach looks very promising for high-speed networks
(like ATM), but needs more testing.
Subramanian, Druschel and Chen (1997) [95] proposed the regular ants algorithm,
which is essentially an application of Schoonderwoerd et al.’s ABC algorithm [86, 87] to
packet-switched networks, where the only difference is the use of link costs instead of ant
age. The way their ants use link costs requires the network to be (approximately) cost-
symmetric. They also propose uniform ants, that is, ants without a precise destination,
which live for a fixed amount of time in the network and explore it by using a uniform
probability scheme over the node neighbors. Uniform ants do not use the autocatalytic
mechanism that characterizes all ACO algorithms and therefore do not belong to the ACO
meta-heuristic.
Heusse, Snyers, Guérin and Kuntz (1998) [62] developed a new algorithm for general
cost-asymmetric networks, called Co-operative Asymmetric Forward (CAF). In CAF, each
data packet, after going from node i to node j, releases on node j the information cij about
the sum of the experienced waiting and crossing times from node i. This information is
used as an estimate of the time distance to go from i to j, and is read by the ants traveling
in the opposite direction to perform online step-by-step pheromone updating (no online
delayed pheromone updating is used in this case). The algorithm’s authors tested CAF
under some static and dynamic conditions, using the average number of packets waiting in
the queues and the average packet delay as performance measures. They compared CAF
to an algorithm very similar to an earlier version of AntNet. Results were encouraging,
under all the test situations CAF performed better than its competitors.
Van der Put and Rothkrantz (1998) [97, 98] designed ABC-backward, an extension of
the ABC algorithm applicable to cost-asymmetric networks. They use the same forward-
backward ant mechanism as in AntNet: Forward ants, while moving from the source to
the destination node, collect information on the status of the network, and backward
ants use this information to update the pheromone trails of the visited links during their
journey back from the destination to the source node. In ABC-backward, backward ants
update the pheromone trails using an updating formula identical to that used in ABC,
except for the fact that the ants’ age is replaced by the trip times experienced by the
ants in their forward journey. Van der Put and Rothkrantz have shown experimentally
that ABC-backward has a better performance than ABC on both cost-symmetric, because
backward ants can avoid depositing pheromone on cycles, and cost-asymmetric networks.
They apply ABC-backward to a fax distribution problem proposed by the Dutch largest
telephone company (KPN Telecom).

3.3 Parallel implementations


By their very nature, ACO algorithms lend themselves to parallelization in the data or
population domains. In particular, many parallel models used in other population-based
algorithms can be easily adapted to the ACO structure (e.g., migration and diffusion
models adopted in the field of parallel genetic algorithms, see for example reviews in
[16, 38]).
Early experiments with parallel versions of AS for the TSP on the Connection Machine
CM-2 [63] adopted the approach of attributing a single processing unit to each ant [4].
Experimental results showed that communication overhead can be a major problem with
this approach on fine grained parallel machines, since ants end up spending most of their
time communicating to other ants the modifications they made to pheromone trails. In fact,
the algorithm’s behavior was not impressive and scaled up very badly when increasing the
problem dimensions. Better results were obtained on a coarse grained parallel network of
16 transputers [4, 34]. The idea here was to divide the colony in p subcolonies, where p is
the number of available processors. Each subcolony acts as a complete colony and imple-
ments therefore a standard AS algorithm. After each subcolony has completed an iteration
of the algorithm, a hierarchical broadcast communication process collects the information
about the tours of all the ants in all the subcolonies and then broadcasts this information
to all the p processors so that a concurrent update of the pheromone trails can be done.
In this case the speed-up was nearly linear when increasing the number of processors, and
this behavior did not change significantly for increasing problem dimensions.
More recently, Bullnheimer, Kotsis and Strauss [14] proposed two coarse grained par-
allel versions of AS. The first one, called Synchronous Parallel Implementation (SPI), is
basically the same as the one implemented by Bolondi and Bondanza [4], while the second
one, called Partially Asynchronous Parallel Implementation (PAPI), exchanges pheromone
information among subcolonies every fixed number of iterations done by each subcolony.
The two algorithms have been evaluated by simulation. The findings show that the
reduced communication due to the less frequent exchange of pheromone trail information
among subcolonies gives the PAPI approach better performance in terms of running time
and speedup. More experimentation is necessary to compare the quality
of the results produced by the SPI and the PAPI implementations.
Krüger, Merkle and Middendorf [68] investigated which (pheromone trail) information
should be exchanged between the m subcolonies and how this information should be
used to update the subcolony’s trail information. They compared an exchange of (i) the
global best solution (every subcolony uses the global best solution to choose where to
add pheromone trail), (ii) the local best solutions (every subcolony receives the local best
solution from all other subcolonies and updates pheromone trail on the corresponding
arcs), and (iii) the total trail information (every colony computes the average over the
trail information of all colonies); that is, if τ l = [τijl ] is the trail information of subcolony
l, 1 ≤ l ≤ m, then every colony l sends τ l to the other colonies and afterwards computes
τijl = Σ_{h=1}^{m} τijh , 1 ≤ i, j ≤ n. The results indicate that methods (i) and (ii) are
faster and give better solutions than method (iii), but further investigations are necessary.
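For concreteness, the total-trail exchange of method (iii) amounts to an element-wise combination of the exchanged matrices; a minimal sketch (ours, for illustration only):

    def exchange_total_trail(all_tau):
        # Strategy (iii): every subcolony replaces its trail matrix with
        # the element-wise sum over the matrices received from all m
        # subcolonies (strategies (i) and (ii) instead exchange solutions).
        n = len(all_tau[0])
        return [[sum(tau[i][j] for tau in all_tau) for j in range(n)]
                for i in range(n)]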
Last, Stützle [91] presents computational results for the execution of parallel indepen-
dent runs on up to ten processors of his MMAS algorithm [92, 93]. The execution of
parallel independent runs is the easiest way to obtain a parallel algorithm and, obviously,
it is a reasonable approach only if the underlying algorithm, as is the case with ACO
algorithms, is randomized. Stützle’s results show that the performance of MMAS grows
with the number of processors.

4 Related work
ACO algorithms show similarities with some optimization, learning and simulation ap-
proaches like heuristic graph search, Monte Carlo simulation, neural networks, and evolu-
tionary computation. These similarities are briefly discussed in the following.
Heuristic graph search. In ACO algorithms each ant performs an heuristic graph
search in the space of the components of a solution: ants take biased probabilistic decisions
to choose the next component to move to, where the bias is given by an heuristic evaluation
function which favors components which are perceived as more promising. It is interesting
to note that this is different from what happens, for example, in stochastic hillclimbers [78]
or in simulated annealing [66], where (i) an acceptance criterion is defined and only those
randomly generated moves which satisfy the criteria are executed, and (ii) the search is
usually performed in the space of the solutions.
Monte Carlo simulation. ACO algorithms can be interpreted as parallel replicated
Monte Carlo systems [90]. Monte Carlo systems [84] are general stochastic simulation
systems, that is, techniques performing repeated sampling experiments on the model of
the system under consideration by making use of a stochastic component in the state
sampling and/or transition rules. Experimental results are used to update some statistical
knowledge about the problem, as well as the estimate of the variables the researcher is
interested in. In turn, this knowledge can be also iteratively used to reduce the variance
in the estimation of the desired variables, directing the simulation process towards the
most interesting state space regions. Analogously, in ACO algorithms the ants sample
the problem’s solution space by repeatedly applying a stochastic decision policy until a
feasible solution of the considered problem is built. The sampling is realized concurrently
by a collection of differently instantiated replicas of the same ant type. Each ant “experi-
ment” adaptively modifies the local statistical knowledge of the problem structure
(i.e., the pheromone trails). The recursive transmission of such knowledge by means of
stigmergy reduces the variance of the whole search process: the most interesting
transitions explored so far probabilistically bias future search, preventing ants
from wasting resources in unpromising regions of the search space.
Neural networks. Ant colonies, being composed of numerous concurrently and lo-
cally interacting units, can be seen as “connectionist” systems [42], the most famous
examples of which are neural networks [3, 61, 85]. From a structural point of view, the
parallel between the ACO meta-heuristic and a generic neural network is obtained by
putting each state i visited by ants in correspondence with a neuron i, and the problem-
specific neighborhood structure of state i in correspondence with the set of synaptic-like
links exiting neuron i. The ants themselves can be seen as input signals concurrently
propagating through the neural network and modifying the strength of the synaptic-like
inter-neuron connections. Signals (ants) are locally propagated by means of a stochastic
transfer function and the more a synapse is used, the more the connection between its
two end-neurons is reinforced. The ACO-synaptic learning rule can be interpreted as an
a posteriori rule: signals related to good examples, that is, ants which discovered a good
quality solution, reinforce the synaptic connections they traverse more than signals related
to poor examples. It is interesting to note that the ACO-neural network algorithm does
not correspond to any existing neural network model.
The ACO-neural network is also reminiscent of networks solving reinforcement learning
problems [96]. In reinforcement learning the only feedback available to the learner is a
numeric signal (the reinforcement) which scores the result of actions. This is also the
case in the ACO meta-heuristic: The signals (ants) fed into the network can be seen as
input examples with an associated approximate score measure. The strength of pheromone
updates and the level of stochasticity in signal propagation play the role of a learning rate,
controlling the balance between exploration and exploitation.
Finally, it is worth referring to the work of Chen [17], who proposed a neural
network approach to the TSP which bears important similarities with the ACO approach.
As in ACO algorithms, Chen builds a tour in an incremental way, according to synaptic
strengths. The approach also makes use of candidate lists and 2-opt local optimization. The strengths
of the synapses of the current tour and of all previous tours are updated according to
a Boltzmann-like rule and a learning rate playing the role of an evaporation coefficient.
Although there are some differences, the common features are, in this case, striking.
Evolutionary computation. There are some general similarities between the ACO
meta-heuristic and evolutionary computation (EC) [44]. Both approaches use a population
of individuals which represent problem solutions, and in both approaches the knowledge
about the problem collected by the population is used to stochastically generate a new
population of individuals. A main difference is that in EC algorithms all the knowledge
about the problem is contained in the current population, while in ACO a memory of past
performance is maintained under the form of pheromone trails.
An EC algorithm which is very similar to ACO algorithms in general and with AS
in particular is Baluja and Caruana’s Population Based Incremental Learning (PBIL)
[1]. PBIL maintains a vector of real numbers, the generating vector, which plays a role
similar to that of the population in genetic algorithms [64, 57]. Starting from this vector, a
population of binary strings is randomly generated: each string in the population will have
the i-th bit set to 1 with a probability which is a function of the i-th value in the generating
vector. Once a population of solutions is created, the generated solutions are evaluated
and this evaluation is used to increase (or decrease) the probabilities of each separate
component in the generating vector so that good (bad) solutions in the future generations
will be produced with higher (lower) probability. It is clear that in ACO algorithms the
pheromone trail values play a role similar to Baluja and Caruana’s generating vector, and
pheromone updating has the same goal as updating the probabilities in the generating
vector. A main difference between ACO algorithms and PBIL consists in the fact that
in PBIL all the probability vector components are evaluated independently, so that the
approach works well only when the solution is separable into its components.
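One generation of the basic PBIL scheme just described can be sketched as follows (our illustration; the learning-rate name lr and the assumption of maximization are ours):

    import random

    def pbil_generation(p, fitness, pop_size, lr):
        # Sample a population of bit strings from the generating vector p.
        pop = [[1 if random.random() < pi else 0 for pi in p]
               for _ in range(pop_size)]
        # Pull each component of p toward the corresponding bit of the
        # best sampled string (cf. pheromone updating in ACO).
        best = max(pop, key=fitness)
        return [pi + lr * (b - pi) for pi, b in zip(p, best)]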
The (1, λ) evolution strategy is another EC algorithm which is related to ACO algo-
rithms, and in particular to ACS. In fact, in the (1, λ) evolution strategy the following
steps are iteratively repeated: (i) a population of λ solutions (ants) is initially generated,
then (ii) the best individual of the population is saved for the next generation, while all
the other solutions are discarded, and (iii) starting from the best individual, λ − 1 new
solutions are stochastically generated by mutation, and finally (iv) the process is iterated
going back to step (ii). The similarity with ACS is striking.
Stochastic learning automata. This is one of the oldest approaches to machine
learning (see [77] for a review). An automaton is defined by a set of possible actions and
a vector of associated probabilities, a continuous set of inputs and a learning algorithm
to learn input-output associations. Automata are connected in a feedback configuration
with the environment, and a set of penalty signals from the environment to the actions is
defined. The similarity of stochastic learning automata and ACO approaches can be made
clear as follows. The set of pheromone trails available on each arc/link is seen as a set of
concurrent stochastic learning automata. Ants play the role of the environment signals,
while the pheromone update rule is the automaton learning rule. The main difference lies
in the fact that in ACO the “environment signals” (i.e., the ants) are stochastically biased,
by means of their probabilistic transition rule, to direct the learning process towards the
most interesting regions of the search space. That is, the whole environment plays a key,
active role to learn good state-action pairs.

5 Discussion
The ACO meta-heuristic was defined a posteriori, that is, it is the result of a synthesis
effort carried out on a set of algorithms inspired by a common natural process. Such
a synthesis can be very useful because (i) it is a first attempt to characterize this new
class of algorithms, and (ii) it can be used as a reference to design new instances of ACO
algorithms. On the other hand, the a posteriori character of the synthesis determines a
great variety in the way some aspects of the meta-heuristic are implemented, as discussed
in the following.
Role of the local heuristic. Most of the ACO algorithms presented combine
pheromone trails with local heuristic values to obtain ant-decision tables. Exceptions
are Schoonderwoerd et al.’s ABC algorithm and all the derived algorithms (ABC-smart
ants, ABC-backward, regular ants, and CAF), in which ant-decision tables are obtained
using only pheromone trail values. Current wisdom indicates that the use of a heuristic
value, whenever possible, improves ACO performance considerably. The nature of the
heuristic information differs between static and dynamic problems. In all static problems
attacked by ACO a simple heuristic value directly obtainable from the problem definition
was used. On the contrary, in dynamic problems the heuristic value must be estimated by
local statistical sampling of the dynamic system.
Step-by-step versus delayed online solution evaluation. In dynamic combi-
natorial optimization problems some of the proposed ACO algorithms use step-by-step
online solution evaluation: ABC, ABC-smart ants, and regular ants, which take advantage of
the cost-symmetric nature of the network model, and CAF, which, although applied to a
cost-asymmetric network, can apply step-by-step solution evaluation by exploiting infor-
mation deposited on nodes by data packet traveling in the opposite direction of ants. The
other algorithms applied to cost-asymmetric networks, AntNet and ABC-backward, use
delayed online solution evaluation. For the static optimization problems considered the
use of step-by-step online solution evaluation would be misleading because problem con-
straints make the quality of partial solutions not a good estimate of the quality of complete
solutions.
Pheromone trail directionality. The pheromone trail can have directional proper-
ties. This is true for all the dynamic optimization problems considered. By contrast, in all the
applications to static problems (Section 3.1) the pheromone trail is not directional: an ant using arc
(i, j) will see the same pheromone trail values independent of the node it is coming from.
(It is the decision policy that can take into consideration the node, or the series of nodes,
the ant is coming from.)
Implicit solution evaluation. One of the interesting aspects of real ants’ shortest-
path-finding behavior is that they exploit implicit solution evaluation: if an ant takes a
shorter path it will arrive at the destination before any other ant that took a longer path.
Therefore, shorter paths will receive pheromone earlier and they start to attract new ants
before longer paths. This implicit solution evaluation property is exploited by the ACO
algorithms applied to routing, and not by those applied to static optimization problems.
The reason for this is that implicit solution evaluation is obtained for free whenever the
speed with which ants move on the problem representation is inversely proportional to the
cost of each state transition during solution construction. While this is the most natural
way to implement artificial ants for network applications, it is not an efficient choice for
the considered static problems. In fact, in this case it would be necessary to implement
an extra algorithm component to manage each ant’s speed, which would require extra
computation resources without any guarantee of improved performance.

6 Conclusions
In this paper we have introduced the ant colony optimization (ACO) meta-heuristic and we
have given an overview of ACO algorithms. ACO is a novel and very promising research
field situated at the crossing between artificial life and operations research. The ant
colony optimization meta-heuristic belongs to the relatively new wave of stochastic meta-
heuristics like evolutionary computation [44], simulated annealing [66], tabu search [55, 56],
neural computation [3, 61, 85], and so on, which are built around some basic principles
taken from the observation of a particular natural phenomenon. As is very common in the
practical usage of these heuristics, ACO algorithms often take some distance from their
inspiring natural metaphor. Often ACO algorithms are enriched with capabilities that do
not find a counterpart in real ants, like local search and global-knowledge-based actions,
so that they can compete with more application-specific approaches. ACO algorithms
so enriched are very competitive and in some applications they have reached world-class
performance. For example, on structured quadratic assignment problems AS-QAP, HAS-
QAP, and MMAS-QAP are currently the best available heuristics. Other very successful
applications are those to the sequential ordering problem, for which HAS-SOP is by far
the best available heuristic, and to data network routing, where AntNet proved to be
superior to a whole set of state-of-the-art algorithms.
Within the artificial life field, ant algorithms represent one of the most successful
applications of swarm intelligence.22 One of the most characteristic aspects of swarm
intelligence algorithms, shared by ACO algorithms, is the use of the stigmergetic model of
communication. We have seen that this form of indirect distributed communication plays
an important role in making ACO algorithms successful. There are however examples of
applications of stigmergy based on social insect behaviors other than ants’ foraging behavior.
For example, the stigmergy-mediated allocation of work in ant colonies has inspired
models of task allocation in a distributed mail retrieval system; dead body aggregation
and brood sorting, again in ant colonies, have inspired a data clustering algorithm; and
models of collective transport by ants have inspired transport control strategies for groups
of robots.23

22 Swarm intelligence can be defined as the field which covers “any attempt to design algorithms or distributed problem-solving devices inspired by the collective behavior of social insect colonies and other animal societies” ([5], page 7).
23 These and other examples of the possible applications of stigmergetic systems are discussed in detail by Bonabeau, Dorigo and Théraulaz in [5].
In conclusion, we hope this paper has achieved its goal: To convince the reader that
ACO, and more generally the stigmergetic model of communication, are worth further
research.
7 Acknowledgments
We are grateful to Nick Bradshaw, Bernd Bullnheimer, Martin Heusse, Owen Holland,
Vittorio Maniezzo, Martin Middendorf, Ruud Schoonderwoerd, and Dominique Snyers for
critical reading of a draft version of this article. We also wish to thank Eric Bonabeau and
the two referees for their valuable comments. Marco Dorigo acknowledges support from
the Belgian FNRS, of which he is a Research Associate. This work was supported by a
Madame Curie Fellowship awarded to Gianni Di Caro (CEC-TMR Contract N. ERBFM-
BICT 961153).

References
[1] S. Baluja and R. Caruana. Removing the genetics from the standard genetic al-
gorithm. In A. Prieditis and S. Russell, editors, Proceedings of the Twelfth Inter-
national Conference on Machine Learning, ML-95, pages 38–46. Palo Alto, CA:
Morgan Kaufmann, 1995.
[2] H. Bersini, M. Dorigo, S. Langerman, G. Seront, and L. M. Gambardella. Results of
the first international contest on evolutionary optimisation (1st ICEO). In Proceed-
ings of IEEE International Conference on Evolutionary Computation, IEEE-EC 96,
pages 611–615. IEEE Press, 1996.
[3] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press,
1995.
[4] M. Bolondi and M. Bondanza. Parallelizzazione di un algoritmo per la risoluzione
del problema del commesso viaggiatore. Master’s thesis, Dipartimento di Elettronica
e Informazione, Politecnico di Milano, Italy, 1993.
[5] E. Bonabeau, M. Dorigo, and G. Théraulaz. From Natural to Artificial Swarm
Intelligence. Oxford University Press, 1999.
[6] E. Bonabeau, F. Henaux, S. Guérin, D. Snyers, P. Kuntz, and G. Théraulaz. Routing
in telecommunication networks with “smart” ant-like agents. In Proceedings of
IATA’98, Second Int. Workshop on Intelligent Agents for Telecommunication
Applications. Lecture Notes in AI vol. 1437, Springer Verlag, 1998.
[7] J. Branke, M. Middendorf, and F. Schneider. Improved heuristics and a genetic
algorithm for finding short supersequences. OR-Spektrum, 20:39–46, 1998.
[8] D. Brelaz. New methods to color vertices of a graph. Communications of the ACM,
22:251–256, 1979.
[9] A. M. Bruckstein. Why the ant trails look so straight and nice. The Mathematical
Intelligencer, 15(2):59–62, 1993.
[10] A. M. Bruckstein, C. L. Mallows, and I. A. Wagner. Probabilistic pursuits on the
grid. AMM: The American Mathematical Monthly, 104, 1997.
[11] B. Bullnheimer, R. F. Hartl, and C. Strauss. An improved ant system algorithm for
the vehicle routing problem. Technical Report POM-10/97, Institute of Management
Science, University of Vienna, 1997. Accepted for publication in the Annals of
Operations Research.
[12] B. Bullnheimer, R. F. Hartl, and C. Strauss. A new rank-based version of the ant
system: a computational study. Technical Report POM-03/97, Institute of Manage-
ment Science, University of Vienna, 1997. Accepted for publication in the Central
European Journal for Operations Research and Economics.
[13] B. Bullnheimer, R. F. Hartl, and C. Strauss. Applying the ant system to the vehicle
routing problem. In S. Voß, S. Martello, I. H. Osman, and C. Roucairol, editors,
Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization,
pages 109–120. Kluwer Academics, 1998.
[14] B. Bullnheimer, G. Kotsis, and C. Strauss. Parallelization strategies for the ant
system. Technical Report POM 9-97, Institute of Management Science, University
of Vienna, Austria, 1997. To appear in: Kluwer Series of Applied Optimization
(selected papers of HPSNO’97), A. Murli, P. Pardalos and G. Toraldo, editors.
[15] B. Bullnheimer and C. Strauss. Tourenplanung mit dem Ant System. Technical
Report 6, Institut für Betriebswirtschaftslehre, Universität Wien, 1996.
[16] R. Campanini, G. Di Caro, M. Villani, I. D’Antone, and G. Giusti. Parallel ar-
chitectures and intrinsically parallel algorithms: Genetic algorithms. International
Journal of Modern Physics C, 5(1):95–112, 1994.
[17] K. Chen. A simple learning algorithm for the traveling salesman problem. Physical
Review E, 55, 1997.
[18] G. Clarke and J. W. Wright. Scheduling of vehicles from a central depot to a number
of delivery points. Operations Research, 12:568–581, 1964.
[19] A. Colorni, M. Dorigo, and V. Maniezzo. Distributed optimization by ant colonies.
In Proceedings of the First European Conference on Artificial Life, pages 134–142.
Elsevier, 1992.
[20] A. Colorni, M. Dorigo, V. Maniezzo, and M. Trubian. Ant system for job-shop
scheduling. Belgian Journal of Operations Research, Statistics and Computer Science
(JORBEL), 34:39–53, 1994.
[21] D. Corne, M. Dorigo, and F. Glover, editors. New Ideas in Optimization. McGraw-
Hill, 1999.
[22] D. Costa and A. Hertz. Ants can colour graphs. Journal of the Operational Research
Society, 48:295–305, 1997.
[23] D. Costa, A. Hertz, and O. Dubuis. Embedding of a sequential algorithm within
an evolutionary algorithm for coloring problems in graphs. Journal of Heuristics,
1:105–128, 1995.
[24] G.A. Croes. A method for solving traveling salesman problems. Operations Research,
6:791–812, 1958.
[25] J.-L. Deneubourg, S. Aron, S. Goss, and J.-M. Pasteels. The self-organizing ex-
ploratory pattern of the Argentine ant. Journal of Insect Behavior, 3:159–168, 1990.
[26] G. Di Caro and M. Dorigo. AntNet: A mobile agents approach to adaptive routing.
Technical Report 97-12, IRIDIA, Université Libre de Bruxelles, 1997.
[27] G. Di Caro and M. Dorigo. Mobile agents for adaptive routing. In Proceedings of
the 31st International Conference on System Sciences (HICSS-31), volume 7, pages
74–83. IEEE Computer Society Press, 1998.
[28] G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for commu-
nications networks. Journal of Artificial Intelligence Research (JAIR), 9:317–365,
December 1998.
[29] G. Di Caro and M. Dorigo. Ant colonies for adaptive routing in packet-switched
communications networks. In A. E. Eiben, T. Back, M. Schoenauer, and H.-P.
Schwefel, editors, Proceedings of PPSN-V, Fifth International Conference on Parallel
Problem Solving from Nature, pages 673–682. Springer-Verlag, 1998.
[30] G. Di Caro and M. Dorigo. An adaptive multi-agent routing algorithm inspired by
ants behavior. In Proceedings of PART98 - 5th Annual Australasian Conference on
Parallel and Real-Time Systems, pages 261–272. Springer-Verlag, 1998.
[31] G. Di Caro and M. Dorigo. Extending AntNet for best-effort Quality-of-Service
routing. Unpublished presentation at ANTS’98 - From Ant Colonies to Artificial
Ants: First International Workshop on Ant Colony Optimization,
http://iridia.ulb.ac.be/ants98/ants98.html, October 15-16, 1998.
[32] G. Di Caro and M. Dorigo. Two ant colony algorithms for best-effort routing in
datagram networks. In Proceedings of the Tenth IASTED International Conference
on Parallel and Distributed Computing and Systems (PDCS’98), pages 541–546.
IASTED/ACTA Press, 1998.
[33] M. Dorigo. Optimization, Learning and Natural Algorithms (in Italian). PhD thesis,
Dipartimento di Elettronica e Informazione, Politecnico di Milano, IT, 1992.
[34] M. Dorigo. Parallel ant system: An experimental study. Unpublished manuscript,
1993.
[35] M. Dorigo and L. M. Gambardella. A study of some properties of Ant-Q. In Pro-
ceedings of PPSN-IV, Fourth International Conference on Parallel Problem Solving
From Nature, pages 656–665. Berlin: Springer-Verlag, 1996.
[36] M. Dorigo and L. M. Gambardella. Ant colonies for the traveling salesman problem.
BioSystems, 43:73–81, 1997.
[37] M. Dorigo and L. M. Gambardella. Ant colony system: A cooperative learning
approach to the traveling salesman problem. IEEE Transactions on Evolutionary
Computation, 1(1):53–66, 1997.
[38] M. Dorigo and V. Maniezzo. Parallel genetic algorithms: Introduction and overview
of the current research. In J. Stender, editor, Parallel Genetic Algorithms: Theory
& Applications, pages 5–42. IOS Press, 1993.
[39] M. Dorigo, V. Maniezzo, and A. Colorni. Positive feedback as a search strategy.
Technical Report 91-016, Dipartimento di Elettronica, Politecnico di Milano, IT,
1991.
[40] M. Dorigo, V. Maniezzo, and A. Colorni. The ant system: Optimization by a colony
of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics–Part
B, 26(1):29–41, 1996.
[41] L. F. Escudero. An inexact algorithm for the sequential ordering problem. European
Journal of Operations Research, 37:232–253, 1988.
[42] J. A. Feldman and D. H. Ballard. Connectionist models and their properties. Cog-
nitive Science, 6:205–254, 1982.
[43] C. Fleurent and J. Ferland. Genetic and hybrid algorithms for graph coloring. Annals
of Operations Research, 63:437–461, 1996.
[44] D. B. Fogel. Evolutionary Computation. IEEE Press, 1995.
[45] D. E. Foulser, M. Li, and Q. Yang. Theory and algorithms for plan merging. Artificial
Intelligence, 57:143–181, 1992.
[46] B. Freisleben and P. Merz. Genetic local search algorithms for solving symmetric
and asymmetric traveling salesman problems. In Proceedings of IEEE International
Conference on Evolutionary Computation, IEEE-EC96, pages 616–621. IEEE Press,
1996.
[47] B. Freisleben and P. Merz. New genetic local search operators for the traveling
salesman problem. In H.-M. Voigt, W. Ebeling, I. Rechenberg, and H.-P. Schwe-
fel, editors, Proceedings of PPSN-IV, Fourth International Conference on Parallel
Problem Solving from Nature, pages 890–899. Berlin: Springer-Verlag, 1996.
[48] L. M. Gambardella and M. Dorigo. Ant-Q: A reinforcement learning approach to the
traveling salesman problem. In Proceedings of the Twelfth International Conference
on Machine Learning, ML-95, pages 252–260. Palo Alto, CA: Morgan Kaufmann,
1995.
[49] L. M. Gambardella and M. Dorigo. Solving symmetric and asymmetric TSPs by
ant colonies. In Proceedings of the IEEE Conference on Evolutionary Computation,
ICEC96, pages 622–627. IEEE Press, 1996.
[50] L. M. Gambardella and M. Dorigo. HAS-SOP: An hybrid ant system for the sequen-
tial ordering problem. Technical Report 11-97, IDSIA, Lugano, CH, 1997.
[51] L. M. Gambardella, E. Taillard, and G. Agazzi. Ant colonies for vehicle routing
problems. In D. Corne, M. Dorigo, and F. Glover, editors, New Ideas in Optimization.
McGraw-Hill, 1999.
[52] L. M. Gambardella, E. D. Taillard, and M. Dorigo. Ant colonies for the QAP.
Technical Report 4-97, IDSIA, Lugano, Switzerland, 1997. Accepted for publication
in the Journal of the Operational Research Society (JORS).
[53] M. R. Garey, D. S. Johnson, and R. Sethi. The complexity of flowshop and jobshop
scheduling. Mathematics of Operations Research, 2(2):117–129, 1976.
[54] J. Gavett. Three heuristic rules for sequencing jobs to a single production facility.
Management Science, 11:166–176, 1965.
[55] F. Glover. Tabu search, part I. ORSA Journal on Computing, 1:190–206, 1989.
[56] F. Glover. Tabu search, part II. ORSA Journal on Computing, 2:4–32, 1989.
[57] D. E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, 1989.
[58] S. Goss, S. Aron, J. L. Deneubourg, and J. M. Pasteels. Self-organized shortcuts in
the Argentine ant. Naturwissenschaften, 76:579–581, 1989.
[59] P. P. Grassé. Les insectes dans leur univers. Ed. du Palais de la découverte, Paris,
1946.
[60] P. P. Grassé. La reconstruction du nid et les coordinations interindividuelles
chez bellicositermes natalensis et cubitermes sp. La théorie de la stigmergie: es-
sai d’interprétation du comportement des termites constructeurs. Insectes Sociaux,
6:41–81, 1959.
[61] J. Hertz, A. Krogh, and R. G. Palmer. Introduction to the Theory of Neural Com-
putation. Addison Wesley, 1991.
[62] M. Heusse, S. Guérin, D. Snyers, and P. Kuntz. Adaptive agent-driven routing
and load balancing in communication networks. Technical Report RR-98001-IASC,
Département Intelligence Artificielle et Sciences Cognitives, ENST Bretagne, 1998.
Accepted for publication in the Journal of Complex Systems.
[63] W. D. Hillis. The Connection Machine. MIT Press, 1982.
[64] J. Holland. Adaptation in Natural and Artificial Systems. University of Michigan
Press, 1975.
[65] D. S. Johnson and L. A. McGeoch. The traveling salesman problem: A case study.
In E. H. Aarts and J. K. Lenstra, editors, Local Search in Combinatorial Opti-
mization, pages 215–310. Chichester: John Wiley & Sons, 1997.
[66] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing.
Science, 220(4598):671–680, 1983.
[67] T. C. Koopmans and M. J. Beckmann. Assignment problems and the location of
economic activities. Econometrica, 25:53–76, 1957.
[68] F. Krüger, D. Merkle, and M. Middendorf. Studies on a parallel ant system for the
BSP model. Unpublished manuscript.
[69] E. L. Lawler, J. K. Lenstra, A. H. G. Rinnooy-Kan, and D. B. Shmoys, editors. The
Travelling Salesman Problem. Wiley, 1985.
[70] F. Leighton. A graph coloring algorithm for large scheduling problems. Journal of
Research of the National Bureau of Standards, 84:489–505, 1979.
[71] S. Lin. Computer solutions of the traveling salesman problem. Bell System Technical Journal,
44:2245–2269, 1965.
[72] Q. Ma, P. Steenkiste, and H. Zhang. Routing high-bandwidth traffic in max-
min fair share networks. ACM Computer Communication Review (SIGCOMM’96),
26(4):206–217, 1996.
[73] V. Maniezzo. Exact and approximate nondeterministic tree-search procedures for
the quadratic assignment problem. Technical Report CSR 98-1, C. L. in Scienze
dell’Informazione, Università di Bologna, sede di Cesena, Italy, 1998.
[74] V. Maniezzo and A. Colorni. The ant system applied to the quadratic assignment
problem. IEEE Transactions on Knowledge and Data Engineering, 1999, in press.
[75] V. Maniezzo, A. Colorni, and M. Dorigo. The ant system applied to the quadratic
assignment problem. Technical Report IRIDIA/94-28, Université Libre de Bruxelles,
Belgium, 1994.
[76] R. Michel and M. Middendorf. An island model based ant system with lookahead
for the shortest supersequence problem. In A. E. Eiben, T. Back, M. Schoenauer,
and H.-P. Schwefel, editors, Proceedings of PPSN-V, Fifth International Conference
on Parallel Problem Solving from Nature, pages 692–701. Springer-Verlag, 1998.
[77] K. Narendra and M. Thathachar. Learning Automata: An Introduction. Prentice-
Hall, 1989.
[78] N. J. Nilsson. Artificial Intelligence: A New Synthesis. Morgan Kaufmann, 1998.
[79] H. Paessens. The savings algorithm for the vehicle routing problem. European
Journal of Operational Research, 34:336–344, 1988.
[80] J. M. Pasteels, J.-L. Deneubourg, and S. Goss. Self-organization mechanisms in
ant societies (i): Trail recruitment to newly discovered food sources. Experientia
Supplementum, 54:155–175, 1987.
[81] K.-J. Räihä and E. Ukkonen. The shortest common supersequence problem over
binary alphabet is NP-complete. Theoretical Computer Science, 16:187–198, 1981.
[82] I. Rechenberg. Evolutionsstrategie: Optimierung Technischer Systeme nach
Prinzipien der Biologischen Evolution. Frommann-Holzboog, Stuttgart (DE), 1973.
[83] G. Reinelt. The Traveling Salesman Problem: Computational Solutions for TSP
Applications. Berlin: Springer-Verlag, 1994.
[84] R. Y. Rubinstein. Simulation and the Monte Carlo Method. John Wiley & Sons, 1981.
[85] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group, editors. Parallel
Distributed Processing. Cambridge, MA: MIT Press, 1986.
[86] R. Schoonderwoerd, O. Holland, and J. Bruten. Ant-like agents for load balancing in
telecommunications networks. In Proceedings of the First International Conference
on Autonomous Agents, pages 209–216. ACM Press, 1997.
[87] R. Schoonderwoerd, O. Holland, J. Bruten, and L. Rothkrantz. Ant-based load
balancing in telecommunications networks. Adaptive Behavior, 5(2):169–207, 1996.
[88] H.-P. Schwefel. Numerische Optimierung von Computer-Modellen mittels der Evo-
lutionsstrategie. Basel: Birkhäuser, 1977.
[89] S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM,
23:555–565, 1976.
[90] S. Streltsov and P. Vakili. Variance reduction algorithms for parallel replicated
simulation of uniformized Markov chains. Discrete Event Dynamic Systems: Theory
and Applications, 6:159–180, 1996.
[91] T. Stützle. Parallelization strategies for ant colony optimization. In A. E. Eiben,
T. Back, M. Schoenauer, and H.-P. Schwefel, editors, Proceedings of PPSN-V, Fifth
International Conference on Parallel Problem Solving from Nature, pages 722–731.
Springer-Verlag, 1998.
[92] T. Stützle and H. Hoos. The MAX–MIN ant system and local search for the
traveling salesman problem. In T. Baeck, Z. Michalewicz, and X. Yao, editors,
Proceedings of IEEE-ICEC-EPS’97, IEEE International Conference on Evolution-
ary Computation and Evolutionary Programming Conference, pages 309–314. IEEE
Press, 1997.
[93] T. Stützle and H. Hoos. Improvements on the ant system: Introducing MAX–MIN
ant system. In Proceedings of the International Conference on Artificial
Neural Networks and Genetic Algorithms, pages 245–249. Springer Verlag, Wien,
1997.
[94] T. Stützle and H. Hoos. MAX–MIN ant system and local search for combinatorial
optimization problems. In S. Voß, S. Martello, I.H. Osman, and C. Roucairol, editors,
Meta-Heuristics: Advances and Trends in Local Search Paradigms for Optimization,
pages 137–154. Kluwer, Boston, 1998.
[95] D. Subramanian, P. Druschel, and J. Chen. Ants and reinforcement learning: A case
study in routing in dynamic networks. In Proceedings of IJCAI-97, International
Joint Conference on Artificial Intelligence, pages 832–838. Morgan Kaufmann, 1997.
[96] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. Cambridge,
MA: MIT Press, 1998.
[97] R. van der Put. Routing in the faxfactory using mobile agents. Technical Report
R&D-SV-98-276, KPN Research, 1998.
[98] R. van der Put and L. Rothkrantz. Routing in packet switched networks using
agents. Simulation Practice and Theory, 1999, in press.
[99] C. J. Watkins. Learning from Delayed Rewards. PhD thesis, Psychology Department,
University of Cambridge, UK, 1989.
[100] T. White, B. Pagurek, and F. Oppacher. Connection management using adaptive
mobile agents. In H.R. Arabnia, editor, Proceedings of the International Conference
on Parallel and Distributed Processing Techniques and Applications (PDPTA’98),
pages 802–809. CSREA Press, 1998.
[101] D. Whitley, T. Starkweather, and D. Fuquay. Scheduling problems and travelling
salesman: The genetic edge recombination operator. In Proceedings of the Third
International Conference on Genetic Algorithms, pages 133–140. Palo Alto, CA:
Morgan Kaufmann, 1989.