Handbook of
Bioinspired Algorithms
and Applications
PUBLISHED TITLES
HANDBOOK OF SCHEDULING: ALGORITHMS, MODELS, AND PERFORMANCE ANALYSIS
Joseph Y.-T. Leung
THE PRACTICAL HANDBOOK OF INTERNET COMPUTING
Munindar P. Singh
HANDBOOK OF DATA STRUCTURES AND APPLICATIONS
Dinesh P. Mehta and Sartaj Sahni
DISTRIBUTED SENSOR NETWORKS
S. Sitharama Iyengar and Richard R. Brooks
SPECULATIVE EXECUTION IN HIGH PERFORMANCE COMPUTER ARCHITECTURES
David Kaeli and Pen-Chung Yew
SCALABLE AND SECURE INTERNET SERVICES AND ARCHITECTURE
Cheng-Zhong Xu
Handbook of
Bioinspired Algorithms
and Applications
Edited by
Stephan Olariu
Old Dominion University
Norfolk, Virginia, U.S.A.
Albert Y. Zomaya
University of Sydney
NSW, Australia
Published in 2006 by
Chapman & Hall/CRC
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reprinted material is quoted with
permission, and sources are indicated. A wide variety of references are listed. Reasonable efforts have been made to publish
reliable data and information, but the author and the publisher cannot assume responsibility for the validity of all materials
or for the consequences of their use.
No part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or
other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC) 222 Rosewood Drive, Danvers, MA
01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For
organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
The Handbook of Bioinspired Algorithms and Applications seeks to provide an opportunity for researchers
to explore the connection between biologically inspired (or bioinspired) techniques and the development
of solutions to problems that arise in a variety of problem domains. The power of bioinspired paradigms
lies in their ability to deal with complex problems even when little or no knowledge of the search space is
available, which makes them particularly well suited to a wide range of computationally intractable
optimization and decision-making applications.
A vast literature exists on bioinspired approaches to an impressive array of problems, and there is a
great need for repositories showing how to apply bioinspired paradigms to difficult problems. The material
in the handbook is by no means exhaustive; it focuses on paradigms that are genuinely “bioinspired,”
so chapters on fuzzy logic or simulated annealing were not included. The number of chapters was also
deliberately limited so that the handbook remains manageable within a single volume.
The handbook endeavors to strike a balance between theoretical and practical coverage of a range of
bioinspired paradigms and applications. It is organized into two main sections, Models and
Paradigms and Application Domains, and the titles of the various chapters are self-explanatory and a
good indication of what is covered. The theoretical chapters are intended to present the fundamentals of
each paradigm in a way that allows readers to apply these techniques in their own fields.
The application chapters give detailed examples and case studies of how to actually develop a solution
to a problem based on a bioinspired technique. The handbook should serve as a repository of significant
reference material, as the list of references provided by each chapter will be a useful source of further
study.
Stephan Olariu
Albert Y. Zomaya
First and foremost we would like to thank and acknowledge the contributors of this book for their support
and patience, and the reviewers for their useful comments and suggestions that helped in improving
the earlier outline of the handbook and presentation of the material. Professor Zomaya would like to
acknowledge the support from CISCO Systems and members of the Advanced Networks Research Group
at Sydney University. We also extend our deepest thanks to Jessica Vakili and Bob Stern from CRC Press
for their collaboration, guidance, and, most importantly, patience in finalizing this handbook. Finally,
we thank Mr. Mohan Kumar for leading the production process of this handbook in a very professional
manner.
Stephan Olariu
Albert Y. Zomaya
Stephan Olariu received his M.Sc. and Ph.D. degrees in computer science from McGill University,
Montreal, in 1983 and 1986, respectively. In 1986 he joined Old Dominion University, where he is a
professor of computer science. Dr. Olariu has published extensively in various journals, book chapters,
and conference proceedings. His research interests include image processing and machine vision, parallel
architectures, design and analysis of parallel algorithms, computational graph theory, computational geo-
metry, and mobile computing. Dr. Olariu serves on the Editorial Board of IEEE Transactions on Parallel
and Distributed Systems, Journal of Parallel and Distributed Computing, VLSI Design, Parallel Algorithms
and Applications, International Journal of Computer Mathematics, and International Journal of Foundations
of Computer Science.
Albert Y. Zomaya is currently the CISCO Systems chair professor of internetworking in the School of
Information Technologies, The University of Sydney. Prior to that he was a full professor in the Electrical
and Electronic Engineering Department at the University of Western Australia, where he also led the
Parallel Computing Research Laboratory from 1990 to 2002. He served as associate, deputy, and acting
head in the same department, and held visiting positions at Waterloo University and the University of
Missouri–Rolla. He is the author/co-author of 6 books and 200 publications in technical journals and
conferences, and the editor of 6 books and 7 conference volumes. He is currently an associate editor
for 14 journals, the founding editor of the Wiley Book Series on Parallel and Distributed Computing, and
the editor-in-chief of the Parallel and Distributed Computing Handbook (McGraw-Hill 1996). Professor
Zomaya was the chair of the IEEE Technical Committee on Parallel Processing (1999–2003) and currently
serves on its executive committee. He has been actively involved in the organization of national and
international conferences. He received the 1997 Edgeworth David Medal from the Royal Society of New
South Wales for outstanding contributions to Australian science. In September 2000 he was awarded the
IEEE Computer Society’s Meritorious Service Award. Professor Zomaya is a chartered engineer (CEng), a
fellow of the IEEE, a fellow of the Institution of Electrical Engineers (U.K.), and a member of the ACM. He also
serves on the boards of two startup companies. His research interests are in the areas of high performance
computing, parallel algorithms, networking, mobile computing, and bioinformatics.
1.1 Introduction
One of the most striking features of Nature is the existence of living organisms adapted for survival in
almost any ecosystem, even the most inhospitable: from abyssal depths to mountain heights, from volcanic
vents to polar regions. This feat is all the more impressive when we consider that the environment is
continuously changing: some life forms become extinct, whereas other beings evolve and prevail thanks
to their adaptation to the new scenario. Remarkably, living beings exert no conscious effort to evolve
(indeed, it would be rather awkward to talk about consciousness in amoebas or earthworms); on the
contrary, the driving force for change is controlled by supraorganic mechanisms such as natural evolution.
Can we learn — and use for our own profit — the lessons that Nature is teaching us? The answer is a big
YES, as the optimization community has repeatedly shown in the last decades. “Evolutionary algorithm”
is the key word here. The term evolutionary algorithm (EA henceforth) is used to designate a collection
of optimization techniques whose functioning is loosely based on metaphors of biological processes.
This rough definition is rather broad and tries to encompass the numerous approaches currently
existing in the field of evolutionary computation [1]. Quite appropriately, this field itself is continuously
evolving; a quick inspection of the proceedings of the relevant conferences and symposia suffices to
demonstrate the impetus of the field, and the great diversity of the techniques that can be considered
“evolutionary.”
This variety notwithstanding, it is possible to find a number of features common to all (or at least
most) EAs. The following quote from Reference 2 illustrates these common points:
The algorithm maintains a collection of potential solutions to a problem. Some of these possible
solutions are used to create new potential solutions through the use of operators. Operators act on
and produce collections of potential solutions. The potential solutions that an operator acts on are
selected on the basis of their quality as solutions to the problem at hand. The algorithm uses this
process repeatedly to generate new collections of potential solutions until some stopping criterion
is met.
This definition can usually be found in the literature expressed in a technical language that uses terms
such as genes, chromosomes, population, etc. This jargon is reminiscent of the biological inspiration
mentioned before, and has deeply permeated the field. We will return to the connection with biology
later on.
The objective of this work is to present a gentle overview of these techniques, covering both the
classical “canonical” models of EAs and some modern directions for the development of the field,
namely, the use of parallel computing and the introduction of problem-dependent knowledge. The basic
principles underlying natural evolution can be summarized as follows:
• Evolution is a process that does not operate on organisms directly, but on chromosomes. These
are the organic tools by means of which the structure of a certain living being is encoded, that is,
the features of a living being are defined by the decoding of a collection of chromosomes. These
chromosomes (more precisely, the information they contain) pass from one generation to another
through reproduction.
• The evolutionary process takes place precisely during reproduction. Nature exhibits a plethora
of reproductive strategies. The most essential ones are mutation (that introduces variability in
the gene pool) and recombination (that introduces the exchange of genetic information among
individuals).
• Natural selection is the mechanism that relates chromosomes with the adequacy of the entities they
represent, favoring the proliferation of effective, environment-adapted organisms, and conversely
causing the extinction of less effective, nonadapted organisms.
These principles are embodied in the most orthodox theory of evolution, the Synthetic
Theory [6]. Although alternative scenarios that introduce some variety into this description have been
proposed — for example, the Neutral Theory [7] and, very remarkably, the Theory of Punctuated
Equilibria [8] — it is worth concentrating on the former basic model. It is amazing to see that, despite the
apparent simplicity of the principles upon which it rests, Nature exhibits unparalleled power in developing
and expanding new life forms.
Not surprisingly, this power has attracted the interest of many researchers, who have tried to translate the
principles of evolution to the realm of algorithmics, pursuing the construction of computer systems with
analogous features. An important point must be stressed here: evolution is an undirected process, that is,
there exists no scientific evidence that evolution is headed toward a certain final goal. On the contrary, it
can be regarded as a reactive process that makes organisms change in response to environmental variations.
Human-designed systems, however, do pursue a definite final goal, and whatever this goal might be,
it is, in principle, desirable to reach it quickly and efficiently. This leads to the distinction
between two approaches to the construction of nature-inspired systems:
1. Trying to reproduce Nature principles with the highest possible accuracy, that is, simulate Nature.
2. Using these principles as inspiration, adapting them in whatever required way so as to obtain
efficient systems for performing the desired task.
Both approaches nowadays concentrate the efforts of researchers. The first one has given rise to
the field of Artificial Life (e.g., see Reference 9), and is interesting because it allows re-creating and
studying numerous natural phenomena such as parasitism, predator/prey relationships, etc. The second
approach can be considered more practical, and constitutes the source of EAs. Notice anyway that these
two approaches are not hermetic containers, and have frequently interacted, with certainly successful
results.
[Figure 1.1: schematic of an EA iteration, in which the population P is transformed into intermediate populations P′ and P″ by the evolutionary operators.]
[Figure 1.2: high-level pseudocode of an evolutionary algorithm.]
An EA first creates an initial population of solutions onto which it will subsequently work, iteratively applying some evolutionary operators
to modify its contents. More precisely, the process comprises three major stages: selection (promising
solutions are picked from the population by using a selection function σ ), reproduction (new solutions
are created by modifying selected solutions using some reproductive operators ωi ), and replacement (the
population is updated by replacing some existing solutions by the newly created ones, using a replacement
function ψ). This process is repeated until a certain termination criterion (usually reaching a maximum
number of iterations) is satisfied. Each iteration of this process is commonly termed a generation.
According to this description, it is possible to express the pseudocode of an EA as shown in Figure 1.2.
Every possible instantiation of this algorithmic template1 will give rise to a different EA. More precisely,
it is possible to distinguish different EA families, by considering some guidelines on how to perform this
instantiation.
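As a concrete illustration, the following is a minimal Python sketch of this template; the functions select, reproduce, and replace are placeholder names standing for the abstract operators σ, ω, and ψ, and a fixed iteration count serves as one possible termination criterion:

def evolutionary_algorithm(init, select, reproduce, replace, fitness,
                           pop_size=50, generations=100):
    # Generic EA template: concrete choices of select (sigma), reproduce
    # (omega), and replace (psi) yield the different EA families below.
    population = [init() for _ in range(pop_size)]
    for _ in range(generations):                  # termination criterion
        parents = select(population, fitness)     # selection stage
        offspring = reproduce(parents)            # reproduction stage
        population = replace(population, offspring, fitness)  # replacement
    return max(population, key=fitness)           # best solution found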
• Evolutionary Programming (EP): This EA family originated in the work of Fogel et al. [11].
EP focuses on the adaptation of individuals rather than on the evolution of their genetic informa-
tion. This implies a much more abstract view of the evolutionary process, in which the behavior of
individuals is directly modified (as opposed to manipulating their genes). This behavior is typically
modeled by using complex data structures such as finite automata or graphs (see Figure 1.3[a]).
Traditionally, EP uses asexual reproduction — also known as mutation, that is, introducing slight
changes in an existing solution — and selection techniques based on direct competition among
individuals.
• Evolution Strategies (ESs): These techniques were initially developed in Germany by Rechenberg
[12] and Schwefel [13]. Their original goal was serving as a tool for solving engineering problems.
With this goal in mind, these techniques are characterized by manipulating arrays of floating-point
numbers (there exist versions of ES for discrete problems, but they are much more popular for
continuous optimization). As in EP, mutation is sometimes the sole reproductive operator used
in ES, although it is not rare to also consider recombination (i.e., the construction of new solutions by
combining portions of some individuals). A very important feature of ES is the utilization
of self-adaptive mechanisms for controlling the application of mutation. These mechanisms are
aimed at optimizing the progress of the search by evolving not only the solutions for the problem
being considered, but also some parameters for mutating these solutions (in a typical situation,
an ES individual is a pair (x, σ), where σ is a vector of standard deviations used to control the
Gaussian mutation exerted on the actual solution x).

1 The mere fact that this high-level heuristic template can host a low-level heuristic justifies using the term
metaheuristic, as will be seen later.

FIGURE 1.3 Two examples of complex representations. (a) A graph representing a neural network. (b) A tree
representing a fuzzy rule.
• Genetic Algorithms (GAs): GAs are possibly the most widespread variant of EAs. They were con-
ceived by Holland [14]. His work has had a great influence on the development of the field, to the
point that some portions of it — arguably extrapolated — were taken almost as dogmas (e.g., the
ubiquitous use of binary strings as chromosomes). The main feature of GAs is the use of a recom-
bination (or crossover) operator as the primary search tool. The rationale is the assumption that
different parts of the optimal solution can be independently discovered, and later combined to
create better solutions. Additionally, mutation is also used, but it was usually considered a second-
ary background operator whose purpose is merely “keeping the pot boiling” by introducing new
information into the population (this classical interpretation is no longer considered valid, though).
These families have not grown in complete isolation from each other. On the contrary, numerous
researchers built bridges among them. As a result of this interaction, the borders of these classical families
tend to be fuzzy (the reader may check [15] for a unified presentation of EA families), and new variants
have emerged. We can cite the following:
• Evolution Programs (EPs): This term is due to Michalewicz [5], and comprises those techniques
that, while using the working principles of GAs, evolve complex data structures, as in EP.
In addition to the different EA variants mentioned above, there exist several other techniques that could
also fall within the scope of EAs, such as Ant Colony Optimization [20], Estimation of Distribution Algorithms
[21], or Scatter Search [22], among others. All of them rely on achieving some kind of balance between
the exploration of new regions of the search space, and the exploitation of regions known to be promising
[23], so as to minimize the computational effort for finding the desired solution. Nevertheless, these
techniques exhibit very distinctive features that make them depart from the general pseudocode depicted
in Figure 1.2. The broader term metaheuristic (e.g., see Reference 24) is used to encompass this larger set
of modern optimization techniques, including EAs.
so as to keep a genetic reservoir of worthwhile information from the past, and thus be capable of tackling
dynamic changes in the fitness function.
Notice that there may even exist more than one criterion guiding the search (e.g., we may want to
evolve the shape of a set of pillars so that their strength is maximal, but also so that their cost is minimal).
These criteria will typically be partially conflicting. In this case, a multiobjective problem is being faced.
This can be tackled in different ways, such as aggregating the multiple criteria into a single value, or
using the notion of Pareto dominance (i.e., solution x dominates solution y if, and only if, f_i(x) yields
a value at least as good as f_i(y) for all i, and strictly better for at least one i, where the f_i’s represent
the multiple criteria being optimized). See References 26 and 27 for details.
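As an illustration, here is a minimal sketch of the dominance test, assuming every criterion is to be maximized and each solution is represented by its tuple of objective values:

def dominates(fx, fy):
    # fx Pareto-dominates fy iff it is at least as good in every
    # criterion and strictly better in at least one of them.
    return (all(a >= b for a, b in zip(fx, fy))
            and any(a > b for a, b in zip(fx, fy)))

# Example: (3, 5) dominates (3, 4); (3, 5) and (4, 1) are incomparable.
assert dominates((3, 5), (3, 4))
assert not dominates((3, 5), (4, 1)) and not dominates((4, 1), (3, 5))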
1.4.2 Initialization
In order to have the EA started, it is necessary to create the initial population of solutions. This is
typically addressed by randomly generating the desired number of solutions. When the alphabet used
for representing solutions has low cardinality, this random initialization provides a more or less uniform
sample of the solution space. The EA can subsequently start exploring the wide area covered by the initial
population, in search of the most promising regions.
In some cases, there exists the risk of not having the initial population adequately scattered all over the
search space (e.g., when using small populations and/or large alphabets for representing solutions). It is
then necessary to resort to systematic initialization procedures [28], so as to ensure that all symbols are
uniformly present in the initial population.
This random initialization can be complemented with the inclusion of heuristic solutions in the initial
population. The EA can thus benefit from the existence of other algorithms, using the solutions they
provide. This is termed seeding, and it is known to be very beneficial in terms of convergence speed and
quality of the solutions achieved [29,30]. The potential drawback of this technique is that the injected
solutions may take over the whole population in a few iterations, provoking the stagnation of the algorithm.
This problem can be remedied by tuning the selection intensity by some means (e.g., by an adequate
choice of the selection operator, as will be shown below).
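A minimal sketch of such an initialization follows; random_solution is a user-supplied generator, and capping the seeds to a tenth of the population is an arbitrary illustrative choice:

import random

def initial_population(pop_size, random_solution, seeds=()):
    # Random initialization, optionally seeded with heuristic solutions;
    # seeds are capped to a small fraction so they cannot take over at once.
    seeds = list(seeds)[:max(1, pop_size // 10)]
    return seeds + [random_solution() for _ in range(pop_size - len(seeds))]

# Example: a population of 50 random binary strings of length 20.
pop = initial_population(50, lambda: [random.randint(0, 1) for _ in range(20)])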
1.4.3 Selection
In combination with replacement, selection is responsible for the competition aspects of individuals in
the population. In fact, replacement can be intuitively regarded as the complementary application of the
selection operation.
Using the information provided by the fitness function, a sample of individuals from the population is
selected for breeding. This sample is obviously biased towards better individuals, that is, good — according
to the fitness function — solutions should be more likely to enter the sample than bad ones.2
The most popular techniques are fitness-proportionate methods. In these methods, the probability of
selecting an individual for breeding is proportional to its fitness, that is,

p_i = f_i / Σ_{j∈P} f_j ,    (1.1)

where f_i is the fitness3 of individual i, and p_i is the probability of i getting into the reproduction stage. This
proportional selection can be implemented in a number of ways. For example, roulette-wheel selection rolls
2 At least, this is customary in genetic algorithms. In other EC families, selection is less important for biasing
evolution, and it is done at random (a typical option in evolution strategies), or exhaustively, that is, all individuals
undergo reproduction (as is typical in evolutionary programming).
3 Maximization is assumed here. In case we were dealing with a minimization problem, fitness should be transformed
so as to obtain an appropriate value for this purpose, for example, subtracting it from the highest possible value of the
guiding function, or taking the inverse of it.
a die with |P| sides, such that the ith side has probability p_i. This is repeated as many times as individuals
are required in the sample. A drawback of this procedure is that the actual number of instances of
individual i in the sample can largely deviate from the expected |P| · pi . Stochastic Universal Sampling [31]
(SUS) does not have this problem, and produces a sample with minimal deviation from expected values.
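The following sketch illustrates both procedures, assuming a fitness function that returns nonnegative values (maximization):

import random

def roulette_wheel(population, fitness, k):
    # Fitness-proportionate selection, Eq. (1.1): k independent spins.
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=k)

def sus(population, fitness, k):
    # Stochastic Universal Sampling: k equally spaced pointers over the
    # cumulative fitness wheel give minimal deviation from expected counts.
    f = [fitness(ind) for ind in population]
    step = sum(f) / k
    start = random.uniform(0, step)
    sample, cumulative, idx = [], f[0], 0
    for i in range(k):
        pointer = start + i * step
        while cumulative < pointer:
            idx += 1
            cumulative += f[idx]
        sample.append(population[idx])
    return sample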
Fitness-proportionate selection faces problems when the fitness values of individuals are very similar
to one another. In this case, p_i would be approximately 1/|P| for all i ∈ P, and hence selection would be
essentially random. This can be remedied by using fitness scaling; typical options are described in Reference 5.
Another problem is the appearance of an individual whose fitness is much better than that of the remaining
individuals. Such super-individuals can quickly take over the population. To avoid this, the best option is
using a non-fitness-proportionate mechanism. A first possibility is ranking selection [32]: individuals are
ranked according to fitness (best first, worst last), and later selected — for example, by means of SUS —
using the following probabilities:

p_i = (1/|P|) [ η⁻ + (η⁺ − η⁻) (i − 1)/(|P| − 1) ] ,    (1.2)
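A small sketch tabulating Eq. (1.2) follows; the values η⁻ = 0.5 and η⁺ = 1.5 are merely typical choices satisfying the usual constraint η⁻ + η⁺ = 2, which makes the probabilities sum to one:

def linear_ranking_probs(pop_size, eta_minus=0.5, eta_plus=1.5):
    # Selection probability of the individual of rank i, per Eq. (1.2).
    return [(eta_minus + (eta_plus - eta_minus) * (i - 1) / (pop_size - 1))
            / pop_size for i in range(1, pop_size + 1)]

probs = linear_ranking_probs(5)      # [0.1, 0.15, 0.2, 0.25, 0.3]
assert abs(sum(probs) - 1.0) < 1e-9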
1.4.4 Recombination
Recombination is a process that models information exchange among several individuals (typically two
of them, but a higher number is possible [37]). This is done by constructing new solutions using the
information contained in a number of selected parents. If it is the case that the resulting individuals (the
offspring ) are entirely composed of information taken from the parents, then the recombination is said to
be transmitting [38,39]. This is the case of classical recombination operators for bitstrings such as single-
point crossover, or uniform crossover [40], among others. Figure 1.4 shows an example of the application
of these operators.

[Figure 1.4: single-point crossover (cutting at a cut point) and uniform crossover (mixing under the binary mask 00110100011001) applied to the parents 01001101011010 and 11011010010011.]

[Figure 1.5 diagram: PMX combining father 1 2 3 4 5 6 7 8 9 and mother 4 3 8 1 7 5 9 2 6 into child 1 3 8 4 5 6 9 2 7.]

FIGURE 1.5 PMX at work. The numbers in brackets indicate the order in which elements are copied to the
descendant.
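A minimal sketch of both operators, assuming list-encoded chromosomes of equal length:

import random

def single_point_crossover(father, mother):
    # Cut both parents at one random point and swap the tails.
    cut = random.randint(1, len(father) - 1)
    return father[:cut] + mother[cut:], mother[:cut] + father[cut:]

def uniform_crossover(father, mother):
    # Each gene of the first child comes from either parent according to
    # a random binary mask; the second child receives the complement.
    mask = [random.randint(0, 1) for _ in father]
    child1 = [f if m else g for f, g, m in zip(father, mother, mask)]
    child2 = [g if m else f for f, g, m in zip(father, mother, mask)]
    return child1, child2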
This property captures the a priori role of recombination: combining good parts of solutions that have
been independently discovered. It can be difficult to achieve for certain problem domains though (the
Traveling Salesman Problem (TSP) is a typical example). In those situations, it is possible to consider other
properties of interest such as respect or assortment. The former refers to the fact that the recombination
operator generates descendants carrying all features common to all parents; thus, this property can be seen
as a part of the exploitative side of the search. On the other hand, assortment represents the exploratory side
of recombination. A recombination operator is said to be properly assorting if, and only if, it can generate
descendants carrying any combination of compatible features taken from the parents. The assortment is
said to be weak if it is necessary to perform several recombinations within the offspring to achieve this
effect.
The recombination operator must match the particulars of the representation of solutions chosen.
In the GA context, the representation was typically binary, and hence operators such as those depicted
in Figure 1.4 were used. The situation is different in other EA families (and indeed in modern GAs too).
Without leaving GAs, another very typical representation is that of permutations. Many ad hoc operators
have been defined for this purpose, for example, order crossover (OX) [41], partially mapped crossover
(PMX; see Figure 1.5) [42], and uniform cycle crossover (UCX) [43] among others. The reader may check
[43] for a survey of these different operators.
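As an illustration, here is a sketch of PMX for list-encoded permutations; the two cut points are chosen at random, as in Figure 1.5:

import random

def pmx(father, mother):
    # Copy a random slice from the father, then fill the remaining
    # positions from the mother, following the mapping induced by the
    # slice whenever a value has already been used.
    n = len(father)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = father[a:b + 1]
    mapping = {father[i]: mother[i] for i in range(a, b + 1)}
    for i in list(range(a)) + list(range(b + 1, n)):
        gene = mother[i]
        while gene in child[a:b + 1]:
            gene = mapping[gene]
        child[i] = gene
    return child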
When used in continuous parameter optimization, recombination can exploit the richness of the
representation, and utilize a variety of alternate strategies to create the offspring.

[Figure 1.6: recombination of expression trees over the operators +, −, / and the variables X, Y by exchanging subtrees.]

Let (x_1, . . . , x_n) and (y_1, . . . , y_n) be two arrays of real-valued elements to be recombined, and let
(z_1, . . . , z_n) be the resulting array. Some possibilities for performing recombination are the following:
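Two well-known possibilities are arithmetic recombination, which averages the parents componentwise, and BLX-α blend crossover [44], which draws each z_i uniformly from the parental interval enlarged by α times its width on both sides. A minimal sketch of both:

import random

def arithmetic_recombination(x, y):
    # Componentwise average of the two parents.
    return [(xi + yi) / 2 for xi, yi in zip(x, y)]

def blx_alpha(x, y, alpha=0.5):
    # BLX-alpha: sample each gene from the extended parental interval.
    z = []
    for xi, yi in zip(x, y):
        lo, hi = min(xi, yi), max(xi, yi)
        spread = alpha * (hi - lo)
        z.append(random.uniform(lo - spread, hi + spread))
    return z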
In the case of self-adaptive schemes such as those typically used in ES, the parameters undergoing self-
adaptation would be recombined as well, using some of these operators. More details on self-adaptation
will follow in the next subsection.
Solutions can be also represented by means of some complex data structure, and the recombination
operator must be adequately defined to deal with these (e.g., References 46 to 48). In particular, the field
of GP normally uses trees to represent LISP programs [17], rule-bases [49], mathematical expressions,
etc. Recombination is usually performed here by swapping branches of the trees involved, as exemplified
in Figure 1.6.
1.4.5 Mutation
From a classical point of view (at least in the GA arena [50]), this was a secondary operator whose mission is
to keep the pot boiling, continuously injecting new material into the population, but at a low rate (otherwise,
the search would degrade into a random walk in the solution space). EP practitioners [11] would disagree
with this characterization, claiming a central role for mutation: indeed, it is considered the crucial part
of the search engine in this context. This latter vision has nowadays propagated to most EC researchers
(at least in the sense of considering mutation as important as recombination).
As was the case for recombination, the choice of a mutation operator depends on the representation
used. In bitstrings (and, in general, in linear strings spanning Σⁿ, where Σ is an arbitrary alphabet), mutation
is done by randomly substituting the symbol contained at a certain position by a different symbol. If a
permutation representation is used, such a procedure cannot be applied, for it would not produce a valid
permutation. Typical strategies in this case are swapping two randomly chosen positions, or inverting a
segment of the permutation. The interested reader may check [51] or [5] for an overview of different
options.
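A minimal sketch of these operators — random substitution over an arbitrary alphabet for strings, and swap or inversion for permutations:

import random

def flip_mutation(chromosome, alphabet, rate=0.01):
    # Replace each position, with small probability, by a different symbol.
    return [random.choice([s for s in alphabet if s != g])
            if random.random() < rate else g
            for g in chromosome]

def swap_mutation(perm):
    # Exchange two randomly chosen positions of the permutation.
    p = list(perm)
    i, j = random.sample(range(len(p)), 2)
    p[i], p[j] = p[j], p[i]
    return p

def inversion_mutation(perm):
    # Reverse a randomly chosen segment of the permutation.
    p = list(perm)
    i, j = sorted(random.sample(range(len(p)), 2))
    p[i:j + 1] = reversed(p[i:j + 1])
    return p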
If solutions are represented by means of complex data structures, mutation has to be implemented
accordingly. In particular, this is the case of EP, in which, for example, finite automata [52], layered
graphs [53], directed acyclic graphs [54], etc., are often evolved. In this domain, it is customary to use
more than one mutation operator, choosing for each individual which of these operators will be applied to it.
In the case of ES applied to continuous optimization, mutation is typically done using Gaussian
perturbations, that is,

z_i = x_i + N_i(0, σ_i),    (1.3)

where σ_i is a parameter controlling the amplitude of the mutation, and N(a, b) is a random number
drawn from a normal distribution with mean a and standard deviation b. The parameters σ_i usually
undergo self-adaptation. In this case, they are mutated prior to mutating the x_i’s as follows:

σ_i′ = σ_i · e^{N(0,τ′) + N_i(0,τ)},    (1.4)

where τ and τ′ are two parameters termed the local and global learning rate, respectively. Advanced schemes
have also been defined in which a covariance matrix is used rather than independent σ_i’s. However, these
schemes tend to be impractical if solutions are highly dimensional. For a better understanding of ES
mutation see Reference 55.
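A sketch of this self-adaptive scheme following Eqs. (1.3) and (1.4); the learning rates are left as parameters, since their commonly recommended settings depend on the problem dimension:

import math
import random

def self_adaptive_mutation(x, sigma, tau_local, tau_global):
    # Eq. (1.4): log-normal update of the step sizes; the global term
    # N(0, tau') is drawn once, the local term N_i(0, tau) per component.
    shared = random.gauss(0.0, tau_global)
    new_sigma = [s * math.exp(shared + random.gauss(0.0, tau_local))
                 for s in sigma]
    # Eq. (1.3): Gaussian perturbation with the freshly adapted step sizes.
    new_x = [xi + random.gauss(0.0, si) for xi, si in zip(x, new_sigma)]
    return new_x, new_sigma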
1.4.6 Replacement
The role of replacement is keeping the population size constant.4 To do so, some individuals from the
population have to be substituted by some of the individuals created during reproduction. This can be
done in several ways:
• Replacement-of-the-worst : the population is sorted according to fitness, and the new individuals
replace the worst ones from the population.
• Random replacement : the individuals to be replaced are selected at random.
• Tournament replacement : a subset of α individuals is selected at random, and the worst one is
selected for replacement. Notice that if α = 1 we have random replacement.
• Direct replacement : the offspring replace their parents.
Some variants of these strategies are possible. For example, it is possible to consider the elitist versions
of these, and only perform replacement if the new individual is better than the individual it has to replace.
Two replacement strategies (comma and plus) are also typically considered in the context of ES and
EP. Comma replacement is analogous to replacement of the worst, with the addition that the number of
new individuals |P′| (also denoted by λ) can be larger than the population size |P| (also denoted by µ).
In this case, the population is constructed using the best µ out of the λ new individuals. As to the plus
strategy, it would be the elitist counterpart of the former, that is, pick the best µ individuals out of the µ
old individuals plus the λ new ones. The notation (µ, λ)-EA and (µ + λ)-EA is used to denote these
two strategies.
It must be noted that the term “elitism” is often used as well to denote replacement-of-the-worst
strategies in which |P′| < |P|. This strategy is very commonly used, and ensures that the best individual
found so far is never lost. An extreme situation takes place when |P′| = 1, that is, just a single individual is
generated in each iteration of the algorithm. This is known as steady-state reproduction, and it is usually
associated with faster convergence of the algorithm. The term generational is used to designate the classical
situation in which |P′| = |P|.
4 Although it is not mandatory to do so [56], it is common practice to use populations of fixed size.
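A minimal sketch of the comma and plus strategies (maximization assumed); each simply keeps the best µ individuals from the appropriate pool:

def comma_replacement(parents, offspring, fitness, mu):
    # (mu, lambda): the next population is the best mu of the offspring.
    return sorted(offspring, key=fitness, reverse=True)[:mu]

def plus_replacement(parents, offspring, fitness, mu):
    # (mu + lambda): the best mu among parents and offspring together.
    return sorted(parents + offspring, key=fitness, reverse=True)[:mu]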
Telecommunications is another field that has witnessed the successful application of EAs. For example,
EAs have been applied to the placement of antennas and converters [82,83], frequency assignment
[84–86], digital data network design [87], predicting bandwidth demands in ATM networks [88], error
code design [89,90], etc. See also Reference 91.
Evolutionary algorithms have been actively used in electronics and engineering as well. For example,
work has been done in structure optimization [92], aeronautic design [93], power planning [94], circuit
design [95], computer-aided design [96], analogue-network synthesis [97], and service restoration [98]
among other areas.
Besides the precise application areas mentioned before, EAs have been also utilized in many other
fields such as, for example, medicine [99,100], economics [101,102], mathematics [103,104], biology
[105–107], etc. The reader may try querying any bibliographical database or web search engine for
“evolutionary algorithm application” to get an idea of the vast number of problems that have been tackled
with EAs.
1.6 Conclusions
EC is a fascinating field. Its optimization philosophy is appealing, and its practical power is striking.
Whenever the user is faced with a hard search/optimization task that she cannot solve by classical means,
trying EAs is a must. The extremely brief overview of EA applications presented before can convince the
reader that a “killer approach” is in her hands.
EC is also a very active research field. One of the main weaknesses of the field is the absence of
a conclusive general theoretical basis, although great advances are being made in this direction, and
in-depth knowledge is available about certain idealized EA models.
Regarding the more practical aspects of the paradigm, two main streams can be identified:
parallelization and hybridization. The use of decentralized EAs in the context of multiprocessor or net-
worked systems can result in enormous performance improvements [108], and constitutes an ideal option
for exploiting the availability of distributed computing resources. As to hybridization, it has become
evident in recent years that it constitutes a crucial factor for the successful use of EAs in real-world
endeavors. This can be achieved by hard-wiring problem knowledge within the EA, or by combining it
with other techniques. In this sense, the reader is encouraged to read other essays in this volume to get
valuable ideas on suitable candidates for this hybridization.
Acknowledgments
This work has been partially funded by the Ministry of Science and Technology (MCYT) and the
European Regional Development Fund (FEDER) under contract TIC2002-04498-C05-02 (the TRACER
project) http://tracer.lcc.uma.es.
References
[1] T. Bäck, D.B. Fogel, and Z. Michalewicz. Handbook of Evolutionary Computation. Oxford
University Press, New York, 1997.
[2] T.C. Jones. Evolutionary Algorithms, Fitness Landscapes and Search. Ph.D. thesis, University of
New Mexico, 1995.
[3] C. Darwin. On the Origin of Species by Means of Natural Selection. John Murray, London, 1859.
[4] G. Mendel. Versuche über pflanzen-hybriden. Verhandlungen des Naturforschendes Vereines in
Brünn, 4: 3–47, 1865.
[5] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag,
Berlin, 1992.
[6] J. Huxley. Evolution, the Modern Synthesis. Harper, New York, 1942.
[7] M. Kimura. Evolutionary rate at the molecular level. Nature, 217: 624–626, 1968.
[8] S.J. Gould and N. Eldredge. Punctuated equilibria: The tempo and mode of evolution reconsidered.
Paleobiology, 3: 115–151, 1977.
[9] C.G. Langton. Artificial life. In C.G. Langton, Ed., Artificial Life 1. Addison-Wesley, Santa Fe, NM,
1989, pp. 1–47.
[10] D.B. Fogel. Evolutionary Computation: The Fossil Record. Wiley-IEEE Press, Piscataway, NJ, 1998.
[11] L.J. Fogel, A.J. Owens, and M.J. Walsh. Artificial Intelligence Through Simulated Evolution. John
Wiley & Sons, New York, 1966.
[12] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologis-
chen Evolution. Frommann-Holzboog Verlag, Stuttgart, 1973.
[13] H.P. Schwefel. Numerische Optimierung von Computer–Modellen mittels der Evolutionsstrategie,
Vol. 26 of Interdisciplinary Systems Research. Birkhäuser, Basel, 1977.
[14] J.H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press,
Ann Arbor, MI, 1975.
[15] T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.
[16] M.L. Cramer. A representation for the adaptive generation of simple sequential programs.
In J.J. Grefenstette, Ed., Proceedings of the First International Conference on Genetic Algorithms.
Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[17] J.R. Koza. Genetic Programming. MIT Press, Cambridge, MA, 1992.
[18] P. Moscato. On Evolution, Search, Optimization, Genetic Algorithms and Martial Arts: Towards
Memetic Algorithms. Technical report Caltech Concurrent Computation Program, Report 826,
California Institute of Technology, Pasadena, CA, USA, 1989.
[19] P. Moscato and C. Cotta. A gentle introduction to memetic algorithms. In F. Glover and
G. Kochenberger, Eds., Handbook of Metaheuristics. Kluwer Academic Publishers, Boston, MA,
2003, pp. 105–144.
[20] M. Dorigo and G. Di Caro. The ant colony optimization meta-heuristic. In D. Corne, M. Dorigo,
and F. Glover, Eds., New Ideas in Optimization. McGraw-Hill, Maidenhead, UK, 1999, pp. 11–32.
[21] P. Larrañaga and J.A. Lozano. Estimation of Distribution Algorithms. A New Tool for Evolutionary
Computation. Kluwer Academic Publishers, Boston, MA, 2001.
[22] M. Laguna and R. Martí. Scatter Search. Methodology and Implementations in C. Kluwer Academic
Publishers, Boston, MA, 2003.
[23] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual
comparison. ACM Computing Surveys, 35: 268–308, 2003.
[24] F. Glover and G. Kochenberger. Handbook of Metaheuristics. Kluwer Academic Publishers, Boston,
MA, 2003.
[25] R.E. Smith. Diploid genetic algorithms for search in time varying environments. In Annual
Southeast Regional Conference of the ACM. ACM Press, New York, 1987, pp. 175–179.
[26] C.A. Coello. A comprehensive survey of evolutionary-based multiobjective optimization
techniques. Knowledge and Information Systems, 1: 269–308, 1999.
[27] C.A. Coello and A.D. Christiansen. An approach to multiobjective optimization using genetic
algorithms. In C.H. Dagli, M. Akay, C.L.P. Chen, B.R. Fernández, and J. Ghosh, Eds., Intelligent
Engineering Systems Through Artificial Neural Networks, Vol. 5. ASME Press, St. Louis, MO, 1995,
pp. 411–416.
[28] C.R. Reeves. Using genetic algorithms with small populations. In S. Forrest, Ed., Proceedings of the
Fifth International Conference on Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, 1993,
pp. 92–99.
[29] C. Cotta. On the evolutionary inference of temporal Boolean networks. In J. Mira and
J.R. Álvarez, Eds., Computational Methods in Neural Modeling, Vol. 2686 of Lecture Notes in
Computer Science. Springer-Verlag, Berlin, Heidelberg, 2003, pp. 494–501.
[30] C. Ramsey and J.J. Grefensttete. Case-based initialization of genetic algorithms. In S. Forrest,
Ed., Proceedings of the Fifth International Conference on Genetic Algorithms. Morgan Kaufmann,
San Mateo, CA, 1993, pp. 84–91.
[31] J.E. Baker. Reducing bias and inefficiency in the selection algorithm. In J.J. Grefenstette, Ed.,
Proceedings of the Second International Conference on Genetic Algorithms. Lawrence Erlbaum
Associates, Hillsdale, NJ, 1987, pp. 14–21.
[32] D.L. Whitley. Using reproductive evaluation to improve genetic search and heuristic discovery.
In J.J. Grefenstette, Ed., Proceedings of the Second International Conference on Genetic Algorithms.
Lawrence Erlbaum Associates, Hillsdale, NJ, 1987, pp. 116–121.
[33] T. Bickle and L. Thiele. A mathematical analysis of tournament selection. In L.J. Eshelman,
Ed., Proceedings of the Sixth International Conference on Genetic Algorithms. Morgan Kaufmann,
San Francisco, CA, 1995, pp. 9–16.
[34] E. Cantú-Paz. Order statistics and selection methods of evolutionary algorithms. Information
Processing Letters, 82: 15–22, 2002.
[35] K. Deb and D. Goldberg. A comparative analysis of selection schemes used in genetic algorithms.
In G.J. Rawlins, Ed., Foundations of Genetic Algorithms. San Mateo, CA, 1991, pp. 69–93.
[36] E. Alba and J.M. Troya. A survey of parallel distributed genetic algorithms. Complexity, 4: 31–52,
1999.
[37] A.E. Eiben, P.-E. Raue, and Zs. Ruttkay. Genetic algorithms with multi-parent recombination.
In Y. Davidor, H.-P. Schwefel, and R. Männer, Eds., Parallel Problem Solving from Nature
III, Vol. 866 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg, 1994,
pp. 78–87.
[38] C. Cotta and J.M. Troya. Information processing in transmitting recombination. Applied
Mathematics Letters, 16: 945–948, 2003.
[39] N.J. Radcliffe. The algebra of genetic algorithms. Annals of Mathematics and Artificial Intelligence,
10: 339–384, 1994.
[40] G. Syswerda. Uniform crossover in genetic algorithms. In J.D. Schaffer, Ed., Proceedings of the
Third International Conference on Genetic Algorithms. Morgan Kaufmann, San Mateo, CA, 1989,
pp. 2–9.
[41] L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold Computer Library, New York,
1991.
[42] D.E. Goldberg and R. Lingle, Jr. Alleles, loci and the traveling salesman problem.
In J.J. Grefenstette, Ed., Proceedings of an International Conference on Genetic Algorithms.
Lawrence Erlbaum Associates, Hillsdale, NJ, 1985.
[43] C. Cotta and J.M. Troya. Genetic forma recombination in permutation flowshop problems.
Evolutionary Computation, 6: 25–44, 1998.
[44] L.J. Eshelman and J.D. Schaffer. Real-coded genetic algorithms and interval-schemata. In
D. Whitley, Ed., Foundations of Genetic Algorithms 2. Morgan Kaufmann Publishers, San Mateo,
CA, 1993, pp. 187–202.
[45] F. Herrera, M. Lozano, and J.L. Verdegay. Dynamic and heuristic fuzzy connectives-based cros-
sover operators for controlling the diversity and convengence of real coded genetic algorithms.
Journal of Intelligent Systems, 11: 1013–1041, 1996.
[46] E. Alba, J.F. Aldana, and J.M. Troya. Full automatic ANN design: A genetic approach. In J. Cabestany,
J. Mira, and A. Prieto, Eds., New Trends in Neural Computation, Vol. 686 of Lecture Notes in
Computer Science. Springer-Verlag, Heidelberg, 1993, pp. 399–404.
[47] E. Alba and J.M. Troya. Genetic algorithms for protocol validation. In H.M. Voigt, W. Ebeling,
I. Rechenberg, and H.-P. Schwefel, Eds., Parallel Problem Solving from Nature IV. Springer-Verlag,
Berlin, Heidelberg, 1996, pp. 870–879.
[48] C. Cotta and J.M. Troya. Analyzing directed acyclic graph recombination. In B. Reusch, Ed.,
Computational Intelligence: Theory and Applications, Vol. 2206 of Lecture Notes in Computer
Science. Springer-Verlag, Berlin, Heidelberg, 2001, pp. 739–748.
[49] E. Alba, C. Cotta, and J.M. Troya. Evolutionary design of fuzzy logic controllers using strongly-
typed GP. Mathware & Soft Computing, 6: 109–124, 1999.
[50] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-
Wesley, Reading, MA, 1989.
[51] A.E. Eiben and J.E. Smith. Introduction to Evolutionary Computing. Springer-Verlag, Berlin,
Heidelberg, 2003.
[52] C.H. Clelland and D.A. Newlands. PFSA modelling of behavioural sequences by evolutionary
programming. In R.J. Stonier and X.H. Yu, Eds., Complex Systems: Mechanism for Adaptation.
IOS Press, Rockhampton, Queensland, Australia, 1994, pp. 165–172.
[53] X. Yao and Y. Liu. A new evolutionary system for evolving artificial neural networks. IEEE
Transactions on Neural Networks, 8: 694–713, 1997.
[54] M.L. Wong, W. Lam, and K.S. Leung. Using evolutionary programming and minimum descrip-
tion length principle for data mining of Bayesian networks. IEEE Transactions on Pattern Analysis
and Machine Intelligence, 21: 174–178, 1999.
[55] H.-G. Beyer. The Theory of Evolution Strategies. Springer-Verlag, Berlin, Heidelberg, 2001.
[56] F. Fernandez, L. Vanneschi, and M. Tomassini. The effect of plagues in genetic programming:
A study of variable-size populations. In C. Ryan et al., Eds., Genetic Programming, Proceedings of
EuroGP’2003, Vol. 2610 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg,
2003, pp. 320–329.
[57] S. Chatterjee, C. Carrera, and L. Lynch. Genetic algorithms and traveling salesman problems.
European Journal of Operational Research, 93: 490–510, 1996.
[58] D.B. Fogel. An evolutionary approach to the traveling salesman problem. Biological Cybernetics,
60: 139–144, 1988.
[59] P. Merz and B. Freisleben. Genetic local search for the TSP: New Results. In Proceedings of the
1997 IEEE International Conference on Evolutionary Computation. IEEE Press, Indianapolis, USA,
1997, pp. 159–164.
[60] C. Cotta and J.M. Troya. A hybrid genetic algorithm for the 0–1 multiple knapsack problem.
In G.D. Smith, N.C. Steele, and R.F. Albrecht, Eds., Artificial Neural Nets and Genetic Algorithms
3. Springer-Verlag, Wien New York, 1998, pp. 251–255.
[61] S. Khuri, T. Bäck, and J. Heitkötter. The zero/one multiple knapsack problem and genetic
algorithms. In E. Deaton, D. Oppenheim, J. Urban, and H. Berghel, Eds., Proceedings of the 1994
ACM Symposium of Applied Computation proceedings. ACM Press, New York, 1994, pp. 188–193.
[62] R. Berretta, C. Cotta, and P. Moscato. Enhancing the performance of memetic algorithms by
using a matching-based recombination algorithm: Results on the number partitioning problem.
In M. Resende and J. Pinho de Sousa, Eds., Metaheuristics: Computer-Decision Making. Kluwer
Academic Publishers, Boston, MA, 2003, pp. 65–90.
[63] D.R. Jones and M.A. Beltramo. Solving partitioning problems with genetic algorithms. In
R.K. Belew and L.B. Booker, Eds., In Proceedings of the Fourth International Conference on Genetic
Algorithms. Morgan Kaufmann, San Mateo, CA, 1991, pp. 442–449.
[64] C.C. Aggarwal, J.B. Orlin, and R.P. Tai. Optimized crossover for the independent set problem.
Operations Research, 45: 226–234, 1997.
[65] M. Hifi. A genetic algorithm-based heuristic for solving the weighted maximum independent set
and some equivalent problems. Journal of the Operational Research Society, 48: 612–622, 1997.
[66] D. Costa, N. Dubuis, and A. Hertz. Embedding of a sequential procedure within an evolutionary
algorithm for coloring problems in graphs. Journal of Heuristics, 1: 105–128, 1995.
[67] C. Fleurent and J.A. Ferland. Genetic and hybrid algorithms for graph coloring. Annals of
Operations Research, 63: 437–461, 1997.
[68] S. Cavalieri and P. Gaiardelli. Hybrid genetic algorithms for a multiple-objective scheduling
problem. Journal of Intelligent Manufacturing, 9: 361–367, 1998.
[69] D. Costa. An evolutionary tabu search algorithm and the NHL scheduling problem. INFOR, 33:
161–178, 1995.
[70] C.F. Liaw. A hybrid genetic algorithm for the open shop scheduling problem. European Journal
of Operational Research, 124: 28–42, 2000.
[71] L. Ozdamar. A genetic algorithm approach to a general category project scheduling problem.
IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews), 29: 44–59,
1999.
[72] E.K. Burke, J.P. Newall, and R.F. Weare. Initialisation strategies and diversity in evolutionary
timetabling. Evolutionary Computation, 6: 81–103, 1998.
[73] B. Paechter, R.C. Rankin, and A. Cumming. Improving a lecture timetabling system for university
wide use. In E.K. Burke and M. Carter, Eds., The Practice and Theory of Automated Timetabling
II, Vol. 1408 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, 1998, pp. 156–165.
[74] K. Haase and U. Kohlmorgen. Parallel genetic algorithm for the capacitated lot-sizing prob-
lem. In Kleinschmidt et al., Eds., Operations Research Proceedings. Springer-Verlag, Berlin, 1996,
pp. 370–375.
[75] J. Berger and M. Barkaoui. A hybrid genetic algorithm for the capacitated vehicle routing prob-
lem. In E. Cantú-Paz, Ed., Proceedings of the Genetic and Evolutionary Computation Conference
2003, Vol. 2723 of Lecture Notes in Computer Science. Springer-Verlag, Berlin, Heidelberg, 2003,
pp. 646–656.
[76] J. Berger, M. Salois, and R. Begin. A hybrid genetic algorithm for the vehicle routing problem with
time windows. In R.E. Mercer and E. Neufeld, Eds., Advances in Artificial Intelligence. 12th Biennial
Conference of the Canadian Society for Computational Studies of Intelligence. Springer-Verlag,
Berlin, 1998, pp. 114–127.
[77] P. Merz and B. Freisleben. Genetic algorithms for binary quadratic programming. In W. Banzhaf
et al., Eds., Proceedings of the 1999 Genetic and Evolutionary Computation Conference,
Morgan Kaufmann, San Francisco, CA, 1999, pp. 417–424.
[78] P. Merz and B. Freisleben. Fitness landscape analysis and memetic algorithms for the quadratic
assignment problem. IEEE Transactions on Evolutionary Computation, 4: 337–352, 2000.
[79] E. Hopper and B. Turton. A genetic algorithm for a 2d industrial packing problem. Computers &
Industrial Engineering, 37: 375–378, 1999.
[80] R.M. Krzanowski and J. Raper. Hybrid genetic algorithm for transmitter location in wireless
networks. Computers, Environment and Urban Systems, 23: 359–382, 1999.
[81] M. Gen, K. Ida, and L. Yinzhen. Bicriteria transportation problem by hybrid genetic algorithm.
Computers & Industrial Engineering, 35: 363–366, 1998.
[82] P. Calegar, F. Guidec, P. Kuonen, and D. Wagner. Parallel island-based genetic algorithm for radio
network design. Journal of Parallel and Distributed Computing, 47: 86–90, 1997.
[83] C. Vijayanand, M.S. Kumar, K.R. Venugopal, and P.S. Kumar. Converter placement in all-optical
networks using genetic algorithms. Computer Communications, 23: 1223–1234, 2000.
[84] C. Cotta and J.M. Troya. A comparison of several evolutionary heuristics for the frequency
assignment problem. In J. Mira and A. Prieto, Eds., Connectionist Models of Neurons, Learning
Processes, and Artificial Intelligence, Vol. 2084 of Lecture Notes in Computer Science. Springer-
Verlag, Berlin, Heidelberg, 2001, pp. 709–716.
[85] R. Dorne and J.K. Hao. An evolutionary approach for frequency assignment in cellular radio
networks. In 1995 IEEE International Conference on Evolutionary Computation. IEEE Press, Perth,
Australia, 1995, pp. 539–544.
[86] A. Kapsalis, V.J. Rayward-Smith, and G.D. Smith. Using genetic algorithms to solve the radio link
frequency assignment problem. In D.W. Pearson, N.C. Steele, and R.F. Albretch, Eds., Artificial
Neural Nets and Genetic Algorithms. Springer-Verlag, Wien New York, 1995, pp. 37–40.
[87] C.H. Chu, G. Premkumar, and H. Chou. Digital data networks design using genetic algorithms.
European Journal of Operational Research, 127: 140–158, 2000.
[88] N. Swaminathan, J. Srinivasan, and S.V. Raghavan. Bandwidth-demand prediction in virtual path
in ATM networks using genetic algorithms. Computer Communications, 22: 1127–1135, 1999.
[89] H. Chen, N.S. Flann, and D.W. Watson. Parallel genetic simulated annealing: A massively parallel
SIMD algorithm. IEEE Transactions on Parallel and Distributed Systems, 9: 126–136, 1998.
[90] K. Dontas and K. De Jong. Discovery of maximal distance codes using genetic algorithms.
In Proceedings of the Second International IEEE Conference on Tools for Artificial Intelligence. IEEE
Press, Herndon, VA, 1990, pp. 805–811.
[91] D.W. Corne, M.J. Oates, and G.D. Smith. Telecommunications Optimization: Heuristic and
Adaptive Techniques. John Wiley, New York, 2000.
[92] I.C. Yeh. Hybrid genetic algorithms for optimization of truss structures. Computer Aided Civil
and Infrastructure Engineering, 14: 199–206, 1999.
[93] D. Quagliarella and A. Vicini. Hybrid genetic algorithms as tools for complex optimisation prob-
lems. In P. Blonda, M. Castellano, and A. Petrosino, Eds., New Trends in Fuzzy Logic II. Proceedings
of the Second Italian Workshop on Fuzzy Logic. World Scientific, Singapore, 1998, pp. 300–307.
[94] A.J. Urdaneta, J.F. Gómez, E. Sorrentino, L. Flores, and R. Díaz. A hybrid genetic algorithm for
optimal reactive power planning based upon successive linear programming. IEEE Transactions
on Power Systems, 14: 1292–1298, 1999.
[95] M. Guotian and L. Changhong. Optimal design of the broadband stepped impedance transformer
based on the hybrid genetic algorithm. Journal of Xidian University, 26: 8–12, 1999.
[96] B. Becker and R. Drechsler. OFDD based minimization of fixed polarity Reed-Muller expressions
using hybrid genetic algorithms. In Proceedings of the IEEE International Conference on Computer
Design: VLSI in Computers and Processor. IEEE, Los Alamitos, CA, 1994, pp. 106–110.
[97] J.B. Grimbleby. Hybrid genetic algorithms for analogue network synthesis. In Proceedings of the
1999 Congress on Evolutionary Computation. IEEE, Washington D.C., 1999, pp. 1781–1787.
[98] A. Augugliaro, L. Dusonchet, and E. Riva-Sanseverino. Service restoration in compensated dis-
tribution networks using a hybrid genetic algorithm. Electric Power Systems Research, 46: 59–66,
1998.
[99] M. Sipper and C.A. Peña Reyes. Evolutionary computation in medicine: An overview. Artificial
Intelligence in Medicine, 19: 1–23, 2000.
[100] R. Wehrens, C. Lucasius, L. Buydens, and G. Kateman. HIPS, A hybrid self-adapting expert system
for nuclear magnetic resonance spectrum interpretation using genetic algorithms. Analytica
Chimica ACTA, 277: 313–324, 1993.
[101] J. Alander. Indexed Bibliography of Genetic Algorithms in Economics. Technical report
94-1-ECO, University of Vaasa, Department of Information Technology and Production
Economics, 1995.
[102] F. Li, R. Morgan, and D. Williams. Economic environmental dispatch made easy with hybrid
genetic algorithms. In Proceedings of the International Conference on Electrical Engineering, Vol.
2. International Academic Publishers, Beijing, China, 1996, pp. 965–969.
[103] C. Reich. Simulation of imprecise ordinary differential equations using evolutionary algorithms.
In J. Carroll, E. Damiani, H. Haddad, and D. Oppenheim, Eds., ACM Symposium on Applied
Computing 2000. ACM Press, New York, 2000, pp. 428–432.
[104] X. Wei and F. Kangling. A hybrid genetic algorithm for global solution of nondifferentiable
nonlinear function. Control Theory & Applications, 17: 180–183, 2000.
[105] C. Cotta and P. Moscato. Inferring phylogenetic trees using evolutionary algorithms.
In J.J. Merelo, P. Adamidis, H.-G. Beyer, J.-L. Fernández-Villacañas, and H.-P. Schwefel, Eds.,
Parallel Problem Solving from Nature VII, Vol. 2439 of Lecture Notes in Computer Science.
Springer-Verlag, Berlin, 2002, pp. 720–729.
[106] G.B. Fogel and D.W. Corne. Evolutionary Computation in Bioinformatics. Morgan Kaufmann,
San Francisco, CA, 2003.
[107] R. Thomsen, G.B. Fogel, and T. Krink. A clustal alignment improver using evolution-
ary algorithms. In David B. Fogel, Xin Yao, Garry Greenwood, Hitoshi Iba, Paul Marrow,
and Mark Shackleton, Eds., Proceedings of the Fourth Congress on Evolutionary Computation
(CEC-2002) Vol. 1. 2002, pp. 121–126.
[108] E. Alba. Parallel evolutionary algorithms can achieve super-linear performance. Information
Processing Letters, 82: 7–13, 2002.
2.1 Introduction
Artificial Neural Networks have been one of the most active areas of research in computer science over
the last 50 years, with periods of intense activity interrupted by episodes of hiatus [1]. The premise for
the theory of artificial neural networks stems from the basic neurological structure of living organisms.
The cell is the most important constituent of these life forms. Cells are connected by “synapses,” the links
that carry messages between them. By using synapses to carry pulses, cells can activate each other with
different threshold values to form a decision or memorize an event. Inspired by this simplistic vision of
how messages are transferred between cells, scientists invented a new computational approach, which
became popularly known as Artificial Neural Networks (or Neural Networks for short), and used it
extensively to target a wide range of problems in many application areas.
Although the configurations of different Neural Networks may look different at first
glance, they are very similar in structure. Every neural network consists of "cells" and "links." Cells are
the computational part of the network: they perform reasoning and generate activation signals for other
cells, while links connect the different cells and enable messages to flow between them. Each link is usually
a one-directional connection with a weight that affects the carried message in a certain way. This means
that a link receives a value (message) from an input cell, multiplies it by a given weight, and then passes it
to the output cell. In its simplest form, a cell can have three states (of activation): +1 (TRUE), 0, and −1
(FALSE) [1].
$$y = f(W \cdot X + b), \qquad W = (w_1\ w_2\ \cdots\ w_n), \qquad X = (x_1\ x_2\ \cdots\ x_n)^T.$$
The above-mentioned basic structure can be extended to produce networks with more than one output.
In this case, each output has its own weights and is completely uncorrelated to the other outputs. Figure 2.3
shows the structure of such a network.
FIGURE 2.1 (a) Unbiased and (b) biased structure of a neural network.
[Figure 2.2: a single cell with inputs x1, . . . , xn, weights w1, . . . , wn, bias b, and output y.]
[Figure 2.3: a single-layer network with n inputs, m outputs, weights wi,j, and biases b1, . . . , bm.]
$$Y = F(W \cdot X + B),$$

$$W = \begin{pmatrix} w_{1,1} & w_{1,2} & \cdots & w_{1,n} \\ w_{2,1} & & & \\ \vdots & & \ddots & \\ w_{m,1} & & \cdots & w_{m,n} \end{pmatrix},$$

$$X = (x_1\ x_2\ \cdots\ x_n)^T, \quad Y = (y_1\ y_2\ \cdots\ y_m)^T, \quad B = (b_1\ b_2\ \cdots\ b_m)^T,$$

$$F(\cdot) = (f_1(\cdot)\ f_2(\cdot)\ \cdots\ f_m(\cdot))^T,$$

where n is the number of inputs, m the number of outputs, W the weighting matrix, X the input vector,
Y the output vector, and F(·) the array of output functions.
[Figure 2.4: a multi-layer perceptron built by concatenating p single-layer networks, with layer outputs z^1, z^2, . . . , z^p and biases b^1, . . . , b^p.]
A multi-layer perceptron can simply be constructed by concatenating several single-layer perceptron
networks. Figure 2.4 shows the basic structure of such a network with the following parameters [1]: X is
the input vector, Y the output vector, n the number of inputs, m the number of outputs, p the total number
of layers in the network, $m_i$ the number of outputs of the ith layer, and $n_i$ the number of inputs of the
ith layer.
Note that in this network, every internal layer can have its own number of inputs and
outputs, subject only to the concatenation rule $n_i = m_{i-1}$. The output of the first layer is
calculated as follows:
$$Z^1 = F^1(W^1 \cdot X + B^1),$$

$$W^1 = \begin{pmatrix} w^1_{1,1} & w^1_{1,2} & \cdots & w^1_{1,n} \\ w^1_{2,1} & & & \\ \vdots & & \ddots & \\ w^1_{m_1,1} & & \cdots & w^1_{m_1,n} \end{pmatrix}, \qquad X = (x_1\ x_2\ \cdots\ x_n)^T,$$

and the output of every subsequent layer is computed from the output of the layer before it:

$$Z^2 = F^2(W^2 \cdot Z^1 + B^2),$$

$$W^2 = \begin{pmatrix} w^2_{1,1} & w^2_{1,2} & \cdots & w^2_{1,n_2} \\ w^2_{2,1} & & & \\ \vdots & & \ddots & \\ w^2_{m_2,1} & & \cdots & w^2_{m_2,m_1} \end{pmatrix},$$

until the last layer produces the output of the network:

$$Y = Z^p = F^p(W^p \cdot Z^{p-1} + B^p),$$

$$W^p = \begin{pmatrix} w^p_{1,1} & w^p_{1,2} & \cdots & w^p_{1,n_p} \\ w^p_{2,1} & & & \\ \vdots & & \ddots & \\ w^p_{m_p,1} & & \cdots & w^p_{m_p,m_{p-1}} \end{pmatrix},$$

$$B^p = (b^p_1\ b^p_2\ \cdots\ b^p_{m_p})^T, \quad Z^p = (z^p_1\ z^p_2\ \cdots\ z^p_{m_p})^T, \quad F^p(\cdot) = (f^p_1(\cdot)\ f^p_2(\cdot)\ \cdots\ f^p_{m_p}(\cdot))^T.$$
Notice that in such networks, the complexity grows rapidly with the number of layers. In practice,
every multi-layer perceptron can be emulated by a single-layer perceptron with a comparatively large
number of nodes.
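As a minimal illustration of the concatenation rule above (our own sketch, not part of the original formulation; all names are illustrative), the forward pass of a multi-layer perceptron reduces to folding the single-layer computation over a list of layers:

```python
import numpy as np

def mlp_forward(x, layers):
    """Forward pass of a multi-layer perceptron.

    `layers` is a list of (W, b, f) triples, one per layer; the output of
    each layer is the input of the next (n_i = m_{i-1}), so that
    Z^i = F^i(W^i . Z^{i-1} + B^i) and Y = Z^p.
    """
    z = x
    for W, b, f in layers:
        z = f(W @ z + b)
    return z

# Example: a 3-2-1 network with tanh hidden units and a linear output.
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((2, 3)), np.zeros(2), np.tanh),
          (rng.standard_normal((1, 2)), np.zeros(1), lambda v: v)]
print(mlp_forward(np.array([0.5, -1.0, 2.0]), layers))
```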
[Figure: a perceptron with weights 1.4, 1.4 and bias −0.7 computing the logical AND of two inputs x1, x2 ∈ {−1, +1}, together with its truth table and decision boundary.]
$$W = (w_0\ w_1\ \cdots\ w_n),$$
$$T = \{(R^1, S^1), (R^2, S^2), \ldots, (R^L, S^L)\},$$
where n is the number of inputs, $R^i$ is the ith input datum, $S^i$ represents the appropriate output for the ith
pattern, and L is the size of the training set. Note that, for the above vector W, $w_n$ is used to adjust the
bias in the values of the weights. The Perceptron Learning algorithm can be summarized as follows:
Step 1: Set all elements of the weighting vector to zero, that is, W = (0 0 · · · 0).
Step 2: Select a training pattern at random, say the kth datum.
Step 3: IF the current pattern is not classified correctly, that is, $W \cdot R^k \neq S^k$, then modify the
weighting vector as follows: $W \leftarrow W + R^k S^k$.
Step 4: Repeat steps 2 and 3 until all data are classified correctly.
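A minimal sketch of this procedure in Python follows (our own illustration, assuming bipolar targets $S^k \in \{-1, +1\}$ and patterns augmented with a constant entry so that one weight plays the role of the bias, as noted above):

```python
import numpy as np

def perceptron_learning(R, S, max_epochs=1000, seed=0):
    """Perceptron Learning: R is an (L, n+1) array of training patterns
    (each augmented with a constant 1 for the bias weight), S an array
    of bipolar targets in {-1, +1}."""
    W = np.zeros(R.shape[1])                 # Step 1: zero weight vector
    rng = np.random.default_rng(seed)
    for _ in range(max_epochs):
        errors = 0
        for k in rng.permutation(len(R)):    # Step 2: random pattern
            if np.sign(W @ R[k]) != S[k]:    # Step 3: misclassified?
                W = W + R[k] * S[k]          #   pull W toward the pattern
                errors += 1
        if errors == 0:                      # Step 4: everything correct
            break
    return W
```

Note that, as discussed for the MSE algorithms below, this loop never terminates when the training data are not linearly separable, which is why a bound on the number of epochs is included.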
$$\langle T_i, T_j \rangle \approx \begin{cases} 0, & i \neq j, \\ 1, & i = j, \end{cases} \qquad W = \sum_{i=1}^{N} T_i \otimes T_i.$$
As can be seen, the main advantage of this network is its one-shot learning process, provided the
data are orthogonal. Note that, even if the input data are not orthogonal in the first place, they can be
transformed to a new space by a simple transfer function.
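As a sketch (ours, not from the text), the one-shot construction $W = \sum_i T_i \otimes T_i$ is a single outer-product sum in numpy; with orthonormal patterns, each stored pattern is reproduced exactly:

```python
import numpy as np

# Four approximately orthonormal 8-dimensional patterns (rows of T):
# the Q factor of a QR decomposition has orthonormal columns.
rng = np.random.default_rng(1)
T = np.linalg.qr(rng.standard_normal((8, 4)))[0].T

# One-shot learning: W = sum_i T_i (outer product) T_i.
W = sum(np.outer(t, t) for t in T)

# Each stored pattern is recalled in a single pass through the network.
print(np.allclose(W @ T[0], T[0]))          # True
```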
2.3.1.3 Iterative Learning
Iterative learning is another approach that can be used to train a network. In contrast to the one-shot
learning algorithms, the network's weights are modified smoothly: the weights are first set to some
arbitrary values, then the training data are fed to the network and the weights are adjusted slightly in
each training cycle. The training process proceeds until the network reaches an acceptable level of
accuracy. The training data may be selected either sequentially or randomly in each training
cycle [9–11].
2.3.1.4 Hopfield’s Model
A Hopfield neural network is another example of an auto-associative network [1,12–14]. There are two
main differences between this network and the previously described auto-associative network. In this
network, self-connection is not allowed, that is, $w_{i,i} = 0$ for all nodes. Also, inputs and outputs are either
0 or 1. The node activations are recomputed during each convergence cycle as follows:
$$S_i = \sum_{j=1}^{N} w_{i,j} \cdot u_j(t), \tag{2.1}$$

$$u_i = \begin{cases} 1 & \text{if } S_i \geq 0, \\ 0 & \text{if } S_i < 0. \end{cases} \tag{2.2}$$
After feeding a datum into the network, in each convergence cycle the nodes are selected by a uniform
random function; Equation (2.1) is evaluated for the selected node, and Equation (2.2) is then applied to
generate its output. This procedure continues until the network converges.
The proof of convergence for this network uses the notion of “energy.” This means that an energy value
is assigned to each state of the network and through the different iterations of the algorithm, the overall
energy is decreased until it reaches a steady state.
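A minimal sketch of this convergence loop (our own illustration, assuming a weight matrix W with zero diagonal that has already been trained):

```python
import numpy as np

def hopfield_recall(W, u, max_sweeps=100, seed=0):
    """Asynchronous recall for a binary Hopfield network: nodes are
    visited in random order, each recomputing its state via
    Equations (2.1) and (2.2), until no state changes."""
    u = u.copy()
    rng = np.random.default_rng(seed)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(len(u)):
            s_i = W[i] @ u                   # Equation (2.1)
            new = 1 if s_i >= 0 else 0       # Equation (2.2)
            if new != u[i]:
                u[i], changed = new, True
        if not changed:                      # steady state reached: the
            break                            # energy is at a minimum
    return u
```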
2.3.1.5 Mean Square Error Algorithms
These techniques emerged as an answer to the deficiencies experienced when using Perceptrons and other
simple networks [1,15]. One of the most important reasons is the inseparability of training data: if the data
used to train the network are not linearly separable, the training algorithm never terminates (Figure 2.8).
The other reason for using this technique is to converge to a better solution. In Perceptron learning,
the training process terminates right after finding the first answer regardless of its quality (i.e., sensitivity
of the answer). Figure 2.9 shows an example of such a case. Note that, although the answer found by
the Perceptron algorithm is correct (Figure 2.9[a]), the answer in Figure 2.9(b) is more robust. Finally,
another reason for using Mean Square Error (MSE) algorithms, which is crucial for most neural network
algorithms, is that of speed of convergence.
The MSE algorithm attempts to modify the network weights based on the overall error of all data. In this
case, assume that network input and output data are represented by Ti , Ri for i = 1, . . . , N , respectively.
Now the MSE error is defined as follows:
$$E = \frac{1}{N} \sum_{i=1}^{N} (W \cdot T_i - R_i)^2.$$
Note that the stated error is the summation of the individual errors over all the training data. In spite
of all the advantages gained by this training technique, there are several disadvantages; for example, the
network might not be able to correctly classify the data if they are widely spread apart (Figure 2.10). The other
disadvantage is that of the speed of convergence which may completely vary from one set of data to
another.
2.3.1.6 The Widrow–Hoff Rule or LMS Algorithm
In this technique, the network weights are modified after each iteration [1,16]: a training datum is selected
randomly, and the weights are then adjusted based on the corresponding error. This procedure
continues until the weights converge to an answer. For a randomly selected kth entry in the training data, the error
is calculated as follows:
$$\varepsilon = (W \cdot T_k - R_k)^2,$$

$$\nabla \varepsilon = \left( \frac{\partial \varepsilon}{\partial W_0}\ \ \frac{\partial \varepsilon}{\partial W_1}\ \cdots\ \frac{\partial \varepsilon}{\partial W_N} \right).$$
Hence,

$$\frac{\partial \varepsilon}{\partial W_j} = 2(W \cdot T_k - R_k) \cdot T_k.$$
Based on the Widrow–Hoff algorithm, the weights should be modified in the direction opposite to the
gradient. As a result, the final update formula for the weighting vector W is:

$$W \leftarrow W - \rho \cdot (W \cdot T_k - R_k) \cdot T_k.$$

Note that ρ is known as the learning rate; it absorbs the constant multiplier of 2.
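A compact sketch of this rule (ours; T and R hold the training pairs as in the text, and the bias is assumed to be folded into W):

```python
import numpy as np

def lms_train(T, R, rho=0.01, iterations=10_000, seed=0):
    """Widrow-Hoff / LMS: pick a random training pair and move W
    against the gradient of that pair's squared error."""
    rng = np.random.default_rng(seed)
    W = np.zeros(T.shape[1])
    for _ in range(iterations):
        k = rng.integers(len(T))             # random kth training datum
        error = W @ T[k] - R[k]              # signed error on pair k
        W -= rho * error * T[k]              # W <- W - rho (W.Tk - Rk) Tk
    return W
```

The learning rate ρ trades convergence speed against stability, which is one reason the speed of convergence can vary from one data set to another.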
FIGURE 2.11 Results for K-means clustering with (a) correct and (b) incorrect number of clusters.
Figure 2.11 shows an instance of applying such a network for data classification with the correct and
incorrect number of clusters.
2.4.3 ART1
This neural classifier, known as "Adaptive Resonance Theory" or ART, deals with digital inputs ($T_i \in \{0, 1\}$).
In this network, each "1" in the input vector represents information, while a "0" entry is considered noise or
unwanted information. In ART, there is no predefined number of classes before the start of classification;
in fact, the classes are generated during the classification process.
Moreover, each class prototype may include the characteristics of more than one training datum. The
basic idea of such a network relies on a similarity factor for data classification. In summary, every time
a datum is assigned to a cluster, the class nearest to this datum is found first; then, if the similarity between
this datum and the class prototype exceeds a predefined value, known as the vigilance factor, the
datum is assigned to this class and the class prototype is modified to have more similarity with the new
data entry [1,22,23].
The following procedure shows how this algorithm is implemented (here |x| denotes the number of "1"
entries of a binary vector x):
Step 1: Let β be a small number, n be the dimension of the input data, and ρ be the vigilance factor
(0 ≤ ρ < 1).
Step 2: Start with no class prototypes.
Step 3: Select a training datum at random, $T_k$.
Step 4: Find the nearest unchecked class prototype $C_i$ to this datum by maximizing $(C_i \cdot T_k)/(\beta + |C_i|)$.
Step 5: Test if $C_i$ is sufficiently close to $T_k$ by verifying that $(C_i \cdot T_k)/(\beta + |C_i|) > |T_k|/(\beta + n)$.
Step 6: If it is not similar enough, then make a new class prototype and go to step 3.
Step 7: If it is sufficiently similar, check the vigilance factor: $(C_i \cdot T_k)/|T_k| \geq \rho$.
Step 8: If the vigilance factor is exceeded, then modify the class prototype by $C_i = C_i \cap T_k$ and go to step 3.
Step 9: If the vigilance factor is not exceeded, then try to find another unchecked class prototype in step 4.
Step 10: Repeat steps 3 to 9 until none of the training data causes any change in the class prototypes.
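The procedure translates almost line by line into code. The following is a rough sketch of our own (prototypes and patterns are binary numpy vectors, |x| is computed as x.sum(), and each pattern is assumed to contain at least one "1"):

```python
import numpy as np

def art1(data, beta=1.0, rho=0.5, epochs=10):
    """ART1 sketch for binary patterns (rows of `data`, dtype int)."""
    n = data.shape[1]
    prototypes = []                              # Step 2: no classes yet
    for _ in range(epochs):                      # Step 10: repeat
        for t in data:
            # Step 4: order prototypes by (Ci . Tk)/(beta + |Ci|), nearest first.
            order = sorted(range(len(prototypes)),
                           key=lambda i: (prototypes[i] @ t) /
                                         (beta + prototypes[i].sum()),
                           reverse=True)
            assigned = False
            for i in order:
                c = prototypes[i]
                # Steps 5/6: proximity test; by the ordering, if the nearest
                # prototype fails, all of them do, so a new class is needed.
                if (c @ t) / (beta + c.sum()) <= t.sum() / (beta + n):
                    break
                # Steps 7/8: vigilance test and prototype update.
                if (c @ t) / t.sum() >= rho:
                    prototypes[i] = c & t        # Ci <- Ci AND Tk
                    assigned = True
                    break
                # Step 9: otherwise try the next-nearest prototype.
            if not assigned:                     # Step 6: new class prototype
                prototypes.append(t.copy())
    return prototypes
```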
2.4.4 ART2
This is a variation of ART1; its main difference is that it is designed to handle continuous-valued (analog) rather than purely binary input patterns.
[Figure 2.13: a network with a single hidden layer: inputs x1, . . . , xn, hidden nodes z1, . . . , zs with weights w^1, and outputs y1, . . . , ym with weights w^2.]
In this approach, an input is presented to the network and allowed to propagate "forward" through
the network, and the output is calculated. The output is then compared to a "desired" output (from the
training set) and an error is calculated. This error is then propagated "backward" into the network and the
different weights are updated accordingly. To simplify the description of this algorithm, consider a network with
a single hidden layer (and two layers of weights) as given in Figure 2.13.
In relation to the above network, the following definitions apply. Of course, the same definitions can
easily be extended to larger networks.
It is important to note that, in such a network, different combinations of weights might produce the
same input/output relationship. However, this is not crucial as long as the network is able to "learn" this
association. As a result, the network weights may converge to different sets of values, based on the order of
the training data and the algorithm used for training, although their stability may differ.
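For concreteness, one training step for the single-hidden-layer network of Figure 2.13 might look as follows (a sketch of our own, assuming sigmoid activations and a squared-error objective; all names are illustrative):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def backprop_step(x, target, W1, b1, W2, b2, rate=0.5):
    """One forward/backward pass for the two weight layers of Figure 2.13;
    the weight arrays are updated in place."""
    # Forward propagation.
    z = sigmoid(W1 @ x + b1)                 # hidden-layer outputs
    y = sigmoid(W2 @ z + b2)                 # network outputs
    # Backward propagation of the error (the sigmoid derivative is y(1-y)).
    d2 = (y - target) * y * (1.0 - y)        # output-layer error terms
    d1 = (W2.T @ d2) * z * (1.0 - z)         # hidden-layer error terms
    # Gradient-descent weight updates.
    W2 -= rate * np.outer(d2, z); b2 -= rate * d2
    W1 -= rate * np.outer(d1, x); b1 -= rate * d1
    return y
```

Starting the weights from different random values typically drives such a network to different, equally valid weight configurations, in line with the remark above.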
[Figure 2.14: a radial basis function network with inputs x1, . . . , xn, radial functions Φ1(·), . . . , Φp(·), output weights l, and outputs y1, . . . , ym.]
The main topology of this network is shown in Figure 2.14. Many functions have been proposed for
possible use in the hidden layer; however, radial (Gaussian) functions remain the most effective for
data or pattern classification. The Gaussian functions are defined as follows:

$$\Phi_j(X) = \exp\left(-\tfrac{1}{2}(X - \mu_j)^T \Sigma_j^{-1} (X - \mu_j)\right),$$

where j = 1, 2, . . . , L and L represents the number of nodes in the hidden layer, X is the input vector, and
$\mu_j$ and $\Sigma_j$ are the mean vector and covariance matrix of the jth Gaussian function, respectively. In some
approaches, a polynomial term is appended to the above expression, while in others the functions are
normalized by the sum of all Gaussian components, as in Gaussian mixture estimation. Geometrically,
a radial basis function in this network represents a bump in the N-dimensional space, where N is the
number of entries (the input vector size). In this case, $\mu_j$ represents the location of this bump in the space
and $\Sigma_j$ models its shape.
Because of the nonlinear behavior of this network, the training procedure of the RBF network (as in
multi-layer networks) is approached in a completely different manner to that of single-layer networks.
In this network, the aim is to find the centers and variance factors of all hidden-layer Gaussian functions as
well as the optimal weights for the linear output layer. In this case, the following cost function is usually
considered as the main network objective:

$$\min \sum_{i=1}^{N} [Y(T_i) - R_i]^T \cdot [Y(T_i) - R_i],$$

where N is the number of training pairs in the data set, Y(X) is the output of the network for input X,
and $(T_k, R_k)$ is the kth training data pair. So, the actual output of the network is a combination of a
nonlinear computation followed by a linear operation. Therefore, finding an optimal set of weights for the
hidden-layer and output-layer parameters is hardly achievable.
In this case, several approaches have been used to find the optimal set of weights; however, none of them can
guarantee that optimality is achieved. For example, many approaches suggest setting the
hidden-layer parameters randomly and carrying out the training procedure only for the linear output
components. In other cases, the radial basis functions are distributed homogeneously
over the sample space before the linear output weights are found. However, the back-propagation algorithm
seems to be the most suitable approach for training such a network.
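The simplest of these schemes, randomly placed centers followed by a linear least-squares fit of the output weights, is easy to sketch (our own illustration, restricted to spherical Gaussians, i.e. $\Sigma_j = \sigma^2 I$):

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    """Phi[i, j] = exp(-|X_i - mu_j|^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def train_rbf(T, R, n_centers=10, sigma=1.0, seed=0):
    """Pick random training points as centers, then solve for the linear
    output layer as an ordinary least-squares problem."""
    rng = np.random.default_rng(seed)
    centers = T[rng.choice(len(T), n_centers, replace=False)]
    Phi = rbf_design_matrix(T, centers, sigma)
    W, *_ = np.linalg.lstsq(Phi, R, rcond=None)
    return centers, W

def rbf_predict(X, centers, W, sigma=1.0):
    return rbf_design_matrix(X, centers, sigma) @ W
```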
2.7 Conclusion
In this chapter, a general overview of artificial Neural Networks was presented. These networks vary
in their sophistication from the very simple to the more complex. As a result, their training techniques
vary, as do their capabilities and suitability for certain applications. Neural networks have attracted a
lot of interest over the last few decades, and it is expected they will remain an active area of research for years
to come. Undoubtedly, more robust neural techniques will be introduced in the future that could benefit
a wide range of complex applications.
References
[1] Gallant, S.I. Neural Network Learning and Expert Systems, MIT Press, Cambridge, MA, 1993.
[2] Karayiannis, N.B. and Venetsanopoulos, A.N. Efficient learning algorithms for Neural Networks
(ELEANNE). IEEE Transactions on Systems, Man, and Cybernetics, 23, 1993, 1372–1383.
[3] Hassoun, M.H. and Clark, D.W. An adaptive attentive learning algorithm for single-layer Neural
Networks. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 1,
July 24–27, 1988, pp. 431–440.
[4] Ulug, M.E. A single layer fast learning fuzzy controller/filter, Neural Networks. In Proceedings of
the IEEE World Congress on Computational Intelligence, Vol. 3, June 27–July 2, 1994, pp. 1662–1667.
[5] Karayiannis, N.B. and Venetsanopoulos, A.N. Fast learning algorithms for Neural Networks.
IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 39, 1992, 453–474.
[6] Hrycej, T. Back to single-layer learning principles. In Proceedings of the International Joint
Conference on Neural Networks, Seattle, Vol. 2, July 8–14, 1991, p. 945.
[7] Healy, M.J. A logical architecture for supervised learning, Neural Networks. In Proceedings of
the IEEE International Joint Conference on Neural Networks, Vol. 1, November 18–21, 1991,
pp. 190–195.
[8] Brandt, R.D. and Feng, L. Supervised learning in Neural Networks without feedback network. In
Proceedings of the IEEE International Symposium on Intelligent Control, September 15–18, 1996,
pp. 86–90.
[9] Gong, Y. and Yan, P. Neural network based iterative learning controller for robot manipulators. In
Proceedings of the IEEE International Conference on Robotics and Automation, Vol. 1, May 21–27,
1995, pp. 569–574.
[10] Park, S. and Han, T. Iterative inversion of fuzzified Neural Networks. IEEE Transactions on Fuzzy
Systems, 8, 2000, 266–280.
[11] Zhan, X., Zhao, K., Wu, S., Wang, M., and Hu, H. Iterative learning control for nonlinear systems
based on Neural Networks. In Proceedings of the IEEE International Conference on Intelligent
Processing Systems, Vol. 1, October 28–31, 1997, pp. 517–520.
[12] Chen, C.J., Haque, A.L., and Cheung, J.Y. An efficient simulation model of the Hopfield Neural
Networks. In Proceedings of the International Joint Conference on Neural Networks, Vol. 1, June 7–11,
1992, pp. 471–475.
[13] Galan-Marin, G. and Munoz-Perez, J. Design and analysis of maximum Hopfield networks. IEEE
Transactions on Neural Networks, 12, 2001, 329–339.
[14] Nasrabadi, N.M. and Li, W. Object recognition by a Hopfield neural network. IEEE Transactions
on Systems, Man and Cybernetics, 21, 1991, 1523–1535.
[15] Xu, J., Zhang, X., and Li, Y. Kernel MSE algorithm: a unified framework for KFD, LS-SVM and
KRR. In Proceedings of the International Joint Conference on Neural Networks, Vol. 2, July 15–19,
2001, pp. 1486–1491.
[16] Hayasaka, T., Toda, N., Usui, S., and Hagiwara, K. On the least square error and prediction square
error of function representation with discrete variable basis. In Proceedings of the Workshop on
Neural Networks for Signal Processing, Vol. VI. IEEE Signal Processing Society, September 4–6,
1996, pp. 72–81.
[17] Park, D.-C. Centroid neural network for unsupervised competitive learning. IEEE Transactions on
Neural Networks, 11, 2000, 520–528.
[18] Pedrycz, W. and Waletzky, J. Neural-network front ends in unsupervised learning. IEEE
Transactions on Neural Networks, 8, 1997, 390–401.
[19] Park, D.-C. Development of a neural network algorithm for unsupervised competitive learning.
In Proceedings of the International Conference on Neural Networks, Vol. 3, June 9–12, 1997,
pp. 1989–1993.
[20] Hsieh, K.-R. and Chen, W.-T. A neural network model which combines unsupervised and
supervised learning. IEEE Transactions on Neural Networks, 4, 1993, 357–360.
[21] Dajani, A.L., Kamel, M., and Elmastry, M.I. Single layer potential function neural network for
unsupervised learning. In Proceedings of the International Joint Conference on Neural Networks,
Vol. 2, June 17–21, 1990, pp. 273–278.
[22] Georgiopoulos, M., Heileman, G.L., and Huang, J. Properties of learning in ART1. In Proceedings
of the IEEE International Joint Conference on Neural Networks, Vol. 3, November 18–21, 1991,
pp. 2671–2676.
[23] Heileman, G.L., Georgiopoulos, M., and Hwang, J. A survey of learning results for ART1 networks.
In Proceedings of the IEEE International Conference on Neural Networks, IEEE World Congress on
Computational Intelligence, Vol. 2, June 27–July 2, 1994, pp. 1222–1225.
[24] Song, J. and Hassoun, M.H. Learning with hidden targets. In Proceedings of the International Joint
Conference on Neural Networks, Vol. 3, June 17–21, 1990, pp. 93–98.
[25] Kwan, H.K. Multilayer feedbackward Neural Networks. In Proceedings of the International
Conference on Acoustics, Speech, and Signal Processing, Vol. 2, April 14–17, 1991, pp. 1145–1148.
[26] Shepanski, J.F. Fast learning in artificial neural systems: multilayer perceptron training using
optimal estimation. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 1,
July 24–27, 1988, pp. 465–472.
[27] Karayiannis, N.B. and Randolph-Gips, M.M. On the construction and training of reformulated
radial basis function Neural Networks. IEEE Transactions on Neural Networks, 14, 2003, 835–846.
[28] Leonard, J.A. and Kramer, M.A. Radial basis function networks for classifying process faults. IEEE
Control Systems Magazine, 11, 1991, 31–38.
[29] Li, R., Lebby, G., and Baghavan, S. Performance evaluation of Gaussian radial basis function
network classifiers. In Proceedings of the IEEE, SoutheastCon, April 5–7, 2002, pp. 355–358.
[30] Heimes, F. and van Heuveln, B. The normalized radial basis function neural network. In Proceedings
of the IEEE International Conference on Systems, Man, and Cybernetics, Vol. 2, October 11–14, 1998,
pp. 1609–1614.
[31] Craddock, R.J. and Warwick, K. Multi-layer radial basis function networks. An extension to the
radial basis function. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 2,
June 3–6, 1996, pp. 700–705.
[32] Carr, J.C., Fright, W.R., and Beatson, R.K. Surface interpolation with radial basis functions for
medical imaging. IEEE Transactions on Medical Imaging, 16, 1997, 96–107.
[33] Romyaldy, M.A., Jr. Observations and guidelines on interpolation with radial basis function
network for one dimensional approximation problem. In Proceedings of the 26th Annual Conference
of the IEEE Industrial Electronics Society, Vol. 3, October 22–28, 2000, pp. 2129–2134.
[34] Leung, H., Lo, T., and Wang, S. Prediction of noisy chaotic time series using an optimal radial
basis function neural network. IEEE Transactions on Neural Networks, 12, 2001, 1163–1172.
[35] Katayama, R., Kajitani, Y., Kuwata, K., and Nishida, Y. Self generating radial basis function as neuro-
fuzzy model and its application to nonlinear prediction of chaotic time series. In Proceedings
of the Second IEEE International Conference on Fuzzy Systems, Vol. 1, March 28–April 1, 1993,
pp. 407–414.
[36] Warwick, K. and Craddock, R. An introduction to radial basis functions for system identification.
A comparison with other neural network methods. In Proceedings of the 35th IEEE Decision and
Control Conference, Vol. 1, December 11–13, 1996, pp. 464–469.
[37] Lu, Y., Sundararajan, N., and Saratchandran, P. Adaptive nonlinear system identification using
minimal radial basis function Neural Networks. In Proceedings of the IEEE International Conference
on Acoustics, Speech, and Signal Processing, Vol. 6, May 7–10, 1996, pp. 3521–3524.
[38] Tan, S., Hao, J., and Vandewalle, J. A new learning algorithm for RBF Neural Networks with
applications to nonlinear system identification. In Proceedings of the IEEE International Symposium
on Circuits and Systems, Vol. 3, April 28–May 3, 1995, pp. 1708–1711.
[39] Ibayashi, T., Hoya, T., and Ishida, Y. A model-following adaptive controller using radial basis
function networks. In Proceedings of the International Conference on Control Applications, Vol. 2,
September 18–20, 2002, pp. 820–824.
[40] Dash, P.K., Mishra, S., and Panda, G. A radial basis function neural network controller for UPFC.
IEEE Transactions on Power Systems, 15, 2000, pp. 1293–1299.
[41] Deng, J., Narasimhan, S., and Saratchandran, P. Communication channel equalization using
complex-valued minimal radial basis function Neural Networks. IEEE Transactions on Neural
Networks, 13, 2002, 687–696.
[42] Lee, J., Beach, C.D., and Tepedelenlioglu, N. Channel equalization using radial basis function
network. In Proceedings of the IEEE International Conference on Neural Networks, Vol. 4, June 3–6,
1996, pp. 1924–1928.
[43] Lee, J., Beach, C.D., and Tepedelenlioglu, N. Channel equalization using radial basis function
network. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, Vol. 3, May 7–10, 1996, pp. 1719–1722.
[44] Sankar, R. and Sethi, N.S. Robust speech recognition techniques using a radial basis function
neural network for mobile applications. In Proceedings of IEEE Southeastcon, April 12–14, 1997,
pp. 87–91.
[45] Ney, H. Speech recognition in a neural network framework: discriminative training of Gaussian
models and mixture densities as radial basis functions. In Proceedings of the IEEE International
Conference on Acoustics, Speech, and Signal Processing, Vol. 1, April 14–17, 1991, pp. 573–576.
[46] Cha, I. and Kassam, S.A. Nonlinear image restoration by radial basis function networks.
In Proceedings of the IEEE International Conference on Image Processing, Vol. 2, November 13–16,
1994, pp. 580–584.
[47] Cha, I. and Kassam, S.A. Nonlinear color image restoration using extended radial basis function
networks. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal
Processing, Vol. 6, May 7–10, 1996, pp. 3402–3405.
[48] Bors, A.G. and Pitas, I. Optical flow estimation and moving object segmentation based on median
radial basis function network. IEEE Transactions on Image Processing, 7, 1998, 693–702.
[49] Gao, D. and Yang, G. Adaptive RBF Neural Networks for pattern classifications. In Proceedings of
the International Joint Conference on Neural Networks, Vol. 1, May 12–17, 2002, pp. 846–851.
[50] Fan, C., Jin, Z., Zhang, J., and Tian, W. Application of multisensor data fusion based on RBF
Neural Networks for fault diagnosis of SAMS. In Proceedings of the 7th International Conference on
Control, Automation, Robotics and Vision, Vol. 3, December 2–5, 2002, pp. 1557–1562.
[51] Tou, J.T. and Gonzalez, R.C. Pattern Recognition, Addison-Wesley, Reading, MA, 1974.
[52] Lo, Z.-P., Yu, Y., and Bavarian, B. Derivation of learning vector quantization algorithms.
In Proceedings of the International Joint Conference on Neural Networks, Vol. 3, June 7–11, 1992,
pp. 561–566.
[53] Burrascano, P. Learning vector quantization for the probabilistic neural network. IEEE Transactions
on Neural Networks, 2, 1991, 458–461.
[54] Karayiannis, N.B. and Randolph-Gips, M.M. Soft learning vector quantization and clustering
algorithms based on non-Euclidean norms: multinorm algorithms. IEEE Transactions on Neural
Networks, 14, 2003, 89–102.
[55] Medsker, L. Design and development of hybrid neural network and expert systems. In Proceedings
of the IEEE International Conference on Neural Networks, IEEE World Congress on Computational
Intelligence, Vol. 3, June 27–July 2, 1994, pp. 1470–1474.
[56] Kurzyn, M.S. Expert systems and Neural Networks: a comparison, artificial Neural Networks and
expert systems. In Proceedings of the First International Two-Stream Conference on Neural Networks,
New Zealand, November 24–26, 1993, pp. 222–223.
[57] Hudli, A.V., Palakal, M.J., and Zoran, M.J. A neural network based expert system
model. In Proceedings of the Third International Conference on Tools for Artificial Intelligence,
November 10–13, 1991, pp. 145–149.
[58] Wang, W.-Y., Cheng, C.-Y., and Leu, Y.-G. An online GA-based output-feedback direct adaptive
fuzzy-neural controller for uncertain nonlinear systems. IEEE Transactions on Systems, Man and
Cybernetics, Part B, 34, 2004, 334–345.
[59] Zhang, Y., Peng, P.-Y., and Jiang, Z.-P. Stable neural controller design for unknown nonlinear
systems using backstepping. IEEE Transactions on Neural Networks, 11, 2000, 1347–1360.
[60] Nelson, A.L., Grant, E., and Lee, G. Developing evolutionary neural controllers for teams of mobile
robots playing a complex game. In Proceedings of the IEEE International Conference on Information
Reuse and Integration, October 27–29, 2003, pp. 212–218.
[61] Rothrock, L. Modeling human perceptual decision-making using an artificial neural network.
In Proceedings of the International Joint Conference on Neural Networks, Vol. 2, June 7–11, 1992,
pp. 448–452.
[62] Mukhopadhyay, S. and Wang, H. Distributed decomposition architectures for neural
decision-makers. In Proceedings of the 38th IEEE Conference on Decision and Control, Vol. 3,
December 7–10, 1999, pp. 2635–2640.
[63] Rogova, G., Scott, P., and Lolett, C. Distributed reinforcement learning for sequential decision
making. In Proceedings of the Fifth International Conference on Information Fusion, Vol. 2, July 8–11,
2002, pp. 1263–1268.
[64] Taheri, J. and Sadati, N. Fully modular online controller for robot navigation in static and
dynamic environments. In Proceedings of the 2003 IEEE International Symposium on Computational
Intelligence in Robotics and Automation, Vol. 1, July 16–20, 2003, pp. 163–168.
[65] Sadati, N. and Taheri, J. Genetic algorithm in robot path planning problem in crisp and fuzzified
environments. In Proceedings of the IEEE International Conference on Industrial Technology, Vol. 1,
December 11–14, 2002, pp. 175–180.
[66] Sadati, N. and Taheri, J. Solving robot motion planning problem using Hopfield neural network in
a fuzzified environment. In Proceedings of IEEE International Conference on Fuzzy Systems, Vol. 2,
May 12–17, 2002, pp. 1144–1149.
[67] Bambang, R. Active noise cancellation using recurrent radial basis function Neural Networks.
In Proceedings of the Asia-Pacific Conference on Circuits and Systems, Vol. 2, October 28–31, 2002,
pp. 231–236.
[68] Chen, C.K. and Chiueh, T.-D. Multilayer perceptron Neural Networks for active noise cancellation.
In Proceedings of the IEEE International Symposium on Circuits and Systems, Vol. 3, May 12–15,
1996, pp. 523–526.
[69] Tao, L. and Kwan, H.K. A neural network method for adaptive noise cancellation, Circuits and
Systems. In Proceedings of the IEEE International Symposium on Circuits and Systems, Vol. 5,
May 30–June 2, 1999, pp. 567–570.
the worker have changed, and the next phase of working is encouraged, which in turn results in new
surroundings, and so forth. Another example for indirect communication is the laying of pheromone
trails performed by certain species of ants. An ant foraging for food will mark its path by distributing an
amount of pheromone on the trail it is taking, encouraging (but not forcing) ants who are also foraging
for food to follow its path. The principle of modifying the environment in order to induce a change in
behavior as a means of communication is called stigmergy and was first proposed in Reference 5.
Stigmergy is the basis for the organization in many ant colonies. Although the ants have a queen, this
is a specialized ant which is only responsible for laying eggs and does not have any governing function.
Instead, the ants of a colony are self-organized. The term self-organization (SO) is used to describe the
complex behavior which emerges from the interaction of comparatively simple agents. Its origins lie in
the fields of physics and chemistry, where SO is used to describe microscopic operations resulting in
macroscopic patterns, see Reference 6. Through SO, the ants are able to solve the complex problems which
they encounter on a daily basis. The benefits of SO as a basis for problem solving are especially apparent
in its distributed and robust character. Effectively, an ant colony can maintain a meaningful behavior even
if a large number of ants are incapable of contributing for some amount of time.
To better understand the mechanism behind an ant colony’s ability to converge to good solutions when
looking for a short path from the nest to a food source, some experiments were undertaken in References 7
and 8. In Reference 8 a nest of the Argentine ant Linepithema humile was given two paths of identical
length that it could take to reach a food source, and after some time had passed, it was observed that the
ants had converged to one of the paths, following it practically to the exclusion of the alternative. To test
whether this type of ant would converge to the shorter of two alternate paths, an experimental setup
similar to the one depicted in Figure 3.1 was evaluated in Reference 7.
The Argentine ant is practically blind, so it has no means of directly identifying a shorter path. However,
despite this deficiency, a swarm of these ants is capable of finding the shorter path connecting the nest to
the foraging area containing the food, as the experiment shows. Initially, all ants are located at the nest
site. A number of ants start out from the nest in search of food, each ant laying pheromone on its path,
and reach the first fork at point A. Since the ants have no information which way to go, that is, no ant has
walked before them and left a pheromone trail, each ant will choose to go either right or left with equal
probability. As a consequence, about one half of the foraging ants will take the shorter route, the others the
longer route to intersection B. The ants which were on the shorter track will reach this intersection first,
and have to decide which way to turn. Again, there is no information for the ants to use as orientation,
so half of the ants reaching intersection B will turn back toward the nest, while the rest continues toward
the foraging area containing the food. The ants on the longer branch between intersections A and B,
unaffected by the other ants they met head-on, arrive at intersection B and will also split up; however,
[Figure 3.1: the double bridge experiment; the nest and the foraging area are connected by two branches of different length via intersections A and B.]
since the intensity of the pheromone trail heading back toward the nest is roughly twice as high as that of
the pheromone trail heading for the foraging area, the majority will turn back toward the nest, arriving
there at the same time as the other ants which took the long way back. Interestingly, since more ants have
now walked on the short branch between intersections A and B in comparison to the long one, future ants
leaving the nest will now already be more inclined to take the short branch, which is a first success in the
search for the shortest path.
The ants which continued toward the foraging area pick up some food to carry back to the nest. Arriving
at intersection B, the ants will prefer the short branch by the same argument as used above for ants starting
out fresh from the nest. Since the amount of pheromone at intersection A on the path back to the nest is
(roughly) equal to the sum of the pheromone amounts on the two branches leading away from the nest,
the shortest complete path from the foraging area back to the nest is also the most likely to be chosen by
the returning ants. Since the ants are continually distributing pheromone as they walk, the short path is
continually reinforced by more and more ants, until the amount of pheromone placed thereon in relation
to the alternative routes is so high that practically all ants use the shortest path, that is, the system converges
to the shortest path through self-reinforcement.
One point that we have neglected to mention so far is that the pheromone used by ants to mark their
trails slowly evaporates over time. This does not render the arguments used to explain the double bridge
experiment any less valid, it simply makes some of the math used for explanation less rigorous than
implied. Indeed, due to the evaporation of pheromone, a path that has not been chosen for some time,
invariably the long one, will contain almost no traces of pheromone after a sufficient amount of time,
further increasing the likelihood of ants taking the short path identified by the continually updated
pheromone.
In the rest of this chapter, we explain how the concept of marking paths with pheromones can be
exploited to construct algorithms capable of solving highly complex combinatorial optimization problems,
which was first proposed in Reference 9. The following section outlines the structure of Ant Colony
Optimization (ACO) algorithms using the Traveling Salesman Problem as an example. Afterwards, the
design decisions which must be made when implementing an ACO algorithm for a particular problem
are discussed. Finally, a number of applications where ACO algorithms lead to good results are surveyed.
After the initialization of the pheromone matrix, m ants per iteration each independently construct
a solution for the TSP instance. The solution of ant l is stored in a permutation vector πl , which contains
the edges that were traversed during tour construction, that is, ant l moved from city i to city πl (i) in
the tour. In order to ensure that each solution constructed is feasible, each ant maintains a selection set S
which, while the ant constructs a solution, contains all cities that still need to be visited to complete the
tour. The starting city for the ants is chosen randomly, since a tour is a circle visiting all cities and therefore
the starting position is arbitrary. Afterwards, as long as there are still cities that need to be visited, the ant
chooses the next city in the tour in a probabilistic fashion according to Equation (3.1), which is called the
random proportional transition rule:
$$p_{ij} = \frac{\tau_{ij}^{\alpha} \cdot \eta_{ij}^{\beta}}{\sum_{h \in S} \tau_{ih}^{\alpha} \cdot \eta_{ih}^{\beta}}. \tag{3.1}$$
The artificial ant tends to move to the city which has the best combination of pheromone information,
signifying a promising overall solution quality, and heuristic information, which equates to a short distance
to the immediate successor in the tour. The pheromone and heuristic information can be weighted with
the parameters $\alpha, \beta \in \mathbb{R}^+$ in order to gauge the influence of the respective information on the decision
process. For the TSP, using α = 1 and β = 5 yields good results; see Section 3.3 for more details.
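As a small sketch (ours), the rule can be implemented directly on a pheromone matrix tau and a heuristic matrix eta (for the TSP, eta[i][j] = 1/d_ij):

```python
import numpy as np

def choose_next_city(i, S, tau, eta, alpha=1.0, beta=5.0, rng=None):
    """Random proportional transition rule, Equation (3.1): choose the
    successor of city i among the still-unvisited cities in S."""
    if rng is None:
        rng = np.random.default_rng()
    S = np.fromiter(S, dtype=int)
    weights = tau[i, S] ** alpha * eta[i, S] ** beta
    return rng.choice(S, p=weights / weights.sum())
```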
Once the artificial ant has visited all the cities initially in S, it returns to the starting city, thereby
completing the tour. After m ants have completed their individual tours in this fashion, the solutions are
evaluated and the pheromone information is updated. The update consists of two parts: evaporation and
intensification. The purpose of evaporation is to diminish all pheromone values by a relative amount,
and is accomplished by multiplying all pheromone values with a factor (1 − ρ), where ρ is called the
evaporation rate. For the intensification, the best ant of the iteration (with solution π⁺) and the best
ant found by the algorithm over all iterations (with solution π∗) are used, each updating the pheromone
values corresponding to the edges traversed on the tour with ∆/2. In conjunction with τ0 = 1, setting
∆ = ρ usually works well.
After a number of iterations, the exact number depending on how low the pheromone evaporation rate is set,
the ants will cease finding new solutions because the pheromone matrix has converged to one solution.
It is customary to stop the ACO algorithm after a number of iterations chosen in accordance with the
evaporation rate ρ and problem size n in order to facilitate the aggregation and comparison of results.
are in the subset) to a given number of items is sought which satisfies a number of linear constraints and
maximizes a linear function. The higher the pheromone value assigned to an item is, the more likely its
inclusion in the knapsack becomes.
$$p_{ij} = \frac{\tau_{ij}^{\alpha}}{\sum_{h \in S} \tau_{ih}^{\alpha}}. \tag{3.2}$$
In References 16 and 17, a modification to the random proportional transition rule is proposed which
allows for greater control over the balance between exploration and exploitation. A parameter q0 ∈ [0, 1)
denotes the probability for choosing the best combination of pheromone and heuristic information
perceived by the ant, that is, for choosing j with
$$j = \arg\max_{h \in S} \left( \tau_{ih}^{\alpha} \cdot \eta_{ih}^{\beta} \right), \tag{3.3}$$
instead of proceeding probabilistically according to Equation (3.1), which is done with probability (1−q0 ).
This pseudo-random proportional rule allows for a calibration of the amount of exploitation (q0 -case)
versus biased exploration ((1−q0 )-case) performed by the ants, similar to the temperature setting in Simu-
lated Annealing or to reinforcement learning, specifically Q-learning [18]. In most cases, however, it is suf-
ficient to use only the random proportional transition rule defined by Equation (3.1), that is, to set q0 = 0.
For an instance of size n, constructing a solution takes O(n²) steps for an ant, since it must make
n decisions and has O(n) candidates for each decision. A common method for reducing this computational
burden for very large problem instances is the use of so-called candidate lists. For example, when using an
ACO algorithm for the TSP, it could make sense to define a candidate list Li for each city i which contains
the L nearest neighbors of city i. When the ant makes a decision, it uses Si = S ∩ Li instead of S as the
selection set, unless Si = ∅, in which case the ant reverts to using S. On average, this will greatly reduce the
computational expense of constructing a solution while still maintaining good solution quality, see also
Reference 19.
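In code, the candidate-list restriction amounts to a set intersection with a fallback (a sketch under our own naming, with S and the candidate lists held as Python sets):

```python
def selection_set(i, S, candidates):
    """Restrict the selection set of city i to its candidate list L_i;
    fall back to the full set S if no candidate is still unvisited."""
    S_i = S & candidates[i]      # S_i = S intersected with L_i
    return S_i if S_i else S     # revert to S when S_i is empty
```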
Equations (3.1), (3.2), or (3.3) are usually applied in permutation problems. For some problem classes,
however, extensions or alternatives exist. For scheduling problems which minimize total tardiness, using
the sum over all pheromone values up to and including i, that is,
i
τij = τlj , (3.4)
l=1
instead of using only the pheromone value τij , is studied in Reference 20. This type of pheromone
evaluation more accurately reflects that a job which was not chosen for a particular place in the schedule
despite a high pheromone value should be scheduled soon afterwards.
In Reference 21, an ACO algorithm for the Maximum Clique Problem (MCP) is proposed, where new
nodes for the clique C being constructed are chosen by evaluating how strong the pheromone-based con-
nection between nodes already in C and the candidate nodes in S are, that is, the probability to add node i is
$$p_i = \frac{\tau_{iC}^{\alpha}}{\sum_{h \in S} \tau_{hC}^{\alpha}}, \tag{3.5}$$

with $\tau_{iC} = \sum_{j \in C} \tau_{ij}$ and S being updated in such a way that C ∪ {i} is still a clique for all i ∈ S.
Correspondingly, a pheromone update is undertaken on all edges of the largest final clique of the iteration.
In addition to the pheromone information, ants have the possibility of exploiting heuristic information,
if available. Although not a strict necessity, heuristic guidance usually plays an important role for attaining
a good solution quality. Two known exceptions to this rule are the QAP, where no beneficial heuristic
guidance seems to exist, and the MCP mentioned above, for which results in Reference 21 showed that
employing a straightforward heuristic guidance scheme ultimately led to a worse solution quality.
For the problem classes for which beneficial heuristic guidance does exist, we distinguish between
constant and dynamic heuristic values. For the TSP, the heuristic values ηij = 1/dij are computed once
and remain constant. For other problems like the Probabilistic TSP (see References 22 and 23) or the single
machine total weighted tardiness problem (SMTWTP), the ηij values are functions of past decisions which
the ant has made, that is, they need to be recalculated during every solution construction. For example,
when considering which job j to schedule at place i when minimizing the total tardiness of all jobs, the
heuristic values
$$\eta_{ij} = \frac{1}{\max(T + p_j, d_j) - T} \tag{3.6}$$

are used in Reference 20, where $d_j$ is the due date and $p_j$ the processing time of job j, and
$T = \sum_{h=1}^{i-1} p_{\pi(h)}$
is the sum of the processing times of all jobs that have already been scheduled. Even when heuristic values
are dynamic, the effort for their computation should be restricted to constant time if possible, since
otherwise the amount of time necessary for constructing a solution becomes prohibitively large. Another
disadvantage of the heuristic information for the SMTWTP is that it cannot be combined with a random
decision sequence.
Finally, it is possible to influence the ant’s decisions by tuning the pheromone and heuristic weights
α and β. For most applications, setting α = β = 1 is sufficient. However, tuning these parameters can
lead to better performance for some problem classes. For the TSP, choosing β > 1 has been shown to yield
good results, for example, using β = 2 in References 17 and 24 or β = 5 in References 25 and 26. Using
a steadily decreasing value of β has also been applied successfully for the Resource Constrained Project
Scheduling Problem (RCPSP) in Reference 27. Using values of α > 1 has been shown in Reference 21 to
achieve quicker convergence at the cost of a lower solution quality in the long run.
mechanisms used for this are pheromone evaporation (for the negative reinforcement), which diminishes
all pheromone values by a relative amount each time it is applied, and pheromone intensification (for
the positive reinforcement), achieved by adding an update term to selected pheromone values. Formally,
an update takes the form

$$\tau_{ij} \leftarrow (1 - \rho) \cdot \tau_{ij} + \Delta_{ij}, \tag{3.7}$$

where ρ ∈ (0, 1] is a parameter of the algorithm denoting how much of the pheromone information is lost
with every application of evaporation. A high evaporation rate will cause a more rapid convergence which
is coupled with less exploration than a low evaporation rate. Thus, the evaporation rate should be tuned
in accordance with the number of iterations that the ACO algorithm is allowed to run. ∆ij is an update
value, which is 0 if the edge (i, j) was not traversed by the ant and some value greater than 0 if it was.
The exact value of ∆ij and especially the strategy when an update is performed is the key difference
between most types of ACO algorithm. There are two aspects to consider when characterizing pheromone
intensification: which solutions update, and how much is updated by these solutions.
Generally, updates of the pheromone values take place after an iteration of m ants has constructed
solutions. In the Ant System (AS), which was introduced in Reference 9 for solving the TSP, every ant of
the iteration contributes to the pheromone update. For each ant l ∈ [1, m], the update value $\Delta_{ij}(l)$ is
calculated, and the update is performed with the sum of update values $\Delta_{ij} = \sum_{l=1}^{m} \Delta_{ij}(l)$. Three different
methods for determining the individual $\Delta_{ij}(l)$ were tested: assigning a constant, using the inverse of the
distance dij between customers i and j, and, performing best and used subsequently, using the inverse of the length
of the entire tour, that is, the solution quality. In addition to the m ants of an iteration being allowed
to perform an update, it was also proposed to let a number of so-called elitist ants, which represent the
best solution found by all ants so far, update the pheromone trail. Using a small number of these elitist
ants, inspired by the elitist strategy in Reference 28, intensifies the search near the currently best solution,
leading to better results overall.
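Putting evaporation and intensification together, a sketch of such an update step (our own illustration, using the inverse tour length as update value, as in the best-performing AS variant described above):

```python
import numpy as np

def pheromone_update(tau, tours, lengths, rho=0.02):
    """Evaporation followed by intensification on a symmetric TSP:
    all values decay by (1 - rho), then every edge of each updating
    tour receives an amount inversely proportional to the tour length."""
    tau *= (1.0 - rho)                       # negative reinforcement
    for tour, length in zip(tours, lengths):
        delta = 1.0 / length                 # shorter tours deposit more
        for a, b in zip(tour, np.roll(tour, -1)):
            tau[a, b] += delta               # positive reinforcement,
            tau[b, a] += delta               # kept symmetric
    return tau
```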
Further research resulted in the introduction of the Ant Colony System (ACS) [16,17]. Here, an online
update of the pheromone values was proposed in order to enforce exploration: each time an ant traversed
an edge (i, j), it would reduce the corresponding pheromone value according to
$$\tau_{ij} \leftarrow (1 - \rho) \cdot \tau_{ij} + \rho \cdot \tau_0, \tag{3.8}$$
thus encouraging subsequent ants to choose different edges (note that this holds only for τij ≥ τ0 ). Also,
the global update by all ants at the end of an iteration was replaced by one update performed along the
best tour found so far, that is, by one elitist ant.
Originating from AS and ACS, many other update schemes have been proposed. In References 24
and 29, the MAX–MIN Ant System (MMAS) is introduced, which uses only the best ant of the iteration
and an elitist ant for the positive pheromone update and avoids stagnation in the search process by limiting
pheromone values to the predetermined interval [τmin , τmax ]. Limiting the pheromone values also bounds
the minimum and maximum probability with which an edge is selected according to Equation (3.1), if the
heuristic values are bounded as well. In Reference 25, a modification of the AS called AS-rank is proposed,
where the m ants of an iteration are ranked by their solution quality and, together with a number of elitist
ants of maximum rank, update the pheromone trail in proportion to their rank.
Some methods also exist that operate without using evaporation. In Reference 30, pheromone update is
accomplished by comparing the solution quality of the ant to the average quality of the m previous ants. If it
is better, a positive update is performed along the path; if it is worse, the update is negative. Thus, the update
is accomplished in O(m · n) time for m ants, compared to O(n²) for Equation (3.7). In Reference 31,
a population of solutions is maintained from which the pheromone matrix is derived. Updates are
performed on the population, with the insertion of a solution being equivalent to a positive update of the
pheromone matrix, and a deletion to a negative update (nullifying the previous positive update which was
undertaken upon insertion). This update mechanism will be studied in more detail in Section 3.4.
The number of ants m also plays a role in the exact effect of the pheromone updates. The more solutions
are constructed before an update is undertaken, the higher the expected quality of the updating solutions.
However, the number of ants is a linear factor in the runtime of the algorithm, which is why a trade-off
between runtime and quality of the updating solutions must be found. In Reference 17, a method for
finding the optimal number of ants is discussed which relies on knowledge of the average size of the
pheromone values before and after a change, both being a function of the problem size n. Modeling the
local pheromone update as a first-order linear recurrence relation allows m to be expressed as a function
of the average pheromone levels before and after an update, and of the initial pheromone values. Although
the authors cannot provide these pheromone levels, they argue that experimental results show m = 10 to
work well, and this is also the case in our own experiments.
Setting the initial pheromone values to

$$\tau_0 = \frac{\Delta}{\rho}, \tag{3.9}$$

which is the maximum value attainable in the long run for any τij via Equation (3.7), usually represents
a good trade-off between runtime and exploration.
3.5 Applications
In this section, we present a survey of some of the noteworthy applications of ACO. Of course, this survey
cannot hope to present a complete overview, and the interested reader is referred to References 26, 39,
and 40 for additional surveys.
One of the earliest and most intuitive applications of ACO was the TSP [9]. Since all ACO algorithms
depend in some fashion on the metaphor of an ant moving through a graph [10], using the TSP to illustrate
the basic principles of Ant Algorithms is a logical choice, and it is also used as the introductory example in
Reference 39. ACO has delivered good results on many TSP instances, especially when combined with local
search [24]. However, due to the existence of very good heuristics like Lin–Kernighan [34] and polynomial
time approximation schemes [41] for the Euclidean TSP, ACO algorithms are not the best choice for this
problem class. The situation is better for the Sequential Ordering Problem (SOP), an extension of the
TSP, where the goal is to find a minimum weight Hamiltonian Path with precedence constraints among
the nodes. Here, a form of Ant Algorithm called Hybrid Ant System (HAS-ACO) [42] is currently one of
the best algorithms available. Other variations of the standard TSP, like the Probabilistic TSP (PTSP) and
Dynamic TSP (DTSP), are also handled well by ACO, using proper heuristic guidance [23] for the PTSP
and pheromone resetting strategies [43,44] or the PACO algorithm [35] for the DTSP.
Another problem related to the TSP is the Vehicle Routing Problem (VRP), in which a number of
customers must be serviced exactly once, and all vehicles begin and end their respective tours at a depot.
The goal is to minimize the number of vehicles while meeting constraints such as capacity per vehicle,
maximum tour length per vehicle, and time windows. Solving this problem with Ant Systems was first
proposed in Reference 45, and further research has led to a unified approach for VRPs [46], where the
Ant System is combined with an insertion heuristic from Reference 47.
The QAP, defined in Reference 11 and shown to be NP-hard in Reference 48, is a conceptually different
optimization problem compared to the TSP and its derivatives in the sense that the pheromone matrix
is not interpreted in an item × item fashion, but rather as item × place. Applying the Ant System to the
QAP was first undertaken in Reference 49, including a heuristic guidance scheme for the ants when
constructing a solution. Adding local search to the AS algorithm was shown to be beneficial in Reference 50.
In Reference 51, the Hybrid Ant System (HAS) was introduced and applied to the QAP with good results.
The HAS algorithm uses ants to modify solutions instead of building them, and the pheromone values
are used to remember beneficial changes.
Another class of problems in which ACO algorithms have seen wide and successful application is in
scheduling problems. For scheduling with due dates, for example, the Single Machine Total Weighted
Tardiness Problem (SMTWTP), the pheromone matrix is also interpreted in an item × place fashion.
However, in contrast to the QAP, “place” in this case refers to the place in the schedule and not a physical
location. An ACO algorithm for the SMTWTP was applied in Reference 52, where ACO found the optimal
solution to 125 benchmark problems more often than the other heuristics evaluated. Ant Algorithms have
also been applied to somewhat more complex scheduling problems, for example, job shop scheduling
[53], flow shop scheduling [54], and, most notably, the Resource Constrained Project Scheduling Problem
(RCPSP) [27], where ACO was state of the art at the time of publishing.
Lately, the ACO algorithm has been extended to be able to deal with multi-criteria optimization
problems, in particular the Single Machine Total Tardiness with Setup Costs Problem. Here, two criteria
exist which must be optimized simultaneously, yet cannot be aggregated into a single optimization func-
tion. Rather, the algorithm needs to find a number of solutions which represent different trade-offs
between the two (or more) criteria. The PACO algorithm was modified for optimizing multi-criteria prob-
lems in Reference 36 with further improvements in Reference 28 yielding an algorithm which performs
very well and can deal with an arbitrary number of criteria.
So far, all the problems discussed have been permutation problems, which can be handled quite
well by ACO. However, some efforts have been undertaken to apply ACO to areas where solutions are
not permutations. As mentioned above, in Reference 12, ACO is successfully applied to the shortest
supersequence problem. Also, some partitioning problems, for example, graph coloring [55] and data
clustering [56], have been solved with ACO, with varying degrees of success. In Reference 57, ACO is used
as a generic algorithm for solving Constraint Satisfaction Problems (CSPs) with promising results.
As a final note, although not being an application, in the recent past it has been shown that under certain
conditions, some versions of ACO can provably find the optimal solution to the instance of a problem
with a probability arbitrarily close to 1 [58,59]. Although these results have no immediate impact on
the applicability of ACO algorithms, they put ACO on the same level as Simulated Annealing or Genetic
Algorithms in terms of solution finding capability. Note that with a lower bound greater than 0 on the
probability to find the solution or move closer to the solution in a given iteration, any method will find
the optimum with a probability arbitrarily close to 1, given enough time.
References
[1] R.H. Arnett. American Insects: A Handbook of the Insects of America North of Mexico. Van Nostrand
Rehinhold, New York, 1985.
[2] E.J. Fittkau and H. Klinge. On biomass and trophic structure of the central amazonian rain forest
ecosystem. Biotropica, 5: 2–14, 1973.
[3] K. von Frisch. The Dance Language and Orientation of Bees. Harvard University Press, 1967.
[4] M. Lüscher. Air-conditioned termite nests. Scientific American, 205: 138–145, 1961.
[5] P. Grassé. La reconstruction du nid et les coordinations interindividuelles chez bellicositermes
natalensis et cubitermes sp. la theorie de la stigmergie: essai d’interpretation du comportement
des termites constructeurs. Insectes Sociaux, 6: 41–81, 1959.
[6] G. Nicolis and I. Prigogine. Self-Organization in Non-Equilibrium Systems. John Wiley & Sons,
New York, 1977.
[7] S. Goss, S. Aron, J.-L. Deneubourg, and J. Pasteels. Self-organized shortcuts in the argentine ant.
Naturwissenschaften, 76: 579–581, 1989.
[8] J.-L. Deneubourg, S. Aron, S. Goss, and J. Pasteels. The self-organizing exploratory pattern of the
argentine ant. Journal of Insect Behavior, 3: 159–168, 1990.
[30] V. Maniezzo. Exact and approximate nondeterministic tree-search procedures for the quadratic
assignment problem. INFORMS Journal on Computing, 11(4): 358–369, 1999.
[31] M. Guntsch and M. Middendorf. A population based approach for ACO. In S. Cagnoni et al., Eds.,
Applications of Evolutionary Computing — Evo Workshops, Vol. 2279 of Lecture Notes in Computer
Science. Springer, 2002, pp. 72–81.
[32] C. Solnon. Boosting ACO with a preprocessing step. In S. Cagnoni, J. Gottlieb, E. Hart,
M. Middendorf, and G. Raidl, Eds., Applications of Evolutionary Computing — Evo Workshops,
Vol. 2279. Springer-Verlag, Kinsale, Ireland, 2002, pp. 161–170.
[33] S. Lin. Computer solutions of the traveling salesman problem. Bell System Technical Journal,
44(10): 2245–2269, 1965.
[34] S. Lin and B. Kernighan. An effective heuristic algorithm for the traveling salesman problem.
Operations Research, 21: 498–516, 1973.
[35] M. Guntsch and M. Middendorf. Applying population based ACO to dynamic optimization
problems. In International Workshop on Ant Algorithms ANTS, Vol. 2463 of Lecture Notes in
Computer Science. Springer-Verlag, Heidelberg, 2002, pp. 111–122.
[36] M. Guntsch and M. Middendorf. Solving multi-criteria optimization problems with population-
based ACO. In C. Fonseca, P. Fleming, E. Zitzler, K. Deb, and L. Thiele, Eds., Evolutionary Multi-
Criterion Optimization (EMO), Vol. 2632 of Lecture Notes in Computer Science. Springer, Berlin,
Heidelberg, 2003, pp. 464–478.
[37] M. Guntsch. Ant Algorithms in Stochastic and Multi-Criteria Environments, Ph.D. thesis. Insti-
tute AIFB, University of Karlsruhe, January 2004. http://www.ubka.uni-karlsruhe.de/cgibin/
psview?document=2004/wiwi/3.
[38] J. Branke, C. Barz, and I. Behrens. Ant-based crossover for permutation problems. In E. Cantu-Paz,
Ed., Genetic and Evolutionary Computation Conference, Vol. 2723 of Lecture Notes in Computer
Science. Springer, 2003, pp. 754–765.
[39] E. Bonabeau, M. Dorigo, and G. Théraulaz. Swarm Intelligence. Oxford University Press, Oxford,
1999.
[40] T. Stützle and M. Dorigo. The ant colony optimization metaheuristic: Algorithms, applica-
tions, and advances. In F. Glover and G. Kochenberger, Eds., Handbook of Metaheuristics. Kluwer
Academic Publishers, Norwell, MA, 2002.
[41] S. Arora. Polynomial time approximation schemes for Euclidean traveling salesman and other
geometric problems. Journal of the ACM, 45: 753–782, 1998.
[42] L.M. Gambardella and M. Dorigo. An ant colony system hybridized with a new local search for
the sequential ordering problem. INFORMS Journal on Computing, 12: 237–255, 2000.
[43] M. Guntsch, M. Middendorf, and H. Schmeck. An ant colony optimization approach to dynamic
TSP. In L. Spector et al., Eds., Genetic and Evolutionary Computation Conference (GECCO).
Morgan Kaufmann Publishers, San Francisco, CA, 2001, pp. 860–867.
[44] M. Guntsch and M. Middendorf. Pheromone modification strategies for ant algorithms applied
to dynamic TSP. In E. Boers et al., Eds., Applications of Evolutionary Computing — Evo Work-
shops, Vol. 2037 of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg, 2000,
pp. 213–222.
[45] B. Bullnheimer, R. Hartl, and C. Strauss. An improved ant system algorithm for the vehicle routing
problem. Technical report, POM Working Paper No. 10/97, University of Vienna, 1997.
[46] M. Reimann, K. Doerner, and R. Hartl. Analyzing a unified ant system for the vrp and
some of its variants. In G. Raidl et al., Eds., Applications of Evolutionary Computing —
Evo Workshops, Vol. 2611 of Lecture Notes in Computer Science. Springer, Heidelberg, 2003,
pp. 300–310.
[47] M.M. Solomon. Algorithms for the vehicle routing and scheduling problems with time window
constraints. Operations Research, 35(2): 254–265, 1987.
[48] S. Sahni and T. Gonzalez. P-complete approximation problems. Journal of the ACM, 23(3): 555–565,
1976.
[49] V. Maniezzo and A. Colorni. The ant system applied to the quadratic assignment problem. IEEE
Transactions on Knowledge and Data Engineering, 11(5): 769–778, 1999.
[50] T. Stützle and H. Hoos. Max–min ant system and local search for combinatorial optimization
problems. In Metaheuristics International Conference (MIC), Kluwer Academic, Norwell, MA,
1997.
[51] L.M. Gambardella, E.D. Taillard, and M. Dorigo. Ant colonies for the QAP. Journal of the
Operational Research Society, 50: 167–176, 1999.
[52] A. Bauer, B. Bullnheimer, R. Hartl, and C. Strauss. An ant colony optimization approach for the
single machine total tardiness problem. In Congress on Evolutionary Computation (CEC), IEEE
Press Piscataway, NJ, 1999, pp. 1445–1450.
[53] A. Colorni, M. Dorigo, V. Maniezzo, and M. Trubian. Ant system for job-shop scheduling.
JORBEL — Belgian Journal of Operations Research, Statistics and Computer Science, 34: 39–53,
1994.
[54] T. Stützle. An ant approach for the flow shop problem. In 6th European Congress on Intelligent
Techniques & Soft Computing (EUFIT), Vol. 3. Verlag Mainz, Aachen, 1998, pp. 1560–1564.
[55] A. Vesel and J. Zerovnik. How good can ants color graphs? Journal of Computing and Information
Technology — CIT, 8: 131–136, 2000.
[56] N. Monmarche. On data clustering with artificial ants. In A.A. Freitas, Ed., Data Mining with
Evolutionary Algorithms: Research Directions. AAAI Press, Orlando, FL, 1999, pp. 23–26.
[57] C. Solnon. Ants can solve constraint satisfaction problems. IEEE Transactions on Evolutionary
Computation, 6: 347–357, 2002.
[58] W. Gutjahr. ACO algorithms with guaranteed convergence to the optimal solution. Information
Processing Letters, 82: 145–153, 2002.
[59] T. Stützle and M. Dorigo. A short convergence proof for a class of ACO algorithms. IEEE
Transactions on Evolutionary Computation, 6: 358–365, 2002.
4.1 Introduction
Swarm Intelligence (SI) is a computational and behavioral metaphor for solving distributed problems,
inspired by the biological examples provided by social insects such as ants, termites, bees, and wasps, and
by swarm, herd, flock, and shoal phenomena in vertebrates such as fish shoals and bird flocks.
In other words, SI is based on the principles underlying the behavior of natural systems consisting
of many agents that exploit local communication and highly distributed control. Thus, the
SI approach constitutes a very practical and powerful model that greatly simplifies the design of distributed
solutions to different kinds of problems. In the last few years, SI principles have been successfully applied
to a series of applications including optimization algorithms, communications networks, and robotics.
and plans, the current input, and the current state, collects agent results, analyzes them, and decides the
actions to be executed next.
One way of achieving the required task without a centralized part is by the addition of the individual
efforts of a multitude of agents who do not have any idea of the global objective to be reached; that is, there
is the emergence of collective behavior. Deneubourg et al. [1] introduced a model of sorting behavior
in ants. They found that simple model ants were able to sort into piles objects initially strewn randomly
across a plane. To be precise, near an anthill one can observe ants running in all directions to gather
corpses, to clean up their nest, or to transport their eggs and sort them by size. One might imagine
that something, such as a special chemical marker, indicates to individual ants where to place their loads,
and allows them to distinguish an object already arranged from an object still to be arranged. But how
would such markers be placed, and according to what criteria? In fact, such interesting collective behavior
can be mediated by nothing more than similar, simple individual behaviors: (1) each ant wanders around
a bit; (2) if an ant meets an object and is not carrying one, it picks the object up; and (3) if an ant is
transporting an object and encounters a similar object in its way, it deposits its load. By following these local strategic rules
with only local perceptual capacities, ants display the ability to perform global sorting and clustering of
objects.
Swarm Intelligence is a new way to control multi-agent systems. The swarm-type approach to
emergent strategy deals with large numbers of homogeneous agents, each of which has fairly limited
capabilities on its own. However, when many such simple agents are brought together, globally interesting
behavior can emerge as a result of the local interactions of the agents and the interactions between the
agents and the environment. A key research issue in such a scenario is determining the proper design of
the local control laws that will allow the collection of agents to solve a given problem.
Swarm-based systems are generally characterized by the following properties:
Autonomy: The system does not require outside management or maintenance. Individuals are
autonomous, controlling their own behavior both at the detector and effector levels in a
self-organized way.
Adaptability: Interactions between individuals can arise through direct or indirect communication
via the local environment; two individuals interact indirectly when one of them modifies the
environment and the other responds to the new environment at a later time. By exploiting such
local communication forms, individuals have the ability to detect changes in the environment
dynamically. They can then autonomously adapt their own behavior to these new changes. Thus,
swarm systems emphasize auto-configuration capabilities.
Scalability: SI abilities can be performed by groups consisting of a few, up to thousands of, individuals
with the same control architecture.
Flexibility: No single individual of the swarm is essential, that is, any individual can be dynamically
added, removed, or replaced.
Robustness: SI provides a good example of a highly distributed architecture that greatly enhances
robustness; no central coordination takes place, which means that there is no single point of failure.
Moreover, like most biological and social systems, and by combining scalability and flexibility
capabilities, the swarm system enables redundancy, which is essential for robustness.
Massively parallel: The swarm system is massively parallel and its functioning is truly distributed.
The tasks performed by each individual within its group are the same. If we view each individual
as a processing unit, an SI architecture can be thought of as a single instruction stream–multiple
data stream (SIMD) architecture or a systolic network.
Self-organization: Swarm systems emphasize self-organization capabilities. The intelligence exhibited is
not present in the individuals, but rather emerges somehow out of the entire swarm. In other words,
if we view every individual as a processing unit, solutions to problems obtained are not predefined
or preprogrammed but are determined collectively as a result of the running program.
Cost effectiveness: The swarm-type system consists of a finite collection of homogeneous agents, each
of which has fairly limited capabilities on its own. Also, each agent has the same capabilities and
control algorithm. It is clear that the autonomy and the highly distributed control afforded by the
swarm model greatly simplify the task of designing and implementing parallel algorithms and
hardware. For example, in swarm-type multi-robotic systems, the robots are relatively simple and
their design effort can be kept minimal in terms of sensors, actuators, and resources for
computation and communication.
and this will in turn cause a higher number of ants to choose the shorter path. This elementary behavior
of real ants explains how they can find the shortest path. The collective behavior that emerges is a form of
autocatalytic behavior (or positive feedback), whereby the more the ants follow the trail the more likely
they are to do so.
Real ants use chemicals called pheromones to provide a sophisticated signaling system. While walking,
ants deposit quantities of pheromone, marking the routes that they follow with a trail of the substance.
When an ant encounters an intersection, it has to decide which path to follow next. The concentration of
pheromone on a path is an indication of its usage. An ant preferentially chooses a path with a high
pheromone concentration and thereby reinforces it with a further quantity of pheromone. Over time, the concentration of
pheromone decreases due to diffusion. This foraging process is an autocatalytic process characterized by
a positive feedback loop, where the probability that an ant chooses any given path increases according to
the number of ants choosing the path on previous occasions. Ants that take the shortest path will reach
the food source first. On their way back to the nest, the ants again have to select a path. After a sufficiently
long period of time, the pheromone concentration on the shorter path will be higher than on other longer
paths. Thus, all the ants will finally choose the shorter path.
This ant foraging process can be used to find the shortest path in networks. Also, ants are capable of
adapting to changes in the environment, and find a new shortest path once the old one is no longer feasible
due to some obstacle. Thus, this process is appropriate to mobile ad hoc networks wherein link changes
occur frequently [5].
Let G = (V , E) be a connected graph with N = |V | nodes. The simple ant colony optimization meta-
heuristic can be used to find the shortest path between a source node vs and a destination node vd on the
graph G. The path length is defined by the number of nodes on the path. A variable ϕi,j corresponding to
the artificial pheromone concentration is associated with each edge (i, j). An ant located in node vi uses
pheromone ϕi,j to compute the probability of node vj being the next hop. This transition probability pi,j
is defined as:
$$
p_{i,j} =
\begin{cases}
\dfrac{\varphi_{i,j}}{\sum_{l \in V_i} \varphi_{i,l}} & \text{if } j \in V_i,\\
0 & \text{if } j \notin V_i,
\end{cases}
\qquad \text{with } \sum_{j \in V_i} p_{i,j} = 1 \text{ for } 1 \le i \le N.
$$
During the process, ants deposit pheromone on the edges. In the simplest version of the algorithm, the
ants deposit a constant amount of pheromone, that is, the amount of pheromone of the edge (i, j) when
an ant moves from node $v_i$ to node $v_j$ is updated by the formula
$$
\varphi_{i,j} \leftarrow \varphi_{i,j} + \Delta\varphi,
$$
with $\Delta\varphi$ a constant.
Moreover, like real ant pheromone, the artificial pheromone concentration should decrease over time.
In the simple ant algorithm this is modeled by the evaporation rule
$$
\varphi_{i,j} \leftarrow (1 - q)\,\varphi_{i,j},
$$
where $0 < q \le 1$ is the decay rate.
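To make these rules concrete, the following is a minimal Python sketch of the simple ant algorithm just described. The dictionary-based pheromone table, the constant deposit delta, and the default decay q = 0.5 are illustrative assumptions, not prescriptions from the text.

import random

def next_hop(i, neighbors, pheromone):
    # Choose the next node j with probability phi_ij / sum_l phi_il,
    # the transition rule defined above.
    cand = neighbors[i]
    weights = [pheromone[(i, j)] for j in cand]
    return random.choices(cand, weights=weights)[0]

def deposit(pheromone, path, delta=1.0):
    # Each ant adds a constant amount of pheromone to every edge it used.
    for i, j in zip(path, path[1:]):
        pheromone[(i, j)] += delta

def evaporate(pheromone, q=0.5):
    # phi <- (1 - q) * phi on every edge, mimicking natural decay.
    for edge in pheromone:
        pheromone[edge] *= (1.0 - q)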
visit. Recall that a feasible tour visits each city exactly once. Additionally, this allows the ant to retrace
its tour (i.e., path) in order to deposit delayed pheromone on the visited arcs. The probability with which an ant k
chooses to go from city i to city j while building its tour at iteration t of the algorithm is:
$$
p^{(k)}_{i,j}(t) =
\begin{cases}
\dfrac{a_{i,j}(t)}{\sum_{l \in V^{(k)}_i} a_{i,l}(t)} & \text{if } j \in V^{(k)}_i,\\
0 & \text{otherwise},
\end{cases}
$$
where $V^{(k)}_i$ denotes the set of neighbors of node $i$ that ant $k$ has not visited yet.
The ant decision value $a_{i,j}(t)$ is obtained by composing the local pheromone trail value with a
local heuristic value that favors the nearest neighbors:
$$
a_{i,j}(t) = \frac{[\varphi_{i,j}(t)]^{\alpha}\,[\eta_{i,j}]^{\beta}}{\sum_{l \in N_i} [\varphi_{i,l}(t)]^{\alpha}\,[\eta_{i,l}]^{\beta}},
$$
where $\eta_{i,j}$ is the heuristic value of arc $(i, j)$, $N_i$ is the set of neighbors of node $i$, and $\alpha$ and $\beta$ are two parameters that control the relative weight
of the pheromone trail and the heuristic value. The heuristic value should measure or estimate the relevance
of adding arc $(i, j)$; a reasonable heuristic for the TSP is $\eta_{i,j} = 1/d_{i,j}$, the inverse of the distance between cities
$i$ and $j$.
After all the ants have completed their tour, pheromone evaporation on arcs is executed. Each ant k
deposits a quantity of pheromone
$$
\Delta\varphi^{(k)}_{i,j}(t) =
\begin{cases}
\dfrac{1}{L^{(k)}(t)} & \text{if arc } (i, j) \in T^{(k)}(t),\\
0 & \text{otherwise},
\end{cases}
$$
where $T^{(k)}(t)$ is the tour built by ant $k$ at iteration $t$ and $L^{(k)}(t)$ is its length. Note that the shorter the tour of
ant $k$, the greater the amount of pheromone deposited.
The addition of new pheromone and pheromone evaporation are combined in the update formula
$$
\varphi_{i,j}(t) = (1 - q)\,\varphi_{i,j}(t - n) + \sum_{k=1}^{m} \Delta\varphi^{(k)}_{i,j}(t),
$$
where $q$ is the pheromone trail decay coefficient, $0 < q \le 1$, and $m$ is the number of ants. The initial amount of pheromone $\varphi_{i,j}(0)$ is set to the
same small positive value on all arcs. A suitable value for $q$ is 0.5, which ensures a tradeoff between
sufficient positive feedback and the exploration of new cycles. For $\alpha$ and $\beta$, good values are $\alpha \approx 1$ and
$1 \le \beta \le 5$. Note that with $\alpha = 0$ the algorithm reduces to a classical greedy algorithm, while with
$\alpha > 2$ all the agents converge to the same cycle, which is not necessarily optimal.
The comparison of this algorithm with other heuristics, such as tabu search and simulated annealing, on
small TSP instances (n = 30 cities) demonstrates its efficiency. The same technique presented here for the TSP
has been applied to other optimization problems such as job scheduling, the QAP, and routing in networks
[2,5–8].
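As an illustration, the following Python sketch assembles the formulas above into a complete Ant System loop for the TSP. It is a minimal reading of the algorithm; the parameter defaults (n_ants, n_iters, the initial pheromone level, and so on) are assumptions chosen only for demonstration.

import random

def tour_length(tour, dist):
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def ant_system_tsp(dist, n_ants=20, n_iters=100, alpha=1.0, beta=2.0, q=0.5):
    # dist is an n x n matrix of positive inter-city distances.
    n = len(dist)
    phi = [[1e-3] * n for _ in range(n)]  # same small initial pheromone on all arcs
    best_tour, best_len = None, float("inf")
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [random.randrange(n)]
            unvisited = set(range(n)) - {tour[0]}
            while unvisited:
                i, cand = tour[-1], list(unvisited)
                # decision value a_ij = phi_ij^alpha * (1/d_ij)^beta
                a = [(phi[i][j] ** alpha) * ((1.0 / dist[i][j]) ** beta) for j in cand]
                nxt = random.choices(cand, weights=a)[0]
                tour.append(nxt)
                unvisited.remove(nxt)
            tours.append(tour)
        # evaporation on all arcs, then delayed deposit of 1/L on visited arcs
        for i in range(n):
            for j in range(n):
                phi[i][j] *= (1.0 - q)
        for tour in tours:
            L = tour_length(tour, dist)
            if L < best_len:
                best_tour, best_len = tour, L
            for k in range(len(tour)):
                i, j = tour[k], tour[(k + 1) % len(tour)]
                phi[i][j] += 1.0 / L
                phi[j][i] += 1.0 / L
    return best_tour, best_len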
4.3.1.3 Comparison with Other Nature-Inspired Algorithms
A number of modern optimization techniques are inspired by nature. In simulated annealing modeled
from the thermodynamic behavior of solids, particles in solution space move under the control of
a randomized scheme, with probabilities according to some typically Boltzmann-type distribution.
Genetic algorithms (GAs) start with a randomly generated population and use crossover and mutation
operators to update it together with fitness function to evaluate the individuals. Neural networks (NNs)
are a distributed learning technique in which the knowledge associated with a trained neural network is
not stored in any specific location but encoded in a distributed way across its weight matrix.
Ant colony optimization shares many common points with these nature-inspired approaches. ACO,
SA, and GA share the same update mechanism with random techniques. Randomness is present in the
fuzzy behavior of ants [8]. ACO shares with GA some organizing principles of social population such as
interaction and self-organization. ACO shares with NN trained networks the property that knowledge is
distributed throughout the network. Moreover, ACO, like NN, exhibits emergence capabilities [8].
$$
v_i(t + 1) = a\,v_i(t) + b_1 r_1 \big(p^{(1)}_i - x_i(t)\big) + b_2 r_2 \big(p^{(2)}_i - x_i(t)\big),
$$
$$
x_i(t + 1) = x_i(t) + v_i(t + 1),
$$
where $v_i(t)$ denotes the velocity of particle $i$, which represents the distance to be traveled by this particle
from its current position, that is, the difference between two successive particle positions; $x_i(t)$ represents
the particle position; $p^{(1)}_i$ represents its own previous best position; and $p^{(2)}_i$ is the best value obtained so
far by any particle among its neighbors. In the global version of the algorithm, $p^{(2)}_i$ represents the globally
best position among the whole swarm. Particles change their position (or state) in the following manner.
At iteration t , the velocity vi (t ) is updated based on its current value affected by a tuning parameter a,
and on a term that attracts the particle towards previously found best positions. The strength of attraction
is given by the coefficients b1 and b2 . The particle position xi (t ) is updated using its current value and
the newly computed velocity vi(t + 1). The three tuning parameters a, b1, and b2 greatly influence the
algorithm's performance. The inertia weight a is a user-specified parameter; a large inertia weight pushes
towards global exploration in a new search area, while a small one pushes towards fine-tuning of the
current search area. The positive constant acceleration coefficients (or learning factors) b1 and b2 control the
maximum step size of the particle; usually b1 = b2 = 2. Suitable selection of the tuning factors a, b1,
and b2 can provide a balance between global search (i.e., state space exploration) and local search (i.e., state
space exploitation). Random numbers r1 and r2 are selected in the range [0, 1] and introduce useful
randomness for state space exploitation.
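A minimal Python sketch of the global-best version of these update equations follows. The objective function, bounds, and defaults are placeholders; the parameter names deliberately mirror a, b1, and b2 in the text.

import random

def pso(f, dim, n_particles=30, n_iters=200, a=0.7, b1=2.0, b2=2.0, bounds=(-5.0, 5.0)):
    # Minimal global-best PSO minimizing f over a box.
    lo, hi = bounds
    x = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [xi[:] for xi in x]                      # p_i^(1): personal best positions
    p_best_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda i: p_best_val[i])
    g_best, g_best_val = p_best[g][:], p_best_val[g]  # p^(2): global best
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                v[i][d] = (a * v[i][d]
                           + b1 * r1 * (p_best[i][d] - x[i][d])
                           + b2 * r2 * (g_best[d] - x[i][d]))
                x[i][d] += v[i][d]
            val = f(x[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = x[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = x[i][:], val
    return g_best, g_best_val

For example, pso(lambda p: sum(t * t for t in p), dim=3) minimizes the three-dimensional sphere function.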
In the following, we show how this basic PSO algorithm can be adapted to solve an optimization
problem. The task-mapping problem (TMP) is one of the most studied NP-hard problems in distributed
computing.
4.4 Conclusion
Swarm Intelligence is a rich source of inspiration for our computer systems. Specifically, SI has many fea-
tures that are desirable for distributed computing. These include auto-configuration, auto-organization,
autonomy, scalability, flexibility, robustness, emergent behavior, and adaptability. These capabilities sug-
gest a wide variety of applications that can be solved by SI principles. We believe that the emergence
paradigm and the highly distributed control paradigm will prove fruitful for new technologies, such as
nanotechnology, massively parallel supercomputers, embedded systems, and scalable systems for deep
space applications.
References
[1] J.L. Deneubourg, S. Goss, N. Franks, A. Sendova-Franks, C. Detrain, and L. Chrétien. The
dynamics of collective sorting: Robot-like ants and ant-like robots. In Simulation of Adaptive
Behavior: From Animals to Animats (J.-A. Meyer and S.W. Wilson, Eds.), MIT Press, Cambridge,
MA, 1991, pp. 356–365.
[2] T. White. Swarm intelligence and problem solving in telecommunications. Canadian Artificial
Intelligence Magazine, spring, 1997.
[3] R. Beckers, J.L. Deneubourg, and S. Goss. Trails and U-turns in the selection of the shortest path
by the ant Lasius niger. Journal of Theoretical Biology, 159, 397–415, 1992.
[4] Lynne E. Parker. ALLIANCE: An architecture for fault tolerant multi-robot cooperation. IEEE
Transactions on Robotics and Automation, 14, 220–240, 1998.
[5] Mesut Günes and Otto Spaniol. Ant-routing-algorithm for mobile multi-hop ad-hoc networks. In
Network Control and Engineering for QoS, Security and Mobility II, Kluwer Academic Publishers,
2003, pp. 120–138.
[6] Eric Bonabeau and Guy Theraulaz. Intelligence Collective, Edition HERMES, Paris, 1994.
[7] Marc Dorigo, Eric Bonabeau, and Guy Theraulaz. Ant algorithms and stigmergy. Future Generation
Computer Systems, 16, 851–871, 2000.
[8] B. Denby and S. Le Hégarat-Mascle. Swarm intelligence in optimization problems. Nuclear
Instruments and Methods in Physics Research Section A, 502: 364–368, 2003.
[9] Ayed Salman, Imtiaz Ahmad, and Sabah Al-Madani. Particle swarm optimization for task
assignment problem. Microprocessors and Microsystems, 26, 363–371, 2002.
[10] Ioan Cristian Trelea. The particle swarm optimization algorithm: Convergence analysis and
parameter selection. Information Processing Letters, 85, 317–325, 2003.
[11] X. Hu, R. Eberhart, and Y. Shi. Particle swarm with extended memory for multiobjective
optimization. In Proceedings of the IEEE Swarm Intelligence Symposium, 2003, Indianapolis,
IN, USA.
[12] L. Messerschmidt and A.P. Engelbrecht. Learning to play games using a PSO-based competitive
learning approach. In Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution and
Learning, 2002.
Finding reliable methods for automatic programming would be a revolution for computer
science, and would completely change our perception of the software industry.
Although proposals in this direction date back to the 1950s, until very recently results did not unveil
the potential that machine learning techniques can attain. This potential has been clearly shown by a new
technique that can be considered part of the field of evolutionary algorithms (EAs): genetic programming (GP) [2].
Genetic programming is aimed at evolving computer programs. It begins with a basic description of
the problem to be solved, after which the initialization process takes place: GP automatically
generates a set of candidate solutions for the problem, each of which takes the shape of a computer
program. GP then enters a loop that evaluates each of the solutions, selects the best ones — according
to a measurement criterion — and produces a new set of candidate solutions, employing the information
contained in the selected solutions, which act as parents for the next generation.
GP, like the other techniques within the EA field, resembles the natural evolution of species
in nature, as described by the theory of natural selection [3].
Among the techniques that arose under the umbrella of natural evolution, genetic algorithms [4],
evolutionary programming [5], and evolution strategies [6,7] have matured and demonstrated their
usefulness.
During the last few years, GP has demonstrated not only its capability of automatically developing
software modules [8], but it has also been employed for designing industrial products of outstanding
quality — such as electronic circuits that have recently been patented [9].
Although GP has proved its usefulness, the computational resources required for solving complex
problems can become huge. In such cases, improvements to the technique are sought using concepts
borrowed from the parallel processing area. Not only GP but EAs in general have incorporated some
degree of parallelization when difficult problems are to be solved.
This chapter focuses on parallel genetic programming. We first provide some basic ideas about how GP
works, and then how it can be parallelized. Finally, after reviewing the history of the field, we show some
problems — benchmark and real-life problems — that are solved by Parallel GP.
FIGURE 5.2 Two solutions for the problem, employing different terminal and function sets.
Even when the designer does not know the optimal solution, he should be able to extract some inform-
ation from the high-level specification of the problem, which helps him to define the appropriate function
and terminal sets.
FIGURE 5.3 Tree-based GP crossover. The two parents exchange one of their subtrees in order to generate two new
offspring.
FIGURE 5.4 Tree-based GP mutation. A subtree is randomly selected from the parent and replaced with another
randomly generated subtree.
FIGURE 5.5 Parallelizing at the fitness level. Different processors or workstations are in charge of evaluating
individuals, while the main processor runs the GP algorithm.
We have to bear in mind that the previously described parallel algorithm is basically the same as the
sequential version; the only difference is the parallel evaluation of individuals.
In the next section we describe other ways of developing parallel GP, but with some changes to the basic
algorithm, which will also help to solve the problem at hand more quickly.
The behavior of the Island Model is governed by a few important parameters:
1. Number of subpopulations
2. Frequency of exchange
3. Number of exchanged individuals
4. The communication topology
5. Replacement policy
Some of these important parameters have been studied recently [14]. Researchers have found that a good
choice is to send 10% of the individuals from each subpopulation to an adjacent one every 10 generations,
and that the communication topology does not significantly affect the convergence process. On the other
hand, a compromise between the number of subpopulations and the total number of individuals employed
for solving the problem has to be adopted: if each subpopulation is made up of a small number of
individuals, the exploration phase performed by each subpopulation will not be satisfactory.
A distinction may be drawn between the parallel model we use and the parallel architecture we
employ to run that algorithm. When dealing with distributed EAs, such as island GP, one can run the
algorithm both on distributed-memory multiprocessor machines and on networks of workstations
(NOWs).
In these architectures, the address spaces of the processors are separate, and communication between
processors must be implemented through some form of message passing. NOWs are widely used because
of their low cost and ubiquity, although their performance is limited by communication latencies and
heterogeneous workload distribution. In the case of parallel EAs, given that communication is
not the most time-consuming part of the algorithm and that the migration step is rarely performed, these
kinds of low-cost architecture are adequate.
The migrations between the different demes can be implemented, for example, using the Message
Passing Interface Standard (MPI) with synchronous communication operations, that is, each island
runs a standard generational GP and individuals are exchanged at fixed synchronization points between
generations. Implementation details can be found in Reference 18.
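As a rough sketch of such an MPI-based migration step — written here with the mpi4py Python bindings rather than the C MPI interface of Reference 18, and with an assumed ring topology, a 10% migration rate, and a 10-generation period — each island might do something like the following between generations:

from mpi4py import MPI  # assumes the mpi4py bindings are installed

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

def migrate(population, generation, fitness, rate=0.10, period=10):
    # Send the best `rate` fraction of this island's population to the next
    # island on a ring and replace the worst local individuals with the
    # immigrants, at fixed synchronization points between generations.
    if generation % period != 0:
        return population
    k = max(1, int(rate * len(population)))
    ranked = sorted(population, key=fitness)      # lower fitness = better
    emigrants = ranked[:k]
    dest, src = (rank + 1) % size, (rank - 1) % size
    immigrants = comm.sendrecv(emigrants, dest=dest, source=src)
    return immigrants + ranked[:-k]               # drop the k worst locals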
Researchers have also found that the Island Model may obtain better results than the panmictic model —
classic model — even when it is run on a standard sequential machine [19]. The improvement does not
come from the parallelization of the operations, because only one processor is employed, but from the
change in the model: the migration step helps to preserve diversity during the run, which favors the
discovery of better solutions.
Other spatial distributions are also available for parallel EAs. For instance, one could distribute the
individuals of the population on a two-dimensional grid (other dimensions are also possible). The
idea is that each individual interacts only with its direct neighbors. Therefore, reproduction and
mating take place locally for each individual.
The model allows the slow diffusion of information from good individuals across the grid, and semi-
isolated niches of individuals arise in different areas of the space.
biased by the different speeds of each of the processors or computers in charge of evaluating each of the
subpopulation. This measure will by itself show us the advantage of the model, and this advantage will
be observed even when it is employed in a sequential machine. Of course, in a given architecture and a
real-life problem, the time required for obtaining a particular solution is the right measure: researchers
try to obtain a solution for a problem as soon as possible.
For comparing fitness values, a useful figure is the Mean Best Fitness (MBF): the best fitness value at the
end of the run, averaged over a number of runs. Nevertheless, when difficult problems are evaluated,
no one knows in advance whether the global optimum has been reached. Therefore, the idea is to
take the measure when a specified amount of computational effort has been spent.
In the comparisons we show below, we employ the measure described here: MBF versus computing
effort (total number of nodes evaluated).
of solutions in the search space greatly influences the performance of the Island Model. Therefore, the
conclusion was that multiple-solution problems would be more amenable to multiple populations than
single-solution problems. On the other hand, nondeceptive problems would also be more amenable to
multiple populations than deceptive problems.
In the aforementioned papers, different parallel GP models are employed to study the improvement
achieved when different benchmark and real-life problems are tackled. Nevertheless, no in-depth study of the
specific parameters of the new models was presented until 2000. For instance, Tongchim and Chongstitvatana
[26] studied synchronous and asynchronous versions of parallel GP; their results demonstrated that the
asynchronous parallel algorithm obtains better results.
On the other hand, a comprehensive study of the migration policies, topology, and other important parameters
of the Island Model for GP was presented in 2003 [14]. A similar study was performed employing a parallel
version of GP based on a cellular approach [27].
Plenty of results dealing with the application of parallel GP to real-life problems, as well as papers
focusing on the important parameters of the new models, are available today. Recently, researchers
have shown that the technique is even capable of solving some problems with solutions better
than any technique previously invented by human beings. The results are so impressive that some
have even been patented. For instance, Koza [9] describes several applications of parallel GP that have led
to solutions to problems; these applications are novel and useful enough to be patented. In particular, Koza
[28] describes some results that can be considered as inventions: several analogue circuits
discovered by means of parallel GP. Some of those circuits — later patented — can probably
be considered the first inventions developed by computers.
In the following sections, we show, by means of two examples, how parallel GP can be applied to solving
real-life problems.
5.5 Applications
In this section, we briefly describe several benchmark problems that have been traditionally employed for
testing GP performances, and also present two different real-life problems that have been addressed by
means of parallel GP.
Even Parity k Problem. The Boolean even-parity-k function of k Boolean arguments returns true if an even
number of its arguments evaluate to true; otherwise it returns false. If k = 4, then 16 fitness cases
must be checked to evaluate the fitness of an individual. The fitness is computed as 16 minus the number of
hits over the 16 cases; thus a perfect individual has fitness 0, while the worst individual has fitness 16. The
set of functions employed for GP individuals might be F = {AND, OR, NOT}.
The terminal set is composed of the k Boolean variables, T = {v1, ..., vk}.
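A small Python sketch of this fitness computation follows, assuming an individual is any callable mapping a k-tuple of Booleans to a Boolean:

from itertools import product

def even_parity_fitness(program, k=4):
    # Fitness = 2**k minus the number of hits, i.e., the number of misses;
    # a perfect individual scores 0 and the worst scores 2**k (16 for k = 4).
    misses = 0
    for case in product([False, True], repeat=k):
        target = (sum(case) % 2 == 0)  # true iff an even number of inputs are true
        if bool(program(case)) != target:
            misses += 1
    return misses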
Artificial Ant Problem on the Santa Fe Trail. In this problem, an artificial ant is placed on a 32 × 32
toroidal grid. Some of the cells from the grid contain food pellets. The goal is to find a navigation
strategy for the ant that maximizes its food intake. Typically, the function set employed is
F = {if-food-ahead}, while the terminal set is T = {left, right, forward}, as described in Reference 2.
As fitness function, we use the total number of food pellets lying on the trail (89) minus the
amount of food eaten by the ant along the path. This turns the problem into a minimization one, like the
previous one.
Symbolic Regression Problem. The problem aims to find a program that matches a given equation. We
employ the classic polynomial equation $f(x) = x^4 + x^3 + x^2 + x$, and the input set is composed
of 1000 fitness cases. For this problem, the set of functions used for GP individuals is
F = {*, //, +, −}, where // is the protected division, which returns 0 instead of raising an error when the
divisor equals 0, thus preserving syntactic closure. The fitness is the sum of the squared errors at each test point.
Again, lower fitness means a better solution.
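The protected division and the fitness measure can be sketched in a few lines of Python. Sampling the 1000 fitness cases uniformly from [-1, 1] is our assumption, since the text does not specify the input interval:

def pdiv(a, b):
    # Protected division: returns 0 when the divisor is 0, so every
    # function in F is total and syntactic closure is preserved.
    return 0.0 if b == 0 else a / b

def regression_fitness(program, n_cases=1000):
    # Sum of squared errors against f(x) = x**4 + x**3 + x**2 + x;
    # lower fitness means a better individual.
    error = 0.0
    for i in range(n_cases):
        x = -1.0 + 2.0 * i / (n_cases - 1)   # assumed sample interval [-1, 1]
        target = x**4 + x**3 + x**2 + x
        error += (program(x) - target) ** 2
    return error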
The Royal Tree Problem. This problem [24] is commonly used as a standard function for testing the
effectiveness of GP. It consists of a single base function that is specialized into as many cases as necessary,
depending on the desired complexity of the resulting problem.
A series of functions a, b, c, etc., with increasing arity are defined (an a function has arity 1, a
b function has arity 2, and so on), together with a number of terminals x, y, and z.
A level-a tree is an a root node with a single x child. A level-b tree is a b root node with two level-a trees
as children. A level-c tree is a c root node with three level-b trees as children, and so on. A level-e tree has
depth 5 and 326 nodes, while a level-f tree has depth 6 and 1927 nodes. Perfect trees are defined as shown in Figure 5.7.
The raw fitness of a subtree is the score of its root. Each function calculates its score by summing the
weighted scores of its direct children. If a child is a perfect tree of the appropriate level (for instance, a
complete level-c tree beneath a d node), then the score of that subtree, times a FullBonus weight, is added
to the score of the root; if the child's root is incorrect, the weight is instead Penalty. After scoring the root,
if the function is itself the root of a perfect tree, the final sum is multiplied by CompleteBonus. Typical
values used are FullBonus = 2, PartialBonus = 1, Penalty = 1/3, and CompleteBonus = 2. The base case
of the score is a level-a tree, which has a score of 4 (the a–x connection is worth 1, times the FullBonus, times
the CompleteBonus).
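This scoring rule can be captured in a short Python sketch, with trees represented as (label, children) pairs and only the x terminal modeled. Applying PartialBonus to a correctly rooted but imperfect child is our reading of the description and should be treated as an assumption; the sketch does reproduce the base case, score(('a', [('x', [])])) = 2 * 1 * 2 = 4.

FULL_BONUS, PARTIAL_BONUS, PENALTY, COMPLETE_BONUS = 2.0, 1.0, 1.0 / 3.0, 2.0
LEVEL = {'x': 0, 'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5, 'f': 6}

def is_perfect(node):
    # A perfect level-k tree has a level-k root whose children are all
    # perfect trees of level k - 1 (an x terminal is perfect by itself).
    label, children = node
    return all(LEVEL[c[0]] == LEVEL[label] - 1 and is_perfect(c) for c in children)

def score(node):
    label, children = node
    if label == 'x':
        return 1.0
    total = 0.0
    for child in children:
        if LEVEL[child[0]] == LEVEL[label] - 1:   # correct root for this position
            w = FULL_BONUS if is_perfect(child) else PARTIAL_BONUS
        else:
            w = PENALTY                            # assumed reading of "incorrect root"
        total += w * score(child)
    return total * COMPLETE_BONUS if is_perfect(node) else total

Under this reading, a perfect level-b tree scores 2 * (2 * 4 + 2 * 4) = 32.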
The IOBs (input/output blocks) allow the connection of the circuit implemented by the CLBs (configurable
logic blocks) to any external system. Finally, the connection blocks (switch boxes and interconnection
lines) are employed for the internal routing of the circuit.
One of the main steps in the FPGA design process is placement and routing. We present a methodology
based on parallel GP, which has also been employed for tackling multi-FPGA system synthesis [29].
The problem we try to solve begins with a circuit description, and the goal is to place components
and wires on an FPGA. Genetic programming is thus in charge of encoding circuits, so that a graph —
the circuit — is described by means of a tree — the GP individual. In the following, we describe how graphs are
encoded by means of trees.
Although several authors have implemented GP in hardware [30,31], the idea here is completely
different: we use GP for implementing circuits on hardware.
FIGURE 5.9 Representing a circuit with black boxes and labeling connections.
5.6.1.2 GP Sets
The function set for our problem contains only one element, F = {SW}; similarly, the terminal set
contains only one element, T = {CLB}. But SW and CLB may be interpreted differently depending on the
position of the node within a tree. Sometimes a terminal node corresponds to an IOB connection, while
at other times it corresponds to a CLB connection in the FPGA. Similarly, an internal node — an SW node —
sometimes corresponds to a CLB connection (the first node in a branch), while at other times it affects switch
connections in the FPGA (an internal node of a branch; see Figure 5.10). Each node in the tree will
thus contain different information:
1. If we are dealing with a terminal node, it will have information about the position of the CLB, the
number of pins selected, the number of wires to which it is connected, and the direction taken
when placing the wire.
2. If we are, instead, in a function node, it will have information about the direction taken.
This information enables us to establish the switch connection or, in the case of the first node of
a branch, the number of the pin where the connection ends (a possible data layout is sketched below).
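As an illustration only, the node contents listed above could be recorded as in the following Python sketch; the field names and types are our assumptions, not the authors' actual data layout:

from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class GPNode:
    is_terminal: bool                                # CLB leaf vs. SW internal node
    direction: str                                   # routing direction taken at this step
    clb_position: Optional[Tuple[int, int]] = None   # CLB coordinates (terminals only)
    pin: Optional[int] = None                        # selected pin number
    n_wires: int = 0                                 # number of wires connected here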
5.6.1.4 Results
Figure 5.9 graphically depicts one of the test circuits that has been used for validating the methodology.
The main parameters employed were the following: maximum number of generations equal to 500,
maximum tree depth equal to 30, steady state, tournament selection of size 10, crossover probability
equal to 98%, mutation probability equal to 2%, ramped half-and-half initialization, and elitism (i.e., the best
individual is added to the new population at each generation). We employed 5 subpopulations of
500 individuals each, with a migration period of 10 generations. The GP tool we used is described in
Reference 18.
Figure 5.11 shows one of the proposed solutions among those obtained with parallel GP for the circuit.
A very important fact is that each of the solutions that GP found possesses different features, such as area
of the FPGA used and position of the input/output terminals. This means that the methodology could
easily be adapted for managing typical constraints in FPGA placement and routing.
Figure 5.12 presents a comparison of parallel GP and classic GP when applied to the problem of
placement and routing on FPGAs. We can see that parallel GP, employing 5 populations of 500 individuals
each, achieved better convergence results than GP with 2500 individuals — the same total
number of individuals: PGP converges more quickly and obtains slightly better results.
The methodology has been successfully applied to Multi-FPGAs System Synthesis [29]. Figure 5.13
shows a picture of the device that was built for testing the methodology.
FIGURE 5.12 Comparison between parallel GP — 5 pop., 500 individuals each — and classic GP — 2500 individuals.
50 runs have been performed and averaged for each curve.
5.6.2.3 A Case Study: Burns Unit, Virgen del Rocío Hospital. Seville, Spain
In order to apply the methodology, the collaboration of medical specialists from the area of burn treatment
was required.
The Burns Unit at Virgen del Rocío Hospital was in charge of retrieving information about real cases
of burns. A form was provided for several specialists to collect information on cases that would be
useful for testing the problem. Several photographs were taken in each case, and several features
of the burns were noted on the form.
Thirty-one different clinical cases were studied. Each completed form was accompanied by two
photographs. No image processing was done, but the photographs are necessary for studying the different
parameters that will be automatically retrieved in the future. In this research, following the specialists'
indications, just three parameters have been studied.
No photographs are shown here, in order to preserve the privacy of the people who took part in the
study, but all of them are stored in the Burns Unit at the Virgen del Rocío Hospital, Seville, Spain.
We are aware that an accurate diagnosis requires the study of additional features, but we believe that
these three parameters are enough to see how the methodology works, and to obtain a first sketch for a
knowledge system based on GP.
Following specialist doctors’ advice, we decided to develop the first version of the knowledge system
taking into account just three different parameters from each picture:
1. Color (C): several possible values: White, Black, Yellow, Red, Pink, and combinations
2. Dryness (D): two possibilities: True or False
3. Ampoule (A): two possibilities: Present or Absent
Although studying a wider set of parameters would be more useful, this simple model confirms the
validity of the methodology employed. The aim was not to create a tool that classifies 100% of the cases,
but to demonstrate how easily a doctor's knowledge can be captured automatically by means of GP and
decision trees.
Four kinds of diagnosis are possible for a burn: first degree, surface second degree, deep second degree,
and third degree.
In order to train the system, a table with the parameters, obtained by studying the photographs, and the
corresponding diagnoses was built.
5.6.2.4 Results
A set of 31 clinical cases were provided by doctors, and these cases were used to train the system. Each of
them was allocated its corresponding parameters together with the result of its evolution over a couple of
weeks. This table of values was given to the algorithm in order for it to train.
Given the way GP works, the function and terminal sets were established as F = {ifthenelse, >, =, <,
AND, NOT} and T = {C, D, A}.
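To illustrate how such an individual is evaluated, the Python sketch below encodes a hand-written tree over the same primitives. The integer color encoding and the tree itself are hypothetical and are not the evolved tree of Figure 5.14.

def ifthenelse(cond, a, b):
    # The ifthenelse primitive from F, applied to evaluated subtree values.
    return a if cond else b

# Hypothetical integer encoding so that <, =, > comparisons between colors
# are meaningful; the study's actual mapping is not reproduced here.
COLOR = {'White': 0, 'Black': 1, 'Yellow': 2, 'Red': 3, 'Pink': 4}

def classify(C, D, A):
    # A hand-written example tree, shown only to illustrate evaluation.
    return ifthenelse(not A, 'D2',
           ifthenelse(not D, 'S2',
           ifthenelse(COLOR[C] <= COLOR['Yellow'], 'D2', 'S2')))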
We ran the algorithm employing 5 populations of 2000 individuals each, with a migration period
of 10 generations, and waited for 60 generations before taking the results. At the end, the best decision
tree we obtained was able to classify correctly 27 out of the 31 cases. The misclassifications were due to
the presence of several cases with the same parameters but different evolution values; it was, consequently,
impossible to categorize these cases accurately. Figure 5.14 shows the decision tree obtained.
The meanings for nodes are the following: A, Ampoule; D, Dryness; C, Color; D2, Deep second degree;
S2, Surface second degree; 3, third degree; P, Pink color; Y, Yellow color.
Bearing in mind that each color has an integer number associated with it, comparisons among them are
meaningful. Of course, many different versions of the same tree can be obtained in different executions
of the algorithm. For the sake of simplicity, we show just one.
Using this methodology, we are able to represent medical knowledge by means of decision trees. This
is just an example about the use of GP for the task proposed, although for obtaining a completely reliable
decision tree a much larger set of examples must be used, and training and test sets have to be built and
employed in the learning process.
Finally, Figure 5.15 presents a comparison between parallel GP and GP when applied to the problem
of medical knowledge representation. We can see that parallel GP employing 5 populations and 2000
individuals per population achieved better convergence results than GP with 10000 individuals (the same
total amount of individuals).
We have to consider that the curve shown for PGP is not taking into account the time savings obtained
when using a parallel architecture, such as a multiprocessor system or a cluster of computers. If we take
both improvements into account — time savings and improvement in the convergence process — parallel
GP is far superior to plain GP.
FIGURE 5.14 The decision tree obtained for the burns diagnosis problem.
FIGURE 5.15 Comparison between parallel GP (5 populations, 2000 individuals each) and classic GP (10,000 individuals).
Acknowledgment
Part of this research has been possible thanks to Ministerio de Ciencia y Tecnología research project
number TIC2002-04498-C05-01.
References
[1] T. Mitchell. Machine Learning. McGraw Hill, New York, 1996.
[2] J.R. Koza. Genetic Programming. The MIT Press, Cambridge, MA, 1992.
[3] C. Darwin. On the Origin of Species by Means of Natural Selection. John Murray, London, 1859.
[4] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann
Arbor, MI, 1975.
[5] L.J. Fogel, A.J. Owens, and M.J. Walsh. Artificial intelligence through a simulation of evol-
ution. In M. Maxfield, A. Callahan, and L.J. Fogel, Eds., Biophysics and Cybernetic Systems:
Proceedings of the second Cybernetic Sciences Symposium, Spartan Books, Washington, D.C., 1965,
pp. 131–155.
[6] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen
Evolution. Frommann-Holzboog, Stuttgart, 1973 (in German).
[7] H.P. Schwefel. Evolutionsstrategie und numerische Optimierung. Ph.D. thesis, Technische
Universitat Berlin, Berlin, 1975.
[8] W.B. Langdon. Data Structures and Genetic Programming: Genetic Programming + Data Structures =
Automatic Programming! Kluwer Academic Publishers, New York, 1998.
[9] J.R. Koza, F.H. Bennett III, and O. Stiffelman. Genetic programming as a Darwinian invention
machine. Genetic Programming: Proceedings of EuroGP ’99, LNCS, Vol. 1598. Springer-Verlag, May
1999.
[10] K. Stoffel and L. Spector. High-performance, parallel, stack-based genetic programming. In
Koza, J.R., et al., Eds., Genetic Programming 1996: Proceedings of the First Annual Conference.
Stanford University, MIT Press, CA, 1996, pp. 224–229.
[11] W.B. Langdon and R. Poli. Fitness causes bloat. In P.K. Chawdhry, R. Roy, and R.K. Pant, Eds., Soft
Computing in Engineering Design and Manufacturing. Springer-Verlag, London, 1997, pp. 13–22.
[12] M. Tomassini. Parallel and distributed evolutionary algorithms: a review. In K. Miettinen,
M. Mäkelä, P. Neittanmäki, and J. Périaux, Eds., Evolutionary Algorithms in Engineering and
Computer Science. John Wiley & Sons, New York, 1999, 113–133.
[13] M. Oussaidéne, B. Chopard, O. Pictet, and M. Tomassini. Parallel genetic programming and its
application to trading model induction. Parallel Computing, 23: 1183–1198, 1997.
[14] F. Fernández, M. Tomassini, and L. Vanneschi. An empirical study of multipopulation genetic
programming. Genetic Programming and Evolvable Machines, 4: 21–52, 2003.
[15] J.P. Cohoon, S.U. Hegde, W.N. Martin, and D. Richards. Punctuated equilibria: A parallel genetic
algorithm. In J.J. Grefenstette, Ed., Proceedings of the Second International Conference on Genetic
Algorithms. Lawrence Erlbaum Associates, Mahwah, NJ, 1987, p. 148.
[16] R. Tanese. Parallel genetic algorithms for a hypercube. In J.J. Grefenstette, Ed., Proceedings of
the Second International Conference on Genetic Algorithms. Lawrence Erlbaum Associates, 1987,
pp. 177–183.
[17] F. Fernández de Vega. Distributed Genetic Programming Models with Application to Logic Syn-
thesis on FPGAs. Ph.D. thesis, Computer Science Department, University of Extremadura, Cáceres,
Spain, 2001.
[18] F. Fernández, M. Tomassini, L. Vanneschi, and L. Bucher. A distributed computing environment
for genetic programming using MPI. In J. Dongarra, P. Kaksuk, and N. Podhorszki, Eds., Recent
Advances in Parallel Virtual Machine and Message Passing Interface, Vol. 1908 of Lecture Notes in
Computer Science. Springer-Verlag, Heidelberg, 2000, pp. 322–329.
[19] F. Fernandez, M. Tomassini, and J.M. Sanchez. Experimental study of isolated multipopulation
genetic programming. In IEEE International Conference on Industrial Electronics, Control and
Instrumentation, Nagoya, Japan, 2000. IEEE Press, Washington, 2000, pp. 2672–2677.
[20] W.B. Langdon and R. Poli. Foundations of Genetic Programming. Springer-Verlag, Berlin, 2002.
[21] P. Tufts. Parallel case evaluation for genetic programming. In 1993 Lectures in Complex Systems,
Vol. VI of Santa Fe Institute Studies in the Science of Complexity, 1993, pp. 591–596.
[22] H. Juille and J.B. Pollack. Parallel genetic programming and fine-grained SIMD architecture. In
E.V. Siegel and J.R. Koza, Eds., Working Notes for the AAAI Symposium on Genetic Programming.
MIT, Cambridge, MA, November 10–12, 1995. AAAI, pp. 31–37.
[23] P.J. Angeline and K.E. Kinnear Jr. (Eds.). Advances in Genetic Programming 2. The MIT Press,
Cambridge, MA, 1996.
[24] W. Punch. How effective are multiple populations in genetic programming. In J.R. Koza,
W. Banzhaf, K. Chellapilla, K. Deb, M. Dorigo, D.B. Fogel, M. Garzon, D. Goldberg, H. Iba,
and R.L. Riolo, Eds., Genetic Programming 1998: Proceedings of the Third Annual Conference.
Morgan Kaufmann, San Francisco, CA, 1998, pp. 308–313.
[25] D. C. Dracopoulos and S. Kent. Bulk synchronous parallelisation of genetic programming. In Jerzy
Waśniewski, Ed., Applied Parallel Computing: Industrial Strength Computation and Optimization;
Proceedings of the third International Workshop, PARA ’96, Springer-Verlag, Berlin, Germany, 1996,
pp. 216–226.
[26] S. Tongchim and P. Chongstitvatana. Nearest neighbor migration in parallel genetic programming
for automatic robot programming. In Proceedings of the Sixth International Conference on Control,
Automation, Robotics and Vision, Singapore, December 2000.
[27] G. Folino, C. Pizzuti, and G. Spezzano. A scalable cellular implementation of parallel genetic
programming. IEEE Transactions on Evolutionary Computation, 7: 37–53, 2003.
[28] J.R. Koza, F.H. Bennett III, D. Andre, and M.A. Keane. Genetic Programming III: Darwinian
Invention and Problem Solving. Morgan Kaufmann, San Francisco, CA, 1999.
[29] F. Fernandez, I. Hidalgo, J. Lanchares, and J.M. Sanchez. A methodology for reconfigurable
hardware design based upon evolutionary computation. Microprocessors and Microsystems,
28: 363–371, 2004.
[30] P. Martin. A hardware implementation of a genetic programming system using FPGAs and
Handel-C. Genetic Programming and Evolvable Machines, 2: 317–343, 2001.
[31] M.I. Heywood and A.N. Zincir-Heywood. Register-based genetic programming on FPGA computing
platforms. In R. Poli, W. Banzhaf, W.B. Langdon, J.F. Miller, P. Nordin, and T.C. Fogarty, Eds.,
Proceedings of the European Conference on Genetic Programming, Vol. 1802 of Lecture Notes in
Computer Science. Springer-Verlag, London, 2000, pp. 44–59.
[32] J. H. Holmes. Discovering risk of disease with a learning classifier system. In T. Bäck, Ed., Proceed-
ings of the Seventh International Conference on Genetic Algorithms (ICGA97). Morgan Kaufmann,
San Francisco, CA, 1997.
[33] J.R. Quinlan. Decision trees and instance-based classifiers. In The Computer Science and
Engineering Handbook, 1997, pp. 521–535.
[34] M. Kurzynski. The application of unified and combined recognition decision rules to the
multistage diagnosis problem. In Proceedings of the 20th Annual International Conference of the
IEEE Engineering in Medicine and Biology Society, Vol. 3. Hong-Kong, 1998, pp. 1194–1197.
6.1 Introduction
An emergent phenomenon is the large-scale group behavior of a system that does not seem to have any
explanation in terms of the single constituent parts only. In other words, emergence can be defined by
saying that “the whole is greater than the sum of the parts.” In emergent systems, we can consider two
different levels of description: the microscopic level, where all the single components are taken into account;
and the macroscopic level, where emergent behavior occurs as the synthesis of the complex interaction of
the microscopic components. To bring emergent systems out of a speculative horizon, it is necessary to
experiment with and test them. In particular, the simulation of emergent systems on parallel computers is an
essential practice for an in-depth analysis and for evaluating the accuracy of proposed models of emergent
behavior.
The programming of emergent phenomena and systems using traditional programming models and
tools is very difficult and involves long and complex coding. This is mainly because these approaches
are based on the design of a system as a whole; hence, design and programming do not start from the basic
elements. It is better to design emergent systems by means of paradigms that allow the
behavior of single elements and their interactions to be expressed. The global behavior of these systems then emerges from
the evolution and interaction of a massive number of elements; hence, it does not need to be explicitly
coded.
The cellular automata (CA) model is a massively parallel computational model that can be effectively
used for the investigation and simulation of emergent phenomena and systems. Cellular automata are
inherently parallel; therefore, they can be used to model and simulate very large-scale emergent systems on
parallel computers [1,2]. Cellular parallel tools allow for the exploitation of the inherent parallelism of CA
in the implementation of natural solvers that simulate dynamical emergent systems by a massive number
of simple agents (cells) that interact locally. Parallel cellular languages and environments provide useful
design and programming tools for the development of scalable simulations and models of emergent
behavior. This approach is a valid alternative to complex and expensive laboratory experiments and
simulations [3].
We discuss here how the basic CA concepts are related to emergent systems, describe parallel CA
environments and tools for programming emergence in complex systems, and present some significant
programming examples of emergent systems. The remainder of the chapter is organized as follows:
Section 6.2 introduces CA and Section 6.3 outlines the main issues in parallel CA programming, describes
the main features of the CAMELot environment and its programming language CARPET, and discusses
some related systems. Section 6.4 shows how different classes of CA can be implemented by using the
CARPET language. Section 6.5 presents two examples of emergent phenomena programmed according to
the parallel CA model in the CARPET cellular language and gives performance figures for them. Finally,
Section 6.6 draws some conclusions.
• Transition function: Set of rules that define how the state of each cell changes on the basis of its
current state and the states of its neighbor cells.
• State: The state of a cellular automaton (global state) is completely specified by the values of the
variables at each cell (local state). The state of a cell is a simple or structured variable (see substate)
that takes values in a finite set. The cell state can be either a numeric value or a property. For
instance, if each cell represents part of a landscape, then the state might contain the altitude or the
type of land.
• Substate: If the cell state is represented as a structured variable, substates are the fields of the
structure that represent the attributes of the cell state. For example, if each cell represents a particle,
its state can be composed of two substates that represent particle mass and speed.
• Neighborhood: The set of cells that a cell interacts with. The neighborhood of a cell is typically
taken to be all immediately adjacent cells. Simple neighborhoods of a cell (C) in a two-dimensional
lattice are shown in Figure 6.1.
Let us define a CA as the 4-tuple $(E^d, S, N, \sigma)$, where:
• $E^d$ is a regular lattice (the elements of $E^d$ are called cells).
• $S$ is a finite set of states.
• $N$ is a finite set (with $|N| = n$) of neighborhood indices, such that for all $x \in N$ and all $c \in E^d$, $c + x \in E^d$.
• $\sigma : S^n \to S$ is the transition function.
A configuration $C_t : E^d \to S$ is a function that associates a state with each cell of the lattice. The effect
of the transition function $\sigma$ is to change the configuration $C_t$ into the new configuration $C_{t+1}$ according
to $C_{t+1}(c) = \sigma(\{C_t(i) : i \in N(c)\})$, where $N(c)$ denotes the set of neighbors of cell $c$, $N(c) = \{i \in E^d :
c - i \in N\}$. In standard CA, all cells of the automaton are updated synchronously in parallel, whereas
extended CA models define asynchronous updating [6]. In Section 6.4, we discuss nonstandard CA models.
The state of the entire automaton advances in discrete time steps. Therefore, in CA the transition function
plays a role analogous to that of the evolution equation in classical dynamical models. The global behavior
of the system is not directly specified but it is determined, in other words, it emerges by the evolution of
the states of all cells as a result of multiple interactions. Cellular automata capture the peculiar features of
systems that may be seen to evolve exclusively according to the local interactions of their constituent parts,
and guarantee computational universality. Furthermore, applied aspects of modeling have been widely
investigated from a theoretical viewpoint [4,5].
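As a concrete reading of this definition, the following minimal Python sketch (illustrative only, not part of any CA environment discussed in this chapter) applies a transition function σ synchronously to every cell of a one-dimensional toroidal lattice, computing C_{t+1} from C_t:

def ca_step(config, neighborhood, sigma):
    # One synchronous step: every cell reads its neighbors' states in C_t,
    # and the new configuration C_{t+1} is built in a single pass.
    n = len(config)
    return [sigma(tuple(config[(c + x) % n] for x in neighborhood))
            for c in range(n)]

# Illustrative rule: the new state is the XOR of the left and right neighbors.
config = [0, 1, 0, 0, 1]
config = ca_step(config, [-1, 1], lambda s: s[0] ^ s[1])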
High-level cellular languages express the cellular paradigm in a natural way; the resulting programs are
simpler to read, change, and maintain. On the other hand, the regularity of computation and the locality
of communication allow CA programs to achieve good performance and scalability on parallel architectures.
Recently, several CA environments have been implemented on current desktop computers. Examples of
these systems are CAT, CelLab, CaSim, CDM, Cellsim, DDLab, and Mathematica. A longer list can be found
in Reference 7, in which the main features of these systems are outlined. Sequential CA-based systems can
be used for educational purposes and very simple simulations, but real-world phenomena simulations
generally take a very long time, or in some cases cannot be executed on this class of systems because
of memory or computing power limits. Therefore, massively parallel computers are the appropriate
computing platform for the execution of CA models when real-life problems must be solved. In fact,
for two- and three-dimensional CA of large size the computational load can be enormous. Thus, if CA
are to be used for investigating large complex phenomena, their implementation on high performance
computers composed of several processors is a must.
In particular, general-purpose distributed-memory parallel computers offer a very useful architecture
for a scalable CA machine in terms of speed-up, programmability, and portability. These systems are based
on a large number of interconnected processing elements (PE), which perform a task in parallel. According
to this approach, in recent years several parallel cellular software environments have been developed.
Significant examples of these parallel cellular systems are CAMELot, StarLogo, NEMO, P-CAM [8],
Cellular, ParCel-1, PECANS [9], and DEVS. Some are discussed in Reference 1. Together with these
software systems, parallel CA hardware has been developed for a more efficient execution of CA algorithms.
Two examples of CA hardware are the CAM-8 machine [10] and the CEPRA FPGA machine. These
special-purpose machines exploit CA parallelism very efficiently, although they do not support general
computation models.
Cellular automata parallel systems allow a user to exploit the inherent parallelism of CA to support the
efficient simulation of complex systems that can be modeled by a very large number of simple elements
(cells) with local interaction only. Cellular automata-based languages share several features such as a
common computational paradigm and some differences such as, for example, different constructs to
specify details of a CA or of mapping and output visualization. Many real-world applications in science
and engineering, such as lava-flow simulations, molecular gas simulation, landslide modeling, freeway
traffic flow, three-dimensional rendering, soil bioremediation, biochemical solution modeling, and forest
fire simulation, have been implemented by using these CA languages. Moreover, parallel CA languages can
be used to implement a more general class of fine-grained applications such as finite elements methods,
partial differential equations, and systolic algorithms.
The main issues that influence the way in which CA languages support the design of applications on
high performance architectures are:
1. The programming approach: The unit of programming is the single cell of the automaton.
2. The cellular lattice declaration: It is based on the definition of the lattice dimension and the lattice
size.
3. The cell state definition and operations: Cell state is defined as a single variable or a record of typed
variables; cell state access and update operations are needed.
4. The neighborhood declaration and use: The neighborhood concept is used to define the interaction
among cells in the lattice.
5. The parallelism exploitation: The unit of parallelism is the cell, and parallelism, like communication,
is implicit.
6. The cellular automata mapping: Data partitioning and process-to-processor mapping is implicit at
the language level.
7. The output visualization: Automaton global state, as the collection of the cell states, is shown as it
evolves.
Many of these issues are taken into account in parallel CA systems and similar or different solutions are
provided by parallel CA languages. By discussing these concepts, we intend to illustrate how this class of
languages can be effectively used to implement high-performance applications in science and engineering
using the massively parallel cellular approach.
Programming Approach. When a programmer starts to design a parallel cellular program, she/he must
define the structure of the lattice that represents the abstract model of a computation in terms of cell-
to-cell interaction patterns. She/he must then concentrate on the unit of computation, that is, a single
cell of the automaton. The computation to be performed must be specified as the evolution rule (transition
function) of the cells that compose the lattice. Thus, in contrast with other approaches, a user does not
specify a global algorithm that contains the program structure in an explicit form. The global algorithm
consists of all the transition functions of all cells, executed in parallel for a certain number of iterations
(steps).
It is worth noticing that in some CA languages it is possible to define transition functions that change
in time and space, to implement inhomogeneous CA computations. Thus, after defining the dimension
(e.g., one, two, or three) and the size of the CA lattice, she/he needs to specify, by means of conventional
and CA-specific statements, the transition function that will be executed by all the cells. The global
execution of the cellular program is then performed as a massively parallel computation in which
implicit communication occurs only among neighbor cells that access each other's state.
Cellular Lattice Declaration. As mentioned in the previous section, the lattice declaration defines the
lattice dimension and the lattice size. Most languages support two-dimensional rectangular lattices only
(e.g., CANL and CDL). However, some of them, such as CARPET and Cellang, allow the definition of
one-, two-, and three-dimensional lattices. Some languages also allow the explicit definition of boundary
conditions; for example, CANL [9] supports adiabatic boundary conditions, where absent neighbor cells
are assumed to have the same state as the center cell. Others implement reflecting conditions that are
based on mirroring the lattice at its borders. Most languages use standard boundary conditions such as
fixed and toroidal conditions.
Cell State. The cell state contains the values of the data on which the cellular program works. Thus the
global state of an automaton is defined by the collection of the state values of all the cells. While low-level
implementations of CA define the cell state as a small number of bits (typically 8 or 16),
cellular languages such as CARPET, CANL, DEVS-C++, and CDL allow a user to define cell states as a
record of typed variables, for instance (in the CARPET-style syntax used later in this chapter):
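state (float mass, speed);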
where two substates are declared for the cell state. According to this approach, the cell state can be composed
of a set of substates that are of integer, real, char, or Boolean type and in some cases (e.g., CARPET), arrays
of these basic types can also be used. Together with the constructs for cell state definition, CA languages
define statements for state addressing and updating that address the substates by using their identifiers;
for example, cell.direction indicates the direction substate of the current cell.
Neighborhood. An important feature of CA languages that differentiates them from array-based languages
and standard data-parallel languages is that they do not use explicit array indexing. Cells are instead
addressed by their own name or the names of the cells belonging to the neighborhood. In fact, the
neighborhood concept is used in the CA setting to define the interaction among cells in the lattice. In
CA languages, the neighborhood defines the set of cells whose state can be used in the evolution rule of
the central cell. For example, a simple neighborhood composed of four cells can be declared as follows
(a CARPET-style sketch consistent with the identifiers used below):
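neighbor Neumann[4] ([0,-1]up, [-1,0]left, [0,1]down, [1,0]right);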
and address the neighbor cell states by the ids used in the declaration (e.g., down.speed,
left.direction). The neighborhood abstraction is used to define the communication pattern among
cells: at each time step, a cell sends its state values to, and receives those of, its neighbor cells.
In this way implicit communication and synchronization are realized in cellular computing.
The neighbor mechanism is similar to the region construct of the ZPL language [11], in which regions
replace explicit array indexing, making the programming of vector- or matrix-based computations
simpler and more concise. Furthermore, this way of addressing the lattice elements (cells) requires
neither sophisticated compile-time analysis nor complex run-time checks to detect communication
patterns among elements.
Parallelism Exploitation. Cellular automata languages do not provide statements to express parallelism
at the language level, so a user does not need to specify what portion of the code must be executed in
parallel. In parallel CA languages the unit of parallelism is a single cell, and parallelism, like communication
and synchronization, is implicit. This means that, in principle, the transition function of every cell is
executed in parallel with the transition functions of the other cells. In practice, when coarse-grained
parallel machines are used, the number of cells N is greater than the number of available processors P,
so each processor executes a block of N/P cells that can be assigned to it using a domain decomposition
approach.
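As a sketch of this idea (in Python; the row-wise decomposition and the function name are illustrative assumptions, not the API of any particular system), a run-time system might assign contiguous blocks of lattice rows to processors as follows:

def block_bounds(num_rows, num_procs, rank):
    # Rows [lo, hi) of the lattice assigned to processor `rank`;
    # any remainder rows are spread over the first processors.
    base, extra = divmod(num_rows, num_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

# With 1000 rows on 12 processors, processor 0 gets rows [0, 84).
print(block_bounds(1000, 12, 0))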
CA Mapping. Like parallelism and communication, data partitioning and process-to-processor mapping
is implicit in CA languages. The mapping of cells (or blocks of them) onto the physical processors that
compose a parallel machine is generally done by the run-time system of each particular language and the
user usually intervenes in selecting the number of processors or some other simple parameter.
Some systems that run on MIMD computers use load-balancing techniques that assign the execution
of cell transition functions at run time to unloaded processors, or use greedy mapping techniques that
prevent processors from remaining idle for long periods during the CA execution. Examples
of these techniques can be found in References 1, 12, and 13.
Output Visualization and Monitoring. A computational science application is not just an algorithm,
so a programming paradigm alone is not sufficient for implementing a complete application. It
is also important to have environments and tools that help a user in all the phases of application
development and execution. Most of the CA languages we discuss here provide a development
environment that allows a user not only to edit and compile the CA programs, but also to monitor the
program behavior during its execution on a parallel machine, by visualizing the output as composed of
the states of all cells. This is done by displaying the numerical values or by associating colors with these
values. Examples of these parallel environments are CAMEL for CARPET, PECANS for CANL, and DEVS
for DEVS-C++. Some of these environments provide dynamic visualization of simulations together with
monitoring and tuning facilities. Users can interact with the CA environment to change values of cell
states, simulation parameters, and output visualization features. These facilities are very helpful in the
development of complex scientific applications and make it possible to use these CA environments as real
problem-solving environments (PSEs) [14].
In the rest of this section, we outline some of the listed issues by discussing the main features of
CAMELot, a general-purpose system that can easily be used for programming emergent systems with
the CARPET cellular programming language according to a massively parallel paradigm, and by briefly
comparing some related parallel CA environments and languages.
6.3.1 CAMELot—CARPET
CAMELot (CAMEL open technology) is a parallel software environment designed to support the parallel
execution of cellular algorithms, the visualization of the results, and the monitoring of cellular program
execution [15]. CAMELot is an MPI-based portable version of the CAMEL system built around the CARPET
language. CARPET (CellulAR Programming EnvironmenT) offers a high-level cellular paradigm that
provides a user with the main CA features, assisting her/him in the design of parallel cellular algorithms
without apparent parallelism [6].
A CARPET user can develop cellular programs describing the actions of many simple active elements
(implemented by cells) interacting locally. The CAMELot system then executes the cell evolution in
parallel and allows a user to observe the global complex evolution that arises from all the local interactions.
CARPET uses a C-based grammar with additional constructs to describe the rules of the transition
function of a single cell. In a CARPET program, a user defines the basic rules of the system to be
simulated (by the cell evolution rule), but she/he does not need to specify details about the parallel
execution. The language includes a declaration section (cadef) for the lattice dimension, the
neighborhood radius, the cell state, the neighborhood pattern, and global parameters; an update
statement for changing substate values; the predefined variable step; and functions such as GetX,
GetY, GetZ, and random, all of which appear in the examples discussed in this chapter.
Figure 6.2 shows a simple CA programmed in CARPET that implements Fredkin's rule: a cell becomes
alive if the number of living cells in its neighborhood is odd, and dead if that number is even. Although
very simple, Fredkin's rule has the fascinating property that any initial pattern of living cells is replicated
several times on a larger scale.
CARPET and CAMELot have been used for implementing high-performance real-world simulations
based on the emergence paradigm such as lava flow, traffic flow, and forest fire simulations [6]. In
Section 6.5 the CARPET language is used to program two significant examples of emergent systems. Its
main linguistic features are outlined by describing how it supports the implementation of real emergent
applications.
#define dead 0
#define alive 1
cadef
{
dimension 2;
radius 1;
state (short value);
neighbor Neumann[4] ([0,-1]North, [-1,0]West,
[0,1]South,[1,0] East);
}
int i, sum = 0;
{
 for (i = 0; i < 4; i++)
  sum = sum + Neumann[i]_value;
 if (sum % 2 == 0)
  update(cell_value, dead);
 else
  update(cell_value, alive);
}
FIGURE 6.2 Fredkin's rule written in the CARPET language.
In the related ZPL language mentioned earlier, a region such as [1..n, 1..n] specifies the standard indices
of an n × n array. A ZPL program next declares a set of directions. Directions
are used to transform regions, as in the expression north of X. As in cellular programming, array indexing
is avoided in ZPL by referring to adjacent array elements using the @ operator, which is similar to the
neighborhood mechanism of CARPET. An expression A@d, executed in the context of a region X, results
in an array of the same size and shape as X composed of elements of A offset in the direction d. ZPL does
not have parallel directives as in data-parallel languages such as HPF or other forms of explicit parallelism.
This implicit computation can be parceled out to different processors to get parallelism. Thus, parallelism,
as in cellular programming, comes simply from the semantics of the array operations.
6.4.1 Inhomogeneous CA
Inhomogeneous CA are a generalization of standard CA. In CA we can have spatial inhomogeneity,
temporal inhomogeneity, or both. In spatially inhomogeneous CA, there is not a single transition function
σ but a set of different transition functions σ_1, σ_2, . . . , σ_N associated with different cells or regions
of the CA, in which different neighborhoods can also be defined. This class of automata can be implemented
in CARPET using the operations GetX, GetY, and GetZ, which return the values of the X, Y, and
Z coordinates of a cell in the CA lattice. For example, if we want to use a different transition function in
a rectangular region of a two-dimensional lattice, we can write the following code:
{
. . . .
if ((GetX >= 100 && GetX <= 200) && (GetY >= 80 && GetY <= 400))
{ trans-funct1()}
else
{ trans-funct2()}
. . . .
}
The same approach can be used to assign different transition functions to the cells on the border of the
lattice, to define boundary conditions. For instance, if two border sides are identified by the coordinates
X = 0 and X = 400, to use a dedicated transition function for the cells placed on those borders we can write
. . . .
if (GetX == 0 || GetX == 400)
{ border-trans-funct()}
else
{ normal-trans-funct()}
In temporally inhomogeneous CA, the transition function changes over time. This generalization of
standard CA is very useful in the modeling and simulation of phenomena that consist of several computational
phases. Thus, for a certain number of iterations a function σ_ta is used, then another function σ_tb is used
for another time interval, and so on, depending on the kind of computation that must be performed.
This class of CA can be easily programmed in CARPET by using the predefined variable step. For
example, Figure 6.3 shows the structure of a CARPET algorithm for a two-dimensional CA composed of a
sequence of three different transition functions. Temporal inhomogeneity can also be implemented using
conditions that combine the step variable with more complex logic expressions.
cadef
{
dimension 2;
radius 1;
....
}
{
if (step == 1)
trans_funct1();
else
if (step > 1 && step <= 10 )
trans_funct2();
else
trans_funct3();
}
FIGURE 6.3 The structure of a CARPET program composed of a sequence of three transition functions.
6.4.2 Asynchronous CA
In standard CA all cells are updated at once; in asynchronous CA, instead, at each time step only a subset
of the cells update their state according to the transition function σ. This class of CA is useful in the
simulation of asynchronous systems, where it is not necessary to update the state of all components at
the same time.
Asynchronous CA can be programmed in CARPET by using the random function, which allows a user
to express nondeterminism in the local transition function of each cell. The random(n) function returns
a pseudo-random integer between 0 and n, and can be used whenever the state of a cell must be updated.
For example, the following sketch shows a possible nondeterministic update of a substate of a cell (notice
that the randomize function call creates a different seed for the random number generator, avoiding
the generation of the same random value in every cell):
cadef
{
 . . . .
}
{
 randomize();
 if (random(1) == 1)             /* illustrative condition: update only half of the cells, on average */
  update(cell_value, 1);         /* the substate name and value are illustrative */
}
The random function can also be used to implement probabilistic CA, which in some respects are
similar to asynchronous CA. In a probabilistic CA, given a local configuration c, each cell can enter a new
state s with a probability σ(c, s), where Σ_{s∈S} σ(c, s) = 1. This type of CA is useful in the simulation
of probabilistic phenomena that occur in physics and other sciences. As mentioned before, the random
function makes it possible to implement this modification of the standard CA model in CARPET. In
particular, this can be done by using a switch statement whose case branches are selected according
to probability values computed from a random number, each branch containing the update operation
to be executed with the probability assigned to it.
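The following Python sketch (not CARPET; the states and probabilities are illustrative) shows the branch-selection logic just described: a single random draw selects one branch, and each branch fires with its assigned probability.

import random

def probabilistic_update(current_state, branches):
    # branches: list of (probability, new_state) pairs whose probabilities
    # sum to 1, one pair per "case" branch of the switch described above.
    r = random.random()
    cumulative = 0.0
    for prob, new_state in branches:
        cumulative += prob
        if r < cumulative:
            return new_state
    return current_state   # guard against floating-point round-off

# The cell keeps state 0 with probability 0.7, moves to 1 with 0.2, to 2 with 0.1.
new_state = probabilistic_update(0, [(0.7, 0), (0.2, 1), (0.1, 2)])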
cadef
{
dimension 2;
radius 1;
state (int sub1, sub2, sub3, sub4);
neighbor VonNeu[4] ([0,-1]North,[-1,0]West,[0,1]South,[1,0] East);
}
int sum=0;
{
sum = VonNeu[0]_sub1+VonNeu[1]_sub2+VonNeu[2]_sub3+VonNeu[3]_sub4;
if (sum <= 2)
{
update(cell_sub1, 0);
update(cell_sub2, 1);
}
else
{
update(cell_sub3, 0);
update(cell_sub4, 1);
}
}
FIGURE 6.4 A two-dimensional partitioned CA written in the CARPET language.
6.4.3 Partitioned CA
Another significant modification of the standard CA model is the partitioned CA. Whereas in a standard
CA a cell can use the whole state of each neighboring cell to compute its next state, in a partitioned CA
only the ith component of the state of neighbor n_i is used to determine the new state of a cell. If the
state is partitioned as S = S_1 × S_2 × · · · × S_k, the transition function of a partitioned CA can be
expressed as

σ : S_1 × S_2 × · · · × S_k → S,

where the ith argument of σ is the ith component of the state of neighbor n_i and N = {n_1, n_2, . . . , n_k}.
Partitioned CA are useful in modeling systems where each cell in a neighborhood contributes toward
the change of the state of a central cell. Moreover, they make feasible transition functions that would
otherwise exceed software or hardware limits: in a partitioned CA the domain of σ has size |S| instead
of |S|^|N|.
The implementation of partitioned CA in CARPET is quite direct because the language allows the
definition of the cell state as a set of substates. Thus we can define the cell state of a partitioned CA as
composed of a number of substates equal to the number of neighbors. Figure 6.4 shows a simple CARPET
program that implements a two-dimensional partitioned CA.
The state of a cell is updated according to the sum of the ith component of the neighbor cell i. Since a
radius 1 von Neumann neighborhood is defined, the cell state is composed of four substates. If we use a
Moore neighborhood, a state composed of eight substates must be used.
Parallel CA environments have been used to implement emergent computations for several classes of
problems. We cannot discuss these systems in detail, but we give here two simple yet significant examples
of how to program emergent systems on parallel machines using a massively parallel cellular paradigm.
In particular, this section presents two examples of emergent systems programmed by using the
CARPET language and implemented on a Linux cluster. The first example is the Q2R Ising model and
the second one is an epidemic diffusion model.
#define up 1
#define down 0
cadef
{
dimension 2;
radius 1;
state (short spin);
neighbor Moore[8]([0,-1]N,[-1,-1]NW,
[-1,0]W,[-1,1]SW, [0,1]S,
[1,1]SE,[1,0]E,[1,-1]NE);
}
int i;
short sum = 0;
{
for (i = 0; i < 8; i++)
sum = Moore[i]_spin + sum;
if (sum == 4)
{ if (cell_spin == down)
update(cell_spin, up);
else
update(cell_spin,down);
}
}
FIGURE 6.5 The Q2R Ising model written in the CARPET language.
TABLE 6.1 Elapsed times (sec) of the Q2R Ising program on 1, 2, 4, 8, and 12 processors for different automata sizes.
Healthy cells become sick with a probability of 55% when next to a sick cell. Sick cells recover at any given
time step with a probability of 80%, but have a 10% chance of dying. The simple behavior of a single cell
can be programmed in CARPET as shown in Figure 6.7, and the global behavior of the epidemic implicitly
emerges from the parallel execution of a large number of cells that dynamically interact according to a
simple neighbor pattern.
Such a CA program can be used to investigate spatial clustering effects in the spread of a simple epidemic
or it can be the basic skeleton for a more complex model with many degrees of freedom. Its use makes
it very easy to try and test different scenarios and “what if ” situations such as effects of vaccination,
preventive measures, or the effects of new drugs.
Table 6.2 shows the performance results (elapsed time in seconds) up to 12 processors for the execution
of 100 iteration steps considering four different lattice sizes.
FIGURE 6.6 Speedup of the Q2R Ising program for lattices of size 1152 × 1152, 2304 × 2304, 4068 × 4068, and 9216 × 9216 on 1 to 12 processors.
Figure 6.8 shows the corresponding speedup. As the number of processors increases there is a
corresponding increase in the relative speedup, which implies that the computational power of the cluster
processors is exploited in an efficient way. In fact, the efficiency of the parallel epidemics program ranges
from 0.97 to 1, which means that all the CPUs are fully used during the CA program execution.
6.6 Conclusion
A large number of applications can be naturally expressed by combining the emergent system model and
the massively parallel paradigm. In many cases, designers do not use this approach because the available
tools do not support it. However, parallel implementations of CA-based emergent computation systems
and phenomena represent a valid and effective approach to the study of several classes of problems. These
kinds of simulation are very helpful in vital scientific areas such as biology, physics, chemistry, medicine,
social science, and economics. Cellular automata are a viable abstract model for representing emergent
decentralized systems and phenomena in those and other areas.
In this chapter, we discussed how to use parallel CA for complex system development and described
the main features of parallel CA environments for developing scalable emergent systems and phenomena.
Parallel cellular languages and tools provide a high-level framework for emergent computation and, at the
same time, they offer a scalable setting for getting high performance using parallel architectures. While
efforts in traditional sequential computer languages and systems design focused on how to express and
implement imperative operations and data, the main goal of the cellular paradigm is to offer a massively
parallel computational model based on a large number of simple cellular objects and operations that are
suitable for defining emergent complex systems.
We showed, through CARPET example programs, how the combination of the CA model with parallel
computing techniques and systems can be exploited to efficiently implement emergent computation
structures. Finally, modeling and simulation work through parallel cellular methods helps researchers by
#define blank 0
#define healthy 1
#define sick 2
cadef
{
dimension 2;
radius 1;
state (short status);
neighbor Moore[8]([0,-1]N,[-1,-1]NW,[-1,0]W,[-1,1]SW, [0,1]S,
[1,1]SE,[1,0]E,[1,-1]NE);
parameter (probS 0.55, probR 0.8, probD 0.1);
}
int i; float probX; short cond = 0;
{
for (i = 0; i < 8 && cond == 0; i++)
if (cell_status == healthy && Moore[i]_status == sick)
{ probX = random(1);
if (probX <= probS)
update(cell_status, sick);
cond = 1;
}
if (cell_status == sick)
{ probX = random(1);
if (probX <= probR)
update(cell_status, healthy);
else
{ probX = random(1);
if(probX <= probD)
update(cell_status,blank);
}
} }
FIGURE 6.7 The simple epidemics model written in the CARPET language.
TABLE 6.2 Elapsed times (sec) of the epidemics program for the execution of 100 iteration steps on 1, 2, 4, 8, and 12 processors for different automata sizes.
FIGURE 6.8 Speedup of the epidemics program for lattices of size 1152 × 1152, 2304 × 2304, 4068 × 4068, and 9216 × 9216 on 1 to 12 processors.
setting up a cross-disciplinary framework and assisting the designers during software development from
the design phase to the execution, tuning, and validation phases. Therefore, these environments can be
used as virtual laboratories where scientists may work cooperatively by programming and experimenting
as in a real laboratory, getting data and knowledge on the modeled systems.
A Linux version of the CAMELot system is available online at the webpage
www.icar.cnr.it/spezzano/camelot/carpet.html. Programmers, scientists, students, and professionals may
download and use it to simulate complex systems according to the CA model.
References
[1] D. Talia, “Cellular processing tools for high-performance simulation,” IEEE Computer, 33, 44–52,
2000.
[2] J.R. Weimar, Simulation with Cellular Automata, Logos-Verlag, Berlin, 1997.
[3] M. Sipper, “The Emergence of Cellular Computing,” IEEE Computer, 32, 18–26, 1999.
[4] J. von Neumann, Theory of Self Reproducing Automata, University of Illinois Press, IL, 1966.
[5] S. Wolfram, “Computation theory of cellular automata,” Communications in Mathematical Physics, 96, 15–57,
1984.
[6] D. Talia, “Implementing Standard and Nonstandard Parallel Cellular Automata in CARPET,”
In Proceedings of the 9th Euromicro Workshop on Parallel and Distributed Processing (PDP 2001),
Mantova, Italy, IEEE Computer Society Press, Washington, 2001, pp. 243–249.
[7] T. Worsch, “Simulation of cellular automata,” Future Generation Computer Systems, 16, 157–170,
1999.
[8] A. Schoneveld and J.F. de Ronde, “P-CAM: a framework for parallel complex systems simulations,”
Future Generation Computer Systems, 16, 217–234, 1999.
[9] L. Carotenuto, F. Mele, M. Mango Furnari, and R. Napolitano, “PECANS: A parallel environment
for cellular automata modeling,” Complex Systems, 10, 23–41, 1996.
[10] T. Toffoli and N. Margolus, Cellular Automata Machines: A New Environment for Modeling. The
MIT Press, Cambridge, MA, 1986.
[11] B.L. Chamberlain et al., “The Case for high level parallel programming in ZPL,” IEEE
Computational Science & Engineering, 5, 76–86, 1998.
[12] C. Hecker, D. Roytenberg, J.-R. Sack, and Z. Wang, “System development for parallel cellular
automata and its applications,” Future Generation Computer Systems, 16, 235–247, 1999.
[13] M. Cannataro, S. Di Gregorio, R. Rongo, W. Spataro, G. Spezzano, and D. Talia, “A Parallel Cellular
automata environment on multicomputers for computational science,” Parallel Computing, 21,
803–824, 1995.
[14] E. Gallopoulos, E.N. Houstis, and J.R. Rice, “Workshop on problem-solving environments:
findings and recommendations,” ACM Computing Surveys, 27, 277–279, 1995.
[15] G. Spezzano and D. Talia, “Programming cellular automata algorithms on parallel computers,”
Future Generation Computer Systems, 16, 203–216, 1999.
[16] J.D. Eckart, “Cellang 2.0: Reference Manual,” ACM Sigplan Notices, 27, 107–112, 1992.
[17] B.P. Zeigler et al., “The DEVS environment for high-performance modeling and simulation,” IEEE
Computational Science & Engineering, 4, 61–71, 1997.
7.1 Introduction
This chapter focuses on the class of algorithms called cellular evolutionary algorithms (cEAs). Here,
we present the canonical algorithm and some interesting variants targeted at solving complex problems
accurately with minimum customization effort. These techniques, also called diffusion or fine-grained
models, have been popularized, among others, by the early work of Gorges-Schleuter [1] and
Manderick and Spiessens [2]. The basic idea is to add some structure to the population of tentative
solutions. The pursued effect is to improve the diversity and exploration capabilities of the algorithm
while still admitting an easy combination with local search and other search techniques to improve
exploitation.
These structured models are based on a spatially distributed population in which genetic operations
may only take place in a small neighborhood of each individual. Usually, individuals are arranged on
a regular grid of dimension d = 1, 2, or 3. cEAs are a kind of decentralized EA model [3]; they are not
a parallel implementation of an EA, and although parallelism could be used to speed up the search,
we do not address parallel implementations in this work. In addition, it is worth remarking that, although
SIMD (single instruction stream–multiple data stream) machine implementations were popular a decade
ago, this is no longer true, and today the best distributed implementation of a cEA should make use of
domain decomposition on clusters of networked machines.
Although fundamental theory is still an open research line for cEAs, they have been empirically reported
as being useful in maintaining diversity and promoting slow diffusion of solutions through the grid. Part
of their behavior is due to a lower selection pressure compared with that of panmictic EAs (here panmictic
means that any individual may mate with any other individual in the population). The influence of the
selection method [4,5], neighborhood [6], and grid topology on the efficiency of cEAs in comparison with
other EAs [7] has been investigated in detail and tested on different applications, such as combinatorial
and numerical optimization.
Cellular evolutionary algorithms can be seen as stochastic cellular automata (CA) [8,9] where the
cardinality of the set of states is equal to the number of points in the search space. CAs, as well as cEAs,
usually assume a synchronous or “parallel” update policy, in which all the cells are updated simultaneously.
However, this is not the only option available. Indeed, several works on asynchronous CAs have shown
that sequential update policies have a marked effect on their dynamics [10,11]. Furthermore, the shape of
the structure in which individuals evolve has a deep impact on the performance of the cEA. The algorithm
admits a special, easy modulation of its shape that can sharpen the exploration or the exploitation
capabilities of the canonical technique, as shown in Reference 7. Thus, it is interesting to investigate
asynchronous cEAs and nonsquare shaped cEAs, in order to analyze their problem solving capabilities,
which is the subject of the second part of this chapter.
This work is organized as follows. Section 7.2 contains some background on synchronous and asyn-
chronous cEAs. In Section 7.3, we discuss the ability of cEAs to change their behavior depending
on the population shape. In Section 7.4, we illustrate, quantitatively, the modifications on the selection
intensity due to asynchronicity and population shape. We deal with a set of discrete and continuous
benchmark problems in Sections 7.5 and 7.6, respectively, with the goal of analyzing the actual computa-
tional power of the algorithms. Finally, Section 7.7 discusses our conclusions, as well as some comments
on future work.
The following asynchronous update policies can be distinguished (a sketch of all four is given after this list):
• In fixed line sweep (LS), the simplest method, the n grid cells are updated sequentially (1, 2, . . . , n),
line after line.
• In fixed random sweep (FRS), the next cell to be updated is chosen with uniform probability without
replacement; this produces a certain update sequence (c_1^j, c_2^k, . . . , c_n^m), where c_q^p means that
cell number p is updated at time q and (j, k, . . . , m) is a permutation of the n cells. The same
permutation is then used for all update cycles.
• The new random sweep (NRS) method works like FRS, except that a new random cell permutation is used
for each sweep through the array.
• In uniform choice (UC), the next cell to be updated is chosen at random with uniform
probability and with replacement. This corresponds to a binomial distribution for the update
probability.
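In Python, the four policies can be sketched as different ways of generating the order in which the n cells are visited in one sweep (the function and argument names are illustrative):

import random

def update_order(policy, n, fixed_perm=None):
    # Returns the sequence of cell indices (0..n-1) visited in one sweep.
    if policy == "LS":    # fixed line sweep: always 0, 1, ..., n-1
        return list(range(n))
    if policy == "FRS":   # fixed random sweep: one permutation, reused for all cycles
        return list(fixed_perm)
    if policy == "NRS":   # new random sweep: a fresh permutation for each sweep
        return random.sample(range(n), n)
    if policy == "UC":    # uniform choice: n draws with replacement
        return [random.randrange(n) for _ in range(n)]
    raise ValueError(policy)

perm = random.sample(range(1024), 1024)   # drawn once and reused for every FRS sweep
first_sweep = update_order("FRS", 1024, fixed_perm=perm)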
The concept of generation that is customarily used in EAs and in synchronous cEAs has to be replaced
by time step in the asynchronous cases. A time step is defined as n sequential cell updates, which
corresponds to updating all the n cells in the grid for LS, FRS, and NRS, and possibly fewer than n distinct
cells in the UC method, since some cells might be updated more than once. It should be noted that, with
the exception of LS, the other asynchronous updating policies are stochastic, representing an additional
source of nondeterminism besides that of the genetic operators.
Although it is called a “radius,” rad measures the dispersion of a set of n∗ cell positions. Other possible
measures for symmetrical topologies would allocate the same numerical value to different topologies
(which is undesirable); two examples are the radius of a circle surrounding a rectangle containing the
topology, and an asymmetry coefficient. The definition not only characterizes the grid shape but also
provides a radius value for the neighborhood. As proposed in Reference 6, the grid-to-neighborhood
relationship can be quantified by the ratio between their radii (Equation [7.2]):
ratio_cEA = rad_Neighborhood / rad_Topology.    (7.2)
When solving a given problem with a constant number of individuals (n = n∗, for making fair
comparisons), the topology radius will increase as the grid gets thinner (Figure 7.1[b]). Since the neighbor-
hood is kept constant in size and shape throughout this chapter (we always use linear5 [L5], Figure 7.1[a]),
the ratio becomes smaller as the grid gets thinner.
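The radius and ratio computations can be sketched as follows (in Python; since the text does not reproduce the exact formula for rad, the dispersion of the cell positions around their centroid is used here as a plausible reading of "the dispersion of n∗ patterns," and the function names are illustrative):

from math import sqrt

def rad(points):
    # Dispersion of a set of (x, y) cell positions around their centroid.
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)

def ratio(neigh_points, grid_points):
    # Equation (7.2): neighborhood radius over topology radius.
    return rad(neigh_points) / rad(grid_points)

# A thinner grid of (roughly) the same size has a larger topology radius,
# hence a smaller ratio.
L5 = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]
grid_5x5 = [(i, j) for i in range(5) for j in range(5)]
grid_3x8 = [(i, j) for i in range(3) for j in range(8)]
print(ratio(L5, grid_5x5), ratio(L5, grid_3x8))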
This ratio value directly influences the behavior of the algorithm: reducing the ratio during the search
means reducing the global selection intensity on the population (see Section 7.4), thus promoting
exploration; hence the importance of such a ratio measure. This is expected to allow a higher diversity
in the genotype, which could improve the results on difficult problems (such as multimodal or epistatic
tasks). On the other hand, the search performed inside each neighborhood guides the exploitation
of the algorithm. In this chapter we study how the ratio affects the search efficiency over a variety of
domains. Changing the ratio during the search is a unique feature of cEAs that can be used to shift from
FIGURE 7.1 (a) Radius of neighborhood L5. (b) 5 × 5 = 25 and 3 × 8 ≈ 25 grids; equal number of individuals
with two different ratios.
FIGURE 7.2 Relocation of an individual and its neighbors when the grid changes from shape (a) 8 × 8 to
(b) 4 × 16: the individual at position (2, 4) moves to ([2 · 8 + 4] div 16, [2 · 8 + 4] mod 16) = (1, 4).
exploration to exploitation at a minimum complexity without introducing another new algorithm family
in the literature.
Many techniques for managing the exploration/exploitation trade-off are possible. Among them, it is
worth mentioning heterogeneous EAs [12,13], in which algorithms with different features run in
separate subpopulations and collaborate in order to avoid premature convergence. A different alternative
is to use memetic algorithms [14], in which local search is combined with the genetic operators in order to
promote local exploitation.
Since a shift between exploration and exploitation can be made by changing the shape of the population
(and thus its radius), one can think of changing it during the search. Hence, we theoretically consider
the population as a list of length m · n, such that the first row of the m × n grid is composed of the first
n individuals of the list, the second row is made up of the next n individuals, and so on. Therefore,
when performing a change from an m × n grid to a new m′ × n′ grid (with m · n = m′ · n′), the individual
placed at position (i, j) will be relocated to position

((i · n + j) div n′, (i · n + j) mod n′).
We call this redistribution method contiguous, because the new grid is filled up by following the order of
appearance of the individuals in the list. Figure 7.2 contains an example of a grid change from 8×8 to 4×16.
It shows how the individual at position (2, 4) is relocated at (1, 4), changing the neighbors placed
at its north and south positions and keeping close to those placed at its east and west positions. Hence,
the change in the grid shape can be seen as an actual migration of individuals among neighborhoods,
which introduces additional diversity into the population for the forthcoming generations. Note that
in this chapter we only use static ratios, that is, grid and neighborhood shapes that do not change during
the run.
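A one-line sketch of the contiguous relocation rule (in Python; the function name is illustrative):

def relocate(i, j, n, n_new):
    # Row-major list index of (i, j) in the old m-by-n grid,
    # reinterpreted as a position in the new grid with n_new columns.
    index = i * n + j
    return index // n_new, index % n_new

assert relocate(2, 4, 8, 16) == (1, 4)   # the example of Figure 7.2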
FIGURE 7.3 Growth curves of the best individual for two cEAs with different neighborhood and population shapes
but similar ratio values: L5 in a 32 × 32 grid and C21 in a 64 × 64 grid. The vertical axis represents the proportion of
the population consisting of the best individual as a function of the number of generations.
FIGURE 7.4 Takeover times with tournament selection using an L5 neighborhood in a population of 1024 individuals
with different grid shapes, from 1 × 1024 (ratio 0.003) through 2 × 512 (0.006), 4 × 256 (0.012), and 8 × 128 (0.024)
to 16 × 64 (0.047); mean values over 100 runs. The vertical axis represents the proportion of the population consisting
of the best individual as a function of time. The horizontal axis is in logarithmic scale.
FIGURE 7.5 Takeover times for the synchronous, UC, NRS, LS, and panmictic models with tournament selection
using an L5 neighborhood in a 32 × 32 grid; mean values over 100 runs. The vertical axis represents the proportion
of the population consisting of the best individual as a function of the number of time steps.
We experiment with the set of problems studied in Reference 7, which includes the massively multimodal
deceptive problem (MMDP), the frequency modulation sounds problem (FMS), and the multimodal
problem generator P-PEAKS; we then extend this basic three-problem benchmark with error correcting
code design (ECC), maximum cut of a graph (MAXCUT), the minimum tardy task problem (MTTP), and
the satisfiability problem (SAT).
The fitness contribution of each 6-bit subfunction in MMDP depends on its unitation (number of ones):

Unitation            0         1         2         3         4         5         6
Subfunction value    1.000000  0.000000  0.360384  0.640576  0.360384  0.000000  1.000000
The choice of this set of problems is justified by both their difficulty and their application domains
(parameter identification, telecommunications, combinatorial optimization, scheduling, etc.). This gives
us a high level of confidence in the results obtained, although the evaluation of conclusions is consequently
more laborious than with a small test suite.
The problems selected for this benchmark are explained in Sections 7.5.1 to 7.5.7. We include the
explanations in the chapter to make it self-contained and to avoid the small information gaps that
could preclude other researchers from reproducing the results. Finally, in Section 7.5.8 we present and
analyze the results.
The fitness of an MMDP string s is the sum of the fitness contributions of its k subfunctions:

f_MMDP(s) = Σ_{i=1}^{k} fitness(s_i).    (7.4)
The goal is to minimize the sum of square errors given by Equation (7.7). This problem is a highly
complex multimodal function with strong epistasis and optimum value 0. Due to the extreme difficulty of
solving this problem with high accuracy without applying local search or specific operators for continuous
optimization (such as gradual GAs [13]), we stop the algorithm when the error falls below 10^{-2}. Hence,
our objective for this problem is to minimize Equation (7.7):
f_FMS(x) = Σ_{t=0}^{100} (y(t) − y_0(t))².    (7.7)
f_P-PEAKS(x) = (1/N) · max_{1≤i≤p} {N − HammingD(x, Peak_i)}.    (7.8)
f_ECC = 1 / ( Σ_{i=1}^{M} Σ_{j=1, j≠i}^{M} d_ij^{−2} ),    (7.9)

where d_ij represents the Hamming distance between codewords i and j in the code C (made up of
M codewords, each of length n). Here, we consider an instance where M = 24 and n = 12. The search
space is of size (4096 choose 24), which is approximately 10^87. The optimum solution for M = 24 and
n = 12 has a fitness value of 0.0674.
The MAXCUT problem looks for a partition of the vertices of a weighted graph G = (V, E) into two
disjoint subsets V0 and V1 that maximizes the sum of the weights of the edges having one endpoint in
each subset. A solution is encoded as a binary string in which bit i is 1 if the corresponding vertex is in
set V1; if it is 0 then the corresponding vertex is in set V0. The function to be maximized [23] is:
f_MAXCUT(x) = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} w_ij · [x_i · (1 − x_j) + x_j · (1 − x_i)].    (7.10)
Note that wij contributes to the sum only if nodes i and j are in different partitions. While one can
generate different random graph instances to test the algorithm, here we have used the case “cut20.09,”
with 20 vertices and a probability of 0.9 of having an edge between any two randomly chosen vertices.
The maximum fitness value for this instance is 56.740064.
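A direct Python transcription of Equation (7.10), useful for checking candidate solutions (the function name is illustrative):

def maxcut_fitness(x, w):
    # x: list of 0/1 vertex labels; w: symmetric matrix of edge weights.
    # An edge (i, j) contributes w[i][j] only if its endpoints differ.
    n = len(x)
    return sum(w[i][j] * (x[i] * (1 - x[j]) + x[j] * (1 - x[i]))
               for i in range(n - 1) for j in range(i + 1, n))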
The minimum tardy task problem is a task-scheduling problem in which each task i has a length l_i,
a deadline d_i, and a weight w_i; a schedule S assigns a start time g(i) to every scheduled task, subject to
two constraints:
1. A task cannot be scheduled before the previous one is completed: g(i) < g(j) ⇒ g(i) + l_i ≤ g(j).
2. Every task finishes before its deadline: g(i) + l_i ≤ d_i.
The objective for this problem is to minimize the sum of the weights of the unscheduled tasks.
Therefore, the optimum schedule minimizes Equation (7.11):

f_MTTP(x) = Σ_{i∈T−S} w_i.    (7.11)
The schedule of tasks S can be represented by a vector x = (x_1, x_2, . . . , x_n) containing all the tasks
ordered by their deadlines. Each x_i ∈ {0, 1}: if x_i = 1, task i is scheduled in S, while x_i = 0 means
that task i is not included in S. The fitness function is the inverse of Equation (7.11), as described
in Reference 23. We have used in this study an instance called “mttp20,” with size 20 and maximum fitness
value 0.02439.
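The objective in Equation (7.11) translates directly into code (a sketch; the function name is illustrative):

def mttp_objective(x, weights):
    # Sum of the weights of the tasks left out of the schedule (x_i == 0).
    return sum(w for xi, w in zip(x, weights) if xi == 0)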
An instance of SAT, x, is called satisfiable if f_SAT(x) = 1, and unsatisfiable otherwise. A k-SAT instance
consists of clauses of length k; for k ≥ 3 the problem is NP-hard [25]. In this chapter we consider
a 3-SAT instance made up of 430 clauses and 100 variables, belonging to the well-known phase transition
of hard SAT instances. The fitness function is a linear function of the number of satisfied clauses. In this
work, we use the so-called stepwise adaptation of weights (SAW) [26]:

f_SAW(x) = Σ_{i=1}^{m} w_i · c_i(x),    (7.12)

where c_i(x) ∈ {0, 1} is the value of clause i under the assignment x. This function weighs the clauses
with w_i ∈ B in order to give more importance to those clauses that are not yet satisfied by the current
best solution. These weights are adjusted dynamically according to w_i = w_i + 1 − c_i(x∗), x∗ being
the current fittest individual.
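A compact sketch of the SAW fitness and its weight-adaptation step (in Python; the clause evaluators and names are illustrative):

def saw_fitness(x, clauses, w):
    # clauses: list of functions c_i mapping an assignment to 0 or 1.
    return sum(wi * ci(x) for wi, ci in zip(w, clauses))

def adapt_weights(w, clauses, best):
    # w_i <- w_i + 1 - c_i(x*): clauses unsatisfied by the best individual gain weight.
    return [wi + 1 - ci(best) for wi, ci in zip(w, clauses)]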
TABLE 7.3 MMDP results: average final solution (best = 20), average number of generations, and hit rate (%) for each algorithm.
TABLE 7.4 FMS results: average final solution (success when best ≥ 100), average number of generations, and hit rate (%).

Algorithm     Average solution   Average generations   Hit rate (%)
Square        90.46              437.4                 57
Rectangular   85.78              404.3                 61
Narrow        80.76              610.9                 63
LS            81.44              353.4                 58
FRS           73.11              386.2                 55
NRS           76.21              401.5                 56
UC            83.56              405.2                 57
TABLE 7.5 P-PEAKS results: average final solution (best = 1), average number of generations, and hit rate (%).

Algorithm     Average solution   Average generations   Hit rate (%)
Square        1.0                51.8                  100
Rectangular   1.0                50.4                  100
Narrow        1.0                53.9                  100
LS            1.0                34.8                  100
FRS           1.0                38.4                  100
NRS           1.0                38.8                  100
UC            1.0                40.1                  100
TABLE 7.6 ECC results: average final solution (best = 0.0674), average number of generations, and hit rate (%).

Algorithm     Average solution   Average generations   Hit rate (%)
Square        0.0670             93.9                  85
Rectangular   0.0671             93.4                  88
Narrow        0.0673             104.2                 94
LS            0.0672             79.7                  89
FRS           0.0672             82.4                  90
NRS           0.0672             79.5                  89
UC            0.0671             87.3                  86
TABLE 7.7 MAXCUT results: average final solution (best = 56.74), average number of generations, and hit rate (%).
TABLE 7.8 MTTP results: average final solution (best = 0.02439), average number of generations, and hit rate (%).
TABLE 7.9 SAT results: average final solution (best = 430.0), average number of generations, and hit rate (%).
are statistically significant (with two exceptions), thus indicating that the asynchronous versions are more
efficient than the cEAs with a changing ratio.
Conversely, synchronous algorithms perform as well as or better than asynchronous ones in terms of the
percentage of solutions found (hit rate), while the quality of the solutions found by the algorithms does not
always show significant differences (the exceptions are probably due to the difference in hit rate).
Another interesting result is that we can define two classes of problems: those solved by every method
to optimality (100% hit rate) and those for which no 100% rate is achieved at all. Problems thus seem
either to be directly amenable to cEAs or to need some (yet unstudied) help, for example, the inclusion
of local search.
In order to summarize the large set of results and get some useful conclusion, we present a ranking with
the best algorithms by following three different metrics: average best final solution, average number of
generations on success, and hit rate. Table 7.10 shows the three mentioned rankings. These rankings have
TABLE 7.10 Rankings of the algorithms on the discrete problems according to three metrics (lower score is better).

   Average solution        Average generations      Hit rate (%)
1  Narrow       10      1  LS            14      1  Narrow        6
1  Rectangular  10      2  NRS           16      2  Rectangular  10
3  Square       14      3  FRS           18      3  FRS          14
4  FRS          15      4  UC            30      4  LS           15
5  LS           18      5  Rectangular   33      5  Square       17
5  UC           18      6  Square        37      6  UC           19
7  NRS          21      7  Narrow        48      7  NRS          21
been computed by adding the position (from better to worse: 1, 2, 3, . . .) that algorithms are allocated for
the previous results presented from Table 7.3 to Table 7.9, according to the three criteria.
As we would expect after the previous comments, according to the average final best fitness and the hit
rate, synchronous algorithms with any of the three studied ratios are, in general, more accurate than
all the asynchronous ones on our test problems, with a special leading position for narrow population
grids. On the other hand, asynchronous versions clearly outperform any of the synchronous algorithms
in terms of the average number of generations (efficiency), with a trend toward LS being the best-ranked
flavor of cEA for our discrete test suite.
The first continuous problem is the generalized Rastrigin function, a separable multimodal function
(Equation [7.14]):

f_RASTR(x) = nA + Σ_{i=1}^{n} (x_i² − A cos(ω x_i)).    (7.14)
The second continuous problem is Ackley's function, as studied by Bäck et al. [30] (see Equation [7.15]).
Contrary to Rastrigin's, Ackley's function is not separable, even though it shows a regular arrangement
of the local optima.
f_ACKL(x) = −a · exp(−b · ((1/n) Σ_{i=1}^{n} x_i²)^{1/2}) − exp((1/n) Σ_{i=1}^{n} cos(c x_i)) + a + e.    (7.15)
The constants are a = 20, b = 0.2, and c = 2π. The variables x_i, i = 1, . . . , n, are in the domain −32.768 ≤
x_i ≤ 32.768. This function has its global minimum at the point x = 0, where f(0) = 0.
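Equation (7.15) in Python, with the constants given above (a sketch for checking candidate solutions):

from math import exp, sqrt, cos, pi, e

def ackley(x, a=20.0, b=0.2, c=2 * pi):
    n = len(x)
    quad = sum(xi * xi for xi in x) / n        # mean of the squared variables
    osc = sum(cos(c * xi) for xi in x) / n     # mean of the cosine terms
    return -a * exp(-b * sqrt(quad)) - exp(osc) + a + e

assert abs(ackley([0.0] * 10)) < 1e-12         # global minimum f(0) = 0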
The third continuous problem is the fractal function (Equation [7.16]):

f_FRAC(x) = Σ_{i=1}^{n} (C′(x_i) + x_i² − 1),    (7.16)

where

C′(z) = C(z) / (C(1) · |z|^{2−D})   if z ≠ 0,
C′(z) = 1                           if z = 0,

and

C(z) = Σ_{j=−∞}^{∞} (1 − cos(b^j z)) / b^{(2−D)j}.
For the runs, we have chosen the constants D = 1.85 and b = 1.5. The 20 variables x_i (i = 1, . . . , 20)
vary in the range [−5, 5]. The infinite sum in the function C(z) is in practice calculated starting with
j = 0 and alternating the signs of the j values; the sum is stopped when the relative difference between
its previous and present values is lower than 10^{−8}, or when j = 100 is reached.
Results for Rastrigin's function: average final solution (success when best ≤ 0.1), average number of generations, and hit rate (%).

Algorithm     Average solution   Average generations   Hit rate (%)
Square        0.0900             323.8                 100
Rectangular   0.0883             309.8                 100
Narrow        0.0855             354.2                 100
LS            0.0899             280.9                 100
FRS           0.0900             289.6                 100
NRS           0.0906             292.2                 100
UC            0.0892             292.4                 100
Results for Ackley's function: average final solution (success when best ≤ 0.1), average number of generations, and hit rate (%).

Algorithm     Average solution   Average generations   Hit rate (%)
Square        0.0999             321.7                 78
Rectangular   0.0994             293.1                 73
Narrow        0.1037             271.9                 65
LS            0.0932             302.0                 84
FRS           0.0935             350.6                 92
NRS           0.0956             335.5                 87
UC            0.0968             335.0                 85
Results for the fractal function: average final solution (success when best ≤ 0.1), average number of generations, and hit rate (%).
TABLE 7.15 Rankings of the algorithms on the continuous problems according to three metrics (lower score is better).

   Average solution        Average generations      Hit rate (%)
1  UC            8      1  LS            7       1  FRS           3
2  LS            9      2  Narrow        9       2  NRS           5
2  FRS           9      2  Rectangular   9       3  LS            7
4  NRS          13      4  FRS          13       4  UC            8
4  Rectangular  13      5  UC           14       5  Square       11
6  Narrow       15      6  NRS          15       6  Rectangular  13
7  Square       16      7  Square       17       7  Narrow       15
This result indicates a greater efficiency of the changing-ratio cEAs, which contrasts with the findings for
the discrete problems. On the other hand, contrary to what was observed for the discrete problems, the
success rates of the asynchronous algorithms are in general greater than those of the synchronous ones.
Also, unlike the results of Section 7.5.8, where for any given problem either all the algorithms achieved
a 100% hit rate or none found the solution in every run, the FRS cEA is the only algorithm able to find
the solution in every execution of the FRAC problem.
In order to summarize these results, and following the structure of Section 7.5.8, we present in Table 7.15
a ranking of the algorithms on all the problems by means of the average solution found, the number
of generations needed to find an optimal solution, and the success rate. The table shows a trend for
the asynchronous algorithms to perform better than the synchronous ones in terms of the average
solution found and the success rate, while the synchronous changing-ratio algorithms seem in general
more efficient than most asynchronous ones (square and LS being the exceptions).
7.7 Conclusions
In the first part of this chapter we have described several asynchronous update policies for the population
of a cEA, followed by some ratio policies, all of them inducing a different kind of search in the cEA. One
can tune the selection intensity of a cEA by choosing the update policy and grid ratio without having to
deal with additional numerical parameter settings. This is a clear advantage of the algorithms that allows
users to utilize existing knowledge instead of inventing a new class of heuristic.
Our conclusion is that cEAs can easily be induced to promote exploration or exploitation by simply
changing the update policy or the ratio of the population. This opens new research lines on efficient
ways of shifting from one given policy/ratio to another so that the optimum is reached with a smaller
effort than with the basic cEA or other types of EAs.
In a later part of the chapter we have applied our extended cEAs to a set of both discrete and continuous
test problems. Although our goal has not been to compete with state-of-the-art specialized heuristics, the
results clearly show that cEAs are very efficient optimization techniques that could be further improved
by hybridization with local search techniques [32]. The results on the test problems largely confirm,
with some small exceptions, that the solving abilities using the various update/ratio modes are directly
linked to their induced selection pressures, showing that exploitation plays an important role. It is clear
that the role of exploration might be more important on even harder problem instances, but this aspect
can be addressed in our algorithms by using more explorative settings, as well as by using different cEA
strategies at different times during the search, dynamically [33].
We conclude that, with respect to the discrete problems, asynchronous algorithms are more efficient than
synchronous ones, with statistically significant differences for most problems. On the other hand, if we pay
attention to the success (hit) rate, it can be concluded that the synchronous policies with the different
evaluated ratios outperform the asynchronous algorithms: slightly in terms of the average final fitness,
and clearly in terms of the probability of finding a solution (i.e., the frequency of locating the optimum).
On the contrary, if we pay attention to the experiments on continuous problems we can get different
(somewhat complementary) conclusions. Asynchronous algorithms outperform synchronous ones in the
cases of average solutions found and hit rate (significant differences in many cases), while in the average
number of generations synchronous algorithms are, in general, more efficient than asynchronous ones.
As future research, we plan to address a single problem of large difficulty and to characterize selection
intensity analytically for all the models.
Acknowledgments
This work has been partially funded by the Ministry of Science and Technology (MCYT) and Regional
Development European Fund (FEDER) under contract TIC2002-04498-C05-02 (the TRACER project)
http://tracer.lcc.uma.es. We thank J. Kaempf for performing a part of the computer simula-
tions for the real-valued problems. M. Giacobini gratefully acknowledges financial support by the Fonds
National Suisse pour la Recherche Scientifique under contract 200021-103732/1.
8.1 Introduction
The extreme diversity of life forms on our planet is a testament to the many ways in which millions of
different species have adapted to the huge number of biological niches that are available, ranging as they do over extremes of temperature and pressure in air, land, and water. Viewed in an abstracted formulation,
we may consider evolutionary adaptation of species within a biological niche to be a robust search and
optimization mechanism (Fogel, 1994), the phenotype changing over time so that the species survives
in its competitive environment. Under the Darwinian view, evolutionary processes are involved with
reproduction, mutation, competition, and selection. The bioinspired strategies of genetic algorithms and
evolutionary optimization, in general, strive to capture these biologically robust search mechanisms by
formulating mathematical models that search function domains using algorithmic procedures that mimic
some of the evolutionary processes of the cell, for example, mutation and crossover. Before discussing these procedures, let us briefly review the main biological events in the cell.
[Figure: the gene expression cycle — transcription produces a primary mRNA transcript, which RNA splicing modifies; in the algorithmic analogue, a gene is modified to generate a phenotype, the phenotype is tested for fitness, and unfit phenotypes are replaced.]
not exactly follow the same encoding as regular DNA. This symbiotic combination illustrates a mechanism
that achieves a remarkable advance in complexity without any reliance on mutation and crossover per se.
[Figures 8.3 and 8.4: in the traditional genetic algorithm, the problem specification dictates the format of the binary gene, whose value is passed directly to the objective function to evaluate fitness; in the gene expression strategy, an intermediate stage first computes a phenotype from the gene.]
by being chosen for transcription, leading ultimately to some particular protein with possible modification. The important issue is that fitness is essentially due to the capabilities of the proteins making up the phenotype. More significantly, fitness is not an immediate property of the genes that have been expressed.
In the traditional genetic algorithm (Goldberg, 1989, p. 60), the phenotype of an individual is the decoded parameter or parameters of the binary gene. We use Figure 8.3 to describe a somewhat simplified version of the algorithm, showing how the problem specification dictates the format of a binary gene. A subsequent calculation using an objective function evaluates the fitness of that gene. In this scenario, the fitness function is applied directly to the binary gene. As noted by Mitchell (1996, p. 6), there is often no notion of a phenotype in the context of genetic algorithms, although more recently various models have been proposed to incorporate one.
When a genetic algorithm involves a gene expression strategy, the fitness function is applied to a
computed phenotype that is more than a simple alternate representation of the binary gene. Typically,
the computation of this phenotype is fairly complicated and is achieved by an algorithm that is called a
gene expression algorithm. As shown in Figure 8.4, the gene expression strategy utilizes an intermediate
stage that produces a final phenotype, which is then subjected to fitness evaluation. To make the biological
inspiration more obvious, we simplify the transformation of cellular information as follows:

gene (DNA) → mRNA transcript → protein (phenotype).

In our computational environment, we will have various binary representations that run in parallel with this natural model:

gene → intermediate representation → phenotype.
The intermediate representation may be absent or may exist as a sequence of intermediate transformations
depending on the requirements of the algorithm. The most significant component of our strategy is
that we use a gene expression algorithm to produce a binary phenotype representation that is ready for
fitness evaluation. The phenotype meets various feasibility requirements and, in its construction, the
gene expression algorithm utilizes various parameters that eventually take on values that are determined
through the use of a genetic algorithm. Another description of the gene expression strategy would be the
use of optimization heuristics that are dependent on parameters derived via evolutionary computation.
It is important to understand how an algorithm designer might approach a problem with the intention of
utilizing a gene expression algorithm. In the traditional deployment of a genetic algorithm, the algorithm
designer would have a binary string acting as a gene and then would exercise the genetic algorithm with an
appropriate fitness function applied to such a gene. With a gene expression algorithm, some appropriate
data structure (usually a binary string) acts as a phenotype structure which provides an input for the
fitness function. The “cleverness” of the algorithm designer is then challenged by two goals:
1. Specify the format of a binary gene (suitable for mutation and crossover operations) that can then
be transformed into the phenotype structure.
2. Design the gene expression algorithm that does this transformation.
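To make this division of labor concrete, the following minimal Python sketch (all names and the toy objective are our own illustrative assumptions, not part of any published implementation) keeps the genetic operators on the binary gene while fitness is applied only to the expressed phenotype:

import random

def express(gene):
    # Hypothetical gene expression algorithm: transform the binary gene
    # (possibly through intermediate representations) into a feasible
    # phenotype structure. Here, a placeholder identity transformation.
    intermediate = gene          # gene -> intermediate representation
    phenotype = intermediate    # intermediate representation -> phenotype
    return phenotype

def fitness(phenotype):
    # Fitness is applied to the computed phenotype, never to the raw gene.
    return sum(phenotype)       # toy objective: number of 1-bits

def mutate(gene, rate=0.01):
    return [b ^ (random.random() < rate) for b in gene]

# One generation of the (toy) gene expression GA
population = [[random.randint(0, 1) for _ in range(32)] for _ in range(20)]
ranked = sorted(population, key=lambda g: fitness(express(g)), reverse=True)
population = ranked[:10] + [mutate(g) for g in ranked[:10]]

The only point of the sketch is the separation of concerns: mutation and crossover see the gene format, while the fitness function sees only the phenotype produced by express().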
Another interesting set of gene expression papers has been written by Hillol Kargupta. He uses the idea of
genetic code-like transformations. Kargupta (1999) emphasizes the importance of learning functions from
data. This would have application in areas such as inductive learning, statistics, data mining, and optimiza-
tion. In this approach, a function is learnt or induced by generating the coefficients in its Fourier expansion.
In general, a function (e.g., a fitness function) defined on an n-bit binary string requires O(2^n) Fourier coefficients. Hence, this inductive procedure is computationally inefficient. However, it is sometimes possible to find very reasonable approximations of a function if the set of Fourier coefficients has a smaller subset of “large” coefficients that are polynomial in number. In such a case, it must be demonstrated that the power spectrum, defined by the set of coefficients, has a high concentration of energy in a few coefficients (O(n^k) in number) with an exponential drop-off in the magnitude of all other nonzero coefficients. This is equivalent to the demand that the “small” coefficients (O(2^n) in number) be exponentially small, so that, cumulatively, they do not amount to any significant sum. The contribution of this chapter to gene expression resides in the observation that, for some functions with O(2^n) large coefficients, it is possible to use a “genetic code-like” transformation of the data that will transform the function into a new function that has only O(n^k) large coefficients. The strategy is considered to involve a gene expression algorithm because we assume that the given function is defined on a phenotype space and the gene expression algorithm is used to derive a possibly degenerate transformation that establishes a mapping between a phenotype string and its gene image, which typically has a length that is a small integer multiple of the phenotype string length. This is similar to a natural system in which the coding part of the gene has a string length that is three times the length of the amino acid sequence that it encodes. After such a transformation of the data, the fitness function can be seen to have the gene space as its domain, and if the genetic code-like transformation is defined in the appropriate manner, then the fitness function will have O(n^k) large Fourier coefficients. There are two other papers (Kargupta and Park, 2000; Kargupta and Ghosh, 2002) that deal with these related issues.
We can derive a tour T from a set of cities C by performing tour extensions of a tour that initially consists of some single city c in C:

begin
    T := <c>;
    C := C - {c};
    while C != φ do
    begin
        c := f(C);        (select the next city with the selection function f)
        Insert c into T;
        C := C - {c};
    end
end
There is a variety of choices for the city selection function f. For example, we may select a city using one of the following predefined criteria (a sketch of the nearest-neighbor rule follows the list):
• Nearest Neighbor: Select the city that is nearest to the last city inserted in T.
• Nearest Insertion: Select the city that has the minimal distance from any city already included in the
subtour T.
• Farthest Insertion: Select the city whose distance from the nearest city in T is maximal.
• Cheapest Insertion: Select the city whose insertion in T involves the minimum increase in path
length.
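As a minimal illustration, the following Python sketch implements the construction loop above with the nearest-neighbor criterion as the selection function f (the coordinate representation and the Euclidean distance are our own assumptions):

import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nearest_neighbor_tour(cities):
    # cities: a list of (x, y) coordinates
    C = list(cities)
    T = [C.pop(0)]                                # T := <c>; C := C - {c}
    while C:                                      # while C != empty
        c = min(C, key=lambda x: dist(T[-1], x))  # f: nearest neighbor
        T.append(c)                               # insert c into T
        C.remove(c)                               # C := C - {c}
    return T

tour = nearest_neighbor_tour([(0, 0), (3, 0), (1, 1), (0, 2)])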
We start by describing the conversion of a priority vector to a tour and then describe the inversion vector.
Step(0): Build a set of subtours initialized to be n elementary subtours each containing a single city.
We then process the entries of the priority vector in consecutive order.
Step(i): Find the subtour containing the city labeled ai and merge this subtour with its closest
neighboring subtour.
On completion of the final step, we will have a full n-city tour. To derive the phenotype, we simply read out the cities in the tour just constructed. To provide more details about step(i), we describe
the merge of subtours as follows: Two subtours are merged by making two cuts (removing an edge from
each subtour), thus opening up both subtours, and then reconnecting them to make one large subtour
(Figure 8.5).
Given an arbitrary subtour SA, the closest neighboring subtour SC is defined to be the subtour providing
the lowest merge cost. The merge cost of any two subtours is the minimal merge cost calculated by
evaluating all possible cut pairs, one cut from each subtour. We define the cost of a merge as follows:
cost of merge = total length of edges added − total length of edges deleted.
Merging two single-city subtours will produce a simple two-city loop, and merging a single-city subtour with a multi-city subtour is essentially the insertion strategy described earlier.
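A direct, brute-force rendering of the merge-cost computation defined above might look as follows in Python (the point representation and distance function are illustrative assumptions):

import math

def edge_len(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def merge_cost(sub_a, sub_b):
    # Subtours are lists of (x, y) points interpreted as closed loops.
    # Evaluate every cut pair (one edge removed from each loop) and both
    # possible reconnections, returning the minimal
    #   cost = total length of edges added - total length of edges deleted.
    best = float("inf")
    for i in range(len(sub_a)):
        a1, a2 = sub_a[i], sub_a[(i + 1) % len(sub_a)]
        for j in range(len(sub_b)):
            b1, b2 = sub_b[j], sub_b[(j + 1) % len(sub_b)]
            deleted = edge_len(a1, a2) + edge_len(b1, b2)
            for added in (edge_len(a1, b1) + edge_len(a2, b2),
                          edge_len(a1, b2) + edge_len(a2, b1)):
                best = min(best, added - deleted)
    return best

For a single-city subtour the removed “edge” has zero length, so the sketch degenerates gracefully to the insertion case just mentioned.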
It should be noted that this strategy does more than simply generate feasible “single loop” tours. Each tour constructed is typically a reasonable approximation. Experimentation with the TSPLIB test suite (Reinelt, 1991) shows that an arbitrary priority vector (without any improvements facilitated by the genetic algorithm described later) will generate a tour that is roughly within 15% to 20% of optimal.
Each entry bj of the inversion vector counts the elements of the permutation that are greater than j and appear to the left of j. For example:

a1, a2, . . . , an = 1 7 6 9 5 8 3 4 2,
b1, b2, . . . , bn = 0 7 5 5 3 1 0 1 0.

To illustrate, note that b5 = 3 because there are 3 elements in the permutation, namely 7, 6, and 9, that are greater than 5 and situated to the left of 5.
Because of the manner in which the bj are defined, we will always have the following relations:

0 ≤ b1 ≤ n − 1,
0 ≤ b2 ≤ n − 2,
    ...                                                        (8.1)
0 ≤ bn−1 ≤ 1,
bn = 0.
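The inversion vector follows directly from this definition; the short Python sketch below reproduces the example given above:

def inversion_vector(a):
    # b_j counts the elements greater than j that appear to the
    # left of j in the permutation a of 1..n.
    pos = {v: i for i, v in enumerate(a)}
    n = len(a)
    return [sum(1 for x in a[:pos[j]] if x > j) for j in range(1, n + 1)]

assert inversion_vector([1, 7, 6, 9, 5, 8, 3, 4, 2]) == [0, 7, 5, 5, 3, 1, 0, 1, 0]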
1 We have adopted the terminology inversion to pay proper respect to Knuth, who has a prior use of this term in the context of a permutation; for Knuth, an inversion is a pair of elements that are out of their natural order. The reader should not confuse our use of the term with the notion of inversion as used in genetics.
[Table: results on TSPLIB problem instances — for each instance, the optimal tour length, the tour length found, the quality of the tour, the number of tour evaluations, and the CPU time in seconds.]
quality paths. As a consequence, we have chosen to work with the permutation as representing not the order of the cities but rather the order in which subtours are merged, as described earlier.
parameters that define the construction of a phenotype that will then be subjected to a fitness evaluation.
In particular:
• We have a gene representation that is compatible with crossover and a phenotype representation
with the usual appearance of a Hamiltonian path.
• All phenotypes generated by the gene expression algorithm are valid representations suitable for
fitness evaluation.
• Our experiments show that utilization of a sophisticated algorithm such as subtour merging gives
the evolutionary computation a “head start” in the construction of good approximations to the
optimal tour.
• Our experiments further demonstrate that a TSP construction heuristic will do better when
combined with a subsystem for evolutionary computation.
We have referred to the heuristic strategy used to construct a phenotype as a gene expression algorithm
since gene expression in the living cell is responsible for utilizing the gene when protein products contrib-
uting to the phenotype need to be synthesized. While there is no guarantee that such a gene expression
algorithm can ever produce an optimal solution for an NP-complete problem our empirical evidence
shows that optimal solutions for a TSP instance can at times be constructed using a subtour merging
heuristic adapted to work with a genetic algorithm. This at least holds the promise of providing a viable
evolutionary computation environment for future avenues of research.
References
G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A. Marchetti-Spaccamela, and M. Protasi (1999) Complexity and Approximation. Springer-Verlag, Berlin.
J.D. Bagley (1967) The behaviour of adaptive systems which employ genetic and correlation algorithms.
Dissertation Abstracts International, 28, 5106B (University Microfilms No. 68-7556).
F.J. Burkowski (2003) Proximity and priority: Applying a gene expression algorithm to the traveling
salesperson problem, paper presented at NIDISC’03 (The Sixth International Workshop on Nature
Inspired Distributed Computing) Nice, France, April 22–26, 2003.
C. Ferreira (2001) Gene expression programming: A new adaptive algorithm for solving problems.
Complex Systems, 13, 87–129.
D.B. Fogel (1994) An introduction to simulated evolutionary optimization. IEEE Transactions on Neural
Networks, 5, 3–14.
D. Goldberg (1989) Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley
Publishing, Reading, MA.
D.E. Goldberg, B. Korb, and K. Deb (1989) Messy genetic algorithms: Motivation, analysis, and first
results. Complex Systems, 3, 493–530.
J.H. Holland (1992) Adaptation in Natural and Artificial Systems. MIT Press, Cambridge, MA.
B. Julstrom (1999) Coding TSP tours as permutations via an insertion heuristic. In SAC ’99, Proceedings
of the 1999 ACM Symposium on Applied Computing. ACM Press, New York, pp. 297–301.
H. Kargupta (1999) A striking property of genetic code-like transformations. School of EECS Technical
report EECS-99-004, Washington State University, Pullman, WA.
H. Kargupta and S. Ghosh (2002) Toward machine learning through genetic code-like transformations.
Genetic Programming and Evolvable Machines, 3, 231–258.
H. Kargupta and B.H. Park (2000) Gene expression and fast construction of distributed evolutionary
representation. Evolutionary Computation, 9, 45–68.
D. Knuth (1998) The Art of Computer Programming, Vol. 3: Sorting and Searching, 2nd ed. Addison-Wesley, Reading, MA.
B. Lewin (1997) Genes VI. Oxford University Press, New York.
Z. Michalewicz (1992) Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag,
Berlin.
M. Mitchell (1996) An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA.
Y. Nagata and S. Kobayashi (1997) Edge assembly crossover: A high-power genetic algorithm for the travel-
ing salesman problem. In Proceedings of the Seventh International Conference on Genetic Algorithms,
T. Bäck (Ed.), pp. 450–457.
G. Reinelt (1991) TSPLIB — A Traveling Salesman Problem Library, ORSA Journal on Computing, 3,
376–384. See also: http://softlib.rice.edu/softlib/tsplib/
G. Reinelt (1994) The Traveling Salesman. Springer-Verlag, Berlin.
H. Sawai and A. Adachi (1999) Genetic algorithms inspired by gene duplication. Congress on Evolutionary
Computation, July. 1999, IEEE Press, Washington, D.C., pp. 480–487.
R.E. Smith (1988) An investigation of diploid genetic algorithms for adaptive search of non-stationary
functions, TCGA report No. 88001, University of Alabama, The Clearinghouse for Genetic
Algorithms, Tuscaloosa.
G. Tao and Z. Michalewicz (1998) Inver-over operator for the TSP. In Proceedings of the 5th Parallel Problem Solving from Nature, T. Bäck, A. Eiben, M. Schoenauer, and H. Schwefel (Eds.), Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 803–812.
A. Wu and R. Lindsay (1995) Empirical studies of the genetic algorithm with non-coding segments. Evolutionary Computation, 3, 121–147.
A. Wu and R. Lindsay (1996) A survey of intron research in genetics. Parallel Problem Solving from Nature
(PPSN IV), H. Voigt, W. Ebeling, I. Rechenberg, and H. Schwefel (Eds.), Springer-Verlag, Berlin,
pp. 101–110.
9.1 Introduction
It has been clearly shown that DNA computing can be used to solve problems that are currently intractable on even the fastest electronic computers. Designing algorithms for DNA computing, however, is not straightforward: developing efficient DNA computing algorithms requires a strong background in both DNA biochemistry and computer engineering. Moreover, all of these algorithms need to start over from the very beginning when their initial conditions change, which is very frustrating, especially when the change in the initial conditions is very small. The existing models on which DNA computing algorithms have been developed are not able to accomplish such dynamic updating.
For a long time, people have been talking about the huge memory made possible by DNA computing, owing to the fact that each strand can be treated as both storage medium and processor. Currently, no application has used this huge memory, because although it is easy to read from this memory, it is extremely hard to store data in it, and a memory is only ready for use after data has been stored.
A new DNA computing model is introduced here, based on which new algorithms are developed to solve the 3-coloring problem. These algorithms are presented as vehicles to demonstrate the advantages of the new model, and they can be expanded to solve other NP-complete problems. They have the advantage of dynamic updating, so answers can be changed when the initial conditions are modified. The new model takes advantage of the huge memory by generating “lookup tables” during the process of implementing the algorithms. When the initial conditions change, the answers are changed accordingly. The new model can be used to solve computationally intensive problems both efficiently and attractively.
9.1.1 Motivation
A strand of DNA is composed of four different base nucleotides: A (adenine), C (cytosine), G (guanine),
and T (thymine). When attached to deoxyribose, these base nucleotides can be strung together to form a
strand. Because a DNA strand can be used to encode information and DNA bio-operations are completely based on the interactions between strands, each DNA strand can be counted as a processor as well as a storage medium. Numerous strands are involved in DNA bio-operations, and their interactions with one another occur simultaneously. This, then, can be viewed as a realization of massively parallel processing.
Since Adleman [1] solved a 7-vertex instance of the Hamiltonian Path Problem, a well-known repre-
sentative of NP-complete problems, the major goal of subsequent research in the area of DNA computing
has been to develop new techniques for solving NP-complete problems. NP-complete problems are those
problems for which no polynomial-time algorithm has yet been discovered, in contrast to problems solvable by polynomial-time algorithms, whose worst-case run time is O(n^k) for some constant k, where n is the size of the problem. Consider that 1 litre of water can hold 10^22 DNA strands. The potential computing power is significant,
and this recognition raises the hope of solving problems currently intractable on electronic computers.
Rather than using electronic computers upon which the time needed to solve NP-complete problems
grows exponentially with the size of the problem, DNA computing technology can be used to solve
these problems within a time proportional to the problem size. An NP-complete problem that may take
thousands of years for current electronic computers to solve would take a few months if the existing DNA
computing techniques were adopted.
As indicated in several articles [2–7], most DNA computing algorithms are based on certain established DNA computing models. The most popular models are the sticker-based model [8,9], the surface-based model [10,11], and the self-assembly-based model [12,13]. The problem with the sticker-based model is that the stickers annealed to the long strand may fall off during bio-operations, causing a very high rate of error. The limitation of the surface-based model is that the scale of computation is severely restricted by the two-dimensional nature of surface-based computations. The shortcoming of the self-assembly-based model is that it makes use of biological operations that are not yet mature.
While the theory of molecular computation has developed rapidly, most of these algorithms take months to solve problems that would take thousands of years on electronic computers. The problem is that when the initial condition changes, the algorithms have to start over again. Here, a new DNA computing model that eliminates this problem is introduced. Based on this model, algorithms can be designed to dynamically update the answer: when the initial condition changes, the new algorithms can continue from the current process, and the solution for the new problem can be generated by a few extra processes. In addition, this new model can also be used to solve several similar problems simultaneously.
FIGURE 9.1 A DNA strand encoding node–color regions, beginning ATGGGCCAAG GGCCTCCAAG.
FIGURE 9.2 Gel-based separation: the strands of one pool run through a series of capture layers.
{s2j | j = 1, 2, . . . , c, where s2j ∈ P2}; after the ligation, the ligated strands are stored in P3, and each of them represents a code sk = s1i s2j, with s1i ∈ P1 and s2j ∈ P2, for k = i + (j − 1) × c.
In the separation operation, many identical short strands, defined as probes, are attached to magnetic beads. These probes are then sent into the pool containing the strands to be separated. Every probe pairs up with a complementary strand, and together they form a double helix. Such pair-ups occur only under the Watson–Crick complement rule: C only pairs with G, and T only pairs with A. For example, in Figure 9.1, if the strands containing the region for node 1 colored as ‘R’ need to be separated, the short DNA strand TACCCGGTTC should be used as a probe, because TACCCGGTTC complements ATGGGCCAAG. Also, the double helix can be separated by heating, in order to have the paired strands part from each other without breaking the chemical bonds that hold the nucleotides together inside each single strand. The strands in the pool containing a region that complements the probes will be hybridized to, and captured by, the probes, while all those without the region will remain in the pool [8].
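Probe design is thus just the base-wise Watson–Crick complement; the following short sketch (function names are ours) reproduces the probe from the example above:

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def probe_for(region):
    # Base-wise Watson-Crick complement of the target region
    # (pairing the strands as written, not the reverse complement).
    return "".join(COMPLEMENT[base] for base in region)

assert probe_for("ATGGGCCAAG") == "TACCCGGTTC"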
A gel-based separation technique for DNA computing [14] has been developed that uses gel-layer probes instead of beads to capture the strands. A capture layer retains a strand containing a region that complements the probe only when the layer is cooled down, and lets all strands pass when the layer is heated up. The advantage of gel-based probes over bead-based probes is that the gel-based method is more accurate when capturing DNA molecules. In Figure 9.2, which illustrates gel-based separation, a set of strands runs from the left-side buffer to the right. At each capture layer, the temperature is kept low in order to capture the desired strands, and all unwanted strands pass through into one pool. Then the temperature is raised to let the desired strands in the layer pass into another pool. The strands from the left buffer are thus separated and stored in two different pools:
Combination B(P, P1, P2): pour two pools, P1 and P2, together to form a new pool, P.
Detection D(P): check whether there is any strand left in the pool P. If the answer is “yes,” the strands in the pool may need to be decoded.
The rest of this chapter is organized as follows: Section 9.2 introduces our new algorithm and shows how, based on the new model we proposed, it solves the 3-coloring problem. The complexity analysis of the new algorithm is provided in Section 9.3, which also shows how dynamic updating can be accomplished. The last section concludes the chapter.
FIGURE 9.3 A sample graph with its nodes colored R, G, and B.
FIGURE 9.4 Pools G1, G2, . . . are pairwise merged, level by level, until a single pool remains.
Algorithm 1.
For i = 1 to n do
    In parallel ( I(Pi, colors of node i) )
End
f = n
While f ≠ 1 do
    In parallel (make multiple copies of the strands in all pools)
    For all odd i do
        In parallel ( L(Pi, Pi, Pi+1) )
    End
    In parallel (relabel all pools 1 to f/2)
    f = f/2
End
S(P, Pt, Pf1, θ1), θ1 is the color conflict along e1
k = 1
For i = 2 to m do
    S(Pt, P1t, P1f, θi), θi is the color conflict along ei
    For j = 1 to k do
        In parallel ( S(Pfj, Pjt, Pjf, θi) ), θi is the color conflict along ei
    End
    For j = 1 to k do
        In parallel ( B(Pfj, P(j−1)f, Pjt) )
    End
    B(Pt, P1t, φ)
    B(Pf(k+1), Pkf, φ)
    k = k + 1
End
Check if Pt is empty to return “yes” or “no” accordingly.
FIGURE 9.5
cut sets and detect all the color conflicts caused. The task can be accomplished by using the separation operation to filter out of the pool all strands that contain any color conflict. Suppose two nodes, i and j, are connected by an edge. The pool is first separated into three pools, each containing the strands that color node i as R, G, or B, respectively; from each of these pools, the strands having node j colored the same are then filtered out by the separation operation.
The answer to the 3-coloring problem is “yes” if there is any strand left in the final pool. The answer is “no,” and the graph cannot be colored with only three colors, if there is no strand left.
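The bookkeeping of the algorithm (though of course not its massive parallelism) can be followed in a conventional simulation, where each pool is a set of coloring strings, ligation is a cross-product, and separation is a filter; all names in this Python sketch are our own:

def three_coloring_answer(n, edges):
    # I: one pool per node, each initially holding the three colors.
    pools = [set("RGB") for _ in range(n)]
    # L: pairwise ligation halves the number of pools in each round.
    while len(pools) > 1:
        merged = [{a + b for a in pools[i] for b in pools[i + 1]}
                  for i in range(0, len(pools) - 1, 2)]
        if len(pools) % 2:
            merged.append(pools[-1])
        pools = merged
    pool = pools[0]                      # all 3**n color combinations
    # S: filter out strands with a color conflict along each edge.
    for i, j in edges:
        pool = {s for s in pool if s[i] != s[j]}
    return "yes" if pool else "no"       # D: detection

# A triangle with a pendant node (0-indexed) is 3-colorable:
print(three_coloring_answer(4, [(0, 1), (1, 2), (0, 2), (2, 3)]))   # yes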
FIGURE 9.6 (a) A graph whose answer to the 3-coloring problem is “no”; (b) after one node is inserted, the answer becomes “yes.”
FIGURE 9.7 (a) A graph whose answer to the 3-coloring problem is “no”; (b) after one node is removed, the answer becomes “yes.”
Figure 9.6(a) shows a graph whose answer to the 3-coloring problem is “no.” The answer can be changed to “yes” after one node is inserted into the graph, as shown in Figure 9.6(b).
If the original answer is “no,” it can also be changed to “yes” after a node is removed from the graph. An example is shown in Figure 9.7: Figure 9.7(a) contains a graph with the answer “no” for the 3-coloring problem, and the answer is changed to “yes” after one node is removed from the graph, as shown in Figure 9.7(b).
If an edge is removed from or inserted into the graph, it can be dealt with similarly, because at least one edge must be removed when a node is eliminated and at least one edge must be added when a node is inserted.
What is illustrated next shows how to dynamically update a solution when a node or edge is inserted into the graph, following an original answer of “yes.” The strands in the final set, Pt, are checked for possible new answers. The final set is the only set that can be used, because it is the only set containing the strands that represent all possible coloring solutions without any color conflicts among all the nodes except the newly added one. Based on this set, only the color conflicts that occur between the newly added node and the nodes connected to it need to be checked; that is, only the newly added edges are checked for color conflicts.
The most difficult case occurs when a node or edge is removed from a graph with an original answer of “no.” The answer for the new graph may be either “yes” or “no.” Removing a node includes removing both the node itself and all the edges connecting that node to the graph. What follows is the dynamic updating algorithm for this case. The DNA computing result that reflects an original answer of “no” is
Algorithm 2.
For i = 1 to α do
    In parallel ( S(Pfi, Pnewi, Pfi, θi) ), θi is the color conflicts along exactly i of the removed edges
End
B(Pnew, φ, φ)
For i = 1 to α do
    In parallel ( B(Pnew, Pnew, Pnewi) )
End
B(Pnew, Pt, Pnew)
For j = 1 to β do
    S(Pnew, Pnew, Pnewf, wj), wj is the color conflict along edge ej
End
Check if Pnew is empty to return “yes” or “no” accordingly.
FIGURE 9.8
represented by an empty Pt set with no strand. All other sets represent the coloring patterns of the original graph together with their color conflicts. After removing the nodes or edges, some coloring patterns may no longer have conflicts, and the task now is to identify the patterns represented by those DNA strands. The number of strand sets that need to be examined is limited: only the strands representing color patterns whose conflicts involve the pairs of nodes connected by the removed edges are checked. Finding the above strand sets takes O(α) steps, where α is the number of edges being removed. This process is much less expensive than recomputing the updated graph from the very beginning when α is not large.
The detailed algorithm for finding the answer for the new graph with the removed edges, based on the original “no” answer, is illustrated in Figure 9.8.
When only one edge is removed from the original graph, pool Pf1 is checked. This is because Pf1 contains all the strands representing all the color combinations for the graph that have no color conflicts along any edge except one.
Assuming that the two nodes along the edge being removed are n1 and n2, the strands that need to be separated from the pool are those that have the two nodes colored as {RR}, {BB}, or {GG}. That means that only those strands which have the two nodes identically colored are extracted to a new pool, Pnew. If Pnew is not empty, the answer to the 3-coloring problem for the new graph is “yes,” which is different from the original graph. Otherwise, the “no” answer remains.
When two edges are removed from the graph, both Pf1 and Pf2 need to be checked. This is because Pf2 may contain strands that represent color combinations with color conflicts along both of the edges being removed, while Pf1 may contain strands that represent color combinations of the graph with a color conflict along only one of the two edges being removed. Suppose the two edges being removed are e1 and e2. Then, the strands that need to be extracted from pool Pf2 using the separation operation must represent the color combinations of the graph having color conflicts along both edges. Strands that should be extracted from Pf1 are those representing color combinations with a color conflict along either e1 or e2. The extracted strands are stored in a new pool, Pnew. If Pnew is not empty, the answer to the 3-coloring problem for the new graph is “yes,” which is different from the original graph. Otherwise, the answer for the 3-coloring problem for the new graph remains “no.”
When α different edges are removed from the original graph, α different pools should be checked. These pools are Pf1, Pf2, . . . , Pfα. For different pools, different operations need to be undertaken. In pool Pf1, every strand was retained because of a color conflict along exactly one edge; if the edge that caused the conflict is removed, the answer changes to “yes.” Because of this, all strands in this pool representing color combinations with a color conflict along one of the α removed edges contribute to the answer to the 3-coloring problem of the new graph. For pool Pf2, all strands representing color combinations with color conflicts along two, and only two, of the removed edges contribute to the answers for the new graph. In general, for pool Pft, where t ≤ α, all strands representing color combinations having color conflicts along exactly t of the removed edges contribute to the answer to the 3-coloring problem for the new graph. All strands extracted from these sets are stored in a new pool, Pnew. If Pnew is not empty, the answer to the 3-coloring problem for the new graph is “yes,” and thus different from the original graph. The answer is “no” if Pnew is empty.
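Continuing the conventional simulation sketched earlier, this update after removing α edges can be expressed as follows (the pool layout mirrors the Pft sets in the text; all names are ours):

def update_after_removal(pf_pools, all_edges, removed):
    # pf_pools[t-1] holds the strands separated out with exactly t color
    # conflicts over the original edge list; a strand s conflicts along
    # edge (i, j) iff s[i] == s[j].
    removed = set(removed)
    p_new = set()
    for t in range(1, len(removed) + 1):
        for s in pf_pools[t - 1]:
            conflicts = {(i, j) for i, j in all_edges if s[i] == s[j]}
            # conflict-free in the new graph iff every conflict lies on
            # a removed edge
            if conflicts <= removed:
                p_new.add(s)
    return "yes" if p_new else "no"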
When the graph is changed by both removing and adding edges, multiple processing steps need to be combined. Assuming that the number of edges being removed is α and the number of edges being added is β, the strands with color conflicts along the removed edges should be found first. This puts the strands to be considered in the subsequent operations into one pool, Pnew, instead of involving several pools. The α removed edges are first handled by using the method introduced above to go through α different pools. Then, Pt is combined with Pnew and the result is relabeled Pnew; this is because the strands that may generate the “yes” answer are distributed over α + 1 different pools, and collecting them in one pool saves time and further operations as compared with working on these pools one at a time. If no strands are left in pool Pnew, then the answer for the new graph is “no.” If there are some strands in pool Pnew after the α edges are removed, the color conflicts along the β added edges are checked. This operation can be accomplished in a manner similar to what has been described above for adding edges.
Compared to the existing algorithms, our new method can dynamically update the solution when the initial condition of the 3-coloring problem of a graph changes. It can also solve the 3-coloring problem for many graphs that are similar to each other. The complexity of the existing algorithm is O(m + n), where n is the number of vertices and m is the number of edges [18]. If the updating process is not used, any change to the initial condition results in a restart of the whole process. With our new algorithm, the number of extra processes that need to be undertaken depends upon the significance of the changes: the complexity of the updating process is O(α + β), where α is the number of edges being removed and β is the number of edges being added.
When this method is used to solve the 3-coloring problem for multiple graphs that are similar to each
other, the time complexity is O(θ) after the solution for one graph is generated, where θ is the difference
between the number of edges of the two graphs.
It is worth examining the extra space and effort needed to make dynamic updating available. First, m additional containers are needed to keep all m extra sets of strands. Second, the extra DNA material for generating these sets needs to be accounted for. Because strands representing all color combinations for the graph are generated before the separation process takes place, no extra material is necessary, as compared with the existing algorithms, until the answer is generated for the original graph. Extra material is only needed if a new solution has to be formed for the modified graph when edges and/or nodes are added.
When the procedure for approaching a 3-coloring problem of a given graph is finished and a new graph
is provided, how can one determine whether to start again from the beginning or to use the dynamic
updating method to generate the new answer?
Assuming that the implementation of the algorithms introduced above for the 3-coloring problem of
the graph with n nodes and m edges has been finished, the 3-coloring problem of a new graph needs
to be solved. This new graph has N nodes and M edges. This graph can be converted from the existing
graph by first removing δ nodes and α edges, and then adding γ nodes and β edges. The new graph can
be generated by changing the original graph, or it can be treated as a totally new graph. In order to solve
the problem for the new graph, N ligation and M separation operations are necessary if the algorithm is
being restarted from the beginning. The total time necessary is:
T1 = N × l + M × s,
where l is the time for each ligation operation and s is the time necessary for each separation operation.
Here, combinations are ignored due to their simplicity because the time needed for the combination
operations is very short, as compared to the other operations used in DNA computing. When the answer
is generated based on the pools already generated using this new, dynamically updating strategy, the time
necessary for reaching the answer is
T2 = (α + β) × s + γ × l.
In order to take advantage of the new method, the time needed must be shorter than that of restarting the algorithm from the beginning:

T2 ≤ T1,
(α + β) × s + γ × l ≤ N × l + M × s,
(α + β) × s + γ × l ≤ (n + γ − δ) × l + (m + β − α) × s,
(m − α) × s + (n − δ) × l ≥ α × s.

Since (n − δ) × l ≥ 0, a sufficient condition is (m − α) × s > α × s, that is, α < m/2. The algorithm needs to be restarted from the beginning only when the change is significant,
which means, when more than half of the edges need to be removed to generate the new graph from the
original.
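Numerically, the decision rule is a one-line arithmetic check; a small Python sketch with hypothetical operation times l and s:

def should_update(n, m, alpha, beta, gamma, delta, l=1.0, s=1.0):
    # l: time per ligation; s: time per separation (hypothetical units)
    N, M = n + gamma - delta, m + beta - alpha
    t_restart = N * l + M * s                 # T1: restart from scratch
    t_update = (alpha + beta) * s + gamma * l # T2: dynamic update
    return t_update <= t_restart

# Removing 10 of 100 edges (alpha < m/2): the dynamic update wins.
print(should_update(n=50, m=100, alpha=10, beta=0, gamma=0, delta=0))   # True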
Given the above condition, it is clear that there is no need to retain all m sets: at least half of the pools can be destroyed in order to save storage space. This saves the expense of storing m sets of strands and of the material needed to work on them.
9.4 Conclusion
A new model for DNA computing has been introduced, and, based on the new model, new algorithms for the 3-coloring problem have been presented. The new algorithms have the advantage of dynamic updating and thus represent a substantial improvement over the existing algorithms.
Instead of restarting the DNA computing algorithm from the very beginning every time the initial conditions change, the new method can generate the new solution through a few extra DNA operations based on the existing answer. It can also quickly solve problems similar to those already solved. No extra material is needed to prepare for the dynamic updating process; the only expense is some extra storage containers for the additional pools of DNA strands. As compared to the existing DNA computing algorithms, the new method reaches a solution much more quickly after the answer for the first problem has been generated, and it is very cost-efficient. This should make DNA computing more attractive to potential users who want to solve problems that are currently intractable.
References
[1] L. Adleman. Molecular computation of solutions to combinatorial problems. Science, 266: 1021–1024, 1994.
[2] Y. Gao, M. Garzon, R.C. Murphy, J.A. Rose, R. Deaton, D.R. Franceschetti, and S.E. Stevens Jr.
DNA implementation of nondeterminism. In Proceedings of the Third DIMACS Workshop on DNA
Based Computers, University of Pennsylvania, June 1997, pp. 204–211.
[3] G. Gloor, L. Kari, M. Gaasenbeek, and S. Yu. Towards a DNA solution to the shortest common
superstring problem. In Proceedings of the Fourth International Meeting on DNA Based Computers,
University of Pennsylvania, June 1998, pp. 111–116.
[4] V. Gupta, S. Parthasarathy, and M.J. Zaki. Arithmetic and logic operation with DNA. In Proceedings
of the Third DIMACS Workshop on DNA Based Computers, University of Pennsylvania, June 1997,
pp. 212–222.
[5] P. Kaplan, D. Thaler, and A. Libchaber. Parallel overlap assembly of paths through a directed
graph. In Proceedings of the Third DIMACS Workshop on DNA Based Computers, University of
Pennsylvania, June 1997, pp. 127–141.
[6] R. Lipton. Using DNA to Solve SAT, 1995.
[7] Z.F. Qiu and M. Lu. Arithmetic and logic operations for DNA computers. In Parallel and Distributed
Computing and Networks (PDCN’98), IASTED, December 1998, pp. 481–486.
[8] S. Roweis, E. Winfree, R. Burgoyne, N. Chelyapov, M. Goodman, P.W.K. Rothemund, and L. Adleman.
A sticker based architecture for DNA computation. In Proceedings of the Second Annual Meeting
on DNA Based Computers, Princeton University, June 1996, pp. 1–27.
[9] S. Roweis, E. Winfree, R. Burgoyne, N. Chelyapov, M. Goodman, P.W.K. Rothemund, and L. Adleman.
A sticker based model for DNA computation. Journal of Computational Biology, 5: 615–629, 1998.
[10] Q. Liu, Z. Guo, A.E. Condon, R.M. Corn, M.G. Lagally, and L.M. Smith. A surface-based approach
to DNA computation. In Proceedings of the Second Annual Meeting on DNA Based Computers,
Princeton University, June 1996, pp. 206–216.
[11] L. Wang, Q. Liu, A. Frutos, S. Gillmor, A. Thiel, T. Strother, A. Condon, R. Corn, M. Lagally, and
L. Smith. Surface-based DNA computing operations: Destroy and readout. In Proceedings of the
Fourth International Meeting on DNA Based Computers, University of Pennsylvania, June 1998,
pp. 247–248.
[12] E. Winfree. Proposed techniques. In Proceedings of the Fourth International Meeting on DNA Based
Computers, University of Pennsylvania, June 1998, pp. 175–188.
[13] E. Winfree, X. Yang, and N.C. Seeman. Universal computation via self-assembly of DNA: Some
theory and experiments. In Proceedings of the Second Annual Meeting on DNA Based Computers,
Princeton University, June 1996, pp. 172–190.
[14] R.S. Braich, C. Johnson, P.W.K. Rothemund, D. Hwang, N. Chelyapov, and L.M. Adleman. Solution
of a satisfiability problem on a gel-based DNA computer. In Proceedings of the Sixth International
Meeting on DNA Based Computers, June 2000, pp. 31–42.
[15] J. Clark and D.A. Holton. A First Look at Graph Theory. World Scientific, Singapore, 1991.
[16] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT Press, Cambridge,
MA, 1990.
[17] L. Adleman. On constructing a molecular computer, 1995.
[18] E. Bach and A. Condon. DNA models and algorithms for NP-complete problems. Journal of
Computer and System Sciences, 57: 172–186, 1996.
[19] N. Christofides. Graph Theory: An Algorithmic Approach. Academic Press, New York, 1975.
[20] P.D. Kaplan, G. Cecchi, and A. Libchaber. DNA-based molecular computation: Template–template
interactions in PCR. In Proceedings of the Second Annual Meeting on DNA Based Computers,
Princeton University, June 1996, pp. 159–171.
10.1 Introduction
Over the past decades, a multitude of new search heuristics, often called “metaheuristics,” have been proposed, many of them inspired by principles observed in nature. Common representatives include evolutionary algorithms (EAs) [1], ant colony optimization (ACO) [2], simulated annealing [3], tabu search [4], and estimation of distribution algorithms (EDAs) [5]. Besides the book at hand, overviews of several such metaheuristics can be found, for example, in References 6, 7, and 8.
Each of these metaheuristics has proven successful on a variety of applications. Although there have been attempts to compare their performance, the results are contradictory and inconclusive. There does not seem to be a superior candidate that should generally be preferred over the others. Thus, it is not surprising that there has recently been growing interest in the hybridization of these metaheuristics (cf. Section 10.2).
In this chapter, we propose a simple unified framework that describes the fundamental principle common to all metaheuristics. The framework focuses on the commonalities rather than the differences between search algorithms. Due to its simplicity and generality, it suggests a natural way of hybridization, basically turning the variety of metaheuristics into one large toolbox from which an algorithm designer can choose those parts that seem most appropriate for the application at hand. The power of the model to unify different metaheuristics will be demonstrated with the example of combining EAs and ACO, and we will report some preliminary empirical results on the performance of the hybrids so generated.
The chapter is structured as follows: in Section 10.2, we will survey some related work. Then, in
Section 10.3, we will describe the proposed unified framework. A specific aspect of that framework,
the organization of memory, is discussed in Section 10.4. Section 10.5 demonstrates the application
of the model to the hybridization of EAs and ACO. The resulting hybrids are compared empirically in
Section 10.6. The chapter concludes with a summary and some suggestions for future work.
FIGURE 10.1 The proposed unified framework: construction operators (one or several are selected) draw on the memory, which may comprise a probabilistic model and other memory components; memory update operators (again, one or several are selected) write the newly generated solutions back.
• Evolutionary algorithms store information about the previous search in the form of a set of solutions (population). New solutions are constructed by selecting two solutions (parents), combining them
in some way (crossover), and performing some local modifications (mutation). Then, the memory
is updated by inserting the new solutions into the population. Although there exist a variety of
EA variants, they all fit into this general framework. For example, evolution strategies with self-
adaptive mutation can be specified by extending the memory to also maintain information about
the strategy parameters. Steady state genetic algorithms update the population after every newly
generated solution, while genetic algorithms with generational reproduction generate a whole new
population of individuals before updating the memory.
• Simulated annealing only maintains a single solution in the memory. In addition to that, it keeps
track of time by a temperature variable. New solutions are created by local modifications more or
less equivalent to the mutation operator in EAs. The new solution replaces the current memory
solution depending on the quality difference and the temperature.
• Tabu search, just as simulated annealing, creates new solutions based on a single current solution in
the memory. Additionally, it maintains so-called tabu lists to avoid revisiting previous solutions. These tabu lists generally contain recently visited solutions or recently performed move operations. New solutions are created by local modifications, while taking the tabu lists into account. More advanced tabu search algorithms can comprise a number of additional memory structures, such as a long-term frequency memory that records the number of times a particular
component has appeared in a solution.
• Particle swarm optimization uses a swarm (set) of particles (current solutions). The search process
can be imagined as a parallel search of particles “moving” through the landscape defined by the
fitness function. In addition to their locations (solution characteristics), the memory contains for
each particle the personal best solution encountered so far and a velocity, which can be seen as a
kind of general accumulated search direction. In every iteration, new solutions are generated by
moving the particles according to their velocity, and a linear, spring-like attraction to their personal
best solution encountered and the overall best solution encountered by any of the swarm’s particles.
Memory update includes an update of the particle locations, the personal best solutions, and the
particles’ velocities.
• Ant colony optimization, when compared to the approaches outlined earlier, has a completely dif-
ferent way to store information about the search conducted so far. Instead of storing complete
solutions, it accumulates information about which partial decisions have been successful when
constructing solutions from scratch. For example, for the traveling salesperson problem, it main-
tains a (so-called pheromone) matrix indicating for each city how desirable it is to visit another city.
Using this matrix, new solutions are constructed systematically, starting at a random city, and iter-
atively selecting the next city probabilistically according to the relative preferences encoded in the
matrix. Usually, several new solutions are generated that way, and then the best solution found is
used to update the matrix, increasing the probability that future ants will make the same decisions.
An elitist ant (best solution found so far) can be modeled by an additional (complete) solution
stored in the memory.
• Estimation of distribution algorithms, similar to ACO, construct solutions based on a probabilistic
model, except that the probabilistic model is not necessarily stored in the form of a matrix. The
new solutions are then evaluated, and the information gained is used to update the probabilistic
model. The class of EDAs contains a multitude of different approaches that vary primarily in the
complexity of the probabilistic model (in particular whether they take variable dependencies into
account or not), and in the way the probabilistic model is updated (incrementally or reconstructed
every iteration based on the generated samples). Note that many EDA approaches actually do not
use a probabilistic model as main memory component, but instead rely on a population of solutions
as underlying memory structure, and construct a new probabilistic model in every iteration based
on the current population.
Describing the different metaheuristics in this general form has many benefits. First, it creates a common language, which allows researchers from different fields to understand each other's approaches easily. Second, it moves the focus from complete algorithms to their components. And third, it provides the interfaces for the different components to work together.
Based on the presented unified framework, it is almost straightforward to combine different com-
ponents from different algorithmic paradigms: an algorithm designer can easily select a combination of
memorization features, choose a suitable set of construction operators or create new ones that make use
of the combined set of selected memorization features, and then decide how the memory is updated with
the newly generated information. The framework allows for a lot of freedom: new solutions may be con-
structed in different ways, using different information from the memory, the solutions thus constructed
using one part of the memory may be used to update another part of the memory, and so on.
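One conceivable (and purely illustrative) rendering of these interfaces in Python follows; a hybrid is assembled simply by choosing which memory components, construction operators, and update operators to plug together. All names and the fixed batch size are our own assumptions, not part of the framework itself:

import random

class Memory:
    # A grab-bag of memory components: a population, a pheromone matrix,
    # tabu lists, strategy parameters -- whatever the hybrid requires.
    def __init__(self, **components):
        self.__dict__.update(components)

def generate(memory, construction_ops):
    # Select one (or several) construction operators and apply them.
    return random.choice(construction_ops)(memory)

def metaheuristic(memory, construction_ops, update_ops, evaluate, iterations):
    for _ in range(iterations):
        solutions = [generate(memory, construction_ops) for _ in range(10)]
        solutions.sort(key=evaluate)
        for op in update_ops:        # memory update step
            op(memory, solutions)
    return memory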
Depending on what information is stored in the memory, metaheuristics may be classified into solution-
based or model-based (cf. [20]).
The approaches from the former category primarily keep some of the solutions generated so far.
Simulated annealing just stores a single solution; EAs and tabu search store a set of solutions. Although
the way these solutions are selected is different for the different algorithms, the implicit assumption
always is that the stored solutions sufficiently represent the promising regions of the search space and
appropriately reflect the history of the search.
Ant colony optimization belongs to the latter category: it assumes that the problem is to make a
sequence of decisions, and then accumulates information about the desirability of making a certain
decision in a given situation (state). It builds a model of construction methods. The space and complexity
limitations are observed by restricting the number of states, and by ignoring interdependencies between
decisions. For example, the state usually considered when solving a traveling salesperson problem is the
current city, independent of the sequence of cities visited so far. Because usually, only complete solutions
(corresponding to a combination of decisions) can be evaluated, but the memory stores desirability of
decisions, ACO has to assign the credit for a good solution to the individual decisions. Currently, this is
done in a straightforward way by simply distributing the credit evenly.
The class of EDAs is rather broad, and by using a population of solutions as well as a probabilistic
model, different instantiations can be closer to either the solution-based or the model-based memory
category. They are more or less decision-based (i.e., construct solutions step by step) but may keep track
of variable dependencies through graphical models (e.g., Bayes networks or Gaussian networks).
Obviously, each of the above memorization schemes has its benefits and its drawbacks. Storing complete
solutions preserves all the interdependencies between decision variables and thus implicitly takes epistasis1
into account. However, it discards a lot of the solutions generated. On the other hand, the decision-
based memorization scheme used for example, in ACO integrates information about many generated
solutions (information is only slowly evaporated), at the expense of losing a lot of information about
interdependencies.
From the above considerations, it seems natural that a combination of these two fundamental mem-
orization schemes may be beneficial, which is one of the reasons why we decided to further explore the
combination of EAs and ACO using the framework proposed in Section 10.3.
Furthermore, it should be mentioned that besides information about the search space, it seems prom-
ising to also store information about algorithmic parameters or meta-information about the search like
some characteristics of the fitness values observed over time. Examples for such meta-knowledge are
the already mentioned self-adaptive mutation in evolution strategies, or the dependence of the tempera-
ture parameter on the fraction of accepted moves in simulated annealing. Tabu search may have several
additional memorized features like the frequency of certain moves in the past, etc. Another example, in
particular for hybrid approaches, would be to track the performance of different operators in order to
decide which operators should be used more often. Again, all those aspects easily fit into the proposed
framework.
1 Epistasis, in the field of evolutionary computation, generally refers to the fact that different components of the solution interact.
Pij = τij^α / Σ_{l∈U} τil^α,

where U is the set of yet unvisited cities and α is a user-defined weighting parameter.
After the k new solutions of one cycle have been constructed, pheromone is evaporated by multiplying every element τij of the pheromone matrix by (1 − ρ), where ρ is the pheromone evaporation rate. Then, the best individual found in the iteration is used to update the pheromone matrix by adding 1/L to every matrix element τij for which the edge from city i to city j is part of the corresponding tour of length L.
Furthermore, the algorithm maintains the overall best solution as an elite solution that is also used to update the matrix in the above way in every cycle.
Note that for many optimization problems, the construction procedure as used by ACO makes it possible
to incorporate heuristic knowledge in a very elegant way. In the case of TSP, for example, this can be done
by simply preferring close cities in the selection process.
The selection probabilities then become

Pij = (τij^α ηij^β) / Σl∈U (τil^α ηil^β),

where ηij = 1/dij is the reciprocal of the distance dij between cities i and j and β is the weight for
the heuristic information.
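The two construction rules above, together with the evaporation and deposit steps described earlier, can be sketched as follows. The function names, the symmetric-TSP assumption, and the round-off guard are ours, not the chapter's.

import random

def choose_next_city(current, unvisited, tau, dist, alpha=1.0, beta=0.0):
    # Selection weight proportional to tau_ij^alpha * eta_ij^beta, with
    # eta_ij = 1/d_ij; beta = 0 recovers the purely pheromone-driven rule.
    weights = [(tau[current][j] ** alpha) * ((1.0 / dist[current][j]) ** beta)
               for j in unvisited]
    r, acc = random.uniform(0, sum(weights)), 0.0
    for j, w in zip(unvisited, weights):
        acc += w
        if r <= acc:
            return j
    return unvisited[-1]          # guard against floating-point round-off

def update_pheromone(tau, tour, length, rho=0.01):
    # Evaporation: every tau_ij is multiplied by (1 - rho); then 1/L is added
    # to every edge of the given tour. In the scheme described above this is
    # applied both to the iteration-best tour and to the elite (overall best).
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for i, j in zip(tour, tour[1:] + tour[:1]):
        tau[i][j] += 1.0 / length
        tau[j][i] += 1.0 / length  # symmetric TSP: update both directions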
Naturally, a direct comparison of algorithms that use heuristic knowledge with algorithms that do not
is not fair. Since the focus of this chapter is not to construct the best algorithm to solve TSP, but rather
to study the effect of different memorization schemes and algorithmic hybrids, most of the approaches
suggested later do not incorporate problem-specific knowledge. Nevertheless, we will also briefly report
on the performance of ACO and a hybrid operator when they incorporate heuristic knowledge.
of 0.3 or 0.6, respectively. The three resulting problems are visualized in Figure 10.2. Independent of the
problem instance, each algorithm was allowed to create and evaluate 200,000 solutions.
Comparing different algorithms is a tricky business, because the results are highly dependent on the
parameters used. Therefore, we tested all possible combinations of the parameter settings depicted in
Table 10.1.
A comparison of the performance of the different algorithms on all three problems is presented in
Table 10.2. The table reports the performance of the best parameter setting for each algorithm. All results
are averaged over 30 runs with different random seeds. ACO with incorporated heuristic knowledge
(β = 5) and ABX are also included for comparison.
Looking at the basic algorithms first, it is obvious that the edge recombination crossover yields signifi-
cantly better results than the order crossover, which again performs better than ACO. The EA is able to
find the optimal solution for the simple problem (which is approximately the circumference of a unit
circle, i.e., 2π), but deviates from the optimal solution for the other two problems (the optimum is not
known, but ACO with heuristic knowledge achieves better results). ACO, on the other hand, seems to be
fairly unaffected by the structure of the problem.
FIGURE 10.2 Location of cities in problem instance (a) P0, (b) P3, and (c) P6.
TABLE 10.1 Tested Parameter Settings for Each Algorithm

                 α          k                ρ                   n              pm        s
ACO              1, 5, 10   10, 20, 50, 100  0.005, 0.01, 0.05   —              —         —
EA with ERX      —          —                —                   100, 200, 300  0.3, 0.5  1.25, 1.5, 2.0
EA with OX       —          —                —                   100, 200, 300  0.3, 0.5  1.25, 1.5, 2.0
50/50            1, 5       10, 20, 50       0.005, 0.01, 0.05   200, 300       0.3, 0.5  1.5
PsER crossover   1, 5       10, 20, 50       0.005, 0.01, 0.05   200, 300       0.3, 0.5  1.25, 1.5, 2.0
PC crossover     1, 5       10, 20, 50       0.005, 0.01, 0.05   200, 300       0.3, 0.5  1.25, 1.5, 2.0
MutA             1, 5, 10   10, 20, 50, 100  0.005, 0.01, 0.05   —              0.3, 0.5  —

TABLE 10.2 Tour Length of the Best Solution Found by the Different Algorithms, by Problem Instance
As to the hybrids, only the pheromone-supported ER crossover was a clear winner. Simple mutation
of the solutions constructed by ACO also performed very well, and consistently outperformed the basic
ACO approach, indicating that the algorithm indeed benefited from the mix of operators. The pheromone
completion crossover was disappointing (although still better than the basic ACO). However, these results
are preliminary, and may be due to the specific problem instances chosen. Certainly, there are problems
where ACO prevails, while for other problems, EAs perform better. Yet other problems may require a
hybrid approach to be solved.
As expected, heuristic domain knowledge is able to drastically improve performance. The ACO with
heuristic knowledge as well as ABX generate equally good (presumably optimal) results on all problem
instances, outperforming all methods without heuristic knowledge. Note that besides the idea of ABX,
incorporation of domain knowledge into the EA is not as straightforward, and we have not been able to
produce similar results, for example, by seeding the population with a heuristic (results not reported).
As the results show, the problem instances examined are too simple if heuristic knowledge is incorporated.
On the larger problem instances used in Reference 25, ACO and ABX clearly outperformed a standard EA
with ERX, and ABX outperformed ACO by ∼1.3%.
10.7 Conclusion
In this chapter we have presented a unified framework for iterative search heuristics. According to the
framework, each search heuristic maintains some sort of memory of the search history, which is used to
construct new solutions, which are then evaluated and used to update the memory. Furthermore, we have
argued that different memory schemes have different advantages, and that a search heuristic should benefit
from combining different memorization paradigms.
The presented unified framework suggests a natural way for hybridization, and we have demonstrated
its usefulness by deriving several interesting combinations of EAs and ACO and conducting a preliminary
empirical evaluation of the resulting hybrids. A closer look at the compatibility of different memory
schemes, and how they are best combined, is subject to future research. For the algorithm designer,
of course, it would be invaluable to know which operators and memory schemes are most promising
depending on the application at hand. However, that assumes a useful categorization of problems, and is
thus several steps in the future. Overall, we hope that this chapter helps the reader gain a general
understanding of different metaheuristics and of the way they interact.
References
[1] K.A. DeJong. Evolutionary Computation. MIT Press, Cambridge, MA, 2002.
[2] E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence: From Natural to Artificial Systems.
Oxford University Press, Oxford, 1999.
[3] E. Aarts and J. Korst. Simulated Annealing and Boltzmann Machines. John Wiley & Sons, New York,
1989.
[4] F. Glover. Tabu search — Part I. ORSA Journal on Computing, 1: 190–206, 1989.
[5] P. Larrañaga and J.A. Lozano, Eds. Estimation of Distribution Algorithms. Kluwer Academic,
New York, 2002.
[6] C. Reeves, Ed. Modern Heuristic Techniques for Combinatorial Optimization. McGraw-Hill,
New York, 1995.
[7] E.L. Aarts and J.K. Lenstra, Eds. Local Search in Combinatorial Optimization. Wiley, Chichester,
1997.
[8] Z. Michalewicz and D.B. Fogel. How to Solve It: Modern Heuristics. Springer, New York, 1999.
[9] D.E. Brown, C.L. Huntley, and A.R. Spillane. A parallel genetic heuristic for the quadratic
assignment problem. In International Conference on Genetic Algorithms, Morgan Kaufmann, San
Francisco, CA, 1989, pp. 406–415.
[10] S.W. Mahfoud and D.E. Goldberg. Parallel recombinative simulated annealing: A genetic
algorithm. Parallel Computing, 21: 1–28, 1995.
[11] C. Fleurent and J. Ferland. Genetic and hybrid algorithms for graph coloring. Technical report,
Departement d‘Informatique, Montreal, Canada, 1994.
[12] V.V. Miagkikh and W.F. Punch. An approach to solving combinatorial optimization problems
using a population of reinforcement learning agents. In Genetic and Evolutionary Computations
Conference, Morgan Kaufmann, San Francisco, CA, 1999, pp. 1358–1365.
[13] E.-G. Talbi, O. Roux, C. Fonlupt, and D. Robilliard. Parallel ant colonies for the quadratic
assignment problem. Future Generation Computer Systems, 17: 441–449, 2001.
[14] T. Krink and M. Lovbjerg. The lifecycle model: Combining particle swarm optimisation, genetic
algorithms and hillclimbers. In J.J. Merelo, P. Adamidis, H.-G. Beyer, J.-L. Fernandez-Villacanas,
and H.-P. Schwefel, Eds., Parallel Problem Solving from Nature, Vol. 2439 of Lecture Notes in
Computer Science, Springer, New York, 2002, pp. 621–630.
[15] P. Moscato. Memetic algorithms: A short introduction. In D. Corne, M. Dorigo, and F. Glover,
Eds., New Ideas in Optimization, McGraw Hill, New York, 1999, chap. 14, pp. 219–234.
[16] P. Calegari, G. Coray, A. Hertz, D. Kobler, and P. Kuonen. A taxonomy of evolutionary algorithms
in combinatorial optimization. Journal of Heuristics, 5: 145–158, 1999.
[17] E.-G. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8: 541–564, 2002.
[18] R. Poli and B. Logan. The evolutionary computation cookbook: Recipes for designing new
algorithms. In Online Workshop on Evolutionary Computation, 1996, pp. 33–36.
[19] C. Blum and A. Roli. Metaheuristics in combinatorial optimization: Overview and conceptual
comparison. ACM Computing Surveys, 35: 268–308, 2003.
[20] M. Zlochin, M. Birattari, N. Meuleau, and M. Dorigo. Model-based search for combinatorial
optimization. Technical report TR/IRIDIA/2000-15, IRIDIA, Université Libre de Bruxelles, 2001.
[21] J. Branke, M. Stein, and H. Schmeck. A Unified Framework for Metaheuristics. Technical
report 417, University of Karlsruhe, Institute AIFB, Karlsruhe, Germany, 2002.
[22] L. Davis. Applying adaptive algorithms to epistatic domains. In International Joint Conference on
Artificial Intelligence, Morgan Kaufmann, San Francisco, CA, 1985, pp. 162–164.
[23] D. Whitley, T. Starkweather, and D’A. Fuquay. Scheduling problems and traveling salesman:
The genetic edge recombination operator. In J. Schaffer, Ed., International Conference on Genetic
Algorithms, Morgan Kaufmann, San Francisco, CA, 1989, pp. 133–140.
[24] V.V. Miagkikh and W.F. Punch. Global search in combinatorial optimization using reinforce-
ment learning algorithms. In Congress on Evolutionary Computation, IEEE, Piscataway, 1999,
pp. 189–196.
[25] J. Branke, C. Barz, and I. Behrens. Ant-based crossover for permutation problems. In E. Cantu-Paz,
Ed., Genetic and Evolutionary Computation Conference, Vol. 2273 of Lecture Notes in Computer
Science, Springer, New York, 2003, pp. 754–765.
11.1 Introduction
The advances in computing and communication technologies and software tools have resulted in an
explosive growth in networked applications and information services that cover all aspects of our life. These
services and applications are inherently complex, dynamic, and heterogeneous. In a similar way, the under-
lying information infrastructure, for example, the Internet, is large, complex, heterogeneous, and dynamic,
globally aggregating large numbers of independent computing and communication resources, data stores,
and sensor networks. The combination of the two results in application development, configuration, and
management complexities that break current computing paradigms, which are based on static behaviors,
interactions, and compositions of components and services. As a result, applications, programming
environments, and information infrastructures are rapidly becoming brittle, unmanageable, and insecure.
This has led researchers to consider alternative programming paradigms and management techniques that
are based on strategies used by biological systems to deal with complexity, dynamism, heterogeneity, and
uncertainty.
Autonomic computing is inspired by the human autonomic nervous system, which has developed
strategies and algorithms to handle complexity and uncertainties, and aims at realizing computing systems
and applications capable of managing themselves with minimum human intervention. In this chapter,
we first give an overview of the architecture of the nervous system and use it to motivate the autonomic
computing paradigm. We then illustrate how this paradigm can be used to build and manage complex
applications. Finally, we present an overview of existing autonomic computing systems and applications
and highlight two such systems.
[Figure: the viability zone of the essential variables (EV1, EV2), with motor and sensor channels connecting the system to its environment.]
this observation. He states that a form of behavior is adaptive if it maintains the essential variables within
physiological limits [2] that define the viability zone. Two important observations can be made:
• The goal of the adaptive behavior is directly linked with the survivability of the system.
• If the external or internal environment pushes the system outside its physiological equilibrium
state, the system will always work toward coming back to the original equilibrium state.
Ashby observed that many organisms undergo two forms of disturbance: (1) frequent small impulses
to main variables and (2) occasional step changes to its parameters. Based on this observation, he devised
the architecture of the Ultrastable system that consists of two closed loops (see Figure 11.2): one that
controls small disturbances and a second that is responsible for longer disturbances.
As shown in Figure 11.2, the ultrastable system consists of two subsystems, the environment and the
reacting part (R). R represents a subsystem of the organism that is responsible for overt behavior or
perception. It uses the sensor channels as part of its perception capability and motor channels to respond
to changes imposed by the environment. This set of sensor and motor channels constitutes the
primary feedback between R and the environment. We can think of R as a set of behaviors of the organism
that are triggered by changes in the environment. S represents the set of parameters
that triggers changes in relevant features of this behavior set. Note that in Figure 11.2, S triggers changes
only when the environment affects the essential variables in a way that causes them to go outside their
physiological limits. As mentioned earlier, these variables need to be maintained within physiological
limits for any adaptive system/organism to survive. Thus we can view this secondary feedback between
the environment and R as responsible for triggering the adaptive behavior of the organism. When the
changes impacted by the environment on the organism are large enough to throw the essential variables
out of their physiological limits, the secondary feedback becomes active and changes the existing behavior
sets of the organism to adapt to these new changes. Notice that any changes in the environment tend to
push an otherwise stable system to an unstable state. The objective of the whole system is to maintain
the subsystems (the environment and R) in a state of stable equilibrium. The primary feedback handles
finer changes in the environment with the existing behavior sets to bring the whole system to stable
equilibrium. The secondary feedback handles coarser and long-term changes in the environment by
changing its existing behavior sets and eventually brings back the whole system to stable equilibrium state.
Hence, in a nutshell, the environment and the organism always exist in a state of stable equilibrium and
any activity of the organism is triggered to maintain this equilibrium.
[Figure 11.2: Ashby's ultrastable system. The reacting part R is coupled to the internal and external environment through sensory and motor neurons (the sensor and motor channels); the essential variables (EV) drive the secondary feedback S = f(change in EV).]
Changes in S, triggered when an essential variable goes out of the physiological limits, change the normal behavior of the system such that the reacting part R
works to bring the essential variable back within limits. It uses its motor channels to effect changes so that
the internal environment and the system (organism) come into the state of stable equilibrium. It should be
noted that the environment here is divided into the internal environment and external environment. The
internal environment represents changes impacted internally within the human system and the external
environment represents changes impacted by the external world. However, the goal of the organism is to
maintain the equilibrium of the entire system where all the subsystems (the organism or system itself, and
the internal and external environments) are in stable equilibrium.
Managed Element: This is the smallest unit of the application and it contains executable code
(e.g., numerical model of a physical process) and a data structure that defines the executable
code’s attributes (e.g., its purpose, operation, input and output requirements, criteria for when and
how to control it). At runtime, the managed element can be affected in different ways, for example,
it can encounter a failure during execution, it can be externally attacked, or it may slow down and
affect the performance of the entire application.
Environment: The environment represents all the factors that can impact the managed element. The
environment and the managed element can be two subsystems forming a stable system. Any
change in the environment causes the whole system to go from a stable state to an unstable state.
This change is then offset by reactive changes in the managed element causing the system to move
back from the unstable state to a different stable state. Notice that the environment consists of
two parts — internal and external. The internal environment consists of changes internal to the
managed element, which can be looked at as reflecting the state of the application. The external
environment can be thought of as reflecting the state of the execution environment.
[Figure 11.5: an autonomic component. The managed element sits between the internal and external environment; local and global control loops, each with its own knowledge engine (KE) and planning engine (PE), realize programmed and autonomic behavior through monitoring and analysis (M&A) of the cardinals, sensors (S), and actuators (A).]
Control: Each autonomous component has its own manager that: (1) accepts user-specified require-
ments (fault tolerance, performance, security, etc.); (2) interrogates the data structure that
characterizes the executable code; (3) senses the state of the overall computation; (4) determines the
nature and instantaneous state of the overall computational environment; and (5) uses this infor-
mation to control the operation of its associated executable code within the overall system in order
to effectively achieve the user-specified requirements. This process is accomplished on-the-fly and
continuously throughout the execution of the overall computation. As is evident from Figure 11.5,
the control part consists of two control loops — the local loop and the global loop.
The local loop can only handle known environment states. Its knowledge engine contains the
mapping of environment states to behaviors. For example, when the load on the local system goes
above the threshold value, the local control loop will work toward balancing the load by either con-
trolling the local resources available to the managed element or by reducing the size of the problem
handled by this element. This will work only if the local resources can handle the computational
requirements. However, the local loop is blind to the overall behavior of the entire application or
system and thus cannot achieve the desired global objectives. In a scenario where the entire system
is affected, the local loop will continue repeating local optimization that may lead to degradation
in performance and result in unadapted or chaotic behavior. At some point, one of the essential
variables of the system (in this case, a performance cardinal) overshoots its limits. This is when the
global loop comes into action.
The global loop can handle unknown environment states and may involve machine learning.
It uses four cardinals for the monitoring and analysis of the managed elements. These are
performance, configuration, protection, and security. These cardinals are like the essential variables
described in Ashby’s ultrastable system. This control loop acts toward changing the existing behavior
of the managed element such that it can adapt itself to changes in the environment. For example, in
load-balancing, the desired behavior of the managed element (as directed by the local loop) requires
its local load to be within prescribed limits. However, the local loop might not be able to maintain
the local load within these acceptable limits, which in turn might degrade the performance of the
overall system. Consequently, this change in the overall performance cardinal triggers the global
loop, which then selects an alternate behavior pattern from the pool of behavior patterns for the
managed element. The analysis and planning steps use the component's knowledge engine. Finally, the new plan is
executed to adapt the behavior of the managed element to the new environment conditions.
Input and Output Ports: Many interacting autonomous components may be composed to form a
complex application. These autonomic components use the input and output ports for such a
composition.
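A hypothetical sketch of the two control loops described above follows; the state names, cardinal limits, and behaviors are illustrative stand-ins, not part of the architecture's specification.

class AutonomicManager:
    def __init__(self, state_behaviors, behavior_pool, limits):
        self.state_behaviors = state_behaviors  # knowledge: known state -> action
        self.behavior_pool = behavior_pool      # cardinal -> alternate behavior
        self.limits = limits                    # cardinal -> acceptable bound

    def step(self, state, cardinals, element):
        # Local loop: a known environment state triggers its mapped behavior.
        action = self.state_behaviors.get(state)
        if action is not None:
            action(element)
        # Global loop: fires only when a cardinal (e.g., performance) leaves
        # its limits; it plans an alternate behavior and installs it.
        for name, value in cardinals.items():
            if value > self.limits.get(name, float("inf")):
                plan = self.behavior_pool.get(name)
                if plan is not None:
                    plan(element)                       # execute the new plan
                    self.state_behaviors[state] = plan  # adapt the behavior set

# Example: a local load-reduction behavior plus a global repartitioning plan.
log = []
manager = AutonomicManager(
    state_behaviors={"load_high": lambda e: e.append("reduce_problem_size")},
    behavior_pool={"performance": lambda e: e.append("repartition_globally")},
    limits={"performance": 0.8},
)
manager.step("load_high", {"performance": 0.9}, log)
print(log)   # ['reduce_problem_size', 'repartition_globally']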
Self-Awareness: an autonomic system knows itself and is aware of its state and its behaviors.
Self-Protecting: an autonomic system is as prone to attacks as any other system and hence should be capable of
detecting and protecting its resources from both internal and external attack and of maintaining
overall system security and integrity.
Self-Optimizing: an autonomic system should be able to detect suboptimal behaviors and intelligently
perform self-optimization functions.
Self-Healing: an autonomic system must be aware of potential problems and should have the ability to
reconfigure itself to continue to function smoothly.
Self-Configuring: an autonomic system must have the ability to dynamically adjust its resources based
on its state and the state of its execution environment.
Contextually Aware: an autonomic system must be aware of its execution environment and be able to
react to changes in the environment.
Open: an autonomic system must be portable across multiple hardware and software architectures, and
consequently it must be built on standard and open protocols and interfaces.
Anticipatory: an autonomic system must be able to anticipate, to the extent possible, its needs and
behaviors and those of its context, and be able to manage itself proactively.
Sample self-managing system/application behaviors include installing software when it detects that
the software is missing (self-configuration), restarting a failed element (self-healing), adjusting current
workload when it observes an increase in capacity (self-optimization), and taking resources offline if it
detects an intrusion attempt (self-protecting).
Each of the attributes listed above are active research areas toward realizing autonomic systems and
applications. Generally, self-management is addressed in four primary system/application aspects, that is,
configuration, optimization, protection, and healing. Further, self-management solutions typically consist
of the steps outlined earlier: (1) the application and underlying information infrastructure provide infor-
mation to enable context and self-awareness (2) system/application events trigger analysis, deduction, and
planning using system knowledge and (3) plans are executed using the adaptive capabilities of the system.
An autonomic system implements self-managing attributes using the control loops described earlier to
collect information, make decisions, and adapt as necessary.
Autonomic components need to collaborate to achieve coherent autonomic behaviors at the application
level. This requires a common set of underlying capabilities including representations and mechanisms for
solution knowledge, system administration, problem determination, monitoring and analysis, and policy
definition, enforcement, and transaction measurements [5]. For example, a common solution knowledge
capability captures installation, configuration, and maintenance information in a consistent manner,
and eliminates the complexity introduced by heterogeneous tools and formats. Common administrative
console functions ranging from setup and configuration to solution runtime monitoring and control
provide a single platform to host administrative functions across systems and applications, allowing users
to manage solutions rather than managing individual systems/applications. Problem determination is one
of the most basic capabilities of an autonomic element and enables it to decide on appropriate actions when
healing, optimizing, configuring, or protecting itself. Autonomic monitoring is a capability that provides
an extensible runtime environment to support the gathering and filtering of data obtained through sensors.
Complex analysis methodologies and tools provide the power and flexibility required to perform a range
of analyses of sensor data, including deriving information about resource configuration, status, offered
workload, and throughput. A uniform approach to defining the policies is necessary to support adaptations
and govern decision-making required by the autonomic system. Transaction measurements are needed to
understand how the resources of heterogeneous systems combine into a distributed transaction execution
environment. Using these measurements, analysis and plans can be derived to change resource allocations
to optimize performance across these multiple systems as well as determine potential bottlenecks in the
system.
[Figure: the eight compass directions (N, NE, E, SE, S, SW, W, NW) along which fire spreads from a cell.]
[Figure: the autonomic manager: monitor, analyze, plan, and execute engines operating over a shared knowledge base, connected to the managed element through sensors (S) and actuators (A).]
output ports (the DEVS-Java model). A cell is programmed to undergo state changes from “unburned”
to “burning” if it is hit by an igniter or gets a notification message to compute its fire-line intensity value.
The cell changes state from “unburned” to “burning” only if the computed fire-line intensity is above a
threshold value for “burning.” During the “burning” phase, the cell propagates the fire to eight different fire
components along the eight directions (refer to Figure 11.5). The direction and value of maximum fire
spread are computed using Rothermel’s fire spread model [6]. The remaining seven components are then
derived using a decomposition algorithm. Rothermel’s model takes into account the wind speed
and direction, the vegetation type, the calorific value of the fuel, and terrain type in calculating the fire
spread.
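The state-transition logic of a cell can be sketched as follows; the intensity computation and the threshold value are placeholders rather than Rothermel's actual equations.

BURN_THRESHOLD = 1.0   # hypothetical fire-line intensity threshold

DIRECTIONS = ("N", "NE", "E", "SE", "S", "SW", "W", "NW")

def notify(cell, intensity):
    # A cell moves from "unburned" to "burning" only when the computed
    # fire-line intensity exceeds the burning threshold.
    if cell["state"] == "unburned" and intensity > BURN_THRESHOLD:
        cell["state"] = "burning"
        propagate(cell)

def propagate(cell):
    # A burning cell sends a fire component along each of the eight
    # directions; in the real model, the maximum spread comes from
    # Rothermel's equations and the rest from a decomposition algorithm.
    for direction in DIRECTIONS:
        neighbor = cell["neighbors"].get(direction)
        if neighbor is not None:
            notify(neighbor, component_intensity(cell, direction))

def component_intensity(cell, direction):
    return cell["intensity"] * 0.5   # placeholder decomposition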
[Figure: sets of state transitions and actions guided by self-optimizing and self-protecting policies; states such as windw map to actions including pause, resume, partitioning the cell space horizontally or vertically, and allocating resources.]
[Figure: partitioning of the N-cell space under different wind directions: horizontal slices for north or south winds, vertical slices for east or west winds, diagonal partitions for north-east/south-west and north-west/south-east winds, and repartitioning of the cell space vertically.]
FIGURE 11.12 Sets of state transitions and actions guided by self-healing policies: the component state (Cstate) moves from burning to failed through an internal transition due to node failure, and then through stopped and ready back to burning.
is generated, which involves a self-optimization action. In this case, the requested action is to repartition
the cell space vertically. Referring to Figure 11.11, we see that this action is mapped to the state windns.
The autonomic manager uses its monitoring and analysis engines to detect this state change. The planning
engine then generates the appropriate action as shown in Figure 11.11. The appropriate action (mapping
function) is a part of the knowledge stored in the knowledge engine. Finally, the execution engine executes
this action.
Case 2: State change from burning to failed triggers actions guided by the self-healing policy.
A scenario where a component stops running due to a failure in the node is shown in Figure 11.12.
A series of state changes generates the corresponding actions as shown in Figure 11.12. Finally the
component resumes execution from the last check-pointed state and its state changes into burning.
FIGURE 11.13 A group of forest fire cells expressed as an autonomic coupled component: an autonomic manager (monitor, analyze, plan, and execute engines over a shared knowledge base, with sensors S and actuators A) controls a virtual computational unit (VCU) of cells connected through input and output ports.
FIGURE 11.14 Sets of state transitions and actions guided by self-optimizing policies bring a component from a heavily loaded to a lightly loaded state: the coupled-component state (CCstate) moves from loadHigh through paused, reconfigure, and ready to loadMid, driven by actions (CCact) such as selecting cells for migration between VCUs and resuming after repartitioning.
[Figure: a space of burning forest fire cells, each with input and output interfaces (S, E) and containing (1) data, (2) operational rules, (3) neighbor information, and (4) local computation.]
Two projects, AutoMate [21] and Autonomia [20], belonging to the second category, directly investig-
ate the key issues of autonomic component/service definition and construction, autonomic application
construction, execution and management, and autonomic middleware services. These systems are briefly
described below.
11.6.2 Autonomia
Autonomia (University of Arizona) provides application developers with the tools required to specify the
appropriate control and management schemes, the services to deploy and configure required software
and hardware resources, and to run applications. Autonomia can efficiently support the development
of pervasive systems and services, and provides an environment to make the control and management
of large-scale parallel and distributed applications autonomic.
[Figure: the AutoMate architecture: the Accord programming framework (programming system; ontology and taxonomy), the Rudder decentralized coordination middleware (decentralized coordination engine; agent framework and decentralized reactive tuple space), and a content overlay (content-based routing engine; self-organizing overlay).]
[Figure: the Autonomia environment: a user's application, developed through the application management editor, is deployed through the AMS onto managed elements across hosts (Host 1 to Host N), each hosting an interface, a component, and a mobile agent.]
Autonomia provides online monitoring
and management to maintain the desired autonomic attributes of applications as well as system services,
achieving self-deployment, self-configuration, self-optimization, self-healing, and self-protection by the
policy engines.
The main modules of Autonomia include the Application Management Editor (AME), the Autonomic
Middleware Services (AMS), and the Application Autonomic Manager (AAM). The AME is a graphical user
interface for developing an application using pre-developed and standard components and specifying the
management requirements for each application component. The AMS provides common middleware
services and tools needed by applications and systems to automate the required control and management
functions. The AAM mainly focuses on setting up the application execution environment. It acts as the
application administrator that is responsible for allocating the appropriate resources to run the application
and maintaining the application requirements at runtime.
11.7 Summary
In this chapter, we presented the autonomic computing paradigm, which is inspired by biological systems
such as the autonomic human nervous system and which enables the development of self-managing
computing systems. These systems use autonomic strategies and algorithms to handle complexity and
uncertainties with minimum human interference, thus shifting the burden of managing systems from
people to technologies. An autonomic computing system is a collection of autonomic components, which
implement an intelligent control loop to monitor, analyze, plan, and execute using knowledge of the
environment.
Several research efforts focused on enabling the autonomic properties address four main areas: (1) self-
healing, where a system is expected to heal program parts that malfunction; (2) self-protection, preventing
large-scale correlated attacks or cascading failures from permanently damaging valuable information
and critical system functions; (3) self-configuration, involving automatic incorporation of new
components and automatic component adjustment to new conditions; and (4) self-optimization on
a system level, addressing automatic parameter tuning. Projects in both industry and academia have
addressed autonomic behaviors at all levels of system management, from the lowest levels of the hardware
to the highest levels of software systems and applications. At the hardware level, systems are dynamically
upgradable [26], while at the operating system level, active operating system code is replaced dynamically
[27]. Efforts have also focused on autonomic middleware, programming systems, and runtime [21, 28].
At the application level, self-optimizing databases and web servers dynamically reconfigure to adapt service
performance. The challenges to achieve true autonomic computing still exist, which will be accomplished
through a combination of process changes, skills evolution, new technologies and architecture, and open
industry standards.
References
[1] Autonomic Nervous System. http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/P/PNS.
html#autonomic
[2] W. Ross Ashby. Design for a Brain, 2nd ed., Revised, Chapman & Hall Ltd, London, 1960.
[3] J.O. Kephart and D.M. Chess. The vision of autonomic computing. IEEE Computer, 36, 41–50,
2003.
[4] IBM Corporation. Autonomic computing concepts. http://www-3.ibm.com/autonomic/library.
shtml, 2001.
[5] IBM. An Architectural Blueprint for Autonomic Computing, April 2003.
[6] R. Rothermel. A mathematical model for predicting fire spread in wildland fuels. Research paper
INT-115. Ogden, UT: U.S. Department of Agriculture, Forest Service, Intermountain Forest and
Range Experiment Station, 1972.
[7] OceanStore. http://oceanstore.cs.berkeley.edu, July 8, 2002.
[8] J. Kubiatowicz. OceanStore: Global-scale persistent storage. Stanford Seminar Series, Stanford
University, 2001.
[9] IBM Almaden Research. IBM storage tank — A distributed storage system WhitePaper. January 24,
2002.
[10] The Océano Project, http://www.research.ibm.com/oceanoproject, IBM Corporation.
[11] Guy M. Lohman and Sam Lightstone. SMART: Making DB2 (More) Autonomic. In Very Large
Data Bases (VLDB) Conference 2002.
[12] S. Chaudhuri. AutoAdmin: Self-tuning and self-administering databases. http://research.
microsoft.com/research/dmx/autoadmin, Microsoft Research Center.
[13] R. Pool. Natural selection, a new computer program classifies documents automatically, 2002.
[14] Q-Fabric. http://www.cc.gatech.edu/systems/projects/ELinux/qfabric.html.
[15] G. Kaiser, P. Gross, G. Kc, J. Parekh, and G. Valetto. An approach to autonomizing legacy systems.
In Workshop on Self-healing, Adaptive and Self-managed Systems, New York City, NY, June 23, 2002.
[16] Anthill. http://www.cs.unibo.it/projects/anthill/index.html.
[17] Robbert van Renesse, Kenneth Birman, and Werner Vogels. Astrolabe: A robust and scalable
technology for distributed system monitoring, management, and data mining. ACM Transactions
on Computer Systems, 21, 164–206, 2003.
[18] Gryphon. http://www.research.ibm.com/gryphon/gryphon.html
[19] Smart Grid, http://www.ldeo.columbia.edu/res/pi/4d4/testbeds/
[20] S. Hariri, Lizhi Xue, Huoping Chen, Ming Zhang, S. Pavuluri, and S. Rao. Autonomia:
an autonomic computing environment. In Proceedings of the Performance, Computing, and
Communications Conference, IEEE International, April 9–11, 2003.
[21] M. Agarwal, V. Bhat, H. Liu, et al. AutoMate: Enabling autonomic applications on the grid. In
Autonomic Computing Workshop Fifth Annual International Workshop on Active Middleware Services
(AMS’03), June 25, 2003.
[22] H. Liu, M. Parashar, and S. Hariri. A component-based programming framework for autonomic
applications. In Proceedings of 1st IEEE International Conference on Autonomic Computing
(ICAC-04), IEEE Computer Society Press, Washington, 2004, pp. 278–279.
[23] Z. Li and M. Parashar. Rudder: A rule-based multi-agent infrastructure for supporting autonomic
grid applications. In Proceedings of 1st IEEE International Conference on Autonomic Computing
(ICAC-04), May 2004, pp. 10–17.
[24] N. Jiang, C. Schmidt, V. Matossian, and M. Parashar. Content-based Middleware for Decoupled Inter-
actions in Pervasive Environments, Rutgers University, Wireless Information Network Laboratory
(WINLAB), Piscataway, NJ, USA, 2004.
[25] V. Bhat and M. Parashar. Discover middleware substrate for integrating services on the grid.
In Proceedings of 10th International Conference on High Performance Computing (HiPC 2003),
Springer-Verlag, Heidelberg, December 2003, pp. 373–382.
[26] J. Jann, L.M. Browning, and R.S. Burgula. Dynamic reconfiguration: Basic building blocks for
autonomic computing on IBM pSeries servers. IBM Systems Journal, 2003.
[27] J. Appavoo, K. Hui, C.A.N. Soules, R.W. Wisniewski, D.M. Da Silva, O. Krieger, M.A. Auslander,
D.J. Edelsohn, B. Gamsa, G.R. Ganger, P. McKenney, M. Ostrowski, B. Rosenburg, M. Stumm,
and J. Xenidis. Enabling autonomic behavior in systems software with hot swapping. IBM Systems
Journal, 2003.
[28] James Kaufman and Toby Lehman. Optimal grid: Grid middleware for high performance
computational biology. Research report, IBM Almaden Research Center.
[29] P. Horn. Autonomic Computing: IBM’s Perspective on the State of Information Technology,
http://www.research.ibm.com/autonomic/, October 2001.
[30] Adaptive Systems. http://www.cogs.susx.ac.uk/users/ezequiel/AS/lectures.
12.1 Introduction
Optimization problems are widespread and appear frequently in a variety of common, everyday applica-
tions. For example, a shipping company handles containers of varying sizes and shapes, and wants to pack
the maximum possible number into a fixed space. This packing plan must be generated for each truck. An
airport wants to determine the fastest pattern for their fleet of snow plows and dump trucks for clearing
snow from the runways. Since heavier snowfall will require more trips by the dump trucks, they also need
the plowing pattern generator to include the rate of snowfall in the computations. Finally, consider a
school district that needs to reduce the amount of fuel used by its 40 buses, and wants to determine the
shortest route that will allow children to be picked up at their homes. The district has a call-in system
for children who need not be picked up on a particular day, so the route plan has to be updated every
morning.
In each of these examples, the result must be determined quickly. Unfortunately, in order to determine
the very best answer in each case, all possible solutions must be considered. For example, determining
the best bus route to pick up 50 children for one bus in a fixed area requires that 50! routes
be examined. That is about 3.04 × 10^64 routes for a single bus. For 40 buses of 50 children each, there are 2000!
possible schedules! Because of the amount of time taken to find an optimal answer to these types of
computationally intractable problems, approximation algorithms are developed to find acceptably good
answers in a reasonable amount of time.
The decision to use an approximation algorithm raises additional questions. How close will a particular
approximation algorithm’s answer be to the optimal answer? Will the approximation algorithm be able
to find this answer fast enough? In all the examples given, there is a limit to the amount of time that can
be spent to find a solution. The program implementing the algorithm must guarantee a solution before
the time limit is reached.
Biologically inspired optimization methods such as Genetic Optimization Algorithms (GOAs) offer the
advantage of an immediate approximate solution. The available time is then used to refine that solution
to bring it as close as possible to the optimal. Of course, in real applications, the actual optimal solution
is not known. However, where optimal solutions are known, GOA approximations frequently find them.
When the optimal is not found, the approximation is usually very close. GOAs can be written to execute in
parallel to speed up the process of finding an optimal or approximate solution. Communication of local
results among Parallel GOAs (P-GOAs) can often improve both the quality of the approximations and the
speed at which they are found.
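This anytime behavior can be sketched as a deadline-bounded loop. Everything below (function names, the monotonic-clock deadline) is an assumed illustration rather than the scheduler described later in this chapter.

import time

def anytime_ga(init_population, evolve, fitness, deadline_seconds):
    # An initial population immediately yields an approximate answer; the
    # remaining time refines it, and the best solution seen so far is
    # always available when the deadline arrives.
    population = init_population()
    best = max(population, key=fitness)
    stop = time.monotonic() + deadline_seconds
    while time.monotonic() < stop:
        population = evolve(population)
        candidate = max(population, key=fitness)
        if fitness(candidate) > fitness(best):
            best = candidate
    return best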
The likelihood that a very good solution will be found when using a P-GOA depends largely on the values
of the parameters used by the program. These parameters include population size, deme size (the number
of individuals on each processor), number of generations, rate of communication, and communication
frequency. This chapter discusses a mathematical method to determine appropriate P-GOA values for
these parameters for a multiprocessor scheduler. The computations required to determine proper values
for the P-GOA parameters are shown in detail. In addition, known optimal schedules are compared with
schedules generated using the P-GOA and the results are shown.
The analysis and experimentation discussed here represent an ongoing investigation. The results presented
are the culmination of an initial attempt to find an “all purpose” sizing formula for task
scheduling in a computational science programming environment. Using the work by Cantú-Paz [1] and
Goldberg et al. [2] as a theoretical and practical foundation, numerous computations were explored,
and trials were run using a variety of equations. An “all purpose” formula was not found; however, it is
now clear what can be accomplished using this sort of methodology, and what problem qualities suggest
a very different approach. The material in some of the sections of this chapter has appeared in previous
discussions of the progress in this area [3–6]. It has been repeated here so that this chapter may be read
without the need to refer to the earlier work.
The rest of this chapter is organized as follows. Section 12.2 outlines the background of the sample
problem and of previous work related to parameter value selection for parallel Genetic Algorithms (GAs).
Section 12.3 presents the problem-specific scheduling model and the complexity of the problem. This
material has appeared in previous publications. Section 12.4 introduces GAs and the application of the
genetic metaphor to the scheduling problem, which has appeared in previous publications. Section 12.5
describes the genetic operators and the parameter variables used in the scheduler. This section is similar
to discussions in Reference 6, but provides the critical modifications that produced a successful deme
sizing equation for the task scheduling P-GOA. Readers familiar with the previous reports might begin
their reading in Section 12.5. Section 12.6 describes the design of the experiments and describes how
particular values were chosen for the sizing equations. Section 12.7 details the results of the experiments
and summarizes the conclusions.
12.2 Background
A multiprocessor scheduling problem is used to illustrate appropriate calculations for determining
several important parameter values for a P-GOA solution approximation. The schedule to be developed
by the P-GOA specifies the processor on which each of the tasks required for a particular application
is to run. This application schedules tasks on a cluster of homogeneous processors with an Ethernet
connection. The task execution times and communication times are known in advance. This models
the scheduling requirements that may be encountered when performing distributed database queries or
when doing “production runs” in computational science applications with known computation times and
varying data. In applications such as these, execution and communication times can be known (or at least
estimated very closely) in advance.
Parameter sizing methods were reported by Goldberg [7], Goldberg et al. [2], and others. Precise analysis
of these methods for selected problems and processor topologies were presented by Cantú-Paz [1]. In his
foreword [1], Goldberg states, “I believe it is fair to say that prior to this work, the design of parallel GAs
was something of an empirical black art, guided largely by ad hoc trial and error experimentation.” The
statistical analysis presented in these works had their basis in schema theory. Schema theory attempts
to provide an explanation for the behavior of genetic algorithms. Analysis of this explanation of GA
behavior resulted in precise formulae for determining, a priori, the values of GA input parameters that
would produce a particular solution quality.
A number of objections to schema theory have been published. A recent book by Reeves and Rowe [8]
summarizes many of these objections, and provides sufficient bibliographic information to begin a more
in-depth investigation into the reasoning of schema theory detractors. They suggest that schema theory-
based analysis can be fruitful for specific categories of GAs. However, they insist that the behavior of other
types of GAs differs vastly from that described by schema theory. Nevertheless, the benefits of calculating
accurate GA parameters in advance, in terms of time saved and solution quality, can be sizable for some
applications. In this chapter, effective parameters are found for a task-scheduling problem by applying
appropriately adapted analysis.
In Reference 3 these concepts were first applied to this scheduling problem. In References 4 to 6, an
in-depth examination of three deme sizing equations developed over time by Cantú-Paz, Goldberg, and
others was reported in terms of their applicability to the scheduling problem. Information in Reference 6
outlined how to apply the equations to the cluster-scheduling problem, specifically detailing the appro-
priate determination of values for the equation variables. This chapter explains refinements developed
through analysis of the most promising equations, and provides the rationale for making those refine-
ments. Justification of approximations used in parameter computations and modifications to the original
equations are explained.
12.3.1 Definitions
A task ti is represented as a taskpair (ei , ci ), consisting of an execution time ei and a communication
time ci . The total number of tasks to schedule is n. The symbol pi represents one of the m processors
available to the scheduler. The makespan of a schedule is the time at which the execution of all taskpairs
is completed. The optimal makespan is the shortest possible period in which a given set of taskpairs can
execute on the available multiprocessor system. The goal of the scheduler is to produce schedules with
makespans as close to optimal as possible within a predictable and practical amount of time. In summary,
the set of tasks {t0 , . . . , tn−1 } is scheduled to execute on a system of m processors {p0 , . . . , pm−1 }.
Tasks are created and scheduled by an initial processor designated p0 . The time p0 uses to create
a schedule for the given set of tasks is not considered as a part of the makespan. The time required to
send the task assignments to the other processors is assumed constant and is therefore not considered.
Any final computations that must occur after all results are communicated back to p0 are unaffected
by the particular schedule, and are not included in the schedule quality evaluations. Finally, messages
and processes for control or monitoring are assumed to have no effect on the relative efficiencies of the
schedules, so these values are also not considered in the evaluation of schedule quality.
The time required for each task to execute and communicate a result (or estimations of these times)
is available in advance. Dependent computations are grouped into a single task so that each task is
independent. Consequently, there are no precedence constraints on the tasks. However, the computation
portion of each task must complete execution before its corresponding message is sent.
and the execution time complexity of the sequential version of the genetic schedule optimization
algorithm is
The scheduler is cost optimal in that the space complexity of the P-GOA is
{(7, 16), (11, 22), (12, 40), (15, 22), (17, 23),
(17, 23), (19, 23), (20, 28), (20, 27), (26, 27),
(28, 31), (36, 37), (31, 29), (28, 22), (23, 19),
(22, 18), (22, 17), (29, 16), (27, 16), (35, 15)},
1 0 0 0 0 0 0 0 2 0 0 0 1 1 1 1 1 2 1 2.
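As an illustration, the sample chromosome above can be evaluated under a deliberately simplified makespan model. The model below is an assumption for this sketch only; the chapter's evaluator also accounts for how messages share the Ethernet connection.

taskpairs = [(7, 16), (11, 22), (12, 40), (15, 22), (17, 23),
             (17, 23), (19, 23), (20, 28), (20, 27), (26, 27),
             (28, 31), (36, 37), (31, 29), (28, 22), (23, 19),
             (22, 18), (22, 17), (29, 16), (27, 16), (35, 15)]
chromosome = [1, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 1, 1, 1, 1, 1, 2, 1, 2]

def makespan(assignment, tasks, num_procs=3):
    # Crude model: execution and communication serialize per processor,
    # and the makespan is the latest processor finish time.
    finish = [0] * num_procs
    for (e, c), p in zip(tasks, assignment):
        finish[p] += e + c
    return max(finish)

# The GA minimizes the makespan, so fitness = -1 * makespan.
print(-1 * makespan(chromosome, taskpairs))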
The analysis here considers only deme sizing, sets migration to the maximum possible value, and uses
a fully connected system.
Consideration of collateral noise (the noise of other partitions when deciding between best and second
best building block) is built into the sizing equation. External noise may exist if fitness cannot be directly
calculated. This was not the case for the scheduling application, so the sizing computation does not require
adjustments for external noise. Each of the processors used in finding the schedule is assigned nd schedules
to evolve locally.
Analysis produced a sizing equation that is identical to the one given in Reference 1. However, many of
the variables could not be calculated in the same manner as in Reference 1 and required approximations.
In addition, experiments revealed that a scaling factor was needed to compensate for a crucial difference
in this application. Unlike the applications in Reference 1, the scheduler does not need to converge. It is
only necessary to find one schedule of the required quality. The scaling factor will be discussed in detail
in Section 12.6.4. The equation, minus scaling, is given below:
where

σf² ≈ (fmax − fmin)²/12.    (12.2)
Because the goal of the scheduler is to minimize the makespan, the fitness of an individual chromosome
string is evaluated as
fitness = −1 × makespan.
In the scheduling application, fitness is an indirect computation requiring an evaluation of the meaning
and implications (i.e., effect on communication time) of the encoding. When they compared expected
confidence levels with experimental results, the degree of correctness was defined as the percentage of
alleles possessing the correct value when the algorithm converged. In the scheduling application, the degree
of correctness is defined as the “nearness” to the optimal schedule that can be obtained in a limited amount
of time by the best individual. These differences were overcome by problem specific interpretations of
the meanings of the equation variables and by employing a scaling factor to adjust the output of the
equation.
The fitness distribution associated with a particular building block in a task schedule cannot be
directly determined from the value assigned to the building block. Fitness calculations must consider
the meaning of the allele value in terms of previous assignments and the effect on communication time.
Consequently, the P-GOA scheduler considers the smallest possible difference in average makespans
and the average execution time. The average of these two values is used as an estimate for signal
difference.
d ≈ (1 + µe )/2.
This value will vary in relationship with the means of the data set values. Data sets with larger execution
times will have larger signal difference values.
The worst possible schedule for a given data set could be generated if tasks were scheduled to
maximize the makespan. This worst fitness for the specific scheduling problems is calculated as
follows:
The minimum makespans were estimated using observations of task placements found by the optimal
algorithm:

fmin ≈ (1/num_procs) lval µe + 2µc = (1/3) × 15 × 25 + 10 = 135   if µc ≤ µe,
fmin ≈ (1/num_procs) lval µc + µe = (1/3) × 15 × 40 + 25 = 225   if µc > µe.
Next, the probability of retaining the best building block between generations on at least one deme is

p ≈ 1/2 + ψ/sqrt(2π), where ψ = d/(σbb sqrt(2m))
  ≈ 1/2 + (13/(σbb sqrt(28)))/sqrt(2π)
  ≈ 0.5 + (13/(σbb × 5.292))/0.251
  ≈ 0.5 + 9.789/σbb,

where σbb = sqrt(σf²).
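The pieces above can be combined into a direct transcription of the stated formula. In this sketch, σf² is assumed to be supplied by Equation (12.2), m = 14 is read off the sqrt(28) term, and µe = 25 matches the worked numbers; all inputs are assumptions for illustration.

import math

def retain_probability(d, sigma_f2, m):
    # p ~= 1/2 + psi/sqrt(2*pi), with psi = d/(sigma_bb*sqrt(2m)) and
    # sigma_bb = sqrt(sigma_f^2), evaluated exactly as stated above.
    sigma_bb = math.sqrt(sigma_f2)
    psi = d / (sigma_bb * math.sqrt(2 * m))
    return 0.5 + psi / math.sqrt(2 * math.pi)

mu_e = 25.0               # assumed mean execution time, as in the worked numbers
d = (1 + mu_e) / 2        # signal difference estimate, d = 13
# sigma_f2 below is an arbitrary placeholder; Equation (12.2) supplies it.
print(retain_probability(d, sigma_f2=100.0, m=14))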
TABLE 12.1 Average Nearness to Optimal Compared with the Predicted Quality (uniformly distributed data)

Deme size   Actual   Predicted
 7          0.805    0.80
10          0.878    0.85
14          0.950    0.90
24          0.975    0.95
50          0.997    0.99
FIGURE 12.1 Actual versus predicted nearness to optimal as a function of deme size.
TABLE 12.2
115  115   78
117  117  404
118  119  462
118  118   40
114  114  620
TABLE 12.3
TABLE 12.4
found. The tables illustrate that the P-GOA found the optimal makespan 53.33% of the time. When the
optimal was not found, the approximation was very close.
However, experiments with exponentially distributed data yielded very disappointing results.
Makespans in all experiments were far from optimal. The distribution of the data is undeniably a strong
factor in the applicability of the sizing equations. This is a serious limitation, but it should be kept in
perspective. The results of the sizing equation were disappointing, but the scheduler was able to pro-
duce schedules that were <3% away from the predicted quality at the high range (99%). In practical
terms, this is quite useful and the scheduler is ready for experimental incorporation into a cluster system.
Table 12.5 provides the average quality measures compared with the predicted quality at each population
size calculated. Figure 12.2 illustrates the disappointing performance.
TABLE 12.5 Average Quality Measures Compared with the Predicted Quality (exponentially distributed data)

Deme size   Actual   Predicted
 7          0.436    0.80
10          0.541    0.85
14          0.668    0.90
24          0.842    0.95
50          0.962    0.99
FIGURE 12.2 Actual versus predicted nearness to optimal as a function of deme size for exponentially distributed data.
A great deal of analytical work is needed before an “all purpose” parameter sizing methodology can be
found for P-GOAs, but it is encouraging that a “special purpose” parameter sizing methodology has been
proven for at least one complex optimization problem. The only data specific variables needed by the
P-GOA scheduler are the means of the task execution times and communication times. For the type of
ongoing computational science application that this scheduler was designed to work with, this information
is either available or easily estimated.
Acknowledgments
Research Assistants John Picarazzi, Shelia Poorman, Brian McCord, Tzintzuni Garcia, William Jackson,
Simon San Miguel, Jason Picarazzi, and Lucas Wilson have contributed or are currently contributing to
this project. This work is supported by NASA Grant NAG9-1401 and NSF Grant NSF 01-171.
References
[1] Cantú-Paz, E. Efficient and Accurate Parallel Genetic Algorithms, Kluwer Academic Publishers,
Dordrecht, 2000.
[2] Goldberg, D., Deb, K., and Clark, J. Genetic algorithms, noise, and the sizing of populations.
Complex Systems, 6, 1992, 332–362.
[3] Moore, M. Parallel genetic algorithms to find near optimal schedules for tasks on multiprocessor
architectures. In Proceedings of Communicating Process Architectures, Bristol, UK, September 2001,
pp. 27–36.
[4] Moore, M. An accurate and efficient parallel genetic algorithm to schedule tasks on a cluster. In
Proceedings of the International Parallel and Distributed Processing Symposium, Nature Inspired
Distributed Computing Workshop, Nice, France, April 2003.
[5] Moore, M. Accurate calculation of deme sizes for a parallel genetic scheduling algorithm.
In Proceedings of Communicating Process Architectures, Enschede, NL, September 2003,
pp. 305–314.
[6] Moore, M. An accurate parallel genetic algorithm to schedule tasks on a cluster. Parallel Computing,
30(5–6), 2004, 567–583.
[7] Goldberg, D. Sizing populations for serial and parallel genetic algorithms. In Proceedings of the
Third International Conference on Genetic Algorithms, Fairfax, VA, USA, June 1989, pp. 70–79.
[8] Reeves, C. and Rowe, J., Genetic Algorithms: Principles and Perspectives, Kluwer Academic
Publishers Group, Boston, MA, 2003.
[9] De Jong, K. and Spears, W. Using genetic algorithms to solve NP-complete problems. In Pro-
ceedings of the Third International Conference on Genetic Algorithms, Fairfax, VA, USA, June 1989,
pp. 124–132.
[10] Hou, E., Hong, R., and Ansari, N. Efficient multiprocessor scheduling based on genetic algorithms.
In Proceedings of the 16th Annual Conference of the IEEE Industrial Electronics Society, Asilomar,
CA, USA, November 1990, pp. 1239–1243.
[11] Coffman, E. Introduction to Deterministic Scheduling Theory, Computer and Job-Shop Scheduling
Theory. John Wiley & Sons, New York, 1976.
[12] Horowitz, E. and Sahni, S. Exact and approximate algorithms for scheduling non-identical
processors. Journal of the ACM, 23, 1976, 317–327.
[13] Holland, J. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor,
MI, 1975.
[14] Kidwell (Moore), M. Using genetic algorithms to schedule distributed tasks on a bus-based system.
In Proceedings of the Fifth International Conference on Genetic Algorithms, Urbana-Champaign, IL,
USA, July 1993, pp. 368–374.
[15] Eshelman, L., Caruana, R., and Schaffer, J. Biases in the crossover landscape. In Proceedings of the
Third International Conference on Genetic Algorithms, Fairfax, VA, USA, June 1989, pp. 10–19.
[16] Syswerda, G. Uniform crossover in genetic algorithms. In Proceedings of the Third International
Conference on Genetic Algorithms, 1989, pp. 2–9.
[17] Goldberg, D. Genetic Algorithms Search, Optimization, and Machine Learning. Addison-Wesley,
Reading, MA, 1989.
[18] Mitchell, M. An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996.
[19] Salleh, S. and Zomaya, A. Scheduling in Parallel Computing Systems. Kluwer Academic Publishers
Group, Dordrecht, The Netherlands, 2000.
13.1 Introduction
The proliferation of the Internet and the availability of powerful computers and high-speed networks as
low-cost commodity components are changing the way computing is done today. Interest in coupling
geographically distributed computational resources to solve large-scale problems is also growing,
leading to what is popularly called Grid Computing. Grids enable the sharing, selection, and aggregation
of suitable computational and data resources for solving large-scale data intensive problems in science,
engineering, and commerce [1–3].
An important issue for Grid and other Heterogeneous Computing environments is how to assign
tasks to resources and order execution of the tasks to maximize some performance criterion of the Grid
environment. These procedures are termed matching and scheduling, and taken together, are known as
mapping. There are two different types of mapping: static and dynamic. Static mapping is performed
when the applications are mapped in an offline planning phase, for example, planning the schedule for a
set of production jobs. Dynamic mapping is performed when the applications are mapped in an online
fashion, for example, when tasks arrive at unknown intervals and are mapped as they arrive (the workload
is not known a priori) [4,5]. In both cases, this generalized mapping problem has been shown to be
NP-hard (e.g., in References 6 to 8).
The goal of this study has been to investigate classes of scheduling algorithms for service-based Grid
Environments, that is, where a known request is served to a user or users. Such an environment may
involve request scheduling, which is a task scheduling architecture that uses a service request as the minimal
scheduling unit. A service request is considered to be finer-grained than a job request. A single server would
be able to handle multiple requests [9].
This chapter examines some variations of conventional schedulers for dynamic mapping of depen-
dent tasks. Section 13.2 describes further background material. Section 13.3 discusses related work
in the literature. Section 13.4 describes enhancements made to a grid simulation toolkit and intro-
duces a genetic mapping heuristic. Section 13.5 gives the results from the study. Section 13.6
describes some further work that might be considered, and Section 13.7 draws some conclusions from
the work.
13.2 Background
13.2.1 List Scheduling Heuristics
As described in Section 13.1, this chapter focuses on the dynamic scheduling of tasks on a service-oriented
Grid, a network of heterogeneous machines. The service to be scheduled may be represented by a task
graph or directed acyclic graph (DAG). The DAG specifies the tasks that make up the service, as well as the
dependencies between tasks [10].
The scheduling of DAGs may be done using a number of broad strategies, including Clustering
Algorithms [11] and generational algorithms. Clustering Algorithms are not considered in this chapter;
they involve mapping subsets of tasks that have large inter-task communications to a set of processors with high bandwidth and
low latency (e.g., see Reference 12).
An example of a generational algorithm is a List Scheduler. A List Scheduler performs mappings based
upon a subset of the tasks in the DAG. It only attempts to schedule those tasks that have had all dependency
relationships fulfilled. This subset of tasks is called a meta-task and consists of tasks that are independent
with respect to each other. This greatly simplifies the mapping process and allows it to be done dynamically.
An auxiliary algorithm then maps this meta-task to available resources. When a rescheduling event (such
as a task finishing) occurs, a new meta-task is constructed and scheduled. The expected execution time of
each task on every resource is considered to be known a priori. This assumption is typically made when
conducting mapping research [11,13].
The mapping of tasks to resources, whether static or dynamic, is an NP-hard problem. Finding an
optimal solution is intractable. Therefore, a heuristic is normally used to choose a mapping.
• OLB (Opportunistic Load Balancing): OLB assigns each task, in arbitrary order, to the next available
machine, regardless of the task’s expected execution time on that machine.
• UDA (User Defined Assignment): UDA assigns each task, in arbitrary order, to the machine with the
best expected execution time for that task, regardless of that machine’s availability.
• Fast Greedy: Each task is assigned to the resource with the minimum completion time. The tasks
are assigned in an arbitrary order.
• Min–Min: The Min–Min heuristic begins with the set U of all unmapped tasks. For each task in
U , the minimum completion time on all machines is calculated. The task with the earliest overall
minimum completion time is selected and assigned to the machine that yielded that minimum
completion time. This task is removed from U , and the process reiterated until all tasks have been
mapped. Min–Min attempts to map as many tasks as possible to their first choice of machine,
under the assumption that this will result in a shorter makespan. Min–Min has been reported as
superior to many other simple mapping heuristics [14]; a sketch of it appears after this list.
• Max–Min: Max–Min is similar to Min–Min except that the task with the latest minimum comple-
tion time is selected and mapped. Max–Min attempts to minimize the penalties incurred by the
scheduling of long-running tasks.
13.2.2 GridSim
GridSim [3,16,17] (http://www.gridbus.org/gridsim/) is a toolkit that supports the modeling and simula-
tion of parallel and distributed computing environments. Entities such as users, applications, resources,
and schedulers may be incorporated, primarily to aid in the design and evaluation of scheduling
algorithms. The features of GridSim are described in Reference 17.
GridSim embodies a layered and modular architecture [17] to make use of existing infrastructure such
as the open-source discrete-event simulator SimJava [18], which itself runs in a Java virtual machine. The
layered structure is as follows:
• The gridbroker package provides high-level support for Schedulers or Grid Resource Brokers.
Some of the main classes are:
• Broker, which encapsulates a Scheduler — The Broker incorporates a variant of the Fast
Greedy heuristic within its default time-optimized job scheduler.
• BrokerResource, which embodies a Resource, as known to the Broker.
• UserEntity, which represents a user as known to the Broker.
• Experiment, which manages the work done by the Broker for the UserEntity. It includes
a GridletList object that represents the tasks to be scheduled.
• The GridSim package provides the basic grid infrastructure used by gridbroker. It includes
the following primary classes:
• GridSim. This class represents a grid entity and gives it communication and simulation support.
• Gridlet. A class that represents a single independent task. It includes attributes such as
execution size and size of input and output files. As it models an independent task, there is no
support for multiple input and output file sizes.
• GridletList. A class that encapsulates a set of independent tasks. It simply extends the
java utility class LinkedList, while adding support for sorting component Gridlet objects by
execution size.
• GridResource. A class that embodies a grid computing resource. Its attributes are specified
in the member class ResourceCharacteristics.
• ResourceCharacteristics. A class that embodies the attributes of a grid computing
resource. It is used to specify characteristics such as CPU speed, baud rate, availability, and cost.
• The gridscheduler and gridscheduler.ga packages, added for this study, which contain the
new scheduler and GA classes described below.
Only one class in the gridsim and gridbroker packages was directly amended. All the required
functionality was generally accommodated by writing new subclasses of the existing gridsim or
gridbroker classes.
The completion-time estimate uses a max( ) function, reflecting the assumption that communications
transmission may proceed in parallel with waiting for the processor to become available.
If the task in question has no predecessors, or is colocated with all such predecessors, then the commu-
nication delay is considered to be only a nominal amount — reflecting only an initiating communication
from the scheduler. A corresponding method isCoLocated( ) was created to test whether the task is
colocated. This method is also used by the TaskBroker class (see Section 13.4.2) when tasks are forwarded
to resources; if a task is colocated, then the input file size of the Tasklet is also set to a nominal value.
A simple and popular model of the communication delay for message passing is:
delay = latency + (message size / bandwidth).
This model includes a term for the latency of the communication link. However, a bandwidth-only
model has been found to lead to better predictions of communication delay, at least in some applications
[22]. It was therefore considered not essential to add a network latency component to the GridSim
communication delay methods.
• Evolution. Encapsulates the scheduler GA itself. Contains the method evolve( ), which
performs the complete evolutionary process and returns the solution as a Chromosome.
• Population. The set of Chromosomes that encapsulate the mappings currently under
consideration.
• Chromosome. Specifies the processor mappings for all tasks currently ready for execution.
A SingleGene object specifies each task mapping. The Chromosome contains a collection
of these SingleGene objects.
• SingleGene. A container for a gene within a Chromosome. Each gene specifies the mapping
of a task to a processor [resource] and the priority of the mapping to the resource. The priority is
encoded as an integer value. If another SingleGene in the Chromosome specifies that its task
is to be mapped to the same resource with a higher priority, then the other task will be scheduled
before this one.
• Evaluation. A container for the result of an evolutionary generation, that is, whether the
evolution has completed; why it has completed; how many generations the result took; and the
elite Chromosome itself. Completion may have occurred for one of three criteria described in
Section 13.3.1.
• Mating. Encapsulates the genetic pairing of two Chromosomes, which may share genetic
information to produce two [genetically related] offspring.
• GaRandom. A simple class with static methods to return random numbers efficiently.
13.5 Results
13.5.1 Methodology
The GridSim distribution, including its example source files, was used as the starting point for this project.
A number of incremental changes were then made in order to provide needed support (see Section 13.4).
With the implementation of the GA mapping heuristic, a large number of tests were carried out using
a task graph generated by TGFF (Task Graphs For Free) [20]. These tests indicated that the GA heuristic gave a nearly 8% improvement
over a Min–Min heuristic for the task graph in question. Interestingly, the GA heuristic displayed a large
degree of variability in its performance. In a typical set of 100 simulations for the task graph, the GA
heuristic gave an improvement of 7.8%, but with a standard deviation of 5%. However, the vast majority
of results were an improvement over the control Min–Min result.
An identical heterogeneous processor arrangement was used for this and all subsequent tests. The
configuration file for this is given in Appendix A.
In recognition that this result was for only a single task graph, a script was designed to generate
multiple variants of similar task graphs and compare the performance of a trial heuristic against the
Min–Min heuristic for all of these task graphs.
Fortunately, TGFF includes excellent support for just this scenario, as it accepts a random number
generator seed parameter. This seed affects all the randomized aspects of the generated task graph.
Varying the seed while holding all other parameters constant generates task graph families containing an
arbitrary number of task graphs. Given an identical seed, an identical task graph is produced [20].
The script invokes TGFF with an incremented seed, starting from a value of zero (the default value),
generating a family of n task graphs. For each generated task graph, first the Min-Min heuristic is run
and then the trial heuristic is run m times. Therefore, each test involved running the trial heuristic n × m
times.
These task graph family tests revealed that the previously recorded 8% improvement had merely been
due to a fortunate choice of task graph. The results of other task graphs in the same family led to an overall
ambivalent result. Further tests were therefore conducted in an attempt to find circumstances in which
the GA might perform with more consistency.
This further testing at first indicated that the GA heuristic does give a significant improvement for
a family of task graphs. However, this improvement vanished when further members of the task graph
family were evaluated.
• MAX_FINISH (a.k.a. MAKESPAN). This was the objective function used in Reference 1. It is
ideal for a static mapping scenario; but in a generational scheduler it does not allow the GA to
discriminate between solutions on the basis of the completion times of all tasks, because it sees only
those tasks on the processor that completes last (the critical path of the mapped meta-task).
• SUM_TASK_FINISH. This function seeks to achieve a better result for all tasks in the meta-task,
not just those that execute on the critical path of the mapped meta-task. This function differentiates
between solutions with an identical makespan, but with poorer task completion times on other
resources. For example, if one solution mapped those tasks not contributing to the makespan on
a single processor, and another solution mapped such tasks to multiple processors — then this
function would discriminate between the two solutions. MAKESPAN gives the same score to both
solutions.
• SUM_PE_FINISH. This applies a variant of the logic of the previous heuristic, but is likely not to
be as discriminating.
• SUM_TASK_CRITICAL. This objective function hopes to introduce some non-locality into the
GA. In other objective functions, the position of the task in the task graph plays no part in how well
it might be mapped. This function seeks to preferentially map those tasks that are most critical to
a total reduction in the makespan of the task graph.
Each GA objective function was tested against 100 task graphs, with each task graph having 100 nodes,
and an execution size of 2 to 8 × 10^9 instructions and a communication size of 0.5 to 1.5 × 10^5 bytes.
Each task graph was a recursive series-parallel graph, with a series length of 1–3 and series width of 1–3.
Each task graph also had a local crossover of 20 (see Reference 20 for a full description of parameters).
Appendix B depicts the seed task graph (0) resulting from these parameters.
Other broker configuration parameters were:
<broker_properties>
<heuristic>min_min</heuristic>
<minmin_limit>1</minmin_limit>
<minmin_seed>true</minmin_seed>
<population_size>100</population_size>
<max_generations>500</max_generations>
<max_winning_run>50</max_winning_run>
<crossover_rate>1.0</crossover_rate>
<mutation_rate>0.01</mutation_rate>
<fertility_rate>2.0</fertility_rate>
<random_injection>0.0</random_injection>
<elitism>true</elitism>
<max_task_per_PE>1</max_task_per_PE>
</broker_properties>
Each test consisted of a family of 10 task graphs, with 20 scheduling runs per task graph, giving
200 runs per objective function and 800 scheduling runs in total across the four functions.
The results indicate that, for this task graph family subset, the GA mapping heuristic gives an average
6% improvement — when the SUM_TASK_CRITICAL objective function is used (Table 13.1). The
SUM_TASK_FINISH function performs nearly as well, while the MAKESPAN function gives less than
half of that improvement. This result is in line with the expectations of the objective functions. The
SUM_TASK_FINISH function, with its contribution of all task finish times, gives a greatly improved
result over the MAKESPAN. The SUM_TASK_CRITICAL function, with its non-localized contribution,
gives a further improved result.
Note that this improvement, relative to Min–Min, vanished when further members of the task graph
family were examined. However, the relative results of the different objective functions still hold.
minmin_limit                        1        2        3        4        5
Proportional increase to minmin   −0.017    0.012    0.006    0.019    0.005
Improvement versus unsorted fg:   0.073, 0.035, 0.051, 0.043, 0.036, 0.029, 0.040, 0.034, 0.033
This chapter assumed a latency of zero, which was considered acceptable, given the effectiveness of a
bandwidth-only model [22] and GridSim’s lack of support for network latency. However, incorporation
of non-zero latencies might provide for a more accurate model of a grid environment. This would require
some refactoring of several GridSim classes, but particularly IO_Data, Input, and Output. It is expected
that this work would have little bearing upon the GA mapping heuristic.
In addition, only a single grid configuration was used in this study; randomized grid environments should also be
evaluated. The TGFF table-generating functionality can be used to generate such randomized grid
environments.
13.7 Conclusions
This chapter has described several variants of an existing dynamic generational mapping heuristic, the
Fast Greedy heuristic. It has been shown that simply sorting the meta-task in order of execution size may
make substantial improvements to the Fast Greedy heuristic. Other orderings, such as fanout size and
critical-path length, appear not to be as advantageous.
This chapter has also described a new dynamic generational mapping heuristic, the GA mapping
heuristic. It has been shown that the total makespan of the heuristic is roughly comparable to that of the
Min–Min heuristic, with an added variance. The variance is introduced by the non-deterministic nature of
the GA (the execution time of the scheduler itself will, of course, be substantially greater). It is suspected
that this result might be enhanced with an improvement in the objective function of the GA. Most
evaluated objective functions only make use of the current meta-task, and have no broader knowledge
of the task graph. The exception to this was the SUM_TASK_CRITICAL function, which attempted to
factor the critical path of a task into the fitness value. This function appeared to be the best performing (along
with the SUM_TASK_FINISH function). One difficulty in improving the GA is that it may only use the
fitness value to choose between solutions. It may not use any other contextual information, and the fitness
is only a simple scalar value.
It is also suggested that performance of the GA heuristic would be relatively greater in a multiuser or
inconsistent processor environment. In such an environment, the local nature of the GA should be less of
a constraint on performance.
References
[1] Braun, T., Siegel, H., et al. A Comparison study of static mapping heuristics for a class of meta-tasks
on heterogeneous computing systems. In 8th IEEE Heterogeneous Computing Workshop (HCW ’99),
1999.
[2] Foster, I. and Kesselman, C., Ed. The Grid: Blueprint for a New Computing Infrastructure. Morgan
Kaufmann, San Francisco, CA, 1998.
[3] Murshed, M. and Buyya, R. Using the GridSim Toolkit for Enabling Grid Computing Education,
http://citeseer.nj.nec.com/574558.html.
[4] Maheswaran, M., Ali, S., et al. Dynamic matching and scheduling of a class of independent
tasks onto heterogeneous computing systems. In 8th IEEE Heterogeneous Computing Workshop
(HCW ’99), April 1999, pp. 30–44.
[5] Kim, J., Shivle, S., et al. Dynamic mapping in a heterogeneous environment with tasks having
priorities and multiple deadlines. In International Parallel and Distributed Processing Symposium
(IPDPS’03), April 2003.
[6] Coffman, E., Jr., Ed. Computer and Job-shop Scheduling Theory. John Wiley & Sons, New York,
1976.
</resource>
<resource name="resource1">
<baud>80</baud>
<machine>
<pe mips="250"/>
</machine>
</resource>
<resource name="resource2">
<baud>150</baud>
<machine>
<pe mips="68"/>
</machine>
</resource>
<resource name="resource3">
<baud>80</baud>
<machine>
<pe mips="500"/>
</machine>
</resource>
<resource name="resource4">
<baud>100</baud>
<machine>
<pe mips="120"/>
</machine>
</resource>
</grid>
This configuration specifies that the available grid environment consists of five machines, each having a
bandwidth ranging from 80 to 150 KB/sec. All are space-sharing architectures, rather than time-sharing,
and have a nominal cost. Each of the machines has a single processor element, with speed ratings of
68 to 500 MIPS.
tg_cnt 1
gen_series_parallel true
series_must_rejoin true
series_len 2 1
series_wid 2 1
series_local_xover 20
task_cnt 100 1
task_degree 2 4
tg_write
eps_write
table_label PROCESS_SIZE
table_cnt 1
[Appendix B figure: the seed task graph (TEST GRAPH 0) generated by TGFF, a 100-node series-parallel DAG with period d = 3000 and in/out degree limits of 2/4. The node-level drawing is omitted here.]
14.1 Introduction
Most of today’s optical networks are hierarchies of SONET rings. This optical fiber communication is
employed with WDM technology, where the whole bandwidth of an optical fiber is divided among a
number of nonoverlapping wavelengths, each of which is capable of carrying high-speed optical data. In
recent years, the bandwidth of a wavelength has increased from 2.5 Gbps (OC-48) to 10 Gbps (OC-192)
and is likely to go up to 40 Gbps (OC-768) in the near future [1,2]. Thus, the bandwidth capacity on
a wavelength is too large for certain traffic requirements. An approach to provide fractional wavelength
capacity is to split a wavelength into multiple time slots and multiplex traffic on the wavelength. Therefore,
each wavelength running at the line rate of OC-N can carry several low-speed OC-M (M < N ) traffic
channels in TDM fashion. For example, an OC-48 line can carry 16 OC-3 channels. The resulting networks
are called WDM–TDM networks or WDM traffic grooming networks. The ratio of N to the smallest value
of M is called the grooming ratio [1–5].
Using WDM technology, multiple rings can be supported on a single fiber ring. In this architecture,
each wavelength independently carries a SONET ring. Each SONET ring can further support multiple
low-speed streams. At every node, a Wavelength Add/Drop Multiplexer (WADM) adds and drops or
bypasses traffic on any wavelength. At each node, there are SONET add/drop multiplexers (SADM) on
each wavelength to add/drop low-speed streams. So the number of SADMs per node will increase linearly
with the number of wavelengths that a single fiber ring can carry. The cost of SADMs dominates the total
cost of the optical network. But in fact, it is not necessary for each node to be equipped with SADMs on
each wavelength. An SADM on a wavelength at a node is needed only if there is traffic terminating at this
node on this wavelength. So, the problem is to combine different low-speed traffic streams into high-speed
traffic streams in such a way that the number of SADMs is minimized [3,4]. This problem is proven to
be NP-complete [3]. To the best of our knowledge, we are the first to propose a genetic algorithm (GA)
solution to this problem for unidirectional ring topologies. Here, we have restricted ourselves to
static traffic patterns only. We have proposed an ILP formulation for the problem and solved it using a
GA. We have shown that our algorithm produces better results than those found in some recent
literature.
The rest of the chapter is organized as follows: Section 14.2 presents an example of the SADM
minimization problem. Section 14.3 proposes the ILP formulation of the problem. A brief introduction
to GA is presented in Section 14.4. Section 14.5 gives an overview of the proposed GA to solve
the problem. The results obtained are discussed in Section 14.6. Finally, Section 14.7 concludes the
chapter.
The traffic matrix for the example is

        0 1 1 1
T =     0 0 1 1
        0 0 0 1
        0 0 0 0

[Figure: two alternative assignments of this example traffic to wavelengths l1 and l2 on a four-node ring, requiring different numbers of SADMs.]
We know that an SADM for wavelength w is needed at node n only if node n is the source or destination
of any traffic stream using wavelength w. Hence, we can compute SADMnw for each (n, w) pair in the
following way:

    SADMnw = 1 if [si = n or di = n] and li = w, and SADMnw = 0 otherwise,

where (si, di) ∈ R, 0 ≤ li ≤ L − 1, 0 ≤ i ≤ |R| − 1. Therefore, the total number of SADMs in the ring is
given by Σ(n=0..N−1) Σ(w=0..L−1) SADMnw, and the objective is to minimize this quantity.
There are certain constraints that we have to consider. First, 0 ≤ SADMnw ≤ 1, and SADMnw is an
integer variable. The other constraint is that no wavelength at any link should be overloaded, that is, the
maximum number of traffic streams that pass through it should be less than or equal to the grooming
ratio g .
The link from node n to ((n + 1) mod N ) is traversed by all low-speed traffic streams (si , di ) ∈ R,
where (si ≤ n < di ), or (n < di < si ), or (di < si ≤ n). Therefore, the load on the wavelength w on link
n → ((n + 1) mod N ) is given by the cardinality of the set
    LDnw = { (si, di) | [(0 ≤ si ≤ n < di ≤ N − 1) or (0 ≤ n < di < si ≤ N − 1)
                         or (0 ≤ di < si ≤ n ≤ N − 1)] and [li = w],
             (si, di) ∈ R, 0 ≤ li ≤ L − 1, 0 ≤ i ≤ |R| − 1 }.

The ILP can thus be stated as: minimize Σ(n=0..N−1) Σ(w=0..L−1) SADMnw, subject to
SADMnw ∈ {0, 1} for every (n, w) pair and |LDnw| ≤ g for every link n and wavelength w.
Example 14.1
Input set: R = {(2, 3), (6, 2), (1, 8), (5, 6), (2, 4), (3, 6)}
Solution representation: W = { 3 8 3 2 2 1 }, where the ith gene is the wavelength li assigned to the ith request in R.
14.5.4 Crossover
The crossover operation is used to exchange genetic materials between two parents. Here we use single-
point crossover, which produces two children from two parents. A random crossover point is generated
between 0 and the maximum chromosome length of the solution. This point divides each solution into
two parts. The corresponding parts of the parent chromosomes are swapped to produce two new offspring.
Example 14.2
Parent1: 1 3 5 2 3 6 7 4 1
Parent2: 3 4 5 2 2 5 7 1 5
Crossover point: 5
Offspring1: 1 3 5 2 3 | 5 7 1 5
Offspring2: 3 4 5 2 2 | 6 7 4 1
14.5.5 Mutation
A random mutation point between 0 and maximum chromosome length is selected, and another
wavelength is assigned instead of the current wavelength at that point with some probability called the
mutation probability.
Example 14.3
Chromosome: 1 3 5 2 3 6 7 4 1
Mutation point: 4
New assignment to 4th wavelength (2): 8
New chromosome: 1 3 5 8 3 6 7 4 1
Procedure GA
Begin
Generate initial population as per Section 14.5.2
Initialize number_of_generations = 0
While (number_of_generations < max_generation) do
Select the parents for crossover as per Section 14.5.3
Apply crossover and mutation as per Sections 14.5.4 and 14.5.5
Evaluate the offspring and replace the least-fit chromosomes
Increment number_of_generations
End while
Return the best chromosome found
End
[Figure 14.6: Number of SADMs versus number of nodes (8 to 16) for Modiano's algorithm, GA, GA*, RLS, and the lower bound, with grooming ratio g = 4; the vertical axis runs from 20 to 140 SADMs.]
In Reference 3, a heuristic (Modiano's algorithm) was presented for this
problem, whereas in Reference 4, a Reactive Local Search (RLS) algorithm has been presented. We have
tested the performances of the algorithms for all-to-all unitary traffic, that is, the traffic requirement is exactly
one for each ordered pair of nodes. Hence the input set of traffic requirements R can be calculated as:
Example 14.5
For s = 0, 1, 2, 3, . . . , N − 1
For d = 0, 1, 2, 3, . . . , N − 1
If s ≠ d then R = R ∪ {(s, d)}
End for
End for
For example, for a SONET ring with three nodes, the input set of connection requirements will be:
R = {(1, 2), (2, 1), (1, 0), (0, 1), (2, 0), (0, 2)}.
We have run our algorithm up to 10,000 generations to get the results, which takes close to 1 min on a
1.7 GHz Pentium IV computer with 128 MB RAM running Windows Me operating system. The execution
time does not depend on the number of nodes of the ring.
Figure 14.6 is a comparison of Modiano’s algorithm, the RLS algorithm, our GA, and GA∗ for all-
to-all unitary traffic in unidirectional rings. The graphs represent the number of SADMs required for a
certain number of nodes. The number of nodes varies from 8 to 16, as most SONET rings contain
at most 16 nodes. We have shown the graphs for grooming ratio g = 4. We have also given the
graphs for the optimal number of SADMs required. From the graph it is clear that our algorithm performs
better than Modiano’s algorithm in all the cases. GA∗ and RLS give the optimal result for g = 4.
Figure 14.7 shows the graphs comparing the four algorithms for grooming ratio g = 16. Here we can
see that the improved GA, that is, GA∗ performs better than the other algorithms and its performance is
near optimal.
The effectiveness of the proposed improved GA, that is, GA∗, is seen more clearly when we compare the
values for the number of SADMs obtained using the algorithm GA∗ with the lower bound. In Table 14.3,
we have shown the optimum solution and the solution obtained by algorithm GA∗ for different numbers
of nodes and for grooming ratio 4 and 16. It is clear that for grooming ratio 4, GA∗ gives the optimum
result. For larger grooming ratios also, GA∗ provides near-optimal results for a small number of nodes.
Undoubtedly, GA∗ gives the best result among all the algorithms for the case of all-to-all unitary traffic.
[Figure 14.7: Number of SADMs versus number of nodes (8 to 16) for Modiano's algorithm, GA, GA*, RLS, and the lower bound, with grooming ratio g = 16; the vertical axis runs from 10 to 80 SADMs.]
14.8 Conclusions
In this chapter, we have presented a GA for the traffic grooming problem in unidirectional SONET/WDM
ring networks to minimize the number of SADMs required. We have shown that our algorithm performs
better than other heuristic and reactive search techniques found in recent literature. The GA also takes
lesser time to reach the optimal result; this time is independent of the number of nodes as the number of
generations has been fixed at 10,000. In most of the cases, the optimum value is found with a much lesser
number of generations.
References
[1] A. Mukhopadhyay, J.K. Singh, U. Biswas, and M.K. Naskar, Distributed approaches for dynamic
traffic grooming in WDM optical networks. In Proceedings of International Conference on CODEC-
04, Kolkata, India, January, 2004, p. P-55.
[2] A. Mukhopadhyay, J.K. Singh, U. Biswas, and M.K. Naskar, Improved distributed approaches for
dynamic traffic grooming in WDM optical networks. In Proceedings of Conference on DPN-04, IIT
Kharagpur, India, January, 2004, pp. 92–96.
[3] A.L. Chiu and E.H. Modiano, Traffic grooming algorithms for reducing electronic multiplexing
costs in WDM ring networks. IEEE Journal of Lightwave Technology, 18, 2–12, 2000.
[4] R. Battiti and M. Brunato, Reactive search for traffic grooming in WDM networks. In S. Palazzo,
Ed., Proceedings of IWDC2001, Lecture Notes in Computer Science, Springer-Verlag, Heidelberg,
2001.
[5] E.H. Modiano and P.J. Lin, Traffic grooming in WDM networks. IEEE Communications Magazine,
39, 124–129, 2001.
[6] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley,
New York, 1989.
[7] M. Mitchell, An Introduction to Genetic Algorithms. MIT Press, Cambridge, MA, 1996.
[8] P. Mazumder and E.M. Rudnick, Genetic Algorithms for VLSI design, Layout & Test Automation.
Prentice Hall, New York, 1999.
[9] S. Roy, S. Bandyopadhyay, U. Maulik, and B.K. Sikdar, A genetic algorithm based state assignment
scheme for synthesis of testable FSMs targeting area and delay. International Journal of Engineering
Intelligent Systems, 10, 45–52, 2002.
[10] U. Maulik and S. Bandyopadhyay, Fuzzy partitioning using real coded variable length genetic
algorithm for pixel classification. IEEE Transactions on Geosciences and Remote Sensing, 41,
1075–1081, 2003.
[11] U. Maulik and S. Bandyopadhyay, Genetic algorithm based clustering technique. Pattern
Recognition, 33, 1455–1465, 2000.
[12] S. Bandyopadhyay and U. Maulik, Non-parametric genetic clustering: Comparison of validity
indices. IEEE Transactions on Systems, Man, and Cybernetics Part-C, 31, 120–125, 2001.
[13] G.B. Fogel and D.W. Corne, Evolutionary Computation in Bioinformatics. Morgan Kaufmann, San
Francisco, CA, 2003.
[14] S. Bandyopadhyay, An efficient technique for superfamily classification of amino acid sequences:
Feature extraction, fuzzy clustering and prototype selection. Fuzzy Sets and Systems, 152, 5–16,
2005.
[15] K. Dev and C. Siva Ram Murthy, A genetic algorithm for the knowledge base partitioning problem.
Pattern Recognition Letters, 16, 873–879, 1995.
15.1 Introduction
The surging demand for mobile communications and the emerging field of ubiquitous computing require
enormous development in the realm of wireless networking. Although many problems in mobile wireless
networking can be successfully solved using techniques borrowed from wireline networks, there exist
some problems that are very specific to the wireless domain and often computationally very difficult or
sometimes even intractable. This is primarily due to the inherent limitations of wireless communications,
such as scarce bandwidth, high bit error rate, or location uncertainty. Most of these problems can be
mapped to classical optimization problems, with the goal of optimizing some objective functions, say
resource utilization, while satisfying all the constraints imposed by the wireless communication systems.
The constraints make most of the problems NP-complete or NP-hard [1]. There are two broad approaches
to solve such problems. One is to directly compute the exact solution based on the constraints, using a brute
force technique. However, this approach is often infeasible in a large-scale problem domain. The other
approach is to use a heuristic-based solution that can be computed in feasible time. Although a heuristic
may not yield the optimal solution, a carefully designed heuristic may produce a near-optimal solution
with low computational complexity. Various attempts have been made to develop efficient heuristics that
range from calculus-based search methods to random search techniques. The paradigm of “evolutionary
computing or programming” essentially provides such heuristics for solving computationally difficult
problems, by mimicking the evolution process that nature uses to eliminate weak forms of life.
An important example of evolutionary computing is genetic algorithm (GA), which is a guided, random
search technique. In GA, the less fit (or incompetent) solutions are replaced by better solutions produced
by applying some genetic operators on the existing ones in the solution space. This evolutionary learning
process progressively refines the search space and makes the problem computationally manageable, thus
enabling the search procedure to converge quickly and resulting in a near-optimal solution in feasible time.
GA has been successfully applied to solve several optimization problems in the wireless domain [2–4]. The
common objective of such problems is the optimal usage of scarce and hence costly wireless resources,
such as bandwidth. For example, in References 3 and 5, GA has been used to find an optimal location
management strategy for cellular networks. The goal here is to determine an optimal policy for location
updates of a mobile host, which ensures minimal paging and signaling overhead, thereby conserving the
costly wireless bandwidth. Whereas this work considers only a single optimization criterion (the location
management cost), a multi-objective GA-based location management framework has been proposed in
Reference 6 that considers other optimization factors, such as the load balancing between various mobility
routers in a Universal Mobile Telecommunication Systems (UMTS) [7] network. Channel assignment is
another challenging problem in cellular networks. The entire spectrum of wireless bandwidth is divided
into several channels and assigned to the cells, with the constraint that no two adjacent cells are assigned the
same channel, which ensures that there is no co-channel interference. A single cell may have more than one
channel. The channel assignment problem can be shown to be equivalent to the graph coloring problem,
and hence NP-hard [1]. GA-based channel assignment algorithms [2] provide a scalable mechanism to
determine the least number of channels required for a particular cellular network, while satisfying the
constraint related to the co-channel interference problem in adjacent cells. Due to the limited resources
available in wireless networks, the admission of new calls must be monitored to ensure a minimum quality
of service (QoS) for the already admitted calls. This is done by call admission control algorithms, which
essentially select, from a large number of possible admission policies, the optimal one with overall service
quality improvement and minimum call blocking rate. However, the number of possible policies can be
very large such that finding the optimal policy can be computationally intractable. Again, GA provides
a heuristic-based approach [4] to yield a near-optimal call admission policy that ensures QoS for the
admitted calls while minimizing the call blocking rate. GA has also been used in solving QoS routing
problems in networks. It can be shown that unicast routing with two or more QoS constraints can be
mapped to an NP-hard problem. Naturally, the problem is also NP-hard for multicast routing. The imprecise
network state information in dynamic wireless networks further complicates the problem. In Reference 8,
GA has been used to produce near-optimal solutions in computationally feasible time for the multicast
routing problem in networks with imprecise state information, while satisfying more than one QoS
constraint.
The goal of this chapter is to present a review of GA-based solutions of the above problems. The rest of
this chapter is organized as follows. Section 15.2 reviews the basic concept of GAs and their interpretation.
The location management problem is discussed in Section 15.3. Section 15.4 presents a GA-based solution
for optimal channel allocation in cellular networks. Efficient call admission control in cellular networks
is described in Section 15.5, while the design and performance analysis of an efficient multicast routing
protocol is described in Section 15.6. Finally, Section 15.7 concludes the chapter.
Genetic algorithms have been successfully applied to tackle optimization problems such as scheduling [9], adaptive control [10],
game playing [11], cognitive modeling [12], transportation problems [13], traveling salesman problems
[14], database query optimization [15], and so on. Although GAs belong to the class of probabilistic
algorithms, they differ from random search algorithms because they combine elements of directed and
stochastic search. Therefore, GAs are also more robust than purely directed search techniques [16].
As an evolutionary programming model, a GA for a particular problem must have the following five
properties:
• A genetic representation of the potential solution space in a symbolic domain, such as strings of
symbols, which form the so-called chromosomes.
• A methodology to create an initial population of potential solutions.
• An evaluation function that plays the role of the environment, rating solutions in terms of their
“fitness.”
• Genetic operators that alter the structure of chromosomes.
• Values for various parameters such as population size, probabilities of applying genetic operat-
ors, etc.
Figure 15.1 shows a pseudo-code depicting the generic framework of any GA. An initial random
population is created and each member of the population is evaluated using a fitness function. Three
major genetic operators, selection, crossover, and mutation, are then successively applied to the members
of the population in each generation until the stopping criterion is met or there is little change in the quality
of the best solution in the entire population. Selection is the operation that increases the number of good
solutions or chromosomes in a population; usually the best chromosome in each phase of evolution is
selected. The crossover operation generates new representative chromosomes combining the properties
of two or more chromosomes. Finally, mutation changes or mutates a portion of a chromosome to
introduce a new chromosome in the population. When there is little or no change in the quality (or
fitness) of the chromosomes in the population with successive application of GA operators, the best one
is selected to represent the final solution.
When a call arrives, the network polls a set of cells to determine where the mobile device is currently
located. This procedure is known as paging. By giving a low upper bound on the maximum number of
cells that can be polled, the paging process reduces the paging cost, but requires more frequent updates,
thus increasing the update cost. On the other hand, a reduction of number of updates essentially decreases
the update cost, but increases the location uncertainty and the subsequent paging cost. The essence of an
optimal location tracking thus lies in minimizing the combined update and paging costs associated with
a particular mobile device. The concept of using a suitable GA-based approach to minimize the average
per-user location management cost was first proposed in Reference 3.
A cellular network can be represented by a graph G(V, E), where a node represents a location area
(LA, a logical representation of one or more cells) and an edge represents an access path between a pair
of location areas. At this point, the most interesting question is whether a mobile device will issue an update
or not upon entering a new LA. Let δi represent an update decision variable for a user in LAi (1 ≤ i ≤ M ),
such that δi = 1 or 0 depending on whether or not an update has been issued. Now, if Cost p (i) and Costu
respectively denote the cost associated with paging LAi and issuing a single update, then the total location
management cost (LMC) is determined by taking a weighted average of the LMCs in the individual LAs.
This is mathematically represented by LMC = Σ(i=1..M) ω_i × LMi^(δi), where ω_i is the normalized weight
associated with the LMC, LMi, of LAi. The average LMCs for both cases (δi = 1 and δi = 0) depend on
the individual update cost, paging cost, call arrival rate, and the user's residence time in the LA. Assuming
Poisson call arrivals with rate λ and a geometrically distributed (with parameter pi) residence time, it has been
shown in Reference 3 that LMi^(1) = Costu + (λ/pi)Costp(i) and LMi^(0) ≈ Costp(i) + ((λ/pi) − 1)Costp(i).
It is now clear that an update strategy Us = [δi ] for the user constitutes a vector of decision variables,
having values 0 or 1 for all the LAs. The objective is to obtain optimal strategy Us∗ such that the LMC
is minimized. While enumerating all possible update strategies, the state space of the solution increases
exponentially with the number of LAs. Therefore, a GA is proposed in Reference 3 to obtain a (near-)
optimal solution for the location management problem.
As mentioned in Section 15.2, the first step in a GA-based approach is to map the state space into a
symbolic domain. The most obvious way is to represent each bit-string associated with a strategy Us by a
single chromosome (or genome). The length of every chromosome is equal to the number of LAs. A group
of strategies is chosen to form the initial population. For faster convergence, the relative proportion of
the number of 0s and 1s in the bit-string is chosen based on the call arrival rate and update cost. For
a relatively low call arrival rate and high update costs, it is wise to issue less frequent updates, thereby
resulting in more 0s than 1s in the chromosomes. Since, at every iteration, the GA inherently attempts
to increase the associated fitness values, the fitness function is chosen to be reciprocal of the total LMC,
that is, 1/LMC. The roulette wheel spinning selection [16] is used with elitism so that better chromosomes
will survive for the next iteration. After this selection, the crossover and mutation functions are executed
with probabilities 0.8 and 0.01 respectively. The fitness of the children are now evaluated and the entire
process is repeated.
Illustrative Example: At each iteration of the GA, the best chromosome, from the initialization phase till
that iteration cycle, is tracked. This gives the optimal (or near-optimal) solution at the termination of
the algorithm. The population size for each generation is kept constant at 50 and the number of bits
(i.e., the number of LAs) in the chromosome is chosen as 8. The cost function LMC is computed using the
steady-state transition probabilities between any two LAs. It was found that the GA converged very fast
to the near-optimal solution. The population size is kept constant at 20 and in all the cases the algorithm
converges to the optimal solution within 1000 generations. The best and average values of the fitness of
the chromosomes as well as the standard deviations are computed for each generation. A sample run of
the entire process with different generations having best and average fitness values is shown in Table 15.1,
which results in 11011 as the best chromosome having fitness 0.127 corresponding to LMC = 7.87 units.
Table 15.2, on the other hand, represents the optimal update strategies for different call arrival rates and
update/paging cost ratios.
Recently, the location management problem has also been investigated in Reference 5 using a combination of
cellular automata (CA) and GAs. The total LMC is estimated as r × Σ(i∈C) wmi + Σ(j=0..N−1) wcj × v(j), where r
is the update-to-paging cost ratio, C is the set of reporting cells, that is, the cells from
which at least one update is issued, wmi represents the frequency of movement into a cell i, wcj represents
the frequency of call arrival within a cell j, and v(j) represents the vicinity (neighborhood) of cell j. Note that
[Figure 15.3: CA neighborhood states; the cell itself takes states 0 or 1, while each of its neighbors takes states 0, 1, or 2.]
cellular automata is a decentralized, discrete space–time system and the state of every cell is governed by
its surrounding cells. Each cellular unit of cellular automata is associated with each cell in the network.
Each cell is represented either by “1” or “0” depending on whether or not it is a reporting cell. This leads
to two possible states for each cellular unit in the CA. Considering the hexagonal cells of the network,
the maximum neighborhood of every cell is taken as 6. For cells with less than 6 neighbors, dummy cells,
represented by “2” are added. Thus, as shown in Figure 15.3, each cell itself has 2 possible states (0 or 1) and
each of its 6 neighbors has 3 possible states (0, 1, or 2), implying a total number of 37 × 2 = 1458 possible
neighborhood states. Hence, the corresponding rule is of length 1458 bits and there can be a total of 21458
transition functions. Genetic algorithms are used to search for the best one from this exponential number
of transition rules. In Reference 5, an initial population of 1000 rules are created with a random value
for each rule. At every iteration, a new set of CA test data is generated. The set consists of four randomly
generated reporting-cell configurations. The fitness function is chosen as the sum of the LMCs of all these
four configurations. The selection strategy is used to get the minimum fitness values (minimum location
management costs). A two-point crossover [16] with probability 0.8 and a mutation with probability 0.01
are used to achieve better solutions at every iteration. A set of 80 rules (chromosomes) has been used for
elitism, while crossover and mutation have been applied to the remaining 920 rules.
on different 4 × 4 and 5 × 5 cellular networks result in a cost per call arrival between 12.25 and 16 within
200 generations.
A careful look at both of the above-mentioned location management strategies reveals that the only
objective of both schemes is to reduce the overall LMC. More recently, however, the problem of
location management has been combined with load-balancing constraints, thereby introducing the notion of
multi-objective optimization techniques. Genetic algorithms can be easily extended to solve such multi-
objective optimization problems. A multi-objective, hierarchical mobility management optimization for
UMTS networks has been proposed in Reference 6, which balances the load among various unique logical
areas in the network, such as the LA, and the mobility routers, such as the mobile switch centers (MSCs),
Radio Network Controller (RNC), and Serving GPRS Node (SGSN), while minimizing the signaling
cost for location update. A schema-based niched Pareto GA [17] has been used that deals with multiple
objectives by incorporating the concept of Pareto domination in its selection operator, and applying
a niching pressure to spread out its population along the Pareto optimal tradeoff surface. The fitness
function or the cost functions used are the RA load balancing and the intra-SGSN signaling cost, which
covers the intra-SGSN routing area update and paging cost.
[Figure 15.4: Frequency reuse in a hexagonal cellular layout; each seven-cell cluster uses the frequencies F1 to F7, and the pattern is repeated in clusters that are sufficiently far apart.]
and each cell is allocated to a certain number of channels depending on its traffic density. In the cellular
system, the same frequency cannot be used in the adjacent cells, as there will be co-channel interference.
The hexagonal cell structure creates a cluster of 7 cells, where the frequencies will differ from each other.
Thus, channels used by a cell can be reused by a different cell belonging to a different cluster (sufficiently
far apart), so that the interference is bounded by some specific value. This is demonstrated in Figure 15.4,
where Fi represents the different frequencies used in the corresponding cell.
The channel assignment technique [2] can be static or dynamic. In a static assignment, a fixed num-
ber of channels are allocated to each cell and are estimated by the traffic demand in that cell. On the
other hand, in a dynamic allocation strategy, the channels are assigned on demand, and no cell has
exclusive control on any channel. Channel assignment strategies need to tackle two types of channel
interferences: (i) co-channel interference and (ii) adjacent channel interference. Co-channel interference
occurs when a different signal is received in the same channel as the original signal and therefore can-
not be eliminated by the receiver itself. Adjacent channel interference, on the other hand, is caused
by inadequate/incomplete filtering of unwanted modulation products in frequency modulation systems,
improper tuning, or poor frequency control in either the reference channel or the interfering chan-
nel, or both. The channel assignment strategy needs to maintain both the co-channel and the adjacent
channel interference below a tolerable limit. Most of the research in this area is focussed on finding
the minimum bandwidth spectrum to satisfy a given traffic load, while maintaining the interference
constraint.
Let the available frequency spectrum consist of M consecutive channels f1 , f2 , . . . , fM . Also, let N , di ,
and cij respectively represent the number of cells, channel demand of cell i, and frequency separation
needed between any two channels allotted to a pair of cells i and j. The matrices formed by the elements
cij and di, (1 ≤ i, j ≤ N), are denoted as C and D, respectively. Intuitively, cij = 1 means that cells i and j
cannot be assigned the same frequency, while cij = 2 additionally forbids adjacent frequencies. The channel
assignment strategy now needs to find out a suitable frequency assignment matrix F = [fmi]M×N, where
fmi is binary-valued, with 1 indicating that the frequency fm is assigned to the ith cell and 0 indicating
that it is not. The problem now boils down to minimizing the value of M such that the channels
can be allocated to all the cells without interference, that is, whenever fmi = flj = 1 (with (m, i) ≠ (l, j)),
the separation |m − l| ≥ cij must hold, and Σ(m=1..M) fmi = di.
It has been shown in Reference 18 that the channel assignment problem is equivalent
to the graph coloring problem (NP-hard) [1], even when only the co-channel constraints are considered,
that is, when the matrix C is binary-valued. The actual channel assignment problem considered here is
even more complex and the solution space increases exponentially with the increasing number of cells.
The role of evolutionary algorithm now comes into play to obtain a (near-) optimal solution [2] in a
polynomial time.
The first step to solve this problem using GA is again to generate the initial pool of encoded strings or
chromosomes. Let, S1 , S2 , . . . , SP represent the P strings of an initial population. Each Si is an M × N
matrix, where M and N are number of frequencies and number of cells, respectively. The elements of the
matrix Si can have values 0, 1, −1, or 9. The interpretations of these four different values are enumerated
here:
1. 0: Cell (given by the column number) is not using the frequency (row number), and even if it uses
that frequency, there will be no conflict with other existing allocations.
2. 1: Cell is using the particular frequency.
3. −1: Cell is not using the frequency (row number), and cannot use that frequency for possible
interference.
4. 9: It is used at the head of all unused channels (rows).
Illustrative Example: Table 15.3 demonstrates an example of a valid solution for the four-node channel
allocation problem shown in Figure 15.5. The initial population has been created by using different
permutations of the nodes in the chromosomes, e.g., 1, 2, 3, 4, or 1, 3, 2, 4, or 3, 4, 1, 2, or 3, 1, 2, 4. The
fitness function is decided based on the total number of channels allocated, that is, the value of M. The
objective is to minimize this value of M. For chromosomes with equal M values, the one with more 0s is
selected as the better one. The reason is that the chromosomes having more 0s allow more channels to be
added, while satisfying the interference constraint. Now, out of the P × N columns, P × N × ρ columns
are selected at random, where ρ is the probability of mutation. From a selected column, one
0 and one 1 are chosen at random and both are flipped; if the column lacks a 0 or a 1, the mutation fails. If
the mutation results in a row with all 0 values, a leading 9 is placed, reducing the bandwidth requirement
by 1. Simulation results show that the lower bound for the minimum number of frequencies ranges from
roughly 300 to 500 for different types of demands. The GA approach takes only 1 to 3 generations (a fraction of a
second) to converge to a near-optimal solution.
TABLE 15.3  An example of a valid solution (rows are frequencies f1 to f15; columns are cells 1 to 4)

f1     +1   −1   +1   −1
f2     −1   −1   −1   −1
f3     −1   −1   −1   +1
f4     −1   −1   −1   −1
f5     −1   +1   −1   −1
f6     −1   −1    0   −1
f7     −1   −1   −1   −1
f8     −1   −1   −1   +1
f9      0   −1   −1   −1
f10     0    0    0   −1
f11     0    0    0   −1
f12     0    0   −1   −1
f13     0   −1   −1   +1
f14     9    0   −1   −1
f15     9    0    0   −1
The corresponding separation and demand matrices are

        5 4 0 0             1
C =     4 5 0 1        D =  1
        0 0 5 2             1
        0 1 2 5             3
Cells   Null policy   Best policy   Direct code (GA)       Block code (GA)       Program code (GA)
        Pr[B]         Pr[B]         Pr[B]   bits   G       Pr[B]   bits   G      Pr[B]   bits   G
10      0.121         0.0753        0.076   400    132     0.077   200    100    0.076   24     <40
50      0.216         0.072         0.097   2652   3600    0.078   510    800    0.072   48     <40
length of strings varies for different coding strategies. Program codes perform the best in terms of the
number of generations and string lengths, followed by the block codes and direct codes.
An adaptive resource allocation and call admission control scheme, based on GA, was proposed in
Reference 19 for wireless ATM networks. Multimedia calls have their own distinct QoS requirements (e.g.,
cell loss rate, delay, jitter, etc.) for each of their substreams. The network usually allocates an appropriate
amount of resource that constitutes a certain QoS level, which remains fixed during a call. But such static
schemes are inefficient in terms of resource utilization. In the adaptive algorithm proposed in Reference 19,
each substream declares a range of acceptable QoS levels (e.g., high, medium, low) instead of just a single
one. With the variation of network resources, the algorithm selects the best possible QoS level that each
substream can obtain, while achieving maximum utilization of the resources and their fair distribution
among the calls. For example, in case of congestion, the algorithm tries to free up some resources by
degrading the QoS levels of some of the existing calls. The problem essentially boils down to finding the
best QoS levels for all existing calls amidst a large search space. Thus, if three types of streams, namely
audio, video, and data, are considered with four possible QoS levels (“High,” “Medium,” “Low,” and “No
Component”), then a total of 4^3 = 64 QoS-level combinations are possible, and if we consider N existing calls,
the search space is given by 64^N. For N = 10, the dimension of the search space will be 64^10 ≈ 1.153 × 10^18.
Reference 19 provides a GA as a tool for searching this huge space for a near-optimal solution.
Simulation results show that the algorithm is capable of finding solutions with a huge gain
(250%) in terms of the number of admitted calls, while achieving a resource utilization ranging between
85.87% and 99.985%.
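A minimal sketch of how such a search space might be encoded for a GA follows, assuming (hypothetically) one gene per substream and a toy fitness that trades quality against capacity; Reference 19 does not prescribe these exact names or costs.

import random

LEVELS = ("High", "Medium", "Low", "No")   # 4 levels per substream

def random_chromosome(n_calls):
    # One gene per substream (audio, video, data) per call: 64^N candidates.
    return [[random.randrange(4) for _ in range(3)] for _ in range(n_calls)]

def fitness(chrom, cost, capacity):
    # cost[level] is an assumed per-level resource demand.
    used = sum(cost[level] for call in chrom for level in call)
    if used > capacity:
        return 0.0                          # infeasible: congestion
    quality = sum(3 - level for call in chrom for level in call)
    return quality + used / capacity        # favor both QoS and utilization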
time-varying nature of the wireless medium and the dynamism in the available resources. However, the
moments of their probability distributions can be obtained by studying the history of their changes over
a certain period of time, broadcast via link state advertisements. Proper modeling with probability
distributions for each QoS parameter enables us to estimate the probabilistic bound corresponding to
a particular value of that parameter. This bound serves as the guarantee provided by the network, for
satisfying a particular value of a QoS parameter. Additionally, the tree should admit the maximum
number of sessions with such guarantees. This is achieved by efficient resource and traffic management
mechanisms.
Genetic algorithm-based operations, namely selection, crossover, and mutation operations are then
successively applied to the members of the population until there is little change in the quality of the
best solution in the entire population. The best solution replaces the worst solution of the previous
population. In crossover operations, the corresponding parts of two randomly selected multicast routing
trees are concatenated to obtain new routing trees. In mutation operations, a path in a multicast routing
tree is randomly selected and is replaced by another valid path between the same source and destination.
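The following sketch illustrates these two operators under an assumed representation: each candidate tree is stored as a mapping from destination to its chosen path, and pools[dest] holds the valid paths for that destination (cf. Figure 15.7). The names and representation are illustrative, not the exact ones of the cited work.

import random

def tree_crossover(tree_a, tree_b):
    # Exchange corresponding parts (per-destination paths) of two trees.
    child_a, child_b = dict(tree_a), dict(tree_b)
    for dest in tree_a:
        if random.random() < 0.5:
            child_a[dest], child_b[dest] = tree_b[dest], tree_a[dest]
    return child_a, child_b

def tree_mutation(tree, pools):
    # Replace one randomly selected path by another valid path between
    # the same source and destination.
    dest = random.choice(list(tree))
    tree[dest] = random.choice(pools[dest])
    return tree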
Illustrative Example: To illustrate the coding scheme of potential solutions, we study a network of eight
nodes, shown in Figure 15.6, where node 1 is the source of multicast delivery and 4, 5, 6, and 7 are
the destination nodes. Then the pools of valid paths for each source–destination pair are as shown
in Figure 15.7. The initial population, as required by the GA, is created as follows. Each member of
the population is formed by randomly selecting a path between each source–destination pair and then
concatenating them to represent a multicast tree spanning the source node and the set of destination
nodes. Figure 15.8 depicts the multicast delivery tree computed by the underlying routing algorithm for
the sample network of Figure 15.6, with node 1 as the source and nodes 4, 5, 6, and 7 as the
destination nodes.
15.6.2 Improvement
The GA framework described here combines the three QoS objectives into a single linear fitness function.
This scheme works well when only one solution is needed. But when multiple, mutually conflicting
optimization parameters are involved, it cannot yield solutions that are better than the others with respect
to a single optimization parameter yet not superior when all the optimization parameters are considered
together. In such a case, no solution is dominated by the others when all the parameters are considered.
These are generally termed nondominated or Pareto-optimal solutions, and they are generated by using a
multi-objective GA.
The GA-based multicast routing algorithm has been extended to incorporate a multi-objective QoS-
optimization mechanism [8]. The procedure does not combine the three predefined QoS parameters into
a single objective function but attempts to optimize each parameter individually, thereby providing a near-
optimal and nondominated set of solutions (i.e., multicast trees). The solution set consists not only of
those trees that individually offer the best delay, bandwidth requirement, and residual bandwidth guarantee,
but also of trees that compromise fairly among the three optimization parameters.

Simulation results demonstrate that the algorithm is capable of obtaining a near-optimal multicast tree
in reasonable time. With an arrival rate of 5–10 multicast sessions, the average session blocking
rate is only 2–5%. The multi-objective GA improves the flexibility of this scheme by offering a set of
nondominated solutions, from which the user can choose his/her preferred solution. The dynamism and
fluctuation of networks always create resource uncertainty, which might result in the unavailability of a
particular QoS-constrained path. The nondominated solutions aid in offering an alternate QoS-guaranteed
path, thereby enabling graceful degradation of QoS provisioning. With a multi-objective GA, it becomes
possible to sustain more calls at their minimum level of QoS.
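The nondominated filtering itself is standard; a minimal sketch is given below, assuming objectives(t) returns a tuple of objective values for tree t, oriented so that larger is better.

def dominates(a, b):
    # a, b are objective tuples oriented so that larger is better
    # (e.g., -delay, -bandwidth requirement, residual bandwidth).
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(trees, objectives):
    # Keep only the nondominated multicast trees.
    return [s for s in trees
            if not any(dominates(objectives(t), objectives(s))
                       for t in trees if t is not s)]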
15.7 Conclusion
In this chapter, we have presented a survey of the computationally difficult problems specific to wireless
networking that have been solved using GA, a bio-inspired optimization algorithm. The primary objective
of each of the problems is to derive a near-optimal solution in a computationally feasible time, and GA
provides a tool for deriving such solutions. Relevant performance results have been presented to illustrate
the efficiency of GAs in solving these complex problems. Potentially, GA
can be used in any of the optimization problems faced in the design and operation of wireless networks.
While, in this chapter, we have predominantly discussed problems from the cellular network domain,
recent developments in the area of ad hoc and sensor networks are posing new challenges, a majority of
which are difficult optimization problems. As a result, there have been some research attempts [23–25]
to solve these problems using GA. We hope that this chapter will help in providing a clear idea on the
methodology of solving complex, real-life problems in the wireless domain using GAs.
References
[1] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of
NP-Completeness, W.H. Freeman and Company, San Francisco, CA, 1983.
[2] G. Chakraborty and B. Chakraborty. A genetic algorithm approach to solve channel assignment
problem in cellular radio networks. IEEE Midnight-Sun Workshop on Soft Computing Methods in
Industrial Applications, June 1999.
[3] S.K. Sen, A. Bhattacharya, and S.K. Das. A selective update strategy for PCS users. Wireless Networks,
5, 313–326, 1999.
[4] A. Yener and C. Rose. Genetic algorithms applied to cellular call admission: Local policies. IEEE
Transactions on Vehicular Technology, 46, 72–79, 1997.
[5] R. Subrata and A.Y. Zomaya. Evolving cellular automata for location management in mobile
computing networks. IEEE Transactions on Parallel and Distributed Computing, 14, 13–26, 2003.
[6] T. Ozugur, A. Bellary, and F. Sarkar. Multiobjective hierarchical 2G/3G mobility manage-
ment optimization: niched Pareto genetic algorithm. In Proceedings of Globecom, Vol 6, 2001,
pp. 3681–3685.
[7] 3GPP - UMTS Standards. http://www.3gpp.org/ftp/tsg_cn/TSG_CN/TSGN_03/Docs
[8] A. Roy and S.K. Das. QM2RP: A QoS-based mobile multicast routing protocol. ACM/Springer
Wireless Networks (WINET), 10(3), 271–286, 2004.
[9] S.J. Beaty. Genetic algorithms and instruction scheduling. In Proceedings of the 24th annual
international symposium on Microarchitecture, pp. 206–211
[10] K. Ng and Y. Li. Design of sophisticated fuzzy logic controllers using genetic algorithms. In
Proceedings of the IEEE World Congress on Computational Intelligence, 1994.
[11] C.T. Sun and M.D. Wu. Multi-stage genetic algorithm learning in game playing. In Proceedings
of the First International Joint Conference of the North American Fuzzy Information Processing
Society, the Industrial Fuzzy Control and Intelligent Systems, and the NASA Joint Technology, 1994,
pp. 223–227.
[12] C.L. Karr. Genetic algorithms for modelling, design, and process control. In Proceedings of the
second international conference on Information and knowledge management, 1993, pp. 233–238.
[13] Z. Michalewicz. A genetic algorithm for the linear transportation problem. IEEE Transactions on
Systems, Man, and Cybernetics, 21, 445–452, 1991.
[14] C.A.R. Jahuria. Hybrid genetic algorithm with exact techniques applied to TSP. In Proceedings of
the Second International Workshop on Intelligent Systems Design and Application, 2002, pp. 119–124.
[15] J.J. Yang and R.R. Korfhage. Query optimization in information retrieval using genetic algorithms:
Report on the experiments of the TREC project. In Proceedings of TREC’1. NIST, 1993, pp. 31–58.
[16] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley
Longman Publishing Co., Reading, MA, 1991.
[17] J. Horn, N. Nafpliotis, and D.E. Goldberg. A niched Pareto genetic algorithm for multiobjective
optimization. In IEEE Conference on Evolutionary Computation, New Jersey, Vol. 1, 1994, pp. 82–87.
[18] W.K. Hale. Frequency assignment: Theory and applications. Proceedings of IEEE, 68, 1497–1514,
1980.
[19] M. Sherif, I. Habib, M. Naghshineh, and P. Kermani. An Adaptive Resource Allocation and
Call Admission Control Scheme for Wireless ATM Using Genetic Algorithms. In Proceedings
of Globecom, 1999, pp. 1500–1504.
[20] S. Chen and K. Nahrstedt. An overview of quality of service routing for next-generation high-speed
networks: Problems and solutions. IEEE Network, Special Issue on Transmission and Distribution
of Digital Video, 12, 64–79, 1998.
[21] N. Banerjee and S.K. Das. Fast determination of QoS-based multicast routes in wireless networks
using Genetic Algorithm. IEEE International Conference on Communications (ICC), 8, 2588–2592,
2001.
[22] R. Nelson. Probability, Stochastic Process and Queuing Theory. Springer-Verlag, Heidelberg, 1995.
[23] L. Barolli, A. Koyama, and N. Shiratori. A QoS routing method for ad-hoc networks based
on genetic algorithm. In Proceedings of International Workshop on Database and Expert Systems
Applications, 2003, pp. 175–179.
[24] D. Turgut, S.K. Das, R. Elmasri, and B. Turgut. Optimizing clustering algorithm in mobile ad
hoc networks using genetic algorithmic approach. In Proceedings of IEEE GLOBECOM, 2002,
pp. 62–66.
[25] Q. Wu, S.S. Iyengar, N.S.V. Rao, J. Barhen, V.K. Vaishnavi, H. Qi, and K. Chakrabarty. On comput-
ing the mobile agent routes for data fusion in a distributed sensor network. IEEE Transactions on
Knowledge and Data Engineering, 1(6), 740–753, 2004.
16.1 Introduction
The last half of the twentieth century has seen a vigorous growth in the field of digital image processing
(DIP) and its potential applications. DIP deals with the manipulation and analysis of images that are
generated by discretizing the continuous signals. One important area of application that has evolved from
the 1970s is that of medical images. Rapid development in different areas of image processing, computer
vision, pattern recognition, and imaging technology, and the transfer of technology from these areas to
the medical domain has changed the entire way of looking at clinical routine, diagnosis, and therapy. Also,
the need for more effective and less (or non) invasive treatment has led to a large amount of research for
developing what may be called computer aided medicine.
Most modern medical data are expressed as images or other types of digital signals. The explosion in
computer technology in recent years introduced new imaging modalities such as x-rays, magnetic reson-
ance imaging (MRI), computer tomography (CT), positron emission tomography (PET), single photon
emission computed tomography (SPECT), electrical impedance tomography (EIT), ultrasound, and so on.
These modalities are noninvasive and offer high spatial resolution. Thus the acquisition of a large number of
such sophisticated image data has given rise to the development of quantitative and automatic processing
and analysis of medical images (as opposed to the manual qualitative assessment done earlier). Moreover,
the use of new, enhanced, and efficient computational models and techniques has also become necessary.
A large amount of research is being devoted to the various domains of medical image processing, and
some surveys are already published [1,2]. However, in view of the vastness of the field, it has become
necessary to specialize any further survey work that is undertaken in this area, so that it can become
manageable and can be of more benefit to researchers/users. Some such attempts have already been made,
for example, specialization in terms of period of publication [3], image segmentation [4], registration [5],
virtual reality, and surgical simulation [6,7].
The area of designing equipment for better imaging, and hence improving subsequent processing
tasks has also received the attention of researchers. The design problem has been viewed as one of
optimization, and therefore the use of efficient search strategies has been studied. The application of
genetic algorithms, a well-known class of search and optimization strategies, is also one of the important
areas that has been investigated in this regard.
Genetic Algorithms (GAs) [8,9] are randomized search and optimization techniques guided by the
principles of evolution and natural genetics, and have a large amount of implicit parallelism. They provide
near optimal solutions of an objective or fitness function in complex, large, and multimodal landscapes.
In GAs, the parameters of the search space are encoded in the form of strings called chromosomes. A fitness
function is associated with each string that represents the degree of goodness of the solution encoded in
it. Biologically-inspired operators such as selection, crossover, and mutation are used over a number of
evolutions (generations) for generating potentially better strings.
The important fallout of (semi-) automated medical image processing tasks is enhanced diagnosis.
Several tasks in the area of medical diagnosis have also been modeled as an optimization problem, and
researchers have used GAs for solving them. In this chapter, we attempt to provide a state-of-the-art
survey in the application of the principles of GAs, an important component of evolutionary computation,
for improving medical imaging and diagnosis tasks. Section 16.2 describes the basic principles of GAs.
Thereafter, the use of GAs in improving equipment design has been studied. Finally, the application
of GAs for computer aided diagnosis, including schemes driven by both image and data (consisting of
information not derived from images), is provided.
• A representation strategy that determines the way in which potential solutions will be encoded to
form string like structures called chromosomes.
• A population of chromosomes.
A schematic diagram of the basic structure of a GA is shown in Figure 16.1. The components of GAs
are described in the following sections.
1 0 0 1 1 0 1 0
is a binary chromosome of length 8. It is evident that the number of different chromosomes (or strings)
is 2l , where l is the string length. Each chromosome actually refers to a coded possible solution. A set of
such chromosomes in a generation is called a population, the size of which may be constant or may vary
from one generation to another. A common practice is to choose the initial population randomly.
16.2.4.1 Selection
The selection/reproduction process copies individual strings (called parent chromosomes) into a tentative
new population (known as mating pool) for genetic operations. The number of copies that an individual
receives for the next generation is usually taken to be directly proportional to its fitness value; thereby
mimicking the natural selection procedure to some extent. This scheme is commonly called the propor-
tional selection scheme. Roulette wheel parent selection, stochastic universal selection, and binary tournament
selection [8,10] are some of the most frequently used selection procedures. Figure 16.2 demonstrates the
roulette wheel selection. The wheel has as many slots as the population size P, where the size of a slot is
proportional to the relative fitness of the corresponding chromosome in the population. An individual is
selected by spinning the roulette, and noting the position of the marker when the roulette stops. Therefore,
the number of times that an individual will be selected is proportional to its fitness (or, the size of the slot)
in the population. In the commonly used elitist model of GAs, called an elitist GA (EGA), the best
chromosome seen up to the current generation is retained either in the population or in a location
outside it.
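A minimal sketch of one spin of the roulette wheel is shown below; the function name is illustrative, and fitnesses are assumed to be nonnegative.

import random

def roulette_select(population, fitnesses):
    # Each individual occupies a slot proportional to its fitness, so its
    # selection probability is f_i / sum(f).
    total = sum(fitnesses)
    marker = random.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if acc >= marker:
            return individual
    return population[-1]   # guard against floating-point round-off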
16.2.4.2 Crossover
The main purpose of crossover is to exchange information between randomly selected parent chromo-
somes by recombining parts of their genetic information. It combines parts of two parent chromosomes
to produce offspring for the next generation. Single-point crossover is one of the most commonly used
schemes. Here, first of all, the members of the selected strings in the mating pool are paired at random.
Then each pair of chromosomes is subjected to crossover with a probability µc where an integer position
k (known as the crossover point) is selected uniformly at random between 1 and l − 1 (l > 1 is the string
length). Two new strings are created by swapping all characters from position (k + 1) to l. For example,
let the two parents and the crossover points be as shown below.
Parent 1: 1 0 0 1 1 | 0 1 0
Parent 2: 0 0 1 0 1 | 1 0 0

After swapping the tails, the offspring are:

Offspring 1: 1 0 0 1 1 1 0 0
Offspring 2: 0 0 1 0 1 0 1 0
Other common crossover techniques are two-point crossover, multiple point crossover, shuffle-
exchange crossover, and uniform crossover [9].
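The single-point scheme can be sketched in a few lines of Python; the function name and list representation are illustrative.

import random

def single_point_crossover(parent1, parent2, mu_c):
    # With probability mu_c, choose k uniformly in [1, l-1] and swap all
    # characters from position (k + 1) to l (0-based slice indices k..l-1).
    if random.random() >= mu_c:
        return parent1[:], parent2[:]        # pair left unchanged
    k = random.randint(1, len(parent1) - 1)  # the crossover point
    return parent1[:k] + parent2[k:], parent2[:k] + parent1[k:]

Calling single_point_crossover([1,0,0,1,1,0,1,0], [0,0,1,0,1,1,0,0], 1.0) with k = 5 reproduces the offspring shown above.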
The successful operation of GAs depends, to a great extent, on the coding technique used to represent
the problem variables [11,12]. The building block hypothesis indicates that GAs work by identifying good
building blocks, and by eventually combining them to get larger building blocks [8,13,14]. Unless good
building blocks are coded tightly, the crossover operation cannot combine them [15,16]. Thus coding–
crossover interaction is important for the successful operation of GAs. The problem of tight or loose
coding of problem variables is largely known as the linkage problem [17]. Recent work on linkage learning
GAs that exploits the concept of gene expression can be found in References 18 to 20.
16.2.4.3 Mutation
Mutation is the process by which a random alteration in the genetic structure of a chromosome takes
place. Its main objective is to introduce genetic diversity into the population. It may so happen that
the optimal solution resides in a portion of the search space that is not represented in the population’s
genetic structure. The process will therefore be unable to attain the global optima. In such situations,
only mutation can possibly direct the population to the optimal section of the search space by randomly
altering the information in a chromosome. Mutating a binary gene involves simple negation of the bit,
whereas mutation for real-coded genes is defined in a variety of ways [10,21]. Here, we discuss the binary
bit-by-bit mutation, where every bit in a chromosome is subjected to mutation with a probability µm. The
result of applying the bit-by-bit mutation to positions 3 and 7 of a chromosome is shown here.
Before: 1 0 0 1 1 0 1 0
After:  1 0 1 1 1 0 0 0
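In code, bit-by-bit mutation reduces to one pass over the chromosome; this sketch uses an illustrative name.

import random

def bitwise_mutation(chromosome, mu_m):
    # Negate each bit independently with probability mu_m.
    return [1 - bit if random.random() < mu_m else bit for bit in chromosome]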
16.2.5 Parameters of GA
There are several parameters in GAs that have to be manually tuned and fixed by the programmer.
Among these are the population size, probabilities of performing crossover and mutation, and the ter-
mination criteria. Several other things must also be determined by the programmer. For example, one
must decide whether to use the generational replacement strategy, in which the entire population is
replaced by a new population, or the steady state replacement policy where only the less fit individuals are
replaced. Most such parameters in GAs are problem dependent, and no guidelines for their choice exist
in the literature. Therefore, several researchers have also kept some of the GA parameters variable and/or
adaptive [22–24].
As shown in Figure 16.1, the cycle of selection, crossover, and mutation is repeated a number of times
till one of the following occurs (a skeleton of this loop is sketched after the list):
1. The average fitness value of a population becomes more or less constant over a specified number
of generations.
2. A desired objective function value is attained by at least one string in the population.
3. The number of generations (or iterations) is greater than some threshold.
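The skeleton below wires the three stopping rules into the selection-crossover-mutation cycle of Figure 16.1. All parameter names (target, stall, eps) are illustrative, and an even population size is assumed.

def run_ga(population, fitness, select, crossover, mutate,
           max_generations, target, stall=20, eps=1e-6):
    history = []
    for generation in range(max_generations):      # criterion 3
        scores = [fitness(c) for c in population]
        if max(scores) >= target:                   # criterion 2
            break
        history.append(sum(scores) / len(scores))   # average fitness
        window = history[-stall:]
        if len(history) >= stall and max(window) - min(window) < eps:
            break                                   # criterion 1: stagnation
        pool = [select(population, scores) for _ in population]
        nxt = []
        for a, b in zip(pool[::2], pool[1::2]):     # pair mates at random order
            c1, c2 = crossover(a, b)
            nxt += [mutate(c1), mutate(c2)]
        population = nxt
    return population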
is placed at the center (or, isocenter) of the magnetic field inside the bore. In this noninvasive procedure,
strong magnetic fields along with radio waves are used to visualize the structure of a particular body part.
The design of appropriate equipment for the purpose of good imaging may be considered the first
step in medical image processing. For example, large superconducting solenoids with apertures of typically
1 m, highly uniform (20 ppm) central fields of 1–2 T, and low fringe fields (5 Gauss at 5 m) are required
in clinical MRI. However, these magnets, which are now available in the clinical market, have deep bores,
typically between 1.8 and 2 m in length, and have a number of disadvantages such as patient claustrophobia
and limited access for intervention. In order to overcome the limitations and evolve good designs for the
magnets, researchers have mapped the design problem to one of optimization and have investigated the
use of computational methods [25,26], including GAs [27–29], for this purpose.
Analytical techniques have been the preferred approach to designing such magnets and gradient sets for
MRI. Such techniques are computationally efficient but approximate, particularly away from the axis
of symmetry. In Reference 30, an attempt has been made that uses a GA running on massively parallel
computers to design an actively shielded whole-body MRI solenoid magnet with a bore of 1 m. The task
is to optimize a cost function based on the magnetic field generated by a set of up to 20 circular coils, each
having up to 2000 turns. The coils are constrained to be concentric with the magnet, and are arranged in
symmetric pairs. A single coil is described by five parameters.
A chromosome encodes these five parameters per coil, for up to 20 coils. Since the coils are arranged in
pairs that are symmetric about the central X–Y plane of the magnet, only 10 of the coils are independent.
Thus a chromosome encodes up to 50 parameters that have to be optimally tuned. The magnetic field is
computed using the Biot–Savart law. The fitness function incorporates terms for uniformity of field in
the region of interest (ROI), and the smallness of the fringe field. The field is calculated by summing over
the contributions from the turns in each coil. Recombination is performed as a two-stage process. In the
first stage, a parent subset, which is of half the size of the population, is created by performing binary
tournament selection in the top eighth of the population. In the second stage, pairs of chromosomes from
the parent subset are picked at random and mated to produce a single offspring. The parents are replaced
in the subset, and this process is continued till the next generation is completed. Three types of mutations
are considered, which take care of small perturbations, perturbations that add or remove a coil from the
design, and a drastic perturbation that adds extra diversity to the population. The initial result provided
in Reference 30 demonstrates the effectiveness of GAs for producing shorter whole-body magnet designs,
which are original and innovative.
Two-dimensional ultrasonic arrays offer the possibility of three-dimensional electronic focusing and
beam-steering, and thus three-dimensional imaging. Simulation studies have demonstrated [31] that
reducing the number of elements of a two-dimensional matrix array down to the order of eight preserves
resolution and still leads to sufficient contrast. These random arrays are usually obtained by generating a
random pattern with the desired number of elements. In Reference 32, a simulation is presented to show
that the imaging quality of a sparse array can be improved by optimizing the random choice of elements.
The optimization is done using a GA.
computer assisted diagnostic tools, which are intended to help the medical practitioners make sense out
of a large amount of data, and make diagnosis and therapy decisions. Figure 16.3 shows a block diagram
of a computer aided diagnosis system.
so that conventional scalar optimization techniques can be utilized. This involves incorporating a priori
information into the aggregation method so that the resulting performance of the classifier is satisfactory
for the task at hand. It has been shown in Reference 35 that the multiobjective genetic approach removes
the ambiguity associated with defining a scalar measure of classifier performance and that it returns a set
of optimal solutions that are equivalent in the absence of any information regarding the preference of
the objectives, that is, sensitivity and specificity. The a priori knowledge that is used for aggregating the
objective functions in conventional classifier training may instead be applied for post-optimization to select
from one of the series of solutions returned from multiobjective genetic optimization. This technique is
applied in Reference 35 to train a linear classifier and an artificial neural network, using simulated datasets.
The performance of the solutions returned from the multiobjective genetic optimization represents a series
of optimal pairs, sensitivity and specificity, which can be thought of as operating points on an ROC curve.
It is observed that all possible ROC curves for a given dataset and classifier are less than or equal to the ROC
curve generated by the NP-GA optimization. In Reference 36, a multiobjective approach for optimizing the
performance of two rule-based CAD schemes has been proposed. One of these CAD schemes is designed
to detect clustered microcalcifications in digitized mammograms, while the other scheme is developed to
detect breast masses.
Diagnosis and follow-up of pigmented skin lesions is an important step toward early diagnosis of skin
cancer [37]. For this purpose, digitized epiluminescence microscope (ELM) [38] images of pigmented
skin lesions are used. Epiluminescence microscopy is a noninvasive technique that uses an oil immersion
to render the skin translucent and make pigmented structures visible. During clinical diagnosis of pigmented
skin lesions, one of the main features is the lesion symmetry, which should be evaluated according to
its shape, color, and texture. This may be evaluated by drawing two orthogonal axes that maximize the
perceived symmetry [39]. The evaluation is binary, that is, the lesion is either symmetrical or asymmetrical.
In addition to the small number of possible outcomes, the evaluation is highly subjective and depends on
the physicians’ experience. As a result, the development of automatic techniques for the quantification of
symmetry and the detection of symmetry axes is necessary. Other methods based on principal component
technique for computing the axes may be found in References 40 and 41.
Reference 42 has proposed a GA-based technique and an optimization scheme derived from the self-
organizing maps theory for the detection of symmetry axes. The notion of symmetry map has been
introduced, which allows an object to be mapped to a symmetry space where its symmetry properties
can be analyzed. The objective function (ψ) used in Reference 42 is based on a given symmetry measure,
which is a function of the mean-square error (MSE) between the original and the reflected images.
Denote the vector-valued r × c input image by I(x, y) : [0, c − 1] × [0, r − 1] → R^n, where the pixel
values lie in an n-dimensional space, and let (x, y) and (x′, y′) represent the pixel coordinates and the
corresponding symmetry coordinates, respectively; the MSE (Equation (16.1)) averages the squared error
||I(x, y) − I(x′, y′)||^2 over all pixels. The input image can be decomposed into symmetric (I_s(x, y)) and
asymmetric (I_a(x, y)) components, so that I(x, y) = I_s(x, y) + I_a(x, y). The asymmetric component can be
considered as noise, and the MSE is proportional to its energy. The distortion due to noise can
be measured through the peak signal-to-noise ratio (PSNR), and is given by
PSNR = 10 log_10 ((N_q − 1)^2 / MSE), (16.2)
where Nq is the number of quantization levels in the image. The fitness function ψ is defined as
ψ = 1 − 1/(1 + PSNR). (16.3)
Real encoding of chromosomes has been used in Reference 42 along with a modified version of simu-
lated binary crossover [43]. The proposed technique is applied for detection and diagnosis of malignant
melanoma.
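The fitness of Equations (16.2) and (16.3) can be sketched directly, assuming the reflected image for a candidate axis is computed elsewhere; the default n_quant = 256 is an assumption, not a value given in Reference 42.

import numpy as np

def symmetry_fitness(image, reflected, n_quant=256):
    # MSE between the original and reflected images (cf. the definition above)
    mse = np.mean((image.astype(float) - reflected.astype(float)) ** 2)
    if mse == 0:
        return 1.0                      # perfectly symmetric about the axis
    psnr = 10.0 * np.log10((n_quant - 1) ** 2 / mse)   # Equation (16.2)
    return 1.0 - 1.0 / (1.0 + psnr)                    # Equation (16.3)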
The development of computer-supported systems for melanoma diagnosis is of great importance to
dermatologists, given the limited accuracy of purely clinical identification of malignant melanomas [44,45].
Several techniques
for computerized melanoma diagnosis are based on color images making use of image analysis methods
to quantify visual features as described by the “ABCD” rule (Asymmetry, irregular Border, varying Color,
Diameter) [37,46,47]. Laser profilometry opens up new possibilities to improve tumor diagnostics in
dermatology [48].
The recognition task is to classify surface profiles of melanomas and nevi, also called moles. Because
all profiles contain regions with a structure similar to normal skin, each profile is subdivided into 16
non-overlapping quadratic subprofiles, and image analysis algorithms are applied to each subprofile
separately. Subsequently, feature selection algorithms are applied to optimize the classification
performance of the recognition system.
An efficient computer supported technique that uses GAs for diagnosis of skin tumors in dermatology is
presented in Reference 49. High resolution skin surface profiles are analyzed to recognize malignant melan-
omas and nevocytic nevi. In the initial phase, several types of features are extracted by two-dimensional
image analysis techniques characterizing the structure of skin surface profiles: texture features based on
co-occurrence matrices [50], Fourier features [51], and fractal features [37,52,53]. Subsequently, several
feature selection algorithms based on heuristic strategies, greedy technique, and GA are applied to determ-
ine suitable feature subsets for the recognition process by describing it as an optimization problem. As
a quality measure for feature subsets, the classification rate of the nearest neighbor classifier computed
with the leaving-one-out method is used. Among the different techniques used, GAs show the best result.
Finally, neural networks with error back-propagation as learning paradigm are trained using the selected
feature sets. Different network topologies, learning parameters, and pruning algorithms are investigated
to optimize the classification performance of the neural classifier. With the optimized recognition system,
a classification performance of 97.7% is achieved.
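The quality measure used for feature subsets, the leave-one-out rate of a nearest-neighbor classifier, can be sketched as follows; the GA then evolves binary masks with the operators of Section 16.2. Names and the numpy-based formulation are illustrative.

import numpy as np

def loo_1nn_rate(X, y, mask):
    # Leave-one-out classification rate of the 1-NN classifier on the
    # feature subset selected by the binary mask.
    cols = np.flatnonzero(mask)
    if cols.size == 0:
        return 0.0                       # empty subset classifies nothing
    Xs = X[:, cols].astype(float)
    hits = 0
    for i in range(len(Xs)):
        d = np.sum((Xs - Xs[i]) ** 2, axis=1)
        d[i] = np.inf                    # leave sample i out
        hits += int(y[int(np.argmin(d))] == y[i])
    return hits / len(Xs)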
In the 1980s, microwave imaging was thought to have great potential in developing medical diagnostic
tools. However, because of the problem of inverse scattering, good reconstruction of images was found
to be difficult. Carosi et al. [54] have used the approach of focused imaging, in which only a part of
the body is subjected to the investigation, combined with the capabilities of global search techniques
such as GAs for accurate reconstruction. For experimental purposes, a human abdomen is considered and
different electromagnetic sources operating at the working frequencies of 433 MHz and 2.54 GHz are used.
Numerical investigations are performed to define the optimal dimensions of the reduced investigation
domain. To quantitatively evaluate the effects of the reduction of the original investigation domain on
the inversion data, suitable relative errors are defined. Once the reduced domain is defined, preliminary
reconstructions are performed aiming to evaluate the imaging capability of GAs when a focussed approach
is used for tomographic application.
One of the most important facial paralysis diagnosis techniques is quantitative assessment of patient’s
facial expression motion. Johnson et al. [55] proposed a technique, based on the Maximal Static Response
Assay of facial nerve function, for measuring maximal regional facial motion while the rest of the face is
held at rest. This requires the placement of removable adhesive dots and a small adhesive ruler on the
face at predefined locations. However, many factors, such as misplacement of facial dots, misplacement
of the grid, and reading and entry errors, cause error in the assay. Helling et al. [56] used
region-specific, subtracted, digitized image light reflectance as a two-dimensional marker for the complex
three-dimensional surface deformations of the face during expression. The velocity of region-specific
facial motion is estimated from the facial motion image sequences using the optical flow (OF) technique.
The computation of the OF field requires estimates of both the spatial gradient and the temporal derivative
at each pixel, and this time-consuming process often limits its use, especially in medical applications. To
overcome this problem, an OF technique based on GA is proposed in Reference 57 to detect facial motions
from dynamic image sequences. Experimental results demonstrate that the proposed technique is very
useful to diagnose the site of facial paralysis and assess progression or recovery profiles of patients when
combined with other diagnosis techniques.
The motivation behind this study is to accurately predict the outcome so that patients who are more
likely to die can be identified at diagnosis, and subjected to high dose aggressive chemotherapy (which
has several negative side effects). On the other hand, those with a high chance of survival can be spared
this treatment and its side effects. Given a set of case histories, a technique is proposed in Reference 70
that attempts to find the relative weights of the different factors that are used to describe the cases. The
eight factors that are used in the prognostic model of Reference 70 and their possible ranges are provided
in Table 16.1.
A diffusion GA is used for building the prognostic model that will simultaneously optimize the three
objectives given here. A model is represented as a chromosome by encoding the weights associated with
each factor. Boolean criteria (1–5 in Table 16.1) have single associated weights, while criteria having a
range of values (6–8 in Table 16.1) have one weight per subrange. In addition a combination weight is used
in Reference 70, which can incorporate the possibility that a combination of factors might be important.
Thus a total of 18 weights are used, 5 for the Boolean criteria, 12 for the subranged criteria, and the
combination weight. In the diffusion GA, the individuals in a population are arranged along the vertices
of a square lattice. During crossover, each individual chooses one of its four neighbors randomly as the
mate. Mutation can either increase or decrease a weight by 10%, or set it to zero. The original individual
at a position is replaced only if the result of crossover and mutation is better in the Pareto-optimal sense.
Marvin et al. [70] experimented with a population size of 169 placed on a 13 × 13 grid. The
weights are randomly set in the range [−2000, 2000], and the algorithm is executed for 1000 generations.
It is found to predict 90% of the survivals and 87% of the deaths, while using 6 of the 8 factors, and 13
of the possible 18 weights when the entire data set is used. For training using 90% data and testing using
the remaining 10%, several prognostic models are obtained each coming up with a different compromise
among the three objective values. Significantly, the method in Reference 70 enables a simple model to be
evolved, one that produces well-balanced predictions and one that is relatively easy for clinicians to use.
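A sketch of one generation of this diffusion GA follows, with two stated assumptions beyond the text: the lattice wraps around at its edges, and crossover yields a single child. All names are illustrative.

import random

SIZE = 13   # the 13 x 13 grid used in Reference 70

def mate_of(row, col):
    # Pick one of the four lattice neighbors at random (wraparound assumed).
    dr, dc = random.choice([(-1, 0), (1, 0), (0, -1), (0, 1)])
    return (row + dr) % SIZE, (col + dc) % SIZE

def diffusion_step(grid, crossover, mutate, pareto_better):
    # One generation: the offspring replaces the original individual only
    # if it is better in the Pareto-optimal sense.
    nxt = [row[:] for row in grid]
    for r in range(SIZE):
        for c in range(SIZE):
            nr, nc = mate_of(r, c)
            child = mutate(crossover(grid[r][c], grid[nr][nc]))
            if pareto_better(child, grid[r][c]):
                nxt[r][c] = child
    return nxt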
Ngan et al. [71] employ evolutionary programming (EP) and genetic programming (GP) in the domain
of knowledge discovery in medical systems. Evolutionary Programming is used to learn Bayesian networks,
which is known to be an intractable problem. The minimum description length principle is used to measure
the goodness of solutions in EP, followed by the use of GP to learn rules. The entire knowledge discovery
system is applied to limb fracture and scoliosis data, where it is able to detect many interesting patterns/rules
that had not been uncovered earlier. Ngan et al. had earlier used GP for discovering comprehensible rules in
the medical domain; they used a grammar to restrict the search space and to ensure the syntactical correctness of
the rules [72]. The discovered rules were evaluated within the framework of support confidence proposed
for association rule mining. Here, a major limitation was that the grammar was application dependent,
and had to be written for each application domain.
Genetic programming is also used to discover comprehensible rules for predicting 12 different diseases
using 189 predicting attributes, or measurements [73]. The 12 diseases considered here are stable angina,
unstable angina, acute myocardial infarction, aortic dissection, cardiac tamponade, pulmonary embolism,
pneumothorax, acute pericarditis, peptic ulcer, esophageal pain, musculoskeletal disorders, and psychogenic
chest pain, all of which are characterized by chest pain. All 189 attributes are binary. Genetic
programming is used to learn rules expressed in a kind of first-order logic, of the form <Att_i Op Att_j>,
where Att_i and Att_j are the predicting attributes and Op is some relational operator. Genetic programming
evolves a population of “programs” candidate to the solution of a specific problem. Here, a program is
represented in the form of a tree, in which the internal nodes are functions (operators) and the leaf nodes
are terminal symbols. In the GP formulation of the problem in Reference 73, the terminal set consists
of the 189 attributes, and the function set consists of {AND, OR, NOT}. The GP is executed once for
each class, with the appropriate rule for the ith class being evolved in the ith GP run. Thus, each run
consists of learning a two-class classification problem, in which the goal is to predict whether a patient
has a particular disease (class i) or not (NOT class i). For computing the fitness of a candidate (program
or rule), a (labeled) training set is used on which the rule is tested. The sizes of the following sets are
computed:
• True positives (tp): the rule predicts that the patient has a given disease and the patient does
have it.
• False positives (fp): the rule predicts that the patient has a given disease and the patient does not
have it.
• True negative (tn): the rule predicts that the patient does not have a given disease and the patient
actually does not have it.
• False negative (fn): the rule predicts that the patient does not have a given disease but the patient
actually has it.
Thereafter, two measures are computed, the sensitivity (Se) and specificity (Sp):
Se = tp / (tp + fn), (16.4)

Sp = tn / (tn + fp). (16.5)
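The resulting fitness (the product Se × Sp, used here and reused in Reference 79) is a direct tally over the training set; rule(attributes) below is an assumed predicate for a candidate rule.

def rule_fitness(rule, training_set):
    tp = fp = tn = fn = 0
    for attributes, has_disease in training_set:   # labeled samples
        predicted = rule(attributes)
        if predicted and has_disease:
            tp += 1
        elif predicted and not has_disease:
            fp += 1
        elif not predicted and not has_disease:
            tn += 1
        else:
            fn += 1
    se = tp / (tp + fn) if tp + fn else 0.0        # Equation (16.4)
    sp = tn / (tn + fp) if tn + fp else 0.0        # Equation (16.5)
    return se * sp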
For the experiments, the data set consisted of 138 samples (patients), which were partitioned into a training
set with 90 samples and a test set with 48 samples. The GP achieved an accuracy of 77.08% on the test set.
Other related methods that used GP for classification problem can be found in References 74 to 78. On
similar lines, in Reference 79, GAs were used to discover comprehensible IF–THEN rules for the diagnosis
of dermatological diseases and prediction of the recurrence of breast cancer. Here, a chromosome is
of length n, where n is the number of attributes. The ith gene, corresponding to the ith attribute,
is divided into three fields: weight (Wi), operator (Oi), and value (Vi). Each gene corresponds to one
condition in the IF part of the rule. The GA is executed once for each class, and therefore the THEN part
(indicating the class for which the GA was run) is not required to be encoded in the chromosome. The
weight indicates whether the ith attribute, Ai, is present in the rule (if Wi > Limit) or not (if
Wi ≤ Limit); Limit was set to 0.3. The operator Oi could take values from {=, ≠} if the corresponding
attribute is categorical, and from {≥, <} if the corresponding attribute is continuous. The value Vi could
take values from the domain of the attribute Ai . Normal selection and crossover operators are used. Three
mutation operators are defined, namely weight mutation, operator mutation, and value mutation. The
fitness function of a chromosome was defined as fitness = Se × Sp, as in Reference 73. The dermatology
data set concerns the differential diagnosis of erythemato-squamous diseases. There are six different
diagnoses (six classes): psoriasis, seboreic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis,
and pityriasis rubra pilaris. The data set consists of 366 records with 34 attributes. The breast cancer data consists of
286 records with 9 attributes and 2 classes (recurrence and nonrecurrence of cancer). The accuracy rates
achieved in Reference 79 were 95% for the dermatological data and 67% for the cancer data. The resultant
rules were also found to be comprehensible, with one rule obtained per class.
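One way to render this gene structure in code is sketched below; the dataclass and decoding function are hypothetical illustrations of the encoding described above, not the implementation of Reference 79.

from dataclasses import dataclass

LIMIT = 0.3   # threshold from the text: attribute participates iff W_i > LIMIT

@dataclass
class Gene:               # one gene per attribute, as described above
    weight: float         # W_i
    operator: str         # O_i: '=' or '!=' (categorical); '>=' or '<' (continuous)
    value: object         # V_i, drawn from the domain of attribute A_i

def if_part(genes, names):
    # Decode the IF part of the rule: keep genes whose weight exceeds LIMIT.
    return " AND ".join(f"{n} {g.operator} {g.value}"
                        for n, g in zip(names, genes) if g.weight > LIMIT)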
Acknowledgment
A part of this work was carried out when Dr. U. Maulik visited the University of Texas at Arlington, USA,
with the BOYSCAST fellowship provided by the Department of Science and Technology, Government of
India, during 2001. Dr. S. Bandyopadhyay would like to acknowledge the Indian National Science Academy,
Government of India sponsored project Soft computing for medical image segmentation and classification
No. BS/YSP/36/887 for providing partial support to carry out this work.
References
[1] G. Gerig, T. Pun, and O. Ratib. Image analysis and computer vision in medicine. Computerized
Medical Imaging and Graphics, 18, 85–96, 1994.
[2] N. Ayache. Medical computer vision, virtual reality and robotics. Image and Computer Vision, 13,
295–313, 1995.
[3] J.S. Duncan and N. Ayache. Medical image analysis: Progress over two decades and the challenges
ahead. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22, 85–108, 2000.
[4] T. Mclnerney and D. Terzopolous. Deformable models in medical image analysis: A survey. Medical
Image Analysis, 1, 91–108, 1996.
[5] J.B.A. Maintz and M.A. Viergever. A survey of medical image registration. Medical Image Analysis,
2, 1–37, 1998.
[27] B.J. Fisher, N. Dillon, T.A. Carpenter, and L.D. Hall. Design by genetic algorithm of a z gradient
set for magnetic resonance imaging of the human brain. Measurement Science and Technology, 6,
904–909, 1995.
[28] B.J. Fisher, N. Dillon, T.A. Carpenter, and L.D. Hall. Design of a biplanar gradient coil using a
genetic algorithm. Magnetic Resonance Imaging, 15, 369–376, 1997.
[29] G.B. Williams, B.J. Fisher, C.L.-H. Huang, T.A. Carpenter, and L.D. Hall. Design of biplanar
gradient coils for magnetic resonance imaging of the human torso and limbs. Magnetic Resonance
Imaging, 17, 739–754, 1999.
[30] R.E. Ansorge, T.A. Carpenter, L.D. Hall, N.R. Shaw, and G.B. Williams. Use of parallel supercom-
puter to design magnetic resonance systems. IEEE Transactions on Applied Superconductivity, 10,
1368–1371, 2000.
[31] D.H. Turnbull, A.T. Kerr, and F.S. Foster. Simulation of B-Scan images from two dimensional
transducer arrays. In Proceedings of the 1990 IEEE Ultrasonic Symposium, 1990, pp. 769–773.
[32] P.K. Weber, R.M. Schmitt, B.D. Tylkowski, and J. Steak. Optimization of random sparse 2-D
transducer arrays for 3-D electronic beam steering and focusing. In Proceedings of the 1994 IEEE,
IEEE Ultrasonics Symposium, 1994, pp. 1503–1506.
[33] M.A. Anastasio, H. Yoshida, R. Nagel, R.M. Nishikawa, and K. Doi. A genetic algorithm-based
method for optimizing the performance of a computer-aided diagnosis scheme for detection of
clustered microcalcifications in mammograms. Medical Physics, 25, 1613–1620, 1998.
[34] J. Horn and N. Nafpliotis. Multiobjective optimization using the niched Pareto genetic
algorithm. IlliGAL Report No. 93005, University of Illinois at Urbana-Champaign, 1993.
[35] M.A. Kupinski and M.A. Anastasio. Multiobjective genetic optimization of diagnostic classifiers
with implications for generating receiver operating characteristic curves. IEEE Transactions on
Medical Imaging, 18, 675–685, 1999.
[36] M.A. Anastasio, M.A. Kupinski, R.M. Nishikawa, and M.L. Giger. A multiobjective approach
to optimizing computerized detection schemes. IEEE Nuclear Science Symposium, 3, 1879–1883,
1998.
[37] A. Green, N. Martin, J. Pfitzner, M. O’Rourke, and N. Knight. Computer image analysis in the
diagnosis of melanoma. Journal of American Academies of Dermatology, 31, 958–964, 1994.
[38] Z.B. Argenyi. Dermatoscopy (epiluminescence microscopy) of pigmented skin lesions. Dermato-
logy Clinical, 15, 79–95, 1997.
[39] W. Stolz, O. Braun-Falco, M. Landthaler, P. Bilek, and A.B. Cognetta, Color Atlas of Dermatoscopy.
Blackwell Science, Oxford, 1994.
[40] W.V. Stoecker, W.W. Li, and R.H. Moss. Automatic detection of asymmetry in skin tumors.
Computerized Medical Imagings and Graphics, 16, 191–197, 1992.
[41] D. Gutkowicz-Krusin, M. Elbaum, P. Szwaykowski, and A.W. Kopf. Can early malignant melanoma
be differentiated from atypical melanocytic nevi by in vivo techniques? Part II, automatic
machine vision classification. Skin Research and Technology, 3, 15–22, 1997.
[42] P. Schmid-Saugeon. Symmetry axis computation for almost-symmetrical and asymmetrical
objects: Application to pigmented skin lesions. Medical Image Analysis, 4, 269–282, 2000.
[43] K. Deb and A. Kumar. Simulated binary crossover for continuous search space. Complex Systems,
9, 115–148, 1995.
[44] C.M. Balch and G.W. Milton. Hautmelanome. Springer, Berlin, 1988.
[45] C.M. Grin, A.W. Kopf, B. Welkovich, R.S. Bart, and M.J. Levenstein. Accuracy in the clinical
diagnosis of malignant melanoma. Archives of Dermatology, 126, 763–766, 1990.
[46] J.E. Golston, W.V. Stoecker, R.H. Moss, and I.P.S. Dhillon. Automatic detection of irregular borders
in melanoma and other skin tumors. Computer Medical Imaging Graphics, 16, 163–177, 1992.
[47] A.J. Sober and J.M. Burstein. Computerized digital image analysis: An aid for melanoma diagnosis
preliminary investigations and brief review. Journal of Dermatology, 21, 885–890, 1994.
[48] K.P. Wilhelm, P. Elsner, E. Beradesca, and H. Baibach, Bioengineering of the Skin: Skin Surface
Imaging and Analysis. CRC Press, Boca Raton, FL, 1997.
[49] H. Handels, T. Rob, J. Kreusch, H.H. Wolff, and S.J. Poppl. Feature selection for optimized skin
tumor recognition using genetic algorithms. Artificial Intelligence in Medicine, 16, 283–297, 1999.
[50] R.M. Haralick, K. Shanmugam, and I. Dinstein. Textural features for image classification. IEEE
Transactions on Systems, Man, and Cybernetics, 3, 610–621, 1973.
[51] D.H. Ballard and M.B. Brown. Computer Vision. Prentice-Hall, Englewood Cliffs, NJ, 1982.
[52] K.J. Falconer. Fractal Geometry, Mathematical Foundations and Applications. Wiley, Chichester,
1990.
[53] H.O. Peitgen and D. Saupe. The Science of Fractal Images. Springer, Berlin, 1988.
[54] S. Carosi, A. Massa, and M. Pastorino. Numerical assessment concerning a focussed microwave
diagnostic method for medical application. IEEE Transactions on Microwave Theory and Techniques,
48, 1815–1830, 2000.
[55] P.C. Johnson, H. Brown, W.M. Kuzon, J.R. Ballit, J.L. Garrison, and J. Campbell. Simultaneous
quantitation of facial movements: The maximal static response assay of facial nerve function.
Annals of Plastic Surgery, 32, 171–179, 1994.
[56] T.D. Helling and M.J.G. Neely. Validation of objective measures for facial paralysis. Laryngoscope,
107, 1345–1349, 1997.
[57] Y. Cui, M. Wan, and J. Li. A new quantitative assessment method of facial paralysis based on motion
estimation. In Proceedings of the 20th Annual International Conference of the IEEE Engineering in
Medicine and Biology Society, Vol. 3. IEEE Press, Hong Kong, 1998, pp. 1412–1413.
[58] L.J. Dorfman and K.C. McGill. AAEE minimonograph #29: Automatic quantitative elec-
tromyography. Muscle Nerve, 11, 804–818, 1988.
[59] C.N. Schizas, C.S. Pattichis, I.S. Schofield, P.R. Fawcett, and L.T. Middleton. Artificial neural nets in
computer-aided macro motor unit potential classification. IEEE Engineering Medicine and Biology
Magazine, 9, 31–38, 1990.
[60] C.S. Pattichis. Artificial Neural Networks in Clinical Electromyography. Ph.D. thesis, Queen Mary
and Westfield College, University of London, UK, 1992.
[61] C.S. Pattichis and C.N. Schizas. Genetic-based machine learning for the assessment of certain
neuromuscular disorders. IEEE Transactions on Neural Networks, 7, 427–439, 1996.
[62] P. Larranaga, B.S.M.Y. Gallego, M.J. Michelena, and J.M. Pikaza. Learning Bayesian networks by
genetic algorithms: A case study in the prediction of survival in malignant skin melanoma. In
Lecture Notes in Artificial Intelligence, Vol. 1211. Springer-Verlag, Heidelberg, 1997, pp. 261–272.
[63] B. Sierra and P. Larranaga. Predicting survival in malignant skin melanoma using Bayesian net-
works automatically induced by genetic algorithms. An empirical comparison between different
approaches. Artificial Intelligence in Medicine, 14, 215–230, 1998.
[64] F.V. Jensen, Introduction to Bayesian Networks. University College of London, UK, 1996.
[65] P. Larranaga, M.Y. Gallego, B. Sierra, L. Urkola, and M.J. Michelena. Bayesian networks, rule
induction and logistic regression in the prediction of women survival suffering from breast cancer.
In Lecture Notes in Artificial Intelligence, Vol. 1323. Springer-Verlag, Heidelberg, 1997, pp. 303–308.
[66] R. Blanco, I. Inza, and P. Larranaga. Learning Bayesian networks in the space of structures by
estimation of distribution algorithms. International Journal of Intelligent Systems, 18, 205–220,
2003.
[67] C.A. Pena-Reyes and M. Sipper. Evolving fuzzy rules for breast cancer diagnosis. In Proceedings
of 1998 International Symposium on Nonlinear Theory and Applications (NOLTA ’98), Vol. 2.
Lausanne, Presses Polytechniques et Universitaires Romandes, pp. 369–372, 1998.
[68] C.A. Pena-Reyes and M. Sipper. A fuzzy-genetic approach to breast cancer diagnosis. Artificial
Intelligence in Medicine, 17, 131–155, 1999.
[69] C.J. Kahn, L. Roberts, K. Shaffer, and P. Haddawy. Construction of a Bayesian network for
mammographic diagnosis of breast cancer. Computers in Biology and Medicine, 27, 19–29, 1997.
[70] N. Marvin, M. Bower, and J.E. Rowe. An evolutionary approach to constructing prognostic models.
Artificial Intelligence in Medicine, 15, 155–165, 1999.
[71] S. Ngan, M.L. Wong, W. Lam, K.S. Leung, and J.C.Y. Cheng. Medical data mining using
evolutionary computation. Artificial Intelligence in Medicine, 16, 73–96, 1999.
[72] S. Ngan, M.L. Wong, and K.S. Leung. Using grammar based genetic programming for data mining
of medical knowledge. In Genetic Programming 1998: Proceedings of 3rd Annual Conference, Morgan
Kaufmann, San Mateo, CA, 1998, pp. 254–259.
[73] C.C. Bojarczuk, H.S. Lopes, and A.A. Freitas. Discovering comprehensible classification rules by
using genetic programming: A case study in a medical domain. In Proceedings of the Genetic and
Evolutionary Computation Conference (W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar,
M. Jakiela, and R.E. Smith, Eds.), Vol. 2, Orlando, FL. Morgan Kaufmann, San Mateo, CA, 1999,
pp. 953–958.
[74] A. Teller and M. Veloso. Program evolution for data mining. International Journal of Expert Systems,
8(3), 213–236, 1995.
[75] B. Marchesi, A.L. Stelle, and H.S. Lopes. Detection of epileptic events using genetic program-
ming. In Proceedings of the 19th International Conference IEEE/EMBS, pp. 1198–1201. IEEE Press,
Washington, 1997.
[76] J.R. Sherrah, R.E. Bogner, and A. Bouzerdoum. The evolutionary pre-processor: Automatic feature
extraction. In Genetic Programming: Proceedings of 2nd Annual Conference, Morgan Kaufmann,
San Mateo, CA, 1997, pp. 304–312.
[77] N.I. Nikolaev and V. Slavov. Inductive genetic programming with decision trees. In Proceedings of
the 1997 European Conference on Machine Learning (ECML), 1997.
[78] L. Martin, F. Moal, and C. Vrain. A relational data mining tool based on genetic programming.
In Principles of Data Mining and Knowledge Discovery: Proceedings of 2nd European Symposium
(LNAI), Vol. 1510 Springer-Verlag, Heidelberg, 1998, pp. 130–138.
[79] M.V. Fidelis, H.S. Lopes, and A.A. Freitas. Discovering comprehensible classification rules with a
genetic algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation (CEC00),
La Jolla, CA. IEEE Press, Washington, 2000, pp. 805–810.
17.1 Introduction
Multiprocessor scheduling is one of the most challenging problems in parallel and distributed
computing [1]. It is known to be NP-complete in its general form [2]. Researchers have studied
restricted forms of the problem by constraining either a parallel program model or a multiprocessor
model. However, these special cases do not fully represent real-world systems. To solve the schedul-
ing problem in the general case, a number of heuristics based on different mathematical platforms and
metaheuristics based on mechanisms observed in nature, have been introduced. The commonly known
heuristics are list scheduling, critical path, or clustering [3,4]. Recently metaheuristics such as simulated
annealing (SA), genetic algorithms (GAs), ant colonies, tabu search (TS), or neural networks have been
successfully applied [5,6].
The main problem related to heuristics and metaheuristics is the scheduling overhead, that is, the cost
of running the scheduler. For metaheuristics, on which we focus our attention, the main source of this
overhead is the need to calculate a cost function in each iteration of the scheduling algorithm. Another
source is the neglect of potential knowledge about the scheduling problem that could be gained while
solving its instances: most scheduling algorithms do not extract, conserve, or reuse any knowledge about
the problem while solving instances of it.
The motivation of our work is to develop a framework for designing scheduling algorithms where
knowledge about scheduling process can be extracted and reused while solving new instances of the
scheduling problem. For this purpose we propose to use a recently emerged and very promising hybrid
technique combining cellular automata (CAs) and GA, and a computational paradigm of artificial immune
system (AIS) [7].
CAs are discrete dynamical systems made up of a large number of cells, which behave according to
local rules. It is an interesting feature of these systems that although cells interact only locally, in a
fully distributed manner, a complex global behavior can emerge. For this reason CAs are often used to
model real-world phenomena [8]. CAs are also considered as models of highly parallel and distributed
computations in multiprocessor and distributed systems [9]. They are used to find solutions of problems
such as scheduling and resource management [10].
The main problem related to CAs is a huge space of local CA rules representing possible solutions of
a problem. Therefore, most applications of CAs were a result of clever, but time-consuming hand designing
rather than an oriented search. Only recent works [11,12] on applying evolutionary computation to search
CA rules opened new possibilities. Results described in the literature show that CAs, combined with
evolutionary techniques for discovering local rules, can be effectively used to find parallel and distributed
solutions of complex global problems such as density classification task and synchronization task [11,12],
and location management in mobile computing [13] or cryptography [14]. Recently it has been shown
[15] that such a hybrid technique can be applied to discover scheduling algorithms. In this chapter,
we extend this methodology and additionally propose to use AIS as a support for reusing discovered CAs
scheduling rules to solve new instances of the scheduling problem.
The rest of this chapter is organized as follows. Section 17.2 presents the scheduling problem.
Section 17.3 provides a background on CAs. Section 17.4 contains the description of the proposed
CA-based scheduling system. Section 17.5 contains experimental results concerning CAs applied to
scheduling in two-processor systems. Section 17.6 describes AIS for reusing knowledge conserved in
CA rules while solving new instances of the scheduling problem. Section 17.7 presents the extension of the proposed approach to multiprocessor systems consisting of more than two processors.
Finally, Section 17.8 contains the conclusions.
FIGURE 17.1 Examples of a (a) system graph and (b) precedence task graph.
of the system graph corresponding to the two-processor system. Figure 17.1(b) shows an example of the
program graph consisting of four tasks with their order numbers from 0 to 3. All communication costs of
the program graph are equal to one. Computational costs of tasks are 1, 2, 4, and 2, respectively.
The purpose of the scheduling is to distribute the tasks among the processors in such a way that the
precedence constraints are preserved and the response time T (the total execution time) is minimized.
The optimal schedule found is usually represented by a Gantt chart showing the allocation of tasks to processors and the times when each task starts and finishes execution.
The response time T for a given precedence task graph depends on the allocation of tasks in the multiprocessor topology and on the scheduling policy [15], which defines the order of processing tasks ready to run on a given processor.
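To make the cost of evaluating a schedule concrete, the following minimal sketch computes the response time T for a given allocation of tasks on two processors. The task graph and the lowest-index-first list-scheduling policy used here are illustrative assumptions in the spirit of Figure 17.1(b), not the chapter's exact setup.

def response_time(alloc, cost, edges, comm):
    """alloc[i] in {0, 1}; cost[i] = computation time of task i;
    edges = (u, v) precedence pairs of a DAG; comm[(u, v)] = transfer
    cost, paid only when u and v run on different processors."""
    preds = {i: [] for i in range(len(cost))}
    for u, v in edges:
        preds[v].append(u)
    finish = {}                    # task -> finish time
    proc_free = [0.0, 0.0]         # next free time of each processor
    done, pending = set(), set(range(len(cost)))
    while pending:
        # ready tasks: all predecessors already finished
        ready = sorted(t for t in pending if all(p in done for p in preds[t]))
        t = ready[0]               # simple policy: lowest task index first
        p = alloc[t]
        # data from a predecessor on the other processor arrives after a delay
        data_ready = max((finish[u] + (comm[(u, t)] if alloc[u] != p else 0)
                          for u in preds[t]), default=0.0)
        start = max(proc_free[p], data_ready)
        finish[t] = proc_free[p] = start + cost[t]
        done.add(t)
        pending.remove(t)
    return max(finish.values())    # response time T of the schedule

# Hypothetical 4-task graph with the costs quoted above
cost = [1, 2, 4, 2]
edges = [(0, 1), (0, 2), (1, 3), (2, 3)]
comm = {e: 1 for e in edges}
print(response_time([0, 1, 0, 0], cost, edges, comm))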
condition means that the leftmost cell is considered to be the right neighbor of the rightmost cell and vice versa. The null boundary condition means that "absent" cells are always in state 0.
The behavior of a CA is often illustrated using "space-time diagrams," in which the configuration of states in the n-dimensional lattice is plotted as a function of time (in most cases space-time diagrams are practical only for n ≤ 2).
The purpose of the learning mode (see Figure 17.3, left) is to discover CA rules for scheduling. Randomly generated allocations of program graph tasks onto the system graph serve as initial states of the CA, which then runs according to a rule from the GA population of rules. The GA searches for a CA rule that is able to evolve to a final configuration corresponding to an optimal or suboptimal value of the response time T for the input allocation of the program tasks. The algorithm for discovering CA rules with the use of a GA [11,18] is presented below.
BEGIN
  create an initial population of rules of size P;
  FOR l = 1 TO G DO
  BEGIN
    create a set of size I of test problems;
    FOR i = 1 TO P DO
    BEGIN
      Ti* = 0;
      FOR j = 1 TO I DO
        Ti* = Ti* + CA(rule_i, test_j, CA mode, M steps);
      Ti* = Ti* / I;
    END;
    sort current population of rules according to Ti*;
    move E of the best individuals to the next population;
    FOR i = 1 TO (P − E)/2 DO
    BEGIN
      rule1_parent = select();
      rule2_parent = select();
      (rule1_child, rule2_child) = crossover(rule1_parent, rule2_parent);
      mutation(rule1_child, rule2_child);
    END;
  END;
END;
The GA begins with a population of P randomly generated CA rules. The length of a rule is calculated as $L = k^{2r+1}$. We then create a set (different for each generation of the GA) of size I of test problems (ICs of the CA) by generating a set of random allocations of tasks in the multiprocessor system for a given instance of the problem. Each rule i is tested on the whole set of test problems. The function CA() returns, for a rule i, the response time $T_i^j$ obtained by the CA running M steps under a given mode of CA (i.e., seq, seq–ran, or par) on a test problem j. In our experiments, we set M ≈ 4 × Np. M can be calculated by some heuristic based on many observations of the behavior of CAs. $T_i^j$ is calculated once, after the last step of running the CA, when
FIGURE 17.3 Three modes of the CA-based scheduling system: learning mode (left), normal operating mode (middle), and AIS (right).
the CA final configuration of states corresponds to a final allocation of tasks. As the fitness of rule i, a value $T_i^*$ is accepted, which is the value of $T_i^j$ averaged over the number I of solved tests, that is, $T_i^* = \sum_{j=1}^{I} T_i^j / I$.
After calculation of the fitness function for all rules, GA operators are used. A number E of the best rules (the "elite") is copied without modification to the next generation. The remaining P − E rules for the next generation are formed by a two-point crossover operator applied to randomly chosen pairs of elite rules, which are then mutated with probability $p_m$. This process is continued over a predefined number of generations G, and when it is completed the discovered rules are stored.
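The GA loop just described can be sketched as follows. The bit-string rule encoding and the functions run_ca and make_test, standing for the CA simulation and test-problem generation, are placeholders for the chapter's setup rather than its actual implementation.

import random

def evolve_rules(run_ca, make_test, L, P=50, E=15, I=10, G=100, M=160, pm=0.03):
    pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(P)]
    for _ in range(G):
        tests = [make_test() for _ in range(I)]    # fresh ICs each generation
        # fitness: response time T averaged over the test set (lower is better)
        scored = sorted(pop, key=lambda r: sum(run_ca(r, t, M) for t in tests) / I)
        nxt = scored[:E]                           # elite copied unchanged
        while len(nxt) < P:
            p1, p2 = random.sample(scored[:E], 2)  # parents drawn from the elite
            a, b = sorted(random.sample(range(L), 2))
            c1 = p1[:a] + p2[a:b] + p1[b:]         # two-point crossover
            c2 = p2[:a] + p1[a:b] + p2[b:]
            for c in (c1, c2):                     # bit-flip mutation with prob pm
                nxt.append([1 - g if random.random() < pm else g for g in c])
        pop = nxt[:P]
    return pop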
In the normal operating mode (see Figure 17.3, middle), when a program graph is initially randomly allocated, the CA is initialized and equipped with a rule taken from the set of discovered rules. In this mode we expect that for any initial allocation of tasks of a given program graph, the CA will be able to evolve very fast, without time-consuming calculation of T, to a configuration corresponding to the minimal or near-minimal value of T.
The third mode, AIS (see Figure 17.3, right), enables the potential reuse of discovered knowledge stored in CA rules. This is particularly important when rescheduling is attempted. The concept of the AIS, along with experimental results, is presented in Section 17.6.
FIGURE 17.4 Program graphs: (a) g 18, (b) g 40, (c) tree15, and (d) gauss18.
experiments were performed on a PC with a 745 MHz Celeron processor. We used CAs with a radius of neighborhood r ∈ {1, 2, 3}, which corresponds to rule lengths of 8, 32, and 128, respectively. We used the following parameters in the experiments: a rule population size P = 50 to 200, an elite size E = 15 to 100, a test-problem set size I = 10 to 50, a number of CA steps M = 4 × Np (Np — the number of tasks), and a mutation probability pm = 0.03. In this section, we describe in detail the results obtained for the following deterministic program graphs: g 18, g 40, tree15, and gauss18 (see Figure 17.4) and for the randomly generated program graph called Rnd25_5.
The first program graph used in the experiments is tree15. It is a binary tree consisting of 15 tasks. All computational and communication costs are the same and equal to one. The optimal response time T for tree15 in the two-processor system is equal to nine. Experiments have shown that for program graphs from the tree family it is sufficient to use r = 1 to discover scheduling rules. Rules are discovered in a few generations of the GA for all three modes of the CA, that is, seq, par, and seq–ran. Figure 17.5 shows typical runs of the CA-based scheduler with the best rule from the final generation, starting from a randomly generated initial configuration.
The next experiment was conducted with the program graph g 40 (see, e.g., Reference 15), which is composed of 40 tasks, with computational and communication costs equal to four and one, respectively. The optimal response time T for g 40 in the two-processor system is equal to 80. The minimal value of r which allows the learning mode to converge to the optimal value of T for g 40 is r = 2. Figure 17.6(a) shows a typical run of the scheduling system in the learning mode. One can see that for two modes of updating
FIGURE 17.5 Space-time diagrams of CA-based scheduler for tree15: (a) sequential mode, (b) parallel mode, and
(c) sequential mode with a random order of updating cells’ states.
FIGURE 17.6 Running CA-based scheduling system for program graph g 40: (a) learning mode and (b) normal
operating mode.
CA rules, seq and par, the system discovers rules converging the CA to a configuration corresponding to the optimal value of T = 80, while for the seq–ran mode the system converges to rules providing a suboptimal value of T. This takes fewer than 20 generations of the GA.
After the run of the system in the learning mode, the final population of the GA contains rules suitable for CA-based scheduling. We can assess the quality of these rules in the normal operating mode. We generate 1000 random ICs (allocations of program tasks) and use them as test problems to be scheduled by each of the rules found. Figure 17.6(b) presents the average value of T for each rule over the whole set of test problems. One can see that about 60% of the discovered rules are able to find an optimal schedule for each test problem.
FIGURE 17.7 Space-time diagrams of CA-based scheduler for g 40: (a) CA in seq mode, (b) par mode, and
(c) seq–ran mode.
The scheduling in this mode is performed automatically, very quickly, without calculation of the cost function T, and can be observed by watching a space-time diagram of the CA.
Figures 17.7(a)–(c) show space-time diagrams for typical runs of the CA-based scheduler working with the best rule found for g 40, performing scheduling in the normal operating mode for a single, randomly generated test problem, with the CA working in seq, par, and seq–ran modes, respectively. The left part of each figure presents a space-time diagram of the CA consisting of 40 cells, and the right part shows graphically the values of T corresponding to the allocations found in a given step. Let us consider Figure 17.7(a). One can see that in step 0 (the first row), the cells of the CA are in some initial states corresponding to the initial allocation of tasks (a white cell — the corresponding task is allocated to P0; a black cell — the task is allocated to P1), and the value of T corresponding to this allocation is equal to 94. Then the CA starts to run, changing its states sequentially according to its rule, which results in changing values of T. Each subsequent row of the figure shows the situation after changing the states of the 40 cells. One can see that the CA needs 38 steps to converge to the tasks' allocation corresponding to the minimal value of T = 80. During this run of the CA no calculation of a cost function is performed. The 38 steps of the CA require 38 × 40 = 1520 elementary time steps, in each of which a single look-up into a simple table is executed to update the state of one cell; this corresponds to a very low computational cost (see Table 17.2).
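The elementary update described here can be sketched as follows, assuming binary cell states, a cyclic boundary, and a rule stored as a bit table of length 2^(2r+1). This illustrates the look-up mechanism; it is not the authors' code.

def ca_step_seq(cells, rule, r):
    """One sequential CA step: cells are updated one after another, each
    update being a single look-up into the rule table, indexed by the
    states of the 2r+1 cells in the neighborhood."""
    n = len(cells)
    for i in range(n):
        idx = 0
        for j in range(i - r, i + r + 1):   # read the current neighborhood
            idx = (idx << 1) | cells[j % n]
        cells[i] = rule[idx]                # one elementary table look-up
    return cells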
Figure 17.7(b) shows a space-time diagram of the CA-based scheduler working in par mode. In this case the CA converges to the optimal value of T after 84 steps, which is equivalent to 84 time steps, because the 40 look-up-table updates are performed in parallel in each step. Figure 17.7(c) presents a typical run of the CA-based scheduler working in seq–ran mode, with a suboptimal value of T equal to 81 found in 20 steps of the CA.
The next program graph used in the experiments, referred to as gauss18 (see, e.g., Reference 15), represents the parallel Gaussian elimination algorithm consisting of 18 tasks. The optimal response time T for this program graph in the two-processor system is equal to 44. Despite the lower number of tasks when compared with g 40, gauss18 is much more difficult for the learning mode because of its nonregular
FIGURE 17.8 Learning mode of the scheduling system: (a) for deterministic program graph gauss18 and (b) random
graph Rnd25_5.
structure. The minimal value of r which allows gauss18 to converge in the learning mode to the optimal value of T is r = 3, which results in a 128-bit-long CA rule and a large space of possible rules.
Figure 17.8(a) shows typical runs of the GA in the learning mode for the three modes of operating the CA. One can see that only for the seq mode of the CA is the GA able to find, after about 600 generations, optimal CA rules. However, in the normal operating mode, the best-discovered rule needs only 18 steps to schedule, so the performance in this mode is similar to the performance shown for g 40.
Figure 17.9(a) shows a typical run of the CA-based scheduler working in sequential mode with the best rule from the final generation, starting from a randomly generated IC. One can see that the CA needs fewer than 20 time steps to converge to the tasks' allocation corresponding to the minimal value of T = 44. Figure 17.9(b) shows a space-time diagram of the CA-based scheduler working in parallel mode. In this case the CA converges to a suboptimal value of T = 46. Figure 17.9(c) presents a typical run of the CA-based scheduler working in sequential mode with a random order of updating the cells' states, with a value of T = 51.
The last experiment presented in this section was performed on the randomly generated program graph called Rnd25_5 (with 17 tasks). The optimal response time T for Rnd25_5 in the two-processor system is equal to 94. In the learning mode r = 3 is required. Figure 17.8(b) shows typical runs of the GA for the three modes of operating the CA. For the seq mode, the GA discovers optimal CA rules after about 50 generations. For the par mode, it takes more than 150 generations. For the seq–ran mode, the GA does not discover optimal CA scheduling rules. Figure 17.10 shows typical runs of the CA-based scheduler with the best rule from the final generation, starting from a randomly generated IC.
T = 51
FIGURE 17.9 Space-time diagrams of CA-based scheduler for gauss18: (a) sequential mode, (b) parallel mode, and
(c) sequential mode with a random order of updating cells’ states.
T = 100
FIGURE 17.10 Space-time diagrams of CA-based scheduler for Rnd25_5: (a) sequential mode, (b) parallel mode,
and (c) sequential mode with a random order of updating cells’ states.
FIGURE 17.11 Space-time diagrams of CA-based scheduler for g 36: sequential mode.
of operating a CA were used: sequential and parallel mode. The experiments conducted have shown that the best rules obtained for the program graph g 18 can successfully find an optimal schedule for each representative of the test set. These solutions are obtained without a process of discovering CA rules and without using the GA to find optimal schedules. This means that the time required to find optimal solutions for the tested program graphs is significantly reduced.
Figure 17.11 shows a typical run of the CA-based scheduler, working in sequential mode, for the program graph g 36, with the best rule discovered for the program graph g 18. The optimal response time T for the program graph g 36 is equal to 92.
The next question that arises is whether the discovered rules are sensitive to modifications of the program graph. To find out, the discovered rules were used in the normal operating phase (assuming the sequential mode of the CA) to find solutions for some other program graphs. These graphs were constructed from g 18 by introducing some random modifications to it. These modifications included changing the values of the weights of some randomly chosen tasks or (and) edges. In this way we obtained 30 new program graphs. In most of the cases the best discovered rules were able to find an optimal (or near-optimal) schedule for each representative of the test set. These solutions are obtained without a process of discovering rules and without using the GA to find optimal schedules.
The last experiment had a more general character than the two previously described. We wanted to know whether the discovered rules are capable of scheduling other program graphs. We took the whole populations of rules discovered for the following program graphs: tree7, intree7, g 18, g 40, gauss18, and Rnd25_5. These populations of rules were discovered assuming r = 3 and the sequential mode of the CA. The course of the experiment was identical to that of the experiments described earlier. The populations of rules discovered for these program graphs were tested (in the normal operating mode, assuming I = 1000) on the other program graphs in the following way: "each population of rules on each program graph." The results obtained by the best rules are presented in Table 17.3. These are typical results obtained in a given test. The last row of Table 17.3 contains the optimal response times T for the tested program graphs.
The results presented in Table 17.3 indicate that in some cases rules discovered for a given program graph are capable of scheduling other program graphs. The best rules in the tested populations can be used to obtain optimal (or near-optimal) schedules of the program graphs tree7 and intree7. This fact is perhaps not surprising, because tree-structured program graphs are the easiest instances. More interesting is that rules discovered for tree-structured program graphs are suitable for scheduling the program graphs g 18 and g 40. On the other hand, rules discovered for gauss18 and Rnd25_5 perform worse on the other program graphs (with the exception of trees) than the other tested rules.
The presented results indicate that discovered CA rules store knowledge about a way of solving instances of the scheduling problem. Typically, rules specialize in solving the scheduling problem for a specific program graph; some of them are "more general" than others, but their ability to schedule other program graphs is limited. These observations lead to the concept of an AIS, which would support reusing the knowledge stored in discovered CA rules.
rules for a new program graph: first, we choose from the library some number of the best rules (those which best "recognize" the new program graph); then we add to these rules some number of random rules.
An initial population of rules is created as follows. First, we run the CA-based system in the normal operating mode, applying the stored rules from the library to evaluate the schedules T proposed by them. As shown earlier (see Table 17.2), this mode of work is characterized by a very low computational cost. Next, we select as the initial population of the AIS the d · P best rules, where d ∈ (0, 1). The remaining (1 − d) · P rules are randomly generated. Finally, the whole population of AIS rules is evolved by the GA in the learning mode.
It is important to underline that all rules in the library have the same length (they were discovered assuming r = 3). This enables applying genetic operators to rules from different program graphs.
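A minimal sketch of this initialization is shown below; evaluate_rule stands in, as an assumption, for running the scheduler in the normal operating mode with a given rule and returning the resulting T.

import random

def ais_initial_population(library, evaluate_rule, P, d=0.6, L=128):
    """Rank library rules by the schedule quality they produce (lower T is
    better), keep the d*P best, and fill the rest with random rules."""
    ranked = sorted(library, key=evaluate_rule)
    kept = ranked[:int(d * P)]                       # d*P best library rules
    randoms = [[random.randint(0, 1) for _ in range(L)]
               for _ in range(P - len(kept))]        # (1-d)*P random rules
    return kept + randoms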
FIGURE 17.14 Normal operating phase for Rnd25_1g 18: (a) population of rules discovered for Rnd25_1,
(b) population of rules discovered for g 18.
FIGURE 17.15 Learning mode for Rnd25_1g 18: (a) randomly generated initial population (b) AIS with population
of rules from the library.
scheduling system in the standard learning mode with randomly created CA rules. We set P = 200, E = 50, and pm = 0.03. We can see (Figure 17.15[a]) that during 200 generations the system can discover only rules providing suboptimal solutions.
Finally, we ran the AIS in the learning mode. The initial population of AIS rules was composed of 60% rules from the library, after evaluating them in the normal operating mode, with the remaining rules randomly generated. We can see (Figure 17.15[b]) that the behavior of the AIS is different when compared with the behavior of the system in the standard learning mode. The AIS, which uses the knowledge stored in the rule library, can discover the optimal CA scheduling rule after about 60 generations.
Figure 17.16 shows a typical run of the CA-based scheduler with the best-discovered rule, starting from a randomly generated IC. One can see that in step 0 the cells of the CA are in states corresponding to the initial allocation of tasks, and the value of T corresponding to this allocation is >656. Then the CA starts to change its states sequentially, which results in changing values of T. One can see that the CA needs <70 time steps to converge to the tasks' allocation corresponding to the minimal value of T = 541.
FIGURE 17.16 Space-time diagram of the CA-based scheduler with the best-discovered rule, converging to T = 541.
Other experiments conducted have shown the advantage of the AIS operating on rules from the library: it significantly accelerates the process of discovering rules for new program graphs that were partially presented to the system earlier.
FIGURE 17.17 Learning mode for g 40: the best rule, the average of the elite (av. el.), and the average of the population (av. pop.) of the response time T vs. generation.
FIGURE 17.18 Space-time diagrams of CA-based scheduler for g 40 and three processors: sequential mode.
FIGURE 17.19 Space-time diagrams of CA-based scheduler for g 40 and four processors: sequential mode.
FIGURE 17.20 Space-time diagrams of CA-based scheduler for g 40 and eight processors: sequential mode.
17.8 Conclusions
In this chapter we have presented the results of research on developing a CA-based scheduler working in sequential and parallel modes. They show that the GA can discover CA rules suitable for solving the scheduling problem for a given instance of the problem. This phase of the algorithm is called the learning phase. In this phase, knowledge about solving a given instance of the scheduling problem is extracted and coded into CA rules. The discovered rules are used in the normal operating phase by the CA-based scheduler for automatic scheduling.
However, the main effort was directed to the question of the potential reuse of discovered CA rules. To solve this problem we proposed an immune system approach implemented by a GA, which matches the knowledge currently available in discovered and stored CA rules with new instances of the scheduling problem and quickly recognizes familiar building blocks in these instances.
The proposed hybrid technique opens very promising possibilities for developing parallel and distributed scheduling algorithms and reducing their complexity.
Future work will include expanding the immune system concept. Perhaps it is possible to keep in the library not whole chromosomes, as in the current approach, but only selected genes. This might help in further reducing the complexity of the algorithm.
References
[1] J. Błażewicz, K.H. Ecker, G. Schmidt, and J. Wȩglarz, Scheduling in Computer and Manufacturing
Systems, Springer-Verlag, Heidelberg, 1994.
[2] M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness. W.H. Freeman and Company, San Francisco, CA, 1979.
[3] H. El-Rewini, T.G. Lewis, and H.H. Ali, Task Scheduling in Parallel and Distributed Systems. PTR
Prentice Hall, Englewood Cliffs, NJ, 1994.
[4] Y.K. Kwok and I. Ahmad, Benchmarking the task graph scheduling algorithms. In Proceedings of
1998 IPPS/SPDP Symposium, Orlando, FL, 1998, pp. 531–537.
[5] S. Saleh and A.Y. Zomaya, Multiprocessor scheduling using mean-field annealing. In Parallel and
Distributed Processing, Vol. 1388 of Lecture Notes in Computer Science Springer-Verlag, Heidelberg,
1998, pp. 288–296.
[6] L. Wang, H.J. Siegel, V.P. Roychowdhury, and A.A. Maciejewski, Task matching and scheduling in heterogeneous computing environments using a genetic-algorithm-based approach. Journal of Parallel and Distributed Computing, 47, 8–22, 1997.
[7] L.N. de Castro and J. Timmis, Artificial Immune Systems: A New Computational Intelligence
Approach, Springer-Verlag, Heidelberg, 2002.
[8] A. Schoneveld, Parallel Complex Systems Simulation, Ph.D. thesis, University of Amsterdam, Holland, 1999 (http://www.science.uva.nl/research/pscs/papers/phd.html).
[9] M. Mitchell, Computation in cellular automata. In T. Gramb, S. Bornholdt, M. Grob, M. Mitchell, and T. Pellizzari (Eds.), Non-Standard Computation, Wiley-VCH, Weinheim, Federal Republic of Germany, 1998, pp. 95–140.
[10] B.J. Overeinder, Distributed Event-driven Simulation — Scheduling Strategies and Resource Management, Ph.D. thesis, University of Amsterdam, Holland, 2000 (http://www.science.uva.nl/research/pscs/papers/phd.html).
[11] R. Das, M. Mitchell, and J.P. Crutchfield, A genetic algorithm discovers particle-based computa-
tion in cellular automata. In Y. Davidor, H.-P. Schwefel, and R. Männer (Eds.), Parallel Problem
Solving from Nature — PPSN III, Vol. 866 of Lecture Notes in Computer Science. Springer-Verlag,
Heidelberg, 1994, pp. 344–353.
[12] M. Sipper, Evolution of Parallel Cellular Machines. The Cellular Programming Approach, LNCS
1194, Springer-Verlag, Heidelberg, 1997.
[13] R. Subrata and A.Y. Zomaya, Evolving cellular automata for location management in mobile
computing. IEEE Transactions on Parallel and Distributed Systems, 14, 13–26, 2003.
[14] M. Tomassini, M. Sipper, and M. Perrenoud, On the generation of high-quality random numbers
by two-dimensional cellular automata. IEEE Transactions on Computers, 49, 1140–1151, 2000.
[15] F. Seredyński and A.Y. Zomaya, Sequential and parallel cellular automata-based scheduling
algorithms. IEEE Transactions on Parallel and Distributed Systems, 13, 1009–1023, 2002.
[16] S. Wolfram, Universality and complexity in cellular automata. Physica D, 10, 1–35, 1984.
[17] F. Seredyński, Scheduling tasks of a parallel program in two-processor systems with use of cellular
automata, Future Generation Computer Systems, 14, 351–364, 1998.
[18] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag,
Heidelberg, 1992.
[19] D.H. Wolpert and W.G. Macready, No free lunch theorems for optimization. IEEE Transactions on
Evolutionary Computation, 1, 67–82, 1997.
[20] J.D. Farmer, N.H. Packard, and A.S. Perelson, The immune system, adaptation, and machine
learning. Physica D, 22, 187–204, 1986.
[21] S. Forrest, B. Javornik, R. Smith, and A.S. Perelson, Using genetic algorithms to explore pattern
recognition in the immune system. Evolutionary Computation, 1, 191–211, 1993.
[22] A.S. Perelson, Immune network theory. Immunological Review, 110, 5–36, 1989.
[23] E. Hart and P. Ross, An immune system approach to scheduling in changing environments. In
W. Banzhaf et al. (Eds.), GECCO-99: Proceedings of the Genetic and Evolutionary Computation
Conference. Morgan Kaufmann, San Mateo, CA, 1999, pp. 1559–1566.
[24] A.A. Khan, C.L. McCreary, and M.S. Jones, A Comparison of multiprocessor scheduling heuristics.
In Proceedings of International Conference on Parallel Processing, Vol. II, 1994, pp. 243–250.
[25] M. Mitchell and S. Forrest, Genetic algorithms and artificial life. In Ch.G. Langton (Ed.), Artificial
Life. An Overview. The MIT Press, Cambridge, MA, 1995.
[26] F. Seredyński and A. Świȩcicka, Immune-like system approach to multiprocessor scheduling, In
R. Wyrzykowski, J. Dongarra, M. Paprzycki, and J. Wasniewski (Eds.), Parallel Processing and
Applied Mathematics, Vol. 2328, of Lecture Notes in Computer Science. Springer-Verlag, Heidelberg,
2002, pp. 626–633.
18.1 Introduction
A cellular automaton (CA) is a rule-based computing machine, first proposed by von Neumann in the early 1950s; systematic studies were pioneered by Wolfram in the 1980s. Since a cellular automaton is defined over space and time, it is essentially equivalent to a dynamical system that is discrete in both space and time. The evolution of such a discrete system is governed by certain updating rules rather than differential equations. Although the updating rules can take many different forms, most common cellular automata use relatively simple rules (von Neumann, 1966; Wolfram, 1983). On the other hand, equation-based systems, such as systems of differential equations and partial differential equations, also describe temporal evolution in a domain, and differential equations can take different forms describing various systems. A natural question is then: what is the relationship between a rule-based system and an equation-based system? Given differential equations, how can one construct a rule-based cellular automaton, or vice versa? There has been a substantial amount of research in these areas in the past two decades. This chapter intends to summarize results on the relationships among cellular automata, partial differential equations (PDEs), and pattern formation.
Class 1: The first class of cellular automata always evolves, after a finite number of steps and from almost all initial states, to a homogeneous state in which every cell is in the same state. This is something like a fixed-point equilibrium in a dynamical system.
Class 2: Periodic structures with a fixed number of states occur in the second class of cellular automata.
Class 3: Aperiodic or "chaotic" structures appear from almost all possible initial states in this type of cellular automata.
Class 4: Complex patterns with localized spatial structure propagate in space as time evolves. Eventually, these patterns evolve to either homogeneous or periodic states. It has been suggested that this class of cellular automata may be capable of universal computation.
Cellular automata can be formulated in higher dimensions such as 2D and 3D. One of the most popular and most interesting 2D cellular automata using relatively simple updating rules is Conway's Game of Life. Each cell has only two states (k = 2), which can be 0 and 1. With a radius of r = 1 in the 2D case, each cell has eight neighbors, so the new state of each cell depends on a total of nine cells surrounding it. The boundary cells are treated as periodic. The updating rules are: if a cell is currently alive (state 1) and two or three of its neighbors are alive, then it is alive at the next time step; if a cell is currently not alive (state 0) and exactly three of its neighbors are alive, its next state is alive; the next state is not alive in all other cases. It has been suggested that this simple automaton may have the capability of universal computation. There are many existing computer programs, such as Life in Matlab, and screen savers on all computer platforms such as Windows and Unix.
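A compact sketch of one Game of Life step under exactly these rules (periodic boundaries, states 0/1):

def life_step(grid):
    """A live cell with two or three live neighbors survives; a dead cell
    with exactly three live neighbors becomes alive; all others die."""
    n, m = len(grid), len(grid[0])
    nxt = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            live = sum(grid[(i + di) % n][(j + dj) % m]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di, dj) != (0, 0))
            nxt[i][j] = 1 if live == 3 or (grid[i][j] == 1 and live == 2) else 0
    return nxt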
In general, such updating rules can be written in the form
$$\phi_{ij}^{t+1} = G\left(\sum_{\alpha=-r}^{r}\sum_{\beta=-r}^{r} a_{\alpha\beta}\,\phi_{i+\alpha,\,j+\beta}^{t}\right), \qquad (i, j = 1, 2, \ldots, N),$$
where $a_{\alpha\beta}$ ($\alpha, \beta = \pm 1, \pm 2, \ldots, \pm r$) are the coefficients. Cellular automata with fixed rules defined in this way are deterministic cellular automata. In contrast, there exists another type, namely the stochastic cellular automata, which arise naturally from stochastic models of natural systems (Guinot, 2002; Yang, 2003).
is reversible, since for any function g(u) one can compute u(t + 1) from u(t) and u(t − 1), and invert u(t − 1) from u(t) and u(t + 1). The automaton rule for 2D reversible automata can be similarly constructed as
$$u_{i,j}^{t+1} = g(u_{i,j}^{t}) - u_{i,j}^{t-1},$$
together with appropriate boundary conditions such as fixed-state boundary conditions (Margolus, 1984).
values; numerical computation on a computer always leads to discrete values due to the limited bits of processors and round-off. Similarly, cellular automata are also about the evolution of state variables with a finite number of values on a regular grid of cells at discrete time steps. If the number of states of a cellular automaton is comparable with that of the related finite difference equation, then we can expect the results to be comparable.
To demonstrate this, we choose the 1D heat equation
$$\frac{\partial T}{\partial t} = \kappa \frac{\partial^2 T}{\partial x^2},$$
where T is the temperature and κ is the thermal diffusivity. This is a mathematical model that is widely used to simulate many phenomena. The temperature T(x, t) is a real-valued function, and it is continuous for any time t > 0 whatever the initial conditions. In reality, it is impossible to measure the temperature at a mathematical point; the temperature is always the average temperature in a finite representative volume over a certain short time. Mathematically, one can obtain a closed-form solution with infinite accuracy in the domain, but physically the temperature is only meaningful at certain macroscopic levels. No matter how accurate the solution may be at very fine scales, it would be meaningless to try to use the solution at the atomic or subatomic levels, where quantum mechanics comes into play and the solution for temperature is invalid (Toffoli, 1984). Thus, numerical computation is very useful even though it yields finite discrete values.
The simplest discretization of the above heat equation uses a central difference for the spatial derivative and a forward scheme for the time derivative, and we have
$$T_i^{n+1} - T_i^{n} = \frac{\kappa\,\Delta t}{(\Delta x)^2}\left(T_{i+1}^{n} - 2T_i^{n} + T_{i-1}^{n}\right),$$
where i and n are the spatial and time indices. If we choose the time step and spatial discretization such that $\kappa\,\Delta t/(\Delta x)^2 = 1$, we have
$$T_i^{t+1} = \left(T_{i+1}^{t} + T_i^{t} + T_{i-1}^{t}\right) - 2T_i^{t},$$
which is something like the "mod 2" cellular automaton, or Wolfram's cellular automaton with rule 150 (Wolfram, 1984a; Weimar, 1997).
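Both readings of this stencil can be sketched in a few lines: the continuous-valued finite-difference update, and the mod-2 (rule 150) variant for binary states. The cyclic boundary and the function names are assumptions made for compactness.

def heat_step(T):
    """Finite-difference update with kappa*dt/dx**2 = 1:
    T_i(t+1) = T_{i+1} - T_i + T_{i-1} (cyclic boundary)."""
    n = len(T)
    return [T[(i + 1) % n] - T[i] + T[(i - 1) % n] for i in range(n)]

def rule150_step(cells):
    """Same three-cell stencil restricted to states {0, 1}, summed mod 2:
    Wolfram's rule 150."""
    n = len(cells)
    return [(cells[(i - 1) % n] + cells[i] + cells[(i + 1) % n]) % 2
            for i in range(n)]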
Cellular automata obtained this way are very similar to the finite difference method. If the state variables are discrete, they are exactly finite-state cellular automata. However, one can use continuous-valued state variables for such simulations, in which case they are continuous-valued cellular automata derived from differential equations, as studied by Rucker's group and used in their well-known CAPOW program (Rucker, 2003; Ostrov and Rucker, 1996). In some sense, continuous-valued cellular automata based on differential equations are the same as finite difference methods, but there are some subtle differences and advantages of cellular automata over finite difference simulations due to the CA's parallel nature, its artificial-life-oriented emphasis on experiment and observation, and the use of genetic algorithms for searching the large phase space of rules, as proposed by Rucker et al. (1998).
where u(x, y, t) is the state variable that evolves with time in a 2D domain, and the function f(u) can be either linear or nonlinear. D is a constant depending on the properties of the diffusion. This equation can also be considered as the vector form of a system of reaction-diffusion equations if we let $D = \mathrm{diag}(D_1, D_2)$ and $u = [u_1\ u_2]^{T}$. The discretization of this equation can be written as
$$\frac{u_{i,j}^{n+1} - u_{i,j}^{n}}{\Delta t} = D\left[\frac{u_{i+1,j}^{n} - 2u_{i,j}^{n} + u_{i-1,j}^{n}}{(\Delta x)^2} + \frac{u_{i,j+1}^{n} - 2u_{i,j}^{n} + u_{i,j-1}^{n}}{(\Delta y)^2}\right] + f(u_{i,j}^{n}).$$
By choosing $\Delta t = \Delta x = \Delta y = 1$, we have
$$u_{i,j}^{n+1} = D\left[u_{i+1,j}^{n} + u_{i-1,j}^{n} + u_{i,j+1}^{n} + u_{i,j-1}^{n}\right] + f(u_{i,j}^{n}) + (1 - 4D)\,u_{i,j}^{n},$$
which can be generalized as
$$u_{i,j}^{t+1} = \sum_{k,l=-r}^{r} a_{k,l}\, u_{i+k,j+l}^{t} + f(u_{i,j}^{t}),$$
where the summation is over the 4r + 1 neighborhood. This is a finite-state cellular automaton with the coefficients $a_{k,l}$ determined from the discretization of the governing equations; for this special case, we have $a_{-1,0} = a_{+1,0} = a_{0,-1} = a_{0,+1} = D$, $a_{0,0} = 1 - 4D$, and r = 1.
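A sketch of this finite-state update with exactly the five-point coefficients given above; the periodic boundary and the passed-in reaction term f are assumptions of the sketch.

def rd_step(u, D, f):
    """One update of u[i][j] with a(+-1,0) = a(0,+-1) = D, a(0,0) = 1 - 4D,
    plus the (possibly nonlinear) reaction term f."""
    n, m = len(u), len(u[0])
    return [[D * (u[(i + 1) % n][j] + u[(i - 1) % n][j]
                  + u[i][(j + 1) % m] + u[i][(j - 1) % m])
             + (1 - 4 * D) * u[i][j] + f(u[i][j])
             for j in range(m)] for i in range(n)]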
Consider now the 1D wave equation
$$\frac{\partial^2 u}{\partial t^2} = c^2 \frac{\partial^2 u}{\partial x^2},$$
where c is the wave speed. The simplest central difference scheme leads to
$$\frac{u_i^{n+1} - 2u_i^{n} + u_i^{n-1}}{(\Delta t)^2} = c^2\,\frac{u_{i+1}^{n} - 2u_i^{n} + u_{i-1}^{n}}{(\Delta x)^2}.$$
By choosing $\Delta t = \Delta x = 1$ and writing t = n, it becomes
$$u_i^{t+1} = c^2\left[u_{i+1}^{t} + u_{i-1}^{t}\right] + 2(1 - c^2)\,u_i^{t} - u_i^{t-1}.$$
This has the form
$$u_i^{t+1} + u_i^{t-1} = g(u^t),$$
which is reversible under certain conditions. This property comes from the reversibility of the wave equation, because it is invariant under the transformation t → −t.
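A sketch of the resulting second-order update; stepping the pair (previous state, current state) forward makes the reversibility explicit, since the same relation determines the past state from the two later ones. The function name and cyclic boundary are assumptions of the sketch.

def wave_step(u_prev, u, c2):
    """One step of the wave-equation automaton with c2 = c**2:
    u(t+1) = c2*(u_{i+1} + u_{i-1}) + 2*(1 - c2)*u_i - u(t-1)."""
    n = len(u)
    u_next = [c2 * (u[(i + 1) % n] + u[(i - 1) % n])
              + 2 * (1 - c2) * u[i] - u_prev[i] for i in range(n)]
    return u, u_next        # the new (previous, current) pair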
Consider the noisy Burgers equation
$$\frac{\partial u}{\partial t} = 2u\frac{\partial u}{\partial x} + \frac{\partial^2 u}{\partial x^2} + \nabla \upsilon,$$
where υ is a noise that is uncorrelated in space and time, so that $\langle \upsilon(x,t) \rangle = 0$ and $\langle \upsilon(x,t)\,\upsilon(x_0,t_0) \rangle = 2D\,\delta(x-x_0)\,\delta(t-t_0)$ (Emmerich and Kahng, 1998). This equation with Gaussian white noise can be rewritten as
$$\frac{\partial u}{\partial t} + \xi\frac{\partial u}{\partial x} = 2u\frac{\partial u}{\partial x} + \frac{\partial^2 u}{\partial x^2} + \eta,$$
where both ξ and η are uncorrelated. By introducing the variables $v_i^t = c\,\exp(\Delta x\, u_i^t)$, $\phi_i^t = \beta \ln(v_i^t)$, $\alpha = \Delta t/(\Delta x)^2$, $(1-2\alpha)/c\alpha = \exp(-A/\beta)$, $c^2 = \exp(B/\beta)$, $\xi = \exp(\cdot)$, $\eta = \exp(\cdot)$, and after some straightforward calculations in the limit as β tends to zero, we have the automaton rule
$$\phi_i^{t+1} = \phi_{i-1}^{t} + \max\left[0,\ \phi_i^{t} - A,\ \phi_i^{t} + \phi_{i+1}^{t} - B,\ \xi_i^{t} - \phi_{i-1}^{t}\right] - \max\left[0,\ \phi_{i-1}^{t} - A,\ \phi_{i-1}^{t} + \phi_i^{t} - B,\ \xi_i^{t} - \phi_{i-1}^{t}\right].$$
This forms a generalized probabilistic cellular automaton, referred to as the noisy Burgers cellular automaton. The Burgers equation without noise usually evolves into a shock wave; in the presence of noise, the states of the probabilistic cellular automaton may be taken as discrete reminders of those shock waves that have been disorganized.
FIGURE 18.2 Diagram of eight neighbors and the positions of Omohundro’s functions.
taken as
$$S_1(x) = \begin{cases} e^{-1/(\beta^2 - x^2)} & (-\beta < x < \beta),\\ 0 & (|x| \ge \beta), \end{cases}$$
$$S_2(x) = \frac{S_1(x\beta/\alpha)}{S_1(0)}, \qquad S_3(x) = \int_{-\beta}^{x} S_1(x')\,dx' \Big/ \int_{-\beta}^{\beta} S_1(x')\,dx',$$
$$S_4(x) = S_3(x - \alpha/2)\,S_3(\alpha/2 - x), \qquad S_5(x) = S_3(x - \alpha)\,S_3(\alpha - x),$$
$$S_6(x) = \sum_{k=-\infty}^{\infty} S_4(x - k), \qquad S_7(x) = \sum_{k=-\infty}^{\infty} S_5(x - k), \qquad S_8(x) = \sum_{k=-\infty}^{\infty} S_2(x - k),$$
where γ is a large constant. Differential equations for the other variables can be written in a similar manner, although they are more complicated (Omohundro, 1984).
$$\frac{\partial v}{\partial t} = D\nabla^2 v + f(v) + \varepsilon\sqrt{v},$$
where v is the concentration of live sites (i.e., sites with $u_i(t) = 1$), and ε is a zero-mean Gaussian random variable with unit variance (Ahmed and Elgazzar, 2001). However, there is a nonuniqueness associated with the formulation of differential equations from cellular automata. Bagnoli et al. (2001) demonstrated that the above equation can be obtained from the following rules, with $s = u_{i+1}(t) + u_i(t) + u_{i-1}(t)$: (1) $u_i(t+1) = 0$ if s = 0; (2) $u_i(t+1) = 1$ with probability $p_1$ if s = 1; (3) $u_i(t+1) = 1$ with probability $p_2$ if s = 2; (4) $u_i(t+1) = 1$ if s = 3. This nonuniqueness in the relationship between cellular automata and PDEs requires more research.
FIGURE 18.3 Pattern formation in cellular automata: (a) 1D CA with disordered initial conditions; (b) CA for 1D
wave equation; and (c) nonlinear 1D Sine-Gordon equation.
of its formation is essential in many processes such as biological pattern formation, enzyme dynamics, percolation, and other processes in engineering applications (Turing, 1952; Flake, 1998; Cappuccio et al., 2001; Boffetta et al., 2002). This section focuses on pattern formation in cellular automata and a comparison of CA results with results obtained using differential equations.
Consider the following reaction-diffusion system:
$$\frac{\partial u}{\partial t} = D_u \nabla^2 u + f(u, v), \qquad \frac{\partial v}{\partial t} = D_v \nabla^2 v + g(u, v),$$
$$f(u, v) = \alpha(1 - u) - uv^2, \qquad g(u, v) = uv^2 - \frac{(\alpha + \beta)v}{1 + (u + v)},$$
where $D_u = 0.05$. The parameters $\gamma = D_v/D_u$, α, and β can be varied so as to produce complex patterns. With slight modifications this system can model many systems, such as enzyme dynamics and biological pattern formation (Murray, 1989; Meinhardt, 1982, 1995; Keener and Sneyd, 1998; Yang, 2003). Figure 18.4 shows a snapshot of the patterns formed by simulations of the above reaction-diffusion system at t = 500 for γ = 0.6, α = 0.01, and β = 0.02. The right plot is a comparison of the results obtained by three different methods: cellular automata (marked CA), the finite difference method (FD), and the finite element method (FE). The plot shows the data on the middle line of the pattern shown on the left.
FIGURE 18.4 A snapshot of pattern formation of the reaction-diffusion system (for α = 0.01, β = 0.02, and
γ = 0.6) and the comparison of results obtained by three different methods (CA-dotted, FD-solid, FE-dashed)
through the middle line of the pattern on the left.
FIGURE 18.5 Pattern formations from 2D random initial conditions with α = 0.05, β = 0.01, and γ = 0.5 (left)
and 3D structures with α = 0.1, β = 0.05, and γ = 0.36 (right).
Another example is the formation of spiral waves, studied in detail by Barkley and his colleagues (Barkley et al., 1990; Margerit and Barkley, 2001):
$$\frac{\partial u}{\partial t} = \nabla^2 u + \frac{u}{\varepsilon^2}(1 - u)\left(u - \frac{v + \beta}{\alpha}\right), \qquad \frac{\partial v}{\partial t} = \varepsilon \nabla^2 v + (u - v),$$
where ε, α, and β are parameters. Figure 18.6 shows the formation of spiral waves (2D) and scroll waves (3D) in this nonlinear system under appropriate conditions.
The patterns formed in terms of diffusion-reaction equations have been observed in many phenomena.
The ring, spots, and stripes exist in animal skin coating, enzymatic reactions, shell structures, and mineral
formation. The spiral and scroll waves and spatiotemporal pattern formations are observed in calcium
transport, Belousov–Zhabotinsky reaction, cardiac tissue, and other excitable systems.
In this chapter, we have discussed some of the important developments and research results concerning the connection between cellular automata and partial differential equations, as well as the pattern formation related to both systems. Cellular automata are rule-based methods with the advantages of local interactions, homogeneity, discrete states, and parallelism, and thus they are suitable for simulating systems with large numbers of degrees of freedom and life-related phenomena such as artificial intelligence and ecosystems.
FIGURE 18.6 Formation of spiral wave for α = 1, β = 0.1, and ε = 0.2 (left) and 3D scroll wave for α = 1, β = 0.1,
and ε = 0.15 (right).
PDEs are continuum-based models with the advantages of mathematical methods and closed-form analytical solutions developed over many years; however, they usually deal with systems with small numbers of degrees of freedom. There is an interesting connection between cellular automata and PDEs, although it is not always straightforward and sometimes may be very difficult to establish. The derivation of updating rules for cellular automata from the corresponding PDEs is relatively straightforward using finite differencing schemes, while the formulation of differential equations from a cellular automaton is usually difficult and nonunique. More studies are needed in these areas. In addition, both rule-based systems and equation-based systems can exhibit complex pattern formation under appropriate conditions, and these spatiotemporal patterns can simulate many phenomena in engineering and biological applications.
References
Ahmed, E. and Elgazzar, A.S. On some applications of cellular automata. Physica A, 296 (2001) 529–538.
Bagnoli, F., Boccara, N., and Rechtman, R. Nature of phase transitions in a probabilistic cellular automaton
with two absorbing states. Physical Review E, 63 (2001) 461161–461169.
Barkley, D., Kness, M., and Tuckerman, L.S. Spiral-wave dynamics in a simple model of excitable media:
The transition from simple to compound rotation. Physical Review A, 42 (1990) 2489–2492.
Boffetta, G., Cencini, M., Falcioni, M., and Vulpiani, A. Predictability: A way to characterize complexity.
Physics Reports, 356 (2002) 367–474.
Cappuccio, R., Cattaneo, G., Erbacci, G., and Jocher, U. A parallel implementation of a cellular automata
based on the model for coffee percolation. Parallel Computing, 27 (2001) 685–717.
Emmerich, H. and Kahng, B.N. A random automata related to the noisy Burgers equation. Physica A, 259
(1998) 81–89.
Flake, G.W. The Computational Beauty of Nature. MIT Press, Cambridge, MA (1998).
Guinot, V. Modelling using stochastic, finite state cellular automata: Rule inference from continuum
models. Applied Mathematical Modelling, 26 (2002) 701–714.
Keener, J. and Sneyd, J. Mathematical Physiology. Springer-Verlag, New York (1998).
Margerit, D. and Barkley, D. Selection of twisted scroll waves in three-dimensional excitable media.
Physical Review Letters, 86 (2001) 175–178.
Margolus, N. Physics-like models of computation. Physica D, 10 (1984) 81–95.
Meinhardt, H. Models of Biological Pattern Formation. Academic Press, London (1982).
Meinhardt, H. The Algorithmic Beauty of Sea Shells. Springer-Verlag, New York (1995).
Murray, J.D. Mathematical Biology. Springer-Verlag, New York (1989).
Omohundro, S. Modelling cellular automata with partial differential equations. Physica D, 10 (1984)
128–134.
Ostrov, D. and Rucker, R. Continuous-valued cellular automata for nonlinear wave equations. Complex
Systems, 10 (1996) 91–117.
Rucker, R. Continuous-valued cellular automata in two dimensions. In Griffeath, D. and Moore, C. (Eds.),
New Constructions in Cellular Automata. Oxford University Press, (2003). Website for CAPOW98
Software: http://www.cs.sjsu.edu/faculty/rucker/capow/
Toffoli, T. Cellular automata as an alternative to (rather than an approximate of) differential equations in
modelling physics. Physica D, 10 (1984) 117–127.
Turing, A. The chemical basis of morphogenesis. Philosophical Transactions of the Royal Society of London B,
237 (1952) 37–72.
Vichniac, G.Y. Simulating physics with cellular automata. Physica D, 10 (1984) 96–116.
von Neumann, J. Theory of Self-Reproducing Automata (Edited by A.W. Burks). University of Illinois Press, Urbana, IL (1966).
Weimar, J.R. Cellular automata for reaction-diffusion systems. Parallel Computing, 23 (1997) 1699–1715.
Wolfram, S. Statistical mechanics of cellular automata. Reviews of Modern Physics, 55 (1983) 601.
Wolfram, S. Cellular automata as models of complexity. Nature, 311 (1984a) 419–424.
Wolfram, S. Universality and complexity in cellular automata. Physica D, 10 (1984b) 1–35.
Wolfram, S. Cellular Automata and Complexity. Addison-Wesley, Reading, MA (1994).
Yang, X.S. Characterization of multispecies living ecosystems with cellular automata. In Standish, Abbass,
and Bedau (Eds.), Artificial Life VIII. MIT Press, Cambridge, MA (2002), pp. 138–141.
Yang, X.S. Turing pattern formation of catalytic reaction-diffusion systems in engineering applications.
Modelling and Simulation in Materials Science and Engineering, 11 (2003) 321–329.
The complex social behaviors of ants have been much studied by science, and computer scientists
are now finding that these behavior patterns can provide models for solving difficult combinatorial
optimization problems.
19.1 Introduction
An ant colony is a “distributed system” that has a highly structured social organization in spite of the
simplicity of its individuals. Because of this organization, an ant colony is capable of accomplishing
complex natural tasks that far exceed the individual capacity of a single ant. It is not surprising, therefore,
that computer scientists have taken inspiration from ants and their behavior in order to design algorithms
for solving computationally demanding problems. This chapter is concerned with applying these ideas to
solve an important, but NP-hard, combinatorial optimization problem: the so-called mesh-partitioning
problem.
Many of the problems that arise in mechanical, civil, automobile, and aerospace engineering can be
expressed in terms of partial differential equations and solved by using the finite-element method. If a
partial differential equation involves a function, f , then the purpose of the finite-element method is to
determine an approximation to f . To do this the domain is discretized into a set of geometrical elements
consisting of nodes: a process known as meshing. The value of f is then computed for each of these nodes,
and the solutions for the other points are interpolated from these values [1].
FIGURE 19.1 Mesh partitioning: (a) sample mesh; (b) mesh with induced graph; (c) after graph partitioning; and
(d) the resulting partitioned mesh.
However, in real-world engineering problems, meshing is a demanding task because meshes usually
have large numbers (e.g., hundreds of thousands) of elements. For this reason, the finite-element method
is usually parallelized, which means the mesh is partitioned and distributed among several processors:
a process known as mesh partitioning. To achieve high computational efficiency, it is important that the
mesh partitioning produces workloads that are well balanced and that the interprocessor communication
is minimized. This is a combinatorial optimization problem and is a special case of the well-known
graph-partitioning problem (see Figure 19.1).
The graph-partitioning problem is defined as follows. Let G(V, E) be an undirected graph consisting of a nonempty set V of vertices and a set E ⊆ V × V of edges. A k-partition D of G comprises k mutually disjoint subsets $D^1, D^2, \ldots, D^k$ (called domains) of V, whose union is V. The set of edges that connect different domains of a partition D is called an edge-cut and is denoted by ξ(D). A partition D is considered to be balanced if the sizes of the domains are roughly the same, that is, if
Source: Adapted from Dorigo, M., Bonabeau, E., and Therauluz, G. Future Generation Computer Systems, 16,
858, 2000. With permission from Elsevier.
annealing [7], neural networks [8], and genetic algorithms [9]. These methods are widely applicable and have proven to be very powerful in practice [10].
Ant colonies are yet another natural phenomenon that has recently given rise to new optimization methods belonging to the group of "Heuristics from Nature," which are implemented in some new bio-inspired algorithms [11,12]. Ant-colony optimization has already been used to solve various combinatorial optimization problems, such as the traveling salesman problem [13], quadratic assignment [14], job-shop scheduling [15], vehicle routing [16], and network routing [17]. More references to applications of ant-colony optimization can be found in References 18 and 19 (see Table 19.1). For the case of mesh partitioning, however, only a few attempts have been made [20–23].
paths between food sources and their nest [24]. Ants lay a chemical (pheromone) trail as they walk, and this trail attracts other ants to take the path that has the most pheromone. This reinforcement process results in the selection of the shortest path: the first ants coming back to the nest are those that took the shortest path twice (from the nest to the food source and back), so that immediately after these ants have returned, more pheromone is present on the shortest path than on the longer paths, stimulating nest mates to choose the shortest path.
Ant-colony optimization algorithms (see Figure 19.2) are based on a parameterized probabilistic model (the pheromone model) that is used to model the chemical pheromone trails. Artificial ants incrementally construct solutions by adding opportunely defined solution components to the partial solution under consideration. To do this, artificial ants perform randomized walks on a completely connected graph G(C, L), called a construction graph, whose vertices are the solution components C and whose edges form the set L of connections. When a constrained combinatorial optimization problem is considered, the problem constraints are built into the ants' constructive procedure in such a way that in every step of the construction process only feasible solution components can be added to the current partial solution.
Ant_activity(): In the construction phase an ant incrementally builds a solution by adding solution components to the partial solution constructed so far. The probabilistic choice of the next solution component to be added is made by means of transition probabilities. More specifically, ant n in step t moves from vertex i ∈ C to vertex j ∈ C with a probability given by:
$$p_{ij,n}(t) = \begin{cases} \dfrac{\tau_{ij}^{a}(t)\,\eta_{ij}^{b}}{\sum_{m \in N_{i,n}} \tau_{im}^{a}(t)\,\eta_{im}^{b}} & j \in N_{i,n},\\ 0 & j \notin N_{i,n}, \end{cases}$$
where ηij is a priori available heuristic information, a and b are two parameters that determine the
relative influence of the pheromone trail τij (t ) and heuristic information, respectively, and Ni,n is the
feasible neighborhood of vertex i. If a = 0, then only heuristic information is considered. Similarly, if
b = 0, then only pheromone information is at work. Once an ant builds a solution, or while a solution
is being built, the pheromone is being deposited (on nodes or connections) according to the evaluation
of a (partial) solution. This pheromone information will direct the search of the ants in the following
iterations. The solution construction ends when an ant comes to the ending vertex (where the food is
located).
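The transition rule above can be sketched as a weighted random choice; tau, eta, and the feasible neighborhood are assumed to be supplied by the surrounding ACO machinery.

import random

def choose_next(i, neighborhood, tau, eta, a=1.0, b=2.0):
    """Pick the next vertex j with probability proportional to
    tau[i][j]**a * eta[i][j]**b over the feasible neighborhood of i."""
    weights = [tau[i][j] ** a * eta[i][j] ** b for j in neighborhood]
    return random.choices(neighborhood, weights=weights)[0]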
Pheromone_evaporation(): Pheromone-trail evaporation is a procedure that simulates the reduction of pheromone intensity. It is needed in order to avoid too rapid a convergence of the algorithm to a suboptimal solution.
Daemon_actions(): Daemon actions can be used to implement centralized actions that cannot be
performed by single ants. Examples are the use of a local search procedure applied to the solutions built
by the ants, or the collection of global information that can be used to decide whether it is useful or not
to deposit additional pheromone to bias the search process from a nonlocal perspective.
As we can see from the pseudocode, the ScheduleActivities construct does not specify how the three included activities should be scheduled or synchronized. This means it is up to the programmer to specify how these procedures will interact (in parallel or independently).
Within the ACO metaheuristic framework the currently best-performing versions in practice are
Ant Colony System [13] and MAX–MIN Ant System [25]. Recently, researchers have been dealing
with finding similarities between ACO algorithms and Estimation of Distribution Algorithms [26,27].
Furthermore, connections between ACO algorithms and Stochastic Gradient–Descent algorithms are
shown in Reference 28.
food in the first possible place around the nest (e.g., in a clockwise direction). After an ant has dropped its
food, it starts a new round of foraging. Of course, ants can also gather food from other nest loci. When an
ant tries to pick up food from other nest loci it performs the same procedure as when foraging for food,
with the exception that when the food is too heavy to be picked up, the ant moves on instead of sending a
help signal. In this way the temporary solution is significantly improved.
As was mentioned above, there are some constraints that are imposed on the b-MACA algorithm.
The first is the colony’s storage-capacity constraint, which is implemented so that no single colony can
gather all the food into its nest, and to maintain the appropriate balance between domains. The second
constraint ensures that when the pheromone intensity of a certain cell drops below a fixed value, that
cell’s pheromone intensity is restored to the initial value. In this way we maintain a high exploration level.
Other constraints are as follows: only a limited number of vertices can be put in a single cell; each ant can carry only a limited number of pieces of food; and the food that is being brought back to the nest is a kind of tabu, that is, it is not available to other ants. A short tabu list consisting of the last m pieces of food that were moved helps the algorithm to escape from local minima.
FIGURE 19.4 The three phases of multilevel k-way mesh partitioning.
The edges that have not been collapsed are inherited by the new graph $G_{\ell+1}$, and the edges that have become duplicated are merged and their weights summed. Because of this inheritance, the total weight of the graph remains the same, and the total edge weight is reduced by an amount equal to the weight of the collapsed edges, which has no impact on the graph balance or the edge-cut. In the second part, the already-optimized partition (obtained with the algorithm b-MACA) of the graph $G_\ell$ is expanded: the optimized partition must be interpolated onto its parent graph $G_{\ell-1}$. Because of the simplicity of the coarsening in the first part, the interpolation itself is trivial. So, if a vertex $v \in V_\ell$ belongs to domain $D^i$, then after refinement the matched pair $u_1, u_2 \in V_{\ell-1}$ that represents v will also be in $D^i$. In this way the graph is expanded to its original size, and on every level $\ell$ of the expansion we run our basic ant-colony algorithm. This is referred to as the Multilevel Multiple Ant-Colony Algorithm (m-MACA) approach.
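The two multilevel operations can be sketched as follows, under the simplifying assumption that a coarsening pass pairs each vertex with at most one unmatched neighbor; building the coarse adjacency (merging duplicated edges with summed weights) is omitted for brevity, and the function names are illustrative.

def collapse(adj):
    """adj: {v: {u: weight}}. Pair each vertex with an unmatched neighbor
    (if any) and map both to one coarse vertex."""
    coarse_of, nxt = {}, 0
    for v in adj:
        if v in coarse_of:
            continue
        mate = next((u for u in adj[v] if u not in coarse_of), None)
        coarse_of[v] = nxt
        if mate is not None:
            coarse_of[mate] = nxt
        nxt += 1
    return coarse_of

def project(partition_coarse, coarse_of):
    """Each fine vertex inherits the domain of its coarse representative,
    exactly as in the interpolation step described above."""
    return {v: partition_coarse[c] for v, c in coarse_of.items()}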
Due to the large graphs and the increased number of levels, the number of vertices in a single cell
increases rapidly. To overcome this problem we introduced a method called bucket sort that accelerates
and improves the algorithm’s convergence by choosing the most “promising” vertex from the cell. The
bucket sort, which was first introduced by Fiduccia and Mattheyses [32], has become an essential tool
for the efficient and rapid sorting and adjustment of vertices in terms of their gain. The basic idea
is that all the vertices with a particular gain g are put together in a “bucket” ranked g . In this way
the problem of finding a vertex with maximum gain is converted into finding the nonempty bucket
with the highest rank, and then picking a vertex from it. If a chosen vertex migrates from one domain
to another, only its gain and the gains of all its neighbors have to be recalculated and put back into
appropriate buckets. In our implementation each bucket is represented by a double-linked list of vertices.
Because of the multilevel process, it often happens that the potential gain values are dispersed over a wide
range. For this reason we have introduced the 2–3 tree. With this we avoided large and sparse arrays of
pointers. We store the nonempty buckets in the 2–3 tree, so each leaf in the tree represents a bucket.
For even faster searching we have made one 2–3 tree for each colony on every cell that has vertices on
it (see Figure 19.5). With this we have increased the speed of the search, as well as the add and delete
operations.
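To make the bucket idea concrete, the following sketch (in Python; not the authors' implementation) keeps one deque of vertices per gain rank; a plain dictionary of nonempty ranks stands in for the 2–3 tree, and deque stands in for the double-linked list:

from collections import deque

class GainBuckets:
    # Vertices grouped by gain rank; the nonempty ranks (here: dict keys)
    # play the role of the leaves of the 2-3 tree in the chapter.

    def __init__(self):
        self.buckets = {}      # gain -> deque of vertices ("bucket ranked g")
        self.gain_of = {}      # vertex -> its current gain

    def insert(self, v, gain):
        self.buckets.setdefault(gain, deque()).append(v)
        self.gain_of[v] = gain

    def remove(self, v):
        g = self.gain_of.pop(v)
        self.buckets[g].remove(v)
        if not self.buckets[g]:
            del self.buckets[g]        # keep only nonempty buckets

    def pop_max(self):
        # Finding a maximum-gain vertex = finding the nonempty bucket with
        # the highest rank, then picking a vertex from it.
        g = max(self.buckets)
        v = self.buckets[g].popleft()
        if not self.buckets[g]:
            del self.buckets[g]
        del self.gain_of[v]
        return v, g

    def migrate(self, v, new_gain):
        # After a vertex migrates between domains, only its own gain and
        # its neighbors' gains are recalculated and re-bucketed.
        self.remove(v)
        self.insert(v, new_gain)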
FIGURE 19.5 A 2–3 tree of nonempty buckets (ranked −7 to 6 in this example); each leaf is a bucket (e.g., the bucket ranked 6) holding a double-linked list of vertices.
Y = {y i : i = 1, 2, . . . , k}.
Each vector y i is called a codeword and the set of all the codewords is called a codebook. Associated with
each codeword y i is a nearest-neighbor region called the Voronoi region, and it is defined by:
υ_i = {x ∈ IR^n : ‖x − y_i‖ ≤ ‖x − y_j‖, ∀ j ≠ i}.
f (x, y_i) = ε(x, y_i) − ξ_i − β_i ,
where ε(x, y_i) represents a function that calculates the Euclidean distance between x and y_i, ξ_i represents the change in the edge-cut if x belonged to the i-th domain, and β_i represents the difference between the number of vertices in the largest and the i-th domains.
Step 5: Compute the new set of codewords. We add up all the x_ij vectors in the i-th domain and divide the summation by the number m of input vectors in the domain:

y_i = (1/m) Σ_{j=1}^{m} x_ij ,
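A minimal NumPy sketch of one such iteration of plain vector quantization (nearest-codeword assignment followed by the Step 5 centroid update); the edge-cut term ξ_i and balance term β_i of the modified distortion f above are omitted for brevity:

import numpy as np

def vq_iteration(x, y):
    # x: (n, d) input vectors; y: (k, d) codewords. Returns updated codewords.
    # Assign each input vector to its nearest codeword (its Voronoi region).
    dists = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)   # (n, k)
    owner = dists.argmin(axis=1)
    # Step 5: each new codeword is the centroid of the vectors in its domain.
    y_new = y.copy()
    for i in range(len(y)):
        members = x[owner == i]
        if len(members):                 # keep the old codeword if empty
            y_new[i] = members.mean(axis=0)
    return y_new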
FIGURE: The input vectors (marked ×) are divided into k = 13 domains (Voronoi regions υ_1, υ_2, . . . , υ_13) and are represented with the codewords y_1, y_2, . . . , y_13.
FIGURE 19.8 User interface: (a) initial stage and (b) final stage.
Graph    Number of vertices |V |    Number of edges |E|
graph1a 50 86
graph2a 50 143
grid1c 252 476
grid1_dualc 224 420
graph3a 400 546
graph4a 400 1,006
netz4504_dualc 615 1,171
U1000.5c 1,000 2,394
U1000.10c 1,000 4,696
U1000.20c 1,000 9,339
ukerbe1_dualc 1,866 3,538
netz4504c 1,961 2,578
add20b 2,395 7,462
datab 2,851 15,093
grid2_dualc 3,136 6,112
grid2c 3,296 6,432
airfoilc 4,253 12,289
3eltb,c 4,720 13,722
ukb 4,824 6,837
add32b 4,960 9,462
ukerbe1c 5,981 7,852
airfoil_dualc 8,034 11,813
bcsstk33b 8,738 291,583
3elt_dualc 9,000 13,287
whitaker3b,c 9,800 28,989
crackb,c 10,240 30,380
wing_nodalb 10,937 75,488
fe_4elt2b 11,143 32,818
4eltb 15,606 45,878
bigc 15,606 45,878
fe_sphereb 16,386 49,152
ctib 16,840 48,232
whitaker3_dualc 19,190 8,581
crack_dualc 20,141 30,043
cs4b 22,499 43,858
big_dualc 30,269 44,929
a Randomly generated.
b Graph partitioning archive: www.gre.ac.uk/
∼c.walshaw/partition/.
c Graph collection: www.uni-paderborn.de/cs/ag-monien/
RESEARCH/PART/graphs.html.
Graph  k  ξ(D) of the algorithm(s) with 1% unbalance
graph1 2 15 0 14 0 14 0 13 2
4 32 12 25 1 25 1 25 1
graph2 2 35 0 33 0 33 0 33 0
4 77 9 57 1 57 1 57 1
graph3 2 45 8 41 0 40 0 41 0
4 85 4 82 0 76 0 79 0
graph4 2 208 10 181 0 187 0 182 0
4 331 2 309 0 310 0 316 2
Clearly, m-MACA performed very well. Notice that m-MACA is superior to the classical k-Metis and
Chaco algorithms. Notice also that MLSATS produced some results that have better ξ(D) but with much
higher β(D) than m-MACA.
The algorithm m-MACA also returned some solutions that are better than currently available (Winter
2003–2004) solutions in the Graph Partitioning Archive (Table 19.5). Furthermore, m-MACA is even
comparable with the combined evolutionary/multilevel scheme used in the JOSTLE Evolutionary
algorithm [37], which is currently the most promising mesh-partitioning algorithm.
3elt 2 124 0 98 0 90 0 90 0
4 258 0 252 0 225 2 212 4
3elt_dual 2 70 0 70 0 44 6 45 8
4 130 0 120 0 112 4 154 7
airfoil 2 82 1 85 1 74 1 81 1
4 182 1 179 1 176 2 190 3
airfoil_dual 2 60 0 40 0 37 0 40 14
4 111 1 84 1 80 7 110 7
big 2 242 0 165 0 141 0 139 0
4 416 1 405 1 354 11 382 14
big_dual 2 92 1 92 1 78 1 77 11
4 219 1 196 1 215 18 222 25
crack 2 209 0 206 0 184 0 196 2
4 457 0 458 0 377 6 371 9
crack_dual 2 130 1 101 1 80 25 87 1
4 228 1 201 2 169 3 164 9
grid1 2 26 0 20 0 18 0 18 0
4 48 0 40 0 38 0 38 0
grid1_dual 2 16 0 16 0 16 0 16 0
4 37 0 35 0 35 0 35 0
grid2 2 38 0 37 0 34 0 34 2
4 106 0 121 0 94 2 92 2
grid2_dual 2 35 0 32 0 32 0 32 0
4 99 0 91 0 90 2 96 4
netz4504 2 25 1 26 1 22 1 24 1
4 66 1 62 1 50 1 49 3
netz4504_dual 2 21 1 21 1 19 1 19 1
4 54 1 49 1 44 2 44 2
U1000.5 2 10 0 1 0 1 0 2 0
4 20 0 6 0 7 2 12 2
U1000.10 2 115 0 56 0 39 0 39 0
4 200 0 108 2 99 2 107 2
U1000.20 2 294 0 253 0 220 4 221 2
4 554 0 515 2 546 2 497 2
ukerbe1 2 30 1 28 1 27 1 28 1
4 82 1 64 1 63 2 61 1
ukerbe1_dual 2 25 0 25 0 22 0 22 0
4 56 1 51 1 52 3 48 1
whitaker3 2 135 0 128 0 126 16 127 0
4 439 0 424 0 383 0 383 2
whitaker3_dual 2 82 0 74 0 64 18 65 0
4 251 1 210 1 200 6 195 12
We partitioned each of the graphs into two and four domains (k = 2 and k = 4). Each score is the best obtained edge-cut ξ(D). It is important to mention that the balance β(D) was kept within 0.2% of |V |. The results of our experiment are shown in Table 19.6.
Table 19.6 shows that in most cases the best partition was obtained with the h-MACA algorithm.
Figure 19.9 shows the main drawback of the m-MACA. Here vertices are represented as lighter and darker dots, where the lighter dots belong to one domain and the darker dots to the other. The solid white line emphasizes the border between these two domains; the edges are again hidden. The final solution obtained by the m-MACA includes "islands" (i.e., sets of connected vertices) that belong to different domains. The islands are due to a bad initial partition and the inability of the m-MACA to merge the islands into homogeneous regions (each having only one border). This drawback is eliminated by using the VQ to obtain the initial partitions. In Figure 19.9(a) (m-MACA) one can see four such islands, whereas there are only two in Figure 19.9(b) (h-MACA).
19.4 Conclusions
The graph-partitioning problem is an important component of mesh partitioning in the domain-decomposition method. ACO is a metaheuristic approach for solving hard combinatorial optimization problems. The purpose of this chapter was to give the reader a basic knowledge of ACO, to investigate variants of the MACA for mesh partitioning, to suggest modifications to improve this algorithm, and to evaluate them experimentally.
The b-MACA performed very well on small- or medium-sized graphs (n < 500). With larger graphs,
which are often encountered in mesh partitioning, we had to use a multilevel or hybrid method to
produce results that were competitive with the results given by other algorithms. Both multilevel and
hybrid algorithms performed very well on almost all graphs. Both of them were quite similar in producing
the best results. The only difference was in the standard deviation of the results [39], which was in favor
of the hybrid method. The m-MACA and the h-MACA are very promising algorithms that need to be
thoroughly investigated.
An obvious improvement of our algorithm would be to merge these two methods into one. To do this
one could apply VQ to produce a starting partition, then coarsen the graph to some extent (to a much
smaller extent than in the original multilevel method), and then use the multilevel method to refine the partition previously obtained with VQ.
On the other hand, there are many possibilities for improving multilevel and, to some extent, hybrid algorithms. One possibility is in the mapping of the graph onto the grid: with a proper mapping, convergence and results can be improved. The use of a load-balancing method between levels would also be a very promising way to go. The next possibility is in determining which and how many vertices from a cell will be picked, and with what probability. Here, the Kernighan–Lin gain method [40] might be used. We could also add some daemon actions, like the min-cut algorithm, to improve solutions during the crossing from one level to another. And, finally, we could change the way the pheromone is evaporated, deposited, and restored.
There is a wide range of possibilities to be considered in the future. One of the most appealing is a
merger of the MACA with some other method through daemon actions and parallel implementation of
the MACA.
References
[1] Cook, R.D. et al. Concepts and Applications of Finite Element Analysis, 4th ed. John Wiley & Sons,
New York, 2001.
[2] Farhat, C. and Lesoinne, M. Automatic partitioning of unstructured meshes for the parallel
solution of problems in computational mechanics. International Journal for Numerical Methods in
Engineering, 36, 745, 1993.
[3] Shephard, M.S. et al. Parallel automated adaptive procedures for unstructured meshes. In
Special Course on Parallel Computing in Computational Fluid Dynamics (AGARD-R-807), AGARD,
Neuilly-sur-Seine, France, 1995, p. 6.1.
[4] Pothen, A., Simon, H.D., and Liou, K.P. Partitioning sparse matrices with eigenvectors of graphs. SIAM Journal on Matrix Analysis and Applications, 11, 430, 1990.
[5] Zomaya, A.Y. et al. Non-conventional computing paradigms in the new millennium. IEEE/AIP
Computing in Science and Engineering, 3, 82, 2001.
[6] Kadłuczka, P. and Wala, K. Tabu search and genetic algorithms for the generalized graph
partitioning problem. Control and Cybernetics, 24, 459, 1995.
[7] Tao, L. et al. Simulated annealing and tabu search algorithms for multiway graph partition,
Journal of Circuits, Systems and Computers, 2, 159, 1992.
[8] Bahreininejad, A., Topping, B.H.V., and Khan, A.I. Finite element mesh partitioning using neural
networks. Advances in Engineering Software, 27, 103, 1996.
[9] Żola, J. and Wyrzykowski, R. Application of genetic algorithm for mesh partitioning. In Proceedings
of the Workshop Parallel Numerics, Bratislava, Slovakia, 2000, p. 209.
[10] Blum, C. and Roli, A. Metaheuristics in combinatorial optimization: Overview and conceptual
comparison. ACM Computing Surveys, 35, 268, 2003.
[11] Colorni, A., Dorigo, M., and Maniezzo, V. Distributed optimization by ant colonies. In Proceedings
of the 1st European Conference on Artificial Life, Paris, France, 1991, p. 134.
[12] Dorigo, M., Maniezzo, V., and Colorni, A. The ant system: Optimization by a colony of cooperating
agents. IEEE Transactions on Systems Man and Cybernetics Part B, 26, 29, 1996.
[13] Dorigo, M. and Gambardella, L.M. Ant colony system: A cooperative learning approach to the
traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1, 53, 1997.
[14] Gambardella, L.M., Taillard, E., and Dorigo, M. Ant colonies for the quadratic assignment problem.
Journal of Operational Research Society, 50, 167, 1999.
[15] Teich, T. et al. A new ant colony algorithm for the job shop scheduling problem. In Beyer, H., Cantú-Paz, E., Goldberg, D., Parmee, I., Spector, L., and Whitley, D., Eds., Proceedings of the Genetic and Evolutionary Computation Conference, Morgan Kaufmann Publishers, San Francisco, CA, 2001, p. 803.
[16] Montemanni, R. et al. A new algorithm for a dynamic vehicle routing problem based on ant
colony system. In Proceedings of the 34th Annual Conference of the Italian Operations Research
Society, Venice, Italy, 2003, p. 140.
[17] Sim, K.M. and Sun, W.H. Multiple ant-colony optimization for network routing. In Proceedings of
the 1st International Symposium Cyber Worlds, Tokyo, Japan, 2002, p. 277.
[18] Dorigo, M. and Stützle, T. Ant Colony Optimization, MIT Press, Cambridge, MA, 2004.
[19] Dorigo, M., Bonabeau, E., and Theraulaz, G. Ant algorithms and stigmergy. Future Generation
Computer Systems, 16, 851, 2000.
[20] Korošec, P., Šilc, J., and Robič, B. An ant-colony-optimization approach to the mesh-partitioning
problem. In Parallel Numerics ’02, Trobec, R. et al., Eds., University of Salzburg and Jožef Stefan
Institute, 2002, p. 123.
[21] Korošec, P., Šilc, J., and Robič, B. A multilevel ant-colony-optimization algorithm for mesh
partitioning. International Journal of Pure and Applied Mathematics, 5, 143, 2003.
[22] Langham, A.E. and Grant, P.W. Using competing ant colonies to solve k-way partitioning problems
with foraging and raiding strategies, Lecture Notes in Computer Science, 1674, 621, 1999.
[23] Šilc, J., Korošec, P., and Robič, B. Combining vector quantization and ant-colony algorithm for
mesh-partitioning. Lecture Notes in Computer Science, 3019, 113, 2004.
[24] Deneubourg, J.-L. et al. The self-organizing exploratory pattern of the argentine ant. Journal of
Insect Behavior, 3, 159, 1990.
[25] Stützle, T. and Hoos, H.H. MAX–MIN ant system. Future Generation Computer Systems, 16, 889,
2000.
[26] Pelikan, M., Goldberg, D.E., and Lobo, F.G. A survey of optimization by building and using
probabilistic models. Computational Optimization and Applications, 21, 5, 2002.
[27] Zlochin, M. et al. Model-based search for combinatorial optimization: A critical survey. Annals of Operations Research, 131, 373, 2004.
[28] Meuleau, N. and Dorigo, M. Ant colony optimization and stochastic gradient descent. Artificial
Life, 8, 103, 2002.
[29] Handl, J. and Meyer, B. Improved ant-based clustering and sorting in a document retrieval interface.
Lecture Notes in Computer Science, 2439, 913, 2002.
[30] Barnard, S.T. and Simon, H.D. A fast multilevel implementation of recursive spectral bisection for
partitioning unstructured problems. Concurrency — Practice and Experience, 6, 101, 1994.
[31] Hendrickson, B. and Leland, R. A multilevel algorithm for partitioning graphs. In Proceedings of
the ACM/IEEE Conference on Supercomputing, San Diego, CA, 1995, p. 28.
[32] Fiduccia, C.M. and Mattheyses, R.M. A linear time heuristic for improving network partitions,
In Proceedings of 19th IEEE Design Automation Conference, Las Vegas, NV, 1982, p. 175.
[33] Linde, Y., Buzo, A., and Gray, R.M. An algorithm for vector quantizer design. IEEE Transactions on
Communications, 28, 84, 1980.
[34] Karypis, G. and Kumar, V. Multilevel k-way partitioning scheme for irregular graphs. Journal of
Parallel and Distributed Computing, 48, 96, 1998.
[35] Šilc, J., Korošec, P., and Robič, B. An experimental evaluation of modified algorithms for the
graph partitioning problem. In Proceedings of 17th International Symposium on Computer and
Information Science, Orlando, FL, October 28–30, CRC Press, Boca Raton, FL, 2002, p. 120.
[36] Banos, R. et al. Multilevel heuristic algorithm for graph partitioning. Lecture Notes in Computer
Science, 2611, 143, 2003.
[37] Soper, A.J., Walshaw, C., and Cross, M. A combined evolutionary search and multilevel approach to graph partitioning. In Proceedings of the Genetic and Evolutionary Computation Conference, Las Vegas, NV, 2000, p. 674.
[38] Walshaw, C. and Cross, M. Mesh partitioning: A multilevel balancing and refinement algorithm. SIAM Journal on Scientific Computing, 22, 63, 2000.
[39] Korošec, P., Šilc, J., and Robič, B. Solving the mesh-partitioning problem with an ant-colony
algorithm. Parallel Computing, 30, 785, 2004.
[40] Kernighan, B.W. and Lin, S. An efficient heuristic procedure for partitioning graphs. Bell System Technical Journal, 49, 291, 1970.
20.1 Introduction
In an organizational setting, a strategy consists of a choice of what activities the organization will perform,
and choices as to how these activities will be performed [1]. These choices define the strategic config-
uration of the organization. Recent work by Levinthal [2] and Rivkin [3] has recognized that strategic
configurations consist of interlinked individual elements (decisions), and have applied general models of
interconnected systems such as Kauffman’s NK model to examine the implications of this for processes of
organizational adaptation.
Following a long-established metaphor of adaptation as search [4], strategic adaptation is considered
in this study as an attempt to uncover peaks on a high-dimensional strategic landscape. Some strategic
configurations produce high profits, others produce poor results. The search for good strategic configura-
tions is difficult due to the vast number of strategic configurations possible, uncertainty as to the nature of
topology of the strategic landscape faced by an organization, and changes in the topology of this landscape
over time. Despite these uncertainties, the search process for good strategies is not blind. Decision-makers
receive feedback on the success of their current and historic strategies, and can assess the payoffs received
by the strategies of their competitors [5]. Hence, certain areas of the strategic landscape are illuminated.
Organizations do not exist in isolation but interact with, and receive feedback from their environment.
Their efforts at strategic adaptation are guided by social as well as individual learning. Good ideas dis-
covered by one organization disseminate over time. Particle swarm algorithms (PSAs) also emphasize
the importance of individual and social learning processes. Surprisingly, despite the parallels between
the learning processes in particle swarm algorithms and those in populations of organizations, as yet the
particle swarm metaphor has not been applied to the domain of organizational science. This chapter
describes a novel simulation model based on the particle swarm metaphor, and applies this to examine
the process of organizational adaptation. This study, therefore, extends the particle swarm metaphor into
the domain of organization science.
Under the swarm metaphor, a swarm of particles (entities) are assumed to move (fly) through an
n-dimensional space, typically looking for a function optimum. Each particle is assumed to have two
associated properties, a current position and a velocity. Each particle also has a memory of the best
location in the search space that it has found so far (pbest), and knows the best location found to date
by all the particles in the population (gbest). At each step of the algorithm, particles are displaced from
their current position by applying a velocity vector to them. The size and direction of this velocity is
influenced by the velocity in the previous iteration of the algorithm (simulates “momentum”), and the
current location of a particle relative to its pbest and gbest. Therefore, at each step, the size and direction
of each particle’s move is a function of its own history (experience), and the social influence of its peer
group. A number of variants of the PSA exist.
1. Initialize each particle in the population by randomly selecting values for its location and velocity
vectors.
2. Calculate the fitness value of each particle. If the current fitness value for a particle is greater than
the best fitness value found for the particle so far, then revise pbest.
3. Determine the location of the particle with the highest fitness and revise gbest if necessary.
4. For each particle, calculate its velocity according to Equation (20.1).
5. Update the location of each particle.
6. Repeat steps 2 to 5 until stopping criteria are met.
Each particle i has an associated current position in d-dimensional space x i , a current velocity vi , and a
personal best position yi . During each iteration of the algorithm, the location and velocity of each particle
is updated using Equations (20.1) to (20.4). Assuming a function f is to be maximized, that the swarm
consists of n particles, and that r1 , r2 are drawn from a uniform distribution in the range (0, 1), the velocity
update is described as follows:

v_i(t + 1) = W v_i(t) + c1 r1 (y_i − x_i(t)) + c2 r2 (ŷ − x_i(t)), (20.1)
where ŷ is the location of the global-best solution found by all the particles. A variant on the basic
algorithm is to use a local rather than a global version of gbest, and the term gbest is replaced by lbest.
In the local version, lbest is set independently for each particle, based on the best point found thus far
within a neighborhood of that particle’s current location.
In every iteration of the algorithm, each particle’s velocity is stochastically accelerated toward its previous
best position and toward gbest (or lbest). The weight-coefficients c1 and c2 control the relative impact of
pbest and gbest locations on the velocity of a particle. The parameters r1 and r2 ensure that the algorithm
is stochastic. A practical effect of the random coefficients r1 and r2 is that neither the individual nor the
social learning terms are always dominant.
Although the velocity update has a stochastic component, the search process is not random. It is guided
by the memory of past “good” solutions (corresponding to a psychological tendency for individuals to
repeat strategies that have worked for them in the past [15]), and by the global best solution found by all
particles thus far. W represents a momentum coefficient that controls the impact of a particle’s prior-
period velocity on its current-period velocity. Each component (dimension) of the velocity vector vi is
restricted to a range [−vmax , vmax ] to ensure that individual particles do not leave the search space. The
implementation of a vmax parameter can also be interpreted as simulating the incremental nature of most
learning processes [15]. The value of vmax is usually chosen to be k × xmax , where 0 < k < 1. Once
the velocity update for particle i is determined, its position is updated and pbest is updated if necessary,
x_i(t + 1) = x_i(t) + v_i(t + 1), (20.2)
After all particles have been updated, a check is made to determine whether gbest needs to be updated.
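The loop just described can be sketched as follows (parameter values such as W = 0.9 and c1 = c2 = 2.0 are illustrative assumptions; the chapter does not fix them):

import numpy as np

def psa_maximize(f, d, n=20, iters=1000, w=0.9, c1=2.0, c2=2.0,
                 xmax=1.0, k=0.5):
    vmax = k * xmax                              # velocity clamp, vmax = k * xmax
    x = np.random.uniform(-xmax, xmax, (n, d))   # step 1: random positions
    v = np.random.uniform(-vmax, vmax, (n, d))   # ... and velocities
    pbest = x.copy()
    pfit = np.array([f(p) for p in x])
    gbest = pbest[pfit.argmax()].copy()          # step 3: global best
    for _ in range(iters):
        r1, r2 = np.random.rand(n, d), np.random.rand(n, d)
        # Equation (20.1): momentum + individual + social learning terms
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -vmax, vmax)              # keep particles in the space
        x = x + v                                # Equation (20.2)
        fit = np.array([f(p) for p in x])
        improved = fit > pfit                    # step 2: revise pbest
        pbest[improved], pfit[improved] = x[improved], fit[improved]
        gbest = pbest[pfit.argmax()].copy()      # revise gbest if necessary
    return gbest, pfit.max()

# Example: maximize -sum(p^2), whose optimum is the origin.
best, best_fit = psa_maximize(lambda p: -np.sum(p**2), d=5)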
assume a finite number of states. If the number of states for each element is constant (S), the space of all possible configurations has N dimensions, and contains a total of ∏_{i=1}^{N} S_i possible configurations.
In Kauffman’s operationalization of this general framework [12], the number of states for each element
is restricted to two (0 or 1). Therefore the configuration of N elements can be represented as a binary
string. The parameter K , determines the degree of fitness interconnectedness of each of the N elements
and can vary in value from 0 to N − 1. In one limiting case where K = 0, the contribution of each of
the N elements to the overall fitness value (or worth) of the configuration are independent of each other.
As K increases, this mapping becomes more complex, until at the upper limit when K = N − 1, the fitness
contribution of any of the N elements depends both on its own state, and the simultaneous states of all
the other N − 1 elements, describing a fully connected graph.
If we let s_i represent the state of an individual element i, the contribution of this element (f_i) to the overall fitness (F) of the entire configuration is given by f_i(s_i) when K = 0. When K > 0, the contribution of an individual element to overall fitness depends both on its state and the states of the K other elements to which it is linked, f_i(s_i : s_i1, . . . , s_iK). A random fitness function (U(0, 1)) is adopted, and the overall fitness of each configuration is calculated as the average of the fitness values of each of its individual elements.
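A minimal sketch of such an NK fitness function under the conventions above (binary states, U(0, 1) fitness contributions, overall fitness the average of the element contributions); drawing the K linked elements uniformly at random is an assumption of this sketch:

import random

def make_nk_landscape(N, K, seed=0):
    rng = random.Random(seed)
    # For each element i, choose the K other elements it is linked to.
    links = [rng.sample([j for j in range(N) if j != i], K) for i in range(N)]
    tables = [{} for _ in range(N)]   # lazily filled random fitness tables

    def fitness(s):                   # s: tuple of N bits
        total = 0.0
        for i in range(N):
            key = (s[i],) + tuple(s[j] for j in links[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()   # f_i ~ U(0, 1)
            total += tables[i][key]
        return total / N              # average of the element contributions
    return fitness

F = make_nk_landscape(N=96, K=4)
print(F(tuple(random.getrandbits(1) for _ in range(96))))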
Altering the value of K affects the ruggedness of the described landscape, and consequently impacts the difficulty of search on this landscape [12,20]. The strength of the NK model in the context of this study is that, by tuning the value of K, it can be used to generate strategic landscapes (graphs) of differing degrees of local-fitness correlation (ruggedness).
The strategy of an organization is characterized as consisting of N attributes [2]. Each of these attributes
represents a strategic decision or policy choice, which an organization faces. Hence a specific strategic
configuration s is represented as a vector s_1, . . . , s_N, where each attribute can assume a value of 0 or 1 [3].
The vector of attributes represents an entire organizational form, hence it embeds a choice of markets,
products, method of competing in a chosen market, and method of internally structuring the organiza-
tion [3]. Good, consistent sets of strategic decisions (configurations) correspond to peaks on the strategic landscape.
The definition of an organization as a vector of strategic attributes finds resonance in the work of
Porter [1,25], where organizations are conceptualized as a series of activities forming a value-chain.
The choice of what activities to perform, and subsequent decisions as to how to perform these activities,
defines the strategy of the organization. The individual attributes of an organization’s strategy interact. For
example, the value of an efficient manufacturing process is enhanced when combined with a high-quality
sales force. Differing values for K correspond to varying degrees of payoff-interaction among elements of
the organization’s strategy [3]. As K increases, the difficulty of the task facing strategic decision-makers is
magnified. Local-search attempts to improve an organization’s position on the strategic landscape become
ensnared in a web of conflicting constraints.
Although our simulator embeds all of the above, in this chapter we report results that consider the first
three of these factors. We note that this model bears a passing resemblance to the eleMentals model of [16],
which combined a swarm algorithm and an NK landscape to investigate the development of culture and
intelligence in a population of hypothetical beings called eleMentals. However, the OrgSwarm simulator is
differentiated from the eleMental model on grounds of application domain, and because it incorporates
the above five characteristics of strategic adaptation.
20.4.2.1 Dynamic Environment
Organizations do not compete in a static environment. The environment may alter as a result of exogenous
events, for example a regime change such as the emergence of a new technology or a change in customer
preferences. This can be mimicked in the simulation by stochastically respecifying the strategic landscape
during the course of a simulation run. These respecifications simulate a dynamic environment, and
a change in the environment may at least partially negate the value of past learning (adaptation) by
organizations. Minor respecifications are simulated by altering the fitness values associated with one of
the N dimensions of the strategic landscape, whereas in major changes, the fitness of the entire strategic
landscape is redefined. The environment faced by organizations can also change as a result of competition
among the population of organizations. The effect of interfirm competition is left for future work.
20.4.2.2 Strategic Anchor
Organizations do not have complete freedom to alter their current strategy. Their adaptive processes are
subject to strategic inertia. This inertia springs from the organization’s culture, history, and the mental
models of its management [6]. In the simulation, strategic inertia is mimicked by implementing a strategic
anchor. The degree of inertia can be varied from zero to high. In the latter case, the organization is highly
constrained from altering its strategic stance. By allowing the weight of this anchor to vary, adaptation
processes corresponding to different industries, each with different levels of inertia, can be simulated.
Inertia could be incorporated into the PSA in a variety of ways. We have chosen to incorporate it into
the velocity update equation, so that the velocity and direction of the particle at each iteration is also a
function of the location of its strategic anchor. Therefore, for the simulations, Equation (20.1) is altered by adding an additional "anchor" term:

v_i(t + 1) = v_i(t) + R1 (y_i − x_i(t)) + R2 (ŷ − x_i(t)) + R3 (a_i − x_i(t)),
where ai represents the position of the anchor for organization i (a full description of the other terms
such as R1 is provided in the pseudo-code below). The weight attached to the anchor parameter (R3 )
(relative to those attached to pbest and gbest), can be altered by the modeler. The position of the anchor
can be fixed at the initial position of the particle at the start of the simulation, or it can be allowed to
“drag,” thereby being responsive to the adaptive history of the particle. In the latter case, the position of
the anchor for each particle corresponds to the position of that particle “x” iterations ago.
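A sketch of the anchor-augmented velocity update for one organization, with a "dragging" anchor that trails the particle by lag iterations (the history list and the per-dimension random weights are assumptions of this sketch; R1 to R3 correspond to the weights in the altered equation above):

import numpy as np

def velocity_with_anchor(v, x, pbest, lbest, history, lag,
                         r1max=1.0, r2max=1.0, r3max=1.0):
    # history: this organization's past positions, oldest first (assumed).
    R1 = np.random.uniform(0, r1max, x.shape)   # weight on pbest
    R2 = np.random.uniform(0, r2max, x.shape)   # weight on lbest
    R3 = np.random.uniform(0, r3max, x.shape)   # weight on the anchor
    # Dragging anchor: the position held `lag` iterations ago; a fixed
    # (initial-position) anchor would use history[0] instead.
    a = history[-lag] if len(history) >= lag else history[0]
    return v + R1 * (pbest - x) + R2 * (lbest - x) + R3 * (a - x)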
20.4.2.3 Election operator
Real-world organizations do not usually intentionally move to strategies that are poorer (i.e., produce a
lower payoff) than the one they already have. Hence, an election operator (also referred to as a conditional
update or ratchet operator) is implemented, which when turned on ensures that position updates that
would worsen an organization’s strategic fitness are discarded. In these cases, an organization remains at
its current location.
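The election operator therefore reduces to a conditional update; a minimal sketch, where fitness may include the noisy forecast e described later:

def ratchet_update(x, candidate, fitness):
    # Accept the move only if the (possibly noisy) fitness improves;
    # otherwise the organization remains at its current location.
    if fitness(candidate) > fitness(x):
        return candidate
    return x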
The vector v_i is interpreted as organization i's predisposition to set each of the N binary strategic choices that it faces to 1. The higher the value of v_i^j for an individual decision j, the more likely that organization i will choose to set decision j = 1, with lower values of v_i^j favoring the choice of decision j = 0.
In order to model the tendency of managers to repeat historically good strategies, values for each dimension of x_i that match those of pbest should become more probable in the future. Adding the difference between pbest_i^j and x_i^j for organization i to v_i^j will increase the likelihood that organization i will choose to set decision j = 1 if the difference is positive (when pbest_i^j = 1 and x_i^j = 0). If the difference between pbest_i^j and x_i^j for organization i is negative (when pbest_i^j = 0 and x_i^j = 1), adding the difference to v_i^j will decrease v_i^j.¹
In each iteration of the algorithm, the agent adjusts his decision-vector (xi (t)), taking account of his
historical experience (pbest), and the best strategy found by his peer group (lbest). Hence, the velocity
update equation used in the continuous version of the PSA (see Equation [20.6]) can still be used, although
now, v_i(t + 1) is interpreted as the updated vector of an agent's predispositions (or probability thresholds) to set each of the N binary strategic choices that it faces to one.
To ensure that each element of the vector v_i(t + 1) is mapped into (0, 1), a sigmoid transformation is performed on each element j of v_i(t + 1) (see Equation [20.9]):

Sig(v_i^j(t + 1)) = 1/(1 + exp(−v_i^j(t + 1))). (20.9)
Finally, the transformed vector of probability thresholds is used to determine the values of each element of x_i(t + 1), by comparing each element of Sig(v_i(t + 1)) with a random number drawn from U(0, 1) (see Equation [20.10]):

If U(0, 1) < Sig(v_i^j(t + 1)), then x_i^j(t + 1) = 1; else x_i^j(t + 1) = 0. (20.10)
In the binary version of the algorithm, trajectories/velocities are changes in the probability that a coordinate will take on a 0 or 1 value. Sig(v_i^j) represents the probability of bit x_i^j taking the value 1 [26]. Therefore, if Sig(v_i^j) = 0.3, there is a 30% chance that x_i^j = 1, and a 70% chance it is zero.
¹ The difference in each case is weighted by a random number drawn from U(0, 1). Therefore, if pbest_i^j = 1, (pbest_i^j − x_i^j) × U(0, 1) will be nonnegative. Adding this to v_i^j will increase v_i^j, and therefore also increase the probability that x_i^j = 1. On the other hand, if pbest_i^j = 0, v_i^j will tend to decrease, and Prob(x_i^j = 1) becomes smaller.
R1 , R2 , and R3 are random weights drawn from a uniform distribution ranging from 0 to R1max , R2max , and
R3max , respectively, and they weight the importance attached to pbest, lbest and anchor in each iteration
of the algorithm. R1max , R2max , and R3max are constrained to sum up to 4.0 in line with the BinPSO
algorithm of [26]. x is the particle’s actual position, pbest is its past best position, lbest its local best and
a is the position of its anchor. Vmax is set to 4.0 to ensure that Sig(v[n]) does not get too close to either
0 or 1, therefore ensuring that there is a nonzero possibility that a bit will flip state during each iteration.
Pr is a random value drawn from U(0, 1), and Sig is the sigmoid function, Sig(x) = 1/(1 + exp(−x)), which squashes v into the (0, 1) range. t is a temporary record that is used in order to implement
the ratchet operator. If the new strategy is considered better than the organization’s existing strategy, it is
accepted and t is copied into x. Otherwise t is discarded and x remains unchanged. e is the error or noise,
injected in the fitness evaluation, in order to mimic an errorful forecast of strategy fitness.
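Putting Equations (20.9) and (20.10) together with the anchor term and the ratchet, one iteration of the binary update for a single organization might be sketched as follows (the split of the 4.0 weight budget across R1max, R2max, and R3max is an assumption of this sketch):

import numpy as np

VMAX = 4.0                                   # keeps Sig(v) away from 0 and 1

def sig(v):
    return 1.0 / (1.0 + np.exp(-v))          # squashes v into (0, 1)

def binary_step(x, v, pbest, lbest, anchor, fitness,
                r1max=1.5, r2max=1.5, r3max=1.0):
    # R1max + R2max + R3max = 4.0, in line with the text; the 1.5/1.5/1.0
    # split is an assumption.
    N = len(x)
    R1 = np.random.uniform(0, r1max, N)
    R2 = np.random.uniform(0, r2max, N)
    R3 = np.random.uniform(0, r3max, N)
    v = v + R1 * (pbest - x) + R2 * (lbest - x) + R3 * (anchor - x)
    v = np.clip(v, -VMAX, VMAX)
    # Equation (20.10): each bit is set to 1 with probability Sig(v^j).
    t = (np.random.rand(N) < sig(v)).astype(int)   # temporary strategy record
    # Ratchet: copy t into x only if it is not worse than the current strategy.
    if fitness(t) > fitness(x):
        x = t
    return x, v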
20.4.4 Simulator
Although the underlying code for the OrgSwarm simulator is written in C++, the user interacts with the
simulator through a series of easy-to-use screens (Figure 20.1 shows one of the screens in the main control
menu for the simulator). These screens allow the user to select and alter a wide variety of parameters that
determine the nature of the simulation run. In essence, the simulator allows the user to select choices for
four items:
During the simulation run, a series of graphics (see Figure 20.2 for an example of a graphic that shows
the status of each particle in the population during the simulation run), and a run report (see Figure 20.3)
can be displayed. The report display records the full list of simulation parameters chosen by the modeler,
as well as providing a running record of the best design in the population at the end of each iteration. The
simulator also facilitates the recording of comprehensive run-data to disk during the simulation.
FIGURE 20.2 OrgSwarm screendump showing the status of each particle in the population during the simulation
run. Three bars are shown for each of the 20 particles in the population, and these bars represent the fitness of the
anchor location, the fitness of the pbest location, and the fitness of the current location of each particle.
20.5 Results
All simulations were run for 5000 iterations, and all reported fitnesses are the average population fitnesses,
and average environment best fitnesses, across 30 separate simulation runs. On each of the simulation
runs, the NK landscape is specified anew, and the positions and velocities of particles are randomly
initialized at the start of each run. A population of 20 particles is employed, with a neighborhood of size
18. The choice of a high value for the neighborhood, relative to the size of the population, arises from the
observation that real-world organizations know the profitability of their competitors.
Tables 20.1 and 20.2 provide the results for each of 14 distinct PSA variants, at the end of 5000
iterations, across a number of static and dynamic NK landscape scenarios. In each scenario, the same
series of simulations are undertaken. Initially, a basic PSA is employed, without an anchor or a ratchet
(conditional move) heuristic. This simulates a population of organizations searching a strategic landscape,
where the population has no strategic inertia, and where organizations do not utilize a ratchet operator
in deciding whether to alter their position on the strategic landscape.
The basic PSA is then supplemented by inclusion of a series of strategic anchor formulations, ranging
from an anchor that does not change position during the simulation (initial position anchor) to one that
can adapt after a time-lag (moving anchor). Two lag periods are examined, 20 and 50 iterations. Differing
weights can be attached to the anchor term in the velocity Equation (20.6), ranging from 0 (anchor is
“turned off ”) to a maximum of 4. To determine whether the weight factor for the anchor term has a
critical impact on the results, results are reported for weight values of both 1 and 3, corresponding to
low and high inertia weights. Next, to isolate the effect of the ratchet, the conditional move operator is
implemented, and the anchor term is dropped. Finally, to ascertain the combined effect of both ratchet
and anchor, the anchor simulations outlined above are repeated with the ratchet operator “turned on.”
TABLE 20.1 Average (Environment Best) Fitness after 5000 Iterations, Static Landscape

Algorithm (N = 96, K = 0) (N = 96, K = 4) (N = 96, K = 10)
Basic PSA 0.4641 (0.5457) 0.5002 (0.6000) 0.4991 (0.6143)
Initial anchor, w = 1 0.4699 (0.5484) 0.4921 (0.5967) 0.4956 (0.6102)
Initial anchor, w = 3 0.4943 (0.5591) 0.4994 (0.5979) 0.4991 (0.6103)
Mov. anchor (50,1) 0.4688 (0.5500) 0.4960 (0.6003) 0.4983 (0.6145)
Mov. anchor (50,3) 0.4750 (0.5631) 0.4962 (0.6122) 0.5003 (0.6215)
Mov. anchor (20,1) 0.4644 (0.5475) 0.4986 (0.6018) 0.5001 (0.6120)
Mov. anchor (20,3) 0.4677 (0.5492) 0.4994 (0.6156) 0.4994 (0.6229)
Ratchet PSA 0.5756 (0.6021) 0.6896 (0.7143) 0.6789 (0.7035)
Rach-Initial anchor, w = 1 0.6067 (0.6416) 0.6991 (0.7261) 0.6884 (0.7167)
Rach-Initial anchor, w = 3 0.5993 (0.6361) 0.6910 (0.7213) 0.6844 (0.7099)
Rach-Mov. anchor (50,1) 0.6659 (0.6659) 0.7213 (0.7456) 0.6990 (0.7256)
Rach-Mov. anchor (50,3) 0.6586 (0.6601) 0.7211 (0.7469) 0.6992 (0.7270)
Rach-Mov. anchor (20,1) 0.6692 (0.6695) 0.7211 (0.7441) 0.6976 (0.7243)
Rach-Mov. anchor (20,3) 0.6612 (0.6627) 0.7228 (0.7462) 0.6984 (0.7251)
"Real world" strategy vectors consist of a large array of strategic decisions. A value of N = 96 was chosen in defining the landscapes in this simulation. It is noted that there is no unique value of N that could have been selected, but the selection of very large values is not feasible due to computational limitations. However, a binary string of 96 bits provides 2^96, or ∼10^28, distinct choices of strategy. It is also noted
TABLE 20.2 Average (Environment Best) Fitness after 5000 Iterations, Entire Landscape Respecified Stochastically

Algorithm (N = 96, K = 0) (N = 96, K = 4) (N = 96, K = 10)
FIGURE 20.4 Plot of the mean average fitness on the static landscape where K = 0.
that we would expect the dimensionality of the strategy vector to exceed the number of organizations in
the population, hence the size of the population is kept below 96, and a value of 20 is chosen. A series of
landscapes of differing K values (0, 4, and 10), representing differing degrees of fitness interconnectivity,
were used in the simulations.
FIGURE 20.5 Plot of the mean average fitness on the static landscape where K = 4 (left) and 10 (right).
FIGURE 20.6 Plot of the mean average fitness on the dynamic landscape where K = 0.
The NK landscape is respecified in any iteration with probability p = 0.00025. When the landscape is wholly or partially respecified, the benefits of past strategic learning by organizations are eroded (see [27–29] for a detailed
discussion of the utility of the PSO in tracking dynamic environments).
Qualitatively, the results in both scenarios are similar to those obtained on the static landscape. The
basic PSA, even if supplemented by an anchor mechanism, does not perform any better than random
search. Supplementing the basic PSA with the ratchet mechanism leads to a significant improvement in population fitness, with a further improvement in fitness occurring when the ratchet is combined with
an anchor mechanism. In the latter case, an adaptive or dragging anchor gives better results than a fixed
anchor, but the results between differing forms of dragging anchor do not show a clear dominance for any
particular form. As for the static landscape case, the results for the combined ratchet/anchor, are relatively
insensitive to the choice of weight value (1 or 3).
20.6 Conclusions
In this chapter, a synthesis of a strategic landscape defined using the NK model, and a Particle Swarm
metaphor is used to create a novel simulation model of the process of strategic adaptation of organizations.
The results suggest that a degree of strategic inertia, in the presence of an election operator, can assist rather
than hamper the adaptive efforts of populations of organizations in static and slowly changing strategic
environments. The results also suggest that, despite claims for the importance of social learning in populations, social learning alone is not always enough unless learned lessons can be maintained by means of an election mechanism.
FIGURE 20.7 Plot of the mean average fitness on the dynamic landscape where K = 4 (left) and 10 (right).
It is not possible in a single set of simulation experiments to exhaustively examine every possible com-
bination of settings for each parameter in the simulation model. Future work will extend the range of
settings examined. However, the initial results cast an interesting light on the role of anchoring in organ-
izational adaptation, and the development of the swarm-landscape simulator extends the methodologies
available to researchers to conceptualize and examine organizational adaptation.
Finally, it is noted that the concept of anchoring developed in this chapter is not limited to organizations,
but is plausibly a general feature of social systems. Hence, the extension of the social swarm model to
incorporate inertia may prove useful beyond this study.
References
[1] Porter, M. (1996). What is strategy? Harvard Business Review, Nov.–Dec., pp. 61–78.
[2] Levinthal, D. (1997). Adaptation on rugged landscapes. Management Science, 43: 934–950.
[3] Rivkin, J. (2000). Imitation of complex strategies. Management Science, 46: 824–844.
[4] Wright, S. (1932). The roles of mutation, inbreeding, crossbreeding and selection in evolution.
In Proceedings of the Sixth International Congress on Genetics, 1: 356–366.
[5] Kitts, B., Edvinsson, L., and Beding, T. (2001). Intellectual capital: from intangible assets to fitness
landscapes. Expert Systems with Applications, 20: 35–50.
[6] Boeker, W. (1989). Strategic change: The effects of founding and history. Academy of Management
Journal, 32: 489–515.
[7] Stuart, T. and Podolny, J. (1996). Local search and the evolution of technological capabilities,
Strategic Management Journal, 17: 21–38.
[8] Hannan, M. and Freeman, J. (1977). The population ecology of organizations. American Journal of Sociology, 82: 929–964.
[9] Hannan, M. and Freeman, J. (1984). Structural inertia and organizational change. American
Sociological Review, 49: 149–164.
[10] Levinthal, D. (1991). Random walks and organisational mortality. Administrative Science
Quarterly, 36: 397–420.
[11] Tushman, M. and O’Reilly, C. (1996). Ambidextrous organizations: Managing evolutionary and
revolutionary change. California Management Review, 38: 8–30.
[12] Kauffman, S. (1993). The Origins of Order. Oxford University Press, Oxford, England.
[13] Kennedy, J. and Eberhart, R. (1995). Particle swarm optimization. In Proceedings of the IEEE
International Conference on Neural Networks, December 1995, pp. 1942–1948.
[14] Kennedy, J., Eberhart, R., and Shi, Y. (2001). Swarm Intelligence. Morgan Kaufmann, San Mateo, CA.
[15] Kennedy, J. (1997). The particle swarm: Social adaptation of knowledge. In Proceedings of the
International Conference on Evolutionary Computation. IEEE Press, Washington, pp. 303–308.
[16] Kennedy, J. (1999). Minds and cultures: Particle swarm implications for beings in sociocognitive
space. Adaptive Behavior, 7: 269–288.
[17] Kennedy, J. (1999). Small worlds and mega-minds: effects of neighbourhood topology on particle
swarm performance. In Proceedings of the International Conference on Evolutionary Computation.
IEEE Press, Washington, pp. 1931–1938.
[18] Thorndike, E. (1911). Animal Intelligence, Macmillan, New York.
[19] Bandura, A. (1986). Social Foundations of Thought and Action: A Social Cognitive Theory.
Prentice Hall, Englewood Cliffs, NJ.
[20] Kauffman, S. and Levin, S. (1987). Towards a general theory of adaptive walks on rugged
landscapes. Journal of Theoretical Biology, 128: 11–45.
[21] Gavetti, G. and Levinthal, D. (2000). Looking forward and looking backward: Cognitive and
experiential search, Administrative Science Quarterly, 45: 113–137.
[22] Porter, M. and Siggelkow, N. (2001). Contextuality within activity systems. Harvard Business
School Working paper series, No. 01-053.
[23] Lobo, J. and MacReady, W. (1999). Landscapes: A Natural Extension of Search Theory.
Santa Fe Institute Working paper 99-05-037.
[24] Kauffman, S., Lobo, J., and MacReady, W. (1998). Optimal Search on a Technology Landscape.
Santa Fe Institute Working paper 98-10-091.
[25] Porter, M. (1985). Competitive Advantage: Creating and Sustaining Superior Performance. The Free Press, New York.
[26] Kennedy, J. and Eberhart, R. (1997). A discrete binary version of the particle swarm algorithm.
In Proceedings of the Conference on Systems, Man, and Cybernetics. IEEE Press, Washington,
pp. 4104–4109.
[27] Blackwell, T. (2003). Swarms in dynamic environments. In Proceedings of GECCO 2003, Vol. 2723
of Lecture Notes in Computer Science. Springer-Verlag, Berlin, pp. 1–12.
[28] Eberhart, R. and Shi, Y. (2001). Tracking and optimizing dynamic systems with particle swarms.
In Proceedings of the CEC 2001. IEEE Press, Washington, pp. 94–97.
[29] Hu, X. and Eberhart, R. (2002). Adaptive particle swarm optimization: detection and response to
dynamic systems. In Proceedings of CEC 2002. IEEE Press, Washington, pp. 1666–1670.
Some parts of this chapter have been taken from H.F. Wedde, M. Farooq, and Y. Zhang, BeeHive: An efficient fault-tolerant routing algorithm inspired by honey bee behavior, Ant Colony Optimization and Swarm Intelligence, Lecture Notes in Computer Science 3172, by kind permission of Springer-Verlag, Heidelberg, Germany. The paper also won the best paper award at the ANTS 2004 conference.
21.1 Introduction
A honey bee colony manages to react to countless changes in the forage pattern outside the hive and
internal changes inside the hive through a decentralized and sophisticated communication and control
system. According to Seeley, a honey bee colony can thoroughly monitor a vast region around the hive for
rich food sources, nimbly redistribute its foragers within an afternoon, fine-tune its nectar processing to
match its nectar collecting, effect cross inhibition between different forager groups to boost its response
differential between food sources, precisely regulate its pollen intake in relation to its ratio of internal
supply and demand, and limit the expensive process of comb building to times of critical need for
additional storage space [1]. A bee colony demonstrates this flexible and adaptive response because it is
organized with morphologically uniform individuals but with different temporary specializations. A bee takes up four roles during her life span — cleaner, nurse, food-storer, and forager. The foragers can be further classified as nectar, pollen, and water collectors [1]. They have two functional roles within each subspecialty: scouts, who discover new food sources around the hive, and foragers, who transport nectar from an already discovered flower site by following the dances of other scouts or foragers. The colony brilliantly allocates, through its communication and control system, its labor force among these individuals to maintain a balance between the collection and processing rates of each commodity; as a result, an optimal stock of nectar, pollen, and water is maintained inside the colony. On average, a colony extracts from its environment around 20 kg of pollen, 120 kg of nectar, 25 l of water, and 100 g of resin each year [1].
Karl von Frisch in 1944 made a revolutionary discovery about the communication paradigm that
foragers use to communicate the information about flower sites around the hive. He experimentally
verified that foragers can inform their fellow foragers, inside the hive on a dance floor, of the direction
and the distance to a food source by means of a dance. He deciphered these communication signals
into a language, in his book Tanzsprache und Orientierung der Bienen [2] (the translation was done by
Chadwick [3]). The foragers, according to von Frisch, use two types of dances: round dances, which show that a food source is present near the hive (within about 100 m), and waggle dances, which further specify the direction and the distance to a food source (up to a few kilometers) by the orientation and the duration of the waggle portion of each dance circuit. Food sources located in the direction of the sun are represented as an upward direction on the vertical comb, and any angle to the right or left of the sun is encoded as the same angle to the right or left of the upward direction. The recruited foragers maintain this angle from the sun to reach a small region around the flower site and then use the flower fragrance, which clings to the body of a forager and which they smelled while observing the dances, to identify the food source unambiguously. Recently, researchers have also indicated that several components of the dance are correlated with food-source profitability, such as dance duration and probability of sound production; however, this information is not taken into account by unemployed (dance-following) foragers, who instead randomly choose a dancer before leaving the hive. Nevertheless, the recruited foragers arrive in greater numbers at more profitable food sources because the dances for richer sources are more conspicuous and hence more likely to be encountered by the unemployed (dance-following) foragers [1].
results obtained from the extensive simulations. Finally, we summarize our findings and provide an outlook on future research.
it experiences a very small search time to locate a food-storer bee for its commodity (a cue that the colony
needs the commodity). By maintaining the search time within some thresholds, the honey bee colony
reinforces the foraging labor at a site in times of need and vice versa. A stochastic model for the foraging
behavior is presented in Reference 5. Sumpter used this basic model to develop an agent-based model in Reference 6. Sumpter's model provides a solid foundation for developing an agent-based reinforcement-learning algorithm [7].
Definition 21.1 (Natural Engineering). Natural Engineering is an emerging new engineering discipline that
enables scientists/engineers to utilize inspirations and observations from organizational principles of natural
systems, and transform them into structural principles of software organization (algorithms) or industrial
products, in search of efficient/optimal solutions for real-world problems under resource constraints.
There is, of course, no clear-cut way to achieve a one-to-one match between structures/principles of natural organizations and working principles of technical systems. The most important challenge, therefore, is to identify a natural system whose working principles can easily be abstracted to those of the technical system.
system. If one has to add many nonbiological features into the natural system then, we believe, it is more
advisable to look at other natural systems for inspiration. Consequently, we chose honey bees because
their foraging behavior could be easily abstracted into a routing problem in telecommunication networks.
Both systems have to maximize the amount of a commodity (nectar/data delivered to the hive/nodes) as
quickly as possible, under a continuously changing operating environment.
The major focus of our research, however, is to take bio/nature-inspired solutions into business, and therefore we decided to follow, for developing the BeeHive algorithm, a feedback-oriented engineering approach, displayed schematically in Figure 21.1, that incorporates most of the features discussed here.
First, we considered the ensemble of constraints under which the envisioned routing protocol is
supposed to operate:
At the same time we decided that the bee agents should explore the network, collect important parameters,
and make the routing decisions in a decentralized fashion (in the style as real scouts/foragers do decision
making during collecting nectar from flowers). Bee agents should measure the quality of a route and then
communicate it to other bee agents, as foragers do in Nature. The structure of the routing table should
provide the functionality of a dance floor for exchange of information among bee agents, and among
bee agents and data packets. Moreover, later on we should be able to utilize it in a real kernel of the Linux
operating system.
We implemented our ideas in a simulation environment and then refined our algorithmic mapping
through the feedback channel 1 (see Figure 21.1). During this phase we did not use any simulation-specific features that are unavailable inside the Linux kernel, for example, vector, stack, or similar data structures. Once we reached a relative optimum of the BeeHive concept, we started to develop
an engineering model of the algorithm. The engineering model could be easily transported to the Linux
kernel routing framework. We tested it on the real network of Linux machines and refined our engineering
model through the feedback channel 2 (see Figure 21.1). At the moment we are evaluating our conceptual
approach in two prototype projects: BeeHive [8], which deals with the design and development of a
routing algorithm for fixed networks, and BeeAdHoc, whose goal is to design and develop an energy
efficient routing algorithm for mobile ad hoc networks (MANETS) [9].
FIGURE 21.1 Feedback-oriented engineering approach: natural system → model → natural-system-to-algorithm mapping → testing and evaluation (feedback channel 1) → engineering model → working environment (feedback channel 2).
In Section 21.4 we will elaborate the most important step of our engineering approach: the mapping of concepts in a honey bee colony, discussed in Section 21.2, to the operating environment of real packet-switching networks. This step will help the reader in understanding our algorithm described in Section 21.5. We do not discuss the implementation of BeeHive inside the network stack of the Linux kernel because it is beyond the scope of the chapter; however, interested readers will find the technical details in Reference 10.
One could consider the propagation delay as distance information and the queuing delay as direction information (recall that bee scouts also provide these parameters in their dances).
3. A bee agent decides to provide its path information only if the quality of the path it traversed is above a threshold. The threshold is controlled by the number of hops that a bee agent is allowed to take. Moreover, the agents model the quality of a path as a function of the propagation and queuing delays of the path; lower values of these parameters result in higher values of the quality parameter.
4. The majority of bee agents in the BeeHive algorithm explore the network near their launching node, and very few explore the complete network. The idea is borrowed from a honey bee colony: this not only reduces the overhead of collecting the routing information but also helps in maintaining smaller, local routing tables.
5. We consider the routing table as a dance floor where the bee agents provide information about the quality of the paths they traversed. The routing table is used for information exchange among bee agents launched from the same node but arriving at a node via its different neighbors. This information exchange helps in evaluating the overall quality of a node (which has multiple pathways to a destination) for reaching a certain destination.
6. The nectar foragers exploit the flower sites according to their quality, while the distance and direction to the sites are communicated through dances made by their fellow foragers on the dance floor. In our
algorithm, we have to map the quality of paths onto the quality of nodes to utilize the bee principle.
Consequently, we formulate the quality of a node, for reaching a destination, as a function of
proportional quality of only those neighbors, which possibly lie in the path toward the destination.
7. We consider data packets as foragers. Once they arrive at a node, they access the information in the
routing tables, stored by bee agents, about the quality of different neighbors of the node for reaching
their destinations. They select the next neighbor toward the destination in a stochastic manner
depending upon its goodness, as a result, not all packets follow the best paths. This will help in
maximizing the system performance though a data packet may not follow the best path, a concept
directly borrowed from a principle of bee behavior: a bee could only maximize her colony’s profit
if she refrains from broadly monitoring the dance floor to identify the single most desirable food [1]
(see Section 21.2).
Now we are in a position to introduce our bee agent model and the BeeHive algorithm in the following section.
Informally, the BeeHive algorithm and its main characteristics can be summarized as follows:
1. The network is organized into fixed partitions called foraging regions. A partition results from
particularities of the network topology. Each foraging region has one representative node. Currently,
the node with the lowest IP address in a foraging region is elected as the representative node. If this node
crashes, then the node with the next higher IP address takes over the job.
2. Each node also has a node-specific foraging zone that consists of all nodes from which short distance
bee agents can reach this node.
3. Each nonrepresentative node periodically sends a short distance bee agent by broadcasting replicas
of it to each neighbor site.
4. When a replica of a particular bee agent arrives at a site, it updates the routing information there
and is flooded again; however, it is not sent back to the neighbor from which it arrived. This
process continues until the life span of the agent has expired; if a replica of this bee agent has already
been received at a site, the new replica is killed there (see the sketch after this list).
5. Representative nodes only launch long distance bee agents, which are received by the neighbors
and propagated as in step 4. However, their life span (number of hops) is limited by the long distance
limit.
6. The idea is that each agent, while traveling, collects and carries path information, and at each node
visited it leaves the trip time estimate for reaching its source node from this node over the
incoming link. Bee agents use priority queues for quick dissemination of routing information.
7. Thus each node maintains current routing information for reaching the nodes within its foraging zone
and the representative nodes of the foraging regions. This mechanism enables a node to route a data
packet (whose destination is beyond the foraging zone of the given node) along a path toward the
representative node of the foraging region containing the destination node.
8. The next hop for a data packet is selected in a stochastic fashion according to the quality measure
of the neighbors. The motivation for this routing policy is explained in Section 21.4. Note that
the routing algorithms currently employed in the Internet always choose a next hop on the shortest
path [14].
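To make the agent propagation of steps 3 to 5 concrete, the following minimal Python sketch simulates the replica flooding: a node broadcasts replicas of a bee agent, each replica is re-flooded to all neighbors except the one it arrived from, and a replica dies when the hop limit expires or when a copy has already been seen at a site. This is an illustration only; the function name, the graph layout, and the toy topology are our assumptions, not the kernel implementation.

from collections import deque

def flood_bee_agent(graph, source, hop_limit):
    """Simulate BeeHive-style flooding of one bee agent (a sketch).
    graph: dict mapping node -> iterable of neighbor nodes.
    Returns the set of nodes reached, that is, an approximation of the
    foraging zone of `source` for the given hop limit."""
    seen = {source}                      # replicas arriving at a seen node are killed
    queue = deque([(source, 0)])         # breadth-first propagation of replicas
    while queue:
        node, hops = queue.popleft()
        if hops == hop_limit:            # life span of the agent has expired
            continue
        for neighbor in graph[node]:
            if neighbor in seen:         # a replica was already received here
                continue
            seen.add(neighbor)
            # a real implementation would update the routing information
            # at `neighbor` here (trip time estimate toward the source)
            queue.append((neighbor, hops + 1))
    seen.discard(source)
    return seen

# Example: the foraging zone of node 10 with a hop limit of 3
topology = {10: [7, 11, 5], 7: [10, 8], 11: [10], 5: [10, 4],
            8: [7], 4: [5, 2], 2: [4]}
print(sorted(flood_bee_agent(topology, 10, 3)))

Because replicas never revisit a node, the propagation is effectively a breadth-first traversal, which is exactly the variant of breadth-first search described for Figure 21.2 below.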
Figure 21.2 illustrates the working of the flooding algorithm. Short distance bee agents can
travel up to 3 hops in this example. Each replica of the shown bee agent (launched by node 10) is drawn
with a different trail to identify its path unambiguously. The numbers on the paths show their costs.
The flooding algorithm is a variant of the breadth-first search algorithm. Nodes 2, 3, 4, 5, 6, 7, 8, 9, and 11
constitute the foraging zone of node 10.
Now we will briefly discuss the estimation model that bee agents utilize to approximate the trip time t_is
that a packet will take in reaching its source node s from the current node i (ignoring the protocol processing
delays for a packet at nodes i and s):

t_is ≈ ql_in / b_in + tx_in + pd_in + t_ns,    (21.1)

where ql_in is the size of the queue (in bits) for neighbor n at node i, b_in is the bandwidth of the link
between node i and neighbor n, tx_in and pd_in are the transmission and propagation delays, respectively, of the
link between node i and neighbor n, and t_ns is the trip time from n to s. Bandwidth and propagation delays
of all links of a node are approximated by transmitting hello packets.
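As a worked illustration of Equation (21.1), the following sketch (the function and parameter names are our own) estimates the trip time from node i to source s via neighbor n:

def estimate_trip_time(queue_bits, bandwidth_bps, tx_delay, pd_delay, t_ns):
    """Worked form of Equation (21.1): queuing delay (queue size over link
    bandwidth) plus transmission delay, propagation delay, and the trip
    time t_ns already estimated from neighbor n to source s (in seconds)."""
    return queue_bits / bandwidth_bps + tx_delay + pd_delay + t_ns

# 120 kbit queued on a 10 Mbit/sec link, 1 msec tx, 5 msec pd, 30 msec t_ns:
# 0.012 + 0.001 + 0.005 + 0.030 = 0.048 sec
print(estimate_trip_time(120_000, 10_000_000, 0.001, 0.005, 0.030))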
FIGURE 21.2 An example of the flooding algorithm: replicas of a bee agent launched by node 10 propagate through its foraging zone; the numbers on the links show path costs. Foraging regions 0 and 1 each have a representative node.
experience in reaching destination k via neighbor j. Table 21.1 shows an example of R_i. In the IFR routing
table, the queuing delay and propagation delay values for reaching the representative node of each foraging
region through the neighbors of a node are stored. The structure of the IFR routing table is similar to
the one shown in Table 21.1, with the destination replaced by a pair (representative, region). The FRM
routing table provides the mapping of known destinations onto foraging regions. In this way we eliminate
the need to maintain O(N × D) entries (where D is the total number of nodes in the network) in a routing table,
as done by AntNet, and save a considerable amount of the router memory needed to store this routing table.
Goodness of a Neighbor: The goodness of a neighbor j of node l (l has n neighbors) for reaching
a destination d is g_jd and is defined in Equation (21.2).
The fundamental motivation behind this definition is to approximate the behavior of the real network.
When the network is experiencing a heavy traffic load, the queuing delay plays the primary role in
the delay of a link. In this case q_jd ≫ p_jd, and we see from Equation (21.2)
that g_jd ≈ (1/q_jd) / Σ_{k=1}^{n} (1/q_kd). When the network is experiencing low traffic, the propagation
delay plays the dominant role in the delay of a link. As q_jd ≪ p_jd, from Equation (21.2) we get
g_jd ≈ (1/p_jd) / Σ_{k=1}^{n} (1/p_kd). We use stochastic sampling with replacement [15] for selecting a neighbor.
This principle ensures that a neighbor j with goodness g_jd will be selected as the next hop with at least the
probability g_jd / Σ_{k=1}^{n} g_kd. Algorithm 21.1 provides the pseudocode of BeeHive.
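Since Equation (21.2) itself is not reproduced here, the following sketch is only an approximation built from the two limiting cases above: it scores each neighbor by the inverse of its combined queuing and propagation delay (our assumption, not the published formula) and then applies stochastic sampling with replacement to pick the next hop. All names are hypothetical.

import random

def goodness(q_delays, p_delays):
    """Illustrative goodness of each neighbor for one destination; NOT
    Equation (21.2) itself.  Scoring by the inverse of the combined delay
    reproduces the two limits discussed in the text: for q >> p it tends
    to (1/q_jd)/sum(1/q_kd), for q << p to (1/p_jd)/sum(1/p_kd)."""
    inv = [1.0 / (q + p) for q, p in zip(q_delays, p_delays)]
    total = sum(inv)
    return [v / total for v in inv]

def select_next_hop(neighbors, g):
    """Stochastic sampling with replacement (roulette wheel): neighbor j
    is chosen with probability proportional to its goodness g[j]."""
    return random.choices(neighbors, weights=g, k=1)[0]

g = goodness(q_delays=[0.020, 0.005, 0.050], p_delays=[0.002, 0.002, 0.001])
print(g, select_next_hop(["A", "B", "C"], g))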
else
    consult the FRM routing table of node i to find node w
    consult the IFR routing table of node i to find the delays to node w
    calculate the goodness of all neighbors for reaching w using Equation (21.2)
endif
probabilistically select a neighbor n (n ≠ p) as per goodness
enqueue the data packet d_sd in the normal queue for neighbor n
endfor
if (t mod h = 0)
    send a hello packet to all neighbors
    if (timeout before a response from a neighbor, for the 4th time)
        neighbor is down
        update the routing table and launch bee agents to inform other nodes
    endif
endif
endwhile
endfor
routing table. In this algorithm, ants are asked to traverse a set of n nodes in a particular order, known as a
chromosome. Once an agent visits the nth node, it is converted into a backward agent that returns to
its source node. In contrast to AntNet, the backward agents modify the routing tables only at the source
node. The source node also measures the fitness of this agent based on the trip time value, and then it
generates a new population using single-point crossover. New agents enter the network and evaluate the
assigned paths. The routing table stores the agents' IDs, their fitness values, and their trip times to the visited
nodes. Routing of a data packet is done through the path that has the shortest trip time to the destination.
If no entry for a particular destination is found, then a data packet is routed with the help of the agent that
has the maximum fitness value. DGA was designed assuming that the routers could crash during network
operations. The interested reader will find more details in Reference 22.
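For readers unfamiliar with the genetic operator mentioned above, this is a minimal single-point crossover sketch on path chromosomes; it is our illustration, not the DGA authors' implementation.

import random

def single_point_crossover(parent_a, parent_b):
    """Cut both parent chromosomes (ordered lists of node IDs) at one
    random point and swap the tails to produce two offspring."""
    point = random.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]

print(single_point_crossover([1, 4, 7, 9, 12], [1, 3, 8, 9, 14]))

Note that on path chromosomes the offspring may revisit nodes, so a real routing implementation would need a repair step to keep paths loop-free.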
Note that, in contrast to the above-mentioned algorithms, bee agents need not be equipped with a stack to
perform their duties. Moreover, our agent model requires only forward moving agents, and they utilize
an estimation model to calculate the trip time from their source to a given node. This model eliminates
the need for global clock synchronization among routers, and it is expected that for very large networks
routing information can be disseminated quickly and with a small overhead as compared with AntNet.
Our agent model does not require storing the average trip time, the variance of trip times, and the
best trip time for each destination at a node to determine the goodness of a neighbor for a particular
destination. Last but not least, BeeHive works with a significantly smaller routing table as compared
with AntNet.
FIGURE 21.3 The 57-node network topology used in the experiments. Node 4 acts as the hot spot, and Nodes 21 and 40 are crashed in the fault-tolerance experiments.
uniform distribution. In the Weighted state (W), a destination selected in the previous Uniform state
is favored over the other destinations. This approach provides a more challenging experimental environment
than the one in which AntNet was evaluated.
Saturating loads: The purpose of these experiments was to study the behavior of the algorithms by
gradually increasing the traffic load, through decreasing MSIA from 4.7 to 1.7 sec. MPIA is 0.005 sec
during these experiments. Figure 21.4 shows the average throughput and the 90th percentile of the
packet delay distribution. It is obvious from Figure 21.4 that BeeHive delivered approximately the
same number of data packets as AntNet but with a smaller packet delay. Both OSPF and DGA
are unable to cope with a saturated load, and the performance of DGA is the poorest.
Size of foraging zones: Next, we analyzed the effect of the size of the foraging zone in which a bee agent
updates the routing tables. We report the results for sizes of 7, 10, 15, and 20 hops in Figure 21.5.
Figure 21.5 shows that increasing the size of the foraging zone beyond 10 hops does not bring significant
performance gains. This demonstrates the strength of BeeHive: it converges to a near-optimum solution
with a foraging zone size of just 7 hops.
Size of the routing table: The fundamental motivation of the foraging zone concept was not only to eliminate
the requirement for global knowledge but also to reduce the memory needed to store a routing
table. BeeHive requires 88, 94, and 104 entries, on average, in the routing tables for foraging
zone sizes of 7, 10, and 20 hops, respectively. OSPF needs just 57 entries, while AntNet needs 162
entries on average. Hence BeeHive achieves a performance similar to that of AntNet, but with a routing
table whose size is of the order of OSPF's.
FIGURE 21.4 (a) Average throughput (Mbps) and (b) 90th percentile of the packet delay distribution (sec) under saturating loads (MSIA decreased from 4.7 to 1.7 sec) for OSPF, AntNet, DGA, and BeeHive.
FIGURE 21.5 (a) Average throughput (Mbps) and (b) 90th percentile of the packet delay distribution (sec) for BeeHive with foraging zone sizes of 7, 10, 15, and 20 hops (BeeHive(7) to BeeHive(20)).
Hot spot: The purpose of this experiment is to study the effect of transient overloads in a network. We
selected node 4 (see Figure 21.3) as a hot spot. The hot spot was active from 500 to 1000 sec, and
all nodes sent data to node 4 with MPIA = 0.05 sec. This transient overload was superimposed
on a normal load of MSIA = 2.7 sec and MPIA = 0.3 sec. Figure 21.6 shows that both BeeHive
and AntNet are able to cope with the transient overload; however, the average packet delay for
BeeHive is less than 100 msec as compared with 500 msec for AntNet. Again, DGA shows the
poorest performance.
Router crash: The purpose of this experiment was to analyze the fault-tolerant behavior of BeeHive, so
we took MSIA = 4.7 sec and MPIA = 0.005 sec to ensure that no packets are dropped because
of congestion. We simulated a scenario in which Router 21 crashed at 300 sec and Router 40
crashed at 500 sec, and then both were repaired at 800 sec. Figure 21.7 shows the results. BeeHive
is able to deliver 97% of the deliverable packets as compared with 89% for AntNet. Observe that from
300 to 500 sec (just Router 21 is down), BeeHive has a superior throughput and a smaller packet
delay, but once Router 40 crashes, the packet delay of BeeHive increases because of the higher load at
Router 43. From Figure 21.3 it is obvious that, once Router 40 has crashed, the only path to the upper
part of the network is via Router 43. Since BeeHive is able to deliver more packets, the queue length
at Router 43 increased, and this led to a relatively poorer packet delay as compared with AntNet.
FIGURE 21.6 Node 4 acted as hot spot from 500 to 1000 sec: (a) throughput and (b) packet delay for OSPF (81.39%), AntNet (96.85%), DGA (48.08%), and BeeHive (99.9% of packets delivered).
In addition, observe that Router 21 is critical, but in the case of its crash multiple paths still exist to the
middle and upper parts of the topology via Nodes 15, 18, 19, 20, and 24.
Overhead of BeeHive: We first define the terms routing overhead and suboptimal overhead.
Definition 21.2 (Routing overhead). Routing overhead is defined as the ratio between the bandwidth
occupied by the routing packets and the total available network bandwidth [18].
The metric suboptimal overhead was introduced in Reference 25 in the context of MANETs, but we believe
that it is equally relevant in fixed networks.
FIGURE 21.7 Router 21 is down at 300 sec and Router 40 at 500 sec, and both are repaired at 800 sec: (a) throughput and (b) packet delay for AntNet (89.43%), DGA (46.34%), and BeeHive (97.45% of packets delivered).
Definition 21.3 (Suboptimal overhead). The difference between the bandwidth consumed when transmitting
data packets from all the sources to their destinations and the bandwidth that would have been
consumed had the data packets followed the shortest hop-count paths. Formally, we define the
parameter as

S_o = ( Σ_{d=1}^{n} Σ_{s=1}^{n} Σ_{i=1}^{k} (h^i_{sd} − h^o_{sd}) × L^i_{sd} ) / B_t,   s ≠ d,    (21.3)
where n is the total number of nodes in the network, k is the total number of packets generated, h^i_{sd} is the
number of hops that packet i took to reach its destination d from its source s, h^o_{sd} is the minimum number of
hops between node s and node d, L^i_{sd} is the length of packet i sent from source s to destination d, and B_t is
the total bandwidth of the network.
FIGURE 21.8 Routing overhead and suboptimal overhead of the algorithms as a function of MSIA.
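As a worked illustration of Equation (21.3), the sketch below aggregates the wasted bandwidth over all delivered packets; the data layout (one tuple per packet) is an assumption of ours.

def suboptimal_overhead(packets, total_bandwidth):
    """Suboptimal overhead per Equation (21.3), with an assumed data
    layout: `packets` lists one (hops_taken, min_hops, length_bits) tuple
    per delivered packet, aggregated over all source-destination pairs."""
    wasted = sum((h_i - h_o) * length for h_i, h_o, length in packets)
    return wasted / total_bandwidth

# two packets took one extra hop each, one followed the shortest path
print(suboptimal_overhead([(5, 4, 8000), (3, 3, 8000), (6, 4, 4000)], 1e9))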
Figure 21.8 shows the control overhead and the suboptimal overhead of the algorithms. It is quite interesting
to note that the suboptimal overhead is much higher than the control overhead (note that the x- and y-axes
have different scales on the two figures). OSPF has the smallest suboptimal overhead, but then it
delivers fewer packets as well. The routing overhead of DGA decreases with an increase in the load and
vice versa. This happens because of the genetic algorithm: remember that the next generation of agents
is launched once four agents from the previous generation have been received. Under low load, the return times
for the agents are smaller; as a result, the agents are launched at a higher rate, and vice versa. Since
Forward_Ant agents in AntNet use the same queues as data packets, more ants were
dropped under increased network load, and this explains the decrease in the routing overhead of
the algorithm. It is evident from Figure 21.8 that BeeHive has a significantly smaller routing overhead and
suboptimal overhead as compared with AntNet and DGA.
21.9 Conclusion
A honey bee colony is able to optimize its stockpiles of nectar, pollen, and water through an intelligent
allocation of labor among different specialists, which communicate with each other using a sophisticated
communication protocol that consists of signals and cues, in continuously changing internal and external
environments. The dance language and foraging behavior of honey bees inspired us to develop a fault-tolerant,
adaptive, and robust routing protocol. The algorithm does not need any global information,
such as the structure of the topology and the cost of links among routers; rather, it works with the local
information that a short distance bee agent collects in a foraging zone. It works without the need for global
clock synchronization, which not only simplifies its installation on real routers but also enhances fault
tolerance. In contrast to AntNet, our algorithm utilizes only forward moving bee agents, which help in
disseminating the state of the network to the routers in real time. The bee agents take less than 1% of the
available bandwidth but provide significant enhancements in throughput and packet delay.
We implemented two state-of-the-art adaptive algorithms (AntNet and DGA) for the OMNeT++
simulator and then compared our BeeHive algorithm with them. Through extensive simulations representing
dynamically changing network operating environments, we have demonstrated that BeeHive achieves
a better or similar performance as compared with AntNet. However, this enhancement in performance is
achieved with a routing table whose size is of the order of OSPF's.
We should emphasize that the foraging model described in References 6 and 7 could be used
for developing any multi-objective optimization algorithm for dynamic environments. The important
concepts of BeeHive, such as the bee agent propagation algorithm and the bee agent communication
paradigm, could be applied to any optimization problem that can be represented as a graph. We
believe that the BeeHive concept will motivate researchers to develop Bee Colony Optimization (BCO)
algorithms.
References
[1] T.D. Seeley. The Wisdom of the Hive. Harvard University Press, London, 1995.
[2] K. von Frisch. Tanzsprache und Orientierung der Bienen. Springer-Verlag, Heidelberg, 1965.
[3] K. von Frisch. The Dance Language and Orientation of Bees. Harvard University Press, Cambridge,
1967.
[4] H.A. Simon. Administrative Behavior: A Study of Decision-Making Processes in Administrative
Organization. Free Press, New York, 1976.
[5] T.D. Seeley and W.F. Towne. Collective decision making in honey bees: How colonies choose
among nectar sources. Behavioral Ecology and Sociobiology, 12: 277–290, 1991.
[6] D.J.T. Sumpter. From bee to society: An agent-based investigation of honey bee colonies.
Ph.D. thesis, The University of Manchester, UK, 2000.
[7] L.P. Kaelbling, M.L. Littman, and A.W. Moore. Reinforcement learning: A survey. Journal of
Artificial Intelligence Research, 4: 237–285, 1996.
[8] H.F. Wedde, M. Farooq, and Y. Zhang. Beehive: An efficient fault-tolerant routing algorithm
inspired by honey bee behavior. In Proceedings of ANTS Workshop, Lecture Notes in Computer
Science 3172, Springer-Verlag, September 2004.
[9] H.F. Wedde and M. Farooq et al. BeeHive — An Energy-Aware Scheduling and Routing Framework.
Technical report, Project Group 439, LSIII, School of Computer Science, University of Dortmund,
2004.
[10] Y. Zhang. Design and implementation of bee agents based algorithm for routing in high speed,
adaptive and fault-tolerant networks. Master thesis, LSIII, The University of Dortmund, Germany,
2003.
[11] P. Nii. The blackboard model of problem solving. AI Magazine, 7: 38–53, 1986.
[12] E. Bonabeau, M. Dorigo, and G. Theraulaz. Swarm Intelligence from Natural to Artificial Systems.
Oxford University Press, Oxford, 1999.
[13] P.P. Grassé. La reconstruction du nid et les coordinations interindividuelles chez bellicositermes
natalensis et cubitermes sp. la théorie de la stigmergie: essai d’interprétation du comportement
des termites constructeurs. Insectes Sociaux, 6: 41–81, 1959.
[14] L.L. Peterson and B.S. Davie. Computer Networks: A Systems Approach. Morgan Kaufmann
Publishers, San Francisco, CA, 2000.
[15] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley,
Reading, MA, 1989.
[16] J.T. Moy. OSPF: Anatomy of an Internet Routing Protocol. Addison-Wesley, Reading, MA, 1998.
[17] J.T. Moy. OSPF Complete Implementation. Addison-Wesley, Reading, MA, 2000.
[18] G. Di Caro and M. Dorigo. AntNet: Distributed stigmergetic control for communication networks.
Journal of Artificial Intelligence Research, 9: 317–365, 1998.
[19] G. Di Caro and M. Dorigo. Two ant colony algorithms for best-effort routing in datagram networks.
In Proceedings of the Tenth IASTED International Conference on Parallel and Distributed Computing
and Systems (PDCS’98), IASTED/ACTA Press, 1998, pp. 541–546.
[20] B. Barán and R. Sosa. A new approach for AntNet routing. In Proceedings of the Ninth International
Conference on Computer, Communications and Networks, 2000.
[21] S. Liang, A.N. Zincir-Heywood, and M.I. Heywood. The effect of routing under local informa-
tion using a social insect metaphor. In Proceedings of IEEE Congress on Evolutionary Computing,
May 2002.
[22] S. Liang, A.N. Zincir-Heywood, and M.I. Heywood. Intelligent packets for dynamic network
routing using distributed genetic algorithm. In Proceedings of Genetic and Evolutionary Computa-
tion Conference, GECCO, July 2002.
[23] E.W. Dijkstra. A note on two problems in connection with graphs. Numerische Mathematik,
1: 269–271, 1959.
[24] A. Varga. OMNeT++: Discrete Event Simulation System: User Manual. http://www.omnetpp.org.
[25] C. Santivanez, B. McDonald, I. Stavrakakis, and R. Ramanathan. On the scalability of ad hoc
routing protocols. In Proceedings of IEEE INFOCOM 2002, IEEE, June 2002.
22.1 Introduction
In recent years several approaches to knowledge discovery and data mining, and in particular clustering,
have been developed, but only a few of them are designed using a decentralized approach. Clustering
data is the process of grouping similar objects according to their distance, connectivity, or relative density
in space [1]. There are a large number of algorithms for discovering natural clusters, if they exist, in a
dataset, but they are usually studied as a centralized problem. These algorithms can be classified into
partitioning methods [2], hierarchical methods [3], density-based methods [4], and grid-based methods
[5]. Han et al.'s paper [6] is a good introduction to this subject. Many of these algorithms work on
data contained in a file or database. In general, clustering algorithms focus on creating good compact
representations of clusters and appropriate distance functions between data points. For this purpose, they
generally need to be given one or two parameters by a user that indicate the types of clusters expected.
Since they use a central representation, where each point can be compared with every other point or cluster
representation, points are never placed in a cluster with largely different members. Centralized clustering
is problematic if we have a large dataset to explore or the data is widely distributed. Parallel and distributed
computing is expected to relieve current mining methods from the sequential bottleneck, providing the
ability to scale to massive datasets and improving the response time. Achieving good performance on
today's high performance systems is a nontrivial task. The main challenges include synchronization and
communication minimization, workload balancing, finding a good data decomposition, etc. Some existing
centralized clustering algorithms have been parallelized, and the results have been encouraging. Centralized
schemes require a high level of connectivity, impose a substantial computational burden, are typically more
sensitive to failures than decentralized schemes, and are not scalable, which is a property that distributed
computing systems are required to have.
Recently, other algorithms based on biological models [7–10] have been introduced to solve the clustering
problem in a decentralized fashion. These algorithms are characterized by the interaction of a large
number of simple agents that sense and change their environment locally. Furthermore, they exhibit
complex, emergent behavior that is robust with respect to the failure of individual agents. Ant colonies,
flocks of birds, termites, swarms of bees, etc., are agent-based insect models that exhibit a collective
intelligent behavior (swarm intelligence, SI) [11] that may be used to define new clustering algorithms.
The use of new SI-based techniques for data clustering is an emerging area of research that has
attracted many investigators in recent years. SI models have many features in common with evolutionary
algorithms (EAs). Like EAs, SI models are population-based. The system is initialized with a population of
individuals (i.e., potential solutions). These individuals are then manipulated over many iteration steps
by mimicking the social behavior of insects or animals, in an effort to find the optima in the problem
space. Unlike EAs, SI models do not explicitly use evolutionary operators, such as crossover and mutation.
A potential solution simply "flies" through the search space by modifying itself according to its past
experience and its relationship with other individuals in the population and the environment. In these
models, the emergent collective behavior is the outcome of a process of self-organization, in which insects
are engaged through their repeated actions and interactions with their evolving environment. Intelligent
behavior frequently arises through indirect communication between the agents using the principle of
stigmergy [12]. This mechanism is a powerful principle of cooperation in insect societies. According to
this principle, an agent deposits something in the environment that makes no direct contribution to the
task being undertaken but is used to influence the subsequent, task-related behavior. The advantages
of SI are twofold. First, it offers intrinsically distributed algorithms that can exploit parallel computation
quite easily. Second, these algorithms show a high level of robustness to change, allowing the solution to
adapt dynamically to global changes by letting the agents self-adapt to the associated local changes.
Social insects thus provide a new paradigm for developing decentralized clustering algorithms.
In this chapter, we present a method for decentralized clustering based on an adaptive flocking algorithm
proposed by Macgill [13]. We consider clustering as a search problem in a multiagent system in which
individual agents have the goal of finding specific elements in the search space, represented by a large
dataset of tuples, by walking efficiently through this space. The method takes advantage of the parallel
search mechanism a flock implies: if a member of a flock finds an area of interest, the mechanics
of the flock will draw the other members to scan that area in more detail. The algorithm selects interesting
subsets of tuples without inspecting the whole search space, guaranteeing that points are placed in the correct
clusters quickly. We have applied this strategy as a data reduction technique to perform approximate
clustering efficiently [14]. In the algorithm, each agent can use hierarchical, partitioning, density-based, or
grid-based clustering methods to discover whether a tuple belongs to a cluster.
To illustrate this method we present two algorithms: SPARROW (SPAtial clusteRing algoRithm thrOugh
sWarm intelligence) and SPARROW-SNN. SPARROW combines the flocking algorithm with the density-based
DBSCAN algorithm [15]. SPARROW-SNN combines the flocking algorithm with a shared nearest-neighbor
(SNN) clustering algorithm [16] to discover clusters with differing sizes, shapes, and densities in
noisy, high-dimensional data. We have built a SWARM [17] simulation of both algorithms to investigate
the interaction of the parameters that characterize them. First experiments showed encouraging results
and a better performance of both algorithms in comparison with the linear randomized search, a behavior
we explain using an entropy-based model.
The rest of this chapter is organized as follows: Section 22.2 presents the SPARROW clustering algorithm,
first introducing the heuristics of the DBSCAN algorithm, used by each agent to discover clusters, and
then the flocking algorithm for the interaction among the agents. Section 22.3 describes the centralized SNN
clustering algorithm and how it is combined with the flocking algorithm to produce the SPARROW-SNN
algorithm. Section 22.4 discusses the results obtained. Section 22.5 presents an entropy-based model to
theoretically explain the behavior of the algorithms, and Section 22.6 draws some conclusions.
Separation gives an agent the ability to maintain a certain distance from the others. This prevents agents
from crowding too closely together, allowing them to scan a wider area.
Cohesion gives an agent the ability to cohere with (approach and form a group with) other nearby agents.
Steering for cohesion can be computed by finding all agents in the local neighborhood and computing
the average position of the nearby agents. The steering force is then applied in the direction
of that average position.
Alignment gives an agent the ability to align with other nearby agents. Steering for alignment can
be computed by finding all agents in the local neighborhood and averaging together the "heading"
vectors of the nearby agents.
• Alignment and cohesion do not consider yellow agents, since they move in a zone that is not very
attractive.
• Cohesion is the resultant of the heading toward the average position of the green flockmates
(centroid), of the attraction toward red agents, and of the repulsion from white agents, as illustrated in
Figure 22.1.
• A separation distance is maintained from all the boids, regardless of their color.
FIGURE 22.1 Computing an agent's direction: the resultant of the heading toward the centroid of the green flockmates and the attraction toward red agents, combined with repulsion from white agents; yellow agents are ignored.
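A compact sketch of the three classical steering rules follows, based on Reynolds [18]; it is an illustration under our own assumptions (plain 2-tuples for vectors, equal weights for the three contributions), not the SWARM implementation.

import math

def steering(agent_pos, agent_heading, neighbors, min_dist):
    """Combine separation, cohesion, and alignment for one agent.
    `neighbors` is a list of (position, heading) pairs within the
    agent's local neighborhood; all vectors are 2-tuples."""
    sep = [0.0, 0.0]; coh = [0.0, 0.0]; ali = [0.0, 0.0]
    for (nx, ny), (hx, hy) in neighbors:
        dx, dy = agent_pos[0] - nx, agent_pos[1] - ny
        dist = math.hypot(dx, dy) or 1e-9
        if dist < min_dist:                # separation: push away from crowding
            sep[0] += dx / dist; sep[1] += dy / dist
        coh[0] += nx; coh[1] += ny         # cohesion: accumulate positions
        ali[0] += hx; ali[1] += hy         # alignment: accumulate headings
    n = len(neighbors)
    if n:
        # steer toward the average position; average the headings
        coh = [coh[0] / n - agent_pos[0], coh[1] / n - agent_pos[1]]
        ali = [ali[0] / n, ali[1] / n]
    # the (equally weighted) sum of the three contributions is the steering force
    return tuple(s + c + a for s, c, a in zip(sep, coh, ali))

print(steering((0, 0), (1, 0), [((2, 1), (0, 1)), ((1, -1), (1, 0))], min_dist=1.5))

In SPARROW this basic scheme is modified as in the bullet list above: red agents contribute an attractive term, white agents a repulsive one, and yellow agents are excluded from cohesion and alignment.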
for i = 1..MaxGenerations
    foreach agent (yellow, green)
        age = age + 1;
        if (age > Max_Life)
            generate_new_agent(); die();
        endif
        if (not visited(current_point))
            density = compute_local_density();
            mycolor = color_agent();
        endif
    end foreach
The SPARROW algorithm consists of a setup phase and a running phase, shown in Figure 22.2. During
the setup phase agents are created, data are loaded, and some general settings are made. In the running
phase each agent repeats four distinct procedures for a fixed number of times (MaxGenerations). We
use the foreach statement to indicate that the rules are executed in parallel by the agents whose color is
specified in the argument. The color_agent procedure chooses the color and the speed of an agent with
regard to the local density of the points of clusters in the data. It is based on the same parameters used in
the DBSCAN algorithm: MinPts, the minimum number of points to form a cluster, and Eps, the maximum
distance that the agents can look at. In practice, the agent computes the local density (density) in a circular
neighborhood (with a radius determined by its limited sight, i.e., Eps) and then it chooses its color
accordingly: red for dense, interesting regions, green for medium densities, yellow for sparse regions, and
white for empty regions.
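A minimal sketch of this coloring step follows; the density thresholds that separate the colors are our assumptions (the exact rules appeared in a figure we do not reproduce), as are the function and parameter names.

def color_agent(points, agent_pos, eps, min_pts):
    """Choose an agent's color from the local density of data points in a
    circular neighborhood of radius Eps (a sketch: the thresholds below
    are assumed, not the exact rules of the original figure)."""
    density = sum(1 for p in points
                  if (p[0] - agent_pos[0]) ** 2 + (p[1] - agent_pos[1]) ** 2 <= eps ** 2)
    if density >= min_pts:        # dense, interesting region: stop and signal
        return "red"
    if density >= min_pts // 2:   # medium density: move slowly
        return "green"
    if density > 0:               # sparse region: move fast
        return "yellow"
    return "white"                # empty (desert) region: stop and signal

pts = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.9, 0.9)]
print(color_agent(pts, (0.15, 0.15), eps=0.2, min_pts=3))   # -> "red"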
In the running phase, yellow and green agents compute their direction according to the rules
previously described and move in this direction with the speed corresponding to their color.
If an agent falls on the same position as that of an older one, it dies and is regenerated in
another place.
Agents move toward the computed destination with a speed depending on their color: green agents
move more slowly than yellow agents, since they explore the denser zones of clusters. Green and yellow agents
have a variable speed, with a common minimum and maximum for all agents. An agent speeds up to
leave an empty or uninteresting region, whereas it slows down to investigate an interesting region more
carefully.
The variable speed introduces an adaptive behavior into the algorithm. In fact, agents adapt their movement
and change their behavior (speed) based on their previous experience, represented by the red and
white agents. Red and white agents stop moving, signaling to the others the interesting and the desert regions,
respectively. Note that for any agent that has become red or white, a new agent is generated in order to maintain
a constant number of agents exploring the data. In the first case, the new agent is generated at a
close random point, since the zone is considered interesting, while in the latter it is generated at a
random point over the whole space. Next, red agents run the merge procedure, which merges the
neighboring clusters. The merging phase considers two different cases: when none of the points in the circular
neighborhood have been visited, and when there are points belonging to different clusters. In the first case, the
points are labeled and constitute a new cluster; in the second case, all the points are merged
into the same cluster, that is, they get the label of the cluster discovered first. The first part of the code
executed by yellow and green agents was added to avoid a "cage effect" (see Figure 22.3), which occurred
during the first simulations; some agents could remain trapped inside regions surrounded by red
or white agents with no way out, wasting resources useful for the exploration. Therefore,
a limit was imposed on their life: when their age exceeds a determined value (Max_Life), they die
and are regenerated at a new randomly chosen position in the space.
SPARROW suffers from the same limitation as DBSCAN, that is, it cannot cope with clusters of different
densities. The new algorithm SPARROW-SNN, introduced in Section 22.3, is more general and overcomes
this drawback: it can be used to discover clusters with differing sizes, shapes, and densities in noisy data.
of points in terms of how many nearest neighbors the two points share. Using this new definition of
similarity, the algorithm eliminates noise and outliers, identifies representative points, and then builds
clusters around the representative points. These clusters do not contain all the points; rather, they represent
relatively uniform groups of points.
The SNN algorithm starts by performing the Jarvis–Patrick scheme. In the Jarvis–Patrick algorithm, a set
of objects is partitioned into clusters based on the number of shared nearest neighbors. The standard
implementation consists of two phases. The first, a preprocessing stage, identifies the K nearest
neighbors of each object in the dataset. In the subsequent clustering stage, two objects i and j join the
same cluster if:
• i is one of the K nearest neighbors of j
• j is one of the K nearest neighbors of i
• i and j have at least Kmin of their K nearest neighbors in common
where K and Kmin are user-defined parameters. For each pair of points i and j, a link is defined with an
associated weight. The strength of the link between i and j is defined as:

strength(i, j) = Σ (k + 1 − m) × (k + 1 − n), where i_m = j_n.

In this equation, k is the nearest-neighbor list size, and m and n are the positions of a shared nearest
neighbor in i's and j's lists. At this point, clusters can be obtained by removing all edges with weights less
than a user-specified threshold and taking all the connected components as clusters. A major drawback
of the Jarvis–Patrick algorithm is that the threshold needs to be set high enough, since two distinct sets of
points can be merged into the same cluster even if there is only a single link across them. On the other hand, if a
high threshold is applied, then a natural cluster can be split into many small clusters due to variations
in the similarity within the cluster.
Shared nearest-neighbor clustering addresses these problems by introducing the following steps (a sketch
implementing them follows below):
1. For every node (data point), calculate the total strength of the links coming out of the point.
2. Identify representative points by choosing the points that have high density (>core_threshold).
3. Identify noise points by choosing the points that have low density (<noise_threshold) and
remove them.
4. Remove all links between points that have a weight smaller than a threshold (merge_threshold).
5. Take connected components of points to form clusters, where every point in a cluster is either a
representative point or is connected to a representative point.
The number of clusters is not given to the algorithm as a parameter. Also, note that not all the points are
clustered.
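The following minimal Python sketch ties steps 1 to 5 together with the link strength formula above; the function name and the data layout (a dict of ordered nearest-neighbor lists) are our assumptions, not the reference implementation.

def snn_clusters(knn, k, k_min, merge_t, core_t, noise_t):
    """Sketch of the SNN steps 1-5 above.  `knn` maps each point to its
    ordered list of K nearest neighbors; the thresholds correspond to
    core_threshold, noise_threshold, and merge_threshold in the text."""
    def strength(i, j):
        # Jarvis-Patrick conditions: mutual K-NN membership and at least
        # k_min shared neighbors; the weight sums (k+1-m)(k+1-n) over the
        # shared neighbors, m and n being 1-based positions in the lists
        if j not in knn[i] or i not in knn[j]:
            return 0
        shared = set(knn[i]) & set(knn[j])
        if len(shared) < k_min:
            return 0
        return sum((k - knn[i].index(s)) * (k - knn[j].index(s)) for s in shared)

    points = list(knn)
    links = {(a, b): strength(a, b) for a in points for b in points if a < b}
    conn = {p: sum(w for (a, b), w in links.items() if p in (a, b)) for p in points}
    core = {p for p in points if conn[p] > core_t}    # step 2: representatives
    noise = {p for p in points if conn[p] < noise_t}  # step 3: remove noise
    parent = {p: p for p in points if p not in noise}
    def find(p):                                      # union-find root lookup
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p
    for (a, b), w in links.items():                   # step 4: keep strong links
        if w >= merge_t and a in parent and b in parent:
            parent[find(a)] = find(b)
    groups = {}
    for p in parent:                                  # step 5: connected components
        groups.setdefault(find(p), set()).add(p)
    return [g for g in groups.values() if g & core]

knn = {0: [1, 2, 3], 1: [0, 2, 3], 2: [1, 0, 3], 3: [2, 1, 0],
       7: [8, 9, 3], 8: [7, 9, 3], 9: [8, 7, 3]}
print(snn_clusters(knn, k=3, k_min=2, merge_t=4, core_t=8, noise_t=1))
# -> two clusters: {0, 1, 2, 3} and {7, 8, 9}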
for i = 1..MaxGenerations
    foreach agent (yellow, green)
        if (not visited(current_point))
            conn = compute_conn();
            if (conn < noise_threshold)
                consider the point for removal from the clustering
            endif
        endif
        mycolor = color_agent();
    end foreach
computes the connectivity, conn[A], of the point, that is, the total number of strong links the
point has, according to the rules of the SNN algorithm. Points having a connectivity smaller than a fixed
threshold (noise_threshold) are classified as noise and are considered for removal from the clustering. A color
is then assigned to each agent, based on the value of the connectivity computed at the visited point, using
the color_agent() procedure of the pseudocode. The colors assigned to the agents are: red, revealing
representative points; green, border points; yellow, noise points; and white, indicating an obstacle (an
uninteresting region). After the coloration step, the green and yellow agents compute their movement by
observing the positions of all other agents that are within some fixed distance (dist_max) from them and
applying the same rules as SPARROW.
In any case, each red agent (placed on a representative point) runs the merge procedure, so that it
includes in the final cluster the representative point discovered together with the points that share
a significant (greater than Pmin) number of neighbors and that are not noise points. The merging phase
considers two different cases: when none of these points in the neighborhood have been visited, and when
there are points belonging to different clusters. In the first case, the points are assigned the same
temporary label and constitute a new cluster; in the second case, all the points are merged into
the same cluster, that is, they get the label corresponding to the smallest one. Therefore, clusters are
built incrementally.
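The incremental label-merging step can be sketched as follows; the function name and the `labels` mapping are hypothetical, introduced only to illustrate the two cases just described.

def merge_neighborhood(labels, neighborhood, next_label):
    """Incremental cluster merging (a sketch).  If no point in the
    neighborhood is labeled yet, open a new cluster; otherwise merge all
    points into the cluster with the smallest label seen."""
    seen = {labels[p] for p in neighborhood if p in labels}
    if not seen:
        target = next_label                   # a brand-new cluster
        next_label += 1
    else:
        target = min(seen)                    # label of the cluster found first
        # relabel every point already carrying one of the merged labels
        for p, lab in labels.items():
            if lab in seen:
                labels[p] = target
    for p in neighborhood:
        labels[p] = target
    return next_label

labels = {}
nl = merge_neighborhood(labels, ["a", "b"], 0)   # new cluster 0
nl = merge_neighborhood(labels, ["c", "d"], nl)  # new cluster 1
nl = merge_neighborhood(labels, ["b", "c"], nl)  # merges clusters 0 and 1
print(labels)   # {'a': 0, 'b': 0, 'c': 0, 'd': 0}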
In this case too, a "cage effect" was observed during the simulations. Its handling is not reported in
the pseudocode, as it is the same as that used in the SPARROW algorithm.
TABLE 22.1 Percentage of data points per cluster found by SPARROW (GEORGE dataset) when 7, 12, and 22% of the points are visited.

TABLE 22.2 Percentage of data points per cluster found by SPARROW (DS4 dataset) when 7, 12, and 22% of the points are visited.

Cluster      7%       12%      22%
1            51.16    70.99    78.86
2            45.91    64.74    74.40
3            40.68    59.36    81.95
4            44.21    60.66    81.67
5            54.65    58.72    71.54
6            48.77    59.91    78.10
7            54.29    66.43    79.18
8            51.16    70.99    78.86
9            45.91    64.74    74.40
FIGURE 22.6 Number of core points found for SPARROW, random, and flock strategy versus total number of visited points for the DS4 dataset.
FIGURE 22.7 Number of core points found for SPARROW, random, and flock strategy versus total number of visited points for the GEORGE dataset.
FIGURE 22.8 The impact of the number of agents (25, 50, and 100 boids) on the foraging for clusters strategy (DS4).
FIGURE 22.9 The impact of the number of agents (25, 50, and 100 boids) on the foraging for clusters strategy (GEORGE).
At the beginning, the random strategy, and also (to a lesser extent) the flock, outperforms SPARROW;
however, after 200 to 250 visited points SPARROW shows a superior behavior over both search
strategies because of its adaptive nature, which allows agents to learn from their previous
experience. Finally, we present the impact of the number of agents on the foraging for clusters performance.
Figure 22.8 and Figure 22.9 give the number of clusters found in 100 time steps (generations) for 25, 50,
and 100 agents. A comparative analysis reveals that a population of 100 agents discovers a larger number of
clusters than the two populations with a smaller number of agents (the scalability is almost linear).
This scalable behavior of the algorithm results in a faster completion time, because a smaller number of
iterations is necessary to produce the solution.
FIGURE 22.10 The two datasets used in our experiments (circles surround the three towns of the North–East dataset: Boston, New York, and Philadelphia).
1. Determine the accuracy of the approximate solution that we obtain if we run our clustering algorithm
on only a percentage of the points, as opposed to running the SNN clustering algorithm on the entire
dataset.
2. Determine the effects of using the SPARROW-SNN search strategy as opposed to a random-walk
search (RWS) strategy in order to identify clusters.
3. Determine the impact of the number of agents on the foraging for clusters performance.
For the experiments, we used a synthetic dataset and a real-life dataset extracted from a spatial database.
The structures of these datasets are shown in Figure 22.10(a) and 22.10(b).
The first dataset, called DS1, contains 8,000 points and presents different densities in the clusters.
The second dataset, called North–East, contains 123,593 points representing postal addresses of three
metropolitan areas (New York, Boston, and Philadelphia) in the North East States.
We first illustrate the accuracy loss of our SPARROW-SNN algorithm in comparison with the SNN
algorithm when SPARROW-SNN is used as a technique for approximate clustering. For this purpose,
we implemented a version of SNN and computed the number of clusters and the number of points per
cluster for the two datasets. Table 22.3 presents a comparison of these results with respect to the ones obtained
from SPARROW-SNN when a population of 50 agents visited, respectively, 7, 12, and 22% of the entire
dataset.
Note that with only 7% of the points we already have a clear vision of the found clusters, and with a few more
points we can recover nearly all of their points. This trend is well marked in the North–East dataset.
For the DS1 dataset, the results are not so well defined, because the clusters numbered 3 and 4 have
very few points, so they are very hard to discover. For the real dataset, we only report the results for the
three main clusters, representing the towns of Boston, New York, and Philadelphia.
We can explain these good results through the adaptive search strategy of SPARROW-SNN: the individual
agents first explore the data searching for representative points, whose positions are not known a priori;
then, once the representative points are located, all the flock members are steered toward them, since they
mark the interesting regions, while the uninteresting areas are marked as obstacles and avoided, with the
agents adaptively changing their speed.
To verify the effectiveness of the search strategy, we compared SPARROW-SNN with the RWS
strategy and with the standard flocking search strategy. Figure 22.11 and Figure 22.12 show the number
of representative points found with SPARROW-SNN and those found with the random search and with
the standard flock versus the total number of visited points. Both figures reveal that the number of
representative points discovered at the beginning (until about 170 visited points for DS1 and 150 for the
North–East dataset) by the random strategy, and (to a lesser extent) by the flock, is slightly greater
than that of SPARROW-SNN.
FIGURE 22.11 Number of core points found for SPARROW-SNN, random, and flock versus total number of visited points for the DS1 dataset.
Beyond that, our strategy shows a superior behavior over both strategies because of its adaptive nature,
which allows the agents to learn from their previous experience.
Finally, in order to study the scalability of our approach, we present the impact of the number of agents
on the foraging for clusters performance. Figure 22.13 and Figure 22.14 give, for the DS1 and the
North–East dataset, respectively, the number of clusters found in 100 time steps (generations) for 25, 50, and 100
agents. A comparative analysis reveals that a population of 100 agents discovers a larger number of clusters
(almost linearly so) than the two populations with a smaller number of agents. This scalable behavior of
the algorithm should result in a faster completion time, because a smaller number of iterations should
be necessary to produce the solution.
We give a partial theoretical explanation of the behavior observed in Figure 22.11 and
Figure 22.12, that is, the improvement in accuracy of the SPARROW-SNN strategy, by introducing an
entropy-based model in Section 22.5.
FIGURE 22.12 Number of core points found for SPARROW-SNN, random, and flock versus total number of visited points for the North–East dataset.
FIGURE 22.13 The impact of the number of agents (25, 50, and 100 boids) on the foraging strategy for the DS1 dataset.
FIGURE 22.14 The impact of the number of agents (25, 50, and 100 boids) on the foraging strategy for the North–East dataset.
emergence in multiagent systems, and concepts such as entropy are not just loose metaphors but can provide
quantitative, analytical guidelines for designing and operating agent systems. These concepts can be applied
in measuring the behavior of multiagent systems. The main result suggested here concerns the principle
that the key to reducing disorder in a multiagent system, so as to achieve a coherent global behavior, is coupling
that system to another in which disorder increases. This corresponds to a macrolevel where the order
increases, that is, a coherent behavior arises, and a microlevel where an increase in disorder is the cause
of this coherent behavior at the macrolevel.
A multiagent system follows the second law of thermodynamics ("energy spontaneously disperses
from being localized to becoming spread out if it is not hindered") if agents move without any constraint.
However, if we add information in an intelligent way, the agents' natural tendency toward maximum entropy
will be counteracted and the system moves toward self-organization.
Here, we apply this model to the SPARROW-SNN algorithm. As stated in Reference 20, we can
observe two levels of entropy: a macrolevel at which organization takes place, balanced by a microlevel at
which we have an increase in entropy. For the sake of clarity, in SPARROW-SNN the microlevel is represented
by the red and white agents' positions, signaling, respectively, interesting and desert zones, and the macrolevel
is computed considering all the agents' positions. Therefore, we expect to observe an increase of micro
entropy through the birth of new red and white agents and, on the contrary, a decrease in macro entropy,
indicating organization in the coordination model of the agents.
In the following, we give a more exact description of the entropy-based model. In information theory,
entropy can be defined as

S = − Σ_i p_i log p_i.
Now, to adapt this formula to our needs, we use a location-based entropy. Consider an agent moving
in the space of the data, divided into a grid of N × M = K cells, each of the same size. Hence, if N
and M are quite large, a random agent has the same probability of being in any one of the K cells of the grid.
We measured the entropy by running a simulation 100 times, for 2000 time steps, using 50 agents, and
counting how many times an agent falls in the same cell i at each time step. Dividing this number by T,
the number of time steps, we obtain the probability p_i that the agent will be in this cell.
Then, the locational entropy will be:

S = − ( Σ_{i=1}^{a} p_i log p_i ) / log a.    (22.1)

In the case of a random distribution, every cell has probability 1/a, so the overall entropy will be
(log a)/(log a) = 1; this explains the normalization factor log a.
This equation can be generalized to P agents by summing over all the agents and dividing by P to obtain
the average. Equation (22.1) represents the macro entropy; if we consider only red and white points, it represents
the micro entropy.
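The following sketch computes the normalized locational entropy of Equation (22.1) for one agent; the function name and the synthetic visit sequences are our own, used only to show the contrast between an unconstrained and an organized agent.

import math
import random
from collections import Counter

def locational_entropy(cell_visits, num_cells):
    """Normalized locational entropy, Equation (22.1) (a sketch).
    `cell_visits` holds the grid cell an agent occupied at each time step;
    `num_cells` is a, the number of cells, so that a uniform random walker
    scores close to 1.0."""
    total = len(cell_visits)
    counts = Counter(cell_visits)
    s = -sum((c / total) * math.log(c / total) for c in counts.values())
    return s / math.log(num_cells)

random.seed(1)
uniform = [random.randrange(100) for _ in range(2000)]   # unconstrained agent
clumped = [random.randrange(5) for _ in range(2000)]     # agent drawn to one zone
print(locational_entropy(uniform, 100))   # close to 1: no organization
print(locational_entropy(clumped, 100))   # much lower: organized behavior

Applied to all agents, this gives the macro entropy; restricted to the red and white agents' positions, it gives the micro entropy.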
We computed the micro and macro locational entropy for the real dataset North–East with the
SPARROW-SNN algorithm, using the parameters cited earlier. The results are shown in Figure 22.15
and Figure 22.16.
As expected, we can observe an increase in the micro entropy and a decrease in the macro entropy of
SPARROW-SNN, due to the organization introduced into the coordination model of the agents by the
attraction toward red agents and the repulsion from white agents. On the contrary, in the random and
standard flock models, the curve of the macro entropy is almost constant, confirming the absence of
organization.
These experiments partially explain the validity of our approach, but further studies are necessary, and
will be conducted, to better demonstrate the positive effects of organization and the correlation between
macro entropy and the validity of the search strategy.
FIGURE 22.15 Micro entropy (red and white agents) for the North–East dataset using SPARROW-SNN.
FIGURE 22.16 Macro entropy (all the agents) for the North–East dataset using SPARROW-SNN, compared with the random and flock models.
22.6 Conclusions
In this chapter, we have described a fully decentralized approach to clustering spatial data with a multiagent
system. The approach is based on the use of SI techniques. Two novel algorithms that combine
density-based and shared nearest-neighbor clustering strategies with a flocking algorithm have been presented.
The algorithms have been implemented in SWARM and evaluated using synthetic datasets and one
real-world dataset. Measures of the accuracy of the results show that the flocking algorithm can be efficiently
applied as a data reduction strategy to perform approximate clustering. Moreover, with an entropy-based
model we have theoretically argued that the adaptive search strategy of SPARROW is more efficient
than that of the RWS strategy. Finally, the algorithms show a good scalable behavior.
Acknowledgment
This work was supported by the CNR/MIUR project — legge 449/97-DM 30/10/2000 and by Project
“FIRB Grid.it” (RBNE01KNFP).
References
[1] Han, J. and Kamber, M. Data Mining: Concepts and Techniques. Morgan Kaufmann, San Mateo,
CA, 2000.
[2] Kaufman, L. and Rousseeuw, P.J. Finding Groups in Data: An Introduction to Cluster Analysis.
John Wiley & Sons, New York, 1990.
[3] Karypis, G., Han, E., and Kumar, V. CHAMELEON: A hierarchical clustering algorithm using
dynamic modeling. IEEE Computer, 32, 68–75, 1999.
[4] Sander, J., Ester, M., Kriegel, H.-P., and Xu, X. Density-based clustering in spatial databases: The
algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery, 2, 169–194,
1998.
[5] Wang, W., Yang, J., and Muntz, R. STING: A statistical information grid approach to spatial data
mining. In Proceedings of International Conference on Very Large Data Bases (VLDB’97), 1997,
pp. 186–195.
[6] Han, J., Kamber, M., and Tung, A.K.H. Spatial clustering methods in data mining: A survey. In
Geographic Data Mining and Knowledge Discovery, H. Miller and J. Han (Eds.), Taylor & Francis,
London, 2001.
[7] Deneubourg, J.L., Goss, S., Franks, N., Sendova-Franks, A., Detrain, C., and Chretien, L. The
dynamic of collective sorting robot-like ants and ant-like robots. In Proceedings of the first Confer-
ence on Simulation of Adaptive Behavior, J.A. Meyer and S.W. Wilson (Eds.), MIT Press/Bradford
Books, Cambridge, MA, 1990, pp. 356–363.
[8] Lumer, E.D. and Faieta, B. Diversity and adaptation in populations of clustering ants. In Proceedings
of the Third International Conference on Simulation of Adaptive Behavior: From Animals to Animats
(SAB94), D. Cliff, P. Husbands, J.A. Meyer, and S.W. Wilson (Eds.), MIT Press, Cambridge, MA,
1994, pp. 501–508.
[9] Kuntz, P. and Snyers, D., Emergent colonization and graph partitioning, In Proceedings of the Third
International Conference on Simulation of Adaptive Behavior: From Animals to Animats (SAB94),
D. Cliff, P. Husbands, J.A. Meyer, and S.W. Wilson (Eds.), MIT Press, Cambridge, MA, 1994,
pp. 494–500.
[10] Monmarché, N., Slimane, M., and Venturini, G. On improving clustering in numerical databases
with artificial ants. In Advances in Artificial Life: 5th European Conference, ECAL 99, Vol. 1674 of
Lecture Notes in Computer Science, Springer-Verlag, Berlin, 1999, pp. 626–635.
[11] Bonabeau, E., Dorigo, M., and Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems.
Oxford University Press, Oxford, 1999.
[12] Grassé, P.P., La Reconstruction du nid et les Coordinations Inter-Individuelles chez Bellicositermes
Natalensis et Cubitermes sp. La Théorie de la Stigmergie: Essai d’interprétation du Comportement
des Termites Constructeurs, Insect. Soc. 6, pp. 41–80, 1959.
[13] Macgill, J. Using flocks to drive a geographical analysis engine. In Artificial Life VII: Proceedings of
the Seventh International Conference on Artificial Life. MIT Press, Reed College, Portland, Oregon,
2000, pp. 1–6.
[14] Kollios, G., Gunopoulos, D., Koudas, N., and Berchtold, S. Efficient biased sampling for
approximate clustering and outlier detection in large datasets. IEEE Transactions on Knowledge
and Data Engineering, 15, 1170–1187, 2003.
[15] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X. A density-based algorithm for discovering clusters in
large spatial databases with noise. In Proceedings of the 2nd International Conference on Knowledge
Discovery and Data Mining (KDD-96), Portland, OR, 1996, pp. 226–231.
[16] Ertöz, L., Steinbach, M., and Kumar, V. A new shared nearest neighbor clustering algorithm and its
applications. In Workshop on Clustering High Dimensional Data and its Applications at 2nd SIAM
International Conference on Data Mining, Washington, USA, April 2002, pp. 105–115.
[17] Minar, N., Burkhart, R., Langton, C., and Askenazi, M. The Swarm Development Group, 1996,
http://www.santafe.edu/projects/swarm.
[18] Reynolds, C.W. Flocks, herds, and schools: A distributed behavioral model. Computer Graphics,
21, 25–34, 1987.
[19] Jarvis, R.A. and Patrick, E.A. Clustering using a similarity measure based on shared nearest
neighbors. IEEE Transactions on Computers, C-22(11), 1025–1034, 1973.
[20] Van Dyke Parunak, H. and Brueckner, S. Entropy and self-organization in multi-agent systems.
In Proceedings of the Fifth International Conference on Autonomous Agents, ACM Press, 2001,
pp. 124–130.
23.1 Introduction
Security management against malicious intrusions and frauds in mobile telecommunication systems
represents one of the most challenging problems for the wireless networking and mobile phone communities.
Currently, mobile telephone carriers are being plagued by the fraudulent use of cloned and stolen
phone numbers. These fraudulent acts are costing them dearly in many countries, especially in countries
where mobile phones can be rented and exchanged freely from some telecom carriers [1]. They are
also witnessing, every year, a significant increase in their profit losses, which are, unfortunately, passed on to the
mobile phone users. As a consequence, many of these mobile phone carriers are investing a large amount
of money in R&D to develop efficient security management solutions to deal with their security
concerns. In this chapter, we focus mainly on the cloning and subscription frauds. In a subscription fraud,
an impostor may subscribe to the service provided by the mobile phone telecommunication company
with a prior intention of not paying for the service. The impostor can also steal, subscribe to, or use a
different mobile phone each month (e.g., using a different name/number). However, in all of these cases,
these mobile phones will most likely have a common usage pattern. The cloning fraud is characterized
by calls that were made from cloned phones, where the calls appear in the monthly billing statement of
a legitimate phone. The cloning may occur via a simple radio that can capture the two identification
numbers (ESN, the Electronic Serial Number, and MIN, the Mobile Identification Number) of a legitimate phone.
Here again, calls from cloned phones and calls from legitimate phones will most likely have different usage
patterns, as the usage pattern is specific to each mobile phone user. While it is true that cloning frauds can
be significantly reduced using new hardware technologies, subscription frauds are hardware independent.
Therefore, new software-based technologies must be developed and deployed to identify these malicious
intruders. To the best of our knowledge, little work has been done at the software level to deal with the
cloning and subscription fraud problem [2–4].
1. Cryptography schemes increase the difficulty for malicious intruders to capture the ESN/MIN numbers and perform clandestine eavesdropping. While one might argue that it is simple to use such schemes in digital phones, it is not obvious how they can be used in analog phones. Thereby, new and efficient schemes are needed to deal with the security problem in mobile phone operations. Furthermore, the encryption models of GSM digital phones have recently been broken, and such phones cloned, by malicious intruders.
2. Denial of service schemes are based upon denying certain classes of calls, international calls for instance. While this approach might help the mobile phone carriers reduce their losses, it is clearly a workaround and not an efficient security management solution.
3. User verification schemes are based upon the use of passwords by mobile phone users before using their phone sets. Note that this approach is widely used by many mobile phone carriers. While it provides a certain level of security, it is our belief that a good security solution should not complicate mobile phone usage by requiring the users to memorize passwords in order to use their phones.
4. Traffic analysis schemes are based upon the profile and behavior of the mobile phone users. Nowadays, the main problem of this approach is its simplistic, rule-based analysis, in which only the monthly billing statement and the price of the calls are considered.
In this chapter, we focus upon the last approach. We show how nature-inspired and biologically inspired techniques can be used efficiently for traffic analysis and for the identification of malicious intruders in mobile phone operation systems.
service (e.g., the user has a history of bad debt). An immediate and automatic message (thereby minimizing the personnel needed for this task) is sent to the mobile phone user in order to confirm the cloning or subscription fraud. As opposed to waiting until the end of the monthly billing cycle, this immediate notification helps the telecom carriers reduce their losses, thereby reducing the damage that would otherwise be passed on to the mobile phone users as increased phone call fees, surcharges, and so on. In the case of detection of a fraudulent subscription, personalized reports are quickly generated and used by the system and the telecom carrier to investigate who actually made the suspected calls.
In addition to security management, Web/WAP application tools could be developed so that users can observe their phone bills online, allowing them to track their statements and identify malicious intruders using their phone number, thereby reducing fraudulent usage of the mobile phone. The system may encompass services of access control, authentication, confidentiality, integrity, availability, and nonrepudiation of communication. In our previous work [2,4,5], we incorporated and implemented all of these security services using Java with the support of CORBA. Moreover, as an important characteristic, our system was formally validated according to the ISO 8807 standard; see References 3 and 6.
This chapter is organized as follows: Section 23.2 describes the ISO 8807 standard that we have used to specify and validate our security management system. Section 23.3 demonstrates how neural network techniques can be used to detect intrusions by recognizing the calling patterns of telecom users. Section 23.4 discusses how the human immune system can be used as an alternative model to identify malicious intruders and fraudulent usage of mobile phones. Section 23.5 concludes the chapter.
Caesar is a compiler that translates a LOTOS specification into a C program (to be executed or simulated) or into an LTS, Labeled Transition System (to be verified using bi-simulation and temporal logic tools). As will be demonstrated in Section 23.2.3, it is possible to compare the protocol LTS with the related service LTS. Both LTSs are generated using the Caesar compiler and are compared using the Aldebaran tool.
Aldebaran is a tool for the verification of communication systems represented by LTSs, in which the state transitions are labeled by action names. Aldebaran allows the reduction of LTSs using several equivalence relations (e.g., strong bi-simulation, observational equivalence, delay bi-simulation, and secure equivalence).
The stepwise refinement approach used in our work to build the LOTOS specifications allows the system to be validated throughout the system development.
[Figure: The SSTCC service, whose gates mail_alarm, phone_alarm, online_bill, and check_owner connect the SSTCC system with the telecom carrier and the users.]
specification SstccService[mail_alarm, phone_alarm, online_bill, check_owner] : noexit
behaviour
   SstccService[mail_alarm, phone_alarm, online_bill, check_owner]
where
   process SstccService[mail_alarm, phone_alarm, online_bill, check_owner] : noexit :=
      (mail_alarm;
         (phone_alarm; SstccService[mail_alarm, phone_alarm, online_bill, check_owner]
          [] i; SstccService[mail_alarm, phone_alarm, online_bill, check_owner]))
      [] (online_bill; SstccService[mail_alarm, phone_alarm, online_bill, check_owner])
      [] (check_owner; SstccService[mail_alarm, phone_alarm, online_bill, check_owner])
   endproc
endspec
The mail_alarm gate is used by SSTCC to send alarm signals about a malicious intruder to the users by regular mail. The phone_alarm gate allows the SSTCC system to send these alarms via the mobile phone (e.g., a short message or an interactive voice message). The advantage of regular mail is its delivery guarantee, whereas the mobile phone alarm has the benefit of short delivery time. The online_bill gate is used by the users to observe their phone bill online via the Web/WAP tool. Finally, the check_owner gate is used by the telecom carrier to investigate probable impostors (with potential future bad debts). The SSTCC system remains permanently active, thereby characterizing an infinite behavior of this system; this is expressed in Figure 23.2 using the noexit functionality.
The behavior of our SSTCC system is defined by the process SstccService, which can execute an action on the gate mail_alarm in order to send an alarm by regular surface mail (this action is always possible); this is followed by a nondeterministic choice with two options. In the first option, the alarm is sent by mobile phone, on the gate phone_alarm (this action is not always possible), and the process SstccService is then called recursively in order to deal with another case. The first option may not be successful, since the phone may not work properly (e.g., the mobile phone user may be out of the coverage area for a certain period of time); in that case an internal action i occurs (which is not observable) and the process SstccService is executed recursively. Similarly, online_bill and check_owner are the two gates used to observe the phone bills and to verify/confirm the user identity, respectively.
This abstract level of the SSTCC specification corresponds to a formal specification of the user requirements of the system, which will represent the basis for future refinements (i.e., for the protocol specification).
[Figure 23.3: Architecture of the SSTCCProtocol. Each of its three components pairs a CORBA manager with several distributed agents: in SSCC, the CloneCorbaAgents read the UserPatternFile and OnlineCallsFile databases and emit clone_notif events leading to the mail and phone alarms; in SETWeb/SETWap, the BillCorbaAgents read the BillDB and UserOnlineCallsFile databases and emit bill_notif events to build the users' online bills; in SIPI, the ImpostorCorbaAgents read the Patterns, Impostors, and OnLine databases and emit impostor_notif events supporting the check owner service.]
Figure 23.3 displays the three components, each consisting of a manager and several distributed agents. The SSCC/SIPI agents compare the online calls database with the users' pattern database identified by the neural network technology. The SETWeb/Wap agents access both the online and the existing databases to build the phone bills. The procedural definition of the SSTCCProtocol behavior is shown in Figure 23.4.
Figure 23.4 presents the three components composed in parallel using the independent parallel composition operator (|||). Based upon the two specifications (SstccService.lotos and SstccProtocol.lotos), the Aldebaran tool can check that the protocol conforms to the service, as shown in the verification session following the specification.
specification SstccProtocol[mail_alarm, phone_alarm, online_bill, check_owner] : noexit
behaviour
   hide clone_notif, web_notif, impostor_notif in
      SsccClone[mail_alarm, phone_alarm]
      ||| SetwebBill[online_bill]
      ||| SipiImpostor[check_owner]
where
   process SsccClone[mail_alarm, phone_alarm] : noexit := ... endproc
   process SetwebBill[online_bill] : noexit := ... endproc
   process SipiImpostor[check_owner] : noexit := ... endproc
endspec
-----------------------------------------------------------
aldebaran -bddsize 4 -oequ -std SstccProtocol_bmin.bcg ./SstccService.bcg | tee aldebaran.seq
TRUE
-----------------------------------------------------------
[Figure: Architecture of the RBF network: an input layer (u0, . . . , uN), a hidden layer of Gaussian units (a summing junction followed by a Gaussian function), and a linear output layer (y1, y2, . . .).]
To increase the RBF’s functionality, the Mahalanobis distance could be used within the Gauss function, as
follows:
1 1 T −1
f = (x − c) = exp − (x − c) K (x − c) , (23.2)
(2π )n/2 |K |1/2 2
where K −1 is the inverse of the X covariance matrix, associated with the node of the hidden C layer.
Given n-vectors (input data) of p samples, representing p classes, the network can be initialized with knowledge of the centers (i.e., the locations of the samples). If the j-th vector sample is represented, then the weight matrix C can be defined as C = [c_1, c_2, . . . , c_p]^T, so that the weights of the hidden layer at node j are composed of the center vector. The output layer is a weighted sum of the outputs of the hidden layer. When an input vector is presented to the network, the network implements Equation (23.3), where f represents the functional output vector of the hidden layer and C the corresponding center vector.
After supplying some data with the desired results, the weights (W) can be determined using the training algorithm either iteratively or noniteratively, based upon the gradient descent or the pseudo-inverse scheme, respectively. In order to determine the variance parameter σ² for the Gauss function, one has first to compute the mean distance between all the training data and the corresponding center; see Equation (23.4).
$$\sigma_j^2 = \frac{1}{M_j} \sum_{x \in \Theta_j} (x - c_j)^T (x - c_j), \qquad (23.4)$$

where Θ_j is the group of training patterns grouped around the center of cluster C_j, and M_j is the number of patterns in Θ_j.
Another way of choosing the parameter σ² is to calculate the distances between the centers in each dimension and use a scaling factor. This approach allows the p-nearest-neighbors algorithm to be used to obtain the variance related to each center.
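As a concrete illustration, the following minimal Python sketch trains an RBF network noniteratively via the pseudo-inverse, as described above; the function names, array shapes, and the single shared width sigma are illustrative assumptions, not part of the original system.

import numpy as np

def rbf_train(X, Y, centers, sigma):
    # X: (p, n) training inputs, Y: (p, m) desired outputs,
    # centers: (c, n) hidden-layer center vectors, sigma: Gaussian width.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    F = np.exp(-d2 / (2.0 * sigma ** 2))     # (p, c) hidden-layer outputs
    # Output weights solved noniteratively with the Moore-Penrose pseudo-inverse.
    return np.linalg.pinv(F) @ Y             # (c, m)

def rbf_predict(X, centers, sigma, W):
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ W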
rate is quite encouraging when compared to previous results obtained by other researchers [13], where a 5.4% error rate was obtained using the backpropagation algorithm.
Two groups of algorithms have emerged as successful implementations of artificial immune systems: (1) the negative selection algorithm developed by Forrest et al. [19], which is used to differentiate between normal system operation (i.e., self) and abnormal operation (i.e., nonself), and (2) the immune network model first discussed by Jerne [20] and then formally described by Perelson [21] and Farmer et al. [16]. Our intrusion detection model for mobile phone operations is based upon the Jerne model. Both approaches are particularly well suited to data-mining tasks that involve searching through large databases and finding matching patterns [22–25].
The idiotypic network theory, introduced by Jerne, maintains that interactions in the immune system do not occur only between antibodies and antigens, since antibodies may interact with each other as well. Hence, an antibody may be matched by other antibodies, which in turn may be matched by yet other antibodies, and this process can spread throughout the entire population. According to the immune network theory, immunological memory is achieved by the B cells supporting each other through an idiotypic network. B cells are not only stimulated by antigens, but are also stimulated and, to a degree, suppressed by neighboring B cells. The interaction between B cells takes place via the idiotopes on each B cell, which act as recognizers for other B cells. Therefore, the more similar neighboring cells a particular B cell has, the more stimulated that B cell becomes. This allows the clustering of a population of identical B cells and produces a self-supporting structure that aids the immune responses. The network self-organizes and stabilizes within a reasonable time, since its survival is achieved by mutual reinforcement among the B cells via a feedback mechanism. The survival of a new B cell, produced as part of the immune response either by the bone marrow or by hypermutation, depends on its affinity to the antigens that are present and to its neighbors in the immune network. The new B cells may have an improved match for an antigen and, thus, will proliferate and survive longer than the existing B cells. By repeating the mutation and selection procedures several times, the immune system learns to produce better antigen matches. This theory could help explain how the memory of past infections is retained. Furthermore, it could result in the suppression of similar antibodies, thereby encouraging good diversity in the antibody pool [26]. See Figure 23.10.
FIGURE 23.10 Jerne's idiotypic network hypothesis. (From Jerne, N.K. Annals of Immunology, 125, 373–389, 1973. With permission.)
Figure 23.10 shows the basic idea of the immune network hypothesis. There are many antibodies on the surface of the B cells that act as antigen detectors. The specialized portion of an antibody used for identifying other molecules (of antigens or antibodies) is called a paratope. The region on any molecule that can be recognized by a paratope is called an epitope. The binding between idiotopes and paratopes has the effect of stimulating the B cells, mainly because the paratope on B cell 2 reacts to the idiotope on B cell 1 as it would to an antigen. However, to counter the reaction, there is a certain amount of suppression among the B cells that acts as a regulatory mechanism. The authors of References 16 and 21 have formalized the theory, suggesting that B cells are stimulated not only by the interaction between B cells and antigens, but also by the surrounding B cells that are connected to form a network of cells. As illustrated in Figure 23.10, B cell 1 stimulates three cells, B cell 2, B cell 3, and B cell 4, and it also receives a certain amount of suppression from each one. This creates a network-type structure that provides a regulatory effect on the neighboring B cells. The immune network acts as a self-organizing and self-regulatory system that captures antigen information and is ready to launch an attack against any similar antigens.
In the immune network, a node represents a type of antigen or antibody, and the links between nodes represent the affinity between them. The affinity is determined by the degree of matching between the paratope and the epitope, and by the population of each type of antibody (or antigen). An antibody type is stimulated when its paratope recognizes the epitope of antigens or of other antibody types, so that the corresponding lymphocyte is stimulated to reproduce more lymphocytes, and the lymphocytes secrete more antibodies (with a certain mutation rate). This process is called clonal selection. On the contrary, an antibody or antigen may be suppressed if its epitopes are recognized by others [25]. An essential aspect of the immune network model is that the list of antigen and antibody types changes dynamically, as types are added or removed over time. Our immune-based intrusion detection model for mobile phone operations, illustrated in Figure 23.11, proceeds in the following steps:
1. Training data: The initial network B cell population is made up by sampling the raw data set and creating the B cell objects (for instance, phone calls on a Sunday between 12 and 14 h).
2. Data items: The remaining data items are taken to create the antigen set. These data are introduced during the training process and are repeatedly presented to the network in an attempt to capture the trends of the mobile phone users present in the data set.
3. Expose: This step computes the B cell stimulation level, which takes into account the match scores with the antigens.
4. Cloning: New clones may be produced when the cells are sufficiently stimulated.
5. Feedback: New items are introduced into the system.
6. Results: Adopting the analogy of the primary immune response, the new items are repeatedly presented to the network for training, in an attempt to provoke a response from the B cells, which clone and mutate, thereby maintaining a network of B cells that forms a diverse representation of the raw data (a sketch of this loop is given below).
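The following small Python sketch shows, under stated assumptions, one way steps 1 to 6 could fit together: data items are assumed to be fixed-length numeric tuples summarizing calls, and the affinity measure, stimulation threshold, and mutation scheme are illustrative choices rather than the authors' implementation.

import random

def affinity(cell, antigen):
    # Match score: inverse of the Euclidean distance (higher means closer).
    d = sum((a - b) ** 2 for a, b in zip(cell, antigen)) ** 0.5
    return 1.0 / (1.0 + d)

def mutate(cell, rate=0.1):
    return tuple(v + random.gauss(0, rate) for v in cell)

def train_network(raw_data, sample_size=20, epochs=50, stim_threshold=0.6):
    # raw_data: list of tuples; must contain more items than sample_size.
    population = random.sample(raw_data, sample_size)        # step 1: initial B cells
    antigens = [x for x in raw_data if x not in population]  # step 2: antigen set
    for _ in range(epochs):                                  # steps 5-6: feedback loop
        clones = []
        for cell in population:
            stim = max(affinity(cell, ag) for ag in antigens)  # step 3: stimulation
            if stim > stim_threshold:                          # step 4: clonal selection
                clones.append(mutate(cell))
        population.extend(clones)
    return population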
23.5 Conclusion
Biologically inspired techniques have received a great deal of interest due to their promise to resolve challenging combinatorial and real-world problems. In this chapter, we have reviewed two promising
FIGURE 23.11 The human immune-based intrusion detection model for mobile phone operations.
nature-based models, neural networks and human immune-based systems, and shown how these two approaches can be used to design efficient distributed security management systems for mobile phone operations.
In the future, we plan to investigate the human immune-based intrusion detection system further and study its performance for mobile phone operations. We also wish to investigate the design of efficient security models using biologically inspired techniques for mobile ad hoc networks.
References
[1] Ottawa Citizen, October 11, 2004.
[2] Boukerche, A. and Notare, M.S.M.A. Behavior-based intrusion detection in mobile phone systems. Journal of Parallel and Distributed Computing, 62, 1476–1490, 2002.
[3] Notare, M.S.M.A., Boukerche, A., Cruz, F.A., Riso, B.G., and Westphall, C.B. Security management
against cloning mobile telephones. In IEEE GLOBECOM’99, 1999, Rio de Janeiro, Brazil. 1999,
pp. 1969–1973.
[4] Boukerche, A., Sobral, J.B.M., Juca, K., and Notare, M.S.M.A. Anomaly and misuse intrusion detec-
tion based in the immune human system. In Proceedings of the 17th IEEE IPDPS/NIDISC’2003 —
International Parallel and Distributed Processing Symposium, 2003, Nice, France. IEEE Press, 2003,
p. 146.
[5] Boukerche, A., Juca, K., Notare, M.S.M.A., and Sobral, J.B.M. Intrusion detection based on the
immune human system. In IEEE 16th IPDPS’2002/BioSP3 — International Parallel and Distributed
24.1 Introduction
24.1.1 Artificial Neural Networks
Several novel methods of computation, collectively known as soft computing, have recently emerged. The raison d'être of these methods is to exploit the tolerance for imprecision and uncertainty in real-world problems in order to achieve tractability, robustness, and low cost. Soft computing is usually used to find an approximate solution to a precisely or imprecisely formulated problem. Neural computing, fuzzy computing, and evolutionary computing are the major components of this approach.
Artificial neural networks are an attempt to mimic some or all of the characteristics of biological neural networks. This soft computational paradigm differs from a programmed instruction sequence in that information is stored in the form of weights. Each neuron is an elementary processor with primitive operations, such as summing its weighted inputs and then amplifying or thresholding the sum. Assemblies of such neurons can, in principle, perform universal computations for suitably chosen weights.
A well-known model of the neuron, studied extensively in Reference 1, is called the perceptron (see Figure 24.1). The perceptron computes a weighted sum of its input signals and generates an output of 1 if this sum is above a certain threshold t ∈ R; otherwise, an output of 0 results. In general, given a weight vector w = (w_1, . . . , w_n) ∈ R^n and an input vector x = (x_1, . . . , x_n) ∈ R^n, such a neuron computes a simple function of the form f_w : R^n → S, where n ≥ 1, S ⊆ R, and

$$f_w(x) = g(w \cdot x) \qquad (24.1)$$

for some transfer function g : R → S. There are two choices for the set S currently popular in the literature. The first is the discrete model, with S = {0, 1}. In this case, if the neuron's threshold is t, then its transfer function g is a linear threshold function (see Figure 24.2[a]) defined by

$$g(y) = \begin{cases} 0 & \text{if } y < t \\ 1 & \text{if } y \geq t \end{cases} \qquad (24.2)$$

and f is called a weighted linear threshold function. The second is the continuous model, with S = [0, 1]. In this case, g is typically a monotone increasing function such as the sigmoid function (see Figure 24.2[c]) given by

$$g(y) = \frac{1}{1 + a^{-by}} \qquad (24.3)$$

for a, b ∈ R^+. The continuous model is popular because it is easier to construct; the discrete model is popular because its behavior is easier to analyze (however, it uses more hardware). A neural network is characterized by the network topology, the connection strengths (i.e., weights) between pairs of neurons, the neuron properties (i.e., transfer functions), and the learning algorithms.
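For concreteness, a short Python sketch of the two transfer-function models of Equations (24.1) to (24.3) follows; the function names mirror the text, and the default a = e is only one admissible choice of the parameters.

import numpy as np

def discrete_perceptron(w, x, t):
    # Weighted linear threshold function: Equations (24.1) and (24.2).
    return 1 if np.dot(w, x) >= t else 0

def continuous_perceptron(w, x, a=np.e, b=1.0):
    # Sigmoid model: Equation (24.3) with a, b > 0.
    return 1.0 / (1.0 + a ** (-b * np.dot(w, x)))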
FIGURE 24.1 The perceptron: the inputs x1, . . . , xn are weighted by w1, . . . , wn and summed into y, and the output is f(x) = g(y).
FIGURE 24.2 Examples of transfer functions: (a) linear threshold function, (b) piecewise linear function, (c) sigmoid
function, and (d) Gaussian function.
FIGURE 24.3 Examples of neural networks architectures: (a) feed-forward network and (b) feed-back network.
Artificial neural networks can be viewed as weighted directed graphs in which artificial neurons are nodes and directed edges (with weights) are connections between neuron outputs and inputs. Based on the interconnection pattern (architecture), artificial neural networks are grouped into two categories: feed-forward networks (whose graphs have no loops or cycles) and feed-back (or recurrent) networks (in which loops or cycles occur because of feed-back connections). Different network topologies yield different network behaviors and require appropriate learning algorithms. Figure 24.3 illustrates the two types of network topologies.
Learning in artificial neural networks is the problem of updating the connection weights so that the network can efficiently perform a specific task. Learning can also be viewed as the problem of minimizing an objective function, namely the error between the network output and the desired output. Efficient learning algorithms for specific topologies have been proposed in the literature.
transfer function and realizes a function of n variables ranging over the set S ⊆ R with values in K, that is, it computes a function f : S^n ⊆ R^n → K. For S = K we refer to the processing unit as a multiple-valued logic neuron, since it simulates a multiple-valued logic function f : K^n → K. Multiple-valued logic neural networks are thus neural networks composed of multiple-valued logic neurons as processing units. The first model of multiple-valued logic neural networks was introduced in Reference 6, and since then various other models have been described [7–11].
can be implemented by such a single element. More general circuit elements, called multiple-valued multiple-threshold elements, have also been studied [24–26].
Interesting methods for the synthesis of multiple-valued logic functions have been described in the literature. These new approaches to synthesis, based on natural or physical laws, were introduced mostly to search for a minimal circuit representation (or, equivalently, a minimal logic expression) of a given multiple-valued logic function. The only known algorithm for finding minimal multiple-valued logic expressions is exhaustive search, whose excessive computation time makes it impractical. Multiple-valued sum-of-products expressions are especially interesting because of the ease with which they can be implemented by programmable logic arrays [26,27]. Because of the computational complexity associated with minimal sum-of-products solutions, there is considerable interest in heuristics. A typical heuristic first selects a minterm and then chooses an implicant that covers the minterm [26,28]; this process is repeated until the given expression is covered. Yildirim et al. [29] proposed multiple-valued logic design methods that employ simulated annealing [30]. Kaczmarek et al. [31] proposed neural network techniques. Hata et al. [32] proposed solutions using genetic algorithms. Lloris-Ruiz et al. [33] used information-theoretic (entropy) approaches for the minimization of logic expressions.
24.1.6 Motivations
References 6, 10, 12, and 14 do not discuss how to construct minimal neural networks from their respective models. In all of these papers, the size and depth of the networks are fixed in advance. This is a major drawback, since many logic functions can be synthesized with minimal-size or minimal-depth networks. The homogeneous multiple-valued perceptron learning algorithm presented in Reference 15 has the weakness of being able to learn only a very small portion of the set of separable functions, so it has a low capacity. In addition, no learning algorithm is known, in general, for multiple-valued multiple-threshold perceptrons.
Multiple-valued multiple-threshold perceptrons are multiple-valued multiple-threshold logic elements with learning abilities. A minimal multiple-valued logic perceptron computing a given logic function is a perceptron containing the least number of thresholds. The problem of finding such a minimal perceptron for a function is difficult and remains open.
Finding a minimal logic expression for a given function is also a very difficult problem. The simulated annealing [29], genetic algorithm [32], and neural network [31] approaches to expression minimization introduced in the literature tend to produce locally optimal solutions. These are still very good solutions; however, it seems to us that better solutions can be obtained by improving these methods.
in V ⊆ K^n is described, and the results obtained are compared with those of Section 24.4. In Section 24.6, we address the problem of minimizing the size of a single multiple-valued multiple-threshold perceptron. Every n-input k-valued logic function can be implemented by an (n, k, s)-perceptron, for some number of thresholds s. We propose a GA to search for an optimal (n, k, s)-perceptron that efficiently realizes a given multiple-valued logic function, that is, one that minimizes the number of thresholds. Experimental results show that the GA finds optimal solutions in most cases.
The multiple-valued multiple-threshold transfer function g_{k,s}^{t,o} : R → K is defined by

$$g_{k,s}^{t,o}(y) = \begin{cases} o_0 & \text{if } y < t_1 \\ o_i & \text{if } t_i \leq y < t_{i+1} \text{ for } 1 \leq i \leq s-1 \\ o_s & \text{if } t_s \leq y, \end{cases} \qquad (24.4)$$

where o = (o_0, . . . , o_s) ∈ K^{s+1} is the output vector, t = (t_1, . . . , t_s) ∈ R^s is the threshold vector, with t_i ≤ t_{i+1} (1 ≤ i ≤ s − 1), and s (1 ≤ s ≤ k^n − 1) is the number of threshold values.
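A minimal Python sketch of the multiple-threshold transfer function of Equation (24.4) is given below; the helper name g and the example values are assumptions for illustration only.

import numpy as np

def g(y, t, o):
    # Return o[i], where i counts the thresholds t[0] <= ... <= t[s-1]
    # already reached by y; y < t[0] gives o[0], y >= t[s-1] gives o[s].
    return o[int(np.searchsorted(t, y, side='right'))]

# Example with s = 2 thresholds and output vector o = (o_0, o_1, o_2).
assert g(0.0, [1.5, 3.0], [0, 2, 1]) == 0
assert g(2.0, [1.5, 3.0], [0, 2, 1]) == 2
assert g(5.0, [1.5, 3.0], [0, 2, 1]) == 1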
Multiple-threshold devices [21] are threshold elements containing multiple levels of excitation
(thresholds). Among their qualities, given enough thresholds, a single multiple-threshold element can
realize any given function operating on a finite domain [25].
An (n, k, s)-perceptron with weight vector w, threshold vector t, and output vector o thus computes the function

$$F_{k,s}(w, t, o)(x) = g_{k,s}^{t,o}(w \cdot x) = \begin{cases} o_0 & \text{if } w \cdot x < t_1 \\ o_i & \text{if } t_i \leq w \cdot x < t_{i+1} \\ o_s & \text{if } t_s \leq w \cdot x, \end{cases} \qquad (24.5)$$
FIGURE 24.4 Decomposition of (a) g_{k,s}^{t,o}(w · x) into (b) s linear threshold functions. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
The multiple-threshold function can be written in terms of s linear threshold functions g(w · x)_1, . . . , g(w · x)_s, that is,

$$g_s^{t,o}(w \cdot x) = o_0 + \sum_{i=1}^{s} a_i\, g(w \cdot x)_i \quad \text{and} \quad g(w \cdot x)_i = \begin{cases} 0 & \text{if } w \cdot x < t_i \\ 1 & \text{if } w \cdot x \geq t_i. \end{cases} \qquad (24.6)$$
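The decomposition can be checked numerically. The Python sketch below assumes a_i = o_i − o_{i−1}, which makes the sum telescope correctly; this choice of coefficients is our reconstruction and is not stated explicitly in the surviving text.

import numpy as np

t = [1.0, 2.5, 4.0]          # thresholds t_1 <= t_2 <= t_3 (s = 3)
o = [0, 2, 1, 3]             # output vector o_0, ..., o_s

def g_direct(y):
    # Equation (24.4): staircase function indexed by the thresholds reached.
    return o[int(np.searchsorted(t, y, side='right'))]

def g_decomposed(y):
    # Equation (24.6) with the assumed coefficients a_i = o_i - o_{i-1}.
    a = [o[i] - o[i - 1] for i in range(1, len(o))]
    return o[0] + sum(a[i] * (y >= t[i]) for i in range(len(t)))

for y in np.linspace(-1.0, 5.0, 25):
    assert g_direct(y) == g_decomposed(y)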
FIGURE 24.5 Two-hidden-layers network for g_{k,s}^{t,o}(w · x). (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
Let V = {x_1, . . . , x_v} ⊆ R^n be a set of v vectors (v ≥ 1). A k-valued function f with domain V and specified by the input–output pairs {(x_1, f(x_1)), . . . , (x_v, f(x_v))}, where x_i ∈ R^n and f(x_i) ∈ K, is said to be s-separable if there exist vectors w ∈ R^n, t ∈ R^s, and o ∈ K^{s+1} such that

$$f(x_i) = \begin{cases} o_0 & \text{if } w \cdot x_i < t_1 \\ o_j & \text{if } t_j \leq w \cdot x_i < t_{j+1} \text{ for } 1 \leq j \leq s-1 \\ o_s & \text{if } t_s \leq w \cdot x_i \end{cases} \qquad (24.7)$$

for 1 ≤ i ≤ v. Equivalently, f is s-separable if and only if it has an s-representation defined by (w, t, o). A k-valued function over V is said to be s-nonseparable if it is not s-separable.
In other words, an (n, k, s)-perceptron partitions the space V ⊂ R^n into s + 1 distinct classes H_0^{[o_0]}, . . . , H_s^{[o_s]}, using s parallel hyperplanes, where H_j^{[o_j]} = {x ∈ V | f(x) = o_j and t_j ≤ w · x < t_{j+1}}. We assume that t_0 = −∞ and t_{s+1} = +∞. Each hyperplane, denoted H_j (1 ≤ j ≤ s), has an equation of the form

$$H_j : w \cdot x = t_j. \qquad (24.8)$$
Multilinear separability (s-separability) extends the concept of linear separability (the 1-separability of the common binary one-threshold perceptron) to the (n, k, s)-perceptron. Linear separability in the two-valued case tells us that an (n, 2, 1)-perceptron can only learn from a space V ⊆ [0, 1]^n in which a single hyperplane separates it into two disjoint halfspaces: H_0^{[0]} = {x | f(x) = 0} and H_1^{[1]} = {x | f(x) = 1}. By the (n, 2, 1)-perceptron convergence theorem [1], concepts that are linearly nonseparable cannot be learned by an (n, 2, 1)-perceptron. One example of a linearly nonseparable two-valued logic function is the n-input parity function (a small brute-force check for n = 2 is sketched below). Likewise, the (n, k, s)-perceptron convergence theorems [16,34] state that an (n, k, s)-perceptron computes a given function f ∈ P_k^n if and only if f is s-separable. Figure 24.6 shows an example of a two-separable four-valued logic function of P_5^2.
y
4 | 2 2 2 2 2
3 | 2 2 2 0 0
2 | 0 0 0 0 0
1 | 0 0 4 4 4
0 | 4 4 4 4 4
    0 1 2 3 4   x
FIGURE 24.6 A two-separable function of P_5^2. (From A. Ngom, I. Stojmenović, and J. Žunić. IEEE Transactions on Neural Networks, 14, 469–477, 2004; Proceedings of the 29th IEEE International Symposium on Multiple-Valued Logic, IEEE Computer Society Technical Committee on Multiple-Valued Logic, May 1999, pp. 208–213, IEEE Computer Society. With permission.)
FIGURE 24.7 A (2, 5, 2)-partition. (From A. Ngom, I. Stojmenović, and J. Žunić. IEEE Transactions on Neural Networks, 14, 469–477, 2004; Proceedings of the 29th IEEE International Symposium on Multiple-Valued Logic, IEEE Computer Society Technical Committee on Multiple-Valued Logic, May 1999, pp. 208–213, IEEE Computer Society. With permission.)
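As a small aside on the linear separability discussion above, the following brute-force Python sketch confirms that the two-input parity function (XOR) is not 1-separable, while AND is; the integer weight grid scanned here is an illustrative assumption (it suffices for functions on {0, 1}^2).

from itertools import product

def linearly_separable(f):
    # f maps {0,1}^2 -> {0,1}. A separating threshold exists for weights (w1, w2)
    # iff, sorted by w . x, all 0-labels precede all 1-labels with no mixed tie.
    pts = list(product((0, 1), repeat=2))
    for w1, w2 in product(range(-3, 4), repeat=2):
        ys = sorted((w1 * p[0] + w2 * p[1], f[p]) for p in pts)
        labels = [v for _, v in ys]
        mixed_tie = any(a == b and u != v for (a, u) in ys for (b, v) in ys)
        if labels == sorted(labels) and not mixed_tie:
            return True
    return False

XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}
AND = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}
print(linearly_separable(XOR), linearly_separable(AND))   # False True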
y   f1(x, y)       y   f2(x, y)       y   f3(x, y)
3 | 1 1 3 3        3 | 2 2 3 3        3 | 2 2 3 3
2 | 2 1 1 3        2 | 1 2 2 3        2 | 1 2 2 3
1 | 0 2 1 1        1 | 0 1 2 2        1 | 3 1 2 2
0 | 0 0 2 1        0 | 0 0 1 2        0 | 3 3 1 2
    0 1 2 3  x         0 1 2 3  x         0 1 2 3  x
FIGURE 24.8 Examples of three-separable two-input four-valued logic functions. (From A. Ngom, C. Reischer,
D.A. Simovici, and I. Stojmenović. Neural Processing Letters, 12, 2000, Proceedings of the 28th IEEE International
Symposium on Multiple-Valued Logic, May 1998, pp. 161–166. With permission.)
FIGURE 24.9 Permutably homogeneous (n, k, s)-perceptron learning algorithm. (From A. Ngom, C. Reischer,
D.A. Simovici, and I. Stojmenović. Neural Processing Letters, 12, 2000, Proceedings of the 28th IEEE International
Symposium on Multiple-Valued Logic, May 1998, pp. 161–166. With permission.)
Let o ∈ K^{s+1} be the output vector of an (n, k, s)-perceptron. When o is an (s + 1, k)-permutation, that is, there are no i and j (i ≠ j) such that o_i = o_j, we propose the permutably homogeneous (n, k, s)-perceptron learning algorithm (for a fixed (s + 1, k)-permutation o), as shown in Figure 24.9.
In Figure 24.9, the constant 0 < η ≤ 1 is the learning rate. The initial weights can be set to any (random) values. The initial thresholds can also be set to any (random) values; however, empirical tests show that the algorithm converges faster when the initial thresholds are set so that t_{i+1} − t_i = c (e.g., c = k^n), and so that t_{v−1} ≤ t_v ≤ t_{v+1} holds each time we update t_v or t_{v+1}. We can also generate a new random η before each call to MultiPerceptronUpdate. Pos_o[z] is the position (or index) of z in o. For example, if o = (3, 0, 2, 1), then Pos_o[3] = 0, Pos_o[0] = 1, Pos_o[2] = 2, and Pos_o[1] = 3.
In the algorithm, the weight and threshold vectors are always updated in opposite directions, using the error value δ = Pos_o[f(x)] − Pos_o[v]. If Pos_o[f(x)] < Pos_o[v], then δ < 0 means that the weights are too large or t_{Pos_o[v]} is too small; therefore, we decrease the weights and increase t_{Pos_o[v]}. If Pos_o[f(x)] > Pos_o[v], then δ > 0 means that the weights are too small or t_{Pos_o[v]+1} is too large; thus, we increase the weights and decrease t_{Pos_o[v]+1}. When Pos_o[f(x)] = Pos_o[v], no modification is made and the algorithm goes to the next step. So w and t are always updated in opposite directions given by the position of f(x) relative to that of v in o. Note that o is known and given as input to the algorithm.
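A hedged Python sketch of one such update step follows; since the exact step sizes applied to w and t are not given in the surviving text, the plus-or-minus eta adjustments below are illustrative.

import numpy as np

def output(w, t, o, x):
    # F_{k,s}(w, t, o)(x): index o by the number of thresholds reached by w . x.
    return o[int(np.searchsorted(t, np.dot(w, x), side='right'))]

def multi_perceptron_update(w, t, o, x, fx, eta=0.5):
    # w: float array; t: float array of s thresholds (t_1 ... t_s, 0-indexed);
    # o: list (permutation), so o.index(z) plays the role of Pos_o[z];
    # fx: desired value f(x).
    v = output(w, t, o, x)
    delta = o.index(fx) - o.index(v)        # Pos_o[f(x)] - Pos_o[v]
    if delta < 0:       # weights too large, or t_{Pos_o[v]} too small
        w -= eta * np.asarray(x, dtype=float)
        t[o.index(v) - 1] += eta
    elif delta > 0:     # weights too small, or t_{Pos_o[v]+1} too large
        w += eta * np.asarray(x, dtype=float)
        t[o.index(v)] -= eta
    return w, t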
Our algorithm can learn any o-separable logic function as long as o is a fixed (s + 1, k)-permutation. In other words, the algorithm can learn any function whose input vectors can be separated by a set of parallel hyperplanes (i.e., s-separable, for some s) and whose classes, separated by these hyperplanes, have distinct values (i.e., o-separable, for some (s + 1, k)-permutation o). In fact, the permutably homogeneous algorithm, in which o may be any permutation, generalizes the homogeneous algorithm, where o must be the identity permutation.
When o is not a permutation, that is, there are i and j (i ≠ j) such that o_i = o_j, it becomes difficult to obtain a learning algorithm with guaranteed convergence. This problem is left open for further research.
Lemma 24.1 tells us that, given a (k − 1)-representation (w, t, σ) for a homogeneous function f, the homogeneous transformation of f into a permutably homogeneous function f_π leaves the weights and thresholds invariant. Therefore, the positions of the k − 1 separating parallel hyperplanes do not change after the transformation of f. Indeed, in Figure 24.8, the three hyperplanes remain invariant even after the transformation of f_2 into f_1 = f_2^{(0,2,1,3)} and vice versa.
Theorem 24.1 Given the output vector π, the permutably homogeneous (n, k, k − 1)-perceptron learning algorithm for learning a function f ∈ P_k^n terminates if and only if f is π-separable.
Proof: (⇒) If the permutably homogeneous (n, k, k − 1)-perceptron algorithm with output vector π terminates on learning f, then a (k − 1)-representation (w, t, π) exists for f. Therefore, f is π-separable and thus permutably homogeneous. (⇐) Let f be π-separable; we want to show that the algorithm terminates for f. The algorithm, instead of learning f, learns f_{π^{-1}} using the homogeneous (n, k, k − 1)-perceptron learning algorithm with output vector π^{-1}. Since f is permutably homogeneous and π-separable, the (⇐) part of Lemma 24.1 gives that f_{π^{-1}} is homogeneous and thus σ-separable. Once f_{π^{-1}} has been learned, f can be reconstructed using the (⇒) part of Lemma 24.1. The algorithm is guaranteed to terminate by the homogeneous (n, k, k − 1)-perceptron convergence theorem of Reference 16.
Proof: Clearly, if r = (w, t = (t_1, . . . , t_s), ν) is an s-representation for f, then r^c = (w, t^c = (t_1^c = t_1, . . . , t_s^c = t_s, t_{s+1}^c, . . . , t_{k-1}^c), ν^c) is a (k − 1)-representation for f, and vice versa (where t_{s+1}^c ≤ · · · ≤ t_{k-1}^c are arbitrary and their corresponding hyperplanes contain no points between them).
Theorem 24.2 (Permutably homogeneous perceptron convergence theorem) Given the output vector ν, the permutably homogeneous (n, k, s)-perceptron learning algorithm for learning a function f ∈ P_k^n terminates if and only if f is ν-separable.
Proof: (⇒) Same as in Theorem 24.1. (⇐) The algorithm learns f using the output vector ν^c instead of ν. Since, from Lemma 24.2, f is ν^c-separable, the algorithm is guaranteed to terminate by Theorem 24.1.
FIGURE 24.10 Example of partially ordered set. (From A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović.
Neural Processing Letters, 12, 2000, Proceedings of the 28th IEEE International Symposium on Multiple-Valued Logic,
May 1998, pp. 161–166. With permission.)
FIGURE 24.11 Extended (n, k, s)-perceptron learning algorithm. (From A. Ngom, C. Reischer, D.A. Simovici,
and I. Stojmenović. Neural Processing Letters, 12, 2000, Proceedings of the 28th IEEE International Symposium on
Multiple-Valued Logic, May 1998, pp. 161–166. With permission.)
Our extended permutably homogeneous (n, k, s)-perceptron learning algorithm, which searches for an output vector o ∈ K^{s+1} and then learns a given function f using o, is shown in Figure 24.11. For f ∈ P_k^n, let K_f be its set of values. Denote by <^d_{K_f} an order relation over K_f with respect to the d-th variable, that is, x_d, where 1 ≤ d ≤ n. We refer to d as a direction since, as we will see later, it selects the dimension of the n-cube K^n along which we construct a poset.
The extended learning algorithm goes as follows. Using the partial order construction algorithm of Figure 24.12, we attempt to construct a poset (K_f, <^d_{K_f}) with respect to some variable x_d. If such a (K_f, <^d_{K_f}) exists and is a chain, then o is the concatenation of the unique linear extension of (K_f, <^d_{K_f}) and an (s − |K_f| + 1, |K − K_f|)-permutation of K − K_f, and it will be used to learn f. If such a (K_f, <^d_{K_f}) exists but is not a chain, then we attempt to obtain (K_f, <^{d+1}_{K_f}), and so on, until either we construct a chain poset (K_f, <^d_{K_f}) for some d, or there is some d such that a poset (K_f, <^d_{K_f}) cannot be obtained, or d = n. When n nonchain posets (K_f, <^1_{K_f}), . . . , (K_f, <^n_{K_f}) have been constructed, then, using the partial orders combination algorithm of Figure 24.13, we attempt to combine these n nonchain posets into a chain poset (K_f, <_{K_f}).
The partial order construction algorithm goes as follows. For a given direction 1 ≤ d ≤ n, we start with an antichain (K_f, <^d_{K_f}). Then, for every x = (x_1, . . . , x_n), we construct the poset (K_f, <^d_{K_f}) by adding new comparable pairs f_1 = f(x_1, . . . , x_d, . . . , x_n) <^d_{K_f} f(x_1, . . . , x_d + 1, . . . , x_n) = f_2 whenever f_1 ≤^d_{K_f} f_2; also, we add new comparable pairs y <^d_{K_f} f_2 whenever y <^d_{K_f} f_1, and comparable pairs f_1 <^d_{K_f} y whenever f_2 <^d_{K_f} y, for some y. We exit the loops as soon as there is some new comparable pair y <^d_{K_f} z (for some y and z) that cannot be added to (K_f, <^d_{K_f}); that is, z <^d_{K_f} y is already in (K_f, <^d_{K_f}) and, therefore, adding its inverse leads to an inconsistency. In this case, the construction of (K_f, <^d_{K_f}) cannot be completed along direction d (meaning that (K_f, <^d_{K_f}) simply does not exist). In case (K_f, <^d_{K_f}) exists, that is, its construction can be completed along d, it has a unique linear extension if and only if it is a chain. In other words, (K_f, <^d_{K_f}) is always constructed in the positive direction along the d-th dimension of the n-cube K^n; equivalently, starting from any point y in the hyperplane x_d = 0, we move toward the hyperplane x_d = k − 1 by following the line segment orthogonal to both hyperplanes whose origin is y. Figure 24.14 shows examples of constructed posets. For illustration purposes, we have also shown the graphs obtained from the function g(x, y) (Figure 24.14[b]) when attempting to complete the construction with possible contradictory pairs. As one can see, such a graph cannot be embedded into a poset, so the posets (K_g, <^1_{K_g}) and (K_g, <^2_{K_g}) do not exist.
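The following Python sketch implements this construction under stated assumptions: f is a callable on points of K^n, the direction d is 0-based here, the relation is kept transitively closed as pairs are added, and None signals the inconsistency case described above.

from itertools import product

def construct_poset(f, n, k, d):
    # Comparable pairs (a, b), meaning a <^d b over K_f; returns None as soon
    # as a contradictory (inverse or cyclic) pair would have to be added.
    less = set()
    for x in product(range(k), repeat=n):
        if x[d] == k - 1:
            continue
        y = list(x); y[d] += 1
        f1, f2 = f(x), f(tuple(y))
        if f1 == f2:
            continue
        preds = {a for (a, b) in less if b == f1} | {f1}
        succs = {b for (a, b) in less if a == f2} | {f2}
        for a in preds:
            for b in succs:
                if a == b or (b, a) in less:   # inconsistency
                    return None
                less.add((a, b))
    return less

def is_chain(less, values):
    # The poset has a unique linear extension iff it is a chain.
    return all(a == b or (a, b) in less or (b, a) in less
               for a in values for b in values)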
For a given permutably homogeneous and s-separable function f ∈ P_k^n, not every linear extension e of a nonchain (K_f, <^d_{K_f}) is good for learning: the (n, k, s)-perceptron learning algorithm may not terminate when e is used. For instance, from Figure 24.14(c), the linear extension (3, 0, 1, 2) of (K_h, <^1_{K_h}) is not good for learning h, since h is (2, 0, 1, 3)-separable (assuming s = 3). Some directions may give more information on order than others. For example, f(x_1, x_2) = (x_1 + 1) mod k does not depend on x_2 (direction 2), so (K_f, <^2_{K_f}) is an antichain, whereas (K_f, <^1_{K_f}) is a chain. In general, given a direction d, there are two possibilities for failure: either (K_f, <^d_{K_f}) cannot be constructed, or (K_f, <^d_{K_f}) exists but is a nonchain poset, such that a selected linear extension (among its many linear extensions) does not yield convergence of the (n, k, s)-perceptron learning algorithm (though some other one will). Because of this, when a nonchain poset (K_f, <^d_{K_f}) is obtained for every direction 1 ≤ d ≤ n, we must combine these n posets in some way in order to obtain a unique linear extension. Next, we describe how to combine them.
The partial orders combination algorithm goes as follows. Let (K_f, <_{K_f}) be a combination poset of d consistent nonchain posets (K_f, <^1_{K_f}), . . . , (K_f, <^d_{K_f}). Initially, (K_f, <_{K_f}) is set to (K_f, <^1_{K_f}), and it is constructed according to some binary string c = [c_1, . . . , c_d] ∈ {0, 1}^d, where 1 ≤ d ≤ n. When c_i = 1, the inverse of (K_f, <^i_{K_f}), that is, the poset (K_f, >^i_{K_f}), is in (K_f, <_{K_f}); otherwise, (K_f, <^i_{K_f}) itself is in (K_f, <_{K_f}) (obviously, (K_f, <^i_{K_f}) and (K_f, >^i_{K_f}) cannot both be in (K_f, <_{K_f}) at the same time). The combination poset (K_f, <_{K_f}) is constructed using an algorithm for generating binary strings of length ≤ n in lexicographic order. For example, for n = 4, the lexicographic generation of binary strings of length ≤ 4 proceeds in the following manner:
0, 00, 000, 0000, 0001, 001, 0010, 0011, 01, 010, 0100, 0101, 011, 0110, 0111, 1, . . .
[Figure 24.14: Examples of constructed posets (K_f, <^1_{K_f}) and (K_f, <^2_{K_f}). (a) A function f(x, y) whose posets are chains; (b) the function g(x, y), for which the construction yields contradictory pairs, so the posets do not exist; (c) the function h(x, y), whose posets are nonchains. The value grids (rows y = 3 down to y = 0, columns x = 0 to 3) are: f: (2 2 1 1), (0 2 2 1), (3 0 0 2), (3 3 0 0); g: (2 2 1 1), (0 2 3 1), (3 1 0 2), (3 3 0 0); h: (3 1 1 1), (1 1 1 1), (1 1 1 1), (0 0 0 2).]
FIGURE 24.16 Cutting algorithm. (From A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović. Neural Processing Letters, 12, 2000; Proceedings of the 28th IEEE International Symposium on Multiple-Valued Logic, May 1998, pp. 161–166. With permission.)
The algorithm is in the ExtendPoset phase when it moves from left to right, staying in a row (Figure 24.15). It is in the CutPoset phase when it shifts to some row (possibly far) below (Figure 24.16). The algorithm is used to construct the poset (K_f, <_{K_f}) as follows. If, with the string [c_1, . . . , c_d], the poset (K_f, <_{K_f}) exists but is a nonchain for d < n, then we extend to the next string [c_1, . . . , c_d, c_{d+1} = 0] in the row and add the poset (K_f, <^{d+1}_{K_f}) into (K_f, <_{K_f}). If, with the string [c_1, . . . , c_d], the poset (K_f, <_{K_f}) cannot be constructed, then we do not need to extend, since the poset (K_f, <_{K_f}) simply does not exist and we cannot add a poset
[Figure 24.17: The direction-1 and direction-2 posets of the function h of Figure 24.14(c), and the combination posets obtained with the binary strings 00 and 01: the combination for 00 is inconsistent, whereas the combination for 01 is a chain.]
FIGURE 24.17 Examples of combination posets for Figure 24.14(c). (From A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović. Neural Processing Letters, 12, 2000; Proceedings of the 28th IEEE International Symposium on Multiple-Valued Logic, May 1998, pp. 161–166. With permission.)
to an undefined poset. Hence, in this case we can bypass the lexicographic generation of binary strings to an appropriate point: we say that the algorithm is in the CutPoset phase. We cut in the following manner. Starting from position d of c, we search backward for the first position r ≤ d such that c_r = 0. We remove the posets (K_f, <^r_{K_f}) and (K_f, <^i_{K_f}) for r < i ≤ d (oriented according to c_i) from (K_f, <_{K_f}), add the poset (K_f, >^r_{K_f}) into (K_f, <_{K_f}), and finally set d to r and c_d to 1.
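The row-extension and cutting moves over binary strings can be sketched in Python as follows; extendable stands in for "the combination poset for c exists and is a nonchain", which in this illustrative run is replaced by a simple length test so that the traversal reproduces the listing shown earlier.

def next_string(c, n, extendable):
    # ExtendPoset: stay in the row by appending a 0 while the string is short.
    if extendable and len(c) < n:
        return c + [0]
    # CutPoset: find the last position r with c_r = 0, truncate there, set it to 1.
    r = max(i for i, b in enumerate(c) if b == 0)
    return c[:r] + [1]

c, seen = [0], []
while c != [1]:
    seen.append(''.join(map(str, c)))
    c = next_string(c, 4, extendable=(len(c) < 4))
print(seen[:9])   # ['0', '00', '000', '0000', '0001', '001', '0010', '0011', '01']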
To summarize, with a given string [c_1, . . . , c_d] there are three possibilities. We generate the next string whenever d < n and (K_f, <_{K_f}) is a nonchain poset (the extension phase), and update (K_f, <_{K_f}) accordingly. We bypass the lexicographic generation to some row below whenever (K_f, <_{K_f}) cannot be constructed (the cutting phase), and compute (K_f, <_{K_f}) appropriately. We exit the algorithm as soon as (K_f, <_{K_f}) is a chain, or c_1 = 1, or (K_f, <_{K_f}) is a nonchain and d = n. In the first case, we learn f using the unique linear extension of (K_f, <_{K_f}). In the second case, when no chain poset (K_f, <_{K_f}) is found, we may either randomly select one poset (K_f, <^i_{K_f}) and look for a good linear extension of it to learn f, or select, among all posets constructed so far, the one with the smallest width (since it will have the smallest number of linear extensions to search); the selection can be done by computing the width of the currently constructed consistent poset and keeping track of the smallest width (and storing the associated poset). The width of a poset is the size of its longest antichain.
In addition, we do not need to continue generating new binary strings once c_1 becomes 1, because they are symmetric to (i.e., complements of) those generated already: the poset constructed according to a string c is dual to the poset constructed according to the complement of c, and hence both posets behave exactly the same way. See Figure 24.17 for examples of combination posets. Next, we explain how to add posets to, and remove them from, a combination poset (K_f, <_{K_f}).
Given (K_f, <^d_{K_f}) to be added to (K_f, <_{K_f}), the addition (K_f, <_{K_f}) ⊕ (K_f, <^d_{K_f}) is defined by (K_f, <_{K_f}) ⊕ (K_f, <^d_{K_f}) = (K_f, <_{K_f} ∪ γ(<^d_{K_f})), where γ(<^d_{K_f}) is the transitive closure of every comparable pair of the relation <^d_{K_f} in the relation <_{K_f}. That is, O(|K_f|²) comparabilities from (K_f, <^d_{K_f}) are added to (K_f, <_{K_f}) during an addition, and for each such comparability its transitive closure in (K_f, <_{K_f}) is also added, that is, O(|K_f|) more comparabilities. In sum, the operation ⊕ takes O(|K_f|³) steps.
Given a poset (K_f, <^d_{K_f}) to be removed from the poset (K_f, <_{K_f}), the subtraction (K_f, <_{K_f}) ⊖ (K_f, <^d_{K_f}) is defined by (K_f, <_{K_f}) ⊖ (K_f, <^d_{K_f}) = ⊕_{[c_1, . . . , c_i, . . . , c_{d−1}]} (K_f, <^i_{K_f}) (where (K_f, <^i_{K_f}) is reversed when necessary). That is, removing (K_f, <^d_{K_f}) from (K_f, <_{K_f}) is equivalent to restoring (K_f, <_{K_f}) to the state it was in before (K_f, <^d_{K_f}) was added to it. To achieve efficiency, the subtraction operation is done in the following way: whenever we add a poset (K_f, <^d_{K_f}) to (K_f, <_{K_f}), we store in a separate data structure R_d all the comparabilities of (K_f, <^d_{K_f}) that are not already in (K_f, <_{K_f}), so that when we later remove (K_f, <^d_{K_f}) from (K_f, <_{K_f}) we eliminate only the comparabilities of R_d from (K_f, <_{K_f}). The ⊕ operation modified in this way still runs in O(|K_f|³) steps, and O(|K_f|²) comparabilities of R_d are removed from (K_f, <_{K_f}); therefore, subtraction takes O(|K_f|²) steps and is faster than addition. Inconsistency and unicity can be tested, respectively, in O(1) and O(|K_f|) time during an addition.
In Figure 24.14 we show examples of constructed posets (K_f, <^1_{K_f}) and (K_f, <^2_{K_f}) for some f ∈ P_4^2. Suppose s = 3. The posets (K_f, <^1_{K_f}) and (K_f, <^2_{K_f}) are chains (Figure 24.14[a]), so they have unique linear extensions; thus f is permutably homogeneous and three-separable, and it can be learned. In an attempt to construct (K_g, <^1_{K_g}) and (K_g, <^2_{K_g}), we obtain graphs (Figure 24.14[b]) that cannot be embedded into posets because of inconsistencies, so g is not permutably homogeneous and three-separable, and thus g cannot be learned. The posets (K_h, <^1_{K_h}) and (K_h, <^2_{K_h}) are both nonchains (Figure 24.14[c]), so they have many linear extensions, and h is permutably homogeneous and three-separable; however, we do not know which linear extensions are good for learning h, so we must combine (K_h, <^1_{K_h}) and (K_h, <^2_{K_h}) to search for a unique linear extension. In Figure 24.17 we show the two combination posets (K_h, <_{K_h}) for the binary strings 00 and 01. As we can see, with string 00 the poset (K_h, <_{K_h}) cannot be obtained because of inconsistencies, whereas with string 01 it is a chain.
A thick s-separable function is a function f ∈ P_k^n for which the distance between any two neighboring separating hyperplanes, in any direction, is strictly greater than one.
Theorem 24.3 If a permutably homogeneous function f ∈ P_k^n is thick s-separable, then (K_f, <^d_{K_f}) is a chain for every 1 ≤ d ≤ n.
Proof: Let a and b be two neighboring distinct values connected by an edge. There is at least one separating hyperplane between them. However, there is at most one separating hyperplane, since otherwise two such separating hyperplanes would be at distance strictly less than one along the dimension of that edge. Thus, a ≺^d_{K_f} b or b ≺^d_{K_f} a, that is, a and b are neighbors in the poset. All such neighboring pairs are detected in at least one dimension. Therefore, (K_f, <^d_{K_f}) has a unique linear extension.
For some nonthick s-separable functions, all combination posets (K_f, <_{K_f}) may have many linear extensions, and the last repeat loop of the algorithm in Figure 24.13 can be modified as follows to make it more efficient: in parallel, using several processors, we generate each linear extension of (K_f, <^d_{K_f}) and test it for learning, until one processor succeeds. This can be simulated on one processor by time sharing, that is, by generating the linear extensions and testing each of them for the same amount of time in succession, until one successfully terminates. Next, we discuss the time complexity of the extended learning algorithm.
The worst-case scenario, in terms of time complexity, for the partial order construction algorithm occurs when there is no contradiction for a given direction d. The while loop associated with the selected variable x_d is then iterated k − 1 times, and each of the n − 1 remaining for loops associated with the nonselected variables is iterated k times. Also, each of the two inner for loops is iterated |K_f| times, and it takes O(|K_f|) steps to test whether (K_f, <^d_{K_f}) is a chain. Therefore, the partial order construction algorithm has a time complexity of O(k^n |K_f|).
The worst-case scenario for the partial orders combination algorithm occurs when there is no contradiction for d < n but always a contradiction for d = n. Then 2^n − 2 combination posets are constructed, each by either extension or cutting, and O(|K_f|!) linear extensions are checked for learning f. Extension and cutting involve ⊕ operations and tests for unicity and inconsistency; cutting is slower than extension, since it also involves ⊖ operations and a search for the first bit equal to 0 (starting from the end of the current string). Extension and cutting take, respectively, O(|K_f|³) and O(n|K_f|²) steps. The (n, k, s)-perceptron learning algorithm takes O(enk^n) steps (e is the number of learning epochs), and thus the partial orders combination algorithm has O(2^n n|K_f|² + (s + enk^n)|K_f|!) time complexity. Since in practice e is large, and since 2^n ≤ k^n and |K_f|² ≤ |K_f|!, the complexity becomes O(enk^n |K_f|!).
The worst-case scenario for the extended learning algorithm occurs when the poset (K_f, <^d_{K_f}) is a nonchain for every direction d, so that n posets are constructed and combined. Consequently, the extended learning algorithm has O(nk^n|K_f| + enk^n|K_f|!), that is, O(enk^n|K_f|!), time complexity.
Recall the first method: generate each (s + 1, k)-permutation p and apply the (n, k, s)-perceptron learning algorithm with output vector o = p for learning f, until the learning terminates for some permutation p. This method has O(enk^n (k!/(k − s − 1)!)) time complexity; let us refer to it as the permutation generating learning algorithm. Next, we compare the extended algorithm with the permutation generating algorithm.
First, note that the time complexity of the permutation generating algorithm is always the same for every function; that is not true for the extended algorithm. For instance, for any nonpermutably homogeneous function f ∈ P_k^n the extended algorithm takes O(nk^n|K_f|) steps, and it takes O(enk^n) steps for any permutably homogeneous thick s-separable function. The worst time complexity is reached only for permutably homogeneous nonthick s-separable functions f for which every combination poset (K_f, <_{K_f}) is a nonchain or cannot be constructed. We believe that the probability of obtaining such a function f is very close to zero (if not equal to zero), so that in practice the extended learning algorithm runs in O(enk^n) for permutably homogeneous s-separable functions. This demonstrates its superiority over the permutation generating learning algorithm.
24.3.4 Experiments
We tested our extended learning algorithm on nonpermutably homogeneous functions and on permutably homogeneous thick and nonthick functions. However, we could not obtain nonthick functions whose combination posets are all nonchains, which suggests that such functions are very rare, if not nonexistent. The nonthick functions we used have at least one chain combination poset. In our tests we set the learning rate η to 0.5 and the maximum number of learning epochs e to 5000. We experimented with different values of n and k. Also, the number of thresholds s was not given to the learning algorithm; it was to be found by the algorithm itself. The initial weight vector was set to 0, and the initial threshold vector was set to (k^n, 2k^n, . . . , sk^n) after s was found.
For nonpermutably homogeneous functions, the algorithm behaved as expected, that is, no learning took place on these functions. For permutably homogeneous (thick or nonthick) functions, the algorithm always terminated after learning the function with its unique linear extension.
Next, we discuss an example of a nonthick function that we used in our experiments, for k = 4 and n = 3. Consider the two-place function h shown in Figure 24.17. To obtain a three-place function f, we project the values of h (which correspond to points in the plane x_3 = 0) onto the three planes x_3 = 1, x_3 = 2, and x_3 = 3; f is then a nonthick function, like h. Now, to make it more difficult to find one of its good linear extensions, we replace the value 3 that lies in the plane x_3 = 0 by the value 1, and we also change the value 2 that lies in the plane x_3 = 3 to 0. The resulting function f has no unique linear extension at all in any direction, and thus the extended learning algorithm must combine the three constructed posets to search for a good linear extension. The algorithm did indeed, as we expected, find a chain poset with the unique linear extension (2, 0, 1, 3). The function was learned successfully in 61 learning epochs. We obtained the same results when extending f to (n ≥ 3)-place functions.
Examples of permutably homogeneous thick functions are given by the following formula:

$$f(x) = \left( \left\lfloor \sum_{i=1}^{n} \frac{x_i}{a_i} \right\rfloor + n \right) \bmod k,$$

where a_i = 2i + 1. For example, we tested the four-place four-valued logic function f(x) = (⌊x_1/3 + x_2/5 + x_3/7 + x_4/9⌋ + 4) mod 4. Such a function is clearly permutably homogeneous, since it defines by itself its separating hyperplanes and their number. It is easy to see that the function has three possible values, namely 0, 1, and 2, so there must be two separating hyperplanes; moreover, the three classes of inputs are separated in the order (0, 1, 2). Therefore, we expect our extended learning algorithm to find two separating hyperplanes and the output vector (0, 1, 2). Indeed, the function was learned successfully in 946 learning epochs after the algorithm found the output vector.
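A quick Python check of this example follows; the placement of the floor over the whole sum is our reading of the formula above (it is the reading that yields exactly the three values 0, 1, and 2 for k = n = 4).

from itertools import product

def f(x, k=4):
    # a_i = 2i + 1, i.e., 3, 5, 7, 9 for n = 4; int() floors the nonnegative sum.
    a = [2 * i + 1 for i in range(1, len(x) + 1)]
    return (int(sum(xi / ai for xi, ai in zip(x, a))) + len(x)) % k

print(sorted({f(x) for x in product(range(4), repeat=4)}))   # [0, 1, 2]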
Our approach to the problem's solution is discussed in Section 24.4.4. The learning method is based on the general principle of partitioning algorithms, discussed in Section 24.4.3. A partitioning algorithm seeks to construct a minimal network by partitioning the input space into classes that are as large as possible. Each class of the partition is then assigned to a new hidden unit. The connections and weights of the new units are determined in such a way that the constructed network always gives the correct answer for any input. Distinct partitioning algorithms differ in the way the input space is partitioned; network topologies obtained from different partitioning algorithms may also differ in the way new hidden units are connected.
In the sequential learning algorithm of Marchand et al. [44], hyperplanes are successively used to cut off a set of points with identical function values from the remaining set of points. Let the halfspaces constructed by their algorithm be H_1, ..., H_r, and define u_i to be 0 (or 1) if f is 0 (or 1) on the region cut off by H_i, for 1 ≤ i ≤ r. Then, adding an output unit with threshold 0 and weight u_i 2^{r−i} for the edge leaving the hidden unit corresponding to H_i, they get a neural network that computes f. They assume that linear threshold units produce an output of 0 or 1. In a restricted version of their approach, called regular partitioning (Ruján and Marchand [45]), the hyperplanes do not intersect. This description gives the general principle of partitioning algorithms. Particular implementations depend on the way the next hyperplane is selected and the way new hidden units are connected (in terms of network weights and topology). Experiments indicate that partitioning algorithms are successful in the sense that they efficiently construct (near) minimal neural networks [44–47].
Another way to view this process, without any reference to neural networks, is to consider a sequence of halfspaces H_1, ..., H_r. For each halfspace H_i we specify the value u_i of f for those points in H_i that are not contained in any of the previous halfspaces. Thus, for every input vector x ∈ {0, 1}^n it holds that f(x) = u_i, where i is the smallest index such that x ∈ H_i. It is assumed that H_r contains {0, 1}^n, so i is always defined. This model is called a linear decision list [47, 48].
More formally, Rivest [48] defines a linear decision list in the following way. A linear test L over the variables x = (x1, ..., xn) ∈ {0, 1}^n is of the form ∑_{i=1}^n w_i x_i ≥ t, where w_1, ..., w_n ∈ R are the weights and t ∈ R is a threshold. A linear decision list D over x is a sequence (L_1, u_1), ..., (L_r, u_r), where L_i is a linear test and u_i is 0 or 1 for 1 ≤ i ≤ r. It is assumed that L_r, the last linear test, is true for all input vectors x. The length of D is r. The two-valued function f_D computed by D assigns to every x the value u_i, where L_i is the first linear test in the list that is satisfied by x. Every two-valued function f ∈ P_2^n can be computed by some linear decision list. For instance, a disjunctive normal form with m terms can be represented by a linear decision list of length m + 1.
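For concreteness, a linear decision list is evaluated by scanning the tests in order, as in the following sketch (ours; the example list computing the two-input AND is illustrative only):

```python
def eval_decision_list(D, x):
    """Evaluate a linear decision list D = [(w, t, u), ...]: return the answer u
    of the first linear test sum_i w_i*x_i >= t satisfied by x.
    The last test is assumed to accept every input vector."""
    for w, t, u in D:
        if sum(wi * xi for wi, xi in zip(w, x)) >= t:
            return u
    raise ValueError("the last linear test must be true for all inputs")

# Two-input AND as a decision list of length 2: answer 1 iff x1 + x2 >= 2.
D = [((1, 1), 2, 1), ((0, 0), 0, 0)]
print([eval_decision_list(D, x) for x in ((0, 0), (0, 1), (1, 0), (1, 1))])
# -> [0, 0, 0, 1]
```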
Many heuristics, such as Fahlman's cascade correlation algorithm [49], Frean's upstart algorithm [50], Sethi's entropy nets [51], Sirat's neural trees [52], Mezard's tiling algorithm [53], Barkema's patch algorithm [54], Frattale-Mascioli's oil-spot algorithm [55], Young's carve algorithm [56], regular partitioning [45], and other partitioning algorithms [44,47,57–60], have been proposed as approaches for building networks that are (near) minimal for a given arbitrary task. These heuristics are known as constructive or growth algorithms, since they all construct a network starting from a fixed small number of units.
Partitioning techniques such as cascade correlation [49], the upstart algorithm [50], and entropy nets [51] apply only to functions with Boolean-valued outputs. Moreover, most of the growth algorithms described in the literature (see, for instance, References 44, 45, 53–55, 58, and 59) are only applicable to problems with Boolean-valued inputs. Very few constructive methods deal with multiclass problems (k-valued functions are multiclass functions). For instance, the carve algorithm [56] and Marchand's neural decision list [47] both apply to functions f : R^n → K.
The techniques in References 47 and 56 seek to identify the largest subset of input vectors of the same class that is separable from the vectors of all other classes. The hyperplane determined by the maximum separable subset is then assigned to a newly created (n, 2, 1)-perceptron. A drawback of Marchand's neural decision list (NDL) [47] is that, since it employs linear programming to determine whether specific subsets of the input vectors are linearly separable, the number of possible subsets of points that could be considered for linear separability is exponential in the size of the input set. To circumvent this problem, Marchand and Golea [47] worked only with a specific class of functions called halfspace intersections.
Young's CARVE [56], which extends Marchand et al. [44] to multiclass functions with real-valued inputs, avoids testing subsets of points for linear separability by directly searching for hyperplanes that separate sets of points of one class only. Let S_i be a set of points of class i and H_i (the hull set for S_i) be the set of all points in the training set except those in S_i, i ∈ K. Clearly, points outside the convex hull formed by H_i are all of the same class i, and thus only points outside the convex hull can actually be separated by a hyperplane boundary from the hull set. So, to find the maximum separable subset, CARVE considers only hyperplanes that touch the boundary (i.e., the (n − 1)-dimensional faces or vertices) of the convex hull. The hyperplane with the largest set of points in its open halfspace outside the convex hull determines the maximum separable subset. The algorithm is a simple hill-climbing technique that searches for a (near) optimal hyperplane by partially traversing (or covering) the convex hull (i.e., the boundary of the hull set), starting from a randomly selected convex hull vertex and a hyperplane that passes through that vertex. This is repeated a fixed number of times n_v and, each time, the initial hyperplane is randomly rotated a fixed number of times n_r around the boundary of the hull set. The set of points encountered during each rotation is recorded; these are potential solutions. The main difficulty with CARVE is that the likelihood of finding the maximum separable subset depends on the parameters n_v and n_r. The larger the product n_v n_r, the more likely one is to obtain a (near) optimal hyperplane, since a larger section of the hull set boundary will be traversed (it should be noted that the number of facets and vertices of the convex hull may be exponential in the dimension of the input space). The optimal values for n_v and n_r depend on the actual problem, and the user must find such values by trial and error. In the end, the size of the network obtained by CARVE depends on n_v and n_r.
In this section we introduce a method of partitioning a set V ⊆ K^n using a GA [61] to grow a multiple-valued logic neural network for learning a function. In our approach, the maximum separable subset is treated as a special case of the longest strip (as will be discussed later). We introduce (n, k + 1, 2)-perceptrons, which help reduce the network size for given arbitrary k-valued functions. Both CARVE and NDL use local search to find good solutions. They extensively search some parts of the space (around the faces of a convex hull in CARVE, or around the neighborhood of separating hyperplanes in NDL), while leaving other parts of the search space untouched. The GA (in our implementation), on the other hand, performs a global search; it treats all parts of the space equally, owing to its implicit parallelism.
y\x   0  1  2  3
3     1  1  3  3
2     0  1  1  2
1     0  1  1  1
0     3  0  0  1

FIGURE 24.18 Example of longest strip for k = 4 and n = 2. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
y\x   0  1  2  3
3     1  1  3  3
2     0  1  1  2
1     0  1  1  1
0     3  0  0  1

FIGURE 24.19 Example of maximum separable subset for k = 4 and n = 2. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
Our partitioning method works by repeated removal of the points in a predefined objective subset A ⊆ V. A is either the longest strip or the maximum separable subset in V. Our main objective is, using the GA, to obtain a subset G ⊆ V such that |G| is as close as possible to |A|, if not equal to |A| (of course, |G| ≤ |A|). Our maximum separable subset approach is an extension of the sequential learning algorithm described in Reference 44 to multiclass functions f : R^n → K. The growth algorithm of Reference 44 is based on the perceptron learning algorithm (more specifically, on the pocket algorithm), and its performance was hampered by the fact that the pocket algorithm does not converge when the points are not linearly separable. In our approach, however, we use an evolutionary method to find optimal subsets.
In our particular implementation of the partitioning algorithm (see Figure 24.20), the GA is used to obtain successive halfspaces delimited by either one or two hyperplanes (depending on the predefined objective subset A). To each halfspace we assign a hidden unit that correctly classifies all of its elements. Let the objective subset A be the longest strip in the given training set. Our growth algorithm begins with an empty first hidden layer into which new (n, k + 1, 2)-perceptrons are inserted one after another until no more insertions are possible. An (n, k + 1, 2)-perceptron implements two parallel hyperplanes in the input domain, and the aim is to find two parallel hyperplanes that define a strip G such that |G| is as close as possible to |A|. The strip G is then removed from the training set. The next (n, k + 1, 2)-perceptron added to the network aims to separate another (near) longest strip G, but now only from the reduced training set. Once a (near) longest strip for this unit is found, the unit is added to the layer and the strip is removed from the training set. The construction of the first hidden layer continues with each subsequent unit separating a strip from the remaining training examples; the layer is complete when only points of one class remain in the training set. Once the first hidden layer is complete, the remaining weights, layers, and units of the network are determined to complete the network construction (the details of the network architecture are described in Section 24.4.6). A sketch of this growth loop is given below.
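The outer loop just described can be sketched as follows (our schematic reconstruction; find_near_longest_strip stands in for the GA search of Section 24.4.5 and is assumed, not shown):

```python
def grow_first_layer(training_set, find_near_longest_strip):
    """training_set: dict mapping input tuples to k-valued outputs.
    find_near_longest_strip(S) is assumed to return (unit, strip): a new
    (n, k+1, 2)-perceptron and the (near) longest strip it separates in S."""
    S = dict(training_set)
    layer = []
    # keep inserting units until only points of one class remain
    while len(set(S.values())) > 1:
        unit, strip = find_near_longest_strip(S)
        layer.append(unit)
        for x in strip:
            del S[x]  # remove the separated strip from the training set
    return layer
```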
Our principal objective in this section is to synthesize an arbitrary k-valued logic function by a neural network constructed by our growth algorithms when (a portion of) the function is given. We obtain networks that either exactly or approximately implement k-valued logic functions.
This generational cycle is repeated until a predefined maximum number of generations is reached; subset G is then the best solution generated so far in the population.

Our main objective here is, for a given function f, to obtain a chromosome w that generates G such that |G| ≈ |A|. Once such a w is found, we create a hidden unit to be inserted in the neural network. We then eliminate all points x in G and again apply the GA to the remaining points. The algorithm terminates as soon as no points are left. The created hidden units are then collected to construct a feed-forward network. The parameters (weight, threshold, and output vectors) of the hidden units and the topology of the network are discussed later.
24.4.5.2 Fitness Function
The objective function, the function to optimize, provides the mechanism for evaluating each
chromosome.
Let S ⊆ K^n be the set of remaining points. Initially, S = K^n. To compute the longest strip generated by w, we calculate the value w · x for every x ∈ S and construct a sorted list of records of the form (w · x, f(x)). The list is sorted using w · x as primary key and f(x) as secondary key. Let these records be sorted as x_1, ..., x_{|S|}, or more precisely, P_i = (w · x_i, f(x_i)), 1 ≤ i ≤ |S|, where w · x_1 ≤ ··· ≤ w · x_{|S|}. A strip in S is a sequence T_w^{(f(x_i))} = P_i P_{i+1} ··· P_{i+j} such that

1. f(x_i) = f(x_{i+1}) = ··· = f(x_{i+j}), and
2. w · x_{i−1} ≠ w · x_i and w · x_{i+j} ≠ w · x_{i+j+1},

with 1 ≤ i ≤ |S| and 0 ≤ j ≤ |S| − i. The length of the strip is j + 1, and f(x_i) is the value of the strip.
For example, in Figure 24.18 we have w = (1, 1), and the sorted list of records gives the longest strip T_{(1,1)}^{(1)} = P_7 P_8 P_9 P_10 P_11 P_12 P_13 generated by w.
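In code, finding this longest strip amounts to one sort plus one linear scan, as in the following sketch (our illustration; for brevity it ignores the boundary condition 2 on tied projections w · x):

```python
def longest_strip(points, f, w):
    """points: iterable of input tuples; f: dict point -> value; w: chromosome.
    Returns the longest run of equal-valued points in projection order."""
    proj = lambda x: sum(wi * xi for wi, xi in zip(w, x))
    pts = sorted(points, key=lambda x: (proj(x), f[x]))
    best = run = [pts[0]]
    for prev, cur in zip(pts, pts[1:]):
        run = run + [cur] if f[cur] == f[prev] else [cur]
        if len(run) > len(best):
            best = run
    return best

# Fitness1_L(w) of Equation (24.9) is then len(longest_strip(S, f, w)) / len(S).
```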
Given a set of points S ⊆ K^n and a function f over S, let P_1 ··· P_{j_1} and P_{j_2} ··· P_{|S|} (1 ≤ j_1 < j_2 ≤ |S|) be, respectively, the leftmost and rightmost strips generated by w, with strip values c_1 and c_2. We denote by L(S, w) the length of the longest strip generated by w, and by M(S, w) the length of the maximum of the leftmost and rightmost strips, on set S and function f. To evaluate how good w is, we propose the following fitness functions with respect to the definition of A:

• A = longest strip:

Fitness1_L(w) = L(S, w)/|S|.  (24.9)

• A = maximum separable subset:

Fitness1_M(w) = M(S, w)/|S|.  (24.10)
Let S_c = {x ∈ S | f(x) = c}, c ∈ K, that is, the set of points of value c. An alternative objective is to select a strip of value c, that is, T_w^{(c)}, which maximizes |T_w^{(c)}|/|S_c|, where |T_w^{(c)}| denotes the length of T_w^{(c)}. That is, as in References 44 and 56, the selection criterion chooses the strip that constitutes the largest separable proportion of a class of points; we denote by L(S_c, w) the length of the largest strip of value c, where the maximum is over S for every class c present in the training set. As stated in Reference 56, choosing the largest proportion rather than the largest set has advantages for some functions f : S → K. For, if the number of points of a class v_1 is small and a strip T_w^{(v_1)} constituting the whole class S_{v_1} is found, it may be that a longer strip T_w^{(v_2)} with a smaller proportion L(S_{v_2}, w)/|S_{v_2}| can also be found. However, it is preferable to select T_w^{(v_1)}, because this removes the entire set S_{v_1} from S and brings the neural network construction closer to the hidden layer termination criterion of having only points of one class remaining in the training set. This is illustrated in Figure 24.21, where f ∈ P_3^2 is a random function to which both fitness selection criteria were applied.
As seen in Figure 24.21, it is impossible to obtain a network with exactly three hidden units (the absolute minimum here) when using Fitness1_L. The numbers in circles indicate the order in which generated strips are assigned to hidden units. So, in Figure 24.21(b) a fourth hidden unit is needed for the last remaining point of value 2. In Figure 24.21(a), Fitness2_L removes the set S_1 immediately after removing S_0 (unlike in Figure 24.21(b), where a proper subset of S_2 is removed after S_0), and hence only one more unit is needed to remove S_2.
A note on the time complexity of the evaluation function. For a given w, both fitness functions take n|S| steps to compute the w · x's, n|S| log |S| steps to sort them, and at most |S| steps to compute L(S, w) or M(S, w). Therefore, the evaluation of Fitness(1 or 2)(L or M)(w) has a time complexity of O(n|S| log |S|). In addition, the crossover and mutation operations below take O(n) steps each, and the initialization of the population takes O(pnk^n log k^n) steps (p is the number of chromosomes, and all initial chromosomes are evaluated for their fitness). Thus, the evaluation of Fitness(w) is the most expensive operation in our GA. Let g be the number of generations; then at each new generation p/2 new chromosomes are evaluated for their fitness, and hence our GA has a time complexity of O(gpn|S| log |S|) ≈ O(gpn²k^n log k).
FIGURE 24.21 Behaviors of (a) Fitness2_L and (b) Fitness1_L on some f ∈ P_3^2. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
24.4.5.3 Crossover
Crossover is the GA’s crucial operation. Pairs of randomly selected chromosomes are subjected to
crossover. For our problem representation we propose the following mixed crossover method for
real-coded chromosomes as described in Reference 64. Let p1 and p2 be two unit vectors to be crossed over
and let c1 and c2 be the result of their crossing. Vectors c1 and c2 are obtained using, with equal probability,
two of the following three crossovers operations:
p1j if random() ≤ 0.5,
cij = (24.15)
p2j otherwise.
Crossover in Equation (24.13) is simply the addition of the two parents; the child is assured to be their exact middle vector, since the parents are unit vectors. Crossover in Equation (24.14) is the subtraction of the two parents; the child is the vector orthogonal to the sum of its parents. Crossover in Equation (24.15) is a uniform crossover of the two parents; that is, at coordinate j each parent has a 50% chance of being selected as c_j (1 ≤ j ≤ n). For a more efficient search, c_1 and c_2 must not be obtained from the same crossover operator. That is, if c_1 is obtained using Equation (24.13), then c_2 must be generated from either Equation (24.14) or (24.15); this helps maintain a certain level of diversity among the chromosomes in the population. Also, crossover is applied only if a randomly generated number in the range 0 to 1 is less than or equal to the crossover probability p_cros (in a large population, p_cros gives the fraction of chromosomes actually crossed).
We must emphasize that each chromosome is a unit vector at every moment in the population. Thus, the initial random vectors are all normalized, and children are renormalized to unit vectors after any crossover or mutation operation.
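The three operations, followed by renormalization, can be sketched as follows (our reconstruction of Equations (24.13) to (24.15), under the descriptions above):

```python
import math
import random

def normalize(v):
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]

def cross_sum(p1, p2):      # Eq. (24.13): child bisects two unit-vector parents
    return normalize([a + b for a, b in zip(p1, p2)])

def cross_diff(p1, p2):     # Eq. (24.14): child orthogonal to the parents' sum
    return normalize([a - b for a, b in zip(p1, p2)])

def cross_uniform(p1, p2):  # Eq. (24.15): each coordinate from either parent
    return normalize([a if random.random() <= 0.5 else b for a, b in zip(p1, p2)])

def crossover(p1, p2):
    # c1 and c2 must come from two *different* operators, chosen with equal probability
    op1, op2 = random.sample([cross_sum, cross_diff, cross_uniform], 2)
    return op1(p1, p2), op2(p1, p2)
```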
24.4.5.4 Mutation
After crossover, chromosomes are subjected to random mutations. We use three methods of coordinate-wise mutation, as described in Reference 64; they correspond to bitwise mutation for binary chromosomes. Let p be a unit vector to be mutated into a child c.

Random replacement. With some probability of mutation, each coordinate p_i (1 ≤ i ≤ n) of a parent p may be replaced as follows:

c_i = random[−1, 1],  (24.16)

where random[−1, 1] returns a random real number in the interval [−1, 1] with uniform probability.

Orthogonal replacement. With some probability of mutation, each coordinate p_i (1 ≤ i ≤ n) of a parent p may be replaced as follows:

c_i = ±√(1 − p_i²).  (24.17)

Finally, with some probability of mutation, each coordinate p_i may be shifted slightly:

c_i = p_i ± m/k^n,  (24.18)

where m ≤ k is a random constant. Unlike the two previous methods of mutation, this method slightly rotates the current hyperplane w to a neighboring one.

Just as p_cros controls the probability of crossover, the mutation rate p_muta gives the probability for a given coordinate to be mutated. For a vector to be mutated, one of the three mutation operators is selected with probability 1/3.
Here, we treat mutation only as a secondary operator, with the role of restoring lost genetic material or generating completely new genetic material that may possibly be (near) optimal. Mutation is not a conservative operator; it is highly disruptive. Therefore, we set p_muta ≤ 0.1.
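Put together, the mutation step looks like this (our sketch; the right-hand side of Equation (24.16) is assumed to be random[−1, 1], as the surrounding text describes):

```python
import math
import random

def mutate(p, k, n, p_muta=0.1):
    """Mutate a unit vector p (our sketch): one of the three operators is
    selected with probability 1/3 for the vector, and each coordinate is
    then mutated with probability p_muta. The caller renormalizes the child."""
    op = random.randrange(3)
    c = list(p)
    for i in range(len(c)):
        if random.random() > p_muta:
            continue
        if op == 0:    # random replacement, Eq. (24.16)
            c[i] = random.uniform(-1.0, 1.0)
        elif op == 1:  # orthogonal replacement, Eq. (24.17)
            c[i] = random.choice((-1.0, 1.0)) * math.sqrt(max(0.0, 1.0 - c[i] ** 2))
        else:          # small shift, Eq. (24.18): rotates the hyperplane slightly
            c[i] += random.choice((-1.0, 1.0)) * random.randint(1, k) / (k ** n)
    return c
```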
τ_1 = |w_r · x_i − w_r · x_{i−1}|/2 and τ_2 = |w_r · x_{i+j+1} − w_r · x_{i+j}|/2.  (24.19)
2. Maximum separable subset-based network. At every iteration r of the A-based synthesis algorithm, the GA finds a chromosome w_r that produces a maximum separable subset G_r = P_1 ··· P_{j_1} or G_r = P_{j_2} ··· P_{|S_r|} (i.e., the maximum of the leftmost and the rightmost strips), where 1 ≤ j_1 < j_2 ≤ |S_r|, with strip value v_r = f(x_1) or f(x_{j_2}). Let u_r = v_r + 1. We then create an (n, k + 1, 1)-perceptron (hidden unit U_r) whose weight vector is w_r, whose threshold vector is t_r = (w_r · x_{j_1} + τ_2) if G_r is the leftmost strip or t_r = (w_r · x_{j_2} − τ_1) if G_r is the rightmost strip, and whose output vector is o_r = (u_r, 0) or o_r = (0, u_r), depending on G_r. In other words, the perceptron has a transfer function of the form g_{k+1,1}^{(w_r · x_{j_1} + τ_2),(u_r,0)} : R → {0, u_r} or g_{k+1,1}^{(w_r · x_{j_2} − τ_1),(0,u_r)} : R → {0, u_r} (i.e., a (k + 1)-valued one-threshold function). The (n, k + 1, 1)-perceptron outputs the value u_r for all points x ∈ G_r and the value 0 for all points x ∈ S_r − G_r. The offsets τ_1 and τ_2 are determined as earlier.
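The behavior of these hidden units reduces to a small lookup on the projection w · x, as in this sketch (ours) of a multiple-valued multiple-threshold transfer function:

```python
def perceptron_output(x, w, thresholds, outputs):
    """Generic (n, k+1, s)-perceptron: s sorted thresholds split the projection
    w.x into s+1 regions, and outputs has one entry per region. A strip unit
    uses thresholds (t1, t2) with outputs (0, u_r, 0); a maximum separable
    subset unit uses one threshold with outputs (u_r, 0) or (0, u_r)."""
    p = sum(wi * xi for wi, xi in zip(w, x))
    region = sum(1 for t in thresholds if p >= t)
    return outputs[region]
```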
After defining all units U1 , . . . , Ur (where r is the number of runs of the A-based synthesis algorithm),
the next step is to construct a feed-forward multilayer neural network. We propose two network topologies.
24.4.6.1 Three Hidden Layers and r + k + 2 Units Architecture

The network in Figure 24.22 (which shows the case of the strip-based method) has three hidden layers and r + k + 2 neurons. Hidden layer 1 contains the units (the U_i's) obtained by the GA. Each unit is connected to the inputs, and their parameters (weight, threshold, and output vectors) are defined as described previously. So the units in this layer are all either (n, k + 1, 2)-perceptrons or (n, k + 1, 1)-perceptrons, depending on the definition of A, and there are r such units (Figure 24.22 shows the case A = longest strip).

FIGURE 24.22 Three hidden layers and r + k + 2 units network. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
Hidden layer 2 has only one unit, which is an (r, (k + 1)^{r−1} + 1, r − 1)-perceptron. Its weight vector is w = ((k + 1)^{r−1}, (k + 1)^{r−2}, ..., (k + 1)^0), that is, w_i = (k + 1)^{r−i} for 1 ≤ i ≤ r; its threshold vector is t = ((k + 1)^1, (k + 1)^2, ..., (k + 1)^{r−1}), that is, t_i = (k + 1)^i for 1 ≤ i ≤ r − 1; and its output vector is o = ((k + 1)^0, (k + 1)^1, ..., (k + 1)^{r−1}), that is, o_i = (k + 1)^i for 0 ≤ i ≤ r − 1. All units of layer 1 are connected to this unit, and the connection weight vector is w.
Hidden layer 3 contains k units. Each unit of layers 1 and 2 is connected to every unit in this layer. Each unit is an ordinary linear threshold element (thus o = (0, 1)), and the connection weight vector from layer 1 to such a unit is the same as the connection weight vector from layer 1 to the unit at layer 2. The connection weight w_{i,r+1} (1 ≤ i ≤ k) from layer 2 to the ith unit in layer 3 is −i. The thresholds of the units in layer 3 are all set to 0.

The output layer has one unit, which is a (k, k, k − 1)-perceptron whose threshold vector is t = (2, ..., k), that is, t_i = i + 1 for 1 ≤ i ≤ k − 1, and whose output vector is o = (0, ..., k − 1), that is, o_i = i for 0 ≤ i ≤ k − 1 (or, equivalently, o_i = t_i − 1). The connection weight from a unit in layer 3 to the output unit is 1.
FIGURE 24.23 One hidden layer and r + 1 units network. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)

24.4.6.2 One Hidden Layer and r + 1 Units Architecture
The output layer has only one unit, which is an (r, k(k + 1)^{r−1} + 1, kr)-perceptron. Its weight vector is w = ((k + 1)^{r−1}, (k + 1)^{r−2}, ..., (k + 1)^0), that is, w_i = (k + 1)^{r−i} for 1 ≤ i ≤ r; its threshold vector is t = (1(k + 1)^0, ..., k(k + 1)^0, 1(k + 1)^1, ..., k(k + 1)^1, ..., 1(k + 1)^{r−1}, ..., k(k + 1)^{r−1}); and its output vector is o = (0, ..., k − 1, 0, ..., k − 1, ..., 0, ..., k − 1). All units of layer 1 are connected to this unit, and the connection weight vector is w.
As seen from the table, the STRIPd2, CARVE, and STRIPd4 networks are all within the same order of complexity with respect to r, that is, O(1) layers and O(r) nodes, thresholds, and weight connections. The total number of parameters, that is, the number of thresholds plus the number of weight connections, is an important complexity measure of neural networks because of its hardware cost and its influence on generalization performance. With respect to this measure, the NDL network has the worst overall complexity, with O(r) layers and nodes and O(r²) parameters. The CARVE network achieves the best overall complexity, but it is only slightly better than the STRIPd2 network (which has 2r − k more parameters but k − 1 fewer nodes). The STRIPd4 network is more complex than the CARVE and STRIPd2 networks because of its asymptotic constant.
An important observation from the experiments is that, as n and k increase, for many classes of functions STRIPd2 and STRIPd4 networks have significantly smaller values of r than CARVE, NDL, and SEPAR networks. If the GA is used to construct a SEPAR network for a given function, then such a SEPAR network should be at least as small as a CARVE network for the same function, for the reasons explained in the last paragraph of Section 24.4.3. That is, the higher values of r in a CARVE network are due to the less efficient search of the CARVE algorithm compared with the SEPAR algorithm (using the GA), given the same task.

In practical applications the relation between the number of new hidden units r and the target function is important. CARVE networks have the simplest basic units among these methods and therefore require more new hidden units than STRIP networks. Thus, we should discuss the overall complexity in terms of the number of new hidden units r for some function realizations. The SEPAR algorithm already improves on CARVE and NDL for the reasons given. For most classes of functions, STRIP network implementations are significantly smaller in r than maximum separable subset-based network implementations of the same functions, such as CARVE, NDL, SEPAR, and other networks. The reason is that removing a strip generated by a w may create a completely new strip, namely the union of the two strips that enclosed the removed strip. This can happen only when the removed strip is not an end strip. For instance, consider three strips, one of value 1 and the other two of value 0, and suppose the strip of value 1 lies between the strips of value 0, that is, we have the sequence 010. Then removing strip 1 creates a new strip 00, which may be longer than the strips of values 1 and 0 together. When such longer strips are created, fewer new nodes need to be added. This situation cannot happen when maximum separable subset techniques are used, which is why the maximum separable method creates more nodes than the longest strip method.
To illustrate this fact, consider the example function of P_9^2 (i.e., k = 9 and n = 2) in Figure 24.24. This function has a mirror-symmetric table; that is, all rows, columns, and the two main diagonals are symmetric about their centers: the second half of a row, column, or diagonal is a mirror reflection of the first half. Moreover, at row y (1 ≤ y ≤ (k − 1)/2) the first y entries in that row are equal, and at column x (0 ≤ x ≤ (k − 1)/2) the first x + 1 entries in that column are equal. Such a two-input mirror-symmetric-table function can be constructed for any odd k, and the analysis is similar. This class of functions is very interesting. First, the smallest possible size of a strip-based network or a maximum separable subset-based network that realizes a function in this class can be obtained analytically. Second, like random functions, these functions have small separations between inputs and therefore seem at least as difficult to realize. Third, as described later, they clearly demonstrate the power of STRIP compared with the CARVE, NDL, and SEPAR algorithms; the difference in size between a smallest STRIP network and a smallest CARVE network for a given function in this class is O(k²).

8 6 4 2 0 2 4 6 8
7 6 4 2 0 2 4 6 7
5 5 4 2 0 2 4 5 5
3 3 3 2 0 2 3 3 3
1 1 1 1 0 1 1 1 1
3 3 3 2 0 2 3 3 3
5 5 4 2 0 2 4 5 5
7 6 4 2 0 2 4 6 7
8 6 4 2 0 2 4 6 8

FIGURE 24.24 Mirror-symmetric-table function of P_9^2. (From A. Ngom, I. Stojmenović, and V. Milutinović. IEEE Transactions on Neural Networks, 12, 212–227, 2001. With permission.)
Clearly, a function in this class has a minimal representation of exactly k units in the first hidden layer of STRIP. In our example function, the STRIP algorithm will extract nine strips, of values 0, 1, 2, 3, 4, 5, 6, 7, 8, in that order. A weight vector that generates the strip of value 0 (the longest strip initially) is w = (0, 1); note also that there are four strips of value 2 (each of length four) in that direction. After removal of the strip of value 0, the algorithm must change direction, that is, w = (1, 0), in order to remove the strip of value 1. The next longest strip is the strip of value 2 in direction w = (0, 1); the four short strips of length 4 are now joined into a single strip of length 16, since strip 0 and the strips of value 1 that lay between them have been removed.
What can the CARVE, NDL, and SEPAR algorithms do at best? It can be shown that the minimum number of hyperplanes needed to partition a mirror-symmetric-table function is (k² + 2k − 3)/2. For instance, the smallest CARVE network associated with our example function contains 48 units in its first hidden layer. Clearly, the ratio between the sizes of a smallest STRIP network and a smallest CARVE network for a function in this class tends to 0 as k → +∞. A smallest CARVE, NDL, or SEPAR network for such functions is O(k) times larger than the corresponding smallest STRIP network.
Suppose that, for some function realization, a maximum separable subset-based algorithm, say CARVE, achieves the smallest network size r_m. Suppose also that, for the same function realization, STRIP achieves its smallest network size r_s. For STRIP to be fundamentally better than CARVE we must have r_s < (1/2)r_m (this is the case for mirror-symmetric-table function realizations). Simply put, one STRIP neuron (a two-threshold perceptron) is equal in complexity to two CARVE neurons (one-threshold perceptrons). For functions where the inputs are separated by a number of parallel hyperplanes (such as linear, permutably homogeneous, and some monotone functions), STRIP is not fundamentally better than CARVE; for these functions, a smallest STRIP network has r_s ≥ (1/2)r_m. We discuss this fact further in Section 24.5.7.
STRIP networks are fundamentally better than CARVE, NDL, SEPAR, and other maximum separable subset-based networks for many classes of functions other than those cited in the previous paragraph. Examples more complex than the mirror-symmetric-table functions can be constructed, for some n and k, where STRIP selects optimal directions for partitioning, whereas SEPAR, for instance, cannot know such information and will therefore most likely do a poor job. Preliminary experiments (see Table 24.2) seem to indicate that such functions (including mirror-symmetric-table functions) are very hard to realize by CARVE and similar algorithms; indeed, they appear even harder than random functions. For example, given the function in Figure 24.24, if the CARVE algorithm removes a single point of value 6 in its first few iterations, then the corresponding CARVE network will never be minimal. The CARVE algorithm proceeds by corner (and border) separation, and it is very likely that one of the points (1, 0), (1, 9), (8, 0), (8, 9) of value 6 will be in a singleton class that may be separated.
We can also compare the complexity and latency of a minimal STRIP network implementation of an arbitrary multiple-valued logic function with the complexity and latency of a minimal direct circuit implementation of the same function. For some function realizations, STRIP network implementations are sometimes better and sometimes worse (in depth or size) than their corresponding direct circuit implementations. The circuit implementations can be over any basis of gates, such as the {XOR} or {AND, OR, NOT} bases. See Section 24.5.7 for a comparison with n-bit parity circuits.
Clearly, on input x each unit U_i has either value u_i = 0 or u_i = v_i ≠ 0, where v_i is the maximum amplitude of the unit (see Figures 24.22 and 24.23). Recall that each U_i corresponds to a subset G_i found and removed by our A-based synthesis algorithm during its ith run. The collection of the G_i's is a partition of K^n, that is, ⋃_{i=1}^r G_i = K^n and G_i ∩ G_j = ∅ for i ≠ j. Therefore, the subset G_i such that input x ∈ G_i corresponds to the first unit in layer 1 (starting from the left) which outputs a nonzero value on input x. That is, u = (0, ..., 0, u_i = v_i, u_{i+1}, ..., u_r), where u_j for i + 1 ≤ j ≤ r is either 0 or v_j. Recall that, according to our definition of U_i, v_i − 1 is the function value of all points in G_i, that is, f(x) = v_i − 1.

By definition, unit P always outputs the value p = (k + 1)^{r−i}, where i is the least index in u such that u_i > 0 on input x. In what follows we let i be this least index, that is, u_i = v_i ≠ 0 and u_1 = ··· = u_{i−1} = 0. Let a = ∑_{l=1}^r (k + 1)^{r−l} u_l = ∑_{l=i}^r (k + 1)^{r−l} u_l be the dot product of the weights and inputs (i.e., the outputs of layer 1) of P.

Each unit Q_j (1 ≤ j ≤ k) in layer 3 computes the sum b_j = a − j(k + 1)^{r−i} (recall that the connection weight from P to Q_j is −j and that the output of P is (k + 1)^{r−i}). We have b_j = (k + 1)^{r−i} u_i − j(k + 1)^{r−i} + ∑_{l=i+1}^r (k + 1)^{r−l} u_l. From our definition of the weight connection vector between layer 1 and layer 3, it is easy to see that 0 ≤ ∑_{l=i+1}^r (k + 1)^{r−l} u_l < (k + 1)^{r−i}. Therefore, we obtain b_j ≥ 0 for 1 ≤ j ≤ u_i and b_j < 0 for u_i + 1 ≤ j ≤ k. That is, exactly u_i units in layer 3 will output the value 1.

By definition, the output unit Z, which has only unit weights, computes the sum c = ∑_{j=1}^k q_j = ∑_{j=1}^{u_i} 1 = u_i, where q_j ∈ {0, 1} is the output of Q_j. Since 1 ≤ u_i ≤ k and the threshold vector of Z is such that t_{u_i} = u_i + 1, we obtain t_{u_i−1} ≤ u_i < t_{u_i}. From the definition of the output vector of Z we conclude that the output of the neural network is z = t_{u_i−1} − 1 = u_i − 1 = v_i − 1 = f(x). Thus, the network has effectively classified input x correctly. This completes our proof for the network of Figure 24.22. The proof for Figure 24.23 is straightforward and therefore omitted.
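The decoding arithmetic in this argument is easy to check numerically; the following sketch (ours) pushes a layer-1 output vector u through units P, Q_1, ..., Q_k, and Z:

```python
def decode(u, k):
    """u: layer-1 outputs (each u_j is 0 or v_j with 1 <= v_j <= k), at least
    one nonzero. Returns v_i - 1 for the least i with u_i > 0, as in the proof.
    Exponent r-1-l below is the proof's (k+1)^(r-l) with 0-based indexing."""
    r = len(u)
    i = next(j for j, uj in enumerate(u) if uj > 0)            # least nonzero index
    a = sum((k + 1) ** (r - 1 - l) * u[l] for l in range(r))   # dot product into P, Q_j
    p = (k + 1) ** (r - 1 - i)                                 # output of unit P
    q = [1 if a - j * p >= 0 else 0 for j in range(1, k + 1)]  # layer-3 outputs
    c = sum(q)                                                 # unit Z's sum; equals u_i
    return c - 1                                               # network output f(x)

print(decode([0, 0, 3, 2, 0], k=4))  # first nonzero is u_3 = 3, so f(x) = 3 - 1 = 2
```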
1. Permutably homogeneous functions. Functions in this class are partitioned by at most k − 1 separating parallel hyperplanes such that no two distinct classes have equal values. Therefore, the minimal number of new hidden units for both STRIP and SEPAR is ≤ k. We experimented with the following permutably homogeneous functions: f(x) = ⌊∑_{i=1}^n x_i/a_i + n⌋ mod k, where a_i = 2i + 1. We generated the four-input four-valued logic function f(x) = ⌊x1/3 + x2/5 + x3/7 + x4/9 + 4⌋ mod 4 and tested our algorithms on it. The function has three possible values, namely 0, 1, 2. The three classes are separated by two parallel hyperplanes (class 1 being in the middle), and the distance between two adjacent hyperplanes is ≥ 2.
The results of STRIP and SEPAR are consistent: both produced values close to the minimal solution r = 3. Clearly, there is no benefit in using STRIP for these functions; there is no reduction at all, and thus the STRIP network is twice as complex as its corresponding SEPAR network. In general, STRIP is no better than SEPAR for functions whose inputs are separated by nonintersecting hyperplanes with distinct values in distinct regions.
2. Monotone functions. We experimented with a (random) function of P_4^4 which is monotonic under the natural nondecreasing order on K, that is, 0 ≤ 1 ≤ ··· ≤ k − 2 ≤ k − 1. For such functions, STRIP and SEPAR produced close results. This suggests that, as for permutably homogeneous functions, SEPAR is better than STRIP for these monotone functions, since it achieves smaller network complexity even though STRIP has smaller values of r.
3. Linear functions. These functions are partitioned by a number of parallel hyperplanes where many separated distinct classes of inputs have equal values and any two adjacent classes have distinct values. The minimal STRIP network for such functions is exactly half the size of the corresponding minimal SEPAR network, for the reasons explained in Section 24.4.6.3. Therefore, even though the STRIP network is half as large in r, it has exactly the same complexity as its corresponding SEPAR network for linear function realizations. Thus STRIP is no better than SEPAR and CARVE for such functions.

The random four-valued linear function generated was f(x) = (3x1 + x2 + 3x3 + x4) mod k. Both the STRIP and SEPAR algorithms have difficulties realizing this function. First, the minimum r produced by SEPAR, 13, is very far from the average result, 20.3. Second, STRIP should give a result that is about half the minimum obtained by SEPAR. The reason is that, for linear functions, the distance between two adjacent separating parallel hyperplanes is very small (≤ 1); therefore, the algorithms are very sensitive to rotations, that is, small rotations away from separating hyperplanes can cause the algorithms to produce large networks.
The n-bit parity functions are linear functions, and it is well known that a single-layer minimal solution exists with n hidden (n, 2, 1)-perceptrons. We carried out experiments with the STRIP and SEPAR algorithms for n = 0, ..., 12, using 2500 generations of the GA to learn such functions. The results obtained were consistently ⌊n/2⌋ + 1 hidden units for STRIP and n + 1 hidden units for SEPAR, using both fitness measures. For SEPAR we cannot obtain n hidden units, as the other maximum separable-based methods do, because the last training set (of points of the same class) is always assigned to a new hidden unit. Thus, for instance, we obtain exactly two hidden units for the binary AND function, whereas the other methods would obtain one hidden unit. For SEPAR, the results obtained for n = 10, 11, 12 were in fact 13, 15, 18 for 2500 generations of the GA; however, we obtained the correct results, 11, 12, 13, when we increased the number of generations to 4000. This suggests that our algorithms are able to find the minimal value if given enough time.
n-bit parity functions can be implemented by {XOR} circuits of depth one and size O(n), {XOR} circuits of depth O(log₂ n) and size n, {AND, OR} circuits of depth d and size O(2^{n^{1/(d−1)}} n^{(d−2)/(d−1)}), or {AND, OR, NOT} circuits of depth two and size 2^{n−1}. For such functions, a STRIPd2 network has size ⌊n/2⌋ + 2 and a STRIPd4 network has size ⌊n/2⌋ + k + 3. As one can see, STRIP networks are not always better than their corresponding direct circuit implementations. The difference in complexity between a STRIP network and a direct circuit depends on factors such as the basis set of gates and the fan-in or fan-out of the gates in the circuit.
4. Random functions. The experiments clearly show that STRIP is fundamentally better than SEPAR and CARVE for random function realizations. The average and minimum STRIP networks obtained are at least half the size of the corresponding SEPAR networks. As already stated, STRIP is able to change (and select good) directions for separation and to create larger classes, while SEPAR can do neither. For random functions, the separation between classes is very small. STRIP can look inside a random function to decide the best directions to choose and, by doing so, it maximizes the size of all classes (they keep growing until they can be removed), whereas SEPAR only maximizes the size of the classes that lie at the boundaries of the input space.
Table 24.3 contains results reported for the three best constructive algorithms (so far in the literature) that have been applied to learning random two-valued logic functions. The value in parentheses at the top of each column is the number of trials over which the network size is averaged. In each trial of STRIP, we generated a different random function and used 2500 iterations of the GA. STRIP produces smaller networks for this classification task than any other growth method. Significantly, STRIP gives much smaller networks than CARVE as n and k increase. Also, SEPAR performs better than Upstart as n and k increase.
5. Mirror-symmetric functions. We have carried out experiments with mirror-symmetric functions. A mirror-symmetric function has value 1 if the second half of an input vector is a mirror reflection of the first half, that is, if the input vector is symmetric about its center. This function is known to have a minimal representation of two hidden units. For n = 2, ..., 7 we always found the optimal number of hidden units (two) with both STRIP and SEPAR, using 1000 generations of the GA. For n = 8, ..., 12 we always found the optimum using 3000 generations of the GA. Here, as for permutably homogeneous functions, STRIP is no better than SEPAR.
6. Mirror-symmetric-table (mst) functions. Table 24.4 compares the performance of each algorithm on our example mst function of Figure 24.24 and on ten random functions. For each algorithm (STRIPF2, STRIPF1, SEPARF2, and SEPARF1), we did ten runs on our mst function and averaged the results. Also, each algorithm was applied to ten distinct random functions (different from the random functions used for the other three algorithms), and the results were then averaged. STRIPF2 is the clear winner (except for its generalization performance on random functions); it produced the minimal size for mst. The table also shows that the single mst function is harder for SEPAR to realize than random functions (see the last two entries of the first two rows): the average result of SEPAR for mst is very far from the absolute minimum, 48, and is significantly larger than the averaged result over ten random functions. For the reasons explained in Section 24.4.5.2, Fitness2 helps remove the classes faster than Fitness1 and therefore yields better results in the table for both algorithms (except on generalization for random functions).

TABLE 24.4

                     STRIPF2          STRIPF1          SEPARF2          SEPARF1
Using 100% of K^n
  Mst                09.0 ± 0.00      09.8 ± 0.40      60.2 ± 1.08      69.0 ± 3.69
  Random functions   29.2 ± 1.99      31.8 ± 1.60      58.8 ± 2.89      64.7 ± 4.61
Using 60% of K^n
  Mst                09.0 ± 0.00      11.7 ± 1.95      31.0 ± 3.13      36.0 ± 2.49
  Random functions   19.1 ± 1.37      21.6 ± 1.91      36.1 ± 2.98      36.7 ± 2.90
Accuracies
  Mst                65.15% ± 12.44   57.27% ± 09.53   18.18% ± 06.91   13.33% ± 05.62
  Random functions   09.70% ± 04.85   16.06% ± 06.36   10.91% ± 07.20   12.42% ± 05.33
w_j = −d/(y_j − x_j).  (24.20)
the STRIP_GA and SEPAR_GA networks reported in Section 24.4, along with other well-known construction techniques.

Throughout the experiments, we used the following parameters: 2000 generations for the ES (since the ES is at least twice as fast as the GA, and Section 24.4 used 1000 generations for the GA) and a 10% mutation rate. Section 24.4 used a 10% mutation rate, a 75% crossover rate, and an elitist strategy in the GA (i.e., the best individual of the current generation is always reproduced into the next generation). In some experiments, such as learning random binary functions (and other functions), we increased the number of generations of the GA (see Section 24.4) and accordingly used twice as many iterations for the ES.
Tables 24.5 and 24.6 show, for the two objective functions respectively, the results of ten runs of each method on randomly generated functions from each of the four test classes. We display the average number of created hidden nodes r (in the first layer of the constructed networks) with its standard deviation, the minimum value found by the method (the number in parentheses is the number of times it was found), the smallest running time (in minutes) over the ten runs, and the average generalization accuracy (with its standard deviation) on the test set over the ten runs. From these two tables we can see that, in general, Fitness2 yields (slightly) smaller but (slightly) less accurate networks than Fitness1.
The GA performed slightly better than the ES in most experiments, owing to the GA's manipulation of a population and its implicit parallelism. However, the ES is at least twice as fast as the GA.
[Table: comparison of network sizes obtained by STRIPF2 (GA) [7], STRIPF2 (ES) [7], CARVE [56], Sequential [44], Upstart [50], and SEPARF2 (GA) [7] for increasing n; each column is averaged over 100 trials.]
We are interested in finding the minimal s for which there exists an s-representation for a given f ∈ P_k^n. In other words, given f ∈ P_k^n, we want to find an s-representation r with the least possible number of thresholds s such that F_{k,s}^n(r) = f. We propose GAs as techniques for minimizing multiple-valued multiple-threshold perceptrons.
What we are trying to do in both methods of initialization is to generate random hyperplanes (since each w represents a hyperplane).
Fitness1(w) = 1 − T(w)/(k^n − 1).  (24.21)

Note that a GA always maximizes its objective function, and since 1 ≤ T(w) ≤ k^n − 1, Fitness1(w) is maximal when T(w) is minimal.
However, invalid thresholds must be severely penalized. For instance, consider an n-input k-valued logic function f : K^n → {0, 1} chosen at random. Then one may take the hyperplanes x1 = 0, x1 = 1, ..., x1 = k − 1 as invalid thresholds. These k hyperplanes (or k² thresholds) will separate in our sense, but are not really separating, since such a random function actually needs an exponential number of thresholds. Because of this, instead of Formula (24.21) we can alternatively use Formula (24.22):
Fitness2(w) = (2 − T(w)/(k^n − 1) − I(w)/T(w))/2 = 1 − T(w)/(2(k^n − 1)) − I(w)/(2T(w)).  (24.22)

Here, we not only minimize T(w) (the second term) but also punish a chromosome that generates a large number of invalid hyperplanes (the last term). That is, we minimize T(w) and I(w) at the same time. Note that 0 ≤ I(w) ≤ T(w), and thus Fitness2(w) is maximal when both T(w) and I(w) are minimal.
In all our experiments, both fitness formulae yield the same results when I(w) = 0. We do not know for now how they behave for I(w) ≠ 0, since the generated w's produced valid thresholds only; the probability of generating invalid thresholds seems to be very close to zero.
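Computing T(w) and the two fitness values is again a sort plus a linear scan, as in this sketch (ours; the count I(w) of invalid thresholds is assumed to be supplied by the caller):

```python
from itertools import product

def count_thresholds(f, w, k, n):
    """T(w): the number of function-value changes along the order induced by
    the projection w.x on K^n (ties broken by the function value)."""
    pts = sorted(product(range(k), repeat=n),
                 key=lambda x: (sum(wi * xi for wi, xi in zip(w, x)), f(x)))
    return sum(1 for a, b in zip(pts, pts[1:]) if f(a) != f(b))

def fitness1(f, w, k, n):                 # Eq. (24.21)
    return 1.0 - count_thresholds(f, w, k, n) / (k ** n - 1)

def fitness2(f, w, k, n, invalid):        # Eq. (24.22), with I(w) = invalid
    T = count_thresholds(f, w, k, n)      # assumes T >= 1, i.e., f not constant
    return 1.0 - T / (2 * (k ** n - 1)) - invalid / (2 * T)
```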
A note on the time complexity of the evaluation function. For a given w, it takes n·k^n steps to compute all the w · x's, k^n log k^n steps to sort them, and at most k^n steps to compute T(w). Therefore, the evaluation of Fitness(w) has a time complexity of O(n·k^n·log k). Also, the crossover and mutation operations take O(n) steps each, and the initialization of the population takes O(n·p·k^n·log k) steps (p is the number of chromosomes, and all initial chromosomes are evaluated for their fitness). Thus, the evaluation of Fitness(w) is the most expensive operation in our GA (and this is true in general for any GA). Let g be the number of generations; then at each new generation p/2 new chromosomes are evaluated for their fitness, and hence our GA has a time complexity of O(n·g·p·k^n·log k).
TABLE 24.8

k    #Invalids    #Seconds
2 0 3.25
4 0 8.92
8 0 36.72
16 0 153.76
32 0 700.12
64 0 3,289.59
128 0 14,511.80
256 0 99,434.36
TABLE 24.9

n    #Invalids    #Seconds
2 0 3.25
4 0 10.40
8 0 217.47
9 0 496.61
10 0 1,047.64
11 0 2,155.61
12 0 4,797.34
13 0 9,918.57
14 0 21,959.14
TABLE 24.10

n    Optimal s    Number of runs    Average number of generations
k = 4
2    1    10    0
3    2    10    24.6
4    2    9     430.9
k = 3
5    1    3     669
k = 2
6    1    9     283.78
7    1    3     660.33
As we can see in both tables (and also in Table 24.10), the number of invalid thresholds obtained is always zero. The last column in both tables shows the running time for s = 100 and g = 100. Although our method is slow, it is no surprise that the algorithm slows down faster as n grows than as k grows; such results agree with the complexity analysis given in Section 24.6.1.2. From a neural network applications perspective, the results on the number of invalid thresholds for k ≤ 32 and k ≥ 64 given in Table 24.8 are interesting, since these values of k correspond to discretizations of real-valued neurons by at most 5 bits or at least 6 bits, for fixed n. A more theoretical, not well understood, problem would address n/log n, n, or a constant number of bits, since we know that n/log n bits is sufficient, n bits is the best known lower bound, while a constant number of bits appears to be sufficient in practice for nonmalicious threshold functions.
In Table 24.10 we show the results of ten runs of the GA on examples of n-place k-valued logic functions (for 2 ≤ k ≤ 4 and 2 ≤ n ≤ 7) given by

f(x) = ⌊∑_{i=1}^{n} x_i/a_i + n⌋ mod k,  (24.23)

where a_i = 2i + 1. We can easily guess the minimum number of thresholds needed for a perceptron to simulate these functions. Indeed, each of these functions defines its own separating hyperplanes and their number. The number of hyperplanes is simply the number of distinct values of the function minus one, and each hyperplane H_j is defined by the equation ∑_{i=1}^n (1/a_i) x_i = t_j for some threshold t_j (1 ≤ j ≤ number of thresholds). The output vector can also be obtained by computing the value of f for x1 = ··· = xn = 0 and for x1 = ··· = xn = k − 1 and listing, in increasing order modulo k, the sequence of the other distinct values of f in between. In Table 24.11 we show examples of optimal solutions obtained by the GA.
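The optimal s and the output vector for these test functions can be read off by brute force, as in this sketch (ours); for k = 4 and n = 3 it reproduces the corresponding entries of Tables 24.10 and 24.11:

```python
from itertools import product
from math import floor

def optimal_representation(k, n):
    """For f of Equation (24.23): sort K^n by the projection sum x_i/a_i and
    list the distinct function values in order; s is their count minus one."""
    a = [2 * i + 1 for i in range(1, n + 1)]           # a_i = 2i + 1
    proj = lambda x: sum(xi / ai for xi, ai in zip(x, a))
    f = lambda x: floor(proj(x) + n) % k
    outputs = []
    for x in sorted(product(range(k), repeat=n), key=proj):
        if not outputs or f(x) != outputs[-1]:
            outputs.append(f(x))
    return len(outputs) - 1, outputs                    # (optimal s, output vector)

print(optimal_representation(4, 3))  # -> (2, [3, 0, 1]), matching Table 24.11
```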
In Table 24.10, the second column indicates the optimal number of thresholds that the GA must find, the third column contains the number of runs in which the GA reached the optimum, and the fourth column shows the average number of generations, over all successful runs, needed to obtain the optimal solution. All solutions found by the GA, optimal or not, were valid in that they contain no invalid thresholds.

As seen from the table, the difficulty for the GA of finding an optimal solution within 1000 generations depends mostly on n rather than on k. This is not surprising, since the search space is exponential in n, and thus the GA needs more and more generations (meaning more genetic operations) to successfully reach an optimum; this is indicated by the fourth column. For k = 4 and n = 5, for example, the GA could not find an optimum within five runs of 1000 generations each; however, it was successful within one run of 2000 generations. This suggests that, given enough time (which depends on n), the GA will always find
the minimal s-representation for a logic function f. We do not have rows for higher values of n because the algorithm becomes slow as n grows.

TABLE 24.11 Examples of optimal solutions found by the GA

n    w    t    o
k = 4
2    (0.953823, 0.300371)    (2.508386)    (2, 3)
3    (0.830144, 0.473697, 0.294061)    (2.303273, 4.793706)    (3, 0, 1)
4    (0.785003, 0.441867, 0.347428, 0.260417)    (2.351256, 4.714851)    (0, 1, 2)
k = 3
5    (−0.754707, −0.465486, −0.348600, −0.228958, −0.199491)    (−2.285579)    (0, 2)
k = 2
6    (0.487796, 0.827506, 0.205418, 0.075150, 0.111406, 0.130513)    (1.595870)    (0, 1)
7    (0.746777, 0.459637, 0.299486, 0.148247, 0.255323, 0.191369, 0.132577)    (1.654147)    (1, 0)
It is interesting to note that the functions we used in our experiments are among the most difficult for the GA, since their s-representations are very small (e.g., s ∈ {1, 2}). This indicates that for most (random) functions the GA will perform much better than for our test functions, because s is larger on average.
We compared our technique with the extended permutably homogeneous (n, k, s)-perceptron learning algorithm (EPHPLA) described in Section 24.3. A permutably homogeneous perceptron has an (s + 1, k)-permutation as its output vector, that is, a permutation of s + 1 elements out of K with s ≤ k − 1. The EPHPLA generalizes the homogeneous (k, k − 1)-perceptron learning algorithm of Reference 16 and has a time complexity of O(e·n·k^n), where e is the number of learning epochs. The EPHPLA can learn only permutably homogeneous functions, and an example of such a class of functions is our test functions given by Equation (24.23). It is proven in Reference 34 that the EPHPLA always converges for permutably homogeneous functions and, moreover, that it always finds a minimal s-representation for a learned function f. The EPHPLA is faster and outperforms the GA in learning these same test functions within one run of 1000 learning epochs; the GA converged better only for n = 2 (for any k). The main advantage of the GA method over the EPHPLA is that it can learn any logic function, provided enough time is given.
able to do this without an oracle within the net, that is, without divine revelation or guidance, so to speak, at critical junctures. Positive results gained in activities of this nature would not be interpreted to indicate that this is indeed how biological neural nets actually function, but they might serve to suggest preferences among various ways of thinking about processing in biological nets.
The issue we have addressed in this chapter is that of implementing multiple-valued logic systems in neural networks. There are many kinds of multiple-valued logic algebras, such as fuzzy logic, probabilistic logic, and logical calculi with rough sets. However, for the present we have concentrated on one such logic system, namely the classical multiple-valued logic as defined in this chapter.

In particular, we have discussed original models of multiple-valued neurons and multiple-valued neural networks and studied their learning and computing powers.
24.7.1 Conclusions
We have addressed the issues of synthesizing multiple-valued logic circuits with (n, k, s)-perceptrons. The (n, k, s)-perceptron learning problem is the problem of determining an (optimal) s-representation r = (w, t, o) required to compute a given function. Another problem related to learning with (n, k, s)-perceptron networks is the search for an optimal size of a network during learning.
1. The learning abilities of (n, k, s)-perceptrons were examined. The previously studied homogeneous (n, k, k − 1)-perceptron learning algorithm has been generalized to the permutably homogeneous (n, k, s)-perceptron learning algorithm, with a guaranteed convergence property. A permutably homogeneous perceptron is a neuron whose output vector o is a permutation on K. We have obtained a powerful learning method that learns any permutably homogeneous separable k-valued function given as input. When the number of thresholds is not fixed, the algorithm always finds the minimal one to be used for learning a separable function.
2. We have discussed a particular implementation of a partitioning algorithm to construct (near) minimal (n, k, s)-perceptron networks for learning given but arbitrary multiple-valued functions. We used a GA or an ES to find a (near) minimal set of hidden units that partition the space V ⊆ K^n into strips or maximum separable subsets. A strip contains the points located between two parallel hyperplanes. We have constructed two neural networks based on these hidden units and shown that they correctly compute the given but arbitrary multiple-valued function. STRIP and SEPAR can be used for functions with real-valued inputs, k-valued inputs, two-valued outputs, or k-valued outputs. More research is needed to increase the speed of the A-based synthesis algorithm.
3. Every n-input k-valued logic function can be implemented using an (n, k, s)-perceptron, for some number of thresholds s. We proposed a GA to search for a minimal (n, k, s)-perceptron that efficiently realizes a given but arbitrary function, that is, to minimize its number of thresholds. Experimental evidence shows that the genetic search can be very effective, however slow it may be.
2. The computing capacity of linear decision lists, or of any architecture obtained by a partitioning algorithm, is not known and remains an open problem. The technique we have applied for deriving capacity results on (n, k, s)-perceptrons may be extended or improved to give results for partitioning architectures. The main question is: in how many ways can we partition a finite set V ⊆ R^n using s hyperplanes? The hyperplanes are not necessarily parallel. The answer to this question gives (bounds on) the capacity of neural networks constructed by partitioning algorithms. Such neural network architectures include neural trees and neural decision lists. This question also motivates the investigation of the VC-dimension and PAC-learnability of such structures.
3. The combinatorial arguments used to derive our results for the number of linear (or multilinear) partitions and the capacity of (n, k, s)-perceptrons may possibly be extended to the general case of n-dimensional sets, for instance, to enumerate partitions of higher-dimensional grids.
4. Consider the following generalization of a threshold function: g(P(x)) = 0 if P(x) < t and g(P(x)) = 1 if t ≤ P(x), where P(x) is a polynomial of degree d ≥ 0 and t is a threshold level. We say that g is a threshold function of order d. For instance, the well-known linear threshold function is a threshold function of order 1 with P(x) = w · x. Geometrically, an order-d threshold function is a separating hypersurface of degree d; for example, a linear threshold is a separating hyperplane that can be expressed by a polynomial of degree 1. Very few results are known in the neural networks literature on using hypersurfaces as discriminant functions, such as quadratic surfaces (parabolas, hyperbolas, circles, ellipses, spheres, etc.), cubic surfaces, or surfaces of degree d ≥ 4; a small sketch follows below. It may be that hypersurfaces are better separators than linear surfaces (i.e., hyperplanes); more studies need to be done.
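As an illustration, the following is a minimal Python sketch of an order-2 threshold unit whose separating hypersurface is a circle (the names order_d_threshold and monomials are ours and purely illustrative, not an implementation from the thesis):

def order_d_threshold(coeffs, t):
    # g(P(x)) = 0 if P(x) < t, and 1 if t <= P(x), for a polynomial P given
    # by coefficients over a fixed list of monomials.
    def g(x):
        p = sum(c * m for c, m in zip(coeffs, monomials(x)))
        return 0 if p < t else 1
    return g

def monomials(x):
    # Degree-2 monomials of a two-dimensional input: 1, x1, x2, x1^2, x1*x2, x2^2.
    x1, x2 = x
    return [1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2]

# The circle x1^2 + x2^2 = 1 as a degree-2 separating hypersurface:
g = order_d_threshold([-1.0, 0.0, 0.0, 1.0, 0.0, 1.0], 0.0)
print(g((0.5, 0.5)), g((1.0, 1.0)))   # 0 (inside the circle), 1 (outside)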
5. An interesting research direction is to design efficient learning algorithms for multilayer (n, k, s)-perceptron networks; such networks would have the ability to learn any multiple-valued function. The learning and computing abilities of discrete Hopfield networks (or any other type of discrete associative neural network) composed of (n, k, s)-perceptrons can be investigated. Support vector machines (SVMs) with (n, k, s)-perceptrons as processing units can also be studied. The capacity of single (n, k, s)-perceptrons may be increased if learning algorithms are designed for output vectors that are (s + 1, k)-permutations with possible repetitions of their distinct elements.
6. The neural networks we have considered in this thesis are heterogeneous networks, meaning that the (n, k, s)-perceptron elements are not all identical (e.g., the numbers of thresholds or inputs are not all the same). Designing learning algorithms for homogeneous networks seems (to us) much easier than for heterogeneous networks. However, for homogeneous networks, the problem we must solve first is to define a good differentiable error function in order to do learning with gradient descent, similar to the back-propagation learning method.
7. The generalization properties of STRIP and SEPAR need to be studied. We believe that our methods generalize better than CARVE, since CARVE always overfits the weights. We avoided this overfitting by translating the obtained hyperplanes away from their original positions so as to include some test points in the closed space delimited by these hyperplanes. In addition, in higher dimensions and for many classes of functions, our STRIP networks are much less complex (in terms of the total number of parameters) than CARVE and other maximum-separable-based networks; therefore, by the Occam's razor principle, STRIP should generalize better. One can further improve generalizability by transforming an obtained STRIP network into a radial basis function network and applying back-propagation to the resulting network. Young and Downs [56] have done the same thing with CARVE using sigmoidal units and shown it to be efficient. In our case, we would use radial basis functions because Gaussian units are continuous versions of our (n, k + 1, 2)-perceptrons.
8. In minimizing the number of thresholds of an (n, k, s)-perceptron, the generalization properties of the GA can be studied when the fitness function is modified to work with small subsets of K^n. Another aspect that would be of interest to the neural networks community is related to our proposed GA optimization. If it generates a small number of invalid thresholds when working with K^n, it should also be able to discover a near-optimal representation for smaller subsets of K^n. If so, and if we believe in the Occam's razor principle (that the simplest satisfactory explanation of a phenomenon is most likely to be the correct one), then this is interesting from a generalization perspective. In particular, instead of calculating w · x for every x ∈ K^n, the same fitness estimation algorithm can be applied to a small enough subset S ⊂ K^n (e.g., (k − 1)n randomly selected points from K^n). Once a faithful representation for S (called the training set) is learned, it is easy (but interesting) to measure the classification error on out-of-sample data (e.g., the total number of mistakes, and maybe the sum of squared errors, on K^n − S). If applied for generalization, and if needed, the fitness function can be further modified to allow some error if it results in a smaller representation. One major problem with our method is that the GA is very slow even for small values of n (n ≥ 10) and k (k ≥ 8). Better techniques are needed for the efficient computation of (n, k, s)-perceptrons. For example, one possible solution could be to use a population of s-representations r (instead of weight vectors w) and then design a fitness function that minimizes s as well as the error between the (n, k, s)-perceptron and the given multiple-valued logic function; a sketch follows below. Sorting is not needed here, since we select only chromosomes whose t is sorted; those with unsorted t incur a severe penalty.
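A minimal sketch of such a fitness function, assuming a chromosome stores an s-representation as a triple (w, t, o) and that the target function f and the training subset S are supplied (all names here are illustrative, not the thesis implementation):

def fitness(rep, f, points, alpha=1.0):
    # rep = (w, t, o): weight vector, threshold vector, output vector.
    w, t, o = rep
    if any(t[i] > t[i + 1] for i in range(len(t) - 1)):
        return float("inf")        # severe penalty for an unsorted threshold vector
    errors = 0
    for x in points:               # points: the training subset S of K^n
        z = sum(wi * xi for wi, xi in zip(w, x))
        j = sum(1 for tj in t if z >= tj)   # class index with t_j <= z < t_{j+1}
        errors += (o[j] != f(x))
    return errors + alpha * len(t)  # jointly minimize the error and s = len(t)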
References
[1] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press,
Cambridge, MA, 1969, Expanded edition, 1988.
[2] K.C. Smith, A multiple-valued logic: A tutorial and appreciation. Computer, 21, 17–27, 1988.
[3] J. Lukasiewicz, O logice trojwartosciowej. Ruch Filozoficzny, 15, 169–171, 1920.
[4] E.L. Post, Introduction to a general theory of elementary propositions. American Journal of
Mathematics, 43, 163–185, 1921.
[5] A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović, Set-valued logic algebra: A carrier
computing foundation. Multiple-Valued Logic — An International Journal, 2, 183–216, 1997.
[6] S.C. Chan, L.S. Hsu, and H.H. Teh, On neural logic networks. Neural Networks, 1 (Suppl. I.), 428,
1988.
[7] V. Milutinović, A. Ngom, and I. Stojmenović, Strip — a strip-based neural network growth
algorithm for learning multiple-valued functions. IEEE Transactions on Neural Networks, 12,
212–227, 2001.
[8] A. Ngom, Synthesis of Multiple-Valued Logic Functions by Neural Networks, Ph.D. thesis,
Computer Science Department, University of Ottawa, Ottawa, Ontario, Canada, October 1998.
[9] Z. Obradović, Computing with nonmonotone multivalued neurons. Multiple-Valued Logic — An
International Journal, 1, 271–284, 1996.
[10] Z. Tang, Q. Cao, and O. Ishizuka, A learning multiple-valued logic networks: Algebra, algorithm,
and application. IEEE Transactions on Computers, 47, 247–251, 1998.
[11] G. Wang and H. Shi, Tmlnn: Triple-valued or multiple-valued logic neural network. IEEE
Transactions on Neural Networks, 9, 1099–1117, 1998.
[12] T. Watanabe, M. Matsumoto, M. Enokida, and T. Hasegawa, A design of multi-valued logic
neuron. In Proceedings of the 20th IEEE International Symposium on Multiple-Valued Logic, 1990,
pp. 418–425.
[13] Q. Cao, O. Ishizuka, Z. Tang, and H. Matsumoto, Algorithm and implementation of a learn-
ing multiple-valued logic network. In Proceedings of the 23rd IEEE International Symposium on
Multiple-Valued Logic, 1993, pp. 202–207.
[14] Z. Tang, O. Ishizuka, Q. Cao, and H. Matsumoto, Algebraic properties of a learning multiple-
valued logic network. In Proceedings of 23rd IEEE International Symposium on Multiple-Valued
Logic, pp. 196–201, 1993.
[15] Z. Obradović and I. Parberry, Computing with discrete multivalued neurons. Journal of Computer
and System Sciences, 45, 471–492, 1992.
[16] Z. Obradović and I. Parberry, Learning with discrete multivalued neurons. Journal of Computer
and System Sciences, 49, 375–390, 1994.
[17] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York,
1973.
[18] I. Parberry and G. Schnitger, Parallel computation with threshold functions. Journal of Computing
and System Science, 36, 278–302, 1988.
[19] K.Y. Siu, V. Roychowdhury, and T. Kailath, Discrete Neural Computation: A Theoretical Foundation,
Information and System Sciences Series. Thomas Kailath, Series Editor. Prentice-Hall, 1995.
[20] S. Muroga, Threshold Logic and its Applications, Wiley Interscience, New York, 1971.
[21] D. Haring, Multi-threshold threshold elements. IEEE Transactions on Electronic Computers, 15,
45–65, 1965.
[22] S. Olafsson and Y.A. Abu-Mostafa, The capacity of multilevel threshold functions. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 10, 277–281, 1988.
[23] R. Takiyama, Multiple threshold perceptron. Pattern Recognition, 10, 27–30, 1978.
[24] D. Acketa and J. Žunić, On the number of linear partitions of the (m, n)-grid. Information
Processing Letters, 38, 163–168, 1991.
[25] O. Ishizuka, Multivalued multithreshold networks. In Proceedings of the 6th IEEE International
Symposium on Multiple-Valued Logic, 1976, pp. 44–47.
[26] T. Sasao, On the optimal design of multiple-valued PLAs. IEEE Computer, 38, 582–592, 1989.
[27] A. Blum and R.L. Rivest, Training a 3-node neural network is NP-Complete. In Proceedings of the
1st Workshop on Computational Learning Theory. Morgan Kaufmann, San Mateo, CA, 1988, p. 9.
[28] J.F. Miller and P. Thomson, Highly efficient exhaustive search algorithm for optimizing canonical
ternary Reed-Muller expansions of Boolean functions. International Journal on Electronics, 76,
37–56, 1994.
[29] C. Yildirim, J.T. Butler, and C. Yang, Multiple-valued PLA minimization by concurrent multiple
and mixed simulated annealing. In Proceedings of the 23rd IEEE International Symposium on
Multiple-Valued Logic, pp. 17–23, 1993.
[30] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi, Optimization by simulated annealing. Science, 220,
671–680, 1983.
[31] A. Kaczmarek, V. Antonenko, S. Yanushkevich, and E.N. Zaitseva, Algorithm for network to
realize linear MVL functions using arithmetical logic. In Proceedings of the 12th International
Conference on System Science, pp. 23–30.
[32] Y. Hata, K. Hayase, T. Hozumi, N. Kamiura, and K. Yamato, Multiple-valued logic minimization
by genetic algorithms. In Proceedings of the 27th IEEE International Symposium on Multiple-Valued
Logic, 1997, pp. 97–102.
[33] A. Lloris-Ruiz, J.F. Gomez-Lopera, and R. Roman-Roldan, Entropic minimization of multiple-
valued functions. In Proceedings of the 23rd IEEE International Symposium on Multiple-Valued
Logic, 1993, pp. 24–28.
[34] A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović, Learning with permutably homogeneous
multiple-valued multiple-threshold perceptrons. Neural Processing Letters, 12, 2000, Proceedings of
the 28th IEEE International Symposium on Multiple-Valued Logic, May 1998, pp. 161–166.
[35] A. Ngom, I. Stojmenović, and R. Tošić, The computing capacity of three-input multiple-valued
one-threshold perceptrons. Neural Processing Letters, 14, 141–155, 2001.
[36] M.H. Abd-El-Barr, S.G. Zaky, and Z.G. Vranesić, Synthesis of multivalued multithreshold
functions for ccd implementation. IEEE Transactions on Computers, 35, 124–133, 1986.
[37] R. Takiyama, The separating capacity of a multithreshold threshold element. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 7, 112–116, 1985.
[38] A. Ngom, I. Stojmenović, and J. Žunić, On the number of multilinear partitions and the computing
capacity of multiple-valued multiple-threshold perceptrons. IEEE Transactions on Neural Networks,
14, 469–477, 2003, Proceedings of the 29th IEEE International Symposium on Multiple-Valued Logic,
IEEE Computer Society Technical Committee on Multiple-Valued Logic, IEEE Computer Society,
May 1999, pp. 208–213.
[39] H. Edelsbrunner, Algorithms in Combinatorial Geometry. Springer-Verlag, Heidelberg, 1987.
[40] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learning internal representations by error
propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition,
(J.L. McClelland, D.E. Rumelhart, and the PDP Research Group, Eds.), Vol. I Foundations.
MIT Press, Cambridge, MA, 1986.
[41] X. Yao, Evolving artificial neural networks. Proceedings of the IEEE, 87, 1423–1447, 1999.
[42] S. Judd, In Proceedings of the IEEE 1st Conference on Neural Networks, Vol. 2, San Diego, CA, 1987,
p. 685.
[43] T.G. Dietterich and G. Bakiri, Solving multiclass learning problems via error-correcting output
codes. Journal of Artificial Intelligence Research, 2, 263–286, 1995.
[44] M. Marchand, M. Golea, and P. Ruján, A convergence theorem for sequential learning in two-layer
perceptrons. Europhysics Letters, 11, 487–492, 1990.
[45] P. Ruján and M. Marchand, A geometric approach to learning in neural networks. Complex Systems,
3, 229–242, 1989.
[46] S.A.J. Keibek, H.M.A. Andree, M.H.F. Savenije, G.T. Barkema, and A. Taal, A fast partitioning
algorithm and a comparison of feedforward neural networks. Europhysics Letters, 18, 555–559,
1992.
[47] M. Marchand and M. Golea, On learning simple neural concepts: From halfspace intersections to
neural decision lists. Network: Computation in Neural Systems, 4, 67–85, 1993.
[48] R.L. Rivest, Learning decision lists. Machine Learning, 2, 229–246, 1987.
[49] S.E. Fahlman and C. Lebière, The cascade-correlation learning architecture. Advances in Neural
Information Processing Systems, 2, 254, 1990.
[50] M. Frean, The upstart algorithm: A method for constructing and training feedforward neural
networks. Neural Computation, 2, 198–209, 1990.
[51] I.K. Sethi, Entropy nets: From decision trees to neural networks. Proceedings of the IEEE, 78,
1605–1613, 1990.
[52] J.A. Sirat and J.P. Nadal, Neural trees: A new tool for classification. Network, 1, 423–438, 1990.
[53] M. Mezard and J.P. Nadal, Learning in feedforward layered networks: The tiling algorithm. Journal
of Physics A, 22, 2191–2203, 1989.
[54] G.T. Barkema, H.M.A. Andree, and A. Taal, The patch algorithm: Fast design of binary feedforward
neural networks. Network, 5, 393–407, 1993.
[55] F.M. Frattale-Mascioli and G. Martinelli, A constructive algorithm for binary neural networks:
The oil-spot algorithm. IEEE Transactions on Neural Networks, 6, 794–797, 1995.
[56] S. Young and T. Downs, Carve: A constructive algorithm for real-valued examples. IEEE
Transactions on Neural Networks, 9, 1180–1190, 1998.
[57] S.I. Gallant, Three constructive algorithms for network learning. In Proceedings of the
8th Annual Conference on Cognitive Science Society, Amherst, MA, August 15–17, 1986,
pp. 652–660.
[58] M. Golea and M. Marchand, A growth algorithm for neural network decision trees. Europhysics
Letters, 12, 205–210, 1990.
[59] M.M. Muselli, On sequential construction of binary neural networks. IEEE Transactions on Neural
Networks, 6, 678–690, 1995.
[60] J.P. Nadal, Study of a growth algorithm for neural networks. International Journal of Neural Systems,
1, 55–59, 1989.
[61] D.E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley,
Reading, MA, 1989.
[62] J.H. Holland, Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor,
MI, 1975.
[63] J.D. Schaffer, L.D. Whitley, and L.J. Eshelman, Combinations of genetic algorithms and neural
networks: A survey of the state of the art. In COGANN-92: International Workshop on Combi-
nations of Genetic Algorithms and Neural Networks, (L.D. Whitley and J.D. Schaffer, Eds.), IEEE
Computer Society Press, Washington, 1992.
[64] A. Ngom, I. Stojmenović, and Z. Obradović, Minimization of multiple-valued multiple-threshold
perceptrons by genetic algorithms. In Proceedings of the 28th IEEE International Symposium on
Multiple-Valued Logic, Fukuoka, Japan, May 27–29, 1998, IEEE Computer Society Technical
Committee on Multiple-Valued Logic, IEEE Computer Society, pp. 209–214.
[65] T. Bäck, Evolutionary Algorithms in Theory and Practice. Oxford University Press, Oxford, UK,
1996.
[66] H.-P. Schwefel, Numerical Optimization of Computer Models. Wiley, Chichester, New York, 1981.
25.1 Introduction
It has been shown in the literature that analog neurons of limited precision are essentially discrete multiple-valued neurons. In this chapter, we study the computational abilities of the Discrete Multiple-Valued Multiple-Threshold Perceptron. This neuron model extends and generalizes the well-known (two-valued one-threshold) perceptron and other previously studied models. We introduce the concepts of multilinear partition and multilinear separability.
where $o = (o_0, \ldots, o_s) \in K^{s+1}$ is the output vector, $t = (t_1, \ldots, t_s) \in R^s$ is the threshold vector (with $t_i \le t_{i+1}$ for $1 \le i \le s-1$), and s ($1 \le s \le k^n - 1$) is the number of threshold values.
Multiple-threshold devices [9] are threshold elements containing multiple levels of excitation (thresholds). Among their qualities is that, given enough thresholds, a single multiple-threshold element can realize any given function operating on a finite domain [7].
The following question then arises: what is the maximum number of ways in which v points in R^n can be partitioned by s parallel hyperplanes (none of which contains any of the points)?
Answering this question, or obtaining good bounds on the answer, has consequences for the capacity
of generalizations of the threshold functions and threshold elements so central to the theory of artificial
neural networks. (These consequences are discussed in Reference 27.)
for 1 ≤ i ≤ v. Equivalently, f is s-separable if and only if it has an s-representation defined by (w, t, o).
A k-valued function over V is said to be s-nonseparable if it is not s-separable.
In other words, a (n, k, s)-perceptron partitions the space V ⊂ R^n into s + 1 distinct classes $H_0^{[o_0]}, \ldots, H_s^{[o_s]}$, using s parallel hyperplanes, where $H_j = \{x \in V \mid f(x) = o_j \text{ and } t_j \le w \cdot x < t_{j+1}\}$. We assume that $t_0 = -\infty$ and $t_{s+1} = +\infty$. Each hyperplane, denoted by $H_j$ ($1 \le j \le s$), has an equation of the form

$H_j : w \cdot x = t_j.$    (25.4)
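A minimal sketch of how such a neuron evaluates its output under the convention $t_0 = -\infty$ and $t_{s+1} = +\infty$ (illustrative Python, not code from the chapter):

def nks_perceptron(w, t, o):
    # Returns f with f(x) = o_j for the unique j such that t_j <= w . x < t_{j+1};
    # the threshold vector t must be sorted increasingly.
    def f(x):
        z = sum(wi * xi for wi, xi in zip(w, x))
        j = sum(1 for tj in t if z >= tj)   # number of thresholds at or below w . x
        return o[j]
    return f

# Example: a (2, 5, 2)-perceptron over K = {0, ..., 4} with output vector (4, 0, 2):
f = nks_perceptron(w=(1.0, 1.0), t=(2.5, 5.5), o=(4, 0, 2))
print(f((0, 1)), f((2, 2)), f((4, 4)))   # 4, 0, 2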
Multilinear separability (s-separability) extends the concept of linear separability (the 1-separability of the common binary 1-threshold perceptron) to the (n, k, s)-perceptron. Linear separability in the two-valued case tells us that a (n, 2, 1)-perceptron can only learn from a space V ⊆ [0, 1]^n in which there is a single hyperplane that separates it into two disjoint halfspaces: $H_0^{[0]} = \{x \mid f(x) = 0\}$ and $H_1^{[1]} = \{x \mid f(x) = 1\}$. From the (n, 2, 1)-perceptron convergence theorem [13], concepts that are linearly nonseparable cannot be learned by a (n, 2, 1)-perceptron; one example of a linearly nonseparable two-valued logic function is the n-input parity function. Likewise, the (n, k, s)-perceptron convergence theorems [10,15] state that a (n, k, s)-perceptron computes a given function f ∈ P_k^n if and only if f is s-separable. Figure 25.1 shows an example of a 2-separable 4-valued logic function of P_5^2.
Let V ⊂ R^n with |V| = v ≥ 2. A (n, v, s)-partition is a partition of V by s ≤ v − 1 parallel hyperplanes (namely, (n − 1)-planes) that do not pass through any of the v points. For instance, Figure 25.2 shows an example of a (2, 5², 2)-partition.

A (n, v, s)-partition determines s + 1 distinct classes $S_0, \ldots, S_s \subset V$, separated by s parallel (n − 1)-planes, such that $\bigcup_{i=0}^{s} S_i = V$ and $\bigcap_{i=0}^{s} S_i = \emptyset$. A (n, v, s)-partition corresponds to an s-separable k-valued function f ∈ P_k^n if and only if all points in the set S_i, for 0 ≤ i ≤ s, have the same value taken out of K. Also, we assume that any two neighboring classes have distinct values. If for a given (n, v, s)-partition we have S_i ≠ ∅ (0 ≤ i ≤ s), then, clearly, the number of associated functions is k(k − 1)^s. In Section 25.6, we consider only partitions where S_i ≠ ∅ for 0 ≤ i ≤ s.
A linear partition of a point set V is a (n, v, 1)-partition, so only a single (n − 1)-plane is required to separate an n-dimensional space V ⊆ R^n into two halfspaces. The enumeration problem for linear partitions is closely related to the efficiency-measurement problem for linear discriminant functions in pattern recognition [22] and to many other algorithmic problems [28].
FIGURE 25.1 A 2-separable function of P_5^2 on the (5, 5)-grid: reading rows from y = 4 down to y = 0 (columns x = 0, . . . , 4), the function values are 2 2 2 2 2; 2 2 2 0 0; 0 0 0 0 0; 0 0 4 4 4; 4 4 4 4 4. Two parallel lines separate the classes with values 2, 0, and 4.
logic. It is therefore of practical as well as theoretical interest to estimate the number of functions that can
be modeled as multiple-threshold functions for a given number of inputs and threshold levels.
For given s ≥ 2, the capacity of a (n, k, s)-perceptron with domain V (i.e., the number of k-valued
functions over V that can be simulated by the (n, k, s)-perceptron) is approximated by the product of the
number of (n, v, s)-partitions and the number of functions associated with each (n, v, s)-partition of V .
Thus counting the number of (n, v, s)-partitions of V is a first step toward calculating the capacity of
(n, k, s)-perceptrons. We emphasize that the s partitioning (n − 1)-planes of a (n, v, s)-partition do not
pass through any point of V , and therefore we do not obtain the exact number of (n, k, s)-perceptron
computable functions.
Let $L_{n,v,s}$ be the number of (n, v, s)-partitions of V ⊂ R^n, and denote by $|F_{k,s}^n|$ the capacity of a (n, k, s)-perceptron. The capacity of (n, 2, 1)-perceptrons with domain V [29] is well known and is given by

$|F_{2,1}^n| = 2\sum_{i=0}^{n}\binom{v-1}{i} = \begin{cases} 2^v & \text{if } n \ge v-1,\\ 2^v - 2\sum_{i=n+1}^{v-1}\binom{v-1}{i} & \text{otherwise.}\end{cases}$    (25.5)
Reference [17] estimated lower and upper bounds for the capacity of (n, 2, s)-perceptrons, using two
essentially different enumeration techniques. The paper demonstrated that the exact number of multiple-
threshold functions depends strongly on the relative topology of the input set. The results corrected a
previously published estimate [18] and indicated that adding threshold levels enhances the capacity more
than adding variables.
In order to answer the question, we first describe well-known relationships between linear partitions
and minimal pairs in Section 25.4. Based on these relationships, we obtain in Section 25.5 an exact and
general formula for the capacity of (3, k, 1)-perceptrons. In Section 25.6 we derive results on the capacity
of (n, k, s)-perceptrons.
FIGURE 25.4 Separating line P corresponds to minimal pairs (A, B) and (C, D).
Reference 35 derived a formula for the number of linear partitions of a given point set V in two-dimensional and three-dimensional spaces, depending on the configuration formed by the points of V; the author considered the case where some points of V may coincide.
$|F_{k,1}^3| = k(k-1)\,L_{3,v,1} + k,$    (25.7)
where the coefficient k(k − 1) is the number of functions associated with each linear partition of V , and
the last term k is the number of functions associated with the trivial linear partition {V , ∅} of V (recall
that (n, v, s)-partitions with empty classes are not included in our definition). Determining L3,v,1 is the
subject of the next sections.
For V ⊂ R 3 in general position, the capacity of (3, k, 1)-perceptrons follows directly from Equations
(25.6) and (25.7). That is,
Corollary 25.1

$|F_{k,1}^3| = k(k-1)\sum_{i=1}^{3}\binom{v-1}{i} + k.$
In this section, however, V is not necessarily in general position and may even contain points that
coincide, that is, V can be a multi-set. Thus the formula we obtain will be the most general result for
(3, k, 1)-perceptrons.
Formula (25.6) follows directly from Lemma 25.2, since every pair of points in V is a minimal pair. From Lemma 25.2 and Formula (25.6), the following statement can be easily deduced.
Theorem 25.3 Let p1 , . . . , pd be all lines determined by a planar point-set V (each line contains at least two
points of V ) and let ci,1 denote the number of points of V belonging to the line pi , 1 ≤ i ≤ d. Then
$L_{V,2} = 1 + \binom{v-1}{1} + \binom{v-1}{2} - \sum_{i=1}^{d}\binom{c_{i,1}-1}{2}.$
Proof. If the line $p_i$ contains $c_{i,1}$ points from V, then they determine $\binom{c_{i,1}}{2}$ pairs, among which exactly $c_{i,1} - 1$ are minimal pairs; so the number of pairs that are not minimal is $\binom{c_{i,1}}{2} - (c_{i,1}-1) = \binom{c_{i,1}-1}{2}$. Then the number of nonminimal pairs of points in V is $\sum_{i=1}^{d}\binom{c_{i,1}-1}{2}$. Now, the statement follows from the fact that the total number of pairs in V is $\binom{v-1}{1} + \binom{v-1}{2}$.
Now, we consider a generalization in R^1, when some points coincide (multiplicity of points); that is, there are points in V with multiple occurrences. By $A_i^j$ we denote a point $A_i \in V$ with multiplicity j, that is, j coinciding points.

Theorem 25.4 Let $V = \{A_1^{c_{1,0}}, \ldots, A_d^{c_{d,0}}\}$, where $c_{1,0} + \cdots + c_{d,0} = v$. Then

$L_{V,1} = 1 + \binom{v-1}{1} - \sum_{i=1}^{d}\binom{c_{i,0}-1}{1}.$

Proof. Obvious.
Let $V \subset R^2$ such that $|V| = v$ and $V = \{A_1^{c_{1,0}^V}, \ldots, A_{d_0^V}^{c_{d_0^V,0}^V}\}$, where $c_{i,0}^V$ is the multiplicity of the point $A_i$ in V (for $1 \le i \le d_0^V$), $c_{1,0}^V + \cdots + c_{d_0^V,0}^V = v$, and $d_0^V$ is the number of distinct points in V. Let

$p_1, \ldots, p_{d_1^V}$    (25.9)

be all different lines determined by the points of V (each line contains at least two noncoincident points of V, and $d_1^V$ is the number of all such lines in V). Denote by $c_{i,1}^V$ the number of points of V belonging to the line $p_i$ (with the corresponding multiplicities), for $1 \le i \le d_1^V$. Let $r_{i,0}^V$ be the number of different lines from (25.9) through the point $A_i^{c_{i,0}^V}$, for $1 \le i \le d_0^V$. If we denote by $L_{V,2}$ the number of linear partitions of the set V, then the following statement can be proved.
Theorem 25.5

$L_{V,2} = 1 + \binom{v-1}{1} + \binom{v-1}{2} - \sum_{i=1}^{d_0^V}\binom{c_{i,0}^V-1}{1} - \sum_{i=1}^{d_1^V}\binom{c_{i,1}^V-1}{2} + \sum_{i=1}^{d_0^V}(r_{i,0}^V-1)\binom{c_{i,0}^V-1}{2}.$
Proof. The proof is by induction on v. Consider a set of points V such that |V| = v + 1. Let A be a single point (with multiplicity one) of V that is a vertex of the convex hull of V. By the induction hypothesis, for the set U = V − {A} we have

$L_{U,2} = L_{V-\{A\},2} = 1 + \binom{v-1}{1} + \binom{v-1}{2} - \sum_{i=1}^{d_0^U}\binom{c_{i,0}^U-1}{1} - \sum_{i=1}^{d_1^U}\binom{c_{i,1}^U-1}{2} + \sum_{i=1}^{d_0^U}(r_{i,0}^U-1)\binom{c_{i,0}^U-1}{2}.$    (25.10)
Let us determine the number of movable line partitions of U. Project U from the point A onto a line a that separates A from U, and let Ū be the projection of U. The number of movable line partitions of U is

$L_{\bar U,1} = 1 + \binom{v-1}{1} - \sum_{i=1}^{d_0^{\bar U}}\binom{c_{i,0}^{\bar U}-1}{1}.$    (25.11)
$L_{V,2} = L_{U,2} + L_{\bar U,1} = 1 + \binom{v}{1} + \binom{v}{2} - \sum_{i=1}^{d_0^V}\binom{c_{i,0}^V-1}{1} - \sum_{i=1}^{d_1^V}\binom{c_{i,1}^V-1}{2} + \sum_{i=1}^{d_0^V}(r_{i,0}^V-1)\binom{c_{i,0}^V-1}{2}.$    (25.12)
In the case where none of the vertices of the convex hull of V is a single point, we shall see what happens when the multiplicity of one point A ∈ U (|U| = v) increases by one, producing the set V. Denote, for the sake of simplicity, the sums on the right-hand side of (25.10) by $\Sigma_0^U$, $\Sigma_1^U$, $\Sigma_2^U$, respectively, and those of (25.12) by $\Sigma_0^V$, $\Sigma_1^V$, $\Sigma_2^V$. Since
$1 + \binom{v}{1} + \binom{v}{2} - \left[1 + \binom{v-1}{1} + \binom{v-1}{2}\right] = v,$

it suffices to show that

$(\Sigma_0^V + \Sigma_1^V - \Sigma_2^V) - (\Sigma_0^U + \Sigma_1^U - \Sigma_2^U) = v.$    (25.13)
Suppose the point $A_i^{c_{i,0}^U+1}$ is in V instead of $A_i^{c_{i,0}^U}$ in U. Then

$\Sigma_0^V - \Sigma_0^U = 1.$    (25.14)

Taking into account that $r_{i,0}^V = r_{i,0}^U$ and $c_{i,0}^V = c_{i,0}^U + 1$, we have

$\Sigma_2^V - \Sigma_2^U = (r_{i,0}^U - 1)\left[\binom{c_{i,0}^U}{2} - \binom{c_{i,0}^U-1}{2}\right],$

that is,

$\Sigma_2^V - \Sigma_2^U = (r_{i,0}^U - 1)(c_{i,0}^U - 1).$    (25.15)
Let $\{p_1, p_2, \ldots, p_{r_{i,0}^U}\}$ be the set of lines from Equation (25.9) through the point A. Then

$\Sigma_1^V - \Sigma_1^U = \sum_{i=1}^{r_{i,0}^U}\left[\binom{c_{i,1}^V-1}{2} - \binom{c_{i,1}^U-1}{2}\right] = \sum_{i=1}^{r_{i,0}^U}(c_{i,1}^U - 1) = \sum_{i=1}^{r_{i,0}^U} c_{i,1}^U - r_{i,0}^U$
$= v - c_{i,0}^U + r_{i,0}^U c_{i,0}^U - r_{i,0}^U = v + r_{i,0}^U(c_{i,0}^U - 1) - c_{i,0}^U = v + (r_{i,0}^U - 1)(c_{i,0}^U - 1) - 1.$    (25.16)
Theorem 25.6

$L_{V,3} = 1 + \binom{v-1}{1} + \binom{v-1}{2} + \binom{v-1}{3} - \sum_{i=1}^{d_1^V}\binom{c_{i,1}^V-1}{2} - \sum_{i=1}^{d_2^V}\binom{c_{i,2}^V-1}{3} + \sum_{i=1}^{d_1^V}(r_{i,1}^V-1)\binom{c_{i,1}^V-1}{3}.$    (25.17)
Proof. We use induction (with trivial basis) on |V |. Suppose that the statement is valid for |V | = v.
Consider |V | = v + 1.
Take a point A of V . For the sake of simplicity, we may assume (without any loss of generality) that A
is a vertex of the convex hull of V . Denote V − {A} by U . Consider the projection π of U onto a plane
α separating A from U , point A being the center of projection. Those linear partitions of U that can be
established by using planes through the point A are said to be movable with respect to A.
The number of additional linear partitions that are obtained after extension of the set U to V (by adding the point A) is equal to the number of movable (w.r.t. A) linear partitions of U. This last number is equal to $L_{Y,2}$, that is, to the number of linear partitions of the planar point set Y = π(U). The corresponding bijection is established by the projection π. Namely, each movable linear partition (w.r.t. A) may be represented by a plane H through A; the line h = π(H) corresponds to a linear partition of Y = π(U) in α. Conversely, given a linear partition of Y with the corresponding line h, the plane through h and A determines the associated movable linear partition of V. It follows that

$L_{V,3} = L_{U,3} + L_{Y,2}.$    (25.18)
Now Equation (25.17) can be deduced from Equation (25.18) using the induction hypothesis and Theorem 25.5. Namely, by the induction hypothesis,

$L_{U,3} = 1 + \binom{v-1}{1} + \binom{v-1}{2} + \binom{v-1}{3} - \sum_{i=1}^{d_1^U}\binom{c_{i,1}^U-1}{2} - \sum_{i=1}^{d_2^U}\binom{c_{i,2}^U-1}{3} + \sum_{i=1}^{d_1^U}(r_{i,1}^U-1)\binom{c_{i,1}^U-1}{3},$    (25.19)

and, by Theorem 25.5,

$L_{Y,2} = 1 + \binom{v-1}{1} + \binom{v-1}{2} - \sum_{i=1}^{d_0^Y}\binom{c_{i,0}^Y-1}{1} - \sum_{i=1}^{d_1^Y}\binom{c_{i,1}^Y-1}{2} + \sum_{i=1}^{d_0^Y}(r_{i,0}^Y-1)\binom{c_{i,0}^Y-1}{2}.$    (25.20)
and

$\sum_{i=1}^{d_1^U}(r_{i,1}^U-1)\binom{c_{i,1}^U-1}{3} + \sum_{i=1}^{d_0^Y}(r_{i,0}^Y-1)\binom{c_{i,0}^Y-1}{3} = \sum_{i=1}^{d_1^V}(r_{i,1}^V-1)\binom{c_{i,1}^V-1}{3}.$    (25.23)
The coordinates of the mth point in K^3 (m = 1, . . . , k^3) are obtained as x = ((m − 1) div k) div k, y = ((m − 1) div k) mod k, and z = (m − 1) mod k. Let V = K^3. Next we describe the counting algorithm.
Step 1: Generate all $d_2^V$ planes determined by three noncolinear points of V and compute $B = \sum_{i=1}^{d_2^V}\binom{c_{i,2}^V-1}{3}$ as follows.

1. Initialize B and $d_2^V$ to 0.
2. Generate a candidate plane P as a triple $m_1 < m_2 < m_3$ out of $k^3$. There are $\binom{k^3}{3}$ candidate planes, among which only $d_2^V$ planes are valid.
3. Accept the candidate plane P if $m_1$ and $m_2$ are minimal points (i.e., they are the two smallest values) on it and $m_3$ is the minimal point among those noncolinear with $m_1$ and $m_2$. That is, P is valid if and only if for each point m in P with $m \notin \{m_1, m_2, m_3\}$ we have $m > m_2$ and, if m is noncolinear with $m_1$ and $m_2$, then also $m > m_3$.
4. If P is valid, then
   (a) $d_2^V \leftarrow d_2^V + 1$.
   (b) $c_{d_2^V,2}^V \leftarrow$ number of such points m (described above) + 3.
   (c) $B \leftarrow B + \binom{c_{d_2^V,2}^V-1}{3}$.
5. Repeat from step 2 until no more planes can be generated.

Step 2: Generate all $d_1^V$ lines determined by two points of V and compute the sums $A = \sum_{i=1}^{d_1^V}\binom{c_{i,1}^V-1}{2}$ and $C = \sum_{i=1}^{d_1^V}(r_{i,1}^V-1)\binom{c_{i,1}^V-1}{3}$ as follows.

1. Initialize A, C, and $d_1^V$ to 0.
2. Generate a candidate line L as a pair $m_1 < m_2$ out of $k^3$. There are $\binom{k^3}{2}$ candidate lines, among which only $d_1^V$ lines are valid.
3. Accept the candidate line L if $m_1$ and $m_2$ are minimal points on it. That is, L is valid if and only if for each point m in L with $m \notin \{m_1, m_2\}$ we have $m > m_2$.
4. If L is valid, then
   (a) $d_1^V \leftarrow d_1^V + 1$.
   (b) $c_{d_1^V,1}^V \leftarrow$ number of such points m (described above) + 2.
   (c) $A \leftarrow A + \binom{c_{d_1^V,1}^V-1}{2}$.
   (d) $r_{d_1^V,1}^V \leftarrow$ number of planes (with a third point from V) that pass through L.
   (e) $C \leftarrow C + (r_{d_1^V,1}^V - 1)\binom{c_{d_1^V,1}^V-1}{3}$.
5. Repeat from step 2 until no more lines can be generated.

Step 3: Apply Theorem 25.6 and report the results. That is,

1. $L_{V,3} = v + \binom{v}{3} - A - B + C$.
2. $|F_{k,1}^3| = k(k-1)(L_{V,3} - 1) + k = k(k-1)L_{V,3} - k^2 + 2k$.

Figure 25.5 and Figure 25.6 show, respectively, the codes for Step 1 and Step 2.
Remark 1. There is no need to memorize lines and planes, nor even the points on them; the algorithms work with one plane (resp. line) at a time. Also, to avoid errors or imprecision, the equations of lines and planes should be formed with integer coefficients.
Remark 2. The equation of a plane containing points D, E, F is obtained as follows. Find the normal vector n = ED × EF (cross product), which gives the coefficients a, b, c; then find d from $ax_0 + by_0 + cz_0 + d = 0$, where $(x_0, y_0, z_0)$ is a point on the plane. Also, three points D, E, F are colinear if and only if (D − F) × (E − F) = 0 (cross product).
In Step 1, O(k^9) candidate planes are generated and each generated plane is checked for validity at most k^3 times, which gives a total of O(k^{12}) validity tests. Thus Step 1 has a time complexity polynomial in k. In Step 2, O(k^6) lines are generated, and each generation consists of at most k^3 validity checks and O(k^6) time to compute the number of planes that contain the given line. Therefore Step 2 also has a time complexity of O(k^{12}). Table 25.1 lists the values computed for V = K^3, k = 2, . . . , 8.

TABLE 25.1 Computed values for V = K^3

k   d_1^{K^3}   d_2^{K^3}   A        B            C         L_{K^3,3}    |F_{k,1}^3|
2   28          20          0        12           0         52           104
3   253         491         49       1,552        0         1,351        8,103
4   1,492       7,502       300      24,422       350       17,356       208,264
5   5,485       52,013      1,338    201,260      4,252     119,529      2,390,565
6   17,092      297,464     3,712    1,031,292    25,852    647,424      19,422,696
7   41,905      1,119,791   10,227   4,322,716    119,598   2,453,869    103,062,463
8   95,140      3,900,890   21,948   14,236,066   418,546   8,399,764    470,386,736
$L_{n,v,1} = L_{n,v-1,1} + L_{n-1,v-1,1} = \sum_{i=1}^{n}\binom{v-1}{i}.$    (25.24)
The difference with the original formula (25.6) is that the index i starts with 1 instead of 0. The reason we
start from i = 1 is that we do not include partitions containing empty classes.
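For reference, the closed form in (25.24) can be evaluated directly (an illustrative sketch, not code from the chapter):

from math import comb

def L_nv1(n, v):
    # Number of (n, v, 1)-partitions of v points in general position,
    # excluding partitions with an empty class (the sum starts at i = 1).
    return sum(comb(v - 1, i) for i in range(1, n + 1))

print(L_nv1(2, 4))   # 6: C(3,1) + C(3,2) linear partitions of 4 planar points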
To count the (2, v, s)-partitions, we associate each (2, v, s)-partition with a slope σ ∈ R (or equivalently,
a minimal pair) as follows:
• For each of the s partitioning lines, choose the corresponding minimal pair which has the smaller
slope.
• The associated slope of the given (2, v, s)-partition is the maximum among these smaller slopes.
Lemma 25.7 The number of (2, v, s)-partitions associated with a given slope σ is $\binom{v-2}{s-1}$.

Proof. For a given minimal pair, rotate its slope σ, increasing it a bit, in order to obtain the direction of separation P that corresponds to σ. Sort the points of V along this direction, that is, according to their distance to the line P, and consider the selected minimal pair as one point. Then we can choose the s − 1 additional separating lines (parallel to P) for the v − 2 points in $\binom{v-2}{s-1}$ ways.
Theorem 25.8

$L_{2,v,s} \le \binom{v-2}{s-1}\binom{v}{2}.$
Proof. Since V is in general position, there are $\binom{v}{2}$ slopes. The inequality is explained by the fact that two distinct choices of minimal pairs (or slopes) may have sets of associated (2, v, s)-partitions that intersect each other. Therefore a given partition may be counted many times, depending on the configuration of V. For instance, consider one of the partitions in Figure 25.7. Clearly, the same partition can be obtained either by selecting the upper minimal pair or by selecting the lower minimal pair (indeed, in this example, they both give the same set of associated partitions, even though the points are in general position). We have equality only when s = 1.
Anthony [36] gave an upper bound on $L_{n,v,s}$, refining an upper bound of Olafsson and Abu-Mostafa [17], which is itself a correction of a claimed upper bound of Takiyama [18]. Their result is as follows:

Theorem 25.9 (Anthony [36]) The maximum possible number of ways in which v points in R^n can be partitioned by s parallel hyperplanes is bounded as follows:

$L_{n,v,s} \le 2\sum_{i=0}^{s}\sum_{j=0}^{\lfloor (n-1)/2 \rfloor}\binom{v-1}{i}\,S(v-1,\,n-1-2j),$

where S(a, b) is the Stirling number of the first kind, that is, the coefficient of $x^b$ in $\prod_{j=1}^{a}(1+jx)$.
Paper [36] also gave another upper bound on $L_{n,v,s}$, better than that of Theorem 25.9, as follows:

Theorem 25.10 (Anthony [36]) The maximum possible number of ways in which v points in R^n can be partitioned by s parallel hyperplanes is bounded as follows:

$L_{n,v,s} \le \sum_{i=0}^{n+s-1}\binom{vs-1}{i}.$
Corollary 25.11 The number of n-input k-valued s-separable functions f : V → K is bounded as follows:

$|F_{k,s}^n| \le k(k-1)^s \sum_{i=0}^{n+s-1}\binom{vs-1}{i}.$
Proof. Given a (n, v, s)-partition $S_0, \ldots, S_s$, each class can take one of the k values from K such that any two neighboring classes have different values. So there are $k(k-1)^s$ ways to assign values to a (n, v, s)-partition. Each assignment of values to a (n, v, s)-partition defines a unique k-valued s-separable function. The inequality is explained by the fact that some functions can be obtained from at least two different (n, v, s)-partitions (Figure 25.7 shows an example of such a function).
Corollary 25.13 The number of permutably homogeneous n-input k-valued s-separable functions f : V → K satisfies

$|G_{k,s}^n| \le \frac{k!}{(k-s-1)!}\sum_{i=0}^{n+s-1}\binom{vs-1}{i}.$
Proof. For a (n, v, s)-partition where s ≤ k − 1, the number of ways to map the s + 1 classes $S_0, \ldots, S_s$ to distinct values in K equals the number of (s + 1, k)-permutations, that is, k!/(k − s − 1)!. Each such (s + 1, k)-permutation uniquely determines a permutably homogeneous s-separable function. From Theorem 25.12, each permutably homogeneous s-separable function uniquely determines a (n, v, s)-partition.
Thus the number of minimal pairs corresponds to the number of pairs (x, y) of the (i, j)-grid such that a ⊥ b. Let natural numbers i and j be given so that i ≤ j. The generalized Farey (i, j)-sequence $F_{i,j}$ [34] is the strictly increasing sequence of all fractions of the form b/a, where the integers a and b satisfy a ⊥ b, 0 < b < a ≤ j, b ≤ i. Thus the sequence $F_{4,7}$ is

1/7, 1/6, 1/5, 1/4, 2/7, 1/3, 2/5, 3/7, 1/2, 4/7, 3/5, 2/3, 3/4, 4/5.
The Farey i-sequence $F_i$, for any positive integer i, is the set of irreducible rational numbers b/a with 0 ≤ b ≤ a ≤ i and a ⊥ b, arranged in increasing order [37]. So the sequence $F_4$ is

0/1, 1/4, 1/3, 1/2, 2/3, 3/4, 1/1.
The length of the sequence $F_{i,j}$ (resp. $F_i$) will be denoted by $|F_{i,j}|$ (resp. $|F_i|$). Also, $F_{i,j}^d$ (resp. $F_i^d$) stands for the dth fraction in $F_{i,j}$ (resp. $F_i$), $1 \le d \le |F_{i,j}|$ (resp. $|F_i|$).
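Generating $F_{i,j}$ directly from the definition is straightforward (an illustrative sketch; the name farey_ij is ours):

from fractions import Fraction
from math import gcd

def farey_ij(i, j):
    # All irreducible fractions b/a with a ⊥ b, 0 < b < a <= j and b <= i,
    # in increasing order.
    return sorted(Fraction(b, a)
                  for a in range(2, j + 1)
                  for b in range(1, min(i, a - 1) + 1)
                  if gcd(a, b) == 1)

print(farey_ij(4, 7))   # the 14 fractions of F_{4,7} listed above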
To count the (2, k², s)-partitions, we associate to each (2, k², s)-partition a fraction (b/a) ∈ $F_{k-1}$ as follows:

• For each of the s partitioning lines, choose the one of its two minimal pairs that has the smaller slope.
• The associated slope of the given (2, k², s)-partition is the maximum among these smaller slopes.
Then, to enumerate or generate the (2, k², s)-partitions associated with a given b/a, we will need Lemmas 25.14 and 25.15 below. As in Figure 25.8, rotate a line segment [x, y] whose slope b/a is irreducible, increasing the slope by a small amount, so that we obtain a straight line P with slope m ∉ $F_{k-1}$. P is the direction for separation and corresponds to the line segment [x, y] with slope b/a.
Lemma 25.14 The slope b/a of the line segment [x, y] is greater than or equal to the associated slope of any (2, k², s)-partition in a direction parallel to P (where P is the direction of separation that corresponds to the line segment [x, y]).
Proof. Consider Figure 25.9, where t ∈ $F_{k-1}$ is the associated slope of a (2, k², s)-partition parallel to P and m ∉ $F_{k-1}$ is the slope of P. From our construction of P, we have that m > (b/a) ∈ $F_{k-1}$ but near b/a.
Clearly, from Figure 25.9, t < m. Let d be the rank of b/a in $F_{k-1}$. Since $F_{k-1}^d = (b/a) < m < F_{k-1}^{d+1}$, t < m, and t ∈ $F_{k-1}$, we therefore have t ≤ b/a.
Lemma 25.15 The slope b/a of the line segment [x, y] is equal to the associated slope of a (2, k², s)-partition in a direction parallel to P if and only if at least one of the s separating lines intersects a minimal pair with slope b/a (where P is the direction of separation that corresponds to the line segment [x, y]).
Proof. Let t ∈ $F_{k-1}$ be the associated slope of a (2, k², s)-partition parallel to P; then from Lemma 25.14 we have t ≤ (b/a). (⇒) Clearly, if none of the separating lines intersects any minimal pair of slope b/a, then b/a will not be chosen as the smaller slope for any of the separating lines. Therefore b/a cannot be the associated slope of any (2, k², s)-partition parallel to P, that is, (b/a) ≠ t. (⇐) Since t is the associated slope of a (2, k², s)-partition parallel to P, t is greater than or equal to the slopes of the two minimal pairs associated with each separating line. In particular, we have t ≥ (b/a) for a separating line that intersects a minimal pair of slope b/a. Since t ≤ (b/a) and t ≥ (b/a), therefore t = (b/a).
Reference 34 obtained an exact formula for the number of linear partitions of the (i, j)-grid. Substituting
for the (k, k)-grid we obtain the following corollary.
Corollary 25.16

$L_{2,k^2,1} = 2k(k-1) + 2(k-1)^2 + 4\sum_{a\perp b,\ 0<b<a<k}(k-a)(k-b).$
Proof. If b = 0, then the number of minimal pairs is equal to 2k(k − 1), which is the number of minimal vertical and horizontal segments of the (k, k)-grid (there are k(k − 1) of each); this explains the first term. One can see from Figure 25.10 that there are k − 1 segments in each row and column and that there are k rows and columns. If b > 0 and a ⊥ b, then a = b implies a = b = 1. In that case, the (k, k)-grid obviously contains 2(k − 1)² minimal segments with slope 1/1, which explains the second term. In Figure 25.11, it suffices to count the number of squares of size 1 × 1: there are (k − 1)² such squares, and each square contributes 2 diagonals. Finally, if a > b > 0 and a ⊥ b, then there are exactly 2(k − a)(k − b) horizontal and vertical rectangles of size a × b in the (k, k)-grid. The third term comes from the fact that there are two minimal segments per rectangle (the diagonal segments). In Figure 25.12, the horizontal rectangle in the lower left corner can be translated by one unit length k − b − 1 times in the horizontal direction to reach the lower right corner, and k − a − 1 times in the vertical direction to reach the upper left corner (where k = 5, a = 1, b = 2). This gives (k − a)(k − b) horizontal rectangles, each contributing two minimal pairs. A similar argument applies to the vertical rectangles.
Lemma 25.17

$L_{2,k^2,s} = 4\binom{k^2-1}{s} - 2\binom{k-1}{s} - 2\binom{2k-2}{s} + 4\sum_{a\perp b,\ 0<b<a<k}\left[\binom{k^2-1}{s} - \binom{ak+bk-ab-1}{s}\right].$
Proof. Let a slope b/a be such that a ⊥ b (we consider slopes for directions between 0° and 45°). For each such slope b/a, Lemmas 25.14 and 25.15 give a simple algorithm to construct (i.e., generate) and count the (2, k², s)-partitions associated with b/a. See Figure 25.13 for an illustration of the proof.
• Rotate the slope b/a, increasing it a bit (this gives the direction of an associated (2, k², s)-partition, P).
• Sort the points along this direction.
• Choose s points $x_1, \ldots, x_s$ out of k² − 1 points, in $\binom{k^2-1}{s}$ ways (the selected points are the beginnings of the classes $S_1, \ldots, S_s$, and the point $x_0 = (k-1, 0)$ can always be selected for $S_0$).
• Eliminate all selections of points in which no minimal pair of slope b/a is intersected by a separating line of direction P.
To intersect one of the s separating lines, the lower end of a minimal pair with slope b/a must be selected. From Corollary 25.16 we have:

• If b = 0, then there are k(k − 1) lower ends (the number of minimal horizontal segments of the (k, k)-grid). None of them is selected in $\binom{k^2-1-k(k-1)}{s} = \binom{k-1}{s}$ ways, so the number of ways to select a minimal pair with slope 0/1 to intersect a separating line is $2\left[\binom{k^2-1}{s} - \binom{k-1}{s}\right]$.
• If a = b = 1, then there are (k − 1)² lower ends (the number of minimal segments with slope 45°). None of them is selected in $\binom{k^2-1-(k-1)^2}{s} = \binom{2k-2}{s}$ ways, so the number of ways to select a minimal pair with slope 1/1 to intersect a separating line is $2\left[\binom{k^2-1}{s} - \binom{2k-2}{s}\right]$.
• If a > b > 0, then there are (k − a)(k − b) lower ends (the number of horizontal rectangles of size a × b of the (k, k)-grid). None of them is selected in $\binom{k^2-1-(k-a)(k-b)}{s} = \binom{ak+bk-ab-1}{s}$ ways, so the number of ways to select a minimal pair with slope 0 < (b/a) < 1 to intersect a separating line is

$4\sum_{a\perp b,\ 0<b<a<k}\left[\binom{k^2-1}{s} - \binom{ak+bk-ab-1}{s}\right].$
Taking the total sum of all three cases yields the formula. This completes the proof.
Figure 25.13 illustrates the above proof with a = b = 1, s = 2, k = 3. The vector w gives the direction of separation, which is the dotted line P. The sorted list of points along that direction is 6, 7, 3, 8, 4, 0, 5, 1, 2. To construct (generate) a (2, 9, 2)-partition, we must select three points out of the 9 points in the list (these will be the beginnings of classes $S_0$, $S_1$, and $S_2$, respectively). Point 6 can always be selected as $x_0$ for class $S_0$, and thus we need only select two points ($x_1$ and $x_2$) for the remaining classes. A separating line is then placed between $x_0$ and $x_1$, and another between $x_1$ and $x_2$ (points $x_0$, $x_1$, $x_2$ are assumed sorted along the direction of P). Moreover, if a selected point is a lower end of a minimal pair, we then place a separating line that intersects the minimal pair. For example, the (2, 9, 2)-partition in the figure is generated by the selection $x_0 = 6$, $x_1 = 7$, $x_2 = 0$. Point 0 is the lower end of the minimal pair (0, 4), so we place a separating line between 0 and 4. Among the sorted list of points above, only four points are lower ends of minimal pairs with slope 1/1, that is, points 0, 1, 3, 4. To ensure that at least one minimal pair of slope 1/1 is intersected by a separating line, at least one lower-end point must be selected in the construction of a (2, 9, 2)-partition. To enumerate the (2, 9, 2)-partitions parallel to P, it then suffices to count the selections of points ($x_1$, $x_2$) that contain at least one lower-end point. In Figure 25.13, selections (6, 7, 8), (6, 8, 5), (6, 8, 2), (6, 7, 5), (6, 7, 2), and (6, 5, 2), for instance, are eliminated, since none of them contains a lower end. Also, given selection (6, 7, 8) we can surely have a (2, 9, 2)-partition such as the one in Figure 25.14, even though (6, 7, 8) is an invalid selection; however, such a partition will be generated by the selection (6, 3, 0) and slope 1/2, where 0 is the lower end of the minimal pair (0, 7) (see the dotted separating lines in Figure 25.14).
Theorem 25.18

$L_{2,k^2,s} = 4\binom{k^2-1}{s}|F_{k-1,k-1}| + 4\binom{k^2-1}{s} - 2\binom{k-1}{s} - 2\binom{2k-2}{s} - 4\sum_{a\perp b,\ 0<b<a<k}\binom{ak+bk-ab-1}{s}.$
Proof. Follows from Lemma 25.17 and the fact that the last sum ranges over the $|F_{k-1,k-1}|$ slopes b/a.
FIGURE 25.13 and FIGURE 25.14 The (3, 3)-grid with its points numbered 0 to 8 (0, 3, 6 on the bottom row and 2, 5, 8 on the top row), the direction vector w, and the two separating lines L1 and L2 discussed in the text.
Corollary 25.20

$L_{2,k^2,s} > 4\binom{k^2-1}{s}(|F_{k-1,k-1}|+1) - 2\binom{k-1}{s} - 2\binom{2k-2}{s} - 4\binom{2k^2}{s}|F_{k-1,k-1}|.$
Proof. Follows from Lemma 25.19 and Theorem 25.18, because the last sum becomes larger when 2k² is substituted for ak + bk − ab − 1.
Corollary 25.21

$L_{2,k^2,s} < 4\binom{k^2-1}{s}(|F_{k-1,k-1}|+1) - 2\binom{k-1}{s} - 2\binom{2k-2}{s}.$
Proof. Follows from Theorem 25.18, because the last (subtracted) sum is dropped.
The asymptotic formula for the length of the generalized Farey (i, j)-sequence (for i ≤ j) is given in Reference 38 as $|F_{i,j}| = 3i^2j^2/\pi^2 + O(i^2 j \log j) + O(i j^2 \log\log j)$. Hence, substituting for the (k, k)-grid, we obtain $|F_{k-1,k-1}| = 3k^4/\pi^2 + O(k^3 \log k) + O(k^3 \log\log k)$.
Theorem 25.22

$L_{2,k^2,s} \approx \left[4\binom{k^2-1}{s} - 2\binom{2k^2}{s}\right]\frac{3k^4}{\pi^2} + 4\binom{k^2-1}{s} - 2\binom{k-1}{s} - 2\binom{2k-2}{s}.$
Proof. We take the mean of the lower and upper bounds and replace |Fk−1,k−1 | by its formula given
above.
Corollary 25.23 The number of 2-input k-valued s-separable logic functions is $|F_{k,s}^2| < k(k-1)^s L_{2,k^2,s}$.

Corollary 25.24 The number of permutably homogeneous 2-input k-valued s-separable logic functions is $|G_{k,s}^2| \approx \frac{k!}{(k-s-1)!} L_{2,k^2,s}$.
b/a. Since there are k² elements in the (k, k)-grid, it takes O(sk²) ≤ O(k⁴) time to calculate $L_{2,k^2,s}$ (recall that s ≤ k² − 1). Clearly, the time complexity is polynomial in k.
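Concretely, Theorem 25.18 can be evaluated as follows (an illustrative sketch, not the chapter's program; note that Python's math.comb returns 0 whenever the lower index exceeds the upper, which is the convention needed here):

from math import comb, gcd

def farey_len(k):
    # |F_{k-1,k-1}|: the number of irreducible b/a with 0 < b < a < k.
    return sum(1 for a in range(2, k) for b in range(1, a) if gcd(a, b) == 1)

def L_2k2s(k, s):
    # Theorem 25.18 evaluated directly.
    total = (4 * comb(k * k - 1, s) * (farey_len(k) + 1)
             - 2 * comb(k - 1, s) - 2 * comb(2 * k - 2, s))
    total -= 4 * sum(comb(a * k + b * k - a * b - 1, s)
                     for a in range(2, k)
                     for b in range(1, a) if gcd(a, b) == 1)
    return total

print(L_2k2s(2, 1))   # 6: the six linear partitions of the (2, 2)-grid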
The answer to this question gives (bounds on) the capacity of neural networks constructed by partitioning algorithms; such neural network architectures include neural trees and neural decision lists. This question also motivates the investigation of the VC-dimension and PAC-learnability of such structures. The combinatorial arguments used to derive results on the number of linear (or multilinear) partitions and the capacity of (n, k, s)-perceptrons may possibly be extended to the general case of n-dimensional sets, for instance, to enumerate partitions of higher-dimensional grids.
References
[1] S.C. Chan, L.S. Hsu, and H.H. Teh, On neural logic networks. Neural Networks, 1 (Suppl. I) 428,
1988.
[2] V. Milutinović, A. Ngom, and I. Stojmenović, Strip — a strip-based neural network growth
algorithm for learning multiple-valued functions. IEEE Transactions on Neural Networks, 12,
212–227, 2001.
[3] A. Ngom, Synthesis of Multiple-Valued Logic Functions by Neural Networks, Ph.D. thesis,
Computer Science Department, University of Ottawa, Ottawa, Ontario, Canada, 1998.
[4] Z. Obradović, Computing with nonmonotone multivalued neurons. Multiple-Valued Logic — An
International Journal, 1, 271–284, 1996.
[5] Z. Tang, Q. Cao, and O. Ishizuka, A learning multiple-valued logic networks: Algebra, algorithm,
and application. IEEE Transactions on Computers, 47, 247–251, 1998.
[6] G. Wang and H. Shi, Tmlnn: Triple-valued or multiple-valued logic neural network. IEEE
Transactions on Neural Networks, 9, 1099–1117, 1998.
[7] O. Ishizuka, Multivalued multithreshold networks. In Proceedings of the 6th IEEE International
Symposium on Multiple-Valued Logic, 1976, pp. 44–47.
[8] T. Sasao, On the optimal design of multiple-valued PLAs. IEEE Computer, 38, 582–592, 1989.
[9] D. Haring, Multi-threshold threshold elements. IEEE Transactions on Electronic Computers, 15,
45–65, 1965.
[10] A. Ngom, C. Reischer, D.A. Simovici, and I. Stojmenović, Learning with permutably homogeneous
multiple-valued multiple-threshold perceptrons. Neural Processing Letters, 12, 2000, Proceedings of
the 28th IEEE International Symposium on Multiple-Valued Logic, May 1998, pp. 161–166.
[11] A. Ngom, I. Stojmenović, and R. Tošić, The computing capacity of three-input multiple-valued
one-threshold perceptrons. Neural Processing Letters, 14, 141–155, 2001.
[12] M.H. Abd-El-Barr, S.G. Zaky, and Z.G. Vranesić, Synthesis of multivalued multithreshold
functions for ccd implementation. IEEE Transactions on Computers, 35, 124–133, 1986.
[13] M. Minsky and S. Papert, Perceptrons: An Introduction to Computational Geometry, MIT Press,
Cambridge, MA, 1969, Expanded edition 1988.
[14] Z. Obradović and I. Parberry, Computing with discrete multivalued neurons. Journal of Computer
and System Sciences, 45, 471–492, 1992.
[15] Z. Obradović and I. Parberry, Learning with discrete multivalued neurons. Journal of Computer
and System Sciences, 49, 375–390, 1994.
[16] R. Takiyama, Multiple threshold perceptron. Pattern Recognition, 10, 27–30, 1978.
[17] S. Olafsson and Y.A. Abu-Mostafa, The capacity of multilevel threshold functions. IEEE
Transactions on Pattern Analysis and Machine Intelligence, 10, 277–281, 1988.
[18] R. Takiyama, The separating capacity of a multithreshold threshold element. IEEE Transactions on
Pattern Analysis and Machine Intelligence, 7, 112–116, 1985.
[19] A. Ngom, I. Stojmenović, and J. Žunić, On the number of multilinear partitions and the computing
capacity of multiple-valued multiple-threshold perceptrons. IEEE Transactions on Neural Networks,
14, 469–477, 2003, Proceedings of the 29th IEEE International Symposium on Multiple-Valued Logic,
IEEE Computer Society Technical Committee on Multiple-Valued Logic, May 1999, IEEE Computer
Society, pp. 208–213.
[20] M. Anthony and P.L. Bartlett, Neural Network Learning: Theoretical Foundations, Cambridge
University Press, Cambridge, 1999.
[21] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines, Cambridge University
Press, Cambridge, 2000.
[22] R.O. Duda and P.E. Hart, Pattern Classification and Scene Analysis, John Wiley & Sons, New York,
1973.
[23] V.N. Vapnik, Statistical Learning Theory, John Wiley & Sons, New York, 1998.
[24] M. Anthony, Classification by polynomial surfaces. Discrete Applied Mathematics, 61, 91–103,
1995.
[25] T.M. Cover, The number of linearly inducible orderings of points in d-space. SIAM Journal of
Applied Mathematics, 15, 434–439, 1967.
[26] V. Bohossian and J. Bruck, Multiple threshold neural logic. In Advances in Neural Information
Processing (M. Jordan, M. Kearns, and S. Colla, Eds.), Vol. 10, NIPS’1997, MIT Press, Cambridge,
MA, 1998.
[27] M. Anthony, Analysis of data with threshold decision lists. Center for Discrete and
Applied Mathematics Research Report, CDAM-LSE-2002-12, London School of Economics,
December 2002.
[28] H. Edelsbrunner, Algorithms in Combinatorial Geometry, Springer-Verlag, Heidelberg, 1987.
[29] K.Y. Siu, V. Roychowdhury, and T. Kailath, Discrete Neural Computation: A Theoretical Foundation,
Information and System Sciences Series (Thomas Kailath, Series Ed.), Prentice Hall, Upper Saddle
River, New Jersey, 1995.
[30] J.T. Tou and R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, Reading, MA, 1974.
[31] T.M. Cover, Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers, 14, 326–334,
1965.
[32] N.J. Nilsson, Learning Machines: Foundations of Trainable Pattern Classifying Systems, McGraw-Hill,
New York, 1968.
[33] J. Koplowitz, M. Lindenbaum, and A. Bruckstein, The number of digital straight lines on an n × n
grid. IEEE Transactions on Information Theory, 36, 192–197, 1990.
[34] D. Acketa and J. Žunić, On the number of linear partitions of the (m, n)-grid. Information
Processing Letters, 38, 163–168, 1991.
[35] R. Tošić, On the number of linear partitions. Review of Research, Mathematical Series, 22, 141–149,
1992.
[36] M. Anthony, Partitioning points by parallel hyperplanes. Discrete Mathematics, to appear.
[37] G.H. Hardy and E.M. Wright, An Introduction to the Theory of Numbers, 5th ed., Clarendon Press,
Oxford, England, 1979.
[38] J. Žunić, On the asymptotic number of linear grid square partitions. Bild und Ton, 1991.
26.1 Introduction
The interest in conducting research on Artificial Neural Networks (ANNs) resides in the appealing properties that ANNs exhibit: adaptability, learning capability, and the ability to generalize. Nowadays, ANNs are receiving great attention from the international research community, with a large number of studies concerning training, structure design, and real-world applications, ranging from classification to robot control or vision [1].
The neural network training task is a central process in supervised learning, in which a pattern set made up of pairs of inputs plus expected outputs is known beforehand. This set of patterns is used to compute the set of weights that makes the ANN learn it. To achieve this goal, the algorithm must modify the weights of the neural network in order to get the desired output for a given input, usually in an iterative manner, until a minimum error between the actual and the expected output is attained.
One of the most popular training algorithms in the domain of neural networks is the Backpropagation (BP) technique (the generalized delta rule) [2], a gradient-descent method. Other techniques, such as Evolutionary Algorithms (EAs), have also been applied to the training problem in recent years [3,4],
trying to avoid the local minima that arise in such a complex problem. Although training is a main issue in ANN design, many other works address the evolution of the layered structure of the ANN or even the elementary behavior of the neurons composing the ANN. For example, in Reference 5 the definition of neurons and layers and the associated training problem are tackled by using parallel genetic algorithms (GAs); also, in Reference 6 the architecture of the networks and the weights are evolved by using the EPNet evolutionary system. An exhaustive review of this topic is really difficult to perform nowadays; however, the work of Yao [7] represents an excellent starting point to get acquainted with the research in training ANNs.
The motivation of the present chapter is manifold. First, we want to present results in a standard way that promotes and facilitates future comparisons. This sounds like common sense, but it is not frequent that authors follow standard rules for comparisons, such as Prechelt's structured set of recommendations [8], a de facto standard for many ANN researchers. A second contribution is to include in our study not only well-known EAs and the BP algorithm, but also the Levenberg–Marquardt (LM) approach [9] and two additional hybrids. The potential advantages of using LM merit a detailed study. We have selected a benchmark from the field of medicine, composed of three classification problems: diagnosis of breast cancer, diagnosis of diabetes in Pima Indians, and diagnosis of heart disease.
The remainder of the chapter is organized as follows. Section 26.2 introduces the ANN computation
model. In Section 26.3, we give a brief description of the algorithms under analysis. In Section 26.4, we
discuss the mechanisms used for representing solutions and evaluating their quality, a methodological
step needed in the application of EAs. The details of the experiments and the analysis of results are shown
in Section 26.5. Finally, we summarize our conclusions and future work in Section 26.6.
FIGURE 26.1 An artificial neuron: inputs A1, . . . , AN, weighted by W1, . . . , WN, and a bias θ are combined by a summation (sum-of-product) function into x, which an activation function f(x) maps to the output y.

FIGURE 26.2 A layered feedforward network: an input pattern enters the input layer, and connection weights link the input layer to a hidden layer of neurons, and the hidden layer to the output layer, which produces the output.
The recurrent model defines networks in which feedback connections are allowed, thus inducing complex
dynamical properties in the ANN. In this chapter we concentrate on the first and simpler model, the
feedforward networks. To be precise, we consider the so-called multilayer perceptron (MLP) [11], in which
units are structured into ordered layers, and connections are allowed only between adjacent layers in an
input-to-output sense (see Figure 26.2).
For any MLP, several parameters such as the number of layers and the number of units per layer must
be defined. Then, the last step in the design is to adjust the weights of the network, so that it produces
the desired output when its corresponding input is presented. This process is known as training the ANN
or learning the network weights. Network weights comprise both the previously mentioned connection
weights as well as bias terms for each unit. The latter can be viewed as the weight of a constant saturated
input that the corresponding unit always receives. As initially stated, we will focus on the learning situation
known as supervised training, in which a set of current-input/desired-output patterns is available. Thus,
the ANN has to be trained to produce the desired output according to these examples. The input and
output of the network are both real vectors in our case.
In order to perform supervised training we need a way of evaluating the ANN output error with
respect to the expected output. A popular measure is the Squared Error Percentage (SEP), which
measures the proximity of the actual output to the desired output. We can compute this error term
for one single pattern or for a set of patterns; in the latter case, the SEP is the average of the individual
SEP values. The expression for this global SEP is:
\mathrm{SEP} = 100 \cdot \frac{o_{\max} - o_{\min}}{P \cdot S} \sum_{p=1}^{P} \sum_{i=1}^{S} \left( t_i^p - o_i^p \right)^2, \qquad (26.1)
where t_i^p and o_i^p are, respectively, the ith components of the expected vector and the actual output vector
for the pattern p; omin and omax are the minimum and maximum values of the output neurons, S is the
number of output neurons, and P is the number of patterns. This SEP value is closely related to the Mean
Squared Error (MSE), whose expression is:
\mathrm{MSE} = \frac{\sum_{p=1}^{P} \sum_{i=1}^{S} \left( t_i^p - o_i^p \right)^2}{P \cdot S}. \qquad (26.2)
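As an illustration of the two error measures, the following sketch computes SEP and MSE for a batch of patterns; the function name and the NumPy representation are ours, not part of the original formulation.

import numpy as np

def sep_and_mse(targets, outputs, o_min=0.0, o_max=1.0):
    """Squared Error Percentage (26.1) and Mean Squared Error (26.2).

    targets, outputs: arrays of shape (P, S), one row per pattern,
    one column per output neuron; o_min and o_max are the minimum
    and maximum values of the output neurons.
    """
    P, S = targets.shape
    sq = np.sum((targets - outputs) ** 2)   # double sum over patterns and units
    mse = sq / (P * S)
    sep = 100.0 * (o_max - o_min) * sq / (P * S)
    return sep, mse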
In classification problems we can use an additional measure: the Classification Error Percentage
(CEP). CEP is the percentage of incorrectly classified patterns; it is a usual complement to either of the
two crude error values (SEP or MSE), since CEP reports the quality of the trained ANN in a high-level
manner.
The BP algorithm minimizes the total squared error of the network over all patterns and output units:

E = \sum_{p=1}^{P} \sum_{i=1}^{S} \left( t_i^p - o_i^p \right)^2. \qquad (26.3)
The actual value of the previous expression depends on the weights of the network. The basic BP
algorithm calculates the gradient of E and updates the weights by moving them in the gradient-descent
direction. This can be summarized with the expression:
w_{ij}(t+1) = w_{ij}(t) - \eta \frac{\partial E}{\partial w_{ij}}, \qquad (26.4)
where the parameter η > 0 is the learning rate that controls the learning speed. A more general BP
algorithm adds to the previous expression a momentum term in order to increase the stability of the
search process. Then, the final expression for the BP algorithm is:
w_{ij}(t+1) = w_{ij}(t) + \alpha \Delta w_{ij}(t) - \eta \frac{\partial E}{\partial w_{ij}}, \qquad (26.5)
where \Delta w_{ij}(t) is the change in the weight w_{ij} at step t, and α is the momentum constant, whose value
must satisfy 0 ≤ α < 1. With this term, the algorithm accelerates the minimization of the error in
smooth zones of the error function, where the gradient direction barely changes. The pseudo-code of the BP algorithm
is shown in Figure 26.3.
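A minimal sketch of the update of Equation (26.5) follows (names are ours; the gradient is assumed to be computed elsewhere, e.g., by the generalized delta rule):

def bp_update(w, grad_E, delta_w_prev, eta=0.1, alpha=0.9):
    """One BP step with momentum, Equation (26.5).

    w, grad_E, delta_w_prev: NumPy arrays of the same shape.
    Returns the new weights and the weight change, which becomes
    the momentum term of the next step.
    """
    delta_w = alpha * delta_w_prev - eta * grad_E
    return w + delta_w, delta_w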
The LM algorithm [9] computes the weight update Δw as

\Delta w = -\left( \sum_{p=1}^{P} J^p(w)^{\mathrm{T}} J^p(w) + \mu I \right)^{-1} \sum_{p=1}^{P} J^p(w)^{\mathrm{T}} e^p(w), \qquad (26.6)

where J^p(w) is the Jacobian matrix of the vector e^p(w) evaluated at w, and I is the identity matrix. The
vector e^p(w) is the error of the network for pattern p, that is, e^p(w) = t^p − o^p(w). The parameter μ is
increased or decreased at each step: if the error is reduced, μ is divided by a factor β; otherwise, it is multiplied
by β. LM performs the steps included in Figure 26.4. It calculates the network output, the
error vectors, and the Jacobian matrix for each pattern. Then, it computes Δw using Equation (26.6) and
recalculates the error with w + Δw as network weights. If the error has decreased, μ is divided by β, the
new weights are kept, and the process starts again; otherwise, μ is multiplied by β, Δw is recalculated
with the new μ value, and the step is tried again.
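A sketch of one LM trial step implementing this logic is given below (with the stacked Jacobian and error vector supplied by the caller; names are ours):

import numpy as np

def lm_step(w, error_fn, jacobian_fn, mu, beta=10.0):
    """One trial step of Levenberg-Marquardt (the logic of Figure 26.4).

    error_fn(w)    -> stacked error vector e(w) over all patterns
    jacobian_fn(w) -> stacked Jacobian J(w) of e(w)
    Returns the (possibly unchanged) weights and the updated mu.
    """
    e = error_fn(w)
    J = jacobian_fn(w)
    A = J.T @ J + mu * np.eye(w.size)
    delta_w = -np.linalg.solve(A, J.T @ e)
    if np.sum(error_fn(w + delta_w) ** 2) < np.sum(e ** 2):
        return w + delta_w, mu / beta   # success: keep weights, decrease mu
    return w, mu * beta                 # failure: discard step, increase mu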
In the evolutionary approach, each individual encodes the weights and biases of the ANN, and the
network error function is the function to be minimized. Some general
considerations must be taken into account when using an evolutionary approach for ANN training. The
search features of EAs contrast with those of BP and LM in that they are not trajectory driven, but
population driven. An EA is expected to avoid local optima more frequently by promoting exploration of the
search space, in contrast to the exploitative behavior usually attributed to local search algorithms such
as BP or LM. In this chapter, we use four EAs: a canonical GA, the CHC method, the Hy3 algorithm,
and an ES.
26.3.3.2 CHC
The CHC acronym stands for "Cross generational elitist selection, Heterogeneous recombination, and
Cataclysmic mutation" [13]. CHC is a noncanonical GA that combines a conservative replacement strategy
(elitist replacement) with a highly disruptive recombination operator (HUX). In Figure 26.6 we present the pseudo-code
of the CHC method. The main differences between a canonical GA and CHC are described in the
following paragraphs.
First, in CHC, the bias in favor of the best structures occurs in the replacement stage (survival
selection) rather than in the selection for reproduction. More precisely, during the selection for reproduction,
each member of the current population is copied to the parent set; that is, the candidates for reproduction
are identical to the current population, except that the order of the structures has been shuffled. During the
replacement, the offspring and the old population are merged and ranked according to their fitness values,
and the new population is created by selecting the best μ members of the merged population (where μ is
the population size). This mechanism always preserves the best individuals found so far.
Second, a bias is introduced against mating individuals that are similar (incest prevention). The
initial threshold (the minimum Hamming distance between parents that allows mating) is often set to 1/4
of the chromosome length. If no offspring is inserted into the new population during a generation,
this threshold is reduced by 1.
Third, the recombination operator in CHC is a variant of uniform crossover (HUX), a highly disruptive
form of crossover. This operator crosses over exactly half of the bits that differ between the two parent
strings. HUX guarantees that the children are always at the maximum Hamming distance from their parents.
Finally, a restart process reintroduces diversity whenever convergence is detected (i.e., the difference
threshold of the incest prevention bias has dropped to zero). The population is restarted by using the best
individual found so far as a template for creating the new population.
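The following sketch (ours; bit strings as Python lists) illustrates two of the mechanisms just described, incest prevention and HUX, applied to a pair of parents:

import random

def hux(p1, p2, threshold):
    """HUX crossover with incest prevention (CHC).

    p1, p2: lists of bits. Returns two children, or None when the
    parents' Hamming distance does not exceed the incest-prevention
    threshold (i.e., the parents are too similar to mate).
    """
    diff = [i for i in range(len(p1)) if p1[i] != p2[i]]
    if len(diff) <= threshold:                    # incest prevention
        return None
    swap = random.sample(diff, len(diff) // 2)    # exactly half the differing bits
    c1, c2 = p1[:], p2[:]
    for i in swap:
        c1[i], c2[i] = p2[i], p1[i]
    return c1, c2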
[Figure: the Hy3 structure, with subpopulations e1–e4 arranged in an exploration plane and subpopulations E1–E4 in an exploitation plane.]
The resulting structure is a parallel multiresolution method using several crossover operators,
which makes it possible to achieve simultaneously a diversified search (reliability) and an effective local tuning
(accuracy).
In the ES, each individual comprises the vector of object variables (the network weights), a vector σ of standard deviations, and a vector ω of rotation angles; mutation is performed according to the following equations:

\sigma_i' = \sigma_i \cdot \exp\left( \tau \cdot N(0,1) + \eta \cdot N_i(0,1) \right),
\omega_j' = \omega_j + \varphi \cdot N_j(0,1),
\mathbf{x}' = \mathbf{x} + \mathbf{N}\left( 0, C(\sigma', \omega') \right),

where C(σ', ω') is the covariance matrix associated with σ' and ω', N(0,1) is the standard univariate
normal distribution, and N(0, C) is the multivariate normal distribution with mean 0 and covariance
matrix C. The subindex i in the standard normal distribution indicates that a new random number is
generated for each component of the vector, whereas the notation N(0,1) without a subindex indicates that the same
random number is used for all the components. The parameters τ, η, and ϕ are set to (2n)^{−1/2}, (4n)^{−1/4},
and 5π/180, respectively, as suggested in Reference 16.
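The sketch below mirrors a simplified, uncorrelated variant of this mutation, with no angle vector (the ES configuration used later in this chapter also omits the angle vector); names and the NumPy representation are ours.

import numpy as np

def es_mutate(x, sigma, n):
    """Self-adaptive ES mutation without correlated steps.

    x: object variables (the network weights); sigma: step sizes;
    n: problem dimension. Returns the mutated pair (x', sigma').
    """
    tau = (2.0 * n) ** -0.5               # learning rate for the shared draw
    eta = (4.0 * n) ** -0.25              # learning rate for per-component draws
    common = np.random.normal()           # same N(0,1) draw for all components
    new_sigma = sigma * np.exp(tau * common + eta * np.random.normal(size=n))
    new_x = x + new_sigma * np.random.normal(size=n)
    return new_x, new_sigma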
For the recombination, many alternatives may be used; Fogel summarizes some of them in Reference 17.
The three vectors of the individual are recombined independently; that is, a different recombination
scheme can be chosen for each vector.
When arranging the network variables in the genotype, it is advisable to group together the input weights and bias of each unit. This way, the probability of transmitting them
as a block is increased. Obviously, recombination is not used in many EAs, so this consideration does not
apply to all situations.
In BP, LM, ES, and Hy3, each variable is encoded using a machine-dependent representation for real
numbers. In the remaining algorithms, solutions are encoded as binary strings. More precisely, m bits
are used to represent each single variable; the k m-bit segments are then concatenated into an
l-bit binary string, where l = k · m. This encoding of the network variables raises a number of issues, two
of them being the choice of m and the encoding mechanism for individual variables (pure binary,
Gray-coded numbers, magnitude-sign, etc.). In this work we use a 16-bit pure binary encoding. The integer
value of each variable is mapped linearly into an interval, and the result is the weight value.
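For concreteness, one possible decoding of a 16-bit pure binary gene into a weight by the linear mapping just described is sketched below; the interval and helper name are illustrative.

def decode_gene(bits, lo=-1.0, hi=1.0):
    """Map an m-bit pure binary gene linearly into [lo, hi]."""
    m = len(bits)
    value = int("".join(str(b) for b in bits), 2)   # integer value of the gene
    return lo + (hi - lo) * value / (2 ** m - 1)

# Example: a 16-bit gene of all zeros decodes to lo, all ones to hi.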
Now, we discuss the alternatives for the fitness function used to evaluate the quality of solutions
in the EAs. The objective of the network is to classify all the patterns correctly, that is, to obtain a CEP of 0%.
We could use the CEP as the function to minimize. However, of two networks with the same CEP on the
pattern set, the output of one can be nearer to the desired output than that of the other. For this reason,
the SEP can guide the search better. In our presentation, the fitness function to be minimized is the SEP
over the training pattern set. Conversely, a maximization approach could take the inverse of the SEP
as fitness.
• Cancer: Diagnosis of breast cancer. Classify a tumor as either benign or malignant based on cell
descriptions gathered by microscopic examination. There are 699 examples that were obtained by
Dr. William H. Wolberg at the University of Wisconsin Hospitals, Madison [20–23].
• Diabetes: Diagnose diabetes in Pima Indians. Based on personal data and the results of medical
examinations, decide whether a Pima Indian individual is diabetes positive or not. There are 768
examples from the National Institute of Diabetes and Digestive and Kidney Diseases by Vincent
Sigillito [24].
• Heart: Predict heart disease. Decide whether at least one of four major vessels is reduced in diameter
by >50%. This decision is made based on personal data and results of medical examinations.
There are 920 examples from four different sources: Hungarian Institute of Cardiology in Budapest
(Andras Janosi, M.D.), University Hospital of Zurich in Switzerland (William Steinbrunn, M.D.),
University Hospital of Basel in Switzerland (Matthias Pfisterer, M.D.), V.A. Medical Center of
Long Beach and Cleveland Clinic Foundation (Robert Detrano, M.D., Ph.D.) [25,26].
The MLP used for all the instances has three layers (input, hidden, and output), with six neurons
in the hidden layer. The number of neurons in the input and output layers depends on the concrete
instance. The activation function of the neurons is the sigmoid function. Table 26.1
summarizes the network architecture for each instance.
To evaluate an ANN, we split the pattern set into two subsets: a training set and a test set. The
ANN is trained by the different algorithms using the training pattern set, and it is then evaluated on
the unseen test pattern set. The training set for each instance consists approximately of the first 75% of
the examples, while the last 25% constitutes the test set. The exact number of patterns for each instance is
presented in Table 26.1 to ease future comparisons.
After presenting the problems, we now describe the parameters of the algorithms. The parameters
for BP, LM, and the hybrid algorithms are shown in Table 26.2. The hybrid algorithms use the same
parameters as their elementary components; however, the mutation operator of the GA is not applied but
is replaced by BP or LM, respectively. BP and LM are applied with an associated probability pt, just
like any other classical genetic operator. When applied, BP/LM performs one single epoch.
The parameters of the rest of the algorithms are shown in Tables 26.3 and 26.4. The ES uses neither
the vector of angles nor recombination, thus making the distribution of variables inside the vector irrelevant.
The ES and Hy3 algorithms employ a real representation for the variables of the network, but the GA,
CHC, and the hybrid algorithms use binary vectors. Each variable is represented by 16 bits encoding a real
value in the interval [−1, +1]. The weights (variables) associated with the input links of a neuron are placed
contiguously in the chromosome. The GA uses a steady-state strategy.

[Table 26.1 (network architecture and number of training/test patterns for the BC, DI, and HE instances), Tables 26.3 and 26.4 (parameters of the GA, CHC, Hy3, and ES), and a figure plotting the CEP (%) obtained by BP, LM, GA, CHC, Hy3, ES, GABP, and GALM on the Cancer, Diabetes, and Heart instances appear here. Abbreviations used in the tables: LR = Linear Ranking; SUS = Stochastic Universal Sampling; FCBX = Fuzzy Connective-Based Crossover; SPX = Single Point Crossover; HUX = Highly Disruptive Uniform Crossover; GM = Gaussian Mutation; NU = Non-Uniform Mutation.]
There are many interesting works related to neural network training that also solve the instances tackled
here. Unfortunately, some of their results are not comparable with ours because they use a different
definition of the training and test sets; this is why we consider it a capital issue to adhere to a standard
way of evaluation such as the one proposed by Prechelt [8]. Still, we did find some works suitable for
meaningful comparisons.
For the Cancer instance the best results are obtained by BP, ES, GABP, and GALM. They are followed,
at a lower accuracy, by LM and, finally, by the GAs. This is not a surprising fact, since the GAs perform
a rather explorative search on this kind of problem. For this instance, the best mean CEP reported
in Reference 27 is 1.1%, a lower accuracy compared with the 0.02% obtained by our
GALM hybrid, to our knowledge the best solution so far. In Reference 28, a CEP close to 2% for this
instance is achieved; note that our GALM is one hundred times more accurate.
The mentioned work uses 524 patterns for the training set and the rest for the test set, that is, almost
exactly our configuration with only one pattern changed (a minor detail), and therefore the results can be
compared. The same holds for the work of Yao and Liu [6], whose EPNet algorithm produces neural
networks reaching a CEP of 1.4%.
In Diabetes, BP and ES again show the highest accuracy, followed by LM and GALM. This time, GA,
CHC, and GABP obtain similar CEP values, always below that of Hy3. For this instance, a CEP of 30.11% is
reported in Reference 29 (outperformed by many of our algorithms, such as BP, LM, ES, and GALM) with the
same network architecture as here. In Reference 6 the authors report a CEP of 22.4% for this problem,
clearly outperformed by our BP and ES techniques.
As to the Heart problem, we found approximately the same behavior as for the two previous instances.
However, Hy3 reaches a lower CEP than GABP, and GALM gets the lowest CEP. In Reference 29 the
authors reported a CEP of 45.71% for the Heart instance using the same architecture. All our algorithms
outperform this CEP measure (except Hy3 and GABP).
We observed that BP and ES performed slightly more accurately than LM on all the instances. This is
an unexpected result, since LM is quite accurate in many applications. We conclude that the three selected
instances represent problems that do not really need such a complex, in-depth local search. In the near
future we plan to consider larger and more complex instances to further test LM. With respect to the
hybrid algorithms, the results confirm this hypothesis: GALM is more accurate than GABP.
In summary, we have found some of the most accurate results reported for the three instances, but there is
still a need to address further instances and to improve the accuracy on the ones tackled here. One lesson
learned is to always keep in mind the importance of reporting results in a standard way for meaningful
future comparisons.
26.6 Conclusions
In this chapter we have tackled the neural network training problem with eight algorithms: two well-
known problem-specific algorithms (BP and LM), four general metaheuristics (GA, CHC,
the Hy3 algorithm, and ES), and two hybrid algorithms combining the GA with the problem-specific techniques.
To compare the algorithms we solved three classification problems from the domain of Medicine: the
diagnosis of breast cancer, the diagnosis of diabetes in Pima Indians, and the diagnosis of heart
disease.
Our results show that the problem-specific algorithms (BP and LM) and the ES achieve a lower classification
error than the other, more general search procedures. It is surprising that the ES, also a general search
procedure, gets a lower CEP than LM, a sophisticated problem-specific algorithm.
With respect to the hybrids, the GALM algorithm outperforms the classification error of the
problem-specific algorithms in two of the three instances. This makes GALM a promising algorithm for neural network
training. Moreover, many of the classification errors obtained in this work are below those found
in the literature, which represents a contribution as a reference work for these medical problems. As future
work we plan to add new algorithms to the analysis and to apply them to more instances, especially in the
broader domain of Bioinformatics.
Acknowledgments
This work has been partially funded by the Ministry of Science and Technology (MCYT) and Regional
Development European Fund (FEDER) under contract TIC2002-04498-C05-02 (the TRACER project,
http://tracer.lcc.uma.es).
References
[1] J.T. Alander. Indexed Bibliography of Genetic Algorithms and Neural Networks. Technical
report 94-1-1NN, University of Vaasa, Department of Information Technology and Production
Economics, 1994.
[2] D. Rumelhart, G. Hinton, and R. Williams. Learning representations by back-propagating errors.
Nature, 323: 533–536, 1986.
[3] E. Cantú-Paz. Pruning neural networks with distributions estimation algorithms.
In Erick Cantú-Paz et al., Eds., Proceedings of GECCO 2003, Vol. 2733 of Lecture Notes in Computer
Science. Springer-Verlag, 2003, pp. 790–800.
[4] C. Cotta, E. Alba, R. Sagarna, and P. Larrañaga. Adjusting weights in artificial neural networks
using evolutionary algorithms. In P. Larrañaga and J.A. Lozano, Eds., Estimation of Distribu-
tion Algorithms. A New Tool for Evolutionary Computation. Kluwer Academic Publishers, 2001,
pp. 357–373.
[5] E. Alba, J.F. Aldana, and J.M. Troya. Full automatic ANN design: A genetic approach. In J. Mira,
J. Cabestany, and A. Prieto, Eds., New Trends in Neural Computation. Springer-Verlag, 1993,
pp. 399–404.
[6] X. Yao and Y. Liu. A new evolutionary system evolving artificial neural networks. IEEE Transactions
on Neural Networks, 8: 694–713, 1997.
[7] X. Yao. Evolving artificial neural networks. Proceedings of the IEEE, 87: 1423–1447, 1999.
[8] L. Prechelt. Proben1 — A Set of Neural Network Benchmark Problems and Benchmarking Rules.
Technical report 21, Fakultät für Informatik Universität Karlsruhe, 76128 Karlsruhe, Germany,
September, 1994.
[9] M.T. Hagan and M.B. Menhaj. Training feedforward networks with the Marquardt algorithm.
IEEE Transactions on Neural Networks, 5: 989–993, 1994.
[10] J.L. McClelland and D.E. Rumelhart. Parallel Distributed Processing: Explorations in the Microstruc-
ture of Cognition. MIT Press, Cambridge, MA, 1986.
[11] F. Rosenblatt. Principles of Neurodynamics. Spartan Books, New York, 1962.
[12] T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary
Programming, Genetic Algorithms. Oxford University Press, New York, 1996.
[13] L.J. Eshelman. The CHC adaptive search algorithm: How to have safe search when engaging in
nontraditional genetic recombination. In Foundations of Genetic Algorithms. Morgan Kaufmann,
San Mateo, CA 1991, pp. 265–283.
[14] E. Alba, F. Luna, A.J. Nebro, and J.M. Troya. Parallel heterogeneous genetic algorithms for
continuous optimization. Parallel Computing, 2004.
[15] I. Rechenberg. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen
Evolution. Fromman-Holzboog Verlag, Stuttgart, 1973.
[16] G. Rudolph. Evolutionary Computation 1. Basic Algorithms and Operators, Vol. 1. IOP Publishing
Ltd, 2000, chap. 9, pp. 81–88.
[17] D.B. Fogel. Evolutionary Computation 1. Basic Algorithms and Operators, Vol. 1. IOP Publishing
Ltd, 2000, chap. 33.2, pp. 270–274.
[18] C. Cotta and J.M. Troya. On decision-making in strong hybrid evolutionary algorithms. In Tasks
and Methods in Applied Artificial Intelligence, Vol. 1415 of Lecture Notes in Artificial Intelligence,
1998, pp. 418–427.
[19] L. Davis, Ed. Handbook of Genetic Algorithms. Van Nostrand Reinhold, New York, 1991.
[20] K.P. Bennett and O.L. Mangasarian. Robust linear programming discrimination of two linearly
inseparable sets. Optimization Methods and Software, 1: 23–34, 1992.
[21] O.L. Mangasarian, R. Setiono, and W.H. Wolberg. Pattern recognition via linear programming:
Theory and application to medical diagnosis. In Thomas F. Coleman and Yuying Li, Eds., Large-
Scale Numerical Optimization. SIAM Publications, Philadelphia, PA, 1990, pp. 22–31.
[22] W.H. Wolberg. Cancer diagnosis via linear programming. SIAM News, 23: 1–18, 1990.
[23] W.H. Wolberg and O.L. Mangasarian. Multisurface method of pattern separation for medical
diagnosis applied to breast cytology. In Proceedings of the National Academy of Sciences, Vol. 87,
U.S.A., December 1990, pp. 9193–9196.
[24] J.W. Smith, J.E. Everhart, W.C. Dickson, W.C. Knowler, and R.S. Johannes. Using the ADAP learn-
ing algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Twelfth Symposium
on Computer Application in Medical Care. IEEE Computer Society Press, 1988, pp. 261–265.
[25] R. Detrano, A. Janosi, W. Steinbrunn, M. Pfisterer, J. Schmid, S. Sandhu, K. Guppy, S. Lee, and
V. Froelicher. International application of a new probability algorithm for the diagnosis of coronary
artery disease. American Journal of Cardiology, 64: 304–310, 1989.
[26] J.H. Gennari, P. Langley, and D. Fisher. Models of incremental concept formation. Artificial
Intelligence, 40: 11–61, 1989.
[27] T. Ragg, S. Gutjahr, and H. Sa. Automatic determination of optimal network topologies based on
information theory and evolution. In Proceedings of the 23rd EUROMICRO Conference, Budapest,
Hungary, September 1997.
[28] W.H. Land and L.E. Albertelli. Breast cancer screening using evolved neural networks. In IEEE
International Conference on Systems, Man, and Cybernetics, Vol. 2, IEEE Computer Society
Press, October 1998, pp. 1619–1624.
[29] W. Erhard, T. Fink, M.M. Gutzmann, C. Rahn, A. Doering, and M. Galicki. The improvement
and comparison of different algorithms for optimizing neural networks on the MasPar MP-2.
In M. Heiss, Ed., Neural Computation — NC’98. ICSC Academic Press, 1998, pp. 617–623.
27.1 Introduction
The dissemination of information systems in organizations and the technological progress in
computational power, communications technology, and storage capability have the side effect of producing
repositories with huge amounts of data. Databases — as these repositories are called — have grown in
number and size and it quickly became apparent that all the stored information constituted a valuable
resource, especially if reliable methods could be found to recover useful information — knowledge —
from the raw data in storage. A completely new interdisciplinary field has grown around this general goal
and is now known as Knowledge Discovery in Databases (KDD). As a broadly accepted definition, we
can say that KDD is a complex process that aims to extract implicit, previously unknown, and potentially
useful information from data, in a nontrivial way.
The central step in the KDD process is usually called data mining and consists of the actual search for
interesting regularities or patterns in the data. This step is preceded by a preprocessing stage, where data is
prepared for the application of a data mining algorithm. After the data mining algorithm is executed,
a postprocessing stage occurs, where the algorithm’s results can be refined and simplified.
From the data mining viewpoint, the essential task is to build computer programs capable of searching
through the data for nuggets of knowledge. These nuggets are usually represented as data patterns
that are expected to have some beneficial characteristics, namely: validity on new data with a high degree
of certainty, novelty, potential usefulness measured by some utility function, and "comprehensibility to
humans," allowing them a better understanding of the underlying data.
Data used in this task is usually divided into instances described by a set of attributes. The most common
problem in data mining is trying to predict the value of a user-defined attribute based on the values of
some other attributes; this is called a classification problem. When we want to predict the values of several
attributes (instead of just one goal attribute), the problem is an association one. In some problems
we do not really have an attribute that defines the class of a given instance, but we would still like to know
whether it is possible to group the data into different groups or classes. This grouping (and class discovery)
must then be done by the algorithm, and this becomes a clustering problem. Useful overviews of the field
can be found in References 1 and 2.
Several research areas provide the methods used in data mining, the most important being statistics
and Artificial Intelligence (AI). Other areas include pattern recognition, databases, data visualization,
etc. Within the AI area, the main contributions came from the machine learning community and represent
the three main AI paradigms: symbolic, connectionist, and evolutionary. The last two of these bring us
back to the title — and content — of this chapter, since both have a strong biological inspiration. A useful
introduction to machine learning approaches to data mining is presented in Reference 3.
Connectionist approaches in data mining are usually synonymous with neural networks. This is one of the
most widely known (if not widely understood) techniques for data mining. It provides a very flexible way
of fitting a model to observed data; this model can then be used to classify new data or to make some kind
of prediction on it. The major drawback of neural networks is that the resulting models cannot easily
fulfill the "comprehensibility to humans" requirement, since their friendliest representation is a network
of weighted connections between nodes where some nonlinear operation occurs.
In this chapter, we focus mainly on bio-inspired approaches to data mining that are not neural network
related. We start by giving a brief overview of these approaches, especially the evolutionary ones. We also
describe some novel approaches based on new computational models, inspired by biological
mechanisms but not completely accommodated by the connectionist or evolutionary paradigms.
Following this overview, we introduce a new approach to the data mining task of finding classification
rules, based on the Particle Swarm Optimizer (PSO) algorithm.
The particle swarm algorithm was originally presented in Reference 4 as a population-based
function optimizer in the n-dimensional space of real numbers. Bird flock flight simulations initially
inspired the algorithm, and biological inspiration is still present in the current denomination. The swarm
or flock metaphor applies to the PSO in the way particles fly in a somewhat coordinated way through an
n-dimensional space (the space of parameters of the function being optimized) in search of some desired
place (the function optimum).
While considered a form of evolutionary algorithm, the PSO makes no use of genetic operators
such as mutation or recombination, as is the case in other evolutionary paradigms such as evolutionary
programming, evolution strategies, or genetic algorithms. Explicit selection is also absent. Instead,
in each iteration, the position of every particle in the search space is updated according to the particle's
velocity. The velocity of a particle in a given iteration is a function of its velocity in the previous iteration,
its best previous position in the search space, and the best previous position in the search space of all of the
particle's neighbors. The behavior of a particle in the swarm is the result of balancing the desire to fly
toward the best point in the search space according to its own experience against conforming to the swarm's
knowledge of where the current best point is. For an extensive discussion of the cultural model behind
the PSO, as well as of the PSO itself and several variants, see Reference 5.
Our method uses a specialized particle swarm algorithm to build rules for classification tasks. We call the
resulting algorithm a Particle Swarm Data Miner (PSDM). We present several variants of the PSDM,
namely a discrete version, two real-valued versions using different stop criteria, and an adaptive version.
This last version is based on the Simple Adaptive Predator–Prey Optimizer (SAPPO) algorithm [6,7],
initially developed to improve performance on real-valued optimization problems. SAPPO introduces
a predator–prey mechanism in the basic swarm algorithm in order to maintain diversity during the
search. It also includes a swarm of symbiotic particles, designed to allow the adaptation of the algorithm's
parameters to the problem being tackled.
These algorithms are compared among themselves and against two industry-standard classification
algorithms (J48 and PRISM) on a set of benchmark problems. The results are then used to compare and
discuss the different characteristics of each variant.
Section 27.2 presents the overview of bio-inspired data mining methodologies. In Section 27.3 we
describe the higher-level classification algorithm, which can transparently make use of any of the lower-
level PSO-based algorithms to search for classification rules. The first three variants of the PSO algorithm
are discussed in Section 27.4. The SAPPO data miner is presented in Section 27.5. The experimental
setup used to test the algorithms is outlined in Section 27.6, where we also report and discuss the results
obtained. Finally, we draw some conclusions in Section 27.7.
In evolutionary algorithms, new solutions are created by randomly changing parts of a solution (mutation) or by recombining existing solutions into new ones
(recombination).
These basic principles have given origin to many flavors of evolutionary algorithms, of which the
most widely known (and used) are genetic algorithms [8], evolutionary programming [9], evolution
strategies [9], and genetic programming (GP) [10]. In spite of their differences, all these algorithms are
built around what is usually called the generic evolutionary algorithm loop:
P(0) = Generate_initial_population()
Evaluate_population(P(0))
t = 0
Loop
    P'(t) = Select_individuals_to_reproduce(P(t))
    P''(t) = Create_offspring_using_recombination_and_mutation(P'(t))
    Evaluate_population(P''(t))
    P(t+1) = Merge_old_population_with_offspring(P(t), P''(t))
    t = t + 1
Until Termination_criterion
Of the algorithms given, genetic algorithms and GP are the most widely used for data mining tasks.
In Sections 27.2.1.1 and 27.2.1.2 we present, in a generic way, how these algorithms can be used for
data mining, more specifically their application to rule discovery. An extended survey on the use of
evolutionary algorithms, not only for data mining but also for other tasks in the KDD process, can be
found in Reference 11.
It should be pointed out that not only the accuracy of a rule (or rule set) is important in its evaluation; other
aspects can also be taken into account, for example, its simplicity.
Artificial immune systems create abstract models of immune organs, cells, and molecules; a set of affinity functions, which quantify
the interactions of these elements; and a set of algorithms that rule the dynamic behavior of the artificial
immune system.
Several mechanisms of the immune system have been used as inspiration for specific algorithms
[23,24]. Clonal selection, a mechanism by which immune cells that have a higher affinity to a given
pathogen are allowed to reproduce more, and with lower mutation rates, than cells with lower affinity,
has been used in optimization, the "affinity measure" being given by the function to optimize. There is
no special reason why this mechanism cannot be used to search for binary-represented if–then rules, with
affinity being measured by rule fitness.
Another theoretical model of the immune system, the immune network, has been used to develop
learning networks, where nodes correspond to immune cells and connection strengths represent the affinity
between the pathogens recognized by the connected cells. This model has already been used in data analysis,
namely for data clustering [25,26].
IF attribute_a=value_1
AND attribute_b=value_2
AND ... AND attribute_n=value_n
THEN class_x
A rule set is constructed from rules performing different attribute tests and predicting different classes,
plus a default rule to capture and classify the instances not classified by the previous rules:

Rule #1
Rule #2
...
Rule #n
Default Rule
Containing no attribute tests and predicting the same class as the one predominant in the remaining
instances, the default rule takes the form:
IF true
THEN class_x.
As an illustrative example, one of the datasets commonly used as a benchmark describes animals according
to whether they have hair, feathers, or a tail, among other attributes, and classifies them as mammals,
birds, fish, etc. The resulting set of rules could be:
=>RULE #1
IF milk = true
=>RULE #2
IF feathers = true
=>RULE #3
IF fins = true
=>RULE #4
IF airborne = false
=>RULE #5
IF legs=6
=>RULE #6
IF TRUE
A new instance, that is, the characterization of an animal that we need to classify, would be iteratively
tested against each rule until its attributes fully match the attribute tests of one rule. The new instance
would then be classified with that rule's predicted class.
This type of high-level knowledge extracted from databases is greatly valued in areas such as loan granting,
fraud detection, marketing, etc., where huge databases already exist and predictions are most welcome.
Once found, a set of if–then rules is easily comprehended and used by humans, who need no special
training.
The complexity of this problem — finding the rule set and the set of attribute tests for each rule —
lies in the exponentially high number of attribute–value pairs, which generates a vast multidimensional
search space.
In this kind of search problem, evolutionary algorithms such as GAs, and also the PSO, have already proven to be
reliable and efficient due to their parallel, population-based search strategies.
Following a typical architecture of the Michigan approach to classification rule discovery, the overall
structure of our work was designed around three nested procedures, each one fulfilling
a specific task.
[Diagram: the validation procedure encloses the covering procedure, which in turn encloses the classification rule discovery procedure; the validation loop is repeated 10 times.]
The classification rule discovery procedure aims to find and return the rule that best classifies the
predominant class in a given instance set. One rule is better than another if it matches a higher
number of instances of the specified class and segregates instances of a different class. It is at this
level that the PSO-based algorithms are used. The covering procedure, on receiving an instance set (the
training set), invokes the classification rule discovery procedure and reduces this set by removing the instances
correctly classified by the returned rule. This process is
repeated until only a predefined number of instances is left to classify in the training set. A sequential
rule set is thereby created — hence the Michigan approach. The aim of the validation procedure is not
only to determine the accuracy of a rule set returned by the covering procedure, but also to gauge the
reliability of the complete classifying procedure — classification rule discovery and covering procedures
altogether. This is achieved by iteratively dividing the initial dataset into different test and training sets
and computing average indicators, such as accuracy, number of rules per set, and number of attribute tests
per rule.
With this architecture, which clearly assigns tasks to each of the three nested procedures, and by plugging
different search strategies into its core procedure — classification rule discovery — we end up
with an unbiased platform for testing and benchmarking each of these search strategies. A sketch of the
covering loop follows.
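This is a minimal sketch of the covering procedure, with the rule-discovery step and the default-rule construction abstracted as function arguments; all names are ours.

def covering(training_set, discover_rule, make_default_rule, max_uncovered=10):
    """Build a sequential rule set (Michigan approach): repeatedly ask the
    rule-discovery procedure for the best rule, remove the instances it
    classifies correctly, and stop when few instances remain."""
    rule_set = []
    while len(training_set) > max_uncovered:
        rule = discover_rule(training_set)
        rule_set.append(rule)
        training_set = [inst for inst in training_set
                        if not rule.correctly_classifies(inst)]
    # Default rule: no attribute tests, predicts the predominant class
    # among the remaining instances.
    rule_set.append(make_default_rule(training_set))
    return rule_set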
A rule is removed from the rule set when either of the following conditions holds:

• A previous rule in the rule set performs a subset of the rule's attribute tests.
• The rule predicts the same class as the default rule and is located just before it.
Therefore, in the example below, rules 2 and 3 will be removed and the rule set will be reduced to
the first and last rules:

Rule #1
If attribute_a = x_a
Then class=c_1
Rule #2
If attribute_a = x_a And attribute_b = x_b
Then class=c_2
Rule #3
If attribute_c = x_c
Then class=c_3
Default Rule
If TRUE
Then class=c_3.
Several neighborhood definitions have been tried [30]; here, we assume that every particle is a neighbor
of every other particle in the swarm. The general equations for updating the position and velocity of a
particle i are the following:

v_{ij}(t) = \chi \left( w \, v_{ij}(t-1) + \varphi_{1ij}\left( p_{ij} - x_{ij}(t-1) \right) + \varphi_{2ij}\left( p_{gj} - x_{ij}(t-1) \right) \right),
x_{ij}(t) = x_{ij}(t-1) + v_{ij}(t). \qquad (27.1)

In this formula, χ is the constriction coefficient described in Reference 27; ϕ1 and ϕ2 are random numbers
distributed between 0 and an upper limit, different for each dimension of each individual; P_i is the
best position particle i has found in the search space; and g is the index of the best individual in the
neighborhood. The velocity is usually limited in absolute value to a predefined maximum, Vmax. The
parameter w is a linearly decreasing weight. The swarm is usually run for a limited number of iterations
or until an error criterion is met.
From Equation (27.1) we can derive the two most usual ways in which convergence, and as a result
the balance between exploration and exploitation, is controlled. Reference 28 uses χ = 1 and a weight w
decreasing linearly from wmax to wmin during the execution of the algorithm. In Reference 27, convergence
is guaranteed by choosing appropriate values for χ and ϕ = ϕ1 + ϕ2; w is fixed and equal to 1 in this
approach.
Although some actions differ from one variant of PSO to another, the basic pseudo-code is as follows:

Initiate_Swarm()
Loop
    For each particle p in the swarm:
        Evaluate(p)
        Update_past_experience(p)
        Update_neighbourhood_best(p,k)
        Move(p,d)
Until Criterion
The output of this algorithm is the best point visited by the swarm in the hyperspace.
There are several variants of PSO, typically differing in the representation (discrete or continuous
PSO [5]); in the mechanism used to avoid spatial explosion of the swarm and to guarantee convergence
(linearly decreasing weight [28] or constricted PSO [27]); or in the mechanism used to avoid premature
convergence to local optima (predator–prey interactions [6] or collision-avoiding swarms [31]).
Nominal attributes are normalized by assigning to each distinct attribute value an enumerated index and
applying Equation (27.2):

v_{norm} = \frac{idx_v \times t}{\#idx}, \qquad (27.2)

where idx_v is the index of the attribute value v and #idx is the total number of different attribute values.
Both integer and real types are normalized using Equation (27.3):

v_{norm} = \frac{(v - v_{\min}) \times t}{v_{\max} - v_{\min}}, \qquad (27.3)

where v_{min} and v_{max} are the lowest and highest values found for this attribute.
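A direct transcription of Equations (27.2) and (27.3) into code (function names are ours):

def normalize_nominal(idx_v, n_values, t):
    """Equation (27.2): nominal value with index idx_v among n_values."""
    return idx_v * t / n_values

def normalize_numeric(v, v_min, v_max, t):
    """Equation (27.3): integer/real value scaled into the range [0, t]."""
    return (v - v_min) * t / (v_max - v_min)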
Rules/particles are encoded as a floating-point array, and each attribute is represented by either one or
two elements of the array, according to its type. Nominal attributes are assigned one element of the
array, and attribute-matching tests are defined as follows:

m(v_r, v_i) = \begin{cases} \text{true} & \text{if } v_r \ge t \text{ or } \lfloor v_r \times \#idx \rfloor = \lfloor v_i \times \#idx \rfloor,\\ \text{false} & \text{otherwise,} \end{cases} \qquad (27.4)
with t being the indifference threshold value, v_r the attribute value stored in the rule for testing, and v_i
the instance value stored in the normalized image of the dataset. A rule value v_r above the threshold t
means that the attribute test is ignored.
Integer and real attributes are assigned an extra element of the array in order to implement a value
range instead of a single value:

m(v_{r1}, v_{r2}, v_i) = \begin{cases} \text{true} & \text{if } v_{r1} \ge t \text{ or } (v_{r1} - v_{r2}) \le v_i \le (v_{r1} + v_{r2}),\\ \text{false} & \text{otherwise.} \end{cases} \qquad (27.5)
vr1 can be seen as the center and vr2 as a neighborhood radius, inside which matching will occur.
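Following the reconstruction of Equations (27.4) and (27.5) above, the two attribute tests can be sketched as (helper names are ours):

import math

def match_nominal(v_r, v_i, n_values, t):
    """Equation (27.4): nominal attribute test; v_r >= t means the
    attribute is ignored (indifference)."""
    return v_r >= t or math.floor(v_r * n_values) == math.floor(v_i * n_values)

def match_numeric(v_r1, v_r2, v_i, t):
    """Equation (27.5): numeric test with center v_r1 and radius v_r2."""
    return v_r1 >= t or (v_r1 - v_r2) <= v_i <= (v_r1 + v_r2)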
The quality of a rule (the fitness of the particle encoding it) combines the sensitivity and the specificity
of the rule on the training set:

\text{fitness} = \frac{TP}{TP + FN} \cdot \frac{TN}{TN + FP}, \qquad (27.6)

where

• TP (True Positives) = number of instances covered by the rule that are correctly classified, that is,
their class matches the training target class.
• FP (False Positives) = number of instances covered by the rule that are wrongly classified, that is,
their class differs from the training target class.
• TN (True Negatives) = number of instances not covered by the rule whose class differs from the
training target class.
• FN (False Negatives) = number of instances not covered by the rule whose class matches the
training target class.
The fitness formula also penalizes a particle that has moved out of the legal range of values, assigning it
a negative fitness (−1.0) and thus forcing it to return to the search space.
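A sketch of this fitness computation, assuming the sensitivity-times-specificity form of Equation (27.6) given above; the zero-denominator guards are our addition.

def rule_fitness(tp, fp, tn, fn, legal=True):
    """Rule quality per Equation (27.6), with the -1.0 penalty for
    particles that moved out of the legal range of values."""
    if not legal:
        return -1.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity * specificity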
Convergence in the PSO is usually controlled by tuning the algorithm's parameters. An undesirable side
effect of these strategies is that, when the local optimum found is not the global optimum being sought,
particles cannot gain velocity to jump to another optimum in the search space. This phenomenon is
similar to premature convergence in other evolutionary algorithms.
Our motivation for introducing the predator–prey interaction was mainly to provide a mechanism
for creating diversity in the swarm at any moment during the run of the algorithm, by adding velocity to
some particles independently of the level of convergence already achieved. This allows particles to "escape"
even when convergence of the swarm around a local suboptimum has already occurred.
There are other mechanisms used for the same effect in the literature (see Reference 31), but two main
reasons led us to prefer the predator–prey scheme. The first reason is its computational simplicity when
compared with other approaches: as will be seen when the mechanism is explained, it only introduces
one new particle and little computational effort into the basic algorithm. The adjective simple in SAPPO
comes from this fact. A second, less technical, motive was to maintain the swarm intelligence philosophy
behind the algorithm: it seemed more appropriate to introduce a mechanism that could also be
implemented as a distributed behavior in the swarm.
The predator–prey model is based on the disturbance caused by a predator to the group of animals
being hunted. Animals are driven away from their favorite locations, for example, pastures and water
sources, by fear of nearby predators. Eventually, this process results in the finding of even better locations,
where the arriving animals will again be chased by nearby predators.
It is this predator–prey interaction that we try to reproduce in SAPPO. Here, a new particle is introduced
into the swarm to mimic the predator's behavior. This particle, called the predator particle, is attracted by
the best (fittest) particle in the swarm, according to the following equations:
V_p(t) = \varphi_3 \left( X_g(t-1) - X_p(t-1) \right),
X_p(t) = X_p(t-1) + V_p(t). \qquad (27.7)
In Equation (27.7), ϕ3 is another random number distributed between 0 and an upper limit, usually 1,
and Xg is the present position of the best particle in the swarm.
The predator particle can influence any particle in the swarm by changing its velocity in one or
more dimensions. This influence is controlled by a “fear” probability f , which is the probability of
a particle changing its velocity in one of the available dimensions due to the presence of the predator. For
some particle i, if there is no change in the velocity in a dimension j, the update rules in that dimension
still are:
v_{ij}(t) = w \, v_{ij}(t-1) + \varphi_{1ij}\left( p_{ij} - x_{ij}(t-1) \right) + \varphi_{2ij}\left( p_{gj} - x_{ij}(t-1) \right),
x_{ij}(t) = x_{ij}(t-1) + v_{ij}(t). \qquad (27.8)
The only differences from the other approaches are that w is fixed and χ is not explicitly used. However,
if the predator “scares” the prey (particle), that is, if there is a change in velocity in dimension j, the rule
becomes:
v_{ij}(t) = w \, v_{ij}(t-1) + \varphi_{1ij}\left( p_{ij} - x_{ij}(t-1) \right) + \varphi_{2ij}\left( p_{gj} - x_{ij}(t-1) \right) + D(d),
x_{ij}(t) = x_{ij}(t-1) + v_{ij}(t). \qquad (27.9)
This process is repeated for all dimensions, that is, there can be simultaneous changes in velocity in several
dimensions.
The fourth term in the first equation of (27.9) quantifies the repulsive influence of the predator. This
term is a function of the distance between the positions of the predator and the particle: d is the Euclidean
distance between predator and prey, and D(x) is an exponentially decreasing distance function:

D(x) = a \cdot e^{-b \cdot x}. \qquad (27.10)
D(x) makes the influence of the predator grow exponentially with proximity. The objective of its use
is to introduce more perturbation in the swarm when the particles are near the predator, which usually
happens when convergence to a local optimum occurs. The a and b parameters define the form of the
D function: a represents the maximum amplitude of the predator effect over a prey, and b controls the
distance at which the effect is still significant.
During the initial stages of the algorithm, its behavior is similar to that of a traditional PSO, since the
particles are scattered and the predator's influence is negligible. As convergence occurs and particles start
to move toward a local optimum, their distance to the best particle diminishes and the predator effect
increases exponentially, accelerating some particles in new directions. When the current local optimum
is not the global optimum being sought, this will hopefully allow one particle to jump to another nearby
local optimum, becoming the new best particle and leading the swarm to a new exploration phase. Several
repetitions of this process can lead the swarm through a chain of local optima until the global optimum
is found.
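The sketch below combines Equations (27.8) to (27.10) for one prey particle; the per-dimension "fear" test and all names are ours.

import numpy as np

def scared_velocity(v, x, p_i, p_g, x_pred, w, phi_max, fear, a, b):
    """Velocity update with the predator effect (Equations (27.8)-(27.10)).

    For each dimension, with probability `fear` the repulsive term
    D(d) = a * exp(-b * d) is added, where d is the Euclidean
    predator-prey distance.
    """
    d = np.linalg.norm(x - x_pred)
    phi1 = np.random.uniform(0, phi_max, x.shape)
    phi2 = np.random.uniform(0, phi_max, x.shape)
    v_new = w * v + phi1 * (p_i - x) + phi2 * (p_g - x)     # Equation (27.8)
    scared = np.random.random(x.shape) < fear               # disturbed dimensions
    v_new[scared] += a * np.exp(-b * d)                     # Equations (27.9)-(27.10)
    return v_new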
The aim of the adaptation scheme is to find, simultaneously, a solution to the problem and a set of
parameters that increases the performance of the algorithm in the search for that solution. To achieve
this, we must somehow link the two swarms in the algorithm. We modeled this link on the symbiotic
relations so common in nature, where two species live in a close relationship from which both draw some
advantage.
In SAPPO each solution particle lives in symbiosis with a parameter particle, which encodes the
parameters used when the algorithm's update equations are applied to that solution particle. The symbiotic
relation is implemented through the definition of the parameter particle's fitness function.
A parameter particle has a "slower" life cycle than its companion. While a solution particle is
evaluated and has its velocity and position updated every iteration, for a parameter particle this only
happens every i iterations (usually 10). The parameter particle is then evaluated by comparing the current
fitness of its solution particle companion with the fitness it had i iterations earlier. If there was an
improvement, its value is stored as the parameter particle's fitness, unless it is smaller than the improvement
value already stored. When a new improvement value replaces an older one, the P vector is also updated with
the parameter particle's current position in the search space. Velocity and position updates are then
made using Equation (27.11). As a final element, the fitness of a parameter particle decays slowly over
every iteration of the algorithm (it is usually multiplied by a decay factor α = 0.98). This decay
ensures that a parameter particle has to keep producing improvements in the associated solution particle
to maintain a high fitness.
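The bookkeeping of a parameter particle's fitness can be sketched as follows (a simplified reading of the scheme; names and signatures are ours):

def evaluate_parameter_particle(stored_fitness, fitness_before, fitness_now):
    """Every i iterations: record the improvement achieved by the companion
    solution particle (fitness maximized here), keeping the larger of the
    new improvement and the value already stored."""
    return max(stored_fitness, fitness_now - fitness_before)

def decay_parameter_fitness(stored_fitness, alpha=0.98):
    """Every iteration: slow decay, so the parameter particle must keep
    producing improvements to remain fit."""
    return alpha * stored_fitness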
Our approach to adaptation in SAPPO tries to maintain, as we already did with the predator–prey
principle, the biological inspiration of the original PSO algorithm. We also made an effort to develop
a mechanism that could be implemented following the principles and ideas underlying the paradigm of
swarm intelligence. Both mechanisms introduced are based on the interaction of simple particles,
with simple update rules and no centralized control. Both the solution and the behavior of the algorithm
emerge from the interactions between the individuals in this ecology of particles, and if there is intelligence
in the system it is clearly not preprogrammed but emerges as a property of the system of swarms.
Zoo is a simple dataset that acts as a basic test; it is useful for assessing the algorithms' ability to deal with
multiple classes. Breast Cancer is a typical dataset from the field's literature, with only two possible
classes. Wisconsin Breast Cancer presents a similar problem, but with a larger search space and
number of instances. Promoters, whose instances are DNA sequences, presents a qualitatively different
problem, with many more attributes to be taken into account than the previous datasets; this also results
in a much larger search space. Splice is again a DNA classification problem, but with a significantly
larger number of examples available.
The PSO-based data miners were empirically compared with two standard classification algorithms,
namely J48 and PRISM. J48 is a Java implementation of a decision-tree building algorithm (C4.5), and
PRISM is a rule discovering algorithm; both are widely used for classification tasks. J48 was chosen for
this comparison since it is probably the best-known "industry standard" data mining algorithm. PRISM
uses a covering strategy similar to the one used at the higher level of our algorithms, so it is especially
useful for assessing the advantages of using particle swarm algorithms at the rule discovering level. Both
algorithms can produce rule sets as their final classification model, which provides significant advantages
in comparison fairness: not only do the algorithms produce the same type of knowledge, but rule
sets can also be easily compared between themselves in terms of comprehensibility and simplicity.
For each dataset we ran ten tenfold cross-validations for each of the algorithms being compared. This
means that 100 classification models were built for each algorithm/dataset combination. For each model, 90% of
the available instances were effectively used in its construction, while the remaining 10% were used for
testing. The results are presented in Section 27.6.1.
27.6.1 Results
The tables presented show the results obtained for the described datasets. For each experiment with a given
algorithm/dataset combination, we present the average accuracy and variance over the ten tenfold cross-
validations. Averages and standard deviations are also presented for the number of rules and the number of
attribute tests per rule over the same cross-validations. For J48 the number of rules represents the number of
tree leaves, and the attribute/rule ratio is equal to the number of nodes in the tree divided by the number of
leaves. These numbers allow a fair comparison between the complexity of rule-set and tree-based models.
Experiments were done with J48, PRISM, three variants of the basic PSDM with different stop criteria
and, finally, a SAPPO-based data miner. The algorithms with the platform and radius stop criteria were
investigated in order to find out whether temporal complexity — always a major constraint in evolutionary
computation — could be reduced without compromising accuracy. The other PSDM algorithms stopped
(for each rule) when a 2000-iteration limit was reached.
In an individual analysis of the results, for the zoo dataset we can observe that, with the exception of
PRISM, all algorithms performed at similar levels. Of these, the simplest models were built by the
PSO-based data miners, with just around seven rules against an average of 10.98 for J48. In the PSO-based
approaches, each rule tested just one attribute, against an average of 1.67 for the J48 algorithm. This makes the
models found by the PSDM algorithms significantly simpler for this problem than the ones found by J48.
For the Wisconsin breast cancer dataset, the accuracies are again similar, with a slight advantage
for PRISM. Regarding simplicity, we find that the tendency observed for the previous dataset is
maintained: while J48 (the non-PSDM algorithm with the simplest models) uses an average of 37.18 rules, all the
PSDM variants obtained models with a number of rules slightly below or around 10. The number of attributes
per rule is similar.
The results for the breast cancer dataset show that several PSO data miners are around 2% more
accurate than J48, the best standard data miner. The models found are again significantly simpler for the
PSDM algorithms.
The experiments with the promoters dataset show that the best PSDM algorithm presented an
accuracy 3% above that of the most accurate non-PSO approach, J48. The other PSDM algorithms all
obtained similar accuracy. In terms of simplicity, the PSO-based approaches again used a significantly
smaller number of rules and attribute tests.
For the final dataset, splice, the accuracy of J48 is almost 8% above that of the best PSO data miner, which
was the SAPPO-based approach. This difference is reflected in terms of simplicity, with the SAPPO models
requiring a mere 6.92 rules on average, against an average of 105.21 for J48. The next best PSDM approach
obtained an accuracy 5% worse than the SAPPO-based PSDM.
On a more global analysis, there are also some conclusions to be drawn. The most relevant one is
clearly the competitiveness of the PSO-based data miners — especially SAPPO — with the classical, more
established approaches in terms of accuracy. Indeed, in two of the datasets (three if we include zoo,
where the results were very similar, but with all PSDM approaches getting an accuracy around 1% above the
accuracy of J48), the best results were obtained by PSO-based approaches. Only in the splice dataset
was the best PSO miner clearly outperformed by a classical approach.
When discussing the comprehensibility of the results, and assuming simpler models are more comprehensible,
the results are extremely favorable to the PSO-based approaches. The PSO miners always present the
solutions with the smallest number of rules for each of the tested datasets, and the number of attribute tests
per rule is also always at least slightly lower than the values obtained by the classic approaches. It is probably
this bias toward simplicity that accounts for the worse results on the splice dataset, an issue worthy
of further investigation.
Among the PSO-based approaches, all the algorithms performed in a similar way in terms of simplicity.
Regarding accuracy, SAPPO was probably the most robust PSDM, with results near the best PSDM for all
datasets and significantly superior for the last, more complex, problem.
27.7 Conclusions
In this chapter, after briefly discussing other biologically inspired methods for data mining, more specifically
for classification problems, we presented a first approach to classification based on particle swarm
algorithms. We tested several variants of our algorithm, including an adaptive one, which uses symbiotic
particles to adapt its parameters to the problem and a predator–prey mechanism to maintain diversity in
the swarm.
The results obtained so far, for a limited number of datasets, lead us to believe that the PSO-based
approaches are clearly competitive with the classical algorithms used for comparison in terms of accuracy,
and clearly superior in terms of simplicity and, consequently, comprehensibility.
The results with the last dataset seem to indicate that an increase in the number of instances can result in a loss
of accuracy of the PSDM approaches, probably caused by the identified bias toward simplicity, which in
these cases can become detrimental by not allowing more complex models to be found by the algorithm.
We plan to address this issue in future work.
Especially relevant for us is the fact that, for all but one of the datasets, the PSDM approaches outperformed
the PRISM algorithm, which uses a similar covering strategy but a greedy approach to the search
for new rules. The strengths of evolutionary search algorithms are clearly evident in these results.
If a choice were to be made among the PSDM algorithms and temporal complexity were an issue,
the PSDM-radius and PSDM-platform variants are clearly competitive in terms of accuracy — with a
possible slight advantage for the platform version — and obviously take less time to find solutions than
the remaining approaches. If accuracy were the essential choice criterion, these first results seem to indicate
that the SAPPO approach can be more robust over a large set of application domains.
References
[1] U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy. Advances in Knowledge Discovery
and Data Mining. AAAI/MIT Press, Cambridge, MA, 1995.
[2] S. Weiss and N. Indurkhya. Predictive Data Mining: A Practical Guide. Morgan Kaufmann,
San Francisco, CA, 1998.
[3] I.H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java
Implementations. Morgan Kaufmann, San Francisco, CA, 1999.
[4] R.C. Eberhart and Y. Shi. Particle swarm optimization. In Proceedings of the International
Conference on Neural Networks and the Brain. Beijing, China, 1998, pp. PL5–PL13.
[5] J. Kennedy, R.C. Eberhart, and Y. Shi. Swarm Intelligence. Morgan Kaufmann, San Francisco, CA,
2001.
[6] A. Silva, A. Neves, and E. Costa. An empirical comparison of particle swarm and predator prey
optimisation. In Proceedings of Artificial Intelligence and Cognitive Science: 13th Irish International
Conference, AICS 2002. Vol. 2464/2002 of Lecture Notes in Computer Science, September 12–13,
2002, Springer-Verlag, Limerick, Ireland.
[7] A. Silva, A. Neves, and E. Costa. SAPPO: A simple, adaptive, predator prey optimiser. In Proceedings
of the EPIA’03 — 11th Portuguese Conference on Artificial Intelligence, Workshop on Artificial Life
and Evolutionary Algorithms (ALEA), December 4–7, 2003, Beja, Portugal.
[8] D.E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley,
Reading, MA, 1989.
[9] T. Bäck. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary
Programming, Genetic Algorithms. Oxford University Press, Oxford, 1996.
[10] J.R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection.
MIT Press, Cambridge, MA, 1992.
[11] A.A. Freitas. A survey of evolutionary algorithms for data mining and knowledge discovery.
Advances in Evolutionary Computation (A. Ghosh and S. Tsutsui, Eds.), Springer-Verlag, 2002.
[12] A. Giordana and F. Neri. Search-intensive concept induction. Evolutionary Computation, 3: 375–
419, 1995.
[13] D.P. Greene and S.F. Smith. Competition-based induction of decision models from examples.
Machine Learning, 13: 229–257, 1993.
[14] K.A. DeJong, W.M. Spears, and F.D. Gordon. Using genetic algorithms for concept learning.
Machine Learning, 13: 161–188, 1993.
[15] C.Z. Janikow. A knowledge-intensive genetic algorithm for supervised learning. Machine Learning,
13: 189–228, 1993.
[16] M.L. Wong and K.S. Leung. Data Mining Using Grammar Based Genetic Programming and
Applications. Kluwer, Dordrecht, 2000.
[17] J. Eggermont, A.E. Eiben, and J.I. van Hemert. A comparison of genetic programming variants for
data classification. Proceedings of the Intelligent Data Analysis (IDA-99), 1999.
[18] C.C. Bojarczuk, H.S. Lopes, and A.A. Freitas. Discovering comprehensible classification rules by
using genetic programming: A case study in a medical domain. GECCO, 953–958, 1999.
[19] M.D. Ryan and V.J. Rayward-Smith. The evolution of decision trees. In Proceedings of the Third
Annual Conference on Genetic Programming, Morgan Kaufmann, San Francisco, CA, 1998.
[20] E. Bonabeau, M. Dorigo, and G. Théraulaz. Swarm Intelligence: From Natural to Artificial Systems.
Oxford University Press, Oxford, 1999.
[21] M. Dorigo, V. Maniezzo, and A. Colorni. The ant system: optimization by a colony of cooperating
agents. IEEE Transactions on Systems, Man, and Cybernetics — Part B, 26: 29–41, 1996.
[22] R.S. Parpinelli, H.S. Lopes, and A.A. Freitas. An ant colony algorithm for classification rule
discovery. In Data Mining: A Heuristic Approach (H. Abbass, R. Sarker, and C. Newton, Eds.),
Idea Group Publishing, London, 2002, pp. 191–208.
[23] L.N. de Castro and J. Timmis. Artificial Immune Systems: A New Computational Intelligence
Approach. Springer-Verlag, 2002.
[24] D. Dasgupta, Z. Ji, and F. González. Artificial immune system (AIS) research in the last five years.
In Proceedings of the International Conference on Evolutionary Computation Conference (CEC),
Canberra, Australia, December 8–12, 2003.
[25] L.N. de Castro and F.J. Von Zuben. aiNet: An artificial immune network for data
analysis. In Data Mining: A Heuristic Approach (H.A. Abbass, R.A. Sarker, and C.S. Newton, Eds.),
Idea Group Publishing, USA, March 2001.
[26] O. Nasraoui, F.A. González, C. Cardona, C. Rojas, and D. Dasgupta. A scalable artificial immune
system model for dynamic unsupervised learning. GECCO, 219–230, 2003.
[27] M. Clerc and J. Kennedy. The particle swarm-explosion, stability, and convergence in a
multidimensional complex space. IEEE Transactions on Evolutionary Computation, 6: 58–73, 2002.
[28] Y. Shi and R.C. Eberhart. Empirical study of particle swarm optimization. In Proceedings of the
IEEE Congress on Evolutionary Computation (CEC 1999), Piscataway, NJ, 1999, pp. 1945–1950.
[29] http://web.ics.purdue.edu/∼hux/bibliography.shtml
[30] J. Kennedy. Small worlds and mega-minds: Effects of neighborhood topology on particle
swarm performance. In Proceedings of IEEE Congress on Evolutionary Computation (CEC 1999),
Piscataway, NJ, 1999, pp. 1931–1938.
[31] T.M. Blackwell and P.J. Bentley. Don’t push me! collision-avoiding swarms. In Proceedings of the
IEEE Congress on Evolutionary Computation (CEC 2002), Honolulu, Hawaii, USA, 2002.
[32] T. Sousa, A. Neves, and A. Silva. Swarm optimisation as a new tool for data mining. In Proceedings
of the Parallel and Distributed Processing Symposium, 2003, pp. 144–149.
[33] T. Sousa, A. Silva, and A. Neves. A particle swarm data miner. In Proceedings of 11th Portuguese
Conference on Artificial Intelligence (EPIA 2003), Vol. 2902 of Lecture Notes in Computer Science,
Progress in Artificial Intelligence, Beja, Portugal, 2003, pp. 43–53.
28.1 Introduction
Experiments with DNA microarray technology generate a large amount of data as they are able to measure
the expression levels of thousands of genes in a single experiment [1]. To explore these data it is necessary
to develop knowledge discovery techniques that can extract biological significance and use the data
to assign functions to genes. This is the goal of data mining (also known as Knowledge Discovery in
Databases [KDDs]) that has been defined as “The nontrivial extraction of implicit, previously unknown,
and potentially useful information from data” [2].
Several kinds of representations to express knowledge that can be extracted from microarray data can be
found in the literature [6]. Most current approaches in this area group genes, using clustering algorithms
such as hierarchical clustering [3] or Kohonen maps, that is, self-organizing maps [4]. Association rule
mining is an advanced data mining technique that is useful for deriving meaningful rules from a given
dataset. Rule mining, like other data mining approaches, may be seen as an NP-hard combinatorial
problem that can be solved with combinatorial optimization methods such as evolutionary algorithms.
In this chapter, we propose a multicriteria hybrid metaheuristic for microarray data analysis. We first
introduce the biological context of the work. Then we set the rule mining context. After this positioning,
we present a multicriteria hybrid algorithm and describe every feature of the algorithm (representation,
operators, etc.). Finally, results on real datasets are presented and analyzed.
In the Synteni/Stanford chips, probes of cDNA (500/5000 bases long) are immobilized on a solid surface
such as glass using robot spotting and exposed to a set of targets, either separately or in a mixture.
Affymetrix chips are a powerful technology for resequencing DNA and detecting polymorphisms; they use
photolabile agents and photolithography techniques.
Although fundamental differences exist between these two technologies, their strength lies in the massively
parallel analysis of thousands of genes and the generation of large amounts of data.
Analyzing DNA microarray data requires a preprocessing phase [1,5,6]. Figure 28.1 recalls the different
steps required to produce new biological assumptions from microarray experiments. Major steps are
described as follows:
Relative gene expression: The differential gene expression is calculated by dividing the intensity of the
gene in the sample under study by its intensity level in the control. This intensity ratio has a highly
asymmetric distribution; to avoid this problem, a log2-transformation of the ratio is usually applied
to obtain a normal-like distribution.
Normalization: There are many sources of variation in microarray measurements (variations
in cells or individuals, mRNA extraction, isolation, hybridization conditions, optical measurement,
scanner noise, etc.). The purpose of normalization is to adjust (or correct) a signal in order to make
the comparison with other signals more meaningful. Many techniques for normalization aim to
make the data more normally distributed (log transformation per chip and per gene). This is an
important issue for data analysis.
[Figure 28.1: from microarray experiments to new biological hypotheses; the pipeline chains microarray selection, image analysis, normalization, gene filtering and discretization, analysis, and model validation, with databases (DB) storing the data between steps.]
Gene filtering and discretization: We are interested in finding genes that show significant differences
between two groups of patients. Hence, the filtering process may remove genes that do not differentiate
the sample under study from the test sample (their relative expression is not significant). Most of the
time, gene expression data is discretized into under-/over-expressed genes using cutoffs.
Hence after these different phases, data may be considered as large tables indicating, with discretized
values, the relative gene expression for thousands of genes under different experiments.
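For illustration, the log-ratio transformation and the cutoff-based discretization just described can be sketched in a few lines of Python (the cutoff values and names below are illustrative, not those used in actual microarray pipelines):

import math

def discretize_expression(sample_intensity, control_intensity,
                          up_cutoff=1.0, down_cutoff=-1.0):
    # Relative expression: log2 of the sample/control intensity ratio,
    # which turns the highly asymmetric ratio into a normal-like value.
    m = math.log2(sample_intensity / control_intensity)
    # Cutoff-based discretization into over-/under-expressed classes.
    if m >= up_cutoff:
        return "over_expressed"
    if m <= down_cutoff:
        return "under_expressed"
    return "no_change"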
reveal biologically relevant associations between different genes or between environmental effects and gene
expression. They apply it with success to a yeast database. Kotala et al. [27] introduce a new approach
to mine association rules from microarray gene expression data using Peano count tree. Icev et al. [30]
focus on the combinatorial analysis of motifs involved in transcriptional control and introduce a notion
of association rules with distance information.
In our study we will consider data in the treatment-table form (genes are the columns; treatments, that is,
comparisons of individuals of different status, are the rows). Our goal is to look for rules combining genes
where a term can be in the form gene = value. The value belongs to the discretized gene expression
level. An example of a rule could be: IF (gene12 = over_expressed) AND (gene504 = under_expressed)
THEN (gene8734 = over_expressed).
FIGURE 28.3 Example of dominance. In this example, points 1, 3, and 5 are nondominated. Point 2 is dominated
by point 3, and point 4 by points 3 and 5.
A solution x_i is said to dominate a solution x_j if and only if
∀k ∈ {1, ..., p}: f_k(x_i) ≤ f_k(x_j),  and  ∃k ∈ {1, ..., p}: f_k(x_i) < f_k(x_j).    (28.2)
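The dominance test of Equation (28.2) translates directly into code. A minimal Python sketch, in which an objective vector is a plain sequence of criterion values to be minimized:

def dominates(fx_i, fx_j):
    # Equation (28.2): f(x_i) is nowhere worse than f(x_j) and strictly
    # better on at least one criterion (minimization).
    return (all(a <= b for a, b in zip(fx_i, fx_j))
            and any(a < b for a, b in zip(fx_i, fx_j)))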
Definition 28.1. A solution is Pareto optimal if it is not dominated by any other solution of the feasible set.
The set of optimal solutions in the decision space X is denoted as the Pareto set, and its image in the
objective space is the Pareto front. In MOP, we are looking for all the Pareto optimal solutions.
In the Pareto front two types of solutions may be distinguished: the supported solutions (that are on
the convex hull of the set of solutions and that may be found by a linear combination of criteria), and
nonsupported solutions [33]. These solutions are important because, for some problems, only a few Pareto
solutions are supported (the extremes), and to get a good compromise between the two criteria it is
necessary to choose one of the nonsupported solutions.
Support (S): The classical measure for association rules, it measures the frequency of a rule in the
database, that is, the percentage of transactions containing both the C part and the P part. It is
used to find frequent itemsets in Apriori:
S = |C & P| / N.    (28.3)
Confidence (Cf): Confidence measures the validity of a rule; it is the conditional probability of P
given C. It is used in Apriori to find interesting rules within frequent itemsets:
Cf = |C & P| / |C|.    (28.4)
J-measure (Jm): Smyth and Goodman [37] proposed the J-measure, which estimates the degree of
interest of a rule and combines support and confidence. It is used in optimization methods [38,39]:
Jm = (|P| / N) × (|C & P| / |P|) × log( (N × |C & P|) / (|C| × |P|) ).    (28.5)
Interest (I): The Interest measures the dependency between C and P while privileging rare patterns
in the region of weak support:
I = (N × |C & P|) / (|C| × |P|).    (28.6)
Surprise (R): It is used to measure affirmation and enables the search for surprising rules.
Hence, the use of these five criteria allows a rule to be evaluated in a multicriteria manner. As the
criteria are complementary, this model is well suited to selecting interesting rules and to reducing the
number of candidate rules.
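The four criteria whose formulas appear above can all be computed from the same handful of counts; the Surprise measure is left out of this sketch because its formula is not reproduced here. A minimal Python version:

import math

def rule_measures(n_c, n_p, n_cp, n):
    # n_c = |C|, n_p = |P|, n_cp = |C & P|, n = N (database size).
    support = n_cp / n                                   # Eq. (28.3)
    confidence = n_cp / n_c                              # Eq. (28.4)
    j_measure = (n_p / n) * (n_cp / n_p) * math.log(
        (n * n_cp) / (n_c * n_p))                        # Eq. (28.5)
    interest = (n * n_cp) / (n_c * n_p)                  # Eq. (28.6)
    return support, confidence, j_measure, interest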
[Figure: the mutation scheme; an offspring undergoes mutation with probability Pm, and the mutation applied is a value mutation with probability Pm1 or an attribute mutation with probability 1 − Pm1.]
chosen (see Figure 28.8). The third one is a reduction mutation that randomly removes one term
of the rule. The last one is an augmentation mutation that randomly adds a term to the rule.
[Figure 28.8: examples of mutations; an offspring such as IF Att1 = Val1 AND Att3 = Val4 AND Att4 = Val3 THEN Att7 = Val3 yields new individuals such as IF Att1 = Val10 AND Att5 = Val2 THEN Att7 = Val3 and IF Att4 = Val10 AND Att5 = Val2 THEN Att7 = Val3.]
2. Nondominated sorting GA (NSGA): This method assigns ranks to solutions by first finding
the set of nondominated solutions in the current population. Those solutions are removed
from the population and assigned rank 1. As these solutions are removed, a new so-called
front of nondominated solutions is now present in the remainder of the original population.
This second front is extracted and assigned rank 2. This procedure is repeated until there is no
solution present in the population (see Figure 28.10 for a minimization problem) [42].
Experiments have shown that, with the proposed algorithm, both ranking methods give interesting
results, with a slight advantage for the Pareto ranking. We will therefore use the Pareto ranking
for the selection process.
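The front-by-front ranking just described admits a compact sketch, reusing the dominates function given earlier (objectives maps a solution to its criteria vector):

def nsga_ranks(population, objectives):
    # Repeatedly extract the nondominated front of the remaining
    # solutions; the k-th extracted front receives rank k.
    remaining = {i: objectives(s) for i, s in enumerate(population)}
    ranks, front = {}, 1
    while remaining:
        current = [i for i in remaining
                   if not any(dominates(remaining[j], remaining[i])
                              for j in remaining if j != i)]
        for i in current:
            ranks[i] = front
            del remaining[i]
        front += 1
    return ranks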
[Figure 28.10: the ranking process illustrated on two minimization criteria (Criterion1, Criterion2); each solution is labeled with the rank of the nondominated front it belongs to.]
Replacement operator: We use the elitist nondominated sorting replacement. The worst-ranked solutions
are replaced by dominating solutions (if any exist) generated by the mutation and crossover
operators (the offspring). The size of the population remains unchanged.
Archive: Nondominated association rules are archived into a secondary population called the “Pareto
Archive” in order to keep track of them. It consists of archiving all the Pareto association rules
encountered over generations. When a new Pareto solution is added to the archive, an update has
to be done (some solutions may become dominated).
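The archive update can be sketched as follows, with the archive kept as a list of (criteria vector, rule) pairs and the dominates function from earlier (a sketch, not the exact implementation used in the experiments):

def archive_insert(archive, candidate_obj, candidate_rule):
    # Reject the candidate if an archived rule dominates it; otherwise
    # insert it and drop every archived rule that it now dominates.
    if any(dominates(obj, candidate_obj) for obj, _ in archive):
        return archive
    kept = [(obj, rule) for obj, rule in archive
            if not dominates(candidate_obj, obj)]
    return kept + [(candidate_obj, candidate_rule)]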
Elitism: The Pareto solutions (best solutions) are not only stored permanently; they also take part in
the selection and may participate in reproduction.
28.4.4 Hybridization
In order to increase the robustness of the approach, we hybridize it with an exact enumeration procedure.
As the search space is large, this enumeration is restricted to a small subspace defined by solutions of
the population. Hence, we designed an operator that performs an exhaustive search of all the possible rules
generated from the attributes selected in two rules.
This operator may be seen as a quadratic crossover operator. It takes as input two individuals, each
coding a rule. It examines all the possible itemsets that can be derived from the items composing the
rules, while taking into account the different possible values of the attributes. All the possible rules that
can be constructed from the generated itemsets are evaluated and introduced if necessary in a local Pareto
archive. Finally, only the global Pareto solutions of the local Pareto archive are introduced in the global
Pareto archive with the replacement operator. Two offspring are candidates to take part in the population
(Figure 28.11).
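Under the simplifying assumption that a rule is a tuple of (attribute, value) condition terms plus a prediction part, the enumeration at the heart of this operator might be sketched as follows (evaluate returns a rule's criteria vector; dominates is the function given earlier):

from itertools import combinations

def quadratic_crossover(rule1, rule2, evaluate):
    # Pool the condition terms of the two parents and enumerate every
    # rule whose condition is a nonempty subset of that pool.
    terms = sorted(set(rule1[0]) | set(rule2[0]))
    candidates = [(subset, rule1[1])
                  for size in range(1, len(terms) + 1)
                  for subset in combinations(terms, size)]
    scored = [(evaluate(r), r) for r in candidates]
    # Local Pareto archive: keep only the nondominated candidates; the
    # caller then merges them into the global archive.
    return [(obj, r) for obj, r in scored
            if not any(dominates(other, obj)
                       for other, _ in scored if other is not obj)]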
Then, for each mutation operator M_i, assume Nb_mut(M_i) applications of the mutation are done
during a given generation (j = 1, ..., Nb_mut(M_i)). We can then compute the profit of a mutation M_k:
Profit(M_k) = [ Σ_j progress_j(M_k) / Nb_mut(M_k) ] / [ Σ_i Σ_j progress_j(M_i) / Nb_mut(M_i) ].    (28.9)
We set a minimum rate δ and a global mutation rate p_mutation for the N mutation operators. The new
mutation rate of each M_i is then calculated from its profit, using a formula from Reference 46, in such a
way that the sum of all the mutation rates is equal to the global mutation rate p_mutation. The initial rate
of application of each mutation operator is set to p_mutation / N.
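Because the redistribution formula itself is not reproduced above, the sketch below uses the natural rule consistent with the two stated constraints, a minimum rate δ per operator and rates that sum to p_mutation; treat the last line as an assumption rather than the exact formula of Reference 46:

def update_mutation_rates(progress, nb_mut, p_mutation, delta):
    # progress[i]: improvements produced by operator M_i this generation;
    # nb_mut[i]: number of times M_i was applied.
    n = len(progress)
    gains = [sum(p) / max(m, 1) for p, m in zip(progress, nb_mut)]
    total = sum(gains) or 1.0               # avoid division by zero
    profits = [g / total for g in gains]    # Equation (28.9)
    # Assumed redistribution: every operator keeps at least delta, and
    # the rates sum to p_mutation whenever the profits sum to 1.
    return [delta + p * (p_mutation - n * delta) for p in profits]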
28.5 Experiments
28.5.1 Data
In order to evaluate the algorithm, we apply it to two microarray databases:
• A confidential microarray data containing 22,376 human genes for 45 Affymetrix chips (DB1).
• A public database, the “MIPS yeast genome database” containing 2467 genes for 79 chips (YeastDB).
Gene expression levels have been discretized and may take five values: Increase (I) and Marginal Increase (MI)
when the gene is over-expressed, Decrease (D) and Marginal Decrease (MD) when it is under-expressed,
and No Change (NC) when the difference of expression is not significant.
For an initial study, a set of 514 genes (numbered from 1 to 514) that show an interesting differential
expression over the set of experiments (filtered on the number of No Changes) has been selected from
DB1. For YeastDB, all 2,467 genes have been considered.
• Let L1 (respectively L2 ) be the set of solutions in PO1 (respectively PO2 ) that are dominated by
some solutions of PO2 (respectively PO1 ).
• Let N1 (respectively N2 ) be the other solutions of PO1 (respectively PO2 ): Ni = POi \
(PO ∪ Wi ∪ Li ).
Let us remark that PO* = PO ∪ W1 ∪ N1 ∪ W2 ∪ N2 and Cont(PO1/PO2) +
Cont(PO2/PO1) = 1, with Cont(PO1/PO2) ∈ [0, 1]. Hence, a contribution greater than 0.5 indicates
that the Pareto front has been improved.
For example, consider the contribution of the two sets of solutions PO1 and PO2 in Figure 28.12, where
solutions of PO1 (respectively PO2) are represented by circles (respectively crosses). We obtain
Cont(PO1, PO2) = 0.7 and Cont(PO2, PO1) = 0.3.
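A sketch of the metric, assuming the usual definition Cont(PO1/PO2) = (|PO|/2 + |W1| + |N1|) / |PO*| (an assumption, chosen because it reproduces the 0.7/0.3 example above); solutions are tuples of criterion values and dominates is the function defined earlier:

def contribution(po1, po2):
    po = [s for s in po1 if s in po2]        # solutions in both fronts
    def classify(own, other):
        # W: solutions dominating some solution of the other front;
        # N: solutions neither common, dominating, nor dominated.
        w = [s for s in own if s not in po
             and any(dominates(s, t) for t in other)]
        l = [s for s in own if s not in po
             and any(dominates(t, s) for t in other)]
        n = [s for s in own
             if s not in po and s not in w and s not in l]
        return w, n
    w1, n1 = classify(po1, po2)
    w2, n2 = classify(po2, po1)
    star = len(po) + len(w1) + len(n1) + len(w2) + len(n2)
    return (len(po) / 2 + len(w1) + len(n1)) / star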
28.5.3 Results
Genetic algorithms are stochastic methods; hence, to evaluate the proposed approach, we executed
10 runs for each configuration. Results are reported with several indicators: mean, minimum,
maximum, and standard deviation. This allows a more reliable comparison.
[Figure 28.12: contribution example with C = 4 common solutions, W1 = 4, W2 = 0, N1 = 1, and N2 = 1.]
28.5.3.1.1 Elitism
The gain of elitism may be evaluated by comparing similar configurations with and without elitism (here,
C/A and D/B). Table 28.1 indicates that, on average, the Pareto fronts obtained using elitism are of
better quality than those obtained without it (Cont(C/A) = 0.62 and Cont(D/B) = 0.72). Moreover,
this contribution may reach 0.9 for the best runs, which shows the benefit of this mechanism.
28.5.3.1.3 Hybridization
Two questions arise concerning the use of the hybridization: is it interesting to use such an operator,
and is it worth using it at each iteration? Hence, we can compare configurations E/D and E/E′.
Table 28.1 shows that using the hybridization improves the most complete configuration (Pareto ranking +
elitism + adaptive strategy). In this case, the minimum contribution encountered is 0.59, which
means that the hybridization always improves the Pareto front. When the operator is applied only one
generation in ten (Conf E′), the results are still interesting (Cont(E′/D) = 0.60), but the Pareto fronts
obtained are generally dominated by those produced by Conf E (Cont(E/E′) = 0.58), where the operator
is applied at each generation. The drawback of Conf E, however, is the large amount of computing time
required, so the compromise between the quality of the solution and the computing time allowed has to be
considered.
TABLE 28.2 Description of Some Pareto Solutions Obtained with Conf E (YeastDB)
[The table lists, for each rule, its description and its S, Cf, I, R, and Jm values.]
28.6 Conclusion
In this chapter we have presented a multiobjective genetic algorithm for rule mining problems.
A multicriteria model has been proposed for association rule mining, together with a genetic
algorithm that searches for the Pareto solutions with respect to the five selected criteria. We have presented
its application to analyze microarray experiment data. Through the experiments, the advanced mech-
anisms proposed have been validated, and in particular the hybridization with an exact enumerative
procedure.
In order to improve the use of such an algorithm and to speed up executions that may take several
hours, we are now working on a parallel implementation of the method. The parallelism should deliver
results in a more reasonable time and should also allow more intensive searches with the
hybridization operator. Hence, we will be able to provide biologists with different hypotheses for their evaluation.
References
[1] D.P. Berrar, W. Dubitzky, and M. Granzow, Eds. A Practical Approach to Microarray Data Analysis.
Kluwer Academic Publishers, New York, 2003.
[2] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth. From data mining to knowledge discovery: An
overview. Advances in Knowledge Discovery. MIT Press, Cambridge, MA, 1996, pp. 1–34.
[3] P.T. Spellman, G. Sherlock, M.Q. Zhang, V.R. Iyer, K. Anders, M.B. Eisen, P.O. Brown, D. Botstein,
and B. Futcher. Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces
cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9: 3273–3297, 1998.
[4] P. Tamayo, D. Slonim, J. Mesirov, Q. Zhu, S. Kitareewan, E. Dmitrovsky, E. Lander, and T. Golub.
Interpreting patterns of gene expression with self-organizing maps: Methods and application
to hematopoietic differentiation. Proceedings of National Academy of Sciences, 96: 2907–2912,
1999.
[5] B. Phimister. The chipping forecast. Nature Genetics, 21(Suppl.): 1–60, 1999.
[6] Collective works. The human genome project. Nature, 409: 813–959, 2001.
[7] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large
databases. In P. Buneman and S. Jajodia, Eds., Proceedings of the 1993 ACM SIGMOD International
Conference on Management of Data. ACM Press, Washington, DC, May 1993, pp. 207–216.
[8] S. Morishita and A. Nakaya. Parallel branch-and-bound graph search for correlated association
rules. In Large-Scale Parallel Data Mining, 1999, pp. 127–144.
[9] T. Scheffer. Finding association rules that trade support optimally against confidence. In Principles
of Data Mining and Knowledge Discovery, 2001, pp. 424–435.
[10] F. Angiulli, G. Ianni, and L. Palopoli. On the complexity of mining association rules. In Atti del
Nono Convegno su Sistemi Evoluti per Basi di Dati (SEBD), Venice, Italy, 2001, p. 8.
[11] M.R. Garey and D.S. Johnson. Computers and Intractability. The Guide to the Theory of
NP-Completeness. W.H. Freeman and Company, San Francisco, CA, 1979.
[12] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In J.B. Bocca, M. Jarke,
and C. Zaniolo, Eds., Proceedings of the 20th International Conference on Very Large Data Bases,
VLDB, Morgan Kaufmann, San Francisco, CA, 12–15 1994, pp. 487–499.
[13] S. Brin, R. Motwani, and C. Silverstein. Beyond market baskets: Generalizing association rules to
correlations. In ACM SIGMOD, 1997, pp. 265–276.
[14] A. Savasere, E. Omiecinski, and S.B. Navathe. An efficient algorithm for mining association rules
in large databases. In Proceedings of the VLDB Conference, Zurich, Switzerland, 1995, pp. 432–444.
[15] H. Toivonen. Sampling large databases for association rules. In T.M. Vijayaraman, A.P. Buchmann,
C. Mohan, and Nandlal L. Sarda, Eds., Proceedings of the 1996 International Conference on Very
Large Data Bases. Morgan Kaufmann, San Francisco, CA, 1996, pp. 134–145.
[16] M.J. Zaki. Parallel and distributed association mining: A survey. IEEE Concurrency, 7:14–25, 1999.
[17] L. Jourdan, C. Dhaenens, and E.-G. Talbi. Rules extraction in linkage disequilibrium mapping
with an adaptive genetic algorithm. In European Conference on Computational Biology (ECCB)
2003, Paris, France, 2003, pp. 29–32.
[18] C. Darwin. On the Origin of Species. John Murray, London, 1859.
[19] J. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, Ann Arbor,
MI, 1975.
[20] S. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings
of the Eighth International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, 1983,
pp. 422–425.
[21] K.A. De Jong, W.M. Spears, and D.F. Gordon. Using genetic algorithms for concept learning.
Machine Learning, 13: 161–188, 1993.
[22] C.Z. Janikow. A knowledge-intensive genetic algorithm for supervised learning. Machine Learning,
13: 189–228, 1993. J.J. Grefenstette, Ed., Kluwer Academic Publishers, Massachusetts.
[23] M. Pei, E.D. Goodman, and W.F. Punch III. Pattern discovery from data using genetic algorithms.
In Proceedings of the First Pacific-Asia Conference on Knowledge Discovery and Data Mining, February
1997. Available via www URL: http://garage.cps.msu.edu/papers/papers-index.html.
[24] G.M. Weiss. Timeweaver: A genetic algorithm for identifying predictive patterns in sequences
of events. In W. Banzhaf, J. Daida, A.E. Eiben, M.H. Garzon, V. Honavar, M. Jakiela, and
R.E. Smith, Eds., Proceedings of the Genetic and Evolutionary Computation Conference, Vol. 1.
Orlando, FL. Morgan Kaufmann, San Francisco, CA, 1999, pp. 718–725.
[25] D.P. Greene and S.F. Smith. Competition-based induction of decision models from examples.
Machine Learning, 13: 229–257, 1993. J.J. Grefenstette, Ed., Kluwer Academic Publishers,
Massachusetts.
[26] A. Giordana and F. Neri. Search-intensive concept induction. Evolutionary Computation, 3:
375–419, 1995.
[27] P. Kotala, P. Zhou, S. Mudivarthy, W. Perrizo, and E. Deckard. Gene expression profiling of
DNA microarray data using Peano count trees (P-trees). In Online Proceedings on the First Virtual
Conference on Genomics and Bioinformatics, 2001.
[28] R. Chen, Q. Jiang, H. Yuan, and L. Gruenwald. Mining association rules in analysis of transcription
factors essential to gene expressions. In Atlantic Symposium on Computational Biology and Genome
Information Systems and Technology, 2001.
[29] C. Creighton and S. Hanash. Mining gene expression databases for association rules.
Bioinformatics, 19: 79–86, 2003.
[30] A. Icev, C. Ruiz, and E.F. Ryder. Distance-enhanced association rules for gene expression.
In ACM SIGKDD Workshop on Data Mining in Bioinformatics (BIOKDD03), 2003,
pp. 34–40.
[31] S. Cahon, N. Melab, E.-G. Talbi, and M. Schoenauer. ParadisEO-based design of parallel and
distributed evolutionary algorithms. In Evolutionary Algorithms EA'03, 2003, pp. 195–207.
[50] E. Zitzler, L. Thiele, M. Laumanns, C.M. Fonseca, and V.G. da Fonseca. Performance assessment
of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 7(2): 117–132, 2003.
[51] M. Basseur, F. Seynhaeve, and E.-G. Talbi. Design of multi-objective evolutionary algorithms:
Application to the flow-shop scheduling problem. In Congress on Evolutionary Computation
(CEC’02), Honolulu, USA, 2002, pp. 1151–1156.
[52] H. Meunier, E.G. Talbi, and P. Reininger. A multiobjective genetic algorithm for radio
network optimisation. In CEC, Vol. 1, IEEE Service Center, Piscataway, NJ, July 2000, pp. 317–324.
[Genetic algorithms] have been shown to rapidly converge to near-optimal solutions in a wide variety
of application domains. Further, they have been shown to be computationally efficient, and to be
well suited for solving problems characterized by local minima.
This chapter presents two engineering design problems, both of which were solved with biology-inspired
algorithms. The evolutionary approach is used in universal electromotor geometry optimization (UM
design) and integrated circuits area/time optimization (IC design).
In the UM design we improve the efficiency of a universal motor; here the goal is to find a new set of
independent geometrical parameters for the rotor and the stator with the aim of reducing the motor’s
power losses, which occur in the iron and the copper. In the IC design we improve some parts of the
high-level synthesis process of integrated circuits by considering the concurrency of operation scheduling
and resource allocation constraints to ensure a globally optimal solution in a reasonable time.
FIGURE 29.1 A UM used in a vacuum cleaner, showing its rotor and stator parts.
• Each parameter’s dimension should only be varied within a predefined feasible limit.
• Parameter transformations and their evaluation should be done as quickly as possible.
29.1.1.2 The Efficiency of a UM
The efficiency of a UM is defined as the ratio of the output power to the input power, and it depends on
various power losses, which include:
• Copper losses: the joule losses in the windings of the stator and the rotor.
• Iron losses: including the hysteresis losses and the eddy-current losses, which are primarily in the
armature core and in the saturated parts of the stator core.
• Other losses: such as brush losses, ventilation losses, and friction losses.
The overall copper losses (in all stator and rotor slots) are as follows:
P_Cu = Σ_i (J² · A · ρ · l_turn)_i,    (29.1)
where i ranges over the slots, J is the current density, A is the slot area, ρ is the copper's specific resistance,
and l_turn is the length of the winding turn.
Because of the nonlinear magnetic characteristic, the calculation of the iron losses is less exact. The iron
losses are separated into two components: the hysteresis losses and the eddy-current losses. Consequently,
a motor’s iron losses can be expressed by the following equation [1]:
P_Fe = k_e · B² · f_rot² · m_rot + k_e · B² · f_stat² · m_stat + k_h · B² · f_stat · m_stat,    (29.2)
where ω is set by the motor’s speed, and T is a vector product of the distance from the origin, r, and the
electromagnetic force, F .
When considering all the mentioned losses and the output power, the overall efficiency of a UM can be
defined as follows:
η = P_2 / (P_2 + P_Cu + P_Fe + P_Brush + P_Vent + P_Frict).    (29.4)
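Equations (29.1), (29.2), and (29.4) combine into a single efficiency evaluation; a minimal sketch, in which the argument names are illustrative and the field quantities would in practice come from the finite-element analysis:

def um_efficiency(p2, slots, k_e, k_h, B, f_rot, f_stat, m_rot, m_stat,
                  p_brush=0.0, p_vent=0.0, p_frict=0.0):
    # Copper losses, Eq. (29.1): slots holds one (J, A, rho, l_turn)
    # tuple per stator/rotor slot.
    p_cu = sum(J ** 2 * A * rho * l_turn for J, A, rho, l_turn in slots)
    # Iron losses, Eq. (29.2): eddy-current and hysteresis components.
    p_fe = (k_e * B ** 2 * f_rot ** 2 * m_rot
            + k_e * B ** 2 * f_stat ** 2 * m_stat
            + k_h * B ** 2 * f_stat * m_stat)
    # Efficiency, Eq. (29.4).
    return p2 / (p2 + p_cu + p_fe + p_brush + p_vent + p_frict)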
storage units preserve those values over time. We have N different functional units Fi , i = 1, 2, . . . , N , nr
registers, and nb buses, while T represents the execution time of the DFG.
The parameters are calculated as follows:
• The number nf_i is the highest number of functional units of type i needed in a single control step.
• The number n_r is the highest number of variables needed in a single control step. We count
variables that are needed by a functional unit as input data, variables that are returned as output
data, and variables that are not used at the moment but will be used in a later control
step or must remain available until the end of the execution of all operations.
• The number n_b is the highest number of data transmissions (into or from the functional units) at
any single moment.
• The execution time, T , is the time needed to execute all the operations of the schedule.
• The weights w_fi, w_r, w_b, and w_t are the weights of functional units, registers, buses, and time,
respectively, to be considered in the IC quality-evaluation cost function. The first three weights are
proportional to their silicon area in the IC, while w_t reflects our IC speed constraints.
• Landwehr et al. [4] suggest solving the problem with integer linear programming. The procedure
ensures optimal solutions but is computationally expensive, and therefore practically useful only for
small circuits.
• Zhu and Gajski [5] suggest a “soft scheduling,” where operations are temporarily scheduled, but
are later adjusted according to some physical characteristics. Operations are finally scheduled after
the allocation and binding tasks, when all the parameters that affect the scheduling optimality are
known. Actually, there is no concurrency, but there is some iterative refinement of the scheduled
operations.
• Mandal and Zimmer [6] suggest concurrent scheduling and allocation using a genetic algorithm,
with an emphasis on reducing the number of connections. In the optimization process, the
whole circuit is partitioned into blocks consisting of a functional unit, a storage unit, and internal
connections. Operations are scheduled according to the units in those blocks and according to
additional global storage units and interconnections, which are to be minimized. Here, the problem
is to have the various functional units present in each block, so that all the operations of
the block can be executed.
• Kim [7] suggests a concurrent procedure where the operations subset is chosen for each control
step. These operations are later bound to functional units and the variables are bound to storage
units in order to achieve the highest possible connection utilization. The order of operations to be
scheduled is based on their readiness for execution and their influence on the number of storage
and connection units, while considering the lowest possible number of control steps.
• Grajcar [8] describes the concurrency of scheduling and allocation on a multiprocessor system.
The procedure is based on a GA and list scheduling, which considers only some values (critical
points) in the chromosome code.
As mentioned earlier, when the tasks are performed separately (Figure 29.4[a]) the solution is not
necessarily optimal; it is better to use an approach with iterative repetition of the scheduling and allocation
(Figure 29.4[b]). Here, the problem of choosing the next operation/unit to be changed appears, since the order of
changes can influence the final solution. The situation is similar with the approach that involves partitioning
the operations into small groups, within which there is an iterative repetition of the scheduling and
the allocation (Figure 29.4[c]). Since there are fewer operations in the group there is no problem with the
order of changes, but there is a problem with the appropriate partitioning of the operations. Obviously,
the best approach is the one with purely concurrent scheduling and allocation (Figure 29.4[d]), where the
iterative-refinement order does not influence the quality of the solution [9]. Concurrency is achieved by
using algorithms that do not depend on the order of the transformations. Therefore, there is no influence
of the altered start time on the allocated unit, nor is there any influence of the allocated unit on the start
time. When all the transformations are made, then the appropriateness of the changes is checked.
[Figure 29.4: approaches to scheduling and allocation; (a) scheduling followed by allocation, (b) iterative repetition of scheduling and allocation, (c) partitioning with scheduling and allocation inside each group, (d) concurrent scheduling and allocation followed by binding.]
initial population of strings, which evolve into the next generation under the control of probabilistic
transition rules — known as randomized genetic operators — such as selection, crossover, and mutation.
The objective function evaluates the quality (or fitness) of solutions coded as strings. This information
is then used to perform an effective search for better solutions. There is no need for any other auxiliary
knowledge. The GA tends to take advantage of the fittest solutions by giving them greater weight, and
concentrating the search on the regions of the search space that show likely improvement.
29.2.1.2 Encoding
One of the most important parts of the GA is the encoding. By encoding the proper parameters and using
the proper encoding type we can significantly influence the efficiency of the algorithm.
In the UM design, the mutually independent variable geometrical parameters of the rotor/stator
lamination are coded as strings over the alphabet ℝ of real values. There is no need to normalize the
physical parameters: although they differ in range, the crossover operation always exchanges values of the
same parameter, no matter where the crossover point is, since each parameter is always encoded at the
same place in the chromosome.
In the IC design the chromosome string consists of the numbers that represent the starting time of
each operation and the allocated unit for each operation, where the position in the string depends on
the order of the operations in the input IC description. This means that the chromosome consists of
pairs of time/unit information for each operation. The genetic operators can influence both parts
of that information, either together or separately. This encoding type was chosen for its
convenience: when strings have to be further transformed, checked, and analyzed, there is no need for
any additional conversion of their values. In addition, the implementation of the genetic operators can
check the feasibility of changed values instantly, without any transformation, so the correctness of a
transformation can be checked within the operator itself.
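This pairwise encoding can be pictured as follows (a sketch; the class and field names are illustrative):

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ICChromosome:
    # One (start_time, unit) pair per operation, ordered exactly as the
    # operations appear in the input IC description; genetic operators
    # may alter the start time, the allocated unit, or both.
    genes: List[Tuple[int, int]]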
Cost = Σ_{i=1}^{N} (cost_fi)² + cost_r² + cost_b² + cost_t²,    (29.5)
where cost_fi = w_fi · F_i,  cost_r = w_r · n_r,  cost_b = w_b · n_b,  and  cost_t = w_t · T.
To obtain the cost of a certain DFG, the algorithm has to evaluate the required number of resources. In
contrast to other multiobjective functions that return more than one final solution, this one already
includes the decision-making part; that is, it chooses one solution from all the solutions on the Pareto
front. The chosen solution has the shortest distance to the origin, where the origin represents the ideal,
costless solution and the axes represent the considered objectives.
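A sketch of Equation (29.5), assuming F_i stands for the number of functional units of the i-th type that the schedule actually uses:

def schedule_cost(f_counts, w_f, n_r, n_b, T, w_r, w_b, w_t):
    # Squared distance of the objective vector to the origin, Eq. (29.5);
    # f_counts[i] is the assumed count of units of type i.
    cost_f = sum((w_f[i] * f_counts[i]) ** 2
                 for i in range(len(f_counts)))
    return cost_f + (w_r * n_r) ** 2 + (w_b * n_b) ** 2 + (w_t * T) ** 2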
• Setup. If the chromosome that represents a solution is large, then the population size also has
to be large enough to ensure that many different chromosomes will be involved in a search. The
population size therefore depends on the size of the chromosome or the complexity of the problem.
• Crossover. Considering four candidates (two parents and their two offspring), only the first and
the third, ranked according to their fitness, pass to the next generation (see the sketch after this
list). This very probably forces at least one of the offspring to be passed to the next generation in
addition to the best candidate. Otherwise the offspring would have only a small influence on new
generations, since crossing two good parents often produces offspring that are not as good; they
might, however, become good after a few more transformations.
• Mutation. Chromosomes with low fitness are the ones mostly exposed to mutation. A position in the
chromosome string is mutated if that position holds the same value in the majority of
poorly fitted chromosomes in the population; this changes the bad characteristics shared by poorly
fitted chromosomes and redirects the search in another direction. In well-fitted chromosomes,
values are mutated if they differ from the majority of values at the same position in other good
chromosomes, which ensures faster convergence in the final stages of the optimization.
• Variation. The interchange of the values of two positions, as described for the basic operators, is
performed when the frequency of the value at one position in the population is high and the
frequency at the other position is low.
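The 2-of-4 survival rule from the crossover item above, as a minimal sketch (fitness maps a chromosome to a value to be maximized):

def crossover_survival(parent1, parent2, child1, child2, fitness):
    # Rank the four candidates by fitness and keep the first and the
    # third, so at least one offspring usually survives alongside the
    # best candidate.
    ranked = sorted((parent1, parent2, child1, child2),
                    key=fitness, reverse=True)
    return ranked[0], ranked[2]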
1. The initial estimation for the geometry of the rotor and the stator is made based on experience.
2. The appropriateness of this geometry is then usually analyzed by means of a numerical simula-
tion of the electromagnetic field. In our case, the analysis is performed with commercial ANSYS
software [21], which applies a finite-element method (FEM) with an automatic finite-element-mesh
generation. The result is a magnetic vector potential on every node of the finite-element mesh.
3. If the results of the numerical simulation show an inconvenient electromagnetic field structure, the
direct design procedure is repeated until the motor geometry is optimized.
The advantage of this approach is that engineers can significantly influence the progress of the design
process by using their experience, and they can react intelligently to any noticeable electromagnetic
response with proper geometry redesign. The drawback of this approach is that an experienced engineer
and a large amount of time are needed.
The described conventional motor design can be upgraded with a genetic algorithm. The concept of
this evolutionary design can be roughly explained as follows:
1. The GA can start its optimization from any configuration of geometrical parameters, provided it
defines a feasible solution.
2. Each geometrical configuration is analyzed using the ANSYS finite-element program. This step
requires a prior decoding of the strings into a set of geometrical parameters for the rotor and the
stator.
3. After the calculation of the fitness, the reproduction of the individuals and the application of the
genetic operators to a new population are made. The GA repeats this procedure until a predefined
number of iterations have been accomplished.
The advantages of this approach are that an experienced engineer does not need to be present during
the whole process, only at the beginning to decide on the initial design, and that there is no need to know
the mechanical and physical details of the problem: the problem can be solved without detailed
knowledge of it.
The drawbacks of this approach are that it can lead to the improper use of genetic operators, and that an
initial solution that is set too loosely can lead to a longer convergence time.
• A setup part for setting the GA parameters and the geometry limits of the rotor/stator lamination
of an initial UM.
• An optimization part for optimizing the geometry of the rotor/stator lamination.
The program was developed using the Microsoft® Visual C++® programming tool and runs under the
Microsoft® Windows® operating systems.
29.3.1.3 Parameters
29.3.1.3.1 Geometrical Parameters
As stated earlier, there are 12 mutually independent geometrical parameters that need to be optimized.
These parameters can only be varied within their predefined dimension limits to find an optimum con-
figuration that will increase the motor’s efficiency. Solutions in which the parameters exceed the limits are
rejected as being inoperable.
There are some invariable parameters that have a strong influence when defining the outline of the
lamination:
1. The external radius of the stator: which roughly defines the amount of iron and copper, and
consequently, the price of the motor. This is held constant during the optimization to ensure
cost-comparable solutions.
2. The radius of the rotor’s shaft : which we fixed at 5.5 mm. From our experience we know that when
the rotor-shaft radius is less than 5.5 mm, the rotor’s natural frequency can fall below the maximum
frequency of the motor and the resulting resonance would cause the rotor’s vibrations to exceed
allowable limits.
3. The radius of the stator’s side-hole: which would be 0 mm in the ideal case, is set to a small positive
value because holes for the rivets are required in order to bind the stator.
4. The air gap: The angles of the symmetrical and tangential parts of the air gap are set to fixed values
because the commutation, which is conditioned by the air gap, is not taken into account during
the optimization.
29.3.1.3.2 GA Parameters
For the GA to work well, robust parameter settings have to be found for the population size, the number
of generations, the selection criteria, and the genetic operator probabilities:
• If the population is too small, the GA converges too quickly to a local optimum solution and can
miss the best solution. On the other hand, a large population requires a long time to converge to
a region of the search space with significant improvement. The best results are obtained when the
population size is between 30 and 50.
• By applying the elitism strategy, fitter solutions have a greater chance of reproducing. But when
the ratio of least-fit solutions exchanged with best-fit ones is too high, the GA is trapped too
quickly in a local optimum. This ratio depends on the population size and appears to be acceptable
at 20 to 30% of it.
• A crossover probability that is too low prevents solutions from being interchanged, so a longer
time is required for them to converge; the probability is therefore set to at least 60% for the
algorithm to behave satisfactorily.
• A mutation rate that is too high introduces too much diversity and takes a longer time to reach an
optimum solution, while a rate that is too low tends to miss some near-optimum solutions.
Using the annealing strategy, a linearly decreasing mutation probability with each new
generation, the effects of a too-high or too-low mutation rate can be overcome (see the sketch
after this list). The operator is useful with a probability of 0.1%, while in the annealing strategy
it starts at 1% and ends at 0.1%.
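The annealing strategy from the last item reduces to a one-line schedule; a sketch:

def annealed_mutation_rate(generation, n_generations,
                           start=0.01, end=0.001):
    # Mutation probability decreasing linearly from 1% in the first
    # generation to 0.1% in the last.
    frac = generation / max(n_generations - 1, 1)
    return start + (end - start) * frac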
1. With the ANSYS software we calculated the efficiency of an initial UM. An outline of the rotor/stator
lamination of this motor is shown in Figure 29.5(a). The power losses of this motor were calculated
to be 313 W and the output power was calculated to be 731 W (Table 29.1). In the outline, the
levels of magnetic flux density through the rotor/stator lamination are shown, expressed in teslas (T). The
darkest gray color indicates the areas with the highest level of magnetic flux density, which results
in high iron losses. The copper losses are not shown in this view.
2. After several runs of the DOptiMeL software, a set of promising solution candidates was collected.
We applied the following settings for the GA parameters: population size 30, number of generations
100, selection ratio 0.3, crossover probability 0.7, and mutation rate 0.01. For each candidate we
make a finite-element numerical simulation followed by the calculation of the objective function
value (fitness). Because each design is verified with finite-element numerical calculations, the
optimization is a lengthy process. It takes around 3000 runs for the optimization to converge. Most
of the solutions that are given by the DOptiMeL program show a significant reduction in the iron
and the copper losses in comparison with the losses in the initial motor. The best solution results
in a power-loss reduction of 24%, and gives us a motor with iron and copper losses of 239 W
(see Figure 29.5[b]).
FIGURE 29.5 Stator/rotor lamination outline: (a) initial, (b) optimized, (c) cost-invariant optimized.
The main differences between the initial design (Figure 29.5[a]) and the optimized design (Figure 29.5[b])
are: (a) the height of the rotor-and-stator laminations is increased by 13%, (b) the rotor radius is increased
by 5%, (c) the slot (copper) areas in the stator and the rotor are larger, and (d) the iron area in the rotor
is larger.
A comparison of the magnetic flux densities in the initial and the optimized motor shows a clear
reduction of the areas with the highest levels of magnetic flux density in the optimized motor.
one is better: its power loss is estimated to be 12 W lower. By fixing the stator’s outer radius, we mostly
lose the gain of the first design in terms of the decrease of the stator’s copper losses.
29.3.1.4.3 Prototyping
We made prototypes of both the optimized and the cost-optimized motors and measured the real power
losses and the efficiencies of the motors. These values are shown in Table 29.1. The results are only
slightly different from those calculated with the ANSYS finite-element program. The main reason for
this difference can be explained by the non-exact calculation of the iron losses, due to a variation in the
material’s properties.
29.3.2.3 Parameters
By considering 18,750 different schedules of each circuit with the ECSA algorithm and 3,125 different
combinations of the parameters, we statistically compared (using the procedure described in Reference 30)
the results according to their cost function (Equation [29.5]). To ensure that most solutions are time-
constrained, that is, executed in the shortest possible time, the weight wT is set to an extremely high
value.
Figure 29.6 presents the results of different parameter-set evaluations for different test circuits. For each
parameter value, the column marked high represents the share of high-quality (good) solutions and the
column marked low the share of low-quality (bad) solutions among all solutions obtained with that value.
The subfigures show the influence on the percentage of good/bad solutions of (a) the number of
generations, (b) the population size, (c) the crossover probability, (d) the mutation probability, and
(e) the variation probability.
As shown in Figure 29.6 and Table 29.3, the high-quality solutions are mostly obtained with the
following values of the parameters: probability of crossover equal to 0.7, probability of mutation equal to
0.04, and probability of variation equal to 0.03. In addition, taking into account the sizes of the circuits,
the number of generations and the population size should be set to 3 and 3.5 times the size of the circuit,
respectively.
FIGURE 29.6 Test-bench parameter evaluation: (a) number of generations, (b) population size, (c) probability of
crossover, (d) probability of mutation, (e) probability of variation.
The values of the parameters in this combination are referred to as the optimal values. These optimal
values are determined on the basis of the percentage of solutions with certain parameter values among the
good solutions: a parameter value considered optimal should account for at least 25% of the high-quality
solutions and for less than 10% of the low-quality solutions.
The ECSA algorithm is used with the values of the parameters as presented in Table 29.3. Other
parameters needed to run the force-directed scheduling (FDS) and ECSA algorithms and the cost function
depend on the sizes of the functional units.
TABLE 29.3 Optimal Values of the Parameters for Different Test-Bench Circuits
TABLE 29.4 The Evaluation Results of the ECSA Algorithm with Different Test-Bench ICs
Differential equation
FDS-fast 1 × FE2 + 1 × FE4 + 3 × FE6 23,249 17 6 6 0.01
FDS-slow 2 × FE1 + 1 × FE3 + 2 × FE5 7,173 18 6 20 0.01
ECSA-basic 2 × FE2 + 1 × FE4 + 3 × FE6 23,914 18 8 6 0.11
ECSA-independent 2 × FE2 + 1 × FE4 + 3 × FE6 23,914 17 6 6 0.09
circuits can be designed and optimized with the use of the proposed evolution-based algorithm, which
exhibits a linear increase in the design time with an increase of circuit size.
29.4 Conclusions
In the UM geometry optimization we used an evolutionary approach to improve the efficiency of a UM,
the motor that is typically used in home appliances and power tools. The goal of our optimization was
to find the new set of independent geometrical parameters of the rotor and the stator with the aim of
reducing the motor’s power losses, which occur in the iron and the copper. The approach proves to be
a simple and efficient search-and-optimization method for solving this day-to-day design problem in
industry. It outperforms, by a significant improvement of the motor’s efficiency, a conventional design
procedure that was used previously. By using the GA we are able to reduce the iron and the copper losses
of an initial UM by at least 20%, and increasing the GA running time or setting its parameters more
appropriately could improve on this result.
In the IC area/time optimization we used an evolutionary approach to some parts of IC design. The work
was focused on ASICs that need an even more sophisticated design due to their specific use. Optimally
scheduled operations are not necessarily optimally allocated to functional units. To ensure optimum
allocation we need to consider some allocation criteria while the scheduling is being done. The evolutionary
approach considers scheduling and allocation constraints and ensures a globally optimal solution in a
reasonable time. To evaluate our method, we implemented the algorithm and applied it to a group of
test-bench ICs. These circuits were chosen because the same types were used in similar studies; they differ
in size and in the number of operation types. The results of the evaluation show that the evolutionary
methods are able to find a
solution that is more appropriate in terms of all the considered and important objectives than the classical
deterministic methods.
30.1 Introduction
The widespread use of fast Internet connections and high-performance graphics cards has made
possible the current growth of Distributed Virtual Environment (DVE) systems. These systems allow
multiple users, working on different computers interconnected through different networks (and
even through the Internet), to interact in a shared virtual world. This is achieved by rendering images of the
environment as each user would perceive them from his or her location in the virtual environment.
Each user is represented in the shared virtual environment by an entity called an avatar, whose state is
controlled by the user. Since DVE systems support visual interactions between multiple avatars, every
change in each avatar must be notified to the neighboring avatars in the shared virtual environment.
DVE systems are currently used in different applications [1], such as collaborative design [2], civil and
military distributed training [3], e-learning [4], or multiplayer games [1,5–7].
Designing an efficient DVE system is a complex task, since these systems show an inherent
heterogeneity. Such heterogeneity appears in several elements:
Hardware: Each client computer controlling an avatar may have different hardware installed: a very
different range of resources, such as processor speed, memory size, and graphics card technology, can be
found in different client computers.
Connection: Different connections can be found in a single system. From shared-medium topologies
like Ethernet or Fast Ethernet to other network connections like ISDN, fiber optic, or ATM, several
technologies can be simultaneously found in some DVEs.
Communication rate of avatars: Depending on the application, different communication rates of avatars
can be found. For example, the communication rate of avatars in a collaborative three-dimensional
(3D) environment may greatly differ from the communication rate of avatars in a 3D virtual
military battle.
Additionally, other factors increase the complexity of designing an efficient DVE system. Each
of them has now become an open research field:
Data model: This concept describes some conceivable ways of distributing persistent or semipersistent
data in a DVE [8]. Data can be managed in a replicated, shared, or distributed fashion.
Communication model: Network bandwidth determines the size and performance of a DVE. The system
behavior is related to the way all the scene clients are connected. Broadcast, peer-to-peer, or unicast
schemes define different network latency values for exchanging information between avatars.
View consistency: This problem has already been defined in other computer science fields such as
database management [9]. In DVE systems, this problem involves ensuring that all avatars sharing
a virtual space with common objects have the same local vision of them.
Message traffic reduction: Keeping the number of messages low allows DVE systems to scale efficiently
with the number of avatars in the system. Traditionally, techniques like the dead-reckoning described
in Reference 1 offered some level of independence to the avatars. With network support, broadcast
or multicast solutions [10,11] decrease the number of messages used to keep a consistent state of
the system.
Most of the issues described above are related to the partitioning problem or p-problem. This problem
consists of efficiently distributing the workload (avatars) among different servers in the system [12]. The
partitioning problem may seriously affect the overall performance of the DVE system, since it deter-
mines not only the workload that each server must support, but also the inter-server communication
requirements (and therefore the network traffic).
Some methods for solving the partitioning problem have already been proposed [13–15]. These
methods provide efficient solutions even for large DVE systems. However, some features of
the proposed methods can still be improved. For example, different heuristic search methods can be
used for finding the best assignment of clients to servers, instead of using ad hoc heuristics. In this chapter,
we present a comparison study of several evolutive heuristics for solving the partitioning problem in DVE
systems. We have implemented five different heuristics, ranging over most of the current taxonomy of
heuristics: Genetic Algorithms (GAs) [16], two different implementations of Simulated Annealing [17],
Ant Colony Systems (ACSs) [18], and Greedy Randomized Adaptive Search (GRASP) [19]. Performance
evaluation results show that the execution cost of the partitioning algorithm (in terms of execution times)
can be dramatically reduced, while providing similar or even better solutions than the ones provided by
the ad hoc heuristic proposed in [14].
The rest of the chapter is organized as follows: Section 30.2 describes the partitioning problem and
the existing proposals for solving it. Section 30.3 describes the proposed implementations of the heur-
istics considered for this study. Next, Section 30.4 presents the performance evaluation of the proposed
heuristics. Finally, Section 30.5 presents some concluding remarks.
FIGURE 30.1 Architectures: (a) peer-to-peer, (b) server–network, (c) client–server, and (d) peer-to-server.
[Figure: a networked-server DVE architecture, showing inter-server and inner-server communication among Server1, Server2, and Server3 over LAN–WAN links, and the AOI of avatars.]
the computing, storage, and communication requirements for maintaining a consistent state of the avatars
in a DVE system.
The partitioning problem consists of efficiently distributing the workload (assigning avatars) among
the different servers in the system. Lui and Chan have shown the key role of finding a good assignment
of avatars to servers in order to ensure both a good frame rate and a minimum network traffic in DVE
systems [12,14]. They propose a quality function, denoted as Cp , for evaluating each partition (assignment
of avatars to servers). This quality function takes into account two parameters. One of them consists of the
computing workload generated by clients in the DVE system, and is denoted as CpW . In order to minimize
this parameter, the computing workload should be proportionally shared among all the servers in the DVE
system, according to the computing resources of each server. The other parameter of the quality function
consists of the overall number of inter-server messages, and it is denoted as CpL . In order to minimize this
parameter, avatars sharing the same AOI should be assigned to the same server. Thus, quality function Cp
is defined as

Cp = W1 × CpW + W2 × CpL , (30.1)

where W1 + W2 = 1. W1 and W2 are two coefficients that weigh the relative importance of the
computational and communication workload, respectively. These coefficients should be tuned according to
the specific features of each DVE system. Using this quality function (and assuming W1 = W2 = 0.5),
Lui and Chan propose a partitioning algorithm that reassigns clients to servers [14]. The partitioning
algorithm should be periodically executed for adapting the partition to the current state of the DVE
system as it evolves (avatars can join or leave the DVE system at any moment, and they can also move
everywhere within the simulated virtual world). Lui and Chan also have proposed a testing platform for
the performance evaluation of DVE systems, as well as a parallelization of the partitioning algorithm [14].
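To make the cost function concrete, evaluating Cp for a candidate partition might be sketched as follows (a minimal Python illustration of Equation 30.1 under our own simplifying assumptions: CpW is taken as the total deviation of server workloads from their mean and CpL as the number of AOI pairs split between servers; the helper names are ours, not those of Reference 14):

def quality(assignment, workload, aoi_pairs, W1=0.5, W2=0.5):
    """Evaluate Cp = W1*CpW + W2*CpL for a partition (Equation 30.1)."""
    servers = set(assignment.values())
    # CpW: imbalance of the computing workload across servers (simplified).
    per_server = {s: 0.0 for s in servers}
    for av, s in assignment.items():
        per_server[s] += workload[av]
    mean = sum(per_server.values()) / len(servers)
    cpw = sum(abs(load - mean) for load in per_server.values())
    # CpL: AOI pairs split across two servers, i.e., inter-server messages.
    cpl = sum(assignment[a] != assignment[b] for a, b in aoi_pairs)
    return W1 * cpw + W2 * cpl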
The partitioning method proposed by Lui and Chan, known as LOT or Linear Optimization Technique,
currently provides the best results for DVE systems. However, it uses an ad hoc heuristic. We propose
a comparative study of several heuristics, ranging over most of the current taxonomy of heuristics,
in order to determine which one provides the best performance when applied to the partitioning problem
in DVE systems. In this study, we propose the same approach as Lui–Chan: using the same quality
function, we will obtain an initial partition (assignment) of avatars to servers, and then we will test the
implementation of each heuristic to provide a near optimal assignment.
FIGURE 30.3 Partition of a DVE performed by LOT method: (a) RBP, (b) CRP, and (c) LP.
Then, the Communication Refinement Partitioning (CRP) and the Layering Partitioning (LP) procedures are
applied on that initial partition of the graph. These procedures perform workload balancing and
minimize the number of inter-server messages, respectively.
Figure 30.3 shows the partitions that this method would provide when applied to a small DVE system.
In this example, a DVE composed of 10 avatars is simulated with three identical servers. The nodes and edges
of the associated graph have been labeled. The label of a node (avatar) represents an estimate
of the workload that this avatar generates for the server to which it is going to be assigned. Ranging from 1 to 10,
the label of each edge represents the nearness of the two avatars it connects. Figure 30.3(a) represents the result
obtained by the RBP phase. Although the number of avatars assigned to each server seems to be balanced, the
workload that those avatars generate must also be uniform in order to achieve actual workload balancing.
By adding the labels of all the nodes assigned to the same server, we obtain that the workload assigned
to each server is 16, 7, and 12 units of workload, respectively. Then, CRP balances (Figure 30.3[b]) the
existing workload into sets of 12, 11, and 12 units. At that point, the number of inter-server messages
generated by these sets of avatars is reduced by LP, as shown in Figure 30.3(c). Since the strategies of the CRP
and LP techniques can work against each other, this pair of steps is repeated three times.
S/2 ≤ AOI ≤ S. (30.2)

[Figure 30.4: a DVE world divided into square cells containing avatars A–G, and the weighted graph derived from the number of avatars in each cell.]
A graph representation of the virtual scene is obtained from the volume of avatars contained in each
cell, as also shown in Figure 30.4. Next, this graph is divided into partitions and each partition is assigned
to a server of the DVE. In order to accomplish this division, exhaustive and greedy algorithms are
compared. Using a quality function different from Cp , these algorithms also take into account the number of
inter-server messages and workload balancing among the servers. Although this approach provides a fast way of
solving the partitioning problem, the performance of the static partitioning is quite low when avatars show
a clustered distribution. In this case, the servers controlling the crowded areas are overloaded, increasing
the overall cost of the quality function.
Although the results of this comparison are not shown here due to space limitations, we obtained the best
results for a Density-Based Algorithm (DBA) [24].
This algorithm divides the virtual 3D scene into square sections. Each section is labeled with the number
of avatars that it contains (na), and all the sections are sorted (using the Quick-sort algorithm) by their na
value. The first S sections in the sorted list are then selected, and each is assigned to a given server, where S is the
number of servers in the DVE system. That is, all the avatars in a selected section are assigned to a single
server. The next step consists of computing the mass-center (mc) of the avatars assigned to each server.
Using a round-robin scheme, the algorithm then chooses the closest free avatar to the mc of each server,
assigning that avatar to that server, until all avatars are assigned.
The proposed implementation of the DBA method consists of the following steps (expressed as pseudo-
code statements):
program Initial_Partition (avatar, Int n, Int S)
const
  n_sections = 25x25
type Cell
  sum,idx :Int
var
  assigned,represent :Int[]
  pivot,ncentr       :Int
  min_dis,dist_tmp   :Real
  na                 :Cell[n_sections]
begin
  DivideSceneInSquareSections(n_sections)
  for i:=0 to n_sections do
    na[i].sum := CountAvatarsInSection(i)
    na[i].idx := i
  end_for
  QuickSort(na)
  for i:=0 to S do
    represent[i] := ObtainMC(na[i].idx)
  end_for
  pivot := 0, ncentr := 0
  for i:=0 to n do
    min_dis := 100000
    for j:=0 to n do
      if (assigned[j] = NOT_ASSIGNED)
        dist_tmp := EuclideanDistance(avatar[j],represent[ncentr])
        if (dist_tmp < min_dis)
          pivot := j
          min_dis := dist_tmp
        endif
      endif
    end_for
    avatar[pivot].assignment := assigned[pivot] := ncentr
    ncentr := (ncentr + 1) mod S
  end_for
end
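For illustration, the same round-robin mass-center assignment can be written as a short runnable Python sketch (our own rendering, assuming avatars are 2D points in the unit square and that the S most crowded cells of a 25 × 25 grid seed the servers; the function names are ours):

import numpy as np

def dba_partition(avatars, S, grid=25):
    """Assign each avatar (a 2D point in [0,1)^2) to one of S servers."""
    avatars = np.asarray(avatars)
    # Count avatars per grid cell and select the S most crowded cells.
    cells = (avatars * grid).astype(int).clip(0, grid - 1)
    cell_ids = cells[:, 0] * grid + cells[:, 1]
    counts = np.bincount(cell_ids, minlength=grid * grid)
    seeds = np.argsort(counts)[::-1][:S]
    # Mass-center (mc) of the avatars inside each selected cell.
    mc = np.array([avatars[cell_ids == c].mean(axis=0) for c in seeds])
    assignment = np.full(len(avatars), -1)
    server = 0
    for _ in range(len(avatars)):
        free = np.where(assignment == -1)[0]
        # Closest free avatar to the mass-center of the current server.
        d = np.linalg.norm(avatars[free] - mc[server], axis=1)
        assignment[free[np.argmin(d)]] = server
        server = (server + 1) % S  # round-robin over the servers
    return assignment

# Example: 100 random avatars distributed among 3 servers.
rng = np.random.default_rng(0)
print(dba_partition(rng.random((100, 2)), S=3))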
Since the assignment of avatars follows a round-robin scheme, this algorithm provides a good balancing
of the computing workload (the number of avatars assigned to each server does not differ by
more than one). On the other hand, avatars that are grouped in a small region close to the mass-center of
a server will be assigned to that server by the DBA. Additionally, since these avatars are located so closely,
they will probably share the same AOI. Therefore, the DBA also provides an initial partition with low
inter-server communication requirements for those avatars.
However, the assignment of avatars equidistant from (or located far away from) the mass-centers of the
servers is critical for obtaining a partition with minimum inter-server communication requirements
(minimum values of the quality function Cp ), particularly for SMALL virtual worlds with only a few
servers. DBA inherently provides good assignments for clustered avatars, but it does not properly focus
on the assignment of these critical avatars. Each of the following evolutive methods can be used at this
point to search for a near optimal assignment that properly reassigns these avatars.
program GA (Int iterations, Int chromo, Real mut_rate) {header reconstructed from the parameters used below}
var
  B,temp_cost,Cp_GA :Real
  av_i,av_j,av_k    :Integer
begin
  Initial_Partition (DBA)
  B := ObtainBorderAvatars()
  Cp_GA := Compute_Cp()
  For i:=0 to iterations do
    For j:=0 to chromo do
      SelectAndCopyChromosome(j)
      Choose2DifRandomAvatars(B,av_i,av_j)
      ExchangeServerAssignment(av_i,av_j)
      temp_cost := Compute_Cp()
      if (HaveItoMutate(mut_rate))
        Choose1RandomAvatar(av_i)
        ForceServerAssignment(av_i)
      endif
      AddChromosomeToPopulation(this)
    end_for
    SortPopulationByCp(2*chromo)
    Cp_GA := SelectBestIndividuals(chromo)
  end_for
end
Figure 30.5 shows an example of the generation of new individuals in a SMALL DVE system where six border
avatars (BA) have been obtained from the full set of avatars. These avatars define a chromosome, and they can
be assigned to any of the three servers in the DVE system. The figure represents a possible crossover and
mutation on this chromosome.
Although this basic GA approach performs reasonably well, it is based on the generation of an initial
population obtained by deriving a unique solution provided by the clustering algorithm described in
Section 30.3.1. Although this feature offers an improved initial population of feasible solutions,
it does not focus on maximizing the structural diversity of chromosomes. As described in References 27
and 28, this low level of structural diversity can lead the algorithm to reach a local minimum or even
a poorer approximation of this value. Additionally, the crossover mechanism used by this algorithm is
based on an auto-fertilization technique, where chromosomes are derived following a single-point
crossover [26]. This crossover mechanism is excessively generalist, and it is possible to define new crossover
strategies better oriented to the problem specification.
[Figure: an example chromosome (S0 S1 S0 S2 S2 S0), a crossover by swapping two genes, and a single-gene mutation to S1.]
Operator 1: Random exchange of the current assignment for two border avatars. A given avatar Ai is a
border avatar if it is assigned to a certain server Sr in the initial partition and any of the avatars in
its AOI is assigned to a server different from Sr [31].
Operator 2: Once a border avatar Ai has been randomly selected, it is randomly assigned to one of the
servers Sf hosting the border avatars of Ai .
Operator 3: Besides the step described in the previous operator, if there exists an avatar Aj such that Aj
is assigned to Sf and it is a neighbor avatar of Ai , then Aj is assigned to Sr .
Operator 4: Since each avatar generates a certain level of workload in the server where it is assigned
to [12], then it is possible to sort the servers of a DVE system according to the level of workload
they support. If Sm and Sn are the servers with the highest and the lowest level of workload in the
system, respectively, then a random avatar Ak assigned to Sm is assigned to Sn .
Operator 5: Besides the step described in the previous operator, a random avatar Al , initially assigned
to Sn , is now assigned to Sm .
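As an illustration, operators 1 and 4 might look as follows (a Python sketch under our own assumptions: assign maps avatars to servers, border lists the border avatars, and load gives each server's current workload; none of these names come from the chapter):

import random

def op1_exchange(assign, border):
    """Operator 1: swap the server assignments of two random border avatars."""
    a, b = random.sample(border, 2)
    assign[a], assign[b] = assign[b], assign[a]

def op4_rebalance(assign, load):
    """Operator 4: move a random avatar from the most loaded server (Sm)
    to the least loaded one (Sn)."""
    sm = max(load, key=load.get)  # server with the highest workload
    sn = min(load, key=load.get)  # server with the lowest workload
    victim = random.choice([av for av, s in assign.items() if s == sm])
    assign[victim] = sn

The remaining operators are small variants of these two: operator 2 reassigns a single border avatar, while operators 3 and 5 add a symmetric counter-move.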
The main parameters to be tuned in the GA method are the population size P, the number of iterations N,
and the mutation rate M. Figure 30.7 depicts the values of the quality function Cp (denoted as system
cost) as the number of generations and individuals grows. In both cases Cp decreases significantly until
the algorithm reaches a threshold value. From this threshold value (close to 300 for the number
of generations and 15 for the population size) the quality of the obtained solutions remains constant,
showing the impossibility of finding better solutions in this search domain. Therefore, it makes no sense to
spend more time searching for new partitioning solutions.
Figure 30.8 shows the tuning of the mutation rate in a given DVE system. The behavior of the algorithm
is different for this parameter. In this case, the algorithm is able to provide a high-quality solution when
the mutation rate is close to 1%.
If the GA approach uses values much lower than 1% for this parameter, the system can be trapped in a local
minimum. In the opposite case, when the GA selects rates higher than this threshold value, the search
method spends too much CPU time testing useless solutions.
FIGURE 30.7 Values of the quality function Cp for different numbers of generations and population sizes.
FIGURE 30.8 Values of the quality function Cp for different mutation coefficients.
As a conclusion of this tuning phase, we can state that for this particular application the GA method
provides good results for a population of 15 individuals, a mutation probability of 1%, and 100 iterations
as the stop criterion.
equally decreased in all the candidate servers of all of the border avatars, according to the evaporation
rate (the pheromone evaporates at a given rate). The ACS method ends when all the iterations have been
performed.
In the process described earlier, each ant must assign each border avatar to one of the candidate servers
for that avatar. Thus, a selection value is computed for each of the candidate servers. The selection value
Sv is defined as
Sv = α × pheromone + β × Cp , (30.3)
where pheromone is the current pheromone level associated with that server, Cp is the resulting value of
the quality function when the border avatar is assigned to that server instead of the current server, and
α and β are weighting coefficients that must also be tuned. The server with the highest selection value will
be chosen by that ant for that border avatar.
On the other hand, when a partial solution is found, the pheromone level must be increased in those
servers to which the border avatars are assigned in that solution. The pheromone level is increased using
the following formula:

pheromone = pheromone + Q × (1/Cp). (30.4)
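Put together, an ant's decision for one border avatar and the subsequent reinforcement might be rendered as follows (our own minimal Python sketch of Equations 30.3 and 30.4; pheromone and cp_if_assigned are assumed data structures, not names from the chapter):

ALPHA, BETA, Q = 1.0, 7.0, 1000.0

def choose_server(candidates, pheromone, cp_if_assigned):
    """Pick the candidate server with the highest selection value Sv
    (Equation 30.3)."""
    return max(candidates,
               key=lambda s: ALPHA * pheromone[s] + BETA * cp_if_assigned[s])

def reinforce(solution_servers, pheromone, cp):
    """Raise the pheromone level of the servers used in an improving
    solution (Equation 30.4)."""
    for s in solution_servers:
        pheromone[s] += Q * (1.0 / cp)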
Following this description, the proposed implementation of the ACS search method consists of the
following steps:
program ACS (Int Ants, Int iterations, Real evap_rate)
const
alpha = 1.0
beta = 7.0
Q = 1000
var
temp_sol :Real[Number_of_Avatars]
L :Integer[]
B,Cp_ACS,temp_cost :Real
begin
Initial_Partition (DBA)
B := ObtainBorderAvatars()
Cp_ACS := Compute_Cp()
For i:=0 to iterations do
For j:=0 to Ants do
For k:=0 to B do
L[k] := ChooseServer(alpha,beta,Q)
end_for
temp_sol := Compose_Solution(B)
temp_cost:= Obtain_Cp(temp_sol)
if (temp_cost < Cp_ACS)
Cp_ACS := temp_cost
IncreasePheromone (B,Q)
endif
end_for
DecreasePheromone(evap_rate)
end_for
end
FIGURE 30.9 Values of the quality function Cp for different numbers of iterations and ants.
FIGURE 30.10 Values of the quality function Cp for different evaporation rates.
Like in the GA approach, there are some parameters in the ACS search method that must be properly tuned.
In particular, the values for the number of ants N , the pheromone evaporation rate, and the number of
iterations that the ACS method must perform should be tuned.
Figure 30.9 shows the values of the quality function Cp (denoted as system cost) reached by the ACS
method when different numbers of ants and iterations are considered. It shows that Cp decreases as
the number of iterations increases, until a value of 25 iterations is reached. The same behavior appears
when the number of ants grows; in this case the turning point is 100 ants. From that point, the system cost Cp
slightly increases or remains constant, depending on the considered distribution of avatars. The reason
for this behavior is that the existing pheromone level keeps the search method from finding better search
paths even when more iterations are performed. Thus, the numbers of iterations and ants selected for the
ACS method have been 25 and 100, respectively.
Finally, Figure 30.10 shows the values of Cp reached by the ACS method when different pheromone
evaporation rates are considered. This figure shows on the x-axis the percentage decrease in pheromone
level that all candidate servers suffer after each iteration. It shows that for all the considered distributions
Cp decreases as the evaporation rate increases, until a value of 1% is reached. The reason for this
behavior is that for evaporation rates lower than 1% the pheromone level keeps the search method from
escaping from local minima, thus decreasing performance. From that point on, the system cost Cp increases,
since pheromone evaporation is too high and the search method cannot properly explore good search
paths. Thus, a coefficient of 1% has been selected as the optimal value of the evaporation rate.
Additionally, we have performed empirical studies in order to obtain the best values for the α, β, and
Q coefficients. Although the results are not shown here for the sake of brevity, we have obtained the
best behavior of the ACS method for α = 1.0, β = 7.0, and Q = 1000. These are the values used for the
ACS algorithm in the pseudo-code shown earlier.
where N determines the finishing condition of the search. When N iterations are performed without
finding a partition that decreases the value of the quality function Cp , the search finishes.
The following code shows the described implementation based on SA:
program SA (Int iterations, Real dec_t_rate)
var
B,temp_cost,Cp_SA :Real
delta_sup :Real
av_i,av_j :Integer
begin
Initial_Partition (DBA)
B := ObtainBorderAvatars()
Cp_SA := Compute_Cp()
For i:=0 to iterations do
Choose2DifRandomAvatars(B,av_i,av_j)
ExchangeServerAssignment(av_i,av_j)
temp_cost := Compute_Cp()
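    {The remainder of this block was lost in extraction. The lines below
     reconstruct the standard acceptance step implied by the surrounding
     text (a Metropolis criterion with a temperature, assumed initialized
     before the loop, that decreases at rate dec_t_rate); they are not the
     authors' verbatim code.}
    delta_sup := temp_cost - Cp_SA
    if (delta_sup < 0) or (Random() < exp(-delta_sup/temperature))
      Cp_SA := temp_cost
    else
      ExchangeServerAssignment(av_i,av_j) {undo the exchange}
    endif
    temperature := temperature*(1 - dec_t_rate)
  end_for
end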
In order to improve the performance of the SA method, we measured the impact of exchanging
groups of avatars, instead of exchanging one avatar in each iteration. Table 30.1 compares the results
(in terms of both the value of the quality function Cp and the execution times) of exchanging
groups of two, three, and four avatars. We tested each option under three different distributions of avatars
in the virtual world (uniform, skewed, and clustered). These distributions are detailed in Section 30.4.
These results have been obtained for a LARGE world composed of 2500 avatars and
8 servers. This table shows that the best option for the SA method is to exchange as few avatars
as possible in each permutation. Therefore, we exchanged a single avatar in all the simulations performed
in our study.
The two key issues for properly tuning this heuristic search method are the number of
iterations N and the temperature decreasing rate R [33]. Figure 30.11 shows the performance (in terms
of Cp values) obtained with the SA algorithm for a LARGE world when the number of iterations increases.
From Figure 30.11 we can conclude that performing more iterations provides better values
of Cp . However, the slope of this plot decreases from a certain number of iterations. We have considered
the value of 3000 iterations to be that point, and we have tested the SA method with this number of
iterations.
System temperature shows a different behavior in terms of Cp . Figure 30.12 shows the values of Cp
obtained when the temperature decreasing rate is modified in a LARGE world. It clearly shows that the
FIGURE 30.11 Values of the quality function Cp for different numbers of iterations.
FIGURE 30.12 Values of the quality function Cp for different temperature decreasing rates.
quality of the obtained solutions does not follow a linear progression. Effectively, since the temperature
decreasing rate allows the SA approach to escape from local minima, a threshold value appears when this
parameter is modified. As this rate approaches 1.15, the algorithm abandons local minima much
faster, and therefore the quality of the obtained solution increases. Beyond this value, the risk of accepting
inefficient exchanges of avatars is much too high, and thus the algorithm is unable to find the right path.
[FIGURE 30.13: an initial partition in which avatars are assigned to servers A, B, and C, leaving a set of nonassigned (critical) avatars.]
local search also provides a server assignment of that border avatar in the same iteration, following the
next procedure: first, the resulting cost Cp of adding each nonassigned critical avatar to the current (initial)
partition is computed. Since each border avatar can be assigned to different servers, the cost of assigning
each border avatar to each server is computed, forming the List of Candidates (LC) (each element in this
list has the form (nonassigned border avatar, server, resulting cost)). This list is sorted (using the Quick-sort
algorithm) by the resulting cost Cp in descending order, and is then reduced to its top quartile. One
element of this reduced list of candidates (RLC) is then randomly chosen (construction phase). Next,
an extensive search is performed in the AOI of the selected avatar. That is, all the possible assignments
of the avatars in the AOI of the selected avatar are computed, and the assignment with the lowest Cp
is kept.
The following code describes how the GRASP approach works when n avatars are assigned to S servers in a DVE
system:
program GRASP (Int n, Int S, Int threshold) {header reconstructed from the parameters used below}
type New_sol
  idx_av,idx_ser :Int
  new_cost       :Real
var
  tmp_cost,Cp_GRASP :Real
  non_assig         :Integer
  avatar,server     :Int
  list              :New_sol[]
begin
  Initial_Partition (DBA-R,threshold)
  non_assig := n - threshold
  For i:=0 to non_assig do
    for j:=0 to n do
      for k:=0 to S do
        tmp_cost := TestSolution(j,k)
        AddToList(list,tmp_cost,j,k)
      end_for
    end_for
    QuickSort(list)
    ReduceToFirstQuartile(list)
    ChooseRandomElement(list,avatar,server)
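    {The remainder of this block was lost in extraction. The lines below
     reconstruct the steps described in the text above (assign the chosen
     avatar, then search its AOI extensively); they are not the authors'
     verbatim code.}
    AssignAvatarToServer(avatar,server)
    ExtensiveSearchInAOI(avatar) {keep the assignment with the lowest Cp}
    Cp_GRASP := Compute_Cp()
    EmptyList(list)
  end_for
end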
FIGURE 30.14 Variation of the performance measures (system cost Cp and computation time, in seconds) for different threshold values.
The quality of the solutions provided by the GRASP search method depends on the quality of the elements
in the RLC, and the range of solutions depends on the length of the RLC. Thus, the main parameter to be
tuned in this case is the number N of nonassigned (critical) avatars that the initial partition must leave.
Figure 30.14 shows the results of this tuning phase. In this
example a LARGE world composed of 2500 avatars is assigned to eight servers. The avatars are located
following a uniform distribution. The figure represents the variation of two performance measures as
the number of critical avatars (iterations) is increased. The quality of the obtained solutions Cp and the
execution time of the GRASP algorithm have been selected as performance measures.
Figure 30.14 shows that as the number of critical avatars increases, the quality of the provided solutions
also increases (Cp values decrease), but the execution time of the GRASP algorithm (labeled as
computation) also increases. We chose in this case a compromise solution of 250 iterations. It is worth
mentioning that a larger number of iterations results in higher execution times without providing
significantly better solutions.
FIGURE 30.15 Distributions of avatars: (a) uniform, (b) skewed, and (c) clustered.
required by the search method in order to provide that partition. For comparison purposes, we have also
implemented the LOT [14]. This method currently provides the best results for the partitioning problem
in DVE systems. In the case of SMALL worlds we have also performed an exhaustive search through the
solution space, obtaining the best partition possible. The hardware platform used for the evaluation has
been a 1.7 GHz Pentium IV with 256 Mbytes of RAM.
Since the performance of the heuristic search methods may heavily depend on the location of avatars
in the virtual world, we have considered three different distributions of avatars: uniform, skewed, and
clustered distribution. Figure 30.15 shows an example of how avatars would be located in a 2D world
when following each one of these distributions.
Table 30.2 shows the Cp values corresponding to the final partitions provided by each heuristic search
method for a SMALL virtual world, as well as the execution times required by each method in order
to obtain that final partition. For the proposed approach based on genetic algorithms, GA-B represents
the basic approximation described in Section 30.3.2 and GA-I incorporates both the PA method and the
proposed crossover operator detailed in Section 30.3.3. It can be seen that all of the heuristics provide better
(lower) Cp values than the LOT search method for a uniform distribution of avatars. For the skewed and
clustered distributions, most of the heuristics also provide better Cp values than the LOT search method,
and some of them (the GA and SA methods) even provide the minimum value. However, the execution times
required by most of the heuristics are longer than the ones required by the LOT method. Only the GRASP
method provides worse Cp values than the LOT method, but it requires much shorter execution times.
Although these results do not clearly show which heuristic provides the best performance, they validate
any of the proposed heuristics as an alternative to the LOT search method.
However, in order to design a scalable DVE system, the partitioning method must provide good
performance when the number of avatars in the system increases. That is, it must provide good
performance especially for LARGE virtual worlds. Table 30.3 shows the required execution times and the Cp
values obtained by each heuristic search method for a LARGE virtual world.
When uniform distributions of avatars are considered, GA-I not only obtains the best partitioning
solutions (in terms of Cp ) but also achieves them in the minimum execution time. The rest
of the heuristics provide Cp values similar to the one provided by the LOT method, while requiring much
shorter execution times. When nonuniform distributions of avatars are considered, all the heuristics
provide much better Cp values than the LOT method and they also require much shorter execution
times. In particular, the GA-I method provides the best Cp values for nonuniform
distributions, also requiring the shortest execution times in the case of skewed and clustered distributions
of avatars.
These results show that the performance of the partitioning algorithm can be significantly improved
by simply using any of the proposed heuristics instead of the LOT method, thus increasing the scalability
of DVE systems. In particular, the GA-I method provides the best performance as a partitioning algorithm
for LARGE worlds.
30.5 Conclusions
In this chapter, we have presented a comparison study of modern heuristics for solving the partitioning
problem in DVE systems. This problem is the key issue in designing scalable and efficient DVE
systems. We have evaluated the implementation of different metaheuristics, ranging over most of the
current taxonomy of modern heuristics. We have tested the proposed heuristics when applied to both
SMALL and LARGE DVE systems, with different distributions of the existing avatars in the system. We have
compared these results with the ones provided by the LOT, the partitioning method that currently provides
the best solutions for DVE systems. For SMALL virtual worlds, we can conclude that in general terms
any of the implemented heuristics provides a partition with similar values of the quality function Cp , but
the execution times required by the implemented heuristics are longer than the time required by the LOT
search method. Although the SA and GA methods provide the minimum value of the quality function, only
the GRASP method provides execution times shorter than the ones required by the LOT method for all the
tested distributions of avatars. These results validate any of the proposed heuristics as an alternative to
the LOT search method when considering SMALL DVE systems. However, for LARGE virtual worlds any
of the proposed heuristics provides better Cp values and requires shorter execution times than the LOT
method for nonuniform distributions of avatars. In particular, the GA-I method provides the best results.
Since a scalable DVE system must be able to manage large numbers of avatars, we can conclude that these
results validate the GA-I search method as the best heuristic for solving the partitioning problem in
DVE systems.
References
[1] S. Singhal and M. Zyda. Networked Virtual Environments. ACM Press, 1999.
[2] J.M. Salles, R. Galli, A.C. Almeida, C.A.C. Belo, and J.M. Rebordão. mworld: A multiuser 3d virtual
environment. IEEE Computer Graphics, 17(2): 55–65, 1997.
[3] D.C. Miller and J.A. Thorpe. Simnet: The advent of simulator networking. Proceedings of the IEEE,
83: 1114–1123, 1995.
[4] C. Bouras, D. Fotakis, and A. Philopoulos. A distributed virtual learning centre in cyber-
space. In Proceedings of the Fourth International Conference on Virtual Systems and Multimedia
(VSMM’98), November 1998.
[5] M. Abrash. Quake’s game engine. Dr. Dobb’s Journal, 51–63, Spring 1997.
[6] M. Lewis and J. Jacobson. Game engines in scientific research. Communications of the ACM,
45: 17–31, 2002.
[7] Kali networked game support software. http://www.kali.net.
[8] M. Macedonia. A taxonomy for networked virtual environments. IEEE Multimedia, 4(1): 48–56,
1997.
[9] P.A. Bernstein, V. Hadzilacos, and N. Goodman. Concurrency Control and Recovery in Database
Systems. Addison-Wesley, Reading, MA, 1997.
[10] J. Falby, M. Zyda, D. Pratt, and R. Mackey. Npsnet: Hierarchical data structures for real-time
three-dimensional visual simulation. Computers & Graphics, 17(1): 65–69, 1993.
[11] D. Lee, M. Lim, and S. Han. Atlas — A scalable network framework for distributed virtual envi-
ronments. In Proceedings of ACM Collaborative Virtual Environments (CVE 2002), September 2002,
pp. 47–54.
[12] J.C.S. Lui, M.F. Chan, and K.Y. Oldfield. Dynamic Partitioning for a Distributed Virtual
Environment. Department of Computer Science, The Chinese University of Hong Kong, 1998.
[13] P. Barham and T. Paul. Exploiting reality with multicast groups. IEEE Computer Graphics and
Applications, 15: 38–45, 1995.
[14] J.C.S. Lui and M.F. Chan. An efficient partitioning algorithm for distributed virtual environment
systems. IEEE Transaction on Parallel and Distributed Systems, 13: 193–211, 2002.
[15] P.T. Tam. Communication Cost Optimization and Analysis in Distributed Virtual Environ-
ment. Technical report RM1026-TR98-0412, Department of Computer Science and Engineering.
The Chinese University of Hong Kong, 1998.
[16] R.L. Haupt and S.E. Haupt. Practical Genetic Algorithms. John Wiley & Sons, New York,
1997.
[17] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220:
671–679, 1983.
[18] M. Dorigo, V. Maniezzo, and A. Colorni. The ant system: Optimization by a colony of cooperating
agents. IEEE Transactions on Systems, Man and Cybernetics, Part B, 26: 29–41, 1996.
[19] H. Delmaire, J.A. Díaz, E.M. Fernández, and M. Ortega. Comparing New Heuristics for the Pure
Integer Capacitated Plant Location Problem. Technical report DR97/10, Department of Statistics
and Operations Research, Universitat Politecnica de Catalunya (Spain), 1997.
[20] D.B. Anderson, J.W. Barrus, and J.H. Howard. Building multi-user interactive multimedia
environments at MERL. IEEE Multimedia, 2(4): 77–82, 1995.
[21] F.C. Greenhalgh. Analysing movement and world transitions in virtual reality tele-conferencing. In
Proceedings of Fifth European Conference on Computer Supported Cooperative Work (ECSCW’97),
1997, pp. 313–328.
[22] J.C.S. Lui and W.K. Lam. General methodology in analysing the performance of parallel/distributed
simulation under general computational graphs. In Proceedings of Third International Conference
on the numerical Solution of Markov Chain, September 1996.
[23] C. Coello, G. Lamont, and D. Van Veldhuizen. Evolutionary Algorithms for Solving Multi-Objective
Problems. Kluwer Academic Publishers, 2002.
[24] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley Interscience, 2000.
[25] P. Morillo, M. Fernández, and N. Pelechano. A grid representation for distributed virtual envi-
ronments, acrossgrid’2003. In Proceedings of the First European Across Grids Conference, February
2003.
[26] K.E. Kinnear. Alternatives in automatic function definition: A comparison of performance. In
K.E. Kinnear, Ed., Advances in Genetic Programming. MIT Press, Cambridge, MA, 1994,
pp. 119–141.
[27] Z. Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs, 2nd ed. Springer-
Verlag, Heidelberg, 1992.
[28] J.H. Holland and D.E. Goldberg. Genetic algorithms and machine learning: Introduction to the
special issue on genetic algorithms. Machine Learning, 3, 1988.
[29] P. Morillo, M. Fernández, and J.M. Orduña. A comparison study of modern heuristics for solving
the partitioning problem in distributed virtual environment systems. In International Conference
in Computational Science and Its Applications (ICCSA’2003) Vol. 2669 of Lecture Notes in Computer
Science, Springer-Verlag, Heidelberg, 2003, pp. 458–467.
[30] R. Sedgewick. Algorithms in C, 3rd ed. Addison-Wesley, Reading, MA, 1998.
[31] P. Morillo, M. Fernández, and J.M. Orduña. An ACS-based partitioning method for distributed
virtual environment systems. In Proceedings of 2003 IEEE International Parallel and Distributed
Processing Symposium, NIDISC-IPDPS’2003, April 2003, p. 148.
[32] M. Dorigo, G. Di Caro, and M. Sampels. Ant Algorithms: Third International Workshop, Ants 2002.
Springer-Verlag, Heidelberg, 2002.
[33] C. Koulamas, R. Jaen, and S.R. Antony. A survey of simulated annealing applications to operations
research problems. International Journal of Management Science, 22: 41–56, 1994.
[34] T.A. Feo and M.G.C. Resende. Greedy randomized adaptive search procedures. Journal of Global
Optimization, 6: 109–133, 1995.
[35] M. Resende and C. Ribeiro. Handbook of Metaheuristics. Kluwer Academic Publishers.
[36] P. Morillo and M. Fernández. A grasp-based algorithm for solving DVE partitioning problem. In
Proceedings of 2003 IEEE International Parallel and Distributed Processing Symposium, IPDPS’2003,
April 2003, p. 60.
algorithms are used to model the microevolutionary process, and so-called Version Spaces are used
to model the macroevolutionary process of a cultural algorithm. A cultural algorithm models the
evolution of the culture component of an evolutionary computational system over time. The culture
component provides an explicit mechanism for the acquisition, storage, and integration of individual
and group problem-solving experience and behavior. In a cultural algorithm, there are two main
spaces: the normal population of the evolutionary algorithm and the belief space, which is the place
where the shared acquired knowledge is stored during the evolution of the population (Chung and
Reynolds, 1998).
Moscato and Norman (1989) introduced the term memetic algorithm (MA) to describe evolutionary
algorithms in which local search plays a significant part. The term is motivated by Richard Dawkins's
notion of a meme as a unit of information that reproduces itself as people exchange ideas (Dawkins,
1976). Moscato and Norman liken this thinking to local refinement, and therefore promote the term
"memetic algorithm" to describe genetic algorithms that use local search heavily. While genetic algorithms
have been inspired by biological evolution, MAs try to mimic cultural evolution. An MA is a
marriage between a population-based global search and the heuristic local search made by each of the
individuals.
Given a representation of an optimization problem, a certain number of individuals are created. The
state of these individuals can be randomly chosen or set according to a certain initialization procedure;
a heuristic can be chosen to initialize the population. After that, each individual performs a local search.
When an individual has reached a certain development, it interacts with the other members of the
population. The interaction can be competitive or cooperative. The cooperative behavior can be
understood as the mechanism of crossover in GAs or other types of breeding that result in the creation
of a new individual. More generally, cooperation is understood as an interchange of information. The
local search and cooperation (mating, interchange of information) or competition (selection of better
individuals) are repeated until a stopping criterion is satisfied. Usually, the criterion should involve a
measure of diversity within the population.
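A generic memetic loop along these lines might be sketched as follows (our own Python illustration; local_search, crossover, and fitness are placeholder functions, and higher fitness is assumed to be better):

import random

def memetic(pop, fitness, local_search, crossover, generations=100):
    """Generic memetic algorithm: population-based global search combined
    with local refinement of every individual."""
    pop = [local_search(ind) for ind in pop]      # initial refinement
    for _ in range(generations):
        a, b = random.sample(pop, 2)              # cooperation: mating
        child = local_search(crossover(a, b))     # refine the offspring
        worst = min(range(len(pop)), key=lambda i: fitness(pop[i]))
        if fitness(child) > fitness(pop[worst]):  # competition: selection
            pop[worst] = child
    return max(pop, key=fitness)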
An important addition to the family of population-based methods is GRASP. A greedy randomized
adaptive search procedure is a metaheuristic for combinatorial optimization. It is a multi-start
or iterative process in which each iteration consists of two phases: a construction one, in which a
feasible solution is produced, and a local search one, in which a local optimum in the neighborhood
of the constructed solution is sought (Feo and Resende, 1995). In the construction phase a feasible
solution is iteratively constructed, one element at a time. At each construction step, the choice
of the next element to be added is determined by ordering all candidate elements in a candidate list
C with respect to a greedy function measuring the benefit of selecting each element. The heuristic
is adaptive because the benefits associated with every element are updated at each iteration of the
construction phase to reflect the changes brought on by the selection of the previous element. The
probabilistic component of a GRASP is characterized by randomly choosing one of the best candidates
in the list, but not necessarily the top one. The list of best candidates is called the restricted candidate
list (RCL).
The solutions generated by a GRASP construction are not guaranteed to be locally optimal. Hence,
it is almost always beneficial to apply a local search in an attempt to improve each constructed solution;
it terminates when no better solution is found in the neighborhood. While such local optimization
procedures can require exponential time from an arbitrary starting point, empirically their efficiency
significantly improves as the initial solution improves. The result is that often many GRASP solutions
are generated in the same amount of time required for the local optimization procedure to converge
from a single random start. Furthermore, the best of these GRASP solutions is generally significantly
better than the single solution obtained from a random starting point. GRASP can be easily
implemented in parallel: each processor can be initialized with its own copy of the procedure, the
instance data, and an independent random number sequence. The GRASP iterations are then performed
in parallel with only a single global variable required to store the best solution found over
all processors.
The above review, by no means exhaustive, allows us to draw the following general conclusions:
• Population-based algorithms and techniques have become a standard tool to deal effectively with
computationally complex problems.
• There are several ingredients common to all population-based approaches. Among them, the most
characteristic include diversification by means of generating a population of individuals that are
solutions or parts of solutions, and the introduction of some random noise at various stages of the search
for a solution. All population-based algorithms are equipped with tools allowing them to exploit
information gathered during computation with a view to dropping less promising directions of search.
Finally, all population-based approaches mimic some natural biological or social processes.
• A common framework of properties characterizing population-based methods still allows
for design flexibility and the development of new population-based algorithms, each having its own
distinctive features and strengths, which may prove effective in solving target problem types.
[Figure: PLA implementation schemes. (a) A sequential scheme: generate the initial population P; define learning/improvement procedures LEARNi(P) and selection procedures SELECTi(P), i = 1...N, operating on the population of individuals P; apply them in turn for i = 1...N; consider the best individual from P as the solution. (b) A simple parallel scheme: n initial populations evolve independently through intermediate to final populations, from which a solution is taken. (c) A parallel scheme with information exchange between the concurrent populations.]
Scheme (b) is a simple parallel implementation, assuming that the learning and improvement procedures
used at various stages do not require any information exchange between the concurrent populations.
Finally, Scheme (c) is a parallel implementation with information exchange. Such a scheme (or
its variants) should be used in case some information is exchanged between the concurrent populations
during the learning and improvement process. The choice of the appropriate PLA scheme depends
on both the computational resources available and the characteristics of the learning and improvement
procedures used.
Designing a population learning algorithm intended for solving a particular problem type allows the
designer a lot of freedom (as, in fact, happens in the case of the majority of other population-based
algorithms). Moreover, an effective PLA would certainly require a lot of fine-tuning and experimenting.
This could be considered a disadvantage, at least as long as the process of setting the different parameters
of the population learning algorithm is based on heuristics rather than on theoretical or statistical rules,
which, unfortunately, are not yet appropriately developed. The main PLA design elements are summarized in
Table 31.1.
Although the PLA shares many features with other population-based approaches, it clearly has its own
distinctive characteristics. A brief comparison of several example algorithms belonging to the discussed
class is shown in Table 31.2.
In the following sections several example implementations of the PLA applied to solving different
computationally difficult problems are discussed. The PLA is seen here as a general framework for constructing
hybrid solutions to difficult computational problems. From such a perspective the PLA role is to structure,
organize, sequence, and eventually help to apply a variety of techniques in a parallel environment.
The strength of the PLA stems from combining in an "intelligent" manner the power of population-based
algorithms, which use some random mechanism for diversity assurance, with the efficiency of various local
search algorithms. The latter may include, for example, reactive search, tabu search, and simulated annealing,
as well as the population-based approaches described earlier.
Computational intelligence embedded into the population learning algorithm scheme is based on the
following heuristic rules:
• To solve difficult computational problems, apply a cocktail of methods and techniques, including
random and local search techniques, greedy and construction algorithms, etc., building upon their
strengths and masking their weaknesses.
• To escape getting trapped in a local optimum, generate or construct an initial population of
solutions, called individuals, which in the following stages will be improved, thus increasing the chances
of reaching a global optimum.
• Another means of avoiding getting trapped in local optima is to apply, at various stages of the search
for a global optimum, some random diversification algorithms.
• To increase the effectiveness of searching for a global optimum, divide the process into stages, retaining
after each stage only the part of the population consisting of "better" or "more promising" individuals.
• Another means of increasing effectiveness is to use at early stages of the search improvement algorithms
with lower computational complexity as compared to those used at the final stages.
To conclude this section, it should be noted that the PLA is an addition to the family of population-based
techniques that can be seen as a hybridization framework allowing for an effective integration of
different deterministic and random local search techniques. Figure 31.3 summarizes the main features of the
population learning algorithm.
∑i ai Ei + ∑i bi Ti → min, i = 1, . . . , n.
It should also be noted that the due date is called unrestrictive if the optimal sequence of tasks can
be constructed without considering the value of the due date. Otherwise the common due date is called
restrictive. Obviously, a common due date for which d ≥ ∑i pi holds is unrestrictive.
In a permutation flow shop there is a set of n jobs. Each of the n jobs has to be processed on m machines
1, . . . , m in this order. The processing time of job i on machine j is pij , where pij is fixed and nonnegative.
At any time, each job can be processed on at most one machine, and each machine can process at most one
job. The jobs are available at time 0 and the processing of a job may not be interrupted. In the permutation
flow shop problem (PFSP) the job order is the same on every machine. The objective is to find a job
sequence minimizing the schedule makespan (i.e., the completion time of the last job).
In the single machine weighted tardiness problem there is a set of n jobs. Each of the n jobs (numbered
1, . . . , n) is to be processed without interruption on a single machine that can handle no more than one
job at a time. Job j (j = 1, . . . , n) becomes available for processing at time zero, requires an uninterrupted
positive processing time pj on the machine, has a positive weight wj , and has a due date dj by which it
should ideally be finished. For a given processing order of the jobs, the earliest completion time Cj and
the tardiness Tj = max{Cj − dj , 0} of job j (j = 1, . . . , n) can readily be computed. The problem is to find
a processing order of the jobs with minimum total weighted tardiness ∑j wj Tj .
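For a concrete reading of this objective, the cost of a given processing order can be computed in a few lines (an illustrative Python sketch with made-up example data):

def total_weighted_tardiness(order, p, w, d):
    """Sum of w[j] * max(C[j] - d[j], 0) over the jobs in the given order."""
    t, cost = 0, 0
    for j in order:
        t += p[j]                        # completion time C[j] of job j
        cost += w[j] * max(t - d[j], 0)
    return cost

# Three jobs with processing times p, weights w, and due dates d.
p, w, d = [3, 2, 4], [1, 2, 1], [4, 3, 9]
print(total_weighted_tardiness([1, 0, 2], p, w, d))  # prints 1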
Finally, multiprocessor task scheduling in multistage hybrid flow shops is considered. The discussed
problem involves scheduling n jobs composed of tasks in a hybrid flow shop with m stages. All jobs
have the same processing order through the machines, that is, a job is composed of an ordered list of
multiprocessor tasks where the i-th task of each job is processed at the i-th flow shop stage (the number
of tasks within a job corresponds exactly to the number of flow shop stages). The processing order of tasks
flowing through stages is the same for all jobs. At each stage i, i = 1, . . . , m, there are mi identical
parallel processors available. For processing at stage i, task i, being a part of job j, j = 1, . . . , n, requires
sizei,j processors simultaneously. That is, the sizei,j processors assigned to task i at stage i start processing the
task simultaneously and continue doing so for a period of time equal to the processing time requirement
of this task, denoted pi,j . Each subset of available processors can process only the task assigned to it at
a time. The processors do not break down. All jobs are ready at the beginning of the scheduling period.
Preemption of tasks is not allowed. The objective is to minimize the makespan, that is, the completion time
of the last scheduled task in the last stage.
To solve permutation scheduling problems, three versions of the population learning algorithm, denoted
respectively as PLA1, PLA2, and PLA3, have been designed and implemented. PLA1 has been used to solve
the restricted instances of the common due date scheduling problem, as well as instances of the flow shop
and total tardiness problems. PLA2 has been used to solve the unrestricted instances of the common
due date problem. PLA3 has been designed to solve instances of the permutation flow shop and total weighted
tardiness problems, as well as instances of multiprocessor task scheduling in multistage hybrid flow shops.
The proposed algorithms make use of different learning and improvement procedures, which, in turn,
are based on the five neighborhood structures shown in Table 31.3. In what follows x denotes an individual
encoded as a sequence of natural numbers (a permutation of tasks) and g (x) its fitness function.
All learning and improvement (l & p) procedures operate on the population of individuals P, and
perform local search algorithms based on the above defined neighborhood structures. General structure
of a l & p procedure is shown in the following pseudo-code:
Procedure LEARN(i,P):
begin
for each individual x in P do
Local_ search(i,x);
end for
end
1 Perform all moves from the neighborhood structure N1(x); accept moves improving g(x); stop when no further improvements of g(x) are possible
2 Mutate x producing x′ {the mutation procedure is selected randomly from the two available ones: the two-point random exchange or the rotation of all tasks between two random points}; perform all moves from the neighborhood structure N1(x′); accept moves improving g(x′); stop when no further improvements of g(x′) are possible
3 Repeat k times {k is a parameter set at the fine-tuning phase; in the reported experiment k = 3 × the initial population size}: generate offspring y and y′ by a single-point crossover of x and a random individual x′; perform all moves from the neighborhood structures N1(y) and N1(y′); accept moves improving g(y) and g(y′); stop when no further improvements of g(y) and g(y′) are possible; adjust P by replacing x and x′ with the two best individuals from {x, x′, y, y′}
4 Perform all moves from the neighborhood structure N2(x); accept moves improving g(x); stop when no further improvements of g(x) are possible
5 Perform all moves from the neighborhood structure N3(x); accept moves improving g(x); stop when no further improvements of g(x) are possible
6 Perform SIMULATED ANNEALING(x) based on the N4(x) neighborhood structure
7 Perform TABU SEARCH(x) based on the N5(x) neighborhood structure
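Each of the moves-until-no-improvement procedures above follows the same accept-improving-moves pattern. The sketch below illustrates it in Python for a pairwise-exchange neighborhood; treating N1 as pairwise exchange is an assumption made for the example, since the actual neighborhood structures of Table 31.3 are not reproduced here:

def local_search(x, g):
    # x: task permutation (list); g: fitness function to minimize
    best = g(x)
    improved = True
    while improved:
        improved = False
        for i in range(len(x) - 1):
            for j in range(i + 1, len(x)):
                x[i], x[j] = x[j], x[i]      # tentative exchange move
                val = g(x)
                if val < best:               # accept an improving move
                    best, improved = val, True
                else:
                    x[i], x[j] = x[j], x[i]  # undo a non-improving move
    return x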
In the PLA implementations a simple procedure SELECT(P, s) is used, where s is the fraction of the best individuals from the current population promoted to the higher stages. For any population P, let LEARNING(P) stand for the following:
Procedure LEARNING(P):
begin
for i = 1 to 5 do
LEARN(i,P);
SELECT(P, s);
end for
end
The last learning procedure LEARN_REC(x) is recursive and operates on overlapping partitions of the
individual x. For example, for x = (5, 2, 4, 6, 1, 3, 7) the individuals s = (5, 2, 4), t = (4, 6, 1), u = (1, 3, 7)
make an overlapping partition of x, and the merge of (s, t , u) is x.
Procedure LEARN_REC(x):
begin
let (y1,...,ys) be an overlapping partition of x
for i = 1 to s do
generate randomly a “small” population Pi of individuals for yi;
LEARNING(Pi);
yi' := the best individual from Pi;
end for
output the merge of (y1',...,ys');
end
Now, the structure of the implemented population learning algorithms can be shown as:
Procedure PLA1:
begin
generate randomly initial_population;
P := initial_population;
LEARNING (P);
for each individual x in P do
LEARN_REC (x);
end for
output the best individual from P;
end
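For illustration, the overall PLA1 control flow can be rendered as the following minimal Python sketch, assuming minimization; learn_procs stands for the five l & p procedures and g for the fitness function, both supplied by the caller, and the recursive LEARN_REC stage is omitted for brevity:

import random

def select(P, g, s):
    # keep the best fraction s of the population, as in SELECT(P, s)
    P.sort(key=g)
    return P[:max(1, int(len(P) * s))]

def pla1(n, g, learn_procs, pop_size=2000, s=0.5):
    # random initial population of task permutations
    P = [random.sample(range(1, n + 1), n) for _ in range(pop_size)]
    for learn in learn_procs:            # stages based on procedures 1..5
        P = [learn(x, g) for x in P]     # LEARN(i, P)
        P = select(P, g, s)              # SELECT(P, s)
    return min(P, key=g)                 # best individual found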
For the unrestricted common due date scheduling problem not only the sequence of tasks but also the starting time of the first task needs to be found. To achieve this, PLA2 is based on the following assumptions:

• In the course of computation it is assumed that the due date equals 0 and tasks can be scheduled both prior to and after time 0.
• Each individual x = (x0, x1, . . . , xn) in the population consists of a sequence of n tasks, 1 ≤ xj ≤ n for j = 1, . . . , n, and x0 = i ≤ n, which is the number of the last task completed before the due date (x1, . . . , xi are completed before time 0 and xi+1, . . . , xn after time 0).
• The starting time of the schedule is −Σ_{j=1}^{x0} p_{xj}. In what follows x0 is called the boundary task.
• In the final step of the computation the real starting time of the first task is computed as t0 = d − Σ_{j=1}^{x0} p_{xj}.
PLA2 uses slightly modified learning and improvement procedures as compared with PLA1. For instance, LEARN(3, P) produces an offspring using the following rule: if two individuals x and x′ produce y, then the boundary task of y equals that of x if the crossover point is greater than the boundary task of x, and the value (crossover point + boundary task of x′ − 1) otherwise. The same applies to y′.
Further modifications include two new learning and improvement procedures (procedures 8 and 9), based on additional local search schemes. The structures of PLA2 and PLA3 are as follows:
Procedure PLA2:
begin
generate randomly initial_population;
P := initial_population;
for i = 1,2,3,4,5,8 do
LEARN (i, P);
SELECT (P, s);
end for
LEARN (9, P);
output the best individual from P;
end
Procedure PLA3:
begin
generate randomly initial_population;
P := initial_population;
for i = 1, 7 do
LEARN (i, P);
SELECT (P, s);
end for
LEARN (6, P);
output the best individual from P;
end
TABLE 31.5 Benchmark Data Sets

Common due date: 240 problem instances with 20, 50, 100, 200, 500, and 1000 tasks (40 instances for each problem size). Due dates for the instances in each problem size group are calculated as d = h · Σi pi with h = 0.2, 0.4, 0.6, and 0.8, respectively. Problems with h = 0.6 and 0.8 are considered unrestricted. Upper bounds are provided. Source: www.wiwi.uni-bielefeld.de/∼kistner/Bounds.html

Flow shop: 120 problem instances; 10 instances for each combination of number of tasks and number of machines (20-5, 20-10, 20-20, 50-5, 50-10, 50-20, 100-5, 100-10, 100-20, 200-10, 200-20, and 500-20 tasks-machines). Upper bounds are provided. Source: OR-LIBRARY, http://people.brunel.ac.uk/∼mastjjb/jeb/info.html (see also Taillard [1993])

Weighted tardiness: 375 problem instances; 125 instances for each problem size (40, 50, and 100 tasks). Optimal solutions are provided for the instances with 40 and 50 tasks; upper bounds are provided for the instances with 100 tasks. Source: OR-LIBRARY, http://people.brunel.ac.uk/∼mastjjb/jeb/info.html

Multiprocessor task scheduling in multistage hybrid flow shops: 160 problem instances. The dataset includes 80 instances with 50 jobs each and 80 instances with 100 jobs each. Each group of 80 problem instances is further partitioned into four subgroups of 20 instances each with 2, 5, 8, and 10 stages, respectively. Processor availability at the various stages and processor requirements per task vary between 1 and 10. Source: OR-LIBRARY, http://people.brunel.ac.uk/∼mastjjb/jeb/info.html
In the case of the multiprocessor task scheduling in a multistage hybrid flow shop problem, PLA3 uses the following fitness function:
Procedure FITNESS:
begin
for i = 1 to m do
for j = 1 to n do
allocate task π (j) at stage i to the required number of
processors, scheduling it as early as feasible;
end for
end for
g(π ) := finishing time of task number π (n) at stage m;
end
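One plausible re-implementation of this fitness computation is sketched below in Python; the "as early as feasible" rule is rendered greedily by always occupying the earliest-free processors of a stage, which is an assumption of the sketch rather than the authors' exact allocation rule:

def makespan(pi, p, size, machines):
    # pi: job permutation (0-based job indices); p[i][j], size[i][j]:
    # processing time and processor requirement of job j at stage i;
    # machines[i]: number of identical processors at stage i
    m = len(machines)
    free = [[0.0] * machines[i] for i in range(m)]   # per-processor free times
    ready = {j: 0.0 for j in pi}                     # job ready times
    for i in range(m):
        for j in pi:
            free[i].sort()
            k = size[i][j]
            start = max(ready[j], free[i][k - 1])    # k processors free together
            end = start + p[i][j]
            for q in range(k):
                free[i][q] = end                     # occupy the k earliest
            ready[j] = end
    return ready[pi[-1]]   # g(pi): finish time of job pi(n) at stage m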
To evaluate the proposed algorithms several computational experiments have been carried out using a PC with a Pentium III 850 MHz processor (the results were originally reported in Jedrzejowicz and Jedrzejowicz [2003a, 2003b]). The results have been compared with the upper bounds or optimal solutions (if known) of several sets of benchmark problems. Table 31.5 contains short descriptions of the benchmark data sets.
Table 31.6 shows the mean relative errors (MRE) and mean computation times (MCT) for the experiment with common due date scheduling instances. These have been calculated by comparing the best result out of three experimental runs with the upper bounds (i.e., the best currently known results for the benchmark problems). The MRE and MCT values in Table 31.6 have been calculated by averaging over the ten instances available for each problem size and due date class. A negative value of the MRE shows, in fact, a relative improvement over the previously established upper bounds. In the experiment the initial population size has been set to 2000 and the selection coefficient s has been set to 0.5.
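For reference, the MRE used throughout this section can be computed as in the short sketch below (negative values indicate that the upper bounds were improved):

def mean_relative_error(results, upper_bounds):
    # relative error of each result against its benchmark upper bound
    errs = [(r - ub) / ub for r, ub in zip(results, upper_bounds)]
    return 100.0 * sum(errs) / len(errs)   # in percent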
TABLE 31.6 MRE and MCT for the Common Due Date Scheduling Instances
Experiment results prove that PLA1 and PLA2, as applied to the common due date scheduling problem, perform well. The mean relative error over the whole considered population of problem instances is negative and equals −1.31%, which means that the total amount of penalties to be paid as a result of scheduling all 240 problem instances has decreased by 1.31%. For 186 of the 240 problem instances involved in the experiment, that is, in 77.5% of cases, it has been possible to find a better upper bound than previously known. For only 8 of the 240 problem instances were the results obtained by applying the PLA worse than the respective benchmark values. The discussed results have been obtained with a reasonable computational effort.
TABLE 31.7 MRE, SDE, and MCT for the Flow Shop Scheduling Experiment

Table 31.7 shows the mean relative errors, standard deviations of errors (SDE), and mean computation times for the experiment with flow shop scheduling instances. These have been calculated by comparing the results of a single experimental run of PLA3 with the upper bounds, that is, the best currently known results for the benchmark problems. The MRE and MCT values in Table 31.7 have been calculated by averaging over the 10 benchmark instances available for each combination of problem size and number of processors. The initial population size has been set to 100 and the selection coefficient s has been set to 0.5.
Experiment results prove that PLA3, as applied to permutation flow shop scheduling, performs quite well. The mean relative error over the whole considered population of benchmark problem instances equals only 0.1776%, and it has been obtained in a single run. For 12 of the 120 problem instances involved in the experiment, that is, in 10% of cases, it has been possible to find a better upper bound than previously known. Considering the research effort invested during the last 20 years into finding algorithms for flow shop problems, the quality of PLA3 is more than satisfactory. The discussed results, however, required a substantial computational effort.
Table 31.8 shows the mean relative errors and standard deviations of errors for the total tardiness scheduling experiment. These have been calculated by comparing the results of a single experimental run of PLA3 with the optimal solutions in the case of the 40- and 50-task instances, and with the upper bounds (strongly suspected to be optimal solutions) in the case of the 100-task instances. The MRE values in Table 31.8 have been calculated for different initial population sizes, taking the average over the 125 instances available for each problem size. The initial population sizes have been set to 25, 50, 100, 200, 400, and 800 individuals with the selection factor set to 0.5.

TABLE 31.8 MRE and SDE for the Total Tardiness Scheduling Experiment

TABLE 31.9 MRE, SDE, and MCT for the Multiprocessor Task Scheduling in an m-h Flow Shop Experiment
Experiment results prove that PLA3 applied to total tardiness scheduling performs very well. However, the computational effort involved is rather high (from an average of 72 sec per instance of 100 tasks with the initial population size set to 25, up to about 3000 sec with the initial population size set to 800).
Finally, the MRE, SDE, and MCT calculated after a single run of PLA3 applied to solving benchmark instances of the multiprocessor task scheduling in a multistage hybrid (m-h) flow shop problem are shown in Table 31.9. Negative values of the MRE show the percentage of improvement achieved over the known upper bounds within the respective clusters of the benchmark dataset.
Application of the population learning algorithm has resulted in improving the total of the upper bounds of the considered cases by 0.34%. Out of the 160 instances solved, it has been possible to improve the currently known upper bounds in 73 instances, that is, in more than 45% of all considered instances.
The generalized segregated storage problem (GSSP) is a combinatorial optimization problem that involves the allocation of a certain number of goods to the available compartments subject to segregation (physical separation) constraints. Let m be the number of divisible consignments consisting of homogeneous goods and n the number of available (internal) compartments. The size of each consignment to be stored, ai (i = 1, . . . , m), and the capacity of each compartment, bj (j = 1, . . . , n), are known. Let cij be the cost of storing a unit of goods from consignment i in compartment j (i = 1, . . . , m, j = 1, . . . , n). For a given set of goods a segregation matrix S is introduced. Each element sij ∈ Z+ of the matrix S (i, j = 1, . . . , m) defines the required segregation distance between goods. Element sij is equal to 0 if and only if good i can be stored together with good j without any restrictions, and element sij is greater than 0 if some segregation type between cargoes i, j (i, j = 1, . . . , m) is required (for example, sij = 2 means that goods i and j must be stored in two separated compartments). On the other hand, for a given set of compartments an additional compartment segregation matrix CS is defined. Element csij ∈ Z+ of the CS matrix (i, j = 1, . . . , n) specifies the segregation type guaranteed when storing goods in two compartments i and j (i, j = 1, . . . , n). It is also required that all consignments be stored. To meet this constraint, it is assumed that an external storage space, denoted as the (n + 1)st compartment, is also available. It can accommodate any consignment at a higher unit cost ci,n+1 (i = 1, . . . , m). In this compartment segregation requirements do not have to be satisfied. Let xij denote the amount of cargo i stored in compartment j (i = 1, . . . , m, j = 1, . . . , n + 1). The Generalized Segregated Storage Problem can then be formulated as follows:
Z = min Σ_{i=1}^{m} Σ_{j=1}^{n+1} cij xij

subject to:

Σ_{j=1}^{n+1} xij = ai,   i = 1, . . . , m,

Σ_{i=1}^{m} xij ≤ bj,   j = 1, . . . , n,

Σ_{i=1}^{m} Σ_{j=1}^{m} xik xjl hijkl = 0,   k, l = 1, . . . , n,

xij ≥ 0,   i = 1, . . . , m, j = 1, . . . , n + 1,

where

hijkl = 0 if sij ≤ cskl, and hijkl = 1 if sij > cskl.
The elements hijkl form a binary matrix H. Each element of H is defined for two pairs, a segregation requirement for goods and a segregation type for compartments, indexed (i, k) and (j, l), where i, j = 1, . . . , m and k, l = 1, . . . , n.
Let I = {1, 2, . . . , m} be the set of consignments, each consisting of a certain number of units, and H = {1, 2, . . . , n} be the set of internal compartments. Let g be a function g : {1, . . . , mn} → I × H defined as follows:

g(v) = (⌊(v − 1)/n⌋ + 1, (v − 1) mod n + 1).

An ordered pair (p, q) is called an allocation (distribution) of some units of goods p to compartment q (p ∈ I, q ∈ H).
The fitness of an individual chk is calculated as

f(chk) = Σ_{i=1}^{m} Σ_{j=1}^{n+1} cij xkij,

where xkij denotes the amount of cargo i allocated to compartment j in the solution decoded from chk.
Procedure L1 :
begin
for each individual in J do
Create string Zj containing the allocations (p,q) for which xkpq > 0
and string Nj containing the allocations for which xkpq = 0. Create a
new individual by substituting the most expensive allocation, i.e. the
allocation (p,q) for which the expression xkpq (cp,n+1 − cpq) has the
greatest value, with an element randomly chosen from the Nj string.
Accept the exchange if the new individual is an improvement;
otherwise recover the old individual;
end for
end
Procedure L2 :
begin
for each individual in J do
Choose randomly a position tj (tj ∈ {1, . . . ,mn}). Create a new
individual from the old one by cyclically shifting its elements
to the left by tj positions. Accept if the new individual is an
improvement; otherwise recover the old individual;
end for
end
Procedure L3 :
begin
for each individual in J do
Create two strings as in L1. A new individual is created from
the old one by randomly changing the position of each element of
the Zj string. Accept if the new individual is an improvement;
otherwise recover the old individual;
end for
end
Procedure DECODE(ch):
begin
ai' ← ai;
bj' ← bj;
for k=1 to mn do
(p,q) ← g(chk )
if allocation (p,q) is feasible then
xpq ← min(ap ’, bq ’);
ap ’ ← ap ’ - xpq ;
bq ’ ← bq ’ - xpq ;
end if
end for
Store non-allocated goods into external ((n+1)–th) compartment;
end
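A runnable Python sketch of the DECODE logic is given below (0-based indices; the helper feasible is assumed to encapsulate the segregation checks against the allocations made so far):

def decode(ch, a, b, g, feasible):
    # ch: chromosome, a sequence of positions v; a, b: consignment sizes
    # and compartment capacities; g: the position-to-allocation mapping
    # defined above
    a, b = list(a), list(b)              # working copies a', b'
    x = {}                               # allocation amounts x[p, q]
    for v in ch:
        p, q = g(v)
        if feasible(p, q, x):
            amount = min(a[p], b[q])     # allocate as much as fits
            if amount > 0:
                x[(p, q)] = x.get((p, q), 0) + amount
                a[p] -= amount
                b[q] -= amount
    external = {p: rest for p, rest in enumerate(a) if rest > 0}
    return x, external                   # leftovers go to compartment n+1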
To validate the approach, the performance of the proposed PLA implementation has been compared with that of a “classic” evolutionary algorithm (EA). The respective experiment involved two randomly generated data sets of 10 instances each, with the number of goods and the number of compartments drawn from U[5,20], the discrete uniform distribution between 5 and 20 inclusive. In data set 1, the internal storage costs cij were randomly drawn from U[10,19] and the external storage costs ci,n+1 from U[20,24]. In data set 2, the internal storage costs cij were randomly drawn from U[100,199] and the external storage costs ci,n+1 from U[200,249]. Quantities of cargoes, ai, and capacities of compartments, bj, were randomly generated from U[1,9] in both data sets. Elements of the segregation matrices S and CS were generated from U[0,5] and U[0,4], respectively.

All generated problem instances have been solved by the EA and the PLA. Optimal solutions have also been obtained using the CPLEX solver. The mean relative error from the optimal solution in the case of the PLA is 2.17% (with 2.61% in the case of the EA). The mean relative computational effort is 91 and 100 for the PLA and the EA, respectively.
The next example involves applying the population learning algorithm to the resource constrained project scheduling problem (RCPSP) with makespan minimization as the objective function. A single-mode RCPSP is considered: in the single-mode case a project consists of a set of activities, where each activity has to be processed in a single, prescribed way (mode). Each activity requires some resources, the availability of which is constrained. The discussed problem is computationally difficult and belongs to the NP-hard class. Because of its practical importance the RCPSP has attracted a lot of attention and many exact and heuristic methods have been proposed for solving it (see, e.g., Christofides et al. [1987]). Exact algorithms seem suitable for solving smaller instances of the RCPSP. On the other hand, heuristic approaches, used for solving its larger instances, can only be evaluated experimentally using benchmark datasets with known optimal solutions or upper bounds.
A project consists of a set of n activities, where each activity has to be processed without interruption to complete the project. The dummy activities 1 and n represent the beginning and the end of the project. The duration of activity j is denoted by dj, where d1 = dn = 0. There are r renewable resource types. The availability of each resource type k in each time period is Rk units, k = 1, . . . , r. Each activity j requires rjk units of resource k during each period of its duration, where r1k = rnk = 0, k = 1, . . . , r. All parameters are nonnegative integers. There are precedence relations of the finish-start type with a zero parameter value (i.e., FS = 0) defined between the activities. In other words, activity i precedes activity j if j cannot start until i has been completed. The structure of a project can be represented as an activity-on-node network G = (V, A), where V is the set of activities and A is the set of precedence relationships. Sj (Pj) is the set of successors (predecessors) of activity j. It is further assumed that 1 ∈ Pj, j = 2, . . . , n, and n ∈ Sj, j = 1, . . . , n − 1. The objective is to find a schedule S of activities, that is, a set of starting times (s1, . . . , sn), where s1 = 0 and the resource constraints are satisfied, such that the schedule duration T(S) = sn is minimized.
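The makespan of a schedule built from a precedence-feasible activity order can be computed with a serial schedule generation scheme, sketched below in Python; this illustrates the constraints of the model, not the chapter's PLA operators, and horizon is an assumed upper bound on the schedule length:

def serial_sgs(order, dur, req, cap, preds, horizon):
    # order: precedence-feasible activity list; dur[j]: durations;
    # req[j][k]: per-period resource demand; cap[k]: capacities;
    # preds[j]: set of predecessors of activity j
    R = len(cap)
    usage = [[0] * horizon for _ in range(R)]
    finish = {}
    for j in order:
        t = max((finish[i] for i in preds[j]), default=0)
        while any(usage[k][u] + req[j][k] > cap[k]
                  for k in range(R) for u in range(t, t + dur[j])):
            t += 1                            # delay until resource-feasible
        finish[j] = t + dur[j]
        for k in range(R):
            for u in range(t, t + dur[j]):
                usage[k][u] += req[j][k]      # book the resources
    return max(finish.values())               # schedule duration T(S)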
The respective PLA implementation involves three learning and improvement stages. The value of the goal function is directly used as a measure of the quality of individuals and, hence, as a selection criterion. Values of the control parameters are set at the algorithm fine-tuning phase. All random moves within the algorithm are drawn from the uniform distribution. All parameter values used in the following pseudo-code have been set by trial and error during the fine-tuning phase. In the pseudo-code shown below P denotes the population and |P| its size.
Procedure PLA:
begin
Set size of P, |P|= p*n;
Set values of xi1, xm1, xi2;
Generate the initial population;
{-- the first learning stage}
for it=1 to xi1*n do
for i=1 to 0.4*|P| do
Crossover(Random(P), Random(P));
end for
for i=1 to 0.05*|P| do
Mutation(Random(P));
end for
for i=1 to 0.05*|P| do
LSA(Random(P),2,6,10);
end for
if it mod (xm1*n)=0 then
Selection(medium makespan);
end if
end for
Selection(medium makespan);
{-- the second learning stage}
for it=1 to xi2*n do
for i=1 to 0.4*|P| do
Crossover(Random(P), Random(P));
end for
for 2 best solutions S∈P do
EPTA(S,6,4);
end for
for i=1 to 0.2*|P| do
LSA(Random(P),2,6,10);
end for
end for
Selection(medium makespan);
The algorithm creates the initial population by producing four individuals using simple construction heuristics and generating the remaining ones randomly. The heuristics are based on the following rules: shortest duration first, shortest duration last, longest duration first, and longest duration last. The first learning stage uses evolutionary operators and a simple local search algorithm (LSA). The three procedures Crossover, Mutation, and LSA are repeated xi1*n times. The Random(P) function chooses an individual from the population P at random.

The LSA procedure takes four parameters. The first (S) denotes an individual and the second (itNumber) defines the number of iterations within the procedure. The last two (iStep, fStep) indicate the distance between the activities under exchange.
At the second learning stage a crossover and two heuristics, EPTA (exact precedence tree algorithm) and LSA, are used. EPTA is based on the precedence tree approach. It finds an optimum solution by enumeration for a part of the schedule consisting of a sequence of activities, with the number of activities in such a sequence denoted as partExtent. In the corresponding pseudo-code the variable S denotes an individual and |S| its current size (the number of activities); the step variable denotes the distance between the starting points of the considered partitions.

Finally, at the third learning stage the two heuristics EPTA and LSA are again used, with settings resulting in more iterations and a higher granularity of the explored neighborhood as compared with the previous stages.
To validate the proposed approach a computational experiment has been carried out based on 1440 benchmark instances of the single-mode RCPSP. The benchmark data set used in the reported experiments includes 480 instances for each of the three problem sizes (30, 60, and 90 activities, respectively). The benchmark data set together with known upper bounds can be found at http://www.wior.uni-karlsruhe.de/rcpsp

The fine-tuning phase of the experiment has been devoted to finding values of the PLA parameters assuring an acceptable compromise between computation time and solution quality. This search has been carried out using the subset of available benchmark instances consisting of all RCPSP instances with 30 activities. The resulting setting is shown in Table 31.10.
The experiment involved solving all benchmark instances twice. The results were evaluated in terms of the mean and maximum relative errors (MRE, max RE), the percentage of solutions equal to the respective upper bounds (eqUB), and the mean computation time (MCT) needed to solve a single instance. Relative errors have been calculated against the available upper bounds. Experiment results are shown in Tables 31.11 and 31.12. The proposed PLA implementation has contributed to finding an improved upper bound for one benchmark instance with 90 activities. The experiment was carried out on a PC with an AMD XP 1600+ processor and 256 MB RAM.
Application of the population learning algorithm to several difficult combinatorial problems proves that it is a useful tool extending the range of available techniques. Although the PLA does not seem able to produce very good solutions quickly, it is a technique that consistently provides good to very good solutions in reasonable time, in many instances improving the upper bounds of the most difficult combinatorial problems.
• An individual is a vector of real numbers from a predefined interval, each representing the weight of the respective link between neurons in the considered ANN.
• The initial population of individuals is generated randomly.
• There are five learning/improvement procedures used — standard mutation, local search,
nonuniform mutation, gradient mutation, and gradient adjustment.
• There is a common selection criterion for all stages. At each stage, individuals with fitness below
the current average are rejected.
All five learning and improvement procedures L(1) to L(5) used for ANN training are shown in the following pseudo-code:
Procedure L(1): {standard mutation}
begin
for i=1 to J do
for j=1 to k do {k -- number of iterations}
Select randomly two different elements x1 and x2 within individual i;
Generate new random values for x1 and x2 producing a new individual;
Calculate the fitness of a new individual;
if new fitness < old fitness then accept changes;
end for
end for
end
Procedure L(2): {local search}
begin
for i=1 to J do
for j=1 to k do {k -- number of iterations}
Select randomly two different elements x1 and x2 within individual i;
Exchange values between x1 and x2 producing a new individual;
Calculate the fitness of a new individual;
if new fitness < old fitness then accept changes;
end for
end for
end
Procedure L(3): {non-uniform mutation}
begin
for i=1 to J do
for t=1 to T do
Select random point within the individual i;
Apply non-uniform mutation;
Calculate the fitness of a new individual;
if new fitness < old fitness then accept change;
end for
end for
end
Procedure L(4): {gradient mutation}
begin
for i=1 to J do
for j=1 to k do {k -- number of iterations}
Select randomly two different elements x1 and x2 within
individual i;
Generate a random binary digit;
Change values x1 and x2 to x1 + ξ and x2 + ξ if the random digit
is 0, and to x1 − ξ and x2 − ξ otherwise, producing a new
individual;
Calculate the fitness of a new individual;
if new fitness < old fitness then accept changes;
end for
end for
end
Procedure L(5): {gradient adjustment operator}
begin
for i=1 to J do
α =1;
while α > 0 do
Apply gradient adjustment operator;
Calculate fitness of the new individual;
if new fitness < old fitness then accept changes and break;
α = α − 0.02;
end while
end for
end
The above-proposed procedures require some additional comments. The first procedure, standard
mutation, modifies an individual by generating new values of its two randomly selected elements. If the
operation improves the fitness function value then the change will be accepted. The second learning and
improvement procedure involves exchange of values between the two randomly selected elements within
an individual. If the operation improves the fitness function value then the change will be accepted. The
third procedure, a nonuniform mutation, involves modifying an individual by repeatedly adjusting the value of a randomly selected element (in this case a real number) until the fitness function value has improved or until a number of consecutive improvements have been attempted unsuccessfully. The value of the adjustment is calculated as

Δ(t, y) = y(1 − r^((1−t/T)·r)),

where r is a uniformly distributed real number from (0, 1], T is equal to the length of the vector representing an individual, and t is the current adjustment number. The fourth learning and improvement procedure, a gradient mutation, changes two randomly selected elements within an individual by incrementing or decrementing their values. The direction of change (increment/decrement) is random, with both directions having probability 0.5. The value of the change is proportional to the gradient of an individual. If the fitness function value of an individual has improved then the change is accepted. The number of iterations for learning and improvement procedures 1, 2, and 4 has to be set at the fine-tuning phase. Finally, the fifth procedure adjusts the value of each element of the individual by a constant value delta (Δ), proportional to its gradient. Delta is calculated as Δ = α · ξ, where α is the factor determining the size of the step in the direction of ξ, known as a momentum. Factor α takes values from (0, 1]. In the proposed algorithm its value iterates starting from 1 with a step equal to 0.02. Here, ξ is a vector determining the direction of search and is equal to the gradient of an individual.
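A minimal Python sketch of this fifth procedure, assuming that fitness (to be minimized) and gradient are supplied by the surrounding ANN training code:

def gradient_adjust(w, fitness, gradient):
    # try steps of shrinking size alpha along the gradient direction xi
    best = fitness(w)
    alpha = 1.0
    while alpha > 0:
        xi = gradient(w)                              # direction of search
        trial = [wi + alpha * gi for wi, gi in zip(w, xi)]
        if fitness(trial) < best:                     # accept and stop
            return trial
        alpha -= 0.02                                 # iterate alpha downward
    return w                                          # no improving step found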
Population learning algorithm for ANN training has been implemented in two versions — sequential
and parallel. A general structure of the sequential implementation is shown in the following pseudo-code:
Procedure Sequential_PLA:
begin
Generate initial population;
P := initial_population;
for i=1 to 4 do
Procedure L(i);
P := Select(P);
end for
Procedure L(5);
Consider best individual in P as a solution;
end
The parallel PLA implementation is based on cooperation between the master worker (server), whose task is to manage the computations, and a number of slave workers, which act in parallel, performing computations as requested by the master. The approach allows a lot of freedom in designing the population-learning process. The master worker manages the communication flow during population learning. It allocates computational tasks in terms of the required population size and the number of iterations, and also controls the information exchange between slaves. The latter task involves upgrading, at various learning stages, the current populations maintained by all slaves with the globally best individuals. The communication flow between the master and slave workers is shown in Figure 31.4.
The parallel PLA implementation for ANN training denoted Parallel_PLA is based on the following
rules:
• Master worker defines the number of slave workers and the size of the initial population for each
of them.
FIGURE 31.4 Communication flow between the master and slave workers.
• Each slave worker uses the same, described earlier, learning and improvement procedures.
• Master worker activates parallel processing.
• After completing each stage, the workers inform the master about the best solution found so far.
• The master worker compares the received values and sends the best solution out to all the workers, where it replaces their current worst individual (see the sketch after this list).
• Master worker can stop computations if the desired quality level of the objective function has been
achieved. This level is defined at the beginning of computations through setting the desired value of
the mean squared error on a given set of training patterns. Alternatively, computations are stopped
after the predefined number of iterations at each stage has been executed.
• Slave workers can also stop computations if the above condition has been met.
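The broadcast step named in the bullets above can be sketched as follows; this is a toy, synchronous rendering in Python with populations held as lists of (fitness, individual) pairs and minimization assumed, whereas the actual implementation exchanged messages between processes:

def exchange_best(slave_pops):
    # each slave reports its best; the master picks the global best and
    # sends it back to replace every slave's current worst individual
    global_best = min((min(pop, key=lambda fi: fi[0]) for pop in slave_pops),
                      key=lambda fi: fi[0])
    for pop in slave_pops:
        worst = max(range(len(pop)), key=lambda i: pop[i][0])
        pop[worst] = global_best
    return global_best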
A variant of the above approach, denoted Parallel-PLAs, differs with respect to the strategy of using the best solution sent to the slaves by the master. Here, the best solution is used to produce offspring by applying two types of crossover operators: a single-point crossover and a position crossover. Both are applied in turn to generate offspring from all of the current individuals in each slave. Each individual in each slave is coupled with the best current solution forwarded by the master to produce an offspring. This takes place after completing each stage.
The proposed implementations of the PLA have been used to train several neural networks applied to
solving popular benchmarking classification problems — two spirals, 10-parity, Wisconsin breast cancer,
Cleveland heart disease, and credit approval.
The artificial neural network solving the two spirals problem has two real-valued inputs corresponding to the x and y coordinates of a point, and one binary target output that classifies the point as belonging to one of the two spirals coiling three times around the origin. The two spirals are constructed from 97 points given by the respective coordinates in the training set. Solving the two spirals problem appears to be a very difficult task for back-propagation networks (Shang and Wah, 1996). The topology of the network for this problem has been adopted from Fahlman and Lebiere (1990), where an efficient constructive training algorithm called CASCOR was also suggested. This so-called “shortcut” topology has 4 hidden units and 25 links with 25 weights.
In the 10-parity problem the ANN has to be trained to produce the Boolean “Exclusive OR” function of ten variables. A neural network with 10, 10, and 1 neurons in layers 1, 2, and 3, respectively, is used.
Diagnosis of breast cancer involves classifying a tumor as either benign or malignant based on cell descriptions gathered by microscopic examination. The breast cancer database was obtained from Dr. William H. Wolberg, University of Wisconsin Hospitals, Madison (see Mangasarian and Wolberg [1990]). It includes 699 examples, each with 9 inputs and 2 outputs. The corresponding ANN has 9, 9, and 1 neurons in layers 1, 2, and 3, respectively.
TABLE 31.13 Correct Classification Ratios (%), Mean Errors, and Mean Training Times for the PLA Implementations

Problem       Implementation    10CV mean  10CV max  10CV min  TTV mean  TTV max  TTV min  MSE   MRE   MCT
Two spirals   Sequential-PLA    71.6       82.0      58.9      74.7      83.2     65.0     0.29  0.25  45.0
Two spirals   Parallel-PLA      89.2       98.4      70.2      88.4      96.6     74.0     0.07  0.04  20.0
Two spirals   Parallel-PLAs     91.7       98.0      77.0      92.0      98.2     80.0     0.07  0.04  18.0
10 parity     Sequential-PLA    88.1       92.0      72.0      88.0      97.4     70.0     0.12  0.11  3.50
10 parity     Parallel-PLA      93.6       98.0      80.5      92.2      96.0     75.0     0.06  0.04  2.00
Breast        Sequential-PLA    95.4       98.5      82.2      96.4      98.0     78.0     0.02  0.01  2.50
Breast        Parallel-PLAs     96.6       98.0      82.2      96.6      98.0     80.8     0.02  0.01  1.50
Heart         Sequential-PLA    81.0       88.0      65.0      81.3      89.0     62.3     0.21  0.17  2.70
Heart         Parallel-PLAs     85.7       90.0      68.0      86.5      89.2     70.0     0.11  0.10  1.50
Credit        Sequential-PLA    82.0       86.4      63.0      82.6      87.4     63.0     0.20  0.18  3.40
Credit        Parallel-PLAs     88.1       91.9      74.0      86.6      90.4     79.2     0.11  0.11  2.50
The Cleveland heart disease problem involves predicting heart disease, that is, deciding whether at least one of four major vessels is reduced in diameter by more than 50%. The binary decision is made based on personal data. The data set includes 303 examples with 13 inputs and 2 outputs, and the corresponding ANN has 13, 13, and 1 neurons in layers 1, 2, and 3, respectively.

Credit card approval involves predicting the approval or rejection of a credit card request. The data set consists of 690 examples with 15 inputs and 2 outputs, with a good mix of attributes. The corresponding ANN to be trained has 15, 15, and 1 neurons in layers 1, 2, and 3, respectively.

All data sets for the above problems are available at the UCI repository (Merz and Murphy, 1998) and are often used to compare various techniques for ANN training. The data set for the 2-spirals problem has been obtained from http://www.cae.wisc.edu/∼ece602/data/CMUbenchmark
The computational experiment designed to validate the proposed PLA implementations has been based on the following quality measures:

• Correct classification ratio for the “10 cross validation” (10CV) approach.
• Correct classification ratio for the training, test, and validation sets constructed using the 50–25–25% (TTV) principle (Prechelt and Proben, 1994).
• Mean squared classification error, MSE.
• Mean relative classification error, MRE.

All artificial neural networks used in the experiment have a sigmoid activation function with the sigmoid gain equal to 1. The initial population size in all PLA implementations has been set to 200. The maximum number of repetitions for each learning and improvement procedure has been set to 20, except for the 2-spirals problem, where in both the sequential and parallel versions it has been set to 40. Each benchmarking problem has been solved 50 times and the reported values of the quality measures have been averaged over these 50 runs. All computations have been carried out on the SGI Challenge R4400 and, additionally, for Parallel_PLA and Parallel_PLAs, using the PVM environment. Experiment results are shown in Table 31.13, where the performance of the PLA implementations is also compared with some results reported in the literature.
It can be observed that the PLA implementations guarantee a good, even competitive, level of training quality. The approach also seems interesting in terms of computational time requirements. The author has not been able to find comparable data on the time performance of the alternative approaches. Some conclusions can be drawn from the reported time performance of the Novel algorithm, which required 900 min to train the ANN solving the 2-spirals problem and 35 min to train the ANN solving the 10-parity problem (Shang and Wah, 1996).
Step 1. Transform X, normalizing the value of each xij into the interval [0, 1] and then rounding it to the nearest integer, that is, 0 or 1.
Step 2. Calculate for each instance from the original training set the value of its identity factor Ii:

Ii = Σ_{j=1}^{n+1} xij sj,   i = 1, . . . , N,

where

sj = Σ_{i=1}^{N} xij,   j = 1, . . . , n + 1.

Step 3. Map the input vectors (i.e., rows from X) into t clusters denoted as Yv, v = 1, . . . , t. Each cluster contains the input vectors with an identical value of the identity factor Ii, where t is the number of different values of Ii.
Step 4. Set the value of the representation level K, which denotes the maximum number of input vectors to be retained in each of the t clusters defined in step 3. The value of K is set arbitrarily by the user.
Step 5. Select the input vectors to be retained in each cluster. Let yv denote the number of input vectors in cluster v, v = 1, . . . , t. Then the following rules for selecting input vectors apply:
• If yv ≤ K then S = S ∪ Yv.
• If yv > K then the order of the input vectors in Yv is randomized and the cluster is partitioned into q = yv/K subsets denoted as Dvj, j = 1, . . . , q. The generalization accuracy of each subset, denoted as Avj, is calculated by carrying out the leave-(q−1)-out test with X = X − Yv + Dvj as the training set. The subset of input vectors from cluster v maximizing the value of Avj is kept in the reduced training set.
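Steps 1 to 3 are easy to express directly; the Python sketch below (illustrative names) binarizes a normalized training matrix, computes the column sums sj and the identity factors Ii, and groups the instances with equal identity factor:

from collections import defaultdict

def identity_clusters(X):
    # X: list of N rows, each with n+1 values already scaled to [0, 1]
    B = [[round(v) for v in row] for row in X]           # step 1: round to 0/1
    cols = len(B[0])
    s = [sum(row[j] for row in B) for j in range(cols)]  # s_j, column sums
    clusters = defaultdict(list)
    for i, row in enumerate(B):
        I = sum(row[j] * s[j] for j in range(cols))      # identity factor I_i
        clusters[I].append(i)                            # step 3: clusters Y_v
    return clusters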
IRA2 uses two additional parameters: pr (precision) and ml (multiple). The first determines the number of digits after the decimal point. The second determines the way the identity factor value is rounded. The algorithm proceeds as follows:

Step 1. Set the value of the representation level K, which is the maximum number of input vectors to be retained in each cluster.
Step 2. Set the values pr = 0 and ml = 1.
Step 3. Transform X, normalizing the value of each xij into the interval [0, 1].
Step 4. Round xij to the nearest value of 0 or 1 with precision pr digits after the decimal point.
Step 5. Calculate for each instance from the original training set the value of its identity factor Ii.
Step 6. Round Ii to the nearest multiple of ml.
Step 7. Map the instances (i.e., rows from X) into t clusters Yv, v = 1, . . . , t. Each cluster contains the instances with an identical value of the identity factor Ii, and t is the number of such clusters.
Step 8. Select the instances to be retained in each cluster. Let yv denote the number of input vectors in cluster v, v = 1, . . . , t. Then the following rules for selecting input vectors apply:
• If yv ≤ K then S = S ∪ Yv.
• If yv > K and K = 1 then the order of the input vectors in Yv is randomized. The generalization accuracy of each Yv is calculated by carrying out the leave-(q−1)-out test. An instance from cluster v maximizing the generalization accuracy is kept in the reduced training set.
• If yv > K and K > 1 then set pr to 1 and then, for the instances in Yv and for several arbitrary values of ml (e.g., ml = {10, 20, 30}), repeat steps 4 to 8, calculating new values of the identity factor and creating q subsets denoted as Dvj, j = 1, . . . , q, until the number of elements in each subset is at least equal to K. The generalization accuracy Avj of each subset is calculated by carrying out the leave-(q−1)-out test with X = X − Yv + Dvj as the training set. The subset of instances from cluster v maximizing the value of Avj is kept in the reduced training set.
IRA3 uses the population-learning algorithm for the selection of the instances to be kept. Steps 1 to 4 in IRA3 are identical to those in IRA1, but step 5 differs:

Step 5. Select the instances to be retained in each cluster:
• If yv ≤ K then S = S ∪ Yv.
• If yv > K and K = 1 then S = S ∪ {xv}, where xv is the reference instance selected from the cluster Yv for which the distance d(xv, μv) = Σ_{i=1}^{n} (xvi − μvi)² is minimal, and μv = (1/yv) Σ_{j=1}^{yv} xvj is the mean vector of the cluster Yv.
• If yv > K and K > 1 then S = S ∪ {xvj}, where xvj, j = 1, . . . , K, are reference instances from the cluster Yv selected by applying the PLA.
The PLA implemented for the selection of reference instances maps the instances xv from Yv into K subsets Dvj (j = 1, . . . , K) such that the sum of the squared Euclidean distances between each instance xv ∈ Dvj and the mean vector μj of the subset Dvj is minimal. The vectors with minimal distance to the mean vector in each subset are selected as the K reference vectors. This selection method can be associated with the clustering technique known as the k-means algorithm (Likas et al., 2001).

A potential solution p is coded as a (K + yv)-element vector. Its first K positions state how many subsequent elements out of all yv elements are contained in each Dvj (j = 1, . . . , K). The next yv positions of the potential solution represent input vector numbers from Yv.
The fitness of an individual p ∈ P, where P is a population of individuals, can be evaluated as:

J(p) = Σ_{j=1}^{K} Σ_{z∈Tj} ‖p[z] − μj‖²,

where

Tj = (K, K + p[j])   if j = 1,
Tj = (K + Σ_{i=1}^{j−1} p[i], K + Σ_{i=1}^{j} p[i])   otherwise.
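In code, evaluating this fitness amounts to decoding the subset sizes from the first K positions and summing squared distances to each subset's mean; a Python sketch, assuming every subset size p[j] is at least 1 and data[z] is the input vector of instance number z:

def fitness_J(p, K, data):
    J, pos = 0.0, K
    for j in range(K):
        members = p[pos:pos + p[j]]                    # instance numbers in D_vj
        pos += p[j]
        vecs = [data[z] for z in members]
        mu = [sum(c) / len(vecs) for c in zip(*vecs)]  # subset mean vector
        J += sum(sum((v - m) ** 2 for v, m in zip(vec, mu)) for vec in vecs)
    return J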
TABLE 31.14 Mean Percentage of Input Vectors Retained

Problem    K = 1    K = 4    K = 10
Credit     28       55       69
Heart      34       55       67
Breast     37       63       81
Thyroid    9        18       30
CI         1        3        5
The above-proposed instance reduction algorithms have been used to generate training sets for all considered problems. For each problem five reduced instance sets have been generated, with the representation levels varied between 1 and 10. In the computational experiment the “10 cross-validation” approach was used: each dataset was divided into 10 partitions and each reduction technique was applied to a training set T consisting of 9 of the partitions, from which it produced a reduced subset S. The ANN was trained using only the instances in S. Ten such trials were run for each dataset with each reduction algorithm, using a different partition as the test set in each trial. Mean percentages of input vectors retained are shown in Table 31.14.
The PLAANN classifier has been run 20 times for each representation level and for each training set generated by IRA1 to IRA4. The results of the classifications thus obtained, averaged over the 20 runs, are shown in Table 31.15. The column “Original set” in Table 31.15 shows the results obtained by applying the PLAANN to the original, non-reduced training set using the “10 cross-validation” approach.
Applying the proposed variants of IRA clearly results in a substantial reduction of the training set size as compared with the original data set. The overall performance of the PLAANN classifier seems quite satisfactory. It is also clear that in the majority of cases increasing the representation level leads to a better performance in terms of classifier quality, at the cost of higher computation time. For the credit, heart, and breast problems the PLAANN classifier trained using the reduced training set performs with an accuracy of classification comparable to the accuracy achieved on the original training set. The respective training time decreases, on average, 4 to 6 times (for K = 10). It might be worth noting that in the case of the Customer Intelligence problem the PLAANN applied to the original set of instances has not been able to find any satisfactory solution in a reasonable time, and in fact the classification
process has been stopped. The accuracy of classification obtained with the original set was then only 58%. However, the PLAANN trained on the reduced set guarantees an accuracy better than 80% within 120 sec of computation time.

TABLE 31.15 Classification Accuracy (%) of the PLAANN for Reduced and Original Training Sets

Algorithm   Problem   K = 1    K = 4    K = 10   Original set
IRA1        Credit    74.00    79.87    82.70    85.70
IRA1        Heart     85.34    78.31    85.10    88.10
IRA1        Breast    93.04    93.68    95.84    96.60
IRA1        Thyroid   93.45    94.71    95.69    93.10
IRA1        CI        67.44    69.45    71.80    58.70
IRA2        Credit    74.00    80.04    83.15    85.70
IRA2        Heart     85.34    83.45    85.87    88.10
IRA2        Breast    93.04    94.21    94.81    96.60
IRA2        Thyroid   93.45    94.50    94.68    93.10
IRA2        CI        67.44    75.31    76.48    58.70
IRA3        Credit    76.00    81.32    84.33    85.70
IRA3        Heart     80.10    87.20    87.28    88.10
IRA3        Breast    93.80    94.92    96.20    96.60
IRA3        Thyroid   93.45    94.80    96.60    93.10
IRA3        CI        70.00    79.50    80.50    58.70
IRA4        Credit    76.00    81.45    83.73    85.70
IRA4        Heart     80.10    87.10    88.10    88.10
IRA4        Breast    93.80    95.10    96.15    96.60
IRA4        Thyroid   93.45    98.21    98.43    93.10
IRA4        CI        70.00    80.50    83.40    58.70
For the thyroid problem the PLAANN trained using the original data set produces an accuracy of classification at the level of about 92%, which is a standard performance. The accuracy of classification for the discussed problem has grown substantially with the reduction of the original dataset size.
A comparison of the proposed IRAs with other approaches to instance reduction is shown in Table 31.16. The column “Retained” in Table 31.16 shows the percentage of training vectors from the original training set that have been retained by the respective instance reduction algorithm. All results other than those produced by the proposed IRAs were reported in Wilson and Martinez (2000). The acronyms used in Table 31.16 stand for k-Nearest Neighbor (kNN), Condensed Nearest Neighbor (CNN), Selective Nearest Neighbor (SNN), Instance Based (IB), and Decremental Reduction Optimization Procedure (DROP).
The computational experiment was carried out on an SGI Challenge R4400 workstation with 12 processors. The number of slave workers used by the master varied from 5 to 15 and was chosen randomly in each run. The size of the initial population in the IRA3 implementation of the PLA was set to 100. The maximum number of repetitions for each learning and improvement procedure was set to 500. In the IRA4 implementation the respective parameters were set to 50 for the initial population size and 500 for the number of iterations.
The proposed and provisionally validated PLA-based heuristic instance reduction algorithms can be used to increase the efficiency of supervised learning. The computational experiment results support the claim that reducing the training set size still preserves the basic features of the analyzed data and can even be beneficial to classifier accuracy. The approach extends the range of available instance reduction algorithms. Moreover, it has been shown that the proposed algorithms can, for some problems, be competitive with existing techniques. A possible extension of the approach could focus on establishing decision rules for finding a representation level suitable for each cluster, thus allowing a variable representation level for different clusters.
References
Barbucha, D. and Jȩdrzejowicz, P. A population learning algorithm for solving the generalized segregated storage problem. In P. Sincak, J. Vascak, V. Kvasnicka, and R. Mesiar (Eds.), The State of the Art in Computational Intelligence, Advances in Soft Computing, Physica-Verlag, Heidelberg, New York, 2000, pp. 355–360.
Chung, C-J. and Reynolds, R.G. CAEP: An evolution-based tool for real-valued function optimization
using cultural algorithms. Journal of Artificial Intelligence Tools, 7, 1998, 239–292.
Colorni, A., Dorigo, M., and Maniezzo, V. An investigation of some properties of an ant algorithm. In
R. Manner and B. Manderick (Eds.), Proceedings of the 2nd European Conference on Parallel Problem
Solving from Nature, Elsevier, Amsterdam, 1992, pp. 509–520.
Christofides, N., Alvarez-Valdes, R., and Tamarit, J.M. Project scheduling with resource constraints:
A branch and bound approach. European Journal of Operational Research, 29, 1987, 262–273.
Czarnowski, I., and Jȩdrzejowicz, P. Application of the parallel population learning algorithm to training
feed-forward ANN. In P. Sincak et al. (Eds.), Intelligent Technologies–Theory and Applications, IOS
Press, Amsterdam, 2002a, pp. 10–16.
Czarnowski, I. and Jȩdrzejowicz, P. An approach to artificial neural network training. In M. Bremer,
A. Preece, F. Coenen (Eds.), Research and Development in Intelligent Systems XIX, Springer-Verlag,
Heidelberg, 2002b, pp. 149–160.
Czarnowski, I. and Jȩdrzejowicz, P. An approach to instance reduction in supervised learning. In F. Coenen,
A. Preece, and A.L. Macintosh (Eds.), Research and Development in Intelligent Systems XX, Springer,
London, 2003, pp. 267–280.
Czarnowski, I., Gutjahr, W.J., Jȩdrzejowicz, P., Ratajczak, E., Skakowski, A., and Wierzbowska, I. Scheduling
multiprocessor tasks in presence of correlated failures. Central European Journal of Operations
Research, 11, 2003, 163–182.
Davis, L. Handbook of Genetic Algorithms, Van Nostrand Reinhold, New York, 1991.
Dawkins, R. The Selfish Gene, Oxford University Press, Oxford, 1976.
Dorigo, M., and Di Caro, G. The ant colony optimization meta-heuristic. In D. Corne, M. Dorigo, and
F. Glover, (Eds.), New Ideas in Optimization, McGraw-Hill, New York, 1999, pp. 11–32.
The European Network of Excellence on Intelligent Technologies for Smart Adaptive Systems (EUNITE). EUNITE World Competition in the Domain of Intelligent Technologies, 2002.
Fahlman, S.E. and Lebiere, C. The cascade-correlation learning architecture. In E. Touretzky (Ed.),
Advances in Neural Information Processing II, Morgan Kauffman, San Mateo, CA, 1990,
pp. 524–532.
Feo, T.A. and Resende, M.G.C. Greedy randomized adaptive search procedures. Journal of Global Optimization, 6, 1995, 109–133.
Glover, F. Heuristics for integer programming using surrogate constraints. Decision Sciences, 8, 1977,
156–166.
Reynolds, R.G. An introduction to Cultural Algorithms. In A.V. Sebald, L.J. Fogel (Eds.), Proceedings of
the 3rd Annual Conference on Evolutionary Programming, World Scientific, River Edge NJ, 1994,
pp. 131–139.
Shang, Y. and Wah, B.W. A global optimization method for neural network training. IEEE Computer, 29, 1996, 45–54.
Taillard, E. Benchmarks for basic scheduling instances. European Journal of Operational Research, 64, 1993,
278–285.
Wilson, D.R. and Martinez, T.R. Reduction techniques for instance-based learning algorithm. Machine
Learning, 38, 2000, 257–286.
32.1 Introduction
Biology-derived algorithms are an important part of computational sciences, which are essential to many
scientific disciplines and engineering applications. Many computational methods are derived from or
based on the analogy to natural evolution and biological activities, and these biologically inspired compu-
tations include genetic algorithms, neural networks, cellular automata, and other algorithms. However, a
substantial amount of computations today are still using conventional methods such as finite difference,
finite element, and finite volume methods. New algorithms are often developed in the form of a hybrid
combination of biology-derived algorithms and conventional methods, and this is especially true in the
field of engineering optimizations. Engineering problems with optimization objectives are often difficult
and time consuming, and the application of nature or biology-inspired algorithms in combination with
the conventional optimization methods has been very successful in the last several decades.
There are five paradigms of nature-inspired evolutionary computations: genetic algorithms, evolution-
ary programming, evolutionary strategies, genetic programming, and classifier systems (Holland, 1975;
Goldberg, 1989; Mitchell, 1996; Flake, 1998). Genetic algorithm (GA), developed by John Holland and
his collaborators in the 1960s and 1970s, is a model or abstraction of biological evolution, which includes
the following operators: crossover, mutation, inversion, and selection. This is done by the representation
within a computer of a population of individuals corresponding to chromosomes in terms of a set of
character strings, and the individuals in the population then evolve through the crossover and mutation
of the string from parents, and the selection or survival according to their fitness. Evolutionary pro-
gramming (EP), first developed by Lawrence J. Fogel in 1960, is a stochastic optimization strategy similar
to GAs. But it differs from GAs in that there is no constraint on the representation of solutions in EP
and the representation often follows the problem. In addition, the EPs do not attempt to model genetic
operations closely in the sense that the crossover operation is not used in EPs. The mutation operation
simply changes aspects of the solution according to a statistical distribution, such as multivariate Gaussian
perturbations, instead of the bit-flipping often done in GAs. As the global optimum is approached,
the rate of mutation is often reduced. Evolutionary strategies (ESs) were conceived by Ingo Rechenberg
and Hans-Paul Schwefel in 1963, later joined by Peter Bienert, to solve technical optimization problems
(Rechenberg, 1973). Although they were developed independently of one another, both ESs and EPs have
many similarities in implementation. Typically, they both operate on real values to solve real-valued function optimization, in contrast with the encoding used in GAs. Multivariate Gaussian mutations with zero mean are used for each parent population, and appropriate selection criteria are used to determine which
solution to keep or remove. However, EPs often use stochastic selection via a tournament, where selection eliminates the solutions with the fewest wins, while ESs use a deterministic selection criterion that removes the worst solutions directly, based on the evaluations of certain functions (Heitkotter and Beasley,
2000). In addition, recombination is possible in an ES as it is an abstraction of evolution at the level of
individual behavior in contrast to the abstraction of evolution at the level of reproductive populations
and no recombination mechanisms in EPs.
The aforementioned three areas have the most impact in the development of evolutionary computations,
and, in fact, evolutionary computation has been chosen as the general term that encompasses all these
areas and some new areas. In recent years, two more paradigms in evolutionary computation have
attracted substantial attention: Genetic programming and classifier systems. Genetic programming (GP)
was introduced in the early 1990s by John Koza (1992), and it extends GAs using parse trees to represent
functions and programs. The programs in the population consist of elements from the function sets,
rather than fixed-length character strings, selected appropriately to be the solutions to the problems.
The crossover operation is done through randomly selected subtrees in the individuals according to their
fitness; the mutation operator is not used in GP. On the other hand, a classifier system (CFS), another
invention by John Holland, is an adaptive system that combines many methods of adaptation with learning
and evolution. Such hybrid systems can adapt behaviors toward a changing environment by using GAs
with adding capacities such as memory, recursion, or iterations. In fact, we can essentially consider the
CFSs as general-purpose computing machines that are modified by both environmental feedback and the
underlying GAs (Holland, 1975, 1995; Michalewicz, 1996; Flake, 1998).
Biology-derived algorithms are applicable to a wide variety of optimization problems. For example,
optimization functions can have discrete, continuous, or even mixed parameters without any a priori
assumptions about their continuity and differentiability. Thus, evolutionary algorithms are particularly
suitable for parameter search and optimization problems. In addition, they are easy to implement in parallel. However, evolutionary algorithms are usually computationally intensive, and there is no absolute guarantee of the quality of the global optimizations. Besides, the tuning of the parameters can be very difficult for any given algorithm. Furthermore, there are many evolutionary algorithms with different
suitabilities and the best choice of a particular algorithm depends on the type and characteristics of the
problems concerned. However, great progress has been made in the last several decades in the application
of evolutionary algorithms in engineering optimizations. In this chapter, we will focus on some of the
important areas of the application of GAs in engineering optimizations.
Commonly used biology-derived algorithms include GAs, photosynthetic algorithms (PAs), neural networks, and cellular automata. We will briefly discuss these algorithms in this section, and we will focus on the application of GAs and PAs in engineering optimizations in Section 32.3.
FIGURE 32.1 Schematic of single-point crossover between two parent bit strings, and of bit-flip mutation.
The overall photosynthesis reaction is just a simple version of a complicated process. Other factors, such as temperature, concentration of CO2, water content, etc., being equal, the reaction efficiency depends largely on light intensity. The important part of the photosynthetic reactions is the dark reactions, a biological process comprising two cycles: the Benson–Calvin cycle and the photorespiration cycle. The balance between these two cycles can be considered a natural optimization procedure that maximizes the efficiency of sugar production under continuous variations of the light energy input (Murase, 2000).
Murase’s PA uses the rules governing the conversion of carbon molecules in the Benson–Calvin cycle
(with a product, or feedback, from dihydroxyacetone phosphate, DHAP) and the photorespiration
reactions. The product DHAP serves as the knowledge string of the algorithm, and optimization is
reached when the quality, or fitness, of the products no longer improves. An interesting feature of such
algorithms is that stimulation comes from a randomly changing light intensity, which affects the rate of
photorespiration. The ratio of O2 to CO2 concentration determines the ratio of the Benson–Calvin and
photorespiration cycles. A PA consists of the following steps: (1) the optimization functions are coded as
fixed-length DHAP strings (16-bit in Murase’s PA) and a light intensity L is generated at random;
(2) the CO2 fixation rate r is then evaluated by the equation r = Vmax /(1 + A/L), where Vmax
is the maximum fixation rate of CO2 and A is its affinity constant; (3) either the Benson–Calvin cycle or
the photorespiration cycle is chosen for the next step, depending on the CO2 fixation rate, and the 16-bit
strings are shuffled in both cycles according to the rules of carbon molecule combination in photosynthetic
pathways; (4) after some iterations, the fitness of the intermediate strings is evaluated, the best fit remains
as DHAP, and the results are then decoded into the solution of the optimization problem (see Figure 32.2).
In the next section, we will present an example of parametric inversion and optimization using PA in
finite element inverse analysis.
FIGURE 32.2 Flow of the photosynthetic algorithm: a random light input and the O2/CO2 concentration ratio select between the Benson–Calvin and photorespiration cycles; DHAP strings are shuffled over many iterations, poor strings are removed at fitness evaluation, and the surviving strings are decoded into results.
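The four steps above can be sketched in code. The following is a schematic illustration only: the bit-shuffling rule is our simplification and does not reproduce Murase's actual carbon-molecule recombination rules, and the toy fitness function is purely illustrative:

    import random

    VMAX, A = 30.0, 10000.0           # maximum CO2 fixation rate and affinity constant

    def fixation_rate():
        """Step (2): CO2 fixation rate r = Vmax / (1 + A/L) for a random light L."""
        L = random.uniform(1e4, 5e4)  # step (1): random light intensity (lx)
        return VMAX / (1.0 + A / L)

    def shuffle_string(bits):
        """Step (3), simplified: rearrange a 16-bit DHAP string (a stand-in for
        the carbon-molecule recombination rules of the two cycles)."""
        bits = bits[:]
        i, j = random.sample(range(len(bits)), 2)
        bits[i], bits[j] = bits[j], bits[i]
        return bits

    def pa_iteration(strings, fitness):
        """Steps (3)-(4): shuffle the strings and keep only the best fit,
        which remain as DHAP for the next iteration."""
        r = fixation_rate()           # a high r favours the Benson-Calvin cycle
        shuffled = [shuffle_string(s) for s in strings]
        pool = sorted(strings + shuffled, key=fitness)
        return pool[:len(strings)], r

    strings = [[random.randint(0, 1) for _ in range(16)] for _ in range(8)]
    strings, r = pa_iteration(strings, fitness=sum)   # toy fitness: minimise 1-bits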
FIGURE 32.3 Diagram of a McCulloch–Pitts neuron (left) and a feed-forward neural network (right).
where H(x) is the Heaviside unit step function, with H(x) = 1 if x ≥ 0 and H(x) = 0 otherwise. The
weight coefficient wij is interpreted as the synaptic strength of the connection from neuron j to neuron i.
A neuron is activated only if its threshold Θi is reached. One can consider a single neuron as a simple
computer that outputs 1, or yes, if the weighted sum of the incoming signals reaches the threshold, and
0, or no, otherwise.
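A McCulloch–Pitts neuron is easily reproduced in code. The sketch below (ours) shows a two-input neuron that computes a logical AND when its threshold is set between 1 and 2:

    def heaviside(x):
        """Heaviside unit step: H(x) = 1 if x >= 0, else 0."""
        return 1 if x >= 0 else 0

    def mcculloch_pitts(weights, threshold, inputs):
        """Neuron i fires iff the weighted input sum reaches its threshold:
        u_i = H(sum_j w_ij * u_j - theta_i)."""
        return heaviside(sum(w * u for w, u in zip(weights, inputs)) - threshold)

    # A two-input neuron with threshold 1.5 computes logical AND:
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, mcculloch_pitts([1.0, 1.0], 1.5, [a, b]))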
Real power comes from combining nonlinear activation functions with multiple neurons
(McCulloch and Pitts, 1943; Flake, 1998). Figure 32.3 also shows an example of a feed-forward neural
network. The key element of an artificial neural network (ANN) is the novel structure of such an
information processing system, which consists of a large number of interconnected processing neurons.
These neurons work together to solve specific problems by adaptive learning through examples and self-
organization. A neural network trained on a given category of problems can answer what-if questions
about a particular problem when presented with new situations of interest. Owing to their real-time
capability, parallel architecture, and adaptive learning, neural networks have been applied to many real-
world problems such as industrial process control, data validation, pattern recognition, and other systems
of artificial intelligence such as drug design and the diagnosis of cardiovascular conditions. Optimization is
just one such application, and the objective functions can often change with time, as is the case in
industrial process control, target marketing, and business forecasting (Haykin, 1994). On the other hand,
training a network may take considerable time, and a good training database, with examples specific
to the particular problem, is required. Nevertheless, neural networks will gradually come to play an
important role in engineering applications because of their flexibility and adaptability in learning.
A cellular automaton (CA) evolves in a discrete manner, and complex characteristics can be observed
and studied. For more details on this topic, readers can refer to Chapters 1 and 18 in this handbook.
There is some similarity between finite state CA and conventional numerical methods such as finite
difference methods. If one considers a finite difference method as a real-valued CA, and notes that real
values are always converted to finite discrete values by round-off when implemented on a computer,
then there is no substantial difference between a finite difference method and a finite state CA.
However, CA are easier to parallelize and more numerically stable. In addition, finite difference schemes
are based on differential equations, and it is sometimes straightforward to formulate a CA from the
corresponding partial differential equations via an appropriate finite differencing procedure; however,
it is usually very difficult to obtain, conversely, a differential equation for a given CA (see Chapter 18 in
this handbook).
An optimization problem can be solved using CA if the objective functions can be coded to be associated
with the states of the CA and the parameters are properly associated with the automaton rules. This is an
area under active research. One of the advantages of CA is that they can simulate many processes such as
reaction–diffusion, fluid flow, phase transition, percolation, waves, and biological evolution. Artificial
intelligence also uses CA intensively.
32.2.5 Optimization
Many problems in engineering and other disciplines involve optimizations that depend on a number of
parameters, and the choice of these parameters affects the performance or objectives of the system con-
cerned. The optimization target is often measured in terms of objective or fitness functions in qualitative
models. Engineering design and testing often require an iteration process with parameter adjustment.
Optimization functions are generally formulated as:
Optimize: f (x),
Subject to: gi (x) ≥ 0, i = 1, 2, . . . , N ; hj (x) = 0, j = 1, 2, . . . , M .
where x = (x1 , x2 , . . . , xn ), x ∈ (parameter space).
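Evolutionary algorithms usually handle the constraints gi(x) ≥ 0 and hj(x) = 0 indirectly, for example through a penalty approach (cf. Coello, 2000). A minimal sketch, with an illustrative penalty weight mu:

    def penalized(f, gs, hs, mu=1e3):
        """Turn a constrained problem into an unconstrained one by adding a
        penalty for violating g_i(x) >= 0 and h_j(x) = 0."""
        def F(x):
            violation = sum(max(0.0, -g(x))**2 for g in gs)   # g < 0 is a violation
            violation += sum(h(x)**2 for h in hs)
            return f(x) + mu * violation
        return F

    # Example: minimise x^2 subject to g(x) = x - 1 >= 0 (optimum at x = 1).
    F = penalized(lambda x: x**2, gs=[lambda x: x - 1], hs=[])
    print(min(range(-50, 50), key=lambda k: F(k / 10.0)) / 10.0)  # -> 1.0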
Optimization can be expressed either as maximization or, more often, as minimization (Deb, 1995,
2000). As parameter variations are usually very large, systematic adaptive searching or optimization
procedures are required. In the past several decades, researchers have developed many optimization
algorithms. Examples of conventional methods are hill climbing, gradient methods, random walks,
simulated annealing, and heuristic methods; examples of evolutionary or biology-inspired algorithms
are GAs, photosynthetic methods, neural networks, and many others.
The methods used to solve a particular problem depend largely on the type and characteristics of
the optimization problem itself. There is no universal method that works for all problems, and in global
optimization there is generally no guarantee of finding the optimal solution. In general, we can only aim
for the best estimate or a suboptimal solution under the given conditions. Knowledge about the particular
problem concerned always helps in choosing the best or most efficient method for the optimization
procedure. In this chapter, however, we focus mainly on biology-inspired algorithms and their
applications in engineering optimizations.
where xi is the phenotypic value of individual i, and N is the population size. The generalized
De Jong (1975) test function is

f(x) = Σ_{i=1}^{n} x_i^{2α},  |x_i| ≤ r,  α = 1, 2, . . . , m,
where α is a positive integer and r is the half-length of the domain. This function has a minimum of
f (x) = 0 at x = 0. For the values of α = 3, r = 256, and n = 40, the results of optimization of this test
function are shown in Figure 32.4 using GAs.
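For concreteness, a compact GA for this test function might look as follows (a sketch: the real-valued coding, truncation selection, one-point crossover, and annealed Gaussian mutation are our illustrative choices, not necessarily those used to produce Figure 32.4):

    import random

    N, R, ALPHA = 40, 256.0, 3          # dimension n, half-length r, exponent alpha

    def f(x):
        """Generalised De Jong test function: f(x) = sum_i x_i^(2*alpha)."""
        return sum(xi ** (2 * ALPHA) for xi in x)

    def evolve(pop_size=100, generations=600, sigma=32.0):
        """Minimise f with truncation selection, one-point crossover, and
        Gaussian mutation whose step size is annealed each generation."""
        pop = [[random.uniform(-R, R) for _ in range(N)] for _ in range(pop_size)]
        for _ in range(generations):
            pop.sort(key=f)
            parents = pop[:pop_size // 2]          # keep the better half unchanged
            children = []
            while len(children) < pop_size - len(parents):
                a, b = random.sample(parents, 2)
                cut = random.randrange(1, N)       # one-point crossover
                children.append([xi + random.gauss(0, sigma)
                                 for xi in a[:cut] + b[cut:]])
            pop = parents + children
            sigma *= 0.99                          # anneal the mutation step
        return min(pop, key=f)

    print(f(evolve()))   # decreases toward 0 as generations accumulate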
The function just discussed is relatively simple in the sense that it is single-peaked. In reality, many
functions are multi-peaked, and the optimization is thus multileveled. Keane (1995) studied the following
bumpy function in a multi-peaked and multileveled optimization problem:

f(x, y) = sin²(x − y) sin²(x + y) / (x² + y²),  0 < x, y < 10.
The optimization problem is to find (x, y), starting from (5, 5), that maximizes f(x, y) subject to
x + y ≤ 15 and xy ≥ 3/4. Optimization is difficult here because the function is nearly symmetrical about
x = y and the peaks occur in pairs, one bigger than the other. In addition, the true maximum,
f(1.593, 0.471) = 0.365, lies on a constraint boundary. Figure 32.5 shows the surface variation of the
multi-peaked bumpy function.
Although the properties of this bumpy function make it difficult for most optimizers and algorithms,
GAs and other evolutionary algorithms perform well for this function and it has been widely used as a test
FIGURE 32.4 Function optimization using GAs: log f(x) versus generation t for two runs (best estimates 0.046346 and 0.23486). The two runs give slightly different results due to the stochastic nature of GAs, but both produce better and better estimates: f(x) → 0 as the generation increases.
FIGURE 32.5 Surface variation of the multi-peaked bumpy function f(x, y) over 0 < x, y < 10.
FIGURE 32.6 Schematic of the pressure vessel design problem, with head thickness Th, shell thickness Ts, and radius R.
function in GAs for comparative studies of various evolutionary algorithms or in multilevel optimization
environments (Jenkins, 1997; El-Beltagy and Keane, 1999).
The values of x1 and x2 should be integer multiples of 0.0625. Using the same constraints as
given in Coello (2000), the variables lie in the ranges 1 ≤ x1, x2 ≤ 99 and 10.0000 ≤ x3, x4 ≤ 100.0000
(with four-decimal precision). By coding the GAs with a population of 44-bit strings for each indi-
vidual (4 bits each for x1 and x2; 18 bits each for x3 and x4), similar to Wu and Chow (1994), we can solve
the optimization problem for the pressure vessel. After several runs, the best solution obtained is
x∗ = (1.125, 0.625, 58.2906, 43.6926) with f(x) = $7197.9912, which compares well with the result
x∗ = (1.125, 0.625, 58.291, 43.690) and f(x) = $7198.0428 obtained by Kannan and Kramer (1994).
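Assuming the standard cost function and constraint set for this problem, as used by Kannan and Kramer (1994) and Coello (2000), the objective and a feasibility test can be written as follows (a sketch; the coefficients are quoted from that literature):

    import math

    def cost(x):
        """Pressure vessel cost (material, forming, welding), standard form:
        x1 = shell thickness Ts, x2 = head thickness Th, x3 = radius R, x4 = length L."""
        x1, x2, x3, x4 = x
        return (0.6224 * x1 * x3 * x4 + 1.7781 * x2 * x3**2
                + 3.1661 * x1**2 * x4 + 19.84 * x1**2 * x3)

    def feasible(x):
        """Constraints g_i(x) >= 0 of the standard formulation."""
        x1, x2, x3, x4 = x
        g = [x1 - 0.0193 * x3,                                  # shell stress
             x2 - 0.00954 * x3,                                 # head stress
             math.pi * x3**2 * x4
             + 4.0 / 3.0 * math.pi * x3**3 - 1296000.0,         # minimum volume
             240.0 - x4]                                        # length limit
        return all(gi >= 0 for gi in g)

    x_best = (1.125, 0.625, 58.2906, 43.6926)
    print(cost(x_best), feasible(x_best))   # approx. 7198, True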
The nodal displacement vector is

U = (u1, v1, u2, v2, u3, v3, u4, v4, u5, v5)^T,

where u1 = v1 = u2 = v2 = 0 (fixed); measurements are made for the other displacements. The PA is
run with a CO2 affinity A = 10,000, a light intensity L = 10^4 to 5 × 10^4 lx, and a maximum CO2
fixation speed Vmax = 30, and each of the eight material parameters (Ei, νi) (i = 1, 2, 3, 4) is coded as a
16-bit DHAP molecule string.

FIGURE 32.7 Four-element finite element mesh with five nodes, material parameters (Ei, νi) for each element, and a unit load f = 1.

For a target vector and measured displacements U = (0, 0, 0, 0, −0.0066, −0.0246, 0.0828, −0.2606,
0.0002, −0.0110), the best estimates after 500 iterations of the PA are Y = (580, 0.24, 400, 0.31,
460, 0.29, 346, 0.26).
The inverse initial-value, boundary-value (IVBV) problem of Karr et al. (2000) concerns the diffusion
equation

∂u/∂t = ∇ · [κ(x, y)∇u],  0 < x, y < 1,  t > 0,
u(x, y, 0) = 1,  u(x, 0, t) = u(x, 1, t) = u(0, y, t) = u(1, y, t) = 0.
The domain is discretized as an N × N grid, and measurements are taken at the points (xi, yj, tn)
(i, j = 1, 2, . . . , N; n = 1, 2, 3). The data set thus consists of the measured values at N² points at three
different times t1, t2, t3. The objective is to invert, or estimate, the N² diffusivity values at the N² distinct
locations. Karr’s error metrics are defined as

Eu = A · [Σ_{i=1}^{N} Σ_{j=1}^{N} |u_{i,j}^measured − u_{i,j}^computed|] / [Σ_{i=1}^{N} Σ_{j=1}^{N} u_{i,j}^measured],
Eκ = A · [Σ_{i=1}^{N} Σ_{j=1}^{N} |κ_{i,j}^known − κ_{i,j}^predicted|] / [Σ_{i=1}^{N} Σ_{j=1}^{N} κ_{i,j}^known],
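A direct implementation of such a normalized error metric is short. The sketch below (ours; the grid size and the scaling constant A are illustrative) computes Eκ for a known and a predicted diffusivity field:

    import numpy as np

    def error_metric(true_field, estimate, A=1.0):
        """Karr-style normalised error: A * sum|true - estimate| / sum(true)."""
        return A * np.abs(true_field - estimate).sum() / true_field.sum()

    kappa_known = np.full((15, 15), 2.0)
    kappa_pred = kappa_known + np.random.uniform(-1e-3, 1e-3, kappa_known.shape)
    print(error_metric(kappa_known, kappa_pred))   # of order 1e-4 here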
FIGURE 32.8 Error metric Eκ (of order 10^−3) over the grid of nodes (i, j), associated with the best solution obtained by the GA.
The optimization methods using biology-derived algorithms and their engineering applications have
been summarized. We used four examples to show how GAs and PAs can be applied to solve optimization
problems in multilevel function optimization, shape design of pressure vessels, finite element inverse
analysis of material properties, and the inversion of diffusivity matrix as an IVBV problem. Biology-
inspired algorithms have many advantages over traditional optimization methods such as hill-climbing
and calculus-based techniques due to parallelism and the ability to locate the best approximate solu-
tions in very large search spaces. Furthermore, more powerful and flexible new generation algorithms
can be formulated by combining existing and new evolutionary algorithms with classical optimization
methods.
References
Chipperfield, A.J., Fleming, P.J., and Fonseca, C.M. Genetic algorithm tools for control systems engineer-
ing. In Proceedings of Adaptive Computing in Engineering Design and Control, Plymouth, pp. 128–133
(1994).
Coello, C.A. Use of a self-adaptive penalty approach for engineering optimization problems. Computers
in Industry, 41 (2000) 113–127.
De Jong, K. An Analysis of the Behavior of a Class of Genetic Adaptive Systems, Ph.D. thesis, University of
Michigan, Ann Arbor, MI (1975).
Deb, K. Optimization for Engineering Design: Algorithms and Examples, Prentice-Hall, New Delhi (1995).
Deb, K. An efficient constraint handling method for genetic algorithms. Computer Methods in Applied
Mechanics and Engineering, 186 (2000) 311–338.
El-Beltagy, M.A. and Keane, A.J. A comparison of various optimization algorithms on a multilevel
problem. Engineering Applications of Artificial Intelligence, 12 (1999) 639–654.
Flake, G.W. The Computational Beauty of Nature: Computer Explorations of Fractals, Chaos, Complex
Systems, and Adaptation, MIT Press, Cambridge, MA (1998).
Goldberg, D.E. Genetic Algorithms in Search, Optimization and Machine Learning, Addison-Wesley,
Reading, MA (1989).
Haykin, S. Neural Networks: A Comprehensive Foundation, Macmillan, New York (1994).
Heitkötter, J. and Beasley, D. The Hitch-Hiker’s Guide to Evolutionary Computation: A List of
Frequently Asked Questions (FAQ) (2000). Available via anonymous FTP from
ftp://rtfm.mit.edu/pub/usenet/news.answers/ai-faq/genetic/
Holland, J. Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, MI (1975).
Holland, J.H. Hidden Order: How Adaptation Builds Complexity, Addison-Wesley, Reading, MA (1995).
Jenkins, W.M. On the applications of natural algorithms to structural design optimization. Engineering
Structures, 19 (1997) 302–308.
Kannan, B.K. and Kramer, S.N. An augmented Lagrange multiplier based method for mixed integer
discrete continuous optimization and its application to mechanical design. Journal of Mechanical
Design, Transactions of the ASME, 116 (1994) 318–320.
Karr, C.L., Yakushin, I., and Nicolosi, K. Solving inverse initial-value, boundary-valued problems via
genetic algorithms. Engineering Applications of Artificial Intelligence, 13 (2000) 625–633.
Keane, A.J. Genetic algorithm optimization of multi-peak problems: Studies in convergence and
robustness, Artificial Intelligence in Engineering, 9 (1995) 75–83.
Koza, J.R. Genetic Programming: On the Programming of Computers by Means of Natural Selection,
MIT Press, Cambridge, MA (1992).
McCulloch, W.S. and Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bulletin of
Mathematical Biophysics, 5 (1943) 115–133.
Michalewicz, Z. Genetic Algorithms + Data Structures = Evolution Programs, Springer-Verlag,
New York (1996).
Mitchell, M. An Introduction to Genetic Algorithms, MIT Press, Cambridge, MA (1996).
Murase, H. Finite element inverse analysis using a photosynthetic algorithm. Computers and Electronics
in Agriculture, 29 (2000) 115–123.
Pohlheim, H. Genetic and Evolutionary Algorithm Toolbox for Matlab (geatbx.com) (1999).
Rechenberg, I. Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der biologischen
Evolution, Frommann-Holzboog, Stuttgart (1973).
Renner, G. and Ekart, A. Genetic algorithms in computer aided design. Computer-Aided Design, 35 (2003)
709–726.
Wu, S.Y. and Chow, P. T. Genetic algorithms for solving mixed-discrete optimization problems. Journal of
the Franklin Institute, 331 (1994) 381–401.
33.1 Introduction
With the advent of Micro-Electro-Mechanical Systems (MEMS) technology, complex and ubiquitous
control of the physical environment by machines will facilitate the diversity of mechanization and auto-
mation long promised by visionaries. Wireless sensor networks have appeared as the first generation of
this revolutionary technology. The small size and low cost of sensor devices will enable deployment of
massive numbers, but initially place severe limitations on the computing, communicating, and power
capabilities of these devices. With these constraints, research efforts have concentrated on developing
techniques for executing simple tasks with minimal energy expense. But, as MEMS evolve, computing
and communicating capabilities are expected to improve at an accelerating rate and new techniques for
supplying energy will significantly reduce the low power constraint. Increased capabilities will be possible,
and it is predicted that societies of machines will evolve to be autonomous, cooperative, fault-tolerant, self-
regulating, and self-healing. Improvements in biomimetic software and evolvable hardware will lead to
self-sustaining communities of machines with emergent behavior that autonomously operate and adapt to
changes in the environment. The main goal of this chapter is to investigate biomimetic models in relation
to their potential application to the evolution of these systems, thus providing a framework that guides
the evolution from the current primitive organizations of sensor nodes to pervasive societies of intelligent
electromechanical systems.
33.2 Background
The dream of ubiquitous machinery was a result of the Industrial Revolution: that machinery could
eventually be developed to service man’s needs, particularly replacing man’s direct participation in physical
work for provision of food, clothing, housing, and other necessities. The realization of this dream has
progressed for several centuries but has taken a revolutionary leap through the invention of electronic
computers. Computers have vastly improved the performance of mechanical machines by allowing more
sophisticated “thought” processes for control of the mechanics (e.g., capabilities of industrial robots
used in manufacturing are far more autonomous and complex, automobile engines are much more
efficiently controlled, etc.). Futurists have long predicted that machines would eventually communicate
and cooperate with each other to accomplish extraordinarily complex tasks without human intervention.
The rapidly accelerating improvement of computing capabilities over the last 50 years has contributed
to the realization of this dream. Four major factors are
1. Increasing computational performance: Moore’s Law has reliably predicted that computing
performance doubles every 18 months.
2. Reduction in physical size: Computer technology has gone from vacuum tubes to transistors, to
integrated circuits. Soon nano-technology will create another revolution by further reducing the
physical size of circuitry.
3. Decreasing cost of production: Improved technology has contributed, but commodity pricing has
had a greater effect.
4. Reduction in power requirements and improvements in power sources: Advances in technology com-
bined with reductions in size will continue to reduce power requirements. However, the next
paradigm shift will come from techniques making it possible to scavenge power from the ambient
environment (from various sources including vibration, heat, light, and background radio noise).
While futurists long dreamed of machines working with other machines, a giant step toward the
realization of this dream may be credited to a DARPA-sponsored program, SmartDust, originated in
1999 (Kahn et al., 1999). The title of the program creatively described its goal: to make machines, with
self-contained sensing, computing, transmitting, and powering capabilities, so small and inexpensive
that they could be released into the environment in massive numbers. Whether intended to be mobile
(e.g., small enough to be cast into the wind to stay aloft for extended periods) or immediately stationary
(e.g., deployed from an airplane over a large geographical area for ground surveillance), the sensors have
the formidable task of self-organizing into a network that can transmit information to a user while being
severely constrained by the onboard energy supply. As they were funded by the Department of Defense,
much of the research concerned surveillance of battlefield scenarios, but it was immediately apparent that
many peace-time applications could benefit from wireless sensor networks (WSNs).
Observations of the SmartDust project suggest the following definition of WSNs:
The National Research Council’s (NRC 2001) Committee on Networked Systems of Embedded Computers
published a report expanding this definition. They defined the concept of the embedded network, EmNet,
as a network of heterogeneous computing devices pervasively embedded in the environment of interest.
Their stated objective was to “develop a research agenda that could guide federal programs related to
computing research and inform the research community (in industry, universities, and government)
about the challenging needs” of research in EmNets. They recognized a difference between EmNets and
traditional computer networks in that the former will be “more tightly integrated with their physical
environment, more autonomous, and more constrained in terms of space, power, and other resources.
They will also need to operate, communicate, and adapt in real time, often unattended.” The Committee
enlarged the scope for SmartDust to paint a picture of the eventual embedding of sensor (and effector)
nodes into every aspect of our world: “computing and communications technologies will be embedded into
everyday objects of all kinds to allow objects to sense and react to their changing environments. Networks
comprising thousands or millions of nodes could monitor the environment, the battlefield, or the factory
floor; smart spaces containing hundreds of smart surfaces and intelligent appliances could provide access
to computational resources.” A subtle but important prediction is made here: these networks will not
only gather information about the environment but will affect environmental conditions or effect new actions.
Thus, a feedback mechanism is instantiated where input to the network may be affected by its own actions.
Diverse applications are anticipated:
EmNets will be implemented as a kind of digital nervous system to enable instrumentation of all sorts
of spaces, ranging from in situ environmental monitoring to surveillance of battlespace conditions;
EmNets will be employed in personal monitoring strategies (both defense related and civilian),
combining information from nodes on and within a person with information from laboratory
tests and other sources; and EmNets will dramatically affect scientific data collection capabilities,
ranging from new techniques for precision agriculture and biotechnological research to detailed
environmental and pollution monitoring.
When this point is reached, one could argue that the universe becomes one gigantic EmNet. The
Massachusetts Institute of Technology’s (MIT) Amorphous Computing group sees nanoscale computers
“combining microsensors, actuators and communications devices integrated on the same chip to produce
particles that could be mixed with bulk materials, such as paints, gels, and concrete” (Abelson et al., 1999).
These groups envision an evolution from SmartDust’s passive observation of the environment to active
manipulation of the environment, driven by the coordinated action of massive numbers of sensors and
actuators, coupled with vast computing resources. Clearly, the term wireless sensor network describes only
a small portion of this vision.
To accomplish this vision, research challenges abound. Despite constraints in power, memory, band-
width, etc., these devices will be embedded into systems designed to last for long periods of time.
While SmartDust predicted massive numbers of nodes per network, many issues resulting from density
and scale remain unsolved. EmNets will be self-configuring upon deployment and adaptive to changes in
both the network and environment. Trust and fault tolerance models will have to be developed to solve
problems well beyond those posed by conventional networks. EmNets will control real-time processes
where great costs will be incurred or lives put at risk upon failure, making reliability, security, and quality
of service the major research issues to be considered. There are also nontechnical issues. Ubiquitous and
pervasive devices constantly monitoring and controlling everything present many potential legal, ethical,
and policy controversies regarding privacy, security, reliability, intellectual property rights, etc. There are
other issues concerning production standards, commercialization, business models for implementation
and coordination, etc. For the sake of making some progress, many of these requirements are being
ignored for current prototypes and present significant challenges for future research.
The NRC expanded the definition of wireless sensor networks:
Wireless sensor networks are massive numbers of small, inexpensive, self-powered devices pervasive
throughout electrical and mechanical systems and ubiquitous throughout the environment that monitor
(i.e., sense) and control (i.e., effect) most aspects of our physical world.
While the NRC envisions ubiquitous use of WSNs in the next few years, some visionaries look beyond
that horizon to predict even greater developments. Kurzweil (Richards et al., 2002) predicts that within the
next 20 years, a $1000 computer (typical price of today’s PC) will have computational power that matches
the human brain. He states, “This level of processing power is necessary but not sufficient for achieving
human-level intelligence in a machine. Organizing these resources — the ‘software’ of intelligence — will
take us to 2029, by which time your average personal computer will be equivalent to a thousand brains.”
Unlike humans, who must share knowledge through speech or vicariously through print or other media,
computers can transmit all of their knowledge to other computers in a relative instant. Science
fiction writers and Hollywood often portray such an intelligent machine as a walking, talking robot (e.g.,
Arnold Schwarzenegger’s character in the Terminator), but a more likely scenario is a distributed network
of intelligent machines. Kurzweil further predicts that humans will be directly connected to these net-
works via neural implants. Some of these implants are available today as “standalone” devices to counteract
Parkinson’s disease, for hearing (cochlear implants), and, soon, visual implants for the blind. Wearable
devices currently monitor a person’s health and transmit this information to a network of resources for
analysis and action. Kurzweil envisions these implants will progress to allow humans to immerse them-
selves in the network of intelligent machines (e.g., to enter virtual reality environments). While his focus
is on the expectation of vast improvements in the intelligence of machines, it is a reasonable assumption
that similar scales of improvements will be obtained for the size of components and power requirements.
Moravec (1988) believes that machines will not only exceed the capabilities of humans in a similar time
period but will eventually become self-sufficient. He sees a future in which, “the human race has been
swept away by the tide of cultural change, usurped by its own artificial progeny.” Thus, when machines
are more intelligent than humans, autonomous, adaptable, self-sustaining, and self-replicating, they will
have no further use for mankind.
These visionaries further extend the definition of WSNs predicting their destiny:
The ever increasing capabilities of pervasive and ubiquitous wireless sensor networks will improve
the intelligence, autonomy, and adaptability of electrical and mechanical systems such that they will
soon converge with and surpass the capabilities of humans.
Kurzweil’s and Moravec’s predictions do not go unopposed. Their vision of man–machine conver-
gence, titled Strong AI (Artificial Intelligence), is challenged by many (Richards et al., 2002). One would
expect theologians to contest these predictions, but many metaphysicians as well argue that there are char-
acteristics of humanity that cannot be created or duplicated through a programmed series of chemical
and physical reactions. These include consciousness, curiosity, creativity, compassion, freedom, etc. The
debate rages on as to whether machines could ever identify with humans. But, even if they never do, and
Kurzweil and Moravec are off by orders of magnitude in their projections, capabilities will improve to the
extent that networked machines in the near future will go far beyond those of today.
Throughout this sequence of definitions, it is apparent that the term wireless sensor network is inadequate
to describe this increasing complexity. EmNets is neither very descriptive, nor often used. Sensor/effector
network may be more complete but is still not comprehensive. As the original term is well understood
and accepted by the research community, it will be retained for the remainder of the discussion using the
abbreviation, WSN. The individual sensor device is often called a sensor mote (or mote), but in the context
of the network, a mote is referred to as a sensor node (or node).
The progression from simple machines to complex arrangements of machines that are autonomous,
self-regulating, self-healing, and even self-reproducing will require increasing sophistication not only of
individual nodes and the network but also of the supporting software. As will be shown, the evolution
needed is not unlike that of life from simple unicellular organisms to multicellular societies. Section 33.2
presents a set of defining characteristics of WSNs that differentiate them from conventional computing
networks by describing their functionality and organization. Section 33.3 describes the current status
of WSN implementations. Section 33.4 describes the evolution of living organisms in terms of their
organization and behavior. Of prime importance are
• The diversity of life and how such diversity is required of future WSNs.
• The differences between innate behavior and learned behavior and their effect on emergent behavior
from individual organisms acting as a group.
The key to the application of these principles to WSNs is a new discipline of engineering, engineering for
emergent behavior, which will be defined in later sections of this chapter. Section 33.5 presents our vision
that current WSN implementations parallel early life forms and describes what types of biomimetics
will apply to future WSNs to build an ecosystem of WSNs. This section also identifies research techniques
currently used in Artificial Intelligence (AI) and Artificial Life (Alife) that apply to engineering for emergent
behavior of WSNs. Section 33.6 presents a philosophical discussion of the importance of engineering for
emergent behavior to the future of WSNs. Section 33.7 offers the concluding remarks.
• Location awareness: Because the WSN is closely tied to the physical environment, node location
must be determined. Wadaa et al. (2004a) devised a scalable, energy-efficient training protocol to
provide locations for nodes that are initially anonymous, asynchronous, and unaware of their locations.
• Large numbers of sensor nodes: Many of the challenging and new issues of WSNs are concerned
with coordinating massive numbers of sensor nodes into a functioning network.
• Small physical size of sensor nodes: Initially, reduction in size is expensive. However, as manufacturing
improves, smaller size and increased volume should contribute to reduced prices. Also, smaller sized
nodes will widen the range of applications.
• No pre-assigned network topology: If the location of the sensor nodes is not predetermined, neither
can the network topology. Modest power budgets may prevent nodes from transmitting directly to
a common destination, therefore an ad hoc process will be applied to form a multi-hop network.
• Anonymous network: The scalability of massive numbers of nodes within a WSN precludes the
assignment of unique node identifiers during deployment. Anonymity can be useful in securing a WSN.
Wadaa et al. (2004b) describe how anonymity can be used to prevent denial-of-service attacks.
• Wireless communication: Random distribution dictates wireless communication.
• Limited onboard energy supply: Current battery technology ties capacity to physical size. Therefore,
the need for small sensor nodes competes with the need for more onboard energy. If technology
improves sufficiently, this will no longer be a major issue.
• Node mobility: Motion of sensor nodes may be:
◦ Static: Once deployed, the sensor nodes do not move.
◦ Relative: The sensors move, but as a group (e.g., a group of satellites may be placed in parallel
orbits but remain fixed relative to each other; a “satellite constellation” ).
◦ Fully mobile with independent movement : For example, a swarm of sensing robots.
• Application code distributed across nodes: WSNs range from those containing homogeneous nodes
reporting simple observations to heterogeneous nodes with differing (and perhaps changing)
functions.
• Data aggregation: In addition to individual sensor nodes reporting to a single destination,
sensor nodes may form clusters whereby local preprocessing of sensed data improves efficiency
by transmitting only the aggregated data. In fact, a hierarchy of clusters and aggregation may
be used (see the sketch after this list).
• Performance and reliability: Because of the interaction with the physical world and the potential
for damage to it in many applications, new demands will be placed on these networks for real-time
performance and reliability.
• Security: As stated, these networks must perform and be reliable; therefore, nodes must be trusted
network members. Jones et al. (2003) proposed a new security paradigm in which security is based
upon parameterized frequency hopping and cryptographic keys in a unified framework.
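As a simple illustration of the aggregation idea referenced above, the sketch below (ours; the node names and the choice of the mean as the summary statistic are illustrative) shows cluster heads forwarding one summary value instead of every member's raw reading:

    from statistics import mean

    def aggregate(readings, cluster_of):
        """Cluster-head aggregation: each cluster head forwards one summary
        (here the mean) instead of every member's raw reading."""
        clusters = {}
        for node, value in readings.items():
            clusters.setdefault(cluster_of[node], []).append(value)
        return {head: mean(values) for head, values in clusters.items()}

    readings = {'n1': 21.0, 'n2': 22.0, 'n3': 35.0, 'n4': 36.0}
    cluster_of = {'n1': 'h1', 'n2': 'h1', 'n3': 'h2', 'n4': 'h2'}
    print(aggregate(readings, cluster_of))   # {'h1': 21.5, 'h2': 35.5}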
Many of these characteristics may not apply to networks resembling WSNs. Consider for example a
“smart skin”where many sensors are attached to a surface (e.g., an airplane wing). Sensor and actuators may
be pervasive throughout a larger structure, serving not only as surface sensors to sample environmental
input but internally as “muscles” to move an object. Such a network may be massive but fixed such that
each sensor’s position and identity may be predetermined, its network connectivity may be predetermined
and need not be wireless, and its supply of power may not be limited. But it still has large numbers of
small, spatially distributed nodes that must communicate efficiently and aggregate large amounts of
locally collected information for global decisions. Because of its close ties to the physical world, real-time
performance, reliability, and security are extremely important.
WSNs have many close cousins that go by a variety of names such as:
• Heterogeneous WSNs: This is the nomenclature of Intel’s EcoSense research project, that is, “tackling
a difficult challenge: how to network large numbers of inexpensive wireless sensor nodes while
maintaining a high level of network performance” (Intel Research — Exploratory Research — Deep
Networking — Heterogeneous Sensor, 2004).
• Deep networking : Intel defines Deep Networking as
Locally networking billions of embedded nodes, driving computing deeper into the infra-
structure that surrounds us. . . .Small, inexpensive, low-powered sensors and actuators, deeply
embedded into the physical environment can be [deployed] in large numbers, interacting and
forming networks to communicate, adapt, and coordinate high-level tasks. . . .As these micro
devices are networked, the Internet will be pushed not just into different locations but deep
into the embedded platforms within each location. This will enable a hundredfold increase
in the size of the Internet beyond the growth we are already anticipating. New and different
methods of networking devices to one another and to the Internet must be developed. (Intel
Research — Exploratory Research — Deep Networking, 2004)
• Amorphous computing : This is MIT’s term for “the development of organizational principles and
programming languages for obtaining coherent behavior from the cooperation of myriads of
unreliable parts that are interconnected in unknown, irregular, and time-varying ways.” (Abelson
et al., 2000)
• Sensor webs: NASA’s Jet Propulsion Laboratory (JPL) describes a sensor web as “an independent
network of wireless, intra-communicating sensor pods, deployed to monitor and explore a limitless
range of environments. This adaptable instrument can be tailored to whatever conditions it is sent
to observe.” (Delin, 2004)
• Mesh networks: This term describes a type of Wi-Fi network where nodes do not communicate
through a central controller, or access point, but rather, mobile ad hoc peers in the network form
a mesh topology to transmit data from source to destination via multiple hops throughout the
network. This technique improves on the reliability and efficiency of the centralized approach.
Although designed for connection of conventional computers, this technology shares many issues
with WSNs.
• Sensor constellation: This term is usually used to describe multiple satellites cooperating in a space
science experiment (Leveraging the Infosphere: Surveillance and Reconnaissance in 2020, 1995).
• Pervasive computing : The Centre for Pervasive Computing (2004) defines this as,
the next generation computing environments with information & communication technology
everywhere, for everyone, at all times. Information and communication technology will be an
integrated part of our environments: from toys, milk cartons and desktops to cars, factories
and whole city areas — with integrated processors, sensors, and actuators connected via
high-speed networks and combined with new visualization devices ranging from projections
directly into the eye to large panorama displays. . . .Pervasive computing goes beyond the
traditional user interfaces, on the one hand imploding them into small devices and appliances,
and on the other hand exploding them onto large scale walls, buildings and furniture.
• Ubiquitous computing : This term was coined by Mark Weiser (1996), whose goal is to “Activate the
world. Provide hundreds of wireless computing devices per person per office, of all scales (from 1-in.
displays to wall sized). This has required new work in operating systems, user interfaces, networks,
wireless, displays, and many other areas.”
• Invisible computing : An ACM SIGGRAPH conference in 2000 explored “the tiny, cheap, special-
purpose devices that experts expect to diffuse into our lives over the next two decades. Though
these devices themselves may be visible, our common goal is to make them so comfortable to use
that they seem not to be computers at all — the computing power they use stays invisible” (Invisible
Computing: Scope, 2000).
An early field demonstration was the Twentynine Palms UAV-dropped sensor network demo. Researchers
dropped six sensor motes from an Unmanned Aerial Vehicle (UAV) along a road. These motes
self-organized by synchronizing their clocks and forming a multi-hop network. They magnetically
detected passing vehicles and reported the times of the passings. Using information collected from all
motes, the velocity of passing vehicles was computed.
On August 27, 2001, researchers from UCB and the Intel Berkeley Research Lab demonstrated a self-
organizing WSN to those attending the kickoff keynote of the Intel Developers Forum. Several students
each brought on stage a wireless sensor mote and activated it at different times. As the motes were
initiated, their icons appeared on a display with lines connecting all motes that could “hear” each other and
highlighted lines depicting a multi-hop routing structure over which sensor data could be transmitted to a
central collector, a PC. The network grew as the nodes were initiated and adapted to changing conditions.
Additionally, color cues on the display indicated changing lighting conditions on the stage as sensors
detected these changes. As the students left the stage, the network disintegrated. In a second large-scale
demonstration of the networking capability, the quarter-sized motes were hidden under 800 chairs in
the presentation hall and simultaneously initiated forming what Culler described as, “the biggest ad hoc
network ever to be demonstrated” (Lammers, 2001).
In the spring of 2002, Culler’s group (Mainwaring et al., 2002) collaborated with the College of the
Atlantic in Bar Harbor to install a WSN on Great Duck Island, Maine. The initial application was to
monitor the microclimates of the nesting burrows of Leach’s Storm Petrel and, by disseminating the
data worldwide, to enable researchers anywhere to nonintrusively monitor sensitive wildlife habitats. The
sensor motes were placed in the habitat and formed a multi-hop network to pass messages in an energy-
efficient manner back to a laptop base station. The data was intermediately stored at the base station
and eventually passed by satellite to servers in Berkeley, CA, where it was distributed via the Internet to
any interested viewer. The sensors measured temperature, humidity, barometric pressure, and mid-range
infrared by periodically sensing and relaying the sensed data to the base station. The largest deployment
had 190 nodes with the most distant placement over 1000 feet from the nearest base station.
The FireBug system (Chen et al., 2003) is a network of GPS-enabled, wireless thermal sensor motes,
communicating through a control layer for processing sensor data, and a command center for interactive
monitoring and control of the WSN. The FireBug network self-organizes into clusters in which cluster
leader motes act as base stations, receiving sample data from cluster members and brokering commands
to these members. The controller is a personal computer running the Apache web server interfaced with
MySQL using PHP. The FireBug Command Center allows user interaction for controlling the FireBug
network and displays real-time changes in the network.
and improved rapidly. If Kurzweil and Moravec are correct, eventually electromechanical systems
may design themselves. Again, random mutation and selection for sustainability may prevail but
at a much faster pace than ever occurred in life.
• Functionality, like all other characteristics of living organisms, is important only insofar as it affects
sustainability. While humans are in control, functionality will be most important for a WSN,
whereas sustainability only affects issues such as cost, efficiency, longevity, etc. The evolution of WSNs
can therefore be directed toward increasing functionality, with sustainability only a secondary issue.
Again, Kurzweil and Moravec predict that this may change back, giving primary importance to
sustainability, if machines become autonomous.
Organization models and behavior influence functionality. In general, with more complexity comes
more functionality and more diverse behavior. It is important to note that whereas man continually
strives to divide living systems into distinct categories, organisms have developed as a continuum of both
organization and behavior, in which there are typically exceptions to any rule applied for distinction.
The development of machines can be perceived as such a continuum, but it is necessary to adopt a model
in which clear distinctions are made. A distinction will be made between the capabilities of unicellular
organisms and those of multicellular organisms and, furthermore, between organisms surviving
alone and those organized into groups (colonies, swarms, societies, etc.).
Behaviors will be categorized as those that result from responses to stimuli that ultimately are determined
genetically (i.e., innate) and those that are cognitive. Cognition is defined as the mental process by
which knowledge is acquired, which is a result of awareness, perception, intuition, reasoning, memory, and
judgment. Thus, a behavior is said to be cognitive if its effect is known and understood by the effector.
Cognitive behavior begets new cognitive behavior that is not genetically encoded. For this discussion, new
behavior that results from prior experience defines learning. A simple response to stimuli is not considered
learning. An interesting effect of cognition in living systems is that it may introduce selection factors other
than sustainability.
• Simple organization, cells: Although cells are composed of identifiable structures, and some
independent but subcellular components function in a capacity that arguably demonstrates char-
acteristics of life, cells are the basic unit of life for the purpose of this discussion. All cells are
contained by a plasma membrane in which the chemical “soup” of life operates. All cells have a
genetic structure, DNA, controlling protein production that, in turn, catalyzes all metabolic pro-
cesses within the cell. Cells are at least potentially capable of self-replication. A cell may serve many
functions and exist independently as a distinct organism or have a very specialized function while
cooperating symbiotically with many other adjacent cells forming a multicellular organism.
• Building blocks of complexity, tissues, and organs: Tissues are contiguous, homogeneous, highly spe-
cialized cells, each serving a function where the collective result of their actions yields a cumulative
result. For example, muscle cells contract, applying a pulling force, when stimulated. While the
power of individual cells is small, the cumulative power of millions of cells contracting simulta-
neously produces a significant force. Unlike unicellular organisms that must forage for food, cells
of tissues are not self-sustaining as they are provided nourishment by the multicellular organism
to which they belong and cannot survive independently. Organs are a heterogeneous organization
composed of many kinds of tissues to serve a cumulative function greater than the individual
functions of their component tissues (e.g., a stomach has muscle tissues and tissues to secrete acid
and mucus; it functions to digest food). Organs may combine as components in an organ system to
perform a higher function (e.g., the digestive system has many organs working together to intake,
digest, and distribute food while eliminating waste).
• Complex organization, multicellular organisms: Multicellular organisms are composed of tissues and,
usually, organs. They range from the simplest of these composed of few tissues (e.g., Coelenterates
such as jellyfish) to plants with simple organs to mobile animals with complex motor and nervous
systems. Of particular interest to this discussion are the latter. Because of these complex systems,
they can not only respond to stimuli but can learn from these stimuli to alter their behavior.
Multicellular organisms reproduce as a unit.
• Organizations of organisms: Colony, population, community, and society are nearly synonymous
words describing a group of organisms living or growing together. Although they are often used
interchangeably, colony, population, and society usually denote a homogeneous group of the same
kind of animals, plants, or unicellular organisms, while community usually indicates all life in a
given habitat.
• Add the environment to comprise the ecosystem: An ecosystem is the community of living organisms
in a habitat together with all nonliving (abiotic) components of the environment with which the
community interacts.
• The root of the tree is the biosphere: On Earth, the biosphere is all of the area of Earth from the
highest altitude in the atmosphere to the lowest depths of the oceans and land where life exists.
Thus, life on Earth can be represented as a tree with the biosphere the root, which is composed of
ecosystems, which are composed of communities and the abiotic environment, etc. The beauty of
the evolutionary process is that the biosphere comprises all levels of organizations and behavior:
if an organism fills a niche for sustainability, it survives. Life did not begin with the simple organisms
and then, as it evolved more complex organisms, discard the simple ones. There is a delicate
balance within an ecosystem between all of the interdependent organisms together with their
environmental resource requirements. With the obvious competition for food, it may appear that
life is a constant fight for survival. While this is true for an individual organism, when the biosphere
is considered, life is what is sustained. Margulis and Sagan (1986) stated, “Life did not take over the
globe by combat, but by networking.” The biosphere as a whole may be viewed as a giant symbiotic
relationship.
Applying the ecosystem model to future WSNs encourages one to plan a diverse collection of interacting
(i.e., networked) sensor/effector systems of all sizes and complexities to fulfil every required niche.
Some researchers hold that life and cognition are synonymous: “Bacteria and other unicellular organisms are autonomous and social beings showing
(the lowest levels of) cognition. They have the fundamental cognitive abilities to identify elements of
the environment and to differentiate between them (and self), to choose among alternatives, to adapt to
changes, to coordinate their behavior in groups, to act purposefully.” While these cells respond to external
stimuli, their response is ultimately preprogrammed within and limited by their DNA. In view of previous
definitions of cognition and learning, these behaviors are not cognitive. Although improvements occur
through generations of randomly mutated cells with mutations selected for sustainability, this is also not
considered learning.
Multicellular organisms often act as singletons. Other than to mate, mosquitoes normally function
independently. Mosquitoes cohabitate in massive numbers, often appearing to swarm and are described
as such. In northern ecosystems, these swarms congregate in such density as to inhibit breathing (Conniff,
1996) and have been known to drain enough blood to cause the death of both cows and caribou (Budiansky,
2002). However, as defined below, this differs from the behavior exhibited by swarms of bees or ants. This
action is the result of massive singletons simultaneously attacking the same food source; they are not
cooperating but are competing. Again, for this discussion, these organisms do not exhibit cognitive
behavior.
Kelly (1994) described this marvel of emergence:

The marvel of the “hive-mind” is that no one is in control, and yet an invisible hand governs, a
hand that emerges from very dumb members. The marvel is that more is different. To generate a
colony organism from a bug organism requires only that the bugs be multiplied so that there are
many, many more of them, and that they communicate with each other. At some stage the level of
complexity reaches a point where new categories like “colony” can emerge from simple categories
of “bug.” Colony is inherent in bugness, implies this marvel. Thus, there is nothing to be found in
a beehive that is not submerged in a bee. And yet you can search a bee forever with cyclotron and
fluoroscope, and you will never find the hive.”
Social homeostasis, hive-minds, swarming, and similar behavior associated with large groups of social
insects share a common thread: the intelligence that these behaviors seem to exhibit is not the result of a
cognitive process. The intelligence of the swarm is emergent and results from the interactions of many
thousands of basically dumb, autonomous individuals. Each individual follows its own set of rules and
reacts to local state information. The rules are primarily a result of a manifestation of genetic encoding in
the presence of environmental stimuli. There is no central control command issuing orders. Individuals
are highly connected within their immediate neighborhood and can share information, but they do not
have a central server with which to communicate. Control and management of the swarm is distributed
throughout and within the swarm members. It is distributed control with no single point of command
and no single point of failure.
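The flavor of such emergence is easy to reproduce in simulation. In the minimal sketch below (ours; the one-dimensional setting, hearing radius, and drift rate are illustrative), each agent follows one purely local rule, yet clusters emerge that no individual computed or planned:

    import random

    def step(positions, radius=2.0, rate=0.1):
        """Each agent follows one local rule: drift toward the centroid of the
        neighbours it can 'hear'. No agent knows the global state."""
        new = []
        for i, p in enumerate(positions):
            neigh = [q for j, q in enumerate(positions)
                     if j != i and abs(q - p) < radius]
            new.append(p + rate * (sum(neigh) / len(neigh) - p) if neigh else p)
        return new

    agents = [random.uniform(0, 10) for _ in range(30)]
    for _ in range(200):
        agents = step(agents)
    # Clusters emerge although no individual computed or stored them.
    print(sorted(round(a, 2) for a in agents))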
What level of intelligence (i.e., what extent of cognition) is required of bees to perform such
behavior? There is evidence that some degree of memory is required. Gould and Gould (1988) believed
that a bee’s capacity for spatial recollection provides not only storage of many mental maps to food
locations but also the ability to calculate shortcut routes between visited sites. However, Dyer and Seeley
(1991) found that bees are limited to storing route maps comprising memorized landmarks (i.e., a much
condensed representation of spatial topology) and found no evidence that bees can form spatial
relationships between different routes. Bees also combine observations of solar movement with an
internal circadian indicator to relate landmarks for directional cues.
Gregory (1997) stated,
One’s first thought might be that insects are mere automata, lacking ability to learn, but this is not so.
Bees can learn flower patterns and complex routes to food. They can navigate, and communicate in
many ways including the famous dance-language discovered by Karl von Frisch. The waggle dance . . .
is related to the position of the Sun, indicating to other workers direction and distance and food
quality of flowers. No doubt there is always a danger of reading too much intelligence into innate
behavior, but it seems impossible to see this as completely stereotyped, for it is adjusted to particular
conditions and needs. . . .Insect learning may be limited to immediate adaptive uses; but insects
do have associative learning by Pavlovian conditioning with rewards from previously meaningless
stimuli. . . .They do show latent learning, though probably without “insight.” They cannot reorganize
their memories for a new situation (at least not in laboratory conditions), and there is little transfer
learning from one situation to another. But it remains remarkable that such small brains can do
so much.
Reinhard et al. (2004) demonstrated a “Pavlov’s dog” response in bees by exposing a hive to two food
sources, each laced with a different fragrance: rose and lemon. Later, all food and scent were removed
from the stations. When exposed to one of the scents, the bees returned to the empty feeding station that
had previously contained that scent. While this demonstrates memory and learning, it does not give
evidence of long-duration memory in bees.
Remarkable as the behavior is, Gregory (1997) sees no evidence of concept formation and insight in
bees. Gregory agreed with Gould and Gould (1988) when they said,
(bee) communication and navigation are the most complex known among invertebrates and, except-
ing humans, quite likely among vertebrates as well. . . .Far and away the most complex animal
behaviors we know of usually are innate. . . Orb-weaving spiders make their characteristic webs in
total darkness with no previous experience or learning. . . .In our opinion, selection has operated to
make complex behavior innate for the simple reason that, if it were not, animals could not hope to
discover it by trial and error or to learn it by observation in time to be able to perform it. In the end,
complexity of behavior is one of the worst guides of all to intelligence.
While these behaviors demonstrate the innate ability for efficient calculations using simple memories of
environmental cues such as landmarks and the movement of the sun, this does not compare with the
cognitive abilities of higher animals.
A wolf understands that the chance of success is greater in a pack and, therefore, prefers to participate as a member
of the pack. In this case, individual abilities are similar although differing duties may be assumed. Wolves
have behaviors for greeting pack members, keeping the pack together, and identifying social ranking. The
entire pack participates in raising the young: they feed the mother and the young, protect them from
harm, and nurse the young when the mother is away. Human societies are much more complex with
more diverse duties of individuals contributing to the good of the community. In a successful society, a
symbiotic relationship develops among individuals, where actions that are good for the society are also
good for the individual.
Most will agree that cognitive organisms require a higher level of intelligence. But that does not add to
understanding without a definition of intelligence. Pfeifer and Scheier (2000) summarized the opinions
of major experts in the field of psychology in 1921 on the definition of intelligence. Lachman
et al. (1979) stated that behavior can be modeled by “how people take in information, how they recode
and remember it, how they make decisions, how they transform their internal knowledge states, and
how they translate these states into behavioral outputs.” Pfeifer and Scheier (2000) described this as
cognitivistic paradigm or functionalism, which formed the basis of informational processing psychology.
This conveniently aligned with computer simulation, where thinking could be modeled as a computer
program having input, data structures, computation, and output.
While this model served well for simulating problem solving (culminating in the victory of IBM's
"Deep Blue" computer over the reigning world chess champion), it proved less than adequate for
modeling simple behaviors even a child can perform, particularly those requiring interaction with the
environment (e.g., identification of colors and shapes, depth perception, navigation, etc.). Pfeifer
and Scheier (2000) promote a new model of learning they call embodied cognitive science (also known
as new artificial intelligence or behavior-based artificial intelligence). Summarizing the view of Rodney Brooks
of the MIT Artificial Intelligence Laboratory, one of the founders of this new model, they agree that
the cognitivistic paradigm’s model of thinking, logic, and problem solving is fundamentally flawed by
our own introspection of how we see ourselves. Brooks suggested we abandon this approach and focus
on interaction with the real world. Intelligence emerges from the interaction of an organism with its
environment: intelligence must operate within a body. This effort has overcome some of the problems
encountered by use of the cognitivistic paradigm, and developed the approach of designing for emergent
behavior, producing some unexpected though desirable results.
Pfeifer and Scheier (2000) specify two additional concepts that pertain to this discussion: First, “The
essence of learning is that the [organism] can use its own experience to improve its behavior.” Because
experience is nondeterministic, predicting changes in behavior is difficult. Second, adaptive behavior
requires two components: compliance with existing "tried and true" rules and the application of diverse, new
rules. They call this the diversity–compliance tradeoff. It is the application of diverse, new behavior related to
previously learned behavior (rules) that has the best chance for improvement.
Cognitive behavior described above is defined as intra-generational; each individual of a generation
learns based on its own experience. Response to stimuli is evaluated by a fitness function (e.g., was lifting
my hand quickly from a hot stove good or bad?); behavior is either reinforced or altered based on the
result. In higher animals, stimuli and the resulting chosen behavior are remembered, organized, related,
and prioritized. Thus, when the same stimuli are encountered again, the best behavior is exhibited. But
inter-generational learning may be man’s greatest achievement. All organisms benefit from information
that is inherited (i.e., “learned”) from the previous generation through genetic codes. Genetic codes are
passed through DNA in reproduction and determine behavior as responses to stimuli (e.g., phototropism
in plants — plants “know how” to grow toward a light source). But as we have defined, this is not cognitive
behavior. In humans, this inter-generational transfer is augmented as information is acquired from parents
and other members of previous generations. In most animals this transfer of knowledge occurs only by
direct interaction with parents or contemporaries (e.g., other animals in the pack). Man, however, has
developed the ability to permanently capture knowledge through books and other media and indirectly
pass it on to future generations. The power of this technique cannot be overemphasized.
• Centralized: Behavior of the society is dictated by a central authority. This approach has the
weakness of a single point of failure.
• Federated: Behavior of the society is dictated by a federation of authorities. This approach mitigates
the single point of failure but increases the complication of decision making.
• Distributed: Behavior of the society can emerge from actions of individuals on the basis of rewards,
extinction, and punishment or other learning models.
Imagine a town is newly formed. Who decides how many tailors the town needs? How many merchants?
How many carpenters? Obviously, a leader could emerge that would make those decisions for all. But the
leader could be wrong. He could fail and chaos would develop until new leadership emerges. He could
misappropriate resources for personal gain, but to the detriment of the society. Alternatively, individuals
could decide their own role in the society. A balance would naturally evolve: if too many people decide
to be a tailor, there will not be enough work and some will starve or be forced to change. It is likely that
the better tailors will get what work is available and survive. When individuals are free to make their
own choices of behavior, the choices can be detrimental to the society. The most successful societies have
been those in which individuals have been free to work for individual rewards within certain bounds
(i.e., individuals cannot seek personal rewards to the detriment of the society).
If he sees a light or hears a noise, he may not report the incident but store that information for further
consideration. He may pay more attention to that area in the future. He may also rely on inter-generational
learning: when faced with a new situation, he may consult the command center for more information; he
may consult a manual. What happens during the changing of the guard (i.e., generational replacement)?
The guard could report all information to be transferred to the central command, let the central command
inform the new guard, and pass the new guard without saying a word. Perhaps it would
be more efficient for the current guard to directly discuss the information with the new guard allowing
interaction; the new guard could ask questions for clarification. The current guard could do a complete
“data dump” to the new guard telling him every incident that happened so that the new guard had all
experience that happened during the previous shift. Perhaps it would be more efficient for the current
guard to use his awareness, perception, intuition, reasoning, memory, and judgment (i.e., cognition) to
decide what knowledge to impart to the new guard. Similarly, cognitive nodes can significantly improve
the efficiency and effectiveness of many WSN implementations.
• Genetic material: The “genes” of sensor nodes are represented by characteristics endowed in their
creation. These can be both software and hardware. Flexibility requires that this genetic material
be mutable. Current nodes can change their software “genes” via their wireless connection but
hardware is constrained to the genes granted to them at “birth” (i.e., their manufacture). Efforts
are currently underway to design hardware that is evolvable such that it can redesign itself for new
functions.
• Metabolism: The metabolism of a sensor node is represented by electromechanical processes. The
source of “food” to energize these processes is the power source. Currently, power in WSNs is
provided by an onboard battery. The lifetime of a sensor node can be extended by providing it with
more powerful batteries, and technological improvements will surely provide more power in smaller
sized batteries. But this approach, no matter how potent, is still a finite power source. As the NRC
predicts, WSNs will be deployed requiring a long lifetime. A better approach is to design sensor
nodes that “eat”: mobile nodes forage for energy as do animals; stationary nodes must acquire
energy from a nearby, renewable source as do plants (photosynthesis using sunlight and water).
Research in this area is active, discovering techniques to make it possible to scavenge power from
the ambient environment. Potential sources include vibration, heat, light, and background radio
noise.
• Self-healing : Just as a multicellular animal does not fail because of malfunction or the death of
individual cells, a WSN must be able to compensate for malfunction or death of nodes.
• Symbiosis: Some organisms have developed symbiotic relationships in which each benefits from a
direct relationship with others. WSNs must be designed to organize cooperatively, complementing
each other rather than competing.
Behavioral characteristics also directly affect the sustainability and the functionality of WSNs:
• Adaptability: The ability to adapt to a changing environment is paramount for any sustainable
system. If WSNs are to be autonomous, cooperative, fault-tolerant, self-regulating, and self-healing,
they must be able to adapt to changes in their environment. As with living organisms, some
adaptability can be innate, but to achieve maximum flexibility, cognition is required.
• Functional mobility: Obviously, mobile nodes can carry their functionality to different locations.
However, functionality can also migrate using stationary nodes. This migration is facilitated by the
reassignment of individual node duties.
• Sharing knowledge: Cognitive organisms pass information among their community in order to
reduce collectively the cost of learning. By sharing knowledge among nodes, the entire WSN can
benefit from the experiences of a single node.
• Generational learning : As with communities of organisms, individual nodes will be “born”
(i.e., manufactured and deployed) and die (i.e., cease to function). New nodes coming into an exist-
ing WSN represent a new generation. WSNs must be designed such that knowledge is preserved
and made available to newer generations.
• Policing the society: It must be expected that nodes will not only fail, they may “misbehave.” In a
self-sustaining WSN, such rogue nodes must be discovered and dealt with to prevent damage to
the functioning network (i.e., the WSN must police itself). The ideal condition would be that all
damage is prevented, but, as with human societies, this may not be possible. However, the policing
must keep damage at an acceptable level.
• Cooperative sustenance: In animal societies, weak or sick individuals are cared for by the healthy and
strong members. In a well-regulated WSN, a stronger node may assume the duties of a weak node, or
perhaps "nurse" it by providing sustenance (e.g., energy) or repairs (e.g., software
patches).
This is not meant to be an exhaustive list but to merely inspire thought on how models of living systems can
be directly applied to the improvement of WSNs. Many of these examples require extensive cooperation
among nodes in a network and among WSNs. To avoid the requirement (and all of its weaknesses) of
a centralized controller, the WSNs must be designed for the emergence of desirable behavior from this
cooperation.
The behavior of the system emerges not only from the application of these rules but also from the chosen
initial state of the environment (i.e., the initial position of the agents: which cells are lighted). When the
changing states are displayed as an animation, objects appear to form and move. Groups of cells consistently
lighted relative to each other glide across the grid (called gliders). Other patterns oscillate in place (called
blinkers). Still other patterns periodically emit gliders (called glider guns). The behavior seems surprising
and unpredictable.
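To make the dynamics concrete, the update rule can be written in a few lines of Python. This is a minimal
sketch assuming the standard rules of Conway's Life (a dead cell with exactly three lighted neighbors
becomes lighted; a lighted cell stays lighted with two or three lighted neighbors); the function name and
the set-based grid representation are ours.

    from collections import Counter

    def life_step(lighted):
        # One state change; 'lighted' is the set of (x, y) cells currently lighted.
        counts = Counter((x + dx, y + dy)
                         for (x, y) in lighted
                         for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                         if (dx, dy) != (0, 0))
        # Birth on exactly three lighted neighbors; survival on two or three.
        return {cell for cell, n in counts.items()
                if n == 3 or (n == 2 and cell in lighted)}

    # A blinker oscillates in place with period two.
    blinker = {(0, 0), (1, 0), (2, 0)}
    assert life_step(life_step(blinker)) == blinker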
Epstein and Axtell (1996) designed a similar system they called Sugarscape. Using a similar cellular
automaton, they endowed the environment with simple rules: each cell was assigned an initial amount of
sugar and a rate of replenishment as sugar was consumed by agents, and no two agents could simultaneously
occupy a single cell. Agents had two genetic endowments:
• A vision whereby they could see a given distance in cells (e.g., some could see two cells ahead; some
three cells, etc.) in each of the four horizontal and vertical directions.
• A metabolism rate: the rate at which they consume sugar as they move from cell to cell.
Agents were given one simple rule: look as far as you can see in all directions, find the cell that has the
most sugar, go there, and consume the sugar. Sugar was dispersed in the environment with concentrations
at certain places. Four hundred agents were initially assigned locations and the rules applied in a series of
states. Not surprisingly, at the end of the simulation, agents with high metabolism rates and poor vision
died (they consumed all of their sugar before they found nourishment) and those with better vision and
lower metabolism survived to locate the cells richest in sugar: the system demonstrated a living Darwinian
ecosystem. They added more and different rules to eventually simulate other living attributes such as
commerce, combat, and sexual reproduction.
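The agent rule lends itself to a direct sketch. The following Python fragment is illustrative only: the
dictionary-based grid, the attribute names, and the random tie-breaking are our simplifying assumptions,
not Epstein and Axtell's exact implementation.

    import random

    def move_and_eat(agent, sugar, occupied):
        # Look as far as vision allows in the four lattice directions, move to
        # the unoccupied visible cell with the most sugar, consume it, and pay
        # the metabolism cost for the move.
        x, y = agent["pos"]
        candidates = [(x, y)]
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            for step in range(1, agent["vision"] + 1):
                cell = (x + dx * step, y + dy * step)
                if cell in sugar and cell not in occupied:
                    candidates.append(cell)
        best = max(sugar[c] for c in candidates)
        agent["pos"] = random.choice([c for c in candidates if sugar[c] == best])
        agent["wealth"] += sugar[agent["pos"]] - agent["metabolism"]
        sugar[agent["pos"]] = 0      # consumed; each cell replenishes at its own rate
        return agent["wealth"] > 0   # an agent with no sugar left dies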
The question is: can agent-environment systems be engineered to elicit desired behavior and prevent
undesired behavior? That is, is Kelly correct that one cannot find the hive in the bee, or is it that,
if one knew enough about the bee, the hive could be seen in it? And can this technique be applied in the
design of WSNs?
Fuzzy logic and artificial neural networks (ANN) can be applied to the development of cognitive
WSNs. Introduced by Zadeh (1965), fuzzy logic was conceived to define partial truth; that is, truth values
that are not completely true and not completely false. ANNs are based on the concept of an artificial
animal neuron, the Threshold Logic Unit (TLU), proposed by McCulloch and Pitts (1943). Gurney (1997)
defines an ANN as, “an interconnected assembly of simple processing elements, units or nodes, whose
functionality is loosely based on the animal neuron. The processing ability of the network is stored in
the inter-unit connection strengths, or weights, obtained by a process of adaptation to, or learning from,
a set of training patterns." The strength of ANNs is that they can be trained: the weights of the
connections are adjusted on the basis of the input experienced (i.e., the ANN learns). Hashem et al. (1995)
demonstrated that neural networks benefit real-time data analysis by WSNs. For the task of identifying
environmental contaminants, they showed that neural-network-based analysis of data from an array of
heterogeneous sensors significantly improves the selectivity of the array, even when the individual sensors
are not selective. The combination of a heterogeneous sensor array with an automated analysis
system is described as an artificial or electronic nose and has been demonstrated for use in monitoring food
and beverage odors, analyzing fuel mixtures, and environmental monitoring. The array is designed such
that each sensor measures a different property of the sensed sample. Each chemical composition presented
to the array produces a characteristic signature through the fusion of sensor readings. By presenting
many different compositions to the array, a database of signatures can be recorded. Conventionally, the
number of sensors must be at least as great as the number of analytes. The quantity and complexity
of data collected can make analysis of the data intricate. Using ANN to analyze the data for pattern
recognition can not only reduce the necessary computation, but can reduce the required number and type
of sensors in the array. Can ANNs and/or fuzzy logic be used in other applications of WSNs to facilitate
learning?
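To make the TLU concrete, the following Python sketch implements a single threshold unit and a simple
weight-adaptation loop. The perceptron-style update rule and all parameter values are our illustrative
assumptions in the spirit of Gurney's definition, not details taken from the cited works.

    def tlu(weights, threshold, inputs):
        # McCulloch-Pitts style unit: fire (1) when the weighted sum of the
        # inputs reaches the threshold, otherwise stay silent (0).
        activation = sum(w * x for w, x in zip(weights, inputs))
        return 1 if activation >= threshold else 0

    def train(patterns, n_inputs, rate=0.1, epochs=100):
        # Adapt the inter-unit connection strengths from a set of training
        # patterns: nudge each weight in the direction that reduces the error.
        weights, threshold = [0.0] * n_inputs, 0.0
        for _ in range(epochs):
            for inputs, target in patterns:
                error = target - tlu(weights, threshold, inputs)
                weights = [w + rate * error * x for w, x in zip(weights, inputs)]
                threshold -= rate * error
        return weights, threshold

    # Learning the AND function from its truth table.
    patterns = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
    weights, threshold = train(patterns, n_inputs=2)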
application of this science needs to begin now, while the focus is on the unicellular stage of evolution.
Rather than wait until nodes have the expected increased capability to begin application, it is far better
to begin now to understand and plan with this new method of engineering. When these networks are
endowed with the qualities and abilities that are expected, it must be guaranteed that a command such as,
"Please open the pod bay door, HAL," produces the desired result.
References
Abelson, H., Allen, D., Coore, D., Hanson, C., Homsy, G., Knight, T., Nagpal, R., Rauch, E., Sussman, G.,
and Weiss, R. (1999). Amorphous Computing. MIT Artificial Intelligence AI Memo 1665. Retrieved
April 5, 2004, from http://www.swiss.ai.mit.edu/projects/amorphous/papers/aim1665.pdf.
Abelson, H., Allen, D., Coore, D., Hanson, C., Rauch, E., Sussman, G., and Weiss, R. (2000).
Amorphous computing. Communications of the ACM, 43, Retrieved April 5, 2004, from http://
www.swiss.ai.mit.edu/projects/amorphous/cacm-2000.html.
Bonabeau, E., Dorigo, M., and Theraulaz, G. (1999). Swarm Intelligence: From Natural to Artificial Systems.
Oxford University Press, Oxford.
Budiansky, S. (2002). Creatures of our own making. Science, 298: 80–86.
Centre for Pervasive Computing (2004). Retrieved April 5, 2004, from http://www.pervasive.dk.
Chen, M., Majidi, C., Doolin, D., Glaser, S., and Sitar, N. (2003). Design and Construction of a Wildfire
Instrumentation System using Networked Sensors (Poster). Network Embedded Systems Technology
(NEST) Retreat, Oakland, CA. Retrieved April 5, 2004, from http://firebug.sourceforge.net.
Clark, A. (1997). Being There: Putting Brain, Body, and the World Together Again. MIT Press,
Cambridge, MA.
Conniff, R. (1996). Spineless Wonders. Henry Holt and Company, New York.
Delin, K. (2004). NASA/JPL Sensor Webs Project. Retrieved April 5, 2004, from http://sensorwebs.jpl.
nasa.gov.
Dusenbery, D. (1996). Life at Small Scale: The Behavior of Microbes. Scientific American Library Series
No. 61.
Dyer, F. and Seeley, T. (1991). Dance dialects and foraging range in three Asian honey bee species.
Behavioral Ecology and Sociobiology, 28: 227–233.
Epstein, J. and Axtell, R. (1996). Growing Artificial Societies: Social Science from the Bottom Up. The
Brookings Institution, Washington, D.C.
Gardner, M. (1970). MATHEMATICAL GAMES: The fantastic combinations of John Conway’s new solit-
aire game “life.” Scientific American, 223: 120–123. Retrieved April 5, 2004, from http://ddi.cs.uni-
potsdam.de/HyFISCH/Produzieren/lis_projekt/proj_gamelife/ConwayScientificAmerican.htm.
Gould, J., and Gould, C. (1988). The Honey Bee. Scientific American Library, New York.
Gregory, R. (1997). Editorial: Brains of ants and elephants. Perception, 26. Retrieved April 5, 2004, from
http://www.perceptionweb.com/perc0397/editorial.html.
Gurney, K. (1997). Introduction to Neural Networks. Routledge, an imprint of Taylor and Francis Books
Ltd, London.
Hashem, S., Keller, P., Kouzes, R., and Kangas, L. (1995). Neural network based data analysis for
chemical sensor arrays. In Proceedings of International Society for Optical Engineering (SPIE)
AeroSense Conference, Orlando, FL (April 17–21, 1995), in Applications and Science of Artifi-
cial Neural Networks, Vol. 2492, Paper #2492–05, pp. 33–40. Retrieved April 5, 2004, from
http://citeseer.ist.psu.edu/519919.html.
Intel Research — Exploratory Research — Deep Networking (2004). Retrieved April 5, 2004, from
http://www.intel.com/research/exploratory/deep_networking.htm.
Intel Research — Exploratory Research — Deep Networking — Heterogeneous Sensor Networks (2004).
Retrieved April 5, 2004, from http://www.intel.com/research/exploratory/heterogeneous.htm
Invisible Computing: Scope (2000). Retrieved April 5, 2004, from http://invisiblecomputing.org/
scope.html.
Jones, K., Wadaa, A., Olariu, S., Wilson, L., and Eltoweissy, M. (2003). Towards a new paradigm for
securing wireless sensor networks. In Proceedings New Security Paradigms Workshop 2003, Ascona,
Switzerland. August 18–21, 2003, pp. 115–122.
Kahn, J., Katz, R., and Pister, K. (1999). Next century challenges: Mobile networking for
“Smart Dust”. In ACM MOBICOM Conference, Seattle, WA. Retrieved April 5, 2004, from
http://www.cs.berkeley.edu/∼randy/Papers/mobicom99.pdf.
Kelly, K. (1994). Out of Control: The New Biology of Machines, Social Systems, and the Economic World.
Perseus Books.
Lachman, R., Lachman, J., and Butterfield, E. (1979). Cognitive Psychology and Information Processing.
Lawrence Erlbaum Assoc, Hillsdale, NJ.
Lammers, D. (2001). Embedded projects take a share of Intel’s research dollars. EE Times. Retrieved
April 5, 2004, from http://today.cs.berkeley.edu/800demo/eetimes.html.
Leveraging the Infosphere: Surveillance and Reconnaissance in 2020 (1995). Airpower Journal — Summer
1995, A SPACECAST 2020 White paper. Retrieved April 5, 2004, from http://www.airpower.
maxwell.af.mil/airchronicles/apj/spacast1.html.
Lodding, K.N. (2004a). Hitchhiker's Guide to Biomorphic Software. ACM Queue.
Lodding, K.N. (2004b). Multi-agent organisms for persistent computing. In Proceedings of the 3rd Inter-
national Joint Conference on Autonomous Agents and Multi Agent Systems (AAMAS04). New York,
NY, July 19–23.
Mainwaring, A., Polastre, J., Szewczyk, R., and Culler, D. (2002). Wireless sensor networks for
habitat monitoring. (Intel Research, IRB-TR-02–006, June 10, 2002) In ACM International Work-
shop on Wireless Sensor Networks and Applications. Retrieved April 5, 2004, from http://www.
greatduckisland.net.
Margulis, L. and Sagan, D. (1986). Microcosmos: Four Billion Years of Microbial Evolution. Simon and
Schuster, New York, p. 15.
McCulloch, W. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin
of Mathematical Biophysics, 7: 115–133.
Moravec, H. (1988). Mind Children. Harvard University Press.
National Research Council (2001). Embedded, Everywhere: A Research Agenda for Systems of Embedded
Computers, Committee on Networked Systems of Embedded Computers, for the Computer Science
and Telecommunications Board, Division on Engineering and Physical Sciences, Washington, DC.
Retrieved April 5, 2004, from http://www.nap.edu/catalog/10193.html.
Noor, A. and Malone, J. (1999). Intelligent Agents and Their Potential for Future Design and Synthesis
Environment. NASA CP-1999-208986.
Pfeifer, R. and Scheier, C. (2000). Understanding Intelligence. MIT Press, Cambridge, MA.
Primio, F., Müller, B., and Lengeler, J. (2000). Minimal cognition in unicellular organisms. In SAB2000
Proceedings Supplement, International Society for Adaptive Behavior. Honolulu, HI. pp. 3–12,
Retrieved April 5, 2004, from http://www.ais.fraunhofer.de/BAR/papers/diprimio-mincog.pdf.
Reinhard, J., Srinivasan, M., and Zhang, S. (2004). Olfaction: scent-triggered navigation in honeybees.
Nature, 427: 411.
Richards, J., Gilder, G., Kurzweil, R., Searle, J., Dembski, W., Denton, M., and Ray, T. (2002). Are We
Spiritual Machines? Discovery Institute, Seattle, WA.
Tilak, S., Abu-Ghazaleh, N., and Heinzelman, W. (2002). A taxonomy of wireless micro-sensor
network models. ACM Mobile Computing and Communications Review (MC2R), 6. Retrieved
April 5, 2004, from http://www.cs.colorado.edu/∼rhan/CSCI_7143_001_Fall_2002/Papers/
Tilak2002_p28-tilak.pdf.
Turner, J. (2000). The Extended Organism: The Physiology of Animal-Built Structures. Harvard University
Press.
UCB/MLB 29 Palms UAV-Dropped Sensor Network Demo (2001). University of California, Berkeley, CA.
Retrieved April 5, 2004, from http://robotics.eecs.berkeley.edu/∼pister/29Palms0103.
Wadaa, A., Olariu, S., Wilson, L., Eltoweissy, M., Jones, K., and Sundaram, P. (2004a). Training a sensor
network. In Special Issue of MObile NETwork (MONET) on Algorithmic Solutions for Wireless,
Mobile, Ad Hoc and Sensor Networks, Bar-Noy, A., Bertossi, A., Pinotti, M., and Raghavendra, C.
Eds. January 2004.
Wadaa, A., Olariu, S., Wilson, L., Eltoweissy, M., and Jones, K. (2004b). On providing anonymity
in wireless sensor networks. In Proceedings of the 10th International Conference on Parallel and
Distributed Systems, (ICPADS-2004). Newport Beach, CA. July 2004.
Weiser, M. (1991). The computer for the twenty-first century. Scientific American, September 1991, pp. 94–104.
Retrieved April 5, 2004, from http://www.ubiq.com/hypertext/weiser/UbiHome.html.
Weiser, M. (1996). Ubiquitous Computing. Retrieved April 5, 2004, from http://www.ubiq.com/
hypertext/weiser/UbiHome.html.
Zadeh, L. (1965). Fuzzy sets. Information and Control, 8: 338–353.
Appendix A — Glossary
The following terms are pertinent to this topic:
Analyte: The object of measurement in an analytical procedure (in this case a chemical property).
Artificial intelligence: The science of simulating intelligence in a creation of humans (i.e., not natural).
Artificial life: Beyond intelligence, the science of simulating a living organism or some property thereof
in a creation of humans (i.e., not natural).
Biomimetics: The study of the origin, structure, or function of biological mechanisms, processes, and
materials as models for the design of artificial constructs.
Circadian: Relating to approximately a 24 h period (from Latin meaning “around the day”).
Effector: A device that, in response to a stimulus (input or command), initiates an effect on or affects
its environment.
Emergent Behavior: Behavior that results from the interactions of individual agents following simple
rules, rather than from explicit, centralized design.
Evolvable hardware: Hardware designed by the application of evolution to automate its creation and
adaptation. A goal is to use these techniques in situ producing hardware that can adapt to unpredicted
environmental conditions, thus improving its survivability for long durations in unknown and changing
environments.
Homeostasis: The state of a relatively constant internal environment. The physical and chemical states
that an organism must maintain to allow proper functioning, in maximum efficiency, of its components:
cells, tissues, organs, and organ systems.
Innate behavior: Behavior that results from genetic encoding. Such behavior does not require learning
and changes little in response to environmental stimuli.
Learned behavior: Behavior that results from the cognitive assessment of prior experiences to determine
a new response.
Metabolism: The chemical processes (breaking down substances to provide energy or synthesis of new
substances) within a living organism that are necessary for life.
Mote: Defined as a small particle, here it specifies the self-powered, physically independent device that
contains hardware and software for sensing or effecting, computing, and communication in a wireless
sensor network.
Nanoscale: Measurement on a scale of nanometers (dimensions under ∼100 nm).
Nano-technology: The process used to design and build electronic circuits and devices from atoms and
molecules.
Node: In computer science, this is either a terminal or hop point in a communications network. Here
it specifies a mote in the context of the network.
Pervasive: Defined as the quality of being present throughout; to permeate. Pervasive computing
and ubiquitous computing seem to be synonymous in computer science literature, but here, pervasive is
used to describe presence throughout a system (e.g., sensors are pervasive in an automobile if they are
installed and function throughout the automobile).
Photosynthesis: The process in some organisms (plants and some microbes) by which carbohydrates are
synthesized from carbon dioxide and water using light as an energy source. These carbohydrates serve as
energy storage and are later metabolized resulting in the release of carbon dioxide, water, and energy used
by the organism.
Protozoa: Plural for protozoan collectively naming any of a large group of single-celled, micro-
scopic organisms (e.g., amoeba, sporozoans, ciliates, flagellates, etc.). Protozoans differ from bacteria
(prokaryotic) in that they, like higher plants and animals, contain cellular constructs such as nucleus,
mitochondria, etc. (eukaryotic).
Sensor: A device that receives and reports a signal or stimulus.
Society: In biology, a society is defined as a colony of organisms, usually of the same species such as a
society of ants. Here, a society is limited to an association of cognitive animals such as wolves or humans.
Symbiosis: A relationship between organisms that is mutually beneficial and which, over time, forms a
dependence.
Ubiquitous: Defined as the quality of being or seeming to be everywhere at the same time. Pervasive
computing and ubiquitous computing seem to be synonymous in computer science literature, but here,
ubiquitous is used to describe presence within all systems (e.g., sensors are ubiquitous in automobiles if
they are installed and function in all automobiles).
34.1 Introduction
Combinatorial optimization often needs a large amount of computational resources. This is particularly the
case for NP-hard problems, for which no efficient algorithm is known. Parallel computers may supply
these resources. However, it may be interesting to use those platforms in another way than simply
parallelizing the computation of the objective function of the visited configurations.
Coevolutionary computation [1] leads to algorithms that can be implemented in a distributed way.
We call this kind of implementation cooperative algorithms. Such an algorithm presents a low coupling
between its components. In this chapter, we present a distributed design and implementation of the
metaheuristic COSEARCH [2,3]. This metaheuristic is a very general model which can easily be imple-
mented in several ways. We propose a view of this algorithm bearing in mind the need to explicitly balance
diversification and intensification during the search. We evaluate our approach on the graph coloring
problem.
Nevertheless, the implementation of COSEARCH requires the design of mechanisms dedicated to the
studied problem. In our case, we use two new operators: the break operator and the Xρ operator. The
former modifies a single coloring, as mutation does in genetic algorithms; the latter combines
two colorings, as crossover does.
This chapter is organized as follows. In Section 34.2, we present an overview of the parallel cooper-
ative COSEARCH method: concepts and implementations of suggestions. In Section 34.3, we present
the studied problem, which is the graph coloring problem (MGCP). In Section 34.4, we present some
preliminary works on the search operators. In Section 34.5, we present an implementation of COSEARCH
for MGCP. In Section 34.6, we give some experimental results. Finally, we conclude and present
some perspectives in Section 34.7.
Such methods may provide good results. However, in some cases, these methods may be insufficient to
find the optimum. We propose to use the local search algorithm as a basic component of our algorithm.
Several parallel/distributed local search algorithms may work in a cooperative way to optimize a problem.
In our approach, such an algorithm is called a (search) agent.
[Figure: the COSEARCH architecture, with several search agents cooperating through a shared adaptive memory.]
34.3.1 Applications
The graph coloring problem models many real-life problems. Those problems consist in distributing objects
into different groups in order to separate certain pairs of objects; an example is the Register Allocation
Problem (RAP) [7]. In this problem, the objects are the variables of a source code and the groups are the
registers of a computer. For each variable, we must assign a register number. Two variables that are live at
the same time cannot share the same register. Thus, we obtain a set of binary constraints of mutual
exclusion on variables. The smaller the number of registers used, the more efficient the program. The
objective is obviously to minimize the number of registers.
There are a lot of other problems, which are direct applications of MGCP. In this category, we can
find:
• Frequency assignment problem [8]. In this problem, the objects are the channels of the transceivers
and the groups are the available frequencies. Two channels covering nearby areas have to
communicate using different frequencies.
• Timetabling problem [9]. A course combines the constraints from three elements: teacher, class,
and location, knowing that a teacher cannot give two lectures at the same time; the same holds for
classes and locations.
• Some other problems such as pattern matching [10], air route conception [11], etc.
34.3.2 Formalization
All problems presented above can be modeled using a graph G(V, E). The set of vertices V is the set of
objects to color (distribute or assign) and the set of edges E symbolizes the constraints between vertices:
vertices connected by an edge must have different colors (be assigned with different values).
We use the following formalization for a given graph G(V , E).
Definition 34.1 We call coloring (of G) any mapping from V into the set of natural numbers {1, . . . , |V|}.
Definition 34.2 We call color of v with regard to a coloring C the image of v by C (i.e., C(v)).
Definition 34.3 We call ith class of a coloring C the set of vertices colored with i (i.e., C⁻¹(i)).
Technically, a coloring may be trivially encoded by an array of integers indexed by the vertices. Never-
theless, for some algorithms it will be helpful to have the vertices gathered by classes. We choose a mixed
encoding (see Figure 34.2). This encoding allows the visit of vertices of a class without investigating the
other vertices. An array allows us to change quickly (in O(1)) the color of one vertex.
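A minimal Python sketch of such a mixed encoding (the class and method names are ours, for illustration):

    class Coloring:
        # Mixed encoding: an array 'color' indexed by the vertices, plus the
        # classes gathered as sets, kept consistent with each other.
        def __init__(self, n):
            self.color = [1] * n                  # color of each vertex
            self.classes = {1: set(range(n))}     # vertices of each class

        def recolor(self, v, c):
            # Change the color of one vertex in O(1).
            self.classes[self.color[v]].discard(v)
            self.color[v] = c
            self.classes.setdefault(c, set()).add(v)

        def klass(self, c):
            # Visit the vertices of a class without investigating the others.
            return self.classes.get(c, set())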
Definition 34.4 We call violation of constraint (in a coloring C) an edge whose extremities are colored with
the same color (i.e., an edge (i, j) ∈ E with C(i) = C(j)).
Definition 34.5 We call number of violations of a coloring C the value g(C) given by

$$g : \{\text{colorings}\} \to \mathbb{N}, \qquad C \mapsto g(C) = \frac{1}{2} \sum_{i,j \in V} \delta_{C(i),C(j)} \, I_E(i,j),$$

where $\delta_{i,j}$ is the Kronecker symbol (i.e., $\delta_{i,j} = 1$ if $i = j$; 0 otherwise) and $I_E(i,j)$ is the characteristic
function of $E$ (i.e., $I_E(i,j) = 1$ if $(i,j) \in E$; 0 otherwise).
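In code, g simply counts the monochromatic edges; iterating over the edge list once is equivalent to the
half-weighted double sum over vertex pairs. A sketch reusing the Coloring encoding above, assuming the
graph is given as an edge list:

    def violations(coloring, edges):
        # g(C): the number of edges whose two extremities share a color.
        return sum(1 for (i, j) in edges
                   if coloring.color[i] == coloring.color[j])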
[Figure 34.2: the mixed encoding of a coloring, combining an array that gives the color Ci of each object Oj with the classes gathering the objects of each color.]
Definition 34.6 We say that a coloring C is a proper coloring if there is no violation (i.e., g(C) = 0).
Definition 34.7 A coloring is a k-coloring if all the colors used are less than or equal to k.
Definition 34.8 The chromatic number of a graph is the smallest k such that the graph admits a proper
k-coloring. For a given graph G, we denote by χ(G) its chromatic number.
Definition 34.9 Let k be a positive number. We call graph k-coloring problem (GkCP) the problem consisting
in finding a k-coloring C such that g(C) is minimal.
Definition 34.10 We call Minimal graph coloring problem (MGCP) the problem consisting in finding a
proper coloring using the fewest colors. So, we minimize the following function:

$$f : \{\text{proper colorings}\} \to \mathbb{N}, \qquad C \mapsto f(C) = \max_{v \in V} C(v)$$
The second one operates on k-colorings. The search is decomposed into steps. At each step, the algorithm
works with a fixed k and tries to find a proper k-coloring among all k-colorings. Then, k is decreased for
a new step. When a step fails (does not find a proper coloring), the previously found proper (k + 1)-coloring
is returned. This strategy was used in many heuristics:
• Local search (hill climbing, simulated annealing) [16] with the change of one vertex as operator
• Local search with using “permutation-neighborhood” operator [20]
• Tabu search [21–23]
• Addition of several diversification techniques to local search procedures [24]
• Local search with variable neighborhood search [25]
• Modification of bias of neighborhood operator [26,27]
• Evolutionary Algorithm with dedicated operators [23,28,29]
• A multi-level approach [30]
Nevertheless, Clerc [31] studied a method using a mix of both strategies: a particle swarm optimization
algorithm working on both valid colorings and k-colorings.
In this study, our attention was focused on the first strategy.
34.4.1 IG Operator
Culberson [17] proposes a manner to generate a coloring C′ whose number of colors is less than or equal
to the number of colors of the genitor C. The new coloring is built as follows. First, all the vertices are
uncolored in C′. Second, all the vertices of class C⁻¹(1) (uncolored in C′) are placed into the first class of C′.
Then, the class is filled with other vertices uncolored in C′ (i.e., vertices are added until the class cannot
accept any more without inducing constraint violations). This step is repeated to build the second class,
and so on. The classes of C are visited in a random order so as to generate different colorings. Technically,
this operator can be implemented using a single coloring, so the technique can be applied again to the
obtained coloring.

¹http://mat.gsia.cmu.edu/COLOR/color.html
Algorithm 34.1 presents this idea. In line 5, we complete the class without regard to the coloring C.
The uncolored vertices are shuffled to increase the diversity of the search.
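Algorithm 34.1 itself is not reproduced here, but the step it formalizes can be sketched as follows. The
sketch operates on the mixed Coloring encoding above, assumes the genitor is a proper coloring, and its
names and details are ours rather than Culberson's.

    import random

    def iterated_greedy(coloring, adjacency):
        # One IG step: rebuild the coloring class by class, visiting the classes
        # of the genitor C in a random order and greedily filling each new class.
        n = len(coloring.color)
        new_color = {}
        order = list(coloring.classes)
        random.shuffle(order)                      # visit the classes in disorder
        k = 0
        for c in order:
            uncolored = [v for v in range(n) if v not in new_color]
            if not uncolored:
                break
            k += 1
            members = set()
            # The genitor's class first, then the other uncolored vertices,
            # shuffled to increase the diversity of the search.
            first = {v for v in coloring.klass(c) if v not in new_color}
            rest = [v for v in uncolored if v not in first]
            random.shuffle(rest)
            for v in list(first) + rest:
                if not (adjacency[v] & members):   # induces no violation
                    new_color[v] = k
                    members.add(v)
        return new_color   # uses no more colors than the genitor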
                         IG                                 Break
Name          min   max   avg    std dev.      min   max   avg    std dev.
DSJC125.1 6 7 6.90 0.31 6 7 6.4 0.5
DSJC125.5 19 21 20.64 0.59 19 21 20.35 0.59
DSJC125.9 45 47 46.4 0.60 44 47 45.8 0.77
DSJC250.9 78 81 79.15 0.88 76 79 77.2 0.89
DSJC500.9 141 145 143.2 1.24 137 141 138.85 1.46
DSJR500.1c 85 85 85 0 85 85 85 0
le450_5c 6 8 6.8 0.52 5 7 6.40 0.6
le450_5d 5 8 6.7 0.66 5 7 6.05 0.69
queen10_10 12 14 12.95 0.39 12 13 12.95 0.22
queen11_11 14 15 14.1 0.31 14 15 14.5 0.51
queen8_12 13 14 13.05 0.22 13 13 13 0
queen9_9 12 12 12 0 11 12 11.8 0.41
school1 14 19 14.7 1.13 14 15 14.25 0.44
school1_nsh 14 18 15.6 1.05 14 16 14.6 0.6
• Intensify: The memory randomly picks a coloring among the elite and sends it to an IG agent or a break
agent. Indeed, those agents can easily modify a coloring, and a better coloring can be reached by
several applications of those algorithms. We call such an agent an I-agent.
• Force a same class: The memory randomly picks one of the best colorings. It chooses the pair of
nonconnected vertices that have not been colored with the same color for a long time. The memory forces
both vertices to be in the same class. This may increase the number of colors. Finally, the new
coloring is sent to a break agent. The agent always enforces that both vertices stay in the same class:
it behaves as if the two chosen vertices formed a single new vertex whose set of neighbors is the
union of the neighbor sets of the old ones. We call such an agent an FS-agent.
• Force different classes: The memory randomly picks one of the best colorings. It chooses the pair
of nonconnected vertices that have always been colored with the same color for a long time. The memory
forces both vertices to be in different classes. This usually increases the number of colors. Finally,
the new coloring is sent to a break agent. The agent always enforces that the two vertices stay in
different classes: it behaves as if there were an additional edge between the two chosen vertices. We
call such an agent an FD-agent.
• Greedy crossover: The memory extracts the common part of two colorings. This partial coloring is
sent to an agent to be completed. The completion is made in a greedy manner. Then, the break
operator is applied a few times. Several completions are tested before returning the best one. We
call such an agent an Xρ-agent.
• Every worker is working, so the adaptive memory waits for one result.
• Some computational resources are available, so a policy is chosen with regard to the
availabilities.
[Figure: automata of the components. An agent cycles through initialisation, waiting for work, working, and returning the result, until an end signal. A fleet manager cycles through initialisation and the receipt of work, results, and availability queries, until an end signal. The adaptive memory cycles through initialisation, asking for availabilities, waiting for a result or choosing a policy (intensification, force same class, force different class, or greedy crossover), and treating the result, until the end.]
• A new result is asynchronously obtained, so the memory incorporates the result before choosing
a policy.
As suggested in Section 34.2.3, the automaton of the adaptive memory and the agents communicate
with the fleet managers. A fleet manager corresponds to the Input/Output interface of the memory.
The end of the algorithm is decided by the adaptive memory. The end is chosen as a policy, but it
happens only after a fixed number of iterations. The other policies of the memory are randomly picked from
the available agents with fixed biases. For example, we may apply 60% I-agent, 10% FD-agent,
10% FS-agent, and 10% Xρ-agent policies. At a given moment, if all intensification agents are working, then
the policy cannot be an intensification.
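As an illustration, the biased choice among the currently available agent types could be sketched as
follows; the bias values echo the example above, and the names are ours:

    import random

    POLICY_BIAS = {"I": 60, "FS": 10, "FD": 10, "X": 10}   # example biases

    def choose_policy(idle):
        # Pick a policy among the agent types that still have idle agents;
        # e.g., when every I-agent is working, intensification is excluded.
        candidates = [t for t in POLICY_BIAS if idle.get(t, 0) > 0]
        weights = [POLICY_BIAS[t] for t in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]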
34.6 Experiments
Our algorithm is composed of many components. It is possible that some of the presented agents do not
contribute much to the global quality of the search, so we have to tune the use of each agent. This is
complicated by the fact that all agents are based on the break operator, which is very efficient due to its
capacity to diversify. Thus, we cannot easily detect whether we should continue to use the I-agent or
operate some radical changes.
First, we evaluate each agent in combination with the I-agent. Finally, we present the results of the global
architecture.
• Number of iterations of the global cycle: At each iteration, the adaptive memory collects a new coloring
and chooses a policy. We experimentally bound this parameter to 500.
• Number of iterations for each type of agent: The number of applications of the break operator by the
I-agents, FS-agents, and FD-agents is equal to the number of vertices. The Xρ-agents recombine
two colorings and apply half that number of break-operator applications to each obtained
coloring.
• Number of available agents for each type of agent: We use ten I-agents, four FS-agents, four FD-agents,
and four Xρ-agents. For those tests, we use six diversification agents (FS, FD, Xρ) in parallel with
ten intensification agents (break).
Table 34.3 summarizes the obtained results. For each combination, we made four runs on different
hardware platforms using different architectures: Linux PCs (between 2.7 and 3 GHz, 512 MB and 1 GB
RAM) and an IBM SP3 (with 16 Power3 NH2 processors per node running at 375 MHz, 16 GB RAM).
We can see that the combinations allow us to go further in the search. Indeed, the overall solution
qualities are better for every combination than for the break operator alone.
Incidentally, we observe that the different combinations perform more or less well on different graphs:
I+FS provides the best results on DSJC250.9, I+FD provides the best results on queen9_9, and I+X always
found the same quality of results for queen8_12.
TABLE 34.3 Summary of Experiments with One Diversification Agent and the Break Agent
                     I+FS                           I+FD                           I+X
Name      min  max  avg   std dev.    min  max  avg   std dev.    min  max  avg   std dev.
DSJC125.1 6 6 6 0 6 6 6 0 6 6 6 0
DSJC125.5 19 19 19 0 19 19 19 0 19 19 19 0
DSJC125.9 44 44 44 0 44 44 44 0 44 44 44 0
DSJC250.9 73 75 74 0.81 74 74 74 0 74 74 74 0
DSJC500.9 131 133 132 0.81 131 132 131.75 0.5 133 134 133.5 0.57
DSJR500.1c 85 85 85 0 85 85 85 0 85 85 85 0
le450_5c 5 5 5 0 5 5 5 0 5 5 5 0
le450_5d 5 5 5 0 5 5 5 0 5 5 5 0
queen10_10 12 12 12 0 12 12 12 0 12 12 12 0
queen11_11 13 14 13.5 0.57 13 14 13.75 0.5 14 14 14 0
queen8_12 12 13 12.25 0.5 12 13 12.5 0.57 12 12 12 0
queen9_9 11 11 11 0 10 11 10.75 0.5 11 11 11 0
school1 14 14 14 0 14 14 14 0 14 14 14 0
school1_nsh 14 14 14 0 14 14 14 0 14 14 14 0
[Table: results for the full combination I+FS+FD+X.]
memory summarizes information and decides the direction of the search: intensification or one type of
diversification. Below the memory, the fleet managers handle an asynchronous fleet of agents of the
same type. At the lowest level, the workers, called agents, apply strategies dedicated to the problem.
We apply such a strategy to the graph coloring problem. For that, we developed dedicated operators:
the break operator and Xρ. We propose several ways to diversify the search by forcing the presence of
certain characteristics in the colorings used by the search agents.
Combinations of components were tested. We saw that diversification leads to better results, but that
no single diversification technique outperformed all the others.
The combination of all diversification techniques improves results on one graph but degrades its
performance on two other graphs. This may be due to the excessive diversification applied in this case.
We are now working on new search agents, such as agents working with k-colorings, and on a way to tune
each diversification technique.
References
[1] Jan Paredis. Coevolutionary computation. Artificial Life, 2: 355–375, 1995.
[2] V. Bachelet. Métaheuristique parallèle hybride: application au problème d’affectation quadratique.
Ph.D. thesis, Université des sciences et technologies de Lille, cité scientifique Villeneuve d’Ascq
59655, December 1999.
[3] B. Weinberg, V. Bachelet, and E.-G. Talbi. A co-evolutionnist meta-heuristic for the assignment of
the frequencies in cellular networks. In First European workshop on Evolutionary Computation in
Combinatorial Optimization (EvoCOP), Como, Italy, 2001. Springer-Verlag, pp. 140–149.
[4] S. Kirkpatrick, D.C. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science, 220:
671–680, 1983.
[5] F. Glover and M. Laguna. Tabu search. In C. Reeves, Ed., Modern Heuristic Techniques for
Combinatorial Problems, Blackwell Scientific Publishing, Oxford, England, 1993.
[6] T. Feo and M. Resende. Greedy randomized adaptive search procedures. Journal of Global
Optimization, 6: 109–133, 1995.
[7] G.J. Chaitin, M. Auslander, A.K. Chandra, J. Cocke, M.E. Hopkins, and P. Markstein. Computer
Languages, chapter Register allocation via graph coloring, IBM. T.J. Watson Research Center, 1981,
pp. 47–57.
[8] A. Gamst. Some lower bounds for a class of frequency assignment problems. IEEE Transactions of
Vehicular Technology, 1(35): 8–14, 1986.
[9] S. Miner, S. Elmohamed, and H. Yau. Optimizing timetabling solutions using graph coloring. In
NPAC REU Program, NPAC, Syracuse University, NY, 1995.
[10] H. Ogawa. Labeled point pattern matching by delaunay triangulation and maximal cliques. In
Pattern Recognition, number 1 in 19, 1986, pp. 35–40.
[11] N. Barnier and P. Brisset. Coloriage de graphe en programmation par contraintes. In ROADEF
2003, ROADéF, February 2003, pp. 348–349.
[12] R.M. Karp. Complexity of Computer Computations, chapter Reducibility among combinatorial
problems, Plenm Press, New York, 1972, pp. 85–103.
[13] B. Weinberg and E.-G. Talbi. On symmetry of partitionning problems. In J. Gottlieb and G. Raidl
Eds., EvoCOP, LNCS, 2004. To appear.
[14] D. Brelaz. New methods to color the vertices of a graph. Communications of the ACM, 22: 251–256,
1979.
[15] F.T. Leighton. A graph coloring algorithm for large scheduling problems. J. Res. Natl. Bur.
Standards, 84: 489–506, 1979.
[16] D.S. Johnson, C.R. Aragon, L.A. McGeoch, and C. Schevon. Optimization by simulated annealing:
An experimental evaluation; part II, graph coloring and number partitioning. Operations Research,
39: 378–406, 1991.
[17] J. Culberson. Iterated Greedy Graph Coloring and the Difficulty Landscape. Technical report,
University of Alberta, June 1992.
[18] S. Hurley, D. Smith, and C. Valenzuela. A permutation based genetic algorithm for minimum span
frequency assignment. In T. Baeck, A. Eiben, M. Schoenauer, and H. Schwefel, Eds, PPSN V: Pro-
ceedings of the Fifth International Conference on Parallel Problem Solving from Nature, Vol. 1498 of
Lecture Notes in Computer Science, Amsterdam, The Netherlands, September 1998. Springer-Verlag
Publication, pp. 907–916.
[19] D. Costa and A. Hertz. Ants can colour graphs. Journal of the Operational Research Society, 48:
295–305, 1997.
[20] C.A. Glass and A. Prügel-Bennett. A Polynomially Searchable Exponential Neighbourhood for
Graph Colouring. Technical report, Departement of Electronics and Computer Science, University
of Southampton, 1998.
[21] A. Hertz and D. de Werra. Using tabu search techniques for graph coloring. Computing, 39:
345–351, 1987.
[22] R. Dorne. Étude des méthodes heuristiques pour la coloration, la T-coloration et l’affectation de
fréquence. Ph.D. thesis, Université de Montpellier II Science et Technique, May 1998.
[23] P. Galinier and J-K. Hao. Hybrid evolutionary algorithms for graph coloring. Journal of
Combinatorial Optimization, 3: 379–397, 1999.
[24] L. Paquete and T. Stützle. An experimental investigation of iterated local search for coloring graphs.
In S. Cagnoni, J. Gottlieb, E. Hart, M. Middendorf, and G. Raidl, Eds, Applications of Evolutionary
Computing, Proceedings of Evo Workshops2002: EvoCOP, EvoIASP, EvoSTim, Vol. 2279, Kinsale,
Ireland, 3–4 Springer-Verlag, 2002, pp. 121–130.
[25] C. Avanthay, A. Hertz, and N. Zufferey. A variable neighborhood search for graph coloring.
European Journal of Operational Research, 151: 379–388, 2003.
[26] A. Vesel and J. Zerovnik. How good can ants color graphs? Journal of computing and Information
Technology, 8: 131–136, 2000.
[27] A. Petford and D. Welsh. A randomised 3-colouring algorithm. Discrete Mathematics, 74: 253–261,
1989.
[28] J-P. Hamiez and J-K. Hao. Scatter search for graph coloring. In Artificial Evolution, Le Creusot,
France, October 2001, pp. 267–278.
[29] D. Fotakis, S. Likothanassis, and S. Stefanakos. An evolutionary annealing approach to graph
coloring. In Proceedings of Applications of Evolutionary Computing, Vol. 2037 of Lecture Notes in
Computer Science, Evo Workshops 2001, Springer-Verlag, April 2001, pp. 120–129.
[30] C. Walshaw and M.G. Everett. Multilevel Landscapes in Combinatorial Optimisation. Technical
Report 02/IM/93, Comp. Math. Sci., Univ. Greenwich, London SE10 9LS, UK, April 2002.
[31] M. Clerc. Optimisation par essaim particulaire et coloriage de graphe. Technical report, France
Télécom, 2001.
[32] A. Geist, A. Beguelin, J. Dongarra, W. Jiang, R. Manchek, and V. Sunderam. PVM: Parallel
Virtual Machine, A Users’ Guide and a Tutorial Networked Parallel Computing. The MIT Press,
Cambridge, MA, 1994.
35.1 Introduction
Real-world optimization problems are often NP-hard, complex, and CPU time-consuming. Moreover,
their modeling evolves continuously in terms of constraints and objectives. Therefore, their resolution
requires the use of parallel/distributed hybrid metaheuristics. Unlike exact methods, metaheuristics make
it possible to find sub-optimal solutions in a reasonable execution time, and thus to meet the resolution
deadlines often imposed in the industrial field.
Metaheuristics fall into two categories: single solution-oriented or local search (LS) methods, and
population-based or evolutionary algorithms (EAs). An LS starts with a single initial solution. At each step
of the search the current solution is replaced by another (often the best) solution found in its neighborhood.
This work is a part of the current national joint grid computing project ACI-GRID DOC-G (Défis en Optimisation
Combinatoire sur Grilles). It includes research teams from different laboratories: OPAC from LIFL, OPALE from
PRISM and O2 and P3-ID from IMAG. The project is supported by the French government.
Very often, LS methods find a locally optimal solution, and so are called exploitation-oriented
methods. On the other hand, evolutionary algorithms work on a randomly generated population of solu-
tions. The initial population is enhanced through a natural evolution process. At each generation of the
process, the whole or a part of the population is replaced by newly generated individuals (often the
best ones). EAs are therefore often called exploration-oriented methods.
Although their time complexity is polynomial, metaheuristics remain insufficient for large-size prob-
lems. Therefore, parallel/distributed and concurrency tools are necessary to tackle these problems.
Different parallel/distributed models have been proposed for each class of methods; they are detailed
in Sections 35.2 and 35.3. In order to benefit from both the exploitation power of LS methods and the
exploration merit of EAs, their hybridization is recommended [1]. Hybrid metaheuristics deliver
high-quality and robust solutions.
Several parallel and distributed metaheuristics and their implementations have been proposed in the
literature. Most of them are available on the Internet and can be reused and adapted to one's own problems.
Reusability may be defined as the ability of software components to build many different applications [2].
However, one has to rewrite the problem-specific sections of the code. Such a task is tedious, error-prone,
and time-consuming. Moreover, the newly developed code is harder to maintain. A better way to
reuse the code of existing parallel and distributed metaheuristics is the use of libraries [3]. Their benefit
is twofold: they are reliable, as they are often well tested and documented, and they allow better
maintainability and efficiency. However, libraries do not allow the reuse of design. A better approach to
reusing the design and the code at the same time is framework-based reuse [4].
In the literature, the authors very often do not make a clear difference between a library and a
framework. In a framework, the provided code calls the user-defined code according to the Hollywood
principle: "do not call us, we call you." Therefore, frameworks provide the full control structure of
the invariant part of the algorithms, and the user has only to supply the problem-specific details. This
chapter focuses on frameworks, and aims at removing the ambiguity on their use, highlighting their
characteristics, requirements, and objectives.
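The inversion of control that distinguishes a framework from a library can be sketched in a few lines of
Python; the class and method names below are illustrative, not taken from any of the cited frameworks.

    import random

    class Metaheuristic:
        # Framework side: owns the full control structure (the invariant part)
        # and calls back into user code: "do not call us, we call you."
        def run(self, solution, iterations):
            best = solution
            for _ in range(iterations):
                candidate = self.neighbor(best)               # user-defined
                if self.evaluate(candidate) < self.evaluate(best):
                    best = candidate
            return best

    class MyProblem(Metaheuristic):
        # User side: only the problem-specific details are supplied.
        def evaluate(self, s):
            return sum(x * x for x in s)
        def neighbor(self, s):
            i = random.randrange(len(s))
            return s[:i] + (s[i] + random.uniform(-1, 1),) + s[i + 1:]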
Most of the existing frameworks related to metaheuristics for discrete optimization problems are Object-
Oriented (OO) [5–14]. They include a set of classes that embody an abstract design of solution methods
for a family of related problems [2]. They are based on a strong conceptual separation between the invariant
(generic) part of parallel/distributed metaheuristics (PDM) and their problem-specific part. This
characteristic means the PDM programmer has to rewrite very little code.
Most frameworks focus only on either EA [5–9] or LS [10,11]. Only a few frameworks are dedicated
to the design of both EA and LS, and of their hybridization [12–14]. All these frameworks are described,
summarized, and compared in this chapter. The comparison is mainly based on the class of provided
solution methods, the parallel/distributed models they implement, the hybridization mechanisms they
allow, and some implementation choices, mainly the programming language and the communication and
concurrency API. The presented overview will help the user to choose the framework corresponding to
his/her needs. To the best of our knowledge, such an overview has never been proposed in the literature.
The rest of the chapter is organized as follows: Sections 35.2 and 35.3 present the working principles
and their major parallel/distributed models of, respectively, LS methods and EA. In Section 35.4, we
present the main hybrid mechanisms of metaheuristics. In Section 35.5, we propose an overview of the
major frameworks dedicated to the LS methods, to the EA, and both of them. The main characterist-
ics of each of these frameworks are summarized. Section 35.6 ends the chapter with some concluding
remarks.
35.2.1 Principles of LS
Local search methods are metaheuristics dedicated to the improvement of a single solution. They are
generally based on the concept of a neighborhood. They start from a solution randomly generated or
provided by another metaheuristic. This solution is then updated, systematically, by replacing the current
solution by another one found in its neighborhood. The specific features of an LS method are mainly: the
heuristic's internal memory, the strategy used to choose the initial solution, the generator of candidate
solutions, and the selection policy of the candidate moves. Three major LS methods stand out: Hill
Climbing (HC) [15], Simulated Annealing (SA) [16], and Tabu Search (TS) [17].
A serial LS is composed of generic and specific features. Generic features include the initialization of a
movement, the exploration strategy of the neighborhood, and the computation of the fitness value of the
solution corresponding to a given movement. Specific features, such as the Tabu list involved in the TS
method, differentiate the LS methods.
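This separation can be made explicit in code. The sketch below (our own naming) parameterizes one
generic LS loop by the specific features the text enumerates; only the acceptance policy distinguishes HC
from SA.

    import math, random

    def local_search(initial, neighbors, fitness, accept, steps):
        # Generic LS loop: how candidates are produced and which moves are
        # accepted are passed in as problem- or method-specific functions.
        current = initial()
        for step in range(steps):
            candidate = random.choice(neighbors(current))
            if accept(fitness(candidate) - fitness(current), step):
                current = candidate
        return current

    # Hill Climbing accepts only improving moves (for minimization) ...
    def hc_accept(delta, step):
        return delta < 0

    # ... while Simulated Annealing sometimes accepts worsening ones.
    def sa_accept(delta, step, t0=10.0, alpha=0.99):
        t = t0 * alpha ** step
        return delta < 0 or random.random() < math.exp(-delta / t)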
[Figure: the parallel exploration of the neighborhood model. At each iteration the current solution is copied and its neighborhood is explored in parallel.]
[Figure: parallel/distributed EA models. In the migration-based island model, EAs exchange solutions through migrations. In the Master–Slave model, solutions are sent to full evaluating nodes, which return fitness values. In the distributed evaluation model, partial evaluating nodes return partial fitnesses, which are aggregated into the fitness value.]
consuming and IO intensive. The function can be viewed as an aggregation of a set of partial
functions: a reduction operation is performed on the results returned by the partial functions.
Consequently, for this model the user has to supply a set of partial functions and an aggregation
operator for them, as sketched below.
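A sketch of the decomposition the user has to supply, in Python; the distribution machinery is elided,
and the function names are ours:

    from functools import reduce

    def evaluate(solution, partial_fns, aggregate):
        # The fitness is viewed as an aggregation of partial functions; in the
        # parallel model each partial would run on a distinct evaluating node.
        partials = [f(solution) for f in partial_fns]
        return reduce(aggregate, partials)

    # Example: an expensive fitness split into two partial terms.
    f1 = lambda s: sum(s[: len(s) // 2])
    f2 = lambda s: sum(s[len(s) // 2 :])
    total = evaluate([1, 2, 3, 4], [f1, f2], lambda a, b: a + b)   # == 10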
[Figure: taxonomy of hybrid metaheuristics, distinguishing low-level from high-level combination, each in relay or coevolution mode.]
The low-level hybridization changes an internal component of the metaheuristic on which
it is performed: a given function of the metaheuristic is replaced by another metaheuristic. For
instance, the mutation operator of a given GA could be replaced by an LS method. To make this kind of
hybridization easier, the semantics of the replacing metaheuristic must match that of the internal function
it replaces, as illustrated in the sketch below. Low-level hybridization requires examining the internal
working of the metaheuristic [14]. Conversely, in high-level hybrid algorithms the combined metaheuristics
are self-contained, meaning that no direct relationship to their internal working is considered.
On the other hand, in the relay hybridization mode the metaheuristics are applied in a pipeline fashion:
the output of each metaheuristic (except the last) is the input of its successor. On the contrary,
coevolutionist hybridization is a cooperative optimization model: each metaheuristic performs a search
in a solution space and exchanges good solutions with the others.
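As a minimal illustration (the names are hypothetical and belong to no framework cited here), a low-level hybrid can be obtained by wrapping an LS in the signature the GA expects for its mutation operator:

import java.util.function.UnaryOperator;

// Low-level hybridization sketch: an LS replaces the mutation operator of a
// GA. The LS must expose the same semantics (solution in, solution out).
final class LocalSearchMutation<S> implements UnaryOperator<S> {
    private final UnaryOperator<S> localSearch; // e.g., a hill climber

    LocalSearchMutation(UnaryOperator<S> localSearch) {
        this.localSearch = localSearch;
    }

    @Override
    public S apply(S individual) {
        // The GA calls this exactly as it would call a classical mutation.
        return localSearch.apply(individual);
    }
}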
• Maximum design and code reuse: The framework must provide the user with a whole architecture
(design) for his/her solution method, and the programmer should have to rewrite as little code as possible.
This objective requires a clear and maximal conceptual separation between the solution methods
and the problems to be solved, and thus a deep problem-domain analysis. The user should, therefore,
have to develop only the minimal problem-specific code.
• Utility and extendibility: The framework must allow the user to cover a broad range of
metaheuristics, problems, parallel/distributed models, hybridization mechanisms, etc. It must
be possible for the user to easily add new features/metaheuristics or change existing ones
without affecting the other components. Furthermore, as existing problems evolve and new ones
arise in practice, these must be tackled by specializing or adapting the framework components
to them.
• Transparent use of parallel/distributed models and hybridization mechanisms: To facilitate
its use, the framework should be implemented so that the user can deploy his/her parallel algorithms in a
transparent manner. Moreover, the execution of the algorithms must be robust, to guarantee the reliability
and the quality of the results. The hybridization mechanisms should allow robust and better
solutions to be obtained.
• Portability: To satisfy a large number of users, the framework must support different
hardware architectures and their associated operating systems.
According to the first two criteria, the frameworks fall into three categories: those dedicated to
EA only, those dedicated to LS only, and those covering both. Frameworks limited to
EA include DREAM [5], ECJ [6], JDEAL [7], and Distributed BEAGLE [9]. These packages
qualify as frameworks because they are based on a clear object-oriented conceptual separation. They are
portable, as they are developed in Java, except the last system, which is programmed in C++. However, they are
limited regarding parallel/distributed models. Indeed, in DREAM and ECJ only the island model is
implemented, using Java threads and TCP/IP sockets; DREAM is notably deployable on peer-to-peer
platforms. Furthermore, JDEAL and Distributed BEAGLE provide only the Master–Slave (M/S) model,
using TCP/IP sockets. The latter also implements the synchronous migration-based island model, but it is
deployable on only one processor.
In the LS domain, most existing frameworks [10,11] do not allow parallel/distributed implementations.
Those enabling parallelism/distribution are often dedicated to a single solution method. For
instance, Reference 21 provides parallel skeletons for the TS method: two skeletons, implemented
in C++/MPI, provide an independent-runs (multistart) model with search strategies and a Master–
Slave model with neighborhood partition. The two models can be exploited by the user in a transparent
way.
In practice, only a few frameworks available on the Internet are devoted to both PDEA and PDLS
and to their hybridization. MALLBA [12], MAFRA [13], and ParadisEO [14] are good examples of
such frameworks. MAFRA is developed in Java using design patterns [22]. It is strongly hybridization
oriented, but it is very limited regarding parallelism and distribution. MALLBA and ParadisEO have
numerous common characteristics: they are C++/MPI open-source frameworks, and they provide all the
previously presented parallel/distributed models and the different hybridization mechanisms. However,
they differ in that ParadisEO seems to be more flexible, because the granularity of its classes is
finer. Moreover, ParadisEO (an extension of EO [23]) also provides a PVM-based communication layer
and PThreads-based concurrency. On the other hand, MALLBA is deployable on wide-area networks [12];
its communications are based on NetStream, an ad hoc, flexible, object-oriented message-passing service built on top of MPI.
Furthermore, MALLBA allows cooperation between metaheuristics and exact methods.
References
[1] E.-G. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8: 541–564, 2002.
[2] A. Fink, S. Voß, and D. Woodruff. Building reusable software components for heuristic search.
In P. Kall and H.-J. Lüthi (Eds.), Operations Research Proceedings 1998, Springer-Verlag, Berlin, 1999,
pp. 210–219.
[3] M. Wall. GAlib: A C++ library of genetic algorithm components. http://lancet.mit.edu/ga/.
[4] R. Johnson and B. Foote. Designing reusable classes. Journal of Object-Oriented Programming,
1: 22–35, 1988.
[5] M.G. Arenas, P. Collet, A.E. Eiben, M. Jelasity, J.J. Merelo, B. Paechter, M. Preuß and M. Schoenauer.
A framework for distributed evolutionary algorithms. In Proceedings of PPSN VII, September 2002.
[6] S. Luke, L. Panait, J. Bassett, R. Hubley, C. Balan, and A. Chircop. ECJ: a Java-based
evolutionary computation and genetic programming research system. http://www.cs.umd.edu/
projects/plus/ec/ecj/.
[7] J. Costa, N. Lopes, and P. Silva. JDEAL: The Java Distributed Evolutionary Algorithms Library.
http://laseeb.isr.ist.utl.pt/sw/jdeal/home.html.
[8] E. Goodman. An Introduction to GALOPPS — The “Genetic Algorithm Optimized for Portability
and Parallelism” System. Technical report, Intelligent Systems Laboratory and Case Center for
Computer-Aided Engineering and Manufacturing, Michigan State University, November 1994.
[9] C. Gagné, M. Parizeau, and M. Dubreuil. Distributed BEAGLE: An environment for parallel and
distributed evolutionary computations. In Proceedings of the 17th Annual International Symposium
on High Performance Computing Systems and Applications (HPCS) 2003, May 11–14, 2003.
[10] L. Di Gaspero and A. Schaerf. Easylocal++: An object-oriented framework for the design of local
search algorithms and metaheuristics. In MIC ’2001 4th Metaheuristics International Conference,
Porto, Portugal, July 2001, pp. 287–292.
[11] L. Michel and P. Van Hentenryck. Localizer++: An Open Library for Local Search. Technical report
CS-01-02, Brown University, Computer Science, 2001.
[12] E. Alba and the MALLBA Group. MALLBA: A library of skeletons for combinatorial optimization.
In B. Monien and R. Feldmann, Eds., Proceedings of Euro-Par 2002, Vol. 2400 of Lecture Notes in Computer Science,
Paderborn, Springer-Verlag, Heidelberg, 2002, pp. 927–932.
[13] N. Krasnogor and J. Smith. MAFRA: A java memetic algorithms framework. In Alex A. Freitas,
William Hart, Natalio Krasnogor, and Jim Smith, Eds, Data Mining with Evolutionary Algorithms,
Las Vegas, Nevada, USA, August 2000, pp. 125–131.
[14] S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A framework for the reusable design of parallel
and distributed metaheuristics. Journal of Heuristics, 10(3): 357–380, 2004.
[15] C.H. Papadimitriou. The Complexity of Combinatorial Optimization Problems. Master’s thesis,
Princeton University, 1976.
[16] S. Kirkpatrick, C.D. Gelatt, and M.P. Vecchi. Optimization by simulated annealing. Science,
220: 671–680, 1983.
[17] F. Glover. Tabu search, part I. ORSA Journal on Computing, 1: 190–206, 1989.
[18] J.H. Holland. Adaptation in Natural and Artificial Systems, The University of Michigan Press, Ann
Arbor, MI, 1975.
[19] D. Roberts and R. Johnson. Evolving frameworks. A pattern language for developing object-
oriented frameworks. In Proceedings of the Third Conference on Pattern Languages and Programming
(PLoP ‘96), Allerton Park, Illinois, September 4–6, 1996.
[20] W. Pree, G. Pomberger, A. Schappert, and P. Sommerlad. Active guidance of framework
development. Software — Concepts and Tools, 16: 94–103, 1995.
[21] M.J. Blesa, Ll. Hernandez, and F. Xhafa. Parallel skeletons for tabu search method. In 8th Interna-
tional Conference on Parallel and Distributed Systems (ICPADS’01), IEEE Computer Society Press,
Kyongju City, Korea, 2001, pp. 23–28.
[22] E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns, Elements of Reusable Object-
Oriented Software, Addison-Wesley, Reading, MA, 1994.
[23] M. Keijzer, J.J. Merelo, G. Romero, and M. Schoenauer. Evolving objects: A general purpose
evolutionary computation library. In Proceedings of the 5th International Conference on Artificial
Evolution (EA ’01), Le Creusot, France, October 2001.
36.1 Introduction
Metaheuristics make it possible to provide near-optimal solutions to NP-hard problems in a reasonable time.
They fall into two complementary categories: evolutionary algorithms (EAs), which have a good exploration
power, and local searches (LSs), which are characterized by better intensification capabilities. Hybridizing
the two categories improves both the effectiveness (quality of the provided solutions) and the robustness
of the metaheuristics [11]. Nevertheless, as hybridization is CPU-time consuming, it is rarely fully exploited in
practice; indeed, experiments with hybrid metaheuristics are often stopped before convergence is
reached. Nowadays, Peer-to-Peer (P2P) computing [8] and grid computing [5] are two powerful ways
to achieve high performance on long-running scientific applications. Parallel hybrid metaheuristics used
for solving real-world multiobjective problems (MOPs) are good challenges for P2P and grid computing.
However, to the best of our knowledge no research work has been published on that topic.
In this chapter, we contribute the first results on parallel hybrid multiobjective metaheuristics
on P2P systems. The design and deployment of these optimization methods require a middleware that
allows cooperation between parallel tasks. In addition, the traditional parallel models and hybridization
mechanisms have to be rethought and adapted in order to scale up. Moreover, they need to be fault-tolerant
to allow long-running problem resolutions. We particularly focus here on the island model and the
multistart model.
Recently, a few middlewares [1,4,13] for exploiting P2P systems have emerged. These middlewares
are well suited to embarrassingly parallel applications such as multi-parameter simulations. However,
they are limited regarding parallelism, as they do not allow direct cross-peer (or cross-task) communication.
Our contribution is to propose a Linda-like [7] coordination model and its implementation
on top of XtremWeb [4]. XtremWeb is a Dispatcher/Worker-oriented middleware, in which the Dispatcher
distributes application tasks submitted by clients to volunteer worker peers at their request. In addition,
the fault-tolerance mechanisms provided by this middleware are costly in a highly volatile P2P
environment: a work unit is restarted from scratch each time it fails. Another contribution of this
chapter is therefore to deal with the fault-tolerance issue at the application level: we propose a check-pointing
approach for the two parallel models quoted above.
The proposed approaches have been validated experimentally on the Bi-criterion Permutation
Flow-Shop Problem (BPFSP) [12]. Roughly, the problem consists in finding a schedule of a set of jobs
on a set of machines that minimizes the makespan and the total tardiness. Jobs must be scheduled in
the same order on all machines, and a machine cannot be assigned to two jobs simultaneously.
In Reference 2, a hybrid MultiObjective Metaheuristic (MOM) has been proposed to solve this problem.
In this chapter, we extend this work with two P2P-based fault-tolerant parallel models: the island and
multistart models. Our extended version fully exploits the hybridization and provides clearly
better results, which constitutes another contribution of this chapter.
This chapter is organized as follows: Section 36.2 briefly presents parallel hybrid multiobjective optimization
(MOO). Section 36.3 highlights the requirements of MOO on P2P systems and describes the proposed coordination
model and its implementation on top of XtremWeb. Section 36.4 presents the experimentation of the
model and its implementation through a parallel hybrid metaheuristic applied to the BPFSP, and analyzes
the preliminary experimental results. Finally, Section 36.5 concludes the chapter.
[Figure: Pareto and dominated solutions in the bi-objective space (f1, f2).]
the multistart model. In this chapter, we focus only on the coarse-grained models, that is, the island model and the
multistart model: due to communication delays, fine-grained models are often inefficient when they
are deployed on a large-scale network.
In the island (a)synchronous cooperative model, different EAs are simultaneously deployed and cooperate
to compute better and more robust solutions. They exchange genetic material asynchronously
to diversify the search. The objective is to delay global convergence, especially when the EAs
are heterogeneous with respect to their variation operators. The migration of individuals follows a policy defined
by a few parameters: the migration decision criterion, the exchange topology, the number of emigrants, the
emigrant selection policy, and the replacement/integration policy.
The multistart model consists in launching several local searches simultaneously. They may be heterogeneous,
but no information is exchanged between them, so the results are identical to those obtained if the algorithms
were run sequentially. Very often the deterministic algorithms differ only in the supplied initial solution and/or
some other parameters. This trivial model is convenient for low-speed networks of workstations.
Combinations of different metaheuristics often provide very powerful search methods. In Reference 11,
two levels and two modes of hybridization are distinguished: low and high levels, and relay and cooperative
(teamwork) modes. Low-level hybridization consists in replacing an internal function (e.g., an operator) of
a given metaheuristic by another metaheuristic. In high-level hybrid algorithms, the different metaheuristics
are self-contained, meaning that no direct relationship to their internal working is considered. Relay
hybridization means that a set of metaheuristics is applied in a pipeline fashion, the output of each metaheuristic
(except the last) being the input of the following one. Conversely, teamwork hybridization is a
cooperative optimization model: each metaheuristic performs a search in a solution space and exchanges
solutions with the others. In this chapter, we address the high-level hybridization mechanism in the relay and
cooperative modes.
and control the complex coordination between the workers. To deal with such a problem, existing middlewares
must be extended with a software layer that implements a coordination model. Several interesting
coordination models have been proposed in the literature [6,9]. In this chapter, we focus on one of
the most popular of them, namely Linda [7], as our proposed model is an extension of it.
In the Linda model, coordination is performed through generative communications. Processes
share a virtual memory space called a tuple-space (a set of tuples). The fundamental data unit, a tuple,
is an ordered vector of typed values. Processes communicate by reading, writing, and consuming these
tuples. The “eval” operation is particularly useful in a P2P environment as it allows tasks to be spawned
for execution on volunteer peers. A small set of four simple operations allows highly complex communication
and synchronization schemes:
• out(tuple): Inserts a tuple into the tuple-space.
• in(pattern): Withdraws from the tuple-space a tuple matching the specified pattern (blocking).
• rd(pattern): Reads from the tuple-space a copy of a tuple matching the specified pattern (blocking).
• eval(expression): Creates an active tuple, spawning a process that evaluates expression and turns into an ordinary tuple.
Nevertheless, Linda has several limitations regarding the design and deployment of parallel hybrid
metaheuristics for P2P systems. First, it does not allow rewriting operations on the tuple-space. Because of
the high communication delays in a P2P system, tuple rewriting is very important, as it reduces
the number of communications and the synchronization cost. Indeed, in Linda a rewriting operation
must be performed as an “in” or “rd” operation followed by a local modification and an “out” operation;
the “in”/“rd” and “out” operations involve two communications and heavy synchronization. Therefore,
the model needs to be extended with a rewriting operation. Furthermore, the model does not support
group operations, which are useful for efficiently writing/reading Pareto sets in/from the tuple-space. Finally,
nonblocking operations, which are very important in a P2P context, are not supported in Linda. In the next
section, we propose an extension of the Linda model that meets these requirements.
In addition to the operations provided by Linda, parallel P2P multiobjective optimization needs further
operations. These fall into two categories: group operations and nonblocking operations. Group
operations are useful to manage multiple Pareto optimal solutions, while nonblocking operations are necessary
to take into account the volatile nature of P2P systems. In our model, the coordination primitives are
defined as follows:
• in, rd, out, and eval: These operations are the same as those of Linda defined in Section 36.2.3.
• ing(pattern): Withdraws from PS all the solutions matching the specified pattern.
• rdg(pattern): Reads from PS a copy of all the solutions matching the specified pattern.
• outg(setOfSolutions): Inserts multiple solutions into PS.
• update(pattern, expression): Updates all the solutions matching the specified pattern with the solutions
resulting from the evaluation of expression.
• inIfExist, rdIfExist, ingIfExist, and rdgIfExist: These operations have the same syntax as
in, rd, ing, and rdg, respectively, but they are nonblocking probe operations.
The update operation allows the PS to be updated locally, and thus reduces the communication and
synchronization cost. The pattern-matching mechanism depends strongly on how the model is implemented,
and in particular on how the tuple-space is stored and accessed. For instance, if the tuple-space is stored
in a database, the mechanism can be the query mechanism of the database management system.
More details on the pattern-matching mechanism of our model are given in the next section; a sketch of the resulting interface follows.
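A minimal Java sketch gathering these primitives into one interface is given below; the interface name and the types Solution and Pattern are assumptions for illustration, not the actual API of the XtremWeb coordination layer.

import java.util.Set;
import java.util.function.UnaryOperator;

// Illustrative interface for the extended-Linda coordination model.
interface CoordinationSpace<Solution, Pattern> {
    // Classical Linda primitives.
    void out(Solution s);                 // insert one solution
    Solution in(Pattern p);               // withdraw a match (blocking)
    Solution rd(Pattern p);               // read a copy of a match (blocking)
    void eval(Runnable task);             // spawn a task on a volunteer peer

    // Group operations for Pareto sets.
    Set<Solution> ing(Pattern p);         // withdraw all matches
    Set<Solution> rdg(Pattern p);         // read a copy of all matches
    void outg(Set<Solution> solutions);   // insert several solutions
    void update(Pattern p, UnaryOperator<Solution> expression); // rewrite in place

    // Nonblocking probe variants for volatile P2P environments.
    Solution inIfExist(Pattern p);
    Solution rdIfExist(Pattern p);
    Set<Solution> ingIfExist(Pattern p);
    Set<Solution> rdgIfExist(Pattern p);
}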
[Figure: the XtremWeb architecture — clients submit work to the dispatcher over the Internet and get results back; workers request work units from the dispatcher and send their results to it.]
[Figure 36.3: Implementation of the coordination layer on top of XtremWeb — a work unit on a worker calls ing(pattern); the CRB_Stub turns the local call into an RMI call to the CRB_Skeleton, which invokes the ing operation of the Pareto Space Manager; the operation is translated into a MySQL request (SELECT * FROM PS WHERE the pattern is matched) on the Pareto space (PS).]
On the worker side, the coordination API is implemented in Java and in C/C++; the C/C++
version allows C/C++ applications to be deployed and executed with XtremWeb (which is written in Java).
The coordination library must be included in the programmer's applications. On the Dispatcher side,
the coordination API is implemented in Java as a PS Manager. The CRB is a software broker that transports
the workers' coordination operation calls to the Dispatcher; it has two components, one
for the worker (CRB stub) and one for the Dispatcher (CRB skeleton). The role of the CRB stub is to
transform the local calls to the coordination operations, performed by the tasks executed by the worker, into
RMI calls. The role of the CRB skeleton is to transform these RMI calls into local calls to the coordination
operations performed by the PS Manager. These local calls are translated into MySQL requests addressed
to the PS.
To illustrate the implementation of the coordination layer on top of XtremWeb, consider the scenario
presented in Figure 36.3. The work unit performed by an XtremWeb worker calls the ing(template)
coordination operation. In the C++ version of the coordination API, the implementation of each coordination
operation makes the system call execlp() with appropriate parameters to plug in the CRB_Stub Java
object. In our scenario, the major parameters are the number ING designating the operation and the file
ARGS_FILE containing the arguments specified in the template parameter. CRB_Stub translates the local ing
call into an RMI call to the CRB_Skeleton Java object. The latter translates the RMI call into a local
call to the ing operation implemented in the PS Manager class. The implementation of the coordination
operation consists in a MySQL SELECT request addressed to the PS part of the XtremWeb information
database.
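As an illustration only, the following Java fragment shows how such a synchronized ing operation might be backed by a MySQL table; the table name PS, the column solution, and the raw WHERE-clause pattern handling are assumptions, not the actual XtremWeb schema.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Sketch of a PS-Manager-style ing: SELECT then DELETE on the PS table,
// made atomic with respect to other coordination calls by "synchronized".
final class PSManagerSketch {
    private final Connection db; // connection to the information database

    PSManagerSketch(Connection db) { this.db = db; }

    synchronized List<String> ing(String whereClause) throws SQLException {
        List<String> matches = new ArrayList<>();
        try (PreparedStatement select =
                 db.prepareStatement("SELECT solution FROM PS WHERE " + whereClause);
             ResultSet rs = select.executeQuery()) {
            while (rs.next()) {
                matches.add(rs.getString("solution"));
            }
        }
        // ing withdraws the matches, so remove them after reading.
        try (PreparedStatement delete =
                 db.prepareStatement("DELETE FROM PS WHERE " + whereClause)) {
            delete.executeUpdate();
        }
        return matches;
    }
}

A production implementation would run the SELECT and DELETE in a single transaction and guard the pattern against SQL injection; the sketch keeps only the structure described above.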
Note that the method declarations for the coordination operations in the PS Manager class carry the
Java synchronized keyword. Hence, the system associates a unique lock with the instance of the PS Manager
class: whenever control enters a synchronized coordination operation, other calls to synchronized
coordination methods are blocked until the PS Manager object is unlocked. In the next section, the proposed
coordination model is applied to parallel hybrid MOMs.
[Figure: a permutation flow-shop schedule — the jobs J2, J4, J5, J1, J6, J3 are processed in the same order on the machines M1, M2, and M3.]
Let task $t_{ij}$ (job $i$ on machine $j$) be scheduled at time $s_{ij}$, let $p_{ij}$ be its processing time, and let $d_i$ be the due date of job $i$. With $M$ denoting the last machine and $N$ the number of jobs, the two objectives can be formulated as follows:
$$C_{\max} = \max_{i \in [1..N]} \{\, s_{iM} + p_{iM} \,\}, \qquad T = \sum_{i=1}^{N} \max\big(0,\; s_{iM} + p_{iM} - d_i\big).$$
A solution $x$ dominates a solution $y$ if and only if $m(x) \le m(y)$ and $t(x) \le t(y)$, with at least one of the two inequalities strict, where $x$ and $y$ are solutions of the MOP and $m(x)$ (respectively $t(x)$) is the value of $x$ corresponding to the makespan (respectively tardiness) criterion.
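As a small worked example, the following method computes the two criteria from a schedule under the assumed data layout described in the comments:

// Assumed data layout: s[i][j] is the start time of job i on machine j,
// p[i][j] its processing time, d[i] the due date of job i.
final class FlowShopObjectives {
    static double[] objectives(double[][] s, double[][] p, double[] d) {
        int n = s.length;            // number of jobs
        int M = s[0].length - 1;     // index of the last machine
        double makespan = 0.0;
        double tardiness = 0.0;
        for (int i = 0; i < n; i++) {
            double completion = s[i][M] + p[i][M]; // completion time of job i
            makespan = Math.max(makespan, completion);
            tardiness += Math.max(0.0, completion - d[i]);
        }
        return new double[] { makespan, tardiness };
    }
}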
[Figure: one iteration of the mimetic algorithm — a crossover applied to solutions selected from POP produces POP′; the nondominated new solutions form PO*; the LS neighbors of the solutions of PO* yield PO*′ (switch condition: Ppo* < a).]
The mimetic algorithm consists in randomly selecting a set of solutions from the current population of
the GA. A crossover operator is then applied to these solutions, and new solutions are generated. Among
these new solutions, only the nondominated ones are kept; they constitute a new Pareto front PO ∗. An
LS is then applied to each solution of PO ∗ to compute its neighborhood, and the nondominated solutions
belonging to the neighborhood are inserted into PO ∗. A schematic sketch of this step is given below.
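The nondominated filtering at the heart of this step can be sketched in Java as follows; the class and method names are hypothetical, and the objective vectors are the (makespan, tardiness) pairs defined earlier.

import java.util.ArrayList;
import java.util.List;

// Bi-objective Pareto filtering used when building PO* from new solutions
// and, later, when merging LS neighborhoods back into PO*.
final class MimeticStepSketch {
    // x dominates y: no worse on both criteria, strictly better on one.
    static boolean dominates(double[] x, double[] y) {
        return x[0] <= y[0] && x[1] <= y[1] && (x[0] < y[0] || x[1] < y[1]);
    }

    // Keep only the nondominated solutions of a set of objective vectors.
    static List<double[]> nonDominated(List<double[]> set) {
        List<double[]> front = new ArrayList<>();
        for (double[] x : set) {
            boolean dominated = false;
            for (double[] y : set) {
                if (y != x && dominates(y, x)) { dominated = true; break; }
            }
            if (!dominated) front.add(x);
        }
        return front;
    }
}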
The parallel evaluation of a single solution does not improve performance. Indeed, the evaluation of each objective has a low cost,
so it is useless to evaluate the two objectives in parallel or to evaluate each of them in parallel. Conversely, it is useful to exploit the
following parallel models: (1) the island model, which consists in running several cooperating
AGMAs in parallel; (2) the parallel evaluation of the population of each AGMA; (3) the multistart model, which consists
in applying in parallel an LS to each solution of the Pareto front PO ∗ in the MA. The parallel evaluation of
the neighborhood of each solution would not be efficient, for the same reason as the parallel evaluation of
each solution.
We have limited our implementation to the coarse-grained parallel models, that is, the island model
and the multistart model. Figure 36.6 illustrates the parallel hybrid AGMA exploiting these two models.
[Figure 36.6: the parallel hybrid AGMA — several AGMAs cooperate through the island model; within each AGMA, the multistart model dispatches the LSs applied to the solutions of PO* through the XtremWeb interface to the XtremWeb workers.]
• The island model: Because of its exorbitant cost in terms of CPU time on large-size instances of BPFSP,
the island model was not exploited in Reference 2; indeed, exploiting it on
large-size BPFSP instances is possible only on large-scale P2P networks or grids. In our implementation
(see Figure 36.6), the parameters of the model are the following: the different cooperating AGMAs
exchange their whole archives PO ∗, and the number of emigrants is dynamic. On arrival, the
immigrant archive is merged with the local one. Migrations occur periodically (every fixed number
of iterations). The migration topology is random, meaning the destination island is selected at
random. A sketch of this migration step is given after this list.
• The multistart model: The multistart model is exploited during the execution of the MA. Each solution
of the Pareto front PO ∗ computed by the algorithm is the initial solution of an LS method
that computes its neighborhood. The different LSs are executed in parallel according to the Master–
Slave model: the master, that is, the MA, merges the neighborhoods returned
by the different slaves with PO ∗ and computes the new PO ∗ containing the nondominated solutions.
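The following sketch (a shared-memory simplification with hypothetical types, reusing the nonDominated filter from the earlier sketch) illustrates the migration policy of the island model; in the actual deployment the exchange goes through the Pareto Space rather than through shared lists.

import java.util.List;
import java.util.Random;

// Every `period` iterations an island sends its whole archive PO* to a
// randomly chosen destination island; the immigrant archive is merged with
// the local one and filtered to keep the nondominated solutions only.
final class IslandMigrationSketch {
    private final Random rng = new Random();

    void maybeMigrate(int iteration, int period,
                      List<double[]> localArchive,
                      List<List<double[]>> islands, int self) {
        if (iteration % period != 0) return;
        int dest = rng.nextInt(islands.size());
        if (dest == self) dest = (dest + 1) % islands.size(); // avoid self
        List<double[]> target = islands.get(dest);
        target.addAll(localArchive);                       // emigrants arrive
        List<double[]> merged = MimeticStepSketch.nonDominated(target);
        target.clear();
        target.addAll(merged);                             // keep nondominated only
    }
}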
[Figure 36.7: deployment of the parallel hybrid AGMA on XtremWeb — the LSs run on workers and return their neighbors to the dispatcher, whose MySQL-backed Pareto Space maintains PO*.]
Without application-level fault tolerance, a large amount of CPU time is wasted, as the system spends its time restarting the work units performed by the
workers. Therefore, we propose a check-pointing approach at the client level that solves the
fault-tolerance problem more efficiently: the problem data and the intermediate results are stored
periodically. If the Dispatcher fails, the application is restored and restarted from the last checkpoint; in case
of a worker failure, the work unit is restarted using the intermediate results. The check-pointing
(storing) operation is performed after each LS and/or every 100 generations. The second condition is necessary when
no LS has been launched during the last 100 generations, which happens when a significant progression of the
Pareto front is observed at each generation, excluding any resort to hybridization. A sketch of this policy follows.
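A minimal sketch of this check-pointing policy, assuming serializable population and front objects and an arbitrary file name, could look as follows:

import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Checkpoint after each LS and at least every 100 generations, as described
// above. File name and object types are assumptions for illustration.
final class CheckpointSketch {
    private int generationsSinceCheckpoint = 0;

    void onGeneration(boolean lsJustFinished,
                      Serializable population, Serializable paretoFront)
            throws IOException {
        generationsSinceCheckpoint++;
        if (lsJustFinished || generationsSinceCheckpoint >= 100) {
            try (ObjectOutputStream out =
                     new ObjectOutputStream(new FileOutputStream("agma.checkpoint"))) {
                out.writeObject(population);  // problem data / intermediate results
                out.writeObject(paretoFront); // current PO*
            }
            generationsSinceCheckpoint = 0;
        }
    }
}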
• Version 1 is the one proposed in Reference 2 and exploits only the multistart model. The
GA is executed on a single machine and the hybridization phase is deployed on a parallel
machine (IBM-SP2) according to the Master–Slave model (Push mode, i.e., work distribution is
initiated by the master).
• Version 2 is the same as Version 1 except that the hybridization is deployed in a distributed way
on a set of XtremWeb workers according to the cycle-stealing paradigm (Pull mode, i.e., work
distribution is initiated by the workers).
• Version 3 is not considered in Reference 2; it is a combination of the multistart and island models.
As illustrated in Figure 36.7, three AGMA algorithms are deployed on client machines and cooperate
according to the island model. Each AGMA is an implementation of Version 2.
Figure 36.8 illustrates the Pareto fronts obtained with Versions 1 and 2 after 80 LSs. The two fronts
are approximately the same, but Version 2 has the advantage of being fault-tolerant.
The execution of Version 1 is stopped after 80 LSs, as it is not fault-tolerant. Conversely, long-lasting
executions are possible with Version 2: for instance, Figure 36.9 shows an execution that goes on up to
350 LSs. This execution lasted one week; 10 failures were observed, and as many check-point recovery
operations were performed. As a result, the Pareto front obtained with 350 LSs is clearly better than
that obtained with 80 LSs using Version 1 or Version 2. One has to note that such results are possible only
with a scalable and fault-tolerant version of the algorithm.
[Table: processors of the experimentation platform — 120 in total.]
FIGURE 36.8 Pareto Fronts obtained with Version 1 and Version 2 (80 LSs). (Axes: tardiness versus makespan.)
FIGURE 36.9 Pareto Fronts with Version 1 (80 LSs) and Version 2 (350 LSs). (Axes: tardiness versus makespan.)
FIGURE 36.10 Pareto Fronts with Version 2 (80 LSs) and Version 3 (80 LSs). (Axes: tardiness versus makespan.)
Figure 36.10 compares the Pareto fronts obtained with Version 2 and Version 3 and
demonstrates the contribution of the island model to the effectiveness. With 80 LSs, the Pareto front
obtained using Version 3 is better than that obtained using Version 2. More experiments with more LSs
are in progress.
Figure 36.11 (Part A) shows the oscillation between the GA and the MA (or LS) over time obtained with
Version 3. Figure 36.11 (Part B) is a zoom of Figure 36.11 (Part A) on the first 50,000 time units; it
shows that the MA (and thus the LS) is frequently solicited and runs for long periods. Figure 36.11 (Part C) illustrates the
evolution over time of the number of deployed workers at the beginning of the execution (zoom on the
first 350 time units). The maximum number of workers is 60, because during the starting phase the Pareto
front contains a small number of solutions. The number of workers decreases to 0 when the GA succeeds
in improving the Pareto front without calling the MA.
One has to note that the spectrum blackens over time (from left to right). This means that the GA
solicits the MA, that is, the LS, more and more, because it no longer enhances the Pareto front on its own; in other
words, the GA converges. On the other hand, the local search lasts less and less time. Therefore, even the
intensification (by LS) no longer contributes to enhancing the effectiveness, meaning that the AGMA converges.
Through this experimentation, we have learned more about the convergence of the AGMA algorithm.
One can therefore note that P2P computing "pushes back" the limits in terms of computing resources,
making it possible to better evaluate the contribution of the hybridization, but also its limitations.
[Figure 36.11: (Part A) oscillation between GA and MA (LS) over time for Version 3; (Part B) zoom on the first 50,000 time units; (Part C) number of deployed workers over the first 350 time units.]
In this chapter, we have dealt with the design and deployment of parallel hybrid multiobjective
metaheuristics on P2P systems. Nowadays, existing P2P computing middlewares are inadequate for the
deployment of parallel cooperative applications; they need to be extended with a software layer
to support the cooperation. We have therefore proposed a Linda-like cooperation model, which has
been implemented on top of XtremWeb.
In Reference 2, a hybrid metaheuristic (AGMA) was proposed and experimented on the BPFSP.
Experiments on large-size instances, such as 200 jobs on 10 machines, are often stopped
before convergence is reached. Fully exploiting the hybridization requires a large amount of
computational resources and the management of the fault-tolerance issue. We have proposed a fault-tolerant
hybrid parallel design of the AGMA combining two parallel models, the multistart model and the
island model, and the algorithm has been implemented on our extended version of XtremWeb.
The first experiments have been performed on the education network of the Polytech'Lille engineering
school, composed of 120 heterogeneous Linux PCs. The preliminary results, obtained
after several days of execution, demonstrate that the use of P2P computing makes it possible to fully exploit the benefits
of hybridization: the obtained Pareto front is clearly better than that obtained in Reference 2.
Moreover, the deployment of the island model improves the effectiveness further. Beyond this
improvement, parallelism on P2P systems pushes back the limits in terms
of computational resources, and consequently permits a better evaluation of the benefits and limitations
of the hybridization. These results have to be confirmed on a larger P2P network and on larger instances
of the problem.
References
[1] D.P. Anderson, J. Cobb, E. Korpela, M. Lepofsky, and D. Werthimer. SETI@home: An experiment
in public-resource computing. Communications of the ACM, 45: 56–61, 2002.
[2] M. Basseur, F. Seynhaeve, and E.-G. Talbi. Adaptive mechanisms for multi-objective evolution-
ary algorithms. In Congress on Engineering in System Application CESA ’03, Lille, France, 2003,
pp. 72–86.
[3] S. Cahon, N. Melab, and E.-G. Talbi. ParadisEO: A framework for the reusable design of parallel
and distributed metaheuristics. Journal of Heuristics, 10(3): 357–380, 2004.
[4] G. Fedak, C. Germain, V. Neri, and F. Cappello. XtremWeb: Building an experimental platform
for Global Computing. Workshop on Global Computing on Personal Devices (CCGRID2001), IEEE
Press, May 2001.
[5] I. Foster and C. Kesselman. The Grid: Blueprint for a New Computing Infrastructure. Morgan
Kaufmann, San Francisco, CA, 1999.
[6] D. Gelernter and N. Carriero. Coordination languages and their significance. Communications of
the ACM, 35: 97–107, 1992.
[7] D. Gelernter. Generative communication in Linda. ACM Transactions on Programming Languages
and Systems, 7: 80–112, 1985.
[8] A. Oram. Peer-to-Peer: Harnessing the Power of Disruptive Technologies. O’Reilly & Associates, 2001.
[9] G.A. Papadopoulos and F. Arbab. Coordination models and languages. In Advances in Computers,
Vol. 46: The Engineering of Large Systems, Academic Press, 1998.
[10] E.-G. Talbi, M. Rahoual, M.-H. Mabed, and C. Dhaenens. A hybrid evolutionary approach
for multicriteria optimization problems: Application to the Flow Shop. In E. Zitzler et al.,
Eds., Evolutionary Multi-Criterion Optimization, Vol. 1993 of Lecture Notes in Computer Science,
Springer-Verlag, Heidelberg, 2001, pp. 416–428.
[11] E.-G. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8: 541–564, 2002.
[12] V. T’kindt and J.-C. Billaut. Multicriteria Scheduling — Theory, Models and Algorithms. Springer-
Verlag, Heidelberg, 2002.
[13] J. Verbeke, N. Nadgir, G. Ruetsch, and I. Sharapov. Framework for peer-to-peer distributed com-
puting in a heterogeneous, decentralized environment. In Proceedings of the Third International
Workshop on Grid Computing (GRID ’2002), Baltimore, MD, January 2002, pp. 1–12.