Evolutionary Computation: An Overview
Melanie Mitchell∗ and Charles E. Taylor∗∗
∗Santa Fe Institute, Santa Fe, New Mexico 87501;
e-mail: [email protected], ∗∗ Department of Organismic Biology, Ecology, and Evolution,
University of California, Los Angeles, California 90095; e-mail: [email protected]
INTRODUCTION
In nature, evolution proceeds by random variation of genotypes (via mutation, recombination, and
other operators), followed by natural selection in which the fittest tend to survive
and reproduce, thus propagating their genetic material to future generations. Yet
these simple rules are thought to be responsible for the extraordinary variety and
complexity we see in the biosphere.
There are several approaches that have been followed in the field of evolutionary
computation; the general term for such approaches is evolutionary algorithms. The
most widely used form of evolutionary algorithm is the genetic algorithm (GA),
which will be the main focus of this review. Other common forms of evolutionary
algorithms will be described in the third section.
A GA works roughly as follows. A population of candidate solutions, each typically encoded as a string of "genes," is generated at random, and the fitness of each candidate is evaluated. The fittest candidates are selected to reproduce, and offspring are created from them by crossover and random mutation; the offspring then form a new population, and the cycle repeats for some number of generations. Different choices within this scheme give rise to
different detailed behaviors.
The simple procedure just described is the basis for most applications of GAs.
There are a number of details to fill in, such as how the candidate solutions are
encoded, the size of the population, the details and probabilities of the selection,
crossover, and mutation operators, and the maximum number of generations al-
lowed. The success of the algorithm depends greatly on these details.
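To make these details concrete, the following is a minimal sketch of a genetic algorithm in Python. The bit-string encoding, the toy fitness function, and the parameter values are illustrative assumptions only; a real application would substitute its own.

```python
import random

# Illustrative parameters; real applications tune these carefully.
POP_SIZE, GENOME_LEN, GENERATIONS = 100, 50, 200
CROSSOVER_RATE, MUTATION_RATE = 0.7, 0.01

def fitness(genome):
    # Toy fitness: number of 1s in the bit string (a stand-in for
    # whatever problem-specific measure a real application would use).
    return sum(genome)

def select(population):
    # Tournament selection: the fitter of two random individuals wins.
    a, b = random.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b

def crossover(p1, p2):
    # Single-point crossover with probability CROSSOVER_RATE.
    if random.random() < CROSSOVER_RATE:
        point = random.randrange(1, GENOME_LEN)
        return p1[:point] + p2[point:], p2[:point] + p1[point:]
    return p1[:], p2[:]

def mutate(genome):
    # Each bit flips independently with probability MUTATION_RATE.
    return [1 - g if random.random() < MUTATION_RATE else g for g in genome]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    next_pop = []
    while len(next_pop) < POP_SIZE:
        c1, c2 = crossover(select(population), select(population))
        next_pop.extend([mutate(c1), mutate(c2)])
    population = next_pop[:POP_SIZE]

print(max(fitness(g) for g in population))
```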
Over the years, the distinctions among the various
evolutionary computation methods have broken down to some extent. In this review
most of our examples involve what researchers have called genetic algorithms,
though in many cases these will be of a somewhat different form than Holland’s
original proposal.
In this review we first survey some applications of evolutionary computation in
business, science, and education, and we conclude with a discussion of evolution-
ary computation research most relevant to problems in evolutionary biology. Due
to space limitations, we do not survey the extensive work that has been done on the
theoretical foundations of evolutionary computation; much work in this area can
be found in the various Foundations of Genetic Algorithms proceedings volumes
(11, 58, 76, 77) and in recent reviews (6, 21, 26, 49).
A commercial example comes from drug design. The HIV virus requires the enzyme HIV-1 protease in order to replicate, and this enzyme has a well-characterized active site.
Protease inhibitors bind tightly into this active site and disrupt the life cycle of
the virus. Such inhibitors are finding widespread use in treating AIDS. The mar-
ket for protease inhibitors is thus huge. Companies would like to screen candidate
molecules and determine whether they will fit into the active site and how well they
will bind there. Natural Selection Inc. provided software to Agouron Pharmaceuti-
cals that combines models of ligand-protein interactions for molecular recognition
with evolutionary programming to search the space of all possible configurations
of the ligand-protein complex (24). Each candidate solution of the evolving pop-
ulation is a vector with rigid-body coordinates and the angles about its rotatable
bonds. Those individuals with the lowest calculated energies are then used as par-
ents for the next generation. Mutations occur as random changes in the bond angles
in the offspring candidates. In practice it is useful to have the magnitudes of the
mutations themselves evolve as well.
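The flavor of this scheme can be sketched as follows. The energy function below is a placeholder (a real docking application would use a physically derived scoring model), but the self-adaptive device, in which each candidate carries and mutates its own mutation magnitudes, is the one described above.

```python
import math
import random

N_PARAMS = 9            # e.g., 6 rigid-body coordinates + 3 rotatable-bond angles
POP, GENS = 50, 100
TAU = 1.0 / math.sqrt(2.0 * N_PARAMS)   # common self-adaptation rate

def energy(x):
    # Placeholder for a ligand-protein interaction energy model;
    # lower is better. Here: distance from an arbitrary "ideal" pose.
    return sum((xi - 0.5) ** 2 for xi in x)

def mutate(parent):
    x, sigmas = parent
    # Each candidate carries its own mutation magnitudes (sigmas),
    # which are themselves perturbed each generation.
    new_sigmas = [s * math.exp(TAU * random.gauss(0, 1)) for s in sigmas]
    new_x = [xi + s * random.gauss(0, 1) for xi, s in zip(x, new_sigmas)]
    return (new_x, new_sigmas)

population = [([random.uniform(0, 1) for _ in range(N_PARAMS)],
               [0.1] * N_PARAMS) for _ in range(POP)]
for _ in range(GENS):
    offspring = [mutate(p) for p in population]
    # Lowest-energy individuals become the parents of the next generation.
    population = sorted(population + offspring,
                        key=lambda ind: energy(ind[0]))[:POP]

best = min(population, key=lambda ind: energy(ind[0]))
print(energy(best[0]))
```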
Supply-chain management provides examples of a very different sort. For in-
stance, Volvo trucks are built to order and have dozens of options for each trac-
tor cab, giving millions of possible configurations; each order must be scheduled, inventory must be checked, and tools must be delivered to the appropriate place on the plant floor at the appropriate time. Starting from a collection of average-
quality schedules, the scheduling program provided by I2 Technologies evolves a
satisficing schedule for plant production each week (57). Deere & Company was
probably the first to use such methods (54), also provided by I2 Technologies, for
making their John Deere tractors, and now employs the methods in six of their
assembly plants.
Evolution in such cases is based on an optimizing-scheduling procedure that
was developed and employed at the US Navy's Point Mugu Naval Air Station. Each
chromosome in the evolving population encodes a schedule—a permutation of the
tasks to be performed—whose fitness is the cost of the schedule after it is further
optimized by an additional scheduling program.
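A sketch of this encoding: the chromosome is a permutation of task indices, mutated by swapping two positions, and its fitness is the cost of the schedule. The cost function here is a hypothetical stand-in for the additional scheduling program mentioned above.

```python
import random

N_TASKS = 20

def schedule_cost(perm):
    # Stand-in for the downstream scheduling program, which would
    # further optimize and then price the schedule. Here: tasks incur
    # a penalty when scheduled far from their preferred slot.
    return sum(abs(task - slot) for slot, task in enumerate(perm))

def swap_mutation(perm):
    # Exchange two random positions, preserving the permutation property.
    child = perm[:]
    i, j = random.sample(range(N_TASKS), 2)
    child[i], child[j] = child[j], child[i]
    return child

population = [random.sample(range(N_TASKS), N_TASKS) for _ in range(30)]
for _ in range(200):
    candidates = population + [swap_mutation(p) for p in population]
    population = sorted(candidates, key=schedule_cost)[:30]

print(schedule_cost(population[0]))
```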
An altogether different problem is to predict stock market prices. Several
large corporations, including Citibank, Midland Bank, and Swiss Bank (through
their partner Prediction Company), have been evolving programs that attempt
such predictions (38). Typically such methods involve backcasting—withdrawing
the most recent data from the evaluators, then determining how well each pro-
gram in the population predicts that data. Not surprisingly, the details of how
such programs work, including their performance, are trade secrets, though, for
reasons discussed in the section on the Baldwin effect, it seems that such pro-
grams are likely to contain some other sorts of learning mechanisms in addition to
evolution.
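Since the real systems are proprietary, the following is only an illustrative sketch of the backcasting idea: the most recent points of a series are withheld, and a predictor's fitness is its (negated) error on those points.

```python
def backcast_fitness(predictor, series, holdout=10):
    """Score a predictor by how well it forecasts withheld recent data.

    `predictor` maps a history (list of floats) to a one-step forecast.
    The most recent `holdout` points are hidden during evaluation and
    predicted one step at a time; lower mean squared error is better.
    """
    split = len(series) - holdout
    errors = []
    for t in range(split, len(series)):
        forecast = predictor(series[:t])   # sees only data before time t
        errors.append((forecast - series[t]) ** 2)
    return -sum(errors) / len(errors)      # higher fitness = lower error

# Example: a trivial "persistence" predictor on a toy series.
def persistence(history):
    return history[-1]

series = [100 + 0.5 * t for t in range(100)]
print(backcast_fitness(persistence, series))
```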
Asahi Microsystems is building an evolvable hardware (EHW) chip that is
part of an integrated circuit for cellular telephones. When computer chips are
made, there is slight variation from chip to chip, due to differences in capacitance,
resistance, etc. A percentage of such chips do not perform up to specification and
so must be discarded. In general, as chips are made smaller they are less likely to
perform to specification. In the case of analog cellular telephones, where size is a
major issue, certain filters must perform to within 1% of the central frequencies.
The laboratory of T Higuchi at Tsukuba (50) has shown how to build chips that
are tunable with 38 parameters as they come off the assembly line. Their EHW
microchip will alter these parameters using evolutionary computation and then set
the appropriate switches on a field programmable gate array (FPGA). The EHW
chip is leading to improved yield rates for the filter chip and will also lead to smaller
circuits and lower power consumption. The chip is to appear in January 1999 with a target
production of 400,000 chips per month.
Descriptions of various other commercial applications can be found in jour-
nals such as Evolutionary Computation or IEEE Transactions on Evolutionary
Computation, and in various conference proceedings (e.g., 3, 7).
Using Genetic Programming to Evolve Optimal Foraging Strategies
As an example from science, Koza, Rice, & Roughgarden (41) used genetic programming to evolve foraging strategies for Anolis lizards, which perch and wait for insect prey.
The model strategies involved four variables: the abundance a of insects (the
probability of an insect appearing per square meter per second, assumed to be
uniform over space and time), the sprint velocity v of the lizard (assumed to be
constant), and the coordinates x, y of the insect in the lizard’s view, assumed in
this case to be two dimensional. (It is also assumed that only one insect at a time is
viewed, that all insects are identical, and that each chase consists of the lizard leaving
and returning to its perch before a new insect appears). A strategy is a function of
these four variables that returns 1 if the insect is to be chased, −1 otherwise. The
?
goal is to devise a function that maximizes food capture per unit time. Clearly not
every insect is worth chasing; if an insect is too small or too far away it will take
too much time to catch it and might even be gone by the time the lizard reaches it.
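In code, a strategy is simply a function of this form. The cutoff rule below is an arbitrary hand-written example, not one of the evolved strategies.

```python
import math

def example_strategy(a, v, x, y):
    """A hand-written foraging strategy of the required form.

    a: insect abundance (insects per square meter per second)
    v: lizard sprint velocity; x, y: insect coordinates.
    Returns 1 (chase) or -1 (ignore).
    """
    distance = math.hypot(x, y)
    # Chase only if the round trip is cheap relative to how long the
    # lizard would otherwise wait for the next insect to appear.
    round_trip_time = 2.0 * distance / v
    expected_wait = 1.0 / a if a > 0 else float("inf")
    return 1 if round_trip_time < expected_wait else -1
```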
In Koza, Rice, & Roughgarden’s model, a single simulation of a lizard’s be-
havior consisted of assigning values to the variables a and v, and then allowing
300 simulated seconds in which insects appear at different x, y coordinates (with
uniform probability over x and y); the lizard uses its strategy to decide which
ones to chase. In one of Koza, Rice, & Roughgarden’s experiments, the lizard’s
10 × 20 meter viewing area was divided into three regions, as shown in Figure 1(a).
Insects appearing in region I always escaped when chased; those appearing in re-
gion II never escaped; and those appearing in region III escaped with probability
zero on the x axis and linearly increasing with angle to a maximum of 0.5 on the y
axis. The optimal strategy is for a lizard to always ignore insects in region I, chase
those in region II that are sufficiently close, and chase those in region III that are
sufficiently close, with the acceptable distance decreasing with angular distance from the x axis.
Figure 1 (a) The lizard’s viewing area, divided into three regions, each with a different
escape probability for insects. (b)–(d ) Switching curves to illustrate the behavior of the best
program in the population at generations 0 (b), 12 (c), and 46 (d ) from one run of the genetic
programming algorithm. The curves divide the lizard’s viewing area into regions in which it
will chase insects and regions in which it will ignore insects. (Adapted from 41.)
Figure 2 The optimal strategy for a simpler version of the foraging problem, encoded as a
parse tree.
In genetic programming (40), each candidate solution is a computer program represented as a parse tree, built from a predefined set of functions and a predefined set of terminals (the variables and constants that may appear at the leaves of the trees). Every variable or constant appearing in a program must be in this set.
Figure 3 Illustration of crossover between two parse trees to produce two offspring. The (randomly chosen) point of crossover on each tree is marked by a dark line.
Koza, Rice, & Roughgarden's set of terminals was {a, v, x, y, R},
where R produces, each time it appears in an initial program, a random floating
point number between −1 and +1. Their set of functions was {+, −, ∗, /, exp,
iflte}, where iflte(a, b, c, d) returns c if a ≤ b and d otherwise. Koza, Rice, & Roughgarden presumably constructed this set of functions
and terminals via intelligent guesswork. For more details about how the initial
population is generated, see (41).
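One common way to realize this representation (a sketch only; the original implementation used Lisp S-expressions) is as nested tuples, with an interpreter that evaluates a tree for given values of the terminals. The iflte semantics shown are the conventional ones.

```python
import math

def evaluate(tree, env):
    """Evaluate a parse tree such as ('iflte', ('-', 'x', 'y'), 'v', 1.0, -1.0).

    Leaves are terminal names ('a', 'v', 'x', 'y') looked up in `env`,
    or numeric constants (the values produced by the terminal R).
    """
    if isinstance(tree, str):
        return env[tree]
    if isinstance(tree, (int, float)):
        return tree
    op, *args = tree
    vals = [evaluate(arg, env) for arg in args]
    if op == '+':
        return vals[0] + vals[1]
    if op == '-':
        return vals[0] - vals[1]
    if op == '*':
        return vals[0] * vals[1]
    if op == '/':
        return vals[0] / vals[1] if vals[1] != 0 else 1.0  # protected division
    if op == 'exp':
        return math.exp(min(vals[0], 50.0))  # clipped to avoid overflow
    if op == 'iflte':
        # if vals[0] <= vals[1] then vals[2] else vals[3]
        return vals[2] if vals[0] <= vals[1] else vals[3]
    raise ValueError(op)

# A strategy tree chases or ignores according to the sign of its value:
tree = ('iflte', ('*', 'x', 'x'), ('*', 'v', 'v'), 1.0, -1.0)
print(evaluate(tree, {'a': 0.01, 'v': 1.5, 'x': 1.0, 'y': 2.0}))
```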
The fitness of each program in the population was calculated by simulating the
program with several different values for a, v, x, y and measuring the total number
of insects that were eaten over the different simulations. Once the fitness of each
individual program had been calculated, some fraction of the highest-fitness indi-
viduals formed a new population, either by copying themselves directly or by crossing over
to create offspring. A crossover between two programs is illustrated in Figure 3.
Two parents are selected to cross over, a random point is chosen in each program,
and the subtrees at that point are exchanged to form two offspring, which are added
to the new population. The copying and crossover procedures are repeated until a
new population of M individuals has been formed. This whole process is repeated
for some number G of generations. In Koza, Rice, & Roughgarden’s algorithm,
M = 1000 and G = 61.
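Subtree crossover on such a representation can be sketched as follows: enumerate the node positions of each parent, choose one at random in each, and swap the subtrees rooted there, a minimal version of the operation illustrated in Figure 3.

```python
import random

def positions(tree, path=()):
    """Yield the paths to all nodes (subtrees) in a nested-tuple tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1:], start=1):
            yield from positions(child, path + (i,))

def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_subtree(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]

def subtree_crossover(p1, p2):
    # Choose a random crossover point in each parent and swap subtrees.
    path1 = random.choice(list(positions(p1)))
    path2 = random.choice(list(positions(p2)))
    s1, s2 = get_subtree(p1, path1), get_subtree(p2, path2)
    return replace_subtree(p1, path1, s2), replace_subtree(p2, path2, s1)

parent1 = ('iflte', 'x', 'v', 1.0, -1.0)
parent2 = ('+', ('*', 'x', 'x'), 'y')
print(subtree_crossover(parent1, parent2))
```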
Although the evolved strategies themselves were often hard to interpret, runs
exhibited a sequence of progressively improved strategies. Each pro-
gram's behavior can be visualized via "switching curves"—
curves that mark the regions in which the lizard will chase or ignore
insects. Figures 1(b)–(d) give switching curves for the best individuals in the pop-
ulation at generations 0, 12, and 46. It can be seen that genetic programming
produced individuals with increasingly fit behavior over the course of evolution.
For example, the best individual at generation 46 will avoid insects in an area that
approximates region I, chase insects in a region that approximates region II, and
in region III the distance the lizard is willing to travel is greatest on the x-axis and
decreases with angular distance on the y axis.
In short, Koza, Rice, & Roughgarden’s work showed that genetic programming
can evolve increasingly fit foraging behavior in this particular simplified model.
The evolved strategies can be considered hypotheses about real-life foraging
strategies.
Hillis (29) provides an example of a different kind. He observed that evolving a population against a fixed, static environment often
results in both the loss of diversity in a population of candidate solutions and
evolved solutions that are “overfit” to that static environment—that is, solutions
that do not generalize well when placed in new environments. His solution was to
have the environment itself—in the form of “parasites”—evolve to be increasingly
challenging for the evolving candidate solutions.
The problem Hillis tackled was that of evolving minimal sorting networks.
Sorting is a much studied problem in computer science whose goal is to place the
elements in a data structure in some specified order (e.g., numerical or alphabetic)
in minimal time. One particular approach to sorting is the sorting network, a
parallelizable device for sorting lists with a fixed number n of elements. In a
simplified form, a sorting network consists of an ordered list of comparisons to be
made between elements in the given list; the compared elements are to be swapped
if they are out of order. For example, the sorting network (2,5), (4,2), (7,14), …
specifies that the second and fifth elements are to be compared (and possibly
swapped), then the fourth and second elements are to be compared, followed by
the seventh and fourteenth, and so on. A correct sorting network will take any list
of a fixed length n and, after performing the specified comparisons and swaps,
return the list in correctly sorted order.
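Applying a sorting network is easy to sketch: the network is a fixed list of index pairs, applied in order, swapping whenever a pair is out of order.

```python
def apply_network(network, data):
    """Apply a sorting network (a list of index pairs) to a copy of `data`."""
    items = list(data)
    for i, j in network:
        # Compare the two positions and swap if out of order.
        if items[i] > items[j]:
            items[i], items[j] = items[j], items[i]
    return items

def is_correct(network, test_cases):
    # A network is correct on a test case if its output is sorted.
    return all(apply_network(network, case) == sorted(case)
               for case in test_cases)

# A correct 4-element network (indices 0-3), for illustration:
net4 = [(0, 1), (2, 3), (0, 2), (1, 3), (1, 2)]
print(apply_network(net4, [3, 1, 4, 2]))   # [1, 2, 3, 4]
```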
In the 1960s several researchers had worked on the problem of finding correct
sorting networks for n = 16 with a minimal number of comparisons. It was first
believed that the minimum was 65 comparisons, but then smaller and smaller
networks were discovered, culminating in a 60-comparison sorter. No proof of
its minimality was found, but no smaller network was discovered. (See 39 for a
discussion of this history).
Hillis used a form of the genetic algorithm to search for minimal sorting net-
works for n = 16. There were two criteria for networks in the population: correct-
ness and small size. Small size was rewarded implicitly due to a diploid encoding
scheme in which networks with fewer comparisons were encoded as chromosomes
with more homozygous sites; smaller networks were more robust to crossovers and
thus tended to be implicitly favored. Correctness was rewarded explicitly via the
fitness function: Each network was tested on a sample of fitness cases (lists to be
sorted). There were too many possible input cases to test each network exhaustively,
so at each generation each network was tested on a set of cases chosen at random.
The fitness of a network was equal to the percentage of cases it sorted correctly.
In Hillis's initial experiments, the GA found correct sorting networks, but they were larger than the best known; the populations repeatedly became stuck at suboptimal
solutions. One reason was that after early generations the randomly generated test
cases used to compute the fitness of each individual were not challenging enough.
The GA had evolved strategies that worked well on the test cases they were pre-
sented with, and the difficulty of the test cases remained roughly the same. Thus,
after the early generations there was no pressure on the evolving population to
change the current suboptimal sorting strategies.
To solve this problem, Hillis used a form of host-parasite (or predator-prey)
coevolution, in which the sorting networks were viewed as hosts and the test cases
(lists of 16 numbers) as parasites. Hillis modified the system so that a population of
networks coevolved on the same grid as a population of parasites, where a parasite
consisted of a set of 10–20 test cases. Both populations evolved under a GA. The
fitness of a network was now determined by the parasite located at the network’s
grid location. The network’s fitness was the percentage of test cases in the parasite
that it sorted correctly. The fitness of the parasite was the percentage of its test
cases that the network sorted incorrectly.
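The complementary fitness computation is equally simple to sketch (reusing apply_network from the previous example):

```python
def host_parasite_fitness(network, parasite_cases):
    """Return (network fitness, parasite fitness) as complementary scores.

    The network scores the fraction of the parasite's test cases it sorts
    correctly; the parasite scores the fraction it causes to fail.
    """
    sorted_ok = sum(apply_network(network, case) == sorted(case)
                    for case in parasite_cases)
    host_fitness = sorted_ok / len(parasite_cases)
    return host_fitness, 1.0 - host_fitness
```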
The evolving population of test cases was thought to provide increasing chal-
lenges to the evolving population of networks. As the networks got better and better
at sorting the test cases, the test cases presumably got harder and harder, evolving
to specifically target weaknesses in the networks. This forced the population of
networks to keep changing—to keep discovering new sorting strategies—rather
than staying stuck at the same suboptimal strategies. With coevolution, the GA
discovered correct networks with only 61 comparisons—a real improvement over
the best networks discovered without coevolution, though still one comparison
short of the smallest known network for n = 16.
Hillis’s work is important because it introduced a new, potentially very useful
GA technique inspired by coevolution in biology, and his results are a convincing
example of the potential power of such biological inspiration. Additional work on
coevolution in genetic algorithms has been done by a number of people; see, e.g.,
(35, 53, 62).
Evolutionary computation has also found a place in education. A well-known example is Richard Dawkins's Blind Watchmaker program (18), with which the user selectively breeds "biomorphs," branching forms generated from a simple genotype. The parent biomorph is shown at the center of the screen with its mutant offspring displayed around it, and the user repeatedly chooses which offspring will become the next parent.
Figure 4 "Breeding screen" of the Blind Watchmaker program. The parent biomorph is present at the center of the screen and offspring biomorphs surround it. Figures in the line immediately above the screen signify the genotype of the parent. Offspring biomorphs differ from the parent by mutations in the genotype string, occurring with a probability determined by a parameter in the parent. Modified from (17).
It becomes an engaging challenge to evolve forms that are exotic in one
way or another, and while doing so the user acquires a real feeling for mutation,
selection, and the enormous variety of phenotypes that evolution can produce. The
program is keyed to Dawkins’ popular book with the same title (18), which uses
the program to illustrate many key features of evolution.
Like the biomorphs, the Blind Watchmaker program has itself been the seed
for several interesting variants. One such variant, where the focus is on developing
interesting pictures, was developed by Karl Sims and displayed at the Centre
Georges Pompidou in Paris. Additional variants of this evolutionary approach have contributed
to his stunning videos, such as Liquid Selves, some of which are described in
(69). Besides the programs described here, there is a large variety of other popular
software that employs and illustrates the creativity of evolution, including SimLife
(37) and Creatures (15), nurturing hope that evolutionary computation will provide
a useful teaching experience and that the next generation of American students
might show a better understanding of evolution than this one has.
Lamarckian Evolution
Lamarckian Evolution refers to the evolution of traits that are modified through
experience and are passed on, in their modified form, to the genotype of the next
generation (42, 43). While this is consistent with certain pre-Mendelian theories of
inheritance, including that which Darwin himself used, it is now recognized never
to occur due to the lack of a mechanism for accomplishing it in natural systems.
Artificial organisms are, of course, not subject to such constraints, and the study
of Lamarckian evolution in such systems sheds some light on issues of general
evolvability. Lamarckian evolution requires both a means of adapting
within a generation (e.g., via development or learning) and a means of passing
those gains to the genotype of the subsequent generation. Models of learning
studied in this context include neural networks (63, 64), Hopfield networks (34)
and production systems (28).
Hopfield networks have the ability to learn associations and, most remarkably,
exhibit content-addressable memory. The mere smell of a cookie, for example,
might evoke all sorts of memories that have nothing to do with cookies them-
selves; sensing just a few properties of an object can recover a whole host of
other properties. Starting from random configurations of a Hopfield network, the
number of memories reliably learned and stored is approximately 0.15 times the
number of nodes in a completely connected system (32). This result depends on
the starting conditions, and some configurations can lead to much greater ability to
remember. Imada & Araki (34) presented a set of inputs to a population of Hopfield
networks capable of learning connection weights (encoded as real-valued vectors)
via Hebbian (reinforcement) learning to perform a pattern-recognition task. Each
generation there were learning trials, where the vectors in the evolving population
were modified via a supervised-learning method. At the end of several learning
trials, the weights modified via learning replaced those that had started the gener-
ation. Thus evolution was Lamarckian. Then all possible inputs were presented,
and it was observed how many stable fixed points (memories) were reached by
the system. If exactly those states corresponding to the inputs were obtained, then
the fitness was set to its maximum value. If different vectors or more vectors were
observed as fixed points, then fitnesses were diminished accordingly. After this
fitness evaluation, mutation and recombination occurred, and a next generation
was formed, followed by another round of learning, fitness evaluation, and selec-
tion. Nearly twice as many memories could be reliably acquired with Lamarckian
evolution as could be acquired without the Hebbian learning phase. Further, this
larger number of memories could be learned even more rapidly than the much
smaller number of memories acquired by networks evolving in a purely Dar-
winian fashion. Clearly the ability to learn and to pass this experience on through
the genotype accelerated evolution.
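A minimal sketch of this scheme, using numpy and a standard one-shot Hebbian rule (the details of Imada & Araki's supervised procedure differ):

```python
import numpy as np

def hebbian_learn(patterns):
    # One-shot Hebbian rule: W accumulates p p^T for each +-1 pattern.
    n = patterns.shape[1]
    w = patterns.T @ patterns / n
    np.fill_diagonal(w, 0.0)
    return w

def recall(w, probe, steps=20):
    # Iterate the network toward a (hopefully) stable fixed point.
    s = probe.copy()
    for _ in range(steps):
        s = np.where(w @ s >= 0, 1.0, -1.0)
    return s

rng = np.random.default_rng(0)
n_units, n_patterns = 100, 10
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n_units))

# "Genome" = initial weights; "lifetime" = Hebbian learning on inputs.
genome_w = rng.normal(0, 0.01, size=(n_units, n_units))
learned_w = genome_w + hebbian_learn(patterns)

# Lamarckian step: the learned weights replace the genome for the next
# generation (in a purely Darwinian scheme, genome_w would be kept).
genome_w = learned_w

# Fitness would then count how many patterns are stable memories:
stable = sum(np.array_equal(recall(learned_w, p), p) for p in patterns)
print(stable, "of", n_patterns, "patterns stored")
```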
Exploring this in more depth, Tokoro and coworkers (63, 64) found that adap-
tation by Lamarckian evolution was indeed much faster for neural networks than
Darwinian evolution when the vectors to be learned were the same from gener-
ation to generation, that is, when the environment was constant. But when the
environment changed randomly from generation to generation, then Darwinian
evolution was superior. Further, when modifier genes that determined the degree
of Lamarckian inheritance the networks possessed were themselves allowed to evolve,
the Lamarckian abilities were lost completely in a randomly changing environment.
Apparently the relative advantages of Lamarckian versus Darwinian evolution
must depend on the degree of correlation in the environment from one
generation to the next, in much the way that modifiers of recombination and
sexuality do. In view of the very large differences in adaptability observed here,
we expect that these differences will be exploited in practical
applications of evolutionary computation.
Baldwin Effect
Lamarckian evolution is often more effective than Darwinian evolution because the
space of phenotypes can be searched more extensively when each individual can
try out not just one phenotype but a whole suite of new possibilities serially within
their lifetime, perhaps even guided by learning. For example, in the experiments on
the evolution of pattern recognition by Hopfield networks just described, each in-
dividual instantiated just one genotype per generation. Under Darwinian evolution alone,
at most (number of agents) × (number of generations) networks can be explored.
But with learning, each trial during the Hebbian learning phase could
explore yet another network, so the maximum becomes (number of agents) ×
(number of generations) × (number of trials), which is potentially much greater. The
problem is how to pass on successful discoveries to the next generation. As is
well known, the lack of a suitable mechanism prevents biological organisms from
exploiting Lamarckian evolution. There is, however, an alternative that
is both feasible and well suited to evolution. This is "genetic assimilation" or the
“Baldwin effect.”
Many years ago C. Waddington (74) observed that Drosophila melanogaster
normally produce posterior cross-veins in their wings. But when exposed to heat
shock as pupae, they occasionally fail to develop the cross-veins in their wings
when they become adults. Waddington started selection from a base population
in which all of the adults had cross-veins. Each generation he heat-shocked the
offspring and selected from those who were later cross-veinless as adults. After
14 generations, 80% of those who were heat-shocked were cross-veinless, and a
few began to be cross-veinless even without the shock. With subsequent selection
he obtained lines with as many as 95% cross-veinless in the absence of shock. He
recognized that this was not Lamarckian evolution, but that it rather resulted simply
from changing the thresholds for expression of the cross-vein trait; Waddington
termed the phenomenon “genetic assimilation” (74). It also happened that a similar
phenomenon had been described earlier by JM Baldwin and is sometimes called the
“Baldwin effect” (10). In textbooks of evolution this phenomenon is occasionally
mentioned but seldom receives more than a brief note.
The Baldwin effect has been observed in evolutionary computation studies
(see, e.g., 1, 30). In Waddington's study the problem was to select for a trait (cross-
veinlessness) that is almost never expressed. The importance for evolutionary
computation is slightly different; it sometimes occurs that a trait is enormously
useful if it is fully developed or expressed, but it is of no use otherwise. The
problem is to hit upon the right (and rare) configuration of alleles, then preserve it
for further selection and elaboration. In an asexual population, the right ensemble
of alleles might never (or almost never) arise. In a sexual population it might
arise but would tend to be broken up immediately by recombination. However, if
learning or other forms of adaptation during individuals’ lifetime are available, the
desired configuration can arise via these mechanisms; while the trait itself will not
be passed on to offspring, the genetic background producing it will be favored.
Thus, according to Baldwin, learning and other forms of within-lifetime adaptation
can lead to increased survival, which can eventually lead to genetic variation that
produces the trait genetically.
This effect has been demonstrated in simple evolutionary computation settings.
For example, Hinton & Nowlan (30) considered neural networks that evolved via
GAs. At each generation every individual in the population had a “lifetime” during
which its weights were learned. Each weight was coded by a different locus in the
network’s genome. The alleles at each locus were 0, 1, or ?, where “?” signified
that the value varied with learning, and where “learning” consisted of a series of
trials in which the ? values were guessed to be 0 or 1. A final weight of value 1
came either from having the “1” allele in one’s genome or from having adopted it
in a guessing trial. Populations of networks evolved under a fitness function that
highly rewarded networks when all connections were set to 1 sometime during the
network’s lifetime but not otherwise. If the 1 state was adopted early in a network’s
lifetime, then the fitness was higher than if it was adopted later. With this combi-
nation of evolution and learning, Hinton & Nowlan observed that correct settings
for all loci were achieved after about 20 generations of the GA, whereas they never
occurred under evolution alone. Hinton & Nowlan interpreted this result as an (ex-
tremely simple) example of the Baldwin Effect. Maynard Smith (46) calculated
that if phenotypes were strictly determined by genotype, without opportunity for
learning, then about 1000 generations would have been required in an asexual
population, and the trait would probably never have evolved in a strictly sexual population. As de-
scribed by Hinton & Nowlan (30), learning makes the difference between finding
a needle in a haystack and finding a needle in the haystack when someone tells you
when you are getting closer. It is evident from these studies, both with Lamarckian
evolution and the Baldwin effect, that learning often allows organisms to evolve
much faster than their nonlearning equivalents. Hence its importance for stock
market prediction by adaptive agents, and quite possibly for evolution in the natural
world.
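A sketch of the fitness evaluation at the heart of Hinton & Nowlan's experiment follows; the selection machinery around it is the standard GA of earlier sections, and the reward schedule here is a simplified stand-in for the one in the original paper.

```python
import random

TRIALS = 1000   # learning trials per lifetime

def lifetime_fitness(genome):
    """Fitness of a genome over {0, 1, '?'}, after Hinton & Nowlan (30).

    Each trial, every '?' locus is guessed at random; the individual is
    rewarded only if all loci are 1 at some point during its lifetime,
    and rewarded more highly the earlier that happens.
    """
    if 0 in genome:
        return 1.0                      # a 0 allele can never be guessed away
    unknowns = genome.count('?')
    for trial in range(TRIALS):
        # Probability (1/2)**unknowns of guessing all '?' loci as 1.
        if all(random.random() < 0.5 for _ in range(unknowns)):
            return 1.0 + 19.0 * (TRIALS - trial) / TRIALS
    return 1.0

print(lifetime_fitness([1] * 10 + ['?'] * 10))
```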
Cultural Evolution
In view of the effect of learning on evolution, it is natural to ask how culture
affects evolution. It would seem that artificial systems like those used for studying
Lamarckian evolution and the Baldwin effect would be natural vehicles to explore
the elements of cultural transmission. But in fact, there have been relatively few
such studies.
Studies of the evolution of cooperation in the Prisoner’s dilemma, begun by
Axelrod (5), have stimulated a great deal of investigation. These typically do
not involve cultural evolution, though in the real world such traits would have a
very strong cultural component. There have been a few studies on the evolution
of communication: acquiring the ability to communicate and agreeing on common
lexical items have both been modeled with some success (4, 70, 75). In addition, a few
studies have addressed the very difficult problems concerned with how actual
languages are learned and evolve (see 51).
Learning human languages presents serious theoretical problems for complex
adaptive systems. For example, Gold’s problem (25) is concerned with how, af-
ter hearing only a finite number of sentences (many of which may have errors),
each of us learns a grammar that can generate an infinite number of grammati-
cally correct sentences. A second problem is to account for the many changes that
occur through time. Speakers of modern English can typically read Shakespeare
(early Modern English from 400 years ago) without much difficulty. We can read
Chaucer (Middle English from 600 years ago) with a moderate amount of lexi-
cal help. Only scholars can read Bede (Old English from 1300 years ago). Spoken
Chaucer would be incomprehensible because of changes in vowel sounds that have
occurred since it was spoken, but the spelling has remained similar. The challenge
is to describe and, possibly, predict the course of language evolution in cases such
as this. Learnability is a major issue here, and it is generally felt that the evolution
of a language is largely driven by how easy it is to learn. Niyogi & Berwick (51)
have used evolutionary computation methods to model how populations of agents
can acquire language from hearing samples from the prior generation and then
themselves provide examples for the next generation to learn. Using Chomsky’s
principles-and-parameters model of language (12), they found that some parame-
ters were more stable than others. Further, they found that learnability alone was
an inadequate explanation for some of the changes in grammatical form known
to have occurred in the evolution of English, such as verb order, where Old En-
glish resembles present-day German. For a review of other attempts to model the
coevolution of genes, cultures, and societies see Epstein & Axtell (20).
ARTIFICIAL LIFE
Evolutionary computation has also been used to study evolution itself, by evolving populations of computer programs as "artificial life." Conventional computer programs are brittle: a small random change to a working program almost always breaks it entirely. Genetic programming, described above in the section on Us-
ing Genetic Programming to Evolve Optimal Foraging Strategies, is one method
to get around this brittleness; neural networks are another. Ray’s “Tierra” pro-
gram uses a different method involving a specially designed assembly language to
construct self-replicating programs. The resulting “ecology” provides interesting
parallels to natural life—including competition for (memory) resources, trophic
structures, and so on (47). Ray’s current efforts are directed toward the evolution of
self-contained but cooperating programs that emerge through evolutionary com-
putation and are analogous to multicellular organisms. Success has so far been
limited, but Ray does observe differentiation into something akin to somatic and
reproductive code (60).
Building on Tierra, C Adami (2) has developed a computational world, “Avida,”
with spatial structure that Tierra lacks. J Chu (14) has further developed Avida to
run on the massively parallel Intel Paragon computer, so that very large numbers
of simulations can be run in fairly large environments, e.g. 100 × 100 units, with
as many as 10,000 competing bits of code. Chu observed what seem to be invari-
ant power laws, in which the log of the number of copies of a program that ever
"lived" can be plotted against the log of how frequently programs of that abundance
were observed in the evolutionary record. When selection was strong he found a −3/2 slope for
this relation, just as is observed for the number of families in the fossil record (67) and
for avalanche sizes in the higher-dimensional sandpiles of Bak's models
of self-organized criticality (9). Chu developed arguments based on the theory of
branching processes to explain why this should be true. Such relationships, if found
to be general, might point to a radically different theory of evolution than we now
have, based on principles of self-organizing systems that are both more general
and also more capable, in that they can capture phenomena that have so far resisted
adequate explanation (19).
How does all this relate to population genetics theory? Parameter values are typically very different in evolutionary
computation settings (e.g., population sizes of a few hundred and mutation rates
of 10^-2) than in biology. Further, while most evolutionary computation systems
include recombination, the life cycle of individuals is like that of a moss, with
a short diploid and a long haploid phase—not at all what most genetic theory
addresses.
This is not to say that population genetics is inconsistent or inapplicable.
Christiansen & Feldman (13) showed how to derive parts of Holland’s GA theory
(31) from principles of population genetics. Further, theoretical predictions from
population genetics do help explain certain observations of evolutionary compu-
tation: e.g. in evolutionary computation applications where mutation rates and
magnitudes are allowed to evolve, it is typically observed that they evolve down-
wards after sufficient time (reviewed in 8), as expected from equilibrium theory in
population genetics.
One of the most challenging problems in population genetics has concerned the
manner in which populations traverse their adaptive landscapes. Does evolution carry out
a gradual hill-climbing, leading to some sort of optimization, as RA Fisher argued,
or does it proceed by jumps and starts, with chance playing a significant role, as
argued by Sewall Wright? In spite of the centrality of this issue for many questions
in evolutionary theory, it has proven extremely difficult to test different proposals
(16). Evolutionary computation has addressed this problem from a purely practical
standpoint and has typically found that population subdivision (“island models”)
significantly speeds evolution (e.g., 8, 27, 71). From a different vantage, theoretical
approaches to evolutionary computation, such as those proposing mechanisms
underlying metastability in evolution (72, 73), may provide new theoretical bases
for describing many of these phenomena.
One feature of adaptive landscapes, in both evolutionary computation and bio-
logical settings, is that broad plateaus of fitness seem common, and chance plays
a major role in moving about on them. Where the population can move next seems
to depend critically upon where it has drifted on the plateau. For example, Huynen,
Stadler & Fontana (33) used computational models for predicting molecular struc-
tures to describe the 3D structure of tRNA. Although only about 30% of nucleotide substitutions
seemed to be neutral, the high dimensionality made it possible to traverse se-
quence space along a connected path, changing every nucleotide present, without
ever changing the structure. It is no surprise, then, that when a population is begun
with all sequences identical, but with a small amount of mutation, the initial point
in sequence space diffuses outward into a cloud, to a limit in accord with theoreti-
cal expectations, and then drifts along the plateau.
Different subpopulations can reach
very different parts of the sequence space before dramatic gains result
from finding one improvement or another. Fitness thus improves in a step-like
fashion, consistent with Wright's expectation that "Changes in wholly nonfunctional
parts of the molecule would be the most frequent ones but would be unimportant,
unless they occasionally give a basis for later changes which improve function in
the species in question, which would then become established by selection” (56),
p. 474.
Similar dynamics appeared in a study of the evolution of wandering behavior in a small robot (48), in which neural network controllers were evolved; a good strategy was, for example, to move forward and turn
left when encountering a corner. Many neural networks prescribed this behavior,
but some of these made it easier to make jumps to radically more sophisticated
behavior, with correspondingly more complex programs. There was much variation
from run to run, with chance largely determining which populations were able to
find one improved solution or another.
While these studies showed quite clearly that plateaus on the adaptive surface
are common, with stepwise improvement in fitness, it must be stressed that this
is not always the case—especially when fitnesses are frequency-dependent. Very
complex dynamics are sometimes observed, including plateaus interspersed with
periods of chaos (45). An interesting example of this is provided by competition
among bit strings in a series of studies by K Kaneko and co-workers (summarized
in 36). In this system strings are assumed to compete to the extent that they are
similar (measured by their Hamming distance)—more similar strings compete
more strongly, so fitness is frequency-dependent. But strings too far apart have
less success in reproduction. Mutation among the strings is allowed. It is also
possible to include predator/prey interactions in this system, where the strength
of predator-prey interactions depends on the Hamming distance. Such systems
are high-dimensional and highly nonlinear. In a way, their interactions resemble
logistic maps, which are known to be chaotic over much of their parameter space,
except that here they are high-dimensional and can escape from having their fitness
reduced by competition, as it were, through mutation to a less frequent form. In
a series of papers Kaneko and co-workers analyzed the dynamics of this system,
numerically calculating the Lyapunov exponents, and observed high-dimensional,
weakly chaotic dynamics in the evolution of this system that often led to dynamic
stability and robustness against external perturbations. He termed this situation
“homeochaos” and suggested that such system dynamics may be very general
features of evolution, both in computational evolution and in the real world.
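One simple way to realize such Hamming-distance-based, frequency-dependent fitness (an illustrative sketch, not Kaneko's exact model):

```python
import random

def hamming(s1, s2):
    return sum(c1 != c2 for c1, c2 in zip(s1, s2))

def fitness(individual, population, length):
    # Frequency-dependent fitness: similar strings (small Hamming
    # distance) compete strongly and depress each other's fitness.
    competition = sum(1.0 - hamming(individual, other) / length
                      for other in population if other is not individual)
    return 1.0 / (1.0 + competition)

LENGTH, POP = 16, 40
population = [[random.randint(0, 1) for _ in range(LENGTH)]
              for _ in range(POP)]
scores = [fitness(ind, population, LENGTH) for ind in population]
print(min(scores), max(scores))
```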
CONCLUSIONS
There are many parallels between biological evolution searching through a space
of gene sequences and computer evolution searching through a space of computer
programs or other data structures. Several approaches to exploit these similarities
have developed independently and are collectively termed evolutionary algorithms
or evolutionary computation. Such methods of computation can be used to search
through the space of candidate solutions to problems and are now finding applica-
tion in an increasing number of industrial and business problems.
While there is no general theory that will identify the best method to find
optima, it appears that evolutionary computation is best suited for problems that
involve nonlinear interactions among many elements, where many intermediate
optima exist, and where solutions that are satisficing—merely very good without
necessarily being the absolute optimum—will do. Such problems are common in
business and in biological studies, such as cooperation, foraging, and coevolution.
Evolutionary computation can sometimes serve as a useful model for biologi-
cal evolution. It allows dissection and repetition in ways that biological evolution
does not. Computational evolution can be a useful tool for education and is begin-
ning to provide new ways to view patterns in evolution, such as power laws and
descriptions of non-equilibrium systems. Evolutionary theory, as developed by
biologists, typically tries to linearize systems, for ease of analysis with differential
equations, or to treat units in isolation, as in single-locus selection. While evo-
lutionary computation is not inconsistent with such theory, it tends to be outside
it, in that real differences in capacity and complexity are often observed and are
not really describable by stable equilibria or simple changes in gene frequencies,
at least in ways that are interesting. There is reason to believe that theories of
evolutionary computation might extend the language of biological evolutionary
theory and contribute to new kinds of generalizations and analyses that have not
been available up to now.
ACKNOWLEDGMENTS
MM acknowledges the Santa Fe Institute and the National Science Foundation
(grant NSF-IRI-9705830) for support. CT acknowledges NSF grant #SBR9720410.
LITERATURE CITED
1. Ackley D, Littman M. 1992. Interactions between learning and evolution. In Artificial Life II, ed. CG Langton, C Taylor, JD Farmer, S Rasmussen, pp. 487–509. Reading, MA: Addison-Wesley
2. Adami C. 1998. Introduction to Artificial Life. New York: Springer-Verlag
3. Angeline PJ, ed. 1997. Evolutionary Programming VI: 6th Int. Conf. EP97. New York: Springer
4. Arita T, Koyama Y. 1998. Evolution of linguistic diversity in a simple communication system. In Artificial Life VI, ed. C Adami, RK Belew, H Kitano, CE Taylor, pp. 9–17. Cambridge, MA: MIT Press
5. Axelrod R. 1984. The Evolution of Cooperation. New York: Basic
6. Bäck T. 1996. Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms. Oxford: Oxford Univ. Press
7. Bäck T, ed. 1997. Proceedings of the Seventh International Conference on Genetic Algorithms. San Francisco, CA: M. Kaufmann
8. Bäck T, Hammel U, Schwefel HP. 1997. Evolutionary computation: comments on the history and current state. IEEE Trans. Evol. Computation 1:3–17
9. Bak P. 1996. How Nature Works: The Science of Self-Organized Criticality. New York: Springer-Verlag
10. Belew RK, Mitchell M, eds. 1996. Adaptive Individuals in Evolving Populations: Models and Algorithms. Reading, MA: Addison-Wesley
11. Belew RK, Vose MD, eds. 1997. Foundations of Genetic Algorithms 4. San Francisco, CA: M. Kaufmann
12. Chomsky N. 1995. The Minimalist Program. Cambridge, MA: MIT Press
13. Christiansen FB, Feldman MW. 1998. Algorithms, genetics, and populations: the schemata theorem revisited. Complexity 3(3):57–64
14. Chu J. 1999. Computational explorations of life. PhD thesis. Calif. Inst. Technol., Pasadena, CA
15. Cliff D, Grand S. 1999. The 'Creatures' global digital ecosystem. Artificial Life. In press
16. Coyne JA, Barton N, Turelli M. 1997. Perspective: a critique of Sewall Wright's shifting balance theory of evolution. Evolution 51:643–71
17. Dawkins R. 1989. The evolution of evolvability. In Artificial Life, ed. CG Langton, pp. 201–20. Reading, MA: Addison-Wesley
18. Dawkins R. 1996. The Blind Watchmaker: Why the Evidence of Evolution Reveals a Universe Without Design. New York: Norton. 2nd ed.
19. Depew DJ, Weber BH. 1995. Darwinism Evolving. Cambridge, MA: MIT Press
20. Epstein J, Axtell R. 1996. Growing Artificial Societies. Cambridge, MA: MIT Press
21. Fogel DB. 1995. Evolutionary Computation: Toward a New Philosophy of Machine Intelligence. New York: IEEE Press
22. Fogel DB, ed. 1998. Evolutionary Computation: The Fossil Record. New York: IEEE Press
23. Fogel LJ, Owens AJ, Walsh MJ. 1966. Artificial Intelligence Through Simulated Evolution. New York: John Wiley
24. Gehlhaar D, Verkhivker G, Rejto P, Sherman C, Fogel D, et al. 1995. Molecular recognition of the inhibitor AG-1343 by HIV-1 protease: conformationally flexible docking by evolutionary programming. Chem. Biol. 2:317–24
25. Gold EM. 1967. Language identification in the limit. Inform. Control 10:447–74
26. Goldberg DE. 1989. Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley
27. Gordon VS, Whitley D. 1993. Serial and parallel genetic algorithms as function optimizers. In Proc. Fifth Int. Conf. Genetic Algorithms, ed. S Forrest, pp. 177–83. San Mateo, CA: M. Kaufmann
28. Grefenstette JJ. 1991. Lamarckian learning in multi-agent environments. In Proc. 4th Int. Conf. on Genetic Algorithms and Their Applications, ed. RK Belew, L Booker, pp. 303–10. San Mateo, CA: M. Kaufmann
29. Hillis WD. 1990. Co-evolving parasites improve simulated evolution as an optimization procedure. Physica D 42:228–34
30. Hinton GE, Nowlan SJ. 1987. How learning can guide evolution. Complex Systems 1:495–502
31. Holland JH. 1975. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: Univ. Mich. Press
32. Hopfield J. 1982. Neural networks and physical systems with emergent collective computational abilities. Proc. Natl. Acad. Sci. USA 79:2554–58
33. Huynen MA, Stadler F, Fontana W. 1996. Smoothness within a rugged landscape: the role of neutrality in evolution. Proc. Natl. Acad. Sci. USA 93:397–401
34. Imada A, Araki K. 1996. Lamarckian evolution and associative memory. In Proc. 1996 IEEE Third Int. Conf. Evol. Computation (ICEC-96), pp. 676–80
35. Juillé H, Pollack JB. 1998. Coevolutionary learning: a case study. In Proc. Int. Conf. Machine Learning (ICML '98). San Francisco, CA: M. Kaufmann
36. Kaneko K. 1994. Chaos as a source of complexity and diversity in evolution. Artificial Life 1:163–77
37. Karakotsios K, Bremer M. 1993. SimLife: The Official Strategy Guide. Rocklin, CA: Prima
38. Kelly K. 1994. Out of Control: The Rise of Neo-Biological Civilization. Reading, MA: Addison-Wesley
39. Knuth DE. 1973. The Art of Computer Programming. Vol. 3: Sorting and Searching. Reading, MA: Addison-Wesley
40. Koza JR. 1992. Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press
41. Koza JR, Rice JP, Roughgarden J. 1992. Evolution of food foraging strategies for the Caribbean Anolis lizard using genetic programming. Adaptive Behav. 1(2):47–74
42. Lamarck JB. 1809. Philosophie Zoologique, ou Exposition des Considérations Relatives à l'Histoire Naturelle des Animaux. Paris: Chez Dentu et L'Auteur
43. Lamarck JB. 1996. Of the influence of the environment on the activities and habits of animals, and the influence of the activities and habits of these living bodies in modifying their organization and structure. See Ref. 10, pp. 39–57
44. Lewontin R. 1998. Survival of the nicest. NY Rev. Books. 22 Oct. 1998, pp. 59–63
45. Lindgren K. 1992. Evolutionary phenomena in simple dynamics. In Artificial Life II, ed. CG Langton, C Taylor, JD Farmer, S Rasmussen, pp. 295–312. Reading, MA: Addison-Wesley
46. Maynard Smith J. 1987. Natural selection: when learning guides evolution. Nature 329:761–62
47. Maynard Smith J. 1992. Byte-sized evolution. Nature 355:772–73
48. Miglino O, Nafasi K, Taylor CE. 1994. Selection for wandering behavior in a small robot. Artificial Life 2:101–16
49. Mitchell M. 1996. An Introduction to Genetic Algorithms. Cambridge, MA: MIT Press
50. Murakawa M, Yoshizawa S, Adachi T, Suzuki S, Takasuka K, et al. 1998. Analogue EHW chip for intermediate frequency filters. In Evolvable Systems: From Biology to Hardware, ed. M Sipper, D Mange, pp. 134–43. New York: Springer
51. Niyogi P, Berwick RC. 1995. The Logical Problem of Language Change. Tech. Rep. A.I. Memo No. 1516. MIT Artificial Intelligence Lab., Cambridge, MA
52. Papadimitriou CH, Sideri M. 1998. On the evolution of easy instances. Unpublished manuscript, Computer Science Dept., University of California, Berkeley, CA
53. Paredis J. 1997. Coevolving cellular automata: be aware of the red queen! In Proc. Seventh Int. Conf. Genetic Algorithms, ed. T Bäck, pp. 393–400. San Francisco, CA: M. Kaufmann
54. Petzinger T Jr. 1995. At Deere they know a mad scientist may be the firm's biggest asset. Wall Street J. 14 July 1995, p. A1
55. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. 1992. Numerical Recipes in C. New York: Cambridge Univ. Press
56. Provine WB. 1986. Sewall Wright and Evolutionary Biology. Chicago, IL: Univ. Chicago Press
57. Rao SS. 1998. Evolution at warp speed. Forbes Mag.
58. Rawlins G, ed. 1991. Foundations of Genetic Algorithms. San Mateo, CA: M. Kaufmann
59. Ray TS. 1991. An approach to the synthesis of life. In Artificial Life II, ed. CG Langton, C Taylor, J Farmer, S Rasmussen, pp. 371–408. Reading, MA: Addison-Wesley
60. Ray TS, Hart J. 1998. Evolution of differentiated multi-threaded digital organisms. In Artificial Life VI, ed. C Adami, RK Belew, H Kitano, CE Taylor, pp. 295–306. Cambridge, MA: MIT Press
61. Rechenberg I. 1973. Evolutionsstrategie. Stuttgart: Frommann-Holzboog
62. Rosin CD, Belew RK. 1995. Methods for competitive coevolution: finding opponents worth beating. In Proc. Sixth Int. Conf. Genetic Algorithms, ed. LJ Eshelman. San Francisco, CA: M. Kaufmann
63. Sasaki T, Tokoro M. 1997. Adaptation toward changing environments: why Darwinian in nature? In Proc. Fourth Eur. Conf. on Artificial Life, pp. 145–53. Cambridge, MA: MIT Press
64. Sasaki T, Tokoro M. 1999. Evolvable learnable neural networks under changing environments with various rates of inheritance of acquired characters: comparison between Darwinian and Lamarckian evolution. Artificial Life. In press
65. Schwefel HP. 1975. Evolutionsstrategie und Numerische Optimierung. PhD thesis, Technische Univ. Berlin, Berlin
66. Schwefel HP. 1995. Evolution and Optimum Seeking. New York: Wiley
67. Sepkoski JJ. 1992. A Compendium of Fossil Marine Animal Families. Milwaukee, WI: Milwaukee Public Mus. 2nd ed.
68. Simon H. 1969. The Sciences of the Artificial. Cambridge, MA: MIT Press
69. Sims K. 1994. Evolving 3D morphology and behavior by competition. In Artificial Life IV, ed. RA Brooks, P Maes, pp. 28–39. Cambridge, MA: MIT Press
70. Steels L, Kaplan F. 1998. Stochasticity as a source of innovation in language games. In Artificial Life VI, ed. C Adami, RK Belew, H Kitano, CE Taylor, pp. 368–78. Cambridge, MA: MIT Press
71. Tanese R. 1989. Distributed genetic algorithms. In Proc. Third Int. Conf. on Genetic Algorithms, ed. JD Schaffer, pp. 434–39. San Mateo, CA: M. Kaufmann
72. van Nimwegen E, Crutchfield JP, Mitchell M. 1999. Statistical dynamics of the Royal Road genetic algorithm. Theoret. Computer Sci. To appear
73. van Nimwegen E, Crutchfield JP, Mitchell M. 1997. Finite populations induce metastability in evolutionary search. Phys. Lett. A 229(2):144–50
74. Waddington CH. 1953. Genetic assimilation of an acquired character. Evolution 7:118–26
75. Werner GM, Dyer MG. 1991. Evolution of communication in artificial organisms. In Artificial Life II, ed. CG Langton, C Taylor, J Farmer, S Rasmussen, pp. 659–87. Reading, MA: Addison-Wesley
76. Whitley LD, ed. 1993. Foundations of Genetic Algorithms 2. San Mateo, CA: M. Kaufmann
77. Whitley LD, Vose MD, eds. 1995. Foundations of Genetic Algorithms 3. San Francisco, CA: M. Kaufmann