A Genetic Algorithm Tutorial
Darrell Whitley
Computer Science Department, Colorado State University
Fort Collins, CO 80523 [email protected]
Abstract
This tutorial covers the canonical genetic algorithm as well as more experimental
forms of genetic algorithms, including parallel island models and parallel cellular genetic
algorithms. The tutorial also illustrates genetic search by hyperplane sampling. The
theoretical foundations of genetic algorithms are reviewed, including the schema theorem
as well as recently developed exact models of the canonical genetic algorithm.
Keywords: Genetic Algorithms, Search, Parallel Algorithms
1 Introduction
Genetic Algorithms are a family of computational models inspired by evolution. These
algorithms encode a potential solution to a specific problem on a simple chromosome-like
data structure and apply recombination operators to these structures so as to preserve critical
information. Genetic algorithms are often viewed as function optimizers, although the range
of problems to which genetic algorithms have been applied is quite broad.
An implementation of a genetic algorithm begins with a population of (typically random)
chromosomes. One then evaluates these structures and allocates reproductive opportunities
in such a way that those chromosomes which represent a better solution to the target problem
are given more chances to "reproduce" than those chromosomes which are poorer solutions.
The "goodness" of a solution is typically defined with respect to the current population.
This particular description of a genetic algorithm is intentionally abstract because in
some sense, the term genetic algorithm has two meanings. In a strict interpretation, the
genetic algorithm refers to a model introduced and investigated by John Holland (1975) and
by students of Holland (e.g., DeJong, 1975). It is still the case that most of the existing
theory for genetic algorithms applies either solely or primarily to the model introduced by
Holland, as well as variations on what will be referred to in this paper as the canonical
genetic algorithm. Recent theoretical advances in modeling genetic algorithms also apply
primarily to the canonical genetic algorithm (Vose, 1993).
In a broader usage of the term, a genetic algorithm is any population-based model that
uses selection and recombination operators to generate new sample points in a search space.
Many genetic algorithm models have been introduced by researchers largely working from
an experimental perspective. Many of these researchers are application oriented and are
typically interested in genetic algorithms as optimization tools.
The goal of this tutorial is to present genetic algorithms in such a way that students new
to this field can grasp the basic concepts behind genetic algorithms as they work through
the tutorial. It should allow the more sophisticated reader to absorb this material with
relative ease. The tutorial also covers topics, such as inversion, which have sometimes been
misunderstood and misused by researchers new to the field.
The tutorial begins with a very low level discussion of optimization to both introduce basic
ideas in optimization as well as basic concepts that relate to genetic algorithms. In section 2
a canonical genetic algorithm is reviewed. In section 3 the principle of hyperplane sampling
is explored and some basic crossover operators are introduced. In section 4 various versions
of the schema theorem are developed in a step by step fashion and other crossover operators
are discussed. In section 5 binary alphabets and their effects on hyperplane sampling are
considered. In section 6 a brief criticism of the schema theorem is considered and in section
7 an exact model of the genetic algorithm is developed. The last three sections of the
tutorial cover alternative forms of genetic algorithms and evolutionary computational models,
including specialized parallel implementations.
Most users of genetic algorithms typically are concerned with problems that are nonlinear.
This also often implies that it is not possible to treat each parameter as an independent
variable which can be solved in isolation from the other variables. There are interactions
such that the combined effects of the parameters must be considered in order to maximize or
minimize the output of the black box. In the genetic algorithm community, the interaction
between variables is sometimes referred to as epistasis.
The first assumption that is typically made is that the variables representing parameters
can be represented by bit strings. This means that the variables are discretized in an a
priori fashion, and that the range of the discretization corresponds to some power of 2. For
example, with 10 bits per parameter, we obtain a range with 1024 discrete values. If the
parameters are actually continuous then this discretization is not a particular problem. This
assumes, of course, that the discretization provides enough resolution to make it possible to
adjust the output with the desired level of precision. It also assumes that the discretization
is in some sense representative of the underlying function.
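As a concrete sketch of this kind of discretization (the function and parameter names here are illustrative, not from the tutorial), a bit string can be decoded to a real value as follows:

```python
def decode(bits, lo, hi):
    """Map a bit string (list of 0/1) to a real value in [lo, hi].

    With 10 bits there are 2**10 = 1024 evenly spaced values.
    """
    n = int("".join(str(b) for b in bits), 2)
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)
```

The all-zeros string decodes to the lower bound and the all-ones string to the upper bound; with 10 bits the achievable precision is (hi - lo)/1023.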
If some parameter can only take on an exact finite set of values then the coding issue
becomes more difficult. For example, what if there are exactly 1200 discrete values which
can be assigned to some variable Xi? We need at least 11 bits to cover this range, but
this codes for a total of 2048 discrete values. The 848 unnecessary bit patterns may result
in no evaluation, a default worst possible evaluation, or some parameter settings may be
represented twice so that all binary strings result in a legal set of parameter values. Solving
such coding problems is usually considered to be part of the design of the evaluation function.
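The third option can be sketched as follows (a hypothetical helper, not from the tutorial): surplus bit patterns wrap around onto legal values, so every binary string decodes to one of the 1200 legal settings, with 848 of them represented twice.

```python
def decode_wrapped(bits, num_legal=1200):
    """Decode an 11-bit pattern to one of num_legal values.

    Patterns 1200..2047 wrap around to 0..847, so all 2048
    strings are legal but 848 values are represented twice.
    """
    return int("".join(str(b) for b in bits), 2) % num_legal
```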
Aside from the coding issue, the evaluation function is usually given as part of the problem
description. On the other hand, developing an evaluation function can sometimes involve
developing a simulation. In other cases, the evaluation may be performance based and
may represent only an approximate or partial evaluation. For example, consider a control
application where the system can be in any one of an exponentially large number of possible
states. Assume a genetic algorithm is used to optimize some form of control strategy. In
such cases, the state space must be sampled in a limited fashion and the resulting evaluation
of control strategies is approximate and noisy (cf. Fitzpatrick and Grefenstette, 1988).
The evaluation function must also be relatively fast. This is typically true for any
optimization method, but it may particularly pose an issue for genetic algorithms. Since a genetic
algorithm works with a population of potential solutions, it incurs the cost of evaluating this
population. Furthermore, the population is replaced (all or in part) on a generational basis.
The members of the population reproduce, and their offspring must then be evaluated. If it
takes 1 hour to do an evaluation, then it takes over 1 year to do 10,000 evaluations. This
would be approximately 50 generations for a population of only 200 strings.
Of course, the expression 2^L grows exponentially with respect to L. Consider a problem with
an encoding of 400 bits. How big is the associated search space? A classic introductory
textbook on Artificial Intelligence gives one characterization of a space of this size. Winston
(1992:102) points out that 2^400 is a good approximation of the effective size of the search space
of possible board configurations in chess. (This assumes the effective branching factor at each
possible move to be 16 and that a game is made up of 100 moves; 16^100 = (2^4)^100 = 2^400.)
Winston states that this is "a ridiculously large number. In fact, if all the atoms in the
universe had been computing chess moves at picosecond rates since the big bang (if any),
the analysis would be just getting started."
The point is that as long as the number of "good solutions" to a problem is sparse with
respect to the size of the search space, then random search or search by enumeration of a large
search space is not a practical form of problem solving. On the other hand, any search other
than random search imposes some bias in terms of how it looks for better solutions and where
it looks in the search space. Genetic algorithms indeed introduce a particular bias in terms
of what new points in the space will be sampled. Nevertheless, a genetic algorithm belongs
to the class of methods known as "weak methods" in the Artificial Intelligence community
because it makes relatively few assumptions about the problem that is being solved.
Of course, there are many optimization methods that have been developed in mathe-
matics and operations research. What role do genetic algorithms play as an optimization
tool? Genetic algorithms are often described as a global search method that does not use
gradient information. Thus, nondifferentiable functions as well as functions with multiple
local optima represent classes of problems to which genetic algorithms might be applied.
Genetic algorithms, as a weak method, are robust but very general. If there exists a good
specialized optimization method for a specific problem, then a genetic algorithm may not be
the best optimization tool for that application. On the other hand, some researchers work
with hybrid algorithms that combine existing methods with genetic algorithms.
Selection (Duplication)          Recombination (Crossover)

String 1  --->  String 1  --->  Offspring-A (1 X 2)
String 2  --->  String 2  --->  Offspring-B (1 X 2)
String 3  --->  String 2  --->  Offspring-A (2 X 4)
String 4  --->  String 4  --->  Offspring-B (2 X 4)
We will first consider the construction of the intermediate population from the current
population. In the first generation the current population is also the initial population. After
calculating f_i/f̄ for all the strings in the current population, selection is carried out. In the
canonical genetic algorithm the probability that strings in the current population are copied
(i.e., duplicated) and placed in the intermediate generation is proportional to their fitness.
There are a number of ways to do selection. We might view the population as mapping
onto a roulette wheel, where each individual is represented by a space that proportionally
corresponds to its fitness. By repeatedly spinning the roulette wheel, individuals are chosen
using "stochastic sampling with replacement" to fill the intermediate population.
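A minimal sketch of this roulette-wheel scheme (the function name is ours, not the tutorial's):

```python
import random

def roulette_wheel(population, fitnesses, n):
    """Stochastic sampling with replacement: each spin selects
    individual i with probability f_i / sum of all fitnesses."""
    total = sum(fitnesses)
    selected = []
    for _ in range(n):
        spin = random.uniform(0, total)
        running = 0.0
        for individual, f in zip(population, fitnesses):
            running += f
            if running >= spin:
                selected.append(individual)
                break
    return selected
```

Fitter strings occupy a larger slice of the wheel and so tend to receive more copies, though any particular run can deviate from the expected counts.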
A selection process that will more closely match the expected fitness values is "remainder
stochastic sampling." For each string i where f_i/f̄ is greater than 1.0, the integer portion of
this number indicates how many copies of that string are directly placed in the intermediate
population. All strings (including those with f_i/f̄ less than 1.0) then place additional copies
in the intermediate population with a probability corresponding to the fractional portion of
f_i/f̄. For example, a string with f_i/f̄ = 1.36 places 1 copy in the intermediate population,
and then receives a 0.36 chance of placing a second copy. A string with a fitness of f_i/f̄ = 0.54
has a 0.54 chance of placing one string in the intermediate population.
"Remainder stochastic sampling" is most efficiently implemented using a method known
as Stochastic Universal Sampling. Assume that the population is laid out in random order
as in a pie graph, where each individual is assigned space on the pie graph in proportion
to tness. Next an outer roulette wheel is placed around the pie with N equally spaced
pointers. A single spin of the roulette wheel will now simultaneously pick all N members of
the intermediate population. The resulting selection is also unbiased (Baker, 1987).
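A sketch of Stochastic Universal Sampling under these assumptions (names are ours):

```python
import random

def stochastic_universal_sampling(population, fitnesses, n):
    """Place n equally spaced pointers around the fitness pie
    with a single spin; each string then receives between the
    floor and ceiling of its expected number of copies."""
    total = sum(fitnesses)
    step = total / n
    start = random.uniform(0, step)
    selected, running, i = [], fitnesses[0], 0
    for k in range(n):
        pointer = start + k * step
        while running < pointer:
            i += 1
            running += fitnesses[i]
        selected.append(population[i])
    return selected
```

With fitnesses 3 and 1 and n = 4 pointers, the first string is picked exactly three times and the second exactly once, regardless of the spin.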
After selection has been carried out the construction of the intermediate population is
complete and recombination can occur. This can be viewed as creating the next population
from the intermediate population. Crossover is applied to randomly paired strings with
a probability denoted p_c. (The population should already be sufficiently shuffled by the
random selection process.) Pick a pair of strings. With probability p_c "recombine" these
strings to form two new strings that are inserted into the next population.
Consider the following binary string: 1101001100101101. The string would represent a
possible solution to some parameter optimization problem. New sample points in the space
are generated by recombining two parent strings. Consider the string 1101001100101101 and
another binary string, yxyyxyxxyyyxyxxy, in which the values 0 and 1 are denoted by x and
y. Using a single randomly chosen recombination point, 1-point crossover occurs as follows.
11010 \/ 01100101101
yxyyx /\ yxxyyyxyxxy
Swapping the fragments between the two parents produces the following offspring.
11010yxxyyyxyxxy and yxyyx01100101101
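This operation is easy to sketch in code (the helper name is ours):

```python
import random

def one_point_crossover(parent1, parent2, point=None):
    """Cut both parents at the same point and swap the tails.

    There are L-1 possible crossover points for strings of
    length L; if none is given, one is chosen at random.
    """
    if point is None:
        point = random.randint(1, len(parent1) - 1)
    child1 = parent1[:point] + parent2[point:]
    child2 = parent2[:point] + parent1[point:]
    return child1, child2
```

Cutting the two example strings after the fifth bit reproduces the offspring shown above.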
After recombination, we can apply a mutation operator. For each bit in the population,
mutate with some low probability p_m. Typically the mutation rate is less than 1%. In some
cases, mutation is interpreted as randomly generating a new bit, in which case, only 50% of
the time will the "mutation" actually change the bit value. In other cases, mutation is
interpreted to mean actually flipping the bit. The difference is no more than an
implementation detail as long as the user/reader is aware of the difference and understands
that the first form of mutation produces a change in bit values only half as often as the
second, and that one version of mutation is just a scaled version of the other.
After the process of selection, recombination and mutation is complete, the next
population can be evaluated. The process of evaluation, selection, recombination and
mutation forms one generation in the execution of a genetic algorithm.
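The bit-flipping interpretation of mutation can be sketched as (parameter names are ours):

```python
import random

def mutate(bits, pm=0.005):
    """Flip each bit independently with probability pm.

    This is the second interpretation above; halving pm would
    emulate the 'generate a random new bit' interpretation.
    """
    return [1 - b if random.random() < pm else b for b in bits]
```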
Figure 2: A 3-dimensional cube and a 4-dimensional hypercube. The corners of the inner
cube and outer cube in the bottom 4-D example are numbered in the same way as in the upper
3-D cube, except a 1 is added as a prefix to the labels of the inner cube and a 0 is added as a
prefix to the labels of the outer cube. Only select points are labeled in the 4-D hypercube.
different hyperplanes are evaluated in an implicitly parallel fashion each time a single string
is evaluated (Holland 1975:74); but it is the cumulative effects of evaluating a population of
points that provides statistical information about any particular subset of hyperplanes.
Implicit parallelism implies that many hyperplane competitions are simultaneously solved
in parallel. The theory suggests that through the process of reproduction and recombination,
the schemata of competing hyperplanes increase or decrease their representation in the
population according to the relative fitness of the strings that lie in those hyperplane partitions.
Because genetic algorithms operate on populations of strings, one can track the proportional
representation of a single schema representing a particular hyperplane in a population and
indicate whether that hyperplane will increase or decrease its representation in the
population over time when fitness based selection is combined with crossover to produce
offspring from existing strings in the population.
[Figure: three plots of a function F(X) against the variable X over the interval 0 to K,
with axis tick marks at K/8, K/4, and K/2.]
Table 1: A sample population of 21 partially specified strings with fitness assigned by rank.
Random is the value drawn during remainder stochastic sampling and Copies is the resulting
number of copies placed in the intermediate population.

String              Fitness  Random  Copies    String                Fitness  Random  Copies
001b1,4...b1,L      2.0      --      2         011b12,4...b12,L      0.9      0.28    1
101b2,4...b2,L      1.9      0.93    2         000b13,4...b13,L      0.8      0.13    0
111b3,4...b3,L      1.8      0.65    2         110b14,4...b14,L      0.7      0.70    1
010b4,4...b4,L      1.7      0.02    1         110b15,4...b15,L      0.6      0.80    1
111b5,4...b5,L      1.6      0.51    2         100b16,4...b16,L      0.5      0.51    1
101b6,4...b6,L      1.5      0.20    1         011b17,4...b17,L      0.4      0.76    1
011b7,4...b7,L      1.4      0.93    2         000b18,4...b18,L      0.3      0.45    0
001b8,4...b8,L      1.3      0.20    1         001b19,4...b19,L      0.2      0.61    0
000b9,4...b9,L      1.2      0.37    1         100b20,4...b20,L      0.1      0.07    0
100b10,4...b10,L    1.1      0.79    1         010b21,4...b21,L      0.0      --      0
010b11,4...b11,L    1.0      --      1
The example population in Table 1 contains only 21 (partially specified) strings. Since we
are not particularly concerned with the exact evaluation of these strings, the fitness values
will be assigned according to rank. (The notion of assigning fitness by rank rather than by
fitness proportional representation has not been discussed in detail, but the current example
relates to change in representation due to fitness and not how that fitness is assigned.)
The table includes information on the fitness of each string and the number of copies to
be placed in the intermediate population. In this example, the number of copies produced
during selection is determined by automatically assigning the integer part, then assigning
the fractional part by generating a random value between 0.0 and 1.0 (a form of remainder
stochastic sampling). If the random value is greater than (1 − remainder), then an additional
copy is awarded to the corresponding individual.
Genetic algorithms appear to process many hyperplanes implicitly in parallel when selection
acts on the population. Table 2 enumerates the 27 hyperplanes (3^3) that can be defined
over the first three bits of the strings in the population and explicitly calculates the fitness
associated with the corresponding hyperplane partition. The true fitness of the hyperplane
partition corresponds to the average fitness of all strings that lie in that hyperplane
partition. The genetic algorithm uses the population as a sample for estimating the fitness of
that hyperplane partition. Of course, the only time the sample is random is during the first
generation. After this, the sample of new strings should be biased toward regions that have
previously contained strings that were above average with respect to previous populations.
If the genetic algorithm works as advertised, the number of copies of strings that actually
fall in a particular hyperplane partition after selection should approximate the expected
number of copies that should fall in that partition.
Schemata and Fitness Values
Schema Mean Count Expect Obs Schema Mean Count Expect Obs
101*...* 1.70 2 3.4 3 *0**...* 0.991 11 10.9 9
111*...* 1.70 2 3.4 4 00**...* 0.967 6 5.8 4
1*1*...* 1.70 4 6.8 7 0***...* 0.933 12 11.2 10
*01*...* 1.38 5 6.9 6 011*...* 0.900 3 2.7 4
**1*...* 1.30 10 13.0 14 010*...* 0.900 3 2.7 2
*11*...* 1.22 5 6.1 8 01**...* 0.900 6 5.4 6
11**...* 1.175 4 4.7 6 0*0*...* 0.833 6 5.0 3
001*...* 1.166 3 3.5 3 *10*...* 0.800 5 4.0 4
1***...* 1.089 9 9.8 11 000*...* 0.767 3 2.3 1
0*1*...* 1.033 6 6.2 7 **0*...* 0.727 11 8.0 7
10**...* 1.020 5 5.1 5 *00*...* 0.667 6 4.0 3
*1**...* 1.010 10 10.1 12 110*...* 0.650 2 1.3 2
****...* 1.000 21 21.0 21 1*0*...* 0.600 5 3.0 4
100*...* 0.566 3 1.70 2
Table 2: The average fitnesses (Mean) associated with the samples from the 27 hyperplanes
defined over the first three bit positions are explicitly calculated. The Expected representation
(Expect) and Observed representation (Obs) are shown. Count refers to the number of
strings in hyperplane H before selection.
In Table 2, the expected number of strings sampling a hyperplane partition after selection
can be calculated by multiplying the number of hyperplane samples in the current population
before selection by the average tness of the strings in the population that fall in that
partition. The observed number of copies actually allocated by selection is also given. In
most cases the match between expected and observed sampling rate is fairly good: the error
is a result of sampling error due to the small population size.
It is useful to begin formalizing the idea of tracking the potential sampling rate of a
hyperplane, H. Let M(H, t) be the number of strings sampling H at the current generation t
in some population. Let (t + intermediate) index the generation t after selection (but before
crossover and mutation), and f(H, t) be the average evaluation of the sample of strings in
partition H in the current population. Formally, the change in representation according to
fitness associated with the strings that are drawn from hyperplane H is expressed by:

M(H, t + intermediate) = M(H, t) f(H, t) / f̄.
Of course, when strings are merely duplicated no new sampling of hyperplanes is actually
occurring since no new samples are generated. Theoretically, we would like to have a
sample of new points with this same distribution. In practice, this is generally not possible.
Recombination and mutation, however, provide a means of generating new sample points
while partially preserving the distribution of strings across hyperplanes that is observed in the
intermediate population.
3.1 Crossover Operators and Schemata
The observed representation of hyperplanes in Table 2 corresponds to the representation in
the intermediate population after selection but before recombination. What does
recombination do to the observed string distributions? Clearly, order-1 hyperplane samples are not
affected by recombination, since the single critical bit is always inherited by one of the
offspring. However, the observed distribution of potential samples from hyperplane partitions
of order-2 and higher can be affected by crossover. Furthermore, all hyperplanes of the same
order are not necessarily affected with the same probability. Consider 1-point crossover. This
recombination operator is nice because it is relatively easy to quantify its effects on different
schemata representing hyperplanes. To keep things simple, assume we are working with
a string encoded with just 12 bits. Now consider the following two schemata.
11********** and 1**********1
The probability that the bits in the first schema will be separated during 1-point crossover
is only 1/(L−1), since in general there are L−1 crossover points in a string of length L. The
probability that the bits in the second schema are disrupted by 1-point crossover,
however, is (L−1)/(L−1), or 1.0, since each of the L−1 crossover points separates the bits in
the schema. This leads to a general observation: when using 1-point crossover the positions
of the bits in the schema are important in determining the likelihood that those bits will
remain together during crossover.
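This disruption probability is simple to compute from a schema string (a small illustrative helper, not from the tutorial):

```python
def disruption_prob_1pt(schema):
    """Probability that 1-point crossover separates the defining
    bits of a schema: defining length / (L - 1), where the
    defining length is the distance between the outermost
    fixed (non-'*') positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return (fixed[-1] - fixed[0]) / (len(schema) - 1)
```

For the two 12-bit schemata above this gives 1/11 and 11/11 = 1.0, matching the discussion.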
where b1 to b12 represent bits 1 to 12. When viewed in this way, 1-point crossover
is a special case of 2-point crossover where one of the crossover points always occurs at
the wrap-around position between the first and last bit. Maximum disruptions for order-2
schemata now occur when the 2 bits are at complementary positions on this ring.
For 1-point and 2-point crossover it is clear that schemata which have bits that are
close together on the string encoding (or ring) are less likely to be disrupted by crossover.
More precisely, hyperplanes represented by schemata with more compact representations
should be sampled at rates that are closer to those potential sampling distribution targets
achieved under selection alone. For current purposes a compact representation with respect
to schemata is one that minimizes the probability of disruption during crossover. Note that
this definition is operator dependent, since both of the two order-2 schemata examined in
section 3.1 are equally and maximally compact with respect to 2-point crossover, but are
maximally different with respect to 1-point crossover.
The linkage can now be changed by moving around the tag-bit pairs, but the string
remains the same when decoded: 010010110. One must now also consider how recombination
is to be implemented. Goldberg and Bridges (1990), Whitley (1991) as well as Holland (1975)
discuss the problems of exploiting linkage and the recombination of tagged representations.
To calculate M(H, t+1) we must consider the effects of crossover as the next generation
is created from the intermediate generation. First we consider that crossover is applied
probabilistically to a portion of the population. For that part of the population that does
not undergo crossover, the representation due to selection is unchanged. When crossover
does occur, then we must calculate losses due to its disruptive effects.
" #
f ( H; t) f ( H; t)
M (H; t + 1) = (1 pc )M (H; t) f + pc M (H; t) f (1 losses) + gains
In the derivation of the schema theorem a conservative assumption is made at this point.
It is assumed that crossover within the defining length of the schema is always disruptive to
the schema representing H. In fact, this is not true and an exact calculation of the effects
of crossover is presented later in this paper. For example, assume we are interested in the
schema 11*****. If a string such as 1110101 were recombined between the first two bits with
a string such as 1000000 or 0100000, no disruption would occur in hyperplane 11***** since
one of the offspring would still reside in this partition. Also, if 1000000 and 0100000 were
recombined exactly between the first and second bit, a new independent offspring would
sample 11*****; this is the source of gains that is referred to in the above calculation. To
simplify things, gains are ignored and the conservative assumption is made that crossover
falling in the significant portion of a schema always leads to disruption. Thus,
" #
f ( H; t) f ( H; t)
M (H; t + 1) (1 pc )M (H; t) f + pc M (H; t) f (1 disruptions)
where disruptions overestimates losses. We might wish to consider one exception: if two
strings that both sample H are recombined, then no disruption occurs. Let P(H, t) denote
the proportional representation of H obtained by dividing M(H, t) by the population size.
The probability that a randomly chosen mate samples H is just P(H, t). Recall that Δ(H)
is the defining length associated with 1-point crossover. Disruption is therefore given by:

(Δ(H)/(L−1)) (1 − P(H, t)).
At this point, the inequality can be simplified. Both sides can be divided by the population
size to convert this into an expression for P(H, t+1), the proportional representation
of H at generation t+1. Furthermore, the expression can be rearranged with respect to p_c.

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 − p_c (Δ(H)/(L−1)) (1 − P(H, t)) ]
We now have a useful version of the schema theorem (although it does not yet consider
mutation); but it is not the only version in the literature. For example, both parents are
typically chosen based on fitness. This can be added to the schema theorem by merely
indicating the alternative parent is chosen from the intermediate population after selection.

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 − p_c (Δ(H)/(L−1)) (1 − P(H, t) f(H, t)/f̄) ]
Finally, mutation is included. Let o(H) be a function that returns the order of the
hyperplane H. The order of H exactly corresponds to a count of the number of bits in the
schema representing H that have value 0 or 1. Let the mutation probability be p_m, where
mutation always flips the bit. Thus the probability that mutation does not affect the schema
representing H is (1 − p_m)^o(H). This leads to the following expression of the schema theorem.

P(H, t+1) ≥ P(H, t) f(H, t)/f̄ [ 1 − p_c (Δ(H)/(L−1)) (1 − P(H, t) f(H, t)/f̄) ] (1 − p_m)^o(H)
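To make the bound concrete, the right-hand side can be evaluated for hypothetical values of its terms (all numbers below are made up for illustration; the function name is ours):

```python
def schema_theorem_bound(P, fitness_ratio, pc, pm, delta, L, order):
    """Lower bound on P(H, t+1) from the final schema theorem:
    P * (f/f_bar) * [1 - pc * delta/(L-1) * (1 - P * f/f_bar)]
      * (1 - pm)**order,
    where fitness_ratio stands for f(H, t)/f_bar."""
    survive_crossover = 1 - pc * (delta / (L - 1)) * (1 - P * fitness_ratio)
    survive_mutation = (1 - pm) ** order
    return P * fitness_ratio * survive_crossover * survive_mutation
```

With pc = pm = 0 the bound reduces to simple fitness-proportionate growth, P(H, t) f(H, t)/f̄.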
(It doesn't matter which offspring inherits the first critical bit, but all other bits must be
inherited by that same offspring. This is also a worst case probability of disruption which
assumes no alleles found in the schema of interest are shared by the parents.) Thus, for any
order-3 schemata the probability of uniform crossover separating the critical bits is always
1 − (1/2)^2 = 0.75. Consider for a moment a string of 9 bits. The defining length of a
schema must equal 6 before the disruptive probabilities of 1-point crossover match those
associated with uniform crossover (6/8 = .75). We can define 84 different order-3 schemata
over any particular string of 9 bits (i.e., 9 choose 3). Of these schemata, only 19 of the 84
order-3 schemata have a disruption rate higher than 0.75 under 1-point crossover. Another
15 have exactly the same disruption rate, and 50 of the 84 order-3 schemata have a lower
disruption rate. It is relatively easy to show that, while uniform crossover is unbiased with
respect to defining length, it is also generally more disruptive than 1-point crossover. Spears
and DeJong (1991) have shown that uniform crossover is in every case more disruptive than
2-point crossover for order-3 schemata for all defining lengths.
Figure 4: This graph illustrates paths through 4-D space. A 1-point crossover of 1111 and
0000 can only generate offspring that reside along the dashed paths at the edges of this graph.
Despite these analytical results, several researchers have suggested that uniform crossover
is sometimes a better recombination operator. One can point to its lack of representational
bias with respect to schema disruption as a possible explanation, but this is unlikely since
uniform crossover is uniformly worse than 2-point crossover. Spears and DeJong (1991:314)
speculate that, "With small populations, more disruptive crossover operators such as uniform
or n-point (n > 2) may yield better results because they help overcome the limited
information capacity of smaller populations and the tendency for more homogeneity." Eshelman
(1991) has made similar arguments outlining the advantages of disruptive operators.
There is another sense in which uniform crossover is unbiased. Assume we wish to
recombine the bit strings 0000 and 1111. We can conveniently lay out the 4-dimensional
hypercube as shown in Figure 4. We can also view these strings as being connected by a set
of minimal paths through the hypercube; pick one parent string as the origin and the other
as the destination. Now change a single bit in the binary representation corresponding to the
point of origin. Any such move will reach a point that is one move closer to the destination.
In Figure 4 it is easy to see that changing a single bit is a move up or down in the graph.
All of the points between 0000 and 1111 are reachable by some single application of
uniform crossover. However, 1-point crossover only generates strings that lie along two
complementary paths (in the figure, the leftmost and rightmost paths) through this 4-dimensional
hypercube. In general, uniform crossover will draw a complementary pair of sample points
with equal probability from all points that lie along any complementary minimal paths in
the hypercube between the two parents, while 1-point crossover samples points from only
two specific complementary minimal paths between the two parent strings. It is also easy to
see that 2-point crossover is less restrictive than 1-point crossover. Note that the number of
bits that are different between two strings is just the Hamming distance, H. Not including
the original parent strings, uniform crossover can generate 2^H − 2 different strings, while
1-point crossover can generate 2(H − 1) different strings since there are H − 1 crossover
points that produce unique offspring (see the discussion in the next section) and each
crossover produces 2 offspring. The 2-point crossover operator can generate 2 (H choose 2)
= H^2 − H different offspring since there are H choose 2 different crossover points that will
result in offspring that are not copies of the parents and each pair of crossover points
generates 2 strings.
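These counts can be checked by brute-force enumeration (treating 2-point crossover on the ring described earlier, so that 1-point is the special case with one cut at the wrap-around position; the function is ours):

```python
from itertools import product

def crossover_offspring_counts(p1, p2):
    """Count distinct non-parent offspring of two parent strings
    under uniform, 1-point, and 2-point (ring) crossover."""
    L = len(p1)
    # Uniform: every mask choosing each bit from either parent.
    uniform = {"".join(a if m else b for a, b, m in zip(p1, p2, mask))
               for mask in product((0, 1), repeat=L)}
    # 1-point: swap tails at each of the L-1 cut points.
    one_pt = {p1[:x] + p2[x:] for x in range(1, L)} | \
             {p2[:x] + p1[x:] for x in range(1, L)}
    # 2-point on the ring: swap the segment between two cuts.
    two_pt = {p[:x] + q[x:y] + p[y:]
              for x in range(L) for y in range(x + 1, L)
              for p, q in ((p1, p2), (p2, p1))}
    parents = {p1, p2}
    return (len(uniform - parents), len(one_pt - parents),
            len(two_pt - parents))
```

For the complementary parents 1111 and 0000 (H = 4) this yields 14, 6 and 12 offspring, agreeing with 2^H − 2, 2(H − 1) and H^2 − H.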
Both strings lie in the hyperplane 0001**101*01001*. The flip side of this observation
is that crossover is really restricted to a subcube defined over the bit positions that are
different. We can isolate this subcube by removing all of the bits that are equivalent in
the two parent structures. Booker (1987) refers to strings such as ----11---1-----1 and
----00---0-----0 as the "reduced surrogates" of the original parent chromosomes.
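Computing reduced surrogates is straightforward (a sketch; the parent strings in the usage below are illustrative, chosen to lie in the hyperplane above and produce the surrogates shown):

```python
def reduced_surrogates(p1, p2):
    """Replace every position where the parents agree with '-',
    leaving only the differing bits (Booker, 1987)."""
    r1 = "".join(a if a != b else "-" for a, b in zip(p1, p2))
    r2 = "".join(b if a != b else "-" for a, b in zip(p1, p2))
    return r1, r2
```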
When viewed in this way, it is clear that recombination of these particular strings occurs in
a 4-dimensional subcube, more or less identical to the one examined in the previous example.
Uniform crossover is unbiased with respect to this subcube in the sense that uniform crossover
will still sample in an unbiased, uniform fashion from all of the pairs of points that lie
along complementary minimal paths in the subcube defined between the two original parent
strings. On the other hand, simple 1-point or 2-point crossover will not. To help illustrate
this idea, we recombine the original strings, but examine the ospring in their \reduced"
forms. For example, simple 1-point crossover will generate ospring ----11---1-----0
and ----00---0-----1 with a probability of 6/15 since there are 6 crossover points in the
original parent strings between the third and fourth bits in the reduced subcube and L-1
= 15. On the other hand, ----10---0-----0 and ----01---1-----1 are sampled with a
probability of only 1/15 since there is only a single crossover point in the original parent
structures that falls between the rst and second bits that dene the subcube.
One can remove this particular bias, however. We apply crossover on the reduced surrogates.
Crossover can now exploit the fact that there is really only 1 crossover point between
any significant bits that appear in the reduced surrogate forms. There is also another benefit.
If at least 1 crossover point falls between the first and last significant bits in the reduced
surrogates, the offspring are guaranteed not to be duplicates of the parents. (This assumes
the parents differ by at least two bits.) Thus, new sample points in hyperspace are generated.
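The reduced surrogates for the example strings can be computed directly. The parents below
are reconstructed by filling the * positions of the hyperplane 0001**101*01001* with the bits
shown in the two surrogates quoted above:

```python
def reduced_surrogates(a, b):
    # Booker's reduced surrogates: keep the bits where the parents
    # differ; '-' marks the positions on which the two parents agree
    ra = "".join(x if x != y else "-" for x, y in zip(a, b))
    rb = "".join(y if x != y else "-" for x, y in zip(a, b))
    return ra, rb

p1 = "0001111011010011"
p2 = "0001001010010010"
ra, rb = reduced_surrogates(p1, p2)
print(ra)  # ----11---1-----1
print(rb)  # ----00---0-----0

# 6 of the 15 cut points fall between the 3rd and 4th differing bits,
# which is the 6/15 sampling probability discussed above
sig = [i for i in range(len(p1)) if p1[i] != p2[i]]
print(sig[3] - sig[2])  # 6
```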
The debate on the merits of uniform crossover and operators such as 2-point reduced
surrogate crossover is not a closed issue. To fully understand the interaction between hyperplane
sampling, population size, premature convergence, crossover operators, genetic diversity and
the role of hill-climbing by mutation requires better analytical methods.
These counting arguments naturally lead to questions about the relationship between
population size and the number of hyperplanes that are sampled by a genetic algorithm.
One can take a very simple view of this question and ask how many schemata of order-1
are sampled and how well are they represented in a population of size N. These numbers
are based on the assumption that we are interested in hyperplane representations associated
with the initial random population, since selection changes the distributions over time. In
a population of size N there should be N/2 samples of each of the 2L order-1 hyperplane
partitions. Therefore 50% of the population falls in any particular order-1 partition. Each
order-2 partition is sampled by 25% of the population. In general then, each hyperplane of
order i is sampled by (1/2)^i of the population.
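These fractions can be checked exhaustively for small string lengths. The helper below
(illustrative only) computes the fraction of the full space that is a member of a given schema:

```python
from itertools import product

def schema_fraction(schema):
    # fraction of all binary strings of this length that match the
    # schema; '*' positions are unconstrained
    strings = ["".join(bits) for bits in product("01", repeat=len(schema))]
    hits = [s for s in strings
            if all(c in ("*", b) for c, b in zip(schema, s))]
    return len(hits) / len(strings)

print(schema_fraction("1*****"))  # order-1: 0.5
print(schema_fraction("1*0***"))  # order-2: 0.25
print(schema_fraction("1*0*1*"))  # order-3: 0.125
```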
since θ = log(N/φ) and N = 2^θ φ. Fitzpatrick and Grefenstette now make the following
arguments. Assume L ≥ 64 and 2^6 ≤ N ≤ 2^20. Pick φ = 8, which implies 3 ≤ θ ≤ 17. By
This argument does not hold in general for any population of size N. Given a string of
length L, the number of hyperplanes in the space is finite. However, the population size can
be chosen arbitrarily. The total number of schemata associated with a string of length L
is 3^L. Thus if we pick a population size where N = 3^L, then at most N hyperplanes can
be processed (Michael Vose, personal communication). Therefore, N must be chosen with
respect to L to make the N^3 argument reasonable. At the same time, the range of values
Still, the argument that N^3 hyperplanes are usefully processed assumes that all of these
hyperplanes are processed with some degree of independence. Notice that the current
derivation counts only those schemata that are exactly of order-θ. The sum of all schemata from
order-1 to order-θ that should be well represented in a random initial population is given
by: Σ_{x=1}^{θ} 2^x (L choose x). By only counting schemata that are exactly of order-θ
we might hope to avoid arguments about interactions with lower order schemata. However, all
the N^3 argument really shows is that there may be as many as N^3 hyperplanes that are well
represented given an appropriate population size. But a simple static count of the number of
schemata available for processing fails to consider the dynamic behavior of the genetic algorithm.
As discussed later in this tutorial, dynamic models of the genetic algorithm now exist
(Vose and Liepins, 1991; Whitley, Das and Crabb 1992). There has not yet, however, been
any real attempt to use these models to look at complex interactions between large numbers of
hyperplane competitions. It is obvious in some vacuous sense that knowing the distribution
of the initial population as well as the fitnesses of these strings (and the strings that are
subsequently generated by the genetic algorithm) is sufficient information for modeling the
dynamic behavior of the genetic algorithm (Vose 1993). This suggests that we only need
information about those strings sampled by the genetic algorithm. However, this micro-level
view of the genetic algorithm does not seem to explain its macro-level processing power.
P(Z, t+1) = P(Z, t) (f(Z, t)/f̄) (1 - pc{losses}) + pc{gains}
In the current formulation, Z will refer to a string. Assume we apply this equation to
each string in the search space. The result is an exact model of the computational behavior
of a genetic algorithm. Since modeling strings models the highest order schemata, the model
implicitly includes all lower order schemata. Also, the fitnesses of strings are constants in
the canonical genetic algorithm using fitness proportional reproduction and one need not
worry about changes in the observed fitness of a hyperplane as represented by the current
population. Given a specification of Z, one can exactly calculate losses and gains. Losses
occur when a string crosses with another string and the resulting offspring fails to preserve
the original string. Gains occur when two different strings cross and independently create
a new copy of some string. For example, if Z = 000 then recombining 100 and 001 will
always produce a new copy of 000. Assuming 1-point crossover is used as an operator, the
probability of "losses" and "gains" for the string Z = 000 are calculated as follows:
losses = PI0 (f(111)/f̄) P(111, t) + PI0 (f(101)/f̄) P(101, t)
       + PI1 (f(110)/f̄) P(110, t) + PI2 (f(011)/f̄) P(011, t)
gains = PI0 (f(100)/f̄) P(100, t) (f(001)/f̄) P(001, t)
      + PI1 (f(100)/f̄) P(100, t) (f(011)/f̄) P(011, t)
      + PI1 (f(010)/f̄) P(010, t) (f(100)/f̄) P(100, t)
      + PI2 (f(110)/f̄) P(110, t) (f(001)/f̄) P(001, t)
      + PI2 (f(001)/f̄) P(001, t) (f(010)/f̄) P(010, t)
The use of PI0 in the preceding equations represents the probability of crossover in any
position on the corresponding string or string pair. Since Z is a string, it follows that PI0 =
1.0 and crossover in the relevant cases will always produce either a loss or a gain (depending
on the expression in which the term appears). The probability that one-point crossover will
fall between the first and second bit will be denoted by PI1. In this case, crossover must
fall in exactly this position with respect to the corresponding strings to result in a loss or
a gain. Likewise, PI2 will denote the probability that one-point crossover will fall between
the second and third bit, and the use of PI2 in the computation implies that crossover must
fall in this position for a particular string or string pair to affect the calculation of losses or
gains. In the above illustration, PI1 = PI2 = 0.5.
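The loss and gain probabilities for Z = 000 can be checked by enumerating the two cut points
of 1-point crossover directly. The following brute-force sketch (illustrative code, not from the
tutorial) reproduces the PI coefficients:

```python
def cross(a, b, c):
    # the two offspring of 1-point crossover at cut point c
    return a[:c] + b[c:], b[:c] + a[c:]

def loss_prob(mate, z="000"):
    # fraction of cut points at which crossing z with mate leaves no
    # copy of z among the two offspring
    L = len(z)
    return sum(z not in cross(z, mate, c) for c in range(1, L)) / (L - 1)

def gain_prob(a, b, z="000"):
    # fraction of cut points at which crossing a and b creates a copy of z
    L = len(z)
    return sum(z in cross(a, b, c) for c in range(1, L)) / (L - 1)

print(loss_prob("111"), loss_prob("101"))  # 1.0 1.0  (the PI0 loss terms)
print(loss_prob("110"), loss_prob("011"))  # 0.5 0.5  (the PI1 and PI2 terms)
print(loss_prob("001"), loss_prob("010"))  # 0.0 0.0  (no loss possible)
print(gain_prob("100", "001"))             # 1.0  (always recreates 000)
print(gain_prob("001", "010"))             # 0.5  (a PI2 gain term)
```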
The equations can be generalized to cover the remaining 7 strings in the space. This
translation is accomplished using bitwise addition modulo 2 (i.e., a bitwise exclusive-or,
denoted by ⊕; see Figure 4 and section 6.4). The function (Si ⊕ Z) is applied to each bit
string, Si, contained in the equations presented in this section to produce the appropriate
corresponding strings for generating an expression for computing all terms of the form P(Z, t+1).
7.1 A Generalized Form Based on Equation Generators
The 3 bit equations are similar to the 2 bit equations developed by Goldberg (1987). The
development of a general form for these equations is illustrated by generating the loss and
gain terms in a systematic fashion (Whitley, Das and Crabb, 1992). Because the number of
terms in the equations is greater than the number of strings in the search space, it is only
practical to develop equations for encodings of approximately 15 bits. The equations need
only be defined once for one string in the space; the standard form of the equation is always
defined for the string composed of all zero bits. Let S represent the set of binary strings of
length L, indexed by i. In general, the string composed of all zero bits is denoted S0.
1###1
/ \
/ \
01##1 1##10
/ \ / \
/ \ / \
001#1 01#10 1#100
/ \ / \ / \
/ \ / \ / \
00011 00110 01100 11000
The graph structure allows one to visualize the set of all generators for string losses. In
general, the root of this graph is defined by a string with a sentry bit in the first and last bit
positions, and the generator token "#" in all other intermediate positions. A move down
and to the left in the graph causes the leftmost sentry bit to be shifted right; a move down
and to the right causes the rightmost sentry bit to be shifted left. All bits outside the sentry
positions are "0" bits. Summing over the graph, one can see that there are Σ_{j=1}^{L-1} j 2^{L-j-1}
strings generated by the string loss generators. Each generated string Si contributes a loss
term of the form (Δ(Si)/(L-1)) (f(Si)/f̄) P(Si, t) to the losses equation,
where Δ(Si) is a function that counts the number of crossover points between sentry bits in
string Si.
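Both the coverage of the loss generators and the count given by the sum over the graph can be
checked by brute force for L = 5 (an illustrative check; delta stands for Δ):

```python
def cross(a, b, c):
    return a[:c] + b[c:], b[:c] + a[c:]

def delta(s):
    # number of crossover points between the first and last '1' bits
    return s.rindex("1") - s.index("1") if "1" in s else 0

L = 5
S0 = "0" * L
loss_strings = []
for n in range(1, 2 ** L):
    s = format(n, "0%db" % L)
    # cut points at which crossing S0 with s destroys both copies of S0
    cuts = [c for c in range(1, L) if S0 not in cross(S0, s, c)]
    if cuts:
        loss_strings.append(s)
        assert len(cuts) == delta(s)   # the cut count equals delta(s)

print(len(loss_strings))                               # 26 strings for L = 5
print(sum(j * 2 ** (L - j - 1) for j in range(1, L)))  # the graph sum: 26
```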
In this case, the root of the directed acyclic graph is defined by starting with the most
specific generator pair. The A-generator of the root has a "1" bit as the sentry bit in the
first position, and all other bits are "0." The Ω-generator of the root has a "1" bit as the
sentry bit in the last position, and all other bits are "0." A move down and left in the graph
is produced by shifting the left sentry bit of the current upper A-generator to the right.
A move down and right is produced by shifting the right sentry bit of the current lower
Ω-generator to the left. Each vacant bit position outside of the sentry bits which results
from a shift operation is filled using the # symbol.
For any level k of the directed graph there are k generators and the number of string pairs
generated at that level is 2^{k-1} for each pair of generators (the root is level 1). Therefore, the
total number of string pairs that must be included in the equations to calculate string gains
for S0 of length L is Σ_{k=1}^{L-1} k 2^{k-1}.
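The pair count can likewise be verified by brute force for a small L: enumerate all unordered
pairs of distinct nonzero strings and keep those for which some 1-point cut recreates S0
(illustrative check):

```python
from itertools import combinations

def cross(a, b, c):
    return a[:c] + b[c:], b[:c] + a[c:]

L = 5
S0 = "0" * L
strings = [format(n, "0%db" % L) for n in range(1, 2 ** L)]  # nonzero strings
gain_pairs = [(a, b) for a, b in combinations(strings, 2)
              if any(S0 in cross(a, b, c) for c in range(1, L))]

print(len(gain_pairs))                             # 49 pairs for L = 5
print(sum(k * 2 ** (k - 1) for k in range(1, L)))  # the closed form: 49
```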
Let Sα+x and Sω+y be two strings produced by a generator pair, such that Sα+x was
produced by the A-generator and has a sentry bit at location α + 1, and Sω+y was produced by
the Ω-generator with a sentry bit at L - ω. (The x and y terms are correction factors added to
α and ω in order to uniquely index a string in S.) Let the critical crossover region associated
with Sα+x and Sω+y be computed by the function β(Sα+x, Sω+y) = L - ω - (α + 1). For each
string pair Sα+x and Sω+y a term of the following form is added to the gains equations:
(β(Sα+x, Sω+y)/(L - 1)) (f(Sα+x)/f̄) P(Sα+x, t) (f(Sω+y)/f̄) P(Sω+y, t)
where β(Sα+x, Sω+y) counts the number of crossover points that fall in the critical region
defined by the sentry bits located at α + 1 and L - ω.
The generators are used as part of a two stage computation where the generators are
first used to create an exact equation in standard form. A simple transformation function
maps the equations to all other strings in the space.
To further generalize this model, the function r_{i,j}(k) is used to construct a mixing matrix
M where the i,j-th entry m_{i,j} = r_{i,j}(0). Note that this matrix gives the probabilities that
crossing strings i and j will produce the string S0. Technically, the definition of r_{i,j}(k) assumes
that exactly one offspring is produced. But note that M has two entries for each string pair
i,j where i ≠ j, which is equivalent to producing two offspring. For current purposes, assume
no mutation is used and 1-point crossover is used as the recombination operator. The matrix
M is symmetric and is zero everywhere on the diagonal except for entry m_{0,0}, which is 1.0.
Note that M is expressed entirely in terms of string gain information. Therefore, the first row
and column of the matrix have entries inversely related to the string loss probabilities; each
entry is given by 0.5(1 - Δ(Si)/(L - 1)), where each string in the set S is crossed with S0. For
completeness, Δ(Si) for strings not produced by the string loss generators is 0 and, thus, the
probability of obtaining S0 during reproduction is 1.0. The remainder of the matrix entries
are given by 0.5(β(Sα+x, Sω+y)/(L - 1)). For each pair of strings produced by the string gains
generators, determine their index and enter the value returned by the function into the
corresponding location in M. For completeness, β(Sj, Sk) = 0 for all pairs of strings not
generated by the string gains generators (i.e., m_{j,k} = 0).
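The stated properties of M can be confirmed for the 3-bit case by building the matrix from a
brute-force r_{i,j}(0), in which one of the two offspring of a uniformly chosen cut point is kept
at random (a sketch using exact rational arithmetic; delta stands for Δ):

```python
from fractions import Fraction

L = 3
S = [format(n, "03b") for n in range(2 ** L)]

def r0(a, b):
    # probability that 1-point crossover of a and b yields 000 when the
    # cut point is uniform and one of the two offspring is kept at random
    p = Fraction(0)
    for c in range(1, L):
        kids = (a[:c] + b[c:], b[:c] + a[c:])
        p += Fraction(sum(k == "000" for k in kids), 2)
    return p / (L - 1)

def delta(s):
    return s.rindex("1") - s.index("1") if "1" in s else 0

M = [[r0(a, b) for b in S] for a in S]

assert all(M[i][j] == M[j][i] for i in range(8) for j in range(8))  # symmetric
assert M[0][0] == 1 and all(M[i][i] == 0 for i in range(1, 8))      # diagonal
# first row: 0.5(1 - delta(Si)/(L-1)); the two symmetric entries for a
# pair sum to 1 - delta(Si)/(L-1), so non-generator strings give 1.0
assert all(M[0][i] == Fraction(1, 2) * (1 - Fraction(delta(S[i]), L - 1))
           for i in range(1, 8))
print([str(x) for x in M[0]])  # ['1', '1/2', '1/2', '1/4', '1/2', '0', '1/4', '0']
```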
Once defined, M does not change since it is not affected by variations in fitness or
proportional representation in the population. Thus, given the assumption of no mutations,
that s is updated each generation to correct for changes in the population average, and that
1-point crossover is used, then the standard form of the executable equations corresponds to
the following portion of the Liepins and Vose model (T denotes transpose):
s^T M s.
An alternative form of M, denoted M', can be defined by having only a single entry for
each string pair i,j where i ≠ j. This is done by doubling the value of the entries in the lower
triangle and setting the entries in the upper triangle of the matrix to 0.0. Assuming each
component of s is given by s_i = P(Si, t)(f(Si)/f̄), this has the rhetorical advantage that
s^T M'(:,1) s_0 = P(S0, t)(f(S0)/f̄)(1 - losses)
where M'(:,1) is the first column of M' and s_0 is the first component of s. Not including the
above subcomputation, the remainder of the computation of s^T M' s calculates string gains.
Vose and Liepins formalize the notion that bitwise exclusive-or can be used to remap all
the strings in the search space, in this case represented by the vector s. They show that if
recombination is a combination of crossover and mutation, then r_{i,j}(k) = r_{i⊕k,j⊕k}(0).
A Transform Function to Redefine Equations
000 ⊕ 010 = 010        100 ⊕ 010 = 110
001 ⊕ 010 = 011        101 ⊕ 010 = 111
010 ⊕ 010 = 000        110 ⊕ 010 = 100
011 ⊕ 010 = 001        111 ⊕ 010 = 101
Figure 5: The operator ⊕ is bit-wise exclusive-or. Let r_{i,j}(k) be the probability that k results
from the recombination of strings i and j. If recombination is a combination of crossover
and mutation then r_{i,j}(k) = r_{i⊕k,j⊕k}(0). The strings are reordered with respect to 010.
Recall that s denoted the representation of strings in the population during the
intermediate phase as the genetic algorithm goes from generation t to t + 1 (after selection, but
before recombination). To complete the cycle and reach a point at which the Vose and
Liepins models can be executed in an iterative fashion, fitness information is now explicitly
introduced to transform the population at the beginning of iteration t + 1 to the next
intermediate population. A fitness matrix F is defined such that fitness information is stored
along the diagonal; the i,i-th element is given by f(i) where f is the evaluation function.
The transformation from the vector p_{t+1} to the next intermediate population represented
by s_{t+1} is given as follows:
s_{t+1} ∼ F M(s_t).
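One full cycle of the model, selection to form s followed by recombination through the r_{i,j}
probabilities, can be executed directly for the 3-bit case. The fitness function below is a
made-up example; the check is only that the resulting proportions are exact and sum to one:

```python
from fractions import Fraction

L = 3
S = [format(n, "03b") for n in range(2 ** L)]
f = {s: 1 + s.count("1") for s in S}     # hypothetical fitness function

def r(i, j, k):
    # probability that 1-point crossover of i and j produces string k
    p = Fraction(0)
    for c in range(1, L):
        kids = (i[:c] + j[c:], j[:c] + i[c:])
        p += Fraction(sum(x == k for x in kids), 2)
    return p / (L - 1)

# start from uniform proportions p_t, apply fitness-proportionate
# selection to get the intermediate population s_t
p_t = {s: Fraction(1, len(S)) for s in S}
fbar = sum(p_t[s] * f[s] for s in S)
s_t = {s: p_t[s] * f[s] / fbar for s in S}

# recombination: p_{t+1}(k) = sum over i,j of s_i s_j r_{i,j}(k)
p_next = {k: sum(s_t[i] * s_t[j] * r(i, j, k) for i in S for j in S)
          for k in S}
assert sum(p_next.values()) == 1   # an exact distribution over the strings
print({k: str(v) for k, v in p_next.items()})
```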
Vose and Liepins give equations for calculating the mixing matrix M which not only
include probabilities for 1-point crossover, but also mutation. More complex extensions of
the Vose and Liepins model include finite population models using Markov chains (Nix and
Vose, 1992). Vose (1993) surveys the current state of this research.
8 Other Models of Evolutionary Computation
There are several population based algorithms that are either spin-offs of Holland's genetic
algorithm, or which were developed independently. Evolution Strategies and Evolutionary
Programming refer to two computational paradigms that use a population based search.
Evolutionary Programming is based on the early book by L. Fogel, Owens and Walsh
(1966) entitled, Artificial Intelligence Through Simulated Evolution. The individuals, or
"organisms," in this study were finite state machines. Organisms that best solved some
target function obtained the opportunity to reproduce. Parents were mutated to create
offspring. There has been renewed interest in Evolutionary Programming as reflected by the
1992 First Annual Conference on Evolutionary Programming (Fogel and Atmar 1992).
Evolution Strategies are based on the work of Rechenberg (1973) and Schwefel (1975;
1981) and are discussed in a survey by Back, Hoffmeister and Schwefel (1991). Two examples
of Evolution Strategies (ES) are the (μ+λ)-ES and the (μ,λ)-ES. In a (μ+λ)-ES, μ parents
produce λ offspring; the population is then reduced again to μ parents by selecting the best
solutions from among both the parents and offspring. Thus, parents survive until they are
replaced by better solutions. The (μ,λ)-ES is closer to the generational model used in canonical
genetic algorithms; offspring replace parents and then undergo selection. Recombination
operators for evolution strategies also tend to differ from Holland-style crossover, allowing
operations such as averaging parameters, for example, to create an offspring.
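The two survival schemes can be sketched in a few lines. Everything here (the sphere
objective, the mutation step size, the population sizes) is illustrative rather than taken from
the sources cited above:

```python
import random

def es_step(parents, mu, lam, fitness, plus, rng):
    # one generation of a (mu+lambda)- or (mu,lambda)-ES with Gaussian
    # mutation as the only variation operator (minimization)
    offspring = [[x + rng.gauss(0, 0.1) for x in rng.choice(parents)]
                 for _ in range(lam)]
    pool = parents + offspring if plus else offspring
    return sorted(pool, key=fitness)[:mu]

sphere = lambda v: sum(x * x for x in v)   # toy objective
rng = random.Random(1)
pop = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(10)]
best0 = min(map(sphere, pop))
for _ in range(50):
    pop = es_step(pop, mu=10, lam=20, fitness=sphere, plus=True, rng=rng)
# with the "+" strategy survival is elitist, so the best never worsens
assert min(map(sphere, pop)) <= best0
print(min(map(sphere, pop)))
```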
8.1 Genitor
Genitor (Whitley 1988; 1989) was the first of what Syswerda (1989) has termed "steady
state" genetic algorithms. The name "steady state" is somewhat misleading, since these
algorithms show more variance than canonical genetic algorithms in terms of hyperplane
sampling behavior (Syswerda, 1991) and are therefore more susceptible to sample error and
genetic drift. The advantage is that the best points found in the search are maintained in the
population. This results in a more aggressive search that in practice is often quite effective.
There are three differences between Genitor-style algorithms and canonical genetic
algorithms. First, reproduction produces one offspring at a time. Two parents are selected for
reproduction and produce an offspring that is immediately placed back into the population.
The second major difference is in how that offspring is placed back in the population.
Offspring do not replace parents, but rather the least fit (or some relatively less fit) member of
the population. In Genitor, the worst individual in the population is replaced. The third
difference between Genitor and most other forms of genetic algorithms is that fitness is
assigned according to rank rather than by fitness proportionate reproduction. Ranking helps
to maintain a more constant selective pressure over the course of search.
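A minimal Genitor-style step might look as follows. This is an illustrative sketch: the
rank-biased selection here picks the better of two uniform draws, which behaves like a linear
bias of roughly 2.0, and mutation is omitted:

```python
import random

def genitor_step(pop, fitness, rng):
    # one steady-state step: rank the population, select two parents with
    # a rank bias, create one offspring, and replace the worst string
    pop.sort(key=fitness, reverse=True)            # best first
    pick = lambda: min(rng.randrange(len(pop)), rng.randrange(len(pop)))
    a, b = pop[pick()], pop[pick()]
    c = rng.randrange(1, len(a))                   # 1-point crossover
    pop[-1] = a[:c] + b[c:]                        # the worst is replaced
    return pop

ones = lambda s: s.count("1")                      # toy objective
rng = random.Random(0)
pop = ["".join(rng.choice("01") for _ in range(20)) for _ in range(30)]
best0 = max(map(ones, pop))
for _ in range(500):
    genitor_step(pop, ones, rng)
assert max(map(ones, pop)) >= best0   # the best string is never lost
print(max(map(ones, pop)))
```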
Goldberg and Deb (1991) have shown that replacing the worst member of the population
generates much higher selective pressure than random replacement. But higher selective
pressure is not the only difference between Genitor and the canonical genetic algorithm.
To borrow terminology used by the Evolution Strategy community (as suggested by Larry
Eshelman), Genitor is a (μ+λ) strategy while the canonical genetic algorithm is a (μ,λ)
strategy. Thus, the accumulation of improved strings in the population is monotonic.
8.2 CHC
Another genetic algorithm that monotonically collects the best strings found so far is the
CHC algorithm developed by Larry Eshelman (1991). CHC stands for Cross generational
elitist selection, Heterogeneous recombination (by incest prevention) and Cataclysmic mutation,
which is used to restart the search when the population starts to converge.
CHC explicitly borrows from the (μ+λ) strategy of Evolution Strategies. After recombination,
the N best unique individuals are drawn from the parent population and offspring
population to create the next generation. Duplicates are removed from the population. As
Goldberg has shown with respect to Genitor, this kind of "survival of the fittest" replacement
method already imposes considerable selective pressure, so that there is no real need
to use any other selection mechanisms. Thus CHC uses random selection, except that
restrictions are imposed on which strings are allowed to mate. Strings with binary encodings
must be a certain Hamming distance away from one another before they are allowed to
reproduce. This form of "incest prevention" is designed to promote diversity. Eshelman also
uses a form of uniform crossover called HUX where exactly half of the differing bits are
swapped during crossover. CHC is typically run using small population sizes (e.g. 50); thus
using uniform crossover in this context is consistent with DeJong and Spears' (1991) conjecture
that uniform crossover can provide better sampling coverage in the context of small populations.
The rationale behind CHC is to have a very aggressive search (by using monotonic
selection through survival of the best strings) and to offset the aggressiveness of the search
by using highly disruptive operators such as uniform crossover. With such small population
sizes, however, the population converges to the point that it begins to more or less reproduce
many of the same strings. At this point the CHC algorithm uses cataclysmic mutation.
All strings undergo heavy mutation, except that the best string is preserved intact. After
mutation, genetic search is restarted using only crossover.
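HUX and the incest-prevention test are simple to sketch. This is illustrative code; the
distance threshold is a placeholder, not Eshelman's exact setting:

```python
import random

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def hux(a, b, rng):
    # HUX: swap exactly half of the differing bits, chosen at random
    diff = [i for i in range(len(a)) if a[i] != b[i]]
    swap = set(rng.sample(diff, len(diff) // 2))
    ca = "".join(b[i] if i in swap else a[i] for i in range(len(a)))
    cb = "".join(a[i] if i in swap else b[i] for i in range(len(a)))
    return ca, cb

rng = random.Random(3)
a, b = "0000000000", "1111111111"
can_mate = hamming(a, b) > len(a) // 4     # incest-prevention style test
ca, cb = hux(a, b, rng)
print(can_mate)          # True
print(hamming(a, ca))    # exactly half of the 10 differing bits moved: 5
print(hamming(ca, cb))   # the offspring still differ in every position: 10
```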
Experimental researchers and theoreticians are particularly divided on the issue of
hybridization. By adding hill-climbing or hybridizing with some other optimization methods,
learning is being added to the evolution process. Coding the learned information back onto
the chromosome means that the search utilizes a form of Lamarckian evolution. The
chromosomes improved by local hill-climbing or other methods are placed in the genetic population
and allowed to compete for reproductive opportunities.
The main criticism is that if we wish to preserve the schema processing capabilities of
the genetic algorithm, then Lamarckian learning should not be used. Changing information
in the offspring inherited from the parents results in a loss of inherited schemata. This alters
the statistical information about hyperplane partitions that is implicitly contained in the
population. Therefore using local optimization to improve each offspring undermines the
genetic algorithm's ability to search via hyperplane sampling.
Despite the theoretical objections, hybrid genetic algorithms typically do well at
optimization tasks. There may be several reasons for this. First, the hybrid genetic algorithm
is hill-climbing from multiple points in the search space. Unless the objective function is
severely multimodal it may be likely that some strings (offspring) will be in the basin of
attraction of the global solution, in which case hill-climbing is a fast and effective form of
search. Second, a hybrid strategy impairs hyperplane sampling, but does not disrupt it
entirely. For example, using local optimization to improve the initial population of strings
only biases the initial hyperplane samples, but does not interfere with subsequent hyperplane
sampling. Third, in general hill-climbing may find a small number of significant improvements,
but may not dramatically change the offspring. In this case, the effects on schemata
and hyperplane sampling may be minimal.
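A Lamarckian hybrid step can be sketched as below: each string is improved by greedy
bit-flip hill-climbing and the improved string itself is written back into the population. The
one-max fitness function is a toy illustration:

```python
import random

def hillclimb(s, fitness):
    # greedy single-bit hill-climbing; the improved string replaces the
    # original chromosome (Lamarckian learning)
    improved = True
    while improved:
        improved = False
        for i in range(len(s)):
            t = s[:i] + ("1" if s[i] == "0" else "0") + s[i + 1:]
            if fitness(t) > fitness(s):
                s, improved = t, True
    return s

ones = lambda s: s.count("1")
rng = random.Random(7)
pop = ["".join(rng.choice("01") for _ in range(12)) for _ in range(4)]
pop = [hillclimb(s, ones) for s in pop]    # improve, then reinsert
print(pop == ["1" * 12] * 4)   # one-max is unimodal, so every climb succeeds
```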
In practice there may be clues as to when hill-climbing is a dominant factor in a search.
Hyperplane sampling requires larger populations. Small populations are much more likely
to rely on hill-climbing. A population of 20 individuals just doesn't provide very much
information about hyperplane partitions, except perhaps very low order hyperplanes (there
are only 5 samples of each order-2 hyperplane in a population of 20). Second, very high
selective pressure suggests hill-climbing may dominate the search. If the 5 best individuals
in a population of 100 strings reproduce 95% of the time, then the effective population size
may not be large enough to support hyperplane sampling.
of tournament selection is the same in expectation as ranking using a linear 2.0 bias. If a
winner is chosen probabilistically from a tournament of 2, then the ranking is linear and the
bias is proportional to the probability with which the best string is chosen.
With the addition of tournament selection, a parallel form of the canonical genetic
algorithm can now be implemented in a fairly direct fashion. Assume the processors are
numbered 1 to N/2 and the population size, N, is even; 2 strings reside at each processor.
Each processor holds two independent tournaments by randomly sampling strings in the
population and each processor then keeps the winners of the two tournaments. The new
strings that now reside in the processors represent the intermediate generation. Crossover
and evaluation can now occur in parallel.
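The parallel scheme reduces to the following per-slot computation (a sequential sketch of
what the processors would do in parallel; N independent 2-tournaments produce the
intermediate generation):

```python
import random

def tournament_intermediate(pop, fitness, rng):
    # one independent 2-tournament per population slot; the winners
    # form the intermediate generation
    inter = []
    for _ in range(len(pop)):
        a, b = rng.choice(pop), rng.choice(pop)
        inter.append(a if fitness(a) >= fitness(b) else b)
    return inter

ones = lambda s: s.count("1")
rng = random.Random(5)
pop = ["".join(rng.choice("01") for _ in range(8)) for _ in range(20)]
inter = tournament_intermediate(pop, ones, rng)
assert len(inter) == len(pop) and set(inter) <= set(pop)
print(min(map(ones, inter)) >= min(map(ones, pop)))   # True
```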
An Island Model Genetic Algorithm A Cellular Genetic Algorithm
Figure 6: An example of both an island model and a cellular genetic algorithm. The coloring
of the cells in the cellular genetic algorithm represents genetically similar material that forms
virtual islands isolated by distance. The arrows in the cellular model indicate that the grid
wraps around to form a torus.
One can obviously assign one string per processor or cell. But global random mating
would now seem inappropriate given the communication restrictions. Instead, it is much
more practical to have each string (i.e., processor) seek a mate close to home. Each processor
can pick the best string in its local neighborhood to mate with, or alternatively, some form
of local probabilistic selection could be used. In either case, only one offspring is produced
and becomes the new resident at that processor. Several people have proposed this type
of computational model (Manderick and Spiessens, 1989; Collins and Jefferson, 1991; Hillis,
1990; Davidor, 1991). The common theme in cellular genetic algorithms is that selection
and mating are typically restricted to a local neighborhood.
There are no explicit islands in the model, but there is the potential for similar effects.
Assuming that mating is restricted to adjacent processors, if one neighborhood of strings is
20 or 25 moves away from another neighborhood of strings, these neighborhoods are just as
isolated as two subpopulations on separate islands. This kind of separation is referred to as
isolation by distance (Wright, 1932; Muhlenbein, 1991; Gorges-Schleuter, 1991). Of course,
neighbors that are only 4 or 5 moves away have a greater potential for interaction.
After the first random population is evaluated, the pattern of strings over the set of
processors should also be random. After a few generations, however, there emerge many
small local pockets of similar strings with similar fitness values. Local mating and selection
create local evolutionary trends, again due to sampling effects in the initial population and
genetic drift. After several generations, competition between local groups will result in fewer
and larger neighborhoods.
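A cellular step on a torus can be sketched as follows (illustrative only: synchronous update,
best-neighbour mating among the four adjacent cells, and one offspring kept per cell):

```python
import random

def cellular_step(grid, fitness, rng):
    # every cell mates with the best of its N/S/E/W neighbours on a
    # torus and keeps the single offspring
    n = len(grid)
    new = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            nbrs = [grid[(i - 1) % n][j], grid[(i + 1) % n][j],
                    grid[i][(j - 1) % n], grid[i][(j + 1) % n]]
            mate = max(nbrs, key=fitness)
            c = rng.randrange(1, len(mate))        # 1-point crossover
            new[i][j] = grid[i][j][:c] + mate[c:]
    return new

ones = lambda s: s.count("1")
rng = random.Random(9)
grid = [["".join(rng.choice("01") for _ in range(10)) for _ in range(8)]
        for _ in range(8)]
for _ in range(20):
    grid = cellular_step(grid, ones, rng)
cells = [s for row in grid for s in row]
assert len(cells) == 64 and all(len(s) == 10 for s in cells)
print(max(map(ones, cells)))
```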
11 Conclusions
One thing that is striking about genetic algorithms and the various parallel models is the
richness of this form of computation. What may seem like simple changes in the algorithm
often result in surprising kinds of emergent behavior. Recent theoretical advances have also
improved our understanding of genetic algorithms and have opened the door to using
more advanced analytical methods.
Many other timely issues have not been covered in this tutorial. In particular, the issue
of deception has not been discussed. The notion of deception, in simplistic terms, deals with
conflicting hyperplane competitions that have the potential to either mislead the genetic
algorithm, or to simply confound the search because the conflicting hyperplane competitions
interfere with the search process. For an introduction to the notion of deception see Goldberg
(1987) and Whitley (1991); for a criticism of the work on deception see Grefenstette (1993).
Acknowledgements: This tutorial not only represents information transmitted through scholarly
works, but also through conference presentations, personal discussions, debates and even disagree-
ments. My thanks to the people in the genetic algorithm community who have educated me over
the years. Any errors or errant interpretations of other works are my own. Work presented in the
tutorial was supported by NSF grant IRI-9010546 and the Colorado Advanced Software Institute.
References
Ackley, D. (1987) A Connectionist Machine for Genetic Hillclimbing. Kluwer Academic Publishers.
Antonisse, H.J. (1989) A New Interpretation of the Schema Notation that Overturns the Binary
Encoding Constraint. Proc 3rd International Conf on Genetic Algorithms, Morgan-Kaufmann.
Back, T., Hoffmeister, F. and Schwefel, H.P. (1991) A Survey of Evolution Strategies. Proc. 4th
International Conf. on Genetic Algorithms, Morgan-Kaufmann.
Baker, J. (1985) Adaptive selection methods for genetic algorithms. Proc. International Conf. on
Genetic Algorithms and Their Applications. J. Grefenstette, ed. Lawrence Erlbaum.
Baker, J. (1987) Reducing Bias and Inefficiency in the Selection Algorithm. Genetic Algorithms
and Their Applications: Proc. Second International Conf. J. Grefenstette, ed. Lawrence Erlbaum.
Booker, L. (1987) Improving Search in Genetic Algorithms. In, Genetic Algorithms and Simulated
Annealing, L. Davis, ed. Morgan Kaufman, pp. 61-73.
Bridges, C. and Goldberg, D. (1987) An analysis of reproduction and crossover in a binary-coded
genetic algorithm. Proc. 2nd International Conf. on Genetic Algorithms and Their Applications.
J. Grefenstette, ed. Lawrence Erlbaum.
Collins, R. and Jefferson, D. (1991) Selection in Massively Parallel Genetic Algorithms. Proc. 4th
International Conf. on Genetic Algorithms, Morgan-Kaufmann, pp 249-256.
Davidor, Y. (1991) A Naturally Occurring Niche & Species Phenomenon: The Model and First
Results. Proc 4th International Conf on Genetic Algorithms, Morgan-Kaufmann, pp 257-263.
Davis, L.D. (1991) Handbook of Genetic Algorithms. Van Nostrand Reinhold.
DeJong, K. (1975) An Analysis of the Behavior of a Class of Genetic Adaptive Systems. PhD
Dissertation. Dept. of Computer and Communication Sciences, Univ. of Michigan, Ann Arbor.
Eshelman, L. (1991) The CHC Adaptive Search Algorithm. Foundations of Genetic Algorithms,
G. Rawlins, ed. Morgan-Kaufmann. pp 256-283.
Fitzpatrick, J.M. and Grefenstette, J.J. (1988) Genetic Algorithms in Noisy Environments. Ma-
chine Learning, 3(2/3): 101-120.
Fogel, L.J., Owens, A.J., and Walsh, M.J. (1966) Artificial Intelligence Through Simulated
Evolution. John Wiley.
Fogel, D., and Atmar, J.W., eds. (1992) First Annual Conference on Evolutionary Programming.
Goldberg, D. and Bridges, C. (1990) An Analysis of a Reordering Operator on a GA-Hard Problem.
Biological Cybernetics, 62:397-405.
Goldberg, D. (1987) Simple Genetic Algorithms and the Minimal, Deceptive Problem. In, Genetic
Algorithms and Simulated Annealing, L. Davis, ed., Pitman.
Goldberg, D. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading,
MA: Addison-Wesley.
Goldberg, D. (1990) A Note on Boltzmann Tournament Selection for Genetic Algorithms and
Population-oriented Simulated Annealing. TCGA 90003, Engineering Mechanics, Univ. Alabama.
Goldberg, D. (1991) The Theory of Virtual Alphabets. Parallel Problem Solving from Nature,
Springer Verlag.
Goldberg, D., and Deb, K. (1991) A Comparative Analysis of Selection Schemes Used in Genetic
Algorithms. Foundations of Genetic Algorithms, G. Rawlins, ed. Morgan-Kaufmann. pp 69-93.
Gorges-Schleuter, M. (1991) Explicit Parallelism of Genetic Algorithms through Population Struc-
tures. Parallel Problem Solving from Nature, Springer Verlag, pp 150-159.
Grefenstette, J.J. (1986) Optimization of Control Parameters for Genetic Algorithms. IEEE Trans.
Systems, Man, and Cybernetics, 16(1): 122-128.
Grefenstette, J.J. and Baker, J. (1989) How Genetic Algorithms Work: A Critical Look at Implicit
Parallelism. Proc 3rd International Conf on Genetic Algorithms, Morgan-Kaufmann.
Grefenstette, J.J. (1993) Deception Considered Harmful. Foundations of Genetic Algorithms -2-,
D. Whitley, ed., Morgan Kaufmann. pp: 75-91.
Hillis, D. (1990) Co-Evolving Parasites Improve Simulated Evolution as an Optimizing Procedure.
Physica D 42, pp 228-234.
Holland, J. (1975) Adaptation In Natural and Artificial Systems. University of Michigan Press.
Liepins, G. and Vose, M. (1990) Representation Issues in Genetic Algorithms. Journal of
Experimental and Theoretical Artificial Intelligence, 2:101-115.
Manderick, B. and Spiessens P. (1989) Fine Grained Parallel Genetic Algorithms. Proc 3rd Inter-
national Conf on Genetic Algorithms, Morgan-Kaufmann, pp 428-433.
Michalewicz, Z. (1992) Genetic Algorithms + Data Structures = Evolutionary Programs. Springer-
Verlag, AI Series, New York.
Mühlenbein, H. (1991) Evolution in Time and Space - The Parallel Genetic Algorithm. Foundations
of Genetic Algorithms, G. Rawlins, ed. Morgan-Kaufmann. pp 316-337.
Mühlenbein, H. (1992) How Genetic Algorithms Really Work: I. Mutation and Hillclimbing. Parallel
Problem Solving from Nature -2-, R. Männer and B. Manderick, eds. North Holland.
Nix, A. and Vose, M. (1992) Modeling Genetic Algorithms with Markov Chains. Annals of Math-
ematics and Artificial Intelligence. 5:79-88.
Rechenberg, I. (1973) Evolutionsstrategie: Optimierung technischer Systeme nach Prinzipien der
biologischen Evolution. Frommann-Holzboog Verlag, Stuttgart.
Schaffer, J.D. (1987) Some Effects of Selection Procedures on Hyperplane Sampling by Genetic
Algorithms. In, Genetic Algorithms and Simulated Annealing, L. Davis, ed. Pitman.
Schaffer, J.D., and Eshelman, L. (1993) Real-Coded Genetic Algorithms and Interval Schemata.
Foundations of Genetic Algorithms -2-, D. Whitley, ed. Morgan-Kaufmann.
Schwefel, H.P. (1975) Evolutionsstrategie und numerische Optimierung. Dissertation, Technische
Universität Berlin.
Schwefel, H.P. (1981) Numerical Optimization of Computer Models. Wiley.
Spears, W. and DeJong, K. (1991) An Analysis of Multi-Point Crossover. Foundations of Genetic
Algorithms, G. Rawlins, ed. Morgan-Kaufmann.
Syswerda, G. (1989) Uniform Crossover in Genetic Algorithms. Proc 3rd International Conf on
Genetic Algorithms, Morgan-Kaufmann, pp 2-9.
Syswerda, G. (1991) A Study of Reproduction in Generational and Steady-State Genetic Algo-
rithms. Foundations of Genetic Algorithms, G. Rawlins, ed. Morgan-Kaufmann. pp 94-101.
Starkweather, T., Whitley, D., and Mathias, K. (1991) Optimization Using Distributed Genetic
Algorithms. Parallel Problem Solving from Nature, Springer Verlag.
Tanese, R. (1989) Distributed Genetic Algorithms. Proc 3rd International Conf on Genetic Algo-
rithms, Morgan-Kaufmann, pp 434-439.
Vose, M. (1993) Modeling Simple Genetic Algorithms. Foundations of Genetic Algorithms -2-, D.
Whitley, ed., Morgan Kaufmann. pp: 63-73.
Vose, M. and Liepins, G. (1991) Punctuated Equilibria in Genetic Search. Complex Systems 5:31-
44.
Whitley, D. (1989) The GENITOR Algorithm and Selective Pressure. Proc 3rd International Conf
on Genetic Algorithms, Morgan-Kaufmann, pp 116-121.
Whitley, D. (1991) Fundamental Principles of Deception in Genetic Search. Foundations of Genetic
Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Whitley, D. (1993a) An Executable Model of a Simple Genetic Algorithm. Foundations of Genetic
Algorithms -2-. D. Whitley, ed. Morgan Kaufmann.
Whitley, D. (1993b) Cellular Genetic Algorithms. Proc. 5th International Conference on Genetic
Algorithms. Morgan Kaufmann.
Whitley, D., and Kauth, J. (1988) GENITOR: a Different Genetic Algorithm. Proceedings of the
Rocky Mountain Conference on Artificial Intelligence, Denver, CO. pp 118-130.
Whitley, D. and Starkweather, T. (1990) Genitor II: a Distributed Genetic Algorithm. Journal
Expt. Theor. Artif. Intell., 2:189-214.
Whitley, D., Das, R., and Crabb, C. (1992) Tracking Primary Hyperplane Competitors During
Genetic Search. Annals of Mathematics and Artificial Intelligence. 6:367-388.
Winston, P. (1992) Artificial Intelligence, Third Edition. Addison-Wesley.
Wright, A. (1991) Genetic Algorithms for Real Parameter Optimization. Foundations of Genetic
Algorithms. G. Rawlins, ed. Morgan Kaufmann.
Wright, S. (1932) The Roles of Mutation, Inbreeding, Crossbreeding, and Selection in Evolution.
Proc. 6th Int. Congr. on Genetics, pp 356-366.