
The Genetic Algorithm in Computer Science

Eric Krevice Prebys

Abstract. The genetic algorithm is described, including its three main steps: selection,
crossover, and mutation. This description is used in the presentation of two methods for
analyzing genetic algorithms: schema analysis and mathematical modeling. The discus-
sion of schema analysis focuses on the Schema Theorem. Following it, an exact mathe-
matical model is described. First, the model is presented assuming an infinite population.
Then the model is made more accurate by assuming a finite population.

1. Introduction. All life on Earth is thought to evolve. Genetic algorithms are com-
puting algorithms constructed in analogy with the process of evolution [1]. Genetic
algorithms seem to be useful for searching very general spaces and poorly defined spaces.
It is hoped that, through more rigorous theoretical study, we will determine what sorts
of spaces genetic algorithms can search efficiently.
In biology, the gene is the basic unit of genetic storage [5]. Within cells, genes
are strung together to form chromosomes. The simplest possible sexual reproduction
is between single-cell organisms. The two cells fuse to produce a cell with two sets of
chromosomes, called a diploid cell. The diploid cell immediately undergoes meiosis. In
meiosis, each of the chromosomes in the diploid cell makes an exact copy of itself. Then
the chromosome groups (original and copy) undergo crossover with the corresponding
groups, mixing the genes somewhat. Finally the chromosomes separate twice, giving
four haploid cells. Mutation can occur at any stage, and any mutation in the chromo-
somes will be inheritable. Mutation is essential for evolution. There are three types
relevant to genetic algorithms: point mutations where a single gene is changed, chromo-
somal mutations where some number of genes are lost completely, and inversion where
a segment of the chromosome becomes flipped.
This paper follows Melanie Mitchell’s book [3]. We will introduce a simplified genetic
algorithm using the notions of sexual reproduction. This algorithm captures most of
the essential components of every genetic algorithm, so we will call it the Standard
Genetic Algorithm.
A simple way to view the Standard Genetic Algorithm is provided by schema. With
them we can understand the Schema Theorem, which explains how crossover allows a
genetic algorithm to zero in on an optimal solution. However, schema are inadequate
for determining some characteristics of the population, specifically the speed of
population convergence and the distribution of the population over time. To
deal with these issues, we will develop an exact mathematical model, first assuming an
infinite population, and then a finite population. This model was originally presented
by Michael Vose and G. E. Liepins in [2] and [7]. More details are found in Nix and
Vose [4] and in Vose [8]. In practice, the genetic algorithms used are much more
complex than the Standard Genetic Algorithm. Consequently, the analyses
presented in this paper have some major weaknesses. Specifically, schema analysis


makes some approximations that weaken it; as a result, many in the field see little
value in pursuing it further. On the other hand, the exact mathematical models are far too
complex. Still, the analyses are worthy of study because they have made possible proofs
of some rather interesting theorems, and because they provide a nice introduction to
algorithmic analysis.
In Section 2, we will describe the Standard Genetic Algorithm. In Section 3, we
will describe Schema and present the Schema Theorem. In Section 4, we will describe a
mathematical model of the Standard Genetic Algorithm, requiring an infinite population
approximation. Finally, in Section 5, we will correct the model by requiring a finite
population.

2. The Standard Genetic Algorithm. The Standard Genetic Algorithm follows the
method of haploid sexual reproduction. In the Standard Genetic Algorithm, the
population is a set of individual binary integers such as 1001011. Each individual
represents the chromosome of a lifeform. There is some function that determines how
fit each individual is, and another function that selects individuals from the population
to reproduce. The two selected chromosomes cross over and split again. Next, the two
new individuals mutate. The process is then repeated a certain number of times. Let's
consider each of these notions in more detail.
Fitness is a measure of the goodness of a chromosome, that is, a measure of
how well the chromosome fits the search space, or solves the problem at hand. For
the Standard Genetic Algorithm, the fitness f is a function from the set of possible
chromosomes to the positive reals.
Selection is a process for choosing a pair of organisms to reproduce. The selection
function can be any increasing function, but we will concentrate on fitness-proportionate
selection, whose selection function is the probability function

$$p_s(x_i) = \frac{f(x_i)}{\sum_{k=1}^{n} f(x_k)}$$

on the population $\{x_1, \ldots, x_n\}$.
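
As an illustration, here is a minimal Python sketch of fitness-proportionate ("roulette wheel") selection; the fitness function (number of 1 bits) and the sample population are our own assumptions for the example, not taken from the paper.

```python
import random

def fitness(chromosome: str) -> float:
    # Illustrative fitness (an assumption): the number of 1 bits in the chromosome.
    return float(chromosome.count("1"))

def select_one(population: list[str]) -> str:
    # Fitness-proportionate selection: individual x_i is drawn with
    # probability f(x_i) / sum_k f(x_k).
    total = sum(fitness(x) for x in population)
    r = random.uniform(0, total)
    running = 0.0
    for x in population:
        running += fitness(x)
        if r <= running:
            return x
    return population[-1]  # guard against floating-point round-off

population = ["1001011", "0001000", "1111111", "0101010"]
parents = (select_one(population), select_one(population))
```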


Crossover is a process of exchanging the genes between the two individuals that
are reproducing. There are several such processes, but we will consider only one-point
crossover, a process that is both standard and simple. A random integer i is selected
uniformly between 1 and l − 1, where l is the length of the chromosome. This is the place in the chromosome at which, with prob-
ability pc , crossover will occur. If crossover does occur, then the chunks up to i of the
two chromosomes are swapped. For example, the chromosome 11111111 when crossed
with 00101010 at i = 4 gives the chromosomes 00101111 and 11111010.
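
A minimal sketch of one-point crossover, with the crossover probability pc left as a parameter; the function name and the reproduction of the text's i = 4 example are our own.

```python
import random

def one_point_crossover(a: str, b: str, pc: float) -> tuple[str, str]:
    # With probability pc, choose a point i in {1, ..., l-1} and swap the
    # chunks up to i; otherwise return the parents unchanged.
    l = len(a)
    if random.random() < pc:
        i = random.randint(1, l - 1)
        return b[:i] + a[i:], a[:i] + b[i:]
    return a, b

# Reproducing the example from the text at a fixed point i = 4:
a, b, i = "11111111", "00101010", 4
print(b[:i] + a[i:], a[:i] + b[i:])  # 00101111 11111010
```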
Mutation is the process of randomly altering the chromosomes. Say that pm is the
probability that bit i will be flipped. Let i vary from 1 to l. For each i a random
number is selected uniformly between 0 and 1. If the number is less than pm , then the
bit is flipped.
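
The per-bit mutation step can be sketched the same way; the value of pm in the usage line is an arbitrary illustration.

```python
import random

def mutate(chromosome: str, pm: float) -> str:
    # For each bit, draw a uniform number in [0, 1); flip the bit when the
    # draw is less than the mutation probability pm.
    bits = []
    for bit in chromosome:
        bits.append(("1" if bit == "0" else "0") if random.random() < pm else bit)
    return "".join(bits)

child = mutate("00101111", pm=0.01)
```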
Using the preceding notions, we now describe the seven steps in the Standard
Genetic Algorithm (a brief code sketch follows the list):
1. Start with a population of n random individuals each with l-bit chromosomes.
2. Calculate the fitness f (x) of each individual.
3. Choose, based on fitness, two individuals and call them parents. Remove the
parents from the population.

4. Use a random process to determine whether to perform crossover. If so, refer to
the output of the crossover as the children. If not, simply refer to the parents as
the children.
5. Mutate the children with probability pm of mutation for each bit.
6. Put the two children into an empty set called the new generation.
7. Return to Step 3 until the new generation contains n individuals. Delete one child
at random if n is odd. Then replace the old population with the new generation
and return to Step 2.
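
Putting the seven steps together, the following sketch reuses the select_one, one_point_crossover, and mutate helpers sketched above; the parameter values are illustrative assumptions, and the sketch assumes an even population size so that children always come in pairs.

```python
import random

def standard_genetic_algorithm(n=20, l=8, pc=0.7, pm=0.01, generations=50):
    assert n % 2 == 0  # this sketch assumes an even population size
    # Step 1: a population of n random l-bit chromosomes.
    population = ["".join(random.choice("01") for _ in range(l)) for _ in range(n)]
    for _ in range(generations):
        new_generation = []        # Step 6 collects the children here
        pool = list(population)
        while len(new_generation) < n:
            # Steps 2-3: fitness-based choice of two parents, removed from the pool.
            p1 = select_one(pool); pool.remove(p1)
            p2 = select_one(pool); pool.remove(p2)
            # Step 4: crossover with probability pc.
            c1, c2 = one_point_crossover(p1, p2, pc)
            # Step 5: per-bit mutation with probability pm.
            new_generation += [mutate(c1, pm), mutate(c2, pm)]
        # Step 7: replace the old population and start the next generation.
        population = new_generation
    return population
```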

3. Schema. A schema is a template made up of a string of 1s, 0s, and ∗s, where ∗ is
used as a wild card that can be either 1 or 0. For example, H = 1∗∗0∗0 is a schema;
it has eight instances, one of which is 101010. The number of non-∗, or defined, bits
in a schema is called its order and denoted o(H). In the example, H has order 3. The
greatest distance between two defined bits is the defining length d(H). In the example,
H has a defining length of 5. In the discussion that follows, we use the term 'schema' to
refer both to the template and to the set of instances it defines within a population.
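
These definitions are easy to state in code; the following small Python sketch (our own, not from the paper) computes the order and defining length of a schema and tests whether a chromosome is an instance of it.

```python
def order(schema: str) -> int:
    # Number of defined (non-*) positions.
    return sum(1 for c in schema if c != "*")

def defining_length(schema: str) -> int:
    # Distance between the first and the last defined positions.
    defined = [i for i, c in enumerate(schema) if c != "*"]
    return defined[-1] - defined[0]

def is_instance(chromosome: str, schema: str) -> bool:
    # A chromosome is an instance of a schema if it matches every defined bit.
    return all(s == "*" or s == c for s, c in zip(schema, chromosome))

H = "1**0*0"
print(order(H), defining_length(H), is_instance("101010", H))  # 3 5 True
```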
There is an interesting point to consider. Let S be the set of all strings of length l.
There are $3^l$ possible schema on S, but $2^{2^l}$ different subsets of S. Hence, schema cannot
be used to represent every possible population within S. On the other hand, schema
are thought to form a representative subset of the set of all subsets of S.
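
The gap between the two counts is already visible for tiny l, as this short loop shows:

```python
for l in range(1, 5):
    print(l, 3 ** l, 2 ** (2 ** l))
# l=1: 3 vs 4;  l=2: 9 vs 16;  l=3: 27 vs 256;  l=4: 81 vs 65536
```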
We now look at the expected number of instances of schema H as we iterate the
Standard Genetic Algorithm. Let m(H, t) be the number of instances of H at time t.
Let f(x) represent the fitness of chromosome x, and let $\bar f(t)$ represent the average
fitness of the population at time t, or
$$\bar f(t) = \frac{\sum_{x\in P} f(x)}{n},$$
where P is the population at time t and n = |P| is its size. Let $\bar u(H,t)$ represent the
average fitness of the instances of H at time t, or
$$\bar u(H,t) = \frac{\sum_{x\in H} f(x)}{m(H,t)},$$
where the sum runs over the instances of H in the population.
If we completely ignore the effects of crossover and mutation, then we get the expected
value
$$E(m(H,t+1)) = n\,\frac{\sum_{x\in H} f(x)}{\sum_{x\in P} f(x)} = \frac{\sum_{x\in H} f(x)}{\bar f(t)} = \frac{\bar u(H,t)\,m(H,t)}{\bar f(t)}. \tag{3-1}$$
Let’s consider only the effects of crossover and mutation that lower the number of
instances of H in the population. Then we will get a good lower bound on E(m(H, t+1)).
Let $S_c(H)$ be the probability that an instance of H survives crossover. Crossover can
destroy an instance only if crossover occurs, which happens with probability $p_c$, and the
crossover point falls between the defining bits of H, which happens with probability
$d(H)/(l-1)$. Then
$$S_c(H) \geq 1 - p_c\,\frac{d(H)}{l-1}.$$

Let $S_m(H)$ be the probability that an instance of H remains an instance of H after
mutation. Then $S_m(H)$ depends on the order of H: each of the $o(H)$ defined bits must
survive mutation. If the probability of mutation is $p_m$ per bit, then
$$S_m(H) = (1 - p_m)^{o(H)}.$$

Therefore we can alter Equation (3-1) to get the following theorem, proved by John
Holland.

Schema Theorem. With the above notation, we have
$$E(m(H,t+1)) \;\geq\; \frac{\bar u(H,t)\,m(H,t)}{\bar f(t)}\left(1 - p_c\,\frac{d(H)}{l-1}\right)(1-p_m)^{o(H)}. \tag{3-2}$$
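
For concreteness, the right-hand side of (3-2) is easy to evaluate numerically; the specific numbers below are illustrative assumptions, not data from the paper.

```python
def schema_theorem_bound(u_H, m_H, f_bar, d_H, o_H, l, pc, pm):
    # Lower bound (3-2) on the expected number of instances of H
    # in the next generation.
    return (u_H * m_H / f_bar) * (1 - pc * d_H / (l - 1)) * (1 - pm) ** o_H

# Example: 10 instances of H with average fitness 6 in a population of average
# fitness 5, with l = 8, d(H) = 2, o(H) = 3, pc = 0.7, pm = 0.01.
print(schema_theorem_bound(6, 10, 5, 2, 3, 8, 0.7, 0.01))  # about 9.31
```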

This theorem is often interpreted to mean that, if $\bar u(H,t) > \bar f(t)$, then there will
be exponentially more instances of schema with low defining length and low order. As
desirable as this interpretation is, it is considered questionable by many. Still, the
theorem does justify, to some degree, the Building Block Hypothesis, which states that
all genetic algorithms follow a certain pattern: schema with low order and low defining
length are optimized, and then, thanks to crossover, combined into schema of higher
order and greater defining length. Both of these theories have been carefully examined
in recent years.

4. Infinite Population Assumption. The exact mathematical model of the genetic
algorithm uses an algorithm slightly different from the Standard Genetic Algorithm.
Referring back to Step 5 in Section 2, change it so that one of the new individuals will
be immediately deleted. Thus the loop is iterated n times, not n/2, where n is the size
of the population. This is the only change.
As before, let S be the set of all strings of length l. Let N be the size of S, or $2^l$. Let
$\vec p(t)$ be a column vector with $2^l$ rows such that the ith component $\vec p_i(t)$ is equal to the
proportion of the population P at time t that has chromosome i. Let $\vec s(t)$ be another
column vector with $2^l$ rows such that the ith component $\vec s_i(t)$ is equal to the probability
that chromosome i will be selected as a parent. For example, if l = 2 and there are three
individuals in the population, two with chromosome 10, and one with chromosome 11,
then $\vec p(t) = (0, 0, \tfrac{2}{3}, \tfrac{1}{3})^T$. If the fitness is equal to the number of 1s in the string, then
$\vec s(t) = (0, 0, \tfrac{1}{2}, \tfrac{1}{2})^T$. The model is making an infinite population assumption in that the
size of the population is never set. We keep track of proportions of chromosomes and
not actual numbers.
Let $F = (F_{i,j})$ be a diagonal $N \times N$ matrix with $F_{i,i} = f(i)$. There is a simple
relation between $\vec p$ and $\vec s$, which is equivalent to the definition of fitness-proportionate
selection:
$$\vec s(t) = \frac{F\vec p(t)}{|F\vec p(t)|}. \tag{4-1}$$
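
A quick numerical check of (4-1) on the l = 2 example above; here |·| is read as the sum of the entries, so that $\vec s(t)$ is a probability vector, and the use of NumPy is our own choice.

```python
import numpy as np

f = np.array([0.0, 1.0, 1.0, 2.0])  # fitness = number of 1s, chromosomes 00, 01, 10, 11
F = np.diag(f)

p = np.array([0.0, 0.0, 2/3, 1/3])  # two copies of 10 and one copy of 11
s = F @ p / np.sum(F @ p)           # equation (4-1)
print(s)                            # [0.  0.  0.5 0.5]
```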
Our goal is the following: given a column vector $\vec x$, to construct a column-vector-valued
function $\mathcal M(\vec x)$ such that
$$\mathcal M(\vec s(t)) = \vec p(t+1).$$

We say that $\mathcal M$ represents recombination, where recombination is taken to mean the
composition of crossover and mutation.
Let $i \oplus j$ denote the componentwise sum of i and j mod 2, and let $i \otimes j$ denote the
componentwise product of i and j. Let $M = (M_{i,j})$ be the matrix whose i, jth entry
$M_{i,j}$ is the probability that 0 results from the recombination of i and j. An expression
for M has been derived explicitly, but the derivation is complex, though not particularly
difficult. Let $\sigma_j$ be the permutation operator on $\mathbb{R}^N$ given by
$$\sigma_j (y_0, \cdots, y_{N-1})^T = (y_{j \oplus 0}, \cdots, y_{j \oplus (N-1)})^T.$$




Then $\mathcal M(\vec s)$ is given by the following expression, due to Vose [8, p. 65]:
$$\mathcal M(\vec s(t)) = \bigl( (\sigma_0 \vec s(t))^T M \, \sigma_0 \vec s(t), \;\cdots,\; (\sigma_{2^l-1} \vec s(t))^T M \, \sigma_{2^l-1} \vec s(t) \bigr)^T.$$
With this expression, we can calculate explicitly the expected value of each gener-
ation from the preceding generation. The full derivation is fairly complex. It can be
found in Vose and Liepins [7, pp. 33–39].
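
The operator itself is straightforward to code once the mixing matrix M is known; the sketch below takes M as an input, since deriving M for a given pc and pm is exactly the complicated part omitted above. The sanity check uses a deliberately trivial recombination (return one parent uniformly at random), for which $\mathcal M(\vec s)$ should equal $\vec s$; everything here is our own illustration.

```python
import numpy as np

def recombination_operator(s: np.ndarray, M: np.ndarray) -> np.ndarray:
    # Component k of the next-generation proportions is (sigma_k s)^T M (sigma_k s),
    # where sigma_k sends index i to the entry at k XOR i.
    N = len(s)
    out = np.empty(N)
    for k in range(N):
        sk = np.array([s[k ^ i] for i in range(N)])  # sigma_k applied to s
        out[k] = sk @ M @ sk
    return out

# Trivial recombination: return one of the two parents uniformly at random,
# so M_{i,j} = ([i == 0] + [j == 0]) / 2 and the operator should fix s.
N = 4
M = np.array([[(float(i == 0) + float(j == 0)) / 2 for j in range(N)] for i in range(N)])
s = np.array([0.0, 0.0, 0.5, 0.5])
print(recombination_operator(s, M))  # [0.  0.  0.5 0.5]
```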

5. Finite Population Model of a General Genetic Algorithm. The information
on Markov chains in this section comes from Rota's book [6].
A sequence of integer random variables $X_0, X_1, \cdots, X_n$, called states, forms a Markov
chain if, for any integers $i_0, i_1, \cdots, i_n$, we have
$$P\bigl(X_n = i_n \,\big|\, (X_0 = i_0) \cap (X_1 = i_1) \cap \cdots \cap (X_{n-1} = i_{n-1})\bigr) = P\bigl(X_n = i_n \,\big|\, X_{n-1} = i_{n-1}\bigr),$$
where we've used conditional probability. In other words, in a Markov chain, each of
the states depends only upon the state immediately preceding it.
Let P be a population of strings from S, so n = |P|. Define $\vec\phi_i$ to be state i of
the population. The chromosomes are enumerated in the same fashion as they were
in the infinite population model. Then $(\vec\phi_i)_j$ is the number of times chromosome j
appears in state i. The states now make up a Markov chain.
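
For small n and l the population states can be enumerated directly as multisets; the sizes below are our own illustrative choices, and the count agrees with the number M of populations defined just below.

```python
from itertools import combinations_with_replacement
from math import comb

l, n = 2, 3
N = 2 ** l
# Each population state is a multiset of n chromosomes drawn from the N possible ones.
states = list(combinations_with_replacement(range(N), n))
print(len(states), comb(n + N - 1, N - 1))  # both are 20
```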
The transition probability $Q_{ij}$ from state i to state j is given by
$$Q_{ij} = P\bigl(\text{state } \vec\phi_j \text{ at time } t+1 \,\big|\, \text{state } \vec\phi_i \text{ at time } t\bigr),$$
where t + 1 is the iteration of the algorithm following t. The matrix Q = (Qij ) is called
the Markov transition matrix. An explicit expression for Q will describe the genetic
algorithm. We will now compute Q.
Define a matrix Z by
$$Z = \bigl(\vec\phi_0, \vec\phi_1, \cdots, \vec\phi_{M-1}\bigr),$$
so that $Z_{y,j} = (\vec\phi_j)_y$ is the number of copies of chromosome y in state j, where
$$M = \binom{n + 2^l - 1}{2^l - 1}$$
is the number of populations of size n. Let $p_i(y)$ be the probability that selection and
recombination on population $\vec\phi_i$ yields chromosome y. An expression for $Q_{i,j}$ in terms
of $p_i(y)$ can be easily derived probabilistically. Since chromosome y must be produced
$Z_{y,j}$ times, we have
$$\prod_{y=0}^{2^l - 1} p_i(y)^{Z_{y,j}} \tag{5-1}$$

as the probability that state j arises in any one particular order. But the event can
occur in many orders. The number of ways of choosing which $Z_{0,j}$ of the n offspring
are equal to chromosome 0 is $\binom{n}{Z_{0,j}}$; there are then $\binom{n - Z_{0,j}}{Z_{1,j}}$ ways of choosing the $Z_{1,j}$
offspring equal to chromosome 1, and so on, giving
$$\binom{n}{Z_{0,j}} \binom{n - Z_{0,j}}{Z_{1,j}} \cdots \binom{n - Z_{0,j} - \cdots - Z_{2^l-2,j}}{Z_{2^l-1,j}} = \frac{n!}{Z_{0,j}!\,Z_{1,j}! \cdots Z_{2^l-1,j}!}. \tag{5-2}$$
Combining (5-1) and (5-2), we get
$$Q_{i,j} = n! \prod_{y=0}^{2^l - 1} \frac{p_i(y)^{Z_{y,j}}}{Z_{y,j}!}.$$

To get an expression for $p_i(y)$, we use $\mathcal M$ and $M$ from Section 4. We have
$$p_i(y) = \mathcal M\!\left(\frac{F \vec\phi_i}{|F \vec\phi_i|}\right)_{\!y},$$

and this gives us $Q_{i,j}$:
$$Q_{i,j} = n! \prod_{y=0}^{2^l - 1} \frac{\Bigl[\mathcal M\bigl(F \vec\phi_i / |F \vec\phi_i|\bigr)_y\Bigr]^{Z_{y,j}}}{Z_{y,j}!}.$$
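
This final formula is simple to evaluate once the vector of production probabilities $p_i(y) = \mathcal M(F\vec\phi_i/|F\vec\phi_i|)_y$ is in hand (for instance from the recombination_operator sketch in Section 4). The sketch below takes that vector as given; the concrete numbers are illustrative assumptions only.

```python
import math
import numpy as np

def transition_probability(p_i: np.ndarray, phi_j: np.ndarray) -> float:
    # Q_{i,j} = n! * prod_y p_i(y)^{Z_{y,j}} / Z_{y,j}!, where Z_{y,j} = (phi_j)_y
    # counts chromosome y in the destination state and p_i(y) is the probability
    # that selection followed by recombination on state phi_i produces y.
    n = int(phi_j.sum())
    q = float(math.factorial(n))
    for y, count in enumerate(phi_j):
        q *= p_i[y] ** count / math.factorial(int(count))
    return q

# Illustration with l = 2, n = 3: if selection plus recombination on state phi_i
# produces each chromosome with probabilities (0, 0, 0.5, 0.5), the probability
# of reaching the state with two copies of 10 and one copy of 11 is
print(transition_probability(np.array([0.0, 0.0, 0.5, 0.5]), np.array([0, 0, 2, 1])))
# 3! * (0.5^2 / 2!) * (0.5 / 1!) = 0.375
```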

References
[1] Heitkoetter, J., and Beasley, D., The hitchhiker's guide to evolutionary computing: A list of frequently asked questions (FAQ), USENET:comp.ai.genetic, 1996. Available via anonymous FTP from rtfm.mit.edu:/pub/usenet/news.answers/ai-faq/genetic/
[2] Liepins, G. E., and Vose, M. D., Deceptiveness and genetic algorithm dynamics, in Foundations of Genetic Algorithms, G. Rawlins (ed.), Morgan Kaufmann, 1991.
[3] Mitchell, M., An Introduction to Genetic Algorithms, MIT Press, 1996.
[4] Nix, A. E., and Vose, M. D., Modeling genetic algorithms with Markov chains, Annals of Mathematics and Artificial Intelligence 5 (1992), 79–88.
[5] Purves, W., Orians, G., and Heller, C., Life, the Science of Biology, Sinauer, 1995.
[6] Rota, G. C., Introduction to Probability Theory, third preliminary edition, Birkhäuser, 1995.
[7] Vose, M. D., and Liepins, G. E., Punctuated equilibria in genetic search, Complex Systems 5 (1991), 31–44.
[8] Vose, M. D., Modeling simple genetic algorithms, in Foundations of Genetic
Algorithms 2, L. D. Whitley (ed.), Morgan Kaufmann, 1993.
