
Nature-Inspired Metaheuristic Algorithms

Second Edition

Xin-She Yang

University of Cambridge, United Kingdom

Published in 2010 by Luniver Press
Frome, BA11 6TT, United Kingdom
www.luniver.com

Copyright © Luniver Press 2010
Copyright © Xin-She Yang 2010

All rights reserved. This book, or parts thereof, may not be reproduced in
any form or by any means, electronic or mechanical, including photocopy-
ing, recording or by any information storage and retrieval system, without
permission in writing from the copyright holder.

British Library Cataloguing-in-Publication Data


A catalogue record for this book is available from
the British Library

ISBN-13: 978-1-905986-28-6
ISBN-10: 1-905986-28-9

While every attempt is made to ensure that the information in this publi-
cation is correct, no liability can be accepted by the authors or publishers
for loss, damage or injury caused by any errors in, or omission from, the
information given.

CONTENTS

Preface to the Second Edition
Preface to the First Edition

1 Introduction
1.1 Optimization
1.2 Search for Optimality
1.3 Nature-Inspired Metaheuristics
1.4 A Brief History of Metaheuristics

2 Random Walks and Lévy Flights
2.1 Random Variables
2.2 Random Walks
2.3 Lévy Distribution and Lévy Flights
2.4 Optimization as Markov Chains

3 Simulated Annealing
3.1 Annealing and Boltzmann Distribution
3.2 Parameters
3.3 SA Algorithm
3.4 Unconstrained Optimization
3.5 Stochastic Tunneling

4 How to Deal With Constraints
4.1 Method of Lagrange Multipliers
4.2 Penalty Method
4.3 Step Size in Random Walks
4.4 Welded Beam Design
4.5 SA Implementation

5 Genetic Algorithms
5.1 Introduction
5.2 Genetic Algorithms
5.3 Choice of Parameters

6 Differential Evolution
6.1 Introduction
6.2 Differential Evolution
6.3 Variants
6.4 Implementation

7 Ant and Bee Algorithms
7.1 Ant Algorithms
7.1.1 Behaviour of Ants
7.1.2 Ant Colony Optimization
7.1.3 Double Bridge Problem
7.1.4 Virtual Ant Algorithm
7.2 Bee-inspired Algorithms
7.2.1 Behavior of Honeybees
7.2.2 Bee Algorithms
7.2.3 Honeybee Algorithm
7.2.4 Virtual Bee Algorithm
7.2.5 Artificial Bee Colony Optimization

8 Swarm Optimization
8.1 Swarm Intelligence
8.2 PSO algorithms
8.3 Accelerated PSO
8.4 Implementation
8.5 Convergence Analysis

9 Harmony Search
9.1 Harmonics and Frequencies
9.2 Harmony Search
9.3 Implementation

10 Firefly Algorithm
10.1 Behaviour of Fireflies
10.2 Firefly Algorithm
10.3 Light Intensity and Attractiveness
10.4 Scalings and Asymptotics
10.5 Implementation
10.6 FA variants
10.7 Spring Design

11 Bat Algorithm
11.1 Echolocation of bats
11.1.1 Behaviour of microbats
11.1.2 Acoustics of Echolocation
11.2 Bat Algorithm
11.2.1 Movement of Virtual Bats
11.2.2 Loudness and Pulse Emission
11.3 Validation and Discussions
11.4 Implementation
11.5 Further Topics

12 Cuckoo Search
12.1 Cuckoo Breeding Behaviour
12.2 Lévy Flights
12.3 Cuckoo Search
12.4 Choice of Parameters
12.5 Implementation

13 ANNs and Support Vector Machine
13.1 Artificial Neural Networks
13.1.1 Artificial Neuron
13.1.2 Neural Networks
13.1.3 Back Propagation Algorithm
13.2 Support Vector Machine
13.2.1 Classifications
13.2.2 Statistical Learning Theory
13.2.3 Linear Support Vector Machine
13.2.4 Kernel Functions and Nonlinear SVM

14 Metaheuristics – A Unified Approach
14.1 Intensification and Diversification
14.2 Ways for Intensification and Diversification
14.3 Generalized Evolutionary Walk Algorithm (GEWA)
14.4 Eagle Strategy
14.5 Other Metaheuristic Algorithms
14.5.1 Tabu Search
14.5.2 Photosynthetic and Enzyme Algorithm
14.5.3 Artificial Immune System and Others
14.6 Further Research
14.6.1 Open Problems
14.6.2 To be Inspired or not to be Inspired

References

Index

Preface to the Second Edition

Since the publication of the first edition of this book in 2008, significant developments have been made in metaheuristics, and new nature-inspired metaheuristic algorithms have emerged, including cuckoo search and bat algorithms. Many readers have taken time to write to me personally, providing valuable feedback, asking for more details of algorithm implementation, or simply expressing interest in applying these new algorithms in their own applications.
In this revised edition, we strive to review the latest developments in metaheuristic algorithms, to incorporate readers' suggestions, and to provide more detailed descriptions of the algorithms. Firstly, we have added detailed descriptions of how to incorporate constraints in the actual implementation. Secondly, we have added three chapters on differential evolution, cuckoo search and bat algorithms, while some existing chapters, such as those on ant algorithms and bee algorithms, have been combined into one due to their similarity. Thirdly, we have also explained artificial neural networks and support vector machines in the framework of optimization and metaheuristics. Finally, we have tried throughout this book to provide a consistent and unified approach to metaheuristic algorithms, from a brief history in the first chapter to the unified approach in the last chapter.
Furthermore, we have provided more Matlab programs. At the same time, we also omit some implementations, such as those of genetic algorithms, as we know that there are many good software packages (both commercial and open source). This allows us to focus more on the implementation of new algorithms. Some of the programs also have a version for constrained optimization, and readers can modify them for their own applications.
Even with the good intention to cover the most popular metaheuristic algorithms, the choice of algorithms is a difficult task, as we do not have the space to cover every algorithm. The omission of an algorithm does not mean that it is not popular. In fact, some algorithms are very powerful and routinely used in many applications. Good examples are Tabu search and combinatorial algorithms, and interested readers can refer to the references provided at the end of the book. The effort of writing this little book will have been worthwhile if it can in some way encourage readers' interest in metaheuristics.
Xin-She Yang

August 2010

Preface to the First Edition

Modern metaheuristic algorithms such as ant colony optimization and harmony search have started to demonstrate their power in dealing with tough optimization problems and even NP-hard problems. This book reviews and introduces the state-of-the-art nature-inspired metaheuristic algorithms in optimization, including genetic algorithms (GA), particle swarm optimization (PSO), simulated annealing (SA), ant colony optimization (ACO), bee algorithms (BA), harmony search (HS), firefly algorithms (FA), the photosynthetic algorithm (PA), the enzyme algorithm (EA) and Tabu search. By implementing these algorithms in Matlab/Octave, we will use worked examples to show how each algorithm works. This book is thus an ideal textbook for an undergraduate and/or graduate course. As some of the algorithms, such as harmony search and firefly algorithms, are at the forefront of current research, this book can also serve as a reference book for researchers.
I would like to thank my editor, Andy Adamatzky, at Luniver Press for
his help and professionalism. Last but not least, I thank my wife and son
for their help.

Xin-She Yang

Cambridge, 2008

Chapter 1

INTRODUCTION

It is no exaggeration to say that optimization is everywhere, from engineering design to business planning and from the routing of the Internet to holiday planning. In almost all these activities, we are trying to achieve certain objectives or to optimize something such as profit, quality and time. As resources, time and money are always limited in real-world applications, we have to find solutions to optimally use these valuable resources under various constraints. Mathematical optimization or programming is the study of such planning and design problems using mathematical tools. Nowadays, computer simulations have become an indispensable tool for solving such optimization problems with various efficient search algorithms.

1.1 OPTIMIZATION

Mathematically speaking, it is possible to write most optimization problems in the generic form

minimize_{x ∈ ℜ^n} f_i(x),   (i = 1, 2, ..., M),   (1.1)

subject to h_j(x) = 0,   (j = 1, 2, ..., J),   (1.2)

g_k(x) ≤ 0,   (k = 1, 2, ..., K),   (1.3)

where f_i(x), h_j(x) and g_k(x) are functions of the design vector

x = (x_1, x_2, ..., x_n)^T.   (1.4)

Here the components x_i of x are called design or decision variables, and they can be real continuous, discrete or a mix of these two.
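For instance, minimizing f(x) = (x_1 − 1)^2 + x_2^2 subject to g_1(x) = x_1^2 + x_2^2 − 4 ≤ 0 fits this generic form with n = 2 decision variables, a single objective (M = 1), no equality constraints (J = 0) and one inequality constraint (K = 1).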
The functions f_i(x), where i = 1, 2, ..., M, are called the objective functions or simply cost functions, and in the case of M = 1, there is only a single objective. The space spanned by the decision variables is called the design space or search space ℜ^n, while the space formed by the objective function values is called the solution space or response space. The equalities for h_j and inequalities for g_k are called constraints. It is worth pointing
out that we can also write the inequalities in the other way ≥ 0, and we
can also formulate the objectives as a maximization problem.
In a rare but extreme case where there is no objective at all, there are
only constraints. Such a problem is called a feasibility problem because
any feasible solution is an optimal solution.
If we try to classify optimization problems according to the number
of objectives, then there are two categories: single objective M = 1 and
multiobjective M > 1. Multiobjective optimization is also referred to as
multicriteria or even multi-attributes optimization in the literature. In
real-world problems, most optimization tasks are multiobjective. Though
the algorithms we will discuss in this book are equally applicable to mul-
tiobjective optimization with some modifications, we will mainly place the
emphasis on single objective optimization problems.
Similarly, we can also classify optimization in terms of the number of constraints J + K. If there is no constraint at all (J = K = 0), then it is
called an unconstrained optimization problem. If K = 0 and J ≥ 1, it is
called an equality-constrained problem, while J = 0 and K ≥ 1 becomes
an inequality-constrained problem. It is worth pointing out that in some
formulations in the optimization literature, equalities are not explicitly in-
cluded, and only inequalities are included. This is because an equality
can be written as two inequalities. For example h(x) = 0 is equivalent to
h(x) ≤ 0 and h(x) ≥ 0.
We can also use the actual function forms for classification. The objec-
tive functions can be either linear or nonlinear. If the constraints hj and gk
are all linear, then it becomes a linearly constrained problem. If both the
constraints and the objective functions are all linear, it becomes a linear
programming problem. Here ‘programming’ has nothing to do with computer programming; it means planning and/or optimization. However, generally speaking, if all f_i, h_j and g_k are nonlinear, we have to deal with a nonlinear optimization problem.

1.2 SEARCH FOR OPTIMALITY

After an optimization problem is formulated correctly, the main task is to find the optimal solutions by some solution procedure using the right mathematical techniques.
Figuratively speaking, searching for the optimal solution is like treasure hunting. Imagine we are trying to hunt for a hidden treasure in a hilly landscape within a time limit. At one extreme, suppose we are blindfolded without any guidance; the search process is essentially a pure random search, which is usually not efficient, as we can expect. At the other extreme, if we are told the treasure is placed at the highest peak of a known region, we will then directly climb up the steepest cliff and try to reach the highest peak, and this scenario corresponds to the classical hill-climbing
techniques. In most cases, our search is between these two extremes. We are not blindfolded, but we do not know exactly where to look. It would be a silly idea to search every single square inch of an extremely large hilly region so as to find the treasure.
The most likely scenario is that we will do a random walk while looking for some hints; we look at some place almost randomly, then move to another plausible place, then another, and so on. Such a random walk is a main characteristic of modern search algorithms. Obviously, we can do the treasure-hunting alone, so that the whole path forms a trajectory-based search; simulated annealing is of this kind. Alternatively, we can ask a group of people to do the hunting and share the information (and any treasure found), and this scenario uses the so-called swarm intelligence and corresponds to particle swarm optimization, as we will discuss later in detail.
If the treasure is really important and if the area is extremely large, the
search process will take a very long time. If there is no time limit and if any
region is accessible (for example, no islands in a lake), it is theoretically
possible to find the ultimate treasure (the global optimal solution).
Obviously, we can refine our search strategy a little bit further. Some hunters are better than others. We can keep only the better hunters and recruit new ones; this is similar to genetic algorithms or evolutionary algorithms, where the search agents are improving. In fact, as we will see, in almost all modern metaheuristic algorithms we try to use the best solutions or agents, and randomize (or replace) the not-so-good ones, while evaluating each individual's competence (fitness) in combination with the system history (use of memory). With such a balance, we intend to design better and more efficient optimization algorithms.
Classification of optimization algorithms can be carried out in many ways. A simple way is to look at the nature of the algorithm, and this divides the algorithms into two categories: deterministic algorithms and stochastic algorithms. Deterministic algorithms follow a rigorous procedure, and their paths and the values of both the design variables and the functions are repeatable. For example, hill-climbing is a deterministic algorithm, and for the same starting point it will follow the same path whether you run the program today or tomorrow. On the other hand, stochastic algorithms always have some randomness. Genetic algorithms are a good example: the strings or solutions in the population will be different each time you run a program, since the algorithms use some pseudo-random numbers. Though the final results may show no big difference, the paths of each individual are not exactly repeatable.
Furthermore, there is a third type of algorithm which is a mixture, or a hybrid, of deterministic and stochastic algorithms. Hill-climbing with random restarts is a good example. The basic idea is to use a deterministic algorithm, but start from different initial points. This has certain advantages over a simple hill-climbing technique, which may get stuck in a local peak. However, since there is a random component in this hybrid algorithm, we often classify it as a type of stochastic algorithm in the optimization literature.
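As a minimal illustration of this hybrid idea (our sketch, not from the book), the following Matlab/Octave fragment runs a deterministic neighbourhood descent from several random starting points; the objective function, step size and restart count are arbitrary choices for demonstration.

% Hill-climbing with random restarts: a minimal sketch (assumed settings)
f = @(x) x.^2 + 10*sin(x);       % an arbitrary multimodal test function
h = 0.01;                        % fixed step size for the local moves
fbest = inf; xbest = NaN;
for restart = 1:20               % stochastic part: random initial points
    x = -10 + 20*rand;           % random start in [-10, 10]
    while true                   % deterministic part: downhill moves
        [fn, i] = min([f(x - h), f(x + h)]);
        if fn >= f(x), break; end    % stop at a local minimum
        x = x + h*(2*i - 3);         % i = 1 -> x - h, i = 2 -> x + h
    end
    if f(x) < fbest, fbest = f(x); xbest = x; end
end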

1.3 NATURE-INSPIRED METAHEURISTICS

Most conventional or classic algorithms are deterministic. For example, the simplex method in linear programming is deterministic. Some deterministic optimization algorithms use gradient information; they are called gradient-based algorithms. For example, the well-known Newton-Raphson algorithm is gradient-based, as it uses the function values and their derivatives, and it works extremely well for smooth unimodal problems. However, if there is some discontinuity in the objective function, it does not work well. In this case, a non-gradient algorithm is preferred. Non-gradient-based or gradient-free algorithms do not use any derivative, but only the function values. Hooke-Jeeves pattern search and Nelder-Mead downhill simplex are examples of gradient-free algorithms.
For stochastic algorithms, we have in general two types: heuristic and metaheuristic, though their difference is small. Loosely speaking, heuristic means ‘to find’ or ‘to discover by trial and error’. Quality solutions to a tough optimization problem can be found in a reasonable amount of time, but there is no guarantee that optimal solutions are reached. The hope is that these algorithms will work most of the time, but not all of the time. This is fine when we do not necessarily want the best solutions but rather good solutions which are easily reachable.
A further development over heuristic algorithms is the so-called metaheuristic algorithms. Here meta- means ‘beyond’ or ‘higher level’, and these algorithms generally perform better than simple heuristics. In addition, all metaheuristic algorithms use a certain tradeoff between randomization and local search. It is worth pointing out that no agreed definitions of heuristics and metaheuristics exist in the literature; some use ‘heuristics’ and ‘metaheuristics’ interchangeably. However, the recent trend tends to name all stochastic algorithms with randomization and local search as metaheuristic. Here we will also use this convention. Randomization provides a good way to move away from local search towards search on the global scale. Therefore, almost all metaheuristic algorithms are intended to be suitable for global optimization.
Heuristics is a way, by trial and error, to produce acceptable solutions to a complex problem in a reasonably practical time. The complexity of the problem of interest makes it impossible to search every possible solution or combination; the aim is to find good feasible solutions in an acceptable timescale. There is no guarantee that the best solutions can be found, and we do not even know whether an algorithm will work, and why if it does work. The idea is to have an efficient but practical algorithm that will work most of the time and is able to produce good-quality solutions. Among the found quality solutions, it is expected that some of them are nearly optimal, though there is no guarantee of such optimality.
Two major components of any metaheuristic algorithm are intensification and diversification, or exploitation and exploration. Diversification means to generate diverse solutions so as to explore the search space on the global scale, while intensification means to focus the search on a local region by exploiting the information that a current good solution has been found in this region. This works in combination with the selection of the best solutions. The selection of the best ensures that the solutions will converge to optimality, while diversification via randomization avoids the solutions being trapped at local optima and, at the same time, increases the diversity of the solutions. A good combination of these two major components will usually ensure that global optimality is achievable.
Metaheuristic algorithms can be classified in many ways. One way is
to classify them as: population-based and trajectory-based. For example,
genetic algorithms are population-based as they use a set of strings, so
is the particle swarm optimization (PSO) which uses multiple agents or
particles.
On the other hand, simulated annealing uses a single agent or solution
which moves through the design space or search space in a piecewise style.
A better move or solution is always accepted, while a not-so-good move
can be accepted with a certain probability. The steps or moves trace a tra-
jectory in the search space, with a non-zero probability that this trajectory
can reach the global optimum.
Before we introduce all the popular metaheuristic algorithms in detail, let us look briefly at their history.

1.4 A BRIEF HISTORY OF METAHEURISTICS

Throughout history, especially in the early periods of human history, the human approach to problem-solving has always been heuristic or metaheuristic – by trial and error. Many important discoveries were made by ‘thinking outside the box’, and often by accident; that is heuristics. Archimedes's Eureka moment was a heuristic triumph. In fact, our daily learning experience (at least as a child) is dominantly heuristic.
Despite its ubiquitous nature, metaheuristics as a scientific method of problem solving is indeed a modern phenomenon, though it is difficult to pinpoint when the metaheuristic method was first used. Alan Turing was probably the first to use heuristic algorithms during the Second World War when he was breaking German Enigma ciphers at Bletchley Park. Turing called his search method heuristic search, as it could be expected to work most of the time, although there was no guarantee of finding the correct solution; nevertheless, it was a tremendous success. In 1945, Turing was recruited to the National Physical Laboratory (NPL), UK, where he set out his design for
the Automatic Computing Engine (ACE). In an NPL report on Intelligent machinery in 1948, he outlined his innovative ideas of machine intelligence and learning, neural networks and evolutionary algorithms.
The 1960s and 1970s were two important decades for the development of evolutionary algorithms. First, John Holland and his collaborators at the University of Michigan developed genetic algorithms in the 1960s and 1970s. As early as 1962, Holland studied adaptive systems and was the first to use crossover and recombination manipulations for modeling such systems. His seminal book summarizing the development of genetic algorithms was published in 1975. In the same year, De Jong finished his important dissertation showing the potential and power of genetic algorithms for a wide range of objective functions, whether noisy, multimodal or even discontinuous.
In essence, a genetic algorithm (GA) is a search method based on the abstraction of Darwinian evolution and natural selection of biological systems, representing them in the mathematical operators: crossover or recombination, mutation, fitness, and selection of the fittest. Ever since, genetic algorithms have become so successful in solving a wide range of optimization problems that several thousand research articles and hundreds of books have been written about them. Some statistics show that a vast majority of Fortune 500 companies are now using them routinely to solve tough combinatorial optimization problems such as planning, data-fitting, and scheduling.
During the same period, Ingo Rechenberg and Hans-Paul Schwefel, both then at the Technical University of Berlin, developed a search technique for solving optimization problems in aerospace engineering, called evolutionary strategy, in 1963. Later, Peter Bienert joined them and began to construct an automatic experimenter using simple rules of mutation and selection. There was no crossover in this technique; only mutation was used to produce an offspring, and an improved solution was kept at each generation. This was essentially a simple trajectory-style hill-climbing algorithm with randomization. As early as 1960, Lawrence J. Fogel intended to use simulated evolution as a learning process and a tool to study artificial intelligence. Then, in 1966, L. J. Fogel, together with A. J. Owens and M. J. Walsh, developed the evolutionary programming technique by representing solutions as finite-state machines and randomly mutating one of these machines. The above innovative ideas and methods have evolved into a much wider discipline, called evolutionary algorithms and/or evolutionary computation.
Although our focus in this book is on metaheuristic algorithms, other algorithms can be thought of as heuristic optimization techniques. These include artificial neural networks, support vector machines and many other machine learning techniques. Indeed, they all intend to minimize their learning errors and prediction (capability) errors via iterative trials and errors.

Artificial neural networks are now routinely used in many applications. In 1943, W. McCulloch and W. Pitts proposed artificial neurons as simple information processing units. The concept of a neural network was probably first proposed by Alan Turing in his 1948 NPL report concerning ‘intelligent machinery’. Significant developments were carried out from the 1940s and 1950s to the 1990s, spanning more than 60 years of history.
The support vector machine as a classification technique dates back to the earlier work by V. Vapnik in 1963 on linear classifiers, while nonlinear classification with kernel techniques was developed by V. Vapnik and his collaborators in the 1990s. A systematic summary was given in Vapnik's book The Nature of Statistical Learning Theory, published in 1995.
The two decades of the 1980s and 1990s were the most exciting time for metaheuristic algorithms. The next big step was the development of simulated annealing (SA) in 1983, an optimization technique pioneered by S. Kirkpatrick, C. D. Gellat and M. P. Vecchi, inspired by the annealing process of metals. It is a trajectory-based search algorithm starting with an initial guess solution at a high temperature and gradually cooling down the system. A move or new solution is accepted if it is better; otherwise, it is accepted with a probability, which makes it possible for the system to escape any local optima. It is then expected that if the system is cooled down slowly enough, the global optimal solution can be reached.
The first actual use of memory in modern metaheuristics is probably due to Fred Glover's Tabu search in 1986, though his seminal book on Tabu search was published later, in 1997.
In 1992, Marco Dorigo finished his PhD thesis on optimization and nat-
ural algorithms, in which he described his innovative work on ant colony
optimization (ACO). This search technique was inspired by the swarm in-
telligence of social ants using pheromone as a chemical messenger. Then, in
1992, John R. Koza of Stanford University published a treatise on genetic
programming which laid the foundation of a whole new area of machine
learning, revolutionizing computer programming. As early as 1988, Koza had filed his first patent on genetic programming. The basic idea is to use genetic principles to breed computer programs so as to gradually produce the best programs for a given type of problem.
Slightly later, in 1995, another significant advance was the development of particle swarm optimization (PSO) by American social psychologist James Kennedy and engineer Russell C. Eberhart. Loosely speaking, PSO is an optimization algorithm inspired by the swarm intelligence of fish and birds, and even by human behavior. The multiple agents, called particles, swarm around the search space, starting from some initial random guess. The swarm communicates the current best and shares the global best so as to focus on the quality solutions. Since its development, there have been about 20 different variants of particle swarm optimization techniques, and they have been applied to almost all areas of tough optimization problems. There is
some strong evidence that PSO is better than traditional search algorithms
and even better than genetic algorithms for many types of problems, though
this is far from conclusive.
In around 1996, and later in 1997, R. Storn and K. Price developed their vector-based evolutionary algorithm, called differential evolution (DE), and this algorithm has proved more efficient than genetic algorithms in many applications.
In 1997, the publication of the ‘no free lunch theorems for optimization’ by D. H. Wolpert and W. G. Macready sent a shock wave through the optimization community. Researchers had always been trying to find better algorithms, or even universally robust algorithms, for optimization, especially for tough NP-hard optimization problems. However, these theorems state that if algorithm A performs better than algorithm B for some optimization functions, then B will outperform A for other functions. That is to say, averaged over all possible function spaces, both algorithms A and B will perform equally well. In other words, no universally better algorithm exists. That is disappointing, right? Then, people realized that we do not need the average over all possible functions for a given optimization problem. What we want is to find the best solutions, which has nothing to do with averaging over all possible function spaces. In addition, we can accept the fact that there is no universal or magical tool, but we do know from our experience that some algorithms indeed outperform others for given types of optimization problems. So the research now focuses on finding the best and most efficient algorithm(s) for a given problem. The objective is to design better algorithms for most types of problems, not for all problems. Therefore, the search is still on.
At the turn of the 21st century, things became even more exciting. First, Zong Woo Geem et al. in 2001 developed the harmony search (HS) algorithm, which has been widely applied in solving various optimization problems such as water distribution, transport modelling and scheduling. In 2004, S. Nakrani and C. Tovey proposed the honey bee algorithm and its application for optimizing Internet hosting centers, which was followed by the development of a novel bee algorithm by D. T. Pham et al. in 2005 and the artificial bee colony (ABC) by D. Karaboga in 2005. In 2008, the author of this book developed the firefly algorithm (FA)^1. Quite a few research articles on the firefly algorithm then followed, and this algorithm has attracted a wide range of interest. In 2009, Xin-She Yang at Cambridge University, UK, and Suash Deb at Raman College of Engineering, India, introduced an efficient cuckoo search (CS) algorithm, and it has been demonstrated that CS is far more effective than most existing metaheuristic algorithms, including particle swarm optimization^2. In 2010, the author of this book developed a bat-inspired algorithm for continuous optimization, and its efficiency is quite promising.

^1 X. S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
^2 Novel cuckoo search ‘beats’ particle swarm optimization, Science Daily, news article (28 May 2010), www.sciencedaily.com
As we can see, more and more metaheuristic algorithms are being devel-
oped. Such a diverse range of algorithms necessitates a systematic summary
of various metaheuristic algorithms, and this book is such an attempt to
introduce all the latest nature-inspired metaheuristics with diverse appli-
cations.
We will discuss all major modern metaheuristic algorithms in the rest
of this book, including simulated annealing (SA), genetic algorithms (GA),
ant colony optimization (ACO), bee algorithms (BA), differential evolution
(DE), particle swarm optimization (PSO), harmony search (HS), the firefly
algorithm (FA), cuckoo search (CS) and bat-inspired algorithm (BA), and
others.

REFERENCES

1. C. M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, 1995.
2. B. J. Copeland, The Essential Turing, Oxford University Press, 2004.
3. B. J. Copeland, Alan Turing's Automatic Computing Engine, Oxford University Press, 2005.
4. K. De Jong, Analysis of the Behaviour of a Class of Genetic Adaptive Systems, PhD thesis, University of Michigan, Ann Arbor, 1975.
5. M. Dorigo, Optimization, Learning and Natural Algorithms, PhD thesis, Politecnico di Milano, Italy, 1992.
6. L. J. Fogel, A. J. Owens, and M. J. Walsh, Artificial Intelligence Through Simulated Evolution, Wiley, 1966.
7. Z. W. Geem, J. H. Kim and G. V. Loganathan, A new heuristic optimization: Harmony search, Simulation, 76(2), 60-68 (2001).
8. F. Glover and M. Laguna, Tabu Search, Kluwer Academic Publishers, Boston, 1997.
9. J. Holland, Adaptation in Natural and Artificial Systems, University of Michigan Press, Ann Arbor, 1975.
10. P. Judea, Heuristics, Addison-Wesley, 1984.
11. D. Karaboga, An idea based on honey bee swarm for numerical optimization,
Technical Report, Erciyes University, 2005.
12. J. Kennedy and R. Eberhart, Particle swarm optimization, in: Proc. of the IEEE Int. Conf. on Neural Networks, Piscataway, NJ, pp. 1942-1948 (1995).
13. S. Kirkpatrick, C. D. Gellat, and M. P. Vecchi, Optimization by simulated annealing, Science, 220, 671-680 (1983).
14. J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection, MIT Press, 1992.
15. S. Nakrani and C. Tovey, On honey bees and dynamic server allocation in Internet hosting centers, Adaptive Behavior, 12, 223-240 (2004).
16. D. T. Pham, A. Ghanbarzadeh, E. Koc, S. Otri, S. Rahim and M. Zaidi, The
bees algorithm, Technical Note, Manufacturing Engineering Center, Cardiff
University, 2005.
17. A. Schrijver, On the history of combinatorial optimization (till 1960), in:
Handbook of Discrete Optimization (Eds K. Aardal, G. L. Nemhauser, R.
Weismantel), Elsevier, Amsterdam, pp.1-68 (2005).
18. H. T. Siegelmann and E. D. Sontag, Turing computability with neural nets,
Appl. Math. Lett., 4, 77-80 (1991).
19. R. Storn and K. Price, Differential evolution - a simple and efficient heuristic
for global optimization over continuous spaces, Journal of Global Optimiza-
tion, 11, 341-359 (1997).
20. A. M. Turing, Intelligent Machinery, National Physical Laboratory, technical
report, 1948.
21. V. Vapnik, The Nature of Statistical Learning Theory, Springer-Verlag, New
York, 1995.
22. V. Vapnik, S. Golowich, A. Smola, Support vector method for function approximation, regression estimation, and signal processing, in: Advances in Neural Information Processing Systems 9 (Eds. M. Mozer, M. Jordan and T. Petsche), MIT Press, Cambridge MA, 1997.
23. D. H. Wolpert and W. G. Macready, No free lunch theorems for optimization,
IEEE Transaction on Evolutionary Computation, 1, 67-82 (1997).
24. X. S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
25. X. S. Yang, Firefly algorithms for multimodal optimization, Proc. 5th Sympo-
sium on Stochastic Algorithms, Foundations and Applications, SAGA 2009,
Eds. O. Watanabe and T. Zeugmann, Lecture Notes in Computer Science,
5792, 169-178 (2009).
26. X. S. Yang and S. Deb, Cuckoo search via Lévy flights, in: Proc. of World
Congress on Nature & Biologically Inspired Computing (NaBic 2009), IEEE
Publications, USA, pp. 210-214 (2009).
27. X. S. Yang and S. Deb, Engineering optimization by cuckoo search, Int. J.
Math. Modelling & Num. Optimization, 1, 330-343 (2010).
28. X. S. Yang, A new metaheuristic bat-inspired algorithm, in: Nature Inspired Cooperative Strategies for Optimization (NICSO 2010) (Eds. J. R. Gonzalez et al.), Springer, SCI 284, 65-74 (2010).
29. History of optimization, http://hse-econ.fi/kitti/opthist.html
30. Turing Archive for the History of Computing, www.alanturing.net/
Chapter 2

RANDOM WALKS AND LÉVY FLIGHTS

From the brief analysis of the main characteristics of metaheuristic algorithms in the first chapter, we know that randomization plays an important role in both exploration and exploitation, or diversification and intensification. The essence of such randomization is the random walk. In this chapter, we will briefly review the fundamentals of random walks, Lévy flights and Markov chains. These concepts may provide some hints and insights into how and why metaheuristic algorithms behave.

2.1 RANDOM VARIABLES

Loosely speaking, a random variable can be considered as an expression whose value is the realization or outcome of events associated with a random process, such as the noise level on the street. The values of random variables are real, though some variables, such as the number of cars on a road, can only take discrete values; such random variables are called discrete random variables. If a random variable such as the noise at a particular location can take any real value in an interval, it is called continuous. If a random variable can have both continuous and discrete values, it is called a mixed type. Mathematically speaking, a random variable is a function which maps events to real numbers. The domain of this mapping is called the sample space.
For each random variable, a probability density function can be used to express its probability distribution. For example, the number of phone calls per minute, and the number of users of a web server per day, all obey the Poisson distribution

p(n; λ) = λ^n e^{−λ} / n!,   (n = 0, 1, 2, ...),   (2.1)

where λ > 0 is a parameter which is the mean or expectation of the occurrence of the event during a unit interval.
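As a quick numerical illustration (our sketch, not from the book), the pmf in (2.1) can be evaluated directly in Matlab/Octave; the rate λ = 3 below is an arbitrary example value.

% Evaluating the Poisson pmf (2.1) for an assumed rate lambda = 3
lambda = 3;                 % e.g., mean number of calls per minute
n = 0:10;                   % event counts of interest
p = lambda.^n .* exp(-lambda) ./ factorial(n);
% sum(p) is close to 1, since the pmf sums to 1 over all n >= 0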
Different random variables will have different distributions. The Gaussian distribution, or normal distribution, is by far the most popular, because many physical variables, including light intensity and errors/uncertainty in measurements, as well as many other processes, obey the normal distribution

p(x; µ, σ^2) = (1/(σ√(2π))) exp[−(x − µ)^2/(2σ^2)],   −∞ < x < ∞,   (2.2)
where µ is the mean and σ > 0 is the standard deviation. This normal
distribution is often denoted by N(µ, σ 2 ). In the special case when µ = 0
and σ = 1, it is called a standard normal distribution, denoted by N(0, 1).
In the context of metaheuristics, another important distribution is the so-called Lévy distribution, which is the distribution of the sum of N identically and independently distributed random variables whose Fourier transform takes the following form

F_N(k) = exp[−N|k|^β].   (2.3)

The inverse to get the actual distribution L(s) is not straightforward, as the integral

L(s) = (1/π) ∫_0^∞ cos(τs) exp[−α τ^β] dτ,   (0 < β ≤ 2),   (2.4)

does not have analytical forms, except for a few special cases. Here L(s) is called the Lévy distribution with an index β. For most applications, we can set α = 1 for simplicity. Two special cases are β = 1 and β = 2. When β = 1, the above integral becomes the Cauchy distribution. When β = 2, it becomes the normal distribution, and in this case Lévy flights become the standard Brownian motion.
Mathematically speaking, we can express the integral (2.4) as an asymptotic series, and its leading-order approximation for the flight length results in a power-law distribution

L(s) ∼ |s|^{−1−β},   (2.5)

which is heavy-tailed. The variance of such a power-law distribution is infinite for 0 < β < 2. The moments diverge (or are infinite) for 0 < β < 2, which is a stumbling block for mathematical analysis.

2.2 RANDOM WALKS

A random walk is a random process which consists of taking a series of consecutive random steps. Mathematically speaking, let S_N denote the sum of consecutive random steps X_i; then S_N forms a random walk

S_N = Σ_{i=1}^{N} X_i = X_1 + ... + X_N,   (2.6)
where X_i is a random step drawn from a random distribution. This relationship can also be written as a recursive formula

S_N = Σ_{i=1}^{N−1} X_i + X_N = S_{N−1} + X_N,   (2.7)

which means the next state S_N will only depend on the current existing state S_{N−1} and the motion or transition X_N from the existing state to the next state. This is typically the main property of a Markov chain, to be introduced later.
Here the step size or length in a random walk can be fixed or varying.
Random walks have many applications in physics, economics, statistics,
computer sciences, environmental science and engineering.
Consider a scenario where a drunkard walks on a street; at each step, he can randomly go forward or backward. This forms a one-dimensional random walk. If this drunkard walks on a football pitch, he can walk in any direction randomly, and this becomes a 2D random walk. Mathematically speaking, a random walk is given by the following equation

S_{t+1} = S_t + w_t,   (2.8)

where S_t is the current location or state at t, and w_t is a step or random variable with a known distribution.
If each step or jump is carried out in the n-dimensional space, the random walk discussed earlier,

S_N = Σ_{i=1}^{N} X_i,   (2.9)

becomes a random walk in higher dimensions. In addition, there is no reason why each step length should be fixed. In fact, the step size can also vary according to a known distribution. If the step length obeys the Gaussian distribution, the random walk becomes the Brownian motion (see Fig. 2.1).
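A minimal Matlab/Octave sketch of such a walk (ours, not from the book), in the spirit of Fig. 2.1; the 50 steps and the unit-variance Gaussian step sizes are arbitrary choices.

% 2D random walk with Gaussian steps (Brownian-motion-like path)
N = 50;
steps = randn(N, 2);                % each row is one Gaussian step (x, y)
path = [0 0; cumsum(steps)];        % cumulative sums give the states S_t
plot(path(:,1), path(:,2), '-o');   % plot the 50-step path from (0, 0)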
In theory, as the number of steps N increases, the central limit theorem implies that the random walk (2.9) should approach a Gaussian distribution. As the mean of the particle locations shown in Fig. 2.1 is obviously zero, their variance will increase linearly with t. In general, in the d-dimensional space, the variance of Brownian random walks can be written as

σ^2(t) = |v_0|^2 t^2 + (2dD)t,   (2.10)

where v_0 is the drift velocity of the system. Here D = s^2/(2τ) is the effective diffusion coefficient, which is related to the step length s over a short time interval τ during each jump.
Figure 2.1: Brownian motion in 2D: random walk with a Gaussian step-size distribution and the path of 50 steps starting at the origin (0, 0) (marked with •).

Therefore, the Brownian motion B(t) essentially obeys a Gaussian distribution with zero mean and time-dependent variance. That is, B(t) ∼ N(0, σ^2(t)), where ∼ means the random variable obeys the distribution on the right-hand side; that is, samples should be drawn from the distribution. A diffusion process can be viewed as a series of Brownian motions, and the motion obeys the Gaussian distribution. For this reason, standard diffusion is often referred to as Gaussian diffusion. If the motion at each step is not Gaussian, then the diffusion is called non-Gaussian diffusion.
If the step length obeys some other distribution, we have to deal with a more generalized random walk. A very special case is when the step length obeys the Lévy distribution; such a random walk is called a Lévy flight or Lévy walk.

2.3 LÉVY DISTRIBUTION AND LÉVY FLIGHTS

Broadly speaking, Lévy flights are a random walk whose step length is drawn from the Lévy distribution, often in terms of a simple power-law formula L(s) ∼ |s|^{−1−β}, where 0 < β ≤ 2 is an index. Mathematically speaking, a simple version of the Lévy distribution can be defined as

L(s, γ, µ) = √(γ/(2π)) exp[−γ/(2(s − µ))] (s − µ)^{−3/2} for 0 < µ < s < ∞, and L(s, γ, µ) = 0 otherwise,   (2.11)

where µ > 0 is a minimum step and γ is a scale parameter. Clearly, as s → ∞, we have

L(s, γ, µ) ≈ √(γ/(2π)) s^{−3/2}.   (2.12)

This is a special case of the generalized Lévy distribution.
Figure 2.2: Lévy flights in consecutive 50 steps starting at the origin (0, 0) (marked with •).

In general, the Lévy distribution should be defined in terms of the Fourier transform

F(k) = exp[−α|k|^β],   0 < β ≤ 2,   (2.13)

where α is a scale parameter. The inverse of this transform is not easy, as it does not have an analytical form, except for a few special cases. For the case of β = 2, we have

F(k) = exp[−αk^2],   (2.14)

whose inverse Fourier transform corresponds to a Gaussian distribution. Another special case is β = 1, for which we have

F(k) = exp[−α|k|],   (2.15)

which corresponds to a Cauchy distribution

p(x, γ, µ) = (1/π) γ/(γ^2 + (x − µ)^2),   (2.16)

where µ is the location parameter, while γ controls the scale of this distribution.
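As an aside (our sketch, not from the book), samples from the Cauchy distribution (2.16) can be drawn with the standard inverse-CDF method; µ = 0 and γ = 1 below are arbitrary example values.

% Sampling the Cauchy distribution (2.16) by inverting its CDF
mu = 0; gam = 1;                      % assumed location and scale
u = rand(1, 1000);                    % uniform samples on (0, 1)
x = mu + gam * tan(pi * (u - 0.5));   % Cauchy-distributed samples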
For the general case, the inverse integral

L(s) = (1/π) ∫_0^∞ cos(ks) exp[−α|k|^β] dk,   (2.17)

can be estimated only when s is large. We have

L(s) → (α β Γ(β) sin(πβ/2)) / (π|s|^{1+β}),   s → ∞.   (2.18)
Here Γ(z) is the Gamma function

Γ(z) = ∫_0^∞ t^{z−1} e^{−t} dt.   (2.19)

In the case when z = n is an integer, we have Γ(n) = (n − 1)!.


Lévy flights are more efficient than Brownian random walks in exploring unknown, large-scale search spaces. There are many reasons to explain this efficiency, and one of them is that the variance of Lévy flights,

σ^2(t) ∼ t^{3−β},   1 ≤ β ≤ 2,   (2.20)

increases much faster than the linear relationship (i.e., σ^2(t) ∼ t) of Brownian random walks.
Fig. 2.2 shows the path of Lévy flights of 50 steps starting from (0, 0) with β = 1. It is worth pointing out that a power-law distribution is often linked to some scale-free characteristics, and Lévy flights can thus show self-similarity and fractal behavior in the flight patterns.
From the implementation point of view, the generation of random num-
bers with Lévy flights consists of two steps: the choice of a random direction
and the generation of steps which obey the chosen Lévy distribution. The
generation of a direction should be drawn from a uniform distribution, while
the generation of steps is quite tricky. There are a few ways of achieving
this, but one of the most efficient and yet straightforward ways is to use
the so-called Mantegna algorithm for a symmetric Lévy stable distribution.
Here ‘symmetric’ means that the steps can be positive and negative.
A random variable U and its probability distribution can be called stable
if a linear combination of its two identical copies (or U1 and U2 ) obeys the
same distribution. That is, aU1 + bU2 has the same distribution as cU + d
where a, b > 0 and c, d ∈ ℜ. If d = 0, it is called strictly stable. Gaussian,
Cauchy and Lévy distributions are all stable distributions.
In Mantegna’s algorithm, the step length s can be calculated by
u
s= , (2.21)
|v|1/β

where u and v are drawn from normal distributions. That is

u ∼ N (0, σu2 ), v ∼ N (0, σv2 ), (2.22)

where
Me ta h e u r is
n Γ(1 + β) sin(πβ/2) o1/β
i re d ti σu = , σv = 1. (2.23)
Γ[(1 + β)/2] β 2(β−1)/2
cA
p
N a t u re -I n s

lgorith m s

Xin-She Yang
This distribution (for s) obeys the expected Lévy distribution for |s| ≥ |s0 |
c Luniver Press
where s0 is the smallest step. In principle, |s0 | ≫ 0, but in reality s0 can
S ec

)
10

on
d E d i o n ( 2 0be taken as a sensible value such as s0 = 0.1 to 1.
it
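A minimal Matlab/Octave sketch of Mantegna's algorithm as given by (2.21)-(2.23) (our sketch; the value of β and the sample count are arbitrary choices):

% Mantegna's algorithm for Lévy-stable step lengths, Eqs. (2.21)-(2.23)
beta = 1.5;                           % assumed Lévy index, 0 < beta <= 2
sigma_u = (gamma(1+beta)*sin(pi*beta/2) ...
    / (gamma((1+beta)/2)*beta*2^((beta-1)/2)))^(1/beta);   % Eq. (2.23)
u = sigma_u * randn(1, 1000);         % u ~ N(0, sigma_u^2)
v = randn(1, 1000);                   % v ~ N(0, 1)
s = u ./ abs(v).^(1/beta);            % heavy-tailed steps, Eq. (2.21)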
Studies show that Lévy flights can maximize the efficiency of resource searches in uncertain environments. In fact, Lévy flights have been observed among the foraging patterns of albatrosses, fruit flies and spider monkeys. Even humans, such as the Ju/'hoansi hunter-gatherers, can trace paths of Lévy-flight patterns. In addition, Lévy flights have many applications. Many physical phenomena, such as the diffusion of fluorescent molecules, cooling behavior and noise, could show Lévy-flight characteristics under the right conditions.

2.4 OPTIMIZATION AS MARKOV CHAINS

In every aspect, the simple random walk we discussed earlier can be considered as a Markov chain. Briefly speaking, a random variable ζ is a Markov process if the transition probability, from state ζ_t = S_i at time t to another state ζ_{t+1} = S_j, depends only on the current state ζ_t. That is,

P(i, j) ≡ P(ζ_{t+1} = S_j | ζ_0 = S_p, ..., ζ_t = S_i) = P(ζ_{t+1} = S_j | ζ_t = S_i),   (2.24)

which is independent of the states before t. In addition, the sequence of random variables (ζ_0, ζ_1, ..., ζ_n) generated by a Markov process is subsequently called a Markov chain. The transition probability P(i, j) ≡ P(i → j) = P_{ij} is also referred to as the transition kernel of the Markov chain.
If we rewrite the random walk relationship (2.7) with a random move governed by w_t, which depends on the transition probability P, we have

S_{t+1} = S_t + w_t,   (2.25)

which indeed has the properties of a Markov chain. Therefore, a random walk is a Markov chain.
In order to solve an optimization problem, we can search for the solution by performing a random walk starting from a good initial but random guess solution. However, simple or blind random walks are not efficient. To be computationally efficient and effective in searching for new solutions, we have to keep the best solutions found so far, and to increase the mobility of the random walk so as to explore the search space more effectively. Most importantly, we have to find a way to control the walk in such a way that it can move towards the optimal solutions more quickly, rather than wander away from the potential best solutions. These are the challenges for most metaheuristic algorithms.

Further research along the route of Markov chains is the development of the Markov chain Monte Carlo (MCMC) method, which is a class of sample-generating methods. It attempts to directly draw samples from some highly complex multi-dimensional distribution using a Markov chain with known transition probability. Since the 1990s, the Markov chain Monte Carlo method has become a powerful tool for Bayesian statistical analysis, Monte Carlo simulations, and potentially optimization with high nonlinearity.
An important link between MCMC and optimization is that some heuristic and metaheuristic search algorithms, such as simulated annealing (to be introduced later), use a trajectory-based approach. They start with some initial (random) state, and propose a new state (solution) randomly. Then, the move is accepted or not, depending on some probability. This is strongly similar to a Markov chain. In fact, the standard simulated annealing is a random walk.
Mathematically speaking, a great leap in understanding metaheuristic algorithms is to view a Markov chain Monte Carlo as an optimization procedure. If we want to find the minimum of an objective function f(θ) at θ = θ_∗, so that f_∗ = f(θ_∗) ≤ f(θ), we can convert it to a target distribution for a Markov chain

π(θ) = e^{−βf(θ)},   (2.26)

where β > 0 is a parameter which acts as a normalizing factor. The value of β should be chosen so that the probability is close to 1 when θ → θ_∗. At θ = θ_∗, π(θ) should reach a maximum π_∗ = π(θ_∗) ≥ π(θ). This requires that the formulation of f(θ) should be non-negative, which means that some objective functions can be shifted by a large constant A > 0, for example f ← f + A, if necessary.
By constructing a Markov chain Monte Carlo, we can formulate a generic framework, as outlined by Ghate and Smith in 2008 and shown in Figure 2.3. In this framework, simulated annealing and its many variants are simply a special case with

P_t = exp[−Δf/T_t] if f_{t+1} > f_t, and P_t = 1 if f_{t+1} ≤ f_t.

In this case, only the difference Δf between the function values is important.
Markov Chain Algorithm for Optimization

Start with ζ_0 ∈ S, at t = 0
while (criterion)
    Propose a new solution Y_{t+1};
    Generate a random number 0 ≤ P_t ≤ 1;
    ζ_{t+1} = Y_{t+1} with probability P_t, or ζ_t with probability 1 − P_t;   (2.27)
end

Figure 2.3: Optimization as a Markov chain.

Algorithms such as simulated annealing, to be discussed in the next chapter, use a single Markov chain, which may not be very efficient. In practice, it is usually advantageous to use multiple Markov chains in parallel to increase the overall efficiency. In fact, algorithms such as particle swarm optimization can be viewed as multiple interacting Markov chains, though such theoretical analysis remains almost intractable. The theory of interacting Markov chains is complicated and still under development; however, any progress in such areas will play a central role in understanding how population- and trajectory-based metaheuristic algorithms perform under various conditions. Even though we do not fully understand why metaheuristic algorithms work, this does not hinder us from using these algorithms efficiently. On the contrary, such mysteries can drive and motivate us to pursue further research and development in metaheuristics.
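A minimal Matlab/Octave sketch of the framework in Figure 2.3, using the simulated-annealing acceptance probability given above (our sketch; the objective, step size, cooling schedule and iteration count are arbitrary choices):

% Markov chain optimization (Fig. 2.3) with SA-style acceptance
f = @(x) x.^2 + 10*sin(x);        % arbitrary multimodal objective
x = -10 + 20*rand; fx = f(x);     % random initial state zeta_0
T = 1.0;                          % initial 'temperature'
for t = 1:5000
    y = x + 0.5*randn;            % propose a new solution Y_{t+1}
    fy = f(y);
    % accept always if better; otherwise with probability exp(-df/T)
    if fy <= fx || rand < exp(-(fy - fx)/T)
        x = y; fx = fy;
    end
    T = 0.999*T;                  % slow geometric cooling (assumed)
end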

REFERENCES

1. W. J. Bell, Searching Behaviour: The Behavioural Ecology of Finding Resources, Chapman & Hall, London, (1991).
2. C. Blum and A. Roli, Metaheuristics in combinatorial optimization: overview and conceptual comparison, ACM Comput. Surv., 35, 268-308 (2003).
3. G. S. Fishman, Monte Carlo: Concepts, Algorithms and Applications, Springer,
New York, (1995).
4. D. Gamerman, Markov Chain Monte Carlo, Chapman & Hall/CRC, (1997).
5. L. Gerencser, S. D. Hill, Z. Vago, and Z. Vincze, Discrete optimization,
SPSA, and Markov chain Monte Carlo methods, Proc. 2004 Am. Contr.
Conf., 3814-3819 (2004).
6. C. J. Geyer, Practical Markov Chain Monte Carlo, Statistical Science, 7,
473-511 (1992).
7. A. Ghate and R. Smith, Adaptive search with stochastic acceptance probabil-
ities for global optimization, Operations Research Lett., 36, 285-290 (2008).
8. W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, Markov Chain Monte
Carlo in Practice, Chapman & Hall/CRC, (1996).
9. M. Gutowski, Lévy flights as an underlying mechanism for global optimization algorithms, ArXiv Mathematical Physics e-Prints, June, (2001).
10. W. K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika, 57, 97-109 (1970).
11. S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi, Optimization by simulated annealing, Science, 220, 671-680 (1983).
12. R. N. Mantegna, Fast, accurate algorithm for numerical simulation of Lévy stable stochastic processes, Physical Review E, 49, 4677-4683 (1994).
13. E. Marinari and G. Parisi, Simulated tempering: a new Monte Carlo scheme,
Europhysics Lett., 19, 451-458 (1992).
14. J. P. Nolan, Stable distributions: models for heavy-tailed data, American
University, (2009).
15. N. Metropolis and S. Ulam, The Monte Carlo method, J. Amer. Stat. Assoc., 44, 335-341 (1949).
16. N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E.
Teller, Equation of state calculations by fast computing machines, J. Chem.
Phys., 21, 1087-1092 (1953).
17. I. Pavlyukevich, Lévy flights, non-local search and simulated annealing, J.
Computational Physics, 226, 1830-1844 (2007).
18. G. Ramos-Fernandez, J. L. Mateos, O. Miramontes, G. Cocho, H. Larralde, B. Ayala-Orozco, Lévy walk patterns in the foraging movements of spider monkeys (Ateles geoffroyi), Behav. Ecol. Sociobiol., 55, 223-230 (2004).
19. A. M. Reynolds and M. A. Frye, Free-flight odor tracking in Drosophila is
consistent with an optimal intermittent scale-free search, PLoS One, 2, e354
(2007).
20. A. M. Reynolds and C. J. Rhodes, The Lévy flight paradigm: random search
patterns and mechanisms, Ecology, 90, 877-887 (2009).
21. I. M. Sobol, A Primer for the Monte Carlo Method, CRC Press, (1994).
22. M. E. Tipping, Bayesian inference: An introduction to principles and practice in machine learning, in: Advanced Lectures on Machine Learning, O. Bousquet, U. von Luxburg and G. Rätsch (Eds), pp. 41-62 (2004).
23. G. M. Viswanathan, S. V. Buldyrev, S. Havlin, M. G. E. da Luz, E. P. Ra-
poso, and H. E. Stanley, Lévy flight search patterns of wandering albatrosses,
Nature, 381, 413-415 (1996).
24. E. Weisstein, http://mathworld.wolfram.com

Chapter 10

FIREFLY ALGORITHM

10.1 BEHAVIOUR OF FIREFLIES

The flashing light of fireflies is an amazing sight in the summer sky in
the tropical and temperate regions. There are about two thousand firefly
species, and most fireflies produce short and rhythmic flashes. The pat-
tern of flashes is often unique for a particular species. The flashing light is
produced by a process of bioluminescence, and the true functions of such
signaling systems are still being debated. However, two fundamental func-
tions of such flashes are to attract mating partners (communication), and
to attract potential prey. In addition, flashing may also serve as a protec-
tive warning mechanism to remind potential predators of the bitter taste
of fireflies.
The rhythmic flash, the rate of flashing and the amount of time form
part of the signal system that brings both sexes together. Females respond
to a male’s unique pattern of flashing in the same species, while in some
species such as Photuris, female fireflies can eavesdrop on the biolumines-
cent courtship signals and even mimic the mating flashing pattern of other
species so as to lure and eat the male fireflies who may mistake the flashes
as a potential suitable mate. Some tropical fireflies can even synchronize
their flashes, thus forming emergent biological self-organized behavior.
We know that the light intensity at a particular distance r from the light
source obeys the inverse square law. That is to say, the light intensity I
decreases as the distance r increases in terms of I ∝ 1/r2 . Furthermore,
the air absorbs light which becomes weaker and weaker as the distance
increases. These two combined factors make most fireflies visible only to a
limited distance, usually several hundred meters at night, which is good enough
for fireflies to communicate.
The flashing light can be formulated in such a way that it is associated
with the objective function to be optimized, which makes it possible to
formulate new optimization algorithms. In the rest of this chapter, we will
first outline the basic formulation of the Firefly Algorithm (FA) and then
discuss the implementation in detail.

Firefly Algorithm
Objective function f(x), x = (x1, ..., xd)^T
Generate initial population of fireflies xi (i = 1, 2, ..., n)
Light intensity Ii at xi is determined by f (xi )
Define light absorption coefficient γ
while (t < MaxGeneration)
for i = 1 : n all n fireflies
for j = 1 : n all n fireflies (inner loop)
if (Ii < Ij ), Move firefly i towards j; end if
Vary attractiveness with distance r via exp[−γr]
Evaluate new solutions and update light intensity
end for j
end for i
Rank the fireflies and find the current global best g ∗
end while
Postprocess results and visualization

Figure 10.1: Pseudo code of the firefly algorithm (FA).

10.2 FIREFLY ALGORITHM

Now we can idealize some of the flashing characteristics of fireflies so as
to develop firefly-inspired algorithms. For simplicity in describing our new
Firefly Algorithm (FA), which was developed by Xin-She Yang at Cam-
bridge University in 2007, we now use the following three idealized rules:

• All fireflies are unisex so that one firefly will be attracted to other
fireflies regardless of their sex;

• Attractiveness is proportional to their brightness; thus, for any two
flashing fireflies, the less bright one will move towards the brighter
one. The attractiveness is proportional to the brightness and they
both decrease as their distance increases. If there is no brighter one
than a particular firefly, it will move randomly;

• The brightness of a firefly is affected or determined by the landscape
of the objective function.
For a maximization problem, the brightness can simply be proportional
to the value of the objective function. Other forms of brightness can be
defined in a similar way to the fitness function in genetic algorithms.
Based on these three rules, the basic steps of the firefly algorithm (FA)
can be summarized as the pseudo code shown in Figure 10.1.

10.3 LIGHT INTENSITY AND ATTRACTIVENESS

In the firefly algorithm, there are two important issues: the variation of
light intensity and formulation of the attractiveness. For simplicity, we
can always assume that the attractiveness of a firefly is determined by its
brightness which in turn is associated with the encoded objective function.
In the simplest case for maximization problems, the brightness I of a
firefly at a particular location x can be chosen as I(x) ∝ f(x).
However, the attractiveness β is relative; it should be seen in the eyes
of the beholder or judged by the other fireflies. Thus, it will vary with
the distance rij between firefly i and firefly j. In addition, light intensity
decreases with the distance from its source, and light is also absorbed in
the media, so we should allow the attractiveness to vary with the degree of
absorption.
In the simplest form, the light intensity I(r) varies according to the
inverse square law
I(r) = Is/r² , (10.1)
where Is is the intensity at the source. For a given medium with a fixed
light absorption coefficient γ, the light intensity I varies with the distance
r. That is
I = I0 e−γr , (10.2)
where I0 is the original light intensity. In order to avoid the singularity
at r = 0 in the expression Is/r², the combined effect of both the inverse
square law and absorption can be approximated as the following Gaussian
form

I(r) = I0 e−γr² . (10.3)
As a firefly’s attractiveness is proportional to the light intensity seen by
adjacent fireflies, we can now define the attractiveness β of a firefly by
β = β0 e−γr² , (10.4)
where β0 is the attractiveness at r = 0. As it is often faster to calculate
1/(1 + r²) than an exponential function, the above function, if necessary,
can conveniently be approximated as
β = β0/(1 + γr²). (10.5)

Both (10.4) and (10.5) define a characteristic distance Γ = 1/√γ over which
the attractiveness changes significantly from β0 to β0 e−1 for equation (10.4)
or to β0/2 for equation (10.5).

In the actual implementation, the attractiveness function β(r) can be
any monotonically decreasing function such as the following generalized
form

β(r) = β0 e−γr^m , (m ≥ 1). (10.6)

For a fixed γ, the characteristic length becomes

Γ = γ^(−1/m) → 1 as m → ∞. (10.7)

Conversely, for a given length scale Γ in an optimization problem, the
parameter γ can be chosen accordingly; a typical initial value is

γ = 1/Γ^m . (10.8)
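As a small numerical illustration of (10.6) and (10.8), the following Matlab/Octave lines (a sketch; the values of Γ, m and β0 are arbitrary assumptions) choose γ from a given length scale and evaluate the attractiveness:

% Set gamma from a known length scale Gamma via (10.8), then
% evaluate the attractiveness (10.6); parameter values are arbitrary.
Gamma=2; m=2;                 % assumed length scale and exponent
gamma=1/Gamma^m;              % equation (10.8)
beta0=1; r=0:0.5:4;           % attractiveness at r=0; test distances
beta=beta0*exp(-gamma*r.^m);  % equation (10.6)
disp([r; beta])               % beta falls to beta0/e at r=Gamma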
The distance between any two fireflies i and j at xi and xj , respectively,
is the Cartesian distance
rij = ||xi − xj|| = √( Σ_{k=1}^{d} (xi,k − xj,k)² ), (10.9)

where xi,k is the kth component of the spatial coordinate xi of the ith firefly.
In the 2-D case, we have

rij = √((xi − xj)² + (yi − yj)²). (10.10)

The movement of a firefly i that is attracted to another, more attractive
(brighter) firefly j is determined by

xi = xi + β0 e−γrij² (xj − xi) + α ǫi , (10.11)
where the second term is due to the attraction. The third term is random-
ization with α being the randomization parameter, and ǫi is a vector of
random numbers drawn from a Gaussian distribution or uniform distribu-
tion. For example, in the simplest form, ǫi can be replaced by rand − 1/2,
where rand is a random number generator uniformly distributed in [0, 1].
For most of our implementations, we can take β0 = 1 and α ∈ [0, 1].
It is worth pointing out that (10.11) is a random walk biased towards the
brighter fireflies. If β0 = 0, it becomes a simple random walk. Furthermore,
the randomization term can easily be extended to other distributions such
as Lévy flights.
The parameter γ now characterizes the variation of the attractiveness,
and its value is crucially important in determining the speed of the con-
vergence and how the FA algorithm behaves. In theory, γ ∈ [0, ∞), but
in practice, γ = O(1) is determined by the characteristic length Γ of the
system to be optimized. Thus, for most applications, it typically varies
from 0.1 to 10.
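For concreteness, a single application of (10.11) for one pair of fireflies can be sketched in Matlab/Octave as follows; the positions and parameter values are arbitrary illustrative assumptions, and the full vectorized implementation is given in Section 10.5.

% One move of firefly i towards a brighter firefly j, eq. (10.11);
% positions and parameters here are arbitrary illustrative values.
xi=[0.5 0.5]; xj=[1.0 2.0];        % firefly i moves towards j
beta0=1; gamma=1; alpha=0.2; d=2;
r2=sum((xi-xj).^2);                % squared distance rij^2
xi=xi+beta0*exp(-gamma*r2)*(xj-xi)+alpha*(rand(1,d)-0.5);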
10.4 SCALINGS AND ASYMPTOTICS

It is worth pointing out that the distance r defined above is not limited to
the Euclidean distance. We can define other distances r in the n-dimensional
hyperspace, depending on the type of problem of our interest. For example,
for job scheduling problems, r can be defined as the time lag or time in-
terval. For complicated networks such as the Internet and social networks,
the distance r can be defined as the combination of the degree of local
clustering and the average proximity of vertices. In fact, any measure that
can effectively characterize the quantities of interest in the optimization
problem can be used as the ‘distance’ r.
The typical scale Γ should be associated with the scale concerned in
our optimization problem. If Γ is the typical scale for a given optimization
problem, for a very large number of fireflies n ≫ k where k is the number of
local optima, then the initial locations of these n fireflies should distribute
relatively uniformly over the entire search space. As the iterations proceed,
the fireflies would converge into all the local optima (including the global
ones). By comparing the best solutions among all these optima, the global
optima can easily be achieved. Our recent research suggests that it is
possible to prove that the firefly algorithm will approach global optima
when n → ∞ and t ≫ 1. In reality, it converges very quickly and this will
be demonstrated later in this chapter.
There are two important limiting or asymptotic cases when γ → 0 and
γ → ∞. For γ → 0, the attractiveness is constant β = β0 and Γ → ∞;
this is equivalent to saying that the light intensity does not decrease in an
idealized sky. Thus, a flashing firefly can be seen anywhere in the domain,
and a single (usually global) optimum can easily be reached. If we remove
the inner loop for j in Figure 10.1 and replace xj by the current global
best g∗, then the Firefly Algorithm becomes the special case of accelerated
particle swarm optimization (PSO) discussed earlier. Subsequently, the
efficiency of this special case is the same as that of PSO.
On the other hand, the limiting case γ → ∞ leads to Γ → 0 and β(r) →
δ(r), the Dirac delta function, which means that the attractiveness
is almost zero in the sight of other fireflies. This is equivalent to the case
where the fireflies roam randomly in a very thick, foggy region. No other
fireflies can be seen, and each firefly roams in a completely random way.
Therefore, this corresponds to the completely random search method.
As the firefly algorithm usually operates between these two extremes,
it is possible to adjust the parameters γ and α so that it can outperform
both the random search and PSO. In fact, FA can find the global optima
as well as the local optima simultaneously and effectively. This advantage
will be demonstrated in detail later in the implementation.
A further advantage of FA is that different fireflies will work almost
independently; it is thus particularly suitable for parallel implementation. It
is even better than genetic algorithms and PSO because fireflies aggregate
more closely around each optimum. It can be expected that the interactions
between different subregions are minimal in parallel implementation.

Figure 10.2: Landscape of a function with two equal global maxima.

10.5 IMPLEMENTATION

In order to demonstrate how the firefly algorithm works, we have imple-
mented it in Matlab/Octave; the listing is given below.
In order to show that both the global optima and local optima can be
found simultaneously, we now use the following four-peak function
f(x, y) = e−(x−4)²−(y−4)² + e−(x+4)²−(y−4)² + 2[e−x²−y² + e−x²−(y+4)²],

where (x, y) ∈ [−5, 5] × [−5, 5]. This function has four peaks: two local
peaks with f = 1 at (−4, 4) and (4, 4), and two global peaks with fmax = 2
at (0, 0) and (0, −4), as shown in Figure 10.2. We can see that all these
four optima can be found using 25 fireflies in about 20 generations (see Fig.
10.3). So the total number of function evaluations is about 500. This is
much more efficient than most existing metaheuristic algorithms.

% Firefly Algorithm by X S Yang (Cambridge University)
% Usage: firefly_simple([number_of_fireflies,MaxGeneration])
% eg: firefly_simple([12,50]);
function [best]=firefly_simple(instr)
% n=number of fireflies
% MaxGeneration=number of pseudo time steps
if nargin<1, instr=[12 50]; end
n=instr(1); MaxGeneration=instr(2);
rand('state',0); % Reset the random generator
% ------ Four peak functions ---------------------
str1='exp(-(x-4)^2-(y-4)^2)+exp(-(x+4)^2-(y-4)^2)';
str2='+2*exp(-x^2-(y+4)^2)+2*exp(-x^2-y^2)';
funstr=strcat(str1,str2);
% Converting to an inline function
f=vectorize(inline(funstr));

% range=[xmin xmax ymin ymax];
range=[-5 5 -5 5];

% ------------------------------------------------
alpha=0.2; % Randomness 0--1 (highly random)
gamma=1.0; % Absorption coefficient
% ------------------------------------------------
% Grid values are used for display only
Ngrid=100;
dx=(range(2)-range(1))/Ngrid;
dy=(range(4)-range(3))/Ngrid;
[x,y]=meshgrid(range(1):dx:range(2),...
range(3):dy:range(4));
z=f(x,y);
% Display the shape of the objective function
figure(1); surfc(x,y,z);

% ------------------------------------------------
% generating the initial locations of n fireflies
[xn,yn,Lightn]=init_ffa(n,range);
% Display the paths of fireflies in a figure with
% contours of the function to be optimized
figure(2);
% Iterations or pseudo time marching
for i=1:MaxGeneration, %%%%% start iterations
% Show the contours of the function
contour(x,y,z,15); hold on;
% Evaluate new solutions
zn=f(xn,yn);

% Ranking the fireflies by their light intensity


[Lightn,Index]=sort(zn);
xn=xn(Index); yn=yn(Index);
xo=xn; yo=yn; Lighto=Lightn;
% Trace the paths of all roaming fireflies
plot(xn,yn,'.','markersize',10,'markerfacecolor','g');
% Move all fireflies to the better locations
[xn,yn]=ffa_move(xn,yn,Lightn,xo,yo,...
Lighto,alpha,gamma,range);
drawnow;
% Use "hold on" to show the paths of fireflies
hold off;
end %%%%% end of iterations
best(:,1)=xo'; best(:,2)=yo'; best(:,3)=Lighto';

Figure 10.3: The initial locations of 25 fireflies (left) and their final locations
after 20 iterations (right).

% ----- All subfunctions are listed here ---------
% The initial locations of n fireflies
function [xn,yn,Lightn]=init_ffa(n,range)
xrange=range(2)-range(1);
yrange=range(4)-range(3);
xn=rand(1,n)*xrange+range(1);
yn=rand(1,n)*yrange+range(3);
Lightn=zeros(size(yn));

% Move all fireflies toward brighter ones


function [xn,yn]=ffa_move(xn,yn,Lightn,xo,yo,...
Lighto,alpha,gamma,range)
ni=size(yn,2); nj=size(yo,2);
for i=1:ni,
% The attractiveness parameter beta=exp(-gamma*r)
for j=1:nj,
r=sqrt((xn(i)-xo(j))^2+(yn(i)-yo(j))^2);
if Lightn(i)<Lighto(j), % Brighter and more attractive
beta0=1; beta=beta0*exp(-gamma*r.^2);
xn(i)=xn(i).*(1-beta)+xo(j).*beta+alpha.*(rand-0.5);
yn(i)=yn(i).*(1-beta)+yo(j).*beta+alpha.*(rand-0.5);
end
end % end for j
end % end for i
[xn,yn]=findrange(xn,yn,range);

% Make sure the fireflies are within the range
function [xn,yn]=findrange(xn,yn,range)
for i=1:length(yn),
if xn(i)<=range(1), xn(i)=range(1); end
if xn(i)>=range(2), xn(i)=range(2); end
if yn(i)<=range(3), yn(i)=range(3); end
if yn(i)>=range(4), yn(i)=range(4); end
end

In the implementation, the values of the parameters are α = 0.2, γ = 1
and β0 = 1. Obviously, these parameters can be adjusted to suit various
problems with different scales.

10.6 FA VARIANTS

The basic firefly algorithm is very efficient, but we can see that the solutions
are still changing as the optima are approached. It is possible to improve
the solution quality by reducing the randomness.
A further improvement on the convergence of the algorithm is to vary
the randomization parameter α so that it decreases gradually as the optima
are approached. For example, we can use

α = α∞ + (α0 − α∞ )e−t , (10.12)

where t ∈ [0, tmax] is the pseudo time for simulations and tmax is the max-
imum number of generations. Here α0 is the initial randomization parameter
while α∞ is the final value. We can also use a function similar to the
geometrical annealing schedule. That is

α = α0 θt , (10.13)

where θ ∈ (0, 1] is the randomness reduction constant.
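Both schedules are straightforward to implement; the following Matlab/Octave fragment (a sketch in which the values of α0, α∞ and θ are assumed for illustration) evaluates (10.12) and (10.13) over the generations:

% Two ways of reducing the randomization alpha over pseudo time t;
% alpha0, alphainf and theta are assumed example values.
alpha0=0.5; alphainf=0.01; theta=0.97;
t=0:100;                                   % pseudo time steps
alpha1=alphainf+(alpha0-alphainf)*exp(-t); % eq. (10.12)
alpha2=alpha0*theta.^t;                    % eq. (10.13)
plot(t,alpha1,t,alpha2);                   % compare the two schedules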


In addition, in the current version of the FA algorithm, we do not ex-
plicitly use the current global best g∗; it is only used to decode the final
best solutions. Our recent studies show that the efficiency may improve
significantly if we add an extra term λǫi(xi − g∗) to the updating
formula (10.11). Here λ is a parameter similar to α and β, and ǫi is a
vector of random numbers. These could form important topics for further
research.

10.7 SPRING DESIGN

The design of a tension and compression spring is a well-known benchmark
optimization problem. The main aim is to minimize the weight subject
to constraints on deflection, stress, surge frequency and geometry. It in-
volves three design variables: the wire diameter x1 , coil diameter x2 and
number/length of the coil x3 . This problem can be summarized as

minimize f(x) = x1² x2 (2 + x3), (10.14)
subject to the following constraints

g1(x) = 1 − x2³ x3/(71785 x1⁴) ≤ 0,

g2(x) = (4x2² − x1x2)/(12566(x1³x2 − x1⁴)) + 1/(5108 x1²) − 1 ≤ 0,

g3(x) = 1 − 140.45 x1/(x2² x3) ≤ 0,

g4(x) = (x1 + x2)/1.5 − 1 ≤ 0. (10.15)
The simple bounds on the design variables are

0.05 ≤ x1 ≤ 2.0, 0.25 ≤ x2 ≤ 1.3, 2.0 ≤ x3 ≤ 15.0. (10.16)

The best solution found in the literature (e.g., Cagnina et al. 2008) is

x∗ = (0.051690, 0.356750, 11.287126), (10.17)

with the objective

f(x∗) = 0.012665. (10.18)
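This value is easy to verify directly from (10.14); for instance, in Matlab/Octave:

% Check the reported optimum of the spring design objective (10.14)
x=[0.051690 0.356750 11.287126];
f=x(1)^2*x(2)*(2+x(3))   % displays f = 0.012665 (approximately)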
We now provide the Matlab implementation of our firefly algorithm to-
gether with the penalty method for incorporating constraints. You may
need a newer version of Matlab to deal with function handles. If you run
the program a few times, you can get the above optimal solutions. It is
even possible to produce better results if you experiment with the program
for a while.

% -------------------------------------------------------%
% Firefly Algorithm for constrained optimization %
% by Xin-She Yang (Cambridge University) Copyright @2009 %
% -------------------------------------------------------%
function fa_mincon_demo

% parameters [n N_iteration alpha betamin gamma]


para=[40 150 0.5 0.2 1];

% This demo uses the Firefly Algorithm to solve the


% [Spring Design Problem as described by Cagnina et al.,
% Informatica, vol. 32, 319-326 (2008). ]

% Simple bounds/limits
disp('Solve the simple spring design problem ...');
Lb=[0.05 0.25 2.0];
Ub=[2.0 1.3 15.0];
% Initial random guess
u0=(Lb+Ub)/2;

[u,fval,NumEval]=ffa_mincon(@cost,@constraint,u0,Lb,Ub,para);


% Display results
bestsolution=u
bestobj=fval
total_number_of_function_evaluations=NumEval

%%% Put your own cost/objective function here --------%%%


%% Cost or Objective function
function z=cost(x)
z=(2+x(3))*x(1)^2*x(2);

% Constrained optimization using penalty methods


% by changing f to F=f+ \sum lam_j*g^2_j*H_j(g_j)
% where H(g)=0 if g<=0 (constraint satisfied), =1 otherwise

%%% Put your own constraints here --------------------%%%


function [g,geq]=constraint(x)
% All nonlinear inequality constraints should be here
% If no inequality constraint at all, simply use g=[];
g(1)=1-x(2)^3*x(3)/(71785*x(1)^4);
tmpf=(4*x(2)^2-x(1)*x(2))/(12566*(x(2)*x(1)^3-x(1)^4));
g(2)=tmpf+1/(5108*x(1)^2)-1;
g(3)=1-140.45*x(1)/(x(2)^2*x(3));
g(4)=x(1)+x(2)-1.5;

% all nonlinear equality constraints should be here


% If no equality constraint at all, put geq=[] as follows
geq=[];

%%% End of the part to be modified -------------------%%%

%%% --------------------------------------------------%%%
%%% Do not modify the following codes unless you want %%%
%%% to improve its performance etc %%%
% -------------------------------------------------------
% ===Start of the Firefly Algorithm Implementation ======
% Inputs: fhandle => @cost (your own cost function,
% can be an external file )
% nonhandle => @constraint, all nonlinear constraints
% can be an external file or a function
% Lb = lower bounds/limits
% Ub = upper bounds/limits
% para == optional (to control the Firefly algorithm)
% Outputs: nbest = the best solution found so far
%          fbest = the best objective value
%          NumEval = number of evaluations: n*MaxGeneration
% Optional:
% The alpha can be reduced (as to reduce the randomness)
% ---------------------------------------------------------


% Start FA
function [nbest,fbest,NumEval]...
=ffa_mincon(fhandle,nonhandle,u0, Lb, Ub, para)
% Check input parameters (otherwise set as default values)
if nargin<6, para=[20 50 0.25 0.20 1]; end
if nargin<5, Ub=[]; end
if nargin<4, Lb=[]; end
if nargin<3,
disp('Usage: FA_mincon(@cost, @constraint,u0,Lb,Ub,para)');
end

% n=number of fireflies
% MaxGeneration=number of pseudo time steps
% ------------------------------------------------
% alpha=0.25; % Randomness 0--1 (highly random)
% betamin=0.20; % minimum value of beta
% gamma=1; % Absorption coefficient
% ------------------------------------------------
n=para(1); MaxGeneration=para(2);
alpha=para(3); betamin=para(4); gamma=para(5);

% Total number of function evaluations


NumEval=n*MaxGeneration;

% Check if the upper bound & lower bound are the same size
if length(Lb) ~=length(Ub),
disp(’Simple bounds/limits are improper!’);
return
end

% Calculate dimension
d=length(u0);

% Initial values of an array


zn=ones(n,1)*10^100;
% ------------------------------------------------
% generating the initial locations of n fireflies
[ns,Lightn]=init_ffa(n,d,Lb,Ub,u0);

% Iterations or pseudo time marching


for k=1:MaxGeneration, %%%%% start iterations

% This line of reducing alpha is optional
alpha=alpha_new(alpha,MaxGeneration);

% Evaluate new solutions (for all n fireflies)
for i=1:n,
zn(i)=Fun(fhandle,nonhandle,ns(i,:));
Lightn(i)=zn(i);
end

% Ranking fireflies by their light intensity/objectives


[Lightn,Index]=sort(zn);
ns_tmp=ns;
for i=1:n,
ns(i,:)=ns_tmp(Index(i),:);
end

%% Find the current best


nso=ns; Lighto=Lightn;
nbest=ns(1,:); Lightbest=Lightn(1);

% For output only


fbest=Lightbest;

% Move all fireflies to the better locations


[ns]=ffa_move(n,d,ns,Lightn,nso,Lighto,nbest,...
Lightbest,alpha,betamin,gamma,Lb,Ub);

end %%%%% end of iterations

% -------------------------------------------------------
% ----- All the subfunctions are listed here ------------
% The initial locations of n fireflies
function [ns,Lightn]=init_ffa(n,d,Lb,Ub,u0)
% if there are bounds/limits,
if length(Lb)>0,
for i=1:n,
ns(i,:)=Lb+(Ub-Lb).*rand(1,d);
end
else
% generate solutions around the random guess
for i=1:n,
ns(i,:)=u0+randn(1,d);
end
end

% initial value before function evaluations


Lightn=ones(n,1)*10^100;

% Move all fireflies toward brighter ones
function [ns]=ffa_move(n,d,ns,Lightn,nso,Lighto,...
nbest,Lightbest,alpha,betamin,gamma,Lb,Ub)
% Scaling of the system
scale=abs(Ub-Lb);

% Updating fireflies
for i=1:n,
% The attractiveness parameter beta=exp(-gamma*r)
for j=1:n,
r=sqrt(sum((ns(i,:)-ns(j,:)).^2));
% Update moves
if Lightn(i)>Lighto(j), % Brighter and more attractive
beta0=1; beta=(beta0-betamin)*exp(-gamma*r.^2)+betamin;
tmpf=alpha.*(rand(1,d)-0.5).*scale;
ns(i,:)=ns(i,:).*(1-beta)+nso(j,:).*beta+tmpf;
end
end % end for j

end % end for i

% Check if the updated solutions/locations are within limits


[ns]=findlimits(n,ns,Lb,Ub);

% This function is optional, as it is not in the original FA


% The idea to reduce randomness is to increase the convergence,
% however, if you reduce randomness too quickly, then premature
% convergence can occur. So use with care.
function alpha=alpha_new(alpha,NGen)
% delta is chosen so that alpha_n=alpha_0*(1-delta)^NGen=0.005
% with alpha_0=0.9
delta=1-(0.005/0.9)^(1/NGen);
alpha=(1-delta)*alpha;

% Make sure the fireflies are within the bounds/limits


function [ns]=findlimits(n,ns,Lb,Ub)
for i=1:n,
% Apply the lower bound
ns_tmp=ns(i,:);
I=ns_tmp<Lb;
ns_tmp(I)=Lb(I);

% Apply the upper bounds


J=ns_tmp>Ub;
ns_tmp(J)=Ub(J);
% Update this new move
ns(i,:)=ns_tmp;
end

% -----------------------------------------
% d-dimensional objective function
function z=Fun(fhandle,nonhandle,u)
% Objective
z=fhandle(u);

% Apply nonlinear constraints by the penalty method


% Z=f+sum_k=1^N lam_k g_k^2 *H(g_k) where lam_k >> 1
z=z+getnonlinear(nonhandle,u);

function Z=getnonlinear(nonhandle,u)
Z=0;
% Penalty constant >> 1
lam=10^15; lameq=10^15;
% Get nonlinear constraints
[g,geq]=nonhandle(u);

% Apply inequality constraints as a penalty function


for k=1:length(g),
Z=Z+ lam*g(k)^2*getH(g(k));
end
% Apply equality constraints (when geq=[], length->0)
for k=1:length(geq),
Z=Z+lameq*geq(k)^2*geteqH(geq(k));
end

% Test if inequalities hold


% H(g), which is essentially an indicator (index) function
function H=getH(g)
if g<=0,
H=0;
else
H=1;
end

% Test if equalities hold


function H=geteqH(g)
if g==0,
H=0;
else
H=1;
end
%% ==== End of Firefly Algorithm implementation ======
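To adapt this code to another constrained problem, only the cost and constraint subfunctions need replacing. As a hypothetical sketch (the objective, constraint, bounds and parameter values below are made up for illustration, and ffa_mincon together with its subfunctions is assumed to be available on the Matlab path):

% Hypothetical usage: minimize (x1-1)^2+(x2-2)^2 subject to
% x1+x2-2<=0, reusing ffa_mincon from the listing above.
function my_fa_demo
para=[25 100 0.5 0.2 1];   % [n MaxGeneration alpha betamin gamma]
Lb=[-5 -5]; Ub=[5 5];      % simple bounds (assumed)
u0=(Lb+Ub)/2;              % initial guess
[u,fval,NumEval]=ffa_mincon(@mycost,@mycon,u0,Lb,Ub,para)

function z=mycost(x)
z=(x(1)-1)^2+(x(2)-2)^2;   % a made-up objective

function [g,geq]=mycon(x)
g(1)=x(1)+x(2)-2;          % inequality constraint g<=0
geq=[];                    % no equality constraints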

REFERENCES

1. J. Arora, Introduction to Optimum Design, McGraw-Hill, (1989).
2. L. C. Cagnina, S. C. Esquivel, C. A. Coello, Solving engineering optimization problems with the simple constrained particle swarm optimizer, Informatica, 32, 319-326 (2008).
3. S. Lukasik and S. Zak, Firefly algorithm for continuous constrained optimization tasks, ICCCI 2009, Lecture Notes in Artificial Intelligence (Eds. N. T. Nguyen et al.), 5796, 97-106 (2009).
4. S. M. Lewis and C. K. Cratsley, Flash signal evolution, mate choice, and
predation in fireflies, Annual Review of Entomology, 53, 293-321 (2008).
5. C. O’Toole, Firefly Encyclopedia of Insects and Spiders, Firefly Books Ltd,
2002.
6. A. M. Reynolds and C. J. Rhodes, The Lévy flight paradigm: random search patterns and mechanisms, Ecology, 90, 877-887 (2009).
7. E. G. Talbi, Metaheuristics: From Design to Implementation, Wiley, (2009).
8. X. S. Yang, Nature-Inspired Metaheuristic Algorithms, Luniver Press, (2008).
9. X. S. Yang, Firefly algorithms for multimodal optimization, in: Stochastic
Algorithms: Foundations and Applications, SAGA 2009, Lecture Notes in
Computer Science, 5792, 169-178 (2009).
10. X. S. Yang, Firefly algorithm, Lévy flights and global optimization, in: Re-
search and Development in Intelligent Systems XXVI, (Eds M. Bramer et
al.), Springer, London, pp. 209-218 (2010).

