Extremal Optimization: Fundamentals, Algorithms, and Applications
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
© 2016 by Chemical Industry Press. Published by Taylor & Francis Group under an exclusive license with Chemical
Industry Press.
CRC Press is an imprint of Taylor & Francis Group, an Informa business
This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been
made to publish reliable data and information, but the author and publisher cannot assume responsibility for the
validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the
copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to
publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let
us know so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.copyright.com
(http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers,
MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety
of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment
has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for
identification and explanation without intent to infringe.
Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
Preface.................................................................................................................xi
Acknowledgments.............................................................................................. xv
Section III APPLICATIONS
7 EO for Systems Modeling and Control...............................................215
7.1 Problem Statement......................................................................... 215
7.2 Endpoint Quality Prediction of Batch Production with MA-EO...... 216
7.3 EO for Kernel Function and Parameter Optimization in
Support Vector Regression.............................................................. 219
7.3.1 Introduction.......................................................................221
7.3.2 Problem Formulation.........................................................221
7.3.2.1 Support Vector Regression................................. 222
7.3.2.2 Optimization of SVR Kernel Function and
Parameters...........................................................223
7.3.3 Hybrid EO-Based Optimization for SVR Kernel
Function and Parameters...................................................224
7.3.3.1 Chromosome Structure.......................................224
7.3.3.2 Fitness Function..................................................225
7.3.3.3 EO-SVR Workflow............................................ 226
7.3.4 Experimental Results........................................................ 228
7.3.4.1 Approximation of Single-Variable Function....... 228
7.3.4.2 Approximation of Multivariable Function...........233
7.4 Nonlinear Model Predictive Control with MA-EO........................238
7.4.1 Problem Formulation for NMPC Based on SVM Model.....239
With the high demand for, and the critical situation of, solving the hard optimization problems we face in social, environmental, bioinformatics, traffic, and industrial systems, the development of more efficient novel optimization solutions has been a serious challenge for academic and practitioner communities in an information-rich era. In addition to the traditional mathematical-programming-inspired optimization solutions, computational intelligence has been playing an important role in developing novel optimization solutions for practical applications. On the basis of the features of system complexity, a new general-purpose heuristic for finding high-quality solutions to NP-hard (nondeterministic polynomial-time hard) optimization problems, the so-called "extremal optimization (EO)," was proposed by Boettcher and Percus. In principle, this method is inspired by the Bak–Sneppen model of self-organized criticality from statistical physics, a key concept describing "far-from-equilibrium phenomena" and the complexity of physical systems. In tests on popular benchmarks of large-scale constrained combinatorial optimization problems (the TSP [traveling salesman problem], graph coloring, K-SAT, spin glasses, etc.), EO shows performance superior to other modern heuristics, such as simulated annealing and the genetic algorithm (GA), in convergence and in the capability of dealing with computational complexity, for example, phase transitions in search dynamics, while having far fewer tuning parameters.
The aim of this book is to introduce state-of-the-art EO solutions, from fundamentals, methodologies, and algorithms to applications, based on numerous classic publications and the authors' recent original research results, and to make EO more popular across multiple disciplines, such as operations research, software, systems control, and manufacturing. Hopefully, this book will promote the movement of EO from academic study to practical applications. It should be noted that EO has a strong basic-science foundation in statistical physics and bioevolution; from the application point of view, however, EO is much simpler, easier, and more straightforward to apply than many other metaheuristics. With more studies on EO search dynamics, hybrid solutions marrying EO with other metaheuristics, and real-world applications, EO will be an additional weapon for dealing with hard optimization problems. The contents of this book cover the following four aspects:
The authors have made great efforts to focus on the development of modified EO (MEO) and its applications, and also to present the advanced features of EO in solving NP-hard problems through problem formulation, algorithms, and simulation studies on popular benchmarks and industrial applications. This book can be used as a reference for graduate students, research developers, and practicing engineers working on optimization solutions for complex systems whose hardness cannot be addressed by mathematical optimization or by other computational intelligence techniques, such as evolutionary computation. The book is divided into the following three sections.
Section I: Chapter 1 provides a general introduction to optimization with a focus on computational complexity, computational intelligence, the highlights of EO, and the organization of the book; Chapter 2 introduces the fundamentals and numerical examples of extremal dynamics–inspired EO; and Chapter 3 presents extremal dynamics–inspired self-organizing optimization.

Section II: Chapter 4 covers the development of modified EO, such as population-based EO, multistage EO, and modified EO with an extended evolutionary probability distribution. Chapter 5 presents the development of memetic algorithms that integrate EO with other computational intelligence techniques, such as GA, particle swarm optimization (PSO), and artificial bee colony (ABC). Chapter 6 presents the development of multiobjective optimization with extremal dynamics.
This book was written based on the published pioneering research results on extre-
mal dynamics-inspired optimization, and the authors’ recent research and develop-
ment results on the fundamentals, algorithms, and applications of EO during the
last decade at both Shanghai Jiao Tong University (SJTU) and Zhejiang University,
China. We wish to thank the Department of Automation, SJTU, the Research
Institute of Cyber Systems and Control, Zhejiang University, and the Research
Institute of Supcon Co. for their funding of PhD programs and research projects.
We are most grateful to Professor J. Chu at Zhejiang University, Professors Y. G. Xi
and G. K. Yang at SJTU, and Directors Y. M. Shi and Z. S. Pan of Supcon Co. for
their strong support and encouragement.
In particular, we are deeply indebted to the members of Chinese Academy of
Engineering: Professor C. Wu and Professor T. Y. Chai; and SJTU Professor G. K.
Yang who have freely given of their time to review the book proposal and write sug-
gestions and recommendations.
We are grateful to Chemical Industry Press for providing the funding to publish this book, and also to Ms. H. Song (commissioning editor, Chemical Industry Press) and the staff of CRC Press for their patience, understanding, and effort in publishing this book.
This book was also supported by the National Natural Science Foundation
of China (Nos. 61005049, 51207112, 61373158, 61472165, and 61272413),
Zhejiang Province Science and Technology Planning Project (No. 2014C31074),
National High Technology Research and Development Program of China (No.
2012AA041700), National Major Scientific and Technological Project (No.
2011ZX02601-005), National Science and Technology Enterprises Technological
Innovation Fund (No. 11C26213304701), Zhejiang Province Major Scientific and
Technological Project (No. 2013C01043), and the State Scholarship Fund of China.
Finally, we thank the relevant organizations for their permission to reproduce some figures, tables, and mathematical formulas in this book. See specific figures for applicable source details.
Section I FUNDAMENTALS, METHODOLOGY, AND ALGORITHMS
Chapter 1
General Introduction
1.1 Introduction
With the revolutionary advances in science and technology during the last few decades, optimization has been playing an increasingly important role in solving a variety of real-world modeling, optimization, and decision problems. The major function of optimization is to provide one or multiple solutions that optimize (e.g., minimize or maximize) the desired objectives subject to the given constraints in the relevant search space. Optimization techniques have been widely applied in business, social, environmental, biological, medical, and man-made physical and engineering systems. Owing to the increase in computational complexity, traditional mathematics-inspired optimization solutions, such as mathematical programming (e.g., linear programming [LP], nonlinear programming [NLP], and mixed-integer programming [MIP]), can hardly be applied to some real-world complex optimization problems, such as the NP-hard (nondeterministic polynomial-time hard) problems (Korte and Vygen, 2012) defined in computational complexity theory.
To make optimization solutions applicable, workable, and realistic for complex systems with rugged search landscapes and limited or no mathematical understanding (i.e., knowledge) of the relations among decision variables, desired criteria, and constraints, a number of alternative multidisciplinary optimization approaches have been developed. In place of traditional optimization methodologies and algorithms, metaheuristic search solutions inspired by computer science and computational intelligence (CI) have been developed (Patrick and Michalewicz, 2008). CI (Engelbrecht, 2007), as applied in optimization, simulates a set of natural mechanisms for dealing with complex computational problems. The major feature of CI is to model and simulate the behaviors and features of
natural evolution, the human body, artificial life, and biological and physical systems with computer algorithms, such as evolutionary computation, for example, genetic algorithms (GAs) (Holland, 1992), genetic programming (GP) (Koza, 1998), artificial neural networks (ANNs) (Jain and Mao, 1996), fuzzy logic (FL) (Zadeh, 1965; Xu and Lu, 1987), artificial life (Langton, 1998), artificial immune systems (De Castro and Timmis, 2002), DNA computing (Daley and Kari, 2002), and statistical physics (Hartmann and Weigt, 2005). The major advantages of CI in solving complex optimization problems are (1) the requirement of limited or no knowledge of mathematical first principles to describe the quantitative relations of system behaviors among the decision variables, desired criteria, and constraints of a system under study; (2) the replacement of "point-to-point" gradient-based search by "generation"- and "population"-based search, so that it is not necessary to calculate gradients during the search process and the search space can be significantly enlarged; (3) the introduction of probability-inspired "mutation," "crossover," and "selection" operations, which significantly improves search capability and efficiency; and (4) the use of the natural features of "self-learning" and "self-organizing" in modeling, data mining, clustering, classification, and decisions to enhance search power, robustness, and adaptation, particularly for systems in variable environments. As a result, the solutions of complex problems might not be "optimal" in the strict mathematical sense, but may provide satisfactory results at much lower search cost in memory, communication, and time. After highlighting mathematical optimization, Section 1.2 presents the concepts of optimization from practical aspects.
1.2 Understanding Optimization: From Practical Aspects
1.2.1 Mathematical Optimization
A well-known standard mathematical (continuous) optimization problem, or mathematical programming (Luenberger, 1984), can be represented as

minimize f(X), X = [x_1, …, x_n]^T ∈ R^n    (1.2)
subject to g_i(X) ≤ 0, i = 1, …, m
          h_i(X) = 0, i = 1, …, p

where
f(X): R^n → R is the objective function to be minimized over the variable X;
g_i(X) ≤ 0 are the inequality constraints; and
h_i(X) = 0 are the equality constraints.
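As a concrete illustration, the short sketch below solves a small instance of this standard form numerically with SciPy's SLSQP solver. The quadratic objective and the two constraints are illustrative assumptions chosen only to make the example self-contained; note that SciPy's inequality convention is c(X) ≥ 0, so each g_i(X) ≤ 0 is passed as −g_i(X).

import numpy as np
from scipy.optimize import minimize

def f(x):
    # objective f(X) to be minimized (an assumed quadratic bowl)
    return (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2

constraints = [
    # g1(X) = x1 + x2 - 3 <= 0, passed as -g1(X) >= 0 per SciPy's convention
    {"type": "ineq", "fun": lambda x: -(x[0] + x[1] - 3.0)},
    # h1(X) = x1 - 2*x2 = 0
    {"type": "eq", "fun": lambda x: x[0] - 2.0 * x[1]},
]

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP", constraints=constraints)
print(res.x, res.fun)  # minimizer X* and objective value f(X*)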
[Figure: schematic of a natural gas pipeline network, distinguishing interstate pipelines, intrastate pipelines, and compressor stations.]
and connections) and the engineering design (e.g., the physical and engineering parameters) that minimizes the capital investment while maximizing energy efficiency under the given supply and demand figures. Since the NGPN covers both the interstate and intrastate pipeline network architecture and the detailed engineering design, the problem formulation and optimization solutions become very complicated.
2. The NGPN operational optimization: The solutions of NGPN operational optimization generate an optimal control strategy (e.g., select the control nodes with the relevant control parameters) that optimizes the desired objectives, such as energy efficiency, under changes in supply and demand.

In fact, these two problems can be viewed as "pipeline network design," with layout and parameter optimization, and "control over the gas pipeline network," respectively. Consequently, the optimization solutions should be multifunctional and multiscale, with offline and online modes.
that are able to make an enterprise robust and profitable under an unpredictable
business environment.
1. The decisions on the product mix define the types and amounts of core products to be produced, with order management and/or inventory control for the make-to-order and make-to-inventory business models, respectively. This answers what to produce under a certain business environment and a given time window.
2. The decisions on the procurement plan define the suppliers from which the bill of materials (BOM) is purchased and the amounts of BOM to be purchased. This also involves a hybrid optimization with discrete and continuous decisions in selecting suppliers and determining the purchase quantities.
3. The decisions on advanced production planning and scheduling (APPS) determine where and when to fulfill the work (manufacturing) orders, namely the order-to-site assignment and "production scheduling" under a desired time window. These problems can be formulated as multilevel COPs.
4. Real-time optimization and advanced control make real-time decisions and
control under a variable business and production environment, namely,
answer the question: “How to produce?” to reach the goals.
In reality, the really hard problems occur on the boundary between these two regions, where the probability of a solution is low but nonnegligible (Cheeseman et al., 1991). In fact, the phenomena of easy–hard–easy and easy–hard transitions in a hard search process usually come from the property of phase transitions existing in many NP-hard problems. The probability of solutions and the computational complexity governed by phase transitions have been studied in statistical physics for the 3-SAT problem, a typical NP-complete problem, which shows the connection between NP-complete problems and phase transitions. Many relevant research results on this topic have been published in special issues of Artificial Intelligence (Hogg et al., 1996), Theoretical Computer Science (Dubois and Dequen, 2001), Discrete Applied Mathematics (Kirousis and Kranakis, 2005), and Science and Nature (Monasson et al., 1999; Selman, 2008).
Typical examples of studies on phase transitions in hard optimization problems include the TSP (Gent and Walsh, 1996), spin glasses (Boettcher, 2005a,b), and graph partitioning (Boettcher and Percus, 1999), among others. In addition, phase transitions also exist in the job shop (Beck and Jackson, 1997) and project scheduling (Herroelen and Reyck, 1999) problems, where the resource parameters exhibit a sharp easy–hard–easy transition behavior. It should be noted that the capability of dealing with the computational complexity associated with phase transitions is one of the key advanced features of EO.
mathematical models with symbolic regression, and further perform "computer programming" automatically through the evolutionary procedures of mutation, crossover, and selection.

The theoretical foundation of evolutionary computation relies on the Darwinian evolutionary principles underlying the biological mechanism of evolution. To emulate the bio-evolution process, the search processes performed in EAs are population based, with probability-driven genetic operators acting on chromosomes: mutation, crossover, and selection. The individuals with higher fitness (i.e., the winners) in the population pool have a higher probability of surviving as parents for the next generation. During the evolution process, multiple individuals (chromosomes, or solutions) in the population pool of each generation qualify as feasible solutions. Consequently, the user may select a preferred solution by viewing the entire path of the evolution. Evolutionary computations have been widely applied to solving many real-world optimization problems.
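To make the generational loop just described concrete, here is a minimal GA sketch in Python. The bit-string "one-max" fitness is a placeholder assumption chosen only to keep the example self-contained; real applications substitute their own encoding, fitness function, and operators.

import random

def evolve(pop_size=30, n_bits=20, generations=50, p_mut=0.02):
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    fitness = lambda ind: sum(ind)  # one-max: count the 1-bits
    for _ in range(generations):
        # Fitness-proportional selection: fitter individuals are more likely parents.
        parents = random.choices(pop, weights=[fitness(p) + 1 for p in pop], k=pop_size)
        nxt = []
        for a, b in zip(parents[::2], parents[1::2]):
            cut = random.randrange(1, n_bits)  # one-point crossover
            for child in (a[:cut] + b[cut:], b[:cut] + a[cut:]):
                # Bit-flip mutation with probability p_mut per gene
                nxt.append([g ^ (random.random() < p_mut) for g in child])
        pop = nxt
    return max(pop, key=fitness)

print(evolve())  # typically a string of (almost) all ones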
1.5 Highlights of EO
1.5.1 Self-Organized Criticality and EO
To build a bridge between statistical physics and computational complexity, and to find high-quality solutions for hard optimization problems, an extremal dynamics-oriented local-search heuristic, the so-called extremal optimization (EO), has been proposed (Boettcher and Percus, 1999). EO was originally developed from the fundamentals of statistical physics. More specifically, EO is inspired by self-organized criticality (SOC) (Bak et al., 1987), a statistical physics concept describing a class of systems that have a critical point as an attractor. Moreover, as also indicated by Bak et al. (1987), the concept of SOC may be further described and demonstrated by the sandpile model shown in Figure 1.2. Suppose a sandpile is formed on a horizontal circular base from an arbitrary initial distribution of sand grains. If a steady state of the sandpile is reached by slowly adding sand grains, the surface of the sandpile makes, on average, a constant angle with the horizontal plane. The addition of each sand grain results in some activity on the surface of the pile: an avalanche of sand mass follows, which propagates over the surface of the sandpile. In the stationary regime, avalanches of many different sizes occur, following a power-law distribution. If one starts from an initially uncritical state, most avalanches are small at first, but the range of avalanche sizes grows with time. After a long time, the system arrives at a critical state, in which the avalanches extend over all length and time scales.
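The power-law avalanche statistics at the heart of SOC can be reproduced in a few lines. The following is a minimal sketch of the Bak–Tang–Wiesenfeld sandpile on a square grid: grains are dropped one at a time, and any cell holding four or more grains topples, shedding one grain to each neighbor (grains at the edge fall off the pile). The grid size and grain count are illustrative assumptions; a histogram of the returned avalanche sizes should look roughly straight on log–log axes, the signature of criticality.

import random

def sandpile(n=20, grains=20000):
    z = [[0] * n for _ in range(n)]
    sizes = []  # avalanche size triggered by each added grain
    for _ in range(grains):
        i, j = random.randrange(n), random.randrange(n)
        z[i][j] += 1
        size = 0
        stack = [(i, j)]
        while stack:
            x, y = stack.pop()
            if z[x][y] < 4:
                continue
            z[x][y] -= 4  # topple: shed one grain to each of the four neighbors
            size += 1
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                u, v = x + dx, y + dy
                if 0 <= u < n and 0 <= v < n:  # grains leaving the grid are lost
                    z[u][v] += 1
                    if z[u][v] >= 4:
                        stack.append((u, v))
        sizes.append(size)
    return sizes

sizes = sandpile()
print(max(sizes), sum(s > 0 for s in sizes))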
The macroscopic behavior of such systems exhibits the spatial and temporal scale invariance characteristic of the critical point of a phase transition. SOC is typically observed in slowly driven nonequilibrium systems with extended degrees of freedom and a high level of nonlinearity. It is interesting to note that in SOC there is no need to tune control parameters to precise values. Inspired by exactly this principle, EO drives the system far from equilibrium: aside from ranking, there exists no adjustable parameter, and new solutions are accepted indiscriminately. Consequently, the nature of SOC as performed in EO may yield better solutions for those hard optimization problems exhibiting phase transitions. The mechanism of EO can be characterized from the perspectives of statistical physics, biological evolution, and ecosystems (Lu et al., 2007).
The book starts with a general introduction to optimization that covers under-
standing optimization with selected application fields, challenges faced in an infor-
mation-rich era, and state-of-the-art problem-solving methods with CI. The book
is structurally divided into three sections. Section I covering Chapters 1, 2, and 3
Chapter 2
Introduction to Extremal Optimization
a species and has an associated "fitness" value between 0 and 1 (randomly sampled from a uniform distribution). At each update step, the extremal species, that is, the one with the smallest fitness value, is selected. Then that species and its interrelated species are replaced with new random numbers. After a sufficient number of update steps, the system reaches a highly correlated SOC state, and the fitness of almost all species has transcended a certain fitness threshold. However, the dynamical system maintains punctuated equilibrium: the species with the lowest fitness can undermine the fitness of its interrelated neighbors while updating its own state. This coevolutionary activity gives rise to chain reactions called "avalanches," large fluctuations that rearrange major parts of the system, potentially making any configuration accessible (Boettcher and Frank, 2006).
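The dynamics just described fit in a dozen lines of Python. Below is a minimal sketch of the BS model on a ring of n species, taking the "interrelated species" to be the two lattice neighbors of the worst one (the conventional choice); after enough updates, almost all fitness values settle above a threshold of roughly 2/3.

import random

def bak_sneppen(n=100, steps=100000):
    fitness = [random.random() for _ in range(n)]
    for _ in range(steps):
        worst = min(range(n), key=fitness.__getitem__)  # extremal species
        # Replace the worst species and its two ring neighbors with new values;
        # the index worst - 1 wraps around via Python's negative indexing.
        for k in (worst - 1, worst, (worst + 1) % n):
            fitness[k] = random.random()
    return fitness

print(sorted(bak_sneppen())[:5])  # the smallest surviving fitness values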
In close analogy to the self-organizing process of the BS model, the basic EO algorithm proceeds as follows (Boettcher and Percus, 2000):

1. Initialize a configuration S at random and set the best-so-far solution S_best ← S.
2. For the current configuration S, evaluate the fitness λ_i for each variable x_i, find the variable x_j with the worst fitness, and change its state unconditionally so that S moves to a neighboring configuration S′; accept S ← S′.
3. If F(S) is better than F(S_best), set S_best ← S.
4. Repeat from step 2 as long as desired; return S_best and F(S_best).
From the above EO algorithm, it can be seen that, unlike GAs, which work with a population of candidate solutions, EO evolves a single solution S and makes local modifications to the worst components. This requires selecting a suitable representation that permits individual solution components (the so-called decision variables) to be assigned a quality measure (i.e., fitness). This differs from holistic approaches such as EAs, which assign equal fitness to all components of a solution based on their collective evaluation against an objective function. In EO, each decision variable in the current solution S is considered a "species" (Lu et al., 2007).

To avoid getting stuck in a local optimum (Boettcher and Percus, 2000), a single parameter is introduced into EO, and the improved algorithm is called τ-EO. In τ-EO, according to fitness λ_i, all x_i are ranked by a permutation Π of the variable labels i with λ_{Π(1)} ≤ λ_{Π(2)} ≤ ⋯ ≤ λ_{Π(n)}. The worst variable x_j is of rank 1, j = Π(1), and the best variable is of rank n. A variable is then selected stochastically, according to a probability distribution over the rank order, rather than always selecting the "worst" variable at step 2. The variable of rank k (i.e., the kth worst fitness) is selected with probability P_k ∝ k^{−τ} (1 ≤ k ≤ N), given that there are N entities in the computational system.
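This selection rule is easy to state in code. The sketch below ranks the variables from worst (rank 1) to best (rank n) and draws rank k with probability proportional to k^{−τ}; the fitness values in the usage line are placeholder assumptions.

import random

def tau_eo_pick(fitnesses, tau=1.4):
    n = len(fitnesses)
    order = sorted(range(n), key=fitnesses.__getitem__)  # ascending: worst first
    weights = [(k + 1) ** (-tau) for k in range(n)]      # P_k ~ k**(-tau), k = 1..n
    k = random.choices(range(n), weights=weights)[0]
    return order[k]  # index of the variable chosen for mutation

lam = [0.9, 0.1, 0.5, 0.3]
print(tau_eo_pick(lam))  # most often 1, the index of the worst fitness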
[Figure: the analogy between biological evolution and extremal optimization.]
one randomly. Thus, the fitness of the worst species and its neighbors will always
change together, which can be considered a coevolutionary activity. This coevo-
lutionary activity gives rise to chain reactions or “avalanches”: large (nonequilib-
rium) fluctuations that rearrange major parts of the system, potentially making any
configuration accessible. Large fluctuations allow the method to escape from local
minima and explore the configuration space efficiently, while the extremal selection
process enforces frequent returns to near-optimal solutions.
Furthermore, EO can be analyzed from the ecosystem point of view. An eco-
system is defined as a biological community of interacting organisms and their
surrounding environment. That is to say, the fitness of any species living in an
ecosystem will be affected by the fitness of any other species in the same ecosystem,
whereas the change in the fitness of any species will affect the fitness landscape
(i.e., environment) of the whole ecosystem. The interaction relationship between
any two species in the ecosystem can be regarded as the inherent fundamental
mechanism that drives all the species to coevolving. The food chain may be one of
the ways in which the interaction between any two species takes place. The food
chain provides energy that all living things in the ecosystem must have to survive.
In the food chain, there exist direct or intermediate connections between any spe-
cies. According to natural selection or “survival of the fittest” proposed by Darwin,
those species with higher fitness will have a higher probability to survive while
those with a lower fitness will die out. In other words, the species with the lower
fitness will die out with a higher probability than other species. When one species
with a lower fitness dies out, those species above the extinct species in the food
chain will also be under the threat of extinction, no matter how high their fitness
value is. Similarly, EO considers those species with a lower fitness to die out more
easily than others. Hence, EO always selects those “weakest” to update, or mutate.
The change in the fitness of the worst species will impact the fitness landscape of
the whole system. At the same time, the fitness of those species connected to the
weakest species will also be affected by the altered environment and be changed
simultaneously.
e_i = p_i − min_{j≠i}(d_{ij})    (2.1)
Finally, the energy function for any feasible TSP tour s can be expressed as
F(s) = Σ_{i=1}^{n} min_{j≠i}(d_{ij}) + Σ_{i=1}^{n} e_i    (2.2)
Obviously, the first part of Equation 2.2 is a constant for a specific TSP instance,
and it can be viewed as the internal energy for an isolated physical system. Thus,
the optimal solution for combinatorial optimization is equivalent to the system with
minimal free energy, that is, the whole computational system reaches its ground state.
On the basis of the definition of potential energy, the optimization method
with extremal dynamics is proposed as follows (Chen et al., 2007):
1. Generate an initial feasible solution s at random and set s_best ← s.
2. Rank all cities according to their potential energies and select the city of rank k for updating with the power-law probability

P(k) ∝ k^{−α}    (2.3)

where the rank runs from k = 1 for the city with the highest potential energy to k = n for the city with the lowest potential energy, and the power-law exponent α is an adjustable parameter. For α = 0, randomly selected cities are forced to update, resulting in a mere random walk through the configuration space. For α → ∞, only the extremal cities with the highest potential energy get updated, which may trap the computational system in a metastable state (i.e., a near-optimal solution).
For updating the state of the selected cities, let us now focus on the move class, also called the neighborhood relation; that is, to each solution s, a set of neighbors N(s) is defined (Franz and Hoffmann, 2002). If the move class is a reversible process, that is, if s′ ∈ N(s) implies s ∈ N(s′), the state space will be defined as an undirected network structure. On this complex network of the configuration space, the degrees of freedom of each solution s ∈ S equal the cardinality of its neighborhood |N(s)|, and a random walk takes place to find the ground state. In the TSP, the neighborhood N(s) can be easily constructed by the 2-opt move (Fredman et al., 1995; Helsgaun, 2000): construct new Hamiltonian cycles by deleting two edges and reconnecting the two resulting paths in a different way. Simply put, the 2-opt move keeps the tour feasible and corresponds to the reversal of a subsequence of the cities. Since the potential energy of the selected city is expected to be updated, the 2-opt move must replace its forward-directed edge, and so there are n − 1 possible neighbors, that is, the cardinality of the neighborhood is |N(s)| = n − 1. Being in the current state s, the random walker in our optimization dynamics chooses a new state s′ having the μth lowest energy among its neighbors N(s) according to another power-law distribution P_μ ∝ μ^{−β} (1 ≤ μ ≤ |N(s)|), and accepts s ← s′ unconditionally; if E(s′) < E(s_best), it sets s_best ← s′. The external control parameter β can be used to regulate the extent of avalanche-like fluctuations in those highly susceptible states (near the ground state); that is, uphill moves may be accepted, which helps the random walker pass through barriers of the energy landscape. In principle, the dynamical system can jump from one metastable state (corresponding to a near-optimal solution in combinatorial optimization) to another by avalanche dynamics.
3. Repeat the update step 2 until a given termination criterion is satisfied: a certain number of iterations, a predefined amount of CPU time, or convergence of the performance improvement. Return s_best and E(s_best) as the resulting solution.
It is worth noting that the 2-opt move of updating the states of the selected city
with a high potential energy also has an impact on the potential energy of its inter-
acting cities. Consequently, the extremal dynamics and coevolutionary processes
can drive the computational system (TSP tour) to its ground state gradually.
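A condensed sketch of this update loop for the Euclidean TSP follows, with two simplifying assumptions: the second endpoint of the 2-opt reversal is drawn uniformly at random (a cheap stand-in for the β-ranked neighbor choice described above), and every move is accepted unconditionally while the best tour found so far is recorded, in keeping with EO's parameter-light philosophy.

import math, random

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def eo_tsp(pts, alpha=1.3, steps=20000):
    n = len(pts)
    dmin = [min(dist(pts[i], pts[j]) for j in range(n) if j != i) for i in range(n)]
    tour = list(range(n))
    random.shuffle(tour)

    def energy(t):
        # free energy of Equation 2.2 minus its constant part:
        # sum of e_i = (forward edge length) - (nearest-neighbor distance)
        return sum(dist(pts[t[i]], pts[t[(i + 1) % n]]) - dmin[t[i]] for i in range(n))

    best, best_e = tour[:], energy(tour)
    rank_w = [(k + 1) ** (-alpha) for k in range(n)]  # P(k) ~ k**(-alpha)
    for _ in range(steps):
        e = [dist(pts[tour[i]], pts[tour[(i + 1) % n]]) - dmin[tour[i]]
             for i in range(n)]
        order = sorted(range(n), key=e.__getitem__, reverse=True)  # highest energy first
        i = order[random.choices(range(n), weights=rank_w)[0]]
        j = random.randrange(n)
        if i == j:
            continue
        lo, hi = sorted((i, j))
        tour[lo + 1:hi + 1] = reversed(tour[lo + 1:hi + 1])  # 2-opt segment reversal
        cur = energy(tour)  # accept unconditionally; only track the best
        if cur < best_e:
            best, best_e = tour[:], cur
    return best, best_e

pts = [(random.random(), random.random()) for _ in range(40)]
print(eo_tsp(pts)[1])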
To get an explicit conception of the parameters α and β, it is useful to conduct some numerical simulations. First, in Figure 2.2, the average energy and the errors over 10 runs with the same initial configurations are presented for an n = 64 random Euclidean TSP instance. It can be seen that the energy is not very sensitive to the value of α, but numerous experiments demonstrate that a value α ∼ 1 + 1/ln(n) seems to work best, as discussed in Boettcher and Percus (2003).

Second, the same initial configurations and α = 1 + 1/ln(n) are used, and Figure 2.3 shows typical runs for different values of the parameter β: (a) β = 1, (b) β = 2, (c) β = 3, and (d) β → ∞. Starting with the case β = 1, it is apparent that the convergence is rather slow and the fluctuations are large. As the value of β increases, the fluctuations are considerably reduced, becoming drastically damped as β → ∞. The experiments show that the ground state with minimal energy can be reached when β = 3.
Figure 2.2 Energy versus the parameter α. (Reprinted from Physica A, 385, Chen, Y. W. et al. Optimization with extremal dynamics for the traveling salesman problem. 115–123. Copyright 2007, with permission from Elsevier.)
Figure 2.3 Evolution of the energy function for (a) β = 1, (b) β = 2, (c) β = 3, and (d) β → ∞. (Reprinted from Physica A, 385, Chen, Y. W. et al. Optimization with extremal dynamics for the traveling salesman problem. 115–123. Copyright 2007, with permission from Elsevier.)
Figure 2.4 Evolution of the energy function in a typical run of SA. (Reprinted from Physica A, 385, Chen, Y. W. et al. Optimization with extremal dynamics for the traveling salesman problem. 115–123. Copyright 2007, with permission from Elsevier.)
state (a near-optimal TSP tour) as the tuned temperature parameter T decreases. However, SA samples numerous states far from the ground state with the lowest energy, and its search is inefficient in the last portion of the optimization process.

In contrast to SA, which dwells only on the macroscopic behavior of the computational system (i.e., the global energy function) and does not investigate the micromechanism of solution configurations, extremal dynamics simulates a complex multientity system in statistical physics. Both the collective behavior and the individual states of particles are considered simultaneously during the optimization dynamics. As shown previously in Figure 2.3, a near-optimal solution can be obtained quickly by the greedy search process at first, and sufficient fluctuations (ergodic walk) help the search escape from local optima and explore new regions of the configuration space. Combining greedy search with fluctuation-driven exploration near the backbone of optimization problems, this optimization process can be viewed as an ideal search dynamics for computational systems with rugged energy landscapes.
lim_{n→∞} l_opt/√(n·A) = k    (2.4)
where k is a constant; the best current estimate is k = 0.7124 ± 0.0002 (Gent and Walsh, 1996). This asymptotic result suggests that a natural control parameter is the dimensionless ratio l/√(n·A). At large values of this parameter, that is, when the tour length is large compared to the number of cities to be visited, solutions can be found easily. More precisely, the system configuration can be constructed arbitrarily when the free energy Σ_{i=1}^{n} e_i of the computational system is high. From this aspect, it can be inferred that the numerous samples in the preceding
[Table: mean and standard deviation of l/√(n·A), with the number of updates, reported for two methods on instances of size n; the data rows were not recovered.]
The asymmetric TSP appears to be more difficult than its symmetric counterpart, with respect to both optimization and approximation (Gutin and Punnen, 2002; Laporte, 2010). Generally, asymmetric TSP heuristics can be divided into the
following three categories (Cirasella et al., 2001): (1) tour construction methods,
such as nearest-neighbor search and greedy algorithm, (2) algorithms based on
patching subcycles together in a minimum cycle cover, such as branch and bound
algorithms (Miller and Pekny, 1989), and (3) tour improvement methods, mainly
referring to local search methods based on rearranging segments of the tour, such
as 3-opt search (Lawler et al., 1985), hyperopt search (Burke et al., 2001), and Lin–
Kernighan algorithm (Lin and Kernighan, 1973; Kanellakis and Papadimitriou,
1980; Helsgaun, 2000).
Figure 2.5 The 3-opt move for the asymmetric TSP. (Reprinted from Physica A, 390, Chen, Y. W. et al. Improved extremal optimization for the asymmetric traveling salesman problem. 4459–4465. Copyright 2011, with permission from Elsevier.)
considering all the updating cities. In the 3-opt move, there are (N − 1) × (N − 2) possible options for updating the state of the selected city, and each of these states leads to a neighbor solution. Furthermore, a cooperative optimization strategy is implemented as follows:

1. If the local optimum solution S_local in the 3-opt neighbor space satisfies F(S_local) < F(S_best), then set S_best ← S_local and accept S ← S_local.
2. Otherwise, randomly select a new solution S′ from the 3-opt neighbor space with a probability p (0 ≤ p ≤ 1), and accept S ← S′. That is, a random 3-opt move is applied to update the state of the selected city with probability p; with probability 1 − p, a greedy strategy is used to select the local optimum solution S_local, setting S ← S_local.
The choice of a scale-free distribution for P_k ensures that no rank of fitness is excluded from further evolution, while maintaining a bias against entities with bad fitness (Boettcher, 2005a). The probability parameter p introduces some "noise," or "random walk," into the algorithm so that the optimization dynamics can escape from metastable states (i.e., locally optimal solutions) more easily (Selman et al., 1994).
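A compact sketch of this acceptance rule follows. Here neighbors(S) is a hypothetical placeholder for a generator of the 3-opt neighborhood of the selected city, and F for the tour-length objective; neither name comes from the book.

import random

def cooperative_step(S, S_best, F, neighbors, p=0.1):
    nbrs = neighbors(S)            # all 3-opt moves for the selected city
    S_local = min(nbrs, key=F)     # greedy local optimum of the neighborhood
    if F(S_local) < F(S_best):     # rule 1: improve the best-so-far solution
        return S_local, S_local
    if random.random() < p:        # rule 2: random 3-opt move with probability p
        return random.choice(nbrs), S_best
    return S_local, S_best         # greedy choice with probability 1 - p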
Figure 2.6 Average distance between solutions at different time lags (ftv170), for τ = 0.0, 0.5, 1.0, 1.2, 1.5, 1.7, 2.5, and 4.0. (Reprinted from Physica A, 390, Chen, Y. W. et al. Improved extremal optimization for the asymmetric traveling salesman problem. 4459–4465. Copyright 2011, with permission from Elsevier.)
[Table 2.2: optimization results on asymmetric TSP benchmark problems, reporting for each algorithm the relative error σ(%) from F(S_opt), the CPU time t_CPU, and the success rate; the data rows were not recovered.]
denotes the average of the best solutions found by a specific algorithm over 10 runs. The average computation time (t_CPU) on an Intel Pentium PC is used to evaluate the efficiency of the optimization algorithms.

It can be concluded from Table 2.2 that the proposed EO method provides superior optimization performance in both computational effectiveness and efficiency.
2.4 Summary
Physics has provided systematic viewpoints and powerful methods for optimization problems. By mapping optimization problems onto physical systems, the EO method with self-organizing dynamics can be used to solve COPs effectively. EO has many advantages, such as its extremal dynamics mechanism, coevolution, mutation-only operator, and long-term memory. Thus, EO can be considered a good heuristic method that is competitive with, or outperforms, many state-of-the-art heuristics. The experimental results on both the symmetric and asymmetric TSPs demonstrate that the EO algorithm performs very well and provides much better performance than existing stochastic search methods developed from statistical physics, such as SA. Given the algorithmic equivalence of NP-complete problems, this interdisciplinary optimization method can be extended to solve a wide variety of combinatorial and physical optimization problems, particularly those with phase transitions in the search space.
Chapter 3
Extremal Dynamics–Inspired Self-Organizing Optimization
3.1 Introduction
Combinatorial optimization is pervasive in most fields of science and engineering. Its aim is to optimize an objective function over a finite set of feasible solutions. For example, the TSP (Gutin and Punnen, 2002), one of the classical COPs, can be described as the problem of searching for the shortest tour among a set of cities. Generally speaking, most COPs in practice are deemed computationally intractable and have been proven to belong to the class of NP-complete problems (Garey and Johnson, 1979), where NP stands for "nondeterministic polynomial time." For NP-complete problems, although the optimality of a possible solution can be verified in polynomial time, the computational time for finding the optimal solution grows exponentially with the dimension of the input variables in the worst case. Furthermore, if a polynomial-time algorithm could be found to solve one NP-complete problem, then all the other NP-complete problems would become solvable in polynomial time (Papadimitriou, 1994).
However, in modern computer science, it has been commonly conjectured that there are no such polynomial-time algorithms for any NP-complete problems (Cormen et al., 2001). Alternatively, a variety of nature-inspired optimization techniques have been developed for finding near-optimal solutions of NP-complete problems within a reasonable computational time. Examples of such algorithms include SA (Kirkpatrick et al., 1983), GA (Forrest, 1993), ACO (Bonabeau et al., 2000), and particle swarm optimization (Kennedy and Eberhart, 1995), among others.
It is interesting to note that most of the existing optimization methods employ
a centralized control model, and rely on a global objective function for evaluating
intermediate and final solutions. In dealing with hard COPs, however, it would be difficult and time-consuming to collect global information due to the interactions within, and the dimensionality of, the computation. In order to overcome the limitations of
the existing centralized optimization, several decentralized, self-organized comput-
ing methods have been studied in recent years with the help of complex systems
and complexity science.
Boettcher and Percus (2000) presented a stochastic search method called EO.
The method is motivated by the BS evolution model (Bak and Sneppen, 1993;
Sneppen, 1995), in which the least-adapted species are repeatedly mutated following
some local rules. To further improve the adaptability and performance of the EO
algorithm, a variation of EO called τ-EO (Boettcher and Percus, 2000) was subse-
quently presented by introducing a tunable parameter τ. So far, EO and its variants
have successfully addressed several physical systems with two-degree-of-freedom
entities, that is, the COPs with binary-state variables, such as graph bipartitioning
problems (Boettcher and Percus, 2000), Ising spin glasses (Middleton, 2004), com-
munity detection in complex networks (Duch and Arenas, 2005), etc. Recently, Liu
and Tsui (2006) presented a general autonomy-oriented computing (AOC) frame-
work for the optimization of self-organized distributed autonomous agents. AOC
is also a bottom-up computing paradigm for solving hard computational problems
and for characterizing complex systems behavior. Computational approaches based
on AOC systems have been applied to distributed constraint satisfaction problems,
image feature extraction (Liu et al., 1997), network community-mining problems
(Yang and Liu, 2007), etc. In addition, Han (2005) introduced the concept of the
local fitness function for evaluating the state of autonomous agents in computa-
tional systems.
In order to design a reasonable self-organized computing method, there are
generally several important issues that should be considered:
Much literature has been focused on one or some of the above questions over
the past few years (Mézard et al., 2002; Han and Cai, 2003; Goles et al., 2004;
Achlioptas et al., 2005). The aim of this chapter is to answer each of those questions,
and furthermore, provide a solid theoretical foundation for the self-organized
computing method under study. First, COPs are modeled as multientity systems in which a large number of self-organizing, interacting agents are involved. Then, the microscopic characteristics of optimal solutions are examined with respect to the notions of the discrete-state variable and the local fitness function. Moreover, the complexity of search in a solution space is analyzed based on the representation of the fitness network and the observation of phase transitions. Finally, based on this analysis, a self-organized computing algorithm is described for solving hard COPs.
◾ A set of discrete variables, X = {x_1, …, x_n}, with the relevant variable domains D_1, …, D_n
◾ Multiple constraints among the variables
◾ An objective function F to be optimized, where F: D_1 × ⋯ × D_n → R

The solution space is the set S of all feasible assignments of domain values to the variables, where each element s ∈ S is a candidate solution. Given a COP, the aim is to find the optimal solution s* ∈ S such that the global objective satisfies F(s*) ≤ F(s) (∀s ∈ S).
Since combinatorial solutions often depend on a nontrivial combination of mul-
tiple elements with specific states, a COP can be straightforwardly translated into
a multientity computational system, if it is formulated from the viewpoint of a
complex system. In a complex computational system, each entity can be viewed as
an autonomous agent, and the agent can collect local (limited) information from the
environment, and acts with other interacting agents to achieve the system objective
Figure 3.1 (a–c) Illustration of accessible evolutionary paths (from "ab" through "Ab" to the global optimum "AB"). (With kind permission from Springer Science+Business Media: Artificial Intelligence Review, Toward understanding the optimization of complex systems, 38, 2012, 313–324, Liu, J. and Chen, Y. W.)
agent should reflect the sign of the global fitness changes; seemingly, in the intermediate process, updating "a" to "A" yields the opposite fitness effect of updating "b" to "B." Figure 3.1b and c show that the local fitness function should be capable of measuring the magnitude of the global fitness changes; for example, updating "a" to "A" has a more positive or less negative fitness effect than updating "b" to "B." With these two premises, reasonable local fitness functions can be defined for different computational systems. For example, in the Ising computing model, the local fitness of the spin σ_i can be represented as
local fitness of the spin σi can be represented as
f σ (σi ) = σi hi +
∑
j ∈N ( i )
J ij σ j
(3.1)
where N(i) is the set of spins directly interacting with the spin σ_i. The global fitness changes can then be consistently calculated from the local fitness function when flipping the states of spins. With the definition of a consistent local fitness function, emergent computing methods know how to apply the local behavioral rules in the presence of natural or priority-based selection mechanisms.
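The consistency property (formalized in Definition 3.1 below) can be checked numerically. The sketch below does so for a zero-field spin glass on a ring, an assumption (h_i = 0) under which the summed local-fitness change over the affected spins is exactly twice the global-energy change; the ring topology and random ±1 couplings are likewise illustrative.

import random

n = 10
J = {(i, (i + 1) % n): random.choice([-1.0, 1.0]) for i in range(n)}  # ring bonds
spins = [random.choice([-1, 1]) for _ in range(n)]

def neighbors(i):
    return [(i - 1) % n, (i + 1) % n]

def coupling(i, j):
    return J.get((i, j), J.get((j, i), 0.0))

def local_fitness(s, i):
    # zero-field version of Equation 3.1: f(sigma_i) = sigma_i * sum_j J_ij sigma_j
    return s[i] * sum(coupling(i, j) * s[j] for j in neighbors(i))

def global_energy(s):
    return sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())

k = random.randrange(n)
affected = [k] + neighbors(k)  # the flipped spin and its interacting neighbors
f0 = sum(local_fitness(spins, a) for a in affected)
e0 = global_energy(spins)
spins[k] = -spins[k]  # flip one spin
f1 = sum(local_fitness(spins, a) for a in affected)
e1 = global_energy(spins)
print(e1 - e0, f1 - f0)  # here f1 - f0 = 2 * (e1 - e0), so the signs always agree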
Next, we will take the TSP as a walk-through example of COPs. The TSP has attracted much interest in the computing community in the past decades because it provides insights into a wide range of theoretical questions in discrete mathematics, theoretical computer science, and computational biology, among others, and offers models and solutions to many real-world practical problems, ranging from scheduling and transportation to genome mapping. The TSP is usually stated as the optimization problem of finding the shortest closed tour that visits each city only once. In other words, given a set of n cities and the distance measure d_ij for all pairs i and j, the optimization goal is to find a permutation π of these n cities that minimizes the following global fitness function:
F(s) = Σ_{i=1}^{n−1} d_{π(i),π(i+1)} + d_{π(n),π(1)}    (3.2)
s(x_i) = k, k ∈ {1, 2, …, n − 1}    (3.3)

where s(x_i) = k indicates that city i is connected, by its forward-directed edge, to its kth nearest neighbor.
In an ideal computational system, all cities would be connected to their first nearest neighbors, that is, s(x_i) = 1 for all i (1 ≤ i ≤ n). However, such ideal microstates are often frustrated by the interacting effects between cities, that is, by competition over the selection of forward-directed edges. As a result, a city may not always be connected to its first nearest neighbor, and may dwell on a specific energy state other than the ground state, that is, s(x_i) > 1. Correspondingly, a local fitness function can be formulated to evaluate the microscopic dynamical characteristics of all cities. Now, let d_i be the length of the forward-directed edge starting from city i; in a feasible solution s, the local fitness of city i can be defined as
f_s(x_i) = d_i − min_{j≠i} d_{ij}, i = 1, 2, …, n    (3.4)
Correspondingly, the global fitness function for any possible solution s can be
represented as
F(s) = Σ_{i=1}^{n} min_{j≠i} d_{ij} + Σ_{i=1}^{n} f_s(x_i)    (3.5)
Obviously, the first term in Equation 3.5 is a constant, which serves as a lower bound for a given TSP instance. The second term is the sum of the local fitness over all entities. That is to say, the global fitness of a combinatorial solution can be represented as a function of the distributed local fitness, and the optimal solution is equivalent to the microstate with the minimum sum of local fitness over all entities. Intuitively, the computational system can be optimized by updating the states of those entities with worse local fitness. In a complex computational system, the relationship between discrete states and local fitness is usually nonlinear, and the values of local fitness are not necessarily equal even if two entities have the same state value.
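The decomposition in Equation 3.5 is easy to verify numerically. The brief sketch below builds a random tour over random planar cities (an illustrative assumption), computes the local fitness of Equation 3.4 for every city, and checks that the tour length equals the constant lower bound plus the sum of the local fitness.

import math, random

pts = [(random.random(), random.random()) for _ in range(8)]
n = len(pts)
d = [[math.hypot(a[0] - b[0], a[1] - b[1]) for b in pts] for a in pts]
tour = list(range(n))
random.shuffle(tour)

nearest = [min(d[i][j] for j in range(n) if j != i) for i in range(n)]
succ = {tour[i]: tour[(i + 1) % n] for i in range(n)}  # forward-directed edges

local = [d[i][succ[i]] - nearest[i] for i in range(n)]  # f_s(x_i), Equation 3.4
length = sum(d[i][succ[i]] for i in range(n))           # F(s), Equation 3.2
assert abs(length - (sum(nearest) + sum(local))) < 1e-9 # Equation 3.5 holds
print(length, sum(local))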
Since local fitness is not as explicit as global fitness, as discussed in this chapter
(Chen and Lu, 2007), the following questions will inevitably arise:
1. Can the optimization guided by local fitness achieve the global optimum of
computational systems?
2. What will be the relationship between local fitness and global fitness?
Definition 3.1

A local fitness function f is said to be consistent with the global fitness function F if, for any solution s and any neighboring solution s′ obtained by updating the state of entity i,

sgn{F(s′) − F(s)} = sgn{ Σ_{x∈X(s,s′,i)} [f_{s′}(x) − f_s(x)] }    (3.6)
where sgn{·} is the sign function and X(s, s′, i) denotes the set of entities whose states are changed as a result of the interacting effects of updating the state of entity i. If the neighbor rule is a reversible move class and X(s, s′, i) = X(s′, s, i), then the solution space can be represented as an undirected graph; otherwise, as a directed graph.
Definition 3.2

A local fitness function f is said to be equivalent to the global fitness function F if there exist constants α > 0 and β such that

F(s) = α Σ_{x∈X} f_s(x) + β    (3.7)
According to Equation 3.6, it is evident that if the local fitness is consistent with the global fitness, improving the local fitness of one or more entities will also optimize the global fitness of the whole computational system, and the process of self-organization will be effective in solving hard computational problems. Equivalence is a special case of consistency. These definitions make it easier to understand what types of local fitness functions should be defined when designing self-organized computing methods.
As we know, the optimal solutions for a large number of benchmark TSP problems are given in Reinelt's TSPLIB (Reinelt, 1991). Therefore, for the sake of illustration and testing, the TSP instance kroA100 (100 cities) is used as an example. For a specific TSP tour, the discrete-state value and local fitness of each city are calculated sequentially. As a result, the microscopic characteristics of the optimal tour and of a random tour can be compared, as shown in Figure 3.2.
As can be noted in Figure 3.2, in the optimal solution, the values of the state variables for all cities, as well as their local fitness, are far below the upper bounds of the variables; for example, s(x_i) ≪ n − 1 for all i (1 ≤ i ≤ n). This implies that those cities with a high local fitness in an initial solution should update their states until the computational system self-organizes into a well-organized microscopic state, that is, the optimal solution.
In order to demonstrate the universality of this microscopic distribution, the statistical properties of optimal TSP solutions can be further characterized by the kth nearest-neighbor distributions (NNDs) (Chen and Zhang, 2006). Without loss of generality, the kth NND for any possible TSP tour s is defined as follows:
p(k) = r(k)/n, k = 1, 2, …, n − 1    (3.8)
Figure 3.2 Microscopic characteristics of the optimal tour and a random tour (kroA100): the state variable (top) and the local fitness (bottom) of each city. (Reprinted from Expert Systems with Applications, 38, Liu, J. et al. Self-organized combinatorial optimization. 10532–10540. Copyright 2011, with permission from Elsevier.)
where r(k) is the total number of kth nearest neighbors over all forward-directed edges in a feasible tour, that is, r(k) = |X(k)|, where X(k) = {x_i | s(x_i) = k}. Obviously, p(k) ∈ [0, 1], and the values of p(k) sum to one. Figure 3.3 shows the kth NNDs p(k) of optimal tours on a set of 10 benchmark Euclidean TSP instances, with sizes ranging from 51 to 2392 nodes.

As shown in Figure 3.3, each kth NND p(k) is approximately an exponentially decaying function of the neighboring rank k. Numerous experiments have indicated that the optimal solutions of almost all benchmark TSP instances in TSPLIB conform to this qualitative characterization of a microscopic exponential distribution.
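How p(k) in Equation 3.8 is obtained for a given tour can be sketched as follows; the distance matrix d and successor map succ are reused from the earlier sketch and are illustrative names, not notation from the book.

def nnd(d, succ):
    n = len(d)
    counts = {}
    for i in range(n):
        # neighboring rank of the forward edge: 1 if succ[i] is i's nearest neighbor
        order = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])
        k = order.index(succ[i]) + 1
        counts[k] = counts.get(k, 0) + 1
    return {k: c / n for k, c in sorted(counts.items())}  # p(k) = r(k)/n

print(nnd(d, succ))  # for optimal tours, p(k) decays roughly exponentially in k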
It is worth pointing out that the above analytical results on microscopic distributions are very useful both to mathematicians and to computer scientists. From the viewpoint of mathematics, the maximum neighboring rank K_max = max{s(x_i)} over all i (1 ≤ i ≤ n) in the optimal solution is very useful for reducing the dimension of the state space. Taking pr2392 as an example, if its maximum neighboring rank K_max = 20 can be approximately estimated, the solution space represented by discrete-state variables can be significantly reduced from (n − 1)^n to 20^n, and thus a low-dimensional state space can be obtained before being fed into mathematical tools. From the viewpoint of computer science, these analysis results are also of great value for the design of effective self-organized computing methods.
Figure 3.3 The kth NNDs p(k) of optimal tours for 10 benchmark instances, from eil51 to pr2392. (Reprinted from Expert Systems with Applications, 38, Liu, J. et al. Self-organized combinatorial optimization. 10532–10540. Copyright 2011, with permission from Elsevier.)
F(s) = (1/N) Σ_{i=1}^{N} f_i(x_i; x_{i1}, …, x_{iK})    (3.9)
The local fitness f_i of entity i depends on its own value x_i and on the values of K other entities x_{i1}, …, x_{iK}. In simulations, the local fitness function f_i: {0, 1}^{K+1} → R assigns a random number drawn from a uniform distribution between 0 and 1 to each of its inputs (Merz, 2004). With this NK model, the ruggedness of the fitness landscape can be tuned from smooth to rugged by increasing the value of K from 0 to N − 1. When
K is small, the global fitness difference between neighboring solutions will be relatively small, and heuristics can easily find better solutions using correlated gradient information. When K is large, a large number of entities will have different states in neighboring solutions, which greatly influences the global fitness. When K = N − 1, the landscape is completely random, and the local fitness of all entities changes even if we update only one entity's state. In this extreme case, no algorithm is substantially more efficient than exhaustive search. However, in practice, almost all computational systems are probably not as complex as this worst case of K = N − 1 in the NK model; for example, in the TSP, the maximum neighboring rank K_max ≪ N. This leaves us sufficient space to develop reasonable and efficient optimization methods, and hence to tackle those theoretically intractable computational problems.
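A compact sketch of the NK landscape of Equation 3.9 follows. Taking the K interacting entities of entity i to be the K cyclically following ones is a common convention and an assumption here, as is the lazy construction of the random fitness tables.

import random

def nk_landscape(N=10, K=2, seed=0):
    rng = random.Random(seed)
    tables = [{} for _ in range(N)]  # one lazily filled random table per entity

    def local_fitness(i, x):
        key = tuple(x[(i + t) % N] for t in range(K + 1))  # x_i plus K neighbors
        if key not in tables[i]:
            tables[i][key] = rng.random()  # uniform in [0, 1) per input pattern
        return tables[i][key]

    def F(x):  # Equation 3.9: the mean of the local fitness values
        return sum(local_fitness(i, x) for i in range(N)) / N

    return F

F = nk_landscape()
x = [random.randint(0, 1) for _ in range(10)]
print(F(x))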
Although the notion of the fitness landscape is useful for characterizing the complexity of a search space, it is impossible to visualize a solution space when the dimension of the neighborhood is higher than 2. Thus, in this chapter, we utilize the concept of the fitness network to represent the structure of solution spaces. For a given COP, the fitness network can be defined by a triple (S, F, N), where S is the set of feasible solutions, F is the fitness (objective) function assigning a value F(s) to each s ∈ S, and N is the neighborhood function mapping each solution s to its set of neighbors N(s) ⊆ S.
Thus, the fitness network can be interpreted as a graph G = (V, E) with vertex set V = S and edge set E = {(s, s′) ∈ S × S | s′ ∈ N(s)}. If the move class is reversible, that is, if s′ ∈ N(s) implies s ∈ N(s′), the fitness network is an undirected network; otherwise, it is a directed network. On this complex network of the search space, the out-degree of each node s is the cardinality of its neighborhood |N(s)|.

Let us take the illustrative fitness network of Figure 3.4 as an example. The hypothetical fitness network corresponds to a solution space of (0, 1) binary sequences of length 4. In this fitness network, the Hamming distance is d(s, s′) = 1 for all neighboring solutions s and s′, and the height in the vertical direction reflects the fitness value; that is, the height of a node represents the fitness of the solution associated with it. Thus, the objective of optimization is to find the globally optimal solution "0011" at the bottom of the fitness network.

Based on the representation of the fitness network, the optimization dynamics can be described in terms of a searching trajectory navigating through the network in order to find the lowest-fitness node. Therefore, a neighborhood-based algorithm for combinatorial optimization can be essentially characterized by its neighborhood structure N(s) and its searching dynamics in the fitness network.
In order to discuss whether or not an algorithmic solution can obtain the opti-
mal solution from an arbitrary initial solution, let us first consider some properties
of the fitness network.
Figure 3.4 A hypothetical fitness network over binary sequences of length 4 (with nodes such as “1100,” “1110,” “1000,” “1010,” “1111,” “0000,” and “0001”); the vertical direction represents fitness.
Definition 3.3
Definition 3.4
A fitness network is called ergodic, if all other solutions are reachable from an arbi-
trary solution s (∀s ∈ S).
Here, the term “ergodic,” as adopted from physics, refers to all microstates that
are accessible. Theoretically speaking, a probabilistic searching method should
guarantee that the optimal solution is reachable from its initial solution s. Since we
usually construct initial solutions by nondeterministic methods, the ergodicity of a
fitness network is a necessary condition for finding the optimal solution.
In a fitness network as discussed above, local optima (such as solution “0101” in
Figure 3.4) are undoubtedly barriers on the way to the optimal solution. Suppose
that solutions s1 and s2 are two local optima and St denotes the solution set in a
reachable trajectory from s1 to s2; then the fitness barrier separating s1 and s2, as discussed
by Reidys and Stadler (2002), can be defined as follows:
This depth indirectly reflects the difficulty of escaping from the basins of attraction
of the current local optimum s1. As a rule of thumb, the quality of locally optimal
solutions will be improved if we increase the size of the neighborhood. It can also
be reflected by Equations 3.10 and 3.11 that an increased set of reachable trajecto-
ries may also decrease the depth of a local optimum. However, a larger neighbor-
hood does not necessarily produce a more effective optimization method due to
computational complexity, unless one can search the larger neighborhood in a more
efficient manner.
Figure 3.5 Phase transition in the cumulative probability distribution of the dimensionless control parameter; the optimal solution lies near the transition boundary.
To observe this phenomenon, we randomly distribute n cities in a square of area A and enumerate the lengths L of all (n − 1)!/2 possible tours. For
each solution, we calculate its dimensionless control parameter L/\sqrt{n \cdot A} (Gent and
Walsh, 1996). Figure 3.5 shows the phase transition on the probability distribution
of all dimensionless control parameters. In the figure, the cumulative distribution
function directly reflects the probability that the solution space takes on solutions
less than or equal to a specific control parameter.
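The computation behind Figure 3.5 can be sketched in Python as follows; the instance size and random seed are our own illustrative choices, and n is kept tiny because all (n − 1)!/2 tours are enumerated exhaustively.

import itertools
import math
import random

def tour_length(tour, cities):
    return sum(math.dist(cities[tour[i]], cities[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

rng = random.Random(0)
n, A = 8, 1.0
cities = [(rng.uniform(0, math.sqrt(A)), rng.uniform(0, math.sqrt(A)))
          for _ in range(n)]

# Enumerate the (n - 1)!/2 distinct tours: fix city 0, discard each tour's
# reversed duplicate, and compute L / sqrt(n * A) for every tour.
params = sorted(tour_length((0,) + p, cities) / math.sqrt(n * A)
                for p in itertools.permutations(range(1, n))
                if p[0] < p[-1])
print(len(params))            # 2520 control parameters for n = 8
print(params[0], params[-1])  # range of the dimensionless parameter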
As shown in Figure 3.5, a phase transition can be observed, and the solution
space can be approximately divided into two regions: (i) an underconstrained
region, in which the density of solutions is high, making it relatively easy to find
a solution; and (ii) an overconstrained region, in which the probability of the
existence of a solution tends toward zero, making it very difficult to obtain a
near-optimal solution, let alone the optimal solution, which is located on the
boundary of the phase transition. Since the cumulative distribution is analogous
to a Gaussian distribution, most candidate solutions are located in the middle
levels of the fitness network.
Based on the above theoretical modeling, simulation, and analysis, we can note
that the computational complexity of COPs always occurs on the boundary of
phase transition.
Step 1. Initialization
Randomly construct an initial TSP tour s, which can be naturally represented
as a permutation of n cities (path representation). Let sbest represent the best
solution found so far, and set sbest ← s.
Figure 3.6 Search dynamics of the self-organized optimization algorithm: the global fitness (tour length) of SOA versus the number of updates, approaching the optimal value.
The extent of the fluctuations, which reflects the ability of hill climbing near the
bottom of the fitness network, can be adjusted by the parameter β. As we increase
the value of β, the fluctuations are gradually reduced. When β → ∞, there are
no fluctuations, and the algorithm will directly converge to a local optimum. The
experiments also show that the proposed algorithm provides superior performance
when β ≈ 2.75 ± 0.25 for the example problems under simulation.
According to the analytical results in Section 3.2, the self-organized optimiza-
tion dynamics combining the greedy self-organizing process and the fluctuated
explorations is an effective search process in complex fitness networks. Furthermore,
in order to validate the self-organized behavior of the proposed algorithm, we have
randomly generated a large-scale Euclidean TSP instance, with n = 2048, and then
statistically analyzed the optimized results of our optimization algorithm. The kth-
NND of an optimized result is presented in Figure 3.7 (Liu et al., 2011).
As shown in Figure 3.7, the kth-NND for the optimized result obtained by the
self-organized optimization algorithm is a nearly perfect exponential distribution. In
the exponentially fitted function p(k) = a·e^{−bk}, the coefficients are a = 0.6791 ± 0.0142
and b = 0.5173 ± 0.0115, with 99% confidence bounds.
Figure 3.7 The kth-NND p(k) of the optimized result versus k, together with its exponential fitted curve.
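The exponential fit reported above can be reproduced with a few lines of Python, assuming SciPy is available; since the measured p(k) values are not tabulated here, the data below are synthetic stand-ins generated near the reported coefficients.

import numpy as np
from scipy.optimize import curve_fit

# Synthetic kth-NND values near the reported fit (stand-ins for measurements).
k = np.arange(1, 21)
p = 0.68 * np.exp(-0.52 * k) + 0.002 * np.random.default_rng(0).random(20)

def model(k, a, b):
    # The fitted form p(k) = a * exp(-b * k)
    return a * np.exp(-b * k)

(a, b), cov = curve_fit(model, k, p, p0=(0.7, 0.5))
print(f"p(k) ~ {a:.4f} * exp(-{b:.4f} * k)")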
Compared with the proposed self-organized optimization algorithm, the original EO differs in two aspects:
i. The definition of local fitness differs. For example, in a TSP tour, the local
fitness of city i is defined as f_i = 3/(p + q) if it is connected to its pth and qth
nearest neighbors, respectively (Boettcher and Percus, 2000). Obviously, this
is not consistent with the global fitness, as discussed above, and it is hardly
possible to obtain the global optimum.
ii. The method for selecting a new solution s′ to replace the current solution s is
different; it is more like a physical process than an optimization procedure.
Table 3.1 Comparison of the Computational Results from SA, GA, τ-EO, and SOA

Instance   Optimum |      SA        |       GA        |     τ-EO       |      SOA
eil51      426     | 0.00 0.00   2  | 0.00 0.00   25  | 0.00 0.00   2  | 0.00 0.00   2
pr76       108159  | 0.21 0.24   3  | 0.00 0.00   30  | 0.00 0.00   3  | 0.00 0.00   3
kroA100    21282   | 0.69 0.45   5  | 0.05 0.07   65  | 0.12 0.09   5  | 0.02 0.04   5
ch130      6110    | 0.74 0.54  10  | 0.29 0.25  102  | 0.31 0.34  10  | 0.13 0.15  10
ch150      6528    | 0.92 0.61  12  | 0.38 0.40  149  | 0.51 0.42  12  | 0.22 0.20  12
tsp225     3916    | 1.42 0.77  21  | 0.66 0.35  212  | 0.65 0.49  21  | 0.45 0.34  21
a280       2579    | 1.45 0.71  45  | 0.64 0.51  362  | 0.53 0.46  45  | 0.32 0.31  45
pcb442     50778   | 2.07 1.05 125  | 1.53 0.62 1011  | 1.52 0.68 125  | 1.15 0.53 125
pr1002     259045  | 3.16 1.24 300  | 1.89 0.80 2957  | 1.95 0.76 300  | 1.77 0.69 300
pr2392     378032  | 4.45 2.23 900  | 3.83 1.51 5326  | 3.55 1.54 900  | 3.21 1.25 900

Source: Reprinted from Expert Systems with Applications, 38, Liu, J. et al. Self-organized combinatorial
optimization. 10532–10540, Copyright 2011, with permission from Elsevier.
The results in Table 3.1 indicate that SOA is superior in terms of both
effectiveness and efficiency, and outperforms the state-of-the-art SA, GA, and EO
for all 10 test instances.
3.4 Summary
In this chapter, an analytic characterization for COPs is developed from the self-
organizing system’s point of view. Based on the definitions of the discrete-state
variable and local fitness, the empirical observation of the microscopic distribution
is discussed with respect to an optimal solution. A notion of fitness network is also
introduced in order to characterize the structure of the solution space, and the
searching complexity of hard optimization problems is statistically analyzed from
the characteristics of phase transition.
The performance of a self-organized optimization method was also presented and
demonstrated for solving hard computational problems. For different optimization
problems, only the definitions of a consistent local fitness function and the
neighboring rules are required.
Based on our analytic and algorithmic discussions, this chapter offers new
insights into, and may inspire, further studies on the systematic and microscopic
analysis of NP-complete problems. It paves the way for utilizing self-organization
in solving COPs. Future work will involve in-depth studies on the optimization
dynamics of computational systems by introducing the fundamentals of
self-organizing systems.
Section II
MODIFIED EO AND INTEGRATION OF EO WITH OTHER SOLUTIONS TO COMPUTATIONAL INTELLIGENCE
Chapter 4
Modified Extremal
Optimization
4.1 Introduction
To improve the performance of the original τ-EO algorithm and extend its application
area, this chapter presents some modified versions and is organized as follows:
In Section 4.2, the modified EO with extended evolutionary probability distributions
proposed by Zeng et al. (2010b) is discussed. Section 4.3 presents a multistage
EO (MSEO) with a dynamical evolutionary mechanism (Zeng et al., 2010c).
Section 4.4 describes another modified version, the backbone-guided EO algorithm
proposed by Zeng et al. (2012), which utilizes backbone information to guide the
search process of EO toward the optimal region more efficiently. Furthermore,
Section 4.5 presents the PEO algorithm (Chen et al., 2006).
Finally, the summary of this chapter is given in Section 4.6.
4.2 Modified EO with Extended Evolutionary Probability Distributions
As shown by Boettcher and Frank (2006), τ-EO is characterized by a power-law distribution over the fitness ranks k,
P_\tau(k) = \frac{\tau - 1}{1 - n^{1-\tau}}\, k^{-\tau}, \quad 1 \le k \le n    (4.1)

where τ is a positive constant, called the critical exponent of the power law, and
often τ > 1. For a fixed dimension n in a specified problem, the coefficient
(τ − 1)/(1 − n^{1−τ}) is obviously also a positive constant. Therefore, Equation 4.1
can be represented in the following general form:

P_p(k) \propto k^{-\tau}, \quad 1 \le k \le n    (4.2)
The scale-free property is common but not universal (Strogatz, 2001). This
statement motivates us to explore other probability distributions to replace the
power-laws-based evolution rules in EO-similar methods.
As a negative example, μ-EO (Boettcher and Frank, 2006) was introduced to
demonstrate the usefulness of power-law distributions of τ-EO. In detail, μ-EO is
characterized by an exponential distribution over the fitness ranks k,
e µ − 1 − µk
Pµ (k ) = e (1 ≤ k ≤ n ) (4.3)
1 − e − µn
In general, complex networks are categorized into three types: random, scale-
free, and hierarchical networks (Barabási and Oltvai, 2004), in which the probabil-
ity P(k) that a node is connected to k other nodes is bounded, following exponential,
power-law, and power law with exponential cutoff distributions, respectively.
Empirical measurements, however, indicate that real networks deviate from
simple power-law behavior (Barabási, 2007). The most typical deviation is the flat-
tening of the degree distribution at small values of k, while a less typical deviation is
the exponential cutoff for high values of k. Thus, a proper fit to the degree distribu-
tion of real networks has the form
P_h(k) \propto (k + k_0)^{-\gamma}\, e^{-k/k_x}, \quad 1 \le k \le n    (4.5)
where k 0, kx represent the small-degree cutoff and the length scale of the high-
degree exponential cutoff, respectively, and γ is a positive constant. The scale-free
behavior of real networks is therefore evident only between k 0 and kx. This distribu-
tion can be reformulated as the following form termed “hybrid distribution” here:
P_h(k) \propto k^{-h}\, e^{-hk}, \quad 1 \le k \le n    (4.6)
The corresponding cumulative probability distributions are

Q_p(K) = \frac{\sum_{k=1}^{K} k^{-\tau}}{\sum_{k=1}^{n} k^{-\tau}}, \quad 1 \le K \le n    (4.7)

Q_e(K) = \frac{\sum_{k=1}^{K} e^{-\mu k}}{\sum_{k=1}^{n} e^{-\mu k}}, \quad 1 \le K \le n    (4.8)

Q_h(K) = \frac{\sum_{k=1}^{K} e^{-hk} k^{-h}}{\sum_{k=1}^{n} e^{-hk} k^{-h}}, \quad 1 \le K \le n    (4.9)
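In an implementation, ranks are typically drawn by inverting these cumulative distributions. The Python sketch below (our illustration; the parameter values are those of the examples discussed next) builds Q from the unnormalized P of Equations 4.2, 4.3, and 4.6 and samples ranks by binary search.

import bisect
import math
import random

def rank_weights(n, kind, param):
    # Unnormalized P(k) for k = 1..n: power-law k^-tau ("P"), exponential
    # e^(-mu*k) ("E"), or hybrid e^(-h*k) * k^-h ("H"); Eqs. 4.2, 4.3, 4.6.
    if kind == "P":
        return [k ** -param for k in range(1, n + 1)]
    if kind == "E":
        return [math.exp(-param * k) for k in range(1, n + 1)]
    return [math.exp(-param * k) * k ** -param for k in range(1, n + 1)]

def cumulative(weights):
    # Normalized cumulative distribution Q(K), as in Equations 4.7 to 4.9.
    total, acc, q = sum(weights), 0.0, []
    for w in weights:
        acc += w
        q.append(acc / total)
    return q

def sample_rank(q, rng):
    # Invert Q: the smallest K with Q(K) >= u is the sampled rank.
    return bisect.bisect_left(q, rng.random()) + 1

n, rng = 532, random.Random(0)
for kind, param in (("P", 1.15), ("E", 0.037), ("H", 0.05)):
    q = cumulative(rank_weights(n, kind, param))
    print(kind, [sample_rank(q, rng) for _ in range(5)])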
In fact, for specific finite-size problems, for example, n = 532, these probability
distributions and their cumulative counterparts are shown in Figure 4.1 (Zeng et al.,
2010b). When the parameters µ, τ, and h are assigned small values, for example,
µ = 0.037, τ = 1.15, and h = 0.05, the probabilities Pe(k), Pp(k), and Ph(k) for
n = 532 decrease significantly as k varies in [1, 100] but only slightly as k varies in
[100, 532]; the specific distributions differ, as shown in Figure 4.1a. By contrast,
when the parameters are assigned large values, for example, µ = 0.52, τ = 2.20,
and h = 0.35, Pe(k), Pp(k), and Ph(k) for n = 532 decrease significantly as k varies
in [1, 10] while the changes are unobvious as k varies in [10, 532]; the different
specific distributions are shown in Figure 4.1b.
Figure 4.1 (See color insert.) Probabilities and their corresponding cumulative
distributions of power-law, exponential, and hybrid distributions for n = 532, of
which (a) µ = 0.037, τ = 1.15, and h = 0.05; (b) µ = 0.52, τ = 2.20, and h = 0.325.
(Reprinted from Physica A, 389 (9), Zeng, G. Q. et al. Study on probability distributions
for evolution in modified extremal optimization, 1922–1930, Copyright
2010b, with permission from Elsevier.)
C(S) = \sum_{i=1}^{n} \min_{i \ne j} d_{ij} + \sum_{i=1}^{n} \lambda_i    (4.11)

where d_i represents the length of the forward-directed edge starting from city i in a
feasible solution S, and d_{ij} is the intercity distance between the ith and jth cities.
Inheriting the optimization scheme of SOA (Chen et al., 2007), a modified
EO framework with extended evolutionary probability distributions (Zeng et al.,
2010b) is presented in this section, obtained by replacing the original power-law
distributions used in SOA with other probability distributions, such as exponential
or hybrid ones. The details of this framework are described as follows:
The modified EO algorithms under the above framework with different evo-
lutionary probability distributions are shown in Table 4.1 (Zeng et al., 2010b).
Here, power-law, exponential, and hybrid distributions are abbreviated as “P,”
“E,” and “H,” respectively. Note that the postfix of each algorithm defined in
Table 4.1 denotes the types of probability distributions used in P 1(k1) and P 2(k 2).
For example, the MEO-HH algorithm represents a modified EO algorithm in
which the hybrid distributions “H” are used both in P 1(k1) and P 2(k 2). It is
obvious that this framework can be viewed as a generalization of SOA since
the updating rules are changed by using different probability distributions. In
particular, when P 1(k1) and P 2(k 2) follow power-law distributions, the proposed
algorithm is SOA.
Table 4.1 Modified EO Algorithms with Different Evolutionary Probability Distributions

Algorithm   P1(k1)                 P2(k2)
SOA         k1^{−τ1}               k2^{−τ2}
MEO-EP      e^{−µ1 k1}             k2^{−τ2}
MEO-HP      e^{−h1 k1} k1^{−h1}    k2^{−τ2}

Source: Reprinted from Physica A, 389 (21), Zeng, G. Q. et al. Multistage extremal
optimization for hard travelling salesman problem. 5037–5044, Copyright
2010c, with permission from Elsevier.
Table 4.2 Comparison of the Best, Mean, and Worst Errors of the Modified EO Algorithms (each three-column group corresponds to one TSP benchmark instance)

Algorithm | eb (%) em (%) ew (%) | eb (%) em (%) ew (%) | eb (%) em (%) ew (%)
τ-EO      | 0.448  0.740  1.000  | 0.510  1.520  2.275  | 1.254  2.330  3.195
SOA       | 0.000  0.167  0.462  | 0.440  1.150  1.800  | 0.993  1.862  2.735
MEO-EE    | 0.000  0.065  0.200  | 0.097  0.869  1.607  | 0.396  1.057  1.614
MEO-EP    | 0.000  0.031  0.101  | 0.399  0.721  0.975  | 0.364  1.060  1.615
MEO-EH    | 0.000  0.042  0.175  | 0.037  0.760  1.150  | 0.512  0.985  1.859
MEO-HE    | 0.000  0.063  0.194  | 0.272  0.986  1.233  | 0.667  1.115  1.642
MEO-HP    | 0.000  0.032  0.100  | 0.435  0.814  1.140  | 0.615  0.995  1.623
MEO-HH    | 0.000  0.060  0.186  | 0.293  0.863  1.112  | 0.646  1.087  1.607

Source: Zeng, G. Q. 2011. Research on modified extremal optimization algorithms
and their applications in combinatorial optimization problems. Doctoral
dissertation, Zhejiang University, Hangzhou, China.
The comparison of the proposed algorithms with τ-EO, GA, and SA for other TSP
benchmark instances from TSPLIB95 is shown in Table 4.3.
Although the power-law-based evolutionary probability distribution is one of the
main characteristics of SOA, the above experimental results (Zeng et al., 2010a,b;
Zeng, 2011) on a variety of TSP benchmark instances have shown that the proposed
MEO algorithms with exponential or hybrid distributions are superior to
SOA, τ-EO, and SA. Furthermore, this study appears to demonstrate that
μ-EO with an exponential distribution (Boettcher and Frank, 2006) can provide
better performance than τ-EO, at least for hard TSP instances, which can dispel
the misconception of Boettcher and Frank (2006) that μ-EO with exponential
distributions fails to perform well for hard optimization problems. From an
optimization point of view, our results indicate that the power law is not the only
proper probability distribution for EO-similar methods; the exponential and
hybrid distributions may be viable alternatives. In fact, the key idea behind the
proposed algorithm (Zeng et al., 2010b) has been extended to solve other
optimization problems, for example, MAX-SAT (Zeng et al., 2011). These research
results further demonstrate the effectiveness of the proposed MEO algorithms
with extended evolutionary probability distributions.
4.3 Multistage EO
In all previous work concerning EO, except the theoretical analyses (Hoffmann et al.,
2004; Heilmann et al., 2004), the selection over the ranks of the degrees of freedom
for updating depends on a time-independent probability distribution. More
specifically, in the power-law distribution adopted by all existing EO versions, the
value of the critical exponent τ is fixed as a constant during the whole search process.
This section describes a novel method called MSEO, proposed by Zeng et al. (2010c),
which adopts different values of the control parameters in different stages. The main
goal of this study is to demonstrate that a dynamical probability-distribution-based
evolutionary mechanism is more effective than the traditional static strategy.
4.3.1 Motivations
In fact, there are two motivations behind the proposed MSEO method. One is
the effect of the control parameter τ on the search dynamics of τ-EO. Specifically,
when τ is small, the search is similar to a random walk. Conversely, it approaches
to a deterministic local search, only updating those worst variables for very large
values of τ. It is clear that the adjustable parameters are crucial to control the
search fluctuations. Once the search gets trapped into a local optimum, it seems
to be difficult to bring it to a better one if the control parameters are not adjusted.
To illustrate the necessity of modifying the previous time-independent strategy of
the control parameters, we design a simple experimental study on the performances
Figure 4.2 Number of iterations versus the performances of MEO-HH with the
same initial configuration and the same control parameters for pcb442 instance.
The error bars represent the best and worst performances over 10 independent
runs. (Reprinted from Physica A, 389 (21), Zeng, G. Q. et al. Multistage extremal
optimization for hard travelling salesman problem. 5037–5044, Copyright 2010c,
with permission from Elsevier.)
of MEO-HH with the same initial configuration and the same value of control
parameters but different runtimes, that is, different numbers of iterations. The resulting
performances, obtained over 10 independent runs, are shown in Figure 4.2
(Zeng et al., 2010c). Obviously, for the case with the same initial configurations
and the same control parameters, the average and best performances peak at some
critical number of iterations, that is, 80,000 iterations in this experiment. In other
words, even if the runtime is extended, it may be difficult to further improve the
performance of the algorithm after the critical runtime. Similar phenomena can be
observed in other experiments under this setting.
From the above experiments, an intuitive way to improve the performances
is to design a dynamical evolutionary mechanism. Consequently, a possible
improvement is to adjust the values of the control parameters after some critical
runtime. The question that naturally arises is how to determine the critical
runtimes. Here, we determine them empirically, for example, by presetting them
as constant values in practice.
Another motivation of the proposed MSEO is our recent study (Zeng et al.,
2010a) concerning the effects of initial configurations on the performances of
modified EO. More specifically, for the same evolutionary mechanism, the algo-
rithm with the initial configurations constructed by some heuristics is generally
superior to that starting from random ones. This indicates that the quality of initial
configurations plays an important role in governing the performances of modified
EO algorithms. It should be noted that the optimal values of the control parameters
applied to the algorithm starting from a biased initial configuration are different
from those with a randomly selected one. Thus, varying the values of the control
parameters during the different search processes is likely to improve the perfor-
mances of a modified EO algorithm starting from different-quality initial configu-
rations in different stages.
1. In the first stage, that is, m = 1, use a modified EO algorithm with the same
evolutionary probability distribution parameters (p11, p12) starting from ran-
dom or nearest-neighbor search (NNS)-based initial configurations (Zeng
et al., 2010a) by N1 different independent runs, where the number of itera-
tions in each run is set as I1 and obtain N1 configurations.
2. For m from 2 to M, where M denotes the total number of stages.
3. Select the best one s(m−1)b from the stage m − 1 and set s(m−1)b as the initial con-
figuration of the mth stage, that is, sm0 = s(m−1)b.
4. In mth stage, use a modified EO with the evolutionary probability distri-
bution parameters (pm1, pm2) starting from sm0 by Nm different independent
runs, and the number of iterations in each run is set as Im and obtain Nm
configurations.
5. End for, and output the final NM configurations and the corresponding
global fitness. (A minimal sketch of this multistage loop is given below.)
Note that the modified EO algorithm used in each stage has several choices
from Table 4.1. Of course, the quality of the selected algorithm in each stage
has a direct influence on the final performances of MSEO. It should be also
emphasized that we focus on illustrating how MSEO works in this chapter and
the comparison of MSEO for different modified EO algorithms in each stage
is an open subject in future research. Thus, for the sake of convenience, the
NNMEO-HH and MEO-HH are selected in the first stage and the rest of the
stages, respectively, for tests in Section 4.3.3. Note that NNMEO-HH (Zeng
et al., 2010a) is a modified EO algorithm with NNS-based initial configurations
(“NN”) and a hybrid distributions-based evolutionary mechanism in P 1(k1) and
P 2(k 2).
Obviously, the performances of the proposed method are governed by a set of
control parameters, including M, Nm, Im, pm1, and pm2. For simplicity, M, Nm, and
Im are predefined as constants here. Our study focuses on the method of varying
the values of control parameters to enhance the solutions. It is evident that the
control parameters hm1, hm2 play a critical role in the performance of MSEO. In
this sense, the key to MSEO is to change the values of the control parameters (pm1,
pm2) in different stages. In general, the determination of control parameters used in
the first stage is similar to that of the normal EO-similar method in the research
works (Chen et al., 2007a; Zeng et al., 2010b). For the other stages, the values of the
parameters used in the current stage are always larger than those in the last stage.
This may be explained from the perspective of the “backbone” idea (Schneider
et al., 1996). Specifically, we expect that some good components of the local
minimum generated by the last stage can be frozen while the others are optimized
in the current stage, which indicates that the original problem reduces to one of
smaller size. To achieve this goal, larger values of the parameters are needed in the
current stage than in the previous one. As a result, adjusting the control parameters
in each stage so that the search approaches the ground states as deeply as possible
resembles a robust backbone-guided search method.
The compared algorithms were run on Windows Vista Basic systems. Here, τ-EO is implemented with a new definition
of fitness rather than the original definition. In fact, τ-EO here outperforms the
original one with nonlinear function of fitness (Boettcher and Percus, 2000). The
resulting performances of these algorithms are shown in Table 4.4. Clearly, even
TSEO, the simplest case of MSEO, can provide much better performance
than classical SA, τ-EO, and the single-stage modified EO called NNMEO-HH
under fine-tuning within the same runtime.
Figure 4.3 (Zeng et al., 2010c) shows the typical search dynamics of the algo-
rithms including NNMEO-HH and TSEO for pcb442 instances. Obviously, the
modified EO algorithms including NNMEO-HH and TSEO descend fast near
the optimum but with different fluctuations. By comparing these search processes,
we can observe that changing the values of the control parameters after some
number of iterations drives the search of TSEO as far down the energy landscape
as possible, while the static strategy adopted by NNMEO-HH does not. From
the respective histograms, the fluctuation characteristics can be analyzed easily.
The fluctuating dynamics depend on the updating probability distributions adopted
by the proposed optimization algorithms. More specifically, NNMEO-HH follows
a bell-shaped distribution while TSEO has a “good” cutoff. In other words,
TSEO is more likely to approach lower states than NNMEO-HH under the same
runtime. It is clear that the higher the frequency of this cutoff, the better the
performance obtained. From the viewpoint of mathematics, the search dynamics
of these algorithms can be described and analyzed as a Markov process.
Figure 4.3 Search dynamics of NNMEO-HH (a) and TSEO (b) for pcb442
instance. The insets are the respective histograms of the frequency with which a
particular tour length is obtained during the fluctuation process. (Reprinted from
Physica A, 389 (21), Zeng, G. Q. et al. Multistage extremal optimization for hard
travelling salesman problem. 5037–5044, Copyright 2010c, with permission from
Elsevier.)
These results confirm that the dynamical evolutionary mechanism adopted by MSEO is more effective
than the traditional static strategy. Of course, a more adaptive schedule of the control
parameters may be devised to cross the energy barriers effectively and efficiently.
The optimal values of h2 always range from 0.30 to 0.50 to guarantee
that a better configuration (viewed as a macro-state) has a higher probability of being
selected as the new one rather than being accepted unconditionally. This is also consistent
with the statistical property of the kth-nearest-neighbor distribution of optimal tours found
in many TSP instances (Chen and Zhang, 2006). For the other stages, the values of
the parameters used in P1(k1) and P2(k2) in the current stage are always larger than
those in the last stage, according to the “backbone-similar” idea.
Based on the above analysis, Zeng et al. (2010c) chose the pcb442 instance to
illustrate how to determine those optimal values of the control parameters used in
the two-stage EO algorithm TSEO. For the pcb442 instance, the optimal values of
the control parameters h11 and h21 of P1(k1) used in the first and second stage, respec-
tively, are determined as h11 ≈ 0.056 ± 0.005 and h21 ≈ 0.060 ± 0.015 according to
their similar effect to τ1 ≈ 1 + 1/ln(442) = 1.15 in τ-EO and SOA, and aforemen-
tioned rules. Figure 4.4 illustrates the effects of the control parameters h12 and h22 of
P2(k2) used in the first and second stage, respectively, on the corresponding perfor-
mances of TSEO. These performances are measured by the best, average, and worst
errors (%), which are defined as 100 × (best − optimum)/optimum, 100 × (aver-
age − optimum)/optimum, 100 × (worst − optimum)/optimum, respectively, over
10 independent runs when varying the values of h12 and h22. Due to the operation
that the best configuration obtained in the first stage is selected as the initial one
in the second stage, one should focus on evaluating the best errors to determine
the optimal value of h12. From Figure 4.4, it is clear that the best configuration is
obtained in the first stage when h12 ≈ 0.365 ± 0.005. The optimal value of h22 is
determined in terms of the final comprehensive performances obtained in the sec-
ond stage. From Figure 4.4, the optimal value of h22 used in TSEO is approximately
from 0.720 to 0.730. By the similar method, these optimal values of the control
parameters used in TSEO can be determined for other tested TSP instances.
For the three-stage algorithm MSEO, the optimal values of the control param-
eters h11, h12, and h21 are the same as those in TSEO, yet those of h22 should be
determined according to the best errors obtained in the second stage. From Figure
4.4, we can observe that the best configuration is obtained in the second stage when
h22 ≈ 0.565 ± 0.005 for the pcb442 instance. Similar to the determination method
of the parameters h21 and h22 used in TSEO, the optimal values of h31 and h32 used
in the third stage can also be determined.

Figure 4.4 Control parameters h12 (a) and h22 (b) versus the performances of
the first and second stages, respectively, in the TSEO algorithm. (Reprinted from
Physica A, 389 (21), Zeng, G. Q. et al. Multistage extremal optimization for hard
travelling salesman problem. 5037–5044, Copyright 2010c, with permission from
Elsevier.)
4.4 Backbone-Guided EO
A great deal of research work (Schneider et al., 1996; Monasson et al., 1999; Singer
et al., 2000; Dubolis and Dequen, 2001; Slaney and Walsh, 2001; Zhang, 2001,
2002, 2004; Telelis and Stamatopoulos, 2002; Schneider, 2003; Kilby et al., 2005;
Zhang and Looks, 2005; Menaï and Batouche, 2006) has shown that the computational
complexity of an optimization problem depends not only on its dimension,
but also on some inherent structural properties, for example, backbone. As one of
the most interesting and important structures, backbone has been used to explain
the difficulty of problem instances (Monasson et al., 1999; Singer, 2000; Slaney
and Walsh, 2001; Zhang, 2001, 2002; Kilby et al., 2005). The problems with
larger backbone are generally harder for local search algorithms to solve because
the clustered solutions in these problems often result in these algorithms making
mistakes more easily and wasting time searching empty subspaces before correcting
the bad assignments (Slaney and Walsh, 2001). On the other hand, the utilization
of the backbone information may help the design of effective and efficient optimi-
zation algorithms (Schneider et al., 1996; Dubolis and Dequen, 2001; Telelis and
Stamatopoulos, 2002; Schneider, 2003; Zhang, 2004; Zhang and Looks, 2005;
Menaï and Batouche, 2006). For example, Schneider et al. (1996) and Schneider
(2003) have developed a powerful parallel algorithm for TSP by using its backbone
information. Dubolis and Dequen (2001) incorporated the backbone information
in a DPL-type algorithm for random 3-SAT problem. Telelis and Stamatopoulos
(2002) developed a heuristic backbone sampling method to generate initial solu-
tions for a local search algorithm based on the concept of backbone. Zhang (2004)
proposed a backbone-guided WALKSAT method where the backbone information
is embedded in a popular local search algorithm, such as WALKSAT. Furthermore,
the basic idea has been extended to TSP (Zhang and Looks, 2005), and the partial
MAX-SAT problem (Menaï and Batouche, 2006). The experimental results have
shown that these backbone-based methods provide better performance than the
pure local search ones. Nevertheless, almost all existing EO-based algorithms have
overlooked the inherent structural properties behind the optimization problems,
for example, backbone information.
This section presents another method called backbone-guided extremal opti-
mization (BGEO) (Zeng et al., 2012) for the hard MAX-SAT problem. Menaï and
Batouche (2006) developed a modified EO algorithm called Bose–Einstein-EO
(BE-EO) to solve the MAX-SAT problem. The basic idea behind BE-EO is to sample
initial configurations set based on Bose–Einstein distribution to the original τ-EO
search process. The experimental results on both random and structured MAX-SAT
instances demonstrate BE-EO’s superiority to more elaborate stochastic optimiza-
tion methods such as SA (Hansen and Jaumard, 1990), GSAT (Selman and Kautz,
1993), WALKSAT (Selman et al., 1994), and Tabu search (Szedmak, 2001). In Zeng
et al. (2011), a more generalized EO framework termed as EOSAT was proposed
to solve the MAX-SAT problem. The modified algorithms, such as BE-EEO and
BE-HEO, provide better performance than BE-EO. Therefore, by incorporating the
backbone information into the EOSAT framework, the BGEO method proposed in
the work of Zeng et al. (2012) is possible to guide the search approach to the optimal
solutions, and to further improve the performance of the original EO algorithms.
\lambda_i = -\frac{\sum_{x_i \in C_j,\, C_j(S)=0} w_j}{\sum_{x_i \in C_k} w_k}    (4.12)

In other words, the local fitness is defined as the negative of the sum of the weights
of the unsatisfied clauses in which the variable x_i appears, divided by the sum of
the weights of the clauses connected to this variable.
The global fitness C(S) is defined as the sum of the contributions from each variable (Zeng et al., 2012), that is,

C(S) = -\sum_{i=1}^{n} \lambda_i \sum_{x_i \in C_k} w_k = -\sum_{i=1}^{n} (c_i \lambda_i), \quad \text{where } c_i = \sum_{x_i \in C_k} w_k    (4.13)
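A direct Python rendering of Equation 4.12 is sketched below (our illustration); clauses are given DIMACS-style as lists of signed variable indices, and the tiny weighted instance at the end is hypothetical.

def local_fitness(assignment, clauses, weights):
    # lambda_i per Equation 4.12: minus the weight of unsatisfied clauses
    # containing x_i, divided by the total weight of clauses containing x_i.
    n = len(assignment)
    unsat_w = [0.0] * (n + 1)
    total_w = [0.0] * (n + 1)
    for clause, w in zip(clauses, weights):
        satisfied = any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
        for lit in clause:
            i = abs(lit)
            total_w[i] += w
            if not satisfied:
                unsat_w[i] += w
    return [-unsat_w[i] / total_w[i] if total_w[i] else 0.0
            for i in range(1, n + 1)]

# Hypothetical weighted instance: (x1 or not x2) and (x2 or x3), weights 2, 1.
clauses, weights = [[1, -2], [2, 3]], [2.0, 1.0]
assignment = [False, True, False]         # x1 = 0, x2 = 1, x3 = 0
print(local_fitness(assignment, clauses, weights))  # [-1.0, -0.666..., 0.0]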
Input: a MAX-SAT instance; MI: the maximum iterations; Rl: the maximum independent
runs of the lth iteration; SSl: the maximum sample size in the lth iteration; MSl: the maximum
steps of EO algorithm in the lth iteration; pl: the adjustable parameter for evolutionary prob-
ability distribution of EO algorithm in the lth iteration;
Output: SB : the best configurations found; C(SB): the total weights of unsatisfied clauses.
SS_l = C_{l1} \times |X_{NB}(l)|    (4.16)

MS_l = C_{l2} \times |X_{NB}(l)|    (4.17)

where C_{l1} and C_{l2} are positive constants and |X_{NB}(l)| is the number of nonbackbone
variables in the lth iteration.
Obviously, p_l plays a role analogous to the proportion p of random and greedy
moves in WALKSAT (Selman and Kautz, 1993) and the noise parameter η in
FMS (Seitz et al., 2005). Due to the different features of the first and the remaining
iterations, p_l is given in the following form:

p_l = \begin{cases} p_c, & l = 1 \\ p_c + d \times |X_B(l-1)|, & 2 \le l \le MI \end{cases}    (4.18)
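The per-iteration settings of Equations 4.16 through 4.18 can be computed as in the following sketch; the constants c1, c2, pc, and d are placeholder values of ours, chosen purely for illustration.

def stage_parameters(l, nonbackbone_size, prev_backbone_size,
                     c1=2.0, c2=50.0, pc=1.3, d=0.001):
    # SS_l and MS_l scale with the number of nonbackbone variables
    # (Equations 4.16 and 4.17); p_l follows Equation 4.18.
    ss = c1 * nonbackbone_size
    ms = c2 * nonbackbone_size
    p = pc if l == 1 else pc + d * prev_backbone_size
    return ss, ms, p

print(stage_parameters(1, nonbackbone_size=100, prev_backbone_size=0))
print(stage_parameters(2, nonbackbone_size=60, prev_backbone_size=40))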
Figure 4.6 Dynamics of the pseudo backbone size during the search process
of BGEO. (From Zeng, G. Q. et al., 2012. International Journal of Innovative
Computing, Information and Control 8 (12): 8355–8366. With permission.)
Figure 4.7 For uf-100:430, the top is the search dynamics of the best global
fitness in BGEO and the bottom is the comparison of BG-EEO and BE-EEO.
(From Zeng, G. Q. et al., 2012. International Journal of Innovative Computing,
Information and Control 8 (12): 8355–8366. With permission.)
                        BE-EO               BE-EEO              BG-EEO
Problem       α      eb   em   ew       eb   em   ew       eb   em   ew
uf-50:218    4.360   0.92 2.20 2.75     0.46 1.74 2.29     0.00 0.00 0.00
uf-75:325    4.333   1.54 2.62 3.07     0.62 1.45 2.15     0.00 0.15 0.31
uf-100:430   4.300   1.63 2.47 2.80     0.70 1.83 2.33     0.00 0.23 0.46
uf-125:538   4.304   2.04 2.68 3.35     1.11 1.99 2.23     0.19 0.28 0.37
uf-150:645   4.300   2.17 2.53 2.95     1.40 1.95 2.33     0.16 0.31 0.47
uf-175:753   4.303   2.39 2.76 2.92     1.59 1.91 2.26     0.13 0.33 0.40
uf-200:860   4.300   2.67 3.13 3.49     1.86 2.29 2.56     0.12 0.17 0.23
uf-225:960   4.267   2.08 2.79 3.23     1.67 1.99 2.29     0.10 0.31 0.62
uf-250:1065  4.260   1.88 2.76 3.29     1.60 1.94 2.07     0.09 0.35 0.66

Source: From Zeng, G. Q. et al., 2012. International Journal of Innovative Computing,
Information and Control 8 (12): 8355–8366. With permission.
                        BE-EO               BE-EEO              BG-EEO
Problem       α      eb   em   ew       eb   em   ew       eb   em   ew
uuf-50:218   4.360   1.38 2.38 3.21     0.46 1.88 2.75     0.00 0.00 0.00
uuf-75:325   4.333   1.85 2.65 3.37     1.23 1.94 2.46     0.00 0.17 0.31
uuf-100:430  4.300   1.86 2.63 2.80     1.16 1.88 2.56     0.00 0.26 0.46
uuf-125:538  4.304   1.86 2.70 3.35     1.30 2.08 3.16     0.19 0.30 0.37
uuf-150:645  4.300   2.17 2.71 3.41     1.40 2.05 2.64     0.16 0.32 0.47
uuf-175:753  4.303   2.92 3.33 3.98     1.73 2.30 2.67     0.13 0.34 0.40
uuf-200:860  4.300   3.49 3.85 4.30     2.44 2.72 3.14     0.12 0.16 0.23
uuf-225:960  4.267   2.81 3.48 4.17     2.19 2.64 3.44     0.10 0.44 0.62
uuf-250:1065 4.260   3.09 3.51 4.38     1.78 2.28 2.72     0.09 0.45 0.66

Source: From Zeng, G. Q. et al., 2012. International Journal of Innovative Computing,
Information and Control 8 (12): 8355–8366. With permission.
4.5 Population-Based EO
Many real-world optimization problems involve complicated constraints. The
difficulties of a constrained optimization problem stem from various limits on the
decision variables, the constraints involved, the interference among constraints,
and the interrelationships between the constraints, objective functions, and
decision variables. This has motivated the development of a considerable number
of approaches to tackling constrained optimization problems, such as Stochastic
Ranking (SR) (Runarsson and Yao, 2000), the Adaptive Segregational Constraint
Handling Evolutionary Algorithm (ASCHEA) (Hamida and Schoenauer, 2002),
the Simple Multimembered Evolution Strategy (SMES) (Mezura-Montes and
Coello, 2005), etc. In this section, EO is applied to solving numerical constrained
optimization problems. To enhance and improve the search performance and
efficiency of EO, Chen et al. (2006) developed a novel EO strategy with
population-based search, called PEO. In addition, Chen et al. (2006) adopted the
adaptive Lévy mutation operator, which enables PEO to carry out not only
coarse-grained but also fine-grained search. It is worth noting that there exists no
adjustable parameter in PEO, which makes PEO more appealing than other
methods. Finally, PEO was successfully applied to solving six popular benchmark
problems and compared with three state-of-the-art approaches.
Table 4.10 BG-EEO versus the BE-EO, BE-EEO Algorithms for CBS Instances

                              BE-EO               BE-EEO              BG-EEO
Problem        α      B    eb   em   ew       eb   em   ew       eb   em   ew
CBS_100_403   4.03   10    1.99 2.68 3.23     0.99 1.91 2.48     0.00 0.07 0.25
                     30    1.99 2.46 2.98     1.24 1.76 2.23     0.25 0.45 0.74
CBS_100_411   4.11   10    1.46 2.12 2.68     0.97 1.58 1.95     0.24 0.49 0.73
                     30    1.22 2.19 2.92     1.22 1.68 2.68     0.24 0.49 0.97
CBS_100_418   4.18   10    1.67 2.34 3.11     1.22 1.75 2.19     0.24 0.38 0.72
                     30    2.39 2.68 3.11     1.44 2.08 2.39     0.24 0.36 0.72
CBS_100_423   4.23   10    1.65 2.13 2.60     0.71 1.47 2.13     0.00 0.47 0.95
                     30    1.42 2.48 3.55     1.18 1.84 2.13     0.24 0.47 0.71
CBS_100_429   4.29   10    1.17 2.45 3.03     0.47 1.52 2.10     0.00 0.65 0.93
                     30    1.63 2.17 2.56     1.17 1.75 2.10     0.00 0.47 0.70
CBS_100_435   4.35   10    1.15 1.91 2.53     0.69 1.22 1.61     0.00 0.28 0.46
                     30    1.38 1.91 2.53     0.92 1.47 1.84     0.00 0.32 0.46
CBS_100_441   4.41   10    1.36 2.12 2.49     0.68 1.45 2.04     0.23 0.45 0.68
                     30    1.59 2.15 2.95     0.45 1.16 2.04     0.00 0.23 0.45
CBS_100_449   4.49   10    1.56 2.27 3.12     0.67 1.44 2.00     0.22 0.45 0.67
                     30    1.78 2.45 3.12     1.34 1.87 2.23     0.45 0.58 0.89
minimize f(X), \quad X = [x_1, \ldots, x_n]^T \in R^n    (4.19)

subject to g_i(X) \le 0, \quad i = 1, \ldots, q    (4.20)

h_j(X) = 0, \quad j = q + 1, \ldots, r    (4.21)

l_j \le x_j \le u_j, \quad j = 1, \ldots, n    (4.22)

where l_j and u_j are the lower and upper bounds of x_j, respectively, and F ⊆ R^n is
defined as the feasible region. It is clear that F ⊆ S.
In this section, the methods for handling constrained nonlinear programming
problems are based on the concept of penalty functions, which penalize unfeasible
solutions. A set of functions Pi(X ) (1 ≤ i ≤ r) is used to construct the penalty. The
function Pi(X ) measures the violation of the ith constraint in the following way:
P_i(X) = \begin{cases} \left[\max\{0, g_i(X)\}\right]^2, & \text{if } 1 \le i \le q \\ \left|h_i(X)\right|^2, & \text{if } q + 1 \le i \le r \end{cases}    (4.23)
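A minimal Python sketch of Equation 4.23 follows (our illustration); summing the P_i into a single penalty Q(X), as done in the last function, is one simple aggregation choice of ours rather than a prescription from the source.

def constraint_violations(x, ineq, eq):
    # P_i(X) per Equation 4.23: squared violation of each constraint.
    # 'ineq' holds the g_i (feasible when g_i(x) <= 0); 'eq' holds the h_j.
    p = [max(0.0, g(x)) ** 2 for g in ineq]
    p += [h(x) ** 2 for h in eq]
    return p

def penalty(x, ineq, eq):
    # Total penalty Q(X): here simply the sum of all P_i(X).
    return sum(constraint_violations(x, ineq, eq))

# Toy problem: g1(x) = x0 + x1 - 1 <= 0 and h1(x) = x0 - x1 = 0.
ineq = [lambda x: x[0] + x[1] - 1]
eq = [lambda x: x[0] - x[1]]
print(penalty([0.8, 0.4], ineq, eq))  # (0.2)**2 + (0.4)**2 = 0.2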
Input: MI: the maximum iterations; Dim: the dimension of each solution; PopSize: the size of
population.
Output: Sbest: the best solution found; OBJ(Sbest): the objective value of the best solution found.
1. Initialization: generate initial population with PopSize solutions randomly and uni-
formly, and choose one solution with the best performance as the best solution Sbest . Set
iteration = 0.
2. For each solution Si, i ∈ {1, …, PopSize},
a. evaluate the variable fitness λij = OBJ(Sij) − OBJ(Sbest) + Q(Sij) for each variable xij,
j ∈ {1, …, Dim},
b. compare all the variables according to their fitness values and find the worst-adapted
variable xiw, w ∈ {1, …, n},
c. perform mutation only on xiw while keeping the other variables unchanged, to obtain a
new solution Siw,
d. accept Si = Siw unconditionally and set OBJ(Si) = OBJ(Siw),
e. if OBJ(Si) < OBJ(Sbest) and Si is a feasible solution, then set Sbest = Si and
OBJ(Sbest) = OBJ(Si)
3. If iteration = MI, then continue the next step; otherwise, set iteration = iteration + 1,
and go to Step (2).
4. Output Sbest and OBJ(Sbest).
On the contrary, those variables that do not satisfy the constraints will be considered
as well-adapted species and be assigned high fitness values.
For a numerical constrained minimization problem, PEO proceeds as shown
in Figure 4.8.
To perform the mutation operation, PEO adopts the adaptive Lévy mutation proposed by Lee and Yao (2001), which makes it
easy to switch from Cauchy mutation to Gaussian mutation. Lévy mutation is, in a sense,
a generalization of Cauchy mutation, since the Cauchy distribution is a special case of
the Lévy distribution. By adjusting the parameter α of the Lévy distribution, one can tune
the shape of the probability density function, which in turn yields adjustable variation
in mutation step sizes. In addition, Lévy mutation provides an opportunity
for mutating a parent using a distribution that is neither Cauchy nor Gaussian.
The Lévy probability distribution has the following form (Mantegna, 1994):

L_{\alpha,\gamma}(y) = \frac{1}{\pi} \int_0^{\infty} e^{-\gamma q^{\alpha}} \cos(qy)\, dq    (4.24)
As can easily be seen from Equation 4.24, the distribution is symmetric with
respect to y = 0 and has two parameters, γ and α, where γ is a scaling factor satisfying
γ > 0 and α satisfies 0 < α < 2. The analytic form of the integral is not known for
general α except for a few cases. In particular, for α = 1, the integral can be carried
out analytically and is known as the Cauchy probability distribution. In the limit
α → 2, the distribution approaches the Gaussian distribution. By varying the
parameter α, one can thus obtain probability distributions of different shapes.
In Chen et al. (2006), Lévy mutation is performed with the following representation:
x'_k = x_k + L_k(\alpha)    (4.25)

where L_k(α) is a Lévy random variable with the scaling factor γ = 1 for the kth variable.
To generate a Lévy random number, Chen et al. (2006) used an effective algorithm
presented by Mantegna (1994). It is known that Gaussian mutation (α = 2)
works better for searching a small local neighborhood, whereas Cauchy mutation
(α = 1) is good at searching a large area of the search space. By adding two additional
candidate offspring (α = 1.4 and 1.7), one is not fixed to the two extremes. It
must be indicated that, unlike the method in Lee and Yao (2001), the mutation
in PEO does not compare the anticipated outcomes of different values of α due
to the characteristics of EO. In PEO, the Lévy mutation with α = 1 (i.e., Cauchy
mutation) is first adopted. It means the large step size will be taken first at each
mutation. If the new generated variable after mutation goes beyond the intervals
of the decision variables, the Lévy mutation with α = 1.4, 1.7, 2 will be carried out
in turn, that is, the step size will become smaller than before. Thus, PEO com-
bines the advantages of coarse-grained search and fine-grained search. The above
analysis shows that the adaptive Lévy mutation is very simple yet effective. Unlike
some switching algorithms which have to decide when to switch between different
mutations during search, the adaptive Lévy mutation does not need to make such
decisions and introduces no adjustable parameters.
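For illustration, the widely used simplified form of Mantegna's (1994) generator is sketched below in Python; it omits the nonlinear correction step of the original algorithm, and the α = 2 branch falls back to a plain Gaussian, since the Mantegna scaling degenerates there.

import math
import random

def levy_mantegna(alpha, rng):
    # One Levy-distributed step with scaling factor gamma = 1 (simplified
    # Mantegna method; the original also applies a nonlinear correction).
    if alpha >= 2.0:
        return rng.gauss(0.0, 1.0)  # limiting Gaussian case
    sigma = (math.gamma(1 + alpha) * math.sin(math.pi * alpha / 2)
             / (math.gamma((1 + alpha) / 2) * alpha
                * 2 ** ((alpha - 1) / 2))) ** (1 / alpha)
    x = rng.gauss(0.0, sigma)
    y = rng.gauss(0.0, 1.0)
    return x / abs(y) ** (1 / alpha)

rng = random.Random(0)
# Coarse-to-fine schedule as in PEO: try alpha = 1 (Cauchy, large steps)
# first, then 1.4, 1.7, and 2 (Gaussian, small steps) if a bound is violated.
for alpha in (1.0, 1.4, 1.7, 2.0):
    print(alpha, [round(levy_mantegna(alpha, rng), 3) for _ in range(3)])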
The comparison results of PEO with the three state-of-the-art approaches can be summarized as follows:
1. Compared with SR: PEO found better “best,” “mean,” and “worst” solutions
in two functions (g05 and g10). It also provided similar “best,” “mean,” and
“worst” solutions in function g12. Slightly better “best” results were found by
SR in the remaining functions (g04, g07, g09).
2. Compared with ASCHEA: PEO was able to find better “best” and “mean”
results in two functions (g05, g10). ASCHEA surpassed our mean results in
three functions (g04, g07, g09). We did not compare the worst results due to the
fact that they were not available for ASCHEA. In addition, we did not perform
comparisons with respect to ASCHEA using function g12 for the same reason.
3. Compared with SMES: PEO found better “best,” “mean,” and “worst” results
in two functions (g05, g10) and similar “best,” “mean,” and “worst” results in
function g12. SMES outperformed PEO in the remaining functions.
From the aforementioned comparisons, it is obvious that PEO shows very competitive
performance with respect to these three state-of-the-art approaches.

Figure 4.9 (a–f) Simulation results of the PEO algorithm on the six test functions.
(With kind permission from Springer Science + Business Media: Proceedings of
the 2006 International Conference on Computational Intelligence and Security
(CIS’2006), Population-based extremal optimization with adaptive Lévy mutation
for constrained optimization. 2006, pp. 258–261, Chen, M. R. et al.)
4.6 Summary
This chapter introduces some modified EO versions, such as the modified EO with
extended evolutionary probability distributions (Zeng et al., 2010b), MSEO with
a dynamical evolutionary mechanism (Zeng et al., 2010c), the backbone-guided EO
algorithm (Zeng et al., 2012), and the PEO algorithm (Chen et al., 2006), which are
intended to improve the performance of the original EO algorithm and extend its
application area. The experimental results have demonstrated the effectiveness of
these modified EO algorithms. The main topics studied in this chapter are
summarized as follows:
Chapter 5
Memetic Algorithms with
Extremal Optimization
5.1 Introduction to MAs
In the past few decades, new CI theory and methods have been developed, from
fundamentals to practical applications, through the combination of computer
science and control theory. Although the CI methods do not have a solid
theoretical foundation, they have been successfully applied to many real-world
problems, particularly the optimization of complex systems with global and/or
LS. Among them, the evolutionary computation methods have achieved great
success in control systems analysis, model identification, and design. However,
due to the inherent shortcomings of Darwin’s theory, evolutionary computation
can suffer from low search efficiency and an inability to provide accurate real-time
solutions.
In recent years, a particular class of global–LS hybrids named MAs has been
proposed (Moscato, 1989), motivated by Dawkins’s theory (Dawkins, 1976).
MAs are a class of stochastic heuristics for global optimization that combine the
global-search nature of EAs with LS to improve individual solutions (Hart et al.,
2004; Krasnogor and Smith, 2005). They have been successfully applied to
hundreds of real-world problems, such as combinatorial optimization
(Merz, 2000), multiobjective optimization (MOO) (Knowles and Come, 2001),
bioinformatics (Krasnogor, 2004), etc.
As mentioned above, conventional optimization techniques using deterministic
rule-based search often fail or get trapped in local optima when solving
complex problems. On the other hand, compared with deterministic optimization
techniques, some CI methods are inefficient and imprecise in fine-tuned LS
although they are good at global search, especially when they approach a local
region near the global optimum. According to the so-called “No-Free-Lunch”
Theorem by Wolpert and Macready (1997), a search algorithm strictly performs in
accordance with the quantity and quality of the problem knowledge they incor-
porate. This fact clearly underpins the exploitation of problem knowledge intrinsic
to MAs (Moscato and Cotta, 2003). Under the framework of MAs, the stochastic
global-search heuristics approaches are combined with problem-specific solvers,
which are a combination of Neo-Darwinian’s natural evolution principles and
Dawkins’ concept of meme (Dawkins, 1976), defined as a unit of cultural evolu-
tion that is capable of performing individual learning (local refinement). In MAs,
the global character of the search is given by the evolutionary nature of the CI
approach while the LS aspect is usually performed by means of constructive meth-
ods, intelligent LS heuristics, or other search techniques (Hart et al., 2004). The
hybrid algorithms can combine the global explorative power of the CI method
with the local exploitation behaviors of conventional optimization techniques,
complementing their individual weak points, and thus outperforming either one
used alone. MAs have also been named genetic local searchers (Merz, 2000),
hybrid GAs (He and Mort, 2000), Lamarckian GAs (Ong and Keane, 2004),
Baldwinian GAs (Ku and Mak, 1998), etc.
1. The traditional EAs use a predefined fitness function; according to the theory
of coevolution, the fitness of individuals is naturally formed in the environment
when struggling for survival and will change as the environment changes.
2. The traditional EAs consider only the competition between organisms,
without considering the possibility of cooperation; in reality, however,
competition and cooperation coexist, which is called coevolution.
5.3 EO–LM Integration
5.3.1 Introduction
As mentioned above, the efficiency of an optimization search can be improved
significantly by incorporating an LS procedure into the optimization; the LS
algorithm could be a gradient-based method such as LM GS, or another method.
In this section, a hybrid EO–LM algorithm is introduced and applied to neural
network (NN) training. The structure of the hybrid EO–LM algorithm is based on
standard EO; the characteristic of GS is added by propagating the individual
solution with the LM algorithm during the EO evolution. The proposed EO–LM
solution has the ability to avoid local minima and perform a detailed LS with
both efficiency and robustness. The incorporation of the stochastic EO method
with the conventional deterministic LM algorithm combines the global
explorative power of EO with the local exploitation behaviors of LM, complementing
their individual weak points, and thus makes multilayer perceptron
(MLP) network training superior in generalization, computational efficiency, and
the avoidance of local minima.
The properties of the feed-forward MLP network are governed by the activation
functions of neurons and the synaptic connections between the layered neurons,
as shown in Figure 5.1. The associative memories from input space to output space
are built up and stored in the synaptic weights through supervised learning from
learning samples. The performance under its working environment measures the
generalization capability of an MLP network (Haykin, 1994). After introduced
by Werbos (1974) and popularized by Rumelhart et al. (1986a, b), the GS-based
back propagation (BP) algorithm has been the most popular learning technique in
MLP network training due to the simplicity and applicability of its implementa-
tion. However, in view of the drawbacks of GS in nature, such as easily trapping
into local minima, sensitivity to initial weights, poor generalization (Haykin, 1994;
Salomon, 1998), etc., there have been a variety of well-known attempts to improve
Figure 5.1 Phenotype representation of the MLP network parameters: the weighted synapses matrices and bias vectors from the input layer, through hidden layers 1 to n, to the output layer are mapped into a chromosome.
the original BP algorithm (Jacobs, 1988; Rigler et al., 1991; Fukuoka et al., 1998).
The applications of these approaches may result in better solutions, but require
higher computation cost (Dengiz et al., 2009).
On the other hand, Hush has proved that the parameter optimization for an
MLP network with sigmoid function is an NP-hard problem (Hush, 1999). The
recent research results in bioinspired CI (Engelbrecht, 2007) (e.g., EAs, EO, and
ant colony optimization [ACO]) and their superior capabilities in solving NP-hard
and complex optimization problems have motivated researchers to use CI meth-
ods for the training of the MLP network. One way to overcome the drawbacks
of BP learning is to formulate the training process as CI-based evolution of the
MLP network structure, synaptic weights, learning rule, input features (Arifovic
and Gencay, 2001; Li et al., 2007; Yao and Islam, 2008; Dengiz et al., 2009; Fasih
et al., 2009; Reyaz-Ahmed et al., 2009; Sedkia et al., 2009), etc. In fact, NN
evolution with CI methods may significantly enlarge the search space and provide
better performance than BP algorithms. However, most CI methods are rather
inefficient in fine-tuned LS although they are good at global search; especially
when the searching solutions approach a local region near the global optimum,
this results in high computation cost.
Based on the complexity of nonlinear optimization involved in NN learning,
P. Chen et al. (2010) presented the development of a novel MA-based hybrid
method called the “EO–LM” learning algorithm, which combines the recently
proposed heuristic EO (Boettcher and Percus, 1999) with the popular LM GS
algorithm (Hagan and Menhaj, 1994). In this section, we will first give the math
formulation for the problem under study, then illustrate the EO–LM fundamentals
and algorithms, and finally show the comparison results between the EO–LM and
standard LM algorithms on three experimental problems.
y_i = f(X, w, v, \theta, r) = \sum_{k=1}^{p} (v_{ki} z_k + r_i) = \sum_{k=1}^{p} v_{ki} \log\!\left(\sum_{j=1}^{m} \omega_{jk} x_j + \theta_k\right) + r_i, \quad i = 1, \ldots, n    (5.1)
The NN supervised learning problem can then be formulated as the following minimization problem:

\min E(w, v, \theta, r) = \sum_{i=1}^{n} \sum_{l=1}^{n\_Train} \left( y_i^l - \hat{y}_i^l \right)^2    (5.2)

\text{s.t. } w \in R^{m \times p}, \; v \in R^{p \times n}, \; \theta \in R^{p}, \; r \in R^{n}

where n_Train represents the number of training samples; w, v, θ, and r are bounded
by the searching space of the optimization algorithm; y_i represents the ith desired
output, and \hat{y}_i the corresponding network output.
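The following Python sketch renders Equations 5.1 and 5.2 directly; we assume that the 'log' in Equation 5.1 denotes the log-sigmoid activation commonly used in MLP networks, and all dimensions and data below are illustrative.

import numpy as np

def logsig(a):
    # Log-sigmoid activation, assumed to be the 'log' of Equation 5.1.
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, w, theta, v, r):
    # Single-hidden-layer MLP of Equation 5.1.
    # x: (m,) input; w: (m, p) input-to-hidden weights; theta: (p,) biases;
    # v: (p, n) hidden-to-output weights; r: (n,) output biases.
    z = logsig(x @ w + theta)   # hidden activations z_k
    return z @ v + r            # outputs y_i = sum_k v_ki z_k + r_i

def training_error(samples, targets, w, theta, v, r):
    # Sum-of-squared-errors objective of Equation 5.2 over n_Train samples.
    return sum(float(np.sum((mlp_forward(x, w, theta, v, r) - y) ** 2))
               for x, y in zip(samples, targets))

rng = np.random.default_rng(0)
m, p, n = 3, 5, 2               # inputs, hidden neurons, outputs
w, theta = rng.normal(size=(m, p)), rng.normal(size=p)
v, r = rng.normal(size=(p, n)), rng.normal(size=n)
X, Y = rng.normal(size=(10, m)), rng.normal(size=(10, n))
print(training_error(X, Y, w, theta, v, r))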
5.3.3 Introduction of LM GS
The LM GS algorithm was introduced to feed-forward network training to provide
better performance (Hagan and Menhaj, 1994). Generally, the LM algorithm is a
Hessian-based algorithm for nonlinear least-squares optimization (Nocedal and
Stephen, 2006). Similar to the quasi-Newton methods, the LM algorithm was
designed to approach second-order training speed without having to compute the
Hessian matrix. Under the assumption that the error function is a sum of squares, the Hessian matrix can be approximated as
H = J^T J \quad (5.3)
g = J^T e \quad (5.4)
where J is the Jacobian matrix that contains first derivatives of the network errors
with respect to weights and biases, and e is an error vector. The Jacobian matrix
can be computed through a standard BP technique that is much less complex than
computing the Hessian matrix (Hagan and Menhaj, 1994).
The LM algorithm uses this approximation to the Hessian matrix in the follow-
ing Newton-like update:
x_{k+1} = x_k - [J^T J + \mu I]^{-1} J^T e \quad (5.5)
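To make the update concrete, the following is a minimal sketch of one LM step for least-squares training; residuals(w) and jacobian(w) are illustrative helper names (not from the original implementation) returning e and J for the current weight vector:

import numpy as np

def lm_step(w, residuals, jacobian, mu):
    # One Levenberg-Marquardt update: w' = w - (J^T J + mu*I)^(-1) J^T e  (Eq. 5.5)
    J = jacobian(w)                # first derivatives of the network errors
    e = residuals(w)               # error vector
    H = J.T @ J                    # approximate Hessian (Eq. 5.3)
    g = J.T @ e                    # gradient (Eq. 5.4)
    return w - np.linalg.solve(H + mu * np.eye(len(w)), g)

In practice, mu is decreased after a step that reduces the error (moving toward Gauss–Newton behavior) and increased after a failed step (moving toward gradient descent).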
(Figure panels: LRMSE, Fitness_global/2, GRMSE, and RMSE on the test set plotted against iterations; the LM curve in (a) exhibits a local minimum and a critical point, with the best solution marked.)
Figure 5.2 Generalization ability on training and validation data set and the
phase transition in NN training. (a) Learning curve using LM and (b) learning
curve using EO–LM.
phases of LS (mutation operator) are applied to the best solution S so far based on a
probability parameter pm in each generation. In contrast to the standard EO muta-
tion, when LM mutation or multistart Gaussian mutation is adopted, we use the
“GEOvar” (De Sousa et al., 2004) strategy to evolve the current solution by improv-
ing all variables simultaneously, in an attempt to speed up the search for the local minimum. There are two evolutionary levels during the proposed EO–LM
optimization: on one hand, evolution takes place at the “chromosome level” as in any
other EA; chromosomes (genes) represent solutions and features of the problem to be
solved. On the other hand, evolution also happens at the “meme level,” that is, the
behaviors that individuals will use to alter the survival value of their chromosomes
(Krasnogor and Gustafson, 2004). Accordingly, the solutions are evaluated by fitness functions at two different levels: the fitness of the respective gene itself (global fitness) and the interaction fitness between the associated gene and meme (local fitness). Thus, both genetic and meme materials are coevolved; the evolutionary changes at the gene level are expected to influence the evolution at the meme level, and vice versa. The
proposed EO–LM is able to self-assemble different mutation operators and coevolve
the behaviors it needs to successfully solve the NN supervised learning problem. The
flowchart of the proposed EO–LM algorithm to optimize parameters (the connec-
tion weights and the biases) of MLP network is shown in Figure 5.3.
The work steps of the proposed EO–LM-based MLP-training algorithm in this
study can be described as follows:
1. Define the number of hidden layers, the numbers of input neurons, output
neurons, and the control parameters to be used in EO–LM algorithm.
2. Initialize the NN with randomly generated weights and biases based on the
predefined structure in Step (1).
(Figure 5.3: flowchart of the EO–LM algorithm — define the NN structure and initialize the weights/biases randomly, select a mutation operator based on a randomly generated probability in each EO evolution cycle, and stop once the termination condition is met.)
3. Map the weights/biases matrices of the NN from the problem oriented phe-
notype space into a chromosome, as shown in Figure 5.1.
4. For the first iteration of EO, decode the initial chromosome S back to weights/
biases matrices and calculate the object fitness function, set Sbest = S.
5. Decide which mutation operator should be imposed on the current chromosome S, based on a randomly generated probability parameter pm: if pm ≤ pm_basic, go to (a); else if pm_basic < pm ≤ pm_LM, go to (b); else if pm > pm_LM, go to (c).
a. Perform the standard EO mutation on the best-so-far solution S.
i. Change the value of each component in the current S and get a set of new solutions Sk′, k ∈ [1, 2, …, n].
ii. Sequentially evaluate the localized fitness λk specified in Equation 5.10 for every Sk′, and rank the solutions according to their fitness values.
iii. Choose the best solution S′ from the new solution set [S′], which is a neighbor subspace of the best-so-far solution S.
b. Perform the LM mutation on the current chromosome S.
i. Decode the chromosome S back to weights/biases matrices in MLP
networks.
ii. The weight vector is updated for N iterations by
S' = S - [J^T J + \mu I]^{-1} J^T e \quad (5.6)
e(x) = \sum_{i=1}^{n} \sum_{l=1}^{n\_Train} \left(\bar{y}_i^l - y_i^l\right)^2 \quad (5.7)
Fitness_{global} = \frac{\sum_{i=1}^{n} \sum_{l=1}^{n\_Train} (\bar{y}_i^l - y_i^l)^2}{n \cdot n\_Train} + \frac{\sum_{i=1}^{n} \sum_{l=1}^{n\_Valid} (\bar{y}_i^l - y_i^l)^2}{n \cdot n\_Valid} \quad (5.9)
\lambda_k = Fitness_{local}(k) = \Delta LRMSE(k) = LRMSE_S(w, v, \theta, r) - LRMSE_{S_k'}(w, v, \theta, r) \quad (5.10)
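As an illustration of Step 5 together with the localized fitness of Equation 5.10, the following sketch dispatches among the three mutation operators; mutate_component, lrmse, lm_mutation, and gaussian_mutation are hypothetical helpers standing in for the operations described above:

import numpy as np

def eo_lm_generation(S, pm_basic, pm_lm, lrmse, mutate_component,
                     lm_mutation, gaussian_mutation):
    pm = np.random.rand()
    if pm <= pm_basic:
        # (a) Standard EO mutation: perturb each component in turn and rank the
        # resulting neighbors by the localized fitness of Eq. 5.10
        neighbors = [mutate_component(S, k) for k in range(len(S))]
        lam = [lrmse(S) - lrmse(Sk) for Sk in neighbors]
        return neighbors[int(np.argmax(lam))]   # best neighbor of S
    elif pm <= pm_lm:
        return lm_mutation(S)        # (b) LM mutation: N iterations of Eq. 5.6
    else:
        return gaussian_mutation(S)  # (c) multistart Gaussian mutation (Eq. 5.30)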
First, we randomly generate 300 I/O observation pairs whose input variables are drawn randomly from region I: x1 ∈ [0, 1], x2 ∈ [0, 1], x3 ∈ [0, 1]; 200 of them are used as learning data and the other 100 as interpolation testing data to measure the generalization performance. We then randomly generate another 100 I/O data pairs within region II: x1 ∈ [1, 2], x2 ∈ [1, 2], x3 ∈ [1, 2]; these lie beyond region I and are used as an extrapolation testing data set. In this case, the structure of the network is {3, 2, 1}, namely three, two, and one nodes in the input, hidden, and output layers, respectively, so a total of 11 parameters need to be optimized in this example.
For the purpose of fair comparison, this test was repeated 10 times using the Monte Carlo method for the EO–LM and standard LM algorithms. The comparison results between these two algorithms are listed in Table 5.1. In addition to the popular criteria of RMSE and mean error (ME), the efficiency coefficient R2 (or R), which measures the proportion of the variation of the observations around the mean that is explained by the fitted regression model, is also used as an additional measure. The value of R2 (or R) falls between 0 and 1; when it reaches 1, the model's outputs perfectly agree with the system's actual outputs. Table 5.1 and Figure 5.4 show the comparison between EO–LM and LM based on the statistical data.
Table 5.1 shows the performance of each algorithm over 10 runs, giving an
indication of robustness and generalization ability of each algorithm. As men-
tioned above, the better solution has smaller RMSE over “train,” “interpolation
test,” and “extrapolation test” datasets, and lower standard deviation as well. We
can see that the EO–LM algorithm performs much better than the standard LM
algorithm. Figure 5.4 shows comparison of RMSE distribution between EO–LM
and LM on the training and test data. From Table 5.1 and Figure 5.4, we can see that the solution evolved by the EO–LM is more robust and consistent than that evolved by the standard LM.
Figure 5.5 gives the comparisons in generalization performance between the
EO–LM and LM algorithms on training, interpolation, and extrapolation test
datasets. We can see that both algorithms provide good results on the training dataset; EO–LM performs slightly better on the interpolation test dataset, but much better on the extrapolation test dataset. The extrapolation test dataset is the most crucial test phase for telling which algorithm has the better ability to learn; EO–LM performs quite well in this case, while LM breaks down.
(Figure 5.4: RMSE distributions of EO–LM and LM over the training, interpolation test, and extrapolation test data sets.)
(Figure 5.5: predicted and measured values on the training, interpolation test, and extrapolation test data, for EO–LM and the original LM.)
f(X) = 20 + e - 20 e^{-0.2 \sqrt{(1/n) \sum_{i=1}^{n} x_i^2}} - e^{(1/n) \sum_{i=1}^{n} \cos(2\pi x_i)} \quad (5.12)

\frac{dx_1}{dt} = u_1 + u_2 - k_1 \sqrt{x_1}

\frac{dx_2}{dt} = (C_{B1} - x_2) \frac{u_1}{x_1} + (C_{B2} - x_2) \frac{u_2}{x_1} - \frac{k_2 x_2}{(1 + x_2)^2} \quad (5.13)

y = [x_1 \; x_2]^T
Figure 5.6 (See color insert.) Landscape for two-dimensional Ackley function;
left: surface plot in an area from −20 to 20, right: focus around the area of the
global optimum at [0, 0] in an area from −2 to 2.
Table 5.3 RMSE of the EO–LM and LM on the MIMO Problem over 10 Runs

                        EO–LM Train         EO–LM Test          LM Train            LM Test
                        Y1        Y2        Y1        Y2        Y1        Y2        Y1        Y2
RMSE Mean               0.0024    0.0025    0.0046    0.0052    0.0027    0.0031    0.0051    0.0054
Standard deviation      8.05e−05  1.81e−04  2.41e−04  5.98e−04  3.01e−04  3.38e−04  3.53e−04  4.13e−04
Source: From Chen, P. et al., International Journal of Computational Intelligence Systems 3: 622–631, 2010. With permission.
\text{s.t.} \quad 0.1 \le u_1 \le 2, \quad 0.1 \le u_2 \le 2 \quad (5.14)
where Y(n) = [y1(n), y2(n)]^T = [x1(n), x2(n)]^T. In this case, a MIMO MLP network is employed to build the nonlinear mapping f(·) between inputs and outputs. The structure of the network is {4, 10, 2}, so a total of 72 parameters need to be optimized.
We randomly generate 500 observation pairs and use 400 of them as the learning
data and the other 100 as the test data set. The example was run 10 times for each algorithm with random initializations. The performance of the EO–LM and standard LM is listed
in Table 5.3.
Table 5.3 shows the comparison between predicted and measured values at
training and test phases by the hybrid EO–LM and LM. We can see that the EO–
LM performs better than the standard LM both on train data set and test data set.
It can be seen the EO–LM is better than the LM in both RMSE and generaliza-
tion performance. We can see that the EO–LM can easily avoid the local minima,
overfitting, or underfitting problems. The dynamic responses of the weights corre-
sponding to Figure 5.2 are shown in Figure 5.7. It can be seen that the LM suffers a phase transition from the 62nd iteration: all the weights suddenly undergo large simultaneous fluctuations when the parameter set of the network crosses its boundary. The LM then takes a long time to converge slowly to a “metastable” state with only a minor improvement in LRMSE. In contrast, the EO–LM also has a phase transition, or critical point, at the fifth iteration, but the EO–LM is able to handle it quite well due to the natural capability of EO to deal with phase transitions.
5.4 EO–SQP Integration
5.4.1 Introduction
With high demand in decision and optimization for many real-world problems
and the progress in computer science, the research on novel global optimization
Figure 5.7 Comparison of weights change during the training process using LM
and EO–LM.
solutions has been a challenge to the academic and industrial communities. During the past
few decades, various optimization techniques have been intensively studied; these
techniques follow different approaches and can be divided roughly into three main
categories, namely, the deterministic methods (Nocedal and Stephen, 2006), sto-
chastic methods (Kall and Wallace, 1994), and bioinspired CI (Engelbrecht, 2007).
In general, most global optimization problems are intractable, especially when
the optimization problem has complex landscape and the feasible region is con-
cave and covers a very small part of the whole search space. Solution accuracy and
Minimize f ( X ), X = [ x1 , x2 , …, xn ] (5.16)
subject to
g t ( X ) ≤ 0; t = 1, 2, …, p (5.17)
hu ( X ) = 0; u = 1, 2, …, q (5.18)
\underline{x}_v \le x_v \le \bar{x}_v; \quad v = 1, 2, \ldots, n \quad (5.19)
1. Update the Hessian matrix of the Lagrangian function using the popular “Broyden–Fletcher–Goldfarb–Shanno” (BFGS) formula, as shown below
H_{k+1} = H_k + \frac{q_k q_k^T}{q_k^T s_k} - \frac{H_k s_k s_k^T H_k}{s_k^T H_k s_k} \quad (5.20)
where
s_k = x_{k+1} - x_k \quad (5.21)
q_k = \left[\nabla f(x_{k+1}) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x_{k+1})\right] - \left[\nabla f(x_k) + \sum_{i=1}^{m} \lambda_i \nabla g_i(x_k)\right] \quad (5.22)
\min_{d \in \Re^n} q(d) = \frac{1}{2} d_k^T H_k d_k + \nabla f(x_k)^T d_k \quad (5.23)
subject to
[\nabla g_i(x_k)]^T d_k + g_i(x_k) = 0, \quad i = 1, \ldots, m_e
[\nabla g_i(x_k)]^T d_k + g_i(x_k) \le 0, \quad i = m_e + 1, \ldots, m \quad (5.24)
where
Hk is the Hessian matrix of the Lagrangian function L(x, λ) = f(x) + Σ_i λ_i g_i(x) at x = xk;
dk is the basis for a search direction at iteration k;
f(x) is the objective function in Equation 5.16;
g(x) denotes the constraints described in Equations 5.17 and 5.18;
me is the number of equality constraints; and
m is the total number of constraints.
3. The next iterate x_{k+1} is updated by
x_{k+1} = x_k + a_k \cdot d_k \quad (5.25)
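For experimentation, the SQP stage of such a hybrid can be approximated with an off-the-shelf solver; the sketch below uses SciPy's SLSQP routine on the formulation of Equations 5.16 through 5.19 (this solver choice is an assumption of the illustration, not the implementation used in this chapter):

import numpy as np
from scipy.optimize import minimize

def sqp_local_search(f, x0, ineq_cons, eq_cons, lower, upper):
    # SLSQP expects inequalities as c(x) >= 0, so each g(x) <= 0 is passed negated
    cons = ([{"type": "ineq", "fun": (lambda x, g=g: -g(x))} for g in ineq_cons]
            + [{"type": "eq", "fun": h} for h in eq_cons])
    res = minimize(f, x0, method="SLSQP",
                   bounds=list(zip(lower, upper)), constraints=cons)
    return res.x, res.fun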
search algorithm strictly performs in accordance with the amount and quality of the problem knowledge it incorporates. This fact clearly underpins the exploitation of problem knowledge intrinsic to MAs. Under the framework of MAs, the stochastic global-search heuristics work together with problem-specific solvers, in which Neo-Darwinian natural evolution principles are combined with Dawkins'
concept of a meme (Dawkins, 1976) defined as a unit of cultural evolution that is
capable of performing individual learning (local refinement). The global character
of the search is given by the evolutionary nature of CI approaches while the LS is
usually performed by means of constructive methods, intelligent LS heuristics or
other search techniques (Hart et al., 2004). The hybrid algorithms can combine
the global explorative power of CI methods with the local exploitation behavior of
conventional optimization techniques, complement their individual weak points,
and thus outperform either one used alone.
Moreover, owing to the natural link between hard optimization and statistical physics, the dynamic properties and computational complexity of optimization have been attractive fundamental research topics in the physics community over the past two decades. Murty and Kabadi (1987, p. 118) first used the technique
of discrete combinatorial complexity theory to study the computational diffi-
culty of continuous optimization problems and found that “Computing a global
minimum, or checking whether a given feasible solution is a global minimum,
for a smooth nonconvex NLP, may be hard problems in general.” It has been rec-
ognized that one of the real complexities in optimization comes from the phase
transition, for example, “easy–hard–easy” search path (Rogers et al., 2006).
Phase transitions are found in many combinatorial optimization problems, and
have been observed in the region of continuous parameter space containing the
hardest instances (Monasson et al., 1999; Fukumizu and Amari, 2000; Ramos
et al., 2005). It has been shown that many problems exhibit “critical boundar-
ies,” across which dramatic changes occur in the computational difficulty and solution character; the problems become easier to solve away from the boundary
(De Sousa et al., 2004). Unlike the equilibrium approaches such as SA, EO as
a general-purpose method inspired by nonequilibrium physical processes shows
no signs of diminished performance near the critical point, which is deemed to
be the origin of the hardest instances in terms of computational complexity. This
opens a new door for the development of a high-performance solution with fast
global convergence and good accuracy in terms of a hybrid EO–SQP algorithm
proposed in this section.
In this section, an MA-based hybrid EO–SQP algorithm is developed and
applied to NLP problems. The proposed algorithm is a hybridization of EO and
SQP. We intend to make use of the capacity of both algorithms: the ability of EO
to find a solution close to the global optimum and effectively dealing with phase
transition; the ability of SQP to fine-tune a solution quickly by means of LS and
repair infeasible solutions. To implement EO–SQP optimization, the following
practical issues need to be addressed.
Figure 5.8 Flowchart of the EO–SQP algorithm. (From Chen, P. and Lu, Y. Z.
Memetic algorithms based real-time optimization for nonlinear model predictive
control. International Conference on System Science and Engineering, Macau,
China, pp. 119–124. © 2011, IEEE.)
where ak and dk are the optimal step length and the search direction described in Equations 5.25 and 5.23, respectively.
c. Perform the multistart Gaussian mutation on the best-so-far chromo-
some S.
i. Generate a new chromosome S0′ by adding an n-dimensional Gaussian random vector to the best-so-far chromosome S:
S_0' = S + Scale \cdot N(0, 1) \quad (5.30)
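A minimal sketch of this operator, in which the “multistart” aspect is rendered as keeping the best of several Gaussian perturbations (the number of restarts is an assumption of the illustration):

import numpy as np

def gaussian_multistart(S, scale, objective, n_starts=10):
    # Each candidate is S0' = S + Scale * N(0, 1)  (Eq. 5.30)
    candidates = [S + scale * np.random.randn(len(S)) for _ in range(n_starts)]
    return min(candidates, key=objective)   # keep the best restart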
unimodal, and multimodal) and constraints (e.g., linear inequalities [LIs], nonlin-
ear equalities [NEs], and nonlinear inequalities [NIs]). These benchmark functions
make it possible to study the proposed EO–SQP algorithm in comparison with
other state-of-the-art methods and some well-known results published recently.
The six unconstrained benchmark functions, with their search domains and global minima:

Michalewicz   f_1(X) = -\sum_{i=1}^{n} \sin(x_i) \sin^{2m}(i x_i^2 / \pi), \; m = 10                       (0, π)^n              −9.66
Schwefel      f_2(X) = -\sum_{i=1}^{n} x_i \sin(\sqrt{|x_i|})                                              (−500, 500)^n         −12569.5
Griewank      f_3(X) = \frac{1}{4000} \sum_{i=1}^{n} x_i^2 - \prod_{i=1}^{n} \cos(x_i / \sqrt{i}) + 1      (−600, 600)^n         0
Rastrigin     f_4(X) = \sum_{i=1}^{n} [x_i^2 - 10 \cos(2\pi x_i) + 10]                                     (−5.12, 5.12)^n       0
Ackley        f_5(X) = 20 + e - 20 e^{-0.2\sqrt{(1/n)\sum_{i=1}^{n} x_i^2}} - e^{(1/n)\sum_{i=1}^{n}\cos(2\pi x_i)}   (−32.768, 32.768)^n   0
Rosenbrock    f_6(X) = \sum_{i=1}^{n-1} [100 (x_{i+1} - x_i^2)^2 + (x_i - 1)^2]                            (−30, 30)^n           0
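Two of the entries are easy to cross-check in code; a sketch using NumPy:

import numpy as np

def ackley(x):
    # f5: global minimum 0 at x = 0; domain (-32.768, 32.768)^n
    n = len(x)
    return (20.0 + np.e
            - 20.0 * np.exp(-0.2 * np.sqrt(np.sum(x ** 2) / n))
            - np.exp(np.sum(np.cos(2.0 * np.pi * x)) / n))

def rosenbrock(x):
    # f6: global minimum 0 at x = (1, ..., 1); domain (-30, 30)^n
    return np.sum(100.0 * (x[1:] - x[:-1] ** 2) ** 2 + (x[:-1] - 1.0) ** 2)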
and standard SQP on the six benchmark functions are presented in Table 5.6. The
best results among the four approaches are shown in bold. “Success” represents the
success rate (percentage of success to discover the global minimum), and “runtime”
is the average runtime when the algorithm stops according to the termination cri-
teria defined in Section 5.4.6. In our experiments, the population sizes of GA and PSO are set to 100 and 50, respectively. The four algorithms are implemented in MATLAB and the experiments are carried out on a Pentium E5200 2.5-GHz machine with 2-GB RAM under the Windows XP platform; the source codes of GA and PSO can be obtained from the MATLAB website.
As shown in Table 5.6, the EO–SQP algorithm proposed in this study is able
to find the global optima consistently with a success rate of 100% for all six uncon-
strained benchmark functions, while GA, PSO, and SQP have a very low suc-
cess rate for most benchmark problems (Michalewicz, Schwefel, Rastrigin, and
Rosenbrock functions). Moreover, EO–SQP is quite an efficient method; the com-
putational time is significantly reduced in comparison with GA and PSO. Although
the deterministic SQP is the fastest method among the four, it is easily trapped in
local minima as shown in simulation results (Michalewicz, Schwefel, Rastrigin,
Ackley, and Rosenbrock functions). The proposed EO–SQP method can successfully prevent solutions from falling into deep local minima, shorten the evolution process significantly, and converge to the global optimum or its close vicinity.
g_2(x) = -x_3 + x_4 - 0.55 \le 0
g_6(x) = x_1^2 + 2(x_2 - 2)^2 - 2 x_1 x_2 + 14 x_5 - 6 x_6 \le 0

g10:
Min f(x) = x_1 + x_2 + x_3
g_2(x) = -1 + 0.0025 (x_5 + x_7 - x_4) \le 0
g_3(x) = -1 + 0.01 (x_8 - x_5) \le 0
g_5(x) = -x_2 x_7 + 1250 x_5 + x_2 x_4 - 1250 x_4 \le 0
g_6(x) = -x_3 x_8 + 1250000 + x_3 x_5 - 2500 x_5 \le 0

s.t. \; g(x) = (x_1 - p)^2 + (x_2 - q)^2 + (x_3 - r)^2 - 0.0625 \le 0
g12 (optimal value 1)
     Best                  1    1    1          1
     Mean                  1    1    0.999988   1
     Worst                 1    1    0.999935   1
     Standard deviation    0    0    1.7e−05    0
a The best result of problem g05 by SR is even better than the optimal solution of
5126.498. This is the consequence of transforming equality constraints into
inequality constraints by a relaxed parameter ε (Runarsson and Yao, 2000).
with the exception of test function g09. With respect to test function g09, although the EO–SQP fails to provide superior results, the performance of the four methods is actually very close. Generally, constrained optimization problems with equality constraints are very difficult to solve. It should be noted that for the three test functions with equality constraints (g05, g07, and g10), the EO–SQP provides better performance than the other three methods: the optimum solutions are found by the EO–SQP for all three problems with equality constraints, while the SR, SMES, and AFM fail to find the global optima. This is due to the hybrid mechanism whereby the EO–SQP benefits from the strong capability of SQP to deal with constraints during the EO evolution.
(Figure 5.9: fitness of the best solution versus iterations.)
mutation will help to find the global optimum point in just a few runs, as shown
in Figure 5.9.
Evolutions of best solution fitness as a function of time for the EO–SQP, GA,
PSO, and SQP on the Ackley function are also shown in Figure 5.10. The conver-
gence rate of the proposed EO–SQP algorithm is a little slower than that of the GA and PSO at the early stage, due to the better solution diversity of population-based methods (GA and PSO); however, when approaching a near region of the global optimum, the EO–SQP keeps a high convergence rate and reaches the global minimum very fast due to the efficiency of the gradient-based SQP LS. On the other hand, the conventional SQP converges, with high efficiency, to a local minimum far from the global optimum and cannot escape from it due to the weakness of GS.
As a general remark on the comparisons above, the EO–SQP shows better per-
formance with respect to state-of-the-art approaches in terms of quality, robustness,
and efficiency of search. The results show that the proposed EO–SQP finds optimal
or near-optimal solutions quickly, and has more statistical soundness and faster con-
vergence rate than the compared algorithms. It should be noted that the factors con-
tributing to the performance of the proposed EO–SQP method are the global-search
capability of EO and the capability of the gradient-based SQP method to search the
local optimum efficiently with high accuracy and to deal with various constraints.
In this section, a novel MA-based hybrid EO–SQP algorithm is proposed for
global optimization of NLP problems, which are typically quite difficult to solve
(Figure 5.10: evolution of the best solution fitness as a function of time for EO–SQP, GA, PSO, and SQP on the Ackley function.)
exactly. Traditional deterministic methods are vulnerable to getting trapped in local minima, while most CI-based optimization methods with global-search capability tend to suffer from high computation cost. Therefore, under the framework of MAs, the general-purpose heuristic EO and the deterministic LS method SQP are combined in order to develop a robust and fast optimization technique with global-search capability and a mechanism to deal with constraints. The hybrid method avoids entrapment in local minima by complementing the GS method with the exploration ability of EO. These advantages have been clearly demonstrated by the comparison with some other state-of-the-art approaches over 12 widely used benchmark functions.
5.5 EO–PSO Integration
5.5.1 Introduction
The PSO algorithm is a recent addition to the list of global-search methods. This
derivative-free method is particularly suited to continuous variable problems and
has received increasing attention in the optimization community. PSO was origi-
nally developed by Kennedy and Eberhart (1995) and inspired by the paradigm of
birds flocking. PSO consists of a swarm of particles and each particle flies through
the multidimensional search space with a velocity, which is constantly updated
by the particle’s previous best performance and by the previous best performance
of the particle’s neighbors. PSO can be easily implemented and is computationally
inexpensive in terms of both memory requirements and CPU speed (Kennedy and
Eberhart, 1995). However, even though PSO is a good and fast search algorithm, it suffers from premature convergence, especially in complex multi-peak-search problems. This means that it does not “know how” to sacrifice short-term fitness to gain longer-term fitness. The likelihood of this occurring depends on the shape of the fitness landscape: certain problems may provide an easy ascent toward a global optimum; others may steer the search toward local optima. So far, many researchers have devoted themselves to dealing with this problem (Shelokar et al., 2007; Jin et al., 2008; Chen and Zhao, 2009).
To avoid premature convergence of PSO, an idea of combining PSO with EO
was addressed by Chen et al. (2010b). Such a hybrid approach expects to enjoy the
merits of PSO with those of EO. In other words, PSO contributes to the hybrid
approach in a way to ensure that the search converges faster, while EO makes the
search jump out of local optima due to its strong LS ability. Chen et al. (2010b)
developed a novel hybrid optimization method, called the hybrid PSO–EO algo-
rithm, to solve those complex unimodal/multimodal functions which may be dif-
ficult for the standard PSOs. The performance of PSO–EO was verified on six
unimodal/multimodal benchmark functions and provided comparisons with the
PSO–GA-based hybrid algorithm (PGHA) (Shi et al., 2005), standard PSO, stan-
dard GA, and PEO (Chen et al., 2006). Experimental results indicate that PSO–
EO has better performance and strong capability of escaping from local optima.
Hence, the hybrid PSO–EO algorithm may be a good alternative to deal with
complex numerical optimization problems.
where d ∈ {1, 2, …, D} and i ∈ {1, 2, …, N}; N is the population size; the superscript t denotes the iteration number; w is the inertia weight; r1 and r2 are two random values in the range [0, 1]; and c1 and c2 are the cognitive and social scaling parameters, which are positive constants.
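The updates this passage describes are the canonical PSO rules; a sketch for one particle, with pbest and gbest denoting the particle's own and its neighborhood's best positions, and with velocity clamped to (vmin, vmax) as in the settings used later in this section:

import numpy as np

def pso_update(x, v, pbest, gbest, w, c1, c2, vmin, vmax):
    r1, r2 = np.random.rand(len(x)), np.random.rand(len(x))
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    v_new = np.clip(v_new, vmin, vmax)   # keep the velocity within its bounds
    return x + v_new, v_new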
(GC mutation for short) presented by Chen and Lu (2008). This mutation method
mixes Gaussian mutation and Cauchy mutation. The mechanisms of Gaussian and
Cauchy mutation operations have been studied by Yao et al. (1999). They pointed
out that Cauchy mutation is better at coarse-grained search while Gaussian muta-
tion is better at fine-grained search. In the hybrid GC mutation, the Cauchy muta-
tion is first used. It means that the large step size will be taken first at each mutation.
(Figure: flowchart of the hybrid PSO–EO algorithm — initialize PSO; in each iteration apply the PSO operators and, whenever the iteration number is a multiple of INV, invoke the EO procedure; repeat until the termination condition is met, then output the optimal solution.)
If the newly generated variable goes beyond the range of the variable, the Cauchy mutation is applied repeatedly, up to TC times (parameter TC denotes the number of Cauchy mutation attempts), until the offspring falls into the range. If it still fails, the Gaussian mutation is carried out repeatedly, up to TG times (parameter TG denotes the number of Gaussian mutation attempts), until the offspring satisfies the requirement; that is, a smaller step size than before is taken. If the newly generated variable still goes beyond the range, then the upper or lower bound of the decision variable is chosen as the new value. Thus, the hybrid GC mutation combines the advantages of coarse-grained and fine-grained search.
Unlike some switching algorithms which have to decide when to switch between
different mutations during search, the hybrid GC mutation does not need to make
such decisions. The Gaussian mutation performs with the following representation (Chen et al., 2010b):

x_k' = x_k + N_k(0, 1) \quad (5.33)
where xk and xk′ denote the kth decision variables before mutation and after muta-
tion, respectively, Nk(0, 1) denotes the Gaussian random number with mean zero
and standard deviation one and is generated anew for kth decision variable. The
Cauchy mutation performs as follows (Chen et al., 2010b):
xk′ = xk + δ k (5.34)
where δk denotes the Cauchy random variable with the scale parameter equal to one
and is generated anew for the kth decision variable.
In the hybrid GC mutation, the values of parameters TC and TG are set by the user beforehand. The value of TC determines the coarse-grained searching time, while the value of TG affects the fine-grained searching time. Neither value should be large, because large values prolong the search process and hence increase the computational overhead. According to the literature (Chen and Lu, 2008), moderate values of TC and TG can be set to 2–4.
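A sketch of the hybrid GC mutation for a single decision variable bounded by [lo, hi], following the retry logic described above:

import numpy as np

def gc_mutate(xk, lo, hi, TC=3, TG=3):
    trial = xk
    # Coarse-grained phase: Cauchy mutation (Eq. 5.34), up to TC attempts
    for _ in range(TC):
        trial = xk + np.random.standard_cauchy()
        if lo <= trial <= hi:
            return trial
    # Fine-grained phase: Gaussian mutation (Eq. 5.33), up to TG attempts
    for _ in range(TG):
        trial = xk + np.random.randn()
        if lo <= trial <= hi:
            return trial
    # Still out of range: take the nearer bound as the new value
    return lo if trial < lo else hi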
1. The PSO procedure is O(2N + ND), where D is the number of decision vari-
ables, N is the number of the particles in the swarm.
2. The EO procedure is O(2ND).
To verify the efficiency and effectiveness of the PSO–EO, Chen et al. (2010b) compared the experimental results of PSO–EO with those of the PGHA, standard PSO, standard GA, and PEO. Note that all the algorithms were run on the same hardware (i.e., Intel Pentium M with 900-MHz CPU and 256-MB memory) and software (i.e., JAVA) platform. Each algorithm was run independently for 20 trials.
Table 5.8 shows the settings of problem dimension, maximum generation,
population size, initialization range of each algorithm, and the value of parameter
INV for each test function.
For the hybrid PSO–EO, the cognitive and social scaling parameters, that is, c1
and c 2, were both set to 2, the inertia weight w varied from 0.9 to 0.4 linearly with
the iterations, the upper and lower bounds for velocity on each dimension, that is,
vmin and vmax, were set to be the upper and lower bounds of each dimension, that
is, (vmin , vmax ) = ( xmin , xmax ). The parameters TC and TG in the hybrid GC mutation
were both set to 3. From the values of parameter INV shown in Table 5.8, we can
see that EO was introduced to PSO more frequently on functions f1, f2, and f6 than on the other three functions, due to the complexity of the problems. For the standard PSO,
all the parameters, that is, c1, c 2, w, and (vmin, vmax), were set to the same as those
used in the hybrid PSO–EO. For the standard GA, elitism mechanism, roulette
wheel selection mechanism, single-point uniform crossover with the rate of 0.3,
nonuniform mutation with the rate of 0.05 and the system parameter of 0.1 were
used. For the PGHA, all the parameters were set to the same as those used in the
standard PSO and standard GA. For the PEO, the hybrid GC mutation with both
parameters TC and TG equal to 3 were adopted.
In order to compare the different algorithms, a fair time measure must be
selected. The number of iterations cannot be used as a time measure, as these algo-
rithms do different amounts of work in their inner loops. It is also noticed that
each component of a solution in the EO procedure has to be evaluated at each
Table 5.8 Parameter Settings for Six Test Functions

Function   Dimension   Maximum Generation   Population Size   Initialization Range    INV
f1         10          20,000               10                (0, π)^n                20
f2         30          20,000               30                (−500, 500)^n           1
f3         30          20,000               30                (−600, 600)^n           100
f4         30          20,000               10                (−5.12, 5.12)^n         100
f5         30          10,000               30                (−32.768, 32.768)^n     100
f6         30          10,000               30                (−30, 30)^n             1
Source: Reprinted from International Journal of Computational Intelligence
Systems, 3, Chen, P. et al., Extremal optimization combined with LM gra-
dient search for MLP network learning, 622–631, Copyright 2010, with
permission from Elsevier.
iteration, and thus calculating the number of function evaluations is troublesome. Therefore, the number of function evaluations is not very suitable as a time measure. Chen et al. (2010b) instead adopted as a time measure the average runtime of 20 runs, taken when the near-optimal solution is found or otherwise when the maximum generation is reached.
After 20 trials of running each algorithm for each test function, the simulation
results were obtained and shown in Tables 5.9 through 5.14. The bold numer-
als in Tables 5.9 through 5.14 mean that they are the best optima among all the
optima. Denote F as the result found by the algorithms and F* as the optimum
value of the functions. The simulation is considered successful, or in other words,
the near-optimal solution is found, if F satisfies that |(F* – F)/F*| < 1E − 3 (for
the case F* ≠ 0) or |F* − F| < 1E − 3 (for the case F* = 0). In these tables, “success”
represents the success rate, and “runtime” is the average runtime of 20 runs when
the near-optimal solution is found or otherwise when the maximum generation is
reached. The worst, mean, best, and standard deviation of solutions found by the
five algorithms are also listed in these tables. Note that the standard deviation of
solutions indicates the stability of the algorithms, and the success rate represents the robustness of the algorithms.
The Michalewicz function (Molga and Smutnicki, 2005) is a highly multi-
modal test function (with n! local optima). The parameter m defines the “steep-
ness” of the valleys or edges. Larger m leads to a more difficult search. For a very
large m, the function behaves like a needle in the haystack (the function values for
points in the space outside the narrow peaks give very little information on the
location of the global optimum) (Molga and Smutnicki, 2005). As can be seen
from Table 5.9, PSO–EO significantly outperformed PSO, GA, and PEO in terms
of solution quality, convergence speed, and success rate. PSO–EO had the same
success rate as PGHA, but converged faster and found more accurate solutions than
PGHA. It is interesting to notice that PSO–EO converged to the global optimum
more than 10 times faster than PSO, GA, and PEO on this function.
With regard to the Schwefel function (Molga and Smutnicki, 2005), its sur-
face is composed of a great number of peaks and valleys. The function has a second
best minimum far from the global minimum where many search algorithms are
trapped. Moreover, the global minimum is near the bounds of the domain. The
search algorithms are potentially prone to convergence in the wrong direction in
optimization of this function (Molga and Smutnicki, 2005). Consequently, this
function is very hard to solve for many state-of-the-art optimization algorithms.
Nevertheless, PSO–EO showed its great search ability and the prominent capa-
bility of escaping from local optima when dealing with this function. From Table
5.10, it is clear that PSO–EO was the winner which was capable of converging to
the global optimum with a 100% success rate, whereas the other four algorithms
were not able to find the global optimum at all. It is important to point out that,
in order to help the PSO to jump out of local optima, the EO procedure was
introduced to PSO in each iteration. Thus, the exploitation search played a criti-
cal role in optimization of the Schwefel function (Chen et al., 2010b).
The Griewank function is a highly multimodal function with many local minima
distributed regularly (Srinivasan and Seow, 2006). The difficult part about finding
the optimal solution to this function is that an optimization algorithm can easily be
trapped in a local optimum on its way toward the global optimum. From Table 5.11,
we can see that the PSO–EO, PGHA, PSO, and PEO could find the global optimum
with a 100% success rate. It can be observed that the PSO–EO converged to the
global optimum almost as quickly as the PGHA and PSO, and faster than the PEO.
Tables 5.12 and 5.13 show the simulation results of each algorithm on the
Rastrigin and Ackley functions respectively. Both these functions are highly multi-
modal and their patterns are similar to that observed with the Griewank function.
As can be seen from Tables 5.12 and 5.13, only the PSO–EO, PGHA, and PSO
were capable of finding the global optimum on the two functions with a 100%
success rate, and the PSO–EO performed nearly as well as the PSO with respect to
convergence speed. The PGHA converged a little faster than the PSO–EO on the
Rastrigin function, but slower than the PSO–EO on the Ackley function.
Note that the EO procedure was introduced to the PSO every 100 iterations
in optimization of the Griewank, Rastrigin, and Ackley functions. In other words,
PSO played a key role in optimization of these functions, while EO brought the
diversity to the solutions generated by the PSO at intervals. In this way, the hybrid
PSO–EO algorithm is able to preserve the fast convergence ability of PSO. This
explains why PSO–EO converges at almost the same speed as PSO on the Griewank, Rastrigin, and Ackley functions.
The Rosenbrock function is a unimodal optimization problem. Its global opti-
mum is inside a long, narrow, parabolic shaped flat valley, popularly known as
Rosenbrock’s valley (Molga and Smutnicki, 2005). It is trivial to find the valley, but
it is a difficult task to achieve convergence to the global optimum. The Rosenbrock
function proves to be hard to solve for all the algorithms, as can be observed from
Table 5.14. On this function, only the PSO–EO could find the global optimum
with a 100% success rate. Due to the difficulty in finding the global optimum of
this function, the EO procedure was introduced to PSO at each iteration.
In a general analysis of the simulation results shown in Tables 5.9 through 5.14,
the following conclusions can be drawn:
1. On the convergence speed, the PSO–EO was the fastest algorithm in com-
parison with the PGHA, standard PSO, standard GA, and PEO for the
Michalewicz, Schwefel, and Rosenbrock functions. The PSO–EO converged
to the global optimum almost as quickly as the PGHA and PSO for the other
three functions, that is, Griewank, Rastrigin, and Ackley. Thus, the hybrid
From the above summary, it can be concluded that the PSO–EO algorithm
possesses superior performance in accuracy, convergence speed, stability, and
robustness, as compared to the PGHA, standard PSO, standard GA, and PEO. As
a result, the PSO–EO algorithm is a perfectly good performer in optimization of
those complex high-dimensional functions.
5.6 EO–ABC Integration
The ABC algorithm is a novel swarm intelligent algorithm inspired by the forag-
ing behavior of the honeybee. It was first introduced by Karaboga (2005). After
that, ABC was applied to solving numerical optimization problems
by Karaboga and Basturk (2008), and satisfactory results were achieved. Since the
ABC algorithm has many advantages, such as being simple in concept, easy to
implement, and having fewer control parameters, it has attracted the attention of
many researchers and has been used in solving many real-world optimization prob-
lems (Potschka, 2010; Yan et al., 2011, 2013; Rangel, 2012).
However, the standard ABC algorithm also has its limitations, such as prema-
ture convergence, slow convergence speed at the later stage of evolution, and low
convergence accuracy. In order to overcome the limitations of ABC, inspired by
the PSO–EO algorithm proposed by Chen et al. (2010b), an idea of combining
ABC with EO was addressed in Chen et al. (2014a). Chen et al. (2014a) developed
a hybrid optimization method, called the ABC–EO algorithm, which makes full
use of the global-search ability of ABC and the LS ability of EO. The performance
of the ABC–EO algorithm was verified on six unimodal/multimodal benchmark functions, and furthermore the ABC–EO algorithm was compared with five other state-of-the-art optimization algorithms, that is, the standard ABC, PSO–EO (Chen et al., 2010b), standard PSO, PEO (Chen et al., 2006), and standard GA. The experimental results indicate that the ABC–EO algorithm may be a good alternative for complex numerical optimization problems.
by its employed bee and the employed bee associated with that food source becomes
a scout. Here “trial” is used to record the nonimprovement number of the solution
Xi, used for the abandonment. Finally, scout bees search the whole environment
randomly. Note that each food source is exploited by only one employed bee. That
is, the number of the employed bees or the onlooker bees is equal to the number of
food sources.
The pseudocode of the standard ABC algorithm is described in Figure 5.14.
In order to produce a candidate food position X i′ from the old one Xi in mem-
ory, the ABC uses the following expression (Karaboga and Akay, 2011):
X_{i,j}' = X_{i,j} + \varphi_{i,j} (X_{i,j} - X_{k,j}) \quad (5.35)
where k ∈ {1, 2, …, SN/2} and j ∈ {1, 2, …, D} are randomly chosen indexes; k has to be different from i; D is the number of variables (problem dimension); and φi,j is a random number in [−1, 1].
The pseudocode of the employed bees phase of the ABC algorithm is as shown
in Figure 5.15.
An artificial onlooker bee chooses a food source depending on the probability
value (denoted as P), which is associated with that food source. P is calculated by
the following expression (Karaboga and Akay, 2009):
P = \frac{fit_i}{\sum_{n=1}^{SN/2} fit_n} \quad (5.36)
where fiti is the fitness value of the solution Xi, which is proportional to the nectar
amount of the food source in the position Xi, and SN/2 is the number of food
sources, which is equal to the number of employed bees or onlooker bees.
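Equation 5.36 amounts to fitness-proportional (roulette-wheel) selection of a food source; a sketch:

import numpy as np

def select_food_source(fit):
    # P_i = fit_i / sum_n fit_n  (Eq. 5.36); sample one food-source index accordingly
    p = np.asarray(fit, dtype=float)
    return np.random.choice(len(p), p=p / p.sum())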
1. For i = 1 to SN/2 do
2. For j = 1 to D do
3. Produce a new food source X i′ for the employed
bee of the food source Xi using Equation 5.35
4. End for
5. Evaluate the fitness of X i′
6. Apply the selection process between X i′ and Xi
based on greedy selection
7. If the solution X i′ does not improve, let
trial = trial + 1, otherwise trial = 0
8. End for
Figure 5.15 Pseudocode of employed bees phase. (Reprinted from Applied Soft
Computing, 11, Karaboga, D. and Akay, B., A modified artificial bee colony (ABC)
algorithm for constrained optimization, 3021–3031, Copyright 2011, with permis-
sion from Elsevier.)
The pseudocode of the onlooker bees phase of the ABC algorithm is as shown
in Figure 5.16.
The positions of the new food sources found by the scout bees will be produced
by the following expression (Karaboga and Akay, 2009):
X_{i,j}' = X_{min,j} + \mathrm{rand}(0,1) (X_{max,j} - X_{min,j}) \quad (5.37)
where i is the index of the employed bee whose “trial” value reaches the “limit” value first; j = 1, 2, …, D; Xmin and Xmax are the lower and upper bounds of each solution, respectively; and rand(0,1) is a random number in [0, 1].
1. t = 0, i = 1
2. Repeat (if the termination conditions are not met)
3. If random < P then (note that P is calculated by Eq. (5.36))
4. t=t+1
5. For j = 1 to D do
6. Produce a new food source X i′ for the onlooker bee of the food source Xi by using Equation 5.35
7. End for
8. Apply the selection process between X i′ and Xi based on
greedy selection;
9. If the solution Xi does not improve, then let
trial = trial + 1, otherwise let trial = 0
10. End if
11. i=i+1
12. i = i mod (SN/2 + 1)
13. Until t = SN/2.
Figure 5.16 Pseudocode of onlooker bees phase. (Reprinted from Applied Soft
Computing, 11, Karaboga, D. and Akay, B., A modified artificial bee colony (ABC)
algorithm for constrained optimization, 3021–3031, Copyright 2011, with permis-
sion from Elsevier.)
Figure 5.17 Pseudocode of scout bees phase. (Reprinted from Applied Soft
Computing, 11, Karaboga, D. and Akay, B., A modified artificial bee colony (ABC)
algorithm for constrained optimization, 3021–3031, Copyright 2011, with permis-
sion from Elsevier.)
The pseudocode of the scout bees phase of the ABC algorithm is shown in Figure 5.17.
The fitness of the ABC algorithm is proportional to the nectar amount of that
food source. The fitness is determined by the following expressions (Karaboga and
Akay, 2011):
fitness_i = 1 / (1 + f_i) \quad \text{if } f_i \ge 0 \quad (5.38)

fitness_i = 1 + \mathrm{abs}(f_i) \quad \text{if } f_i < 0 \quad (5.39)

where f_i is the cost value of the solution Xi and abs(f_i) is the absolute value of f_i.
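In code, Equations 5.38 and 5.39 together give the usual ABC fitness mapping (a sketch):

def abc_fitness(f_i):
    # Eq. 5.38 for non-negative costs; Eq. 5.39 otherwise
    return 1.0 / (1.0 + f_i) if f_i >= 0 else 1.0 + abs(f_i)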
of standard ABC and EO, and the other is the combination of the improved ABC
(Gao et al., 2012) (IABC for short) and EO, in which its search way of employed
bees is changed as follows (Gao et al., 2012):
Chen et al. (2014a) called them ABC–EO and IABC–EO, respectively. The
pseudocode of ABC–EO and IABC–EO for a minimization problem with D
dimensions is described in Figure 5.18.
In the main procedure of the ABC–EO algorithm, the fitness of each individual
is evaluated by Equations 5.38 and 5.39. However, in the EO procedure, in order
to find out the worst component, each component of a solution should be assigned
a fitness value. Chen et al. (2014a) defined the fitness of each component of a solu-
tion for an unconstrained minimization problem as follows. For the ith position of
food sources, the fitness λi,k of the kth component is defined as the mutation cost,
that is, OBJ ( X i′,k ) − OBJ ( X best ), where X i′,k is the new position of the ith position
obtained by performing mutation only on the kth component and leaving all other
components fixed, OBJ ( X i′,k ) is the objective value of X i′,k , and OBJ(Xbest) is the
objective value of the best position in the bee colony found so far. The EO proce-
dure is described in Figure 5.19.
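A sketch of this component-wise fitness assignment; mutate_component is a hypothetical helper that perturbs only the kth component of X_i while leaving all the others fixed:

def component_fitness(X_i, obj, obj_best, mutate_component):
    # lambda_{i,k} = OBJ(X'_{i,k}) - OBJ(X_best): the mutation cost of component k
    lam = []
    for k in range(len(X_i)):
        X_mut = mutate_component(X_i, k)
        lam.append(obj(X_mut) - obj_best)
    return lam   # the components are then ranked by lam to locate the worst one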
After 50 trials of running each algorithm for each test function, the simulation
results were obtained and shown in Tables 5.16 through 5.21. The bold numerals in
Tables 5.16 through 5.21 mean that they are the best optima among all the optima.
Denote F as the result found by the algorithms and F * as the optimum value of the
functions (F1* = −9.66, F2* = −12569.5, F3* = F4* = F5* = F6* = 0). The simulation is considered successful, as before, if |(F* − F)/F*| < 1E − 3 (for the case F* ≠ 0) or |F* − F| < 1E − 3 (for the case F* = 0).
It can be observed that the ABC–EO has the fastest convergence speed, and the
IABC–EO converged to the global optimum almost as quickly as the ABC, and
faster than the PSO–EO, PSO, PEO, and GA. The IABC–EO also had a good
performance in terms of stability.
With regard to the Schwefel function, it has a second best minimum far
from the global minimum where many search algorithms may get trapped. From
Table 5.17, it is clear that the IABC–EO was the winner which was capable of
converging to the global optimum with the 100% success rate, the fastest conver-
gence speed, and the lowest standard deviation. The ABC–EO and ABC were bet-
ter than the PSO–EO, PSO, PEO, and GA in terms of success rate, convergence
speed, and solution accuracy. It is interesting to notice that the IABC–EO and
ABC–EO converged to the global optimum more than 100 times faster than the
PSO–EO and PEO.
Tables 5.18 and 5.20 show the simulation results of each algorithm on the
Griewank and Ackley functions respectively. Both the functions are highly mul-
timodal. As can be seen from Table 5.18, all algorithms could find the optimal
solution with a 100% success rate, except for GA. But the IABC–EO, ABC–EO,
and ABC were a little worse than the PSO–EO and PSO with respect to solution
accuracy. From Table 5.20, we can see that the IABC–EO and ABC–EO could
find the optimum with a 100% success rate in a short time. At the same time, Table
5.20 indicates that the ABC–EO and IABC–EO were not better than the PSO and
PSO–EO, but better than the PEO and GA with respect to convergence speed and
solution accuracy.
From Table 5.19, which shows the simulation results of each algorithm on the
Rastrigin function, we can see that the PSO and PSO–EO were the best perform-
ers in all aspects, and the IABC–EO and ABC–EO could find the optimal solution
in a short time with a higher success rate and solution accuracy, almost as good as
the ABC.
The last test function, Rosenbrock, is a unimodal function, but for a lot of
optimization algorithms, it is very difficult to converge to the global optimal point.
As can be seen from Table 5.21, the IABC–EO was capable of finding the optimal
solution quickly with a 96% success rate. Moreover, the IABC–EO converged to
the global optimum more than 60 times faster than the PSO–EO, although the
PSO–EO could find the optimum with a 100% success rate. The IABC–EO sig-
nificantly outperformed other algorithms, except for the PSO–EO, in terms of
solution quality, convergence speed, and success rate.
From the simulation results, it can be concluded that the ABC–EO and IABC–
EO possess good or superior performance in solution accuracy, convergence speed,
and success rate, as compared to the standard ABC, PSO–EO, standard PSO,
PEO, and standard GA. As a result, the ABC–EO and IABC–EO can be consid-
ered as perfectly good performers in optimization of complex high-dimensional
functions.
5.7 EO–GA Integration
GAs belong to the larger class of EAs, which mimic the process of natural selec-
tion, such as crossover, mutation, and selection. In a GA, a population of candidate
solutions to an optimization problem is evolved toward better solutions through
iterations. GAs have several merits, such as easy implementation, fewer adjustable
parameters, and parallel computation. GAs have been widely applied in many
fields, such as bioinformatics, phylogenetics, computational science, engineering,
economics, chemistry, manufacturing, mathematics, physics, and pharmacomet-
rics. However, GAs also suffer from premature convergence: when dealing with many optimization problems, GAs may tend to converge to local optima rather than to the global optimum of the problem. This problem may be alleviated by increasing the mutation rate, adopting a different fitness function, or using selection techniques that keep a diverse population of solutions (Taherdangkoo et al., 2012). A possible technique to maintain diversity is the mutation operation, which simply replaces part of the population with randomly generated individuals when most of the population is too similar to each other. Diversity is of importance in GAs because crossing over a homogeneous population does not
importance in GAs because crossing over a homogeneous population does not
generate new solutions. Hence, increasing the diversity of the population of solu-
tions may prevent GAs from getting trapped in local optima. Note that in EO,
there is only mutation operation, which can maintain the diversity of solutions.
So EO may be introduced to GAs to increase the diversity of solutions, and thus
the hybrid algorithm, which is the combination of GA and EO, may not easily go
into local optima.
The above three frameworks have their own merits. GA–EO-I may be the easiest for users to manipulate; however, EO will be introduced into GA even when GA is not trapped in local optima, which increases the computational cost. As far as GA–EO-II is concerned, EO will be adopted by GA only when GA has gone into local optima. Therefore, the computational cost of GA–EO-II will be lower than that
(Figure: flowchart of GA–EO-I — initialize GA; in each iteration apply the GA operators (selection, crossover, and mutation) and invoke the EO procedure whenever the iteration number is a multiple of INV; repeat until the termination condition is met.)
(Figure: flowchart of GA–EO-II — initialize GA with a counter count = 0; apply the GA operators in each iteration; if no better solution is found, increment count, otherwise reset it to zero; invoke the EO procedure once count reaches num; repeat until termination.)
(Figure: flowchart of GA–EO-III — initialize GA; in each iteration perform selection and crossover, with the EO procedure taking the role of mutation; repeat until termination.)
of GA–EO-I. But the setting of parameter num will depend on the experience of users. GA–EO-III proposes a new idea, namely that the EO procedure can play the role of mutation. This operation will bring more diversity to the population of solutions, but at the same time the computational complexity of GA–EO-III will rise. As a result, how to balance the effectiveness and efficiency of an algorithm will be our vital objective in future studies.
5.8 Summary
Regarding the disadvantages of current evolutionary computation in theoretical
foundation and LS effectiveness, the interrelations between coevolution, MAs, and
extremal dynamics are studied. This chapter first introduces two hybrid EO algo-
rithms under the umbrella of MAs, called “EO–LM” and “EO–SQP,” which com-
bine the global-search capability of EO and the LS efficiency of the deterministic
methods. The proposed methods are employed for the numerical optimization and
NN learning. The search dynamics of the proposed methods are studied in detail.
The effectiveness and the efficiency of the proposed methods are proven by the
comparison with traditional methods through a number of benchmark systems.
Second, this chapter introduces the hybrid PSO–EO, which makes full use of the
exploration ability of PSO and the exploitation ability of EO, and thus can over-
come the limitation of PSO and has the capability of escaping from local optima.
Furthermore, another hybrid algorithm called ABC–EO is introduced in this
chapter, which combines the merits of ABC and EO. Experimental results show
that PSO–EO and ABC–EO possess good performance in terms of solution accu-
racy, convergence speed, and success rate. As a consequence, PSO–EO and ABC–
EO can be considered as perfectly good performers in the optimization of complex
high-dimensional functions. Finally, to avoid premature convergence of GA, an
idea of combining GA with EO is proposed in this chapter and three frameworks
through which EO can be integrated with GA are also addressed.
Chapter 6
Multiobjective
Optimization with
Extremal Dynamics
6.1 Introduction
Multiobjective optimization (MOO), also called multicriteria optimization or Pareto optimization, involves a problem solution with more than one objective
function to be optimized simultaneously. MOO has been in high demand and
widely applied in a variety of real-world areas, including sciences, engineering, eco-
nomics, logistics, etc. The solutions and/or decisions taken for MOO problems
need to consider the trade-off between two or more conflicting criteria. In general,
MOO may provide a set of solutions; the decision maker will then select one that
meets the desired requirement.
The operations research (OR) community has developed several mathematical
programming techniques to solve multiobjective optimization problems (MOPs)
since the 1950s (Miettinen, 1999). However, mathematical programming tech-
niques have some limitations when dealing with MOPs (Coello, 2006). For exam-
ple, many of them may not work when the Pareto front is concave or disconnected.
Some require differentiability of the objective functions and the constraints. In
addition, most of them only generate a single solution from each run. During
the past two decades, a considerable amount of MOEAs have been presented to
solve MOPs (Sarker et al., 2002; Beausoleil, 2006; Coello, 2006; Elaoud et al.,
2007; Hanne, 2007). EAs seem particularly suitable to solve MOPs, because they
can deal with a set of possible solutions (or so-called population) simultaneously.
This allows us to find several members of the Pareto-optimal set in a single run of
F(X) = [f_1(X), f_2(X), \ldots, f_k(X)] \quad (6.1)
subject to
g_i(X) \le 0, \quad i = 1, 2, \ldots, m
h_j(X) = 0, \quad j = 1, 2, \ldots, p
The Pareto-optimal front PF is defined as the set of all objective function values corresponding to the solutions in PS: PF = {f(x) = (f1(x), …, fM(x)) | x ∈ PS}.
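For a minimization MOP, the dominance relation underlying PS and PF can be written compactly (a sketch):

import numpy as np

def dominates(fa, fb):
    # fa dominates fb: no worse in every objective, strictly better in at least one
    fa, fb = np.asarray(fa), np.asarray(fb)
    return bool(np.all(fa <= fb) and np.any(fa < fb))

def pareto_front(F):
    # Keep the objective vectors that no other member of F dominates
    return [f for f in F if not any(dominates(g, f) for g in F if g is not f)]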
1. Aggregating functions
2. Population-based non-Pareto approaches
3. Pareto-based approaches
\text{minimize} \; \sum_{i=1}^{k} w_i f_i(x) \quad (6.2)

\sum_{i=1}^{k} w_i = 1 \quad (6.3)
This approach does not require any changes to the basic mechanism of an EA
and it is therefore very simple, easy to implement, and efficient. This approach can
work well in simple MOPs with few objective functions and convex search spaces.
One obvious problem of this approach is that it may be difficult to generate a set of
weights that properly scales the objectives when little is known about the problem.
However, its most serious drawback is that it cannot generate proper members of
the Pareto-optimal set when the Pareto front is concave regardless of the weights
used (Coello, 2005).
It is well known that there exist two fundamental goals in MOO design: one is to minimize the distance of the generated solutions to the Pareto-optimal set; the other is to maximize the diversity of the achieved Pareto set approximation. MOEO mainly consists of three components: fitness assignment, diversity preservation, and an external archive. A good fitness assignment is beneficial for guiding the search toward the Pareto-optimal set. In order to increase the diversity of the nondominated solutions, a diversity-preserving mechanism is introduced into MOEO. To prevent nondominated solutions from being lost, MOEO also adopts an external archive to store the nondominated solutions found during the evolutionary process.
1. Randomly generate an initial solution S = (x1, x2, …, xn). Set the external archive empty. Set iteration = 0.
2. Generate n offspring of the current solution S by performing mutation on each decision variable one by one.
3. Perform dominance ranking on the n offspring and obtain their rank numbers, i.e., rj ∈ [0, n − 1], j ∈ {1, …, n}.
4. Assign the fitness λj = rj to each variable xj, j ∈ {1, …, n}.
5. If there is only one variable with a fitness value of zero, that variable is considered the worst component; otherwise, the diversity preservation mechanism is invoked. Assume that the worst component is xw with fitness λw = 0, w ∈ {1, …, n}.
6. Perform mutation only on xw while keeping the other variables unchanged, obtaining a new solution Sw.
7. Accept S = Sw unconditionally.
8. Apply UpdateArchive(S, archive) to update the external archive (see Figure 6.4).
9. If the iterations reach the predefined maximum number of generations, go to Step 10; otherwise, set iteration = iteration + 1 and go to Step 2.
10. Output the external archive as the Pareto-optimal set.
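The steps above can be summarized in the following schematic Python sketch; all operator names (mutate, rank, pick_worst, update_archive) are placeholders for the mechanisms described in this section, not code from the original study:

import numpy as np

def moeo(evaluate, n, bounds, max_gen, mutate, rank, pick_worst, update_archive):
    # Steps 1-10 of the MOEO algorithm for numerical MOPs.
    lo, hi = bounds
    s = np.random.uniform(lo, hi, n)                    # Step 1: random initial solution
    archive = []                                        # Step 1: empty external archive
    for _ in range(max_gen):                            # Step 9: iteration loop
        offspring = [mutate(s, j) for j in range(n)]    # Step 2: one offspring per variable
        ranks = rank([evaluate(o) for o in offspring])  # Step 3: dominance ranking
        w = pick_worst(ranks, offspring)                # Steps 4-5: worst component (ties -> diversity)
        s = offspring[w]                                # Steps 6-7: accept S = S_w unconditionally
        update_archive(s, archive)                      # Step 8: UpdateArchive(S, archive)
    return archive                                      # Step 10: Pareto-optimal set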
Figure 6.2 Flowchart of MOEO algorithm for numerical MOPs. (Adapted from
Chen, M. R. and Lu, Y. Z., European Journal of Operational Research 3 (188):
637–651, 2008.)
The worst possible ranking is thus the number of solutions minus one. It is important to note that there exists only one individual in the search process of MOEO, and each decision variable in the individual is considered a species. In order to find the worst species via fitness assignment, MOEO generates a population of new individuals (so-called offspring) by performing mutation on the decision variables of the current individual one by one. The dominance ranking is then carried out on these offspring.
Figure 6.3 (a) Dominance ranking in MOEO and (b) diversity preservation in
MOEO. (Adapted from Chen, M. R. and Lu, Y. Z., European Journal of Operational
Research 3 (188): 637–651, 2008.)
1. Archiving logic: The function of the archiving logic is to decide whether the
current solution should be added to the archive or not. The archiving logic
works as follows. The newly generated solution S is compared with the archive
to check if it dominates any member of the archive. If yes, the dominated
solutions are eliminated from the archive and S is added to the archive. If S
does not dominate any member of the archive, the archive does not need to
be updated and the iteration continues. However, if S and any member of the
archive do not dominate each other, there exist three cases:
a. If the archive is not full, S is added to the archive.
b. If the archive is full and S resides in the most crowded region in the
parameter space among the members of the archive, the archive does not
need to be updated.
c. If the archive is full and S does not reside in the most crowded region
of the archive, the member in the most crowded region of the archive is
replaced by S.
Based on the above archiving logic, the pseudo-code of function
UpdateArchive (S, archive) is shown in Figure 6.4.
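A minimal Python rendering of this archiving logic is given below; dominance is written for minimization, and most_crowded / in_most_crowded_region are assumed helpers standing in for MOEO's crowding estimate:

def dominates(a, b):
    # Pareto dominance for minimization: a is no worse everywhere, better somewhere.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(S, archive, capacity, most_crowded, in_most_crowded_region):
    # Sketch of UpdateArchive(S, archive) following the cases described above.
    dominated = [a for a in archive if dominates(S, a)]
    if dominated:
        for a in dominated:                       # S dominates: purge and insert
            archive.remove(a)
        archive.append(S)
    elif any(dominates(a, S) for a in archive):
        return                                    # S is dominated: archive unchanged
    else:                                         # mutually nondominated
        if len(archive) < capacity:
            archive.append(S)                     # case (a): room left
        elif not in_most_crowded_region(S, archive):
            archive[most_crowded(archive)] = S    # case (c): replace most crowded member
        # case (b): S lies in the most crowded region -> no update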
Thus, it would be ideal if Cauchy mutation is used when search points are far away
from the global optimum and Gaussian mutation is adopted when search points are
in the neighborhood of the global optimum. Unfortunately, the global optimum
is usually unknown in practice, making the ideal switch from Cauchy to Gaussian
mutation very difficult.
A method based on mixing (rather than switching) different mutation operators was proposed by Yao et al. (1999). The idea is to mix the different search biases of Cauchy and Gaussian mutations: the method generates two offspring from the parents, one by Cauchy mutation and the other by Gaussian mutation, and the better one is chosen as the offspring. Inspired by this idea, Chen and Lu (2008) presented a new mutation method, the "hybrid GC mutation," which is based on mixing Gaussian and Cauchy mutations. Note that, unlike the method of Yao et al. (1999), the hybrid GC mutation does not compare the outcomes of Gaussian and Cauchy mutations, owing to the characteristics of EO. In the hybrid GC mutation, Cauchy mutation is adopted first, so a large step size is tried at each mutation. If the newly generated variable falls outside the intervals of the decision variables, Cauchy mutation is applied repeatedly, up to a user-defined number of times (TC), until the offspring satisfies the bounds. If it still fails, Gaussian mutation is then applied repeatedly, up to a user-defined number of times (TG); that is, the step size becomes smaller than before. If the newly generated variable still falls outside the intervals of the decision variables, the upper or lower limit of the decision variable is chosen as the new value. Thus, the hybrid GC mutation combines the advantages of coarse-grained and fine-grained search. The above analysis shows that the hybrid GC mutation is simple yet effective: unlike switching algorithms, which must decide when to switch between different mutations during the search, the hybrid GC mutation needs no such decisions.
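Under the stated behavior of TC, TG, and the step size, the hybrid GC mutation of a single decision variable can be sketched as follows; the parameter defaults and step-size scale are our assumptions:

import numpy as np

def hybrid_gc_mutation(x, lo, hi, tc=5, tg=5, scale=1.0):
    # Hybrid GC mutation sketch: coarse Cauchy steps first, smaller Gaussian
    # steps next, and finally clamping to the violated bound.
    for _ in range(tc):                          # coarse-grained Cauchy phase
        y = x + scale * np.random.standard_cauchy()
        if lo <= y <= hi:
            return y
    for _ in range(tg):                          # fine-grained Gaussian phase
        y = x + scale * np.random.normal()
        if lo <= y <= hi:
            return y
    return float(np.clip(y, lo, hi))             # clamp to the nearest bound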
g(X) = 1 + \frac{9}{n - 1} \sum_{i=2}^{n} x_i, \quad i = 2, \ldots, n
Source: Reprinted from European Journal of Operational Research, 3 (188), Chen, M. R. and Lu, Y. Z., A novel elitist multi-objective optimization algorithm: Multi-objective extremal optimization, 637–651, Copyright 2008, with permission from Elsevier.
Note: n is the number of decision variables.
metric does not have a value of zero. The metric will yield zero only when each
obtained solution lies exactly on each of the chosen solutions. The second metric Δ
measures the extent of spread of the obtained nondominated solutions. It is desir-
able to get a set of solutions that spans the entire Pareto-optimal region. The second
metric Δ can be calculated as follows:
\Delta = \frac{d_f + d_l + \sum_{i=1}^{N-1} \left| d_i - \bar{d} \right|}{d_f + d_l + (N - 1)\bar{d}}   (6.4)
where d_f and d_l are the Euclidean distances between the extreme solutions and the boundary solutions of the obtained nondominated set, d_i is the Euclidean distance between consecutive solutions in the obtained nondominated set, and \bar{d} is the average of all distances d_i (i = 1, 2, …, N − 1), assuming that there are N solutions on the best nondominated front. Note that a good distribution would make all distances d_i equal to \bar{d} and would make d_f = d_l = 0 (with the extreme solutions present in the nondominated set). Consequently, for the most widely and uniformly spread-out set of nondominated solutions, Δ would be zero.
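For a biobjective front sorted along one objective, Equation 6.4 reduces to the following sketch; the argument layout is ours:

import numpy as np

def spread_delta(front, extremes):
    # Diversity metric of Equation 6.4. `front` is an (N, 2) array of obtained
    # nondominated points sorted along the front; `extremes` holds the two
    # true extreme Pareto-optimal solutions.
    front = np.asarray(front, dtype=float)
    d = np.linalg.norm(np.diff(front, axis=0), axis=1)        # consecutive gaps d_i
    d_bar = d.mean()                                          # average distance
    df = np.linalg.norm(front[0] - np.asarray(extremes[0]))   # d_f
    dl = np.linalg.norm(front[-1] - np.asarray(extremes[1]))  # d_l
    return (df + dl + np.abs(d - d_bar).sum()) / (df + dl + (len(front) - 1) * d_bar)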
Table 6.2 Mean (first rows) and variance (second rows) of the convergence metric ϒ for each algorithm on ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6.
Table 6.3 shows the mean and variance of the diversity metric Δ obtained using all the algorithms; bold entries in Table 6.3 indicate the best value among all algorithms. As can be seen from Table 6.3, MOEO is capable of finding a better spread of solutions than any other algorithm on all the problems except ZDT3. This indicates that MOEO can find a better-distributed set of nondominated solutions than many other state-of-the-art MOEAs. In all cases with MOEO, the variance of the diversity metric over 10 runs is also small.
It is interesting to note that MOEO performs well with respect to convergence and diversity of solutions on problem ZDT4, where there exist 21^9 different local Pareto-optimal fronts in the search space, of which only one corresponds to the global Pareto-optimal front. This indicates that MOEO is capable of escaping from local Pareto-optimal fronts and approaching the global nondominated front. Furthermore, MOEO is suitable for problems with a nonuniformly spaced Pareto front, for example, problem ZDT6.
For illustration, 1 of the 10 runs of MOEO on three test problems (ZDT1, ZDT2, and ZDT3) is shown in Figures 6.5 through 6.7, respectively. The figures show all nondominated solutions obtained after 25,000 fitness function evaluations (FFE) with MOEO. From Figures 6.5 through 6.7, we can see that MOEO is able to converge to the true Pareto-optimal front on all three problems. Moreover, MOEO can find a well-distributed set
Table 6.3 Mean (first rows) and variance (second rows) of the diversity metric Δ for each algorithm on ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6.
Figure 6.5 Nondominated solutions with MOEO on ZDT1. (Adapted from Chen, M. R. and Lu, Y. Z., European Journal of Operational Research 3 (188): 637–651, 2008.)
Figure 6.6 Nondominated solutions with MOEO on ZDT2. (Adapted from Chen,
M. R. and Lu, Y. Z., European Journal of Operational Research 3 (188): 637–651,
2008.)
Figure 6.7 Nondominated solutions with MOEO on ZDT3. (Adapted from Chen, M. R. and Lu, Y. Z., European Journal of Operational Research 3 (188): 637–651, 2008.)
Figure 6.8 MOEO converges to the true Pareto-optimal front and finds a better
spread of solutions than SPEA2 on ZDT4. (Reprinted from European Journal of
Operational Research, 3 (188), Chen, M. R. and Lu, Y. Z., A novel elitist multi-
objective optimization algorithm: Multi-objective extremal optimization,
637–651, Copyright 2008, with permission from Elsevier.)
of nondominated solutions on ZDT1 and ZDT2. However, the diversity of the nondominated solutions found by MOEO on ZDT3 is not very good.
Figure 6.8 shows 1 of the 10 runs of MOEO and SPEA2 on ZDT4. As can be observed from Figure 6.8, MOEO achieves better convergence and spread of solutions than SPEA2 on ZDT4. In the literature (Deb et al., 2002), it can be observed that NSGA-II finds better convergence and spread of solutions than PAES on ZDT4, but neither of them converges to the true Pareto-optimal front; for more details, readers can refer to Deb et al. (2002). It is interesting to notice that MOEO does converge to the true Pareto-optimal front, which indicates that MOEO outperforms NSGA-II and PAES on ZDT4 in terms of solution convergence.
Figure 6.9 shows 1 of the 10 runs of MOEO and SPEA2 on ZDT6. From Figure 6.9, it can be seen that MOEO achieves better convergence and spread of solutions than SPEA2 on ZDT6. It is worth noting that MOEO is capable of converging to the true Pareto-optimal front and finding a well-distributed set of nondominated solutions that covers the whole front on ZDT6.
In order to compare the running times of MOEO with those of the other three algorithms, that is, NSGA-II, SPEA2, and PAES, Chen and Lu (2008) also reported the mean and variance of the running times of each algorithm over 10 runs in Table 6.4; bold entries in Table 6.4 indicate the best (shortest) running time. The FFE for all the algorithms was 25,000.
Figure 6.9 MOEO can converge to the true Pareto-optimal front and finds a
better spread of solutions than SPEA2 on ZDT6. (Reprinted from European
Journal of Operational Research, 3 (188), Chen, M. R. and Lu, Y. Z., A novel elitist
multi-objective optimization algorithm: Multi-objective extremal optimization,
637–651, Copyright 2008, with permission from Elsevier.)
Table 6.4 Mean (first rows) and variance (second rows) of the running times (in milliseconds) for each algorithm on ZDT1, ZDT2, ZDT3, ZDT4, and ZDT6.
From Table 6.4, we can observe that MOEO is the fastest of the four algorithms on problems ZDT1, ZDT2, ZDT3, and ZDT4. It is important to notice that MOEO is about 6 times faster than NSGA-II and SPEA2, and approximately 30 times as fast as PAES, on problems ZDT1, ZDT2, and ZDT3. For problem ZDT6, MOEO runs almost as fast as NSGA-II and SPEA2, and nearly 10 times as fast as PAES. It is worth noting that although MOEO and PAES are both single-parent single-child MOO algorithms, MOEO runs much faster than PAES when the same archive truncation method is adopted. The results in Table 6.4 are remarkable given that NSGA-II and SPEA2 are normally considered "very fast" algorithms. Therefore, MOEO may be considered a very fast approach.
6.4.2.4 Conclusions
From the above analysis, it can be seen that MOEO has the following advantages:
1. Similar to MOEAs, MOEO is not susceptible to the shape of the Pareto front.
2. Only one operator, the mutation operator, exists in MOEO, which makes MOEO simple and convenient to implement.
3. The hybrid GC mutation operator suggested in MOEO combines the advantages of coarse-grained search and fine-grained search.
4. The historical external archive provides the elitist mechanism for MOEO.
5. MOEO provides good performance in both the convergence and the distribution of solutions.
6. MOEO is capable of handling problems with multiple local Pareto-optimal fronts or a nonuniformly spaced Pareto-optimal front.
7. Compared with three competitive MOEAs in terms of running times on five test problems, MOEO proved very fast.
g_2(x) = (x_1 - 0.5)^2 + (x_2 - 0.5)^2 \le 0.5
1. Front spread (FS) (Bosman and Thierens, 2003): This metric indicates the size of the objective space covered by an approximation set; a larger FS value is preferable. The FS metric for an approximation set S is defined as the maximum Euclidean distance inside the smallest m-dimensional bounding box that contains S, where m is the number of objectives. This distance can be computed from the maximum distance among the solutions in S in each dimension separately:
FS(S) = \sqrt{ \sum_{i=0}^{m-1} \max_{(z^0, z^1) \in S \times S} \left\{ \left( f_i(z^0) - f_i(z^1) \right)^2 \right\} }   (6.5)
2. Coverage of two sets (Zitzler and Thiele, 1999): Assume, without loss of generality, a minimization problem and consider two decision vectors a, b ∈ X. Then a is said to cover b (written a ⪯ b) if a dominates b (written a ≺ b) or a equals b. Let X′, X″ ⊆ X be two sets of decision vectors. The function C maps the ordered pair (X′, X″) to the interval [0, 1]:
C(X', X'') = \frac{\left| \{ a'' \in X'' \; ; \; \exists a' \in X' : a' \preceq a'' \} \right|}{|X''|}   (6.6)
Here, C(X′, X″) = 1 means that all points in X″ are covered by points in X′. The opposite, C(X′, X″) = 0, represents the situation in which none of the points in X″ is covered by the set X′. Note that both C(X′, X″) and C(X″, X′) have to be considered, since C(X′, X″) is not necessarily equal to C(X″, X′) (e.g., if X′ dominates X″, then C(X′, X″) = 1 and C(X″, X′) = 0).
The first metric FS can be used to measure the spread of an approximate front,
while the second metric C can be used to show that the outcomes of one algorithm
dominate the outcomes of another algorithm.
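Both metrics are easy to implement; for example, the C metric of Equation 6.6 reduces to a few lines of Python (the toy fronts below are ours):

def coverage(A, B):
    # C(A, B) of Equation 6.6: the fraction of points in B that are
    # covered (weakly dominated) by at least one point in A.
    def covers(a, b):   # weak dominance for minimization
        return all(x <= y for x, y in zip(a, b))
    return sum(any(covers(a, b) for a in A) for b in B) / len(B)

# C is not symmetric: compute both directions.
A = [(1, 2), (2, 1)]
B = [(2, 2), (3, 3)]
print(coverage(A, B), coverage(B, A))   # -> 1.0 0.0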
algorithms have difficulty finding the entire central continuous region. Figure 6.12
shows the trade-off fronts obtained by all the algorithms on this problem. It is obvi-
ous that MOEO performs significantly better than SPEA2 and PAES in terms of
the convergence and diversity of solutions. Note that MOEO and NSGA-II nearly
converge to the same trade-off front. However, MOEO is able to find a much more
diverse set of nondominated solutions than NSGA-II.
Figure 6.10 Approximate Pareto fronts for CONSTR. (Reproduced from Chen,
M. R. et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911, 2007.
With permission.)
Figure 6.11 Approximate Pareto fronts for SRN. (Reproduced from Chen, M. R.
et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911, 2007. With
permission.)
Figure 6.12 Approximate Pareto fronts for TNK. (Reproduced from Chen, M. R.
et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911, 2007. With
permission.)
6.4.3.4 Conclusions
In this section, we introduced the constrained MOEO algorithm and applied it to solving constrained MOPs. The simulation results indicate that MOEO is highly competitive with three state-of-the-art MOEAs, namely NSGA-II, SPEA2, and PAES, in terms of convergence and diversity of solutions. Thus, MOEO may be a good alternative for dealing with constrained MOPs.
maximize f(X) = ( f_1(X), f_2(X), \ldots, f_n(X) )

subject to \sum_{j=1}^{m} w_{i,j} \, x_j \le C_i, \quad i = 1, 2, \ldots, n   (6.7)

X = (x_1, x_2, \ldots, x_m) \in \{0, 1\}^m
1. Generate an initial feasible solution X = (x1, x2, …, xm) ∈ {0, 1}m using the greedy algorithm. Add X to the external archive. Set iteration = 0.
2. Generate m offspring of the current solution X by performing mutation on each component one by one, while keeping the other components unchanged.
3. Apply the repair strategy to repair the m offspring.
4. Perform dominance ranking on the m offspring and obtain their rank numbers, i.e., rj ∈ [0, m − 1], j ∈ {1, …, m}.
5. Assign the fitness λj = rj to each component xj, j ∈ {1, …, m}.
6. If there is only one component with a fitness value of zero, that component is considered the worst one. Otherwise, if there are two or more components with a fitness value of zero, the diversity preservation mechanism is invoked to decide the worst component. Assume that the kth component is the worst one and that Xk is the corresponding offspring generated by performing mutation only on the kth component, k ∈ {1, …, m}.
7. Accept X = Xk unconditionally.
8. Apply the archiving logic to update the external archive.
9. If the iterations reach the predefined maximum number of generations, go to Step 10; otherwise, set iteration = iteration + 1 and go to Step 2.
10. Output the external archive as the Pareto-optimal set.
current component, that is, set xk = 0 when its value equals 1, and set xk = 1 otherwise. At the same time, randomly choose another component whose value differs from that of the current component, and flip it as well. For instance, if the current solution is X = 1010, then four offspring individuals can be generated through the above mutation, for example, X1 = 0110, X2 = 1100, X3 = 1001, and X4 = 0011.
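This mutation amounts to a cardinality-preserving pair flip, sketched below in Python (function name ours):

import random

def pair_flip(x, k):
    # Flip bit k and one randomly chosen bit of opposite value, keeping
    # the number of ones in the solution unchanged.
    y = list(x)
    y[k] ^= 1                                     # flip the current component
    partners = [j for j in range(len(y)) if j != k and x[j] != x[k]]
    if partners:                                  # flip an opposite-valued bit
        y[random.choice(partners)] ^= 1
    return y

# X = 1010 yields offspring such as 0110 or 0011 (cf. the example above).
print(pair_flip([1, 0, 1, 0], 0))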
Procedure Repair(x)
begin
  knapsack-overfilled = false;
  knapsack-unfilled = false;
  x′ = x, i.e., (x′1, …, x′m) := (x1, …, xm);
  sort all the items in increasing order of the value ratios, i.e., max_{i=1..n} ( C_i · p_{ij} / w_{ij} );
  /* Remove phase */
  if ∑_{j=1}^{m} w_{ij} · x′j > C_i for some i ∈ {1, …, n}
    knapsack-overfilled = true;
  end if
  while (knapsack-overfilled)
    { j := select the item with the minimum value ratio from the knapsack;
      remove the selected item from the knapsack, i.e., x′j := 0;
      if ∑_{j=1}^{m} w_{ij} · x′j ≤ C_i, i = 1, …, n
        knapsack-overfilled = false;
      end if }
  /* Add phase */
  if ∑_{j=1}^{m} w_{ij} · x′j ≤ C_i, i = 1, …, n
    knapsack-unfilled = true;
  end if
  while (knapsack-unfilled)
    { j := select the item with the maximum value ratio outside the knapsack;
      add the selected item to the knapsack, i.e., x′j := 1;
      if ∑_{j=1}^{m} w_{ij} · x′j > C_i for some i ∈ {1, …, n}
        remove the item again, i.e., x′j := 0, and set knapsack-unfilled = false;
      end if }
  return x′;
end
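Assuming a single common ordering of items by value ratio, the remove/add phases translate into the following Python sketch; the exact ratio definition and tie handling are simplifications of the procedure above:

import numpy as np

def repair(x, w, C, ratio):
    # Greedy repair for the multiobjective knapsack: drop lowest-ratio items
    # while any knapsack is overfilled, then add highest-ratio items that fit.
    # `w` is the n-by-m weight matrix, `C` the capacity vector, `ratio[j]`
    # the value ratio of item j.
    x = np.array(x, dtype=int)
    order = np.argsort(ratio)                 # items by increasing value ratio
    for j in order:                           # remove phase
        if np.all(w @ x <= C):
            break
        if x[j]:
            x[j] = 0
    for j in order[::-1]:                     # add phase (decreasing ratio)
        if not x[j]:
            x[j] = 1
            if np.any(w @ x > C):
                x[j] = 0                      # undo: the item does not fit
    return x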
are 100, 150, and 200, respectively. The maximum number of iterations for the three algorithms is 500; thus, the maximum numbers of iterations for MOEO are 500, 300, and 200 for instances 2-100, 2-250, and 2-500, respectively. For MOEO, the solution was encoded in the floating-point representation and an archive of size 100 was used. All algorithms were run 50 times independently on each instance. Two performance metrics, FS and the coverage of two sets (C metric), were used to assess the performance of the extended MOEO.
Figure 6.15 Pairwise comparisons of the C metric for MOEO and the other algorithms on instances 2-100, 2-250, and 2-500.
on all the instances in terms of the metric C. It can also be observed from Figure 6.15 that MOEO performed better than SPEA, except on instance 2-250, concerning the metric C. This illustrates that MOEO is capable of performing nearly as well as, or better than, the other three algorithms with respect to the convergence of solutions.
Table 6.8 shows the mean and standard deviation of the FS metric obtained using the four algorithms, that is, MOEO, NSGA, SPEA, and NPGA; bold entries in Table 6.8 indicate the best value among all algorithms. As can be seen from Table 6.8, MOEO was capable of finding a wider spread of solutions than the other three algorithms on instance 2-100. MOEO performed better than NSGA on instance 2-250, and better than NPGA on instance 2-500. This indicates that the extended MOEO has the ability to find a widely distributed set of nondominated solutions.
For illustration, we also show 1 of 50 runs of all the algorithms on each instance
in Figure 6.16a–c, respectively. Figure 6.16 shows the approximate Pareto fronts
obtained by MOEO, NSGA, SPEA, and NPGA. From Figure 6.16, we can see
that MOEO was able to converge nearly as well as or better than the other three
algorithms on all the instances. Moreover, MOEO could find a well-distributed set
of nondominated solutions on all the instances.
6.5.4 Conclusions
The extended MOEO for MOKP was validated using three benchmark problems. The simulation results demonstrated that the extended MOEO is highly competitive with three state-of-the-art MOEAs, namely NSGA, SPEA, and NPGA. As a result, the extended MOEO may be a good alternative for solving MOKP.
Figure 6.16 (a)–(c) Approximate Pareto fronts for instances 2-100, 2-250, and
2-500. (Chen, M. R. et al. A novel multi-objective optimization algorithm for
0/1 multi-objective knapsack problems. Proceedings of the 4th IEEE Conference
on Industrial Electronics and Applications (ICIEA 2010), pp. 1511–1516. © 2010
IEEE.)
1. Randomly generate an initial feasible solution S = (x1, x2, …, xn) and add S to the external archive. Set iteration = 0.
2. Generate n offspring of the current solution S by performing mutation on each component one by one, while keeping the other components unchanged.
3. Perform dominance ranking on the n offspring and obtain their rank numbers, i.e., rj ∈ [0, n − 1], j ∈ {1, …, n}.
4. Assign the fitness λj = rj to each component xj, j ∈ {1, …, n}.
5. If there is only one component with a fitness value of zero, that component is considered the worst one. Otherwise, if there are two or more components with a fitness value of zero, the diversity preservation mechanism is invoked to decide the worst component. Assume that the worst component is xw and that Sw is the corresponding offspring generated by performing mutation only on xw, w ∈ {1, …, n}.
6. Accept S = Sw unconditionally.
7. If S is a feasible solution, apply the archiving logic to update the external archive; otherwise, go to Step 8.
8. If the iterations reach the predefined maximum number of generations, go to Step 9; otherwise, set iteration = iteration + 1 and go to Step 2.
9. Output the external archive as the Pareto-optimal set.
minimize f_1(x) = x_1 \sqrt{16 + y^2} + x_2 \sqrt{1 + y^2}
minimize f_2(x) = \max(\sigma_{AC}, \sigma_{BC})   (6.8)
subject to \max(\sigma_{AC}, \sigma_{BC}) \le 10^5
1 \le y \le 3 \text{ and } x \ge 0

\sigma_{AC} = \frac{20\sqrt{16 + y^2}}{y x_1}, \quad \sigma_{BC} = \frac{80\sqrt{1 + y^2}}{y x_2}   (6.9)
minimize f_2(x) = \delta(x)
subject to g_2(x) = 30{,}000 - \sigma(x) \ge 0,
g_3(x) = b - h \ge 0,   (6.10)
g_4(x) = P_c(x) - 6000 \ge 0

\delta(x) = \frac{2.1952}{t^3 b}   (6.11)

\tau(x) = \sqrt{ (\tau')^2 + (\tau'')^2 + \frac{l \tau' \tau''}{\sqrt{0.25\left( l^2 + (h + t)^2 \right)}} }, \quad \tau' = \frac{6000}{\sqrt{2}\, h l},
The variables are initialized in the following range: 0.125 ≤ h, b ≤ 5 and 0.1 ≤ l,
t ≤ 10.
minimize f(x_1) = \frac{\pi}{4} \left[ a \left( d_a^2 - d_0^2 \right) + l \left( d_b^2 - d_0^2 \right) \right]   (6.12)

minimize f(x_2) = \frac{F a^3}{3 E I_a} \left( 1 + \frac{l I_a}{a I_b} \right) + \frac{F}{c_a} \left( 1 + \frac{a}{l} \right)^2 + \frac{F a^2}{c_b l^2}   (6.13)
subject to
g_1(x) = l - l_g \le 0
g_2(x) = l_k - l \le 0
g_3(x) = d_{a1} - d_a \le 0
g_4(x) = d_a - d_{a2} \le 0
g_5(x) = d_{b1} - d_b \le 0
g_6(x) = d_b - d_{b2} \le 0
g_7(x) = d_{om} - d_o \le 0
g_8(x) = p_1 d_o - d_b \le 0
g_9(x) = p_2 d_b - d_a \le 0
g_{10}(x) = \Delta_a + (\Delta_a - \Delta_b) \frac{a}{l} - \Delta \le 0
where there are two continuous variables (l and do) and two discrete variables
(da and db), Ia and Ib are the moments of inertia:
I_a = 0.049 \left( d_a^4 - d_o^4 \right), \quad I_b = 0.049 \left( d_b^4 - d_o^4 \right)   (6.14)

c_a = 35{,}400 \, \delta_{ra}^{1/9} \, d_a^{10/9}, \quad c_b = 35{,}400 \, \delta_{rb}^{1/9} \, d_b^{10/9}   (6.15)
where δra and δrb are the preloads of the bearings, g 8(x) and g 9(x) are the designer’s
proportion requirements, g 10(x) is the maximal radial run out of the spindle nose Δ,
and Δa and Δb are the radial run outs of the front and the end bearings.
For this example, it is assumed that da must be chosen from the set X 3 = {80, 85,
90, 95} and db must be chosen from the set X4 = {75, 80, 85, 90}. Additionally, the
following constant parameters are assumed:
d_{b2} = 90.00 mm, p_1 = 1.25, p_2 = 1.05, l_k = 150.00 mm, l_g = 200.00 mm, …
perform better than NSGA-II on the problem Spindle concerning the C metric. It is interesting to note that MOEO outperforms SPEA2 on all the problems in terms of the C metric. It is also clear from Table 6.10 that MOEO significantly outperforms PAES on the problems Two Bar and Spindle with respect to the C metric.
For illustration, we also show 1 of 50 runs of all the algorithms on each problem
in Figures 6.18 through 6.20, respectively. The figures show the trade-off fronts
obtained by MOEO, NSGA-II, SPEA2, and PAES. The insets in all the figures
show the parts which may be unclear in the main plots. From Figures 6.18 through
6.20, we can see that MOEO is able to converge nearly as well as or better than
the other three algorithms on all the problems. Moreover, MOEO can find a well-
distributed set of nondominated solutions on all the problems.
Note that the problem Spindle has two continuous and two discrete decision variables. As can be observed from Tables 6.9 and 6.10 and Figure 6.20, MOEO seems to be the best performer in both the convergence and the diversity of solutions on this problem. The two extreme nondominated solutions found by MOEO are (4.767 × 10^5, 0.0346) and (1.607 × 10^6, 0.01465), which are better than those found by the other techniques listed in the literature (Baykasoglu, 2006). Consequently, MOEO may be a good choice for problems with mixed continuous and discrete decision variables.
6.6.4 Conclusions
In this section, the MOEO algorithm was extended to solve mechanical component design MOPs. The proposed approach was validated on three mechanical component design problems. The simulation results indicate that MOEO is highly competitive with three state-of-the-art MOEAs, namely NSGA-II, SPEA2, and PAES, in terms of convergence and diversity of solutions. Thus, MOEO may be well suited for handling mechanical component design problems with multiple objectives.
Figure 6.18 Approximate Pareto fronts for Two Bar. (Reproduced from
Chen, M. R. et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911,
2007. With permission.)
minimize \sigma_p^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} w_i w_j \sigma_{ij}   (6.16)
Figure 6.19 Approximate Pareto fronts for Welded Beam. (Reproduced from
Chen, M. R. et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911,
2007. With permission.)
maximize r_p = \sum_{i=1}^{N} w_i \mu_i   (6.17)

subject to \sum_{i=1}^{N} w_i = 1   (6.18)

\sum_{i=1}^{N} x_i = K   (6.19)

\varepsilon_i x_i \le w_i \le \delta_i x_i, \quad i = 1, \ldots, N   (6.20)
Figure 6.20 Approximate Pareto fronts for Spindle. (Reproduced from Chen,
M. R. et al. Journal of Zhejiang University, Science A 8 (12): 1905–1911, 2007.
With permission.)
where N is the number of available assets, \sigma_p^2 the risk of the portfolio, \sigma_{ij} the covariance between the ith and jth assets, r_p the expected return of the portfolio, \mu_i the expected return of the ith asset, K the number of assets to invest in (K ≤ N), \varepsilon_i the minimum investment ratio allowed in the ith asset, and \delta_i the maximum investment ratio allowed in the ith asset.
There are two decision variables in this model: one is X = (x1, x 2, …, xi, …,
xN) ∈ {0, 1}N, where xi = 1, if the ith asset is chosen, otherwise xi = 0; and the other
is W = (w1, w2, …, wj, …, wN), wj ∈ [0, 1], j = 1, …, N, where wj is the money ratio
invested in the jth asset.
Equation 6.16 minimizes the total risk associated with the portfolio of assets.
Equation 6.17 maximizes the return associated with the portfolio of assets.
Equation 6.19 fixes the number of assets to invest in at K. Equation 6.20 imposes
the maximum and minimum investment ratio allowed for each asset.
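The objectives and constraints of this model are straightforward to evaluate; the following sketch (names ours) makes the encoding concrete:

import numpy as np

def portfolio_objectives(w, mu, sigma):
    # Risk and return of Equations 6.16 and 6.17 for a weight vector w.
    # `sigma` is the N-by-N covariance matrix, sigma_ij = rho_ij * sigma_i * sigma_j.
    w = np.asarray(w, dtype=float)
    risk = float(w @ sigma @ w)        # sigma_p^2, to be minimized
    ret = float(w @ mu)                # r_p, to be maximized
    return risk, ret

def feasible(w, x, K, eps, delta):
    # Constraint check for Equations 6.18 through 6.20.
    return (np.isclose(w.sum(), 1.0) and x.sum() == K
            and np.all(eps * x <= w) and np.all(w <= delta * x))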
1. Randomly select K assets from the N assets, and then generate an initial feasible solution W = (w1, w2, …, wj, …, wN), wj ∈ [0, 1], j = 1, …, N. Repair W using the repair strategy Repair(W). Add W to the external archive. Set iteration = 0.
2. Generate N offspring of the current solution W by performing mutation on each component one by one, while keeping the other components unchanged.
3. Perform dominance ranking on the N offspring and obtain their rank numbers, i.e., rj ∈ [0, N − 1], j ∈ {1, …, N}.
4. Assign the fitness λj = rj to each component wj, j ∈ {1, …, N}.
5. If there is only one component with a fitness value of zero, that component is considered the worst one. Otherwise, if there are two or more components with a fitness value of zero, the diversity preservation mechanism is invoked to decide the worst component. Assume that the kth component is the worst one and that Wk is the corresponding offspring generated by performing mutation only on the kth component, k ∈ {1, …, N}.
6. Accept W = Wk unconditionally.
7. Apply the archiving logic to update the external archive.
8. If the iterations reach the predefined maximum number of generations, go to Step 9; otherwise, set iteration = iteration + 1 and go to Step 2.
9. Output the external archive as the Pareto-optimal set.
if L ≠ 0 then
  for each α_i ∈ ℑ
    if L > 0 then
      d_i = δ_i − w_i
    else
      d_i = w_i − ε_i
    end if
  end for
  for each α_i ∈ ℑ
    m_i = d_i / ∑_{j=1}^{K} d_j
    w_i = w_i + L · m_i
  end for
end if
return W
end
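Assuming L denotes the leftover budget 1 − Σw over the chosen set ℑ, and that the distances d_i measure the headroom toward the bound being approached (δ_i above, ε_i below), the fragment corresponds to the following Python sketch:

import numpy as np

def redistribute(w, chosen, eps, delta):
    # Spread the surplus/deficit L = 1 - sum(w over chosen assets) across
    # the chosen assets in proportion to headroom (m_i = d_i / sum_j d_j).
    L = 1.0 - w[chosen].sum()
    if not np.isclose(L, 0.0):
        d = (delta[chosen] - w[chosen]) if L > 0 else (w[chosen] - eps[chosen])
        w[chosen] += L * d / d.sum()
    return w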
British FTSE 100, the fourth one is the U.S. S&P 100, and the fifth one is the Japanese Nikkei 225. The values included for each index were collected from March 1992 to September 1997. The data package contains the complete identification list of the assets included. The data files comprise 31, 85, 89, 98, and 225 assets, respectively. For each asset i, the average return \mu_i and the individual risk \sigma_i are included. For each pair of assets i and j, the correlation \rho_{ij} between them is also included. The risk of investing in one asset having simultaneously invested in another is modeled by the covariance between the two, so the risk (or covariance) between two assets i and j is given by \sigma_{ij} = \rho_{ij} \sigma_i \sigma_j.
The simulation results of MOEO (Chen et al., 2009) were compared with those
of NSGA-II, SPEA2, and PAES when K = 5. In addition, to conduct a fair com-
parison, all algorithms were performed under the same FFE. All the algorithms
were run for a maximum of 31,000, 85,000, 89,000, 98,000 and 225,000 FFE for
problems port1 to port5, respectively. For MOEO, it was encoded in the floating
point representation and an archive of size 100 was used. For NSGA-II, the popula-
tion size was 100. For SPEA2, a population of size 80 and an external population
of size 20 were adopted, so that the overall population size became 100. For PAES,
a depth value d equal to 4 and the archive size of 100 were used. NSGA-II and
SPEA2 adopt the uniform crossover operator and Gaussian mutation, while PAES uses Gaussian mutation. A crossover probability of pc = 0.9 and a mutation probability of pm = 0.1 were used. Additionally, all the algorithms were executed 50 times independently on each problem. In order to consider a search space as wide as possible,
the values of the maximum and minimum investments were set at extreme values.
For all the experiments, the values of εi and δi are 0.001 and 1.0, respectively.
Additionally, two performance metrics, FS and the coverage of two sets (C metric), were used to assess the performance of MOEO.
Mean and standard deviation of the FS metric for each algorithm on port1–port5:
Algorithms | port1 Mean / St. Dev. | port2 Mean / St. Dev. | port3 Mean / St. Dev. | port4 Mean / St. Dev. | port5 Mean / St. Dev.
MOEO | 8.59E−3 / 4.17E−4 | 7.78E−3 / 6.46E−4 | 5.30E−3 / 3.71E−4 | 7.30E−3 / 2.57E−4 | 3.98E−3 / 2.88E−4
NSGA-II | 8.06E−3 / 2.80E−4 | 6.99E−3 / 1.07E−4 | 4.89E−3 / 1.20E−4 | 6.90E−3 / 2.70E−4 | 3.67E−3 / 1.92E−4
SPEA2 | 8.46E−3 / 3.46E−4 | 7.20E−3 / 2.06E−4 | 5.14E−3 / 1.13E−4 | 7.20E−3 / 1.85E−4 | 3.92E−3 / 1.85E−4
PAES | 2.13E−3 / 9.55E−4 | 2.38E−3 / 2.45E−3 | 2.20E−3 / 1.52E−3 | 3.99E−3 / 2.07E−3 | 1.73E−3 / 7.30E−4
Source: Chen, M. R. et al. 2009. Multi-objective extremal optimization for portfolio optimization problem. Proceedings of 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems (ICIS 2009), pp. 552–556. © 2009 IEEE.
Figure 6.23 Pairwise comparisons of the C metric for MOEO, NSGA-II, SPEA2, and PAES on Port1–Port5.
For illustration, we also show 1 of 50 runs of all the algorithms on each problem
when K = 5 in Figure 6.24a–e, respectively. The real curve represents the uncon-
strained Pareto front. Figure 6.24 shows the trade-off fronts obtained by MOEO,
NSGA-II, SPEA2, and PAES. From Figure 6.24, we can see that MOEO is able
to converge nearly as well as or better than the other three algorithms on all the
Multiobjective Optimization with Extremal Dynamics ◾ 211
Figure 6.24 (a)–(e) Approximate Pareto fronts for port1–port5 (K = 5). (Chen,
M. R. et al. Multi-objective extremal optimization for portfolio optimization prob-
lem. Proceedings of 2009 IEEE International Conference on Intelligent Computing
and Intelligent Systems (ICIS 2009), pp. 552–556. © 2009 IEEE.)
6.7.5 Conclusions
The above simulation results indicate that MOEO is highly competitive with three state-of-the-art MOEAs, namely NSGA-II, SPEA2, and PAES, in terms of convergence and diversity of solutions. The results of this study are encouraging. Consequently, MOEO may be a good alternative for dealing with the portfolio optimization problem.
6.8 Summary
EO is a general-purpose local-search heuristic algorithm that has been successfully applied to NP-hard COPs such as graph bipartitioning, the TSP, graph coloring, spin glasses, and MAXSAT. In this chapter, in order to extend EO to solve MOPs efficiently, a novel elitist Pareto-based multiobjective algorithm, called MOEO, was investigated in detail. The MOEO algorithm and its variants were used to solve several benchmark MOPs, including numerical unconstrained/constrained MOPs, multiobjective mechanical component design problems, multiobjective portfolio optimization problems, and the MOKP. Experimental results indicate that MOEO and its variants are highly competitive with state-of-the-art MOEAs such as NSGA-II, SPEA2, PAES, NSGA, SPEA, and NPGA. Thus, MOEO may be a good alternative for dealing with MOPs. We will explore the efficiency of MOEO on problems with a large number of decision variables in the future, and it is desirable to further apply MOEO to complex real-world engineering MOPs.
Section III APPLICATIONS
Chapter 7
EO for Systems Modeling and Control
analysis methods and optimal control methods are applied to the analysis and design of systems whose mathematical models are known. Therefore, the establishment of an accurate dynamic system model is crucial. On the other hand, with the trends toward integration and large scale in industry, process control has gradually developed from simple proportional-integral-derivative (PID) loops to large-scale multivariable complex nonlinear systems, and optimization problems in control systems have become more complex, involving multiple objectives, more process variables, correlations, and many constraints. Meanwhile, market competition has placed higher requirements on control strategies: the control system must be more flexible in order to lower consumption and deliver qualified products on time.
Based on the ideas of memetic algorithms (MAs), the efficiency of CI can be improved significantly by incorporating a local search procedure into the optimization. Considering the problems that occur in the application of evolutionary computation methods in control theory and applications, this chapter further applies two hybrid EO-based memetic algorithms, called "EO-LM" and "EO-SQP," to system identification and industrial optimal control.
with more than a dozen input variables to produce the desired steel grade. In this section, the previously proposed EO-LM algorithm is applied to multilayer perceptron (MLP) network training for BOF endpoint quality prediction, with the abilities to avoid local minima and to perform detailed local search. Based on the mechanism of the BOF process, a feed-forward MLP network {13, 10, 2} is adopted for real applications, consisting of one hidden layer with 10 hidden nodes. As shown in Figure 7.1, the inputs are 13 variables, including uncontrollable hot metal information (hot metal temperature, weight, chemical compositions, etc.) and the operational recipe variables (oxygen, scrap, a variety of recipes, etc.). The two outputs are defined as the endpoint temperature and the carbon content, respectively.
Figure 7.1 Structure of the MLP network for BOF endpoint prediction, with the weighted synapse matrices and biases of the hidden and output layers as the parameters in the NN, and the genotype and phenotype representations used in EO-LM learning; inputs include burnt dolomite, stone, ore, and oxygen consumption.
EO-LM learning is executed in two parallel phases: the genotype for EO-LM and the phenotype for the MLP network. The synaptic weights and biases are encoded as a real-valued chromosome, which is evolved during the EO-LM iterations.
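This genotype-to-phenotype mapping is essentially a flattening of the network parameters; a minimal sketch (layer sizes from the {13, 10, 2} network above, function name ours) is:

import numpy as np

def decode(chromosome, sizes=(13, 10, 2)):
    # Unpack a real-valued chromosome into the weight matrices and bias
    # vectors of a feed-forward MLP, layer by layer.
    params, i = [], 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        W = chromosome[i:i + n_in * n_out].reshape(n_in, n_out)
        i += n_in * n_out
        b = chromosome[i:i + n_out]
        i += n_out
        params.append((W, b))
    return params

# The {13, 10, 2} network needs 13*10 + 10 + 10*2 + 2 = 162 genes.
genes = np.random.randn(162)
layers = decode(genes)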
To evaluate the effectiveness of the proposed hybrid EO-LM for the prediction of endpoint temperature and carbon content, a simulation experiment was performed using real industry data. Over 1600 pairs of data were gathered from the steel plant database; among them, 800 pairs were selected randomly as training data, 480 pairs as validation data, and the balance of 320 pairs as test data. In order to evaluate the performance of the proposed EO-LM-based NN model, the conventional LM algorithm was applied with the same test data set as used for the EO-LM model. The simulation performance of the EO-LM model is evaluated in terms of the root mean square error (RMSE), mean error (ME), and correlation coefficient. Table 7.1 gives the RMSE and ME values of the two models on the test data set. Compared with the conventional LM algorithm, the proposed algorithm reduces the prediction RMSE by 8.65% and 14% for endpoint temperature and carbon content, respectively.
The scatter diagrams in Figure 7.2 show the extent of the match between the measured and predicted values for the EO-LM and LM learning algorithms. It can be seen that the EO-LM model shows better agreement with the target than the conventional LM learning algorithm.
The comparison of the prediction error distributions for endpoint temperature and carbon content between the hybrid EO-LM algorithm and the conventional LM algorithm is shown in Figure 7.3. It can also be seen that the range of prediction residuals is reduced by the hybrid EO-LM algorithm compared to the conventional LM algorithm.
The BOF steelmaking is a highly complex process and difficult to model and
control. In this section, a novel hybrid NN training method with the integration of
EO and LM is presented for BOF endpoint quality prediction. The main advantage
of the proposed EO-LM algorithm is to utilize the superior features of EO and
EO for Systems Modeling and Control ◾ 219
Figure 7.2 Scatter plots of real versus predicted values of endpoint temperature (°C) and carbon content (%) for the EO-LM and conventional LM models.
LM in global and local search, respectively. As a result, the application of the proposed EO-LM algorithm in NN learning can create a BOF endpoint prediction model with better performance. The experimental results indicate that the proposed EO-LM can easily avoid the local minima, overfitting, and underfitting problems suffered by traditional GS-based training algorithms, and can provide better prediction results.
Figure 7.3 Distributions of the prediction errors for endpoint temperature and carbon content obtained by the EO-LM and conventional LM algorithms.
The support vector machine (SVM) was first proposed by Vapnik to solve pattern recognition problems (Vapnik, 1995; Burges, 1998). In the last decade, SVM-based algorithms have developed rapidly and been employed in many real-world applications, such as handwriting recognition, identification, bioinformatics, classification, and regression.
The performance of a support vector regression (SVR) model is sensitive to the kernel type and its parameters, and the determination of an appropriate kernel type and associated parameters for SVR is a challenging research topic in the field of support vector learning. In this section, a novel method is presented for the simultaneous optimization of the SVR kernel function and its parameters, which is formulated as a mixed-integer optimization problem and solved by the recently proposed heuristic "extremal optimization (EO)." We present the problem formulation for the optimization of the SVR kernel and parameters, the EO-SVR algorithm, and experimental tests on five benchmark regression problems (Chen and Lu, 2011b). The comparison results with other traditional approaches show that the proposed EO-SVR method can provide better generalization performance by successfully identifying the optimal SVR kernel function and its parameters.
7.3.1 Introduction
High generalization capability and global optimal solutions constitute the major advantages of SVM over other machine learning approaches. However, the performance of SVM strongly depends on the embedded kernel type (Ali and Smith, 2006) and the associated parameter values (Min and Lee, 2005; Jeng, 2006; Hou and Li, 2009). Therefore, the selection of the SVM kernel function and its parameters should be conducted carefully on a case-by-case basis, as different functions and parameters may lead to widely varying performance. Up to now, several kernels have been proposed by researchers, but there are no effective guidelines or systematic theories concerning how to choose an appropriate kernel for a given problem. The empirical search for the SVM kernel and parameters through trial and error has proven to be time consuming, imprecise, and unreliable (Zhang et al., 2004; Lorena and De Carvalho, 2008).
New parameter optimization techniques for SVM have been proposed and inves-
tigated by researchers in recent years. Among them, the most commonly used meth-
ods are grid search (Hsu et al., 2004) and CI (Engelbrecht, 2007). The former is time
consuming (Engin, 2009) and can only adjust a few parameters (Friedrichs and Igel,
2005). On the contrary, CI methods have shown high suitability for constrained non-
linear optimization problems, and are able to avoid local minima inherently.
So far, the optimization of SVR parameters based on CI methods has been realized in many studies and applications (Zhang et al., 2004; Friedrichs and Igel, 2005; Mao et al., 2005; Pai and Hong, 2005; Engin, 2009; Tang et al., 2009; Wu, 2010). However, relatively little research has been published on how to determine an appropriate SVR kernel function automatically (Howley and Madden, 2005; Thadani et al., 2006), and only a small part of the existing work has been devoted to the simultaneous optimization of the SVR kernel and its associated parameters (Wu et al., 2009). Therefore, we introduce a novel SVR tuning algorithm based on EO to improve predictive accuracy and generalization capability. Inspired by far-from-equilibrium dynamics, EO provides a new philosophy of optimization based on nonequilibrium statistical physics, along with the capability to elucidate the properties of phase transitions in complex optimization problems (Boettcher and Percus, 1999). In this section, we apply EO, for the first time, to the simultaneous optimization of the SVR kernel and its parameters. Considering the complexity of SVR tuning, a novel EO-SVR algorithm with a carefully designed chromosome structure and fitness function is proposed to deal with this hard issue.
f(x) = (w \cdot \Phi(x)) + b   (7.1)
R_{reg}(f) = \gamma \sum_{i=1}^{l} \Gamma\left( f(x_i) - y_i \right) + \frac{1}{2} \|w\|^2   (7.2)
where Γ(⋅) is a cost function and γ is a constant that determines the penalty on regression errors. The vector w can be written in terms of the data points as
w = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \Phi(x_i)   (7.3)
Only the most critical samples, namely those associated with nonzero Lagrange multipliers, contribute to the solution; these are called support vectors. More detailed descriptions of the SVR training process can be found in Steve's work (Steve, 1998).
Substituting Equation 7.3 into Equation 7.1, the general form can be rewritten as

f(x) = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \left( \Phi(x_i) \cdot \Phi(x) \right) + b = \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) \, k(x_i, x) + b   (7.4)
The dot product in Equation 7.4 can be replaced by the kernel function k(x_i, x), which provides an elegant way of dealing with the nonlinear mapping into the feature space, thereby avoiding all the difficulties inherent in high dimensions. There are several commonly used kernel types in SVR: linear, polynomial, radial basis function (RBF), and MLP, as listed in Equations 7.5 through 7.8.

k(x_i, x) = (x_i \cdot x)   (7.5)
k(x_i, x) = (x_i \cdot x + t)^d   (7.6)

k(x_i, x) = \exp\left( \frac{-\|x_i - x\|^2}{2\sigma^2} \right)   (7.7)
where J is a function of the kernel type (a discrete variable) and the kernel parameters (continuous variables). In this study, the mean square error (MSE) in Equation 7.10 is selected as the cost function:

MSE = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2   (7.10)
where N denotes the number of validation data points, y_i represents the actual output, and \hat{y}_i is the SVR-predicted value.
Based on the problem formulation described above, our goal is to employ the optimization procedure to explore a finite subset of possible values and determine the kernel type and associated parameters that minimize the cost function J:
Kernel type index | Kernel function | Parameter 1 | Parameter 2 | Parameter 3
1 | Polynomial kernel function | γ | d | t
3 | Multilayer perceptron (MLP) kernel | γ | s | t
Source: Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang University, Science C 12: 297–306, 2011b. With permission.
Note: – Denotes no parameter needed.
\{ ROUND(KT_i), CONVERT(P(1)_i), CONVERT(P(2)_i), CONVERT(P(3)_i) \} = \{ KT_{integer}, P(1)_{real}, P(2)_{real}, P(3)_{real} \}   (7.13)
where ROUND is a function that rounds to the nearest integer, and CONVERT is a function that maps P(1)_i, P(2)_i, and P(3)_i from [0, 1] to the actual variable regions, i.e., P(j)_{real} = P_{LB}(j)_i + P(j)_i \cdot \left( P_{UB}(j)_i - P_{LB}(j)_i \right), where P_{LB}(j)_i and P_{UB}(j)_i are the variable bounds satisfying P_{LB}(j)_i \le P(j)_{real} \le P_{UB}(j)_i.
As described in Equation 7.10, to solve the SVR optimization problem, the global fitness can be defined as

Fitness_{global}(S) = MSE(S)\big|_{Validation\ set} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2   (7.15)
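As an illustration only, the decoding of Equation 7.13 and the fitness of Equation 7.15 can be written with scikit-learn's SVR; the mapping of the three parameter genes onto the SVR arguments (C, gamma, coef0) and the use of the 'sigmoid' kernel in place of the MLP kernel are our assumptions, not the original implementation:

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error

KERNELS = ['linear', 'poly', 'rbf', 'sigmoid']   # assumed kernel-type coding

def fitness(chrom, X_tr, y_tr, X_val, y_val, bounds):
    # Decode the chromosome (ROUND for the kernel type, CONVERT for the
    # parameters), train an SVR, and return the validation MSE (Eq. 7.15).
    kt = int(round(chrom[0] * (len(KERNELS) - 1)))                      # ROUND
    p = [lb + g * (ub - lb) for g, (lb, ub) in zip(chrom[1:], bounds)]  # CONVERT
    model = SVR(kernel=KERNELS[kt], C=p[0], gamma=p[1], coef0=p[2])
    model.fit(X_tr, y_tr)
    return mean_squared_error(y_val, model.predict(X_val))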
Figure 7.5 Flowchart of the EO-SVR evolution cycle: each generation transforms the chromosome to calculate the localized fitnesses and find the worst component, then transforms the chromosome to calculate the global fitness, until the termination condition is met.
Case 1:
In this experiment, we approximate the following simple function (Chuang and
Lee, 2009):
y = \frac{\sin x}{x}, \quad x \in [-10, 10]   (7.17)
Figure 7.6 Original function examples (“.”) and noisy learning examples (“o”)
for cases 1 and 2. (Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang
University, Science C 12: 297–306, 2011b. With permission.)
We have 500 randomly generated examples, 200 of which are adopted as training examples, 200 as validation examples, and the balance of 100 as testing examples. The learning samples are disturbed by additive noise ξ with zero mean and standard deviation δ.
Figure 7.7 illustrates the whole evolution process of the MSE, the best-so-far result, and the searching dynamics of the kernel type during EO-SVR optimization with the noise level δ = 0.002.
Figure 7.8 shows the predictions and the actual values of the output y for the test samples. As shown in Figure 7.8, the predictions by EO-SVR show good agreement with the actual outputs under different noise levels.
The performance comparisons between EO-SVR, RBF kernel with grid search,
and NN model for case 1 are listed in Table 7.3. The best evolved model with
optimal kernel function and associated parameters is highlighted in bold italic
fonts. The comparisons imply that the predictive accuracy of the traditional SVR
with grid search can be improved by simultaneous optimization of kernel type and
parameters.
Case 2:
In this case, we consider a more complex example (Zhang et al., 2004):
Figure 7.7 Evolution process, best kernel type, and searching dynamics during
EO-SVR optimization (case 1, δ = 0.002). (Reproduced from Chen, P. and Lu, Y. Z.,
Journal of Zhejiang University, Science C 12: 297–306, 2011b. With permission.)
Figure 7.8 EO-SVR-based predictions and actual values for test samples under
different noise levels. (Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang
University, Science C 12: 297–306, 2011b. With permission.)
In this case, we randomly generate 500 examples, 200 of which are adopted as training examples, 200 as validation examples, and the balance of 100 as testing examples. The learning samples are disturbed by an additive noise ξ with zero mean and standard deviation δ.
NN | – | – | – | – | 0.0017
Source: Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang University, Science C 12: 297–306, 2011b. With permission.
Note: – Denotes no parameter needed.

Table 7.4 Performance Comparisons between EO-SVR and Other Conventional Approaches (Case 2)
Case 3:
In this case, a two-variable function is considered (Chuang and Lee, 2009):
y = \frac{\sin\left( x_1^2 + x_2^2 \right)}{x_1^2 + x_2^2}, \quad -5 \le x_1, x_2 \le 5   (7.21)
We have 700 randomly generated examples, 400 of which are adopted as training examples, 200 as validation examples, and the balance of 100 as testing examples. The learning samples are disturbed by an additive noise ξ with zero mean and standard deviation δ.
For the sake of comparison, the experimental results by EO-SVR, RBF kernel
with grid search, and NN model for case 3 are summarized in Table 7.5. The best
Figure 7.9 (See color insert.) Original function examples and noisy learning
examples (⋆) for case 3 (noise level N(0, 0.01)) and case 4 (noise level N(0, 5)).
(Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang University, Science
C 12: 297–306, 2011b. With permission.)
Table 7.5 Performance Comparisons between EO-SVR and Other Conventional Approaches (Case 3)
evolved model with optimal kernel function and associated parameters is labeled
in bold italic fonts.
Case 4:
This case considers a two-variable function (Zhang et al., 2004):
We have 700 randomly generated examples, 400 of which are adopted as training examples, 200 as validation examples, and the balance of 100 as testing examples. The learning samples are disturbed by an additive noise ξ with zero mean and standard deviation δ.
Table 7.6 shows the comparison of simulation results between the proposed EO-SVR, the RBF kernel with grid search, and the NN model for case 4. The best evolved model with the optimal kernel function and associated parameters is labeled in bold italic fonts. It can be seen that the prediction model evolved by the proposed EO-SVR method always achieves satisfactory performance under different noise levels.
Case 5:
In this example, we approximate a widely used three-input nonlinear function (Qiao and Wang, 2008) to verify the effectiveness of the proposed EO-SVR:
The training samples consist of 600 randomly generated data points. Another 200 examples are independently generated as the validation data set, and 100 examples as the testing data set. The learning samples are disturbed by additive noise ξ with zero mean and standard deviation δ.
Table 7.7 shows the forecasting accuracy on the testing data set of the proposed EO-SVR, the RBF kernel with grid search, and the NN model. The best evolved model, with the optimal kernel function and associated parameters, is labeled in bold italic font. Obviously, the developed EO-SVR model yields a more appropriate kernel and parameters, and thus higher predictive accuracy and better generalization capability.
Table 7.6 Performance Comparisons between EO-SVR and Other Conventional Approaches (Case 4)

NN | – | – | – | – | 2.8953

Source: Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang University, Science C 12: 297–306, 2011b. With permission.
Note: – Denotes no parameter needed.

Table 7.7 Performance Comparisons between EO-SVR and Other Conventional Approaches (Case 5)

Noise Level | Method | Best Kernel | Parameter 1 | Parameter 2 | Parameter 3 | MSE
| NN | – | – | – | – | 0.0121

Source: Reproduced from Chen, P. and Lu, Y. Z., Journal of Zhejiang University, Science C 12: 297–306, 2011b. With permission.
Note: – Denotes no parameter needed.
problems, which gives rise to computational difficulties related to the expense and reliability of solving the NLP problems online. Classical analytical and numerical optimization techniques are very sensitive to initial conditions and usually lead to unacceptable solutions due to convergence to local optima (Onnen et al., 1997; Venkateswarlu and Reddy, 2008), or cannot even guarantee a feasible solution because of the complexity of the optimization problems.
Consequently, new optimization techniques have been proposed and investigated by researchers for NMPC solution development in recent years. Among them, methods of CI (e.g., EAs, EO, and ACO) (Engelbrecht, 2007) have shown notable suitability for constrained nonlinear optimization problems and are able to avoid local minima inherently. Therefore, combinations of NMPC and various computational intelligence methods as online optimizers have been successfully applied to several typical nonlinear processes (Onnen et al., 1997; Martinez et al., 1998; Potocnik and Grabec, 2002; Song et al., 2007; Venkateswarlu and Reddy, 2008). However, CI methods suffer from slow convergence and cannot provide precise solutions because they fail to exploit local information (Tahk et al., 2007). Moreover, for constrained optimization problems involving many constraints that the optimal solution must satisfy, CI methods often lack an explicit mechanism to bias the search within the constrained search space (Runarsson and Yao, 2000; Zhang et al., 2008).
In NMPC, the system performance depends greatly on the accuracy of the predictive model and the efficiency of the online optimization algorithm. This section introduces a novel NMPC with the integration of SVM and a hybrid optimization algorithm called "EO-SQP," which combines the recently proposed heuristic EO and deterministic sequential quadratic programming (SQP) under the framework of MA. Inheriting the advantages of the two approaches, the proposed EO-SQP algorithm is capable of solving nonlinear programming (NLP) problems effectively. Furthermore, the hybrid EO-SQP algorithm is employed as the online solver of a multistep-ahead SVM model-based NMPC. Simulation results on a nonlinear multi-input multi-output (MIMO) continuous stirred tank reactor (CSTR) show considerable performance improvement obtained by the proposed NMPC. This work focuses on the development of a novel NMPC framework, which combines a multistep-ahead SVM predictive model with an EO-SQP-based online optimizer.
Here, k refers to the sampling time point; y and u are the output and input variables; and ny and nu refer to the output and input lags, respectively.
The primary purpose of MPC is to deal with complex dynamics over an extended
horizon. Thus, based on the currently measured system outputs and future inputs,
the model in MPC must predict the future outputs over a predefined prediction
horizon. However, the NARMAX model described in Equation 7.27 can only pro-
vide a one-step-ahead prediction. If a wider prediction horizon is required, this
problem can be handled by a procedure known as multistep-ahead prediction: the
input vector’s components, previously composed of actual sample points of the time
series, are gradually replaced by predicted values. By feeding back the model out-
puts, the one-step-ahead predictive model can be recurrently cascaded to itself to
generate future predictions for process output, as described below:
$$y_{k+i} = \hat{y}(k+i \mid k) + e(k+i \mid k) \qquad (7.28)$$

where e(k + i | k) represents the model uncertainty and unmeasured process disturbances at time step (k + i).
In this study, the RBF is adopted as the SVM kernel function. The dynamic modeling task in NMPC can then be formulated as the problem of finding an SVM model that can predict the future trajectory of the plant over a prediction horizon:
$$\hat{y}_{k+i} = \sum_{j=1}^{\#SV} \alpha_j \exp\left(-\frac{d_{j,k+i}}{2\sigma^2}\right) + b + e(k+i \mid k), \quad i = 0, 1, 2, \ldots, P-1 \qquad (7.29)$$
Here, #SV in Equation 7.29 denotes the number of support vectors, and P is the prediction horizon in NMPC.
The dj,k+i in Equation 7.29 can be further described in the following equation:
$$d_{j,k+i} = \sum_{n=0}^{\min(i,\,n_y)} \left(x_{j,n_u+n+1} - \hat{y}_{k+i-n}\right)^2 + \sum_{n=i+1}^{n_y} \left(x_{j,n_u+n+1} - y_{k+i-n}\right)^2 + \sum_{n=0}^{n_u} \begin{cases} \left(x_{j,n+1} - u_{k+i-n}\right)^2, & i - n_u < n \\ \left(x_{j,n+1} - u_{k+n_u}\right)^2, & i - n_u > n \end{cases} \qquad (7.30)$$
The NMPC based on SVM can then be formulated into an online constrained
optimization problem as follows:
$$\min_{U(k)} J\left(U(k), Y(k)\right)$$

subject to:

$$Y(k) = \left[y_k, y_{k+1|k}, \ldots, y_{k+i|k}\right], \quad y_{k+i|k} = g\left(U_{k+i}, Y_{k+i-1}\right), \quad i \in [0, P-1]$$

and

$$u_{\min} \le u_k \le u_{\max}, \quad y_{\min} \le y_k \le y_{\max}$$
the desired trajectory over prediction horizon and control energy over control
horizon:
$$J = \sum_{i=1}^{P} \left\| \hat{y}_{k+i} - r \right\|_{Q_i} + \sum_{j=1}^{M-1} \left\| \Delta u_{k+j} \right\|_{R_j} \qquad (7.32)$$
where r represents the desired set point. The relative importance of the objective
function contributions is controlled by setting the weighting matrices Qi and Rj.
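Reading ‖v‖_Q in Equation 7.32 as the weighted quadratic form vᵀQv, the cost can be evaluated as in the following sketch; the variable names and array shapes are assumptions for illustration:

```python
import numpy as np

def nmpc_cost(y_pred, r, du, Q, R):
    """Objective of Equation 7.32 (sketch): tracking error over the
    prediction horizon plus control energy over the control horizon.

    y_pred: predicted outputs, shape (P, ny)
    r:      desired set point, shape (ny,)
    du:     control moves Delta-u, shape (M-1, nu)
    Q, R:   sequences of per-step weighting matrices
    """
    tracking = sum((y - r) @ Qi @ (y - r) for y, Qi in zip(y_pred, Q))
    effort = sum(d @ Rj @ d for d, Rj in zip(du, R))
    return tracking + effort
```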
Figure 7.10 Structure of NMPC with integration of SVM model and EO-SQP
optimization. (Adapted from Chen, P. and Lu, Y. Z. 2010b. Nonlinear model
predictive control with the integration of support vector machine and extremal
optimization. Proceedings of the 8th World Congress on Intelligent Control and
Automation, pp. 3167–3172.)
$$u_i(k) = \begin{cases} z_i(k), & \text{with probability } p_s \\ u_i(k-1), & \text{with probability } 1 - p_s \end{cases}, \quad k > 1; \qquad u_i(1) = z_i(1) \qquad (7.33)$$
where zi(k) is a random step input sequence with any desired distribution (e.g., uniform, Gaussian); the resulting inputs ui(k) are fed into the plant to obtain the open-loop input/output time sequences y(k), u(k).
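A minimal generator for such an excitation sequence might look as follows; the uniform distribution and the 0 to 2 bounds (matching the input constraints of the CSTR example below) are assumptions for this sketch:

```python
import numpy as np

def random_step_sequence(N, p_s, low=0.0, high=2.0, seed=0):
    """Excitation signal of Equation 7.33: hold the previous input, or
    switch to a fresh random level z_i(k) with probability p_s."""
    rng = np.random.default_rng(seed)
    u = np.empty(N)
    u[0] = rng.uniform(low, high)          # u_i(1) = z_i(1)
    for k in range(1, N):
        if rng.random() < p_s:
            u[k] = rng.uniform(low, high)  # jump to a new random level
        else:
            u[k] = u[k - 1]                # hold the previous value
    return u
```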
Min and Lee (2005) stated that the optimal parameter search on SVM plays a crucial role in building a prediction model with high accuracy and stability. Therefore, to construct an efficient SVM model with the RBF kernel, two extra parameters, sigma and gamma, have to be carefully tuned. This study proposes an EO-based SVM multistep predictive model, namely, EO-SVM, in which EO is adopted to search for the optimal values of sigma and gamma. Following the multistep-ahead prediction procedure described in Section 7.4.1, the generalization performance over the prediction horizon is evaluated by
$$\text{MSE}_k = \sum_{i=1}^{P-1} e_{k+i}^2 = \sum_{i=1}^{P-1} \left(\hat{y}_{k+i} - y_{k+i}\right)^2 \qquad (7.34)$$
where ŷk+i represents the model prediction trajectory over the whole prediction horizon, as defined in Equation 7.28. The tuning process of the proposed EO-SVM is illustrated in Figure 7.11.
Figure 7.11 Tuning process of the proposed EO-SVM.

Figure 7.12 Illustration of the horizon-based EO mutation over the control horizon.
optimization of NMPC, because the system dynamics depend heavily on the input and state series, as described in Equation 7.28. Thus, mutation of a single component (the controller output at a single time instant) does little to improve the system performance. For this reason, this study proposes a novel "horizon-based EO mutation" specially designed for NMPC. The horizon-based mutation updates all the components within a flexible window, whose width is decided by the position of the currently changed decision variable and the control horizon. For example, the decision variables between the current mutation position and the end of the control horizon are updated simultaneously, as described in Figure 7.12. This strategy helps to reduce unnecessary variations of the optimized control variables and consequently suppresses oscillation of the system outputs.
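A sketch of this mutation is given below; uniform resampling within the input bounds is an assumption, and `j` stands for the rank-selected mutation position:

```python
import numpy as np

def horizon_based_mutation(u, j, rng, u_min=0.0, u_max=2.0):
    """Horizon-based EO mutation (sketch): instead of perturbing the
    single control move u[j], resample every move from position j to
    the end of the horizon together, which suppresses oscillation of
    the optimized input profile."""
    u_new = u.copy()
    u_new[j:] = rng.uniform(u_min, u_max, size=u.shape[0] - j)
    return u_new
```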
$$\frac{dx_1}{dt} = u_1 + u_2 - k_1\sqrt{x_1}$$

$$\frac{dx_2}{dt} = \left(C_{B1} - x_2\right)\frac{u_1}{x_1} + \left(C_{B2} - x_2\right)\frac{u_2}{x_1} - \frac{k_2\, x_2}{\left(1 + x_2\right)^2} \qquad (7.35)$$

$$y = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \text{s.t.} \quad 0 \le u_1 \le 2, \quad 0 \le u_2 \le 2 \qquad (7.36)$$

where k1 = 0.2, k2 = 1, CB1 = 24.9, and CB2 = 0.1. The plant has two inputs and two outputs and possesses strong nonlinearity.
Here, the NARMAX structure is used to model the nonlinear dynamical system in Equations 7.35 and 7.36.
In this case, two separate SVM models are employed to build the nonlinear mapping g(⋅) between inputs and outputs for y1 and y2, respectively. Based on Equation 7.33, 500 observation pairs are generated; 400 of them are used as learning data and the other 100 as the test data set. Since the generalization capability of the SVM model greatly depends on some essential parameters, the SVM parameters are tuned by EO to guarantee the multistep-ahead prediction accuracy of the predictive model, as described in Section 7.4.1 and Equation 7.34. The generalization capability of the optimized SVM predictive model is shown in Figure 7.13. It can be seen that the long-term recursive prediction by EO-SVM shows good agreement with the actual output.
The trained EO-SVM models, which show accurate predictions and good generalization properties over the whole prediction horizon, will be employed as the internal prediction model of the NMPC.
The effectiveness of the proposed NMPC framework is tested by simulating various control problem scenarios involving set-point tracking (cases 1–4 for Y1 and cases 1 and 3 for Y2) and disturbance rejection (cases 2 and 4 for Y2), and compared with the performance of an LMPC controller, which piecewise-linearizes the plant model at each optimization loop and uses a QP algorithm as the online optimizer. The simulation starts from (y1(0), y2(0)) = (100, 6.637) with the input constraints of Equation 7.36.
Figure 7.13 Long-term recursive prediction (10 steps ahead) of the nonlinear
process by EO-SVM model, starting from various initial states.
$$\text{ITAE} = \int_0^{\infty} t\left|e(t)\right| dt \qquad (7.38)$$
As shown in Figures 7.14 and 7.15 and Table 7.8, the EO-SQP-based NMPC performs better than LMPC in most cases, as it settles faster and has smaller overshoot and lower ITAE values. These results are expected, considering that the long-term accurate predictions of the EO-SVM model and the global search capability of the online EO-SQP optimizer result in more effective and efficient control actions. Moreover, Figure 7.15 shows that the manipulated variables of the LMPC controller tend to oscillate, while their NMPC counterparts stay close to the reference values, because the NMPC controller can handle the nonlinear process dynamics. Although the proposed EO-SQP-based NMPC requires more computational
Figure 7.14 EO-SQP-based NMPC tracking and regulatory performance on Y1 and Y2.
Figure 7.15 Performance of LMPC for tracking and disturbance rejection. (Chen,
P. and Lu, Y. Z. Memetic algorithms based real-time optimization for nonlin-
ear model predictive control. International Conference on System Science and
Engineering, Macau, China, pp. 119–124. © 2011 IEEE.)
Table 7.8 Quantitative Comparison of NMPC and LMPC under Tracking and Regulation Cases

Control Type (Optimization Time) | Variable | Performance Index | Case 1 | Case 2 | Case 3 | Case 4
NMPC (1.131 s) | Y1 | Overshooting (%) | 4.9 (tracking) | 14.6 (tracking) | 11.8 (tracking) | 13.1 (tracking)
NMPC (1.131 s) | Y1 | Settling time (s) | 24 | 42 | 42 | 42
NMPC (1.131 s) | Y1 | ITAE | 4.499 × 10³ | 5.031 × 10³ | 5.792 × 10³ | 7.021 × 10³
NMPC (1.131 s) | Y2 | Overshooting (%) | 0.5 (tracking) | – (regulation) | 1.8 (tracking) | – (regulation)
NMPC (1.131 s) | Y2 | Settling time (s) | 60 | 18 | 63 | 87
NMPC (1.131 s) | Y2 | ITAE | 8.548 × 10³ | 4.476 × 10³ | 7.289 × 10³ | 4.532 × 10³
LMPC (0.0032 s) | Y1 | Overshooting (%) | 6.4 (tracking) | 16 (tracking) | 12.9 (tracking) | 10.6 (tracking)
LMPC (0.0032 s) | Y1 | Settling time (s) | 36 | 33 | 30 | 33
LMPC (0.0032 s) | Y1 | ITAE | 1.841 × 10⁴ | 2.054 × 10⁴ | 2.245 × 10⁴ | 1.959 × 10⁴
LMPC (0.0032 s) | Y2 | Overshooting (%) | 15.2 (tracking) | – (regulation) | 37.2 (tracking) | – (regulation)
LMPC (0.0032 s) | Y2 | Settling time (s) | 204 | 72 | 552 | 255
LMPC (0.0032 s) | Y2 | ITAE | 7.63 × 10⁴ | 6.273 × 10⁴ | 10.698 × 10⁴ | 3.993 × 10⁴

Source: Chen, P. and Lu, Y. Z. Memetic algorithms based real-time optimization for nonlinear model predictive control. International Conference on System Science and Engineering, Macau, China, pp. 119–124. © 2011 IEEE.
time than the traditional LMPC, this is acceptable in this case because the average NMPC optimization time (1.131 s) is much smaller than the sample time (3 s). The dramatic improvement in tracking and regulatory performance makes the proposed approach appealing and promising for processes with high nonlinearity.
$$D(s) = K_P\left(1 + \frac{1}{T_I s} + T_D s\right) = K_P + \frac{K_I}{s} + K_D s \qquad (7.39)$$

where TI and TD are the integral time constant and derivative time constant, respectively; KP, KI, and KD are the proportional, integral, and derivative gains, respectively; KI = KP/TI and KD = KP TD.
$$U(s) = D(s)E(s) = K_P\left(1 + \frac{1}{T_I s} + T_D s\right)E(s) = K_P E(s) + \frac{K_I}{s}E(s) + K_D s E(s) \qquad (7.40)$$
where E(s) is the transfer function of the system error e(t). Furthermore, the continuous-time form of U(s) can also be written as the following equation:

$$u(t) = K_P e(t) + K_I \int_0^t e(\tau)\, d\tau + K_D \frac{de(t)}{dt} \qquad (7.41)$$
$$u(k) = K_P e(k) + K_I T_s \sum_{j=0}^{k} e(j) + \frac{K_D}{T_s}\left[e(k) - e(k-1)\right] \qquad (7.42)$$
$$G(s) = \begin{bmatrix} g_{11}(s) & \cdots & g_{1n}(s) \\ \vdots & \ddots & \vdots \\ g_{n1}(s) & \cdots & g_{nn}(s) \end{bmatrix} \qquad (7.43)$$

$$D(s) = \begin{bmatrix} d_{11}(s) & \cdots & d_{1n}(s) \\ \vdots & \ddots & \vdots \\ d_{n1}(s) & \cdots & d_{nn}(s) \end{bmatrix} \qquad (7.44)$$
$$d_{ij}(s) = K_{Pij} + \frac{K_{Iij}}{s} + K_{Dij}\, s, \quad \forall i, j \in \{1, 2, \ldots, n\} \qquad (7.45)$$
In most previous research work, the integral of absolute error (IAE) and the integral of time-weighted absolute error (ITAE) are generally used as indexes for measuring the performance of PID controllers (Ang et al., 2005). However, these indexes are still not sufficient to evaluate control performance comprehensively (Zhang et al., 2009). Here, a more reasonable performance index is presented by considering the following additional factors. The first is the introduction of the square of the controller output, that is, $\int_0^{\infty} w_2 u^2(t)\, dt$, in order to avoid exporting a large control value. Second, the rising time $w_3 t_u$ is used to evaluate the rapidity of the step response of a control system. The third, $\int_0^{\infty} w_4 \left|\Delta y(t)\right| dt$, is added to avoid a large overshoot value.
The objective function (also called fitness) evaluating the control performance
of a single-variable PID controller is defined as follows (Zhang et al., 2009):
$$\min J = \min \begin{cases} \displaystyle\int_0^{\infty} \left(w_1|e(t)| + w_2 u^2(t)\right)dt + w_3 t_u, & \text{if } \Delta y(t) \ge 0 \\[1ex] \displaystyle\int_0^{\infty} \left(w_1|e(t)| + w_2 u^2(t) + w_4|\Delta y(t)|\right)dt + w_3 t_u, & \text{if } \Delta y(t) < 0 \end{cases} \qquad (7.46)$$

where e(t) is the system error, Δy(t) = y(t) − y(t − Δt), u(t) is the control output at time t, tu is the rising time, w1, w2, w3, w4 are weight coefficients, and w4 ≫ w1.
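Reading the piecewise condition of Equation 7.46 pointwise in t, a discrete approximation of the fitness over a sampled step response can be sketched as follows; the array names and the sampling step are assumptions:

```python
import numpy as np

def pid_fitness(e, u, y, t_rise, dt, w1, w2, w3, w4):
    """Discrete approximation of Equation 7.46 for sampled trajectories
    e(t), u(t), y(t) with sampling step dt; t_rise is the rising time."""
    dy = np.diff(y, prepend=y[0])              # Delta y(t) = y(t) - y(t-dt)
    integrand = w1 * np.abs(e) + w2 * u ** 2   # tracking + control energy
    integrand += np.where(dy < 0.0, w4 * np.abs(dy), 0.0)  # overshoot term
    return float(integrand.sum() * dt + w3 * t_rise)
```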
The objective function that evaluates the control performance of a multivariable
PID controller is defined as follows:
$$\min J = \begin{cases} \displaystyle\min \int_0^{\infty} \left(w_1\sum_{i=1}^{n}|e_i(t)| + w_2\sum_{i=1}^{n}u_i^2(t)\right)dt + w_3\sum_{i=1}^{n}t_{ui}, & \text{if } \Delta y_i(t) \ge 0 \\[2ex] \displaystyle\min \int_0^{\infty} \left(w_1\sum_{i=1}^{n}|e_i(t)| + w_2\sum_{i=1}^{n}u_i^2(t) + w_4\sum_{i=1}^{n}|\Delta y_i(t)|\right)dt + w_3\sum_{i=1}^{n}t_{ui}, & \text{if } \Delta y_i(t) < 0 \end{cases} \qquad (7.47)$$
where ei(t) is the ith system error, Δyi(t) = yi(t) − yi(t − Δt), ui(t) is the ith control
output at the time t, tui is the rising time of the ith system output yi, w1, w2, w3, w4
are weight coefficients, and w4 ≫ w1.
$(k = 1, 2, \ldots, N) \qquad (7.48)$

$$e_i(k) = r_i(k) - y_i(k), \quad i = 1, 2, \ldots, n \qquad (7.49)$$

$$u_i(k) = K_{Pi}\, e_i(k) + K_{Ii} T_s \sum_{j=0}^{k} e_i(j) + \frac{K_{Di}}{T_s}\left[e_i(k) - e_i(k-1)\right], \quad i = 1, 2, \ldots, n \qquad (7.50)$$
Figure 7.18 Block diagram of a control system with BCEO-based PID controller.
(Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal
optimization for the design of PID controllers, 180–188, Copyright 2014, with
permission from Elsevier.)
corresponding to a specific ith PID controller; w1, w2, w3, w4 are weight coefficients, and w4 ≫ w1. In addition, the relationship between yi(k) and ui(k) is obtained by the following steps: first, compute Y(z)/U(z) = Z[G(s)] by the z-transform; then obtain the difference equation relating yi(k), yi(k − 1), … and ui(k), ui(k − 1), … by the inverse z-transform. Clearly, the special case n = 1 is just the fitness of a single-variable PID controller.
The block diagram of a control system with the BCEO-based PID controller is shown in Figure 7.18. The parameters KPi, KIi, and KDi of a PID controller are optimized by the proposed BCEO algorithm through an iterative process. More precisely, the basic idea behind the proposed BCEO-based PID design method is to encode the real-valued PID parameters, including KPi, KIi, and KDi, into a binary string; evaluate the control performance by Equations 7.48 through 7.52; select bad binary elements according to a probability distribution; mutate the selected elements by flipping them; and update the current solution by accepting the new solution after mutation unconditionally. The detailed description of the BCEO-based PID controller design algorithm is presented as follows (Zeng et al., 2014).
Input: A discrete-time model of controlled plant G(s) with sampling period Ts,
the number M of PID controllers, the length lj of jth binary substring correspond-
ing to jth parameter, the weight coefficients w1, w2, w3, w4 used for evaluating the
fitness, the maximum number of iterations Imax, the control parameter of the prob-
ability distribution P(k).
Output: The best solution Sbest (the best PID parameters K PO, K IO, and K DO) and
the corresponding global fitness Cbest.
2. For configuration S,
a. Generate the configuration Si by flipping the bit i (1 ≤ i ≤ L) and keeping
the others unchanged, then compute the fitness C(Si) based on Equations
7.48 through 7.52.
b. Evaluate the local fitness λi = C(Si) − Cbest for each bit i and rank all the bits according to λi, that is, find a permutation Π1 of the labels i such that $\lambda_{\Pi_1(1)} \ge \lambda_{\Pi_1(2)} \ge \cdots \ge \lambda_{\Pi_1(L)}$.
c. Select a rank П1(k) according to a probability distribution P(k), 1 ≤ k ≤ L
and denote the corresponding bit as xj.
d. Flip the value of xj and set Snew = S in which xj value is flipped.
e. If C(Snew) ≤ C(Sbest), then Sbest = Snew.
f. Accept Snew unconditionally.
3. Repeat step (2) until some predefined stopping criteria (e.g., the maximum
number of iterations) are satisfied.
4. Obtain the best solution Sbest and the corresponding global fitness Cbest.
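The steps above can be condensed into a short loop; the sketch below assumes a user-supplied `evaluate` function standing in for the fitness of Equations 7.48 through 7.52, and uses the power-law P(k) ∝ k^(−τ) for rank selection:

```python
import numpy as np

def bceo(evaluate, L, tau, max_iter, seed=0):
    """Minimal sketch of the BCEO loop (after Zeng et al., 2014)."""
    rng = np.random.default_rng(seed)
    S = rng.integers(0, 2, L)                 # random initial binary string
    S_best, C_best = S.copy(), evaluate(S)
    p = np.arange(1, L + 1, dtype=float) ** (-tau)
    p /= p.sum()                              # P(k) ~ k**(-tau) over ranks
    for _ in range(max_iter):
        lam = np.empty(L)                     # local fitness of each bit
        for i in range(L):
            S[i] ^= 1                         # flip bit i ...
            lam[i] = evaluate(S) - C_best     # ... and score the flip
            S[i] ^= 1                         # restore
        worst_first = np.argsort(-lam)        # rank bits, worst first
        j = worst_first[rng.choice(L, p=p)]   # power-law selected rank
        S[j] ^= 1                             # mutate the chosen bit
        C = evaluate(S)                       # accept S unconditionally
        if C <= C_best:
            S_best, C_best = S.copy(), C
    return S_best, C_best
```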
PBPSO (Menhas et al., 2012) | 8 | NP, Imax, lj, inertia weight factors wmax and wmin, Vmax, acceleration parameters c1, c2
tuned in PBPSO. Furthermore, BCEO has only selection and mutation operations from the perspective of an EA. Therefore, the BCEO-based PID design method is much simpler than the methods based on AGA, SOGA, and PBPSO. Additionally, the proposed BCEO is more effective and efficient than AGA, SOGA, and PBPSO, as demonstrated by the experimental results on some benchmark examples in Section 7.5.3.
$$G_1(s) = \frac{1.6}{s^2 + 2.584\, s + 1.6} \qquad (7.53)$$

$$G_2(s) = \frac{15}{50\, s^3 + 43\, s^2 + 3\, s + 1} \qquad (7.54)$$
AGA (Zhang et al., 2007) NP = 50, Imax = 100, lj = 10, Pc = 0.6, Pm = 0.01
SOGA (Zhang et al., 2009) NP = 50, Imax = 100, lj = 10, β = 0.7, Pc = 0.6, α = 4,
Tc = 50
PBPSO (Menhas et al., 2012) NP = 50, Imax = 100, lj = 16, wmax = 0.8 and
wmin = 0.8, Vmax = 50, c1 = 2, c2 = 2
Table 7.11 Comparative Performances of BCEO with the Reported Popular Evolutionary Algorithms for Single-Variable Plant 1

Algorithm | KP | KI | KD | BF | σ% | tu (s) | ess% | 0.1%tw (s) | T (s)
AGA | 15.2884 | 2.7566 | 3.4506 | 9.8644 | 0.21 | 0.65 | 0.05 | 5.80 | 52.2760
SOGA | 19.3900 | 4.1190 | 5.1510 | 14.2753 | 2.81 | 0.90 | 0.51 | > 10 | 51.3921
PBPSO | 19.9990 | 4.2174 | 3.9562 | 14.1507 | 3.78 | 0.65 | 0.35 | > 10 | 78.9850
BCEO (τ = 1.3) | 17.5171 | 2.6393 | 3.9296 | 9.7605 | 0.01 | 0.65 | 0.00 | 2.00 | 19.8290

Source: Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID controllers, 180–188, Copyright 2014, with permission from Elsevier.
Figure 7.19 (See color insert.) Comparison of step response for plant 1 under
different algorithms-based PID controllers. (Reprinted from Neurocomputing,
138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID
controllers, 180–188, Copyright 2014, with permission from Elsevier.)
The step responses for plant 2 under different algorithms-based PID control-
lers are shown in Figure 7.20. From the experimental results of AGA (Zhang et al.,
2007), SOGA (Zhang et al., 2009), PBPSO (Menhas et al., 2012), and BCEO
(Zeng et al., 2014) for plant 2 shown in Table 7.12, it is obvious that the transient
and steady-state performances obtained by BCEO for plant 2 are better than those
by AGA, SOGA, and PBPSO. Moreover, the BF and T of BCEO are also smaller
than those of AGA, SOGA, and PBPSO. In other words, the proposed BCEO-
based PID algorithm is also more effective and efficient than AGA, SOGA, and
PBPSO for plant 2.
$$G_m(s) = \begin{bmatrix} \dfrac{12.8\, e^{-s}}{1 + 16.7\, s} & \dfrac{-18.9\, e^{-3s}}{1 + 21\, s} \\[2ex] \dfrac{6.6\, e^{-7s}}{1 + 10.9\, s} & \dfrac{-19.4\, e^{-3s}}{1 + 14.4\, s} \end{bmatrix} \qquad (7.55)$$
Figure 7.20 (See color insert.) Comparison of step response for plant 2 under
different algorithms-based PID controller. (Reprinted from Neurocomputing,
138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID
controllers, 180–188, Copyright 2014, with permission from Elsevier.)
The steady-state decoupling matrix of the above controlled plant model is given
as follows:
$$D = G_m^{-1}(0) = \begin{bmatrix} 0.1570 & -0.1529 \\ 0.0534 & -0.1036 \end{bmatrix} \qquad (7.56)$$
Table 7.12 Comparative Performances of BCEO with the Reported Popular Evolutionary Algorithms for Single-Variable Plant 2

Algorithm | KP | KI | KD | BF | σ% | tu (s) | ess% | 0.1%tw (s) | T (s)
AGA | 1.3294 | 0.1955 | 4.6921 | 108.0177 | 22.97 | 3.75 | 0.00 | 43.20 | 64.8280
SOGA | 2.98 | 0.096 | 12.7 | 156.5308 | 8.45 | 8.45 | 0.45 | > 100 | 62.7620
PBPSO | 4.0043 | 0.0355 | 10.0 | 165.4567 | 11.13 | 5.70 | 0.24 | > 100 | 125.1520
BCEO | 1.7986 | 0.0196 | 6.3441 | 92.7998 | 0.50 | 7.75 | 0.00 | 30.0 | 41.7960

Source: Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID controllers, 180–188, Copyright 2014, with permission from Elsevier.
Table 7.13 Multivariable PID Controller Parameters of AGA, PBPSO, and BCEO for Multivariable Plant with Decoupler

Algorithm | KP1 | KI1 | KD1 | KP2 | KI2 | KD2 | BF | T (s)
Table 7.14 Comparative Performance of BCEO with AGA and PBPSO for Multivariable Plant with Decoupler

Algorithm | σ1% | tu1 | ess1% | 0.5%tw1 | σ2% | tu2 | ess2% | 0.5%tw2
AGA | 0.00 | 18.0 | 0.00 | 30 | 0.00 | 17.0 | 0.31 | 34.0
PBPSO | 0.40 | 11.8 | 0.02 | 22 | 0.60 | 10.5 | 0.44 | 25.0
BCEO | 2.98 | 6.0 | 0.00 | 11 | 2.49 | 4.0 | 0.27 | 10.0

Source: Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID controllers, 180–188, Copyright 2014, with permission from Elsevier.
Figure 7.21 (See color insert.) Comparison of output y1 (a) and y2 (b) under dif-
ferent algorithms-based PID controllers for multivariable plant with decoupler.
(Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal
optimization for the design of PID controllers, 180–188, Copyright 2014, with
permission from Elsevier.)
Table 7.15 Effects of Parameter τ on the Performances of BCEO for Single-Variable Plant 2

τ | KP | KI | KD | BF | σ% | tu (s) | ess% | 0.1%tw (s) | T (s)
1.10 | 2.1896 | 0.2933 | 6.4614 | 106.9520 | 24.12 | 3.15 | 0.00 | 28.60 | 43.1100
1.15 | 1.8768 | 0.0196 | 6.1877 | 94.0765 | 1.08 | 7.55 | 0.09 | 13.85 | 42.9370
1.20 | 1.7400 | 0.2346 | 5.3568 | 103.8372 | 23.19 | 3.35 | 0.07 | 27.25 | 43.1090
1.25 | 2.5024 | 0.0196 | 8.5826 | 97.8417 | 1.20 | 6.90 | 0.02 | 15.75 | 42.1880
1.30 | 1.7986 | 0.0196 | 6.3441 | 92.7998 | 0.50 | 7.75 | 0.00 | 18.40 | 41.7960
1.35 | 2.5024 | 0.4301 | 7.4682 | 109.8545 | 28.25 | 3.00 | 0.00 | 22.10 | 42.5630
1.40 | 2.5024 | 0.0196 | 8.5826 | 97.8417 | 1.20 | 6.90 | 0.02 | 15.75 | 42.9690
1.45 | 2.8152 | 0.0196 | 9.4233 | 101.9073 | 1.78 | 6.55 | 0.04 | 15.05 | 42.8900
1.50 | 2.5024 | 8.5142 | 8.2893 | 97.8486 | 1.23 | 6.85 | 0.03 | 15.85 | 43.6410

Source: Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded extremal optimization for the design of PID controllers, 180–188, Copyright 2014, with permission from Elsevier.
Figure 7.22 (See color insert.) Adjustable parameter τ versus the step response
for single-variable plant 2. (Reprinted from Neurocomputing, 138, Zeng, G. Q.
et al., Binary-coded extremal optimization for the design of PID controllers, 180–
188, Copyright 2014, with permission from Elsevier.)
7.6 Summary
As one of the most popular APC solutions, the MPC has been widely applied in
process industries during recent years. Due to the rapid development of industry
and technology, control performance requirements for large-scale, complicated,
and uncertain systems keep rising. Basically, the issues mentioned above mainly involve solving various types of complicated nonlinear optimization problems. The LMPC, which usually relies on a linear dynamic model, is inadequate for the abovementioned problems, and its limitations become more and more obvious. NMPC appears to be a promising method. However, the introduction of a nonlinear predictive model brings some difficulties, such as the parameter/structure selection of the nonlinear predictive model and the online receding-horizon optimization. All these issues encourage researchers and practitioners to devote themselves to the development of novel optimization methods suitable for application in NMPC.
This chapter starts with a general review of two key research issues in modern control: the prediction model and online optimization. Based on the methods in Chapter 5, we first apply the proposed EO-LM methods to BOF endpoint quality prediction. Then a new SVR tuning method, EO-SVR, is introduced for the automatic optimization of the SVR kernel type and its parameters, to provide better generalization performance and a lower forecasting error. Finally, a novel NMPC with the integration of an EO-SVR prediction model and "EO-SQP"-based receding horizon optimization is proposed. Simulation results on a nonlinear MIMO CSTR show considerable performance improvement obtained by the proposed NMPC. The main topics studied in this chapter are the EO-LM-based endpoint prediction model, the EO-SVR-based kernel and parameter optimization, and the EO-SQP-based NMPC framework.
EO for Production Planning and Scheduling
8.1 Introduction
To make a manufacturing enterprise more competitive and profitable in the global marketplace, the profit-driven "make-to-order" or "make-to-stock" business model has been applied widely in manufacturing management. Among multidimensional business and production decisions, computer-aided production planning and scheduling, which optimizes desired business objectives subject to multiple sophisticated constraints, has been one of the most important. In general, many production-scheduling problems can be formulated as constrained combinatorial optimization models that are usually NP-hard, particularly in large-scale real-world applications. This kind of scheduling problem is generally difficult to solve with traditional optimization techniques. Consequently, many approximation methods, for example, metaheuristics, have become the major approaches to such constrained COPs. Although approximation algorithms do not guarantee optimal solutions, they can attain near-optimal solutions within reasonable computation time.
In this chapter, hybrid EAs with integration of GA and EO are proposed
to solve a typical sequence-dependent scheduling problem, which is illustrated
with the production scheduling of a hot-strip mill (HSM) in the steel industry.
In general, the manufacturing system under study has the following major features:
Figure 8.1 A typical hot-rolling production line. (Reprinted from Computers and
Operations Research, 39 (2), Chen, Y. W. et al., Development of hybrid evolution-
ary algorithms for production scheduling of hot strip mill, 339–349, Copyright
2012, with permission from Elsevier.)
further reduce the gauge and width of slabs to the desired specifications. After passing through the water cooler and the coiler, the raw slabs are finally converted into finished hot-rolled products with the desired gauge (1.5–12 mm), width, mechanical and thermal properties, and surface quality.
Figure 8.2 “Coffin-shape” width profile of a rolling round and the composi-
tion of a campaign. (Reprinted from Computers and Operations Research, 39 (2),
Chen, Y. W. et al., Development of hybrid evolutionary algorithms for produc-
tion scheduling of hot strip mill, 339–349, Copyright 2012, with permission from
Elsevier.)
As the warm-up section consists of only a few slabs, it can be scheduled easily by various heuristic methods. In the following sections, we consider only the body section of the HSM scheduling unless explicitly stated otherwise. In terms of their relative importance, the HSM-scheduling constraints can be classified into hard constraints and soft constraints. Hard constraints refer to those that cannot be violated in the final scheduling solution. In the HSM-scheduling problem, the main hard constraints are as follows:
1. A rolling round has a “coffin-shape” width profile, which starts with a warm-
up section having the width pattern of narrow to wide, and follows a body
section having the width pattern of wide to narrow.
2. The number of coils in a rolling round has lower and upper bounds due to the
capacity of the rollers.
3. The changes in dimensions and hardness between adjacent coils should be
smooth in both warm-up and body sections. This is because the rollers need
to be adjusted for controlling the dimension and hardness jumps on the entry
and outlet of mills.
4. The number of coils with the same width to be processed consecutively has an
upper bound in a rolling round. Otherwise, the edges of slabs mark the rollers
easily. In practice, this is also called groove constraint.
5. The number of short slabs in a rolling round has an upper bound, because
reheating furnaces are generally designed for normal slabs and short ones are
only used to fill in gaps.
i, j indexes of coils
N total number of coils to be produced for manufacturing orders
wi, gi, li, hi width, gauge, length, and hardness of coil i
ui finished temperature of coil i at the outlet of the finishing mill
si short-slab indicator, that is, si = 1 if the raw slab producing coil i is shorter than a predefined length, and si = 0 otherwise
Since this chapter aims to solve the two subtasks of order selection and sequencing simultaneously, another decision variable xii is defined by analogy with the prize-collecting TSP model (Lopez et al., 1998), with xii = 1 indicating that coil i is not selected in the current rolling round.
The sequence-dependent transition cost cij can be defined as $c_{ij} = p_{ij}^w + p_{ij}^g + p_{ij}^h + p_{ij}^t$, where $p_{ij}^w$, $p_{ij}^g$, $p_{ij}^h$, and $p_{ij}^t$ represent the width, gauge, hardness, and finished-temperature transition costs, respectively. As discussed above, the transition cost can be measured by the jump values of the parameters of every two adjacent coils. For example, the finished-temperature transition cost can be measured by a function $p_{ij}^t = p^t(u_j - u_i)$. In practice, $p_{ij}^w$, $p_{ij}^g$, and $p_{ij}^h$ are usually defined by the penalty structure shown in Table 8.1.
In the HSM scheduling, the optimization objective can be decomposed into two parts according to the information requirements of evaluating fitness (Han, 2005): (1) the local evaluation function (LEF), which can be calculated using only local information; it is defined as $LEF(S) = \sum_{i=0}^{n}\sum_{j=0}^{n} c_{ij} x_{ij}$, which consists of the sequence-dependent transition costs and the nonexecution penalties (Refael and Mati, 2005); (2) the global evaluation function (GEF), which needs to consider the overall configuration information of a solution. The E/T penalties can be included in this part, and therefore the GEF is defined as $GEF(S) = \sum_{i=1}^{n}(e_i + r_i)$, where $e_i = \max\{0, d_i - t_i - p_i\}$ and $r_i = \max\{0, t_i + p_i - d_i\}$. To generate a rolling sequence, a virtual coil is added to the set of manufacturing orders; it has no processing time and no sequence-dependent transition cost. A constraint is added to select the virtual coil as the starting node of a rolling round. As a result, the mathematical HSM-scheduling model can be formulated as follows:
Minimize

$$\lambda \sum_{i=0}^{n}\sum_{j=0}^{n} c_{ij} x_{ij} + (1 - \lambda)\sum_{i=1}^{n}\left(e_i + r_i\right) \qquad (8.1)$$

subject to

$$\sum_{i=0}^{n} x_{ij} = 1, \quad j = 0, \ldots, n \qquad (8.2)$$

$$\sum_{j=0}^{n} x_{ij} = 1, \quad i = 0, \ldots, n \qquad (8.3)$$

$$L_l \le \sum_{i=0}^{n} l_i\left(1 - x_{ii}\right) \le L_u \qquad (8.4)$$

$$\sum_{i=0}^{n} \left(1 - x_{ii}\right)\operatorname{sgn}\left(w_i - \upsilon_k\right) \le N_{\upsilon}, \quad \forall \upsilon_k \in V \qquad (8.5)$$

$$\sum_{i=0}^{n} s_i\left(1 - x_{ii}\right) \le N_s \qquad (8.6)$$

$$x_{00} = 0 \qquad (8.7)$$

$$t_0 = T_s \qquad (8.8)$$

$$t_j = \sum_{i=0}^{n} x_{ij}\left(t_i + p_i\right) + x_{jj}\left(t_0 + T\right), \quad j = 1, \ldots, n \qquad (8.9)$$

$$\operatorname{sgn}(x) = \begin{cases} 1, & \text{if } x = 0 \\ 0, & \text{otherwise} \end{cases} \qquad (8.11)$$
Equation 8.1 calculates the optimization objective F(S) of any scheduling solution S, where the parameter λ (0 ≤ λ ≤ 1) weights the relative importance of the two optimization parts. Equations 8.2 and 8.3 indicate that each coil is processed only once or is not selected in the current rolling round. Equation 8.4 specifies the minimal and maximal capacities of a rolling round. Equations 8.5 and 8.6 bound the maximal numbers of same-width coils and of short slabs, respectively. Equations 8.7 and 8.8 select the virtual coil as the starting node of a rolling round. Equation 8.9 establishes the relationship between the variables tj and xij, where the constant T is greater than the total processing time of a rolling round; it means that if coil j is not selected in the current rolling round, its processing time is no earlier than the starting time of the next rolling round.
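For illustration, the objective of Equation 8.1 can be evaluated for a given round as in the sketch below; it keeps only the transition-cost part of the LEF (nonexecution penalties omitted) and assumes ti is the starting time of coil i:

```python
def hsm_objective(seq, c, p, d, t0, lam):
    """Sketch of F(S) in Equation 8.1 for one rolling round.

    seq: rolling sequence of selected coil IDs
    c:   c[i][j], sequence-dependent transition costs
    p:   p[i], processing time of coil i
    d:   d[i], due date of coil i
    t0:  starting time of the round; lam: weight between LEF and GEF
    """
    lef = sum(c[i][j] for i, j in zip(seq, seq[1:]))
    gef, t = 0.0, t0
    for i in seq:
        gef += max(0.0, d[i] - t - p[i])   # earliness e_i
        gef += max(0.0, t + p[i] - d[i])   # tardiness r_i
        t += p[i]                          # next coil starts at t_i + p_i
    return lam * lef + (1 - lam) * gef
```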
Obviously, the scheduling model above is an NP-hard constrained COP. Considering a simple case, suppose that M coils out of N need to be scheduled into a rolling round; there are N!/(N − M)! possible solutions without taking any constraints into account. If there are thousands of manufacturing orders, the complete enumeration of all possible solutions is computationally prohibitive; that is, no exact algorithm can solve the optimization problem within reasonable computation time. Frequently, EAs as promising approximate techniques can be
is to select m coils from the n coils and generate an optimized rolling sequence. An example of a chromosome is shown in Figure 8.3.

Slot#: 1 2 3 4 5 6 7 8 9
Chromosome: 5 8 9 12 7 11 2 10 3

Figure 8.3 Example of a chromosome.

The vector [5, 8, …, 3] denotes a possible rolling round, and each figure in the vector represents a particular coil ID. This chromosome can represent a possible scheduling solution, in which the rolling round consists of nine coils, but the total number of coils can be more than 12.
Step 1. Identify available slabs for the body section within the current scheduling time horizon.
Step 2. Sequence the corresponding coils in descending width and ascending due date (i.e., the coils are first sequenced in descending width, and those with the same width are further sequenced in ascending due date).
Step 3. Group the coils with the same width, and calculate the number of groups Nw.
Step 4. Select Nv coils into a body set from each group sequentially, and calculate the number of selected coils Ntemp in the body set.
Step 5. If Ntemp is less than the required number Nbody for a body section, no feasible body section can be generated with the existing manufacturing orders; go to step 6. Else, generate a body section by sequentially selecting coils from the body set with step size Ntemp/Nbody.
Step 6. Stop the heuristic, and return the scheduling output.
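A compact sketch of steps 1 through 6 follows; the tuple layout of the coil records and the integer stride are illustrative assumptions:

```python
def build_body_section(coils, N_v, N_body):
    """Body-section heuristic (sketch). coils: list of
    (coil_id, width, due_date); N_v caps picks per same-width group;
    N_body is the required body-section size."""
    # Step 2: descending width, ties broken by ascending due date
    coils = sorted(coils, key=lambda c: (-c[1], c[2]))
    # Steps 3-4: walk the width groups, taking up to N_v coils from each
    body_set, last_w, taken = [], None, 0
    for cid, w, dd in coils:
        if w != last_w:
            last_w, taken = w, 0
        if taken < N_v:
            body_set.append(cid)
            taken += 1
    # Step 5: feasibility check, then sample with stride N_temp / N_body
    if len(body_set) < N_body:
        return None                      # no feasible body section
    stride = len(body_set) // N_body
    return body_set[::stride][:N_body]
```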
Second, the nearest-neighbor search method, which first chooses a starting coil
and then selects the next least-cost coil to the provisional sequence iteratively, is
used to generate a proportion of individuals in the initial population.
Finally, the random insertion method is used to generate all other individuals.
Starting from a randomly selected coil, a body section is generated by selecting the
next unscheduled coil randomly and then inserting it in the least-cost position of
the provisional sequence.
probability is defined as $p_i = c(1-c)^{i-1}$, where c denotes the selection pressure and i is the rank number of a chromosome in the whole population (Michalewicz, 1996).
To generate new individuals, one parent is chosen from the feasible solution
pool by the roulette-wheel method, and another parent is selected from the current
population using a niche technique. The niche technique ensures a minimum dif-
ference between every two parent chromosomes and further maintains the genetic
diversity of the new population. Thus, a similarity coefficient is defined as cij = sij/n,
where n is the chromosome size, sij is the number of identical genes between chro-
mosomes i and j. When a parent chromosome i is selected, only the chromosome j,
whose similarity coefficient cij with the chromosome i is not higher than a predeter-
mined value c0, has the possibility to be selected as the second parent chromosome.
8.3.1.4.2 Crossover
The crossover operator transforms parent chromosomes for finding better child
chromosomes. Partially matched crossover (PMX) and ordered crossover (OX) have
been proved as effective crossover schemes for integer chromosomes (Larrañaga
et al., 1999). In the HSM-scheduling problem, the integral range is equal to, or
greater than the chromosome length. As a result, the genes in a child chromosome
are not completely homologous to that of its parents, and the crossover operator in
the MGA is also not completely identical to that of the standard PMX and OX.
Here, we present a simple example to illustrate the PMX operator in the MGA.
Given two parent chromosomes S1 and S2:
S1: 5 − 8 − 9 − |12 − 7 − 11| − 2 − 10 − 3
S2: 7 − 6 − 11 − |1 − 9 − 10| − 5 − 4 − 8

First, two cut points are chosen at random, and the genes bounded by the cut points are exchanged. As a result, each chromosome receives some new partial genetic information from the other. The intermediate structures of the new solutions are

S1′: 5 − 8 − 9 − |1 − 9 − 10| − 2 − 10 − 3
S2′: 7 − 6 − 11 − |12 − 7 − 11| − 5 − 4 − 8

However, these two intermediate solutions are not necessarily feasible because some genes are repeated. The repeated genes can be replaced by mapping |12 − 7 − 11| to |1 − 9 − 10|. Then, two new solutions are generated as follows:

S1′: 5 − 8 − 7 − |1 − 9 − 10| − 2 − 11 − 3
S2′: 9 − 6 − 10 − |12 − 7 − 11| − 5 − 4 − 8
One can see that the gene 1 is not included in the parent chromosome S1, but it
appears in its child chromosome S1′ after the PMX operation.
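The following sketch reproduces this PMX variant together with the worked example; the mapping-chain repair handles gene values that lie outside the other parent:

```python
def pmx(p1, p2, cut1, cut2):
    """PMX for chromosomes whose gene values may exceed the chromosome
    length (sketch): swap the bounded segments, then repair repeated
    genes outside the cut points through the segment mapping."""
    def repair(child, segment, mapping):
        for idx in list(range(cut1)) + list(range(cut2, len(child))):
            g = child[idx]
            while g in segment:          # follow the mapping chain
                g = mapping[g]
            child[idx] = g
        return child

    seg1, seg2 = p1[cut1:cut2], p2[cut1:cut2]
    c1 = p1[:cut1] + seg2 + p1[cut2:]    # exchange the bounded segments
    c2 = p2[:cut1] + seg1 + p2[cut2:]
    c1 = repair(c1, set(seg2), dict(zip(seg2, seg1)))
    c2 = repair(c2, set(seg1), dict(zip(seg1, seg2)))
    return c1, c2

S1 = [5, 8, 9, 12, 7, 11, 2, 10, 3]
S2 = [7, 6, 11, 1, 9, 10, 5, 4, 8]
print(pmx(S1, S2, 3, 6))
# ([5, 8, 7, 1, 9, 10, 2, 11, 3], [9, 6, 10, 12, 7, 11, 5, 4, 8])
```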
Step 1. Sequence the selected coils in the current solution according to the
descending width and ascending due date.
Step 2. If the groove constraint is satisfied, go to step 3. Else, delete a coil whose width violates the groove constraint, randomly select a new coil with a different width from the manufacturing orders, insert it into the current rolling round at the least-cost position, and go to step 2.
Step 3. If the short-slab constraint is satisfied, go to step 4. Else, delete a short slab from the rolling round, randomly select a nonshort slab from the manufacturing orders, insert it into the current sequence at the least-cost position without violating the groove constraint, and then go to step 3.
Step 4. Stop this repair strategy, and return the scheduling solution.
Figure 8.4 Example of the Or-opt exchange: moving chain (π(i + 1), π(j)) to the
position between π(k) and π(k + 1). (Reprinted from Computers and Operations
Research, 39 (2), Chen, Y. W. et al., Development of hybrid evolutionary algo-
rithms for production scheduling of hot strip mill, 339–349, Copyright 2012, with
permission from Elsevier.)
Figure 8.5 Convergence curve of the modified GA. (Reprinted from Computers
and Operations Research, 39 (2), Chen, Y. W. et al., Development of hybrid
evolutionary algorithms for production scheduling of hot strip mill, 339–349,
Copyright 2012, with permission from Elsevier.)
After certain criteria are satisfied, the MGA terminates the iterative process and reports the best schedule solution found so far. The termination criterion can be a certain number of generations (Gen) or a given computation time. In this study, the algorithms are coded in C++ and compiled with MS Visual Studio 6.0. The initial parameters are set as follows: population size Pop = 200, selection pressure c = 0.1, similarity coefficient threshold c0 = 0.5, crossover probability pc = 0.95, and mutation probability pm = 0.05. By simulating a set of production data collected from a hot-rolling mill, the convergence curve of the proposed MGA is shown in Figure 8.5. One can see that the proposed MGA converges to a satisfactory solution within a few hundred generations.
8.3.2.1 Introduction to EO
EO is inspired by the self-organized critical models in ecosystems (Boettcher and
Percus, 1999). For a suboptimal solution, the EO algorithm eliminates the com-
ponents with an extremely undesirable performance and then replaces them with
randomly selected new components. Finally, a better solution may be generated by
repeating such kinds of local search process. For a general minimization problem,
the basic EO algorithm proceeds as follows:
Step 1. Initialize parameters and obtain an initial solution S, which can be inher-
ited from other algorithms, and then calculate the optimization objective
F(S), set Sbest = S.
Step 2. For the current solution S, evaluate the local fitness λi for all scheduled
coils and rank them according to their fitness values.
Step 3. Select a coil c(s) according to the power-law distribution pk(τ0), where τ0
is a given parameter value.
Step 4. Choose the best solution S′ from a neighboring subspace N(S) of the
current solution S.
Step 5. If F(S′) < F(Sbest), then set Sbest ← S′.
Step 6. Accept S ← S′ unconditionally.
Step 7. If termination criteria are not satisfied, go to step 2; else go to the next step.
Step 8. Return Sbest and F(Sbest).
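The power-law selection in step 3 is the only EO-specific ingredient; a minimal helper might look like this:

```python
import numpy as np

def powerlaw_rank_pick(n, tau, rng=None):
    """Select a rank k in {1, ..., n} with probability p_k ~ k**(-tau);
    rank 1 is the component with the worst local fitness."""
    rng = rng or np.random.default_rng()
    p = np.arange(1, n + 1, dtype=float) ** (-tau)
    return int(rng.choice(n, p=p / p.sum())) + 1
```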
Note that in step 4, the subspace N(S) can be generated through various strat-
egies. In this chapter, the “route-improvement” algorithm that is similar to the
perturbation moves (Cowling, 2003) is presented to generate the neighboring
subspace. We take a scheduling solution S and improve it by slight perturbations.
The perturbation is iterated until no further improvement is possible, and then the
local optimum S′ is obtained. The perturbation processes can be described as follows:
Figure 8.6 Plot of the objective function value over the parameter τ under a
number of scheduling instances. (Reprinted from Computers and Operations
Research, 39 (2), Chen, Y. W. et al., Development of hybrid evolutionary
algorithms for production scheduling of hot strip mill, 339–349, Copyright 2012,
with permission from Elsevier.)
Using the solution generated by the previous MGA as the initial solution, the
τ-EO algorithm has the convergence curve as shown in Figure 8.7.
It is obvious that the algorithm can improve the initial solution significantly
and converges into a solution within 2000 generations. Note that the optimization
process takes less than 45 s.
Figure 8.7 Convergence curve of the τ-EO algorithm.
Figure 8.8 Local fitness comparison between the MGA and the GEO-1.
(Reprinted from Computers and Operations Research, 39 (2), Chen, Y. W. et al.,
Development of hybrid evolutionary algorithms for production scheduling of hot
strip mill, 339–349, Copyright 2012, with permission from Elsevier.)
Figure 8.9 Fitness comparison of a multisolution generated by the MGA and the
GEO-2. (Reprinted from Computers and Operations Research, 39 (2), Chen, Y. W.
et al., Development of hybrid evolutionary algorithms for production scheduling
of hot strip mill, 339–349, Copyright 2012, with permission from Elsevier.)
Figure 8.10 Convergence curves of GEO-1, GEO-2, and GEO-3.
Figure 8.11 Framework of the developed HSM-scheduling system, linking the enterprise legacy/ERP system, manufacturing orders and scheduling scenarios, the optimizer with its model database and parameter settings, statistical analysis, and the HSM shop floor.
Step 3. Based on the site-specific production requirements, the planner can con-
figure the constraint settings.
Step 4. Under the preconfigured conditions, the planner starts the optimization
engine to generate scheduling solutions.
Step 5. Evaluate the generated scheduling solution. The planner can check the
scheduling solutions by viewing the graphical width (or gauge, hardness, etc.)
patterns and the statistical analysis results. The planner can also edit a specific
rolling round by using graphical user interfaces, such as inserting, replacing,
moving, or deleting slabs from a rolling round, or rescheduling some rolling
rounds in the existing scheduling scenario.
Figure 8.12 (See color insert.) Main graphical user interface of the developed
HSM-scheduling system. (Reprinted from Computers and Operations Research,
39 (2), Chen, Y. W. et al., Development of hybrid evolutionary algorithms for
production scheduling of hot strip mill, 339–349, Copyright 2012, with permission
from Elsevier.)
Figure 8.13 Width, gauge, and hardness transition patterns of a rolling round
example. (Reprinted from Computers and Operations Research, 39 (2), Chen, Y. W.
et al., Development of hybrid evolutionary algorithms for production scheduling
of hot strip mill, 339–349, Copyright 2012, with permission from Elsevier.)
Note that the rolling rounds usually become worse without updating of the manufacturing order pool. To simulate a dynamic environment, we import 200 new coils into the set of manufacturing orders after the scheduling of each rolling round. The optimization objectives of heuristic algorithm 1 in Section 8.3.1, the MGA, and the GEO-3 are reported in Table 8.3.
Instance No. | Heuristic Objective | MGA Objective | GEO-3 Objective | Improvement (% over MGA)
In Tables 8.2 and 8.3, it is obvious that the proposed hybrid EA can consider-
ably improve the optimization objective. The HSM-scheduling system equipped
with the optimization engine can obtain an optimized rolling round within 600 s.
Furthermore, extensive computational results on industrial data demonstrate that
the developed HSM-scheduling system has superior performances in scheduling
quality and computing efficiency.
8.4 Summary
In this chapter, the HSM-scheduling problem in the steel industry is studied.
A mathematical model is formulated to describe two important scheduling subtasks: (1) selecting a subset of manufacturing orders, and (2) generating an optimal rolling sequence.
EAs are proposed through the combination of GA and EO. With the help of the
developed HSM-scheduling system, simulations are conducted to demonstrate that
the hybrid EAs can generate optimized rolling rounds or campaigns efficiently.
Although the hybrid EA in this chapter is developed to solve the HSM-scheduling
problem, it has great potential in the areas of scheduling and optimization. Future
work will emphasize the generalization of this hybrid evolutionary optimization
method.
References
Achlioptas, D., Naor, A., and Peres, Y. 2005. Rigorous location of phase transitions in hard
optimization problems. Nature 435: 759–764.
Ackley, D. H. 1987. A Connectionist Machine for Genetic Hill Climbing. Kluwer, Boston,
MA.
Ahmed, E. and Elettreby, M. F. 2004. On multi-objective evolution model. International
Journal of Modern Physics C 15 (9): 1189–1195.
Al Seyab, R. K. and Cao, Y. 2006. Nonlinear model predictive control for the ALSTOM
gasifier. Journal of Process Control 16: 795–808.
Albert, R. and Barabási, A. L. 2000. Topology of evolving networks: Local events and
universality. Physical Review Letters 85: 5234–5237.
Ali, S. and Smith, K. A. 2006. A meta-learning approach to automatic kernel selection for
support vector machines. Neurocomputing 70: 173–186.
Altenberg, L. 1997. NK fitness landscapes. In: T. Bäck, D. B. Fogel and Z. Michalewicz
(Eds.), Handbook of Evolutionary Computation. Oxford University Press, New York.
Ang, K. H., Chong, G., and Li, Y. 2005. PID control system analysis, design and technol-
ogy. IEEE Transactions on Control Systems Technology 13 (4): 559–576.
Arifovic, J. and Gencay, R. 2001. Using genetic algorithms to select architecture of a feed-
forward artificial neural network. Physica A: Statistical Mechanics and Its Applications
289: 574–594.
Armañanzas, R. and Lozano, J. A. 2005. A multi-objective approach to the portfolio opti-
mization problem. Proceedings of CEC’2005, Edinburgh, UK, pp. 1388–1395.
Assaf, I., Chen, M., and Katzberg, J. 1997. Steel production schedule generation.
International Journal of Production Research 35 (2): 467–477.
Azadehgan, V., Jafarian, N., and Jafarieh, F. 2011. A novel hybrid artificial bee colony with
extremal optimization. Proceedings of the 4th International Conference on Computer
and Electrical Engineering (ICCEE 2011), Singapore, pp. 45–49.
Bak, P. 1996. How Nature Works: The Science of Self-Organized Criticality. Copernicus
Press, New York.
Bak, P. and Sneppen, K. 1993. Punctuated equilibrium and criticality in a simple model of
evolution. Physical Review Letters 71 (24): 4083–4086.
Bak, P., Tang, C., and Wiesenfeld, K. 1987. Self-organized criticality: An explanation of 1/f
noise. Physical Review Letters 59: 381–384.
Balas, E. 1989. The prize collecting traveling salesman problem. Networks 19: 621–636.
Barabási, A. L. 2007. The architecture of complexity. IEEE Control Systems Magazine
27 (4): 33–42.
Barabási, A. L. and Oltvai, A. L. 2004. Network biology: Understanding the cell’s func-
tional organization. Nature Reviews Genetics 5: 101–113.
Bauke, H., Mertens, S., and Engel, A. 2003. Phase transition in multiprocessor scheduling.
Physical Review Letters 90 (15): 158701-1–158701-4.
Baykasoglu, A. 2006. Applying multiple objective tabu search to continuous optimization
problems with a simple neighborhood strategy. International Journal for Numerical
Methods in Engineering 65: 406–424.
Beasley, J. E. 1990. OR-library: Distributing test problems by electronic mail. Journal of the
Operational Research Society 41: 1069–1072.
Beausoleil, R. P. 2006. “MOSS” multi-objective scatter search applied to non-linear mul-
tiple criteria optimization. European Journal of Operational Research 169: 426–449.
Beck, J. C. and Jackson, K. 1997. Constrainedness and phase transition in job shop sched-
uling. Technical Report, School of Computer Sciences, Simon Fraster University,
Burnaby, BC.
Biehl, M., Ahr, M., and Schlösser, E. 2000. Statistical physics of learning: Phase transitions
in multilayered neural networks. Advances in Solid State Physics 40: 819–826.
Biroli, G., Cocco, S., and Monasson, R. 2002. Phase transitions and complexity in com-
puter science: An overview of the statistical physics approach to the random satisfi-
ability problem. Physica A 306: 381–394.
Blum, C. and Roli, A. 2003. Metaheuristics in combinatorial optimization: Overview and
conceptual comparison. ACM Computing Surveys 35 (3): 268–308.
Boettcher, S. 2005a. Extremal optimization for the Sherrington–Kirkpatrick spin glass.
European Physics Journal B 46: 501–505.
Boettcher, S. 2005b. Self-organizing dynamics for optimization. Computational Science,
Lecture Notes in Computer Science 3515: 386–394.
Boettcher, S. and Frank, M. 2006. Optimizing at the ergodic edge. Physica A 367: 220–230.
Boettcher, S. and Percus, A. G. 1999. Extremal optimization: Methods derived from
co-evolution. Proceedings of the Genetic and Evolutionary Computation Conference,
Orlando, FL, pp. 825–832.
Boettcher, S. and Percus, A. G. 2000. Nature’s way of optimizing. Artificial Intelligence 119:
275–286.
Boettcher, S. and Percus, A. G. 2001a. Optimization with extremal dynamics. Physical
Review Letters 86 (23): 5211–5214.
Boettcher, S. and Percus, A. G. 2001b. Extremal optimization for graph partitioning.
Physical Review E 64 (2): 1–13.
Boettcher, S. and Percus, A. G. 2003. Optimization with extremal dynamics. Complexity
8: 57–62.
Bonabeau, E., Dorigo, M., and Theraulaz, G. 2000. Inspiration for optimization from
social insect behaviour. Nature 406: 39–42.
Bosman, P. A. N. and Thierens, D. 2003. The balance between proximity and diver-
sity in multi-objective evolutionary algorithms. IEEE Transactions on Evolutionary
Computation 7 (2): 174–188.
Burges, C. J. C. 1998. A tutorial on support vector machines for pattern recognition.
Knowledge Discovery and Data Mining 2: 121–167.
Burke, E. K., Cowling, P. I., and Keuthen, R. 2001. Effective local and guided variable
neighborhood search methods for the asymmetric traveling salesman problem.
Applications of Evolutionary Computing, Lecture Notes in Computer Science 2037:
203–212.
Camacho, E. F. and Bordons, C. 1995. Model Predictive Control in the Process Industry.
Springer-Verlag, Berlin.
Chaiyatham, T. and Ngamroo, I. 2012. A bee colony optimization based-fuzzy logic–PID
control design of electrolyzer for microgrid stabilization. International Journal of
Innovative Computing, Information and Control 8 (9): 6049–6066.
Chang, W. A. and Ramakrishna, R. S. 2002. A genetic algorithm for shortest path routing
problem and the sizing of population. IEEE Transactions on Evolutionary Computation
6 (6): 566–579.
Chang, W. D. 2007. A multi-crossover genetic approach to multivariable PID controllers
tuning. Expert Systems with Applications 33: 620–626.
Cheeseman, P., Kanefsky, B., and Taylor, W. M. 1991. Where the really hard problems are.
Proceedings of the 12th International Joint Conference on Artificial Intelligence, Sydney,
Australia, pp. 163–169.
Chen, A. L., Yang, G. K., and Wu, Z. M. 2008. Production scheduling optimization algo-
rithm for the hot rolling processes. International Journal of Production Research 46 (7):
1955–1973.
Chen, D. B. and Zhao, C. X. 2009. Particle swarm optimization with adaptive population
size and its application. Applied Soft Computing 9: 39–48.
Chen, X., Wan, W., and Xu, X. 1998. Modeling rolling batch planning as vehicle routing
problem with time windows. Computers and Operations Research 25 (12): 1127–1136.
Chen, Y. and Zhang, P. 2006. Optimized annealing of traveling salesman problem from
the nth-nearest-neighbor distribution. Physica A 371: 627–631.
Chuang, C. C. and Lee, Z. J. 2009. Hybrid robust support vector machines for regression
with outliers. Applied Soft Computing 11: 64–72.
Cirasella, J., Johnson, D. S., McGeoch, L. A., and Zhang, W. X. 2001. The asymmetric
traveling salesman problem: Algorithms, instance generators, and tests. Algorithm
Engineering and Experimentation, Lecture Notes in Computer Science 2153: 32–59.
Clauset, A., Shalizi, C. R., and Newman, M. E. J. 2009. Power-law distributions in empiri-
cal data. SIAM Review 51 (4): 661–703.
Coello, C. A. C. 1996. An empirical study of evolutionary techniques for multi-objective
optimization in engineering design. PhD thesis, Department of Computer Science,
Tulane University, New Orleans, LA.
Coello, C. A. C. 2005. Recent trends in evolutionary multi-objective optimization.
In: A. Abraham, L. Jain, and R. Goldberg (Eds.), Evolutionary Multi-Objective
Optimization: Theoretical Advances and Applications, Springer-Verlag, London, pp.
7–32.
Coello, C. A. C. 2006. Evolutionary multi-objective optimization: A historical view of the
field. IEEE Computational Intelligence Magazine 1 (1): 28–36.
Coelho, L. S. and Pessôa, M. W. 2011. A tuning strategy for multivariable PI and PID con-
trollers using differential evolution with chaotic Zaslavskii map. Expert Systems with
Applications 38: 13694–13701.
Coello, C. A. C., Pulido, G. T., and Lechuga, M. S. 2004. Handling multiple objectives
with particle swarm optimization. IEEE Transactions on Evolutionary Computation
8 (3): 256–279.
Cormen, T. H., Leiserson, C. E., Rivest, R. L., and Stein, C. 2001. Introduction to
Algorithms. MIT Press, Cambridge, MA.
Cowling, P. 2003. A flexible decision support system for steel hot rolling mill scheduling.
Computers and Industrial Engineering 45: 307–321.
Cox, I. J., Lewis, R. W., Ransing, R. S., Laszczewski, H., and Berni, G. 2002. Application
of neural computing in basic oxygen steelmaking. Journal of Material Processing
Technology 120: 310–315.
Daley, M. J. and Kari, L. 2002. DNA computing: Models and implementations. Comments
on Theoretical Biology 7: 177–198.
Das, I. and Dennis, J. 1997. A close look at drawbacks of minimizing weighted sum of
objectives for Pareto set generation in multicriteria optimization problems. Structural
Optimization 14 (1): 63–69.
Davendra, D., Zelinka, I., and Senkerik, R. 2010. Chaos driven evolutionary algorithms for
the task of PID control. Computers and Mathematics with Applications 60: 1088–1104.
Dawkins, R. 1976. The Selfish Gene. Oxford University Press, Oxford, UK.
Deb, K., Pratap, A., Agrawal, S., and Meyarivan, T. 2002. A fast and elitist multi-objective
genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation 6 (2):
182–197.
Deb, K., Pratap, A., and Moitra, S. 2000. Mechanical component design for multiple
objectives using elitist non-dominated sorting GA. KanGAL Report 200002, Indian Institute of
Technology, Kanpur, India.
De Castro, L. N. and Timmis, J. 2002. Artificial Immune Systems: A New Computational
Intelligence Approach. Springer, London.
De Menezes, M. A. and Lima, A. R. 2003. Using entropy-based methods to study general
constrained parameter optimization problems. Physica A 323: 428–434.
Dengiz, B., Alabas-Uslu, C., and Dengiz, O. 2009. A tabu search algorithm for the training
of neural networks. Journal of the Operational Research Society 60: 282–291.
De Sousa, F. L. and Ramos, F. M. 2002. Function optimization using extremal dynamics.
Proceedings of the 4th International Conference on Inverse Problems in Engineering, Rio
de Janeiro, Brazil, pp. 1–5.
De Sousa, F. L., Vlassov, V., and Ramos, F. M. 2004. Generalized extremal optimization: An
application in heat pipe design. Applied Mathematical Modelling 28: 911–931.
Djoewahir, A., Tanaka, K., and Nakashima, S. 2013. Adaptive PSO-based self-tuning
PID controller for ultrasonic motor. International Journal of Innovative Computing,
Information and Control 9 (10): 3903–3914.
Dobson, C. M. 2003. Review article: Protein folding and misfolding. Nature 426: 884–890.
Dorigo, M. and Gambardella, L. M. 1997. Ant colony system: A cooperative learning
approach to the traveling salesman problem. IEEE Transactions on Evolutionary
Computation 1 (1): 53–66.
Dubois, O. and Dequen, G. 2001. A backbone-search heuristic for efficient solving of hard
3-SAT formulae. International Joint Conference on Artificial Intelligence, Seattle, WA,
pp. 248–253.
Duch, J. and Arenas, A. 2005. Community detection in complex networks using extremal
optimization. Physical Review E 72: 027104.
Eberhart, R. C. and Kennedy, J. 1995. A new optimizer using particle swarm theory.
Proceedings of the 6th International Symposium on Micromachine and Human Science,
Nagoya, Japan, pp. 39–43.
Ehrlich, P. R. and Raven, P. H. 1964. Butterflies and plants: A study in coevolution.
Evolution 18: 586–608.
Elaoud, S., Loukil, T., and Teghem, J. 2007. The Pareto fitness genetic algorithm: Test
function study. European Journal of Operational Research 177: 1703–1719.
Engel, A. 2001. Complexity of learning in artificial neural networks. Theoretical Computer
Science 265: 285–306.
Engelbrecht, A. P. 2007. Computational Intelligence: An Introduction, 2nd ed. John Wiley &
Sons, New York.
Engin, A. 2009. Selecting of the optimal feature subset and kernel parameters in digi-
tal modulation classification by using hybrid genetic algorithm–support vector
machines: HGASVM. Expert Systems with Applications 36: 1391–1402.
Fan, S. K. S. and Zahara, E. 2007. A hybrid simplex search and particle swarm optimization
for unconstrained optimization. European Journal of Operational Research 181: 527–548.
Fang, H. L. and Tsai, C. H. 1998. A genetic algorithm approach to hot strip mill rolling
scheduling problems. Proceedings of the Tenth IEEE International Conference on Tools
with Artificial Intelligence, Taipei, Taiwan, pp. 264–271.
Fasih, A., Chedjou, C. J., and Kyamakya, K. 2009. Cellular neural networks-based genetic
algorithm for optimizing the behavior of an unstructured robot. International Journal
of Computational Intelligence Systems 2: 124–131.
Feng, M. X., Li, Q., and Zou, Z. S. 2008. An outlier identification and judgment method
for an improved neural-network BOF forecasting model. Steel Research International
79: 323–332.
Fileti, A. M. F., Pacianotto, T. A., and Cunha, A. P. 2006. Neural modelling helps the BOS
process to achieve aimed end-point conditions in liquid steel. Engineering Applications
of Artificial Intelligence 19: 9–17.
Fonseca, C. M. and Fleming, P. J. 1993. Genetic algorithms for multi-objective optimi-
zation: Formulation, discussion and generalization. In: S. Forrest (Ed.), Proceedings
of the 5th International Conference on Genetic Algorithms, Morgan Kaufmann, San
Mateo, CA, pp. 416–423.
Fonseca, C. M. and Fleming, P. J. 1995. An overview of evolutionary algorithms in multi-
objective optimization. Evolutionary Computation 3 (1): 1–16.
Forrest, S. 1993. Genetic algorithms: Principles of natural selection applied to computa-
tion. Science 261 (5123): 872–878.
Franz, A. and Hoffmann, K. H. 2002. Optimal annealing schedules for a modified Tsallis
statistics. Journal of Computational Physics 176: 196–204.
Fredman, M. L., Johnson, D. S., McGeoch, L. A., and Ostheimer, G. 1995. Data structures
for traveling salesmen. Journal of Algorithms 18: 432–479.
Friedrichs, F. and Igel, C. 2005. Evolutionary tuning of multiple SVM parameters.
Neurocomputing 64: 107–117.
Fukumizu, K. and Amari, S. 2000. Local minima and plateaus in hierarchical structures of
multilayer perceptrons. Neural Networks 13: 317–327.
Fukuoka, Y., Matsuki, H., Minamitani, H., and Ishida, A. 1998. A modified back propaga-
tion method to avoid false local minima. Neural Networks 11: 1059–1072.
Gabrielli, A., Cafiero, R., Marsili, M., and Pietronero, L. 1997. Theory of self-organized
criticality for problems with extremal dynamics. Europhysics Letters 38 (7): 491–496.
Galski, R. L., De Sousa, F. L., and Ramos, F. M. 2005. Application of a new multi-objective
evolutionary algorithm to the optimum design of a remote sensing satellite constella-
tion. Proceedings of the 5th International Conference on Inverse Problems in Engineering:
Theory and Practice, Cambridge, Vol. II, G01.
Galski, R. L., De Sousa, F. L., Ramos, F. M., and Muraoka, I. 2004. Spacecraft thermal
design with the generalized extremal optimization algorithm. Proceedings of Inverse
Problems, Design and Optimization Symposium, Rio de Janeiro, Brazil.
Gao, W. F., Liu, S. Y., and Huang, L. L. 2012. Inspired artificial bee colony algorithm for
global optimization problems. Acta Electronica Sinica 12: 2396–2403.
Garey, M. R. and Johnson, D. S. 1979. Computers and Intractability: A Guide to the Theory
of NP-Completeness. W. H. Freeman, New York.
Gent, I. P. and Walsh, T. 1996. The TSP phase transition. Artificial Intelligence 88: 349–358.
Goldberg, D. E. 1989. Genetic Algorithms in Search, Optimization and Machine Learning.
Addison-Wesley, Reading, MA.
Goles, E., Latapy, M., Magnien, C., Morvan, M., and Phan, H. D. 2004. Sandpile models
and lattices: A comprehensive survey. Theoretical Computer Science 322: 383–407.
Gutin, G. and Punnen, A. P. 2002. The Traveling Salesman Problem and Its Variations.
Kluwer Academic Publishers, Boston.
Hagan, M. T. and Menhaj, M. B. 1994. Training feedforward networks with the Marquardt
algorithm. IEEE Transactions on Neural Networks 5: 989–993.
Haken, H. 1977. Synergetics. Springer, Berlin, Germany.
Hamida, S. B. and Schoenauer, M. 2002. ASCHEA: New results using adaptive segrega-
tional constraint handling. Proceedings of the Congress on Evolutionary Computation
2002 (CEC’2002), Hawaii, pp. 884–889.
Han, J. 2005. Local evaluation functions and global evaluation functions for computa-
tional evolution. Complex Systems 15 (4): 307–347.
Han, J. and Cai, Q. S. 2003. Emergence from local evaluation function. Journal of Systems
Science and Complexity 16 (3): 372–390.
Han, S. P. 1976. Superlinearly convergent variable metric algorithms for general nonlinear
programming problems. Mathematical Programming 11: 263–282.
Hanne, T. 2007. A multi-objective evolutionary algorithm for approximating the efficient
set. European Journal of Operational Research 176: 1723–1734.
Hansen, P. and Jaumard, B. 1990. Algorithms for the maximum satisfiability problem.
Computing 44: 279–303.
Hart, W. E., Krasnogor, N., and Smith, J. E. 2004. Recent Advances in Memetic Algorithms.
Springer, Berlin.
Hartmann, A. K. and Rieger, H. (Eds.) 2004. New Optimization Algorithms in Physics.
Wiley-VCH, Weinheim.
Hartmann, A. K. and Weigt, M. 2005. Phase Transitions in Combinatorial Optimization
Problems: Basics, Algorithms and Statistical Mechanics. Wiley-VCH, Weinheim, Germany.
Haykin, S. 1994. Neural Networks: A Comprehensive Foundation. Macmillan, New York.
He, L. and Mort, N. 2000. Hybrid genetic algorithms for telecommunications network
back-up routing. British Telecom Technology Journal 18: 42–56.
Heilmann, F., Hoffmann, K. H., and Salamon, P. 2004. Best possible probability distribu-
tion over extremal optimization ranks. Europhysics Letters 66 (3): 305–310.
Helsgaun, K. 2000. An effective implementation of the Lin–Kernighan traveling salesman
heuristic. European Journal of Operational Research 126: 106–130.
Henson, M. A. 1998. Nonlinear model predictive control: Current status and future direc-
tions. Computers and Chemical Engineering 23: 187–202.
Herroelen, W. and Reyck, B. D. 1999. Phase transition in project scheduling. Journal of the
Operational Research Society 50: 148–156.
Hoffmann, K. H., Heilmann, F., and Salamon, P. 2004. Fitness threshold accepting over
extremal optimization ranks. Physical Review E 70 (4): 046704.
Hogg, T., Huberman, B. A., and Williams, C. P. 1996. Special issue on frontiers in problem
solving: Phase transitions and complexity. Artificial Intelligence 81 (1–2).
Hohmann, W., Kraus, M., and Schneider, F. W. 1998. Learning and recognition in excit-
able chemical reactor networks. Journal of Physical Chemistry A 102: 3103–3111.
Kilby, P., Slaney, J., Thiébaux, S., and Walsh, T. 2005. Backbones and backdoors in satisfi-
ability. Proceedings of the 20th National Conference on Artificial Intelligence, Pittsburgh,
Pennsylvania, pp. 1368–1373.
Kim, T. H., Maruta, I., and Sugie, T. 2008. Robust PID controller tuning based on con-
strained particle swarm optimization. Automatica 44: 1104–1110.
Kinzel, W. 1998. Phase transitions of neural networks. Philosophical Magazine B 77:
1455–1477.
Kinzel, W. 1999. Statistical physics of neural networks. Computer Physics Communications
121: 86–93.
Kirkpatrick, S., Gelatt, C. D. Jr., and Vecchi, M. P. 1983. Optimization by simulated
annealing. Science 220 (4598): 671–680.
Kirousis, L. M. and Kranakis, E. 2005. Special issue on typical case complexity and
phase transition. Discrete Applied Mathematics 153 (1–3): 1–182.
Knoop, P. and Van Nerom, L. 2003. Scheduling requirements for hot charge optimization
in an integrated steel plant. Proceedings of Industry Applications Conference, 38th IAS
Annual Meeting, Utah, USA, pp. 74–78.
Knowles, J. and Corne, D. 1999. The Pareto archived evolution strategy: A new base-
line algorithm for multi-objective optimization. Proceedings of the 1999 Congress on
Evolutionary Computation, IEEE Press, Piscataway, NJ, pp. 98–105.
Knowles, J. and Corne, D. 2001. A comparative assessment of memetic, evolutionary, and
constructive algorithms for the multiobjective D-MST problem. 2001 Genetic and
Evolutionary Computation Workshop Proceedings, San Francisco, pp. 162–167.
Korte, B. and Vygen, J. 2012. Combinatorial Optimization: Theory and Algorithms. Springer,
Heidelberg, Germany.
Kosiba, E. D., Wright, J. R., and Cobbs, A. E. 1992. Discrete event sequencing as a travel-
ing salesman problem. Computers in Industry 19 (3): 317–327.
Koza, J. R. 1998. Genetic Programming. MIT Press, Cambridge, MA.
Krasnogor, N. 2004. Self-generating metaheuristics in bioinformatics: The protein struc-
ture comparison case. Genetic Programming and Evolvable Machines 5: 181–201.
Krasnogor, N. and Gustafson, S. 2004. A study on the use of “self-generation” in memetic
algorithms. Natural Computing 3: 53–76.
Krasnogor, N. and Smith, J. E. 2005. A tutorial for competent memetic algorithms: Model,
taxonomy and design issues. IEEE Transactions on Evolutionary Computation 9:
474–488.
Ku, K. and Mak, M. 1998. Empirical analysis of the factors that affect the Baldwin effect.
Lecture Notes in Computer Science 1498: 481–490.
Langton, C. G. 1998. Artificial Life: An Overview. MIT Press, Cambridge, MA.
Laporte, G. 2010. A concise guide to the traveling salesman problem. Journal of the
Operational Research Society 61: 35–40.
Larrañaga, P., Kuijpers, C. M. H., Murga, R. H., Inza, I., and Dizdarevic, S. 1999. Genetic
algorithms for the travelling salesman problem: A review of representations and oper-
ators. Artificial Intelligence Review 13: 129–170.
Lawler, E. L., Lenstra, J. K., Kan, A. H. G. R., and Shmoys, D. B. 1985. The Traveling
Salesman Problem: A Guided Tour of Combinatorial Optimization. Wiley, New York.
Lee, C. Y. and Yao, X. 2001. Evolutionary algorithms with adaptive Lévy mutations. Proceedings
of the 2001 Congress on Evolutionary Computation, Seoul, Korea, pp. 568–575.
Lee, H. S., Murthy, S. S., Haider, S. W., and Morse, D. V. 1996. Primary production schedul-
ing at steelmaking industries. IBM Journal of Research and Development 40 (2): 231–252.
Li, N. J., Wang, W. J., Hsu, C. C. J., Chang, W., and Chou, H. G. 2014. Enhanced particle
swarm optimizer incorporating a weighted particle. Neurocomputing 124: 218–227.
Li, S. J., Li, Y., Liu, Y., and Xu, Y. F. 2007. A GA-based NN approach for makespan estima-
tion. Applied Mathematics and Computation 185: 1003–1014.
Lin, S. and Kernighan, B. W. 1973. An effective heuristic algorithm for the traveling sales-
man problem. Operations Research 21: 498–516.
Liu, H., Abraham, A., and Clerc, M. 2007. Chaotic dynamic characteristics in swarm intel-
ligence. Applied Soft Computing 7: 1019–1026.
Liu, J., Han, J., and Tang, Y. Y. 2002. Multi-agent oriented constraint satisfaction. Artificial
Intelligence 136: 101–144.
Liu, J., Jin, X., and Tsui, K. C. 2005. Autonomy Oriented Computing: From Problem Solving
to Complex Systems Modeling. Kluwer Academic Publishers, Boston, MA.
Liu, J., Tang, Y. Y., and Cao, Y. C. 1997. An evolutionary autonomous agents approach to
image feature extraction. IEEE Transactions on Evolutionary Computation 1 (2): 141–158.
Liu, J. and Tsui, K. C. 2006. Toward nature-inspired computing. Communications of the
ACM 49 (10): 59–64.
Lopez, L., Carter, M. W., and Gendreau, M. 1998. The hot strip mill production schedul-
ing problem: A tabu search approach. European Journal of Operational Research 106:
317–335.
Lorena, A. C. and De Carvalho, A. 2008. Evolutionary tuning of SVM parameter values
in multiclass problems. Neurocomputing 71: 3326–3334.
Luenberger, D. G. 1984. Linear and Nonlinear Programming. Addison-Wesley, Reading,
MA.
Mantegna, R. 1994. Fast, accurate algorithm for numerical simulation of Lévy stable sto-
chastic process. Physical Review E 49: 4677–4683.
Mao, Y., Zhou, X., Pi, D., Sun, Y., and Wong, S. T. C. 2005. Parameters selection in
gene selection using Gaussian kernel support vector machines by genetic algorithm.
Journal of Zhejiang University, Science B 6: 961–973.
Martin, O. C., Monasson, R., and Zecchina, R. 2001. Statistical mechanics methods and
phase transitions in optimization problems. Theoretical Computer Science 265: 3–67.
Martinez, M., Senent, J. S., and Blasco, X. 1998. Generalized predictive control using genetic
algorithms (GAGPC). Engineering Applications of Artificial Intelligence 11: 355–367.
Martinsen, F., Biegler, L. T., and Foss, B. A. 2004. A new optimization algorithm with
application to nonlinear MPC. Journal of Process Control 14: 853–865.
Mattik, I., Amorim, P., and Günther, H. O. 2014. Hierarchical scheduling of continu-
ous casters and hot strip mills in the steel industry: A block planning application.
International Journal of Production Research 52 (9): 2576–2591.
Menaï, M. B. and Batouche, M. 2006. An effective heuristic algorithm for the maximum
satisfiability problem. Applied Intelligence 24: 227–239.
Méndez, R. A., Valladares, A., Flores, J., Seligman, T. H., and Bohigas, O. 1996. Universal
fluctuations of quasi-optimal paths of the traveling salesman problem. Physica A 232:
554–562.
Menhas, M. I., Wang, L., Fei, M., and Pan, H. 2012. Comparative performance analysis of
various binary coded PSO algorithms in multivariable PID controller design. Expert
Systems with Applications 39: 4390–4401.
Merz, P. 2000. Memetic algorithms for combinatorial optimization problems: Fitness
landscapes and effective search strategies. PhD dissertation, Department of Electrical
Engineering and Computer Science, University of Siegen, Germany.
Merz, P. 2004. Advanced fitness landscape analysis and the performance of memetic algo-
rithms. Evolutionary Computation 12 (3): 303–325.
Merz, P. and Freisleben, B. 2000. Fitness landscape analysis and memetic algorithms for
the quadratic assignment problem. IEEE Transactions on Evolutionary Computation
4 (4): 337–352.
Meza, G. R., Sanchis, J., Blasco, X., and Herrero, J. M. 2012. Multiobjective evolutionary
algorithms for multivariable PI controller design. Expert Systems with Applications 39:
7895–7907.
Mézard, M., Parisi, G., and Zecchina, R. 2002. Analytic and algorithmic solution of ran-
dom satisfiability problems. Science 297: 812–815.
Mezura-Montes, E. and Coello, C. A. C. 2005. A simple multimembered evolution strat-
egy to solve constrained optimization problems. IEEE Transactions on Evolutionary
Computation 9: 1–17.
Michalewicz, Z. 1996. Genetic Algorithms + Data Structures = Evolution Programs. Springer,
Heidelberg.
Middleton, A. A. 2004. Improved extremal optimization for the Ising spin glass. Physical
Review E 69: 055701R.
Miettinen, K. M. 1999. Nonlinear Multi-Objective Optimization. Kluwer Academic
Publishers, Boston, MA.
Miller, D. L. and Pekny, J. F. 1989. Results from a parallel branch and bound algorithm for
the asymmetric traveling salesman problem. Operations Research Letters 8 (3): 129–135.
Min, J. H. and Lee, Y. C. 2005. Bankruptcy prediction using support vector machine with opti-
mal choice of kernel function parameters. Expert Systems with Applications 28: 603–614.
Molga, M. and Smutnicki, C. 2005. Test functions for optimization needs. Available at
http://www.zsd.ict.pwr.wroc.pl/files/docs/functions.pdf.
Monasson, R., Zecchina, R., Kirkpatrick, S., Selman, B., and Troyansky, L. 1999.
Determining computational complexity from characteristic “phase transitions”.
Nature 400: 133–137.
Moscato, P. 1989. On evolution, search, optimization, genetic algorithms, and martial
arts: Towards memetic algorithms. Technical Report Caltech Concurrent Computation
Program, Report 826, California Institute of Technology, Pasadena, CA.
Moscato, P. and Cotta, C. 2003. A Gentle Introduction to Memetic Algorithms, Handbook of
Metaheuristics. Kluwer Academic Publishers, Boston.
Moscato, P., Mendes, A., and Berretta, R. 2007. Benchmarking a memetic algorithm for
ordering microarray data. Biosystems 88: 56–75.
Mosetti, G., Jug, G., and Scalas, E. 2007. Power laws from randomly sampled continuous-
time random walks. Physica A 375: 233–238.
Murty, K. G. and Kabadi, S. N. 1987. Some NP-complete problems in quadratic and non-
linear programming. Mathematical Programming 39: 117–129.
Nocedal, J. and Wright, S. J. 2006. Numerical Optimization. Springer, New York.
Okano, H., Davenport, A. J., Trumbo, M., Reddy, C., Yoda, K., and Amano, M.
2004. Finishing line scheduling in the steel industry. IBM Journal of Research and
Development 48 (5/6): 811–830.
Ong, Y. S. and Keane, A. 2004. Meta-Lamarckian learning in memetic algorithms. IEEE
Transactions on Evolutionary Computation 8: 99–110.
Onnen, C., Babuška, R., Kaymak, U., Sousa, J. M., Verbruggen, H. B., and Isermann, R.
1997. Genetic algorithms for optimization in predictive control. Control Engineering
Practice 5: 1363–1372.
Paczuski, M., Maslov, S., and Bak, P. 1996. Avalanche dynamics in evolution, growth, and
depinning models. Physical Review E 53 (1): 414–443.
Pai, P. F. and Hong, W. C. 2005. Support vector machines with simulated annealing
algorithms in electricity load forecasting. Energy Conversion and Management 46:
2669–2688.
Papadimitriou, C. H. 1994. Computational Complexity, 1st ed. Addison-Wesley, USA.
Papadimitriou, C. H. and Steiglitz, K. 1998. Combinatorial Optimization: Algorithms and
Complexity. Courier Dover Publications, USA.
Siarry, P. and Michalewicz, Z. 2008. Advances in Meta-Heuristics for Hard Optimization.
Springer, Heidelberg.
Pearson, R. K. 2006. Nonlinear empirical modeling techniques. Computers and Chemical
Engineering 30: 1514–1528.
Peterson, C. M., Sorensen, K. L., and Vidal, R. V. V. 1992. Inter-process synchronization in
steel production. International Journal of Production Research 30: 1415–1425.
Potocnik, P. and Grabec, I. 2002. Nonlinear model predictive control of a cutting process.
Neurocomputing 43: 107–126.
Potschka, H. 2010. Targeting regulation of ABC efflux transporters in brain diseases: A
novel therapeutic approach. Pharmacology and Therapeutics 125: 118–127.
Potvin, J. Y. 1996. Genetic algorithms for the traveling salesman problem. Annals of
Operations Research 63: 339–370.
Powell, M. J. D. 1978. A fast algorithm for nonlinearly constrained optimization calcula-
tions. Lecture Notes in Mathematics 630: 144–157.
Qiao, J. F. and Wang, H. D. 2008. A self-organizing fuzzy neural network and its applica-
tions to function approximation and forecast modeling. Neurocomputing 71: 564–569.
Qin, S. J. and Badgwell, T. A. 2003. A survey of industrial model predictive control tech-
nology. Control Engineering Practice 11: 733–764.
Ramos, V., Fernandes, C., and Rosa, A. C. 2005. Societal memory and his speed on track-
ing extrema over dynamic environments using self-regulatory swarms. Proceedings of
the 1st European Symposium on Nature Inspired Smart Information Systems, Albufeira,
Portugal.
Rangel, L. P. 2012. Putative role of an ABC transporter in Fonsecaea pedrosoi multidrug
resistance. International Journal of Antimicrobial Agents 40: 409–415.
Hassin, R. and Shani, M. 2005. Machine scheduling with earliness, tardiness and non-execu-
tion penalties. Computers and Operations Research 32: 683–705.
Reidys, C. M. and Stadler, P. F. 2002. Combinatorial landscapes. SIAM Review 44 (1):
3–54.
Reinelt, G. 1991. TSPLIB—A traveling salesman problem library. ORSA Journal on
Computing 3 (4): 376–384.
Reyaz-Ahmed, A., Zhang, Y. Q., and Harrison, R. W. 2009. Granular decision tree and
evolutionary neural SVM for protein secondary structure prediction. International
Journal of Computational Intelligence Systems 2: 343–352.
Rigler, A. K., Irvine, J. M., and Vogl, T. P. 1991. Rescaling of variables in back propagation
learning. Neural Networks 4: 225–229.
Rogers, A., Prügel-Bennett, A., and Jennings, N. R. 2006. Phase transitions and symme-
try breaking in genetic algorithms with crossover. Theoretical Computer Science 358:
121–141.
Rosenberg, R. S. 1967. Simulation of genetic populations with biochemical properties. PhD
thesis, University of Michigan, Ann Arbor, MI.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986a. Learning internal represen-
tations by error propagation. In: Parallel Distributed Processing: Exploration in the
Microstructure of Cognition. J. L. McClelland and D. E. Rumelhart, Eds, MIT Press,
Cambridge, MA, pp. 318–362.
Rumelhart, D. E., Hinton, G. E., and Williams, R. J. 1986b. Learning representations by
back propagating errors. Nature 323: 533–536.
Runarsson, T. P. and Yao, X. 2000. Stochastic ranking for constrained evolutionary opti-
mization. IEEE Transactions on Evolutionary Computation 4: 284–294.
Salomon, R. 1998. Evolutionary algorithms and gradient search: Similarities and differ-
ences. IEEE Transactions on Evolutionary Computation 2: 45–55.
Sarker, R., Liang, K. H., and Newton, C. 2002. A new multi-objective evolutionary algo-
rithm. European Journal of Operational Research 140: 12–23.
Schaffer, J. D. 1985. Multiple objective optimization with vector evaluated genetic algo-
rithms. Proceedings of the First International Conference on Genetic Algorithms,
Lawrence Erlbaum, Hillsdale, NJ, pp. 93–100.
Schneider, J. 2003. Searching for backbones—A high-performance parallel algorithm for
solving combinatorial optimization problems. Future Generation Computer Systems 19:
121–131.
Schneider, J., Froschhammer, C., Morgenstern, I., Husslein, T., and Singer, J. M. 1996.
Searching for backbones: An efficient parallel algorithm for the traveling salesman prob-
lem. Computer Physics Communications 96: 173–188.
Sedki, A., Ouazar, D., and El Mazoudi, E. 2009. Evolving neural network using real
coded genetic algorithm for daily rainfall-runoff forecasting. Expert Systems with
Applications 36: 4523–4527.
Seitz, S., Alava, M., and Orponen, P. 2005. Focused local search for random 3-satisfiability.
Journal of Statistical Mechanics: Theory and Experiment 23 (6): 524–536.
Selman, B. 2008. Computational science: A hard statistical view. Nature 451: 639–640.
Selman, B. and Kautz, H. A. 1993. An empirical study of greedy local search for satisfi-
ability testing. Proceedings of the 11th National Conference on Artificial Intelligence,
Washington, D.C., pp. 46–51.
Selman, B., Kautz, H. A., and Cohen, B. 1994. Noise strategies for improving local search.
Proceedings of the 12th National Conference on Artificial Intelligence, Seattle, WA,
pp. 337–343.
Senthil Arumugam, M., Rao, M. V. C., and Tan, A. W. C. 2009. A novel and effective par-
ticle swarm optimization like algorithm with extrapolation technique. Applied Soft
Computing 9: 308–320.
Seung, H. S., Sompolinsky, H., and Tishby, N. 1992. Statistical mechanics of learning
from examples. Physical Review A 45: 6056–6091.
Shelokar, P. S., Siarry, P., Jayaraman, V. K., and Kulkarni, B. D. 2007. Particle swarm
and ant colony algorithms hybridized for improved continuous optimization. Applied
Mathematics and Computation 188: 129–142.
Shi, X. H., Liang, Y. C., Lee, H. P., Lu, C., and Wang, L. M. 2005. An improved GA and
a novel PSO–GA-based hybrid algorithm. Information Processing Letters 93: 255–261.
Singer, J., Gent, I. P., and Smaill, A. 2000. Backbone fragility and the local search cost
peak. Journal of Artificial Intelligence Research 12: 235–270.
Slaney, J. and Walsh, T. 2001. Backbones in optimization and approximation. Proceedings
of the International Joint Conference on Artificial Intelligence, Seattle, WA, Morgan
Kaufmann, San Mateo, CA, pp. 254–259.
Smith, J. E. 2007. Coevolving memetic algorithms: A review and progress report. IEEE
Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 37: 6–17.
Sneppen, K. 1995. Extremal dynamics and punctuated co-evolution. Physica A 221: 168–179.
Song, Y., Chen, Z. Q., and Yuan, Z. Z. 2007. New chaotic PSO-based neural network predic-
tive control for nonlinear process. IEEE Transactions on Neural Networks 18: 595–600.
Srinivas, N. and Deb, K. 1994. Multi-objective optimization using nondominated sorting
in genetic algorithms. Evolutionary Computation 2 (3): 221–248.
Srinivasan, D. and Seow, T. H. 2006. Particle swarm inspired evolutionary algorithm
(PS-EA) for multi-criteria optimization problems. Proceedings of the Evolutionary
Multiobjective Optimization, Springer, Berlin, pp. 147–165.
Stauffer, L. and Liebling, T. M. 1997. Rolling horizon scheduling in a rolling-mill. Annals
of Operations Research 69: 323–349.
Steve, G. 1998. Support vector machines for classification and regression. ISIS Technical Report,
Image, Speech, and Intelligent Systems Group, University of Southampton.
Strogatz, S. H. 2001. Exploring complex networks. Nature 410: 268–276.
Suganthan, P. N., Hansen, N., Liang, J. J., Deb, K., Chen, Y. P., Auger, A., and Tiwari, S.
2005. Problem definitions and evaluation criteria for the CEC 2005 special session
on real-parameter optimization. Technical Report, Nanyang Technological University,
Singapore.
Szedmak, S. 2001. How to find more efficient initial solution for searching. RUTCOR
Research Report 49-2001, Rutgers Center for Operations Research, Rutgers University,
Piscataway, NJ.
Taherdangkoo, M., Paziresh, M., Yazdi, M., and Bagheri, M. H. 2012. An efficient algo-
rithm for function optimization: Modified stem cells algorithm. Central European
Journal of Engineering 3: 36–50.
Tahk, M. J., Woo, H. W., and Park, M. S. 2007. A hybrid optimization method of evolu-
tionary and gradient search. Engineering Optimization 39: 87–104.
Tang, L. X., Liu, J. Y., Rong, A. Y., and Yang, Z. H. 2000. A multiple traveling sales-
man problem model for hot rolling scheduling in Shanghai Baoshan Iron & Steel
Complex. European Journal of Operational Research 124: 267–282.
Tang, L. X., Liu, J. Y., Rong, A. Y., and Yang, Z. H. 2001. A review of planning and
scheduling systems and methods for integrated steel production. European Journal of
Operational Research 133: 1–20.
Tang, L. X. and Wang, X. P. 2005. Iterated local search algorithm based on very large-scale
neighborhood for prize-collecting vehicle routing problem. The International Journal
of Advanced Manufacturing Technology 12: 1433–3015.
Tang, X., Zhuang, L., and Jiang, C. 2009. Prediction of silicon content in hot metal using
support vector regression based on chaos particle swarm optimization. Expert Systems
with Applications 36: 11853–11857.
Tao, J., Wang, X., and Chai, T. Y. 2002. Intelligent control method and application for
BOF steelmaking. Proceedings of the 15th IFAC World Congress, Barcelona, Spain, pp.
1071–1076.
Telelis, O. and Stamatopoulos, P. 2002. Heuristic backbone sampling for maximum
satisfiability. Proceedings of the 2nd Hellenic Conference on Artificial Intelligence,
Thessaloniki, Greece, pp. 129–139.
Thadani, K., Ashutosh, Jayaraman, V. K., and Sundararajan, V. 2006. Evolutionary selec-
tion of kernels in support vector machines. Proceedings of the International Conference
on Advanced Computing and Communications, Surathkal, India, pp. 19–24.
Tsallis, C. and Stariolo, D. A. 1996. Generalized simulated annealing. Physica A 233 (1–2):
395–406.
TSPLIB. Available at http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/.
Vapnik, V. 1995. The Nature of Statistical Learning Theory. Springer-Verlag, New York.
Venkateswarlu, C. and Reddy, A. D. 2008. Nonlinear model predictive control of reactive
distillation based on stochastic optimization. Industrial and Engineering Chemistry
Research 47: 6949–6960.
Verdejo, V. V., Alarcó, M. A. P., and Sorlí, M. P. L. 2009. Scheduling in a continuous
galvanizing line. Computers and Operations Research 36: 280–296.
Wang, Q. G., Zou, B., Lee, T. H., and Qiang, B. 1997. Auto-tuning of multivariable PID
controllers from decentralized relay feedback. Automatica 33 (3): 319–330.
Wang, X., Wang, Z. J., and Tao, J. 2006. Multiple neural network modeling method
for carbon and temperature estimation in basic oxygen furnace. Lecture Notes in
Computer Science 3973: 870–875.
Werbos, P. 1974. Beyond regression: New tools for prediction and analysis in the behavioral
sciences. PhD dissertation, Harvard University.
Wilson, R. B. 1963. A simplicial algorithm for concave programming. PhD dissertation,
Graduate School of Business Administration, Harvard University.
Wolpert, D. H. and Macready, W. G. 1997. No free lunch theorems for optimization. IEEE
Transactions on Evolutionary Computation 1: 67–82.
Wu, C. H., Tzeng, G. H., and Lin, R. H. 2009. A novel hybrid genetic algorithm for ker-
nel function and parameter optimization in support vector regression. Expert Systems
with Applications 36: 4725–4735.
Wu, Q. 2010. A hybrid-forecasting model based on Gaussian support vector machine and
chaotic particle swarm optimization. Expert Systems with Applications 37: 2388–2394.
Xu, C. W. and Lu, Y. Z. 1987. Fuzzy model identification and self-learning for dynamic
systems. IEEE Transactions on Systems, Man and Cybernetics SMC 17 (4): 683–689.
Xu, K. and Li, W. 2006. Many hard examples in exact phase transitions. Theoretical
Computer Science 355: 291–302.
Yadollahpour, M. R., Bijari, M., Kavosh, S., and Mahnam, M. 2009. Guided local search
algorithm for hot strip mill scheduling problem with considering hot charge rolling.
International Journal of Advanced Manufacturing Technology 45: 1215–1231.
Yan, D., Ahmad, S. Z., and Yang, D. 2013. Matthew effect, ABC analysis and project manage-
ment of scale-free information systems. The Journal of Systems and Software 86: 247–254.
Yan, X. H., Zhu, Y. L., and Zou, W. P. 2011. A hybrid artificial bee colony algorithm for
numerical function optimization. Proceedings of the 11th International Conference on
Hybrid Intelligent Systems, Malacca, Malaysia, pp. 127–132.
Yang, B. and Liu, J. 2007. An autonomy oriented computing (AOC) approach to distrib-
uted network community mining. Proceedings of the 1st International Conference on
Self-Adaptive and Self-Organizing Systems, Boston, MA, pp. 151–160.
Yao, X. and Islam, Md. M. 2008. Evolving artificial neural network ensembles. IEEE
Computational Intelligence Magazine 3: 31–42.
Yao, X., Liu, Y., and Lin, G. 1999. Evolutionary programming made faster. IEEE
Transactions on Evolutionary Computation 3 (2): 82–102.
Ye, J. 2008. Adaptive control of nonlinear PID-based analog neural networks for a non-
holonomic mobile robot. Neurocomputing 71 (7–9): 1561–1565.
Zadeh, L. A. 1965. Fuzzy sets. Information and Control 8: 338–353.
Zhang, J., Chung, H. S. H., and Lo, W. L. 2007. Clustering-based adaptive crossover
and mutation probabilities for genetic algorithms. IEEE Transactions on Evolutionary
Computation 11: 326–335.
Zhang, J. H., Zhuang, J., Du, H. F., and Wang, S. A. 2009. Self-organizing genetic algo-
rithm based tuning of PID controllers. Information Sciences 179: 1007–1018.
Zhang, L., Zhou, W., and Jiao, L. 2004. Wavelet support vector machine. IEEE Transactions
on Systems, Man, and Cybernetics, Part B: Cybernetics 34: 34–39.
Zhang, M., Luo, W., and Wang, X. 2008. Differential evolution with dynamic stochastic
selection for constrained optimization. Information Sciences 178: 3043–3074.
Zhang, N. G. and Zeng, C. 2008. Reference energy extremal optimization: A stochastic
search algorithm applied to computational protein design. Journal of Computational
Chemistry 29: 1762–1771.
Zhang, W. X. 2001. Phase transitions and backbones of 3-SAT and maximum 3-SAT.
Proceedings of the 7th International Conference on Principles and Practice of Constraint
Programming, Paphos, Cyprus, pp. 153–167.
Zhang, W. X. 2002. Phase transitions, backbones, measurement accuracy, and phase-
aware approximation: The ATSP as a case study. Proceedings of CP-AI-OR, Le Croisic,
France, pp. 345–357.
Zhang, W. X. 2004. Phase transitions and backbones of the asymmetric traveling salesman
problem. Journal of Artificial Intelligence Research 21: 471–497.
Zhang, W. X. and Looks, M. 2005. A novel local search algorithm for the traveling salesman
problem that exploits backbones. Proceedings of the 19th International Joint Conference
on Artificial Intelligence, Morgan Kaufmann Publishers, San Francisco, pp. 343–351.
Zhao, S. Z., Iruthayarajan, M. W., Baskar, S., and Suganthan, P. N. 2011. Multi-objective
robust PID controller tuning using two lbests multi-objective particle swarm optimi-
zation. Information Sciences 181: 3323–3335.
Zhu, W. and Ali, M. M. 2009. Solving nonlinearly constrained global optimization problem
via an auxiliary function method. Journal of Computational and Applied Mathematics
230: 491–503.
Zitzler, E., Deb, K., and Thiele, L. 2000. Comparison of multi-objective evolutionary algo-
rithms: Empirical results. Evolutionary Computation 8 (2): 173–195.
Zitzler, E., Laumanns, M., and Thiele, L. 2001. SPEA2: Improving the performance of the
strength Pareto evolutionary algorithm. Technical Report 103, Computer Engineering
and Communication Networks Lab (TIK), Swiss Federal Institute of Technology
(ETH), Zurich, Gloriastrasse 35, CH-8092, Zurich.
Zitzler, E. and Thiele, L. 1998. Multi-objective optimization using evolutionary algo-
rithms—A comparative case study. Proceedings of the 7th International Conference on
Parallel Problem Solving from Nature, PPSN-V, Springer, Berlin.
Zitzler, E. and Thiele, L. 1999. Multi-objective evolutionary algorithms: A comparative
case study and the strength Pareto approach. IEEE Transactions on Evolutionary
Computation 3 (4): 257–271.
Chen, Y. W., Lu, Y. Z., and Chen, P. 2007. Optimization with extremal dynamics for the
traveling salesman problem. Physica A 385: 115–123.
Chen, Y. W., Lu, Y. Z., Ge, M., Yang, G. K., and Pan, C. C. 2012. Development of hybrid
evolutionary algorithms for production scheduling of hot strip mill. Computers and
Operations Research 39 (2): 339–349.
Chen, Y. W., Zhu, Y. J., Yang, G. K., and Lu, Y. Z. 2011. Improved extremal optimization
for the asymmetric traveling salesman problem. Physica A 390: 4459–4465.
Chen, Y. W., Lu, Y. Z., and Yang, G. 2008. Hybrid evolutionary algorithm with mar-
riage of genetic algorithm and extremal optimization for production scheduling.
International Journal of Advanced Manufacturing Technology 36: 959–968.
Li, X., Luo, J., Chen, M. R., and Wang, N. 2012. An improved shuffled frog-leaping algo-
rithm with extremal optimisation for continuous optimisation. Information Sciences
192: 143–151.
Liu, J. and Chen, Y. W. 2012. Toward understanding the optimization of complex systems.
Artificial Intelligence Review 38: 313–324.
Liu, J., Chen, Y. W., Yang, G. K., and Lu, Y. Z. 2011. Self-organized combinatorial optimi-
zation. Expert Systems with Applications 38: 10532–10540.
Lu, Y. Z. 1996. Industrial Intelligent Control: Fundamentals and Applications. John Wiley
& Sons, New York.
Lu, Y. Z., Chen, M. R., and Chen, Y. W. 2007. Studies on extremal optimization and
its applications in solving real world optimization problems. Proceedings of the 2007
IEEE Symposium on Foundations of Computational Intelligence (FOCI 2007), Hawaii,
pp. 162–168.
Luo, J. and Chen, M. R. 2014. Multi-phase modified shuffled frog leaping algorithm
with extremal optimization for the MDVRP and the MDVRPTW. Computers and
Industrial Engineering 72: 84–97.
Zeng, G. Q. 2011. Research on modified extremal optimization algorithms and their appli-
cations in combinatorial optimization problems. PhD thesis, Zhejiang University,
Hangzhou, China.
Zeng, G. Q., Chen, J., Chen, M. R., Dai, Y. X., Li, L. M., Lu, K. D., and Zheng, C. W.
2015. Design of multivariable PID controllers using real-coded population-based
extremal optimization. Neurocomputing 151: 1343–1353.
Zeng, G. Q., Chen, J., Dai, Y. X., Li, L. M., Zheng, C. W., and Chen, M. R. 2015. Design
of fractional order PID controller for automatic regulator voltage system based on
multi-objective extremal optimization. Neurocomputing 160: 173–184.
Zeng, G. Q., Lu, K. D., Dai, Y. X., Zhang, Z. J., Chen, M. R., Zheng, C. W., Peng, W. W.,
and Wu, D. 2014. Binary-coded extremal optimization for the design of PID control-
lers. Neurocomputing 138: 180–188.
Zeng, G. Q., Lu, Y. Z., Dai, Y.-X., Wu, Z. G., Mao, W. J., Zhang, Z. J., and Zheng, C. W.
2012. Backbone guided extremal optimization for the hard maximum satisfiability
problem. International Journal of Innovative Computing, Information and Control
8 (12): 8355–8366.
Zeng, G. Q., Lu, Y. Z., and Mao, W. J. 2010a. A novel stochastic method with modified
extremal optimization and nearest neighbor search for hard combinatorial problems.
Proceedings of the 8th World Congress on Intelligent Control and Automation, Jinan,
China, pp. 2903–2908.
Zeng, G. Q., Lu, Y. Z., and Mao, W. J. 2010b. Multistage extremal optimization for hard
travelling salesman problem. Physica A 389 (21): 5037–5044.
Zeng, G. Q., Lu, Y. Z., and Mao, W. J. 2011. Modified extremal optimization for the
hard maximum satisfiability problem. Journal of Zhejiang University, Science C 12
(7): 589–596.
Zeng, G. Q., Lu, Y. Z., Mao, W. J., and Chu, J. 2010c. Study on probability distributions
for evolution in modified extremal optimization. Physica A 389 (9): 1922–1930.
[Figure: panels (a) and (b) plot the probabilities Pe, Qe, Pp, Qp, Ph, and Qh against k on a logarithmic axis (10^0 to 10^3); panel (b) includes zoomed insets near probability 1 and probability 0.]
Figure 5.6 Landscape for two-dimensional Ackley function; left: surface plot
in an area from −20 to 20, right: focus around the area of the global optimum at
[0, 0] in an area from −2 to 2.
Figure 7.9 Original function examples and noisy learning examples (⋆) for case
3 (noise level N(0, 0.01)) and case 4 (noise level N(0, 5)). (Reproduced from Chen,
P. and Lu, Y. Z., Journal of Zhejiang University, Science C 12: 297–306, 2011b.
With permission.)
Figure 7.19 Comparison of step response for plant 1 under different algorithms-
based PID controllers. (Reprinted from Neurocomputing, 138, Zeng, G. Q. et al.,
Binary-coded extremal optimization for the design of PID controllers, 180–188,
Copyright 2014, with permission from Elsevier.)
Figure 7.20 Comparison of step response for plant 2 under different algorithms-
based PID controllers. (Reprinted from Neurocomputing, 138, Zeng, G. Q. et al.,
Binary-coded extremal optimization for the design of PID controllers, 180–188,
Copyright 2014, with permission from Elsevier.)
[Figure: panels (a) and (b) compare step responses over 0–150 min (panel (a): output y1) for the ideal position signal against AGA-, BCEO-, and PBPSO-based PID controllers.]
[Plot: step responses over 0–100 s for τ = 1.15, 1.20, 1.25, 1.30, 1.35, 1.40, 1.45, and 1.50.]
Figure 7.22 Adjustable parameter τ versus the step response for single-variable
plant 2. (Reprinted from Neurocomputing, 138, Zeng, G. Q. et al., Binary-coded
extremal optimization for the design of PID controllers, 180–188, Copyright
2014, with permission from Elsevier.)
Figure 8.12 Main graphical user interface of the developed HSM-scheduling sys-
tem. (Reprinted from Computers and Operations Research, 39 (2), Chen, Y. W.
et al., Development of hybrid evolutionary algorithms for production scheduling
of hot strip mill, 339–349, Copyright 2012, with permission from Elsevier.)