
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 8, NO. 3, JUNE 2004

The Fully Informed Particle Swarm: Simpler, Maybe Better

Rui Mendes, Member, IEEE, James Kennedy, and José Neves

Abstract—The canonical particle swarm algorithm is a new approach to optimization, drawing inspiration from group behavior and the establishment of social norms. It is gaining popularity, especially because of the speed of convergence and the fact that it is easy to use. However, we feel that each individual is not simply influenced by the best performer among his neighbors. We, thus, decided to make the individuals “fully informed.” The results are very promising, as informed individuals seem to find better solutions in all the benchmark functions.

Index Terms—Optimization, particle swarm optimization, social networks.

I. INTRODUCTION

THE CANONICAL particle swarm algorithm works by searching iteratively in a region that is defined by each particle’s best previous success, the best previous success of any of its neighbors, the particle’s current position, and its previous velocity. The current paper proposes an alternative that is conceptually more concise and promises to perform more effectively than the traditional particle swarm algorithm. In this new version, the particle uses information from all its neighbors, rather than just the best one.

The standard algorithm is given in some form resembling the following:

  v_{t+1} = α v_t + U(0, φ_1) ⊗ (p_i − x_t) + U(0, φ_2) ⊗ (p_g − x_t)   (1)
  x_{t+1} = x_t + v_{t+1}   (2)

where ⊗ denotes point-wise vector multiplication, U(min, max) is a function that returns a vector whose positions are randomly generated following the uniform distribution between min and max, α is called the inertia weight and is less than 1, v_t and x_t represent the speed and position of the particle at time t, p_i refers to the best position found by the particle, and p_g refers to the position found by the member of its neighborhood that has had the best performance so far. The Type 1 constriction coefficient is often used [1]:

  v_{t+1} = χ (v_t + U(0, φ_1) ⊗ (p_i − x_t) + U(0, φ_2) ⊗ (p_g − x_t))   (3)
  x_{t+1} = x_t + v_{t+1}   (4)

The two versions are equivalent, but are simply implemented differently. The second form is used in the present investigations. Other versions exist, but all are fairly close to the models given above.

A particle searches through its neighbors in order to identify the one with the best result so far, and uses information from that one source to bias its search in a promising direction. There is no assumption, however, that the best neighbor at time t actually found a better region than the second or third best neighbors. Important information about the search space may be neglected through overemphasis on the single best neighbor.

When constriction is implemented as in the second version above, lightening the right-hand side of the velocity formula, the constriction coefficient χ is calculated from the values of the acceleration coefficient limits φ_1 and φ_2; importantly, it is the sum of these two coefficients that determines what χ to use. This fact implies that the particle’s velocity can be adjusted by any number of terms, as long as the acceleration coefficients sum to an appropriate value. For instance, the algorithm given above is often used with χ = 0.729 and φ_1 = φ_2 = 2.05. The coefficients must sum, for that value of χ, to 4.1. Clerc’s analysis was worked out using a condensed form of the formula

  v_{t+1} = χ (v_t + U(0, φ) ⊗ (p_m − x_t))   (5)
  x_{t+1} = x_t + v_{t+1}   (6)

which was then expanded to partition the acceleration weight φ between the particle’s own previous success and the neighborhood’s, such that φ = φ_1 + φ_2. Note that p_m in this deterministic model is calculated as

  p_m = (φ_1 p_i + φ_2 p_g) / (φ_1 + φ_2).

II. VARIATION AND PARTITIONING OF φ

The search of particle i converges on a point in the search space. Variation is introduced in several ways.
• First, obviously, the φ term is weighted by a random number. This in itself, however, would not prevent the velocity from approaching a zero limit. For instance, if the difference (p_m − x_t) equals zero, the velocity will still converge to zero.
• Thus, another important source of variation is the difference between x_t and p_m. As long as the position of the particle differs from the previous best position, then there will be movement. In a constricted algorithm, however, this difference tends toward zero over time as x_t is updated.
• Of course, it is hoped in practice that p_m does not remain fixed, and a key source of variation is the updating of the previous bests that define it.

Manuscript received July 10, 2002; revised August 29, 2003. The work of R. Mendes was supported in part by PRAXIS XXI, ref. BD/3107/9 and POSI/ROBO/43904/2002.
R. Mendes and J. Neves are with the Departamento de Informática, Universidade do Minho, Braga 4710-057, Portugal (e-mail: [email protected]; [email protected]).
J. Kennedy is with the Bureau of Labor Statistics, Washington, DC 20212 USA (e-mail: [email protected]).
Digital Object Identifier 10.1109/TEVC.2004.826074
1089-778X/04$20.00 © 2004 IEEE
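As a concrete illustration, the condensed constricted update of (5) and (6) can be sketched in Python. This sketch is ours, not the authors' implementation; it uses the χ = 0.729, φ_1 = φ_2 = 2.05 setting quoted above, and draws the random weights once so that they serve both to form p_m and to scale the pull toward it:

```python
import numpy as np

CHI, PHI = 0.729, 4.1  # constriction coefficient and total acceleration weight

def condensed_step(x, v, p_i, p_g, rng=np.random):
    """One constricted update, eqs. (5)-(6): v <- chi * (v + U(0, phi) (x) (p_m - x))."""
    # Split phi evenly between the particle's own best and the neighborhood best,
    # as in the usual phi_1 = phi_2 = 2.05 setting.
    u1 = rng.uniform(0.0, PHI / 2, size=x.shape)
    u2 = rng.uniform(0.0, PHI / 2, size=x.shape)
    # p_m is the weighted average of the two attractors (the stochastic analogue
    # of p_m = (phi_1 p_i + phi_2 p_g) / (phi_1 + phi_2)).
    p_m = (u1 * p_i + u2 * p_g) / (u1 + u2)
    v = CHI * (v + (u1 + u2) * (p_m - x))
    return x + v, v
```

Expanding (u1 + u2) ⊗ (p_m − x) gives u1 ⊗ (p_i − x) + u2 ⊗ (p_g − x), which is exactly the two-term form (3); the same algebra is what licenses splitting φ across any number of terms.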


The previous bests are updated over time as new points are found in the search space which are better than those previous ones. It is not necessary for p_m to be i’s own previous best point, in order for i’s trajectory to converge to it. For convergence, it is only necessary for p_m to remain fixed.
• In the traditional particle swarm, the single most important source of variation is the difference between i’s own previous best and the neighborhood’s previous best, that is, between p_i and p_g. Random weighting of the two terms keeps the particle searching between and beyond a region defined by the two points. While some investigators have looked at schemes for differentially weighting the two terms (e.g., [2]), the limits for the two uniform distributions are usually the same. That is, the total weight of φ is partitioned into two equal components.

Clerc’s method, however, does not require that the velocity adjustments be shared between two terms. It is only necessary that the parts sum to a value that is appropriate for the constriction weight χ. The algorithm will behave properly, at least as far as its convergence and explosion characteristics, whether all of φ is allocated to one term, or it is divided into thirds, fourths, etc.

We propose an alternate form of calculating p_m:

  p_m = ( Σ_{k ∈ N} W(k) φ_k ⊗ p_k ) / ( Σ_{k ∈ N} W(k) φ_k )   (7)
  φ_k = U(0, φ_max / |N|),  ∀k ∈ N   (8)

where N is the set of neighbors of the particle and p_k is the best position found by individual k. The function W may describe any aspect of the particle that is hypothesized to be relevant; in the experiments reported below, we use the fitness of the best position found by the particle, and the distance from that particle to the current individual, or have W return a constant value. Because all the neighbors contribute to the velocity adjustment, we say that the particle is fully informed.

III. SOCIOMETRY IN THE FULLY INFORMED PARTICLE SWARM

In the traditional particle swarm, a particle with k neighbors selects one to be a source of influence and ignores the others. In that situation, neighborhood size means how many other particles you can choose among, and the more there are, the better the one you pick is likely to be. In the fully informed neighborhood, however, all neighbors are a source of influence. Thus, neighborhood size determines how diverse your influences will be, and in an optimization algorithm diverse influences might mean that search is diluted rather than enhanced.

The rest of the paper will describe experiments with various neighborhoods, where all the neighbors’ previous best values are used to modify the velocity of the particle. These arrangements of the neighborhoods can be thought of as social networks.

It should be appreciated that the topological structure of the population controls its exploration versus exploitation tendencies [3], [4].

Fig. 1. Topologies used in the paper are presented in the following order: All, where all vertexes are connected to every other; Ring, where every vertex is connected to two others; Four clusters, with four cliques connected among themselves by gateways; Pyramid, a triangular wire-frame pyramid; and Square, which is a mesh where every vertex has four neighbors that wraps around on the edges as a torus.

The behavior of each particle is affected by its local neighborhood, which can be seen as a single region in the population topology. Thus, the topology affects search at a low level, by defining neighborhoods. Particles that are acquainted to one another tend to explore the same region of the search space. It also affects search at a higher level, by defining the relationships between the local neighborhoods.

The current study tested five different social networks that had given good results in a previous study [3]. The networks are encoded in binary matrices for input into the program, and are depicted graphically in Fig. 1.

A social network can be characterized by a series of statistics that convey some information about its structure and the speed of communication flow. The most descriptive statistics are the graph’s average distance, its diameter, and the distribution sequence. The average distance measures the average number of edges between any two nodes. The diameter is the largest distance between two nodes in the graph. The distribution sequence is a descriptive statistic of the form (d_1, d_2, …, d_n), where d_i is the average number of nodes reachable from a vertex of the graph by traversing exactly i arcs, without cycles. Note that the first value of the distribution sequence, d_1, is the average degree of the graph.

Whenever a particle discovers a good region of the search space, it only directly influences its neighbors. Its second degree neighbors will only be influenced after those directly connected to it become highly successful themselves. Thus, there is a delay in the information spread through the graph. This delay can be characterized by the distribution sequence statistic. The average distance and the diameter of the graph are two simple statistics that represent, respectively, the average and the maximum number of cycles of influence needed to broadcast information throughout the graph.

By studying Table I, we can extract a number of conclusions about the topologies used. The all topology was the one used when the algorithm was developed and is still widely used by researchers. It represents a fully connected graph, and, based on all three statistics, we conjecture that information spreads quickly.
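The three statistics can be computed by breadth-first search over the binary adjacency matrices mentioned above. The sketch below, with our own helper names, assumes a connected graph stored as a 0/1 matrix:

```python
from collections import deque

def bfs_depths(adj, start):
    """Shortest-path depth (in edges) from start to every reachable vertex."""
    depth = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v, connected in enumerate(adj[u]):
            if connected and v != u and v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth

def graph_statistics(adj):
    """Average distance, diameter, and distribution sequence of a connected graph."""
    n = len(adj)
    all_depths = [bfs_depths(adj, s) for s in range(n)]
    dists = [d for depths in all_depths for d in depths.values() if d > 0]
    diameter = max(dists)
    avg_distance = sum(dists) / len(dists)
    # distribution sequence: d_i = average number of nodes exactly i hops away
    seq = [sum(1 for depths in all_depths for d in depths.values() if d == i) / n
           for i in range(1, diameter + 1)]
    return avg_distance, diameter, seq
```

On the all topology, for example, the distribution sequence collapses to the single value n − 1 and the diameter is 1, matching the conjecture that information spreads quickly.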


TABLE I
TOPOLOGIES USED IN THE STUDY AND THE ASSOCIATED GRAPH STATISTICS

Sociologically, it could represent a small and closed community where decisions are taken in consensus.

The ring topology, the usual alternative to all, represents a regular graph with a minimum number of edges between its nodes. The graph statistics show that information travels slowly along the graph. This allows for different regions of the search space to be explored at the same time, as information of successful regions takes a long time to travel to the other side of the graph.

The four clusters topology represents four cliques connected among themselves by several gateways. Sociologically, it resembles four mostly isolated communities, where a few individuals have an acquaintance outside their group. This graph is characterized by the large number of individuals three hops away, despite the fact that its diameter is only 3.

The pyramid represents a three-dimensional wire-frame pyramid. It has the lowest average distance of all the graphs and the highest first and second degree neighbors. The square is a graph representing a rectangular lattice that folds like a torus. This structure, albeit artificial, is commonly used to represent neighborhoods in the Evolutionary Computation and Cellular Automata communities, and is referred to as the von Neumann neighborhood.

IV. DEPENDENT VARIABLES AND FREE LUNCH

The present experiments extracted three kinds of measures of performance on a standard suite of test functions. The functions were the sphere or parabolic function in 30 dimensions, Rastrigin’s function in 30 dimensions, Griewank’s function in 10 and 30 dimensions (the importance of the local minima is much higher in 10 dimensions, due to the product of cosines, making it much harder to find the global minimum), Rosenbrock’s function in 30 dimensions, and Schaffer’s f6, which is in 2 dimensions. Formulas can be found in the literature, e.g., in [5].

It does not seem interesting to us to demonstrate that an algorithm is good on some functions and not on others. What we hope for is a problem-solver that can work well with a wide range of problems. This line of reasoning drives us head-on into the no free lunch (NFL) theorem [6], [7].

A. Free Lunch

NFL asserts that no algorithm can be better than any other, over all possible functions. This seems to be true because of two classes of functions: deceptive ones, and random ones. Deceptive functions lead a hill-climber away from the optimum; for instance, there may be gradients that lead away from a discontinuous point that is the global optimum. Over all possible functions, it must be true that gradients lead away from the optimum at least as often as they lead the searcher toward it.

The second class, random functions, contains very many more members than the first [8]. In fact, when all possible functions are considered, it seems certain—indeed it can be proven—that most of them are nonsense. Where gradients exist, they are unrelated to real solutions. On these very numerous function landscapes, a hill-climber will do no better than a hill-descender, no matter whether you are trying to maximize or minimize. It is like finding a needle in a haystack; no method of search can be any better than dumb luck. These two classes of functions explain why there is NFL.

But there is a third class of functions. These are functions where regularities on the fitness landscape do provide clues as to the location of a problem solution. Speaking of dumb luck, it is lucky for us that this third class contains most of the kinds of functions that we call problems. Problems are a special subclass of functions; they are special because somebody thinks there may be a solution to them, and wants to find it.

It is interesting to consider whether this third class of functions is actually more common in the world, perhaps because of correlations forced by physical laws, or whether they are merely more salient because of some idiosyncrasy of human attention. As we cannot count up instances of real function landscapes—like the set of “all possible functions,” it is innumerable and meaningless—we will never be able to satisfy our curiosity regarding this question.

How do we know if a function has a solution or not? Of course, we have known since Turing that we cannot tell with certainty whether an algorithm will ever reach finality [9], that is, in this case, whether a problem can be solved. But even though there is no certainty, there are clues. For instance, if it is believed that a cause and effect relationship exists among variables in the function, then we may expect to find some exploitable regularities in the fitness landscape. Even if the causal relationship is noisy, or if the relationship involves variables not mentioned in the function (e.g., the “third variable problem” in correlational research [10]), it is often possible to find useful features on the function landscape.

Another clue that a function might be solvable is when it is compressible. The easiest-to-spot form of this clue exists when the problem is given as a mathematical formula, rather than a lookup table. If the formula is shorter than the table of all possible input–output matches, then we have been given a hint that it might be useful to watch for regularities. The evidence of this is seen in the difficulty of the search for functions that produce random outputs [11]; it is not easy to produce an unpredictable series out of a mathematical formula, e.g., a good random number generator, even though random functions are known to comprise the larger share of the universe of all possible functions.

In some hard cases, we may have only hypotheses and intuitions to provide ideas for how to search for patterns that will reveal the problem solution. Sometimes we are wrong, and a problem is reassigned to one of the first two classes of functions.


We reiterate the important point that a function is only a problem if someone thinks it is a problem. That means that a function, let us say Schaffer’s f6, may exist as a curiosity without anyone ever trying to find its minimum. All of science can be viewed as a progression of things that have always existed suddenly becoming problems to be solved and explained. The argument presented here is the pragmatist’s response to NFL. If somebody is trying to solve it, it is a problem; even if it does turn out to be deceptive or random, and they give up on it, and it stops being a problem, it is a problem during the time they are trying to solve it. It may remain a problem if something about it gives the researcher hope of solving it.

This is all a way of saying that the NFL theorem, eyeball-popping as it is, is not especially relevant to the task of problem-solving. NFL does not say that the search for a general problem-solver is futile; it does say that the search for a general function optimizer is futile. As researchers, it is our aim to minimize the amount of time we devote to searching for optima on deceptive and random function spaces.

Thus, in the current exercises we combined results from all the test functions, all of which are commonly used in experimentation with optimization algorithms, with the goal in mind of finding versions of the particle swarm algorithm that perform well on all of them. If we are successful in this, then we will naturally extend the range of problems until we have widened the applicability of the particle swarm to its broadest extent.

B. Performance

The first dependent variable is simply the best function result after some arbitrary number of iterations; here, we use 1000. Basically, this is a measure of sloppy speed. It does not necessarily indicate whether the algorithm is close to the global optimum; a relatively high score can be obtained on some of these multimodal functions simply by finding the best part of a locally optimal region.

It is not possible to combine raw results from different functions, as they are all scaled differently. For instance, almost any decent algorithm will find a function result less than 0.01 on the sphere function, but a result of 40.0 on Rosenbrock is considered good. In order to combine the function outputs, we standardized the results of each function to a mean of 0.0 and standard deviation of 1.0. All results of all trials for a single function are standardized to the same scale; as all of these problems involve minimization, a lower result is better, and after standardization that means that a negative result is better than average. After standardizing each function separately, we can combine them and find the average for a single condition.

One comment about combining data from different functions: when a very good performance is combined with a very bad one, the result is a moderate average. On the other hand, a very good average can only be attained through combining very good scores. In this paper, we are interested in discovering very good performers and will neglect the confusion found in the middle.

C. Iterations to Criteria

The second dependent variable is the number of iterations required to reach a criterion. Function criteria are given in Table II. This is also a measure of speed, but in this case the criteria are intended to indicate that the searcher has arrived in the region of the global optimum.

TABLE II
PARAMETERS AND CRITERIA FOR THE TEST FUNCTIONS

There is, however, a problem with this measure, too. That is, some trials might never reach the criteria. Many hours have been lost waiting, trying to give each version a fair chance to find the global optimum, often in vain. Trials where the criteria are not met after a reasonable time—here, we use 10 000 iterations—must be coded as infinite, which means among other things that the mean is meaningless.

The proper measure of central tendency for such a data set is the median. If the majority of trials are coded as infinite, then the median is represented as infinity, shown in the results tables with the lemniscus. In order to combine iteration data, we used the mean of the medians, with the caveat that if any median were infinite, the mean would be infinite, too.

Note that the first measure, performance, considers data after a short run of 1000 iterations, and is a speed measure. The trials were run for as many as 10 000 iterations, however, to determine whether the criterion would be met at all. Thus, one measure was taken at 1000 iterations, and then if the criterion had not been met, the trial ran for as many iterations as were necessary. If the criterion was not met by 10 000 iterations, the trial was treated as if the criterion would never be met. In most cases this is true, as failure after 10 000 iterations suggests that the population has converged in an area of the search space that is not globally optimal. The first measure determines whether the algorithm can get a good solution fast, e.g., after only 1000 iterations, while the second and third measures determine how long it takes to find the global optimum if left to run, or whether it can find it at all.

D. Proportion Reaching Criteria

The third dependent measure is perhaps the most important one. This is a simple binary code indicating whether the criteria were met within 10 000 iterations or not. Averaged over all function trials, this gives the proportion of trials that successfully found the global optimum. There is no trick to this one; the mean of the ones and zeroes, where one indicates success and zero failure, gives the proportion of successes.

V. METHOD

The experiment manipulated neighborhood topologies, initialization strategies, and algorithm details. The types of topologies have been described and are shown in Fig. 1. Two kinds of initialization strategies were used, which we called, after Shi and Eberhart [12], “symmetrical” and “asymmetrical.” Symmetrical initialization is performed over the entire spectrum of valid solutions, while asymmetrical initialization started particles with an offset, so they were off-center.


TABLE III
STANDARDIZED PERFORMANCE OF THE TOPOLOGIES AND ALGORITHMS. NEGATIVE VALUES ARE BELOW THE MEAN
WHILE POSITIVE VALUES ARE ABOVE. AS THE TASKS INVOLVE MINIMIZATION, THE BEST PERFORMANCES
ARE THE MOST NEGATIVE. IN BOLD ARE THE BEST RESULTS FOR EACH ALGORITHM/INITIALIZATION PAIR

This eliminated any advantage that might be gained when function optima were located near the center of the parameter space.

There were six kinds of algorithm types:
  Canonical: the traditional particle swarm, with Type 1 constriction;
  FIPS: the fully informed particle swarm with W returning a constant, i.e., where all contributions have the same value;
  wFIPS: a fully informed swarm, where the contribution of each neighbor was weighted by the goodness of its previous best;
  wdFIPS: also fully informed, with the contribution of each neighbor weighted by its distance in the search space from the target particle;
  Self: a fully informed model, where the particle’s own previous best received half the weight;
  wSelf: a fully informed model, where the particle’s own previous best received half the weight and the contribution of each neighbor was weighted by the goodness of its previous best.
Canonical, FIPS, and wFIPS were tested with both symmetrical and asymmetrical initializing.

The five types of topologies shown in Fig. 1 were tested. As some were tested with and without including the target particle in the neighborhood, there were nine topology conditions: Square, Ring, Pyramid, and All were tested both ways, and FourClusters was only tested with the self excluded. Conditions without the self are written with a “U” prefix, e.g., USquare is the Square topology, with the reference to the particle’s own index removed from the neighborhood.

VI. RESULTS

We present the results on the three dependent measures separately. Following that we look at patterns across the measures, and finally we discuss the implications of the results.

A. Performance

Table III shows the pattern of standardized averages across the topologies and algorithms. Recalling that positive values indicate bad performance and negative ones good for the minimization problem, we notice some patterns immediately. For instance, four of the nine wdFIPS algorithm conditions are quite bad (more than three standard deviations worse than the mean); one cell of the All topology was more than 3 s.d., and two cells more than one s.d., worse than the mean; and two of the UAll topology conditions were farther than one s.d. worse than the mean. Two other cells in the Square, and two in the Pyramid topology, were less than one s.d. worse than the mean. These account for all of the worse-than-average cells in the design.

Looking for excellence, we note that of the eight conditions resulting in a performance 0.4 standard deviations or farther below the mean, five of them occurred when the neighborhood was the unselfed square. The other three appear in selfless pyramid conditions. The best performance of all occurred in the selfless-square FIPS configuration.

In light of the results presented below, it is noteworthy that problem solving using the URing topology was rather slow, relative to the others, while the USquare was rather fast.

The performance measure tells us how well a problem-solver is able to do within a limited amount of time. Many times in real-world applications it is “good enough” to find a good point on a local optimum; this first dependent variable tells us how high an algorithm is able to get on a fitness peak, but says nothing about whether it is the globally best peak.

B. Iterations to Criteria

How quickly does an algorithm reach a criterion that presumably reflects the presence of a global optimum? In Table IV, we see that some algorithm conditions cannot reach the criterion, even after 10 000 iterations. In particular, the wdFIPS tends not to reach it, especially with topologies that showed badly on the performance measure, as well; the All and UAll measures also failed in all cases with the FIPS variations, though they displayed about average success on the canonical algorithms. A few other topologies had trouble with the asymmetrical initializations.

Again, the URing was relatively slow and the USquare relatively faster than others. The canonical versions were moderately slow. The configurations that converged the fastest were the UPyramid on both FIPS and wFIPS, and the Four-Cluster topology on wFIPS.

Medians are used in this measure to account for failures to meet the criterion at all. A cell may have as many as half its trials fail to meet the standard, but if the remaining trials went quickly, the median iterations will suggest erroneously that something good has happened. Fast convergence of a configuration to the performance criterion on a large percentage of trials would suggest good problem solving qualities; fast convergence on half the trials would not.
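The FIPS variants compared in these tables differ only in the weighting function W of (8). The sketch below uses our own names; the exact fitness and distance weightings are not spelled out in the text, so the reciprocal-fitness and Euclidean-distance forms here are illustrative assumptions:

```python
import numpy as np

def fully_informed_pm(x, neighbor_bests, neighbor_fitness, phi_max=4.1,
                      weighting="constant", rng=np.random):
    """p_m of eq. (8): a W- and phi-weighted average of all neighbors' best points."""
    n = len(neighbor_bests)
    if weighting == "constant":        # FIPS: every neighbor contributes equally
        w = np.ones(n)
    elif weighting == "fitness":       # wFIPS: lower (better) fitness, larger weight;
        w = 1.0 / (1.0 + np.asarray(neighbor_fitness, dtype=float))  # our assumed form
    elif weighting == "distance":      # wdFIPS: weight by distance to this particle
        w = np.array([np.linalg.norm(np.asarray(p) - x) for p in neighbor_bests])
    else:
        raise ValueError("unknown weighting")
    phi = rng.uniform(0.0, phi_max / n, size=n)   # eq. (7): phi_k ~ U(0, phi_max/|N|)
    coef = w * phi
    return sum(c * np.asarray(p) for c, p in zip(coef, neighbor_bests)) / coef.sum()
```

Because the φ_k sum to at most φ_max, the constriction analysis continues to apply whatever W is chosen.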


TABLE IV
MEDIAN NUMBER OF ITERATIONS TO CRITERIA. THESE REPRESENT THE NUMBER OF ITERATIONS THE ALGORITHM TOOK TO
REACH THE CRITERIA. AN INFINITE VALUE MEANS THAT AT LEAST HALF THE EXPERIMENTS WERE UNSUCCESSFUL.
IN BOLD IS THE QUICKEST RESULT FOR EVERY ALGORITHM/INITIALIZATION PAIR

TABLE V
PROPORTION OF EXPERIMENTS REACHING CRITERIA. THEY REPRESENT FOR EACH CONFIGURATION THE PROPORTION OF
RUNS THAT WERE ABLE TO REACH THE REGION SPECIFIED BY THE CRITERIA. IN BOLD ARE THE
BEST RESULTS FOR EACH ALGORITHM/INITIALIZATION PAIR

the trials would not. The next dependent variable tells us how D. Combining the Measures
often the criteria were met.
Our prejudice is that the most weight should be given to the
last measure, the proportion of successes, though the other mea-
C. Proportion of Trials Reaching Criteria sures should be taken into account. By this rule of thumb, we
For us, the third dependent measure is the most important. could recommend always using the URing, which never failed
With today’s computer speeds, the difference of a few thou- when implemented with the wFIPS algorithm. We remember,
sand iterations may be a matter of seconds, and slight speed advantages are not usually crucial. The proportion measure tells, though, in black and white, whether the given algorithm/topology configuration can solve the problems (Table V).

The first result that jumps out is that the URing topology with the wFIPS algorithm found the global optimum (as measured by meeting the criteria) on 100% of its trials, that is, 40 trials each on 6 functions, amounting to 240 total trials. This is obviously a remarkable performance. We note also that 24 algorithm/topology combinations, out of 81, met the criterion 90% of the time or more. The canonical algorithm achieved performances greater than 0.90 on five of nine topologies, and the wFIPS on five of nine. wFIPS beat the 90% mark in three of the asymmetric initialization conditions, while the canonical algorithm never did. Unweighted FIPS was above 0.90 four times in the symmetric and three times in the asymmetric initialization conditions, and the weighted and unweighted Self algorithms broke the 0.90 standard one time each.

Looking at topologies, we see that none of the Square, Pyramid, All, or UAll conditions met the criterion 90% of the time. The Ring did it five times; the Four-Clusters thrice; USquare six times; URing eight times; and UPyramid twice. It appears that the USquare and URing topologies were the most successful vehicles for the communication among particles, at least across the algorithms tested here. The URing was, however, relatively slow by both of the first two measures, and weighting the FIPS adds some computational cost. If speed is a requirement, and the URing's relative slowness may create problems, then we would suggest the USquare with the unweighted FIPS algorithm. This combination succeeded approximately 98.9% of the time, meaning that it failed three times out of 240. The USquare/FIPS also had the best score at 1000 iterations and was the ninth fastest to reach the criteria.

VII. CONCLUSION

The canonical particle swarm algorithm showed itself to be a journeyman problem solver, finding the global optimum a respectable proportion of the time, depending on topology, and getting there in a respectably fast time. It was outperformed, though, by the FIPS versions, on every dependent measure. It should be mentioned that the FIPS versions were the only ones able to consistently find the minimum of the Griewank function in ten dimensions. The best result obtained by a canonical configuration was the USquare, with a proportion of 72.5%, followed by much lower proportions using the other topologies (lower than 60%). Both FIPS algorithms with all the social topologies except the All versions were able to find the minimum 100% of the time.
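As an illustration of the two update rules compared throughout this section, the sketch below implements a constricted canonical update and an unweighted FIPS update over a ring topology. This is our own minimal reconstruction, not the authors' code; the constriction values (χ ≈ 0.7298, total φ = 4.1) follow Clerc and Kennedy [1], and the function names and signatures are ours.

```python
import random

def ring_neighbors(n, i, include_self=True):
    """Neighbor indices of particle i in a ring lattice of n particles.
    The 'U' topologies in the paper exclude the particle itself."""
    nbrs = [(i - 1) % n, (i + 1) % n]
    if include_self:
        nbrs.append(i)
    return nbrs

def canonical_velocity(v, x, pbest, gbest, chi=0.7298, phi=2.05):
    """Canonical constricted update: the particle is pulled toward its own
    previous best and the best neighbor only (a special case of FIPS)."""
    return [chi * (v[j]
                   + random.uniform(0, phi) * (pbest[j] - x[j])
                   + random.uniform(0, phi) * (gbest[j] - x[j]))
            for j in range(len(x))]

def fips_velocity(v, x, pbests, nbrs, chi=0.7298, phi_total=4.1):
    """Fully informed update: every neighbor's previous best contributes,
    with the total acceleration phi_total split equally among neighbors."""
    phi_k = phi_total / len(nbrs)
    return [chi * (v[j] + sum(random.uniform(0, phi_k) * (pbests[k][j] - x[j])
                              for k in nbrs))
            for j in range(len(x))]
```

In the U-versions discussed above, `include_self=False` removes the particle from its own neighborhood; the weighted wFIPS variant additionally scales each neighbor's share of φ according to the quality of that neighbor's previous best.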


The effect of weighting neighbors’ contributions by their fitness is not clearly settled. We have to note that the one condition that met the criteria 100% of the time was a weighted condition. It could be argued that the extra computational expense of weighting is not justified; the unweighted versions performed quite well, maybe well enough.

A word needs to be said about the particle’s contribution to its own trajectory. The U-versions, for instance USquare, performed better in many cases than versions where the self was included in the neighborhood. This goes contrary to particle swarm lore, which describes the algorithm in terms of the combination of “cognitive” and “social” experience. FIPS versions where half the weight was given to the self did not perform outstandingly, and it does not appear that the individual’s own previous best needs to be part of the formula.

But it must be noted that the particle’s own experience does contribute information to its trajectory, in the form that Miranda and Fonseca [13] call its “habit.” For instance, the current velocity is created out of the velocities of the previous time steps as the particle maintains a cyclic trip through the search space. The particle’s current position is featured in every comparison term. Finally, the position in the next step is a function of the position in the current one. Thus, the habit of the particle affects everything about its trajectory. It is just that, in the FIPS versions without self, the explicit memory of the previous best point it has sampled is not part of the particle’s immediate decision.

As expected, increasing the size of the neighborhood seems to deteriorate the performance of the swarm. The very worst FIPS conditions in the study were the UAll and All topologies, where the particle is truly fully informed, gathering information from every single member of the population. The best were the Ring and Square versions, where the particle has three and five neighbors (counting itself), respectively, plus their U-versions, which subtract one.

We note that, though asymmetric initialization radically hurt performance in many conditions, it had nearly no effect on the USquare and URing conditions in FIPS and wFIPS. The unweighted FIPS with URing actually found the global optimum a greater proportion of the time with asymmetrical initialization, though we do not insist that this was a significant difference; in any case, asymmetry clearly does not impair the algorithm.

The fully informed particle swarm is not a radical departure from previous versions. The standard two-term PSO is simply seen to be a special case of the FIPS, one that includes the selection of one particular neighbor to influence the target particle. The FIPS representation of the particle swarm algorithm has the potential for freeing investigators to look at other important features of the algorithm.

REFERENCES

[1] M. Clerc and J. Kennedy, “The particle swarm: Explosion, stability, and convergence in a multidimensional complex space,” IEEE Trans. Evol. Comput., vol. 6, pp. 58–73, Feb. 2002.
[2] A. Carlisle and G. Dozier, “An off-the-shelf PSO,” in Proc. Workshop on Particle Swarm Optimization, Indianapolis, IN: Purdue School of Eng. Technol., IUPUI, Apr. 2001.
[3] J. Kennedy and R. Mendes, “Topological structure and particle swarm performance,” in Proc. 4th Congr. Evolutionary Computation (CEC-2002), D. B. Fogel, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, Eds., Honolulu, HI, May 2002, pp. 1671–1676.
[4] J. Kennedy, “Small worlds and mega-minds: Effects of neighborhood topology on particle swarm performance,” in Proc. 1999 Conf. Evolutionary Computation, Washington, DC, 1999, pp. 1931–1938.
[5] R. G. Reynolds and C. Chung, “Knowledge-based self-adaptation in evolutionary programming using cultural algorithms,” in Proc. IEEE Int. Conf. Evolutionary Computation (ICEC’97), 1997, pp. 71–76.
[6] D. H. Wolpert and W. G. Macready, “No free lunch theorems for search,” Santa Fe Inst., Santa Fe, NM, Tech. Rep. SFI-TR-95-02-010, 1995. [Online]. Available: citeseer.nj.nec.com/wolpert95no.html
[7] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evol. Comput., vol. 1, pp. 67–82, Apr. 1997.
[8] T. M. English, “Optimization is easy and learning is hard in the typical function,” in Proc. Congr. Evolutionary Computation (CEC 2000), La Jolla, CA, 2000, pp. 924–931.
[9] A. Turing, “On computable numbers, with an application to the Entscheidungsproblem,” Proc. London Mathematical Society, 1936, pp. 230–265.
[10] T. Cook and D. Campbell, Quasi-Experimentation: Design and Analysis Issues for Field Settings. Skokie, IL: Rand McNally, 1979.
[11] W. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[12] Y. Shi and R. Eberhart, “Parameter selection in particle swarm optimization,” in Evolutionary Programming VII: Proc. EP98. Springer-Verlag, 1998, pp. 591–600.
[13] V. Miranda and N. Fonseca, “EPSO—best-of-two-worlds meta-heuristic applied to power system problems,” in Proc. 4th Congr. Evolutionary Computation (CEC-2002), D. B. Fogel, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, Eds., Honolulu, HI, May 2002, pp. 1080–1085.

Rui Mendes (M’03) received the B.S. degree in mathematics and computer science and the Ph.D. degree in computer engineering from the University of Minho, Braga, Portugal, in 1994 and 2004, respectively. His thesis was called “Population Topologies and Their Influence in Particle Swarm Performance.” He is a Computer Scientist who has been working with the particle swarm algorithm since 2001. His research interests are swarm intelligence and evolutionary computation.

James Kennedy received the Ph.D. degree from the University of North Carolina, Chapel Hill, in 1992. He is with the U.S. Department of Labor, Washington, DC. He is a Social Psychologist who has been working with the particle swarm algorithm since 1994. He has published dozens of articles and chapters on particle swarms and related topics, in computer science and social science journals and proceedings. He is a coauthor of Swarm Intelligence (San Mateo, CA: Morgan Kaufmann, 2001), with R. C. Eberhart and Y. Shi, now in its third printing.

José Neves is a Full Professor in the Informatics Department, University of Minho, Braga, Portugal. He is the Head of the Artificial Intelligence Group and coordinates several projects with applications in the areas of law and medicine. His research interests are knowledge representation and computational logic.

