The Fully Informed Particle Swarm: Simpler, Maybe Better
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 8, NO. 3, JUNE 2004
Abstract—The canonical particle swarm algorithm is a new approach to optimization, drawing inspiration from group behavior and the establishment of social norms. It is gaining popularity, especially because of the speed of convergence and the fact that it is easy to use. However, we feel that each individual is not simply influenced by the best performer among his neighbors. We, thus, decided to make the individuals “fully informed.” The results are very promising, as informed individuals seem to find better solutions in all the benchmark functions.

Index Terms—Optimization, particle swarm optimization, social networks.

I. INTRODUCTION

The two versions are equivalent, but are simply implemented differently. The second form is used in the present investigations. Other versions exist, but all are fairly close to the models given above.

A particle searches through its neighbors in order to identify the one with the best result so far, and uses information from that one source to bias its search in a promising direction. There is no assumption, however, that the best neighbor at time t actually found a better region than the second or third best neighbors. Important information about the search space may be neglected through overemphasis on the single best neighbor.

When constriction is implemented as in the second version above, lightening the right-hand side of the velocity formula,
Authorized licensed use limited to: BEIHANG UNIVERSITY. Downloaded on March 14,2023 at 03:05:37 UTC from IEEE Xplore. Restrictions apply.
MENDES et al.: FULLY INFORMED PARTICLE SWARM: SIMPLER, MAYBE BETTER 205
206 IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, VOL. 8, NO. 3, JUNE 2004
TABLE I
TOPOLOGIES USED IN THE STUDY AND THE ASSOCIATED GRAPH STATISTICS

quickly. Sociologically, it could represent a small and closed community where decisions are taken in consensus.

The ring topology, the usual alternative to All, represents a regular graph with a minimum number of edges between its nodes. The graph statistics show that information travels slowly along the graph. This allows for different regions of the search space to be explored at the same time, as information of successful regions takes a long time to travel to the other side of the graph.

The four clusters topology represents four cliques connected among themselves by several gateways. Sociologically, it resembles four mostly isolated communities, where a few individuals have an acquaintance outside their group. This graph is characterized by the large number of individuals three hops away, despite the fact that its diameter is only 3.

The pyramid represents a three-dimensional wire-frame pyramid. It has the lowest average distance of all the graphs and the highest numbers of first- and second-degree neighbors. The square is a graph representing a rectangular lattice that folds like a torus. This structure, albeit artificial, is commonly used to represent neighborhoods in the Evolutionary Computation and Cellular Automata communities, and is referred to as the von Neumann neighborhood.

IV. DEPENDENT VARIABLES AND FREE LUNCH

The present experiments extracted three kinds of measures of performance on a standard suite of test functions. The functions were the sphere or parabolic function in 30 dimensions, Rastrigin’s function in 30 dimensions, Griewank’s function in 10 and 30 dimensions (the importance of the local minima is much higher in 10 dimensions, due to the product of cosines, making it much harder to find the global minimum), Rosenbrock’s function in 30 dimensions, and Schaffer’s f6, which is in 2 dimensions. Formulas can be found in the literature, e.g., in [5].

It does not seem interesting to us to demonstrate that an algorithm is good on some functions and not on others. What we hope for is a problem-solver that can work well with a wide range of problems. This line of reasoning drives us head-on into the no free lunch (NFL) theorem [6], [7].

A. Free Lunch

NFL asserts that no algorithm can be better than any other, over all possible functions. This seems to be true because of two classes of functions: deceptive ones, and random ones. Deceptive functions lead a hill-climber away from the optimum; for instance, there may be gradients that lead away from a discontinuous point that is the global optimum. Over all possible functions, it must be true that gradients lead away from the optimum at least as often as they lead the searcher toward it.

The second class, random functions, contains very many more members than the first [8]. In fact, when all possible functions are considered, it seems certain—indeed it can be proven—that most of them are nonsense. Where gradients exist, they are unrelated to real solutions. On these very numerous function landscapes, a hill-climber will do no better than a hill-descender, no matter whether you are trying to maximize or minimize. It is like finding a needle in a haystack; no method of search can be any better than dumb luck. These two classes of functions explain why there is NFL.

But there is a third class of functions. These are functions where regularities on the fitness landscape do provide clues as to the location of a problem solution. Speaking of dumb luck, it is lucky for us that this third class contains most of the kinds of functions that we call problems. Problems are a special subclass of functions; they are special because somebody thinks there may be a solution to them, and wants to find it.

It is interesting to consider whether this third class of functions is actually more common in the world, perhaps because of correlations forced by physical laws, or whether they are merely more salient because of some idiosyncrasy of human attention. As we cannot count up instances of real function landscapes—like the set of “all possible functions” it is innumerable and meaningless—we will never be able to satisfy our curiosity regarding this question.

How do we know if a function has a solution or not? Of course, we have known since Turing that we cannot tell with certainty whether an algorithm will ever reach finality [9], that is, in this case, whether a problem can be solved. But even though there is no certainty, there are clues. For instance, if it is believed that a cause and effect relationship exists among variables in the function, then we may expect to find some exploitable regularities in the fitness landscape. Even if the causal relationship is noisy, or if the relationship involves variables not mentioned in the function (e.g., the “third variable problem” in correlational research [10]), it is often possible to find useful features on the function landscape.

Another clue that a function might be solvable is when it is compressible. The easiest-to-spot form of this clue exists when the problem is given as a mathematical formula, rather than a lookup table. If the formula is shorter than the table of all possible input–output matches, then we have been given a hint that it might be useful to watch for regularities. The evidence of this is seen in the difficulty of the search for functions that produce random outputs [11]; it is not easy to produce an unpredictable series out of a mathematical formula, e.g., a good random number generator, even though random functions are known to comprise the larger share of the universe of all possible functions.

In some hard cases, we may have only hypotheses and intuitions to provide ideas for how to search for patterns that will reveal the problem solution. Sometimes we are wrong, and a problem is reassigned to one of the first two classes of functions.

We reiterate the important point that a function is only a problem if someone thinks it is a problem. That means that
TABLE III
STANDARDIZED PERFORMANCE OF THE TOPOLOGIES AND ALGORITHMS. NEGATIVE VALUES ARE BELOW THE MEAN
WHILE POSITIVE VALUES ARE ABOVE. AS THE TASKS INVOLVE MINIMIZATION, THE BEST PERFORMANCES
ARE THE MOST NEGATIVE. IN BOLD ARE THE BEST RESULTS FOR EACH ALGORITHM/INITIALIZATION PAIR
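As a rough illustration of what "standardized performance" means in Table III, the sketch below converts raw best-fitness values into z-scores, so that negative values fall below the mean (better, for minimization) and positive values fall above it. The function name `standardize` and the sample numbers are hypothetical, and standardizing across configurations within a single test function, using the population standard deviation, is an assumption rather than a detail taken from the caption.

```python
import math

def standardize(values):
    """Return z-scores: (value - mean) / standard deviation."""
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation; choosing population over sample s.d.
    # is an assumption made for this sketch.
    variance = sum((v - mean) ** 2 for v in values) / n
    sd = math.sqrt(variance)
    if sd == 0:
        return [0.0] * n  # all configurations performed identically
    return [(v - mean) / sd for v in values]

# Hypothetical best-fitness values for four configurations on one function.
raw = [0.02, 0.10, 0.06, 0.50]
z = standardize(raw)

# For minimization, the most negative z-score marks the best configuration.
best = min(range(len(z)), key=lambda i: z[i])
```

Averaging such z-scores over several test functions would give one comparable number per configuration, which is one plausible reading of the "standardized averages" discussed in the Performance subsection.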
started particles with an offset, so they were off-center. This eliminated any advantage that might be gained when function optima were located near the center of the parameter space.

There were five kinds of algorithm types:
Canonical: the traditional particle swarm, with Type 1 constriction;
FIPS: the fully informed particle swarm with the weighting function returning a constant, i.e., where all contributions have the same value;
wFIPS: a fully informed swarm, where the contribution of each neighbor was weighted by the goodness of its previous best;
wdFIPS: also fully informed, with the contribution of each neighbor weighted by its distance in the search space from the target particle;
Self: a fully informed model, where the particle’s own previous best received half the weight;
wSelf: a fully informed model, where the particle’s own previous best received half the weight and the contribution of each neighbor was weighted by the goodness of its previous best.

Canonical, FIPS, and wFIPS were tested with both symmetrical and asymmetrical initializing.

The five types of topologies shown in Fig. 1 were tested. As some were tested with and without including the target particle in the neighborhood, there were nine topology conditions: Square, Ring, Pyramid, and All were tested both ways, and FourClusters was only tested with the self excluded. Conditions without the self are written with a “U” prefix, e.g., USquare is the Square topology, with the reference to the particle’s own index removed from the neighborhood.

VI. RESULTS

We present the results on the three dependent measures separately. Following that we look at patterns across the measures, and finally we discuss the implications of the results.

A. Performance

Table III shows the pattern of standardized averages across the topologies and algorithms. Recalling that positive values indicate bad performance and negative ones good for the minimization problem, we notice some patterns immediately. For instance, four of the nine wdFIPS algorithm conditions are quite bad (more than three standard deviations worse than the mean); one cell of the All topology was more than 3 s.d., and two cells more than one s.d. worse than the mean; and two of the UAll topology conditions were farther than one s.d. worse than the mean. Two other cells in the Square, and two in the Pyramid topology, were less than one s.d. worse than the mean. These account for all of the worse-than-average cells in the design.

Looking for excellence, we note that of the eight conditions resulting in a performance 0.4 standard deviations or farther below the mean, five of them occurred when the neighborhood was the unselfed square. The other three appear in selfless pyramid conditions. The best performance of all occurred in the selfless-square FIPS configuration.

In light of the results presented below, it is noteworthy that problem solving using the URing topology was rather slow, relative to the others, while the USquare was rather fast.

The performance measure tells us how well a problem-solver is able to do within a limited amount of time. Many times in real-world applications it is “good enough” to find a good point on a local optimum; this first dependent variable tells us how high an algorithm is able to get on a fitness peak, but says nothing about whether it is the globally best peak.

B. Iterations to Criteria

How quickly does an algorithm reach a criterion that presumably reflects the presence of a global optimum? In Table IV, we see that some algorithm conditions cannot reach the criterion, even after 10 000 iterations. In particular, the wdFIPS tends not to reach it, especially with topologies that showed badly on the performance measure, as well; the All and UAll measures also failed in all cases with the FIPS variations, though they displayed about average success on the canonical algorithms. A few other topologies had trouble with the asymmetrical initializations.

Again, the URing was relatively slow and the USquare relatively faster than others. The canonical versions were moderately slow. The configurations that converged the fastest were the UPyramid on both FIPS and wFIPS, and the Four-Cluster topology on wFIPS.

Medians are used in this measure to account for failures to meet the criterion at all. A cell may have as many as half its trials fail to meet the standard, but if the remaining trials went quickly, the median iterations will suggest erroneously that something good has happened. Fast convergence of a configuration to the performance criterion on a large percentage of trials would suggest good problem solving qualities; fast convergence on half
the trials would not. The next dependent variable tells us how often the criteria were met.

TABLE IV
MEDIAN NUMBER OF ITERATIONS TO CRITERIA. THESE REPRESENT THE NUMBER OF ITERATIONS THE ALGORITHM TOOK TO REACH THE CRITERIA. AN INFINITE VALUE MEANS THAT AT LEAST HALF THE EXPERIMENTS WERE UNSUCCESSFUL. IN BOLD ARE THE QUICKEST RESULT FOR EVERY ALGORITHM/INITIALIZATION PAIR

TABLE V
PROPORTION OF EXPERIMENTS REACHING CRITERIA. THEY REPRESENT FOR EACH CONFIGURATION THE PROPORTION OF RUNS THAT WERE ABLE TO REACH THE REGION SPECIFIED BY THE CRITERIA. IN BOLD ARE THE BEST RESULTS FOR EACH ALGORITHM/INITIALIZATION PAIR

C. Proportion of Trials Reaching Criteria

For us, the third dependent measure is the most important. With today’s computer speeds, the difference of a few thousand iterations may be a matter of seconds, and slight speed advantages are not usually crucial. The proportion measure tells, though, in black and white, whether the given algorithm/topology configuration can solve the problems (Table V).

The first result that jumps out is that the URing topology with the wFIPS algorithm found the global optimum (as measured by meeting the criteria) on 100% of its trials, that is, 40 trials each on 6 functions, amounting to 240 total trials. This is obviously a remarkable performance. We note also that 24 algorithm/topology combinations, out of 81, met the criterion 90% of the time or more. The canonical algorithm harbored performances greater than 0.90 on five of nine topologies, and the wFIPS on five of nine. wFIPS beat the 90% mark in three of the asymmetric initialization conditions, while the canonical algorithm never did. Unweighted FIPS was above 0.90 four times in the symmetric and three times in the asymmetric initialization conditions, and the weighted and unweighted Self algorithm broke the 0.90 standard one time each.

Looking at topologies, we see that none of the Square, Pyramid, All, or UAll conditions met the criterion 90% of the time. The Ring did it five times; the Four-Clusters thrice; USquare six times; URing eight times; and UPyramid twice. It appears that the USquare and URing topologies were the most successful vehicles for the communication among particles, at least across the algorithms tested here.

D. Combining the Measures

Our prejudice is that the most weight should be given to the last measure, the proportion of successes, though the other measures should be taken into account. By this rule of thumb, we could recommend always using the URing, which never failed when implemented with the wFIPS algorithm. We remember, however, that it was relatively slow by both the first two measures—plus, weighting the FIPS adds some computational cost.

If speed is a requirement, and the URing’s relative slowness may create problems, then we would suggest the USquare with the unweighted FIPS algorithm. This combination succeeded approximately 98.9% of the time, meaning that it failed three times out of 240. The USquare/FIPS also had the best score at 1000 iterations and was the ninth fastest to reach the criteria.

VII. CONCLUSION

The canonical particle swarm algorithm showed itself to be a journeyman problem solver, finding the global optimum a respectable proportion of the time, depending on topology, and getting there in a respectably fast time. It was outperformed, though, by the FIPS versions, on every dependent measure.

It should be mentioned that the FIPS versions were the only ones able to consistently find the minimum to the Griewank function in ten dimensions. The best result obtained by a canonical configuration was the USquare with a proportion of 72.5% followed by much lower proportions using the other topologies (lower than 60%). Both FIPS algorithms with all the social topologies except the All versions were able to find the minimum 100% of the time.
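For reference, a minimal sketch of the Griewank function in its commonly cited textbook form is given below. The function name `griewank` is an illustrative choice, and the exact variant used in the experiments (for example, the domain bounds) is not stated in this section, so the code should be read as an assumption rather than as the authors' implementation. The product of cosines is the term that creates the many local minima.

```python
import math

def griewank(x):
    """Griewank function (common textbook form).

    The global minimum is 0, reached at the origin.
    """
    sum_term = sum(xi * xi for xi in x) / 4000.0
    # Product of cosines; the i-th coordinate is scaled by sqrt(i), i = 1..n.
    prod_term = math.prod(
        math.cos(xi / math.sqrt(i)) for i, xi in enumerate(x, start=1)
    )
    return 1.0 + sum_term - prod_term

# Ten-dimensional case: the value at the origin is the global minimum.
at_origin = griewank([0.0] * 10)
```

Because the quadratic term is divided by 4000, the cosine product dominates the landscape near the origin, which is why a swarm can settle into a nearby local minimum instead of the global one.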
The effect of weighting neighbors’ contributions by their fitness is not clearly settled. We have to note that the one condition that met the criteria 100% of the time was a weighted condition. It could be argued that the extra computational expense of weighting is not justified; the unweighted versions performed quite well, maybe well enough.

A word needs to be said about the particle’s contribution to its own trajectory. The U-versions, for instance USquare, performed better in many cases than versions where the self was included in the neighborhood. This goes contrary to particle swarm lore, which describes the algorithm in terms of the combination of “cognitive” and “social” experience. FIPS versions where half the weight was given to the self did not perform outstandingly, and it does not appear that the individual’s own previous best needs to be part of the formula.

But it must be noted that the particle’s own experience does contribute information to its trajectory, in the form that Miranda and Fonseca [13] call its “habit.” For instance, the current velocity is created out of the velocities of the previous time steps as the particle maintains a cyclic trip through the search space. The particle’s current position is featured in every comparison term. Finally, the position in the next step is a function of the position in the current one. Thus, the habit of the particle affects everything about its trajectory. It is just that, in the FIPS versions without self, the explicit memory of the previous best point it has sampled is not part of the particle’s immediate decision.

As expected, increasing the size of the neighborhood seems to deteriorate the performance of the swarm. The very worst FIPS conditions in the study were the UAll and All topologies, where the particle is truly fully informed, gathering information from every single member of the population. The best were the Ring and Square versions, where the particle has three and five neighbors (counting itself), respectively, plus their U-versions, which subtract one.

We note that, though asymmetric initialization radically hurt performance in many conditions, it had nearly no effect on the USquare and URing conditions in FIPS and wFIPS. The unweighted FIPS with URing actually found the global optimum a greater proportion of the time with asymmetrical initialization, though we do not insist that this was a significant difference—but asymmetry clearly does not impair the algorithm.

The fully informed particle swarm is not a radical departure from previous versions. The standard two-term PSO is simply seen to be a special case of the FIPS, one that includes the selection of one particular neighbor to influence the target particle. The FIPS representation of the particle swarm algorithm has the potential for freeing investigators to look at other important features of the algorithm.

REFERENCES
[1] M. Clerc and J. Kennedy, “The particle swarm: Explosion, stability, and convergence in a multi-dimensional complex space,” IEEE Trans. Evol. Comput., vol. 6, pp. 58–73, Feb. 2002.
[2] A. Carlisle and G. Dozier, “An off-the-shelf PSO,” in Proc. Workshop on Particle Swarm Optimization. Indianapolis, IN: Purdue School of Eng. Technol., IUPUI, Apr. 2001.
[3] J. Kennedy and R. Mendes, “Topological structure and particle swarm performance,” in Proc. 4th Congr. Evolutionary Computation (CEC-2002), D. B. Fogel, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, Eds., Honolulu, HI, May 2002, pp. 1671–1676.
[4] J. Kennedy, “Small worlds and mega-minds: Effects of neighborhood topology on particle swarm performance,” in Proc. 1999 Conf. Evolutionary Computation, Washington, DC, 1999, pp. 1931–1938.
[5] R. G. Reynolds and C. Chung, “Knowledge-based self-adaptation in evolutionary programming using cultural algorithms,” in Proc. IEEE Int. Conf. Evolutionary Computation (ICEC’97), 1997, pp. 71–76.
[6] D. H. Wolpert and W. G. Macready. (1995) No free lunch theorems for search. Tech. Rep. SFI-TR-95-02-010, Santa Fe Inst., Santa Fe, New Mexico. [Online]. Available: citeseer.nj.nec.com/wolpert95no.html
[7] D. H. Wolpert and W. G. Macready, “No free lunch theorems for optimization,” IEEE Trans. Evol. Comput., vol. 1, pp. 67–82, Apr. 1997.
[8] T. M. English, “Optimization is easy and learning is hard in the typical function,” in Proc. Congr. Evolutionary Computation CEC00, La Jolla, CA, 6–9, 2000, pp. 924–931.
[9] A. Turing, “On computable numbers with an application to the Entscheidungsproblem,” in Proc. London Mathematical Society, 1936, pp. 230–265.
[10] T. Cook and D. Campbell, Quasiexperimentation: Designs and Analysis Issues for Field Settings. Skokie, IL: Rand McNally, 1979.
[11] W. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[12] Y. Shi and R. Eberhart, “Parameter selection in particle swarm optimization,” in Evolutionary Programming VII: Proc. EP98, Springer-Verlag, pp. 591–600.
[13] V. Miranda and N. Fonseca, “EPSO—best-of-two-worlds meta-heuristic applied to power system problems,” in Proc. 4th Congr. Evolutionary Computation (CEC-2002), D. B. Fogel, X. Yao, G. Greenwood, H. Iba, P. Marrow, and M. Shackleton, Eds. Honolulu, HI, May 2002, pp. 1080–1085.

Rui Mendes (M’03) received the B.S. degree in mathematics and computer science and the Ph.D. degree in computer engineering from the University of Minho, Braga, Portugal, in 1994 and 2004, respectively. His thesis was called “Population Topologies and Their Influence in Particle Swarm Performance.” He is a Computer Scientist, who has been working with the particle swarm algorithm since 2001. His research interests are swarm intelligence and evolutionary computation.

James Kennedy received the Ph.D. degree from the University of North Carolina, Chapel Hill, in 1992. He is with the U.S. Department of Labor, Washington, DC. He is a Social Psychologist who has been working with the particle swarm algorithm since 1994. He has published dozens of articles and chapters on particle swarms and related topics, in the computer science and social science journals and proceedings. He is a coauthor of Swarm Intelligence (San Mateo, CA: Morgan Kaufmann, 2001), with R. C. Eberhart and Y. Shi, now in its third printing.

José Neves is a Full Professor in the Informatics Department, University of Minho, Braga, Portugal. He is the Head of the Artificial Intelligence Group and coordinates several projects with applications in the areas of law and medicine. His research interests are knowledge representation and computational logic.