Research Article
Designing Artificial Neural Networks Using Particle Swarm Optimization Algorithms
Copyright © 2015 B. A. Garro and R. A. Vázquez. This is an open access article distributed under the Creative Commons
Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Artificial Neural Network (ANN) design is a complex task because its performance depends on the architecture, the selected
transfer function, and the learning algorithm used to train the set of synaptic weights. In this paper we present a methodology
that automatically designs an ANN using particle swarm optimization algorithms such as Basic Particle Swarm Optimization
(PSO), Second Generation of Particle Swarm Optimization (SGPSO), and a New Model of PSO called NMPSO. The aim of these
algorithms is to evolve, at the same time, the three principal components of an ANN: the set of synaptic weights, the connections
or architecture, and the transfer functions for each neuron. Eight different fitness functions were proposed to evaluate the fitness
of each solution and find the best design. These functions are based on the mean square error (MSE) and the classification error
(CER) and implement a strategy to avoid overtraining and to reduce the number of connections in the ANN. In addition, the ANNs
designed with the proposed methodology are compared with those designed manually using the well-known Back-Propagation
and Levenberg-Marquardt learning algorithms. Finally, the accuracy of the method is tested with different nonlinear pattern
classification problems.
described as swarm intelligence. This concept is defined in [1] as a property of systems composed of unintelligent agents with limited individual capabilities but with an intelligent collective behavior.

Several works use evolutionary and bioinspired algorithms to train ANNs as another fundamental form of learning [2]. Metaheuristic methods for training neural networks are based on local search, population methods, and others such as cooperative coevolutionary models [3].

An extensive literature review of evolutionary algorithms used to evolve ANNs is given in [2]. However, most of the reported studies focus only on the evolution of the synaptic weights or parameters [4], or they evolve the number of neurons in the hidden layers while the number of hidden layers is fixed in advance by the designer. Moreover, these studies do not evolve the transfer functions, which are an important element of an ANN because they determine the output of each neuron.

For example, in [5] the authors proposed a method that combines Ant Colony Optimization (ACO), to find a particular architecture (the connections) for an ANN, with Particle Swarm Optimization (PSO), to adjust the synaptic weights. Other works such as [6] implemented a modification of PSO mixed with Simulated Annealing (SA) to obtain a set of synaptic weights and ANN thresholds. In [7], the authors use Evolutionary Programming to obtain the architecture and the set of weights with the aim of solving classification and prediction problems. Another example is [8], where Genetic Programming is used to obtain graphs that represent different topologies. In [9], the Differential Evolution (DE) algorithm was applied to design an ANN to solve a weather forecasting problem. In [10], the authors use a PSO algorithm to adjust the synaptic weights to model the daily rainfall-runoff relationship in Malaysia. In [11], the authors compare the back-propagation method against basic PSO for adjusting only the synaptic weights of an ANN when solving classification problems. In [12], the set of weights is evolved using Differential Evolution and basic PSO.

In other works such as [13], the three principal elements of an ANN are evolved at the same time: architecture, transfer functions, and synaptic weights. The authors proposed a New Model of PSO (NMPSO) algorithm, while in [14] the authors solve the same problem by means of a Differential Evolution (DE) algorithm. Another example is [15], where the authors used an Artificial Bee Colony (ABC) algorithm to evolve the design of an ANN with two different fitness functions.

This research makes significant contributions in comparison with these last three works. First of all, eight fitness functions are proposed to deal with three common problems that emerge during the design of an ANN: accuracy, overfitting, and reduction of the ANN. In that sense, to better handle the problems that emerge during the design of the ANN, the fitness functions take into account the classification error, the mean square error, the validation error, the reduction of the architecture, and combinations of them. Furthermore, this research explores the behavior of the three bioinspired algorithms using different values for their parameters. During the experimentation phase, the best parameter values for these algorithms are determined to obtain the best results. In addition, the best configuration is used to generate a set of statistically valid experiments for each selected classification problem. Moreover, the results obtained with the proposed methodology in terms of the number of connections, the number of neurons, and the transfer functions selected for each ANN are presented and discussed. Another contribution of this research is a new metric that allows efficient comparison of the results provided by the ANNs generated with the proposed methodology. This metric takes into account the recognition rates obtained during the training and testing stages, where testing accuracy is weighted more heavily than training accuracy. Finally, the results achieved by the three bioinspired algorithms are compared against those achieved with two classic learning algorithms. The three bioinspired algorithms were selected because NMPSO is a relatively new algorithm (proposed in 2009) based on the metaphor of the basic PSO technique, so it is important to compare its performance with other algorithms inspired by the same phenomenon.

In general, it is possible to define the problem to be solved as follows: given a set of input patterns X = {x1, . . . , xp}, x ∈ R^n, and a set of desired patterns D = {d1, . . . , dp}, d ∈ R^m, find the ANN represented by W ∈ R^(q×(q+3)) such that a function defined by min(F(D, X, W)) is minimized, where q is the maximum number of neurons. It is important to remark that the search space involves three different domains (architecture, synaptic weights, and transfer functions).

This research provides a complete study of how an ANN can be automatically designed by applying bioinspired algorithms, particularly the Basic Particle Swarm Optimization (PSO), Second Generation PSO (SGPSO), and New Model of PSO (NMPSO) algorithms. The proposed methodology evolves at the same time the architecture, the synaptic weights, and the kind of transfer functions in order to design the ANN that provides the best accuracy for a particular problem. Moreover, a comparison of the performance of the particle swarm algorithms against classic learning methods (back-propagation and Levenberg-Marquardt) is presented. In addition, this research presents a new way to select the maximum number of neurons (MNN). The accuracy of the proposed methodology is tested by solving some real and synthetic pattern recognition problems. In this paper, we show the results obtained with ten classification problems of different complexities.

The basic concepts concerning the three PSO algorithms and ANNs are presented in Sections 2 and 3, respectively. In Section 4, the methodology and the strategy used to design the ANN automatically are described. In Section 5, the eight fitness functions used in this research are described. In Section 6, the experimental results concerning the tuning of the parameters for the PSO algorithms are described. Moreover, the experimental results are outlined in Section 7. Finally, in Sections 8 and 9 the general discussion and conclusions of this research are given.
2. Particle Swarm Optimization Algorithms

In this section, three different algorithms based on the PSO metaphor are described. The first one is the original PSO algorithm. Then, two algorithms which improve the original PSO are shown: the Second Generation of PSO and a New Model of PSO.

2.1. Original Particle Swarm Optimization Algorithm. The Particle Swarm Optimization (PSO) algorithm is a method for the optimization of continuous nonlinear functions proposed by Eberhart et al. [16]. This algorithm is inspired by observations of the social and collective behavior of bird flocks moving in search of food or survival, as well as of fish schooling. A PSO algorithm is driven by the movements of the best member of the population and, at the same time, by the particles' own experience. The metaphor indicates that a set of solutions moves through a search space with the aim of reaching the best position or solution.

The population is considered a swarm of particles i, each representing a position x_i ∈ R^D, i = 1, . . . , M, in a multidimensional space. These particles are evaluated with a particular optimization function to obtain their fitness value and save the best solution. All the particles change their position in the search space according to a velocity function v_i which takes into account the best position of a particle in the population p_g ∈ R^D (i.e., the social component) as well as their own best position p_i ∈ R^D (i.e., the cognitive component). The particles move at each iteration to a different position until they reach an optimum position. At each time t, the velocity of particle i is updated using

v_i(t + 1) = ω v_i(t) + c1 r1 (p_i(t) − x_i(t)) + c2 r2 (p_g(t) − x_i(t)),  (1)

where ω is the inertia weight, typically set to vary linearly from 1 to 0 during the course of a run; c1 and c2 are acceleration coefficients; and r1 and r2 are uniformly distributed random numbers in (0, 1). The velocity v_i is limited to the range [v_min, v_max]. Updating the velocity in this way enables particle i to search around its best individual position p_i(t) and the best global position p_g(t), and the new position of particle i is computed as in

x_i(t + 1) = x_i(t) + v_i(t + 1).  (2)

2.2. Second Generation of PSO Algorithm. The SGPSO algorithm [17] is an improvement of the original PSO algorithm that considers three aspects: the local optimum solution of each particle, the global best solution, and a new concept, the geometric center of the optimum swarm. The authors explain that birds keep a certain distance from the swarm center (food). On the other hand, no bird accurately calculates the position of the swarm center every time. The flock always stays in the same area for a specified time, during which the swarm center is kept fixed in every bird's eyes. Afterward, the swarm moves to a new area, and all birds must again keep a certain distance from the new swarm center. This fact is the basis of the SGPSO.

The position of the geometric centre P ∈ R^D of the optimum swarm is updated according to

P = (1/M) Σ_{i=1}^{M} p_i,  if CI mod T = 0,  (3)

where M is the number of particles in the swarm, CI is the current iteration number, and T is the geometric centre updating time of the optimum swarm, with a value in [1, MAXITER].

In SGPSO the velocity is updated by (4) and the position of each particle by (5):

v_i(t + 1) = ω v_i(t) + c1 r1 (p_i(t) − x_i(t)) + c2 r2 (p_g(t) − x_i(t)) + c3 r3 (P − x_i(t)),  (4)

x_i(t + 1) = x_i(t) + v_i(t + 1),  (5)

where c1, c2, and c3 are constants called acceleration coefficients, r1, r2, and r3 are random numbers in the range [0, 1], and ω is the inertia weight.

2.3. New Model of Particle Swarm Optimization. This algorithm was proposed by Garro et al. [13] and is based on some ideas that other authors proposed to improve the basic PSO algorithm [4]. These ideas are described in the next paragraphs.

Shi and Eberhart [18] proposed a linearly varying inertia weight over the course of generations, which significantly improves the performance of basic PSO. The following equation shows how to compute the inertia:

w = (w1 − w2) × (MAXITER − iter) / MAXITER + w2,  (6)

where w1 and w2 are the initial and final values of the inertia weight, respectively, iter is the current iteration number, and MAXITER is the maximum number of allowable iterations. The empirical studies in [18] indicated that the optimal solution could be improved by varying the value of w from 0.9 at the beginning of the evolutionary process to 0.4 at its end.

Yu et al. [4] developed a strategy in which, when the global best position does not improve with the increasing number of generations, each particle i is selected with a predefined probability from the population, and a random perturbation is then added to each dimension of the velocity vector v_i of the selected particle i. The velocity resetting is computed as in

v_i = v_i + (2 × r − 1) × v_max,  (7)

where r is a uniformly distributed random number in the range (0, 1) and v_max is the maximum random perturbation magnitude applied to each selected particle dimension.
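For readers who want to experiment with these update rules, the following is a minimal Python sketch of the velocity and position updates of (1)-(2) combined with the linearly varying inertia weight of (6). The sphere objective and all parameter values are illustrative assumptions, not settings taken from this paper.

import numpy as np

def sphere(x):
    # Illustrative objective; any function to be minimized can be used.
    return float(np.sum(x ** 2))

def basic_pso(f, dim=5, n_particles=30, max_iter=200,
              c1=2.0, c2=2.0, w1=0.9, w2=0.4, v_max=0.5, bound=4.0, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-bound, bound, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                     # velocities
    p_best = x.copy()                                    # personal best positions
    p_cost = np.array([f(xi) for xi in x])
    g_best = p_best[p_cost.argmin()].copy()              # global best position

    for it in range(max_iter):
        # Eq. (6): inertia decreases linearly from w1 to w2.
        w = (w1 - w2) * (max_iter - it) / max_iter + w2
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Eq. (1): velocity update with cognitive and social terms.
        v = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        v = np.clip(v, -v_max, v_max)                    # velocity limited to [-v_max, v_max]
        # Eq. (2): position update.
        x = x + v
        cost = np.array([f(xi) for xi in x])
        improved = cost < p_cost
        p_best[improved], p_cost[improved] = x[improved], cost[improved]
        g_best = p_best[p_cost.argmin()].copy()
    return g_best, p_cost.min()

best_x, best_f = basic_pso(sphere)
print(best_f)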
Based on some evolutionary schemes of Genetic Algorithms (GA), several effective mutation and crossover operators have been proposed for PSO. Løvberg et al. [19] proposed a crossover operator, applied with a certain crossover rate α and defined in

ch1(x_i) = r_i par1(x_i) + (1 − r_i) par2(x_i),  (8)

where r_i is a uniformly distributed random number in the range (0, 1), ch1 is the offspring, and par1 and par2 are the two parents randomly selected from the population.

The offspring velocity is calculated in the following equation as the sum of the two parents' velocity vectors, normalized to the original length of each parent velocity vector:

ch1(v_i) = ((par1(v_i) + par2(v_i)) / |par1(v_i) + par2(v_i)|) |par1(v_i)|.  (9)

Higashi and Iba [20] proposed a Gaussian mutation operator to improve the performance of PSO, applied with a certain mutation rate β and defined in

ch(x_i) = par(x_i) + N(0, 1) × (MAXITER − iter) / MAXITER,  (10)

where ch is the offspring, par is the parent randomly selected from the population, iter is the current iteration number, MAXITER is the maximum number of allowable iterations, and N is a Gaussian distribution. Utilization of these operators in PSO has the potential to achieve faster convergence and find better solutions.

Mohais et al. [6, 21] used random neighborhoods in PSO, together with dynamism operators.

In the NMPSO, the use of dynamic random neighborhoods that change in terms of a certain rate γ is proposed. First of all, a maximum number of neighborhoods MAXNEIGH is defined as the population size divided by 4. With this condition, each neighborhood K_n, n = 1, . . . , MAXNEIGH, will have at least 4 members. Then, the members of each neighborhood K_n are randomly selected, and the best particle p_gK_n is computed. Finally, the velocity of each particle i is updated as in

v_i(t + 1) = ω v_i(t) + c1 r1 (p_i(t) − x_i(t)) + c2 r2 (p_gK_n(t) − x_i(t)),  (11)

for all i ∈ K_n, n = 1, . . . , MAXNEIGH.

The NMPSO combines the varying schemes of the inertia weight ω and the acceleration coefficients c1 and c2, velocity resetting, crossover and mutation operators, and dynamic random neighbourhoods [13]. The NMPSO algorithm is described in Algorithm 1.
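The following Python sketch illustrates, in isolation, the NMPSO ingredients just described: the velocity resetting of (7), the crossover of (8)-(9), the Gaussian mutation of (10), and the dynamic random neighborhoods (MAXNEIGH = population size / 4). It is a loose illustration of the individual operators, not the authors' Algorithm 1; how they are combined and triggered (rates γ, α, and β) is omitted here.

import numpy as np

rng = np.random.default_rng(1)

def velocity_reset(v, v_max):
    # Eq. (7): random perturbation of each velocity dimension.
    r = rng.random(v.shape)
    return v + (2.0 * r - 1.0) * v_max

def crossover(x1, x2, v1, v2):
    # Eq. (8): arithmetic crossover of the parent positions.
    r = rng.random(x1.shape)
    child_x = r * x1 + (1.0 - r) * x2
    # Eq. (9): child velocity is the sum of the parent velocities,
    # normalized to the length of the first parent velocity.
    total = v1 + v2
    norm = np.linalg.norm(total)
    child_v = total / norm * np.linalg.norm(v1) if norm > 0 else v1.copy()
    return child_x, child_v

def gaussian_mutation(x, it, max_iter):
    # Eq. (10): Gaussian perturbation scaled by the remaining iterations.
    return x + rng.standard_normal(x.shape) * (max_iter - it) / max_iter

def random_neighborhoods(n_particles):
    # Dynamic neighborhoods: MAXNEIGH = population size / 4,
    # so each neighborhood has at least 4 members.
    max_neigh = n_particles // 4
    perm = rng.permutation(n_particles)
    return np.array_split(perm, max_neigh)

# Example: split a swarm of 20 particles into 5 random neighborhoods.
print(random_neighborhoods(20))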
3. Artificial Neural Networks

An ANN is a system that performs a mapping between input and output patterns that represent a problem [22]. ANNs learn information during the training process over several iterations. When the learning process finishes, the ANN is ready to classify new information, predict new behaviours, or estimate nonlinear functions. Its structure consists of a set of neurons (represented by functions) connected among themselves and organized in layers. The patterns that codify the real problem, a ∈ R^N, are sent through the layers, and the information is transformed by the corresponding synaptic weights W ∈ R^N (values between 0 and 1). Then, neurons in the following layers perform a summation of this information depending on whether there exists a connection between them. In addition, this summation considers another input, called the bias, whose input value is 1. The bias is a threshold that represents the minimum level that a neuron needs for activating and is represented by θ. The summation function is presented in

o = Σ_{i=1}^{N} a_i w_i + θ.  (12)

After that, the result of the summation is evaluated by the transfer function f(o) activated by the neuron input. The result is the neuron output, and this information is sent to the other connected neurons until the last layer is reached. Finally, the output of the ANN is obtained.

The learning process consists of adapting the synaptic weights until the ANN reaches the desired behaviour. The output is evaluated to measure the performance of the ANN; if the output is not as desired, the synaptic weights have to be changed or adjusted in terms of the input patterns a ∈ R^N. There are two ways to verify whether the ANN has learned: first, the ANN computes degrees of similarity between input patterns and information that it knew before (unsupervised learning); second, the ANN output is compared with the desired patterns y ∈ R^M (supervised learning). In our case, supervised learning is applied, where the objective is to produce an output approximating the desired patterns of a set of p input-output samples (see the following equation):

T_ξ = {(a_ξ ∈ R^N, d_ξ ∈ R^M)}  ∀ξ = 1, . . . , p,  (13)

where a is the input pattern and d the desired response.

Given the training sample T_ξ, the requirement is to design and compute the neural network free parameters so that the actual output y_ξ of the neural network due to a_ξ is close enough to d_ξ for all ξ in a statistical sense [15]. We may use the mean square error (MSE) given by (14) as the first objective function to be minimized. There are algorithms that adjust the synaptic weights to obtain a minimum error, such as the classic back-propagation (BP) algorithm [23, 24]. This algorithm, like others, is based on the gradient descent technique, which can become trapped in a local minimum. Furthermore, a BP algorithm cannot solve noncontinuous problems. For this reason, other techniques that can solve noncontinuous and nonlinear problems need to be applied to obtain a better performance of the ANN and to solve really complex problems:

e = (1 / (p · M)) Σ_{ξ=1}^{p} Σ_{i=1}^{M} (d_i^ξ − y_i^ξ)².  (14)
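As a small illustration of (12) and (14), the following Python sketch computes the output of a single neuron and the MSE over a set of patterns; the hyperbolic tangent used as the transfer function and the numeric values are arbitrary choices for the example.

import numpy as np

def neuron_output(a, w, theta, f=np.tanh):
    # Eq. (12): weighted sum of the inputs plus the bias term theta,
    # passed through a transfer function f (hyperbolic tangent here).
    return f(np.dot(a, w) + theta)

def mse(D, Y):
    # Eq. (14): mean square error over p patterns and M outputs.
    D, Y = np.asarray(D), np.asarray(Y)
    p, M = D.shape
    return np.sum((D - Y) ** 2) / (p * M)

a = np.array([0.2, 0.7, 0.1])        # one input pattern
w = np.array([0.5, -0.3, 0.8])       # synaptic weights
print(neuron_output(a, w, theta=0.1))
print(mse([[1.0, 0.0]], [[0.8, 0.1]]))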
4. Proposed Methodology

The most important elements to design and improve the accuracy of an ANN are the architecture (or topology), the set of transfer functions (TF), and the set of synaptic weights and biases. These elements should be codified into the individual that represents the solution of our problem. The solutions generated by the bioinspired algorithms are measured by the fitness function with the aim of selecting the best individual, which represents the best ANN. The three bioinspired algorithms (basic PSO, SGPSO, and NMPSO) lead the evolutionary learning process until the best ANN is found by using one of the eight fitness functions proposed in this paper. It is important to remark that only pattern classification problems will be solved by the proposed methodology.

The methodology is evaluated with three particle swarm algorithms and eight fitness functions; therefore, it involves an extensive behavioral study for each algorithm. Another point to review is the maximum number of neurons (MNN) used by the methodology to generate the ANN, which is directly related to the dimension of the individual. Because the information needed to determine the size of the individuals for a specific problem depends only on the input and output patterns (since supervised learning is applied), it was necessary to propose an equation that allows us to obtain the MNN used to design the ANN. This equation is explained in the Individual section.

In Figure 1, a diagram of the proposed methodology is shown. During the training stage, it is necessary to define the individual and the fitness function used to evaluate each individual. The size of the individual depends on the size of the input patterns as well as of the desired patterns. The individual is evolved for a certain time to obtain the best solution (with a minimum error). At the end of the learning process, it is expected that the ANN provides an acceptable accuracy during the training and testing stages.

Figure 1: Diagram of the proposed methodology. Input patterns and desired patterns that codify the classification problem feed a bioinspired algorithm which, guided by a fitness function applied to each individual during the training stage, produces the artificial neural network design that is then used in the testing stage.

4.1. Individual. When solving an optimization problem, the problem has to be described as a feasible model. After the model is defined, the next step focuses on designing the individual that codifies the solution to the problem. Equation (15) shows an individual represented by a matrix that codifies the ANN design. This codification was previously described in [13–15]. As it is necessary to evolve the three ANN elements at the same time, a matrix W ∈ R^(q×(q+3)) is composed of three principal parts with the following information: first, the topology (T); second, the synaptic weights and bias (SW); and third, the transfer functions (TF), where q is the maximum number of neurons (MNN) defined by q = m + n + ((m + n)/2), n is the dimension of the input pattern vector, and m is the dimension of the desired pattern vector:

W = [ x_{1,1}      x_{1,2}      ⋯  x_{1,MNN+2}      x_{1,MNN+3}
       ⋮            ⋮           ⋱   ⋮                ⋮
      x_{MNN,1}    x_{MNN,2}    ⋯  x_{MNN,MNN+2}    x_{MNN,MNN+3} ],  (15)

where the first column codifies the topology (T), columns 2 to MNN + 2 codify the synaptic weights and bias (SW), and column MNN + 3 codifies the transfer functions (TF).

The matrix that represents the individual codifies three different types of information (topology, synaptic weights, and transfer function). In that sense, it is necessary to determine the exploring range of each type of information in its corresponding search space. For the case of the topology, the range is set to [1, 2^MNN − 1] because the integer number of this part is codified into a binary vector composed of MNN elements that indicates whether there is a connection between neuron i and neuron j. The synaptic weights and bias have a range of either [−2, 2] or [−4, 4], and for the transfer functions the range is [1, nF], where nF is the total number of transfer functions.

4.2. Architecture and Synaptic Weights. Once the individuals or possible solutions are obtained, it is necessary to decode the matrix information W into an ANN for its evaluation. The first element to decode is the topology, in terms of the synaptic weights and transfer functions that are stored in the matrix.

This research is limited to feed-forward ANNs; for this reason, some rules were proposed to guarantee that no recurrent connections appear in the ANN (the unique restriction for the ANN). In future works, we will include recurrent connections and study the behavior of this type of ANN.

The architectures generated by the proposed methodology are composed of only three layers: input, hidden, and output. To generate valid architectures, the following three rules must be satisfied.

Let ILN be the set of I neurons composing the input layer, HLN the set of J neurons composing the hidden layer, and OLN the set of K neurons composing the output layer.

(1) For the input layer neurons (ILN), a neuron ILN_i, i = 1, . . . , I, can send information only to HLN_j and OLN_k.

(2) For the hidden layer neurons (HLN), a neuron HLN_j, j = 1, . . . , J, can send information only to OLN_k and to other HLN neurons, with one restriction for the latter: HLN_j can connect only with HLN_{j+1}, . . . , HLN_J.

(3) For the output layer neurons (OLN), a neuron OLN_k, k = 1, . . . , K, can send information only to other neurons of its layer, with the restriction that OLN_k can connect only with OLN_{k+1}, . . . , OLN_K.

To decode the architecture taking these rules into account, the information in W_{i,1} with i = 1, . . . , MNN (which is in decimal base) is codified into the binary square matrix Z. This matrix represents a graph where each component z_{ij} indicates a link between neuron i and neuron j when z_{ij} = 1. For example, suppose that W_{i,1} holds the integer number "57." It is necessary to transform it into the binary code "0111001." The binary code is interpreted as the connections of the ith neuron to seven neurons (the number of bits). In this case, only neurons two, three, four, and seven (from left to right) are linked to neuron i.

Then, the architecture is evaluated with the corresponding synaptic weights stored in the components W_{i,j} with i = 1, . . . , MNN and j = 2, . . . , MNN + 1. Finally, each neuron computes its output with its corresponding transfer function indicated in the same array. In the case of the bias, it is encoded in the component W_{i,j} with i = 1, . . . , MNN and j = MNN + 2.
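The following Python sketch illustrates the encoding just described: the MNN rule q = m + n + (m + n)/2 and the decoding of the integer topology entry into connection bits, reproducing the "57" → "0111001" example from the text. The rounding of q to an integer is an assumption made here for the sketch.

def max_neurons(n, m):
    # MNN as defined above: q = m + n + (m + n)/2 (rounded to an integer here).
    return int(m + n + (m + n) / 2)

def decode_connections(value, mnn):
    # The topology entry is an integer in [1, 2**MNN - 1]; its MNN-bit binary
    # expansion tells which neurons the i-th neuron is connected to.
    bits = format(int(value), "0{}b".format(mnn))
    return [int(b) for b in bits]

# Worked example from the text: 57 -> "0111001" with MNN = 7,
# i.e., neurons 2, 3, 4, and 7 (from left to right) are connected.
print(decode_connections(57, 7))   # [0, 1, 1, 1, 0, 0, 1]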
Algorithm 2: Computing the output of the ANN.

(1) for i = 1 to n do
(2)   Compute the output o_i = a_i.
(3) end for
(4) for i = n + 1 to MNN do
(5)   Get connections by using the individual entry x_{1,i}.
(6)   Get the connections vector z for neuron i from W_i.
(7)   Get the synaptic weights s for neuron i from W_i.
(8)   Get the bias b for neuron i from W_i.
(9)   Get the transfer function index t for neuron i from W_i.
(10)  Compute the output of neuron i as o_i = f_t(Σ_{j=1}^{i} s_j · z_j · o_j + b).
(11) end for
(12) for i = MNN − m + 1 to MNN do
(13)  Compute the ANN output with y_{i−(MNN−m+1)+1} = o_i.
(14) end for
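A Python rendering of the listing above is sketched below. It assumes the per-neuron quantities (connection bits z, synaptic weights s, bias b, and transfer-function index t) have already been decoded from the individual; the ordering of the transfer functions in the lookup table and the toy network are illustrative assumptions.

import numpy as np

# Transfer functions in the order listed in this work (illustrative ordering).
TRANSFER = [
    lambda o: 1.0 / (1.0 + np.exp(-o)),     # LS: sigmoid
    np.tanh,                                 # HT: hyperbolic tangent
    np.sin,                                  # SN: sinusoidal
    lambda o: np.exp(-o * o),                # GS: Gaussian
    lambda o: o,                             # LN: linear
    lambda o: 1.0 if o >= 0 else 0.0,        # HL: hard limit
]

def ann_output(pattern, neurons, n, m, mnn):
    # `pattern` is one input vector of length n; `neurons` is a list of
    # (z, s, b, t) tuples for the non-input neurons, where z are connection
    # bits, s synaptic weights, b the bias, and t the transfer-function index.
    assert len(pattern) == n
    o = list(pattern)                                    # o_i = a_i for input neurons
    for z, s, b, t in neurons:                           # hidden and output neurons
        net = sum(sj * zj * oj for sj, zj, oj in zip(s, z, o)) + b
        o.append(float(TRANSFER[t](net)))
    return o[mnn - m:]                                   # last m values are the ANN output

# Tiny hypothetical example: n = 2 inputs, m = 1 output, MNN = 3.
neurons = [([1, 1, 0], [0.4, -0.7, 0.0], 0.1, 2)]        # one non-input neuron
print(ann_output([0.5, 0.2], neurons, n=2, m=1, mnn=3))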
4.3. Transfer Functions. The TF are represented in the component W_{i,j} with i = 1, . . . , MNN and j = MNN + 3. The transfer functions are codified in the range [0, 5], representing one of the six transfer functions selected in this work.

Although there are several transfer functions that can be used in the ANN context, in this work the most popular transfer functions, useful in several kinds of problems, are selected. The transfer functions in this research, with the labels used to identify them, are the sigmoid function (LS), hyperbolic tangent function (HT), sinusoidal function (SN), Gaussian function (GS), linear function (LN), and hard limit function (HL).

4.4. ANN Output. Once the information from the individual is decoded, it is necessary to know its efficiency by evaluating it with one of the fitness functions. To do this, it is necessary to calculate the output of the designed ANN during the training and generalization stages. This output is calculated using Algorithm 2, where o_i is the output of neuron i, a_j is the input pattern that feeds the ANN, n is the dimensionality of the input pattern, m is the dimensionality of the desired pattern, and y_i is the output of the ANN.

5. Proposed Fitness Functions

Each individual must be selected based on its fitness, and the best solution is taken depending on the evaluation (performance) of each individual. In this work, we propose eight different fitness functions to design an ANN. It is important to remark that the fitness functions are only used during the training stage to evaluate each solution. After designing the ANN, we use a new metric that allows us to compare efficiently the results provided by the ANNs generated with the proposed methodology.

5.1. Mean Square Error. The mean square error (MSE) represents the error between the ANN output and the desired patterns. In this case, the best individual is the one which generates the minimum MSE (see the following equation):

F1 = MSE = (1 / (p · M)) Σ_{ξ=1}^{p} Σ_{i=1}^{M} (d_i^ξ − y_i^ξ)²,  (16)

where y_i is the output of the ANN.

5.2. Classification Error. The classification error (CER) is calculated as follows: the output of the ANN, y_i, is transformed into a binary codification by means of the winner-take-all technique. The binary chain must have only one 1, with the rest composed of 0s; the position of the 1 indicates the class to which the input pattern belongs. This binary chain is compared against the desired pattern; if they are equal, the classification was done correctly.

In this case, the best ANN is the one which generates the minimum number of wrongly classified patterns. The CER is represented by

F2 = CER = 1 − (npbc / tpc),  (17)

where npbc represents the number of patterns well classified and tpc is the total number of patterns to classify.

5.3. Validation Error. When the ANN is trained for a long period, the ANN could reach a maximum learning in which it becomes overspecialized (overfitting). This has a disadvantage: if the input data during the testing stage are contaminated with even a negligible amount of noise, the ANN will not be able to recognize the new patterns.

For that reason, we need to include a validation phase to prevent overfitting and thus guarantee an adequate generalization. Therefore, we designed fitness functions that integrate the assessment of both the training and validation stages.

Based on this idea, two fitness functions were generated: the first evaluates the mean square error on the training set, MSE_T, and the MSE on the validation set, MSE_V; see (18). The second takes into account both the classification error on the training set, CER_T, and the classification error on the validation set, CER_V; see (19):

F3 = VMSE = 0.6 × (MSE_V) + 0.4 × (MSE_T),  (18)

F4 = VCER = 0.6 × (CER_V) + 0.4 × (CER_T).  (19)

In order to evaluate the fitness of each solution using (18) and (19), it is necessary to first compute the MSE or CER using the training set; after that, the MSE or CER using the validation set is computed. It is important to notice that the error achieved with the validation set is weighted more heavily than the error obtained with the training set.
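The first four fitness functions can be written compactly as follows; this is a sketch under the assumption that desired and actual outputs are given as one-hot and real-valued matrices, respectively, with one row per pattern.

import numpy as np

def f1_mse(D, Y):
    # Eq. (16): mean square error between desired and actual outputs.
    D, Y = np.asarray(D), np.asarray(Y)
    return np.sum((D - Y) ** 2) / (D.shape[0] * D.shape[1])

def f2_cer(D, Y):
    # Eq. (17): 1 - (patterns well classified / total patterns),
    # with the ANN output binarized by the winner-take-all rule.
    D, Y = np.asarray(D), np.asarray(Y)
    correct = np.sum(np.argmax(Y, axis=1) == np.argmax(D, axis=1))
    return 1.0 - correct / D.shape[0]

def f3_vmse(D_train, Y_train, D_val, Y_val):
    # Eq. (18): validation error weighted more than training error.
    return 0.6 * f1_mse(D_val, Y_val) + 0.4 * f1_mse(D_train, Y_train)

def f4_vcer(D_train, Y_train, D_val, Y_val):
    # Eq. (19): same weighting applied to the classification error.
    return 0.6 * f2_cer(D_val, Y_val) + 0.4 * f2_cer(D_train, Y_train)

# Hypothetical two-class example with one-hot desired patterns.
D = [[1, 0], [0, 1], [1, 0]]
Y = [[0.9, 0.2], [0.1, 0.7], [0.4, 0.6]]
print(f1_mse(D, Y), f2_cer(D, Y))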
5.4. Reduction of the Architecture. In order to generate a smaller ANN in terms of the number of connections, it is necessary to design a fitness function that takes into account the performance of the ANN in terms of the MSE or CER as well as a factor related to the number of connections used in the ANN.

In that sense, we proposed the following equation for computing the factor that allows us to measure the size of the ANN in terms of the number of connections:

RA = NC / NMaxC,  (20)

where NC represents the number of connections generated when the proposed methodology is applied and NMaxC represents the maximum number of connections that an ANN can generate, which is computed as in

NMaxC = Σ_{i=n}^{MNN} i,  (21)

where MNN is the maximum number of neurons.

It is important to mention that fewer or more connections do not necessarily generate a better performance; however, by using the factor RA, it is possible to weight other metrics that measure the performance of the ANN and to find the ANN with fewer connections that still has an acceptable performance.

In that sense, we proposed two new fitness functions, one in terms of the MSE, equation (22), and one in terms of the CER, equation (23). These fitness functions tend to the global minimum when both the factor RA and the performance term are small; however, when one of these terms increases, the fitness function moves away from the global minimum:

F5 = RAMSE = RA · MSE,  (22)

F6 = RACER = RA · CER.  (23)

5.5. Architecture Reduction and Validation Error with MSE and CER Errors. At last, two fitness functions, RAVMSE and RAVCER, were generated: the first reduces simultaneously the architecture, the validation error, and the MSE; see (24). The second reduces the architecture, the validation error, and the CER; see (25):

F7 = RAVMSE = RA · (0.6 × (MSE_V) + 0.4 × (MSE_T)),  (24)

F8 = RAVCER = RA · (0.6 × (CER_V) + 0.4 × (CER_T)).  (25)
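A sketch of the connection-reduction factor of (20)-(21) and of the fitness functions F5-F8 of (22)-(25) is given below; the numbers in the example (20 connections, n = 4, MNN = 9) are arbitrary.

def nmaxc(n, mnn):
    # Eq. (21): maximum possible number of connections, summing i from n to MNN.
    return sum(range(n, mnn + 1))

def ra(nc, n, mnn):
    # Eq. (20): ratio of used connections to the maximum possible ones.
    return nc / nmaxc(n, mnn)

def f5_ramse(nc, n, mnn, mse):
    return ra(nc, n, mnn) * mse                            # Eq. (22)

def f6_racer(nc, n, mnn, cer):
    return ra(nc, n, mnn) * cer                            # Eq. (23)

def f7_ravmse(nc, n, mnn, mse_v, mse_t):
    return ra(nc, n, mnn) * (0.6 * mse_v + 0.4 * mse_t)    # Eq. (24)

def f8_ravcer(nc, n, mnn, cer_v, cer_t):
    return ra(nc, n, mnn) * (0.6 * cer_v + 0.4 * cer_t)    # Eq. (25)

# Hypothetical ANN with 20 connections, n = 4 inputs and MNN = 9 neurons.
print(ra(20, 4, 9), f7_ravmse(20, 4, 9, mse_v=0.05, mse_t=0.03))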
Figure 2: Pattern dispersion for the three synthetic problems. (a) Pattern dispersion for the spiral problem. (b) Pattern dispersion for the synthetic 1 problem. (c) Pattern dispersion for the synthetic 2 problem. Each panel shows the patterns of class 1 and class 2.
6. Tuning the Parameters for PSO Algorithms

Ten classification problems of different complexity were selected to evaluate the accuracy of the methodology: the Iris plant, wine, breast cancer, diabetes, and liver disorders datasets were taken from the UCI machine learning benchmark repository [25]; the object recognition problem was taken from [26]; and the spiral, synthetic 1, and synthetic 2 datasets were developed in our laboratory. The pattern dispersions of these datasets are shown in Figure 2. Table 1 shows the description of each classification problem.

Each dataset was randomly divided into three sets for training, testing, and validating the ANN as follows: 33% of the total patterns for the training stage, 33% for the validation stage, and 34% for the testing stage.

After that, the best parameter values for each algorithm were found in order to obtain the best performance for each classification problem. Then, the best configuration for each algorithm was used to validate statistically the accuracy of the ANN.

To determine which parameters generate the best ANN in terms of its accuracy, it is necessary to analyze the training and testing performance. Although the accuracy of the ANN should be measured in terms of the testing performance, it is also important to consider the performance that the ANN achieves during the training stage, in order to find the parameters that provoke the best results during both stages. Instead of analyzing the training and testing performances separately, we propose a new metric that lets us consider the accuracy of the ANN during the training and testing stages. This metric weights the testing performance to validate the accuracy of the proposal and, at the same time, gives confidence that the training stage was done with an acceptable accuracy. This metric computes a weighted recognition rate (wrr) and is described in

wrr = 0.4 × (Tr_rr) + 0.6 × (Te_rr),  (26)

where Tr_rr represents the recognition rate obtained during the training stage and Te_rr represents the recognition rate obtained during the testing stage.

From (26), we can observe that the testing and training stages are weighted by factors of 0.6 and 0.4, respectively. Using these factors, we avoid a high wrr value being obtained from a higher training recognition rate together with a lower testing recognition rate.
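Expressed in code, the weighted recognition rate of (26) is the following one-line combination; the example values are illustrative.

def weighted_recognition_rate(train_rr, test_rr):
    # Eq. (26): testing recognition rate weighted more than training.
    return 0.4 * train_rr + 0.6 * test_rr

# Example: 98% recognition in training, 90% in testing.
print(weighted_recognition_rate(0.98, 0.90))   # 0.932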
Figure 3: Some ANNs generated using the basic PSO algorithm. (a) The best architecture for the spiral problem. (b) The best architecture for the synthetic 1 problem. (c) The best architecture for the synthetic 2 problem. (d) The best architecture for the Iris plant problem.
The analysis to select the best parameter values of each algorithm was performed taking into account the ten classification problems described above. The different parameters of each algorithm were varied in different ranges to evaluate the performance of the algorithms over the different pattern recognition problems. In order to find the best configuration for the parameters of each algorithm, several experiments were done assigning different values to each parameter in the three bioinspired algorithms (original PSO, SGPSO, and NMPSO).

The parameters were divided into two types: the parameters that are shared or common to all algorithms, such as the number of generations, the number of individuals, the range of the variables, and the fitness function; and the specific parameters, which are unique to each algorithm. For example, for the basic PSO algorithm, the inertia ω and the two acceleration coefficients c1 and c2 are the parameters that change. The SGPSO algorithm takes two further parameters, the acceleration coefficient c3 and the geometric center P. Finally, the NMPSO algorithm has the crossover operator α, the mutation operator β, and γ, which determines when each neighborhood should be updated.

For each parameter configuration and each problem, 5 experiments with 2000 generations were performed. Once the ANNs were designed with the proposed methodology, the average weighted recognition rate wrr was obtained.

Next, we describe the values assigned to each parameter in order to obtain the best configuration for each bioinspired algorithm.

The common parameters for the three algorithms are represented as follows. For the population size, in the variable V = {50, 100}, the first element corresponds to 50 individuals and the second to 100 individuals. In the case of the search space size, w = {2, 4}, the first element indicates that the range is set to [−2, 2] and the second that the range is [−4, 4]. The type of fitness function used with the bioinspired algorithm is represented by the variable x and can take one of eight elements, x = {MSE, CER, VMSE, VCER, RAMSE, RACER, RAVMSE, RAVCER}.

All the possible combinations of the different parameter values were tested. The eight fitness functions were tested using all the classification problems proposed in this research to see which provides the best accuracy.

The configuration of the parameter values for the original PSO is determined by the following sequence: V − w − x − u − y − z.

The basic PSO algorithm has three unique parameters: the inertia weight ω, represented by u, which can take the values u = 0.3, 0.5, 0.7, 0.9, and the two acceleration coefficients c1 and c2, represented by y and z, respectively, with the values y = z = 0.5, 1.0, 1.5. Once we finished the set of experiments testing the performance of the original (basic) PSO algorithm with all the previous value combinations, we found that the best parameter configuration was 100 − [−2, 2] − VCER − 0.3 − 1.0 − 1.5.

The SGPSO algorithm has two unique parameters: the acceleration coefficient c3, represented by the variable y, whose values are y = 0.5, 1.0, 1.5, and the geometric center P, represented by the variable z, with values z = 100, 200, 300. In the case of the acceleration coefficients c1 and c2, it took the best values found for the basic PSO algorithm: c1 = 1.0, c2 = 1.5, and ω = 0.3. After several experiments, the best parameter configuration for SGPSO was 100 − [−2, 2] − CER − 0.5 − 100.

The NMPSO algorithm has three unique parameters: the neighborhood updating rate γ, represented by u, which takes the values u = 100, 200, 300, and the crossover factor α and the mutation factor β, represented by the variables y and z, respectively; both take the values y = z = 0.1, 0.5, 0.9. The best parameter configuration found for NMPSO was 100 − [−4, 4] − CER − 200 − 0.1 − 0.1.
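For reference, the best configurations reported above can be written out field by field as follows; the dictionary field names are introduced here only to make the V − w − x − u − y − z sequences explicit and are not part of the original notation.

# Best configurations reported above, written out field by field.
# For basic PSO the sequence is V - w - x - u - y - z
# (population, range, fitness function, inertia, c1, c2).
best_pso = {"population": 100, "range": (-2, 2), "fitness": "VCER",
            "omega": 0.3, "c1": 1.0, "c2": 1.5}

# SGPSO reuses omega, c1 and c2 from basic PSO; its own fields are c3 and P.
best_sgpso = {"population": 100, "range": (-2, 2), "fitness": "CER",
              "c3": 0.5, "P": 100}

# NMPSO fields: neighborhood update rate gamma, crossover alpha, mutation beta.
best_nmpso = {"population": 100, "range": (-4, 4), "fitness": "CER",
              "gamma": 200, "alpha": 0.1, "beta": 0.1}

for name, cfg in [("PSO", best_pso), ("SGPSO", best_sgpso), ("NMPSO", best_nmpso)]:
    print(name, cfg)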
Table 2: Number of times that transfer function was selected by basic PSO algorithm.
ID problem   Classification problem   Sigmoid function (LS)   Hyp. Tan. function (HT)   Sinusoid function (SN)   Gaussian function (GS)   Linear function (LN)   Hard limit function (HL)
1   Spiral   3   3   66   32   9   0
2   Synthetic 1   1   10   51   29   18   2
3   Synthetic 2   2   12   57   29   9   7
4   Iris plant   6   27   51   63   25   1
5   Breast cancer   2   40   51   61   32   8
6   Diabetes   4   30   57   76   26   0
7   Liver disorders   3   18   62   65   19   4
8   Object recognition   6   33   116   126   40   6
9   Wine   1   46   98   125   42   3
10   Glass   5   54   140   145   46   0
Total      33   273   749   751   266   31
Figure 4: Average results for the ten classification problems using the basic PSO algorithm. (a) Average error (VCER) evolution over the generations. (b) Average weighted recognition rate per problem.
7. Experimental Results

Once we determined the best configuration for each algorithm, we performed an exhaustive testing of 30 runs for each pattern classification problem. The accuracy of the ANNs generated by the methodology was measured in terms of the weighted recognition rate (26). The following subsections describe the results obtained for each database and each bioinspired algorithm. These experiments show the evolution of the fitness function during 5000 generations, the weighted recognition rate, and some examples of the architectures generated with the methodology.

7.1. Results for the Basic PSO Algorithm. Figure 3 shows some of the ANNs generated using the basic PSO algorithm that provide the best results for the corresponding recognition problem.

Figure 4(a) shows the evolution of the fitness function VCER, where we can appreciate the tendency for each classification problem. These results were obtained with the best configuration of basic PSO.

The evolution of the fitness function represents the average of the 30 experiments for each problem. It is observed that the value of the fitness function for the glass, spiral, liver disorders, diabetes, and synthetic 2 problems decreases only slightly despite the number of generations. Smaller values of the fitness function were achieved for the Iris plant, breast cancer, and synthetic 1 problems. For the object recognition and wine problems, the value of the fitness function decreased when approaching the limit of generations.

The average weighted recognition rate for each problem is presented in Figure 4(b). It can be observed that, for the glass problem, the ANN achieved the smallest average weighted recognition rate (52.67%), followed by the spiral (53.39%), liver disorders (68.74%), diabetes (76.90%), object recognition (80.22%), synthetic 2 (82.96%), and wine (86.49%) problems. The highest average weighted recognition rates were achieved for the synthetic 1 (95.03%), Iris plant (96.35%), and breast cancer (96.99%) problems.

Table 2 presents the frequency at which the six different transfer functions were selected for the ANN during the training stage. Applying the PSO algorithm, we see that there is a small range of selected functions. For example, the sinusoidal function was selected more often for the spiral, synthetic 1, and synthetic 2 problems. The Gaussian transfer function was selected more often for the Iris plant, breast cancer, diabetes, liver disorders, object recognition, wine, and glass problems.

Table 3 shows the maximum, minimum, standard deviation, and average number of connections used by the ANN. As can be seen, on average the number of connections is low for the spiral, synthetic 1, and synthetic 2 problems. For the glass and wine problems, on average 97.43 and 91.1 connections were used, respectively.
Table 3: Number of connections used by the ANN generated with the basic PSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 4 10 7.2667 1.4126
2 Synthetic 1 5 10 7.0667 1.2576
3 Synthetic 2 5 9 7.5 1.3326
4 Iris plant 14 23 18.9667 2.4563
5 Breast cancer 19 50 38 6.9926
6 Diabetes 8 47 35 7.0759
7 Liver disorders 16 31 24.6667 3.8177
8 Object recognition 54 71 62.8 4.9578
9 Wine 59 109 91.1 13.2336
10 Glass 81 108 97.4333 6.3933
Table 4: Number of neurons used by the ANN generated with the basic PSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 3 4 3.7667 0.4302
2 Synthetic 1 2 4 3.7 0.5350
3 Synthetic 2 3 4 3.8667 0.3457
4 Iris plant 5 6 5.7667 0.4302
5 Breast cancer 3 7 6.4667 0.9371
6 Diabetes 2 7 6.4333 1.0063
7 Liver disorders 4 6 5.7 0.5350
8 Object recognition 10 11 10.9 0.3051
9 Wine 8 11 10.5 0.8610
10 Glass 13 13 13 0
Table 4 shows the maximum, minimum, standard deviation, and average number of neurons used by the ANNs generated with the proposed method. In this table, we can see that the number of neurons in the ANNs for the ten classification problems was no more than 13.

7.2. Results for the SGPSO Algorithm. Figure 5 shows some of the best ANNs generated with the SGPSO algorithm. One can also observe an example of an ANN with an input neuron without any connection; see Figure 5(c). The lack of connection in the ANN indicates that this input feature was not necessary to solve the problem. In other words, a dimensionality reduction of the input pattern was also done by the proposed methodology.

Figure 6(a) shows the evolution of the fitness function CER, where we can see the tendency of the fitness function for each classification problem. These results were obtained with the best parameter configuration for the SGPSO algorithm. In general, the problems whose values are near to the optimal solution are breast cancer, Iris plant, and synthetic 1, with the liver disorders, glass, and spiral problems in the last places with high errors.

The average weighted recognition rate for each problem is presented in Figure 6(b). It was observed that for the glass problem the proposed methodology achieved the smallest weighted recognition rate (54.31%), followed by the spiral (55.60%), liver disorders (69.19%), diabetes (76.09%), object recognition (80.45%), synthetic 2 (81.39%), wine (82.47%), and synthetic 1 (93.61%) problems. The second highest weighted recognition rate was achieved for the Iris plant (96.45%). The highest weighted recognition rate was achieved for the breast cancer problem (97.03%).

Table 5 presents the number of times that each transfer function was selected using the SGPSO algorithm. The sinusoid function was the most selected in 9 of the 10 classification problems: the spiral, synthetic 1, synthetic 2, Iris plant, diabetes, liver disorders, object recognition, wine, and glass problems. For the breast cancer problem, the sinusoid function was selected almost at the same rate as the Gaussian function.

Furthermore, Table 6 shows the maximum, minimum, standard deviation, and average number of connections used by the ANNs designed with the proposed methodology. In this case, SGPSO generates more connections between the neurons of the ANN for the ten classification problems than those generated with the basic PSO algorithm.

Table 7 shows the maximum, minimum, standard deviation, and average number of neurons required by the ANNs using the SGPSO algorithm.

7.3. Results for the NMPSO Algorithm. Figure 7 shows some of the best ANNs generated with the NMPSO algorithm. The fitness function used with the NMPSO algorithm was the CER function.
Figure 5: Some ANNs generated using the SGPSO algorithm. (a) The best architecture for the Iris plant problem. (b) The best architecture for the breast cancer problem. (c) The best architecture for the diabetes problem. (d) The best architecture for the liver disorders problem.
The evolution of the fitness function for the 10 classification problems is shown in Figure 8(a), where it is observed that the minimum values are reached with the synthetic 1, breast cancer, and Iris plant problems. For the case of the wine problem, the value of the fitness function improves as the number of generations increases. The worst case was observed for the glass problem.

The weighted recognition rate for each problem is shown in Figure 8(b). From this graph, we observe that the average weighted recognition rate for the glass problem was 54.06%, for the spiral problem 62.97%, for liver disorders 70.01%, for the diabetes problem 76.89%, for the object recognition problem 85.73%, and for synthetic problem 2 86.30%. The best recognition rates were achieved with the wine problem (88.62%), Iris plant (96.60%), breast cancer (97.11%), and synthetic 1 (97.42%).

The number of times that each transfer function was selected using the NMPSO algorithm is described in Table 8. Using the sinusoidal function, the ANNs provide better results for the spiral, synthetic 1, synthetic 2, and object recognition problems. For the Iris plant, breast cancer, diabetes, liver disorders, wine, and glass problems, the Gaussian function was the most selected.

In general, the transfer function most often selected using the NMPSO algorithm was the Gaussian, followed by the sinusoidal function, then the hyperbolic tangent, next the linear function, and in the last places the sigmoid and hard limit functions. Table 9 shows the maximum, minimum, standard deviation, and average number of connections.
Figure 6: Average results for the ten classification problems using the SGPSO algorithm. (a) Average error (CER) evolution over the generations. (b) Average weighted recognition rate per problem.
Table 5: Number of times that transfer function was selected by SGPSO algorithm.
ID problem   Classification problem   Sigmoid function (LS)   Hyp. Tan. function (HT)   Sinusoid function (SN)   Gaussian function (GS)   Linear function (LN)   Hard limit function (HL)
1   Spiral   0   9   70   30   1   0
2   Synthetic 1   0   2   72   38   2   0
3   Synthetic 2   0   1   80   29   2   0
4   Iris plant   2   13   103   54   4   0
5   Breast cancer   5   28   71   72   22   1
6   Diabetes   2   17   93   73   6   0
7   Liver disorders   2   12   99   56   2   0
8   Object recognition   0   11   198   116   3   0
9   Wine   3   34   134   120   24   1
10   Glass   0   24   215   144   4   0
Total      14   151   1135   732   70   2
Table 6: Number of connections used by the ANN generated with the SGPSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 4 10 6.8667 1.8520
2 Synthetic 1 4 9 6.6333 1.4016
3 Synthetic 2 5 10 7.0667 1.4368
4 Iris plant 12 25 19.7667 3.025
5 Breast cancer 30 49 40.9333 4.2906
6 Diabetes 20 47 36.7 6.8337
7 Liver disorders 14 34 26.9333 4.1517
8 Object recognition 52 79 66.1667 6.2648
9 Wine 71 107 93.833 9.0443
10 Glass 81 109 96.2 7.4575
Table 7: Number of neurons used by the ANN generated with the SGPSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 2 4 3.6667 0.5467
2 Synthetic 1 3 4 3.8 0.4068
3 Synthetic 2 3 4 3.7333 0.4498
4 Iris plant 4 6 5.8667 0.4342
5 Breast cancer 5 7 6.6333 0.5561
6 Diabetes 4 7 6.3667 0.8899
7 Liver disorders 4 6 5.7 0.5350
8 Object recognition 10 11 10.9333 0.2537
9 Wine 9 11 10.5333 0.6288
10 Glass 12 13 12.9 0.3051
Table 8: Number of times that transfer function was selected by NMPSO algorithm.
ID problem   Classification problem   Sigmoid function (LS)   Hyp. Tan. function (HT)   Sinusoid function (SN)   Gaussian function (GS)   Linear function (LN)   Hard limit function (HL)
1   Spiral   0   1   80   25   10   0
2   Synthetic 1   1   9   48   44   6   1
3   Synthetic 2   0   3   62   41   6   1
4   Iris plant   5   17   63   68   17   3
5   Breast cancer   2   41   52   62   29   3
6   Diabetes   3   24   55   82   23   1
7   Liver disorders   0   9   65   81   4   0
8   Object recognition   0   19   174   115   19   0
9   Wine   3   58   76   137   40   5
10   Glass   1   53   114   172   48   2
Total      15   234   789   827   202   16
Table 10 shows the maximum, minimum, standard deviation, and average number of neurons used by the ANNs generated with the NMPSO algorithm.

8. General Discussion

Table 11 shows a summary of the results, taking into account the average weighted recognition rate obtained with the three bioinspired algorithms.

For the spiral, synthetic 1, Iris plant, breast cancer, liver disorders, object recognition, and wine problems, the algorithm providing the best results was the NMPSO algorithm. For the glass problem the best accuracy was achieved with the SGPSO algorithm, and for the case of diabetes the best performance was achieved using the basic PSO algorithm.

From Table 11, it is possible to see that the best algorithm in terms of the weighted recognition rate was NMPSO (81.57%), the second best was basic PSO (78.97%), and the last was the SGPSO algorithm (78.65%) over the ten classification problems.

Moreover, these results were compared with results obtained from classic algorithms such as gradient descent and Levenberg-Marquardt. Because the classic techniques need a specific architecture, it was proposed to design two kinds of ANN manually: the first consists of one hidden layer and the second consists of two hidden layers.

To determine the maximum number of neurons MNN used to generate these ANNs, we follow the same rule proposed in the methodology. For the ANN with two hidden layers, there was a pyramidal distribution using

DN = 0.6 × (MNN) + 0.4 × (MNN),  (27)

where the first hidden layer has 60% of the total hidden neurons and the second hidden layer has 40% of the total hidden neurons.

Two stop criteria for the gradient descent and Levenberg-Marquardt algorithms were established: either the algorithm reaches 5000 epochs or it reaches an error of 0.000001. The classification problems were divided into three subsets: 40% of the overall patterns were used for training, 50% for generalization, and 10% for validation. The learning rate was set to 0.1.
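Under the interpretation stated above (60% of the hidden neurons in the first hidden layer and 40% in the second), the pyramidal distribution of (27) can be sketched as follows; the rounding rule and the example size are assumptions of this sketch.

def hidden_layer_sizes(hidden_total):
    # Interpretation of Eq. (27): the hidden neurons are distributed in a
    # pyramidal way, 60% in the first hidden layer and 40% in the second one.
    first = round(0.6 * hidden_total)
    return first, hidden_total - first

# Example: 10 hidden neurons -> 6 in the first hidden layer, 4 in the second.
print(hidden_layer_sizes(10))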
Figure 7: Some ANNs generated using the NMPSO algorithm. (a) The best architecture for the liver disorders problem. (b) The best architecture for the object recognition problem. (c) The best architecture for the wine problem. (d) The best architecture for the glass problem.
Table 12 shows the average weighted recognition rate obtained using the classic training algorithms: one based on gradient descent (the back-propagation algorithm) and the other based on the Levenberg-Marquardt algorithm. From this set of experiments, we observed that the best algorithm was Levenberg-Marquardt with a single hidden layer. This algorithm solved eight of the ten problems with the best performance (spiral, synthetic 1, synthetic 2, Iris plant, breast cancer, diabetes, liver disorders, and object recognition). For the wine problem, the best algorithm was the gradient descent algorithm with a single hidden layer. The glass problem was solved best using Levenberg-Marquardt with two hidden layers.

Considering Tables 11 and 12, the best techniques to design an ANN were the NMPSO algorithm followed by Levenberg-Marquardt with one hidden layer. On the other hand, the basic PSO and SGPSO algorithms, as well as gradient descent and Levenberg-Marquardt with two layers, did not provide a good performance.
Figure 8: Average results for the ten classification problems using the NMPSO algorithm. (a) Average error (CER) evolution over the generations. (b) Average weighted recognition rate per problem.
Table 9: Number of connections used by the ANN generated with the NMPSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 4 10 7.1333 1.4320
2 Synthetic 1 3 10 6.5333 1.4559
3 Synthetic 2 4 10 7.4667 1.5916
4 Iris plant 15 24 19.2667 2.6121
5 Breast cancer 22 51 38.6333 7.7792
6 Diabetes 12 44 34.5667 7.6775
7 Liver disorders 16 30 22.8667 4.9461
8 Object recognition 51 70 62.8667 5.3223
9 Wine 76 109 92.9333 9.1007
10 Glass 79 107 95.4333 6.8213
Table 10: Number of neurons used by the ANN generated with the NMPSO algorithm.
ID problem   Classification problem   Minimum   Maximum   Average   Std. dev.
1 Spiral 3 4 3.8667 0.3457
2 Synthetic 1 2 4 3.6333 0.5561
3 Synthetic 2 3 4 3.7667 0.4302
4 Iris plant 5 6 5.7667 0.4302
5 Breast cancer 4 7 6.3 0.9154
6 Diabetes 3 7 6.2667 0.9803
7 Liver disorders 4 6 5.3 0.7944
8 Object recognition 10 11 10.9 0.3051
9 Wine 9 11 10.6333 0.6149
10 Glass 13 13 13 0
Table 11: Average weighted recognition rate (wrr) for the three bioinspired algorithms.
Table 12: Average weighted recognition rate (wrr) for the classic algorithms.
For the case of classic techniques, the architectures must be carefully and manually designed by an expert in order to obtain the best results; this process can be a time-consuming task for the expert. On the opposite side, the proposed methodology automatically designs the ANN in terms of the input and desired patterns that codify the problem to be solved.

9. Conclusions

In this paper, we proposed three connection rules for generating feed-forward ANNs and guiding the connections between neurons. These rules allow connections among neurons from the input layer towards the output layer, and they also allow lateral connections among neurons of the same layer.
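The precise formulation of the three rules is given earlier in the paper; the following is only an illustrative sketch, under the assumption that a neuron may connect forward to any neuron in a later layer and laterally to another neuron of the same noninput layer, of how a candidate connection could be checked against such constraints.

```python
def connection_allowed(layer_of, src, dst):
    """Illustrative connection rules (an assumption, not the paper's exact rules):
    - forward connections: from any neuron to a neuron in a later layer
      (this covers input -> hidden, hidden -> output, and input -> output links);
    - lateral connections: between distinct neurons of the same noninput layer.
    `layer_of` maps a neuron index to its layer index (0 = input layer)."""
    if src == dst:
        return False                       # no self-loops
    if layer_of[dst] > layer_of[src]:
        return True                        # forward connection
    if layer_of[src] == layer_of[dst] and layer_of[src] != 0:
        return True                        # lateral connection within the same layer
    return False

# Example: 2 inputs (layer 0), 2 hidden neurons (layer 1), 1 output (layer 2)
layers = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2}
print(connection_allowed(layers, 0, 4))    # True: input feeds the output directly
print(connection_allowed(layers, 2, 3))    # True: lateral link inside the hidden layer
print(connection_allowed(layers, 4, 2))    # False: no backward connections
```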
We also observed that some ANNs designed by the proposed methodology do not have any connection from some of the input neurons. This means that the feature associated with such a neuron was not relevant for computing the output of the ANN, which can be interpreted as a dimensionality reduction of the input pattern.
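This pruning effect can be read directly off a designed network. As a minimal sketch, assuming the evolved architecture is stored as a connection (adjacency) matrix, which is a hypothetical representation rather than the paper's encoding, an input feature is irrelevant when its row contains no outgoing connections:

```python
import numpy as np

def unused_input_features(connections, n_inputs):
    """Return the indices of input neurons with no outgoing connections.

    `connections` is a hypothetical (n_neurons x n_neurons) 0/1 matrix where
    connections[i, j] == 1 means neuron i feeds neuron j; the first `n_inputs`
    rows correspond to the input neurons."""
    outgoing = connections[:n_inputs].sum(axis=1)
    return [i for i in range(n_inputs) if outgoing[i] == 0]

# Example: 3 inputs, 2 hidden, 1 output; input 1 is never connected
C = np.zeros((6, 6), dtype=int)
C[0, 3] = C[2, 4] = C[3, 5] = C[4, 5] = 1
print(unused_input_features(C, 3))  # [1] -> the second feature was pruned
```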
Eight fitness functions, which involve combinations of the MSE, the CER, the validation error, and the architecture reduction (of connections and neurons), were implemented to evaluate each individual. From these experiments, we observed that the fitness functions that generated the ANNs with the best weighted recognition rate were those that used the classification error (CER). The three bioinspired algorithms based on PSO were compared in terms of the average weighted recognition rate; the NMPSO algorithm achieved the best performance, followed by the basic PSO and the SGPSO algorithm.
To validate statistically the accuracy of the proposed methodology, first of all, the parameters for the three bioinspired algorithms were selected. For the basic PSO, the best fitness function was 𝑉CER with a variable range between [−2, 2]. After tuning the parameters of each algorithm and choosing the best configuration, we observed that the parameters were different from those proposed in the literature; the selected values were 𝜔 = 0.3, 𝑐1 = 1.0, and 𝑐2 = 1.5. For the SGPSO algorithm, the best fitness function was CER with a variable range between [−2, 2]; the parameters were set to 𝑐3 = 0.5 and the geometric centre 𝑃 = 100. For the NMPSO algorithm, the best fitness function was CER with a variable range between [−4, 4]; the parameters for the best configuration were set to 𝛾 = 200, crossover rate 𝛼 = 0.1, and mutation rate 𝛽 = 0.1.
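For reference, the basic PSO velocity and position update can be sketched with the parameter values reported above (𝜔 = 0.3, 𝑐1 = 1.0, 𝑐2 = 1.5) and positions clipped to the [−2, 2] search range. The `fitness` function below is only a placeholder standing in for the CER-based fitness used in the methodology.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(x):
    # Placeholder objective; in the methodology this would be the CER-based
    # fitness evaluated on the ANN decoded from the particle's position.
    return np.sum(x ** 2)

def basic_pso(dim, n_particles=30, iterations=1000,
              omega=0.3, c1=1.0, c2=1.5, x_min=-2.0, x_max=2.0):
    x = rng.uniform(x_min, x_max, size=(n_particles, dim))    # positions
    v = np.zeros_like(x)                                       # velocities
    pbest, pbest_f = x.copy(), np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()
    for _ in range(iterations):
        r1, r2 = rng.random((2, n_particles, dim))
        # canonical update: inertia term plus cognitive and social attraction
        v = omega * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, x_min, x_max)                       # keep x in [-2, 2]
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        gbest = pbest[np.argmin(pbest_f)].copy()
    return gbest, pbest_f.min()

best_x, best_f = basic_pso(dim=10)
print(best_f)
```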
After tuning the parameters of the three algorithms, 30 runs were performed for each of the ten classification problems. In general, the problems that achieved a weighted recognition rate of 100% were the synthetic 1, Iris plant, and object recognition problems, whereas a lower performance was obtained with the glass and spiral problems.
The transfer functions most often selected were the Gaussian function for the basic PSO algorithm, the sinusoidal function for the SGPSO algorithm, and the Gaussian function for the NMPSO algorithm.
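The candidate transfer functions evolved by the methodology include, among others, Gaussian, sinusoidal, hyperbolic tangent, log-sigmoid, and linear functions; the labels GS, SN, HT, LS, and LN in Figure 7 are assumed here to correspond to them. A short sketch using common textbook forms, with the exact parameterizations being those defined earlier in the paper:

```python
import numpy as np

# Assumed mapping from the figure labels to standard transfer function forms.
TRANSFER_FUNCTIONS = {
    "GS": lambda a: np.exp(-a ** 2),           # Gaussian
    "SN": np.sin,                              # sinusoidal
    "HT": np.tanh,                             # hyperbolic tangent
    "LS": lambda a: 1.0 / (1.0 + np.exp(-a)),  # log-sigmoid
    "LN": lambda a: a,                         # linear
}

a = np.array([-1.0, 0.0, 1.0])
for name, f in TRANSFER_FUNCTIONS.items():
    print(name, f(a))
```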
In general, the ANNs designed with the proposed methodology were very promising. The proposed methodology automatically designs the ANN by determining the set of connections, the number of neurons in the hidden layers, the adjustment of the synaptic weights, and the selection of the bias and transfer function for each neuron.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The authors thank Universidad La Salle for the economic support under Grant number I-61/12. Beatriz Garro thanks CONACYT and UNAM for the postdoctoral scholarship.