Unit 2 - Soft Computing - WWW - Rgpvnotes.in
Subject Name: Soft Computing
Subject Code: CS-8001
Semester: 8th
Downloaded from be.rgpvnotes.in
UNIT-2 Notes
A wide variety of sigmoid functions have been used as the activation function of artificial
neurons, including the logistic and hyperbolic tangent functions, see figure 1.9.
Derivation of EBPA
The basic algorithm can be summed up in the following equation (the delta rule) for the
change to the weight wji from node i to node j:
Δwji = η δj yi
where η is the learning rate, yi is the output of node i, and δj is computed as follows:
1. If node j is an output node, then δj is the product of φ'(vj) and the error signal ej, where
φ(·) is the logistic function and vj is the total input to node j (i.e. Σi wjiyi), and ej is the
error signal for node j (i.e. the difference between the desired output and the actual
output);
2. If node j is a hidden node, then δj is the product of φ'(vj) and the weighted sum of the
δ's computed for the nodes in the next hidden or output layer that are connected to
node j.
3. [The actual formula is δj = φ'(vj) Σk δkwkj, where k ranges over those nodes for
which wkj is non-zero (i.e. nodes k that actually have connections from node j). The δk
values have already been computed, as they are in the output layer (or a layer closer to
the output layer than node j).]
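The two delta cases above can be sketched directly in code. This is an illustrative sketch, not the notes' own implementation; the function names are assumptions.

```python
import numpy as np

def logistic(v):
    """Logistic activation phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def logistic_deriv(v):
    """phi'(v) = phi(v) * (1 - phi(v)), a convenient property of the logistic."""
    y = logistic(v)
    return y * (1.0 - y)

def output_delta(v_j, desired, actual):
    """Case 1 (output node): delta_j = phi'(v_j) * e_j, with e_j = desired - actual."""
    return logistic_deriv(v_j) * (desired - actual)

def hidden_delta(v_j, deltas_k, w_kj):
    """Case 2 (hidden node): delta_j = phi'(v_j) * sum_k delta_k * w_kj,
    where k ranges over the next-layer nodes that node j connects to."""
    return logistic_deriv(v_j) * float(np.dot(deltas_k, w_kj))
```

Note that `logistic_deriv(0.0)` is 0.25, the maximum of φ'; deltas shrink as a node saturates toward 0 or 1.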
Momentum
If the learning rate η is very small, then the algorithm proceeds slowly, but accurately
follows the path of steepest descent in weight space.
If η is largish, the algorithm can oscillate ("bounce off the canyon walls").
A simple method of effectively increasing the rate of learning is to modify the delta rule
by including a momentum term:
Δwji(n) = α Δwji(n−1) + η δj(n)yi(n)
where α is a positive constant termed the momentum constant. This is called the
generalized delta rule.
The effect is that if the basic delta rule is consistently pushing a weight in the same
direction, then it gradually gathers "momentum" in that direction.
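A minimal sketch of the generalized delta rule, showing the momentum effect; the function name and the η, α values are illustrative.

```python
def weight_change(prev_dw, delta_j, y_i, eta=0.1, alpha=0.9):
    """Generalized delta rule:
    dw_ji(n) = alpha * dw_ji(n-1) + eta * delta_j(n) * y_i(n)."""
    return alpha * prev_dw + eta * delta_j * y_i

# If the basic rule keeps pushing in the same direction, the step size grows:
dw = 0.0
steps = []
for _ in range(3):
    dw = weight_change(dw, delta_j=1.0, y_i=1.0)
    steps.append(dw)
# steps: 0.1, 0.19, 0.271 -- the weight change gathers "momentum"
```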
Stopping Criterion
Two commonly used stopping criteria are:
stop after a certain number of runs through all the training data (each run through all
the training data is called an epoch);
stop when the total sum-squared error reaches some low level. By total sum-squared
error we mean ΣpΣi ei2, where p ranges over all of the training patterns and i ranges over
all of the output units.
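Both criteria can be combined in one training loop. A minimal sketch, assuming a caller-supplied `run_one_epoch` that makes one pass through the training data and returns the per-pattern error lists:

```python
def total_sse(errors):
    """Total sum-squared error: sum over patterns p and output units i of e_pi^2."""
    return sum(e * e for pattern in errors for e in pattern)

def train(run_one_epoch, max_epochs=100, tolerance=1e-3):
    """Run epochs until either stopping criterion fires; return epochs used."""
    for epoch in range(1, max_epochs + 1):
        errors = run_one_epoch()           # one run through all training data
        if total_sse(errors) < tolerance:  # criterion 2: low total SSE
            return epoch
    return max_epochs                      # criterion 1: epoch limit reached
```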
Initialization
The weights of a network to be trained by backprop must be initialized to some non-
zero values.
The usual thing to do is to initialize the weights to small random values.
The reason for this is that sometimes backprop training runs become "lost" on a plateau
in weight-space, or for some other reason backprop cannot find a good minimum error
value.
Using small random values means different starting points for each training run, so that
subsequent training runs have a good chance of finding a suitable minimum.
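A sketch of the usual small-random initialization; the range ±0.1 is an illustrative choice, and seeding is optional (unseeded runs give the different starting points described above).

```python
import random

def init_weights(n_in, n_out, scale=0.1, seed=None):
    """Small random non-zero starting weights: a different point in
    weight space for every training run (or a repeatable one if seeded)."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]
```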
Limitation of EBPA
A back-propagation neural network is only practical in certain situations. Following are some
guidelines on when you should use another approach:
Can you write down a flow chart or a formula that accurately describes the problem? If
so, then stick with a traditional programming method.
Is there a simple piece of hardware or software that already does what you want? If so,
then the development time for a NN might not be worth it.
Do you want the functionality to "evolve" in a direction that is not pre-defined? If so,
then consider using a Genetic Algorithm (that's another topic!).
Do you have an easy way to generate a significant number of input/output examples of
the desired behavior? If not, then you won't be able to train your NN to do anything.
Is the problem very "discrete"? Can the correct answer be found in a look-up table of
reasonable size? If so, a look-up table is much simpler and more accurate.
Are precise numeric output values required? NNs are not good at giving precise numeric
answers.
Characteristics of EBPA
These are the characteristics of EBPA:
1. It builds on linear threshold units and the neural network computation paradigm in
general, and uses the logistic function (or similar functions) to transform weighted sums
of inputs to a neuron.
2. Backprop's performance on the XOR problem was demonstrated using the tlearn backprop
simulator.
3. A number of refinements to backprop were looked at briefly, including momentum and a
technique to obtain the best generalization ability.
4. Backprop nets learn slowly but compute quickly once they have learned.
5. They can be trained so as to generalize reasonably well.
Applications of EBPA
Backprop tends to work well in some situations where human experts are unable to
articulate a rule for what they are doing - e.g. in areas depending on raw perception,
and where it is difficult to determine the attributes (in the ID3 sense) that are relevant
to the problem at hand.
For example, there is a proprietary system, which includes a backprop component, for
assisting in classifying Pap smears.
o The system picks out from the image the most suspicious-looking cells.
o A human expert then inspects these cells.
o This reduces the problem from looking at maybe 10,000 cells to looking at
maybe 100 cells - this reduces the boredom-induced error rate.
Other successful systems have been built for tasks like reading handwritten postcodes.
Counter propagation network architecture
An example of a hybrid network which combines the features of two or more basic network
designs. Proposed by Hecht-Nielsen in 1986. The network maps an input pattern A
of size n to an output pattern B of size m.
The hidden layer is a Kohonen network with unsupervised learning and the output layer is a
Grossberg (outstar) layer fully connected to the hidden layer.
The output of the A subsection of the input layer is fanned out to the competitive middle
layer. Each neuron in the output layer receives a signal corresponding to the input pattern's
category along one connection from the middle layer.
The B subsection of the input layer has zero input during actual operation of the network.
The role of the output layer is to produce the pattern corresponding to the category output
by the middle layer. The output layer uses a supervised learning procedure, with direct
connection from the input layer's B subsection providing the correct output.
Training is a two-stage procedure. First, the Kohonen layer is trained on input patterns. No
changes are made in the output layer during this step. Once the middle layer is trained to
correctly categorise all the input patterns, the weights between the input and middle layers
are kept fixed and the output layer is trained to produce correct output patterns by
adjusting weights between the middle and output layers.
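The two-stage procedure can be sketched as follows. The prototype initialization (copies of the first training patterns), learning rates, and epoch count are illustrative assumptions, not part of the notes.

```python
import numpy as np

def train_cpn(patterns_a, patterns_b, n_hidden, epochs=50,
              lr_kohonen=0.2, lr_grossberg=0.2):
    """Two-stage counter-propagation training sketch.
    Stage 1 trains the Kohonen (middle) layer on the A patterns alone;
    stage 2 freezes it and trains the Grossberg (output) layer so the
    winning category reproduces the paired B pattern."""
    # Assumption: prototypes start as copies of the first n_hidden A patterns.
    kohonen = np.array(patterns_a[:n_hidden], dtype=float)
    grossberg = np.zeros((len(patterns_b[0]), n_hidden))

    # Stage 1: unsupervised winner-take-all training; no output-layer changes.
    for _ in range(epochs):
        for a in patterns_a:
            win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
            kohonen[win] += lr_kohonen * (a - kohonen[win])

    # Stage 2: Kohonen weights kept fixed; supervised outstar training.
    for _ in range(epochs):
        for a, b in zip(patterns_a, patterns_b):
            win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
            grossberg[:, win] += lr_grossberg * (np.array(b) - grossberg[:, win])
    return kohonen, grossberg

def recall(a, kohonen, grossberg):
    """Map an A pattern to a B pattern via the winning middle-layer category
    (the B subsection of the input is zero during actual operation)."""
    win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
    return grossberg[:, win]
```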
Characteristics of CPN
• In the first training phase, if a hidden-layer unit does not win for a long period of time, its
weights should be set to random values to give that unit a chance to win subsequently.
• There is no need for normalizing the training output vectors.
• After the training has finished, the network maps the training vectors onto output vectors
that are close to the desired ones.
• The more hidden units, the better the mapping.
• Thanks to the competitive neurons in the hidden layer, the linear neurons can realize
nonlinear mappings.
Hopfield network
The Hopfield network is an implementation of a learning matrix with recurrent links. The
learning matrix is a weight matrix which actually stores associations between inputs and
targets. This network generalizes in the sense that it identifies general dependencies in the
given incomplete and noisy training data; in this sense it resembles a learning matrix. This kind
of network is a linear model, as it can model only linearly separable data.
The Hopfield type network is a multiple-loop feedback neural computation system. The
neurons in this network are connected to all other neurons except to themselves; that is, there
are no self-feedbacks in the network. A connection between two neurons Ni and Nj is two-way,
denoted by wij. The connection wij from the output of neuron i to the input of neuron j
has the same strength as the connection wji from the output of neuron j to the input of neuron
i; in other words, the weight matrix is symmetric. Each neuron computes the summation:
si = ∑j=1n wji xj
where: n is the number of neurons in the network, wji are the weights, and xj are inputs,
1<=j<=n.
The Hopfield network can be made to operate in either continuous or discrete mode. Here it is
considered that the network operates in discrete mode using neurons with discrete activation
functions. When a neuron fires, its discrete activation function is evaluated and the following
output is produced: x'i = 1 if si = ∑j=1n wji xj >= 0 or x'i = 0 if si < 0.
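A sketch of this discrete firing rule; the toy weight matrix in the usage below is an illustrative assumption.

```python
import numpy as np

def fire(i, x, W):
    """One firing of neuron i with the discrete activation above:
    s_i = sum_j w_ji x_j ; output 1 if s_i >= 0, else 0.
    W is symmetric with zero diagonal (no self-feedback)."""
    s = W[i] @ x  # W symmetric, so row i and column i are interchangeable
    return 1 if s >= 0 else 0

def asynchronous_run(x, W, order):
    """Fire neurons one at a time in the given order (asynchronous mode),
    updating the state in place after each firing."""
    x = np.array(x)
    for i in order:
        x[i] = fire(i, x, W)
    return x
```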
Configuration:
Hopfield networks can be implemented to operate in two modes:
- Synchronous mode of training Hopfield networks means that all neurons fire at the same time.
- Asynchronous mode of training Hopfield networks means that the neurons fire at random.
Stability constraints:
The recurrent networks of the Hopfield type are complex dynamical systems whose behavior is
determined by the connectivity pattern between the neurons. The inputs to the neurons are
not simply externally provided inputs but outputs from other neurons that change with the
time. The temporal behavior of the network implies characteristics that have to be taken into
consideration in order to examine the network performance.
The Hopfield networks are dynamical systems whose state changes with the time. The state of
the neural network is the set of the outputs of all neurons at a particular moment, time instant.
When a neuron fires then its output changes and so the network state also changes. Therefore
the sequence of neuron firings leads to a corresponding sequence of modified neuron outputs,
and modified system states. Acquiring knowledge of the state space allows us to study the
motion of the neural network in time. The trajectories that the network traces out over time
may be used to make a state portrait of the system.
A Hopfield net with n neurons has 2^n possible states, assuming that each neuron output
produces two values, 0 and 1. Performance analysis of the network behavior can be carried out
by developing a state table that lists all possible subsequent states.
Associative Memory:
These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns and at the time of giving an output they can produce one of the stored
patterns by matching them with the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM). Associative memory makes a parallel search with the
stored patterns as data files.
Following are the two types of associative memories we can observe −
Auto Associative Memory
Hetero Associative memory
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3−4 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to n)
Step 5 − Adjust the weights as follows −
wij(new) = wij(old) + xiyj
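The steps above can be sketched literally (for auto-association, the output vector y equals the stored input vector s); the explicit double loop is for clarity, not efficiency.

```python
def hebb_train(patterns):
    """Hebb-rule training of an associative memory over the given patterns."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]    # Step 1: w_ij = 0
    for s in patterns:                 # Step 2: for each input vector
        x = list(s)                    # Step 3: x_i = s_i
        y = list(s)                    # Step 4: y_j = s_j (auto-association)
        for i in range(n):             # Step 5: w_ij += x_i * y_j
            for j in range(n):
                W[i][j] += x[i] * y[j]
    return W
```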
Hopfield machine
The standard binary Hopfield network is a recurrently connected network with the following
features:
symmetrical connections: if there is a connection going from unit j to unit i having a
connection weight equal to W_ij then there is also a connection going from unit i to unit
j with an equal weight.
linear threshold activation: if the total weighted summed input (dot product of input
and weights) to a unit is greater than or equal to zero, its state is set to 1, otherwise it is
-1. Normally, the threshold is zero. Note that the Hopfield network for the travelling
salesman problem (assignment 3) behaved slightly differently from this.
asynchronous state updates: units are visited in random order and updated according to
the above linear threshold rule.
Energy function: it can be shown that the above state dynamics minimizes an energy
function.
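The energy minimized by these state dynamics can be sketched with ±1 states and zero thresholds (the weight values in the test are an illustrative toy example):

```python
import numpy as np

def energy(x, W):
    """Hopfield energy E = -1/2 * x^T W x (thresholds taken as zero).
    Asynchronous linear-threshold updates never increase this value."""
    x = np.asarray(x, dtype=float)
    return -0.5 * float(x @ W @ x)

def update_unit(i, x, W):
    """Linear threshold rule with +/-1 states: 1 if the weighted summed
    input is >= 0, otherwise -1."""
    return 1 if W[i] @ x >= 0 else -1
```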
Hebbian learning
The most important features of the Hopfield network are:
Energy minimization during state updates guarantees that it will converge to a stable
attractor.
The learning (weight updates) also minimizes energy; therefore, the training patterns
will become stable attractors (provided the capacity has not been exceeded).
However, there are some serious drawbacks to Hopfield networks:
Capacity is only about 0.15N, where N is the number of units.
Local energy minima may occur, and the network may therefore get stuck in very poor
(high Energy) states which do not satisfy the "constraints" imposed by the weights very
well at all. These local minima are referred to as spurious attractors if they are stable
attractors which are not part of the training set. Often, they are blends of two or more
training patterns.
Boltzmann machine
The binary Boltzmann machine is very similar to the binary Hopfield network, with the addition
of three features: a stochastic activation function (the probability that a unit takes the value 1
depends on its net input and on a "temperature" parameter), an annealing schedule that
gradually lowers the temperature during operation, and hidden units that are not clamped to
the input or output patterns.
Adaptive resonance theory (ART)
ART1 neural networks cluster binary vectors, using unsupervised learning. The neat thing about
adaptive resonance theory is that it gives the user more control over the degree of relative
similarity of patterns placed on the same cluster.
An ART1 net achieves stability when it cannot return any patterns to previous clusters (in other
words, a pattern oscillating among different clusters at different stages of training indicates an
unstable net). Some nets achieve stability by gradually reducing the learning rate as the same
set of training patterns is presented many times. However, this does not allow the net to
readily learn a new pattern that is presented for the first time after a number of training epochs
have already taken place. The ability of a net to respond to (learn) a new pattern equally well
at any stage of learning is called plasticity (e.g., this is a computational corollary of the
biological model of neural plasticity). Adaptive resonance theory nets are designed to be both
stable and plastic.
An ART1 network consists of:
an input processing field (called the F1 layer) which happens to consist of two
parts:
o an input portion (F1(a))
o an interface portion (F1(b))
the cluster units (the F2 layer)
and a mechanism to control the degree of similarity of patterns placed on the
same cluster
a reset mechanism
weighted bottom-up connections between the F1 and F2 layers
weighted top-down connections between the F2 and F1 layers
F1(b), the interface portion, combines signals from the input portion and the F2 layer, for use in
comparing the similarity of the input signal to the weight vector for the cluster unit that has
been selected as a candidate for learning.
To control the similarity of patterns placed on the same cluster, there are two sets of
connections (each with its own weights) between each unit in the interface portion of the input
field and the cluster unit. The F1(b) layer is connected to the F2 layer by bottom-up weights
(bij). The F2 layer is connected to the F1(b) layer by top-down weights (tij).
The F2 layer is a competitive layer: The cluster unit with the largest net input becomes the
candidate to learn the input pattern. The activations of all other F2 units are set to zero. The
interface units, F1(b), now combine information from the input and cluster units. Whether or
not this cluster unit is allowed to learn the input pattern depends on how similar its top-down
weight vector is to the input vector. This decision is made by the reset unit, based on signals it
receives from the input F1(a) and interface F1(b) layers. If the cluster unit is not allowed to
learn, it is inhibited and a new cluster unit is selected as the candidate. If a cluster unit is
allowed to learn, it is said to classify a pattern class. Sometimes there is a tie for the winning
neuron in the F2 layer; when this happens, an arbitrary rule, such as taking the first of them in
serial order, can be used to choose the winner.
During the operation of an ART1 net, patterns emerge in the F1(a) and F1(b) layers and are called
traces of STM (short-term memory). Traces of LTM (long-term memory) are in the connection
weights between the input layers (F1) and output layer (F2).
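The competition-and-reset cycle described above can be sketched as follows. The weight values in the test and the simple match ratio |x AND tJ| / |x| (for binary vectors) are illustrative assumptions.

```python
def art1_vigilance_step(x, bottom_up, top_down, rho):
    """One F2 competition with reset: the unit with the largest bottom-up net
    input is the candidate; it may learn only if the match ratio
    |x AND t_J| / |x| meets the vigilance rho, otherwise it is inhibited and
    the next-best unit is tried. Returns the accepted unit index, or None."""
    inhibited = set()
    norm_x = sum(x)
    while len(inhibited) < len(bottom_up):
        # Competitive F2 layer: largest net input wins; serial-order tie-break.
        win, best = None, None
        for j, b in enumerate(bottom_up):
            if j in inhibited:
                continue
            net = sum(bi * xi for bi, xi in zip(b, x))
            if best is None or net > best:
                win, best = j, net
        # Reset unit: compare the top-down weight vector with the input.
        match = sum(ti * xi for ti, xi in zip(top_down[win], x))
        if norm_x == 0 or match / norm_x >= rho:
            return win
        inhibited.add(win)  # reset: inhibit and try a new candidate cluster
    return None
```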
Classifications of ART
ART-1
This is a binary version of ART, i.e., it can cluster binary input vectors.
ART-2
This is an analogue version of ART, i.e. it can cluster real-valued input vectors.
ART-2A
This refers to a fast version of the ART2 learning algorithm.
ART-3
This network is an ART extension that incorporates "chemical transmitters" to control the
search process in a hierarchical ART structure.
ARTMAP
This is a supervised version of ART that can learn arbitrary mappings of binary patterns.
Fuzzy ART
This network is a synthesis of ART and fuzzy logic.
Fuzzy ARTMAP
This is supervised fuzzy ART.
ART adaptations
ARTMAP-IC
This network adds distributed prediction and category instance counting to the basic fuzzy
ARTMAP system.
Gaussian ARTMAP
A supervised-learning ART network that uses Gaussian-defined receptive fields.
arboART
In this network, the prototype vectors at each layer are used as input to the following layer
(agglomerative method). The architecture is similar to HART-J (see below). It has been applied
to automatic rule generation of Kansei engineering expert systems.
HART(-J), HART-S
Modular Hierarchical ART (HART) models. HART-J (also known as HART) implements an
agglomerative clustering method (similar to arboART above). HART-S implements a divisive
clustering method with each ART layer learning the differences between the input and the
matching prototype of the previous layer.
SMART
Self-consistent Modular ART network, which is capable of learning self-consistent cluster
hierarchies through explicit links and an internal feedback mechanism (much like those of the
ARTMAP network).
LAPART
An ART-based neural architecture for pattern sequence verification through inferencing.
MART
Multichannel ART, for adaptive classification of patterns through multiple inputs channels.
PROBART
A modification to the Fuzzy ARTMAP network (by building up probabilistic information
regarding interlayer node associations) which allows it to learn to approximate noisy mappings.
R2MAP
An ARTMAP-based architecture capable of learning complex classification tasks by re-iteratively
creating novel (relational) input features to represent the same problem with fewer input
categories. It takes motivation from the representational redescription (RR) hypothesis in
cognitive science.
TD-ART
Time-Delay ART for learning spatio-temporal patterns.