Unit 2 - Soft Computing - WWW - Rgpvnotes.in
Subject Name: Soft Computing
Subject Code: CS-8001
Semester: 8th
Downloaded from be.rgpvnotes.in
UNIT-2 Notes
A wide variety of sigmoid functions have been used as the activation function of artificial
neurons, including the logistic and hyperbolic tangent functions, see figure 1.9.
Derivation of EBPA
The basic algorithm can be summed up in the following equation (the delta rule) for the
change to the weight wji from node i to node j:
Δwji = η δj yi
where η is the learning rate, yi is the output of node i, and δj is computed as follows:
1. If node j is an output node, then δj is the product of φ'(vj) and the error signal ej, where
φ(·) is the logistic function and vj is the total input to node j (i.e. Σi wjiyi), and ej is the
error signal for node j (i.e. the difference between the desired output and the actual
output);
2. If node j is a hidden node, then δj is the product of φ'(vj) and the weighted sum of the
δ's computed for the nodes in the next hidden or output layer that are connected to
node j.
3. [The actual formula is δj = φ'(vj) Σk δkwkj, where k ranges over those nodes for
which wkj is non-zero (i.e. nodes k that actually have connections from node j). The δk
values have already been computed, as they are in the output layer (or a layer closer to
the output layer than node j).]
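The two delta cases above can be sketched directly in code. This is an illustrative sketch, not the notes' own implementation; the function names are assumptions.

```python
import numpy as np

def logistic(v):
    """Logistic activation phi(v) = 1 / (1 + exp(-v))."""
    return 1.0 / (1.0 + np.exp(-v))

def logistic_deriv(v):
    """phi'(v) = phi(v) * (1 - phi(v)), a convenient property of the logistic."""
    y = logistic(v)
    return y * (1.0 - y)

def output_delta(v_j, desired, actual):
    """Case 1 (output node): delta_j = phi'(v_j) * e_j, with e_j = desired - actual."""
    return logistic_deriv(v_j) * (desired - actual)

def hidden_delta(v_j, deltas_k, w_kj):
    """Case 2 (hidden node): delta_j = phi'(v_j) * sum_k delta_k * w_kj,
    where k ranges over the next-layer nodes that node j connects to."""
    return logistic_deriv(v_j) * float(np.dot(deltas_k, w_kj))
```

Note that `logistic_deriv(0.0)` is 0.25, the maximum of φ'; deltas shrink as a node saturates toward 0 or 1.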
Momentum
If the learning rate η is very small, then the algorithm proceeds slowly, but accurately
follows the path of steepest descent in weight space.
If η is largish, the algorithm can oscillate ("bounce off the canyon walls").
A simple method of effectively increasing the rate of learning is to modify the delta rule
by including a momentum term:
Δwji(n) = α Δwji(n−1) + η δj(n)yi(n)
where α is a positive constant termed the momentum constant. This is called the
generalized delta rule.
The effect is that if the basic delta rule is consistently pushing a weight in the same
direction, then it gradually gathers "momentum" in that direction.
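A minimal sketch of the generalized delta rule, showing the momentum effect; the function name and the η, α values are illustrative.

```python
def weight_change(prev_dw, delta_j, y_i, eta=0.1, alpha=0.9):
    """Generalized delta rule:
    dw_ji(n) = alpha * dw_ji(n-1) + eta * delta_j(n) * y_i(n)."""
    return alpha * prev_dw + eta * delta_j * y_i

# If the basic rule keeps pushing in the same direction, the step size grows:
dw = 0.0
steps = []
for _ in range(3):
    dw = weight_change(dw, delta_j=1.0, y_i=1.0)
    steps.append(dw)
# steps: 0.1, 0.19, 0.271 -- the weight change gathers "momentum"
```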
Stopping Criterion
Two commonly used stopping criteria are:
stop after a certain number of runs through all the training data (each run through all
the training data is called an epoch);
stop when the total sum-squared error reaches some low level. By total sum-squared
error we mean ΣpΣi ei2, where p ranges over all of the training patterns and i ranges over
all of the output units.
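Both criteria can be combined in one training loop. A minimal sketch, assuming a caller-supplied `run_one_epoch` that makes one pass through the training data and returns the per-pattern error lists:

```python
def total_sse(errors):
    """Total sum-squared error: sum over patterns p and output units i of e_pi^2."""
    return sum(e * e for pattern in errors for e in pattern)

def train(run_one_epoch, max_epochs=100, tolerance=1e-3):
    """Run epochs until either stopping criterion fires; return epochs used."""
    for epoch in range(1, max_epochs + 1):
        errors = run_one_epoch()           # one run through all training data
        if total_sse(errors) < tolerance:  # criterion 2: low total SSE
            return epoch
    return max_epochs                      # criterion 1: epoch limit reached
```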
Initialization
The weights of a network to be trained by backprop must be initialized to some non-
zero values.
The usual thing to do is to initialize the weights to small random values.
The reason for this is that sometimes backprop training runs become "lost" on a plateau
in weight-space, or for some other reason backprop cannot find a good minimum error
value.
Using small random values means different starting points for each training run, so that
subsequent training runs have a good chance of finding a suitable minimum.
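A sketch of the usual small-random initialization; the range ±0.1 is an illustrative choice, and seeding is optional (unseeded runs give the different starting points described above).

```python
import random

def init_weights(n_in, n_out, scale=0.1, seed=None):
    """Small random non-zero starting weights: a different point in
    weight space for every training run (or a repeatable one if seeded)."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_in)]
            for _ in range(n_out)]
```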
Limitation of EBPA
A back-propagation neural network is only practical in certain situations. Following are some
guidelines on when you should use another approach:
Can you write down a flow chart or a formula that accurately describes the problem? If
so, then stick with a traditional programming method.
Is there a simple piece of hardware or software that already does what you want? If so,
then the development time for a NN might not be worth it.
Do you want the functionality to "evolve" in a direction that is not pre-defined? If so,
then consider using a Genetic Algorithm (that's another topic!).
Do you have an easy way to generate a significant number of input/output examples of
the desired behavior? If not, then you won't be able to train your NN to do anything.
Is the problem very "discrete"? Can the correct answer be found in a look-up table of
reasonable size? If so, a look-up table is much simpler and more accurate.
Are precise numeric output values required? NNs are not good at giving precise numeric
answers.
Characteristics of EBPA
These are the characteristics of EBPA:
1. It builds on linear threshold units and the neural network computation paradigm in
general, and uses the logistic function (or similar functions) to transform weighted sums
of inputs to a neuron.
2. Backprop's performance on the XOR problem was demonstrated using the tlearn backprop
simulator.
3. A number of refinements to backprop were looked at briefly, including momentum and a
technique to obtain the best generalization ability.
4. Backprop nets learn slowly but compute quickly once they have learned.
5. They can be trained so as to generalize reasonably well.
Applications of EBPA
Backprop tends to work well in some situations where human experts are unable to
articulate a rule for what they are doing - e.g. in areas depending on raw perception,
and where it is difficult to determine the attributes (in the ID3 sense) that are relevant
to the problem at hand.
For example, there is a proprietary system, which includes a backprop component, for
assisting in classifying Pap smears.
o The system picks out from the image the most suspicious-looking cells.
o A human expert then inspects these cells.
o This reduces the problem from looking at maybe 10,000 cells to looking at
maybe 100 cells - this reduces the boredom-induced error rate.
Other successful systems have been built for tasks like reading handwritten postcodes.
Counter propagation network architecture
An example of a hybrid network which combines the features of two or more basic network
designs. Proposed by Hecht-Nielsen in 1986. The network maps an input pattern A
of size n to an output pattern B of size m.
The hidden layer is a Kohonen network with unsupervised learning and the output layer is a
Grossberg (outstar) layer fully connected to the hidden layer.
The output of the A subsection of the input layer is fanned out to the competitive middle
layer. Each neuron in the output layer receives a signal corresponding to the input pattern's
category along one connection from the middle layer.
The B subsection of the input layer has zero input during actual operation of the network.
The role of the output layer is to produce the pattern corresponding to the category output
by the middle layer. The output layer uses a supervised learning procedure, with direct
connection from the input layer's B subsection providing the correct output.
Training is a two-stage procedure. First, the Kohonen layer is trained on input patterns. No
changes are made in the output layer during this step. Once the middle layer is trained to
correctly categorise all the input patterns, the weights between the input and middle layers
are kept fixed and the output layer is trained to produce correct output patterns by
adjusting weights between the middle and output layers.
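The two-stage procedure can be sketched as follows. The prototype initialization (copies of the first training patterns), learning rates, and epoch count are illustrative assumptions, not part of the notes.

```python
import numpy as np

def train_cpn(patterns_a, patterns_b, n_hidden, epochs=50,
              lr_kohonen=0.2, lr_grossberg=0.2):
    """Two-stage counter-propagation training sketch.
    Stage 1 trains the Kohonen (middle) layer on the A patterns alone;
    stage 2 freezes it and trains the Grossberg (output) layer so the
    winning category reproduces the paired B pattern."""
    # Assumption: prototypes start as copies of the first n_hidden A patterns.
    kohonen = np.array(patterns_a[:n_hidden], dtype=float)
    grossberg = np.zeros((len(patterns_b[0]), n_hidden))

    # Stage 1: unsupervised winner-take-all training; no output-layer changes.
    for _ in range(epochs):
        for a in patterns_a:
            win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
            kohonen[win] += lr_kohonen * (a - kohonen[win])

    # Stage 2: Kohonen weights kept fixed; supervised outstar training.
    for _ in range(epochs):
        for a, b in zip(patterns_a, patterns_b):
            win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
            grossberg[:, win] += lr_grossberg * (np.array(b) - grossberg[:, win])
    return kohonen, grossberg

def recall(a, kohonen, grossberg):
    """Map an A pattern to a B pattern via the winning middle-layer category
    (the B subsection of the input is zero during actual operation)."""
    win = int(np.argmin(np.linalg.norm(kohonen - a, axis=1)))
    return grossberg[:, win]
```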
Characteristics of CPN
• In the first training phase, if a hidden-layer unit does not win for a long period of time, its
weights should be set to random values to give that unit a chance to win subsequently.
• There is no need for normalizing the training output vectors.
• After the training has finished, the network maps the training vectors onto output vectors
that are close to the desired ones.
• The more hidden units, the better the mapping.
• Thanks to the competitive neurons in the hidden layer, the linear neurons can realize
nonlinear mappings.
Hopfield network
The Hopfield network is an implementation of a learning matrix with recurrent links. The
learning matrix is a weight matrix which actually stores associations between inputs and
targets. This network generalizes in the sense that it identifies general dependencies in the
given incomplete and noisy training data; in this sense it resembles a learning matrix. This kind
of network is a linear model, as it can model only linearly separable data.
The Hopfield type network is a multiple-loop feedback neural computation system. The
neurons in this network are connected to all other neurons except to themselves; that is, there
are no self-feedbacks in the network. A connection between two neurons Ni and Nj is two-way,
denoted by wij. The connection wij from the output of neuron i to the input of neuron j
has the same strength as the connection wji from the output of neuron j to the input of neuron
i; in other words, the weight matrix is symmetric. Each neuron computes the summation:
si = ∑j=1n wji xj
where: n is the number of neurons in the network, wji are the weights, and xj are inputs,
1<=j<=n.
The Hopfield network can be made to operate in either continuous or discrete mode. Here it is
considered that the network operates in discrete mode using neurons with discrete activation
functions. When a neuron fires, its discrete activation function is evaluated and the following
output is produced: x'i = 1 if si = ∑j=1n wji xj >= 0 or x'i = 0 if si < 0.
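A sketch of this discrete firing rule; the toy weight matrix in the usage below is an illustrative assumption.

```python
import numpy as np

def fire(i, x, W):
    """One firing of neuron i with the discrete activation above:
    s_i = sum_j w_ji x_j ; output 1 if s_i >= 0, else 0.
    W is symmetric with zero diagonal (no self-feedback)."""
    s = W[i] @ x  # W symmetric, so row i and column i are interchangeable
    return 1 if s >= 0 else 0

def asynchronous_run(x, W, order):
    """Fire neurons one at a time in the given order (asynchronous mode),
    updating the state in place after each firing."""
    x = np.array(x)
    for i in order:
        x[i] = fire(i, x, W)
    return x
```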
Configuration:
Hopfield networks can be implemented to operate in two modes:
- Synchronous mode of training Hopfield networks means that all neurons fire at the same time.
- Asynchronous mode of training Hopfield networks means that the neurons fire at random.
Stability constraints:
The recurrent networks of the Hopfield type are complex dynamical systems whose behavior is
determined by the connectivity pattern between the neurons. The inputs to the neurons are
not simply externally provided inputs but outputs from other neurons that change with the
time. The temporal behavior of the network implies characteristics that have to be taken into
consideration in order to examine the network performance.
The Hopfield networks are dynamical systems whose state changes with the time. The state of
the neural network is the set of the outputs of all neurons at a particular moment, time instant.
When a neuron fires then its output changes and so the network state also changes. Therefore
the sequence of neuron firings leads to a corresponding sequence of modified neuron outputs,
and modified system states. Acquiring knowledge of the state space allows us to study the
motion of the neural network in time. The trajectories that the network traces out over time
may be used to make a state portrait of the system.
A Hopfield net with n neurons has 2^n possible states, assuming that each neuron output
produces two values, 0 and 1. Performance analysis of the network behavior can be carried out
by developing a state table that lists all possible subsequent states.
Associative Memory:
These kinds of neural networks work on the basis of pattern association, which means they can
store different patterns and at the time of giving an output they can produce one of the stored
patterns by matching them with the given input pattern. These types of memories are also called
Content-Addressable Memory (CAM). Associative memory makes a parallel search with the
stored patterns as data files.
Following are the two types of associative memories we can observe −
Auto Associative Memory
Hetero Associative memory
Training Algorithm
For training, this network is using the Hebb or Delta learning rule.
Step 1 − Initialize all the weights to zero as wij = 0 (i = 1 to n, j = 1 to n)
Step 2 − Perform steps 3−4 for each input vector.
Step 3 − Activate each input unit as follows −
xi = si (i = 1 to n)
Step 4 − Activate each output unit as follows −
yj = sj (j = 1 to n)
Step 5 − Adjust the weights as follows −
wij(new) = wij(old) + xiyj
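The steps above can be sketched literally (for auto-association, the output vector y equals the stored input vector s); the explicit double loop is for clarity, not efficiency.

```python
def hebb_train(patterns):
    """Hebb-rule training of an associative memory over the given patterns."""
    n = len(patterns[0])
    W = [[0] * n for _ in range(n)]    # Step 1: w_ij = 0
    for s in patterns:                 # Step 2: for each input vector
        x = list(s)                    # Step 3: x_i = s_i
        y = list(s)                    # Step 4: y_j = s_j (auto-association)
        for i in range(n):             # Step 5: w_ij += x_i * y_j
            for j in range(n):
                W[i][j] += x[i] * y[j]
    return W
```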
Hopfield machine
The standard binary Hopfield network is a recurrently connected network with the following
features:
symmetrical connections: if there is a connection going from unit j to unit i having a
connection weight equal to W_ij then there is also a connection going from unit i to unit
j with an equal weight.
linear threshold activation: if the total weighted summed input (dot product of input
and weights) to a unit is greater than or equal to zero, its state is set to 1, otherwise it is
-1. Normally, the threshold is zero. Note that the Hopfield network for the travelling
salesman problem (assignment 3) behaved slightly differently from this.
asynchronous state updates: units are visited in random order and updated according to
the above linear threshold rule.
Energy function: it can be shown that the above state dynamics minimizes an energy
function.
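The energy minimized by these state dynamics can be sketched with ±1 states and zero thresholds (the weight values in the test are an illustrative toy example):

```python
import numpy as np

def energy(x, W):
    """Hopfield energy E = -1/2 * x^T W x (thresholds taken as zero).
    Asynchronous linear-threshold updates never increase this value."""
    x = np.asarray(x, dtype=float)
    return -0.5 * float(x @ W @ x)

def update_unit(i, x, W):
    """Linear threshold rule with +/-1 states: 1 if the weighted summed
    input is >= 0, otherwise -1."""
    return 1 if W[i] @ x >= 0 else -1
```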
Hebbian learning
The most important features of the Hopfield network are:
Energy minimization during state updates guarantees that it will converge to a stable
attractor.
The learning (weight updates) also minimizes energy; therefore, the training patterns
will become stable attractors (provided the capacity has not been exceeded).
However, there are some serious drawbacks to Hopfield networks:
Capacity is only about 0.15N, where N is the number of units.
Local energy minima may occur, and the network may therefore get stuck in very poor
(high Energy) states which do not satisfy the "constraints" imposed by the weights very
well at all. These local minima are referred to as spurious attractors if they are stable
attractors which are not part of the training set. Often, they are blends of two or more
training patterns.
Boltzmann machine
The binary Boltzmann machine is very similar to the binary Hopfield network, with the addition
of three features: a stochastic activation function (the probability that a unit takes the value 1
depends on its net input and on a "temperature" parameter), an annealing schedule that
gradually lowers the temperature during operation, and hidden units that are not clamped to
the input or output patterns.
Adaptive resonance theory (ART)
ART1 neural networks cluster binary vectors, using unsupervised learning. The neat thing about
adaptive resonance theory is that it gives the user more control over the degree of relative
similarity of patterns placed on the same cluster.
An ART1 net achieves stability when it cannot return any patterns to previous clusters (in other
words, a pattern oscillating among different clusters at different stages of training indicates an
unstable net). Some nets achieve stability by gradually reducing the learning rate as the same
set of training patterns is presented many times. However, this does not allow the net to
readily learn a new pattern that is presented for the first time after a number of training epochs
have already taken place. The ability of a net to respond to (learn) a new pattern equally well
at any stage of learning is called plasticity (e.g., this is a computational corollary of the
biological model of neural plasticity). Adaptive resonance theory nets are designed to be both
stable and plastic.
An ART1 network consists of:
an input processing field (called the F1 layer) which happens to consist of two
parts:
o an input portion (F1(a))
o an interface portion (F1(b))
the cluster units (the F2 layer)
and a mechanism to control the degree of similarity of patterns placed on the
same cluster
a reset mechanism
weighted bottom-up connections between the F1 and F2 layers
weighted top-down connections between the F2 and F1 layers
F1(b), the interface portion, combines signals from the input portion and the F2 layer, for use in
comparing the similarity of the input signal to the weight vector for the cluster unit that has
been selected as a candidate for learning.
To control the similarity of patterns placed on the same cluster, there are two sets of
connections (each with its own weights) between each unit in the interface portion of the input
field and the cluster unit. The F1(b) layer is connected to the F2 layer by bottom-up weights
(bij). The F2 layer is connected to the F1(b) layer by top-down weights (tij).
The F2 layer is a competitive layer: The cluster unit with the largest net input becomes the
candidate to learn the input pattern. The activations of all other F2 units are set to zero. The
interface units, F1(b), now combine information from the input and cluster units. Whether or
not this cluster unit is allowed to learn the input pattern depends on how similar its top-down
weight vector is to the input vector. This decision is made by the reset unit, based on signals it
receives from the input F1(a) and interface F1(b) layers. If the cluster unit is not allowed to
learn, it is inhibited and a new cluster unit is selected as the candidate. If a cluster unit is
allowed to learn, it is said to classify a pattern class. Sometimes there is a tie for the winning
neuron in the F2 layer; when this happens, an arbitrary rule, such as taking the first of them in
serial order, can be used to choose the winner.
During the operation of an ART1 net, patterns emerge in the F1(a) and F1(b) layers and are called
traces of STM (short-term memory). Traces of LTM (long-term memory) are in the connection
weights between the input layers (F1) and output layer (F2).
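The competition-and-reset cycle described above can be sketched as follows. The weight values in the test and the simple match ratio |x AND tJ| / |x| (for binary vectors) are illustrative assumptions.

```python
def art1_vigilance_step(x, bottom_up, top_down, rho):
    """One F2 competition with reset: the unit with the largest bottom-up net
    input is the candidate; it may learn only if the match ratio
    |x AND t_J| / |x| meets the vigilance rho, otherwise it is inhibited and
    the next-best unit is tried. Returns the accepted unit index, or None."""
    inhibited = set()
    norm_x = sum(x)
    while len(inhibited) < len(bottom_up):
        # Competitive F2 layer: largest net input wins; serial-order tie-break.
        win, best = None, None
        for j, b in enumerate(bottom_up):
            if j in inhibited:
                continue
            net = sum(bi * xi for bi, xi in zip(b, x))
            if best is None or net > best:
                win, best = j, net
        # Reset unit: compare the top-down weight vector with the input.
        match = sum(ti * xi for ti, xi in zip(top_down[win], x))
        if norm_x == 0 or match / norm_x >= rho:
            return win
        inhibited.add(win)  # reset: inhibit and try a new candidate cluster
    return None
```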
Classifications of ART
ART-1
This is a binary version of ART, i.e., it can cluster binary input vectors.
ART-2
This is an analogue version of ART, i.e. it can cluster real-valued input vectors.
ART-2A
This refers to a fast version of the ART2 learning algorithm.
ART-3
This network is an ART extension that incorporates "chemical transmitters" to control the
search process in a hierarchical ART structure.
ARTMAP
This is a supervised version of ART that can learn arbitrary mappings of binary patterns.
Fuzzy ART
This network is a synthesis of ART and fuzzy logic.
Fuzzy ARTMAP
This is supervised fuzzy ART.
ART adaptations
ARTMAP-IC
This network adds distributed prediction and category instance counting to the basic fuzzy
ARTMAP system.
Gaussian ARTMAP
A supervised-learning ART network that uses Gaussian-defined receptive fields.
arboART
In this network, the prototype vectors at each layer are used as input to the following layer
(agglomerative method). The architecture is similar to HART-J (see below). It has been applied
to automatic rule generation of Kansei engineering expert systems.
HART(-J), HART-S
Modular Hierarchical ART (HART) models. HART-J (also known as HART) implements an
agglomerative clustering method (similar to arboART above). HART-S implements a divisive
clustering method with each ART layer learning the differences between the input and the
matching prototype of the previous layer.
SMART
Self-consistent Modular ART network, which is capable of learning self-consistent cluster
hierarchies through explicit links and an internal feedback mechanism (much like those of the
ARTMAP network).
LAPART
An ART-based neural architecture for pattern sequence verification through inferencing.
MART
Multichannel ART, for adaptive classification of patterns through multiple inputs channels.
PROBART
A modification to the Fuzzy ARTMAP network (by building up probabilistic information
regarding interlayer node associations) which allows it to learn to approximate noisy mappings.
R2MAP
An ARTMAP-based architecture capable of learning complex classification tasks by re-iteratively
creating novel (relational) input features to represent the same problem with fewer input
categories. It takes motivation from the representational redescription (RR) hypothesis in
cognitive science.
TD-ART
Time-Delay ART for learning spatio-temporal patterns.