SC - Unit 2
Learning Processes
Thus we can say that a neural network learns from its environment and improves its performance over time. The learning process implies that:
• The neural network is stimulated by the environment.
• The neural network undergoes changes in its free parameters as a result of this stimulation.
• The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.
Learning based on critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal.
The external reinforcement signals are processed by the critic signal generator, and the resulting critic signals are sent to the ANN so that the weights can be adjusted properly to obtain better critic feedback in future.
Hebbian Learning
Hebb's postulate of learning is the oldest and most famous of all learning rules.
It is named in honor of the neuropsychologist Donald Hebb (1949).
According to Hebb: "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
Hebb proposed this change as a basis of associative learning.
According to the Hebb rule-
The weight vector is found to increase proportionally to the product of the input and the
learning signal (learning signal is equal to the neuron’s output).
If two interconnected neurons are on simultaneously then the weights associated with these
neurons can be increased by the modification made in their synaptic gap or strength.
w_i(new) = w_i(old) + x_i y
This rule is better suited to bipolar data than to binary data. In the case of binary data, the weight updation equation cannot distinguish between the following two conditions:
• A training pair in which an input unit is "on" and the target value is "off".
• A training pair in which both the input unit and the target value are "off".
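As a small illustration (not part of the original notes), the sketch below applies the Hebb rule to the AND function with bipolar inputs and targets; the training data and the single pass over it are assumptions made for the example.

```python
# Hebb rule sketch: learning the AND function with bipolar inputs/targets.
# The training pairs and the single pass over them are illustrative assumptions.
import numpy as np

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])   # bipolar inputs
t = np.array([1, -1, -1, -1])                        # bipolar targets (AND)

w = np.zeros(2)   # weights
b = 0.0           # bias

for x, y in zip(X, t):           # one pass over the training set
    w = w + x * y                # w_i(new) = w_i(old) + x_i * y
    b = b + y                    # b(new) = b(old) + y

print(w, b)   # gives w = [2. 2.], b = -2.0 for this data
```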
Perceptron Learning Rule
In the perceptron learning rule, the learning signal is the difference between the desired and the actual response of the neuron.
The perceptron learning rule states that for a finite number of input training vectors, x(n) where n = 1 to N, each with an associated target value, t(n) where n = 1 to N, which is +1 or -1, and an activation function f(y_in), the weights are updated as follows:
If y ≠ t, then
w(new) = w(old) + α t x
b(new) = b(old) + α t
else
w(new) = w(old)
b(new) = b(old)
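For example (with illustrative values, not taken from the notes): let α = 1, w = (0, 0), b = 0, and consider the training vector x = (1, -1) with target t = 1. The net input is y_in = b + x1·w1 + x2·w2 = 0, so y = f(0) = 0 ≠ t, and the rule gives w(new) = (0, 0) + 1·1·(1, -1) = (1, -1) and b(new) = 0 + 1·1 = 1.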
Major class of Neural Networks
Perceptron networks
Perceptron networks come under single-layer feed-forward networks and are also called
simple perceptrons.
Various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969,
1988).
The key points to be noted in a perceptron network are:
• The perceptron network consists of three units, namely, sensory unit (input unit), associator
unit (hidden unit), and response unit (output unit).
• The sensory units are connected to associator units with fixed weights having values 1, 0 or -1, which are assigned at random.
• The binary activation function is used in the sensory unit and the associator unit.
• The response unit has an activation of 1, 0 or -1. The binary step with fixed threshold θ is used as activation for the associator. The output signals that are sent from the associator unit to the response unit are only binary.
• The output of the perceptron network is given by
y = f(y_in)
where f(y_in) is the activation function, defined as
f(y_in) = 1 if y_in > θ; 0 if −θ ≤ y_in ≤ θ; −1 if y_in < −θ
The perceptron learning rule is used in the weight updation between the associator unit and
the response unit. For each training input, the net will calculate the response and it will
determine whether or not an error has occurred.
The error calculation is based on the comparison of the values of the targets with those of the calculated outputs.
The weights on the connections from the units that send the nonzero signal will get
adjusted suitably.
The weights will be adjusted on the basis of the learning rule if an error has occurred for a particular training pattern, i.e.,
w(new) = w(old) + α t x
b(new) = b(old) + α t
If no error occurs, there is no weight updation.
Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output
and desired (target) output.
If y ≠ t,
then w_i(new) = w_i(old) + α t x_i
b(new) = b(old) + α t
else,
w_i(new) = w_i(old)
b(new) = b(old)
Step 6: Train the network until there is no weight change. This is the stopping condition for
the network. If this condition is not met, then start again from Step 2.
Perceptron Training Algorithm
Multiple Output Classes
Algorithm is -
Step 0: Initialize the weights, bias and learning rate α , where 0< α ≤ 1
(for simplicity α is set to 1).
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each binary or bipolar training vector pair s:t.
Step 3: The input layer containing input units is applied with identity activation functions: 𝑥𝑖=𝑠𝑖
Step 4: Calculate output response of each output unit j = 1 to m. Net input is calculated as:
y_inj = b_j + Σ_i x_i w_ij
where n is the number of input neurons in the input layer.
Apply activations over the net input calculated to obtain the output –
y_j = f(y_inj) = 1 if y_inj > θ; 0 if −θ ≤ y_inj ≤ θ; −1 if y_inj < −θ
Step 5: Weights and bias adjustment for j =1 to m and i = 1 to n.
If yj ≠ tj,
then w_ij(new) = w_ij(old) + α t_j x_i
b_j(new) = b_j(old) + α t_j
else,
𝑤𝑖j(𝑛𝑒𝑤) = 𝑤𝑖j(𝑜𝑙𝑑)
𝑏j(𝑛𝑒𝑤) = 𝑏j(𝑜𝑙𝑑)
Step 6: Train the network until there is no weight change. This is the stopping condition for
the network. If this condition is not met, then start again from Step 2.
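A minimal sketch of Steps 0-6 is given below (my own illustration, not part of the notes; the bipolar AND/OR training data, θ = 0.2 and α = 1 are assumptions made for the example).

```python
# Perceptron training sketch (multiple output classes), following Steps 0-6.
# The training data, theta and alpha values are illustrative assumptions.
import numpy as np

def activation(y_in, theta):
    """Bipolar step with fixed threshold theta."""
    if y_in > theta:
        return 1
    elif y_in < -theta:
        return -1
    return 0

def train_perceptron(X, T, alpha=1.0, theta=0.2, max_epochs=100):
    n, m = X.shape[1], T.shape[1]
    W = np.zeros((n, m))          # Step 0: initialize weights
    b = np.zeros(m)               # and biases
    for _ in range(max_epochs):   # Step 1
        changed = False
        for x, t in zip(X, T):    # Steps 2-3: each training pair s:t, x_i = s_i
            for j in range(m):    # Step 4: output response of each output unit
                y_in = b[j] + x @ W[:, j]
                y = activation(y_in, theta)
                if y != t[j]:     # Step 5: weight and bias adjustment
                    W[:, j] += alpha * t[j] * x
                    b[j] += alpha * t[j]
                    changed = True
        if not changed:           # Step 6: stop when there is no weight change
            break
    return W, b

# Example: two output units learning AND (column 0) and OR (column 1) on bipolar inputs.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])
T = np.array([[1, 1], [-1, 1], [-1, 1], [-1, -1]])
W, b = train_perceptron(X, T)
```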
Perceptron Network Testing Algorithm
Algorithm is -
Step 0: The initial weights are taken from the training algorithm (the final weights obtained during
training)
Step 1: For each input vector X to be classified, perform steps 2 -3.
Step 2: Set activation of the input unit.
Step 3: Obtain the response of the output unit.
y_inj = b_j + Σ_i x_i w_ij
y_j = f(y_inj) = 1 if y_inj > θ; 0 if −θ ≤ y_inj ≤ θ; −1 if y_inj < −θ
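Continuing the training sketch above (again an illustration, not the notes' own code), testing simply applies the final weights to a new input vector.

```python
# Perceptron testing sketch: classify an input vector with the trained weights.
# Reuses numpy, activation(), W and b from the training sketch above.
def classify(x, W, b, theta=0.2):
    y_in = b + x @ W              # net input for every output unit
    return np.array([activation(v, theta) for v in y_in])

print(classify(np.array([1, 1]), W, b))    # [1 1] for the AND/OR example above
```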
Multilayer Perceptron Networks
This network consists of a set of sensory units that constitute the input layer and one or more hidden layers of computation nodes.
The input signal passes through the network in the forward direction.
The multilayer perceptrons are used with supervised learning and have led to the successful
backpropagation algorithm.
In MLP networks there exists a non-linear activation function.
MLP has various layers of hidden neurons.
The hidden neurons enable the network to handle highly complex tasks.
The layers of the network are connected by synaptic weights.
Back-Propagation Network
Back propagation learning algorithm is applied to multilayer feed-forward networks
consisting of processing elements with continuous differentiable activation functions.
The networks associated with the back-propagation learning algorithm are called back-propagation networks (BPNs).
A back-propagation neural network is a multilayer, feed-forward neural network consisting of
an input layer, a hidden layer and an output layer.
The neurons present in the hidden and output layers have biases, which are the connections
from the units whose activation is always 1.
The bias terms also act as weights.
For a given set of training input-output pair, this algorithm provides a procedure for changing
the weights in a BPN to classify the given input patterns correctly.
The basic concept for this weight update algorithm is simply the gradient descent method.
The error is propagated back to the hidden units.
The aim of the neural network is to train the net to achieve a balance between the net's ability to respond and its ability to give reasonable responses to input that is similar, but not identical, to that used in training.
The training of the BPN is done in three stages - the feed-forward of the input training
pattern, the calculation and back-propagation of the error, and updation of weights.
During the back propagation phase of learning, signals are sent in the reverse direction.
The inputs sent to the BPN and the output obtained from the net could be either binary (0, 1) or
bipolar (-1, +1).
The activation function could be any function which increases monotonically and is also
differentiable.
Architecture
[Figure: BPN architecture and training flowchart. For each training pair (x, t): receive the input signal x_i and transmit it to the hidden units (j = 1 to p, i = 1 to n); compute the hidden- and output-layer activations (k = 1 to m); back-propagate the error and update the weights; stop when the specified number of epochs is reached or when t_k = y_k.]
Training Algorithm
Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.
Feedforward Phase 1
Step 3: Each input unit receives input signal x_i and sends it to the hidden units (i = 1 to n).
Step 4: Each hidden unit z_j (j = 1 to p) sums its weighted input signals to calculate the net input:
z_inj = v_0j + Σ_i x_i v_ij
Calculate the output of the hidden unit by applying its activation function over z_inj, z_j = f(z_inj), and send the output signal from the hidden unit to the input of the output layer units.
Step 5: For each output unit y_k (k = 1 to m), calculate the net input:
y_ink = w_0k + Σ_j z_j w_jk
and apply the activation function to compute the output signal y_k = f(y_ink).
Back-propagation of error Phase 2
Step 6: Each output unit y_k (k = 1 to m) receives a target pattern corresponding to the input training pattern and computes the error correction term:
δ_k = (t_k − y_k) f′(y_ink)
The derivative f′(y_ink) can be calculated as in the activation function section. On the basis of the calculated error correction term, update the change in weights and bias:
Δw_jk = α δ_k z_j;  Δw_0k = α δ_k
Also, send δ_k to the hidden layer backwards.
Step 7: Each hidden unit (z_j, j = 1 to p) sums its delta inputs from the output layer units:
δ_inj = Σ_k δ_k w_jk
The term δ_inj gets multiplied with the derivative of f(z_inj) to calculate the error term:
δ_j = δ_inj f′(z_inj)
On the basis of δ_j, update the change in weights and bias:
Δv_ij = α δ_j x_i;  Δv_0j = α δ_j
Weight and bias updation Phase 3
Step 8: Each output unit (y_k, k = 1 to m) updates the bias and weights:
w_jk(new) = w_jk(old) + Δw_jk;  w_0k(new) = w_0k(old) + Δw_0k
Each hidden unit (z_j, j = 1 to p) updates its bias and weights:
v_ij(new) = v_ij(old) + Δv_ij;  v_0j(new) = v_0j(old) + Δv_0j
Step 9: Check for the stopping condition. The stopping condition may be certain number
of epochs reached or when the actual output equals the target output.
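A compact NumPy sketch of these steps for a single hidden layer is given below (my own illustration, not from the notes; the bipolar sigmoid activation, the XOR training data, the layer sizes and the learning rate are all assumptions).

```python
# BPN training sketch: one hidden layer, bipolar sigmoid activation.
# Network sizes, data (XOR) and learning rate are illustrative assumptions.
import numpy as np

def f(x):                 # bipolar sigmoid
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def f_prime(fx):          # derivative written in terms of f(x)
    return 0.5 * (1.0 + fx) * (1.0 - fx)

rng = np.random.default_rng(0)
n, p, m, alpha = 2, 4, 1, 0.2                           # Step 0: sizes, learning rate
V = rng.uniform(-0.5, 0.5, (n, p)); v0 = np.zeros(p)    # input-to-hidden weights/bias
W = rng.uniform(-0.5, 0.5, (p, m)); w0 = np.zeros(m)    # hidden-to-output weights/bias

X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]])      # bipolar XOR inputs
T = np.array([[-1], [1], [1], [-1]])                    # bipolar XOR targets

for epoch in range(5000):                        # Step 1
    for x, t in zip(X, T):                       # Step 2
        z = f(v0 + x @ V)                        # Steps 3-4: hidden activations
        y = f(w0 + z @ W)                        # Step 5: output activations
        delta_k = (t - y) * f_prime(y)           # Step 6: output error term
        delta_j = (delta_k @ W.T) * f_prime(z)   # Step 7: hidden error term
        W += alpha * np.outer(z, delta_k); w0 += alpha * delta_k   # Step 8: updates
        V += alpha * np.outer(x, delta_j); v0 += alpha * delta_j
```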
Radial Basis Function Networks
In an RBF network, each hidden (RBF) neuron measures the distance between the input to the network and the so-called position of the neuron (its center). This distance is inserted into a radial activation function, which computes the activation of the neuron.
The center of an RBF neuron is the point in the input space where the RBF neuron is located. In general, the
closer the input vector is to the center vector of an RBF neuron, the higher is its activation.
The RBF neurons have a propagation function that determines the distance between the center of a neuron and
the input vector y. This distance represents the network input. Then the network input is sent through a radial
basis function which returns the activation or the output of the neuron.
Architecture
In this diagram, the input layer has J1 nodes, the hidden layer has J2 nodes, and the output layer has J3 neurons. φ_0(x) = 1 corresponds to the bias in the output layer, while φ_i(x) = φ(||x − c_i||), c_i being the center of the ith node and φ(·) an RBF.
The radial basis function approach was given by Powell in 1987. It introduces a set of N basis functions, one for each data point, which take the form φ(||x − x_n||), where x_n is the nth data point; the network output is then a weighted sum of these basis functions.
The RBFN conventionally uses the Gaussian function as the RBF. The Gaussian is compact
and positive.
It is motivated from the point of view of kernel regression and kernel density estimation.
In fitting data in which there is normally distributed noise with the inputs, the Gaussian is the
optimal basis function.
The thin-plate spline function is another popular RBF for universal approximation.
Unlike the Gaussian, the use of the thin-plate spline is motivated from a curve-fitting
perspective.
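For reference, the two RBFs mentioned above can be written as small functions (a sketch of my own; the 2σ² scaling of the Gaussian is one common convention).

```python
# Common radial basis functions; r = ||x - c|| is the distance to the center.
import numpy as np

def gaussian(r, sigma=1.0):
    # phi(r) = exp(-r^2 / (2 sigma^2)); compact and positive
    return np.exp(-r**2 / (2.0 * sigma**2))

def thin_plate_spline(r):
    # phi(r) = r^2 * log(r), with phi(0) taken as 0
    return np.where(r > 0, r**2 * np.log(np.maximum(r, 1e-12)), 0.0)
```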
Training Algorithm
Step 0: Set the weights to small random values.
Step 1: Perform Steps 2-8 when stopping condition is false.
Step 2: Perform Steps 3-7 for each training pair.
Step 3: Each input unit receives input signal x_i and sends it to the hidden unit (i = 1 to n).
Step 4: Calculate the radial basis function.
Step 5: Select the centers for the radial basis function. The centers are selected from the set of
input vectors.
Step 6: Calculate the output from the hidden layer unit:
v_i(x) = exp(−Σ_j (x_j − x̂_ji)² / σ_i²)
where x̂_ji is the center of the ith RBF unit for the jth input variable and σ_i is the width of the ith RBF unit.
Step 7: Calculate the output of the network:
y = Σ_i w_i v_i(x) + w_0
where w_i is the weight between the ith RBF unit and the output node and w_0 is the bias term.
Step 8: Calculate the error and test for the stopping condition.
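A minimal sketch of this procedure is given below (my own illustration; the centers are a random subset of the input vectors as in Step 5, the width heuristic is an assumption, and the output weights are fitted by linear least squares, which stands in for an iterative weight update).

```python
# RBFN sketch: Gaussian hidden layer with centers chosen from the input vectors,
# output weights fitted by linear least squares (a stand-in for iterative training).
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    # Phi[q, i] = exp(-||x_q - c_i||^2 / (2 sigma^2)), plus a bias column of ones
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    Phi = np.exp(-d2 / (2.0 * sigma**2))
    return np.hstack([Phi, np.ones((X.shape[0], 1))])

def train_rbfn(X, T, num_centers=10, rng=np.random.default_rng(0)):
    idx = rng.choice(len(X), size=min(num_centers, len(X)), replace=False)
    centers = X[idx]                               # Step 5: centers from the input set
    sigma = np.ptp(X) / np.sqrt(2 * len(centers))  # a common width heuristic (assumed)
    Phi = rbf_design_matrix(X, centers, sigma)     # Step 6: hidden-layer outputs
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)    # output weights by least squares
    return centers, sigma, W

def predict_rbfn(X, centers, sigma, W):
    return rbf_design_matrix(X, centers, sigma) @ W    # Step 7: network output

# Example: fit y = sin(x) on [0, 2*pi]
X = np.linspace(0, 2 * np.pi, 50)[:, None]
T = np.sin(X)
centers, sigma, W = train_rbfn(X, T)
```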
Comparison with the MLP with Backpropagation
· Training in RBFNs is an order of magnitude faster than training of a comparably sized feed-
forward network with the backpropagation algorithm.
· A better generalization is achieved in RBFNs.
· RBFNs have very fast convergence properties compared with the conventional multilayer
networks with sigmoid transfer functions.
· There is no local minima problem.
· The RBF model can be interpreted as a fuzzy connectionist model, as the RBFs can be
considered as membership functions.
The hidden layer has a much clearer interpretation than the MLP with the backpropagation
algorithm. Thus, it is easier to explain what an RBF network has learned than its counterpart
MLP with the backpropagation algorithm.
Limitations of RBF
Finding the appropriate number of hidden nodes is difficult. It might be necessary to apply unsupervised learning first to find the number of clusters; the number of hidden nodes is then set equal to this number. Too many, or too few, hidden nodes will prevent the RBFN from properly approximating the data.
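For instance, one could cluster the input vectors first and use one hidden node per cluster center (a sketch assuming scikit-learn is available; the choice of k is still left to the designer).

```python
# Choosing RBF centers by unsupervised clustering (k-means) before training.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_centers(X, k):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return km.cluster_centers_          # one RBF hidden node per cluster center

X = np.random.default_rng(0).normal(size=(200, 2))   # illustrative input data
centers = kmeans_centers(X, k=8)        # the hidden layer then gets 8 RBF nodes
```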