
Soft Computing

Unit - 2
Learning Processes

According to Mendel and McClaren (1970) –

"Learning is a process by which the free parameters of a neural network are adapted through a process of stimulation by the environment in which the network is embedded. The type of learning is determined by the manner in which the parameter changes take place."

Thus we can say that a neural network learns from its environment and improves its performance through learning.


A learning process involves the following sequence of events:

 The neural network is stimulated or motivated by an environment.
 The neural network undergoes changes in its free parameters as a result of this stimulation.
 The neural network responds in a new way to the environment because of the changes that have occurred in its internal structure.

A prescribed set of well-defined rules for the solution of a learning problem is called a learning algorithm.


 Supervised learning

 It is also known as learning with a teacher.


 In supervised learning we assume that at each instant of time when the input is applied, the
desired response of the system is provided by the teacher.
 The distance between the actual and the desired response serves as an error measure and is
used to correct network parameters externally.
 In learning classifications of input patterns or situations with known responses, the error can be
used to modify weights so that the error decreases.
 A set of input and output patterns called a training set is required for this learning mode.
 Unsupervised learning

 It is sometimes called learning without a teacher or learning without supervision.


 In learning without supervision, the desired response is not known. Thus, explicit error information cannot be used to improve network behavior.
 Since no information is available about the correctness or incorrectness of responses, learning must somehow be accomplished based on observations of responses to inputs about which we have marginal or no knowledge.
 No feedback is applied from the environment to indicate what the output should be or whether it is correct. The network itself discovers patterns, regularities, and features/categories in the input data, and relations of the input data to the output.
 Exact clusters are formed by discovering similarities and dissimilarities, which is why this mode is called self-organizing.
 Reinforcement learning

 It is similar to supervised learning.

 Learning based on critic information is called reinforcement learning, and the feedback sent is called the reinforcement signal.

 The network receives some feedback from the environment.

 The feedback is only evaluative, not instructive.

 The external reinforcement signals are processed in the critic signal generator, and the obtained critic signals are sent to the ANN so that the weights can be adjusted properly to get better critic feedback in the future.
 Hebbian Learning
 Hebb's postulate of learning is the oldest and most famous of all learning rules.
 It is named in honor of the neuropsychologist Donald Hebb (1949).
 According to Hebb – "When an axon of cell A is near enough to excite a cell B and repeatedly or persistently takes part in firing it, some growth process or metabolic changes take place in one or both cells such that A's efficiency, as one of the cells firing B, is increased."
 Hebb proposed this change as a basis of associative learning.
According to the Hebb rule –

The weight vector is found to increase proportionally to the product of the input and the learning signal (the learning signal is equal to the neuron's output).

 If two interconnected neurons are on simultaneously, then the weights associated with these neurons can be increased by the modification made in their synaptic gap or strength.

 The weight update in the Hebb rule is given by –

wi(new) = wi(old) + xi y

 This rule is more suited to bipolar data than binary data. In the case of binary data, the weight updation equation cannot distinguish the following two conditions (in both of them the product xi y is zero, so no weight change occurs) –

o A training pair in which an input unit is on and the target value is off.

o A training pair in which both the input unit and the target value are off.
 Perceptron Learning Rule

 The learning signal is the difference between the desired and actual neuron's response.
The perceptron learning rule states that for a finite number N of input training vectors x(n), n = 1 to N, each with an associated target value t(n), n = 1 to N, which is +1 or -1, and an activation function y = f(yin), the weight updation is given by –

if y ≠ t, then w(new) = w(old) + tx
if y = t, then there is no change in the weights
 Delta Learning Rule

 It is also called Widrow-Hoff Rule or Least Mean Square (LMS) Rule.


 It was given by Widrow and Hoff in 1960.
 It is valid only for continuous activation functions and in the supervised training mode.
 This rule is stated as-
“The adjustment made to a synaptic weight of a neuron is proportional to the product of
the error signal and the input signal of the synapse.”
 The learning signal for this rule is called delta.
 It assumes that the error signal is directly measurable.
 The aim of this rule is to minimize the error over all training patterns.
 Delta rule can be applied for single output unit and several output units.
The delta rule for a single output unit is given by –

Δwi = α (t − yin) xi

where xi : activation of the ith input unit
yin : net input to the output unit
t : target output
α : learning rate

The delta rule for several output units is given by –

Δwij = α (tj − yinj) xi

Here the weights are adjusted from the ith input unit to the jth output unit.
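
A minimal sketch of one such update step in Python (all names are illustrative; the identity activation is assumed, which is the classical LMS setting where the learning signal is t − yin):

```python
import numpy as np

def delta_rule_step(w, x, t, alpha=0.1):
    """One LMS/delta-rule update for several output units.

    w: weight matrix, shape (n_inputs, n_outputs)
    x: input vector, shape (n_inputs,)
    t: target vector, shape (n_outputs,)
    """
    y_in = x @ w                        # net input of each output unit
    w += alpha * np.outer(x, t - y_in)  # delta w_ij = alpha (t_j - y_inj) x_i
    return w
```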
 Competitive Learning
 In this learning, the output neurons of a neural network compete among themselves to become
active.
 The basic idea is that there are a set of neurons that are similar in all aspects except for some
randomly distributed synaptic weights, and therefore respond differently to a given set of
input patterns.
 A limit is imposed on the strength of the neurons.
 This rule has a mechanism that permits the neurons to compete for the right to respond to a
given subset of inputs, such that only one output neuron, or only one neuron per group, is
active at a time.
 The winner neuron is called the "winner-takes-all" neuron.
 Thus, in this method, those neurons which respond strongly to the input stimuli have their
weights updated.
Therefore, the change is given as –

Δwij = α (xi − wij) if neuron j wins the competition
Δwij = 0 if neuron j loses the competition

where α is the learning rate.
 This rule is suited for unsupervised network training.
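
Below is a hedged sketch of one winner-takes-all step. The slides leave the competition mechanism open, so the common choice of selecting the neuron whose weight vector is closest to the input is assumed here:

```python
import numpy as np

def competitive_step(w, x, alpha=0.1):
    """One winner-takes-all update: only the neuron whose weight vector
    is closest to the input moves toward the input.

    w: weight matrix, shape (n_output_neurons, n_inputs)
    x: input vector, shape (n_inputs,)
    """
    winner = np.argmin(np.linalg.norm(w - x, axis=1))  # the competition
    w[winner] += alpha * (x - w[winner])               # delta w = alpha (x - w)
    return winner
```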
 Stochastic Learning
 In this method the weights are adjusted in a probabilistic fashion.
 Simulated annealing, which is a learning mechanism employed by Boltzmann and Cauchy machines, is an example of stochastic learning.
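
To illustrate only the probabilistic acceptance idea behind simulated annealing (not the full Boltzmann-machine training procedure), a minimal sketch:

```python
import math
import random

def accept_weight_change(delta_error, temperature):
    """Metropolis acceptance test: a candidate weight change that lowers
    the error is always kept; one that raises it is kept only with
    probability exp(-delta/T), so high temperatures explore the weight
    space and low temperatures settle into a solution."""
    if delta_error <= 0:
        return True
    return random.random() < math.exp(-delta_error / temperature)
```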
Hebb Network
 The Hebb law states that if two neurons are activated simultaneously, then the strength of the
connection between them should be increased.
 The Hebb net consists of a bias which acts exactly as a weight on a connection from a unit
whose activation is always 1.
 If the bias is increased, it increases the net input of the unit.
 The weight update in Hebb rule is given by
wi (new) = wi (old)+ xi y
 Hebb network is suited more for bipolar data. If binary data is used, the weight updation
formula cannot distinguish two conditions namely:
o A training pair in which an input unit is “on” and the target value is “off”.
o A training pair in which both the input unit and the target value are "off".
Training Algorithm - The training algorithm is used for the calculation and adjustment of
weights.

Step 1: First initialize the weights.


Step 2: Perform steps 3-5 for each input training vector and target
output pair, s : t
Step 3: Input unit activations are set. (Generally, the activation function of the input layer is the identity function: xi = si for i = 1 to n.)
Step 4: The output unit activation is set: y = t.
Step 5: Weight and bias adjustments are performed:

wi(new) = wi(old) + xi y
b(new) = b(old) + y

These equations can also be expressed in vector form as:

w(new) = w(old) + xy
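
As a worked example of Steps 1-5, the following sketch trains a Hebb net on the bipolar AND function (the data and variable names are illustrative, not from the slides):

```python
import numpy as np

# Bipolar AND: inputs and targets in {-1, +1}.
X = np.array([[1, 1], [1, -1], [-1, 1], [-1, -1]], dtype=float)
T = np.array([1, -1, -1, -1], dtype=float)

w = np.zeros(2)            # Step 1: initialize the weights
b = 0.0
for x, t in zip(X, T):     # Steps 2-3: one pass over the training pairs
    y = t                  # Step 4: output activation is set to the target
    w += x * y             # Step 5: wi(new) = wi(old) + xi y
    b += y                 #         b(new)  = b(old)  + y

print(w, b)  # -> [2. 2.] -2.0; sign(x @ w + b) reproduces bipolar AND
```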
Major class of Neural Networks

 Perceptron networks

 Perceptron networks come under single-layer feed-forward networks and are also called
simple perceptrons.
 Various types of perceptrons were designed by Rosenblatt (1962) and Minsky-Papert (1969,
1988).
The key points to be noted in a perceptron network are:
• The perceptron network consists of three units, namely, sensory unit (input unit), associator
unit (hidden unit), and response unit (output unit).
• The sensory units are connected to associator units with fixed weights having values 1, 0 or -1, which are assigned at random.
• The binary activation function is used in sensory unit and associator unit.
• The response unit has an activation of 1, 0 or -1. The binary step with fixed threshold θ is used as the activation for the response unit. The output signals that are sent from the associator unit to the response unit are only binary.
• The output of the perceptron network is given by
y = f(yin)
where f(yin) is the activation function, defined as

f(yin) = 1 if yin > θ
         0 if −θ ≤ yin ≤ θ
        −1 if yin < −θ
 The perceptron learning rule is used in the weight updation between the associator unit and the response unit. For each training input, the net will calculate the response and determine whether or not an error has occurred.
 The error calculation is based on the comparison of the values of the targets with those of the calculated outputs.
 The weights on the connections from the units that send the nonzero signal will get adjusted suitably.
 The weights will be adjusted on the basis of the learning rule if an error has occurred for a particular training pattern, i.e.,

wi(new) = wi(old) + α t xi
b(new) = b(old) + α t

 If no error occurs, there is no weight updation and the training process may be stopped.


Perceptron Training Algorithm
Single Output Class
Algorithm is -
Step 0: Initialize the weights, bias and learning rate α , where 0< α ≤ 1
(for simplicity α is set to 1).
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each training pair s:t.
Step 3: The input layer containing input units is applied with the identity activation function: xi = si
Step 4: Calculate the output of the network. To do so, first obtain the net input:

yin = b + Σ(i=1 to n) xi wi

where n is the number of input neurons in the input layer.
Apply the activation over the net input calculated to obtain the output –

y = f(yin) = 1 if yin > θ
             0 if −θ ≤ yin ≤ θ
            −1 if yin < −θ
Step 5: Weight and bias adjustment: Compare the value of the actual (calculated) output
and desired (target) output.
If y ≠ t,
then wi(new) = wi(old) + α t xi
b(new) = b(old) + α t
else,
wi(new) = wi(old)
b(new) = b(old)
Step 6: Train the network until there is no weight change. This is the stopping condition for
the network. If this condition is not met, then start again from Step 2.
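
A minimal sketch of Steps 0-6 for a single output unit (the bipolar AND data and all names are illustrative):

```python
import numpy as np

def activation(y_in, theta):
    """Step 4 activation: 1 / 0 / -1 with fixed threshold theta."""
    if y_in > theta:
        return 1
    if y_in < -theta:
        return -1
    return 0

def train_perceptron(X, T, alpha=1.0, theta=0.2, max_epochs=100):
    w, b = np.zeros(X.shape[1]), 0.0           # Step 0: initialize
    for _ in range(max_epochs):                # Step 1: loop until stable
        changed = False
        for x, t in zip(X, T):                 # Steps 2-3: each pair s:t
            y = activation(b + x @ w, theta)   # Step 4: net input + activation
            if y != t:                         # Step 5: update only on error
                w += alpha * t * x
                b += alpha * t
                changed = True
        if not changed:                        # Step 6: no weight change -> stop
            break
    return w, b

# Bipolar AND example; testing reuses the same activation function.
X = np.array([[1., 1.], [1., -1.], [-1., 1.], [-1., -1.]])
T = np.array([1, -1, -1, -1])
w, b = train_perceptron(X, T)
```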
Perceptron Training Algorithm
Multiple Output Classes
Algorithm is -
Step 0: Initialize the weights, bias and learning rate α , where 0< α ≤ 1
(for simplicity α is set to 1).
Step 1: Perform Steps 2-6 until the final stopping condition is false.
Step 2: Perform Steps 3-5 for each binary or bipolar training vector pair s:t.
Step 3: The input layer containing input units is applied with identity activation functions: 𝑥𝑖=𝑠𝑖
Step 4: Calculate the output response of each output unit j = 1 to m. The net input is calculated as:

yinj = bj + Σ(i=1 to n) xi wij

where n is the number of input neurons in the input layer.
Apply the activation over the net input calculated to obtain the output –

yj = f(yinj) = 1 if yinj > θ
               0 if −θ ≤ yinj ≤ θ
              −1 if yinj < −θ
Step 5: Weights and bias adjustment for j =1 to m and i = 1 to n.
If yj ≠ tj,
then wij(new) = wij(old) + α tj xi
bj(new) = bj(old) + α tj
else,
wij(new) = wij(old)
bj(new) = bj(old)
Step 6: Train the network until there is no weight change. This is the stopping condition for
the network. If this condition is not met, then start again from Step 2.
Perceptron Network Testing Algorithm

Algorithm is -
Step 0: The initial weights are taken from the training algorithm (the final weights obtained during
training)
Step 1: For each input vector X to be classified, perform steps 2 -3.
Step 2: Set the activations of the input units.
Step 3: Obtain the response of the output unit:

yin = b + Σ(i=1 to n) xi wi

y = f(yin) = 1 if yin > θ
             0 if −θ ≤ yin ≤ θ
            −1 if yin < −θ
Multilayer Perceptron Networks

 This network consists of a set of sensory units that constitute the input layer and one or more hidden layers of computation nodes.
 The input signal passes through the network in the forward direction.
 The multilayer perceptrons are used with supervised learning and have led to the successful
backpropagation algorithm.
 In MLP networks there exists a non-linear activation function.
 An MLP has one or more layers of hidden neurons.
 The hidden neurons enable the network to learn highly complex tasks.
 The layers of the network are connected by synaptic weights.
Back-Propagation Network
 Back propagation learning algorithm is applied to multilayer feed-forward networks
consisting of processing elements with continuous differentiable activation functions.
 The networks associated with the back-propagation learning algorithm are called back-propagation networks (BPNs).
 A back-propagation neural network is a multilayer, feed-forward neural network consisting of
an input layer, a hidden layer and an output layer.
 The neurons present in the hidden and output layers have biases, which are the connections
from the units whose activation is always 1.
 The bias terms also act as weights.
 For a given set of training input-output pair, this algorithm provides a procedure for changing
the weights in a BPN to classify the given input patterns correctly.
 The basic concept for this weight update algorithm is simply the gradient descent method.
 Error is propagated back to the hidden unit.
 The aim of the neural network is to train the net to achieve a balance between the net's ability to respond correctly to the input patterns used for training and its ability to give reasonable responses to input that is similar, but not identical, to that used in training.
 The training of the BPN is done in three stages - the feed-forward of the input training
pattern, the calculation and back-propagation of the error, and updation of weights.
 During the back propagation phase of learning, signals are sent in the reverse direction.
 The inputs sent to the BPN and the output obtained from the net could be either binary (0, 1) or
bipolar (-1, +1).
 The activation function could be any function which increases monotonically and is also
differentiable.
Architecture
[Figure: architecture of a back-propagation network – an input layer (units xi), one hidden layer (units zj) and an output layer (units yk), with biases on the hidden and output units.]

Flowchart for the training process:

START → initialize the weights to some random values → for each training pair (x, t): receive the input signal xi and transmit it to the hidden units; in the hidden units, calculate the outputs zj (j = 1 to p, i = 1 to n); send zj to the output-layer units; calculate the output signals yk from the output layer (k = 1 to m); the target pair tk enters; compute the error correction factor δk and find the weight and bias correction terms; calculate the hidden-layer error term δj and compute the corresponding changes in weights and biases; update the weights and biases on the output units, then on the hidden units → if the specified number of epochs is reached or tk = yk, STOP; otherwise repeat with the next training pair.
Training Algorithm

Step 0: Initialize weights and learning rate (take some small random values).
Step 1: Perform Steps 2-9 when stopping condition is false.
Step 2: Perform Steps 3-8 for each training pair.
Feedforward (Phase I)
Step 3: Each input unit receives input signal xi and sends it to the hidden units (i = 1 to n).
Step 4: Each hidden unit zj (j = 1 to p) sums its weighted input signals to calculate the net input:

zinj = v0j + Σ(i=1 to n) xi vij

where vij is the weight from input unit i to hidden unit j and v0j is the bias of hidden unit j. Calculate the output of the hidden unit by applying its activation function over zinj:

zj = f(zinj)

and send the output signal from the hidden unit to the input of the output-layer units.
Step 5: For each output unit yk (k = 1 to m), calculate the net input:

yink = w0k + Σ(j=1 to p) zj wjk

and apply the activation function to compute the output signal:

yk = f(yink)

Back-propagation of error (Phase II)


Step 6: Each output unit yk (k = 1 to m) receives a target pattern corresponding to the input training pattern and computes the error correction term:

δk = (tk − yk) f′(yink)

The derivative f′(yink) can be calculated as in the activation function section. On the basis of the calculated error correction term, update the change in weights and bias:

Δwjk = α δk zj ;  Δw0k = α δk

Also, send δk to the hidden layer backwards.


Step 7: Each hidden unit zj (j = 1 to p) sums its delta inputs from the output units:

δinj = Σ(k=1 to m) δk wjk

The term δinj gets multiplied by the derivative of f(zinj) to calculate the error term:

δj = δinj f′(zinj)

The derivative f′(zinj) can be calculated as in the activation function section, depending on whether the binary or bipolar sigmoidal function is used. On the basis of the calculated δj, update the change in weights and bias:

Δvij = α δj xi ;  Δv0j = α δj
Weight and bias updation (Phase III):

Step 8: Each output unit (yk, k = 1 to m) updates its bias and weights:

wjk(new) = wjk(old) + Δwjk ;  w0k(new) = w0k(old) + Δw0k

Each hidden unit (zj, j = 1 to p) updates its bias and weights:

vij(new) = vij(old) + Δvij ;  v0j(new) = v0j(old) + Δv0j

Step 9: Check for the stopping condition. The stopping condition may be a certain number of epochs being reached or the actual output equaling the target output.
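
A compact sketch of the three phases for a small n-p-1 BPN with the binary sigmoid, trained on XOR (layer sizes, learning rate, epoch count and the data are illustrative; convergence depends on the random start):

```python
import numpy as np

rng = np.random.default_rng(0)
f  = lambda x: 1.0 / (1.0 + np.exp(-x))   # binary sigmoid
df = lambda fx: fx * (1.0 - fx)           # f'(x) written in terms of f(x)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])  # XOR inputs
T = np.array([[0.], [1.], [1.], [0.]])                  # XOR targets
n, p, m, alpha = 2, 4, 1, 0.5
v, v0 = rng.uniform(-0.5, 0.5, (n, p)), np.zeros(p)     # input -> hidden
w, w0 = rng.uniform(-0.5, 0.5, (p, m)), np.zeros(m)     # hidden -> output

for epoch in range(10000):                  # Step 1: repeat until stopping
    for x, t in zip(X, T):                  # Step 2: each training pair
        z = f(v0 + x @ v)                   # Steps 3-4: hidden outputs zj
        y = f(w0 + z @ w)                   # Step 5: network output yk
        delta_k = (t - y) * df(y)           # Step 6: output error term
        delta_j = (delta_k @ w.T) * df(z)   # Step 7: hidden error term
        w  += alpha * np.outer(z, delta_k)  # Step 8: weight/bias updates
        w0 += alpha * delta_k
        v  += alpha * np.outer(x, delta_j)
        v0 += alpha * delta_j
```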
Radial Basis Function Networks

 Radial Basis Function (RBF) networks are a paradigm of neural networks.


 Like perceptrons, the RBF networks are built in layers.
 The RBF networks are - like MLPs - universal function approximators.
 Radial basis function methods have their origins in techniques for performing exact interpolation of a set of
data points in a multi-dimensional space.
Components and Structure of RBF Network:
The RBF network has exactly three layers, i.e. only one single layer of hidden neurons.
 RBF networks have a feed-forward structure and their layers are completely linked.
 The input layer again does not participate in information processing.
 In an RBF network the output neurons only contain the identity as activation function and a weighted sum as propagation function. Thus, they do little more than adding up all input values and returning the sum.
 Hidden neurons are also called RBF neurons and the layer in which they are located is referred to as RBF
layer.
 As propagation function, each hidden neuron calculates a norm that represents the distance between the input to the network and the so-called position of the neuron (its center). This is inserted into a radial activation function which calculates and outputs the activation of the neuron.

 The center of an RBF neuron is the point in the input space where the RBF neuron is located. In general, the closer the input vector is to the center vector of an RBF neuron, the higher is its activation.

 The RBF neurons have a propagation function that determines the distance between the center of a neuron and the input vector. This distance represents the network input. The network input is then sent through a radial basis function which returns the activation, or the output, of the neuron.
Architecture

[Figure: RBF network architecture – J1 input nodes, J2 hidden (RBF) nodes and J3 output neurons.]

In this diagram, the input layer has J1 nodes, the hidden layer has J2 nodes and the output layer has J3 neurons. φ0(x) = 1 corresponds to the bias in the output layer, while φi(x) = φ(‖x − ci‖), ci being the center of the ith node and φ(·) an RBF.

The radial basis function approach was given by Powell in 1987. It introduces a set of N basis functions, one for each data point, which take the form

φ(‖x − xn‖)

where φ(·) is some non-linear function.

Thus the nth such function depends on the distance ‖x − xn‖, usually taken to be Euclidean, between x and xn.
A number of functions can be used as the RBF, for example the Gaussian φ(r) = exp(−r²/(2σ²)), the thin-plate spline φ(r) = r² log r, and the logistic function φ(r) = 1/(1 + exp(r²/σ² − θ)),
where r > 0 denotes the distance from a data point x to a center c, the parameter σ is used to control the smoothness of the interpolating function, and θ is an adjustable bias.

 The RBFN conventionally uses the Gaussian function as the RBF. The Gaussian is compact
and positive.
 It is motivated from the point of view of kernel regression and kernel density estimation.
 When fitting data in which there is normally distributed noise on the inputs, the Gaussian is the optimal basis function.
 The thin-plate spline function is another popular RBF for universal approximation.
 Unlike the Gaussian, the use of the thin-plate spline is motivated from a curve-fitting
perspective.
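
The two RBFs discussed above, written out as a small sketch (r is the distance from the input to the center, σ the width parameter):

```python
import numpy as np

def gaussian_rbf(r, sigma=1.0):
    """Gaussian RBF: compact, positive, optimal under Gaussian input noise."""
    return np.exp(-r**2 / (2.0 * sigma**2))

def thin_plate_spline_rbf(r):
    """Thin-plate spline RBF: r^2 log r (taken as 0 at r = 0)."""
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, r**2 * np.log(np.where(r > 0, r, 1.0)), 0.0)
```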
Training Algorithm
Step 0: Set the weights to small random values.
Step 1: Perform Steps 2-8 when stopping condition is false.
Step 2: Perform Steps 3-7 for each training pair.
Step 3: Each input unit receives input signal xi and sends it to the hidden units (i = 1 to n).
Step 4: Calculate the radial basis function.
Step 5: Select the centers for the radial basis function. The centers are selected from the set of
input vectors.
Step 6: Calculate the output from the hidden-layer unit:

vi(xi) = exp( − Σ(j=1 to r) (xji − x̂ji)² / σi² )

where x̂ji is the center of the ith RBF unit for the jth input variable; σi is the width of the ith RBF unit; xji is the jth variable of the input pattern.


Step 7: Calculate the output of the neural network:

ynm = Σ(i=1 to k) wim vi(xi) + w0

where k is the number of hidden-layer nodes (RBF units);
ynm is the output value of the mth node in the output layer for the nth incoming pattern;
wim is the weight between the ith RBF unit and the mth output node;
w0 is the bias term at the mth output node.

Step 8: Calculate the error and test for the stopping condition.
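
A minimal sketch of Steps 0-8 under common simplifying assumptions: the centers are drawn at random from the input vectors (Step 5), the hidden layer is Gaussian with a shared width (Step 6), and the output weights are solved in one shot by linear least squares instead of the iterative error-driven update implied by Step 8:

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    """Gaussian hidden-layer outputs plus a leading bias column."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    H = np.exp(-d**2 / (2.0 * sigma**2))
    return np.hstack([np.ones((len(X), 1)), H])

def train_rbf(X, T, k=10, sigma=1.0, seed=0):
    # k (number of RBF units) must not exceed the number of training vectors.
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # Step 5
    H = rbf_design_matrix(X, centers, sigma)                 # Step 6
    W, *_ = np.linalg.lstsq(H, T, rcond=None)                # Steps 7-8
    return centers, W

def rbf_predict(X, centers, W, sigma=1.0):
    return rbf_design_matrix(X, centers, sigma) @ W
```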
Comparison with an MLP with backpropagation
· Training in RBFNs is an order of magnitude faster than training of a comparably sized feed-
forward network with the backpropagation algorithm.
· A better generalization is achieved in RBFNs.
· RBFNs have very fast convergence properties compared with the conventional multilayer
networks with sigmoid transfer functions.
· There is no local minima problem.
· The RBF model can be interpreted as a fuzzy connectionist model, as the RBFs can be
considered as membership functions.
· The hidden layer has a much clearer interpretation than in the MLP with the backpropagation algorithm. Thus, it is easier to explain what an RBF network has learned than its counterpart MLP with the backpropagation algorithm.
Limitations of RBF
Finding the appropriate number of hidden nodes is difficult. Unsupervised learning might need to be applied first to find the number of clusters; the number of hidden nodes is then set equal to this number. Too many or too few hidden nodes will prevent the RBFN from properly approximating the data.
