ANN Models
MLP learning algorithm
The learning rule for the multilayer perceptron is known
as "the generalised delta rule" or the "backpropagation
rule"
MLP learning algorithm (cont’d)
New weight = old weight + a change calculated from the squared error
Error = difference between the desired output and the actual output
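As a rough sketch of this idea in Python (names and values are illustrative, not from the unit notes):

# Sketch only: new weight = old weight + gain * error term * input.
def update_weight(old_weight, eta, delta, x):
    return old_weight + eta * delta * x

print(update_weight(0.5, 0.1, 0.2, 1.0))  # 0.52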
Example (cont’d)
Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron activation function. The signal e is the adder output signal, and y = f(e) is the output signal of the nonlinear element. The signal y is also the output signal of the neuron.
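A minimal Python sketch of such a two-unit neuron, assuming a sigmoid activation function (the slides leave f unspecified):

import math

def neuron(weights, inputs):
    # First unit: adder, the weighted sum of the input signals.
    e = sum(w * x for w, x in zip(weights, inputs))
    # Second unit: nonlinear activation function f(e); sigmoid assumed here.
    y = 1.0 / (1.0 + math.exp(-e))
    return y  # y = f(e) is also the output signal of the neuron

print(neuron([0.4, -0.7], [1.0, 0.5]))  # e = 0.05, y = f(0.05)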
Example (cont’d)
To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with the corresponding target (desired output) z. The pictures below illustrate how the signal propagates through the network. The symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. The symbol yn represents the output signal of neuron n.
Example (cont’d)
Propagation of signals through the hidden layer. The symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer.
Example (cont’d)
Propagation of signals through the output layer.
Example (cont’d)
It is impossible to compute the error signal for internal neurons directly, because the desired output values of these neurons are unknown. The idea is to propagate the error signal δ (computed in a single teaching step) back to all neurons whose output signals were inputs for the neuron in question.
Example (cont’d)
The weight coefficients wmn used to propagate errors back are equal to those used during the computation of the output value; only the direction of data flow is changed.
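A small sketch of this backward pass, assuming a list-of-lists weight layout (illustrative, not the example's actual numbers):

# w[m][n]: weight between neuron m and neuron n of the next layer,
# reused here in the backward direction to collect error signals.
def backpropagate_errors(w, deltas_next):
    return [sum(w_mn * d_n for w_mn, d_n in zip(row, deltas_next))
            for row in w]

w = [[0.2, -0.5], [0.7, 0.1]]                # 2 neurons feeding 2 next-layer neurons
print(backpropagate_errors(w, [0.3, -0.1]))  # [0.11, 0.20]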
Example (cont’d)
When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified.
Example (cont'd)
(Figures: the weight-update formulas applied to each neuron of the example network, not reproduced.)
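In code, one such update might look as follows (sigmoid assumed, so df(e)/de = f(e)(1 - f(e)); all numbers illustrative):

import math

def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))

def update_weights(weights, inputs, delta, e, eta):
    dfe = sigmoid(e) * (1.0 - sigmoid(e))   # df(e)/de for the sigmoid
    return [w + eta * delta * dfe * x for w, x in zip(weights, inputs)]

print(update_weights([0.4, -0.7], [1.0, 0.5], delta=0.2, e=0.05, eta=0.5))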
Architecture of Back-Propagation Algorithm
(Figure: a feed-forward network with inputs x1 … xp, hidden units h1 … hm, and outputs y1 … yn.)
The Algorithm (cont’d)
Step 3: Calculate Actual Outputs
Use the sigmoid nonlinearity and the formulas from above to calculate the outputs y0, y1, …, yN-1.
Step 4: Adapt Weights
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust the weights by
wij(t+1) = wij(t) + η δj xi
In this equation wij(t) is the weight from hidden node i or from an input to node j at time t, xi is either the output of node i or is an input, η is a gain term, and δj is an error term for node j. If node j is an output node, then
δj = xj(1 - xj)(dj - xj)
where dj is the desired output of node j; if node j is an internal (hidden) node, then
δj = xj(1 - xj) Σk δk wjk
The Algorithm (cont’d)
where the sum over k runs over all nodes in the layers above node j. Internal node thresholds are adapted in a similar manner by assuming they are connection weights on links from auxiliary constant-valued inputs. Convergence is sometimes faster if a momentum term is added and the weight changes are smoothed by
wij(t+1) = wij(t) + η δj xi + α (wij(t) - wij(t-1)), where 0 < α < 1
Step 5: Repeat steps 2 to 4.
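Putting steps 2 to 5 together as a NumPy sketch, with a toy XOR data set and illustrative sizes and constants (the thresholds are implemented as weights from an auxiliary constant-valued input, as described above):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Toy data set (XOR; illustrative only).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)

Xb = np.hstack([X, np.ones((4, 1))])  # auxiliary constant input = threshold
W1 = rng.uniform(-1, 1, (3, 3))       # (2 inputs + threshold) -> 3 hidden nodes
W2 = rng.uniform(-1, 1, (4, 1))       # (3 hidden + threshold) -> 1 output node
dW1_prev, dW2_prev = np.zeros_like(W1), np.zeros_like(W2)
eta, alpha = 0.5, 0.9                 # gain term and momentum term

for epoch in range(10000):
    # Steps 2-3: present inputs and calculate actual outputs.
    H = sigmoid(Xb @ W1)
    Hb = np.hstack([H, np.ones((4, 1))])
    Y = sigmoid(Hb @ W2)
    # Step 4: adapt weights, starting at the output nodes.
    delta_out = Y * (1 - Y) * (D - Y)                 # output-node error term
    delta_hid = H * (1 - H) * (delta_out @ W2[:3].T)  # hidden-node error term
    dW2 = eta * Hb.T @ delta_out + alpha * dW2_prev   # momentum-smoothed change
    dW1 = eta * Xb.T @ delta_hid + alpha * dW1_prev
    W2, W1 = W2 + dW2, W1 + dW1
    dW2_prev, dW1_prev = dW2, dW1

print(np.round(Y, 2))  # should approach [[0], [1], [1], [0]] for most seeds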
Derivation of Back-Propagation Algorithm
xk = the kth component of the input vector presented to the input layer.
hj = the net input of hidden layer unit j.
vj = the output of hidden layer unit j.
gi = the net input of output layer unit i.
yi = the output of output layer unit i.
wij = weight connecting the jth neuron of the hidden layer to the ith neuron of the output layer.
wjk = weight connecting the kth neuron of the input layer to the jth neuron of the hidden layer.
Derivation (cont’d)
Input of hidden layer unit j:
hj = Σ(k=1..p) wjk xk ……(1)
Output of hidden layer unit j:
vj = 1/(1 + e^(-hj)) ……(2)
Input of output layer unit i:
gi = Σj wij vj ……(3)
Output of output layer unit i:
yi = 1/(1 + e^(-gi)) ……(4)
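Equations (1)-(4) as a Python sketch (list-based for clarity; layer sizes and weights are illustrative):

import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, w_jk, w_ij):
    # w_jk[j][k]: input -> hidden weights; w_ij[i][j]: hidden -> output weights.
    h = [sum(w_jk[j][k] * x[k] for k in range(len(x)))
         for j in range(len(w_jk))]                     # eq (1)
    v = [sigmoid(hj) for hj in h]                       # eq (2)
    g = [sum(w_ij[i][j] * v[j] for j in range(len(v)))
         for i in range(len(w_ij))]                     # eq (3)
    y = [sigmoid(gi) for gi in g]                       # eq (4)
    return h, v, g, y

print(forward([1.0, 0.5], [[0.2, -0.4], [0.6, 0.1]], [[0.3, -0.7]]))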
Derivation (cont’d)
Error function:
E(t) = ½ Σ(i=1..n) (yi^d - yi)² ……(5)
where yi^d is the desired (target) output of output unit i.
Weight update function (gradient descent):
Δw(t) = -η ∂E(t)/∂w(t) ……(6)
w(t+1) = w(t) + Δw(t) ……(7)
Derivation (cont’d)
Updating weights between the output layer and the hidden layer:
∂E(t)/∂wij(t) = Σ(i=1..n) [∂E(t)/∂yi] [∂yi/∂wij(t)] ……(8)
From equations (4) and (5):
yi = 1/(1 + e^(-gi))
E(t) = ½ Σ(i=1..n) (yi^d - yi)²
∂E(t)/∂yi = -(yi^d - yi) ……(9)
Derivation (cont’d)
∂yi/∂wij(t) = ∂[(1 + e^(-gi))^(-1)] / ∂wij(t)
Derivation (cont’d)
∂yi/∂wij(t) = [e^(-gi) / (1 + e^(-gi))²] vj ……(10)
From yi = 1/(1 + e^(-gi)):
1 - yi = 1 - 1/(1 + e^(-gi)) = e^(-gi) / (1 + e^(-gi))
yi(1 - yi) = e^(-gi) / (1 + e^(-gi))²
Derivation (cont’d)
Substituting this value into equation (10):
∂yi/∂wij(t) = yi(1 - yi) vj ……(11)
Combining this with equations (8) and (9):
∂E(t)/∂wij(t) = -(yi^d - yi) yi(1 - yi) vj ……(12)
Derivation (cont’d)
δi = yi(1 - yi)(yi^d - yi) ……(13)
∂E(t)/∂wij(t) = -δi vj ……(14)
(for a given weight wij, only the ith term of the sum in (8) is nonzero)
Δwij(t) = -η ∂E(t)/∂wij(t)
Δwij(t) = η δi vj ……(15)
Hence the updated weight will be
wij(t+1) = wij(t) + η δi vj ……(16)
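Equations (13)-(16) in code (illustrative values):

def update_output_weight(w_ij, y_i, y_d_i, v_j, eta):
    delta_i = y_i * (1.0 - y_i) * (y_d_i - y_i)   # eq (13)
    return w_ij + eta * delta_i * v_j             # eqs (15)-(16)

print(update_output_weight(0.3, y_i=0.6, y_d_i=1.0, v_j=0.8, eta=0.5))  # 0.3384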
Derivation (cont’d)
Updating weights between the hidden layer and the input layer:
∂E(t)/∂wjk(t) = Σ(i=1..n) ∂Ei(t)/∂wjk(t) ……(17)
= [∂Ei(t)/∂yi] [∂yi/∂wjk(t)] ……(18)
(call these two factors A and B)
A = ∂Ei(t)/∂yi = -(yi^d - yi) ……(19)
B = ∂yi/∂wjk(t) = [∂yi/∂gi] [∂gi/∂vj] [∂vj/∂hj] [∂hj/∂wjk] ……(20)
(call these four factors D, E, F and G)
D = ∂yi/∂gi = yi(1 - yi) ……(21)
E = ∂gi/∂vj = wij ……(22)
Derivation (cont’d)
F = ∂vj/∂hj = ∂[(1 + e^(-hj))^(-1)]/∂hj = vj(1 - vj) ……(23)
G = ∂hj/∂wjk = xk ……(24)
Substituting the values of equations (19)-(24) into equation (18), we get
∂E(t)/∂wjk(t) = Σ(i=1..n) [-(yi^d - yi) yi(1 - yi) wij] vj(1 - vj) xk
= -vj(1 - vj) xk Σ(i=1..n) δi wij
where δi = yi(1 - yi)(yi^d - yi). Defining
δj = vj(1 - vj) Σ(i=1..n) δi wij
the change in a weight between the hidden and input layers is
Δwjk(t) = η δj xk
Hence the updated weight will be
wjk(t+1) = wjk(t) + η δj xk
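And the corresponding hidden-layer update in code (w_ij_column holds the weights wij from hidden unit j to every output unit i; values illustrative):

def update_hidden_weight(w_jk, v_j, x_k, deltas_out, w_ij_column, eta):
    delta_j = v_j * (1.0 - v_j) * sum(
        d_i * w_ij for d_i, w_ij in zip(deltas_out, w_ij_column))
    return w_jk + eta * delta_j * x_k

print(update_hidden_weight(0.1, v_j=0.7, x_k=1.0,
                           deltas_out=[0.096, -0.02],
                           w_ij_column=[0.3, -0.4], eta=0.5))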
The error landscape in a multilayer perceptron
For a given pattern p, the error Ep can be plotted against the weights to give the so-called error surface.
The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and points of maximum error found on peaks.
The generalised delta rule aims to minimise Ep by adjusting the weights so that they correspond to points of lowest error.
It follows the method of gradient descent, where changes are made in the steepest downward direction, as the sketch below illustrates.
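A one-dimensional illustration of gradient descent on a made-up error surface (assumed quadratic with a single minimum):

def E(w):                 # illustrative error surface, minimum at w = 2
    return (w - 2.0) ** 2

def dE_dw(w):             # its gradient
    return 2.0 * (w - 2.0)

w, eta = -1.0, 0.1
for _ in range(50):
    w -= eta * dE_dw(w)   # step in the steepest downward direction
print(round(w, 3))        # approaches 2.0, the point of lowest error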
(Figure: the error surface, Ep plotted against a weight wi.)
Learning difficulties in multilayer perceptrons - local minima
The MLP may fail to settle into the global minimum of the error surface and instead find itself in one of the local minima.
This is due to the gradient descent strategy it follows.
A number of approaches can be taken to reduce this possibility:
Lowering the gain term progressively
The gain term influences the rate at which weight changes are made during training. Its value is 1 by default, but it may be gradually reduced to slow the rate of change as training progresses (see the sketch below).
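A sketch of one possible gain schedule (the decay form is an assumption; the slides only say the gain is gradually reduced):

def gain(epoch, eta0=1.0, decay=0.001):
    # Gain starts at the default value 1 and falls as training progresses.
    return eta0 / (1.0 + decay * epoch)

for epoch in (0, 1000, 5000):
    print(epoch, round(gain(epoch), 3))   # 1.0, 0.5, 0.167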
Learning difficulties in multilayer perceptrons (cont'd)
Addition of more nodes for better representation of patterns
Too few nodes (and consequently not enough weights) can cause the ANN to fail to learn a pattern.
BPA (Back-Propagation Algorithm)
A unit in the output layer determines its activity by following a two-step procedure.
First, it computes the total weighted input xj:
xj = Σi yi Wij
where yi is the activity level of the ith unit in the previous layer and Wij is the weight of the connection between the ith and the jth unit.
Next, the unit calculates the activity yj using some function of the total weighted input. Typically we use the sigmoid function:
yj = 1/(1 + e^(-xj))
Once the activities of all output units have been determined, the network computes the error E, which is defined by the expression:
E = ½ Σj (yj - dj)²
where yj is the activity level of the jth unit in the top layer and dj is the desired output of the jth unit.
BPA
The back-propagation algorithm consists of four steps:
1. Compute how fast the error changes as the activity of an output unit is changed. This error derivative (EA) is the difference between the actual and the desired activity.
EAj = ∂E/∂yj = yj - dj
2. Compute how fast the error changes as the total input received by an output unit is changed. This quantity (EI) is the answer from step 1 multiplied by the rate at which the output of a unit changes as its total input is changed.
EIj = ∂E/∂xj = (∂E/∂yj)(∂yj/∂xj) = EAj yj(1 - yj)
3. Compute how fast the error changes as a weight on the connection into an output unit is changed [70, 118]. This quantity (EW) is the answer from step 2 multiplied by the activity level of the unit from which the connection emanates.
EWij = ∂E/∂Wij = (∂E/∂xj)(∂xj/∂Wij) = EIj yi
BPA
4. Compute how fast the error changes as the activity of a unit in the previous layer is changed. This crucial step allows back-propagation to be applied to multi-layer networks. When the activity of a unit in the previous layer changes, it affects the activities of all the output units to which it is connected. So to compute the overall effect on the error, we add together all these separate effects on output units. But each effect is simple to calculate: it is the answer from step 2 multiplied by the weight on the connection to that output unit.
EAi = ∂E/∂yi = Σj (∂E/∂xj)(∂xj/∂yi) = Σj EIj Wij
By using steps 2 and 4, we can convert the EAs of one layer of units into EAs for the previous layer. This procedure can be repeated to get the EAs for as many previous layers as desired. Once we know the EA of a unit, we can use steps 2 and 3 to compute the EWs on its incoming connections.
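The four quantities as a NumPy sketch (sigmoid units assumed; weights and activities illustrative):

import numpy as np

def backprop_quantities(y_prev, W, y_out, d):
    # y_prev: activities of the previous layer; W[i, j]: weight from unit i to unit j.
    EA = y_out - d                      # step 1: EA_j = y_j - d_j
    EI = EA * y_out * (1.0 - y_out)     # step 2: EI_j = EA_j * y_j * (1 - y_j)
    EW = np.outer(y_prev, EI)           # step 3: EW_ij = EI_j * y_i
    EA_prev = W @ EI                    # step 4: EA_i = sum_j EI_j * W_ij
    return EA, EI, EW, EA_prev

EA, EI, EW, EA_prev = backprop_quantities(
    y_prev=np.array([0.2, 0.9]),
    W=np.array([[0.5, -0.3], [0.1, 0.8]]),
    y_out=np.array([0.7, 0.4]), d=np.array([1.0, 0.0]))
print(EA_prev)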
Weight vs. Sum of Squared Errors
(Figure not reproduced.)
Flow Diagram of the Back-Propagation Algorithm
(Figure not reproduced.)