
ANN models

•  ANNs are supposed to model the structure and operation of the biological brain
•  But there are different types of neural networks, depending on the architecture, learning strategy and operation
•  Three of the most well-known models are:
   1. The multilayer perceptron
   2. The Kohonen network (the Self-Organising Map)
   3. The Hopfield net
•  The Multilayer Perceptron (MLP) is the most popular ANN architecture
ICT619 S2-05 1
The Multilayer Perceptron
•  Nodes are arranged into an input layer, an output layer and one or more hidden layers
•  Also known as the backpropagation network, because of the use of error values from the output layer in the layers before it to calculate weight adjustments during training
•  Another name for the MLP is the feedforward network

ICT619 S2-05 2
MLP learning algorithm
•  The learning rule for the multilayer perceptron is known as "the generalised delta rule" or the "backpropagation rule"
•  The generalised delta rule repeatedly calculates an error value for each input, which is a function of the squared difference between the expected correct output and the actual output
•  The calculated error is backpropagated from one layer to the previous one, and is used to adjust the weights between connecting layers

ICT619 S2-05 3
MLP learning algorithm (cont’d)
New weight = Old weight + change calculated from square of error
Error = difference between desired output and actual output

•  Training stops when the error becomes acceptable, or after a predetermined number of iterations
•  After training, the modified interconnection weights form a sort of internal representation that enables the ANN to generate desired outputs when given the training inputs – or even new inputs that are similar to training inputs
•  This generalisation is a very important property
ICT619 S2-05 4
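The update and stopping rules above can be made concrete with a short sketch. To keep it brief, this hypothetical example (names, data and values are illustrative, not from the slides) trains a single sigmoid neuron with the delta rule; it only demonstrates the two stopping criteria, an acceptable error or a preset number of iterations. The full multilayer version appears later with the back-propagation algorithm slides.

```python
import math
import random

def sigmoid(e):
    return 1.0 / (1.0 + math.exp(-e))

def train(data, targets, eta=1.0, tol=0.01, max_iter=10_000):
    """Delta-rule training of a single sigmoid neuron (illustrative only)."""
    w = [random.uniform(-0.5, 0.5) for _ in data[0]]          # small random initial weights
    for _ in range(max_iter):                                 # stop after a preset number of iterations
        total_error = 0.0
        for x, d in zip(data, targets):
            y = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) # actual output
            err = d - y                                       # desired output - actual output
            total_error += 0.5 * err ** 2                     # squared-error contribution
            w = [wi + eta * err * y * (1 - y) * xi            # new weight = old weight + change
                 for wi, xi in zip(w, x)]
        if total_error < tol:                                 # stop when the error becomes acceptable
            break
    return w

# Illustrative use: learn the logical AND of two inputs (third input is a constant bias)
weights = train(data=[(0, 0, 1), (0, 1, 1), (1, 0, 1), (1, 1, 1)], targets=[0, 0, 0, 1])
```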
Multi-layer Neural Network
Employing Backpropagation
Algorithm
•  To illustrate this process, let us take the example of a three-layer neural network with two inputs and one output, shown in the picture below:

ICT619 S2-05 5
ICT619 S2-05 6
Example (cont’d)
•  Each neuron is composed of two units. The first unit adds the products of the weight coefficients and the input signals. The second unit realises a nonlinear function, called the neuron activation function. Signal e is the output of the adder, and y = f(e) is the output of the nonlinear element. Signal y is also the output of the neuron (a minimal code sketch follows below).

ICT619 S2-05 7
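The two-unit neuron described above translates directly into code. A minimal sketch with illustrative names, assuming the sigmoid activation that is used later in these slides:

```python
import math

def neuron_output(weights, inputs):
    """One neuron: an adder followed by a nonlinear activation function."""
    e = sum(w * x for w, x in zip(weights, inputs))  # first unit: sum of weight * input products
    y = 1.0 / (1.0 + math.exp(-e))                   # second unit: y = f(e), sigmoid assumed here
    return y

# Example: a neuron with two inputs
print(neuron_output(weights=[0.5, -0.3], inputs=[1.0, 2.0]))
```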
Example (cont’d)
•  To teach the neural network we need a training data set. The training data set consists of input signals (x1 and x2) paired with the corresponding target (desired output) z.
•  The pictures below illustrate how the signal propagates through the network. Symbols w(xm)n represent the weights of the connections between network input xm and neuron n in the input layer. Symbols yn represent the output signal of neuron n.

ICT619 S2-05 8
Example (cont’d)

ICT619 S2-05 9
Example (cont’d)
•  Propagation of signals through the hidden layer. Symbols wmn represent the weights of the connections between the output of neuron m and the input of neuron n in the next layer (a vectorised sketch of this propagation follows below).

ICT619 S2-05 10
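A vectorised sketch of this forward propagation, under assumed layer sizes (the exact topology of the pictured network is not reproduced): each layer's outputs follow from the previous layer's outputs and the connection weights.

```python
import numpy as np

def sigmoid(e):
    return 1.0 / (1.0 + np.exp(-e))

def propagate_layer(W, y_prev):
    """Outputs of one layer given the previous layer's outputs.
    W[n, m] is the weight between the output of neuron m and the
    input of neuron n in the next layer."""
    return sigmoid(W @ y_prev)

# Illustrative shapes: 2 network inputs -> 3 neurons -> 1 output neuron
rng = np.random.default_rng(0)
x  = np.array([1.0, 0.0])
W1 = rng.uniform(-0.5, 0.5, size=(3, 2))   # weights w(xm)n of the first layer
W2 = rng.uniform(-0.5, 0.5, size=(1, 3))   # weights wmn into the next layer
y1 = propagate_layer(W1, x)                # outputs of the first layer
y  = propagate_layer(W2, y1)               # network output
```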
Example (cont’d)
•  Propagation of signals through the output layer.
•  In the next step of the algorithm, the output signal of the network y is compared with the desired output value (the target), which is found in the training data set. The difference is called the error signal d of the output layer neuron.

ICT619 S2-05 11
Example (cont’d)
•  It is impossible to compute the error signal for internal neurons directly, because the desired output values of these neurons are unknown. The idea is to propagate the error signal d (computed in a single teaching step) back to all neurons whose output signals were inputs to the neuron in question.

ICT619 S2-05 12
Example (cont’d)
•  The weight coefficients wmn used to propagate the errors back are equal to those used when computing the output value; only the direction of data flow is changed (see the sketch below).

ICT619 S2-05 13
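A minimal sketch of this backward step, using illustrative values: the output error signal d = z − y is distributed to the hidden neurons through the same weights, only with the direction of data flow reversed.

```python
import numpy as np

def backpropagate_error(d_output, W):
    """Send output error signals back through the same weights,
    with the direction of data flow reversed."""
    return W.T @ d_output

# Illustrative values: one output neuron fed by three hidden neurons
z = np.array([1.0])                  # desired output (target) from the training set
y = np.array([0.7])                  # actual network output
d = z - y                            # error signal d of the output neuron
W = np.array([[0.2, -0.4, 0.1]])     # weights wmn into the output neuron
d_hidden = backpropagate_error(d, W) # error signals for the three hidden neurons
```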
Example (cont’d)
•  When the error signal for each neuron has been computed, the weight coefficients of each neuron's input connections may be modified. In the formulas below, df(e)/de represents the derivative of the activation function of the neuron whose weights are being modified (a code sketch follows below).

ICT619 S2-05 14
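A sketch of the update formula for one neuron, assuming the sigmoid activation (so df(e)/de = y(1 − y)); η and the variable names are illustrative:

```python
def update_weights(weights, inputs, y, delta, eta=0.5):
    """w' = w + eta * delta * df(e)/de * x for each input connection of one neuron."""
    dfde = y * (1.0 - y)   # derivative of the sigmoid activation at the neuron's output
    return [w + eta * delta * dfde * x for w, x in zip(weights, inputs)]

# Example: update the two input weights of a neuron with output y and error signal delta
new_w = update_weights(weights=[0.5, -0.3], inputs=[1.0, 0.0], y=0.6, delta=0.2)
```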
Example (cont’d)

ICT619 S2-05 15
Example (cont’d)

ICT619 S2-05 16
Example (cont’d)

•  The coefficient η affects the network teaching (learning) speed. There are a few techniques for selecting this parameter. The first method is to start the teaching process with a large value of the parameter and to decrease it gradually as the weight coefficients become established. The second, more complicated, method starts teaching with a small parameter value; during the teaching process the parameter is increased as teaching advances, and then decreased again in the final stage. Starting the teaching process with a low parameter value makes it possible to determine the signs of the weight coefficients (both strategies are sketched below).
ICT619 S2-05 17
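The two strategies for choosing η can be sketched as simple schedules. The particular shapes and numbers below are assumptions for illustration; the slides do not prescribe them.

```python
def decreasing_eta(iteration, eta0=0.9, decay=0.001):
    """Method 1: start with a large eta and decrease it gradually."""
    return eta0 / (1.0 + decay * iteration)

def rise_then_fall_eta(iteration, total_iters, eta_low=0.05, eta_high=0.5):
    """Method 2: start small (helps settle the signs of the weights),
    increase while teaching advances, decrease again in the final stage."""
    warmup_end = max(1, int(0.2 * total_iters))
    decay_start = int(0.8 * total_iters)
    if iteration < warmup_end:
        return eta_low + (eta_high - eta_low) * iteration / warmup_end
    if iteration < decay_start:
        return eta_high
    frac = (iteration - decay_start) / max(1, total_iters - decay_start)
    return eta_high - (eta_high - eta_low) * frac
```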
Back-Propagation Training
Algorithm
The back-propagation training algorithm is an iterative gradient algorithm designed to minimize the mean square error between the actual output of a multilayer feed-forward perceptron and the desired output. It requires continuous, differentiable non-linearities. The following assumes a sigmoid logistic non-linearity is used, where the function f(α) is

f(α) = 1 / (1 + e^(-(α - θ)))

ICT619 S2-05 18
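A direct sketch of this logistic nonlinearity, where θ plays the role of the node offset:

```python
import math

def f(alpha, theta=0.0):
    """Sigmoid logistic nonlinearity: f(alpha) = 1 / (1 + e^-(alpha - theta))."""
    return 1.0 / (1.0 + math.exp(-(alpha - theta)))
```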
Architecture of Back-
Propagation Algorithm

[Figure: network with inputs x1 … xp, hidden units h1 … hm and outputs y1 … yn: Input layer, Hidden layer, Output layer]


ICT619 S2-05 19
The Algorithm
Step 1: Initialize weights and offsets
Set all weights and node offsets to small random
values
Step 2: Present Input and Desired Outputs
Present a continuous-valued input vector x0, x1, …, x(P-1) and specify the desired outputs d0, d1, …, d(N-1). If the net is used as a classifier then all desired outputs are typically set to zero, except for the one corresponding to the class the input is from; that desired output is 1. The input could be new on each trial, or samples from a training set could be presented cyclically until the weights stabilize.

ICT619 S2-05 20
The Algorithm (cont’d)
Step 3: Calculate Actual Outputs
Use the sigmoid nonlinearity and the formulas from above to calculate the outputs y0, y1, …, y(N-1).
Step 4: Adapt Weights
Use a recursive algorithm starting at the output nodes and working back to the first hidden layer. Adjust the weights by

wij(t+1) = wij(t) + η δj xi

In this equation wij(t) is the weight from hidden node i or from an input to node j at time t, xi is either the output of node i or an input, η is a gain term, and δj is an error term for node j. If node j is an output node, then

δj = xj(1 - xj)(dj - xj)

where dj is the desired output of node j and xj is the actual output. If node j is an internal hidden node, then

δj = xj(1 - xj) Σk δk wjk

ICT619 S2-05 21
The Algorithm (cont’d)
where k is over all nodes in the layers
above node j. Internal node thresholds are
adapted in a similar manner by assuming they
are connection weights on links from auxiliary
constant-valued inputs. Convergence is sometimes faster if a momentum term is added and weight changes are smoothed by
wij(t+1) = wij(t) + η δj xi + α (wij(t) - wij(t-1)),
where 0 < α < 1
Step 5: Repeat steps 2 to 4.

ICT619 S2-05 22
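Steps 1 to 5 can be collected into one runnable sketch. This is not the course's reference implementation; it is a minimal NumPy version of the algorithm as described above (one hidden layer, sigmoid nonlinearity, gain term η, momentum α), with sizes and training data chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_mlp(X, D, n_hidden=4, eta=0.5, alpha=0.9, epochs=5000):
    n_in, n_out = X.shape[1], D.shape[1]

    # Step 1: initialize weights and node offsets to small random values
    W1 = rng.uniform(-0.5, 0.5, (n_hidden, n_in));  b1 = rng.uniform(-0.5, 0.5, n_hidden)
    W2 = rng.uniform(-0.5, 0.5, (n_out, n_hidden)); b2 = rng.uniform(-0.5, 0.5, n_out)
    dW1, dW2 = np.zeros_like(W1), np.zeros_like(W2)   # previous changes, for momentum

    for _ in range(epochs):                        # Step 5: repeat steps 2 to 4
        for x, d in zip(X, D):                     # Step 2: present input and desired output
            # Step 3: calculate actual outputs with the sigmoid nonlinearity
            h = sigmoid(W1 @ x + b1)
            y = sigmoid(W2 @ h + b2)

            # Step 4: adapt weights, starting at the output nodes and working backwards
            delta_out = y * (1 - y) * (d - y)             # error term for output nodes
            delta_hid = h * (1 - h) * (W2.T @ delta_out)  # error term for hidden nodes

            dW2 = eta * np.outer(delta_out, h) + alpha * dW2   # momentum-smoothed change
            dW1 = eta * np.outer(delta_hid, x) + alpha * dW1
            W2 += dW2; b2 += eta * delta_out   # offsets treated as weights from constant inputs
            W1 += dW1; b1 += eta * delta_hid
    return W1, b1, W2, b2

# Illustrative usage: XOR as a tiny two-class problem
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
W1, b1, W2, b2 = train_mlp(X, D)
out = sigmoid(W2 @ sigmoid(W1 @ X.T + b1[:, None]) + b2[:, None]).T
print(out.round(2))   # ideally close to [[0], [1], [1], [0]]
```

Here the training patterns are presented cyclically and the weights are updated one pattern at a time, which is one of the options Step 2 allows.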
Derivation of Back-Propagation
Algorithm
•  x = input vector to the network; xk is the kth input.
•  h = net-input vector of the hidden layer; hj is the net input of hidden unit j.
•  v = output vector of the hidden layer; vj is the output of hidden unit j.
•  g = net-input vector of the output layer; gi is the net input of output unit i.
•  y = output vector of the output layer; yi is the output of unit i.
•  wij = weight between the jth neuron of the hidden layer and the ith neuron of the output layer.
•  wjk = weight between the kth input and the jth neuron of the hidden layer.
ICT619 S2-05 23
Derivation (cont’d)
•  Input of the hidden layer:

   hj = Σ(k=1 to p) wjk · xk   ……(1)

•  Output of the hidden layer:

   vj = f(hj)
   vj = 1 / (1 + e^(-hj))   ……(2)
ICT619 S2-05 24
Derivation (cont’d)
•  Input of the output layer:

   gi = Σ(j=1 to m) wij · vj   ……(3)

•  Output of the output layer:

   yi = f(gi)
   yi = 1 / (1 + e^(-gi))   ……(4)

ICT619 S2-05 25
Derivation (cont’d)
•  Error function:

   E(t) = 1/2 · Σ(i=1 to n) (yid - yi)²   ……(5)

•  Weight update:

   wij(t+1) = wij(t) + ∆wij(t)   ……(6)

   ∆wij(t) = -η · ∂E(t)/∂wij(t)   ……(7)

ICT619 S2-05 26
Derivation (cont’d)
•  Updating the weights between the output layer and the hidden layer:

   ∂E(t)/∂wij(t) = Σ(i=1 to n) [∂E(t)/∂yi] · [∂yi/∂wij(t)]   ……(8)

   From equations (4) and (5):
   yi = 1 / (1 + e^(-gi))
   E(t) = 1/2 · Σ (yid - yi)²

   ∂E(t)/∂yi = -(yid - yi)   ……(9)

ICT619 S2-05 27
Derivation (cont’d)
∂yi/∂wij(t) = ∂[(1 + e^(-gi))^(-1)]/∂wij

∂yi/∂wij(t) = [∂yi/∂gi] · [∂gi/∂wij]

∂yi/∂wij(t) = ∂[(1 + e^(-gi))^(-1)]/∂gi · ∂[Σ(j=1 to m) wij vj]/∂wij

ICT619 S2-05 28
Derivation (cont’d)
∂yi/∂wij(t) = [-1 · (-e^(-gi)) / (1 + e^(-gi))²] · vj = [e^(-gi) / (1 + e^(-gi))²] · vj   ……(10)

yi = 1 / (1 + e^(-gi))

(1 - yi) = 1 - 1/(1 + e^(-gi))

(1 - yi) = e^(-gi) / (1 + e^(-gi))

yi(1 - yi) = e^(-gi) / (1 + e^(-gi))²

ICT619 S2-05 29
Derivation (cont’d)
Substituting this value into eq. (10):

∂yi/∂wij(t) = yi(1 - yi) · vj   ……(11)

Substituting the values from eqs. (9) and (11) into eq. (8):

∂E(t)/∂wij(t) = -(yid - yi) · yi(1 - yi) · vj   ……(12)

ICT619 S2-05 30
Derivation (cont’d)
δi = yi(1 - yi)(yid - yi)   ……(13)

∂E(t)/∂wij(t) = -δi · vj   ……(14)

∆wij(t) = -η · ∂E(t)/∂wij(t)
∆wij(t) = η δi vj   ……(15)

•  Hence the updated weight is:
   wij(t+1) = wij(t) + η δi vj   ……(16)

ICT619 S2-05 31
Derivation (cont’d)
•  Updating the weights between the hidden layer and the input layer:

   ∂E(t)/∂wjk(t) = Σ(i=1 to n) ∂Ei(t)/∂wjk(t)   ……(17)

                 = [∂Ei(t)/∂yi] · [∂yi/∂wjk(t)]   ……(18)
                       (A)            (B)

   A = ∂Ei(t)/∂yi = -(yid - yi)   ……(19)

   B = ∂yi/∂wjk(t) = [∂yi/∂gi] · [∂gi/∂vj] · [∂vj/∂hj] · [∂hj/∂wjk]   ……(20)
                        (D)         (E)         (F)         (G)

   D = ∂yi/∂gi = yi(1 - yi)   ……(21)

   E = ∂gi/∂vj = wij   ……(22)
ICT619 S2-05 32
Derivation (cont’d)
F = ∂vj/∂hj = ∂[(1 + e^(-hj))^(-1)]/∂hj = vj(1 - vj)   ……(23)

G = ∂hj/∂wjk = xk   ……(24)

Substituting the values of equations (19)-(24) into equation (18), we get

∂E(t)/∂wjk(t) = Σ(i=1 to n) -(yid - yi) · yi(1 - yi) · wij · vj(1 - vj) · xk
              = -vj(1 - vj) · xk · Σ(i=1 to n) δi wij

where δi = yi(1 - yi)(yid - yi)

δj = vj(1 - vj) · Σ(i=1 to n) δi wij

The change in weight between the hidden and input layers is:
∆wjk(t) = η δj xk

•  Hence the updated weight is:
   wjk(t+1) = wjk(t) + η δj xk
ICT619 S2-05 33
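One way to sanity-check the derived expressions δi = yi(1 - yi)(yid - yi) and δj = vj(1 - vj) Σ δi wij is to compare them with numerically estimated gradients of E for the two-layer network of equations (1)-(4). The sketch below does this; layer sizes, random values and the tolerance are arbitrary choices, not part of the derivation.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(Wjk, Wij, x):
    v = sigmoid(Wjk @ x)    # hidden outputs: v_j = f(h_j), h_j = sum_k w_jk x_k
    y = sigmoid(Wij @ v)    # network outputs: y_i = f(g_i), g_i = sum_j w_ij v_j
    return v, y

def error(Wjk, Wij, x, yd):
    _, y = forward(Wjk, Wij, x)
    return 0.5 * np.sum((yd - y) ** 2)   # E = 1/2 * sum_i (y_id - y_i)^2

rng = np.random.default_rng(1)
x, yd = rng.normal(size=3), rng.uniform(size=2)
Wjk, Wij = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))

# Analytical gradients from the expressions derived above
v, y = forward(Wjk, Wij, x)
delta_i = y * (1 - y) * (yd - y)             # output-layer error terms, eq. (13)
delta_j = v * (1 - v) * (Wij.T @ delta_i)    # hidden-layer error terms
grad_Wij = -np.outer(delta_i, v)             # dE/dw_ij = -delta_i * v_j, eq. (14)
grad_Wjk = -np.outer(delta_j, x)             # dE/dw_jk = -delta_j * x_k

# Central-difference estimate for one sample weight, w_ij with i=0, j=1
eps = 1e-6
Wp, Wm = Wij.copy(), Wij.copy()
Wp[0, 1] += eps; Wm[0, 1] -= eps
numeric = (error(Wjk, Wp, x, yd) - error(Wjk, Wm, x, yd)) / (2 * eps)
print(abs(numeric - grad_Wij[0, 1]) < 1e-6)  # expected to print True
```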
The error landscape in a multilayer
perceptron
•  For a given pattern p, the error Ep can be plotted against the weights to give the so-called error surface
•  The error surface is a landscape of hills and valleys, with points of minimum error corresponding to wells and maximum error found on peaks
•  The generalised delta rule aims to minimise Ep by adjusting weights so that they correspond to points of lowest error
•  It follows the method of gradient descent, where the changes are made in the steepest downward direction
•  All possible solutions are depressions in the error surface, known as basins of attraction

ICT619 S2-05 34
The error landscape in a multilayer
perceptron

[Figure: the error surface, Ep plotted against a weight wi]
ICT619 S2-05 35
Learning difficulties in multilayer
perceptrons - local minima
•  The MLP may fail to settle into the global minimum of the error surface and instead find itself in one of the local minima
•  This is due to the gradient descent strategy followed
•  A number of alternative approaches can be taken to reduce this possibility:
   –  Lowering the gain term progressively
      –  Used to influence rate at which weight changes are made during training
      –  Value by default is 1, but it may be gradually reduced to reduce the rate of change as training progresses

ICT619 S2-05 36
Learning difficulties in multilayer
perceptrons (cont’d)
   –  Addition of more nodes for better representation of patterns
      –  Too few nodes (and consequently not enough weights) can cause failure of the ANN to learn a pattern
   –  Introduction of a momentum term
      –  Determines effect of past weight changes on current direction of movement in weight space
      –  Momentum term is also a small numerical value in the range 0-1
   –  Addition of random noise to perturb the ANN out of local minima
      –  Usually done by adding small random values to weights
      –  Takes the net to a different point in the error space – hopefully out of a local minimum

ICT619 S2-05 37
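Two of these remedies translate into small weight-update helpers. The momentum coefficient and noise scale below are illustrative assumptions (the slides only state that the momentum term lies in the range 0-1):

```python
import numpy as np

def update_with_momentum(w, basic_change, prev_change, momentum=0.8):
    """Blend the current weight change with the previous one (momentum term in 0-1)."""
    change = basic_change + momentum * prev_change
    return w + change, change

def perturb_weights(weights, noise_scale=0.01, rng=None):
    """Add small random values to the weights, nudging the net to a nearby
    point on the error surface, hopefully out of a local minimum."""
    rng = np.random.default_rng() if rng is None else rng
    return weights + rng.normal(0.0, noise_scale, size=weights.shape)
```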
BPA

•  A unit in the output layer determines its activity by following a two-step procedure.

•  First, it computes the total weighted input xj, using the formula:

   xj = Σi yi Wij

   where yi is the activity level of the ith unit in the previous layer and Wij is the weight of the connection between the ith and the jth unit.

•  Next, the unit calculates the activity yj using some function of the total weighted input. Typically we use the sigmoid function:

   yj = 1 / (1 + e^(-xj))

•  Once the activities of all output units have been determined, the network computes the error E, which is defined by the expression:

   E = 1/2 · Σj (yj - dj)²

   where yj is the activity level of the jth unit in the top layer and dj is the desired output of the jth unit.

ICT619 S2-05 38
BPA
•  The back-propagation algorithm consists of four steps:

1. Compute how fast the error changes as the activity of an output unit is changed. This error derivative (EA) is the difference between the actual and the desired activity.

   EAj = ∂E/∂yj = yj - dj

2. Compute how fast the error changes as the total input received by an output unit is changed. This quantity (EI) is the answer from step 1 multiplied by the rate at which the output of a unit changes as its total input is changed.

   EIj = ∂E/∂xj = (∂E/∂yj) · (∂yj/∂xj) = EAj · yj(1 - yj)

3. Compute how fast the error changes as a weight on the connection into an output unit is changed [70, 118]. This quantity (EW) is the answer from step 2 multiplied by the activity level of the unit from which the connection emanates.

   EWij = ∂E/∂Wij = (∂E/∂xj) · (∂xj/∂Wij) = EIj · yi

ICT619 S2-05 39
BPA
4. Compute how fast the error changes as the activity of a unit in the previous layer is changed. This crucial step allows back-propagation to be applied to multi-layer networks. When the activity of a unit in the previous layer changes, it affects the activities of all the output units to which it is connected. So to compute the overall effect on the error, we add together all these separate effects on output units. But each effect is simple to calculate: it is the answer in step 2 multiplied by the weight on the connection to that output unit.

   EAi = ∂E/∂yi = Σj (∂E/∂xj) · (∂xj/∂yi) = Σj EIj Wij

•  By using steps 2 and 4, we can convert the EAs of one layer of units into EAs for the previous layer. This procedure can be repeated to get the EAs for as many previous layers as desired. Once we know the EA of a unit, we can use steps 2 and 3 to compute the EWs on its incoming connections.

ICT619 S2-05 40
Weight vs. Sum of Squared Error

ICT619 S2-05 41
Flow Diagram Of Back-Propagation Algorithm.

ICT619 S2-05 42
