Lecture 02 - Artificial Neural Network

The document discusses the Delta Learning Rule and Backpropagation Algorithm in neural networks, focusing on how to minimize error through weight adjustments during training. It explains the importance of learning constants, the stochastic approximation to gradient descent, and the challenges of local minima. Additionally, it covers the architecture of Multi Layer Perceptrons (MLPs) and their application in tasks like face recognition and autonomous vehicle steering.


NEURAL NETWORKS

Delta Learning Rule

Let E = accumulative error over a data set. It is a function of
the neuron weights

E = Σ over training samples (d - O)²

d is the desired output and O is the actual output

The error is squared so that the positive and negative errors
may not cancel each other out during summation
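
As a small illustrative aside (not from the original slides), a few made-up error values show why the squaring matters:

import numpy as np

errors = np.array([2.0, -2.0, 1.0, -1.0])   # d - O for four hypothetical training samples
print(errors.sum())          # 0.0  -> positive and negative errors cancel out
print((errors ** 2).sum())   # 10.0 -> squared errors accumulate instead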
NEURAL NETWORKS

Delta Learning Rule

Each weight configuration can be represented by a point on
an error surface
NEURAL NETWORKS

Delta Learning Rule

Starting from a random weight configuration, we want our
training algorithm to move in the direction where the error is
reduced most rapidly

The delta rule attempts to minimize the local error and uses the
derivative of the error to find the slope of the error surface in
the region local to a particular point
NEURAL NETWORKS

Delta Learning Rule

Delta rule:
Δwi = -c (∂Error/∂wi)

If the learning constant "c" is large (more than 0.5), the weights
move quickly to their optimal values, but there is a risk of
overshooting the minimum or of oscillation around the optimum
weights

If "c" is small, the training is less prone to these problems,
but the system does not learn quickly; also, the algorithm may
get stuck in local minima
NEURAL NETWORKS

Delta Learning Rule

The weights are updated incrementally, following the
presentation of each training example

This corresponds to a stochastic approximation to gradient
descent

To obtain the true gradient of the error, one would consider all
of the training examples before altering the weight values

The stochastic approximation avoids costly computations per
weight update
NEURAL NETWORKS
Delta Learning Rule

To calculate the weight change, we use the chain rule

The Error is only indirectly dependent on wi, but it is directly
dependent on the variable O

∂Error/∂wi = (∂Error/∂O) . (∂O/∂wi)

∂Error/∂O = rate of change of error w.r.t. the output

Now ∂Error/∂O = ∂(d - O)²/∂O = -2(d - O)

For ∂O/∂wi we have (∂O/∂act) (∂act/∂wi)

(∂O/∂act) = (∂f(act)/∂act) = f'(act)
(∂act/∂wi) = (∂ Σi xi wi /∂wi) = xi

Hence Δwi = -c (∂Error/∂wi) = -c[-2(d - O) . f'(act) . xi]
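
To make the update concrete, here is a minimal Python sketch (added for illustration, not part of the original slides) of the delta rule for a single sigmoidal neuron; the function and variable names are assumed, and the squashing parameter is taken as 1:

import numpy as np

def sigmoid(act):
    # logistic activation with the squashing parameter assumed to be 1
    return 1.0 / (1.0 + np.exp(-act))

def delta_rule_update(weights, x, d, c=0.1):
    # one incremental delta-rule step for a single sigmoidal neuron
    act = np.dot(weights, x)        # act = sum_i x_i * w_i
    O = sigmoid(act)                # actual output
    f_prime = O * (1.0 - O)         # f'(act) = f(act)(1 - f(act))
    # dError/dw_i = -2 (d - O) f'(act) x_i, so delta_w_i = -c * dError/dw_i
    return weights + 2.0 * c * (d - O) * f_prime * x

# toy usage: one training sample with three inputs
w = np.zeros(3)
w = delta_rule_update(w, np.array([1.0, 0.5, -0.3]), d=1.0)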
NEURAL NETWORKS

Delta Learning Rule

A typical activation function is the logistic function (which is a
type of sigmoidal function)

f(act) = 1/(1 + e^(-λ act))

If the value of λ (the squashing parameter) is large, we have a unit
step function; if it is small, we have almost a straight line
between the two saturation limits

f'(act) = λ f(act)(1 - f(act))
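
A short sketch (assumed code, not from the slides) of the logistic function with an explicit squashing parameter λ, showing that a large λ approaches a unit step while a small λ stays nearly linear between the saturation limits:

import numpy as np

def logistic(act, lam=1.0):
    # f(act) = 1 / (1 + exp(-lam * act))
    return 1.0 / (1.0 + np.exp(-lam * act))

def logistic_derivative(act, lam=1.0):
    f = logistic(act, lam)
    return lam * f * (1.0 - f)      # f'(act) = lam * f(act) * (1 - f(act))

acts = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(logistic(acts, lam=10.0))     # close to a unit step function
print(logistic(acts, lam=0.1))      # almost a straight line near the origin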


NEURAL NETWORKS

Multi Layer Perceptron: Architecture & Forward Pass

[Diagram: MLP architecture showing hidden units and output units]


NEURAL NETWORKS

Delta Learning Rule

Now the accumulative error E over a data set will be

E = Σ over training samples Σj (dj - Oj)²

dj is the desired output of node j and Oj is the actual output


NEURAL NETWORKS

Backpropagation Algorithm

• Set up the architecture & initialize the weights of the network
• Apply the training pairs (input-output vectors) from the
training set, one by one
• For each training pair, calculate the output of the network
• Calculate the error between actual output & desired output
• Propagate the error backwards & adjust the weights in such
a way that the error is minimized
• Repeat the above steps for each pair in the training set until
the error for the set is lower than the required minimum error
(a training-loop sketch is given below)
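
The sketch below (an assumed illustration in Python with made-up names and XOR as a stand-in training set, not the code used in the lecture) shows these steps as incremental backpropagation for one hidden layer of sigmoid units:

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_backprop(X, D, n_hidden=4, c=0.5, epochs=10000, tol=0.01, seed=0):
    # Incremental (stochastic) backprop for a 1-hidden-layer MLP.
    # Bias terms are handled by appending a constant 1 to the inputs and hidden outputs.
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1] + 1, D.shape[1]
    W_hid = rng.uniform(-0.5, 0.5, (n_in, n_hidden))        # (input + bias) -> hidden
    W_out = rng.uniform(-0.5, 0.5, (n_hidden + 1, n_out))   # (hidden + bias) -> output

    for _ in range(epochs):
        sq_error = 0.0
        for x, d in zip(X, D):                      # present training pairs one by one
            x1 = np.append(x, 1.0)                  # input with bias term
            h = sigmoid(x1 @ W_hid)                 # hidden outputs O_i
            h1 = np.append(h, 1.0)                  # hidden outputs with bias term
            o = sigmoid(h1 @ W_out)                 # network outputs O_j
            err = d - o
            sq_error += np.sum(err ** 2)
            delta_out = err * o * (1.0 - o)                       # (d_j - O_j) f'(act_j)
            delta_hid = (W_out[:-1] @ delta_out) * h * (1.0 - h)  # weighted sum of output deltas
            W_out += c * np.outer(h1, delta_out)    # adjust hidden -> output weights
            W_hid += c * np.outer(x1, delta_hid)    # adjust input -> hidden weights
        if sq_error < tol:                          # stop when the set error is below the required minimum
            break
    return W_hid, W_out

# Toy usage: learn XOR as a stand-in training set
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
D = np.array([[0], [1], [1], [0]], dtype=float)
W_hid, W_out = train_backprop(X, D)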
NEURAL NETWORKS

Multi Layer Perceptron: Training of Hidden Layer Weights

[Diagram: weight wki feeding unit i, with the hidden layer and output layer shown]


NEURAL NETWORKS

Multi Layer Perceptron: Training

Since training examples provide target values only for the
network outputs, no target values are directly available
to indicate the error of the hidden units' values

Instead, the error term for a hidden unit is calculated by
taking the weighted sum of the error terms of the output
units influenced by it

Each such weight characterizes the degree to which the hidden unit
is responsible for the error in that output unit
NEURAL NETWORKS

Multi Layer Perceptron: Training of Hidden Layer Weights

Adjustment of the kth weight of node "i"

Δwki = -c (∂Error/∂wki)

∂Error/∂wki = (∂Error/∂Oi) . (∂Oi/∂wki)

∂Error/∂Oi = rate of change of error w.r.t. the output of node i
           = ∂ Σj Errorj /∂Oi

Since each Errorj is dependent upon Oi but all Errorj are
independent of each other (each has its own independent
weight set),
hence ∂ Σj Errorj /∂Oi = Σj (∂Errorj/∂Oi)
NEURAL NETWORKS

Multi Layer Perceptron: Training of Hidden Layer Weights

[Diagram: hidden unit i, with output Oi = xi, connected to output unit j by weight wij]


NEURAL NETWORKS

Multi Layer Perceptron: Training of Hidden Layer Weights

Hence Σj (∂Errorj/∂Oi) = Σj [(∂Errorj/∂actj) . (∂actj/∂Oi)]

∂Errorj/∂actj = (∂Errorj/∂Oj) (∂Oj/∂actj)

∂Errorj/∂Oj = ∂(dj - Oj)²/∂Oj = -2(dj - Oj)
∂Oj/∂actj = ∂f(actj)/∂actj = f'(actj)

(∂actj/∂Oi) = (∂ Σi xi wij /∂Oi)

Since Oi = xi,
hence ∂actj/∂Oi = wij
NEURAL NETWORKS

Multi Layer Perceptron: Training of Hidden Layer Weights

Furthermore, ∂Oi/∂wki = (∂Oi/∂acti)(∂acti/∂wki)

(∂acti/∂wki) = (∂ Σk xk wki /∂wki) = xk

(∂Oi/∂acti) = (∂f(acti)/∂acti) = f'(acti)

Hence Δwki = -c (∂Error/∂wki)
           = -c[-2 Σj {(dj - Oj) f'(actj) wij} f'(acti) xk]
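
As a hedged illustration (variable names are assumed), the hidden-layer update above can be coded directly:

import numpy as np

def hidden_weight_update(x_k, f_prime_act_i, d, O, f_prime_act_j, w_i_to_j, c=0.1):
    # delta_w_ki = -c * dError/dw_ki
    #            = 2c * sum_j[(d_j - O_j) f'(act_j) w_ij] * f'(act_i) * x_k
    backpropagated = np.sum((d - O) * f_prime_act_j * w_i_to_j)
    return 2.0 * c * backpropagated * f_prime_act_i * x_k

# toy numbers: two output nodes j, one hidden node i, one incoming value x_k
dw = hidden_weight_update(x_k=0.8, f_prime_act_i=0.2,
                          d=np.array([1.0, 0.0]), O=np.array([0.6, 0.3]),
                          f_prime_act_j=np.array([0.24, 0.21]),
                          w_i_to_j=np.array([0.5, -0.4]))
print(dw)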
NEURAL NETWORKS

Multi Layer Perceptron: Training

This approach is called "gradient descent learning"

A requirement of this approach is that the activation function
must be differentiable (and hence continuous)

The number of input and output neurons is fixed

But the selection of the number of hidden layers and the number
of neurons in the hidden layers is done by trial and error
NEURAL NETWORKS

Multi Layer Perceptron: Training

Gradient descent is not guaranteed to converge to the
global optimum

The algorithm we have discussed is the incremental gradient
descent (or stochastic gradient descent) version of
backpropagation
NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example

Images of 20 different people

32 images per person, with
varying expressions (happy, sad, angry, neutral),
looking in various directions (left, right, straight, up),
and with and without sunglasses

Grayscale images (intensity between 0 and 255) with a
size (resolution) of 120 x 128 pixels
NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example


NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example

An ANN can be trained on any one of a variety of target
functions using this image data, e.g.
- identity of the person
- direction in which the person is looking
- gender of the person
- whether or not they are wearing sunglasses
NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example

Design Choices:

Separate the data into
training (260 images) and test (364 images) sets

Input Encoding
- 30 x 32 pixel image
- A coarse-resolution version of the 120 x 128 pixel image
- Every 4 x 4 block of pixels is replaced by its mean value
- The pixel intensity is linearly scaled from 0 to 1 so
that inputs, hidden units and output units have
the same range
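
A minimal sketch of this input encoding (variable names assumed; random data stands in for a real face image):

import numpy as np

def encode_image(img):
    # coarsen a 120 x 128 grayscale image to 30 x 32: each 4 x 4 block is
    # replaced by its mean, then intensities 0..255 are scaled to 0..1
    h, w = img.shape
    coarse = img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
    return coarse / 255.0

img = np.random.randint(0, 256, size=(120, 128))
x = encode_image(img)
print(x.shape)      # (30, 32) -> 960 network inputs when flattened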
NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example

Design Choices:

Output Encoding
- Learning Task: Direction in which the person is looking
- Only one neuron could have been used, with outputs
0.2, 0.4, 0.6, and 0.8 encoding the four possible
values
- But we use 4 output neurons, so that a measure of
confidence in the ANN's decision can be
obtained
- Output vector:
1 for true & 0 for false; e.g. [1, 0, 0, 0]
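
A short sketch (assumed names) of the 1-of-4 target encoding and of reading a decision plus confidence from the four outputs:

import numpy as np

DIRECTIONS = ["left", "right", "straight", "up"]

def target_vector(direction):
    # 1 for the true direction, 0 for the others, e.g. "left" -> [1, 0, 0, 0]
    t = np.zeros(4)
    t[DIRECTIONS.index(direction)] = 1.0
    return t

def decode(outputs):
    # the highest output gives the decision; its value serves as a confidence measure
    i = int(np.argmax(outputs))
    return DIRECTIONS[i], float(outputs[i])

print(target_vector("left"))                    # [1. 0. 0. 0.]
print(decode(np.array([0.9, 0.1, 0.2, 0.1])))   # ('left', 0.9)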
NEURAL NETWORKS

Multi Layer Perceptron: Face Recognition Example

Design Choices:

Network Structure
- How many layers?
Usually one hidden layer is enough
- How many units in the hidden layer?
More units than necessary result in over-fitting
Too few units result in failure of training
Trial & error: start with a number and prune
the units with the help of a cross-validation set
NEURAL NETWORKS

Example of MLP based Systems:


ALVINN

The system ALVINN uses a
trained ANN to steer an
autonomous vehicle driving at
normal speeds on public
highways

The input is a 30 x 32 grid of
pixel intensities obtained from a
forward-pointing camera
mounted on the vehicle
NEURAL NETWORKS

Example of MLP based Systems:


ALVINN

The network output is the
direction in which the vehicle is
steered

The training examples have been
obtained from the steering
commands of a human driver
NEURAL NETWORKS

Appropriate Problems for MLPs

MLPs are appropriate for problems with the following
characteristics:

1. The input is high-dimensional, and the input attributes may
be highly correlated or they may be independent

2. The output vector may have discrete or real-valued
attributes

3. The training data may contain noise (errors)

4. Long training times are acceptable


NEURAL NETWORKS

Appropriate MLP Problems

5. Fast system response is required (a trained NN evaluates the
outputs quickly)

6. The ability of humans to understand the learned target
function is not important
NEURAL NETWORKS

Adding Momentum

A popular variation of the backpropagation algorithm is to
use the following weight update rule:

Δwi(n) = -c (∂Error/∂wi) + α Δwi(n-1)

The weight update on the nth iteration is partially dependent
upon the update that occurred during the (n-1)th iteration

The constant α ("alpha") is called the momentum and its value is
between 0 & 1

The first term on the right of the equation is just the weight
update rule described before
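
A minimal sketch of the momentum rule (assumed names; grad stands for ∂Error/∂w computed for the current example):

import numpy as np

def momentum_update(w, grad, prev_delta_w, c=0.1, alpha=0.9):
    # delta_w(n) = -c * dError/dw + alpha * delta_w(n-1), with 0 < alpha < 1
    delta_w = -c * grad + alpha * prev_delta_w
    return w + delta_w, delta_w

# toy usage: two consecutive updates with the same gradient
w, prev = np.zeros(3), np.zeros(3)
g = np.array([0.2, -0.1, 0.05])
w, prev = momentum_update(w, g, prev)
w, prev = momentum_update(w, g, prev)   # the step grows where the gradient is unchanging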
NEURAL NETWORKS

Adding Momentum

To understand the effect of the
momentum term, consider that the
gradient descent search trajectory
is analogous to that of a ball rolling
down the error surface

Without the momentum term, the
ball is pushed only by the weight
change term
NEURAL NETWORKS

Adding Momentum

The effect of momentum is a
tendency to keep the ball rolling in
the same direction from one
iteration to the next

It keeps the ball rolling through
small local minima in the error
surface and along flat regions of
the surface where the ball would normally stop

It also has the effect of gradually increasing the step size of
the search in regions where the gradient is unchanging,
thereby speeding convergence
NEURAL NETWORKS

Backpropagation Algorithm: Convergence & Local Minima

Gradient descent may become trapped in a local minimum
- However, due to the high dimensionality of the weight space,
the chances of this are low
- Sometimes a local minimum near the global minimum
is good enough

- Remedy: Add momentum
- Remedy: Stochastic gradient descent
- Remedy: Train multiple nets with different initial
weights
Reading Assignment & References

Chapter 4 of T. Mitchell

http://www-2.cs.cmu.edu/afs/cs/project/ai-repository/ai/areas/neural/systems/nevprop/np.c
Assignment

• "Global Optimization Algorithm for
training Product Unit Neural Networks"
• Read the paper
• Make a one-page summary
• Implement the paper
NEURAL NETWORKS

Homework

Read the 2nd chapter of Engelbrecht's book
