
CHAPTER 9

BACK PROPAGATION
BACKGROUND
• Backpropagation was invented by Bryson and Ho in 1969.
• It is a method for supervised training in multi-layer networks.
• The backpropagation algorithm is a sensible approach for dividing the contribution of each
weight to the output error.
• The updating rule works basically the same way as for perceptrons.
• It rests on two learning principles: hidden layers and gradients.
• There are two differences in the updating rule:
1) The activation of the hidden unit is used instead of the activation of the input value.
2) The rule contains a term for the gradient of the activation function.
• Back propagation works by approximating the non-linear relationship between
the input and the output by adjusting the weight values internally.
• It can further be generalized for the input that is not included in the training
patterns (predictive abilities).
• It is a generalization of the delta rule for non-linear activation functions and
multi-layer networks.
• The back propagation network has two stages, training and testing.
• During the training phase, the network is "shown" sample inputs and the correct
classifications. For example, the input might be an encoded picture of a face, and
the output could be represented by a code that corresponds to the name of the
person.
• This encoding scheme defines the network architecture, so once a network is trained,
the scheme cannot be changed without creating a totally new net.
BACK PROPAGATION ARCHITECTURE
• A back propagation network consists of at least three layers of units:
 An input layer,
 At least one intermediate hidden layer, and
 An output layer
• Typically units are connected in a feed-forward fashion with input units fully connected to units in the
hidden layer and hidden units fully connected to units in the output layer.
• When a back propagation network is cycled, an input pattern is propagated forward to the output units
through the intervening input-to-hidden and hidden-to-output weights.
• The output of a back propagation network is interpreted as a classification decision.
• Back propagation neural networks can have more than one hidden layer.
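• As a rough sketch of this layout (assuming a NumPy representation; the layer sizes below are illustrative, not taken from the slides), the two fully connected weight layers can be stored as matrices:

    import numpy as np

    # Illustrative layer sizes: 4 input units, 3 hidden units, 2 output units.
    n_in, n_hidden, n_out = 4, 3, 2

    # Fully connected input-to-hidden and hidden-to-output weights,
    # initialized with small random numbers between -1 and 1.
    w_ih = np.random.uniform(-1.0, 1.0, size=(n_in, n_hidden))
    w_ho = np.random.uniform(-1.0, 1.0, size=(n_hidden, n_out))

    # Bias ("pseudo input") weights for the hidden and output layers.
    theta_h = np.random.uniform(-1.0, 1.0, size=n_hidden)
    theta_o = np.random.uniform(-1.0, 1.0, size=n_out)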
BACK PROPAGATION NETWORK TRAINING

1. Initialize network with random weights.


2. For all training cases (called examples): Present training inputs to network and
calculate output.
3. For all layers (starting with the output layer, back to the input layer):
 Compare the network output with the correct output (error function)
 Adapt the weights in the current layer
4. Use gradient descent to minimize the error: propagate deltas backward from the outputs
through the hidden layers to the inputs to adjust for the errors (sketched below).
This is the basic method for learning weights in feed-forward (FF) nets.
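• A minimal sketch of the gradient descent update in step 4, assuming the partial derivative of the error with respect to a weight (dE_dw) has already been obtained from the backward pass; the names lr and dE_dw are illustrative:

    def gradient_descent_step(w, dE_dw, lr=0.01):
        # Move the weight a small step against the error gradient.
        return w - lr * dE_dw

    # Example: a weight of 0.3 with an error gradient of 0.5 is nudged to 0.295.
    new_w = gradient_descent_step(0.3, 0.5)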
IDEAS OF BACK PROPAGATION
• It computes the error term for the output units using the observed error.
• Starting from the output layer, it repeatedly propagates the error term back to the previous
layer and updates the weights between the two layers, until the earliest hidden layer
is reached.
• Backward pass: compute the deltas for the weights from the hidden layer to the output
layer.
• Without changing any weights (yet), compute the actual contributions within
the hidden layer(s) and compute their deltas.
• Once the error signal for each node has been determined, the errors are then
used by the nodes to update the value of each connection weight, until the
network converges to a state that allows all the training patterns to be encoded.
• The back propagation algorithm looks for the minimum value of the error
function in weight space using a technique called the delta rule or gradient
descent.
• The weights that minimize the error function are then considered to be a
solution to the learning problem.
UNDERTRAINING
• Under-training can occur when the neural
network is not complex enough to detect a
pattern in a complicated data set.
• This is usually the result of a network with so
few hidden nodes that it cannot accurately
represent the solution, therefore under-fitting
the data.
OVERTRAINING
• Over-training can result in a network that is
too complex, resulting in predictions that are
far beyond the range of the training data.
• Networks with too many hidden nodes will
tend to over-fit the solution
GOOD FIT
• The aim is to create a neural network with the
"right" number of hidden nodes that will lead
to a good solution to the problem.
BACK PROPAGATION ALGORITHM
Assign all network inputs and outputs
Initialize all weights with small random numbers, typically between -1 and 1

repeat
    for every pattern in the training set
        Present the pattern to the network
        // Propagate the input forward through the network:
        for each layer in the network
            for every node in the layer
                1. Calculate the weighted sum of the inputs to the node
                2. Add the threshold (bias) to the sum
                3. Calculate the activation for the node
            end
        end
        // Propagate the errors backward through the network:
        for every node in the output layer
            calculate the error signal
        end
        for all hidden layers
            for every node in the layer
                1. Calculate the node's signal error
                2. Update each node's weight in the network
            end
        end
        // Calculate the global error:
        Calculate the Error Function
    end
while ((number of iterations < maximum specified) AND
       (Error Function > specified target))
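• The pseudocode above can be turned into a runnable sketch, for example in NumPy; the network size, learning rate, stopping criteria, and XOR training set below are assumptions for illustration, not values given in the slides:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    # XOR training patterns (inputs X and target outputs T) - illustrative data set.
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    T = np.array([[0], [1], [1], [0]], dtype=float)

    rng = np.random.default_rng(0)
    n_in, n_hidden, n_out = 2, 3, 1

    # Initialize all weights (including bias weights) with small random numbers in [-1, 1].
    w_ih = rng.uniform(-1, 1, (n_in, n_hidden))
    b_h = rng.uniform(-1, 1, n_hidden)
    w_ho = rng.uniform(-1, 1, (n_hidden, n_out))
    b_o = rng.uniform(-1, 1, n_out)

    lr = 0.5                 # learning rate (assumed value)
    max_iterations = 10000   # stopping criteria (assumed values)
    target_error = 0.01

    for iteration in range(max_iterations):
        E = 0.0
        for x, t in zip(X, T):
            # Propagate the input forward through the network.
            net_h = x @ w_ih + b_h
            o_h = sigmoid(net_h)
            net_o = o_h @ w_ho + b_o
            o_o = sigmoid(net_o)

            # Propagate the errors backward through the network.
            delta_o = (t - o_o) * o_o * (1 - o_o)          # output-layer error signal
            delta_h = o_h * (1 - o_h) * (w_ho @ delta_o)   # hidden-layer error signal

            # Update the weights (delta rule).
            w_ho += lr * np.outer(o_h, delta_o)
            b_o += lr * delta_o
            w_ih += lr * np.outer(x, delta_h)
            b_h += lr * delta_h

            # Accumulate the global error for this pattern.
            E += 0.5 * np.sum((t - o_o) ** 2)

        if E < target_error:
            break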
BACK PROPAGATION ALGORITHM
Feed Forward
• When a specified training pattern is fed to the input layer, the weighted sum of the inputs to node j in
the hidden layer is given by
Net_j = Σ_i (x_i · w_i,j) + θ_j
• This equation is used to calculate the aggregate input to the neuron.
• The θ_j term is the weighted value from a bias node that always has an output value of 1.
• The bias node is considered a "pseudo input" to each neuron in the hidden layer and the output layer,
and is used to overcome the problems associated with situations where the values of an input pattern are
zero.
• If any input pattern has zero values, the neural network could not be trained without a bias node.
• To decide whether a neuron should fire, the "Net" term, also known as the
action potential,  is passed onto an appropriate activation function.
• The resulting value from the activation function determines the neuron's output,
and becomes the input value for the neurons in the next layer connected to it.
• Since one of the requirements for the back propagation algorithm is that the
activation function is differentiable, a typical activation function used is the
Sigmoid equation
O_j = f(Net_j) = 1 / (1 + e^(−Net_j))
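• A small sketch of this forward step for a single hidden-layer node j (the input values, weights, and bias weight below are illustrative):

    import numpy as np

    def sigmoid(net):
        # Differentiable activation function, as required by back propagation.
        return 1.0 / (1.0 + np.exp(-net))

    x = np.array([0.2, 0.7, 0.1])        # one input pattern
    w_j = np.array([0.4, -0.3, 0.8])     # weights w_i,j into hidden node j
    theta_j = 0.1                        # weight from the bias node (output always 1)

    net_j = np.dot(x, w_j) + theta_j     # Net_j: weighted sum of inputs plus bias term
    o_j = sigmoid(net_j)                 # O_j: the node's output, fed to the next layer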
Error Calculations and Weight Adjustments –
Back Propagation
• Output Layer
If the actual activation value of the output node k is O_k, and the expected target output for node k
is t_k, the difference between the expected output and the actual output is given by:
Δ_k = t_k − O_k
• The error signal for node k in the output layer can be calculated as
δ_k = Δ_k · O_k · (1 − O_k)
• OR
δ_k = (t_k − O_k) · O_k · (1 − O_k)
where the O_k · (1 − O_k) term is the derivative of the Sigmoid function.
• With the delta rule, the change in the weight connecting input node j and output node k is proportional to the
error at node k multiplied by the activation of node j.
• The formulas used to modify the weight w_j,k between the output node k and the node j are:
Δw_j,k = lr · δ_k · O_j
w_j,k = w_j,k + Δw_j,k
• where Δw_j,k is the change in the weight between nodes j and k, and lr is the learning rate.
• The learning rate is a relatively small constant that indicates the relative change in weights. If the learning rate
is too low, the network will learn very slowly, and if the learning rate is too high, the network may oscillate
around the minimum point, overshooting the lowest point with each weight adjustment, but never actually
reaching it.
• Usually the learning rate is very small, with 0.01 not an uncommon number.
• As learning progresses, the learning rate decreases as it approaches the optimal point in the minima. Slowing
the learning process near the optimal point encourages the network to converge to a solution while reducing
the possibility of overshooting.
• If, however, the learning process initiates close to the optimal point, the system may initially oscillate, but this
effect is reduced with time as the learning rate decreases.
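• A sketch of the output-layer calculation for a single connection j → k (all numeric values are illustrative, and O_k is taken as an already-computed output rather than recomputed here):

    lr = 0.01        # learning rate (a typical small value)

    o_j = 0.6        # activation of node j feeding output node k
    o_k = 0.55       # actual output O_k of node k
    t_k = 1.0        # expected target output t_k for node k
    w_jk = 0.3       # current weight between nodes j and k

    delta_k = (t_k - o_k) * o_k * (1 - o_k)   # error signal, using the Sigmoid derivative
    dw_jk = lr * delta_k * o_j                # delta rule: change in the weight
    w_jk = w_jk + dw_jk                       # apply the weight adjustment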
THE HIDDEN LAYER
• The hidden layer allows the neural network to develop its own internal representation of the input-
output mapping.
• The complex internal representation capability allows the hierarchical network to learn any
mapping and not just the linearly separable ones
• The error signal for node j in the hidden layer can be calculated as:
δ_j = O_j · (1 − O_j) · Σ_k (δ_k · w_j,k)
where the Sum term adds the weighted error signals for all nodes, k, in the output layer.
• As before, the formulas used to adjust the weight w_i,j between the input node i and the hidden node j are:
Δw_i,j = lr · δ_j · x_i
w_i,j = w_i,j + Δw_i,j
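• A corresponding sketch for a hidden node j, assuming the output-layer error signals δ_k have already been computed (all values below are illustrative):

    import numpy as np

    lr = 0.01                              # learning rate
    x_i = np.array([0.5, 0.9])             # input pattern values feeding hidden node j
    o_j = 0.7                              # activation of hidden node j
    w_jk = np.array([0.3, -0.2])           # weights from node j to the output nodes k
    delta_k = np.array([0.05, -0.02])      # error signals of the output nodes

    # Error signal for hidden node j: derivative term times the weighted sum
    # of the output-layer error signals.
    delta_j = o_j * (1 - o_j) * np.sum(delta_k * w_jk)

    # Adjust the weights between the input nodes i and hidden node j.
    w_ij = np.array([0.4, 0.1])            # current input-to-hidden weights into node j
    w_ij = w_ij + lr * delta_j * x_i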
GLOBAL ERROR
• Back propagation is derived by assuming that it is desirable to minimize the error
on the output nodes over all the patterns presented to the neural network.
• The following equation is used to calculate the error function, E,  for all
patterns:
E = ½ · Σ_patterns Σ_k (t_k − O_k)²

• Ideally, the error function should have a value of zero when the neural network
has been correctly trained. This, however, is numerically unrealistic.
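• For example, with the targets and network outputs for all patterns stored as arrays (the values below are illustrative), the error function can be computed as:

    import numpy as np

    # One row per training pattern, one column per output node.
    targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
    outputs = np.array([[0.9, 0.1], [0.2, 0.8], [0.7, 0.3]])

    # E = 1/2 * sum over all patterns and output nodes of (t_k - O_k)^2
    E = 0.5 * np.sum((targets - outputs) ** 2)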
LIMITATION OF BACK PROPAGATION
• It cannot use the Perceptron Learning Rule because no teacher values are
available for the hidden units.
THE END.
