Back Propagation Technique
Presented by:
Karthick Myilvahanan
Outline of the Presentation
• Introduction
• Historical Background
• Perceptron
• Back propagation algorithm
• Limitations and improvements
• Questions and Answers
Introduction
• Artificial Neural Networks are crude attempts to model the massively
parallel and distributed processing we believe takes place in the brain.
• Back propagation, an abbreviation for "backward propagation of
errors", is a common method of training artificial neural networks used
in conjunction with an optimization method such as gradient descent.
• The method calculates the gradient of a loss function with respect to
all the weights in the network.
• The gradient is fed to the optimization method which in turn uses it to
update the weights, in an attempt to minimize the loss function.
Why do we need multi-layer neural networks??
• Some input and output patterns can be easily learned by single-layer neural
networks (i.e. perceptrons). However, single-layer perceptrons cannot learn
some relatively simple patterns, such as those that are not linearly separable
(e.g. XOR; see the sketch after this list).
• A single-layer neural network cannot learn any abstract features of the input,
since it is limited to having only one layer. A multi-layered network overcomes this
limitation, as it can create internal representations and learn different features
in each layer.
• Each higher layer learns more and more abstract features that can be used to
classify the input (e.g. an image). Each layer finds patterns in the layer below it,
and it is this ability to create internal representations that are independent of
the outside input that gives multi-layered networks their power.
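As a concrete illustration of the points above, here is a minimal sketch (plain Python, with hand-picked rather than learned weights) showing that one hidden layer of threshold units is enough to compute XOR, a pattern that is not linearly separable and therefore cannot be represented by any single-layer perceptron:

def step(z):
    # threshold (perceptron) activation
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # hypothetical hand-picked weights, for illustration only
    h1 = step(x1 + x2 - 0.5)    # hidden unit 1: fires if at least one input is 1 (OR)
    h2 = step(x1 + x2 - 1.5)    # hidden unit 2: fires only if both inputs are 1 (AND)
    return step(h1 - h2 - 0.5)  # output: OR and not AND  ->  XOR

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))  # prints 0 0 0 / 0 1 1 / 1 0 1 / 1 1 0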
Historical Background
• Early attempts to implement artificial neural networks: McCulloch
(Neuroscientist) and Pitts (Logician) (1943) [1]
• Based on simple neurons (MCP neurons)
• Based on logical functions
• Donald Hebb (1949) gave the hypothesis in his book “The Organization of
Behavior”:
• “Neural pathways are strengthened every time they are used.”
Perceptron
[Diagram of a perceptron unit: inputs x0 = 1, x1, …, xn with weights w0, …, wn feeding a single output unit] [3]
Delta Rule
• The delta rule is used for calculating the gradient that is used for
updating the weights.
• We will try to minimize the following error over the training examples d:
• E = ½ Σd (td − od)²
• For a new training example X = (x1, x2, …, xn), update each weight
according to this rule:
• wi = wi + Δwi
where Δwi = −η ∂E/∂wi   (η is the learning rate)
Delta Rule
• Taking the derivative gives:
• ∂E/∂wi = Σd (td − od)(−xid)
where xid is the i-th input of training example d.
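A minimal sketch of the delta rule for a single linear unit, assuming NumPy (the data and names here are illustrative, not from the slides):

import numpy as np

def delta_rule_update(w, X, t, eta=0.01):
    # X: (num_examples, num_inputs), t: (num_examples,), w: (num_inputs,)
    o = X @ w                   # outputs of the linear unit for all examples
    grad = -(X.T @ (t - o))     # dE/dw_i = sum_d (t_d - o_d) * (-x_id)
    return w - eta * grad       # w_i <- w_i - eta * dE/dw_i

# Usage on a toy problem whose targets come from a known linear rule:
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
t = X @ np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
for _ in range(200):
    w = delta_rule_update(w, X, t)
print(w)   # converges towards [1.0, -2.0, 0.5]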
[Diagram of a multi-layer network: internal (hidden) nodes feeding the output nodes] [2]
• Each sigmoid unit computes O(x1, x2, …, xn) = σ(W·X)
where σ(W·X) = 1 / (1 + e^(−W·X)) [3]
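In code, the sigmoid unit above is just (a sketch, assuming NumPy):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes W.X into the range (0, 1)

def sigmoid_unit(W, X):
    return sigmoid(np.dot(W, X))      # O(x1, ..., xn) = sigma(W . X)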
What is the gradient descent algorithm??
• Back propagation calculates the gradient of the error of the
network with respect to the network's modifiable weights.
• This gradient is almost always used in a simple stochastic gradient
descent algorithm to find weights that minimize the error.
[Plot of the error surface E(W) as a function of two weights w1 and w2] [3]
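Independent of the particular network, the optimization is just a loop that steps the weights against the gradient. A generic sketch (grad_E is assumed to be supplied, e.g. by back propagation):

def gradient_descent(W, grad_E, eta=0.1, steps=1000):
    # Batch gradient descent on an error surface E(W).
    for _ in range(steps):
        W = W - eta * grad_E(W)   # move downhill along the negative gradient
    return W

# Stochastic gradient descent follows the same pattern but updates after
# every individual training example, using that example's error gradient.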
Propagating forward
• Given example X, compute the output of every node until we reach
the output nodes:
[Diagram: example X is presented at the input nodes; the internal nodes and then the output nodes each compute the sigmoid function] [3]
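A minimal forward pass for a network with one hidden layer of sigmoid units, assuming NumPy (W_hidden and W_output are illustrative names for the weight matrices):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W_hidden, W_output):
    # X: input vector; W_hidden: (n_hidden, n_inputs); W_output: (n_outputs, n_hidden)
    O_hidden = sigmoid(W_hidden @ X)         # outputs of the internal (hidden) nodes
    O_output = sigmoid(W_output @ O_hidden)  # outputs of the output nodes
    return O_hidden, O_output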
Propagating Error Backward
• For each output node k compute the error:
δk = Ok (1-Ok)(tk – Ok)
• For each hidden unit h, calculate the error:
δh = Oh (1-Oh) Σk Wkh δk
• Update each network weight:
Wji = Wji + Δwji
where Δwji = η δj Xji (Xji is the input from node i to node j,
and Wji is the corresponding weight)
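Putting the three formulas together for one training example and one hidden layer (a sketch with illustrative names; it reuses the forward pass sketched earlier):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, t, W_hidden, W_output, eta=0.1):
    # Forward pass: compute the output of every node.
    O_h = sigmoid(W_hidden @ X)
    O_k = sigmoid(W_output @ O_h)

    # Error term of each output node k: delta_k = O_k (1 - O_k)(t_k - O_k)
    delta_k = O_k * (1 - O_k) * (t - O_k)
    # Error term of each hidden node h: delta_h = O_h (1 - O_h) sum_k W_kh delta_k
    delta_h = O_h * (1 - O_h) * (W_output.T @ delta_k)

    # Weight updates: W_ji <- W_ji + eta * delta_j * X_ji
    W_output = W_output + eta * np.outer(delta_k, O_h)
    W_hidden = W_hidden + eta * np.outer(delta_h, X)
    return W_hidden, W_output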
Number of Hidden Units
• The number of hidden units is related to the complexity of the
decision boundary.
[Figure: decision boundaries produced by networks with different numbers of hidden units] [3]
Limitations of the back propagation algorithm
• It is not guaranteed to find the global minimum of the error function. It
may get trapped in a local minimum.
• Improvements:
• Add momentum (see the sketch below).
• Use stochastic gradient descent.
• Use different networks with different initial values for the weights.
• It may also overfit the training data.
• Solutions:
• Use a validation set and stop training when the error on this set is small.
• Use 10-fold cross validation.
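A sketch of the momentum improvement mentioned above (alpha is the momentum coefficient; names are illustrative): each step keeps a fraction of the previous step, which helps the weights roll through flat regions and shallow local minima.

def momentum_update(w, grad, prev_delta, eta=0.1, alpha=0.9):
    # delta_w(t) = -eta * dE/dw + alpha * delta_w(t-1)
    delta = -eta * grad + alpha * prev_delta
    return w + delta, delta   # new weights and the step to remember for next time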
References
• Artificial Neural Networks, https://en.wikipedia.org/wiki/Artificial_neural_network
• Back propagation Algorithm, https://en.wikipedia.org/wiki/Backpropagation
• Lecture Slides from Dr. Vilalta’s machine learning class
Questions??
Thank you!!