
Foundations of Machine Learning

Module 6: Neural Network


Part B: Multi-layer Neural
Network
Sudeshna Sarkar
IIT Kharagpur
Limitations of Perceptrons
• Perceptrons have a monotonicity property: if a link has a positive weight, the activation can only increase as the corresponding input value increases (irrespective of the other input values).
• They cannot represent functions in which input interactions can cancel one another's effect (e.g. XOR; a short argument is sketched below).
• They can represent only linearly separable functions.
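To see why XOR is out of reach (a worked argument added for illustration, not from the original slides): suppose a single threshold unit with weights $w_1, w_2$ and bias $b$ computed XOR of $x_1, x_2$. It would need $w_1 + b > 0$ and $w_2 + b > 0$ (to output 1 on inputs (1,0) and (0,1)) together with $b \le 0$ and $w_1 + w_2 + b \le 0$ (to output 0 on (0,0) and (1,1)). Adding the first two inequalities gives $w_1 + w_2 + 2b > 0$, which with $w_1 + w_2 + b \le 0$ forces $b > 0$, contradicting $b \le 0$. No choice of weights works.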
A solution: multiple layers
[Figure: a network with an input layer (x1, x2), a hidden layer (z1, z2), and an output layer (y).]
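As a concrete illustration (an assumed construction, not taken from the original slides), such a network can compute XOR by letting the hidden units act as OR and NAND and the output unit as AND. A minimal Python sketch with hand-chosen weights and step activations:

def step(a):
    # threshold activation: 1 if the weighted sum is positive, else 0
    return 1.0 if a > 0 else 0.0

def xor_net(x1, x2):
    # hidden layer: z1 behaves like OR(x1, x2), z2 like NAND(x1, x2)
    z1 = step(1.0 * x1 + 1.0 * x2 - 0.5)
    z2 = step(-1.0 * x1 - 1.0 * x2 + 1.5)
    # output layer: y behaves like AND(z1, z2), giving XOR overall
    return step(1.0 * z1 + 1.0 * z2 - 1.5)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))   # prints the XOR truth table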
Power/Expressiveness of Multilayer Networks
• Can represent interactions among inputs
• Two-layer networks can represent any Boolean function, and any continuous function (to within a tolerance), as long as the number of hidden units is sufficient and appropriate activation functions are used
• Learning algorithms exist, but with weaker guarantees than perceptron learning algorithms
Multilayer Network

[Figure: a multilayer network in which the inputs feed a first hidden layer, then a second hidden layer, and finally the output layer.]
Two-layer back-propagation neural network
[Figure: a two-layer back-propagation network. Input signals x1 … xn enter the input layer; hidden unit j receives them through weights wij; output unit k produces yk (outputs y1 … yn2) through weights wjk. Error signals travel in the opposite direction, from the output layer back towards the inputs.]
The back-propagation training algorithm
• Step 1: Initialisation
Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small range
1

v01
v11 1
x1 1 1 w11
v21 w01

1 y1
v22
x2 2 2 w21
v22
Input v02 Output

1
x z y
Backprop
• Initialization
– Set all the weights and threshold levels of the network to
random numbers uniformly distributed inside a small
range
• Forward computing:
– Apply an input vector x to the input units
– Compute the activation/output vector z on the hidden layer:
$z_j = \varphi\left(\sum_i v_{ij} x_i\right)$
– Compute the output vector y on the output layer:
$y_k = \varphi\left(\sum_j w_{jk} z_j\right)$
y is the result of the computation.
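A minimal NumPy sketch of the initialisation and forward pass for one hidden layer; the logistic sigmoid, the ±0.5 initialisation range and the bias handling are illustrative assumptions, while the formulas for z and y follow the slide above:

import numpy as np

def sigmoid(a):
    # logistic activation: phi(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 2, 1

# initialisation: small uniform random weights; row 0 of each matrix holds the bias/threshold
V = rng.uniform(-0.5, 0.5, size=(n_in + 1, n_hidden))   # input  -> hidden weights v_ij
W = rng.uniform(-0.5, 0.5, size=(n_hidden + 1, n_out))  # hidden -> output weights w_jk

def forward(x, V, W):
    x1 = np.concatenate(([1.0], x))   # prepend a constant 1 so the bias acts as a weight
    z = sigmoid(x1 @ V)               # z_j = phi(sum_i v_ij x_i)
    z1 = np.concatenate(([1.0], z))
    y = sigmoid(z1 @ W)               # y_k = phi(sum_j w_jk z_j)
    return z, y

z, y = forward(np.array([0.0, 1.0]), V, W)
print("hidden activations:", z, "output:", y)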
Learning for BP Nets
• Update of the weights in W (between the output and hidden layers):
– delta rule
• Not applicable to updating V (between the input and hidden layers):
– we don't know the target values for the hidden units z1, z2, …, zp
• Solution: propagate the errors at the output units back to the hidden units to drive the update of the weights in V (again by the delta rule) – this is error BACKPROPAGATION learning
• Error backpropagation can be continued downward if the net has more than one hidden layer
• How do we compute the errors on the hidden units?
Derivation
• For one output neuron, the error function is
$E = \frac{1}{2}(y - \hat{y})^2$
• For each unit $j$, the output $o_j$ is defined as
$o_j = \varphi(\mathrm{net}_j) = \varphi\left(\sum_{k=1}^{n} w_{kj}\, o_k\right)$
The input $\mathrm{net}_j$ to a neuron is the weighted sum of the outputs $o_k$ of the previous $n$ neurons.
• Finding the derivative of the error:
$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ij}}$
Derivation
• Finding the derivative of the error:
$\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial \mathrm{net}_j} \frac{\partial \mathrm{net}_j}{\partial w_{ij}}$
$\frac{\partial \mathrm{net}_j}{\partial w_{ij}} = \frac{\partial}{\partial w_{ij}} \sum_{k=1}^{n} w_{kj}\, o_k = o_i$
$\frac{\partial o_j}{\partial \mathrm{net}_j} = \frac{\partial}{\partial \mathrm{net}_j} \varphi(\mathrm{net}_j) = \varphi(\mathrm{net}_j)\left(1 - \varphi(\mathrm{net}_j)\right)$ (for the logistic sigmoid $\varphi$)
• Consider $E$ as a function of the inputs of all neurons $Z = \{z_1, z_2, \ldots\}$ receiving input from neuron $j$:
$\frac{\partial E(o_j)}{\partial o_j} = \frac{\partial E(\mathrm{net}_{z_1}, \mathrm{net}_{z_2}, \ldots)}{\partial o_j}$
Taking the total derivative with respect to $o_j$, a recursive expression for the derivative is obtained:
$\frac{\partial E}{\partial o_j} = \sum_l \frac{\partial E}{\partial \mathrm{net}_{z_l}} \frac{\partial \mathrm{net}_{z_l}}{\partial o_j} = \sum_l \frac{\partial E}{\partial o_l} \frac{\partial o_l}{\partial \mathrm{net}_{z_l}}\, w_{j z_l}$
• Therefore, the derivative with respect to $o_j$ can be calculated if all the derivatives with respect to the outputs $o_{z_l}$ of the next layer – the one closer to the output neuron – are known.
• Putting it all together:
$\frac{\partial E}{\partial w_{ij}} = \delta_j\, o_i$
with
$\delta_j = \frac{\partial E}{\partial o_j} \frac{\partial o_j}{\partial \mathrm{net}_j} =
\begin{cases}
(o_j - t_j)\, o_j (1 - o_j) & \text{if } j \text{ is an output neuron} \\
\left(\sum_{z_l} \delta_{z_l} w_{j z_l}\right) o_j (1 - o_j) & \text{if } j \text{ is an inner neuron}
\end{cases}$
• To update the weight $w_{ij}$ using gradient descent, one must choose a learning rate $\eta$:
$\Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}}$
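Instantiating these formulas for the single-hidden-layer network used earlier (inputs x, hidden outputs z, outputs y, weights v and w, targets t; this restatement is added for concreteness and is not from the original slides):
$\delta_k = (y_k - t_k)\, y_k (1 - y_k)$ and $\Delta w_{jk} = -\eta\, \delta_k z_j$ for the output layer;
$\delta_j = \left(\sum_k \delta_k w_{jk}\right) z_j (1 - z_j)$ and $\Delta v_{ij} = -\eta\, \delta_j x_i$ for the hidden layer.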
Backpropagation Algorithm
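A compact NumPy sketch of the whole procedure – initialisation, forward pass, backward pass and gradient-descent update – trained on XOR for illustration. It is an assumed implementation of the update rules derived above, not the listing from the original slide; the learning rate, initialisation range, network size, random seed and epoch count are arbitrary choices.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

rng = np.random.default_rng(1)
eta = 0.5                                    # learning rate

# XOR training set: inputs X and targets T
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

# Step 1: initialisation – small uniform random weights; row 0 holds the bias/threshold
V = rng.uniform(-0.5, 0.5, size=(3, 2))      # input (2 + bias) -> hidden (2)
W = rng.uniform(-0.5, 0.5, size=(3, 1))      # hidden (2 + bias) -> output (1)

for _ in range(20000):
    for x, t in zip(X, T):
        # forward pass
        x1 = np.concatenate(([1.0], x))      # bias input
        z = sigmoid(x1 @ V)                  # hidden activations z_j
        z1 = np.concatenate(([1.0], z))
        y = sigmoid(z1 @ W)                  # outputs y_k

        # backward pass: deltas for output and hidden units
        delta_out = (y - t) * y * (1 - y)                 # (y_k - t_k) y_k (1 - y_k)
        delta_hid = (W[1:] @ delta_out) * z * (1 - z)     # (sum_k delta_k w_jk) z_j (1 - z_j)

        # gradient-descent updates: Delta(weight) = -eta * delta * (unit's input)
        W -= eta * np.outer(z1, delta_out)
        V -= eta * np.outer(x1, delta_hid)

for x in X:
    x1 = np.concatenate(([1.0], x))
    z1 = np.concatenate(([1.0], sigmoid(x1 @ V)))
    # typically approaches 0, 1, 1, 0 (convergence depends on the random initialisation)
    print(x, "->", sigmoid(z1 @ W).item())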
Thank You
