The document outlines the process of updating parameters in a neural network using gradient descent, focusing on the output and hidden layers. It provides detailed equations for loss differentiation and parameter updates for weights and biases associated with nodes 3, 4, 5, 6, and 7. The methodology emphasizes the use of the chain rule to derive the necessary gradients for effective learning.


Parameter Update 2

Chowdhury Mofizur Rahman


April 23, 2025

Assume that the target class is at node 5, and that the output probabilities $P_5, P_6, P_7$ are obtained by applying softmax to the output-node values $Y_5, Y_6, Y_7$. Start with the output nodes: define the loss function and differentiate it with respect to each output node.

$$\mathrm{Loss} = -\log P_5$$

$$\frac{d\,\mathrm{Loss}}{dP_5} = -\frac{1}{P_5}$$

$$\frac{d\,\mathrm{Loss}}{dY_5} = \frac{d\,\mathrm{Loss}}{dP_5}\cdot\frac{dP_5}{dY_5} = -\frac{1}{P_5}\cdot P_5(1-P_5) = P_5 - 1$$

$$\frac{d\,\mathrm{Loss}}{dY_6} = \frac{d\,\mathrm{Loss}}{dP_5}\cdot\frac{dP_5}{dY_6} = -\frac{1}{P_5}\cdot(-P_5 P_6) = P_6$$

$$\frac{d\,\mathrm{Loss}}{dY_7} = \frac{d\,\mathrm{Loss}}{dP_5}\cdot\frac{dP_5}{dY_7} = -\frac{1}{P_5}\cdot(-P_5 P_7) = P_7$$
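These derivatives can be checked numerically. The following sketch (not part of the original notes; the values of $Y_5, Y_6, Y_7$ are illustrative) compares the analytical gradient $dLoss/dY_i = P_i - \mathbb{1}[i=\text{target}]$ against a central finite difference:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))  # subtract max for numerical stability
    return e / e.sum()

# Illustrative output-node values (Y5, Y6, Y7); node 5 (index 0) is the target
y = np.array([2.0, 1.0, 0.5])
p = softmax(y)

# Analytical gradient of Loss = -log P5: (P5 - 1, P6, P7)
analytic = p - np.array([1.0, 0.0, 0.0])

# Numerical gradient by central finite differences
eps = 1e-6
numeric = np.zeros(3)
for i in range(3):
    yp, ym = y.copy(), y.copy()
    yp[i] += eps
    ym[i] -= eps
    numeric[i] = (-np.log(softmax(yp)[0]) + np.log(softmax(ym)[0])) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-6))  # → True
```

Note that the three components of the analytical gradient sum to zero, since the softmax probabilities sum to one.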
Write the expressions for the output nodes:

$$Y_5 = W_{35} Y_3 + W_{45} Y_4 + b_5$$

$$Y_6 = W_{36} Y_3 + W_{46} Y_4 + b_6$$

$$Y_7 = W_{37} Y_3 + W_{47} Y_4 + b_7$$
Now you can differentiate the loss function with respect to all the output-layer parameters $W_{35}, W_{45}, W_{36}, W_{46}, W_{37}, W_{47}, b_5, b_6, b_7$ using the chain rule:

$$\frac{d\,\mathrm{Loss}}{dW_{35}} = \frac{d\,\mathrm{Loss}}{dY_5}\cdot\frac{dY_5}{dW_{35}} = (P_5 - 1)\,Y_3$$

$$\frac{d\,\mathrm{Loss}}{dW_{45}} = \frac{d\,\mathrm{Loss}}{dY_5}\cdot\frac{dY_5}{dW_{45}} = (P_5 - 1)\,Y_4$$

$$\frac{d\,\mathrm{Loss}}{dW_{36}} = \frac{d\,\mathrm{Loss}}{dY_6}\cdot\frac{dY_6}{dW_{36}} = P_6\,Y_3$$

$$\frac{d\,\mathrm{Loss}}{dW_{46}} = \frac{d\,\mathrm{Loss}}{dY_6}\cdot\frac{dY_6}{dW_{46}} = P_6\,Y_4$$

$$\frac{d\,\mathrm{Loss}}{dW_{37}} = \frac{d\,\mathrm{Loss}}{dY_7}\cdot\frac{dY_7}{dW_{37}} = P_7\,Y_3$$

$$\frac{d\,\mathrm{Loss}}{dW_{47}} = \frac{d\,\mathrm{Loss}}{dY_7}\cdot\frac{dY_7}{dW_{47}} = P_7\,Y_4$$

$$\frac{d\,\mathrm{Loss}}{db_5} = \frac{d\,\mathrm{Loss}}{dY_5}\cdot\frac{dY_5}{db_5} = P_5 - 1$$

$$\frac{d\,\mathrm{Loss}}{db_6} = \frac{d\,\mathrm{Loss}}{dY_6}\cdot\frac{dY_6}{db_6} = P_6$$

$$\frac{d\,\mathrm{Loss}}{db_7} = \frac{d\,\mathrm{Loss}}{dY_7}\cdot\frac{dY_7}{db_7} = P_7$$
Now, using the gradient-descent approach with learning rate $\eta$, the update equations for all the output-layer parameters are:

$$W_{35} = W_{35} - \eta\, Y_3\,(P_5 - 1)$$

$$W_{45} = W_{45} - \eta\, Y_4\,(P_5 - 1)$$

$$W_{36} = W_{36} - \eta\, Y_3\, P_6$$

$$W_{46} = W_{46} - \eta\, Y_4\, P_6$$

$$W_{37} = W_{37} - \eta\, Y_3\, P_7$$

$$W_{47} = W_{47} - \eta\, Y_4\, P_7$$

$$b_5 = b_5 - \eta\,(P_5 - 1)$$

$$b_6 = b_6 - \eta\, P_6$$

$$b_7 = b_7 - \eta\, P_7$$
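The output-layer step can be written compactly in matrix form. The sketch below (illustrative values; variable names are my own, not from the notes) performs one such update and confirms that the loss decreases:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

rng = np.random.default_rng(0)
h = np.array([0.7, 0.2])            # hidden outputs (Y3, Y4), illustrative
W = rng.normal(size=(2, 3))         # columns: weights into nodes 5, 6, 7
b = np.zeros(3)                     # biases b5, b6, b7
target = 0                          # node 5 is the target class
eta = 0.1                           # learning rate

def loss(W, b):
    return -np.log(softmax(W.T @ h + b)[target])

before = loss(W, b)

p = softmax(W.T @ h + b)
dY = p.copy()
dY[target] -= 1.0                   # (P5 - 1, P6, P7)
W = W - eta * np.outer(h, dY)       # dLoss/dW_ij = Y_i * (dLoss/dY_j)
b = b - eta * dY

after = loss(W, b)
print(after < before)  # → True
```

The outer product reproduces exactly the nine scalar updates above: each weight gradient is the product of its input ($Y_3$ or $Y_4$) and the error at its output node.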
Now consider the hidden nodes 3 and 4. Write the expressions for $Y_3$ and $Y_4$:

$$Y_3 = \mathrm{Relu}(W_{13} X_1 + W_{23} X_2 + b_3)$$

$$Y_4 = \mathrm{Relu}(W_{14} X_1 + W_{24} X_2 + b_4)$$
Now differentiate the loss function with respect to the hidden outputs $Y_3$ and $Y_4$. Both nodes 3 and 4 feed into all three output nodes 5, 6 and 7, so their gradients sum over those paths. Following the chain rule:

$$\frac{d\,\mathrm{Loss}}{dY_3} = \frac{d\,\mathrm{Loss}}{dY_5}\cdot\frac{dY_5}{dY_3} + \frac{d\,\mathrm{Loss}}{dY_6}\cdot\frac{dY_6}{dY_3} + \frac{d\,\mathrm{Loss}}{dY_7}\cdot\frac{dY_7}{dY_3} = (P_5-1)\,W_{35} + P_6\,W_{36} + P_7\,W_{37}$$

$$\frac{d\,\mathrm{Loss}}{dY_4} = \frac{d\,\mathrm{Loss}}{dY_5}\cdot\frac{dY_5}{dY_4} + \frac{d\,\mathrm{Loss}}{dY_6}\cdot\frac{dY_6}{dY_4} + \frac{d\,\mathrm{Loss}}{dY_7}\cdot\frac{dY_7}{dY_4} = (P_5-1)\,W_{45} + P_6\,W_{46} + P_7\,W_{47}$$
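In matrix form, these two sums are a single product of the output-layer weight matrix with the output-error vector. A small sketch with illustrative weight and probability values (assumed, not from the notes):

```python
import numpy as np

# Weights from hidden nodes (3, 4) into output nodes (5, 6, 7)
W = np.array([[0.5, -0.3, 0.8],    # row 0: W35, W36, W37
              [0.2,  0.7, -0.1]])  # row 1: W45, W46, W47
p = np.array([0.6, 0.3, 0.1])      # softmax outputs P5, P6, P7
dY = p - np.array([1.0, 0.0, 0.0]) # (P5 - 1, P6, P7)

# dLoss/dY3 = (P5-1)W35 + P6*W36 + P7*W37, and similarly for Y4:
# both at once as a matrix-vector product
dh = W @ dY
print(np.round(dh, 4))  # → [-0.21  0.12]
```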

Now we can differentiate the loss function with respect to all the hidden-layer parameters $W_{13}, W_{23}, W_{14}, W_{24}, b_3$ and $b_4$. Here $\mathrm{Relu}'(z) = 1$ if $z > 0$ and $0$ otherwise.

$$\frac{d\,\mathrm{Loss}}{dW_{13}} = \frac{d\,\mathrm{Loss}}{dY_3}\cdot\frac{dY_3}{dW_{13}} = \{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)\,X_1$$

$$\frac{d\,\mathrm{Loss}}{dW_{23}} = \frac{d\,\mathrm{Loss}}{dY_3}\cdot\frac{dY_3}{dW_{23}} = \{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)\,X_2$$

$$\frac{d\,\mathrm{Loss}}{dW_{14}} = \frac{d\,\mathrm{Loss}}{dY_4}\cdot\frac{dY_4}{dW_{14}} = \{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)\,X_1$$

$$\frac{d\,\mathrm{Loss}}{dW_{24}} = \frac{d\,\mathrm{Loss}}{dY_4}\cdot\frac{dY_4}{dW_{24}} = \{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)\,X_2$$

$$\frac{d\,\mathrm{Loss}}{db_3} = \frac{d\,\mathrm{Loss}}{dY_3}\cdot\frac{dY_3}{db_3} = \{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)$$

$$\frac{d\,\mathrm{Loss}}{db_4} = \frac{d\,\mathrm{Loss}}{dY_4}\cdot\frac{dY_4}{db_4} = \{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)$$
Therefore, again stepping against the gradient, the update equations for all the hidden-layer parameters are as follows:

$$W_{13} = W_{13} - \eta\,\{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)\,X_1$$

$$W_{23} = W_{23} - \eta\,\{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)\,X_2$$

$$W_{14} = W_{14} - \eta\,\{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)\,X_1$$

$$W_{24} = W_{24} - \eta\,\{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)\,X_2$$

$$b_3 = b_3 - \eta\,\{(P_5-1)W_{35}+P_6 W_{36}+P_7 W_{37}\}\,\mathrm{Relu}'(W_{13}X_1+W_{23}X_2+b_3)$$

$$b_4 = b_4 - \eta\,\{(P_5-1)W_{45}+P_6 W_{46}+P_7 W_{47}\}\,\mathrm{Relu}'(W_{14}X_1+W_{24}X_2+b_4)$$
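The full derivation can be verified end to end with finite differences. The sketch below builds the 2-2-3 network with illustrative values (all numbers and variable names are assumptions for the example), computes the hidden-layer gradient $d\,\mathrm{Loss}/dW_{13}$ by the formulas above, and checks it numerically:

```python
import numpy as np

def softmax(y):
    e = np.exp(y - np.max(y))
    return e / e.sum()

def relu(z):
    return np.maximum(z, 0.0)

# Illustrative values for the 2-2-3 network (assumed, not from the notes)
x = np.array([1.0, 0.5])                       # inputs X1, X2
Wh = np.array([[0.4, -0.2], [0.3, 0.6]])       # rows: (W13, W14), (W23, W24)
bh = np.array([0.1, 0.2])                      # b3, b4
Wo = np.array([[0.5, -0.3, 0.8],               # rows: (W35, W36, W37),
               [0.2, 0.7, -0.1]])              #       (W45, W46, W47)
bo = np.zeros(3)                               # b5, b6, b7
target = 0                                     # target class at node 5

def forward_loss(Wh, bh, Wo, bo):
    h = relu(Wh.T @ x + bh)                    # Y3, Y4
    p = softmax(Wo.T @ h + bo)                 # P5, P6, P7
    return -np.log(p[target]), h, p

loss, h, p = forward_loss(Wh, bh, Wo, bo)
dY = p.copy(); dY[target] -= 1.0               # (P5 - 1, P6, P7)
dh = Wo @ dY                                   # dLoss/dY3, dLoss/dY4
z = Wh.T @ x + bh                              # pre-activations of nodes 3, 4
dz = dh * (z > 0)                              # multiply by Relu'(...)
dWh = np.outer(x, dz)                          # hidden weight gradients

# Check dLoss/dW13 against a central finite difference
eps = 1e-6
Wp, Wm = Wh.copy(), Wh.copy()
Wp[0, 0] += eps
Wm[0, 0] -= eps
numeric = (forward_loss(Wp, bh, Wo, bo)[0]
           - forward_loss(Wm, bh, Wo, bo)[0]) / (2 * eps)
print(np.isclose(dWh[0, 0], numeric, atol=1e-6))  # → True
```

With these values both pre-activations are positive, so $\mathrm{Relu}'$ is 1 for both hidden nodes; if a pre-activation were negative, the corresponding gradients would be zero and those parameters would not move.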

[Figure: hand-drawn diagram of the network, with inputs $X_1, X_2$, hidden nodes 3 and 4, output nodes 5, 6 and 7, and weight/bias labels $W_{ij}$, $b_i$; the labels did not survive extraction.]