Lecture 15: Parameter Update (Part 2)
Assume that the target class is at node 5. First, start with the output
nodes: define the loss function and differentiate it with respect to all
the output nodes.
\[ \text{Loss} = -\log P_5 \]
\[ \frac{d\,\text{Loss}}{dP_5} = -\frac{1}{P_5} \]
\[ \frac{d\,\text{Loss}}{dY_5} = \frac{d\,\text{Loss}}{dP_5}\cdot\frac{dP_5}{dY_5} = -\frac{1}{P_5}\cdot P_5(1-P_5) = P_5 - 1 \]
\[ \frac{d\,\text{Loss}}{dY_6} = \frac{d\,\text{Loss}}{dP_5}\cdot\frac{dP_5}{dY_6} = -\frac{1}{P_5}\cdot(-P_5 P_6) = P_6 \]
\[ \frac{d\,\text{Loss}}{dY_7} = \frac{d\,\text{Loss}}{dP_5}\cdot\frac{dP_5}{dY_7} = -\frac{1}{P_5}\cdot(-P_5 P_7) = P_7 \]
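These gradients can be checked numerically. The sketch below (plain Python, with made-up logit values; the names are illustrative) computes the softmax probabilities, forms the analytic gradient $P_k - \mathbf{1}[k=\text{target}]$ derived above, and compares it against a finite-difference estimate of $-\log P_{\text{target}}$.

```python
import math

def softmax(y):
    """Softmax over a list of logits, shifted by the max for numerical stability."""
    m = max(y)
    e = [math.exp(v - m) for v in y]
    s = sum(e)
    return [v / s for v in e]

# Hypothetical logits for the output nodes Y5, Y6, Y7; index 0 plays the
# role of node 5, the target class.
Y = [2.0, 1.0, 0.5]
target = 0

P = softmax(Y)
# Analytic gradients derived above: dLoss/dY5 = P5 - 1, dLoss/dY6 = P6, dLoss/dY7 = P7.
analytic = [P[k] - (1.0 if k == target else 0.0) for k in range(len(Y))]

# Finite-difference estimate of d(-log P_target)/dY_k as a sanity check.
eps = 1e-6
numeric = []
for k in range(len(Y)):
    Yp = list(Y)
    Yp[k] += eps
    numeric.append((-math.log(softmax(Yp)[target]) + math.log(P[target])) / eps)
```

Note that the gradients sum to zero, since the softmax probabilities always sum to one.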
Write the expressions for the output nodes:
\[ Y_5 = W_{35}Y_3 + W_{45}Y_4 + b_5 \]
\[ Y_6 = W_{36}Y_3 + W_{46}Y_4 + b_6 \]
\[ Y_7 = W_{37}Y_3 + W_{47}Y_4 + b_7 \]
Now you can differentiate the loss function with respect to all the output-layer
parameters $W_{35}, W_{45}, W_{36}, W_{46}, W_{37}, W_{47}, b_5, b_6, b_7$ using the chain rule:
\[ \frac{d\,\text{Loss}}{dW_{35}} = \frac{d\,\text{Loss}}{dY_5}\cdot\frac{dY_5}{dW_{35}} = (P_5-1)\,Y_3 \]
\[ \frac{d\,\text{Loss}}{dW_{45}} = \frac{d\,\text{Loss}}{dY_5}\cdot\frac{dY_5}{dW_{45}} = (P_5-1)\,Y_4 \]
\[ \frac{d\,\text{Loss}}{dW_{36}} = \frac{d\,\text{Loss}}{dY_6}\cdot\frac{dY_6}{dW_{36}} = P_6\,Y_3 \]
\[ \frac{d\,\text{Loss}}{dW_{46}} = \frac{d\,\text{Loss}}{dY_6}\cdot\frac{dY_6}{dW_{46}} = P_6\,Y_4 \]
\[ \frac{d\,\text{Loss}}{dW_{37}} = \frac{d\,\text{Loss}}{dY_7}\cdot\frac{dY_7}{dW_{37}} = P_7\,Y_3 \]
\[ \frac{d\,\text{Loss}}{dW_{47}} = \frac{d\,\text{Loss}}{dY_7}\cdot\frac{dY_7}{dW_{47}} = P_7\,Y_4 \]
\[ \frac{d\,\text{Loss}}{db_5} = \frac{d\,\text{Loss}}{dY_5}\cdot\frac{dY_5}{db_5} = P_5 - 1 \]
\[ \frac{d\,\text{Loss}}{db_6} = \frac{d\,\text{Loss}}{dY_6}\cdot\frac{dY_6}{db_6} = P_6 \]
\[ \frac{d\,\text{Loss}}{db_7} = \frac{d\,\text{Loss}}{dY_7}\cdot\frac{dY_7}{db_7} = P_7 \]
Now, using the gradient-descent approach, the update equations for all the output-layer
parameters follow from the gradients above ($\eta$ is the learning rate):
\[ W_{35} = W_{35} - \eta\,(P_5-1)\,Y_3 \qquad W_{45} = W_{45} - \eta\,(P_5-1)\,Y_4 \]
\[ W_{36} = W_{36} - \eta\,P_6\,Y_3 \qquad W_{46} = W_{46} - \eta\,P_6\,Y_4 \]
\[ W_{37} = W_{37} - \eta\,P_7\,Y_3 \qquad W_{47} = W_{47} - \eta\,P_7\,Y_4 \]
\[ b_5 = b_5 - \eta\,(P_5 - 1) \]
\[ b_6 = b_6 - \eta\,P_6 \]
\[ b_7 = b_7 - \eta\,P_7 \]
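As a concrete sketch of the output-layer update step (plain Python; the hidden activations, probabilities, initial weights, and learning rate are all made-up example values, not from the notes):

```python
# Made-up example values: hidden activations, softmax outputs (node 5 is the
# target class, so its gradient uses P5 - 1), and learning rate.
Y3, Y4 = 0.8, 0.3
P5, P6, P7 = 0.7, 0.2, 0.1
eta = 0.1

# Gradients derived above.
dW35, dW45 = (P5 - 1) * Y3, (P5 - 1) * Y4
dW36, dW46 = P6 * Y3, P6 * Y4
dW37, dW47 = P7 * Y3, P7 * Y4
db5, db6, db7 = P5 - 1, P6, P7

# Arbitrary current parameter values, then gradient-descent updates:
# parameter <- parameter - eta * gradient.
W35, W45, W36, W46, W37, W47 = 0.5, -0.2, 0.1, 0.4, -0.3, 0.2
b5, b6, b7 = 0.0, 0.0, 0.0
W35 -= eta * dW35
W45 -= eta * dW45
W36 -= eta * dW36
W46 -= eta * dW46
W37 -= eta * dW37
W47 -= eta * dW47
b5 -= eta * db5
b6 -= eta * db6
b7 -= eta * db7
```

Because $P_5 - 1$ is negative, the weights feeding the target node move up, while the weights feeding the non-target nodes move down (for positive activations).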
Now consider the hidden nodes 3 and 4. Write the expressions for $Y_3$ and
$Y_4$ (with ReLU as the hidden activation):
\[ Y_3 = \mathrm{ReLU}(W_{13}X_1 + W_{23}X_2 + b_3) \]
\[ Y_4 = \mathrm{ReLU}(W_{14}X_1 + W_{24}X_2 + b_4) \]
Now we can differentiate the loss function with respect to all the hidden-layer
parameters $W_{13}, W_{23}, W_{14}, W_{24}, b_3$ and $b_4$. The resulting
gradient-descent updates are (note the minus sign, as in the output-layer updates):
\[ W_{13} = W_{13} - \eta\,\big((P_5-1)W_{35} + P_6 W_{36} + P_7 W_{37}\big)\,\mathrm{ReLU}'(W_{13}X_1 + W_{23}X_2 + b_3)\,X_1 \]
\[ W_{23} = W_{23} - \eta\,\big((P_5-1)W_{35} + P_6 W_{36} + P_7 W_{37}\big)\,\mathrm{ReLU}'(W_{13}X_1 + W_{23}X_2 + b_3)\,X_2 \]
\[ W_{14} = W_{14} - \eta\,\big((P_5-1)W_{45} + P_6 W_{46} + P_7 W_{47}\big)\,\mathrm{ReLU}'(W_{14}X_1 + W_{24}X_2 + b_4)\,X_1 \]
\[ W_{24} = W_{24} - \eta\,\big((P_5-1)W_{45} + P_6 W_{46} + P_7 W_{47}\big)\,\mathrm{ReLU}'(W_{14}X_1 + W_{24}X_2 + b_4)\,X_2 \]
\[ b_3 = b_3 - \eta\,\big((P_5-1)W_{35} + P_6 W_{36} + P_7 W_{37}\big)\,\mathrm{ReLU}'(W_{13}X_1 + W_{23}X_2 + b_3) \]
\[ b_4 = b_4 - \eta\,\big((P_5-1)W_{45} + P_6 W_{46} + P_7 W_{47}\big)\,\mathrm{ReLU}'(W_{14}X_1 + W_{24}X_2 + b_4) \]
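Putting the forward pass and the hidden-layer update together, a minimal end-to-end sketch for this 2-input, 2-hidden, 3-output network (plain Python; all inputs and initial parameter values are hypothetical, chosen only for illustration):

```python
import math

def relu(z):
    return max(0.0, z)

def relu_prime(z):
    # Derivative of ReLU; the value at z == 0 is taken as 0 here by convention.
    return 1.0 if z > 0 else 0.0

# Hypothetical inputs and initial parameters (illustrative values only).
X1, X2 = 1.0, 0.5
W13, W23, b3 = 0.2, -0.1, 0.05
W14, W24, b4 = 0.4, 0.3, -0.2
W35, W45, b5 = 0.1, 0.2, 0.0
W36, W46, b6 = -0.3, 0.1, 0.0
W37, W47, b7 = 0.2, -0.2, 0.0
eta = 0.1

# Forward pass: hidden layer (ReLU), output layer (softmax).
Z3 = W13 * X1 + W23 * X2 + b3
Z4 = W14 * X1 + W24 * X2 + b4
Y3, Y4 = relu(Z3), relu(Z4)
Y5 = W35 * Y3 + W45 * Y4 + b5
Y6 = W36 * Y3 + W46 * Y4 + b6
Y7 = W37 * Y3 + W47 * Y4 + b7
m = max(Y5, Y6, Y7)
e5, e6, e7 = math.exp(Y5 - m), math.exp(Y6 - m), math.exp(Y7 - m)
s = e5 + e6 + e7
P5, P6, P7 = e5 / s, e6 / s, e7 / s

# Signals backpropagated into the hidden nodes (node 5 is the target class).
g3 = ((P5 - 1) * W35 + P6 * W36 + P7 * W37) * relu_prime(Z3)
g4 = ((P5 - 1) * W45 + P6 * W46 + P7 * W47) * relu_prime(Z4)

# Hidden-layer gradient-descent updates.
W13 -= eta * g3 * X1
W23 -= eta * g3 * X2
b3 -= eta * g3
W14 -= eta * g4 * X1
W24 -= eta * g4 * X2
b4 -= eta * g4
```

With these particular values both pre-activations $Z_3$ and $Z_4$ are positive, so $\mathrm{ReLU}'$ passes the gradient through unchanged; for a negative pre-activation the corresponding hidden-layer gradients would all be zero.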