Chapter 6 - Backpropagation


Backpropagation

Notes based on
CS231n, Stanford University
EECS 498-007 / 598-005, University of Michigan
with permission from Justin Johnson



Previously: Neural Networks

Feature transform + linear classifier: applying a feature transform before the linear classifier allows nonlinear decision boundaries. Neural networks act as learnable feature transforms.


Previously: Neural Networks

Linear classifier: one template per class. Neural networks: many reusable templates.

From linear classifiers to fully-connected networks:

s = W2 max(0, W1x)

Input x: 3072 → Hidden layer h: 100 (via W1) → Output s: 10 (via W2)
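As a concrete sketch, here is the score function above in NumPy, with the slide's layer sizes (the random initialization and variable names are illustrative assumptions, not the course's reference code):

import numpy as np

# Two-layer network s = W2 max(0, W1 x), sizes 3072 -> 100 -> 10
x = np.random.randn(3072)              # input: flattened 32x32x3 image
W1 = np.random.randn(100, 3072) * 0.01 # first-layer weights (illustrative init)
W2 = np.random.randn(10, 100) * 0.01   # second-layer weights

h = np.maximum(0, W1 @ x)              # hidden layer: ReLU feature transform
s = W2 @ h                             # output: 10 class scores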


Optimization: How to compute gradients?

Nonlinear score function: s = f(x; W1, W2) = W2 max(0, W1x)

SVM loss on predictions: L_i = Σ_{j≠y_i} max(0, s_j − s_{y_i} + 1)

Total loss (data loss + regularization): L = (1/N) Σ_i L_i + λR(W)

We want the gradient (slope) of the loss with respect to the weights. If we can compute ∂L/∂W1 and ∂L/∂W2, then we can learn W1 and W2.


Optimization

Iteratively take steps in the direction of steepest descent, i.e. opposite the gradient.
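A minimal sketch of that loop on a toy quadratic loss (the loss function, learning rate, and shapes here are illustrative; in the real setting the gradient comes from backprop, as developed in the rest of this chapter):

import numpy as np

# Vanilla gradient descent on L(W) = sum(W^2), a stand-in for a network loss.
def loss_and_grad(W):
    return np.sum(W * W), 2 * W

W = np.random.randn(10, 3072) * 0.01   # illustrative weight matrix
step_size = 1e-2                       # illustrative learning rate
for t in range(100):
    loss, grad_W = loss_and_grad(W)
    W -= step_size * grad_W            # step opposite the gradient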


Gradient Descent

Numeric gradient: approximate, slow, easy to write.

Analytic gradient: exact, fast, error-prone.

Today: computing the analytic gradient for arbitrarily complex functions using computational graphs.
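In practice the two are combined: implement the analytic gradient, then verify it against a numeric gradient computed with centered differences. A minimal sketch (the test function is an illustrative stand-in for a real loss):

import numpy as np

# Numeric gradient check: df/dx_i ≈ (f(x + h e_i) - f(x - h e_i)) / (2h)
def numeric_gradient(f, x, h=1e-5):
    grad = np.zeros_like(x)
    for i in range(x.size):
        old = x.flat[i]
        x.flat[i] = old + h
        fp = f(x)
        x.flat[i] = old - h
        fm = f(x)
        x.flat[i] = old                 # restore the entry
        grad.flat[i] = (fp - fm) / (2 * h)
    return grad

f = lambda x: np.sum(x ** 2)            # analytic gradient is 2x
x = np.random.randn(5)
assert np.allclose(numeric_gradient(f, x), 2 * x, atol=1e-6)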


Computational Graphs

[Figure: the linear classifier loss drawn as a graph — inputs x and W feed a matrix multiply producing scores s; the scores feed the hinge (SVM) loss L_i; a regularization term R(W) is added to give the total loss L.]


Deep Network (AlexNet)

[Figure: AlexNet as a computational graph, mapping the input image and the weights, through many layers, to the loss.]


Neural Turing Machine

[Figure: the Neural Turing Machine's far larger computational graph, likewise flowing from input image to loss. Figure from a Twitter post by Andrej Karpathy.]


Backpropagation: Simple Example

f(x, y, z) = (x + y)z
e.g. x = -2, y = 5, z = -4

1. Forward pass: Compute outputs
q = x + y = 3
f = qz = -12

2. Backward pass: Compute derivatives
Want ∂f/∂x, ∂f/∂y, ∂f/∂z.
∂f/∂z = q = 3
∂f/∂q = z = -4
Chain rule: ∂f/∂x = (∂f/∂q)(∂q/∂x) = (-4)(1) = -4
∂f/∂y = (∂f/∂q)(∂q/∂y) = (-4)(1) = -4

Downstream gradient = local gradient * upstream gradient
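The same example as a few lines of Python, mirroring the numbers above (a minimal sketch; variable names are illustrative):

# Forward/backward pass for f(x, y, z) = (x + y) * z
x, y, z = -2, 5, -4

# Forward pass: compute outputs
q = x + y            # q = 3
f = q * z            # f = -12

# Backward pass: downstream = local * upstream
df_df = 1.0          # base case: gradient of f w.r.t. itself
df_dz = q * df_df    # local gradient of f w.r.t. z is q  -> 3
df_dq = z * df_df    # local gradient of f w.r.t. q is z  -> -4
df_dx = 1 * df_dq    # chain rule through q = x + y       -> -4
df_dy = 1 * df_dq    # chain rule through q = x + y       -> -4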


Gradient Flow at a Single Node

Each node f in the graph takes inputs (say x and y) and produces an output z. During the backward pass it receives the upstream gradient ∂L/∂z, multiplies it by its local gradients ∂z/∂x and ∂z/∂y, and passes the resulting downstream gradients ∂L/∂x and ∂L/∂y back to its inputs.


Another Example

f(w, x) = 1 / (1 + e^{-(w0·x0 + w1·x1 + w2)})
e.g. w0 = 2.00, x0 = -1.00, w1 = -3.00, x1 = -2.00, w2 = -3.00

Forward pass: Compute outputs
w0·x0 = -2.00; w1·x1 = 6.00; sum = 4.00; + w2 = 1.00; negate = -1.00; exp = 0.37; + 1 = 1.37; 1/x = 0.73

Backward pass: Compute gradients, one gate at a time.
At each gate, [downstream gradient] = [local gradient] * [upstream gradient].

Base case: gradient at the output = 1.00
1/x gate: ∂/∂x (1/x) = -1/x² → (-1/1.37²) * 1.00 = -0.53
x+1 gate: ∂/∂x (x + 1) = 1 → 1 * (-0.53) = -0.53
eˣ gate: ∂/∂x eˣ = eˣ → e⁻¹ * (-0.53) = -0.20
-x gate: ∂/∂x (-x) = -1 → (-1) * (-0.20) = 0.20
add gates: ∂(x + y)/∂x = 1; ∂(x + y)/∂y = 1 → the upstream 0.20 passes unchanged to both branches and to w2
mul gates: ∂(xy)/∂x = y; ∂(xy)/∂y = x →
grad w0 = x0 * 0.20 = -0.20;  grad x0 = w0 * 0.20 = 0.40
grad w1 = x1 * 0.20 = -0.40;  grad x1 = w1 * 0.20 = -0.60


Another Example

The computational graph is not unique: we can use primitives that have simple local gradients. The chain 1/(1 + e⁻ˣ) at the end of the graph is the sigmoid function σ(x), whose local gradient is

dσ/dx = (1 - σ(x)) σ(x)

so the whole backward step through the sigmoid collapses to one multiplication:

[downstream] = [local] * [upstream] = (1 - 0.73) * 0.73 * 1.0 = 0.20
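For completeness, the sigmoid local gradient follows from basic calculus (a standard derivation, not shown on the slide):

\frac{d\sigma}{dx} = \frac{d}{dx}\left(\frac{1}{1+e^{-x}}\right) = \frac{e^{-x}}{(1+e^{-x})^{2}} = \frac{(1+e^{-x})-1}{1+e^{-x}} \cdot \frac{1}{1+e^{-x}} = \bigl(1-\sigma(x)\bigr)\,\sigma(x)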


Patterns in Gradient Flow

add gate: gradient distributor (copies the upstream gradient unchanged to every input)
mul gate: "swap multiplier" (each input's gradient is the upstream gradient times the other input's value)
copy gate: gradient adder (an output used in several places sums the gradients arriving from all of them)
max gate: gradient router (routes the full upstream gradient to the larger input; the other input gets zero)
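As code, each pattern is a one-line backward rule. A minimal sketch (the function names and signatures are illustrative, not a library API):

# Backward rules for the four gate patterns. `upstream` is dL/d(output);
# each function returns the downstream gradients dL/d(inputs).
def add_backward(upstream):
    return upstream, upstream             # distributor: both inputs get it

def mul_backward(x, y, upstream):
    return y * upstream, x * upstream     # swap multiplier

def copy_backward(upstream1, upstream2):
    return upstream1 + upstream2          # adder: sum gradients from all uses

def max_backward(x, y, upstream):
    return (upstream, 0.0) if x > y else (0.0, upstream)  # router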


Backprop Implementation:
"Flat" gradient code:

import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def f(w0, x0, w1, x1, w2):
    # Forward pass: compute output
    s0 = w0 * x0
    s1 = w1 * x1
    s2 = s0 + s1
    s3 = s2 + w2
    L = sigmoid(s3)

    # Backward pass: compute grads
    grad_L = 1.0                        # base case
    grad_s3 = grad_L * (1 - L) * L      # sigmoid gate: local gradient (1 - L) * L
    grad_w2 = grad_s3                   # add gate: distribute the gradient
    grad_s2 = grad_s3
    grad_s0 = grad_s2                   # add gate: distribute the gradient
    grad_s1 = grad_s2
    grad_w1 = grad_s1 * x1              # multiply gate: swap multiplier
    grad_x1 = grad_s1 * w1
    grad_w0 = grad_s0 * x0              # multiply gate: swap multiplier
    grad_x0 = grad_s0 * w0
    return L, (grad_w0, grad_x0, grad_w1, grad_x1, grad_w2)
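Calling the flat code with the values from the earlier example reproduces the gradients computed gate by gate (a quick check; the printed values match the slides after rounding):

L, grads = f(2.0, -1.0, -3.0, -2.0, -3.0)
print(round(L, 2))                    # 0.73
print([round(g, 1) for g in grads])   # [-0.2, 0.4, -0.4, -0.6, 0.2]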




So far: backprop with scalars

What about vector-valued functions?



Review: Vector Derivatives

Scalar in, scalar out: ordinary derivative ∂y/∂x (a scalar). Vector in, scalar out: gradient ∂y/∂x (a vector of partials, same shape as x). Vector in, vector out: Jacobian ∂y/∂x (a matrix of partials, one row per output element).
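As one concrete instance of a vector-valued backward pass: for an elementwise ReLU the Jacobian is diagonal, so "local times upstream" reduces to an elementwise mask. A minimal sketch (the input and upstream values are illustrative, not from the lecture):

import numpy as np

# Backward pass for y = max(0, x) on a vector. The Jacobian dy/dx is
# diagonal (1 where x > 0, else 0), so multiplying by it is just masking.
x = np.array([1.0, -2.0, 3.0])
upstream = np.array([4.0, -1.0, 5.0])   # dL/dy
downstream = upstream * (x > 0)         # dL/dx = [4.0, 0.0, 5.0]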


Summary

Represent complex expressions as computational graphs.

Forward pass: compute outputs. Backward pass: compute gradients.

During the backward pass, each node in the graph receives upstream gradients and multiplies them by its local gradients to compute downstream gradients.


Problem: So far our classifiers don't respect the spatial structure of images!

f = W2 max(0, W1x)

Input x: 3072 → Hidden layer h: 100 (via W1) → Output s: 10 (via W2)

The image is flattened into a 3072-vector before the first layer, so spatial structure is discarded.


Next time:

Convolutional Neural Networks

