
Section 3: Gradient Descent & Backpropagation

Practice Problems

Problem 1. Computation Graph Review

Let's assume we have a simple function f(x, y, z) = (x + y)z. We can break this up into the equations q = x + y and f(x, y, z) = qz. Using this simplified notation, we can also represent the function as a computation graph:

Now let's assume that we are evaluating this function at x = -2, y = 5, and z = -4. In addition, let the value of the upstream gradient (the gradient of the loss with respect to our function, ∂L/∂f) equal 1. These are filled out for you in the computation graph.

Solve for the following values, both symbolically (without plugging in specific values of x, y, z) and evaluated at x = -2, y = 5, z = -4, and ∂L/∂f = 1:

Symbolically:                         Evaluated:

1. ∂f/∂q =                            ∂f/∂q =

2. ∂q/∂x =                            ∂q/∂x =

3. ∂q/∂y =                            ∂q/∂y =

4. ∂f/∂z =                            ∂f/∂z =

5. ∂f/∂x =                            ∂f/∂x =

6. ∂f/∂y =                            ∂f/∂y =

Problem 2. Computation Graphs on Steroids

Now let's perform backpropagation through a single neuron of a neural network with a sigmoid activation. Specifically, we will define the pre-activation z = w0 x0 + w1 x1 + w2 and the activation value α = σ(z) = 1 / (1 + e^(−z)). The computation graph is visualized below:

In the graph we've filled out the forward activations, on the top of the lines, as well as the upstream gradient (the gradient of the loss with respect to our neuron, ∂L/∂α). Use this information to compute the rest of the gradients (labelled with question marks) throughout the graph.

Hint: A calculator may be helpful here.
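
If you'd rather script the arithmetic than reach for a calculator, here is a minimal sketch that runs the forward pass and estimates each gradient by central differences. The input values below are placeholders, not the ones from the figure; substitute the numbers shown in your computation graph.

    import math

    def neuron(w0, x0, w1, x1, w2):
        """Forward pass: sigmoid(w0*x0 + w1*x1 + w2)."""
        z = w0 * x0 + w1 * x1 + w2
        return 1.0 / (1.0 + math.exp(-z))

    # Hypothetical inputs -- substitute the values from your computation graph.
    vals = dict(w0=2.0, x0=-1.0, w1=-3.0, x1=-2.0, w2=-3.0)
    dL_dalpha = 1.0  # upstream gradient given in the figure

    print("alpha =", neuron(**vals))

    # Central-difference estimate of dL/d(param) for each input, to check the
    # numbers you fill in on the graph (no symbolic derivatives needed).
    eps = 1e-6
    for name in vals:
        hi = dict(vals, **{name: vals[name] + eps})
        lo = dict(vals, **{name: vals[name] - eps})
        grad = dL_dalpha * (neuron(**hi) - neuron(**lo)) / (2 * eps)
        print(f"dL/d{name} ~ {grad:.4f}")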

Finally, report the symbolic gradients with respect to the input parameters x0, x1, w0, w1, w2:

1. ∂α/∂x0 =

2. ∂α/∂w0 =

3. ∂α/∂x1 =

4. ∂α/∂w1 =

5. ∂α/∂w2 =

Problem 3. Backpropagation Basics: Dimensions & Derivatives

Let's assume we have a two-layer neural network, as defined below:

z1 = W1 x^(i) + b1
a1 = ReLU(z1)
z2 = W2 a1 + b2
ŷ^(i) = σ(z2)
L^(i) = y^(i) * log(ŷ^(i)) + (1 − y^(i)) * log(1 − ŷ^(i))
J = −(1/m) Σ_{i=1}^{m} L^(i)

Note that x^(i) represents a single input example and is of shape Dx × 1. Further, y^(i) is a single output label and is a scalar. There are m examples in our dataset. We will use Da1 nodes in our hidden layer; that is, z1's shape is Da1 × 1.

1. What are the shapes of W1, b1, W2, b2? If we were vectorizing this network across multiple examples, what would the shapes of the weights/biases be instead? What would the shapes of X and Y be?

2. What is ∂L^(i)/∂ŷ^(i)? Refer to this result as δ1^(i). Using this result, what is ∂J/∂ŷ?

3. What is ∂ŷ^(i)/∂z2? Refer to this result as δ2^(i).

Equations reproduced below for the remaining parts of the question:

z1 = W1 x^(i) + b1
a1 = ReLU(z1)
z2 = W2 a1 + b2
ŷ^(i) = σ(z2)
L^(i) = y^(i) * log(ŷ^(i)) + (1 − y^(i)) * log(1 − ŷ^(i))
J = −(1/m) Σ_{i=1}^{m} L^(i)

Note that x^(i) represents a single input example and is of shape Dx × 1. Further, y^(i) is a single output label and is a scalar. There are m examples in our dataset. We will use Da1 nodes in our hidden layer; that is, z1's shape is Da1 × 1.

4. What is ∂z2/∂a1? Refer to this result as δ3^(i).

5. What is ∂a1/∂z1? Refer to this result as δ4^(i).

6. What is ∂z1/∂W1? Refer to this result as δ5^(i).

7. What is ∂J/∂W1? It may help to reuse work from the previous parts. Hint: Be careful with the shapes!

Problem 4. Bonus!

Apart from simple mathematical operations like multiplication or exponentiation, and piecewise operations like the max used in ReLU activations, we can also perform more complex operations in our neural networks. For this question, we'll explore the sort operation in hopes of better understanding how to backpropagate gradients through a sort. This is applicable in a variety of real-world use cases, including differentiable non-maximum suppression for object detection networks.

For each of the following parts, assume you are given an input vector x ∈ R^n and some upstream gradient vector ∂L/∂F, and you want to calculate ∂L/∂x, where F is a function of x that also returns a vector. You may assume all values in x are distinct. Note that x0 is the first component of the vector x (x = [x0, x1, ..., x_{n-1}]).
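
A useful habit for all three parts: check your hand-derived ∂L/∂x against a finite-difference estimate. The helper below is a small sketch we introduce for that purpose (backprop_check is our own name, not a library function), and the worked example uses an elementwise square so it does not give away any of the parts below.

    import numpy as np

    def backprop_check(F, backward, x, dL_dF, eps=1e-6):
        """Compare a hand-derived backward(x, dL_dF) against finite differences.

        F maps a 1-D array x to a 1-D array F(x); backward returns your claimed
        dL/dx given x and the upstream gradient dL/dF.
        """
        analytic = backward(x, dL_dF)
        numeric = np.zeros_like(x)
        for i in range(x.size):
            bump = np.zeros_like(x)
            bump[i] = eps
            # dL/dx_i = sum_j (dL/dF_j)(dF_j/dx_i), estimated by central differences
            numeric[i] = dL_dF @ (F(x + bump) - F(x - bump)) / (2 * eps)
        return analytic, numeric

    # Worked example with an elementwise square (not one of the parts below):
    x = np.array([3.0, -1.0, 2.0])
    dL_dF = np.array([0.5, -2.0, 1.0])
    analytic, numeric = backprop_check(lambda v: v ** 2, lambda v, g: 2 * v * g,
                                       x, dL_dF)
    print(np.allclose(analytic, numeric))  # True if the derivation matches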

1. F(x) = x0 * x

2. F(x) = sort(x)

3. F(x) = x0 * sort(x)

Section 3 Solutions

Problem 1. Computation Graph Review

1. ∂f/∂q = z = −4
2. ∂q/∂x = 1
3. ∂q/∂y = 1
4. ∂f/∂z = q = x + y = 3
5. ∂f/∂x = z * 1 = z = −4
6. ∂f/∂y = z * 1 = z = −4
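
As a quick sanity check, the chain rule above can be run in a few lines using the values given in the problem (with ∂L/∂f = 1, these are also the gradients of the loss):

    # Forward and backward pass for f(x, y, z) = (x + y) * z at the given values.
    x, y, z = -2.0, 5.0, -4.0

    q = x + y              # 3
    f = q * z              # -12

    df_dq = z              # -4
    df_dz = q              # 3
    dq_dx = dq_dy = 1.0
    df_dx = df_dq * dq_dx  # -4 (chain rule through q)
    df_dy = df_dq * dq_dy  # -4
    print(q, f, df_dq, df_dz, df_dx, df_dy)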

Problem 2. Computation Graphs on Steroids

1. ∂α/∂x0 = σ(z)(1 − σ(z)) * w0

2. ∂α/∂w0 = σ(z)(1 − σ(z)) * x0

3. ∂α/∂x1 = σ(z)(1 − σ(z)) * w1

4. ∂α/∂w1 = σ(z)(1 − σ(z)) * x1

5. ∂α/∂w2 = σ(z)(1 − σ(z))
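
All five gradients share the factor σ(z)(1 − σ(z)), the derivative of the sigmoid. Here is a minimal spot check of that pattern against a central-difference estimate (the input values are arbitrary, not the ones from the figure):

    import math

    def sigmoid(t):
        return 1.0 / (1.0 + math.exp(-t))

    # Arbitrary values for a spot check
    w0, x0, w1, x1, w2 = 2.0, -1.0, -3.0, -2.0, -3.0
    z = w0 * x0 + w1 * x1 + w2
    s = sigmoid(z)

    # Analytic gradient d(alpha)/d(w0) = sigma(z)(1 - sigma(z)) * x0,
    # compared against a central-difference estimate.
    analytic = s * (1 - s) * x0
    eps = 1e-6
    numeric = (sigmoid((w0 + eps) * x0 + w1 * x1 + w2)
               - sigmoid((w0 - eps) * x0 + w1 * x1 + w2)) / (2 * eps)
    print(analytic, numeric)  # the two agree to roughly 1e-10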

Problem 3. Backpropagation Basics: Dimensions & Derivatives

1. W1 ∈ R^(Da1 × Dx), b1 ∈ R^(Da1 × 1), W2 ∈ R^(1 × Da1), b2 ∈ R^(1 × 1). The shapes of the weights/biases would be the same after vectorizing. After vectorizing across multiple examples, X ∈ R^(Dx × m) and Y ∈ R^(1 × m).
2. δ1^(i) = y^(i)/ŷ^(i) − (1 − y^(i))/(1 − ŷ^(i)).  ∂J/∂ŷ = −(1/m) Σ_i δ1^(i)
3. δ2^(i) = σ(z2)(1 − σ(z2))

4. δ3^(i) = W2

5. δ4^(i) = 0 if z1 < 0, 1 if z1 >= 0 (applied element-wise to z1)

6. δ5^(i) = (x^(i))^T

7. ∂J/∂W1 = −(1/m) Σ_i δ1^(i) * δ2^(i) * ((δ3^(i))^T ∘ δ4^(i)) * δ5^(i) (a shape-checked sketch in code follows below)
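
To make the shape bookkeeping concrete, here is a minimal single-example sketch (m = 1, with small arbitrary dimensions) that chains the δ terms above into ∂J/∂W1 and spot-checks one entry against a finite-difference estimate. The variable names mirror the solution; the sizes and random values are our own choices, not anything from course starter code.

    import numpy as np

    rng = np.random.default_rng(0)
    Dx, Da1, m = 4, 3, 1                       # tiny sizes, single example (m = 1)

    x = rng.normal(size=(Dx, 1))               # x^(i), shape (Dx, 1)
    y = 1.0                                    # y^(i), a scalar label
    W1 = rng.normal(size=(Da1, Dx)); b1 = rng.normal(size=(Da1, 1))
    W2 = rng.normal(size=(1, Da1));  b2 = rng.normal(size=(1, 1))

    def sigmoid(t):
        return 1.0 / (1.0 + np.exp(-t))

    def cost(W1):
        z1 = W1 @ x + b1
        a1 = np.maximum(z1, 0)                 # ReLU
        yhat = sigmoid(W2 @ a1 + b2)[0, 0]
        return -(1 / m) * (y * np.log(yhat) + (1 - y) * np.log(1 - yhat))

    # Forward pass
    z1 = W1 @ x + b1
    a1 = np.maximum(z1, 0)
    yhat = sigmoid(W2 @ a1 + b2)[0, 0]

    # Backward pass, chaining the deltas from the solutions above
    delta1 = y / yhat - (1 - y) / (1 - yhat)   # dL/dyhat,  scalar
    delta2 = yhat * (1 - yhat)                 # dyhat/dz2, scalar
    delta3 = W2                                # dz2/da1,   shape (1, Da1)
    delta4 = (z1 >= 0).astype(float)           # da1/dz1,   shape (Da1, 1)
    delta5 = x.T                               # dz1/dW1 factor, shape (1, Dx)
    dJ_dW1 = -(1 / m) * delta1 * delta2 * (delta3.T * delta4) @ delta5  # (Da1, Dx)

    # Spot-check one entry against a central-difference estimate
    i, j, eps = 1, 2, 1e-6
    bump = np.zeros_like(W1); bump[i, j] = eps
    numeric = (cost(W1 + bump) - cost(W1 - bump)) / (2 * eps)
    print(dJ_dW1.shape, np.isclose(dJ_dW1[i, j], numeric))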

Problem 4. Bonus!

1. As an example, say x = [x0, x1, x2]. Then F(x) = [x0 * x0, x0 * x1, x0 * x2], and ∂L/∂x will be a vector. For component i, where i is not 0, it is ∂L/∂F_i * x0. For the 0th component, it is 2 * x0 * ∂L/∂F_0 + Σ_{i≠0} ∂L/∂F_i * x_i.
2. Sorting will simply reroute the gradients. As an example, say x = [x0, x1, x2], we have upstream gradients ∂L/∂F = [∂_0, ∂_1, ∂_2], and F(x) = [x1, x2, x0]. Then ∂L/∂x = [∂_2, ∂_0, ∂_1] (move the gradients so as to reverse the permutation from x to F(x)).

3. This can be viewed as a computation graph where the multiplication happens first and then the sorting happens. As such, it simply requires rerouting the gradients to account for the sort as in #2, and then applying the product rule from #1 to account for multiplying by x0; a short sketch in code follows below.
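
The rerouting in parts 2 and 3 is easy to express with argsort; here is a minimal sketch (with arbitrary example values) that mirrors the reasoning above:

    import numpy as np

    # Part 2: backprop through F(x) = sort(x) just reroutes the upstream gradient.
    x = np.array([0.3, -1.2, 2.5, 0.7])
    dL_dF = np.array([10.0, 20.0, 30.0, 40.0])     # arbitrary upstream gradient

    order = np.argsort(x)          # position j of sort(x) is x[order[j]]
    dL_dx = np.zeros_like(x)
    dL_dx[order] = dL_dF           # send each upstream component back to its source
    print(dL_dx)                   # [20. 10. 40. 30.]

    # Part 3: F(x) = x[0] * sort(x). Reroute through the sort as above, then apply
    # the product rule: every component of sort(x) is scaled by x[0], and x[0]
    # also receives the gradient that flows through the scaling factor itself.
    s = np.sort(x)
    dL_dx3 = np.zeros_like(x)
    dL_dx3[order] = dL_dF * x[0]   # gradient flowing through sort(x)
    dL_dx3[0] += dL_dF @ s         # gradient flowing through the x[0] factor
    print(dL_dx3)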
