
Vector Calculus

Contents
Differentiation of Univariate Functions
Partial Differentiation and Gradients
Gradients of Matrices
Backpropagation
Higher-Order Derivatives
Linearization and Multivariate Taylor Series



The Chain Rule

x → f → f(x) → g → (g∘f)(x)

(g∘f)′(x) = g′(f(x)) · f′(x)        # g∘f means "g after f"

dg/dx = (dg/df)(df/dx)



Chain rule – Ex
• Use the chain rule to compute the derivative of
  h(x) = (2x + 1)⁴

  x → 2·(·) → 2x → (·) + 1 → 2x + 1 → (·)⁴ → (2x + 1)⁴

• h can be expressed as h(x) = (g∘f∘u)(x) with

  u(x) = 2x,  f(u) = u + 1,  g(f) = f⁴
  h′(x) = (g∘f∘u)′(x) = g′(f) · f′(u) · u′(x) = 4f³ · 1 · 2 = 8(2x + 1)³



Partial Derivative
Definition (Partial Derivative). For a function of n variables
x_1, …, x_n, i.e. f : ℝⁿ → ℝ, x ↦ f(x),
we define the partial derivatives as

  ∂f/∂x_k = lim_{h→0} [f(x_1, …, x_{k−1}, x_k + h, x_{k+1}, …, x_n) − f(x_1, …, x_k, …, x_n)] / h



Gradients of f : ℝⁿ → ℝ

• We collect all partial derivatives of f in a row vector to form the gradient of f:

  [∂f/∂x_1  ∂f/∂x_2  …  ∂f/∂x_n]

• Notation: ∇ₓf = df/dx = grad f

• Ex. For f : ℝ² → ℝ, f(x_1, x_2) = x_1³x_2 − x_1x_2
• Partial derivatives: ∂f/∂x_1 = 3x_1²x_2 − x_2,   ∂f/∂x_2 = x_1³ − x_1
• The gradient of f:

  ∇ₓf = [3x_1²x_2 − x_2   x_1³ − x_1] ∈ ℝ^{1×2}   (1 row, 2 columns)
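As a cross-check (mine, not part of the slide), the gradient above can be compared against central finite differences:

```python
import numpy as np

def f(x):
    # f(x1, x2) = x1^3 * x2 - x1 * x2
    return x[0] ** 3 * x[1] - x[0] * x[1]

def grad_f(x):
    # Analytic gradient (row vector): [3*x1^2*x2 - x2, x1^3 - x1]
    return np.array([3 * x[0] ** 2 * x[1] - x[1], x[0] ** 3 - x[0]])

def numerical_grad(f, x, eps=1e-6):
    g = np.zeros_like(x, dtype=float)
    for k in range(x.size):
        e = np.zeros_like(x, dtype=float)
        e[k] = eps
        g[k] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

x = np.array([1.5, -0.7])
print("analytic :", grad_f(x))
print("numeric  :", numerical_grad(f, x))
```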



Gradients of f : ℝⁿ → ℝ     (x ∈ ℝⁿ, f(x) ∈ ℝ)

  grad f = df/dx = ∇ₓf ∈ ℝ^{1×n}

Ex. For f(x, y) = (x³ + 2y)², ∇ₓf ∈ ℝ^{1×2};
we obtain the partial derivatives

• ∂f/∂x = 2(x³ + 2y) · ∂(x³ + 2y)/∂x = 6x²(x³ + 2y)

• ∂f/∂y = 2(x³ + 2y) · ∂(x³ + 2y)/∂y = 4(x³ + 2y)

⇒ The gradient of f is [6x²(x³ + 2y)   4(x³ + 2y)]



Gradients/Jacobian of Vector-Valued Functions f : ℝⁿ → ℝᵐ

• For a vector-valued function f : ℝⁿ → ℝᵐ,

  f(x) = [f_1(x)  f_2(x)  …  f_m(x)]ᵀ   (column vector with m entries)

  where f_i : ℝⁿ → ℝ.

Gradient (or Jacobian) of f:

  J = ∇ₓf = df/dx = [∇ₓf_1; ∇ₓf_2; …; ∇ₓf_m]   (the i-th row is ∇ₓf_i)

  Dimension: m × n



Jacobian of f : ℝⁿ → ℝᵐ – size

[Figure: for x ∈ ℝ³ and f : ℝ³ → ℝ⁴ with components f_1, …, f_4, the Jacobian J is a 4 × 3 matrix; row i holds the partial derivatives of f_i with respect to x_1, x_2, x_3.]



Jacobian of f : ℝⁿ → ℝᵐ – Ex

Ex. Find the Jacobian of f : ℝ³ → ℝ² with

  f_1(x_1, x_2, x_3) = 2x_1 + x_2x_3,   f_2(x_1, x_2, x_3) = x_1x_3 − x_2²

Jacobian of f: J ∈ ℝ^{2×3}
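The slide leaves the entries open. As a sketch (my own working, not the slide's), the analytic Jacobian is J = [[2, x_3, x_2], [x_3, −2x_2, x_1]]; the snippet below checks it against finite differences.

```python
import numpy as np

def f(x):
    x1, x2, x3 = x
    return np.array([2 * x1 + x2 * x3, x1 * x3 - x2 ** 2])

def jacobian_analytic(x):
    x1, x2, x3 = x
    # Row i is the gradient of f_i; shape (2, 3)
    return np.array([[2.0, x3, x2],
                     [x3, -2 * x2, x1]])

def jacobian_numeric(f, x, eps=1e-6):
    m = f(x).size
    J = np.zeros((m, x.size))
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = eps
        J[:, j] = (f(x + e) - f(x - e)) / (2 * eps)
    return J

x = np.array([0.3, -1.2, 2.0])
print(np.allclose(jacobian_analytic(x), jacobian_numeric(f, x), atol=1e-5))  # True
```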



Gradient of f : ℝⁿ → ℝᵐ – Ex

• We are given f(x) = Ax, with f(x) ∈ ℝᵐ, A ∈ ℝ^{m×n}, x ∈ ℝⁿ. Compute the gradient ∇ₓf.

• ∇ₓf = [∂f_i/∂x_j]; its size is m × n.

  f_i(x) = Σ_{j=1}^{n} a_ij x_j

  ⇒ ∂f_i/∂x_j = a_ij   ⇒ ∇ₓf = A



Gradient of f : ℝⁿ → ℝᵐ – Ex2

• Given h : ℝ → ℝ, h(t) = (f∘x)(t)

  where f : ℝ² → ℝ, f(x) = exp(x_1 + x_2²),

  x : ℝ → ℝ²,  x(t) = [x_1(t); x_2(t)] = [t; sin t]

  Compute dh/dt, the gradient of h with respect to t.

• Use the chain rule (matrix version) for h = f∘x:

  dh/dt = d(f∘x)/dt = (df/dx)(dx/dt)



Gradient of f : ℝⁿ → ℝᵐ – Ex2

  dh/dt = [∂f/∂x_1  ∂f/∂x_2] [∂x_1/∂t; ∂x_2/∂t] = (∂f/∂x_1)(∂x_1/∂t) + (∂f/∂x_2)(∂x_2/∂t)

        = exp(x_1 + x_2²) · 1 + 2x_2 exp(x_1 + x_2²) · cos t,

  where x_1 = t, x_2 = sin t.
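A short numeric check (mine) of dh/dt at an arbitrary point t = 0.8:

```python
import numpy as np

def h(t):
    x1, x2 = t, np.sin(t)
    return np.exp(x1 + x2 ** 2)

def dh_dt(t):
    # Chain-rule result: exp(x1 + x2^2) * (1 + 2*x2*cos(t)), with x1 = t, x2 = sin(t)
    x1, x2 = t, np.sin(t)
    return np.exp(x1 + x2 ** 2) * (1.0 + 2.0 * x2 * np.cos(t))

t = 0.8
eps = 1e-6
numeric = (h(t + eps) - h(t - eps)) / (2 * eps)
print(dh_dt(t), numeric)  # the two values should agree closely
```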



Gradient of f : ℝⁿ → ℝᵐ – Exercise

• y ∈ ℝᴺ, θ ∈ ℝᴰ, Φ ∈ ℝ^{N×D}

  e : ℝᴰ → ℝᴺ,  e(θ) = y − Φθ
  L : ℝᴺ → ℝ,   L(e) = ‖e‖² = eᵀe

  Find dL/de, de/dθ, and dL/dθ.
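The exercise is left open on the slide. By the chain rule, the standard answers are dL/de = 2eᵀ, de/dθ = −Φ, and dL/dθ = (dL/de)(de/dθ) = −2(y − Φθ)ᵀΦ. A small sketch (my own, with random data) verifying the last expression numerically:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 5, 3
Phi = rng.normal(size=(N, D))
y = rng.normal(size=N)
theta = rng.normal(size=D)

def L(theta):
    e = y - Phi @ theta
    return e @ e

# Chain rule: dL/dtheta = (dL/de)(de/dtheta) = 2*e^T * (-Phi) = -2*(y - Phi@theta)^T @ Phi
grad_analytic = -2.0 * (y - Phi @ theta) @ Phi

eps = 1e-6
grad_numeric = np.array([
    (L(theta + eps * np.eye(D)[j]) - L(theta - eps * np.eye(D)[j])) / (2 * eps)
    for j in range(D)
])
print(np.allclose(grad_analytic, grad_numeric, atol=1e-5))  # True
```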



Gradient of A : ℝᵐ → ℝ^{p×q}
Approach 1: collect the partial derivatives directly as a tensor

[Figure: for A ∈ ℝ^{4×2} and x ∈ ℝ³, the gradient dA/dx is a 4×2×3 tensor.]



Gradient of A : ℝᵐ → ℝ^{p×q}
Approach 2: re-shape matrices into vectors

[Figure: A is first flattened into a vector, the gradient with respect to x ∈ ℝ³ is computed, and the result is re-shaped into a 4×2×3 tensor.]



Gradients of A : ℝᵐ → ℝ^{p×q} – Ex

• Ex. Consider A : ℝ³ → ℝ^{3×2},

  A(x_1, x_2, x_3) = [ x_1 − x_2      x_1 + x_3
                       x_1² + x_3     2x_1
                       x_3 − x_2      x_1 + x_2 + x_3 ]

• The dimension of dA/dx is (3×2)×3.

• Approach 1: compute the partial-derivative matrix with respect to each x_i
  (rows separated by ";"):

  ∂A/∂x_1 = [1 1; 2x_1 2; 0 1],   ∂A/∂x_2 = [−1 0; 0 0; −1 1],   ∂A/∂x_3 = [0 1; 1 0; 1 1]

  Collected together, these form a (3×2)×3 tensor.
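A small numeric sketch (mine, with the illustrative helper dA_dx) that assembles the same (3×2)×3 tensor by finite differences:

```python
import numpy as np

def A(x):
    x1, x2, x3 = x
    return np.array([[x1 - x2, x1 + x3],
                     [x1 ** 2 + x3, 2 * x1],
                     [x3 - x2, x1 + x2 + x3]])

def dA_dx(x, eps=1e-6):
    # Approach 1: one (3x2) partial-derivative matrix per input variable,
    # stacked along a trailing axis -> a (3, 2, 3) tensor.
    out = np.zeros((3, 2, 3))
    for i in range(3):
        e = np.zeros(3)
        e[i] = eps
        out[:, :, i] = (A(x + e) - A(x - e)) / (2 * eps)
    return out

x = np.array([1.0, 2.0, 3.0])
G = dA_dx(x)
print(G[:, :, 0])  # ≈ [[1, 1], [2*x1, 2], [0, 1]]
print(G.shape)     # (3, 2, 3)
```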



Gradient of f : ℝ^{m×n} → ℝᵖ – Ex

f : ℝ^{M×N} → ℝᴹ,  f(A) = Ax,  with A ∈ ℝ^{M×N}, x ∈ ℝᴺ, f ∈ ℝᴹ

  f_i = A_i1 x_1 + … + A_ik x_k + … + A_iN x_N

  ⇒ ∂f_i/∂A_ik = x_k,   ∂f_i/∂A_jk = 0 for j ≠ i

  ⇒ df/dA ∈ ℝ^{M×(M×N)}

  Viewed as an M×N matrix, the partial derivative ∂f_i/∂A is zero everywhere
  except in row i, which contains [x_1 … x_N].



Gradient of Matrices with Respect to Matrices f : ℝ^{m×n} → ℝ^{p×q}

For R ∈ ℝ^{M×N} and f : ℝ^{M×N} → ℝ^{N×N}
with f(R) = RᵀR =: K ∈ ℝ^{N×N},
compute the gradient dK/dR.

The gradient has the dimensions

  dK/dR ∈ ℝ^{(N×N)×(M×N)},   and for a single entry   dK_pq/dR ∈ ℝ^{1×(M×N)}



Gradient of Matrices with Respect to Matrices f : ℝ^{m×n} → ℝ^{p×q}

dK/dR ∈ ℝ^{(N×N)×(M×N)},   K = RᵀR

R = [r_1 r_2 … r_N], where r_i is the i-th column of R

  dK_pq/dR ∈ ℝ^{1×(M×N)},   dK_pq/dR_ij ∈ ℝ

  K_pq = r_pᵀ r_q = Σ_{m=1}^{M} R_mp R_mq

  ∂K_pq/∂R_ij = ∂_pqij =
      R_iq    if j = p, p ≠ q
      R_ip    if j = q, p ≠ q
      2R_iq   if j = p = q
      0       otherwise
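A numeric sketch (mine) checking the four-case formula for ∂K_pq/∂R_ij against finite differences on a random R:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 3
R = rng.normal(size=(M, N))

def dKpq_dRij(R, p, q, i, j):
    # Closed form from the slide
    if j == p and j == q:
        return 2 * R[i, q]
    if j == p:
        return R[i, q]
    if j == q:
        return R[i, p]
    return 0.0

def dKpq_dRij_numeric(R, p, q, i, j, eps=1e-6):
    Rp = R.copy(); Rp[i, j] += eps
    Rm = R.copy(); Rm[i, j] -= eps
    return ((Rp.T @ Rp)[p, q] - (Rm.T @ Rm)[p, q]) / (2 * eps)

ok = all(
    np.isclose(dKpq_dRij(R, p, q, i, j), dKpq_dRij_numeric(R, p, q, i, j), atol=1e-5)
    for p in range(N) for q in range(N) for i in range(M) for j in range(N)
)
print(ok)  # True
```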



Backpropagation - Introduction

• Probably the single most important algorithm in all of Deep Learning


• In many machine learning applications, we find good model parameters by
  performing gradient descent, which requires computing the gradient of a
  learning objective with respect to the parameters of the model. For example,
  an ANN with a single hidden layer of 150 nodes applied to a 128×128×3 color
  image needs at least 128×128×3×150 = 7,372,800 weights.
• The backpropagation algorithm is an efficient way to compute the
  gradient of an error function with respect to the parameters of the
  model.



ML Needs Gradients

• Given training data
  {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}

• Choose decision and cost functions

  ŷ_i = f_θ(x_i)
  C(ŷ_i, y_i)

• Define the goal

  Find θ* that minimizes (1/m) Σ_i C(ŷ_i, y_i)

• Train the model with (stochastic) gradient descent to update θ:

  θ^(t+1) = θ^(t) − η · ∂C/∂θ^(t) (x_i, y_i)

! The backpropagation algorithm is an efficient way to compute the gradient.



Epochs
• The backpropagation algorithm consists of many cycles; each cycle is
  called an epoch and has two phases:

  forward phase:   a^(0) → z^(1), a^(1) → z^(2), a^(2) → … → C

  backward phase:  ∂C/∂θ^(1) ← ∂C/∂θ^(2) ← … ← ∂C/∂θ^(N)



Deep Network (ANN with hidden layers)

Activation equations (matrix version)


Layer (1) = hidden layer
  z^(1) = W^(1) a^(0) + b^(1)
  a^(1) = σ_1(z^(1))
Layer (2) = output layer
  z^(2) = W^(2) a^(1) + b^(2)
  a^(2) = σ_2(z^(2))

The cost for example number k:

  C_k = ½ Σ_i (a_i^(2) − y_i)² = ½ ‖a^(2) − y‖²



Forward phase
For L = 1..N, with a^(0) = x:
  z^(L) = W^(L) a^(L−1) + b^(L)
  a^(L) = σ_L(z^(L))

C: cost function (e.g., C = ½ ‖a^(N) − y‖²)



Backpropagation
Layer 1  Layer 2  …  Layer K−1  Layer K  …  Layer N−1  Layer N

  ∂C/∂W^(N)   = ∂C/∂a^(N) · ∂a^(N)/∂W^(N)
  ∂C/∂b^(N)   = ∂C/∂a^(N) · ∂a^(N)/∂b^(N)

  ∂C/∂W^(N−1) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂W^(N−1)
  ∂C/∂b^(N−1) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂b^(N−1)

  ∂C/∂W^(N−2) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂a^(N−2) · ∂a^(N−2)/∂W^(N−2)
  ∂C/∂b^(N−2) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂a^(N−2) · ∂a^(N−2)/∂b^(N−2)

Benefit of backpropagation: the common factors shared by these products are computed once and reused.



Backpropagation

Activation equations:
  z^(L) = W^(L) a^(L−1) + b^(L)
  a^(L) = σ_L(z^(L))
  C: cost function

Layer 1  Layer 2  …  Layer K−1  Layer K  …  Layer N−1  Layer N

  ∂C/∂W^(L+1) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂a^(N−2) · … · ∂a^(L+2)/∂a^(L+1) · ∂a^(L+1)/∂W^(L+1)

  ∂C/∂W^(L)   = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂a^(N−2) · … · ∂a^(L+1)/∂a^(L) · ∂a^(L)/∂W^(L)

  ∂C/∂b^(L)   = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · ∂a^(N−1)/∂a^(N−2) · … · ∂a^(L+1)/∂a^(L) · ∂a^(L)/∂b^(L)

At each layer (L), we need to compute

  e^(L) := ∂C/∂a^(L) = ∂C/∂a^(N) · ∂a^(N)/∂a^(N−1) · … · ∂a^(L+1)/∂a^(L) = e^(L+1) · ∂a^(L+1)/∂a^(L)

Backpropagation: compute e^(L+1) (at layer L+1) before computing e^(L) (at layer L).
Backpropagation algorithm

Activation equations:
  z^(L) = W^(L) a^(L−1) + b^(L)
  a^(L) = σ_L(z^(L))
  C: cost function

For each example in the training set:
1. Feed forward.
2. Backpropagation:
   At the output layer (N), compute and store:
     e^(N) = ∂C/∂a^(N)
     ∂C/∂W^(N) = e^(N) · ∂a^(N)/∂W^(N),   ∂C/∂b^(N) = e^(N) · ∂a^(N)/∂b^(N)
   For layer (L) from N−1 down to 1:
     • Compute e^(L) using e^(L) = e^(L+1) · ∂a^(L+1)/∂a^(L)
     • Compute ∂C/∂W^(L) = e^(L) · ∂a^(L)/∂W^(L),   ∂C/∂b^(L) = e^(L) · ∂a^(L)/∂b^(L)
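A minimal NumPy sketch (mine, not from the slides) of the algorithm above for a two-layer network with sigmoid activations and the quadratic cost C = ½‖a^(2) − y‖²; the layer sizes, data, and variable names are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                      # input, hidden, output
W = [rng.normal(size=(sizes[l + 1], sizes[l])) for l in range(2)]
b = [rng.normal(size=(sizes[l + 1], 1)) for l in range(2)]
x = rng.normal(size=(3, 1))
y = rng.normal(size=(2, 1))

# 1. Feed forward: z(L) = W(L) a(L-1) + b(L), a(L) = sigma(z(L))
a = [x]
zs = []
for WL, bL in zip(W, b):
    zs.append(WL @ a[-1] + bL)
    a.append(sigmoid(zs[-1]))

# 2. Backpropagation: e(L) = dC/da(L), propagated from the output layer back
e = a[-1] - y                          # e(N) for the quadratic cost
dW, db = [None, None], [None, None]
for L in reversed(range(2)):           # layers N ... 1 (0-based here)
    delta = e * sigmoid(zs[L]) * (1 - sigmoid(zs[L]))   # e(L) * sigma'(z(L))
    dW[L] = delta @ a[L].T             # dC/dW(L)
    db[L] = delta                      # dC/db(L)
    e = W[L].T @ delta                 # e(L-1) = dC/da(L-1)

# Optional check of one weight gradient by finite differences
eps = 1e-6
Wp = [w.copy() for w in W]; Wp[0][0, 0] += eps
Wm = [w.copy() for w in W]; Wm[0][0, 0] -= eps
def cost(Ws):
    aa = x
    for WL, bL in zip(Ws, b):
        aa = sigmoid(WL @ aa + bL)
    return 0.5 * float(np.sum((aa - y) ** 2))
print(dW[0][0, 0], (cost(Wp) - cost(Wm)) / (2 * eps))  # should agree closely
```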



Higher-order partial derivatives

Consider a function f : ℝ² → ℝ of two variables x, y.

Second-order partial derivatives:

  ∂²f/∂x²,  ∂²f/∂x∂y,  ∂²f/∂y∂x,  ∂²f/∂y²

∂ⁿf/∂xⁿ is the n-th partial derivative of f with respect to x.

Ex. f : ℝ² → ℝ, f(x, y) = x³y − 3xy² + 5y

  ∂f/∂x = 3x²y − 3y²,      ∂f/∂y = x³ − 6xy + 5

  ∂²f/∂x² = 6xy,           ∂²f/∂x∂y = 3x² − 6y
  ∂²f/∂y∂x = 3x² − 6y,     ∂²f/∂y² = −6x



The Hessian
• The Hessian is the collection of all second-order partial derivatives.

• The Hessian matrix is symmetric for twice continuously differentiable
  functions, that is,

  ∂²f/∂x∂y = ∂²f/∂y∂x



Gradient vs Hessian of f : ℝⁿ → ℝ

Consider a function f : ℝⁿ → ℝ.

Gradient:

  ∇f = [∂f/∂x_1  ∂f/∂x_2  …  ∂f/∂x_n]          Dimension: 1 × n

Hessian:

  ∇²f = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2   …   ∂²f/∂x_1∂x_n
          ∂²f/∂x_2∂x_1   ∂²f/∂x_2²      …   ∂²f/∂x_2∂x_n
          …              …              …   …
          ∂²f/∂x_n∂x_1   ∂²f/∂x_n∂x_2   …   ∂²f/∂x_n²   ]

  Dimension: n × n



Gradient vs Hessian of f : ℝⁿ → ℝᵐ

Consider a (vector-valued) function f : ℝⁿ → ℝᵐ.

Gradient (Jacobian): an m × n matrix        Hessian: an m × (n × n) tensor

[Figure: for f : ℝ³ → ℝ² (inputs x_1, x_2, x_3; outputs f_1, f_2), the gradient
has dimension 2 × 3 and the Hessian has dimension 2 × (3 × 3).]



Example
• Compute the Hessian of the function z = f(x, y) = x² + 6xy − y³ and
  evaluate it at the point (x = 1, y = 2, z = 5).
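The slide leaves the computation open. The analytic Hessian is [[2, 6], [6, −6y]], which at (x, y) = (1, 2) equals [[2, 6], [6, −12]]; the sketch below (my own check, not from the slides) confirms this by finite differences:

```python
import numpy as np

def f(x, y):
    return x ** 2 + 6 * x * y - y ** 3

def hessian_numeric(f, x, y, eps=1e-4):
    # Second-order central differences for each entry of the 2x2 Hessian
    fxx = (f(x + eps, y) - 2 * f(x, y) + f(x - eps, y)) / eps ** 2
    fyy = (f(x, y + eps) - 2 * f(x, y) + f(x, y - eps)) / eps ** 2
    fxy = (f(x + eps, y + eps) - f(x + eps, y - eps)
           - f(x - eps, y + eps) + f(x - eps, y - eps)) / (4 * eps ** 2)
    return np.array([[fxx, fxy], [fxy, fyy]])

print(hessian_numeric(f, 1.0, 2.0))  # ≈ [[2, 6], [6, -12]]
```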



Taylor series for f : ℝ → ℝ
Taylor polynomials

Approximation problems



Taylor series for f : ℝᴰ → ℝ

Consider a function f : ℝᴰ → ℝ that is smooth at x_0.

The multivariate Taylor series of f at x_0 is defined as

  f(x) = Σ_{k=0}^{∞} (1/k!) D_x^k f(x_0) δ^k

where D_x^k f(x_0) is the k-th (total) derivative of f with respect to x, evaluated at x_0, and δ := x − x_0.



Example
Find the Taylor series for the function
f(x, y) = x² + 2xy + y³ at (x_0, y_0) = (1, 2).



Taylor series of f(x, y) = x² + 2xy + y³



Taylor series of f(x, y) = x² + 2xy + y³

[Figure: the higher-order terms visualized as multi-index tensors with entries indexed [i, j, k].]



Taylor series of f(x, y) = x² + 2xy + y³

The Taylor series expansion of f at (x_0, y_0) = (1, 2) is

  f(x, y) = 13 + 6(x − 1) + 14(y − 2) + (x − 1)² + 2(x − 1)(y − 2) + 6(y − 2)² + (y − 2)³
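Since f is a cubic polynomial, this expansion reproduces f exactly; a quick numeric check (mine) at a few points around (1, 2):

```python
import numpy as np

def f(x, y):
    return x ** 2 + 2 * x * y + y ** 3

def taylor_at_1_2(x, y):
    dx, dy = x - 1.0, y - 2.0
    return (13 + 6 * dx + 14 * dy
            + dx ** 2 + 2 * dx * dy + 6 * dy ** 2
            + dy ** 3)

rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 2)) + np.array([1.0, 2.0])
print(all(np.isclose(f(x, y), taylor_at_1_2(x, y)) for x, y in pts))  # True
```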



Summary

Differentiation of Univariate Functions


Partial Differentiation and Gradients
Gradients of Matrices
Backpropagation
Higher-Order Derivatives
Linearization and Multivariate Taylor Series



THANKS

