MAI Lecture 03 Differential Calculus

Differential calculus defines the derivative of a function f as the limit of the difference quotient as h approaches 0. The derivative f'(x) represents the instantaneous rate of change of f(x) with respect to x. Higher-order derivatives quantify how the rate of change is itself changing. Taylor series and Maclaurin series approximate functions using polynomials whose coefficients are computed from the function and its derivatives. Taylor series are centered around a point a, while Maclaurin series are centered at 0.


Differential Calculus

• For a function 𝑓: ℝ → ℝ, the derivative of f is defined as

  f′(x) = lim_{h→0} (f(x + h) − f(x)) / h

• If 𝑓′(𝑎) exists, f is said to be differentiable at a
• If f is differentiable at every point 𝑐 ∈ (𝑎, 𝑏), then f is differentiable on this interval
 We can also interpret the derivative 𝑓′(𝑥) as the instantaneous rate of change of
𝑓(𝑥) with respect to x
 I.e., for a small change in x, what is the rate of change of 𝑓(𝑥)
• Given 𝑦 = 𝑓(𝑥), where x is an independent variable and y is a dependent
variable, the following expressions are equivalent:
  f′(x) = f′ = dy/dx = df/dx = (d/dx) f(x) = D f(x) = D_x f(x)

• The symbols d/dx, D, and D_x are differentiation operators that indicate the operation of differentiation
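As a quick numeric illustration (a sketch added here, not part of the original slides), the difference quotient above can be evaluated for a small h; the example function and step size are arbitrary choices:

# Approximate f'(x) with the forward difference quotient (f(x + h) - f(x)) / h
def difference_quotient(f, x, h=1e-6):
    return (f(x + h) - f(x)) / h

# Example: f(x) = x**2, whose exact derivative is 2x
print(difference_quotient(lambda x: x**2, 3.0))  # ~ 6.0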
Differential Calculus
• The following rules are used for computing the derivatives of explicit functions (a representative set is listed below)
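A standard set of such rules, for reference (constant, power, sum, product, quotient, and chain rules):

  d/dx (c) = 0                                               (constant rule)
  d/dx (xⁿ) = n xⁿ⁻¹                                          (power rule)
  d/dx (f(x) + g(x)) = f′(x) + g′(x)                          (sum rule)
  d/dx (f(x) g(x)) = f′(x) g(x) + f(x) g′(x)                  (product rule)
  d/dx (f(x) / g(x)) = (f′(x) g(x) − f(x) g′(x)) / g(x)²      (quotient rule)
  d/dx f(g(x)) = f′(g(x)) g′(x)                               (chain rule)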
Differential Calculus
Higher Order Derivatives
• The derivative of the first derivative of a function 𝑓(𝑥) is the second derivative of 𝑓(𝑥)

  d²f/dx² = (d/dx)(df/dx)
• The second derivative quantifies how the rate of change of 𝑓 𝑥 is
changing
 E.g., in physics, if the function describes the displacement of an object, the
first derivative gives the velocity of the object (i.e., the rate of change of the
position)
 The second derivative gives the acceleration of the object (i.e., the rate of
change of the velocity)
• If we apply the differentiation operation n times, we obtain the n-th derivative of 𝑓(𝑥)

  f⁽ⁿ⁾(x) = dⁿf/dxⁿ = (d/dx)ⁿ f(x)
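As a small sketch of the physics example above (added here, not from the original slides; the displacement expression is an arbitrary choice), symbolic differentiation recovers velocity and acceleration from displacement:

import sympy as sp

t = sp.symbols('t')
s = 5*t**3 - 2*t**2 + 7   # example displacement s(t)
v = sp.diff(s, t)         # first derivative: velocity, 15*t**2 - 4*t
a = sp.diff(s, t, 2)      # second derivative: acceleration, 30*t - 4
print(v, a)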
Taylor Series
 Brook Taylor (1685 – 1731) was an accomplished musician and painter. He did research in a variety of areas, but is most famous for his development of ideas regarding infinite series.
 Suppose we wanted to find a fourth-degree polynomial of the form

  P(x) = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴

that approximates the behavior of f(x) = ln(x + 1) at x = 0.

 If we make P(0) = f(0), and the first, second, third and fourth derivatives the same, then we would have a pretty good approximation.

 P(x) = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴        f(x) = ln(x + 1)

  f(0) = ln 1 = 0          P(0) = a₀               ⇒ a₀ = 0

  f′(x) = 1/(1 + x)        P′(x) = a₁ + 2a₂x + 3a₃x² + 4a₄x³
  f′(0) = 1/1 = 1          P′(0) = a₁              ⇒ a₁ = 1

  f″(x) = −1/(1 + x)²      P″(x) = 2a₂ + 6a₃x + 12a₄x²
  f″(0) = −1/1 = −1        P″(0) = 2a₂             ⇒ a₂ = −1/2

  f‴(x) = 2/(1 + x)³       P‴(x) = 6a₃ + 24a₄x
  f‴(0) = 2                P‴(0) = 6a₃             ⇒ a₃ = 2/6

  f⁽⁴⁾(x) = −6/(1 + x)⁴     P⁽⁴⁾(x) = 24a₄
  f⁽⁴⁾(0) = −6              P⁽⁴⁾(0) = 24a₄          ⇒ a₄ = −6/24

 Substituting the coefficients into P(x) = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴ gives

  P(x) = 0 + 1x − (1/2)x² + (2/6)x³ − (6/24)x⁴

  P(x) = 0 + x − x²/2 + x³/3 − x⁴/4
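As a quick sanity check (a sketch added here, not from the slides), the fourth-degree polynomial can be compared numerically against ln(1 + x) near zero:

import math

def P(x):
    # Fourth-degree Maclaurin polynomial of ln(1 + x)
    return x - x**2/2 + x**3/3 - x**4/4

for x in (0.1, 0.5, 1.0):
    print(x, P(x), math.log(1 + x))  # the two values agree closely for small x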
 If we plot both functions, we see that near zero the functions match very well!

[Figure: graphs of f(x) = ln(x + 1) and P(x); the two curves nearly coincide near x = 0]
 Our polynomial:  0 + 1x − (1/2)x² + (2/6)x³ − (6/24)x⁴

 has the form:  f(0) + f′(0)x + (f″(0)/2)x² + (f‴(0)/6)x³ + (f⁽⁴⁾(0)/24)x⁴

 or:  f(0)/0! + (f′(0)/1!)x + (f″(0)/2!)x² + (f‴(0)/3!)x³ + (f⁽⁴⁾(0)/4!)x⁴

 This pattern occurs no matter what the original function was!


Maclaurin Series:
(generated by f at x = 0)

  P(x) = f(0) + f′(0)x + (f″(0)/2!)x² + (f‴(0)/3!)x³ + ⋯
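For example (a standard case that the slides do not work out), every derivative of eˣ equals eˣ, so each derivative equals 1 at x = 0 and the Maclaurin series of the exponential function is

  eˣ = 1 + x + x²/2! + x³/3! + ⋯ = Σ_{n=0}^{∞} xⁿ/n!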

 If we want to center the series (and its graph) at some point other than zero, we get the Taylor Series:

Taylor Series:
(generated by f at x = a)

  P(x) = f(a) + f′(a)(x − a) + (f″(a)/2!)(x − a)² + (f‴(a)/3!)(x − a)³ + ⋯

example: y = cos x

  f(x) = cos x        f(0) = 1
  f′(x) = −sin x      f′(0) = 0
  f″(x) = −cos x      f″(0) = −1
  f‴(x) = sin x       f‴(0) = 0
  f⁽⁴⁾(x) = cos x      f⁽⁴⁾(0) = 1

  P(x) = 1 + 0x − 1x²/2! + 0x³/3! + 1x⁴/4! + 0x⁵/5! − 1x⁶/6! + ⋯

  P(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + x⁸/8! − x¹⁰/10! + ⋯


[Figure: y = cos x plotted against P(x) = 1 − x²/2! + x⁴/4! − x⁶/6! + x⁸/8! − x¹⁰/10! + ⋯ on −5 ≤ x ≤ 5]

 The more terms we add, the better our approximation.


example: y = cos(2x)

 Rather than start from scratch, we can use the function that we already know:

  P(x) = 1 − (2x)²/2! + (2x)⁴/4! − (2x)⁶/6! + (2x)⁸/8! − (2x)¹⁰/10! + ⋯



example: y = cos x at x = π/2

  f(x) = cos x        f(π/2) = 0
  f′(x) = −sin x      f′(π/2) = −1
  f″(x) = −cos x      f″(π/2) = 0
  f‴(x) = sin x       f‴(π/2) = 1
  f⁽⁴⁾(x) = cos x      f⁽⁴⁾(π/2) = 0

  P(x) = 0 − 1(x − π/2) + (0/2!)(x − π/2)² + (1/3!)(x − π/2)³ + ⋯

  P(x) = −(x − π/2) + (x − π/2)³/3! − (x − π/2)⁵/5! + ⋯

Differential Calculus
Taylor Series
• Taylor series provides a method to approximate a (sufficiently smooth) function 𝑓(𝑥) around a point 𝑥₀ if we know its first n derivatives 𝑓(𝑥₀), 𝑓⁽¹⁾(𝑥₀), 𝑓⁽²⁾(𝑥₀), …, 𝑓⁽ⁿ⁾(𝑥₀)
• For instance, for 𝑛 = 2, the second-order approximation of a function 𝑓(𝑥) is

  f(x) ≈ (1/2) (d²f/dx²)|_{x₀} (x − x₀)² + (df/dx)|_{x₀} (x − x₀) + f(x₀)

• Similarly, the approximation of 𝑓(𝑥) with a Taylor polynomial of degree n is

  f(x) ≈ Σ_{i=0}^{n} (1/i!) (dⁱf/dxⁱ)|_{x₀} (x − x₀)ⁱ

• For example, the figure shows the first-order, second-order, and fifth-order polynomial approximations of the exponential function 𝑓(𝑥) = 𝑒ˣ at the point 𝑥₀ = 0

Picture from: http://d2l.ai/chapter_appendix-mathematics-for-deep-learning/single-variable-calculus.html
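A small numeric sketch (added here, not part of the original slides) of those polynomial approximations of eˣ around x₀ = 0; the evaluation point x = 1.0 is an arbitrary choice:

import math

def taylor_exp(x, n, x0=0.0):
    # n-th order Taylor polynomial of e^x around x0; every derivative of e^x is e^x
    return sum(math.exp(x0) / math.factorial(i) * (x - x0)**i for i in range(n + 1))

x = 1.0
for n in (1, 2, 5):
    print(n, taylor_exp(x, n), math.exp(x))  # the approximation improves as the order grows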


Differential Calculus
Geometric Interpretation
• To provide a geometric interpretation of the derivatives, let’s consider a first-order Taylor series approximation of 𝑓(𝑥) at 𝑥 = 𝑥₀

  f(x) ≈ f(x₀) + (df/dx)|_{x₀} (x − x₀)
• The expression approximates the function 𝑓(𝑥) by a line which passes through the point (𝑥₀, 𝑓(𝑥₀)) and has slope (df/dx)|_{x₀} (i.e., the value of df/dx at the point 𝑥₀)
• Therefore, the first derivative of a function is also the slope of the tangent line to the curve of the function

Picture from: http://d2l.ai/chapter_appendix-mathematics-for-deep-learning/single-variable-calculus.html


Differential Calculus
Partial Derivatives
• So far, we looked at functions of a single variable, where 𝑓: ℝ → ℝ
• Functions that depend on many variables are called multivariate
functions
• Let 𝑦 = 𝑓(𝐱) = 𝑓(𝑥₁, 𝑥₂, …, 𝑥ₙ) be a multivariate function with n variables
 The input is an n-dimensional vector 𝐱 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)ᵀ and the output is a scalar y
 The mapping is 𝑓: ℝⁿ → ℝ
• The partial derivative of y with respect to its ith parameter 𝑥𝑖 is
  ∂y/∂xᵢ = lim_{h→0} [f(x₁, x₂, …, xᵢ + h, …, xₙ) − f(x₁, x₂, …, xᵢ, …, xₙ)] / h
• To calculate ∂y/∂xᵢ (∂ is pronounced “del”, or we can just say “partial derivative”), we can treat 𝑥₁, 𝑥₂, …, 𝑥ᵢ₋₁, 𝑥ᵢ₊₁, …, 𝑥ₙ as constants and calculate the derivative of y only with respect to 𝑥ᵢ
• For notation of partial derivatives, the following are equivalent:
  ∂y/∂xᵢ = ∂f/∂xᵢ = (∂/∂xᵢ) f(𝐱) = f_{xᵢ} = fᵢ = Dᵢ f = D_{xᵢ} f
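A small numeric sketch (added here, not in the original slides) of a partial derivative computed by holding the other variables fixed; the example function is an arbitrary choice:

def partial_derivative(f, x, i, h=1e-6):
    # Forward difference in coordinate i, with all other coordinates held constant
    x_shifted = list(x)
    x_shifted[i] += h
    return (f(x_shifted) - f(x)) / h

f = lambda x: x[0]**2 + 3*x[0]*x[1]          # f(x1, x2) = x1^2 + 3*x1*x2
print(partial_derivative(f, [1.0, 2.0], 0))  # ~ 2*x1 + 3*x2 = 8
print(partial_derivative(f, [1.0, 2.0], 1))  # ~ 3*x1 = 3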
Differential Calculus
Gradient
• We can concatenate partial derivatives of a multivariate function with
respect to all its input variables to obtain the gradient vector of the
function
• The gradient of the multivariate function 𝑓(𝐱) with respect to the n-dimensional input vector 𝐱 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)ᵀ is a vector of n partial derivatives
  ∇_𝐱 f(𝐱) = [ ∂f(𝐱)/∂x₁, ∂f(𝐱)/∂x₂, …, ∂f(𝐱)/∂xₙ ]ᵀ
• When there is no ambiguity, the notations 𝛻𝑓 𝐱 or 𝛻𝐱 𝑓 are often used for
the gradient instead of 𝛻𝐱 𝑓 𝐱
 The gradient symbol is 𝛻 (pronounced “nabla”), although 𝛻_𝐱 f(𝐱) is more often read as “gradient of f with respect to x”
• In ML, the gradient descent algorithm follows the opposite direction of the gradient of the loss function ℒ with respect to the model parameters 𝜃 (i.e., −𝛻_𝜃 ℒ) to minimize the loss function
 Adversarial examples can be created by adding a perturbation in the direction of the gradient of the loss ℒ with respect to the input examples 𝑥 (i.e., 𝛻_𝑥 ℒ) to maximize the loss function
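A minimal gradient descent sketch (added for illustration, not from the slides); the quadratic loss, learning rate, and iteration count are arbitrary choices:

def gradient(theta):
    # Gradient of the example loss L(theta) = (theta - 3)^2, which is minimized at theta = 3
    return 2 * (theta - 3.0)

theta, lr = 0.0, 0.1
for _ in range(100):
    theta -= lr * gradient(theta)  # step in the opposite direction of the gradient
print(theta)                       # converges toward 3.0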
Differential Calculus
Hessian Matrix
• To calculate the second-order partial derivatives of multivariate functions, we need to calculate the derivatives for all combinations of input variables
• That is, for a function 𝑓(𝐱) with an n-dimensional input vector 𝐱 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)ᵀ, there are 𝑛² second partial derivatives, one for each choice of i and j
  ∂²f / (∂xᵢ ∂xⱼ) = (∂/∂xᵢ)(∂f/∂xⱼ)
• The second partial derivatives are assembled in a matrix called the Hessian
  𝐇_f = [ ∂²f/∂x₁∂x₁   ⋯   ∂²f/∂x₁∂xₙ
           ⋮            ⋱    ⋮
          ∂²f/∂xₙ∂x₁   ⋯   ∂²f/∂xₙ∂xₙ ]
• Computing and storing the Hessian matrix for functions with high-
dimensional inputs can be computationally prohibitive
 E.g., the loss function for a ResNet50 model with approximately 23 million parameters has a Hessian with 23 M × 23 M = 529 T (trillion) entries
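A tiny symbolic sketch (added here, not from the slides) of a Hessian for a two-variable function; the function is an arbitrary example:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = x1**2 * x2 + sp.sin(x2)
H = sp.hessian(f, (x1, x2))  # 2 x 2 matrix of second partial derivatives
print(H)                     # Matrix([[2*x2, 2*x1], [2*x1, -sin(x2)]])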
Differential Calculus
Jacobian Matrix
• The concept of derivatives can be further generalized to vector-valued functions (or vector fields) 𝐟: ℝⁿ → ℝᵐ
• For an n-dimensional input vector 𝐱 = (𝑥₁, 𝑥₂, …, 𝑥ₙ)ᵀ ∈ ℝⁿ, the vector of functions is given as

  𝐟(𝐱) = [ f₁(𝐱), f₂(𝐱), …, fₘ(𝐱) ]ᵀ ∈ ℝᵐ
• The matrix of first-order partial derivatives of the vector-valued function 𝐟(𝐱) is an 𝑚 × 𝑛 matrix called the Jacobian

  𝐉 = [ ∂f₁(𝐱)/∂x₁   ⋯   ∂f₁(𝐱)/∂xₙ
         ⋮            ⋱    ⋮
        ∂fₘ(𝐱)/∂x₁   ⋯   ∂fₘ(𝐱)/∂xₙ ]
 For example, in robotics the robot Jacobian matrix gives the partial derivatives of the translational and angular velocities of the robot end-effector with respect to the joint (i.e., axis) velocities
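A small symbolic sketch (added here, not from the slides) of a Jacobian for a vector-valued function 𝐟: ℝ² → ℝ³; the component functions are arbitrary examples:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
F = sp.Matrix([x1**2 + x2, sp.sin(x1) * x2, x1 * x2])  # three component functions of (x1, x2)
J = F.jacobian([x1, x2])                               # 3 x 2 matrix of partial derivatives
print(J)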
Integral Calculus
• For a function 𝑓(𝑥) defined on the domain [𝑎, 𝑏], the definite
integral of the function is denoted
  ∫ₐᵇ f(x) dx
• The geometric interpretation of the integral is the area between the horizontal axis and the graph of 𝑓(𝑥) between the points a and b
 In this figure, the integral is the sum of blue areas (where 𝑓 𝑥 > 0)
minus the pink area (where 𝑓 𝑥 < 0)
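A small numeric sketch (added here, not from the slides) approximating a definite integral with a left Riemann sum; the integrand and interval are arbitrary choices:

def riemann_sum(f, a, b, n=100000):
    # Approximate the definite integral of f on [a, b] with a left Riemann sum
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

print(riemann_sum(lambda x: x**2, 0.0, 1.0))  # ~ 1/3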