
ICME Refresher Course: Calculus

Kaleigh Mentzer, Nadim Saad, Christiane Adcock


July 2020

1 Introduction
This refresher course covers material primarily relevant to CME 302, CME 303,
CME 306, and CME 307. This material is also relevant in many engineering and
finance classes. This is not a rigorous, theoretical explanation of Calculus. If you
want that, consider taking a master’s level calculus class. If you already know
the material covered in this refresher course, great. If not, don’t worry—that’s
why we have refresher courses.

2 Scalar Calculus
2.1 Derivatives, partial derivatives
The derivative of a single-variable function f(x) is denoted by f'(x), \frac{df}{dx}(x), or f_x. It is defined by

f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}

The value of f'(x) is equal to the slope of the tangent line to the function at the point x.

When we extend this definition to the multivariable case, we get partial derivatives. Suppose we have a multivariable function f(x_1, x_2, ..., x_n). The partial derivative of f with respect to the variable x_i, written \frac{\partial f}{\partial x_i} or f_{x_i}, is

\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, x_2, ..., x_i + h, ..., x_n) - f(x_1, x_2, ..., x_n)}{h}

For the partial derivative, we treat all variables except the one we're differentiating with respect to as constants and differentiate just as we do in the single-variable case.

2.2 Chain rule


There are several rules for how to differentiate. One common one is the chain rule. Suppose we have two differentiable functions f(x) and g(x), and we are interested in computing the derivative of f(g(x)). According to the chain rule, if we define h(x) = f(g(x)), then

h'(x) = f'(g(x)) g'(x)

Exercise: Differentiate h(x) = e^{x^2}.
Solution: We define f(x) = e^x and g(x) = x^2. Plugging these functions into the chain rule, we get

h'(x) = 2x e^{x^2}

We can extend the chain rule to multivariable functions, f(x_1(t), x_2(t), ..., x_n(t)). Suppose we are interested in computing \frac{df}{dt}. Applying the chain rule, we get

\frac{df}{dt} = \frac{\partial f}{\partial x_1} \frac{dx_1}{dt} + ... + \frac{\partial f}{\partial x_n} \frac{dx_n}{dt}
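To see the chain rule in action numerically, here is a minimal sketch (assuming NumPy is installed) that compares the analytic derivative of h(x) = e^{x^2} with a central finite difference:

```python
# A minimal numerical check of the chain rule; assumes NumPy is available.
import numpy as np

def h(x):
    return np.exp(x**2)          # h(x) = f(g(x)) with f = exp, g(x) = x^2

def h_prime(x):
    return 2 * x * np.exp(x**2)  # chain rule: f'(g(x)) * g'(x)

x0, eps = 1.3, 1e-6
fd = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)  # central finite difference
print(fd, h_prime(x0))           # the two values agree to many digits
```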

2.3 Fundamental Theorem of Calculus


The Fundamental Theorem of Calculus (FTC) links the concepts of integration and differentiation.

The first part of the FTC is that if f is a function that is continuous on a closed interval [a, b] and, for all x \in (a, b),

F(x) = \int_a^x f(t) \, dt

then

F'(x) = f(x) \quad \forall x \in (a, b)

The second part of the FTC is that if f is a function on a closed interval [a, b], F'(x) = f(x) in [a, b], and f is Riemann integrable on [a, b], then

\int_a^b f(x) \, dx = F(b) - F(a)

The Riemann sum of a function is the sum of the areas of a series of shapes, such as rectangles, which approximate the region under the function. Several examples of Riemann sums with rectangles are shown in Figure 1.

The Riemann integral is the limit of the Riemann sum as the function is partitioned into smaller and smaller shapes.

A function is Riemann integrable if, no matter how we draw the rectangles, the Riemann integral converges to the same number. All continuous functions are Riemann integrable on a bounded interval.

Most functions you will talk about in the core courses are Riemann integrable. For example, all polynomials on a bounded interval are Riemann integrable. An example of a function that is not Riemann integrable is the indicator function for rational numbers

I_{\mathbb{Q}}(x) = \begin{cases} 1, & \text{if } x \text{ is a rational number} \\ 0, & \text{otherwise} \end{cases}
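A Riemann sum is easy to compute directly. The sketch below (an illustrative choice of integrand and interval, assuming NumPy) shows a left-endpoint Riemann sum of sin(x) on [0, \pi] converging to the exact integral, 2:

```python
# Left-endpoint Riemann sums converging to the Riemann integral.
import numpy as np

def riemann_sum(f, a, b, n):
    dx = (b - a) / n
    x = np.linspace(a, b, n, endpoint=False)  # left endpoints of the n subintervals
    return np.sum(f(x)) * dx

for n in [10, 100, 1000, 10000]:
    print(n, riemann_sum(np.sin, 0.0, np.pi, n))  # approaches the exact value 2
```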

2.4 Leibniz Integral Rule


The Leibniz Integral Rule, also called the Leibniz rule, tells us how to differentiate an integral.

Say we have a function of two variables, f(x, t), which we are integrating with respect to (wrt) t, and the bounds of our integral are functions of x, a(x) and b(x). If we differentiate this integral wrt x, then the Leibniz rule tells us

\frac{d}{dx} \left( \int_{a(x)}^{b(x)} f(x, t) \, dt \right) = \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t) \, dt + f(x, b(x)) \frac{d}{dx} b(x) - f(x, a(x)) \frac{d}{dx} a(x)

Say we have a function of one variable, f(t), which we are integrating wrt t, and the bounds of our integral are functions of x, a(x) and b(x). If we differentiate this integral wrt x, then the Leibniz rule tells us

\frac{d}{dx} \int_{a(x)}^{b(x)} f(t) \, dt = f(b(x)) \frac{d}{dx} b(x) - f(a(x)) \frac{d}{dx} a(x)

Say we have a function of two variables, f(x, t), which we are integrating wrt t, and the bounds of our integral are constants, a and b. If we differentiate this integral wrt x, then the Leibniz rule tells us

\frac{d}{dx} \int_a^b f(x, t) \, dt = \int_a^b \frac{\partial}{\partial x} f(x, t) \, dt

Say we have a function of one variable, f(x), and the bounds of our integral are constants, a and b. Integrating the derivative of f wrt x recovers the second part of the FTC:

\int_a^b \frac{d}{dx} f(x) \, dx = f(b) - f(a)
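As a sanity check, the sketch below verifies the general (first) case of the Leibniz rule numerically for the illustrative choices f(x, t) = e^{xt}, a(x) = 0, b(x) = x, assuming SciPy is available:

```python
# Numerical check of the Leibniz rule for f(x, t) = exp(x t), a(x) = 0, b(x) = x.
import numpy as np
from scipy.integrate import quad

def I(x):
    return quad(lambda t: np.exp(x * t), 0.0, x)[0]   # the integral as a function of x

def leibniz(x):
    under_integral = quad(lambda t: t * np.exp(x * t), 0.0, x)[0]  # d/dx moved inside
    boundary = np.exp(x * x) * 1.0   # f(x, b(x)) * b'(x); the a(x) term vanishes since a'(x) = 0
    return under_integral + boundary

x0, eps = 0.7, 1e-6
print((I(x0 + eps) - I(x0 - eps)) / (2 * eps), leibniz(x0))  # the two values agree
```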

2.5 Integration by parts


Integration by parts is the same as the divergence theorem for n = 1. The divergence theorem generalizes integration by parts to higher dimensions and is discussed in section 3. Integration by parts is frequently used in CME 303 (with energy equations related to PDEs) and CME 306 (in finite element methods, where integration by parts is key to the Galerkin method).

Integration by parts tells us that if u(x) and v(x) are two differentiable functions, then

\int_a^b u(x) v'(x) \, dx = \left[ u(x) v(x) \right]_a^b - \int_a^b u'(x) v(x) \, dx

Derivation: Integration by parts can be quickly derived from the product rule. Begin with u(x)v(x) and differentiate:

(u(x)v(x))' = u'(x)v(x) + u(x)v'(x)

Integrating both sides from a to b,

\int_a^b (u(x)v(x))' \, dx = \int_a^b u'(x)v(x) \, dx + \int_a^b u(x)v'(x) \, dx

By the fundamental theorem of calculus,

\int_a^b (u(x)v(x))' \, dx = \left[ u(x)v(x) \right]_a^b

Plugging this into the previous expression and rearranging terms, we get

\int_a^b u(x)v'(x) \, dx = \left[ u(x)v(x) \right]_a^b - \int_a^b u'(x)v(x) \, dx

which concludes our derivation.

Example: A common integral is

\int_0^\pi x \sin(x) \, dx

You'll see this integral in CME 303 when working with Fourier series. We solve it using integration by parts. We define

u(x) = x, \quad v'(x) = \sin(x), \quad u'(x) = 1, \quad v(x) = -\cos(x)

Plugging these into the integration by parts formula,

\int_0^\pi x \sin(x) \, dx = \left[ -x \cos(x) \right]_0^\pi + \int_0^\pi \cos(x) \, dx = (\pi - 0) + (0 - 0) = \pi
When computing integrals, be sure to check for symmetry. If a function is integrated over an interval that's symmetric about zero, [-a, a], and the function is odd, then the integral is 0. An odd function is a function f(x) such that (s.t.)

f(-x) = -f(x)

An even function is a function f(x) s.t.

f(-x) = f(x)

Example: We can solve the following integral using the rule about integrals of odd functions:

\int_{-\pi/2}^{\pi/2} x \cos(x) \, dx = 0

This is true because our interval, [-\pi/2, \pi/2], is symmetric about zero and x \cos(x) is odd.
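Both results are easy to confirm numerically; the sketch below (assuming SciPy) checks the integration by parts example and the odd-function example:

```python
# Numerical confirmation of the two integrals above.
import numpy as np
from scipy.integrate import quad

val, _ = quad(lambda x: x * np.sin(x), 0.0, np.pi)
print(val)   # ~3.14159...: integration by parts gives exactly pi

val, _ = quad(lambda x: x * np.cos(x), -np.pi / 2, np.pi / 2)
print(val)   # ~0: an odd integrand over a symmetric interval
```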

2.6 Taylor series and Maclaurin series


Let f(x) be a real-valued function that is infinitely differentiable at x = x_0. The Taylor series, also called the Taylor expansion, of the function f(x) around x_0 is

\sum_{n=0}^{\infty} f^{(n)}(x_0) \frac{(x - x_0)^n}{n!}

Writing this out, we get

f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + ... + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + ...

The Taylor series can be generalized to higher dimensions. For example, the Taylor series of f(x, y) about (x_0, y_0) is

f(x, y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0)
+ \frac{1}{2!} \left( \frac{\partial^2 f}{\partial x^2}(x_0, y_0)(x - x_0)^2 + 2 \frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)(x - x_0)(y - y_0) + \frac{\partial^2 f}{\partial y^2}(x_0, y_0)(y - y_0)^2 \right) + ...

When the Taylor series converges to f, this infinite sum of polynomial terms equals the original function f(x), even if f(x) is not itself a polynomial. Often, though, we use Taylor polynomials to approximate a function up to a certain order of error. This approximation is best when we are close to x_0. For example,

f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + O((x - x_0)^3)

where O((x - x_0)^3) denotes that the error in this approximation is on the order of (x - x_0)^3. Since we're considering x close to x_0, we assume (x - x_0) is small, specifically less than 1. Then (x - x_0)^3 is small compared to the first few terms of the series, and we choose to ignore it in our approximations. You'll hear more about this "truncation error" and orders of approximation in CME 306.

If the Taylor series expansion is centered around x = 0, then we call this polynomial a Maclaurin series expansion instead.

Taylor series are used frequently in CME 306 (Numerical Solutions for Partial Differential Equations). For example, when approximating derivatives in a PDE, you can figure out the order of the error in your approximation using Taylor series.

Some useful Taylor series:

Expansions about x = 0:

e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}

\sin(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n+1}}{(2n+1)!}

\cos(x) = \sum_{n=0}^{\infty} \frac{(-1)^n x^{2n}}{(2n)!}

Be careful about \ln(x); the form looks very different depending on what value you expand around. Here's the series for \ln(x) about x = 1:

\ln(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1} (x - 1)^n}{n}
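The sketch below (standard library only) shows how quickly truncated Maclaurin polynomials of sin(x) approach the true value near x = 0:

```python
# Truncated Maclaurin series of sin(x) compared with the exact value.
import math

def sin_taylor(x, terms):
    return sum((-1)**n * x**(2 * n + 1) / math.factorial(2 * n + 1)
               for n in range(terms))

x0 = 0.5
for terms in [1, 2, 3, 4]:
    print(terms, sin_taylor(x0, terms), math.sin(x0))  # converges rapidly near 0
```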

3 Vector Calculus
In this section, we cover the dot product, cross product, scalar fields, vector
fields, operators on scalar and vector fields, the Divergence theorem, and Stokes’
theorem.

In subsequent sections, unbold lowercase letters denote scalars, e.g. a. Bold lowercase letters denote vectors, e.g. x. Bold uppercase letters denote matrices, e.g. A. In other texts or lectures, where writing bold characters is challenging, you may see vectors denoted with a single underline, and matrices denoted with an unbold uppercase letter or a double underline.

Depending on the convention used in a class or text, vectors may by default be represented as rows or columns. Our convention will be that vectors are represented as columns. Using this convention, we can also write the dot product as

a^T b

meaning the product of a row vector, a^T, with a column vector, b.

3.1 Dot Product


The dot product, also called the scalar product, is an operation that takes in two vectors and returns a scalar. Given two vectors, u = (u_1, u_2, ..., u_n) and v = (v_1, v_2, ..., v_n), we define their dot product as

u \cdot v = \sum_{i=1}^n u_i v_i = u_1 v_1 + u_2 v_2 + u_3 v_3 + ... + u_n v_n

We can also define the dot product of u and v as

u \cdot v = \|u\| \|v\| \cos\theta

where \theta is the angle formed between u and v, and \|x\| is the magnitude, or length, of x. Here we are defining the magnitude to be the Euclidean norm, also called the Euclidean length,

\|x\| = \sqrt{x_1^2 + x_2^2 + ... + x_n^2}

Tomorrow we're going to talk about different definitions of the norm.

If two vectors, u and v, are orthogonal, u \perp v, this means that the angle between them, \theta, is \pi/2 radians. If u \perp v, then

u \cdot v = 0

We can prove this using the definition

u \cdot v = \|u\| \|v\| \cos\theta = \|u\| \|v\| \cdot 0 = 0

since \cos\theta = 0 when \theta = \pi/2.

If two vectors, u and v, are codirectional, u \parallel v, this means the angle between them is 0. If u \parallel v, then

u \cdot v = \|u\| \|v\|

We can again prove this using the definition

u \cdot v = \|u\| \|v\| \cos\theta = \|u\| \|v\| \cdot 1 = \|u\| \|v\|

since \cos\theta = 1 when \theta = 0.

The component of a vector, u, in the direction of another vector, v, is

u \cdot \hat{v} = u \cdot \frac{v}{\|v\|} = \|u\| \cos\theta

where \theta is the angle between u and v. The unit vector in the direction of x, written \hat{x}, is the vector in the direction of x with magnitude one, \|\hat{x}\| = 1:

\hat{x} = \frac{x}{\|x\|}

The projection of u in the direction of v is

(u \cdot \hat{v}) \hat{v} = \left( u \cdot \frac{v}{\|v\|} \right) \frac{v}{\|v\|} = (\|u\| \cos\theta) \frac{v}{\|v\|}

The dot product has the following properties:

• commutative: a \cdot b = b \cdot a
• distributive over vector addition: a \cdot (b + c) = a \cdot b + a \cdot c
• scalar multiplication: (c_1 a) \cdot (c_2 b) = c_1 c_2 (a \cdot b)
• not associative: (a \cdot b) \cdot c and a \cdot (b \cdot c) are both ill-defined, since the dot product of a scalar with a vector is not defined
• no cancellation: if a \cdot b = a \cdot c, it is not necessarily true that b = c. a \cdot b = a \cdot c only tells us, by the distributive law, that a \cdot (b - c) = 0, meaning a \perp (b - c).
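Here is a short NumPy sketch of these definitions (the vectors are an arbitrary illustration):

```python
# Dot product, angle, component, and projection with NumPy.
import numpy as np

u = np.array([3.0, 4.0])
v = np.array([1.0, 0.0])

dot = u @ v                                          # u . v = 3
cos_theta = dot / (np.linalg.norm(u) * np.linalg.norm(v))
v_hat = v / np.linalg.norm(v)                        # unit vector in the direction of v
component = u @ v_hat                                # ||u|| cos(theta) = 3
projection = component * v_hat                       # (u . v_hat) v_hat = [3, 0]
print(dot, cos_theta, component, projection)
```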

3.2 Cross Product


Let u and v be two vectors. For any two vectors, there exists a unit vector \hat{n} that is orthogonal to both u and v.

The cross product of u and v, u \times v, is

u \times v = \|u\| \|v\| \sin(\theta) \hat{n}

\hat{n} is not uniquely determined in terms of u and v: it could point in two different directions along the same line. The ambiguity is resolved by using the right-hand rule, depicted in Figure 1.

The geometric interpretation of u \times v = \|u\| \|v\| \sin(\theta) \hat{n} is that u \times v is the vector whose length is the area of the parallelogram formed by the vectors u and v and whose direction is perpendicular to u and v. This is shown in Figure 2.

For vectors u = [u_1; u_2; u_3] and v = [v_1; v_2; v_3], the cross product can be computed by

u \times v = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}

where \hat{i}, \hat{j}, \hat{k} are the unit vectors in the direction of the x, y, z axes of the Cartesian coordinate system:

\hat{i} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad \hat{j} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \quad \hat{k} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}

We also use e_1, e_2, e_3 to denote the unit vectors in the direction of a 3D coordinate system. This coordinate system could be a Cartesian coordinate system, but it could also be a non-Cartesian coordinate system, such as a cylindrical or spherical system.

|A| is the determinant of matrix A. The determinant is also written det(A). For a 3 \times 3 matrix

A = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}

the determinant is

|A| = a \begin{vmatrix} e & f \\ h & i \end{vmatrix} - b \begin{vmatrix} d & f \\ g & i \end{vmatrix} + c \begin{vmatrix} d & e \\ g & h \end{vmatrix}

For a 2 \times 2 matrix

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

the determinant is

|A| = ad - bc

The determinant can also be defined in higher dimensions. If you need to do this, look up the formula online.

The cross product has the following properties:

• anticommutative: a \times b = -(b \times a)
• distributive over vector addition: a \times (b + c) = (a \times b) + (a \times c)
• scalar multiplication: (c_1 a) \times b = a \times (c_1 b) = c_1 (a \times b)
• not associative: (a \times b) \times c \neq a \times (b \times c) in general
• no cancellation: if a \times b = a \times c, it is not necessarily true that b = c. It only means that 0 = a \times (b - c).
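The sketch below checks the determinant formula and two of these properties with NumPy (vectors chosen arbitrarily):

```python
# Cross product: orthogonality and anticommutativity.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])

uxv = np.cross(u, v)
print(uxv)                    # [-3.  6. -3.], matching the determinant formula
print(np.cross(v, u))         # the negative: anticommutativity
print(uxv @ u, uxv @ v)       # both 0: u x v is orthogonal to u and v
```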

3.3 Scalar and Vector Fields, and their Operators


A scalar field is an assignment of each point in space to a scalar value. A scalar field is a type of scalar-valued function. A scalar-valued function is a function that takes in a vector and outputs a scalar, f : R^n \to R. An example of a scalar field is a function that maps a location in 3D space, (x, y, z), to the temperature at that location, T(x, y, z). A scalar field is typically visualized with a heatmap, which assigns each location in space a color representing a value. Figure 3 shows a 2D heatmap.

A vector field is an assignment of each point in space to a vector value. A vector field is a type of vector-valued function. A vector-valued function is a function that takes in a vector of dimension m \times 1 and outputs a different vector of dimension n \times 1, F : R^m \to R^n. An example of a vector field is a function that maps each location in 3D space, (x, y, z), to the x and y velocities at that location, (u(x, y, z), v(x, y, z)). A vector field is often visualized with a quiver plot, which draws a vector at regular points in space. The location in space represents the input vector. The direction and length of the vector drawn at that location represent the direction and magnitude of the corresponding output vector. Figure 4 shows a quiver plot that takes a 2D input to a 2D output.

We can also interpret a vector field as a vector whose components are functions. That is, at each point (x, y, z) in space we have a vector determined by

F = F_1(x, y, z) \hat{i} + F_2(x, y, z) \hat{j} + F_3(x, y, z) \hat{k}

The nabla/del operator, \nabla, is the vector

\nabla = \left( \frac{\partial}{\partial x_1}, \frac{\partial}{\partial x_2}, \cdots, \frac{\partial}{\partial x_n} \right)

The gradient of a function (a scalar field), f(x_1, x_2, ..., x_n), is a vector and is given by

\nabla f = \begin{pmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{pmatrix}

Gradients are commonly used in optimization to find the critical points of a function. The critical points are all points (x_1, x_2, ..., x_n) that satisfy \nabla f(x_1, x_2, ..., x_n) = 0.

The Hessian of a function, f(x_1, x_2, ..., x_n), is a matrix and is given by

\nabla^2 f = \begin{pmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{pmatrix}

Hessians are commonly used in optimization as well. Hessians are used to check whether the critical points of a function are maxima, minima, or saddle points.
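Gradients and Hessians can be computed symbolically; here is a minimal SymPy sketch for an arbitrary two-variable scalar field:

```python
# Symbolic gradient and Hessian of a scalar field with SymPy.
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 * y + sp.sin(y)

grad = sp.Matrix([sp.diff(f, x), sp.diff(f, y)])
hess = sp.hessian(f, (x, y))
print(grad)  # Matrix([[2*x*y], [x**2 + cos(y)]])
print(hess)  # Matrix([[2*y, 2*x], [2*x, -sin(y)]])
```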

The Jacobian of a vector-valued function, F : R^n \to R^m, is denoted \nabla F, DF, or J_F. Its (i, j) entry is \partial F_i / \partial x_j:

\nabla F = \begin{pmatrix}
\frac{\partial F_1}{\partial x_1} & \frac{\partial F_1}{\partial x_2} & \cdots & \frac{\partial F_1}{\partial x_n} \\
\frac{\partial F_2}{\partial x_1} & \frac{\partial F_2}{\partial x_2} & \cdots & \frac{\partial F_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial F_m}{\partial x_1} & \frac{\partial F_m}{\partial x_2} & \cdots & \frac{\partial F_m}{\partial x_n}
\end{pmatrix}

The Hessian of a function, f, is the Jacobian of the gradient of f, \nabla(\nabla f).

The divergence of a vector field, F(x_1, ..., x_n) = (F_1(x_1, ..., x_n), ..., F_n(x_1, ..., x_n)), is

\text{div}(F) = \nabla \cdot F = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + ... + \frac{\partial F_n}{\partial x_n}

The divergence has the following properties:
• \nabla \cdot (F + G) = \nabla \cdot F + \nabla \cdot G
• \nabla \cdot (f F) = (\nabla f) \cdot F + f (\nabla \cdot F)

The curl of a vector field, F, is

\nabla \times F

The curl has the following properties:
• \nabla \times (F + G) = \nabla \times F + \nabla \times G
• \nabla \times (f F) = (\nabla f) \times F + f (\nabla \times F)
• \nabla \times (\nabla f) = 0
• \nabla \cdot (\nabla \times F) = 0
The Laplace operator, also called the Laplacian, is

\Delta = \nabla^2 = \nabla \cdot \nabla = \sum_{i=1}^n \frac{\partial^2}{\partial x_i^2}

The Laplacian of a scalar field, f(x_1, x_2, ..., x_n), is

\nabla^2 f = \frac{\partial^2 f}{\partial x_1^2} + \frac{\partial^2 f}{\partial x_2^2} + ... + \frac{\partial^2 f}{\partial x_n^2}

The Laplacian of a vector field is

\nabla^2 F

In Cartesian coordinates, the Laplacian of a vector field, F(x_1, x_2, x_3) = (F_1, F_2, F_3), is applied componentwise:

\nabla^2 F = (\nabla^2 F_1, \nabla^2 F_2, \nabla^2 F_3)

The Laplacian operator appears in CME 303 in PDEs, such as the Laplace equation and the Poisson equation.

3.4 Divergence Theorem


Let E be a subset of R^n and S be the boundary surface of E with positive orientation. For n = 3, E is a 3D volume. Positive orientation means that the unit normal, \hat{n}, anywhere on S points outward. Let F be a vector field whose components have continuous first-order partial derivatives. Then the divergence theorem says

\iint_S (F \cdot \hat{n}) \, dS = \iiint_E (\nabla \cdot F) \, dV

The divergence theorem is often used in CME 303. The divergence theorem simplifies to integration by parts when n = 1.

Example to practice: Compute the flux of the vector field F = x\hat{i} + y\hat{j} + z\hat{k} over the entire surface of the cylinder x^2 + y^2 = a^2 bounded by the planes z = 0 and z = b, using the divergence theorem.
3.5 Stokes' Theorem
Let S be an oriented smooth surface that is bounded by a simple, closed, smooth boundary curve C with positive orientation. An oriented smooth surface is a smooth surface with a normal vector, \hat{n}, pointed from one side of the surface to the other. A curve with positive orientation is a curve where, if you curl your fingers in the direction of the curve, your thumb points in the direction of \hat{n}. Let F be a vector field. Stokes' theorem tells us that

\oint_C F \cdot dr = \iint_S (\nabla \times F) \cdot dS

4 Einstein Notation
This notation most likely won't come up in your core classes. I'm going to cover it because it is widely used in textbooks and papers. In addition, if you take a graduate-level engineering class, there's a good chance that class will assume you know Einstein notation.

In Einstein notation, also called the Einstein summation convention, if an index, such as the i in x_i, appears twice in a term, then that term is summed over all values of the index. For example, for i \in \{1, 2, 3\},

a_i x_i = \sum_{i=1}^3 a_i x_i = a_1 x_1 + a_2 x_2 + a_3 x_3

Each letter can appear at most twice per term. For example, d_{ijij} is a valid term but d_{ijjj} is not a valid term.

If there are zero non-repeated indices, such as u or u_i u_i, then the output is a scalar. This holds so long as u is not defined to be a vector. Most classes use some notation to differentiate scalar, vector, and matrix values, such as making vector values bold or underlined, and matrix values capital or capital and double-underlined.

If there is one non-repeated index, then the output is a vector, such as u_i or u_i u_i u_j. A tricky example of this is u_i^2, which is a vector, u_i^2 = (u_1^2, u_2^2, ..., u_n^2). This is different from u_i u_i, which we said is a scalar, u_i u_i = \sum_{i=1}^n u_i u_i = u_1^2 + u_2^2 + ... + u_n^2.

If there are two non-repeated indices, then the output is a matrix, such as u_i u_j or u_i u_i u_j u_k.

In Einstein notation, the dot product is

u \cdot v = u_i v_i

and the cross product is

(u \times v)_k = \varepsilon_{ijk} u_i v_j

where \varepsilon_{ijk} is the Levi-Civita symbol.

The Levi-Civita symbol is

\varepsilon_{ijk} = \begin{cases} 1 & \text{if } ijk \text{ is in cyclic order } \{123, 231, 312\} \\ -1 & \text{if } ijk \text{ is in anti-cyclic order } \{321, 132, 213\} \\ 0 & \text{if two or more indices are the same} \end{cases}

If any two indices of the Levi-Civita symbol are switched, the symbol changes sign:

\varepsilon_{...i_p...i_q...} = -\varepsilon_{...i_q...i_p...}

The double-\varepsilon identity tells us that

\varepsilon_{ijk} \varepsilon_{ipq} = \delta_{jp} \delta_{kq} - \delta_{jq} \delta_{kp}

where \delta_{ij} is the Kronecker delta function.

The Kronecker delta function is

\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}

The ith element of the product of a matrix, A_{ij}, with a vector, v_j, is

(Av)_i = A_{ij} v_j

The (i, k)th element of the product of two matrices, A_{ij} and B_{jk}, is

(AB)_{ik} = A_{ij} B_{jk}
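Einstein-notation expressions map almost verbatim onto NumPy's einsum; the sketch below (arbitrary inputs) spells out a few of the expressions from this section:

```python
# Einstein notation via np.einsum: repeated indices are summed.
import numpy as np

u = np.array([1.0, 2.0, 3.0])
v = np.array([4.0, 5.0, 6.0])
A = np.arange(9.0).reshape(3, 3)
B = np.eye(3)

print(np.einsum('i,i->', u, v))      # u_i v_i: the dot product (a scalar)
print(np.einsum('ij,j->i', A, v))    # A_ij v_j: matrix-vector product (one free index)
print(np.einsum('ij,jk->ik', A, B))  # A_ij B_jk: matrix-matrix product
print(np.einsum('i,j->ij', u, v))    # u_i v_j: two free indices give a matrix
```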

5 Matrix Calculus
5.1 Transpose
The transpose of a matrix A, denoted A^T, is defined by

(A^T)_{ij} = A_{ji}

The transpose has the following properties:

• (A^T)^T = A
• (A + B)^T = A^T + B^T
• (AB)^T = B^T A^T
• (cA)^T = c A^T
• det(A^T) = det(A)
• (A^T)^{-1} = (A^{-1})^T, where A^{-1} is the inverse, defined in the next section
• the eigenvalues of A are the same as the eigenvalues of A^T (you will discuss eigenvalues in the linear algebra refresher course)

A symmetric matrix has A^T = A.

5.2 Inverse
The inverse of a matrix, A, is another matrix, A^{-1}, such that

A A^{-1} = I

where I is the identity matrix, which is 1 on the diagonal and 0 everywhere else. For a 2 \times 2 matrix

A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}

the inverse is

A^{-1} = \frac{1}{\det(A)} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}

You can find formulas to compute the inverse of n \times n matrices online.

The inverse has the following properties:

• (A^{-1})^{-1} = A
• (AB)^{-1} = B^{-1} A^{-1}
• (cA)^{-1} = \frac{1}{c} A^{-1}

There is no simple expression for (A + B)^{-1}.

5.3 Matrix Differentiation


Let y be an m \times 1 vector and x be an n \times 1 vector, where

y = f(x)

The partial derivative of y with respect to (wrt) x is

\frac{\partial y}{\partial x} = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix}

This is the Jacobian of f.

It is worth remembering the results of this formula for the following common matrix derivatives. We prove the first result; the rest can be proved similarly.

If A does not depend on x, then

\frac{\partial (Ax)}{\partial x} = A

We can prove this by

\frac{\partial}{\partial x_j} (Ax)_i = \frac{\partial}{\partial x_j} \left( \sum_{k=1}^n a_{ik} x_k \right) = a_{ij} \quad \forall i = 1, ..., m, \; j = 1, ..., n

so

\frac{\partial y}{\partial x} = A

If A does not depend on x or z, and x depends on z, then

\frac{\partial (Ax)}{\partial z} = A \frac{\partial x}{\partial z}

If A does not depend on x and y, then

\frac{\partial (y^T A x)}{\partial x} = y^T A

and

\frac{\partial (y^T A x)}{\partial y} = x^T A^T

If A does not depend on x, then

\frac{\partial (x^T A x)}{\partial x} = x^T (A + A^T)

If A does not depend on x and A is symmetric, then

\frac{\partial (x^T A x)}{\partial x} = 2 x^T A

If y and x depend on z, then

\frac{\partial (y^T x)}{\partial z} = x^T \frac{\partial y}{\partial z} + y^T \frac{\partial x}{\partial z}

If x depends on z, then

\frac{\partial (x^T x)}{\partial z} = 2 x^T \frac{\partial x}{\partial z}

If y and x depend on z and A does not depend on z, then

\frac{\partial (y^T A x)}{\partial z} = x^T A^T \frac{\partial y}{\partial z} + y^T A \frac{\partial x}{\partial z}

If x depends on z and A does not depend on z, then

\frac{\partial (x^T A x)}{\partial z} = x^T (A + A^T) \frac{\partial x}{\partial z}

If x depends on z, A does not depend on z, and A is a symmetric matrix, then

\frac{\partial (x^T A x)}{\partial z} = 2 x^T A \frac{\partial x}{\partial z}
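Any of these identities can be sanity-checked with finite differences. Here is a sketch (random A and x, assuming NumPy) for the first identity, \partial(Ax)/\partial x = A:

```python
# Finite-difference check that the Jacobian of Ax is A.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
x = rng.standard_normal(4)
eps = 1e-6

J = np.zeros((3, 4))
for j in range(4):
    e = np.zeros(4)
    e[j] = eps
    J[:, j] = (A @ (x + e) - A @ (x - e)) / (2 * eps)  # column j of the Jacobian
print(np.allclose(J, A))   # True
```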

5.4 Quadratic Forms


A quadratic form is a function f : R^n \to R of the form

f(x) = x^T A x = \sum_{i=1}^n \sum_{j=1}^n A_{ij} x_i x_j

where A is an n \times n matrix and x \in R^n.

You will see in CME 302 that these forms are frequently used to prove properties about matrices, such as positive definiteness and negative definiteness. These forms are also often used in CME 307 (or any other optimization class), such as when optimizing over ellipsoids.

We will now derive the gradient and Hessian of a quadratic form. These are used in CME 307 to derive conditions for finding critical points and checking for maxima and minima.

Our variable of interest is the vector x \in R^n. Therefore, we have n variables of interest, and we should get a derivative of the quadratic form with respect to each of these n variables. The gradient of this quadratic form is a vector \nabla f \in R^n. Computing the derivative with respect to each variable, we get the gradient

\nabla f = (A + A^T) x

which reduces to 2Ax when A is symmetric.

The Hessian is the gradient of \nabla f. We now have n terms (in our gradient) and n variables of interest; we therefore expect to get n^2 terms, which we write as a matrix. Computing the derivative wrt each variable, we get that the Hessian is

\nabla^2 f = A + A^T

which reduces to 2A when A is symmetric. If the expressions for the gradient and Hessian are confusing you, we'd recommend working through the case where n = 2.
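The same finite-difference check works for the quadratic form; the sketch below uses a deliberately non-symmetric A to show that the gradient is (A + A^T)x in general:

```python
# Numerical check of the gradient of a quadratic form.
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))   # non-symmetric on purpose
x = rng.standard_normal(3)
f = lambda v: v @ A @ v

eps = 1e-6
grad = np.array([(f(x + eps * np.eye(3)[i]) - f(x - eps * np.eye(3)[i])) / (2 * eps)
                 for i in range(3)])
print(np.allclose(grad, (A + A.T) @ x))   # True; this equals 2 A x only if A = A^T
```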

5.5 Inner Product


An inner product on a real vector space V is a map

\langle \cdot, \cdot \rangle : V \times V \to R

such that (s.t.):

1. \langle \cdot, \cdot \rangle is linear in both slots, meaning \forall u, u_1, u_2, v, v_1, v_2 \in V:

\langle c_1 u_1 + c_2 u_2, v \rangle = c_1 \langle u_1, v \rangle + c_2 \langle u_2, v \rangle

and

\langle u, c_1 v_1 + c_2 v_2 \rangle = c_1 \langle u, v_1 \rangle + c_2 \langle u, v_2 \rangle

2. \langle \cdot, \cdot \rangle is symmetric, meaning \langle u, v \rangle = \langle v, u \rangle

3. \langle \cdot, \cdot \rangle is positive definite, meaning

v \in V \implies \langle v, v \rangle \geq 0, \quad \text{and} \quad \langle v, v \rangle = 0 \iff v = 0

We also introduce the notation for the norm associated with this inner product:

\|v\| = \langle v, v \rangle^{1/2}

There are many maps that satisfy these requirements, meaning that there are many different inner products and associated norms. One inner product is the dot product. A useful exercise is proving that the dot product satisfies the definition of an inner product given above. You will see other inner products and norms in CME 302 and CME 303.

6 Fourier Transform
The Fourier transform of a function f(x) is

\hat{f}(\xi) = \int_{\mathbb{R}} f(x) e^{-ix\xi} \, dx

The inverse Fourier transform recovers f(x) from \hat{f}(\xi):

f(x) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat{f}(\xi) e^{ix\xi} \, d\xi

You will see in CME 303 that Fourier transforms are often used to solve PDEs. In CME 308, you will talk about the characteristic function (cf) of a probability density function (pdf); the cf is the Fourier transform of the pdf. The cf and pdf were introduced in the probability and statistics refresher course.

Exercise: Compute the Fourier transform of f(x) = e^{-x} for x > 0 (and f(x) = 0 otherwise).
Solution:

\hat{f}(\xi) = \int_0^\infty e^{-x} e^{-ix\xi} \, dx
= \int_0^\infty e^{(-i\xi - 1)x} \, dx
= \left[ \frac{1}{-i\xi - 1} e^{(-i\xi - 1)x} \right]_0^\infty
= \frac{1}{1 + i\xi}
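The closed form is easy to verify numerically; the sketch below (assuming SciPy) integrates the real and imaginary parts separately:

```python
# Numerical check that f_hat(xi) = 1 / (1 + i xi) for f(x) = e^{-x}, x > 0.
import numpy as np
from scipy.integrate import quad

def f_hat(xi):
    # e^{-i x xi} = cos(x xi) - i sin(x xi), so split into real and imaginary parts
    re = quad(lambda x: np.exp(-x) * np.cos(x * xi), 0.0, np.inf)[0]
    im = quad(lambda x: -np.exp(-x) * np.sin(x * xi), 0.0, np.inf)[0]
    return re + 1j * im

xi = 2.0
print(f_hat(xi), 1 / (1 + 1j * xi))   # both are 0.2 - 0.4j
```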

Exercise: Let f_a(x) = f(x - a). Show that the Fourier transform of f_a(x), written \hat{f}_a(\xi), is given by e^{-ia\xi} \hat{f}(\xi), where \hat{f}(\xi) is the Fourier transform of f(x).
Solution: Substituting y = x - a,

\hat{f}_a(\xi) = \int_{-\infty}^{\infty} f_a(x) e^{-ix\xi} \, dx
= \int_{-\infty}^{\infty} f(x - a) e^{-ix\xi} \, dx
= \int_{-\infty}^{\infty} f(y) e^{-i(y + a)\xi} \, dy
= \int_{-\infty}^{\infty} f(y) e^{-ia\xi} e^{-iy\xi} \, dy
= e^{-ia\xi} \int_{-\infty}^{\infty} f(y) e^{-iy\xi} \, dy
= e^{-ia\xi} \hat{f}(\xi)

7 Optimization
7.1 Scalar, Single Variable Function
Let f(x) be a scalar function of a single variable, x. A critical point of f, x_0, is a point where

f'(x_0) = 0

A local maximum of f is a point, a, where f(a) \geq f(x) for all x in some open interval around x = a. A local maximum occurs at a critical point where

f''(x_0) < 0

A local minimum of f is a point, a, where f(a) \leq f(x) for all x in some open interval around x = a. A local minimum occurs at a critical point where

f''(x_0) > 0

If f''(x_0) = 0, then x_0 might be a saddle point, or minimax point. A saddle point is a critical point that is not a local minimum or maximum.

The global maximum is a point, a, where f(a) \geq f(x) for every x in the domain. We find the global maximum by computing all of the local maxima and finding the location of the largest local maximum. (If the domain has endpoints, the function values there should be checked as well.)

The global minimum is a point, a, where f(a) \leq f(x) for every x in the domain. We find the global minimum by computing all of the local minima and finding the location of the smallest local minimum.

A convex function is a function, f(x), with domain X such that

\forall x_1, x_2 \in X, \forall \alpha \in [0, 1] : f(\alpha x_1 + (1 - \alpha) x_2) \leq \alpha f(x_1) + (1 - \alpha) f(x_2)

Equivalently, a twice-differentiable function is convex if

f''(x) \geq 0 \quad \forall x \in X

A strictly convex function is a function, f(x), with domain X such that

\forall x_1 \neq x_2 \in X, \forall \alpha \in (0, 1) : f(\alpha x_1 + (1 - \alpha) x_2) < \alpha f(x_1) + (1 - \alpha) f(x_2)

A sufficient condition for strict convexity is

f''(x) > 0 \quad \forall x \in X

If f (x) is convex then every local minimum is a global minimum.

If f (x) is strictly convex then the problem has at most one local minimum,
which is the global minimum.

7.2 Scalar, Two-Variable Function


Let f(x, y) be a scalar function of two variables, x and y. A critical point of f, (x_0, y_0), is a point where

\frac{\partial}{\partial x} f(x, y) \Big|_{x = x_0, y = y_0} = 0

and

\frac{\partial}{\partial y} f(x, y) \Big|_{x = x_0, y = y_0} = 0

We use the discriminant to determine whether a critical point is a local maximum, a local minimum, or a saddle point. The discriminant is

D(f(x, y)) = f_{xx} f_{yy} - f_{xy}^2

A local maximum of f is a point (a, b) where f(a, b) \geq f(x, y) for all (x, y) in some open neighborhood of (a, b). A local maximum occurs at a critical point where

D(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) < 0

A local minimum of f is a point (a, b) where f(a, b) \leq f(x, y) for all (x, y) in some open neighborhood of (a, b). A local minimum occurs at a critical point where

D(x_0, y_0) > 0 \quad \text{and} \quad f_{xx}(x_0, y_0) > 0

A saddle point of f is a critical point where

D(x_0, y_0) < 0

If

D(x_0, y_0) = 0

then we cannot determine whether the point is a local maximum, local minimum, or saddle point based on the discriminant.

The global maxima and minima are defined and found analogously to the single-variable case.
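The discriminant test is easy to automate; the sketch below (an illustrative function, assuming SymPy) finds and classifies the critical points of f(x, y) = x^3 - 3x + y^2:

```python
# Classifying critical points with the discriminant.
import sympy as sp

x, y = sp.symbols('x y')
f = x**3 - 3*x + y**2

crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y])        # [(-1, 0), (1, 0)]
D = sp.diff(f, x, 2) * sp.diff(f, y, 2) - sp.diff(f, x, y)**2  # f_xx f_yy - f_xy^2
for (cx, cy) in crit:
    subs = {x: cx, y: cy}
    print((cx, cy), D.subs(subs), sp.diff(f, x, 2).subs(subs))
# (-1, 0): D = -12 < 0              -> saddle point
# (1, 0):  D = 12 > 0, f_xx = 6 > 0 -> local minimum
```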

Discussing convexity for a function of multiple variables requires defining convex sets and positive semidefiniteness, and using the Hessian. CME 307 starts with this material.

7.3 Constrained Optimization


In a constrained optimization problem, we try to minimize or maximize a function, called the objective function, while satisfying several equalities or inequalities, called constraints. An example of a single-variable constrained optimization problem is

\min_x f(x)
\text{s.t. } f_1(x) \leq 0
\quad\;\; f_2(x) \leq 0

To solve such problems, we can use the Lagrangian, which is a weighted combination of our objective function and constraints. The Lagrangian is

L(x, \lambda_1, \lambda_2) = f(x) + \lambda_1 f_1(x) + \lambda_2 f_2(x)

The idea behind this Lagrangian function is to transform our original constrained problem into an unconstrained problem with several variables, which we can handle as discussed in the previous sections. We just transformed our original constrained problem into an unconstrained one: minimize L(x, \lambda_1, \lambda_2). Now we solve this unconstrained problem. First, we solve for the critical points of the Lagrangian:

\frac{\partial L}{\partial x} = 0 \implies \frac{\partial f}{\partial x} + \lambda_1 \frac{\partial f_1}{\partial x} + \lambda_2 \frac{\partial f_2}{\partial x} = 0
\frac{\partial L}{\partial \lambda_1} = 0 \implies f_1(x) = 0
\frac{\partial L}{\partial \lambda_2} = 0 \implies f_2(x) = 0

We see that we actually recover our constraints when differentiating with respect to \lambda_1 and \lambda_2.

Example: Suppose we’re interested in finding the dimensions of a rectangular


box, x, y, z, that minimize the surface area of the box, while ensuring the vol-
ume is 2. Mathematically, we want to minimize S = 2xy + 2xz + 2yz subject
to xyz = 2.

Solution: Our Lagrangian is


L(x, y, z, λ1 ) = 2xy + 2xz + 2yz + λ(xyz − 2)
The critical points of the Lagrangian are

∂L
 ∂x = 0 =⇒ 2y + 2z + λyz = 0







∂L
 ∂y = 0 =⇒ 2x + 2z + λxz = 0


 ∂L
∂z = 0 =⇒ 2x + 2y + λxy = 0









 ∂L = 0 =⇒ xyz = 2

∂λ

23
Solving these four equations with four unknowns, we find that
√3
x=y=z= 2

Therefore, the surface area is


√ √ √ √ √
S = 2( 2)2 + 2( 2)2 + 2( 2)2 = 6( 2)2 = 6 4
3 3 3 3 3
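The four-equation system can also be handed to a computer algebra system; here is a SymPy sketch of the same box problem (positivity assumptions on the dimensions keep the solver on the physical branch):

```python
# Solving the box problem's Lagrangian system symbolically.
import sympy as sp

x, y, z = sp.symbols('x y z', positive=True)
lam = sp.symbols('lambda')
L = 2*x*y + 2*x*z + 2*y*z + lam * (x*y*z - 2)

eqs = [sp.diff(L, v) for v in (x, y, z, lam)]   # the four critical-point equations
sol = sp.solve(eqs, [x, y, z, lam], dict=True)
print(sol)   # x = y = z = 2**(1/3)
```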

Lagrangians show up all the time in CME 307 in homeworks and exams.
