Calculus Stanford
1 Introduction
This refresher course covers material primarily relevant to CME 302, CME 303,
CME 306, and CME 307. This material is also relevant in many engineering and
finance classes. This is not a rigorous, theoretical explanation of Calculus. If you
want that, consider taking a master’s level calculus class. If you already know
the material covered in this refresher course, great. If not, don’t worry—that’s
why we have refresher courses.
2 Scalar Calculus
2.1 Derivatives, partial derivatives
The derivative of a single-variable function $f(x)$ is denoted by $f'(x)$, $\frac{df}{dx}(x)$, or $f_x$. It is defined by
$$f'(x) = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$$
The value of $f'(x)$ is equal to the slope of the tangent line to the function at the point $x$.
When we extend this definition to the multivariable case, we get partial derivatives. Suppose we have a multivariable function $f(x_1, x_2, \ldots, x_n)$. The partial derivative of $f$ with respect to variable $x_i$, written $\frac{\partial f}{\partial x_i}$ or $f_{x_i}$, is
$$\frac{\partial f}{\partial x_i} = \lim_{h \to 0} \frac{f(x_1, \ldots, x_i + h, \ldots, x_n) - f(x_1, \ldots, x_i, \ldots, x_n)}{h}$$
Suppose we are interested in computing the derivative of $f(g(x))$. According to the chain rule, if we define $h(x) = f(g(x))$, then
$$h'(x) = f'(g(x))\,g'(x)$$
We can extend the chain rule to multivariable functions, $f(x_1(t), x_2(t), \ldots, x_n(t))$. Suppose we are interested in computing $\frac{\partial f}{\partial t}$. Applying the chain rule, we get:
$$\frac{\partial f}{\partial t} = \frac{\partial f}{\partial x_1}\frac{\partial x_1}{\partial t} + \ldots + \frac{\partial f}{\partial x_n}\frac{\partial x_n}{\partial t}$$
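As a quick numerical sanity check of the multivariable chain rule, we can compare the chain-rule formula against a finite difference. The function $f$ and the paths $x_1(t)$, $x_2(t)$ below are made-up examples, not from the course material:

```python
import math

def f(x1, x2):
    # made-up test function: f(x1, x2) = x1^2 * x2
    return x1 ** 2 * x2

def x1(t):
    return math.sin(t)

def x2(t):
    return t ** 3

def df_dt_chain_rule(t):
    # df/dt = (df/dx1)(dx1/dt) + (df/dx2)(dx2/dt)
    df_dx1 = 2 * x1(t) * x2(t)
    df_dx2 = x1(t) ** 2
    dx1_dt = math.cos(t)
    dx2_dt = 3 * t ** 2
    return df_dx1 * dx1_dt + df_dx2 * dx2_dt

def df_dt_numeric(t, h=1e-6):
    # centered finite difference on g(t) = f(x1(t), x2(t))
    g = lambda s: f(x1(s), x2(s))
    return (g(t + h) - g(t - h)) / (2 * h)

print(abs(df_dt_chain_rule(0.7) - df_dt_numeric(0.7)))  # tiny
```

The two values agree to within the finite-difference error, which is how you would catch a mistake when applying the chain rule by hand.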
The first part of the FTC is that if $f$ is a function that is continuous on a closed interval $[a, b]$ and $\forall x \in (a, b)$
$$F(x) = \int_a^x f(t)\,dt$$
then
$$F'(x) = f(x) \quad \forall x \in (a, b)$$
The second part of the FTC is that if $f$ is a function on a closed interval $[a, b]$, $F'(x) = f(x)$ in $[a, b]$, and $f$ is Riemann integrable on $[a, b]$, then
$$\int_a^b f(x)\,dx = F(b) - F(a)$$
The Riemann integral is the limit of the Riemann sum as the partition of the interval becomes finer and finer.
Most functions you will talk about in the core courses are Riemann integrable. For example, all polynomials on a bounded interval are Riemann integrable. An example of a function that is not Riemann integrable is the indicator function for rational numbers
$$I_{\mathbb{Q}}(x) = \begin{cases} 1, & \text{if } x \text{ is a rational number} \\ 0, & \text{otherwise} \end{cases}$$
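A Riemann sum is also easy to compute directly. Here is a minimal sketch (the integrand $x^2$ on $[0, 1]$ is an invented example; its exact integral is $1/3$):

```python
def riemann_sum(f, a, b, n):
    # left Riemann sum: partition [a, b] into n subintervals of width dx
    # and sum f at the left endpoint of each, times dx
    dx = (b - a) / n
    return sum(f(a + i * dx) for i in range(n)) * dx

# integral of x^2 on [0, 1] is exactly 1/3; the sum converges as n grows
approx = riemann_sum(lambda x: x ** 2, 0.0, 1.0, 100000)
print(abs(approx - 1.0 / 3.0))  # tiny
```

As $n$ increases, the sum approaches the Riemann integral, which is precisely the limit described above.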
Say we have a function of two variables, f (x, t), which we are integrating with
respect to (wrt) t. Say the bounds of our integral are functions of x, a(x) and
b(x). If we differentiate this integral wrt x, then the Leibniz rule tells us
$$\frac{d}{dx}\left(\int_{a(x)}^{b(x)} f(x, t)\,dt\right) = \int_{a(x)}^{b(x)} \frac{\partial}{\partial x} f(x, t)\,dt + f(x, b(x))\frac{d}{dx}b(x) - f(x, a(x))\frac{d}{dx}a(x)$$
Say we have a function of one variable, $f(t)$, which we are integrating wrt $t$. Say the bounds of our integral are functions of $x$, $a(x)$ and $b(x)$. If we differentiate this integral wrt $x$, then the Leibniz rule tells us
$$\frac{d}{dx}\int_{a(x)}^{b(x)} f(t)\,dt = f(b(x))\frac{d}{dx}b(x) - f(a(x))\frac{d}{dx}a(x)$$
Say we have a function of two variables, $f(x, t)$, which we are integrating wrt $t$. Say the bounds of our integral are constants, $a$ and $b$. If we differentiate this integral wrt $x$, then the Leibniz rule tells us
$$\frac{d}{dx}\int_a^b f(x, t)\,dt = \int_a^b \frac{\partial}{\partial x} f(x, t)\,dt$$
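Differentiating under the integral sign with constant bounds can be checked numerically: differentiate the integral by finite differences and compare to the integral of the partial derivative. The integrand $e^{xt}$ below is an invented example:

```python
import math

def integral(f, a, b, n=2000):
    # simple midpoint-rule quadrature on [a, b]
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

f = lambda x, t: math.exp(x * t)          # made-up f(x, t)
dfdx = lambda x, t: t * math.exp(x * t)   # its partial derivative wrt x

x0, h = 0.5, 1e-5
# left side: d/dx of the integral, by centered finite difference
lhs = (integral(lambda t: f(x0 + h, t), 0.0, 1.0)
       - integral(lambda t: f(x0 - h, t), 0.0, 1.0)) / (2 * h)
# right side: integral of the partial derivative
rhs = integral(lambda t: dfdx(x0, t), 0.0, 1.0)
print(abs(lhs - rhs))  # tiny
```

The agreement illustrates the constant-bounds case of the Leibniz rule.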
Say we have a function of one variable, $f(x)$, which we are integrating wrt $x$. Say the bounds of our integral are constants, $a$ and $b$. The integral is then just a constant (note that $x$ here is the dummy variable of integration), so differentiating it wrt $x$ gives
$$\frac{d}{dx}\int_a^b f(x)\,dx = 0$$
Derivation: Integration by parts can be quickly derived from the product rule. Begin with $u(x)v(x)$ and then differentiate:
$$(u(x)v(x))' = u'(x)v(x) + u(x)v'(x)$$
Integrating both sides from $a$ to $b$ and rearranging terms, we get
$$\int_a^b u(x)v'(x)\,dx = \Big[u(x)v(x)\Big]_a^b - \int_a^b u'(x)v(x)\,dx$$
Example: Compute
$$\int_0^\pi x\sin(x)\,dx$$
You'll see this integral in CME 303 when working with Fourier series. We solve this integral using integration by parts. We define
$$u(x) = x, \quad v(x) = -\cos(x), \quad u'(x) = 1, \quad v'(x) = \sin(x)$$
Plugging these into the integration by parts formula,
$$\int_0^\pi x\sin(x)\,dx = \Big[-x\cos(x)\Big]_0^\pi + \int_0^\pi \cos(x)\,dx = (\pi - 0) + (0 - 0) = \pi$$
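We can double-check this value with a simple numerical quadrature (the midpoint rule used here is a generic choice, not anything prescribed by the course):

```python
import math

def midpoint_integral(f, a, b, n=10000):
    # midpoint-rule approximation of the integral of f on [a, b]
    dx = (b - a) / n
    return sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx

value = midpoint_integral(lambda x: x * math.sin(x), 0.0, math.pi)
print(abs(value - math.pi))  # tiny
```

The numerical answer matches the $\pi$ obtained by integration by parts.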
When computing integrals, be sure to check for symmetry. If a function is integrated over an interval that is symmetric about zero, $[-a, a]$, and the function is odd, then the integral is 0. An odd function is a function $f(x)$ such that (s.t.)
$$f(-x) = -f(x)$$
An even function is a function f (x) s.t.
f (−x) = f (x)
Example: We can solve the following integral using the rule about integrals of odd functions:
$$\int_{-\pi/2}^{\pi/2} x\cos(x)\,dx = 0$$
This is true because our interval, $[-\pi/2, \pi/2]$, is symmetric about zero and $x\cos(x)$ is odd.
The Taylor series expansion of a two-variable function $f(x, y)$ about the point $(x_0, y_0)$ is
$$f(x, y) = f(x_0, y_0) + \frac{\partial f}{\partial x}(x_0, y_0)(x - x_0) + \frac{\partial f}{\partial y}(x_0, y_0)(y - y_0)$$
$$+ \frac{1}{2!}\left[\frac{\partial^2 f}{\partial x^2}(x_0, y_0)(x - x_0)^2 + 2\frac{\partial^2 f}{\partial x \partial y}(x_0, y_0)(x - x_0)(y - y_0) + \frac{\partial^2 f}{\partial y^2}(x_0, y_0)(y - y_0)^2\right] + \ldots$$
This infinite sum of polynomials equals the original function $f(x)$ (for well-behaved, analytic functions), even if $f(x)$ is not a polynomial function. Often, though, we use Taylor polynomials to approximate a function up to a certain order of error. This approximation is best when we are close to $x_0$. For example,
$$f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + O\big((x - x_0)^3\big)$$
where $O\big((x - x_0)^3\big)$ denotes that the error in this approximation is on the order of $(x - x_0)^3$. Since we're considering $x$ close to $x_0$, we assume $(x - x_0)$ is small, specifically less than 1. Then we can say $(x - x_0)^3$ is small compared to the first few terms of the series, and we choose to ignore it in our approximations. You'll hear more about this "truncation error" and orders of approximation in CME 306.
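The order of the truncation error is easy to observe numerically: for a second-order Taylor polynomial, halving the distance $h = x - x_0$ should shrink the error by roughly $2^3 = 8$. A sketch using $f(x) = e^x$ about $x_0 = 0$ (an invented example):

```python
import math

def taylor2(f0, f1, f2, x0, x):
    # second-order Taylor polynomial of f about x0,
    # given f(x0) = f0, f'(x0) = f1, f''(x0) = f2
    dx = x - x0
    return f0 + f1 * dx + (f2 / 2.0) * dx ** 2

# f(x) = exp(x) about x0 = 0: f(0) = f'(0) = f''(0) = 1
x0 = 0.0
errors = []
for h in [0.1, 0.05, 0.025]:
    approx = taylor2(1.0, 1.0, 1.0, x0, x0 + h)
    errors.append(abs(math.exp(x0 + h) - approx))

# each halving of h shrinks the error by roughly 8 = 2**3
ratio1 = errors[0] / errors[1]
ratio2 = errors[1] / errors[2]
print(ratio1, ratio2)  # both close to 8
```

This factor-of-8 behavior is exactly what "error on the order of $(x - x_0)^3$" means in practice.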
If the Taylor series expansion is centered at $x = 0$, then we call it a Maclaurin series expansion instead.
Taylor series are used frequently in CME 306 (Numerical Solutions for Partial Differential Equations). For example, when approximating derivatives in a PDE, you can figure out the order of the error in your approximation using Taylor series.
Be careful with $\ln(x)$: the series looks very different depending on what value you expand around. Here's the series for $\ln(x)$ about $x = 1$:
$$\ln(x) = \sum_{n=1}^{\infty} \frac{(-1)^{n+1}(x - 1)^n}{n}$$
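The partial sums of this series can be checked against the built-in logarithm; a short sketch (the point $x = 1.5$ and the 200-term cutoff are arbitrary choices):

```python
import math

def ln_series(x, terms=200):
    # Taylor series of ln(x) about x = 1:
    # sum over n of (-1)**(n+1) * (x - 1)**n / n
    return sum((-1) ** (n + 1) * (x - 1) ** n / n
               for n in range(1, terms + 1))

print(abs(ln_series(1.5) - math.log(1.5)))  # tiny
```

Note the alternating sign: without the $(-1)^{n+1}$ factor, the partial sums would converge to $-\ln(2 - x)$ instead.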
3 Vector Calculus
In this section, we cover the dot product, cross product, scalar fields, vector
fields, operators on scalar and vector fields, the Divergence theorem, and Stokes’
theorem.
In subsequent sections, unbold lowercase letters denote scalars, e.g. $a$. Bold lowercase letters denote vectors, e.g. $\mathbf{x}$. Bold uppercase letters denote matrices, e.g. $\mathbf{A}$. In other texts or lectures, where writing bold characters is challenging, you may see vectors denoted with a single underline, e.g. $\underline{x}$, and matrices denoted with an unbold uppercase letter, e.g. $A$, or a double underline, e.g. $\underline{\underline{A}}$.
If two vectors, $\mathbf{u}$ and $\mathbf{v}$, are orthogonal, $\mathbf{u} \perp \mathbf{v}$, this means that the angle between them, $\theta$, is $\pi/2$ radians. If $\mathbf{u} \perp \mathbf{v}$, then
$$\mathbf{u} \cdot \mathbf{v} = 0$$
We can prove this using the definition
$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos\theta = \|\mathbf{u}\|\|\mathbf{v}\| \cdot 0 = 0$$
where $\cos\theta = 0$ when $\theta = \pi/2$.
If two vectors, $\mathbf{u}$ and $\mathbf{v}$, are codirectional, $\mathbf{u} \parallel \mathbf{v}$, this means the angle between them is 0. If $\mathbf{u} \parallel \mathbf{v}$, then
$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|$$
We can again prove this using the definition
$$\mathbf{u} \cdot \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\cos(0) = \|\mathbf{u}\|\|\mathbf{v}\|$$
The cross product of two vectors is
$$\mathbf{u} \times \mathbf{v} = \|\mathbf{u}\|\|\mathbf{v}\|\sin(\theta)\,\hat{\mathbf{n}}$$
where $\hat{\mathbf{n}}$ is a unit vector perpendicular to both $\mathbf{u}$ and $\mathbf{v}$. $\hat{\mathbf{n}}$ is not uniquely determined in terms of $\mathbf{u}$ and $\mathbf{v}$: it could point in two different directions along the same line. The ambiguity is resolved by using the right-hand rule, depicted in Figure 1.
The cross product is a vector with length equal to the area of the parallelogram formed by vectors $\mathbf{u}$ and $\mathbf{v}$, and direction perpendicular to $\mathbf{u}$ and $\mathbf{v}$. This is shown in Figure 2.
For vectors $\mathbf{u} = [u_1; u_2; u_3]$ and $\mathbf{v} = [v_1; v_2; v_3]$, the cross product can be computed by
$$\mathbf{u} \times \mathbf{v} = \begin{vmatrix} \hat{i} & \hat{j} & \hat{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}$$
where $\hat{i}, \hat{j}, \hat{k}$ are the unit vectors in the direction of the $x, y, z$ axes of the Cartesian coordinate system:
$$\hat{i} = \begin{bmatrix}1\\0\\0\end{bmatrix}, \quad \hat{j} = \begin{bmatrix}0\\1\\0\end{bmatrix}, \quad \hat{k} = \begin{bmatrix}0\\0\\1\end{bmatrix}$$
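Expanding the determinant along its first row gives the familiar component formula, which is short enough to write out directly (a minimal sketch; the sample vectors are invented):

```python
def cross(u, v):
    # cross product via the determinant expansion along the first row
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2,
            u3 * v1 - u1 * v3,
            u1 * v2 - u2 * v1)

# i-hat x j-hat = k-hat, consistent with the right-hand rule
print(cross((1, 0, 0), (0, 1, 0)))  # (0, 0, 1)
```

The result is always perpendicular to both inputs, which you can confirm by checking that its dot product with $\mathbf{u}$ and $\mathbf{v}$ is zero.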
|A| is the determinant of matrix A. The determinant is also written det(A).
For a 3×3 matrix
$$A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
the determinant is
$$|A| = a\begin{vmatrix} e & f \\ h & i \end{vmatrix} - b\begin{vmatrix} d & f \\ g & i \end{vmatrix} + c\begin{vmatrix} d & e \\ g & h \end{vmatrix}$$
A common way to visualize a vector field is a quiver plot, which draws a vector at regular points in space. The location in space represents the input vector. The direction and length of the vector drawn at that location represent the direction and magnitude of the corresponding output vector. Figure 4 shows a quiver plot for a vector field that takes a 2D input to a 2D output.
We can also interpret a vector field as a vector whose components are functions. That is, at each point $(x, y, z)$ in space we have a vector determined by
$$\mathbf{F}(x, y, z) = (F_1(x, y, z), F_2(x, y, z), F_3(x, y, z))$$
The gradient of a function (a scalar field), $f(x_1, x_2, \ldots, x_n)$, is a vector and is given by
$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \frac{\partial f}{\partial x_2} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}$$
The Hessian of $f$ is the matrix of second partial derivatives, $(\nabla^2 f)_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$. Hessians are commonly used in optimization as well. Hessians are used to check whether the critical points of a function are maxima, minima, or saddle points.
The divergence of a vector field, $\mathbf{F}(x_1, \ldots, x_n) = (F_1(x_1, \ldots, x_n), F_2(x_1, \ldots, x_n), \ldots, F_n(x_1, \ldots, x_n))$, is
$$\mathrm{div}(\mathbf{F}) = \nabla \cdot \mathbf{F} = \frac{\partial F_1}{\partial x_1} + \frac{\partial F_2}{\partial x_2} + \ldots + \frac{\partial F_n}{\partial x_n}$$
The divergence has the following properties:
• ∇ · (F + G) = ∇ · F + ∇ · G
• ∇ · (f F) = (∇f ) · F + f (∇ · F)
The curl has the following properties:
• ∇ × (F + G) = ∇ × F + ∇ × G
• ∇ × (f F) = (∇f ) × F + f (∇ × F)
• ∇ × (∇f ) = 0
• ∇ · (∇ × F) = 0
The Laplace operator, also called the Laplacian, is
$$\Delta = \nabla^2 = \nabla \cdot \nabla = \sum_{i=1}^{n} \frac{\partial^2}{\partial x_i^2}$$
The Laplacian of a vector field is written $\nabla^2 \cdot \mathbf{F}$. In Cartesian coordinates, the Laplacian of a vector field, $\mathbf{F}(x_1, x_2, x_3) = (F_1(x_1, x_2, x_3), F_2(x_1, x_2, x_3), F_3(x_1, x_2, x_3))$, is
$$\nabla^2 \cdot \mathbf{F} = \frac{\partial^2 F_1}{\partial x_1^2} + \frac{\partial^2 F_2}{\partial x_2^2} + \frac{\partial^2 F_3}{\partial x_3^2}$$
The Laplacian operator appears in CME 303 in PDEs, such as Laplace's equation and the Poisson equation.
The divergence theorem is often used in CME 303. The Divergence theorem
simplifies to integration by parts when n = 1.
3.5 Stokes’ Theorem
Let S be an oriented smooth surface that is bounded by a simple, closed, smooth
boundary curve C with positive orientation. An oriented smooth surface is a
smooth surface with normal vector, n, pointed from one side of the surface to
the other. A curve with positive orientation is a curve where, if you curl your fingers in the direction of the curve, your thumb points in the direction of $\mathbf{n}$. Let
F be a vector field. Stokes’ theorem tells us that
$$\int_C \mathbf{F} \cdot d\mathbf{r} = \iint_S (\nabla \times \mathbf{F}) \cdot d\mathbf{S}$$
4 Einstein Notation
This notation most likely won’t come up in your core classes. I’m going to cover
it because it is widely used in textbooks and papers. In addition if you take a
graduate-level engineering classes, there’s a good chance that class will assume
you know Einstein notation.
In Einstein notation, an index that is repeated within a term is implicitly summed over. Each letter can appear at most twice per term. For example, $d_{ijij}$ is a valid term but $d_{ijjj}$ is not a valid term.
If there are two non-repeated indices, then the output is a matrix, such as $u_i u_j$ or $u_i u_i u_j u_k$.
In Einstein notation, the dot product is
$$\mathbf{u} \cdot \mathbf{v} = u_i v_i$$
In Einstein notation, the cross product is
$$(\mathbf{u} \times \mathbf{v})_k = \varepsilon_{ijk}\, u_i v_j$$
where $\varepsilon_{ijk}$ is the Levi-Civita symbol.
If any two indices of the Levi-Civita symbol are switched, the symbol changes sign:
$$\varepsilon_{\ldots i_p \ldots i_q \ldots} = -\varepsilon_{\ldots i_q \ldots i_p \ldots}$$
The double $\varepsilon$ identity tells us that
$$\varepsilon_{ijk}\varepsilon_{imn} = \delta_{jm}\delta_{kn} - \delta_{jn}\delta_{km}$$
The $i$-th element of a matrix-vector product is
$$(A\mathbf{v})_i = A_{ij} v_j$$
The $(i, k)$-th element of the product of two matrices, $A_{ij}$ and $B_{jk}$, is
$$(AB)_{ik} = A_{ij} B_{jk}$$
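The implicit-summation rule maps directly onto nested loops: the repeated index becomes the summed loop variable, and the free indices become the output's indices. A minimal pure-Python sketch (the matrices are invented examples):

```python
def matmul_einstein(A, B):
    # (AB)_ik = A_ij B_jk: the repeated index j is summed over,
    # while the free indices i and k index the output matrix
    n = len(A)
    return [[sum(A[i][j] * B[j][k] for j in range(n)) for k in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_einstein(A, B))  # [[19, 22], [43, 50]]
```

Reading $A_{ij}B_{jk}$ as "sum over $j$" recovers ordinary matrix multiplication, as the example confirms.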
5 Matrix Calculus
5.1 Transpose
The transpose of a matrix $A$, written $A^T$, is the matrix with entries $(A^T)_{ij} = A_{ji}$. It has the following properties:
• (cA)T = cAT
• det(AT ) = det(A)
• $(A^T)^{-1} = (A^{-1})^T$, where $A^{-1}$ is the inverse, defined in the next section
• the eigenvalues of A are the same as the eigenvalues of AT (you will
discuss eigenvalues in the calculus refresher course)
A symmetric matrix has AT = A.
5.2 Inverse
The inverse of a matrix, A, is another matrix, A−1 , such that
AA−1 = I
• (A−1 )−1 = A
• (AB)−1 = B−1 A−1
• (cA)−1 = 1c A−1
There is no simple expression for $(A + B)^{-1}$.
Consider a vector-valued function $\mathbf{y} = f(\mathbf{x})$, where $\mathbf{y} \in \mathbb{R}^m$ and $\mathbf{x} \in \mathbb{R}^n$. The derivative $\frac{\partial \mathbf{y}}{\partial \mathbf{x}}$ is the $m \times n$ matrix with entries $\left(\frac{\partial \mathbf{y}}{\partial \mathbf{x}}\right)_{ij} = \frac{\partial y_i}{\partial x_j}$. This is the Jacobian of $f$.
It is worth remembering the result of this formula for the following common matrix derivatives. We prove the first result; the rest can be proved similarly.
$$\frac{\partial (A\mathbf{x})}{\partial \mathbf{x}} = A$$
We can prove this by
$$\frac{\partial}{\partial x_j}(A\mathbf{x})_i = \frac{\partial}{\partial x_j}\left(\sum_{k=1}^{n} a_{ik} x_k\right) = a_{ij} \quad \forall i = 1, 2, \ldots, m, \; j = 1, 2, \ldots, n$$
so
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}} = A$$
$$\frac{\partial \mathbf{y}^T A \mathbf{x}}{\partial \mathbf{x}} = \mathbf{y}^T A$$
and
$$\frac{\partial \mathbf{y}^T A \mathbf{x}}{\partial \mathbf{y}} = \mathbf{x}^T A^T$$
$$\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^T (A + A^T)$$
If $A$ is symmetric, this simplifies to
$$\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{x}} = 2\mathbf{x}^T A$$
If $\mathbf{x}$ and $\mathbf{y}$ depend on $\mathbf{z}$, then
$$\frac{\partial (\mathbf{y}^T \mathbf{x})}{\partial \mathbf{z}} = \mathbf{x}^T \frac{\partial \mathbf{y}}{\partial \mathbf{z}} + \mathbf{y}^T \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$$
If $\mathbf{x}$ depends on $\mathbf{z}$, then
$$\frac{\partial (\mathbf{x}^T \mathbf{x})}{\partial \mathbf{z}} = 2\mathbf{x}^T \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$$
$$\frac{\partial (\mathbf{y}^T A \mathbf{x})}{\partial \mathbf{z}} = \mathbf{x}^T A^T \frac{\partial \mathbf{y}}{\partial \mathbf{z}} + \mathbf{y}^T A \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$$
$$\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{z}} = \mathbf{x}^T (A + A^T) \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$$
If $A$ is symmetric, this simplifies to
$$\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{z}} = 2\mathbf{x}^T A \frac{\partial \mathbf{x}}{\partial \mathbf{z}}$$
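These identities are easy to spot-check numerically: compare the formula $\frac{\partial (\mathbf{x}^T A \mathbf{x})}{\partial \mathbf{x}} = \mathbf{x}^T(A + A^T)$ against a finite difference. The matrix and point below are invented examples, with $A$ deliberately non-symmetric:

```python
def quad_form(A, x):
    # computes x^T A x
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def grad_formula(A, x):
    # the gradient x^T (A + A^T), written as a row vector
    n = len(x)
    return [sum(x[i] * (A[i][j] + A[j][i]) for i in range(n))
            for j in range(n)]

A = [[1.0, 2.0], [0.0, 3.0]]  # deliberately non-symmetric
x = [0.5, -1.5]
h = 1e-6
for j in range(2):
    xp = list(x); xp[j] += h
    xm = list(x); xm[j] -= h
    numeric = (quad_form(A, xp) - quad_form(A, xm)) / (2 * h)
    print(abs(numeric - grad_formula(A, x)[j]))  # ~0
```

The check also shows why the $(A + A^T)$ form matters: for non-symmetric $A$, the simplified $2\mathbf{x}^T A$ would be wrong.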
You will see in CME 302 that these forms are frequently used to prove properties about matrices, such as positive definiteness and negative definiteness. These forms are also often used in CME 307 (or any other optimization class), such as when optimizing over ellipsoids.
We will now derive the gradient and Hessian of a quadratic form. These are used
in CME 307 to derive conditions to find critical points and check for maxima
and minima.
The Hessian is the gradient of $\nabla f$. We now have $n$ terms (in our gradient) and $n$ variables of interest; we therefore expect to get $n^2$ terms, which we write as a matrix. Computing the derivative wrt each variable, we get that the Hessian is
$$\nabla^2 f = A + A^T$$
which equals $2A$ when $A$ is symmetric.
If the expressions for the gradient and Hessian are confusing you, we’d recom-
mend going over the case where n = 2.
We also introduce the notation for the norm associated to this inner product:
$$\|v\| = \langle v, v \rangle^{1/2}$$
There are many maps that satisfy these requirements, meaning that there are many different inner products and associated norms. One inner product is the dot product. A useful exercise is proving that the dot product satisfies the definition of an inner product given above. You will see other inner products and norms in CME 302 and CME 303.
6 Fourier Transform
The Fourier transform of a function $f(x)$ is
$$\hat{f}(\xi) = \int_{\mathbb{R}} f(x)\, e^{-ix\xi}\,dx$$
You will see in CME 303 that Fourier transforms are often used to solve PDEs.
In CME 308, you will talk about the characteristic function (cf) of a probability
density function (pdf)—the cf is the Fourier transform of the pdf. The cf and
pdf were introduced in the probability and statistics refresher course.
Exercise: Compute the Fourier transform for $f(x) = e^{-x}$ for $x > 0$.
Solution:
$$\hat{f}(\xi) = \int_0^\infty e^{-x} e^{-ix\xi}\,dx = \int_0^\infty e^{(-i\xi - 1)x}\,dx = \frac{1}{-i\xi - 1}\Big[e^{(-i\xi - 1)x}\Big]_0^\infty = \frac{1}{1 + i\xi}$$
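The closed form $\frac{1}{1 + i\xi}$ can be verified by numerically integrating the transform at a sample frequency. A sketch (the truncation point $T = 40$ and $\xi = 2$ are arbitrary choices; the tail beyond $T$ is negligible because $e^{-T}$ is tiny):

```python
import math
import cmath

def ft_numeric(xi, T=40.0, n=200000):
    # midpoint quadrature of the integral of exp(-x) * exp(-i*x*xi)
    # over [0, T], truncating the (negligible) tail beyond T
    dx = T / n
    total = 0j
    for i in range(n):
        x = (i + 0.5) * dx
        total += math.exp(-x) * cmath.exp(-1j * x * xi)
    return total * dx

xi = 2.0
exact = 1.0 / (1.0 + 1j * xi)
print(abs(ft_numeric(xi) - exact))  # tiny
```

The numerical integral matches $\frac{1}{1 + i\xi}$ in both its real and imaginary parts.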
Exercise: Let $f_a(x) = f(x - a)$. Show that the Fourier transform of $f_a(x)$, written $\hat{f}_a(\xi)$, is given by $e^{-ia\xi}\hat{f}(\xi)$, where $\hat{f}(\xi)$ is the Fourier transform of $f(x)$.
Solution:
$$\hat{f}_a(\xi) = \int_{-\infty}^{\infty} f_a(x)\, e^{-ix\xi}\,dx = \int_{-\infty}^{\infty} f(x - a)\, e^{-ix\xi}\,dx$$
Substituting $y = x - a$,
$$= \int_{-\infty}^{\infty} f(y)\, e^{-i(y + a)\xi}\,dy = \int_{-\infty}^{\infty} f(y)\, e^{-ia\xi} e^{-iy\xi}\,dy = e^{-ia\xi}\int_{-\infty}^{\infty} f(y)\, e^{-iy\xi}\,dy = e^{-ia\xi}\hat{f}(\xi)$$
7 Optimization
7.1 Scalar, Single Variable Function
Let $f(x)$ be a scalar function of a single variable, $x$. A critical point of $f$, $x_0$, is a point where
$$f'(x_0) = 0$$
A local maximum of $f$ is a point, $a$, where $f(a) \geq f(x)$ for all $x$ in some open interval around $x = a$. A local maximum occurs at a critical point where
$$f''(x_0) < 0$$
A local minimum of $f$ is a point, $a$, where $f(a) \leq f(x)$ for all $x$ in some open interval around $x = a$. A local minimum occurs at a critical point where
$$f''(x_0) > 0$$
The global minimum is a point, $a$, where $f(a) \leq f(x)$ for every $x$ in the domain. We find the global minimum by computing all of the local minima and taking the location of the smallest local minimum (also checking the boundary of the domain, if there is one).
If f (x) is strictly convex then the problem has at most one local minimum,
which is the global minimum.
A critical point of $f(x, y)$ is a point $(x_0, y_0)$ where
$$\frac{\partial}{\partial x} f(x, y)\Big|_{x = x_0, y = y_0} = 0$$
and
$$\frac{\partial}{\partial y} f(x, y)\Big|_{x = x_0, y = y_0} = 0$$
A local maximum of $f$ is a point $(a, b)$ where $f(a, b) \geq f(x, y)$ for all $(x, y)$ in some open region around $(a, b)$. Define the discriminant $D(x, y) = f_{xx}f_{yy} - f_{xy}^2$. A local maximum occurs at a critical point where
$$D(x_0, y_0) > 0$$
and
$$f_{xx}(x_0, y_0) < 0$$
A local minimum of $f$ is a point $(a, b)$ where $f(a, b) \leq f(x, y)$ for all $(x, y)$ in some open region around $(a, b)$. A local minimum occurs at a critical point where
$$D(x_0, y_0) > 0$$
and
$$f_{xx}(x_0, y_0) > 0$$
A saddle point occurs at a critical point where
$$D(x_0, y_0) < 0$$
If
D(x0 , y0 ) = 0
then we cannot determine if the point is a local maximum, local minimum, or
saddle point based on the discriminant.
The global maxima and minima are defined and found analogously to the single
variable case.
A constrained optimization problem asks us to minimize or maximize an objective function subject to equalities or inequalities, called constraints. An example of a single-variable constrained optimization problem is
$$\min_x f(x) \quad \text{s.t.} \quad f_1(x) \leq 0, \; f_2(x) \leq 0 \tag{1}$$
To solve such problems, we can use the Lagrangian, which is a weighted combination of our objective function and constraints. The Lagrangian is
$$L(x, \lambda_1, \lambda_2) = f(x) + \lambda_1 f_1(x) + \lambda_2 f_2(x)$$
The idea behind this Lagrangian function is to transform our original constrained problem into an unconstrained problem with several variables, which we can handle as we discussed in the previous sections: minimize $L(x, \lambda_1, \lambda_2)$. Now we solve this unconstrained problem. First, we solve for the critical points of the Lagrangian:
$$\frac{\partial L}{\partial x} = 0 \implies \frac{\partial f}{\partial x} + \lambda_1 \frac{\partial f_1}{\partial x} + \lambda_2 \frac{\partial f_2}{\partial x} = 0$$
$$\frac{\partial L}{\partial \lambda_1} = 0 \implies f_1(x) = 0$$
$$\frac{\partial L}{\partial \lambda_2} = 0 \implies f_2(x) = 0$$
We see that we actually recover our constraints when differentiating with respect to $\lambda_1$ and $\lambda_2$.
Example: Consider minimizing the surface area of a box, $2(xy + yz + xz)$, subject to the volume constraint $xyz = 2$, with Lagrangian $L(x, y, z, \lambda) = 2(xy + yz + xz) + \lambda(xyz - 2)$. Two of the resulting critical point equations are
$$\frac{\partial L}{\partial z} = 0 \implies 2x + 2y + \lambda xy = 0$$
$$\frac{\partial L}{\partial \lambda} = 0 \implies xyz = 2$$
Solving these four equations with four unknowns, we find that
$$x = y = z = \sqrt[3]{2}$$
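The equations above are consistent with minimizing the surface area of a box of volume 2; under that assumption, the solution $x = y = z = \sqrt[3]{2}$ is easy to spot-check numerically (the comparison box with sides 1, 1, 2 is an invented alternative that also has volume 2):

```python
def surface_area(x, y, z):
    # surface area of a rectangular box
    return 2 * (x * y + y * z + x * z)

def volume(x, y, z):
    return x * y * z

s = 2 ** (1.0 / 3.0)  # the critical point x = y = z = cbrt(2)

# the candidate satisfies the constraint and beats another volume-2 box
print(volume(s, s, s))                                   # ~2
print(surface_area(s, s, s) < surface_area(1.0, 1.0, 2.0))  # True
```

This does not prove optimality, but it confirms the critical point satisfies the constraint and improves on at least one feasible alternative.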
Lagrangians show up all the time in CME 307 in homeworks and exams.