Advanced Calculus Lectures
Audrey Terras
Math. Dept., U.C.S.D., La Jolla, CA 92093-0112
November, 2010
Part I
Preface
These notes come from various courses that I have taught at U.C.S.D. using Serge Lang's Undergraduate Analysis as the
basic text. My lectures are an attempt to make the subject more accessible. Recently Rami Shakarchi published Problems
and Solutions for Undergraduate Analysis, which provides solutions to all the problems in Lang's book. This caused me to
collect my own exercises, which are included. Exams are also to be found. The main difference between the approach of Lang
and that of other similar books is the treatment of the integral, which emphasizes the properties of the integral as a linear
function from the set of piecewise continuous real valued functions on an interval to the real numbers. Thus the approach
can be viewed as intermediate between the Riemann integral and the Lebesgue integral. Since we are interested mainly in
piecewise continuous functions, we are really getting the Riemann integral.
In these lectures we include more pictures and examples than the usual texts. Moreover, we include fewer definitions
from point set topology. Our aim is to make sense to an audience of potential high school math teachers, or economists,
or engineers. We did not write these lectures for potential math. grad students. We will always try to include examples,
pictures and applications. Applications will include Fourier analysis, fractals, ....
Warning to the reader: This course is to calculus as fixing a car is to driving a car. Moreover, sometimes the car is
invisible because it is an infinitesimal car or because it is placed on the road at infinity. It is thus important to ask questions
and do the exercises.
A Suggestion: You should treat any mathematics course as a language course. This means that you must be sure
to memorize the definitions and practice the new vocabulary every day. Form a study group to discuss the subject. It is
always a good idea to look at other books too; in particular, your old calculus book.
Another Warning: Also, beware of typos. I am a terrible proofreader.
Your calculus class was probably one that would have made sense to Newton and Leibniz in the 1600s. However, that
turned out not to be sufficient to figure out complicated problems. The basic idea of the real numbers was missing as well as a
real understanding of the concept of limit. This course starts with the foundations that were missing in your calculus course.
You may not see why you need them at first. Don't be discouraged by that. Persevere and you will get to derivatives and
integrals. We will assume that you know the basics of proofs and sets.
Other References:
Hans Sagan, Advanced Calculus
Tom Apostol, Mathematical Analysis
Dym & McKean, Fourier Series and Integrals
1.1 Some History
Around the early 1800s Fourier was studying heat flow in wires or metal plates. He wanted to model this mathematically
and came up with the heat equation. Suppose that we have a wire stretched out on the x-axis from x = 0 to x = 1. Let
u(x, t) represent the temperature of the wire at position x and time t. The heat equation is the PDE below, for t > 0
and 0 < x < 1:
\[ \frac{\partial u}{\partial t} = c \, \frac{\partial^2 u}{\partial x^2}. \]
Here c is a positive constant depending on the metal. If you are given an initial heat distribution f (x) on the wire at time
0, then we have the initial condition: u(x, 0) = f (x) also.
Fourier plugged in the function u(x, t) = X(x)T(t) and found that for the solution to satisfy the initial condition he
needed to express f(x) as a Fourier series:
\[ f(x) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x}. \qquad (1) \]
Note that $e^{ix} = \cos x + i \sin x$, where $i = (-1)^{1/2}$ (which is not a real number). This means you can rewrite the series of
complex exponentials as 2 series, one involving cosines and the other involving sines. Fourier made the claim that any
function f(x) has such an expression as a sum of $c_n \sin(nx)$ and $d_n \cos(nx)$. People took issue with this although they did
believe in power series expressions of functions (Taylor series and Laurent expansions). But the conditions under which such
series converge to the function were really unclear when Fourier first worked on the subject.
Fourier tells us that the Fourier coefficients are
\[ a_n = \int_0^1 f(y) e^{-2\pi i n y} \, dy. \qquad (2) \]
If you believe that it is legal to interchange sum and integral, then a bit of work will make you believe this, but unfortunately, that isn't always legal when f is a bad guy. This left mathematicians in an uproar in the early 1800s. And it took
at least 50 years to bring some order to the subject.
Part of the problem was that in the early 1800s people viewed integrals as antiderivatives. And they had no precise
meaning for the convergence of a series of functions of x such as the Fourier series above. They argued a lot. They would
not let Fourier publish his work until many years had passed. False formulas abounded. Confusion reigned supreme. So this
course was invented. We won't have time to go into the history much, but it is fascinating. Bressoud, A Radical Approach
to Real Analysis, says a little about the history. Another reference is Grattan-Guinness and Ravetz, Joseph Fourier. Still
another is Lakatos, Proofs and Refutations.
We will end up with a precise formulation of Fourier's theorems. And we will be able to do many more things of interest
in applied mathematics. In order to do all this we need to understand what the real numbers are, what we mean by the limit
of a sequence of numbers or of a sequence of functions, what we mean by derivatives and integrals. You may think that you
learned this in calculus, but unless you had an unusual calculus class, you just learned to compute derivatives and integrals,
not so much how to prove things about them.
Fourier series (and integrals) are important for all sorts of things such as analysis of time series, looking for periodicities.
The finite version leads to a computer algorithm called the fast Fourier transform, which has made it possible to do things
such as weather prediction in a reasonable amount of time. Matlab has a nice demo of the search for periodicities. We
modified it in our book Fourier Analysis on Finite Groups and Applications to look for periodicities in LA yearly rainfall.
The first answer I found was 12.67 years. See p. 159 of my book. Another version leads to the number 28.75 years.
Almost any applied math. problem leads to an analysis question. Look at any book on mathematical methods of physics
and engineering. There are also many theoretical problems in computer science that lead to analysis questions. The same
can be said of economics, chemistry and biology. Here we list a few examples. We do not give all the details. The idea is
to get a taste of such problems.
Example 1. Population Growth Model - The Logistic Equation.
References.
I. Stewart, Does God Play Dice? The Mathematics of Chaos, p. 155.
J. T. Sandefur, Discrete Dynamical Systems.
Define the logistic function $L_k(x) = kx(1 - x)$, for $x \in [0, 1]$. Here k is a fixed real number with 0 < k < 4. Let
$x_0 \in [0, 1]$ be fixed. Form a sequence
\[ x_0, \quad x_1 = L_k(x_0), \quad x_2 = L_k(x_1), \quad \ldots, \quad x_n = L_k(x_{n-1}), \quad \ldots \]
Question: What happens to $x_n$ as $n \to \infty$?
The answer depends on k. For k near 0 there is a limit. For k near 4 the behavior is chaotic. Our course should give us
the tools to solve this sort of problem. Similar problems come from weather forecasting, orbits of asteroids. You can put
these problems on a computer to get some intuition. But you need analysis to prove that your intuition is correct (or not).
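As one way to "put the problem on a computer", here is a minimal Python sketch that iterates the map $L_k(x) = kx(1 - x)$. The function names, the starting value 0.3 and the sample values of k are illustrative choices, not part of the notes, and the printed numbers only suggest what happens; they prove nothing.

```python
def logistic(k, x):
    """The logistic function L_k(x) = k x (1 - x)."""
    return k * x * (1.0 - x)


def orbit(k, x0, n):
    """Return the list [x_0, x_1, ..., x_n] with x_{j+1} = L_k(x_j)."""
    xs = [x0]
    for _ in range(n):
        xs.append(logistic(k, xs[-1]))
    return xs


if __name__ == "__main__":
    # Illustrative parameters (assumptions): one small k, one moderate k, one k near 4.
    for k in (0.5, 2.5, 3.9):
        tail = orbit(k, 0.3, 1000)[-4:]
        print(k, [round(x, 6) for x in tail])
```

For k = 0.5 the iterates shrink toward 0, for k = 2.5 they settle near 0.6, and for k = 3.9 the last few iterates show no obvious pattern.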
Example 2. Central Limit Theorem in Probability and Statistics.
References.
Feller, Probability Theory
Dym and McKean, Fourier Series and Integrals, p. 114
Terras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I
Where does the bell shaped curve originate?
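Definition 1 For integrable functions f and g : R → R, define the convolution $f * g$ (f "splat" g) to be
\[ (f * g)(x) = \int_{-\infty}^{\infty} f(y) g(x - y) \, dy. \]
Suppose that f is a probability density (so $f(x) \ge 0$) normalized so that
\[ \int_{-\infty}^{\infty} f(x) \, dx = 1, \qquad \int_{-\infty}^{\infty} x f(x) \, dx = 0, \qquad \int_{-\infty}^{\infty} x^2 f(x) \, dx = 1. \]
Then, as $n \to \infty$,
\[ \int_{a\sqrt{n}}^{b\sqrt{n}} (\underbrace{f * \cdots * f}_{n})(x) \, dx \;\longrightarrow\; \frac{1}{\sqrt{2\pi}} \int_a^b e^{-x^2/2} \, dx. \]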
c) Theta Function.
References.
Lang, Undergraduate Analysis
Terras, Harmonic Analysis on Symmetric Spaces and Applications, Vol. I
One of the Jacobi identities says that for t > 0, we have
\[ \theta(t) = \sum_{n=-\infty}^{\infty} e^{-\pi n^2 t} = \frac{1}{\sqrt{t}} \, \theta\!\left(\frac{1}{t}\right). \]
This is a rather unexpected formula, a hidden symmetry of the theta function. It implies (as Riemann showed in a
paper published in 1859) that the Riemann zeta function also has a symmetry, relating $\zeta(s)$ with $\zeta(1 - s)$.
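The symmetry can be checked numerically. The sketch below assumes the theta function is the sum over all integers n of $e^{-\pi n^2 t}$ and truncates the series at |n| ≤ 50, a cutoff chosen here for illustration. It compares $\theta(t)$ with $\theta(1/t)/\sqrt{t}$ for a few sample values of t.

```python
import math


def theta(t, terms=50):
    """Truncated theta series: sum over |n| <= terms of exp(-pi n^2 t), for t > 0."""
    # The n = 0 term is 1; the terms for n and -n are equal.
    return 1.0 + 2.0 * sum(math.exp(-math.pi * n * n * t) for n in range(1, terms + 1))


if __name__ == "__main__":
    for t in (0.5, 1.0, 2.0):
        print(t, theta(t), theta(1.0 / t) / math.sqrt(t))
```

Agreement of the two printed columns is evidence for the identity, not a proof of it.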
d) Famous Inequalities.
i) Cauchy-Schwarz Inequality
Reference.
Lang, Undergraduate Analysis
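Suppose that V is a vector space such as $\mathbb{R}^n$ with a scalar product $\langle v, w \rangle = \sum_{i=1}^{n} v_i w_i$, if $v_i$ denotes the ith coordinate
of v in $\mathbb{R}^n$. Then the length of v is $\|v\| = \sqrt{\langle v, v \rangle}$. The Cauchy-Schwarz inequality says
\[ |\langle v, w \rangle| \le \|v\| \, \|w\|. \qquad (3) \]
The analogous inequality for integrals of real-valued functions f and g on [0, 1] is
\[ \left( \int_0^1 f(x) g(x) \, dx \right)^2 \le \int_0^1 f(x)^2 \, dx \int_0^1 g(x)^2 \, dx. \qquad (4) \]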
Amazingly the same proof works for inequality (3) as for inequality (4).
ii) The Isoperimetric Inequality.
Reference.
Dym and McKean, Fourier Series and Integrals
This inequality is related to Queen Dido's problem, which is to maximize the area enclosed by a curve of fixed length. In
800 B.C., as recorded in Virgil's Aeneid, Queen Dido wanted to buy land to found the ancient city of Carthage. The locals
would only sell her the amount of land that could be enclosed with a bull's hide. She cut the hide into narrow strips and
then made a long strip and used it to enclose a circle (actually a semicircle with one boundary being the Mediterranean
Sea).
The isoperimetric inequality says that if A is the area enclosed by a plane curve and L is the length of the curve enclosing
this area, then
\[ 4\pi A \le L^2. \]
Moreover, equality only holds for the circle, which maximizes A for fixed L.
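The uncertainty inequality involves the Fourier transform $\hat{f}$ of a function f, defined by
\[ \hat{f}(w) = \int_{-\infty}^{\infty} f(t) e^{-2\pi i t w} \, dt. \]
Suppose that f is normalized so that
\[ \int_{-\infty}^{\infty} |f(t)|^2 \, dt = 1, \qquad \int_{-\infty}^{\infty} t \, |f(t)|^2 \, dt = 0, \qquad \int_{-\infty}^{\infty} w \, |\hat{f}(w)|^2 \, dw = 0. \]
Then
\[ \int_{-\infty}^{\infty} t^2 |f(t)|^2 \, dt \int_{-\infty}^{\infty} w^2 |\hat{f}(w)|^2 \, dw \ge \frac{1}{(4\pi)^2}. \]
The integral over t measures the square of the time duration of the signal f(t) and the integral over w measures the square
of the frequency spread of the signal. The uncertainty inequality can be shown to be equivalent to the following inequality
involving the derivative of f rather than the Fourier transform of f:
\[ \int_{-\infty}^{\infty} t^2 |f(t)|^2 \, dt \int_{-\infty}^{\infty} |f'(u)|^2 \, du \ge \frac{1}{4}. \]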
G. Cantor (1845-1918) developed the theory of infinite sets. It was controversial. There are paradoxes for those who throw
caution to the winds and consider sets whose elements are sets. For example, consider Russell's paradox. It was stated
by B. Russell (1872-1970). We use the notation: $x \in S$ to mean that x is an element of the set S; $x \notin S$ meaning x is not
an element of the set S. The notation {x | x has property P} is read as "the set of x such that x has property P". Consider
the set X defined by
\[ X = \{ \text{sets } S \mid S \notin S \}. \]
Then $X \in X$ implies $X \notin X$, and $X \notin X$ implies $X \in X$. This is a paradox. The set X can neither be a member of
itself nor not a member of itself. There are similar paradoxes that sound less abstract. Consider the barber who must shave
every man in town who does not shave himself. Does the barber shave himself? A mystery was written inspired by the
paradox: The Library Paradox by Catharine Shaw. There is also a comic book about Russell, Logicomix by A. Doxiadis
and C. Papadimitriou.
We will hopefully avoid paradoxes by restricting consideration to sets of numbers, vectors, functions. This would not be
enough for "constructivists" such as E. Bishop, once at U.C.S.D. Anyway for applied math., one can hope that paradoxical
sets and barbers do not appear.
Most books on calculus do a little set theory. We assume you are familiar with the notation. Let's do pictures in
the plane.
We write $A \subset B$ if A is a subset of B; i.e., $x \in A$ implies $x \in B$. If $A \subset B$, the complement of A in
B is $B - A = \{x \in B \mid x \notin A\}$. The empty set is denoted $\emptyset$. It has no elements. The intersection of sets A and B is
$A \cap B = \{x \mid x \in A \text{ and } x \in B\}$. The union of sets A and B is $A \cup B = \{x \mid x \in A \text{ or } x \in B\}$. Here "or" means either or both.
See Figure 4.
Definition 4 If A and B are sets, the Cartesian product of A and B is the set of ordered pairs (a, b) with $a \in A$ and
$b \in B$; i.e.,
\[ A \times B = \{(a, b) \mid a \in A, \ b \in B\}. \]
Example 1. Suppose A and B are both equal to the set of all real numbers; A = B = R. Then $A \times B = \mathbb{R} \times \mathbb{R} = \mathbb{R}^2$.
That is, the Cartesian product of the real line with itself is the set of points in the plane.
Example 2. Suppose C is the interval [0, 1] and D is the set {2} consisting of the single point 2. Then $C \times D$ is the line segment
of length 1 at height 2 in the plane. See Figure 5 below.
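For finite sets, the operations above can be tried out directly. The following Python sketch uses small finite sets chosen for illustration (they stand in for the sets in the text; an interval such as [0, 1] cannot be listed element by element), and the helper name `cartesian` is hypothetical.

```python
from itertools import product

# Small finite sets, chosen only for illustration.
A = {1, 2, 3}
B = {1, 2, 3, 4, 5}


def cartesian(X, Y):
    """The Cartesian product of X and Y: the set of ordered pairs (x, y)."""
    return {(x, y) for x, y in product(X, Y)}


if __name__ == "__main__":
    print(A <= B)                  # True, since A is a subset of B
    print(B - A)                   # complement of A in B
    print(A & {3, 4})              # intersection
    print(A | {3, 4})              # union ("or" means either or both)
    print(cartesian({0, 1}, {2}))  # two points at height 2
```

The last line is a finite analogue of Example 2: replacing the interval [0, 1] by its two endpoints gives the two endpoints of the segment at height 2.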
Next we recall the definitions of functions, which will be extremely important for the rest of these notes.
Definition 5 A function (or mapping or map) f maps set A into set B means that for every $a \in A$ there is a unique
element $f(a) \in B$. The notation is $f : A \to B$.
Definition 6 The function $f : A \to B$ is one-to-one (1-1 or injective) if and only if $f(a) = f(a')$ implies $a = a'$.
Definition 7 The function $f : A \to B$ is onto (or surjective) if and only if for every $b \in B$, there exists $a \in A$ such that
f(a) = b.
A function that is 1-1 and onto is also called bijective or a bijection. Given a function $f : A \to B$, we can draw a
graph consisting of points (x, f(x)), for all $x \in A$. It is a subset of the Cartesian product $A \times B$. Equivalently a function
$f : A \to B$ can be viewed as a subset F of $A \times B$ such that every $a \in A$ occurs in some pair (a, b) in F, and (a, b) and (a, c) in F implies b = c.
Notation. From now on I will use the abbreviations:
  iff          if and only if
  $\forall$    for every
  $\exists$    there exists
  s.t.         such that
Definition 8 Suppose $f : A \to B$ and $g : B \to C$. Then the composition $g \circ f : A \to C$ is defined by $(g \circ f)(x) = g(f(x))$,
for all $x \in A$.
It is easily seen (and the reader should check) that this operation is associative; i.e., $f \circ (g \circ h) = (f \circ g) \circ h$. However this
operation is not commutative in general; i.e., $f \circ g \ne g \circ f$ usually. For example, consider $f(x) = x^2$ and $g(x) = x + 1$,
for $x \in \mathbb{R}$. Then $(f \circ g)(x) = (x + 1)^2$ while $(g \circ f)(x) = x^2 + 1$.
There is a right identity for the operation of composition of functions. If $f : A \to B$ and $I_A(x) = x$, $\forall x \in A$, then
$f \circ I_A = f$. Similarly $I_B$ is a left identity for f; i.e., $I_B \circ f = f$.
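The properties of composition can be spot-checked on sample inputs. In this Python sketch the helper `compose` and the third function h are assumptions introduced for illustration; f and g are the squaring and add-one functions from the example. Checking a few values is of course not a proof of associativity.

```python
def compose(g, f):
    """Return the composition g o f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))


def f(x):
    return x * x


def g(x):
    return x + 1


def h(x):
    return 2 * x  # hypothetical third function, used only to test associativity


def identity(x):
    return x


if __name__ == "__main__":
    print(compose(g, f)(3))  # g(f(3)) = 3^2 + 1
    print(compose(f, g)(3))  # f(g(3)) = (3 + 1)^2
```

The two printed values differ, which illustrates that composition is not commutative.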
Definition 9 If $f : A \to B$ is 1-1 and onto, it has an inverse function $f^{-1} : B \to A$ defined by requiring $f \circ f^{-1} = I_B$ and
$f^{-1} \circ f = I_A$. If f(a) = b, then $f^{-1}(b) = a$.
Example 1. Let $f(x) = x^2$ map $[0, \infty)$ onto $[0, \infty)$. Then f is 1-1 and onto with the inverse function $f^{-1}(x) = \sqrt{x} = x^{1/2}$.
Here of course we take the non-negative square root of $x \ge 0$. It is necessary to restrict f to non-negative real numbers in
order for f to be 1-1.
Example 2. Define $f(x) = e^x$. Then f maps $(-\infty, \infty)$ 1-1 onto $(0, \infty)$. The inverse function is $f^{-1}(x) = \log x = \log_e x$.
It is only defined for positive x. We discuss these functions in more detail later.
When you draw the graph of $f^{-1}$, you just need to reflect the graph of f across the line y = x.
Mathematical Induction
Notation:
  $\mathbb{Z}^+ = \{1, 2, 3, 4, ...\}$               the positive integers
  $\mathbb{Z} = \{0, \pm 1, \pm 2, \pm 3, ...\}$     the integers
  $\mathbb{R} = (-\infty, +\infty)$                  the real numbers
We will assume that you are familiar with the integers as far as arithmetic goes. They satisfy most of the axioms that
we will list later for the real numbers. In particular, Z is closed under addition and multiplication (also subtraction but
not division). This means $n, m \in \mathbb{Z}$ implies $n + m$, $n - m$ and $n \cdot m$ are all unique elements of Z. Moreover, one has an
identity for +, namely 0, and an identity for $\cdot$, namely 1. Addition and multiplication are associative and commutative. There
is an additive inverse in Z for every $n \in \mathbb{Z}$, namely $-n$. But unless $n = \pm 1$, there is no multiplicative inverse for n in Z.
Moreover there is an ordering of Z which behaves well with respect to addition and multiplication. We will list the order
axioms later, with one exception. One thing that differentiates Z from the real numbers R is the following axiom.
Axiom 11 The Well Ordering Axiom. If $S \subset \mathbb{Z}^+$ and $S \ne \emptyset$, then S has a least element $a \in S$ such that $a \le x$,
$\forall x \in S$.
This axiom says that any non-empty set of positive integers has a least element. We usually call such a least element a
minimum. By an axiom, we mean that it is a basic unproved assumption.
G. Peano (1858-1932) wrote down the 5 Peano Postulates (or axioms) for the natural numbers $\mathbb{Z}^+ \cup \{0\}$. We won't list
them here. See, for example, Birkhoff and MacLane, Survey of Modern Algebra. Once one has these axioms it would be
nice to show that something exists satisfying the axioms. We will not do that here, feeling pretty confident that you believe
Z exists.
The most important fact about the well ordering axiom is that it is equivalent to mathematical induction.
Domino Version of Mathematical Induction. Given an infinite line of equally spaced dominos of equal dimensions
and weight, in order to knock over all the dominos by just knocking over the first one in line, we should make sure that the
nth domino is so close to the (n+1)st domino that when the nth domino falls over, it knocks over the (n+1)st domino. See
Figure 10.
Figure 10: An infinite line of equally spaced dominos. If the nth domino is close enough to knock over the (n+1)st domino,
then once you knock over the 1st domino, they should all fall over.
Translating this to theorems, we get the following
Principle of Mathematical Induction I.
Suppose you want to prove an infinite list of theorems $T_n$, n = 1, 2, .... It suffices to do 2 things.
1) Prove $T_1$.
2) Prove that $T_n$ true implies $T_{n+1}$ true for all $n \ge 1$.
Note that this works by the well ordering axiom. If $S = \{n \in \mathbb{Z}^+ \mid T_n \text{ is false}\}$, then either S is empty or S has a least
element q. But we know q > 1 by the fact that we proved $T_1$. And we know that $T_{q-1}$ is true since q is the least element of
S. But then by 2) we know $T_{q-1}$ implies $T_q$, contradicting $q \in S$.
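Example 1. $T_n$ is the formula used by Gauss as a youth to confound his teacher:
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}, \qquad n = 1, 2, 3, .... \]
Proof. $T_1$ says $1 = \frac{1 \cdot 2}{2}$, which is true. Now assume $T_n$:
\[ 1 + 2 + \cdots + n = \frac{n(n+1)}{2}. \]
Add the next term in the sum, namely, n + 1, to both sides of the equation:
\[ n + 1 = n + 1. \]
Obtain
\[ 1 + 2 + \cdots + n + (n + 1) = \frac{n(n+1)}{2} + (n + 1) = (n + 1) \, \frac{n + 2}{2}, \]
which is $T_{n+1}$.
Another way to see the formula is to write the sum twice, forwards and backwards:
\[ \begin{array}{ccccccc} 1 & + & 2 & + & \cdots & + & n \\ n & + & (n - 1) & + & \cdots & + & 1 \end{array} \]
When you add corresponding terms you always get n + 1. There are n such terms. Thus twice our sum is n(n + 1).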
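Both arguments for the formula $1 + 2 + \cdots + n = n(n+1)/2$ can be mirrored in a few lines of Python. This is only a sanity check for finitely many n, written with hypothetical function names; the induction proof is what covers all n.

```python
def gauss_sum(n):
    """The closed form n(n + 1)/2 for 1 + 2 + ... + n."""
    return n * (n + 1) // 2


def paired_sum(n):
    """Twice the sum 1 + ... + n, computed by adding the list to its reverse term by term."""
    forward = list(range(1, n + 1))
    backward = forward[::-1]
    # Every pair (a, b) of corresponding terms satisfies a + b = n + 1.
    return sum(a + b for a, b in zip(forward, backward))


if __name__ == "__main__":
    print(gauss_sum(100), sum(range(1, 101)))
    print(paired_sum(100), 100 * 101)
```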
Example 2. The formula relating n! and Gamma.
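Assume the definition of the gamma function given in the preceding section of these notes makes sense:
\[ \Gamma(s) = \int_0^\infty e^{-t} t^{s-1} \, dt, \quad \text{for } s > 0. \]
Formula $G_n$ is
\[ n! = \Gamma(n + 1) = \int_0^\infty e^{-t} t^n \, dt. \]
For n = 1 this says $1 = \int_0^\infty e^{-t} t \, dt$, which can be checked directly. To deduce $G_{n+1}$ from $G_n$, use integration by parts:
\[ \int u \, dv = uv - \int v \, du. \]
Let $u = e^{-t}$ and $dv = t^n \, dt$, so that $du = -e^{-t} \, dt$ and $v = \frac{1}{n+1} t^{n+1}$.
Plug this into the integration by parts formula and get
\[ n! = \Gamma(n + 1) = \int_0^\infty e^{-t} t^n \, dt = \left[ \frac{1}{n+1} e^{-t} t^{n+1} \right]_{t=0}^{\infty} + \frac{1}{n+1} \int_0^\infty e^{-t} t^{n+1} \, dt = 0 + \frac{1}{n+1} \Gamma(n + 2). \]
This says $(n + 1) n! = \Gamma(n + 2)$, which is formula $G_{n+1}$. This completes our induction proof.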
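The relation $n! = \Gamma(n+1)$ can also be checked numerically. The sketch below approximates $\int_0^\infty e^{-t} t^{s-1} \, dt$ by a midpoint rule on [0, 60]; the cutoff 60 and the number of steps are assumptions made for illustration, and they are adequate only for small s. It also compares with Python's built-in `math.gamma`.

```python
import math


def gamma_integral(s, upper=60.0, steps=100_000):
    """Midpoint-rule approximation to the integral of e^{-t} t^{s-1} over [0, upper].

    The tail beyond `upper` is ignored, which is harmless for small s only.
    """
    h = upper / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h  # midpoint of the j-th subinterval
        total += math.exp(-t) * t ** (s - 1)
    return total * h


if __name__ == "__main__":
    for n in range(6):
        print(n, math.factorial(n), math.gamma(n + 1), gamma_integral(n + 1))
```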
There is also a second induction principle. See Lang, p. 10. You should be able to translate it into something you
believe about dominoes. Of course, you have never really seen or been able to draw an infinite collection of dominos. You
might want to try to draw a domino picture for the second form of mathematical induction.
Definition 14 A set S is denumerable (or countable and infinite) iff there is a 1-1, onto map $f : \mathbb{Z}^+ \to S$.
Examples.
1) Z=the set of all integers is denumerable.
2) 2Z=the set of even integers is denumerable.
Corresponding statements for infinite sets to those of the preceding proposition defy intuition. Cantor's set theory
boggled many minds. Luckily we only need to note a few things from Cantor's theory. Later we will have a new way to
think about the size of infinite sets. For example, length of an interval or area of a region in the plane or volume of a region
in 3-space.
Proposition 15 Facts About Denumerable Sets.
Fact 1a) If S is a denumerable set, then there exists a proper subset $T \subset S$, proper meaning $T \ne S$, such that there is
a bijection $f : S \to T$.
Fact 1b) Any infinite subset of a denumerable set is also denumerable.
Fact 2) If sets S and T are denumerable, then so is the Cartesian product $S \times T$.
Fact 3) If $\{S_n\}_{n \ge 1}$ is a sequence of denumerable sets, then the union $\bigcup_{n \ge 1} S_n$ is denumerable.
Proof. (See the stories after this proof for a more amusing way to see these facts.)
Fact 1a) Suppose that $h : \mathbb{Z}^+ \to S$ is 1-1, onto. Write $h(n) = s_n$, for n = 1, 2, 3, .... That is, we can think of S as a
sequence $\{s_n\}_{n \ge 1}$ with the property that $s_n = s_m$ implies n = m. So let $T = \{s_2, s_3, s_4, ...\} = S - \{s_1\}$. That is, take
one element $s_1$ out of S. Define the map $g : \mathbb{Z}^+ \to T$ by $g(n) = s_{n+1}$. It should be clear that g is 1-1, onto. Thus T is
denumerable, and $g \circ h^{-1}$ is a bijection from S onto T.
Fact 1b) Suppose that T is an infinite subset of the denumerable set S. Then writing S as a sequence as in 1a, we have
$S = \{s_1, s_2, s_3, s_4, ...\}$ and $T = \{s_{k_1}, s_{k_2}, s_{k_3}, ...\}$, with $k_1 < k_2 < k_3 < \cdots < k_n < k_{n+1} < \cdots$. We can think of T as a
subsequence of S, and then the mapping $h : \mathbb{Z}^+ \to T$ is defined by $h(n) = s_{k_n}$, for all $n \in \mathbb{Z}^+$. Again it should be clear that h is
1-1 and onto.
Fact 2) Let $S = \{s_n\}_{n \ge 1}$ and $T = \{t_n\}_{n \ge 1}$. Then $S \times T = \{(s_n, t_m)\}_{(n,m) \in \mathbb{Z}^+ \times \mathbb{Z}^+}$. So we have a 1-1, onto map
$f : \mathbb{Z}^+ \times \mathbb{Z}^+ \to S \times T$ defined by $f(n, m) = (s_n, t_m)$. Thus to prove 2) we need only show that there is a 1-1, onto map
$g : \mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{Z}^+$. For then $f \circ g^{-1}$ is 1-1 from $\mathbb{Z}^+$ onto $S \times T$. We will do this by lining up the points with positive
integer coordinates in the plane using the arrows as in Figure 11.
Figure 11: The arrows indicate the order to enumerate points with positive integer coordinates in the plane.
Start off at (1, 1) and follow the arrows. Our list is
(1, 1), (2, 1), (1, 2), (3, 1), (2, 2), (1, 3), (4, 1), (3, 2), (2, 3), (1, 4), ....
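The general formula for the function $g : \mathbb{Z}^+ \times \mathbb{Z}^+ \to \mathbb{Z}^+$, giving the position of the point (m, n) in this list, is
\[ g(m, n) = \frac{(m + n - 2)(m + n - 1)}{2} + n. \]
just found to see that this set is denumerable. Of course we need to be careful since the map defined by $f(m, n) = s_{m,n}$ for
$m, n \in \mathbb{Z}^+$ may not be 1-1.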
Figure 12: In the list to the right of the triangle we count the number of points with positive integer coordinates on each
diagonal. The last diagonal goes through the point (m,n).
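The formula $g(m, n) = \frac{(m + n - 2)(m + n - 1)}{2} + n$ can be tested against the diagonal enumeration (1, 1), (2, 1), (1, 2), (3, 1), .... In this Python sketch the function `enumerate_pairs` is a hypothetical helper that walks the diagonals m + n = 2, 3, 4, ... in the order of the arrows; checking the first few hundred points is a sanity check, not a proof that g is a bijection.

```python
def g(m, n):
    """Position of the point (m, n) in the diagonal enumeration of Z+ x Z+."""
    # The diagonals before the one through (m, n) hold 1 + 2 + ... + (m + n - 2) points.
    return (m + n - 2) * (m + n - 1) // 2 + n


def enumerate_pairs(count):
    """First `count` points of Z+ x Z+, listed diagonal by diagonal.

    On the diagonal m + n = d the points run from (d - 1, 1) to (1, d - 1).
    """
    pairs = []
    d = 2
    while len(pairs) < count:
        for n in range(1, d):
            pairs.append((d - n, n))
            if len(pairs) == count:
                break
        d += 1
    return pairs


if __name__ == "__main__":
    for position, (m, n) in enumerate(enumerate_pairs(10), start=1):
        print(position, (m, n), g(m, n))
```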
Fact 4) These examples follow fairly easily from 1), 2) and 3). We will let the reader fill in the details.
Fact 5) Here we use Cantor's diagonal argument (Sagan, Advanced Calculus, p. 53) to see that the set of real numbers
R is not denumerable. This is a proof by contradiction. We will assume that R is denumerable and deduce a contradiction.
In fact, we look at the interval [0, 1] and show that even this proper subset of R is not denumerable. We can represent real
numbers in [0, 1] by decimal expansions like .12379285.... By this, we mean
\[ \frac{1}{10} + \frac{2}{100} + \frac{3}{1000} + \frac{7}{10000} + \frac{9}{100000} + \cdots. \]
In general a real number in [0, 1] has the decimal representation $\alpha = .a_1 a_2 a_3 a_4 \cdots$, with $a_i \in \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$.
Thus if [0, 1] were denumerable, we'd have a list of decimals including every element of [0, 1] exactly once as follows:
\[ \alpha_1 = .a_{11} a_{12} a_{13} a_{14} \cdots \]
\[ \alpha_2 = .a_{21} a_{22} a_{23} a_{24} \cdots \]
\[ \alpha_3 = .a_{31} a_{32} a_{33} a_{34} \cdots \]
Stories about the Infinite Motel - Interpretation of the Facts about Denumerable Sets
Figure 13: The megamotel of the galaxy with denumerably many rooms.
A traveller arrived at the motel and saw that it was full. He began to be worried as the next galaxy was pretty far away.
"No problem," said the manager and proceeded to move the occupant of room $r_n$ to room $r_{n+1}$ for all n = 1, 2, 3, .... Then
room $r_1$ was vacant and the traveller was given that room. See Figure 14.
Moral: If you add or subtract an element from a denumerable set, you still have a denumerable set.
A few days later a denumerably infinite number of bears showed up at the motel, which was still full. The angry bears
began to growl. But the manager did not worry. He moved the guest in room $r_n$ into room $r_{2n}$, for all n = 1, 2, 3, ....
This freed up the odd numbered rooms for the bears. So bear $b_n$ was placed in room $r_{2n-1}$, for all n = 1, 2, 3, ....
Moral: A union of 2 denumerable sets is denumerable.
The motel was part of a denumerable chain of denumerable motels. Later, when the galactic economy went into a
depression, the chain closed all motels but 1. All the motels in the chain were full and the manager of the mega motel
was told to find rooms for all the guests from the infinite chain of denumerable motels. This motel manager showed his
cleverness again as Figure 15 indicates. He listed the rooms in all the hotels in a table so that hotel i has guest rooms
$r_{i,1}, r_{i,2}, r_{i,3}, r_{i,4}, ....$ Then, in a slightly different manner from the proof of Fact 2 in Proposition 15, he twined a red thread
through all the rooms, lining up the guests so that he could put them into rooms in his motel.
Moral: The Cartesian product of 2 denumerable sets is denumerable.
But the next part of the story concerns a defeat of this clever motel manager. The powers that be in the commission of
cosmic motels asked the manager to compile a list of all the ways in which the rooms of his motel could be occupied. This
list was supposed to be an infinite table. Each line of the table was to be an infinite sequence of 0s and 1s. At the nth
position there would be a 1 if room $r_n$ were occupied and a 0 otherwise. For example, the sequence 0000000000000...
would represent an empty motel. The sequence 1010101010101010... would mean that the odd rooms were occupied
and the even rooms empty.
Figure 14: A traveler t arrives at the full hotel. Guest $g_n$ is moved to room $r_{n+1}$ and the traveler t is put in room $r_1$.
Figure 15: The manager of the mega motel has to put the guests from the entire chain of denumerable motels into his motel.
He runs a red thread through the rooms to put the guests in order and thus into 1-1 correspondence with $\mathbb{Z}^+$ and the rooms
in his motel.
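The proof of Fact 5 in Proposition 15 (Cantor's diagonal argument) shows that this list is incomplete. For suppose the
table is
\[ \begin{array}{l} \alpha_1 = a_{11} a_{12} a_{13} \cdots \\ \alpha_2 = a_{21} a_{22} a_{23} \cdots \\ \alpha_3 = a_{31} a_{32} a_{33} \cdots \\ \quad \vdots \\ \alpha_n = a_{n1} a_{n2} a_{n3} \cdots \\ \quad \vdots \end{array} \]
Now define $\beta = b_1 b_2 b_3 \cdots$ with $b_i \in \{0, 1\}$ by saying $b_n = 1$, if $a_{nn} = 0$, and $b_n = 0$, if $a_{nn} = 1$. Then $\beta$ cannot
be in our table at any row. For the nth entry in $\beta$ cannot equal the nth entry in $\alpha_n$, for any n.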
So the set of all ways of occupying the motel is not denumerable. The motel manager failed this time.
Moral. The set of all sequences of 0s and 1s is not denumerable. Nor are the real numbers.
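The diagonal construction can be illustrated on a finite table. The Python sketch below flips the diagonal entries of an N × N table of 0s and 1s (the sample table is made up for illustration) and checks that the resulting row differs from every row of the table. This is only a finite truncation; the actual argument concerns infinite sequences and an infinite table.

```python
def diagonal_complement(table):
    """Flip the diagonal: entry n of the result is 1 - table[n][n].

    `table` is a list of rows of 0s and 1s; row n must have at least n + 1 entries.
    """
    return [1 - table[n][n] for n in range(len(table))]


if __name__ == "__main__":
    # A made-up 4 x 4 table of room occupations (1 = occupied, 0 = empty).
    table = [
        [0, 0, 0, 0],
        [1, 0, 1, 0],
        [1, 1, 1, 1],
        [0, 1, 0, 1],
    ]
    beta = diagonal_complement(table)
    print(beta)
    # beta differs from row n in position n, so it is not a row of the table.
    print(all(beta[n] != table[n][n] for n in range(len(table))))
```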
Part II
Figure 16 shows our favorite sets of numbers. Of course, they are all infinite sets and thus we cannot put all the elements in.
First we have $\mathbb{Z} = \{0, \pm 1, \pm 2, ...\}$, the integers, a discrete set of equally
spaced points on a line, marching out to infinity.
Then we have the rational numbers $\mathbb{Q} = \{ \frac{m}{n} \mid m, n \in \mathbb{Z}, \ n \ne 0 \}$. This set is everywhere dense in the real line; every
open interval contains a rational. For example any open interval containing 0 must contain infinitely many of the numbers
$\frac{1}{n}$, n = 1, 2, 3, 4, .... However, Q is full of holes where $\sqrt{2}$, e, $\pi$ would be if they were rational, but they are not.
The set of real numbers, R, consists of all decimal expansions including that for
e = 2.71828 18284 59045 23536 02874 71352 66249 7757.....
It can be pictured as a continuous line, with no holes or gaps. You can think of the real numbers algebraically as decimals.
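By this we mean an infinite series:
\[ \alpha = \sum_{j=-n}^{\infty} a_j 10^{-j}, \quad a_j \in \{0, 1, 2, ..., 9\}. \]
In the usual decimal notation, we write $\alpha = a_{-n} \cdots a_{-1} a_0 . a_1 a_2 a_3 a_4 a_5 \cdots$. This representation is not unique; for example, $0.999999\cdots = 1$. To see this, use the geometric series
\[ \sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}, \quad \text{for } |x| < 1. \]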
Here I assume that you learned about infinite series in calculus. They are of course limits, which we have yet to define
carefully. Anyway, back to our example, we have
\[ 0.9999999\cdots = \frac{9}{10} \sum_{n=0}^{\infty} \left( \frac{1}{10} \right)^n = \frac{9}{10} \cdot \frac{1}{1 - \frac{1}{10}} = 1. \]
Z is the set of real numbers which have a decimal representation with only all 0s or all 9s after the decimal point.
Q is the set of real numbers with decimals that are repeating after a certain point. For example, $\frac{1}{3} = 0.333333333\cdots$;
$\frac{1}{7} = 0.142857142857142857\cdots$. This too comes from the geometric series and the fact that $\frac{142857}{999999} = \frac{1}{7}$.
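Exact rational arithmetic makes these facts easy to check. In the Python sketch below, the helper names are hypothetical. The function `repeating` uses the geometric series to turn a repeating block of digits into a fraction, and `partial_nines` computes the finite decimal 0.99...9, whose distance from 1 is exactly $10^{-k}$.

```python
from fractions import Fraction


def repeating(block, length):
    """Value of 0.(block)(block)... where `block` is written with `length` digits.

    By the geometric series this equals block / (10^length - 1).
    """
    return Fraction(block, 10**length - 1)


def partial_nines(k):
    """The finite decimal 0.99...9 with k nines, i.e. the sum of 9/10^j for j = 1..k."""
    return sum(Fraction(9, 10**j) for j in range(1, k + 1))


if __name__ == "__main__":
    print(repeating(142857, 6))   # 0.142857142857...
    print(repeating(3, 1))        # 0.333...
    print(1 - partial_nines(12))  # gap between 0.999999999999 and 1
```

The gap shrinks to 0 as k grows, which is the sense in which the infinite decimal 0.999... equals 1.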
Figure 16: The red dots indicate integers in the first line, rational numbers in the second line, and real numbers in the 3rd
line. Of course we cannot actually draw all the rationals in an interval so we tried to indicate a cloud of points.
Given real numbers x, y, z we have unique real numbers x + y, xy such that the following axioms hold $\forall x, y, z \in \mathbb{R}$.
A1. associative law for addition: (x + y) + z = x + (y + z)
A2. identity for addition: $\exists\, 0 \in \mathbb{R}$ s.t. 0 + x = x
A3. inverses for addition: $\forall x \in \mathbb{R}$, $\exists\, {-x} \in \mathbb{R}$ s.t. $x + (-x) = (-x) + x = 0$.
A4. commutative law for addition: x + y = y + x
M1. associative law for multiplication: x(yz) = (xy)z
M2. identity for multiplication: $\exists\, 1 \in \mathbb{R}$ s.t. 1x = x1 = x
M3. multiplicative inverses for non-zero elements: $\forall x \in \mathbb{R}$ s.t. $x \ne 0$, $\exists\, x^{-1} \in \mathbb{R}$ s.t. $x x^{-1} = 1 = x^{-1} x$
M4. commutative law for multiplication: xy = yx
D. distributive law: x(y + z) = xy + xz
Any set with 2 operations + and $\cdot$ that satisfy the preceding 9 axioms is called a field. The rational numbers Q also
satisfy these 9 axioms and are thus a field too. Mostly fields are topics studied in algebra, not analysis.
From these laws you can deduce the many facts that you know from school before college. For example we list a few
facts.
Facts About R that Follow from the Field Axioms.
Fact 1) $\forall x, y, z \in \mathbb{R}$, if xy = xz and $x \ne 0$, then y = z.
Fact 2) $0 \cdot x = 0$, $\forall x \in \mathbb{R}$.
Fact 3) The elements 0 and 1 are unique.
Fact 4) $-(-x) = (-1)(-x) = x$.
Proof. Fact 1) Multiply the equation by $x^{-1}$, which exists by M3. This gives
\[ y = 1 \cdot y = (x^{-1} x) y = x^{-1} (xy) = x^{-1} (xz) = (x^{-1} x) z = 1 \cdot z = z. \]
Here we have used axioms M2, M3, M1.
Fact 2) Using our axioms M2, D, A2 we have
\[ 0 \cdot x + x = 0 \cdot x + 1 \cdot x = (0 + 1) \cdot x = 1 \cdot x = x. \]
It follows that $0 \cdot x + x = x$. Now subtract x from both sides (or equivalently add $-x$ to both sides) to get
\[ (0 \cdot x + x) - x = x - x \]
which says by A1 and A3
\[ 0 \cdot x + (x - x) = 0. \]
Thus by A3 and A2, $0 \cdot x + 0 = 0$ and again by A2, we have $0 \cdot x = 0$.
Fact 3) and Fact 4) We leave these proofs to the reader.
The set R has a subset P which we know as the set of positive real numbers. Then P satisfies the following 2 Order
Axioms:
Ord 1. $\mathbb{R} = P \cup \{0\} \cup (-P)$, where $-P = \{-x \mid x \in P\}$ = negative real numbers. Moreover, this union is disjoint;
i.e., the intersection of any pair of the 3 sets is empty.
Ord 2. $x, y \in P$ implies x + y and $xy \in P$.
Definition 16 For real numbers x, y we write x < y iff $y - x \in P$. We write $x \le y$ iff either x < y or x = y.
All the usual properties of inequalities can be deduced from our 2 order axioms and this definition. We will do a few of
these.
Facts about Order. $\forall x, y, z \in \mathbb{R}$
Fact 1) Transitivity. x < y and y < z implies x < z.
Fact 2) Trichotomy. For any $x, y \in \mathbb{R}$ exactly one of the following is true: x < y, y < x, or x = y.
Fact 3) x < y implies x + z < y + z for any $z \in \mathbb{R}$.
Fact 4) 0 < x iff $x \in P$.
Fact 5) If 0 < c and x < y, then cx < cy.
Fact 6) If c < 0 and x < y, then cy < cx.
Fact 7) 0 < 1
Fact 8) If 0 < x < y, then $0 < y^{-1} < x^{-1}$.
Proof. We will leave most of these proofs to the reader as Exercises. But let's do 1) and 7).
Fact 1) x < y means $y - x \in P$. y < z means $z - y \in P$. Then by Ord 2 and the axioms for arithmetic in R, we have
$(y - x) + (z - y) = z - x \in P$. This says x < z.
Fact 7) Note that by Ord 1, 1 must either be in P or $-P$, since $1 \ne 0$ (Why?). If $1 \in -P$, then $-1 \in P$ and Ord 2 says
$(-1)(-1) = 1 \in P$, contradicting the disjointness of P and $-P$ in Ord 1.
Definition 17 The absolute value |x| of a real number x is defined by
\[ |x| = \begin{cases} x, & \text{if } x \ge 0, \\ -x, & \text{if } x < 0. \end{cases} \]
Equivalently, we can write
\[ |x| = \sqrt{x^2}. \qquad (5) \]
Fact 2) We prove it by contradiction. Assume $\sqrt{x} > \sqrt{y}$. Then $x = \sqrt{x} \sqrt{x} > \sqrt{x} \sqrt{y} \ge \sqrt{y} \sqrt{y} = y$. Again by transitivity,
we have x > y, a contradiction to our hypothesis.
The absolute value is very useful. For example, it allows us to define the distance d(x, y) between 2 real numbers x and
y to be $d(x, y) = |x - y|$.
10
10.1
We have stated 9 field axioms and 2 order axioms. Both sets of axioms are also valid for the rational numbers; i.e., Q is
an ordered field just like R. So what distinguishes Q from R? We tried to indicate this in Figure 16. Q has holes like $\sqrt{2}$, $\pi$, e,
while R is a continuum. Of course the holes in Q are as invisible as points. We will prove later that every interval on the
real line contains a rational number.
There is a fairly simple axiom that allows us to say that R has no holes. Before stating this axiom, let's explain why $\sqrt{2}$
is irrational. The Pythagoreans noticed this over 1000 years ago but kept it secret on pain of death. It seemed evil to them
that the diagonal of a unit square or the hypotenuse of such a nice triangle as that in Figure 17 should be irrational.
Figure 17: $\sqrt{2}$ is the length of the diagonal of a square each of whose sides has length 1.
Theorem 18 $\sqrt{2}$ is not rational.
Proof. (by contradiction) Suppose that
\[ \sqrt{2} = \frac{m}{n}, \text{ with } m, n \in \mathbb{Z}, \ n \ne 0. \qquad (6) \]
We may assume that $\frac{m}{n}$ is in lowest terms; i.e., m and n have no common divisors. Square formula (6). This gives
\[ 2 = \frac{m^2}{n^2} \quad \text{and then} \quad 2n^2 = m^2. \]
But then m must be even, since the square of an odd number is odd. So m = 2r, for some $r \in \mathbb{Z}$. Therefore
\[ m^2 = 4r^2 = 2n^2. \]
Divide by 2 to see that n has to be even since $n^2 = 2r^2$ is even. This is a contradiction since n and m now have a common divisor,
namely, 2.
Similarly (or, better, using unique factorization of positive integers as a product of primes) one can show that $\sqrt{m}$ is irrational for any positive integer m such that m is not the square of another integer. Thus $\sqrt{5}$, $\sqrt{6}$ are also irrational. You can do similar things for cube roots. It is harder to see that $\pi$ and $e$ are irrational. We will at least show e is irrational later. In fact, $e$ and $\pi$ are transcendental, meaning that they are not roots of a polynomial with rational coefficients. Of course, $\sqrt{2}$ is a root of $x^2 - 2$. See Hardy and Wright, Theory of Numbers, for more information.
A reference for weird facts about numbers (without proof) is David Wells, The Penguin Dictionary of Curious and Interesting Numbers. Here we learn that J. Lambert proved $\pi \notin Q$ in 1766. And in 1882 Lindemann proved $\pi$ to be transcendental. This implies that it is not possible to square the circle with ruler and compass - one of the 3 famous problems of antiquity. It asks for the ruler and compass construction of a square whose area equals that of a given circle.
The other 2 problems are angle trisection and cube duplication. To understand these problems you need to figure out the precise rules for ruler and compass constructions. Many undergraduate algebra books use Galois theory to show that all three problems are impossible.
Despite the provable impossibility of circle squaring, circle squarers abound. In 1897 the Indiana House of Representatives almost passed a law setting $\pi = \frac{16}{\sqrt{3}} \approx 9.2376$ - due to the efforts of a circle squarer.
Now many computers have been put to work finding more and more digits of $\pi$.
year    digits            where
1961    100,000           U.S., Shanks and Wrench
1967    500,000           France
1988    201 million       Japan, Y. Kanada
1989    over 1 billion    U.S., Chudnovsky brothers
What is the point of such calculations? Some believe that $\pi$ is a normal number, which means that there is, in some sense, no pattern at all in the decimal expansion of $\pi$. But we digress into number theory. Anyway here are the first few digits:
10.2
$a = \text{l.u.b.}\, S$ (or $a = \sup S$).
What is the l.u.b.? It is just what it claims to be; namely, the least of all the upper bounds for S (assuming S has upper bounds). There is also an analogous definition of greatest lower bound (g.l.b.) or infimum. We give the reader the job of writing down the definition. It is the greatest of all possible lower bounds.
If the set S is finite, there is no problem finding the l.u.b. or g.l.b. of S. Then you would say l.u.b. S is the maximum element of the finite set, for example. But when the set S is infinite, things become a lot less obvious. Why even should the l.u.b. of S exist? We will soon state an axiom that proclaims the existence of the l.u.b. or g.l.b. of a bounded set. Otherwise we would have no way to know. We will have proclaimed this existence by fiat. If we were kind, we would also produce a proof that the real numbers actually exist. I am sure you are not really worried about that. Or are you?
The l.u.b. of S is the leftmost real number to the right of all the elements of S. If you confuse right and left as much as I do, you may find this confusing.
Examples.
1) $\text{l.u.b.}\{\sqrt{2}, \pi, e\} = \pi$.
2) If $S = (0, 1) = \{x \in R \mid 0 < x < 1\}$, then $\text{l.u.b.}\, S = 1$. This example shows that the least upper bound need not be an element of the set.
3) If $S = \{\frac{1}{n} \mid n \in Z^+\}$, then $\text{g.l.b.}\, S = 0$. This example shows that the greatest lower bound need not be in the set.
The Completeness Axiom.
Suppose S is a non-empty set of real numbers and that S is bounded above; i.e., there is a real number B such that $x \le B$ for all $x \in S$. Then there exists a real number $a = \text{l.u.b.}\, S$.
Assume $\text{l.u.b.}\, S = a \notin S$. This is the most interesting case. Figure 18 shows the picture of a least upper bound a for a bounded set S. The red dots represent points of the infinite set S. Of course I cannot really put an infinite number of points on a page in a lifetime. So you have to imagine them. Also points have no width. So what I am drawing is not really the points but a representation of what they would be if they had width. Again use your imagination. The point B (blue oval) at the right end is an upper bound for the set S. The point a (blue oval) is the least upper bound of S.
Assume $\text{l.u.b.}\, S = a \notin S$. One way to characterize l.u.b. S is to note that, for a small positive $\varepsilon$, the interval $(a - \varepsilon, a)$ has infinitely many points from the set S. Why? Otherwise there would be a smaller upper bound for S than a. On the other hand, the interval $[a, a + \varepsilon)$ has no points from S, again assuming $\varepsilon$ is small and positive. Why? Otherwise a would not even be an upper bound for S.
Figure 18: The infinite bounded set S is indicated with red dots. An upper bound B for S is indicated with a blue oval. The least upper bound a of S is indicated with a blue oval. The interval $(a - \varepsilon, a)$ contains infinitely many points of S. The interval $(a, a + \varepsilon)$ contains no points of S. Here $a - \varepsilon$ and $a + \varepsilon$ are also indicated with blue ovals. Since S is infinite we cannot really draw all its points. Moreover points are invisible really. So our figure is just a shadow of what is really happening. That you have to imagine.
The completeness axiom will imply all we need to know about limits and their existence. And that is what this course is about. Without it we would be in serious trouble.
The result of all this is that R is characterized by 9+2+1 axioms; 9 field axioms, 2 order axioms, and 1 completeness axiom. To summarize that, we say that R is a complete ordered field.
Next let us look at an example of how the completeness axiom can be used to fill in a hole in the rationals.
Example 1. A Set of Rational Numbers whose g.l.b. is $\sqrt{2}$ - found by Newton's Method.
See Figure 19.
Define $x_1 = 2$, $x_2 = \frac{1}{2}x_1 + \frac{1}{x_1}$, ..., $x_n = \frac{1}{2}x_{n-1} + \frac{1}{x_{n-1}}$, $n = 2, 3, 4, 5, \ldots$. In this way, we have an inductive definition of an infinite set $S = \{x_n \mid n \in Z^+\}$. The first few elements of S are 2, 1.5, 1.416666666..., 1.414215686, .... We claim that the greatest lower bound of this set is $\sqrt{2}$. We will prove this claim later after noting that $\{x_n\}_{n \ge 1}$ is a decreasing sequence that is bounded below and thus must have a limit (which we are about to define), which is $\sqrt{2}$. This says that $\sqrt{2}$ is the g.l.b. of S. (Since the sequence is decreasing, the l.u.b. of S is just its largest element $x_1 = 2$.)
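The recursion in Example 1 can be tried out directly. The following is a minimal sketch, assuming the recursion $x_1 = 2$, $x_n = \frac{1}{2}x_{n-1} + \frac{1}{x_{n-1}}$; the function name `newton_sqrt2_terms` is made up for this illustration. It uses exact fractions so that every term is visibly a rational number, and it prints how far $x_n^2$ is from 2.

```python
from fractions import Fraction

def newton_sqrt2_terms(count):
    """Return the first `count` terms of x_1 = 2, x_n = x_{n-1}/2 + 1/x_{n-1}."""
    terms = [Fraction(2)]
    while len(terms) < count:
        x = terms[-1]
        terms.append(x / 2 + 1 / x)  # exact rational arithmetic
    return terms

terms = newton_sqrt2_terms(5)
for n, x in enumerate(terms, start=1):
    # index, exact rational value, decimal approximation, and x_n^2 - 2
    print(n, x, float(x), float(x * x - 2))
```

Each printed term is rational, each is smaller than the one before, and each square stays above 2, which is what the later proof establishes in general.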
Note on Newton's Method.
This is a method which often approximates the root of a polynomial very well. In this case the polynomial is $x^2 - 2$. To find a root near $x_1 = 2$, you need to look at the tangent to the curve $y = f(x) = x^2 - 2$ at the point $(x_1, f(x_1))$. See Figure 19. The point where that tangent line intersects the x-axis is $x_2$. To find it, look at the slope of the tangent at $(x_1, f(x_1))$ which is $f'(x_1) = 2x_1 = 4$. Then use the point-slope equation of a line to get:
$$4 = f'(x_1) = \frac{f(x_1) - 0}{x_1 - x_2} = \frac{2}{2 - x_2}.$$
Solving for $x_2$ gives $x_2 = 2 - \frac{2}{4} = \frac{3}{2}$, in agreement with the value 1.5 listed above.
Part III
Limits
11
Definition of Limits
Recall that a sequence $\{x_n\}_{n \ge 1} = \{x_1, x_2, x_3, x_4, \ldots\}$ has indices n which are positive integers. We do not assume that the mapping from $Z^+$ to $\{x_n\}_{n \ge 1}$ is 1-1. In fact a possible sequence has all $x_n$ equal to the same number, say 2. No problem finding the limit of that sequence. You can think of a sequence as a vector with infinitely many components.
Definition 20 If $\{x_n\}_{n \ge 1}$ is a sequence of real numbers, we say that the real number L is the limit of $x_n$ as n goes to $\infty$ and write $L = \lim_{n \to \infty} x_n$ iff, for every $\varepsilon > 0$ there is an $N \in Z^+$ (with N depending on $\varepsilon$) such that $n \ge N$ implies $|x_n - L| < \varepsilon$.
In this definition you are supposed to think of $\varepsilon$ as an arbitrarily small positive guy. Paul Erdős used to call children "epsilons." A computer might think $\varepsilon = 10^{-10}$. Such a small number of inches would be invisible. We have tried to draw a picture of a sequence approaching a limit. See Figure 20. Here we graph points $(n, x_n)$ and $L = \lim_{n \to \infty} x_n$. Given a small positive number $\varepsilon$, we are supposed to be able to find a positive integer $N = N(\varepsilon)$ depending on $\varepsilon$ so that every $(n, x_n)$, for $n \ge N(\varepsilon)$, is in the shaded box. Then you have to imagine taking an even smaller $\varepsilon$, say $\varepsilon 10^{-9}$, and there will be a new, probably larger, $N = N(\varepsilon 10^{-9})$ so that all $(n, x_n)$ with $n \ge N(\varepsilon 10^{-9})$ are in the new smaller version of the shaded box. These $x_n$ will be extremely close to L. And the point is that you can make all but a finite number of the $x_n$ as close as you want to L.
This definition of limit dates from the 1800s. It is due to B. Bolzano (1781-1848), A. Cauchy (1789-1857) and K. Weierstrass (1815-1897). Weierstrass was the advisor of the first woman math. prof. - Sonya Kovalevsky. The main fact about Cauchy that comes to my mind is that he lost that memoir of Galois. See Edna Kramer, The Nature and Growth of Modern Mathematics. An entertaining story book on the history of math. is E. T. Bell, Men of Mathematics.
Now we have a precise meaning to apply to our example above obtained using Newton's method to find a sequence of rationals approaching $\sqrt{2}$. We will do the proof later after we know more about limits. Let's consider another favorite example.
Example 2. A Sequence of Rational Numbers Approaching e.
Define e by the Taylor series for $e^x$. That is,
$$e = \sum_{n=0}^{\infty} \frac{1}{n!}.$$
This means that e is the limit of the sequence of partial sums
$$s_n = \sum_{k=0}^{n} \frac{1}{k!} = 1 + 1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{24} + \cdots + \frac{1}{n!}.$$
So we have $s_0 = 1$, $s_1 = 2$, $s_2 = 1 + 1 + \frac{1}{2} = 2.5$, $s_3 = 1 + 1 + \frac{1}{2} + \frac{1}{6} \approx 2.6667$, .... This sequence does not converge as fast as that in Example 1.
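To get a feeling for how quickly these partial sums approach e, here is a small sketch. It assumes the indexing $s_n = \sum_{k=0}^{n} 1/k!$ from the displayed formula; the function name `partial_sum` is chosen only for this illustration, and `math.e` is used purely as a reference value to measure the gap.

```python
from fractions import Fraction
from math import e, factorial

def partial_sum(n):
    """Exact rational value of s_n = 1/0! + 1/1! + ... + 1/n!."""
    return sum(Fraction(1, factorial(k)) for k in range(n + 1))

for n in range(11):
    s = partial_sum(n)
    # index, decimal value of s_n, and the remaining distance to e
    print(n, float(s), e - float(s))
```

Every $s_n$ is a rational number below e, and the printed gap shrinks steadily as n grows.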
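Example 3. The Simplest Limit (except for a constant sequence).
Define the sequence $x_n = \frac{1}{n}$, for all $n \in Z^+$. I think it is obvious what the limit of this sequence is; namely,
$$\lim_{n \to \infty} x_n = \lim_{n \to \infty} \frac{1}{n} = 0.$$
To prove this from the definition of limit, we need to show that given $\varepsilon > 0$, we can find $N \in Z^+$ so that $n \ge N$ implies $\left|\frac{1}{n} - 0\right| < \varepsilon$. That means, we need
$$\frac{1}{n} < \varepsilon, \text{ if } n \ge N.$$
The inequality is equivalent to saying $n > \frac{1}{\varepsilon}$. This means that we should take $N = \left\lfloor \frac{1}{\varepsilon} \right\rfloor + 1$. Here the floor of x = $\lfloor x \rfloor$ = the greatest integer $\le x$. For then if $n \ge N = \left\lfloor \frac{1}{\varepsilon} \right\rfloor + 1 > \frac{1}{\varepsilon}$, we have $n > \frac{1}{\varepsilon}$.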
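The choice $N = \lfloor 1/\varepsilon \rfloor + 1$ can be spot-checked numerically. This is only a sketch: the helper name `n_for_epsilon` is hypothetical, exact fractions are used for $\varepsilon$ to avoid rounding questions, and checking finitely many n is of course not a proof.

```python
from fractions import Fraction
from math import floor

def n_for_epsilon(eps):
    """N = floor(1/eps) + 1, the choice made in Example 3."""
    return floor(1 / eps) + 1

for eps in (Fraction(1, 10), Fraction(3, 10), Fraction(1, 1000)):
    N = n_for_epsilon(eps)
    # every sampled n >= N satisfies 1/n < eps
    ok = all(Fraction(1, n) < eps for n in range(N, N + 1000))
    print(eps, N, ok)
```

For $\varepsilon = 1/10$ the formula gives N = 11, and n = 10 just fails since $1/10$ is not strictly less than $\varepsilon$.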
To picture this you could graph the points $(n, \frac{1}{n})$ as in Figure 21. We seek the limit of the y-coordinates as the x-coordinates approach infinity.
Example 3 is the sort of limit that anyone could work out - even before having the wonderful Definition 20. More subtle limits however could confuse experts such as Cauchy himself. We will see more examples after considering the main facts about limits of sequences.
Figure 21: The red points are $(n, \frac{1}{n})$, for n = 1, 2, ..., 10. It is supposed to be clear that the y-coordinates are approaching 0 as the x-coordinates march out to infinity. That is, the points are getting closer and closer to the x-axis. Of course you cannot actually see what is happening at infinity.
12
Now that we have seen a few examples, perhaps we should prove a few things so that we can refer to them rather than
reprove them every time.
Facts About Limits of Sequences of Real Numbers.
Fact 1) (Uniqueness) If the limit $a = \lim_{n \to \infty} x_n$ exists, then a is unique; i.e., if also $b = \lim_{n \to \infty} x_n$, then a = b.
Fact 2) (Limit Exists implies Sequence Bounded) If $a = \lim_{n \to \infty} x_n$ exists, then the set $\{x_n \mid n \in Z^+\}$ is bounded above and below.
Fact 3) (Limit of a Sum is the Sum of the Limits) If we have limits $a = \lim_{n \to \infty} x_n$ and $b = \lim_{n \to \infty} y_n$, then $\lim_{n \to \infty} (x_n + y_n) = a + b$.
Fact 4) (Limit of a Product) If we have limits $a = \lim_{n \to \infty} x_n$ and $b = \lim_{n \to \infty} y_n$, then $\lim_{n \to \infty} (x_n y_n) = ab$.
Fact 5) (Limit of a Quotient) If we have limits $a = \lim_{n \to \infty} x_n$ and $b = \lim_{n \to \infty} y_n$, and in addition $b \neq 0$, then we have $y_n \neq 0$, for all n sufficiently large and
$$\lim_{n \to \infty} \frac{x_n}{y_n} = \frac{a}{b}.$$
Fact 6) (Limit of an Increasing Bounded Sequence) Suppose that $x_n \le x_{n+1} \le B$, for all $n \in Z^+$. Then the limit $a = \lim_{n \to \infty} x_n$ exists. Moreover, a is the least upper bound of the set of all elements in the sequence. That is,
$$a = \lim_{n \to \infty} x_n = \text{l.u.b.}\{x_n \mid n \in Z^+\}.$$
There is an analogous result for decreasing sequences which are bounded below.
Fact 7) (Limits Preserve $\le$) Suppose $a = \lim_{n \to \infty} x_n$ and $b = \lim_{n \to \infty} y_n$ and $x_n \le y_n$, for all n. Then $a \le b$.
Proof. Fact 1) Let us postpone this one to the end of this subsection. It may seem to be the most obvious but it requires a little thought.
Fact 2) Write L for the limit. Let $\varepsilon = 1$ be given. Then there exists N such that $n \ge N$ implies $|x_n - L| < 1$. By properties of inequalities, this implies $|x_n| = |x_n - L + L| \le |x_n - L| + |L| < 1 + |L|$, if $n \ge N$. It follows that a bound on the sequence is
$$\max\{|x_1|, \ldots, |x_{N-1}|, |L| + 1\}.$$
Fact 3) Suppose we are given an $\varepsilon > 0$. Then we know we can find N and M depending on $\varepsilon$ so that
$$n \ge N \text{ implies } |x_n - a| < \varepsilon \quad\text{and}\quad n \ge M \text{ implies } |y_n - b| < \varepsilon. \qquad (7)$$
rather than
(8)
In order to get this, given our hypotheses, we use a trick that you may remember from calculus, if you ever proved the formula for the derivative of a product. We know $|y_n - b|$ is eventually small for large n, so we should be able to show $|x_n y_n - x_n b|$ is small. Thus we subtract $x_n b$ from $x_n y_n - ab$ and add it back in to obtain
$$|x_n y_n - ab| = |x_n y_n - x_n b + x_n b - ab|.$$
Now use the triangle inequality and the multiplicative property of the absolute value, finding that
$$|x_n y_n - ab| = |x_n y_n - x_n b + x_n b - ab| \le |x_n y_n - x_n b| + |x_n b - ab| = |x_n|\,|y_n - b| + |b|\,|x_n - a|. \qquad (9)$$
In order to make $|x_n|\,|y_n - b| + |b|\,|x_n - a|$ small, we actually need to know that $|x_n|$ is not blowing up as $n \to \infty$. Fact 2 tells us that there is a positive number K such that $|x_n| \le K$, for all n. This, plus inequality (9), implies
$$|x_n y_n - ab| \le K\,|y_n - b| + |b|\,|x_n - a|. \qquad (10)$$
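Given $\varepsilon > 0$, we can find N and M depending on $\varepsilon$ so that we have an improved version of the inequalities (7):
$$n \ge N \text{ implies } |x_n - a| < \frac{\varepsilon}{2(1 + |b|)} \quad\text{and}\quad n \ge M \text{ implies } |y_n - b| < \frac{\varepsilon}{2K}. \qquad (11)$$
We divide by $1 + |b|$ rather than $|b|$ because b might be 0. We know K > 0 so we don't need to add 1 to K to avoid dividing by 0.
Now we combine inequalities (10) and (11) to get the desired inequality (8): for $n \ge \max\{N, M\}$,
$$|x_n y_n - ab| < K \frac{\varepsilon}{2K} + |b| \frac{\varepsilon}{2(1 + |b|)} < \varepsilon.$$
Fact 5) We will prove this sort of thing later. The reader should think about it though, using ideas similar to the proof of 4).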
Fact 6) We know that the set $S = \{x_n \mid n \in Z^+\}$ is non-empty and bounded. Therefore, by the completeness axiom, it has a least upper bound which we will call a. We want to show that $a = \lim_{n \to \infty} x_n$. Suppose that we have been given $\varepsilon > 0$. Look at $a - \varepsilon$. We know that $a - \varepsilon < a$ and thus $a - \varepsilon$ cannot be an upper bound for the set S of all $x_n$, $n \in Z^+$. Look at Figure 22.
Figure 22: Picture of the proof of fact 6. $\{x_n\}$ is a bounded increasing sequence and $a = \text{l.u.b.}\{x_n \mid n \in Z^+\}$. Here we assume $n \ge N$. We know N exists such that $x_N \in (a - \varepsilon, a]$ as $a - \varepsilon$ is not an upper bound for the set of all $x_k$, $k \in Z^+$.
This means there is an $N \in Z^+$ so that $a - \varepsilon < x_N \le a$. Since $\{x_n\}$ is an increasing sequence, this means that for all $n \ge N$ we have
$$a - \varepsilon < x_N \le x_n \le a.$$
This implies that $|x_n - a| < \varepsilon$ if $n \ge N$. Therefore according to our definition of limit, $a = \lim_{n \to \infty} x_n$.
Fact 7) Using Facts 3 and 4, it suffices to look at $z_n = y_n - x_n \ge 0$, for all n. We know that $\lim_{n \to \infty} z_n = b - a$ and we must show that $\lim_{n \to \infty} z_n \ge 0$. Do a proof by contradiction looking at Figure 23.
We leave the details to the reader as we will return to do a more general version of this argument later. Next we need 2 lemmas which will prove useful now and in the future.
Lemma 21 Suppose that a is a real number such that $|a| < \varepsilon$ for all $\varepsilon > 0$. Then a = 0.
Proof. We do a proof by contradiction again. If a is not 0, then $|a| > 0$ and then we can take $\varepsilon = \frac{|a|}{2}$. This means $|a| < \frac{|a|}{2}$. But that is absurd as it implies $1 < \frac{1}{2}$ and thus 2 < 1 and 1 < 0. We have our contradiction.
Figure 23: Picture of a proof by contradiction in which we assume that the sequence $\{x_n\}$ is non-negative while the limit L is negative. But then $|x_n - L| \ge |L| > 0$ for all n. This contradicts the definition of limit.
13
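Example 1. $\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = e$.
Here $x_n = \left(1 + \frac{1}{n}\right)^n$, for n = 1, 2, 3, .... So we have
$$x_1 = 2,\quad x_2 = \left(1 + \tfrac{1}{2}\right)^2 = 2.25,\quad x_3 = \left(1 + \tfrac{1}{3}\right)^3 \approx 2.3704,\quad \ldots,\quad x_{30} = \left(1 + \tfrac{1}{30}\right)^{30} \approx 2.6743.$$
This is an increasing sequence bounded above by 4. It therefore has a limit. It can be shown (using l'Hôpital's rule after taking the natural logarithm) that the limit is $e \approx 2.71828$. We will say more about this later. See Section 23.
Note that $y_n = \left(1 + \frac{1}{n}\right)^{n+1}$ is a decreasing sequence, also approaching e as n goes to infinity. We have, for example
$$y_1 = 4,\quad y_2 = \left(1 + \tfrac{1}{2}\right)^3 = 3.375,\quad y_3 = \left(1 + \tfrac{1}{3}\right)^4 \approx 3.1605,\quad \ldots,\quad y_{30} = \left(1 + \tfrac{1}{30}\right)^{31} \approx 2.7635.$$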
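The two sequences in this example squeeze e from below and above, and that is easy to watch numerically. A minimal sketch follows, with the function names `x` and `y` chosen to mirror the text and `math.e` used only as a reference value; exact fractions keep the monotonicity comparisons free of rounding.

```python
from fractions import Fraction
from math import e

def x(n):
    """x_n = (1 + 1/n)^n as an exact fraction."""
    return (1 + Fraction(1, n)) ** n

def y(n):
    """y_n = (1 + 1/n)^(n+1) as an exact fraction."""
    return (1 + Fraction(1, n)) ** (n + 1)

for n in (1, 2, 3, 30, 1000):
    # x_n increases toward e from below, y_n decreases toward e from above
    print(n, float(x(n)), float(y(n)), float(y(n) - x(n)))
```

The last column, $y_n - x_n = x_n / n$, shows how slowly the gap closes compared with the Newton sequence for $\sqrt{2}$.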
Example 2. $x_n = \cos(n\pi) = (-1)^n$, n = 1, 2, 3, .... This sequence has no limit as n goes to infinity. The sequence alternates between 1 and -1. Thus it cannot decide on a limit.
Proof. To prove there is no limit, proceed by contradiction. If $L = \lim_{n \to \infty} (-1)^n$, then according to the definition of limit we can take $\varepsilon = 1$ (or any positive number), and find N so that $n \ge N$ implies $|x_n - L| < 1$. This means that for n even and large we have
$$|1 - L| < 1 \text{ or equivalently } -1 < 1 - L < 1$$
and for n odd and large we have
$$|1 + L| = |-1 - L| < 1 \text{ or equivalently } -1 < 1 + L < 1.$$
But then we can add the inequalities and obtain $-2 < 2 < 2$. Contradiction. Thus the sequence has no limit.
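Example 3. $\lim_{n \to \infty} \frac{n}{3n - 2} = \frac{1}{3}$.
Proof. Given $\varepsilon > 0$, we need to make the distance between $\frac{n}{3n - 2}$ and $\frac{1}{3}$ smaller than $\varepsilon$ for all large n:
$$\left|\frac{n}{3n - 2} - \frac{1}{3}\right| = \frac{1}{3}\left|\frac{3n - (3n - 2)}{3n - 2}\right| = \frac{1}{3} \cdot \frac{2}{3n - 2} < \varepsilon.$$
This inequality is equivalent to
$$3n - 2 > \frac{2}{3\varepsilon},$$
which says
$$n > \frac{1}{3}\left(\frac{2}{3\varepsilon} + 2\right).$$
Alternatively, write $\frac{n}{3n - 2} = \frac{1}{3 - \frac{2}{n}}$ and use $\lim_{n \to \infty} \frac{1}{n} = 0$, which was proved above, and the Facts about limits stated above to prove the result.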
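The bound on n found in Example 3 can be turned into an explicit N and spot-checked. In this sketch the helper names `term` and `n_for_epsilon_ex3` are hypothetical, and N is taken to be the first integer strictly larger than $\frac{1}{3}\left(\frac{2}{3\varepsilon} + 2\right)$; testing a finite range of n illustrates the estimate but does not replace the proof.

```python
from fractions import Fraction
from math import floor

def term(n):
    """The sequence element n / (3n - 2) as an exact fraction."""
    return Fraction(n, 3 * n - 2)

def n_for_epsilon_ex3(eps):
    """Smallest integer N with N > (1/3) * (2 / (3 * eps) + 2)."""
    bound = (2 / (3 * eps) + 2) / 3
    return floor(bound) + 1

eps = Fraction(1, 100)
N = n_for_epsilon_ex3(eps)
# every sampled n >= N lands within eps of 1/3, while n = N - 1 does not
print(N, all(abs(term(n) - Fraction(1, 3)) < eps for n in range(N, N + 500)))
print(abs(term(N - 1) - Fraction(1, 3)) < eps)
```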
Next recall the example of a sequence approaching $\sqrt{2}$ obtained using Newton's method.
$$x_{n+1} = \frac{1}{2} x_n + \frac{1}{x_n}.$$
Take the limit as $n \to \infty$ to see that (using the appropriate facts about limits)
$$L = \frac{L}{2} + \frac{1}{L}.$$
Multiply by 2L to obtain $2L^2 = L^2 + 2$. Thus $L^2 = 2$. Therefore $L = \sqrt{2}$. Why must L be positive? Use Fact 7.
Proof of the Claim.
Here we use mathematical induction. We give the induction step, taking $a = x_n$ and $b = x_{n+1}$. We need to show that if $a^2 > 2$ and $a > 0$ then $b = \frac{a}{2} + \frac{1}{a}$ implies $0 < b < a$ and $b^2 > 2$, which implies $b > 1$.
To see this, note that
$$a - b = a - \left(\frac{a}{2} + \frac{1}{a}\right) = \frac{a}{2} - \frac{1}{a} = \frac{a^2 - 2}{2a}.$$
Then $a^2 > 2$ and $a > 0$ imply that $a - b > 0$.
Next look at
$$b^2 - 2 = \left(\frac{a}{2} + \frac{1}{a}\right)^2 - 2 = \frac{a^2}{4} + 1 + \frac{1}{a^2} - 2 = \left(\frac{a}{2} - \frac{1}{a}\right)^2 > 0.$$
So $b^2 > 2$. The proof of the claims is complete, which finishes the proof that $\lim_{n \to \infty} x_n = \sqrt{2}$.
14
Why is the completeness axiom necessary? Why do we need to understand limits? This idea is fundamental to most of applied mathematics. It is central to differential equations and therefore to the theory of earthquakes and cosmology. The idea of "gradually getting there," "tending towards," "approaching" is one of the most basic. We cannot actually draw a picture of what is happening to $x_n$ when n has moved infinitely far out. The idea is subtle.
You can sometimes see it happen on a computer. Look for example at Figure 24.
Here a regular polygon inscribed in a circle approaches the circle. So its area must approach that of the circle. This gives a way to approximate $\pi$. Archimedes did this around 250 B.C. Assume the radius of the circle is 1. Then
4 sides give the area 2
8 sides give the area 2.828
16 sides give the area 3.061.
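The three areas listed above can be reproduced, and the list continued, with a few lines of code. This is a sketch under stated assumptions: the polygon is taken to be a regular polygon inscribed in the unit circle, whose area with n sides is $\frac{n}{2}\sin\frac{2\pi}{n}$, and the sine of each new central angle is obtained from the half-angle formulas so that $\pi$ itself is never used. The function name `polygon_areas` is made up for the illustration, and this is not claimed to be Archimedes' own procedure.

```python
from math import sqrt

def polygon_areas(doublings):
    """Areas of regular polygons with 4, 8, 16, ... sides inscribed in the unit circle."""
    sides, s, c = 4, 1.0, 0.0  # sine and cosine of the central angle for a square
    areas = []
    for _ in range(doublings + 1):
        areas.append((sides, 0.5 * sides * s))
        # half-angle formulas give sine and cosine of the next central angle
        s, c = sqrt((1 - c) / 2), sqrt((1 + c) / 2)
        sides *= 2
    return areas

for sides, area in polygon_areas(10):
    print(sides, area)
```

The printed areas increase toward $\pi$. For very many doublings the subtraction `1 - c` loses floating point accuracy, so the sketch is only meant for a modest number of steps.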
Around 1821 Cauchy had formulated a principle of convergence of sequences of real numbers. His idea was to use the idea of approximation. We will give the definition of Cauchy sequence soon. It gives a useful way to construct the real numbers as well as a criterion for convergence of a sequence. See V. Bryant, Yet Another Introduction to Analysis for more examples.
The need for clarification of the concept of a real number became apparent in 1826 when Abel corrected Cauchy's belief that a sequence of continuous functions must have a continuous limit function. This showed that intuition can be very misleading when investigating limits. See G. Temple, 100 Years of Math., for more discussion of the history of the concept of limit.
The completeness axiom for R can be stated in a different way. If A and B are non-empty sets of real numbers such that $a \le b$ for all $a \in A$ and $b \in B$, then there exists a real number $\gamma$ so that $a \le \gamma$ for all $a \in A$ and $\gamma \le b$ for all $b \in B$. This axiom is another way of saying there are no holes in the real line. V. Bryant, Yet Another Introduction to Analysis, p. 11, calls $\gamma$ "Piggy-in-the-middle." See Figure 25.
Figure 25: piggy in the middle - real number $\gamma$ between 2 sets A and B such that $a \le b$ for every $a \in A$ and $b \in B$.
In 1872 Dedekind used this sort of idea to construct the real numbers. A Dedekind cut consists of 2 sets A and B of rational numbers such that
1) $Q = A \cup B$
2) $a \in A$ and $b \in B$ implies $a \le b$.
15
Cauchy Sequences
Next we want to define Cauchy sequences. This gives a convergence criterion, a new version of the completeness axiom, and a way to construct the real numbers, the space of Lebesgue integrable functions, and Hensel's space of p-adic numbers for every prime p (a space which has many applications in number theory).
Definition 23 A sequence $\{x_n\}$ of real numbers is a Cauchy sequence iff for every $\varepsilon > 0$ there is an $N \in Z^+$ such that $n, m \ge N$ implies $|x_n - x_m| < \varepsilon$.
In this definition, we just ask that the distance between $x_n$ and $x_m$ is less than $\varepsilon$ for all but a finite number of n and m. To say this another way, we ask that the sequence elements $x_n$ and $x_m$ become arbitrarily close as $m, n \to \infty$. The useful thing about this convergence criterion is that it does not require you to know what the limit is.
Cauchy made this definition in 1821. He did not prove the following theorem.
Theorem 24 Every Cauchy sequence of real numbers has a limit.
This theorem is actually a consequence of our completeness axiom and we will soon give a proof. In fact, the theorem is
logically equivalent to the completeness axiom. Thus one could construct R (as Cantor did in 1883) as "limits of" Cauchy
sequences of rational numbers. Here we identify 2 Cauchy sequences if they converge to the same limit. This gives a
construction of R which is analogous to that used to construct the spaces of Lebesgue integrable functions out of the space
of continuous functions.
Before thinking about proving the preceding theorem, we need to think about subsequences.
Definition 25 Suppose that $\{x_n\}$ is a sequence. A subsequence $\{x_{n_k}\}$ is a sequence obtained by selecting out certain terms of the original sequence. Here
$$1 \le n_1 < n_2 < n_3 < \cdots < n_k < n_{k+1} < \cdots.$$
Example. Consider the sequence $x_n = (-1)^n$. One subsequence consists of the terms with even indices $x_{2n} = 1$. Another subsequence consists of the terms with odd indices $x_{2n+1} = -1$. Both of these subsequences converge (since they are constant) even though the original sequence does not converge. According to Fact 4 below this gives another proof that the original sequence does not converge.
Facts About Cauchy (and Other) Sequences of Real Numbers.
Fact 1. If a sequence of real numbers has a limit then it is a Cauchy sequence.
Fact 2. Cauchy sequences of real numbers are bounded.
Fact 3. Any bounded sequence of real numbers has a convergent subsequence.
Fact 4. For a Cauchy sequence of real numbers, if a subsequence converges to L, then the original sequence also converges
to L.
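Proof. Fact 1.
Suppose $\lim_{n \to \infty} x_n = L$. Given $\varepsilon > 0$, we know there is an $N \in Z^+$ so that $n \ge N$ implies $|x_n - L| < \varepsilon$. Similarly $m \ge N$ implies $|x_m - L| < \varepsilon$. Therefore, if $n, m \ge N$,
$$|x_n - x_m| = |x_n - L + L - x_m| \le |x_n - L| + |L - x_m| < 2\varepsilon. \qquad (12)$$
Here we have used the triangle inequality. It follows that our sequence is Cauchy. Replace $\varepsilon$ by $\frac{\varepsilon}{2}$ if you feel paranoid about the $2\varepsilon$ in formula (12).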
Fact 2.
Suppose that $\{x_n\}$ is a Cauchy sequence of real numbers. Given $\varepsilon = 1$, we know by definition that there is an integer $N_1$ so that $n \ge N_1$ implies $|x_n - x_{N_1}| < 1$. Use the triangle inequality to see that $n \ge N_1$ implies
$$|x_n| = |x_n - x_{N_1} + x_{N_1}| \le |x_n - x_{N_1}| + |x_{N_1}| < 1 + |x_{N_1}|.$$
This gives a bound on the sequence elements $x_n$ such that $n \ge N_1$. That means we have a bound on all but a finite number of sequence elements. It is then not hard to get a bound on the entire sequence. Such a bound is
$$\max\{|x_1|, \ldots, |x_{N_1 - 1}|, 1 + |x_{N_1}|\}.$$
Fact 3.
We need to show that any bounded sequence of real numbers has a convergent subsequence. We give the proof of Bryant in Yet Another Introduction to Analysis. There is also a proof in Lang, Undergraduate Analysis as a Corollary to the Bolzano-Weierstrass Theorem (p. 38).
Step 1. Any sequence has a subsequence which is either increasing or decreasing.
To prove this, we use the Spanish (or La Jolla) Hotel Argument. Consider a sequence of hotels placed on the real axis such that the nth hotel has height $x_n$, $n \in Z^+$. See Figure 26.
Figure 26: Picture of case A in the Spanish Hotel Argument. The indices $\{k_j\}$ correspond to hotels of strictly decreasing height so that an eyeball at the top of the hotel with label $k_n$ can see the ocean and palm tree at infinity for every $n \in Z^+$.
Note that if the sequence $\{x_n\}$ is not bounded above and below, you can easily find a subsequence that is either increasing or decreasing.
There are two possibilities.
Case A). There is an infinite sequence of hotels with unblocked views to the right in the direction of the sea at infinity. See Figure 26. This means that there is an infinite sequence of positive integers $k_1 < k_2 < k_3 < \cdots < k_n < \cdots$ such that from the top of the corresponding hotel, a person has an unblocked view to the right in the direction of the sea. If this is the case, then
$$x_{k_1} > x_{k_2} > x_{k_3} > \cdots > x_{k_n} > x_{k_{n+1}} > \cdots.$$
Thus we have an infinite strictly decreasing sequence of hotel heights.
If Case A) is false, we must be in Case B.
Case B). In this case, after a finite number of hotels, every hotel has a blocked view. Let the finite number of hotels with unblocked views be indexed by $k_j$, j = 1, ..., N. Then the next integer after that is $k_N + 1 = m_1$ with the property that there is a positive integer $m_2 > m_1$ such that $x_{m_2} \ge x_{m_1}$. This means hotel $m_2$ blocks the view of hotel $m_1$. Continue to obtain a subsequence which is increasing:
$$x_{m_1} \le x_{m_2} \le x_{m_3} \le \cdots \le x_{m_n} \le x_{m_{n+1}} \le \cdots.$$
See Figure 27.
Step 2. Convergence of Subsequence from Step 1.
To see the convergence, just recall that we are assuming in Fact 3 that our original sequence and thus any subsequence is bounded. So we just need to apply Fact 6 about limits. Any increasing or decreasing bounded sequence must converge.
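The notion of a hotel with an unblocked view can be illustrated on a finite list of heights. The sketch below is only an illustration of the definition used in Case A, not of the full argument: in a finite list the last hotel always has an unblocked view, so the dichotomy between Case A and Case B only makes sense for infinite sequences. The function name `unblocked_view_indices` and the sample heights are made up.

```python
def unblocked_view_indices(heights):
    """Indices k (0-based) with heights[k] > heights[j] for every j > k."""
    indices = []
    tallest_to_right = float("-inf")
    # scan from the sea (right end) back toward the first hotel
    for k in range(len(heights) - 1, -1, -1):
        if heights[k] > tallest_to_right:
            indices.append(k)
            tallest_to_right = heights[k]
    indices.reverse()
    return indices

heights = [3, 7, 2, 6, 6, 5, 1, 4]
views = unblocked_view_indices(heights)
print(views, [heights[k] for k in views])
```

The heights at the selected indices are strictly decreasing, exactly as in the displayed chain of inequalities for Case A.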
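Figure 27: Case B of the Spanish Hotel Argument. The sequence of indices $\{m_k\}$ corresponds to hotels of increasing height.
Fact 4.
Let $\{x_n\}$ be a Cauchy sequence with a convergent subsequence $\{x_{n_k}\}$ such that $\lim_{k \to \infty} x_{n_k} = L$. So given $\varepsilon > 0$, there is a K so that $k \ge K$ implies $|x_{n_k} - L| < \varepsilon$ and, since the sequence is Cauchy, there is an N so that $n, m \ge N$ implies $|x_n - x_m| < \varepsilon$. Given $n \ge N$, choose $k \ge K$ with $n_k \ge N$. Then
$$|x_n - L| \le |x_n - x_{n_k}| + |x_{n_k} - L| < 2\varepsilon.$$
Here we used the triangle inequality. It follows that the original sequence converges to L. Again you may want to replace $\varepsilon$ by $\frac{\varepsilon}{2}$.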
Proof of Theorem 24
Proof. We want to show that every Cauchy sequence of real numbers converges to a real number.
Fact 2 about Cauchy sequences says a Cauchy sequence is bounded.
Fact 3 about Cauchy sequences says a bounded sequence has a convergent subsequence.
Fact 4 about Cauchy sequences says that once you have a convergent subsequence the original sequence is forced to
converge to the same limit as the subsequence.
So we are done.
One can base calculus on the concept of an infinitesimal rather than on the idea of a limit. This is called "non-standard analysis." It was created by A. Robinson. R. Rucker, Infinity and the Mind, p. 93, says that it is simpler to believe in the infinitely small "dx" rather than to let $\Delta x$ approach 0. But, Rucker says: "So great is the average person's fear of infinity that to this day calculus all over the world is being taught as a study of limit processes instead of what it really is: infinitesimal analysis." There are a few calculus texts based on non-standard analysis: H.J. Keisler, Elementary Calculus and Henle and Kleinberg, Infinitesimal Calculus. I will not pursue this subject at all in these notes. It seems harder to deal with than limits since so few people have actually tried to understand it.
Others argue that the universe is finite. See Greenspan, Discrete Models, where it is said that "It is unfortunate that so many scientists have been conditioned to believe that $10^{30}$ particles can always be well approximated by an infinite number of points." Classical applied math. views the vibrating string as a continuum like R. Greenspan argues that we should perhaps replace the continuum with a large finite set of points. This replaces calculus with finite difference calculus or the finite element method. We will have nothing to say about that here, except to note that in the end usually one needs a computer to obtain an approximate solution to our applied math. problems and that leads us to replace derivatives with finite differences, for example.
Rucker, Infinity and the Mind, considers this question also. On p. 33, for example, he says: "The question of whether or not matter is infinitely divisible may never be decided. For whenever an allegedly minimal particle is exhibited, there will be those who claim that if a high enough energy were available, the particle could be decomposed; and whenever someone wishes to claim that matter is infinitely divisible, there will be some smallest known particle, which cannot be split."
There is also a very basic controversy in mathematics - that of constructivism. One aspect of the constructivist approach is to seek so-called "constructive proofs" which involve a new meaning for the mathematical word "or." Sets and numbers must be constructed. Constructive mathematicians seek a different approach to the completeness axiom and thus to the existence of limits. References are E. Bishop, Foundations of Constructive Analysis (where this course and graduate courses are done in a constructive manner) and Volume 39 of the Journal, Contemporary Mathematics, which was dedicated to Errett Bishop, including an article by Bishop titled "Schizophrenia in Contemporary Mathematics." I personally find that this constructive approach to the basics of the logic of our proofs does in fact lead my brain to so many twists and turns that schizophrenia might be a good description. I will say no more about it here.
Part IV
Limits of Functions
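Now we want to consider the limit of a function f(x) as x approaches a, denoted $\lim_{x \to a} f(x)$.
Definition 26 Suppose I is an open interval containing the point a and $f : I - \{a\} \to R$. Then we say L is the limit of f(x) (or f(x) converges to L) as x approaches a and write $\lim_{x \to a} f(x) = L$ iff
$$\forall\, \varepsilon > 0,\ \exists\, \delta > 0 \text{ such that } 0 < |x - a| < \delta \text{ implies } |f(x) - L| < \varepsilon. \qquad (13)$$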
Figure 28: The graph of a function y = f(x) is red. The definition of $\lim_{x \to a} f(x) = L$ says that given a positive $\varepsilon$ we can find a positive $\delta$ (depending on $\varepsilon$) so that for $x \neq a$ in the interval $(a - \delta, a + \delta)$, the graph of the function must lie in the blue box of height $2\varepsilon$ and width $2\delta$, except perhaps for (a, f(a)). In the picture f(a) is undefined and thus there is a hole in the graph at (a, L).
Again you need to memorize this definition. And, yes, it is a pretty horrific sentence full of quantifiers $\forall$, $\exists$. If you believe in the usual logic of mathematics, then you should be willing to write down the negation of this statement. If you are a constructivist, I do not know what you would do.
See Figure 28 for a picture of the definition of limit. Note that we do not assume that f(a) is defined. Thus $|x - a| = 0$ is excluded from consideration in the statement in formula (13) of the definition of limit. It is assumed in the definition that $\delta$ is small enough that $0 < |x - a| < \delta$ implies $x \in I - \{a\}$ and thus f(x) makes sense. If f(a) is defined that is O.K. too. It is not required that f(a) = L however. If $f(a) \neq L$, the point (a, f(a)) would be outside the little blue box for small enough $\varepsilon$ in Figure 28.
Note. We could weaken the hypotheses in the definition of limit. Most authors assume that a is an accumulation point of the set S where f is defined. This means that for every $\delta > 0$, there is a point $x \neq a$ such that $x \in S \cap (a - \delta, a + \delta)$. This ensures that there are points $x \neq a$ such that $0 < |x - a| < \delta$ and f(x) makes sense. See Apostol, Mathematical Analysis or Sagan, Advanced Calculus. Lang, Undergraduate Analysis, does not do this, nor does he assume that he is only taking points $x \neq a$ in $(a - \delta, a + \delta)$. This allows f(a) to have a bad definition (in which case you would not have a limit) or a to be an isolated point (i.e., any non-accumulation point) of the domain of f (in which case you would have a limit trivially). We give the more general definition of limit (with accumulation points) in Lectures, II. For now, our definition suffices.
Example 1. Suppose $f(x) = 3x - 1$, $x \in R$. Then $\lim_{x \to 2} (3x - 1) = 5$. Of course, you do not need the definition to compute this limit since f(x) is a continuous function and thus our limit is f(2). But we have not proved anything about continuous functions yet, nor even defined them. So we prove that our limit is correct.
Proof. $|3x - 1 - 5| = |3x - 6| = 3|x - 2| < \varepsilon$ if $|x - 2| < \frac{\varepsilon}{3} = \delta$.
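The choice $\delta = \varepsilon / 3$ in this proof can be spot-checked with exact arithmetic. In the sketch below the names `f` and `delta_for` are chosen for the illustration, and the sample points are arbitrary; a finite check like this only illustrates the estimate $|f(x) - 5| = 3|x - 2|$.

```python
from fractions import Fraction

def f(x):
    return 3 * x - 1

def delta_for(eps):
    """The delta chosen in the proof: delta = eps / 3."""
    return eps / 3

eps = Fraction(1, 100)
delta = delta_for(eps)
# sample points x with 0 < |x - 2| < delta, on both sides of 2
offsets = [Fraction(k, 50) * delta for k in range(1, 50)]
samples = [2 + t for t in offsets] + [2 - t for t in offsets]
print(all(abs(f(x) - 5) < eps for x in samples))
# at distance exactly delta the error equals eps, so no larger delta would work
print(abs(f(2 + delta) - 5) == eps)
```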
For an example of $\lim_{x \to a} f(x)$ where f(a) is not defined, look at what we will later call a derivative.
Example 2. $\lim_{x \to 1} \frac{x^2 - 1}{x - 1} = 2$.
Use the fact that $\frac{x^2 - 1}{x - 1} = x + 1$ for $x \neq 1$ to obtain the proof. We leave this to the reader. Note that the graph of the function $\frac{x^2 - 1}{x - 1}$ has a hole at the point (1, 2).
If you want, you can also consider right and left hand limits. You just replace the open interval $I$ in the definition of
limit above with a half open interval $(a-\delta, a)$ or $(a, a+\delta)$, for small positive $\delta$. For example consider the function known
as floor of $x$ = $\lfloor x \rfloor$ = the greatest integer $\leq x$. See Figure 29. Then we want to say
$\lim_{x \to 1,\ x < 1} \lfloor x \rfloor = \lim_{x \to 1^-} \lfloor x \rfloor = 0$ and $\lim_{x \to 1,\ x > 1} \lfloor x \rfloor = \lim_{x \to 1^+} \lfloor x \rfloor = 1$.
Before looking at more examples, it will help to know the basics about limits.
Properties of Limits.
In the following, we always assume our functions are defined on an open interval $I$ containing the point $a$, except perhaps
at $x = a$.
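Property 1) Sequential Definition of Limits. $\lim_{x \to a} f(x) = L$ iff for every sequence $\{x_n\}$ of points in $I \setminus \{a\}$ such that
$\lim_{n \to \infty} x_n = a$, we have $\lim_{n \to \infty} f(x_n) = L$.
Property 2) Limits are unique. If $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} f(x) = L'$, then $L = L'$.
Property 3) Limit of a sum is the sum of the limits. Suppose $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$.
Then $\lim_{x \to a} (f(x) + g(x)) = L + M$.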
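Property 4) Limit of a product is the product of the limits. Suppose $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$. Then
$\lim_{x \to a} f(x)g(x) = LM$.
Property 5) Limit of a quotient is the quotient of the limits. Suppose $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M \neq 0$. Then
$\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{M}$.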
Property 6) Limits preserve inequalities. Suppose $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$ and $f(x) \leq g(x)$, for all $x$ in an
open interval containing $a$, except perhaps when $x = a$. Then $L \leq M$.
Proof. Property 1). We leave this proof as an exercise.
Property 2). We leave this as an exercise. Imitate the analogous proof for limits of sequences.
Property 3). This proof is similar to that of the analogous fact for limits of sequences. We leave it as an exercise.
Property 4). We proceed as in the proof of the analogous fact for sequences. Note that
$|f(x)g(x) - LM| = |f(x)g(x) - f(x)M + f(x)M - LM|$
$\leq |f(x)g(x) - f(x)M| + |f(x)M - LM|$
$= |f(x)|\,|g(x) - M| + |f(x) - L|\,|M|$.
We need to bound $|f(x)|$ for $x$ close to $a$ in order to be able to make the first term in the last sum small. To do this
use the definition of limit with $\varepsilon = 1$. This says there is a $\delta_1 > 0$ such that $0 < |x-a| < \delta_1$ implies $|f(x) - L| < 1$. Since
$|f(x)| = |f(x) - L + L| \leq |f(x) - L| + |L|$, it follows that $0 < |x-a| < \delta_1$ implies $|f(x)| < 1 + |L|$. Thus $0 < |x-a| < \delta_1$
implies
$|f(x)g(x) - LM| \leq (1 + |L|)\,|g(x) - M| + |f(x) - L|\,|M|$. (14)
We know that $\exists\, \delta_2 > 0$ s.t.
$0 < |x-a| < \delta_2$ implies $|g(x) - M| < \frac{\varepsilon}{2(1 + |L|)}$. (15)
Similarly $\exists\, \delta_3 > 0$ s.t.
$0 < |x-a| < \delta_3$ implies $|f(x) - L| < \frac{\varepsilon}{2(1 + |M|)}$. (16)
Define $\delta = \min\{\delta_1, \delta_2, \delta_3\}$. Then $0 < |x-a| < \delta$ implies (combining inequalities (14), (15) and (16))
$|f(x)g(x) - LM| < (1 + |L|)\frac{\varepsilon}{2(1 + |L|)} + \frac{\varepsilon}{2(1 + |M|)}|M| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon$.
A shorter proof can be obtained using Property 1 and the corresponding fact for limits of sequences.
Property 5. By Property 4, it suffices to prove the special case that
$\lim_{x \to a} g(x) = M$, with $M \neq 0$, implies $\lim_{x \to a} \frac{1}{g(x)} = \frac{1}{M}$. (17)
Since dividing by 0 is a "no no," we need to show that for $x$ close enough to $a$, $g(x) \neq 0$. Since $M \neq 0$, we can take
$\varepsilon = \frac{|M|}{2}$. So $\exists\, \delta_1 > 0$ s.t. $0 < |x-a| < \delta_1$ implies
$\big|\,|g(x)| - |M|\,\big| \leq |g(x) - M| < \frac{|M|}{2}$.
(The first inequality here follows from the triangle inequality; i.e., $\big|\,|A| - |B|\,\big| \leq |A - B|$. The proof of this is an exercise.)
Therefore $-\frac{|M|}{2} < |g(x)| - |M| < \frac{|M|}{2}$, which implies, upon adding $|M|$, that
$|g(x)| > \frac{|M|}{2} > 0$, if $0 < |x-a| < \delta_1$. (18)
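Now we prove (17). To do this, we need to prove the following is $< \varepsilon$ when $x$ is close enough to $a$:
$\left| \frac{1}{g(x)} - \frac{1}{M} \right| = \left| \frac{M - g(x)}{M g(x)} \right|$.
From formula (18), we know $0 < |x-a| < \delta_1$ implies
$\left| \frac{M - g(x)}{M g(x)} \right| < \frac{|M - g(x)|}{|M|^2 / 2} = 2\,\frac{|M - g(x)|}{|M|^2}$.
This can be made $< \varepsilon$ since $\exists\, \delta < \delta_1$ s.t. $|M - g(x)| < \frac{\varepsilon}{2}|M|^2$ if $0 < |x-a| < \delta$. This completes the proof of (17).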
Property 6. Look at $h(x) = g(x) - f(x)$. Then $h(x) \geq 0$, for all $x$ in an open interval containing $a$, except perhaps when $x = a$.
We know from earlier properties that $\lim_{x \to a} h(x) = M - L \equiv K$. Thus it suffices to show $K \geq 0$. Assume $K < 0$ and deduce a
contradiction. Note first that for $x$ in our open interval with $x \neq a$ we have $|h(x) - K| \geq |K| = -K > 0$. See Figure 30 below.
But we know $\forall \varepsilon > 0$ $\exists \delta > 0$ s.t. $|h(x) - K| < \varepsilon$ for $0 < |x-a| < \delta$. This implies $|K| = -K \leq h(x) - K \leq |h(x) - K| < \varepsilon$,
$\forall \varepsilon > 0$. Lemma 21 says $|K| = 0$. This contradicts our hypothesis that $K < 0$ and we're done.
Figure 30: We plot $f(x) = x^4 + 1 \geq 0$ in red; $g(x) = -1$ in green. It should be clear that the distance between $f(x)$ and
$g(x)$; i.e., $|f(x) - (-1)| \geq 1$ for all $x$ and thus $f(x)$ cannot approach a negative number like $-1$ as $x \to a$.
Example 1. $\lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = 2x$.
Here we are computing the derivative of the function $f(x) = x^2$. This is the sort of limit everyone (e.g., Newton and
Leibniz) could do without knowing the precise definition of limit. The computation goes as follows:
$\frac{(x+h)^2 - x^2}{h} = \frac{x^2 + 2xh + h^2 - x^2}{h} = 2x + h$.
Taking limits as $h \to 0$ and using the properties of limits given above, we see that
$\lim_{h \to 0} \frac{(x+h)^2 - x^2}{h} = \lim_{h \to 0} (2x + h) = \lim_{h \to 0} 2x + \lim_{h \to 0} h = 2x$.
Here we have used the fact that the limit of a sum is the sum of the limits, the limit of a constant function is the constant,
$\lim_{h \to 0} h = 0$, and the limit of a product is the product of the limits. Note that $x$ is a constant during our calculation.
Example 2.
Define $f(x) = 1$ if $x$ is rational and $f(x) = 0$ if $x$ is irrational. Then $\lim_{x \to a} f(x)$ does not exist for any real number $a$. To
see this, you just have to note that any interval contains both rationals and irrationals. See the proof below. If $\lim_{x \to a} f(x) = L$,
we know $\exists \delta > 0$ s.t. $0 < |x-a| < \delta$ implies $|f(x) - L| < \frac{1}{2}$. But there are points $x$ such that $0 < |x-a| < \delta$ with $f(x) = 1$
and other points $u$ such that $0 < |u-a| < \delta$ with $f(u) = 0$. Then $1 = |f(x) - f(u)| \leq |f(x) - L| + |L - f(u)| < 1$. But
$1 < 1$ is impossible.
Theorem 27 Any open interval I contains both rational and irrational numbers.
Proof. Rationals in I.
It suffices to show that there is a rational number arbitrarily close to any real number. Given $\varepsilon > 0$ (our measure of
closeness), we know by Lemma 22 there is a positive integer $n$ such that $n > \frac{1}{\varepsilon}$. Therefore, it suffices to show that for any
real number $a$ there is a rational number $q$ such that $|a - q| \leq \frac{1}{n}$.
Case 1. $a$ is positive.
If $a > 0$, then, for $n$ as in the preceding paragraph, look at the set $S = \{k \in \mathbb{Z}^+ \mid na < k\}$. This set has a least element
$m$ by the well ordering axiom for the positive integers. This means $(m - 1) \leq na$. Therefore
$\frac{m}{n} - \frac{1}{n} \leq a < \frac{m}{n}$,
which says $\left| a - \frac{m}{n} \right| \leq \frac{1}{n} < \varepsilon$. Thus we have found a rational number within distance $\varepsilon$ of $a$.
Case 2. $a$ is negative.
In this case $-a$ is positive and we can use Case 1 to find a rational number $q$ so that $|-a - q| < \varepsilon$. It follows that
$|a - (-q)| < \varepsilon$. Of course $q$ rational implies $-q$ is rational.
Case 3. $a = 0$.
In this case, life is even easier as we have $\left| 0 - \frac{1}{n} \right| = \frac{1}{n} < \varepsilon$.
Irrationals in I.
We need to show that there is an irrational number arbitrarily close to any given rational number $q$. We know from the
preceding that we can choose a nonzero rational number $r$ such that $|r - q\sqrt{2}| < \varepsilon$. This implies $\left| \frac{r}{\sqrt{2}} - q \right| < \frac{\varepsilon}{\sqrt{2}} < \varepsilon$. Since $\frac{r}{\sqrt{2}}$ is
irrational (as otherwise $\frac{r}{\sqrt{2}} = u$ is rational and then so is $\sqrt{2} = \frac{r}{u}$, contradicting Theorem 18), we are done.
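The "Rationals in I" part of the proof is constructive, so it can be turned into a small program. The sketch below follows Cases 1 to 3 with exact rational arithmetic; the function name `rational_within` is an assumption made for illustration, and floating point inputs are only as exact as their binary representation.

```python
from fractions import Fraction
import math

def rational_within(a, eps):
    """Return a Fraction q with |a - q| < eps, following Cases 1-3 of the proof."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    n = math.floor(1 / eps) + 1        # a positive integer with n > 1/eps
    if a == 0:                         # Case 3
        return Fraction(1, n)
    if a < 0:                          # Case 2: reduce to the positive case
        return -rational_within(-a, eps)
    m = math.floor(n * a) + 1          # Case 1: least positive integer with n*a < m
    return Fraction(m, n)

q = rational_within(math.pi, 1e-3)
print(q, float(q))
```

Here `math.floor(n * a) + 1` plays the role of the least element of the set $S$, so no appeal to the well ordering axiom is needed in code.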
Before considering another example, we need a Lemma.
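Lemma 28 Squeeze Lemma. Suppose $f(x) \leq g(x) \leq h(x)$ for all $x$ in some open interval containing $a$. If $\lim_{x \to a} f(x) =
\lim_{x \to a} h(x) = L$, then $\lim_{x \to a} g(x) = L$.
Proof. We know from Property 6 above that if $\lim_{x \to a} g(x)$ exists, it must be both $\geq L$ and $\leq L$ and, therefore $= L$. But why
must the limit exist? Given $\varepsilon > 0$, $\exists \delta_1$ s.t. $0 < |x-a| < \delta_1$ implies $|f(x) - L| < \varepsilon$. Similarly $\exists \delta_2$ s.t. $0 < |x-a| < \delta_2$
implies $|h(x) - L| < \varepsilon$. It follows that if $\delta = \min\{\delta_1, \delta_2\}$, we have
$0 < |x-a| < \delta$ implies $-\varepsilon < f(x) - L \leq g(x) - L \leq h(x) - L < \varepsilon$.
Thus
$0 < |x-a| < \delta$ implies $|g(x) - L| < \varepsilon$.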
Example 3. $\lim_{x \to 0} \frac{\sin x}{x} = 1$.
First look at Figure 31. That may convince you that the formula is correct.
Figure 31: A plot of $\frac{\sin x}{x}$.
How are we going to prove this formula? Later we will see that this limit is the derivative of $\sin x$ at $x = 0$ and thus it
is $\cos 0 = 1$. Defining the sine and cosine as in your favorite calculus book, and measuring our angles in radians, we have
Figure 32 which implies that
$\frac{\sin x \cos x}{2} \leq \frac{x}{2} \leq \frac{\tan x}{2}$.
Multiply by $\frac{2}{\sin x}$ to see that
$\cos x \leq \frac{x}{\sin x} \leq \frac{1}{\cos x}$.
Thus $\frac{\sin x}{x}$ is squeezed between 2 functions approaching 1 as $x \to 0$ and by the Squeeze Lemma (which was Lemma 28), it
must approach 1 as well. Am I cheating by assuming that $\cos x$ is continuous at $x = 0$? We will say more about trigonometric
functions later.
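The squeeze $\cos x \leq \frac{x}{\sin x} \leq \frac{1}{\cos x}$ for small positive $x$ can be tabulated to watch both bounds close in on 1. This is a numerical illustration only; the helper name `squeeze_bounds` and the sample points are assumptions.

```python
import math

def squeeze_bounds(x):
    """Return (cos x, x / sin x, 1 / cos x) for 0 < x < pi/2."""
    return math.cos(x), x / math.sin(x), 1 / math.cos(x)

for x in [1.0, 0.5, 0.1, 0.01, 0.001]:
    lower, middle, upper = squeeze_bounds(x)
    print(f"x={x:<6} {lower:.8f} <= {middle:.8f} <= {upper:.8f}")
```

Since $\frac{\sin x}{x}$ is the reciprocal of the middle column, it is trapped between the same two bounds.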
Figure 32: A circle of radius 1 is drawn with an acute angle $x$ measured in radians; i.e., $0 \leq x < \frac{\pi}{2}$. By looking at the areas
of the triangle 0AC, the sector of the circle with angle $x$, the triangle 0BC, we see that $\frac{\sin x \cos x}{2} \leq \frac{x}{2} \leq \frac{\tan x}{2}$.
16
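Until now we have been dealing with limits of functions $f(x)$ as $x$ approaches a finite real number $a$. What happens if we let
$x$ approach $\infty$? This definition is similar to the definition of limit of a sequence. We can of course also let $x$ approach $-\infty$.
Definition 29 Suppose $f : (a, \infty) \to \mathbb{R}$ for some $a$. Then $\lim_{x \to \infty} f(x) = L$ iff $\forall \varepsilon > 0$ $\exists B > 0$ s.t. $x \geq B$ implies
$|f(x) - L| < \varepsilon$.
Exercise. Show that $\lim_{x \to \infty} f(x) = L \iff \lim_{y \to 0,\ y > 0} f\left(\frac{1}{y}\right) = L$.
Example 1. $\lim_{x \to \infty} \frac{1}{x} = 0$.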
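Proof. Given $\varepsilon > 0$, we need to find $B > 0$ s.t. $x \geq B$ implies $\left| \frac{1}{x} - 0 \right| = \frac{1}{|x|} < \varepsilon$. Since $x \geq B > 0$, $|x| = x$ and thus we
need $\frac{1}{x} < \varepsilon$. This is equivalent to $x > \frac{1}{\varepsilon}$. So we can take $B = 1 + \frac{1}{\varepsilon}$. Then $x \geq B$ implies $x > \frac{1}{\varepsilon}$ and we have the desired
inequality.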
It is possible to prove that these infinite limits have the usual properties of limits. We leave that to the reader to check.
You can use the exercise after the definition to do this. That is, make the change of variables $y = 1/x$ and let $y$ go to 0.
That will also simplify the following examples.
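Example 2. $\lim_{x \to \infty} \frac{1+x}{1-x} = -1$.
Proof. Note that $\frac{1+x}{1-x} = \frac{1+x}{1-x} \cdot \frac{1/x}{1/x} = \frac{\frac{1}{x} + 1}{\frac{1}{x} - 1} \to \frac{0+1}{0-1} = -1$, as $x \to \infty$.
Example 3. $\lim_{x \to -\infty} \frac{1}{x} = 0$.
Example 4. $\lim_{x \to -\infty} \frac{1+x}{1-x} = -1$.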
Part V
Continuous Functions
Intuitively continuous functions defined on an open interval in $\mathbb{R}$ are those whose graphs have no breaks or holes. They can
have jagged peaks and valleys though. See Figure 33.
Figure 33: On the upper left is a continuous function on an interval, while on the lower right is a discontinuous function.
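Definition 30 If $S \subset \mathbb{R}$, $f : S \to \mathbb{R}$ is continuous at $a \in S$ iff
$\forall \varepsilon > 0$ $\exists \delta > 0$ s.t. $|x - a| < \delta$ (with $x \in S$) implies $|f(x) - f(a)| < \varepsilon$. (19)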
Example 1. Polynomials $p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, ($a_n, a_{n-1}, \ldots, a_1, a_0 \in \mathbb{R}$) are continuous at all points
in $\mathbb{R}$.
Proof. This follows from Fact 1 below as well as the fact that constant functions $f(x) = c$ are easily seen to be continuous
everywhere as is the identity function $f(x) = x$ (exercise).
Example 2. Define $f(x) = \frac{\sin x}{x}$ for $x \neq 0$ and $f(x) = 1$ for $x = 0$. This function is also continuous at all points in $\mathbb{R}$.
Proof. The continuity at non-zero points is easy from Fact 1 below. At $x = 0$, however, one must use the definition of
continuity and Example 3 from the section on limits. See Figure 31.
Example 3. Define $f(x) = x \sin\frac{1}{x}$ for $x \neq 0$ and $f(x) = 0$ for $x = 0$. This function is also continuous at all points in $\mathbb{R}$. See Figure 34 below.
The unusual aspect of this function is that there are infinitely many peaks and valleys on the interval $[0, 1]$. Thus you can
not really draw the graph in a finite amount of time.
Proof. Again the continuity at non-zero points is easy from Facts 1 and 2 below. But at $x = 0$ one must use the definition
of continuity and the fact that $|\sin \theta| \leq 1$ for all angles $\theta$. Thus $\left| x \sin\frac{1}{x} \right| \leq |x|$ and we can take $\delta = \varepsilon$ in the definition of
continuity at 0.
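The choice $\delta = \varepsilon$ for the function equal to $x \sin\frac{1}{x}$ away from 0 and to 0 at $x = 0$ can be spot-checked by sampling points near 0. This is a sketch rather than a proof; the helper name `holds_near_zero`, the number of trials and the seed are assumptions.

```python
import math
import random

def f(x):
    # x * sin(1/x) away from 0, and 0 at x = 0
    return x * math.sin(1 / x) if x != 0 else 0.0

def holds_near_zero(eps, trials=10_000, seed=1):
    """Test |f(x) - f(0)| < eps for sampled x with |x| < delta, where delta = eps."""
    rng = random.Random(seed)
    delta = eps
    for _ in range(trials):
        x = rng.uniform(-delta, delta)
        if abs(x) >= delta:        # keep only points strictly inside the interval
            continue
        if not abs(f(x) - f(0)) < eps:
            return False
    return True

print(holds_near_zero(1.0), holds_near_zero(1e-3))
```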
Figure 34: A graph of $x \sin\frac{1}{x}$. There are infinitely many wiggles in this curve near 0 although the function is everywhere
continuous.
Facts About Continuous Functions.
Fact 1) Sum, Product, Quotient of Continuous Functions are Continuous. Suppose $f, g : S \to \mathbb{R}$ are both
continuous at $a \in S$. Then the sum $f + g$ and product $fg$ are also continuous at $a$. If $g(a) \neq 0$, then the quotient $\frac{f}{g}$ is also
continuous at $a$.
Fact 2) Composite of Continuous Functions is Continuous. Suppose $f : S \to T$ and $g : T \to \mathbb{R}$ for subsets $S, T$
of $\mathbb{R}$. If $a \in S$, then $b = f(a) \in T$. If $f$ is continuous at $a$ and $g$ is continuous at $b = f(a)$, then the composite function
$g \circ f : S \to \mathbb{R}$ is continuous at $a$. Recall that $(g \circ f)(x) = g(f(x))$.
Proof. Fact 1) These facts follow from the corresponding facts about limits.
Fact 2) Given $\varepsilon > 0$ we know there is $\delta > 0$ s.t. $|y - b| < \delta$ implies $|g(y) - g(b)| < \varepsilon$. For this very $\delta$ we know there is a
$\delta_1 > 0$ s.t. $|x - a| < \delta_1$ implies $|f(x) - f(a)| < \delta$. Putting these 2 sentences together gives (since $b = f(a)$)
$|x - a| < \delta_1$ implies $|f(x) - f(a)| < \delta$ which implies $|g(f(x)) - g(b)| < \varepsilon$.
The most important properties of a continuous function on a closed finite interval $[a, b]$ are the Intermediate Value Theorem
and the Weierstrass Theorem on the existence of maxima and minima. We state these 2 theorems first and then prove them.
Theorem 31 Intermediate Value Theorem. Suppose $f : [a, b] \to \mathbb{R}$ is continuous. Then for every $\gamma$ between $f(a)$ and
$f(b)$ there exists a point $c \in [a, b]$ such that $f(c) = \gamma$.
Roughly this says that the graph of $f$ has no breaks or holes. It must cross every horizontal line $y = \gamma$ if $\gamma$ is between
$f(a)$ and $f(b)$. See Figure 35.
Figure 35: Assume $f$ is continuous on the closed finite interval $[a, b]$. The intermediate value theorem says that for any $\gamma$
between $f(a)$ and $f(b)$ there is a point where the horizontal line $y = \gamma$ intersects the graph of $y = f(x)$. We label the point $(c, \gamma)$ on our
graph.
Theorem 32 Weierstrass Theorem on the Existence of Maxima and Minima. Suppose $f : [a, b] \to \mathbb{R}$ is
continuous. Then there exists a point $c \in [a, b]$ such that $f(c)$ is the minimum value of $f$ on $[a, b]$; i.e., $f(c) \leq f(x)$
$\forall x \in [a, b]$. We write $f(c) = \min\{f(x) \mid x \in [a, b]\}$. Similarly there exists a point $d \in [a, b]$ such that $f(d)$ is the maximum
value of $f$ on $[a, b]$; i.e., $f(d) \geq f(x)$ $\forall x \in [a, b]$. Write $f(d) = \max\{f(x) \mid x \in [a, b]\}$.
The Weierstrass theorem assures the existence of solutions to max-min problems for continuous functions on closed finite
intervals. If you drop either the hypothesis that the interval is closed and finite or the hypothesis that the function is
continuous on the interval, then the conclusion may be false. Ultimately it is possible to replace the closed interval with an
arbitrary compact set (see Lang, Undergraduate Analysis, for the definition) such as the Cantor dust (see Figure 36) obtained
by repeatedly removing middle thirds of intervals in $[0, 1]$. We will see this example again later in the course. The actual
Cantor set is invisible. We can only show pictures of approximations to the set. The Cantor set is a fractal. We will say
more about fractals later. See Falconer, Fractal Geometry for more information on fractals.
Figure 36: 5 approximations to the Cantor dust, each approximation removing middle thirds of the intervals in the approximation above.
You may question how useful it is to know that a max or min exists without knowing how to find it. Sometimes existence
does tell you everything. Of course once we have derivatives we will have more information on the location of a max or min.
The following example says BEWARE that you know your function and interval satisfy the Weierstrass hypotheses.
Example. Consider the function $f(x) = \frac{1}{x}$, for $x \in (0, 1]$. This function has no maximum value even though it is
continuous on the interval $(0, 1]$. This is not a closed interval of course.
Given a function like $f(x) = x^5 - 3x + 129$, you probably know (at least if you have taken numerical analysis) there are
many ways to solve $f(c) = 7$ approximately. Programs like Mathematica, Matlab, Scientific Workplace do these things with
amazing speed. Scientific Workplace tells me the 5 solutions are approximately
$\{2.1044 + 1.5040i,\ 2.1044 - 1.5040i,\ -0.78088 - 2.5059i,\ -0.78088 + 2.5059i,\ -2.6470\}$.
Only the last root is real. That is the one we are talking about here. Yeah, this is real analysis not complex. So we ignore
4 out of 5 roots. In Figure 37 we plot the function $f(x) = x^5 - 3x + 129$ and the line $y = 7$ on the interval $[-3, 1]$. The
intermediate value theorem tells us that the root $c$ of $f(c) = 7$ exists since $f(-3) = -105$ and $f(1) = 127$.
In what follows we give a numerical analyst's proof of the Intermediate Value Theorem (from V. Bryant, Yet Another
Introduction to Analysis). It tells you how to approximate a point $c \in [a, b]$ with $f(c) = \gamma$.
Proof of the Intermediate Value Theorem by the Bisection Method. This is a proof by induction.
Step 1. Set $r_1 = a$ and $s_1 = b$. Assume $f(r_1) < f(s_1)$. Then $f(r_1) < \gamma < f(s_1)$. Define $m_1 = \frac{r_1 + s_1}{2}$ = the midpoint
of the interval $[r_1, s_1]$. To obtain the next subinterval $[r_2, s_2]$, there are 3 cases according to the location of $f(m_1)$. See
Figure 38.
Case a. If $f(m_1) < \gamma$, set $r_2 = m_1$ and $s_2 = s_1$.
Case b. If $f(m_1) > \gamma$, set $r_2 = r_1$ and $s_2 = m_1$.
Case c. $f(m_1) = \gamma$ and we are done as $m_1 = c$.
Induction Step. Assume that $[r_j, s_j]$ have been found for $j = 1, 2, \ldots, n$. Define $[r_{n+1}, s_{n+1}]$ as follows. Look at the
midpoint $m_n = \frac{r_n + s_n}{2}$. Again consider 3 cases according to the location of $f(m_n)$.
Case a. If $f(m_n) < \gamma$, set $r_{n+1} = m_n$ and $s_{n+1} = s_n$.
Case b. If $f(m_n) > \gamma$, set $r_{n+1} = r_n$ and $s_{n+1} = m_n$.
Case c. $f(m_n) = \gamma$ and we are done as $m_n = c$.
Assuming we are never in Case c (when we would be done after a finite number of steps), we create 2 infinite sequences
$\{r_n\}$ and $\{s_n\}$ with the following properties:
$a = r_1 \leq r_2 \leq \cdots \leq r_n \leq r_{n+1} \leq s_{n+1} \leq s_n \leq \cdots \leq s_2 \leq s_1 = b$
and
$s_n - r_n = \frac{b - a}{2^{n-1}}$.
Figure 37: The graph of $f(x) = x^5 - 3x + 129$ and that of the line $y = 7$. The x-coordinate of the intersection of the 2
curves is approximately $-2.647$.
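Claim. $\lim_{n \to \infty} r_n = \lim_{n \to \infty} s_n = c$ and $f(c) = \gamma$.
Proof of Claim. The sequence $\{r_n\}$ is increasing and bounded. Thus it must have a limit which we will call $c$, using Fact
6 in the section on facts about limits of sequences. Similarly $\{s_n\}$ is decreasing and bounded and must have a limit which
we will call $c'$. We also know that
$s_n - r_n = \frac{b - a}{2^{n-1}} \to 0$ as $n \to \infty$.
It follows that $\lim_{n \to \infty} (s_n - r_n) = c' - c = 0$.
We know also that
$f(r_n) \leq \gamma \leq f(s_n)$.
Since $f$ is continuous, we have
$f(c) = \lim_{n \to \infty} f(r_n) \leq \gamma \leq \lim_{n \to \infty} f(s_n) = f(c') = f(c)$.
Thus $f(c) = \gamma$.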
$f(x_{n_k}) > n_k \geq k$, for all $k \in \mathbb{Z}^+$. This implies that $\lim_{k \to \infty} f(x_{n_k}) = \infty$. But this contradicts the continuity of the function $f$
which implies that $\lim_{k \to \infty} f(x_{n_k}) = f(c)$, which is a finite real number.
Figure 38: Here we visualize the 1st step in the bisection method with the 1st interval being $[a, b] = [r_1, s_1] = [-3, 1]$. We
have $f(x) = x^5 - 3x + 129$ and seek to find $c$ with $f(c) = 7$. This is the polynomial from Figure 37. The midpoint is
$m_1 = -1$. Then clearly $f(m_1) > \gamma = 7$. So the next interval is $[r_2, s_2] = [-3, -1]$. Of course you can see from the graph
approximately where the curve crosses the line $y = 7$. And we noted above that an approximation to $c$ with $f(c) = 7$ is
$-2.6470$. The reader should find the next interval $[r_3, s_3]$, or even enough intervals to get our approximation to $c$.
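The bisection steps (Cases a, b, c) translate almost line for line into a short program. The sketch below applies them to $f(x) = x^5 - 3x + 129$ with target value 7 on the interval $[-3, 1]$; the function name `bisect_ivt`, the tolerance and the step cap are assumptions made for illustration.

```python
def bisect_ivt(f, a, b, gamma, tol=1e-10, max_steps=200):
    """Approximate c in [a, b] with f(c) = gamma, assuming f(a) < gamma < f(b)."""
    r, s = a, b
    if not (f(r) < gamma < f(s)):
        raise ValueError("need f(a) < gamma < f(b)")
    intervals = [(r, s)]              # the nested intervals [r_n, s_n]
    for _ in range(max_steps):
        if s - r <= tol:
            break
        m = (r + s) / 2               # midpoint m_n
        fm = f(m)
        if fm < gamma:                # Case a: keep the right half
            r = m
        elif fm > gamma:              # Case b: keep the left half
            s = m
        else:                         # Case c: hit gamma exactly
            return m, intervals
        intervals.append((r, s))
    return (r + s) / 2, intervals

def f(x):
    return x**5 - 3*x + 129

c, intervals = bisect_ivt(f, -3.0, 1.0, 7.0)
print(round(c, 4))
print(intervals[:3])
```

Each pass halves the interval, so the length after $n$ steps is $\frac{b-a}{2^n}$ and a few dozen steps already reach the tolerance used here.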
Step 3. l.u.b. $\{f(x) \mid x \in [a, b]\} = \beta = \max\{f(x) \mid x \in [a, b]\} = f(c)$ for some $c \in [a, b]$.
By Step 2 and the completeness axiom we know the least upper bound $\beta$ exists. But why does $\beta = f(c)$ for some
$c \in [a, b]$? By the definition of least upper bound, there is a point $u_n \in [a, b]$ such that
$\beta \geq f(u_n) > \beta - \frac{1}{n}$, for every $n \in \mathbb{Z}^+$.
Once more by Fact 3 about Cauchy (and other) sequences, we know $\{u_n\}$ has a convergent subsequence $\{u_{n_k}\}$ with
$\lim_{k \to \infty} u_{n_k} = c \in [a, b]$. We want to show that $f(c) = \beta$. To do this, note that
$\beta \geq f(u_{n_k}) > \beta - \frac{1}{n_k} \geq \beta - \frac{1}{k}$ for every $k \in \mathbb{Z}^+$.
Therefore by the Squeeze Lemma (which was Lemma 28) and the continuity of $f$,
$\beta = \lim_{k \to \infty} f(u_{n_k}) = f(\lim_{k \to \infty} u_{n_k}) = f(c)$.
This completes the proof of the Weierstrass Theorem on Existence of Maxima. We leave minima to the reader. It
amounts to replacing $f$ by $-f$. You should not have to do the entire proof over again.
Part VI
Derivatives
17
Examples
Next we finally get to do some calculus (if by calculus, you mean derivatives and integrals only). We start with derivatives.
It would be hard to do applied mathematics without derivatives. Newton invented calculus (along with Leibniz) to
formulate Newton's 3 laws of motion such as force = mass × acceleration (1666). Of course acceleration is a second derivative.
See E.T. Bell, Men of Mathematics for some of this story. The derivative can be used to represent all sorts of instantaneous
rates of change - not just that of distance with respect to time. Thus one finds differential equations in physics, chemistry,
economics, biology, ecology, weather modeling.
For example, the predator-prey equations describe the evolution of 2 interacting species such as cats and mice on a
desert island. If $u(t)$ = number of mice at time $t$, $v(t)$ = number of cats at time $t$, the predator-prey equations say:
$u' = (a - bv - mu)u$
$v' = (cu - d - nv)v$,
where $a, b, c, d, m, n$ are constants. References for these equations include H. Kocak, Differential and Difference Equations
through Computer Experiment; or J. M. Maynard Smith, Mathematical Ideas in Biology. See K. Devlin, Mathematics: The
Science of Patterns, Chapter 3, for more examples.
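To get a feel for the predator-prey equations $u' = (a - bv - mu)u$, $v' = (cu - d - nv)v$, one can step them forward with the simplest numerical scheme, Euler's method. This is a rough sketch only: the function names, the step size and every parameter value below are hypothetical choices for illustration and do not come from the references above.

```python
def predator_prey_step(u, v, dt, a, b, c, d, m, n):
    """One Euler step for u' = (a - b v - m u) u, v' = (c u - d - n v) v."""
    du = (a - b * v - m * u) * u
    dv = (c * u - d - n * v) * v
    return u + dt * du, v + dt * dv

def simulate(u0, v0, dt, steps, **params):
    """Return the list of (mice, cats) pairs visited by repeated Euler steps."""
    u, v = u0, v0
    history = [(u, v)]
    for _ in range(steps):
        u, v = predator_prey_step(u, v, dt, **params)
        history.append((u, v))
    return history

# Hypothetical constants, chosen only so the numbers are easy to follow.
params = dict(a=1.0, b=0.5, c=0.25, d=1.0, m=0.0, n=0.0)
for u, v in simulate(1.0, 1.0, 0.1, 5, **params):
    print(round(u, 4), round(v, 4))
```

With $m = n = 0$ the populations $u = d/c$, $v = a/b$ make both right-hand sides vanish, so a run started there stays put; that gives a quick sanity check on the code.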
The difference between the graph of an everywhere differentiable function and a merely everywhere continuous function
is quite visible. The more derivatives a function has the smoother the graph.
Example 1. An infinitely differentiable function: the Gaussian or normal density.
Consider the function $G(x) = \frac{1}{\sqrt{\pi}} e^{-x^2}$ which is plotted in Figure 39. The nth derivative $G^{(n)}(x)$ exists for every
$n = 1, 2, 3, \ldots$ and every real point $x$. This graph is as smooth as they come.
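Figure 39: The Gaussian $G(x) = \frac{1}{\sqrt{\pi}} e^{-x^2}$.
Example 2. A continuous function that is nowhere differentiable: the Weierstrass function.
$f(x) = \sum_{k=0}^{\infty} \lambda^{(s-2)k} \sin(\lambda^k x)$, for $\lambda > 1$ and $1 < s < 2$.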
We will prove later (in Lectures, II) that $f(x)$ is continuous. This is an easy consequence of the Weierstrass M-test for
uniform convergence of series of functions. However it can be shown that the derivative $f'(x)$ does not exist at any point $x$.
We will give exercises on this in a final exam.
Figures 41, 42, and 43 show graphs of approximations of such functions on the interval $[0, .5]$ (in which the infinite series
is cut off at some finite number of terms). It turns out that the graph of the Weierstrass function is a fractal having box
dimension $s$. We will define box dimension in a final exam. See Falconer, loc. cit., if you feel impatient. A smooth curve
would have box dimension 1. The closer to 2 the dimension gets, the more the curve fills the 2D picture. In Figure 41 we take
$\lambda = 1.5$, $s = 1.7$, and cut off the sum at $k = 3000$. In Figure 42 we take $\lambda = 1.5$, $s = 1.9$, and cut off the sum at $k = 3000$. In
Figure 43 we take $\lambda = 5$, $s = 1.9$, and cut off the sum at $k = 1000$.
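Partial sums of the Weierstrass series $\sum_k \lambda^{(s-2)k} \sin(\lambda^k x)$ are easy to evaluate directly. The sketch below is an assumption-laden illustration: the name `weierstrass_partial` is hypothetical, and the number of terms is kept in the low hundreds because in double precision `lam ** k` overflows long before $k = 3000$, while the coefficients $\lambda^{(s-2)k}$ have already dropped below rounding error.

```python
import math

def weierstrass_partial(x, lam, s, terms):
    """Partial sum over k = 0..terms of lam**((s - 2) * k) * sin(lam**k * x)."""
    total = 0.0
    for k in range(terms + 1):
        total += lam ** ((s - 2) * k) * math.sin(lam ** k * x)
    return total

# Sample the approximation with lam = 1.5, s = 1.7 at a few points of [0, 0.5].
for x in [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]:
    print(x, weierstrass_partial(x, 1.5, 1.7, 200))
```

Since $|\sin| \leq 1$, every partial sum is bounded by the geometric series $\frac{1}{1 - \lambda^{s-2}}$, which is the bound the Weierstrass M-test uses.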
Figure 41: $\sum_{k=0}^{3000} 1.5^{(1.7-2)k} \sin(1.5^k x)$.
Figure 42: $\sum_{k=0}^{3000} 1.5^{(1.9-2)k} \sin(1.5^k x)$.
Figure 43: $\sum_{k=0}^{1000} 5^{(1.9-2)k} \sin(5^k t)$.
18
Definition of Derivative
The derivative $f'(x)$ can be thought of geometrically as the slope of the tangent line to the curve $y = f(x)$ at the point
$(x, f(x))$. But what does that really mean? It means that you take limits of slopes of secant lines; i.e., lines through points
$(x, f(x))$ and $(x + h, f(x + h))$ for small values of $|h|$. See Figure 44.
Figure 44: The derivative of $f(x)$ at the point $x$ is the limit of the slopes $\frac{\Delta y}{\Delta x} = \frac{f(x + \Delta x) - f(x)}{\Delta x}$ of the secant lines as $\Delta x \to 0$.
Here by secant line, we mean the line connecting $(x, f(x))$ and $(x + \Delta x, f(x + \Delta x))$. The derivative $f'(x)$ is the slope of the
tangent to the curve $y = f(x)$ at the point $(x, f(x))$.
If instead $t$ = time and $y = f(t)$ = distance from origin at time $t$, then $\frac{\Delta y}{\Delta t} = \frac{f(t + \Delta t) - f(t)}{\Delta t}$ = average velocity over the
interval from $t$ to $t + \Delta t$. The instantaneous velocity at time $t$ is the limit of $\frac{\Delta y}{\Delta t}$ as $\Delta t$ approaches 0.
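More precisely, suppose the function $f$ is defined on an open interval $I$ containing the point $c$.
Definition 34 The function $f$ is differentiable at $c$ if the limit $f'(c) = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}$ exists.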
Example 1. Consider $f(x) = x^3 - 2x + 1$. This is differentiable at any point $x$, as are all polynomials. Let's compute the
derivative.
$\lim_{h \to 0} \frac{f(x + h) - f(x)}{h} = \lim_{h \to 0} \frac{(x + h)^3 - 2(x + h) + 1 - (x^3 - 2x + 1)}{h}$
$= \lim_{h \to 0} \frac{x^3 + 3x^2 h + 3xh^2 + h^3 - 2(x + h) + 1 - (x^3 - 2x + 1)}{h}$
$= \lim_{h \to 0} \frac{3x^2 h + 3xh^2 + h^3 - 2h}{h} = 3x^2 - 2$.
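For $f(x) = x^3 - 2x + 1$ the difference quotient can be compared with $3x^2 - 2$ exactly, using rational arithmetic so that no rounding enters. The sketch below is an illustration; the helper names and the sample point $x = 3/2$ are assumptions. Algebraically the gap between the two is $3xh + h^2$, which visibly goes to 0 with $h$.

```python
from fractions import Fraction

def f(x):
    return x**3 - 2*x + 1

def difference_quotient(x, h):
    return (f(x + h) - f(x)) / h

def derivative(x):
    return 3*x**2 - 2

x = Fraction(3, 2)
for h in [Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)]:
    gap = difference_quotient(x, h) - derivative(x)
    print(h, float(gap))   # the gap equals 3*x*h + h**2
```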
Example 2. Define the function $g(x) = 1$ if $x \in \mathbb{Q}$ and $g(x) = 0$ if $x \notin \mathbb{Q}$. The graph looks like 2 horizontal lines even though one line
has way fewer points, as the rationals are denumerable while the irrationals are not. We saw earlier that this is not a
continuous function at any point $x$ since any interval on the real line contains both rationals and irrationals. See the section
on properties of limits. Later we will show that this implies the derivative cannot exist at any point. Thus this function is
nowhere differentiable.
It is also possible to define 1-sided derivatives of a function $f$ at a point $c$. For the right-hand derivative at $c$, you only
need $f$ to be defined on $[c, c + \delta)$ for some $\delta > 0$.
Definition 35 The right-hand derivative is defined by:
$f'_+(c) = \lim_{h \to 0,\ h > 0} \frac{f(c + h) - f(c)}{h}$.
We leave it to the reader to define the left-hand derivative. The right- and left-hand derivatives must be equal at the
point $c$ in order for $f'(c)$, the 2-sided derivative, to exist.
Example. The absolute value.
The function $f(x) = |x|$ has right-hand derivative $f'_+(0) = 1$, while the left-hand
derivative $f'_-(0) = -1$. Thus, the absolute value function does not have a 2-sided derivative at 0. To see the right-hand formula,
note that
$f'_+(0) = \lim_{h \to 0,\ h > 0} \frac{|0 + h| - |0|}{h} = \lim_{h \to 0,\ h > 0} \frac{|h|}{h} = \lim_{h \to 0,\ h > 0} \frac{h}{h} = 1$.
19
Definition 36 Suppose $f$ is defined on the open interval $I$. If $c \in I$, then $f$ is differentiable at $c$ if and only if there is a
real number $L$ and a function $\varepsilon$ defined on an open interval containing 0 such that
$f(c + h) = f(c) + Lh + \varepsilon(h)$, and $\lim_{h \to 0} \frac{\varepsilon(h)}{h} = 0$. (20)
In formula (20), $L = f'(c)$. Note that the first 2 terms on the right ($f(c) + Lh$) are a linear function of $h$ (holding $c$
fixed). We view $\varepsilon(h)$ as a 2nd order term since $\frac{\varepsilon(h)}{h}$ must approach 0 with $h$. This 2nd definition of derivative is the
beginning of a Taylor expansion at $c$. See Figure 45 for an illustration of formula (20).
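Set
$\eta(h) = \frac{\varepsilon(h)}{h}$ for $h \neq 0$, and $\eta(h) = 0$ for $h = 0$. (21)
Then $\eta$ is continuous at 0 and $\varepsilon(h) = h\,\eta(h)$. Thus formula (20) can be written $f(c + h) = f(c) + Lh + h\,\eta(h)$, with a function $\eta$
that is continuous at 0.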
Figure 45: The linear approximation to $f(x + h)$ (as a function of $h$ holding $x$ fixed) is shown here as $f(x) + f'(x)h$.
The difference with $f(x + h)$ is the function $\varepsilon(h)$. One has $f(x + h) - (f(x) + f'(x)h) = \varepsilon(h)$.
We will need this 2nd definition of derivative in proving the chain rule. It is also a useful way to get a rough linear
approximation to a non-linear function. Before showing the 2 definitions of derivative are the same, let's do an example.
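Example. Take $f(x) = \sqrt{x}$. Then
$\frac{\sqrt{x+h} - \sqrt{x}}{h} = \frac{\sqrt{x+h} - \sqrt{x}}{h} \cdot \frac{\sqrt{x+h} + \sqrt{x}}{\sqrt{x+h} + \sqrt{x}} = \frac{x + h - x}{h\left(\sqrt{x+h} + \sqrt{x}\right)} = \frac{1}{\sqrt{x+h} + \sqrt{x}} \to \frac{1}{2\sqrt{x}}$, as $h \to 0$.
At $x = 4$ the derivative is $\frac{1}{4}$.
So we get $\sqrt{5} \approx \sqrt{4} + \frac{1}{4} \cdot 1 = 2.25$.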
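The linear approximation $f(c) + f'(c)h$ for the square root at $c = 4$ with $h = 1$ can be compared with the true value of $\sqrt{5}$. This is a small sketch; the helper name `linear_approx_sqrt` is an assumption, and it uses the derivative $\frac{1}{2\sqrt{c}}$ of the square root function.

```python
import math

def linear_approx_sqrt(c, h):
    """f(c) + f'(c) * h for f(x) = sqrt(x), using f'(c) = 1 / (2 * sqrt(c))."""
    return math.sqrt(c) + h / (2 * math.sqrt(c))

approx = linear_approx_sqrt(4, 1)
print(approx, math.sqrt(5), approx - math.sqrt(5))
```

The error is about 0.014 here; it shrinks roughly like $h^2$ when $h$ is made smaller, which is what "2nd order term" means.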
Theorem 37 The usual definition 34 of derivative is equivalent to the linear approximation definition 36 with $f'(c) = L$.
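Proof. Definition 34 implies 36. Suppose $L = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h}$ exists
and set $\varepsilon(h) = f(c + h) - f(c) - Lh$.
Then
$\lim_{h \to 0} \frac{\varepsilon(h)}{h} = \lim_{h \to 0} \frac{f(c + h) - f(c) - Lh}{h} = \lim_{h \to 0} \left( \frac{f(c + h) - f(c)}{h} - L \right) = \lim_{h \to 0} \frac{f(c + h) - f(c)}{h} - L = 0$,
which is formula (20).
Definition 36 implies 34. Formula (20) gives $\frac{f(c + h) - f(c)}{h} = L + \frac{\varepsilon(h)}{h} \to L$ as $h \to 0$.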
20
To make our lives easier when trying to find the formula for the derivative of some function, it will help to know the properties
of the derivative.
Just the Facts about Derivatives.
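Fact 1) Differentiability implies Continuity. Suppose $f$ is differentiable at $c$. Then $f$ must be continuous at $c$.
Fact 2) Derivative is Linear. Suppose $f$ and $g$ are differentiable at $c$. And suppose $k$ is a real number. Then $f + g$ and
$kf$ are differentiable at $c$ and
$(f + g)'(c) = f'(c) + g'(c)$; $(kf)'(c) = kf'(c)$.
Fact 3) Product Rule. Suppose $f$ and $g$ are differentiable at $c$. Then so is the product $fg$ and
$(fg)'(c) = f'(c)g(c) + f(c)g'(c)$.
Fact 4) Quotient Rule. Suppose $f$ and $g$ are differentiable at $c$ and $g(c) \neq 0$. Then $\frac{f}{g}$ is differentiable at $c$
and $\left( \frac{f}{g} \right)'(c) = \frac{f'(c)g(c) - f(c)g'(c)}{g(c)^2}$.
Fact 5) Chain Rule. Suppose $f$ is differentiable at $c$ and $g$ is differentiable at $f(c)$. Then the composite $(g \circ f)(x) =
g(f(x))$ is differentiable at $c$ and
$(g \circ f)'(c) = g'(f(c))\,f'(c)$.
Proof. Fact 1) By Definition 36, $f(c + h) = f(c) + f'(c)h + \varepsilon(h)$, where
$\frac{\varepsilon(h)}{h} \to 0$ as $h \to 0$.
To finish the proof, you just need to convince yourself that $\varepsilon(h) \to 0$ as $h \to 0$. We saw this after Formula (21) for example.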
Fact 2) I leave these proofs to you as Exercises.
Fact 3) To show the product rule, we use the trick we needed in the proof of the limit of a product result, i.e., we add and
subtract a term in between the 2 things in the numerator of the difference quotient for the derivative:
$\frac{f(c + h)g(c + h) - f(c)g(c)}{h} = \frac{f(c + h)g(c + h) - f(c + h)g(c) + f(c + h)g(c) - f(c)g(c)}{h}$
$= f(c + h)\,\frac{g(c + h) - g(c)}{h} + \frac{f(c + h) - f(c)}{h}\,g(c)$.
Using Fact 1 and properties of limits, we see that this last mess approaches $f(c)g'(c) + f'(c)g(c)$ as $h \to 0$.
Fact 4) To prove the quotient rule it suffices to do the case $f(x) = 1$, $\forall x$ in the domain of $f$ and $g$, by the product rule. To
do this, look at
$\frac{\frac{1}{g(c + h)} - \frac{1}{g(c)}}{h} = \frac{1}{h} \cdot \frac{g(c) - g(c + h)}{g(c)g(c + h)} = \frac{1}{g(c)g(c + h)} \cdot \frac{g(c) - g(c + h)}{h}$.
Since $g$ is non-zero in an open interval containing $c$ (because $g(c) \neq 0$ and $g$ is continuous at $c$ by Fact 1), the denominator
is not vanishing for small enough values of $h$. Since $g$ is continuous at $c$ by Fact 1, we see that the last mess approaches
$\frac{1}{g(c)^2}\,(-g'(c))$ as $h \to 0$.
Fact 5) It is easy to convince ourselves of the formula by writing
$\frac{\Delta y}{\Delta x} = \frac{\Delta y}{\Delta u} \cdot \frac{\Delta u}{\Delta x}$.
Here $u = f(x)$, $\Delta u = f(x + \Delta x) - f(x)$ and $\Delta y = g(u + \Delta u) - g(u)$. The problem with this "proof" is that $\Delta u$ might vanish at lots
of points as $\Delta x \to 0$. Then you would be dividing by 0 (which is certainly a real no-no). So we need to devise a proof that
avoids this pitfall. That is really why we like the 2nd definition of the derivative using the function $\eta$ rather than $\varepsilon$; see
Definition 36 and Formula (21).
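Let
$k = k(h) = f(x + h) - f(x) = f'(x)h + h\,\eta_1(h)$ (22)
and let $y = f(x)$. Then we have
$g(f(x + h)) - g(f(x)) = g(y + k) - g(y) = g'(y)k + k\,\eta_2(k)$.
It follows that
$\frac{g(f(x + h)) - g(f(x))}{h} = g'(y)\,\frac{k}{h} + \frac{k}{h}\,\eta_2(k) \to g'(f(x))\,f'(x)$ as $h \to 0$,
since $\frac{k}{h} = f'(x) + \eta_1(h) \to f'(x)$, $k \to 0$, and $\eta_2$ is continuous at 0 with $\eta_2(0) = 0$.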
We can use these examples and the rules for derivatives to deduce that all polynomials
$p(x) = a_n x^n + a_{n-1} x^{n-1} + \cdots + a_1 x + a_0$, where $a_j \in \mathbb{R}$
are everywhere differentiable. By the quotient rule we can differentiate rational functions $\frac{p(x)}{q(x)}$, for polynomials $p(x)$,
$q(x)$, at any point $x \in \mathbb{R}$ such that $q(x) \neq 0$. Any computer with a program such as Scientific Workplace, Matlab or
Mathematica can grind out these derivatives and lots more. We have to say more about $e^x$, $\log x$, $\sin x$, $\cos x$, $x^a$, for
$x > 0$, $a \in \mathbb{R}$, before we can do more interesting derivatives. Then we can use the chain rule to get derivatives of functions
like $\sqrt{1 + x}$ or $x^x$ or $\sin\left(\frac{1}{x}\right)$.
Our next goals are to figure out the mean value theorem and the formula for the derivative of the inverse of a function.
Here we mean the inverse function for the operation of composition of functions, such as the derivative of $\sqrt{x}$ knowing the
derivative of $x^2$.
21
This theorem will be of great use in deducing properties of the function $f(x)$ from properties of its derivative $f'(x)$. It also
helps us to prove that any function whose derivative is identically 0 on an interval must be a constant. This fact is important
for integration. It explains why the antiderivative tables always have $+C$ at the end of the formulas.
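Theorem 38 The Mean Value Theorem. Suppose $f : [a, b] \to \mathbb{R}$ is continuous and differentiable on $(a, b)$. Then there
is a point $c \in (a, b)$ such that
$f'(c) = \frac{f(b) - f(a)}{b - a}$.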
The conclusion of the mean value theorem says that the tangent to the curve y = f (x) at the point (c, f (c)) is parallel to
the line through the points (a, f (a)) and (b, f (b)). See Figure 46.
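For a concrete function one can hunt for the point $c$ promised by the mean value theorem. The sketch below does this for $f(x) = x^3 - 2x + 1$ on $[0, 2]$ by bisecting on $f'(x) - \text{slope}$. It is an illustration, not part of the proof: the helper name `mean_value_point` is hypothetical, and the search assumes that $f'$ is continuous and that $f'(a) < \text{slope} < f'(b)$, which holds in this example but is not guaranteed in general.

```python
def f(x):
    return x**3 - 2*x + 1

def fprime(x):
    return 3*x**2 - 2

def mean_value_point(a, b, tol=1e-12):
    """Find c in (a, b) with fprime(c) equal to the secant slope, by bisection."""
    slope = (f(b) - f(a)) / (b - a)
    lo, hi = a, b
    if not (fprime(lo) < slope < fprime(hi)):
        raise ValueError("this sketch assumes fprime(a) < slope < fprime(b)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if fprime(mid) < slope:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2, slope

c, slope = mean_value_point(0.0, 2.0)
print(c, slope, fprime(c))
```

Here the secant slope is 2, and solving $3c^2 - 2 = 2$ by hand gives $c = \frac{2}{\sqrt{3}}$, which the search reproduces.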
Figure 46: The mean value theorem says the slope of the line through (a, f (a)) and (b, f (b)) equals the slope of the tangent
at some point c.
According to C.H. Edwards, Jr., in The Historical Development of the Calculus, p. 314, our proof is due to O. Bonnet
(1819-1890). The proof is 1st done in a special case and we need to know the 1st Derivative Test before proceeding.
Theorem 39 First Derivative Test. Suppose that $f : (a, b) \to \mathbb{R}$ is differentiable and
$f(c) \geq f(x)$ for all $x \in (c - \delta, c + \delta) \subset (a, b)$ for some $\delta > 0$.
Then we say that $f$ has a local maximum at $x = c$. It follows that the derivative $f'(c) = 0$.
Proof. Consider the difference quotient
$\frac{f(c + h) - f(c)}{h}$ = a number $\leq 0$, if $\delta > h > 0$; a number $\geq 0$, if $-\delta < h < 0$.
See Figure 47. For the derivative $f'(c)$ to exist, this must have a limit as $h \to 0$. This limit must be 0 as the right hand limit
is $\leq 0$, while the left-hand limit is $\geq 0$ (since limits preserve inequalities).
Figure 47: The first derivative test for a local maximum is pictured. The slopes of secant lines on the left are positive while
those on the right are negative. So the slope of the tangent line must be 0.
Proof of the Mean Value Theorem.
Special Case. $f(a) = f(b) = 0$ (also known as Rolle's Theorem).
By the hypothesis of the mean value theorem $f : [a, b] \to \mathbb{R}$ is continuous and differentiable on $(a, b)$. In this special case
$\frac{f(b) - f(a)}{b - a} = 0$ and thus we need to find a point $c \in (a, b)$ such that $f'(c) = 0$. If $f(x)$ = constant, then any point $c \in (a, b)$ will
work. Otherwise we know $f(u) \neq f(a)$ for some $u \in (a, b)$. We may assume $f(u) > f(a)$. Otherwise we can make a similar
argument which is left to the reader. Let $c$ be the point in $[a, b]$ such that $f(c) = \max\{f(x) \mid x \in [a, b]\}$. We know that the
maximum of a continuous function is attained on a closed finite interval by the Weierstrass theorem (see Theorem 32). It
follows from the fact that $f(c) \geq f(u) > f(a) = f(b)$ that $c \neq a$ and $c \neq b$. Then by the first derivative test we know that $f'(c) = 0$. This
completes the proof of the special case.
We get the following corollary immediately since now knowing that the slopes of all tangent lines are positive implies the
same for all secant lines.
Corollary 40 Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and differentiable on $(a, b)$. If $f'(x) > 0$ for all $x \in (a, b)$, then
the function $f(x)$ is monotone strictly increasing on $[a, b]$; i.e., for every pair of points $u, v \in [a, b]$ such that $u < v$, we
have $f(u) < f(v)$.
Proof. By the mean value theorem
$\frac{f(v) - f(u)}{v - u} = f'(c)$ for some $c \in (u, v)$. (23)
Here we replace $a, b$ with $u, v$, of course. But we know $f'(c) > 0$ and $v - u > 0$. It follows from equation (23) that
$f(v) - f(u) > 0$. Thus $f(u) < f(v)$.
Similarly one can show that if $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ and $f'(x) < 0$ for all $x \in (a,b)$, then the function $f(x)$ is monotone strictly decreasing on $[a,b]$; i.e., for every pair of points $u, v \in [a,b]$ such that $u < v$, we have $f(u) > f(v)$. This is an exercise.
Yet another exercise is to show that if $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ and $f'(x) \ge 0$ for all $x \in (a,b)$, then the function $f(x)$ is monotone increasing on $[a,b]$; i.e., for every pair of points $u, v \in [a,b]$ such that $u \le v$, we have $f(u) \le f(v)$. Similarly if $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ and $f'(x) \le 0$ for all $x \in (a,b)$, then the function $f(x)$ is monotone decreasing on $[a,b]$; i.e., for every pair of points $u, v \in [a,b]$ such that $u \le v$, we have $f(u) \ge f(v)$.
A similar fact is important enough that we state it as a Corollary. We will need it in the theory of integration.
Corollary 41 Suppose that $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ and $f'(x) = 0$ for all $x \in (a,b)$. Then $f(x)$ is constant.
Proof. This is left to the reader. If $v \in (a,b]$, use the mean value theorem to see that
\[ \frac{f(v) - f(a)}{v - a} = 0. \]
In the next theorem we use a fact about the second derivative, which is of course the derivative of the derivative; i.e., $f'' = (f')'$. We use the 2nd derivative to obtain a test for whether the graph of $f(x)$ is strictly convex up, meaning that the graph of $y = f(x)$, for $u \le x \le v$, lies below the chord (which is the finite part of the secant line) connecting $(u, f(u))$ and $(v, f(v))$ for every pair of points $u, v$ in the domain of $f$. You can state this as an inequality by parameterizing the chord through $(u, f(u))$ and $(v, f(v))$ as the set of points $t(u, f(u)) + (1-t)(v, f(v)) = (tu + (1-t)v,\ tf(u) + (1-t)f(v))$ for the parameter $t \in [0,1]$. Then $f(x)$ is strictly convex up if
\[ f(tu + (1-t)v) < tf(u) + (1-t)f(v), \quad \text{for all } t \in (0,1) \text{ and } u \ne v. \]
(At the endpoints $t = 0$ and $t = 1$ the two sides are equal.) See Figure 48 for an example.
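The chord inequality can be spot-checked numerically. The following is only a sketch: it samples finitely many values of $t$ strictly between 0 and 1, so it can refute strict convexity on $[u, v]$ but cannot prove it. The function name `lies_below_chord` is my own choice, not notation from the text.

```python
import math

def lies_below_chord(f, u, v, samples=99):
    """Check f(t*u + (1-t)*v) < t*f(u) + (1-t)*f(v) at sample points t in (0, 1)."""
    for k in range(1, samples + 1):
        t = k / (samples + 1)            # t strictly between 0 and 1
        x = t * u + (1 - t) * v          # a point of the segment between u and v
        chord = t * f(u) + (1 - t) * f(v)  # height of the chord above x
        if not f(x) < chord:
            return False
    return True

if __name__ == "__main__":
    # exp has a positive second derivative, so its graph should pass the check
    print(lies_below_chord(math.exp, -1.0, 2.0))
    # sin bends the other way on [0, pi], so the check should fail there
    print(lies_below_chord(math.sin, 0.0, math.pi))
```

Sampling is used because the definition quantifies over every $t$; the Convexity Test below is what replaces this infinite check by a condition on $f''$.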
Of course there will be a similar test telling whether the graph is strictly convex down (lying above all the secants). The terminology for this is often confusing. The word "convex" or its negative is often replaced by the word "concave," especially in older calculus books such as E. Purcell, Calculus with Analytic Geometry, where the definition is slightly different in another way. Purcell's definition involves the placement of the tangent lines rather than the secant lines.
Figure 48: A curve is "convex up" if it always lies below the chord (or finite part) of the secant line for every pair of points $u, v$ in the domain of the function. Here the chord is shown in turquoise while the curve is in red.
Theorem 42 Convexity Test. Suppose that $f : [a,b] \to \mathbb{R}$ is continuous and the second derivative $f''(x)$ exists and $f''(x) > 0$ for all $x \in (a,b)$. Then $f(x)$ is strictly convex up.
Proof. The equation of the secant line is given by
\[ L(x) = f(u) + \frac{f(v) - f(u)}{v - u}(x - u). \]
Set $g(x) = L(x) - f(x)$. Our goal is to show that $g(x) > 0$ for all $x \in (u,v)$. Note that $g(u) = g(v) = 0$.
Use the mean value theorem for $f$ on the interval $(u,v)$ to see that $g'(x) = f'(d) - f'(x)$, for some $d \in (u,v)$. Here $d$ depends only on $u$ and $v$, and is independent of $x$.
Then $g''(x) = -f''(x)$. The hypothesis that $f''(x) > 0$ for all $x \in (a,b)$ implies that $g'$ is strictly decreasing on $[u,v]$. We see that $g'(x) = \frac{f(v) - f(u)}{v - u} - f'(x)$. We know that $g'(d) = 0$ for some $d \in (u,v)$. So we must have $g'(x) > 0$ for $u < x < d$ and $g'(x) < 0$ for $d < x < v$. It follows from our discussion after the mean value theorem that $g(x)$ strictly increases from 0 to some positive value as $x$ goes from $u$ to $d$ and then $g(x)$ decreases from this positive value back to 0 as $x$ goes from $d$ to $v$. This is just what we wanted as it shows $g(x) > 0$ for all $x \in (u,v)$.
Theorem 43 Second Derivative Test. If $f : [a,b] \to \mathbb{R}$, suppose $f'(x)$ exists at any $x$ in an open interval containing the point $c$, and $f'(c) = 0$, while $f''(c) > 0$ for some $c \in (a,b)$. Then $f(c)$ is a local or relative minimum of $f(x)$; meaning that $\exists\, \delta > 0$ s.t. $f(c) \le f(x)$ for all $x \in [c - \delta, c + \delta]$.
Proof. By the preceding theorem, if we know $f''(x)$ exists and is positive on an open interval containing $c$, then $f(x)$ is strictly convex up on $[c - \delta, c + \delta]$ for some $\delta > 0$. This makes $f(c)$ a local minimum since $f(x)$ must be greater than $f(c)$ for all $x \ne c$ in $[c - \delta, c + \delta]$.
But we don't need to know that $f''(x)$ exists except at the point $x = c$. For we have
\[ f''(c) = \lim_{h \to 0} \frac{f'(c+h) - f'(c)}{h} = \lim_{h \to 0} \frac{f'(c+h)}{h}. \]
Therefore $f'(x)$ must be positive on $(c, c + \delta_1)$ for some small $\delta_1 > 0$ and $f'(x)$ must be negative on $(c - \delta_2, c)$ for some small $\delta_2 > 0$. This means that taking $\delta = \frac{1}{2}\min\{\delta_1, \delta_2\}$, we have our result. For then $f$ is decreasing on $[c - \delta, c]$ and increasing on $[c, c + \delta]$. That makes $f(c)$ a minimum on $[c - \delta, c + \delta]$.
22
We seek a rule to tell us how to differentiate $\log x$ assuming we know the derivative of $e^x$ (or the derivative of $\sqrt{x}$, assuming we know the derivative of $x^2$, or the derivative of $\arcsin(x)$ assuming we know the derivative of $\sin(x)$). This is the inverse function theorem in 1 variable. We will prove a special case.
Theorem 44 Inverse Function Theorem. Suppose $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ with $f'(x) > 0$ for all $x \in (a,b)$. Then there is an inverse function $g : [f(a), f(b)] \to \mathbb{R}$, meaning that $g(f(x)) = x$ for all $x \in [a,b]$ and $f(g(y)) = y$ for all $y \in [f(a), f(b)]$. We often write $g = f^{-1}$ although this is not to be confused with $\frac{1}{f}$. The inverse function $g$ is differentiable and
\[ g'(y) = \frac{1}{f'(g(y))}. \tag{24} \]
Note that the Leibniz notation for derivatives makes this theorem look like high school algebra as it says
\[ \frac{dx}{dy} = \frac{1}{\ \frac{dy}{dx}\ }. \]
Moreover, if we know the inverse function to be differentiable, then the formula for its derivative follows from the chain rule, as $g(f(x)) = x$ implies $g'(f(x))f'(x) = 1$ and thus, as $x = g(y)$, we get formula (24). Unfortunately we need to show that the inverse function is differentiable. Before doing that, let's look at a few examples.
Example 1. Consider the function
\[ y = f(x) = x^n = \underbrace{x \cdots x}_{n \text{ times}}, \quad n = 1, 2, 3, 4, \ldots \]
The inverse function is $g(y) = y^{1/n} = \sqrt[n]{y}$, defined for $y \ge 0$ if $n$ is even and all $y$ if $n$ is odd. Then $g'(y) = \frac{1}{n} y^{\frac{1}{n} - 1}$.
No surprise that it follows the general formula for powers. But we need to prove this using the inverse function theorem. Assume $y > 0$ if $n$ is even and in general that $y \ne 0$. Then by formula (24) we have, using the fact that $f'(x) = nx^{n-1}$,
\[ g'(y) = \frac{1}{f'(g(y))} = \frac{1}{n(g(y))^{n-1}} = \frac{1}{n} y^{-\frac{n-1}{n}} = \frac{1}{n} y^{\frac{1}{n} - 1}. \tag{25} \]
assuming $x > 0$ if $q$ is even. We leave it as an exercise for the reader to show that $f'(x) = rx^{r-1}$.
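Formula (25) is easy to sanity-check on a computer. The sketch below, with function names of my own invention, computes $g'(y)$ for the $n$th root in two ways: through formula (24) as $1/f'(g(y))$, and through the power-rule form on the right of (25). It assumes $y > 0$ so that floating point powers behave.

```python
def root_derivative_via_inverse(y, n):
    """g'(y) = 1 / f'(g(y)) for f(x) = x**n and g(y) = y**(1/n), with y > 0."""
    x = y ** (1.0 / n)                 # x = g(y)
    return 1.0 / (n * x ** (n - 1))    # 1 / f'(x), using f'(x) = n x^(n-1)

def root_derivative_power_rule(y, n):
    """(1/n) * y**(1/n - 1), the right-hand side of formula (25)."""
    return (1.0 / n) * y ** (1.0 / n - 1.0)

def central_difference(g, y, h=1e-6):
    """Symmetric difference quotient, a rough numerical stand-in for g'(y)."""
    return (g(y + h) - g(y - h)) / (2 * h)

if __name__ == "__main__":
    n, y = 3, 8.0
    print(root_derivative_via_inverse(y, n))
    print(root_derivative_power_rule(y, n))
    print(central_difference(lambda t: t ** (1.0 / n), y))
```

For $n = 3$ and $y = 8$ all three numbers should be close to $1/12$, since $g(8) = 2$ and $f'(2) = 12$.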
Proof of the inverse function theorem.
Step 1. Define $f^{-1}$.
We are assuming $f : [a,b] \to \mathbb{R}$ is continuous and differentiable on $(a,b)$ with $f'(x) > 0$ for all $x \in (a,b)$. Thus $f(x)$ is strictly increasing on $[a,b]$ by the mean-value theorem. This shows that $f$ is 1-1 on $[a,b]$. By the intermediate value theorem we know that $f$ maps $[a,b]$ 1-1, onto the interval $[f(a), f(b)]$. Thus the inverse function $g : [f(a), f(b)] \to [a,b]$ is well-defined, 1-1, and onto. Given $y \in [f(a), f(b)]$, we know by the intermediate value theorem that there is an $x \in [a,b]$ such that $f(x) = y$ and so we then define $g(y) = x$. Note that $x$ is unique since $f$ is 1-1.
Step 2. The inverse function $g$ is also strictly increasing.
Suppose that $f(a) < r < s < f(b)$. We want to show that $g(r) < g(s)$. We do a proof by contradiction. If $g(s) \le g(r)$, then since $f$ is strictly increasing, we have $s = f(g(s)) \le f(g(r)) = r$. But this says $s \le r$ while $r < s$, giving us our contradiction. So $g$ must be strictly increasing.
Step 3. The continuity of the inverse function.
We want to show that $g(y)$ is continuous at $r = f(c)$ for $c \in [a,b]$. Let us first assume that $c$ is not an endpoint. Suppose we are given $\varepsilon > 0$ which is small enough that $c \pm \varepsilon \in (a,b)$. Let $g(r + \delta_2) = c + \varepsilon$ and $g(r - \delta_1) = c - \varepsilon$. Set $\delta = \min\{\delta_1, \delta_2\}$. Then we claim that
\[ |y - f(c)| < \delta \quad \text{implies} \quad |g(y) - c| < \varepsilon. \]
To see this, look at Figure 49 and note that
\[ |y - f(c)| < \delta \iff -\delta < y - f(c) < \delta \iff f(c) - \delta < y < f(c) + \delta. \]
This implies $r - \delta_1 < y < r + \delta_2$. It follows upon applying $g$ to this inequality, using the fact that $g$ is strictly increasing, that
\[ c - \varepsilon = g(r - \delta_1) < g(y) < g(r + \delta_2) = c + \varepsilon. \]
Thus $|g(y) - c| < \varepsilon$ if $|y - r| < \delta$. This proves the continuity of $g$. We leave the case that $c = a$ or $c = b$ to the reader as an exercise.
Note that the graph of the inverse function $g$ is obtained from that of the function $f$ by reflecting the graph of $f$ across the line $y = x$. Thus it is inconceivable that if the graph of $f$ has no breaks, then the graph of $g$ could have breaks.
Step 4. The formula for the derivative of the inverse function.
As in Step 3, let $f(c) = r$ and thus $g(r) = c$. Then, writing $x = g(y)$,
\[ g'(r) = \lim_{y \to r} \frac{g(y) - g(r)}{y - r} = \lim_{y \to r} \frac{x - c}{f(x) - f(c)} = \lim_{y \to r} \frac{1}{\ \frac{f(x) - f(c)}{x - c}\ }. \]
This proof is over once we prove that if $y$ approaches $r$, then $x = g(y)$ approaches $c = g(r)$, as the limit on the right as $x$ approaches $c$ is $\frac{1}{f'(c)} = \frac{1}{f'(g(r))}$. But the continuity of $g$ which was proved in Step 3 says that if $y$ approaches $r$, then $x = g(y)$ approaches $c = g(r)$. So we're done.
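Steps 1 and 2 also suggest a way to compute the inverse in practice. Since $f$ is continuous and strictly increasing, the value $g(y)$ can be trapped by repeatedly halving an interval. This is a sketch only: bisection is my choice of method (the proof itself just uses the intermediate value theorem), and the name `inverse_by_bisection` and the tolerance are assumptions.

```python
def inverse_by_bisection(f, a, b, y, tol=1e-12):
    """Approximate g(y), where g is the inverse of a continuous,
    strictly increasing function f on [a, b]."""
    if not f(a) <= y <= f(b):
        raise ValueError("y must lie in [f(a), f(b)]")
    lo, hi = a, b
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid      # the solution lies to the right of mid
        else:
            hi = mid      # the solution lies at mid or to its left
    return 0.5 * (lo + hi)

if __name__ == "__main__":
    f = lambda x: x ** 3 + x                 # f'(x) = 3x^2 + 1 > 0 everywhere
    g = lambda y: inverse_by_bisection(f, 0.0, 2.0, y)
    h = 1e-4
    slope = (g(2.0 + h) - g(2.0 - h)) / (2 * h)
    # Formula (24) predicts g'(2) = 1 / f'(g(2)) = 1 / f'(1) = 1/4.
    print(g(2.0), slope)
```

The difference quotient of the computed inverse can then be compared with the prediction $1/f'(g(y))$ of formula (24).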
Figure 49: Figure for the proof of the continuity of the inverse of a function $y = f(x)$ with positive derivative on an interval. The inverse function is $x = g(y)$. For a point $c$ let $r = f(c)$. Given small positive $\varepsilon$, define the positive numbers $\delta_1$ and $\delta_2$ by $r + \delta_2 = f(c + \varepsilon)$ and $r - \delta_1 = f(c - \varepsilon)$.
23
We assume you have seen the functions $e^x$, $\log x$, $\sin x$, $\cos x$, etc. before. But how do you define them precisely? Here I will take the approach that you believe in infinite series and I will define $e^x$ by its Taylor series, even though we will not discuss series for a while. Then I will define $\log x$ to be the inverse function of $e^x$. Many calculus books take a different approach and define $\log x$ as an integral. That seems less natural to me and we haven't yet discussed integrals either. As to the trigonometric functions, I will say very little about them here.
23.1
Exponential
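\[ e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!} = \lim_{M \to \infty} \sum_{n=0}^{M} \frac{x^n}{n!} = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots. \tag{26} \]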
The power series in formula (26) converges for all real $x$, as we will see later using the ratio test. Assuming that it is legal to differentiate a power series term-by-term where it converges (as we will later prove), we see that
\[ \frac{d e^x}{dx} = 0 + 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \cdots = e^x. \]
We also see that $e^0 = 1$. It follows that $y = f(x) = e^x$ satisfies the first order ordinary differential equation
\[ \frac{dy}{dx} = y, \qquad y(0) = 1. \tag{27} \]
The ODE and initial condition in formula (27) give another way to define $y = e^x$, since the solution to this 1st order ODE with initial condition is unique. The good thing about our definition vs this one is that ours gives us a way to compute the function. Maybe that doesn't impress anyone with a calculator or a computer, but it used to make me very happy. The series converges pretty fast. Try computing $e$ as $\sum_{n=0}^{10} \frac{1}{n!} \cong 2.7183$. This gives the first 5 digits by summing 11 terms.
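For readers who do have a computer, the partial sums of the series (26) take only a few lines. This is a minimal sketch in ordinary floating point; the function name `exp_partial_sum` is my own.

```python
import math

def exp_partial_sum(x, terms):
    """Sum of the first `terms` terms of the series sum_{n>=0} x**n / n!."""
    total = 0.0
    term = 1.0                 # the n = 0 term, x**0 / 0!
    for n in range(terms):
        total += term
        term *= x / (n + 1)    # turn x**n / n! into x**(n+1) / (n+1)!
    return total

if __name__ == "__main__":
    approx = exp_partial_sum(1.0, 11)      # n = 0, 1, ..., 10
    print(approx, math.e, abs(approx - math.e))
```

Each term is obtained from the previous one by a single multiplication, which avoids computing large factorials separately.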
Yes, I love this series. It even works for complex numbers $z = x + iy$, where $i = \sqrt{-1}$, for square matrices $X$, and for $p$-adic numbers so dearly loved by number theorists. Moreover there is a great generalization to Lie algebras, but we won't go there. The complex number version gives you a way to understand $\sin x$ and $\cos x$ without knowing trigonometry since $e^{i\theta} = \cos\theta + i\sin\theta$ (Euler's formula).
Next let's derive the basic facts about the exponential from the power series definition, making use of theorems about power series that will be proved later in the course. But you saw them in the power series part of the usual calculus class.
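Facts About the Exponential Function.
Fact 1. Exp takes addition to multiplication.
\[ e^{x+y} = e^x e^y. \]
Proof. Multiply the power series and use the binomial theorem. This gives
\[ e^x e^y = \sum_{n=0}^{\infty} \frac{x^n}{n!} \sum_{m=0}^{\infty} \frac{y^m}{m!} = \sum_{k=0}^{\infty} \sum_{\substack{m+n=k \\ m,n \ge 0}} \frac{x^n y^m}{n!\, m!}. \]
Here we treat the double series as we would a double integral. We change variables from $m, n$ to $n, k = m + n$. It follows that
\[ e^x e^y = \sum_{k=0}^{\infty} \sum_{n=0}^{k} \frac{x^n y^{k-n}}{n!\,(k-n)!} = \sum_{k=0}^{\infty} \frac{1}{k!} \sum_{n=0}^{k} \frac{k!}{n!\,(k-n)!} x^n y^{k-n} = \sum_{k=0}^{\infty} \frac{1}{k!} (x+y)^k = e^{x+y}. \]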
Proof. By our definition of $e^x$, for $x > 0$, $e^x$ is bounded below by any one term of the power series, e.g.,
\[ e^x > \frac{x^{n+1}}{(n+1)!}. \]
Now we can graph $e^x$. We know that it is strictly increasing for all real $x$, since it is its own derivative and it is always positive. The second derivative is the same and positive and thus the graph is convex up everywhere. The function $e^x$ goes to $\infty$ as $x \to \infty$. Since $e^{-x} = \frac{1}{e^x}$, it follows that $e^x \to 0$ as $x \to -\infty$.
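\[ e - \sum_{n=0}^{k} \frac{1}{n!} = \frac{a}{b} - \sum_{n=0}^{k} \frac{1}{n!} = \sum_{n=k+1}^{\infty} \frac{1}{n!}. \]
Multiply this by $k!$ to find
\[ k! \left( \frac{a}{b} - \sum_{n=0}^{k} \frac{1}{n!} \right) = k! \sum_{n=k+1}^{\infty} \frac{1}{n!}. \tag{28} \]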
Now the left-hand side of this last equation is an integer while (by the argument below) the right-hand side is in the interval
(0, 1). This is a contradiction, proving that e is irrational.
To see that the right-hand side of equation (28) is in the interval (0, 1), proceed as follows.
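\begin{align*}
k! \sum_{n=k+1}^{\infty} \frac{1}{n!} &= \frac{k!}{(k+1)!} + \frac{k!}{(k+2)!} + \frac{k!}{(k+3)!} + \cdots \\
&= \frac{1}{k+1} + \frac{1}{(k+1)(k+2)} + \frac{1}{(k+1)(k+2)(k+3)} + \cdots \\
&< \frac{1}{k+1} + \frac{1}{(k+1)^2} + \frac{1}{(k+1)^3} + \cdots
\end{align*}
This last sum can be evaluated using the formula for the geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1-x}$, valid for $|x| < 1$:
\[ \sum_{n=1}^{\infty} \frac{1}{(k+1)^n} = \frac{1}{k+1} \cdot \frac{1}{1 - \frac{1}{k+1}} = \frac{1}{k}. \]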
It follows that the right-hand side of equation (28) is in the interval (0, 1) and we have our contradiction.
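The two ingredients of this argument can be checked with exact rational arithmetic. The sketch below uses Python's `fractions.Fraction`; the function names are mine. Note that the infinite tail has to be truncated, so `scaled_tail` is only a lower bound for the right-hand side of (28), which is still enough to see that it is positive and stays below $1/k$.

```python
from fractions import Fraction
from math import factorial

def scaled_head(k):
    """k! * sum_{n=0}^{k} 1/n!, computed exactly; this should be an integer."""
    return sum(Fraction(factorial(k), factorial(n)) for n in range(k + 1))

def scaled_tail(k, extra_terms=40):
    """k! * sum_{n=k+1}^{k+extra_terms} 1/n!, a truncation of the tail in (28)."""
    return sum(Fraction(factorial(k), factorial(n))
               for n in range(k + 1, k + extra_terms + 1))

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, scaled_head(k), float(scaled_tail(k)), 1.0 / k)
```

Exact fractions are used so that the "is an integer" claim about the head is not blurred by rounding.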
Lambert proved that $e$ and $\pi$ are irrational in 1761. Gelfond proved that $e^{\pi}$ is irrational in 1929. See Hardy and Wright, Introduction to the Theory of Numbers, p. 46, for the proof of the irrationality of $\pi$. In fact, both $e$ and $\pi$ are transcendental; i.e., they are not roots of a polynomial with rational coefficients. Note that rational numbers $r = \frac{a}{b}$, with $a, b \in \mathbb{Z}$, $b \ne 0$, are roots of the first degree polynomial with rational coefficients $x - r$ or, if you want integer coefficients, $bx - a$.
There is a whole branch of number theory devoted to such questions. In the 1970s a French mathematician named Apéry showed that the following value of the Riemann zeta function is irrational:
\[ \zeta(3) = \sum_{n=1}^{\infty} \frac{1}{n^3}. \]
Apéry was an unknown older French mathematician when he found his proof and so no one believed him at first. See Van Der Poorten, "A Proof that Euler Missed ..." in The Mathematical Intelligencer, 1 (1979), 195-203. Euler showed that
\[ \zeta(2) = \sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6}, \qquad \zeta(4) = \sum_{n=1}^{\infty} \frac{1}{n^4} = \frac{\pi^4}{90}, \]
and similar formulas for $\zeta(2n)$, $n = 3, 4, 5, 6, \ldots$, saying that $\zeta(2n) = r\pi^{2n}$, where $r$ is rational.
Lectures, II for a proof of the 1st formula using Fourier series.
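Euler's two formulas can at least be made plausible by summing many terms. This is only a numerical sketch, not a proof; the function name `zeta_partial` and the numbers of terms are my own choices. The series for $\zeta(2)$ converges slowly (the tail after $N$ terms is roughly $1/N$), so many terms are needed.

```python
import math

def zeta_partial(s, terms):
    """Partial sum 1/1**s + 1/2**s + ... + 1/terms**s."""
    return sum(1.0 / n ** s for n in range(1, terms + 1))

if __name__ == "__main__":
    print(zeta_partial(2, 100000), math.pi ** 2 / 6)
    print(zeta_partial(4, 1000), math.pi ** 4 / 90)
```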
23.2
Here we reverse the treatment of exp and log found in most calculus books. Note that $\exp : \mathbb{R} \to \mathbb{R}^+ = (0, \infty)$ is 1-1 and onto. Why? It is 1-1 since it is its own derivative and it is positive, thus strictly increasing and its graph is convex up. It is onto since $e^x$ certainly approaches $\infty$ as $x \to \infty$. Moreover as $x \to -\infty$, we see that $e^x = \frac{1}{e^{-x}} \to 0$. Since $e^x$ is continuous, by the intermediate value theorem it must be onto $(0, \infty)$. So $e^x$ satisfies the hypotheses of our theorem on inverse functions (Theorem 44).
The moral of the preceding paragraph is that we can define an inverse function to $y = e^x$.
Definition 46 We define the logarithm base $e$ (natural logarithm), denoted $\log y$, by $g(y) = \log y = x$ iff $y = e^x$.
Note that most calculus books call this function $\ln y$ for natural logarithm to distinguish it from the logarithm base 10. We will have no need for base 10 logs and so we will, hopefully, never write $\ln y$.
Properties of the Logarithm
Property 0) The Differential Equation.
The function $g(y) = \log y$ maps $(0, \infty)$ 1-1 onto $\mathbb{R}$. It satisfies the differential equation $g'(y) = \frac{1}{y}$ with $g(1) = 0$.
Proof. We use Theorem 44 which says that if $y = f(x) = e^x$, then the derivative of the inverse function is obtained as follows:
\[ g'(y) = \frac{1}{f'(g(y))} = \frac{1}{f(g(y))} = \frac{1}{y}. \]
Since $f(0) = 1$, we know that $g(1) = 0$.
The calculus book usually defines $\log y$ as:
\[ \log y = \int_1^y \frac{1}{t}\, dt. \]
This will make more sense after we have proved the Fundamental Theorem of Calculus (which will be discussed soon). For then we will know that the derivative of the integral as a function of its upper endpoint is just the integrand. And, of course,
\[ \int_1^1 \frac{1}{t}\, dt = 0. \]
(29)
Let $u = e^x$ and $v = e^y$. Then take logs of both sides of formula (29). That says
\[ \log e^{x+y} = \log\left(e^x e^y\right). \]
Since log erases exp, we have
\[ \log u + \log v = x + y = \log\left(e^x e^y\right) = \log(uv). \]
Now we are ready to graph $y = \log x$. Of course the graph is obtained by reflecting the graph of $e^x$ across the line $y = x$. Since $\frac{dy}{dx} = \frac{1}{x} > 0$ if $x > 0$, we know that $\log x$ is increasing on $(0, \infty)$. And the second derivative $\frac{d^2y}{dx^2} = -\frac{1}{x^2} < 0$, for $x > 0$. This means that the graph is convex down. We also know that $\log 1 = 0$ and $\log e = 1$. Recall that $e \cong 2.7183$. So we get Figure 51. We also know $\lim_{y \to \infty} \log y = \infty$ and $\lim_{y \to 0,\, y > 0} \log y = -\infty$ from the corresponding facts about $e^x$.
Now we define a function that includes most of the powers that we might want to think about for real variables.
Definition 47 The power function is defined as follows.
\[ x^a = \exp(a \log x), \quad x > 0. \]
Then define
\[ \log x^{ab} = ab \log x. \]
and therefore
\[ \lim_{h \to 0} g\left((1+h)^{\frac{1}{h}}\right) = \lim_{h \to 0} \frac{\log(1+h)}{h} = g'(1) = \frac{1}{1} = 1. \]
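So erasing $g$ by exponentiating will compute the desired limit, as $f(x) = e^x$ is continuous. Thus
\[ f\left(\lim_{h \to 0} g\left((1+h)^{\frac{1}{h}}\right)\right) = \lim_{h \to 0} f\left(g\left((1+h)^{\frac{1}{h}}\right)\right) = \lim_{h \to 0} (1+h)^{\frac{1}{h}} = f(1) = e. \]
Example 2.
\[ \lim_{y \to \infty} \frac{\log y}{y} = 0. \]
Proof. Set $y = e^x$. Note that $y \to \infty$ iff $x \to \infty$. The stated formula is therefore a result of the fact that
\[ \frac{\log y}{y} = \frac{x}{e^x} \to 0, \quad \text{as } x \to \infty. \]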
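Both limits are easy to watch numerically. The sketch below uses the standard library `math.log`; the function names are mine, and of course evaluating at a few small $h$ or large $y$ proves nothing. It only illustrates the trend.

```python
import math

def compound(h):
    """(1 + h)**(1/h), which should approach e as h -> 0 (h != 0, h > -1)."""
    return (1.0 + h) ** (1.0 / h)

def log_ratio(y):
    """log(y) / y, which should approach 0 as y -> infinity."""
    return math.log(y) / y

if __name__ == "__main__":
    for h in (0.1, 0.01, 0.001, 1e-6):
        print(h, compound(h), math.e - compound(h))
    for y in (10.0, 1e3, 1e6, 1e9):
        print(y, log_ratio(y))
```

Very small $h$ (far below `1e-6`) should be avoided in this naive form, since rounding in `1.0 + h` gets amplified by the large exponent.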
23.3
Since $-1$ is not the square of a real number, to get a root of $x^2 + 1 = 0$, we need to enlarge our world beyond the real line. Let $i$ denote a creature whose square is $-1$. Thus $i = \sqrt{-1}$ is not real, yes, it's imaginary, but not really more imaginary than most of mathematics. Of course the equation $x^2 + 1 = 0$ also has the root $-i$. And it is really impossible to say which of the 2 roots we are calling $i$. But worry not. We just call one of them $i$.
Define the set $\mathbb{C}$ of complex numbers to be:
\[ \mathbb{C} = \{ z = x + iy \mid x, y \in \mathbb{R} \} \]
If $z = x + iy$, with $x, y \in \mathbb{R}$, we say that $x = \operatorname{Re} z$ is the real part of $z$ and $y = \operatorname{Im} z$ is the imaginary part of $z$.
We can visualize the complex numbers $z = x + iy$, with $x, y \in \mathbb{R}$, as points $(x, y)$ in the plane. See Figure 52.
Figure 52: The complex number z = x + iy, with x, y real can be pictured as a point in the plane with coordinates (x, y),
or thought of as a vector from the origin to (x, y).
The set $\mathbb{C}$ forms a field; i.e., it satisfies the 9 field axioms stated in the earlier section on the field axioms for $\mathbb{R}$, where we have defined sum and product of complex numbers as follows if $z = x + iy$ and $w = u + iv$, with $x, y, u, v \in \mathbb{R}$:
\[ z + w = (x + u) + i(y + v) \]
Here we just recalled an old trick from high school algebra, multiplying by 1 in the form of the conjugate $u - iv$ of the denominator over $u - iv$.
To say that $\mathbb{C}$ forms a field means that you can add, subtract, multiply and divide by non-0 numbers with the usual laws of associativity, distributivity, commutativity, etc. Note however that $\mathbb{C}$ is not an ordered field, just a field. The identity for addition is the complex number $0 = 0 + i0$. A complex number is 0 iff both its real and imaginary parts are 0.
Next we define the complex absolute value.
Definition 49 Absolute value of a complex number $z$ is $|z| = \sqrt{x^2 + y^2}$ if $z = x + iy$, with $x, y \in \mathbb{R}$.
Complex conjugate of $z$ is $c(z) = \bar{z} = x - iy$.
Then $|z|^2 = z\bar{z}$. Also $c(z + w) = c(z) + c(w)$ and $c(zw) = c(z)c(w)$. The map $c(z)$ is a continuous function which maps $\mathbb{C}$ 1-1 onto $\mathbb{C}$ fixing elements of $\mathbb{R}$.
The properties of complex absolute value are the same as those for the real absolute value.
Properties of Complex Absolute Value.
Property 1) $|z| \ge 0$, for all complex numbers $z$, and $|z| = 0$ iff $z = 0$.
Property 2) $|zw| = |z|\,|w|$.
Property 3) triangle inequality $|z + w| \le |z| + |w|$.
We leave the proofs of these properties to the reader. We will prove these things later in these notes, when we consider
normed vector spaces. This time there really is a triangle in the triangle inequality. See Figure 53.
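Python has complex numbers built in (written like `3 + 4j`), so the conjugate, the absolute value, and the "multiply by the conjugate" trick for division can be tried out directly. The sketch below deliberately spells these out from real and imaginary parts rather than calling the built-in `abs` or `/`; the function names are my own.

```python
import math

def conj(z):
    """Complex conjugate c(z) = x - iy of z = x + iy."""
    return complex(z.real, -z.imag)

def absolute_value(z):
    """|z| = sqrt(x**2 + y**2) for z = x + iy."""
    return math.sqrt(z.real ** 2 + z.imag ** 2)

def divide(z, w):
    """z / w for w != 0, found by multiplying top and bottom by conj(w)."""
    denom = w.real ** 2 + w.imag ** 2   # w * conj(w), a positive real number
    numer = z * conj(w)
    return complex(numer.real / denom, numer.imag / denom)

if __name__ == "__main__":
    z, w = 3 + 4j, 1 - 2j
    print(absolute_value(z), absolute_value(z * w),
          absolute_value(z) * absolute_value(w))
    print(conj(z * w), conj(z) * conj(w))
    print(divide(z, w), z / w)
    print(absolute_value(z + w) <= absolute_value(z) + absolute_value(w))
```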
Figure 53: Since the complex absolute value |z| is the length of the vector from the origin to the point representing z in the
plane and since the sum of 2 complex numbers z and w corresponds to the sum of the 2 vectors corresponding to z and w,
the triangle inequality says the sum of the lengths of 2 sides of a triangle is greater than or equal to the length of the third
side.
We could redo all of the things about limits and derivatives for complex numbers, but we won't. That's another course. And we would fall asleep the 2nd time through. In particular series of complex numbers work just like series of real numbers.
If $z$ is a complex number, we can define the complex exponential as a complex power series:
\[ \exp(z) = e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}. \]
It turns out that this series converges for all complex numbers $z$. Then we have
\[ e^{z+w} = e^z e^w \tag{30} \]
just as in the real case. You multiply complex power series just the same way you multiply real power series (carefully). And the binomial theorem is as true for complex numbers as for real numbers.
So we see that
\[ e^{x+iy} = e^x e^{iy}. \]
What does this mean? Let's look at $e^{iy}$:
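\begin{align*}
e^{iy} &= \sum_{n=0}^{\infty} \frac{(iy)^n}{n!} = 1 + iy + \frac{(iy)^2}{2} + \frac{(iy)^3}{6} + \frac{(iy)^4}{24} + \cdots \\
&= 1 + iy - \frac{y^2}{2} - i\frac{y^3}{6} + \frac{y^4}{24} + \cdots \\
&= \left(1 - \frac{y^2}{2} + \frac{y^4}{24} - \cdots\right) + i\left(y - \frac{y^3}{6} + \frac{y^5}{120} - \cdots\right) \\
&= \cos y + i \sin y.
\end{align*}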
Here we are defining $\sin x$ and $\cos x$ by their Taylor series. You can deduce the standard trigonometric identities from the properties of the complex exponential like formula (30).
One finds that $|e^{iy}| = 1$, for $y \in \mathbb{R}$. To see this note that $|e^{iy}|^2 = e^{iy}\,\overline{e^{iy}} = e^{iy} e^{-iy} = e^{iy - iy} = e^0 = 1$. Since $|e^{iy}|$ must be positive it follows that $|e^{iy}| = 1$ for all real $y$. This says $\sin^2 y + \cos^2 y = 1$, for $y \in \mathbb{R}$.
See Figure 54.
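The complex power series can be summed on a computer exactly as in the real case. The sketch below (with a function name and term count of my own choosing) sums the first 40 terms, which is plenty for the moderate values of $|z|$ used here, and compares the result with the library functions `math.cos`, `math.sin` and `cmath.exp`.

```python
import cmath
import math

def exp_series(z, terms=40):
    """Partial sum of sum_{n>=0} z**n / n! for a real or complex number z."""
    total = 0j
    term = 1 + 0j              # the n = 0 term
    for n in range(terms):
        total += term
        term *= z / (n + 1)    # turn z**n / n! into z**(n+1) / (n+1)!
    return total

if __name__ == "__main__":
    y = 1.2
    w = exp_series(1j * y)
    print(w, math.cos(y), math.sin(y))      # real and imaginary parts
    print(abs(w))                           # should be very close to 1
    z1, z2 = 0.5 + 1j, -0.3 + 2j
    print(exp_series(z1 + z2), exp_series(z1) * exp_series(z2))
    print(exp_series(2 + 3j), cmath.exp(2 + 3j))
```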
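\[ f(x) = \cos x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!} \quad \text{and} \quad g(x) = \sin x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}. \tag{31} \]
You can check (assuming it is legal to differentiate term-by-term) that $\sin x$ and $\cos x$ satisfy the following system of ODEs with initial conditions:
\[ f'(x) = -g(x) \text{ and } g'(x) = f(x), \quad f(0) = 1 \text{ and } g(0) = 0. \tag{32} \]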
It is possible to define sin and cos to be solutions of the initial value problems in equation (32). Lang, Undergraduate Analysis, defines the trig functions this way. I don't find this to be very satisfying since it does not tell you how to compute them. Of course you might prefer the geometric definition seen in Figure 54 for an acute angle $x$ measured in radians. All definitions are the same once you show that whatever your definition of sin and cos, it satisfies formula (32). For you learn in differential equations that solutions to such things are unique.
I prefer to deduce all the standard trig identities from our Taylor series for sin and cos, $e^{iy} = \cos y + i \sin y$, and $e^{z+w} = e^z e^w$. It is a nice exercise. For example, you can easily see from the fact that all powers in the Taylor series for $\cos x$ are even, that $\cos(-x) = \cos x$.
To figure out the addition formula for $\cos x$, look at
\[ e^{iu + iv} = e^{iu} e^{iv} = (\cos u + i \sin u)(\cos v + i \sin v) = \cos u \cos v - \sin u \sin v + i(\cos u \sin v + \sin u \cos v). \]
Therefore
\[ \cos(u + v) = \cos u \cos v - \sin u \sin v. \]
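The same product gives the addition formula for the sine for free, by comparing imaginary parts instead of real parts. A short worked version of that step, using only the identity displayed above:

```latex
\begin{align*}
e^{i(u+v)} &= \cos(u+v) + i\sin(u+v), \\
e^{iu}e^{iv} &= (\cos u\cos v - \sin u\sin v) + i(\cos u\sin v + \sin u\cos v), \\
\sin(u+v) &= \cos u\sin v + \sin u\cos v .
\end{align*}
```

Two complex numbers are equal iff their real parts agree and their imaginary parts agree, which is why one identity between exponentials yields two trig identities.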
I learned this treatment of elementary functions by power series as an undergrad reading P. Dienes, The Taylor Series, Chapter 4. I really like this approach since it allows you to compute the functions and minimizes the memorization of trig identities.
Of course we still have the problem of defining $\pi$. Lang defines $\pi$ by saying that $\frac{\pi}{2}$ is the first positive zero of $\cos x$. It takes a bit of effort to see that such a zero exists if you are not allowed to look at Figure 54.
Then Lang goes on to show that sin and cos are periodic of period $2\pi$. This is certainly not obvious from the Taylor series in formula (31). Any finite combination via sum, product, composite, difference, quotient, inverse of the functions seen up to now (polynomials, powers, exponentials, sines, cosines) is called an elementary function. Of course they might be very complicated such as
2
ex 2001 log x 500x 3 + x.
These are the functions considered in calculus.
There are other favorite functions with many applications; e.g., in statistics, physics,... A good reference for many such
functions is N.N. Lebedev, Special Functions. Mathematica and Matlab know many of these functions: the error function,
Bessel functions, Airy integrals, ....
We have already seen the gamma function in the introduction. The incomplete gamma function is another favorite, as
are Legendre polynomials, also known as spherical harmonics because they arise in problems with spherical symmetry, such
as the problem of understanding the vibrations of the earth after a large earthquake, the solution of the Schrödinger equation for the hydrogen atom, or the study of the sun's magnetic field.
Not every function that you can think of is expressible by its Taylor series. One example is the infinitely differentiable function defined by
\[ f(x) = \begin{cases} \exp\left(-\frac{1}{x^2}\right), & x \ne 0, \\ 0, & x = 0. \end{cases} \]
One can show that all the derivatives $f^{(n)}(0) = 0$, for $n = 0, 1, 2, 3, \ldots$. This implies that the Taylor series of this function at the origin is identically 0. We will say more about that later when we discuss Taylor series. Similar functions can be used to glue 0 to 1 in an infinitely differentiable way ($C^{\infty}$ - glue).
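One way to get a feeling for why all the derivatives vanish at 0 is to notice that $f(x)$ goes to 0 faster than any power of $x$. The sketch below (function names mine) evaluates $f(x)/x^n$ for small $x$; it is numerical evidence only, not the proof promised for later. The guard against `x * x` underflowing to zero is a floating point precaution, not part of the mathematics.

```python
import math

def flat(x):
    """exp(-1/x**2) for x != 0, and 0 at x = 0."""
    square = x * x
    if square == 0.0:          # x is 0, or so tiny that x*x underflows
        return 0.0
    return math.exp(-1.0 / square)

def ratio_to_power(x, n):
    """flat(x) / x**n for x != 0; this should tend to 0 as x -> 0 for each fixed n."""
    return flat(x) / x ** n

if __name__ == "__main__":
    for x in (0.5, 0.2, 0.1, 0.05):
        print(x, flat(x), ratio_to_power(x, 10))
    # The Taylor series at 0 is identically 0, yet the function is not:
    print(flat(1.0))
```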
Figure 55: The function $f(x) = \exp\left(-\frac{1}{x^2}\right)$ for $x \ne 0$, $f(0) = 0$, which is not represented by its Taylor series at 0.
Part VII
In this section we prove the basic properties of the Riemann integral of a continuous function on a finite interval from 2 basic axioms. We will not show that an integral satisfying these 2 axioms exists until much later, using a method which may seem different from that of limits of Riemann sums that you saw in calculus. It is an intermediate method between that usually used for Riemann integrals and that used for Lebesgue integrals. Both kinds of integrals give the same answer for continuous functions on finite intervals, or even piecewise continuous functions on finite intervals (i.e., functions with a finite number of jump discontinuities; that is, points where the right- and left-hand limits exist but are different). An example is seen in Figure 33.
Both Riemann and Lebesgue integrals were invented to discuss Fourier analysis. Yes, the Fourier coefficients in equation (2) are integrals. Both ideas of integrals start with area. The integral of a constant function $f(x) = c$, for all $x$ in $[a, b]$, will be $c(b - a)$, which is the area under the curve. See Figure 56.
Figure 56: The integral of the constant function $f(x) = 11$ from 2 to 14 is $11 \times 12 = 132$.
But, of course, there are many more reasons we need integrals; for example, to compute probabilities. Many tests from statistics come from computing $\frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\frac{t^2}{2}}\, dt$.
Anyway, the amazing thing is that we can do most of the stuff you learned in calculus about integrals from 2 measly axioms. We will write definite integrals as $I_c^d f$ or $\int_c^d f$ rather than $\int_c^d f(x)\,dx$. The integral is a real number depending on the function $f$, the interval $[c, d]$ and nothing else. The variable $x$ is a "dummy variable." It is there to help us evaluate integrals; for example, when we want to make a substitution. But the definite integral is not a function of the variable $x$.
The 2 Simple Axioms for Integrals
Suppose that $f : [a, b] \to \mathbb{R}$ is continuous and $a < b$ are finite numbers. We write for $a < c < d < b$
\[ I_c^d f = \int_c^d f = \int_c^d f(x)\,dx. \]
Figure 57: Illustration of Axiom 1 for integrals showing that when the function is between the positive numbers $m$ and $M$, then you expect the integral over $[a, b]$ to be between the area of the large rectangle, which is $M(b - a)$, and that of the small rectangle, which is $m(b - a)$.
Figure 58: Illustration of Axiom 2 showing that the integral of f on [a, b] should equal the sum of the integrals of f on [a, r]
and on [r, b] if a < r < b.
Figure 59: Graph of a step function with values $w_j$ on the $j$th subinterval of the partition of $[a, b]$.
Integrals are extended to continuous functions and others by taking limits of Cauchy sequences of step functions using the absolute value norm
\[ \|f\|_1 = \int_a^b |f(x)|\,dx. \]
Theorem 50 Fundamental Theorem of Calculus. Suppose that we have an integral satisfying our 2 axioms, able to integrate all continuous functions on finite intervals. Then if $f : [a, b] \to \mathbb{R}$ is continuous with $-\infty < a < b < \infty$, we have
\[ \frac{d}{dx} \int_a^x f = \frac{d}{dx} \int_a^x f(t)\,dt = f(x) \quad \text{for all } x \in [a, b]. \]
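Proof. Case 1. The Right-Hand Derivative.
By Axiom 2, for $h > 0$ we have
\[ \int_a^{x+h} f - \int_a^x f = \int_x^{x+h} f. \]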
Here we use the fact that $h > 0$ and thus $a < x < x + h < b$ when $h$ is near enough to 0.
Let $f(u) = \min\{f(t) \mid x \le t \le x + h\}$ and $f(v) = \max\{f(t) \mid x \le t \le x + h\}$. We know that $u, v \in [x, x + h]$ exist by the Weierstrass theorem on the existence of maxima and minima for continuous functions on closed finite intervals. So now by Axiom 1, we have
\[ f(u) = \frac{f(u)h}{h} \le \frac{1}{h} \int_x^{x+h} f \le \frac{f(v)h}{h} = f(v). \]
Now let $h$ approach 0 from above. The quantity in the middle approaches the right-hand derivative of $\int_a^x f$ as a function of $x$, holding $a$ fixed. The quantities on the outside of the inequalities approach $f(x)$ by the continuity of $f$. So by the Squeeze Lemma, the right-hand derivative is $f(x)$.
Case 2. The Left-Hand Derivative.
We leave it to the reader as an exercise to show that the left-hand derivative is also $f(x)$. This is a little tricky because the denominator in the difference quotient is now negative and $a < x + h < x < b$ for $h < 0$. One has, again using Axiom 2,
\[ \int_a^{x+h} f - \int_a^x f = -\int_{x+h}^x f. \]
Now use the same argument as in Case 1 to see that the left-hand derivative is $f(x)$, to complete the proof.
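The statement of the theorem can be watched in action numerically. In the sketch below the integral is approximated by a midpoint Riemann sum, which is my choice and an assumption of the sketch (the notes construct the integral differently, via step functions), and the derivative is approximated by a symmetric difference quotient. All names and step sizes are hypothetical.

```python
import math

def midpoint_integral(f, a, b, n=2000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    width = (b - a) / n
    return width * sum(f(a + (i + 0.5) * width) for i in range(n))

def derivative_of_area(f, a, x, h=1e-3):
    """Difference quotient of A(x) = integral of f from a to x."""
    upper = midpoint_integral(f, a, x + h)
    lower = midpoint_integral(f, a, x - h)
    return (upper - lower) / (2 * h)

if __name__ == "__main__":
    # The theorem predicts that the slope of the area function equals f(x).
    for x in (0.5, 1.0, 2.0):
        print(x, derivative_of_area(math.cos, 0.0, x), math.cos(x))
```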
There was a huge quarrel over who invented calculus and "proved" the fundamental theorem - Newton or Leibniz.
The effects may still persist. According to Wiener in 1949, "it became an act of faith and of patriotic loyalty for British mathematicians to use the less flexible Newtonian notation and to affect to look down on the new work done by the Leibnizian
school on the Continent .... When the great continental school of the Bernoullis and Euler arose (not to mention Lagrange
and Laplace who came later) there were no men (sic) of comparable calibre north of the Channel to compete with them ...."
See Edna E. Kramer, The Nature and Growth of Mathematics, p. 172.
Anyway it is relatively simple now to prove all the basic rules of calculus that you know and love, using our 2 axioms
and the fundamental theorem. No more memorizing formulas! Next we show the basic fact which allows the evaluation of
integrals by finding antiderivatives.
Corollary 51 The integral is unique. Assume that $f : [a, b] \to \mathbb{R}$ is continuous with $-\infty < a < b < \infty$. If $F$ is differentiable on $(a, b)$ with $F' = f$, then for $x \in (a, b)$, we have
\[ \int_a^x f(t)\,dt = \int_a^x f = F(x) - F(a). \tag{33} \]
Proof. Well, by the fundamental theorem, we know that the derivative with respect to $x$ of the left-hand side of formula (33) is $f(x)$. By Corollary 41 of the mean-value theorem, formula (33) is true up to a constant. That is, we have a constant $C$ such that
\[ \int_a^x f = F(x) + C. \]
Setting $x = a$, we have $\int_a^a f = 0$. Thus $0 = F(a) + C$, which implies $C = -F(a)$, proving our Corollary.
The Fundamental Theorem of Calculus says that the derivative erases the integral:
\[ \frac{d}{dx} \int_a^x f = f(x). \]
The next Corollary essentially says the integral erases the derivative. It is just a restatement of the preceding Corollary.
Corollary 52 With the same hypotheses as the preceding Corollary, we have
\[ \int_a^x \frac{d}{dt} F(t)\,dt = F(x) - F(a). \]
As a result of the fundamental theorem and the preceding Corollary we see that the integral and the derivative are essentially inverse functions on functions. Thus the indefinite integral is called an antiderivative and written $\int f$. It is only determined up to a constant by Corollary 41 of the mean value theorem.
Examples.
Example 1. Constant Function.
Suppose $f(x) = c$ = constant, for all $x \in [a, b]$. Then by Axiom 1 we have
\[ c(b - a) \le \int_a^b f \le c(b - a). \]
It follows that
\[ \int_a^b c = c(b - a). \tag{34} \]
If $c > 0$, this integral is the area of the rectangle bounded by $y = c$, the $x$-axis, $x = a$, and $x = b$.
If $c < 0$, the integral is the negative of the area of this rectangle. We could also prove formula (34) from the preceding Corollaries. For the derivative of $F(x) = cx$ is $c$ and thus
\[ \int_a^b c = F(b) - F(a) = cb - ca = c(b - a). \]
25
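Most of the following rules for integrals come from some property of derivatives and the fundamental theorem of calculus.
Rules for Integrals. Suppose that f and g are continuous on [a, b].
Rule 1) Linearity.
If $\alpha, \beta \in \mathbb{R}$, then
$$\int_a^b (\alpha f + \beta g) = \alpha \int_a^b f + \beta \int_a^b g.$$
Rule 2) Integrals Preserve $\le$.
Suppose that $f(x) \le g(x)$ for all $x \in [a, b]$. Then
$$\int_a^b f \le \int_a^b g.$$
It follows that
$$\left| \int_a^b f \right| \le \int_a^b |f|.$$
Rule 3) Positivity. Suppose that $f(x) \ge 0$ for all $x \in [a, b]$. If there is a point $c \in [a, b]$ such that f(c) > 0, then
$$\int_a^b f > 0.$$
Rule 4) Substitution.
Suppose $g'(x)$ is continuous on [a, b]. Then, if f is continuous on g[a, b],
$$\int_{g(a)}^{g(b)} f(u)\,du = \int_a^b f(g(x))\,g'(x)\,dx.$$
Rule 5) Integration by Parts.
Suppose $f'$ and $g'$ are continuous on [a, b]. Then
$$\int_a^b f g' = f(b)g(b) - f(a)g(a) - \int_a^b f' g.$$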
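As a sanity check on the substitution and integration by parts rules, here is a small numerical sketch. The midpoint-rule helper and the particular test functions (g(x) = x^2 with f = cos for substitution; u(x) = x and v(x) = sin x for integration by parts) are illustrative assumptions, not taken from the notes.

```python
import math

def integrate(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

a, b = 0.0, 1.5

# Substitution: integral of f(u) du from g(a) to g(b)
# equals integral of f(g(x)) g'(x) dx from a to b.
g = lambda x: x * x
dg = lambda x: 2.0 * x
f = math.cos
sub_left = integrate(f, g(a), g(b))
sub_right = integrate(lambda x: f(g(x)) * dg(x), a, b)

# Integration by parts with u(x) = x, v(x) = sin x:
# integral of u v' = u(b)v(b) - u(a)v(a) - integral of u' v.
u, du = (lambda x: x), (lambda x: 1.0)
v, dv = math.sin, math.cos
parts_left = integrate(lambda x: u(x) * dv(x), a, b)
parts_right = u(b) * v(b) - u(a) * v(a) - integrate(lambda x: du(x) * v(x), a, b)

print(sub_left, sub_right)
print(parts_left, parts_right)
```

Each printed pair should agree up to the error of the numerical integration.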
Proof. Rules 1), 4), 5) come from Corollary 51 and corresponding rules for derivatives. So let's do these first.
Rule 1). We want to show that for all $x \in [a, b]$:
$$\int_a^x (\alpha f + \beta g) = \alpha \int_a^x f + \beta \int_a^x g.$$
Take derivatives of the function of x on the left using the fundamental theorem. You get $(\alpha f + \beta g)(x)$. But the derivative of the function of x on the right is also $(\alpha f + \beta g)(x)$, using the linearity of derivatives.
So now we know that the left hand function of x has the same derivative as the right hand function of x. But then Corollary 41 of the mean value theorem says that the two functions differ by a constant K; i.e.,
$$\int_a^x (\alpha f + \beta g) = \alpha \int_a^x f + \beta \int_a^x g + K.$$
What is K? Set x = a. All three integrals are then 0, so K = 0.
Rule 4). We want to show that for all $x \in [a, b]$:
$$\int_{g(a)}^{g(x)} f(u)\,du = \int_a^x f(g(t))\,g'(t)\,dt.$$
Again we differentiate the left hand function of x, using the fundamental theorem of calculus and the chain rule, obtaining $f(g(x))\,g'(x)$. When we differentiate the right hand side as a function of x, we get the same answer using just the fundamental theorem. Again Corollary 41 of the mean value theorem tells us that the left and right hand side must differ by a constant. Plug in x = a to see that the constant must be 0.
Rule 5). Prove this as an exercise imitating the proof of Rule 1 using the rule for the derivative of a product.
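Rule 2). We want to show assuming a < b and $f(x) \le g(x)$ for all $x \in [a, b]$ that $\int_a^b f \le \int_a^b g$. To do this, look at h = g - f. Then h is $\ge 0$ on [a, b] and Axiom 1 for integrals says $\int_a^b h \ge 0$. By linearity,
$$\int_a^b g - \int_a^b f = \int_a^b h \ge 0.$$
This implies the result. The inequality $\left|\int_a^b f\right| \le \int_a^b |f|$ then follows by applying this to $-|f| \le f \le |f|$.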
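Rule 3). We want to show that $\int_a^b f > 0$. For this, you should draw a picture. See Figure 60. By the continuity of f at c, if we're given $\varepsilon = f(c)/2$, then there is a $\delta > 0$ so that $|x - c| < \delta$ implies $|f(x) - f(c)| < f(c)/2$. This means that
$$-\frac{f(c)}{2} < f(x) - f(c) < \frac{f(c)}{2}.$$
Add f(c) to this and get, for $x \in (c - \delta, c + \delta)$,
$$0 < \frac{f(c)}{2} < f(x) < \frac{3f(c)}{2}.$$
Shrinking $\delta$ if necessary, we may assume that $(c - \delta, c + \delta) \subset [a, b]$. It follows using Axiom 2 for integrals and the fact that the integral preserves $\le$ that
$$\int_a^b f = \int_a^{c-\delta} f + \int_{c-\delta}^{c+\delta} f + \int_{c+\delta}^b f \ge \int_{c-\delta}^{c+\delta} \frac{f(c)}{2} = \frac{f(c)}{2}(2\delta) > 0.$$
Here we have assumed a < c < b. If a = c or c = b, the result still works. We leave this to you to prove as an exercise.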
26
Answer.
Figure 60: Picture proof that a non-negative continuous function f(x) on an interval [a, b] such that f(c) > 0 at some point c must be positive in a subinterval $(c - \delta, c + \delta)$. Here take $\varepsilon = f(c)/2$ and the corresponding $\delta$, with the graph of y = f(x) above the interval $(c - \delta, c + \delta)$ in the pink box.
Joseph Liouville (1809-1882) proved that such integrals as
$$\mathrm{Erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2}\,dt$$
and
$$F(x|m) = \int_0^x \left(1 - m \sin^2 t\right)^{-1/2} dt$$
cannot be expressed in terms of a finite number of elementary functions. A reference is: D.G. Mead, American Math. Monthly, 68 (1961), 152-156. The error function comes up in statistics and in the solution of various partial differential equations. The elliptic integrals come up in various sorts of applied math. problems as well as in number theory.
Mathematica, Matlab, Maple, etc. attempt to do all "doable" integrals. But of course these programs have to recognize that some integrals simply are not doable. See Wolfram's book to accompany Mathematica for some discussion of this problem.
Leibniz's search for closed form expressions of integrals. J. Stillwell, Math. and its History, p. 110, says: "The search for closed forms was a wild goose chase but, like many efforts to solve intractable problems, it led to worthwhile results in other directions. Attempts to integrate rational functions raised the problem of factorization of polynomials and led ultimately to the fundamental theorem of algebra .... Attempts to integrate $(1 - x^4)^{-1/2}$ led to the theory of elliptic functions.... the problem of deciding which algebraic functions may be integrated in closed form has been solved only recently, though not in a form suitable for calculus textbooks, which continue to remain oblivious to most of the developments since Leibniz." See J. H. Davenport, On the Integration of Algebraic Functions, Lecture Notes in Computer Science, Springer-Verlag, 1981.
Unlike Leibniz, Newton evaluated integrals by expanding functions in power series and integrating the power series term-by-term. Isaac Newton offered his works on calculus to the British Royal Society and Cambridge University Press but the works were rejected. Thus Leibniz managed to publish the first paper on calculus. As Stillwell notes, loc. cit., p. 109: "This led to Leibniz's initially receiving credit for the calculus and later to a bitter dispute with Newton and his followers over the question of priority for the discovery." Stillwell finds (p. 110): "One thing has changed: it is now much easier to publish a calculus book than it was for Newton."
27
Next we consider the Taylor formula which is a finite version of Taylor series. In 1715 Brook Taylor (1685-1731) was the first person to publish the formula.
Theorem 53 Taylor's Formula with Remainder.
Suppose that J is an interval and $f : J \to \mathbb{R}$ has n continuous derivatives on the interval. If $a, x \in J$, we have
$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n,$$
where the remainder is
$$R_n = \frac{1}{(n-1)!} \int_a^x (x-t)^{n-1} f^{(n)}(t)\,dt.$$
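Proof. Induction on n.
Start with the fundamental theorem of calculus. This says
$$f(x) - f(a) = \int_a^x f'(t)\,dt,$$
which is the case n = 1. For the induction step, integrate by parts:
$$R_n = \frac{1}{(n-1)!} \int_a^x (x-t)^{n-1} f^{(n)}(t)\,dt = \frac{1}{(n-1)!} \left\{ \left[ -\frac{1}{n}(x-t)^n f^{(n)}(t) \right]_{t=a}^{t=x} + \frac{1}{n} \int_a^x (x-t)^n f^{(n+1)}(t)\,dt \right\}$$
$$= \frac{1}{n!} f^{(n)}(a)(x-a)^n + R_{n+1}.$$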
It is much easier to remember the alternate formula for the remainder. It looks much like the next term in the Taylor series.
Corollary 54 Taylor's Formula with Alternate Remainder. Under the same hypotheses as for Taylor's formula, we can find a point c between a and x such that the remainder in Taylor's formula has the form:
$$R_n = \frac{1}{n!} f^{(n)}(c)(x-a)^n.$$
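Proof. Case 1. a < x. Let m and M denote the minimum and maximum of $f^{(n)}$ on [a, x]. In this case, since integrals (in the right direction) preserve inequalities, we have
$$m \frac{(x-a)^n}{n!} = \frac{m}{(n-1)!} \int_a^x (x-t)^{n-1}\,dt \qquad (35)$$
$$\le R_n = \frac{1}{(n-1)!} \int_a^x (x-t)^{n-1} f^{(n)}(t)\,dt \qquad (36)$$
$$\le \frac{M}{(n-1)!} \int_a^x (x-t)^{n-1}\,dt = \frac{M}{n!}(x-a)^n. \qquad (37)$$
Therefore
$$m \le \frac{R_n\, n!}{(x-a)^n} \le M. \qquad (38)$$
By the intermediate value theorem for $f^{(n)}$, we know there is a point c in [a, x] such that
$$\frac{R_n\, n!}{(x-a)^n} = f^{(n)}(c).$$
Solve for $R_n$ to obtain the desired formula.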
Case 2. x < a.
We leave the details of this case to the reader as an exercise. Note that $\int_a^x = -\int_x^a$, and keep track of the sign of $(x-t)^{n-1}$ for $t \in [x, a]$. The inequality is reversed if n is odd. You have to take account of that to get formula (35). But in the latter case another sign switch will occur when you derive formula (38). The joy of inequalities.
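The integral form of the remainder and the bounds used in Case 1 can be tested numerically. This is a minimal sketch for the exponential function with a = 0, x = 1, n = 4; the function names and the midpoint-rule integration are illustrative assumptions. For the exponential, every derivative is again the exponential, so m = e^a and M = e^x when a < x.

```python
import math

def taylor_polynomial(x, a, n):
    """Sum of the first n Taylor terms (k = 0, ..., n-1) of exp about a."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n))

def integral_remainder(x, a, n, steps=20_000):
    """R_n = 1/(n-1)! * integral_a^x (x-t)^(n-1) exp(t) dt, by the midpoint rule."""
    h = (x - a) / steps
    total = 0.0
    for j in range(steps):
        t = a + (j + 0.5) * h
        total += (x - t) ** (n - 1) * math.exp(t)
    return h * total / math.factorial(n - 1)

a, x, n = 0.0, 1.0, 4
R_n = integral_remainder(x, a, n)

# Taylor's formula: f(x) = polynomial + R_n.
print(math.exp(x), taylor_polynomial(x, a, n) + R_n)

# Bounds from the proof of the alternate remainder formula.
m, M = math.exp(a), math.exp(x)
lower = m * (x - a) ** n / math.factorial(n)
upper = M * (x - a) ** n / math.factorial(n)
print(lower, R_n, upper)
```

The remainder lands between the two bounds, which is exactly what lets the intermediate value theorem produce the point c.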
28
This function has the property that $f^{(n)}(x)$ exists for all values of x. Moreover it can be shown that $f^{(n)}(0) = 0$ for all n = 0, 1, 2, 3, .... See the exercises. So the Taylor formula with a = 0 says
$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!}(x-a)^k + R_n = 0 + R_n.$$
Since $f(x) = e^{-1/x} > 0$ if x > 0, we see that if we let $n \to \infty$, the Taylor series does not represent the function for x > 0. Thus, for positive x, the remainder $R_n$ cannot approach 0 as $n \to \infty$.
We say that such a function is "not analytic". Weierstrass defined analytic functions around 1870 and found many applications. His student Sonya Kovalevsky (alias Sofia Kovalevskaya) wrote a thesis about power series (in 2 or more variables) solutions of partial differential equations.
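A quick numerical look at this function may help. The sketch below (function name and sample points are illustrative choices) evaluates f and the ratios f(x)/x^n for small positive x. These ratios become tiny as x approaches 0 from the right for every fixed n, which is the behavior behind all the derivatives at 0 vanishing, while f itself stays strictly positive for x > 0.

```python
import math

def f(x):
    """The function e^(-1/x) for x > 0, extended by 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

for x in (0.5, 0.1, 0.05, 0.01):
    ratios = [f(x) / x ** n for n in (1, 5, 10)]
    print(x, f(x), ratios)

# Every Taylor polynomial of f about 0 is identically 0,
# so the remainder R_n equals f(x) itself, which is positive for x > 0.
print(f(0.5) > 0, f(0.0), f(-1.0))
```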
Example 2. The Exponential.
$$e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$
Figure 61: The function $f(x) = e^{-1/x}$ for $x > 0$ and $f(x) = 0$ for $x \le 0$.
The ratio test (which we consider later) shows that the remainder (2nd form) goes to 0 as $n \to \infty$. Stirling's formula for n! would make this even clearer as it says
$$n! \sim \sqrt{2\pi n}\, n^n e^{-n}, \quad \text{as } n \to \infty,$$
meaning that
$$\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\, n^n e^{-n}} = 1.$$
Read "$\sim$" as "is asymptotic to." A proof of Stirling's formula can be found in Lang, Undergraduate Analysis.
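Stirling's formula is easy to watch in action. The sketch below computes the ratio of n! to the Stirling approximation; working with logarithms through `math.lgamma` (which returns log of Gamma, so `lgamma(n + 1)` is log n!) is an implementation choice made here to avoid overflow for large n.

```python
import math

def stirling_ratio(n):
    """Return n! / (sqrt(2 pi n) n^n e^(-n)), computed via logarithms."""
    log_factorial = math.lgamma(n + 1)  # log(n!)
    log_stirling = 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n
    return math.exp(log_factorial - log_stirling)

for n in (1, 10, 100, 1000, 10_000):
    print(n, stirling_ratio(n))
```

The ratios decrease toward 1 as n grows, in line with the limit above.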
As we noted earlier $e^{ix} = \cos x + i \sin x$, where $i = \sqrt{-1}$. So you can get the Taylor series for cos x and sin x out of that for $e^{ix}$.
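$$\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n,$$
$$-\log(1-x) = \sum_{n=1}^{\infty} \frac{x^n}{n},$$
$$(1+x)^s = \sum_{n=0}^{\infty} \binom{s}{n} x^n,$$
$$\sqrt{1+x} = \exp\left(\tfrac{1}{2} \log(1+x)\right).$$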
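$$P_n(x) = \frac{1}{2^n n!} \frac{d^n}{dx^n} \left(x^2 - 1\right)^n.$$
This is called the Rodrigues formula for the Legendre polynomials. The first few are:
$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = \frac{1}{2}\left(3x^2 - 1\right), \quad P_3(x) = \frac{1}{2}\left(5x^3 - 3x\right).$$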
Mathematica knows them and will graph them quickly. The Mathematica command
Plot[Table[LegendreP[i, x], {i, 30}], {x, -1, 1}]
produced the following figure.
$$\left(1 - 2xt + t^2\right)^{-1/2} = \sum_{n=0}^{\infty} P_n(x) t^n, \quad \text{for } |t| < 1 \text{ and } |x| \le 1.$$
It is possible to use the generating function to derive properties of the Legendre polynomials quickly. For example, you can
see that Pn (1) = 1.
References for Legendre polynomials include: N.N. Lebedev, Special Functions, S. Wolfram, Mathematica, R. Courant
and D. Hilbert, Methods of Mathematical Physics, Vol. I.
The Legendre polynomials arise in problems of applied math. which have spherical symmetry. They form a system of orthogonal polynomials on the interval [-1, 1] with the inner product
$$\langle f, g \rangle = \int_{-1}^{1} f(t) g(t)\,dt.$$
The Legendre polynomials are pairwise orthogonal for this inner product; i.e., $\langle P_n, P_m \rangle = 0$ if $n \ne m$. It is possible to expand arbitrary continuous functions on [-1, 1] in series of Legendre polynomials (a generalized Fourier series) convergent in the sense of the norm coming from this inner product $\|f\| = \sqrt{\langle f, f \rangle}$. We will say much more about such things in the last section of Lectures, II. The Legendre polynomials are important for obtaining explicit solutions of partial differential equations with spherical symmetry; e.g., Schrödinger's equation for the hydrogen atom.
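The Rodrigues formula and the orthogonality relation can be verified exactly with rational arithmetic. The following sketch represents a polynomial by its list of coefficients (constant term first); that representation and the helper names are assumptions made for illustration. It expands (x^2 - 1)^n by the binomial theorem, differentiates n times, and divides by 2^n n!.

```python
from fractions import Fraction
from math import comb, factorial

def legendre(n):
    """Coefficients [c0, c1, ...] of P_n, computed from the Rodrigues formula."""
    # (x^2 - 1)^n = sum over j of C(n, j) (-1)^(n-j) x^(2j)
    coeffs = [Fraction(0)] * (2 * n + 1)
    for j in range(n + 1):
        coeffs[2 * j] = Fraction(comb(n, j) * (-1) ** (n - j))
    for _ in range(n):  # differentiate n times
        coeffs = [k * c for k, c in enumerate(coeffs)][1:]
    scale = Fraction(1, 2 ** n * factorial(n))
    return [scale * c for c in coeffs]

def evaluate(p, x):
    """Value of the polynomial with coefficient list p at x."""
    return sum(c * x ** k for k, c in enumerate(p))

def inner(p, q):
    """Exact integral of p(t) q(t) over [-1, 1]."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:  # odd powers integrate to 0 on [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total

print(legendre(2), legendre(3))
print([evaluate(legendre(n), 1) for n in range(6)])
print(inner(legendre(2), legendre(4)), inner(legendre(3), legendre(3)))
```

The output reproduces P_2 and P_3 as listed above, shows P_n(1) = 1, and shows that distinct Legendre polynomials have inner product 0.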
Stillwell, Mathematics and its History, p. 107 says: "It is misleading ... to describe Newton as a founder of calculus unless one understands calculus, as he did, as an algebra of infinite series. In this calculus, differentiation and integration are carried out term by term on powers of x and hence are comparatively trivial."
Newton studied the binomial series and used it to obtain the power series for arcsine in 1669. Mercator found the power series for log(1 - x) in 1668.
There is also
$$\arctan x = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{2n+1}, \quad \text{if } |x| < 1.$$
By a theorem of Abel, this converges for x = 1 and gives the Leibniz formula:
$$\frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \cdots.$$
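To see how differently the arctangent series behaves inside the interval and at the endpoint x = 1, here is a short sketch computing partial sums of the series x - x^3/3 + x^5/5 - ... (the function name and the numbers of terms are illustrative choices).

```python
import math

def arctan_partial_sum(x, terms):
    """Sum of the first `terms` terms of x - x^3/3 + x^5/5 - ..."""
    return sum((-1) ** n * x ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

# Inside the interval of convergence the series converges quickly.
print(arctan_partial_sum(0.5, 30), math.atan(0.5))

# At x = 1 (the Leibniz series for pi/4) convergence is very slow.
for terms in (10, 100, 1000, 10_000):
    print(terms, 4 * arctan_partial_sum(1.0, terms), math.pi)
```

For an alternating series with decreasing terms, the error after N terms is at most the size of the next term, here 1/(2N+1), which explains the slow convergence at x = 1.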
Of course there are many applications of power series. For example, in the theory of perturbations, small deviations from the state of a physical system, one plugs a power series into the differential equation describing the system and computes as many coefficients of the solution as possible, in order to deduce information about the system.
29
A Riemann sum is formed from a partition $P = \{a = x_0 < x_1 < \cdots < x_n = b\}$ of the interval [a, b] on the x-axis. Here, for $s_k \in [x_{k-1}, x_k)$, the Riemann sum is
$$S(f, P) = \sum_{k=1}^{n} f(s_k)(x_k - x_{k-1}).$$
"Refining the partition" means that the lengths of all subintervals $[x_{k-1}, x_k)$ get smaller. Assume that f is continuous or piecewise continuous on [a, b]. You may view this as approximating the function f(x) by the step function with value $f(s_k)$ on the kth subinterval $[x_{k-1}, x_k)$. See Figure 63.
In 1875 Darboux modified the Riemann sums by taking $f(s_k)$ to be either the max or min of f(x) on the kth subinterval. He thus got upper and lower sums. For a function f(x) to be Riemann integrable one wants the upper and lower sums to approach the same limit as the partition P is refined. The common limit of the upper and lower Darboux sums is called the Riemann integral $\int_a^b f(x)\,dx$. This is fine for piecewise continuous functions.
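The squeezing of upper and lower Darboux sums is easy to observe for an increasing function, where the minimum on each subinterval is at the left endpoint and the maximum is at the right endpoint. The sketch below makes that simplifying assumption (it is not a general Darboux sum routine) and uses f(x) = x^2 on [0, 1], whose integral is 1/3.

```python
def darboux_sums_increasing(f, a, b, n):
    """Lower and upper Darboux sums of an increasing f on n equal subintervals."""
    dx = (b - a) / n
    xs = [a + k * dx for k in range(n + 1)]
    lower = sum(f(xs[k - 1]) * dx for k in range(1, n + 1))  # min at left endpoint
    upper = sum(f(xs[k]) * dx for k in range(1, n + 1))      # max at right endpoint
    return lower, upper

square = lambda x: x * x

for n in (10, 100, 1000):
    lower, upper = darboux_sums_increasing(square, 0.0, 1.0, n)
    print(n, lower, upper, upper - lower)
```

For an increasing function on equal subintervals the gap between the two sums is (f(b) - f(a))(b - a)/n, so it shrinks to 0 as the partition is refined.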
Our Axioms 1 and 2 for the integral actually come out of upper and lower Darboux sums. We will not go this route however. Instead we will prove the existence of the integral for continuous functions (or piecewise continuous functions) by a method closer to that of Lebesgue (1902). Lebesgue's methods led to a better theory of Fourier series and to the modern theory of probability measures.
If you are only interested in the calculation of the sort of integrals that arise in calculus, then it does not matter what definition of integral you use. For a continuous function on a closed finite interval or a piecewise continuous function (i.e., a function that is mostly continuous but has a finite number of jump discontinuities) the Lebesgue integral equals the Riemann integral. However, the Lebesgue integral will integrate more functions than the Riemann integral. For a positive function on an infinite interval (or a positive unbounded function) the Lebesgue integral is the same as the improper Riemann integral, when the latter exists.
Figure 63: The Riemann sum is obtained by summing terms $f(s_k)(x_k - x_{k-1})$ over k = 1, ..., 7. Since our function is non-negative, this means we add up the areas of the rectangles with dotted sides, green tops and bases on the x-axis.
For theoretical work, the Lebesgue integral is much nicer than the Riemann integral. For example, the interchange of integral and limit is legal in many more situations for the Lebesgue integral. The hypotheses of the Lebesgue dominated convergence theorem are quite weak in comparison to those that we will discuss later for the Riemann integral. Similarly the story of repeated integrals is much simpler for Lebesgue integrals (see the statement of Fubini's theorem).
There are at least 2 ways to approach the Lebesgue integral.
Lebesgue's Way. Define the Lebesgue measure $\mu(S)$ of a (measurable) subset $S \subset \mathbb{R}$ where $\mu[a, b] = b - a$. Suppose the bounded function f has range contained in the interval [c, d] on the y-axis. Then partition the interval on the y-axis rather than the x-axis by partition $Q = \{c = y_0 < y_1 < \cdots < y_n = d\}$. Thus obtain step functions approximating f involving sets $S_k = f^{-1}[y_{k-1}, y_k]$. Assume the function is such that all these sets are Lebesgue measurable. Let $\eta_k \in [y_{k-1}, y_k]$. Then the step function $\varphi(x)$ is defined to have the constant value $\eta_k$ on $S_k$, for k = 1, ..., n. The Lebesgue integral of step function $\varphi(x)$ is defined to be
$$\sum_{k=1}^{n} \eta_k\, \mu(S_k).$$
Figure 64: To get a Lebesgue integral create a step function by partitioning the y-axis. Then define the step function to be constant on the inverse image of the intervals on the y-axis. For example, $\varphi(x) = \eta_1$ for all $x \in f^{-1}[y_0, y_1] = S_1$.
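Here is a small sketch of Lebesgue's way for one concrete function, f(x) = x^2 on [0, 1]. The choice of function is an assumption made so that the inverse images are easy to write down: the set of x with f(x) between two heights y_{k-1} and y_k is the interval from sqrt(y_{k-1}) to sqrt(y_k), whose measure is the difference of the square roots. Taking the step function value to be the bottom or the top of each y-interval gives sums below and above the integral 1/3.

```python
import math

def lebesgue_sums(n):
    """Sums of (height) * (measure of inverse image) for f(x) = x^2 on [0, 1],
    with the range [0, 1] cut into n equal pieces on the y-axis."""
    ys = [k / n for k in range(n + 1)]
    lower = upper = 0.0
    for k in range(1, n + 1):
        # S_k = {x in [0, 1] : y_{k-1} <= x^2 <= y_k} is an interval.
        measure = math.sqrt(ys[k]) - math.sqrt(ys[k - 1])
        lower += ys[k - 1] * measure  # step value at the bottom of the y-interval
        upper += ys[k] * measure      # step value at the top of the y-interval
    return lower, upper

for n in (10, 100, 1000):
    lower, upper = lebesgue_sums(n)
    print(n, lower, upper)
```

The two sums differ by exactly (1/n) times the total measure of [0, 1], so both approach 1/3 as the partition of the y-axis is refined.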
The New-Fangled Way. Use Cauchy sequences of step functions with respect to the $L^1$-norm on the space of nice functions on [a, b]
$$\|f\|_1 = \int_a^b |f(x)|\,dx.$$
In order to do this, we must think about normed vector spaces (infinite dimensional) of functions. The method is the same as that used to create the real numbers as limits of Cauchy sequences of rational numbers.
The Lebesgue measure $\mu$ of a set S of real numbers starts out with the idea that an interval I = [a, b] or (a, b) or [a, b) or (a, b] has measure $\mu(I) = b - a$. Measurable sets are created by taking countable unions, complements, countable intersections. Then the measure is assumed to have certain properties on measurable sets. In particular, the Lebesgue measure should be non-negative and countably additive meaning that if one has an infinite sequence $S_n$ of pairwise disjoint ($n \ne m \implies S_n \cap S_m = \emptyset$) measurable sets then
$$\mu\left(\bigcup_{n=1}^{\infty} S_n\right) = \sum_{n=1}^{\infty} \mu(S_n).$$
We will not say more about the Lebesgue integral or measure here.
Exercises I
Exercises 1
( L D L ) ( x ) = L( L( x )).
Exercises 2
{2
3)
n = 1, 2,3,.....} or
$a^{n+m} = a^n a^m$.
b) Prove the formula in part a) if both n and m are negative integers.
c) Again suppose that a is real and n,m are positive integers. Show that
$(a^n)^m = a^{nm}$.
4) a) Suppose that x is a positive real number. Show that there is a positive integer n such that $\frac{1}{n} < x$.
b) Show that $\sqrt{7}$ is irrational.
5) a) Use the 2 order axioms ORD1 and ORD2 listed on p. 22 of the lectures plus the definition of a < b to
deduce that x < y implies x+z < y+z (which is Fact 3 in our list of facts about order).
b) Using the properties of inequalities (see our list of facts about order) and the definition of
absolute value, find the set of real numbers x such that $|x^2 - 1| < \frac{1}{4}$.
Exercises 3
1) Tell whether the sequences below have a limit. Find the limit, if possible.
Prove that your answer is correct.
a) $1 + (-1)^n$
b) $n!/n^n$
c) (1-n)/(1+n)
2) Assume that $\{x_n\}$ is a sequence of real numbers. Show that if $a = \lim_{n \to \infty} x_n$ exists, then the set $\{x_1, x_2, x_3, \ldots\}$ is bounded (both above and below).
3) Assume that {xn} and {yn} are sequences of real numbers. Show that if a = lim xn exists and
n
4) True-False. State whether the following are true or false. Give a brief reason for your answer
(such as a reference to some fact in these notes or a textbook).
a) Every bounded sequence of real numbers is convergent to a real number.
b) Every bounded sequence of real numbers is Cauchy.
c) Suppose that $a_n \le b_n \le c_n$ for $n \ge n_0$. If the limit $L = \lim_{n \to \infty} a_n = \lim_{n \to \infty} c_n$ exists and is the same for both $\{a_n\}$ and $\{c_n\}$, then $\{b_n\}$ has a limit which is also the same; i.e. $L = \lim_{n \to \infty} b_n$.
5) a) Define $\infty = \lim_{n \to \infty} a_n$.
b) Suppose that $\{a_n\}$ is an unbounded sequence of non-negative real numbers. Show that $\{a_n\}$ has a subsequence converging to $\infty$ according to your definition in part a).
Exercises 4
1) Find the following limits. Then explain your answer using the definition of limit. This means given ,
find as a function of .
a) lim
x0
(5 x 1)
b)
lim x 2
x 1
c)
x.
lim
x0
x>0
2) True-False. Tell whether the following statements are true or false. Give a brief reason for your
answer.
a)
lim f ( x) = L implies lim f (a + h) = L .
xa
b)
h0
lim f ( x) = L implies L = f(a).
xa
x>a
Hint: Draw a picture of the graph of y=f(x) near x=a.
3) Prove or disprove:
a)
lim f ( x) = 3 lim f ( x)
x 3a
xa
b)
lim f ( x) = lim f (3 x) .
x 3a
xa
1
lim f ( x) = L lim f ( ) = L.
x
x0 x
x>0
Exercises 5
1) Determine where the following functions are continuous. Explain your answers briefly. Draw the graphs.
a)
1, if x
0, otherwise.
b) f ( x) =
1
, 0 to two points
2k
1
1
1
1
,
,
and
for all integers k' with k0. So the function consists
2k + 1 2k + 1
2k 1 2k 1
2)
of infinitely many line segments. This example implies that it is wrong to say that continuity
means being able to draw the graph without lifting the pen from the paper. You cannot really draw
this graph.
a) Use the Intermediate Value Theorem to show that any polynomial of odd degree has a real root.
Why doesn't this work for polynomials of even degree?
b) Suppose that f:[a,b][a,b] is continuous. Show that there is an x[a,b] such that f(x)=x
The point x is called a fixed point of f. Hint: Look at g(x)=f(x)-x.
3) True-False. Tell whether the following statements are true or false. Give a brief reason for your
answer.
a)
4)
b)
c)
Suppose that f:[a,b] is continuous and strictly increasing; i.e., x < y, for x,y in [a,b] implies
f(x) < f(y).
exists and is
Exercises 6
1) a) Assuming the derivative of $e^x$ is $e^x$, find the derivative of $\log(x) = \log_e(x) = \ln(x)$ using the theorem about derivatives of inverse functions. (In this course, e is the only base we use.)
b) Compute the derivative of the following function using properties of derivatives such as the chain rule: $f(x) = x^{1/x}$.
Hint: Recall that the definition of $x^a$ for x>0 is $x^a = e^{a \log x}$.
2) Assume that f and g are functions on the open interval (a,b). Assume that both f and g are
differentiable at x(a,b). Suppose c is a real constant. Prove using the definition of derivative
that:
a) (f+g)(x)=f(x)+g(x). b) (cf)'=cf' if c is a constant.
c) Using mathematical induction, prove the formula for the derivative of xn, for n=1,2,3,... .
3) a) Define f(x) = xsin(1/x) if x0 and f(0)=0. Show that f(x) is not differentiable at x=0 but
3
Exercises 7
1) Prove the Generalized Mean Value Theorem also known as Cauchy's Mean Value Theorem.
This says the following. Assume that f and g are differentiable on (a,b) and continuous on [a,b]. Show there exists a point $c \in (a,b)$ such that
$[f(b)-f(a)]\,g'(c) = [g(b)-g(a)]\,f'(c)$.
Hint. As in the proof of the mean value theorem, consider a function to which you can apply Rolle's theorem. Try the function:
h(x)=[f(b)-f(a)][g(x)-g(a)]-[g(b)-g(a)][f(x)-f(a)].
2) Prove l'Hôpital's rule. This says the following. Assume that f and g are differentiable on an open interval (a,c). Suppose also that g(x) and g'(x) do not vanish in (a,c). Finally assume that both f(x) and g(x) approach 0 as x goes to c, with x < c. Then l'Hôpital's rule says:
If $\lim_{x \to c,\ x<c} \frac{f'(x)}{g'(x)} = k$, then $\lim_{x \to c,\ x<c} \frac{f(x)}{g(x)} = k$.
e 1( x 1) ,
3) a) Define g ( x ) =
0,
x<c
Show g(n)(x) exists for all n'+ and all real numbers x. Again there are 3 cases.
1 1( x1)
g ( n ) ( x ) = Pn
, for x>1,
e
x 1
Exercises 8
1) Prove that
sin x dx
2) Assume that a<b<c. Note that if a function f(x) is continuous on [a,b] and also continuous on [b,c], but
has a jump discontinuity at x=b, then we can extend the integral to say f is integrable on [a,c] and
c
Show that this integral still satisfies the our axioms I and II.
3) Define sign(x) = 1 if x>0, sign(x)=-1 if x<0, and sign(0)=0. Show that for any a<b, we have (using the
b
sign( x)dx =| b | | a | .
preceding problem):
Hint. First note that f has a maximum M and minimum m on [a,b]. Why? Then use the axioms for
b
f (t )dt M (b a ).
Then
1
f (t )dt is in the interval [m,M].
b a a
5) Compute
1
1
1 t 2 dt = t 1 = 2 ?
1
7) Write out Taylors formula with remainder as in Lang p. 109 with a=0 and n=6 for the function
f(x) = log(1-x).
8) Cauchy-Schwarz Inequality.
2
f (t ) g (t )dt
a
f (t ) dt g (t ) 2 dt.
2
( f (t ) xg (t ) )
dt = Ax 2 + Bx + C. This is always
I, Practice Exam 1
d) lim
n
e) If f:(c,d) and a(c,d), define the 2-sided limit
lim
xa
xn ;
f ( x) ;
x a , x (c , d )
f) Cauchy sequence;
2)
g) subsequence.
with a(c,d).
3)
Hint: You need recall how to negate a statement with the quantifiers , .
a) Prove that an increasing sequence of real numbers which is bounded above has a limit.
b) Show that a Cauchy sequence is bounded.
c) Show that if a sequence {xn} is Cauchy and it has a convergent subsequence, then {xn} converges to
the same limit as the subsequence.
d) Prove that the set '+ does not have an upper bound in .
e) Suppose that
4)
lim xn = L
n
and
5)
$a_1 = 1, \quad a_{n+1} = \sqrt{1 + a_n}$.
Show that
Hint: First show that $0 < a_n < 2$ for all n, by induction. Then show that the sequence is strictly increasing. Then note that L must satisfy $L^2 = 1 + L$. Why?
1
6) State whether the following sequences have limits. If they do, find the limit and prove that the sequence
approaches that limit using the definition of limit. If they don't, explain why they don't.
a)
lim sin(n ) ;
n
n
;
2
n
+
1
n
b) lim
c)
n2 n
;
lim
2
n
+
1
n
d)
1
.
lim
n
2
n
7) True-False.
Tell whether the following statements are true or false. Give a brief reason for your answer.
a) The only real number a which satisfies the inequality $|a| < \varepsilon$, for every $\varepsilon > 0$, is the number a=0.
b) The sum of two irrational numbers is irrational.
c) For all real numbers x,y we have |x-y||x|-|y|.
d) If $\lfloor x \rfloor$ denotes the largest integer less than or equal to x, then $\lim_{x \to 1} \lfloor x \rfloor = 1$.
m' + ;
2m + 1
a)
b)
{x
x;
lim
x0
x>0
Find
lim f ( x)
xa
xa
IPractice Exam
1)
c) product rule
a)
b) linearity
c) integration by parts
d) substitution formula
e) integrals preserve
5) True-False.
Tell whether the following statements are true or false. If true, give a brief reason for your
answer. If false, give a counterexample.
a) Suppose f:(a,b)is continuous. Then there is a point c in (a,b) such that f(x) f(c) for all x(a,b).
b)
fg = f g
c) Suppose $f:[a,b] \to \mathbb{R}$ is continuous and f is differentiable on (a,b) with $f'(x) = 0$ for all $x \in (a,b)$. Then f(x)=constant, for all $x \in [a,b]$.
d) Recall that a function f(x) defined for $x \in \mathbb{R}$ is even iff f(-x)=f(x) for all x. We say f(x) is odd iff f(-x)=-f(x), for all x. Then the derivative $f'(x)$ of an even function f(x) is an odd function.
e) Suppose that x=g(y) is the inverse function to y=f(x). Then g(y)=1/f(y).
f) f,g continuous on [a,b] implies max{f,g} continuous on [a,b].
g) f continuous at x=c implies f differentiable at x=c.
1, x rational
f ( x) =
0, x irrational
c) Compute lim 1 +
n
x
.
n
d) Suppose that f is a differentiable function on the whole real line such that $f'(x) = 2x f(x)$ for all x. Show that there is a constant C so that $f(x) = C e^{x^2}$ for all x.
e) Compute
d
g ( x t )dx.
dt 0
7) a) Suppose that f(x) > 0 for all x in (a,b). Why does y=f(x) have an inverse function x=g(y)? What is
the formula for the derivative of the inverse function? Prove it.
b) State Taylor's formula with remainder (2nd version). Apply it to get the formula for log(1-x) using the first 3 terms of the Taylor series plus remainder.
c) Define $e^x$ by its Taylor series and prove that $e^u e^v = e^{u+v}$.
d) Define log(y) as the inverse function to $e^x$ and show that log(uv) = log u + log v.
e 1/ x ,
8) Define a function f ( x ) =
0,
for x > 0
for x 0
function. Is f(x) represented by its Taylor series around the point 0? Why?
9) Show that the following function gives an example of a continuous function whose graph cannot be drawn
without lifting pen from paper. Define f(x) inductively as a map from [-1,1] to [-1,1] consisting of
straight line segments connecting the point (1/(2k),0) to the 2 points (1/(2k+1),1/(2k+1)) and
(1/(2k-1),1/(2k-1)) for k=1,2,3, ....
Sketch the function.
11/27/2010
from Wikipedia:
Sierpinski Triangle
3D Cantor Dust
Lorenz attractor
Coastline of Great Britain
Mandelbrot Set
1) Cantor Dust.
From the interval [0,1] remove the middle third. Then remove the middle third from the remaining 2 intervals [0,1/3] and [2/3,1]. Keep going with this removal of middle thirds forever. You end up with the Cantor dust. It is impossible to draw. We give the first 5 steps. Later we will see the Cantor Dust has box dimension $\ln 2/\ln 3 \approx 0.63$. This is a higher dimension than a point but smaller than an interval.
There are many interesting facts about the Cantor dust. For example, the set is uncountable, but if you integrate the function that is 1 on the Cantor set and 0 off the set (using the Lebesgue integral), you get 0. The Riemann integral cannot deal with this.
2) Sierpinski Triangle.
Start with an equilateral triangle and remove the center triangle. Remove the center triangles from each of the 3 remaining triangles.
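$$\dim_B S = \lim_{k \to \infty} \frac{\ln N(r^k)}{\ln(1/r^k)}.$$
Proof Sketch.
Given $\varepsilon > 0$ and r with 0<r<1, you just have to find a positive integer k such that
$$r^{k+1} < \varepsilon \le r^k.$$
Then
$$N(r^k) \le N(\varepsilon) \le N(r^{k+1}).$$
As ln(x) is monotone increasing,
$$\ln \frac{1}{r^k} \le \ln \frac{1}{\varepsilon} < \ln \frac{1}{r^{k+1}},$$
and
$$\ln N(r^k) \le \ln N(\varepsilon) \le \ln N(r^{k+1}).$$
So
$$\frac{\ln N(r^k)}{\ln(1/r^{k+1})} \le \frac{\ln N(\varepsilon)}{\ln(1/\varepsilon)} \le \frac{\ln N(r^{k+1})}{\ln(1/r^k)}.$$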
Example.
The Box dim of the Cantor set C.
Recall that C is obtained from the interval [0,1] by continually removing middle thirds. At each stage there are twice as many intervals as the preceding stage. And each interval has length 1/3 that of the preceding stage. Since N(1/3) = 2, we find that, by induction, $N(1/3^k) = 2^k$, for each $k \ge 1$.
Let r=1/3 in the preceding theorem and find
$$\dim_B C = \lim_{k \to \infty} \frac{\ln N(1/3^k)}{\ln 3^k} = \lim_{k \to \infty} \frac{\ln 2^k}{\ln 3^k} = \lim_{k \to \infty} \frac{k \ln 2}{k \ln 3} = \frac{\ln 2}{\ln 3} \approx 0.63.$$
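The counting in this example can be reproduced directly. The sketch below builds the intervals left after k rounds of removing middle thirds, using exact fractions; the function name and the list-of-pairs representation are illustrative assumptions. It then reports the number of intervals, their total length, and the ratio ln N / ln 3^k.

```python
import math
from fractions import Fraction

def cantor_stage(k):
    """Closed intervals (as pairs of Fractions) left after k removals of middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(k):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            next_intervals.append((left, left + third))    # keep the left third
            next_intervals.append((right - third, right))  # keep the right third
        intervals = next_intervals
    return intervals

for k in range(1, 8):
    stage = cantor_stage(k)
    count = len(stage)                                   # N(1/3^k) = 2^k
    total_length = sum(right - left for left, right in stage)
    ratio = math.log(count) / math.log(3 ** k)
    print(k, count, total_length, ratio)
```

The ratio equals ln 2 / ln 3 at every stage, while the total length (2/3)^k tends to 0.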
Problem 4. Show that the Sierpinski triangle is a self-similar set. Use this to see that the box dimension of the Sierpinski triangle is $\ln 3/\ln 2 \approx 1.58$ using the following theorem. You need to define functions of vectors in the plane. First put the origin at the left hand base point of the big triangle. Then figure out what function shrinks the big triangle to the small one at the origin. Next what vector must you add to that function to shift the left small triangle over to the right one at the base?
$$\lim_{k \to \infty} \frac{\ln N(r^k)}{\ln(1/r)^k} = \lim_{k \to \infty} \frac{\ln m^k}{\ln(1/r)^k} = \frac{\ln m}{\ln(1/r)}.$$
11/27/2010
M * En and
n t1
length( E ) H .
n
n t1
9 9
9 9
2k-1
k
f(x) = 1k , 3k ,!, 2 k 1
Problem 8. Prove that f(x) is increasing, continuous and has derivative f'(x)=0 except on the Cantor set C, which has Lebesgue measure 0. But f(0)=0 and f(1)=1. So the fundamental theorem of calculus fails for this function:
$$f(1) - f(0) \ne \int_0^1 f'(t)\,dt = 0.$$
A person moving toward you according to the Devil's staircase law y=f(x) would cover a unit distance in a unit of time, but you might never see him or her move even if you were watching all the time. Thus Korevaar, Mathematical Methods, p. 404, calls this function "the almost perfect sneak."
In 1872 Weierstrass found functions that were continuous everywhere but nowhere differentiable. This shocked many famous mathematicians who had thought such a function impossible.
Hermite described these functions as a "dreadful plague."
Poincaré wrote: "Yesterday, if a new function was invented it was to serve some practical end; today they are specially invented only to show up the arguments of our fathers, and they will never have any other use." Even as late as the 1960's, before "everyone" had a computer fast enough to graph these things, such examples were viewed as pathological monsters. Now there are thousands of websites with pictures of approximations of them.
$$f(t) = \sum_{k=1}^{\infty} \lambda^{(s-2)k} \sin(\lambda^k t).$$
[Figure: graph of the Weierstrass function f(t) for t between 0 and 0.01.]
There are lots of pictures of this graph on the web. For example
http://en.wikipedia.org/wiki/Weierstrass_function
http://planetmath.org/encyclopedia/WeierstrassFunction.html
Or see http://www.math.washington.edu/%7Econroy/
for an animation zooming in on the Weierstrass function
f
f (t )
sin(2k t ).
k 1
11/27/2010
f (t )
( s 2) k
sin(O k t ) is a
k 1
f (t )
sin(O t ).
( s 2) k
k 1
0 .002
( s 2) k
k 1
k 1
N
k 1
2O
0. 006
0.0 08
0.01
sin O k t h sin O k t
0. 004
-1
( s 2) k
k N 1
( s 2)) k
sin O k t h sin O k t
k N 1
Here we used the mean value theorem on the first N terms and that $|\sin x| \le 1$ for the rest of the terms.
Then we sum the geometric series to see that
$$|f(t+h) - f(t)| \le \frac{h \lambda^{(s-1)N}}{1 - \lambda^{1-s}} + \frac{2 \lambda^{(s-2)(N+1)}}{1 - \lambda^{s-2}} \le c\, h^{2-s},$$
where c is independent of h.
This implies that $\dim_B(\text{graph } f) \le s$ by the Lemma above.
To go the other way, take our sum defining f(t+h)-f(t) and split it into 3 parts, the first N-1 terms, the Nth term, and the rest. This implies that if
$$\lambda^{-(N+1)} \le h < \lambda^{-N},$$
then
1 ( s 2) N
O
, N .
20
O N d G O ( N 1) .
O ( N 1) d h O N
sin O N t h sin O N t !
1
.
10
f (t h ) f (t ) t
1 ( s 2) N 1 s 2 2 s
O
t O G .
20
20
There is lots more to say about fractals, but we will stop here.
I leave you with a picture of the Mandelbrot set from Wikipedia.
11
FINAL
1) Page 2 of fractals lecture. Show that the Cantor set can be expressed as certain triadic expansions
and thus is uncountable.
2) Page 3 of fractals lecture. Show that the box dimension of the unit square is 2.
3) Page 5 of fractals lecture. Show that the box dimension of the set of rational numbers is 1. Compare with
the box dimension of the Cantor set. How can it be that the Cantor set has a smaller box dimension than the
rationals even though the Cantor set is uncountable?
4) Page 5 of fractals lecture. Show that the Sierpinski triangle is a self-similar set. Show using Thm. 2 that
the box dimension of the Sierpinski triangle is ln3/ln2.
5) Answer the Why? in the Proof of Theorem 2 on p.5 of the fractals lecture.
6) Page 6 of fractals lecture. Show that any countable set M of real numbers has Lebesgue measure 0.
7) Page 6 of fractals lecture. Show that the Cantor set C (fractals lecture p. 2) has Lebesgue measure 0.
Hint. To do this show that $C_k$ in the definition of C has length $(2/3)^k$.
8) Page 6 of fractals lecture. Prove that the Devil's Staircase function f(x) is increasing, continuous and has derivative f'(x)=0 except on the Cantor set C, which has Lebesgue measure 0.
9) Page 8 of fractals lecture. Show that the Weierstrass function
$$f(t) = \sum_{k=1}^{\infty} \lambda^{(s-2)k} \sin(\lambda^k t)$$
is a continuous function on [0,1].
Hint. Use a convergence test (often called the Weierstrass M-Test) from calculus which we will do in the next part of the notes.
10) Page 9 of fractals lecture. Fill in the details of the proof of Proposition 1, page 9 of the fractals lecture.
11) Page 10 of fractals lecture. Suppose $f:[a,b] \to \mathbb{R}$ has a continuous derivative. Show $\dim_B(\text{graph } f) = 1$. Hint. Use the mean value theorem and Lemma 1.
12) Page 11 of fractals lecture. Show that any function satisfying condition 2 of Lemma 1 must be nowhere
differentiable. It follows that the Weierstrass function is continuous but nowhere differentiable.
Part I
Motivation
Recall Section 1.1 of Lectures I where we noted that Fourier needed to express a function f(x) as a Fourier series:
$$f(x) = \sum_{n=-\infty}^{\infty} a_n e^{2\pi i n x}.$$
Here $e^{ix} = \cos x + i \sin x$, where $i = \sqrt{-1}$ (which is not a real number). This means you can rewrite the series of complex exponentials as 2 series - one involving cosines and the other involving sines. The Fourier coefficients are
$$a_n = \int_0^1 f(y) e^{-2\pi i n y}\,dy.$$
In order to investigate Fourier series as well as integrals, we will need to look at normed vector spaces.
We want to understand the integral from a more modern perspective rather than that of your calculus book. Secondly we
want to understand convergence of series of functions - something that proved problematic for Cauchy in the 1800s. These
things are important for many applications in physics, engineering, statistics. We will be able to study vibrating things such
as violin strings, drums, buildings, bridges, spheres, planets, stock values. Quantum physics, for example, involves Hilbert
space, which is a type of normed vector space with a scalar product where all Cauchy sequences of vectors converge.
The theory of such normed vector spaces was created at the same time as quantum mechanics - the 1920s and 1930s. So
with this approach we are moving ahead hundreds of years from Newton and Leibnitz, perhaps 70 years from Riemann.
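Fourier series involve orthogonal sets of vectors in an infinite dimensional normed vector space:
$$C[a, b] = \{ f : [a, b] \to \mathbb{R} \mid f \text{ continuous} \}.$$
The $L^2$ norm of a continuous function f in C[a, b] is
$$\|f\|_2 = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}.$$
This is an analog of the usual idea of length of a vector $f = (f(1), ..., f(n)) \in \mathbb{R}^n$:
$$\|f\|_2 = \left( \sum_{j=1}^{n} |f(j)|^2 \right)^{1/2}.$$
Other norms on C[a, b] include
$$\|f\|_1 = \int_a^b |f(x)|\, dx$$
and
$$\|f\|_\infty = \max_{a \le x \le b} |f(x)|.$$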
On finite dimensional vector spaces such as $\mathbb{R}^n$ it does not matter what norm you use when you are trying to figure out
whether a sequence of vectors has a limit. However, in infinite dimensional normed vector spaces, convergence can disappear
if a different norm is used. Not all norms are equivalent in infinite dimensions. We will discuss this in detail later.
Note that C[a, b] is infinite dimensional since the set $\{1, x, x^2, x^3, ..., x^n, ...\}$ is an infinite set of linearly independent
vectors. Prove this as follows. Suppose that we have a linear dependence relation $\sum_{j=0}^{n} c_j x^j = 0$, for all x in [a, b]. This
says that the polynomial $\sum_{j=0}^{n} c_j x^j$ has infinitely many roots. A nonzero polynomial of degree at most n has at most n roots,
so $c_j = 0$ for all j.
In what follows we define normed vector space by 5 axioms. We will not put arrows on our vectors. We will try to keep
vectors and scalars apart by mostly using Greek letters for scalars. Our scalars will be real in this section. However, at the
end of these lectures, we will allow complex scalars. It simplifies Fourier series.
Definition 1 A vector space V is a set of vectors $v \in V$ which is closed under addition and closed under multiplication
by scalars $\alpha \in \mathbb{R}$. This means $\forall$ vectors $u, v \in V$, there is a unique sum $u + v \in V$ and $\forall$ scalar $\alpha \in \mathbb{R}$, there is a unique
product $\alpha v \in V$. Moreover the following 5 axioms must hold for all $u, v, w \in V$ and all $\alpha, \beta \in \mathbb{R}$:
VS1. $u + (v + w) = (u + v) + w$
VS2. $\exists\, 0 \in V$ s.t. $0 + v = v$
VS3. $\forall v \in V$ $\exists\, {-v} \in V$ s.t. $v + (-v) = 0$.
VS4. $v + u = u + v$
VS5. $1v = v$, $\alpha(\beta v) = (\alpha\beta)v$, $(\alpha + \beta)v = \alpha v + \beta v$, $\alpha(u + v) = \alpha u + \alpha v$.
You may say we cheated by putting 4 axioms into VS5.
Definition 2 A vector space V is a normed vector space if there is a norm function mapping V to the non-negative real
numbers, written $\|v\|$, for $v \in V$, and satisfying the following 3 axioms:
N1. $\|v\| \ge 0$ $\forall v \in V$ and $\|v\| = 0$ if and only if $v = 0$.
N2. $\|\alpha v\| = |\alpha| \, \|v\|$, $\forall v \in V$ and $\alpha \in \mathbb{R}$. Here $|\alpha|$ = absolute value of $\alpha$.
N3. $\|u + v\| \le \|u\| + \|v\|$, $\forall u, v \in V$. Triangle Inequality.
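Example 1. 3-Space.
$$\mathbb{R}^3 = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \;\middle|\; x_1, x_2, x_3 \in \mathbb{R} \right\}.$$
Addition and multiplication by scalars are defined componentwise:
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} x_1 + y_1 \\ x_2 + y_2 \\ x_3 + y_3 \end{pmatrix}, \qquad \alpha \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} \alpha x_1 \\ \alpha x_2 \\ \alpha x_3 \end{pmatrix}, \quad \alpha \in \mathbb{R}.$$
The usual norm is
$$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + x_3^2} \quad \text{if } x = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The proof that these definitions make $\mathbb{R}^3$ a normed vector space is tedious. So we make it an exercise.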
Example 2. The space of continuous functions on an interval.
$$C[a, b] = \{ f : [a, b] \to \mathbb{R} \mid f \text{ continuous} \}.$$
For $f, g \in C[a, b]$, define $(f + g)(x) = f(x) + g(x)$ for all $x \in [a, b]$ and define for $\alpha \in \mathbb{R}$, $(\alpha f)(x) = \alpha f(x)$ for all
$x \in [a, b]$. We leave it as an exercise to check the axioms for a vector space. The most interesting part of the exercise is to
show that $f + g$ and $\alpha f$ are both continuous functions on [a, b].
Again there are many possible norms. We will look at 3:
$$\|f\|_2 = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2},$$
$$\|f\|_1 = \int_a^b |f(x)|\, dx,$$
$$\|f\|_\infty = \max_{a \le x \le b} |f(x)|.$$
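The three norms on C[a, b] can be approximated numerically. The following is only a sketch: the helper names (`integrate`, `norm_1`, `norm_2`, `norm_sup`) and the sample function f(x) = x on [0, 1] are assumptions chosen for illustration, the integrals use a simple midpoint rule, and the maximum is taken over sample points only.

```python
import math

def integrate(g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def norm_1(f, a, b):
    """Approximate ||f||_1 = integral of |f(x)| over [a, b]."""
    return integrate(lambda x: abs(f(x)), a, b)

def norm_2(f, a, b):
    """Approximate ||f||_2 = (integral of |f(x)|^2 over [a, b])^(1/2)."""
    return math.sqrt(integrate(lambda x: abs(f(x)) ** 2, a, b))

def norm_sup(f, a, b, steps=20000):
    """Approximate ||f||_inf by the largest sampled value of |f|."""
    return max(abs(f(a + i * (b - a) / steps)) for i in range(steps + 1))

def f(x):
    return x  # hypothetical sample element of C[0, 1]

n1 = norm_1(f, 0.0, 1.0)    # exact value is 1/2
n2 = norm_2(f, 0.0, 1.0)    # exact value is 1/sqrt(3)
ns = norm_sup(f, 0.0, 1.0)  # exact value is 1
print(n1, n2, ns)
```

The same function has three different "lengths", which is why the choice of norm matters in C[a, b].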
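Most of the axioms for norms are easy to check. Let's do it for the $\|f\|_1$ norm.
Proof of N1. $\|v\| \ge 0$ $\forall v \in V$ and $\|v\| = 0$ if and only if $v = 0$.
Since $|f(x)| \ge 0$ for all x we know that the integral is $\ge 0$, because the integral preserves inequalities (see Lectures I).
Suppose that $\int_a^b |f(x)|\, dx = 0$. Since f is continuous (as is |f|), this implies f(x) = 0 for all $x \in [a, b]$ by the positivity of
the integral.
Proof of N2. $\|\alpha v\| = |\alpha| \, \|v\|$, $\forall v \in V$ and $\alpha \in \mathbb{R}$.
Also for any $\alpha \in \mathbb{R}$ and $f \in C[a, b]$, we have
$$\|\alpha f\|_1 = \int_a^b |\alpha f(x)|\, dx = \int_a^b |\alpha| \, |f(x)|\, dx = |\alpha| \int_a^b |f(x)|\, dx = |\alpha| \, \|f\|_1.$$
This proves N2 for norms. Here we used the multiplicative property of absolute value as well as the linearity of the integral (i.e.,
scalars come out of the integral from Lectures I).
Proof of N3. Since $|f(x) + g(x)| \le |f(x)| + |g(x)|$ for all x and the integral preserves inequalities, we have
$$\|f + g\|_1 = \int_a^b |f(x) + g(x)|\, dx \le \int_a^b (|f(x)| + |g(x)|)\, dx.$$
To finish the proof, use the linearity of the integral to see that
$$\int_a^b (|f(x)| + |g(x)|)\, dx = \int_a^b |f(x)|\, dx + \int_a^b |g(x)|\, dx = \|f\|_1 + \|g\|_1.$$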
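Scalar Products.
The dot product of two vectors in $\mathbb{R}^3$ is
$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3.$$
It turns out there is a similar thing for C[a, b]. First let's define the scalar product on a vector space and see how to get
a norm if, in addition, the scalar product is positive definite.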
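Definition 4 A (positive definite) scalar product is a function mapping $(v, w) \in V \times V$ to $\langle v, w \rangle \in \mathbb{R}$ such that:
SP1. $\langle v, w \rangle = \langle w, v \rangle$, $\forall v, w \in V$ (symmetry)
SP2. $\langle u, v + w \rangle = \langle u, v \rangle + \langle u, w \rangle$, $\forall u, v, w \in V$
SP3. $\langle \alpha v, w \rangle = \alpha \langle v, w \rangle$, $\forall v, w \in V$ and $\alpha \in \mathbb{R}$
SP4. $\langle v, v \rangle \ge 0$, $\forall v \in V$ and $\langle v, v \rangle = 0 \iff v = 0$ (positive definite)
Axioms SP1,2,3 imply that $\langle v, w \rangle$ is linear in each variable holding the other variable fixed. Axiom SP4 says the
scalar product is positive definite. We will always want to assume SP4 because we want to be able to get a norm out of
the scalar product via the following definition.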
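Definition 5 If V is a vector space with a (positive definite) scalar product $\langle v, w \rangle$ for $v, w \in V$, define the associated
norm by $\|v\| = \sqrt{\langle v, v \rangle}$.
Example 1. $V = \mathbb{R}^3$ with
$$\langle x, y \rangle = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} \cdot \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = x_1 y_1 + x_2 y_2 + x_3 y_3.$$
It is easy to check the axioms. For example, the positive definiteness follows from the fact that squares of real numbers are
$\ge 0$ and sums of non-negative numbers are non-negative:
$$\langle x, x \rangle = x_1^2 + x_2^2 + x_3^2 \ge 0.$$
Example 2. $V = C[a, b]$ with
$$\langle f, g \rangle = \int_a^b f(x) g(x)\, dx.$$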
Once more, it is not hard to use the properties of the integral to check axioms SP1, SP2, SP3 (exercise). To see SP4,
note that $f(x)^2 \ge 0$ for all $x \in [a, b]$ implies by the fact that integrals preserve inequalities that
$$\langle f, f \rangle = \int_a^b f(x)^2\, dx \ge 0.$$
Now suppose that $\langle f, f \rangle = 0$. By the positivity property of the integral, we know that $f(x)^2 = 0$ for all $x \in [a, b]$ which
says that f is the 0 function (the identity for addition in our vector space C[a, b]).
Then the norm associated to this scalar product is $\|f\|_2 = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}$.
The following theorem is so useful people from lots of countries got their names attached.
Theorem 6 Cauchy-Schwarz (Bunyakovsky) Inequality
Suppose that V is a vector space with scalar product $\langle v, w \rangle$. Then, defining the norm $\|v\| = \sqrt{\langle v, v \rangle}$, we have for
all $v, w \in V$:
$$|\langle v, w \rangle| \le \|v\| \, \|w\|.$$
Proof. If w = 0, both sides are 0, so assume $w \ne 0$. Let $t \in \mathbb{R}$ and look at
$$f(t) = \langle v + tw, v + tw \rangle.$$
By properties of the scalar product, we have $0 \le f(t) = \langle v, v \rangle + 2t \langle v, w \rangle + t^2 \langle w, w \rangle$.
As a function of t, we see that $f(t) = At^2 + Bt + C$, where $A = \langle w, w \rangle$, $B = 2\langle v, w \rangle$ and $C = \langle v, v \rangle$. So
the graph of f(t) is that of a parabola above or touching the t-axis. For example, in Figure 1, we have drawn a parabola
touching the t-axis at one point.
Figure 1: graph of a parabola with positive leading coefficient and non-positive discriminant
Recall the quadratic formula for the roots of f(t) = 0:
$$r = \frac{-B \pm \sqrt{B^2 - 4AC}}{2A}.$$
Since we have at most one real root it follows that
$$B^2 - 4AC \le 0.$$
Now plug in $A = \langle w, w \rangle$, $B = 2\langle v, w \rangle$ and $C = \langle v, v \rangle$. This gives the Cauchy-Schwarz inequality.
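A numerical spot check of the Cauchy-Schwarz inequality can make the statement concrete. This is only a sketch: the vectors, the functions sin and exp on [0, 1], and the helper names `dot` and `inner` are assumptions for illustration, and the integral scalar product is approximated by a midpoint rule.

```python
import math

def dot(x, y):
    """Scalar product on R^n."""
    return sum(xi * yi for xi, yi in zip(x, y))

def inner(f, g, a, b, steps=20000):
    """Midpoint-rule approximation of <f, g> = integral of f(x) g(x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))

# Vectors in R^3 (hypothetical sample data).
v, w = (1.0, 2.0, 3.0), (-4.0, 0.5, 2.0)
lhs_vec = abs(dot(v, w))
rhs_vec = math.sqrt(dot(v, v)) * math.sqrt(dot(w, w))

# Discriminant of f(t) = <v + t w, v + t w> = A t^2 + B t + C, as in the proof.
A, B, C = dot(w, w), 2 * dot(v, w), dot(v, v)
disc = B * B - 4 * A * C

# Functions in C[0, 1] (hypothetical sample data).
lhs_fun = abs(inner(math.sin, math.exp, 0.0, 1.0))
rhs_fun = math.sqrt(inner(math.sin, math.sin, 0.0, 1.0)) * math.sqrt(inner(math.exp, math.exp, 0.0, 1.0))

print(lhs_vec, rhs_vec, disc)
print(lhs_fun, rhs_fun)
```

In both cases the left side should not exceed the right side, and the discriminant should be non-positive, matching the parabola argument.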
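Corollary 7 Under the hypotheses of the preceding theorem, using Definition 5, $\|v\| = \sqrt{\langle v, v \rangle}$ defines a norm on V.
Proof. We must prove:
N1. $\|v\| \ge 0$ $\forall v \in V$ and $\|v\| = 0$ if and only if $v = 0$.
N2. $\|\alpha v\| = |\alpha| \, \|v\|$, $\forall v \in V$ and $\alpha \in \mathbb{R}$.
N3. Triangle Inequality. $\|u + v\| \le \|u\| + \|v\|$, $\forall u, v \in V$.
We get N1 from SP4.
We get N2 from SP3. For then $\|\alpha v\|^2 = \langle \alpha v, \alpha v \rangle = \alpha^2 \langle v, v \rangle = |\alpha|^2 \|v\|^2$, $\forall v \in V$ and $\alpha \in \mathbb{R}$.
To prove the triangle inequality N3, we need to use the Cauchy-Schwarz inequality. This proof goes as follows. By the
linearity and symmetry of the scalar product we see that
$$\|u + v\|^2 = \langle u + v, u + v \rangle = \|u\|^2 + 2\langle u, v \rangle + \|v\|^2$$
$$\le \|u\|^2 + 2|\langle u, v \rangle| + \|v\|^2 \quad \text{as } x \le |x|$$
$$\le \|u\|^2 + 2\|u\| \, \|v\| + \|v\|^2 \quad \text{by Cauchy-Schwarz}$$
$$= (\|u\| + \|v\|)^2.$$
Now use the fact that the square root preserves inequalities between non-negative numbers to conclude that $\|u + v\| \le \|u\| + \|v\|$.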
What's the good of all this? Now we can happily define limits of sequences of vectors $\{v_n\}$ in our normed vector space
V. Can you guess the definition of $\lim_{n \to \infty} v_n = L \in V$?
Answer: $\forall \varepsilon > 0$, $\exists N \in \mathbb{Z}^+$ s.t. $n \ge N$ implies $\|v_n - L\| < \varepsilon$. That is, just replace absolute value in the old
definition of limit with the norm.
Similarly we can define a Cauchy sequence $\{v_n\}$ in the normed vector space V. We will do all this in detail later, but
you should be able to guess what we will say.
Another use of the scalar product is to define orthogonal vectors in a vector space V with a scalar product.
Definition 8 Two vectors $v, w \in V$, a vector space V with scalar product $\langle \, , \rangle$, are defined to be orthogonal if the scalar
product $\langle v, w \rangle = 0$.
In a vector space with scalar product, you can also define the angle $\theta$ between 2 vectors $v, w \in V$, by
$$\langle v, w \rangle = \|v\| \, \|w\| \cos\theta.$$
From this, the preceding definition makes sense as 2 vectors are orthogonal iff the cosine of the angle between them is 0.
This definition of cosine agrees with the usual one if $V = \mathbb{R}^2$.
What is the cosine law? Using the triangles in Figure 2, it says
$$\|v - w\|^2 = \|v\|^2 - 2\|v\| \, \|w\| \cos\theta + \|w\|^2.$$
Figure 2: Visualizing vectors in a normed vector space and the angle between them.
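The angle formula and the cosine law can be checked on small examples. The sketch below assumes the usual dot product on R^2 and the integral scalar product on C[0, 2π] (approximated by a midpoint rule); the helper names and the sample vectors are hypothetical.

```python
import math

def dot(x, y):
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def angle(v, w):
    """Angle theta defined by <v, w> = ||v|| ||w|| cos(theta), for nonzero v and w."""
    c = dot(v, w) / (norm(v) * norm(w))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp to guard against rounding

def inner(f, g, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) * g(a + (i + 0.5) * h) for i in range(steps))

v, w = (1.0, 0.0), (1.0, 1.0)
theta = angle(v, w)  # expected to be pi/4

# Cosine law: ||v - w||^2 = ||v||^2 - 2 ||v|| ||w|| cos(theta) + ||w||^2
diff = tuple(a - b for a, b in zip(v, w))
law_lhs = norm(diff) ** 2
law_rhs = norm(v) ** 2 - 2 * norm(v) * norm(w) * math.cos(theta) + norm(w) ** 2

# sin and cos are orthogonal in C[0, 2*pi] with the integral scalar product.
sin_cos = inner(math.sin, math.cos, 0.0, 2 * math.pi)
print(theta, law_lhs, law_rhs, sin_cos)
```

Orthogonality of sines and cosines under the integral scalar product is the kind of fact that Fourier series rely on.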
Comparison of Norms
Suppose $\{v_n\}$ is a sequence in a normed vector space V with norm $\|\cdot\|$. We will say $\lim_{n \to \infty} v_n = L$ and "$v_n$ converges to
$L \in V$" iff
$\lim_{n \to \infty} \|v_n - L\| = 0$. Note that the last sequence $\|v_n - L\|$ is a sequence of real numbers.
Question. We know that there are lots of norms on V. How can we guarantee that 2 different norms $\|\cdot\|$ and $\|\cdot\|'$
produce the same convergent sequences in V?
The answer is that equivalent norms produce the same convergent sequences where we define equivalent as follows.
Definition 9 2 norms $\|\cdot\|$ and $\|\cdot\|'$ on the vector space V are equivalent iff there are constants A, B > 0 such that for all
$v \in V$ we have
$$A \|v\| \le \|v\|' \le B \|v\|.$$
In the preceding definition we are assuming that the constants A and B are independent of $v \in V$.
Why do equivalent norms lead to the same convergent sequences?
Answer. Suppose $\{v_n\}$ is a convergent sequence for the $\|\cdot\|$-norm; i.e., for some $L \in V$ we have $\lim_{n \to \infty} \|v_n - L\| = 0$. Then
$$A \|v_n - L\| \le \|v_n - L\|' \le B \|v_n - L\|.$$
Since the outside sequences go to 0 as $n \to \infty$, it follows by the squeeze lemma that the guy in the middle has to go to 0
as well.
Similarly $\lim_{n \to \infty} \|v_n - L\|' = 0$ implies $\lim_{n \to \infty} \|v_n - L\| = 0$.
Moral. It does not matter which of 2 equivalent norms you use to test a sequence for convergence.
Theorem 10 All norms on $\mathbb{R}^n$ are equivalent.
Proof. See Lang, Undergraduate Analysis, p. 145.
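For three familiar norms on R^n the equivalence constants can be written down explicitly: the max norm, the Euclidean norm and the sum norm satisfy $\|x\|_\infty \le \|x\|_2 \le \sqrt{n}\,\|x\|_\infty$ and $\|x\|_\infty \le \|x\|_1 \le n\,\|x\|_\infty$. The sketch below tests these standard inequalities on random vectors; the dimension, the random seed and the small rounding tolerance are arbitrary choices made for illustration.

```python
import math
import random

def norm_1(x):
    return sum(abs(t) for t in x)

def norm_2(x):
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    return max(abs(t) for t in x)

random.seed(0)
n = 5
tol = 1e-12  # allowance for floating-point rounding
all_hold = True
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    a, b, c = norm_inf(x), norm_2(x), norm_1(x)
    all_hold = all_hold and a <= b + tol and b <= math.sqrt(n) * a + tol
    all_hold = all_hold and a <= c + tol and c <= n * a + tol
print(all_hold)
```

Note that the constants depend on the dimension n, which hints at why the argument breaks down in infinite dimensions.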
Thus, for our purposes, it does not matter which norm you use on finite dimensional vector spaces. You get the same
definition of convergence of sequences. However, things are very different for infinite dimensional vector spaces. See Figure
3 below for a sequence of functions $f_n$ in C[0, 1] such that
$$\lim_{n \to \infty} \|f_n - 0\|_1 = 0, \quad \text{using the norm } \|f\|_1 = \int_0^1 |f(x)|\, dx.$$
However,
$\|f_n - 0\|_\infty = 1$ for all n, using the norm
$$\|f\|_\infty = \max_{0 \le x \le 1} |f(x)|.$$
It follows that the norms $\|\cdot\|_1$ and $\|\cdot\|_\infty$ on C[0, 1] are not equivalent.
Exercise. Using the same example, show that $\lim_{n \to \infty} \|f_n - 0\|_2 = 0$ using the norm $\|f\|_2 = \left( \int_0^1 |f(x)|^2\, dx \right)^{1/2}$. It
follows that the norms $\|\cdot\|_2$ and $\|\cdot\|_\infty$ on C[0, 1] are not equivalent.
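Proposition 11 The norms $\|f\|_1 = \int_a^b |f(x)|\, dx$ and $\|f\|_2 = \left( \int_a^b |f(x)|^2\, dx \right)^{1/2}$ on C[a, b] are not equivalent. However, they do satisfy
the inequality
$$\|f\|_1 \le \sqrt{b - a}\, \|f\|_2.$$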
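Proof. To prove the inequality, use the Cauchy-Schwarz inequality on the functions |f| and g(x) = 1 for all $x \in [a, b]$. This
gives
$$|\langle f, g \rangle| \le \|f\|_2 \, \|g\|_2.$$
So we have $|\langle |f|, 1 \rangle| \le \| |f| \|_2 \, \|1\|_2$. Now this really means
$$\|f\|_1 = \int_a^b |f(x)|\, dx \le \left( \int_a^b f(x)^2\, dx \right)^{1/2} \left( \int_a^b 1\, dx \right)^{1/2} = \sqrt{b - a} \left( \int_a^b f(x)^2\, dx \right)^{1/2} = \sqrt{b - a}\, \|f\|_2.$$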
To see that $\|\cdot\|_1$ and $\|\cdot\|_2$ are not equivalent norms, we take a = 0 and b = 1. Then we look at the following example.
Define as in Lang, Undergraduate Analysis, p. 147:
$$g_n(x) = \begin{cases} \sqrt{n}, & \text{for } 0 \le x \le \frac{1}{n}; \\ \frac{1}{\sqrt{x}}, & \text{for } \frac{1}{n} \le x \le 1. \end{cases}$$
Note that $g_n$ is continuous. Why? Exercise. Show also that
$$\|g_n\|_1 = 2 - \frac{1}{\sqrt{n}}$$
and
$$\|g_n\|_2 = \sqrt{1 + \log n}.$$
It follows that there cannot be a constant C > 0 such that $\|v\|_2 \le C \|v\|_1$, at least on the interval [0, 1]. Can you extend
this idea to arbitrary intervals [a, b]?
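A numerical experiment can support the exercise. The sketch below takes $g_n(x) = \sqrt{n}$ on [0, 1/n] and $g_n(x) = 1/\sqrt{x}$ on [1/n, 1], approximates both norms with a midpoint rule, and prints the ratio $\|g_n\|_2 / \|g_n\|_1$. The helper names, the step count and the sample values of n are assumptions made for illustration.

```python
import math

def g(n, x):
    """g_n(x) = sqrt(n) on [0, 1/n] and 1/sqrt(x) on [1/n, 1]."""
    return math.sqrt(n) if x <= 1.0 / n else 1.0 / math.sqrt(x)

def integrate(h, a, b, steps=200000):
    """Midpoint-rule approximation of the integral of h over [a, b]."""
    width = (b - a) / steps
    return width * sum(h(a + (i + 0.5) * width) for i in range(steps))

results = {}
for n in (10, 100, 1000):
    one = integrate(lambda x: abs(g(n, x)), 0.0, 1.0)             # approximates ||g_n||_1
    two = math.sqrt(integrate(lambda x: g(n, x) ** 2, 0.0, 1.0))  # approximates ||g_n||_2
    results[n] = (one, two, two / one)
    print(n, one, two, two / one)
```

The 1-norms stay below 2 while the 2-norms grow like $\sqrt{\log n}$, so the ratio increases without bound, although quite slowly.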
$$\lim_{n \to \infty} v_n = L \iff \lim_{n \to \infty} \|v_n - L\| = 0.$$
Examples.
Example 1) Let $V = \mathbb{R}^2$ and consider $v_n = (\frac{1}{n}, \frac{1}{n^2})$. Using whatever norm is your favorite, $\lim_{n \to \infty} v_n = 0 = (0, 0)$. Prove
this as an exercise.
Example 2) Define a sequence of functions on [0, 1] by the picture in Figure 3. One can show (exercise) that $f_n$ converges to 0 in the $\|\cdot\|_1$ norm but not in the $\|\cdot\|_\infty$ norm.
Definition 13 A Cauchy sequence $\{v_n\}$ of vectors in a normed vector space V means that
$\forall \varepsilon > 0$, $\exists N$ s.t. $n, m \ge N \implies \|v_n - v_m\| < \varepsilon$.
Let's list some facts about limits of sequences of vectors in V. Compare with the analogs for sequences of real numbers
and you will see that mostly we replace the absolute value in the real version with the norm in our new version. We did
these things in Lectures I. Sometimes we write exactly the same formulas as before, though of course we mean something
different as now we are talking about sequences of vectors in V which may in fact be sequences of functions. The proofs of
the following facts will be mostly the same as before, after replacing absolute value with norm.
Facts
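1) Uniqueness of Limits. $\lim_{n \to \infty} v_n = L$ and $\lim_{n \to \infty} v_n = M$ $\implies$ L = M.
3) Linearity of Limits. $\lim_{n \to \infty} v_n = L$ and $\lim_{n \to \infty} w_n = M$ implies $\lim_{n \to \infty} (v_n + w_n) = L + M$ and $\lim_{n \to \infty} \alpha v_n = \alpha L$ for any scalar $\alpha \in \mathbb{R}$.
For the first part, the triangle inequality gives, for n large enough,
$$\|(v_n + w_n) - (L + M)\| \le \|v_n - L\| + \|w_n - M\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Exercise. Prove the second part of Fact 3) and then generalize to show that if $\{\alpha_n\}$ denotes a sequence of real numbers
such that $\lim_{n \to \infty} \alpha_n = \alpha$, then $\lim_{n \to \infty} \alpha_n v_n = \alpha L$, assuming $\lim_{n \to \infty} v_n = L$ for the sequence $\{v_n\}$ of vectors in V.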
4) Every convergent sequence is a Cauchy sequence. The analogous proof was in Lectures I. Replace all the absolute values in that proof with our norm and you have the
proof of Fact 4.
If $\lim_{n \to \infty} v_n = L$ and $\varepsilon > 0$ is given, there exists $N \in \mathbb{Z}^+$ such that $n \ge N$ implies $\|v_n - L\| < \varepsilon/2$. So if $m \ge N$, we
have $\|v_m - L\| < \varepsilon/2$. It follows from the triangle inequality that for $n, m \ge N$ we have
$$\|v_n - v_m\| \le \|v_n - L\| + \|L - v_m\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$
Definition 14 Suppose that V is a normed vector space. We say that V is complete iff every Cauchy sequence $\{v_n\}$ of
vectors $v_n \in V$ converges to a limit $L \in V$.
Figure 4: Here is a picture of the proof that a Cauchy sequence of vectors in the plane must converge because the projections
of the sequence to points on the 2 axes converge to limits in R, the red star and the green star.
Let $L = \begin{pmatrix} L^{(1)} \\ L^{(2)} \end{pmatrix}$.
Claim. $\lim_{n \to \infty} v_n = L$.
Proof. See Figure 4.
$\forall \varepsilon > 0$, $\exists N_j$ s.t. $n \ge N_j \implies |v_n^{(j)} - L^{(j)}| < \varepsilon$. Let $N = \max\{N_1, N_2\}$. Then $n \ge N$ implies
$$\|v_n - L\|_\infty = \max\left\{ |v_n^{(1)} - L^{(1)}|, |v_n^{(2)} - L^{(2)}| \right\} < \varepsilon.$$
Q.E.D. Claim.
2) We postpone the proof that C[a, b] is complete in the $\|\cdot\|_\infty$ norm.
To see that C[0, 1] is not complete with respect to the $\|\cdot\|_1$ norm, we must find a Cauchy sequence of continuous functions
that does not converge to an element of C[0, 1] in this norm.
Example.
Consider the function $f_n(x)$ defined by
$$f_n(x) = \begin{cases} 0, & 0 \le x \le \frac{1}{2} - \frac{1}{n}; \\ 1 + n(x - \frac{1}{2}), & \frac{1}{2} - \frac{1}{n} \le x \le \frac{1}{2}; \\ 1, & \frac{1}{2} \le x. \end{cases}$$
See Figure 5.
Define the function
$$L(x) = \begin{cases} 0, & 0 \le x < \frac{1}{2}; \\ 1, & \frac{1}{2} \le x \le 1. \end{cases}$$
Since L(x) is not continuous, it is not in the space C[0, 1]. But
$$\|f_n - L\|_1 = \frac{1}{2n} \to 0, \quad \text{as } n \to \infty.$$
Thus the sequence $\{f_n\}$ approaches a limit in the $\|\cdot\|_1$-norm, but not a limit that is continuous. So we have a Cauchy
sequence approaching a limit $L \notin C[0, 1]$. This says C[0, 1] is not complete. It is like the rationals, full of holes. To make
a space containing limits of all Cauchy sequences in C[0, 1], you need to add all Lebesgue integrable functions on [0,1]. We
will not say much about the Lebesgue integral in this course. If you are interested, you could look at Lang, Undergraduate
Analysis, p. 262, Apostol, Mathematical Analysis, or Korevaar, Mathematical Methods.
Figure 5: The function $f_n(x)$ designed not to converge to a continuous function with respect to the $\|\cdot\|_1$ norm as $n \to \infty$.
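The distance between the ramp functions and the step function can be computed numerically. In the sketch below, f(n, x) is the ramp that equals 0 up to 1/2 - 1/n, rises linearly to 1 at x = 1/2 and stays at 1 afterwards, and `limit` is the discontinuous step function that jumps from 0 to 1 at x = 1/2. The function names, the midpoint rule and the sample values of n are assumptions for illustration.

```python
def f(n, x):
    """Ramp: 0 on [0, 1/2 - 1/n], linear on [1/2 - 1/n, 1/2], then 1."""
    if x <= 0.5 - 1.0 / n:
        return 0.0
    if x <= 0.5:
        return 1.0 + n * (x - 0.5)
    return 1.0

def limit(x):
    """Discontinuous step function: 0 for x < 1/2 and 1 for x >= 1/2."""
    return 0.0 if x < 0.5 else 1.0

def dist_1(n, steps=100000):
    """Midpoint-rule approximation of the 1-norm distance on [0, 1]."""
    h = 1.0 / steps
    return h * sum(abs(f(n, (i + 0.5) * h) - limit((i + 0.5) * h)) for i in range(steps))

distances = {n: dist_1(n) for n in (2, 10, 100)}
for n, d in distances.items():
    print(n, d, 1.0 / (2 * n))
```

The computed distances should agree with 1/(2n), the area of a triangle with base 1/n and height 1, and so they shrink to 0 even though the step function is not continuous.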
Figure 7 shows the difference between $\|f - g\|_1$ and $\|f - g\|_\infty$. Let f be the purple function and g be the blue one.
Then $\|f - g\|_\infty$ is the maximum length of the pink dotted lines while $\|f - g\|_1$ is the area between the 2 curves.
Figure 7: This figure can be used to see the difference between $\|f - g\|_1$ and $\|f - g\|_\infty$.
Open and Closed Sets in a Normed Vector Space and Other Definitions.
Here we give a very brief discussion of some concepts from point set topology, in particular open and closed sets. You
could probably live without more definitions (unless you plan to go to grad school in math.). But I find that in discussing
continuity, these ideas clarify things. If you always hated $\varepsilon$-$\delta$, you will be happy to learn that these ideas allow you to forget
about $\varepsilon$-$\delta$.
Definition 16 An open set U in a normed vector space V is a subset of V such that $\forall a \in U$, $\exists r > 0$ such that the open
ball of radius r and center a, $B(a, r) = \{x \in V \mid \|x - a\| < r\} \subset U$.
This means that U has no hard edges - no boundary points. As an example, if V = R with the usual absolute value as
norm, the open interval (a, b) is an open set. In any normed vector space V, the open ball B(a, r) is an open set.
A closed set F in a normed vector space V is a subset of V such that the complement $F^c$ is an open set. Closed sets
have hard edges. For example, if V = R with the usual absolute value as norm, the closed interval [a, b] is a closed set. In
any normed vector space V, the closed ball $\overline{B}(a, r) = \{x \in V \mid \|x - a\| \le r\}$ is a closed set.
See Figure 8 for pictures of open and closed sets.
Figure 8: pictures of an open set (top) and a closed set (bottom). The boundary of the open set is dashed to indicate that
it is not part of the set.
Theorem 17 Properties of Open & Closed Sets in a normed vector space V
1) The empty set and V are both open and closed.
2) Finite intersections of open sets are open.
Arbitrary intersections of closed sets are closed.
3) Arbitrary unions of open sets are open.
Finite unions of closed sets are closed.
Proof. I will leave most of these proofs as exercises. Let me do parts of 2) and 3).
2) Suppose that $U_1, ..., U_n$ are open sets in V. If $p \in \bigcap_{i=1}^{n} U_i$, then $\forall i$, $\exists \varepsilon_i > 0$ such that $B(p, \varepsilon_i) \subset U_i$. Let $\varepsilon =
\min\{\varepsilon_1, ..., \varepsilon_n\}$. Then $B(p, \varepsilon) \subset \bigcap_{i=1}^{n} U_i$. Thus $\bigcap_{i=1}^{n} U_i$ is open.
Figure 9: An open turquoise star is intersected with an open purple rhombus. The red point is in the intersection of the 2
sets. Then 2 discs with centers at the red point are found, the yellow disc staying inside the star and the pink disc staying
inside the rhombus.
3) Suppose $\{U_i\}_{i \in I}$ denotes an arbitrary collection of open sets in V. If $p \in \bigcup_{i \in I} U_i$, then $\exists i \in I$ and $\varepsilon > 0$ such that
$B(p, \varepsilon) \subset U_i \subset \bigcup_{i \in I} U_i$. It follows that $\bigcup_{i \in I} U_i$ is open.
The facts about closed sets can be derived from the facts about open sets using $(A \cup B)^c = A^c \cap B^c$.
A set may be neither open nor closed. For example, consider the half-open interval (a, b].
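Definition 18 The closure $\overline{A}$ of a subset A of a complete normed vector space V is the set of points $p \in V$ such that
$\forall \varepsilon > 0$, $\exists x \in A$ such that $\|x - p\| < \varepsilon$. This says that $\forall \varepsilon > 0$, $B(p, \varepsilon) \cap A \ne \emptyset$.
You should picture a point in the closure of a set as a point sticking to the set. Such a point is sometimes called
adherent to A for that reason. See Figure 10.
Proposition 19 A point p is in $\overline{A}$ $\iff$ there is a sequence $\{x_n\}$ of points $x_n \in A$ such that $p = \lim_{n \to \infty} x_n$.
$\Longleftarrow$ If $p = \lim_{n \to \infty} x_n$, for a sequence $\{x_n\}$ of points $x_n \in A$, it follows that $\forall \varepsilon > 0$, $\exists N$ s.t. $n \ge N$ implies $x_n \in B(p, \varepsilon) \cap A$.
This implies $p \in \overline{A}$.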
Examples.
1) Any point in S is in the closure of S; i.e., $S \subset \overline{S}$. Just take x = p in the definition of closure.
2) If $V = \mathbb{R}^2$, using any of our favorite norms, any point in the closed ball $\{x \in \mathbb{R}^2 \mid \|x\| \le r\}$ is in the closure of the
open ball $B(0, r) = \{x \in \mathbb{R}^2 \mid \|x\| < r\}$.
If a point x is in the complement of the closed ball, then it is not in the closure of B(0, r).
Thus $\overline{B(0, r)} = \{x \in \mathbb{R}^2 \mid \|x\| \le r\}$.
3) What is the closure of Q = the rationals? Answer. R.
4) What is the closure of $\{\frac{1}{n} \mid n \in \mathbb{Z}^+\}$? Answer. $\{0\} \cup \{\frac{1}{n} \mid n \in \mathbb{Z}^+\}$.
5) What is the closure of the interval (a, b) in R? Answer. [a, b].
Figure 10: The red point is an element of the closure of the purple set.
6) What is the closure of the set {polynomials $p(x) = a_n x^n + \cdots + a_1 x + a_0 \mid a_j \in \mathbb{R}$} in the space C[a, b] of continuous
functions on a finite interval [a, b] using the $\|\cdot\|_\infty$ norm? Answer. C[a, b] by the Weierstrass Theorem proved in a later
section saying that any continuous function on a finite interval can be uniformly approximated by polynomials.
Proposition 20 A set S in a normed vector space is closed iff $S = \overline{S}$.
Proof. Suppose S is closed. Then $S^c$ is open. First we know that $S \subset \overline{S}$. Now to go the other way, suppose $p \in S^c$. Then
$\exists \varepsilon > 0$ such that $B(p, \varepsilon) \subset S^c$. This means that p is not in $\overline{S}$. That shows S closed implies $S = \overline{S}$.
To go the other way, suppose $S = \overline{S}$. We want to show that $S^c$ is open. Suppose $p \in S^c$. Then we know that p is not
in $\overline{S}$. By definition, this means $\exists \varepsilon > 0$ such that $B(p, \varepsilon) \subset S^c$. That is what we needed to show $S^c$ open.
Corollary 21 S is closed in a normed vector space V $\iff$ $\forall$ sequence $\{x_n\}$ of points $x_n \in S$ having a limit $p = \lim_{n \to \infty} x_n$ in
V, it follows that $p \in S$.
Proposition 22 A closed subset A of a complete normed vector space V is complete.
Proof. Suppose that $\{x_n\}$ is a Cauchy sequence in A. Since V is complete, we know $\exists p \in V$ such that $\lim_{n \to \infty} x_n = p$. Since
A is closed, it follows from the preceding Corollary that $p \in A$.
It can be useful to learn the following definition.
Definition 23 A point a in the set $\overline{S - \{a\}}$ is an accumulation point of the set S. This means that for every $\varepsilon > 0$, the
deleted ball $B(a, \varepsilon) - \{a\} = \{x \mid 0 < \|x - a\| < \varepsilon\}$ contains a point of S.
A set S is closed iff it contains all its accumulation points.
Examples.
Example 1) The set Z of integers has no accumulation points. For $n \in \mathbb{Z}$, one sees that $(B(n, \frac{1}{2}) - \{n\}) \cap \mathbb{Z} = \emptyset$.
Example 2) What is the set of accumulation points of Q = the rationals? Answer. R.
Example 3) What is the set of accumulation points of $\{\frac{1}{n} \mid n \in \mathbb{Z}^+\}$? Answer. {0}.
Example 4) What is the set of accumulation points of the open ball B(a, r) in $\mathbb{R}^2$ using the usual norm? Answer.
The closed ball $\overline{B}(a, r) = \{x \in \mathbb{R}^2 \mid \|x - a\|_2 \le r\}$.
Another important word in the mathematician's vocabulary is "compact." I will give a definition that works for normed
vector spaces but not more general spaces that mathematicians like to consider. We will not say much more about compact
sets except to note that in finite dimensional normed vector spaces a set is compact iff it is closed and bounded. Here
bounded means contained in a ball of radius r and center 0. This is false for infinite dimensional normed vector spaces,
however. A ball of radius r in infinite dimensions is not compact. Very inconvenient.
Definition 24 A subset S of a normed vector space V is (sequentially) compact iff every sequence $\{x_n\}$ in S has a
subsequence $\{x_{n_k}\}$ converging to an element of S.
See Lang, Undergraduate Analysis or Apostol, Mathematical Analysis, for more information on compact sets.
This brings us to the story of limits of functions in and on normed vector spaces. Why are we interested in this question?
We want to know when we can interchange limit and integral, limit and derivative. We want to know whether a series
of functions (such as a Taylor series or a Fourier series) converges. In fact, we need to know precisely what we mean by
convergence of a series of functions. We want to think of the definite integral as a function on the space C[a, b] of continuous
functions on a finite closed interval [a, b]. What sort of function is it? Linear? Continuous?
In fact, proving things about limits of functions on normed vector spaces is no harder than it was for functions on the
real line. We will basically copy the proofs from those of the analogous results for functions on the real line. Suppose that
V and W are normed vector spaces. I will use the same symbol $\|\cdot\|$ for the norm on V and the norm on W. Suppose $S \subset V$
and $f : S \to W$. I will make a more general hypothesis on the nature of a and S in $\lim_{x \to a} f(x) = L$, however.
Definition 25 Suppose $S \subset V$ and $f : S \to W$. Here V and W are normed vector spaces. We will denote both norms by
$\|\cdot\|$ though they may be different. Assume a is an accumulation point of S. Define the limit of f as x approaches a:
$$\lim_{x \to a} f(x) = \lim_{\substack{x \to a \\ x \in S}} f(x) = L \in W$$
to mean that for every $\varepsilon > 0$, there exists $\delta > 0$ (depending on $\varepsilon$) such that $x \in S$ and $0 < \|x - a\| < \delta$ implies
$\|f(x) - L\| < \varepsilon$.
Examples.
Example 1) Let V = C[a, b], the space of continuous real-valued functions on the finite interval [a, b]. Define $I : V \to \mathbb{R}$
by $I(f) = \int_a^b f(x)\, dx$. Suppose our norms are $\|f\|_1 = \int_a^b |f(x)|\, dx$ on V and the usual absolute value on R. Does
$$\lim_{f \to 0} I(f) = 0?$$
Answer: Yes. In fact, here $\delta = \varepsilon$, since (using the fact that integrals preserve inequalities), we see that $\|f - 0\|_1 < \delta$
implies
$$|I(f) - I(0)| = \left| \int_a^b f(x)\, dx - 0 \right| \le \int_a^b |f(x)|\, dx = \|f\|_1 < \varepsilon.$$
y x
y 2 +x2 ,
$$\lim_{(x,y) \to (0,0)} f(x, y) = 0?$$
$$f(x, y) = \frac{y^2 - x^2}{y^2 + x^2}.$$
arbitrarily close to a. If we don't assure that, then the limit could be anything. Thus we should assume minimally that $a \in \overline{S}$.
Then $S \cap \{x \mid x \in V, \|x - a\| < \delta\} \ne \emptyset$ for all $\delta > 0$. This is what is done by Lang, Undergraduate Analysis, p. 160. However,
this does not assure the non-emptiness of the intersection with the punctured ball: $S \cap \{x \mid x \in V, 0 < \|x - a\| < \delta\} \ne \emptyset$
for all $\delta > 0$. For example Lang seems to allow S = {a} in the definition of limit. Moreover, Lang leaves out the assumption
$0 < \|x - a\|$ in the definition of limit. I cannot make myself do that. See also Dieudonné, Foundations of Modern Analysis,
Volume I. Sagan, Advanced Calculus and Apostol, Mathematical Analysis, p. 77, assume that a is an accumulation point
of S. So I will do that. It won't really matter for the examples we want to consider in infinite dimensional normed vector
spaces where our sets will be like C[a, b].
We assume that V, W are normed vector spaces, $S \subset V$, $f, g : S \to W$ and we assume that a is an accumulation point
of S.
Property 1) Uniqueness.
Suppose $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} f(x) = M$. Then L = M.
Property 2) Linearity.
Suppose $\alpha, \beta \in \mathbb{R}$ and $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$. Then $\lim_{x \to a} (\alpha f(x) + \beta g(x)) = \alpha L + \beta M$.
Property 3) Composite.
Suppose we have 3 normed vector spaces V, W, Z and functions $f : S \to T$, $g : T \to Z$, with the usual assumptions on
the sets $S \subset V$ and $T \subset W$ as well as the points a and L. If $\lim_{x \to a} f(x) = L$ and $\lim_{y \to L} g(y) = M$, then $\lim_{x \to a} g(f(x)) = M$.
Property 4) Inequalities.
Suppose $f, g : S \to \mathbb{R}$ (with the absolute value making R a normed vector space) and $f(x) \le g(x)$ for all $x \in S$. If
$\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$, then $L \le M$.
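Proof. Property 2) Given $\varepsilon > 0$, choose $\delta_1 > 0$ such that $0 < \|x - a\| < \delta_1$ implies $\|f(x) - L\| < \frac{\varepsilon}{2(1+|\alpha|)}$ and $\delta_2 > 0$ such that $0 < \|x - a\| < \delta_2$ implies $\|g(x) - M\| < \frac{\varepsilon}{2(1+|\beta|)}$. Then let $\delta = \min\{\delta_1, \delta_2\}$ so that $0 < \|x - a\| < \delta$ implies
$$\|\alpha f(x) + \beta g(x) - (\alpha L + \beta M)\| \le |\alpha| \, \|f(x) - L\| + |\beta| \, \|g(x) - M\| < \frac{\varepsilon}{2} + \frac{\varepsilon}{2} = \varepsilon.$$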
Figure 12: Picture of the proof of property 4 of limits. Here we assume $h(x) \ge 0$ and $\lim_{x \to a} h(x) = K < 0$. The h(x) values
are red circles to the right of 0. The limiting value K is a blue star to the left of 0. No way can this happen since the red
circles can never get close to the blue star.
Proof. Property 4) Define h(x) = g(x) - f(x). Then $h(x) \ge 0$ for all $x \in S$. Let K = M - L. If K < 0, we can get
a contradiction. We know from Property 2 (Linearity) of Limits that $K = M - L = \lim_{x \to a} h(x)$. See Figure 12. The red circles are the
values of h(x) to the right of 0 and the blue star is K, to the left of 0. Take $\varepsilon = \frac{|K|}{2}$. Then $\exists \delta$ such that $0 < \|x - a\| < \delta$
implies $|h(x) - K| < \frac{|K|}{2}$.
Then, since K is negative, $h(x) - K < -\frac{K}{2}$ and, adding K to both sides, $h(x) < \frac{K}{2} < 0$, a contradiction to
$h(x) \ge 0$.
Exercise. Prove Property 5) of Limits.
We won't discuss limits of products in general. You can refer to Lang, Undergraduate Analysis, p.162-3 for the general
story of limits of products. You get the joy of considering a special case in an exercise. There are lots of other cases one could
look at; e.g., scalar valued function times vector valued function, matrix valued function times matrix valued function, ....
If you hate $\varepsilon$-$\delta$ stuff, you will love the following theorem, which allows you to think about limits of sequences instead.
Theorem 26 Sequential Definition of Limits. Suppose $S \subset V$ and $f : S \to W$, where V and W are normed vector
spaces. Assume that a is an accumulation point of the set S. Then the existence of $\lim_{x \to a} f(x) = L$ is equivalent to saying
that for every sequence of vectors $\{x_n\}$ in S such that $\lim_{n \to \infty} x_n = a$, we have $\lim_{n \to \infty} f(x_n) = L$.
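If, by contradiction, $\lim_{x \to a} f(x)$ does not equal L, then (using the rules for negating a statement involving lots of $\forall$ and $\exists$), we see that $\exists \varepsilon > 0$ s.t.
$\forall \delta > 0$, $\exists x \in S$ with $0 < \|x - a\| < \delta$ and $\|f(x) - L\| \ge \varepsilon$. Taking $\delta = \frac{1}{n}$ for $n \in \mathbb{Z}^+$ gives a sequence $\{x_n\}$ in S with $\lim_{n \to \infty} x_n = a$
for which $f(x_n)$ does not converge to L.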
Maybe we should try to draw a picture of the definition of limit in higher dimensions. The problem is that it is hard to
draw the graph of a function unless it maps a subset of the plane into the reals. Here the graph of z = f(x, y) is 3-dimensional.
Just plot the points (x, y, f(x, y)) in 3-space. Of course in the infinite dimensional case good luck drawing pictures. Even
drawing the graph of a function from the plane to the plane requires 4 dimensional pictures. You can still project them
down to 2 dimensions as you would in the case of a real valued function of 2 variables. Or you can make a movie of the
graph being rotated.
We see (assuming that c is an accumulation point of S) that $f : U \to W$ is continuous at $c \in U$ iff $\lim_{x \to c} f(x) = f(c)$.
The following theorem allows you to erase $\varepsilon$-$\delta$ from your vocabulary. For a proof, see Apostol, Mathematical Analysis,
p. 82.
Theorem 28 Suppose that V, W are normed vector spaces and S is a subset of V. We say that $f : S \to W$ is continuous
at $c \in S$ iff for every open set $Y \subset W$, the inverse image $f^{-1}(Y) = \{x \in V \mid f(x) \in Y\}$ is open in V.
The definition includes some slightly crazy continuous functions. If V = W = R and S = Z, every point of Z is isolated
meaning that there is a $\delta > 0$ such that $B(p, \delta) \cap \mathbb{Z} = \{p\}$. Take $\delta = \frac{1}{2}$. Thus every function $f : \mathbb{Z} \to \mathbb{R}$ is continuous.
When V = W = R, we view continuity to mean that the graph of y = f(x) does not break up at x = c. When
$V = \mathbb{R}^2$, you can think a similar thing about the surface z = f(x, y) in 3-space. But recalling Figure 11 of the function
$f(x, y) = \frac{y^2 - x^2}{y^2 + x^2}$, it is even hard to see the break up for a function of 2 variables. It is easier to recall that we saw that f(x, y)
has a different value on various lines through the origin. It is 1 on the y-axis and -1 on the x-axis, for example.
We can use the properties of limits to deduce the following properties of continuous functions.
Property 1) Linearity. Suppose that $f, g : U \to W$, where $U \subset V$ and V, W are normed vector spaces. Let $\alpha, \beta$ be
(real) scalars. Then f and g continuous at $c \in U$ implies that $(\alpha f + \beta g)$ is continuous at c.
Property 2) Composition. Suppose that V, W, Z are normed vector spaces with $U \subset V$ and $T \subset W$. Let $c \in U$. Suppose
that $f : U \to T$ and $g : T \to Z$. Suppose f is continuous at c and g is continuous at f(c). Then $g \circ f$ is continuous at c.
Property 3) Sequential Definition of Continuity.
Assume V, W are normed vector spaces with $U \subset V$. The function $f : U \to W$ is continuous at $c \in U$ iff $\forall$ sequence
$\{v_n\}$ of vectors in U such that $\lim_{n \to \infty} v_n = c$, we have $\lim_{n \to \infty} f(v_n) = f(c)$.
For the proofs, you just have to look at the proofs of the corresponding properties of limits. We leave it to you as an
exercise.
Examples (the same as those in the section on limits).
Example 1) Let V = C[a, b], the space of continuous real-valued functions on the finite interval [a, b]. Define $I : V \to \mathbb{R}$
by $I(f) = \int_a^b f(x)\, dx$. Suppose we use $\|f\|_1 = \int_a^b |f(x)|\, dx$ on V and the usual absolute value on R. We showed earlier that
the linear function I(f) is actually continuous at f = 0. Now I claim I(f) is continuous on V; i.e., continuous everywhere.
Why? Using properties of the integral on continuous functions that we proved in Lectures I,
$$|I(f) - I(g)| = |I(f - g)| \le I(|f - g|) = \|f - g\|_1.$$
This means that given $\varepsilon > 0$, we can take $\delta = \varepsilon$ and then $\|f - g\|_1 < \delta$ implies $|I(f) - I(g)| < \varepsilon$ (which is the
definition of continuity at g (or f)). In fact, since $\delta$ depends only on $\varepsilon$ and not on f or g, we have proved that the function
I(f) is uniformly continuous - a concept we are about to define.
Exercise. Is I(f) still continuous when we replace the norm $\|f\|_1$ on V with $\|f\|_2$? Explain your answer.
Example 2) Look again at the function $f(x, y) = \frac{y^2 - x^2}{y^2 + x^2}$, for $(x, y) \ne (0, 0)$ in $\mathbb{R}^2$. We know that this function cannot be
continuous at (0, 0) since it has no limit as $(x, y) \to (0, 0)$. This function is continuous at every other point though. There
are more such examples in the exercises.
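Evaluating the function $f(x, y) = (y^2 - x^2)/(y^2 + x^2)$ along different lines through the origin shows why no limit exists at (0, 0): along the line y = mx the value is $(m^2 - 1)/(m^2 + 1)$ no matter how close the point is to the origin. The sketch below illustrates this; the sample distances t and the slopes chosen are arbitrary.

```python
def f(x, y):
    """f(x, y) = (y^2 - x^2) / (y^2 + x^2), defined for (x, y) != (0, 0)."""
    return (y * y - x * x) / (y * y + x * x)

for t in (1e-1, 1e-3, 1e-6):
    on_x_axis = f(t, 0.0)      # points (t, 0)
    on_y_axis = f(0.0, t)      # points (0, t)
    on_diagonal = f(t, t)      # points on the line y = x
    on_steep = f(t, 2 * t)     # points on the line y = 2x
    print(t, on_x_axis, on_y_axis, on_diagonal, on_steep)
```

Each column stays constant as t shrinks, but the columns disagree with each other, so the values of f do not settle down to a single number near the origin.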
Definition 29 Suppose that V, W are normed vector spaces and $U \subset V$. We say that $f : U \to W$ is uniformly continuous
on U iff $\forall \varepsilon > 0$ $\exists \delta > 0$ (with $\delta$ depending only on $\varepsilon$) such that $u, v \in U$, $\|u - v\| < \delta$ implies $\|f(u) - f(v)\| < \varepsilon$.
The point in this definition is that $\delta$ does not depend on $u, v \in U$.
Example 1 just considered is an example of a uniformly continuous function where in fact $\delta = \varepsilon$. We get lots more
examples using the following theorem.
Theorem 30 Suppose K is a compact subset of a normed vector space V and W is any normed vector space. A continuous
function $f : K \to W$ must be uniformly continuous on K.
For a proof of the preceding theorem, see p. 198 of Lang, Undergraduate Analysis.
More Examples.
Example 3) V = normed vector space. Let $f(x) = \|x\|$. Then f is uniformly continuous on V. For all $x, y \in V$, the
triangle inequality implies (as it did for ordinary absolute value in an early homework problem from part 1),
$$\big| \|x\| - \|y\| \big| \le \|x - y\|.$$
This says we can take $\delta = \varepsilon$ again.
Here $e_j$ denotes the column vector in $\mathbb{R}^n$ with 1 in the jth row and the rest of the entries being 0:
$$e_j = (0, \dots, 0, 1, 0, \dots, 0)^T.$$
Every vector $v \in \mathbb{R}^n$ can be written uniquely in the form:
$$v = (v_1, \dots, v_n)^T = \sum_{j=1}^{n} v_j e_j.$$
It follows from linearity of $L$ that
$$Lv = \sum_{j=1}^{n} v_j L e_j.$$
Write $L e_j = (a_{1j}, \dots, a_{mj})^T \in \mathbb{R}^m$. So we see that
$$Lv = \sum_{j=1}^{n} v_j\, (a_{1j}, \dots, a_{mj})^T = Av,$$
where we multiply the matrix $A$ whose entries are $a_{ij}$ with the vector $v$. As an exercise, show that the linear function $L$ is
uniformly continuous. You can see this by using the infinity norm on $\mathbb{R}^n$ and $\mathbb{R}^m$ and showing that there is a constant $C > 0$
so that $\|Lx\|_\infty \le C \|x\|_\infty$. The constant $C$ depends on the entries $a_{ij}$ of the matrix $A$. If you take $K = \max |a_{ij}|$,
then $C = nK$ should work.
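A quick numerical sanity check of the bound $\|Lx\|_\infty \le nK\|x\|_\infty$ can be sketched as follows. This is only an illustration on random data, not a proof; the helper names (`inf_norm`, `mat_vec`, `lipschitz_constant`) and the random test matrix are assumptions made for this example.

```python
import random

def inf_norm(v):
    """Infinity norm: the largest absolute entry of a vector."""
    return max(abs(t) for t in v)

def mat_vec(A, v):
    """Multiply an m x n matrix (a list of rows) by a vector of length n."""
    return [sum(a * t for a, t in zip(row, v)) for row in A]

def lipschitz_constant(A):
    """The constant C = n*K from the text, where K = max |a_ij| and n = number of columns."""
    n = len(A[0])
    K = max(abs(a) for row in A for a in row)
    return n * K

random.seed(0)
A = [[random.uniform(-3, 3) for _ in range(4)] for _ in range(3)]  # m = 3, n = 4
C = lipschitz_constant(A)

# Largest observed ratio ||Ax|| / ||x|| over random nonzero vectors x.
worst_ratio = 0.0
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(4)]
    worst_ratio = max(worst_ratio, inf_norm(mat_vec(A, x)) / inf_norm(x))

print(worst_ratio <= C)
```

Since $L$ is linear, the same constant gives $\|Lx - Ly\|_\infty \le C\|x - y\|_\infty$, which is the uniform continuity asked for in the exercise.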
When a normed vector space $V$ is not complete, one can always obtain a larger complete space $W$ containing $V$. This
is done by taking all the Cauchy sequences $\{x_n\}$ of elements $x_n \in V$ modulo the equivalence relation $\{x_n\} \sim \{y_n\}$ iff
$\|x_n - y_n\| \to 0$ as $n \to \infty$. This larger space $W$ is then called the completion of $V$. See Lang, Undergraduate Analysis for
more information on completions.
11
Proof. Recall that $V$ "complete" means every Cauchy sequence in $V$ converges to an element of $V$. So let $\{f_n\}$ be a Cauchy
sequence in $C[a, b]$ using the norm $\|f\|_\infty$. This means for every $x \in [a, b]$, the sequence $\{f_n(x)\}$ of real numbers is Cauchy;
as $\forall \varepsilon > 0$ $\exists N$ s.t.
$$|f_n(x) - f_m(x)| \le \|f_n - f_m\|_\infty < \varepsilon \quad \text{when } n, m \ge N. \qquad (1)$$
We showed in Lectures I that Cauchy sequences of real numbers converge to a limit in $\mathbb{R}$. Thus $\forall x \in [a, b]$ there is a function
$f(x) = \lim_{n \to \infty} f_n(x)$. Now we need to show that $f_n$ converges uniformly to $f$ on $[a, b]$.
Let $\varepsilon > 0$ be given. There is $M = M(x, \varepsilon) \ge N$ so that $m \ge M$ implies $|f_m(x) - f(x)| < \varepsilon$. Then for $n \ge N$ we have
the following sneaky formula by adding and subtracting $f_m(x)$ and using the triangle inequality:
$$|f(x) - f_n(x)| \le |f(x) - f_m(x)| + |f_m(x) - f_n(x)| < \varepsilon + \|f_n - f_m\|_\infty < 2\varepsilon. \qquad (2)$$
We chose $N$ so that formula (1) holds. This implies $\|f - f_n\|_\infty \le 2\varepsilon$ for $n \ge N$ which is uniform convergence of $f_n$ to $f$
on $[a, b]$ since $N$ does not depend on $x$.
Next we need to show that $f$ is continuous on $[a, b]$. To see this, note that for $x, y \in [a, b]$, using the triangle inequality
in a sneaky way again (this time adding and subtracting $f_n(x)$ and $f_n(y)$):
$$|f(x) - f(y)| = |f(x) - f_n(x) + f_n(x) - f_n(y) + f_n(y) - f(y)|$$
$$\le |f(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f(y)|.$$
We know that for $n \ge N$ the 1st and 3rd terms are $< 2\varepsilon$ by formula (2). Since $f_n$ is continuous, there is a positive $\delta$,
depending on $n$, $\varepsilon$ and $y$ such that $|x - y| < \delta$ implies the middle term is also $< \varepsilon$. So the final result is that $|f(x) - f(y)| < 5\varepsilon$,
if $|x - y| < \delta$. Replace $\varepsilon$ by $\varepsilon/5$, if you like.
Part II
The Basics
In this section I will be more sketchy than usual. Hope that is OK and that you remember some of the series part of calculus.
We are mostly interested in series of functions like power series and Fourier series. That is, we are interested in series
$\sum v_n$ in a normed vector space like $C[a, b]$ that is infinite dimensional. And we want to know: can we interchange limit and $\sum$,
derivative and $\sum$, integral and $\sum$? The answer will depend on the norm used. Before doing the infinite dimensional theory
of series we'd better review the theory of series of real numbers.
First define what we mean by convergence of a series in a normed vector space; i.e., convergence of the sequence of partial
sums.
Definition 32 Suppose $V$ is a normed vector space. Let $\{v_n\}$ be a sequence of vectors in $V$. Then we say the series
$\sum_{n=1}^{\infty} v_n$ converges to $s$ and write
$$\sum_{n=1}^{\infty} v_n = s \quad \text{iff} \quad s = \lim_{n \to \infty} \sum_{k=1}^{n} v_k.$$
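If the series converges, then the partial sums $s_n = \sum_{k=1}^{n} v_k$ form a Cauchy sequence. So, given $\varepsilon > 0$, there is an $N$
such that $n > m \ge N$ implies
$$\|s_n - s_m\| = \Big\| \sum_{k=1}^{n} v_k - \sum_{k=1}^{m} v_k \Big\| = \Big\| \sum_{k=m+1}^{n} v_k \Big\| < \varepsilon.$$
Now take $n = m + 1$ with $m \ge N$. Then we see that $\|v_n\| < \varepsilon$. We have proved the following necessary condition for convergence
of a series.
Necessary but NOT Sufficient Condition for Convergence. The series $\sum_{n=1}^{\infty} v_n$ converges implies that the terms
satisfy $\lim_{n \to \infty} \|v_n\| = 0$.
This condition is not sufficient for convergence. The integral test will give us many examples (e.g., the harmonic series).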
13
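Test 1. Suppose $a_n \ge 0$ for all $n$. Then $\sum_{n=1}^{\infty} a_n$ converges iff the partial sums $\sum_{k=1}^{n} a_k$ are bounded.
Proof. The partial sums form an increasing sequence. We know from Lectures I that an increasing sequence $s_n$ converges
iff it is bounded and then it converges to the least upper bound of the set $\{s_1, s_2, s_3, ...\}$.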
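Test 2. Comparison Test.
Suppose $\sum_{n=1}^{\infty} a_n$ and $\sum_{n=1}^{\infty} b_n$ are series of non-negative terms, and that there are a constant $C > 0$ and an $N$ such
that $a_n \le C b_n$ for all $n \ge N$.
a) If $\sum_{n=1}^{\infty} b_n$ converges, then so does $\sum_{n=1}^{\infty} a_n$.
b) If $\sum_{n=1}^{\infty} a_n$ diverges, then so does $\sum_{n=1}^{\infty} b_n$.
Proof. a) You can ignore all the $a_n$ for $n < N$. They cannot affect the convergence of $\sum_{n=1}^{\infty} a_n$. So assume $a_n \le C b_n$ for all $n$. Then
$$\sum_{k=1}^{n} a_k \le C \sum_{k=1}^{n} b_k \le C \sum_{k=1}^{\infty} b_k,$$
so the partial sums of $\sum a_n$ are bounded and the series converges.
b) Exercise.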
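Test 3. Ratio Test.
Suppose that $0 \le a_n$.
a) If $\exists$ a positive constant $C$ such that $C < 1$ and $a_{n+1} \le C a_n$, for all $n \ge N$, then $\sum_{n=1}^{\infty} a_n$ converges.
b) If $\exists$ a positive constant $C$ such that $C > 1$ and $a_{n+1} \ge C a_n$, for all $n \ge N$, then $\sum_{n=1}^{\infty} a_n$ diverges.
Proof. a) The hypothesis implies $a_{N+k} \le C^k a_N$ for $k = 1, 2, ....$ The comparison test then applies, since for $0 < C < 1$
the geometric series converges:
$$\sum_{n=0}^{\infty} C^n = \frac{1}{1 - C}.$$
b) The hypothesis implies $a_{N+k} \ge C^k a_N$ for $k = 1, 2, ....$ The comparison test will then imply the conclusion of part b) since for $C > 1$,
$\sum_{n=0}^{\infty} C^n$ diverges. In fact, when $a_N > 0$ the terms $a_n$ do not even tend to 0.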
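Test 4. Integral Test. Suppose that $f$ is a positive, decreasing, continuous function on $[1, \infty)$. Then
$$\sum_{n=1}^{\infty} f(n) \text{ converges} \iff \int_1^{\infty} f(x)\,dx \text{ converges.}$$
Proof. Since $f$ is decreasing, for $n \ge 2$ we have
$$f(n) \le \int_{n-1}^{n} f(x)\,dx \le f(n - 1).$$
Summing over $n = 2, \dots, N$ gives
$$\sum_{n=2}^{N} f(n) \le \int_1^{N} f(x)\,dx \le \sum_{n=1}^{N-1} f(n).$$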
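The sandwich behind the integral test, $\sum_{n=2}^{N} f(n) \le \int_1^N f \le \sum_{n=1}^{N-1} f(n)$ for a positive decreasing $f$, can be checked numerically. The sketch below assumes the sample function $f(x) = 1/x^2$, whose integral from 1 to $N$ is exactly $1 - 1/N$; the function names are made up for this illustration.

```python
def f(x):
    """Sample positive decreasing function on [1, infinity)."""
    return 1.0 / x**2

def integral_1_to_N(N):
    """Exact value of the integral of 1/x^2 from 1 to N."""
    return 1.0 - 1.0 / N

def integral_test_bounds(N):
    """Return (lower sum, integral, upper sum) for the sandwich inequality."""
    lower = sum(f(n) for n in range(2, N + 1))   # f(2) + ... + f(N)
    upper = sum(f(n) for n in range(1, N))       # f(1) + ... + f(N-1)
    return lower, integral_1_to_N(N), upper

for N in (2, 10, 1000):
    lower, middle, upper = integral_test_bounds(N)
    print(N, lower <= middle <= upper)
```

Because the integral stays below 1 for every $N$, the lower sums are bounded, which is how the test gives convergence of $\sum 1/n^2$.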
Example.
My favorite function is the Riemann zeta function.
$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$
The integral test says that this converges if $s > 1$ and diverges if $s \le 1$.
The case $s = 1$ is the divergent harmonic series. The cases $s = 2n$ an even integer $> 1$ were evaluated by Euler as
rational numbers times $\pi^{2n}$. Some have wasted much time thinking about the case of $s = 3$ and the like producing some
horrendous formulas but not anything like Euler's formula.
Riemann wrote a paper in 1859 showing how to make sense of $\zeta(s)$ for all complex numbers $s$ except for $s = 1$. Why
was Riemann interested in this? He was interested in the distribution of prime numbers 2, 3, 5, 7, 11, .... Euler had shown
the product formula that for Re $s > 1$,
$$\zeta(s) = \prod_{p \text{ prime}} \left( 1 - \frac{1}{p^s} \right)^{-1}.$$
Here we define the infinite product as the limit of the partial products. We won't prove this formula. It relies on the
geometric series and the fundamental theorem of arithmetic (every $n \ge 1$ is a product of powers of primes and the product
is unique up to order).
Thanks to complex analysis, it turns out that the location of the complex numbers $s$ such that $\zeta(s) = 0$ gives information
about the distribution of primes. In 1896, almost 40 years after Riemann's work, Hadamard and de la Vallée Poussin used what was
known about the location of zeta zeros to show the prime number theorem which says that the number of primes $\le x$ is
asymptotic to $\frac{x}{\log x}$ as $x \to \infty$.
Riemann stated the Riemann hypothesis saying that the non-real zeros of $\zeta(s)$ must have Re $s = \frac{1}{2}$. You win 1 million
dollars from the Clay Math. Institute if you can prove it. However, I think there was a deadline for the proof and that
deadline has passed. A reference for zeta is Edwards, Riemann's Zeta Function.
14
Non-Absolute Convergence
n=1
an diverges.
n=1
s2n+1
This implies (since the terms an are decreasing) that the odd partial sums are decreasing.
Case 2. The even partial sums
$$s_{2n} = (a_1 - a_2) + (a_3 - a_4) + \cdots + (a_{2n-1} - a_{2n}).$$
s2n+2
using stuff from Lectures I (the existence of the l.u.b.). The odd partial sums are a decreasing sequence bounded below and
thus have a limit
$$M = \lim_{n \to \infty} s_{2n+1}.$$
To see that
$$L = M = \lim_{n \to \infty} s_n,$$
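Example. $\sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n}$ converges by the alternating series test. With some effort, using the Taylor series for
$\log(1 - x) = -\sum_{n=1}^{\infty} \frac{x^n}{n}$, one finds that
$$\log 2 = \sum_{n=1}^{\infty} (-1)^{n-1} \frac{1}{n} = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \frac{1}{5} - \frac{1}{6} + \frac{1}{7} - \frac{1}{8} + \cdots.$$
If one rearranges this series to sum one odd then two evens:
$$s = 1 - \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{6} - \frac{1}{8} + \frac{1}{5} - \frac{1}{10} - \frac{1}{12} + \cdots,$$
with a little algebra, one can show that $s = \frac{\log 2}{2}$. That is, we get half the sum of the original series with alternating signs.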
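The effect of the rearrangement can be seen numerically. The sketch below sums the alternating harmonic series in its usual order and in the "one odd, then two evens" order, grouping the rearranged series into blocks $\frac{1}{2k-1} - \frac{1}{4k-2} - \frac{1}{4k}$; the function names and the number of terms are arbitrary choices for this illustration.

```python
import math

def alternating_harmonic(num_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... with num_terms terms."""
    return sum((-1) ** (n - 1) / n for n in range(1, num_terms + 1))

def rearranged(num_blocks):
    """Partial sum of the rearranged series, one odd term then two even terms per block:
    block k is 1/(2k-1) - 1/(4k-2) - 1/(4k)."""
    total = 0.0
    for k in range(1, num_blocks + 1):
        total += 1.0 / (2 * k - 1) - 1.0 / (4 * k - 2) - 1.0 / (4 * k)
    return total

print(alternating_harmonic(100000), math.log(2))
print(rearranged(100000), math.log(2) / 2)
```

The "little algebra" is visible in the code: each block equals $\frac{1}{4k-2} - \frac{1}{4k} = \frac{1}{2}\left(\frac{1}{2k-1} - \frac{1}{2k}\right)$, half of two consecutive terms of the original series.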
Theorem 34 Riemann's Rearrangement Theorem. You can rearrange a series that converges conditionally to make
it converge to anything or diverge to $\infty$ or $-\infty$. Rearrangement of the series $\sum_{n=1}^{\infty} b_n$ means take a 1-1, onto map $\sigma$ of the
positive integers onto the positive integers and look at the series $\sum_{n=1}^{\infty} b_{\sigma(n)}$.
The moral of this theorem is that conditionally or non-absolutely convergent series are quite nasty. Absolute convergence
of $\sum_{n=1}^{\infty} v_n$ means $\sum_{n=1}^{\infty} \|v_n\|$ converges in $\mathbb{R}$ and will be discussed in the next section. For a proof of Riemann's Rearrangement ...
15
Suppose that V is a complete normed vector space (recall complete means contains limits of all Cauchy sequences which
must converge to limits in V ).
Theorem 35 Absolute Convergence $\Rightarrow$ Convergence in a Complete Normed Vector Space.
Suppose $V$ is a complete normed vector space and $v_n \in V$ for all $n$. Then $\sum_{n=1}^{\infty} \|v_n\|$ converges in $\mathbb{R}$ implies
$\sum_{n=1}^{\infty} v_n$ converges to a limit $s$ in $V$.
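Proof. We need to show that the partial sums $s_n = \sum_{k=1}^{n} v_k$ of the series $\sum_{n=1}^{\infty} v_n$
converge to a limit $s \in V$. Since $V$ is complete, it suffices to show that $\{s_n\}$ is a Cauchy sequence. To see this, assume $n > m$ and use the triangle inequality to obtain
$$\|s_n - s_m\| = \Big\| \sum_{k=m+1}^{n} v_k \Big\| \le \sum_{k=m+1}^{n} \|v_k\|.$$
We can choose $N$ to make the last sum $< \varepsilon$ for $n > m \ge N$ by the hypothesis that $\sum_{n=1}^{\infty} \|v_n\|$ converges in $\mathbb{R}$.
Example. Suppose $X$ is an $m \times m$ real matrix. Define
$$e^X = \exp(X) = \sum_{n=0}^{\infty} \frac{1}{n!} X^n.$$
Define a norm on the $m \times m$ real matrices by
$$\|X\| = \max \{ \|Xv\| \mid v \in \mathbb{R}^m, \|v\| = 1 \}.$$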
Here $\|v\|$ denotes any of our favorite norms on $\mathbb{R}^m$. We know we can say max rather than l.u.b. or sup by Theorem 2.2,
p. 197, of Lang, Undergraduate Analysis, because the unit sphere in $\mathbb{R}^m$ is compact and the function on $\mathbb{R}^m$ produced
by taking vector $v$ to $Xv$ is continuous linear. More about matrix norms can be found in Strang, Linear Algebra and its
Applications, Chapter 7 or Horn and Johnson, Matrix Analysis, Chapter 5.
Then one can show that
$$\|XY\| \le \|X\| \, \|Y\|.$$
Since $\sum_{n=0}^{\infty} \frac{1}{n!} \|X\|^n$ always converges, we know that the matrix exponential series always converges.
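As an illustration of the matrix exponential series, the following sketch sums the first few terms $\frac{1}{n!}X^n$ for a $2 \times 2$ matrix using plain Python lists. The helper names `mat_mul` and `mat_exp`, the cutoff of 30 terms, and the test matrix are assumptions for this example. For $X = \begin{pmatrix} 0 & -t \\ t & 0 \end{pmatrix}$ the exponential is known to be the rotation matrix with entries $\cos t$ and $\pm\sin t$, which gives something to compare against.

```python
import math

def mat_mul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def mat_exp(X, num_terms=30):
    """Partial sum of the exponential series: sum of X^k / k! for k = 0 .. num_terms-1."""
    n = len(X)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # X^0 / 0! = I
    total = [row[:] for row in term]
    for k in range(1, num_terms):
        term = mat_mul(term, X)
        term = [[entry / k for entry in row] for row in term]  # term is now X^k / k!
        for i in range(n):
            for j in range(n):
                total[i][j] += term[i][j]
    return total

t = 1.0
X = [[0.0, -t], [t, 0.0]]
E = mat_exp(X)
print(E)
print([[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]])
```

The terms shrink like $\|X\|^n / n!$, so a modest number of terms already agrees with the rotation matrix to machine precision for this $X$.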
Theorem 36 If
n=1
exam from Lectures I when we looked at the fractal nature of the Weierstrass nowhere differentiable continuous function.
This test gave the continuity.
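Theorem 37 Weierstrass M-Test.
Theorem 38 Let $f_n \in C[a, b]$. Suppose $\|f_n\|_\infty \le M_n$ for all $n = 1, 2, 3, ...$ and that $\sum_{n=1}^{\infty} M_n$ converges in $\mathbb{R}$.
Then $\sum_{n=1}^{\infty} f_n$ converges in $C[a, b]$ with respect to the norm $\|\cdot\|_\infty$; i.e., uniformly on $[a, b]$.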
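Proof. We make our usual argument to see that the sequence of partial sums of $\sum_{n=1}^{\infty} f_n$ is Cauchy in the $\|\cdot\|_\infty$
norm. Then completeness of $C[a, b]$ completes the proof that $\sum_{n=1}^{\infty} f_n$ converges.
That is, the nth partial sum $s_n = \sum_{k=1}^{n} f_k$ implies, for $n > m$,
$$\|s_n - s_m\|_\infty = \Big\| \sum_{k=1}^{n} f_k - \sum_{k=1}^{m} f_k \Big\|_\infty = \Big\| \sum_{k=m+1}^{n} f_k \Big\|_\infty \le \sum_{k=m+1}^{n} \|f_k\|_\infty \le \sum_{k=m+1}^{n} M_k.$$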
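Example 1. The series $\sum_{n=1}^{\infty} \frac{\sin(nx)}{n^2}$ converges uniformly on $\mathbb{R}$ by the M-test with $M_n = \frac{1}{n^2}$, since the series
$\sum_{n=1}^{\infty} M_n = \sum_{n=1}^{\infty} \frac{1}{n^2}$ converges. If $\frac{1}{n^2}$ is replaced with $\frac{1}{n}$,
the question of convergence is more delicate. We will think about it when we look at Fourier series.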
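The key estimate in the M-test, $\|s_n - s_m\|_\infty \le \sum_{k=m+1}^{n} M_k$, can be illustrated for the series $\sum \sin(kx)/k^2$ with $M_k = 1/k^2$. The sketch below approximates the sup norm by a maximum over a finite grid in $[0, 2\pi]$, which is an assumption of the example (a grid can only underestimate the true sup); the function names are hypothetical.

```python
import math

def partial_sum(x, n):
    """s_n(x) = sum of sin(kx)/k^2 for k = 1 .. n."""
    return sum(math.sin(k * x) / k**2 for k in range(1, n + 1))

def sup_gap(m, n, grid_size=400):
    """Max over a grid in [0, 2*pi] of |s_n(x) - s_m(x)|, an approximation of the sup norm."""
    xs = [2 * math.pi * i / grid_size for i in range(grid_size + 1)]
    return max(abs(partial_sum(x, n) - partial_sum(x, m)) for x in xs)

def m_tail(m, n):
    """The bound M_{m+1} + ... + M_n with M_k = 1/k^2."""
    return sum(1.0 / k**2 for k in range(m + 1, n + 1))

print(sup_gap(10, 50), m_tail(10, 50))
```

The bound on the right does not depend on $x$, which is exactly why the Cauchy condition holds in the sup norm.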
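Then
$$\sum_{n=0}^{\infty} M_n = \sum_{n=0}^{\infty} c^n = \frac{1}{1 - c},$$
so the geometric series $\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}$ converges uniformly for $|x| \le c < 1$.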
Then $L : V \to V$ is linear and can be shown to be continuous. We can define a norm on the space $L(V, V)$ of such
operators $L$ to be $\|L\| = \text{lub} \{ \|Lf\| \mid f \in V, \|f\| = 1 \}$. This is called the operator norm.
One has an operator-valued geometric series for $\lambda \in \mathbb{R}$:
$$\sum_{n=0}^{\infty} \lambda^n L^n = (I - \lambda L)^{-1}.$$
One can show, using properties of the operator norm, that this series converges when $\|\lambda L\| = |\lambda| \, \|L\| < 1$. This fact is quite useful when considering the
spectral theory of integral and differential operators. See Courant and Hilbert, Methods of Mathematical Physics, Vol. I
and Stakgold, Green's Functions and Boundary Value Problems for more information.
Part III
Power Series
16
Radius of Convergence
We look at power series of the form $\sum_{n=0}^{\infty} a_n x^n$
for simplicity. We consider $x \in \mathbb{R}$ mostly. The same arguments should work for complex numbers and matrices, with a
little thought.
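Theorem 39 a) Suppose $\{a_n\}$ is a sequence of real numbers and $\sum_{n=0}^{\infty} a_n x^n$ converges. Then $\sum_{n=0}^{\infty} a_n u^n$ converges absolutely if $|u| < |x|$.
b) If $\sum_{n=0}^{\infty} a_n x^n$ diverges, then $\sum_{n=0}^{\infty} a_n u^n$ diverges if $|u| > |x|$.
Proof. a) $\sum_{n=0}^{\infty} a_n x^n$ converges implies $\lim_{n \to \infty} a_n x^n = 0$. Since convergent implies bounded, we have $|a_n x^n| \le M$ for all $n$.
This implies if $|u| < |x|$, we can apply the comparison test by writing
$$|a_n u^n| = |a_n x^n| \left| \frac{u}{x} \right|^n \le M \left| \frac{u}{x} \right|^n.$$
So we can compare the series $\sum_{n=0}^{\infty} |a_n u^n|$ with the convergent geometric series $\sum_{n=0}^{\infty} \left| \frac{u}{x} \right|^n$.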
The radius of convergence $R$ of $\sum_{n=0}^{\infty} a_n x^n$ is
$$R = \text{l.u.b.} \Big\{ r \;\Big|\; r \ge 0, \ \sum_{n=0}^{\infty} |a_n| r^n \text{ converges} \Big\}.$$
I am using the convention here that $\mathbb{R}$ (blackboard bold face font) is the set of real numbers and should not be confused
with ordinary capital $R$. This convention comes to us from Nicolas Bourbaki (the unreal mathematician).
n=0
an xn .
n=0
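We assume $a_n \ne 0$, $\forall n$.
Formula 1) Assume the limit $\lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|$ exists. Then
$$R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right|.$$
Formula 2) Again, assuming the limit involved exists, we have the formula:
$$\frac{1}{R} = \lim_{n \to \infty} |a_n|^{1/n}.$$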
an xn . That is
n
an xn . Setting lim aan+1
= c, we see that the ratio
n
n=0
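Example 1. $\sum_{n=0}^{\infty} x^n = \frac{1}{1 - x}$. Here $a_n = 1$, for all $n$, so
$$R = \lim_{n \to \infty} \left| \frac{a_n}{a_{n+1}} \right| = \lim_{n \to \infty} 1 = 1.$$
Example 2. $\sum_{n=0}^{\infty} \frac{1}{n!} x^n$. Here
$$R = \lim_{n \to \infty} \frac{1/n!}{1/(n+1)!} = \lim_{n \to \infty} \frac{(n + 1)!}{n!} = \lim_{n \to \infty} (n + 1) = \infty.$$
Example 3. $\sum_{n=0}^{\infty} n!\, x^n$.
The radius of convergence is the reciprocal of that for Example 2. Thus you get $R = 0$. This series only converges at
$x = 0$.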
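The ratio formula $R = \lim |a_n / a_{n+1}|$ is easy to experiment with. The sketch below evaluates the ratio exactly (using Python's `fractions.Fraction`) at a single large index for the coefficient sequences $a_n = 1$, $a_n = 1/n!$ and $a_n = n!$; the function names are made up for this illustration, and looking at one index only suggests the limit, it does not prove it.

```python
from fractions import Fraction
import math

def ratio_estimate(a, n):
    """|a(n) / a(n+1)|, whose limit as n grows (when it exists) is the radius R."""
    return abs(Fraction(a(n)) / Fraction(a(n + 1)))

def geometric(n):
    """Coefficients of sum x^n."""
    return Fraction(1)

def exponential(n):
    """Coefficients of sum x^n / n!."""
    return Fraction(1, math.factorial(n))

def factorial_series(n):
    """Coefficients of sum n! x^n."""
    return Fraction(math.factorial(n))

print(ratio_estimate(geometric, 50))         # stays at 1
print(ratio_estimate(exponential, 50))       # equals n + 1, grows without bound
print(ratio_estimate(factorial_series, 50))  # equals 1/(n + 1), shrinks to 0
```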
Exercise. Replace $\mathbb{R}$ in Examples 1 and 2 with the space $\mathbb{R}^{n \times n}$ of $n \times n$ real matrices. Obtain the matrix geometric series
and the matrix exponential. Where do they converge? Use the definition $\|L\| = \text{lub} \{ \|Lv\| \mid v \in \mathbb{R}^n, \|v\| = 1 \}$ to get a
norm on the space of matrices. See Apostol, Calculus, Vol. II for many applications of the matrix exponential to differential
equations.
17
n=0
d
dx
),
n=0
Example.
Integrate the geometric series to get the power series for $\log(1 - x)$. This is legal if $|x| < 1$ using the corollary of the
following theorem.
$$-\log(1 - x) = \int_0^x \frac{1}{1 - t}\,dt = \int_0^x \sum_{n=0}^{\infty} t^n\,dt = \sum_{n=0}^{\infty} \int_0^x t^n\,dt = \sum_{n=0}^{\infty} \left. \frac{t^{n+1}}{n + 1} \right|_0^x = \sum_{n=0}^{\infty} \frac{x^{n+1}}{n + 1}.$$
Note that the formula makes sense if $x = -1$, although the following theorem and corollary do not justify the equality at $-1$.
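A numerical check of the integrated geometric series, $-\log(1 - x) = \sum_{n \ge 0} x^{n+1}/(n+1)$, is sketched below. The function name and the numbers of terms are arbitrary choices for this illustration. Inside $|x| < 1$ a few dozen terms suffice; at the endpoint $x = -1$ the series still converges (to $-\log 2$), but only slowly.

```python
import math

def neg_log_series(x, num_terms):
    """Partial sum of sum_{n=0}^{num_terms-1} x^(n+1) / (n+1)."""
    return sum(x ** (n + 1) / (n + 1) for n in range(num_terms))

print(neg_log_series(0.5, 60), -math.log(1 - 0.5))
print(neg_log_series(-1.0, 100000), -math.log(2))
```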
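Theorem 43 (Interchange of limit and integral). Suppose that $f_n : [a, b] \to \mathbb{R}$ is continuous $\forall n$ and converges
uniformly on $[a, b]$ to $f$. Then
$$\lim_{n \to \infty} \int_a^b f_n = \int_a^b \lim_{n \to \infty} f_n = \int_a^b f.$$
Proof. We already know that $f$ must be continuous on $[a, b]$. So we can integrate it. See Lectures, I. Using properties of
the integral from Lectures I, we find that
$$\left| \int_a^b f - \int_a^b f_n \right| = \left| \int_a^b (f - f_n) \right| \le \int_a^b |f - f_n| \le \int_a^b \|f - f_n\|_\infty = \|f - f_n\|_\infty (b - a).$$
The right hand side goes to 0 as $n \to \infty$ by the uniform convergence of $f_n$ to $f$.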
As a corollary, we see that it is legal to integrate a series of functions term-by-term on a finite interval if the series
converges uniformly on that interval.
Corollary 44 Suppose $g_n : [a, b] \to \mathbb{R}$ is continuous $\forall n$ and $\sum_{n=0}^{\infty} g_n$ converges uniformly on $[a, b]$ to $s$. Then
$$\sum_{n=0}^{\infty} \int_a^b g_n = \int_a^b \sum_{n=0}^{\infty} g_n = \int_a^b s.$$
Proof. Exercise.
The proof of the last theorem was essentially one line. But differentiation requires more effort, as well as more hypotheses.
Why should that be?
$$f_n'(x) = \sqrt{n} \cos(nx).$$
This derivative has problems converging at all, much less to 0. For example, it diverges at $x = 0$. In fact, it can be
shown to diverge everywhere. Suppose $x \ne 0$. You just need to see that if $|\cos(Nx)| < \frac{1}{2}$ for some $N$, then
$$|\cos(2Nx)| = \left| 2\cos^2(Nx) - 1 \right| = 1 - 2\cos^2(Nx) > \frac{1}{2}.$$
It follows that there is a subsequence satisfying $\left| \sqrt{n_k} \cos(n_k x) \right| \ge \frac{1}{2} \sqrt{n_k}$ and thus diverging.
Exercise. Find an example to show that we need uniform convergence of $f_n$ to $f$ in the hypothesis of the preceding
theorem. Pointwise convergence at each $x$ in $[a, b]$ does not suffice for the interchange of integral and limit.
Hint. Look at $f_n(x) = n^2 x (1 - x)^n$, on $[0, 1]$.
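To get a feel for the hint before writing up the exercise, one can compute with $f_n(x) = n^2 x (1 - x)^n$. The sketch below is a numerical experiment only, with made-up function names; it uses the closed form $\int_0^1 x(1-x)^n\,dx = \frac{1}{(n+1)(n+2)}$ (a Beta integral) and a midpoint-rule sum as an independent check.

```python
from fractions import Fraction

def f(n, x):
    """The function from the hint: n^2 * x * (1 - x)^n on [0, 1]."""
    return n**2 * x * (1 - x) ** n

def integral_exact(n):
    """Integral of f_n over [0, 1], using the closed form n^2 / ((n+1)(n+2))."""
    return Fraction(n**2, (n + 1) * (n + 2))

def integral_midpoint(n, pieces=20000):
    """Midpoint-rule approximation of the same integral, as an independent check."""
    h = 1.0 / pieces
    return h * sum(f(n, (i + 0.5) * h) for i in range(pieces))

print(f(200, 0.3))                       # tiny: the pointwise limit at x = 0.3 is 0
print(float(integral_exact(1000)))       # close to 1, not 0
print(integral_midpoint(50), float(integral_exact(50)))
```

So the pointwise limit is 0 at each fixed $x$, while the integrals do not go to 0; explaining why the convergence is not uniform is the point of the exercise.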
Theorem 45 (Interchange of Derivative and Limit). Suppose $\{f_n\}$ is a sequence of continuously differentiable
functions on $[a, b]$. Suppose, also, that the sequence of derivatives $\{f_n'\}$ converges uniformly to $g$ on $[a, b]$. And, finally,
assume there exists one point $x_0 \in [a, b]$ such that $f_n(x_0)$ converges.
Then
$$g(x) = \lim_{n \to \infty} \frac{d}{dx} f_n(x) = \frac{d}{dx} \lim_{n \to \infty} f_n(x),$$
and $f_n(x)$ converges uniformly to $f(x)$, with $f'(x) = g(x)$, for all $x \in [a, b]$.
Proof. By the fundamental theorem of calculus,
$$f_n(x) = \int_a^x f_n' + c_n. \qquad (3)$$
Here $c_n = f_n(a)$.
Let $x = x_0$ and take the limit of formula (3) as $n \to \infty$. This says that $c = \lim_{n \to \infty} c_n$.
Next take the limit of formula (3) for general $x \in [a, b]$ as $n \to \infty$ using the preceding Theorem (which we can since
the sequence of derivatives converges uniformly). This says, the sequence $\{f_n\}$ converges pointwise (i.e., for each fixed $x$ in
$[a, b]$) to
$$f(x) = \int_a^x g + c.$$
To finish the proof of this theorem, we just need to see $\{f_n\}$ converges uniformly to $f$. Well, try this, using our favorite
triangle inequality and properties of integrals from Lectures I,
$$|f_n(x) - f(x)| = \left| \int_a^x f_n' + c_n - \int_a^x g - c \right| \le \left| \int_a^x (f_n' - g) \right| + |c_n - c|$$
$$\le \int_a^x |f_n' - g| + |c_n - c| \le (b - a) \|f_n' - g\|_\infty + |c_n - c|.$$
So if you insist on giving me an $\varepsilon > 0$, I can certainly find an $N$ (depending only on $\varepsilon$ and not on $x$) such that $n > N$
implies $|f_n(x) - f(x)| < \varepsilon$. That is the meaning of uniform convergence.
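Corollary. Suppose $g_n : [a, b] \to \mathbb{R}$ is continuously differentiable, $\forall n$ and $\sum_{n=0}^{\infty} g_n'(x)$ converges uniformly on $[a, b]$ to
$s(x)$. And assume there is one point $x_0 \in [a, b]$ such that $\sum_{n=0}^{\infty} g_n(x_0)$ converges. Then
$$\frac{d}{dx} \sum_{n=0}^{\infty} g_n(x) = \sum_{n=0}^{\infty} \frac{d}{dx} g_n(x).$$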
xn+1
,
n+1
for n 0.
Suppose that $f(x) = \sum_{n=0}^{\infty} a_n x^n$ with radius of convergence $R$. Then for $x$ in $(-R, R)$, we have the following facts.
1) $\int_0^x f(t)\,dt = \sum_{n=0}^{\infty} a_n \frac{x^{n+1}}{n + 1}$.
2) $f'(x) = \sum_{n=1}^{\infty} n a_n x^{n-1}$.
The integrated and differentiated series have the same radius of convergence $R$.
Proof. 2) Note that if $R$ is its radius of convergence, a power series converges uniformly on closed subintervals of $(-R, R)$.
So we just have to convince ourselves that the extra factor of $n$ in $n a_n$ does not affect the radius of convergence.
To see this, note that if $0 < |x| < c < R$, we know by the definition of radius of convergence $R$ on the first page of this
part of the lectures, that $\sum_{n=0}^{\infty} |a_n| c^n < \infty$. Thus $\lim_{n \to \infty} a_n c^n = 0$. It follows that there is a bound $M$ such that $|a_n c^n| \le M$,
for all $n$.
Therefore we have:
$$n |a_n| |x|^{n-1} = \frac{n}{c} |a_n| c^n \left( \frac{|x|}{c} \right)^{n-1} \le \frac{nM}{c} \left( \frac{|x|}{c} \right)^{n-1}.$$
So $\sum_{n=1}^{\infty} n |a_n| |x|^{n-1}$ can be compared with $\frac{M}{c} \sum_{n=1}^{\infty} n \left( \frac{|x|}{c} \right)^{n-1}$. This series converges by the ratio test as the ratios are
$$\frac{(n + 1) \left( \frac{|x|}{c} \right)^{n}}{n \left( \frac{|x|}{c} \right)^{n-1}} = \frac{n + 1}{n} \frac{|x|}{c} \to \frac{|x|}{c} < 1, \text{ as } n \to \infty.$$
1) is left to the reader for Exercise.
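Term-by-term differentiation can be tested on the geometric series: differentiating $\sum x^n = \frac{1}{1-x}$ term by term should give $\sum n x^{n-1} = \frac{1}{(1-x)^2}$ for $|x| < 1$. The sketch below checks this numerically; the function name and the number of terms are assumptions for this illustration.

```python
def differentiated_geometric(x, num_terms):
    """Partial sum of sum_{n>=1} n * x^(n-1), the term-by-term derivative of sum x^n."""
    return sum(n * x ** (n - 1) for n in range(1, num_terms + 1))

for x in (0.5, -0.3, 0.9):
    print(x, differentiated_geometric(x, 2000), 1.0 / (1.0 - x) ** 2)
```

The extra factor $n$ slows the convergence near $|x| = 1$ (compare $x = 0.9$ with $x = 0.5$) but does not change the radius, as the proof shows.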
Examples. The power series for $e^x$, $\sin(x)$, $\cos(x)$ converge absolutely and uniformly on closed and bounded sets in the
real line. So they are differentiable and integrable term by term. The same holds if we replace $\mathbb{R}$ with $\mathbb{C}$ or $n \times n$ matrices.
18
Taylor Series
Recall Taylor's formula:
$$f(x) = \sum_{k=0}^{n-1} \frac{f^{(k)}(a)}{k!} (x - a)^k + R_n.$$
We had 2 formulas for the remainder. The easiest to remember is $R_n = \frac{f^{(n)}(c)}{n!} (x - a)^n$, for some $c$ between $a$ and $x$. If
one can show that $\lim_{n \to \infty} R_n = 0$, then the function $f(x)$ is represented by its Taylor series within the radius of convergence:
$$f(x) = \sum_{k=0}^{\infty} \frac{f^{(k)}(a)}{k!} (x - a)^k.$$
See Wikipedia (Taylor series article) for animated pictures showing the convergence of the Taylor series to $e^x$. We proved
last quarter that our favorite functions: $e^x$, $\sin(x)$, $\cos(x)$, $\log(1 - x)$ are represented by their Taylor series within the
radius of convergence interval. Of course we cheated and defined $e^x$ by its Taylor series. So we have:
$$e^x = \sum_{n=0}^{\infty} \frac{1}{n!} x^n, \text{ for all } x \in \mathbb{R};$$
$$\log(1 - x) = -\sum_{n=1}^{\infty} \frac{1}{n} x^n, \text{ for } |x| < 1.$$
Figure 14 shows the sum of the first 7 terms of the Taylor series for $e^x$, namely $f(x) = 1 + x + \frac{x^2}{2} + \frac{x^3}{6} + \frac{x^4}{24} + \frac{x^5}{120} + \frac{x^6}{720}$
in red and $e^x$ in blue on the interval $[-5, 5]$.
Figure 14: Plot of $e^x$ blue and 1st 7 terms of its Taylor series about 0 red.
Another example is the Binomial Series:
$$(1 + x)^\alpha = \sum_{n=0}^{\infty} \binom{\alpha}{n} x^n, \quad \text{where} \quad \binom{\alpha}{n} = \frac{\alpha (\alpha - 1) \cdots (\alpha - n + 1)}{n!}. \qquad (4)$$
As usual, define $0! = 1$. When $\alpha$ is a positive integer, this is the ordinary binomial theorem. Otherwise you need to restrict
$x$ to have $|x| < 1$. Why?
Exercise. Prove formula (4) using Taylor's formula from Lectures, I. Find the radius of convergence. Why is this the
binomial theorem when $\alpha$ is a positive integer?
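Before proving formula (4) it can help to test it numerically. The sketch below builds the generalized binomial coefficient $\binom{\alpha}{n} = \frac{\alpha(\alpha-1)\cdots(\alpha-n+1)}{n!}$ one factor at a time and compares partial sums of the series with $(1+x)^\alpha$; the function names and sample values of $\alpha$ and $x$ are assumptions for this illustration, and a numerical match is of course not a proof.

```python
def binom_coeff(alpha, n):
    """Generalized binomial coefficient alpha(alpha-1)...(alpha-n+1) / n!."""
    c = 1.0
    for k in range(n):
        c *= (alpha - k) / (k + 1)
    return c

def binomial_series(alpha, x, num_terms):
    """Partial sum of sum_{n=0}^{num_terms-1} binom(alpha, n) x^n."""
    return sum(binom_coeff(alpha, n) * x**n for n in range(num_terms))

print(binomial_series(0.5, 0.2, 60), 1.2 ** 0.5)   # non-integer exponent, |x| < 1
print(binomial_series(3, 2.0, 10), 3.0 ** 3)       # positive integer exponent: finite sum
```

For $\alpha = 3$ the coefficients vanish from $n = 4$ on, so the "series" is a polynomial and no restriction on $x$ is needed.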
We also saw earlier that there are functions $f(x)$ not represented by their Taylor series; e.g., $f(0) = 0$, and for $x \ne 0$,
define $f(x) = e^{-1/x^2}$. For this function, one can show that $f^{(n)}(0) = 0$, for all $n$. This means the Taylor series for $f(x)$
around $x = 0$ is 0 even though $f(x)$ itself is positive except at $x = 0$. For $x \ne 0$,
$$e^{-1/x^2} = f(x) \ne \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = 0.$$
Figure 15: The plot of $f(x) = e^{-1/x^2}$. This function is not represented by its Taylor series about 0 which is 0 for all $x$.
Part IV
Introduction
Recall our Lectures, I, Part 7. We assumed that for any finite interval $[a, b]$ there exists an integral $\int_a^b f$ of a continuous
function $f : [a, b] \to \mathbb{R}$. And we assumed the integral satisfies the 2 axioms:
INT 1. $m \le f(x) \le M$, $\forall x \in [a, b]$ $\Longrightarrow$ $m(b - a) \le \int_a^b f \le M(b - a)$.
INT 2. $a < c < b$ $\Longrightarrow$ $\int_a^b f = \int_a^c f + \int_c^b f$.
We deduced all the other basic facts about integrals from these 2 axioms; e.g., the fundamental theorem of calculus,
Taylor's formula.
However, we never showed that such an integral $\int_a^b f$ exists! I doubt that anyone was too worried. But now we
will finally prove the existence of the integral. We will in fact be able to integrate more general functions than just those
that are continuous on $[a, b]$.
Note: We should also have proved $\mathbb{R}$ exists. That is done by forming the completion of $\mathbb{Q}$ (the space of Cauchy sequences
$\{x_n\}$ of rationals mod the equivalence relation $\{x_n\} \sim \{y_n\}$ iff $\lim_{n \to \infty} |x_n - y_n| = 0$).
How will we create $\int_a^b f$? It will be somewhat reminiscent of calculus. Recall the Riemann sums, where you partition
the interval $[a, b]$ with points $a = a_0 < a_1 < \cdots < a_n = b$. Then you form rectangles the sum of whose areas approximates
the integral. In Figure 16 we approximate $\int_0^1 x^2\,dx$ as in calculus by dividing up the x-axis interval $[0, 1]$ into 5 equal parts.
The blue lines show the tops of the rectangles and also provide the graph of a step function approximating $f(x) = x^2$. Our
approximation for the integral is
$$.2 \left( (.2)^2 + (.4)^2 + (.6)^2 + (.8)^2 + 1^2 \right) = 0.44.$$
The actual integral is $\left. x^3/3 \right|_0^1 = 1/3 \cong .333333$.
Figure 16: The calculus way of approximating the integral $\int_0^1 x^2\,dx$. Divide the x-axis into 5 equal parts and take the value
of the function at the right- (or left-) hand end point of the subinterval for the height of the rectangle.
Our newfangled way of finding an approximation for the same integral $\int_0^1 x^2\,dx$ is illustrated in Figure 17. We need to
find a step function $s(x)$ such that $\|s - f\|_\infty = \text{l.u.b.}_{0 \le x \le 1} |s(x) - f(x)|$ is small. To do this, one should 1st divide up the y-axis
rather than the x-axis. Although in the end we divide up both axes.
(Figure 17: approximation of $\int_0^1 x^2\,dx$.)
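For $\int_0^1 x^2\,dx$, we divide the y-axis interval (also $[0, 1]$) into 5 equal subdivisions and go down to the x-axis via the inverse
function $\sqrt{x}$ to our original function $x^2$. Now our approximation for the integral is:
$$.2 \left( \sqrt{.2} - 0 \right) + .4 \left( \sqrt{.4} - \sqrt{.2} \right) + .6 \left( \sqrt{.6} - \sqrt{.4} \right) + .8 \left( \sqrt{.8} - \sqrt{.6} \right) + 1 \left( 1 - \sqrt{.8} \right) \cong 0.45026.$$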
Numerically this is not impressive. But it will give us a way of constructing integrals that throws a new light on the theory.
We will be able to get the Lebesgue integral simply by changing our norm from the $\|\cdot\|_\infty$ norm to the $\|\cdot\|_1$ norm.
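Both approximations of $\int_0^1 x^2\,dx$ are short computations. The sketch below reproduces the right-endpoint Riemann sum with 5 equal x-subintervals and the step-function sum obtained by dividing the y-axis into 5 equal parts and pulling the division points back through $\sqrt{y}$; the function names and the `pieces` parameter are made up for this illustration.

```python
import math

def x_axis_sum(pieces=5):
    """Right-endpoint Riemann sum for x^2 on [0, 1] with equal x-subintervals."""
    h = 1.0 / pieces
    return sum(h * (i * h) ** 2 for i in range(1, pieces + 1))

def y_axis_sum(pieces=5):
    """Divide the y-axis [0, 1] into equal parts y_i = i/pieces; the step function takes
    the value y_i on the x-interval (sqrt(y_{i-1}), sqrt(y_i))."""
    h = 1.0 / pieces
    return sum(i * h * (math.sqrt(i * h) - math.sqrt((i - 1) * h))
               for i in range(1, pieces + 1))

print(x_axis_sum())      # about 0.44
print(y_axis_sum())      # about 0.45026
print(y_axis_sum(1000))  # approaches 1/3 as the y-division gets finer
```

With the y-axis division, the step function differs from $x^2$ by at most `1/pieces` at every point, which is the sup-norm control the construction is after.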
Our method of integrating $x^2$ requires us to say the integral $\int_0^1 x^2\,dx$ is the limit as $n \to \infty$ of $\int_0^1 s_n(x)\,dx$ for a sequence
of step functions $s_n(x)$ such that $\lim_{n \to \infty} \|s_n(x) - x^2\|_\infty = 0$. More generally we write $St[a, b]$ = the space of step functions on
the interval $[a, b]$. We will be able to integrate functions $f$ on $[a, b]$ that are uniform limits of sequences of step functions
$s_n(x)$. These functions are in $\overline{St[a, b]}$ = the closure of the space of step functions with respect to the $\|\cdot\|_\infty$ norm. The official
name for $\overline{St[a, b]}$ is the space of regulated functions. Yes, you can Google it.
In the exercises, we give an example of a non-regulated function. This function can't decide whether it is 1, $-1$ or 0 on
any small interval containing 0. It does not have a right-hand limit at 0. The function $\frac{x \sin(1/x)}{|x \sin(1/x)|}$ is pictured in figure 18.
This is not quite the function in the exercises as it does not have a value when $x \sin(1/x) = 0$.
20
We will extend the integral from step functions to piecewise continuous functions and further. To do this, we need the
continuous linear extension theorem. Suppose that $F$ and $E$ are normed vector spaces. Assume that $E$ is complete; i.e.,
all Cauchy sequences in $E$ converge to a limit in $E$. We will write $\|\cdot\|$ for the norms on both $F$ and $E$. In our case $E$ = the
real numbers = $\mathbb{R}$ with norm the usual absolute value.
Suppose that $F_0$ is a subspace of $F$; i.e., a subset which is a vector space using the same operations of + and multiplication
by scalars as in $F$. In our case, $F_0$ will be $St[a, b]$, the step functions on the finite interval $[a, b]$.
Let $L : F_0 \to E$ be linear; i.e., $L(\alpha x + \beta y) = \alpha L(x) + \beta L(y)$, for all $x, y$ in $F_0$ and $\alpha, \beta \in \mathbb{R}$. In our case $L$ will be the
integral over a finite interval $[a, b]$.
Figure 18: The function $\frac{x \sin(1/x)}{|x \sin(1/x)|}$ on $(0, .2)$.
We also assume that $L$ is continuous for the norms on $E$ and $F_0$. We will use the norm $\|\cdot\|_\infty$ on $St[a, b]$; i.e.,
$\|f\|_\infty = \text{l.u.b.}_{x \in [a, b]} |f(x)|$.
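Lemma 47 Suppose $L : F \to E$ is a linear function, where $E, F$ are normed vector spaces. Then $L$ is continuous $\iff$ $\exists$
constant $C > 0$ such that $\|L(x)\| \le C \|x\|$, $\forall x \in F$.
Proof. $\Longrightarrow$ $L$ continuous implies it is continuous at $x = 0$. Thus, taking $\varepsilon = 1$, we see that $\exists \delta > 0$ such that $\|x\| < \delta$
implies $\|L(x)\| < 1$.
Now we use a trick coming from properties of the norm. If $x \ne 0$, (which we may assume without any problem since
that case is pretty clear as $L(0) = 0$), we have $\left\| \frac{\delta}{2} \frac{x}{\|x\|} \right\| = \frac{\delta}{2} < \delta$, so that $\left\| L\left( \frac{\delta}{2} \frac{x}{\|x\|} \right) \right\| < 1$. By linearity this says
$\|L(x)\| < \frac{2}{\delta} \|x\|$, and $C = \frac{2}{\delta}$ works.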
$$\lim_{n \to \infty} (\alpha x_n + \beta y_n) = \alpha x + \beta y.$$
This is an exercise in the properties of limits. It follows that $\alpha x + \beta y \in \overline{F_0}$. That means $\overline{F_0}$ is a vector space.
Theorem 49 Continuous Linear Extension Theorem
Assume $F$ = normed vector space, $F_0$ = subspace of $F$, $E$ = complete normed vector space, $L : F_0 \to E$ is continuous
linear. Then $L$ can be extended to a unique continuous linear function $\overline{L} : \overline{F_0} \to E$. Here $\overline{F_0}$ is the closure of $F_0$, as in
Lemma 48. If $C$ is the constant in Lemma 47 for $L$, then it also works for $\overline{L}$.
Proof. Suppose $x \in \overline{F_0}$. Then there is a sequence of points $\{x_n\}$ from $F_0$ such that
$$\lim_{n \to \infty} x_n = x.$$
We want to define
$$w = \overline{L}(x).$$
Claim 2. $w$ is unique and thus $w = \overline{L}(x)$ is a legal definition of a function.
Proof of Claim 2. Suppose we take another sequence $\{u_n\}$ from $F_0$ such that
$\lim_{n \to \infty} u_n = x$. Then
$$\lim_{n \to \infty} L(u_n) = v \in E.$$
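$$\left\| \overline{L}(x) \right\| = \left\| \lim_{n \to \infty} L(x_n) \right\| = \lim_{n \to \infty} \|L(x_n)\|.$$
We know that $\|L(x_n)\| \le C \|x_n\|$ $\forall n$. Since the limit preserves $\le$, we see (using the continuity of $\|\cdot\|$) that
$$\left\| \overline{L}(x) \right\| \le \lim_{n \to \infty} C \|x_n\| = C \|x\|.$$
Q.E.D. claim 4.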
This completes our proof of the continuous linear extension theorem - the hardest part of our construction of the integral.
Next we proceed to use the theorem.
Part V
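We assume $-\infty < a < b < \infty$. A step map or function is just what it says; i.e., a function whose graph looks like a bunch
of steps. More precisely, we define $f : [a, b] \to \mathbb{R}$ to be a step function if we can partition $[a, b]$ by $P = \{a = a_0 < a_1 < \cdots < a_n = b\}$
so that $f$ is constant on each open subinterval: $f(x) = w_i$, $\forall x \in (a_{i-1}, a_i)$, for $i = 1, ..., n$. The integral of $f$ with respect to $P$ is then
$$I(f, P) = \sum_{i=1}^{n} w_i (a_i - a_{i-1}).$$
The sum $\sum_{i=1}^{n} w_i (a_i - a_{i-1})$ should look like a Riemann sum. Here $w_i = f(c_i)$ for any choice of points $c_i \in (a_{i-1}, a_i)$.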
We define a partition $Q = \{b_0 < b_1 < \cdots < b_m\}$ to be a refinement of the partition $P = \{a_0 < a_1 < \cdots < a_n\}$ if the
set of points $\{a_0, a_1, \dots, a_n\}$ is contained in the set of points $\{b_0, b_1, \dots, b_m\}$.
So, for example, the points $P = \left\{ \frac{i}{n} \mid i = 0, ..., n \right\}$ form a partition (regular) of $[0, 1]$. The set $Q = \left\{ \frac{i}{2n} \mid i = 0, ..., 2n \right\}$
is a refinement of $P$.
Now we need to worry about the possibility that the integral of a step function might depend on what partition we choose.
Lemma 51 Suppose $f$ is a step function with respect to 2 partitions
$P = \{a_0 < a_1 < \cdots < a_n\}$ and $Q = \{b_0 < b_1 < \cdots < b_m\}$
of $[a, b]$. Then $I(f, P) = I(f, Q)$.
Proof. Step 1. The partitions $P$ and $Q$ have a common refinement $R$ whose subinterval endpoints are the union of the
endpoints from the partitions $P$ and $Q$; i.e., the points $\{a_0, a_1, \dots, a_n\} \cup \{b_0, b_1, \dots, b_m\}$. Call the common refinement
$R = \{a = r_0 < r_1 < \cdots < r_k = b\}$.
Step 2. We claim that $I(f, P) = I(f, R) = I(f, Q)$.
To see this, look at Figure 20.
This shows that if we look at any subinterval of partition $P$, like $(a_{i-1}, a_i)$ and further subdivide it by inserting a point
$c$; $a_{i-1} < c < a_i$, then since $f$ is the constant $w_i$ on the subinterval $(a_{i-1}, a_i)$, the contribution to $I(f, P)$ from $(a_{i-1}, a_i)$
is
$$w_i (a_i - a_{i-1}) = w_i (c - a_{i-1}) + w_i (a_i - c).$$
Figure 20: Inserting a partition point in the ith subinterval of $P$ replaces a term in $I(f, P) = \int_a^b f$ by the sum of 2 terms
representing the sums of the areas of the 2 rectangles pictured (if the function is $\ge 0$ on the ith subinterval of $P$ anyway)
but this does not change the final result.
The right hand side is the sum associated to the 2 subintervals $(a_{i-1}, c)$ and $(c, a_i)$ in our new partition refined by adding one
point to the ith subinterval. Keep doing this for as many points as you want to add to $P$ to get the refinement $R$. Hopefully
this convinces you that $I(f, P) = I(f, R)$. Similarly, since $R$ is also a refinement of $Q$, we see that $I(f, Q) = I(f, R)$. It
follows that $I(f, P) = I(f, Q)$.
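The refinement argument can be acted out in a few lines. The sketch below represents a step function by its partition points and its values on the open subintervals (a representation chosen for this illustration, with hypothetical names `step_integral` and `refine`), and checks that inserting a point does not change $I(f, P)$.

```python
def step_integral(partition, values):
    """I(f, P) = sum of w_i * (a_i - a_{i-1}), where values[i-1] = w_i is the value of the
    step function on the open interval (partition[i-1], partition[i])."""
    return sum(w * (right - left)
               for w, left, right in zip(values, partition, partition[1:]))

def refine(partition, values, c):
    """Insert the point c into the partition; the value on the split subinterval is repeated."""
    for i in range(1, len(partition)):
        if partition[i - 1] < c < partition[i]:
            return (partition[:i] + [c] + partition[i:],
                    values[:i] + [values[i - 1]] + values[i:])
    return partition, values  # c is already a partition point or lies outside [a, b]

P, w = [0, 1, 3], [2, 5]       # f = 2 on (0, 1) and f = 5 on (1, 3)
R, w_R = refine(P, w, 2)       # split (1, 3) at c = 2
print(step_integral(P, w), step_integral(R, w_R))
```

The values of the function at the partition points themselves never enter the sum, which matches the definition: only the constants $w_i$ on the open subintervals matter.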
Lemma 52 Suppose that $St[a, b]$ denotes the set of all step functions defined on the interval $[a, b]$. Define the integral $\int_a^b f$
for $f \in St[a, b]$ as in Definition 50. By Lemma 51, the integral is independent of the partition $P$ used to define the step
function $f$. The integral then has the following properties on step functions.
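Lemma 53 a) Integrals Preserve $\le$.
$f, g \in St[a, b]$ and $f \le g$ implies $\int_a^b f \le \int_a^b g$.
b) $\int_a^b f$ is a continuous linear map from $St[a, b]$ into $\mathbb{R}$ using the norm $\|\cdot\|_\infty$ on $St[a, b]$.
Proof. a) Take a partition $P$ with respect to which both $f$ and $g$ are step functions, with values $w_i \le v_i$ on $(a_{i-1}, a_i)$. Then
$$I(f, P) = \int_a^b f = \sum_{i=1}^{n} w_i (a_i - a_{i-1}) \le I(g, P) = \int_a^b g = \sum_{i=1}^{n} v_i (a_i - a_{i-1}).$$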
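b) Integral is Continuous Linear. Note that if $f$ is a step function with respect to the partition $P = \{a_0 < a_1 < \cdots < a_n\}$
of $[a, b]$ and for each $i = 1, ..., n$:
$$f(x) = w_i, \ \forall x \in (a_{i-1}, a_i),$$
then
$$I(f, P) = \int_a^b f = \sum_{i=1}^{n} w_i (a_i - a_{i-1}).$$
We have $\|f\|_\infty \ge \max_{i=1,...,n} |w_i|$. It follows, using the triangle inequality and the fact that the lengths of the subintervals of
$P$ add up to $b - a$, that
$$\left| \int_a^b f \right| \le \sum_{i=1}^{n} |w_i| (a_i - a_{i-1}) \le \sum_{i=1}^{n} \|f\|_\infty (a_i - a_{i-1}) = \|f\|_\infty \sum_{i=1}^{n} (a_i - a_{i-1}) = (b - a) \|f\|_\infty.$$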
Continuity at $f = 0$ follows from this inequality. Once we have proved linearity of the integral, the (uniform) continuity
will also follow since then we can say
$$\left| \int_a^b f - \int_a^b g \right| = \left| \int_a^b (f - g) \right| \le \int_a^b |f - g| \le (b - a) \|f - g\|_\infty.$$
For the last 2 inequalities, we are using the fact that the integral on step functions preserves inequalities. One quickly finds
a $\delta$ as a function of $\varepsilon$ and independent of $f$ and $g$.
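Linearity. Let $\alpha, \beta \in \mathbb{R}$. Given step function $f$ with corresponding partition $P$ and step function $g$ with partition $Q$, we
take a common refinement $R$ of $P$ and $Q$. Both $f$ and $g$ are step functions with respect to $R$. Let $R = \{a_0 < a_1 < \cdots < a_n\}$.
Suppose that for each $i = 1, ..., n$:
$$f(x) = w_i, \ \forall x \in (a_{i-1}, a_i),$$
$$g(x) = v_i, \ \forall x \in (a_{i-1}, a_i).$$
Then $\alpha f + \beta g$ is a step function for partition $R$ with values for each $i = 1, ..., n$:
$$(\alpha f + \beta g)(x) = \alpha w_i + \beta v_i, \ \forall x \in (a_{i-1}, a_i).$$
This shows that $St[a, b]$ is indeed a normed vector space with norm $\|\cdot\|_\infty$.
It follows from our definition of the integral on step functions that
$$I(\alpha f + \beta g, R) = \int_a^b (\alpha f + \beta g) = \sum_{i=1}^{n} (\alpha w_i + \beta v_i)(a_i - a_{i-1}) = \alpha \sum_{i=1}^{n} w_i (a_i - a_{i-1}) + \beta \sum_{i=1}^{n} v_i (a_i - a_{i-1}) = \alpha \int_a^b f + \beta \int_a^b g.$$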
22
The Integral on Step Functions satisfies the 2 Axioms for Integrals from
Lectures I.
Recall from Lectures I that we needed 2 axioms for integrals in order to prove the fundamental theorem of calculus. Now
we prove those axioms for step functions $f \in St[a, b]$.
Axiom 1. If $m \le f(x) \le M$ for all $x \in [a, b]$, then $m(b - a) \le \int_a^b f \le M(b - a)$.
Axiom 2. If $a < c < b$, then $\int_a^b f = \int_a^c f + \int_c^b f$.
Proof. Axiom 1 for Step Functions. Since $m \le f(x) \le M$ for all $x \in [a, b]$ and we have proved that the integral
preserves inequalities, we have
$$m(b - a) = \int_a^b m \le \int_a^b f \le \int_a^b M = M(b - a).$$
Proof. Axiom 2 for Step Functions. For $f$ a step function on $[a, b]$, we can always add a point like $c$ to our partition
$P = \{a_0 < a_1 < \cdots < a_n\}$ of $[a, b]$ defining $f$ and assume $c = a_j$ for some $j$. Suppose that for each $i = 1, ..., n$:
$$f(x) = w_i, \ \forall x \in (a_{i-1}, a_i).$$
Then according to our definition of the integral of a step function
$$I(f, P) = \int_a^b f = \sum_{i=1}^{n} w_i (a_i - a_{i-1}) = \sum_{i=1}^{j} w_i (a_i - a_{i-1}) + \sum_{i=j+1}^{n} w_i (a_i - a_{i-1}) = \int_a^c f + \int_c^b f.$$
This completes our proofs of the properties of the integral on step functions. In the next section we prove a result which
will help us with our scheme to integrate continuous functions.
23
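$$\lim_{n \to \infty,\ n \in J} x_n = x \in [a, b]$$
and (writing $y$ for the limit of $y_n$ along the same subsequence)
$$|x - y| = \lim_{n \to \infty,\ n \in J} |x_n - y_n| \le \lim_{n \to \infty,\ n \in J} \frac{1}{n} = 0,$$
which means $x = y$.
On the other hand, since $|f(x_n) - f(y_n)| \ge \varepsilon$ and $x = y$, we have
$$0 = |f(x) - f(y)| = \Big| \lim_{n \to \infty,\ n \in J} f(x_n) - \lim_{n \to \infty,\ n \in J} f(y_n) \Big| = \lim_{n \to \infty,\ n \in J} |f(x_n) - f(y_n)| \ge \varepsilon,$$
which is a contradiction.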
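$$|x - y| < \delta \quad \text{implies} \quad |f(x) - f(y)| < \varepsilon. \qquad (5)$$
Now let $n$ be so large that $\frac{b - a}{n} < \delta$ and let $P = \{a_0 < a_1 < \cdots < a_n\}$ be a partition of $[a, b]$ such that $a_i - a_{i-1} = \frac{b - a}{n} < \delta$.
Define the step function $s_n$ on the subinterval $(a_{i-1}, a_i)$ to take any value $f(c)$ for $c \in (a_{i-1}, a_i)$. Then by formula (5)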
24
We can use the continuous linear extension theorem to extend the integral from $St[a, b]$ to $\overline{St[a, b]}$. Every $f \in \overline{St[a, b]}$ is a
limit with respect to the $\|\cdot\|_\infty$ norm of a sequence $s_n \in St[a, b]$. Then the continuous linear extension theorem tells us to
define
$$\int_a^b f = \lim_{n \to \infty} \int_a^b s_n.$$
We need to show that this extended integral satisfies the same 2 axioms that the step functions were shown to satisfy.
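The definition $\int_a^b f = \lim \int_a^b s_n$ can be tried out on a continuous function. The sketch below builds a step function $s_n$ on a regular partition of $[a, b]$ into $n$ pieces, taking the value of $f$ at the midpoint of each piece (one allowed choice of a point $c$ in each subinterval), and integrates it with the step-function formula. The function names and the sample $f = \sin$ on $[0, \pi]$, whose integral is 2, are assumptions for this illustration.

```python
import math

def step_integral(partition, values):
    """Integral of a step function: sum of w_i * (a_i - a_{i-1})."""
    return sum(w * (right - left)
               for w, left, right in zip(values, partition, partition[1:]))

def step_approximation(f, a, b, n):
    """Regular partition of [a, b] into n pieces; the step function takes the value of f
    at the midpoint of each subinterval."""
    h = (b - a) / n
    partition = [a + i * h for i in range(n + 1)]
    values = [f(a + (i + 0.5) * h) for i in range(n)]
    return partition, values

for n in (5, 50, 1000):
    P, w = step_approximation(math.sin, 0.0, math.pi, n)
    print(n, step_integral(P, w))
```

The estimate $\left| \int_a^b f - \int_a^b s_n \right| \le (b - a)\|f - s_n\|_\infty$ explains the convergence: as the partition gets finer, uniform continuity of $f$ forces $\|f - s_n\|_\infty \to 0$.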
Theorem 56 Properties of the Integral on the Space $\overline{St[a, b]}$ of Regulated Functions (which includes all continuous, even piecewise continuous functions on $[a, b]$).
1) $\int_a^b f$ is a linear map from $f \in \overline{St[a, b]}$ into $\mathbb{R}$ and
$$\left| \int_a^b f \right| \le (b - a) \|f\|_\infty.$$
2) $\int_a^b f$ preserves inequalities; i.e., $f, g \in \overline{St[a, b]}$ with $f(x) \le g(x)$ $\forall x \in [a, b]$ implies
$$\int_a^b f \le \int_a^b g.$$
3) $a < c < b$ implies
$$\int_a^b f = \int_a^c f + \int_c^b f.$$
Proof. 1) This is just the continuous linear extension theorem from the last lecture.
2) Let $h(x) = g(x) - f(x)$ $\forall x \in [a, b]$. Then $h(x) \ge 0$ $\forall x \in [a, b]$. Suppose $s_n \in St[a, b]$ s.t.
$$\lim_{n \to \infty} \|s_n - h\|_\infty = 0.$$
Suppose $s_n$ is a step function for the partition $P = \{a_0 < a_1 < \cdots < a_m\}$ of $[a, b]$ and for $i = 1, ..., m$,
$$s_n(x) = w_i, \ \forall x \in (a_{i-1}, a_i).$$
If $w_i < 0$ for some $i$, we can replace $w_i$ by 0 and make a new step function $s_n^*$ which is even closer to $h$ than $s_n$ was
(because $h(x) \ge 0$ for all $x$). Now we see that
$$\int_a^b g - \int_a^b f = \int_a^b h = \lim_{n \to \infty} \int_a^b s_n^* \ge 0.$$
3) Let $s_n$ be a sequence of step maps converging to $f$ in the $\|\cdot\|_\infty$ norm on $[a, b]$. Then $s_n$ also converges to $f$ in
the $\|\cdot\|_\infty$ norm on $[a, c]$ and on $[c, b]$. We showed that for step functions we have:
$$\int_a^b s_n = \int_a^c s_n + \int_c^b s_n.$$
Take the limit as $n \to \infty$ to get property 3 using the basic properties of limits from last quarter.
This completes our discussion of the existence of an integral with the properties Int 1 and Int 2 from Lectures I. From this we deduced in Lectures I all the basic properties of integrals such as the fundamental theorem of calculus, integration by parts, the formula for substituting in an integral, and formulas like
$$\int_a^b x^n\,dx = \left.\frac{x^{n+1}}{n+1}\right|_{x=a}^{b} = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1},$$
when $n \ne -1$ and assuming that if $n < -1$, 0 is not in the interval [a, b].
To extend the fundamental theorem of calculus from continuous functions to piecewise continuous functions or regulated functions requires a little more effort. Lang does this in Undergraduate Analysis, Chapter 10 and produces theorems legalizing differentiation under the integral sign. In the last part of the book he extends the theory to integrals in several variables. I leave it to you to read these things.
Part VI
solved this problem with Taylor's formula and series. But Taylor polynomials do not give arbitrarily good approximations unless the function f has lots of derivatives. Moreover, we know that there are infinitely differentiable functions that are not represented by their Taylor series. See Figure 15. Thus we must find a new method to get polynomial approximations. At the same time we will create the mechanism to find other sorts of approximations that we will need when we discuss Fourier series in the last section.
25
You are familiar with the pointwise product of functions defined by $(f \cdot g)(x) = f(x)g(x)$. You just take the product of the real numbers f(x) and g(x). Thus if we define f(x) = 1 for all x, we get $(f \cdot g)(x) = g(x)$. So f(x) = 1 is the identity for pointwise product.
Now we want to define a new kind of product, well, not that new if you got to convolution in the Laplace transform section of ODEs courses. And not so new if you are taking probability or statistics courses where, given independent random variables with densities f and g, the density of the sum of the random variables is the convolution product $f * g$. For this product, $f * 1 \ne f$.
Our aim is to use convolution in order to uniformly approximate continuous functions f on [a, b] by polynomials $\sum_{n=0}^{N} a_n x^n$ (or continuous functions of period 1 by trigonometric polynomials $\sum_{n=-N}^{N} a_n e^{2\pi i n x}$ for Fourier series).
We will assume that our functions are piecewise continuous and that at least one of the functions in $f * g$ vanishes off a bounded interval so that we know the integrals exist without thinking too hard about the meaning of
$$\int_{-\infty}^{\infty} = \lim_{A \to -\infty}\ \lim_{B \to \infty} \int_A^B.$$
The convolution of f and g is
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t)\,g(x - t)\,dt.$$
One can read $f * g$ as "f splat g" since it does splat the properties of the 2 functions together, preserving the best properties. If one function is discontinuous, but the other is a polynomial, the convolution is a polynomial.
Example. Let $g(x) = 1 - x^2$, for all x. Define
$$f(x) = \begin{cases} 1, & \text{if } 0 < x < 1, \\ -1, & \text{if } -1 < x < 0, \\ 0, & \text{if } x = 0, \\ 0, & \text{if } x \ge 1 \text{ or } x \le -1. \end{cases}$$
See Figures 21 and 22.
Figure 22: The function $f(x) = 1$ if $0 < x < 1$; $f(x) = -1$ if $-1 < x < 0$; $f(x) = 0$ if $x = 0$; $f(x) = 0$ if $x \ge 1$ or $x \le -1$.
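Then, since f has differing formulas on different intervals (and mercifully g does not), we get
$$(f * g)(x) = -\int_{-1}^{0} g(x - t)\,dt + \int_{0}^{1} g(x - t)\,dt = -\int_{-1}^{0} \left(1 - (x - t)^2\right) dt + \int_{0}^{1} \left(1 - (x - t)^2\right) dt$$
$$= -\frac{2x^3}{3} + \frac{(x + 1)^3}{3} + \frac{(x - 1)^3}{3} = 2x.$$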
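As a quick sanity check on the example above, here is a minimal Python sketch that approximates the convolution integral with a midpoint rule and compares the result with 2x. The names `f`, `g`, `convolve_at` and the step count are illustrative choices, not part of the lecture.

```python
def g(x):
    # g(x) = 1 - x^2, as in the example
    return 1.0 - x * x


def f(t):
    # f = +1 on (0, 1), -1 on (-1, 0), and 0 elsewhere
    if 0.0 < t < 1.0:
        return 1.0
    if -1.0 < t < 0.0:
        return -1.0
    return 0.0


def convolve_at(x, steps=2000):
    # Midpoint rule for the integral of f(t) g(x - t) over [-1, 1],
    # the interval outside of which f vanishes. With an even number
    # of steps, the jump of f at t = 0 falls on a cell boundary.
    h = 2.0 / steps
    total = 0.0
    for k in range(steps):
        t = -1.0 + (k + 0.5) * h
        total += f(t) * g(x - t)
    return total * h


for x in (-2.0, -0.5, 0.0, 0.3, 1.7):
    print(x, convolve_at(x), 2 * x)
```

The two printed columns should agree to many decimal places, for every x and not only for x in [-1, 1], because g is a polynomial on the whole line.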
$$(f * g)(x) = \int_{-\infty}^{\infty} f(t)\,g(x - t)\,dt = \int_{-\infty}^{\infty} f(x - u)\,g(u)\,du = \int_{-\infty}^{\infty} g(u)\,f(x - u)\,du = (g * f)(x).$$
Now if $x > c + d$, and $-c \le t \le c$, we see that $x - t > c + d - c = d$ and $g(x - t) = 0$. So $x > c + d$ implies $(f * g)(x) = 0$.
If $x < -c - d$ and $-c \le t \le c$, we see that $x - t < -c - d + c = -d$ and $g(x - t) = 0$. Thus $x < -c - d$ implies $(f * g)(x) = 0$.
6) It suffices using 2) and 3) to consider the case that $g(x) = x^n$. Then supposing f(x) = 0 if |x| > c, we have the following, using the binomial theorem
$$(f * g)(x) = \int_{-c}^{c} f(t)\,g(x - t)\,dt = \int_{-c}^{c} f(t)\,(x - t)^n\,dt$$
$$= \int_{-c}^{c} f(t) \sum_{k=0}^{n} \binom{n}{k} x^{n-k} (-t)^k\,dt = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} \int_{-c}^{c} f(t)\,(-t)^k\,dt.$$
Let
$$\int_{-c}^{c} f(t)\,(-t)^k\,dt = c_k.$$
Then
$$(f * g)(x) = \sum_{k=0}^{n} \binom{n}{k} c_k\, x^{n-k},$$
which is a polynomial.
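The binomial formula above can be tried out on the step function f of the earlier example (f = 1 on (0,1), f = -1 on (-1,0), so c = 1). The following Python sketch is only an illustration under that assumption: for this particular f the moments $c_k$ have the closed form $(-1)^k(1-(-1)^k)/(k+1)$, which is what the hypothetical helper `moment` encodes, and `conv_poly_coeffs` then assembles the coefficients of $f * x^n$ exactly with rational arithmetic.

```python
from fractions import Fraction
from math import comb


def moment(k):
    # c_k = integral over [-1, 1] of f(t) (-t)^k dt for the step function f
    # of the example; this equals (-1)^k (1 - (-1)^k) / (k + 1).
    sign = -1 if k % 2 else 1
    return Fraction(sign * (1 - sign), k + 1)


def conv_poly_coeffs(n):
    # Coefficients of (f * x^n)(x) = sum_k C(n, k) c_k x^(n - k),
    # returned as a dict {exponent: coefficient}.
    return {n - k: comb(n, k) * moment(k) for k in range(n + 1)}


def eval_poly(coeffs, x):
    return sum(c * x**e for e, c in coeffs.items())


print(conv_poly_coeffs(2))  # f * x^2
print(conv_poly_coeffs(3))  # f * x^3
```

For n = 2 the only non-zero coefficient is -2 on x, so $f * x^2 = -2x$; since $f * 1 = 0$ for this f, that is consistent with $f * (1 - x^2) = 2x$ from the example.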
7) This is an exercise using a result legalizing interchange of derivative and integral which can be found in Lang, Undergraduate Analysis, p. 276, Theorem 7. The hypothesis requires the uniform continuity of $\frac{\partial}{\partial x}\left(f(t)\,g(x - t)\right)$.
This concludes our discussion of convolution. It is important in the theory of probability, Fourier analysis and Laplace transforms. In fact the Fourier and Laplace transforms change convolution into ordinary pointwise product of the transformed functions. Convolution is also used to smooth data thanks to property 7). We want to use it to approximate continuous functions by polynomials.
26
We want to think about something called the Dirac delta "function" denoted $\delta(x)$. This is not to be confused with our earlier use of the Greek letter $\delta$ to mean a small positive number. The Dirac delta function is used in physics to represent an impulse. Examples are a point mass and a point charge. It is often said to be a function that is 0 for $x \ne 0$ and $\infty$ at x = 0. At least that is what I remember as an undergrad taking physics classes for my minor. It used to drive me crazy, since as a mathematics major, I learned that such a function could not exist. Laurent Schwartz, whose theory of distributions (1944) legalized delta, describes in his autobiography how the formulas involving delta drove him crazy in 1935. See L. Schwartz, A Mathematician Grappling with his Century, p. 218. He also says: " ... it's a good thing that theoretical physicists do not wait for mathematical justification before going ahead with their theories!"
The graph usually associated with $\delta$ shows a unit arrow or spike at the origin - not a point at infinite height. See Figure 24.
Another "definition" of $\delta$ is that for any continuous function f(x) it is supposed to give the formula
$$\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt = f(0).$$
We show in the exercises that this formula forces $\delta(x) = 0$ for $x \ne 0$. But if delta were really a function this would force $\int_{-\infty}^{\infty} f(t)\,\delta(t)\,dt = 0$ and not f(0).
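$$\int_{-\infty}^{-\delta} K_n(x)\,dx + \int_{\delta}^{\infty} K_n(x)\,dx < \varepsilon.$$
The 3 properties say that the area under the curve $y = K_n(x)$ (which is 1) becomes more and more concentrated at the origin as $n \to \infty$. You might think it hard to find such a sequence but we have the following examples.
Example 1. Let K(x) be such that $K(x) \ge 0$ for all x and $\int_{-\infty}^{\infty} K(x)\,dx = 1$. Then $K_n(x) = nK(nx)$ is a Dirac sequence.
$$L_n(x) = \begin{cases} \dfrac{(1 - x^2)^n}{c_n}, & -1 \le x \le 1, \\ 0, & \text{otherwise,} \end{cases} \qquad \text{where } c_n = \int_{-1}^{1} (1 - t^2)^n\,dt.$$
$$c_n = \frac{(n!)^2\, 2^{2n+1}}{(2n)!\,(2n + 1)}.$$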
See Courant and Hilbert, Methods of Mathematical Physics, Vol. I, p. 84. Unless you know Stirling's formula, this is not too helpful in proving the Landau kernel $L_n$ gives a Dirac sequence. Lang gives a simple inequality which is all we need. In Figure 25, we plot $L_n(t) = (1 - t^2)^n \frac{(2n)!\,(2n+1)}{(n!)^2\, 2^{2n+1}}$, for $t \in [-1, 1]$, when n = 10 in blue; 30 in magenta; 60 in green; 90 in turquoise; 120 in purple. Figure 25 shows that the area under the curve (which is 1) begins to concentrate at the origin as n increases.
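Figure 25: $L_n(t) = (1 - t^2)^n \frac{(2n)!\,(2n+1)}{(n!)^2\, 2^{2n+1}}$, for n = 10, 30, 60, 90 and 120 (120 in purple).
Lemma. $c_n \ge \frac{2}{n+1}$.
Proof.
$$\frac{c_n}{2} = \int_0^1 (1 - t^2)^n\,dt = \int_0^1 (1 - t)^n (1 + t)^n\,dt \ge \int_0^1 (1 - t)^n\,dt = \frac{1}{n+1}.$$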
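The closed form for $c_n$ and the Lemma can both be checked on a computer. The sketch below is an illustration only: `c_closed` evaluates the closed form exactly with rational arithmetic, and `c_numeric` (a hypothetical helper using a midpoint rule with an arbitrary step count) approximates the defining integral.

```python
from fractions import Fraction
from math import factorial


def c_closed(n):
    # c_n = (n!)^2 2^(2n+1) / ((2n)! (2n+1)), evaluated exactly
    return Fraction(factorial(n) ** 2 * 2 ** (2 * n + 1),
                    factorial(2 * n) * (2 * n + 1))


def c_numeric(n, steps=20000):
    # Midpoint rule for the integral of (1 - t^2)^n over [-1, 1]
    h = 2.0 / steps
    return h * sum((1.0 - (-1.0 + (k + 0.5) * h) ** 2) ** n
                   for k in range(steps))


for n in (1, 5, 10):
    print(n, float(c_closed(n)), c_numeric(n), 2 / (n + 1))
```

In each printed row the first two numbers should agree closely and both should be at least as large as the third, which is the lower bound of the Lemma.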
It is clear that the Landau kernel has the first 2 properties of a Dirac sequence. To prove the 3rd property, we argue as in Lang, Undergraduate Analysis. Given $\varepsilon > 0$ and $0 < \delta < 1$, we need to find N so that $n \ge N$ makes the following integral $< \varepsilon$:
$$\frac{1}{c_n} \int_{\delta}^{1} (1 - t^2)^n\,dt \le \frac{n+1}{2} \int_{\delta}^{1} (1 - t^2)^n\,dt \le \frac{n+1}{2} \int_{\delta}^{1} (1 - \delta^2)^n\,dt = \frac{n+1}{2} (1 - \delta^2)^n (1 - \delta).$$
We can make the stuff on the right $< \varepsilon$ for large n, since we can show that because $0 < 1 - \delta^2 < 1$,
$$\lim_{n\to\infty} \frac{n+1}{2} (1 - \delta^2)^n = 0.$$
To see this, you could use l'Hôpital's rule or just remember the appropriate fact about exponentials (if c < 0, then $x e^{cx} \to 0$, as $x \to \infty$).
27
We want to prove that any Dirac sequence $K_n$ behaves like an identity for convolution in the limit as $n \to \infty$. Some people call Dirac sequences "approximate identities" for this reason.
Theorem 60 Suppose f is a bounded piecewise continuous function on $\mathbb{R}$. Let I be any finite interval on which f is continuous. Define $\|g\|_\infty = \max_{x \in I} |g(x)|$. Then $\lim_{n\to\infty} \|f * K_n - f\|_\infty = 0$. This says that the sequence $K_n * f$ converges uniformly to f on the interval I. Here the norm is taken on the interval I and we are assuming that the kernel vanishes off I.
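Proof. Using the 1st and 2nd properties of Dirac sequences plus the fact that integrals preserve $\le$, we have:
$$|(K_n * f)(x) - f(x)| = \left| \int K_n(t) f(x - t)\,dt - f(x) \int K_n(t)\,dt \right| = \left| \int K_n(t) \left(f(x - t) - f(x)\right) dt \right| \le \int K_n(t)\,|f(x - t) - f(x)|\,dt.$$
Since f is uniformly continuous on I by Theorem 54, given $\varepsilon$ there is a $\delta$ such that $|f(x - t) - f(x)| < \varepsilon$ when $|t| < \delta$. Since f is bounded, there is a bound M so that $|f(x)| \le M$ for all x.
Now we can break up the last integral
$$\int_{-\infty}^{\infty} K_n(t)\,|f(x - t) - f(x)|\,dt = \int_{-\infty}^{-\delta} K_n(t)\,|f(x - t) - f(x)|\,dt + \int_{\delta}^{\infty} K_n(t)\,|f(x - t) - f(x)|\,dt + \int_{-\delta}^{\delta} K_n(t)\,|f(x - t) - f(x)|\,dt.$$
For the first 2 integrals, use the bound on f and the 3rd property of Dirac sequences to see that, for large enough n, they are less than $2M\varepsilon$ or even $\varepsilon$, if you prefer.
For the last integral, use the uniform continuity of f to see that the integral is
$$\le \varepsilon \int_{-\delta}^{\delta} K_n(t)\,dt \le \varepsilon \int_{-\infty}^{\infty} K_n(t)\,dt = \varepsilon.$$
This completes the proof that $|(K_n * f)(x) - f(x)| \le (2M + 1)\varepsilon$. You can replace $\varepsilon$ by $\frac{\varepsilon}{2M+1}$.
Corollary 61 Weierstrass Theorem. Any continuous function on a finite closed interval can be uniformly approximated by polynomials.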
Proof. We can use the preceding theorem for the interval [0, 1] along with the Landau kernel. See Lang, Undergraduate Analysis, for a general interval. Suppose f is continuous on [0, 1] and vanishes outside [0, 1]. We still need to see that $L_n * f$ is a polynomial. Suppose $x \in [0, 1]$. Then
$$(L_n * f)(x) = \int_{-\infty}^{\infty} L_n(x - t) f(t)\,dt = \int_0^1 L_n(x - t) f(t)\,dt.$$
Also we see that for $x, t \in [0, 1]$, we have $-1 \le x - t \le 1$ and so $L_n(x - t) = \frac{1}{c_n}\left(1 - (x - t)^2\right)^n$ and the integral is
$$\frac{1}{c_n} \int_0^1 \left(1 - (x - t)^2\right)^n f(t)\,dt.$$
This is a polynomial using the same reasoning as in our proof in the 1st section that convolution of f with any polynomial is a polynomial.
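To see the Weierstrass approximation at work, here is a small numerical experiment in Python. It is a sketch under stated assumptions: the test function `tent` (continuous on the line, zero outside [0, 1]) is an arbitrary choice, the integral is approximated by a midpoint rule, and the sup norm is only sampled on a finite grid.

```python
from math import factorial


def landau_constant(n):
    # c_n = (n!)^2 2^(2n+1) / ((2n)! (2n+1)); Python's exact big-integer
    # division rounds to a float only at the very end
    return factorial(n) ** 2 * 2 ** (2 * n + 1) / (factorial(2 * n) * (2 * n + 1))


def tent(t):
    # A continuous function on the line that vanishes outside [0, 1]
    return max(0.0, 0.5 - abs(t - 0.5))


def landau_approx(f, n, x, steps=2000):
    # (L_n * f)(x) = (1/c_n) * integral over [0, 1] of (1 - (x - t)^2)^n f(t) dt
    h = 1.0 / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += (1.0 - (x - t) ** 2) ** n * f(t)
    return total * h / landau_constant(n)


def sup_error(f, n, grid=101):
    # Largest error |(L_n * f)(x) - f(x)| over a grid of points in [0, 1]
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(landau_approx(f, n, x) - f(x)) for x in points)


for n in (10, 50, 200):
    print(n, sup_error(tent, n))
```

The printed errors should shrink as n grows, though slowly; the corner of the tent at x = 1/2 is where the polynomial has the hardest time.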
Figure 26: The Gauss kernel $G_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$.
Figure 27: The unit step function U(x) = 1 if x > 0, U(x) = 0, otherwise.
One can show (exercise) that the Gauss kernel (or normal density) $G_\sigma(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{x^2}{2\sigma^2}}$ approaches the Dirac delta as $\sigma \to 0$. Figure 26 shows the kernel for $\sigma$ = 1/10, 1/20, 1/30, 1/40.
As an example, let's approach the Unit Step function in Figure 27.
Convolutions of the Gaussians from Figure 26 and the Unit Step function are shown in Figure 28.
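Figure 28: Convolutions of the Gaussians from Figure 26 with the unit step function, for $\sigma = \frac{1}{10}, \frac{1}{20}, \frac{1}{30}, \frac{1}{40}$.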
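The convolution of the Gauss kernel with the unit step function can be written down explicitly, since $(G_\sigma * U)(x) = \int_{-\infty}^{x} G_\sigma(t)\,dt$ is the normal distribution function evaluated at $x/\sigma$. The short Python sketch below uses that identity with the standard library error function; the name `smoothed_step` and the sample points are illustrative choices.

```python
import math


def smoothed_step(x, sigma):
    # (G_sigma * U)(x) = integral of G_sigma over (-infinity, x]
    #                  = (1/2) (1 + erf(x / (sigma sqrt(2))))
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))


for sigma in (1 / 10, 1 / 20, 1 / 30, 1 / 40):
    values = [round(smoothed_step(x, sigma), 4) for x in (-0.1, 0.0, 0.1)]
    print(sigma, values)
```

As $\sigma$ shrinks, the values to the left of 0 move toward 0 and those to the right move toward 1, while the value at the jump stays at 1/2.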
Part VII
Fourier Series
We said a bit about Fourier series in the introduction. Fourier's original paper was published in 1822. The publication was delayed by mathematicians such as Lagrange. See the book on Fourier by Grattan-Guinness and Ravetz. It may seem odd that people worried so much about Fourier expansions when they did not worry about Taylor expansions.
It will be slightly easier if we allow complex-valued functions like $f(x) = e^{2\pi i n x}$, where $i = \sqrt{-1}$. This means we need to allow vector spaces with complex scalars.
28
We want to look at vector spaces V with complex (or real) scalars. This means that V has all the axioms for addition and multiplication by scalars given earlier. Such a vector space is said to have a Hermitian scalar product $\langle v, w \rangle$ for $v, w \in V$ if it satisfies the following axioms.
Axioms for a Hermitian Scalar Product
For all $z, w, u \in V$ and $\alpha, \beta \in \mathbb{C}$ we have:
Axiom P1. $\langle z, w \rangle \in \mathbb{C}$ and $\langle z, w \rangle = \overline{\langle w, z \rangle}$. Here $\alpha = u + iv \in \mathbb{C}$ has complex conjugate $\overline{\alpha} = u - iv$.
Axiom P2. $\langle \alpha z + \beta w, u \rangle = \alpha \langle z, u \rangle + \beta \langle w, u \rangle$.
Axiom P3. $\langle z, z \rangle \ge 0$ and $\langle z, z \rangle = 0 \iff z = 0$.
Definition 62 As in the case of real scalars, we will say that 2 vectors $z, w \in V$ are orthogonal if $\langle z, w \rangle = 0$.
Theorem 63 If V is a vector space with complex scalars and $\langle z, w \rangle$ denotes an Hermitian scalar product on V, then $\|z\| = \sqrt{\langle z, z \rangle}$ gives a norm on V according to our usual axioms for norms on vector spaces.
Proof. We need to check the 3 norm axioms.
N 1. $\|v\| \ge 0$ $\forall v \in V$ and $\|v\| = 0$ if and only if $v = 0$.
N 2. $\|\alpha v\| = |\alpha|\,\|v\|$, $\forall v \in V$ and $\alpha \in \mathbb{C}$. Here $|\alpha|$ = complex absolute value of $\alpha$.
N 3. $\|u + v\| \le \|u\| + \|v\|$, $\forall u, v \in V$. Triangle Inequality.
The first follows from Property 3 of the scalar product. To see the second, note that if $\alpha \in \mathbb{C}$
v || w =
< v, w >2
w
w =
< v, w >2
w
z1
z2
=
z1
z2
.
$$\langle f, g \rangle = \int_0^1 f(x)\,\overline{g(x)}\,dx.$$
It is easy to integrate a complex valued function h(x) = u(x) + iv(x), with u(x) and v(x) real valued. The definition is
$$\int_a^b h = \int_a^b (u + iv) = \int_a^b u + i \int_a^b v.$$
29
Complex Exponentials
Our Fourier series will involve complex exponentials because it would be twice as much work to write series of sines and cosines. We can define for $z \in \mathbb{C}$, the complex exponential by the same old Taylor series, except that now all the terms are complex:
$$e^z = \sum_{n=0}^{\infty} \frac{z^n}{n!}.$$
This series converges absolutely for every z, and uniformly on every bounded subset of the complex plane, by the complex version of the Weierstrass M-Test. The same proof that worked for real exponentials shows that $e^{z+w} = e^z e^w$. One also has $(e^z)^w = e^{zw}$ when w is an integer. And, as before, if y is real, $e^{iy} = \cos y + i \sin y$.
Complex conjugation $c(z) = \overline{z}$ is a continuous function satisfying $\overline{zw} = \overline{z}\,\overline{w}$. This means that if p(z) is a polynomial with real coefficients, then $\overline{p(z)} = p(\overline{z})$. Furthermore $\overline{e^z} = e^{\overline{z}}$. It is an exercise to prove the preceding statements.
The complex exponentials $e_n(x) = e^{2\pi i n x}$, $n = 0, \pm 1, \pm 2, \pm 3, \ldots$ form a set of mutually orthogonal functions in $C[0, 1] = \{f : [0, 1] \to \mathbb{C}\}$ using the scalar product defined in Example 2 of the last section. To see this, suppose $m \ne n$ and do the integral:
$$\langle e_n, e_m \rangle = \int_0^1 e^{2\pi i n x}\, \overline{e^{2\pi i m x}}\,dx = \int_0^1 e^{2\pi i (n - m) x}\,dx = \left. \frac{e^{2\pi i (n - m) x}}{2\pi i (n - m)} \right|_{x=0}^{1} = 0.$$
Here we are assuming a complex version of the fundamental theorem of calculus, but you can check that it works given our definitions of these integrals. It is certainly simpler to do things this way than to show the analogous properties of $\cos(2\pi n x)$ and $\sin(2\pi m x)$, using trigonometric identities. We also need to know that $1 = e^{2\pi i n}$, for any integer n.
If n = m, we find that
$$\|e_n\|^2 = \langle e_n, e_n \rangle = \int_0^1 e^{2\pi i n x}\, \overline{e^{2\pi i n x}}\,dx = \int_0^1 1\,dx = 1.$$
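The orthonormality relations can be confirmed numerically. The Python sketch below approximates the scalar product $\langle f, g \rangle = \int_0^1 f \overline{g}$ by a midpoint rule; the helper names `e` and `inner` and the step count are assumptions made for the illustration.

```python
import cmath


def e(n):
    # The complex exponential e_n(x) = exp(2 pi i n x), returned as a function
    return lambda x: cmath.exp(2j * cmath.pi * n * x)


def inner(f, g, steps=1000):
    # Midpoint rule for <f, g> = integral over [0, 1] of f(x) * conj(g(x)) dx
    h = 1.0 / steps
    return h * sum(f((k + 0.5) * h) * g((k + 0.5) * h).conjugate()
                   for k in range(steps))


print(abs(inner(e(2), e(5))))   # close to 0
print(abs(inner(e(3), e(3))))   # close to 1
```

Any pair of distinct integers n, m with |n - m| smaller than the step count should give a value that is zero up to rounding error.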
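$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}, \quad \text{with } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt. \tag{6}$$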
Fourier claimed to be able to expand "arbitrary" functions in his series. That will not be possible if we want the series in Formula (6) to converge pointwise or better yet uniformly. For applications we might want even better convergence in that we may want to be able to differentiate term-by-term. Generally we will find that the more derivatives f has, the faster the Fourier series will converge.
If you hate complex numbers and you want to keep everything real, use $e^{ix} = \cos x + i \sin x$ and write the Fourier series as
$$f(x) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left(a_n \cos(2\pi n x) + b_n \sin(2\pi n x)\right),$$
where for n = 1, 2, 3, .... (and also for n = 0 in the case of $a_n$)
$$a_n = 2\int_0^1 f(t) \cos(2\pi n t)\,dt \quad \text{and} \quad b_n = 2\int_0^1 f(t) \sin(2\pi n t)\,dt. \tag{7}$$
30
We do some infinite dimensional linear algebra here. You should be familiar with the finite dimensional version from calculus. The set-up is that V is a complex vector space with Hermitian scalar product $\langle v, w \rangle$. Given an infinite set $\{v_1, v_2, v_3, \ldots\}$ of pairwise orthogonal vectors in V, we want to express an arbitrary vector as an infinite sum
$$\sum_{n=1}^{\infty} \alpha_n v_n, \quad \text{with scalars } \alpha_n \in \mathbb{C}.$$
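Theorem 65 Facts about Generalized Fourier Series in Infinite Dimensional Scalar Product Spaces
Suppose $\{v_1, v_2, v_3, \ldots\}$ are non-zero elements of V and pairwise orthogonal; i.e. $\langle v_n, v_m \rangle = 0$ if $n \ne m$.
Fact 1. Non-0 Orthogonal elements of V are linearly independent.
For any K, if there are scalars $\alpha_n \in \mathbb{C}$ such that $\sum_{n=1}^{K} \alpha_n v_n = 0$, then $\alpha_n = 0$ for all n = 1, 2, ..., K. This says that the vectors $v_1, \ldots, v_K$ are linearly independent.
Fact 2. If $z = \sum_{n=1}^{\infty} \alpha_n v_n$, with $\alpha_n \in \mathbb{C}$, then
$$\alpha_j = \frac{\langle z, v_j \rangle}{\langle v_j, v_j \rangle}.$$
Fact 3. Bessel's Inequality. Assume that the set is orthonormal, meaning that $\|v_n\| = 1$, for all n. Then, with $\alpha_n = \langle z, v_n \rangle$, we have Bessel's inequality
$$\sum_{n=1}^{\infty} |\alpha_n|^2 \le \|z\|^2.$$
Fact 4. Parseval's Equality. Assume that the set is orthonormal and that z has an expression $z = \sum_{n=1}^{\infty} \alpha_n v_n$, with $\alpha_n \in \mathbb{C}$. Then we have Parseval's equality:
$$\sum_{n=1}^{\infty} |\alpha_n|^2 = \|z\|^2.$$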
Proof. Fact 1. If $\sum_{n=1}^{K} \alpha_n v_n = 0$, then for each j = 1, ..., K,
$$0 = \left\langle \sum_{n=1}^{K} \alpha_n v_n, v_j \right\rangle = \sum_{n=1}^{K} \alpha_n \langle v_n, v_j \rangle = \alpha_j \langle v_j, v_j \rangle.$$
The 1st equality holds since $\langle 0, w \rangle = 0$ for any vector w. The 2nd holds by the linearity of the scalar product in the 1st variable. The last equality holds by the pairwise orthogonality of the vectors $v_n$. Since $v_j \ne 0$, it follows that $\alpha_j = 0$.
Fact 2. Look at the scalar product $\langle z, v_j \rangle$:
$$\langle z, v_j \rangle = \left\langle \sum_{n=1}^{\infty} \alpha_n v_n, v_j \right\rangle = \sum_{n=1}^{\infty} \alpha_n \langle v_n, v_j \rangle = \alpha_j \langle v_j, v_j \rangle.$$
We can move limits such as infinite sums in and out of the scalar product $\langle v, w \rangle$ because the scalar product is continuous in v holding w fixed (exercise using the Cauchy-Schwarz inequality). Again, we have used the linearity of the scalar product in the 1st variable and the pairwise orthogonality of the vectors $v_n$. Solve for $\alpha_j$ to finish the proof.
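$$\langle z, z \rangle = \left\langle \sum_{n=1}^{\infty} \alpha_n v_n, \sum_{m=1}^{\infty} \alpha_m v_m \right\rangle = \sum_{n=1}^{\infty} \sum_{m=1}^{\infty} \alpha_n \overline{\alpha_m}\, \langle v_n, v_m \rangle = \sum_{n=1}^{\infty} |\alpha_n|^2.$$
Here we used the continuity and linearity of the scalar product in each variable holding the other variable fixed (conjugate linearity in the case of the 2nd variable) as well as the pairwise orthogonality of the $v_n$ plus the fact that $\|v_n\| = 1$, $\forall n$.
We leave the proof of Fact 3 as an exercise.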
Example. Let $V = C[0, 1] = \{f : [0, 1] \to \mathbb{C} \mid f \text{ is continuous}\}$. Note that we can identify this space with $C[-\frac{1}{2}, \frac{1}{2}]$ or C(I) for any other closed interval I of length 1.
Define the scalar product as $\langle f, g \rangle = \int_0^1 f(t)\,\overline{g(t)}\,dt$. We have an orthonormal set $S = \{e_n(t) = e^{2\pi i n t} \mid n \in \mathbb{Z}\}$. The norm induced on this space by the scalar product is the $L^2$ norm given by
$$\|f\|_2 = \sqrt{\int_0^1 |f(t)|^2\,dt}.$$
Thanks to the preceding theorem, once we have proved that S is a complete orthonormal set, then we will know that any function $f \in C[0, 1]$ has a Fourier series converging in the $L^2$ norm. That is
$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}, \quad \text{with } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt.$$
For the applications to partial differential equations that we want to discuss, we will need the Fourier series to converge pointwise at each x, along with all the necessary derivatives. We will need extra hypotheses on f to ensure such nice convergence.
You can also view the Kth partial sum of the generalized Fourier series for z in Theorem 65 as giving the best approximation to z in the subspace of sums $\sum_{n=1}^{K} \beta_n v_n$, $\beta_n \in \mathbb{C}$, spanned by the $v_n$'s. Here "best" means that the mean square error $\left\| z - \sum_{n=1}^{K} \beta_n v_n \right\|$ is smallest when $\beta_j = \alpha_j = \langle z, v_j \rangle$. Another way to say this is to say that the Fourier coefficients give the best least squares fit to z. We leave the proof of this as an exercise for the reader.
31
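1) $z = \sum_{n=1}^{\infty} \alpha_n v_n$, where $\alpha_n = \langle z, v_n \rangle$. Here convergence of the series means convergence with respect to the norm $\|w\| = \sqrt{\langle w, w \rangle}$.
2) Parseval's Equality holds: $\sum_{n=1}^{\infty} |\alpha_n|^2 = \|z\|^2$ if $\alpha_n = \langle z, v_n \rangle$.
Exercise. Prove the preceding theorem. Hint: You need only show 1) $\Longrightarrow$ 2) $\Longrightarrow$ 3) $\Longrightarrow$ 1).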
In order to prove that our favorite sequence $\{e^{2\pi i n x}, n \in \mathbb{Z}\}$ is complete for the scalar product space C[0, 1], we will need to use the theory of convolution and Dirac sequences studied earlier. But now we need different kernels from the Landau kernel or the Gauss kernel.
Definition 68 The nth partial sum of the Fourier series of $f \in C[0, 1]$ is defined to be
$$s_n(x) = \sum_{k=-n}^{n} \alpha_k e^{2\pi i k x}, \quad \text{with } \alpha_k = \langle f, e_k \rangle = \int_0^1 f(t)\, e^{-2\pi i k t}\,dt.$$
Note that the nth partial sum is always taken to be the symmetric sum of terms from $-n$ to $+n$. There are 2n + 1 terms in the nth partial sum.
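Definition 69 The Dirichlet kernel is defined to be
$$D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x}.$$
The Fejér kernel is the average
$$K_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} D_k(x).$$
The convolution of f and the Fejér kernel is the average of the first n partial sums of the Fourier series of f; i.e.
$$f * K_n = \frac{1}{n} \{s_0 + s_1 + \cdots + s_{n-1}\}.$$
Property 2.
$$D_n(x) = \frac{\sin((2n + 1)\pi x)}{\sin(\pi x)}; \qquad K_n(x) = \frac{1}{n} \left( \frac{\sin(n\pi x)}{\sin(\pi x)} \right)^2.$$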
Proof. Property 1.
$$(f * D_n)(x) = \int_{-1/2}^{1/2} f(t) \sum_{k=-n}^{n} e^{2\pi i k (x - t)}\,dt = \sum_{k=-n}^{n} e^{2\pi i k x} \int_{-1/2}^{1/2} f(t)\, e^{-2\pi i k t}\,dt.$$
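$$D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = e^{-2\pi i n x} \sum_{k=-n}^{n} e^{2\pi i (k + n) x} = e^{-2\pi i n x} \sum_{j=0}^{2n} e^{2\pi i j x}$$
$$= e^{-2\pi i n x}\, \frac{e^{(2n+1) 2\pi i x} - 1}{e^{2\pi i x} - 1} = e^{-2\pi i n x}\, \frac{e^{(2n+1)\pi i x}}{e^{\pi i x}} \cdot \frac{e^{(2n+1)\pi i x} - e^{-(2n+1)\pi i x}}{e^{\pi i x} - e^{-\pi i x}} = \frac{\sin((2n + 1)\pi x)}{\sin(\pi x)}.$$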
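Both closed forms in Property 2 are easy to test numerically. In the Python sketch below, the function names are illustrative; `dirichlet_sum` adds up the exponentials from the definition, and `fejer_avg` averages the first n Dirichlet kernels.

```python
import cmath
import math


def dirichlet_sum(n, x):
    # D_n(x) as the sum of exp(2 pi i k x) for k = -n, ..., n (the sum is real)
    return sum(cmath.exp(2j * math.pi * k * x) for k in range(-n, n + 1)).real


def dirichlet_closed(n, x):
    return math.sin((2 * n + 1) * math.pi * x) / math.sin(math.pi * x)


def fejer_avg(n, x):
    # K_n(x) as the average of D_0, ..., D_{n-1}
    return sum(dirichlet_sum(k, x) for k in range(n)) / n


def fejer_closed(n, x):
    return (math.sin(n * math.pi * x) / math.sin(math.pi * x)) ** 2 / n


x = 0.13
print(dirichlet_sum(7, x), dirichlet_closed(7, x))
print(fejer_avg(7, x), fejer_closed(7, x))
```

The closed forms need sin(πx) ≠ 0, so x must not be an integer; at integers the sums give D_n = 2n + 1 and K_n = n directly.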
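Figure: The Dirichlet kernel $D_{10}(x) = \frac{\sin((2 \cdot 10 + 1)\pi x)}{\sin(\pi x)}$.
$$K_n(x) = \frac{1}{n} \sum_{k=0}^{n-1} \sum_{j=-k}^{k} e^{2\pi i j x} = \frac{1}{n} \left( \frac{\sin(n\pi x)}{\sin(\pi x)} \right)^2,$$
$$\int_{-1/2}^{-\delta} K_n(x)\,dx + \int_{\delta}^{1/2} K_n(x)\,dx < \varepsilon.$$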
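Dir 1 is clear. We leave Dir 2 to the reader as an exercise. That leaves Dir 3. Assuming everything vanishes outside $[-\frac{1}{2}, \frac{1}{2}]$, we just need to note that
$$\frac{1}{n} \int_{\delta}^{1/2} \left( \frac{\sin(n\pi x)}{\sin(\pi x)} \right)^2 dx \le \frac{1}{n} \int_{\delta}^{1/2} \frac{1}{\sin^2(\pi x)}\,dx \to 0, \quad \text{as } n \to \infty.$$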
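Figure: The Dirichlet kernels $D_n(x) = \frac{\sin((2n+1)\pi x)}{\sin(\pi x)}$, n = 1, 2, ..., 10.
Figure: The Fejér kernel $K_{10}(x) = \frac{1}{10} \left( \frac{\sin(10\pi x)}{\sin(\pi x)} \right)^2$.
Figure: The Fejér kernels $K_n(x) = \frac{1}{n} \left( \frac{\sin(n\pi x)}{\sin(\pi x)} \right)^2$, n = 1, 2, ..., 10.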
Theorem 73 (Fejér's Theorem, 1904) Suppose that $f \in C[0, 1]$ and f(0) = f(1). Then if $K_n$ denotes the Fejér kernel, $f * K_n$ approaches f uniformly on [0, 1] as $n \to \infty$.
This means $\lim_{n\to\infty} \|f * K_n - f\|_\infty = 0$. Here, for bounded functions g on [0, 1], we define $\|g\|_\infty = \mathrm{lub}\{|g(x)| \mid x \in [0, 1]\}$.
One says that the Fourier series of f is Cesàro summable to f where f is continuous.
Proof. This follows from the fact that $K_n$ is a Dirac sequence of positive type by Theorem 60.
It follows from Fejér's Theorem 73 that if $f \in C[0, 1]$ and f(0) = f(1) then f can be uniformly approximated by trigonometric polynomials of the form
$$\sum_{j=-k}^{k} \beta_j e^{2\pi i j x}, \quad \text{for } \beta_j \in \mathbb{C}.$$
For
$$(f * K_n)(x) = \frac{1}{n} \sum_{k=0}^{n-1} \sum_{j=-k}^{k} \alpha_j e^{2\pi i j x}, \quad \text{where } \alpha_j = \int_0^1 f(x)\, e^{-2\pi i j x}\,dx.$$
Corollary 74 The set $\{e^{2\pi i n x} \mid n \in \mathbb{Z}\}$ is a complete orthonormal set for C[0, 1].
Proof. If $w \in C[0, 1]$ is orthogonal to $e^{2\pi i n x}$ for all n, it follows from Fejér's theorem that w must be 0.
Uniform convergence implies $L^2$-convergence. Thus, if $f \in C[0, 1]$ satisfies f(0) = f(1), then it can be uniformly approximated by trigonometric polynomials which implies that it can be approximated in $L^2$-norm by trigonometric polynomials.
But this means
$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}, \quad \text{with } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt. \tag{8}$$
Warning: When we say that the convergence of the Fourier series in formula (8) is in the $L^2$-norm, this does not suffice usually for what we are thinking is happening. We always hope to have uniform convergence, but we know that convergence with respect to $\|\cdot\|_\infty$ is harder to achieve.
Maybe we should write some other symbol than = as we have not claimed that the Fourier series converges for every point $x \in [0, 1]$. That is false.
Note that we need not assume that f has period 1 at least if all we want is $\|\cdot\|_2$ norm convergence, since it is possible to replace any $f \in C[0, 1]$ with a $g \in C[0, 1]$ such that g(0) = g(1) and such that $\|g - f\|_2$ is arbitrarily small. This is an exercise.
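The following Corollaries follow from Theorem 65 and Corollary 74.
Corollary 75 (Parseval's Equality for Fourier Series) Suppose that f(x) is piecewise continuous on [0, 1]. Then
$$\int_0^1 |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |\alpha_n|^2, \quad \text{where } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt.$$
$$\int_0^1 f(x)\,\overline{g(x)}\,dx = \sum_{n=-\infty}^{\infty} \alpha_n \overline{\beta_n}, \quad \text{if } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt \ \text{ and } \ \beta_n = \langle g, e_n \rangle = \int_0^1 g(t)\, e^{-2\pi i n t}\,dt.$$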
32
For the applications we need pointwise convergence of the Fourier series and even the ability to differentiate the Fourier series term-by-term. When is this legal? To figure such things out, consider the Dirichlet kernel defined in the last section:
$$D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = \frac{\sin((2n + 1)\pi x)}{\sin(\pi x)}.$$
Recall that we showed in the last section that the convolution of f with the Dirichlet kernel is the nth partial sum of the Fourier series of f when the period interval has length 1:
$$f * D_n = s_n = \sum_{k=-n}^{n} e^{2\pi i k x} \alpha_k, \quad \text{where } \alpha_k = \int_{-1/2}^{1/2} f(t)\, e^{-2\pi i k t}\,dt.$$
However, the Dirichlet kernel is not as nice as the Fejér kernel because it is not positive.
Before proving some theorems, let's look at a few examples.
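Example 1. The step function
$$f(x) = \begin{cases} 1, & 0 < x \le \pi, \\ -1, & -\pi \le x < 0. \end{cases}$$
Make this periodic of period $2\pi$ on the real line.
You need to make the change of variables $x = 2\pi y$ here. This changes the Fourier series and coefficients to
$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{i n x}, \quad \text{with } \alpha_n = \langle f, e_n \rangle = \frac{1}{2\pi} \int_{-\pi}^{\pi} f(t)\, e^{-i n t}\,dt.$$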
1
2
i
1 + ein + ein 1 .
2n
int
1
dt +
2
int
e
0
1
dt =
2
0
eint
eint
+
in
in 0
4(1)n sin(nx)
.
n
Figure 33 shows a plot of the function f (x) at the top left, then it shows the nth partial sums of the Fourier series for
n = 10, 50, 100.
n einx + n einx =
1,
0 < x ,
, is shown at top left. The nth partial sums of the Fourier series of
1, x < 0.
f for n = 10, 50, 100 are shown in the next 3 plots. The Gibbs phenomenon is revealed.
Figure 33: The function f (x) =
This reveals the Gibbs phenomenon which arises because the function has a jump discontinuity at 0. This sort of thing always happens at a jump discontinuity. It does not help to take more terms of the Fourier series. There will always be an overshoot of about 9% of the size of the jump. See Dym and McKean, Fourier Series and Integrals for more details.
The phenomenon had been noted before Gibbs by various people. Gibbs explained the phenomenon, replying to Michelson (in a letter to Nature in 1898). Michelson had become angry that his machine for computing the 1st 80 terms in a Fourier series gave a result that was not close enough to the function near the jumps.
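The overshoot can be measured directly. The Python sketch below assumes the real form of the series for the step function of Example 1, namely the sum of $4\sin(nx)/(\pi n)$ over odd n, and samples a partial sum just to the right of the jump at 0; the function names, the sampling window and the sample count are illustrative choices.

```python
import math


def square_partial_sum(N, x):
    # Partial sum of the Fourier series of the step function of Example 1:
    # only odd n contribute, each with the term 4 sin(n x) / (pi n)
    return sum(4.0 * math.sin(n * x) / (math.pi * n) for n in range(1, N + 1, 2))


def peak_near_jump(N, samples=5000, width=0.5):
    # Largest sampled value of the partial sum on (0, width]
    return max(square_partial_sum(N, width * (k + 1) / samples)
               for k in range(samples))


for N in (9, 99, 399):
    print(N, peak_near_jump(N))
```

The peak should stay near 1.18 instead of settling down to 1 as N grows. Measured against the jump of size 2, an excess of about 0.18 is the overshoot of roughly 9% mentioned above.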
The next example shows that convergence is very nice when there are no jumps.
Example 2. Consider the function f(x) = |x| for $|x| \le \pi$. Make this periodic of period $2\pi$ on the real line. The graphs in Figure 34 show the function plus the 10th, 50th and 100th partial sums of its Fourier series. The convergence is quite rapid.
Example 3. Let f(x) = x, if $x \in [0, 1)$ and make it periodic of period 1 elsewhere. This produces a sawtooth graph. See Figure 35.
Figure 34: Define f(x) = |x| for $|x| \le \pi$. Make this periodic of period $2\pi$ on the real line. The graphs show at top left f(x) on the period interval, then the 10th, 50th and 100th partial sums of its Fourier series.
Figure 35: The sawtooth function - a function of period 1 defined by f(x) = x, for $x \in [0, 1)$.
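One finds that the Fourier coefficients of the sawtooth function are
$$\alpha_n = \int_0^1 x\, e^{-2\pi i n x}\,dx = \begin{cases} \frac{1}{2}, & n = 0, \\ -\frac{1}{2\pi i n}, & n \ne 0. \end{cases}$$
Parseval's equality then says
$$\frac{1}{3} = \int_0^1 x^2\,dx = \sum_{n=-\infty}^{\infty} |\alpha_n|^2 = \frac{1}{4} + \frac{2}{4\pi^2} \sum_{n=1}^{\infty} \frac{1}{n^2},$$
so that
$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{\pi^2}{6} = \zeta(2).$$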
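The Parseval computation for the sawtooth can be checked with a few lines of Python. This is a sketch only: `parseval_rhs` truncates the sum of $|\alpha_n|^2$ at $|n| \le N$, and `sawtooth_coefficient` (a hypothetical helper using a midpoint rule) approximates one Fourier coefficient for comparison with $-1/(2\pi i n)$.

```python
import cmath
import math


def parseval_rhs(N):
    # |alpha_0|^2 plus the sum over 0 < |n| <= N of |alpha_n|^2,
    # using alpha_0 = 1/2 and |alpha_n| = 1 / (2 pi |n|)
    return 0.25 + 2.0 * sum(1.0 / (4.0 * math.pi ** 2 * n * n)
                            for n in range(1, N + 1))


def sawtooth_coefficient(n, steps=2000):
    # Midpoint rule for the integral over [0, 1] of x exp(-2 pi i n x) dx
    h = 1.0 / steps
    return h * sum((k + 0.5) * h * cmath.exp(-2j * math.pi * n * (k + 0.5) * h)
                   for k in range(steps))


print(parseval_rhs(100000), 1 / 3)
print(sawtooth_coefficient(3), -1 / (2j * math.pi * 3))
print(sum(1.0 / (n * n) for n in range(1, 100001)), math.pi ** 2 / 6)
```

Each printed pair should agree to several decimal places; the truncated sums approach their limits only at the rate 1/N.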
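The 1st step in showing the convergence of Fourier series of differentiable functions is the following Lemma which is also useful in quantum mechanics. We could weaken the hypothesis to assume only that the function f is Lebesgue integrable on [0, 1].
Lemma 77 Riemann-Lebesgue Lemma.
$$\lim_{n \to \pm\infty} \int_0^1 f(x)\, e^{-2\pi i n x}\,dx = 0.$$
$$\int_0^1 |f(x)|^2\,dx = \sum_{n=-\infty}^{\infty} |\alpha_n|^2$$
$$s_n = \sum_{k=-n}^{n} e^{2\pi i k x} \alpha_k, \quad \text{where } \alpha_k = \int_{-1/2}^{1/2} f(t)\, e^{-2\pi i k t}\,dt$$
$$\|f - s_n\|_\infty \to 0, \quad \text{as } n \to \infty.$$
Moreover there is a positive constant b such that
$$\|f - s_n\|_\infty \le b\, \frac{1}{n^{p - \frac{1}{2}}}.$$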
Proof. We will take our interval to be $[-\frac{1}{2}, \frac{1}{2}]$. Since our functions have period 1, this is no problem. It is an exercise to show that if $D_n(x) = \sum_{k=-n}^{n} e^{2\pi i k x} = \frac{\sin((2n+1)\pi x)}{\sin(\pi x)}$ is the Dirichlet kernel, then
$$\int_{-1/2}^{1/2} D_n(t)\,dt = 1.$$
By the same trick that was used in proving Fejér's theorem, we have
$$(s_n - f)(x) = (D_n * f - f)(x) = \int_{-1/2}^{1/2} \left(f(x - y) - f(x)\right) D_n(y)\,dy = \int_{-1/2}^{1/2} \left[ \frac{f(x - y) - f(x)}{\sin(\pi y)} \right] \sin((2n + 1)\pi y)\,dy.$$
Next note that the quantity in brackets is continuous at y = 0 (the only place in the interval $[-\frac{1}{2}, \frac{1}{2}]$ where something could go wrong), since $f'(x)$ exists and is continuous. The reason is that
$$\frac{f(x - y) - f(x)}{\sin(\pi y)} = \frac{f(x - y) - f(x)}{y} \cdot \frac{y}{\sin(\pi y)},$$
which approaches $-f'(x)\,\frac{1}{\pi}$ as $y \to 0$. Then we can apply the Riemann-Lebesgue Lemma to see that $(s_n - f)(x) \to 0$, as $n \to \infty$.
How do we estimate $\|f - s_n\|_\infty$? To do this use the fact that we now know that we have pointwise convergence of the Fourier series; i.e.,
$$f(x) = \sum_{n=-\infty}^{\infty} \alpha_n e^{2\pi i n x}, \quad \text{with } \alpha_n = \langle f, e_n \rangle = \int_0^1 f(t)\, e^{-2\pi i n t}\,dt. \tag{9}$$
Now we obtain our estimate as follows making use of the Cauchy-Schwarz inequality for the scalar product space of sequences $\{\alpha_n\}_{n \in \mathbb{Z}}$, where $\langle \{\alpha_n\}, \{\beta_n\} \rangle = \sum_{n=-\infty}^{\infty} \alpha_n \overline{\beta_n}$ and $\|\{\alpha_n\}\|^2 = \sum_{n=-\infty}^{\infty} |\alpha_n|^2$. Cauchy-Schwarz says
f
|(s fn ) (x)|
2
2
2.
4 2
4 2
|k|
|k|
|k|>n
|k|>n
|k|>n
We know that
|k|>n
1
|k|2
1
n
Exercises II
Exercises 1
1) a) Suppose $\|x\|_E$ is a norm on a vector space E and $\|x\|_F$ is a norm on a vector space F. Show that you can make the Cartesian product $E \times F = \{(x,y) \mid x \in E \text{ and } y \in F\}$ into a vector space.
b) Show that the Cartesian product $E \times F$ has a norm given by $\|(x,y)\| = \|x\|_E + \|y\|_F$.
c) Consider the special case that $E = F = \mathbb{R}$ with $\|x\|_E = \|x\|_F = |x|$ = the ordinary absolute value of x.
Using the definition of norm in b), draw a picture of the region in the plane given by the set of points
$\{(x,y) \in \mathbb{R}^2 \mid \|(x,y) - (1,2)\| < 3\}$.
2) a) Suppose that E is an inner product space with inner product denoted by <x,y> for x,y in E.
Then use the norm $\|v\| = (\langle v,v \rangle)^{1/2}$. Prove that $\|v+w\|^2 + \|v-w\|^2 = 2\|v\|^2 + 2\|w\|^2$.
b) Let $E = \mathbb{R}^2$ and draw a picture to explain why the formula in part a) is called the parallelogram law.
3) Suppose that E=C[0,1], the space of continuous real-valued functions on the interval [0,1]. Show that $\|f\|_1 = \int_0^1 |f(x)|\,dx$ is a norm.
4) Show that if $\|f\|_2 = \left( \int_0^1 |f(x)|^2\,dx \right)^{1/2}$, then we have $\|f\|_1 \le \|f\|_2$. Hint. Use the Cauchy-Schwarz inequality. What happens to the norm $\|f\|_2$ if we replace the interval [0,1] with the interval [a,b]?
5) Let $E = \mathbb{R}^2$ using the usual inner product (i.e., the dot product) and the usual Euclidean norm. Under what conditions on 2 vectors x,y in $\mathbb{R}^2$ do we have equality in the Cauchy-Schwarz inequality $|\langle x,y \rangle| \le \|x\|\,\|y\|$?
Prove your answer.
6) Which of the following are norms on the 1-dimensional vector space $E = \mathbb{R}$? Give reasons for your answers.
a) |x| = ordinary absolute value of x in $\mathbb{R}$.
b) $x^2$.
c) |x|/(1+|x|)
d) 2|x|
e) $\|x\| = 1$ if $x \ne 0$ and $\|x\| = 0$ if x = 0
f) $x^3$.
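7) Which of the following give equivalent norms on C[a,b]? Why?
$$\|f\|_1 = \int_a^b |f(x)|\,dx, \qquad \|f\|_2 = \left( \int_a^b |f(x)|^2\,dx \right)^{1/2}, \qquad \|f\|_\infty = \max_{x \in [a,b]} |f(x)|.$$
$$\lim_{p \to \infty} \|x\|_p = \|x\|_\infty = \max\{|x_1|, |x_2|\}.$$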
Exercises 2
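1) Suppose that $v_n = \binom{x_n}{y_n}$ is a sequence of vectors in $\mathbb{R}^2$. Show that if we define limits for sequences of vectors in $\mathbb{R}^2$ using the norm $\|v\|$, then $\lim_{n\to\infty} v_n = L = \binom{a}{b}$ if and only if $\lim_{n\to\infty} x_n = a$ and $\lim_{n\to\infty} y_n = b$.
2) Consider the sequence of functions on [0,1] given by $f_n(x) = x^n$, for $0 \le x \le 1$.
a) Show that for each $x \in [0,1]$ we have $\lim_{n\to\infty} f_n(x) = f(x)$ exists (defining f(x) to be this limit). This is called a pointwise limit. Note that f(x) is only piecewise continuous on [0,1].
b) Show that $\|f - f_n\|_1 \to 0$ and $\|f - f_n\|_2 \to 0$ as $n \to \infty$, but $\|f - f_n\|_\infty$ does not $\to 0$ as $n \to \infty$.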
n =0
(1)
n =0
4) Consider the sequence of functions defined by $f_n(x) = 1/(x+n)$, for n = 1, 2, 3, 4, ..., with $x \in (0, \infty)$. We have 4 notions of convergence: pointwise for each fixed $x \in (0, \infty)$ and convergence with respect to the norms $\|\cdot\|_1$, $\|\cdot\|_2$ and $\|\cdot\|_\infty$.
Under which
Exercises 3
1) Suppose that <,> is a scalar product on a vector space V and suppose that we have a subset S of V with a vector a adherent to S. Let f and g be functions mapping S into V such that the two following limits exist: $\lim_{x \to a,\ x \in S} f(x) = L$ and $\lim_{x \to a,\ x \in S} g(x) = M$. Show that
$$\lim_{x \to a,\ x \in S} \langle f(x), g(x) \rangle = \langle L, M \rangle.$$
2) Define C[a,b] to be the space of continuous real valued functions on the finite interval [a,b]. Define the mapping $I : C[a,b] \to \mathbb{R}$ by $I(f) = \int_a^b f(x)\,dx$.
a) Is I continuous when we use the $\|\cdot\|_\infty$ norm on C[a,b] and ordinary absolute value as our norm on $\mathbb{R}$?
Why?
b) What if we use the $\|\cdot\|_1$ norm on C[a,b]? Why?
3) Consider the following functions on $\mathbb{R}^2$
2 x2 y
( y 2 x 2 )2
x
y
,
(
,
)
(0,0)
, ( x, y ) (0,0)
f ( x, y ) = x 2 + y 2
g ( x, y ) = x 4 + y 4
0,
( x, y ) = (0,0)
0,
( x, y ) = (0,0)
y 0 x 0
b) Do the repeated limits in part a) equal the limit as $(x,y) \to (0,0)$ using any of the favorite norms on $\mathbb{R}^2$?
That is, does $\lim_{(x,y) \to (0,0)} h(x,y) = L$?
4) Let $\ell^2$ denote the set of all sequences $a = \{a_n\}_{n \ge 1}$ of real numbers such that $\sum_{n=1}^{\infty} a_n^2$ converges.
a) Show that you can define addition and scalar multiplication to make $\ell^2$ a vector space.
b) Show that $\langle a, b \rangle = \sum_{n=1}^{\infty} a_n b_n$ has the properties of a scalar product. You will need to use the Cauchy-Schwarz inequality on the partial sums.
Exercises 4
1) State whether the following series converge and give a reason for your answer.
a)
n
n =0 2
n =1 n
b)
c)
2
n =1 n
n =1
( 1)n
.
n
n =1
d)
2
n
n =1
a)
n 3e n
n =0
n = 2 log n
b)
log n
n +1 x
n =1
uniformly on (0,1)?
xn
b) Show that the series nx converges uniformly for x [0,C].
n =1 3
Exercises 5
a)
nx
n =1
1 n
b) x
n =1 n
c)
n x
n
d)
n =1
2
n =1
xn .
2) What functions f(x) are represented by the power series in parts a), b) and d) of problem 1? Compute the power series for the derivatives $f'(x)$ of these 3 functions. For what values of x is it legal to differentiate term by term?
1 x .
4) For an integer n, the Bessel function $J_n(x)$ is defined by the power series
$$J_n(x) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,(n+k)!} \left( \frac{x}{2} \right)^{2k+n}.$$
Find the radius of convergence of this power series and then show that the Bessel function satisfies Bessel's equation
$$x^2 J_n''(x) + x J_n'(x) + (x^2 - n^2) J_n(x) = 0.$$
Exercises 6
A B.
x
2
x <
y
x
y = 2 x
y
y .
3) Let $f(x) = x^2$, for all $x \in [0,1]$. Find a step function s(x) on the interval [0,1] such that $\|f - s\|_\infty < 1/10$.
4) Define the function f(x) = x sin(1/x), when $x \ne 0$, and f(0) = 0. Define g(x) = 1 if x>0, g(x) = -1 if x<0, and g(0)=0. Show that h(x)=g(f(x)) is not the (uniform) limit of a sequence of step functions on the interval [0,1], using the norm $\|\cdot\|_\infty$.
5) Show that for 2 equivalent norms $\|\cdot\|$ and $\|\cdot\|'$ on a vector space V that the open sets, closed sets and continuous functions are the same no matter which of the 2 norms you use.
Exercises 7
1) a) A Mean Value Theorem. Suppose that $f, g : [a,b] \to \mathbb{R}$ are continuous on [a,b] and $g(x) \ge 0$ for all $x \in [a,b]$. Show that there exists $c \in [a,b]$ such that
$$\int_a^b f g = f(c) \int_a^b g.$$
b) Show that the result of problem 1 need not be true if g(x) can assume both positive and negative values on [a,b].
a
2) Define
f = f .
a
Show that
f = f + f
3) Let g:[a,b] be continuous and g(x) 0 for all x[a,b]. Show that if
g =0
a
x=0
1,
F ( x ) = x, 0 < x < 1
0,
x = 1.
f .
Exercises 8
1) a) Suppose that f(x)=1, for 0 ≤ x ≤ 1 and f(x)=0, otherwise. Compute (f * f)(x). Draw the graphs of
f and f * f.
b) Suppose that f(x) is as in part a). Define g(x) = 1-x^2, for all x. Compute f * g. Draw the graphs
of g and f * g.
2) Suppose that α is a real number and f,g,h are piecewise continuous functions such that h vanishes outside
the interval [-c,c]. Show that
a) (f+g) * h = f * h + g * h
b) (αg) * h = α(g * h).
c) Suppose that f,g,h are piecewise continuous functions that vanish outside an interval [-c,c]. Show
that (f * g) * h = f * (g * h).
3) (Delta is not a function). Suppose that there is a piecewise continuous function δ such that
$\int f(t)\,\delta(t)\,dt = f(0)$ for all continuous functions f which vanish outside an interval [-c,c] for some c.
Show that this implies δ(x)=0, for all x ≠ 0 and thus that
proved assuming only that δ is a Lebesgue integrable function on finite intervals. (In fact, δ is not an
ordinary function. It is a generalized function or distribution.)
4) Define the Gaussian by $G_t(x) = \frac{1}{\sqrt{4\pi t}}\, e^{-x^2/(4t)}$. Show that $G_{1/n}$, for n=1,2,3,....,
is a Dirac sequence.
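Before proving the Dirac sequence properties, it can help to watch them happen numerically. The following sketch assumes the Gaussian is the heat kernel exp(-x^2/(4t))/sqrt(4 pi t); the helper names, the integration window and the step count are hypothetical choices. It checks that the total mass of G_{1/n} is about 1 and that the mass outside [-delta, delta] shrinks as n grows.

```python
import math

def gaussian(t, x):
    """Assumed form of G_t(x): exp(-x^2/(4t)) / sqrt(4*pi*t)."""
    return math.exp(-x * x / (4 * t)) / math.sqrt(4 * math.pi * t)

def midpoint_integral(func, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))

def total_mass(n, half_width=10.0):
    """Approximate integral of G_{1/n} over [-half_width, half_width]."""
    t = 1.0 / n
    return midpoint_integral(lambda x: gaussian(t, x), -half_width, half_width)

def tail_mass(n, delta):
    """Mass of G_{1/n} outside [-delta, delta], via the complementary error function."""
    t = 1.0 / n
    return math.erfc(delta / (2 * math.sqrt(t)))

for n in (1, 10, 100):
    print(n, total_mass(n), tail_mass(n, 0.5))
```

The printed tail masses decrease toward 0 while the total mass stays near 1, which is exactly the behavior the definition of a Dirac sequence demands.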
Exercises 9
1) Compute the Fourier series of the function f(x)=x^2 for x in [0,1), extended to all real numbers to be periodic of period 1.
What does the Parseval identity say for this function?
$\int_0^1 f(x)\,x^n\,dx = 0$
0.
You will need to remember the Weierstrass theorem on uniform approximation here.
****************************************************************************************************
PRACTICE EXAM 1
****************************************************************************************************
1) Define and give an example:
a) norm;
b) scalar product;
c) equivalent norms;
d) limit of a sequence {vn} in a normed vector space;
e) Cauchy sequence in a normed vector space;
f) complete normed vector space;
g) closure of a set in a normed vector space;
h) $\lim_{x\to c} f(x) = L$ for a function f: D → W, with V ⊃ D ⊃ { x | 0<‖x-c‖<δ } for some δ>0.
2) True - False. Tell whether the following statements are true or false. Give a brief reason for your answer.
a) C[a,b] the space of continuous real-valued functions on an interval [a,b], is a finite dimensional vector
space.
b) For
v
v = 1 2,
v2
c) The function
v = v12 + v22
f ( x, y ) =
x2
, for (x,y)(0,0) and f(0,0)=0
x2 + y2
d) Suppose that fn is a sequence of continuous functions on the interval [a,b]. Suppose for every x in [a,b]
we have $\lim_{n\to\infty} f_n(x) = f(x)$. Then f(x) is continuous on [a,b].
3) State and prove the Cauchy-Schwarz inequality. What norm is being used in this inequality?
Would it still be true if we replaced that norm with some other one?
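4) Suppose that two norms on a vector space V are equivalent. Show that if {v_n} is a
sequence of vectors in V and L ∈ V, then
$\lim_{n\to\infty} v_n = L$ with respect to the first norm
if and only if
$\lim_{n\to\infty} v_n = L$ with respect to the second norm.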
5) Suppose that <v,w> denotes a scalar product of 2 vectors v,w in the vector space V.
Show that <v,w> is a continuous function of v, holding w fixed. Is it uniformly continuous?
7) Show that f: U → W, where U is a subset of a normed vector space V and W is a normed vector
space, is continuous at a point a in U if and only if
for every sequence v_n in U such that $\lim_{n\to\infty} v_n = a$
we have
$\lim_{n\to\infty} f(v_n) = f(a).$
8) The function I mapping C[a,b]=the space of continuous real valued functions on the interval [a,b]
into defined by I ( f ) =
absolute value on .
9) Show that the norms $\|\cdot\|_\infty$ and $\|\cdot\|_1$ on the space C[a,b]=the space of continuous real valued
functions on the interval [a,b] into R are not equivalent.
10) Explain how the picture below can be used to see the difference between $\|f-g\|_\infty$ and $\|f-g\|_1$ assuming that
f is the purple function which starts at the top and g is the blue function which starts at the bottom.
****************************************************************************************************
****************************************************************************************************
PRACTICE EXAM 2
****************************************************************************************************
1) Define and give an example:
a x
a) radius of convergence of
n =0
sin( n )
2
n
2n
d)
n =1
( 1) n
b)
n = 2 log( n )
1
a)
n = 2 n ( n 1)
e)
1
n( n 1)
n =2
2
n =1
c)
4) True - False. Tell whether the following statements are true or false. Give a brief reason for your answer.
a)
an x n converges implies
n =0
b)
n ( x 2)
a u
n
n =0
converges if x < 1.
n =1
c) fn continuous on [a,b]
and
e) Any bounded function on [a,b] can be uniformly approximated by step functions and is thus integrable.
f) Any piecewise continuous function on [a,b] can be uniformly approximated by step
functions and is thus integrable.
g) If $\lim_{n\to\infty} a_n = 0$, then $\sum_{n=1}^{\infty} a_n$ is convergent.
a x
n
n =0
nx
n =1
c) What function does this power series represent within the interval where it converges?
b) Compute
&f-s& .25.
s .
functions on [a,b].
b) Explain why you know that this integral is a continuous linear function of f, where continuity is
with respect to the $\|\cdot\|_\infty$ norm on the space of bounded functions on [a,b].
13) Explain why the integral has the following 2 properties on the space C[a,b] of continuous functions on the
finite interval [a,b].
a) the integral preserves
b
f = f + f .
F'(x)=f(x), x ∈ [a,b].
15) Let a<c<b. Suppose that f(x) is continuous on [a,b] and we define g(x) to be equal to f(x) for all x in [a,c)
and for all x in (c,b] but we define g(c) = f(c) + 100. Show that
$\int_a^b f = \int_a^b g.$
12/11/2010
Vibrating Things
$\rho\,\frac{\partial^2 u}{\partial t^2} = \tau\,\frac{\partial^2 u}{\partial x^2}$, 0<x<π, 0<t.
u(x,0)=f(x), $\frac{\partial u}{\partial t}(x,0) = 0$ for 0<x<π.
$\frac{T''(t)}{T(t)} = c\,\frac{X''(x)}{X(x)}$
$X(x) = a_1 \exp(\sqrt{\lambda}\,x) + a_2 \exp(-\sqrt{\lambda}\,x)$
$X(x) = c_1 \cos(\omega x) + c_2 \sin(\omega x)$
Look at ODE2.
Math 20D says that the general solution may be taken to be
$T(t) = b_1 \cos(nt) + b_2 \sin(nt)$
(7) $f(x) = \sum_{n\ge 1} c_n \sin(nx)$
(8) $c_n = \frac{2}{\pi}\int_0^\pi f(y)\sin(ny)\,dy.$
(9) $u(x,t) = \sum_{n\ge 1} c_n \sin(nx)\cos(nt).$
$\rho\,u_{tt} - \tau\,u_{xx} = f(x)\cos(\omega t)$
$u(0,t) = u(\pi,t) = 0;\ u(x,0) = u_t(x,0) = 0.$
$u_{tt} - u_{xx} = f(x)\cos(\omega t)$
$u(0,t) = u(\pi,t) = 0;\ u(x,0) = u_t(x,0) = 0.$
$u(x,t) = \sum_{n\ge 1} c_n(t)\sin(nx).$
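(12) $\langle f, g\rangle = \int_0^\pi f(x)\,g(x)\,dx.$
Then
$c_n(t) = \left\langle u(\cdot,t), \frac{\sin(n\,\cdot)}{\pi/2}\right\rangle = \int_0^\pi u(x,t)\,\frac{\sin(nx)}{\pi/2}\,dx$
$v_n(x) = \frac{\sin(nx)}{\sqrt{\pi/2}}$, n=1,2,3,....,
forms an orthonormal family for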
You can solve this using methods from Math. 20D, for
example, variation of parameters. The result is:
$c_n(t) = \frac{\langle f, v_n\rangle}{n^2-\omega^2}\,(\cos(\omega t) - \cos(nt)).$
Exercise 5. Check this answer.
Thus a solution for the problem posed by formula
(10) is
(13) $u(x,t) = \sum_{n=1}^{\infty} a_n\,\frac{\cos(\omega t)-\cos(nt)}{n^2-\omega^2}\,v_n(x)$, $a_n = \langle f, v_n\rangle$, $v_n(x) = \frac{\sin(nx)}{\sqrt{\pi/2}}$.
Exercise 6. What happens to formula (13) as ω → n?
You need to compute
$\lim_{\omega\to n} \frac{\cos(\omega t)-\cos(nt)}{n^2-\omega^2}.$
Upon hearing this, the noted engineer Von Karman sent a telegram to the governor stating "If you build
the exact same bridge exactly as before, it will fall into the exact same river exactly as before."
Another famous bridge failure due to resonance: Angers Bridge, in Angers, France, 16 April 1850.
The collapse was due to resonance from marching soldiers.
Part II. Short course in series expansions and PDEs like the wave equation.
Applied Math. involves solving certain partial differential equations, for example, the wave
equation, heat equation, Schrödinger's equation. One way to find solutions for such PDEs is the method of
separation of variables. Usually this leads to Fourier series or generalized Fourier series. Let's look at
the wave equation which describes the motion of a vibrating string. Assume the string has constant
density ρ and constant tension τ. Then one can derive the following PDE known as the wave equation,
using Newton's law of motion or the principle of least action:
(1) $\rho\,\frac{\partial^2 u}{\partial t^2} = \tau\,\frac{\partial^2 u}{\partial x^2}$, 0<x<π, 0<t.
One assumes, for example, that the string is tied down at the boundary points giving the boundary
conditions:
(2) u(0,t)=u(π,t)=0, for all t>0.
And one may assume initial conditions
(3) u(x,0)=f(x), $\frac{\partial u}{\partial t}(x,0) = 0$, for 0<x<π.
The method of separation of variables of Daniel Bernoulli says: look for a solution of the PDE in
formula (1) of the form u(x,t)=X(x)T(t).
If you want this to satisfy (2), assume X(0)=X(π)=0. If you
want it to satisfy (3), you are in trouble for the 1st part, but the 2nd part becomes T'(0)=0.
Now plug u(x,t)=X(x)T(t) into formula (1). You get (setting c=τ/ρ)
X(x)T''(t)=cT(t)X''(x).
Divide both sides by X(x)T(t) (hoping you are not dividing by 0). This gives:
(4) $\frac{T''(t)}{T(t)} = c\,\frac{X''(x)}{X(x)}.$
This implies each side is constant. Call the constant λ. It is often called the separation constant. It is
an eigenvalue in the 1st ODE below.
Exercise 1. Prove that each side in equation (4) must be constant.
Thus we now have 2 ODEs to solve, assuming c=1 (or ρ=τ in formula (1)).
ODE 1. X''(x) = λX(x), 0 = X(0) = X(π).
ODE 2. T''(t) = λT(t), T'(0) = 0.
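Look at ODE1. The general solution from Math. 20D is
$X(x) = a_1 \exp(\sqrt{\lambda}\,x) + a_2 \exp(-\sqrt{\lambda}\,x)$,
with constants a_i. To satisfy the boundary conditions, we need λ<0. This means, since e^{ix}=cos x+i sin x
when i=√(-1), that we should write, for λ=-ω^2,
$X(x) = c_1 \cos(\omega x) + c_2 \sin(\omega x)$.
The condition X(0)=0 forces c_1=0, and then X(π)=0 forces sin(ωπ)=0, so we may take ω=n, a positive integer.
Look at ODE2. Math 20D says that the general solution may be taken to be
$T(t) = b_1 \cos(nt) + b_2 \sin(nt)$.
The condition T'(0)=0 forces b_2=0.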
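The separated solutions found above can be checked numerically. This is a minimal sketch, assuming c=1 and the product solution sin(nx)cos(nt) for a positive integer n; the function names and the step size h are hypothetical. It estimates u_tt - u_xx by central differences and evaluates the boundary value at x=π.

```python
import math

def standing_wave(x, t, n):
    """Product solution X(x)T(t) = sin(nx) cos(nt) for a positive integer n."""
    return math.sin(n * x) * math.cos(n * t)

def wave_residual(x, t, n, h=1e-4):
    """Central-difference estimate of u_tt - u_xx for the standing wave."""
    u = standing_wave(x, t, n)
    u_tt = (standing_wave(x, t + h, n) - 2 * u + standing_wave(x, t - h, n)) / h**2
    u_xx = (standing_wave(x + h, t, n) - 2 * u + standing_wave(x - h, t, n)) / h**2
    return u_tt - u_xx

for n in (1, 2, 3):
    # Both printed numbers should be close to 0.
    print(n, wave_residual(1.0, 0.7, n), standing_wave(math.pi, 0.7, n))
```

Trying a non-integer value in place of n shows the PDE is still satisfied but the boundary value at x=π is no longer zero, which is why the boundary conditions select the integers.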
To solve the plucked string problem you need to represent the function f(x)=u(x,0) as a Fourier
sine series (thinking that f is an odd function of period 2π):
(7) $f(x) = \sum_{n\ge 1} c_n \sin(nx)$,
where
(8) $c_n = \frac{2}{\pi}\int_0^\pi f(y)\sin(ny)\,dy.$
This was proved in Lectures II, for sufficiently smooth functions f. Of course, our plucked string
function does not look smooth, just continuous. It has that sharp point, remember.
Anyway our final solution to the vibrating string problem is:
(9) $u(x,t) = \sum_{n\ge 1} c_n \sin(nx)\cos(nt).$
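To see the series solution at work, here is a small sketch for one particular plucked shape. Everything specific in it is an assumption made for illustration: the tent-shaped initial profile with its peak at x=π/2, the midpoint rule, the cutoff at 59 modes, and the function names. It approximates the Fourier sine coefficients and then evaluates a partial sum of the series solution.

```python
import math

def plucked(x):
    """Hypothetical plucked-string shape: a tent with its peak at x = pi/2."""
    return x if x <= math.pi / 2 else math.pi - x

def sine_coefficient(f, n, steps=2000):
    """Midpoint-rule approximation of (2/pi) * integral_0^pi f(y) sin(ny) dy."""
    h = math.pi / steps
    total = sum(f((i + 0.5) * h) * math.sin(n * (i + 0.5) * h) for i in range(steps))
    return 2.0 / math.pi * h * total

def string_solution(coeffs, x, t):
    """Partial sum of c_n sin(nx) cos(nt) for n = 1, ..., len(coeffs)."""
    return sum(c * math.sin(n * x) * math.cos(n * t)
               for n, c in enumerate(coeffs, start=1))

coeffs = [sine_coefficient(plucked, n) for n in range(1, 60)]
print(coeffs[:3])                                  # c_1, c_2, c_3
print(string_solution(coeffs, math.pi / 2, 0.0))   # close to plucked(pi/2)
print(string_solution(coeffs, math.pi / 2, math.pi))
```

For this tent shape the coefficients decay like 1/n^2, so the series for u converges uniformly, but the twice-differentiated series does not converge absolutely. That is the issue raised about term-by-term differentiation.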
Exercise 2. Check that formula (9) solves the PDE (1) assuming c=1 (or ρ=τ in formula (1)).
What assumptions do you need to make on the function f to know that the Fourier coefficients
decrease rapidly enough to make the differentiated series converge uniformly so that it is legal to
differentiate term-by-term?
Thus we see that Fourier series seem necessary if we want to understand vibrating things like
strings. They are also useful in the analysis of almost any phenomenon; e.g., the stock market, heat
diffusion, yearly San Diego rainfall measurements. Of course, many questions are raised in the
mathematical brain here. We put some of them in the preceding exercise. You might also ask:
1) Are the solutions u(x,t) to (1),(2), and (3) unique? Then whatever crazy method
we use will lead to the same answer.
2) When is an infinite series of solutions to a PDE again a solution?
3
$\rho\,u_{tt} - \tau\,u_{xx} = f(x)\cos(\omega t)$
$u(0,t) = u(\pi,t) = 0$
$u(x,0) = u_t(x,0) = 0.$
Assume 1=τ/ρ, for simplicity. So our problem is now
(10) $u_{tt} - u_{xx} = f(x)\cos(\omega t)$,
$u(0,t) = u(\pi,t) = 0$,
$u(x,0) = u_t(x,0) = 0.$
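To solve (10), plug in
(11) $u(x,t) = \sum_{n\ge 1} c_n(t)\sin(nx).$
Define the inner product
(12) $\langle f, g\rangle = \int_0^\pi f(x)\,g(x)\,dx.$
Then
$c_n(t) = \left\langle u(\cdot,t), \frac{\sin(n\,\cdot)}{\pi/2}\right\rangle = \int_0^\pi u(x,t)\,\frac{\sin(nx)}{\pi/2}\,dx.$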
Exercise 3. Prove the last formula and then use it to show that $v_n(x) = \frac{\sin(nx)}{\sqrt{\pi/2}}$, n=1,2,3,...., forms
an orthonormal family for the inner product space of piecewise continuous functions on [0,π] using
the inner product defined by formula (12).
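The orthonormality claim in Exercise 3 can be tested numerically before it is proved. This sketch assumes the normalization v_n(x) = sin(nx)/sqrt(π/2) and the inner product given by the integral of f(x)g(x) over [0,π]; the helper names and the midpoint rule are hypothetical choices. It builds the matrix of inner products of v_1, ..., v_4, which should be close to the identity matrix.

```python
import math

def v(n, x):
    """Assumed normalized sine function sin(nx) / sqrt(pi/2)."""
    return math.sin(n * x) / math.sqrt(math.pi / 2)

def inner(f, g, steps=4000):
    """Midpoint-rule approximation of the integral of f(x) g(x) over [0, pi]."""
    h = math.pi / steps
    return h * sum(f((i + 0.5) * h) * g((i + 0.5) * h) for i in range(steps))

def gram_matrix(size):
    """Matrix of inner products <v_m, v_n> for m, n = 1, ..., size."""
    return [[inner(lambda x: v(m, x), lambda x: v(n, x))
             for n in range(1, size + 1)]
            for m in range(1, size + 1)]

for row in gram_matrix(4):
    print(["%.6f" % entry for entry in row])
```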
If we plug formula (11) into (10) without worrying about our issues of interchange of derivative
and summation, we see that we need
$c_n''(t) = -n^2 c_n(t) + \langle f, v_n\rangle \cos(\omega t)$,
$c_n(0) = c_n'(0) = 0.$
You can solve this using methods from Math. 20D, for example, variation of parameters. The result is:
$c_n(t) = \frac{\langle f, v_n\rangle}{n^2-\omega^2}\,(\cos(\omega t) - \cos(nt)).$
Exercise 5. Check this answer.
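One way to gain confidence in the formula for c_n(t) is a finite-difference check. The sketch below is an illustration under stated assumptions: omega is the name used here for the driving frequency in the forcing term, omega is not equal to n, and the parameter strength is a hypothetical stand-in for the inner product of f with v_n. It tests that c'' + n^2 c minus the forcing term is numerically zero and that both initial conditions hold.

```python
import math

def forced_mode(t, n, omega, strength=1.0):
    """Candidate solution strength/(n^2 - omega^2) * (cos(omega t) - cos(n t)).
    Requires omega != n; strength stands in for the inner product <f, v_n>."""
    return strength / (n * n - omega * omega) * (math.cos(omega * t) - math.cos(n * t))

def ode_residual(t, n, omega, strength=1.0, h=1e-4):
    """Central-difference estimate of c'' + n^2 c - strength*cos(omega t)."""
    c = forced_mode(t, n, omega, strength)
    second = (forced_mode(t + h, n, omega, strength) - 2 * c
              + forced_mode(t - h, n, omega, strength)) / h**2
    return second + n * n * c - strength * math.cos(omega * t)

def initial_slope(n, omega, strength=1.0, h=1e-4):
    """Central-difference estimate of c'(0)."""
    return (forced_mode(h, n, omega, strength)
            - forced_mode(-h, n, omega, strength)) / (2 * h)

print(ode_residual(1.3, 3, 2.5))
print(forced_mode(0.0, 3, 2.5), initial_slope(3, 2.5))
```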
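Thus a solution for the problem posed by formula (10) is
(13) $u(x,t) = \sum_{n=1}^{\infty} a_n\,\frac{\cos(\omega t)-\cos(nt)}{n^2-\omega^2}\,v_n(x)$, $a_n = \langle f, v_n\rangle$, $v_n(x) = \frac{\sin(nx)}{\sqrt{\pi/2}}$.
Exercise 6. What happens to formula (13) as ω → n? You need to compute
$\lim_{\omega\to n} \frac{\cos(\omega t)-\cos(nt)}{n^2-\omega^2}.$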
You should be able to deduce from this that unless an=0, the function u(x,t) blows up as t goes to
infinity. This is resonance.
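The blow-up can be seen numerically. In this sketch omega is an assumed name for the driving frequency, and the function names are hypothetical. By l'Hopital's rule in omega, the ratio (cos(omega t) - cos(nt))/(n^2 - omega^2) tends to t sin(nt)/(2n) as omega approaches n, and that expression has peaks growing linearly in t. The code compares the ratio for omega very close to n with this limit.

```python
import math

def response_ratio(t, n, omega):
    """The factor (cos(omega t) - cos(n t)) / (n^2 - omega^2), for omega != n."""
    return (math.cos(omega * t) - math.cos(n * t)) / (n * n - omega * omega)

def resonant_limit(t, n):
    """Limit of response_ratio as omega -> n, from l'Hopital's rule: t sin(nt)/(2n)."""
    return t * math.sin(n * t) / (2 * n)

n = 2
for t in (1.0, 10.0, 100.0):
    print(t, response_ratio(t, n, n + 1e-7), resonant_limit(t, n))

# At times where sin(nt) = 1 the limit equals t/(2n), which is unbounded in t.
for k in (0, 10, 100):
    t_peak = (math.pi / 2 + 2 * math.pi * k) / n
    print(t_peak, resonant_limit(t_peak, n))
```

The second loop prints values that keep growing with t, which is the numerical face of resonance: a bounded driving force produces an unbounded response when its frequency matches a natural frequency of the string.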
The phenomenon of resonance is quite general. It works for all sorts of vibrating objects, even with nonconstant density and tension and even in higher dimensions. Of course it can be used for good as well as
evil. Consider musical instruments for example. Can they be played for evil? I suppose you might be able
to destroy a glass by making it resonate.