Preface i
I The basics 1
1 Sets, functions, numbers, and infinities 3
1 Paradoxes of the smallest infinity . . . . . . . . . . . . . . . . . . 3
2 Uncountability . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
3 Cantor's infinite paradise of infinities . . . . . . . . . . . . . . . . 18
4 The real deal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
5 Reals from rationals . . . . . . . . . . . . . . . . . . . . . . . . . 28
6 The Cantor set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2 Discontinuity 33
7 Guessing function values . . . . . . . . . . . . . . . . . . . . . . . 33
8 The Dirichlet function . . . . . . . . . . . . . . . . . . . . . . . . 38
9 Conway's base-13 function . . . . . . . . . . . . . . . . . . . . . . 41
10 Continuity is uncommon . . . . . . . . . . . . . . . . . . . . . . . 45
11 Thomae's function . . . . . . . . . . . . . . . . . . . . . . . . . . 46
12 Discontinuities of monotone functions . . . . . . . . . . . . . . . 47
13 Discontinuities of indicator functions . . . . . . . . . . . . . . . . 51
14 Sets of discontinuities . . . . . . . . . . . . . . . . . . . . . . . . 52
15 The Baire category theorem . . . . . . . . . . . . . . . . . . . . . 56
3 Series 61
16 Stacking books . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
17 Inserting parentheses and rearranging series . . . . . . . . . . . . 68
18 A Taylor series that converges to the wrong function . . . . . . . 73
19 Misshapen series . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
20 If you torture a series enough, it will converge . . . . . . . . . . . 77
4 Sequences of functions 83
21 Cauchy's wrong theorem . . . . . . . . . . . . . . . . . . . . . . . 83
22 Walrus tusks and nasty pointwise limits . . . . . . . . . . . . . . 85
5 Differentiation 109
27 Discontinuous derivative . . . . . . . . . . . . . . . . . . . . . . . 109
28 Darboux's theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 111
29 Continuous but nowhere differentiable functions . . . . . . . . . . 111
30 Derivatives at infinity . . . . . . . . . . . . . . . . . . . . . . . . 113
31 Bump functions and partitions of unity . . . . . . . . . . . . . . 115
32 Multivariable limits, derivatives, and local extrema are weird . . 116
6 Measure 119
33 Episode I: The Phantom Measure (Measure is problematic) . . . 119
34 Jordan measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
35 Lebesgue measure . . . . . . . . . . . . . . . . . . . . . . . . . . 127
36 The Smith-Volterra-Cantor set . . . . . . . . . . . . . . . . . . . 129
37 A nonmeager set of measure zero . . . . . . . . . . . . . . . . . . 130
38 Lebesgue's density theorem . . . . . . . . . . . . . . . . . . . . . 135
39 Measure and Minkowski sums . . . . . . . . . . . . . . . . . . . . 136
40 Intersections of measure zero sets and their images . . . . . . . . 137
41 Noise sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
42 Convergence in measure vs. pointwise convergence . . . . . . . . 139
43 Borel measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
44 Measures in general . . . . . . . . . . . . . . . . . . . . . . . . . 143
45 Episode II: Attack of the Clones (Banach-Tarski and paradoxical
decompositions) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
12 Acknowledgments 269
Preface
ries [1], a brilliant work of fiction. This book is both nonfictional and about
R, hence the name. We didn't include any complex analysis, because we feel
that topics in complex analysis are more like magic tricks than bedtime stories.
("Can I get a bounded entire volunteer from the audience? Abracadabra, hocus
pocus, tada! You're constant.") Happy reading!
Some undergrads
Earth, 2016
References
[1] S. Duvois and C. Macdonald. 101 Illustrated Analysis Bedtime Stories.
2001. url: http://people.maths.ox.ac.uk/macdonald/errh/.
[2] Paul Lockhart. A Mathematician's Lament. Bellevue Literary Press, New
York, 2009.
Part I
The basics
Chapter 1
This chapter isn't exactly about real analysis, but it's fun stuff that you
need to understand anyway. To appreciate the real analysis stories, you need to
know something about the world in which they take place.
primitive objects, so rather than defining them, we should just give some axioms about them
which we will assume. We'll be assuming the ZFC axioms. Look them up if you're curious.
It shouldn't matter.
4 CHAPTER 1. SETS, FUNCTIONS, NUMBERS, AND INFINITIES
Figure 1.2: The centipede can compare two finite sets even though it doesn't know
how to count. The centipede puts a clean sock on each foot until it either runs out
of socks or runs out of feet. If it runs out of socks with some feet still bare, it can
conclude that it has more feet than clean socks, so it's time to do laundry.
chairs, to compare two finite sets X and Y , you can avoid counting or numbers
and instead use an even more primitive concept: matching. Pair off elements of
X with elements of Y one by one. The sets are the same size if and only if you
end up with no leftover elements in either set. (See Figure 1.2.)
Armed with this observation, way back in 1638, Galileo declared that infinities
cannot be compared. Here's his reasoning. (See Figure 1.3.) Suppose
we're interested in comparing the set of natural numbers $\mathbb{N} = \{1, 2, 3, \dots\}$ with
the set of perfect squares $S = \{1, 4, 9, \dots\}$. On the one hand, obviously, $\mathbb{N}$
is bigger than $S$, because $S$ is a proper subset of $\mathbb{N}$, i.e. every perfect square
is a natural number but not vice versa. If we list the natural numbers on the
left and the perfect squares on the right, we can match each perfect square $n^2$
on the right with the copy of that same number $n^2$ on the left, leaving a lot of
lonely unmatched natural numbers.
But on the other hand, we could instead match each natural number $n \in \mathbb{N}$
with the perfect square $n^2 \in S$. That would leave no leftovers on either side,
suggesting that $\mathbb{N}$ and $S$ are the same size! We get two different answers based
on two different matching rules. It's as if we play musical chairs twice, with the
same set of people and the same set of chairs both times. In the first game, the
chairs all fill up, with infinitely many losers still standing. But in the rematch,
everybody finds a chair to sit in! Galileo concluded that this is all just nonsense
[4]:
So far as I see we can only infer that the totality of all numbers is
infinite, that the number of squares is infinite, and that the number
of their roots is infinite; neither is the number of squares less than the
totality of all the numbers, nor the latter greater than the former;
and finally the attributes "equal," "greater," and "less," are not
applicable to infinite, but only to finite, quantities.
Galileo was on the right track, but he didn't get it quite right. The main
takeaway from Galileo's paradox is that we really do need a definition in order
to compare infinite sets. In the 1800s, Georg Ferdinand Ludwig Philipp Cantor
provided a good one, declaring that X has the same cardinality as Y if there is
some way to pair off the elements of X with elements of Y, leaving no leftovers
in either set.2 So the definition is biased in favor of declaring sets to be the
same size. Cantor says the appropriate way to handle Galileo's paradox is to
say, "yeah, there really are just as many natural numbers in total as there are
perfect squares. Infinity's weird like that."
Figure 1.3: Galileo's paradox. With infinite sets, different matching rules can lead
to different outcomes.
To explore cardinality properly, we need to be more precise. Galileo's paradox
involved two different ways of associating elements of N with elements of S:
two different binary relations between N and S.
to it as Hume's principle.
1. PARADOXES OF THE SMALLEST INFINITY 7
Figure 1.4: An example of a familiar relation with domain N and codomain N, the
relation.
Figure 1.5: Let Obama be the binary relation with domain R and codomain R whose
graph G is depicted above. Obama is not a function, for two reasons. First, for some
values of x, there are multiple y so that $(x, y) \in G$. (Obama fails the vertical line
test.) Second, for some x, there does not exist a y so that $(x, y) \in G$. (Obama is not
entire.)
Figure 1.6: A function $f : \{1, 2, 3, 4\} \to \{a, b, c\}$, with e.g. $f(4) = c$.
of real numbers ( 32 , 109 , 2, e, etc.3 ) These real-valued functions of a real
argument are going to be the main characters in most of our stories.
A collision of a function $f : X \to Y$ is a pair of distinct inputs $x_1, x_2 \in X$
such that $f(x_1) = f(x_2)$. A function is injective if it has no collisions. An
injective function is lossless: you can recover the input from the output. "An
injection preserves information." (Darius Bacon [3].) To put it another way, if
X is a set of people and Y is a set of chairs, an injection $X \to Y$ is a seating
arrangement where each person gets her own chair, possibly leaving some chairs
empty.
You should think of the codomain Y as the set of allowed outputs of f .
The image of f , denoted f (X), is the set of actual outputs of the function,
i.e. $f(X)$ is the set of all $f(x) \in Y$ as x ranges over X.4 E.g. the image of the
3 Unsatisfied by this definition as well? We'll discuss what real numbers really are in
Section 4. For now, just think of points on a number line, or decimal expansions.
4 You might have heard the term "range" before. The word "range" is ambiguous. Don't
use it. When people say "range," sometimes they mean codomain, and other times they mean
image.
Figure 1.8: Sir Jective hits everything with his sword. Kevin [9]. See also
[8]. [TODO replace picture. The image should depict the fact that the knight hits
everything with his sword. Maybe he is stabbing something in the picture, he's missing
a leg, the horse is missing a leg; nearby is a slain dragon, a headless chicken, a mailbox
cut in half, a chopped-down tree...]
function depicted in Figure 1.6 is {b, c}. We say that f is surjective if $f(X) = Y$.
In other words, f is surjective if for every $y \in Y$, there exists an $x \in X$ so that
$f(x) = y$. A surjection is a seating arrangement which fills every chair, possibly
with many people sharing a single chair.
A function is bijective if it is injective and surjective. A bijection is also called
a one-to-one correspondence:5 it is the notion of matching with no leftovers
that we were looking for. A bijection is a seating arrangement in which every
person is assigned her own chair and every chair is filled. Here's the official
version of Cantor's definition.
Definition 3. Suppose X and Y are sets. We say that X has the same cardinality
as Y if there exists a bijection $X \to Y$. We write $|X| = |Y|$ in this
case.
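For finite sets, the definitions of injective, surjective, and bijective can be checked mechanically. The following Python sketch represents a function as a dictionary. The helper names and the particular values of `f` are hypothetical choices for illustration: the text only tells us that f(4) = c and that the image of the function in Figure 1.6 is {b, c}.

```python
def image(f):
    """The set of actual outputs of f (a dict mapping inputs to outputs)."""
    return set(f.values())

def is_injective(f):
    """No collisions: distinct inputs always give distinct outputs."""
    return len(image(f)) == len(f)

def is_surjective(f, codomain):
    """Every allowed output is an actual output."""
    return image(f) == set(codomain)

def is_bijective(f, codomain):
    """Matching with no leftovers on either side."""
    return is_injective(f) and is_surjective(f, codomain)

# Hypothetical function {1, 2, 3, 4} -> {a, b, c} with f(4) = c and image {b, c}.
f = {1: "b", 2: "b", 3: "c", 4: "c"}
codomain = {"a", "b", "c"}

print(image(f) == {"b", "c"})      # True
print(is_injective(f))             # False: the inputs 1 and 2 collide
print(is_surjective(f, codomain))  # False: nobody sits in chair "a"

# A seating arrangement in which every person gets her own chair
# and every chair is filled.
g = {1: "a", 2: "b", 3: "c"}
print(is_bijective(g, codomain))   # True
```

Nothing like this works for infinite sets, of course, which is why the examples that follow exhibit bijections by formula instead.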
Example 1. The set of even integers has the same cardinality as the set of
odd integers, because $f(2k) = 2k + 1$ is a bijection between these two sets. (See
Figure 1.10.) This should be intuitive, since even and odd seem to be on equal
footing.
bijection, and then in the same breath use the term "one-to-one" to mean injection (note
the omission of the word "correspondence.") Some say "f maps X onto Y" to say that f
is surjective, while the subtly different "f maps X into Y" merely means that X and Y
are the domain and codomain of f! It's a terminological disaster. Much better to stick
with the injective/surjective/bijective terms, invented by the group of mathematicians known
pseudonymously as Bourbaki.
Figure 1.9: The function $f(x) = x^2$ is not surjective when thought of as a function
$\mathbb{R} \to \mathbb{R}$, because negative numbers are not part of its image. However, it is surjective
if we think of it as a function $\mathbb{R} \to [0, \infty)$.
Figure 1.10: The bijection from the set of even integers to the set of odd integers.
Example 2. The set of all integers (positive, negative, and 0) has the same
cardinality as N (the set of positive integers). To see why, observe that we can
reorder the integers as follows (see also Figure 1.11):
$$0, 1, -1, 2, -2, 3, -3, \dots$$
The function $f(n)$ which gives the $n$th element in the list is a bijection from
N to the set of integers. We denote the set of all integers by Z (which stands
for "Zahlen", the German word for "numbers"). So what we've just shown is that
$|\mathbb{Z}| = |\mathbb{N}|$. This is counterintuitive: it feels like there are about twice as many
integers as positive integers.
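The reordering can be turned into explicit formulas. Here is a minimal Python sketch of the bijection and its inverse; the function names are ours, and the closed forms are just one way of writing down "the nth element of the list".

```python
def nat_to_int(n):
    """The n-th entry (n = 1, 2, 3, ...) of the list 0, 1, -1, 2, -2, 3, -3, ..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # Even positions hold the positive integers, odd positions hold 0 and the negatives.
    return n // 2 if n % 2 == 0 else -(n // 2)

def int_to_nat(z):
    """Inverse map: the position of the integer z in the list."""
    return 2 * z if z > 0 else 1 - 2 * z

print([nat_to_int(n) for n in range(1, 8)])  # [0, 1, -1, 2, -2, 3, -3]
print(int_to_nat(-3))                        # 7
```

Having an inverse that undoes the map in both directions is exactly what it means for the map to be a bijection.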
Figure 1.12: The proof that $|\mathbb{Q}| = |\mathbb{N}|$. We make an infinite table of fractions, with
the row index being the denominator and the column index being the numerator.
We circle all of the reduced fractions, and then we can make a list of all the rational
numbers in the zig-zag order indicated by the arrows. A small simplification made in
this illustration is that it omits the nonpositive rational numbers.
Example 3. The set Q of all rational numbers (i.e. fractions of integers) has the
same cardinality as N! This seems horribly wrong, because there are infinitely
many rational numbers between every two integers. It's sufficiently surprising
that at least one high school textbook [2] boldly asserts that Q and N have
different cardinalities. But in fact, we can enumerate the rational numbers as
follows. Every rational number can be written as a reduced fraction $\pm\frac{p}{q}$, where
p is a nonnegative integer and q is a positive integer. First, we list all rational
numbers with p + q = 1 (there's just one: zero):
$$\frac{0}{1}$$
Then we extend the list with all rational numbers with p + q = 2:
$$\frac{0}{1},\ \frac{1}{1},\ -\frac{1}{1}$$
Then with all rational numbers with p + q = 3:
$$\frac{0}{1},\ \frac{1}{1},\ -\frac{1}{1},\ \frac{1}{2},\ -\frac{1}{2},\ \frac{2}{1},\ -\frac{2}{1}$$
Etc. etc. Every rational number will eventually be listed. Just like in the case
of Z, this reordering immediately gives a bijection between Q and N, showing
that $|\mathbb{Q}| = |\mathbb{N}|$. (See Figure 1.12.)
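The enumeration by the value of p + q is easy to carry out by machine. The following Python generator is a sketch of it; the name `rationals` is ours, and the choice to list each positive fraction just before its negative is an assumption about the intended order.

```python
from fractions import Fraction
from itertools import islice
from math import gcd

def rationals():
    """Yield every rational number exactly once, grouped by the value of p + q."""
    yield Fraction(0)                   # the only rational with p + q = 1
    total = 2
    while True:
        for p in range(1, total):       # numerator p >= 1, denominator q = total - p >= 1
            q = total - p
            if gcd(p, q) == 1:          # keep reduced fractions only, so nothing repeats
                yield Fraction(p, q)
                yield Fraction(-p, q)
        total += 1

print([str(r) for r in islice(rationals(), 7)])
# ['0', '1', '-1', '1/2', '-1/2', '2', '-2']
```

The position at which a rational number shows up in this list is the natural number it is matched with.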
We call a set countably infinite if it has the same cardinality as N. (If
you carefully count the elements in a countably infinite set, it's false that you
will eventually have counted every element, but it's true that for each element,
you will eventually have counted that element.) Countable infinity may be the
smallest infinity, but it's got teeth. (See Figure 1.13.)
Figure 1.13: Hilbert's paradox of the Grand Hotel. There are countably infinitely
many rooms, all of which are occupied, yet the hotel is still accepting more guests.
When a new guest arrives, the hotel asks the patron in room n to move to room n + 1,
with the net effect being that room 1 is freed up for the new arrival. Do you see how
the hotel can deal with countably infinitely many guests who all arrive simultaneously?
(Credit for the "No vacancy, guests welcome" sign: [hilbert-hotel])
2 Uncountability
A simple observation: a set is finite if and only if you can write down the entire
set, after giving each element a name. There's a similar characterization of
countable sets. (A set is countable if it is either finite or countably infinite.) If
X is countable, maybe you can't write down the entire set, but at least you can
write down an arbitrary element of X.
Geology rocks.
Vacuuming sucks.
Don't drink and derive.
Two wrongs can make a riot.
Why is the letter before Z?
Statisticians say mean things.
Your calendar's days are numbered.
A plateau is the highest form of flattery.
...
Two fish are in a tank. One says to the other, "Do
you know how to drive this thing?"
...
Bobby Fischer got bored of playing chess with
Russians. He asked the association to fix his next
match with some other Europeans, writing, "How about
a Czech mate?"
...
Figure 1.14: The set of all puns is countable, because every pun can be written
down, and hence the puns can be enumerated: we start with the shortest, then move
on to longer and longer puns. (We deserve no credit for the puns listed.)
Before the proof, some examples: We can write down an arbitrary element
of N using the alphabet $\Sigma = \{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$ and standard decimal
notation. Similarly, to write down elements of Z and Q, just throw in two more
symbols, $-$ and $/$. (That was a much easier proof that Q is countable than the
zig-zag argument we did before!)
Proof of Proposition 1. If X is countable, then we can index each element of
X with a natural number, which we can think of as its name. Writing down
natural numbers is easy enough.
For the converse, we'll show that $\Sigma^*$, the set of all finite strings over a finite
alphabet $\Sigma$, is countable. To enumerate $\Sigma^*$, first
list all the length-0 strings (there's only one: the empty string.) Then list all
the length-1 strings, then the length-2 strings, etc. There are only finitely many
strings of each length, so this gives a bijection $\mathbb{N} \to \Sigma^*$. (See Figure 1.14.)
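The "shortest first" enumeration of strings can be sketched in a few lines of Python. The two-letter alphabet "ab" is a hypothetical stand-in; any finite alphabet works the same way.

```python
from itertools import count, islice, product

def all_strings(alphabet):
    """Yield every finite string over the alphabet: length 0, then length 1, then length 2, ..."""
    for length in count(0):
        # There are only finitely many strings of each length,
        # so every string is reached after finitely many steps.
        for letters in product(alphabet, repeat=length):
            yield "".join(letters)

print(list(islice(all_strings("ab"), 7)))
# ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
```

With the full alphabet of printable characters, the same loop would eventually print every pun in Figure 1.14, along with a great deal of gibberish.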
Proposition 1 reveals tons of countable sets: the set of all finite subsets of
Z, the set of all polynomials with integer coefficients, the set of all possible
computer viruses, the set of all possible recipes describing yummy food, the set
of all love notes which can ever be written, the set of all theorems, the set of
all proofs, the set of all stories, the set of all finite mazes, the set of all vague
philosophical questions, the set of all possible digital photographs, the set of all
physical laws that we have any hope of making sense of...
Are there any sets that are uncountable (even bigger than N)? Of course,
by the that's-why-the-word-"countable"-was-invented principle. Where do we find
one of these super-infinite sets, despised by Count von Count? Proposition 1
gives a hint: it ought to require an infinite amount of information to specify
an element of the set. A sequence is a function with domain N, except that if
A is a sequence, we write $A_n$ instead of the functional notation $A(n)$.
Theorem 1. Let $2^{\mathbb{N}}$ denote the set of all sequences of zeroes and ones. Then
$2^{\mathbb{N}}$ is uncountable.
The proof, due to Cantor, is unquestionably one of the greatest proofs of all
time. Remember that the definition of cardinality was biased in favor of sets
having the same cardinality, which makes it especially tricky to prove that two
sets have different cardinalities. We have to prove that there does not exist a
bijection $\mathbb{N} \to 2^{\mathbb{N}}$. Lots of people like to say that you can't prove a negative
[randi09, 11, 7]. But we're about to do exactly that.
Proof. Consider any arbitrary function $f : \mathbb{N} \to 2^{\mathbb{N}}$; we will show that $f$ is not
a surjection.
We can represent $f$ as a table, like the example in Figure 1.15. Let $A$ be the
diagonal sequence, defined by $A_n = f(n)_n$; that is, the $n$th term of $A$ is the
$n$th term of the $n$th sequence. Let $B$ be the opposite of $A$:
$$B_n = \begin{cases} 0 & \text{if } A_n = 1 \\ 1 & \text{if } A_n = 0. \end{cases} \tag{1.1}$$
By construction, for every $n \in \mathbb{N}$, $f(n)$ differs from $B$ in its $n$th term. Thus, $B$
is not in the image of $f$, so $f$ is not surjective!
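To watch the diagonal argument in action, here is a small Python sketch applied to the first ten terms of the first ten sequences from the example table in Figure 1.15. A finite truncation is of course only an illustration of the proof, not a substitute for it; the variable names are ours.

```python
# Row n - 1 holds the first ten terms of the sequence f(n) from Figure 1.15.
table = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 1, 1, 1, 0, 1, 0, 1, 0, 0],
    [0, 1, 0, 0, 0, 1, 1, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 1],
    [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 1, 1, 1, 0, 0, 0, 1],
]

# The diagonal sequence: the n-th term of the n-th row.
A = [table[n][n] for n in range(len(table))]

# The opposite sequence: flip every term of the diagonal.
B = [1 - a for a in A]

print(A)  # [0, 1, 0, 0, 0, 1, 1, 1, 1, 1]
print(B)  # [1, 0, 1, 1, 1, 0, 0, 0, 0, 0]

# B disagrees with the n-th row in the n-th position, so it matches no row.
print(all(B[n] != table[n][n] for n in range(len(table))))  # True
```

The point of the proof is that the same recipe works for any table whatsoever, including infinite ones.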
Here's a more familiar uncountable set:
Theorem 2. The set $\mathbb{R}$ of all real numbers is uncountable.
Proof. Define $f : 2^{\mathbb{N}} \to \mathbb{R}$ by
Then $f$ is injective, and hence $f$ is a bijection between $2^{\mathbb{N}}$ and $f(2^{\mathbb{N}}) \subsetneq (-1, 1)$.
This shows that some subset of $\mathbb{R}$ is uncountable, which implies that $\mathbb{R}$ is
uncountable.
The diagonalization argument in the proof of Theorem 1 is extremely clever.
Cantor wondered whether there was a bijection between N and R for several
years. He asked Richard Dedekind for help, but Dedekind couldn't solve the
problem either. Cantor eventually published a more complicated proof that R
is uncountable in 1874 (we'll see this early proof in Section ??.) He published
his diagonalization argument in 1891 [was-cantor-surprised].
n    f(n)
1    0 0 0 0 0 0 0 0 0 0 ...
2    1 1 1 1 1 1 1 1 1 1 ...
3    0 1 0 1 0 1 0 1 0 1 ...
4    0 1 0 0 1 0 0 0 1 0 ...
5    1 1 1 1 0 1 0 1 0 0 ...
6    0 1 0 0 0 1 1 0 1 0 ...
7    1 0 0 1 0 0 1 0 0 1 ...
8    0 1 1 1 1 1 1 1 1 1 ...
9    1 0 1 1 1 1 1 1 1 1 ...
10   0 0 0 1 1 1 0 0 0 1 ...
...
A    0 1 0 0 0 1 1 1 1 1 ...
B    1 0 1 1 1 0 0 0 0 0 ...
Figure 1.15: How the proof of Theorem 1 works for one example function f. The
sequence B cannot be in the image of f, because for every n, B and f(n) disagree at
their nth position.
Theorem 2 is profound. Obviously some numbers, like $\pi$, have infinite decimal
expansions. We still manage to write down such numbers, by using special
notation, like the symbol "$\pi$". But Theorem 2 tells us that no matter how much
notation we make up, there will still be some numbers which cannot be written
down! As Shakespeare said,
There are more things in heaven and earth, Horatio, than are dreamt
of in your philosophy.
For example, there must exist noncomputable numbers: numbers for which
there is no algorithm for listing the digits of the number. Numbers which
turn up in the wild tend to be computable ($\pi$, $e$, $\sqrt{2}$, etc.) But the
noncomputable ones are out there!
Notice that in the proof of Theorem 2, we actually showed that the interval
$(-1, 1)$ is already uncountable! Intuition suggests that R has a greater cardinality
than a puny little interval like $(-1, 1)$, but you've probably learned by now
that your intuition can be misleading in this business:
Proposition 2. For any real numbers $a < b$, $|(a, b)| = |\mathbb{R}|$.
Proof sketch. The function $f(x) = \tan(x)$ is a bijection $(-\frac{\pi}{2}, \frac{\pi}{2}) \to \mathbb{R}$. (See
Figure 1.17.) By translating and scaling like in Figure 1.16, you can get a
bijection $(a, b) \to \mathbb{R}$.
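The "translate, scale, then apply tan" recipe can be written out explicitly. The Python sketch below is one way to do it; the function names are ours, and floating-point arithmetic only approximates the real bijection (inputs extremely close to a or b will lose precision).

```python
import math

def interval_to_real(x, a, b):
    """Send x in the open interval (a, b) to a real number."""
    if not a < x < b:
        raise ValueError("x must lie strictly between a and b")
    # Translate and scale (a, b) onto (-pi/2, pi/2), then apply tan.
    t = (x - (a + b) / 2) * math.pi / (b - a)
    return math.tan(t)

def real_to_interval(y, a, b):
    """Inverse map: send a real number y back into (a, b)."""
    return (a + b) / 2 + (b - a) * math.atan(y) / math.pi

print(interval_to_real(0.5, 0, 1))   # 0.0, the midpoint goes to zero
print(real_to_interval(1000.0, 0, 1))  # just below 1
```

The existence of the inverse `real_to_interval` is what makes this a bijection rather than merely an injection.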
Proposition 2 is bizarre, because we like to think of intervals as having
different sizes, e.g. (0, 2) should be twice as big as (0, 1). We'll address that
idea in depth in Chapter 6.
To illustrate the care that must be taken to show that one set is bigger than
another, we conclude this section with some philosophical nonsense.
SIMPLICIO: I've discovered a proof that there are more bad
experiences than good experiences. Take any good experience, and
imagine altering it by setting yourself on fire. Now it's a bad
experience! So there are at least as many bad experiences as good
experiences, and of course there are some bad experiences in which
you're not on fire, so the inequality is strict.
SALVIATI: No, that won't do. You've provided a map from the set
of good experiences to the set of bad experiences (the
set-yourself-on-fire map) which is injective, but not surjective. Your conclusion
that there are more bad experiences than good experiences would
only be justified if we were dealing with finite sets. (After all, the
map $f : \mathbb{N} \to \mathbb{N}$ defined by $f(x) = x + 1$ is injective but not
surjective! You don't think that N is bigger than itself, do you?) But
in actual fact, there are infinitely many experiences. Just consider
the experience of holding n marbles, for $n \in \mathbb{N}$. There's a different
experience for each n.
SIMPLICIO: No no, you've misunderstood what I mean by
"experience." You thought that I meant a situation, which the subject
(The symbol $\emptyset$ denotes the empty set, the set with no elements.)
Our next theorem implies that no matter how huge a set you come up with,
there is always an even huger set. The proof is just a slightly more abstract
version of the diagonalization argument that revealed uncountability.
3. CANTORS INFINITE PARADISE OF INFINITIES 19
$B = X \setminus A = \{x \in X : x \notin A\}$.
$|X| + |Y| = |X \cup Y|$.
If X and Y are not disjoint, just rename the elements of each set, giving new
sets $X'$ and $Y'$ which are disjoint, satisfying $|X| = |X'|$ and $|Y| = |Y'|$.
6 This is actually one of many senses in which infinity is a number. See also ordinal numbers,
Figure 1.18: Despite what these two piles of apples may suggest, $5 + 4 \neq 7$.
Figure 1.19: The Boolean set operations: union ($X \cup Y$), intersection ($X \cap Y$), and set difference ($X \setminus Y$ and $Y \setminus X$).
Our fold in half proof that |N| = |Z| can easily be adapted to show that
We also have $|\mathbb{R}| + |\mathbb{R}| = |(0, 1)| + |(0, 1)| \le |(0, 2)| \le |\mathbb{R}|$, and hence
These two calculations are not coincidences: it turns out that for any infinite
set X,
|X| + |X| = |X|.
Adding an infinity to itself doesn't do anything!
$|X| \cdot |Y| = |X \times Y|$.
Our zig-zag proof that |N| = |Q| can easily be adapted to show that
We saw that addition and multiplication are pretty boring for infinite cardinal
numbers. Is exponentiation similarly boring? You can specify a subset
$A \subseteq X$ by giving its indicator function $\chi_A : X \to \{0, 1\}$ defined by
$$\chi_A(x) = \begin{cases} 0 & \text{if } x \notin A \\ 1 & \text{if } x \in A. \end{cases}$$
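For a finite set, the correspondence between subsets and indicator functions can be made completely explicit: each subset is a choice of 0 or 1 for every element, so a set with n elements has 2^n subsets. The Python sketch below illustrates this; the three-element set and the helper names are hypothetical.

```python
from itertools import product

def indicator(A):
    """Return the indicator function of the subset A."""
    return lambda x: 1 if x in A else 0

def all_subsets(X):
    """Build every subset of X from a choice of one bit (out/in) per element."""
    X = list(X)
    for bits in product([0, 1], repeat=len(X)):
        yield {x for x, bit in zip(X, bits) if bit == 1}

X = ["apple", "banana", "cherry"]
chi = indicator({"apple", "cherry"})

print([chi(x) for x in X])        # [1, 0, 1]
print(len(list(all_subsets(X))))  # 8, which is 2 ** 3
```

The bit string `[1, 0, 1]` carries exactly the same information as the subset it came from.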
numbers and real numbers are equally nonfictional. Real numbers should've been called
"continuum numbers" or "line numbers" or something. Too late now.
4. THE REAL DEAL 23
Figure 1.20: Uh oh, rational numbers and triangles are not friends.
Well that wasn't a very good definition. We'd better tell you what the real
number axioms are, eh? Most of them are pretty boring. You should just skim
them to get the flavor, except for Axiom 8 which is important. The first four
axioms say that arithmetic works like it ought to.
Figure 1.21: Number systems satisfying Axioms 1 through 4 (with no order
structure) are called fields. There are a lot of bizarre fields which are nothing like R. For
example, $\mathbb{Z}/7\mathbb{Z}$ is the field you get by coiling Z up into a circle, pretending that n
and n + 7 are the same number for every n. Division in this field is pretty weird, e.g.
$\frac{1}{3} = 5$, since $5 \cdot 3 = 15 = 1$. The point is, Axioms 5 through 8 are important.
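The weird division in Z/7Z is easy to check by machine. This Python sketch uses the built-in three-argument `pow`, which accepts a negative exponent with a modulus from Python 3.8 onward; the helper name `inverse_mod` is ours.

```python
MODULUS = 7

def inverse_mod(a, p=MODULUS):
    """Multiplicative inverse of a in Z/pZ, for p prime and a not a multiple of p."""
    return pow(a, -1, p)  # modular inverse; requires Python 3.8 or later

print(inverse_mod(3))      # 5
print((5 * 3) % MODULUS)   # 1

# Every nonzero element has an inverse, which is what makes Z/7Z a field.
print({a: inverse_mod(a) for a in range(1, MODULUS)})
# {1: 1, 2: 4, 3: 5, 4: 2, 5: 3, 6: 6}
```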
Figure 1.22: Q is like Swiss cheese: it's riddled with holes. R is like cheddar cheese:
it tastes good grated over scrambled eggs.
Figure 1.23: The countable set $S = \{\frac{1}{n} : n \in \mathbb{N}\}$. This set has a maximum,
$\max S = \sup S = 1$. It has no minimum, but $\inf S = 0$.
empty set, or Abraham Lincoln, or radical freedom. Given one real number
system (R, +, , ), we can build another real number system. Our new set of
real numbers is R {}, i.e. the set of all pairs (x, ) where x R. Arithmetic
is defined by (x, ) + (y, ) = (x + y, ) and (x, ) (y, ) = (x y, ), and the
order is defined by saying that (x, ) (y, ) if and only if x y.
But that's dumb. All we did is rename each number x to (x, ), which
shouldn't count as building a whole new real number system. "A rose by any
other name would smell as sweet." This renaming silliness is the only thing
that goes wrong; any two real number systems are isomorphic, i.e. each can be
obtained from the other by renaming the elements. Precisely:
Theorem 7. The real number system is unique up to ordered field isomorphism.
That is, if $(R_1, +_1, \cdot_1, \le_1)$ and $(R_2, +_2, \cdot_2, \le_2)$ are two real number systems, then
there exists a bijection $f : R_1 \to R_2$ so that
For all $x, y \in R_1$, $f(x +_1 y) = f(x) +_2 f(y)$.
For all $x, y \in R_1$, $f(x \cdot_1 y) = f(x) \cdot_2 f(y)$.
Figure 1.24: In ancient times (circa 1970), engineers used slide rules to quickly
multiply and divide numbers. A simple circular slide rule is depicted; the gray portion
rotates relative to the white portion. The depicted position corresponds to
multiplication/division by 2. Slide rules work because the exponential function $f(x) = e^x$ is an
isomorphism between the additive structure of $\mathbb{R}$ and the multiplicative structure of
$(0, \infty)$, because of the standard exponent rule $e^{x+y} = e^x e^y$. (This is a slightly
simpler kind of isomorphism than the ordered field isomorphism of Theorem 7, because
here we're just preserving the structure of one operation, whereas in Theorem 7 we
preserved the structure of two operations and a relation.)
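A slide rule multiplies by adding lengths on a logarithmic scale. The following Python sketch mimics that with floating-point numbers; `slide_rule_multiply` is a hypothetical name, and the results are only approximate, much like readings from a physical slide rule.

```python
import math

def slide_rule_multiply(a, b):
    """Multiply two positive numbers by adding their logarithms."""
    # log turns multiplication in (0, infinity) into addition in R,
    # and exp (the isomorphism in the caption) turns it back.
    return math.exp(math.log(a) + math.log(b))

print(math.isclose(slide_rule_multiply(2, 3.5), 7.0))  # True

# The exponent rule behind it: exp(x + y) = exp(x) * exp(y).
x, y = 0.3, 1.7
print(math.isclose(math.exp(x + y), math.exp(x) * math.exp(y)))  # True
```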
$X \cdot Y = \{x \cdot y : x \in X,\ y \in Y,\ x \ge 0,\ y \ge 0\} \cup \{x \in \mathbb{Q} : x < 0\}$.
We define $-X$ by
$-X = \{x - y : x < 0,\ y \notin X\}$.
And now we can extend our definition of multiplication to all reals by setting
$(-X) \cdot Y = X \cdot (-Y) = -(X \cdot Y)$ and $(-X) \cdot (-Y) = X \cdot Y$. It's tedious, but it
can be verified that these definitions make $(\mathbb{R}, +, \cdot, \le)$ a real number system.
There are lots of alternative constructions of R. The Cauchy sequence
construction, discovered by Cantor [TODO cite], is more in the spirit of real
analysis. Cantor identified a certain class of sequences of rational numbers which
"deserve" to converge, and then defined the real numbers specifically so that
those sequences really do converge. You might appreciate reading about it
[TODO reference], but you should read Section ?? first.
6. THE CANTOR SET 29
Figure 1.25: The Dedekind cut identified with $\sqrt{2}$ is the set of shaded rational
numbers.
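A Dedekind cut never needs to mention an irrational number: membership is decided by rational arithmetic alone. The Python sketch below tests membership in the cut for the square root of 2, assuming the convention that a cut is the set of rationals lying below the real number it represents; the function name is ours.

```python
from fractions import Fraction

def in_sqrt2_cut(q):
    """Is the rational q in the cut for sqrt(2), i.e. is q negative or q^2 < 2?"""
    q = Fraction(q)
    return q < 0 or q * q < 2

print(in_sqrt2_cut(Fraction(7, 5)))      # True, since (7/5)^2 = 1.96 < 2
print(in_sqrt2_cut(Fraction(707, 500)))  # True, since (707/500)^2 is just under 2
print(in_sqrt2_cut(Fraction(3, 2)))      # False, since (3/2)^2 = 2.25 > 2
print(in_sqrt2_cut(-5))                  # True, every negative rational is in the cut
```

Because `Fraction` arithmetic is exact, these answers are not subject to rounding error.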
Figure 1.26: The set $S = \{\frac{1}{n} : n \in \mathbb{N}\}$ is nowhere dense. Given any interval (such as
the blue interval), there is a subinterval (in red) which completely misses S.
It's easy to see that the Cantor set is nowhere dense: for any open interval I that
intersects [0, 1], there is a sufficiently large n so that the nth step of the construction
involves removing a subinterval of I. In fact, after removing all those intervals,
how much of [0, 1] is left over? The sum of the lengths of the intervals that remain
after the nth step is $(\frac{2}{3})^n$, so if we take a limit as $n \to \infty$, we see that
the total length of the Cantor set is 0. (We'll come back to this calculation in
Chapter 6.) So the Cantor set must be empty... right? Wrong! For example, 0 and 1
both belong to it. In fact, the Cantor set has infinitely many
points: any number of the form $\frac{1}{3^n}$ is in it.
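The length calculation can be checked with exact rational arithmetic. This Python sketch assumes the usual middle-thirds construction (each step removes the open middle third of every remaining interval); the function names are ours.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals that remain after n rounds of removing open middle thirds."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for left, right in intervals:
            third = (right - left) / 3
            next_intervals.append((left, left + third))    # keep the left third
            next_intervals.append((right - third, right))  # keep the right third
        intervals = next_intervals
    return intervals

def total_length(intervals):
    return sum(right - left for left, right in intervals)

def survives(x, n):
    """Is x still present after n rounds of removal?"""
    return any(left <= x <= right for left, right in cantor_stage(n))

print(total_length(cantor_stage(5)))  # 32/243, which is (2/3) ** 5
print(survives(Fraction(1, 27), 8))   # True: 1/27 is an endpoint and is never removed
print(survives(Fraction(1, 2), 1))    # False: 1/2 is removed in the very first step
```

Surviving finitely many rounds is of course weaker than belonging to the limit set, but for endpoints such as 1/27 the pattern continues forever.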
But to really understand the cardinality of the Cantor set, we need to take a detour.
So far, we've talked about real numbers in the abstract. When you met R as a child,
real numbers were presented to you in the guise of decimal expansions. A decimal expansion
is a string, something like 3.14159265 . . . , which (by definition) represents12 the
real number
$$3 + \frac{1}{10} + \frac{4}{100} + \frac{1}{1000} + \frac{5}{10000} + \cdots$$
(We haven't talked about limits yet, but since all the terms are nonnegative,
you can interpret this infinite sum as the supremum of the set of partial sums.)
Proposition 4. Every real number has a decimal expansion.
(We'll skip the proof.) How about uniqueness? Annoyingly, some real numbers
have two different decimal expansions. The real number 1 can also be
12 You might complain that we haven't explained which real number is referred to by strings
like "3", "10", "4", "100", etc.! Well, you understand which integers are referred to by such strings,
right? And the real number 1 is part of the axioms. So identify the positive integer n with
the real number $1 + 1 + \cdots + 1$ (with n ones.)
Figure 1.28: 0.999 . . . apples are depicted. Or maybe 0.999 . . . apple is depicted?
represented as 0.999 . . . , where there are an infinite number of nines after the
decimal point. Do you doubt it? Let's prove it. Certainly 1 is an upper bound
on {0.9, 0.99, 0.999, . . . }. If there were a smaller upper bound, say $1 - \varepsilon$, then
$\varepsilon$ would be infinitesimal: greater than zero, but smaller than $\frac{1}{n}$ for every natural
number n. Such numbers do not exist:
Theorem 8 (Archimedean Property). For any real $\varepsilon > 0$, there exists a natural
number $n > 0$ so that $\varepsilon > \frac{1}{n}$.
Proof. Let $S = \{n \in \mathbb{N} : n \le \frac{1}{\varepsilon}\}$. Our goal is to show that $S \ne \mathbb{N}$. If $S$ is
empty, we're done. Otherwise, by the supremum axiom, $S$ has a least upper
bound $\sup S$. By the minimality of $\sup S$, there exists $s \in S$ with $s > (\sup S) - 1$.
Then $s + 1 > \sup S$, so $\sup S$ is not an upper bound on $\mathbb{N}$. Therefore, $S \ne \mathbb{N}$.
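For rational $\varepsilon$, the natural number promised by the Archimedean Property can be written down explicitly. This is only an illustrative sketch with an invented function name; the theorem itself covers every real $\varepsilon > 0$.

```python
import math
from fractions import Fraction

def archimedean_witness(eps):
    """Return a natural number n > 0 with 1/n < eps, for a rational eps > 0."""
    if eps <= 0:
        raise ValueError("eps must be positive")
    # floor(1/eps) + 1 is strictly bigger than 1/eps, so its reciprocal is below eps.
    return math.floor(1 / eps) + 1
```

For example, with `eps = Fraction(1, 1000)` the witness `n` satisfies `Fraction(1, n) < eps`, which is exactly why $1 - \varepsilon$ cannot be an upper bound on $\{0.9, 0.99, 0.999, \ldots\}$.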
If you're still in doubt, maybe you'd be convinced by tripling both sides of
the equation $\frac{1}{3} = 0.333\ldots$. If you're still uncomfortable, maybe it helps to keep
in mind that decimal expansions are just strings, not the numbers themselves.
So is $1$ the only two-faced scoundrel in $\mathbb{R}$? Nope, e.g. $97.842 = 97.841999\ldots$.
Every real number with a finite decimal expansion has a second decimal
expansion. But that's the only thing that goes wrong.
Proposition 5. Every real number has at most two decimal expansions. A
real number has two decimal expansions if and only if it has a finite decimal
expansion.$^{13}$
Now we can finally understand the cardinality of the Cantor set, by representing numbers
in ternary, i.e. base 3. (All of our discussion of decimal expansions applies mutatis mutandis
for any integer base $b \ge 2$, or even weirder bases like base $2i$ where $i$ is the
imaginary unit.) The interval $(\frac{1}{3}, \frac{2}{3})$ that we remove in the first iteration of the
construction of the Cantor set consists of all those real numbers $x \in [0, 1]$ whose first
ternary digit (after the decimal point$^{14}$) is 1. More precisely, $(\frac{1}{3}, \frac{2}{3})$ consists of those real
numbers $x \in [0, 1]$ such that in every ternary representation of $x$, the first digit
13 Note that for this proposition, we count "3" and "3.0" and "+003" as all being the same
decimal expansion. If you were trying to be careful, you might disallow leading/trailing
zeroes in decimal expansions.
14 It would really be more appropriate to call it a "radix point," but whatever.
is 1. Similarly, in the $n$th step, we remove those real numbers $x$ such that in
every ternary representation of $x$, the $n$th digit is 1. So what we're left with, the
Cantor set, is the set of real numbers in $[0, 1]$ which can be represented in ternary without
using the digit 1. But of course there are uncountably many such real numbers,
because every sequence of 0s and 2s represents a distinct such real number!
So on the one hand, the Cantor set is big: it is an uncountable set. But on the other
hand, the Cantor set is small: it is nowhere dense, and it has total length zero. We'll
meet it again many times, when these odd properties make it useful.
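The ternary description gives a membership test for rational numbers. The sketch below (function name invented for illustration) reads off ternary digits one at a time, always preferring a representation that avoids the digit 1, and stops once the remainder repeats, which must happen for a rational input.

```python
from fractions import Fraction

def in_cantor_set(x):
    """Decide whether the rational x has a ternary expansion using only 0s and 2s."""
    seen = set()
    while x not in seen:
        seen.add(x)
        if x < 0 or x > 1:
            return False
        if 3 * x <= 1:
            x = 3 * x          # next digit can be 0 (x = 1/3 is read as 0.0222...)
        elif 3 * x >= 2:
            x = 3 * x - 2      # next digit is 2
        else:
            return False       # the digit 1 is unavoidable
    return True                # the digits repeat from here on, never using 1
```

For instance, `in_cantor_set(Fraction(1, 4))` holds because $\frac{1}{4}$ is $0.020202\ldots$ in ternary, while `in_cantor_set(Fraction(1, 2))` fails because $\frac{1}{2}$ is $0.111\ldots$.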
References
[1] Scott Aaronson. Quantum Computing since Democritus. Cambridge: Cambridge University Press, 2013. isbn: 978-0521199568.
[2] Algebra 2, Study Guide and Intervention Workbook. McGraw-Hill Education, 2014.
[3] Darius Bacon. Comment on blog post of Scott Aaronson. url: http://www.scottaaronson.com/blog/?p=391#comment-13569.
[4] Galileo Galilei. Discourses and Mathematical Demonstrations Relating to Two New Sciences. Italy, 1638.
[5] D. Hilbert. "Über das Unendliche." In: Mathematische Annalen 95 (1926), pp. 161-190. url: http://eudml.org/doc/159124.
[6] Chaim Samuel Hönig. "Proof of the well-ordering of cardinal numbers." In: Proceedings of the American Mathematical Society 5.2 (Feb. 1954), p. 312. doi: 10.1090/s0002-9939-1954-0060558-3. url: http://dx.doi.org/10.1090/S0002-9939-1954-0060558-3.
[7] S. Jacoby and S. Roell. An Interview with Susan Jacoby on Atheism. url: http://fivebooks.com/interviews/susan-jacoby-on-atheism.
[8] Jette. The Adventures of Sir Jective. url: http://twistedpencil.com/posting/196.
[9] Kevin. Comment on blog post of Scott Aaronson. url: http://www.scottaaronson.com/blog/?p=391#comment-13542.
[10] James Munkres. Topology. 2nd ed. Prentice Hall, 2000.
[11] Nelson L. Price. Is There A God. url: http://www.nelsonprice.com/is-there-a-god/.
[12] D. Tall. "Cognitive development in advanced mathematics using technology." In: Mathematics Education Research Journal 12 (Dec. 2000), pp. 196-218. doi: 10.1007/BF03217085.
Chapter 2
Discontinuity
Figure 2.3: The topologist's sine curve after a pliers accident, $g(x) = x \sin(1/x)$.
7. GUESSING FUNCTION VALUES 35
$|x_n - L| < \varepsilon$.
(See Figure 2.4.) In this situation, we write $\lim_{n \to \infty} x_n = L$, or just $x_n \to L$.
Figure 2.4: The definition of the limit of a sequence. For any error margin $\varepsilon > 0$,
for all sufficiently large $n$, $x_n$ is within $\varepsilon$ of $L$.
Traditionally, real analysis students find the epsilontics involved in the
definition of a limit to be confusing.$^1$ Maybe a real-life example would help clarify.
You are the pilot of a helicopter carrying secret agents. For their secret spy
mission, it's important that you hover $L$ feet off the ground. Let $x_n$ be the
altitude of the helicopter after you've made $n$ adjustments. (It's a digital
helicopter.) Then $x_n \to L$ means that no matter what tolerance $\varepsilon > 0$ your crazy
boss demands of you, by making enough careful adjustments, you can eventually
guarantee that the helicopter is within $\varepsilon$ of $L$ and always will be in the future.
Meeting higher standards takes more time, of course: if $\varepsilon$ is very small, then $N$
might have to be very big.
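To make the roles of $\varepsilon$ and $N$ concrete, here is a small sketch with a made-up helicopter: the target altitude and the formula for $x_n$ are assumptions chosen purely for illustration, with each adjustment halving the error and overshooting to the other side.

```python
from fractions import Fraction

L = Fraction(500)  # hypothetical target altitude, in feet

def altitude(n):
    """Hypothetical altitude after n adjustments: the error is 100 / 2^n feet."""
    return L + 100 * Fraction(-1, 2) ** n

def first_good_N(eps):
    """Smallest N with |x_N - L| < eps.

    Because the error of this particular sequence shrinks at every step,
    every later n >= N is also within eps of L.
    """
    n = 0
    while abs(altitude(n) - L) >= eps:
        n += 1
    return n
```

A stricter boss means a bigger $N$: a tolerance of one tenth of a foot needs 10 adjustments here, and a tolerance of one thousandth of a foot needs more.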
Now let's move on to defining continuity. You and your spouse want to go on
a trip to the Moon. Your spouse has been obsessively watching the fluctuating
rocket ticket prices, trying to get the best possible deal. "I finally bought the
tickets just now at time $t$," your spouse says.
"How much did they end up costing?" you ask.
1 Steven Krantz reports that when asked to give the $\varepsilon$-$\delta$ definition of continuity on a quiz,
one student responded: "For every $\varepsilon > 0$ there is a $\delta > 0$ such that you can draw the graph
without lifting your pencil from the paper." [TODO cite Mathematical Apocrypha]
"You don't wanna know," your spouse replies. But you really do wanna
know, so you ask, "Well, how much did they cost at time $t - 100$?"
"Only \$200! We should've bought them then!"
"How about at time $t - 10$?"
"They shot up to \$1000, which scared me."
"And at time $t - 1$?"
"Down to \$600. I thought I'd better grab them soon."
"What about at time $t - 0.1$?"
"\$580." Having learned the prices at times near $t$, you can extrapolate to
guess the price at $t$, but you'd have to assume that the price doesn't fluctuate
too wildly. You keep needling your spouse, learning the prices at times $t - 0.01$,
$t - 0.001$, $t - 0.0001$, $t - 0.00001$... You gain more and more confidence in your
extrapolations, because you have to assume less and less about the behavior of
the price. After infinitely many questions, you've learned the price at a sequence
of times $t_n$ with $t_n \to t$, so you just have to extrapolate "infinitesimally" to infer
the price at $t$. All you're assuming now is that the price function is continuous
at $t$.
We'll end this section with a ridiculous theorem about infinitesimal extrapolation
even in the face of discontinuity, from [7]. Let's play a game. We choose a
function $f : \mathbb{R} \to \mathbb{R}$. Then a point $x_0 \in \mathbb{R}$ is randomly chosen (drawn from, say,
a standard normal distribution, or whatever.) We reveal to you the restriction
of $f$ to $\mathbb{R} \setminus \{x_0\}$. (I.e. you get to know $f(x)$ for every $x \ne x_0$.) Then you have
to guess what $f(x_0)$ is. You win if you get it right; we win if you get it wrong.
You're probably thinking, "I'll take a limit!" You could find a sequence
$x_n \to x_0$ with $x_n \ne x_0$, and evaluate $\lim_{n \to \infty} f(x_n)$. If that limit exists, it
seems like the obvious guess. If we choose a continuous $f$ and you follow this
strategy, you're guaranteed to win.
Figure 2.5: A sequence (in red) showing that $f(x) = \sin(1/x)$ is discontinuous at $0$.
The sequence suggests that $f(0) = 1$, but in actuality $f(0) = 0$.
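One sequence with the behavior described in the caption can be written down explicitly. The particular choice $x_n = 1/(\frac{\pi}{2} + 2\pi n)$ below is an assumption made for illustration; the figure may use a different sequence.

```python
import math

def f(x):
    """sin(1/x), extended by f(0) = 0 as in the caption."""
    return math.sin(1 / x) if x != 0 else 0.0

def x_n(n):
    """A sequence tending to 0 along which sin(1/x) equals 1 (up to rounding)."""
    return 1 / (math.pi / 2 + 2 * math.pi * n)
```

Along this sequence the inputs shrink toward $0$ while the outputs stay at $1$, so taking a limit would lead you to the wrong guess for $f(0)$.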
Figure 2.7: This is the sort of picture that you have to deal with in our guessing
game. Every value of the function is revealed except one mysterious point.
But we're not going to make it that easy. We don't make any promises at
all about $f$. Can you still force a guaranteed win? Nah, you're doomed to
occasionally give wrong answers. But, absurdly, you can force an almost sure
win:
Theorem 9. There is a strategy you can follow which ensures that for any
function $f$ we choose, there are only finitely many values of $x_0$ which lead you
to lose. In particular, no matter which $f$ we choose, your probability of winning
is 100%.
Proof. Define a binary relation $\sim$ on the set of all functions $\mathbb{R} \to \mathbb{R}$ by declaring
that $f \sim g$ if $f$ and $g$ agree on all but finitely many points. This relation is an
equivalence relation, i.e. it is reflexive ($f \sim f$), symmetric ($f \sim g \implies g \sim f$)
and transitive ($f \sim g$, $g \sim h \implies f \sim h$.) Therefore, $\sim$ partitions the set of
all functions $\mathbb{R} \to \mathbb{R}$ up into equivalence classes: maximal sets of functions any
two of which agree on all but finitely many points. For each equivalence class
$C$, choose one representative function $f_C \in C$.
When you're presented with $f$ with its value at $x_0$ hidden, figure out which
equivalence class $f$ belongs to (call it $C$.) Then guess that $f(x_0) = f_C(x_0)$. For
any $f$, there are only finitely many $x_0$ causing you to lose, because $f \sim f_C$!
That proof was our first$^2$ encounter with the Axiom of Choice (AC), which
is the axiom of set theory which allows the step where we defined $f_C$.
different notions of "function," of varying degrees of rigor. For the first couple
hundred years, it was popular to think of functions in terms of "formulas"
or "analytic expressions," whatever that means. E.g. in 1748, Euler gave a
definition [4]:
    A function of a variable quantity is an analytic expression composed
    in any way whatsoever of the variable quantity and numbers or
    constant quantities.
In 1829 [3], Dirichlet gave the indicator function of $\mathbb{Q}$ as an example of a function
with no integral (see Section ??). Since this function is not really defined by a formula,
some infer that Dirichlet had internalized the modern concept of a function, for which they
therefore give him credit. But Lakatos correctly points out [6, p 151] that the
credit is undeserved. Dirichlet never gave any such definition.
Figure 2.12: The intermediate value theorem: in order for a continuous function to
get from one value to another, it must pass through every value in between.
Figure 2.13: When you drive in a car, your distance from Wellington varies
continuously. Every point on the Earth's surface which is 550 miles away from Wellington
is at sea. So by the IVT, if you want to drive from New Zealand to Australia, you're
going to have to build a car that can drive through water. Or a bridge or something.
9. CONWAYS BASE-13 FUNCTION 43
Hey, maybe now we can argue that Bolzano's formal definition of continuity
successfully captures Euler's intuitive idea! A function which satisfies the
conclusion of the IVT is called a Darboux function. That is, $f$ is a Darboux
function if for every $a < b$ and every $y$ between $f(a)$ and $f(b)$, there is an
$x \in (a, b)$ so that $f(x) = y$. Maybe that's a definition of continuity that Euler
could get behind! The IVT says that every continuous function is Darboux, so
now we just have to prove that every Darboux function is continuous.
There's one small hitch: that last statement is extremely false! The function
$f(x) = \sin(1/x)$ is Darboux, but discontinuous at one point. It gets worse.
We'll give a function $f : \mathbb{R} \to \mathbb{R}$ such that for every open interval $(a, b)$, we have
$f((a, b)) = \mathbb{R}$. That is, for every open interval $(a, b)$ and every $y \in \mathbb{R}$, there
exists $x \in (a, b)$ so that $f(x) = y$. So $f$ is certainly Darboux, but $f$ is not even
remotely close to continuous. In fact, it's discontinuous at every point (like the
Dirichlet function, but much crazier.)
Figure 2.14: A truncated graph of Conway's base-13 function (in black).
Figure 2.15: A truncated graph of Conway's base-13 function (in white).
Figure 2.14 is a little misleading. The graph of $f$ isn't all of $\mathbb{R}^2$ (it's a
function, after all!) But every disc in $\mathbb{R}^2$ contains a point in the graph of $f$. In
other words, the graph of $f$ is a dense subset of $\mathbb{R}^2$.
So what function has this bizarre property? One example is by British
mathematician John Horton Conway, who (as of 2015) is still alive, unlike the
other mathematicians we've encountered. His idea is to represent numbers in
base 13, with these symbols:
    $0\ 1\ 2\ 3\ 4\ 5\ 6\ 7\ 8\ 9\ \oplus\ \ominus\ \odot$
Every real number has a unique base-13 expansion with no trailing $\odot$ symbols
(recall Section 6.) Conway's base-13 function $f$ is defined with respect to this
expansion as follows.
For the interesting case, suppose the base-13 expansion of $x$ is of the form
$AB$, where removing all the circles from the symbols in $B$ yields a sensible
base-10 expansion for a real number $y$. Then set $f(x) = y$.
Otherwise, just set $f(x) = 0$.
For example, let x be the real number with base-13 expansion
x= + 6 . 2 4 3 . 1 4 1 5 9 2 6 ...
| {z }| {z }
A B
Proposition 7. Let $f$ denote Conway's base-13 function. Then for every $a < b$
and every $y$, there is some $x$ such that $a < x < b$ and $f(x) = y$.
Proof. Start with the base-13 expansion for the midpoint $\frac{1}{2}(a + b)$. If we go out
far enough in this base-13 expansion, we can change anything we want and we'll
still have a number in $(a, b)$. So in particular, we can replace the sequence of
subsequent digits with the circled base-10 expansion of $y$, to obtain an $x \in (a, b)$
such that $f(x) = y$. (See Figure 2.16.)
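The proof is constructive enough to run, at least for numbers in $[0, 1]$ with terminating expansions. The sketch below rests on several assumptions that are not from the text: the letters `P`, `M`, `D` stand in for the three circled symbols, a "sensible" tail is taken to mean a sign, at least one digit, one radix symbol, then digits, and all function names are invented.

```python
from fractions import Fraction

# Assumed encoding: P, M, D are the circled plus, minus, and dot (digits 10, 11, 12).
SYMBOLS = "0123456789PMD"

def value(digits):
    """The number in [0, 1) whose base-13 expansion is 0.digits000..."""
    return sum(Fraction(SYMBOLS.index(s), 13 ** (i + 1)) for i, s in enumerate(digits))

def conway(digits):
    """Conway's function at the number with base-13 expansion 0.digits000..."""
    start = max(digits.rfind("P"), digits.rfind("M"))  # where the tail B would begin
    if start < 0:
        return Fraction(0)
    whole, dot, frac = digits[start + 1:].partition("D")
    if not whole or not dot or "D" in frac:
        return Fraction(0)                             # tail is not a sensible decimal
    y = Fraction(int(whole + frac), 10 ** len(frac))
    return y if digits[start] == "P" else -y

def hit(a, b, y_digits):
    """For 0 <= a < b <= 1, return digits of some x in (a, b) whose tail is y_digits."""
    k = 1
    while Fraction(1, 13 ** k) >= (b - a) / 2:         # go out far enough
        k += 1
    prefix, r = "", (a + b) / 2
    for _ in range(k):                                 # first k digits of the midpoint
        r *= 13
        d = int(r)
        prefix += SYMBOLS[d]
        r -= d
    return prefix + y_digits
```

For example, `hit(Fraction(1, 3), Fraction(1, 3) + Fraction(1, 1000), "M3D1415")` produces an expansion of a number in that tiny interval on which `conway` returns $-3.1415$.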
So Darboux functions are a lot more complicated than continuous functions.
In fact, Darboux functions are absurdly expressive:
Theorem 11 (Sierpinski). For every function $f : \mathbb{R} \to \mathbb{R}$, there are two Darboux
functions $g, h$ so that $f = g + h$.
Proof. Define an equivalence relation $\sim$ on $\mathbb{R}$ by declaring that $x \sim y$ if $x - y \in \mathbb{Q}$.
Let $E$ be the set of equivalence classes. Observe that
    $\frac{1}{2}(a + b) = 0.518\,|\,1059\ldots$ (base 13)
    $y = +3.1415926\ldots$ (base 10)
    $x = 0.518 \oplus 3 \odot 14\ldots$ (base 13)
Figure 2.16: The proof of Proposition 7. The location of the vertical bar in the
base-13 expansion of $\frac{1}{2}(a + b)$ is chosen based on how big $b - a$ is, to make sure that
$x \in (a, b)$.
What's the moral of this story? Is there something wrong with Bolzano's
definition of continuity? Nah. Euler would probably agree that Conway's base-13
function does not deserve to be called continuous. The notion of a Darboux
function is not a reasonable definition of continuity. It's hard to say what it
means for the graph of a function to be described by "freely leading the hand,"
but it really ought to be more conservative than continuity, not more liberal.
10 Continuity is uncommon
In the previous couple of sections, we saw some really nasty functions with
tons of discontinuities. But in everyday life, it seems like we only run into
continuous functions. You might be tempted to infer that most functions are
continuous. But in truth, in the sense of cardinality, the vast majority of func-
tions are discontinuous!
Figure 2.17: To recover the full graph of $f$ given the values of $f$ on $\mathbb{Q}$, just connect
the dots.
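Then $|C(\mathbb{R}, \mathbb{R})| = |\mathbb{R}| = \beth_1$. (In contrast, note that the set $\mathbb{R}^{\mathbb{R}}$ of all functions
$\mathbb{R} \to \mathbb{R}$ has cardinality $\beth_2$.)
Proof. Since $\mathbb{Q}$ is dense, to specify a continuous function $f : \mathbb{R} \to \mathbb{R}$, it suffices
to give the restriction of $f$ to $\mathbb{Q}$. (The value of $f$ at any point $x$ can be recovered
from its values on $\mathbb{Q}$, because there's a sequence $x_1, x_2, \ldots$ of rational numbers
converging to $x$, and $f(x) = \lim_{n \to \infty} f(x_n)$. See Figure 2.17.) Therefore,
$|C(\mathbb{R}, \mathbb{R})| \le |\mathbb{R}^{\mathbb{Q}}| = |\mathbb{R}|^{|\mathbb{N}|} = |\mathbb{R}|$, and the constant functions give the reverse inequality.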
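The "connect the dots" step can be imitated numerically. In this sketch the function $\cos$ and the target point $\sqrt{2}$ are arbitrary choices made for illustration; the only inputs ever fed to the function are rational.

```python
import math
from fractions import Fraction

def f_on_rationals(q):
    """A stand-in continuous function, only ever evaluated at rational inputs."""
    return math.cos(float(q))

def sqrt2_approximations(count):
    """Rational numbers converging to sqrt(2), via Newton's iteration x -> (x + 2/x)/2."""
    x, out = Fraction(1), []
    for _ in range(count):
        x = (x + 2 / x) / 2
        out.append(x)
    return out
```

The values `f_on_rationals(q)` along `sqrt2_approximations(6)` settle down to $\cos(\sqrt{2})$, even though $\sqrt{2}$ itself was never used as an input.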
11 Thomae's function
We've seen some very discontinuous functions. But bigger is not always better.
Maybe you're especially fond of some set $D \subseteq \mathbb{R}$. Like discontinuity
connoisseurs, we can look for a function which is discontinuous exactly at the $x$ values
in $D$. For now, let's consider the case $D = \mathbb{Q}$. In the 19th century, the German
mathematician Carl Johannes Thomae devised his namesake function:
$$f(x) = \begin{cases} 0 & \text{if } x \text{ is irrational} \\ \frac{1}{q} & \text{if } x = \frac{p}{q}, \text{ with } \frac{p}{q} \text{ reduced and } q > 0. \end{cases} \tag{2.3}$$
(See Figure 2.18.)
Proposition 9. Thomae's function is continuous at irrational $x$ and
discontinuous at rational $x$.
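The reason Thomae's function manages to be continuous at irrational points is that it is only rarely large: in a bounded interval, only finitely many points have $f(x) \ge \varepsilon$. The sketch below (names invented for illustration) handles rational inputs only, since irrational numbers have no exact representation here.

```python
import math
from fractions import Fraction

def thomae(x):
    """Thomae's function at a rational x; Fraction stores p/q in reduced form."""
    return Fraction(1, x.denominator)

def large_values(eps, lo, hi):
    """All points of [lo, hi] where Thomae's function is at least eps."""
    points = set()
    for q in range(1, int(1 / eps) + 1):               # denominators with 1/q >= eps
        for p in range(math.ceil(lo * q), math.floor(hi * q) + 1):
            points.add(Fraction(p, q))                 # the set removes duplicates like 2/4
    return points
```

With `eps = Fraction(1, 4)` on $[0, 1]$ this finds just seven points, so every irrational number has a neighborhood avoiding all of them.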
12. DISCONTINUITIES OF MONOTONE FUNCTIONS 47
Figure 2.19: A monotone increasing function on the left and a monotone decreasing
function on the right.
(It's basically like the definition of the limit of a sequence, with $x$ playing
the role of $n$ and $\delta$ playing the role of $N$.) If you just check, you'll see that
$f$ is continuous at $c$ if and only if $\lim_{x \to c} f(x) = f(c)$. So now we can divide
the crime of discontinuity into three tiers, depending on how badly $\lim_{x \to c} f(x)$
fails to equal $f(c)$ (see Figure 2.20):
1. Suppose $\lim_{x \to c} f(x)$ exists, but it doesn't equal $f(c)$. Then $f$ is charged
   with having a removable discontinuity at $c$. For this minor infraction, $f$
   is required to enroll in a 12-step program, where it learns how to change
   its value at $c$ and thereby become continuous.
2. The left limit, denoted $\lim_{x \to c^-} f(x)$ or $f(c-)$, is defined just like $\lim_{x \to c} f(x)$,
   except we only pay attention to $x < c$. Similarly for right limits. Suppose
   $f(c-)$ and $f(c+)$ both exist, but they're not equal, and hence $\lim_{x \to c} f(x)$
   doesn't exist. Then $f$ is charged with having a jump discontinuity at $c$.
   For this misdemeanor, $f$ is incarcerated in a correctional facility, where
   professionals attempt to decrease $f(x)$ for all $x$ on one side of $c$, thereby
   eliminating the jump and restoring continuity.
3. Finally, suppose either $f(c-)$ or $f(c+)$ does not exist. Then $f$ is charged
   with having an essential discontinuity at $c$, which is a felony. Making $f$
   continuous at $c$ would require fundamentally altering $f$'s character. So $f$ is
   just sentenced to life imprisonment, to protect society from its incorrigible,
   deviant behavior.
Figure 2.21: You open the right envelope and see $10^6$. Do you guess that $x_1 = 10^6$
or $x_2 = 10^6$? Does $10^6$ seem like a small number, or a big number? What a dumb
question. Surely, all you can do is toss a coin and hope for the best... right? Nope!
a jump discontinuity. Since $f(c-) < f(c+)$, there is some rational number $q_c$
with $f(c-) < q_c < f(c+)$. The map $c \mapsto q_c$ is an injection from the set of
discontinuities of $f$ to $\mathbb{Q}$.
Figure 2.22: The strategy which gives you a win probability greater than 50%. The
area of the green region is the probability that $y$ really does fall between $x_1$ and $x_2$,
in which case you win. If the blue or yellow event occurs, you'll win if and only if you
open the envelope containing $x_2$ or $x_1$, respectively.
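The strategy in Figure 2.22 can be simulated. This sketch assumes a particular reading of the game that is not spelled out in the captions: two distinct numbers $x_1 < x_2$ are sealed in envelopes, you open one chosen uniformly at random, and you must say whether you are looking at the larger number. It also assumes $y$ is drawn from a standard normal distribution, and you guess "larger" exactly when the number you see exceeds $y$.

```python
import math
import random

def normal_cdf(t):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

def win_probability(x1, x2):
    """Exact win probability for x1 < x2: a sure win if y lands between, else a coin flip."""
    between = normal_cdf(x2) - normal_cdf(x1)
    return 0.5 + 0.5 * between

def play_once(x1, x2, rng):
    seen = rng.choice([x1, x2])      # open a random envelope
    y = rng.gauss(0, 1)              # draw the random threshold
    guess_larger = seen > y
    return guess_larger == (seen == x2)

def simulate(x1, x2, trials, seed=0):
    """Fraction of simulated games won."""
    rng = random.Random(seed)
    return sum(play_once(x1, x2, rng) for _ in range(trials)) / trials
```

The edge over a coin toss is half the probability that $y$ lands between the two numbers, which is positive for any pair $x_1 < x_2$ but can be extremely small.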
Figure 2.23: The function $f$ used to prove Theorem 13 in the case $D = \mathbb{N}$.
Figure 2.24: Let $E$ denote the gray region. Then $x$ is an interior point of $E$, $y$ is an
exterior point of $E$, and $z$ is a boundary point of $E$.
a point $x \in \mathbb{R}^n$ and a radius $r > 0$, let $B_r(x)$ denote the open ball of radius $r$
centered at $x$.
14 Sets of discontinuities
Does Thomae's function have a twin? That is, does there exist a function which
is continuous at rational points and discontinuous at irrational points?
As in the last section, $D(f)$ is the set of points where $f$ is discontinuous.
We've seen examples of messed up functions with $D(f) = \mathbb{R}$, $D(f) = \mathbb{Q}$,
etc. We have not, however, seen any hints about how you might rule out the
possibility of a function $f$ with some given discontinuity set.
In Section 12, we saw a satisfying theorem: There exists a monotone function
$f$ such that $D(f) = D$ if and only if $D$ is countable. In this section, we'll prove
an analogous theorem without the "monotone" qualifier.
Figure 2.25: Adolf Hitler does not appreciate the terms "open" and "closed"
[hitlertopology].
Definition 14. Fix a set $E \subseteq \mathbb{R}^n$. We say that $E$ is closed if $\partial E \subseteq E$. We say
that $E$ is open if $\partial E \subseteq E^c$.
For example, thankfully, open intervals are open and closed intervals are
closed. An open set is one where each point has some "wiggle room." "Fuzzy
set" probably would have been a better term for open sets. The term "closed
set" is more reasonable, because a set $E$ is (topologically) closed if and only if
it is closed under the operation of taking limits. That is, $E$ is closed if and only
if whenever $x_n$ is a convergent sequence of points in $E$, we have $\lim x_n \in E$.
Warning: some sets, like $[0, 1)$, are neither open nor closed, and other sets, like
$\emptyset$, are both open and closed. (See Figure 2.25.)
If $E = \partial E$, then there's a function $f$ with $D(f) = E$,
namely the indicator function of $E$. By adapting Dirichlet's simple trick, we can handle
all closed sets, even the ones with nonempty interiors.
Proposition 12. Suppose $E \subseteq \mathbb{R}$ is closed. Then there is some function $f$ with
$D(f) = E$.
Proof. Define
$$f(x) = \begin{cases} 1 & \text{if } x \in E \cap \mathbb{Q} \\ -1 & \text{if } x \in E \setminus \mathbb{Q} \\ 0 & \text{if } x \notin E. \end{cases}$$
(See Figure 2.26.) This is obviously continuous at every $x \notin E$, because there's a
neighborhood around $x$ on which $f$ is constant. Conversely, suppose $x \in E$, so
Figure 2.26: The function used to prove Proposition 12 in the case $E = [-1, 1]$.
(An alternative way to prove Proposition 12 is to show that every closed
subset of $\mathbb{R}$ is the boundary of some set.) How about the converse? Do we
have our characterization: are discontinuity sets precisely closed sets? Nah,
that hypothesis has already been falsified. For example, $\mathbb{Q}$ is not closed, but it's
the discontinuity set of Thomae's function. The real criterion is slightly more
complicated.
(The term $F_\sigma$ comes from the French words fermé and somme, meaning
"closed" and "union.") For example, any countable set, like $\mathbb{Q}$, is $F_\sigma$, because
singleton sets are closed. Any closed set, like $\mathbb{R}$, is trivially $F_\sigma$. The set
$\mathbb{R} \setminus \{0\}$ is $F_\sigma$, because
$$\mathbb{R} \setminus \{0\} = \bigcup_{n \in \mathbb{N}} \left( \left(-\infty, -\frac{1}{n}\right] \cup \left[\frac{1}{n}, \infty\right) \right).$$
Notice that every set of discontinuities that we've encountered so far is $F_\sigma$! This
is no coincidence. Using the basic idea behind Thomae's function, we can tweak
the proof of Proposition 12 to handle arbitrary $F_\sigma$ sets.
(See Figure 2.27.) First, suppose $x \in E$. The proof used for Proposition 12 still
applies, showing that $f$ is discontinuous at $x$. Conversely, suppose $x \notin E$, so
$f(x) = 0$. Suppose $x_m \to x$. Since each $E_n$ is closed, the sequence $x_m$ must
eventually escape $E_n$ and never return. Once $x_m$ has escaped $E_1, \ldots, E_n$, we
have $|f(x_m)| \le \frac{1}{n}$. So $f(x_m) \to 0$, and $f$ is continuous at $x$.
Figure 2.27: The function used to prove Theorem 14 in the case $E_n = [\frac{1}{n}, 3 - \frac{1}{n}]$,
which is discontinuous precisely on $E = (0, 3)$.
Figure 2.28: The oscillation of $f$ in $E$ is the height of the smallest box that contains
the graph of the restriction of $f$ to $E$. For example, the oscillation of $\sin(1/x)$ in any
interval containing $0$ is $2$.
Proof sketch of Theorem 15. We can write
$$D(f) = \bigcup_{n \in \mathbb{N}} \left\{ x \in \mathbb{R} : \omega_f(x) \ge \frac{1}{n} \right\}. \tag{2.5}$$
Proposition 13. Let $\mathcal{D}$ denote the set of all $F_\sigma$ subsets of $\mathbb{R}$. Then $|\mathcal{D}| = |\mathbb{R}|$
(which is smaller than $|\mathcal{P}(\mathbb{R})|$ by Cantor's theorem.)
Proof sketch. It turns out that every open set $U \subseteq \mathbb{R}$ can be written as a
countable union of disjoint open intervals. A closed set is just a complement
of an open set, so a closed set can be specified by a sequence of real numbers.
Hence, an arbitrary element of $\mathcal{D}$ is specified by a sequence of sequences of reals.
Therefore,
$$|\mathcal{D}| \le (|\mathbb{R}|^{|\mathbb{N}|})^{|\mathbb{N}|} = |\mathbb{R}|^{|\mathbb{N}|} = |\mathbb{R}|.$$
But the story so far isn't entirely satisfying, because it's not obvious how
to identify examples of sets which are not $F_\sigma$. Can the set $\mathbb{R} \setminus \mathbb{Q}$ be written as
a countable union of closed sets? It's difficult to say! (That's the thing about
characterization theorems. You're never really sure when you're done.) Stay
tuned, we'll answer this question in Section 15.
In Section 2, we saw Cantor's famous 1891 diagonal argument, which proved
that $\mathbb{R}$ is uncountable. Diagonalization is a great trick to have up your sleeve;
we saw in Section ?? that it can be used to prove that $|\mathcal{P}(S)| > |S|$ for every set
$S$. Historically, diagonalization was not the first technique used to prove that $\mathbb{R}$
is uncountable. Let's take a look at (a slight variant of) Cantor's original proof
that $\mathbb{R}$ is uncountable, from 1874. The older proof is actually more
real-analysis-ish than the slick diagonalization trick, and if you understand the proof, you'll
be ready to meet meager sets. A set $E \subseteq \mathbb{R}$ is bounded if $\operatorname{diam}(E) < \infty$.
Figure 2.31: Cantor's original proof that $\mathbb{R}$ is uncountable. Having already defined
$I_{n-1}$ (in black), we can find a subinterval $I_n$ (in blue) which misses the single point
$x_n$ (in red.)
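The step pictured in Figure 2.31 is easy to carry out for any finite list of points. The sketch below is one possible way to choose the subintervals (taking the left or right third of the previous interval), which is an assumption; the proof only needs some closed subinterval that misses $x_n$.

```python
from fractions import Fraction

def avoid(points):
    """Nested closed intervals inside [0, 1]; the nth one misses the nth listed point."""
    lo, hi = Fraction(0), Fraction(1)
    for x in points:
        third = (hi - lo) / 3
        if lo <= x <= lo + third:
            lo = hi - third        # x is in the left third, so move to the right third
        else:
            hi = lo + third        # otherwise the left third misses x
    return lo, hi
```

Because the intervals are nested, the final interval misses every point on the list, so any number inside it was not enumerated.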
References
[1] Bernard Bolzano. Rein analytischer Beweis des Lehrsatzes daß zwischen je zwey Werthen, die ein entgegengesetzetes Resultat gewähren, wenigstens eine reelle Wurzel der Gleichung liege. Gottlieb Haase, 1817.
[2] Nicolas Bourbaki. Berlin: Springer, 2006. isbn: 3540340343.
Chapter 3
Series
16 Stacking books
How far over the edge of a table can a stack of books protrude without toppling?
(Figure 3.1)
Figure 3.1: The book stacking problem with $N = 4$. We are interested in maximizing
the overhang $d$.
If you're in the mood to solve this puzzle yourself, close this book now and
ponder. Otherwise, read on for the solution.
Figure 3.3: The stack on the left is unbalanced and will topple over. The COM of
the entire stack (marked in the figure) is over the table like it should be, but the COM of the
top two books (also marked) is to the right of the third book. The top two books will
pivot about the top right corner of the third book, as shown on the right. Note: We
assume that the books have uniform density, so the COM of a set of books is just the
average of their spatial centers.
[Figure: a stack of books on a table, with the labels 1/2, 1/4, 1/6, 1/8.]
Proof. Say there are $N$ books in the stack. By induction, we can assume that if
you held the bottom book steady, the stack wouldn't fall over. The only thing
to worry about is the horizontal component of the COM of the whole stack
compared to the edge of the table. Put the origin at the upper right corner of
the table, so that the COM of the top $N - 1$ books is at most $\frac{1}{2N}$ (by induction.)
Hence, the COM of all the books is at most
$$\frac{1}{N} \left[ \underbrace{1 \cdot \left( -\frac{1}{2} + \frac{1}{2N} \right)}_{\text{COM of bottom book}} + (N - 1) \cdot \frac{1}{2N} \right] = 0.$$
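The balance condition can be checked exactly for a concrete stack. This sketch assumes books of length 1 and the harmonic offsets $\frac{1}{2}, \frac{1}{4}, \frac{1}{6}, \ldots$ counted from the top, matching the $\frac{1}{2N}$ in the proof; the function names are invented for illustration.

```python
from fractions import Fraction

def harmonic_stack(N):
    """Right edges of the N books, top book first, with the table edge at 0.

    The kth book from the top sticks out 1/(2k) past whatever lies beneath it.
    """
    return [sum(Fraction(1, 2 * j) for j in range(k, N + 1)) for k in range(1, N + 1)]

def is_balanced(edges):
    """Check that the COM of the top k books never passes the edge of its support."""
    supports = edges[1:] + [Fraction(0)]           # the book below, and finally the table
    for k, support in enumerate(supports, start=1):
        centers = [e - Fraction(1, 2) for e in edges[:k]]
        if sum(centers) / k > support:
            return False
    return True
```

The overhang is `harmonic_stack(N)[0]`, which for `N = 52` comes out close to the 2.27 quoted in Figure 3.8. The stack is only barely balanced: nudging the top book a little further out makes `is_balanced` fail.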
Figure 3.5: The harmonic series is not to be confused with the harmonica series.
A series is an expression$^1$ of the form $\sum_{n=1}^\infty a_n$, where $a_1, a_2, \ldots$ is a sequence
of real numbers (the terms of the series.) The sequence of partial sums of the
series is the sequence $S_1, S_2, \ldots$ where $S_N = \sum_{n=1}^N a_n$. We say that the series
converges/diverges if the sequence of partial sums converges/diverges. Series
can diverge because the limit is infinite, e.g. $1 + 1 + 1 + \cdots$, or because the limit
does not exist, e.g. $1 - 1 + 1 - 1 + \cdots$.
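Partial sums are easy to compute, which makes the two kinds of divergence visible. A minimal sketch (the helper name is invented for illustration):

```python
from itertools import accumulate, cycle, islice, repeat

def partial_sums(terms, count):
    """The first `count` partial sums S_1, ..., S_count of a (possibly infinite) series."""
    return list(islice(accumulate(terms), count))
```

Here `partial_sums(repeat(1), 5)` grows without bound, while `partial_sums(cycle([1, -1]), 6)` bounces between 1 and 0 forever.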
Proposition 16. The harmonic series $\sum_{n=1}^\infty \frac{1}{n}$ diverges, i.e.
$$\lim_{N \to \infty} \sum_{n=1}^N \frac{1}{n} = \infty.$$
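Proof. This proof was discovered by the philosopher Nicole Oresme in the 1300s.
(Figure 3.6) We'll make the series a little smaller, and show that it still diverges.
Replace each denominator with the next power of two to appear:
$$\sum_{n=1}^\infty \frac{1}{n} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \cdots$$
$$\ge \frac{1}{1} + \frac{1}{2} + \underbrace{\frac{1}{4} + \frac{1}{4}}_{1/2} + \underbrace{\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}}_{1/2} + \frac{1}{16} + \cdots$$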
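Oresme's comparison can be verified exactly for the first few blocks. In this sketch (names invented for illustration), each denominator $n$ is rounded up to the next power of two, so the first $2^k$ terms of the smaller series add up to exactly $1 + \frac{k}{2}$.

```python
from fractions import Fraction

def harmonic(N):
    """Exact partial sum 1/1 + 1/2 + ... + 1/N."""
    return sum(Fraction(1, n) for n in range(1, N + 1))

def next_power_of_two(n):
    """Smallest power of two that is at least n."""
    p = 1
    while p < n:
        p *= 2
    return p

def oresme_lower_bound(N):
    """Partial sum of the smaller series with denominators rounded up to powers of two."""
    return sum(Fraction(1, next_power_of_two(n)) for n in range(1, N + 1))
```

Since $1 + \frac{k}{2}$ grows without bound as $k$ grows, the partial sums of the harmonic series do too.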
Figure 3.6: Other than his proof that the harmonic series diverges, Oresme's main
contribution to the world may have been the invention of bar charts.
Figure 3.7: The proof that the harmonic series $\frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \ldots$ diverges (the first
term 1 is not drawn.) We divide the infinitely many terms of the series into blocks,
and alternatingly color the blocks gray and red. Each block has only finitely many
terms (twice as many as the previous block) yet each block has a total width of at
least $\frac{1}{2}$.
Figure 3.8: A harmonic stack of 52 books, which achieves an overhang of about 2.27.
Figure 3.9: You can get near the theoretical optimal overhang with a deck of 52
playing cards.
out that there is a constant $\gamma \approx 0.58$ called the Euler-Mascheroni constant such
that $\sum_{n=1}^N \frac{1}{n} \approx \gamma + \ln N$ in the sense that
$$\lim_{N \to \infty} \left( \left[ \sum_{n=1}^N \frac{1}{n} \right] - \ln N \right) = \gamma.$$
(Figures 3.10, 3.11).
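The limit can be watched numerically. This is a floating-point sketch with invented function names; the comparison value $0.5772156649$ is the standard decimal approximation of $\gamma$.

```python
import math

def harmonic(N):
    """Floating-point partial sum 1/1 + ... + 1/N, summed accurately with fsum."""
    return math.fsum(1 / n for n in range(1, N + 1))

def gamma_estimate(N):
    """The quantity whose limit defines the Euler-Mascheroni constant."""
    return harmonic(N) - math.log(N)
```

The estimates decrease toward $\gamma$ as $N$ grows, with an error of roughly $\frac{1}{2N}$.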
To paraphrase$^2$ Daniel Shanks, $\ln N$ goes to infinity with great dignity.
Turning things around, the number of books you'd need to achieve an overhang of
$d$ using a harmonic stack grows very rapidly with $d$; it scales like $e^{2d}$. Even
for smallish distances like $d = 30$, you would need far more books than can be
found on Earth.
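The $e^{2d}$ scaling can be seen by counting books directly for small $d$ and estimating for large $d$. The sketch assumes the overhang of a harmonic stack of $N$ books is $\frac{1}{2}\sum_{n=1}^N \frac{1}{n}$, consistent with the figures quoted for 9 and 52 books; the estimate simply inverts $\frac{1}{2}(\gamma + \ln N) = d$.

```python
import math

EULER_GAMMA = 0.5772156649  # standard decimal approximation

def books_needed(d):
    """Smallest N whose harmonic stack has overhang at least d (direct count)."""
    N, total = 0, 0.0
    while total / 2 < d:
        N += 1
        total += 1 / N
    return N

def books_needed_estimate(d):
    """Approximate book count from solving (gamma + ln N) / 2 = d for N."""
    return math.exp(2 * d - EULER_GAMMA)
```

An overhang of 1 takes 4 books and an overhang of 2 already takes 31; for $d = 30$ the estimate exceeds $10^{25}$.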
So the harmonic stack isn't as exciting as it seemed. Unfortunately, the
harmonic stack is optimal:
Proposition 17. The maximum overhang that can be achieved by a stack of $N$
books is that achieved by the harmonic stack of $N$ books.
(The proof, which is elementary, is omitted.) One way to get around this
annoyance is to relax the model by allowing multiple books at each vertical
position, side by side. (Figure 3.12) It turns out that in this new model, the
number of books needed to reach a distance $d$ scales like $d^3$ instead of like $e^{2d}$.
[TODO cite https://math.dartmouth.edu/~pw/papers/maxover.pdf] Much
more practical.
2 The original quote: "log log log x goes to infinity with great dignity."
Figure 3.11: The harmonic stack is shaped like the exponential function (or the
natural log function if your head is sideways.) The distance marked $a$ is approximately
$\gamma/2$, where $\gamma$ is the Euler-Mascheroni constant.
Finally, we'll address two misconceptions about book stacking.
Misconception one: Some people mistakenly summarize our discussion of harmonic stacks
by saying, "You can build a stack of books that reaches infinitely far away from
the table." But "infinitely far" is much different than "arbitrarily far." (What
physics is even supposed to apply to an infinite stack of books?)
Misconception two: Some people mistakenly believe that you can add books
to the top of an ever-growing stack, one by one, in such a way that the overhang
goes to $\infty$ as time progresses. Our discussion of harmonic stacks did not prove
this claim; notice that to get from a harmonic stack of $N$ books to a harmonic
stack of $N + 1$ books, you have to add another book to the bottom of the
stack! And in fact, in the model where no two books can have the same vertical
position, the claim is false. Since this point is a little bit subtle, and it isn't
discussed anywhere outside this book to the best of our knowledge, we give a
fairly detailed statement and proof in Appendix 1.
Figure 3.12: When you allow books to be side by side (unlike our original problem),
new possibilities open up. A harmonic stack of 9 books achieves an overhang of
$d \approx 1.41$, but this simple "diamond" stack of 9 books achieves a superior overhang of
$d = 1.5$.
\[ 1 - 1 + 1 - 1 + 1 - 1 + \cdots, \tag{3.1} \]
which diverges since its partial sums form the divergent sequence $1, 0, 1, 0, \ldots$. Now insert some parentheses to "help it along":
\[ (1 - 1) + (1 - 1) + (1 - 1) + \cdots \tag{3.2} \]
\[ 1 + (-1 + 1) + (-1 + 1) + \cdots, \tag{3.3} \]
which then converges to 1. (Figure 3.13) Evidently, we can't get associativity for infinite sums in general. Uh oh. Does Oresme's proof have a gaping hole in it? It seemed so convincing!
No, not a gaping hole, just a tiny technicality to address. It is true that if $\sum a_n$ converges, then we can insert parentheses wherever we want and it will still converge to the same thing. Proof: Inserting parentheses amounts to looking at a subsequence of the sequence of partial sums. (Figure 3.14) If the sequence of partial sums converges to begin with, then all subsequences also converge to the same thing, so we can add parentheses to the series willy-nilly and the sum won't change. Adding parentheses can only help the series converge. So
Figure 3.13: Guido Grandi experiences an identity crisis. Actually, the paradox didn't bother Grandi at all. He found it theologically illuminating: "By putting parentheses into the expression $1 - 1 + 1 - 1 + \ldots$ in different ways, I can, if I want, obtain 0 or 1. But then the idea of the creation ex nihilo is perfectly plausible." [TODO cite Bagni's Appunti di Didattica della Matematica]
Figure 3.14: The sequence of partial sums $S_N$ for Grandi's series oscillates and diverges. But the subsequence consisting of just the blue dots converges to 1, and the subsequence consisting of just the black dots converges to 0.
Oresme's proof works, because$^3$ if the harmonic series did converge, then the series $1 + \frac12 + \frac12 + \frac12 + \ldots$ would have to converge to something smaller.
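The subsequence picture can be checked by hand on Grandi's series. The following is a minimal Python sketch, not anything from the original text; the function name `grandi_partial_sums` is our own. It lists the partial sums and picks out the two subsequences that the groupings (3.2) and (3.3) correspond to.

```python
from itertools import accumulate

def grandi_partial_sums(num_terms):
    """Partial sums S_1, ..., S_num_terms of 1 - 1 + 1 - 1 + ..."""
    terms = [(-1) ** n for n in range(num_terms)]  # 1, -1, 1, -1, ...
    return list(accumulate(terms))

sums = grandi_partial_sums(10)
# Grouping 1 + (-1 + 1) + (-1 + 1) + ... only ever looks at S_1, S_3, S_5, ...
odd_indexed = sums[0::2]
# Grouping (1 - 1) + (1 - 1) + ... only ever looks at S_2, S_4, S_6, ...
even_indexed = sums[1::2]
print(sums, odd_indexed, even_indexed)
```

The full sequence oscillates, while each subsequence is constant, which is exactly why parentheses can make a divergent series converge.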
Now that we've seen that associativity does not generalize to infinite series, we'll look at commutativity ($a + b = b + a$). Can we rearrange terms of a series without affecting the sum?
The answer is, again, "no" in general. As an example, let's rearrange the alternating harmonic series, $\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}$. Using Taylor series, we can evaluate:
\[ \sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n} = 1 - \frac12 + \frac13 - \frac14 + \cdots = \log 2. \]
By the way, that's a natural logarithm.$^{4,5}$ Now let's rearrange the series like this:
\[ S = 1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \frac1{10} - \frac1{12} + \cdots, \]
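which is a pattern of an odd denominator followed by two consecutive even denominators. This new series does not converge to $\log 2$. If it did, we could insert parentheses without altering the sum, but:
\begin{align*}
&\left(1 - \frac12\right) - \frac14 + \left(\frac13 - \frac16\right) - \frac18 + \left(\frac15 - \frac1{10}\right) - \frac1{12} + \cdots \\
&\quad = \frac12 - \frac14 + \frac16 - \frac18 + \frac1{10} - \frac1{12} + \cdots \\
&\quad = \frac12 \left(1 - \frac12 + \frac13 - \frac14 + \frac15 - \frac16 + \cdots\right) \\
&\quad = \frac12 \log 2.
\end{align*}
(Figure 3.15.)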
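A quick numerical sanity check of this computation, as a minimal Python sketch (the function names are ours, and floating-point sums only suggest the limits rather than prove them): we add the terms in the original order and in the "one odd, two even" order and compare with $\log 2$ and $\frac12 \log 2$.

```python
import math

def alternating_harmonic(num_terms):
    """Partial sum of 1 - 1/2 + 1/3 - 1/4 + ... in the original order."""
    return sum((-1) ** (n + 1) / n for n in range(1, num_terms + 1))

def rearranged(num_blocks):
    """Partial sum after num_blocks blocks of the rearranged series.

    Block k contributes 1/(2k-1) - 1/(4k-2) - 1/(4k): one odd denominator
    followed by two consecutive even denominators.
    """
    total = 0.0
    for k in range(1, num_blocks + 1):
        total += 1 / (2 * k - 1)
        total -= 1 / (4 * k - 2)
        total -= 1 / (4 * k)
    return total

print(alternating_harmonic(300_000), math.log(2))
print(rearranged(100_000), 0.5 * math.log(2))
```

Both calls use the same 300,000 terms of the series; only the order differs.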
How far can we push this madness? Which series have sums which depend on the order of summation? And which values can such a series be made to sum to?
We'd better clarify what it means to "rearrange" the terms of a series. Intuitively, we just want to add up the terms in a different order. But of course,
\[ 1 + \frac12 + \frac14 + \frac18 + \frac1{16} + \ldots \]
should not count as a "rearrangement" of the harmonic series, because some terms of the harmonic series will never appear. We want every term of the original series to appear exactly once in the new series. To make this precise, note that a permutation of a set $S$ is a bijection $\sigma : S \to S$.
3 Another way to justify Oresme's argument: Every term of the harmonic series is nonnegative, so the sequence of partial sums is monotone. Every subsequence of a monotone sequence has the same convergence behavior as the original sequence.
4 In analysis, when the base of a logarithm isn't specified, you should assume it's base $e$. This is in contrast to e.g. computer science, where logs are base 2 by default.
5 Here's a joke: What do analysts and number theorists throw into the fireplace? Answer: Natural logs!
Figure 3.15: You can think about a series such as $\sum \frac{(-1)^{n+1}}{n}$ financially. The positive terms of the series are paychecks and the negative terms are bills. When you get a paycheck, you immediately deposit it, and when you get a bill, you immediately pay it off. The series converges to $\log 2$, which means that as time progresses, your bank account balance will converge to $\log 2$. The paychecks sum to infinite wealth, and the bills sum to infinite debt, so your bank account balance converging is the result of a careful balancing act. Each paycheck puts your bank account balance a little above $\log 2$, and each bill puts your bank account balance a little below $\log 2$. It should make sense that if you start getting two bills for every paycheck, you won't be able to maintain such a high bank account balance.
Definition 19. A rearrangement of the series $\sum_{n=1}^{\infty} a_n$ is a series of the form $\sum_{n=1}^{\infty} a_{\sigma(n)}$, where $\sigma$ is a permutation of $\mathbb{N}$.
Recall that a convergent series $\sum_{n=1}^{\infty} a_n$ is conditionally convergent if $\sum_{n=1}^{\infty} |a_n| = \infty$. For example, the alternating harmonic series is conditionally convergent.
Theorem 19 (Riemann's rearrangement theorem). Let $\sum a_n$ be a conditionally convergent series. Then for any $L \in \mathbb{R} \cup \{\pm\infty\}$, there is a permutation $\sigma(n)$ so that $\sum a_{\sigma(n)} = L$.
Figure 3.16: The rearrangement of the alternating harmonic series that the proof of Theorem 19 constructs for the target sum $L = 1.2$ (partial sums $S_N$ plotted against $N$):
\[ 1.2 = 1 + \frac13 - \frac12 + \frac15 + \frac17 + \frac19 - \frac14 + \frac1{11} + \frac1{13} - \frac16 + \cdots \]
can't both converge, because that would imply that $\sum |a_n| = \sum a_n^+ - \sum a_n^-$ converges.$^6$
Proof sketch of Theorem 19. First suppose $L \in \mathbb{R}$. Without loss of generality, assume $L \ge 0$. By the lemma, our positive terms are worth $+\infty$ and our negative terms are worth $-\infty$, so let's use them! Add a bunch of positive terms until our partial sum exceeds $L$. Then throw in some negative terms until we drop below $L$, then back to positive terms, etc. We switch to adding terms of the other sign as soon as we pass $L$. In this way, we use up all the terms in the series, and the error between our partial sum and $L$ goes to 0 as time progresses, since $a_n \to 0$ as $n \to \infty$. (Figure 3.16)
Now suppose $L = +\infty$. The idea is to add up a lot of positive terms, then a negative term, then a lot of positive terms, then a negative term, etc. By the lemma, we can always add enough positive terms to more than make up for the negative term. The $L = -\infty$ case is symmetric.
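The greedy procedure in the proof sketch is easy to run. Here is a minimal Python sketch for the alternating harmonic series with a finite target $L$; the function name and the tie-breaking rule (add a positive term when the partial sum is at most the target) are our assumptions, not the book's.

```python
from itertools import count

def rearrange_alternating_harmonic(target, num_terms):
    """Greedy rearrangement of 1 - 1/2 + 1/3 - ... aiming at `target`.

    Returns the first num_terms terms of the rearrangement and their sum.
    """
    positives = (1.0 / n for n in count(1, 2))   # 1, 1/3, 1/5, ...
    negatives = (-1.0 / n for n in count(2, 2))  # -1/2, -1/4, -1/6, ...
    terms, partial = [], 0.0
    for _ in range(num_terms):
        # At or below the target: spend a positive term. Above: a negative one.
        term = next(positives) if partial <= target else next(negatives)
        terms.append(term)
        partial += term
    return terms, partial

first_terms, _ = rearrange_alternating_harmonic(1.2, 10)
signed_denominators = [round(1 / t) for t in first_terms]
_, partial_sum = rearrange_alternating_harmonic(1.2, 10_000)
print(signed_denominators, partial_sum)
```

For the target 1.2, the first ten signed denominators reproduce the pattern $1 + \frac13 - \frac12 + \frac15 + \frac17 + \frac19 - \frac14 + \frac1{11} + \frac1{13} - \frac16$.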
If $\sum_{n=1}^{\infty} |a_n|$ converges, we say that $\sum_{n=1}^{\infty} a_n$ is absolutely convergent. Here's a converse to Riemann's rearrangement theorem. Dirichlet showed that rearranging an absolutely convergent series never changes the sum:
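Proof. Let $\sum_{n=1}^{\infty} a_n = L$. The key fact is that since $\sum_{n=1}^{\infty} |a_n|$ converges, there is some $N$ so that the tail $\sum_{n=N+1}^{\infty} |a_n|$ is $\le \varepsilon$. For this same $N$,
\[ \left| L - \sum_{n=1}^{N} a_n \right| = \left| \sum_{n=N+1}^{\infty} a_n \right| \le \sum_{n=N+1}^{\infty} |a_n| \le \varepsilon. \]
Now we wait for our permutation $\sigma(n)$ to hit all the numbers $\{1, \ldots, N\}$ (this will happen in finite time, since there are only finitely many numbers we need to hit). If it takes time $t$ to get all of them, so $\{\sigma(1), \ldots, \sigma(t)\} \supseteq \{1, \ldots, N\}$, then
\[ \left| L - \sum_{n=1}^{\infty} a_{\sigma(n)} \right| \le \left| L - \sum_{n=1}^{t} a_{\sigma(n)} \right| + \left| \sum_{n=t+1}^{\infty} a_{\sigma(n)} \right| \le 3\varepsilon, \]
since $\left| \sum_{n=t+1}^{\infty} a_{\sigma(n)} \right| \le \sum_{n=N+1}^{\infty} |a_n| \le \varepsilon$ (every index $\sigma(n)$ with $n > t$ is bigger than $N$) and
\[ \left| L - \sum_{n=1}^{t} a_{\sigma(n)} \right| \le \left| L - \sum_{n=1}^{N} a_n \right| + \underbrace{\left| \text{more terms beyond } a_N \right|}_{\le \sum_{n=N+1}^{\infty} |a_n|} \le 2\varepsilon. \]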
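For contrast with the alternating harmonic series, here is a minimal Python sketch (our own illustration, with hypothetical function names) that applies the same "one odd, two even" rearrangement to the absolutely convergent series $\sum (-1)^{n+1}/n^2$, whose sum is $\pi^2/12$. Numerically, the rearranged partial sums head to the same value.

```python
import math

def alt_squares(num_terms):
    """Partial sum of 1 - 1/4 + 1/9 - 1/16 + ... in the original order."""
    return sum((-1) ** (n + 1) / n**2 for n in range(1, num_terms + 1))

def alt_squares_rearranged(num_blocks):
    """Same terms, reordered as one odd-index term, then two even-index terms."""
    total = 0.0
    for k in range(1, num_blocks + 1):
        total += 1 / (2 * k - 1) ** 2
        total -= 1 / (4 * k - 2) ** 2
        total -= 1 / (4 * k) ** 2
    return total

target = math.pi**2 / 12
print(alt_squares(100_000), alt_squares_rearranged(100_000), target)
```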
\[ f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n, \quad \text{for } |x| < R, \text{ some } R > 0. \tag{3.4} \]
Some examples you may be familiar with are $e^x = \sum_{n=0}^{\infty} \frac{x^n}{n!}$ for all $x \in \mathbb{R}$ ($R = \infty$) and $\frac{1}{1-x} = \sum_{n=0}^{\infty} x^n$ for $|x| < 1$ ($R = 1$). In these examples, the Taylor series is equal to the original function wherever the series converges. But in general, if the Taylor series converges for $|x| < R$, must $f$ be equal to its Taylor series there? We'll answer this question negatively by answering a weaker question: If the Taylor series converges for all $x \in \mathbb{R}$, is there a neighborhood of $x = 0$ such that (3.4) holds for all $x$ in that neighborhood? Such a function is called real analytic, meaning it is locally equal to a convergent power series.
Surprisingly, even this weaker statement is false. Not only can a convergent Taylor series fail to converge to $f$ everywhere, but it doesn't even need to converge to $f$ in any neighborhood of the center! In other words, the Taylor series of $f$ can converge to the wrong function in every neighborhood of the center point.
Define $f$ by
\[ f(x) = \begin{cases} e^{-1/x}, & x > 0 \\ 0, & x \le 0 \end{cases}, \]
whose graph is shown in Figure 3.17.
Figure 3.17: Plot of $e^{-1/x}$ near zero. It is super flat at zero but then ever so slowly makes its way up away from the $x$-axis.
which is certainly convergent for all $x \in \mathbb{R}$. But since $f(x) \neq 0$ for $x > 0$, no open interval around $x = 0$ exists such that $T(x) = 0$ is equal to $f(x)$.
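To get a feel for how flat $f$ is at zero, here is a minimal Python sketch (names are ours; it is numerical evidence, not a proof). It compares $f(x)$ with powers $x^k$ for small $x > 0$: the ratio $f(x)/x^k$ shrinks as $x \to 0$ for every $k$ tried, which is what one expects if all Taylor coefficients at 0 vanish, and yet $f(x)$ itself is never 0 for $x > 0$.

```python
import math

def f(x):
    """f(x) = exp(-1/x) for x > 0, and 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def taylor_series_at_zero(x):
    """The Taylor series T(x) of f at 0, taking all coefficients to be 0."""
    return 0.0

for k in (1, 5, 10):
    ratios = [f(x) / x**k for x in (0.1, 0.05, 0.01)]
    print(k, ratios)  # each row decreases toward 0

print(f(0.1), taylor_series_at_zero(0.1))  # positive versus exactly 0
```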
Remark 1. $f$ has a relative $g(x) = \begin{cases} e^{-1/x^2}, & x \neq 0 \\ 0, & x = 0 \end{cases}$ with similar properties. (Figure 3.18.)
Figure 3.18: The "seagull function" $f(x) = e^{-1/x^2}$ is infinitely differentiable, but all of its derivatives at 0 are 0, so its Taylor series converges to the function $g(x) \equiv 0$.
Remark 2. Maybe you hope these non real analytic functions are rare. Too bad! Let's say we start with a function $h$ that is real analytic; i.e. it is locally given by a convergent power series. Then add $f$ to it, which will guarantee that $h + f$ is not real analytic. So for every real analytic function $h$, we can construct a unique non real analytic function $h + f$! In fact, even the set of smooth but nowhere analytic functions on $\mathbb{R}$ is second category in $C^\infty(\mathbb{R})$! (See [1].)
Here's a joke: Recall that a Maclaurin series is just a Taylor series centered at zero. So, why do Maclaurin polynomials fit the original function so well? Because they are "Taylor-made!" In light of this section, Maclaurin polynomials may not actually work so well, but it's just a joke.
19 Misshapen series
So far, we've investigated "standard" series, of the form $\sum_{n=1}^{\infty} a_n$. But standards are for chumps. How about a two-sided series? E.g.
\[ \sum_{n=-\infty}^{\infty} 2^{-|n|} = \cdots + 2^{-2} + 2^{-1} + 2^{0} + 2^{-1} + 2^{-2} + \cdots \]
It seems pretty clear that this series should converge to $1 + 2 \sum_{n=1}^{\infty} 2^{-n} = 3$. (Figure 3.19.)
[Figure 3.19: the terms $2^{-|n|}$ plotted against $n$ for $-5 \le n \le 5$, annotated with $\sum_{n=-\infty}^{-1} 2^{-|n|} = 1$ and $\sum_{n=1}^{\infty} 2^{-|n|} = 1$.]
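\[
\begin{array}{llllllll}
2^{-1} & + \; 2^{-2} & + \; 2^{-3} & + \; 2^{-4} & + \; 2^{-5} & + \; 2^{-6} & + \cdots \\
 & + \; 2^{-2} & + \; 2^{-3} & + \; 2^{-4} & + \; 2^{-5} & + \; 2^{-6} & + \cdots \\
 & & + \; 2^{-3} & + \; 2^{-4} & + \; 2^{-5} & + \; 2^{-6} & + \cdots \\
 & & & + \; 2^{-4} & + \; 2^{-5} & + \; 2^{-6} & + \cdots \\
 & & & & + \; 2^{-5} & + \; 2^{-6} & + \cdots \\
 & & & & & + \; 2^{-6} & + \cdots \\
 & & & & & & \ddots
\end{array}
\]
This one ought to converge to $\sum_{n=1}^{\infty} n 2^{-n} = 2$. (Figure ??.)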
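As a sanity check on the value 2, here is a minimal Python sketch (our own, with a hypothetical helper name) that truncates the triangular array to a finite size and adds it up two ways: row by row, and column by column. Column $n$ holds $n$ copies of $2^{-n}$, which is where $\sum n 2^{-n}$ comes from.

```python
def triangular_sums(size):
    """Sum the truncated triangular array by rows and by columns.

    Row r (r = 1..size) contains the entries 2**-c for c = r..size.
    """
    by_rows = sum(
        sum(2.0**-c for c in range(r, size + 1)) for r in range(1, size + 1)
    )
    # Column c contains c copies of 2**-c.
    by_columns = sum(c * 2.0**-c for c in range(1, size + 1))
    return by_rows, by_columns

rows_total, columns_total = triangular_sums(60)
print(rows_total, columns_total)
```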
[Figure: the terms $n 2^{-n}$ plotted against $n$.]
we used there generalizes nicely: Suppose $I$ is some arbitrary index set, and for $i \in I$, $a_i$ is a nonnegative real number. Then we define
\[ \sum_{i \in I} a_i = \sup \left\{ \sum_{i \in J} a_i : J \text{ is a finite subset of } I \right\}. \]
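Here is a minimal Python sketch of this definition in action for the two-sided series $\sum_{n \in \mathbb{Z}} 2^{-|n|}$ (the helper names and the random test subset are our own choices). Every finite subset $J$ gives a sum of at most 3, and the symmetric windows $J = \{-N, \ldots, N\}$ get as close to 3 as we like, so the supremum is 3.

```python
import random

def term(n):
    return 2.0 ** -abs(n)

def finite_sum(indices):
    """Sum of the terms over a finite set of indices (duplicates ignored)."""
    return sum(term(n) for n in set(indices))

# Symmetric windows approach the supremum 3 from below.
window_sums = [finite_sum(range(-N, N + 1)) for N in (1, 5, 20, 50)]

# An arbitrary finite subset never exceeds the supremum.
random.seed(0)
scattered = random.sample(range(-1000, 1001), 30)
scattered_sum = finite_sum(scattered)
print(window_sums, scattered_sum)
```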
\[ \cdots - 3 - 2 - 1 - 0 + 1 + 2 + 3 + \cdots = \; ??? \]
Sometimes we can say something about such a series. We can separate our series $S = \sum_{i \in I} a_i$ into the positive part and the negative part:
\[ S^+ = \sum_{\substack{i \in I \\ a_i \ge 0}} a_i, \qquad S^- = \sum_{\substack{i \in I \\ a_i \le 0}} (-a_i). \]
Both of these series make sense by our earlier definition, and if at most one of $S^+$ and $S^-$ is infinite, then we can define $S = S^+ - S^-$. But if $S^+ = \infty$ and $S^- = \infty$, we just leave $S$ undefined. All of these ideas are generalized by measure theory and the Lebesgue integral. But that's a story for Chapter ??.
(Leibniz, 1713) [TODO cite] ...And now since from that one [Gerolamo Cardano] who wrote of the values of the gambling games, it had been shown that when the average between two even quantities is found by calculation, the arithmetic mean ought to be found, which is one-half of the sum, and in such a way this nature of things attends to the same law of righteousness; hence although $1 - 1 + 1 - 1 + 1 - 1 +$ etc is 0 in the case with a finite even number of elements, in the case with a finite odd number of elements it is equal to 1; it follows that in the case with both sides vanishing into multitude of infinite elements, where the law is confounded by the presence of both evens and odds, and there is such a great sum on both sides, that $\frac{0+1}{2} = \frac12$
\begin{align*}
S + S &= 1 - 1 + 1 - 1 + \cdots \\
&\phantom{{}= 1} + 1 - 1 + 1 - 1 + \cdots \\
2S &= 1 + 0 + 0 + 0 + \cdots \\
S &= \frac12.
\end{align*}
Hopefully you've learned to not take such an argument very seriously. But in this case, the answer $\frac12$ is "correct":
Definition 20. Suppose a series $\sum_{n=1}^{\infty} a_n$ has partial sums $S_1, S_2, \ldots$. The Cesàro sum of the series is the limit of the arithmetic mean of the first $m$ partial sums:
\[ C = \lim_{m \to \infty} \frac{\sum_{N=1}^{m} S_N}{m}. \]
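Proposition 19. If $\sum_{n=1}^{\infty} a_n = L \in \mathbb{R}$, then the Cesàro sum of $\sum_{n=1}^{\infty} a_n$ is $L$.
Proof. Fix an arbitrary $\varepsilon > 0$. Choose $N_0$ large enough so that for every $N > N_0$, $|S_N - L| < \varepsilon$. Then apply the triangle inequality a few times:
\begin{align*}
\left| \frac{\sum_{N=1}^{m} S_N}{m} - L \right|
&\le \left| \frac{\sum_{N=1}^{N_0} S_N}{m} - \frac{N_0}{m} L \right| + \left| \frac{\sum_{N=N_0+1}^{m} S_N}{m} - \frac{m - N_0}{m} L \right| \\
&\le \frac{1}{m} (\text{no } m \text{ dependence}) + \frac{m - N_0}{m} \varepsilon \\
&\le 2\varepsilon \quad \text{for } m \text{ sufficiently large.}
\end{align*}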
So Cesàro provides the right answer when you give him a convergent series. But sometimes, he even gives an answer if you give him a divergent series! The partial sums of Grandi's series are $1, 0, 1, 0, 1, 0, \ldots$, hence the Cesàro sum of the series is $\frac12$.
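Cesàro means are simple to compute. Here is a minimal Python sketch (the function name `cesaro_means` is ours) that averages the partial sums of a finite list of terms, applied to Grandi's series and to the convergent geometric series $\frac12 + \frac14 + \frac18 + \cdots = 1$.

```python
from itertools import accumulate

def cesaro_means(terms):
    """Return the means (S_1 + ... + S_m) / m for m = 1, ..., len(terms)."""
    partial_sums = accumulate(terms)          # S_1, S_2, ...
    running_totals = accumulate(partial_sums)  # S_1, S_1 + S_2, ...
    return [total / m for m, total in enumerate(running_totals, start=1)]

grandi = [(-1) ** n for n in range(1000)]        # 1, -1, 1, -1, ...
geometric = [0.5**n for n in range(1, 1001)]     # 1/2, 1/4, 1/8, ...

grandi_means = cesaro_means(grandi)
geometric_means = cesaro_means(geometric)
print(grandi_means[-1], geometric_means[-1])
```

The Cesàro means of the convergent series approach its ordinary sum, only more slowly than the partial sums do, in line with Proposition 19.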
A summation method is an assignment of a real value to some series. For example, the standard summation method assigns to $\sum_{n=1}^{\infty} a_n$ the value $\lim_{N \to \infty} \sum_{n=1}^{N} a_n$. [TODO discuss Borel summation, Ramanujan summation, Abel summation, etc. Be sure to mention that silly $1 + 2 + \cdots = -1/12$ thing] Cesàro summation is mentioned in Fourier analysis (summability is nice)
We can sum a geometric series,
\[ 1 + x + x^2 + x^3 + \cdots = \frac{1}{1-x}, \]
provided that $|x| < 1$. However, the right side of the equation, $\frac{1}{1-x}$, makes sense for any $x \neq 1$. This lets us make fun claims like
\begin{align*}
1 + 2 + 4 + 8 + \cdots &= -1 \\
1 - 1 + 1 - 1 + \cdots &= \frac12.
\end{align*}
TODO analytic continuation, remark that there are possible values for the sums
(analytically continue a different function) These values are not unique; for
example
One sum that everyone loves is
\[ 1 + 2 + 3 + 4 + \cdots = -\frac{1}{12} \quad (?!?) \]
At the end of this section, we'll talk about how this can come from analytic continuation of the Riemann zeta function. Beware, this uses some complex analysis. To avoid that, here is a quick alternative derivation from TODO cite
First define
\[ 1 + 2x + 3x^2 + 4x^3 + \cdots = \frac{d}{dx} \left( x + x^2 + x^3 + \cdots \right) = \frac{d}{dx} \frac{x}{1-x} = \frac{1}{(1-x)^2} =: S(x). \]
The Riemann zeta function has a friend called the Gamma function,
\[ \Gamma(s) := \int_0^\infty e^{-t} t^{s-1} \, dt, \quad \operatorname{Re} s > 0. \]
Gamma may look a bit scary, but by integrating by parts, we can verify the functional equation $\Gamma(s + 1) = s\Gamma(s)$ and conclude that $\Gamma(n + 1) = n!$ for $n \in \mathbb{N}$. (And yes, that is a factorial, not just us being very excited.) We can analytically continue the gamma function using the functional equation to copy-paste everything to the left one unit at a time. The only issue is the singularity at $s = 0$, which gets translated to all the negative integers.
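The "copy-paste to the left" idea can be tried out for real arguments. Below is a minimal Python sketch (the function `gamma_continued` is our own illustration, restricted to real $s$) that uses the standard library's `math.gamma` only for $s > 0$ and the functional equation $\Gamma(s) = \Gamma(s+1)/s$ everywhere else.

```python
import math

def gamma_continued(s):
    """Gamma at a real s, extended leftward via Gamma(s) = Gamma(s + 1) / s."""
    if s > 0:
        return math.gamma(s)  # the integral definition applies here
    if s == int(s):
        raise ValueError("Gamma has a singularity at 0, -1, -2, ...")
    return gamma_continued(s + 1) / s

# Gamma(n + 1) = n! for small n.
factorial_check = all(
    math.isclose(gamma_continued(n + 1), math.factorial(n)) for n in range(1, 10)
)
# Gamma(1/2) = sqrt(pi), so Gamma(-1/2) = Gamma(1/2) / (-1/2) = -2 sqrt(pi).
print(factorial_check, gamma_continued(-0.5), -2 * math.sqrt(math.pi))
```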
TODO picture of gamma function plot on R?
Gamma plays nicely with Riemann zeta. One way to prove the analytic
continuation of Riemann zeta is to form the auxiliary function
References
[1] R. B. Darst. "Most infinitely differentiable functions are nowhere analytic". In: Canad. Math. Bull. 16 (Jan. 1973), pp. 597–598. doi: 10.4153/cmb-1973-098-3. url: http://dx.doi.org/10.4153/CMB-1973-098-3.
[2] Anton Deitmar. A First Course in Harmonic Analysis. Springer, 2005.
[3] G. H. Hardy. Divergent Series. 1948.
[4] Elias Stein and Rami Shakarchi. Complex Analysis. Vol. 3. Princeton Lectures in Analysis. Princeton University Press, 2003.
[5] Arild Stubhaug and Richard H Daly. Niels Henrik Abel and his times: called too soon by flames afar. Springer, 2000.
Chapter 4
Sequences of functions
Figure 4.1: Pointwise convergence: for each fixed $x_0$, the sequence $f_1(x_0), f_2(x_0), \ldots$ converges to $f(x_0)$.
In 1826, Abel saw this theorem and said, "...it seems to me that this theorem admits exceptions."$^2$ Indeed, a counterexample to this theorem is the simple sequence of functions depicted in Figure 4.2. But in 1833, in a book$^3$, Cauchy once again asserted False Theorem 1. It wasn't until 1853 that Cauchy finally conceded that the theorem is wrong, saying$^4$ [TODO is this citation correct?]:
As has been remarked by MM. Bouquet and Briot, this theorem is verified for ordered series according to the ascending powers of a variable. But for other series, it can not be admitted without restriction. [TODO replace with actual translation; this is from Google Translate!!!]
[Figure 4.2: graphs of $f_n$ and of the pointwise limit $f$, with axis marks at $0$, $\frac1n$, and $1$.]
It's difficult to comprehend the fact that Cauchy made such an elementary mistake, since he was usually quite careful. Indeed, some historians try to argue that attributing Wrong Theorem 1 to Cauchy misinterprets his assertions in 1821 and 1833. They say that Cauchy actually meant a different, correct theorem.$^5$ But this position seems untenable, considering Cauchy's own concession.
There's a tiny grain of truth in Cauchy's Wrong Theorem, though. If you take a pointwise limit of continuous functions, what you end up with might be discontinuous, but it won't be catastrophically discontinuous.
2 N. H. Abel, "Untersuchungen über die Reihe: $1 + \frac{m}{1} x + \frac{m(m-1)}{1 \cdot 2} x^2 + \frac{m(m-1)(m-2)}{1 \cdot 2 \cdot 3} x^3 + \cdots$ u.s.w.," Journal für die reine und angewandte Mathematik, 1 (1826), 311–339. Quote (in German) on pg. 9 of http://name.umdl.umich.edu/ABW7150.0001.001.
3 A. Cauchy, Résumés analytiques (1833).
4 A. Cauchy, "Note sur les séries convergentes dont les divers termes sont des fonctions continues d'une variable réelle ou imaginaire, entre des limites données." In Oeuvres complètes, Series 1, Vol. 12, pp. 30–36. Paris: Gauthier-Villars, 1900.
Theorem 23. Fix $D \subseteq \mathbb{R}$. Then $D$ is the discontinuity set of some Baire class one function if and only if $D$ is meager and $F_\sigma$.
[TODO give pointers for proof. Note that the "if" direction comes from the stronger fact that if $D$ is meager and $F_\sigma$, then there is some derivative with discontinuity set $D$.]
[Figure: (a) Walrus tusks at $r_1$ and $r_2$ (graph of $f_n$, with height $n$). (b) Walrus tusks at Cape Peirce, Alaska.]
1. Look at the next rational $r_m$ in our enumeration. See if the spike centered on $r_m$ with width $\frac1n$ fits without overlapping any current spikes. If so, then insert it there. If not, do nothing. (This waiting is a key step, and will ensure that $f_n(x)$ is unbounded on irrational numbers.)
2. Shrink all spikes inserted so far so they have width $\frac1n$.
Since we'll eventually get spikes at every rational $r$, $f_n(r)$ will be bounded (in fact, have limit 0). At irrationals $x$, we'll get $f_n(x) = n$ infinitely often: If $f_n(x) \neq n$, then $x$ is in some downward spike. But because the spikes are shrinking, eventually $x$ will "move back up" to $n$, when it's sufficiently far away from the rational number that had the spike. This is where the waiting is key: $x$ cannot go from being in one spike to being in a new spike without first going back up to $n$. A new spike will only be inserted into $f_n$ if it doesn't overlap any of the spikes in $f_{n-1}$. So eventually $x$ will be out of the original spike, and it can only go back below $n$ (in a new spike) if it was not in a spike in the previous iteration, i.e. it went back up to $n$ in between.
You might complain that though $(f_n(x))$ is unbounded for irrational $x$, we can't say much meaningful about a pointwise limit as $n \to \infty$. But we can fix this:
We'll use the same idea that made Thomae's function continuous precisely at irrational numbers. The function $f_n$ is defined as follows: at each of the first
Figure 4.5: The proof that $f_n(x) \to \infty$ if $x$ is irrational. For each $n > N$, the value of $f_n(x)$ is chosen by linearly interpolating between values which are both bigger than $Q$, so $f_n(x) \ge Q$. [Labels in the figure: heights $q_1, q_2, q_3$ above the rationals $\frac{p_1}{q_1}, \frac{p_3}{q_3}, \frac{p_2}{q_2}$, and the points $c_1$, $c_2$ on either side of $x$.]
[Figure 4.6: (a) Upside-down walrus tusks at $r_1$ and $r_2$ (graph of $f_n$). (b) More upside-down walrus tusks.]
But this isn't good enough! For $x$ irrational, there will always be some rational close enough to $x$ that will make $f_n(x)$ jump up, forcing $\{f_n(x)\}$ unbounded. And in fact, it is impossible for such a sequence to exist.
Claim 2. There does not exist a sequence of continuous nonnegative functions $\{f_n\}$ that is bounded precisely for irrational $x$.
The proof uses similar ideas as in Section 15. The key is to use Baire, and try to write $\mathbb{R} \setminus \mathbb{Q}$ as a countable union of closed sets ($F_\sigma$), which is impossible. We'll take the sets to be
\[ Y_k := \{x \in \mathbb{R} : \forall n, \; f_n(x) \le k\}, \]
each of which is closed, and $Y = \bigcup_{k \in \mathbb{N}} Y_k$, so that $x \in Y$ if and only if $\{f_n(x)\}$ is bounded. But then we cannot have $\mathbb{R} \setminus \mathbb{Q} = Y$, since by Baire, $\mathbb{R} \setminus \mathbb{Q}$ cannot be written as a countable union of closed sets. So indeed such a task is impossible.
[TODO: obvious question: if we want a sequence of continuous nonnegative functions whose pointwise limit is bounded precisely on $E$, is it necessary and sufficient that $E$ is $F_\sigma$?]
Next, to get more confusing, let's ask the analogue of the second problem from the previous section: Does there exist a sequence of continuous nonnegative functions with $\lim_{n \to \infty} f_n(x) = +\infty$ if and only if $x$ is rational? We just saw that getting such a sequence with $\{f_n(x)\}$ unbounded iff $x$ is rational is impossible, but this one's different. It turns out that yes, we can. In fact, the example we considered in Figure 4.6 works.
In summary:
1. There exists a sequence of continuous nonnegative functions $f_n$ with $\{f_n(x)\}$ bounded if and only if $x$ is rational.
2. There does not exist a sequence of continuous nonnegative functions $\{f_n\}$ that is bounded precisely for irrational $x$.
3. There exists a sequence of continuous nonnegative functions $f_n$ with $\lim_{n \to \infty} f_n(x) = +\infty$ iff $x$ is irrational, and a sequence with $\lim_{n \to \infty} g_n(x) = +\infty$ iff $x$ is rational.
Figure 4.7: The idea of uniform convergence. In this picture, for all $x$, $g(x)$ lies between $f(x) - \varepsilon$ and $f(x) + \varepsilon$, i.e. for all $x$, $|g(x) - f(x)| < \varepsilon$.
Recall the sequence $1.1x, 1.01x, 1.001x, \ldots$ from Figure 4.1, which converges pointwise to the identity function. But it does not converge uniformly to $f(x) = x$, because for every $n$, the distances $|f_n(x) - f(x)|$ can get arbitrarily large by taking $x$ large.
But what if we consider functions with a compact domain, say functions $[0, 1] \to \mathbb{R}$? Considered thus, the sequence above actually does converge uniformly, because $|f_n(x) - f(x)| \le 10^{-n}$ for all $x$. This might make you hope that maybe for continuous functions defined on $[0, 1]$, pointwise and uniform convergence coincide... but no such luck. Consider the sequence of functions depicted in Figure 4.8.
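The contrast between the two domains for the sequence $1.1x, 1.01x, 1.001x, \ldots$ can be seen numerically. This is a minimal Python sketch under our own conventions: we write $f_n(x) = (1 + 10^{-n}) x$ and approximate the supremum by a maximum over a finite list of sample points.

```python
def f_n(n, x):
    """The n-th function of the sequence 1.1x, 1.01x, 1.001x, ..."""
    return (1 + 10.0**-n) * x

def sup_distance(n, sample_points):
    """Largest |f_n(x) - x| over the sample points (a stand-in for the sup)."""
    return max(abs(f_n(n, x) - x) for x in sample_points)

unit_interval = [i / 1000 for i in range(1001)]  # grid on [0, 1]
far_away = [10.0**k for k in range(7)]           # 1, 10, ..., 10**6

for n in (1, 2, 3):
    print(n, sup_distance(n, unit_interval), sup_distance(n, far_away))
```

On $[0, 1]$ the distance is about $10^{-n}$ and shrinks with $n$; on the unbounded sample it stays large for each fixed $n$, because it grows with $x$.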
[Figure 4.8: graph of $f_n$, a spike of height $1$ with axis marks at $0$, $\frac{1}{2n}$, and $\frac1n$.]
\[ |f(x) - f(x_0)| \le |f(x) - f_N(x)| + |f_N(x) - f_N(x_0)| + |f_N(x_0) - f(x_0)|, \tag{4.1} \]
then show each term on the right side is smaller than $\varepsilon/3$ for $x$ close to $x_0$ and sufficiently large $N$.
By uniform convergence, choose $N$ so that for all $n \ge N$,
So now we've taken care of the second term, and we get in total,
The pair $(M, d)$ is called a metric space. A sequence $(x_n)_{n \in \mathbb{N}} \subseteq M$ converges if there is an $x \in M$ such that
Perhaps you were hoping the triangle inequality involved triangles? Take the usual Euclidean metric on $\mathbb{R}^2$: the distance between two points is $d((x_1, y_1), (x_2, y_2)) = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$. Now we get actual triangles in the triangle inequality (Figure 4.9).
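As a small experiment, here is a minimal Python sketch (helper names and the random sample are our own) that spot-checks the triangle inequality $d(a, c) \le d(a, b) + d(b, c)$ for the Euclidean metric and for the taxicab metric on $\mathbb{R}^2$. A random test cannot prove the inequality, but it shows what the axiom says.

```python
import math
import random

def euclidean(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def taxicab(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def satisfies_triangle_inequality(d, points, tolerance=1e-12):
    """Check d(a, c) <= d(a, b) + d(b, c) for every triple of sample points."""
    return all(
        d(a, c) <= d(a, b) + d(b, c) + tolerance
        for a in points for b in points for c in points
    )

random.seed(1)
sample = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(15)]
print(satisfies_triangle_inequality(euclidean, sample))
print(satisfies_triangle_inequality(taxicab, sample))
```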
Figure 4.9: Matt has to walk further to get home if he stops to buy math books. Matt is a nerd.
Figure 4.10: A taxicab in a US city can only drive horizontally and vertically (red, blue, yellow lines). The diagonal green line shows the usual Euclidean distance.
Let $X$ be a compact space, and let $C(X)$ be the space of continuous functions $X \to \mathbb{R}$. Equip it with the metric $d(f, g) = \sup_{x \in X} |f(x) - g(x)|$. Convergence in this metric is uniform convergence: for all $\varepsilon > 0$, there exists $N \in \mathbb{N}$ so that for all $n \ge N$, for all $x \in X$, $|f_n(x) - f(x)| < \varepsilon$.
Remark 5. Recall from Section 13 the open balls $B_r(x)$ of radius $r$ centered at $x \in \mathbb{R}^n$. In general, for a metric space $M$, $B_r(x)$ is the set $\{y \in M : d(x, y) < r\}$; all we did is generalize the notion of "distance."
The uniform limit theorem for $\mathbb{R}$ (Proposition 20) generalizes to the same result for metric spaces, utilizing basically the same proof.
Proposition 21 (Uniform limit theorem). Let $X$ and $Y$ be metric spaces, and let $f_n : X \to Y$ be a sequence of functions that converges uniformly to some $f : X \to Y$. Assume that for some $x_0 \in X$, each $f_n$ is continuous at $x_0$. Then $f$ is continuous at $x_0$.
One way to construct a metric is using norms.
Definition 23. Let $V$ be an $\mathbb{R}$ or $\mathbb{C}$ vector space. A function $\| \cdot \| : V \to [0, \infty)$ is called a norm on $V$ if
1. $\|x\| = 0 \iff x = 0$
2. $\|\alpha x\| = |\alpha| \|x\|$ $\forall x \in V$, $\alpha \in \mathbb{C}$
3. $\|x + y\| \le \|x\| + \|y\|$ $\forall x, y \in V$
The pair $(V, \| \cdot \|)$ is called a normed space. A normed space becomes a metric space by setting $d(x, y) = \|x - y\|$.
You're probably most familiar with the normed space $\mathbb{R}^n$ equipped with the Euclidean norm, $\|x - y\| = \sqrt{(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2}$. But the space $C(X)$
and gives rise to the metric we've been using all along,
\[ d(f, g) = \|f - g\| = \sup_{x \in X} |f(x) - g(x)|. \tag{4.6} \]
What exactly are we doing here? We're sampling $f$ at spacing $\frac1n$, and trying to interpolate between those points with a polynomial. As $n$ increases, the sampling points get finer, and we'd hope the approximation gets better. (Figure 4.11.)
The proof that indeed $\|B_n(f) - f\| \to 0$ involves a decent amount of real analysis and inequalities (not the best bedtime story material), but we'll outline the idea. [TODO outline]
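To watch the approximation improve, here is a minimal Python sketch. It assumes the standard Bernstein polynomial $B_n(f)(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right) \binom{n}{k} x^k (1-x)^{n-k}$ on $[0, 1]$; the test function `tent` and the grid-based estimate of the sup norm are our own choices.

```python
from math import comb

def bernstein(f, n):
    """Return the degree-n Bernstein polynomial of f on [0, 1] as a function."""
    samples = [f(k / n) for k in range(n + 1)]  # f sampled at spacing 1/n

    def B(x):
        return sum(
            samples[k] * comb(n, k) * x**k * (1 - x) ** (n - k)
            for k in range(n + 1)
        )

    return B

def sup_error(f, n, grid_size=200):
    """Estimate sup |B_n(f) - f| on [0, 1] using an evenly spaced grid."""
    B = bernstein(f, n)
    return max(abs(B(i / grid_size) - f(i / grid_size)) for i in range(grid_size + 1))

def tent(x):
    """A continuous function with a corner, so no power series fits it at 1/2."""
    return abs(x - 0.5)

for n in (10, 100):
    print(n, sup_error(tent, n))
```

The error shrinks as $n$ grows, though slowly near the corner at $x = \frac12$.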
6 Power series are not enough, even for $C^\infty$ functions! cf. Section 18
Figure 4.11: (Above) Bernstein approximations of degrees 1, 2, 10, and 50 for the function $|\frac12 x \cos(5x) + \sin(20x) + 2.5x|$ (shown in blue). (Below) Bernstein approximations of degrees 100 and 1000.
Figure 4.12: Plot of $h(x) = e^{-\frac{1}{(x-1)^2} - \frac{1}{(x+1)^2}} \chi_{[-1,1]}$.
This is roughly the idea, but since we need a $C^\infty$ function, we need smooth cut-offs to form our partition of unity instead of rough cut-offs.
Start by defining
\[ H(x) := \sum_{n \in \mathbb{Z}} h(x - n) > 0, \]
so we just put a copy of $h$ at each integer and let them overlap a bit. Then $H \in C^\infty(\mathbb{R})$, and for each $x$, the sum $\sum_{n \in \mathbb{Z}} h(x - n)$ is a finite sum. (Figure 4.13.)
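Here is a minimal Python sketch of these functions. The bump `h` follows the formula in the caption of Figure 4.12; the normalized cut-off `cutoff(n, x) = h(x - n) / H(x)` is our assumption about how one would turn the shifted bumps into a partition of unity, and is labeled hypothetical.

```python
import math

def h(x):
    """Smooth bump supported on [-1, 1]."""
    if abs(x) >= 1:
        return 0.0
    return math.exp(-1.0 / (x - 1) ** 2 - 1.0 / (x + 1) ** 2)

def H(x):
    """Sum of h(x - n) over all integers n; only n near x can contribute."""
    base = math.floor(x)
    return sum(h(x - n) for n in range(base - 1, base + 3))

def cutoff(n, x):
    """Hypothetical normalized cut-off centered at the integer n."""
    return h(x - n) / H(x)

for x in (0.0, 0.3, 0.5, 0.99, 1.7):
    total = sum(cutoff(n, x) for n in range(-3, 5))
    print(x, H(x), total)  # H(x) > 0 and the cut-offs add up to 1
```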
Finally, we get to the actual cut-offs. We had to do all that work to make
sure these smooth cut-offs formed a partition of unity. Define
Figure 4.13: Plot of $H(x) = \sum_{n \in \mathbb{Z}} h(x - n)$. Note it is 1-periodic.
Figure 4.14: The smooth cut-offs $\tilde\chi_{-1}$, $\tilde\chi_0$, and $\tilde\chi_1$.
\[ \tilde f(x) := \sum_{n \in \mathbb{Z}} p_n(x) \tilde\chi_n(x), \tag{4.7} \]
since $\max_{x : \tilde\chi_n(x) \neq 0} |p_n(x) - f(x)| \le \varepsilon$. So Theorem 25 is proved.
Finally, instead of approximating $f \in C([0, 1])$ by polynomials, we can approximate it by trigonometric polynomials of the form
\[ T_N(x) = \sum_{j=-N}^{N} c_j e^{2\pi i j x}, \quad c_j \in \mathbb{C}. \tag{4.8} \]
We use $\mathbb{T}$ to denote the 1-dimensional torus or circle $\mathbb{R}/\mathbb{Z}$, which is also often identified with the interval $[0, 1)$. Then we have:
Okay, so there were some words there we haven't defined yet. But the idea is if we have a set $A \subseteq C(X, \mathbb{R})$ satisfying certain properties, then it's automatically dense in $C(X, \mathbb{R})$. Also, there is a complex version for $C(X, \mathbb{C})$. We recommend looking at the relevant sections in nearly any real analysis book, like [6], [1], or [3], if you are interested in learning about the Stone-Weierstrass theorem.
7 One of the standard proofs of the Stone-Weierstrass theorem uses the classical Weierstrass theorem, but it's only a little more work to avoid using it in the proof of Stone-Weierstrass.
26 Variance of dimension
What is it that makes planes, spheres, and Möbius strips "two-dimensional," while lines, circles, and knots are "one-dimensional?" A typical answer one hears is: it takes two coordinates to specify a point in a two-dimensional space (e.g. $x$ and $y$ for the plane, or latitude and longitude for the sphere) whereas it only takes one coordinate to specify a point in a one-dimensional space. [TODO include picture]
Cantor realized that there is something to be proven here. As he put it in an 1877 letter to Dedekind$^8$
For several years I have followed with interest the efforts that have been made, building on Gauss, Riemann, Helmholtz, and others, towards the clarification of all questions concerning the ultimate foundations of geometry. It struck me that all the important investigations in this field proceed from an unproven presupposition which does not appear to me self-evident, but rather to need a justification. I mean the presupposition that a $\rho$-fold extended continuous manifold needs $\rho$ independent real coordinates for the determination of its elements, and that for a given manifold this number of coordinates can neither be increased nor decreased.
This presupposition became my view as well, and I was almost convinced of its correctness. The only difference between my standpoint and all the others was that I regarded that presupposition as a theorem which stood in great need of a proof; and I refined my standpoint into a question that I presented to several colleagues, in particular at the Gauss Jubilee in Göttingen. The question was the following:
Can a continuous structure of $\rho$ dimensions, where $\rho > 1$, be related one-to-one with a continuous structure of one dimension so that to each point of the former there corresponds one and only one point of the latter?
In today's terminology, Cantor was wondering, how does the cardinality of $[0, 1]$ compare to the cardinality of $[0, 1]^2$? When someone says that two coordinates are "necessary" to specify an arbitrary point in the unit square $[0, 1]^2$, they seem to implicitly be claiming that $|[0, 1]^2| > |[0, 1]|$. Surprisingly, they are wrong:
Proof. There is an obvious injection establishing that $|[0, 1]| \le |[0, 1]^2|$, so it suffices to show the reverse inequality. Define $f : [0, 1]^2 \to [0, 1]$ by interleaving digits! That is, if $x = 0.x_1 x_2 x_3 \ldots$ and $y = 0.y_1 y_2 y_3 \ldots$, then we set
\[ f(x, y) = 0.x_1 y_1 x_2 y_2 x_3 y_3 \ldots \]
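The interleaving map is easy to play with on finite digit strings. This is a minimal Python sketch (function names are ours); it works with truncated decimal expansions given as strings of equal length, so it illustrates the idea rather than handling genuine infinite expansions.

```python
def interleave(x_digits, y_digits):
    """Interleave two equal-length digit strings: x1 y1 x2 y2 ..."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def deinterleave(z_digits):
    """Recover the two digit strings from an interleaved one."""
    return z_digits[0::2], z_digits[1::2]

x, y = "1415", "7182"            # digits of 0.1415... and 0.7182...
z = interleave(x, y)
print("0." + z)                  # 0.17411852
print(deinterleave(z) == (x, y)) # the pair (x, y) can be read back off z
```

Being able to read $x$ and $y$ back off the interleaved digits is the reason the map is injective.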
8 F. Gouvêa, "Was Cantor Surprised?", The American Mathematical Monthly (March 2011).
any curve must be a thin, one-dimensional sort of an object. But Theorem 29 tells us that there are curves which fill up the entire unit square; such curves are called space-filling curves.
So how does the proof go? We'll sketch a slight variant of Peano's original proof, due to Hilbert. Hilbert devised a sequence of curves $f_1, f_2, \ldots$, each of which gets a little closer to filling up the unit square than the last. We won't bother giving a careful definition of $f_n$; take a look at Figure 4.15 and you should see the pattern. Then, we let $f$ be the limit of the $f_n$ curves.
Figure 4.15: The first four curves in the sequence defining the Hilbert curve.
To finish the proof, we need to show two things. First, we need to show that the sequence of curves is uniformly convergent. (That way, we can be sure that $f$ is continuous, i.e. is a curve.) [TODO I don't think we've mentioned Cauchy sequences or complete metric spaces in this whole book... but it's kinda unavoidable here. Maybe we'll just skip the proof?] Second, we need to show that $f([0, 1]) = [0, 1]^2$. Certain elementary topological considerations$^{10}$ imply that it is sufficient to show that for every open set $U \subseteq [0, 1]^2$, the curve $f$ passes through $U$, i.e. $f([0, 1]) \cap U \neq \emptyset$. But this is clear from the pictures: for all sufficiently large $n$, $f_n([0, 1])$ intersects $U$, so $f([0, 1])$ does as well.
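The claim that the approximating curves get into every small region can be checked on a grid. The following is a minimal Python sketch using a common index-to-cell formulation of the Hilbert curve; it is an assumption that this discretization matches the curves $f_n$ in the figure, and the function name is ours. For a given order, it maps each index $d$ to a cell of a $2^{\text{order}} \times 2^{\text{order}}$ grid.

```python
def hilbert_cell(order, d):
    """Map an index d in [0, 4**order) to a cell (x, y) of a 2**order grid."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the sub-square so consecutive pieces join up.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x  # transpose
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

order = 3
cells = [hilbert_cell(order, d) for d in range(4**order)]
visits_every_cell = len(set(cells)) == 4**order
steps_are_adjacent = all(
    abs(x1 - x2) + abs(y1 - y2) == 1
    for (x1, y1), (x2, y2) in zip(cells, cells[1:])
)
print(cells[:4], visits_every_cell, steps_are_adjacent)
```

Every cell of the grid is visited once and consecutive cells share an edge, so joining the cell centers gives a path that passes within one cell width of every point of the square.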
Space-filling curves are seriously messed up. Invariance of dimension tells
us that it is impossible to parameterize the unit square by a single coordinate t
in such a way that the point associated with t varies continuously with t, and
10 The continuous image of a compact set is compact, so $f([0,1])$ is a closed subset of $[0,1]^2$; a closed set that meets every nonempty open subset of $[0,1]^2$ must be all of $[0,1]^2$.
26. VARIANCE OF DIMENSION 103
every point in the square has exactly one associated t. But the existence of
space-filling curves means that a parameterization is possible where every point
in the square has at least one associated t! That seems like it should have been
the hard part!
The weirdness doesn't stop here. A Jordan curve is a continuous injection
$[0,1] \to [0,1]^2$. (So the curve never crosses over itself.) Invariance of dimension
tells us that if $f$ is a Jordan curve, then $f([0,1]) \neq [0,1]^2$. At this point, you're
probably tempted to draw the inference that if $f$ is a Jordan curve, then $f([0,1])$
has to be small: the reason that $f([0,1])$ cannot equal $[0,1]^2$ is that the square
is just too big. But that's not right either! In TODO, Osgood discovered that
$[0,1]^2$ is not too big for a Jordan curve to fill up. Rather, $[0,1]^2$ is just shaped
wrong! Precisely:
Theorem 30. There exists a Jordan curve $f : [0,1] \to [0,1]^2$ whose image
$f([0,1]) \subseteq [0,1]^2$ has positive area (Lebesgue measure).
Proof sketch. Even better, we'll show that for any $\alpha < 1$, there is a curve (an
Osgood curve) whose image has Lebesgue measure at least $\alpha$. (We'll define
Lebesgue measure formally in Section 35, but for now, just think of it as the good
definition of area.) [TODO the reader doesn't know about measure yet...?
TODO also cite 2D Lebesgue measure?] Like the Hilbert curve, we'll construct
the Osgood curve by giving a sequence of curves. Again, we won't bother
with an explicit definition. See Figures 4.16, 4.17, 4.18, 4.19, 4.20, and 4.21 for
$f_1, f_2, f_3, f_4, f_5, f_6$; you should be able to see the general pattern for $f_n$. (The
curve $f_n$ is in black.) The crucial parameter is the thickness of the gray grate.
When forming $f_{n+1}$, we choose the new grate to have thickness
$$\frac{1 - \alpha}{4 \cdot 6^{n-1}}.$$
With this choice, a straightforward calculation shows that the set of points
which never appear in a grate has total area $\alpha$.
As with the Hilbert curve, we should show that the sequence $f_n$ converges
uniformly. [TODO discuss.] Let $f = \lim f_n$. Now, we'll show that $f([0,1])$ has
Lebesgue measure at least $\alpha$, by showing that every point $p$ which never appears
in a grate is in the image of $f$. Just like when we showed that the Hilbert curve
was surjective, it suffices to show that $f$ passes through every neighborhood of
$p$, which is quite clear from the pictures.
Finally, we need to show that $f$ is injective. This is also fairly clear from
the pictures. Fix $t_1 \neq t_2$, and consider the sequences $f_n(t_1)$ and $f_n(t_2)$. If both
sequences are eventually in the grate, then at that time, they will not intersect
(since every $f_n$ is injective) and the sequences will be constant from then on.
Hence, $f(t_1) \neq f(t_2)$. If one sequence is eventually in the grate but the other
avoids the grate forever, then $f(t_1) \neq f(t_2)$, because one of $f(t_1)$, $f(t_2)$ is in
the grate while the other is not. (Note that we need to exclude the grate's
boundary in the definition of the grate to make this argument work.) Finally, if
both sequences avoid the grate forever, then they will eventually be in distinct
white squares, and clearly this implies that they will not converge to the same
thing, so again, $f(t_1) \neq f(t_2)$.
Figure 4.16: The first curve in the sequence defining the Osgood curve.
For further reading about space-filling curves, see the beautiful little book
[5].
Figure 4.17: The second curve in the sequence defining the Osgood curve.
Figure 4.18: The third curve in the sequence defining the Osgood curve.
Figure 4.19: The fourth curve in the sequence defining the Osgood curve.
Figure 4.20: The fifth curve in the sequence defining the Osgood curve.
Figure 4.21: The sixth curve in the sequence defining the Osgood curve.
Chapter 5
Differentiation
27 Discontinuous derivative
Let's start with a quick review of derivatives. Differentiability is a stronger sort
of predictability than continuity. Roughly, we say that a function $f : \mathbb{R} \to \mathbb{R}$ is
differentiable at $x$ if there is some linear transformation $T : \mathbb{R} \to \mathbb{R}$ such that
for all small $\Delta x$, we have $f(x + \Delta x) \approx f(x) + T(\Delta x)$. The function $T$ is called
the Fréchet derivative of $f$ at $x$. Since we are working in just one dimension, $T$
is necessarily of the form $T(\Delta x) = \alpha \, \Delta x$; the number $\alpha$ is called the derivative of
$f$ at $x$ and denoted $f'(x)$. [TODO picture. Also make less confusing and more
enlightening] The exact definition:
110 CHAPTER 5. DIFFERENTIATION
Figure 5.1: The absolute value function and its derivative.
The answer to the latter question is no. Here we'll get a function that is
differentiable everywhere, but whose derivative is not continuous at $x = 0$. Let
$$f(x) = \begin{cases} x^2 \sin\frac{1}{x}, & x \neq 0 \\ 0, & x = 0. \end{cases}$$
Its graph near zero is shown in Figure 5.2.
Using the limit definition and squeeze theorem, we can show that $f'(0) = 0$.
But, for all $x \neq 0$, we can use the product rule to compute
$$f'(x) = 2x \sin\frac{1}{x} - \cos\frac{1}{x}.$$
However, $\lim_{x \to 0} f'(x) = \lim_{x \to 0} \left(2x \sin(1/x) - \cos(1/x)\right)$ does not exist because of
oscillation! Therefore, $f'$ exists but is not continuous at $x = 0$.
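To see the oscillation numerically, the following sketch evaluates the derivative of $x^2 \sin(1/x)$ along two sequences tending to 0: points where $\cos(1/x) = 1$ and points where $\cos(1/x) = -1$. The sample points are our own choice for illustration; any sequences hitting the peaks and troughs of the cosine term would do.

```python
import math


def f_prime(x):
    """Derivative of f(x) = x^2 sin(1/x), with f'(0) = 0 from the limit definition."""
    if x == 0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)


for k in (10, 100, 1000):
    x_even = 1 / (2 * math.pi * k)       # here cos(1/x) = 1, so f'(x) is close to -1
    x_odd = 1 / ((2 * k + 1) * math.pi)  # here cos(1/x) = -1, so f'(x) is close to +1
    print(k, f_prime(x_even), f_prime(x_odd))
```

Both sequences converge to 0, but the derivative values stay near $-1$ and $+1$ respectively, so they cannot approach $f'(0) = 0$.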
Remark 6. A relative $g$ of $f$ has the following property: $g$ is differentiable with
$g'(0) > 0$, but there is no open interval $I$ containing 0 with $g'(x) > 0$ for all
$x \in I$. Somehow, $g$ manages to oscillate so quickly that every neighborhood of
zero has points with negative derivative! We let
$$g(x) = \begin{cases} x^2 \sin(1/x) + 0.001x, & x \neq 0 \\ 0, & x = 0. \end{cases}$$
28. DARBOUXS THEOREM 111
Similar to how we showed things for $f$, we can show that $g'(0) = 0.001 > 0$, but
that if $x = \frac{1}{2\pi k}$ for a nonzero $k \in \mathbb{Z}$, then $g'(x) = -0.999 < 0$.
28 Darboux's theorem
Recall from Section 9 that a Darboux function is a function which satisfies the
conclusion of the intermediate value theorem. That is, $f$ is Darboux if for every
$a < b$ and every $y$ between $f(a)$ and $f(b)$, there is some $x \in (a, b)$ so that
$f(x) = y$. The IVT says that every continuous function is Darboux. Conway's
base 13 function is a discontinuous function which is Darboux. A step function
with two steps is an example of a function which is not Darboux.
In the last section, we saw that derivatives can be discontinuous. But not every
function is a derivative! In particular, one necessary condition for a function
to be a derivative is that it is Darboux:
Theorem 31 (Darboux's theorem). Suppose $f : \mathbb{R} \to \mathbb{R}$ is differentiable. Then
$f'$ is Darboux.
Proof. Fix $a < b$ and $y$ strictly between $f'(a)$ and $f'(b)$. Without loss of
generality, assume that $f'(a) > y > f'(b)$. Let $g(x) = f(x) - yx$. Let $x$ be a
point in $[a, b]$ where $g$ is maximized (one exists because $g$ is continuous on
$[a, b]$). Since $g'(a) > 0 > g'(b)$, $g$ is increasing at $a$ and decreasing at $b$, so
we must have $x \in (a, b)$. Therefore, $g'(x) = 0$, and hence $f'(x) = y$ as desired.
[TODO draw picture]
[TODO this section is very short. maybe combine with something like previous
section, or mention that characterizing derivatives is hard? -lhs]
with $0 < a < 1$ and $b$ a positive odd integer such that $ab > \frac{3}{2}\pi + 1$, is everywhere
continuous and nowhere differentiable. As an example, the function
$$g(x) = \sum_{n=0}^{\infty} (3/4)^n \cos(9^n x)$$
satisfies the requirements on $a$ and $b$. All the partial sums are differentiable
(each is a finite sum of cosines), but it turns out that in the infinite sum, the function oscillates so
rapidly at each point that it is nowhere differentiable!
Figure 5.3: The partial sum $\sum_{n=0}^{5} (3/4)^n \cos(9^n x)$ looks quite messy.
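$$f(x) := \sum_{n=0}^{\infty} \frac{1}{2^n} g(2^n x).$$
Each partial sum $f_N(x) = \sum_{n=0}^{N} \frac{1}{2^n} g(2^n x)$ is continuous, and
$$\|f_N - f\|_\infty \le \sum_{n=N+1}^{\infty} \frac{1}{2^n} \|g\|_\infty \le \frac{1}{2} \sum_{n=N+1}^{\infty} \frac{1}{2^n} \xrightarrow{N \to \infty} 0,$$
so $f$ is continuous (apply TODO cite uniform convergence thm on $C([0,1])$ and
use periodicity). Now pick some point $x \in \mathbb{R}$, and for each $n$ choose the integer $j$ with
$$u_n := \frac{j}{2^n} \le x < \frac{j+1}{2^n} =: v_n. \tag{5.1}$$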
30. DERIVATIVES AT INFINITY 113
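If $f$ were differentiable at $x$, then the ratio $\frac{f(v_n) - f(u_n)}{v_n - u_n}$ would converge to
$f'(x)$ as $n \to \infty$. But in fact,
$$\frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} \underbrace{\frac{g\left(\frac{j+1}{2^{n-k}}\right) - g\left(\frac{j}{2^{n-k}}\right)}{2^k (v_n - u_n)}}_{=: d_k} + \sum_{k=n}^{\infty} \frac{1}{2^k} \underbrace{\frac{g(2^{k-n}(j+1)) - g(2^{k-n} j)}{v_n - u_n}}_{= 0 \text{ by periodicity of } g \ (2^{k-n}(j+1),\, 2^{k-n} j \in \mathbb{Z})}, \tag{5.2}$$
where $d_k \in \{\pm 1\}$ since $g$ always looks like $x$ or $-x$ on $[0, 1]$. But then
$$\frac{f(v_n) - f(u_n)}{v_n - u_n} = \sum_{k=0}^{n-1} d_k$$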
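The dyadic difference quotients can be computed exactly with rational arithmetic. The sketch below assumes that $g$ is the distance to the nearest integer (a sawtooth of period 1 with slopes $\pm 1$ and maximum $1/2$), which is consistent with the estimates used here but is our assumption; the helper names are made up for the example. At a dyadic rational $j/2^n$, every term of the series with $k \ge n$ vanishes, so a finite sum gives the exact value of $f$.

```python
from fractions import Fraction
import math


def g(x):
    """Distance from x to the nearest integer (assumed form of the sawtooth g)."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def f_dyadic(x, n):
    """Exact value of f at a dyadic rational x = j/2^n; terms with k >= n are zero."""
    return sum(Fraction(1, 2**k) * g(2**k * x) for k in range(n))


def dyadic_quotient(x, n):
    """Difference quotient of f over the dyadic interval [u_n, v_n) containing x."""
    j = math.floor(x * 2**n)
    u, v = Fraction(j, 2**n), Fraction(j + 1, 2**n)
    return (f_dyadic(v, n) - f_dyadic(u, n)) / (v - u)


x = Fraction(1, 3)
print([dyadic_quotient(x, n) for n in range(1, 11)])
```

Each quotient is a sum of $n$ terms equal to $\pm 1$, so it is an integer with the same parity as $n$. Consecutive quotients therefore differ by an odd integer and cannot converge.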
30 Derivatives at infinity
Suppose $f$ is differentiable and $\lim_{x \to \infty} f(x) = L$ exists and is finite. What
can we say about $\lim_{x \to \infty} f'(x)$? Intuitively, it seems like $f'$ should go to zero
since $f$ has a limit, so it probably looks pretty flat for large $x$. But this intuition
is wrong! It is possible for $f'$ to have no limit as $x \to \infty$.
The easiest way to construct such an example is to make $f$ oscillate a lot as
$x \to \infty$, but in decreasing amplitude so that $f$ has a limit. Then the oscillations
will hopefully prevent $f'$ from limiting to zero. In fact, this is exactly what
happens with $f(x) = \frac{\sin(x^2)}{x}$. Although $\lim_{x \to \infty} f(x) = 0$, we can calculate the
derivative to be
$$f'(x) = \frac{d}{dx}\left[x^{-1} \sin(x^2)\right] = -\frac{\sin(x^2)}{x^2} + 2\cos(x^2),$$
which has no limit as $x \to \infty$. The function $f$ oscillates faster and faster
(because of the $x^2$) as $x$ increases, but at the same time, the $x^{-1}$ factor squeezes
$f$ to zero. The derivative, however, oscillates worse and worse! (Figure 5.4.)
We even have the function $g(x) = \frac{\sin(x^3)}{x}$, which oscillates so fast that the
derivative $g'(x) = 3x\cos(x^3) - \frac{\sin(x^3)}{x^2}$ is not even bounded as $x \to +\infty$! (Figure 5.5.)
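A quick numerical check of the first example, $f(x) = \sin(x^2)/x$: the sketch below samples $f$ and $f'$ at points where $x^2$ is a multiple of $\pi$, which are our own choice of sample points. There the sine term vanishes and the cosine term is $\pm 1$.

```python
import math


def f(x):
    return math.sin(x**2) / x


def f_prime(x):
    """Derivative of sin(x^2)/x for x != 0."""
    return -math.sin(x**2) / x**2 + 2 * math.cos(x**2)


for k in (10, 100, 1000):
    x_top = math.sqrt(2 * math.pi * k)           # x^2 = 2*pi*k, so cos(x^2) = 1
    x_bottom = math.sqrt((2 * k + 1) * math.pi)  # x^2 = (2k+1)*pi, so cos(x^2) = -1
    print(k, f(x_top), f_prime(x_top), f_prime(x_bottom))
```

The values of $f$ shrink toward 0 while $f'$ keeps returning to values near $2$ and $-2$, so $f'$ has no limit at infinity.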
Okay, so we just saw that oscillation at $\infty$ can prevent $\lim_{x \to \infty} f'(x)$ from
existing even if $\lim_{x \to \infty} f(x)$ exists and is finite. But what if we impose some
conditions on $f$ so it can't oscillate? How about we require $f$ to be monotonic?
That seems like a pretty good way to ensure that $f$ looks really flat at $\infty$.
Figure 5.4: The function $f(x) = \frac{\sin(x^2)}{x}$ oscillates a lot. Even though it gets squeezed
to a limit, its derivative is out of control!
Figure 5.5: The graph of $\frac{\sin(x^3)}{x}$ looks similar to that of $\frac{\sin(x^2)}{x}$, but its derivative
is even more out of control! [TODO increase samples to more like 3000 or 4000 when
actually compiling]
But even assuming this, we still cannot ensure $\lim_{x \to \infty} f'(x) = 0$! We can rig
a function so that it is increasing and differentiable, but near $+\infty$, it increases in
small discrete bursts that prevent $f'$ from having a limit! Let $g$ be the following
tent-post function, where each tent-post is a triangle centered at an integer
$n \in \mathbb{N}$. Each triangle has height 1, and base length $\frac{1}{2^n}$. This way, the triangle
31. BUMP FUNCTIONS AND PARTITIONS OF UNITY 115
centered at $n \in \mathbb{N}$ has area $\frac{1}{2^{n+1}}$.
Define
$$f(x) := \int_0^x g(t)\,dt,$$
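Here is a small sketch of the tent-post construction, assuming the triangles are centered at $n = 1, 2, 3, \ldots$ (the starting index is our assumption). Since $f' = g$ by the fundamental theorem of calculus, it is enough to evaluate $g$ at and between the integers, and to sum the triangle areas to see that $f$ stays bounded.

```python
def g(t):
    """Tent-post function: a triangle of height 1 and base 1/2^n centered at each n >= 1."""
    n = round(t)
    if n < 1:
        return 0.0
    half_base = 0.5 ** (n + 1)
    return max(0.0, 1.0 - abs(t - n) / half_base)


def f_at_half_integer(N):
    """f(N + 1/2): the total area of the first N triangles, each of area 1/2^(n+1)."""
    return sum(0.5 ** (n + 1) for n in range(1, N + 1))


for n in (1, 5, 20):
    print(n, g(n), g(n + 0.5), f_at_half_integer(n))
```

So $f$ is increasing and bounded above by $1/2$, yet $f'(n) = 1$ at every positive integer while $f'(n + \frac{1}{2}) = 0$, which rules out a limit for $f'$.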
Claim 4. There exists a function $\mathbb{R}^2 \to \mathbb{R}$ with no limit at $(0, 0)$, even though
all the directional limits exist and are equal.
differentiable vs equal mixed partials (gelbaum pg.120) i.e. why those requirements
in Clairaut's theorem are necessary. Also include a discussion about
why Clairaut's theorem is true at all! (Maybe involving Fubini's theorem.)
(ii) a differentiable function $\mathbb{R}^2 \to \mathbb{R}$ which has two local maxima but no local
minimum
Remark 7. True $\mathbb{R} \to \mathbb{R}$.
has only one critical point (a local maximum) but has no absolute extrema (it
is unbounded). Since $f$ is differentiable, we find critical points by solving
$$\nabla f = \left(3e^y - 3x^2,\ 3xe^y - 3e^{3y}\right) = 0.$$
The only critical point turns out to be $(1, 0)$, which is a local maximum by the
second partials test. ($D = f_{xx} f_{yy} - (f_{xy})^2$, and $D(1, 0) = (-6)(3 - 9) - 9 = 27 >
0$ while $f_{xx}(1, 0) = -6 < 0$.) We can evaluate $f(1, 0) = 1$. However, looking at
$f(x, 0) = 3x - x^3 - 1$ as $x \to -\infty$, we see $f(x, 0)$ gets arbitrarily large so that
32. MULTIVARIABLE LIMITS, DERIVATIVES, AND LOCAL EXTREMA ARE WEIRD117
f has no absolute maximum value. Thus the local maximum we found is not a
global maximum.
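The computation above can be checked numerically. The sketch assumes $f(x, y) = 3xe^y - x^3 - e^{3y}$, which is inferred from the stated gradient, the value $f(1, 0) = 1$, and the restriction $f(x, 0) = 3x - x^3 - 1$; treat that formula as an assumption of this example.

```python
import math


def f(x, y):
    """Assumed form of f, inferred from its gradient and from f(x, 0)."""
    return 3 * x * math.exp(y) - x**3 - math.exp(3 * y)


def grad(x, y):
    return (3 * math.exp(y) - 3 * x**2, 3 * x * math.exp(y) - 3 * math.exp(3 * y))


def hessian_det(x, y):
    """D = f_xx * f_yy - (f_xy)^2 from the second partials test."""
    fxx = -6 * x
    fyy = 3 * x * math.exp(y) - 9 * math.exp(3 * y)
    fxy = 3 * math.exp(y)
    return fxx * fyy - fxy**2


print(grad(1, 0), hessian_det(1, 0), f(1, 0))
print([f(x, 0) for x in (-2, -5, -10)])
```

The gradient vanishes at $(1, 0)$ and the second partials test confirms a local maximum there, while the values along $y = 0$ grow without bound as $x$ becomes very negative.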
TODO GRAPH
TODO single variable
Chapter 6
Measure
Then you also, perhaps you have
some faults? I do not believe
so.
120 CHAPTER 6. MEASURE
Figure 6.1: Duh, you can't fill a triangle with nonoverlapping unit squares.
Figure 6.2: A proof that the area of a triangle is $\frac{1}{2}bh$. Note that this argument
only makes any sense if the triangle actually fits inside the rectangle, i.e. only if the
top vertex of the triangle is above the base.
rectangle. This argument made several assumptions about how area works, like
that congruent shapes have the same area, and that you can calculate the area
of a region by decomposing it into subregions and adding up the areas of those
subregions. There are some nontrivial claims to be proven here!
Maybe you remember the classic and silly missing square paradox, where
a triangle appears to have two different areas based on two different decompositions.
(See Figure 6.3.) Of course, the missing square paradox is just a
simple illusion, as it ought to be. But have you ever seen a real proof that the
phenomenon purportedly exemplified by the missing square paradox can never
actually happen?
And how do you handle something like the area of a disk? Do you plan to
somehow cut up finitely many squares into finitely many pieces and perfectly
cover a disk? (See footnote 2.) Archimedes calculated the familiar area formula $\pi r^2$ for a disk
of radius $r$ as follows: We can cut the disk into $N$ congruent wedges, with $N$
very large. The wedges are essentially just triangles with height $r$ and base
length $(2\pi r)/N$, so the area of each wedge is approximately $\pi r^2/N$. Adding up,
we calculate that the area of the disk is approximately $\pi r^2$. The point is that
as we take $N \to \infty$, this approximation gets better and better. But how can we
justify this sort of limiting argument? (What is it that we are approximating,
exactly?) [TODO discuss Euclid's treatment of area]
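One hedged way to watch the approximation improve is to replace each wedge by the isosceles triangle with the same two radii as sides, i.e. to use the inscribed regular $N$-gon. This is a slightly different approximation from the "height $r$, base $2\pi r/N$" triangles in the text, and the function name is made up for the example, but it has the same limit.

```python
import math


def inscribed_polygon_area(r, N):
    """Total area of N congruent isosceles triangles with legs r and apex angle 2*pi/N."""
    return N * 0.5 * r * r * math.sin(2 * math.pi / N)


for N in (6, 20, 100, 10_000):
    area = inscribed_polygon_area(1.0, N)
    print(N, area, math.pi - area)
```

The gap to $\pi r^2$ shrinks as $N$ grows, which is the limiting behavior the argument relies on. What the code cannot tell us is what "the area of the disk" means in the first place.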
And there's another difficulty with this informal treatment of area. When
we're cutting up these squares, what sorts of cuts are we allowed to make,
anyway? Can we cut off the set of points with rational coordinates?
We hope we've convinced you by now that defining area is quite a tricky
business! Of course, there's nothing special about two dimensions. Volume
2 Note that weirdly enough, this particular task can actually be performed, in a sense.
Look up Tarski's circle-squaring problem. You might appreciate it more after first reading
Section 45.
33. EPISODE I: THE PHANTOM MEASURE (MEASURE IS PROBLEMATIC)121
Figure 6.3: The missing square paradox: it appears that two different decompositions
of the same triangle lead to two different area calculations. An alternative description
is that it appears that by translating the pieces in the lower decomposition, you can end
up with the same overall triangle we started with plus one free unit square. The
non-profound resolution to the paradox is that the small blue triangle and the large green
triangle are not similar, so neither overall figure is actually a triangle (the hypotenuses
are slightly bent). The two overall figures are not congruent.
Figure 6.4: How Archimedes calculated the area of a disk. The approximation
with $N = 20$ is shown; the area of each wedge is approximately $\pi r^2/20$. Another way
to understand this argument is to rearrange the wedges so that they form a shape
which is approximately rectangular; the width of this rectangle is approximately
half the circumference, and the height is approximately the radius, giving a total area
of $\frac{1}{2}(2\pi r) \cdot r = \pi r^2$. But what was Archimedes calculating exactly?
is similarly tricky, and in fact, length is already tricky. (We'd like to make
sense of our claim in Section 6 that the Cantor set has total length zero!) In
general, the problem of giving legitimate definitions for length, area, volume,
etc. is called the problem of measure. For each dimension $d$, we'd like a measure
function $m_d$, defined on subsets of $\mathbb{R}^d$. (E.g. $m_2$ is area.) What do we want
from this function?
Well, there should be no paradoxical decompositions, except silly illusions
like the missing square paradox. If you decompose a region in two different
ways (even if you use countably infinitely many pieces [TODO motivate this
specifically]) you should get the same measure calculation. For starters, let's
just try to solve the one-dimensional problem of measure, where these ideas are
crystallized into four basic Laws of Lengthodynamics that we would like our
measure function $m = m_1$ to satisfy:
Theorem 32 (Vitali). There does not exist a function $m : \mathcal{P}(\mathbb{R}) \to [0, \infty]$
which satisfies normalization, countable additivity, and translation invariance.
Theorem 32 is every conspiracy theorist's dream. The doubts we expressed
about the definition and basic properties of area were not just the idle speculations
of an overzealous reductionist. Measure is seriously messed up!
We'll pull off a variant of the missing square paradox with the unit circle, but this
time, instead of just being a cheap trick, it will be real magic. In particular,
we'll decompose $S^1$ into countably infinitely many disjoint pieces, and then
use translations and rotations to rearrange those pieces into two copies of the
Figure 6.5: All of the depicted points are in the same equivalence class, along with
infinitely many points which are not depicted. The points depicted are those reachable
from the point (1, 0) by walking counterclockwise an integer distance between 0 and
6; the entire equivalence class includes all points reachable by walking any arbitrary
integer distance in either direction.
original circle! We'll omit the relatively boring last step of the proof, where we
ought to show that this paradoxical decomposition implies Vitali's theorem.
For a point $p = (\cos\theta, \sin\theta) \in S^1$ and a number $\alpha \in \mathbb{R}$, define
But Theorem 32 already shows that the problem of measure (as posed) is
unsolvable. The problem of measure is quite a big problem indeed! It looks like
length is not as well-defined as you might hope. Stay tuned though, not all is
lost...3
34 Jordan measure
We saw in Section 33 that there is no reasonable way to assign a measure to
every subset of $\mathbb{R}$. But that's unacceptable. Surely, we shouldn't just declare
that ideas like length, area, and volume are meaningless gibberish! No, it is our
duty to make sense of them, so we will just have to shoot for a partial solution
to the problem of measure. This section will be about an early partial solution,
called Jordan measure. We'll write $m_d(E)$ to denote the Jordan measure of
a set $E \subseteq \mathbb{R}^d$. (So $m_1$ is length, $m_2$ is area, $m_3$ is volume, etc.) Following
Archimedes, to define $m_d(E)$, we want to take a sort of a limit. But you have
to be careful! We'll start with how the definition does NOT go.
Given a set $E \subseteq \mathbb{R}^d$, we might try to estimate the measure of $E$ by sampling
from $\mathbb{R}^d$ and seeing how many points we pick from $E$. To be more precise, we
could count how many points there are in $E$ with all integer coordinates. To
get a better estimate, we could count how many points there are in $E$ with
half-integer coordinates (and then we'd have to divide by $2^d$ to normalize). In
general, our $k$th estimate is given by
$$m_d(E, k) = \frac{1}{k^d} \, \#\left\{ a \in \mathbb{Z}^d : \frac{a}{k} \in E \right\}.$$
And then our temptation is to set $m_d(E) = \lim_{k \to \infty} m_d(E, k)$. The limit may
not always be defined, of course, but we're just shooting for a partial solution
here anyway. Let us call this quantity $\tilde{m}_d(E)$ instead, because it is not the true
Jordan measure.
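For a well-behaved set the sampling estimate does what we hope. The sketch below computes the $k$th estimate in dimension $d = 2$ for the closed unit disk; the function names and the `bound` parameter (a box $[-\text{bound}, \text{bound}]^2$ assumed to contain the set) are introduced only for this illustration.

```python
import math


def sample_estimate(indicator, k, bound):
    """k-th sampling estimate in the plane: k^-2 times the number of a in Z^2 with a/k in E.

    The set E is given by its indicator function and is assumed to lie
    inside the box [-bound, bound]^2.
    """
    limit = bound * k
    count = sum(
        1
        for ax in range(-limit, limit + 1)
        for ay in range(-limit, limit + 1)
        if indicator(ax / k, ay / k)
    )
    return count / k**2


def in_unit_disk(x, y):
    return x * x + y * y <= 1.0


for k in (1, 2, 10, 100):
    print(k, sample_estimate(in_unit_disk, k, bound=1))
```

The estimates approach $\pi$ as $k$ grows. The trouble only shows up for sets that interact badly with the sampling grid.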
Unfortunately, $\tilde{m}_1$ breaks the Second Law of Lengthodynamics: it's not
translation invariant! To see why, observe that the set $E_1 = \mathbb{Q} \cap [0, 1]$ has
$\tilde{m}_1(E_1) = 1$: we only sample at rational points, so $E_1$ looks just like $[0, 1]$ to our
sampling procedure. But its translate, $E_2 = (\mathbb{Q} \cap [0, 1]) + \sqrt{2}$, has $\tilde{m}_1(E_2) = 0$:
to our sampling procedure, $E_2$ looks empty!
It's inevitable that we break at least one Law, but translation invariance is
too good to give up. You might think we could patch $\tilde{m}_d$ up by just sampling
at irrational points, too. The formula that defines $m_d(E, k)$ makes perfect sense
even if $k$ is non-integral, so we could just try altering our definition of $\tilde{m}_d(E)$
by taking our limit over arbitrary real values $k$.
But this is no good either. We still violate the 2D analog of the Second Law:
$\tilde{m}_2$ is not rotation invariant! To see why, define
$$E_1 = \left\{ (x, y) \in [0, 1] \times [0, 1] : x = 0 \text{ or } \frac{y}{x} \in \mathbb{Q} \right\}$$
3 This is a cliffhanger, in case you hadn't noticed.
34. JORDAN MEASURE 125
That is, $E_1$ is the set of lines passing through the origin with rational slope,
intersected with the unit square. Then with our new definition, $\tilde{m}_2(E_1) = 1$:
we still only sample at points with rational slope, so $E_1$ looks just like the unit
square to our sampling procedure. But if we define $E_2$ to be $E_1$ rotated by an
irrational fraction of a full turn, then $\tilde{m}_2(E_2) = 0$.
So let's move on to the actual definition of $m_d(E)$. For the easiest case, if $E$
is a box, we define its measure to be the product of its side lengths. That is, if
$I_1, \ldots, I_d$ are sets of real numbers of the form $[a_i, b_i]$, $[a_i, b_i)$, $(a_i, b_i]$, or $(a_i, b_i)$,
then we define
$$m_d^-(E) = \sup\{m_d(A) : A \text{ is an elementary subset of } E\}. \tag{6.1}$$
$$m_d^+(E) = \inf\{m_d(A) : A \text{ is an elementary superset of } E\}. \tag{6.2}$$
Figure 6.7: Two different ways of writing the same elementary set in the plane as
a disjoint union of boxes (rectangles.) Each suggests the same area: 1 + 8 + 2 =
3 + 2 + 3 + 2 + 1.
Figure 6.8: An elementary subset of the unit disk $D$ with area 1.36, and an
elementary superset of $D$ with area 4.69. By definition (assuming that $D$ is Jordan
measurable, which it is) this shows that $1.36 \le m_2(D) \le 4.69$. Archimedes would
agree: $1.36 \le \pi \le 4.69$.
Proof. Recall that at the $n$th step in the construction of $\Delta$, we have an elementary
superset $\Delta_n$, and $\lim_{n \to \infty} m_1(\Delta_n) = 0$. Obviously $0 \le m_1^-(\Delta) \le
m_1^+(\Delta) \le m_1(\Delta_n)$ for every $n$, so we are done.
It turns out that lots of regions in the plane, like polygons, disks, and annuli,
are all Jordan measurable, and their Jordan measures are equal to the
35. LEBESGUE MEASURE 127
But even bounded sets can fail to be Jordan measurable. For example, take
$E = \mathbb{Q} \cap [0, 1]$. The inner Jordan measure of $E$ is of course 0, since $E$ does
not contain any intervals of positive length. But the outer Jordan measure of $E$ is 1: Suppose
$E$ is contained in the elementary set $I_1 \cup \cdots \cup I_n$. Without loss of generality,
assume these intervals are ordered, so that $I_k$ is to the left of $I_{k+1}$. Then the
right endpoint of $I_k$ must be equal to the left endpoint of $I_{k+1}$, since $\mathbb{Q}$ is dense.
Hence, $m(I_1) + \cdots + m(I_n) \ge 1$.
Time to evaluate the performance of $m_1$ at solving the problem of measure
in one dimension. Jordan adheres to the Second and Third Laws of Lengthodynamics:
$m_1$ is translation invariant, and $m_1([0, 1]) = 1$. And Jordan measure is
finitely additive, i.e. if $A$ and $B$ are Jordan measurable and disjoint, then $A \cup B$
is Jordan measurable with $m(A \cup B) = m(A) + m(B)$.
But Jordan is guilty of badly breaking the Zeroth Law: tons of sets are not
Jordan measurable. And Jordan breaks the First Law (countable additivity) as
well, because the collection of Jordan measurable sets is not even closed under
countable disjoint unions! E.g. for each rational number $q$, the singleton set $\{q\}$
is Jordan measurable, but their union, $\mathbb{Q}$, is not Jordan measurable. This is the
only thing that goes wrong; if $E_1, E_2, \ldots$ are Jordan measurable and disjoint,
and $\bigcup_n E_n$ is Jordan measurable, then $m_d(\bigcup_n E_n) = \sum_n m_d(E_n)$. One way to
prove this last fact is to use Lebesgue measure, which we'll discuss next section.
[TODO talk a little more about the history of this stuff]
35 Lebesgue measure
In the early 1900s, Henri Lebesgue (pronounced "luh beg") realized that we
can come much closer to a full solution to the problem of measure than Jordan
measure. We'll focus on the one dimensional case; the generalization to $d$
dimensions is straightforward. To begin, we'll define Lebesgue outer measure,
which is like Jordan outer measure, but a little more sophisticated. If $E \subseteq \mathbb{R}$,
a countable covering by intervals of $E$ is a (countable) sequence $I_1, I_2, \ldots$ of
disjoint open intervals such that $E \subseteq \bigcup_n I_n$. Intuitively, if we let $\ell(I_n)$ denote
the length of $I_n$ (i.e. $\ell((a, b)) = b - a$), then such a covering should provide an
upper bound of $\sum_n \ell(I_n)$ on the measure of $E$. So we define the Lebesgue outer
measure of $E$, denoted $\lambda^*(E)$, as follows:
$$\lambda^*(E) = \inf\left\{ \sum_n \ell(I_n) : (I_n) \text{ is a countable covering of } E \text{ by intervals} \right\}. \tag{6.4}$$
[TODO verify that this is actually the correct definition, haha. It's not standard,
but I think it's much more natural than the standard ones.] [TODO include
a little intuition and a picture] [TODO contrast the definition of Lebesgue
measure with that of Jordan measure] [TODO mention some other ways to
define/construct Lebesgue measure]
Let $\mathcal{M}$ denote the set of Lebesgue measurable sets. We define the Lebesgue
measure $\lambda : \mathcal{M} \to [0, \infty]$ by $\lambda(E) = \lambda^*(E)$. (So the only difference between Lebesgue
measure and Lebesgue outer measure is that Lebesgue measure is defined
on fewer sets.) It can be verified [TODO reference] that Lebesgue measure
satisfies the First, Second, and Third Laws of Lengthodynamics (countable additivity,
translation invariance, and normalization), along with various other
nice, intuitive properties. On Jordan measurable sets, Lebesgue measure and
Jordan measure coincide. Note that any individual point has Lebesgue measure
zero, so by countable additivity, any countable set (e.g. $\mathbb{Q}$) has Lebesgue
measure zero.
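The fact that a countable set is small can also be seen directly with coverings, which is a standard argument rather than anything specific to this book. The sketch below enumerates finitely many rationals in $[0, 1]$ and covers the $n$th one by an open interval of length $\varepsilon / 2^{n+1}$; the function names are hypothetical. These intervals may overlap, whereas the definition of a covering used here asks for disjoint intervals, but merging overlapping intervals can only decrease the total length.

```python
from fractions import Fraction


def enumerate_rationals(max_denominator):
    """List the rationals p/q in [0, 1] with q <= max_denominator, without repeats."""
    seen = set()
    ordered = []
    for q in range(1, max_denominator + 1):
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                ordered.append(r)
    return ordered


def cover(rationals, eps):
    """Cover the n-th rational (n = 0, 1, 2, ...) by an open interval of length eps / 2^(n+1)."""
    intervals = []
    for n, r in enumerate(rationals):
        half = eps / 2 ** (n + 2)
        intervals.append((r - half, r + half))
    return intervals


eps = Fraction(1, 100)
rationals = enumerate_rationals(12)
intervals = cover(rationals, eps)
total = sum(b - a for a, b in intervals)
print(len(rationals), float(total))
```

No matter how many rationals are enumerated, the total length stays below $\varepsilon$, because the lengths form a geometric series.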
How about the Zeroth Law? It turns out that tons and tons of sets are
Lebesgue measurable, e.g. open sets, closed sets (such as the Cantor set), sets
with outer measure zero, and Jordan measurable sets. In fact, the collection of
Lebesgue measurable sets has the same cardinality as the set of all subsets of
$\mathbb{R}$: the Cantor set $\Delta$ has outer measure zero, so every subset of $\Delta$ is Lebesgue
measurable, and $|\Delta| = |\mathbb{R}|$. So there are just as many measurable sets as there
are sets total. It also turns out that the set of Lebesgue measurable sets is
very robust, e.g. it is closed under countable union, countable intersection,
complementation, translation, reflection, continuous transformations [TODO
verify], and any other reasonable operation. Reality is approximated quite well
by pretending that all sets are Lebesgue measurable.
We know that the Zeroth Law must be broken, though, since Lebesgue
adheres to the other three Laws. But you might reasonably deny that we've
seen an actual example of a set which is not Lebesgue measurable. If you recall,
our proof of Theorem 32 relied heavily on the axiom of choice, so we didn't
really pin down any particular nonmeasurable set.
It turns out, this was not just an artifact of our proof! If you believe some
abstruse set-theoretic assumptions, then the axiom of choice is unavoidable in
any proof of the existence of nonmeasurable sets. (See footnote 4.) This is a bit frustrating,
but there is a silver lining. The necessity of the axiom of choice in proving the
existence of nonmeasurable sets suggests that you don't have to worry about
nonmeasurability too much, because if you've constructed some set $E$, as long
as your construction is nice and explicit, you can expect that $E$ is measurable.
Indeed, there are theorem versions of this last claim. See [TODO cite
4 Look up the Solovay model for the details.
36. THE SMITH-VOLTERRA-CANTOR SET 129
http://mathoverflow.net/questions/211507/measurability-and-axiom-of-choice] and
[TODO cite "Large cardinals imply that every reasonably definable set of reals
is Lebesgue measurable"].
[TODO some discussion of the history here e.g. which came first, Lebesgue
measure or Vitali sets? (I actually don't know the answer to this, but I'm
guessing Lebesgue measure came first, which raises the further question, what
was Lebesgue's inspiration for only defining his measure on a subset of $\mathcal{P}(\mathbb{R})$?)]
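[Figure: the first stages of the SVC construction. Stage 0: $[0, 1]$. Stage 1: $[0, \frac{3}{8}] \cup [\frac{5}{8}, 1]$. Stage 2: $[0, \frac{5}{32}] \cup [\frac{7}{32}, \frac{3}{8}] \cup [\frac{5}{8}, \frac{25}{32}] \cup [\frac{27}{32}, 1]$.]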
In each successive step, we remove an interval of length $\frac{1}{4^n}$ from the middle of
each of the remaining $2^{n-1}$ intervals. SVC is the set of all the points remaining
as we continue to remove these middle intervals forever. (It is all the points that are
never removed.)
The length of all the points in SVC, that is, the points remaining after
Figure 6.10: SVC is also called the barcode set. [TODO self-cite]
In fact, SVC has Lebesgue measure 1/2. But SVC does not contain any intervals.
To see why, observe that in the $n$th step of the construction, we have a disjoint
union of several intervals, all of the same length $\ell(n)$. The point is that $\ell(n) \to 0$
as $n \to \infty$ (since $\ell(n + 1) < \frac{1}{2}\ell(n)$), so for any $a < b$, the interval $(a, b)$ is not
contained in SVC, because eventually, $\ell(n) < b - a$. Since SVC is a countable
intersection of closed sets, it is closed, and hence it is nowhere dense.
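The construction is easy to carry out exactly with rational arithmetic. This sketch (with function names of our own choosing) removes an open middle interval of length $1/4^n$ from every interval at step $n$ and tracks the total remaining length.

```python
from fractions import Fraction


def svc_step(intervals, n):
    """Remove an open middle interval of length 1/4^n from each closed interval."""
    gap = Fraction(1, 4**n)
    result = []
    for a, b in intervals:
        mid = (a + b) / 2
        result.append((a, mid - gap / 2))
        result.append((mid + gap / 2, b))
    return result


def svc_stage(n):
    """The list of closed intervals remaining after n steps, starting from [0, 1]."""
    intervals = [(Fraction(0), Fraction(1))]
    for step in range(1, n + 1):
        intervals = svc_step(intervals, step)
    return intervals


def total_length(intervals):
    return sum(b - a for a, b in intervals)


for n in range(6):
    stage = svc_stage(n)
    print(n, len(stage), total_length(stage), stage[0][1] - stage[0][0])
```

After $n$ steps the remaining length is $\frac{1}{2} + \frac{1}{2^{n+1}}$, which decreases to $\frac{1}{2}$, while the length of each individual interval shrinks to 0.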
Another fun fact about SVC:
Proposition 25. SVC is not Jordan measurable.
Proof. The inner Jordan measure of SVC is zero, since SVC does not contain
any intervals. But the outer Jordan measure must be at least $\frac{1}{2}$, since outer Jordan
measure is an upper bound on Lebesgue measure.
The general criterion for Jordan measurability is as follows:
Theorem 33. A bounded set $E \subseteq \mathbb{R}$ is Jordan measurable if and only if its
boundary $\partial E$ has Lebesgue measure zero.
For example, since SVC is closed and contains no intervals, it is its own boundary; thus, the fact that
it has positive Lebesgue measure implies that it is not Jordan measurable. For
another example, the boundary of $\mathbb{Q} \cap [0, 1]$ is all of $[0, 1]$, so $\mathbb{Q} \cap [0, 1]$ is not
Jordan measurable. We'll omit the proof of Theorem 33, because it is a special
case of Theorem 39, which we'll discuss in Section 47.
Figure 6.11: The relationships between being countable, being meager, being
nowhere dense, and having measure zero. A solid arrow represents an implication,
while a dashed arrow represents a lack of implication. Each arrow is labeled with its
proof/counterexample.
We'll prove slightly more than this by showing that there is a comeager set
with measure zero. (A comeager set is the complement of a meager set; by
Baire's category theorem, any comeager set is nonmeager.)
Proposition 26. There exists a comeager set $E$ with $\lambda(E) = 0$.
Proof. For $\varepsilon > 0$, let $U_\varepsilon$ be an open set of measure at most $\varepsilon$ containing a neighborhood
of every rational number. Then $U_\varepsilon^c$ is nowhere dense, because if $I$ is an interval,
then there is a rational $q \in I$, and $U_\varepsilon^c$ misses an interval around $q$. Therefore,
the set
$$E = \bigcap_{n \in \mathbb{N}} U_{1/n}$$
is comeager. But on the other hand, for every $n$, $\lambda(E) \le \lambda(U_{1/n}) \le 1/n$, so
$\lambda(E) = 0$.
(Note that $\delta(E, x)$ might not be defined, if the limit is not defined.)
The main theorem here is Lebesgue's Density Theorem, which tells us that
TODO was not a coincidence.
In other words, Lebesgue's density theorem says that $\delta(E, x) = \chi_E(x)$ for almost
every $x$. [TODO give reference for proof Section ??] One nifty consequence
is that if we define
$$\Phi(E) = \{x : \delta(E, x) = 1\},$$
then $\Phi$ picks out one representative from each equivalence class of the equivalence
relation $\sim$ on Lebesgue-measurable sets defined by saying that $A \sim B$ iff
$\lambda(A \triangle B) = 0$.
$$A + B = \{a + b : a \in A,\ b \in B\}.$$
Theorem 35. Suppose $A$ and $B$ both have positive Lebesgue measure. Then
$A + B$ contains an interval.
One nifty corollary of the Steinhaus theorem is that if $E$ has positive measure,
then $|E| = |\mathbb{R}|$. (It's obvious that $E$ must be uncountable.) How about
a converse to the Steinhaus theorem? If $E - E$ contains a neighborhood of 0,
must $E$ have positive measure? Nope!
40. INTERSECTIONS OF MEASURE ZERO SETS AND THEIR IMAGES137
Proposition 27. Let $\Delta$ denote the Cantor set. Then $\Delta - \Delta = [-1, 1]$, despite
the fact that $\lambda(\Delta) = 0$.
Proof. It's obvious that $\Delta - \Delta \subseteq [-1, 1]$, since $\Delta \subseteq [0, 1]$. For the reverse
inclusion, observe that $c \in \Delta - \Delta$ if and only if the graph of the line $y = x + c$
intersects $\Delta \times \Delta$. Let $\Delta_n$ be the $n$th set in the construction of $\Delta$, so that
$\Delta \times \Delta = \bigcap_n (\Delta_n \times \Delta_n)$. For $c \in [-1, 1]$, it should be inductively clear from Figure 6.16 that
$y = x + c$ intersects every $\Delta_n \times \Delta_n$. Hence, if we let $L$ denote the graph of $y = x + c$,
then $L \cap (\Delta_n \times \Delta_n)$ is a decreasing sequence of nonempty compact subsets of
$\mathbb{R}^2$. An application of Cantor's intersection theorem [TODO we should state
this somewhere] completes the proof.
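The finite-stage version of this claim can be checked exactly. The following Python sketch (function names are hypothetical) builds the $n$th stage of the middle-thirds construction as a list of closed intervals with rational endpoints, forms all pairwise differences $[a, b] - [c, d] = [a - d,\ b - c]$, and merges them.

```python
from fractions import Fraction

def cantor_stage(n):
    """Closed intervals making up the n-th stage of the middle-thirds construction."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            next_intervals.append((a, a + third))      # keep the left third
            next_intervals.append((b - third, b))      # keep the right third
        intervals = next_intervals
    return intervals

def difference_set(intervals):
    """Union of [a, b] - [c, d] = [a - d, b - c] over all pairs, as merged closed intervals."""
    diffs = sorted((a - d, b - c) for a, b in intervals for c, d in intervals)
    merged = [diffs[0]]
    for lo, hi in diffs[1:]:
        last_lo, last_hi = merged[-1]
        if lo <= last_hi:                              # overlapping or touching
            merged[-1] = (last_lo, max(last_hi, hi))
        else:
            merged.append((lo, hi))
    return merged
```

At every stage tried, the merged result is the single interval $[-1, 1]$, which matches the picture behind the inductive step. This only illustrates the finite stages; passing to the intersection still needs the compactness argument in the proof.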
41 Noise sets
[TODO: think more about this name.] As in Section 38, for two measurable
sets $E, U \subseteq \mathbb{R}$, let $\delta_U(E)$ denote the density of $E$ in $U$, i.e. the fraction of $U$
which is filled up by $E$. We're going to define a noise set to be a measurable set
$E$ such that for every nonempty interval $I$,
$$0 < \delta_I(E) < 1.$$
So a noise set partially fills every interval. You might be imagining that $E$
partially fills $\mathbb{R}$ uniformly, i.e. $\delta_I(E)$ is constant at some value between 0
and 1. (So a picture of $E$ would just be gray.) But that can't happen! By the
Lebesgue density theorem, if $E$ is a noise set, then for every interval $I$, there
are subintervals of $I$ in which $E$ has density arbitrarily close to 1, and there
are subintervals of $I$ in which $E$ has density arbitrarily close to 0. So $E$ has
black and white patches all over the place, like static on your TV (hence the
name "noise set.") (See Figure 6.18.) In light of the Lebesgue density theorem,
it's really rather surprising that noise sets exist at all! But they do. For a fun
bonus, they can even have finite measure.
Figure 6.18: A noise set formed as in the proof of Theorem 37 using $X = \mathrm{SVC}(0.3)$.
Here, $\mathrm{SVC}(\alpha)$ is the set formed via the SVC construction with the interval removed
at step $n$ having length $\alpha^n$. (So in Section ??, we discussed $\mathrm{SVC}(\frac{1}{4})$.)
Theorem 37. There exists a noise set $E$ with finite measure.
[TODO point out that our set is $F_\sigma$, so it's a discontinuity set. Point out
that this is somewhat surprising, since we proved that there is a sense in which
$F_\sigma$ sets are never medium-sized, and this set seems very medium-sized]
42. CONVERGENCE IN MEASURE VS. POINTWISE CONVERGENCE139
Definition 28. We say that $f_n \to f$ in measure if for all $\varepsilon > 0$, the measure
$\mu(\{|f_n - f| > \varepsilon\}) \to 0$ as $n \to \infty$.
In other words, for a fixed $\varepsilon$, the measure of the set where $f_n$ differs from
$f$ by more than $\varepsilon$ goes to 0 as $n \to \infty$. Of course, now we want to figure out
whether these notions of convergence are the same, or whether one type of convergence
implies the other.
Note if we already started with a finite measure set like [0, 1], then conver-
gence in measure is the same as convergence locally in measure.
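Proof (of Proposition 28). Fix $k \in \mathbb{N}$, and suppose $f_n \to f$ a.e. on $C_k$. Then for every
$\varepsilon > 0$, the sets where $f_n - f$ is small, $\Omega_N := \{|f_n - f| \le \varepsilon,\ \forall n \ge N\} \cap C_k$, increase
to $C_k$ (a.e.). Thus the complement, where $f_n - f$ is large, i.e. $C_k \setminus \Omega_N$, decreases
to the empty set (a.e.) as $N \to \infty$. By some continuity properties of measures
(which uses that $\mu(C_k) < \infty$), this implies $\mu(C_k \setminus \Omega_N) \to 0$ as $N \to \infty$. Finally,
$$C_k \setminus \Omega_N = \{x \in C_k : |f_n - f| > \varepsilon, \text{ some } n \ge N\} \supseteq \{x \in C_k : |f_N - f| > \varepsilon\}.$$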
The measure of the set has to decrease if we want to converge in measure, but
convergence in measure doesn't care where the set is. Pointwise convergence
does care a lot though! If we can move the bad set repeatedly over all the
points (while the set is still shrinking), then we might have a chance at failing
to converge pointwise a.e.!
Let's look at $[0, 1]$ and some dyadic rationals (rational numbers with denominator
a power of 2). Now given $m \in \mathbb{N}$, which will be our index for the functions
$f_m$, write $m = 2^n + k$, $n \in \mathbb{N}_0 := \mathbb{N} \cup \{0\}$, $0 \le k < 2^n$. Define
$$f_m(x) = \begin{cases} 1, & \text{if } x \in \left[\frac{k}{2^n}, \frac{k+1}{2^n}\right] \\ 0, & \text{otherwise.} \end{cases}$$
What $f_m$ does is this: Partition $[0, 1]$ into the intervals of length $\frac{1}{2^n}$. On the $k$th
interval (starting from 0), make $f_m = 1$. Everywhere else, make $f_m = 0$. Each
time $k$ increases as $m$ increases, the interval where $f_m = 1$ gets shifted over
by $\frac{1}{2^n}$. Every time $n$ gets incremented, we repartition $[0, 1]$ into finer intervals
and repeat, shifting the intervals one step at a time across $[0, 1]$. As a result,
$f_m(x) = 1$ infinitely often for every $x$, even though $\mu(|f_m| > \varepsilon) = \frac{1}{2^n} \le \frac{2}{m} \to 0$
as $m \to \infty$ (for $0 < \varepsilon < 1$).
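To see the sliding intervals in action, here is a small Python sketch of the sequence just defined. The helper names are hypothetical; `decompose` recovers $n$ and $k$ from $m = 2^n + k$ using the bit length of $m$.

```python
from fractions import Fraction

def decompose(m):
    """Write m = 2**n + k with 0 <= k < 2**n and return (n, k)."""
    n = m.bit_length() - 1
    return n, m - 2 ** n

def f(m, x):
    """Indicator of the k-th dyadic interval of length 2**-n, where m = 2**n + k."""
    n, k = decompose(m)
    return 1 if Fraction(k, 2 ** n) <= x <= Fraction(k + 1, 2 ** n) else 0

def support_measure(m):
    """Length of the interval on which f_m equals 1."""
    n, _ = decompose(m)
    return Fraction(1, 2 ** n)
```

For any fixed $x \in [0, 1]$, each block $2^n \le m < 2^{n+1}$ contains an index with $f_m(x) = 1$, so the sequence $f_m(x)$ keeps returning to 1, while `support_measure(m)` shrinks to 0.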
TODO: picture
43 Borel measure
Recall that a countable union of closed sets is not necessarily closed or open. We
called such sets $F_\sigma$ sets. Similarly, a countable intersection of open sets is called
a $G_\delta$ set. The same phenomenon happens again, up one level: a countable union
of $F_\sigma$ sets is again an $F_\sigma$ set, but a countable intersection of $F_\sigma$ sets might be
neither $F_\sigma$ nor $G_\delta$. We call such sets $F_{\sigma\delta}$, and their complements are the $G_{\delta\sigma}$
sets.
These are the first few layers of the Borel hierarchy. In order to generalize,
we'll introduce some less cumbersome notation. We say that a set is $\Sigma^0_1$ if it
is open, and we say that a set is $\Pi^0_1$ if it is closed. (We also think of $\Sigma^0_1$ as
denoting the set of open sets, and similarly with $\Pi^0_1$, etc.) The $\Sigma^0_2$ sets are
the $F_\sigma$ sets (i.e. countable unions of closed sets.) Their complements are the
$\Pi^0_2$ sets. We continue like this, setting $\Sigma^0_{n+1}$ to be the collection of countable
unions of $\Pi^0_n$ sets and $\Pi^0_{n+1}$ to be the collection of complements of $\Sigma^0_{n+1}$ sets.
Notice that $\Sigma^0_n, \Pi^0_n \subseteq \Sigma^0_{n+1}, \Pi^0_{n+1}$. One can prove that for each $n$, the sets $\Sigma^0_n$
and $\Pi^0_n$ are distinct, so $\Sigma^0_n$ is not a good stopping point for any $n \in \mathbb{N}$. [TODO
include standard simple picture of inclusions between these things]
But maybe if we let $B_0 = \bigcup_n \Sigma^0_n$, then $B_0$ is a nice collection of sets? (Maybe
we can stop at infinity?) It's closed under complement. But it's not closed
under countable unions: there are sequences $E_1, E_2, \ldots$ with $E_n \in \Sigma^0_n$ so that
$\bigcup_n E_n \notin B_0$. So we still have not reached a good stopping point. Like Buzz
Lightyear, we must continue! We'll just have to call a set $\Sigma^0_\omega$ if it is a countable
union of sets which are $\Pi^0_n$ for some finite $n$. Taking complements gives $\Pi^0_\omega$.
For example, the standard ordering of N is a well ordering, but the standard
ordering of Z is not. [TODO give some intuition about how the well orderings
are all just sequences... except not necessarily familiar ones]
Naturally, we say that two ordered sets $(X, \le_X)$ and $(Y, \le_Y)$ are isomorphic
if there is a bijection $\varphi : X \to Y$ so that $x \le_X x' \iff \varphi(x) \le_Y \varphi(x')$. Roughly
speaking, ordinal numbers are canonical representatives for the isomorphism
types of well orders. The precise definition is a bit confusing.
set of those sets of sets of subsets of a nonempty set whose intersections are not $\sigma$-algebras
with the set of sets of those sets of subsets of the same nonempty set which are $\sigma$-algebras is
empty.
where every $a_n$ is an integer and there is some infinite subsequence $a_{n_1}, a_{n_2}, \ldots$
such that $a_{n_i}$ divides $a_{n_{i+1}}$ for every $i$. It turns out [TODO give reference for
proof] that $E$ is Lebesgue measurable but not Borel.
But the Borel sets form a pretty good sample of the Lebesgue measurable
sets!
Theorem 38. For every Lebesgue measurable set $A$, there exists a Borel set $B$
so that $\lambda(A \,\triangle\, B) = 0$. In fact, even better, there exist Borel sets $B_1, B_2$ so that
$B_1 \subseteq A \subseteq B_2$, but $\lambda(B_1) = \lambda(B_2)$.
[TODO give reference for proof] [TODO discuss history]
44 Measures in general
As we discussed to motivate all this measure stuff, length is just one of many
intuitive notions which are best understood via measure theory. In general, a
measure space consists of a set of points, a collection of measurable sets of
points, and an assignment of a measure to each measurable set. Of course, the
collection of measurable sets and the measure itself have to satisfy some axioms.
Definition 32. A measure space is a triple $(\Omega, \mathcal{F}, \mu)$, such that:
• $\Omega$ is a nonempty set (the set of points.)
• $\mathcal{F}$ is a $\sigma$-algebra on $\Omega$ (the collection of measurable sets.) (So we require
that the empty set is measurable and the collection of measurable sets is
closed under complement, countable union, and countable intersection.)
• $\mu : \mathcal{F} \to [0, \infty]$ is a function (the measure function) which satisfies
1. $\mu(\emptyset) = 0$.
2. If $E_1, E_2, \ldots$ is a countable sequence of disjoint measurable sets, then
$\mu(\bigcup_n E_n) = \sum_n \mu(E_n)$. (The measure is countably additive.)
Example 7. Let $\mathcal{M}$ denote the collection of Lebesgue-measurable subsets of $\mathbb{R}$.
Let $\lambda : \mathcal{M} \to [0, \infty]$ denote the Lebesgue measure. Then $(\mathbb{R}, \mathcal{M}, \lambda)$ is a measure
space.
Example 8. Let $\mathcal{B}$ denote the collection of Borel subsets of $\mathbb{R}$. Let $\lambda$ now
denote the restriction of Lebesgue measure to $\mathcal{B}$. Then $(\mathbb{R}, \mathcal{B}, \lambda)$ is a measure
space.
Example 9. One can extend the definitions of Lebesgue outer measure and
Lebesgue measurable to subsets of $\mathbb{R}^n$ easily enough (just replace intervals with
boxes.) This yields a measure space $(\mathbb{R}^n, \mathcal{M}, \lambda)$. In the case $n = 2$, this is area,
and in the case $n = 3$, this is volume.
Example 10. If $\Omega$ is any nonempty finite set, define $\# : \mathcal{P}(\Omega) \to \mathbb{R}_{\ge 0}$ by setting
$\#(E)$ to be the number of elements in $E$. Then $(\Omega, \mathcal{P}(\Omega), \#)$ is a measure space.
We call $\#$ the counting measure on $\Omega$.
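For a finite set, the axioms in Definition 32 can be checked by brute force. Below is a minimal Python sketch for the counting measure of Example 10; the function names are hypothetical, and on a finite set countable additivity reduces to additivity over pairs of disjoint sets.

```python
from itertools import combinations

def power_set(omega):
    """All subsets of a finite set, as frozensets."""
    items = list(omega)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def counting_measure(E):
    """The counting measure: number of elements of E."""
    return len(E)

def is_additive(measure, measurable_sets):
    """Check measure(A | B) == measure(A) + measure(B) for all disjoint pairs."""
    return all(
        measure(A | B) == measure(A) + measure(B)
        for A in measurable_sets
        for B in measurable_sets
        if not (A & B)
    )
```

With `omega = {1, 2, 3}`, `power_set(omega)` has 8 members, the empty set has measure 0, and `is_additive(counting_measure, power_set(omega))` holds.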
Example 11. Any situation with uncertainty, such as tossing two dice, or
predicting the weather, or taking a difficult multiple-choice exam, can be modeled
by a measure space $(\Omega, \mathcal{F}, \Pr)$, where $\Omega$ is the set of possible outcomes, $\mathcal{F}$ is
some suitable collection of events (sets of outcomes), and the measure function
$\Pr(E)$ gives the probability that the event $E$ occurs. For this reason, we
say that a probability space is a measure space $(\Omega, \mathcal{F}, \mu)$ with $\mu(\Omega) = 1$. We'll
meet all these characters again in Chapter 9.
Example 12. The appropriate way to formalize surface area (e.g. the area of
a sphere) is of course also in terms of measure theory. But we don't want to use
the Lebesgue measure on $\mathbb{R}^3$; that's volume (so the measure of a sphere is zero.)
We can't use the Lebesgue measure on $\mathbb{R}^2$ (a sphere is not a region in the plane.)
One suitable measure is called the Hausdorff measure. For each dimension $d$
and each Euclidean space $\mathbb{R}^n$, there is a Hausdorff measure $H^d_n$ which gives the
$d$-dimensional measure of a measurable subset of $\mathbb{R}^n$. The definition even makes
sense if $d$ is not an integer! [TODO give reference]
Example 13. Sometimes, physicists talk about point charges, but other times,
they talk about charge densities. Both of these are part of a more general
phenomenon: a charge distribution is properly represented by a measure space
$(\mathbb{R}^3, \mathcal{F}, q)$, where $q(E)$ gives the total amount of charge enclosed in $E \subseteq \mathbb{R}^3$.
Similarly, mass is best described with measure theory.
Non-Example 1. Jordan measure is not a true measure. The class of Jordan
measurable sets does not form a $\sigma$-algebra, e.g. because every singleton set is
Jordan measurable but there are countable sets which are not Jordan measurable
(such as $\mathbb{Q} \cap [0, 1]$.) Maybe it should have been called "Jordan pseudo-measure."
Oh well.
[TODO make this section sound less like a textbook. We don't even have
any pictures!]
Chapter 7
Integration
Just like with the definition of Jordan measure, we have to be careful with
this limit. We'll start by giving the wrong definition, analogous to the definition
That is, by definition, $f$ is "fake Riemann integrable" if the above limit exists, in
which case its integral is the value of that limit.
This is a fine definition as far as it goes, but it is not the familiar integral.
For example, the standard rule $\int_a^b f(x)\,dx + \int_b^c f(x)\,dx = \int_a^c f(x)\,dx$ is not
true if $\int$ is interpreted as a fake Riemann integral! For proof, let $f$ be the
Dirichlet function (i.e. the indicator function of $\mathbb{Q}$), let $a = 0$, let $c = 2$, and
let $b = \sqrt{2}$. Then $\int_a^b f(x)\,dx = 0$, because we only evaluate $f$ at irrational values
in the definition of the integral. Similarly $\int_b^c f(x)\,dx = 0$. But $\int_a^c f(x)\,dx = 2$,
because we only evaluate $f$ at rational values for the calculation of this last
integral!
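This failure of additivity can be checked exactly in code. The sketch below assumes the fake definition samples $f$ at the right endpoints $a + i(b - a)/n$ of a uniform partition (the definition itself is not reproduced in this passage, so that is an assumption), and it stores a number $p + q\sqrt{2}$ with rational $p, q$ as the pair `(p, q)`, which is rational exactly when `q == 0`. All names are hypothetical.

```python
from fractions import Fraction

ZERO = (Fraction(0), Fraction(0))    # 0
ROOT2 = (Fraction(0), Fraction(1))   # sqrt(2)
TWO = (Fraction(2), Fraction(0))     # 2

def is_rational(x):
    """x = (p, q) stands for p + q*sqrt(2); it is rational iff q == 0."""
    return x[1] == 0

def right_endpoints(a, b, n):
    """Sample points a + (i/n)(b - a) for i = 1, ..., n (assumed sampling rule)."""
    points = []
    for i in range(1, n + 1):
        t = Fraction(i, n)
        points.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return points

def rational_fraction(a, b, n):
    """Fraction of sample points where the Dirichlet function equals 1.

    Under the assumed rule, the fake Riemann sum is this fraction times (b - a).
    """
    hits = sum(1 for x in right_endpoints(a, b, n) if is_rational(x))
    return Fraction(hits, n)
```

On $[0, \sqrt{2}]$ no sample point is rational; on $[\sqrt{2}, 2]$ only the last one is, so the fraction is $1/n \to 0$; on $[0, 2]$ every sample point is rational, so the sums equal $2$ for every $n$.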
One of the ways to approach the definition of the true Riemann integral is
by using Darboux sums. The upper Darboux sum will overestimate the definite
integral, while the lower Darboux sum will underestimate the definite integral.
As we refine these upper and lower Darboux sums, their values will become
closer and closer to the actual value of the integral. It can be shown that
Darboux integration is equivalent to Riemann integration, so we often use the
upper and lower (Darboux) sums to introduce the Riemann integral.
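Definition 35. Let $f : [a, b] \to \mathbb{R}$. The upper Darboux sum for a partition
$P = \{x_0, \ldots, x_n\}$ on $[a, b]$ is defined as
$$U(f, P) := \sum_{i=1}^{n} M_i (x_i - x_{i-1}),$$
where $M_i := \sup_{[x_{i-1}, x_i]} f(x)$, the supremum of $f$ on the subinterval $[x_{i-1}, x_i]$.
Similarly, the lower Darboux sum is
$$L(f, P) := \sum_{i=1}^{n} m_i (x_i - x_{i-1}),$$
where $m_i := \inf_{[x_{i-1}, x_i]} f(x)$, the infimum of $f$ on the subinterval $[x_{i-1}, x_i]$.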
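As a quick illustration, here is a Python sketch of these sums in the one case where the sup and inf are trivial to compute: an increasing function, where they sit at the right and left endpoints of each subinterval. The restriction to increasing $f$ and the function names are assumptions made for the example.

```python
from fractions import Fraction

def uniform_partition(a, b, n):
    """Partition points x_0 < ... < x_n of [a, b] with equal spacing."""
    return [a + (b - a) * Fraction(i, n) for i in range(n + 1)]

def darboux_sums_increasing(f, partition):
    """Return (L(f, P), U(f, P)) for an increasing f.

    For increasing f, the inf on [x0, x1] is f(x0) and the sup is f(x1).
    """
    pairs = list(zip(partition, partition[1:]))
    lower = sum(f(x0) * (x1 - x0) for x0, x1 in pairs)
    upper = sum(f(x1) * (x1 - x0) for x0, x1 in pairs)
    return lower, upper
```

For $f(x) = x^2$ on $[0, 1]$ with 10 equal subintervals, the two sums trap the value $1/3$ and differ by exactly $(f(1) - f(0))/10$.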
These sums are essentially the same as the upper and lower approximations
used in many calculus classes. The upper and lower sums correspond to
summing the areas of the "upper rectangles" ($U(f, P)$) and "lower rectangles"
($L(f, P)$), so $U(f, P)$ and $L(f, P)$ trap the value of the integral between them.
(Figure 7.2.) How can we improve our estimate? By taking smaller and smaller
subintervals $[x_{i-1}, x_i]$, we can better approximate $f$ on each subinterval, which
will give a better approximation for the integral. As we get finer and finer partitions,
the upper sums $U(f, P)$ decrease, while the lower sums $L(f, P)$ increase,
squeezing the value of the integral between them. Using this idea, we can define
the upper and lower Darboux integrals.
46. INTRODUCTION TO THE RIEMANN INTEGRAL 149
Definition 36. The upper Darboux integral U (f ) is defined as
Now that we've finished defining the Darboux integral, you're probably wondering
about the definition of the actual Riemann integral. The main difference
from upper and lower Darboux integrals is that instead of the sup or inf on
$[x_j, x_{j+1}]$, we allow any function value $f(t_j)$, for $t_j \in [x_j, x_{j+1}]$.
Definition 37. A tagged partition of the interval $[a, b]$ is a set $\{x_0, \ldots, x_n\}$
with $a = x_0 < x_1 < \ldots < x_n = b$, along with a set $\{t_0, \ldots, t_{n-1}\}$ with
$t_j \in [x_j, x_{j+1}]$. The mesh of a partition is the length of the largest subinterval,
$\max_{1 \le j \le n} (x_j - x_{j-1})$.
In other words, a tagged partition is just a usual partition but we also tag a
point $t_j$ from each interval.
Definition 38. The Riemann integral $\int_a^b f$ exists and equals $S$ iff for all $\varepsilon > 0$,
there exists a $\delta > 0$ so that all tagged partitions with mesh $< \delta$ satisfy
$$\left| \sum_{j=0}^{n-1} f(t_j)(x_{j+1} - x_j) - S \right| < \varepsilon. \tag{7.2}$$
It can be shown that this is the same as the Darboux integral. But this
is a rather unwieldy definition, which is why we like to use the Darboux integral
definition. From now on, we'll use Darboux and Riemann integrability
interchangeably.
The relationship between the Riemann integral and Jordan measure is twofold.
On the one hand, the Riemann integral generalizes one-dimensional Jordan
measure. To see how, for a set $E \subseteq \mathbb{R}$, we define $\chi_E$, the indicator function of
$E$, by
$$\chi_E(x) := \begin{cases} 0 & \text{if } x \notin E \\ 1 & \text{if } x \in E. \end{cases} \tag{7.3}$$
But on the other hand, the Riemann integral is a special case of two-
dimensional Jordan measure:
Then $f$ is Riemann integrable if and only if $A_+$ and $A_-$ are both Jordan measurable.
In this case,
$$\int_a^b f(x)\,dx = m(A_+) - m(A_-). \tag{7.7}$$
So the Riemann integral and Jordan measure are two sides of the same coin.
47. DIRICHLET, THOMAE, AND LEBESGUES CRITERION 151
Figure 7.3: The key observation to prove the Thomae function is Riemann-integrable:
For any $\varepsilon > 0$, there are only finitely many rational numbers $r_j$ with $f(r_j) = \frac{1}{q_j} > \varepsilon$.
Note there are finitely many denominators $q_k$ with $q_k < \frac{1}{\varepsilon}$, and then only finitely
many numerators to go with each $q_k$.
Figure 7.4: We cover all the points $r_j \in (0, 1)$ such that $f(r_j) > \varepsilon$ with the yellow
intervals of total length $< \varepsilon$. The yellow intervals' contribution to the upper Darboux
sum is then $< \varepsilon$. On the remaining part of $[0, 1]$, shown in turquoise, the function $f$
takes values $\le \varepsilon$, and so the turquoise intervals' contribution to the upper Darboux
sum is also $< \varepsilon$.
we can force the upper sum to be $\le 2\varepsilon$ for any $\varepsilon > 0$, so the upper integral and
hence Riemann integral are just zero.
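The finiteness observation from Figure 7.3 is easy to make explicit. The following Python sketch (hypothetical helper names) lists every reduced fraction $p/q \in (0, 1)$ at which the Thomae function exceeds a given $\varepsilon$, i.e. every one with $1/q > \varepsilon$.

```python
from fractions import Fraction
from math import gcd

def thomae(x):
    """Thomae function at a rational x: 1/q for x = p/q in lowest terms."""
    return Fraction(1, x.denominator)

def large_value_points(eps):
    """All reduced p/q in (0, 1) with 1/q > eps, ordered by denominator."""
    points = []
    q = 2
    while Fraction(1, q) > eps:
        points.extend(Fraction(p, q) for p in range(1, q) if gcd(p, q) == 1)
        q += 1
    return points
```

For $\varepsilon = 1/4$ the only such points are $1/2$, $1/3$, and $2/3$. These are the points that the yellow intervals of Figure 7.4 need to cover.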
What's the difference between the Dirichlet and Thomae functions that
makes one Riemann integrable and one not? Recall a key fact about continuity
of these functions: the Dirichlet function is continuous nowhere, while the
Thomae function is continuous at every irrational (and discontinuous at every
rational). Could this be related to the fact that the Thomae function is Riemann
integrable, while the Dirichlet function is not?
It turns out yes, because there is the following criterion for Riemann-integrability
that is commonly known as Lebesgue's criterion.
For the ($\Rightarrow$) direction, which is easier, we need to show that $m(\{\omega_f(x) \ge \frac{1}{n}\}) = 0$
for each $n \in \mathbb{N}$. For the ($\Leftarrow$) direction, we use $m(\{\omega_f(x) \ge \frac{1}{n}\}) = 0$ for each
$n \in \mathbb{N}$ to construct a partition $P$ with $U(f, P) - L(f, P) < \varepsilon$.
The key relation between oscillation and the Riemann integral is
$$U(f, P) - L(f, P) = \sum_{j=1}^{n} \omega(f; [x_{j-1}, x_j])\,\Delta x_j. \tag{7.8}$$
where $F$ is an antiderivative of $f$. But what conditions do we need to impose on
$f$ to make sure that (7.9) works? One useful requirement is that $F$ should exist
so we can make sense of the equation. But do we need to worry about existence
of $\int_a^b f(x)\,dx$? If an antiderivative $F$ exists, do we need to say anything about
$\int_a^b f(x)\,dx$ existing?
It's not too hard to see that the answer is yes. Even if $f$ has an antiderivative,
it might not be integrable. For example, let
$$F(x) = \begin{cases} x^2 \sin\frac{1}{x^2} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases} \tag{7.10}$$
This is not technically Riemann integrable over the interval $[-1, 1]$, for the
"boring" reason that it is unbounded. Although in this case, one could get
around this technicality with improper integration.
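The unboundedness is a short computation. For $x \ne 0$, the product and chain rules give $F'(x) = 2x \sin\frac{1}{x^2} - \frac{2}{x} \cos\frac{1}{x^2}$, and at the points $x_k = 1/\sqrt{2\pi k}$ this equals $-2\sqrt{2\pi k}$. The Python sketch below checks this in floating point; the function names are hypothetical and the tolerance is an arbitrary choice.

```python
import math

def F(x):
    """F(x) = x^2 sin(1/x^2) for x != 0, and F(0) = 0, as in (7.10)."""
    return x * x * math.sin(1 / (x * x)) if x != 0 else 0.0

def F_prime(x):
    """Derivative of F at x != 0, by the product and chain rules."""
    return 2 * x * math.sin(1 / (x * x)) - (2 / x) * math.cos(1 / (x * x))

def spike_point(k):
    """x_k = 1 / sqrt(2*pi*k), where sin(1/x^2) = 0 and cos(1/x^2) = 1."""
    return 1 / math.sqrt(2 * math.pi * k)
```

As $k \to \infty$ the points $x_k$ tend to 0 while $F'(x_k) = -2\sqrt{2\pi k} \to -\infty$, so $F'$ is unbounded on every neighborhood of 0.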
48. EPISODE III: REVENGE OF THE ANTIDERIVATIVES (VOLTERRA)155
Take the portion of $r$ on $[0, 1/2]$, and rotate this portion around the point $(1/2, 0)$
to form the function $g : [0, 1] \to \mathbb{R}$. (Figure 7.7.)
Figure 7.7: This function $g$ is the basis for constructing Volterra's function.
Figure 7.8: The first and second steps of forming Volterra's function
When we do this for all intervals removed from $[0, 1]$ (i.e. in the limit), we get
Volterra's function, $V(x)$. More formally, we set
$$V(x) = \sum_{n=1}^{\infty} g_n(x),$$
where $g_n(x)$ consists of $2^{n-1}$ shrunk copies of $g$ slid into the $2^{n-1}$ intervals
removed in the $n$th step of forming SVC.
It turns out that $V$ is continuous and differentiable$^1$ on $[0, 1]$, and that $V'$
is bounded but discontinuous at every point in SVC. Since SVC has Lebesgue
measure 1/2, the set of discontinuities of $V'$ on $[0, 1]$ has measure 1/2, so
$V'$ is not Riemann-integrable on $[0, 1]$ by Lebesgue's criterion.
How did something like this sneak by the fundamental theorem of calculus?
Exactly for which functions does the fundamental theorem of calculus actually
hold? We'll revisit this problem later, in Sections 57 (we take derivatives of
integrals) and 61 (the FTC for grown-ups).
Remark 8. Although $V'$ is not Riemann-integrable on $[0, 1]$, it is integrable
on many other intervals. For example, if we integrate over one of the intervals
removed from SVC, then $V'$ is continuous except at the endpoint, and the
fundamental theorem of calculus holds. However, we can construct wilder functions
that are not Riemann-integrable on any interval. One such example is a
Pompeiu derivative, which will be introduced in Section 64.
$^1$ Differentiability is easy away from SVC. For SVC, differentiability from one side is easy,
and differentiability from the other side can be shown using the quadratic bounds on each $g_n$.
49. EPISODE IV: A NEW HOPE FOR INTEGRATION (LEBESGUE) 157
hoped. Moreover, what if we start with a nice function $f$, and then simply redefine
$f(x_0) := +\infty$ at a single point $x_0$? Then this new function isn't bounded and
so isn't Riemann integrable. But c'mon, it's just a single point, it's not like we
added any area, right?
In addition, we have oddities like Volterra's function. The fundamental
theorem of calculus for Riemann integrals requires overly restrictive conditions
that we don't like. Even simply being Riemann-integrable requires continuity
a.e. by Lebesgue's criterion.
Finally, there's also the failure of limits and integrals to commute. For
example, take $f_n$ increasing to the Dirichlet function (throw one rational up to 1
each time); then $\int_0^1 \chi_{\mathbb{Q}}(x)\,dx = \int_0^1 \lim_n f_n(x)\,dx$ doesn't exist as a Riemann
integral. It certainly isn't $\lim_n \int_0^1 f_n(x)\,dx = 0$, and we don't like that. We
would also like to exchange derivatives with integrals, but that is difficult when
limits and integrals don't necessarily commute.
In the early 1900s, Lebesgue addressed the issue of integration, and developed
a better method of integration now called Lebesgue integration. It allows
for more functions to be integrated and has nice properties that the Riemann
integral lacks. Informally, the idea of the Lebesgue integral is to partition the
$y$-axis instead of the $x$-axis. Instead of summing the height times the width of
rectangles partitioned along the $x$-axis, the idea is to sum the $y$-value times a
"weight" corresponding to how often the function takes values close to that
$y$-value. There's a quote from Lebesgue that tries to capture this idea [TODO
citation]:
Now we flesh out some details: for an interval of the $y$-axis $[y_0, y_0 + \Delta y]$, we
look at $x \in [a, b]$ such that $f(x) \in [y_0, y_0 + \Delta y]$. What we need to know is the
measure ($\mu$) of the set
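Figure 7.10: How Riemann counts his money vs. how Lebesgue counts his money.
Either way, it is clear that inventing new types of integrals is not very profitable.
(Oriel illustrated this)
[Figure: a graph over $[a, b]$ with the horizontal band between $y_0$ and $y_0 + \Delta y$ marked; the set of $x$ whose values fall in the band is $S_{y_0} = S_1 \cup S_2 \cup S_3$.]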
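The money-counting picture in Figure 7.10 translates directly into code. Here is a toy Python sketch, with hypothetical function names and made-up coin values: the "Riemann" total adds the coins in the order they arrive, while the "Lebesgue" total first groups them by value and then sums value times count.

```python
from collections import Counter
from fractions import Fraction

def riemann_style_total(coins):
    """Add up the coins one at a time, in the order given (walking along the x-axis)."""
    total = Fraction(0)
    for value in coins:
        total += value
    return total

def lebesgue_style_total(coins):
    """Group coins by value, then sum value * (how many coins have that value)."""
    counts = Counter(coins)
    return sum((value * count for value, count in counts.items()), Fraction(0))
```

Both functions return the same total on any finite list; the count of each value plays the role of the measure of the set where the function takes that value.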
Additionally, it will turn out that the Lebesgue integral agrees with the Riemann
integral whenever the latter is defined.
Theorem 41. If $f : [a, b] \to \mathbb{R}$ is Riemann integrable, then $f$ is Lebesgue
integrable on $[a, b]$ and the two integrals agree.
Finally, just as Riemann integration and the area under a curve relate to 2D
Jordan measure (cf. Section 46), so too do Lebesgue integration and the area
under a curve relate to 2D Lebesgue measure. (Figure 7.12.)
Proposition 33. Let $f \ge 0$ be measurable and let $A$ be the plane region under
the graph of $f$,
For measurability, which we probably should have shown first, note that it is
easy if $f$ is a nice simple "step" function (i.e. it only takes on finitely many
$y$-values), since then the "area under the curve" is a nice union of sets that
look like $A \times [y_1, y_2]$. For general $f$ measurable, we'll see shortly that we can
approximate it from below by simple functions. Then the area under the
curve will just be the union of the areas under the simple functions.
Figure 7.12: The integral of (part of) the Columbia River equals Oregon (mostly),
the region below the river. [Feel free to edit/replace if this isn't what you had in mind.
file is in gimp format (xcf) -lhs]
Figure 7.14: The integral of $c\chi_A$ is just $c\,\mu(A)$, which agrees with the nice simple
case where $A \subseteq \mathbb{R}$ and is a nice union of a few intervals.
50. INTEGRATION ON MEASURE SPACES 163
Using just this idea, we're going to define integrals starting with simple functions.
These are functions $\varphi : \Omega \to \mathbb{R}$ or $\mathbb{C}$ measurable such that $\operatorname{card} \varphi(\Omega) < \infty$,
i.e. it only takes on finitely many values. Maybe more clearly, such a function
always has a canonical representation in terms of characteristic functions,
$$\varphi = \sum_{i=1}^{N} c_i \chi_{A_i} \quad \text{with } c_i \ne c_j \text{ and } A_i \cap A_j = \emptyset \text{ for } i \ne j.$$
Note that if it weren't for $c_i \ne c_j$, this wouldn't be unique since we could further
decompose the $A_i$. Equivalently, we can just require that $A_i = \varphi^{-1}(c_i)$.
Figure 7.15: Some simple functions look simple. Be careful that the sets Aj may
be rather wild though! Those dotted lines on the left that make the picture look like
not-a-function represent something like the Dirichlet function or one of its relatives.
Just like in the really simple case of characteristic functions, define the integral
of a simple function by setting
$$\int \varphi \, d\mu := \sum_{i=1}^{N} c_i \mu(A_i).$$
Now that we have integration figured out for simple functions, we turn to measurable
functions in general. We're going to start with measurable functions
$f \ge 0$. Once we have this, then it will be easy to extend to general measurable
$f : \Omega \to \mathbb{C}$ by breaking $f$ up into real and imaginary parts and then positive
and negative parts. We will use simple functions to approximate $f \ge 0$. And
we're going to approximate it pointwise since that's convenient. This is the idea
of cutting up the $y$-axis.
Lemma 3 (Approximation lemma). Let $f \ge 0$ be measurable. Then there exists
a sequence of simple functions $\varphi_n \ge 0$ such that $\varphi_n$ increases to $f$ pointwise.
Proof. The idea is to cut up the interval $[0, 2^n]$ on the $y$-axis into segments of
length $\frac{1}{2^n}$. (Figure 7.16.) Then we set
$$\varphi_n = \sum_{j=0}^{2^{2n}-1} \frac{j}{2^n}\, \chi_{\left\{ \frac{j}{2^n} \le f < \frac{j+1}{2^n} \right\}}.$$
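Pointwise, the formula just rounds the value of $f$ down to a multiple of $2^{-n}$, as long as that value is below $2^n$. Here is a short Python sketch of that rounding (the name `phi` and the choice to work with a single value $y = f(x)$ are assumptions for illustration):

```python
from fractions import Fraction
from math import floor

def phi(n, y):
    """Value of the n-th simple approximant at a point where f takes the value y >= 0."""
    if y >= 2 ** n:
        # y lies outside [0, 2^n), so no term of the displayed sum is active here.
        return Fraction(0)
    return Fraction(floor(y * 2 ** n), 2 ** n)
```

For $y = 22/7$ the values for $n = 0, 1, 2, 3$ are $0, 0, 3, 25/8$; the sequence is nondecreasing, and once $2^n > y$ the error is less than $2^{-n}$.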
This means all the theorems we'll prove later about $L^p$ spaces or integration on
measure spaces carry over to sums. Of course, the special case for sums usually
has its own simpler proof, but it's fun to hit it with the general-measure-theory
hammer.
51. NICE FUNCTIONS AND THE 3 LIMIT THEOREMS 165
Lemma 4.
(i) If $0 \le f \le g$, then $\int_X f \, d\mu \le \int_X g \, d\mu$.
(ii) If $A \subseteq B$ and $f \ge 0$, then $\int_A f \, d\mu \le \int_B f \, d\mu$.
(iii) If $f \ge 0$ and $c \in \mathbb{R}$, then $\int_X cf \, d\mu = c \int_X f \, d\mu$.
Figure 7.17: We can switch limit and integral if the sequence of functions is pointwise
increasing.
Proof. First, $f = \sup f_n$ is measurable since $\{f > \gamma\} = \bigcup_{n \in \mathbb{N}} \{f_n > \gamma\}$, and
each set in the union is measurable.$^3$
$^3$ More generally, the pointwise limit of measurable functions is measurable; use $\lim = \limsup$
and the facts that sups and infs of measurable functions are measurable.
By Lemma 4, the sequence $\int_X f_n \, d\mu$ is increasing so converges to, say, $\alpha$.
Since $f_n \le f$,
$$\alpha \le \int_X f \, d\mu.$$
For the other inequality, approximate $f$ by a measurable simple function $s \le f$.
Let $0 < \theta < 1$ and set $X_n := \{x \in X : f_n(x) \ge \theta s(x)\}$. Since $\theta < 1$, we have
$X_n \nearrow X$. Also
$$\int_X f_n \, d\mu \ge \int_{X_n} f_n \, d\mu \ge \theta \int_{X_n} s \, d\mu, \quad n \in \mathbb{N}.$$
Letting $n \to \infty$ and using $X_n \nearrow X$ gives $\alpha \ge \theta \int_X s \, d\mu$.
Since this holds for all $\theta < 1$ and all simple functions $s \le f$, this implies
$$\alpha \ge \int_X f \, d\mu.$$
Note that the integral for $f_n$ does not need to be finite, and $f_n$ may not
converge. For example, look at $f_n(x) \equiv \frac{1}{n}$ on $\mathbb{R}$. Fatou's lemma tells us that
when we're integrating $\int_X f_n$, the integral can only drop down as we go to $\int_X f$.
(Figure 7.18.) The proof is by applying MCT to $g_n := \inf_{k \ge n} f_k$, since $g_n$ is
increasing and $\liminf_n f_n = \lim_n g_n$.
In fact,
$$\int_X |f(x) - f_n(x)| \, d\mu \to 0. \tag{7.17}$$
Proof. Apply Fatou to $2g - |f_n - f| \ge 0$ and subtract $\int_X 2g < \infty$ from both
sides to obtain
$$\limsup_n \int_X |f_n - f| \, d\mu \le 0,$$
Figure 7.18: The integral can only drop down in the limit.
Figure 7.19: The Dominated Convergence Theorem. TODO make the graph look
like a bicep :P or have the graph outlining the top of a bicep
using $\liminf(-h) = -\limsup h$. Note that (7.16) and (7.17) are equivalent
using$^4$ $\left| \int_X f \, d\mu \right| \le \int_X |f| \, d\mu$ and by applying (7.16) to $|f - f_n|$.
measurable $f$ such that $\int_X |f| \, d\mu < \infty$. We'll talk more about this in Section 53. If we
identify integrable functions that differ only on a measure zero set, then
$L^1$ is a Banach space with the norm
$$\|f\|_{L^1} := \int_X |f| \, d\mu. \tag{7.18}$$
Using the convergence theorems, we can prove nice things about nice func-
tions in L1 . For example,
This is easy to prove using MCT or DCT and the approximation lemma
(Lemma 3).
Simple functions are simple but maybe not particularly nice. But we also
have:
Proposition 35. Let $X$ be a locally compact metric space and $\mu$ a regular Borel
measure. Then the continuous compactly supported functions, denoted $C_c(X)$,
are dense in $L^1(X, \Sigma, \mu)$.
Proof sketch. The idea is to use Urysohn's lemma, which states that if $X$ is
a normal space and $A, B$ are closed disjoint subsets of $X$, then there exists
a continuous function $f : X \to [0, 1]$ with $f(x) = 0$ if $x \in A$ and $f(x) = 1$
if $x \in B$. Since simple functions are dense in $L^1$, we only need to show K
Remark 10. The theorem holds more generally, e.g. for $X$ a locally compact,
$\sigma$-compact Hausdorff space and $\mu$ a Radon (locally finite, inner regular) measure.
52 Detour: Convexity
[TODO make less like a textbook 26 June 2016 -lhs]
Convexity is a pretty useful topic in analysis. You've probably heard of
convex functions before (maybe the terms "concave up" and "concave down"
are familiar to you from first year calculus), but we're going to give a proper
definition here.
Before talking about convex functions, we need to talk about convex sets in
a vector space (e.g. think of convex polygons). So let $V$ be an $\mathbb{R}$- or $\mathbb{C}$-vector
space.
$$[x_1, x_2] = \{t x_2 + (1 - t) x_1 : 0 \le t \le 1\} \subseteq C.$$
In other words, the line segment between any two points in $C$ is contained in
$C$. (Figure 7.21.)
Figure 7.23: The function value is below the secant line value.
Proof. ($\Rightarrow$) This comes from looking at the geometry of convex functions, which
we'll do right after this.
($\Leftarrow$) Let $x_0 < x_1$, $x_0, x_1 \in I$. Define $x_t := t x_1 + (1 - t) x_0$, $0 \le t \le 1$. By the
As promised, now we'll look at the geometry of convex functions. The key
lemma is the 3-chord lemma. We're not going to prove it here, but drawing
pictures (Figure 7.24) should make it somewhat convincing.
Lemma 5 (3-chord lemma). Let $I \subseteq \mathbb{R}$ be an interval, $f : I \to \mathbb{R}$ be convex.
Then
$$\frac{f(b) - f(a)}{b - a} \le \frac{f(c) - f(a)}{c - a} \le \frac{f(c) - f(b)}{c - b},$$
for $a < b < c$ and $a, b, c \in I$.
TODO proof sketch! If $f$ is strictly convex, it can be shown that $t(x)$ satisfies
$t(x) < f(x)$ for all $x \ne x_0$, which can be used to show that if $f$ is strictly convex,
then we get equality in Jensen iff $f(x) = \int f \, d\mu$ a.e.
One immediate application of Jensen's inequality is the well-known AM-GM
inequality.
which is (7.23).
References: Garling [2]
53 An introduction to $L^p$ spaces
[TODO check when do we use sigma-finite?]
Basically, we take all measurable functions whose $p$th powers are integrable, and
then mod out by functions that are the same almost everywhere. Functions that are
equal a.e. have identical integrals, and we really don't care if functions differ
on just a set of measure zero. So $L^p$ is really a bunch of equivalence classes of
functions. Nevertheless, we'll generally still talk about elements of $L^p$ as usual
functions since it's convenient; just keep the equivalence class idea in the back
of your mind. We also define for $1 \le p < \infty$,
$$\|f\|_p := \left( \int_\Omega |f|^p \, d\mu(x) \right)^{1/p}, \tag{7.24}$$
which we have yet to show is a norm. In fact, this is not a norm for $p < 1$. Note
that in order for (7.24) to have any hope of being a norm, we need functions
that are zero a.e. to be identified with the zero element in $L^p$. So indeed we
need those equivalence classes.
Before going on, note that $L^1$ is just Lebesgue-integrable functions, or more
precisely, the set of equivalence classes of integrable functions under the relation
"equal a.e.". We have yet to define $L^\infty(\Omega, \Sigma, \mu)$. It gets a slightly different
definition:
and
$$\|f\|_\infty = \operatorname{ess\,sup} |f| = \inf\{C : |f(x)| \le C \text{ a.e.}\}.$$
The $L^\infty$ norm is basically the supremum of $f$, but not counting sets of measure
zero where $f$ might be large: if you go crazy and set $f = \infty$ on a set of measure
zero, we just ignore it. Hence the term "essential supremum."
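You might wonder, why do we call it $L^\infty$? Here's some motivation: If $\Omega$ has
finite measure, then $L^\infty(\Omega) \subseteq L^p(\Omega)$ and
$$\lim_{p \to \infty} \|f\|_p = \|f\|_\infty \tag{7.25}$$
for any bounded measurable function $f$. Note, we really mean that there is some
function in the equivalence class that is bounded. Equivalently, $\|f\|_\infty < \infty$. The
idea is that anything raised to the power $1/p$ is going to get killed as $p \to \infty$.
Let $S_\varepsilon := \{x : |f(x)| \ge \|f\|_\infty - \varepsilon\}$, with $0 < \varepsilon < \|f\|_\infty$. Then
$$\|f\|_p \ge \left( \int_{S_\varepsilon} (\|f\|_\infty - \varepsilon)^p \, d\mu \right)^{1/p} = (\|f\|_\infty - \varepsilon)\, \mu(S_\varepsilon)^{1/p}, \tag{7.26}$$
and $\mu(S_\varepsilon) > 0$ by the definition of the essential supremum, so $\liminf_{p \to \infty} \|f\|_p \ge \|f\|_\infty - \varepsilon$ for every such $\varepsilon$. In the other direction,
$$\|f\|_p \le \|f\|_\infty \, \mu(\Omega)^{1/p}, \tag{7.27}$$
so $\limsup_{p \to \infty} \|f\|_p \le \|f\|_\infty$.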
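As a quick sanity check of the limit (7.25), here is a minimal numerical sketch. The test function $f(x) = x$ on $[0,1]$ with Lebesgue measure, the helper name `lp_norm`, and the midpoint rule are all illustrative assumptions, not something taken from the text.

```python
def lp_norm(f, p, n=20_000):
    """Midpoint-rule approximation of (integral_0^1 |f(x)|^p dx)^(1/p)."""
    h = 1.0 / n
    total = sum(abs(f((i + 0.5) * h)) ** p for i in range(n)) * h
    return total ** (1.0 / p)


def f(x):
    # Hypothetical test function; its essential supremum on [0, 1] is 1.
    return x


# For this f the exact value is ||f||_p = (1 / (p + 1)) ** (1 / p).
norms = {p: lp_norm(f, p) for p in (1, 2, 10, 100)}
for p, value in norms.items():
    print(f"p = {p:3d}   ||f||_p ~ {value:.4f}")
```

The printed values increase with $p$ and creep up toward 1, the essential supremum of this particular $f$.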
Remark 12. One cool thing about $L^p$ spaces is that they include the "little $\ell^p$"
sequence spaces. For example, $\ell^p(\mathbb{N}) = L^p(\mathbb{R}, \mathcal{P}(\mathbb{R}), \mu_{\mathbb{N}})$, where $\mu_{\mathbb{N}}$ is the
measure that puts an atom of weight one at each point in $\mathbb{N}$. Cf. the end of Section 50,
the remark about sums being integrals in disguise.
Here are some useful properties of $L^p$ spaces, $1 \le p < \infty$. We're omitting
the proofs, but see [7] or any real analysis book that covers Lebesgue integration
if you're interested.
• $L^p$ is a normed vector space with respect to (7.24). This comes down to
the triangle inequality (Minkowski), which says
$$\|f + g\|_p \le \|f\|_p + \|g\|_p.$$
One possible proof uses the fact that $t \mapsto t^p$ is convex on $[0, \infty)$ for $p \ge 1$.
• $L^p$ is complete (Riesz-Fischer). The proof involves showing that a Cauchy
sequence in $L^p$ has a subsequence converging pointwise a.e. to some
measurable function, and then using Fatou.
• Hölder's inequality. Let $p, q > 1$ with $\frac{1}{p} + \frac{1}{q} = 1$. Then
$$\|fg\|_1 \le \|f\|_p \|g\|_q.$$
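Since sums are integrals against counting measure, Minkowski and Hölder can be checked on finite sequences. The following is only a sketch; the vectors `f`, `g`, the exponent `p = 3`, and the helper `p_norm` are arbitrary illustrative choices.

```python
def p_norm(v, p):
    """The little-l^p norm of a finite sequence (counting measure)."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)


# Hypothetical sample data.
f = [1.0, -2.0, 3.0, 0.5]
g = [0.5, 4.0, -1.0, 2.0]
p = 3.0
q = p / (p - 1.0)  # dual exponent, chosen so that 1/p + 1/q = 1

# Hoelder: ||f g||_1 <= ||f||_p ||g||_q
holder_lhs = sum(abs(a * b) for a, b in zip(f, g))
holder_rhs = p_norm(f, p) * p_norm(g, q)

# Minkowski: ||f + g||_p <= ||f||_p + ||g||_p
mink_lhs = p_norm([a + b for a, b in zip(f, g)], p)
mink_rhs = p_norm(f, p) + p_norm(g, p)

print(holder_lhs, "<=", holder_rhs)
print(mink_lhs, "<=", mink_rhs)
```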
Why doesn't Fubini's theorem work here? Here's the general measure-theoretic version of
Fubini's theorem (combined with Tonelli, which is for nonnegative functions).
We're glossing over the details about forming product measures.
Theorem 47 (Fubini-Tonelli).
TODO: non-warm-up. irrational stuff?
55. RADON-NIKODYM AND RIESZ REPRESENTATION 177
But wait! We can decompose $d\mu_s(x)$ further. We have a weight function $\mu(x) :=
\mu_s((-\infty, x])$ which is increasing and right-continuous (Lebesgue-Stieltjes
measure). So $\mu(x)$ has at most countably many discontinuities $x_k$, $k \in \mathbb{N}$, which we
split off:
$$\mu(x) = \sum_{k \in \mathbb{N}} \chi_{[x_k, +\infty)}(x) \underbrace{\left( \mu(x_k) - \mu(x_k^-) \right)}_{=: c_k > 0} + \mu_{sc}(x),$$
The singular continuous part $\mu_{sc}$ has no atoms (points $x_k$ where $\mu(\{x_k\}) > 0$).
The Riemann-Stieltjes/Lebesgue-Stieltjes measure for the Cantor function from
Section 63 is an example of a singular continuous measure.
Now onto the Riesz representation theorem.
Definition 43. A Hilbert space $H$ is a complete inner product space. In other
words, it's a vector space equipped with an inner product $\langle \cdot, \cdot \rangle$, complete in the induced norm, satisfying
1. Sesquilinear form: Linear in the second term, conjugate linear in the first.$^7$
2. Positive definite: For $x \neq 0$, $\langle x, x \rangle > 0$.
3. $\langle x, y \rangle = \overline{\langle y, x \rangle}$.
The easy examples of Hilbert spaces are $\mathbb{C}^n$. One important infinite-dimensional
Hilbert space is
$$L^2(\mathbb{R}) = \left\{ f : \mathbb{R} \to \mathbb{C} \text{ measurable} : \int_{\mathbb{R}} |f|^2 \, dx < \infty \right\} / \text{equality a.e.},$$
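Question: Can we think of any continuous linear functionals that are not of
the above form for some $\psi \in H$?
Answer: No; every continuous linear functional looks like $\langle \psi, \cdot \rangle$ for some $\psi \in H$.
Proof. WLOG assume $\varphi \in H^* \setminus \{0\}$ (statement trivial for $\varphi = 0$). Then $\ker \varphi \subsetneq
H$ is a closed subspace of $H$ (continuity of $\varphi$).$^8$ Hence, decompose $H =
\ker \varphi \oplus (\ker \varphi)^\perp$. We have $(\ker \varphi)^\perp \cong \operatorname{Range}(\varphi) = \mathbb{C}$, e.g. by the first isomorphism
theorem for vector spaces.$^9$ Thus $(\ker \varphi)^\perp = \operatorname{span}\{\psi_0\}$ for some $\psi_0 \in H \setminus \{0\}$.
Take $\psi = \frac{\overline{\varphi(\psi_0)}}{\|\psi_0\|^2} \psi_0$:
If $\chi = \alpha \psi_0 \in (\ker \varphi)^\perp$, then
$$\langle \psi, \chi \rangle = \left\langle \frac{\overline{\varphi(\psi_0)}}{\|\psi_0\|^2} \psi_0, \alpha \psi_0 \right\rangle = \alpha \varphi(\psi_0) = \varphi(\alpha \psi_0) = \varphi(\chi).$$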
56 Duality
From the Riesz representation theorem in the previous section, we know that
Hilbert spaces are self-dual, and in particular that $L^2$ is self-dual. What about
the other $L^p$ spaces? We only defined dual spaces for Hilbert spaces last time,
but they're the same thing for normed vector spaces.
Definition 45. Given a normed $\mathbb{C}$-vector space $(V, \|\cdot\|)$, we can consider its
dual, $V^* := \{\varphi : V \to \mathbb{C} \text{ linear and continuous}\}$, endowed with the operator
norm $\|\varphi\|_{V^*} := \sup_{\|x\| \le 1} |\varphi(x)|$.
It's not clear that the dual space is non-trivial in general (for infinite-dimensional
non-Hilbert spaces). Linearity is no problem since we can just define $\varphi$ on
a basis, but the continuity requirement could cause some problems. However,
conveniently, for $L^p$ spaces, we have it easy. First we'll spoil the surprise and
tell you what the dual space of $L^p$ is, for $1 < p < \infty$: If we define the dual
exponent $q$ via $\frac{1}{p} + \frac{1}{q} = 1$, then the dual space of $L^p$ is $L^q$! The definition of
the dual exponent should remind you of Hölder's inequality, $\|fg\|_1 \le \|f\|_p \|g\|_q$
where $\frac{1}{p} + \frac{1}{q} = 1$. In fact, we can use Hölder's inequality to get some prototypical
examples of elements in the dual space of $L^p$. So for us, there are no worries
about the dual space being trivial. Let $g \in L^q$; then the map $\varphi_g : L^p \to \mathbb{C}$
defined by
$$\varphi_g(f) := \int g f \, d\mu \tag{7.28}$$
Footnote 8: Continuity implies the preimage of 0 is closed; linearity implies it is a subspace.
Footnote 9: Alternatively, note that a single vector $\psi_0 \in (\ker \varphi)^\perp$ is enough, since any vector $\psi \in H$
can be written $\psi = \left( \psi - \frac{\varphi(\psi)}{\varphi(\psi_0)} \psi_0 \right) + \frac{\varphi(\psi)}{\varphi(\psi_0)} \psi_0$; this shows $(\ker \varphi)^\perp$ is 1-dimensional.
Figure 7.27: A vector space and its dual. TODO fix latex resolution
We select a ball $B_r(x)$ of radius $r > 0$, average $|f|$ over that ball, and then see
how large we can make that value. In terms of cricket batting averages, we fix a
point in time $x$, select how far back to go $r$, calculate the batting average over
that time interval, and then see how large we can make our batting average.
It turns out studying the Hardy-Littlewood maximal function will allow us to
generalize the fundamental theorem of calculus. Recall in 1D, for $f$ continuous,
$$F(x) := \int_{x_0}^{x} f(t) \, dt \text{ is differentiable, and } F'(x) = f(x) = \lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t) \, dt. \tag{7.29}$$
Figure 7.28: Mathematicians v. The Rest of the World: Hardy's team of
mathematicians go to play cricket in 1926. [TODO check I think this is public domain
now?]
The last expression $\lim_{h \to 0} \frac{1}{h} \int_x^{x+h} f(t) \, dt$ is related to the averages we take in
the Hardy-Littlewood maximal function.
What conditions do we need to impose on $f$ to ensure that (7.29) holds?
Continuity is certainly enough, but we want it to hold for more functions. The
Lebesgue differentiation theorem generalizes this to hold for $f \in L^1_{loc}(\mathbb{R}^n)$ ($f$ is
integrable on all compact subsets of $\mathbb{R}^n$), although we relax pointwise
convergence to just a.e. pointwise convergence.
Theorem 52. If $f \in L^1_{loc}(\mathbb{R}^n, d^n x)$, then
$$\frac{1}{|B_r(x)|} \int_{B_r(x)} f(y) \, d^n y \xrightarrow{r \to 0^+} f(x), \quad \text{a.e. } x \in \mathbb{R}^n.$$
In $\mathbb{R}$, this just becomes
$$\frac{1}{2h} \int_{x-h}^{x+h} f(y) \, dy \xrightarrow{h \to 0^+} f(x), \quad \text{a.e. } x \in \mathbb{R}.$$
You might complain, that's not exactly the derivative since we're averaging over
an interval $(x - h, x + h)$ instead of $(x, x + h)$ or $(x - h, x)$. Good news is, the
result still holds if you replace $(x - h, x + h)$ with $(x, x + h)$ or $(x - h, x)$. More
generally, you can replace $B_r(x)$ with any sequence of "nicely shrinking sets."
See Section 59 or [4, 7.9].
A useful and common method for proving pointwise convergence a.e. is via
maximal functions (like the Hardy-Littlewood maximal function) and maximal
inequalities.
Theorem 53 (magic of maximal functions). Let X be a Banach space. Suppose
we have a family of subadditive operators indexed by A, T : (X, k k)
57. LEBESGUES DIFFERENTATION THEOREM 183
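$$|K| \le \sum_{j=1}^{N} |B_{r_{x_j}}(x_j)| \le \frac{1}{\lambda} \sum_{j=1}^{N} \int_{B_{r_{x_j}}(x_j)} |f(y)| \, dy \overset{\text{pairwise disjoint}}{=} \frac{1}{\lambda} \int_{\bigcup_{j=1}^{N} B_{r_{x_j}}(x_j)} |f(y)| \, dy \le \frac{1}{\lambda} \|f\|_1.$$
But of course they're probably not going to be pairwise disjoint, so we're going
to need the following lemma.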
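Lemma 7 (Vitali covering lemma). Given a finite family of open balls $\{B_{r_j}\}_{1 \le j \le N}$
in a metric space, there is a nonempty $S \subseteq \{1, \ldots, N\}$ such that
• $B_{r_j} \cap B_{r_l} = \emptyset$ if $l \neq j \in S$,
• $\bigcup_{j=1}^{N} B_{r_j} \subseteq \bigcup_{j \in S} B_{3 r_j}$.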
Figure 7.29: Balls intersecting $B_{r_1}(x_1)$ are contained in $B_{3 r_1}(x_1)$.
So we can get $K \subseteq \bigcup_{j \in S} B_{3 r_j}(x_j)$. Thus
$$|K| \le \sum_{j \in S} |B_{3 r_{x_j}}(x_j)| = 3^n \sum_{j \in S} |B_{r_{x_j}}(x_j)| \le \frac{3^n}{\lambda} \sum_{j \in S} \int_{B_{r_{x_j}}(x_j)} |f(y)| \, dy$$
Finally, as promised in Section 38, we can now prove Lebesgue's density
theorem about metric density.
58 Convoluted convolutions
Convolution is a kind of averaging or "smearing" operation. If we start with
$f \in L^1$ which is not very nice at $x$, we can use convolution with a nice function
to smear out or average $f$ around $x$.
For $f, g \in L^1(\mathbb{R}^n)$, their convolution $f * g$ is
$$(f * g)(x) := \int_{\mathbb{R}^n} f(y) g(x - y) \, d^n y. \tag{7.32}$$
Often, we have $f \in L^1$, and $g$ a nice smooth function with a large peak at the
origin, like $\chi_{B_r(0)}$ except smooth. Then $g(x - y)$ acts like a smooth cut-off or
window function for $f$, and $(f * g)(x)$ is a sort of average of $f$ around $x$.
For example, we can take $g$ to be a family of $\delta$-function-wanna-bes. More
formally, we'll call them approximate or nascent $\delta$-functions. Given some
$\varphi \ge 0$ with support in $[-1, 1]$ and integral $\|\varphi\|_1 = 1$, we can make a family of
approximate $\delta$-functions by setting
$$\varphi_n(x) := n \varphi(nx).$$
Then $\operatorname{supp} \varphi_n \subseteq \left[ -\frac{1}{n}, \frac{1}{n} \right]$ and still $\|\varphi_n\|_1 = 1$. As $n \to \infty$, $f * \varphi_n(x)$ averages $f$
on smaller and smaller regions around $x$. As we average on smaller regions, we
might hope that $f * \varphi_n(x) \to f(x)$, so that the sequence of approximate delta
functions behaves like the actual Dirac delta.
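To see the averaging happen, here is a minimal sketch with a concrete bump. The tent function used for the bump, the choice $f(x) = |x|$, and the helper names are assumptions made for illustration; for this pair one can compute by hand that the convolution at 0 equals $1/(3n)$, which tends to $f(0) = 0$.

```python
def phi(x):
    """Hypothetical bump: a tent on [-1, 1], nonnegative with integral 1."""
    return max(0.0, 1.0 - abs(x))


def phi_n(x, n):
    """Rescaled bump n * phi(n x), supported in [-1/n, 1/n]."""
    return n * phi(n * x)


def convolve_at(f, n, x, steps=2000):
    """Midpoint-rule approximation of the convolution of f with phi_n at x.

    Uses the substitution t = x - y, so only t in [-1/n, 1/n] contributes.
    """
    a, b = -1.0 / n, 1.0 / n
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        t = a + (i + 0.5) * h
        total += f(x - t) * phi_n(t, n)
    return total * h


# f(x) = |x| is continuous but has a corner at 0.
values = {n: convolve_at(abs, n, 0.0) for n in (1, 10, 100)}
for n, value in values.items():
    print(n, value)
```

The smoothed values at the corner shrink like $1/(3n)$, so the averages do converge to $f(0)$ there.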
[TODO remark about $f * \varphi_n$ being nice; Dirac delta.]
Figure 7.30: Approximate $\delta$-functions. They look up to Dirac delta, both literally
(at the origin) and figuratively.
where we get $|f(y) - f(x)| < \varepsilon$ by only considering integration where $\varphi_n(x - y) \neq 0$
and using uniform continuity of $f$ on compact subsets of $\mathbb{R}^n$.
Next, we need the maximal inequality so we can apply the magic of maximal
inequalities. This gets a little complicated depending on what $\varphi$ looks like, but
if we assume an easy case then we're good to go:
Lemma 8. If $\varphi \in L^1$ is symmetric decreasing and $f \in L^1(\mathbb{R})$, then
$|(f * \varphi)(x)| \le (M_{HL} f)(x) \, \|\varphi\|_1$.
Proof (of Theorem 55). If $\varphi$ is symmetric and decreasing, then by the lemma,
we have $|f * \varphi_n| \le \underbrace{\|\varphi_n\|_1}_{=1} (M_{HL} f)$. Thus, defining the maximal operator
$(Tf)(x) := \sup_{n \in \mathbb{N}} |f * \varphi_n(x)|$, we obtain
$$|\{(Tf) > \lambda\}| \le |\{M_{HL} f > \lambda\}| \le C \frac{\|f\|_1}{\lambda}. \tag{7.33}$$
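If $\varphi$ is not symmetric and decreasing, then we can form its decreasing
symmetrization, $\tilde{\varphi}(x) := \sup_{|y| \ge |x|} \varphi(y)$, which is decreasing and symmetric
(Figure 7.31).
Figure 7.31: The decreasing symmetrization is shown in light blue. For 1D,
symmetric just means even.
Since $\varphi \le \tilde{\varphi}$,
$$|f * \varphi_n| \le |f| * \varphi_n \le |f| * \tilde{\varphi}_n \overset{\text{lemma}}{\le} (M_{HL} f) \, \|\tilde{\varphi}_n\|_1.$$
We still have $\tilde{\varphi}_n(x) = n \tilde{\varphi}(nx)$, so $\|\tilde{\varphi}_n\|_1 = \|\tilde{\varphi}\|_1$. So
$$|\{Tf > \lambda\}| \le \left| \left\{ M_{HL} f > \frac{\lambda}{\|\tilde{\varphi}\|_1} \right\} \right| \le C \frac{\|f\|_1}{\lambda} \|\tilde{\varphi}\|_1.$$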
Proof (of Lemma 8). We need a fun fact called the layer cake representation
(Figure 7.32): For measurable $g \ge 0$,
$$g(x) = \int_0^\infty \chi_{\{g > \lambda\}}(x) \, d\lambda. \tag{7.34}$$
$$g(x) = \int_0^{g(x)} 1 \, d\lambda = \int_{\mathbb{R}} \chi_{[0, g(x))}(\lambda) \, d\lambda = \int_0^\infty \chi_{\{g > \lambda\}}(x) \, d\lambda. \tag{7.35}$$
Figure 7.32: Layer cake representation. $\chi_{\{g > \lambda\}}(x) = 1$ for $\lambda$ in the green zone, and
then $\chi_{\{g > \lambda\}}(x) = 0$ for $\lambda$ in the red zone. The integral along the green zone just gives
us the height $g(x)$.
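The layer cake identity (7.34) can be checked at a single point with a crude Riemann sum over the level variable. This is only an illustrative sketch; the function name `layer_cake`, the cutoff `upper`, and the sample height 2.7 are assumptions.

```python
def layer_cake(g_value, upper=10.0, steps=100_000):
    """Riemann sum over levels in [0, upper] of the indicator of {g > level}.

    g_value plays the role of g(x) at one fixed x; it is assumed to lie
    in [0, upper], so truncating the integral at `upper` loses nothing.
    """
    h = upper / steps
    # Each level (i + 0.5) * h below g_value contributes a slab of height h.
    return sum(h for i in range(steps) if g_value > (i + 0.5) * h)


approx = layer_cake(2.7)
print(approx)
```

Stacking the thin slabs below the graph recovers the height, up to the slab thickness.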
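where we used a key fact: since $\varphi$ is symmetric and decreasing, we must have
$\{\varphi > \lambda\} = B_r(0)$ for some $r > 0$ (depending on $\lambda$). Then using (7.36),
$$\int |f(y)| \varphi(x - y) \, dy \overset{\text{Fubini}}{=} \int_0^\infty d\lambda \left( \int_{B_r(x)} |f(y)| \, dy \right)$$
$$= \int_0^\infty d\lambda \underbrace{\left( \frac{1}{|B_r(x)|} \int_{B_r(x)} |f(y)| \, dy \right)}_{\le (M_{HL} f)(x)} |B_r(x)|$$
$$\le (M_{HL} f)(x) \int_0^\infty \underbrace{|B_r(x)|}_{= |B_r(0)| = |\{\varphi > \lambda\}|} d\lambda = (M_{HL} f)(x) \, \|\varphi\|_1.$$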
$$m(E_j) \ge \alpha \, m(B_{r_j}), \quad j \in \mathbb{N}.$$
Figure 7.33: Nicely shrinking sets with say $\alpha = 1/4$. Each set $E_j$ in orange takes up
at least $\alpha$ of the area of some ball $B_{r_j}(x)$ containing it.
rectangles with edge lengths $\frac{1}{j}$ and $\frac{1}{j^2}$ is not a nicely shrinking set. As far as
the Lebesgue differentiation theorem cares, nicely shrinking sets are just as fine
to work with as balls:
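Theorem 56. Let $x \in \mathbb{R}^n$, and suppose $\{E_j(x)\}$ shrinks nicely to $x$. If $f \in
L^1_{loc}(\mathbb{R}^n)$, then
$$\frac{1}{m(E_j(x))} \int_{E_j(x)} f(y) \, d^n y \xrightarrow{j \to \infty} f(x), \quad \text{a.e. } x \in \mathbb{R}^n.$$
Proof.
$$\frac{1}{m(E_j(x))} \int_{E_j(x)} |f(y) - f(x)| \, d^n y \le \frac{1}{\alpha \, m(B_{r_j}(x))} \int_{B_{r_j}(x)} |f(y) - f(x)| \, d^n y \to 0 \quad \text{a.e.}$$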
Now the rectangles in $\mathbb{R}^2$ were starting to feel left out and became sad. They
wondered why they weren't included. Was this just an oversight? Or was it an
actual problem? Could there exist an $f \in L^1(\mathbb{R}^2)$ for which the conclusion of
the Lebesgue differentiation theorem failed for families of rectangles? How bad
could it be?
(We just did a small change of variables to get the $x - y$, since our rectangles
$R$ are centered at the origin rather than at $x$.)
TODO proof idea, Borel-Cantelli idea
References: Nicely shrinking sets: [4]; rectangles: [6]
Definition 47. Let $(X, \mu)$ and $(Y, \nu)$ be measure spaces, and let $T : L^p(X, \mu) \to
\{f : Y \to \mathbb{C} \text{ meas.}\}$. We say $T$ is of strong type $(p, q)$ if it is bounded from
$L^p(X, \mu)$ to $L^q(Y, \nu)$. We say $T$ is of weak type $(p, q)$, $q < \infty$, if it satisfies the
weak-type inequality
$$\nu(\{y \in Y : |Tf(y)| > \lambda\}) \le \left( \frac{C \|f\|_p}{\lambda} \right)^q, \quad \text{some } C > 0.$$
Remarks.
1. For $(X, \mu) = (Y, \nu)$ and $T$ the identity, the weak $(p, p)$ inequality is the
Chebyshev-Markov inequality,
$$\mu(|f| > \lambda) \le \frac{\|f\|_p^p}{\lambda^p}.$$
This is a weak type $(1, 1)$ operator (Hardy-Littlewood maximal inequality,
Theorem 54). It is also strong-type $(\infty, \infty)$ since $\|M_{HL} f\|_\infty \le \|f\|_\infty$. What about
$1 < p < \infty$? It turns out we can use the Marcinkiewicz interpolation theorem
to show us that $M_{HL}$ is strong type $(p, p)$ for all $1 < p \le \infty$.
Theorem 57 (Marcinkiewicz interpolation). Let $1 \le p_1 < p_2 \le \infty$. Let
$$T : \begin{cases} L^{p_1}(X, d\mu) \to L^{p_1}_w(Y, d\nu) \\ L^{p_2}(X, d\mu) \to L^{p_2}_w(Y, d\nu) \end{cases}$$
be subadditive and satisfy $\|Tf\|_{p_i, w} \le C_i \|f\|_{p_i}$ for
$i = 1, 2$. Then $T$ extends as $T : L^p(X, d\mu) \to L^p(Y, d\nu)$ with $\|Tf\|_p \le C_p \|f\|_p$
for $p_1 < p < p_2$.
60. WEAK AND STRONG OPERATORS AND UNHAPPY ROTATED RECTANGLES191
$$\underbrace{f}_{\in L^p} = \underbrace{f \chi_{\{|f| > \lambda\}}}_{\in L^{p_1}} + \underbrace{f \chi_{\{|f| \le \lambda\}}}_{\in L^{p_2}}$$
Remark 13. TODO Besicovitch sets and the Kakeya needle problem, ref to
other section
References: [4], [6]
61. THE FTC FOR GROWN-UPS (ABSOLUTE CONTINUITY) 193
Figure 7.36: The reaches $\tilde{R}_j$ of $R_j$.
Proposition 41. Suppose $g \in L^1(\mathbb{R})$. Then for every $\varepsilon > 0$, there exists a
$\delta > 0$ such that
$$m(E) < \delta \implies \int_E g(x) \, dx < \varepsilon.$$
The proof is just some basic measure theory. Basically, the statement is easy
for bounded functions, and continuity of measures along with $g \in L^1(\mathbb{R})$ ensures
that the measure of $\{g > N\}$ can't be too large. The point is, Proposition 41
smells an awful lot like continuity of $\int_a^x g \, dt$. It's a really strong kind of
continuity, stronger than uniform. Instead of just caring that $\int_x^y g \, dt$ is small for
Moreover,
$$AC([a, b]) = \left\{ f : [a, b] \to \mathbb{C} : f(x) = f(a) + \int_a^x g(t) \, dt, \text{ some integrable } g \right\}, \tag{7.38}$$
so absolutely continuous functions are exactly the ones for which the fundamental
theorem of calculus works. We'll state this in a theorem.
to show $f_+$ is increasing.
Proof (of Theorem 58).
1. $AC([a, b]) \subseteq C([a, b]) \cap BV([a, b])$. For (uniform) continuity, just take a
single interval. For bounded variation, take $\delta$ corresponding to $\varepsilon = 1$ in
the definition of a.c. Then the total variation on an interval of length $\delta$
is $\le 1$, and we can break up $[a, b]$ into $\left\lceil \frac{b - a}{\delta} \right\rceil$ such intervals, so the total
variation is $\le \left\lceil \frac{b - a}{\delta} \right\rceil$.
2. For $f \in AC([a, b])$ increasing, its associated Lebesgue-Stieltjes measure is
a.c. w.r.t. the Lebesgue measure: The associated LS-measure is
$$\mu_{LS}(E) = \inf \left\{ \sum_{j \in \mathbb{N}} f(b_j) - f(a_j) : E \subseteq \bigcup_{j \in \mathbb{N}} (a_j, b_j) \right\}. \tag{7.45}$$
$I_k$ with interior $(a, b)$, there exists an open interval $\tilde{I}_k = (a - \delta, b + \delta) \supseteq I_k$ such that
$f(b + \delta) - f(a - \delta) \le f(b) - f(a) + \varepsilon 2^{-k}$.
and the same thing works for $f'(x)$ with $[x - r, x]$. Thus, $f'(x) = h(x)$
exists Lebesgue-a.e., and $f(x) = f(a) + \int_a^x f'(t) \, dt$.
4. Finally, by Proposition 41, we get the inclusion $\supseteq$ in
$$AC([a, b]) = \left\{ f : [a, b] \to \mathbb{R} : f(x) = f(a) + \int_a^x g(t) \, dt, \text{ some } g : [a, b] \to \mathbb{R} \right\}. \tag{7.47}$$
References: [7]
Chapter 8
Episode V: Differentiation
strikes back
Note this almost looks like the fundamental theorem of calculus, except we
have an inequality. We can easily see the inequality is necessary in general: If $f$
has a jump discontinuity, then the integral of $f'$ won't notice, but $f(b) - f(a)$ certainly
will be affected (Figure 8.1). As we will see later (Section 63), discontinuities
aren't the only case where the inequality is necessary. We can have $f$ continuous
and yet require the strict inequality, basically because $f$ manages to do all of
its increasing on a set of Lebesgue measure zero.
The result that $f$ is differentiable a.e. reminds us of the same result for
absolutely continuous functions from Section 61. It turns out the proof of
Theorem 60 is quite similar to the proof of Theorem 58 (FTC for Grown-ups);
we'll use Lebesgue-Stieltjes measures, the Lebesgue differentiation theorem, and
Radon-Nikodym-Lebesgue, just like before. The difference is that we'll need to
deal with differentiating a measure that is singular with respect to the Lebesgue
measure. With absolutely continuous functions, we only had to deal with a
Lebesgue-Stieltjes measure that was absolutely continuous with respect to the
Lebesgue measure.
Figure 8.1: If $f$ has a jump discontinuity, then $f'$ has no idea how big the jump is.
Proof. We're going to be assuming that $f$ is increasing (just take $-f$ for decreasing).
First, if $f$ is absolutely continuous and increasing, then we do the same
thing as in the proof of Theorem 58: $\mu_{LS}([a, x]) := f(x) - f(a)$ is absolutely
continuous with respect to the Lebesgue measure $dx$, so by Radon-Nikodym,
$$f(x) - f(a) = \int_{[a, x]} h \, dx,$$
Since the same thing works for $f'$ with $[x - r, x]$, there are no problems here.
Now if we just have an increasing function $f$, decompose the associated
Lebesgue-Stieltjes measure$^1$ $\mu_{LS}([a, x]) := f(x) - f(a)$ via the Radon-Nikodym-Lebesgue
decomposition theorem (Theorem 48): $\mu_{LS} = \mu_{ac} + \mu_{sing}$, so
$$f(x) - f(a) = \mu_{ac}([a, x]) + \mu_{sing}([a, x]) = \int_{[a, x]} g \, dx + \mu_{sing}([a, x]) \tag{8.1}$$
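$$(D\mu)(x) := \lim_{\varepsilon \to 0} \frac{\mu(B_\varepsilon(x))}{|B_\varepsilon(x)|} = 0, \quad \text{a.e. } x \in \mathbb{R}^n.$$
Proof. TODO
Now suppose $\mu$ is singular wrt $dx$. Take $\Lambda$ to be a support for $\mu_{sing}$ with
$|\Lambda| = 0$, and then apply the first part to $A = \mathbb{R} \setminus \Lambda$. Thus we can remove the
sup in lim sup to get $(D\mu)(x) := \lim_{\varepsilon \to 0} \frac{\mu(B_\varepsilon(x))}{|B_\varepsilon(x)|} = 0$ for a.e. $x \in \mathbb{R}$.
We can adapt this for nicely shrinking sets as we did in the Lebesgue
differentiation theorem [TODO check this] to get
$$f'(x) = \lim_{h \to 0} \frac{\mu_{LS}([x, x + h])}{h} = g(x) \quad \text{Lebesgue-a.e.}$$
So $f$ is differentiable Lebesgue-a.e., and
$$f(b) - f(a) = \mu_{LS}([a, b]) \ge \mu_{ac}([a, b]) = \int_{[a, b]} g(x) \, dx = \int_a^b f' \, dx.$$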
A strict inequality in Theorem 60 occurs when we have nonzero contribution
from $\mu_{sing}$. The measure $\mu_{sing}$ includes jumps (pure point), but also more exotic
measures called singular continuous (i.e. singular but not pure point). We'll
see examples of these in Section 63, and they'll give us examples of monotone,
continuous $f$ with a strict inequality in Theorem 60.
As a final application, we can extend Theorem 60 to functions of bounded
variation (recall these were just defined in Section 61). Since functions $f \in
BV([a, b])$ have a Jordan decomposition (Theorem 59) $f = f_+ - f_-$ where $f_\pm$
are increasing, we get:
manages to creep up from 0 to 1. It's also called the devil's staircase, even
though it's not particularly evil. We could have defined this function way earlier
in this book, but we waited until now because we want to make a connection to
Lebesgue-Stieltjes measures.
More specifically, the Cantor function $F : [0, 1] \to [0, 1]$ is continuous and
surjective, and it has derivative 0 everywhere except on the Cantor set. That
is, it has derivative 0 on all the gaps in the Cantor set, and manages to creep
from 0 up to 1 by moving only on the Cantor set! The function $F$ is defined
iteratively, by making $F_n$ constant on the gaps $I_n$ (the gaps in the $n$th step of
constructing the Cantor set) and linear in-between. The first few iterations are
shown in Figure 8.2.
Figure 8.2: Constructing the Cantor function, also known as the devil's staircase.
On the $j$th gap in $I_n$, $F_n(x) = \frac{j}{2^n}$. Outside the gaps, we connect linearly.
Then we define $F(x) := \lim_{n \to \infty} F_n(x)$. We'll actually prove $F$ has the
desired properties using a different construction of the Cantor function via base-3
expansions. First, we have
$$I_n = \left\{ x \in [0, 1] : x = \sum_{k \in \mathbb{N}} \frac{a_k}{3^k} \text{ with } a_k \in \{0, 1, 2\} \text{ and } a_j \in \{0, 2\} \text{ for } 1 \le j \le n \right\}.$$
(Any $x$ with some $a_k = 1$ comes from a removed middle third.) Take some time
to convince yourself of this characterization.
So each $x$ in the Cantor set is uniquely represented by a sequence of zeros and twos. We
just write $x$ in base 3 without any 1s (there are some ambiguities and details to
work out here, but we'll skip them; they come from non-uniqueness of base-3
expansions, like 0.1 (base 3) = 0.02222... (base 3)). Then define $f$ from the Cantor set to $[0, 1]$
via
$$x = \sum_{k=1}^{\infty} \frac{a_k}{3^k} \overset{f}{\mapsto} \sum_{k=1}^{\infty} \frac{a_k}{2} \frac{1}{2^k}.$$
All we're doing here is taking the base 3 expansion of $x$, replacing 2s with 1s,
and pretending the expansion is in base 2. For example, $\frac{2}{3}$ = 0.2 (base 3) $\mapsto$
0.1 (base 2) = $\frac{1}{2}$.
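The digit recipe is easy to run by machine. Below is a minimal sketch using exact rational arithmetic; the function name `cantor_function`, the digit cutoff, and the convention of stopping at the first ternary digit 1 (which places $x$ in a removed middle third or at its left endpoint, where the function is constant) are illustrative choices rather than the text's own construction.

```python
from fractions import Fraction


def cantor_function(x, digits=40):
    """Approximate the Cantor function at x in [0, 1] from ternary digits."""
    x = Fraction(x)
    if x == 1:
        return Fraction(1)
    value = Fraction(0)
    for k in range(1, digits + 1):
        x *= 3
        d = int(x)  # next base-3 digit: 0, 1, or 2
        x -= d
        if d == 1:
            # Remaining digits do not matter: the function is constant here.
            return value + Fraction(1, 2 ** k)
        # Replace the ternary digit 2 by the binary digit 1.
        value += Fraction(d // 2, 2 ** k)
    return value


print(cantor_function(Fraction(2, 3)))  # 0.2 (base 3) -> 0.1 (base 2)
print(float(cantor_function(Fraction(1, 4))))
```

For instance $\frac{1}{4}$ = 0.020202... (base 3) maps to 0.010101... (base 2), which is $\frac{1}{3}$ up to the truncation after 40 digits.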
63. THE CANTOR FUNCTION AND LEBESGUE-STIELTJES MEASURES203
Informally, this means to fill in the gaps (removed middle thirds), we just find
the largest value $L$ to the left of the gap, and then make the function constant,
equal to $L$, along this gap. We have $F(x) = f(x)$ if $x$ is in the Cantor set and $F(x) = \frac{k}{2^n}$ if $x$
is in the $k$th gap of $I_n$ ($1 \le k \le 2^{n-1}$). Note $F$ (and $f$) are surjective onto $[0, 1]$
(base 2 expansions). Since $F$ is monotone, this means it must be continuous
(the only discontinuities allowed are jump discontinuities, but then it wouldn't
be surjective).
Remark 14 (Lebesgue-Stieltjes measures). Recall from Section 55 that we can
decompose a measure into three mutually singular measures,
$$d\mu(x) = d\mu_{ac}(x) + \underbrace{\sum_{k \in \mathbb{N}} c_k \delta_{x_k}(x)}_{\text{pure point component}} + \underbrace{d\mu_{sc}(x)}_{\text{singular continuous}},$$
where the singular continuous part $\mu_{sc}$ has no atoms (points $x_k$ where $\mu(\{x_k\}) >
0$). The Lebesgue-Stieltjes measure for the Cantor function,
The idea is that we put bumps at all the rationals, and then we add up all the
bumps we run over on the way from 0 to $x$. The measure $\mu([0, x])$ is a singular
measure (wrt the Lebesgue measure $dx$), so Theorem 10 implies
$$\frac{df}{dx}(x) = \frac{d\mu([0, x])}{dx} = 0 \quad \text{Lebesgue-a.e.}$$
to handle $f$ not necessarily continuous. This way, things make sense and we have
$\mu(\{a\}) = f(a+) - f(a-)$, the size of the jump. Note also that the actual value
of $f$ at a discontinuity is irrelevant. A useful theorem in measure theory (which
also gives us a way to define Lebesgue measure) ensures these Lebesgue-Stieltjes
measures are well-defined:
65 Fabius function
TODO write. [combine with Taylor series section? or combine with cantor function
section] http://mathoverflow.net/questions/43462/existence-of-a-smooth-function-with-
66 Pompeiu derivative
TODO write. [TODO combine with discontinuity sets of derivatives? -lhs]
67 Functional derivatives
This section is going to be a little unconventional, if that phrase even means
anything for this book. Our starting place is with objects called functionals.
This function is continuous and differentiable in all but the first variable. Of
course if we restrict our domain to not allow the first variable to be zero, we
once more get a continuous and differentiable object, but where's the fun in
that?
Now that we have a handle on discrete functionals, let's move on to the
continuum version. For example,
$$F(h) = \int_0^1 h(x)^2 \, dx$$
$$\lim_{N \to \infty} \frac{f_N(\vec{x})}{N} = F(h).$$
This is the connection between the discrete and continuum cases, along the
same lines as the connection between finite differences and derivatives.
$$D_{\hat{e}} f(\vec{x}) = \hat{e} \cdot \nabla f(\vec{x}) \equiv \lim_{\varepsilon \to 0} \frac{f(\vec{x} + \varepsilon \hat{e}) - f(\vec{x})}{\varepsilon}.$$
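This is all pretty abstract, so let's get back to our example. Let
$$F(h) \equiv \int_0^1 h(x)^2 \, dx$$
as usual. Then
$$\frac{\delta F(h)}{\delta w} = \lim_{\varepsilon \to 0} \int_0^1 \frac{(h(x) + \varepsilon w(x))^2 - h(x)^2}{\varepsilon} \, dx = \lim_{\varepsilon \to 0} \int_0^1 2 h(x) w(x) + \varepsilon w(x)^2 \, dx = \int_0^1 2 h(x) w(x) \, dx.$$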
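The limit above can be watched numerically. In this sketch the choices $h(x) = x$ and $w(x) = 1$, the midpoint rule, and all helper names are assumptions for illustration; with these choices the difference quotient is exactly $1 + \varepsilon$, which tends to $\int_0^1 2 h w \, dx = 1$.

```python
def integrate(func, n=10_000):
    """Midpoint rule on [0, 1]."""
    step = 1.0 / n
    return sum(func((i + 0.5) * step) for i in range(n)) * step


def F(h_func):
    """The functional F(h) = integral_0^1 h(x)^2 dx."""
    return integrate(lambda x: h_func(x) ** 2)


def difference_quotient(h_func, w_func, eps):
    """(F(h + eps w) - F(h)) / eps, the discrete stand-in for the derivative."""
    shifted = lambda x: h_func(x) + eps * w_func(x)
    return (F(shifted) - F(h_func)) / eps


# Hypothetical sample direction and base point.
h_func = lambda x: x
w_func = lambda x: 1.0

predicted = integrate(lambda x: 2.0 * h_func(x) * w_func(x))
quotients = {eps: difference_quotient(h_func, w_func, eps) for eps in (1e-1, 1e-2, 1e-3)}
print(predicted, quotients)
```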
Chapter 9
Example 15. Imagine tossing two dice, hoping that they will sum to 10.
We model this experiment by the probability space $(\Omega, \mathcal{F}, \Pr)$, where $\Omega =
\{1, 2, \ldots, 6\}^2$, $\mathcal{F} = \mathcal{P}(\Omega)$, and $\Pr(S) = \frac{1}{36} |S|$. The probability that the two
dice sum to 10 is $\Pr(\{(a, b) : a + b = 10\}) = \frac{1}{12}$.
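Because this sample space is finite, the probability can be confirmed by brute-force enumeration. A small sketch; the names `omega`, `pr`, and `sum_to_10` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of faces of two dice.
omega = list(product(range(1, 7), repeat=2))


def pr(event):
    """Uniform probability on omega: |event| / 36."""
    return Fraction(len(event), len(omega))


sum_to_10 = [(a, b) for (a, b) in omega if a + b == 10]
prob = pr(sum_to_10)
print(sum_to_10, prob)
```

The event consists of the three outcomes (4, 6), (5, 5), (6, 4), giving 3/36.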
Example 16. Suppose you are terrible at tossing darts; you always hit the
dartboard, but you have no tendency to hit any particular part of the dartboard.
What is the probability that you will get a bullseye? We model this experiment
by the probability space $(\Omega, \mathcal{F}, \Pr)$, where $\Omega \subseteq \mathbb{R}^2$ is the unit disc, $\mathcal{F}$ is the set
of Borel subsets of $\Omega$, and $\Pr$ is the Lebesgue measure (renormalized so that
$\Pr(\Omega) = 1$). The probability in question is the ratio of the area of the bullseye
region to the area of the entire dartboard.
Example 17. Imagine repeatedly tossing a fair coin. What is the probability
that the sequence of outcomes, interpreted as zeroes and ones, is the binary
expansion of a transcendental number? We model this experiment by the so-
called product space of the probability space for a single coin with itself countably
infinitely many times. It turns out that the number whose binary expansion is
210 CHAPTER 9. PROBABILITY AND ERGODIC THEORY
72 Random walks
73 Brownian motion
[Discuss physics motivation, give a definition, state some basic analysis-ish facts
like the fact that it is continuous everywhere but differentiable nowhere a.s.]
74 Khinchin's constant
[TODO: rewrite/edit this, I mainly just copy-pasted from my math club talk
notes -lhs]
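An example:
$$\pi = 3 + \cfrac{1}{7 + \cfrac{1}{15 + \cfrac{1}{1 + \cfrac{1}{292 + \cdots}}}}.$$
Define
$$[a_1, a_2, \ldots, a_n] := \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cdots + \cfrac{1}{a_n}}}, \quad a_1, \ldots, a_n \in \mathbb{N}.$$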
$$x = [a_1, a_2, \ldots] = \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cdots}}},$$
where
$$a_n(x) = \left\lfloor \frac{1}{\tau^{n-1}(x)} \right\rfloor.$$
Question: Do the $a_n(x)$ tend to favor certain numbers? Note that $a_n(x) = k$
iff $\frac{1}{k+1} < \tau^{n-1}(x) \le \frac{1}{k}$. So we want to look at the distribution of $\tau^n(x)$ over all
$n$.
Recall the Gauss map
$$\tau(x) = \frac{1}{x} - \left\lfloor \frac{1}{x} \right\rfloor, \quad x \in [0, 1).$$
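Iterating the Gauss map is exactly how partial quotients are computed in practice. Here is a minimal sketch in exact rational arithmetic; the function name `partial_quotients` and the 35-digit decimal truncation `PI_APPROX` are illustrative assumptions, and a truncation like this only reproduces the first several partial quotients of the true number.

```python
from fractions import Fraction


def partial_quotients(x, count):
    """First `count` partial quotients a_1, a_2, ... of x in (0, 1).

    Each step records floor(1/x) and then applies the Gauss map
    x -> 1/x - floor(1/x). Stops early if x is rational and hits 0.
    """
    quotients = []
    for _ in range(count):
        if x == 0:
            break
        inv = 1 / x
        a = inv.numerator // inv.denominator  # floor(1/x)
        quotients.append(a)
        x = inv - a
    return quotients


# Decimal truncation of pi; subtract the integer part 3 to land in (0, 1).
PI_APPROX = Fraction("3.14159265358979323846264338327950288")
print(partial_quotients(PI_APPROX - 3, 4))
```

The output matches the expansion of $\pi$ displayed earlier in this section: 7, 15, 1, 292.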
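(i) Each $k \in \mathbb{N}$ appears in the sequence $a_1(x), a_2(x), \ldots$ with asymptotic
frequency
$$\frac{1}{\log 2} \log \left( 1 + \frac{1}{k(k+2)} \right).$$
$$\lim_{n \to \infty} \frac{1}{n} (a_1(x) + \cdots + a_n(x)) = \infty.$$
(iii) The geometric mean of the partial quotients has limit
$$\lim_{n \to \infty} (a_1(x) a_2(x) \cdots a_n(x))^{1/n} = \prod_{j=1}^{\infty} \left( 1 + \frac{1}{j(j+2)} \right)^{\log j / \log 2} =: K_0 \approx 2.685.$$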
Remark 16. The theorem only holds for a.e. $x \in [0, 1)$. In particular, its conclusions do
not hold for quadratic irrationals (irrational roots of a quadratic equation with
rational coefficients). For example,
$$\sqrt{2} = [1; 2, 2, 2, \ldots], \qquad \frac{1 + \sqrt{5}}{2} = [1; 1, 1, 1, \ldots].$$
Proposition 44 (Lagrange 1770). The continued fraction expansion for $x$ is
eventually periodic if and only if $x$ is a quadratic irrational.
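Proof (of theorem). (i) Let $f := \chi_{(1/(k+1), 1/k]}$, so that $a_n(x) = k$ iff $f(\tau^{n-1}(x)) =
1$. Apply the Birkhoff ergodic theorem.
(ii) Let $f(x) := \lfloor 1/x \rfloor = a_1(x)$, so $a_n(x) = f(\tau^{n-1}(x))$ and the $M$th
arithmetic average is
$$\frac{1}{M} (a_1(x) + \cdots + a_M(x)) = \frac{1}{M} \sum_{j=0}^{M-1} f(\tau^j(x)).$$
since$^2$ $f(x) = \lfloor 1/x \rfloor > (1 - x)/x$ and $\int_0^1 \frac{1 - x}{x(1 + x)} \, dx = \infty$.
Set $f_N(x) := \begin{cases} f(x), & f(x) \le N \\ 0, & \text{else} \end{cases}$ and note $f_N \in L^1([0, 1), \mu)$. Apply the Birkhoff
ergodic theorem to $f_N$: For a.e. $x \in [0, 1)$,
$$\liminf_{M \to \infty} \frac{1}{M} \sum_{j=0}^{M-1} f(\tau^j(x)) \ge \liminf_{M \to \infty} \frac{1}{M} \sum_{j=0}^{M-1} f_N(\tau^j(x)) = \frac{1}{\log 2} \int_0^1 \frac{f_N(x)}{1 + x} \, dx \to \infty$$
as $N \to \infty$.
(iii) We're going to take logarithms. Let $f(x) = \log \lfloor 1/x \rfloor$, so
$$\frac{1}{N} (\log a_1(x) + \cdots + \log a_N(x)) = \frac{1}{N} \sum_{j=0}^{N-1} f(\tau^j(x)).$$
One can show $f \in L^1([0, 1), d\mu)$. Then apply the Birkhoff ergodic theorem and
exponentiate.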
75 Normal numbers
[TODO: write. Discuss definition of normality, give an example in base 10, talk
about the fact that almost all numbers are normal. Also talk about the fact
that there ARE specific known examples of numbers which are normal in every
base, contrary to common wisdom...]
Footnote 2: $f(x)$ jumps at $\frac{1}{n}$; check those regions and use that $(1 - x)/x$ is decreasing.
Chapter 10
Fourier analysis
Ya got the wrong eigenvalue!!
Put a Fourier transform on it.
PRONTO!!
the function into its distinct sine and cosine frequencies by writing,
$$f(x) = a_0 + \sum_{n=1}^{\infty} a_n \cos(2\pi n x) + \sum_{n=1}^{\infty} b_n \sin(2\pi n x), \quad a_n, b_n \in \mathbb{R}.$$
Now $f$ was a bit confused. Why in the world did these engineers and physicists
expect that he could be decomposed into a bunch of sines and cosines? So
he asked them about this, and one of them explained: "Well, we think you're
a sound wave (that talks!), and we want to determine the amplitudes of each
of the frequencies that make you up. Besides telling us lots of useful things,
once we know all the amplitudes for all the frequencies, we can synthesize a
wave by just adding up a bunch of pure sine and cosine terms with the correct
amplitudes."
"Okay, but that's not exactly answering my question; why should sines and
cosines be enough? Why don't you try to write other functions in terms of me
instead?" asked the function, who was starting to get jealous of sin and cos.
The physicist started to say something about the set $\{e^{2\pi i n x}\}_{n \in \mathbb{Z}}$ forming an
orthonormal (Schauder) basis for the function space $L^2([0, 1])$, then remembered
he was a physicist and proceeded to give an example and some formulas instead.
"Forget the sines and cosines; it's often easier to deal with this if we use
complex exponentials instead. Write
$$f(x) = \sum_{n \in \mathbb{Z}} a_n e^{2\pi i n x}, \tag{10.1}$$
Figure 10.2: The first few partial sums of the Fourier series for the square wave
(shown in black).
Figure 10.3: The first few partial sums of the Fourier series for the saw tooth wave
(shown in black).
Figure 10.4: The torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$ is really the circle since we're in one dimension.
Points in $\mathbb{R}$ that differ by an integer are identified together, so we end up with $[0, 1]$
with 0 and 1 identified, which forms the circle.
Now $f$ was absolutely integrable, that is, $\int_{\mathbb{T}} |f| \, dx < \infty$, so $f \in L^1(\mathbb{T})$, but
other than that he wasn't a very nice function. One day a new operator came
to visit the torus. His name was the "Turn-me-into-a-Fourier-series" operator.
For the right price, this operator would give a function an alternate identity in
terms of a sum of complex exponentials. In other words, it told a function its
Fourier series. After seeing some of his friends like the square wave and sawtooth
wave visit the "Turn-me-into-a-Fourier-series" operator and come back with a
totally new and exciting Fourier series representation, $f$ decided to try it out
for himself. So he asked the "Turn-me-into-a-Fourier-series" operator to turn
him into a Fourier series. The "Turn-me-into-a-Fourier-series" operator looked
at him and paused. "Hmmm, you're an $L^1(\mathbb{T})$ function at least...I can compute
your Fourier series coefficients...but when I try to create the Fourier series, it
doesn't look like you." Now $f$, being in $L^1$, was used to actually representing an
equivalence class of functions in $L^1$ that differed only by a set of zero Lebesgue
measure, so he replied, "Okay, okay, I know it might not look like me, but
I represent an equivalence class of pretty similar functions, so as long as it
looks like someone in my equivalence class, it's fine." But the
"Turn-me-into-a-Fourier-series" operator sighed and said, "I'm sorry, but I'm looking at the
78. CONTINUOUS FUNCTIONS MAKE FOURIER SERIES CRY 219
resulting Fourier series, and it doesn't converge anywhere. Like, you're asking for
pointwise almost everywhere convergence, but I can't even get you convergence
at any point! I'm going to have to ban functions like yourself! How'd you
even get yourselves into $L^1$, you're so ugly!" At this, $f$ ran away back to his
home on $\mathbb{T}$ and cried himself to sleep. All his friends$^1$ could get nice Fourier
series converging to themselves at least almost everywhere, and here he was, his
Fourier series diverging everywhere! As he fell asleep, he secretly blamed the
great mathematician Andrey Kolmogorov for all his woes.
Why did $f$ blame Kolmogorov? In fact, $f$ was quite correct in placing his
blame. In 1923, Kolmogorov shattered the physicists' dreams when he showed
that there is an $L^1$ function whose Fourier series diverges almost everywhere.
Well, he didn't really shatter their dreams, since the physicists didn't actually
really care. Anyway, three years later in 1926, he constructed an $L^1$ function
$f$ whose Fourier series diverges everywhere. It was $f$'s unfortunate luck that
he was indeed this function. Things weren't looking very good for $f$ or for the
"Turn-me-into-a-Fourier-series" operator.
Functions started to wonder how bad this could get. Could there exist a
continuous function whose Fourier series diverged a.e.? What kind of condition
would guarantee convergence a.e.? (For spoilers, skip to Section 80.)
[TODO For the last part of this section, we'll look at the construction of such
a function.] [TODO Picture?]
Reference: Details for Kolmogorov's functions with a.e. and everywhere
divergent Fourier series: [2, 14]
\[ S_n(f) = f * D_n. \]
Figure 10.6: Plot of the Dirichlet kernel $D_n(x)$ for $1 \le n \le 5$. The central peak rises
as $n$ increases. By summing the geometric series, the Dirichlet kernel can be written
as $D_n(x) = \frac{\sin((2n+1)\pi x)}{\sin(\pi x)}$.
The uniform boundedness principle didn't say much, other than that he
really liked Baire and his two best friends were Hahn-Banach and the open
mapping theorem.
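Lemma 11. The $L^1$ norm of the Dirichlet kernel $D_n(x) = \sum_{|j| \le n} e^{2\pi i j x}$ grows
like $\log n$. More specifically,
\[ \frac{4}{\pi^2} \log n < \|D_n\|_1 < 3 + \log n. \]
We only actually need $\|D_n\|_1 \gtrsim \log n$, and this we can get by using $|\sin x| \le |x|$
and estimating
\[
\|D_n\|_1 \ge 2 \int_0^{1/2} \frac{|\sin((2n+1)\pi t)|}{\pi t}\,dt
= \frac{2}{\pi} \int_0^{(n+\frac12)\pi} \frac{|\sin x|}{x}\,dx
> \frac{2}{\pi} \sum_{k=1}^{n} \frac{1}{k\pi} \int_{(k-1)\pi}^{k\pi} |\sin x|\,dx
\ge \frac{4}{\pi^2} \log n.
\]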
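The logarithmic growth in Lemma 11 is easy to watch numerically. The following is only a rough sketch: it assumes the closed form $D_n(x) = \sin((2n+1)\pi x)/\sin(\pi x)$ for the kernel, and the helper names (`dirichlet`, `l1_norm`) and the midpoint rule are illustrative choices, not anything from the references.

```python
import math

def dirichlet(n, x):
    """D_n(x) = sin((2n+1)*pi*x) / sin(pi*x), with the limiting value 2n+1 at integers."""
    s = math.sin(math.pi * x)
    if abs(s) < 1e-12:
        return 2 * n + 1
    return math.sin((2 * n + 1) * math.pi * x) / s

def l1_norm(kernel, n, samples=20000):
    """Midpoint-rule approximation of the integral of |kernel(n, x)| over [-1/2, 1/2]."""
    h = 1.0 / samples
    return h * sum(abs(kernel(n, -0.5 + (i + 0.5) * h)) for i in range(samples))

# Compare the numerical norm with the two bounds stated in the lemma.
norms = {n: l1_norm(dirichlet, n) for n in (2, 4, 8, 16, 32, 64)}
for n, value in norms.items():
    lower = 4 / math.pi**2 * math.log(n)
    upper = 3 + math.log(n)
    print(f"n={n:3d}  lower={lower:.3f}  ||D_n||_1={value:.3f}  upper={upper:.3f}")
```

Each doubling of $n$ should add roughly the same amount to the norm, which is what "grows like $\log n$" looks like in a table.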
for all $f \in F$. This works for any other $x \in \mathbb{T}$ as well, so we can always find a
lot of continuous functions whose Fourier series diverge at a fixed $x \in \mathbb{T}$.
"Hmm, this seems pretty bad. But at least it can't really get any worse than
that, right?" asked the "Turn-me-into-a-Fourier-series" operator. And he was
wrong again. The Dirichlet kernel replied,
"Well, um... in fact, it gets a bit worse. How about instead of just picking
a single $x \in \mathbb{T}$ where we want the Fourier series to diverge, we pick several?
Or countably infinitely many? Or just any measure zero set? It turns out that
given any measure zero set $E \subset \mathbb{T}$, there is some continuous function whose
Fourier series diverges at every point of $E$!"
"Arrrrrrrrrrrgggggggggggghhhhhhhhh!!!!!!!!!!!" exclaimed the
"Turn-me-into-a-Fourier-series" operator as he ran around screaming and complaining about
how much he hated certain continuous functions, "Why? why? why? why?
why? why? I hate these functions!"
References: Banach-Steinhaus, Dirichlet kernel, divergence of Fourier series:
[8, 14]. Any measure zero set: [4].
[TODO set $E$. Do we get the $G_\delta$ or denseness?]
79 Summability is nice
After all these disasters with the "Turn-me-into-a-Fourier-series" operator failing
to produce nice Fourier series for everyone, the functions on $\mathbb{T}$ were losing
faith in Fourier series. They searched far and wide and eventually found a similar
operator that called himself the "Average-me-into-a-Fourier-series" operator
and promised better results.
The idea of the "Average-me-into-a-Fourier-series" operator was this: One thing we
can do if a sequence $(a_n)$ doesn't converge is to take averages of the sequence
elements (Cesàro summability). Instead of looking at $a_0, a_1, a_2, \ldots$, we look at
$a_0, \frac12(a_0 + a_1), \frac13(a_0 + a_1 + a_2), \ldots$. That is, we look at the new sequence given by
$b_n := \frac1n \sum_{j=0}^{n-1} a_j$. Conveniently, if the original sequence $(a_n)$ converged to say
$a$, then so does the new sequence of averages $(b_n)$. The idea is that eventually
you'll just be adding a ton of terms that look like $a$ to the average, so the
sequence of averages will have to tend towards $a$. The beginning few terms
where you might be far away from $a$ don't matter as that fraction $\frac1n$ starts to
kick in.
What's useful, though, is that sometimes $(b_n)$ converges even when $(a_n)$ does
not. For example, $1, -1, 1, -1, 1, -1, \ldots$ doesn't converge, but the sequence of
its averages does. This is just the sequence
\[ 1, 0, \tfrac13, 0, \tfrac15, 0, \tfrac17, 0, \ldots, \]
which converges to zero.
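Here is a small sketch of the averaging trick on the alternating sequence $1, -1, 1, -1, \ldots$. The function name `cesaro_means` is just an illustrative choice, and exact fractions are used so the averages print exactly rather than as rounded floats.

```python
from fractions import Fraction

def cesaro_means(terms):
    """Return b_1, b_2, ..., where b_n is the average of the first n terms."""
    means, running_sum = [], Fraction(0)
    for n, a in enumerate(terms, start=1):
        running_sum += a
        means.append(running_sum / n)
    return means

alternating = [(-1) ** j for j in range(8)]   # 1, -1, 1, -1, ...
means = cesaro_means(alternating)
print([str(b) for b in means])   # the averages 1, 0, 1/3, 0, 1/5, 0, 1/7, 0
```

The original terms keep jumping between $1$ and $-1$, while the averages shrink towards zero.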
To apply this to Fourier series, the "Average-me-into-a-Fourier-series" operator
got some help from the Fejér kernel, $F_n(x) := \frac1n \sum_{j=0}^{n-1} D_j(x)$. Recall we
have a sequence of partial sums $(S_n(f))$ that doesn't behave nicely, and we hope
that averaging them will make things converge nicely. Computation shows
\[ f * F_n = \frac1n \sum_{j=0}^{n-1} S_j(f), \]
224 CHAPTER 10. FOURIER ANALYSIS
which means that the Fejér kernel takes care of the averaging for us. Additionally,
the Fejér kernel is a summability kernel (sometimes called an approximate
identity or a good kernel), meaning it's a sequence that satisfies
1. $\int_{\mathbb{T}} F_n(x)\,dx = 1$.
2. $\int_{\mathbb{T}} |F_n(x)|\,dx \le M$ (uniformly bounded).
3. For every $\delta > 0$, $\int_{\delta \le |x| \le \frac12} |F_n(x)|\,dx \to 0$ as $n \to \infty$. (Here we're looking
at integration on $[-1/2, 1/2]$.)
The Fejér kernel proudly remarked that the Dirichlet kernel is not a summability
kernel since it fails condition 2; $\|D_n\|_1 \sim \log n$ is not bounded. The Fejér kernel
does satisfy condition 2, though, since we can write
$F_n(x) = \frac1n \left[ \frac{\sin(n\pi x)}{\sin(\pi x)} \right]^2 \ge 0$.
Condition 3 says all the mass gets concentrated near the origin as $n \to \infty$.
Although the Dirichlet kernel may seem to get more and more concentrated as
$n$ increases since the central peak rises, it actually violates this condition.²
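The three summability-kernel conditions can be spot-checked numerically for the Fejér kernel. This is a sketch under assumptions: it takes the closed form $F_n(x) = \frac1n (\sin(n\pi x)/\sin(\pi x))^2$ on $[-1/2, 1/2]$, and the helper names, the sample points, and the choices $n = 20$, $\delta = 0.1$ are arbitrary illustrations.

```python
import math

def dirichlet(j, x):
    """D_j(x) as the symmetric sum of e^{2 pi i k x}, |k| <= j (the sum is real)."""
    return 1 + 2 * sum(math.cos(2 * math.pi * k * x) for k in range(1, j + 1))

def fejer_average(n, x):
    """F_n(x) computed directly as the average of D_0, ..., D_{n-1}."""
    return sum(dirichlet(j, x) for j in range(n)) / n

def fejer_closed_form(n, x):
    """F_n(x) = (1/n) * (sin(n pi x) / sin(pi x))^2, valid when x is not an integer."""
    return (math.sin(n * math.pi * x) / math.sin(math.pi * x)) ** 2 / n

def integral(g, a, b, samples=4000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / samples
    return h * sum(g(a + (i + 0.5) * h) for i in range(samples))

n, delta = 20, 0.1
# The averaged Dirichlet kernels and the closed form should agree pointwise.
gap = max(abs(fejer_average(n, x) - fejer_closed_form(n, x))
          for x in (0.013, 0.2, 0.37, -0.45))
# Conditions 1 and 2 coincide here because F_n >= 0.
total_mass = integral(lambda x: fejer_closed_form(n, x), -0.5, 0.5)
# Condition 3: the mass outside |x| < delta (doubled, by symmetry of F_n).
tail_mass = 2 * integral(lambda x: fejer_closed_form(n, x), delta, 0.5)
print(gap, total_mass, tail_mass)
```

Rerunning with larger `n` should shrink `tail_mass` further while `total_mass` stays at 1.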
Figure 10.7: Plot of the Fejér kernel $F_n(x)$ for $1 \le n \le 6$. Note $F_n(x) \ge 0$, unlike
$D_n(x)$.
Proposition 46. If $f \in L^1(\mathbb{T})$, then $f * F_n = \frac1n \sum_{j=0}^{n-1} S_j(f) \to f$ almost
everywhere as $n \to \infty$.
Figure 10.9: While we're discussing kernels, here are two jokes for your amusement.
How do algebraists eat their corn? Answer: By modding out the kernels! And a
classic: Why can't you grow corn in $\mathbb{Z}/6\mathbb{Z}$? Answer: Because it's not a field! The first
joke may be in reference to a blog post by the user bentilly on blogspot about how
algebraists and analysts actually eat corn, concluding that algebraists tend to eat it
in rows while analysts tend to eat it in spirals. [TODO cite sources for jokes?]
Convolving $f$ with the Fejér kernel or the Dirichlet kernel acts like a cut-off
on the terms in the Fourier series. Convolving with the Dirichlet kernel is a
sharp cut-off at $|n| = N$ to form the $N$th partial sum. However, convolving
with the Fejér kernel features a smoother cut-off because of the averaging, and
this ends up producing better convergence properties. (Figure 10.10.)
Figure 10.10: How $D_N$ and $F_N$ act as multipliers on the terms in the Fourier series.
$D_N$ just takes all the terms $|n| \le N$ with a sharp cut-off, while $F_N$ averages the
contributions and ends up with a continuous multiplier (though they really only act
at integer points). This increased regularity makes for nicer convergence properties.
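To see the two cut-offs side by side, here is a sketch using the square wave $f(x) = \operatorname{sign}(\sin 2\pi x)$ as a test function. The test function is our choice, not the text's; its standard Fourier series is $\frac4\pi \sum_{k \text{ odd}} \frac{\sin(2\pi k x)}{k}$. The sharp cut-off gives weight 1 to every term with $k \le N$, and the averaged cut-off weights the $k$th term by the triangle $1 - k/N$.

```python
import math

def square_wave_means(N, x):
    """Return (S_N f(x), sigma_N f(x)) for the square wave f(x) = sign(sin(2 pi x)).

    S_N keeps every term with k <= N (multiplier 1); sigma_N, the average of
    S_0, ..., S_{N-1}, weights the k-th term by the triangle (1 - k/N).
    """
    partial = fejer = 0.0
    for k in range(1, N + 1, 2):          # only odd frequencies appear
        term = 4 / math.pi * math.sin(2 * math.pi * k * x) / k
        partial += term
        fejer += (1 - k / N) * term
    return partial, fejer

N = 25
grid = [i / 4000 for i in range(1, 2000)]          # points in (0, 1/2), where f = 1
values = [square_wave_means(N, x) for x in grid]
max_partial = max(s for s, _ in values)
max_fejer = max(c for _, c in values)
print(f"max S_N = {max_partial:.4f}, max sigma_N = {max_fejer:.4f}")
```

The partial sums overshoot the value 1 near the jump (the Gibbs phenomenon), while the averaged sums never exceed 1, since they are averages of $f$ against a nonnegative kernel of total mass 1.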
Figure 10.11: We can bound $F_n(x)$ by symmetric decreasing functions $\widetilde{F}_n(x)$ with
uniform $L^1$ bound.
There was one final thing the Fejér kernel noted to demonstrate his usefulness:
If $f \in C(\mathbb{T})$, we can show $\|f * F_n - f\|_\infty \to 0$, which gives a proof of the
classical Weierstrass theorem from Section 25, that trigonometric polynomials
(e.g. $f * F_n$) are dense in $C(\mathbb{T})$. For the proof that $f * F_n \to f$ uniformly, use uniform
continuity of $f$ to get a $\delta$, and then split up the integral between $|t| < \delta$ and
$|t| \ge \delta$ and use the properties of a summability kernel.
References: [9, Book 3], TODO others
functions. Recall that Kolmogorov showed we can't get a.e. convergence for all
of $L^1(\mathbb{T})$. But at least the "Turn-me-into-a-Fourier-series" operator eventually
found out about a good positive result for $L^p$ spaces, $p > 1$.
Theorem 65 (Carleson-Hunt 1960s). If $f \in L^p(\mathbb{T})$, $p > 1$, then $S_n(f)(x) \to f(x)$
pointwise a.e.
Think we're going to include the proof? Not a chance! The theorem is
considered one of the hardest theorems in analysis. We're certainly not including
the proof.³ Showing a.e. convergence of Fourier series in $L^p(\mathbb{T})$ is equivalent to
showing the Carleson operator $\sup_n |S_n f|$ is a bounded operator from $L^p$ to weak $L^p$,
i.e. a weak-type $(p, p)$ operator. This is what Carleson proved in 1966 for $p = 2$.
Since the Carleson operator is the maximal function, satisfying the weak-type
inequality gives us a.e. convergence. (Recall Section 57. We already
have convergence on the dense set of trig polynomials.)
Figure 10.13: Believe it or not, this function (in black) is in $L^2(\mathbb{T})$ and hence has a
Fourier series that converges to itself pointwise a.e. The first few partial sums don't
look so great though.
Anyway, after this result, the functions on the torus were satisfied and allowed
the "Turn-me-into-a-Fourier-series" operator to stay. Well, the functions
in $L^1$ but no other $L^p$ space weren't very happy, but there wasn't much they
could do.
3 There is a simpler proof of Carleson's theorem by Lacey and Thiele [6] from 2000, but it
still requires much more harmonic analysis and technical details than we want here. The goal
of a quarter-long class at Caltech, which had graduate analysis as a co/pre-requisite, was to
go over the harmonic analysis background related to the proof.
80. FOURIER SERIES FINALLY CONVERGE 229
So they sought out a new operator, who called himself the
"Turn-me-into-a-Fourier-series-I-luv-$L^p$" operator. This new operator promised $L^p$ convergence
of Fourier series⁴, that is, convergence in $L^p$ norm, but only for $1 < p < \infty$.
Note, convergence in $L^p$ norm implies that some subsequence of the Fourier
series converges pointwise almost everywhere. The functions only in $L^1$ and no
other $L^p$ space were again pretty annoyed and jealous.
Of all the favorite $L^p$ spaces, our favorite is $L^2$ (it's the Hilbert space), so we'll
start there. First note that $\{e^{2\pi i n x}\}_{n \in \mathbb{Z}}$ is an orthonormal (Schauder) basis for
$L^2(\mathbb{T})$: Since continuous functions, which are dense in $L^2$, can be approximated
uniformly by trig polynomials, we get that trig polynomials are also dense in
$L^2(\mathbb{T})$. Orthonormality of $\{e^{2\pi i n x}\}$ is easy to check by computation.
More generally, if $\{u_j\}_{j \in J}$ is an orthonormal basis for a Hilbert space $H$, then
for every $f \in H$,
\[ \|f\|^2 = \sum_{j \in J} |\langle u_j, f \rangle|^2. \]
This comes from decomposing $f$ into its basis elements, $f = \sum_{j \in J} \langle u_j, f \rangle u_j$ (or
if we are physicists, $f = \sum_{j \in J} |u_j\rangle \langle u_j | f \rangle$).
[TODO is there stuff to prove here?]
Remark 17. A quick detour: One fun thing we can do with Parseval is to
sum the infinite series $\sum_{n=1}^\infty \frac{1}{n^2}$. We need to find a function $f$ whose Fourier
coefficients look like $\frac1n$. It turns out our friend the saw-tooth wave $f(x) = x$
on $[0, 1]$ works: After integrating by parts,
\[ \hat f_n = \int_{\mathbb{T}} x e^{-2\pi i n x}\,dx = \frac{i}{2\pi n}, \quad n \ne 0, \]
4 This new operator computed the same Fourier series as the original Turn-me-into-a-
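Since $\hat f_0 = \frac12$, Parseval gives
\[ \frac13 = \|f\|_2^2 = \frac14 + 2 \sum_{n=1}^\infty \frac{1}{4\pi^2 n^2}, \]
so
\[ \sum_{n=1}^\infty \frac{1}{n^2} = \frac{\pi^2}{6}. \]
This method extends to summing $\sum_{n=1}^\infty \frac{1}{n^{2k}}$, $k \in \mathbb{N}$, by using $x^k$ on $[0, 1]$ along
with results for all the lower even powers.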
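As a sanity check on the Parseval computation in Remark 17, one can compare both sides numerically. This sketch assumes the coefficients $\hat f_0 = \frac12$ and $|\hat f_n| = \frac{1}{2\pi n}$ for the saw-tooth $f(x) = x$ on $[0, 1]$; the function names and the truncation at $10^5$ terms are arbitrary choices.

```python
import math

def parseval_right_side(terms):
    """|f_0|^2 + 2 * sum_{n=1}^{terms} |f_n|^2 with f_0 = 1/2 and |f_n| = 1/(2 pi n)."""
    return 0.25 + 2 * sum(1 / (4 * math.pi**2 * n * n) for n in range(1, terms + 1))

def sawtooth_norm_squared(samples=100000):
    """Midpoint-rule value of the integral of x^2 over [0, 1], which is 1/3."""
    h = 1.0 / samples
    return h * sum(((i + 0.5) * h) ** 2 for i in range(samples))

terms = 100000
basel_partial = sum(1 / (n * n) for n in range(1, terms + 1))
print(sawtooth_norm_squared(), parseval_right_side(terms))   # both close to 1/3
print(basel_partial, math.pi**2 / 6)                         # partial sum vs. pi^2 / 6
```

The partial sums approach $\pi^2/6$ slowly (the tail after $N$ terms is about $1/N$), so agreement to four or five digits is what to expect here.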
The proof is much more work and involves a lot more harmonic analysis
than the case $p = 2$, so we'll omit it. The moral of the story is to avoid purely
$L^1$ functions, who aren't always very friendly.
References: Pointwise convergence: [4], Norm convergence: [2, 4, 1]
Figure 10.14: The Fourier transform is not to be confused with the "Fourier transformer",
who turns into a car with four wheels, four seats, four windows, and four-wheel
drive. It is also not to be confused with the "Fouriest transform", which is where you
write a number in the base system in which it has the most fours. (Check out SMBC
2874.)
Figure 10.15: What transform do you apply to turn a sphynx cat into a Norwegian
forest cat? Answer: The furrier transform!
Figure 10.16: Viewing a function in the time domain (red) and the frequency domain
(blue). The idea of Fourier series and the Fourier transform is to take a function in
time and decompose it into its various frequency parts.
and your home $\mathbb{R}/\mathbb{Z}$ are just specific examples of spaces where you can Fourier
transform. On $\mathbb{R}/\mathbb{Z}$, the Fourier transform just gives you your Fourier series."
The functions from the torus were amazed! And also pretty disappointed.
They had thought they were so special since they had Fourier series, and now
they learned that pretty much any function that anyone cared about could get
one. Plus they didn't know what a locally compact abelian group was.
The functions on $\mathbb{R}$ went on to explain the greatness of the Fourier transform.
The convention they chose to use is the one where the exponent has $-ipx$ instead
of $-2\pi ipx$. This is not the good convention, but it should agree with most physics
conventions, at least in quantum mechanics. The good convention is to put the
$2\pi$ in the exponent instead, which makes a lot of the annoying factors in various
places disappear. With the physics convention, for $f \in L^1(\mathbb{R}^n)$, its Fourier
transform is
\[ \hat f(p) := \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} f(x) e^{-ipx}\,d^n x, \tag{10.2} \]
Figure 10.17: The 2D box function $f(x, y) = \chi_{[-1,1] \times [-1,1]}$ along with the magnitude
of its Fourier transform $\hat f(\xi_1, \xi_2) = \frac{2 \sin(\xi_1) \sin(\xi_2)}{\pi \xi_1 \xi_2}$.
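If we also have $f \in C^1(\mathbb{R}^n)$ with $\lim_{|x| \to \infty} f(x) = 0$ and the partial derivative
$\partial_j f \in L^1(\mathbb{R}^n)$, then
\[ (\partial_j f)^\wedge(p) = i p_j \hat f(p). \]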
The functions on $\mathbb{R}$ went on and on about how useful the Fourier transform
is: "So if we need to do something complicated like differentiation, we can
instead move to Fourier space and just multiply by $ip_j$. Or if we need to do
for example a translation in $x$ space, then we can move to Fourier space and
multiply by a phase $e^{iap}$ instead." We'll give an application of this relationship
in Section 86.
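Both rules (differentiation becomes multiplication by $ip$, translation becomes a phase $e^{iap}$) can be checked numerically in one dimension. This is only a sketch: the Gaussian test function, the integration window $[-12, 12]$, the shift $a = 1.5$, and all function names are assumptions made for illustration, and the transform uses the physics convention with the factor $(2\pi)^{-1/2}$.

```python
import cmath
import math

def fourier_transform(f, p, a=-12.0, b=12.0, samples=6000):
    """Approximate (2 pi)^(-1/2) * integral of f(x) e^{-i p x} dx by the midpoint rule.

    The finite window [a, b] is an assumption that suits rapidly decaying f.
    """
    h = (b - a) / samples
    total = sum(f(a + (k + 0.5) * h) * cmath.exp(-1j * p * (a + (k + 0.5) * h))
                for k in range(samples))
    return total * h / math.sqrt(2 * math.pi)

def gaussian(x):
    return math.exp(-x * x / 2)

def gaussian_derivative(x):
    return -x * math.exp(-x * x / 2)

def shifted_gaussian(x, shift=1.5):
    return gaussian(x + shift)          # translation x -> x + shift

p = 0.7
# Differentiation in x space versus multiplication by i p in Fourier space.
lhs_derivative = fourier_transform(gaussian_derivative, p)
rhs_derivative = 1j * p * fourier_transform(gaussian, p)
# Translation by a = 1.5 versus multiplication by the phase e^{i a p}.
lhs_shift = fourier_transform(shifted_gaussian, p)
rhs_shift = cmath.exp(1j * 1.5 * p) * fourier_transform(gaussian, p)
print(abs(lhs_derivative - rhs_derivative), abs(lhs_shift - rhs_shift))
```

Both printed differences should be at the level of rounding error, since the midpoint rule is very accurate for smooth, rapidly decaying integrands.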
Finally, after watching most of the functions on $\mathbb{T}$ fall asleep, they decided
to answer one very important question: The Fourier transform gets you from $x$
space to $p$ space, but how do you get back?
Theorem 69 (Fourier inversion). The Fourier transform is a bounded injective
map $L^1(\mathbb{R}^n) \to C_0(\mathbb{R}^n)$ (the space of continuous functions vanishing at $\infty$). It
has inverse
\[ f(x) = \lim_{\varepsilon \to 0} \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} e^{ipx - \varepsilon |p|^2/2} \hat f(p)\,d^n p, \tag{10.6} \]
\[ (\hat f)^\vee = f, \tag{10.7} \]
where
\[ \check f(p) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} e^{ipx} f(x)\,d^n x = \hat f(-p). \tag{10.8} \]
So for sufficiently nice $f$ (e.g. Schwartz, Section 83), we have Equation 10.8,
which implies $(\hat f)^\wedge(x) = f(-x)$, so that the Fourier transform has period four.
Its inverse is then just applying the Fourier transform three times.
Now we have a way to get from Fourier space back into regular space, al-
82. FOURIER SERIES VS. THE FOURIER TRANSFORM (AND LCA GROUPS)235
Figure 10.19: A plot of $\mathrm{Re}(e^{ix} e^{2iy})$, with the imaginary part indicated by the color.
In 2D, we integrate against $e^{2\pi i n \cdot x} = e^{2\pi i n_1 x} e^{2\pi i n_2 y}$, so our 2D waves are products
of waves in 1D.
To start off the discussion, the functions on $\mathbb{R}$ began with the analogue of
Parseval's relation. Recall, this was in Section 80. One form said,
\[ \|f\|_2^2 \equiv \int_{\mathbb{T}} |f(x)|^2\,dx = \sum_{n \in \mathbb{Z}} |\hat f_n|^2 \equiv \|\hat f\|_2^2. \]
For the Fourier transform, we get the following (which follows quickly from
Fubini's theorem): [TODO check proof?]
For the Fourier transform, we don't want to try to Fourier transform a periodic
function on $\mathbb{R}$, since it won't be in $L^1$ unless it's zero a.e. So we'll look at $f$
supported on $[-T/2, T/2]$. Then if we integrate over $\mathbb{R}$, it reduces to an integral
over $[-T/2, T/2]$,
\[ \hat f(\xi) = \int_{-T/2}^{T/2} f(x) e^{-2\pi i \xi x}\,dx. \]
In fact we find that the Fourier coefficients satisfy $a_n = \frac1T \hat f(\frac nT)$. (Figure 10.20.) If
$T = 1$, which corresponds to $\mathbb{R}/\mathbb{Z}$, then we see that the Fourier series coefficients
are the Fourier transform sampled at the integers. We can rewrite (10.10) as
\[ f(x) = \frac1T \sum_{n \in \mathbb{Z}} \hat f\left(\frac nT\right) e^{2\pi i (n/T) x}. \]
This is something like a Riemann sum (on the unbounded interval $\mathbb{R}$ though)
with $\Delta\xi = \frac1T$, so letting $T$ become large, we get something like
\[ f(x) = \int_{\mathbb{R}} \hat f(\xi) e^{2\pi i \xi x}\,d\xi, \]
Figure 10.20: The imaginary part of the Fourier transform of $f(x) = x\chi_{[-T/2, T/2]}$
multiplied by $\frac1T$. (The real part is zero.) Evaluation at the points $\frac{n}{2T}$, $n \in \mathbb{Z}$ and
multiplication by $i$ retrieves the Fourier series coefficients $a_n = \frac1T \hat f(\frac{n}{2T})$. TODO
replace with an $L^1$ function. Then comment, "As $T \to \infty$, the sampled points get
closer and closer, making the Fourier series sum look more and more like the Fourier
transform."
where the group operation and taking inverses are continuous, and the space is
locally compact and Hausdorff. $\mathbb{R}^n$ and $\mathbb{T}$ with the normal Euclidean topology
are examples. Our plan is to do Fourier analysis on LCA groups.
Figure 10.21: The most general we go here is the Fourier transform on LCA groups.
• High and low pass filtering: High pass filtering can be used for edge detection
  since rapid changes correspond to high frequencies. Low pass filtering
  can be used to blur an image.
• Changing contrast: We can change the contrast of an image by raising the
  magnitude of the Fourier transform to a power. This will emphasize high
  or low frequencies over the other.
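The filtering idea can be tried on a one-dimensional signal before moving to images. The sketch below is a toy under stated assumptions: it uses a naive $O(n^2)$ discrete Fourier transform written out by hand, a made-up signal (a slow sine plus a fast one), and hypothetical helper names `low_pass` and `high_pass`. Real image filters work on 2D transforms, but the principle is the same.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def inverse_dft(spectrum):
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def low_pass(signal, cutoff):
    """Keep only frequencies k with |k| <= cutoff (indices k and n - k are paired)."""
    n = len(signal)
    spectrum = dft(signal)
    kept = [c if min(k, n - k) <= cutoff else 0 for k, c in enumerate(spectrum)]
    return [z.real for z in inverse_dft(kept)]

def high_pass(signal, cutoff):
    """Keep only frequencies with |k| > cutoff: the signal minus its low-pass part."""
    return [s - l for s, l in zip(signal, low_pass(signal, cutoff))]

n = 64
slow = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]          # smooth part
fast = [0.5 * math.sin(2 * math.pi * 20 * t / n) for t in range(n)]   # rapid oscillation
signal = [s + f for s, f in zip(slow, fast)]
blurred = low_pass(signal, cutoff=5)     # should recover the slow part
edges = high_pass(signal, cutoff=5)      # should recover the fast part
print(max(abs(b - s) for b, s in zip(blurred, slow)))
```

The low-pass output matches the slow sine and the high-pass output matches the fast one, which is the 1D version of "blur" versus "edge detection".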
Image processing software like the G'MIC plugin for GIMP [TODO link] will let
you experiment with Fourier transforming an image.
Figure 10.23: Original image (left) and power spectrum (right, aka magnitude of
the Fourier transform [TODO or norm squared??]) for some simple pictures. The grid
looks like a 2D wave and has just a few (hard-to-see) components in frequency space.
Figure 10.24: Rotations: Original image (left), power spectrum (middle), phase
(right).
Figure 10.26: We start with a picture where every other horizontal line has been
replaced with white pixels. We will use the discrete Fourier transform to remove the
lines, essentially by blurring them. The magnitude of the discrete Fourier transform
is shown on the right. The bright areas at the top and bottom indicate a periodicity
in the original image; these came from the horizontal lines. In a normal picture, the
bright area is generally concentrated at the center only.
Figure 10.27: We correct the image by coloring over the bright areas at the top and
bottom. The phase of the Fourier transform (not shown) is unchanged. Applying the
inverse Fourier transform yields the picture on the right, in which the horizontal lines
are removed.
83. SCHWARTZ FUNCTIONS AND TRADING SMOOTHNESS FOR DECAY243
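\[
\partial^\alpha f := \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_n^{\alpha_n}}, \qquad
x^\alpha = x_1^{\alpha_1} \cdots x_n^{\alpha_n}, \qquad
|\alpha| = \alpha_1 + \cdots + \alpha_n,
\]
Figure 10.28: One of our favorite Schwartz functions on $\mathbb{R}$ is $f(x) = e^{-x^2}$. Its
relative $f(x) = e^{-|x|^2}$ is a Schwartz function on $\mathbb{R}^n$. And of course any $C_c^\infty$ function
is Schwartz. Schwartz functions are very friendly.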
Here are some convenient facts about the Fourier transform on the space of
Schwartz functions:
Proof. First, the Fourier transform maps Schwartz functions to Schwartz functions:
Recall Lemma 12, which told us $(\partial_j f)^\wedge(p) = i p_j \hat f(p)$, and use that
$p^\alpha (\partial^\beta \hat f)(p) = i^{-|\alpha| - |\beta|} (\partial_x^\alpha x^\beta f(x))^\wedge(p)$ is bounded since $\partial_x^\alpha x^\beta f(x) \in S(\mathbb{R}^n)$. For
the bijection part, use Fourier inversion (Theorem 69), which we conveniently
didn't prove.
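Proof.
\[
\begin{aligned}
\frac{1}{(2\pi)^{n/2}} &\int_{\mathbb{R}^n} dx \int_{\mathbb{R}^n} dt\, f(x - t) g(t) e^{-ipx} \\
&= \frac{1}{(2\pi)^{n/2}} \int dt \int dx\, f(x - t) g(t) e^{-ip(x - t)} e^{-ipt} \\
&= (2\pi)^{n/2} \hat f(p) \hat g(p).
\end{aligned}
\]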
It turns out the Fourier transform takes smoothness of $f$ and turns it into
fast decay for $\hat f$. As we just saw, for Schwartz functions $f$ that are infinitely
differentiable, the Fourier transform $\hat f$ is again Schwartz and decays quickly.
The general idea is that
\[ \text{smoothness of } f \implies \text{fast decay of } \hat f. \]
[TODO vice versa? Paley-Wiener?] We'll start with no smoothness, just assuming
$f \in L^1$, and then work our way up to $C^k$ and $C^\infty$.
Lemma 13 (Riemann-Lebesgue). If $f \in L^1(\mathbb{R}^n)$, then $|\hat f(p)| \to 0$ as $|p| \to \infty$.
In fact, the Fourier transform maps $L^1(\mathbb{R}^n)$ into $C_0(\mathbb{R}^n)$, the space of continuous
functions decaying at infinity.
We actually already stated (without proof) a stronger version of this lemma
in Theorem 69 about Fourier inversion.
Corollary 6. Let $f \in C^k(\mathbb{R}) \cap L^1(\mathbb{R})$, and suppose $\frac{d^k f}{dx^k} \in L^1$. Then
$\hat f(p) = o(p^{-k})$.
TODO check this: $(\frac{d^k}{dx^k} f(x))^\wedge(p) = i^k p^k \hat f(p) \to 0$ since $f^{(k)} \in L^1$.
This is nice, but we can do better: just like for the Fourier transform, the rate
of decay depends on the smoothness of f .
84 Uncertainty principles
One famous result in quantum mechanics is the Heisenberg uncertainty principle.
Roughly, it says that you can't know both the position and momentum of a
small particle to high accuracy. Morally, the better you know one of them,
the less you can know the other. We're going to ignore all the talk about
measurement in quantum mechanics and just do the math version, which we
can obtain from properties of the Fourier transform. For some background in
quantum mechanics, see Section ??.
Once upon a time there was a quantum mechanical particle running around
in $\mathbb{R}^n$. His position (location) wasn't given by a single point in $\mathbb{R}^n$ since he was
a quantum, not a classical, particle. His position was instead given by a wave
function $\psi(x)$ with $\int_{\mathbb{R}^n} |\psi(x)|^2\,dx = 1$. In other words, $\psi(x) \in L^2(\mathbb{R}^n)$ with
$\|\psi\|_2 = 1$. The probability of finding this particle in a region $E \subseteq \mathbb{R}^n$ is
\[ \mathrm{Prob}(\text{particle} \in E) = \int_E |\psi(x)|^2\,dx. \]
So we're never entirely sure exactly where the particle is, but we have an idea
of where it's likely to be. The particle himself was a little annoyed at this
uncertainty. For example, it made it difficult to set an address to meet other
particles or receive mail, but he realized he could localize himself to a small
region $E$ if his wavefunction $\psi$ was very large on $E$ and small outside of $E$.
Then he could say that he lives in $E$, and everyone would know where to find
him.
One day, the Fourier transformer was visiting quantum-mechanics-land and
ran into this particle. The particle explained his great plan to localize his
position. The Fourier transformer looked skeptical. He asked the particle, "But
what about your momentum? If you localize position too much, you'll force your
momentum wavefunction to spread out and no one will have any idea where or
how fast you're going!" "What?" replied the particle, "what's this momentum
wavefunction? Why can't I just localize that too?"
"You can't just do that," explained the Fourier transformer. "Your position
and momentum wavefunctions are related by the Fourier transform. Given a
position wavefunction $\psi(x)$, you can move to momentum space and get the
momentum wavefunction $\hat\psi(p)$ via the Fourier transform,
\[ \hat\psi(p) = \frac{1}{(2\pi)^{n/2}} \int_{\mathbb{R}^n} \psi(x) e^{-ipx}\,d^n x. \]
Then $\mathrm{Prob}(\text{momentum} \in K) = \int_K |\hat\psi(p)|^2\,dp$. I'm ignoring some of the constants
since I don't visit quantum-mechanics-land that often and I don't like $\hbar$,
but you get the idea. But then there are uncertainty principles regarding the
Fourier transform that don't let you localize both space and momentum simultaneously.
They're often stated and proved using operators and expected values,
but since I'm currently visiting quantum-mechanics-land, I'll express them in
terms of the Fourier transform."
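Theorem 75 (Heisenberg uncertainty principle). Suppose $\psi \in S(\mathbb{R})$ with $\|\psi\|_2 = 1$.
Then for any $x_0, p_0 \in \mathbb{R}$,
\[ \int_{\mathbb{R}} (x - x_0)^2 |\psi(x)|^2\,dx \int_{\mathbb{R}} (p - p_0)^2 |\hat\psi(p)|^2\,dp \ge \frac14. \tag{10.14} \]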
What does this have to do with uncertainty? The expected value of an
operator $A$ in the state $\psi$ is given by
\[ \langle A \rangle = \langle \psi, A\psi \rangle. \]
For the position operator $X$ defined by $X\psi(x) = x\psi(x)$, the expected value is
$\langle x \rangle = \int_{\mathbb{R}} x |\psi(x)|^2\,dx$. The variance, or uncertainty, is given by
\[ \sigma_x^2 = \int_{\mathbb{R}} (x - \langle x \rangle)^2 |\psi(x)|^2\,dx, \]
and this is the quantity we are looking at in the Heisenberg uncertainty principle.
If $\psi$ is highly localized near $\langle x \rangle$, then $\sigma_x^2$ will be small. The uncertainty principle
tells us that we cannot make both $\sigma_x^2$ and $\sigma_p^2$ very small at the same time, i.e. we
cannot localize on very small sets in both $x$ and $p$ space. If position is localized
to a region of size roughly $R$, then momentum cannot be localized on a scale
much smaller than $R^{-1}$.
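The bound can be tested on concrete wavefunctions. The following sketch rests on several assumptions of ours: it takes $x_0 = p_0 = 0$, uses real-valued test functions with hand-supplied derivatives, and computes the momentum integral $\int p^2 |\hat\psi(p)|^2\,dp$ as $\int |\psi'(x)|^2\,dx$ (a Plancherel-type identity), so no Fourier transform has to be evaluated numerically. The function names and the integration window are illustrative.

```python
import math

def integrate(g, a=-12.0, b=12.0, samples=24000):
    """Midpoint rule; the window [-12, 12] is an assumption that suits fast-decaying g."""
    h = (b - a) / samples
    return h * sum(g(a + (i + 0.5) * h) for i in range(samples))

def uncertainty_product(psi, dpsi):
    """Position variance times momentum variance about x0 = p0 = 0 for a real psi.

    The momentum integral of p^2 |psi_hat(p)|^2 is computed as the integral of
    |psi'(x)|^2; dividing by the norm squared normalizes psi.
    """
    norm_sq = integrate(lambda x: psi(x) ** 2)
    position = integrate(lambda x: x * x * psi(x) ** 2) / norm_sq
    momentum = integrate(lambda x: dpsi(x) ** 2) / norm_sq
    return position * momentum

# A Gaussian, and x times a Gaussian (a less localized comparison function).
gaussian = uncertainty_product(lambda x: math.exp(-x * x / 2),
                               lambda x: -x * math.exp(-x * x / 2))
hermite = uncertainty_product(lambda x: x * math.exp(-x * x / 2),
                              lambda x: (1 - x * x) * math.exp(-x * x / 2))
print(gaussian, hermite)
```

The Gaussian should land on the bound $\frac14$ (up to numerical error), while the second function gives a strictly larger product, close to $\frac94$.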
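Proof (of HUP). First, we can replace $\psi(x)$ with $e^{-ixp_0} \psi(x + x_0)$ and change
variables, which lets us assume $x_0 = p_0 = 0$. Then we integrate by parts, using
$\psi \in S(\mathbb{R})$ and $|\psi|^2 = \psi \bar\psi$:
\[
1 = \int_{\mathbb{R}} |\psi(x)|^2\,dx = -\int_{\mathbb{R}} x \frac{d}{dx} |\psi(x)|^2\,dx
= -2\,\mathrm{Re} \int_{\mathbb{R}} x \overline{\psi(x)} \frac{d}{dx} \psi(x)\,dx.
\]
Then we apply Cauchy-Schwarz to get
\[ 1 \le 2 \|x\psi(x)\|_2 \|\psi'\|_2 = 2 \|x\psi(x)\|_2 \|p\hat\psi(p)\|_2. \]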
for some $C$, $N$ and where $c(I)$ is the center of the interval $I$. (This is called
$C$-adapted of order $N$ to $I$, [TODO cf cite].) But since we're going to talk about
localization in a pretty vague sense, we won't concern ourselves with trying to
find a precise definition.
Here's another kind of uncertainty principle. If you localize yourself to a
bounded region $E$ with probability 1, then you're guaranteeing that your momentum
is pretty spread out: for any magnitude you can think of, there's going
to be nonzero probability that your momentum is larger than that.
Theorem 76. There is no nonzero $f \in L^1(\mathbb{R})$ which is compactly supported
whose Fourier transform is also compactly supported.
Proof. There's a really nice proof that uses some complex analysis. You can
prove it without complex analysis, but the complex analysis proof is so nice
we're going to use that one.
If $f$ has compact support, then we can extend the Fourier transform $\hat f$ to
an entire function⁶ $\hat f(z) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-itz} f(t)\,dt$. The zeros of a nonzero entire
6 Say $f$ is supported on $[a, b]$; then
\[
\int_{\mathbb{R}} e^{-itz} f(t)\,dt = \int_a^b \sum_n \frac{(-izt)^n}{n!} f(t)\,dt = \sum_n \frac{(-iz)^n}{n!} \int_a^b t^n f(t)\,dt,
\]
where interchange of sum and integral is justified since everything converges absolutely.
Figure 10.30: A Heisenberg box in the time-frequency plane (here the space-momentum
plane), with sides $\Delta x$ and $\Delta p$. A small width corresponds to localization in position space, while
a small height corresponds to localization in momentum space. By the Heisenberg
uncertainty principle, the area of the rectangle must be at least $\frac12$. One fun thing
to do is to take a Heisenberg box, and see how it changes under transformations like
dilation, modulation, translation, etc. TODO make a separate picture for this
Figure 10.31: Some Gaussian wavefunctions. The black curve is more localized at 0
than the blue curve. Gaussians obtain equality in the Heisenberg uncertainty principle.
TODO add a picture of their Fourier transforms spreading out
On a related note, a physicist might say that if you take the Dirac delta
"function", which is the most "compactly supported function" (supported on
a single point), then its Fourier transform is the most un-compactly supported
function, a plane wave $e^{ikx}$, which doesn't even get close to zero. Although the
Dirac delta "function" is not an actual function on $\mathbb{R}^n$, we can make sense of
it by viewing it as a distribution, and then we can still Fourier transform it in
the distributional sense to make sense of what the physicists mean when they
say they are Fourier transforming the Dirac delta. We will tell the tale of the Dirac
delta in Sections 88 and 89.
References: Uncertainty principles: [13], [10], [11]. More quantum mechanics: [12]
[TODO localization reference? Thiele wave packets?]
85 What about $L^2$?
We've defined the Fourier transform for $f \in L^1(\mathbb{R}^n)$, but what about $f \in
L^2(\mathbb{R}^n)$? This is, for example, the space that physicists tend to care about in
quantum mechanics. Obviously there's no problem for $f \in L^1(\mathbb{R}^n) \cap L^2(\mathbb{R}^n)$
since we originally defined the Fourier transform for $L^1$. Conveniently, this space
is dense in $L^2(\mathbb{R}^n)$ since it contains $C_c^\infty(\mathbb{R}^n)$. So there exists hope to extend
the Fourier transform to $L^2$.
By now, the $L^2$ functions were getting rather impatient, since they knew they
were in the best $L^p$ space and wanted to know their Fourier transforms. Recall
Theorem 77 (Plancherel), which said that if $f, \hat f \in L^1(\mathbb{R}^n)$, then $f, \hat f \in L^2(\mathbb{R}^n)$
and $\|f\|_2^2 = \|\hat f\|_2^2$. In particular, $\|f\|_2 = \|\hat f\|_2$ holds for $f \in S(\mathbb{R}^n)$. Now we
try to extend the Fourier transform from some dense set, say $S(\mathbb{R}^n)$, to all of
$L^2(\mathbb{R}^n)$ by defining
\[ \hat f := \lim_{N \to \infty} \hat f_N, \tag{10.16} \]
Figure 10.32: The BLT theorem has nothing to do with a BLT (bacon, lettuce, and
tomato) sandwich. It is instead about extending a bounded linear transformation from
a dense space to the entire space.
Now the functions in $L^2(\mathbb{R}^n)$ were extremely happy! They really liked unitary
operators since unitary operators are really friendly and nice. It was also
convenient that for $f \in L^1 \cap L^2$, this Fourier transform on $L^2$ agrees with the
Fourier transform definition for $L^1$. In practice, it's often easier to just use
(10.16) with $f_N \in L^1 \cap L^2$ not necessarily in $S(\mathbb{R}^n)$, since then we can take
cut-offs $f_N := f \chi_{B_N(0)}$ as the approximations.
Remark 20. $L^2$ convergence and everything is fine, but what about pointwise
a.e. convergence? Could we just be lazy and set
\[ \hat f(k) = \lim_{N \to \infty} \frac{1}{\sqrt{2\pi}} \int_{-N}^{N} f(x) e^{-ikx}\,dx? \]
Figure 10.33: Plots of $\frac1x$ and $\frac{\sin x}{x}$. One way to see $\frac{\sin x}{x}$ is not integrable over $\mathbb{R}$
is to note that $|\sin x| \ge \frac12$ on $[\frac{\pi}{6}, \frac{5\pi}{6}] + 2\pi\mathbb{Z}$. Trying to integrate even just over this
subset of $\mathbb{R}$ gives infinity, since $|\frac{\sin x}{x}|$ looks like at least $\frac{1}{2|x|}$, which is not integrable
at infinity even if we only integrate over $[\frac{\pi}{6}, \frac{5\pi}{6}] + 2\pi\mathbb{Z}$.
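The contrast between the signed and the absolute integral of $\frac{\sin x}{x}$ shows up clearly in a quick computation. This is a sketch with arbitrary choices (midpoint rule, 200 sample points per half-period, cut-offs at $10\pi$, $100\pi$, $1000\pi$), and the names are illustrative.

```python
import math

def integrate(g, a, b, samples):
    """Midpoint rule on [a, b]; midpoints avoid the removable point x = 0."""
    h = (b - a) / samples
    return h * sum(g(a + (i + 0.5) * h) for i in range(samples))

def sinc(x):
    return math.sin(x) / x

results = []
for periods in (10, 100, 1000):
    upper = periods * math.pi
    samples = 200 * periods
    signed = integrate(sinc, 0.0, upper, samples)                       # settles down
    absolute = integrate(lambda x: abs(sinc(x)), 0.0, upper, samples)   # keeps growing
    results.append((periods, signed, absolute))
    print(periods, round(signed, 4), round(absolute, 4))
```

The signed integrals settle near $\frac{\pi}{2}$ thanks to cancellation, while the absolute integrals gain about the same amount with every factor of 10 in the cut-off, which is logarithmic divergence.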
We can compute the pointwise limit of $\hat f_N$ using some trig identities (separate
into cases for $|p| = 1$, $|p| < 1$, $|p| > 1$) to get
\[
\lim_{N \to \infty} \hat f_N(p) =
\begin{cases}
0, & |p| > 1 \\
\frac12 \sqrt{\frac{\pi}{2}}, & |p| = 1 \\
\sqrt{\frac{\pi}{2}}, & |p| < 1.
\end{cases}
\]
References: Fourier transform on $L^2$: [13]. Carleson's theorem: [5]
and hoping for independence since Gabor systems are often basis-like.) Anyway,
these $L^2$ functions were really interested in finite linear independence since
it seemed simpler than allowing infinite sums. That is, they were wondering
whether or not there exist $c_1, \ldots, c_N \in \mathbb{C}$, not all zero, such that
$\sum_{j=1}^N c_j M_{b_j} T_{a_j} f(x) = 0$ a.e. The set of points $\Lambda = \{(a_j, b_j)\}_{j=1}^N$ can be
represented in the phase space $\mathbb{R} \times \mathbb{R}$, where the x-axis is time and the y-axis is
frequency. This let the functions talk about the geometry of the points in $\Lambda$.
Figure 10.35: Some points in time-frequency space. These four points form a trape-
zoid.
While the functions didn't know for sure whether all possible finite Gabor
systems were independent, they could at least obtain some results for simpler
cases based on the geometry of $\Lambda$. The full statement they hoped for was this:
Conjecture 1 (Heil-Ramanathan-Topiwala [3]). If $0 \not\equiv f \in L^2(\mathbb{R})$ and
$\Lambda \subset \mathbb{R} \times \mathbb{R}$ is a finite set of distinct points, then $G(f, \Lambda)$ is finitely linearly independent.
One easy result they could prove was linear independence of frequency
86. FOURIER TRANSFORMS TURN TIME-TRANSLATES INTO FREQUENCY MODULATIONS253
If $g \not\equiv 0$, then $\sum_{j=1}^N c_j e^{2\pi i \beta_j x} = 0$ on a set of positive measure (where $g(x) \ne 0$).
But if we extend this trig polynomial to $x \in \mathbb{C}$, we get $c_j = 0$ for all
$j = 1, \ldots, N$ by uniqueness of analytic functions⁸ (uncountable subsets of $\mathbb{R}$
have an accumulation point). So we can't get linear dependence with only
frequency modulations.
Additionally,
Proposition 51. Time-translates $g(x - a_j)$ are finitely linearly independent.
This seems a bit harder than frequency modulations, which were easy to
deal with. But, if we use the Fourier transform, we can change all of our time
translations into frequency modulations! Since we're now doing time-frequency
analysis instead of physics, we'll take a different normalization for the Fourier
transform where we put the $2\pi$ in the exponent,
\[ \hat f(\xi) := \int_{\mathbb{R}} f(x) e^{-2\pi i \xi x}\,dx. \]
tao-notes
This is actually the good place to put the 2, according to Terence Tao [11].
This is where well put the 2 for the rest of this chapter, since were done with
physics for a bit. Well also use instead of p in hopes of avoiding confusion.
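Now suppose we have the relation
$$\sum_{j=1}^N c_j g(x - a_j) = 0.$$
Taking the Fourier transform of both sides gives
$\sum_{j=1}^N c_j e^{-2\pi i a_j \xi} \hat{g}(\xi) = 0$.
But from what we had for just frequency modulations, this implies $\hat{g}(\xi) = 0$, so
$g \equiv 0$ by Fourier inversion!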
To deal with the time-translates, we effectively took the points $\{(a_j, 0)\}$ and
rotated them 90° counterclockwise to become $\{(0, a_j)\}$, which correspond to
frequency-translates. (Figure 10.36.)
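The trick of trading a time-translate for a frequency modulation can be checked numerically in a discrete, cyclic setting. The sketch below is only an illustration under stated assumptions: the finite DFT stands in for the continuous Fourier transform, and the sample signal `g` and the shift `a` are made up for the example.

```python
import cmath

def dft(signal):
    """Discrete Fourier transform with the e^{-2 pi i k t / n} sign convention."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def translate(signal, shift):
    """Cyclic time-translate: the entry at index t becomes signal[t - shift]."""
    n = len(signal)
    return [signal[(t - shift) % n] for t in range(n)]

def modulate(spectrum, shift):
    """Multiply the k-th Fourier coefficient by e^{-2 pi i k shift / n}."""
    n = len(spectrum)
    return [spectrum[k] * cmath.exp(-2j * cmath.pi * k * shift / n) for k in range(n)]

# Hypothetical sample data, chosen only for illustration.
g = [0.0, 1.0, 3.0, 2.0, 0.5, 0.0, 0.0, 0.0]
a = 3

lhs = dft(translate(g, a))   # transform of the time-translate
rhs = modulate(dft(g), a)    # modulation of the transform
max_error = max(abs(u - v) for u, v in zip(lhs, rhs))
print("largest discrepancy:", max_error)
```

The two sides agree up to rounding, which is the discrete shadow of the identity used above: translating in time multiplies the transform by a pure frequency.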
8 Sorry, some complex analysis snuck in there.
254 CHAPTER 10. FOURIER ANALYSIS
Once upon a time, there was a differential operator whose name was
Nabla. As expected, Nabla took a nice function $f$ and returned its gradient,
$\nabla f$. Now Nabla was a fairly complicated operator, and so one day decided to
try to find out how to make himself simpler. Nabla was jealous of the really
simple multiplication operators like $M : f \mapsto Mf$ for a constant $M \in \mathbb{R}$. But
unfortunately, Nabla didn't see an easy way to make himself simpler other than
doing something like only operating on constant functions.
One day, Nabla met the Fourier transformer. The Fourier transformer
promised to turn Nabla into a multiplication operator. Not as simple as the
multiplication by a constant operators, but at least it looked better than
differentiation. Nabla was excited! Except then the Fourier transformer transformed
him into Fourier space, using
$$\widehat{\nabla f}(\xi) = 2\pi i \xi \, \hat{f}(\xi).$$
"Now you're just multiplication by $2\pi i \xi$," said the Fourier transformer. "Hey!
I don't want to have to live in Fourier space," complained Nabla. And he ran
away from the Fourier transformer (after finding the inverse Fourier transformer
to undo the transformation).
Now Nabla continued to wander around until eventually he came to
1-dimensional $\mathbb{R}$ land and met the Littlewood-Paley (LP) square function. (You
can imagine the same conversation occurring in $\mathbb{R}^n$ land with some minor
modifications.) The LP square function takes a function $f \in L^2(\mathbb{R})$ and spits out a
sequence $\{P_k f\}_{k \in \mathbb{Z}}$, where each $P_k$ restricts $f$ to frequencies near $2^k$.
87. THE GRADIENT MEETS LITTLEWOOD-PALEY 255
What exactly are these $P_k$'s? Let $\phi(\xi)$ be a radial bump function supported
on $[-2, 2]$ and equal to 1 on $[-1, 1]$, and let
$$\psi(\xi) := \phi(\xi) - \phi(2\xi),$$
so $\psi$ is supported on $\{1/2 \le |\xi| \le 2\}$. See Figure 10.37 for possible graphs of
$\phi$ and $\psi$.
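To see why the difference of two bumps is useful, here is a small numerical sketch. It writes the bump as `phi` and the difference as `psi`, and checks that the dyadic dilates of `psi` add up to 1 away from the origin (the sum telescopes). The text does not fix a particular bump, so the `exp(-1/t)` construction in `smooth_step` is an assumption made only for this illustration.

```python
import math

def smooth_step(t):
    """Smooth transition that is 0 for t <= 0 and 1 for t >= 1."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    a = math.exp(-1.0 / t)
    b = math.exp(-1.0 / (1.0 - t))
    return a / (a + b)

def phi(xi):
    """Even bump: equal to 1 on [-1, 1] and 0 outside [-2, 2]."""
    return smooth_step(2.0 - abs(xi))

def psi(xi):
    """Difference of bumps, supported on 1/2 <= |xi| <= 2."""
    return phi(xi) - phi(2.0 * xi)

def dyadic_sum(xi, K):
    """Sum of psi(xi / 2^k) over k = -K, ..., K."""
    return sum(psi(xi / 2.0 ** k) for k in range(-K, K + 1))

# The sum telescopes to phi(xi / 2^K) - phi(2^(K+1) * xi),
# which equals 1 whenever 2^(-K) <= |xi| <= 2^K.
for xi in (0.06, 0.3, -1.7, 5.0):
    print(xi, dyadic_sum(xi, 6))
```

Each frequency away from zero is therefore shared among at most a couple of neighbouring dyadic pieces, which is what makes "frequencies near $2^k$" a sensible way to slice up a function.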
[Figure 10.37: graphs of $\phi(\xi)$ and $\psi(\xi)$.]
Figure 10.38: Plots of $\psi(\xi/2^k)$ for $k = -3, -2, \ldots, 2$.
$$\nabla P_k f \approx 2^k P_k f.$$
Nabla was pleased with this result, although he wanted a slightly more precise
result to back up the LP square function's claim and (10.20). So the LP square
function gave him a proposition:
Proposition 52. For $1 \le p \le \infty$,
Figure 10.39: Dirac delta tries to disguise himself as a function on $\mathbb{R}$. Good luck.
Whenever Dirac delta visits a function-land like $L^1(\mathbb{R})$, he tries to sneak
in by pretending to be
$$\begin{cases} \infty, & x = 0 \\ 0, & x \neq 0 \end{cases}.$$
But as soon as anyone tries to integrate $\delta$, they get suspicious. "Here's a
function that is zero a.e. whose integral is nonzero... wait a minute, that's
impossible, we have an imposter!" And then they kick $\delta$ out.
So Dirac delta gave up trying to enter function-lands like $L^1(\mathbb{R}^n)$ and
wandered far and wide until coming to measure-land. Here, he was accepted as an
atom measure. But this was quite restrictive! As an atom measure, Dirac delta
always had to stay inside the integral. The other measures in measure-land
viewed $\delta_{x_0}(x) := \delta(x - x_0)$ as the atom measure that puts weight 1 at the point
$x_0$, and weight 0 everywhere else, i.e.
$$\delta_{x_0}(E) = \begin{cases} 1, & x_0 \in E \\ 0, & x_0 \notin E \end{cases}.$$
Then the integral$^9$ $\int_{\mathbb{R}} f(x) \delta(x - x_0) \, dx$ is just an integral with respect to the
(non-Lebesgue) measure $\delta(x - x_0)$, and this produces the same answers that the
physicists get, $\int_{\mathbb{R}} f(x) \delta(x - x_0) \, dx = f(x_0)$.
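One way to get a feel for the physicists' answer is to smear the atom slightly and watch the integral settle down. The sketch below is not the construction in the text: it replaces the atom measure by narrow normalized Gaussians (an assumption made for illustration), integrates with a plain midpoint rule, and uses a made-up test case $f = \cos$, $x_0 = 0.5$.

```python
import math

def gaussian(x, center, eps):
    """Normalized Gaussian of width eps centered at `center` (total mass 1)."""
    z = (x - center) / eps
    return math.exp(-0.5 * z * z) / (eps * math.sqrt(2.0 * math.pi))

def smeared_value(f, x0, eps, steps=4000):
    """Midpoint-rule approximation of the integral of f(x) * gaussian(x, x0, eps)."""
    lo = x0 - 8.0 * eps          # tails beyond 8 widths are negligible
    hi = x0 + 8.0 * eps
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * h
        total += f(x) * gaussian(x, x0, eps)
    return total * h

# Hypothetical test case: f = cos, atom at x0 = 0.5.
x0 = 0.5
widths = (0.1, 0.03, 0.01)
errors = [abs(smeared_value(math.cos, x0, eps) - math.cos(x0)) for eps in widths]
for eps, err in zip(widths, errors):
    print(eps, err)
```

As the width shrinks, the smeared integral approaches $f(x_0)$, which is the value the atom measure assigns exactly.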
But $\delta$ wanted to be free of these restraints and live outside an integral.
So he kept wandering around to find a place where he could be accepted as
himself. He wanted to find a place where he could be some sort of function,
live outside an integral and not only as a measure, and compute derivatives
$D^\alpha \delta$. Eventually, $\delta$ made his way to distribution-land, and this is where he
decided to stay. The distributions in distribution-land explained how they were
defined. They started with the space of test functions, $C_c^\infty(\Omega)$ for some open
nonempty $\Omega \subset \mathbb{R}^n$, i.e. compactly supported, infinitely differentiable functions
on $\Omega$. These are super nice functions, nice enough that they come and visit all
the distributions in distribution-land all the time. The topology on this space
that we want is essentially uniform convergence of all derivatives on a certain
compact set. More precisely, $\phi_m$ converges to $\phi$ iff there is a compact set
$K \subset \Omega$ so that $\phi_m$ is supported in $K$ for all $m$ and for each $\alpha \in \mathbb{N}_0^n$,
$D^\alpha \phi_m \to D^\alpha \phi$ uniformly on $K$.
Definition 56. A distribution is an element in the dual space of $C_c^\infty(\Omega)$, i.e. a
continuous linear functional $C_c^\infty(\Omega) \to \mathbb{C}$.
Dirac delta immediately saw that he fit right in here as the distribution
$\delta : \phi \mapsto \phi(0)$, for $\phi \in C_c^\infty(\mathbb{R})$, and was quite happy with this fact. Of course
$\delta$ was still worried about differentiation: all these physicists kept assigning
homework problems about his derivative, but he had no idea how to define it
himself! The other distributions in distribution-land assured him there was no
problem. The functions in $C_c^\infty(\mathbb{R}^n)$ were so nice that whenever a distribution
needed to be differentiated, they put the differentiation on themselves instead.
For $T$ a distribution,
$$(D^\alpha T)(\phi) := (-1)^{|\alpha|} T(D^\alpha \phi),$$
The $L^1_{loc}$ functions explained, "Sure, we have citizenship in both $L^1_{loc}$-land and
distribution-land. You see, every $L^1_{loc}$ function $f$ can be identified with a
distribution $T_f$, where
$$T_f(\phi) := \int f \phi \, dx.$$ (10.22)
Sometimes we even say that the distribution $T_f$ is the function $f$. And if $f$ is
$C^1$, then the classical derivative agrees with the weak (distributional) derivative.
But of course, you're not like us. You aren't a function at all." Dirac delta was
a bit sad at this, but then he realized that being only a distribution was ok:
what's the point in having distribution-land if everyone was already in
$L^1_{loc}$-land? He also pointed out that they were wrong; Dirac delta was indeed a
function, just not a function on $\mathbb{R}^n$.
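A short worked example may help connect the two kinds of citizens. Take the Heaviside step function $H$ (equal to 1 for $x > 0$ and 0 otherwise), which is chosen here purely as an illustration and is not part of the story above. It is locally integrable, so it defines a distribution $T_H$, and moving the derivative onto the test function with $|\alpha| = 1$ gives:

```latex
% Illustrative computation: the weak derivative of the Heaviside function H.
% phi is a compactly supported smooth test function on R.
\begin{align*}
(D T_H)(\phi) &= (-1)^{1} \, T_H(\phi')
               = -\int_{\mathbb{R}} H(x) \, \phi'(x) \, dx \\
              &= -\int_0^\infty \phi'(x) \, dx
               = \phi(0)
               = \delta(\phi).
\end{align*}
```

So, in this example, the weak derivative of an honest $L^1_{loc}$ function is Dirac delta himself, even though he is not a function on $\mathbb{R}$.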
Later, Dirac delta also ran into some old friends from measure-land. He
found some Radon measures (locally finite and inner regular measures), who
explained how they are citizens of both measure-land and distribution-land.
For a Radon measure $\mu$, the corresponding distribution is
$$T_\mu(\phi) := \int \phi \, d\mu.$$ (10.23)
If $\mu \ll m$ where $m$ is the Lebesgue measure, then $d\mu = f \, dm$ by
Radon-Nikodym (Section 55), and the distribution can be identified with the function
$f$. Dirac delta realized he too could become a dual citizen of measure-land and
distribution-land since he corresponds to the atom measure! Dirac delta was
quite satisfied.
Finally, you might be wondering why this is in the Fourier transform section.
It turns out we can Fourier transform distributions. And the Fourier transform
on $L^p$, $p > 2$ has to be viewed as a distribution. We'll address this next.
References: [7]
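$$T(\phi) = \hat{T}(\check{\phi}) = \int_{\mathbb{R}^n} d\xi \, \hat{T}(\xi) \int_{\mathbb{R}^n} dx \, \phi(x) e^{2\pi i \xi \cdot x}
= \int_{\mathbb{R}^n} dx \, \phi(x) \int_{\mathbb{R}^n} d\xi \, \hat{T}(\xi) e^{2\pi i \xi \cdot x}.$$
So we identify $T$ with the function $T(x) = \int_{\mathbb{R}^n} \hat{T}(\xi) e^{2\pi i \xi \cdot x} \, d\xi$.
References: [7, 1]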
$$\|\partial^\alpha f\|_\infty \le C_{n,r} \|f\|_{H^r}, \quad |\alpha| \le k.$$ (10.34)
$$\widehat{\partial^\alpha f}(\xi) = (2\pi i \xi)^\alpha \hat{f}(\xi),$$
References
[1] Javier Duoandikoetxea. Fourier Analysis. Graduate Studies in Mathematics 29. American Mathematical Society, 2000.
[2] Loukas Grafakos. Classical Fourier Analysis. 3rd ed. Graduate Texts in Mathematics 249. Springer, 2014.
[3] Christopher Heil, Jayakumar Ramanathan, and Pankaj Topiwala. Linear independence of time-frequency translates. In: Proc. Amer. Math. Soc. 124 (1996), pp. 2787-2795. url: http://www.ams.org/journals/proc/1996-124-09/S0002-9939-96-03346-1/S0002-9939-96-03346-1.pdf.
[4] Yitzhak Katznelson. An Introduction to Harmonic Analysis. 3rd ed. Cambridge University Press, 2004.
[5] Michael Lacey. Carleson's Theorem: Proof, Complements, Variations. Publ. Mat. 48 (2004), no. 2, 251-307. 2003. eprint: arXiv:math/0307008.
[6] Michael Lacey and Christoph Thiele. A proof of boundedness of the Carleson operator. In: Mathematical Research Letters 7.4 (2000), pp. 361-370. doi: 10.4310/mrl.2000.v7.n4.a1. url: http://dx.doi.org/10.4310/mrl.2000.v7.n4.a1.
[7] Elliott Lieb and Michael Loss. Analysis. Graduate Studies in Mathematics 14. American Mathematical Society, 2001.
[8] Walter Rudin. Real and Complex Analysis. McGraw-Hill Education, 1986.
[9] Barry Simon. A Comprehensive Course in Analysis. American Mathematical Society, 2015.
[10] Elias Stein and Rami Shakarchi. Fourier Analysis: An Introduction. Vol. 1. Princeton Lectures in Analysis. Princeton University Press, 2003.
[11] Terence Tao. Math 254a Harmonic analysis in the phase plane. 2001. url: https://www.math.ucla.edu/~tao/254a.1.01w/.
[12] Gerald Teschl. Mathematical Methods in Quantum Mechanics. 2nd ed. Graduate Studies in Mathematics 157. American Mathematical Society, 2014.
[13] Gerald Teschl. Topics in Real and Functional Analysis. 2015. url: https://www.mat.univie.ac.at/~gerald/ftp/book-fa/fa.pdf.
[14] Antoni Zygmund. Trigonometric Series. 3rd ed. Cambridge University Press, 2003.
Chapter 11
91 Rectifiability
[maybe combine with something?]
93 Kakeya sets
[TODO write. Maybe also mention Nikodym sets.]
266 CHAPTER 11. MISCELLANEOUS (MAYBE MOVE THESE LATER?)
It is immediate that this function is additive. To see that it is not linear, note
that its image is contained in $\mathbb{Q}$! Furthermore, for $b \in B$, we have $f(b) = 1$.
Let $\alpha$ be your favorite irrational number. Then $\alpha f(b)$ is irrational, so
$f(\alpha b) \neq \alpha f(b)$.
94. CAUCHY'S FUNCTIONAL EQUATION AND HAMEL FUNCTIONS 267
Proof. TODO: give credit to Horst Herrlich's Axiom of Choice. Also TODO:
somehow illustrate this proof. Consider replacing with the proof at
http://web.stanford.edu/~ckhend/additive.pdf, which proceeds directly rather than
contrapositively, but which is slightly less elementary. Also TODO: consider just
omitting the proof entirely, since it's not that simple...
First, assume that $f : \mathbb{R} \to \mathbb{R}$ is additive and nonlinear, and assume that
$f(x_0) = 0$ and $f(x_1) = 1$, where $x_0$ is nonzero. For $n \in \mathbb{Z}$, let $q_n$ be a rational
number so that $|n x_1 - q_n x_0| < \frac{1}{2}$. Define $A_n = f^{-1}([n, n + 1))$. Define
$B_0 = A_0 \cap [-\frac{1}{2}, \frac{3}{2}]$, and define
$$B_n = B_0 + n x_1 - q_n x_0.$$
$$1 \le m\Bigl(\bigcup_n B_n\Bigr) \le 3.$$
$$x + n x_1 - q_n x_0 = y + m x_1 - q_m x_0.$$
$$f(x) + n = f(y) + m.$$
Chapter 12
Acknowledgments
References
[1] S. Duvois and C. Macdonald. 101 Illustrated Analysis Bedtime Stories. 2001. url: http://people.maths.ox.ac.uk/macdonald/errh/.
[2] George Lucas. Star Wars Episode VI: The Return of the Jedi. 1983.
270 CHAPTER 12. ACKNOWLEDGMENTS
Appendix A
Omitted Details
272 APPENDIX A. OMITTED DETAILS
> 0. We'll show that the plan is not sound. For each $N > n$, define
$$a_N = \frac{1}{N} \sum_{i=1}^N x_i, \qquad b_N = \frac{1}{N - n} \sum_{i=n+1}^N x_i.$$
First, suppose that $|b_N - x_n| > \frac{1}{2}$ for some $N$. Then when the stack has $N$
books, it will topple over, pivoting about one of the top corners of book $n$.
Therefore, assume instead that $|b_N - x_n| \le \frac{1}{2}$ for all $N$. Then
$\frac{n}{N} b_N \to 0$ as $N \to \infty$, and of course $\frac{1}{N} \sum_{i=1}^n x_i \to 0$ as $N \to \infty$, so
$|a_N - b_N| \to 0$ as $N \to \infty$.
Choose $N$ large enough that $|a_N - b_N| < \varepsilon$. Then by the triangle inequality,
$|a_N - x_n| < \frac{1}{2} + \varepsilon$, so $a_N > 0$. Therefore, when the stack has $N$ books, it will
topple, pivoting about the upper right corner of the table.
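The step where the two averages merge rests on the identity $a_N - b_N = \frac{1}{N}\sum_{i=1}^n x_i - \frac{n}{N} b_N$, which follows from splitting the sum defining $a_N$ at $i = n$. The sketch below checks that identity numerically and watches the gap shrink. The book positions $x_i = 1 - 1/i$ and the choice $n = 5$ are hypothetical, picked only so that $b_N$ stays bounded.

```python
def averages(x, n, N):
    """Return (a_N, b_N), where x[0], x[1], ... stand for x_1, x_2, ..."""
    a = sum(x[:N]) / N            # average of x_1, ..., x_N
    b = sum(x[n:N]) / (N - n)     # average of x_{n+1}, ..., x_N
    return a, b

def gap_formula(x, n, N):
    """a_N - b_N rewritten as (1/N) * (x_1 + ... + x_n) - (n/N) * b_N."""
    _, b = averages(x, n, N)
    return sum(x[:n]) / N - (n / N) * b

# Hypothetical bounded book positions, for illustration only.
n = 5
x = [1.0 - 1.0 / i for i in range(1, 10001)]

gaps = {}
for N in (100, 1000, 10000):
    a, b = averages(x, n, N)
    gaps[N] = a - b
    print(N, gaps[N], gap_formula(x, n, N))
```

Both columns agree up to rounding, and the gap decays roughly like $1/N$, which is why a large enough stack lets $a_N$ inherit the bound on $b_N$ up to $\varepsilon$.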