231A Lecture Notes v23
Kyle Hambrook
February 5, 2020
Contents
2 Preliminaries: Set Theory, Extended Real Numbers, Limsup and Liminf, Infinite Series
3 σ-Algebras
4 Measurable Functions
5 Measures
6 Simple Functions
7 Lebesgue Integral
11 Fatou’s Lemma
13 Dominated Convergence Theorem
15 Almost Everywhere
16 Complete Measures
22 L^p Spaces: Completeness
28 Product σ-Algebras
29 Monotone Classes
31 Product Measures
33 Lebesgue Measure on R^n
1 Introduction: Riemann to Lebesgue
You are probably familiar with the Riemann integral from calculus and undergraduate analysis. If
f is a non-negative real-valued function defined on an interval [a, b], then, roughly speaking, the
Riemann integral of f is a limit of Riemann sums
\[ \int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)(x_i - x_{i-1}), \]
where [a, b] is partitioned into subintervals [x0 , x1 ], [x1 , x2 ], . . . , [xn−1 , xn ], we choose sample points
x∗i ∈ [xi−1 , xi ], and (xi − xi−1 ) is the length of [xi−1 , xi ]. We say f is Riemann integrable on [a, b]
when its Riemann integral is defined.
In this course, we will study Lebesgue’s theory of integration with respect to a measure. Roughly, a
measure µ on a set X is a function which takes subsets A ⊆ X as inputs and gives non-negative real
numbers µ(A) as outputs. You can think of µ(A) as some general notion of the size of A. The most
important measure is the Lebesgue measure on R, denoted by λ. For each interval [a, b], λ([a, b])
is the length b − a. For other sets A ⊆ R, λ(A) is the length of A (we will see how to define this
later.) There are many other useful measures. If f is a non-negative real-valued function defined
on a set X, then, roughly speaking, the integral of f with respect to µ is
\[ \int_X f(x)\,d\mu(x) = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\mu(A_i). \]
Here the set X is partitioned into subsets A1 , . . . , An , we choose sample points x∗i ∈ Ai , and µ(Ai )
is the measure of the set Ai . When the integral is defined, we say f is µ-integrable. When µ is the
Lebesgue measure λ on R, the integral is simply called the Lebesgue integral.
(1) Lebesgue’s theory allows us to integrate functions defined on arbitrary sets, whereas the Rie-
mann integral is restricted to functions defined on R or Rd . This is especially important for certain
applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foun-
dation for modern probability theory, where the functions are random variables and we integrate
over the sample space of possible outcomes.
(2) The Lebesgue integral (i.e., the integral with respect to Lebesgue measure) is defined for a
larger class of functions on R, though it still agrees with the Riemann integral whenever the latter
is defined. For example, the indicator function of the rationals
\[ 1_{\mathbb{Q}}(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \\ 0 & \text{if } x \notin \mathbb{Q} \end{cases} \]
is not Riemann integrable on any interval [a, b], but it is Lebesgue integrable on any such interval.
We will see the details later.
(3) Lebesgue’s theory possesses better convergence theorems, which lead to more general and more elegant
results and to more useful spaces of functions.
Prototype Convergence Theorem. If (fn ) is a sequence of integrable functions defined on [a, b]
and (fn ) converges to f “nicely”, then f is integrable and $\lim_n \int_a^b f_n = \int_a^b f$.
The problem is, of course, to figure out what “nicely” might mean and to define the integral in
such a way that theorems like this one will be widely applicable. These were important unresolved
issues in the late nineteenth century; they arose, for example, in the study of Fourier series. The
Lebesgue theory was developed, in large part, to address this. Let us look at this problem in a bit
more detail.
Definition 1.1. A sequence (fn ) of functions fn : X → R is said to converge pointwise to a
function f : X → R if
\[ \lim_{n\to\infty} f_n(x) = f(x) \quad \text{for each } x \in X. \]
In this case, we write fn → f pointwise.
Example 1.2. Let fn be the tent function whose graph is the triangle with vertices (0, 0), (1/n, n), (2/n, 0)
and which equals zero elsewhere. Then fn → f = 0 pointwise, fn is Riemann integrable on [a, b], and f = 0 is
Riemann integrable on [a, b], but (taking, for instance, [a, b] = [0, 2]) $\lim_n \int_a^b f_n = 1 \neq 0 = \int_a^b f$.
So we don’t get the desired conclusion of the Prototype Convergence Theorem.
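The behaviour of the tent functions can be checked with exact arithmetic. The following is a minimal sketch, assuming the interval [a, b] = [0, 2] (so that it contains the support [0, 2/n] of every fn); the helper names `tent` and `tent_integral` are illustrative and not part of the notes.

```python
from fractions import Fraction

def tent(n, x):
    """Tent function f_n: triangle with vertices (0,0), (1/n,n), (2/n,0); zero elsewhere."""
    x = Fraction(x)
    peak = Fraction(1, n)
    end = Fraction(2, n)
    if 0 <= x <= peak:
        return n * n * x              # rising edge, slope n^2
    if peak < x <= end:
        return n * n * (end - x)      # falling edge, slope -n^2
    return Fraction(0)

def tent_integral(n):
    """Exact integral of f_n over [0, 2/n].

    f_n is piecewise linear, so the trapezoid rule on its breakpoints is exact.
    """
    pts = [Fraction(0), Fraction(1, n), Fraction(2, n)]
    return sum((b - a) * (tent(n, a) + tent(n, b)) / 2 for a, b in zip(pts, pts[1:]))

# At a fixed point x0 > 0 the values f_n(x0) are eventually 0 (once 2/n <= x0),
# yet every f_n has integral exactly 1.
x0 = Fraction(1, 10)
values_at_x0 = [tent(n, x0) for n in (1, 5, 10, 20, 100)]
integrals = [tent_integral(n) for n in (1, 2, 10, 1000)]
```

The peak of height n escapes to the left as n grows, which is why the pointwise limit is 0 at every point while the area under each graph stays equal to 1.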
Example 1.3. Choose an enumeration Q = {r1 , r2 , r3 , . . .}. Define f = 1Q and define fn by
fn (x) = 1 if x ∈ {r1 , . . . , rn } and fn (x) = 0 otherwise. Then fn → f pointwise, and fn is Riemann
integrable on [a, b], but f is not Riemann integrable on [a, b]. Again we don’t get the desired
conclusion of the Prototype Convergence Theorem. Note that here the sequence (fn ) is quite nice.
Indeed, (fn ) is bounded (|fn | ≤ 1 for all n) and (fn ) is increasing (fn ≤ fn+1 for all n). But it’s
not enough.
Example 1.4. Later, we will construct a sequence of functions fn : [0, 1] → R such that
• 0 ≤ fn ≤ 1 for each n
This sequence is even nicer than the one in the previous example, but we still don’t get the desired
conclusion of the Prototype Convergence Theorem.
The examples above show that pointwise convergence isn’t good enough for Riemann integration.
By assuming more, we can get some convergence theorems for Riemann integration. We describe
three such theorems below. Unfortunately, they are all somewhat unsatisfactory.
Definition 1.5. A sequence (fn ) of functions fn : X → R is said to converge uniformly to a
function f : X → R if
\[ \lim_{n\to\infty} \sup \{ |f_n(x) - f(x)| : x \in X \} = 0. \]
In this case, we write fn → f uniformly.
Note that uniform convergence implies pointwise convergence.
Theorem 1.6. If fn is Riemann integrable on [a, b] for each n and fn → f uniformly on [a, b], then
f is Riemann integrable on [a, b] and $\lim_n \int_a^b f_n = \int_a^b f$.
Theorem 1.7. If fn is Riemann integrable on [a, b] for each n, fn ≤ fn+1 for each n, fn → f
pointwise on [a, b], and f is Riemann integrable on [a, b], then $\lim_n \int_a^b f_n = \int_a^b f$.
Theorem 1.8. If fn is Riemann integrable on [a, b] for each n, |fn | ≤ M for each n, fn → f
pointwise on [a, b], and f is Riemann integrable on [a, b], then $\lim_n \int_a^b f_n = \int_a^b f$.
The problem with the first of these three theorems is that uniform convergence is too strong. In
many applications, we don’t have it. The problem with the last two theorems is that the Riemann
integrability of f is part of the hypothesis, rather than part of the conclusion.
We will see that the Lebesgue theory gives us very powerful convergence theorems, which lead to
some very impressive results and some very useful function spaces.
2 Preliminaries: Set Theory, Extended Real Numbers, Limsup
and Liminf, Infinite Series
Set Theory
Definition 2.1.
A ∪ B = {x : x ∈ A or x ∈ B}
A ∩ B = {x : x ∈ A and x ∈ B}
• If {Ai : i ∈ I} is an indexed family of sets, the union and intersection of the family are
\[ \bigcup_{i \in I} A_i = \{x : x \in A_i \text{ for at least one } i \in I\} \]
\[ \bigcap_{i \in I} A_i = \{x : x \in A_i \text{ for all } i \in I\} \]
• If A and B are sets, the set difference of A and B (or the relative complement of B
in A) is
\[ A \setminus B = \{x : x \in A \text{ and } x \notin B\} \]
It is read as “A minus B” or “A take away B”.
• If all the sets in a given context are subsets of a fixed set X, then the complement (or
absolute complement) of a set A ⊆ X is
\[ A^c = X \setminus A = \{x \in X : x \notin A\} \]
Properties of Complement:
Suppose all sets under consideration are subsets of a fixed set X. Let A, B ⊆ X and let {Ai : i ∈ I}
be an indexed family of subsets of X.
• A ∪ Ac = X
• A ∩ Ac = ∅
• ∅c = X
• Xc = ∅
• (Ac )c = A
• A \ B = A ∩ B c = B c \ Ac
• If A ⊆ B, then B c ⊆ Ac
• De Morgan’s Laws:
\[ \Big(\bigcup_{i \in I} A_i\Big)^c = \bigcap_{i \in I} A_i^c \quad\text{and}\quad \Big(\bigcap_{i \in I} A_i\Big)^c = \bigcup_{i \in I} A_i^c \]
In words, De Morgan’s Laws say that the complement of a union is the intersection of the comple-
ments, and the complement of an intersection is the union of the complements.
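• $B \cup \bigcup_{i \in I} A_i = \bigcup_{i \in I} (B \cup A_i)$
• $B \cap \bigcap_{i \in I} A_i = \bigcap_{i \in I} (B \cap A_i)$
• $B \cup \bigcap_{i \in I} A_i = \bigcap_{i \in I} (B \cup A_i)$
• $B \cap \bigcup_{i \in I} A_i = \bigcup_{i \in I} (B \cap A_i)$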
• A = (A ∩ B) ∪ (A \ B)
• A ∩ B = A \ (A \ B)
• A \ (B ∪ C) = (A \ B) \ C
• (A ∪ B) \ C = (A \ C) ∪ (B \ C)
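Identities of this kind are easy to sanity-check by brute force on a small universe. The sketch below is only an illustration, not a proof: it assumes a four-element set X (an arbitrary choice) and checks De Morgan’s laws and the set-difference identities for all subsets using Python’s built-in set operations. The names `powerset`, `complement` and `check_identities` are hypothetical helpers.

```python
from itertools import combinations

X = frozenset(range(4))  # small illustrative universe

def powerset(s):
    """All subsets of s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def complement(a):
    """Absolute complement relative to the fixed set X."""
    return X - a

def check_identities():
    """Check the listed identities for every triple of subsets of X; return the number of triples."""
    subsets = powerset(X)
    checked = 0
    for a in subsets:
        assert complement(complement(a)) == a
        for b in subsets:
            # A \ B = A ∩ B^c = B^c \ A^c
            assert (a - b) == (a & complement(b)) == (complement(b) - complement(a))
            # De Morgan's laws (two-set case)
            assert complement(a | b) == (complement(a) & complement(b))
            assert complement(a & b) == (complement(a) | complement(b))
            assert a == ((a & b) | (a - b))
            assert (a & b) == (a - (a - b))
            for c in subsets:
                assert (a - (b | c)) == ((a - b) - c)
                assert ((a | b) - c) == ((a - c) | (b - c))
                checked += 1
    return checked

triples_checked = check_identities()
```

A passing run only shows the identities hold for subsets of this particular X; the general statements follow from the element-wise definitions above.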
Definition 2.2. For any set X, the power set of X, denoted by P(X), is the collection of all
subsets of X:
P(X) = {A : A ⊆ X}
Definition 2.3. Let X, Y be sets. A function (or map) f : X → Y is a rule that assigns to
each element x ∈ X a unique element f (x) ∈ Y . The sets X and Y are called the domain and
codomain of f . The image (or range) of f is the set
f (X) = {f (x) : x ∈ X} .
f (A) = {f (x) : x ∈ A}
f −1 (B) = {x ∈ X : f (x) ∈ B}
The inverse image commutes with unions, intersections, set differences, and complements:
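• $f^{-1}\big(\bigcup_{i \in I} A_i\big) = \bigcup_{i \in I} f^{-1}(A_i)$
• $f^{-1}\big(\bigcap_{i \in I} A_i\big) = \bigcap_{i \in I} f^{-1}(A_i)$
• $f^{-1}(A \setminus B) = f^{-1}(A) \setminus f^{-1}(B)$
• $f^{-1}(A^c) = \big(f^{-1}(A)\big)^c$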
g◦f :X →Z
defined by
(g ◦ f )(x) = g(f (x)) for every x ∈ X.
The set of extended real numbers is the set obtained by adjoining the two symbols −∞ and
+∞ to the set of real numbers. It is denoted by $\overline{\mathbb{R}}$. Thus
\[ \overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}. \]
• ∞ + ∞ = ∞ and −∞ − ∞ = −∞
• 0 · (±∞) = 0
• $\dfrac{x}{\pm\infty} = 0$ for all x ∈ R.
In some other areas of mathematics, the products 0 · (±∞) are left undefined, but not here.
| − ∞| = | + ∞| = +∞
Supremum and Infimum
Let A ⊆ R.
If A is non-empty and bounded above in R, the order completeness axiom of the real numbers
implies that A has a supremum sup(A) in R.
If A is non-empty but not bounded above in R, then ∞ is the only upper bound for A, and so
sup(A) = ∞.
If A is empty, then every extended real number is an upper bound for A, and so sup(A) = −∞.
Every subset of $\overline{\mathbb{R}}$ has a supremum and an infimum in $\overline{\mathbb{R}}$.
Thus every subset of R has a supremum in $\overline{\mathbb{R}}$. The same goes for infimum.
Definition 2.6.
• Let a ∈ R. We write lim an = a (and we say that (an ) converges to a and that a is the limit
of (an )) if for every real ε > 0 there exists an N ∈ N such that if n ≥ N then
\[ |a_n - a| < \varepsilon. \]
• We write lim an = +∞ (and we say that (an ) converges to +∞ and that +∞ is the limit of
(an )) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then
\[ a_n > M. \]
• We write lim an = −∞ (and we say that (an ) converges to −∞ and that −∞ is the limit of
(an )) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then
\[ a_n < -M. \]
Definition 2.7. The limsup (or limit superior) and liminf (or limit inferior) are defined by
\[ \limsup a_n = \inf_{k \ge 1} \Big( \sup_{n \ge k} a_n \Big) \]
\[ \liminf a_n = \sup_{k \ge 1} \Big( \inf_{n \ge k} a_n \Big) \]
Note that
• The sequence $c_k = \sup_{n \ge k} a_n$ is a decreasing sequence
• The sequence $b_k = \inf_{n \ge k} a_n$ is an increasing sequence
• $\limsup a_n = \inf_{k \ge 1} \big( \sup_{n \ge k} a_n \big) = \lim_k \big( \sup_{n \ge k} a_n \big)$
• $\liminf a_n = \sup_{k \ge 1} \big( \inf_{n \ge k} a_n \big) = \lim_k \big( \inf_{n \ge k} a_n \big)$
• $b_k \le a_k \le c_k$ for all k
Theorem 2.8. We have lim sup an = lim inf an iff limn an = L for some L ∈ R, in which case
Proof. Exercise.
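To see the tail suprema and infima at work, here is a small sketch for the illustrative sequence a_n = (−1)^n (1 + 1/n), which is an assumption chosen for this example and does not appear in the notes. A computer can only look at finitely many terms, so the tails are truncated at a finite horizon; for this particular sequence the truncation is harmless because each tail attains its supremum at its first even index and its infimum at its first odd index.

```python
from fractions import Fraction

def a(n):
    """Illustrative sequence a_n = (-1)^n (1 + 1/n), n >= 1."""
    return (-1) ** n * (1 + Fraction(1, n))

HORIZON = 400  # finite truncation of the tails (an approximation in general)

def tail_sup(k, horizon=HORIZON):
    """c_k = sup of a_n over k <= n <= horizon."""
    return max(a(n) for n in range(k, horizon + 1))

def tail_inf(k, horizon=HORIZON):
    """b_k = inf of a_n over k <= n <= horizon."""
    return min(a(n) for n in range(k, horizon + 1))

# c[i] and b[i] correspond to k = i + 1.
c = [tail_sup(k) for k in range(1, 51)]
b = [tail_inf(k) for k in range(1, 51)]
```

Here (c_k) decreases towards 1 and (b_k) increases towards −1, so lim sup a_n = 1 and lim inf a_n = −1; since the two differ, the sequence has no limit.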
Infinite Series
\[ \lim_k s_k = \lim_k \sum_{n=1}^{k} a_n = \sum_{n=1}^{\infty} a_n. \]
(ii) Note that condition (a) above is needed to guarantee that the undefined expression ∞ − ∞
does not occur in any of the partial sums $s_k = \sum_{n=1}^{k} a_n$.
(iii) If ∞ is one of the terms of the series and −∞ is not, then the sum of the series is ∞. Likewise
if we swap the roles of ∞ and −∞.
3 σ-Algebras
Definition 3.1. Let X be any set. A σ-algebra on X is a collection A ⊆ P(X) with the following
properties.
(a) ∅ ∈ A
(b) If A ∈ A, then Ac = X \ A ∈ A.
(c) If $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$.
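(i) $X \in \mathcal{A}$
(ii) If $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$.
(iii) If $A_1, \ldots, A_n \in \mathcal{A}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$.
(iv) If $A_1, \ldots, A_n \in \mathcal{A}$, then $\bigcap_{i=1}^{n} A_i \in \mathcal{A}$.
(v) If $A, B \in \mathcal{A}$, then $A \setminus B = A \cap B^c \in \mathcal{A}$.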
Proof. In this proof, (a),(b),(c) refer to the properties in the definition of a σ-algebra.
(iv): Define Ai = X for i > n and apply (ii). Alternatively, use De Morgan’s laws and apply (b)
and (iii).
(a) P(X)
(b) {∅, X}
(c) {∅, A0 , Ac0 , X} for any fixed A0 ⊆ X.
Theorem 3.4. Let X be any set. The intersection of any collection of σ-algebras on X is a
σ-algebra on X.
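Proof. Let $\{\mathcal{A}_j\}_{j \in J}$ be any collection of σ-algebras on X and consider their intersection $\bigcap_{j \in J} \mathcal{A}_j$.
We need to check properties (a), (b), and (c) of the definition of a σ-algebra.
(a): We have $\emptyset \in \mathcal{A}_j$ for every j ∈ J. So $\emptyset \in \bigcap_{j \in J} \mathcal{A}_j$.
(b): Suppose $A \in \bigcap_{j \in J} \mathcal{A}_j$. Then $A \in \mathcal{A}_j$ for every j ∈ J. Since $\mathcal{A}_j$ is a σ-algebra for every j ∈ J,
we have $A^c \in \mathcal{A}_j$ for every j ∈ J. So $A^c \in \bigcap_{j \in J} \mathcal{A}_j$.
(c): Suppose $A_1, A_2, \ldots \in \bigcap_{j \in J} \mathcal{A}_j$. Then $A_1, A_2, \ldots \in \mathcal{A}_j$ for every j ∈ J. Since $\mathcal{A}_j$ is a σ-algebra
for every j ∈ J, we have $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}_j$ for every j ∈ J. So $\bigcup_{i=1}^{\infty} A_i \in \bigcap_{j \in J} \mathcal{A}_j$.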
Definition 3.5. Let X be any set and let G ⊆ P(X). The intersection of all σ-algebras on X that
contain G is denoted by σ(G). In other words,
\[ \sigma(\mathcal{G}) = \bigcap \{ \mathcal{A} : \mathcal{A} \text{ is a $\sigma$-algebra on } X,\ \mathcal{G} \subseteq \mathcal{A} \}. \]
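On a finite set X the generated σ-algebra can be computed explicitly, which may help build intuition for the definition. The sketch below assumes X is finite, so that countable unions reduce to finite unions and it is enough to close the family under complements and pairwise unions; the function name `generate_sigma_algebra` is a hypothetical helper, and this closure procedure is a convenience for the finite case rather than the intersection construction used in the definition.

```python
def generate_sigma_algebra(X, G):
    """Smallest family containing G, the empty set and X that is closed
    under complement and pairwise union (finite X only)."""
    X = frozenset(X)
    family = {frozenset(), X} | {frozenset(g) for g in G}
    changed = True
    while changed:
        changed = False
        current = list(family)
        for a in current:
            # candidates produced from a: its complement and its unions with other members
            candidates = [X - a] + [a | b for b in current]
            for s in candidates:
                if s not in family:
                    family.add(s)
                    changed = True
    return family

# Example: X = {1,2,3,4}, G = {{1}, {1,2}}. The "atoms" are {1}, {2}, {3,4},
# so the generated sigma-algebra has 2^3 = 8 members.
sigma_G = generate_sigma_algebra({1, 2, 3, 4}, [{1}, {1, 2}])

# Generating from a single set A0 gives {∅, A0, A0^c, X}.
sigma_single = generate_sigma_algebra({1, 2, 3, 4}, [{1}])
```

The loop stops once a full pass adds no new set; on a finite X this must happen because the power set is finite.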
Theorem 3.6. Let X be any set and let G ⊆ P(X). Then σ(G) is the smallest σ-algebra on X
that contains G; in other words,
(ii) G ⊆ σ(G)
Definition 3.7.
Definition 3.8. Suppose X is R or Rd (or any topological space). The Borel σ-algebra on X,
denoted by B(X), is the σ-algebra generated by the collection of all open subsets of X. In other
words, if T denotes the collection of all open subsets of X, then B(X) = σ(T ). The elements of
B(X) are called Borel sets in X.
Theorem 3.9. The Borel σ-algebra B(Rd ) is generated by the collection of all open balls in Rd .
In other words, if G is the collection of all open balls in Rd , then B(Rd ) = σ(G).
Proof. Let T be the collection of all open sets in Rd . Since B(Rd ) = σ(T ), we must show σ(G) =
σ(T ). Since G ⊆ T , we have σ(G) ⊆ σ(T ) by Theorem 3.6. It remains to show σ(T ) ⊆ σ(G). If
A ∈ T , then A is a union of open balls in Rd . So for each point x ∈ A, there is an open ball
Bx that contains x and is contained in A. Now for each point x ∈ A choose an open ball (1) that
contains x, (2) that is contained in Bx , (3) whose center is a point with rational coordinates, and
(4) whose radius is rational. In this way, A is written as a union of balls whose centers have
rational coordinates and whose radii are rational. There are only countably many such balls, so A is
written as a union of a countable collection of open balls. Hence A ∈ σ(G). Thus T ⊆ σ(G). Therefore σ(T ) ⊆ σ(G).
Theorem 3.10. The Borel σ-algebra B(R) is generated by each of the following collections of sets:
(iii) G3 = {(−∞, b] : b ∈ R}
(iv) G4 = {(−∞, b) : b ∈ R}
(vii) G7 = {[a, ∞) : a ∈ R}
(viii) G8 = {(a, ∞) : a ∈ R}
Proof. (i): The previous theorem implies G1 generates B(R), i.e., σ(G1 ) = B(R).
(ii): By writing
\[ (a, b] = \bigcap_{n=1}^{\infty} (a, b + 1/n) \]
we see that G1 ⊆ σ(G2 ), hence σ(G1 ) ⊆ σ(G2 ). This proves G2 generates B(R).
(iii): By writing
\[ (-\infty, b] = \bigcup_{n=1}^{\infty} (-n, b] \]
we see that G2 ⊆ σ(G3 ), hence σ(G2 ) ⊆ σ(G3 ). This proves G3 generates B(R).
(iv): By writing
\[ (-\infty, b) = \bigcup_{n=1}^{\infty} (-\infty, b - 1/n] \]
we see that G3 ⊆ σ(G4 ), hence σ(G3 ) ⊆ σ(G4 ). This proves G4 generates B(R).
(v),(vi),(vii),(viii): Similar.
Definition 3.11. Let $\mathcal{B}(\overline{\mathbb{R}})$ denote the collection of all sets of the form A, A ∪ {−∞}, A ∪ {∞},
A ∪ {−∞, ∞}, where A is a Borel set in R. It is straightforward to check that $\mathcal{B}(\overline{\mathbb{R}})$ is a σ-algebra
on $\overline{\mathbb{R}}$. We call $\mathcal{B}(\overline{\mathbb{R}})$ the Borel σ-algebra on $\overline{\mathbb{R}}$.
Remark 3.12. It is possible to define a collection of open sets in $\overline{\mathbb{R}}$ and to define the Borel σ-algebra
on $\overline{\mathbb{R}}$ as the σ-algebra generated by the open sets in $\overline{\mathbb{R}}$. See Exercise X???
Theorem 3.13. The Borel σ-algebra $\mathcal{B}(\overline{\mathbb{R}})$ is generated by each of the following collections of sets:
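Proof. (i) By the previous lemma, $\mathcal{G}_3' \subseteq \mathcal{B}(\overline{\mathbb{R}})$, hence $\sigma(\mathcal{G}_3') \subseteq \mathcal{B}(\overline{\mathbb{R}})$. Now we show $\mathcal{B}(\overline{\mathbb{R}}) \subseteq \sigma(\mathcal{G}_3')$.
Let $B \in \mathcal{B}(\overline{\mathbb{R}})$. Then B equals one of A or A ∪ {−∞} or A ∪ {∞} or A ∪ {−∞, ∞}, where A ∈ B(R).
To show that $B \in \sigma(\mathcal{G}_3')$, it suffices to show that $\{-\infty\}, \{\infty\}, A \in \sigma(\mathcal{G}_3')$. By writing
\[ \{-\infty\} = \bigcap_{n=1}^{\infty} [-\infty, -n] \]
we see that $\{-\infty\} \in \sigma(\mathcal{G}_3')$. Recall the definition of G5 from the previous theorem. By writing
we see that $\mathcal{G}_5 \subseteq \sigma(\mathcal{G}_3')$, and hence $\sigma(\mathcal{G}_5) \subseteq \sigma(\mathcal{G}_3')$. By the previous theorem, B(R) = σ(G5 ). So we
have $A \in \mathcal{B}(\mathbb{R}) = \sigma(\mathcal{G}_5) \subseteq \sigma(\mathcal{G}_3')$. Therefore $B \in \sigma(\mathcal{G}_3')$ and we conclude $\mathcal{B}(\overline{\mathbb{R}}) \subseteq \sigma(\mathcal{G}_3')$.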
(ii),(iii),(iv): Similar.
4 Measurable Functions
To motivate the definition of a measurable function, we ask the reader to recall the following
theorem from undergraduate analysis about continuous functions.
If T is the collection of open sets in R, we can rewrite this as: f is continuous iff
f −1 (V ) ∈ T for every V ∈ T
The reader who has studied topology may recall the following more general theorem.
Theorem 4.2. Let X and Y be topological spaces and f : X → Y . Then f is continuous iff
If TX is the collection of open sets in X and TY is the collection of open sets in Y , we can rewrite
this as: f is continuous iff
f −1 (V ) ∈ TX for every V ∈ TY
In summary, a continuous function is one whose inverse image preserves open sets.
If f : Rd → R, we sometimes (but not always) consider the Borel σ-algebra on the domain Rd . We
say f : Rd → R is Borel measurable (or B(Rd )-measurable) if
The next theorem says that to show a function f is measurable we only need to check f −1 (B) for
B in a generating collection.
Theorem 4.4. Let (X, A) and (Y, B) be measurable spaces. Let f : X → Y . Let G be any
collection of subsets of Y that generates B, i.e., σ(G) = B. Then f is measurable iff
Proof. ⇒: Since G ⊆ σ(G) = B, if
then
f −1 (B) ∈ A for every B ∈ G.
By combining the above theorem and Theorem 3.13 (which gives generating sets for B(R)), we
obtain:
Theorem 4.5. Let (X, A) be a measurable space and let f : X → R. Then the following are
equivalent:
(i) f is measurable
Remark 4.6. When verifying a function f : X → R is measurable, we usually use the theorem
above, rather than the definition.
Theorem 4.7. Let (X, A) be a measurable space and let f : X → R. If f is measurable, then
{x ∈ X : f (x) = c} = f −1 ({c}) ∈ A for every c ∈ R.
Proof.
{x ∈ X : f (x) = c} = {x ∈ X : f (x) ≤ c} ∩ {x ∈ X : f (x) ≥ c} ∈ A.
{f = c} = {x ∈ X : f (x) = c} = f −1 ({c}),
and so on.
Theorem 4.8. Let (X, A) be a measurable space. Let f : X → R. If f is constant, then f is
measurable.
Proof. Assume f is constant. This means there exists a c ∈ R such that f (x) = c for all x ∈ X.
Let a ∈ R. Then {x ∈ X : f (x) ≥ a} = ∅ if a > c, and {x ∈ X : f (x) ≥ a} = X if a ≤ c. Either
way, {x ∈ X : f (x) ≥ a} ∈ A. Thus f is measurable.
Theorem 4.9. Let (X, A) be a measurable space. Let A ⊆ X. Then A is measurable (i.e., A ∈ A)
iff the indicator function 1A is measurable.
So 1A is measurable.
Proof. Since f is continuous, f −1 (G) is open for every open set G ⊆ R. So {x ∈ X : f (x) < a} =
f −1 ((−∞, a)) is open for every a ∈ R. Therefore {x ∈ X : f (x) < a} = f −1 ((−∞, a)) ∈ B(R) for
every a ∈ R.
Proof. Exercise.
Theorem 4.12. Let (X, A), (Y, B), (Z, C) be measurable spaces. Let f : X → Y and g : Y → Z.
If f is (A, B)-measurable and g is (B, C)-measurable, then the function
g ◦ f : X → Z is (A, C)-measurable.
Proof. Exercise.
Proof. (i): For each fixed x ∈ X, we have f (x) < g(x) iff there is an r ∈ Q such that f (x) < r <
g(x). Thus
\[ \{x \in X : f(x) < g(x)\} = \bigcup_{r \in \mathbb{Q}} \big( \{x \in X : f(x) < r\} \cap \{x \in X : r < g(x)\} \big). \]
(ii): {f ≤ g} = {g < f }c
(iii): {f = g} = {f ≤ g} ∩ {f ≥ g}
Theorem 4.14. Let (X, A) be a measurable space. Let f, g : X → R be measurable functions.
Then the following functions are measurable:
(i):
\[ \{f + g < a\} = \{f < -g + a\} = \bigcup_{r \in \mathbb{Q}} \big( \{f < r\} \cap \{r < -g + a\} \big) = \bigcup_{r \in \mathbb{Q}} \big( \{f < r\} \cap \{g < -r + a\} \big) \in \mathcal{A}. \]
Similarly, A3 , A4 ∈ A. Therefore {f g < a} ∈ A.
We have:
\[ A_0 = (\{f = 0\} \cup \{g = 0\}) \cap \{fg < a\} = \begin{cases} \emptyset & \text{if } a \le 0 \\ \{f = 0\} \cup \{g = 0\} & \text{if } a > 0 \end{cases} \in \mathcal{A} \]
(ii) min {f, g}
(iii) |f |
(iii): {|f | < a} = {−a < f < a} = {f > −a} ∩ {f < a} ∈ A. Alternatively, write |f | = max(f, −f )
and use (ii) and (vi).
Theorem 4.16. Let (X, A) be a measurable space. Let (fn ) be a sequence of measurable functions
X → [−∞, ∞]. The following functions are measurable:
(i) supn fn
(ii) inf n fn
Moreover, if (fn (x)) converges to an extended real number for each x ∈ X, then limn fn =
lim supn fn = lim inf n fn , and so limn fn is measurable.
(iii): By (i), gk = supn≥k fn is measurable for each k ∈ N. By (ii), lim sup fn = inf k≥1 gk is
measurable.
5 Measures
(a) µ(∅) = 0
(i) (Finite Additivity) If A, B are disjoint sets in A, then µ(A ∪ B) = µ(A) + µ(B).
Proof. (i): Define A1 = A, A2 = B, and Ai = ∅ for i > 2 and use countable additivity.
Since µ(A) < ∞, we can subtract it from both sides to get the desired result.
(iii): Since B = (B ∩ A) ∪ (B \ A) = A ∪ (B \ A), finite additivity and the fact that µ(B \ A) ≥ 0
give
\[ \mu(B) = \mu(A) + \mu(B \setminus A) \ge \mu(A). \]
by countable additivity and finite additivity,
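\[ \begin{aligned}
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) &= \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu(B_i) \\
&= \lim_{n\to\infty} \sum_{i=1}^{n} \mu(B_i) = \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} B_i\Big) \\
&= \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \lim_{n\to\infty} \mu(A_n).
\end{aligned} \]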
By monotonicity, $\mu\big(\bigcap_{i=1}^{\infty} A_i\big) \le \mu(A_n) \le \mu(A_1) < \infty$. Then (ii) gives
\[ \mu(A_1) - \mu\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} \big(\mu(A_1) - \mu(A_n)\big). \]
Corollary 5.3. Let X be any set. If µ is a measure on P(X), then µ is an outer measure on X.
(i) The counting measure on X is the measure µ : P(X) → [0, ∞] defined by µ(A) = ∞ if A is
an infinite subset of X and µ(A) equals the number of elements in A if A is a finite subset of
X.
(ii) Let x0 ∈ X. The Dirac measure at x0 or point mass at x0 is the measure µ : P(X) → [0, ∞]
defined by µ(A) = 0 if x0 ∉ A and µ(A) = 1 if x0 ∈ A.
As an exercise, the reader should verify that these are indeed measures.
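For finite sets, both measures are one-liners, and the measure axioms can be spot-checked numerically. This is a minimal sketch restricted to finite subsets (Python sets cannot represent the infinite case, where the counting measure takes the value ∞); the names `counting_measure` and `dirac_measure` are illustrative helpers. It does not replace the exercise of verifying countable additivity in general.

```python
def counting_measure(A):
    """Counting measure of a finite set A: its number of elements."""
    return len(A)

def dirac_measure(x0):
    """Return the Dirac measure (point mass) at x0 as a function on sets."""
    def mu(A):
        return 1 if x0 in A else 0
    return mu

delta = dirac_measure(3)

A = {1, 2, 3}
B = {7, 8}          # disjoint from A
C = {1, 2, 3, 4}    # contains A

# Finite additivity on disjoint sets, and monotonicity on nested sets.
additivity_counting = counting_measure(A | B) == counting_measure(A) + counting_measure(B)
additivity_dirac = delta(A | B) == delta(A) + delta(B)
monotone = counting_measure(A) <= counting_measure(C) and delta(A) <= delta(C)
```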
6 Simple Functions
Proof. If each Ei is measurable, then each 1Ei is measurable, so $s = \sum_{i=1}^{n} c_i 1_{E_i}$ is measurable.
(v) If f ≥ 0, then 0 ≤ s1 ≤ s2 ≤ . . . ≤ f .
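Decompose [−∞, ∞] further by dividing (−k, 0] and [0, k) up into intervals of length $1/2^k$. So we
get
\[ [-\infty, \infty] = [-\infty, -k] \cup \bigcup_{m=1}^{k2^k} \Big( \frac{-m}{2^k}, \frac{-(m-1)}{2^k} \Big] \cup \bigcup_{m=1}^{k2^k} \Big[ \frac{m-1}{2^k}, \frac{m}{2^k} \Big) \cup [k, \infty]. \]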
Figure 1: Definition of sk and sk+1
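For each x ∈ X, consider the subinterval to which f (x) belongs and define sk (x) to be the endpoint
of that subinterval which is closest to 0. More precisely,
\[ s_k(x) = \begin{cases}
k & \text{if } f(x) \in [k, \infty] \\
\frac{m-1}{2^k} & \text{if } f(x) \in \big[ \frac{m-1}{2^k}, \frac{m}{2^k} \big),\ m \in \{1, \ldots, k2^k\} \\
\frac{-(m-1)}{2^k} & \text{if } f(x) \in \big( \frac{-m}{2^k}, \frac{-(m-1)}{2^k} \big],\ m \in \{1, \ldots, k2^k\} \\
-k & \text{if } f(x) \in [-\infty, -k].
\end{cases} \]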
(ii): No matter which subinterval f (x) belongs to, sk (x) rounds f (x) to a number at least as close to zero as
sk+1 (x) does. See Figure 1.
\[ |f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon. \]
Therefore limk→∞ sk (x) = f (x).
(iv): If f is bounded, then there exists M > 0 such that f (x) ∈ (−M, M ) for every x ∈ X. Let
ε > 0. Choose K ∈ N such that $1/2^K < \varepsilon$ and K ≥ M . For every k ≥ K and every x ∈ X, we have
f (x) ∈ (−k, k), and so
\[ |f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon. \]
Therefore sk → f uniformly on X.
(v): If f (x) ≥ 0, then sk (x) ≥ 0 by definition. The rest follows from (ii).
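The rounding rule that defines sk is simple to implement: clamp to [−k, k], then round toward zero to a multiple of 1/2^k. The sketch below works with the value y = f(x) directly and assumes it is given as a Python float (possibly ±inf); the function name `s_k` is an illustrative choice. Multiplying a float by a power of two is exact (barring overflow), so the dyadic rounding is not affected by floating-point error.

```python
import math

def s_k(y, k):
    """Value of the k-th approximating simple function at a point where f equals y."""
    if y >= k:
        return float(k)            # f(x) in [k, inf]
    if y <= -k:
        return float(-k)           # f(x) in [-inf, -k]
    scale = 2 ** k
    # Round toward zero to the nearest multiple of 1/2^k.
    return math.trunc(y * scale) / scale

samples = [0.0, 0.3, -0.3, 2.71828, -5.5, math.inf, -math.inf]

# |s_k| is non-decreasing in k and never exceeds |f|.
monotone = all(
    abs(s_k(y, k)) <= abs(s_k(y, k + 1)) <= abs(y)
    for y in samples for k in range(1, 12)
)

# Once k exceeds |f(x)|, the error is at most 2^-k.
error_bound = all(
    abs(y - s_k(y, k)) <= 2.0 ** -k
    for y in samples if math.isfinite(y)
    for k in range(1, 12) if abs(y) < k
)
```

For example, with y = 2.71828 the successive values are 1, 2, 2.625, 2.6875, and so on, increasing toward y.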
Then
\[ s_k = k 1_{F_k} + (-k) 1_{F_{-k}} + \sum_{m=1}^{k2^k} \frac{m-1}{2^k} 1_{E_m} + \sum_{m=1}^{k2^k} \frac{-(m-1)}{2^k} 1_{E_{-m}}. \]
If f is measurable, then all the sets Fk , F−k , Em , E−m are measurable, and so sk is measurable.
7 Lebesgue Integral
Definition 7.2. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Let P = {A1 , . . . , An } be an A-partition of X. The lower Lebesgue sum for f and P is
\[ L(f, P) = \sum_{i=1}^{n} \Big( \inf_{A_i} f \Big) \mu(A_i). \]
Remark: Since f : X → [0, ∞], the lower Lebesgue sum involves only terms in [0, ∞]. If we had
allowed f : X → [−∞, ∞], then the sum could have both ∞ and −∞ terms, which would result in
the undefined expression ∞ − ∞.
Definition 7.3. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
The Lebesgue integral (or simply integral) of f with respect to µ is defined to be
\[ \int f\,d\mu = \sup \{ L(f, P) : P \text{ is an } \mathcal{A}\text{-partition of } X \}. \]
Note that $\int f\,d\mu$ is always either a finite number in [0, ∞) or ∞. If $\int f\,d\mu$ is finite, we say that
f is integrable with respect to µ. We sometimes write the integral $\int f\,d\mu$ as $\int f(x)\,d\mu(x)$ or $\int f$
instead. If the measure is Lebesgue measure on R, i.e., if µ = λ, we usually write dx instead of dλ or
dλ(x).
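On a finite measure space the supremum over partitions can be computed by brute force. The sketch below assumes X = {0, 1, 2, 3} with the σ-algebra P(X), a measure given by point weights, and that an A-partition means a finite collection of disjoint non-empty measurable sets with union X; the weights, the function values and the helper names `partitions` and `lower_sum` are all illustrative assumptions.

```python
def partitions(items):
    """Yield every partition of a list into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for p in partitions(rest):
        # put `first` into one of the existing blocks ...
        for i in range(len(p)):
            yield p[:i] + [[first] + p[i]] + p[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + p

def lower_sum(f, mu, P):
    """Lower Lebesgue sum L(f, P) = sum over blocks of (inf of f on block) * mu(block)."""
    return sum(min(f[x] for x in block) * sum(mu[x] for x in block) for block in P)

X = [0, 1, 2, 3]
f = {0: 3, 1: 1, 2: 4, 3: 1}     # a non-negative function on X
mu = {0: 2, 1: 1, 2: 0, 3: 5}    # point masses defining the measure

all_partitions = list(partitions(X))
integral = max(lower_sum(f, mu, P) for P in all_partitions)   # sup over partitions
trivial = lower_sum(f, mu, [X])                               # the partition {X}
finest = lower_sum(f, mu, [[x] for x in X])                   # partition into singletons
```

In this finite setting the supremum is attained by the partition into singletons, giving the familiar weighted sum of the values f(x)µ({x}); coarser partitions can only lose mass by replacing f with its infimum on each block.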
Definition 7.4. Let f : X → [−∞, ∞]. The positive part of f is the function
\[ f^+ = \max\{0, f\}. \]
The negative part of f is the function
\[ f^- = \max\{0, -f\}. \]
(a) f + , f − ≥ 0
(b) f = f + − f −
(c) |f | = f + + f −
Definition 7.6. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] be a measurable function.
If at least one of $\int f^+\,d\mu$ and $\int f^-\,d\mu$ is finite, the Lebesgue integral (or simply integral) of f
Definition 7.7. Let (X, A, µ) be a measure space. Let f : X → C be a measurable function.
If both $\int \mathrm{Re}(f)\,d\mu$ and $\int \mathrm{Im}(f)\,d\mu$ are finite, we say f is integrable with respect to µ and the
Lebesgue integral (or simply integral) of f with respect to µ is defined to be
\[ \int f\,d\mu = \int \mathrm{Re}(f)\,d\mu + i \int \mathrm{Im}(f)\,d\mu. \]
Theorem 7.11. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Let c ≥ 0. Then $\int cf\,d\mu = c \int f\,d\mu$.
Proof. Exercise.
Corollary 7.12. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] be a measurable
function. Let c ∈ R. Then $\int cf\,d\mu = c \int f\,d\mu$.
Corollary 7.13. Let (X, A, µ) be a measure space. Let f : X → C be a measurable function. Let
c ∈ C. Then $\int cf\,d\mu = c \int f\,d\mu$.
8 The Integral in Terms of Simple Functions
Proof. Since E ∈ A, the function 1E is measurable. We prove the ≥ and ≤ inequalities separately.
Therefore
\[ \int 1_E\,d\mu \ge \mu(E). \]
Therefore
\[ \int 1_E\,d\mu \le \mu(E). \]
Theorem 8.2. Let (X, A, µ) be a measure space. If s is a measurable simple function with s ≥ 0
and standard representation $s = \sum_{i=1}^{n} c_i 1_{E_i}$, then
\[ \int s\,d\mu = \sum_{i=1}^{n} c_i \mu(E_i). \]
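Thus $\int s\,d\mu \ge \sum_{i=1}^{n} c_i \mu(E_i)$. Now we prove the reverse inequality. Let P = {A1 , . . . , Am } be any
A-partition of X. Note that we can write each Aj as the disjoint union $A_j = \bigcup_{i=1}^{n} (A_j \cap E_i)$. We
can also write each Ei as the disjoint union $E_i = \bigcup_{j=1}^{m} (A_j \cap E_i)$. Then
\[ \begin{aligned}
L(s, P) &= \sum_{j=1}^{m} \Big( \inf_{A_j} s \Big) \mu(A_j) \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} \Big( \inf_{A_j} s \Big) \mu(A_j \cap E_i) \\
&\le \sum_{j=1}^{m} \sum_{i=1}^{n} \Big( \inf_{A_j \cap E_i} s \Big) \mu(A_j \cap E_i) \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} c_i \mu(A_j \cap E_i) \\
&= \sum_{i=1}^{n} c_i \sum_{j=1}^{m} \mu(A_j \cap E_i) \\
&= \sum_{i=1}^{n} c_i \mu(E_i).
\end{aligned} \]
Therefore $\int s\,d\mu \le \sum_{i=1}^{n} c_i \mu(E_i)$.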
It is useful to restate the definition of the Lebesgue integral in terms of simple functions.
Theorem 8.3. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Then
\[ \int f\,d\mu = \sup \Big\{ \int s\,d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\} \tag{8.1} \]
Proof. The ≥ inequality comes from Theorem 7.9. Now we prove the reverse inequality. Let
P = {A1 , . . . , An } be any A-partition of X. We must show L(f, P ) is ≤ the right-hand side of 8.1.
Case 1. For every i ∈ {1, . . . , n}, either $\inf_{A_i} f < \infty$ or µ(Ai ) = 0. Define
\[ c_i = \begin{cases} \inf_{A_i} f & \text{if } \inf_{A_i} f < \infty \\ 0 & \text{if } \inf_{A_i} f = \infty \text{ and } \mu(A_i) = 0. \end{cases} \]
Define $s = \sum_{i=1}^{n} c_i 1_{A_i}$. This is the standard representation of the measurable simple function s.
By Theorem 8.2, $L(f, P) = \int s\,d\mu$. Thus L(f, P ) is ≤ the right-hand side of 8.1.
Case 2. For some i0 ∈ {1, . . . , n}, we have $\inf_{A_{i_0}} f = \infty$ and $\mu(A_{i_0}) > 0$. Then L(f, P ) = ∞.
Seeking a contradiction, assume the right-hand side of 8.1 equals a finite number M ∈ [0, ∞).
Define $c_1 = (M + 1)/\mu(A_{i_0})$ and $c_2 = 0$. Define $s = c_1 1_{A_{i_0}} + c_2 1_{X \setminus A_{i_0}}$. This is the standard
representation of the measurable simple function s. Note s is a simple measurable function with
0 ≤ s ≤ f . By Theorem 8.2, $\int s\,d\mu = M + 1$. So $\int s\,d\mu$ is strictly larger than the right-hand side of
8.1. Contradiction.
9 Monotone Convergence Theorem
Theorem 9.1. (Monotone Convergence Theorem) Let (X, A, µ) be a measure space. Let fn :
X → [0, ∞] (n = 1, 2, . . .) be a sequence of measurable functions. If fn increases to f pointwise
(meaning that f1 (x) ≤ f2 (x) ≤ . . . and f (x) = limn fn (x) = supn fn (x) for every x ∈ X), then f is
measurable and
\[ \lim_n \int f_n = \int f. \]
En = {x ∈ X : cs(x) ≤ fn (x)} .
We want to take the limit n → ∞ in (9.1). Suppose the standard representation of s is
$s = \sum_{i=1}^{k} a_i 1_{A_i}$. Then $s 1_{E_n}$ is a simple function and its standard representation is $s 1_{E_n} = \sum_{i=1}^{k} a_i 1_{E_n \cap A_i}$.
Therefore
\[ \int s = \sum_{i=1}^{k} a_i \mu(A_i) \]
and
\[ \int s 1_{E_n} = \sum_{i=1}^{k} a_i \mu(E_n \cap A_i). \]
Note E1 ⊆ E2 ⊆ . . . and
\[ X = \bigcup_{n=1}^{\infty} E_n. \]
(To see the last equality, consider an arbitrary x ∈ X. If f (x) = 0 or s(x) = 0, then x ∈ E1 . If
f (x) > 0 and s(x) > 0, then cs(x) < f (x), hence there exists n ∈ N such that cs(x) < fn (x) ≤ f (x),
and so x ∈ En .) Let i ∈ {1, . . . , k} be arbitrary. We have E1 ∩ Ai ⊆ E2 ∩ Ai ⊆ . . . and
\[ A_i = \bigcup_{n=1}^{\infty} (E_n \cap A_i). \]
Therefore
\[ \lim_n \int s 1_{E_n} = \lim_n \sum_{i=1}^{k} a_i \mu(E_n \cap A_i) = \sum_{i=1}^{k} a_i \mu(A_i) = \int s. \]
Thus taking n → ∞ in (9.1) gives
\[ c \int s \le \lim_n \int f_n. \]
10 Additivity of the Integral
Lemma 10.1. Let (X, A, µ) be a measure space. If f, g are non-negative measurable simple
functions, then
\[ \int (f + g) = \int f + \int g. \]
Proof. Let $f = \sum_{i=1}^{m} a_i 1_{A_i}$ and $g = \sum_{j=1}^{n} b_j 1_{B_j}$ be the standard representations of f and g. Note
$A_i = \bigcup_{j=1}^{n} (A_i \cap B_j)$ is a disjoint union for each i = 1, . . . , m. Likewise $B_j = \bigcup_{i=1}^{m} (A_i \cap B_j)$ is a
disjoint union for each j = 1, . . . , n. Then
\[ \begin{aligned}
\int f + \int g &= \sum_{i=1}^{m} a_i \mu(A_i) + \sum_{j=1}^{n} b_j \mu(B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \mu(A_i \cap B_j) + \sum_{i=1}^{m} \sum_{j=1}^{n} b_j \mu(A_i \cap B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\end{aligned} \]
However this may not be the standard representation of f + g because the values ai + bj may not
be distinct. Let $c_1, \ldots, c_\ell$ be the distinct numbers in the set
\[ \{a_i + b_j : 1 \le i \le m,\ 1 \le j \le n\}. \]
Let Ek be the union of those sets Ai ∩ Bj such that ai + bj = ck . Then the standard representation
of f + g is
\[ f + g = \sum_{k=1}^{\ell} c_k 1_{E_k} \]
and
\[ \mu(E_k) = \sum_{i,j :\, a_i + b_j = c_k} \mu(A_i \cap B_j). \]
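Therefore
\[ \begin{aligned}
\int (f + g) &= \sum_{k=1}^{\ell} c_k \mu(E_k) \\
&= \sum_{k=1}^{\ell} c_k \sum_{i,j :\, a_i + b_j = c_k} \mu(A_i \cap B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\end{aligned} \]
Comparing the calculations for $\int f + \int g$ and $\int (f + g)$ shows they are equal.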
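The bookkeeping in this proof, in particular the regrouping needed when different pairs (i, j) give the same sum ai + bj, can be traced on a small example. The sketch below assumes a four-point measure space with illustrative point weights, represents a simple function as a dictionary from points to values, and computes the integral from the standard representation as in Theorem 8.2; the helper names `standard_representation` and `integral` are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

def standard_representation(s):
    """Map each distinct value c of s to the set E_c = {x : s(x) = c}."""
    rep = defaultdict(set)
    for x, value in s.items():
        rep[value].add(x)
    return dict(rep)

def integral(s, mu):
    """Integral of a non-negative simple function: sum of c * mu(E_c) over distinct values c."""
    return sum(c * sum(mu[x] for x in E) for c, E in standard_representation(s).items())

# Point weights of an illustrative measure on X = {a, b, c, d}.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(2), "d": Fraction(5)}

f = {"a": 1, "b": 1, "c": 2, "d": 2}
g = {"a": 3, "b": 2, "c": 2, "d": 1}
h = {x: f[x] + g[x] for x in f}   # f + g takes only the values 4 and 3

lhs = integral(h, mu)
rhs = integral(f, mu) + integral(g, mu)
```

Here the sums 1 + 3 and 2 + 2 coincide, as do 1 + 2 and 2 + 1, so the standard representation of f + g has only two terms even though there are four non-empty sets Ai ∩ Bj; both sides of the identity nevertheless agree.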
Theorem 10.2. Let (X, A, µ) be a measure space. If f, g : X → [0, ∞] are measurable functions,
then
\[ \int (f + g) = \int f + \int g. \]
Proof. By Theorem 6.3, there are sequences of measurable simple functions (sn ) and (tn ) such that
sn increases to f and tn increases to g. Then (sn + tn ) is a sequence of measurable functions that
increases to f + g. By the monotone convergence theorem and Lemma 10.1,
\[ \int (f + g) = \lim_n \int (s_n + t_n) = \lim_n \int s_n + \lim_n \int t_n = \int f + \int g. \]
Corollary 10.3. Let (X, A, µ) be a measure space. If f, g : X → C are integrable functions, then
f + g is integrable and
\[ \int (f + g) = \int f + \int g. \]
Proof. Case 1: f and g are real-valued. Note $(f + g)^+ \le f^+ + g^+$. Then Theorem 10.2 gives
\[ \int (f + g)^+ \le \int (f^+ + g^+) = \int f^+ + \int g^+ < \infty. \]
Therefore
\[ \int (f + g) = \int f + \int g. \]
Case 2: f and g are complex-valued. Since f and g are integrable, so are Re(f ), Im(f ), Re(g), and
Im(g). We apply Case 1 to the real and imaginary parts of f and g. We get
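\[ \int \mathrm{Re}(f + g) = \int (\mathrm{Re}(f) + \mathrm{Re}(g)) = \int \mathrm{Re}(f) + \int \mathrm{Re}(g) < \infty \]
and
\[ \int \mathrm{Im}(f + g) = \int (\mathrm{Im}(f) + \mathrm{Im}(g)) = \int \mathrm{Im}(f) + \int \mathrm{Im}(g) < \infty. \]
So f + g is integrable and
\[ \begin{aligned}
\int (f + g) &= \int \mathrm{Re}(f + g) + i \int \mathrm{Im}(f + g) = \int \mathrm{Re}(f) + \int \mathrm{Re}(g) + i \int \mathrm{Im}(f) + i \int \mathrm{Im}(g) \\
&= \int \mathrm{Re}(f) + i \int \mathrm{Im}(f) + \int \mathrm{Re}(g) + i \int \mathrm{Im}(g) = \int f + \int g.
\end{aligned} \]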
Corollary 10.4. Let (X, A, µ) be a measure space. Let f, g : X → [−∞, ∞] be measurable
functions.
Proof. Exercise.
11 Fatou’s Lemma
Proof. Define gk = inf n≥k fn for k ∈ N. Then (gk ) is an increasing sequence of measurable functions
and limk gk = lim inf fn . By the monotone convergence theorem,
\[ \int \liminf f_n = \lim_k \int g_k. \]
Note gk ≤ fn for n ≥ k. So $\int g_k \le \int f_n$ for n ≥ k. Thus $\int g_k \le \inf_{n \ge k} \int f_n$. Therefore
\[ \int \liminf f_n = \lim_k \int g_k \le \lim_k \inf_{n \ge k} \int f_n = \liminf \int f_n. \]
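The inequality just obtained can be strict. As a worked illustration (an added example, not part of the original proof), take the tent functions of Example 1.2 on X = [0, 2] with Lebesgue measure λ, using the fact stated in the introduction that the Lebesgue integral agrees with the Riemann integral when the latter is defined:

```latex
% Tent functions f_n from Example 1.2, viewed on X = [0,2] with Lebesgue measure.
% Each f_n is non-negative and continuous with integral 1, and f_n -> 0 pointwise,
% so liminf_n f_n = 0 everywhere on [0,2].
\[
\int \liminf_{n} f_n \, d\lambda \;=\; \int 0 \, d\lambda \;=\; 0
\;<\; 1 \;=\; \liminf_{n} \int f_n \, d\lambda .
\]
```

So mass can be lost in the limit, but never gained, which is exactly what the inequality $\int \liminf f_n \le \liminf \int f_n$ allows.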
12 Integrable Functions and Absolute Values
(b):
\[ \Big| \int f \Big| = \Big| \int f^+ - \int f^- \Big| \le \Big| \int f^+ \Big| + \Big| \int f^- \Big| = \int f^+ + \int f^- = \int |f| \]
Case 2: f : X → C.
(a): f is integrable iff $\int f = \int \mathrm{Re} f + i \int \mathrm{Im} f$ is finite iff $\int \mathrm{Re} f$ and $\int \mathrm{Im} f$ are finite iff $\int |\mathrm{Re} f|$ and
$\int |\mathrm{Im} f|$ are finite. The last equivalence comes from Case 1. But |f | ≤ | Re f | + | Im f | ≤ 2|f |, so
that
\[ \int |f| \le \int |\mathrm{Re} f| + \int |\mathrm{Im} f| \le 2 \int |f|. \]
Thus $\int |\mathrm{Re} f|$ and $\int |\mathrm{Im} f|$ are finite iff $\int |f|$ is finite iff |f | is integrable.
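\[ \Big| \int f \Big| = \alpha \int f = \int \alpha f \]
In particular, $\int \alpha f$ is real. Therefore
\[ \Big| \int f \Big| = \mathrm{Re} \int \alpha f = \int \mathrm{Re}(\alpha f) \le \int |\alpha f| = \int |f|. \]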
Proof. Note $\int |f| \le \int |g| < \infty$ and apply the previous theorem.
13 Dominated Convergence Theorem
Theorem 13.1. (Dominated Convergence Theorem) Let (X, A, µ) be a measure space. Let fn :
X → C be a sequence of measurable functions. Let f : X → C be a function. If fn → f pointwise
and there exists an integrable function g : X → [0, ∞] such that |fn | ≤ g for all n, then f is
integrable, fn is integrable for each n,
\[ \lim_n \int |f - f_n| = 0, \]
and
\[ \lim_n \int f_n = \int f. \]
Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary ?? implies f
is measurable. Since fn → f pointwise and |fn | ≤ g for all n, we have |f | ≤ g, and so Corollary 12.2
implies f is integrable. Corollary 12.2 implies fn is integrable for each n. Then f − fn is integrable
for each n. Since |f − fn| ≤ 2g, we have 0 ≤ 2g − |f − fn|. By Fatou’s lemma,
\[
\begin{aligned}
\int 2g &= \int \lim_n \, (2g - |f - f_n|) \\
&= \int \liminf_n \, (2g - |f - f_n|) \\
&\le \liminf_n \int (2g - |f - f_n|) \\
&= \liminf_n \left( \int 2g - \int |f - f_n| \right) \\
&= \int 2g + \liminf_n \left( - \int |f - f_n| \right) \\
&= \int 2g - \limsup_n \int |f - f_n|.
\end{aligned}
\]
Since $\int 2g$ is finite, we can subtract it to obtain $\limsup_n \int |f - f_n| \le 0$. So we have
\[
0 \le \liminf_n \int |f - f_n| \le \limsup_n \int |f - f_n| \le 0.
\]
Thus
\[
\lim_n \int |f - f_n| = \limsup_n \int |f - f_n| = \liminf_n \int |f - f_n| = 0.
\]
Furthermore,
\[
\lim_n \left| \int f_n - \int f \right| = \lim_n \left| \int (f_n - f) \right| \le \lim_n \int |f - f_n| = 0,
\]
whence $\lim_n \int f_n = \int f$.
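To see why the dominating function matters, here is a standard counterexample, added as an illustration. The measure space (Lebesgue measure on (0,1)) and the sequence f_n = n 1_{(0,1/n)} are assumptions chosen for the sketch, not taken from the notes. The sequence converges pointwise to 0, the integrals do not converge to 0, and no integrable g can dominate every f_n.

```latex
% Assumption: X = (0,1) with Lebesgue measure \lambda, and f_n = n \, 1_{(0,1/n)}.
\[
f_n(x) \to 0 \ \text{ for every } x \in (0,1),
\qquad
\int f_n \, d\lambda = n \cdot \tfrac{1}{n} = 1 \ \not\to\ 0 = \int 0 \, d\lambda .
\]
% Any g with |f_n| \le g for all n satisfies g \ge \sup_k f_k, and
% \sup_k f_k(x) = n for x \in [1/(n+1), 1/n). Hence
\[
\int g \, d\lambda \ \ge\ \sum_{n=1}^{\infty} n \left( \frac{1}{n} - \frac{1}{n+1} \right)
= \sum_{n=1}^{\infty} \frac{1}{n+1} = \infty .
\]
```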
14 Interchanging Limits and Derivatives with Integrals
Theorem 14.1. Let (X, A, µ) be a measure space. Let f : X × [a, b] → C. Suppose that f ( · , t) :
X → C is integrable for each t ∈ [a, b]. Define $F(t) = \int_X f(x,t)\,d\mu(x)$ for each t ∈ [a, b].
(a) Suppose there exists an integrable function g : X → [0, ∞] such that |f (x, t)| ≤ g(x) for all
x ∈ X, t ∈ [a, b]. Suppose c ∈ [a, b] and limt→c f (x, t) = f (x, c) for every x ∈ X. Then
\[
\lim_{t \to c} F(t) = \lim_{t \to c} \int_X f(x,t)\,d\mu(x) = \int_X f(x,c)\,d\mu(x) = F(c).
\]
(b) Suppose $\frac{\partial f}{\partial t}(x,t)$ exists for all x ∈ X, t ∈ [a, b]. Suppose there exists an integrable function
g : X → [0, ∞] such that $\left| \frac{\partial f}{\partial t}(x,t) \right| \le g(x)$ for all x ∈ X, t ∈ [a, b]. For any c ∈ [a, b],
\[
F'(c) = \int_X \frac{\partial f}{\partial t}(x,c)\,d\mu(x).
\]
Proof. The idea is to combine the dominated convergence theorem with the following fact from
undergraduate analysis: limx→c h(x) = L iff limn→∞ h(xn) = L for every sequence (xn) with
xn ≠ c and xn → c.
(a): Exercise.
(b): Let (tn) be any sequence of numbers in [a, b] such that tn ≠ c for all n and tn → c. Define
\[
f_n(x) = \frac{f(x,t_n) - f(x,c)}{t_n - c}
\]
for each x ∈ X. Then fn is measurable and $\lim_n f_n(x) = \frac{\partial f}{\partial t}(x,c)$. By the mean value theorem,
\[
|f_n(x)| \le \sup_{t \in [a,b]} \left| \frac{\partial f}{\partial t}(x,t) \right| \le g(x)
\]
exists, and
\[
F'(c) = \int_X \frac{\partial f}{\partial t}(x,c)\,d\mu(x).
\]
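A concrete check of part (b) may help. The sketch below is an added illustration under stated assumptions: X = [0,1] with Lebesgue measure, f(x,t) = e^{tx}, and the constant dominating function g(x) = e^{max(|a|,|b|)}. None of these choices come from the notes. Differentiating under the integral sign agrees with differentiating the closed form of F directly.

```latex
% Assumptions: X = [0,1] with Lebesgue measure, f(x,t) = e^{tx} for t \in [a,b],
% and g(x) = e^{\max(|a|,|b|)}, which is integrable on [0,1].
\[
\left| \frac{\partial f}{\partial t}(x,t) \right| = x e^{tx} \le e^{\max(|a|,|b|)} = g(x)
\quad \text{for } x \in [0,1],\ t \in [a,b].
\]
% Closed form of F for t \ne 0, and its derivative computed two ways:
\[
F(t) = \int_0^1 e^{tx} \, dx = \frac{e^{t} - 1}{t},
\qquad
F'(t) = \frac{t e^{t} - e^{t} + 1}{t^{2}} = \int_0^1 x e^{tx} \, dx .
\]
```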
15 Almost Everywhere
Definition 15.1. Let (X, A, µ) be a measure space. If A ∈ A, µ(Ac ) = 0, and P is a property that
holds for every x ∈ A, then we say that P holds almost everywhere or that P holds for almost
every x ∈ X.
We often abbreviate “almost everywhere” and “almost every” to “a.e.” The concept of a.e. depends
on the measure µ. When clarity demands it, we write µ-a.e. instead of a.e.
For example, if f, g : X → C and there exists A ∈ A such that µ(Ac ) = 0 and f(x)=g(x) for every
x ∈ A, then we say that f = g a.e.
Therefore
\[
\mu(\{x \in X : f(x) > 0\}) \le \sum_{n=1}^{\infty} \mu(E_n) = 0.
\]
Thus f = 0 a.e.
Corollary 15.3. Let (X, A, µ) be a measure space. Let f, g : X → [0, ∞] be measurable functions.
(a) If f ≤ g a.e., then $\int f \le \int g$.
(b) If f = g a.e., then $\int f = \int g$.
Proof. (a): Suppose f ≤ g a.e. Then there exists A ∈ A such that µ(Ac ) = 0 and f (x) ≤ g(x) for
all x ∈ A. So $f 1_{A^c} = 0$ a.e. and $f 1_A \le g$ everywhere. Therefore
\[
\int f = \int (f 1_A + f 1_{A^c}) = \int f 1_A + \int f 1_{A^c} = \int f 1_A \le \int g.
\]
(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.
Proof. (a): Suppose f ≤ g a.e. Then $f^+ \le g^+$ a.e. and $g^- \le f^-$ a.e. Therefore $\int f^+ \le \int g^+$ and $\int g^- \le \int f^-$. Thus
\[
\int f = \int f^+ - \int f^- \le \int g^+ - \int g^- = \int g.
\]
(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.
Theorem 15.6. Let (X, A, µ) be a measure space. If f : X → [0, ∞] is integrable, then µ({x ∈ X : f (x) = ∞}) =
0, hence f is finite a.e.
Proof. Choose a set $A \in \mathcal{A}$ such that $\mu(A^c) = 0$ and such that, for each x ∈ A, we have $\lim_n f_n(x) = f(x)$ and $|f_n(x)| \le g(x)$ for all n. (To see that such a set A exists, argue as follows. Since $f_n \to f$ a.e., there exists $A_0 \in \mathcal{A}$ with $\mu(A_0^c) = 0$ such that $\lim_n f_n(x) = f(x)$ for all $x \in A_0$. For each n, we have $|f_n| \le g$ a.e., so there exists $A_n \in \mathcal{A}$ with $\mu(A_n^c) = 0$ such that $|f_n(x)| \le g(x)$ for all $x \in A_n$. Then $A = \bigcap_{n=0}^{\infty} A_n$ is the desired set.) Then, for each x ∈ A, $|f(x)| = |\lim_n f_n(x)| \le g(x)$. Since $|f| 1_{A^c} = 0$ a.e. and $|f| 1_A \le g$ everywhere, we have
\[
\int |f| = \int (|f| 1_A + |f| 1_{A^c}) = \int |f| 1_A + \int |f| 1_{A^c} = \int |f| 1_A \le \int g < \infty.
\]
Thus f is integrable. Similar arguments show that $f 1_A$, $f_n$, and $f_n 1_A$ are integrable for each n. For each n, $|f - f_n| = |f 1_A - f_n 1_A|$ a.e., and so $\int |f - f_n| = \int |f 1_A - f_n 1_A|$. We also have $f_n 1_A \to f 1_A$ pointwise and $|f_n 1_A| \le g$ for each n. So Theorem 15.2 and the dominated convergence theorem imply
\[
\lim_n \int |f - f_n| = \lim_n \int |f 1_A - f_n 1_A| = 0.
\]
Furthermore,
\[
\lim_n \left| \int f - \int f_n \right| = \lim_n \left| \int (f - f_n) \right| \le \lim_n \int |f - f_n| = 0,
\]
which shows $\lim_n \int f_n = \int f$.
16 Complete Measures
If we compare Theorem 13.1 (dominated convergence theorem) to Theorem 15.8 (almost everywhere
dominated convergence theorem), we notice that f being measurable is part of the conclusion of
Theorem 13.1 but it is part of the hypothesis in Theorem 15.8. This is because a pointwise limit of
measurable functions is measurable, but an a.e. limit of measurable functions may not be measurable.
We explore this in this section.
Definition 16.1. Let (X, A, µ) be a measure space. We say that µ is complete (or that (X, A, µ)
is complete) if the following property holds: If N ∈ A, µ(N ) = 0, and A ⊆ N , then A ∈ A. In
other words, µ is complete if all subsets of measure zero sets are measurable.
Theorem 16.2. Let (X, A, µ) be a measure space. The following are equivalent.
(a) µ is complete.
Proof. (a) ⇒ (b): Assume f is measurable and f = g a.e. Then there exists a set A ∈ A such that
µ(Ac ) = 0 and f (x) = g(x) for all x ∈ A. Let B ∈ B be given. Write
(b) ⇒ (c): Assume fn is measurable for each n and fn → f a.e. Then there exists a set A ∈ A
such that µ(Ac ) = 0 and fn (x) → f (x) for all x ∈ A. Define gn = fn 1A and g = f 1A . Then gn is
measurable for each n and gn → g pointwise. So g is measurable. Moreover, g = f a.e. Then (b)
implies f is measurable.
(c) ⇒ (b): Assume f is measurable and f = g a.e. Define hn = f for all n. Then hn is measurable
for all n and hn → g a.e. Then (c) implies g is measurable.
(b) ⇒ (a): We prove the contrapositive. Assume µ is not complete. So there exist sets A, N ⊆ X
such that $N \in \mathcal{A}$, µ(N) = 0, A ⊆ N, and $A \notin \mathcal{A}$. Define $f = 1_{N^c}$ and $g = 1_{A^c}$. Then f = g a.e., f
is measurable, but g is not measurable.
Remark: The theorem still holds if we replace C by [0, ∞], [−∞, ∞], or R. The proof is easy to
modify.
Remark: In the almost everywhere dominated convergence theorem, if the measure µ is complete,
then f being measurable can be part of the conclusion, rather than part of the hypothesis.
The next theorem says that the Caratheodory restriction theorem produces a complete measure.
Theorem 16.3. Let µ∗ be an outer measure on a set X. Let A∗ be the σ-algebra of µ∗ -measurable
sets and let µ be the measure which is the restriction of µ∗ to A∗ . Then µ is complete.
So A ∈ A∗ .
17 Completion of Measures (Optional)
Definition 17.1. Let (X, A, µ) be a measure space. The completion of A with respect to µ is
the collection $\overline{\mathcal{A}}$ consisting of all sets of the form A ∪ B, where A ∈ A, B ∈ P(X), and there exists
N ∈ A such that B ⊆ N and µ(N) = 0.
(a) $\overline{\mathcal{A}}$ is the smallest σ-algebra containing all sets E such that E ⊆ N for some N ∈ A with µ(N) = 0.
Theorem 17.3. Let (X, A, µ) be a measure space. There is a unique extension of µ to $\overline{\mathcal{A}}$. This
extension is called the completion of µ and is denoted by $\overline{\mu}$. The measure space $(X, \overline{\mathcal{A}}, \overline{\mu})$ is called
the completion of (X, A, µ).
Theorem 17.4. The Lebesgue σ-algebra on R, L(R), is the completion of the Borel σ-algebra on
R, B(R).
Theorem 17.5. The Lebesgue measure on (R, L(R)) is the completion of the Lebesgue measure
on (R, B(R)).
Theorem 17.6. Let µ be a measure defined on a σ-algebra A. If µ is σ-finite, then the measure
obtained by doing a Caratheodory extension of µ is the completion of µ.
Theorem 17.7. Let (X, A, µ) be a measure space and let $(X, \overline{\mathcal{A}}, \overline{\mu})$ be its completion.
(b): Prove it first for f simple. For general f , consider a sequence of simple functions converging
pointwise to f .
Remark: The theorem still holds if we replace C by [0, ∞], [−∞, ∞], or R. The proof is easy to
modify.
18 The Lebesgue Integral Extends the Riemann Integral
In this section, X is the interval [a, b], the σ-algebra is the collection of Lebesgue measurable subsets
of [a, b], and the measure is the Lebesgue measure on R restricted to this σ-algebra. It is easy to
check that this defines a complete measure space.
Theorem 18.1. Suppose f : [a, b] → R is bounded on [a, b]. If f is Riemann integrable on [a, b],
then f is Lebesgue measurable, f is Lebesgue integrable, and the Riemann and Lebesgue integrals
of f on [a, b] are equal.
Proof 1. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Assume f is Riemann
integrable on [a, b]. Using Riemann’s condition, choose partitions P1 ⊆ . . . ⊆ Pn ⊆ Pn+1 ⊆ . . . such
that
\[
U(f, P_n, [a,b]) - L(f, P_n, [a,b]) < \frac{1}{n}. \tag{18.1}
\]
For each n, write $P_n = \{x_0, \dots, x_k\}$, and define
\[
s_n = \sum_{i=1}^{k} \Big( \inf_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]},
\qquad
t_n = \sum_{i=1}^{k} \Big( \sup_{[x_{i-1},x_i]} f \Big) 1_{[x_{i-1},x_i]}
\]
so that
\[
\int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b]).
\]
Note
\[
s_1 \le \dots \le s_n \le s_{n+1} \le \dots \le f \le \dots \le t_{n+1} \le t_n \le \dots \le t_1.
\]
Then
\[
s_1 \le s \le f \le t \le t_1
\]
where we have defined $s = \lim_n s_n$ and $t = \lim_n t_n$. It follows that s and t are finite everywhere
and Lebesgue integrable. Since $0 \le s_n - s_1 \uparrow s - s_1$, the monotone convergence theorem implies
\[
\lim_n \int (s_n - s_1) = \int (s - s_1),
\]
and adding $\int s_1$ gives
\[
\lim_n \int s_n = \int s.
\]
Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we
have s ≤ f ≤ t, and so s = f = t a.e. Since s and t are measurable and the Lebesgue measure is
complete, f is measurable. Moreover,
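\[
\int s = \int f = \int t.
\]
For each n,
\[
\overline{\int} f \le \int t_n \le \int s_n + \frac{1}{n} \le \underline{\int} f + \frac{1}{n}.
\]
Taking n → ∞ gives
\[
\overline{\int} f \le \int t = \int s \le \underline{\int} f,
\]
hence
\[
\underline{\int} f = \overline{\int} f = \int t = \int s = \int f.
\]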
So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.
Here is a slightly different proof.
Proof 2. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Choose a sequence $(P_n')$
so that
\[
\int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b])
\]
and
\[
\lim_n \int s_n = \underline{\int} f, \qquad \lim_n \int t_n = \overline{\int} f.
\]
Note
\[
s_1 \le \dots \le s_n \le s_{n+1} \le \dots \le f \le \dots \le t_{n+1} \le t_n \le \dots \le t_1.
\]
Then
\[
s_1 \le s \le f \le t \le t_1
\]
where we have defined $s = \lim_n s_n$ and $t = \lim_n t_n$. It follows that s and t are finite everywhere
and Lebesgue integrable. Since $0 \le s_n - s_1 \uparrow s - s_1$, the monotone convergence theorem implies
\[
\lim_n \int (s_n - s_1) = \int (s - s_1),
\]
and adding $\int s_1$ gives
\[
\int s = \lim_n \int s_n = \underline{\int} f.
\]
Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we have
s ≤ f ≤ t. Therefore s = f = t a.e. Since s and t are measurable and the Lebesgue measure is
complete, f is measurable. Moreover,
\[
\underline{\int} f = \int s = \int t = \overline{\int} f = \int f.
\]
So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.
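The converse of Theorem 18.1 fails. As an added illustration (a standard example, not taken from the notes), the indicator function of the rationals in [0, 1] is Lebesgue integrable but not Riemann integrable. The computation assumes two facts: Q is countable, so λ(Q ∩ [0, 1]) = 0, and every interval of positive length contains both rational and irrational points.

```latex
% Assumption: f = 1_{\mathbb{Q} \cap [0,1]} on [0,1].
\[
\int f \, d\lambda = \lambda(\mathbb{Q} \cap [0,1]) = 0 .
\]
% For every partition P of [0,1], each subinterval has \sup f = 1 and \inf f = 0, so
\[
U(f, P, [0,1]) = 1, \qquad L(f, P, [0,1]) = 0,
\qquad\text{hence}\qquad
\underline{\int} f = 0 < 1 = \overline{\int} f .
\]
```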
19 Lebesgue’s Condition for Riemann Integrability
In this section, X is the interval [a, b], the σ-algebra is the collection of Lebesgue measurable subsets
of [a, b], and the measure is the Lebesgue measure on R restricted to this σ-algebra. It is easy to
check that this defines a complete measure space.
Theorem 19.1. (Lebesgue’s Condition for Riemann Integrability) Let f : [a, b] → R be a bounded
function. Then f is Riemann integrable on [a, b] iff f is continuous at almost every point of [a, b].
The idea of the proof is as follows. According to Riemann’s condition, a bounded function f :
[a, b] → R will be Riemann integrable if and only if the sum
\[
\sum_{i=1}^{n} \Big( \big( \sup_{[x_{i-1},x_i]} f \big) - \big( \inf_{[x_{i-1},x_i]} f \big) \Big) \Delta x_i
\]
can be made arbitrarily small by choosing an appropriate partition of [a, b]. Split this sum into two
parts, say S1 + S2, where S2 contains terms from subintervals where continuity makes the difference
$(\sup_{[x_{i-1},x_i]} f) - (\inf_{[x_{i-1},x_i]} f)$ small and S1 contains the remaining terms where discontinuities
prevent the difference from being small. In S2, each term is small, so a large number of terms can
occur and still keep S2 small. In S1, the terms may not be small, but they are bounded in size
(because f is bounded), so that S1 will be small if the sum of the lengths of the subintervals is
small. Hence we may expect that the set of discontinuities of a Riemann integrable function can
be covered by intervals whose total length is small.
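Two standard functions illustrate both directions of Theorem 19.1. They are added here as examples and are not taken from the notes. The notation D_T and D_f for the discontinuity sets is introduced only for this sketch.

```latex
% Assumption: Thomae's function on [0,1]:
% T(x) = 1/q if x = p/q in lowest terms with q \ge 1, and T(x) = 0 if x is irrational.
% T is continuous at every irrational point and discontinuous at every rational point.
\[
D_T = \mathbb{Q} \cap [0,1], \qquad \lambda(D_T) = 0
\quad\Longrightarrow\quad T \text{ is Riemann integrable on } [0,1].
\]
% Assumption: Dirichlet's function f = 1_{\mathbb{Q} \cap [0,1]}, discontinuous at every point.
\[
D_f = [0,1], \qquad \lambda(D_f) = 1 > 0
\quad\Longrightarrow\quad f \text{ is not Riemann integrable on } [0,1].
\]
```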
Definition 19.2. Let f : [a, b] → R be a bounded function. For any interval I ⊆ [a, b], the number
(a) If I and J are intervals such that I ⊆ J ⊆ [a, b], then Ωf (I) ≤ Ωf (J).
(b) For every x ∈ [a, b], f is continuous at x iff for every ε > 0 there exists a δ > 0 such that
Ωf ([a, b] ∩ (x − δ, x + δ)) < ε.
(c) $D = \bigcup_{k=1}^{\infty} D_k$, where D is the set of points x ∈ [a, b] at which f is discontinuous, and
(e) If E is a compact subset of [a, b] \ Dk, then there is a δ > 0 such that Ωf (J) < 1/k for every
interval J ⊆ E with ℓ(J) < δ.
Proof. The proofs of (a)-(c) are easy exercises.
(d): To see that Dk is closed, let (xn) be any sequence of points in Dk that converges to a
point x0 ∈ R. Since [a, b] is closed and xn ∈ [a, b] for all n, we have x0 ∈ [a, b]. Let δ > 0
be given. Choose n large enough that xn ∈ (x0 − δ, x0 + δ). Choose δ′ > 0 small enough that
(xn − δ′, xn + δ′) ⊆ (x0 − δ, x0 + δ). Since xn ∈ Dk, we have
\[
\Omega_f([a,b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}.
\]
Since [a, b] ∩ (xn − δ′, xn + δ′) ⊆ [a, b] ∩ (x0 − δ, x0 + δ), we have
\[
\Omega_f([a,b] \cap (x_0 - \delta, x_0 + \delta)) \ge \Omega_f([a,b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}.
\]
Thus x0 ∈ Dk. Hence Dk is closed.
(e): For each x ∈ E, we have x ∈ [a, b] \ Dk, so there exists a δ(x) > 0 such that
\[
\Omega_f([a,b] \cap (x - \delta(x), x + \delta(x))) < \frac{1}{k}.
\]
Then $E \subseteq \bigcup_{x \in E} (x - \delta(x)/2, x + \delta(x)/2)$. Since E is compact, there exist finitely many points
x1, . . . , xm ∈ E such that $E \subseteq \bigcup_{i=1}^{m} (x_i - \delta(x_i)/2, x_i + \delta(x_i)/2)$. Define $\delta = \tfrac{1}{2} \min\{\delta(x_1), \dots, \delta(x_m)\}$.
Let J be any interval contained in E with ℓ(J) < δ. Then J intersects $(x_i - \delta(x_i)/2, x_i + \delta(x_i)/2)$
for some i. Since ℓ(J) < δ(xi)/2, we have J ⊆ (xi − δ(xi), xi + δ(xi)), and so
\[
\Omega_f([a,b] \cap J) \le \Omega_f([a,b] \cap (x_i - \delta(x_i), x_i + \delta(x_i))) < \frac{1}{k}.
\]
Proof of Theorem 19.1. We will use the same notation as the lemma.
We first assume λ(D) = 0 and show that Riemann’s condition is satisfied. Let k ∈ N be arbitrary.
Since $D_k \subseteq D$, we have $\lambda(D_k) = 0$. Choose finite open intervals $I_1, I_2, \dots$ which cover $D_k$ and
satisfy $\sum_{i=1}^{\infty} \ell(I_i) < \frac{1}{k}$. Since $D_k$ is closed and bounded, $D_k$ is compact. So $D_k$ is covered by
finitely many intervals $I_1, \dots, I_N$. Define $E = [a,b] \setminus \bigcup_{i=1}^{N} I_i$. So E is compact and $E \subseteq [a,b] \setminus D_k$.
Thus there is a δ > 0 such that Ωf (J) < 1/k for every interval J ⊆ E with ℓ(J) < δ. Note E is the
union of finitely many closed subintervals of [a, b]. Further divide E into a finite number of closed
subintervals of length < δ. The endpoints of these subintervals form a partition P = {x0, . . . , xn}
of [a, b]. Then
\[
U(f, P, [a,b]) - L(f, P, [a,b]) = \sum_{i \in \mathcal{I}_1} \Omega_f([x_{i-1},x_i]) \Delta x_i + \sum_{i \in \mathcal{I}_2} \Omega_f([x_{i-1},x_i]) \Delta x_i
\]
where $\mathcal{I}_1$ is the set of those i ∈ {1, . . . , n} such that $[x_{i-1}, x_i]$ contains points of $D_k$ and $\mathcal{I}_2$ is the
set of remaining i ∈ {1, . . . , n}. For each $i \in \mathcal{I}_2$, $[x_{i-1}, x_i]$ is an interval contained in E with length
< δ, and so $\Omega_f([x_{i-1},x_i]) < 1/k$. Thus
\[
\sum_{i \in \mathcal{I}_2} \Omega_f([x_{i-1},x_i]) \Delta x_i \le \frac{1}{k} \sum_{i \in \mathcal{I}_2} \Delta x_i \le \frac{b-a}{k}.
\]
Since $\bigcup_{i \in \mathcal{I}_1} [x_{i-1},x_i] \subseteq \bigcup_{i=1}^{N} I_i$, we have
\[
\sum_{i \in \mathcal{I}_1} \Delta x_i \le \sum_{i=1}^{N} \ell(I_i) < \frac{1}{k},
\]
and so
\[
\sum_{i \in \mathcal{I}_1} \Omega_f([x_{i-1},x_i]) \Delta x_i < \frac{\Omega_f([a,b])}{k}.
\]
Therefore
\[
U(f, P, [a,b]) - L(f, P, [a,b]) < \frac{\Omega_f([a,b]) + b - a}{k}.
\]
Since this holds for every k ∈ N, Riemann’s condition holds, and so f is Riemann integrable on
[a, b].
For the converse, we assume λ(D) > 0 and show that Riemann’s condition is not satisfied. We
must have λ(Dk) > 0 for some k ∈ N. Set ε = λ(Dk)/k. For any partition P = {x0, . . . , xn} of
[a, b], we have
\[
U(f, P, [a,b]) - L(f, P, [a,b]) = \sum_{i=1}^{n} \Omega_f([x_{i-1},x_i]) \Delta x_i \ge \sum_{i \in I} \Omega_f([x_{i-1},x_i]) \Delta x_i
\]
where I is the set of those i ∈ {1, . . . , n} such that $(x_{i-1}, x_i)$ contains points of $D_k$. Then $D_k \setminus P \subseteq \bigcup_{i \in I} (x_{i-1}, x_i)$, and so
\[
\sum_{i \in I} \Delta x_i = \sum_{i \in I} \lambda((x_{i-1}, x_i)) \ge \lambda(D_k \setminus P) = \lambda(D_k) = k\varepsilon.
\]
For each i ∈ I, there is a point $x \in (x_{i-1}, x_i) \cap D_k$. Choose δ > 0 such that $(x - \delta, x + \delta) \subseteq [x_{i-1}, x_i]$.
Then, since x ∈ Dk, we have
\[
\Omega_f([x_{i-1},x_i]) \ge \Omega_f([a,b] \cap (x - \delta, x + \delta)) \ge \frac{1}{k}.
\]
Combining the above inequalities, we get
\[
U(f, P, [a,b]) - L(f, P, [a,b]) \ge \frac{1}{k} \sum_{i \in I} \Delta x_i \ge \varepsilon
\]
for every partition P. Thus Riemann’s condition is not satisfied, and so f is not Riemann integrable on [a, b].
20 Normed Spaces and Banach Spaces
In this section, K denotes either R or C, and V is a vector space with scalar field K. The proofs
of the theorems are left as exercises; they are simple extensions of the proofs of the analogous
theorems in R.
Definition 20.1. A norm on V is a function ‖ · ‖ : V → [0, ∞) such that the following conditions
are satisfied for all x, y ∈ V and c ∈ K:
The pair (V, ‖ · ‖) is called a normed space. When the norm is clear from context, we write V
instead of (V, ‖ · ‖).
Example 20.2. For 1 ≤ p < ∞, define
\[
\|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p}.
\]
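For a concrete feel for these norms, here is a short numerical instance. The vector x = (3, −4) in R² is an assumption chosen for illustration and does not appear in the notes.

```latex
% Assumption: V = \mathbb{R}^2 and x = (3, -4).
\[
\|x\|_1 = |3| + |-4| = 7, \qquad
\|x\|_2 = (9 + 16)^{1/2} = 5, \qquad
\|x\|_3 = (27 + 64)^{1/3} = 91^{1/3} \approx 4.50 .
\]
```

In this instance the value decreases as p increases.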
(c) $\lim_n \|x_n\| = \| \lim_n x_n \|$
Definition 20.8. Let (V, ‖ · ‖) be a normed space. A sequence (xn) in V is called Cauchy if for
every ε > 0, there exists a positive integer N such that for all integers
m, n ≥ N we have ‖xm − xn‖ < ε.
Definition 20.10. Let (V, ‖ · ‖) be a normed space. A set E ⊆ V is called complete if every
Cauchy sequence in E converges and its limit is in E. If V is complete, V is called a Banach
space.
Theorem 20.12. Let (V, ‖ · ‖) be a normed space. If (xn) is a Cauchy sequence in V and (xn) has
a subsequence which converges to x ∈ V, then (xn) converges to x.
21 Lp Spaces: Definitions and Basic Properties
Remark. Let (X, A, µ) be a measure space. Let f : X → K. The ∞-norm of f equals the essential
supremum of |f|:
\[
|f| \le \|f\|_u, \qquad \|f\|_\infty \le \|f\|_u, \qquad |f| \le \|f\|_\infty \ \text{a.e.}
\]
If (X, A, µ) = (R, L, λ), the ∞-norm and the uniform norm are equal for continuous functions
f : R → K, but they are not equal in general. For example, the function $f(x) = x 1_{\mathbb{Q} \cap [0,1]}(x)$ has
$\|f\|_u = 1$ and $\|f\|_\infty = 0$.
Theorem 21.2. Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For every measurable function
f : X → K, we have $\|f\|_p = 0$ iff f = 0 a.e. iff f = 0 in $L^p$.
Proof. The second equivalence is simply the convention that functions which are equal a.e. are
equal as elements of $L^p$. Now we prove the first equivalence. For 1 ≤ p < ∞, $\|f\|_p = 0$ iff $\int |f|^p = 0$
iff |f| = 0 a.e. iff f = 0 a.e. Assume p = ∞. If f = 0 a.e., then |f| ≤ 0 a.e., so $\|f\|_\infty \le 0$, hence
$\|f\|_\infty = 0$. Conversely, if $\|f\|_\infty = 0$, then $|f| \le \|f\|_\infty = 0$ a.e., then |f| = 0 a.e., then f = 0 a.e.
Proof. If 1 ≤ p < ∞, then
\[
\|cf\|_p^p = \int |cf|^p = |c|^p \int |f|^p = |c|^p \|f\|_p^p.
\]
If p = ∞, then
= |c| ‖f‖∞
The third equality is a simple exercise in showing that two sets are equal.
The next theorem will lead to the triangle inequality for Lp spaces. But it is also extremely
important in its own right.
Theorem 21.4. (Hölder’s Inequality) Let (X, A, µ) be a measure space. Let 1 ≤ p, q ≤ ∞ be such that
$p^{-1} + q^{-1} = 1$, with the convention that $\infty^{-1} = 0$. (We say p and q are conjugate exponents; we
say q is the conjugate exponent of p, and vice versa.) If f, g : X → K are measurable, then
\[
\|fg\|_1 \le \|f\|_p \|g\|_q.
\]
\[
ab \le p^{-1} a^p + q^{-1} b^q.
\]
Proof. For fixed b, p, q, we consider $h(a) = p^{-1} a^p + q^{-1} b^q - ab$, a ∈ [0, ∞). Then $h'(a) = a^{p-1} - b$.
So $h'(a) > 0$ if $a > b^{1/(p-1)}$ and $h'(a) < 0$ if $a < b^{1/(p-1)}$. Thus the minimum value of h(a) is
where for the second equality we used that q = p/(p − 1). Therefore $0 \le h(a) = p^{-1} a^p + q^{-1} b^q - ab$,
hence $ab \le p^{-1} a^p + q^{-1} b^q$. Note that equality happens iff $a = b^{1/(p-1)}$, which is equivalent to
$a^p = b^q$ (because q = p/(p − 1)).
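Two quick checks of the inequality $ab \le p^{-1} a^p + q^{-1} b^q$ may be useful. They are added as illustrations; the exponents and the numerical values of a and b are assumptions chosen for the sketch.

```latex
% Assumption: p = q = 2 and a, b \ge 0. The inequality reduces to a completed square:
\[
ab \le \tfrac{1}{2} a^2 + \tfrac{1}{2} b^2
\iff 0 \le \tfrac{1}{2} (a - b)^2 ,
\]
% with equality iff a = b, which matches the condition a^p = b^q.
% Assumed numerical instance: a = 2, b = 3, p = 3, q = 3/2 (so 1/p + 1/q = 1).
\[
ab = 6 \ \le\ \tfrac{1}{3} \cdot 2^{3} + \tfrac{2}{3} \cdot 3^{3/2} = \tfrac{8}{3} + 2\sqrt{3} \approx 6.13 .
\]
```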
Proof of Hölder’s Inequality. If 1 < p, q < ∞ and $0 < \|f\|_p, \|g\|_q < \infty$, set $a = |f(x)| / \|f\|_p$,
$b = |g(x)| / \|g\|_q$, apply the lemma, then integrate. The result is
\[
\frac{1}{\|f\|_p \|g\|_q} \int |fg| \le \frac{1}{p \|f\|_p^p} \int |f(x)|^p + \frac{1}{q \|g\|_q^q} \int |g(x)|^q = \frac{1}{p} + \frac{1}{q} = 1.
\]
The other cases are easy. If $\|f\|_p = 0$, then f = 0 a.e., then fg = 0 a.e., then $\|fg\|_1 = 0$, so the
inequality is trivial; likewise if $\|g\|_q = 0$. If $\|f\|_p = \infty$ and $\|g\|_q > 0$, the inequality is clear; likewise
if $\|f\|_p > 0$ and $\|g\|_q = \infty$. If p = ∞, then $|f| \le \|f\|_\infty$ a.e., hence $|fg| \le \|f\|_\infty |g|$ a.e.; likewise if
q = ∞.
The next theorem is the triangle inequality for Lp spaces.
Theorem 21.6. (Minkowski’s Inequality) Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. Let
f, g : X → K be measurable functions. Then
\[
\|f + g\|_p \le \|f\|_p + \|g\|_p.
\]
Consequently, if f, g ∈ $L^p$, then f + g ∈ $L^p$.
hence
\[
\|f + g\|_p^p \le (\|f\|_p + \|g\|_p) \|f + g\|_p^{p-1}.
\]
22 Lp Spaces: Completeness
Case: 1 ≤ p < ∞. Let (fn) be a Cauchy sequence in $L^p$. Inductively choose positive integers $n_j$
(j = 1, 2, . . .) such that
\[
\|f_m - f_n\|_p < 2^{-j} \quad \text{for all } m, n \ge n_j
\]
and $n_j < n_{j+1}$. Then $(f_{n_j})_{j=1}^{\infty}$ is a subsequence of (fn). By Theorem 20.12, to show that (fn)
converges in $L^p$, it will suffice to show that $(f_{n_j})$ converges in $L^p$. Note
\[
f_{n_k} = f_{n_1} + \sum_{j=1}^{k-1} (f_{n_{j+1}} - f_{n_j}). \tag{22.1}
\]
Define
\[
G_k = |f_{n_1}| + \sum_{j=1}^{k} |f_{n_{j+1}} - f_{n_j}|, \qquad G = |f_{n_1}| + \sum_{j=1}^{\infty} |f_{n_{j+1}} - f_{n_j}|.
\]
Then
\[
\Big( \int G_k^p \Big)^{1/p} = \|G_k\|_p \le \|f_{n_1}\|_p + \sum_{j=1}^{k} 2^{-j} < \|f_{n_1}\|_p + 1
\]
for each k. Note $G_k^p \uparrow G^p$. Then the monotone convergence theorem gives
\[
\int G^p = \lim_k \int G_k^p \le (\|f_{n_1}\|_p + 1)^p < \infty.
\]
So $G^p$ is integrable. It follows that $G^p$ is finite a.e., which implies G is finite a.e. The latter means
that, for a.e. x ∈ X, the series
\[
f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x))
\]
of complex numbers is absolutely convergent and (hence) convergent. For those x in the set of
measure zero where the series diverges, define F(x) = 0. For those x at which the series converges,
define
\[
F(x) = f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x)).
\]
So, by (22.1),
\[
f_{n_k} \to F \ \text{a.e.}
\]
We will show that $f_{n_k} \to F$ in $L^p$. This will complete the proof. We have |F| ≤ G, and so
$\int |F|^p \le \int G^p < \infty$. Thus $F \in L^p$. We have
\[
|f_{n_k} - F|^p \to 0 \ \text{a.e.}
\]
We also have
\[
|f_{n_k} - F|^p \le (|G_k| + |F|)^p \le (2G)^p.
\]
Since $(2G)^p$ is integrable, the dominated convergence theorem gives
\[
\lim_k \|f_{n_k} - F\|_p^p = \lim_k \int |f_{n_k} - F|^p = 0.
\]
Thus $f_{n_k} \to F$ in $L^p$.
23 Lp Spaces: Dense Subspaces (Optional)
Definition 23.1. Let (V, ‖ · ‖) be a normed space. A set E ⊆ V is called dense if for each x in V
there exists a sequence (xn) in E such that xn → x.
Theorem 23.2. Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For each $f \in L^p$ and each
ε > 0, there exists a simple function $s \in L^p$ such that $\|f - s\|_p < \varepsilon$. In other words, the set of
simple functions in $L^p$ is a dense subset of $L^p$.
Proof. If the theorem holds for real-valued f , then it holds for complex-valued f by considering
real and imaginary parts. So we assume f is real-valued.
Case 1: 1 ≤ p < ∞. Let $f \in L^p$ and ε > 0 be given. By Theorem 6.3, we can choose a sequence
of simple measurable functions sn such that |sn| ≤ |f| for each n and sn → f pointwise. Then
$|s_n - f|^p \to 0$ pointwise and $|s_n - f|^p \le (|s_n| + |f|)^p \le (2|f|)^p \in L^1$. By the dominated convergence
theorem,
\[
\lim_n \|s_n - f\|_p^p = \lim_n \int |s_n - f|^p = \int 0 = 0,
\]
and so $\|s_n - f\|_p \to 0$. Therefore there exists N ∈ N such that $\|s_N - f\|_p < \varepsilon$. Take $s = s_N$. Note
$s \in L^p$ because $\|s\|_p \le \|s - f\|_p + \|f\|_p < \varepsilon + \|f\|_p < \infty$.
Case 2: p = ∞. Let $f \in L^\infty$ and ε > 0 be given. Since $|f| \le \|f\|_\infty < \infty$ a.e., there exists
$A \in \mathcal{A}$ such that $\mu(A^c) = 0$ and $|f(x)| \le \|f\|_\infty$ for all x ∈ A. So $f 1_A$ is bounded by the number
$\|f\|_\infty$. By Theorem 6.3, we can choose a sequence of simple measurable functions sn such that
$s_n \to f 1_A$ uniformly. Thus $\|s_n - f 1_A\|_u \to 0$. Since $\|s_n - f 1_A\|_\infty \le \|s_n - f 1_A\|_u$, we also have
$\|s_n - f 1_A\|_\infty \to 0$. Finally, since $s_n - f 1_A = s_n - f$ a.e., we have $\|s_n - f\|_\infty = \|s_n - f 1_A\|_\infty \to 0$.
Therefore there exists N ∈ N such that $\|s_N - f\|_\infty < \varepsilon$. Take $s = s_N$. Note
$s \in L^\infty$ because $\|s\|_\infty \le \|s - f\|_\infty + \|f\|_\infty < \varepsilon + \|f\|_\infty < \infty$.
The next lemma tells us what the simple functions in Lp spaces look like.
(b) If s is a simple function and 1 ≤ p < ∞, then s ∈ Lp iff s is measurable and s = 0 outside a
set of finite measure.
Suppose $s \in L^p$. Then s is measurable and $\|s\|_p^p = \int |s|^p = \sum_{i=1}^{n} |c_i|^p \mu(E_i) < \infty$. Let E be the
union of those sets $E_i$ with $c_i \ne 0$. Then µ(E) < ∞ and $E^c$ is the union of those sets $E_i$ with
$c_i = 0$. So for all $x \in E^c$ we have s(x) = 0. (Note that if $E^c = \emptyset$, then it is vacuously true that
s(x) = 0 for all $x \in E^c$.)
Conversely, suppose s is measurable and s = 0 outside a set of finite measure. So there exists a set
A ∈ A such that µ(A) < ∞ and s(x) = 0 for all x ∈ Ac . Then
\[
\|s\|_p^p = \int |s|^p 1_A + \int |s|^p 1_{A^c} = \int |s|^p 1_A = \int \sum_{i=1}^{n} |c_i|^p 1_{E_i \cap A} = \sum_{i=1}^{n} |c_i|^p \mu(E_i \cap A).
\]
Since $\mu(A \cap E_i) \le \mu(A) < \infty$ for each i, it follows that $\|s\|_p^p < \infty$, hence $s \in L^p$.
Now we give some approximation theorems for the specific case of Lebesgue measure on (R, L).
Definition 23.4. A function h : R → C is called a step function if
\[
h = \sum_{k=1}^{n} c_k 1_{I_k}
\]
Proof. Let $f \in L^p$ and let ε > 0. By the previous theorem, there exists a simple function $s \in L^p$ such
that $\|f - s\|_p < \varepsilon/2$. So it will suffice to find a step function $h \in L^p$ such that $\|s - h\|_p < \varepsilon/2$. Let
c1, . . . , cn be the distinct, non-zero numbers in the range of s. Let $E_i = s^{-1}(c_i)$ for each i. Then
\[
s = \sum_{i=1}^{n} c_i 1_{E_i}.
\]
Fix i ∈ {1, . . . , n}. Choose δ > 0 such that $2(2\delta)^{1/p} < \varepsilon/(2|c_i|n)$. Since $s \in L^p$, 1 ≤ p < ∞, and
$c_i \ne 0$, the previous lemma implies $\lambda(E_i) < \infty$. By the definition of Lebesgue measure, there exists
a sequence of intervals $(I_j)$ such that $E_i \subseteq \bigcup_{j=1}^{\infty} I_j$ and
\[
\lambda(E_i) \le \sum_{j=1}^{\infty} \ell(I_j) \le \lambda(E_i) + \delta.
\]
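Define $h_i$ to be the indicator function of $\bigcup_{j=1}^{N} I_j$. Then $h_i$ is a step function (verify). Since each
$I_j$ is a finite interval, $h_i \in L^p$. Define $g_i$ to be the indicator function of $\bigcup_{j=1}^{\infty} I_j$. Note $h_i \le g_i$ and
$1_{E_i} \le g_i$. We have
\[
\begin{aligned}
2^{-p} \|h_i - 1_{E_i}\|_p^p &\le \|g_i - h_i\|_p^p + \|g_i - 1_{E_i}\|_p^p
= \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus \bigcup_{j=1}^{N} I_j \Big) + \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus E_i \Big) \\
&= \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda\Big( \bigcup_{j=1}^{N} I_j \Big) + \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda(E_i) \\
&\le 2\delta.
\end{aligned}
\]
Since δ is arbitrary, we can choose it such that $2(2\delta)^{1/p} < \varepsilon/(2|c_i|n)$, and so
\[
\|1_{E_i} - h_i\|_p < \frac{\varepsilon}{2|c_i|n}.
\]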
Lemma 23.6. Let (X, A, µ) = (R, L, λ). Let 1 ≤ p < ∞. A step function h : R → C belongs to
$L^p$ iff h = 0 outside a finite interval.
Theorem 23.7. Let (X, A, µ) = (R, L, λ). Let 1 ≤ p < ∞. For each $f \in L^p$ and each ε > 0,
there exists a continuous function g such that g = 0 outside a finite interval and $\|f - g\|_p < \varepsilon$.
Consequently, the set of continuous functions g such that g = 0 outside a finite interval is a dense
subset of $L^p$.
Proof. Let $f \in L^p$ and ε > 0 be given. By the previous theorem, there exists a step function $h \in L^p$
such that $\|f - h\|_p < \varepsilon/2$. So it will suffice to find a continuous function g such that $\|h - g\|_p < \varepsilon/2$
and g = 0 outside a finite interval. We have
\[
h = \sum_{k=1}^{n} c_k 1_{I_k}
\]
for some numbers c1, . . . , cn ∈ C and intervals I1, . . . , In. The sum does not change if we delete
terms with $c_k = 0$ or $I_k = \emptyset$. So we assume that $c_k \ne 0$ and $I_k \ne \emptyset$ for all k. The previous lemma
implies $I_k$ is finite for all k. For each k ∈ {1, . . . , n}, we will define a continuous function $g_k$ such
that
\[
\|1_{I_k} - g_k\|_p < \frac{\varepsilon}{2|c_k|n}
\]
and $g_k = 0$ outside $I_k$. Then the function defined by
\[
g = \sum_{k=1}^{n} c_k g_k
\]
will be a continuous function such that $\|h - g\|_p \le \varepsilon/2$ and g = 0 outside a finite interval containing
$\bigcup_{k=1}^{n} I_k$. This will also give the density statement of the theorem, as the set of continuous
functions g such that g = 0 outside a finite interval is easily checked to be a subset of $L^p$.
Let k ∈ {1, . . . , n} be given, and let a and b denote the endpoints of $I_k$. Choose δ > 0 so that $(2\delta)^{1/p} < \varepsilon/(2|c_k|n)$. Define $g_k$ to be the continuous
function that equals 0 on (−∞, a] and [b, ∞), equals 1 on [a + δ, b − δ], and is linear on [a, a + δ]
and [b − δ, b] (draw a picture). Set A = [a, a + δ] ∪ [b − δ, b]. Note λ(A) ≤ 2δ and $0 \le 1_{I_k} - g_k \le 1_A$.
Then
\[
\|1_{I_k} - g_k\|_p \le \|1_A\|_p = \lambda(A)^{1/p} \le (2\delta)^{1/p} < \frac{\varepsilon}{2|c_k|n}.
\]
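The bound above can be made exact in the simplest situation. The following computation is an added sketch under explicit assumptions: the interval has endpoints a < b, the parameter satisfies 0 < δ ≤ (b − a)/2, and g_k is the piecewise-linear function just described. The substitution u = (x − a)/δ is used on the left ramp, and the right ramp is symmetric.

```latex
% Assumptions: I_k has endpoints a < b, 0 < \delta \le (b-a)/2, and g_k is the
% piecewise-linear function described above (endpoints have measure zero).
\[
\|1_{I_k} - g_k\|_p^p
= \int_a^{a+\delta} \Big( 1 - \frac{x-a}{\delta} \Big)^p dx
+ \int_{b-\delta}^{b} \Big( 1 - \frac{b-x}{\delta} \Big)^p dx
= 2 \delta \int_0^1 (1-u)^p \, du
= \frac{2\delta}{p+1} \le 2\delta .
\]
```

So the estimate $\|1_{I_k} - g_k\|_p \le (2\delta)^{1/p}$ used in the proof is not sharp, but it is all the argument needs.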
24 Counting Measure and ℓp Spaces
Theorem 24.1. Let X be a nonempty set. Let f : X → [0, ∞]. Let µ be counting measure on
(X, P(X)). (Recall that counting measure is defined as follows: µ(A) is the number of elements in
A if A is a finite subset of X, and µ(A) = ∞ if A is an infinite subset of X).
(a) f is measurable
(b) $\int f \, d\mu = \sup \big\{ \sum_{x \in F} f(x) : F \subseteq X, \ F \text{ is finite} \big\}$.
(d) If X = N, then $\int f \, d\mu = \sum_{k=1}^{\infty} f(k)$.
Proof. Exercise.
Definition 24.2. If X is any nonempty set and µ is counting measure on (X, P(X)), we
define $\ell^p(X) = L^p(X, \mathcal{P}(X), \mu)$.
Corollary 24.3. If 1 ≤ p ≤ ∞, X = {1, . . . , n}, and µ is the counting measure on (X, P(X)), then
$\ell^p(X) = \mathbb{C}^n$ and $(\ell^p(X), \| \cdot \|_p) = (\mathbb{C}^n, \| \cdot \|_p)$.
Corollary 24.4. If 1 ≤ p ≤ ∞, then $\ell^p(\mathbb{N})$ is the space of all sequences $(a_n)_{n \in \mathbb{N}}$ in C such that
$\|(a_n)\|_p < \infty$, where
\[
\|(a_n)\|_p =
\begin{cases}
\big( \sum_{i=1}^{\infty} |a_i|^p \big)^{1/p} & \text{if } 1 \le p < \infty, \\
\sup_{i \in \mathbb{N}} |a_i| & \text{if } p = \infty.
\end{cases}
\]
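As an added illustration of how membership in $\ell^p(\mathbb{N})$ depends on p, consider the harmonic sequence. The choice a_n = 1/n is an assumption made for this sketch; the value of the p = 2 sum is the classical Basel identity.

```latex
% Assumption: a_n = 1/n for n \in \mathbb{N}.
\[
\|(a_n)\|_1 = \sum_{n=1}^{\infty} \frac{1}{n} = \infty, \qquad
\|(a_n)\|_2 = \Big( \sum_{n=1}^{\infty} \frac{1}{n^2} \Big)^{1/2} = \frac{\pi}{\sqrt{6}}, \qquad
\|(a_n)\|_\infty = \sup_{n \in \mathbb{N}} \frac{1}{n} = 1 .
\]
```

So this sequence belongs to $\ell^2(\mathbb{N})$ and $\ell^\infty(\mathbb{N})$ but not to $\ell^1(\mathbb{N})$.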
25 Complex-Valued Measurable Functions
(i) cf
(ii) f + g
(iii) f g
(iv) |f |
Proof. (i): Write c = a + ib and f = u + iv (where a = Re(c), b = Im(c), u = Re(f ), v = Im(f )).
Then cf = au − bv + i(av + bu). So Re(cf ) = au − bv and Im(cf ) = av + bu are measurable.
Therefore cf is measurable.
Proof. Exercise.
Proof. Exercise.
26 Caratheodory Construction: Measures from Outer Measures
Lemma 26.1. Let X be any set. Let A be a σ-algebra on X. Let µ∗ be an outer measure on X.
Then µ∗ is a measure on A iff µ∗ is finitely additive on A. More precisely, the restriction of µ∗ to
A is a measure on A iff µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅.
Proof.
⇒: Theorem 5.2 (a),(c),(d).
⇐: Let A1, A2, . . . be any sequence of disjoint sets in A. By countable subadditivity of µ∗, $\mu^*\big( \bigcup_{i=1}^{\infty} A_i \big) \le \sum_{i=1}^{\infty} \mu^*(A_i)$. To prove the reverse inequality, we now assume µ∗ is finitely additive on A. We have
µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅. By induction, we have
$\mu^*\big( \bigcup_{i=1}^{n} A_i \big) = \sum_{i=1}^{n} \mu^*(A_i)$ for every n ∈ N. By monotonicity and taking a limit,
\[
\begin{aligned}
\sum_{i=1}^{\infty} \mu^*(A_i) &= \lim_{n \to \infty} \sum_{i=1}^{n} \mu^*(A_i) = \lim_{n \to \infty} \mu^*\Big( \bigcup_{i=1}^{n} A_i \Big) \\
&\le \lim_{n \to \infty} \mu^*\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \mu^*\Big( \bigcup_{i=1}^{\infty} A_i \Big).
\end{aligned}
\]
Lemma 26.2. Let X be a set. Let A be an algebra on X. Let µ : A → [0, ∞] be a function. The
following are equivalent.
µ∗ (E) = µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
Note that since the ≤ inequality follows immediately from finite subadditivity of µ∗ , a set A ∈ P(X)
is µ∗ -measurable iff
µ∗ (E) ≥ µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
for every E ∈ P(X).
A µ∗ -measurable set A can be thought of as a sharp knife; it can cut any set E into two pieces
E ∩ A and E ∩ Ac so cleanly that the sum of the outer measure of the pieces equals the outer
measure of the whole.
Theorem 26.4 (Caratheodory’s Restriction Theorem). Let X be a set. Let µ∗ be an outer measure
on X. Let A∗ be the collection of µ∗ -measurable sets.
(a) If A ∈ P(X) and µ∗ (A) = 0, then A ∈ A∗ .
(b) A∗ is a σ-algebra on X
(c) The restriction of µ∗ to A∗ is a measure on A∗.
So A ∈ A∗ .
Step 1. ∅ ∈ A∗ .
Step 2. If A ∈ A∗, then Ac ∈ A∗.
µ∗(E) = µ∗(E ∩ A) + µ∗(E ∩ Ac) = µ∗(E ∩ Ac) + µ∗(E ∩ (Ac)c),
and so Ac ∈ A∗.
Step 3. If A, B ∈ A∗ , then A ∪ B ∈ A∗ .
µ∗ (E) = µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
= µ∗ (E ∩ A ∩ B) + µ∗ (E ∩ A ∩ B c ) + µ∗ (E ∩ Ac ∩ B)
+ µ∗ (E ∩ Ac ∩ B c ).
µ∗ (E ∩ (A ∪ B)) + µ∗ (E ∩ (A ∪ B)c ) ≤ µ∗ (E ∩ A ∩ B) + µ∗ (E ∩ A ∩ B c ) + µ∗ (E ∩ Ac ∩ B)
+ µ∗ (E ∩ Ac ∩ B c )
= µ∗ (E).
Thus A ∪ B ∈ A∗ .
Step 4. A∗ is an algebra.
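Since $B_n \in \mathcal{A}^*$,
\[
\mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \cap B_n \Big) + \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \cap B_n^c \Big)
\]
and
\[
E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n^c = E \cap \Big( \bigcup_{i=1}^{n-1} B_i \Big).
\]
Therefore
\[
\mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \mu^*(E \cap B_n) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n-1} B_i \Big) \Big).
\]
By Step 5,
\[
\mu^*(E) = \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big)^c \Big).
\]
Since $\big( \bigcup_{i=1}^{\infty} B_i \big)^c \subseteq \big( \bigcup_{i=1}^{n} B_i \big)^c$, monotonicity of µ∗ gives
\[
\mu^*(E) \ge \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big).
\]
Letting n → ∞ gives
\[
\mu^*(E) \ge \sum_{i=1}^{\infty} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big).
\]
By countable subadditivity of µ∗,
\[
\mu^*(E) \ge \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big) \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big).
\]
Therefore $\bigcup_{i=1}^{\infty} B_i \in \mathcal{A}^*$.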
Step 7. If $A_1, A_2, \dots \in \mathcal{A}^*$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}^*$.
27 Lebesgue Measure and Lebesgue σ-Algebra
Definition 27.1. Let λ∗ be the Lebesgue outer measure on R. The σ-algebra of all λ∗ -measurable
sets is called the Lebesgue σ-algebra on R. It is denoted by L. The elements of L are called the
Lebesgue measurable sets. The restriction of λ∗ to L is a measure called Lebesgue measure
and is denoted by λ.
Theorem 27.2. The Borel σ-algebra on R is contained in the Lebesgue σ-algebra on R, i.e., B ⊆ L.
Proof. Since B is the smallest σ-algebra containing the open sets, we just need to show that L
contains every open set. We first show that L contains every interval of the form (a, ∞). Let I be
any open interval of the form (a, ∞). We must show
λ∗ (E) = λ∗ (E ∩ I) + λ∗ (E ∩ I c )
for every E ∈ P(R). Let E ∈ P(R) be given. By subadditivity of λ∗,
λ∗ (E) ≤ λ∗ (E ∩ I) + λ∗ (E ∩ I c ).
It remains to show
λ∗ (E) ≥ λ∗ (E ∩ I) + λ∗ (E ∩ I c ).
If λ∗(E) = ∞, we are done. Assume λ∗(E) < ∞. Let ε > 0 be given. There exists a sequence of
intervals (In)n∈N that covers E and satisfies
\[ \sum_{n=1}^{\infty} \ell(I_n) \leq \lambda^*(E) + \varepsilon. \]
Since λ∗(E) is finite, the last inequality implies each In is a finite interval. For each n ∈ N, choose
an open interval Jn such that In ⊆ Jn and ℓ(Jn) ≤ ℓ(In) + ε2^{−n}. Then the sequence of finite open
intervals (Jn)n∈N covers E and
\[ \sum_{n=1}^{\infty} \ell(J_n) \leq \lambda^*(E) + 2\varepsilon. \]
For each n ∈ N, define J′n = Jn ∩ I and J″n = Jn ∩ I c. Then each J′n and J″n is an interval (possibly empty),
the sequence (J′n) covers E ∩ I, the sequence (J″n) covers E ∩ I c, and ℓ(Jn) = ℓ(J′n) + ℓ(J″n).
Therefore
\[ \lambda^*(E \cap I) + \lambda^*(E \cap I^c) \leq \sum_{n=1}^{\infty} \ell(J_n') + \sum_{n=1}^{\infty} \ell(J_n'') = \sum_{n=1}^{\infty} \ell(J_n) \leq \lambda^*(E) + 2\varepsilon. \]
Since ε > 0 is arbitrary, we have
λ∗ (E ∩ I) + λ∗ (E ∩ I c ) ≤ λ∗ (E).
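The key bookkeeping in this argument is that cutting each covering interval at the point a does not change the total length. The Python sketch below checks this on a small hypothetical cover; intervals are represented only by their endpoints (openness does not affect length), and the names `length` and `split_at` are illustrative, not from the notes.

```python
def length(interval):
    """Length of a bounded interval given as an endpoint pair (left, right)."""
    left, right = interval
    return max(0.0, right - left)

def split_at(interval, a):
    """Endpoints of J ∩ (a, ∞) and J ∩ (-∞, a] for J = interval."""
    left, right = interval
    upper = (max(left, a), max(right, a))   # part of J to the right of a
    lower = (min(left, a), min(right, a))   # part of J to the left of a
    return upper, lower

a = 2.0
cover = [(0.0, 1.5), (1.0, 4.0), (3.5, 6.0)]   # hypothetical covering intervals
pieces = [split_at(J, a) for J in cover]

total = sum(length(J) for J in cover)
total_upper = sum(length(up) for up, _ in pieces)
total_lower = sum(length(lo) for _, lo in pieces)
print(total, total_upper + total_lower)
```

Pieces that miss one side of a come out as degenerate pairs of length zero, which matches the convention that an empty piece contributes nothing to the sum.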
We have proved L contains every open interval of the form (a, ∞). Since L is closed under
complements, it contains the complement of each such interval, i.e., it contains every interval of the
form (−∞, a]. Since L is closed under finite intersections, it contains every interval of the form
(a, b] = (a, ∞) ∩ (−∞, b]. Since L is closed under countable unions, it contains every interval of the
form
\[ (a, b) = \bigcup_{n=1}^{\infty} \Big(a, b - \frac{1}{n}\Big] \]
and every interval of the form
\[ (-\infty, b) = \bigcup_{n=1}^{\infty} \Big(-n, b - \frac{1}{n}\Big]. \]
Thus L contains every open interval.
Let U be any open set in R. Then U is a union of open balls, by definition. But the open
balls in R are exactly the finite open intervals. Indeed, B(x, r) = (x − r, x + r) and (a, b) =
B((a + b)/2, (b − a)/2). Thus U is a union of open intervals, U = ⋃_{k∈K} Ik. Note that the index
set K may not be countable. For each rational number r ∈ U, let Jr be the union of those
intervals Ik such that r ∈ Ik. Then Jr is an open interval (possibly infinite) for each r ∈ Q ∩ U and
U = ⋃_{k∈K} Ik = ⋃_{r∈Q∩U} Jr. Since Jr ∈ L for each r ∈ Q ∩ U and Q is countable, we see that U is a
countable union of sets in L, and so U ∈ L. This shows that L contains all the open sets in R.
(a) A ∈ L.
(b) For every ε > 0, there exists an open set G such that A ⊆ G and λ∗(G \ A) < ε.
(d) For every ε > 0, there exists a closed set F such that F ⊆ A and λ∗(A \ F) < ε.
We first prove this under the assumption that λ∗(A) < ∞. Let ε > 0. Choose a sequence of
intervals I1, I2, . . . such that A ⊆ ⋃_{i=1}^∞ Ii and Σ_{i=1}^∞ ℓ(Ii) ≤ λ∗(A) + ε. Since λ∗(A) is finite, the
last inequality implies each Ii is a finite interval. For each i, choose an open interval Ji such that
Ii ⊆ Ji and ℓ(Ji) ≤ ℓ(Ii) + ε2^{−i}. Define G = ⋃_{i=1}^∞ Ji. Then G is open, A ⊆ G, and
\[ \lambda^*(G) \leq \sum_{i=1}^{\infty} \ell(J_i) \leq \lambda^*(A) + 2\varepsilon. \]
Combining the inequalities above and using that λ∗ (A) < ∞, we get
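By the argument above, for each n ∈ N we get an open set Gn such that An ⊆ Gn and
λ∗(Gn \ An) < ε2^{−n}. Define G = ⋃_{n=1}^∞ Gn. Then G is an open set, A ⊆ G, and
\[ G \setminus A = \bigcup_{n=1}^{\infty} (G_n \setminus A) \subseteq \bigcup_{n=1}^{\infty} (G_n \setminus A_n). \]
Therefore
\[ \lambda^*(G \setminus A) \leq \sum_{n=1}^{\infty} \lambda^*(G_n \setminus A_n) < \sum_{n=1}^{\infty} \varepsilon 2^{-n} = \varepsilon. \]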
(b) ⇒ (c): By (b), for each n ∈ N, there exists an open set Gn such that A ⊆ Gn and λ∗(Gn \ A) <
1/n. Define B = ⋂_{n=1}^∞ Gn. Then B ∈ B, A ⊆ B, and B \ A ⊆ Gn \ A for every n ∈ N. Therefore
λ∗(B \ A) ≤ λ∗(Gn \ A) < 1/n for every n ∈ N. Hence λ∗(B \ A) = 0.
(c) ⇒ (a): Let E ∈ P(R) be given. By (c), there exists B ∈ B such that A ⊆ B and λ∗ (B \ A) = 0.
Since B ⊆ L, we have B ∈ L, and so
λ∗ (E) = λ∗ (E ∩ B) + λ∗ (E ∩ B c ).
But λ∗ (E ∩ A) ≤ λ∗ (E ∩ B) and
λ∗ (E ∩ Ac ) ≤ λ∗ (E ∩ Ac ∩ B) + λ∗ (E ∩ Ac ∩ B c ) ≤ λ∗ (B \ A) + λ∗ (E ∩ B c ) = λ∗ (E ∩ B c ).
Therefore
λ∗ (E) ≥ λ∗ (E ∩ A) + λ∗ (E ∩ Ac ).
The ≤ inequality comes from subadditivity. Thus A ∈ L.
At this point, we have proved (a) ⇔ (b) ⇔ (c). We use this equivalence in what follows.
(a) ⇔ (d): A ∈ L iff Ac ∈ L iff (b) holds with A replaced by Ac, i.e., for every ε > 0, there exists an
open set G such that Ac ⊆ G and λ∗(G \ Ac) < ε. This last statement is equivalent to (d) because
of the following facts: G is open iff Gc is closed; Ac ⊆ G iff Gc ⊆ A; G \ Ac = G ∩ A = A \ Gc.
(a) ⇔ (e): A ∈ L iff Ac ∈ L iff (c) holds with A replaced by Ac, i.e., there exists a
B ∈ B such that Ac ⊆ B and λ∗(B \ Ac) = 0. This last statement is equivalent to (e) because of
the following facts: B ∈ B iff Bc ∈ B; Ac ⊆ B iff Bc ⊆ A; B \ Ac = B ∩ A = A \ Bc.
28 Product σ-Algebras
Definition 28.2. Let X, Y be any sets. Let E ⊆ X × Y . If a ∈ X, the a-section (or a-cross-
section or a-slice) of E is
[E]a = {y ∈ Y : (a, y) ∈ E} .
If b ∈ Y , the b-section (or b-cross-section or b-slice) of E is
[E]b = {x ∈ X : (x, b) ∈ E} .
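For a finite set E ⊆ X × Y, sections can be computed directly from the definition. The following Python sketch uses a small hypothetical set E; the function names `x_section` and `y_section` are illustrative only.

```python
def x_section(E, a):
    """The a-section of E: all y with (a, y) in E, for a in X."""
    return {y for (x, y) in E if x == a}

def y_section(E, b):
    """The b-section of E: all x with (x, b) in E, for b in Y."""
    return {x for (x, y) in E if y == b}

# Hypothetical subset of X x Y with X = {1, 2, 3} and Y = {"u", "v", "w"}.
E = {(1, "u"), (1, "v"), (2, "v"), (3, "w")}

print(x_section(E, 1))     # slice of E over the point a = 1
print(y_section(E, "v"))   # slice of E at the point b = "v"
```

A section over a point that E never meets is simply the empty set, as the definition requires.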
It is easy to check that the right-hand side is a σ-algebra containing the measurable rectangles,
which implies the desired containment.
[f ]a (y) = f (a, y)
Theorem 28.6. Let S be a σ-algebra on a set X. Let T be a σ-algebra on a set Y. Let
f : X × Y → R. If f is S ⊗ T -measurable, then [f ]a is T -measurable for each a ∈ X and [f ]b is
S-measurable for each b ∈ Y. Informally, the sections of a measurable function are measurable.
This proves that [f ]a is T -measurable for each a ∈ X. The other conclusion is proved similarly.
29 Monotone Classes
In other words, a monotone class is a collection of subsets of X which is closed under countable
increasing unions and countable decreasing intersections.
Proof. Exercise.
Theorem 29.3. The intersection of any collection of monotone classes on a set X is a monotone
class on X.
Proof. Exercise.
Theorem 29.4. Let X be a set and let C be any collection of subsets of X. The intersection of
all monotone classes on X that contain C is the smallest monotone class on X that contains C. It
is called the monotone class generated by C and it is denoted by M(C).
Proof. Exercise.
Lemma 29.5. (Monotone Class Lemma) If A is an algebra on a set X, then the monotone class
generated by A equals the σ-algebra generated by A, i.e.,
M(A) = σ(A).
Proof. Since σ(A) is a monotone class, we have M(A) ⊆ σ(A). We need to show σ(A) ⊆ M(A). It
suffices to show that M(A) is a σ-algebra. Note that if a monotone class is an algebra, then it is a σ-
algebra (exercise). Therefore it suffices to show that M(A) is an algebra. Note ∅, X ∈ A ⊆ M(A).
Thus, to prove that M(A) is an algebra, it is easy to check (exercise) that it will suffice to prove
the following claim: For all E, F ∈ M(A) we have E ∩ F, E \ F, F \ E ∈ M(A).
ME = {F ∈ M(A) : E ∩ F, E \ F, F \ E ∈ M(A)} .
It is easy to check (exercise) that ME is a monotone class for every E ∈ M(A). Since A is an
algebra, it is easy to check (exercise) that A ⊆ ME for every E ∈ A. Therefore M(A) ⊆ ME for
every E ∈ A. In other words, for every E ∈ A and every F ∈ M(A) we have F ∈ ME . Note that
for every E, F ∈ M(A) we have F ∈ ME iff E ∈ MF . Thus, for every E ∈ A and every F ∈ M(A)
we have E ∈ MF . This means that A ⊆ MF for every F ∈ M(A). Therefore M(A) ⊆ MF for
every F ∈ M(A). But this means that for all E, F ∈ M(A) we have E ∈ MF . Thus for all
E, F ∈ M(A) we have E ∩ F, E \ F, F \ E ∈ M(A).
Definition 30.1. Let (X, S, µ) be a measure space. We say that (X, S, µ) is σ-finite (or that µ is
σ-finite) if there exist E1, E2, . . . ∈ S such that X = ⋃_{i=1}^∞ Ei and µ(Ei) < ∞ for every i ∈ N.
Definition 30.2. Let (X, S, µ) and (Y, T, ν) be measure spaces. Let E ∈ S ⊗ T. For each x ∈ X,
we define
\[ \int_Y 1_E(x, y)\,d\nu(y) = \int_Y [1_E]_x(y)\,d\nu(y). \]
For each y ∈ Y, we define
\[ \int_X 1_E(x, y)\,d\mu(x) = \int_X [1_E]_y(x)\,d\mu(x). \]
Lemma 30.3. Let (X, S, µ) and (Y, T, ν) be measure spaces. Let E ∈ S ⊗ T. For each x ∈ X,
\[ \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x). \]
For each y ∈ Y,
\[ \int_X 1_E(x, y)\,d\mu(x) = \mu([E]_y). \]
Theorem 30.4. Let (X, S, µ) and (Y, T, ν) be σ-finite measure spaces. For every E ∈ S ⊗ T, we
have
(a) x ↦ ∫_Y 1E(x, y)dν(y) is an S-measurable function.
(b) y ↦ ∫_X 1E(x, y)dµ(x) is a T-measurable function.
(c) ∫_X ∫_Y 1E(x, y)dν(y)dµ(x) = ∫_Y ∫_X 1E(x, y)dµ(x)dν(y).
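On finite sets, where every subset is measurable and a measure is determined by point weights, conclusion (c) can be checked by direct summation. The Python sketch below is only an illustration under that assumption: the weights `mu` and `nu`, the set `E`, and all function names are hypothetical.

```python
from fractions import Fraction as Fr

# Hypothetical finite measure spaces, given by point weights.
mu = {"a": Fr(1, 2), "b": Fr(3), "c": Fr(2)}
nu = {0: Fr(1, 3), 1: Fr(5)}

def nu_of_x_section(E, x):
    """nu of the x-section of E, i.e. the inner integral over Y."""
    return sum((nu[y] for y in nu if (x, y) in E), Fr(0))

def mu_of_y_section(E, y):
    """mu of the y-section of E, i.e. the inner integral over X."""
    return sum((mu[x] for x in mu if (x, y) in E), Fr(0))

def integrate_dnu_then_dmu(E):
    return sum((nu_of_x_section(E, x) * mu[x] for x in mu), Fr(0))

def integrate_dmu_then_dnu(E):
    return sum((mu_of_y_section(E, y) * nu[y] for y in nu), Fr(0))

E = {("a", 0), ("b", 0), ("b", 1), ("c", 1)}      # not a rectangle
rectangle = {(x, y) for x in ("a", "c") for y in (1,)}

print(integrate_dnu_then_dmu(E), integrate_dmu_then_dnu(E))
print(integrate_dnu_then_dmu(rectangle), (mu["a"] + mu["c"]) * nu[1])
```

Exact rational arithmetic is used so that the two iterated sums can be compared with `==` rather than up to rounding.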
Proof. We will first prove the theorem in the special case where µ and ν are finite. Then we will
prove the theorem in the case µ and ν are σ-finite.
Assume µ and ν are finite. Let M be the collection of all sets E ∈ S ⊗ T such that (a),(b),(c) hold.
It will suffice to show that S ⊗ T ⊆ M. Let R be the collection of all measurable rectangles. Let
A be the collection of all finite unions of disjoint measurable rectangles. Note that S ⊗ T = σ(A)
and that A is an algebra. By the monotone class lemma, S ⊗ T = M(A). Therefore, to show that
S ⊗ T ⊆ M, it will suffice to show that M is a monotone class that contains A. We do so in three
steps.
Step 1: R ⊆ M. Let E ∈ R. We must show that (a),(b),(c) hold. We have E = A × B for some
A ∈ S and B ∈ T . For each x ∈ X, we have
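\[ [E]_x = \begin{cases} B & \text{if } x \in A \\ \emptyset & \text{if } x \notin A \end{cases} \]
and so
\[ \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x) = \begin{cases} \nu(B) & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} = \nu(B) 1_A(x). \]
Since A ∈ S, (a) holds. Similarly, ∫_X 1E(x, y)dµ(x) = µ(A)1B(y) for each y ∈ Y. Since B ∈ T, (b)
holds. Now we see that
\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_X \nu(B) 1_A(x)\,d\mu(x) = \mu(A)\nu(B) = \int_Y \mu(A) 1_B(y)\,d\nu(y) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]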
For x ∈ X, we have [E1]x ⊆ [E2]x ⊆ . . . and [E]x = ⋃_{n=1}^∞ [En]x. By continuity from below applied
to ν([En]x) (for each fixed x) (or the monotone convergence theorem applied to 1En(x, y) (for
each fixed x)), we have that fn → f pointwise. By the definition of M, each fn is S-measurable.
Therefore f is measurable. So (a) holds for E. For each y ∈ Y, define
\[ g_n(y) = \int_X 1_{E_n}(x, y)\,d\mu(x) = \mu([E_n]_y), \]
Similarly, since (gn) is an increasing sequence, the monotone convergence theorem implies
\[ \int\!\!\int 1_E(x, y)\,d\mu(x)\,d\nu(y) = \int g(y)\,d\nu(y) = \lim_n \int g_n(y)\,d\nu(y) = \lim_n \int\!\!\int 1_{E_n}(x, y)\,d\mu(x)\,d\nu(y). \qquad (30.2) \]
Since (c) holds for each En , the right-hand sides of (30.1) and (30.2) are equal, and so
\[ \int\!\!\int 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int\!\!\int 1_E(x, y)\,d\mu(x)\,d\nu(y). \]
Thus (c) holds for E. Now we show that M is closed under countable decreasing intersections.
Suppose E1 ⊇ E2 ⊇ . . . are in M and set E = ⋂_{n=1}^∞ En. We must show E ∈ M. To do so, we will
show (a),(b),(c) hold for E. Define f , fn , g, gn as above. We have f1 (x) = ν([E1 ]x ) ≤ ν(Y ) < ∞ for
each x ∈ X. By continuity from above applied to ν([En ]x ) (for each fixed x ∈ X) (or the dominated
convergence theorem applied to 1En (x, y) (for each fixed x)), we have that fn → f pointwise. By
the definition of M, each fn is S-measurable. Therefore f is measurable. So (a) holds for E. We
have g1 (y) = µ([E1 ]y ) ≤ µ(X) < ∞ for each y ∈ Y . By continuity from above applied to µ([En ]y )
(for each fixed y ∈ Y ) (or the dominated convergence theorem applied to 1En (x, y) (for each fixed
y)), we have that gn → g pointwise. By the definition of M, each gn is T -measurable. Therefore g
is measurable. So (b) holds for E. We have 0 ≤ fn ≤ f1 for all n and
\[ \int f_1(x)\,d\mu(x) \leq \int \nu(Y)\,d\mu(x) = \mu(X)\nu(Y) < \infty. \]
Similarly, we have 0 ≤ gn ≤ g1 for all n and
\[ \int g_1(y)\,d\nu(y) \leq \int \mu(X)\,d\nu(y) = \mu(X)\nu(Y) < \infty. \]
Since (c) holds for each En , the right-hand sides of (30.3) and (30.4) are equal, and so
\[ \int\!\!\int 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int\!\!\int 1_E(x, y)\,d\mu(x)\,d\nu(y). \]
The proof of the theorem is now complete in the case where µ and ν are finite. Now we assume
only that µ and ν are σ-finite. Since µ is σ-finite, there exist X1, X2, . . . ∈ S such that ⋃_{i=1}^∞ Xi = X
and µ(Xi) < ∞ for all i. By replacing each Xi by X1 ∪ · · · ∪ Xi, we can assume X1 ⊆ X2 ⊆ . . ..
Since ν is σ-finite, there exist Y1, Y2, . . . ∈ T such that ⋃_{j=1}^∞ Yj = Y and ν(Yj) < ∞ for all j. By
replacing each Yj by Y1 ∪ · · · ∪ Yj, we can assume Y1 ⊆ Y2 ⊆ . . .. For each k ∈ N, define µk on
S by µk(A) = µ(A ∩ Xk) and νk on T by νk(B) = ν(B ∩ Yk). It is easy to check that µk and νk
are finite measures. Let E ∈ S ⊗ T. As the theorem has been proved for finite measures, for each
k ∈ N, x ↦ νk([E]x) is an S-measurable function, y ↦ µk([E]y) is a T-measurable function, and
\[ \int \nu_k([E]_x)\,d\mu(x) = \int \mu_k([E]_y)\,d\nu(y). \qquad (30.5) \]
For each fixed x, [E]x ∩Yk increases to [E]x , and so continuity from below implies νk ([E]x ) increases
to ν([E]x ). Then x 7→ ν([E]x ) is measurable and the monotone convergence theorem implies
\[ \lim_k \int \nu_k([E]_x)\,d\mu(x) = \int \nu([E]_x)\,d\mu(x), \]
which is equivalent to
\[ \int\!\!\int 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int\!\!\int 1_E(x, y)\,d\mu(x)\,d\nu(y). \]
31 Product Measures
Definition 31.1. Let (X, S, µ) and (Y, T , ν) be measure spaces. Assume (X, S, µ) and (Y, T , ν)
are σ-finite. The product measure of µ and ν is the set function µ × ν : S ⊗ T → [0, ∞] defined
by
\[ (\mu \times \nu)(E) = \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]
Theorem 31.2. Let (X, S, µ) and (Y, T , ν) be measure spaces. Assume (X, S, µ) and (Y, T , ν) are
σ-finite. Then µ × ν is a σ-finite measure on S ⊗ T . Moreover, µ × ν is the unique measure on
S ⊗ T such that
(µ × ν)(A × B) = µ(A)ν(B)
for all A ∈ S and B ∈ T .
By applying the monotone convergence theorem to the sequence of partial sums of the series, we
get
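\[ (\mu \times \nu)\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} \int \nu([E_i]_x)\,d\mu = \sum_{i=1}^{\infty} \int\!\!\int 1_{E_i}(x, y)\,d\nu\,d\mu = \sum_{i=1}^{\infty} (\mu \times \nu)(E_i). \]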
Step 3: µ × ν is σ-finite. Since µ is σ-finite, there exist X1, X2, . . . ∈ S such that ⋃_{i=1}^∞ Xi = X
and µ(Xi) < ∞ for all i. Since ν is σ-finite, there exist Y1, Y2, . . . ∈ T such that ⋃_{j=1}^∞ Yj = Y and
ν(Yj) < ∞ for all j. Then ⋃_{(i,j)∈N×N} (Xi × Yj) = X × Y and Xi × Yj ∈ S ⊗ T and (µ × ν)(Xi × Yj) =
µ(Xi)ν(Yj) < ∞ for all (i, j) ∈ N × N.
Step 4: Uniqueness. Let π and ρ be measures on S ⊗ T that both satisfy π(A × B) = µ(A)ν(B)
and ρ(A × B) = µ(A)ν(B) for all A ∈ S and B ∈ T . We need to show π = ρ. By arguing as in
Step 3, we see that π and ρ are σ-finite. We first treat the case where π and ρ are finite. Then we
treat the case where π and ρ are σ-finite.
Case 1: π and ρ are finite. Define M = {E ∈ S ⊗ T : π(E) = ρ(E)}. To show that π = ρ, it
suffices to show S ⊗ T ⊆ M. Let A be the collection of finite unions of disjoint measurable
rectangles. It is easy to check that A is an algebra and S ⊗ T = σ(A). By the monotone class
lemma, S ⊗ T = M(A). If we show that M is a monotone class that contains A, it will follow
that S ⊗ T ⊆ M, and we will be done. Using the additivity of π and ρ, it is easy to check that
A ⊆ M. So it remains to check that M is a monotone class. First we check that M is closed
under countable increasing unions. Suppose E1 ⊆ E2 ⊆ . . . belong to M and let E = ⋃_{n=1}^∞ En. By
continuity from below,
\[ \pi(E) = \lim_n \pi(E_n) = \lim_n \rho(E_n) = \rho(E). \]
Thus E ∈ M. Now we check that M is closed under countable decreasing intersections. Suppose
E1 ⊇ E2 ⊇ . . . belong to M and let E = ⋂_{n=1}^∞ En. Since π(E1) < ∞ and ρ(E1) < ∞, continuity
from above gives
\[ \pi(E) = \lim_n \pi(E_n) = \lim_n \rho(E_n) = \rho(E). \]
Thus E ∈ M.
Case 2: π and ρ are σ-finite. As in Step 3, choose X1, X2, . . . ∈ S and Y1, Y2, . . . ∈ T such that
⋃_{(i,j)∈N×N} (Xi × Yj) = X × Y, µ(Xi) < ∞ for all i, and ν(Yj) < ∞ for all j. For each k ∈ N, let Zk
be the union of those sets Xi × Yj with i + j ≤ k, i.e., Zk = ⋃_{(i,j):i+j≤k} (Xi × Yj). Then
Z1 ⊆ Z2 ⊆ . . . is an increasing sequence of sets in S ⊗ T such that ⋃_{k=1}^∞ Zk = X × Y,
\[ \pi(Z_k) \leq \sum_{(i,j):\, i+j \leq k} \pi(X_i \times Y_j) = \sum_{(i,j):\, i+j \leq k} \mu(X_i)\nu(Y_j) < \infty, \]
and
\[ \rho(Z_k) \leq \sum_{(i,j):\, i+j \leq k} \rho(X_i \times Y_j) = \sum_{(i,j):\, i+j \leq k} \mu(X_i)\nu(Y_j) < \infty. \]
For each k ∈ N, define πk and ρk on S ⊗T by πk (E) = π(E ∩Zk ) and ρk (E) = ρ(E ∩Zk ). It is easy to
check that πk and ρk are finite measures and that πk (A×B) = µ(A)ν(B) and ρk (A×B) = µ(A)ν(B)
for all A ∈ S and B ∈ T . Therefore, by applying Case 1 to πk and ρk , we get that πk (E) = ρk (E)
for all E ∈ S ⊗ T . Then, by continuity from below,
Theorem 32.1. (Tonelli’s Theorem) Let (X, S, µ) and (Y, T, ν) be σ-finite measure spaces. If
f : X × Y → [0, ∞] is an S ⊗ T -measurable function, then
(a) x ↦ ∫_Y f(x, y)dν(y) is an S-measurable function,
(b) y ↦ ∫_X f(x, y)dµ(x) is a T-measurable function,
(c) ∫ f d(µ × ν) = ∫_X ∫_Y f(x, y)dν(y)dµ(x) = ∫_Y ∫_X f(x, y)dµ(x)dν(y).
Proof. By Theorem 30.4 and the definition of product measure, the theorem holds when f is the
indicator function of an S ⊗ T -measurable set. Since linear combinations of measurable functions
are measurable and since the integral is linear, the theorem also holds when f is a non-negative measurable
simple function. If f is a non-negative extended-real-valued S ⊗ T -measurable function, we can find
a sequence of non-negative measurable simple functions which increases to f pointwise, and so the
fact that limits of measurable functions are measurable and the monotone convergence theorem
imply that the theorem holds for f in this case as well.
Theorem 32.2. (Fubini’s Theorem) Let (X, S, µ) and (Y, T, ν) be σ-finite measure spaces.
Suppose f : X × Y → [−∞, ∞] (or f : X × Y → C) is S ⊗ T -measurable and µ × ν-integrable.
(a) [f ]x is ν-integrable (i.e., ∫_Y |f(x, y)|dν(y) < ∞) for µ-almost-every x ∈ X.
(b) [f ]y is µ-integrable (i.e., ∫_X |f(x, y)|dµ(x) < ∞) for ν-almost-every y ∈ Y.
Remark: Note that the function x ↦ ∫_Y f(x, y)dν(y) may not be defined for every x ∈ X. Thus
we need to work with I(x) instead. Likewise with the function y ↦ ∫_X f(x, y)dµ(x) and J(y). It is
standard convention to identify these functions, so that (e) can be written as
\[ \int f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y). \]
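The integrability hypothesis cannot be dropped. A standard counterexample (not taken from these notes) uses counting measure on N × N and the double sequence that is +1 on the diagonal, −1 immediately to the right of it, and 0 elsewhere. The Python sketch below computes the two iterated sums; the names `a`, `row_sum`, and `col_sum` are illustrative, and the truncation level `N` is an arbitrary choice.

```python
def a(i, j):
    """Double sequence on N x N: +1 if j == i, -1 if j == i + 1, else 0."""
    if j == i:
        return 1
    if j == i + 1:
        return -1
    return 0

def row_sum(i):
    # Row i is supported on j in {i, i+1}, so this finite sum is the full inner sum over j.
    return sum(a(i, j) for j in range(1, i + 3))

def col_sum(j):
    # Column j is supported on i in {j-1, j}, so this finite sum is the full inner sum over i.
    return sum(a(i, j) for i in range(1, j + 2))

N = 50  # how many outer terms to add; every omitted outer term is 0
rows_first = sum(row_sum(i) for i in range(1, N + 1))
cols_first = sum(col_sum(j) for j in range(1, N + 1))
print(rows_first, cols_first)
```

Every row sums to 0, while the first column sums to 1 and all other columns sum to 0, so the two iterated sums disagree. This does not contradict Fubini's Theorem, because the sum of the absolute values |a(i, j)| over N × N is infinite.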
Proof. The plan is to apply Tonelli’s theorem to the functions |f |, f +, and f −. The function |f |
is non-negative and S ⊗ T -measurable. Tonelli’s theorem implies that x ↦ ∫_Y |f(x, y)|dν(y) is an
S-measurable function, y ↦ ∫_X |f(x, y)|dµ(x) is a T-measurable function, and
\[ \int |f|\,d(\mu \times \nu) = \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) = \int_Y \int_X |f(x, y)|\,d\mu(x)\,d\nu(y). \]
The assumption that f is µ × ν-integrable means that all three expressions above are finite. Now
we apply Tonelli’s theorem to the positive and negative parts of f . The functions f + and f − are
non-negative and S ⊗T -measurable. Tonelli’s theorem implies that the functions I + and I − defined
by
\[ I^+(x) = \int_Y f^+(x, y)\,d\nu(y), \qquad I^-(x) = \int_Y f^-(x, y)\,d\nu(y) \]
are both S-measurable functions. Note that
\[ I^+(x) = \int [f^+]_x\,d\nu = \int ([f]_x)^+\,d\nu \]
and
\[ I^-(x) = \int [f^-]_x\,d\nu = \int ([f]_x)^-\,d\nu. \]
Thus [f ]x is ν-integrable iff I +(x) and I −(x) are both finite. Since f + ≤ |f | and f − ≤ |f |, we have
\[ \int_X I^+(x)\,d\mu(x) \leq \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty \]
and
\[ \int_X I^-(x)\,d\mu(x) \leq \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty. \]
Thus I + and I − are µ-integrable. It follows that I +(x) and I −(x) are finite for µ-almost-every
x ∈ X. Equivalently, [f ]x is ν-integrable for µ-almost-every x ∈ X. So (a) is proved. Let A be the
set of those x ∈ X such that both I +(x) and I −(x) are finite. Equivalently, A is the set of those
x ∈ X such that [f ]x is ν-integrable. For each x ∈ A, we have
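\[ I(x) = \int_Y f(x, y)\,d\nu(y) = \int_Y f^+(x, y)\,d\nu(y) - \int_Y f^-(x, y)\,d\nu(y) = I^+(x) - I^-(x) \]
and
\[ \int f^-\,d(\mu \times \nu) = \int\!\!\int f^-\,d\nu\,d\mu = \int I^-\,d\mu. \]
Therefore
\[ \int f\,d(\mu \times \nu) = \int f^+\,d(\mu \times \nu) - \int f^-\,d(\mu \times \nu) = \int I^+\,d\mu - \int I^-\,d\mu = \int I\,d\mu. \]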
33 Lebesgue Measure on Rn
Let B(Rn ) denote the Borel σ-algebra on Rn . It is an exercise to show that B(Rm ) ⊗ B(Rn ) =
B(Rm+n ). Note that the Lebesgue measure on R is σ-finite. The Lebesgue measure λn on B(Rn )
is defined to be the product measure
λn = λ × · · · × λ.
Tonelli’s Theorem and Fubini’s Theorem hold for Lebesgue measure on B(Rn ).
It is possible to define the Lebesgue measure on the larger Lebesgue σ-algebra on Rn . And it is
possible to prove Tonelli’s Theorem and Fubini’s Theorem in that case. But it is a bit complicated.
This is explored in the homework exercises.
Definition 34.1. A probability space is a measure space (Ω, A, P ) such that P (Ω) = 1. In
such case, the measure P is called a probability measure on (Ω, A), the elements of Ω are called
elementary outcomes or sample points, the set Ω is called the sample space, the sets A ∈ A
are called events, and, for each A ∈ A, the number P (A) is called the probability of the event
A.
Example 34.2. Flip two fair coins. There are four possible outcomes: we get two heads; we get
a head on the first coin and a tail on the second coin; we get a tail on the first coin and a head on
the second coin; we get two tails. We can take the sample space to be Ω = {HH,HT,TH,TT} and
the collection of events to be A = P(Ω). For example, one event is {HT,TH}; it represents the event
where we get exactly one head. Since the coins are fair, each elementary outcome has probability 1/4,
so we define the probability of each event A to be 1/4 times the number of elements of A. Thus
\[ P(\text{exactly one head}) = P(\{HT, TH\}) = \frac{1}{4} \cdot 2 = \frac{1}{2}. \]
Definition 34.3. Let (Ω, A, P ) be a probability space. A random variable on (Ω, A, P ) is a
A-measurable function X : Ω → R.
Example 34.4. We continue the previous example. Define X to be the number of heads, i.e.,
\[ X(\omega) = \begin{cases} 0 & \text{if } \omega = TT \\ 1 & \text{if } \omega = HT \text{ or } \omega = TH \\ 2 & \text{if } \omega = HH \end{cases} \]
(Since the σ-algebra A is the collection of all subsets of Ω, any function from Ω to R is A-measurable.
Thus X is A-measurable. So X is a random variable.)
Notation 34.5. Probabilists have an aversion to displaying the arguments of random variables. For
example, it is common to write {X ≥ a} instead of {ω : X(ω) ≥ a} = X −1([a, ∞)) and P(X ≥ a)
instead of P({ω : X(ω) ≥ a}) = P(X −1([a, ∞))).
Definition 34.6. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The distribution of X is the set function PX : B(R) → [0, 1] defined by
PX (B) = P (X −1 (B)) = P (X ∈ B)
for each B ∈ B(R). The next theorem says that PX is a probability measure on (R, B(R)).
Theorem 34.7. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The distribution of X, PX , is a probability measure on (R, B(R)).
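For the two-coin example, the distribution of X can be computed by pushing the probability of each outcome forward to the value X takes there. The following Python sketch does this with exact fractions; the names `P_point`, `P`, and `distribution` are illustrative, and only the singleton values of PX are tabulated.

```python
from collections import defaultdict
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
P_point = {w: Fraction(1, 4) for w in omega}   # fair coins: each outcome has probability 1/4

def P(event):
    """Probability of an event, i.e. of a subset of omega."""
    return sum((P_point[w] for w in event), Fraction(0))

def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")

def distribution(rv):
    """Values s -> P(rv = s); these determine the distribution of rv on all Borel sets."""
    dist = defaultdict(Fraction)
    for w in omega:
        dist[rv(w)] += P_point[w]
    return dict(dist)

P_X = distribution(X)
print(P({"HT", "TH"}))
print(P_X)
```

The values of `P_X` sum to 1, in line with the theorem that PX is a probability measure on (R, B(R)).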
Proof. We have
Definition 34.8. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The cumulative distribution function or cdf of X is the function FX : R → [0, 1] defined by
FX (t) = PX ((−∞, t]) = P (X ≤ t)
for each t ∈ R.
Definition 34.9. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
We say that X is a continuous random variable if there is a function fX : R → [0, ∞) such that
fX is B(R)-measurable and
\[ P(X \in B) = \int_B f_X\,d\lambda = \int f_X 1_B\,d\lambda \]
for every B ∈ B(R). The function fX is called the probability density function or pdf of X.
When no confusion is possible, we denote the pdf of X by f instead of fX .
Theorem 34.10. Let X be a continuous random variable on a probability space (Ω, A, P ). Let f
be the pdf of X. Let F be the cdf of X.
1. F(t) = ∫_{−∞}^{t} f dλ for every t ∈ R.
2. P(X = t) = 0 for all t ∈ R.
3. F is continuous.
4. The pdf of X is unique up to λ-a.e. equality. In other words, if g : R → [0, ∞) is B(R)-
measurable and P(X ∈ B) = ∫_B g dλ for all B ∈ B(R), then f = g λ-a.e.
Proof. (a)-(e) are exercises. The proof of (f) is difficult and is omitted for now.
Example 34.11. (a) A random variable X is said to have a standard normal distribution if it
has pdf f(x) = (1/√(2π)) e^{−x²/2}. Then
\[ P(c \leq X \leq d) = \int_c^d \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx. \]
(b) A random variable X is said to be uniformly distributed on the interval [a, b] if it has pdf
f = (1/(b − a)) 1[a,b]. For each interval [c, d] ⊆ [a, b], we have
\[ P(c \leq X \leq d) = \int \frac{1}{b-a}\, 1_{[a,b]} 1_{[c,d]}\,dx = \frac{d-c}{b-a}. \]
In words, the probability that X lies in the interval [c, d] ⊆ [a, b] is the relative length of the
interval [c, d].
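Both parts of the example can be sanity-checked numerically. The Python sketch below approximates the integrals with a midpoint Riemann sum, which is only a numerical stand-in for the Lebesgue integral; the helper `midpoint_integral`, the step count, and the chosen endpoints are assumptions made for illustration. The closed form erf(1/√2) for P(−1 ≤ X ≤ 1) is a standard fact about the normal distribution.

```python
import math

def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def uniform_pdf(x, a, b):
    """Density of the uniform distribution on [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def midpoint_integral(f, c, d, n=20000):
    """Midpoint Riemann sum approximating the integral of f over [c, d]."""
    h = (d - c) / n
    return h * sum(f(c + (k + 0.5) * h) for k in range(n))

# Uniform on [0, 4]: P(1 <= X <= 2.5) should be (2.5 - 1) / (4 - 0).
p_uniform = midpoint_integral(lambda x: uniform_pdf(x, 0.0, 4.0), 1.0, 2.5)

# Standard normal: compare with the closed form P(-1 <= X <= 1) = erf(1 / sqrt(2)).
p_normal = midpoint_integral(normal_pdf, -1.0, 1.0)
normal_exact = math.erf(1 / math.sqrt(2))

# The density should integrate to (approximately) 1 over a wide interval.
total_mass = midpoint_integral(normal_pdf, -10.0, 10.0)
print(p_uniform, p_normal, normal_exact, total_mass)
```

The last quantity illustrates why the factor 1/√(2π) is needed: without it the density would not integrate to 1.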
Definition 34.12. Let X be a random variable on a probability space (Ω, A, P ). Let µc be the
counting measure on (R, B(R)). We say X is a discrete random variable if there is a function
fX : R → [0, ∞) such that
\[ P(X \in B) = \int_B f_X\,d\mu_c = \int f_X 1_B\,d\mu_c \]
for every B ∈ B(R). In such case, fX is called the probability mass function or pmf of X. When no
confusion is possible, we denote the pmf of X by f instead of fX.
Theorem 34.13. Let X be a discrete random variable on a probability space (Ω, A, P ). Let
S = {s ∈ R : P (X = s) > 0}.
1. S is countable.
for all t ∈ R.
3. PX (B) = Σ_{s∈S∩B} f(s) for every B ∈ B(R).
4. PX = Σ_{s∈S} f(s)δs.
Proof. Exercise.
Theorem 34.14. Let X be a random variable on a probability space (Ω, A, P ). The following are
equivalent:
(a) X is discrete.
(c) The distribution of X is a countable weighted sum of point masses. More precisely, PX =
Σ_{i=1}^∞ ci δsi for some sequences c1, c2, . . . ∈ [0, ∞) and s1, s2, . . . ∈ R.
Proof. Exercise.
(b)
35 Theory of Cumulative Distribution Functions
As a preliminary, we establish
Theorem 35.1. If X is a random variable on a probability space (Ω, A, P ), then the cumulative
distribution function of X, FX , has the following properties.
(b): If (an ) is any sequence of real numbers which decreases to a, then the sets {ω : X(ω) ≤ an }
decrease to {ω : X(ω) ≤ a}, and so the fact that P is a finite measure and continuity from above
give
\[ \lim_n F_X(a_n) = \lim_n P(\{\omega : X(\omega) \leq a_n\}) = P(\{\omega : X(\omega) \leq a\}) = F_X(a). \]
Thus limt→a+ FX (t) = FX (a). (c) and (d): Similar to (b). The details are an exercise.
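Right-continuity is easy to see on a step cdf. The Python sketch below uses the number-of-heads variable from the two-coin example, whose pmf puts mass 1/4, 1/2, 1/4 on 0, 1, 2, and evaluates F just to the right and just to the left of the jump at a = 1. The sample points a ± 10^{−k} are an arbitrary illustrative choice.

```python
from fractions import Fraction

# pmf of the number of heads in two fair coin flips
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(t):
    """cdf F(t) = P(X <= t)."""
    return sum((p for s, p in pmf.items() if s <= t), Fraction(0))

a = 1
from_right = [F(a + Fraction(1, 10 ** k)) for k in range(1, 6)]
from_left = [F(a - Fraction(1, 10 ** k)) for k in range(1, 6)]
print(F(a), from_right, from_left)
```

Approaching a = 1 from the right, the values agree with F(1); approaching from the left they stay at F(0), so F is right-continuous at the jump but not left-continuous there.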
Theorem 35.2. Let X be a random variable on a probability space (Ω, A, P ). Let a ≤ b be real
numbers. Let F be the cdf of X.
Proof. Exercise.
The previous theorem tells us that the cumulative distribution function FX determines the distri-
bution PX for intervals.
Proof. We give only an outline. The details are left as an exercise to the reader. To prove the
existence of µF , we mimic the construction of Lebesgue measure using Caratheodory’s restriction
theorem. For each interval (a, b] in R, define
ℓF ((a, b]) = F (b) − F (a).
For each B ∈ P(R), define
\[ \mu_F^*(B) = \inf\Big\{ \sum_{i=1}^{\infty} \ell_F(I_i) : I_1, I_2, \ldots \text{ are intervals of the form } (a, b] \text{ and } B \subseteq \bigcup_{i=1}^{\infty} I_i \Big\}. \]
By mimicking the argument for Lebesgue outer measure, it is easy to check that µ∗F is an outer
measure on R and µ∗F (I) = ℓF (I) (exercise). The latter implies that F (t) = µ∗F ((−∞, t]) for all
t ∈ R and µ∗F (R) = 1. Now the Caratheodory restriction theorem implies that the collection
M∗F = {E ∈ P(R) : µ∗F (A) = µ∗F (A ∩ E) + µ∗F (A ∩ E c ) for all A ∈ P(R)}
of µ∗F -measurable sets is a σ-algebra on R and that the restriction of µ∗F to M∗F is a measure.
Denote this measure by µF . Note µF is a measure on (R, M∗F ). We want to have a measure
on (R, B(R)). By mimicking the proof that B(R) ⊆ L(R), it is easy to show that B(R) ⊆ M∗F
(exercise). Thus we can restrict µF to B(R) to obtain a measure on (R, B(R)). We also denote
this restricted measure by µF . Since we have noted that F (t) = µ∗F ((−∞, t]) for all t ∈ R and
µ∗F (R) = 1, and since (−∞, t] is a Borel set we see that F (t) = µF ((−∞, t]) and µF is a probability
measure. This completes the proof of the existence of µF . One way to prove uniqueness is with the
monotone class lemma. Another way is given in the proof of Proposition 1.3.10 in Cohn’s book.
The details are left as an exercise.
Now we can answer the questions above. The first two corollaries answer Question 1.
Corollary 35.4. Suppose F : R → [0, 1] satisfies properties (a)-(d) above and let µ be the probabil-
ity measure from the previous theorem. If we define (Ω, A, P ) = (R, B(R), µ) and define X : Ω → R
by X(ω) = ω, then F is the cumulative distribution function of the random variable X on (Ω, A, P ).
Proof. We have
FX (t) = PX ((−∞, t]) = P (X −1 ((−∞, t]))
= P ({ω ∈ Ω : X(ω) ∈ (−∞, t]}) = P ({ω ∈ Ω : ω ∈ (−∞, t]})
= P ((−∞, t]) = µ((−∞, t]) = F (t)
Corollary 35.5. Let F : R → [0, 1]. Then F satisfies properties (a)-(d) above iff there exists a
random variable whose cumulative distribution function is F .
Corollary 35.6. Let X be a random variable on a probability space (Ω, A, P ). There is a unique
probability measure µ on (R, B(R)) such that FX (t) = µ((−∞, t]) and µ = PX .
Proof. Apply the previous theorem to F = FX and use the uniqueness assertion to conclude
µ = PX .
From elementary probability and statistics, you may be familiar with the following formulas for the
expected value of discrete and continuous random variables. If X is a continuous random variable
with pdf f, then
\[ E(X) = \int_{\mathbb{R}} x f(x)\,dx. \]
If X is a discrete random variable with pmf f and S = {s ∈ R : P(X = s) > 0}, then
\[ E(X) = \sum_{s \in S} s f(s). \]
Our goal in this section is to show that these two formulas are actually special cases of a single
formula.
Definition 36.1. Let X be a random variable on a probability space (Ω, A, P). The expected
value (or expectation or mean or average) of X is defined to be
\[ E(X) = \int X\,dP \]
whenever the integral on the right is defined. Note that E(X) may be ±∞.
Theorem 36.2. (Law of the Unconscious Statistician) Let (Ω, A, P ) be a probability space. Let
X be a random variable on (Ω, A, P ). Let g : R → R be a B(R)-measurable function.
(d) If either ∫_R g dPX = ±∞ or ∫ (g ◦ X)dP = ±∞, then
\[ E(g(X)) = \int (g \circ X)\,dP = \int_{\mathbb{R}} g\,dP_X. \]
Thus g ◦ X is A-measurable.
(b): First suppose that g is an indicator function of a set B ∈ B(R). Then g ◦ X is the indicator
function of the set X −1 (B) and
\[ \int g\,dP_X = \int 1_B\,dP_X = P_X(B) = P(X^{-1}(B)) = \int 1_{X^{-1}(B)}\,dP = \int g \circ X\,dP. \]
In summary,
\[ \int g\,dP_X = \int g \circ X\,dP. \]
Using the linearity of the integral, we can see that this equality holds if g is any non-negative
simple B(R)-measurable function. By the monotone convergence theorem, this equality also holds if g is
any non-negative B(R)-measurable function. By applying the equality to the positive and negative
parts of g and noting that (g ◦ X)+ = g + ◦ X and (g ◦ X)− = g − ◦ X, we have
\[ \int g^+\,dP_X = \int (g \circ X)^+\,dP, \qquad \int g^-\,dP_X = \int (g \circ X)^-\,dP. \]
So we see that g is PX -integrable iff both ∫ g + dPX and ∫ g − dPX are finite iff both ∫ (g ◦ X)+ dP
and ∫ (g ◦ X)− dP are finite iff g ◦ X is P -integrable. This proves (b).
(c): The first equality is just the definition of expectation. Based on what we proved above, if g is
PX -integrable, then
\[ \int g\,dP_X = \int g^+\,dP_X - \int g^-\,dP_X = \int (g \circ X)^+\,dP - \int (g \circ X)^-\,dP = \int g \circ X\,dP. \]
(d): If ∫ g dPX = ±∞, then one of ∫ g + dPX and ∫ g − dPX is infinite and one is finite, and we can
Remark. Some authors will use the notation ∫ g(x)dFX (x). This is just another notation for
∫ g(x)dPX (x).
Corollary 36.3. If X is a random variable on a probability space (Ω, A, P ), then
\[ E(X) = \int x\,dP_X. \]
Theorem 36.4. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
Theorem 36.5. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
Let g : R → R be a B(R)-measurable function.
(b): By repeating the argument in (a) with λ replaced by the counting measure µc , we get
\[ E(g(X)) = \int_{\mathbb{R}} g(x) f(x)\,d\mu_c(x) = \sum_{s \in S} g(s) f(s). \]
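The discrete formula can be checked against the definition E(g(X)) = ∫ (g ◦ X) dP on the two-coin example. In the Python sketch below, the choice g(x) = (x − 1)² and all variable names are illustrative assumptions; on a finite sample space the integral with respect to P reduces to a weighted sum over outcomes.

```python
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]
P_point = {w: Fraction(1, 4) for w in omega}

def X(w):
    """Number of heads in the outcome w."""
    return w.count("H")

def g(x):
    """Illustrative Borel function: squared distance from 1."""
    return (x - 1) ** 2

# Left-hand side: integrate g(X) over the sample space with respect to P.
lhs = sum((g(X(w)) * P_point[w] for w in omega), Fraction(0))

# Right-hand side: sum g(s) f(s) over the support S, where f is the pmf of X.
pmf = {}
for w in omega:
    pmf[X(w)] = pmf.get(X(w), Fraction(0)) + P_point[w]
rhs = sum((g(s) * p for s, p in pmf.items()), Fraction(0))

print(lhs, rhs)
```

Both computations give the same value, which here is the variance of the number of heads since E(X) = 1.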