231A Lecture Notes v23

This document provides lecture notes on real analysis and the Lebesgue theory of integration. It begins with an introduction comparing Riemann and Lebesgue integration. The Riemann integral is restricted to functions on intervals, while the Lebesgue integral allows integrating functions defined on arbitrary sets using measures. The Lebesgue integral also defines integration for a larger class of functions and has stronger convergence theorems. However, pointwise convergence is not sufficient for the Riemann integral, as shown through counterexamples. The notes then cover the topics of measures, measurable functions, the Lebesgue integral, and Lp spaces.

Uploaded by

Sweety Mascarenhas

Lecture Notes - MATH 231A - Real Analysis

Kyle Hambrook

February 5, 2020

Contents

1 Introduction: Riemann to Lebesgue

2 Preliminaries: Set Theory, Extended Real Numbers, Limsup and Liminf, Infinite Series

3 σ-Algebras

4 Measurable Functions

5 Measures

6 Simple Functions

7 Lebesgue Integral

8 The Integral in Terms of Simple Functions

9 Monotone Convergence Theorem

10 Additivity of the Integral

11 Fatou’s Lemma

12 Integrable Functions and Absolute Values

13 Dominated Convergence Theorem

14 Interchanging Limits and Derivatives with Integrals

15 Almost Everywhere

16 Complete Measures

17 Completion of Measures (Optional)

18 The Lebesgue Integral Extends the Riemann Integral

19 Lebesgue’s Condition for Riemann Integrability

20 Normed Spaces and Banach Spaces

21 Lp Spaces: Definitions and Basic Properties

22 Lp Spaces: Completeness

23 Lp Spaces: Dense Subspaces (Optional)

24 Counting Measure and ℓp Spaces

25 Complex-Valued Measurable Functions

26 Caratheodory Construction: Measures from Outer Measures

27 Lebesgue Measure and Lebesgue σ-Algebra

28 Product σ-Algebras

29 Monotone Classes

30 Iterated Integrals of Characteristic Functions

31 Product Measures

32 Tonelli’s Theorem and Fubini’s Theorem

33 Lebesgue Measure on Rn

34 Probability Theory: Basic Definitions

35 Theory of Cumulative Distribution Functions

36 Probability Theory: Expected Value

37 Absolutely Continuous and Singular Measures

38 Decomposition of Probability Measures

1 Introduction: Riemann to Lebesgue

You are probably familiar with the Riemann integral from calculus and undergraduate analysis. If
f is a non-negative real-valued function defined on an interval [a, b], then, roughly speaking, the
Riemann integral of f is a limit of Riemann sums
\[
\int_a^b f(x)\,dx = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)(x_i - x_{i-1}),
\]

where $[a, b]$ is partitioned into subintervals $[x_0, x_1], [x_1, x_2], \ldots, [x_{n-1}, x_n]$, we choose sample points $x_i^* \in [x_{i-1}, x_i]$, and $(x_i - x_{i-1})$ is the length of $[x_{i-1}, x_i]$. We say f is Riemann integrable on [a, b]
when its Riemann integral is defined.
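
As a rough numerical illustration of the Riemann sum above, here is a minimal sketch. It assumes equal-width subintervals and midpoint sample points; the function name `riemann_sum` and the test integrand $x^2$ on $[0, 1]$ are illustrative choices, not part of the notes.

```python
def riemann_sum(f, a, b, n):
    """Riemann sum of f on [a, b] using n equal subintervals and midpoint samples."""
    width = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x_prev = a + (i - 1) * width      # x_{i-1}
        x_i = a + i * width               # x_i
        sample = (x_prev + x_i) / 2       # sample point x_i^* in [x_{i-1}, x_i]
        total += f(sample) * (x_i - x_prev)
    return total


def square(x):
    return x * x


# Approximations of the integral of x^2 over [0, 1], whose exact value is 1/3.
approximations = [riemann_sum(square, 0.0, 1.0, n) for n in (10, 100, 1000)]
errors = [abs(value - 1 / 3) for value in approximations]
```

For this smooth integrand the error shrinks as the partition is refined, which is the behaviour the limit in the definition describes.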

In this course, we will study Lebesgue’s theory of integration with respect to a measure. Roughly, a
measure µ on a set X is a function which takes subsets A ⊆ X as inputs and gives non-negative real
numbers µ(A) as outputs. You can think of µ(A) as some general notion of the size of A. The most
important measure is the Lebesgue measure on R, denoted by λ. For each interval [a, b], λ([a, b])
is the length b − a. For other sets A ⊆ R, λ(A) is the length of A (we will see how to define this
later.) There are many other useful measures. If f is a non-negative real-valued function defined
on a set X, then, roughly speaking, the integral of f with respect to µ is
\[
\int_X f(x)\,d\mu(x) = \lim_{n\to\infty} \sum_{i=1}^{n} f(x_i^*)\,\mu(A_i).
\]

Here the set X is partitioned into subsets $A_1, \ldots, A_n$, we choose sample points $x_i^* \in A_i$, and $\mu(A_i)$
is the measure of the set $A_i$. When the integral is defined, we say f is µ-integrable. When µ is the
Lebesgue measure λ on R, the integral is simply called the Lebesgue integral.

Lebesgue’s theory of integration has several advantages over Riemann’s theory.

(1) Lebesgue’s theory allows us to integrate functions defined on arbitrary sets, whereas the Riemann
integral is restricted to functions defined on $\mathbb{R}$ or $\mathbb{R}^d$. This is especially important for certain
applications. For example, Lebesgue’s theory of measure and integration forms the rigorous foundation
for modern probability theory, where the functions are random variables and we integrate
over the sample space of possible outcomes.

(2) The Lebesgue integral (i.e., the integral with respect to Lebesgue measure) is defined for a
larger class of functions on R, though it still agrees with the Riemann integral whenever the latter
is defined. For example, the indicator function of the rationals

\[
1_{\mathbb{Q}}(x) =
\begin{cases}
1 & \text{if } x \in \mathbb{Q} \\
0 & \text{if } x \notin \mathbb{Q}
\end{cases}
\]

is not Riemann integrable on any interval [a, b], but it is Lebesgue integrable on any such interval.
We will see the details later.

(3) Lebesgue’s theory possesses better convergence theorems, which lead to more general and more
elegant results and to more useful spaces of functions.

Here is the type of convergence theorem we are interested in:

Prototype Convergence Theorem. If $(f_n)$ is a sequence of integrable functions defined on [a, b]
and $(f_n)$ converges to f “nicely”, then f is integrable and $\lim_n \int_a^b f_n = \int_a^b f$.

The problem is, of course, to figure out what “nicely” might mean and to define the integral in
such a way that theorems like this one will be widely applicable. These were important unresolved
issues in the late nineteenth century; they arose, for example, in the study of Fourier series. The
Lebesgue theory was developed, in large part, to address this. Let us look at this problem in a bit
more detail.
Definition 1.1. A sequence (fn ) of functions fn : X → R is said to converge pointwise to a
function f : X → R if
\[
\lim_{n\to\infty} f_n(x) = f(x) \quad \text{for each } x \in X.
\]
In this case, we write fn → f pointwise.
Example 1.2. Let $f_n$ be the tent function whose graph is the triangle with vertices $(0, 0)$, $(1/n, n)$, $(2/n, 0)$
and which equals zero elsewhere. Then $f_n \to f = 0$ pointwise, $f_n$ is Riemann integrable on [a, b], and f = 0 is
Riemann integrable on [a, b], but (taking $a \le 0 < b$, so that [a, b] contains the triangle for all large n)
$\lim_n \int_a^b f_n = 1 \neq 0 = \int_a^b f$. So we don’t get the desired conclusion
of the Prototype Convergence Theorem.
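
The behaviour of the tent functions can be checked with exact arithmetic. The sketch below is only an illustration: the helper names `tent` and `tent_integral` are hypothetical, and the integral is computed from the triangle-area formula rather than from a Riemann sum.

```python
from fractions import Fraction


def tent(n, x):
    """Tent function f_n: triangle with vertices (0,0), (1/n,n), (2/n,0); zero elsewhere."""
    x = Fraction(x)
    if 0 <= x <= Fraction(1, n):
        return n * n * x                      # rising edge, slope n^2
    if Fraction(1, n) < x <= Fraction(2, n):
        return n * n * (Fraction(2, n) - x)   # falling edge, slope -n^2
    return Fraction(0)


def tent_integral(n):
    """Exact area under f_n: a triangle with base 2/n and height f_n(1/n) = n."""
    base = Fraction(2, n)
    height = tent(n, Fraction(1, n))
    return base * height / 2


ns = range(1, 50)
integrals = [tent_integral(n) for n in ns]                # every entry equals 1
values_at_fixed_point = [tent(n, Fraction(1, 10)) for n in ns]  # zero once 2/n <= 1/10
peaks = [tent(n, Fraction(1, n)) for n in ns]             # peak height of f_n is n
```

The integrals stay at 1 while the values at any fixed point are eventually 0, and the peak heights grow without bound, so the convergence is pointwise but not uniform.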
Example 1.3. Choose an enumeration Q = {r1 , r2 , r3 , . . .}. Define f = 1Q and define fn by
fn (x) = 1 if x ∈ {r1 , . . . , rn } and fn (x) = 0 otherwise. Then fn → f pointwise, and fn is Riemann
integrable on [a, b], but f is not Riemann integrable on [a, b]. Again we don’t get the desired
conclusion of the Prototype Convergence Theorem. Note that here the sequence (fn ) is quite nice.
Indeed, $(f_n)$ is bounded ($|f_n| \le 1$ for all n) and $(f_n)$ is increasing ($f_n \le f_{n+1}$ for all n). But that is
not enough.
Example 1.4. Later, we will construct a sequence of functions fn : [0, 1] → R such that

• 0 ≤ fn ≤ 1 for each n

• fn is continuous for each n

• (fn ) is an increasing sequence

• (fn ) converges pointwise to a function f on [0, 1].

• f is not Riemann integrable on [0, 1].

This sequence is even nicer than the one in the previous example, but we still don’t get the desired
conclusion of the Prototype Convergence Theorem.

The examples above show that pointwise convergence isn’t good enough for Riemann integration.
By assuming more, we can get some convergence theorems for Riemann integration. We describe
three such theorems below. Unfortunately, they are all somewhat unsatisfactory.
Definition 1.5. A sequence $(f_n)$ of functions $f_n : X \to \mathbb{R}$ is said to converge uniformly to a
function $f : X \to \mathbb{R}$ if
\[
\lim_{n\to\infty} \sup \{ |f_n(x) - f(x)| : x \in X \} = 0.
\]
In this case, we write $f_n \to f$ uniformly.

Note that uniform convergence implies pointwise convergence.

Theorem 1.6. If $f_n$ is Riemann integrable on [a, b] for each n and $f_n \to f$ uniformly on [a, b], then
f is Riemann integrable on [a, b] and $\lim_n \int_a^b f_n = \int_a^b f$.

Theorem 1.7. If $f_n$ is Riemann integrable on [a, b] for each n, $f_n \le f_{n+1}$ for each n, $f_n \to f$
pointwise on [a, b], and f is Riemann integrable on [a, b], then $\lim_n \int_a^b f_n = \int_a^b f$.

Theorem 1.8. If $f_n$ is Riemann integrable on [a, b] for each n, $|f_n| \le M$ for each n, $f_n \to f$
pointwise on [a, b], and f is Riemann integrable on [a, b], then $\lim_n \int_a^b f_n = \int_a^b f$.

The problem with the first of these three theorems is that uniform convergence is too strong. In
many applications, we don’t have it. The problem with the last two theorems is that the Riemann
integrability of f is part of the hypothesis, rather than part of the conclusion.

We will see that the Lebesgue theory gives us very powerful convergence theorems, which lead to
some very impressive results and some very useful function spaces.

2 Preliminaries: Set Theory, Extended Real Numbers, Limsup and Liminf, Infinite Series

Set Theory

Definition 2.1.

• If A and B are sets, their union and intersection are

A ∪ B = {x : x ∈ A or x ∈ B}
A ∩ B = {x : x ∈ A and x ∈ B}

• If $\{A_i : i \in I\}$ is an indexed family of sets, the union and intersection of the family are
\[
\bigcup_{i\in I} A_i = \{x : x \in A_i \text{ for at least one } i \in I\}
\]
\[
\bigcap_{i\in I} A_i = \{x : x \in A_i \text{ for all } i \in I\}
\]

• If A and B are sets, the set difference of A and B (or the relative complement of B
in A) is
A \ B = {x : x ∈ A and x ∉ B}
It is read as “A minus B” or “A take away B”.

• If all the sets in a given context are subsets of a fixed set X, then the complement (or
absolute complement) of a set A ⊆ X is

$A^c$ = X \ A = {x ∈ X : x ∉ A}

Properties of Complement:
Suppose all sets under consideration are subsets of a fixed set X. Let A, B ⊆ X and let {Ai : i ∈ I}
be an indexed family of subsets of X.

• A ∪ Ac = X

• A ∩ Ac = ∅

• ∅c = X

• Xc = ∅

• (Ac )c = A

• A \ B = A ∩ B c = B c \ Ac

• If A ⊆ B, then B c ⊆ Ac

• De Morgan’s Laws:
\[
\Big( \bigcup_{i\in I} A_i \Big)^c = \bigcap_{i\in I} A_i^c
\quad \text{and} \quad
\Big( \bigcap_{i\in I} A_i \Big)^c = \bigcup_{i\in I} A_i^c
\]

In words, De Morgan’s Laws say that the complement of a union is the intersection of the comple-
ments, and the complement of an intersection is the union of the complements.

Properties of Union, Intersection, and Set Difference

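• $B \cup \bigcup_{i\in I} A_i = \bigcup_{i\in I} (B \cup A_i)$

• $B \cap \bigcap_{i\in I} A_i = \bigcap_{i\in I} (B \cap A_i)$

• $B \cup \bigcap_{i\in I} A_i = \bigcap_{i\in I} (B \cup A_i)$

• $B \cap \bigcup_{i\in I} A_i = \bigcup_{i\in I} (B \cap A_i)$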

• A = (A ∩ B) ∪ (A \ B)

• A ∩ B = A \ (A \ B)

• A \ (B ∪ C) = (A \ B) \ C

• (A ∪ B) \ C = (A \ C) ∪ (B \ C)
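
Identities of this kind can be sanity-checked by brute force on a small universe. The following sketch assumes a four-element universe `X` (an arbitrary choice) and tests De Morgan’s laws, the distributive laws, and several of the set-difference identities over every triple of subsets. A finite check is evidence, not a proof.

```python
from itertools import combinations, product

X = frozenset(range(4))  # small fixed universe; all sets below are subsets of X


def complement(A):
    """Complement of A relative to the fixed universe X."""
    return X - A


def all_subsets(universe):
    """List every subset of a finite universe as a frozenset."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def identities_hold(A, B, C):
    """Check several identities from this section for one triple (A, B, C)."""
    return all([
        complement(A | B | C) == complement(A) & complement(B) & complement(C),  # De Morgan
        complement(A & B & C) == complement(A) | complement(B) | complement(C),  # De Morgan
        B | (A & C) == (B | A) & (B | C),      # union distributes over intersection
        B & (A | C) == (B & A) | (B & C),      # intersection distributes over union
        A == (A & B) | (A - B),
        A & B == A - (A - B),
        A - (B | C) == (A - B) - C,
        A - B == A & complement(B) == complement(B) - complement(A),
    ])


subsets = all_subsets(X)
all_hold = all(identities_hold(A, B, C) for A, B, C in product(subsets, repeat=3))
```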

Definition 2.2. For any set X, the power set of X, denoted by P(X), is the collection of all
subsets of X:
P(X) = {A : A ⊆ X}

Definition 2.3. Let X, Y be sets. A function (or map) f : X → Y is a rule that assigns to
each element x ∈ X a unique element f (x) ∈ Y . The sets X and Y are called the domain and
codomain of f . The image (or range) of f is the set

f (X) = {f (x) : x ∈ X} .

If A ⊆ X, the image of A under f is the set

f (A) = {f (x) : x ∈ A}

If B ⊆ Y , the inverse image (or preimage) of B under f is the set

f −1 (B) = {x ∈ X : f (x) ∈ B}

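The inverse image commutes with unions, intersections, set differences, and complements:

• $f^{-1}\big( \bigcup_{i\in I} A_i \big) = \bigcup_{i\in I} f^{-1}(A_i)$

• $f^{-1}\big( \bigcap_{i\in I} A_i \big) = \bigcap_{i\in I} f^{-1}(A_i)$

• $f^{-1}(A \setminus B) = f^{-1}(A) \setminus f^{-1}(B)$

• $f^{-1}(A^c) = \big( f^{-1}(A) \big)^c$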

Remark 2.4. The image is not so well-behaved. See Exercise ???.
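
To make the contrast between inverse images and images concrete, here is a small sketch using the squaring map on a five-point domain. The helper names `image` and `preimage` and the specific sets are illustrative assumptions.

```python
def image(f, A):
    """Image f(A) = {f(x) : x in A}."""
    return {f(x) for x in A}


def preimage(f, domain, B):
    """Inverse image f^{-1}(B) = {x in domain : f(x) in B}."""
    return {x for x in domain if f(x) in B}


def sq(x):
    return x * x


domain = {-2, -1, 0, 1, 2}

# The inverse image commutes with intersection.
B1, B2 = {0, 1}, {1, 4}
pre_of_intersection = preimage(sq, domain, B1 & B2)
intersection_of_pres = preimage(sq, domain, B1) & preimage(sq, domain, B2)

# The image need not: disjoint sets can have overlapping images.
A1, A2 = {-2, -1}, {1, 2}
image_of_intersection = image(sq, A1 & A2)           # image of the empty set
intersection_of_images = image(sq, A1) & image(sq, A2)
```

Here `image_of_intersection` is empty while `intersection_of_images` is `{1, 4}`, because the squaring map is not injective.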

Definition 2.5. If f : X → Y and g : Y → Z, the composition of f and g is the function

g◦f :X →Z

defined by
(g ◦ f )(x) = g(f (x)) for every x ∈ X.

Extended Real Numbers

The set of extended real numbers is the set obtained by adjoining the two symbols −∞ and
+∞ to the set of real numbers. It is denoted by $\overline{\mathbb{R}}$. Thus

\[
\overline{\mathbb{R}} = \mathbb{R} \cup \{-\infty, +\infty\}.
\]

We often write ∞ instead of +∞. We extend the usual ordering on $\mathbb{R}$ to $\overline{\mathbb{R}}$ by declaring that

−∞ < x < ∞ for every x ∈ R

The interval notation is extended in the natural way. For example,

\[
(0, \infty] = (0, \infty) \cup \{\infty\} = \{x \in \overline{\mathbb{R}} : x > 0\} = \{x \in \overline{\mathbb{R}} : 0 < x \le \infty\}.
\]

We extend the usual arithmetic operations on $\mathbb{R}$ to $\overline{\mathbb{R}}$ by declaring that

• x + ∞ = ∞ and x − ∞ = −∞ for all x ∈ R

• ∞ + ∞ = ∞ and −∞ − ∞ = −∞

• x · (±∞) = ±∞ for all x ∈ (0, ∞]

• x · (±∞) = ∓∞ for all x ∈ [−∞, 0)

• 0 · (±∞) = 0

• $\dfrac{x}{\pm\infty} = 0$ for all x ∈ R.

Expressions of the form

\[
\infty - \infty \quad \text{and} \quad \frac{\infty}{\infty}
\]

are left undefined.

In some other areas of mathematics, the products 0 · (±∞) are left undefined, but not here.

The absolute values of −∞ and +∞ are

| − ∞| = | + ∞| = +∞
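
These conventions differ from IEEE floating-point arithmetic in one notable place: in Python, `0 * math.inf` evaluates to NaN rather than 0. The sketch below wraps float arithmetic to follow the conventions of these notes; the function names `ext_mul` and `ext_add` are hypothetical helpers, not a standard API.

```python
import math

INF = math.inf


def ext_mul(x, y):
    """Product on the extended reals with the convention 0 * (+-inf) = 0."""
    if x == 0 or y == 0:
        return 0.0
    return x * y


def ext_add(x, y):
    """Sum on the extended reals; inf + (-inf) is left undefined."""
    if math.isinf(x) and math.isinf(y) and x != y:
        raise ValueError("inf - inf is undefined on the extended reals")
    return x + y


float_product = 0 * INF            # NaN under IEEE rules
convention_product = ext_mul(0, INF)  # 0 under the convention of these notes
negative_times_inf = ext_mul(-2, INF)
finite_plus_inf = ext_add(5, INF)
```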

Supremum and Infimum

Let $A \subseteq \overline{\mathbb{R}}$.

An element $a_0 \in \overline{\mathbb{R}}$ is called the maximum (or greatest element) of A if $a_0 \in A$ and $a \le a_0$ for
every a ∈ A. The maximum of A is denoted by max(A). It is easy to see that the maximum is
unique if it exists, which is why we say “the maximum” instead of “a maximum”. The minimum
(or least element) of A is defined similarly; it is denoted by min(A) and is unique if it exists.

An element $a_0 \in \overline{\mathbb{R}}$ is called an upper bound of A if $a \le a_0$ for every a ∈ A. If A has an upper
bound, it is said to be bounded above. A lower bound of A is defined similarly.

An element $a_0 \in \overline{\mathbb{R}}$ is called the supremum of A if it is the least upper bound of A, i.e., if $a_0$ is
the minimum of the set of all upper bounds of A. Thus $a_0$ is the supremum of A if and only if $a_0$
is an upper bound of A and $a_0 \le a_0'$ for every upper bound $a_0'$ of A. The supremum is denoted
by sup(A). Since minimums are unique if they exist, the supremum is unique if it exists. The
infimum of A is the greatest lower bound of A; it is denoted by inf(A) and is unique if it exists.

If A is non-empty and bounded above in $\mathbb{R}$, the order completeness axiom of the real numbers
implies that A has a supremum sup(A) in $\mathbb{R}$.

If A is non-empty but not bounded above in $\mathbb{R}$, then ∞ is the only upper bound for A, and so
sup(A) = ∞.

If A is empty, then every extended real number is an upper bound for A, and so sup(A) = −∞.

Thus every subset of $\overline{\mathbb{R}}$ has a supremum in $\overline{\mathbb{R}}$. The same goes for infimum.

Limits, Limsup, Liminf

Let (an ) be a sequence in R.

Definition 2.6.

• Let a ∈ R. We write $\lim a_n = a$ (and we say that $(a_n)$ converges to a and that a is the limit
of $(a_n)$) if for every real ε > 0 there exists an N ∈ N such that if n ≥ N then
$|a_n - a| < \varepsilon$.

• We write $\lim a_n = +\infty$ (and we say that $(a_n)$ converges to +∞ and that +∞ is the limit of
$(a_n)$) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then
$a_n > M$.

• We write $\lim a_n = -\infty$ (and we say that $(a_n)$ converges to −∞ and that −∞ is the limit of
$(a_n)$) if for every real M > 0 there exists an N ∈ N such that if n ≥ N then
$a_n < -M$.

Definition 2.7. The limsup (or limit superior) and liminf (or limit inferior) are defined by
\[
\limsup a_n = \inf_{k\ge 1} \Big( \sup_{n\ge k} a_n \Big),
\qquad
\liminf a_n = \sup_{k\ge 1} \Big( \inf_{n\ge k} a_n \Big).
\]

Note that

• The sequence $c_k = \sup_{n\ge k} a_n$ is a decreasing sequence

• The sequence $b_k = \inf_{n\ge k} a_n$ is an increasing sequence

• $\limsup a_n = \inf_{k\ge 1} \big( \sup_{n\ge k} a_n \big) = \lim_k \big( \sup_{n\ge k} a_n \big)$

• $\liminf a_n = \sup_{k\ge 1} \big( \inf_{n\ge k} a_n \big) = \lim_k \big( \inf_{n\ge k} a_n \big)$

• $b_k \le a_k \le c_k$ for all k

• $\liminf a_n \le \limsup a_n$
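
The tail suprema and infima can be explored numerically. The sketch below uses the sequence $a_n = (-1)^n (1 + 1/n)$, chosen here purely for illustration, truncated at `N` terms. For this particular sequence the tail supremum and infimum are attained within two terms of the start of the tail, so the truncated values agree with the true ones for k < N; in general a finite truncation only approximates them.

```python
def a(n):
    """Example sequence a_n = (-1)^n (1 + 1/n), with limsup 1 and liminf -1."""
    return (-1) ** n * (1 + 1 / n)


N = 2000
terms = [a(n) for n in range(1, N + 1)]


def tail_extrema(seq, pick):
    """Return [pick(seq[k:]) for each k], computed in one backward pass."""
    out = []
    running = None
    for value in reversed(seq):
        running = value if running is None else pick(running, value)
        out.append(running)
    return out[::-1]


c = tail_extrema(terms, max)  # c[k-1] approximates c_k = sup_{n >= k} a_n
b = tail_extrema(terms, min)  # b[k-1] approximates b_k = inf_{n >= k} a_n

c_is_decreasing = all(c[k] >= c[k + 1] for k in range(N - 1))
b_is_increasing = all(b[k] <= b[k + 1] for k in range(N - 1))
sandwiched = all(b[k] <= terms[k] <= c[k] for k in range(N))
```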

Theorem 2.8. We have $\limsup a_n = \liminf a_n$ iff $\lim_n a_n = L$ for some $L \in \overline{\mathbb{R}}$, in which case

lim sup an = lim inf an = lim an .

Proof. Exercise.

Infinite Series

Definition 2.9. Let $\sum_{n=1}^{\infty} a_n$ be an infinite series whose terms $a_n$ belong to $\overline{\mathbb{R}}$. The series has a
sum in $\overline{\mathbb{R}}$ (i.e., the series converges in $\overline{\mathbb{R}}$) if both

(a) ∞ and −∞ do not both occur among the terms of $\sum_{n=1}^{\infty} a_n$, and

(b) the sequence $s_k = \sum_{n=1}^{k} a_n$ of partial sums of $\sum_{n=1}^{\infty} a_n$ has a limit in $\overline{\mathbb{R}}$.

The sum of the series $\sum_{n=1}^{\infty} a_n$ is then defined to be the limit of the partial sums:

\[
\sum_{n=1}^{\infty} a_n = \lim_k s_k = \lim_k \sum_{n=1}^{k} a_n.
\]

Remark 2.10. (i) The sum of the series may be −∞ or +∞.

(ii) Note that condition (a) above is needed to guarantee that the undefined expression ∞ − ∞
does not occur in any of the partial sums $s_k = \sum_{n=1}^{k} a_n$.

(iii) If ∞ is one of the terms of the series and −∞ is not, then the sum of the series is ∞. Likewise
if we swap the roles of ∞ and −∞.

3 σ-Algebras

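Definition 3.1. Let X be any set. A σ-algebra on X is a collection $\mathcal{A} \subseteq \mathcal{P}(X)$ with the following
properties.

(a) $\emptyset \in \mathcal{A}$

(b) If $A \in \mathcal{A}$, then $A^c = X \setminus A \in \mathcal{A}$.

(c) If $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}$.

The pair $(X, \mathcal{A})$ is called a measurable space.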

Remark: σ indicates “countable union”

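Theorem 3.2. If $\mathcal{A}$ is a σ-algebra on a set X, then $\mathcal{A}$ has the following properties:

(i) $X \in \mathcal{A}$

(ii) If $A_1, A_2, \ldots \in \mathcal{A}$, then $\bigcap_{i=1}^{\infty} A_i \in \mathcal{A}$.

(iii) If $A_1, \ldots, A_n \in \mathcal{A}$, then $\bigcup_{i=1}^{n} A_i \in \mathcal{A}$.

(iv) If $A_1, \ldots, A_n \in \mathcal{A}$, then $\bigcap_{i=1}^{n} A_i \in \mathcal{A}$.

(v) If $A, B \in \mathcal{A}$, then $A \setminus B = A \cap B^c \in \mathcal{A}$.

Proof. In this proof, (a),(b),(c) refer to the properties in the definition of a σ-algebra.

(i): By (a) and (b), $X = \emptyset^c \in \mathcal{A}$.

(ii): Let $A_1, A_2, \ldots \in \mathcal{A}$. By De Morgan’s laws

\[
\bigcap_{i=1}^{\infty} A_i = \Big( \bigcup_{i=1}^{\infty} A_i^c \Big)^c.
\]

By (b), $A_1^c, A_2^c, \ldots \in \mathcal{A}$. Then, by (c), $\bigcup_{i=1}^{\infty} A_i^c \in \mathcal{A}$. Finally, by (b) again,
$\bigcap_{i=1}^{\infty} A_i = \big( \bigcup_{i=1}^{\infty} A_i^c \big)^c \in \mathcal{A}$.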

(iii): Define Ai = ∅ for i > n and apply (c).

(iv): Define Ai = X for i > n and apply (ii). Alternatively, use De Morgan’s laws and apply (b)
and (iii).

(v): Use (b) and (iv).

Example 3.3. Let X be any set. The following are σ-algebras on X.

(a) P(X)

(b) {∅, X}

(c) {∅, A0 , Ac0 , X} for any fixed A0 ⊆ X.

(d) {A ⊆ X : A is countable or Ac is countable}
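
On a finite set X, closure under countable unions reduces to closure under pairwise unions, so the defining properties can be tested mechanically. The following sketch assumes a finite X; the function name `is_sigma_algebra` and the sample collections are illustrative only.

```python
from itertools import combinations


def is_sigma_algebra(X, collection):
    """Check the sigma-algebra properties for a collection of subsets of a FINITE set X."""
    X = frozenset(X)
    sets = {frozenset(A) for A in collection}
    if any(not A <= X for A in sets):
        return False                      # every member must be a subset of X
    if frozenset() not in sets:
        return False                      # contains the empty set
    if any((X - A) not in sets for A in sets):
        return False                      # closed under complements
    # For finite X, pairwise unions give all finite (hence all countable) unions.
    return all((A | B) in sets for A, B in combinations(sets, 2))


X = {1, 2, 3, 4}
A0 = {1, 2}

example_c = [set(), A0, X - A0, X]                                  # as in item (c)
missing_complement = [set(), {1}, X]                                # {2, 3, 4} is missing
missing_union = [set(), {1}, {2}, {2, 3, 4}, {1, 3, 4}, X]          # {1, 2} is missing

results = [is_sigma_algebra(X, coll) for coll in (example_c, missing_complement, missing_union)]
```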

Theorem 3.4. Let X be any set. The intersection of any collection of σ-algebras on X is a
σ-algebra on X.

Proof. Let $\{\mathcal{A}_j\}_{j\in J}$ be any collection of σ-algebras on X and consider their intersection $\bigcap_{j\in J} \mathcal{A}_j$.
We need to check properties (a), (b), and (c) of the definition of a σ-algebra.

(a): We have $\emptyset \in \mathcal{A}_j$ for every j ∈ J. So $\emptyset \in \bigcap_{j\in J} \mathcal{A}_j$.

(b): Suppose $A \in \bigcap_{j\in J} \mathcal{A}_j$. Then $A \in \mathcal{A}_j$ for every j ∈ J. Since $\mathcal{A}_j$ is a σ-algebra for every j ∈ J,
we have $A^c \in \mathcal{A}_j$ for every j ∈ J. So $A^c \in \bigcap_{j\in J} \mathcal{A}_j$.

(c): Suppose $A_1, A_2, \ldots \in \bigcap_{j\in J} \mathcal{A}_j$. Then $A_1, A_2, \ldots \in \mathcal{A}_j$ for every j ∈ J. Since $\mathcal{A}_j$ is a σ-algebra
for every j ∈ J, we have $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}_j$ for every j ∈ J. So $\bigcup_{i=1}^{\infty} A_i \in \bigcap_{j\in J} \mathcal{A}_j$.

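Definition 3.5. Let X be any set and let $\mathcal{G} \subseteq \mathcal{P}(X)$. The intersection of all σ-algebras on X that
contain $\mathcal{G}$ is denoted by $\sigma(\mathcal{G})$. In other words,
\[
\sigma(\mathcal{G}) = \bigcap \{ \mathcal{A} : \mathcal{A} \text{ is a σ-algebra on } X,\ \mathcal{G} \subseteq \mathcal{A} \}.
\]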

We call σ(G) the σ-algebra generated by G.

The following theorem is immediate from the definition of σ(G).

Theorem 3.6. Let X be any set and let $\mathcal{G} \subseteq \mathcal{P}(X)$. Then $\sigma(\mathcal{G})$ is the smallest σ-algebra on X
that contains $\mathcal{G}$; in other words,

(i) σ(G) is a σ-algebra on X

(ii) G ⊆ σ(G)

(iii) If A is any σ-algebra on X and G ⊆ A, then σ(G) ⊆ A.
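
For a finite set X, the generated σ-algebra can be computed directly by closing the generators under complements and pairwise unions until nothing new appears. This is a sketch under that finiteness assumption (the abstract definition as an intersection of σ-algebras is not computable this way in general); `generate_sigma_algebra` is a hypothetical helper name.

```python
def generate_sigma_algebra(X, generators):
    """Smallest collection containing the generators that is closed under
    complement and union, for a FINITE set X."""
    X = frozenset(X)
    sets = {frozenset(), X} | {frozenset(G) for G in generators}
    changed = True
    while changed:                        # terminates: the power set of X is finite
        changed = False
        current = list(sets)
        for A in current:
            candidates = [X - A] + [A | B for B in current]
            for C in candidates:
                if C not in sets:
                    sets.add(C)
                    changed = True
    return sets


X = {1, 2, 3, 4}
sigma_from_two_points = generate_sigma_algebra(X, [{1}, {2}])   # atoms {1}, {2}, {3, 4}
sigma_from_one_set = generate_sigma_algebra(X, [{1, 2}])        # atoms {1, 2}, {3, 4}
```

With atoms {1}, {2}, {3, 4} the result has $2^3 = 8$ members, and with atoms {1, 2}, {3, 4} it has $2^2 = 4$.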

Definition 3.7.

• An open ball in $\mathbb{R}^d$ is a set of the form $B(x_0, r) = \{x \in \mathbb{R}^d : |x - x_0| < r\}$, where $x_0 \in \mathbb{R}^d$
and 0 < r < ∞. Note that the open balls in $\mathbb{R}$ are the open intervals of the form (a, b) with
−∞ < a < b < ∞. To see this, note $(a, b) = (x_0 - r, x_0 + r) = B(x_0, r)$, where $x_0 = (b + a)/2$
and r = (b − a)/2.

• An open set in $\mathbb{R}^d$ is a union of open balls.

Definition 3.8. Suppose X is R or Rd (or any topological space). The Borel σ-algebra on X,
denoted by B(X), is the σ-algebra generated by the collection of all open subsets of X. In other
words, if T denotes the collection of all open subsets of X, then B(X) = σ(T ). The elements of
B(X) are called Borel sets in X.

Theorem 3.9. The Borel σ-algebra $\mathcal{B}(\mathbb{R}^d)$ is generated by the collection of all open balls in $\mathbb{R}^d$.
In other words, if $\mathcal{G}$ is the collection of all open balls in $\mathbb{R}^d$, then $\mathcal{B}(\mathbb{R}^d) = \sigma(\mathcal{G})$.

Proof. Let $\mathcal{T}$ be the collection of all open sets in $\mathbb{R}^d$. Since $\mathcal{B}(\mathbb{R}^d) = \sigma(\mathcal{T})$, we must show $\sigma(\mathcal{G}) = \sigma(\mathcal{T})$.
Since $\mathcal{G} \subseteq \mathcal{T}$, we have $\sigma(\mathcal{G}) \subseteq \sigma(\mathcal{T})$ by Theorem 3.6. It remains to show $\sigma(\mathcal{T}) \subseteq \sigma(\mathcal{G})$. If
$A \in \mathcal{T}$, then A is a union of open balls in $\mathbb{R}^d$. So for each point x ∈ A, there is an open ball
$B_x$ that contains x and is contained in A. Now for each point x ∈ A choose an open ball (1) that
contains x, (2) that is contained in $B_x$, (3) whose center is a point with rational coordinates, and
(4) whose radius is rational. In this way, A is written as a union of balls whose centers have
rational coordinates and whose radii are rational. Since there are only countably many such balls,
A is written as a union of a countable collection of open balls. Hence $A \in \sigma(\mathcal{G})$. Thus $\mathcal{T} \subseteq \sigma(\mathcal{G})$.
Therefore $\sigma(\mathcal{T}) \subseteq \sigma(\mathcal{G})$.

Theorem 3.10. The Borel σ-algebra B(R) is generated by each of the following collections of sets:

(i) G1 = {(a, b) : a, b ∈ R, a < b}

(ii) G2 = {(a, b] : a, b ∈ R, a < b}

(iii) G3 = {(−∞, b] : b ∈ R}

(iv) G4 = {(−∞, b) : b ∈ R}

(v) G5 = {[a, b] : a, b ∈ R, a < b}

(vi) G6 = {[a, b) : a, b ∈ R, a < b}

(vii) G7 = {[a, ∞) : a ∈ R}

(viii) G8 = {(a, ∞) : a ∈ R}

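Proof. (i): The previous theorem implies $\mathcal{G}_1$ generates $\mathcal{B}(\mathbb{R})$, i.e., $\sigma(\mathcal{G}_1) = \mathcal{B}(\mathbb{R})$.

(ii): By writing
\[
(a, b] = \bigcap_{n=1}^{\infty} (a, b + 1/n)
\]
we see that $\mathcal{G}_2 \subseteq \sigma(\mathcal{G}_1)$, hence $\sigma(\mathcal{G}_2) \subseteq \sigma(\mathcal{G}_1)$. By writing
\[
(a, b) = \bigcup_{n=1}^{\infty} (a, b - 1/n]
\]
we see that $\mathcal{G}_1 \subseteq \sigma(\mathcal{G}_2)$, hence $\sigma(\mathcal{G}_1) \subseteq \sigma(\mathcal{G}_2)$. This proves $\mathcal{G}_2$ generates $\mathcal{B}(\mathbb{R})$.

(iii): By writing
\[
(-\infty, b] = \bigcup_{n=1}^{\infty} (-n, b]
\]
we see that $\mathcal{G}_3 \subseteq \sigma(\mathcal{G}_2)$, hence $\sigma(\mathcal{G}_3) \subseteq \sigma(\mathcal{G}_2)$. By writing
\[
(a, b] = (-\infty, b] \cap (a, \infty) = (-\infty, b] \cap (-\infty, a]^c
\]
we see that $\mathcal{G}_2 \subseteq \sigma(\mathcal{G}_3)$, hence $\sigma(\mathcal{G}_2) \subseteq \sigma(\mathcal{G}_3)$. This proves $\mathcal{G}_3$ generates $\mathcal{B}(\mathbb{R})$.

(iv): By writing
\[
(-\infty, b) = \bigcup_{n=1}^{\infty} (-\infty, b - 1/n]
\]
we see that $\mathcal{G}_4 \subseteq \sigma(\mathcal{G}_3)$, hence $\sigma(\mathcal{G}_4) \subseteq \sigma(\mathcal{G}_3)$. By writing
\[
(-\infty, b] = \bigcap_{n=1}^{\infty} (-\infty, b + 1/n)
\]
we see that $\mathcal{G}_3 \subseteq \sigma(\mathcal{G}_4)$, hence $\sigma(\mathcal{G}_3) \subseteq \sigma(\mathcal{G}_4)$. This proves $\mathcal{G}_4$ generates $\mathcal{B}(\mathbb{R})$.

(v),(vi),(vii),(viii): Similar.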

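Definition 3.11. Let $\mathcal{B}(\overline{\mathbb{R}})$ denote the collection of all sets of the form $A$, $A \cup \{-\infty\}$, $A \cup \{\infty\}$,
$A \cup \{-\infty, \infty\}$, where A is a Borel set in $\mathbb{R}$. It is straightforward to check that $\mathcal{B}(\overline{\mathbb{R}})$ is a σ-algebra
on $\overline{\mathbb{R}}$. We call $\mathcal{B}(\overline{\mathbb{R}})$ the Borel σ-algebra on $\overline{\mathbb{R}}$.

Remark 3.12. It is possible to define a collection of open sets in $\overline{\mathbb{R}}$ and to define the Borel σ-algebra
on $\overline{\mathbb{R}}$ as the σ-algebra generated by the open sets in $\overline{\mathbb{R}}$. See Exercise X???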

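Theorem 3.13. The Borel σ-algebra $\mathcal{B}(\overline{\mathbb{R}})$ is generated by each of the following collections of sets:

(i) $\mathcal{G}_3' = \{[-\infty, b] : b \in \mathbb{R}\}$

(ii) $\mathcal{G}_4' = \{[-\infty, b) : b \in \mathbb{R}\}$

(iii) $\mathcal{G}_7' = \{[a, \infty] : a \in \mathbb{R}\}$

(iv) $\mathcal{G}_8' = \{(a, \infty] : a \in \mathbb{R}\}$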

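Proof. (i): By Definition 3.11 (since $[-\infty, b] = (-\infty, b] \cup \{-\infty\}$), $\mathcal{G}_3' \subseteq \mathcal{B}(\overline{\mathbb{R}})$, hence $\sigma(\mathcal{G}_3') \subseteq \mathcal{B}(\overline{\mathbb{R}})$.
Now we show $\mathcal{B}(\overline{\mathbb{R}}) \subseteq \sigma(\mathcal{G}_3')$.
Let $B \in \mathcal{B}(\overline{\mathbb{R}})$. Then B equals one of $A$, $A \cup \{-\infty\}$, $A \cup \{\infty\}$, or $A \cup \{-\infty, \infty\}$, where $A \in \mathcal{B}(\mathbb{R})$.
To show that $B \in \sigma(\mathcal{G}_3')$, it suffices to show that $\{-\infty\}, \{\infty\}, A \in \sigma(\mathcal{G}_3')$. By writing
\[
\{-\infty\} = \bigcap_{n=1}^{\infty} [-\infty, -n]
\]
we see that $\{-\infty\} \in \sigma(\mathcal{G}_3')$. By writing
\[
\{\infty\} = \bigcap_{n=1}^{\infty} (n, \infty] = \bigcap_{n=1}^{\infty} [-\infty, n]^c
\]
we see that $\{\infty\} \in \sigma(\mathcal{G}_3')$. Recall the definition of $\mathcal{G}_5$ from the previous theorem. By writing
\[
[a, b] = [-\infty, b] \cap [-\infty, a)^c, \quad \text{where } [-\infty, a) = \bigcup_{n=1}^{\infty} [-\infty, a - 1/n] \in \sigma(\mathcal{G}_3'),
\]
we see that $\mathcal{G}_5 \subseteq \sigma(\mathcal{G}_3')$, and hence $\sigma(\mathcal{G}_5) \subseteq \sigma(\mathcal{G}_3')$. By the previous theorem, $\mathcal{B}(\mathbb{R}) = \sigma(\mathcal{G}_5)$. So we
have $A \in \mathcal{B}(\mathbb{R}) = \sigma(\mathcal{G}_5) \subseteq \sigma(\mathcal{G}_3')$. Therefore $B \in \sigma(\mathcal{G}_3')$ and we conclude $\mathcal{B}(\overline{\mathbb{R}}) \subseteq \sigma(\mathcal{G}_3')$.

(ii),(iii),(iv): Similar.
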
4 Measurable Functions

To motivate the definition of a measurable function, we ask the reader to recall the following
theorem from undergraduate analysis about continuous functions.

Theorem 4.1. Let f : R → R. Then f is continuous iff

$f^{-1}(V)$ is an open set in R for every open set V in R.

If $\mathcal{T}$ is the collection of open sets in R, we can rewrite this as: f is continuous iff

$f^{-1}(V) \in \mathcal{T}$ for every $V \in \mathcal{T}$.

The reader who has studied topology may recall the following more general theorem.

Theorem 4.2. Let X and Y be topological spaces and f : X → Y . Then f is continuous iff

$f^{-1}(V)$ is an open set in X for every open set V in Y.

If $\mathcal{T}_X$ is the collection of open sets in X and $\mathcal{T}_Y$ is the collection of open sets in Y , we can rewrite
this as: f is continuous iff
$f^{-1}(V) \in \mathcal{T}_X$ for every $V \in \mathcal{T}_Y$.

In summary, a continuous function is one whose inverse image preserves open sets.

Now we state the definition of a measurable function.

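Definition 4.3. Let f : X → Y . Let $\mathcal{A}$ be a σ-algebra on X. Let $\mathcal{B}$ be a σ-algebra on Y . We say
that f is measurable (or $(\mathcal{A}, \mathcal{B})$-measurable to avoid ambiguity) if

$f^{-1}(B) \in \mathcal{A}$ for every $B \in \mathcal{B}$.

Now we turn to some special cases. If $f : X \to \overline{\mathbb{R}}$, the σ-algebra on $\overline{\mathbb{R}}$ is always assumed to
be the Borel σ-algebra $\mathcal{B}(\overline{\mathbb{R}})$, i.e., $(Y, \mathcal{B}) = (\overline{\mathbb{R}}, \mathcal{B}(\overline{\mathbb{R}}))$. We say $f : X \to \overline{\mathbb{R}}$ is measurable (or
$\mathcal{A}$-measurable to avoid ambiguity) if

$f^{-1}(B) \in \mathcal{A}$ for every $B \in \mathcal{B}(\overline{\mathbb{R}})$.

If $f : \mathbb{R}^d \to \overline{\mathbb{R}}$, we sometimes (but not always) consider the Borel σ-algebra on the domain $\mathbb{R}^d$. We
say $f : \mathbb{R}^d \to \overline{\mathbb{R}}$ is Borel measurable (or $\mathcal{B}(\mathbb{R}^d)$-measurable) if

$f^{-1}(B) \in \mathcal{B}(\mathbb{R}^d)$ for every $B \in \mathcal{B}(\overline{\mathbb{R}})$.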

The next theorem says that to show a function f is measurable we only need to check f −1 (B) for
B in a generating collection.

Theorem 4.4. Let (X, A) and (Y, B) be measurable spaces. Let f : X → Y . Let G be any
collection of subsets of Y that generates B, i.e., σ(G) = B. Then f is measurable iff

f −1 (B) ∈ A for every B ∈ G.

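Proof. ⇒: Since $\mathcal{G} \subseteq \sigma(\mathcal{G}) = \mathcal{B}$, if

$f^{-1}(B) \in \mathcal{A}$ for every $B \in \mathcal{B}$,

then

$f^{-1}(B) \in \mathcal{A}$ for every $B \in \mathcal{G}$.

⇐: It is readily verified that $\{B \subseteq Y : f^{-1}(B) \in \mathcal{A}\}$ is a σ-algebra. If

$f^{-1}(B) \in \mathcal{A}$ for every $B \in \mathcal{G}$,

then $\{B \subseteq Y : f^{-1}(B) \in \mathcal{A}\}$ contains $\mathcal{G}$. Thus $\mathcal{B} = \sigma(\mathcal{G}) \subseteq \{B \subseteq Y : f^{-1}(B) \in \mathcal{A}\}$. So $f^{-1}(B) \in \mathcal{A}$
for every $B \in \mathcal{B}$. Therefore f is measurable.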

By combining the above theorem and Theorem 3.13 (which gives generating sets for $\mathcal{B}(\overline{\mathbb{R}})$), we
obtain:

Theorem 4.5. Let $(X, \mathcal{A})$ be a measurable space and let $f : X \to \overline{\mathbb{R}}$. Then the following are
equivalent:

(i) f is measurable

(ii) $\{x \in X : f(x) < b\} = f^{-1}([-\infty, b)) \in \mathcal{A}$ for every b ∈ R

(iii) $\{x \in X : f(x) \le b\} = f^{-1}([-\infty, b]) \in \mathcal{A}$ for every b ∈ R

(iv) $\{x \in X : f(x) > a\} = f^{-1}((a, \infty]) \in \mathcal{A}$ for every a ∈ R

(v) $\{x \in X : f(x) \ge a\} = f^{-1}([a, \infty]) \in \mathcal{A}$ for every a ∈ R

Remark 4.6. When verifying a function f : X → R is measurable, we usually use the theorem
above, rather than the definition.
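
On a finite measurable space the criterion "{f > a} is measurable for every real a" can be checked directly. The sketch below relies on one observation that is an assumption of the finite setting: when f takes finitely many values, the set {f > a} only changes as a crosses a value of f, so it is enough to test thresholds in the range of f (the remaining case, a below the minimum, gives X itself). The name `is_measurable` and the example σ-algebra are illustrative.

```python
def is_measurable(f, X, sigma_algebra):
    """Test {x : f(x) > a} in the sigma-algebra for every threshold a in the range of f.
    Assumes X is finite and X itself belongs to sigma_algebra."""
    sets = {frozenset(A) for A in sigma_algebra}
    thresholds = {f(x) for x in X}
    return all(frozenset(x for x in X if f(x) > a) in sets for a in thresholds)


X = {1, 2, 3, 4}
sigma_algebra = [set(), {1, 2}, {3, 4}, X]   # generated by the single set {1, 2}


def step(x):
    """Constant on {1, 2} and on {3, 4}."""
    return 0 if x <= 2 else 1


def identity(x):
    """Separates the points 1 and 2, which the sigma-algebra cannot distinguish."""
    return x


step_is_measurable = is_measurable(step, X, sigma_algebra)
identity_is_measurable = is_measurable(identity, X, sigma_algebra)
```

The step function is measurable because its superlevel sets are ∅, {3, 4}, and X, whereas for the identity the set {x : x > 1} = {2, 3, 4} is not in the σ-algebra.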

Theorem 4.7. Let $(X, \mathcal{A})$ be a measurable space and let $f : X \to \overline{\mathbb{R}}$. If f is measurable, then
$\{x \in X : f(x) = c\} = f^{-1}(\{c\}) \in \mathcal{A}$ for every $c \in \overline{\mathbb{R}}$.

Proof.
$\{x \in X : f(x) = c\} = \{x \in X : f(x) \le c\} \cap \{x \in X : f(x) \ge c\} \in \mathcal{A}$.

Notation. For brevity, we write

$\{f < b\} = \{x \in X : f(x) < b\} = f^{-1}([-\infty, b))$,

$\{f = c\} = \{x \in X : f(x) = c\} = f^{-1}(\{c\})$,
and so on.

Now we give some examples of measurable functions.

Theorem 4.8. Let $(X, \mathcal{A})$ be a measurable space. Let $f : X \to \overline{\mathbb{R}}$. If f is constant, then f is
measurable.

Proof. Assume f is constant. This means there exists a $c \in \overline{\mathbb{R}}$ such that f(x) = c for all x ∈ X.
Let a ∈ R. Then {x ∈ X : f(x) ≥ a} = ∅ if a > c, and {x ∈ X : f(x) ≥ a} = X if a ≤ c. Either
way, $\{x \in X : f(x) \ge a\} \in \mathcal{A}$. Thus f is measurable.

Theorem 4.9. Let (X, A) be a measurable space. Let A ⊆ X. Then A is measurable (i.e., A ∈ A)
iff the indicator function 1A is measurable.

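Proof. ⇒: Assume $A \in \mathcal{A}$. For every a ∈ R,

\[
\{x \in X : 1_A(x) \ge a\} =
\begin{cases}
\emptyset & \text{if } a > 1 \\
A & \text{if } 0 < a \le 1 \\
X & \text{if } a \le 0
\end{cases}
\in \mathcal{A}.
\]

So $1_A$ is measurable.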

⇐: Assume 1A is measurable. Then A = {x ∈ X : 1A (x) > 0} ∈ A.

Theorem 4.10. Let f : R → R. If f is continuous, then f is Borel measurable.

Proof. Since f is continuous, $f^{-1}(G)$ is open for every open set G ⊆ R. So $\{x \in \mathbb{R} : f(x) < a\} =
f^{-1}((-\infty, a))$ is open for every a ∈ R. Therefore $\{x \in \mathbb{R} : f(x) < a\} = f^{-1}((-\infty, a)) \in \mathcal{B}(\mathbb{R})$ for
every a ∈ R.

Theorem 4.11. Let f : R → R. If f is increasing or decreasing, then f is Borel measurable.

Proof. Exercise.

Theorem 4.12. Let $(X, \mathcal{A})$, $(Y, \mathcal{B})$, $(Z, \mathcal{C})$ be measurable spaces. Let f : X → Y and g : Y → Z.
If f is $(\mathcal{A}, \mathcal{B})$-measurable and g is $(\mathcal{B}, \mathcal{C})$-measurable, then the function
g ◦ f : X → Z is $(\mathcal{A}, \mathcal{C})$-measurable.

Proof. Exercise.

Theorem 4.13. Let (X, A) be a measurable space. Let f, g : X → R be measurable functions.

Then (i) {f < g}, (ii) {f ≤ g}, and (iii) {f = g} are measurable sets.

Proof. (i): For each fixed x ∈ X, we have f(x) < g(x) iff there is an r ∈ Q such that f(x) < r <
g(x). Thus
\[
\{x \in X : f(x) < g(x)\} = \bigcup_{r\in\mathbb{Q}} \big( \{x \in X : f(x) < r\} \cap \{x \in X : r < g(x)\} \big).
\]

(ii): $\{f \le g\} = \{g < f\}^c$

(iii): $\{f = g\} = \{f \le g\} \cap \{f \ge g\}$

Theorem 4.14. Let (X, A) be a measurable space. Let f, g : X → R be measurable functions.
Then the following functions are measurable:

(i) f + g, if f(x) + g(x) is never of the form ∞ − ∞ or −∞ + ∞ for x ∈ X.

(ii) f g

(iii) f /g, if g(x) ≠ 0 for every x ∈ X.

Proof. Let a ∈ R be arbitrary.

(i):
\[
\{f + g < a\} = \{f < -g + a\}
= \bigcup_{r\in\mathbb{Q}} \big( \{f < r\} \cap \{r < -g + a\} \big)
= \bigcup_{r\in\mathbb{Q}} \big( \{f < r\} \cap \{g < -r + a\} \big) \in \mathcal{A}.
\]

(ii): $\{fg < a\} = A_0 \cup A_1 \cup A_2 \cup A_3 \cup A_4$, where the $A_i$ are as follows:

\[
\begin{aligned}
A_0 &= (\{f = 0\} \cup \{g = 0\}) \cap \{fg < a\} \\
A_1 &= \{f > 0\} \cap \{g > 0\} \cap \{fg < a\} \\
A_2 &= \{f > 0\} \cap \{g < 0\} \cap \{fg < a\} \\
A_3 &= \{f < 0\} \cap \{g > 0\} \cap \{fg < a\} \\
A_4 &= \{f < 0\} \cap \{g < 0\} \cap \{fg < a\}
\end{aligned}
\]

We have:

\[
A_0 = (\{f = 0\} \cup \{g = 0\}) \cap \{fg < a\} =
\begin{cases}
\emptyset & \text{if } a \le 0 \\
\{f = 0\} \cup \{g = 0\} & \text{if } a > 0
\end{cases}
\in \mathcal{A}
\]

\[
\begin{aligned}
A_1 &= \{f > 0\} \cap \{g > 0\} \cap \{fg < a\} \\
&= \{f > 0\} \cap \{g > 0\} \cap \{f < a/g\} \\
&= \{f > 0\} \cap \{g > 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f < r\} \cap \{r < a/g\} \big) \\
&= \{f > 0\} \cap \{g > 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f < r\} \cap \{g < a/r\} \big) \in \mathcal{A}
\end{aligned}
\]

\[
\begin{aligned}
A_2 &= \{f > 0\} \cap \{g < 0\} \cap \{fg < a\} \\
&= \{f > 0\} \cap \{g < 0\} \cap \{f(-g) > -a\} \\
&= \{f > 0\} \cap \{g < 0\} \cap \{f > (-a)/(-g)\} \\
&= \{f > 0\} \cap \{g < 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > r\} \cap \{r > (-a)/(-g)\} \big) \\
&= \{f > 0\} \cap \{g < 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > r\} \cap \{-g > (-a)/r\} \big) \\
&= \{f > 0\} \cap \{g < 0\} \cap \bigcup_{r\in\mathbb{Q},\, r>0} \big( \{f > r\} \cap \{g < a/r\} \big) \in \mathcal{A}
\end{aligned}
\]

Similarly, $A_3, A_4 \in \mathcal{A}$. Therefore $\{fg < a\} \in \mathcal{A}$.

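(iii): Since g(x) ≠ 0 for every x, we have $\{f/g < a\} = (\{g > 0\} \cap \{f < ag\}) \cup (\{g < 0\} \cap \{f > ag\}) \in \mathcal{A}$ by (ii) (which shows ag is measurable) and the previous theorem.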

Theorem 4.15. Let (X, A) be a measurable space. Let f, g : X → R be measurable functions.

The following functions are measurable:

(i) max {f, g}

(ii) min {f, g}

(iii) |f |

Proof. (i): $\{\max\{f, g\} > a\} = \{f > a\} \cup \{g > a\} \in \mathcal{A}$

(ii): $\{\min\{f, g\} > a\} = \{f > a\} \cap \{g > a\} \in \mathcal{A}$

(iii): $\{|f| < a\} = \{-a < f < a\} = \{f > -a\} \cap \{f < a\} \in \mathcal{A}$. Alternatively, write |f | = max(f, −f )
and use (i), noting that −f is measurable by part (ii) of the previous theorem.
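
The set identities used for max and min can be spot-checked on a finite sample. This sketch uses two arbitrary integer-valued functions on a small grid and a handful of thresholds; all names and values are illustrative assumptions.

```python
X = range(-5, 6)


def f(x):
    return x * x - 4


def g(x):
    return 3 - x


def superlevel(h, a):
    """The set {x in X : h(x) > a}."""
    return {x for x in X if h(x) > a}


def pointwise_max(x):
    return max(f(x), g(x))


def pointwise_min(x):
    return min(f(x), g(x))


thresholds = [-10, -4, -1, 0, 0.5, 2, 7, 30]

# {max{f, g} > a} = {f > a} ∪ {g > a}
max_identity_holds = all(
    superlevel(pointwise_max, a) == superlevel(f, a) | superlevel(g, a) for a in thresholds
)
# {min{f, g} > a} = {f > a} ∩ {g > a}
min_identity_holds = all(
    superlevel(pointwise_min, a) == superlevel(f, a) & superlevel(g, a) for a in thresholds
)
```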

Exercise Prove or Disprove: If |f | is measurable, then f is measurable.

Theorem 4.16. Let (X, A) be a measurable space. Let (fn ) be a sequence of measurable functions
X → [−∞, ∞]. The following functions are measurable:

(i) supn fn

(ii) inf n fn

(iii) lim sup fn = inf k≥1 (supn≥k fn )

(iv) lim inf fn = supk≥1 (inf n≥k fn )

Moreover, if (fn (x)) converges to an extended real number for each x ∈ X, then limn fn =
lim supn fn = lim inf n fn , and so limn fn is measurable.

Proof. (i): For each a ∈ R,

\[
\Big\{ \sup_n f_n > a \Big\} = \Big\{ x \in X : \sup_n f_n(x) > a \Big\} = \bigcup_{n=1}^{\infty} \{x \in X : f_n(x) > a\} \in \mathcal{A}.
\]

(ii): Similar to (i).

(iii): By (i), gk = supn≥k fn is measurable for each k ∈ N. By (ii), lim sup fn = inf k≥1 gk is
measurable.

(iv): Similar to (iii).

5 Measures

Definition 5.1. Let $\mathcal{A}$ be a σ-algebra on a set X. A measure on $\mathcal{A}$ is a function $\mu : \mathcal{A} \to [0, \infty]$
that has the following properties:

(a) µ(∅) = 0

(b) (Countable Additivity) If $E_1, E_2, \ldots$ is a sequence of disjoint sets in $\mathcal{A}$, then
$\mu\big( \bigcup_{i=1}^{\infty} E_i \big) = \sum_{i=1}^{\infty} \mu(E_i)$.

Theorem 5.2 (Properties of Measure). Let A be a σ-algebra on a set X. If µ is a measure on A,

then µ has the following properties:

(i) (Finite Additivity) If A, B are disjoint sets in A, then µ(A ∪ B) = µ(A) + µ(B).

(ii) (Differencing) If A, B ∈ A, A ⊆ B, and µ(A) < ∞, then µ(B \ A) = µ(B) − µ(A).

(iii) (Monotonicity) If A, B ∈ A and A ⊆ B, then µ(A) ≤ µ(B).

(iv) (Countable Subadditivity) If A1 , A2 , . . . ∈ A, then $\mu\big(\bigcup_{i=1}^{\infty} A_i\big) \le \sum_{i=1}^{\infty} \mu(A_i)$.

(v) (Continuity From Below) If A1 , A2 , . . . ∈ A and A1 ⊆ A2 ⊆ . . ., then $\mu\big(\bigcup_{i=1}^{\infty} A_i\big) = \lim_n \mu(A_n)$.

(vi) (Continuity From Above) If A1 , A2 , . . . ∈ A, A1 ⊇ A2 ⊇ . . ., and µ(A1 ) < ∞, then $\mu\big(\bigcap_{i=1}^{\infty} A_i\big) = \lim_n \mu(A_n)$.

Proof. (i): Define A1 = A, A2 = B, and Ai = ∅ for i > 2 and use countable additivity.

(ii): Since B = (B ∩ A) ∪ (B \ A) = A ∪ (B \ A), finite additivity gives

µ(B) = µ(A) + µ(B \ A).

Since µ(A) < ∞, we can subtract it from both sides to get the desired result.

(iii): Since B = (B ∩ A) ∪ (B \ A) = A ∪ (B \ A), finite additivity and the fact that µ(B \ A) ≥ 0
give
µ(B) = µ(A) + µ(B \ A) ≥ µ(A).

(iv): Define B1 = A1 , B2 = A2 \ A1 , B3 = A3 \ (A1 ∪ A2 ), and so on. Then B1 , B2 , . . . is a sequence
of disjoint sets in A such that $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i$ and Bi ⊆ Ai for all i ∈ N. Therefore, by countable
additivity and monotonicity,
\[
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu(B_i) \le \sum_{i=1}^{\infty} \mu(A_i).
\]

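(v): Define B1 = A1 , B2 = A2 \ A1 , B3 = A3 \ A2 , and so on. Then B1 , B2 , . . . is a sequence of
disjoint sets in A such that $\bigcup_{i=1}^{\infty} B_i = \bigcup_{i=1}^{\infty} A_i$ and $\bigcup_{i=1}^{n} B_i = \bigcup_{i=1}^{n} A_i = A_n$ for all n ∈ N. Therefore,
by countable additivity and finite additivity,
\[
\begin{aligned}
\mu\Big(\bigcup_{i=1}^{\infty} A_i\Big) &= \mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \sum_{i=1}^{\infty} \mu(B_i) \\
&= \lim_{n\to\infty} \sum_{i=1}^{n} \mu(B_i) = \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} B_i\Big) \\
&= \lim_{n\to\infty} \mu\Big(\bigcup_{i=1}^{n} A_i\Big) = \lim_{n\to\infty} \mu(A_n).
\end{aligned}
\]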

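(vi): Define B1 = A1 \ A1 = ∅, B2 = A1 \ A2 , B3 = A1 \ A3 , and so on. Then B1 , B2 , . . . ∈ A and
B1 ⊆ B2 ⊆ . . .. By (v),
\[
\mu\Big(\bigcup_{i=1}^{\infty} B_i\Big) = \lim_{n\to\infty} \mu(B_n).
\]
Since $\bigcup_{i=1}^{\infty} B_i = A_1 \setminus \bigcap_{i=1}^{\infty} A_i$ and Bn = A1 \ An , we have
\[
\mu\Big(A_1 \setminus \bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} \mu(A_1 \setminus A_n).
\]
By monotonicity, $\mu\big(\bigcap_{i=1}^{\infty} A_i\big) \le \mu(A_n) \le \mu(A_1) < \infty$. Then (ii) gives
\[
\mu(A_1) - \mu\Big(\bigcap_{i=1}^{\infty} A_i\Big) = \lim_{n\to\infty} \big(\mu(A_1) - \mu(A_n)\big).
\]
Subtracting µ(A1 ) and then multiplying by −1 gives the desired result.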

Corollary 5.3. Let X be any set. If µ is a measure on P(X), then µ is an outer measure on X.

Example 5.4. Let X be any set.

(i) The counting measure on X is the measure µ : P(X) → [0, ∞] defined by µ(A) = ∞ if A is
an infinite subset of X and µ(A) equals the number of elements in A if A is a finite subset of
X.

(ii) Let x0 ∈ X. The Dirac measure at x0 or point mass at x0 is the measure µ : P(X) → [0, ∞]
defined by µ(A) = 0 if x0 ∉ A and µ(A) = 1 if x0 ∈ A.

As an exercise, the reader should verify that these are indeed measures.
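
The following is a minimal sketch of that check on a small finite set, not a proof. It assumes X = {1, 2, 3, 4} and uses the hypothetical helper names `counting_measure`, `dirac_measure`, `powerset` and `is_finitely_additive`. On a finite set, any sequence of disjoint subsets has only finitely many nonempty terms, so countable additivity reduces to µ(∅) = 0 together with finite additivity, which is what the code tests by brute force.

```python
from itertools import combinations


def counting_measure(A):
    """Counting measure of a finite set A: its number of elements."""
    return len(A)


def dirac_measure(x0):
    """Return the Dirac measure at x0 as a function on sets."""
    return lambda A: 1 if x0 in A else 0


def powerset(X):
    """All subsets of the finite set X, as frozensets."""
    items = list(X)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_finitely_additive(mu, X):
    """Check mu(empty set) = 0 and mu(A | B) = mu(A) + mu(B) for disjoint A, B."""
    subsets = powerset(X)
    if mu(frozenset()) != 0:
        return False
    return all(mu(A | B) == mu(A) + mu(B)
               for A in subsets for B in subsets if not (A & B))


X = {1, 2, 3, 4}          # assumed example set
delta = dirac_measure(2)  # point mass at x0 = 2

print(is_finitely_additive(counting_measure, X))
print(is_finitely_additive(delta, X))
```

For infinite X the verification has to be done by hand, since the counting measure then takes the value ∞.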

6 Simple Functions

Definition 6.1. A function s : X → C is called simple if its range is finite. If s : X → C is a
simple function whose range consists of the distinct numbers c1 , c2 , . . . , cn , then
\[
s = \sum_{i=1}^{n} c_i 1_{E_i} \tag{6.1}
\]
where Ei = s−1 ({ci }) = {x ∈ X : s(x) = ci }. Note E1 , . . . , En are disjoint sets and $\bigcup_{i=1}^{n} E_i = X$. We
call (6.1) the standard representation of s.

Lemma 6.2. Let (X, A) be a measurable space. Let s : X → C be a simple function whose range
consists of the distinct numbers c1 , c2 , . . . , cn . Then s is measurable iff each set Ei = s−1 ({ci }) in
its standard representation is measurable.

Proof. If each Ei is measurable, then each 1Ei is measurable, so $s = \sum_{i=1}^{n} c_i 1_{E_i}$ is measurable.

Conversely, if s is measurable, then
\[
E_i = \{x \in X : s(x) = c_i\} = \bigcap_{k=1}^{\infty} \{x \in X : |s(x) - c_i| < 1/k\}
\]
is measurable for each i = 1, . . . , n.

The next theorem concerns approximation of functions by simple functions.

Theorem 6.3. Let f : X → [−∞, ∞]. Then there exists a sequence (sn ) of functions sn : X → R
such that

(i) s1 , s2 , . . . are simple functions.

(ii) |s1 | ≤ |s2 | ≤ . . . ≤ |f |.

(iii) limk→∞ sk (x) = f (x) for each x ∈ X.

(iv) If f is bounded, then sk → f uniformly on X.

(v) If f ≥ 0, then 0 ≤ s1 ≤ s2 ≤ . . . ≤ f .

(vi) If (X, A) is a measurable space and f is measurable, then s1 , s2 , . . . are measurable.

Proof. Let k ∈ N. Decompose [−∞, ∞] as the following union of subintervals:

[−∞, ∞] = [−∞, −k] ∪ (−k, 0] ∪ [0, k) ∪ [k, ∞]

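Decompose [−∞, ∞] further by dividing (−k, 0] and [0, k) up into intervals of length $1/2^k$. So we
get
\[
[-\infty, \infty] = [-\infty, -k] \cup \bigcup_{m=1}^{k2^k} \Big(\frac{-m}{2^k}, \frac{-(m-1)}{2^k}\Big] \cup \bigcup_{m=1}^{k2^k} \Big[\frac{m-1}{2^k}, \frac{m}{2^k}\Big) \cup [k, \infty].
\]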

Figure 1: Definition of sk and sk+1

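For each x ∈ X, consider the subinterval to which f (x) belongs and define sk (x) to be the endpoint
of that subinterval which is closest to 0. More precisely,
\[
s_k(x) =
\begin{cases}
k & \text{if } f(x) \in [k, \infty], \\
\dfrac{m-1}{2^k} & \text{if } f(x) \in \Big[\dfrac{m-1}{2^k}, \dfrac{m}{2^k}\Big),\ m \in \{1, \dots, k2^k\}, \\
\dfrac{-(m-1)}{2^k} & \text{if } f(x) \in \Big(\dfrac{-m}{2^k}, \dfrac{-(m-1)}{2^k}\Big],\ m \in \{1, \dots, k2^k\}, \\
-k & \text{if } f(x) \in [-\infty, -k].
\end{cases}
\]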

Figure 1 illustrates the definition of sk and sk+1 .

(i): Since sk takes only finitely many values, sk is simple.

(ii): No matter which subinterval f (x) belongs to, sk (x) and sk+1 (x) both round f (x) toward zero, and
sk (x) rounds at least as far as sk+1 (x) does. Hence |sk (x)| ≤ |sk+1 (x)| ≤ |f (x)|. See Figure 1.

(iii): The key observation is that if f (x) ∈ (−k, k), then
\[
|f(x) - s_k(x)| \le \frac{1}{2^k}.
\]
Here are the details. Let x ∈ X. If f (x) = ∞, then sk (x) = k for every k ∈ N, and so
limk→∞ sk (x) = ∞ = f (x). If f (x) = −∞, then sk (x) = −k for every k ∈ N, and so limk→∞ sk (x) =
−∞ = f (x). If f (x) ∈ (−∞, ∞), there exists M > 0 such that f (x) ∈ (−M, M ). Let ε > 0. Choose
K ∈ N such that $1/2^K < \varepsilon$ and K ≥ M . For every k ≥ K, we have f (x) ∈ (−k, k), and so
\[
|f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon.
\]
Therefore limk→∞ sk (x) = f (x).

(iv): If f is bounded, then there exists M > 0 such that f (x) ∈ (−M, M ) for every x ∈ X. Let
ε > 0. Choose K ∈ N such that $1/2^K < \varepsilon$ and K ≥ M . For every k ≥ K and every x ∈ X, we have
f (x) ∈ (−k, k), and so
\[
|f(x) - s_k(x)| \le \frac{1}{2^k} \le \frac{1}{2^K} < \varepsilon.
\]
Therefore sk → f uniformly on X.

(v): If f (x) ≥ 0, then sk (x) ≥ 0 by definition. The rest follows from (ii).

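(vi): For k ∈ N and $m \in \{1, \dots, k2^k\}$, define
\[
\begin{aligned}
F_k &= \{x : f(x) \in [k, \infty]\} = f^{-1}([k, \infty]), \\
F_{-k} &= \{x : f(x) \in [-\infty, -k]\} = f^{-1}([-\infty, -k]), \\
E_m &= \Big\{x : f(x) \in \Big[\frac{m-1}{2^k}, \frac{m}{2^k}\Big)\Big\} = f^{-1}\Big(\Big[\frac{m-1}{2^k}, \frac{m}{2^k}\Big)\Big), \\
E_{-m} &= \Big\{x : f(x) \in \Big(\frac{-m}{2^k}, \frac{-(m-1)}{2^k}\Big]\Big\} = f^{-1}\Big(\Big(\frac{-m}{2^k}, \frac{-(m-1)}{2^k}\Big]\Big).
\end{aligned}
\]
Then
\[
s_k = k 1_{F_k} + (-k) 1_{F_{-k}} + \sum_{m=1}^{k2^k} \frac{m-1}{2^k} 1_{E_m} + \sum_{m=1}^{k2^k} \frac{-(m-1)}{2^k} 1_{E_{-m}}.
\]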

If f is measurable, then all the sets Fk , F−k , Em , E−m are measurable, and so sk is measurable.
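
The construction in the proof can be sketched numerically. The function below is an illustration only: the name `s_k` and its signature are assumptions, and it takes the value y = f (x) rather than the point x. It truncates y at ±k and otherwise rounds toward zero to the nearest multiple of 2^(−k), which is the same as picking the endpoint closest to 0 of the subinterval containing y.

```python
import math


def s_k(y, k):
    """Value of the k-th approximant at a point where f takes the value y.

    y may be any float, including math.inf and -math.inf.
    """
    if y >= k:
        return float(k)
    if y <= -k:
        return float(-k)
    # Round toward zero to the nearest multiple of 2**-k.
    return math.trunc(y * 2**k) / 2**k


for k in range(1, 5):
    print(k, s_k(0.7, k), s_k(-0.7, k), s_k(math.inf, k))
```

Scaling by a power of two is exact in binary floating point, so the rounding step introduces no extra error beyond that of representing y itself.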

7 Lebesgue Integral

Definition 7.1. Let (X, A) be a measurable space. An A-partition of X is a finite collection

P = {A1 , . . . , An } of disjoint sets in A such that $\bigcup_{i=1}^{n} A_i = X$.

Definition 7.2. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Let P = {A1 , . . . , An } be an A-partition of X. The lower Lebesgue sum for f and P is
\[
L(f, P) = \sum_{i=1}^{n} \Big(\inf_{A_i} f\Big) \mu(A_i).
\]

Note that we use the convention that 0 · ∞ = ∞ · 0 = 0.

Remark: Since f : X → [0, ∞], the lower Lebesgue sum involves only terms in [0, ∞]. If we had
allowed f : X → [−∞, ∞], then the sum could have both ∞ and −∞ terms, which would result in
the undefined expression ∞ − ∞.

Definition 7.3. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
The Lebesgue integral (or simply integral) of f with respect to µ is defined to be
\[
\int f \, d\mu = \sup \{L(f, P) : P \text{ is an } \mathcal{A}\text{-partition of } X\}.
\]
Note that ∫ f dµ is always either a finite number in [0, ∞) or ∞. If ∫ f dµ is finite, we say that
f is integrable with respect to µ. We sometimes write the integral ∫ f dµ as ∫ f (x) dµ(x) or ∫ f
instead. If the measure is Lebesgue measure on R, i.e., if µ = λ, we usually write dx instead of dλ or
dλ(x).
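
To make the definition concrete, here is a small sketch on a finite measure space. Everything in it is an assumption chosen for illustration: X = {"a", "b", "c"}, the σ-algebra is the power set, µ is given by point weights, and the helper names `partitions`, `lower_sum` and `integral` are hypothetical. The code enumerates every partition of X into nonempty blocks, computes L(f, P) for each, and takes the maximum. Empty blocks are skipped because they contribute 0 under the convention 0 · ∞ = 0.

```python
def partitions(items):
    """Yield every partition of the list `items` into nonempty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i, block in enumerate(smaller):
            yield smaller[:i] + [[first] + block] + smaller[i + 1:]
        # ... or into a new block of its own.
        yield [[first]] + smaller


def lower_sum(f, weight, partition):
    """L(f, P): sum over blocks A of (inf of f on A) * mu(A)."""
    return sum(min(f[x] for x in block) * sum(weight[x] for x in block)
               for block in partition)


def integral(f, weight):
    """Supremum of the lower Lebesgue sums over all partitions of X."""
    return max(lower_sum(f, weight, p) for p in partitions(list(f)))


f = {"a": 3, "b": 1, "c": 2}        # a non-negative function on X
weight = {"a": 2, "b": 5, "c": 1}   # mu({x}) = weight[x]

print(integral(f, weight))
print(sum(f[x] * weight[x] for x in f))
```

In this example the supremum is attained by the partition into singletons, so the integral agrees with the weighted sum of the values of f.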

Definition 7.4. Let f : X → [−∞, ∞]. The positive part of f is the function

f + = max {0, f } .

The negative part of f is the function

f − = max {0, −f } .

Lemma 7.5. Let f : X → [−∞, ∞].

(a) f + , f − ≥ 0

(b) f = f + − f −

(d) $f^+(x) = \tfrac{1}{2}(|f|(x) + f(x))$ whenever f (x) ≠ −∞

(e) $f^-(x) = \tfrac{1}{2}(|f|(x) - f(x))$ whenever f (x) ≠ ∞

(f) f is measurable iff f + and f − are measurable

Definition 7.6. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] be a measurable function.
If at least one of ∫ f + dµ and ∫ f − dµ is finite, the Lebesgue integral (or simply integral) of f
with respect to µ is defined to be
\[
\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu.
\]
If ∫ f + dµ = ∫ f − dµ = ∞, the integral ∫ f dµ is not defined. Note ∫ f dµ is finite iff both ∫ f + dµ
and ∫ f − dµ are finite. If ∫ f dµ is finite, we say that f is integrable with respect to µ.

Definition 7.7. Let (X, A, µ) be a measure space. Let f : X → C be a measurable function.
If both ∫ Re(f ) dµ and ∫ Im(f ) dµ are finite, we say f is integrable with respect to µ and the
Lebesgue integral (or simply integral) of f with respect to µ is defined to be
\[
\int f \, d\mu = \int \operatorname{Re}(f) \, d\mu + i \int \operatorname{Im}(f) \, d\mu.
\]

Definition 7.8. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] or f : X → C. If f is
measurable, E ∈ A, and ∫ f 1E dµ is defined, we define
\[
\int_E f \, d\mu = \int f 1_E \, d\mu.
\]
With this notation, $\int f \, d\mu = \int_X f \, d\mu$.

Theorem 7.9. Let (X, A, µ) be a measure space. Let f, g : X → [0, ∞] be measurable functions.
If f ≤ g, then ∫ f dµ ≤ ∫ g dµ.

Proof. Let P = {A1 , . . . , An } be an A-partition of X. Then
\[
\inf_{A_i} f \le \inf_{A_i} g
\]
for all i. Therefore L(f, P ) ≤ L(g, P ), and so ∫ f dµ ≤ ∫ g dµ.

Corollary 7.10. Let (X, A, µ) be a measure space. Let f, g : X → [−∞, ∞] be measurable
functions. If f ≤ g and ∫ f dµ and ∫ g dµ are defined, then ∫ f dµ ≤ ∫ g dµ.

Proof. Assume f ≤ g. Then f + ≤ g + and f − ≥ g − . Therefore
\[
\int f \, d\mu = \int f^+ \, d\mu - \int f^- \, d\mu \le \int g^+ \, d\mu - \int g^- \, d\mu = \int g \, d\mu.
\]

Theorem 7.11. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Let c ≥ 0. Then ∫ cf dµ = c ∫ f dµ.

Proof. Exercise.

Corollary 7.12. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] be a measurable
function. Let c ∈ R. Then ∫ cf dµ = c ∫ f dµ.

Corollary 7.13. Let (X, A, µ) be a measure space. Let f : X → C be a measurable function. Let
c ∈ C. Then ∫ cf dµ = c ∫ f dµ.

8 The Integral in Terms of Simple Functions

Theorem 8.1. Let (X, A, µ) be a measure space. For any E ∈ A,

Z
1E dµ = µ(E).

Proof. Since E ∈ A, the function 1E is measurable. We prove the ≥ and ≤ inequalities separately.

≥: Consider the A-partition of X given by P = {E, X \ E}. Then
\[
L(1_E, P) = \Big(\inf_{E} 1_E\Big)\mu(E) + \Big(\inf_{X \setminus E} 1_E\Big)\mu(X \setminus E) = 1 \cdot \mu(E) + 0 \cdot \mu(X \setminus E) = \mu(E).
\]
Therefore
\[
\int 1_E \, d\mu \ge \mu(E).
\]

≤: Let P = {A1 , . . . , An } be any A-partition of X. Then inf Ai 1E = 1 if Ai ⊆ E and inf Ai 1E = 0
otherwise. Therefore
\[
L(1_E, P) = \sum_{i=1}^{n} \Big(\inf_{A_i} 1_E\Big)\mu(A_i) = \sum_{i : A_i \subseteq E} \mu(A_i) = \mu\Big(\bigcup_{i : A_i \subseteq E} A_i\Big) \le \mu(E).
\]
Therefore
\[
\int 1_E \, d\mu \le \mu(E).
\]

Theorem 8.2. Let (X, A, µ) be a measure space. If s is a measurable simple function with s ≥ 0
and standard representation $s = \sum_{i=1}^{n} c_i 1_{E_i}$, then
\[
\int s \, d\mu = \sum_{i=1}^{n} c_i \mu(E_i).
\]

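Proof. The collection P = {E1 , . . . , En } is an A-partition of X, and
\[
L(s, P) = \sum_{i=1}^{n} \Big(\inf_{E_i} s\Big)\mu(E_i) = \sum_{i=1}^{n} c_i \mu(E_i).
\]
Thus $\int s \, d\mu \ge \sum_{i=1}^{n} c_i \mu(E_i)$. Now we prove the reverse inequality. Let P = {A1 , . . . , Am } be any
A-partition of X. Note that we can write each Aj as the disjoint union $A_j = \bigcup_{i=1}^{n} (A_j \cap E_i)$. We
can also write each Ei as the disjoint union $E_i = \bigcup_{j=1}^{m} (A_j \cap E_i)$. Then (terms with Aj ∩ Ei = ∅ vanish because µ(∅) = 0)
\[
\begin{aligned}
L(s, P) &= \sum_{j=1}^{m} \Big(\inf_{A_j} s\Big)\mu(A_j) \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} \Big(\inf_{A_j} s\Big)\mu(A_j \cap E_i) \\
&\le \sum_{j=1}^{m} \sum_{i=1}^{n} \Big(\inf_{A_j \cap E_i} s\Big)\mu(A_j \cap E_i) \\
&= \sum_{j=1}^{m} \sum_{i=1}^{n} c_i \mu(A_j \cap E_i) \\
&= \sum_{i=1}^{n} c_i \sum_{j=1}^{m} \mu(A_j \cap E_i) \\
&= \sum_{i=1}^{n} c_i \mu(E_i).
\end{aligned}
\]
Therefore $\int s \, d\mu \le \sum_{i=1}^{n} c_i \mu(E_i)$.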

It is useful to restate the definition of the Lebesgue integral in terms of simple functions.

Theorem 8.3. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Then
\[
\int f \, d\mu = \sup \Big\{ \int s \, d\mu : s \text{ simple measurable},\ 0 \le s \le f \Big\}. \tag{8.1}
\]

Proof. The ≥ inequality comes from Theorem 7.9. Now we prove the reverse inequality. Let
P = {A1 , . . . , An } be any A-partition of X. We must show L(f, P ) is ≤ the right-hand side of (8.1).

Case 1. For every i ∈ {1, . . . , n}, either inf Ai f < ∞ or µ(Ai ) = 0. Define
\[
c_i =
\begin{cases}
\inf_{A_i} f & \text{if } \inf_{A_i} f < \infty, \\
0 & \text{if } \inf_{A_i} f = \infty \text{ and } \mu(A_i) = 0.
\end{cases}
\]
Define $s = \sum_{i=1}^{n} c_i 1_{A_i}$. This is the standard representation of the measurable simple function s,
and 0 ≤ s ≤ f . By Theorem 8.2, L(f, P ) = ∫ s dµ. Thus L(f, P ) is ≤ the right-hand side of (8.1).

Case 2. For some i0 ∈ {1, . . . , n}, both inf Ai0 f = ∞ and µ(Ai0 ) > 0. Then L(f, P ) = ∞.
Seeking a contradiction, assume the right-hand side of (8.1) equals a finite number M ∈ [0, ∞).
Define c1 = (M + 1)/µ(Ai0 ) and c2 = 0. Define s = c1 1Ai0 + c2 1X\Ai0 . This is the standard
representation of the measurable simple function s. Note s is a simple measurable function with
0 ≤ s ≤ f . By Theorem 8.2, ∫ s dµ = M + 1. So ∫ s dµ is strictly larger than the right-hand side of
(8.1). Contradiction.

9 Monotone Convergence Theorem

Theorem 9.1. (Monotone Convergence Theorem) Let (X, A, µ) be a measure space. Let fn :
X → [0, ∞] (n = 1, 2, . . .) be a sequence of measurable functions. If fn increases to f pointwise
(meaning that f1 (x) ≤ f2 (x) ≤ . . . and f (x) = limn fn (x) = supn fn (x) for every x ∈ X), then f is
measurable and
\[
\lim_n \int f_n = \int f.
\]

Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary ?? implies f
is measurable. Since f1 ≤ f2 ≤ . . . ≤ f , we have ∫ f1 ≤ ∫ f2 ≤ . . . ≤ ∫ f , and so limn ∫ fn ≤ ∫ f .
We must prove the reverse inequality. Let 0 < c < 1 be given. Let s be a simple measurable
function such that 0 ≤ s ≤ f . For every n ∈ N, define

En = {x ∈ X : cs(x) ≤ fn (x)} .

For every n ∈ N, we have cs1En ≤ fn , and so
\[
c \int s 1_{E_n} \le \int f_n. \tag{9.1}
\]
We want to take the limit n → ∞ in (9.1). Suppose the standard representation of s is
$s = \sum_{i=1}^{k} a_i 1_{A_i}$. Then s1En is a simple function and its standard representation is $s 1_{E_n} = \sum_{i=1}^{k} a_i 1_{E_n \cap A_i}$.
Therefore
\[
\int s = \sum_{i=1}^{k} a_i \mu(A_i)
\]
and
\[
\int s 1_{E_n} = \sum_{i=1}^{k} a_i \mu(E_n \cap A_i).
\]

Note E1 ⊆ E2 ⊆ . . . and
\[
X = \bigcup_{n=1}^{\infty} E_n.
\]

(To see the last equality, consider an arbitrary x ∈ X. If f (x) = 0 or s(x) = 0, then x ∈ E1 . If
f (x) > 0 and s(x) > 0, then cs(x) < f (x), hence there exists n ∈ N such that cs(x) < fn (x) ≤ f (x),
and so x ∈ En .) Let i ∈ {1, . . . , k} be arbitrary. We have E1 ∩ Ai ⊆ E2 ∩ Ai ⊆ . . . and
\[
A_i = \bigcup_{n=1}^{\infty} (E_n \cap A_i).
\]
By continuity from below,
\[
\lim_{n\to\infty} \mu(E_n \cap A_i) = \mu(A_i).
\]
Therefore
\[
\lim_n \int s 1_{E_n} = \lim_n \sum_{i=1}^{k} a_i \mu(E_n \cap A_i) = \sum_{i=1}^{k} a_i \mu(A_i) = \int s.
\]

Thus taking n → ∞ in (9.1) gives
\[
c \int s \le \lim_n \int f_n.
\]
Since 0 < c < 1 is arbitrary,
\[
\int s \le \lim_n \int f_n.
\]
Since s is an arbitrary simple measurable function with 0 ≤ s ≤ f , Theorem 8.3 gives
\[
\int f \le \lim_n \int f_n.
\]

10 Additivity of the Integral

Lemma 10.1. Let (X, A, µ) be a measure space. If f, g are non-negative measurable simple
functions, then
\[
\int (f + g) = \int f + \int g.
\]

Proof. Let $f = \sum_{i=1}^{m} a_i 1_{A_i}$ and $g = \sum_{j=1}^{n} b_j 1_{B_j}$ be the standard representations of f and g. Note
$A_i = \bigcup_{j=1}^{n} (A_i \cap B_j)$ is a disjoint union for each i = 1, . . . , m. Likewise $B_j = \bigcup_{i=1}^{m} (A_i \cap B_j)$ is a
disjoint union for each j = 1, . . . , n. Then
\[
\begin{aligned}
\int f + \int g &= \sum_{i=1}^{m} a_i \mu(A_i) + \sum_{j=1}^{n} b_j \mu(B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} a_i \mu(A_i \cap B_j) + \sum_{i=1}^{m} \sum_{j=1}^{n} b_j \mu(A_i \cap B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\end{aligned}
\]
Note f + g is a simple function and
\[
f + g = \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) 1_{A_i \cap B_j}.
\]
However this may not be the standard representation of f + g because the values ai + bj may not
be distinct. Let c1 , . . . , cℓ be the distinct numbers in the set
{ai + bj : 1 ≤ i ≤ m, 1 ≤ j ≤ n} .
Let Ek be the union of those sets Ai ∩ Bj such that ai + bj = ck . Then the standard representation
of f + g is
\[
f + g = \sum_{k=1}^{\ell} c_k 1_{E_k}
\]
and
\[
\mu(E_k) = \sum_{i,j \,:\, a_i + b_j = c_k} \mu(A_i \cap B_j).
\]
Therefore
\[
\begin{aligned}
\int (f + g) &= \sum_{k=1}^{\ell} c_k \mu(E_k) \\
&= \sum_{k=1}^{\ell} c_k \sum_{i,j \,:\, a_i + b_j = c_k} \mu(A_i \cap B_j) \\
&= \sum_{i=1}^{m} \sum_{j=1}^{n} (a_i + b_j) \mu(A_i \cap B_j).
\end{aligned}
\]
Comparing the calculations for ∫ f + ∫ g and ∫ (f + g) shows they are equal.

Theorem 10.2. Let (X, A, µ) be a measure space. If f, g : X → [0, ∞] are measurable functions,
then
\[
\int (f + g) = \int f + \int g.
\]

Proof. By Theorem 6.3, there are sequences of measurable simple functions (sn ) and (tn ) such that
sn increases to f and tn increases to g. Then (sn + tn ) is a sequence of measurable functions that
increases to f + g. By the monotone convergence theorem and Lemma 10.1,
\[
\int (f + g) = \lim_n \int (s_n + t_n) = \lim_n \int s_n + \lim_n \int t_n = \int f + \int g.
\]

Corollary 10.3. Let (X, A, µ) be a measure space. If f, g : X → C are integrable functions, then
f + g is integrable and
\[
\int (f + g) = \int f + \int g.
\]

Proof. Case 1: f and g are real-valued. Note (f + g)+ ≤ f + + g + . Then Theorem 10.2 gives
\[
\int (f + g)^+ \le \int (f^+ + g^+) = \int f^+ + \int g^+ < \infty.
\]
Likewise ∫ (f + g)− < ∞. Therefore f + g is integrable. Note (f + g)+ − (f + g)− = f + g =
f + − f − + g + − g − . Since f − , g − , and (f + g)− are finite, rearranging gives (f + g)+ + f − + g − =
(f + g)− + f + + g + . Then Theorem 10.2 gives
\[
\int (f + g)^+ + \int f^- + \int g^- = \int (f + g)^- + \int f^+ + \int g^+.
\]
Since ∫ f − , ∫ g − , and ∫ (f + g)− are finite, rearranging gives
\[
\int (f + g)^+ - \int (f + g)^- = \int f^+ - \int f^- + \int g^+ - \int g^-.
\]
Therefore
\[
\int (f + g) = \int f + \int g.
\]

Case 2: f and g are complex-valued. Since f and g are integrable, so are Re(f ), Im(f ), Re(g), and
Im(g). We apply Case 1 to the real and imaginary parts of f and g. We get
\[
\int \operatorname{Re}(f + g) = \int (\operatorname{Re}(f) + \operatorname{Re}(g)) = \int \operatorname{Re}(f) + \int \operatorname{Re}(g) < \infty
\]
and
\[
\int \operatorname{Im}(f + g) = \int (\operatorname{Im}(f) + \operatorname{Im}(g)) = \int \operatorname{Im}(f) + \int \operatorname{Im}(g) < \infty.
\]
So f + g is integrable and
\[
\begin{aligned}
\int (f + g) &= \int \operatorname{Re}(f + g) + i \int \operatorname{Im}(f + g) = \int \operatorname{Re}(f) + \int \operatorname{Re}(g) + i \int \operatorname{Im}(f) + i \int \operatorname{Im}(g) \\
&= \int \operatorname{Re}(f) + i \int \operatorname{Im}(f) + \int \operatorname{Re}(g) + i \int \operatorname{Im}(g) = \int f + \int g.
\end{aligned}
\]

Corollary 10.4. Let (X, A, µ) be a measure space. Let f, g : X → [−∞, ∞] be measurable
functions.

(a) If f − , g − , ∫ f − , and ∫ g − are finite, then ∫ (f + g) = ∫ f + ∫ g.

(b) If f + , g + , ∫ f + , and ∫ g + are finite, then ∫ (f + g) = ∫ f + ∫ g.

Proof. Exercise.

11 Fatou’s Lemma

Lemma 11.1 (Fatou’s Lemma). Let (X, A, µ) be a measure space. If fn : X → [0, ∞] (n = 1, 2, . . .)

is a sequence of measurable functions, then
\[
\int \liminf_n f_n \le \liminf_n \int f_n.
\]

Proof. Define gk = inf n≥k fn for k ∈ N. Then (gk ) is an increasing sequence of measurable functions
and limk gk = lim inf fn . By the monotone convergence theorem,
\[
\int \liminf_n f_n = \lim_k \int g_k.
\]
Note gk ≤ fn for n ≥ k. So ∫ gk ≤ ∫ fn for n ≥ k. Thus ∫ gk ≤ inf n≥k ∫ fn . Therefore
\[
\int \liminf_n f_n = \lim_k \int g_k \le \lim_k \inf_{n \ge k} \int f_n = \liminf_n \int f_n.
\]
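
A standard example, not taken from these notes, shows that the inequality in Fatou's lemma can be strict. It assumes X = [0, 1] with Lebesgue measure λ and uses only Theorems 7.11 and 8.1. For each x > 0 we have f_n(x) = 0 as soon as n ≥ 1/x, and f_n(0) = 0 for all n, so the pointwise lim inf is identically zero even though every integral equals 1.

```latex
\begin{align*}
f_n &= n \, 1_{(0,1/n)} \quad (n = 1, 2, \dots), \\
\int f_n \, d\lambda &= n \, \lambda\big((0,1/n)\big) = 1, \\
\liminf_n f_n(x) &= 0 \quad \text{for every } x \in [0,1], \\
\int \liminf_n f_n \, d\lambda &= 0 \;<\; 1 = \liminf_n \int f_n \, d\lambda .
\end{align*}
```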

12 Integrable Functions and Absolute Values

Theorem 12.1. Let (X, A, µ) be a measure space. Let f : X → [−∞, ∞] or f : X → C be a

measurable function.

(a) f is integrable iff |f | is integrable.

(b) If f is integrable, then $\left|\int f\right| \le \int |f|$.

Proof. Case 1: f : X → [−∞, ∞].

(a): f is integrable iff ∫ f = ∫ f + − ∫ f − is finite iff ∫ f + and ∫ f − are finite iff ∫ |f | = ∫ f + + ∫ f −
is finite iff |f | is integrable.

(b):
\[
\left|\int f\right| = \left|\int f^+ - \int f^-\right| \le \left|\int f^+\right| + \left|\int f^-\right| = \int f^+ + \int f^- = \int |f|.
\]

Case 2: f : X → C.
(a): f is integrable iff ∫ f = ∫ Re f + i ∫ Im f is finite iff ∫ Re f and ∫ Im f are finite iff ∫ | Re f | and
∫ | Im f | are finite. The last equivalence comes from Case 1. But |f | ≤ | Re f | + | Im f | ≤ 2|f |, so
that
\[
\int |f| \le \int |\operatorname{Re} f| + \int |\operatorname{Im} f| \le 2 \int |f|.
\]
Thus ∫ | Re f | and ∫ | Im f | are finite iff ∫ |f | is finite iff |f | is integrable.

(b): The inequality is trivial if ∫ f = 0. Assume ∫ f ≠ 0. Set $\alpha = \left|\int f\right| \left(\int f\right)^{-1}$. Then
\[
\left|\int f\right| = \alpha \int f = \int \alpha f.
\]
In particular, ∫ αf is real. Since |α| = 1, we get
\[
\left|\int f\right| = \operatorname{Re} \int \alpha f = \int \operatorname{Re}(\alpha f) \le \int |\alpha f| = \int |f|.
\]

Corollary 12.2. Let (X, A, µ) be a measure space. Let f, g : X → [−∞, ∞] or f, g : X → C. If f

is measurable, g is integrable, and |f | ≤ |g|, then f is integrable.

Proof. Note ∫ |f | ≤ ∫ |g| < ∞ and apply the previous theorem.

13 Dominated Convergence Theorem

Theorem 13.1. (Dominated Convergence Theorem) Let (X, A, µ) be a measure space. Let fn :
X → C be a sequence of measurable functions. Let f : X → C be a function. If fn → f pointwise
and there exists an integrable function g : X → [0, ∞] such that |fn | ≤ g for all n, then f is
integrable, fn is integrable for each n,
\[
\lim_n \int |f - f_n| = 0,
\]
and
\[
\lim_n \int f_n = \int f.
\]

Proof. Since f is the pointwise limit of a sequence of measurable functions, Corollary ?? implies f
is measurable. Since fn → f pointwise and |fn | ≤ g for all n, we have |f | ≤ g, and so Corollary 12.2
implies f is integrable. Corollary 12.2 implies fn is integrable for each n. Then f − fn is integrable
for each n. Since |f − fn | ≤ 2g, we have 0 ≤ 2g − |f − fn |. By Fatou’s lemma,
\[
\begin{aligned}
\int 2g &= \int \lim_n (2g - |f - f_n|) \\
&= \int \liminf_n (2g - |f - f_n|) \\
&\le \liminf_n \int (2g - |f - f_n|) \\
&= \liminf_n \Big( \int 2g - \int |f - f_n| \Big) \\
&= \int 2g + \liminf_n \Big( - \int |f - f_n| \Big) \\
&= \int 2g - \limsup_n \int |f - f_n|.
\end{aligned}
\]
Since ∫ 2g is finite, we can subtract it to obtain lim supn ∫ |f − fn | ≤ 0. So we have
\[
0 \le \liminf_n \int |f - f_n| \le \limsup_n \int |f - f_n| \le 0.
\]
Thus
\[
\lim_n \int |f - f_n| = \limsup_n \int |f - f_n| = \liminf_n \int |f - f_n| = 0.
\]
Furthermore,
\[
\lim_n \left| \int f_n - \int f \right| = \lim_n \left| \int (f_n - f) \right| \le \lim_n \int |f - f_n| = 0,
\]
whence limn ∫ fn = ∫ f .

14 Interchanging Limits and Derivatives with Integrals

Theorem 14.1. Let (X, A, µ) be a measure space. Let f : X × [a, b] → C. Suppose that f ( · , t) :
X → C is integrable for each t ∈ [a, b]. Define $F(t) = \int_X f(x, t) \, d\mu(x)$ for each t ∈ [a, b].

(a) Suppose there exists an integrable function g : X → [0, ∞] such that |f (x, t)| ≤ g(x) for all
x ∈ X, t ∈ [a, b]. Suppose c ∈ [a, b] and limt→c f (x, t) = f (x, c) for every x ∈ X. Then
\[
\lim_{t \to c} F(t) = \lim_{t \to c} \int_X f(x, t) \, d\mu(x) = \int_X f(x, c) \, d\mu(x) = F(c).
\]

(b) Suppose $\frac{\partial f}{\partial t}(x, t)$ exists for all x ∈ X, t ∈ [a, b]. Suppose there exists an integrable function
g : X → [0, ∞] such that $\left|\frac{\partial f}{\partial t}(x, t)\right| \le g(x)$ for all x ∈ X, t ∈ [a, b]. For any c ∈ [a, b],
\[
F'(c) = \int_X \frac{\partial f}{\partial t}(x, c) \, d\mu(x).
\]

Proof. The idea is to combine the dominated convergence theorem with the following fact from
undergraduate analysis: limx→c h(x) = L iff limn→∞ h(xn ) = L for every sequence (xn ) with
xn ≠ c and xn → c.

(a): Exercise.

(b): Let (tn ) be any sequence of numbers in [a, b] such that tn ≠ c for all n and tn → c. Define
\[
f_n(x) = \frac{f(x, t_n) - f(x, c)}{t_n - c}
\]
for each x ∈ X. Then fn is measurable and $\lim_n f_n(x) = \frac{\partial f}{\partial t}(x, c)$. By the mean value theorem,
\[
|f_n(x)| \le \sup_{t \in [a,b]} \left| \frac{\partial f}{\partial t}(x, t) \right| \le g(x)
\]
for all x ∈ X. By the dominated convergence theorem,
\[
\begin{aligned}
\lim_n \frac{F(t_n) - F(c)}{t_n - c}
&= \lim_n \frac{1}{t_n - c} \Big( \int_X f(x, t_n) \, d\mu(x) - \int_X f(x, c) \, d\mu(x) \Big) \\
&= \lim_n \int_X \frac{f(x, t_n) - f(x, c)}{t_n - c} \, d\mu(x) \\
&= \lim_n \int_X f_n(x) \, d\mu(x) \\
&= \int_X \frac{\partial f}{\partial t}(x, c) \, d\mu(x).
\end{aligned}
\]
Since (tn ) is arbitrary, the limit
\[
F'(c) = \lim_{t \to c} \frac{F(t) - F(c)}{t - c}
\]
exists, and
\[
F'(c) = \int_X \frac{\partial f}{\partial t}(x, c) \, d\mu(x).
\]
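
Part (b) can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: it takes X = [0, 1] with Lebesgue measure, f (x, t) = cos(tx), and the dominating function g(x) = 1, since |∂f /∂t| = |x sin(tx)| ≤ 1. Integrals are approximated with a midpoint rule; `midpoint_integral`, `F` and `integral_of_partial` are hypothetical helper names. It compares a central difference quotient of F at c with the integral of the partial derivative.

```python
import math


def midpoint_integral(h, n=2000):
    """Midpoint-rule approximation of the integral of h over [0, 1]."""
    return sum(h((i + 0.5) / n) for i in range(n)) / n


def F(t):
    """F(t) = integral over [0, 1] of cos(t x) dx."""
    return midpoint_integral(lambda x: math.cos(t * x))


def integral_of_partial(c):
    """Integral over [0, 1] of (d/dt) cos(t x) at t = c, i.e. of -x sin(c x)."""
    return midpoint_integral(lambda x: -x * math.sin(c * x))


c, step = 1.3, 1e-5
difference_quotient = (F(c + step) - F(c - step)) / (2 * step)

print(difference_quotient)
print(integral_of_partial(c))
```

For this choice of f the closed form F (t) = sin(t)/t is available for t ≠ 0, so both printed numbers can also be compared with (c cos c − sin c)/c².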

15 Almost Everywhere

Definition 15.1. Let (X, A, µ) be a measure space. If A ∈ A, µ(Ac ) = 0, and P is a property that
holds for every x ∈ A, then we say that P holds almost everywhere or that P holds for almost
every x ∈ X.

We often abbreviate “almost everywhere” and “almost every” to “a.e.” The concept of a.e. depends
on the measure µ. When clarity demands it, we write µ-a.e. instead of a.e.

For example, if f, g : X → C and there exists A ∈ A such that µ(Ac ) = 0 and f(x)=g(x) for every
x ∈ A, then we say that f = g a.e.

As another example, if fn : X → C (n = 1, 2, . . .) is a sequence of functions and there exists A ∈ A

such that µ(Ac ) = 0 and (fn (x)) converges for each x ∈ A, we say that fn converges a.e.

Remark: “almost everywhere” means “everywhere except on a set of measure zero”

Remark: Almost everywhere convergence is not a topological convergence; there is no topology

that can be placed on the set of measurable functions for which convergence in the topology is
equivalent to convergence almost everywhere.

Theorem 15.2. Let (X, A, µ) be a measure space. Let f : X → [0, ∞] be a measurable function.
Then ∫ f = 0 iff f = 0 a.e.

Proof. Case 1: f is simple. Let $f = \sum_{i=1}^{n} c_i 1_{E_i}$ be the standard representation of f . So ci ≥ 0 and
Ei = f −1 ({ci }) for each 1 ≤ i ≤ n. Then $\int f = \sum_{i=1}^{n} c_i \mu(E_i) = 0$ iff for every 1 ≤ i ≤ n either
ci = 0 or µ(Ei ) = 0 iff $f = \sum_{i=1}^{n} c_i 1_{E_i} = 0$ a.e.

Case 2: f not simple. Assume f = 0 a.e. For every measurable simple function s with 0 ≤ s ≤ f ,
we have s = 0 a.e., hence ∫ s = 0 by Case 1. Therefore
\[
\int f = \sup \Big\{ \int s : s \text{ measurable simple},\ 0 \le s \le f \Big\} = \sup \{0\} = 0.
\]
Conversely, assume ∫ f = 0. Note $\{x \in X : f(x) > 0\} = \bigcup_{n=1}^{\infty} E_n$, where $E_n = \{x \in X : f(x) \ge n^{-1}\}$.
For every n ∈ N, we have 1En ≤ nf , and so
\[
\mu(E_n) = \int 1_{E_n} \le n \int f = 0.
\]
Therefore
\[
\mu(\{x \in X : f(x) > 0\}) \le \sum_{n=1}^{\infty} \mu(E_n) = 0.
\]
Thus f = 0 a.e.

Corollary 15.3. Let (X, A, µ) be a measure space. Let f, g : X → [0, ∞] be measurable functions.

(a) If f ≤ g a.e., then ∫ f ≤ ∫ g.

(b) If f = g a.e., then ∫ f = ∫ g.

Proof. (a): Suppose f ≤ g a.e. Then there exists A ∈ A such that µ(Ac ) = 0 and f (x) ≤ g(x) for
all x ∈ A. So f 1Ac = 0 a.e. and f 1A ≤ g everywhere. Therefore
\[
\int f = \int (f 1_A + f 1_{A^c}) = \int f 1_A + \int f 1_{A^c} = \int f 1_A \le \int g.
\]

(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.

Corollary 15.4. Let (X, A, µ) be a measure space. Let f, g : X → [−∞, ∞] be measurable

functions.
(a) If f ≤ g a.e. and both ∫ f and ∫ g are defined, then ∫ f ≤ ∫ g.

(b) If f = g a.e. and both ∫ f and ∫ g are defined, then ∫ f = ∫ g.

Proof. (a): Suppose f ≤ g a.e. Then f + ≤ g + a.e. and g − ≤ f − a.e. Therefore ∫ f + ≤ ∫ g + and
∫ g − ≤ ∫ f − . Thus
\[
\int f = \int f^+ - \int f^- \le \int g^+ - \int g^- = \int g.
\]

(b): Note that f = g a.e. iff f ≤ g a.e. and g ≤ f a.e., then apply (a) twice.

Corollary 15.5. Let (X, A, µ) be a measure space. Let f, g : X → C be measurable functions. If
f = g a.e. and both f and g are integrable, then ∫ f = ∫ g.

Proof. Suppose f = g a.e. Then Re f = Re g a.e. and Im f = Im g a.e. Therefore
\[
\int f = \int \operatorname{Re} f + i \int \operatorname{Im} f = \int \operatorname{Re} g + i \int \operatorname{Im} g = \int g.
\]

Theorem 15.6. Let (X, A, µ) be a measure space. If f : X → [0, ∞] is integrable, then
µ({x ∈ X : f (x) = ∞}) = 0, hence f is finite a.e.

Proof. For each n ∈ N, define En = {x ∈ X : n ≤ f (x)}. Since {x ∈ X : f (x) = ∞} ⊆ En and
$1_{E_n} \le n^{-1} f$, we have
\[
\mu(\{x \in X : f(x) = \infty\}) \le \mu(E_n) = \int 1_{E_n} \le n^{-1} \int f.
\]
Since ∫ f is finite, letting n → ∞ gives µ ({x ∈ X : f (x) = ∞}) = 0.

Corollary 15.7. Let (X, A, µ) be a measure space. If f : X → [−∞, ∞] is an integrable function,

then µ({x ∈ X : f (x) = ∞}) = µ({x ∈ X : f (x) = −∞}) = 0, hence f is finite a.e.

Theorem 15.8. (Almost Everywhere Dominated Convergence Theorem) Let (X, A, µ) be a measure
space. Let fn : X → C (n = 1, 2, . . .) be a sequence of measurable functions. Let f : X → C
be a measurable function. If fn → f a.e. and there exists an integrable function g : X → [0, ∞]
such that |fn | ≤ g a.e. for all n, then f is integrable, fn is integrable for each n,
\[
\lim_n \int |f - f_n| = 0 \quad \text{and} \quad \lim_n \int f_n = \int f.
\]

Proof. Choose a set A ∈ A such that µ(Ac ) = 0 and such that, for each x ∈ A, we have limn fn (x) =
f (x) and |fn (x)| ≤ g(x) for all n. (To see that such a set A exists, argue as follows. Since fn → f
a.e., there exists A0 ∈ A with $\mu(A_0^c) = 0$ such that limn fn (x) = f (x) for all x ∈ A0 . For
each n, we have |fn | ≤ g a.e., so there exists An ∈ A with $\mu(A_n^c) = 0$ such that |fn (x)| ≤ g(x) for all
x ∈ An . Then $A = \bigcap_{n=0}^{\infty} A_n$ is the desired set.) Then, for each x ∈ A, |f (x)| = | limn fn (x)| ≤ g(x).
Since |f |1Ac = 0 a.e. and |f |1A ≤ g everywhere, we have
\[
\int |f| = \int (|f| 1_A + |f| 1_{A^c}) = \int |f| 1_A + \int |f| 1_{A^c} = \int |f| 1_A \le \int g < \infty.
\]
Thus f is integrable. Similar arguments show that f 1A , fn , and fn 1A are integrable for each n.
For each n, |f − fn | = |f 1A − fn 1A | a.e., and so ∫ |f − fn | = ∫ |f 1A − fn 1A |. We also have
fn 1A → f 1A pointwise and |fn 1A | ≤ g for each n. So Theorem 15.2 and the dominated convergence
theorem imply
\[
\lim_n \int |f - f_n| = \lim_n \int |f 1_A - f_n 1_A| = 0.
\]
Furthermore,
\[
\lim_n \left| \int f - \int f_n \right| = \lim_n \left| \int (f - f_n) \right| \le \lim_n \int |f - f_n| = 0,
\]
which shows limn ∫ fn = ∫ f .

16 Complete Measures

If we compare Theorem 13.1 (dominated convergence theorem) to Theorem 15.8 (almost everywhere
dominated convergence theorem), we notice that f being measurable is part of the conclusion of
Theorem 13.1 but it is part of the hypothesis in Theorem 15.8. This is because a pointwise limit of
measurable functions is measurable, but an a.e. limit of measurable functions may not be measurable.
We explore this in this section.

Definition 16.1. Let (X, A, µ) be a measure space. We say that µ is complete (or that (X, A, µ)
is complete) if the following property holds: If N ∈ A, µ(N ) = 0, and A ⊆ N , then A ∈ A. In
other words, µ is complete if all subsets of measure zero sets are measurable.
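
A tiny finite example may help fix the definition. The sketch below is an illustration with assumed data: X = {1, 2, 3}, a σ-algebra that cannot separate the points 2 and 3, and a measure that puts all its mass on the point 1. The helper names `subsets` and `is_complete` are hypothetical. The set {2, 3} is a measure zero set whose subset {2} is not measurable, so the measure is not complete; on the full power set the same formula gives a complete measure.

```python
from itertools import combinations

X = frozenset({1, 2, 3})

# A sigma-algebra on X that does not separate the points 2 and 3.
sigma_algebra = {frozenset(), frozenset({1}), frozenset({2, 3}), X}


def mu(A):
    """A measure with all of its mass on the point 1."""
    return 1 if 1 in A else 0


def subsets(S):
    """All subsets of the finite set S, as frozensets."""
    items = list(S)
    return [frozenset(c)
            for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_complete(algebra, measure):
    """True iff every subset of every measure zero set lies in the algebra."""
    return all(B in algebra
               for N in algebra if measure(N) == 0
               for B in subsets(N))


print(is_complete(sigma_algebra, mu))
print(is_complete(set(subsets(X)), mu))
```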

Theorem 16.2. Let (X, A, µ) be a measure space. The following are equivalent.

(a) µ is complete.

(b) For all f, g : X → C, if f is measurable and f = g a.e., then g is measurable.

(c) For all f, f1 , f2 , . . . : X → C, if fn is measurable for each n and fn → f a.e., then f is

measurable.

Proof. (a) ⇒ (b): Assume f is measurable and f = g a.e. Then there exists a set A ∈ A such that
µ(Ac ) = 0 and f (x) = g(x) for all x ∈ A. Let B ∈ B be given. Write

g −1 (B) = (g −1 (B) ∩ A) ∪ (g −1 (B) ∩ Ac ).

Note g −1 (B) ∩ A = f −1 (B) ∩ A ∈ A. Note g −1 (B) ∩ Ac ⊆ Ac , Ac ∈ A, and µ(Ac ) = 0. Since µ is

complete, g −1 (B) ∩ Ac ∈ A. Therefore g −1 (B) ∈ A. This proves g is measurable.

(b) ⇒ (c): Assume fn is measurable for each n and fn → f a.e.. Then there exists a set A ∈ A
such that µ(Ac ) = 0 and fn (x) → f (x) for all x ∈ A. Define gn = fn 1A and g = f 1A . Then gn is
measurable for each n and gn → g pointwise. So g is measurable. Moreover, g = f a.e. Then (b)
implies f is measurable.

(c) ⇒ (b): Assume f is measurable and f = g a.e. Define hn = f for all n. Then hn is measurable
for all n and hn → g a.e.. Then (c) implies g is measurable.

(b) ⇒ (a): We prove the contrapositive. Assume µ is not complete. So there exist sets A, N ⊆ X
such that N ∈ A, µ(N ) = 0, A ⊆ N , and A ∉ A. Define f = 1N c and g = 1Ac . Then f = g a.e., f
is measurable, but g is not measurable.

Remark: The theorem still holds if we replace C by [0, ∞], [−∞, ∞], or R. The proof is easy to
modify.

Remark: In the almost everywhere dominated convergence theorem, if the measure µ is complete,
then f being measurable can be part of the conclusion, rather than part of the hypothesis.

The next theorem says that the Caratheodory restriction theorem produces a complete measure.

Theorem 16.3. Let µ∗ be an outer measure on a set X. Let A∗ be the σ-algebra of µ∗ -measurable
sets and let µ be the measure which is the restriction of µ∗ to A∗ . Then µ is complete.

Proof. Assume N ∈ A∗ , µ(N ) = 0, and A ⊆ N . Then µ∗ (A) ≤ µ∗ (N ) = µ(N ) = 0, hence

µ∗ (A) = 0. Therefore, for any E ∈ P(X),

µ∗ (E ∩ A) + µ∗ (E ∩ Ac ) ≤ µ∗ (A) + µ∗ (E) = µ∗ (E).

So A ∈ A∗ .

Corollary 16.4. Lebesgue measure is complete.

17 Completion of Measures (Optional)

Definition 17.1. Let (X, A, µ) be a measure space. The completion of A with respect to µ is
the collection $\overline{\mathcal{A}}$ consisting of all sets of the form A ∪ B, where $A \in \mathcal{A}$, B ∈ P(X), and there exists
$N \in \mathcal{A}$ such that B ⊆ N and µ(N ) = 0.

Theorem 17.2. Let (X, A, µ) be a measure space.

(a) $\overline{\mathcal{A}}$ is the smallest σ-algebra containing $\mathcal{A}$ and all sets E such that E ⊆ N for some $N \in \mathcal{A}$ with µ(N ) = 0.

(b) $\overline{\mathcal{A}}$ is the smallest σ-algebra on which some extension of µ is complete.

Theorem 17.3. Let (X, A, µ) be a measure space. There is a unique extension of µ to $\overline{\mathcal{A}}$. This
extension is called the completion of µ and is denoted by $\overline{\mu}$. The measure space $(X, \overline{\mathcal{A}}, \overline{\mu})$ is called
the completion of (X, A, µ).

Theorem 17.4. The Lebesgue σ-algebra on R, L(R), is the completion of the Borel σ-algebra on
R, B(R).

Theorem 17.5. The Lebesgue measure on (R, L(R)) is the completion of the Lebesgue measure
on (R, B(R))

Theorem 17.6. Let µ be a measure defined on a σ-algebra A. If µ is σ-finite, then the measure
obtained by doing a Caratheodory extension of µ is the completion of µ.

Remark: Exploring what happens in the non-σ-finite case is an exercise.

Theorem 17.7. Let (X, A, µ) be a measure space and let $(X, \overline{\mathcal{A}}, \overline{\mu})$ be its completion.

(a) If f : X → R is $\mathcal{A}$-measurable, then f is $\overline{\mathcal{A}}$-measurable.

(b) If f : X → R is $\overline{\mathcal{A}}$-measurable, then there is an $\mathcal{A}$-measurable function g : X → R such that
f = g $\overline{\mu}$-a.e.

Proof. (a): Use $\mathcal{A} \subseteq \overline{\mathcal{A}}$ and the definition of measurability.

(b): Prove it first for f simple. For general f , consider a sequence of simple functions converging
pointwise to f .

Remark: The theorem still holds if we replace R by [0, ∞], [−∞, ∞], or C. The proof is easy to
modify.

18 The Lebesgue Integral Extends the Riemann Integral

In this section, X is the interval [a, b], the σ-algebra is the collection of Lebesgue measurable subsets
of [a, b], and the measure is the Lebesgue measure on R restricted to this σ-algebra. It is easy to
check that this defines a complete measure space.

Theorem 18.1. Suppose f : [a, b] → R is bounded on [a, b]. If f is Riemann integrable on [a, b],
then f is Lebesgue measurable, f is Lebesgue integrable, and the Riemann and Lebesgue integrals
of f on [a, b] are equal.

Proof 1. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper
Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Assume f is Riemann
integrable on [a, b]. Using Riemann’s condition, choose partitions P1 ⊆ . . . ⊆ Pn ⊆ Pn+1 ⊆ . . . such
that
\[ U(f, P_n, [a,b]) - L(f, P_n, [a,b]) < \frac{1}{n}. \tag{18.1} \]
For each n, write Pn = {x0, . . . , xk}, and define
\[ s_n = \sum_{i=1}^{k} \Big( \inf_{[x_{i-1},x_i]} f \Big) \mathbf{1}_{[x_{i-1},x_i]}, \qquad t_n = \sum_{i=1}^{k} \Big( \sup_{[x_{i-1},x_i]} f \Big) \mathbf{1}_{[x_{i-1},x_i]} \]
so that
\[ \int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b]). \]
Note
s1 ≤ . . . ≤ sn ≤ sn+1 ≤ . . . ≤ f ≤ . . . ≤ tn+1 ≤ tn ≤ . . . ≤ t1.
Then
s1 ≤ s ≤ f ≤ t ≤ t1
where we have defined s = limn sn and t = limn tn. It follows that s and t are finite everywhere
and Lebesgue integrable. Since 0 ≤ sn − s1 ↑ s − s1, the monotone convergence theorem implies
\[ \lim_n \int (s_n - s_1) = \int (s - s_1), \]
and adding $\int s_1$ gives
\[ \lim_n \int s_n = \int s. \]
A similar argument gives
\[ \lim_n \int t_n = \int t. \]
Combining the above with (18.1) gives
\[ \int (t - s) = \lim_n \int (t_n - s_n) = \lim_n \big( U(f, P_n, [a,b]) - L(f, P_n, [a,b]) \big) = 0. \]
Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we
have s ≤ f ≤ t, and so s = f = t a.e. Since s and t are measurable and the Lebesgue measure is
complete, f is measurable. Moreover,
\[ \int s = \int f = \int t. \]
For each n,
\[ \overline{\int} f \le \int t_n \le \int s_n + \frac{1}{n} \le \underline{\int} f + \frac{1}{n}. \]
Taking n → ∞ gives
\[ \overline{\int} f \le \int t = \int s \le \underline{\int} f, \]
hence, since the lower Riemann integral never exceeds the upper Riemann integral,
\[ \underline{\int} f = \overline{\int} f = \int t = \int s = \int f. \]
So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.
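
To see the quantities L(f, Pn, [a, b]) and U(f, Pn, [a, b]) from the proof in a concrete case, here is a minimal sketch. It assumes f is nondecreasing, so that the infimum and supremum on each subinterval are attained at the left and right endpoints, and it uses uniform partitions; the function name `darboux_sums` and the choice f(x) = x² on [0, 1] are illustrative assumptions. Exact rational arithmetic avoids rounding.

```python
from fractions import Fraction

def darboux_sums(f, a, b, n):
    """Lower and upper sums of a nondecreasing f on the uniform partition of [a, b] into n pieces."""
    width = Fraction(b - a, n)
    points = [a + i * width for i in range(n + 1)]
    lower = sum(f(points[i]) * width for i in range(n))      # inf is at the left endpoint
    upper = sum(f(points[i + 1]) * width for i in range(n))  # sup is at the right endpoint
    return lower, upper

def square(x):
    return x * x

for n in (1, 10, 100):
    lower, upper = darboux_sums(square, 0, 1, n)
    print(n, float(lower), float(upper), float(upper - lower))
```

For this f the gap U − L equals (f(1) − f(0))/n = 1/n, so it tends to 0 and both sums squeeze the common value 1/3 of the Riemann and Lebesgue integrals.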

Here is a slightly different proof.

Proof 2. We use the notation $\underline{\int} f$ for the lower Riemann integral of f on [a, b], $\overline{\int} f$ for the upper
Riemann integral of f on [a, b], and $\int f$ for the Lebesgue integral on [a, b]. Choose a sequence (P′n)
of partitions of [a, b] such that
\[ \lim_n L(f, P'_n, [a,b]) = \underline{\int} f. \]
Choose a sequence (P′′n) of partitions of [a, b] such that
\[ \lim_n U(f, P''_n, [a,b]) = \overline{\int} f. \]
Define $P_n = \bigcup_{i=1}^{n} (P'_i \cup P''_i)$ for each n. Then P1 ⊆ P2 ⊆ . . ., and
\[ \lim_n L(f, P_n, [a,b]) = \underline{\int} f, \qquad \lim_n U(f, P_n, [a,b]) = \overline{\int} f. \]
For each n, write Pn = {x0, . . . , xk}, and define
\[ s_n = \sum_{i=1}^{k} \Big( \inf_{[x_{i-1},x_i]} f \Big) \mathbf{1}_{[x_{i-1},x_i]}, \qquad t_n = \sum_{i=1}^{k} \Big( \sup_{[x_{i-1},x_i]} f \Big) \mathbf{1}_{[x_{i-1},x_i]} \]
so that
\[ \int s_n = L(f, P_n, [a,b]), \qquad \int t_n = U(f, P_n, [a,b]) \]
and
\[ \lim_n \int s_n = \underline{\int} f, \qquad \lim_n \int t_n = \overline{\int} f. \]
As in Proof 1, define s = limn sn and t = limn tn. Applying the monotone convergence theorem to
sn − s1, as in Proof 1, gives
\[ \int s = \lim_n \int s_n = \underline{\int} f. \]
A similar argument gives
\[ \int t = \lim_n \int t_n = \overline{\int} f. \]
Since f is Riemann integrable on [a, b], we have
\[ \int (t - s) = \overline{\int} f - \underline{\int} f = 0. \]
Since t − s ≥ 0, it follows that t − s = 0 a.e., hence s = t a.e. Since sn ≤ f ≤ tn for all n, we have
s ≤ f ≤ t. Therefore s = f = t a.e. Since s and t are measurable and the Lebesgue measure is
complete, f is measurable. Moreover,
\[ \underline{\int} f = \int s = \int f = \int t = \overline{\int} f. \]
So f is Lebesgue integrable and the Riemann and Lebesgue integrals are equal.

19 Lebesgue’s Condition for Riemann Integrability

Theorem 19.1. (Lebesgue’s Condition for Riemann Integrability) Let f : [a, b] → R be a bounded
function. Then f is Riemann integrable on [a, b] iff f is continuous at almost every point of [a, b].

The idea of the proof is as follows. According to Riemann’s condition, a bounded function f :
[a, b] → R will be Riemann integrable if and only if the sum
\[ \sum_{i=1}^{n} \Big( \sup_{[x_{i-1},x_i]} f - \inf_{[x_{i-1},x_i]} f \Big) \Delta x_i \]
can be made arbitrarily small by choosing an appropriate partition of [a, b]. Split this sum into two
parts, say S1 + S2, where S2 contains terms from subintervals where continuity makes the difference
$\sup_{[x_{i-1},x_i]} f - \inf_{[x_{i-1},x_i]} f$ small and S1 contains the remaining terms where discontinuities
prevent the difference from being small. In S2, each term is small, so a large number of terms can
occur and still keep S2 small. In S1, the terms may not be small, but they are bounded in size
(because f is bounded), so that S1 will be small if the sum of the lengths of the subintervals is
small. Hence we may expect that the set of discontinuities of a Riemann integrable function can
be covered by intervals whose total length is small.

To make the argument above precise, we need some preparation.

Definition 19.2. Let f : [a, b] → R be a bounded function. For any interval I ⊆ [a, b], the number
\[ \Omega_f(I) = \sup_{x \in I} f(x) - \inf_{x \in I} f(x) = \sup \{ |f(x) - f(y)| : x, y \in I \} \]
is called the oscillation of f on I.

Lemma 19.3. Let f : [a, b] → R be a bounded function.

(a) If I and J are intervals such that I ⊆ J ⊆ [a, b], then Ωf (I) ≤ Ωf (J).

(b) For every x ∈ [a, b], f is continuous at x iff for every ε > 0 there exists a δ > 0 such that
Ωf([a, b] ∩ (x − δ, x + δ)) < ε.

(c) Let D denote the set of points of [a, b] at which f is discontinuous. Then $D = \bigcup_{k=1}^{\infty} D_k$, where, for each k ∈ N,

Dk = {x : x ∈ [a, b], Ωf([a, b] ∩ (x − δ, x + δ)) ≥ 1/k for all δ > 0}.

(d) For each k ∈ N, Dk is closed.

(e) If E is a compact subset of [a, b] \ Dk, then there is a δ > 0 such that Ωf(J) < 1/k for every
interval J ⊆ E with ℓ(J) < δ.

Proof. The proofs of (a)-(c) are easy exercises.

(d): To see that Dk is closed, let (xn) be any sequence of points in Dk that converges to a
point x0 ∈ R. Since [a, b] is closed and xn ∈ [a, b] for all n, we have x0 ∈ [a, b]. Let δ > 0
be given. Choose n large enough that xn ∈ (x0 − δ, x0 + δ). Choose δ′ > 0 small enough that
(xn − δ′, xn + δ′) ⊆ (x0 − δ, x0 + δ). Since xn ∈ Dk, we have
\[ \Omega_f([a,b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}. \]
Since [a, b] ∩ (xn − δ′, xn + δ′) ⊆ [a, b] ∩ (x0 − δ, x0 + δ), part (a) gives
\[ \Omega_f([a,b] \cap (x_0 - \delta, x_0 + \delta)) \ge \Omega_f([a,b] \cap (x_n - \delta', x_n + \delta')) \ge \frac{1}{k}. \]
Thus x0 ∈ Dk. Hence Dk is closed.

(e): For each x ∈ E, we have x ∈ [a, b] \ Dk, so there exists a δ(x) > 0 such that
\[ \Omega_f([a,b] \cap (x - \delta(x), x + \delta(x))) < \frac{1}{k}. \]
Then $E \subseteq \bigcup_{x \in E} (x - \delta(x)/2, x + \delta(x)/2)$. Since E is compact, there exist finitely many points
x1, . . . , xm ∈ E such that $E \subseteq \bigcup_{i=1}^{m} (x_i - \delta(x_i)/2, x_i + \delta(x_i)/2)$. Define δ = (1/2) min {δ(x1), . . . , δ(xm)}.
Let J be any nonempty interval contained in E with ℓ(J) < δ. Then J intersects (xi − δ(xi)/2, xi + δ(xi)/2)
for some i. Since ℓ(J) < δ(xi)/2, we have J ⊆ (xi − δ(xi), xi + δ(xi)), and so
\[ \Omega_f(J) = \Omega_f([a,b] \cap J) \le \Omega_f([a,b] \cap (x_i - \delta(x_i), x_i + \delta(x_i))) < \frac{1}{k}. \]

Proof of Theorem 19.1. We will use the same notation as the lemma.

We first assume λ(D) = 0 and show that Riemann’s condition is satisfied. Let k ∈ N be arbitrary.
Since Dk ⊆ D, we have λ(Dk) = 0. Choose bounded open intervals I1, I2, . . . which cover Dk and
satisfy $\sum_{i=1}^{\infty} \ell(I_i) < \frac{1}{k}$. Since Dk is closed and bounded, Dk is compact. So Dk is covered by
finitely many of these intervals, say I1, . . . , IN. Define $E = [a,b] \setminus \bigcup_{i=1}^{N} I_i$. So E is compact and E ⊆ [a, b] \ Dk.
Thus, by part (e) of the lemma, there is a δ > 0 such that Ωf(J) < 1/k for every interval J ⊆ E with ℓ(J) < δ. Note E is the
union of finitely many closed subintervals of [a, b]. Further divide E into a finite number of closed
subintervals of length < δ. The endpoints of these subintervals form a partition P = {x0, . . . , xn}
of [a, b]. Then
\[ U(f, P, [a,b]) - L(f, P, [a,b]) = \sum_{i \in \mathcal{I}_1} \Omega_f([x_{i-1},x_i]) \Delta x_i + \sum_{i \in \mathcal{I}_2} \Omega_f([x_{i-1},x_i]) \Delta x_i \]
where $\mathcal{I}_2$ is the set of those i ∈ {1, . . . , n} such that [xi−1, xi] ⊆ E and $\mathcal{I}_1$ is the
set of remaining i ∈ {1, . . . , n}. For each $i \in \mathcal{I}_2$, [xi−1, xi] is an interval contained in E with length
< δ, and so Ωf([xi−1, xi]) < 1/k. Thus
\[ \sum_{i \in \mathcal{I}_2} \Omega_f([x_{i-1},x_i]) \Delta x_i \le \frac{1}{k} \sum_{i \in \mathcal{I}_2} \Delta x_i \le \frac{b-a}{k}. \]
For each $i \in \mathcal{I}_1$, the open interval (xi−1, xi) does not meet E. Since $\bigcup_{i \in \mathcal{I}_1} (x_{i-1},x_i) \subseteq \bigcup_{i=1}^{N} I_i$, we have
\[ \sum_{i \in \mathcal{I}_1} \Delta x_i \le \sum_{i=1}^{N} \ell(I_i) < \frac{1}{k}, \]
and so
\[ \sum_{i \in \mathcal{I}_1} \Omega_f([x_{i-1},x_i]) \Delta x_i \le \frac{\Omega_f([a,b])}{k}. \]
Therefore
\[ U(f, P, [a,b]) - L(f, P, [a,b]) \le \frac{\Omega_f([a,b]) + b - a}{k}. \]
Since such a partition exists for every k ∈ N, Riemann’s condition holds, and so f is Riemann integrable on
[a, b].

For the converse, we assume λ(D) > 0 and show that Riemann’s condition is not satisfied. We
must have λ(Dk) > 0 for some k ∈ N (because D is the union of the sets Dk and λ is countably subadditive). Set ε = λ(Dk)/k. For any partition P = {x0, . . . , xn} of
[a, b], we have
\[ U(f, P, [a,b]) - L(f, P, [a,b]) = \sum_{i=1}^{n} \Omega_f([x_{i-1},x_i]) \Delta x_i \ge \sum_{i \in I} \Omega_f([x_{i-1},x_i]) \Delta x_i \]
where I is the set of those i ∈ {1, . . . , n} such that (xi−1, xi) contains points of Dk. Then
$D_k \setminus P \subseteq \bigcup_{i \in I} (x_{i-1}, x_i)$, and so
\[ \sum_{i \in I} \Delta x_i = \sum_{i \in I} \lambda((x_{i-1},x_i)) \ge \lambda(D_k \setminus P) = \lambda(D_k) = k\varepsilon. \]
For each i ∈ I, there is a point x ∈ (xi−1, xi) ∩ Dk. Choose δ > 0 such that (x − δ, x + δ) ⊆ [xi−1, xi].
Then, since x ∈ Dk, we have
\[ \Omega_f([x_{i-1},x_i]) \ge \Omega_f([a,b] \cap (x - \delta, x + \delta)) \ge \frac{1}{k}. \]
Combining the above inequalities, we get
\[ U(f, P, [a,b]) - L(f, P, [a,b]) \ge \frac{1}{k} \cdot k\varepsilon = \varepsilon. \]
Since this holds for every partition P, Riemann’s condition is not satisfied, and so f is not Riemann integrable on [a, b].

20 Normed Spaces and Banach Spaces

In this section, K denotes either R or C, and V is a vector space with scalar field K. The proofs
of the theorems are left as exercises; they are simple extensions of the proofs of the analogous
theorems in R.
Definition 20.1. A norm on V is a function ‖·‖ : V → [0, ∞) such that the following conditions
are satisfied for all x, y ∈ V and c ∈ K:

(i) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality)

(ii) ‖cx‖ = |c|‖x‖

(iii) ‖x‖ = 0 if and only if x = 0

The pair (V, ‖·‖) is called a normed space. When the norm is clear from context, we write V
instead of (V, ‖·‖).
Example 20.2. For 1 ≤ p < ∞, define
\[ \|x\|_p = \Big( \sum_{i=1}^{n} |x_i|^p \Big)^{1/p} \]
for each x = (x1, . . . , xn) ∈ K^n. For p = ∞, define
\[ \|x\|_\infty = \max_{1 \le i \le n} |x_i| \]
for each x = (x1, . . . , xn) ∈ K^n. For each 1 ≤ p ≤ ∞, ‖·‖_p is a norm on K^n called the p-norm.

When p = 2, ‖·‖_p is also called the Euclidean norm.
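
As a quick numerical illustration of the p-norms on K^n, the sketch below evaluates them for a small vector and spot-checks the triangle inequality. The function name `p_norm` and the sample vectors are assumptions made for illustration; floating-point arithmetic means the values are approximate in general.

```python
def p_norm(x, p):
    """p-norm of a finite real or complex vector x, for 1 <= p <= infinity."""
    if p == float("inf"):
        return max(abs(t) for t in x)
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

x = (3.0, -4.0)
y = (1.0, 2.0)
for p in (1, 2, 3, float("inf")):
    x_plus_y = tuple(a + b for a, b in zip(x, y))
    # Triangle inequality: ||x + y||_p <= ||x||_p + ||y||_p
    print(p, p_norm(x, p), p_norm(x_plus_y, p) <= p_norm(x, p) + p_norm(y, p))
```

For x = (3, −4) the 1-norm is 7, the Euclidean norm is 5, and the ∞-norm is 4, which shows that different values of p measure the same vector differently.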
Definition 20.3. Let (V, ‖·‖) be a normed space. A sequence (xn) in V is called convergent
if there exists a point x ∈ V with the following property: For every ε > 0 there exists a positive
integer N such that for every integer n ≥ N we have ‖xn − x‖ < ε. In such case, x is called the
limit of (xn), we say that (xn) converges to x, and we write limn xn = x or xn → x. A sequence
which is not convergent is called divergent.

Theorem 20.4. Let (V, ‖·‖) be a normed space. If (xn) is a convergent sequence in V, then (xn)
has a unique limit. In other words, if (xn) is a convergent sequence in V such that xn → x ∈ V and
xn → y ∈ V, then x = y. (This justifies saying “the limit” rather than “a limit” in the definition.)

Theorem 20.5. Let (V, ‖·‖) be a normed space. Let x ∈ V and let (xn) be a sequence in V.
Then xn → x iff ‖xn − x‖ → 0.

Theorem 20.6. (Reverse Triangle Inequality) Let (V, ‖·‖) be a normed space. For every x, y ∈ V,
we have |‖x‖ − ‖y‖| ≤ ‖x − y‖.

Theorem 20.7. Let (V, ‖·‖) be a normed space. If (xn) and (yn) are convergent sequences in V
and c ∈ K, then

(a) limn (xn + yn) = limn xn + limn yn.

(b) limn cxn = c limn xn.

(c) limn ‖xn‖ = ‖limn xn‖.

Definition 20.8. Let (V, ‖·‖) be a normed space. A sequence (xn) in V is called Cauchy if for
every ε > 0, there exists a positive integer N such that for all integers
m, n ≥ N we have ‖xm − xn‖ < ε.

Theorem 20.9. Every convergent sequence in a normed space is Cauchy.

Definition 20.10. Let (V, ‖·‖) be a normed space. A set E ⊆ V is called complete if every
Cauchy sequence in E converges and its limit is in E. If V is complete, V is called a Banach
space.

Example 20.11. C^n and R^n are complete, but Q^n is not.

The following theorem will be useful.

Theorem 20.12. Let (V, ‖·‖) be a normed space. If (xn) is a Cauchy sequence in V and (xn) has
a subsequence which converges to x ∈ V, then (xn) converges to x.

21 Lp Spaces: Definitions and Basic Properties

In this section, K denotes either R or C.

Definition 21.1. Let (X, A, µ) be a measure space. Let f : X → K be a measurable function.

For each 1 ≤ p < ∞, define the p-norm of f to be
\[ \|f\|_p = \Big( \int |f(x)|^p \, d\mu(x) \Big)^{1/p} \]
with the convention that ∞^{1/p} = ∞. Define the ∞-norm of f to be

‖f‖_∞ = inf {M ∈ [0, ∞] : |f(x)| ≤ M for almost every x ∈ X}.

For 1 ≤ p ≤ ∞, define L^p = L^p(µ) = L^p(X, A, µ) to be the set of all measurable functions
f : X → K such that ‖f‖_p < ∞, where we adopt the convention that functions which are equal
almost everywhere are regarded as equal elements of L^p. Thus, to be accurate, L^p is the space of
equivalence classes of measurable functions f : X → K such that ‖f‖_p < ∞, where the equivalence
relation is “equal almost everywhere.” We will show below that ‖·‖_p is a norm on L^p. Unless
another norm is given, we always assume L^p is equipped with the p-norm.

Remark. Let (X, A, µ) be a measure space. Let f : X → K. The ∞-norm of f equals the essential
supremum of |f|:

‖f‖_∞ = ess sup |f| = inf {M ∈ [0, ∞] : |f(x)| ≤ M for almost every x ∈ X}.

It should be compared to the uniform norm of f, which equals the supremum of |f|:

‖f‖_u = sup |f| = sup {|f(x)| : x ∈ X} = inf {M ∈ [0, ∞] : |f(x)| ≤ M for every x ∈ X}.

It is easy to check (exercise) that

|f| ≤ ‖f‖_u, ‖f‖_∞ ≤ ‖f‖_u, |f| ≤ ‖f‖_∞ a.e.

If (X, A, µ) = (R, L, λ), the ∞-norm and the uniform norm are equal for continuous functions
f : R → K, but they are not equal in general. For example, the function f(x) = x 1_{Q∩[0,1]}(x) has
‖f‖_u = 1 and ‖f‖_∞ = 0.

Theorem 21.2. Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For every measurable function
f : X → K, we have ‖f‖_p = 0 iff f = 0 a.e. iff f = 0 in L^p.

Proof. The second equivalence is simply the convention that functions which are equal a.e. are
equal as elements of L^p. Now we prove the first equivalence. For 1 ≤ p < ∞, ‖f‖_p = 0 iff ∫ |f|^p = 0
iff |f| = 0 a.e. iff f = 0 a.e. Assume p = ∞. If f = 0 a.e., then |f| ≤ 0 a.e., so ‖f‖_∞ ≤ 0, hence
‖f‖_∞ = 0. Conversely, if ‖f‖_∞ = 0, then |f| ≤ ‖f‖_∞ = 0 a.e., then |f| = 0 a.e., then f = 0 a.e.

Theorem 21.3. Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. Let f : X → K be a measurable
function and let c ∈ K. Then ‖cf‖_p = |c|‖f‖_p. Consequently, if f ∈ L^p, then cf ∈ L^p.

Proof. If 1 ≤ p < ∞, then
\[ \|cf\|_p^p = \int |cf|^p = |c|^p \int |f|^p = |c|^p \|f\|_p^p. \]
If p = ∞ and c ≠ 0 (the case c = 0 is trivial), then

‖cf‖_∞ = inf {M ∈ [0, ∞] : |cf(x)| ≤ M for almost every x ∈ X}
= inf {M ∈ [0, ∞] : |f(x)| ≤ M/|c| for almost every x ∈ X}
= inf {|c|M′ : M′ ∈ [0, ∞], |f(x)| ≤ M′ for almost every x ∈ X}
= |c| inf {M′ : M′ ∈ [0, ∞], |f(x)| ≤ M′ for almost every x ∈ X}
= |c|‖f‖_∞.

The third equality is a simple exercise in showing that two sets are equal.

The next theorem will lead to the triangle inequality for L^p spaces. But it is also extremely
important in its own right.

Theorem 21.4. (Holder’s Inequality) Let (X, A, µ) be a measure space. Let 1 ≤ p, q ≤ ∞ be such
that p^{-1} + q^{-1} = 1, with the convention that ∞^{-1} = 0. (We say p and q are conjugate exponents, we
say q is the conjugate exponent of p, and vice versa.) If f, g : X → K are measurable, then

‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.

Consequently, if f ∈ L^p and g ∈ L^q, then fg ∈ L^1.

The proof of Holder’s inequality will use the following lemma.

Lemma 21.5. If a ≥ 0, b ≥ 0, p > 0, q > 0, and p^{-1} + q^{-1} = 1, then

ab ≤ p^{-1} a^p + q^{-1} b^q.

Proof. For fixed b, p, q, we consider h(a) = p^{-1} a^p + q^{-1} b^q − ab, a ∈ [0, ∞). Then h′(a) = a^{p−1} − b.
So h′(a) > 0 if a > b^{1/(p−1)} and h′(a) < 0 if a < b^{1/(p−1)}. Thus the minimum value of h(a) is

h(b^{1/(p−1)}) = p^{-1} (b^{1/(p−1)})^p + q^{-1} b^q − (b^{1/(p−1)}) b = (p^{-1} + q^{-1}) b^q − b^q = 0

where for the second equality we used that q = p/(p − 1). Therefore 0 ≤ h(a) = p^{-1} a^p + q^{-1} b^q − ab,
hence ab ≤ p^{-1} a^p + q^{-1} b^q. Note that equality happens iff a = b^{1/(p−1)}, which is equivalent to
a^p = b^q (because q = p/(p − 1)).
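
The lemma, including its equality case a^p = b^q, can be spot-checked numerically. This is only a sanity check under floating-point arithmetic, not a proof; the function name `young_gap` and the sample values are illustrative assumptions.

```python
def young_gap(a, b, p):
    """Return a^p/p + b^q/q - a*b, where q = p/(p-1) is the conjugate exponent of p > 1."""
    q = p / (p - 1.0)
    return a ** p / p + b ** q / q - a * b

# The gap should be nonnegative on a grid of sample points.
samples = [0.0, 0.5, 1.0, 2.0, 4.0]
print(min(young_gap(a, b, p) for a in samples for b in samples for p in (1.5, 2.0, 3.0)))

# Equality case: p = 3, q = 3/2, a = 2, b = 4 satisfy a^p = b^q = 8.
print(young_gap(2.0, 4.0, 3.0))
```

The first printed value is the smallest gap over the grid and should not be negative beyond rounding error; the second should be zero up to rounding, since a^p = b^q there.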

Proof of Holder’s Inequality. If 1 < p, q < ∞ and 0 < ‖f‖_p, ‖g‖_q < ∞, set a = |f(x)|/‖f‖_p,
b = |g(x)|/‖g‖_q, apply the lemma, then integrate. The result is
\[ \frac{1}{\|f\|_p \|g\|_q} \int |fg| \le \frac{1}{p \|f\|_p^p} \int |f(x)|^p + \frac{1}{q \|g\|_q^q} \int |g(x)|^q = \frac{1}{p} + \frac{1}{q} = 1. \]
The other cases are easy. If ‖f‖_p = 0, then f = 0 a.e., then fg = 0 a.e., then ‖fg‖_1 = 0, so the
inequality is trivial; likewise if ‖g‖_q = 0. If ‖f‖_p = ∞ and ‖g‖_q > 0, the inequality is clear; likewise
if ‖f‖_p > 0 and ‖g‖_q = ∞. If p = ∞, then |f| ≤ ‖f‖_∞ a.e., hence |fg| ≤ ‖f‖_∞ |g| a.e., and integrating
gives ‖fg‖_1 ≤ ‖f‖_∞ ‖g‖_1; likewise if q = ∞.
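
For counting measure on a finite set, integrals are finite sums, so Holder’s inequality can be checked directly on short vectors. The sketch below assumes 1 < p < ∞; the names `p_norm` and `holder_gap` and the sample data are illustrative assumptions.

```python
def p_norm(values, p):
    """p-norm of a finite sequence (the L^p norm for counting measure on a finite set)."""
    if p == float("inf"):
        return max(abs(v) for v in values)
    return sum(abs(v) ** p for v in values) ** (1.0 / p)

def holder_gap(f, g, p):
    """Return ||f||_p * ||g||_q - ||fg||_1 for the conjugate exponent q of p, 1 < p < infinity."""
    q = p / (p - 1.0)
    product = [a * b for a, b in zip(f, g)]
    return p_norm(f, p) * p_norm(g, q) - p_norm(product, 1)

f = [1.0, 2.0, 3.0]
g = [4.0, -5.0, 6.0]
print(holder_gap(f, g, 2.0))                       # strictly positive here
print(holder_gap(f, [2.0 * v for v in f], 2.0))    # near zero: g is a multiple of f
```

With p = q = 2 this is the Cauchy-Schwarz inequality, and the gap vanishes (up to rounding) when g is a scalar multiple of f, matching the equality case of the lemma.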

The next theorem is the triangle inequality for L^p spaces.

Theorem 21.6. (Minkowski’s Inequality) Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. Let
f, g : X → K be measurable functions. Then

‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

Consequently, if f, g ∈ L^p, then f + g ∈ L^p.

Proof. If either ‖f‖_p = ∞ or ‖g‖_p = ∞, the inequality is obvious. So we assume f, g ∈ L^p. If p = 1
or p = ∞, the proof is easy and is left as an exercise. Assume 1 < p < ∞. Note

|f + g|^p = |f + g||f + g|^{p−1} ≤ |f||f + g|^{p−1} + |g||f + g|^{p−1}.

Integrating and applying Holder’s inequality to the two terms on the right gives
\[ \int |f+g|^p \le \|f\|_p \, \big\| |f+g|^{p-1} \big\|_q + \|g\|_p \, \big\| |f+g|^{p-1} \big\|_q = (\|f\|_p + \|g\|_p) \Big( \int |f+g|^{(p-1)q} \Big)^{1/q} \]
with p^{-1} + q^{-1} = 1. Noting (p − 1)q = p and 1/q = 1 − 1/p gives
\[ \int |f+g|^p \le (\|f\|_p + \|g\|_p) \Big( \int |f+g|^p \Big)^{1 - 1/p}, \]
hence
\[ \|f+g\|_p^p \le (\|f\|_p + \|g\|_p) \, \|f+g\|_p^{p-1}. \]
If ‖f + g‖_p > 0, dividing by ‖f + g‖_p^{p−1} gives the result. If ‖f + g‖_p = 0, the result is trivial.

Combining the theorems above gives

Theorem 21.7. Let (X, A, µ) be a measure space. For each 1 ≤ p ≤ ∞, L^p is a vector space and
‖·‖_p is a norm on L^p.

22 Lp Spaces: Completeness

Theorem 22.1. Let (X, A, µ) be a measure space. For every 1 ≤ p ≤ ∞, L^p = L^p(X, A, µ) is
complete.

Proof. Case: p = ∞. Exercise.

Case: 1 ≤ p < ∞. Let (fn) be a Cauchy sequence in L^p. Inductively choose positive integers n_j
(j = 1, 2, . . .) such that
\[ \|f_m - f_n\|_p < 2^{-j} \quad \text{for all } m, n \ge n_j \]
and n_j < n_{j+1}. Then (f_{n_j})_{j=1}^∞ is a subsequence of (fn). By Theorem 20.12, to show that (fn)
converges in L^p, it will suffice to show that (f_{n_j}) converges in L^p. Note
\[ f_{n_k} = f_{n_1} + \sum_{j=1}^{k-1} (f_{n_{j+1}} - f_{n_j}). \tag{22.1} \]
Define
\[ G_k = |f_{n_1}| + \sum_{j=1}^{k-1} |f_{n_{j+1}} - f_{n_j}|, \qquad G = |f_{n_1}| + \sum_{j=1}^{\infty} |f_{n_{j+1}} - f_{n_j}|. \]
Then, by Minkowski’s inequality,
\[ \Big( \int G_k^p \Big)^{1/p} = \|G_k\|_p \le \|f_{n_1}\|_p + \sum_{j=1}^{k-1} 2^{-j} < \|f_{n_1}\|_p + 1 \]
for each k. Note G_k^p ↑ G^p. Then the monotone convergence theorem gives
\[ \int G^p = \lim_k \int G_k^p \le (\|f_{n_1}\|_p + 1)^p < \infty. \]
So G^p is integrable. It follows that G^p is finite a.e., which implies G is finite a.e. The latter means
that, for a.e. x ∈ X, the series
\[ f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x)) \]
of complex numbers is absolutely convergent and (hence) convergent. For those x in the set of
measure zero where the series diverges, define F(x) = 0. For those x at which the series converges,
define
\[ F(x) = f_{n_1}(x) + \sum_{j=1}^{\infty} (f_{n_{j+1}}(x) - f_{n_j}(x)). \]
So, by (22.1),
f_{n_k} → F a.e.
We will show that f_{n_k} → F in L^p. This will complete the proof. We have |F| ≤ G, and so
∫ |F|^p ≤ ∫ G^p < ∞. Thus F ∈ L^p. We have

|f_{n_k} − F|^p → 0 a.e.

We also have |f_{n_k}| ≤ G_k ≤ G, and so
|f_{n_k} − F|^p ≤ (G_k + |F|)^p ≤ (2G)^p.
Since (2G)^p is integrable, the dominated convergence theorem gives
\[ \lim_k \|f_{n_k} - F\|_p^p = \lim_k \int |f_{n_k} - F|^p = 0. \]
Thus f_{n_k} → F in L^p.

23 Lp Spaces: Dense Subspaces (Optional)

Definition 23.1. Let (V, ‖·‖) be a normed space. A set E ⊆ V is called dense if for each x in V
there exists a sequence (xn) in E such that xn → x.

Theorem 23.2. Let (X, A, µ) be a measure space. Let 1 ≤ p ≤ ∞. For each f ∈ L^p and each
ε > 0, there exists a simple function s ∈ L^p such that ‖f − s‖_p < ε. In other words, the set of
simple functions in L^p is a dense subset of L^p.

Proof. If the theorem holds for real-valued f, then it holds for complex-valued f by considering
real and imaginary parts. So we assume f is real-valued.

Case 1: 1 ≤ p < ∞. Let f ∈ L^p and ε > 0 be given. By Theorem 6.3, we can choose a sequence
of simple measurable functions sn such that |sn| ≤ |f| for each n and sn → f pointwise. Then
|sn − f|^p → 0 pointwise and |sn − f|^p ≤ (|sn| + |f|)^p ≤ (2|f|)^p ∈ L^1. By the dominated convergence
theorem,
\[ \lim_n \|s_n - f\|_p^p = \lim_n \int |s_n - f|^p = \int 0 = 0, \]
and so ‖sn − f‖_p → 0. Therefore there exists N ∈ N such that ‖s_N − f‖_p < ε. Take s = s_N. Note
s ∈ L^p because ‖s‖_p ≤ ‖s − f‖_p + ‖f‖_p < ε + ‖f‖_p < ∞.

Case 2: p = ∞. Let f ∈ L^∞ and ε > 0 be given. Since |f| ≤ ‖f‖_∞ < ∞ a.e., there exists
A ∈ A such that µ(A^c) = 0 and |f(x)| ≤ ‖f‖_∞ for all x ∈ A. So f 1_A is bounded by the number
‖f‖_∞. By Theorem 6.3, we can choose a sequence of simple measurable functions sn such that
sn → f 1_A uniformly. Thus ‖sn − f 1_A‖_u → 0. Since ‖sn − f 1_A‖_∞ ≤ ‖sn − f 1_A‖_u, we also have
‖sn − f 1_A‖_∞ → 0. Finally, since sn − f 1_A = sn − f a.e., we have ‖sn − f‖_∞ = ‖sn − f 1_A‖_∞ → 0.
Thus ‖sn − f‖_∞ → 0. Therefore there exists N ∈ N such that ‖s_N − f‖_∞ < ε. Take s = s_N. Note
s ∈ L^∞ because ‖s‖_∞ ≤ ‖s − f‖_∞ + ‖f‖_∞ < ε + ‖f‖_∞ < ∞.

The next lemma tells us what the simple functions in Lp spaces look like.

Lemma 23.3. Let (X, A, µ) be a measure space.

(a) Every measurable simple function is in L∞ .

(b) If s is a simple function and 1 ≤ p < ∞, then s ∈ Lp iff s is measurable and s = 0 outside a
set of finite measure.

Proof. (a): Every measurable simple function is bounded, hence belongs to L∞ .

(b): Let $s = \sum_{i=1}^{n} c_i \mathbf{1}_{E_i}$ be the standard representation of s. So c1, . . . , cn ∈ C are the
distinct numbers in the range of s and Ei = s^{-1}(ci). Since E1, . . . , En are disjoint, we have $|s|^p = \sum_{i=1}^{n} |c_i|^p \mathbf{1}_{E_i}$.

Suppose s ∈ L^p. Then s is measurable and $\|s\|_p^p = \int |s|^p = \sum_{i=1}^{n} |c_i|^p \mu(E_i) < \infty$. Let E be the
union of those sets Ei with ci ≠ 0. Then µ(E) < ∞ and E^c is the union of those sets Ei with
ci = 0. So for all x ∈ E^c we have s(x) = 0. (Note that if E^c = ∅, then it is vacuously true that
s(x) = 0 for all x ∈ E^c.)

Conversely, suppose s is measurable and s = 0 outside a set of finite measure. So there exists a set
A ∈ A such that µ(A) < ∞ and s(x) = 0 for all x ∈ A^c. Then
\[ \|s\|_p^p = \int |s|^p \mathbf{1}_A + \int |s|^p \mathbf{1}_{A^c} = \int |s|^p \mathbf{1}_A = \int \sum_{i=1}^{n} |c_i|^p \mathbf{1}_{E_i \cap A} = \sum_{i=1}^{n} |c_i|^p \mu(E_i \cap A). \]
Since µ(A ∩ Ei) ≤ µ(A) < ∞ for each i, it follows that ‖s‖_p^p < ∞, hence s ∈ L^p.

Now we give some approximation theorems for the specific case of Lebesgue measure on (R, L).

Definition 23.4. A function h : R → C is called a step function if
\[ h = \sum_{k=1}^{n} c_k \mathbf{1}_{I_k} \]
for some numbers c1, . . . , cn ∈ C and intervals I1, . . . , In.

Theorem 23.5. Let (X, A, µ) = (R, L, λ). Let 1 ≤ p < ∞. For each f ∈ L^p and each ε > 0, there
exists a step function h ∈ L^p such that ‖f − h‖_p < ε. In other words, the set of step functions in
L^p is a dense subset of L^p.

Proof. Let f ∈ L^p and let ε > 0. By the previous theorem, there exists a simple function s ∈ L^p such
that ‖f − s‖_p < ε/2. So it will suffice to find a step function h ∈ L^p such that ‖s − h‖_p < ε/2. Let
c1, . . . , cn be the distinct, non-zero numbers in the range of s. Let Ei = s^{-1}(ci) for each i. Then
\[ s = \sum_{i=1}^{n} c_i \mathbf{1}_{E_i}. \]
For each i = 1, . . . , n, we will find a step function hi ∈ L^p such that
\[ \|\mathbf{1}_{E_i} - h_i\|_p < \frac{\varepsilon}{2|c_i|n}. \]
Then the function defined by
\[ h = \sum_{i=1}^{n} c_i h_i \]
will be the desired step function in L^p with ‖s − h‖_p < ε/2.

Fix i ∈ {1, . . . , n}. Choose δ > 0 such that 2(2δ)^{1/p} < ε/(2|ci|n). Since s ∈ L^p, 1 ≤ p < ∞, and
ci ≠ 0, the previous lemma implies λ(Ei) < ∞. By the definition of Lebesgue measure, there exists
a sequence of intervals (Ij) such that $E_i \subseteq \bigcup_{j=1}^{\infty} I_j$ and
\[ \lambda(E_i) \le \sum_{j=1}^{\infty} \ell(I_j) \le \lambda(E_i) + \delta. \]
Since λ(Ei) < ∞, each Ij is a finite interval. Choose N ∈ N such that
\[ \sum_{j=1}^{N} \ell(I_j) \le \sum_{j=1}^{\infty} \ell(I_j) \le \sum_{j=1}^{N} \ell(I_j) + \delta. \]
Define hi to be the indicator function of $\bigcup_{j=1}^{N} I_j$. Then hi is a step function (verify). Since each
Ij is a finite interval, hi ∈ L^p. Define gi to be the indicator function of $\bigcup_{j=1}^{\infty} I_j$. Note hi ≤ gi and
1_{Ei} ≤ gi. We have
\[ \begin{aligned} 2^{-p} \|h_i - \mathbf{1}_{E_i}\|_p^p &\le \|g_i - h_i\|_p^p + \|g_i - \mathbf{1}_{E_i}\|_p^p = \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus \bigcup_{j=1}^{N} I_j \Big) + \lambda\Big( \bigcup_{j=1}^{\infty} I_j \setminus E_i \Big) \\ &= \Big[ \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda\Big( \bigcup_{j=1}^{N} I_j \Big) \Big] + \Big[ \lambda\Big( \bigcup_{j=1}^{\infty} I_j \Big) - \lambda(E_i) \Big] \\ &\le 2\delta. \end{aligned} \]
Hence ‖hi − 1_{Ei}‖_p ≤ 2(2δ)^{1/p}. By the choice of δ, 2(2δ)^{1/p} < ε/(2|ci|n), and so
\[ \|\mathbf{1}_{E_i} - h_i\|_p < \frac{\varepsilon}{2|c_i|n}. \]

Lemma 23.6. Let (X, A, µ) = (R, L, λ). Let 1 ≤ p < ∞. A step function h : R → C belongs to
L^p iff h = 0 outside a finite interval.

Proof. Exercise. Similar to the previous lemma.

Theorem 23.7. Let (X, A, µ) = (R, L, λ). Let 1 ≤ p < ∞. For each f ∈ L^p and each ε > 0,
there exists a continuous function g such that g = 0 outside a finite interval and ‖f − g‖_p < ε.
Consequently, the set of continuous functions g such that g = 0 outside a finite interval is a dense
subset of L^p.

Proof. Let f ∈ L^p and ε > 0 be given. By the previous theorem, there exists a step function h ∈ L^p
such that ‖f − h‖_p < ε/2. So it will suffice to find a continuous function g such that ‖h − g‖_p < ε/2
and g = 0 outside a finite interval. We have
\[ h = \sum_{k=1}^{n} c_k \mathbf{1}_{I_k} \]
for some numbers c1, . . . , cn ∈ C and intervals I1, . . . , In. The sum does not change if we delete
terms with ck = 0 or Ik = ∅. So we assume that ck ≠ 0 and Ik ≠ ∅ for all k. The previous lemma
implies Ik is finite for all k. For each k ∈ {1, . . . , n}, we will define a continuous function gk such
that
\[ \|\mathbf{1}_{I_k} - g_k\|_p < \frac{\varepsilon}{2|c_k|n} \]
and gk = 0 outside Ik. Then the function defined by
\[ g = \sum_{k=1}^{n} c_k g_k \]
will be a continuous function such that ‖h − g‖_p < ε/2 and g = 0 outside a finite interval containing
$\bigcup_{k=1}^{n} I_k$. This will also give the density statement of the theorem, as the set of continuous
functions g such that g = 0 outside a finite interval is easily checked to be a subset of L^p.

Let k ∈ {1, . . . , n} be given, and let a ≤ b denote the endpoints of Ik. If a = b, then 1_{Ik} = 0 a.e. and
we take gk = 0. Otherwise, choose δ with 0 < δ < (b − a)/2 so that (2δ)^{1/p} < ε/(2|ck|n). Define gk to be the continuous
function that equals 0 on (−∞, a] and [b, ∞), equals 1 on [a + δ, b − δ], and is linear on [a, a + δ]
and [b − δ, b] (draw a picture). Set A = [a, a + δ] ∪ [b − δ, b]. Note λ(A) ≤ 2δ and 0 ≤ 1_{Ik} − gk ≤ 1_A a.e.
Then
\[ \|\mathbf{1}_{I_k} - g_k\|_p \le \|\mathbf{1}_A\|_p = \lambda(A)^{1/p} \le (2\delta)^{1/p} < \frac{\varepsilon}{2|c_k|n}. \]
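
The piecewise linear function gk from the proof can be written down explicitly and its L^p distance to the indicator of [a, b] estimated numerically. The sketch below uses a midpoint Riemann sum, so the result is an approximation; the names `trapezoid_bump` and `lp_error`, the step count, and the sample parameters are assumptions made for illustration. On each ramp the difference 1 − gk is linear, so the exact value of the p-th power of the norm is 2δ/(p + 1), which is indeed at most 2δ.

```python
def trapezoid_bump(x, a, b, delta):
    """Continuous function: 0 outside (a, b), 1 on [a + delta, b - delta], linear in between."""
    if x <= a or x >= b:
        return 0.0
    if x < a + delta:
        return (x - a) / delta
    if x > b - delta:
        return (b - x) / delta
    return 1.0

def lp_error(a, b, delta, p, steps=20000):
    """Midpoint-rule estimate of the L^p norm of (indicator of [a, b]) - trapezoid_bump."""
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * width
        total += abs(1.0 - trapezoid_bump(x, a, b, delta)) ** p * width
    return total ** (1.0 / p)

a, b, delta, p = 0.0, 1.0, 0.1, 2.0
print(lp_error(a, b, delta, p))            # numerical estimate
print((2 * delta / (p + 1)) ** (1.0 / p))  # exact value of the norm
print((2 * delta) ** (1.0 / p))            # the bound used in the proof
```

Shrinking δ drives all three numbers to 0, which is the mechanism behind the density of continuous functions vanishing outside a finite interval.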

24 Counting Measure and `p Spaces

Theorem 24.1. Let X be a nonempty set. Let f : X → [0, ∞]. Let µ be counting measure on
(X, P(X)). (Recall that counting measure is defined as follows: µ(A) is the number of elements in
A if A is a finite subset of X, and µ(A) = ∞ if A is an infinite subset of X).

(a) f is measurable.

(b) $\int f \, d\mu = \sup \big\{ \sum_{x \in F} f(x) : F \subseteq X, \ F \text{ is finite} \big\}$.

(c) If X = {1, . . . , n}, then $\int f \, d\mu = \sum_{k=1}^{n} f(k)$.

(d) If X = N, then $\int f \, d\mu = \sum_{k=1}^{\infty} f(k)$.

Proof. Exercise.

Definition 24.2. If X is any nonempty set and µ is counting measure on (X, P(X)), we
define ℓ^p(X) = L^p(X, P(X), µ).

Corollary 24.3. If 1 ≤ p ≤ ∞, X = {1, . . . , n}, and µ is the counting measure on (X, P(X)), then
ℓ^p(X) = C^n and (ℓ^p(X), ‖·‖_p) = (C^n, ‖·‖_p).

Corollary 24.4. If 1 ≤ p ≤ ∞, then ℓ^p(N) is the space of all sequences (an)n∈N in C such that
‖(an)‖_p < ∞, where
\[ \|(a_n)\|_p = \begin{cases} \big( \sum_{i=1}^{\infty} |a_i|^p \big)^{1/p} & \text{if } 1 \le p < \infty, \\ \sup_{i \in \mathbb{N}} |a_i| & \text{if } p = \infty. \end{cases} \]
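
A standard example separating the ℓ^p spaces is the sequence a_n = 1/n, which lies in ℓ^2(N) but not in ℓ^1(N). The sketch below computes partial norms only, so it is suggestive rather than conclusive about convergence; the names `partial_lp_norm` and `harmonic` are illustrative assumptions.

```python
import math

def partial_lp_norm(term, p, count):
    """p-norm of the first `count` terms term(1), ..., term(count), for 1 <= p < infinity."""
    return sum(abs(term(n)) ** p for n in range(1, count + 1)) ** (1.0 / p)

def harmonic(n):
    return 1.0 / n

for count in (10, 1000, 100000):
    print(count, partial_lp_norm(harmonic, 1, count), partial_lp_norm(harmonic, 2, count))

# Known closed form of the l^2 norm of (1/n): sqrt(pi^2 / 6).
print(math.pi / math.sqrt(6))
```

The partial 1-norms keep growing (like log of the number of terms), while the partial 2-norms settle near π/√6, consistent with the sequence belonging to ℓ^2(N) only.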

25 Complex-Valued Measurable Functions

If X is a set and f : X → C, then f = Re(f ) + i Im(f ), where Re(f ) : X → R and Im(f ) : X → R

are defined by Re(f )(x) = Re(f (x)) and Im(f )(x) = Im(f (x)) for every x ∈ X.

Definition 25.1. Let (X, A) be a measurable space. A function f : X → C is called measurable

if Re(f ) and Im(f ) are measurable.

Theorem 25.2. Let (X, A) be a measurable space. Let f, g : X → C be measurable functions.

Let c ∈ C. The following functions are measurable.

(i) cf

(ii) f + g

(iii) f g

(iv) |f |

Proof. (i): Write c = a + ib and f = u + iv (where a = Re(c), b = Im(c), u = Re(f ), v = Im(f )).
Then cf = au − bv + i(av + bu). So Re(cf ) = au − bv and Im(cf ) = av + bu are measurable.
Therefore cf is measurable.

(ii) and (iii): Similar to (i).

(iv): For each a < 0, {x ∈ X : |f(x)| > a} = X ∈ A. For each a ≥ 0,

{x ∈ X : |f(x)| > a} = {x ∈ X : |f(x)|^2 > a^2} = {x ∈ X : (Re(f)(x))^2 + (Im(f)(x))^2 > a^2} ∈ A.

Theorem 25.3. Let (X, A) be a measurable space. If fn : X → C is measurable for n = 1, 2, . . .

and limn fn (x) exists for each x ∈ X, then limn fn is measurable.

Proof. Exercise.

Theorem 25.4. Let (X, A) be a measurable space. If f : X → C is measurable, then f −1 ({a}) =

{x ∈ X : f (x) = a} is measurable (i.e., belongs to A) for each a ∈ C.

Proof. Exercise.

26 Caratheodory Construction: Measures from Outer Measures

Lemma 26.1. Let X be any set. Let A be a σ-algebra on X. Let µ∗ be an outer measure on X.
Then µ∗ is a measure on A iff µ∗ is finitely additive on A. More precisely, the restriction of µ∗ to
A is a measure on A iff µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅.

Proof.
⇒: Theorem 5.2 (a),(c),(d).

⇐: Let A1, A2, . . . be any sequence of disjoint sets in A. By countable subadditivity of µ∗,
$\mu^*\big( \bigcup_{i=1}^{\infty} A_i \big) \le \sum_{i=1}^{\infty} \mu^*(A_i)$. To prove the reverse inequality, we now assume µ∗ is finitely additive on A. We have
µ∗(A ∪ B) = µ∗(A) + µ∗(B) for every A, B ∈ A such that A ∩ B = ∅. By induction, we have
$\mu^*\big( \bigcup_{i=1}^{n} A_i \big) = \sum_{i=1}^{n} \mu^*(A_i)$ for every n ∈ N. By monotonicity and taking a limit,
\[ \sum_{i=1}^{\infty} \mu^*(A_i) = \lim_{n \to \infty} \sum_{i=1}^{n} \mu^*(A_i) = \lim_{n \to \infty} \mu^*\Big( \bigcup_{i=1}^{n} A_i \Big) \le \lim_{n \to \infty} \mu^*\Big( \bigcup_{i=1}^{\infty} A_i \Big) = \mu^*\Big( \bigcup_{i=1}^{\infty} A_i \Big). \]

Lemma 26.2. Let X be a set. Let A be an algebra on X. Let µ : A → [0, ∞] be a function. The
following are equivalent.

1. µ(A ∪ B) = µ(A) + µ(B) for every A, B ∈ A such that A ∩ B = ∅.

2. µ(E) = µ(E ∩ F ) + µ(E ∩ F c ) for every E, F ∈ A.

Proof. ⇒: Set A = E ∩ F and B = E ∩ F^c. ⇐: Set E = A ∪ B and F = A.

Definition 26.3. Let X be a set. Let µ∗ be an outer measure on X. A µ∗ -measurable (or

Caratheodory measurable) set is a set A ∈ P(X) such that

µ∗ (E) = µ∗ (E ∩ A) + µ∗ (E ∩ Ac )

for every E ∈ P(X).

Note that since the ≤ inequality follows immediately from finite subadditivity of µ∗ , a set A ∈ P(X)
is µ∗ -measurable iff
µ∗ (E) ≥ µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
for every E ∈ P(X).

A µ∗ -measurable set A can be thought of as a sharp knife; it can cut any set E into two pieces
E ∩ A and E ∩ Ac so cleanly that the sum of the outer measure of the pieces equals the outer
measure of the whole.
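
On a finite set the Caratheodory condition can be tested by brute force over all subsets E. The sketch below does this for a toy outer measure on X = {0, 1, 2} that assigns 0 to ∅, 2 to X, and 1 to every other subset; this outer measure and the helper names `powerset`, `measurable_sets`, and `toy_outer` are assumptions made purely for illustration.

```python
from itertools import combinations

def powerset(universe):
    """All subsets of a finite set, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]

def measurable_sets(universe, outer):
    """Sets A with outer(E) == outer(E & A) + outer(E - A) for every subset E of the universe."""
    subsets = powerset(universe)
    return [A for A in subsets
            if all(outer(E) == outer(E & A) + outer(E - A) for E in subsets)]

X = frozenset({0, 1, 2})

def toy_outer(E):
    """Hypothetical outer measure: 0 on the empty set, 2 on X, 1 on every other subset."""
    if not E:
        return 0
    return 2 if E == X else 1

print([sorted(A) for A in measurable_sets(X, toy_outer)])
print(len(measurable_sets(X, len)))  # counting measure as an outer measure
```

For the toy outer measure only ∅ and X pass the test: for example A = {0} fails with E = {0, 1}, since µ∗(E) = 1 but the two pieces have outer measure 1 each. For counting measure every subset is a sharp knife.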
Theorem 26.4 (Caratheodory’s Restriction Theorem). Let X be a set. Let µ∗ be an outer measure
on X. Let A∗ be the collection of µ∗-measurable sets.

(a) If A ∈ P(X) and µ∗(A) = 0, then A ∈ A∗.
(b) A∗ is a σ-algebra on X.
(c) The restriction of µ∗ to A∗ is a measure on A∗.

Proof. Step 0. If A ∈ P(X) and µ∗ (A) = 0, then A ∈ A∗ .

Assume A ∈ P(X) and µ∗ (A) = 0. For any E ∈ P(X),

µ∗ (E ∩ A) + µ∗ (E ∩ Ac ) ≤ µ∗ (A) + µ∗ (E) = µ∗ (E).

So A ∈ A∗ .

Step 1. ∅ ∈ A∗ .

For any E ∈ P(X),

µ∗ (E ∩ ∅) + µ∗ (E ∩ ∅c ) = µ∗ (∅) + µ∗ (E) = µ∗ (E).
So ∅ ∈ A∗ .

Step 2. If A ∈ A∗, then A^c ∈ A∗.

If A ∈ A∗, then, for any E ∈ P(X),

µ∗(E) = µ∗(E ∩ A) + µ∗(E ∩ A^c) = µ∗(E ∩ A^c) + µ∗(E ∩ (A^c)^c),

and so A^c ∈ A∗.

Step 3. If A, B ∈ A∗ , then A ∪ B ∈ A∗ .

Let E ∈ P(X). Then

µ∗ (E) = µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
= µ∗ (E ∩ A ∩ B) + µ∗ (E ∩ A ∩ B c ) + µ∗ (E ∩ Ac ∩ B)
+ µ∗ (E ∩ Ac ∩ B c ).

But A ∪ B = (A ∩ B) ∪ (A ∩ B^c) ∪ (A^c ∩ B) and A^c ∩ B^c = (A ∪ B)^c. So, by finite subadditivity,

µ∗ (E ∩ (A ∪ B)) + µ∗ (E ∩ (A ∪ B)c ) ≤ µ∗ (E ∩ A ∩ B) + µ∗ (E ∩ A ∩ B c ) + µ∗ (E ∩ Ac ∩ B)
+ µ∗ (E ∩ Ac ∩ B c )
= µ∗ (E).

Thus A ∪ B ∈ A∗ .

Step 4. A∗ is an algebra.

Steps 1, 2, and 3 combined show that A∗ is an algebra.

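Step 5. If B1, B2, . . . , Bn are disjoint sets in A∗ and if E ∈ P(X), then
\[ \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \sum_{i=1}^{n} \mu^*(E \cap B_i). \]
Since Bn ∈ A∗,
\[ \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \cap B_n \Big) + \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \cap B_n^c \Big). \]
Since B1, B2, . . . , Bn are disjoint sets,
\[ E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n = E \cap B_n \]
and
\[ E \cap \Big( \bigcup_{i=1}^{n} B_i \Big) \cap B_n^c = E \cap \Big( \bigcup_{i=1}^{n-1} B_i \Big). \]
Therefore
\[ \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) = \mu^*(E \cap B_n) + \mu^*\Big( E \cap \bigcup_{i=1}^{n-1} B_i \Big). \]
Repeating the argument (or induction) gives the desired result.

Step 6. If B1, B2, . . . are disjoint sets in A∗, then $\bigcup_{i=1}^{\infty} B_i \in$ A∗.

Let E ∈ P(X). Let n ∈ N be arbitrary. Since A∗ is an algebra (Step 4), $\bigcup_{i=1}^{n} B_i \in$ A∗, and so
\[ \mu^*(E) = \mu^*\Big( E \cap \bigcup_{i=1}^{n} B_i \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big)^c \Big). \]
By Step 5,
\[ \mu^*(E) = \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{n} B_i \Big)^c \Big). \]
Since $\big( \bigcup_{i=1}^{\infty} B_i \big)^c \subseteq \big( \bigcup_{i=1}^{n} B_i \big)^c$, monotonicity of µ∗ gives
\[ \mu^*(E) \ge \sum_{i=1}^{n} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]
Letting n → ∞ gives
\[ \mu^*(E) \ge \sum_{i=1}^{\infty} \mu^*(E \cap B_i) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]
By countable subadditivity of µ∗,
\[ \mu^*(E) \ge \mu^*\Big( E \cap \bigcup_{i=1}^{\infty} B_i \Big) + \mu^*\Big( E \cap \Big( \bigcup_{i=1}^{\infty} B_i \Big)^c \Big). \]
Therefore $\bigcup_{i=1}^{\infty} B_i \in$ A∗.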

69
Step 7. If A1 , A2 , . . . ∈ A∗ , then $\bigcup_{i=1}^{\infty} A_i \in \mathcal{A}^*$.

Define B1 = A1 , B2 = A2 \ A1 , B3 = A3 \ (A1 ∪ A2 ), . . .. Note B1 , B2 , . . . are disjoint sets. Since A∗ is an algebra (Step 4), B1 , B2 , . . . ∈ A∗ . By Step 6,

\[ \bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i \in \mathcal{A}^*. \]

Step 8. The restriction of µ∗ to A∗ is a measure on A∗ .

For every E, F ∈ A∗ , we have

µ∗ (E) = µ∗ (E ∩ F ) + µ∗ (E ∩ F c ).

So Lemma 26.2 implies µ∗ is finitely additive on A∗ . Then Lemma 26.1 implies µ∗ is a measure on A∗ .
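
The Carathéodory condition can be checked by brute force on a small finite set. The following is a minimal sketch, not part of the proof: the three-point set `X` and the outer measure `outer` (0 on the empty set, 2 on all of X, 1 on every other subset) are hypothetical choices made only for illustration. For this outer measure the only measurable sets turn out to be ∅ and X, so A∗ can be much smaller than P(X).

```python
from itertools import combinations

X = frozenset({0, 1, 2})  # hypothetical three-point set


def powerset(s):
    """Return all subsets of s as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def outer(E):
    """Hypothetical outer measure: 0 on the empty set, 2 on X, 1 otherwise."""
    if not E:
        return 0
    return 2 if E == X else 1


def is_measurable(A):
    """Caratheodory condition: outer(E) = outer(E & A) + outer(E - A) for all E."""
    return all(outer(E) == outer(E & A) + outer(E - A) for E in powerset(X))


measurable_sets = [A for A in powerset(X) if is_measurable(A)]
print(measurable_sets)
```

On {∅, X} the restriction of this outer measure is trivially a measure, in line with Step 8.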

27 Lebesgue Measure and Lebesgue σ-Algebra

Definition 27.1. Let λ∗ be the Lebesgue outer measure on R. The σ-algebra of all λ∗ -measurable
sets is called the Lebesgue σ-algebra on R. It is denoted by L. The elements of L are called the
Lebesgue measurable sets. The restriction of λ∗ to L is a measure called Lebesgue measure
and is denoted by λ.
Theorem 27.2. The Borel σ-algebra on R is contained in the Lebesgue σ-algebra on R, i.e., B ⊆ L.

Proof. Since B is the smallest σ-algebra containing the open sets, we just need to show that L
contains every open set. We first show that L contains every interval of the form (a, ∞). Let I be
any open interval of the form (a, ∞). We must show

λ∗ (E) = λ∗ (E ∩ I) + λ∗ (E ∩ I c )

for every E ∈ P(R). Let E ∈ P(R). By subadditivity,

λ∗ (E) ≤ λ∗ (E ∩ I) + λ∗ (E ∩ I c ).

It remains to show
λ∗ (E) ≥ λ∗ (E ∩ I) + λ∗ (E ∩ I c ).
If λ∗ (E) = ∞, we are done. Assume λ∗ (E) < ∞. Let ε > 0 be given. There exists a sequence of intervals (In )n∈N that covers E and satisfies

\[ \sum_{n=1}^{\infty} \ell(I_n) \le \lambda^*(E) + \varepsilon. \]

Since λ∗ (E) is finite, the last inequality implies each In is a finite interval. For each n ∈ N, choose an open interval Jn such that In ⊆ Jn and $\ell(J_n) \le \ell(I_n) + \varepsilon 2^{-n}$. Then the sequence of finite open intervals (Jn )n∈N covers E and

\[ \sum_{n=1}^{\infty} \ell(J_n) \le \lambda^*(E) + 2\varepsilon. \]

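For each n ∈ N, define $J_n' = J_n \cap I$ and $J_n'' = J_n \cap I^c$. Then each $J_n'$ and $J_n''$ is an interval and

\[ \ell(J_n) = \ell(J_n') + \ell(J_n''). \]

Since $E \cap I \subseteq \bigcup_{n=1}^{\infty} J_n'$, we have

\[ \lambda^*(E \cap I) \le \sum_{n=1}^{\infty} \ell(J_n'). \]

Since $E \cap I^c \subseteq \bigcup_{n=1}^{\infty} J_n''$, we have

\[ \lambda^*(E \cap I^c) \le \sum_{n=1}^{\infty} \ell(J_n''). \]

Therefore

\[ \lambda^*(E \cap I) + \lambda^*(E \cap I^c) \le \sum_{n=1}^{\infty} \ell(J_n') + \sum_{n=1}^{\infty} \ell(J_n'') = \sum_{n=1}^{\infty} \ell(J_n) \le \lambda^*(E) + 2\varepsilon. \]

Since ε > 0 is arbitrary, we have

λ∗ (E ∩ I) + λ∗ (E ∩ I c ) ≤ λ∗ (E).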

We have proved L contains every open interval of the form (a, ∞). Since L is closed under complements, it contains the complement of each such interval, i.e., it contains every interval of the form (−∞, a]. Since L is closed under finite intersections, it contains every interval of the form (a, b] = (a, ∞) ∩ (−∞, b]. Since L is closed under countable unions, it contains every interval of the form

\[ (a, b) = \bigcup_{n=1}^{\infty} \Big(a, b - \frac{1}{n}\Big] \]

and every interval of the form

\[ (-\infty, b) = \bigcup_{n=1}^{\infty} \Big(-n, b - \frac{1}{n}\Big]. \]

Thus L contains every open interval.

Let U be any open set in R. Then U is a union of open balls, by definition. But the open balls in R are exactly the finite open intervals. Indeed, B(x, r) = (x − r, x + r) and (a, b) = B((a + b)/2, (b − a)/2). Thus U is a union of open intervals, $U = \bigcup_{k \in K} I_k$. Note that the index set K may not be countable. For each rational number r ∈ U , let Jr be the union of those intervals Ik such that r ∈ Ik . Then Jr is an open interval (possibly infinite) for each r ∈ Q ∩ U and $U = \bigcup_{k \in K} I_k = \bigcup_{r \in \mathbb{Q} \cap U} J_r$. Since Jr ∈ L for each r ∈ Q ∩ U and Q is countable, we see that U is a countable union of sets in L, and so U ∈ L. This shows that L contains all the open sets in R.

Theorem 27.3. Let A ⊆ R. The following are equivalent:

(a) A ∈ L.

(b) For every ε > 0, there exists an open set G such that A ⊆ G and λ∗ (G \ A) < ε.

(c) There exists a set B ∈ B such that A ⊆ B and λ∗ (B \ A) = 0.

(d) For every ε > 0, there exists a closed set F such that F ⊆ A and λ∗ (A \ F ) < ε.

(e) There exists a set B ∈ B such that B ⊆ A and λ∗ (A \ B) = 0.

Proof. (a) ⇒ (b):

We first prove this under the assumption that λ∗ (A) < ∞. Let ε > 0. Choose a sequence of intervals I1 , I2 , . . . such that $A \subseteq \bigcup_{i=1}^{\infty} I_i$ and $\sum_{i=1}^{\infty} \ell(I_i) \le \lambda^*(A) + \varepsilon$. Since λ∗ (A) is finite, the last inequality implies each Ii is a finite interval. For each i, choose an open interval Ji such that Ii ⊆ Ji and $\ell(J_i) \le \ell(I_i) + \varepsilon 2^{-i}$. Define $G = \bigcup_{i=1}^{\infty} J_i$. Then G is open, A ⊆ G, and

\[ \lambda^*(G) \le \sum_{i=1}^{\infty} \ell(J_i) \le \lambda^*(A) + 2\varepsilon. \]

Since A ∈ L and A ⊆ G, we have

λ∗ (G) = λ∗ (G ∩ A) + λ∗ (G ∩ Ac ) = λ∗ (A) + λ∗ (G \ A).

Combining the inequalities above and using that λ∗ (A) < ∞, we get

λ∗ (G \ A) = λ∗ (G) − λ∗ (A) ≤ 2ε < 3ε.

Replacing ε by ε/3 everywhere in the argument gives the desired result.

Now we drop the assumption that λ∗ (A) < ∞. Let ε > 0. Define An = A ∩ [−n, n] for each n ∈ N. Then $A = \bigcup_{n=1}^{\infty} A_n$. Moreover An ∈ L and λ∗ (An ) < ∞ for each n ∈ N. By mimicking the argument above, for each n ∈ N we get an open set Gn such that An ⊆ Gn and $\lambda^*(G_n \setminus A_n) < \varepsilon 2^{-n}$. Define $G = \bigcup_{n=1}^{\infty} G_n$. Then G is an open set, A ⊆ G, and

\[ G \setminus A = \bigcup_{n=1}^{\infty} (G_n \setminus A) \subseteq \bigcup_{n=1}^{\infty} (G_n \setminus A_n). \]

Therefore

\[ \lambda^*(G \setminus A) \le \sum_{n=1}^{\infty} \lambda^*(G_n \setminus A_n) < \sum_{n=1}^{\infty} \varepsilon 2^{-n} = \varepsilon. \]

(b) ⇒ (c): By (b), for each n ∈ N, there exists an open set Gn such that A ⊆ Gn and λ∗ (Gn \ A) < 1/n. Define $B = \bigcap_{n=1}^{\infty} G_n$. Then B ∈ B (it is a countable intersection of open sets), A ⊆ B, and B \ A ⊆ Gn \ A for every n ∈ N. Therefore λ∗ (B \ A) ≤ λ∗ (Gn \ A) < 1/n for every n ∈ N. Hence λ∗ (B \ A) = 0.

(c) ⇒ (a): Let E ∈ P(R) be given. By (c), there exists a Borel set B such that A ⊆ B and λ∗ (B \ A) = 0. Since the Borel σ-algebra is contained in L (Theorem 27.2), we have B ∈ L, and so

λ∗ (E) = λ∗ (E ∩ B) + λ∗ (E ∩ B c ).

But λ∗ (E ∩ A) ≤ λ∗ (E ∩ B) and

λ∗ (E ∩ Ac ) ≤ λ∗ (E ∩ Ac ∩ B) + λ∗ (E ∩ Ac ∩ B c ) ≤ λ∗ (B \ A) + λ∗ (E ∩ B c ) = λ∗ (E ∩ B c ).

Therefore
λ∗ (E) ≥ λ∗ (E ∩ A) + λ∗ (E ∩ Ac ).
The ≤ inequality comes from subadditivity. Thus A ∈ L.

At this point, we have proved (a) ⇔ (b) ⇔ (c). We use this equivalence in what follows.

(a) ⇔ (d): A ∈ L iff Ac ∈ L iff (b) holds with A replaced by Ac , i.e., for every ε > 0, there exists an open set G such that Ac ⊆ G and λ∗ (G \ Ac ) < ε. This last statement is equivalent to (d) because of the following facts: G is open iff Gc is closed; Ac ⊆ G iff Gc ⊆ A; G \ Ac = G ∩ A = A \ Gc .

(a) ⇔ (e): A ∈ L iff Ac ∈ L iff (c) holds with A replaced by Ac , i.e., there exists a B ∈ B such that Ac ⊆ B and λ∗ (B \ Ac ) = 0. This last statement is equivalent to (e) because of the following facts: B ∈ B iff B c ∈ B; Ac ⊆ B iff B c ⊆ A; B \ Ac = B ∩ A = A \ B c .
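
Condition (b) can be illustrated with a countable set such as Q ∩ [0, 1]: covering the i-th point by an open interval of length ε·2^(−i) produces an open set G ⊇ A with λ∗(G \ A) ≤ λ∗(G) < ε. The sketch below carries this out for finitely many points only; the helper names `first_rationals` and `cover` and the value of `eps` are hypothetical, and exact rational arithmetic is used so that the total length can be compared with ε without rounding.

```python
from fractions import Fraction


def first_rationals(count):
    """Return the first `count` distinct rationals in [0, 1], by increasing denominator."""
    seen, out, q = set(), [], 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out


def cover(points, eps):
    """Cover the i-th point (i = 1, 2, ...) by an open interval of length eps * 2**-i."""
    intervals = []
    for i, x in enumerate(points, start=1):
        half = eps / 2 ** (i + 1)  # half of eps * 2**-i
        intervals.append((x - half, x + half))
    return intervals


eps = Fraction(1, 100)
points = first_rationals(50)
intervals = cover(points, eps)
total_length = sum(b - a for a, b in intervals)
print(total_length < eps)
```

The total length is ε(1 − 2^(−50)), and the bound by ε persists for the full countable cover because the lengths form a geometric series.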

28 Product σ-Algebras

Definition 28.1. Let S be a σ-algebra on a set X. Let T be a σ-algebra on a set Y . A measurable

rectangle is a set of the form A × B where A ∈ S and B ∈ T . The product σ-algebra on X × Y is
defined to be
S ⊗ T = σ({A × B : A ∈ S, B ∈ T }).
In other words, S × T is the σ-algebra generated by the collection of measurable rectangles.

Definition 28.2. Let X, Y be any sets. Let E ⊆ X × Y . If a ∈ X, the a-section (or a-cross-section or a-slice) of E is

\[ [E]_a = \{y \in Y : (a, y) \in E\}. \]

If b ∈ Y , the b-section (or b-cross-section or b-slice) of E is

\[ [E]^b = \{x \in X : (x, b) \in E\}. \]

Theorem 28.3. Let S be a σ-algebra on a set X. Let T be a σ-algebra on a set Y . If E ∈ S ⊗ T , then $[E]_a \in \mathcal{T}$ for all a ∈ X and $[E]^b \in \mathcal{S}$ for all b ∈ Y . Informally, the sections of a measurable set are measurable.

Proof. We must show

\[ \mathcal{S} \otimes \mathcal{T} \subseteq \{E \subseteq X \times Y : [E]_a \in \mathcal{T} \text{ for all } a \in X \text{ and } [E]^b \in \mathcal{S} \text{ for all } b \in Y\}. \]

It is easy to check that the right-hand side is a σ-algebra containing the measurable rectangles,
which implies the desired containment.

Definition 28.4. Let X, Y be any sets. Let f : X × Y → R. If a ∈ X, the a-section (or a-cross-section or a-slice) of f is the function $[f]_a : Y \to \mathbb{R}$ defined by

\[ [f]_a(y) = f(a, y) \]

for all y ∈ Y . If b ∈ Y , the b-section (or b-cross-section or b-slice) of f is the function $[f]^b : X \to \mathbb{R}$ defined by

\[ [f]^b(x) = f(x, b) \]

for all x ∈ X.

Example 28.5. If $f(x, y) = 5x^2 + y^3$, then $[f]_2(y) = f(2, y) = 20 + y^3$.
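
Sections of a function amount to fixing one argument. As a small sketch of the definition (the helper names `section_first` and `section_second` are hypothetical, chosen only for this illustration), the example above can be reproduced as follows.

```python
def section_first(f, a):
    """Return the a-section of f: the function y -> f(a, y)."""
    return lambda y: f(a, y)


def section_second(f, b):
    """Return the b-section of f: the function x -> f(x, b)."""
    return lambda x: f(x, b)


def f(x, y):
    return 5 * x**2 + y**3


f_2 = section_first(f, 2)  # the 2-section: y -> 20 + y**3
print(f_2(3))
```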

Theorem 28.6. Let S be a σ-algebra on a set X. Let T be a σ-algebra on a set Y . Let f : X × Y → R. If f is S ⊗ T -measurable, then $[f]_a$ is T -measurable for each a ∈ X and $[f]^b$ is S-measurable for each b ∈ Y . Informally, the sections of a measurable function are measurable.

Proof. Let a ∈ X. For any Borel set B ⊆ R, we have $f^{-1}(B) \in \mathcal{S} \otimes \mathcal{T}$, and so, by Theorem 28.3,

\[ ([f]_a)^{-1}(B) = [f^{-1}(B)]_a \in \mathcal{T}. \]

This proves that $[f]_a$ is T -measurable for each a ∈ X. The other conclusion is proved similarly.
29 Monotone Classes

Monotone classes are similar (but not identical) to σ-algebras.

Definition 29.1. Let X be a set. A collection M ⊆ P(X) is called a monotone class on X if the following properties hold:

(i) If E1 ⊆ E2 ⊆ . . . are in M, then $\bigcup_{i=1}^{\infty} E_i \in \mathcal{M}$.

(ii) If E1 ⊇ E2 ⊇ . . . are in M, then $\bigcap_{i=1}^{\infty} E_i \in \mathcal{M}$.

In other words, a monotone class is a collection of subsets of X which is closed under countable
increasing unions and countable decreasing intersections.

Theorem 29.2. Every σ-algebra is a monotone class, but not conversely.

Proof. Exercise.

Theorem 29.3. The intersection of any collection of monotone classes on a set X is a monotone
class on X.

Proof. Exercise.

Theorem 29.4. Let X be a set and let C be any collection of subsets of X. The intersection of
all monotone classes on X that contain C is the smallest monotone class on X that contains C. It
is called the monotone class generated by C and it is denoted by M(C).

Proof. Exercise.

Lemma 29.5. (Monotone Class Lemma) If A is an algebra on a set X, then the monotone class
generated by A equals the σ-algebra generated by A, i.e.,

M(A) = σ(A).

Proof. Since σ(A) is a monotone class, we have M(A) ⊆ σ(A). We need to show σ(A) ⊆ M(A). It
suffices to show that M(A) is a σ-algebra. Note that if a monotone class is an algebra, then it is a σ-
algebra (exercise). Therefore it suffices to show that M(A) is an algebra. Note ∅, X ∈ A ⊆ M(A).
Thus, to prove that M(A) is an algebra, it is easy to check (exercise) that it will suffice to prove
the following claim: For all E, F ∈ M(A) we have E ∩ F, E \ F, F \ E ∈ M(A).

Proof of Claim. For each E ∈ M(A), define

ME = {F ∈ M(A) : E ∩ F, E \ F, F \ E ∈ M(A)} .

It is easy to check (exercise) that ME is a monotone class for every E ∈ M(A). Since A is an
algebra, it is easy to check (exercise) that A ⊆ ME for every E ∈ A. Therefore M(A) ⊆ ME for
every E ∈ A. In other words, for every E ∈ A and every F ∈ M(A) we have F ∈ ME . Note that
for every E, F ∈ M(A) we have F ∈ ME iff E ∈ MF . Thus, for every E ∈ A and every F ∈ M(A)
we have E ∈ MF . This means that A ⊆ MF for every F ∈ M(A). Therefore M(A) ⊆ MF for
every F ∈ M(A). But this means that for all E, F ∈ M(A) we have E ∈ MF . Thus for all
E, F ∈ M(A) we have E ∩ F, E \ F, F \ E ∈ M(A).

30 Iterated Integrals of Characteristic Functions

Definition 30.1. Let (X, S, µ) be a measure space. We say that (X, S, µ) is σ-finite (or that µ is σ-finite) if there exist E1 , E2 , . . . ∈ S such that $X = \bigcup_{i=1}^{\infty} E_i$ and µ(Ei ) < ∞ for every i ∈ N.

Definition 30.2. Let (X, S, µ) and (Y, T , ν) be measure spaces. Let E ∈ S ⊗ T . For each x ∈ X, we define

\[ \int_Y 1_E(x, y)\,d\nu(y) = \int_Y [1_E]_x(y)\,d\nu(y). \]

For each y ∈ Y , we define

\[ \int_X 1_E(x, y)\,d\mu(x) = \int_X [1_E]^y(x)\,d\mu(x). \]

Lemma 30.3. Let (X, S, µ) and (Y, T , ν) be measure spaces. Let E ∈ S ⊗ T . For each x ∈ X,

\[ \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x). \]

For each y ∈ Y ,

\[ \int_X 1_E(x, y)\,d\mu(x) = \mu([E]^y). \]

Proof. For each x ∈ X, we have $[1_E]_x = 1_{[E]_x}$, and so

\[ \int_Y 1_E(x, y)\,d\nu(y) = \int_Y [1_E]_x(y)\,d\nu(y) = \int_Y 1_{[E]_x}(y)\,d\nu(y) = \nu([E]_x). \]

The proof of the other statement is similar.

Theorem 30.4. Let (X, S, µ) and (Y, T , ν) be σ-finite measure spaces. For every E ∈ S ⊗ T , we have

(a) $x \mapsto \int_Y 1_E(x, y)\,d\nu(y)$ is an S-measurable function.

(b) $y \mapsto \int_X 1_E(x, y)\,d\mu(x)$ is a T -measurable function.

(c) $\int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y)$.

Proof. We will first prove the theorem in the special case where µ and ν are finite. Then we will
prove the theorem in the case µ and ν are σ-finite.

Assume µ and ν are finite. Let M be the collection of all sets E ∈ S ⊗ T such that (a),(b),(c) hold.
It will suffice to show that S ⊗ T ⊆ M. Let R be the collection of all measurable rectangles. Let
A be the collection of all finite unions of disjoint measurable rectangles. Note that S ⊗ T = σ(A)
and that A is an algebra. By the monotone class lemma, S ⊗ T = M(A). Therefore, to show that
S ⊗ T ⊆ M, it will suffice to show that M is a monotone class that contains A. We do so in three
steps.

Step 1: R ⊆ M. Let E ∈ R. We must show that (a),(b),(c) hold. We have E = A × B for some A ∈ S and B ∈ T . For each x ∈ X, we have

\[ [E]_x = \begin{cases} B & \text{if } x \in A \\ \emptyset & \text{if } x \notin A \end{cases} \]

and so

\[ \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x) = \begin{cases} \nu(B) & \text{if } x \in A \\ 0 & \text{if } x \notin A \end{cases} = \nu(B) 1_A(x). \]

Since A ∈ S, (a) holds. Similarly, $\int_X 1_E(x, y)\,d\mu(x) = \mu(A) 1_B(y)$ for each y ∈ Y . Since B ∈ T , (b) holds. Now we see that

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_X \nu(B) 1_A(x)\,d\mu(x) = \mu(A)\nu(B) = \int_Y \mu(A) 1_B(y)\,d\nu(y) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds.

Step 2: A ⊆ M. Let E ∈ A. Then $E = \bigcup_{i=1}^{n} R_i$ for some disjoint R1 , . . . , Rn ∈ R. For each x ∈ X, we have

\[ [E]_x = \Big[\bigcup_{i=1}^{n} R_i\Big]_x = \bigcup_{i=1}^{n} [R_i]_x, \]

where the sets $[R_i]_x$ are disjoint, and so

\[ \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x) = \sum_{i=1}^{n} \nu([R_i]_x) = \sum_{i=1}^{n} \int_Y 1_{R_i}(x, y)\,d\nu(y). \]

By Step 1, $x \mapsto \int_Y 1_{R_i}(x, y)\,d\nu(y)$ is S-measurable for each i. So $x \mapsto \int_Y 1_E(x, y)\,d\nu(y)$ is a sum of S-measurable functions, and hence it is S-measurable. Thus (a) holds. Similarly, $\int_X 1_E(x, y)\,d\mu(x) = \sum_{i=1}^{n} \int_X 1_{R_i}(x, y)\,d\mu(x)$, and so (b) holds. By Step 1,

\[ \int_X \int_Y 1_{R_i}(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_{R_i}(x, y)\,d\mu(x)\,d\nu(y) \]

for each i. Therefore

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \sum_{i=1}^{n} \int_X \int_Y 1_{R_i}(x, y)\,d\nu(y)\,d\mu(x) = \sum_{i=1}^{n} \int_Y \int_X 1_{R_i}(x, y)\,d\mu(x)\,d\nu(y) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds.

Step 3: M is a monotone class. We first show that M is closed under countable increasing unions. Suppose E1 ⊆ E2 ⊆ . . . are in M and set $E = \bigcup_{n=1}^{\infty} E_n$. We must show E ∈ M. To do so, we will show (a),(b),(c) hold for E. For each x ∈ X, define

\[ f_n(x) = \int_Y 1_{E_n}(x, y)\,d\nu(y) = \nu([E_n]_x), \qquad f(x) = \int_Y 1_E(x, y)\,d\nu(y) = \nu([E]_x). \]

For x ∈ X, we have $[E_1]_x \subseteq [E_2]_x \subseteq \dots$ and $[E]_x = \bigcup_{n=1}^{\infty} [E_n]_x$. By continuity from below applied to $\nu([E_n]_x)$ (for each fixed x) (or the monotone convergence theorem applied to $1_{E_n}(x, y)$ (for each fixed x)), we have that fn → f pointwise. By the definition of M, each fn is S-measurable. Therefore f is measurable. So (a) holds for E. For each y ∈ Y , define

\[ g_n(y) = \int_X 1_{E_n}(x, y)\,d\mu(x) = \mu([E_n]^y), \qquad g(y) = \int_X 1_E(x, y)\,d\mu(x) = \mu([E]^y). \]

For y ∈ Y , we have $[E_1]^y \subseteq [E_2]^y \subseteq \dots$ and $[E]^y = \bigcup_{n=1}^{\infty} [E_n]^y$. By continuity from below applied to $\mu([E_n]^y)$ (for each fixed y) (or the monotone convergence theorem applied to $1_{E_n}(x, y)$ (for each fixed y)), we have that gn → g pointwise. By the definition of M, each gn is T -measurable. Therefore g is measurable. So (b) holds for E. Now, since (fn ) is an increasing sequence, the monotone convergence theorem implies

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_X f(x)\,d\mu(x) = \lim_n \int_X f_n(x)\,d\mu(x) = \lim_n \int_X \int_Y 1_{E_n}(x, y)\,d\nu(y)\,d\mu(x). \tag{30.1} \]

Similarly, since (gn ) is an increasing sequence, the monotone convergence theorem implies

\[ \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y) = \int_Y g(y)\,d\nu(y) = \lim_n \int_Y g_n(y)\,d\nu(y) = \lim_n \int_Y \int_X 1_{E_n}(x, y)\,d\mu(x)\,d\nu(y). \tag{30.2} \]

Since (c) holds for each En , the right-hand sides of (30.1) and (30.2) are equal, and so

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds for E. Now we show that M is closed under countable decreasing intersections. Suppose E1 ⊇ E2 ⊇ . . . are in M and set $E = \bigcap_{n=1}^{\infty} E_n$. We must show E ∈ M. To do so, we will show (a),(b),(c) hold for E. Define f , fn , g, gn as above. We have $f_1(x) = \nu([E_1]_x) \le \nu(Y) < \infty$ for each x ∈ X. By continuity from above applied to $\nu([E_n]_x)$ (for each fixed x ∈ X) (or the dominated convergence theorem applied to $1_{E_n}(x, y)$ (for each fixed x)), we have that fn → f pointwise. By the definition of M, each fn is S-measurable. Therefore f is measurable. So (a) holds for E. We have $g_1(y) = \mu([E_1]^y) \le \mu(X) < \infty$ for each y ∈ Y . By continuity from above applied to $\mu([E_n]^y)$ (for each fixed y ∈ Y ) (or the dominated convergence theorem applied to $1_{E_n}(x, y)$ (for each fixed y)), we have that gn → g pointwise. By the definition of M, each gn is T -measurable. Therefore g is measurable. So (b) holds for E. We have 0 ≤ fn ≤ f1 for all n and

\[ \int_X f_1(x)\,d\mu(x) \le \int_X \nu(Y)\,d\mu(x) = \mu(X)\nu(Y) < \infty. \]

Thus the dominated convergence theorem implies

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_X f(x)\,d\mu(x) = \lim_n \int_X f_n(x)\,d\mu(x) = \lim_n \int_X \int_Y 1_{E_n}(x, y)\,d\nu(y)\,d\mu(x). \tag{30.3} \]

Similarly, we have 0 ≤ gn ≤ g1 for all n and

\[ \int_Y g_1(y)\,d\nu(y) \le \int_Y \mu(X)\,d\nu(y) = \mu(X)\nu(Y) < \infty. \]

Thus the dominated convergence theorem implies

\[ \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y) = \int_Y g(y)\,d\nu(y) = \lim_n \int_Y g_n(y)\,d\nu(y) = \lim_n \int_Y \int_X 1_{E_n}(x, y)\,d\mu(x)\,d\nu(y). \tag{30.4} \]

Since (c) holds for each En , the right-hand sides of (30.3) and (30.4) are equal, and so

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]

Thus (c) holds for E.

The proof of the theorem is now complete in the case where µ and ν are finite. Now we assume only that µ and ν are σ-finite. Since µ is σ-finite, there exist X1 , X2 , . . . ∈ S such that $\bigcup_{i=1}^{\infty} X_i = X$ and µ(Xi ) < ∞ for all i. By replacing each Xi by X1 ∪ · · · ∪ Xi , we can assume X1 ⊆ X2 ⊆ . . .. Since ν is σ-finite, there exist Y1 , Y2 , . . . ∈ T such that $\bigcup_{j=1}^{\infty} Y_j = Y$ and ν(Yj ) < ∞ for all j. By replacing each Yj by Y1 ∪ · · · ∪ Yj , we can assume Y1 ⊆ Y2 ⊆ . . .. For each k ∈ N, define µk on S by µk (A) = µ(A ∩ Xk ) and νk on T by νk (B) = ν(B ∩ Yk ). It is easy to check that µk and νk are finite measures and that $\int g\,d\mu_k = \int g 1_{X_k}\,d\mu$ and $\int h\,d\nu_k = \int h 1_{Y_k}\,d\nu$ for all non-negative measurable functions g and h. Let E ∈ S ⊗ T . As the theorem has been proved for the finite measures µk and νk , for each k ∈ N, $x \mapsto \nu_k([E]_x)$ is an S-measurable function, $y \mapsto \mu_k([E]^y)$ is a T -measurable function, and $\int_X \nu_k([E]_x)\,d\mu_k(x) = \int_Y \mu_k([E]^y)\,d\nu_k(y)$, i.e.,

\[ \int_X 1_{X_k}(x)\,\nu_k([E]_x)\,d\mu(x) = \int_Y 1_{Y_k}(y)\,\mu_k([E]^y)\,d\nu(y). \tag{30.5} \]

For each fixed x, $[E]_x \cap Y_k$ increases to $[E]_x$, and so continuity from below implies $\nu_k([E]_x)$ increases to $\nu([E]_x)$; moreover $1_{X_k}(x)$ increases to 1. Then $x \mapsto \nu([E]_x)$ is measurable and the monotone convergence theorem implies

\[ \lim_k \int_X 1_{X_k}(x)\,\nu_k([E]_x)\,d\mu(x) = \int_X \nu([E]_x)\,d\mu(x). \]

Similarly, $y \mapsto \mu([E]^y)$ is measurable and the monotone convergence theorem implies

\[ \lim_k \int_Y 1_{Y_k}(y)\,\mu_k([E]^y)\,d\nu(y) = \int_Y \mu([E]^y)\,d\nu(y). \]

Combining the last two equations with (30.5) gives

\[ \int_X \nu([E]_x)\,d\mu(x) = \int_Y \mu([E]^y)\,d\nu(y), \]

which is equivalent to

\[ \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]
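
On finite sets with point weights, both iterated integrals in (c) reduce to finite sums and can be compared directly. The following is a minimal sketch under that assumption: the sets `X`, `Y`, the weights `mu`, `nu`, and the set `E` are made-up data for illustration, and exact fractions are used so that equality is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical finite measure spaces: each measure is given by point weights.
X = ["a", "b", "c"]
Y = [0, 1]
mu = {"a": Fraction(1, 2), "b": Fraction(2), "c": Fraction(1, 3)}
nu = {0: Fraction(3), 1: Fraction(1, 4)}
E = {("a", 0), ("b", 0), ("b", 1), ("c", 1)}  # a subset of X x Y


def x_section(E, x):
    """The x-section {y : (x, y) in E}."""
    return {v for (u, v) in E if u == x}


def y_section(E, y):
    """The y-section {x : (x, y) in E}."""
    return {u for (u, v) in E if v == y}


def measure(weights, A):
    """Measure of a finite set A under the given point weights."""
    return sum((weights[p] for p in A), Fraction(0))


# Integrate the nu-measure of x-sections against mu, and vice versa.
dnu_then_dmu = sum(mu[x] * measure(nu, x_section(E, x)) for x in X)
dmu_then_dnu = sum(nu[y] * measure(mu, y_section(E, y)) for y in Y)
print(dnu_then_dmu, dmu_then_dnu)
```

Both orders give 97/12 for this data, as Lemma 30.3 and part (c) predict.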

31 Product Measures

Definition 31.1. Let (X, S, µ) and (Y, T , ν) be measure spaces. Assume (X, S, µ) and (Y, T , ν) are σ-finite. The product measure of µ and ν is the set function µ × ν : S ⊗ T → [0, ∞] defined by

\[ (\mu \times \nu)(E) = \int_X \int_Y 1_E(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X 1_E(x, y)\,d\mu(x)\,d\nu(y). \]

The two iterated integrals agree by Theorem 30.4.

Theorem 31.2. Let (X, S, µ) and (Y, T , ν) be measure spaces. Assume (X, S, µ) and (Y, T , ν) are
σ-finite. Then µ × ν is a σ-finite measure on S ⊗ T . Moreover, µ × ν is the unique measure on
S ⊗ T such that
(µ × ν)(A × B) = µ(A)ν(B)
for all A ∈ S and B ∈ T .

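Proof. Step 1: µ × ν is a measure on S ⊗ T . Since $1_\emptyset = 0$, we have

\[ (\mu \times \nu)(\emptyset) = \int_X \int_Y 1_\emptyset\,d\nu\,d\mu = \int_X \int_Y 0\,d\nu\,d\mu = 0. \]

Now we check that µ × ν is countably additive. Let E1 , E2 , . . . ∈ S ⊗ T be disjoint. Then

\[ (\mu \times \nu)\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \int_X \int_Y 1_{\bigcup_{i=1}^{\infty} E_i}\,d\nu\,d\mu = \int_X \nu\Big(\Big[\bigcup_{i=1}^{\infty} E_i\Big]_x\Big)\,d\mu = \int_X \nu\Big(\bigcup_{i=1}^{\infty} [E_i]_x\Big)\,d\mu = \int_X \sum_{i=1}^{\infty} \nu([E_i]_x)\,d\mu. \]

By applying the monotone convergence theorem to the sequence of partial sums of the series, we get

\[ (\mu \times \nu)\Big(\bigcup_{i=1}^{\infty} E_i\Big) = \sum_{i=1}^{\infty} \int_X \nu([E_i]_x)\,d\mu = \sum_{i=1}^{\infty} \int_X \int_Y 1_{E_i}(x, y)\,d\nu\,d\mu = \sum_{i=1}^{\infty} (\mu \times \nu)(E_i). \]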

Step 2: (µ × ν)(A × B) = µ(A)ν(B) for all A ∈ S and B ∈ T . If A ∈ S and B ∈ T , then

\[ (\mu \times \nu)(A \times B) = \int_X \int_Y 1_{A \times B}\,d\nu\,d\mu = \int_X \nu([A \times B]_x)\,d\mu(x) = \int_X \nu(B) 1_A(x)\,d\mu(x) = \mu(A)\nu(B). \]

Step 3: µ × ν is σ-finite. Since µ is σ-finite, there exist X1 , X2 , . . . ∈ S such that $\bigcup_{i=1}^{\infty} X_i = X$ and µ(Xi ) < ∞ for all i. Since ν is σ-finite, there exist Y1 , Y2 , . . . ∈ T such that $\bigcup_{j=1}^{\infty} Y_j = Y$ and ν(Yj ) < ∞ for all j. Then $\bigcup_{(i,j) \in \mathbb{N} \times \mathbb{N}} (X_i \times Y_j) = X \times Y$ and Xi × Yj ∈ S ⊗ T and (µ × ν)(Xi × Yj ) = µ(Xi )ν(Yj ) < ∞ for all (i, j) ∈ N × N.

Step 4: Uniqueness. Let π and ρ be measures on S ⊗ T that both satisfy π(A × B) = µ(A)ν(B)
and ρ(A × B) = µ(A)ν(B) for all A ∈ S and B ∈ T . We need to show π = ρ. By arguing as in
Step 3, we see that π and ρ are σ-finite. We first treat the case where π and ρ are finite. Then we
treat the case where π and ρ are σ-finite.

Case 1: π and ρ are finite. Define M = {E ∈ S ⊗ T : π(E) = ρ(E)}. To show that π = ρ, it suffices to show S ⊗ T ⊆ M. Let A be the collection of finite unions of disjoint measurable rectangles. It is easy to check that A is an algebra and S ⊗ T = σ(A). By the monotone class lemma, S ⊗ T = M(A). If we show that M is a monotone class that contains A, it will follow that S ⊗ T ⊆ M, and we will be done. Using the additivity of π and ρ, it is easy to check that A ⊆ M. So it remains to check that M is a monotone class. First we check that M is closed under countable increasing unions. Suppose E1 ⊆ E2 ⊆ . . . belong to M and let $E = \bigcup_{n=1}^{\infty} E_n$. By continuity from below,

\[ \pi(E) = \lim_n \pi(E_n) = \lim_n \rho(E_n) = \rho(E). \]

Thus E ∈ M. Now we check that M is closed under countable decreasing intersections. Suppose E1 ⊇ E2 ⊇ . . . belong to M and let $E = \bigcap_{n=1}^{\infty} E_n$. Since π(E1 ) < ∞ and ρ(E1 ) < ∞, continuity from above gives

\[ \pi(E) = \lim_n \pi(E_n) = \lim_n \rho(E_n) = \rho(E). \]

Thus E ∈ M. Therefore M is a monotone class.

Case 2: π and ρ are σ-finite. As in Step 3, choose X1 , X2 , . . . ∈ S and Y1 , Y2 , . . . ∈ T such that $\bigcup_{(i,j) \in \mathbb{N} \times \mathbb{N}} (X_i \times Y_j) = X \times Y$, µ(Xi ) < ∞ for all i, and ν(Yj ) < ∞ for all j. For each k ∈ N, let Zk be the union of those sets Xi × Yj with i + j ≤ k, i.e., $Z_k = \bigcup_{(i,j) : i+j \le k} (X_i \times Y_j)$. Then Z1 ⊆ Z2 ⊆ . . . is an increasing sequence of sets in S ⊗ T such that $\bigcup_{k=1}^{\infty} Z_k = X \times Y$,

\[ \pi(Z_k) \le \sum_{(i,j) : i+j \le k} \pi(X_i \times Y_j) = \sum_{(i,j) : i+j \le k} \mu(X_i)\nu(Y_j) < \infty, \]

and

\[ \rho(Z_k) \le \sum_{(i,j) : i+j \le k} \rho(X_i \times Y_j) = \sum_{(i,j) : i+j \le k} \mu(X_i)\nu(Y_j) < \infty. \]

For each k ∈ N, define πk and ρk on S ⊗ T by πk (E) = π(E ∩ Zk ) and ρk (E) = ρ(E ∩ Zk ). It is easy to check that πk and ρk are finite measures and that πk (A × B) = ρk (A × B) for all A ∈ S and B ∈ T . Therefore, by repeating the argument of Case 1 for πk and ρk , we get that πk (E) = ρk (E) for all E ∈ S ⊗ T . Then, by continuity from below,

\[ \pi(E) = \lim_k \pi_k(E) = \lim_k \rho_k(E) = \rho(E) \]

for all E ∈ S ⊗ T .

32 Tonelli’s Theorem and Fubini’s Theorem

Theorem 32.1. (Tonelli’s Theorem) Let (X, S, µ) and (Y, T , ν) be σ-finite measure spaces. If f : X × Y → [0, ∞] is an S ⊗ T -measurable function, then

(a) $x \mapsto \int_Y f(x, y)\,d\nu(y)$ is an S-measurable function,

(b) $y \mapsto \int_X f(x, y)\,d\mu(x)$ is a T -measurable function,

(c) $\int f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y)$.
Proof. By Theorem 30.4 and the definition of product measure, the theorem holds when f is the
indicator function of an S ⊗ T -measurable set. Since linear combinations of measurable functions
are measurable and since the integral is linear, the theorem also holds when f is a non-negative
measurable simple function. If f is a non-negative extended-real-valued S ⊗ T -measurable function,
we can find a sequence of non-negative measurable simple functions which increases to f pointwise,
and so the fact that limits of measurable functions are measurable and the monotone convergence
theorem imply that the theorem holds for f in this case as well.
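
The increasing sequence of simple functions used in the last step is usually taken to be φ_n = min(⌊2^n f⌋ / 2^n, n). The sketch below assumes that standard construction (the proof above does not name a specific sequence); the function `f` is a hypothetical example and `simple_approx` is an illustrative helper name.

```python
import math


def simple_approx(f, n):
    """Return phi_n = min(floor(2**n * f) / 2**n, n), which takes finitely many values."""
    def phi(x):
        value = f(x)
        if value == math.inf:
            return float(n)
        return min(math.floor(2**n * value) / 2**n, float(n))
    return phi


def f(x):
    return x * x  # hypothetical non-negative function


xs = [0.0, 0.3, 1.7, 2.5]
for n in (1, 2, 4, 8):
    phi_n = simple_approx(f, n)
    print(n, [phi_n(x) for x in xs])
```

At each point the values increase with n and stay below f(x); once n exceeds f(x), the gap is less than 2^(−n).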

Theorem 32.2. (Fubini’s Theorem) Let (X, S, µ) and (Y, T , ν) be σ-finite measure spaces. Suppose f : X × Y → [−∞, ∞] (or f : X × Y → C) is S ⊗ T -measurable and µ × ν-integrable.

(a) $[f]_x$ is ν-integrable (i.e., $\int_Y |f(x, y)|\,d\nu(y) < \infty$) for µ-almost-every x ∈ X.

(b) $[f]^y$ is µ-integrable (i.e., $\int_X |f(x, y)|\,d\mu(x) < \infty$) for ν-almost-every y ∈ Y .

(c) The function

\[ I(x) = \begin{cases} \int_Y f(x, y)\,d\nu(y) & \text{if } [f]_x \text{ is } \nu\text{-integrable} \\ 0 & \text{otherwise} \end{cases} \]

is S-measurable and µ-integrable.

(d) The function

\[ J(y) = \begin{cases} \int_X f(x, y)\,d\mu(x) & \text{if } [f]^y \text{ is } \mu\text{-integrable} \\ 0 & \text{otherwise} \end{cases} \]

is T -measurable and ν-integrable.

(e) $\int f\,d(\mu \times \nu) = \int_X I(x)\,d\mu(x) = \int_Y J(y)\,d\nu(y)$.

Remark: Note that the function $x \mapsto \int_Y f(x, y)\,d\nu(y)$ may not be defined for every x ∈ X. Thus we need to work with I(x) instead. Likewise with the function $y \mapsto \int_X f(x, y)\,d\mu(x)$ and J(y). It is standard convention to identify these functions, so that (e) can be written as

\[ \int f\,d(\mu \times \nu) = \int_X \int_Y f(x, y)\,d\nu(y)\,d\mu(x) = \int_Y \int_X f(x, y)\,d\mu(x)\,d\nu(y). \]

Proof. The plan is to apply Tonelli’s theorem to the functions |f |, $f^+$, and $f^-$. The function |f | is non-negative and S ⊗ T -measurable. Tonelli’s theorem implies that $x \mapsto \int_Y |f(x, y)|\,d\nu(y)$ is an S-measurable function, $y \mapsto \int_X |f(x, y)|\,d\mu(x)$ is a T -measurable function, and

\[ \int |f|\,d(\mu \times \nu) = \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) = \int_Y \int_X |f(x, y)|\,d\mu(x)\,d\nu(y). \]

The assumption that f is µ × ν-integrable means that all three expressions above are finite. Now we apply Tonelli’s theorem to the positive and negative parts of f . The functions $f^+$ and $f^-$ are non-negative and S ⊗ T -measurable. Tonelli’s theorem implies that the functions $I^+$ and $I^-$ defined by

\[ I^+(x) = \int_Y f^+(x, y)\,d\nu(y), \qquad I^-(x) = \int_Y f^-(x, y)\,d\nu(y) \]

are both S-measurable functions. Note that

\[ I^+(x) = \int_Y [f^+]_x\,d\nu = \int_Y ([f]_x)^+\,d\nu \quad\text{and}\quad I^-(x) = \int_Y [f^-]_x\,d\nu = \int_Y ([f]_x)^-\,d\nu. \]

Thus $[f]_x$ is ν-integrable iff $I^+(x)$ and $I^-(x)$ are both finite. Since $f^+ \le |f|$ and $f^- \le |f|$, we have

\[ \int_X I^+(x)\,d\mu(x) \le \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty \]

and

\[ \int_X I^-(x)\,d\mu(x) \le \int_X \int_Y |f(x, y)|\,d\nu(y)\,d\mu(x) < \infty. \]

Thus $I^+$ and $I^-$ are µ-integrable. It follows that $I^+(x)$ and $I^-(x)$ are finite for µ-almost-every x ∈ X. Equivalently, $[f]_x$ is ν-integrable for µ-almost-every x ∈ X. So (a) is proved. Let A be the set of those x ∈ X such that both $I^+(x)$ and $I^-(x)$ are finite. Equivalently, A is the set of those x ∈ X such that $[f]_x$ is ν-integrable. For each x ∈ A, we have

\[ I(x) = \int_Y f(x, y)\,d\nu(y) = \int_Y f^+(x, y)\,d\nu(y) - \int_Y f^-(x, y)\,d\nu(y) = I^+(x) - I^-(x), \]

and I(x) = 0 for each x ∉ A. Thus $I = I^+ 1_A - I^- 1_A$. Since $I^+$ and $I^-$ are S-measurable, it follows that A ∈ S. Therefore I is S-measurable. Moreover, since $I^+$ and $I^-$ are µ-integrable, so are $I^+ 1_A$ and $I^- 1_A$, and hence so is I. So (c) is proved. The proofs of (b) and (d) are similar. It remains to prove (e). First observe that, because µ(Ac ) = 0,

\[ \int_X I\,d\mu = \int_X I^+ 1_A\,d\mu - \int_X I^- 1_A\,d\mu = \int_X I^+\,d\mu - \int_X I^-\,d\mu. \]

By applying Tonelli’s theorem to the functions $f^+$ and $f^-$, we have that

\[ \int f^+\,d(\mu \times \nu) = \int_X \int_Y f^+\,d\nu\,d\mu = \int_X I^+\,d\mu \]

and

\[ \int f^-\,d(\mu \times \nu) = \int_X \int_Y f^-\,d\nu\,d\mu = \int_X I^-\,d\mu. \]

Therefore

\[ \int f\,d(\mu \times \nu) = \int f^+\,d(\mu \times \nu) - \int f^-\,d(\mu \times \nu) = \int_X I^+\,d\mu - \int_X I^-\,d\mu = \int_X I\,d\mu. \]

A similar argument gives

\[ \int f\,d(\mu \times \nu) = \int_Y J\,d\nu. \]
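
The integrability hypothesis in Fubini's theorem cannot be dropped. A standard counterexample, not taken from these notes, uses counting measure on N × N with a(m, n) = 1 if m = n, a(m, n) = −1 if m = n + 1, and 0 otherwise: the sum of |a(m, n)| is infinite, and the two iterated sums differ. The sketch below computes both; the names `a`, `row_sum`, `col_sum` and the cutoff `K` are illustrative, and each inner sum is exact because every row and column has at most two non-zero entries.

```python
def a(m, n):
    """Doubly indexed array on N x N (indices start at 1)."""
    if m == n:
        return 1
    if m == n + 1:
        return -1
    return 0


def row_sum(m):
    """Sum of a(m, n) over all n >= 1; non-zero terms occur only for n in {m - 1, m}."""
    return sum(a(m, n) for n in range(1, m + 1))


def col_sum(n):
    """Sum of a(m, n) over all m >= 1; non-zero terms occur only for m in {n, n + 1}."""
    return sum(a(m, n) for m in range(1, n + 2))


K = 200  # number of outer terms; every outer term beyond the first row is zero
rows_first = sum(row_sum(m) for m in range(1, K + 1))
cols_first = sum(col_sum(n) for n in range(1, K + 1))
print(rows_first, cols_first)
```

Summing over n first and then over m gives 1, while the other order gives 0, which is consistent with the theorem because a is not integrable with respect to the product of the counting measures.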

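33 Lebesgue Measure on R^n

Let $\mathcal{B}(\mathbb{R}^n)$ denote the Borel σ-algebra on $\mathbb{R}^n$. It is an exercise to show that $\mathcal{B}(\mathbb{R}^m) \otimes \mathcal{B}(\mathbb{R}^n) = \mathcal{B}(\mathbb{R}^{m+n})$. Note that the Lebesgue measure on R is σ-finite. The Lebesgue measure $\lambda_n$ on $\mathcal{B}(\mathbb{R}^n)$ is defined to be the product measure

\[ \lambda_n = \lambda \times \cdots \times \lambda \quad (n \text{ factors}). \]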

Tonelli’s Theorem and Fubini’s Theorem hold for Lebesgue measure on B(Rn ).

It is possible to define the Lebesgue measure on the larger Lebesgue σ-algebra on Rn . And it is
possible to prove Tonelli’s Theorem and Fubini’s Theorem in that case. But it is a bit complicated.
This is explored in the homework exercises.

34 Probability Theory: Basic Definitions

Definition 34.1. A probability space is a measure space (Ω, A, P ) such that P (Ω) = 1. In
such case, the measure P is called a probability measure on (Ω, A), the elements of Ω are called
elementary outcomes or sample points, the set Ω is called the sample space, the sets A ∈ A
are called events, and, for each A ∈ A, the number P (A) is called the probability of the event
A.

Example 34.2. Flip two fair coins. There are four possible outcomes: we get two heads; we get a head on the first coin and a tail on the second coin; we get a tail on the first coin and a head on the second coin; we get two tails. We can take the sample space to be Ω = {HH,HT,TH,TT} and the collection of events to be A = P(Ω). For example, one event is {HT,TH}; it represents the event where we get exactly one head. Since the coins are fair, each elementary outcome has probability 1/4, so we define the probability of each event A to be 1/4 times the number of elements of A. Thus

\[ P(\text{exactly one head}) = P(\{\mathrm{HT}, \mathrm{TH}\}) = \frac{1}{4} \cdot 2 = \frac{1}{2}. \]
Definition 34.3. Let (Ω, A, P ) be a probability space. A random variable on (Ω, A, P ) is an
A-measurable function X : Ω → R.

Example 34.4. We continue the previous example. Define X to be the number of heads, i.e.,

\[ X(\omega) = \begin{cases} 0 & \text{if } \omega = \mathrm{TT} \\ 1 & \text{if } \omega = \mathrm{HT} \text{ or } \omega = \mathrm{TH} \\ 2 & \text{if } \omega = \mathrm{HH} \end{cases} \]

(Since the σ-algebra A is the collection of all subsets of Ω, any function from Ω to R is A-measurable.
Thus X is A-measurable. So X is a random variable.)
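
The two-coin example is small enough to enumerate. The following sketch encodes the sample space, the probability measure, and the random variable X directly; the function names `P`, `X`, and `P_X` mirror the notation of the text but are otherwise illustrative, and `P_X` is only evaluated on finite sets of values.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT.
omega = ["".join(flips) for flips in product("HT", repeat=2)]


def P(event):
    """Probability of an event (a collection of outcomes from omega), each outcome weighing 1/4."""
    return Fraction(len(set(event)), len(omega))


def X(w):
    """Random variable: the number of heads in the outcome w."""
    return w.count("H")


def P_X(values):
    """P(X in values) for a finite set of real values."""
    return P([w for w in omega if X(w) in values])


print(P({"HT", "TH"}), P_X({1}))
```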

Notation 34.5. Probabilists have an aversion to displaying the arguments of random variables. For example, it is common to write {X ≥ a} instead of $\{\omega : X(\omega) \ge a\} = X^{-1}([a, \infty))$ and P (X ≥ a) instead of $P(\{\omega : X(\omega) \ge a\}) = P(X^{-1}([a, \infty)))$.

Definition 34.6. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The distribution of X is the set function $P_X : \mathcal{B}(\mathbb{R}) \to [0, 1]$ defined by

\[ P_X(B) = P(X^{-1}(B)) = P(X \in B) \]

for each B ∈ B(R). The next theorem says that $P_X$ is a probability measure on (R, B(R)).

Theorem 34.7. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The distribution of X, PX , is a probability measure on (R, B(R)).

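Proof. We have

\[ P_X(\emptyset) = P(X^{-1}(\emptyset)) = P(\emptyset) = 0, \qquad P_X(\mathbb{R}) = P(X^{-1}(\mathbb{R})) = P(\Omega) = 1. \]

To complete the proof, we need to show that $P_X$ is countably additive. Suppose B1 , B2 , . . . ∈ B(R) are disjoint. Then $X^{-1}(B_1), X^{-1}(B_2), \dots$ are disjoint and $X^{-1}\big(\bigcup_{i=1}^{\infty} B_i\big) = \bigcup_{i=1}^{\infty} X^{-1}(B_i)$. Therefore

\[ P_X\Big(\bigcup_{i=1}^{\infty} B_i\Big) = P\Big(X^{-1}\Big(\bigcup_{i=1}^{\infty} B_i\Big)\Big) = P\Big(\bigcup_{i=1}^{\infty} X^{-1}(B_i)\Big) = \sum_{i=1}^{\infty} P\big(X^{-1}(B_i)\big) = \sum_{i=1}^{\infty} P_X(B_i). \]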

Definition 34.8. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
The cumulative distribution function or cdf of X is the function FX : R → [0, 1] defined by

FX (t) = PX ((−∞, t]) = P (X ≤ t)

for all t ∈ R. When no confusion is possible, we denote the cdf of X by F instead of FX .

Definition 34.9. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ). We say that X is a continuous random variable if there is a function $f_X : \mathbb{R} \to [0, \infty)$ such that $f_X$ is B(R)-measurable and

\[ P(X \in B) = \int_B f_X\,d\lambda = \int f_X 1_B\,d\lambda \]

for every B ∈ B(R). The function $f_X$ is called the probability density function or pdf of X. When no confusion is possible, we denote the pdf of X by f instead of $f_X$.

Theorem 34.10. Let X be a continuous random variable on a probability space (Ω, A, P ). Let f be the pdf of X. Let F be the cdf of X.

1. $F(t) = \int_{-\infty}^{t} f\,d\lambda$ for every t ∈ R.

2. P (X = t) = 0 for all t ∈ R.

3. F is continuous.

4. The pdf of X is unique up to λ-a.e. equality. In other words, if g : R → [0, ∞) is B(R)-measurable and $P(X \in B) = \int_B g\,d\lambda$ for all B ∈ B(R), then f = g λ-a.e.

5. f (t) = F ′(t) for every t ∈ R at which f is continuous.

6. f (t) = F ′(t) for λ-a.e. t ∈ R.

Proof. Parts 1 through 5 are exercises. The proof of part 6 is difficult and is omitted for now.

Here are some examples of continuous random variables.
Example 34.11. (a) A random variable X is said to have a standard normal distribution if it has pdf $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$. Then

\[ P(c \le X \le d) = \int_c^d \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,dx. \]

(b) A random variable X is said to be uniformly distributed on the interval [a, b] if it has pdf $f = \frac{1}{b-a} 1_{[a,b]}$. For each interval [c, d] ⊆ [a, b], we have

\[ P(c \le X \le d) = \int \frac{1}{b-a} 1_{[a,b]} 1_{[c,d]}\,dx = \frac{d-c}{b-a}. \]

In words, the probability that X lies in the interval [c, d] ⊆ [a, b] is the relative length of the interval [c, d].
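
Both probabilities can be checked numerically. The sketch below assumes the usual standard normal density e^(−x²/2)/√(2π) and compares a midpoint-rule approximation of its integral with the closed form in terms of the error function, P(c ≤ X ≤ d) = (erf(d/√2) − erf(c/√2))/2; the function names and the step count are illustrative choices.

```python
import math


def normal_pdf(x):
    """Standard normal density exp(-x**2 / 2) / sqrt(2 * pi)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def normal_prob_erf(c, d):
    """P(c <= X <= d) for a standard normal X, via the error function."""
    return 0.5 * (math.erf(d / math.sqrt(2)) - math.erf(c / math.sqrt(2)))


def midpoint_integral(g, c, d, steps=20000):
    """Midpoint-rule approximation of the integral of g over [c, d]."""
    h = (d - c) / steps
    return h * sum(g(c + (k + 0.5) * h) for k in range(steps))


def uniform_prob(a, b, c, d):
    """P(c <= X <= d) for X uniform on [a, b], assuming [c, d] lies inside [a, b]."""
    return (d - c) / (b - a)


print(midpoint_integral(normal_pdf, -1.0, 1.0), normal_prob_erf(-1.0, 1.0))
print(uniform_prob(0.0, 4.0, 1.0, 2.0))
```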

Definition 34.12. Let X be a random variable on a probability space (Ω, A, P ). Let $\mu_c$ be the counting measure on (R, B(R)). We say X is a discrete random variable if there is a function $f_X : \mathbb{R} \to [0, \infty)$ such that

\[ P(X \in B) = \int_B f_X\,d\mu_c = \int f_X 1_B\,d\mu_c \]

for every B ∈ B(R). In such case, $f_X$ is called the probability mass function or pmf of X. When no confusion is possible, we denote the pmf of X by f instead of $f_X$.

Theorem 34.13. Let X be a discrete random variable on a probability space (Ω, A, P ). Let
S = {s ∈ R : P (X = s) > 0}.

1. S is countable.

2. The probability mass function of X is given by

\[ f(t) = P(X = t) = \sum_{s \in S} P(X = s)\,1_{\{s\}}(t) \]

for all t ∈ R.

3. $P_X(B) = \sum_{s \in S \cap B} f(s)$ for every B ∈ B(R).

4. $P_X = \sum_{s \in S} f(s)\,\delta_s$.

Proof. Exercise.

Theorem 34.14. Let X be a random variable on a probability space (Ω, A, P ). The following are
equivalent:

(a) X is discrete.

(b) The set S = {s ∈ R : P (X = s) > 0} is countable.

(c) The distribution of X is a countable weighted sum of point masses. More precisely,
$P_X = \sum_{i=1}^{\infty} c_i \delta_{s_i}$ for some sequences c1 , c2 , . . . ∈ [0, ∞) and s1 , s2 , . . . ∈ R.

In particular, if the range of X is countable, then X is discrete.

Proof. Exercise.

Here are some examples of discrete random variables.

Example 34.15. (a)

(b)

35 Theory of Cumulative Distribution Functions

We seek to answer two questions.

Question 1: Which functions can be cumulative distribution functions?

Question 2: We know the distribution PX determines the cumulative distribution function FX .
Does the cumulative distribution function FX determine the distribution PX ?

As a preliminary, we note some properties of cumulative distribution functions.

Theorem 35.1. If X is a random variable on a probability space (Ω, A, P ), then the cumulative
distribution function of X, FX , has the following properties.

(a) (Monotone Increasing) If s ≤ t are real numbers, then FX (s) ≤ FX (t).

(b) (Right-Continuous) limt→a+ FX (t) = FX (a) for all a ∈ R.

(c) limt→−∞ FX (t) = 0

(d) limt→∞ FX (t) = 1

Proof. (a): By monotonicity of the measure P , if s ≤ t, then

FX (s) = P ({ω : X(ω) ≤ s}) ≤ P ({ω : X(ω) ≤ t}) = FX (t).

(b): If (an ) is any sequence of real numbers which decreases to a, then the sets {ω : X(ω) ≤ an }
decrease to {ω : X(ω) ≤ a}. Since P is a finite measure, continuity from above gives
\[ \lim_{n \to \infty} F_X(a_n) = \lim_{n \to \infty} P(\{\omega : X(\omega) \le a_n\}) = P(\{\omega : X(\omega) \le a\}) = F_X(a). \]
Thus limt→a+ FX (t) = FX (a).

(c) and (d): Similar to (b). The details are an exercise.

Theorem 35.2. Let X be a random variable on a probability space (Ω, A, P ). Let a ≤ b be real
numbers. Let F be the cdf of X.

(a) PX ((a, b]) = F (b) − F (a).

(b) PX ((−∞, a)) = limt→a− F (t)

(c) PX ({a}) = F (a) − limt→a− F (t).

(d) PX ([a, b]) = F (b) − limt→a− F (t).

(e) PX ((a, b)) = limt→b− F (t) − F (a) if a < b.

(f) PX ([a, b)) = limt→b− F (t) − limt→a− F (t).

(g) PX ((a, ∞)) = 1 − F (a).

(h) PX ([a, ∞)) = 1 − limt→a− F (t).

Proof. Exercise.
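
The interval formulas of Theorem 35.2 can be checked by hand in a finite example. The Python sketch below assumes a hypothetical fair six-sided die and the endpoints a = 2, b = 5. For a finitely supported distribution the cdf is a step function, so the left limit of F at t equals the sum of the masses at points s < t; this is what the illustrative helper `F_left` computes.

```python
from fractions import Fraction

pmf = {s: Fraction(1, 6) for s in range(1, 7)}  # hypothetical fair die

def F(t):
    """cdf: F(t) = P(X <= t)."""
    return sum((p for s, p in pmf.items() if s <= t), Fraction(0))

def F_left(t):
    """Left limit of F at t; for finite support this is the total mass at s < t."""
    return sum((p for s, p in pmf.items() if s < t), Fraction(0))

def prob(pred):
    """Direct computation of P_X(B) for a set B given as a predicate."""
    return sum((p for s, p in pmf.items() if pred(s)), Fraction(0))

a, b = 2, 5
# Each entry pairs a direct computation with the formula from the theorem.
checks = {
    "(a) (a, b]": (prob(lambda s: a < s <= b), F(b) - F(a)),
    "(c) {a}": (prob(lambda s: s == a), F(a) - F_left(a)),
    "(d) [a, b]": (prob(lambda s: a <= s <= b), F(b) - F_left(a)),
    "(e) (a, b)": (prob(lambda s: a < s < b), F_left(b) - F(a)),
    "(f) [a, b)": (prob(lambda s: a <= s < b), F_left(b) - F_left(a)),
    "(g) (a, oo)": (prob(lambda s: s > a), 1 - F(a)),
    "(h) [a, oo)": (prob(lambda s: s >= a), 1 - F_left(a)),
}
for label, (direct, via_cdf) in checks.items():
    print(label, direct, via_cdf)
```

This is only a consistency check in one example, not a substitute for the proof, which rests on continuity of the measure from above and below.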

The previous theorem tells us that the cumulative distribution function FX determines the
distribution PX for intervals.

The next theorem is the key to answering the questions above.

Theorem 35.3. If F : R → [0, 1] satisfies properties (a)-(d) of Theorem 35.1, then there is a unique
probability measure µF on (R, B(R)) such that F (t) = µF ((−∞, t]) for all t ∈ R.

Proof. We give only an outline. The details are left as an exercise to the reader. To prove the
existence of µF , we mimic the construction of Lebesgue measure using Caratheodory’s restriction
theorem. For each interval (a, b] in R, define
\[ \ell_F((a, b]) = F(b) - F(a). \]
For each B ∈ P(R), define
\[ \mu_F^*(B) = \inf\left\{ \sum_{i=1}^{\infty} \ell_F(I_i) : I_1, I_2, \ldots \text{ are intervals of the form } (a, b] \text{ and } B \subseteq \bigcup_{i=1}^{\infty} I_i \right\}. \]
By mimicking the argument for Lebesgue outer measure, it is easy to check that µ∗F is an outer
measure on R and µ∗F (I) = ℓF (I) for every interval I of the form (a, b] (exercise). The latter implies that F (t) = µ∗F ((−∞, t]) for all
t ∈ R and µ∗F (R) = 1. Now the Caratheodory restriction theorem implies that the collection
M∗F = {E ∈ P(R) : µ∗F (A) = µ∗F (A ∩ E) + µ∗F (A ∩ E c ) for all A ∈ P(R)}
of µ∗F -measurable sets is a σ-algebra on R and that the restriction of µ∗F to M∗F is a measure.
Denote this measure by µF . Note µF is a measure on (R, M∗F ). We want to have a measure
on (R, B(R)). By mimicking the proof that B(R) ⊆ L(R), it is easy to show that B(R) ⊆ M∗F
(exercise). Thus we can restrict µF to B(R) to obtain a measure on (R, B(R)). We also denote
this restricted measure by µF . Since we have noted that F (t) = µ∗F ((−∞, t]) for all t ∈ R and
µ∗F (R) = 1, and since (−∞, t] is a Borel set we see that F (t) = µF ((−∞, t]) and µF is a probability
measure. This completes the proof of the existence of µF . One way to prove uniqueness is with the
monotone class lemma. Another way is given in the proof of Proposition 1.3.10 in Cohn’s book.
The details are left as an exercise.
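
To see what the theorem produces in a concrete case, here is a short worked example. The particular F below is an assumption chosen for illustration and does not appear in the outline above; the measure ν is likewise an auxiliary name.

```latex
% Assumed example: a piecewise-linear F satisfying properties (a)-(d).
F(t) =
\begin{cases}
0, & t < 0, \\
t, & 0 \le t \le 1, \\
1, & t > 1.
\end{cases}
% For 0 \le a \le b \le 1 the pre-measure agrees with Lebesgue measure:
\ell_F((a,b]) = F(b) - F(a) = b - a = \lambda((a,b]).
% Define \nu(B) = \lambda(B \cap [0,1]) for B \in \mathcal{B}(\mathbb{R}).
% Then \nu is a probability measure on (\mathbb{R}, \mathcal{B}(\mathbb{R})) and
\nu((-\infty,t]) = \lambda((-\infty,t] \cap [0,1]) = F(t) \quad \text{for all } t \in \mathbb{R},
% so the uniqueness assertion of the theorem gives
\mu_F = \nu.
```

In other words, for this F the measure µF is the distribution of a random variable that is uniformly distributed on [0, 1] in the sense of Example 34.11(b).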

Now we can answer the questions above. The first two corollaries answer Question 1.

Corollary 35.4. Suppose F : R → [0, 1] satisfies properties (a)-(d) above and let µ be the probability
measure from the previous theorem. If we define (Ω, A, P ) = (R, B(R), µ) and define X : Ω → R
by X(ω) = ω, then F is the cumulative distribution function of the random variable X on (Ω, A, P ).

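Proof. For every t ∈ R, we have
\[ \begin{aligned}
F_X(t) = P_X((-\infty, t]) &= P(X^{-1}((-\infty, t])) \\
&= P(\{\omega \in \Omega : X(\omega) \in (-\infty, t]\}) = P(\{\omega \in \Omega : \omega \in (-\infty, t]\}) \\
&= P((-\infty, t]) = \mu((-\infty, t]) = F(t).
\end{aligned} \]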

Corollary 35.5. Let F : R → [0, 1]. Then F satisfies properties (a)-(d) above iff there exists a
random variable whose cumulative distribution function is F .

The final corollary answers Question 2.

Corollary 35.6. Let X be a random variable on a probability space (Ω, A, P ). There is a unique
probability measure µ on (R, B(R)) such that FX (t) = µ((−∞, t]) for all t ∈ R, namely µ = PX .

Proof. Apply the previous theorem to F = FX and use the uniqueness assertion to conclude
µ = PX .

36 Probability Theory: Expected Value

From elementary probability and statistics, you may be familiar with the following formulas for the
expected value of discrete and continuous random variables. If X is a continuous random variable
with pdf f , then
\[ E(X) = \int_{\mathbb{R}} x f(x) \, dx. \]
If X is a discrete random variable with pmf f and S = {s ∈ R : P (X = s) > 0}, then
\[ E(X) = \sum_{s \in S} s f(s). \]

Our goal in this section is to show that these two formulas are actually special cases of a single
formula.

We start with the general definition of expected value.

Definition 36.1. Let X be a random variable on a probability space (Ω, A, P ). The expected
value (or expectation or mean or average) of X is defined to be
\[ E(X) = \int X \, dP \]
whenever the integral on the right is defined. Note that E(X) may be ±∞.
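
On a finite probability space the integral in the definition reduces to a finite sum of X(ω) P({ω}) over the outcomes ω. The following Python sketch illustrates this for a hypothetical example, two fair coin flips with X counting heads. The names `omega`, `P`, `X`, and `expectation` are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair coin flips.
omega = list(product("HT", repeat=2))      # the four outcomes
P = {w: Fraction(1, 4) for w in omega}     # P({w}) for each outcome w

def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")

def expectation(Y):
    """E(Y) as the integral of Y with respect to P, i.e. a finite sum here."""
    return sum((Y(w) * P[w] for w in omega), Fraction(0))

print(expectation(X))
```

The sum runs over the sample space, not over the values of X, which mirrors the fact that the definition integrates over Ω with respect to P.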

Theorem 36.2. (Law of the Unconscious Statistician) Let (Ω, A, P ) be a probability space. Let
X be a random variable on (Ω, A, P ). Let g : R → R be a B(R)-measurable function.

(a) g(X) = g ◦ X is a random variable on (Ω, A, P ).

(b) g is PX -integrable iff g(X) = g ◦ X is P -integrable.

(c) If g is PX -integrable, then
\[ E(g(X)) = \int (g \circ X) \, dP = \int_{\mathbb{R}} g \, dP_X. \]

(d) If either $\int_{\mathbb{R}} g \, dP_X = \pm\infty$ or $\int (g \circ X) \, dP = \pm\infty$, then
\[ E(g(X)) = \int (g \circ X) \, dP = \int_{\mathbb{R}} g \, dP_X. \]

Proof. (a): We are given that X : Ω → R is A-measurable and that g : R → R is B(R)-measurable.
We must show that g ◦ X is A-measurable. Let B ∈ B(R) be arbitrary. Since
g is B(R)-measurable, we have $g^{-1}(B) \in \mathcal{B}(\mathbb{R})$. Then, since X is A-measurable, we have
$X^{-1}(g^{-1}(B)) \in \mathcal{A}$. Finally, because $(g \circ X)^{-1}(E) = X^{-1}(g^{-1}(E))$ for all sets E ⊆ R, we have
\[ (g \circ X)^{-1}(B) = X^{-1}(g^{-1}(B)) \in \mathcal{A}. \]
Thus g ◦ X is A-measurable.

(b): First suppose that g is the indicator function of a set B ∈ B(R). Then g ◦ X is the indicator
function of the set $X^{-1}(B)$ and
\[ \int g \, dP_X = \int 1_B \, dP_X = P_X(B) = P(X^{-1}(B)) = \int 1_{X^{-1}(B)} \, dP = \int g \circ X \, dP. \]
In summary,
\[ \int g \, dP_X = \int g \circ X \, dP. \]

Using the linearity of the integral, we can see that this equality holds if g is any non-negative
simple B(R)-measurable function. By the monotone convergence theorem, this equality also holds if g is
any non-negative B(R)-measurable function. By applying the equality to the positive and negative
parts of g and noting that $(g \circ X)^+ = g^+ \circ X$ and $(g \circ X)^- = g^- \circ X$, we have
\[ \int g^+ \, dP_X = \int (g \circ X)^+ \, dP, \qquad \int g^- \, dP_X = \int (g \circ X)^- \, dP. \]
So we see that g is PX -integrable iff both $\int g^+ \, dP_X$ and $\int g^- \, dP_X$ are finite iff both $\int (g \circ X)^+ \, dP$
and $\int (g \circ X)^- \, dP$ are finite iff g ◦ X is P -integrable. This proves (b).

(c): The first equality is just the definition of expectation. Based on what we proved above, if g is
PX -integrable, then
\[ \int g \, dP_X = \int g^+ \, dP_X - \int g^- \, dP_X = \int (g \circ X)^+ \, dP - \int (g \circ X)^- \, dP = \int g \circ X \, dP. \]
This proves the second equality.

(d): If $\int g \, dP_X = \pm\infty$, then one of $\int g^+ \, dP_X$ and $\int g^- \, dP_X$ is infinite and the other is finite, and we can
do the same calculation as in (c). The other case is similar.
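
The identity in part (c) can be verified exactly in a finite example. The Python sketch below assumes a hypothetical setup: two fair dice, X the larger of the two values, and g(x) = x². It first builds the distribution P_X as the pushforward of P under X and then computes both integrals as finite sums. All names in the code are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical finite probability space: two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """Random variable: the larger of the two dice."""
    return max(w)

def g(x):
    """A measurable function of the value of X."""
    return x * x

# Distribution of X: P_X({s}) = P(X^{-1}({s})), accumulated outcome by outcome.
P_X = defaultdict(Fraction)
for w in omega:
    P_X[X(w)] += P[w]

# Integral of g composed with X against P (a sum over the sample space).
lhs = sum((g(X(w)) * P[w] for w in omega), Fraction(0))
# Integral of g against P_X (a sum over the values of X).
rhs = sum((g(s) * p for s, p in P_X.items()), Fraction(0))

print(lhs, rhs)
```

The first sum has 36 terms and the second only 6, which is the practical point of the theorem: E(g(X)) can be computed from the distribution of X without reference to the underlying space.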

Remark. Some authors use the notation $\int g(x) \, dF_X(x)$. This is just another notation for
$\int g(x) \, dP_X(x)$.

By taking g(x) = x in the previous theorem, we get the following.

Corollary 36.3. If X is a random variable on a probability space (Ω, A, P ), then
\[ E(X) = \int x \, dP_X. \]

Theorem 36.4. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
Theorem 36.5. Let (Ω, A, P ) be a probability space. Let X be a random variable on (Ω, A, P ).
Let g : R → R be a B(R)-measurable function.

(a) If X is a continuous random variable with pdf f , then
\[ E(g(X)) = \int_{\mathbb{R}} g(x) f(x) \, dx \]
whenever the integral is defined.

(b) If X is a discrete random variable with pmf f and S = {s ∈ R : P (X = s) > 0}, then
\[ E(g(X)) = \sum_{s \in S} g(s) f(s) \]
whenever the sum is defined.

Proof. (a): For all B ∈ B(R),
\[ P_X(B) = \int_B f \, d\lambda, \]
and this can be rewritten as
\[ \int 1_B \, dP_X = \int 1_B f \, d\lambda. \]
By the linearity of the integral, we have
\[ \int g \, dP_X = \int g f \, d\lambda \]
if g is any non-negative simple B(R)-measurable function. By the monotone convergence theorem, this
equality also holds if g is any non-negative B(R)-measurable function. By applying the equality to
the positive and negative parts of g, we see that the equality holds whenever g is PX -integrable or
(more generally) whenever one of the integrals $\int g \, dP_X$ or $\int g f \, d\lambda$ is defined. (As usual, “defined”
means equal to a finite number or ±∞.) Now the previous theorem implies the desired result.

(b): By repeating the argument in (a) with λ replaced by the counting measure µc , we get
\[ E(g(X)) = \int_{\mathbb{R}} g(x) f(x) \, d\mu_c(x) = \sum_{s \in S} g(s) f(s). \]
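
Both formulas of the theorem are easy to try numerically. The Python sketch below assumes two hypothetical examples with g(x) = x²: X uniform on [0, 3], where the integral is approximated by a midpoint rule, and X a fair die, where the sum is computed exactly. The function names and parameters are illustrative choices, not part of the notes.

```python
from fractions import Fraction

def expectation_continuous(g, f, lo, hi, n=10_000):
    """Midpoint-rule approximation of the integral of g(x) f(x) over [lo, hi],
    assuming the pdf f vanishes outside [lo, hi]."""
    h = (hi - lo) / n
    midpoints = (lo + (i + 0.5) * h for i in range(n))
    return h * sum(g(x) * f(x) for x in midpoints)

def expectation_discrete(g, pmf):
    """Exact value of the sum of g(s) f(s) over the support of the pmf."""
    return sum((g(s) * p for s, p in pmf.items()), Fraction(0))

def square(x):
    return x * x

# (a) X uniform on [0, 3]: pdf 1/3 on [0, 3]; the exact value of E(X^2) is 3.
e_cont = expectation_continuous(square, lambda x: 1.0 / 3.0, 0.0, 3.0)

# (b) X a fair die: pmf 1/6 on {1, ..., 6}; E(X^2) = 91/6.
die = {s: Fraction(1, 6) for s in range(1, 7)}
e_disc = expectation_discrete(square, die)

print(e_cont, e_disc)
```

The two helpers differ only in the measure being integrated against (Lebesgue measure versus counting measure), which is the unification the section is after.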

37 Absolutely Continuous and Singular Measures

38 Decomposition of Probability Measures
