Measure Theory Notes
Daniel Raban
Contents

1 Motivation

3 σ-algebras
3.1 Definition and examples
3.2 Constructing σ-algebras

4 Measures
4.1 Definitions and examples
4.2 Properties of measures

6 Construction of measures
6.1 Outer measure
6.2 Distribution functions and Lebesgue measure
1 Motivation
What is length? Intuitively, you might want to line up a ruler (or an interval on
the real line) against an object to measure “length.” This suggests that we could
define length based on intervals on the real line; ignoring units, you can say that the
length of the interval [a, b] is b − a. We could fairly intuitively extend this idea to
finite unions and intersections of intervals; just add the lengths of disjoint intervals.
But what about other subsets of R? Can we extend the idea of length to Z, Q, or
complicated fractal-like subsets of R?
What about area? This might seem even harder to define. More generally, it is common to want to assign a number or a value to a set. We want to assign
area or volume to subsets of R2 or R3 . Maybe we have a distribution of mass in R3 ,
and we want to be able to see how much mass is in any kind of set, even a fractal-like
set (maybe this brings to mind an idea of integration). Or maybe we have a set of
outcomes of some situation, and we want to define probabilities of subsets of these
outcomes.
At the most basic level, measure theory is the theory of assigning a number, via
a “measure,” to subsets of some given set. Measure theory is the basic language of
many disciplines, including analysis and probability. Measure theory also leads to a
more powerful theory of integration than Riemann integration, and formalizes many
intuitions about calculus.
Example 2.1 (Vitali set). Let’s assume we can measure all subsets of R, with some
function µ : P(R) → [0, ∞] that takes subsets of R and assigns them a “length.” In
particular, let’s assume that µ has the following properties:
1. µ([0, 1)) = 1.
2. If sets A1, A2, . . . are mutually disjoint, then µ(⋃_i Ai) = ∑_i µ(Ai) (countable additivity).

3. µ(A + x) = µ(A) for every A ⊆ R and x ∈ R, where A + x := {a + x : a ∈ A} (translation invariance).
Define an equivalence relation on [0, 1) by declaring x ∼ y iff x − y ∈ Q. Using the axiom of choice, let N ⊆ [0, 1) be a set containing exactly one element from each equivalence class. For q ∈ Q ∩ [0, 1), let

Nq := {x + q : x ∈ N ∩ [0, 1 − q)} ∪ {x + q − 1 : x ∈ N ∩ [1 − q, 1)}.
That is, we translate N to the right by q, and we take the part
that sticks out of [0, 1) and shift it left by 1; you can also think of it as the translated
set wrapping around the interval [0, 1), much like how movement works in a video
game where moving past the right edge of the screen makes you enter from the left
edge of the screen (e.g. PAC-MAN). This is just two translations, so µ(Nq ) = µ(N ).
You should note that every x ∈ [0, 1) lies in exactly one Nq : x is equivalent to exactly one element y ∈ N, and x ∈ Nq for the unique q ∈ Q ∩ [0, 1) with x = y + q or x = y + q − 1.
So the Nq with q ∈ Q ∩ [0, 1) form a partition of [0, 1). We have already run into a
problem: what is µ(N)? Observe that
µ([0, 1)) = µ(⋃_{q∈Q∩[0,1)} Nq) = ∑_{q∈Q∩[0,1)} µ(Nq) = ∑_{q∈Q∩[0,1)} µ(N).
If µ(N ) = 0, then µ([0, 1)) = 0. But if µ(N ) = c > 0, then µ([0, 1)) = ∞. So there
is no value µ can assign N . We call N a nonmeasurable set.
3 σ-algebras
3.1 Definition and examples
Definition 3.1. Let X be any set. A σ-algebra (or σ-field)2 F ⊆ P(X) is a collection of subsets of X such that

1. F ≠ ∅.

2. If E ∈ F, then Ec ∈ F.

3. If E1, E2, . . . ∈ F, then ⋃_{i=1}^∞ Ei ∈ F.
In other words, a σ-algebra is a nonempty collection that is closed under set complements and countable unions. The definition also implies closure under countable intersections, because ⋂_{n=1}^∞ En = (⋃_{n=1}^∞ (En)c)c.
Example 3.1. Let X be any set. Then P(X) is a σ-algebra.
Remark 3.1. You might wonder what the whole point of defining σ-algebras was,
since P(X) is a σ-algebra. Indeed, using P(R) as a σ-algebra for our “length”
measure would be no different from if we had not introduced σ-algebras at all. The
key point to realize here is that some measures with desired properties (such as
translation invariance) will be defined on smaller σ-algebras, while other measures
may be defined on larger σ-algebras, even P(X).
Example 3.2. Let X be any set. The collection F = {∅, X} is a σ-algebra.
In fact, this is in some sense the minimal σ-algebra on a set.
Proposition 3.1. Let F ⊆ P(X) be a σ-algebra. Then ∅, X ∈ F.
Proof. The collection F is nonempty, so there exists some E ∈ F. Then E c ∈ F,
and we get that X = E ∪ E c ∪ · · · ∈ F. Moreover, ∅ = X c ∈ F.
So a σ-algebra is also closed under finite unions, since ⋃_{i=1}^n Ei = E1 ∪ · · · ∪ En ∪ ∅ ∪ ∅ ∪ · · · . The same holds for finite intersections. Here is a nontrivial example of a σ-algebra.
Example 3.3. Let X be an uncountable set. Then the collection
F := {E ⊆ X : E is countable or E c is countable}
is a σ-algebra.
2 These are not to be confused with algebras and fields from abstract algebra. In my experience, analysts use the term σ-algebra, and probabilists use σ-field.
3.2 Constructing σ-algebras
How do we construct σ-algebras? Often, it is difficult to just define a set that contains
what we want. Sometimes, it is useful to create σ-algebras from other σ-algebras or
from other collections of sets, such as a topology.
Proposition 3.2. Let {Fα : α ∈ A} be a collection of σ-algebras on a set X. Then F = ⋂_{α∈A} Fα is a σ-algebra.
Proof. We check the 3 parts of the definition.
1. Nonemptiness: F ≠ ∅ because X ∈ Fα for each α ∈ A.

2. Closure under complements: If E ∈ F, then E ∈ Fα for each α ∈ A. Every Fα is a σ-algebra and is closed under complements, so Ec ∈ Fα for each α ∈ A. Then Ec ∈ ⋂_{α∈A} Fα = F.

3. Closure under countable unions: If Ei ∈ F for each i ∈ N, then Ei ∈ Fα for each i and each α ∈ A. Every Fα is a σ-algebra and is closed under countable unions, so ⋃_{i=1}^∞ Ei ∈ Fα for each α ∈ A. Then ⋃_{i=1}^∞ Ei ∈ ⋂_{α∈A} Fα = F.
We now introduce one of the most common ways to construct a σ-algebra. This
construction closely mirrors other constructions in mathematics, such as closures of
sets in topology and ideals generated by elements in ring theory.
Definition 3.2. Let E be a collection of subsets of a set X. The σ-algebra gen-
erated by E, denoted σ(E), is the smallest σ-algebra containing E. That is, σ(E) is
the intersection of all σ-algebras containing E.
Why is this intersection well defined? There is always one σ-algebra containing E,
namely P(X). Now we can introduce possibly the most commonly used σ-algebra.
Example 3.4. Let X be a nonempty set, let A ⊆ X, and let E = {A}. Then σ(E) = {∅, A, Ac, X}. If you take B ⊆ X and let E′ = {A, B}, then σ(E′) contains sets such as A, B, Ac, A ∪ B, A ∩ Bc, (A ∪ B)c, . . . . Constructing the σ-field generated by a collection of sets creates a large collection of sets to measure.
Example 3.5. Let X be a metric space (or topological space), and let T be the
collection of open subsets of X. Then the Borel σ-algebra is given by BX = σ(T ).
This σ-algebra contains open sets, closed sets, and more. We most commonly use
BR , which we will sometimes denote by B. When we talk about BRd , we assume the
Euclidean metric.3
3 We can actually use any metric induced by a norm on Rd, since all norms on a finite-dimensional R-valued vector space induce equivalent metrics.
Example 3.6. What kinds of sets are in B = BR? By definition, B contains all open intervals and unions of open intervals. Since closed sets are complements of open sets, B contains all closed intervals, as well. For any x ∈ R, {x} ∈ B because {x} = ⋂_{n=1}^∞ [x, x + 1/n]. From this, we get that all countable subsets of R are in B.
In fact, most subsets of R you would ever care about can be found in B, save for
pathological sets we might construct, such as the Vitali set.
Example 3.7. Suppose you are flipping a coin repeatedly.4 Define the set of outcomes Ω = {H, T}∞ := {H, T} × {H, T} × · · · . For each n ≥ 1, let

Fn := {A × {H, T} × {H, T} × · · · : A ⊆ {H, T}^n}.

The collection Fn is a σ-field, and contains the events that can be determined after n flips of the coin. Note that F1 ⊆ F2 ⊆ · · · . We can define F∞, the σ-field of events in the whole random process, as σ(⋃_{n=1}^∞ Fn). This is actually larger than ⋃_{n=1}^∞ Fn, since all events in ⋃_{n=1}^∞ Fn are determined by only finitely many coin flips, while events in σ(⋃_{n=1}^∞ Fn) can depend on the results of infinitely many coin flips.
The previous example illustrates a fact about σ-algebras that you should be aware
of: a union of σ-algebras need not be a σ-algebra.
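For a finite set, this failure can be checked directly. In the sketch below (the helper is_sigma_algebra is my own, and on a finite set it is enough to check closure under complements and pairwise unions), the union of σ({1}) and σ({2}) on X = {1, 2, 3} is missing {1} ∪ {2} = {1, 2}.

```python
def is_sigma_algebra(X, F):
    """Check the (finite-set) sigma-algebra axioms: nonempty, closed under
    complements, and closed under pairwise (hence all finite) unions."""
    X = frozenset(X)
    F = {frozenset(A) for A in F}
    return (len(F) > 0
            and all(X - A in F for A in F)
            and all(A | B in F for A in F for B in F))

X = {1, 2, 3}
F1 = [set(), {1}, {2, 3}, X]   # sigma({1})
F2 = [set(), {2}, {1, 3}, X]   # sigma({2})
union = F1 + F2

print(is_sigma_algebra(X, F1), is_sigma_algebra(X, F2))  # True True
print(is_sigma_algebra(X, union))  # False: {1} | {2} = {1, 2} is missing
```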
We can also create σ-algebras on the Cartesian product of sets in a way that
“respects” the σ-algebras on the components.
Exercise 3.1. Let X and Y be separable metric spaces. Show that BX×Y = BX ⊗BY .
4 We assume that you have a lot of free time, so you flip the coin infinitely many times.
4 Measures
4.1 Definitions and examples
Now that we have introduced σ-fields, we can define measures.

Definition 4.1. Let X be a set, and let F ⊆ P(X) be a σ-algebra. A measure on (X, F) is a function µ : F → [0, ∞] such that

1. µ(∅) = 0.

2. If sets E1, E2, . . . ∈ F are mutually disjoint, then µ(⋃_{i=1}^∞ Ei) = ∑_{i=1}^∞ µ(Ei).
The second condition is called countable additivity; the same property also
holds for only finitely many sets E1 , . . . , En (finite additivity) because you can just
let Ei = ∅ for i > n. Note that µ can take on the value ∞; measures that do not
are called finite measures.
Definition 4.2. A measure µ on a set X is called σ-finite if there exist countably many sets E1, E2, . . . such that ⋃_{i=1}^∞ Ei = X, and µ(Ei) < ∞ for each i.
Example 4.1. Let X be any set, equipped with the σ-algebra F = P(X). The
function µ(E) = |E| that returns the size of the set is a measure called counting
measure. Counting measure is σ-finite iff X is countable.
Example 4.2. Let X be any nonempty set, equipped with the σ-algebra F = P(X). Fix some x ∈ X. The function

µ(E) = { 1 if x ∈ E,  0 if x ∉ E }

is a measure, called the point mass (or Dirac measure) at x.
Example 4.4. Let µ be a measure on (X, F), and let E ∈ F. Then the function
µE : F → [0, ∞], given by
µE (F ) := µ(E ∩ F ),
is a measure on (X, F), sometimes called the restriction of µ to E.
Example 4.5. Imagine you flip a fair coin once. The set of outcomes is Ω = {H, T}, and the σ-field of events is F = P(Ω). Define the function P : F → [0, 1] by

P(E) = { 0 if E = ∅,  1/2 if E = {H} or E = {T},  1 if E = Ω }.

The function P is a probability measure that gives the probability of each event occurring.

If we want to flip our coin n times, the set of outcomes is Ωn = {H, T}^n, our σ-algebra is Fn = P(Ωn), and we can construct the probability measure

Pn(E1 × · · · × En) = ∏_{i=1}^n P(Ei).

If we flip the coin infinitely many times, we can similarly define (on “rectangles” of events)

P∞(E1 × E2 × · · · ) = ∏_{i=1}^∞ P(Ei).

Later, we will learn how to formally extend such a function to a measure on any set in the σ-field.
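As a quick sanity check of the product formula, here is a small sketch for three flips of a fair coin. The helper name P_n and the use of exact fractions are my own choices, not part of the notes.

```python
from fractions import Fraction
from itertools import product

P = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

def P_n(*events):
    """P_n(E_1 x ... x E_n) = product of P(E_i) for a fair coin."""
    out = Fraction(1)
    for E in events:
        out *= sum(P[o] for o in E)
    return out

print(P_n({"H"}, {"H", "T"}, {"H", "T"}))  # 1/2: "first of three flips is heads"
# summing over all 8 singleton outcomes gives total probability 1
print(sum(P_n(*map(set, omega)) for omega in product("HT", repeat=3)))
```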
Definition 4.4. If X is a set, and F ⊆ P(X) is a σ-algebra, then the pair (X, F)
is called a measurable space. If µ : F → [0, ∞] is a measure, then the triple
(X, F, µ) is called a measure space.
4.2 Properties of measures
Here are four basic facts about measures. The first two properties should
form your “common sense” intuition about what measures are. The third and fourth
properties are very useful formal properties that allow us to determine the measure of
complicated sets; they should also inform your intuition about how measures work.
Proposition 4.1. Let (X, F, µ) be a measure space. Then
1. (Monotonicity) If E ⊆ F , then µ(E) ≤ µ(F ).
2. (Subadditivity) µ(⋃_{n=1}^∞ En) ≤ ∑_{n=1}^∞ µ(En).

3. (Continuity from below) If E1 ⊆ E2 ⊆ · · · , then µ(⋃_{n=1}^∞ En) = lim_{n→∞} µ(En).

4. (Continuity from above) If E1 ⊇ E2 ⊇ · · · and µ(En) < ∞ for some n, then µ(⋂_{n=1}^∞ En) = lim_{n→∞} µ(En).
To prove 4, first assume (without loss of generality) that µ(E1) < ∞; otherwise, we can throw away the first finitely many Ei to make µ(E1) < ∞. Then

µ(⋂_{n=1}^∞ En) = µ(E1 \ ⋃_{n=2}^∞ (En−1 \ En)) = µ(E1) − ∑_{n=2}^∞ µ(En−1 \ En)

= lim_{n→∞} [ µ(E1) − ∑_{i=2}^n (µ(Ei−1) − µ(Ei)) ]

= lim_{n→∞} µ(En).

(The finiteness assumption is necessary: for counting measure on N with En = {n, n + 1, . . . }, we have µ(En) = ∞ for every n, but µ(⋂_{n=1}^∞ En) = µ(∅) = 0.)
5 Measurable functions
Proposition 5.1. Let X and Y be metric (or topological) spaces, equipped with the
respective Borel σ-algebras BX and BY . Then if f : X → Y is continuous, it is
measurable.
Proof. Consider the collection A := {E ⊆ Y : f −1(E) ∈ BX}. Since f is continuous, f −1(U) is open (hence Borel) in X for every open U ⊆ Y, so A contains all open subsets of Y. We check that A is a σ-algebra; this gives BY = σ({open sets}) ⊆ A, which is exactly the measurability of f.

1. ∅ is open in Y, so A is nonempty.

2. If E ∈ A, then f −1(Ec) = (f −1(E))c ∈ BX, so Ec ∈ A.

3. If E1, E2, · · · ∈ A, then f −1(⋃_{i=1}^∞ Ei) = ⋃_{i=1}^∞ f −1(Ei), which is in BX by the closure of σ-algebras under countable unions.
Exercise 5.1. (a) Let f : X → Y be a function. Show that for arbitrary unions and intersections,

f −1(⋃_α Aα) = ⋃_α f −1(Aα),    f −1(⋂_α Aα) = ⋂_α f −1(Aα),

and f −1(Ac) = (f −1(A))c.

(b) Show that

f(⋃_α Aα) = ⋃_α f(Aα),

and find a counterexample to show that this property does not hold for intersections.

7 This is one of those things that it is publicly acceptable to be intimately familiar with. Relish this fact, and take it as your motivation to complete the associated exercise.
While the previous exercise provides a motivation for the definition of measurable
functions regarding formal manipulations, the following constructions provide much
more satisfying motivation.
Example 5.2. Let (X, M, µ) be a measure space, let (Y, N ) be a measurable space,
and let f : X → Y be a measurable function. We can define the push-forward
measure ν on Y by setting ν(E) = µ(f −1 (E)). Check yourself that this is indeed a
measure.
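On finite sets, the push-forward is easy to compute explicitly. Here is a minimal sketch (the names mu, f, and pushforward are my own, and µ here is just the uniform probability on a five-point set):

```python
from fractions import Fraction

# A toy push-forward: mu is the uniform probability on X, f maps X into Y,
# and nu(E) := mu(f^{-1}(E)).
X = {-2, -1, 0, 1, 2}
mu = {x: Fraction(1, len(X)) for x in X}
f = lambda x: x * x            # measurable, since the sigma-algebra is P(X)

def pushforward(E):
    """nu(E) = mu({x in X : f(x) in E})."""
    return sum(mu[x] for x in X if f(x) in E)

print(pushforward({4}))        # 2/5, since f^{-1}({4}) = {-2, 2}
print(pushforward({0, 1, 4}))  # 1, since all of X maps into {0, 1, 4}
```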
Given a measurable space (Y, N) and any function f : X → Y, we can define a σ-algebra on X, the σ-algebra generated by f, by pulling back N:

f −1(N) := {f −1(E) ⊆ X : E ∈ N}.

This σ-algebra is also sometimes denoted σ(f). The σ-algebra generated by f is the smallest σ-algebra on X for which the function f is measurable.
If ν is a measure on Y , we can construct a pull-back measure µ on X by setting
µ(f −1 (E)) = ν(E). Check yourself that this is indeed a measure. Note, however,
that f cannot be used to define a pull-back measure on an arbitrary σ-algebra on X; this only works for σ-algebras that are contained in f −1(N).
5.2 Random variables and distributions
Measurable functions have an alternative, yet very important, interpretation in prob-
ability theory.
Example 5.5. Let X be a real-valued random variable (we implicitly assume the
Borel σ-algebra on R). Since f (x) = x2 is continuous, it is measurable (from (R, B)
to (R, B)). So X 2 is also a random variable. Similarly, aX + b (for a, b ∈ R), sin(X),
eX , etc. are random variables.
Example 5.6. Let µ be a probability measure on ({−1, 1}, P({−1, 1})) given by
µ({−1}) = µ({1}) = 1/2. Let (Ω, F, P) be our canonical measure space. If X is a
measurable function from Ω to {−1, 1} with push-forward measure µ, we call X a
Rademacher random variable. In particular, we have
P(X = 1) := P({ω ∈ Ω : X(ω) ∈ {1}}) = µ({1}) = 1/2,
P(X = −1) := P({ω ∈ Ω : X(ω) ∈ {−1}}) = µ({−1}) = 1/2.
Example 5.7. Let’s construct a Poisson random variable. Let µ be a probability
measure on (N, P(N)) given by
µ({k}) = e^{−λ} λ^k / k!,
for some real-valued constant λ > 0. Let (Ω, F, P) be our canonical measure space,
and let X be a measurable function from Ω to N with push-forward measure µ. In
particular,
P(X = k) = µ({k}) = e^{−λ} λ^k / k!,
and the probability of any subset of N can be specified by computing a countable
sum of P(X = k) for different k.
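To make the push-forward picture concrete, here is a numerical sketch: it builds X as a map from (a stand-in for) the uniform measure on [0, 1) via the inverse of the Poisson cdf and compares empirical frequencies with the formula above. The helper poisson_from_uniform and the use of random.random() as a substitute for a genuinely uniform sample are my own choices, not part of the notes.

```python
import math
import random

lam = 3.0
random.seed(0)

def poisson_from_uniform(u, lam):
    """Inverse-cdf construction: map u in [0, 1) to the k with F(k-1) <= u < F(k),
    where F is the Poisson(lam) cdf."""
    k, cdf = 0, math.exp(-lam)
    while u >= cdf:
        k += 1
        cdf += math.exp(-lam) * lam**k / math.factorial(k)
    return k

# Empirically, the push-forward of the uniform measure under this map puts
# mass e^{-lam} lam^k / k! on each k.
n = 100_000
counts = {}
for _ in range(n):
    k = poisson_from_uniform(random.random(), lam)
    counts[k] = counts.get(k, 0) + 1

for k in range(6):
    print(k, counts.get(k, 0) / n, math.exp(-lam) * lam**k / math.factorial(k))
```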
One of the amazing aspects of measure theory is that it unifies the ideas of discrete
and continuous probability (and even allows for mixing of the two). We have covered
a few examples of discrete probability spaces above. Here is an example of a non-
discrete case.
Example 5.8. A random variable with uniform distribution on [0, 1] (also de-
noted as U [0, 1]) is a random variable X with codomain ([0, 1], B[0,1] ) and distribution
µ that satisfies
µ([a, b]) = P(X ∈ [a, b]) = b − a.
for all 0 ≤ a ≤ b ≤ 1. The existence of such a measure is non-obvious, and we shall
prove its existence in the next section.
Example 5.9. Here is a distribution on ([0, 1], B[0,1]) that is not discrete but also has no continuous probability density over the real numbers. Let

µ({0}) = 1/2,    µ([a, b]) = { 0 if 0 < a ≤ b < 1/2,  b − a if 1/2 ≤ a ≤ b ≤ 1 }.

That is, µ is the uniform distribution but with all the probability in the interval [0, 1/2) “concentrated” onto the value 0. Taking the existence of the uniform distribution for granted (more precisely, the fact that we can define a measure on ([0, 1], B[0,1]) by defining its value on all subintervals), such a measure is well-defined.
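One way to see that such a mixed measure is perfectly well-behaved is to sample from it: with probability 1/2 output the atom 0, and otherwise output a uniform point of [1/2, 1]. This small sketch (names and numbers mine) estimates µ({0}) and µ([0.6, 0.8]):

```python
import random

random.seed(1)

def sample_mixed():
    """With probability 1/2 return the atom at 0; otherwise a uniform point of [1/2, 1]."""
    if random.random() < 0.5:
        return 0.0
    return 0.5 + 0.5 * random.random()

n = 200_000
draws = [sample_mixed() for _ in range(n)]
print(sum(1 for x in draws if x == 0.0) / n)         # about 0.5 = mu({0})
print(sum(1 for x in draws if 0.6 <= x <= 0.8) / n)  # about 0.2 = mu([0.6, 0.8])
```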
To say that a random variable X has distribution µ, we write X ∼ µ.
5.3 Properties of real- and complex-valued measurable functions
In this section, we show that sums, products, and limits of real- and complex-valued
measurable functions are measurable. Here, we always assume that the σ-algebra on
R or C is BR or BC , respectively. The most important parts of this section are not
the results (which are not surprising) but rather the techniques used in the proofs.
Proposition 5.3. Let f, g : X → R be measurable functions. Then the functions
f + g and f g are measurable.
Proof. By the proposition we proved when we introduced the idea of measurable
functions, it suffices to show that (f + g)−1 ((a, b)) is measurable for a, b ∈ R; this
is because every nonempty open set in R can be expressed as a countable union of
open intervals, so these sets generate B. The key property here is that R is separable
(i.e. it contains the countable dense set Q).
We have

(f + g)−1((a, ∞)) = ⋃_{q∈Q} ({x : f(x) > q} ∩ {x : g(x) > a − q}) = ⋃_{q∈Q} (f −1((q, ∞)) ∩ g −1((a − q, ∞))),

(f + g)−1((−∞, b)) = ⋃_{q∈Q} ({x : f(x) < q} ∩ {x : g(x) < b − q}) = ⋃_{q∈Q} (f −1((−∞, q)) ∩ g −1((−∞, b − q))),

both of which are measurable, being countable unions of measurable sets; intersecting them shows that (f + g)−1((a, b)) is measurable.
Exercise 5.2. Show that if f, g are R-valued measurable functions, then max(f, g)
and min(f, g) are measurable.
To extend to the case of C-valued measurable functions, we provide a more general
framework for checking measurability of functions on product spaces.
Proposition 5.4. Let (X, M) and (Yα, Nα) for each α ∈ A be measurable spaces, and let πα : ∏_{α∈A} Yα → Yα be the projection maps. Then f : X → ∏_{α∈A} Yα is measurable iff fα = πα ◦ f is measurable for each α ∈ A.9

(The diagram here shows f : X → (∏_{n=1}^4 Yn, ⊗_{n=1}^4 Nn) together with its components f1, f2, f3, f4, where fn = πn ◦ f for the projections π1, . . . , π4.)
Corollary 5.2. Let f, g : X → C be measurable functions. Then the functions f + g
and f g are measurable.
Proof. By the previous corollary, it is sufficient to show that Re(f + g), Im(f + g),
Re(f g), and Im(f g) are measurable. Moreover, the same corollary gives us that
Re(f ), Im(f ), Re(g), and Im(g) are all measurable. We have
Re(f + g) = Re(f) + Re(g),  Im(f + g) = Im(f) + Im(g),  Re(f g) = Re(f) Re(g) − Im(f) Im(g),  Im(f g) = Re(f) Im(g) + Im(f) Re(g),
which are measurable as sums and products of measurable real-valued functions.
In probability theory, this gives us an often taken-for-granted result: that sums
and products of random variables are indeed random variables.
What if we have a random process (i.e. a sequence of random variables)? Are limits of sequences of measurable functions measurable? To talk about limits on the real line, we must be prepared to have functions take values in R̄ := R ∪ {±∞}. Equip R̄ with BR̄, the σ-algebra generated by {E ⊆ R̄ : E ∩ R ∈ BR}.10

10 As the notation suggests, this is actually a Borel σ-algebra, induced by a metric on R̄. The metric is ρ(x, y) = | arctan(x) − arctan(y)|.
Proposition 5.5. Let {fj}j∈N be a sequence of R̄-valued measurable functions. Then the functions

g1(x) := sup_j fj(x),  g2(x) := inf_j fj(x),  g3(x) := lim sup_j fj(x),  g4(x) := lim inf_j fj(x)

are all measurable. Additionally, the set {x : lim_{j→∞} fj(x) exists} is measurable.
Proof. For g1, it is sufficient to show that g1−1((a, ∞)) is a measurable set because {(a, ∞) : a ∈ R} generates B; you can check this by checking that countable unions and intersections of sets in this collection get you all of the open intervals in R. We have that

g1−1((a, ∞)) = {x : sup_j fj(x) > a} = ⋃_{j∈N} {x : fj(x) > a} = ⋃_{j∈N} fj−1((a, ∞)),

which is measurable, being a countable union of measurable sets. For g2, note that g2(x) = inf_j fj(x) = − sup_j (−fj(x)),
which is measurable because the inside is g1 but with the functions −fj .
For measurability of g3 , let hn (x) := supj≥n fj (x); hn is measurable by the same
reasoning used for g1 . Then g3 = inf n hn , so g3 is measurable since g2 is measurable.
For measurability of g4 , note that
g4(x) = lim inf_j fj(x) = − lim sup_j (−fj(x)),
which is measurable because the inside is g3 but with the functions −fj .
Finally, note that

{x : lim_{j→∞} fj(x) exists} = {x : lim sup_j fj(x) = lim inf_j fj(x)} = {x : g3(x) = g4(x)},

which is measurable since g3 and g4 are measurable.
6 Construction of measures
6.1 Outer measure
Until now, we have been vague about how to construct measures, especially the more
complicated measures on R. We now develop tools for doing so. These constructions
are sometimes skipped by people who wish to assume the existence of such measures
and treat them as “black boxes.” However, if you go on to do work involving measure
theory, you will invariably run into issues involving measurability, and in such times,
knowledge of outer measure will save the day.
Here is some motivation. We have had two main issues so far in constructing
measures:
2. how to actually specify values for a large class of subsets of a measurable space.
Outer measure solves both these issues by “approximating complicated sets using
simpler sets.” For example, in R2 , you can approximate the area under a curve by
successively finer coverings of the area by rectangles (as in Riemann integration);
you might call this approximation by “outer area.” As an analogy, consider the
relationship between the limit and lim sup of a real-valued sequence; the lim sup
approximates the limit from above, and we can define the limit as the value of the
lim sup under certain conditions (in this case, the lim sup equalling the lim inf).
The construction takes two steps and is summarized in the following diagram:

premeasure (µ0) → outer measure (µ∗) → measure (µ),

where the first arrow is the outer approximation step and the second is the Carathéodory step.
Let’s start in the middle, since outer measure is the most important of these.
Definition 6.1. An outer measure on a set X is a function µ∗ : P(X) → [0, ∞] such that

1. µ∗(∅) = 0,

2. µ∗(A) ≤ µ∗(B) if A ⊆ B,

3. µ∗(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ µ∗(Ai).
Outer measures arise from the following construction. Let E ⊆ P(X) be a collection of sets with ∅, X ∈ E, and let µ0 : E → [0, ∞] be any function with µ0(∅) = 0. Define

µ∗(A) := inf { ∑_{i=1}^∞ µ0(Ei) : Ei ∈ E, A ⊆ ⋃_{i=1}^∞ Ei }.

Then µ∗ is an outer measure. We check the three parts of the definition.

1. µ∗(∅) = 0, since we can take every Ei = ∅.

2. If A ⊆ B, then B ⊆ ⋃_{i=1}^∞ Ei implies A ⊆ ⋃_{i=1}^∞ Ei, so every admissible cover of B is an admissible cover of A. So

µ∗(A) = inf { ∑_{i=1}^∞ µ0(Ei) : Ei ∈ E, A ⊆ ⋃_{i=1}^∞ Ei } ≤ inf { ∑_{i=1}^∞ µ0(Ei) : Ei ∈ E, B ⊆ ⋃_{i=1}^∞ Ei } = µ∗(B).
3. Let ε > 0. For each Ai, choose Ei,j ∈ E for each j ∈ N such that Ai ⊆ ⋃_{j=1}^∞ Ei,j and ∑_{j=1}^∞ µ0(Ei,j) ≤ µ∗(Ai) + ε2^{−i}. Then ⋃_{i=1}^∞ Ai ⊆ ⋃_{i,j≥1} Ei,j, and

µ∗(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ ∑_{j=1}^∞ µ0(Ei,j) ≤ ∑_{i=1}^∞ [µ∗(Ai) + ε2^{−i}] = ∑_{i=1}^∞ µ∗(Ai) + ε.

This holds for every ε > 0, so µ∗(⋃_{i=1}^∞ Ai) ≤ ∑_{i=1}^∞ µ∗(Ai).
The verification of the last part of the definition uses a very valuable11 technique: if you want to establish inequalities (or equalities) involving an infimum or supremum, consider an element that almost achieves the infimum (or supremum) but misses it by at most ε. If you're wondering why an inequality holds (and are struggling to prove it), sometimes the ε comes in and solves everything like magic. In cases like this, taking a step back and viewing the problem from a vaguer viewpoint of “things approximating other things” should provide you with the intuition you were missing.
What kind of function is µ0 ? Defining an outer measure out of any function may
lead to issues when we try to make the outer measure into a measure. The following
example provides some intuition for what nice properties we need.
Example 6.1. Let E be the set of finite unions of half-open rectangles in R2; that is, E = {⋃_{i=1}^n (ai, bi] × (ci, di] : ai, bi, ci, di ∈ R}. Define the function µ0 : E → [0, ∞] by µ0((a, b] × (c, d]) = (b − a)(d − c). So µ0 just gives the area of a rectangle. For unions of disjoint rectangles, add the values of µ0 on the different parts; and if two rectangles intersect, we can split the union into several disjoint half-open rectangles.12
We want to make an “outer area” outer measure µ∗ that will behave nicely on
complicated sets. What property of µ0 makes it possible to determine the area of a
complicated region?
A “good covering” of a complicated subset of R2 will probably consist of countably many tiny rectangles, so as not to overestimate the area of the region by too much. We have built countable additivity into the function µ0, so to approximate the area of the region, we add up the areas of countably many disjoint rectangles in our covering. This countable additivity condition is the condition we need.

11 It's also just super cool.

12 We don't use closed rectangles because you can't split the union of two intersecting closed rectangles into disjoint closed rectangles. The boundaries of the rectangles end up intersecting.
When we define µ0 , we don’t necessarily have a σ-algebra, but we should still be
able to talk about how much measure we want to assign to complements of sets and
unions of sets we are already dealing with. This is a restriction on E.
Definition 6.2. An algebra13 (or field) of subsets of X is a nonempty collection
closed under complements and finite unions.
This is like a σ-algebra but without closure under countable unions. Defining
µ0 on such a collection is generally much easier than doing so on a σ-field. In fact,
the whole construction of outer measure could be thought of as extending a measure
from an algebra to a σ-algebra.
Example 6.2. Let X be a metric space (or a topological space), and let E be the collection of subsets of X that are both open and closed (the clopen sets). Then E is an algebra.
We can now explicitly state what µ0 should be.
Definition 6.3. Let E be an algebra. Then a premeasure µ0 : E → [0, ∞] is a
function such that
1. µ0 (∅) = 0,
2. If (Ei)i∈N are disjoint, Ei ∈ E for each i, and ⋃_{i=1}^∞ Ei ∈ E, then µ0(⋃_{i=1}^∞ Ei) = ∑_{i=1}^∞ µ0(Ei).
Note that countable additivity implies finite additivity by setting all but finitely
many Ei equal to ∅. Since algebras are closed under finite unions, finite additivity
always holds for premeasures.
Exercise 6.1. Let µ0 : E → [0, ∞] be a premeasure, and let µ∗ be the outer measure
constructed from µ0 . Show that µ∗ |E = µ0 .
Now that we have premeasures and outer measures, we can finally construct
measures.14 The next definition essentially does the work for us. It defines sets
whose outer and “inner” measures are the same.
13 This is not to be confused with an algebra or field in the abstract algebraic sense. The terminology can be unclear, but it has historic origins in relations to actual algebras (in the abstract algebra sense).

14 Premeasures are not important to remember in detail; they are essentially an artifact of this construction. Outer measures, by contrast, are still useful when you can't guarantee the measurability of a set.
Definition 6.4. Let µ∗ be an outer measure on X, and let A ⊆ X. Then A is µ∗-measurable if for all E ⊆ X,
µ∗ (E) − µ∗ (E ∩ Ac ) = µ∗ (E ∩ A).
In the example of outer area, this says that the outer and inner areas of A are equal.15
Note that the inequality

µ∗(E) ≤ µ∗(E ∩ A) + µ∗(E ∩ Ac)

always holds by the subadditivity of outer measure, since E = (E ∩ A) ∪ (E ∩ Ac). So A is µ∗-measurable exactly when

µ∗(E) ≥ µ∗(E ∩ A) + µ∗(E ∩ Ac)

for every E ⊆ X.

Theorem (Carathéodory). Let µ∗ be an outer measure on X, and let A be the collection of µ∗-measurable subsets of X. Then A is a σ-algebra, and µ := µ∗|A is a measure.
Proof. The collection A contains ∅, and it is closed under complements since the
definition of µ∗ -measurability is symmetric in A and Ac . So to prove that A is a
σ-algebra, we need to show that it is closed under countable unions.
We first show that A is closed under finite unions. Let A, B ∈ A. Then, for
E ⊆ X,
µ∗ (E) = µ∗ (E ∩ A) + µ∗ (E ∩ Ac )
= (µ∗ (E ∩ A ∩ B) + µ∗ (E ∩ A ∩ B c )) + (µ∗ (E ∩ Ac ∩ B) + µ∗ (E ∩ Ac ∩ B c ))
The first three terms partition the set E∩(A∪B). The last set is equal to E∩(A∪B)c .
So by the subadditivity of outer measure,
≥ µ∗ (E ∩ (A ∪ B)) + µ∗ (E ∩ (A ∪ B)c ).
The reverse inequality follows from the subadditivity of outer measure, and we get
that A ∪ B is µ∗ -measurable. Now note that A1 ∪ · · · ∪ An = (A1 ∪ · · · ∪ An−1 ) ∪ An .
So by induction on the number of sets in the union, A is closed under finite unions.
We now extend to countable unions. Let An be µ∗-measurable for each n, and let Bn = An \ ⋃_{ℓ=1}^{n−1} Aℓ (the set of new elements in An). Then A := ⋃_{n=1}^∞ An = ⋃_{n=1}^∞ Bn, and

µ∗(E) = µ∗(E ∩ ⋃_{ℓ=1}^n Bℓ) + µ∗(E ∩ (⋃_{ℓ=1}^n Bℓ)c).

Since (⋃_{ℓ=1}^n Bℓ)c ⊇ Ac, we can use monotonicity to get

µ∗(E) ≥ µ∗(E ∩ ⋃_{ℓ=1}^n Bℓ) + µ∗(E ∩ Ac) = ∑_{ℓ=1}^n µ∗(E ∩ Bℓ) + µ∗(E ∩ Ac).
Only the right hand side depends on n, so we may let n → ∞ on the right. We get

µ∗(E) ≥ ∑_{ℓ=1}^∞ µ∗(E ∩ Bℓ) + µ∗(E ∩ Ac).

By subadditivity, since E ∩ A = ⋃_{ℓ=1}^∞ (E ∩ Bℓ), this is

≥ µ∗(E ∩ A) + µ∗(E ∩ Ac).
As before, the reverse inequality is given by the subadditivity of µ∗, so A = ⋃_{n=1}^∞ An is µ∗-measurable. So we have shown that the collection of µ∗-measurable sets is a σ-algebra.
Note that µ∗(∅) = 0 by definition, so to show that µ := µ∗|A is a measure, we need only show that µ is countably additive on disjoint sets. Let (Bℓ)ℓ∈N be disjoint µ∗-measurable sets. Recall that in proving closure under countable unions of µ∗-measurable sets, we had the inequality µ∗(E) ≥ ∑_{ℓ=1}^∞ µ∗(E ∩ Bℓ) + µ∗(E ∩ (⋃_{ℓ=1}^∞ Bℓ)c) when the Bℓ are disjoint. We actually showed that this is an equality, since the right hand side is sandwiched between µ∗(E) and µ∗(E ∩ ⋃_{ℓ=1}^∞ Bℓ) + µ∗(E ∩ (⋃_{ℓ=1}^∞ Bℓ)c). This holds for any E ⊆ X, so setting E = ⋃_{ℓ=1}^∞ Bℓ gives us

µ(⋃_{ℓ=1}^∞ Bℓ) = µ∗(⋃_{ℓ=1}^∞ Bℓ) = ∑_{ℓ=1}^∞ µ∗(Bℓ ∩ ⋃_{k=1}^∞ Bk) + µ∗(∅) = ∑_{ℓ=1}^∞ µ(Bℓ).
Theorem (Carathéodory extension). Let A ⊆ P(X) be an algebra, and let µ0 : A → [0, ∞] be a premeasure. Then there is a measure µ, defined on a σ-algebra containing σ(A), such that µ|A = µ0.

Proof. Let µ∗ be the outer measure constructed from µ0, and let µ be the measure constructed from µ∗. The exercise about premeasures shows that µ∗|A = µ0. Since every A ∈ A is µ∗-measurable, the domain of µ contains A, and hence contains σ(A).
Example 6.3. Let X be a random variable with point mass distribution at 0. Then
the cdf of X is

F(x) = P(X ≤ x) = { 0 if x < 0,  1 if x ≥ 0 }.
This is called the Heaviside step function. It is not continuous, but it is right-
continuous.
Example 6.4. Let X be a random variable with Uniform [0, 1] distribution. Then
the cdf of X is

F(x) = { 0 if x < 0,  x if 0 ≤ x ≤ 1,  1 if x > 1 }.
Exercise 6.2. Let F be a cdf. Show that limx→−∞ F (x) = 0 and limx→∞ F (x) = 1.
Recall that the distribution of X is the push-forward measure of the probability
measure P; that is, the distribution is a measure on R that measures the probability
of X taking a value in a given subset of R. This is the same idea, except instead of
encoding the information as a measure, we treat the distribution as a function on R.
We will see that these are indeed the same concept.
Why is the cdf of a real-valued random variable important? Consider a random
variable that takes a value x if some certain event occurs at time x. Then the cdf
measures the probability that the event has already happened by time x, and 1−F (x)
measures the probability that the event will happen after time x.
From an analytic perspective, the study of cumulative distribution functions is
the study of nondecreasing right-continuous functions (limx→a+ F (x) = F (a) for each
a ∈ R). These are central to the idea of Riemann-Stieltjes integration, which will
be a special case of the powerful theory of Lebesgue integration we will develop. As
with their probability measure counterparts, these functions are closely related to
measures on the real line.
We have constructed distribution functions from measures on R. Let us now do the reverse, using the outer measure construction.

Lemma 6.1. Let F : R → R be nondecreasing and right-continuous. For disjoint half-open intervals (ai, bi], let

µ0(⋃_{i=1}^n (ai, bi]) = ∑_{i=1}^n (F(bi) − F(ai)),

and let µ0(∅) = 0. Then µ0 is a premeasure on the algebra of finite unions of half-open intervals.
The lemma lets us go from distribution functions back to measures: if F : R → R is nondecreasing and right-continuous, then there is a measure µF on B with µF((a, b]) = F(b) − F(a) for all a ≤ b; conversely, if µ is a finite measure on (R, B), then F(x) := µ((−∞, x]) is nondecreasing and right-continuous.
Proof. By the lemma, we can construct the measure µF from the premeasure µ0. Since µ0 is defined on a collection containing all half-open intervals (a, b], µF is defined on all half-open intervals. These sets generate B, so since µF is defined on a σ-algebra containing these sets, µF is defined on at least all of B. Since µF agrees with the premeasure µ0, we have µF((a, b]) = F(b) − F(a).
Given a finite µ, to show that F is nondecreasing, use the monotonicity of µ. If
a ≤ b,
F (a) = µ((−∞, a]) ≤ µ((−∞, b]) = F (b).
To show that F is right continuous, use continuity from above.
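Concretely, once µF has been constructed, the measure of a half-open interval is read off directly from the cdf. A trivial sketch with the two cdfs from Examples 6.3 and 6.4 (the function names are mine):

```python
# mu_F((a, b]) = F(b) - F(a) for a nondecreasing, right-continuous F.
def mu_F(F, a, b):
    return F(b) - F(a)

heaviside = lambda x: 1.0 if x >= 0 else 0.0    # cdf of the point mass at 0
uniform_cdf = lambda x: min(1.0, max(0.0, x))   # cdf of Uniform[0, 1]

print(mu_F(heaviside, -1, 1))       # 1.0: the atom at 0 lies in (-1, 1]
print(mu_F(heaviside, 0, 1))        # 0.0: (0, 1] misses the atom
print(mu_F(uniform_cdf, 0.2, 0.5))  # 0.3 (up to floating point)
```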
We now turn our attention to perhaps the most important (or at least the most
frequently used) measure: the measure of length on the real line.
Lebesgue measure λ is the measure µF obtained from the distribution function F(x) = x, so that λ((a, b]) = b − a for all a ≤ b.
Proposition 6.3. Let λ be Lebesgue measure. Then
1. λ({x}) = 0 for x ∈ R.
2. If A is countable, λ(A) = 0.
3. For a, b ∈ R with a ≤ b, λ((a, b)) = λ([a, b]) = b − a.
Proof. Properties 2 and 3 follow from the first.
1. Using continuity from above, since {x} = ⋂_{n=1}^∞ (x − 1/n, x], we have

λ({x}) = lim_{n→∞} λ((x − 1/n, x]) = lim_{n→∞} 1/n = 0.
2. If A is countable, then A = ⋃_{x∈A} {x} is a countable union. So by countable additivity,

λ(A) = ∑_{x∈A} λ({x}) = ∑_{x∈A} 0 = 0.
Corollary 6.2. λ(Q) = 0.
Take a moment to step back and realize how remarkable this is. The rational
numbers are infinite and even dense in R, yet our canonical measure on R assigns
them zero measure! Even though the Borel σ-algebra is defined via topological
properties of R, Lebesgue measure does not always characterize size in the same way
as the topology does.
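The covering definition of outer measure makes this concrete: enumerate the rationals in [0, 1] and cover the n-th one by an interval of length ε/2^n, so all of Q ∩ [0, 1] is covered by intervals of total length at most ε. A small sketch of this bookkeeping (the helper name and the particular enumeration are my own):

```python
from fractions import Fraction

def cover_rationals_in_unit_interval(eps, max_denominator):
    """Enumerate the rationals p/q in [0, 1] with q <= max_denominator and cover
    the n-th one by an interval of length eps / 2**n; the total length of the
    cover stays below eps no matter how far the enumeration is carried."""
    rationals = sorted({Fraction(p, q)
                        for q in range(1, max_denominator + 1)
                        for p in range(0, q + 1)})
    total = Fraction(0)
    for n, r in enumerate(rationals, start=1):
        length = eps / 2**n        # cover r by an interval of this length
        total += length
    return len(rationals), float(total)

print(cover_rationals_in_unit_interval(Fraction(1, 100), 50))
# the reported total covering length is below eps = 0.01
```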
Exercise 6.5. Show that for every ε > 0, there is an open, dense subset A ⊆ R such
that λ(A) < ε. Conclude that an open, dense subset of R need not be R.19
Lebesgue measure is invariant under translations and scales with dilations. The
former of these properties is an example of a more general property of translation-
invariant measures (called Haar measure) on Abelian groups.20
19 This is a really interesting problem, especially since it's so counterintuitive. Many students incorrectly believe that open, dense subsets must be the whole space.

20 The interplay between the algebraic structure and the measure-theoretic properties is very rich. One of my research interests is exactly this kind of thing: probability on algebraic structures.
Proposition 6.4. For r ∈ R, let A + r := {a + r : a ∈ A} and rA := {ra : a ∈ A}.
If A ∈ B, then
1. A + r, rA ∈ B.
2. λ(A + r) = λ(A).
3. λ(rA) = |r|λ(A).