A COURSE IN PROBABILITY THEORY
THIRD EDITION
CONTENTS

1 Distribution function
1.1 Monotone functions
1.2 Distribution functions
1.3 Absolutely continuous and singular distributions

2 Measure theory
2.1 Classes of sets
2.2 Probability measures and their distribution functions

4 Convergence concepts
4.1 Various modes of convergence
4.2 Almost sure convergence; Borel–Cantelli lemma

6 Characteristic function
6.1 General properties; convolutions
6.2 Uniqueness and inversion
6.3 Convergence theorems
6.4 Simple applications
6.5 Representation theorems
6.6 Multidimensional case; Laplace transforms
Bibliographical Note

8 Random walk
8.1 Zero-or-one laws
8.2 Basic notions
8.3 Recurrence
8.4 Fine structure
8.5 Continuation
Bibliographical Note

Index
Preface to the third edition
galley proofs were read by David Kreps and myself independently, and it was
fun to compare scores and see who missed what. But since not all parts of
the old text have undergone the same scrutiny, readers of the new edition
are cordially invited to continue the fault-finding. Martha Kirtley and Joan
Shepard typed portions of the new material. Gail Lemmond took charge of
the final page-by-page revamping and it was through her loving care that the
revision was completed on schedule.
In the third printing a number of misprints and mistakes, mostly minor, are
corrected. I am indebted to the following persons for some of these corrections:
Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward
Korn, Pierre van Moerbeke, David Siegmund.
In the fourth printing, an oversight in the proof of Theorem 6.3.1 is
corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification
made in (VII) of Section 9.5. A number of minor misprints are also corrected. I
am indebted to several readers, including Asmussen, Robert, Schatte, Whitley
and Yannaros, who wrote me about the text.
Preface to the first edition
best seen in the advanced study of stochastic processes, but will already be
abundantly clear from the contents of a general introduction such as this book.
Although many notions of probability theory arise from concrete models
in applied sciences, recalling such familiar objects as coins and dice, genes
and particles, a basic mathematical text (as this pretends to be) can no longer
indulge in diverse applications, just as nowadays a course in real variables
cannot delve into the vibrations of strings or the conduction of heat. Inciden-
tally, merely borrowing the jargon from another branch of science without
treating its genuine problems does not aid in the understanding of concepts or
the mastery of techniques.
A final disclaimer: this book is not the prelude to something else and does
not lead down a strait and righteous path to any unique fundamental goal.
Fortunately nothing in the theory deserves such single-minded devotion, as
apparently happens in certain other fields of mathematics. Quite the contrary,
a basic course in probability should offer a broad perspective of the open field
and prepare the student for various further possibilities of study and research.
To this aim he must acquire knowledge of ideas and practice in methods, and
dwell with them long and deeply enough to reap the benefits.
A brief description will now be given of the nine chapters, with some
suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A
synopsis of the requisite “measure and integration” is given in Chapter 2,
together with certain supplements essential to probability theory. Chapter 1 is
really a review of elementary real variables; although it is somewhat expend-
able, a reader with adequate background should be able to cover it swiftly and
confidently — with something gained from the effort. For class instruction it
may be advisable to begin the course with Chapter 2 and fill in from Chapter 1
as the occasions arise. Chapter 3 is the true introduction to the language and
framework of probability theory, but I have restricted its content to what is
crucial and feasible at this stage, relegating certain important extensions, such
as shifting and conditioning, to Chapters 8 and 9. This is done to avoid over-
loading the chapter with definitions and generalities that would be meaningless
without frequent application. Chapter 4 may be regarded as an assembly of
notions and techniques of real function theory adapted to the usage of proba-
bility. Thus, Chapter 5 is the first place where the reader encounters bona fide
theorems in the field. The famous landmarks shown there serve also to intro-
duce the ways and means peculiar to the subject. Chapter 6 develops some of
the chief analytical weapons, namely Fourier and Laplace transforms, needed
for challenges old and new. Quick testing grounds are provided, but for major
battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been
called the “central problem” of classical probability theory. Time has marched
on and the center of the stage has shifted, but this topic remains without
doubt a crowning achievement. In Chapters 8 and 9 two different aspects of
1 Distribution function
In general, we say that the function f has a jump at x iff the two limits in (2) both exist but are unequal. The value of f at x itself, viz. f(x), may be arbitrary, but for an increasing f the relation (3) must hold. As a consequence of (i) and (ii), we have the next result.

Example 1. Let x₀ be an arbitrary real number and define

$$f(x) = \begin{cases} 0, & x \le x_0 - 1; \\ 1 - \dfrac{1}{n}, & x_0 - \dfrac{1}{n} \le x < x_0 - \dfrac{1}{n+1},\ n = 1, 2, \ldots; \\ 1, & x \ge x_0. \end{cases}$$

The point x₀ is a point of accumulation of the points of jump {x₀ − 1/n, n ≥ 1}, but f is continuous at x₀.
Before we discuss the next example, let us introduce a notation that will be used throughout the book. For any real number t, we set

$$\delta_t(x) = \begin{cases} 0, & x < t, \\ 1, & x \ge t. \end{cases} \tag{4}$$

We shall call the function δ_t the point mass at t.
Example 2. Let {a_n, n ≥ 1} be any given enumeration of the set of all rational numbers, and let {b_n, n ≥ 1} be a set of positive (> 0) numbers such that Σ_{n=1}^∞ b_n < ∞. For instance, we may take b_n = 2^{−n}. Consider now

$$f(x) = \sum_{n=1}^{\infty} b_n \,\delta_{a_n}(x). \tag{5}$$

Since 0 ≤ δ_{a_n}(x) ≤ 1 for every n and x, the series in (5) is absolutely and uniformly convergent. Since each δ_{a_n} is increasing, it follows that if x₁ < x₂,

$$f(x_2) - f(x_1) = \sum_{n=1}^{\infty} b_n [\delta_{a_n}(x_2) - \delta_{a_n}(x_1)] \ge 0,$$

so that f is increasing. Letting x₁ ↑ x with x₂ = x, the uniform convergence permits a term-by-term passage to the limit, which yields

$$f(x) - f(x-) = \sum_{n=1}^{\infty} b_n [\delta_{a_n}(x) - \delta_{a_n}(x-)]. \tag{6}$$

But for each n, the number in the square brackets above is 0 or 1 according as x ≠ a_n or x = a_n. Hence if x is different from all the a_n's, each term on the right side of (6) vanishes; on the other hand if x = a_k, say, then exactly one term, that corresponding to n = k, does not vanish and yields the value b_k for the whole series. This proves that the function f has jumps at all the rational points and nowhere else.

This example shows that the set of points of jump of an increasing function may be everywhere dense; in fact the set of rational numbers in the example may be replaced by an arbitrary countable set without any change of the argument.
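The mechanism of Example 2 is easy to check numerically. The following Python sketch (an illustration added here, not part of the original text; the finite enumeration of rationals and the truncation of the series are arbitrary choices) evaluates a truncated version of (5) with b_n = 2^{−n} and verifies that the jump of f at a_k equals b_k.

```python
from fractions import Fraction

# One (arbitrary) finite enumeration of distinct rationals in [0, 1];
# in the text, {a_n} runs over all rationals.
seen, a = set(), []
for q in range(1, 6):
    for p in range(q + 1):
        r = Fraction(p, q)
        if r not in seen:
            seen.add(r)
            a.append(r)
b = [Fraction(1, 2 ** (n + 1)) for n in range(len(a))]  # b_n = 2^{-n}, sum < 1

def f(x):
    # truncated f(x) = sum_n b_n * delta_{a_n}(x), with delta_t as in (4)
    return sum(bn for an, bn in zip(a, b) if x >= an)

# The jump f(a_k) - f(a_k-) equals b_k; exact rational arithmetic avoids rounding.
eps = Fraction(1, 10 ** 9)
for k in (0, 3, 7):
    print(a[k], f(a[k]) - f(a[k] - eps), b[k])
```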
It follows that the two intervals I_x and I_{x′} are disjoint, though they may abut on each other if f(x+) = f(x′−). Thus we may associate with the set of
points of jump in the domain of f a certain collection of pairwise disjoint open
intervals in the range of f. Now any such collection is necessarily a countable
one, since each interval contains a rational number, so that the collection of
intervals is in one-to-one correspondence with a certain subset of the rational
numbers and the latter is countable. Therefore the set of discontinuities is also
countable, since it is in one-to-one correspondence with the set of intervals
associated with it.
(v) Let f₁ and f₂ be two increasing functions and D a set that is (everywhere) dense in (−∞, +∞). Suppose that

$$\forall x \in D:\ f_1(x) = f_2(x).$$

Then f₁ and f₂ have the same points of jump of the same size, and they coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let t_n ∈ D, t_n′ ∈ D, t_n ↑ x, t_n′ ↓ x. Such sequences exist since D is dense. It follows from (i) that

$$f_1(x-) = \lim_n f_1(t_n) = \lim_n f_2(t_n) = f_2(x-), \qquad f_1(x+) = \lim_n f_1(t_n') = \lim_n f_2(t_n') = f_2(x+).$$

In particular

$$\forall x:\ f_1(x+) - f_1(x-) = f_2(x+) - f_2(x-).$$

The first assertion in (v) follows from this equation and (ii). Furthermore if f₁ is continuous at x, then so is f₂ by what has just been proved, and we have

$$f_1(x) = f_1(x-) = f_2(x-) = f_2(x),$$

proving the second assertion.

At a point of jump x we may modify the value of an increasing f in one of three ways:

$$\hat f(x) = f(x-), \qquad \hat f(x) = f(x+), \qquad \hat f(x) = \frac{f(x-) + f(x+)}{2},$$

and use one of these instead of the original one. The third modification is found to be convenient in Fourier analysis, but either one of the first two is more suitable for probability theory. We have a free choice between them and we shall choose the second, namely, right continuity.
(vi) If we put

$$\forall x:\ \hat f(x) = f(x+),$$

then f̂ is increasing and right continuous everywhere.

This is indeed true for any f such that f(t+) exists for every t. For then: given any ε > 0, there exists δ > 0 such that

$$\forall s \in (x, x + \delta):\ |f(s) - f(x+)| \le \varepsilon.$$

Let t ∈ (x, x + δ) and let s ↓ t in the above; then we obtain

$$|f(t+) - f(x+)| \le \varepsilon,$$

which proves the right continuity of f̂ at x.
Let D be dense in (−∞, +∞), and suppose that f is a function with the
domain D. We may speak of the monotonicity, continuity, uniform continuity,
EXERCISES
$$\tilde f(x) = \frac{f(x) - f(-\infty)}{f(+\infty) - f(-\infty)}, \tag{1}$$

which is bounded and increasing with

$$\tilde f(-\infty) = 0, \qquad \tilde f(+\infty) = 1. \tag{2}$$
$$F(a_j) - F(a_j-) = b_j.$$

Consider the function

$$F_d(x) = \sum_{a_j \le x} b_j,$$

which represents the sum of all the jumps of F in the half-line (−∞, x]. It is clearly increasing, right continuous, with

$$F_d(-\infty) = 0, \qquad F_d(+\infty) = \sum_j b_j \le 1. \tag{3}$$

Now put F_c(x) = F(x) − F_d(x). Since F and F_d have the same jump at every point, we obtain

$$F_c(x-) = F_c(x).$$

This shows that F_c is left continuous; since it is also right continuous, being the difference of two such functions, it is continuous.
Theorem 1.2.2. Let F be a d.f. Suppose that there exist a continuous function G_c and a function G_d of the form

$$G_d(x) = \sum_j b_j' \,\delta_{a_j'}(x)$$

[where {a_j′} is a countable set of real numbers and Σ_j |b_j′| < ∞], such that

$$F = G_c + G_d;$$

then

$$G_c = F_c, \qquad G_d = F_d,$$

where F_c and F_d are defined as before.

PROOF. If F_d ≠ G_d, then either the sets {a_j} and {a_j′} are not identical, or we may relabel the a_j′ so that a_j′ = a_j for all j but b_j′ ≠ b_j for some j. In either case we have for at least one j, and ã = a_j or a_j′:

$$[F_d(\tilde a) - F_d(\tilde a-)] - [G_d(\tilde a) - G_d(\tilde a-)] \ne 0.$$

Since F_c − G_c = G_d − F_d, this means that the continuous function F_c − G_c has a jump at ã, which is impossible.
EXERCISES
unless x is a point of jump of F, in which case the limit is equal to the size
of the jump.
2. Let F be a d.f. with points of jump {a_j}. Prove that the sum

$$\sum_{x - \varepsilon < a_j \le x} [F(a_j) - F(a_j-)]$$

converges to zero as ε ↓ 0, for every x.
7. Prove that the support of any d.f. is a closed set, and the support of
any continuous d.f. is a perfect set.
It follows from a well-known proposition (see, e.g., Natanson [3]) that such a function F has a derivative equal to f a.e. In particular, if F is a d.f., then

$$f \ge 0 \text{ a.e.} \quad\text{and}\quad \int_{-\infty}^{\infty} f(t) \, dt = 1. \tag{2}$$

The next theorem summarizes some basic facts of real function theory; see, e.g., Natanson [3].
(a) If S denotes the set of all x for which F′(x) exists with 0 ≤ F′(x) < ∞, then m(Sᶜ) = 0.

(b) This F′ belongs to L¹, and we have for every x < x′:

$$\int_x^{x'} F'(t) \, dt \le F(x') - F(x). \tag{4}$$

(c) Put

$$\forall x:\ F_{ac}(x) = \int_{-\infty}^{x} F'(t) \, dt, \qquad F_s(x) = F(x) - F_{ac}(x). \tag{5}$$

It is clear that F_ac is increasing and F_ac ≤ F. From (4) it follows that if x < x′:

$$F_s(x') - F_s(x) = F(x') - F(x) - \int_x^{x'} F'(t) \, dt \ge 0.$$
EXERCISES
$$1 + 2 + \cdots + 2^{n-1} = 2^n - 1$$

disjoint open intervals and are left with 2ⁿ disjoint closed intervals each of length 1/3ⁿ. Let these removed ones, in order of position from left to right, be denoted by J_{n,k}, 1 ≤ k ≤ 2ⁿ − 1, and their union by U_n. We have

$$m(U_n) = \frac{1}{3} + \frac{2}{3^2} + \frac{4}{3^3} + \cdots + \frac{2^{n-1}}{3^n} = 1 - \left(\frac{2}{3}\right)^n.$$

As n ↑ ∞, U_n increases to an open set U; the complement C of U with respect to [0, 1] is a perfect set, called the Cantor set. It is of measure zero since

$$m(C) = 1 - m(U) = 1 - 1 = 0.$$
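As a numerical aside (a sketch added here, not in the original text), the identity m(U_n) = 1 − (2/3)ⁿ can be checked by carrying out the middle-third removals directly:

```python
# Remove open middle thirds starting from [0, 1]; after n steps the removed
# set U_n should have measure 1 - (2/3)^n.
def removed_length(n):
    closed = [(0.0, 1.0)]   # the 2^k closed intervals remaining after k steps
    removed = 0.0
    for _ in range(n):
        nxt = []
        for a, b in closed:
            third = (b - a) / 3.0
            removed += third                 # measure of the removed middle third
            nxt.append((a, a + third))
            nxt.append((b - third, b))
        closed = nxt
    return removed

for n in (1, 2, 5, 10):
    print(n, removed_length(n), 1 - (2 / 3) ** n)
```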
This definition is consistent since two intervals, J_{n,k} and J_{n′,k′}, are either disjoint or identical, and in the latter case so are c_{n,k} = c_{n′,k′}. The value of F is constant on each J_{n,k} and is strictly greater on any other J_{n′,k′} situated to the right of J_{n,k}. Thus F is increasing and clearly we have 0 ≤ F ≤ 1.
EXERCISES
11. Calculate

$$\int_0^1 x \, dF(x), \qquad \int_0^1 x^2 \, dF(x), \qquad \int_0^1 e^{itx} \, dF(x).$$
[HINT: This can be done directly or by using Exercise 10; for a third method
see Exercise 9 of Sec. 5.3.]
12. Extend the function F on [0, 1] trivially to (−∞, ∞). Let {r_n} be an enumeration of the rationals and

$$G(x) = \sum_{n=1}^{\infty} \frac{1}{2^n} F(r_n + x).$$

Show that G is a d.f. that is strictly increasing for all x and singular. Thus we have a singular d.f. with support (−∞, ∞).
13. Consider F on [0, 1]. Modify its inverse F⁻¹ suitably to make it single-valued in [0, 1]. Show that F⁻¹ so modified is a discrete d.f. and find its points of jump and their sizes.
14. Given any closed set C in (−∞, +∞), there exists a d.f. whose support is exactly C. [HINT: Such a problem becomes easier when the corresponding measure is considered; see Sec. 2.2 below.]
15. The Cantor d.f. F is a good building block of “pathological” examples. For example, let H be the inverse of the homeomorphic map of [0, 1] onto itself: x → ½[F(x) + x]; and E a subset of [0, 1] which is not Lebesgue measurable. Show that

$$1_{H(E)} \circ H = 1_E$$
(i) E ∈ 𝒜 ⇒ Eᶜ ∈ 𝒜.
(ii) E₁ ∈ 𝒜, E₂ ∈ 𝒜 ⇒ E₁ ∪ E₂ ∈ 𝒜.
(iii) E₁ ∈ 𝒜, E₂ ∈ 𝒜 ⇒ E₁ ∩ E₂ ∈ 𝒜.
(iv) ∀n ≥ 2: E_j ∈ 𝒜, 1 ≤ j ≤ n ⇒ ⋃_{j=1}^{n} E_j ∈ 𝒜.
(v) ∀n ≥ 2: E_j ∈ 𝒜, 1 ≤ j ≤ n ⇒ ⋂_{j=1}^{n} E_j ∈ 𝒜.
(vi) E_j ∈ 𝒜, E_j ⊂ E_{j+1}, 1 ≤ j < ∞ ⇒ ⋃_{j=1}^{∞} E_j ∈ 𝒜.
(vii) E_j ∈ 𝒜, E_j ⊃ E_{j+1}, 1 ≤ j < ∞ ⇒ ⋂_{j=1}^{∞} E_j ∈ 𝒜.
(viii) E_j ∈ 𝒜, 1 ≤ j < ∞ ⇒ ⋃_{j=1}^{∞} E_j ∈ 𝒜.
(ix) E_j ∈ 𝒜, 1 ≤ j < ∞ ⇒ ⋂_{j=1}^{∞} E_j ∈ 𝒜.
(x) E₁ ∈ 𝒜, E₂ ∈ 𝒜, E₁ ⊂ E₂ ⇒ E₂ \ E₁ ∈ 𝒜.

It follows from simple set algebra that under (i): (ii) and (iii) are equivalent; (vi) and (vii) are equivalent; (viii) and (ix) are equivalent. Also, (ii) implies (iv) and (iii) implies (v) by induction. It is trivial that (viii) implies (ii) and (vi); (ix) implies (iii) and (vii).
$$F_n = \bigcup_{j=1}^{n} E_j \in \mathscr{A};$$

hence ⋃_{j=1}^{∞} E_j ∈ 𝒜 by (vi).
The collection 𝒮 of all subsets of Ω is a B.F. called the total B.F.; the collection of the two sets {∅, Ω} is a B.F. called the trivial B.F. If A is any index set and if for every α ∈ A, ℱ_α is a B.F. (or M.C.), then the intersection ⋂_{α∈A} ℱ_α of all these B.F.'s (or M.C.'s), namely the collection of sets each of which belongs to all ℱ_α, is also a B.F. (or M.C.). Given any nonempty collection 𝒞 of sets, there is a minimal B.F. (or field, or M.C.) containing it; this is just the intersection of all B.F.'s (or fields, or M.C.'s) containing 𝒞, of which there is at least one, namely the 𝒮 mentioned above. This minimal B.F. (or field, or M.C.) is also said to be generated by 𝒞. In particular if ℱ₀ is a field there is a minimal B.F. (or M.C.) containing ℱ₀.
The identities

$$F \cap \left( \bigcup_{j=1}^{\infty} E_j \right) = \bigcup_{j=1}^{\infty} (F \cap E_j),$$

$$F \cap \left( \bigcap_{j=1}^{\infty} E_j \right) = \bigcap_{j=1}^{\infty} (F \cap E_j)$$

show that both 𝒞₁ and 𝒞₂ are M.C.'s. Since ℱ₀ is closed under intersection and contained in 𝒢, it is clear that ℱ₀ ⊂ 𝒞₁. Hence 𝒢 ⊂ 𝒞₁ by the minimality of 𝒢.
The theorem above is one of a type called monotone class theorems. They
are among the most useful tools of measure theory, and serve to extend certain
relations which are easily verified for a special class of sets or functions to a
larger class. Many versions of such theorems are known; see Exercises 10, 11, and 12 below.
EXERCISES
where we have arithmetical addition modulo 2 on the right side. All properties of Δ follow easily from this definition, some of which are rather tedious to verify otherwise. As examples:

$$(A \,\Delta\, B) \,\Delta\, C = A \,\Delta\, (B \,\Delta\, C),$$
$$(A \,\Delta\, B) \,\Delta\, (B \,\Delta\, C) = A \,\Delta\, C,$$
(i) ∀E ∈ ℱ: P(E) ≥ 0.
(ii) If {E_j} is a countable collection of (pairwise) disjoint sets in ℱ, then

$$P\left( \bigcup_j E_j \right) = \sum_j P(E_j).$$

(iii) P(Ω) = 1.

(iv) P(E) ≤ 1.
(v) P(∅) = 0.
(vi) P(Eᶜ) = 1 − P(E).
(vii) P(E ∪ F) + P(E ∩ F) = P(E) + P(F).
(viii) E ⊂ F ⇒ P(E) = P(F) − P(F \ E) ≤ P(F).
(ix) Monotone property. E_n ↑ E or E_n ↓ E ⇒ P(E_n) → P(E).
(x) Boole's inequality. P(⋃_j E_j) ≤ Σ_j P(E_j).
If E_n ↓ ∅, the last term is the empty set. Hence if (ii) is assumed, we have

$$\forall n \ge 1:\quad P(E_n) = \sum_{k=n}^{\infty} P(E_k \setminus E_{k+1});$$

the series being convergent, we have lim_{n→∞} P(E_n) = 0. Hence (1) is true.

Conversely, let {E_k, k ≥ 1} be pairwise disjoint; then

$$\bigcup_{k=n+1}^{\infty} E_k \downarrow \emptyset$$

and consequently, if (1) is true,

$$P\left( \bigcup_{k=1}^{\infty} E_k \right) = \lim_{n\to\infty} \sum_{k=1}^{n} P(E_k) + \lim_{n\to\infty} P\left( \bigcup_{k=n+1}^{\infty} E_k \right) = \sum_{k=1}^{\infty} P(E_k).$$
In words, we assign p_j as the value of the “probability” of the singleton {ω_j}, and for an arbitrary set of ω_j's we assign as its probability the sum of all the probabilities assigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied. Hence P so defined is a p.m.

Conversely, let any such P be given on ℱ. Since {ω_j} ∈ ℱ for every j, P({ω_j}) is defined; let its value be p_j. Then (2) is satisfied. We have thus exhibited all the possible p.m.'s on Ω, or rather on the pair (Ω, 𝒮); this will be called a discrete sample space. The entire first volume of Feller's well-known book [13] with all its rich content is based on just such spaces.
Example 3. Let R¹ = (−∞, +∞), 𝒞 the collection of intervals of the form (a, b], −∞ < a < b < +∞. The field 𝓑₀ generated by 𝒞 consists of finite unions of disjoint sets of the form (a, b], (−∞, a] or (b, ∞). The Euclidean B.F. 𝓑¹ on R¹ is the B.F. generated by 𝒞 or 𝓑₀. A set in 𝓑¹ will be called a (linear) Borel set when there is no danger of ambiguity. However, the Borel–Lebesgue measure m on R¹ is not a p.m.; indeed m(R¹) = +∞ so that m is not a finite measure but it is σ-finite on 𝓑₀, namely: there exists a sequence of sets E_n ∈ 𝓑₀, E_n ↑ R¹ with m(E_n) < ∞ for each n.
EXERCISES
1. For any countably infinite set Ω, the collection of its finite subsets and their complements forms a field ℱ. If we define P(E) on ℱ to be 0 or 1 according as E is finite or not, then P is finitely additive but not countably so.

2. Let Ω be the space of natural numbers. For each E ⊂ Ω let N_n(E) be the cardinality of the set E ∩ [0, n] and let 𝒞 be the collection of E's for which the following limit exists:

$$P(E) = \lim_{n\to\infty} \frac{N_n(E)}{n}.$$

P is finitely additive on 𝒞 and is called the “asymptotic density” of E. Let E = {all odd integers}, F = {all odd integers in [2^{2n}, 2^{2n+1}] and all even integers in [2^{2n+1}, 2^{2n+2}] for n ≥ 0}. Show that E ∈ 𝒞, F ∈ 𝒞, but E ∩ F ∉ 𝒞. Hence 𝒞 is not a field.
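The failure in Exercise 2 can be seen numerically. In the Python sketch below (an illustration, not part of the text; the half-open block convention at the powers of 2 is immaterial for densities), N_n(E)/n and N_n(F)/n settle near ½, while N_n(E ∩ F)/n oscillates near 1/3 along n = 2^{2m+1} and near 1/6 along n = 2^{2m+2}, so its limit fails to exist.

```python
def in_E(k):                     # E = all odd integers
    return k % 2 == 1

def in_F(k):                     # F: odd in [2^{2m}, 2^{2m+1}), even in [2^{2m+1}, 2^{2m+2})
    if k == 0:
        return False
    m = k.bit_length() - 1       # k lies in [2^m, 2^{m+1})
    return k % 2 == 1 if m % 2 == 0 else k % 2 == 0

def density(pred, n):            # N_n(.)/n
    return sum(pred(k) for k in range(n + 1)) / n

for n in (2 ** 17, 2 ** 18, 2 ** 19, 2 ** 20):
    print(n, round(density(in_E, n), 4), round(density(in_F, n), 4),
          round(density(lambda k: in_E(k) and in_F(k), n), 4))
```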
3. In the preceding example show that for each real number α in [0, 1] there is an E in 𝒞 such that P(E) = α. Is the set of all primes in 𝒞? Give an example of E that is not in 𝒞.

4. Prove the nonexistence of a p.m. on (Ω, 𝒮), where (Ω, 𝒮) is as in Example 1, such that the probability of each singleton has the same value. Hence criticize a sentence such as: “Choose an integer at random”.

5. Prove that the trace of a B.F. ℱ on any subset Δ of Ω is a B.F. Prove that the trace of (Ω, ℱ, P) on any Δ in ℱ is a probability space, if P(Δ) > 0.

6. Now let Δ ∉ ℱ be such that

$$\Delta \subset F \in \mathscr{F} \Rightarrow P(F) = 1.$$
$$\forall x \in R^1:\ I_x = (-\infty, x].$$

Then I_x ∈ 𝓑¹ so that μ(I_x) is defined; call it F(x) and so define the function F on R¹. We shall show that F is a d.f. as defined in Chapter 1. First of all, F is increasing by property (viii) of the measure. Next, if x_n ↓ x, then I_{x_n} ↓ I_x, hence we have by (ix)

$$F(x_n) = \mu(I_{x_n}) \to \mu(I_x) = F(x). \tag{6}$$

Hence F is right continuous. [The reader should ascertain what changes should be made if we had defined F to be left continuous.] Similarly as x ↓ −∞, I_x ↓ ∅; as x ↑ +∞, I_x ↑ R¹. Hence it follows from (ix) again that

$$\lim_{x \downarrow -\infty} F(x) = \lim_{x \downarrow -\infty} \mu(I_x) = \mu(\emptyset) = 0; \qquad \lim_{x \uparrow +\infty} F(x) = \lim_{x \uparrow +\infty} \mu(I_x) = \mu(R^1) = 1.$$

This ends the verification that F is a d.f. The relations in (5) follow easily from the following complement to (4):

$$\mu((-\infty, x)) = F(x-).$$

To see this let x_n < x and x_n ↑ x. Since I_{x_n} ↑ (−∞, x), we have by (ix):

$$\mu((-\infty, x)) = \lim_{n\to\infty} \mu(I_{x_n}) = \lim_{n\to\infty} F(x_n) = F(x-).$$
To prove the last sentence in the theorem we show first that (4) restricted to x ∈ D implies (4) unrestricted. For this purpose we note that μ((−∞, x]), as well as F(x), is right continuous as a function of x, as shown in (6). Hence the two members of the equation in (4), being both right continuous functions of x and coinciding on a dense set, must coincide everywhere. Now suppose, for example, the second relation in (5) holds for rational a and b. For each real x let a_n, b_n be rational such that a_n ↓ −∞ and b_n > x, b_n ↓ x. Then (a_n, b_n] → (−∞, x] and F(b_n) − F(a_n) → F(x). Hence (4) follows.

Incidentally, the correspondence (4) “justifies” our previous assumption that F be right continuous, but what if we had assumed it to be left continuous?

Now we proceed to the second half of the correspondence.
$$S = \bigcup_i (a_i, b_i]$$
Having thus defined the measure for all open sets, we find that its values for all closed sets are thereby also determined by property (vi) of a probability measure. In particular, its value for each singleton {a} is determined to be F(a) − F(a−), which is nothing but the jump of F at a. Now we also know its value on all countable sets, and so on — all this provided that no contradiction is ever forced on us so far as we have gone. But even with the class of open and closed sets we are still far from the B.F. 𝓑¹. The next step will be the G_δ sets and the F_σ sets, and there already the picture is not so clear. Although it has been shown to be possible to proceed this way by transfinite induction, this is a rather difficult task. There is a more efficient way to reach the goal via the notions of outer and inner measures as follows. For any subset S of R¹ consider the two numbers:
$$\mu^*(S) = \inf_{U \text{ open},\ U \supset S} \mu(U), \qquad \mu_*(S) = \sup_{C \text{ closed},\ C \subset S} \mu(C);$$

μ* is the outer measure, μ_* the inner measure (both with respect to the given F). It is clear that μ*(S) ≥ μ_*(S). Equality does not in general hold, but when it does, we call S “measurable” (with respect to F). In this case the common value will be denoted by μ(S). This new definition requires us at once to check that it agrees with the old one for all the sets for which μ has already been defined. The next task is to prove that: (a) the class of all measurable sets forms a B.F., say 𝓛; (b) on this 𝓛, the function μ is a p.m. Details of these proofs are to be found in the references given above. To finish: since 𝓛 is a B.F., and it contains all intervals of the form (a, b], it contains the minimal B.F. 𝓑¹ with this property. It may be larger than 𝓑¹, indeed it is (see below), but this causes no harm, for the restriction of μ to 𝓑¹ is a p.m. whose existence is asserted in Theorem 2.2.2.

Let us mention that the introduction of both the outer and inner measures is useful for approximations. It follows, for example, that for each measurable set S and ε > 0, there exists an open set U and a closed set C such that U ⊃ S ⊃ C and

$$\mu(U) - \varepsilon \le \mu(S) \le \mu(C) + \varepsilon.$$
where the infimum is taken over all countable unions ⋃_n U_n such that each U_n ∈ 𝓑₀ and ⋃_n U_n ⊃ E. For another case where such a construction is required see Sec. 3.3 below.
There is one more question: besides the μ discussed above is there any other p.m. ν that corresponds to the given F in the same way? It is important to realize that this question is not answered by the preceding theorem. It is also worthwhile to remark that any p.m. ν that is defined on a domain strictly containing 𝓑¹ and that coincides with μ on 𝓑¹ (such as the μ on 𝓛 as mentioned above) will certainly correspond to F in the same way, and strictly speaking such a ν is to be considered as distinct from μ. Hence we should phrase the question more precisely by considering only p.m.'s on 𝓑¹. This will be answered in full generality by the next theorem.
Theorem 2.2.3. Let μ and ν be two measures defined on the same B.F. ℱ, which is generated by the field ℱ₀. If either μ or ν is σ-finite on ℱ₀, and μ(E) = ν(E) for every E ∈ ℱ₀, then the same is true for every E ∈ ℱ, and thus μ = ν.

PROOF. We give the proof only in the case where μ and ν are both finite, leaving the rest as an exercise. Let

$$\mathscr{C} = \{E \in \mathscr{F}: \mu(E) = \nu(E)\},$$
PROOF. In order to apply the theorem, we must verify that any of the hypotheses implies that μ and ν agree on a field that generates 𝓑. Let us take intervals of the first kind and consider the field 𝓑₀ defined above. If μ and ν agree on such intervals, they must agree on 𝓑₀ by countable additivity. This finishes the proof.
Returning to Theorems 2.2.1 and 2.2.2, we can now add the following
complement, of which the first part is trivial.
PROOF. Let 𝒩 be the collection of sets that are subsets of null sets, and let ℱ̄ be the collection of subsets of Ω each of which differs from a set in ℱ by a subset of a null set. Precisely:

$$\bar{\mathscr{F}} = \{E \subset \Omega: E \,\Delta\, F \in \mathscr{N} \text{ for some } F \in \mathscr{F}\}. \tag{9}$$

It is easy to verify, using Exercise 1 of Sec. 2.1, that ℱ̄ is a B.F. Clearly it contains ℱ. For each E ∈ ℱ̄, we put

$$\bar P(E) = P(F),$$

where F is any set that satisfies the condition indicated in (9). To show that this definition does not depend on the choice of such an F, suppose that

$$E \,\Delta\, F_1 \in \mathscr{N}, \qquad E \,\Delta\, F_2 \in \mathscr{N}.$$

Then by Exercise 2 of Sec. 2.1,

$$(E \,\Delta\, F_1) \,\Delta\, (E \,\Delta\, F_2) = (F_1 \,\Delta\, F_2) \,\Delta\, (E \,\Delta\, E) = F_1 \,\Delta\, F_2.$$

Hence F₁ Δ F₂ ∈ 𝒩 and so P(F₁ Δ F₂) = 0. This implies P(F₁) = P(F₂), as was to be shown. We leave it as an exercise to show that P̄ is a measure on ℱ̄. If E ∈ ℱ, then E Δ E = ∅ ∈ 𝒩, hence P̄(E) = P(E).

Finally, it is easy to verify that if E ∈ ℱ̄ and P̄(E) = 0, then E ∈ 𝒩. Hence any subset of E also belongs to 𝒩 and so to ℱ̄. This proves that (Ω, ℱ̄, P̄) is complete.

What is the advantage of completion? Suppose that a certain property, such as the existence of a certain limit, is known to hold outside a certain set N with P(N) = 0. Then the exact set on which it fails to hold is a subset of N, not necessarily in ℱ, but will be in ℱ̄ with P̄(N) = 0. We need the measurability of the exact exceptional set to facilitate certain dispositions, such as defining or redefining a function on it; see Exercise 25 below.
EXERCISES
the support of a p.m. μ on 𝓑¹ is the same as that of its d.f., defined in Exercise 6 of Sec. 1.2.
25. Let f be measurable with respect to ℱ, and Z be contained in a null set. Define

$$\tilde f = \begin{cases} f & \text{on } Z^c, \\ K & \text{on } Z, \end{cases}$$

where K is a constant. Show that f̃ is measurable with respect to ℱ̄.
Let the probability space (Ω, ℱ, P) be given. R¹ = (−∞, +∞) the (finite) real line, R* = [−∞, +∞] the extended real line, 𝓑¹ = the Euclidean Borel field on R¹, 𝓑* = the extended Borel field. A set in 𝓑* is just a set in 𝓑¹ possibly enlarged by one or both points ±∞.

Condition (1) then states that X⁻¹ carries members of 𝓑¹ onto members of ℱ:

$$\forall B \in \mathscr{B}^1:\ X^{-1}(B) \in \mathscr{F}; \tag{2}$$

or in the briefest notation:

$$X^{-1}(\mathscr{B}^1) \subset \mathscr{F}.$$

Such a function is said to be measurable (with respect to ℱ). Thus, an r.v. is just a measurable function from Ω to R¹ (or R*).

The next proposition, a standard exercise on inverse mapping, is essential.

Theorem 3.1.2. X is an r.v. if and only if for each real number x, or each real number x in a dense subset of R¹, we have

$$\{\omega: X(\omega) \le x\} \in \mathscr{F}.$$

PROOF. The preceding condition may be written as

$$\forall x:\ X^{-1}((-\infty, x]) \in \mathscr{F}.$$
if ∀j: S_j ∈ 𝒜, then

$$P\{X(\omega) \in B\} \quad\text{or}\quad P\{X \in B\}.$$

Specifically, F is given by

$$F(x) = \mu((-\infty, x]) = P\{X \le x\}.$$
Example 1. Let (Ω, 𝒮) be a discrete sample space (see Example 1 of Sec. 2.2). Every numerically valued function is an r.v.

Example 2. (𝒰, 𝓑, m).

In this case an r.v. is by definition just a Borel measurable function. According to the usual definition, f on 𝒰 is Borel measurable iff f⁻¹(𝓑¹) ⊂ 𝓑. In particular, the function f given by f(ω) ≡ ω is an r.v. The two r.v.'s ω and 1 − ω are not identical but are identically distributed; in fact their common distribution is the underlying measure m.

Example 3. (R¹, 𝓑¹, μ).

The definition of a Borel measurable function is not affected, since no measure is involved; so any such function is an r.v., whatever the given p.m. μ may be. As in Example 2, there exists an r.v. with the underlying μ as its p.m.; see Exercise 3 below.
f ∘ X: ω → f(X(ω)),
The reader who is not familiar with operations of this kind is advised to spell
out the proof above in the old-fashioned manner, which takes only a little
longer.
We must now discuss the notion of a random vector. This is just a vector
each of whose components is an r.v. It is sufficient to consider the case of two
dimensions, since there is no essential difference in higher dimensions apart
from complication in notation.
$$B_1 \times B_2 = \{(x, y): x \in B_1,\ y \in B_2\},$$

and put

$$\mu^2(A) = P\{(X, Y) \in A\},$$

the right side being an abbreviation of P{ω: (X(ω), Y(ω)) ∈ A}. This is called the (2-dimensional, probability) distribution or simply the p.m. of (X, Y).

Let us also define, in imitation of X⁻¹, the inverse mapping (X, Y)⁻¹ by the following formula:

$$\forall A \in \mathscr{B}^2:\ (X, Y)^{-1}(A) = \{\omega: (X(\omega), Y(\omega)) \in A\}.$$

$$[f \circ (X, Y)]^{-1}(\mathscr{B}^1) = (X, Y)^{-1} \circ f^{-1}(\mathscr{B}^1) \subset (X, Y)^{-1}(\mathscr{B}^2) \subset \mathscr{F}.$$

The last inclusion says the inverse mapping (X, Y)⁻¹ carries each 2-dimensional Borel set into a set in ℱ. This is proved as follows. If A = B₁ × B₂, where B₁ ∈ 𝓑¹, B₂ ∈ 𝓑¹, then it is clear that

$$(X, Y)^{-1}(A) = X^{-1}(B_1) \cap Y^{-1}(B_2) \in \mathscr{F}$$

by (2). Now the collection of sets A in R² for which (X, Y)⁻¹(A) ∈ ℱ forms a B.F. by the analogue of Theorem 3.1.1. It follows from what has just been shown that this B.F. contains 𝓑₀²; hence it must also contain 𝓑². Hence each set in 𝓑² belongs to the collection, as was to be proved.
Here are some important special cases of Theorems 3.1.4 and 3.1.5. Throughout the book we shall use the notation for numbers as well as functions:

$$X \vee Y, \quad X \wedge Y, \quad X + Y, \quad X - Y, \quad X \cdot Y, \quad X / Y$$

are r.v.'s, not necessarily finite-valued with probability one though everywhere defined, and

$$\lim_{j\to\infty} X_j$$

and lim_{j→∞} X_j exists [and is finite] on the set where lim sup_j X_j = lim inf_j X_j [and is finite], which belongs to ℱ; the rest follows.

Here already we see the necessity of the general definition of an r.v. given at the beginning of this section.
It is easy to see that X is discrete if and only if its d.f. is. Perhaps it is
worthwhile to point out that a discrete r.v. need not have a range that is discrete
in the sense of Euclidean topology, even apart from a set of probability zero.
Consider, for example, an r.v. with the d.f. in Example 2 of Sec. 1.1.
The following terminology and notation will be used throughout the book
for an arbitrary set , not necessarily the sample space.
More generally, let b_j be arbitrary real numbers, then the function ϕ defined below:

$$\forall \omega \in \Omega:\ \varphi(\omega) = \sum_j b_j \, 1_{\Lambda_j}(\omega),$$

is a discrete r.v. We shall call ϕ the r.v. belonging to the weighted partition {Λ_j; b_j}. Each discrete r.v. X belongs to a certain partition. For let {b_j} be the countable set in the definition of X and let Λ_j = {ω: X(ω) = b_j}, then X belongs to the weighted partition {Λ_j; b_j}. If j ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.
EXERCISES
2. If two r.v.’s are equal a.e., then they have the same p.m.
3. Given any p.m. μ on (R¹, 𝓑¹), define an r.v. whose p.m. is μ. Can this be done in an arbitrary probability space?

4. Let θ be uniformly distributed on [0, 1]. For each d.f. F, define G(y) = sup{x: F(x) ≤ y}. Then G(θ) has the d.f. F. (A numerical sketch of this construction follows these exercises.)

5. Suppose X has the continuous d.f. F; then F(X) has the uniform distribution on [0, 1]. What if F is not continuous?

6. Is the range of an r.v. necessarily Borel or Lebesgue measurable?

7. The sum, difference, product, or quotient (denominator nonvanishing) of two discrete r.v.'s is discrete.
8. If Ω is discrete (countable), then every r.v. is discrete. Conversely, every r.v. in a probability space is discrete if and only if the p.m. is atomic. [HINT: Use Exercise 23 of Sec. 2.2.]
9. If f is Borel measurable, and X and Y are identically distributed, then so are f(X) and f(Y).

10. Express the indicators of Λ₁ ∪ Λ₂, Λ₁ ∩ Λ₂, Λ₁ \ Λ₂, Λ₁ Δ Λ₂, lim sup Λ_n, lim inf Λ_n in terms of those of Λ₁, Λ₂, or Λ_n. [For the definitions of the limits see Sec. 4.2.]
11. Let ℱ{X} be the minimal B.F. with respect to which X is measurable. Show that Λ ∈ ℱ{X} if and only if Λ = X⁻¹(B) for some B ∈ 𝓑¹. Is this B unique? Can there be a set A ∉ 𝓑¹ such that Λ = X⁻¹(A)?

12. Generalize the assertion in Exercise 11 to a finite set of r.v.'s. [It is possible to generalize even to an arbitrary set of r.v.'s.]
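Exercise 4 above is the basis of the standard “inverse transform” method of sampling. The Python sketch below is an illustration only (not part of the text): it takes F to be the exponential d.f. F(x) = 1 − e^{−x}, for which G(y) = sup{x: F(x) ≤ y} = −log(1 − y) in closed form, and checks the empirical d.f. of G(θ) against F.

```python
import math
import random

random.seed(1)
# theta uniform on [0, 1]; G(y) = -log(1 - y) for F(x) = 1 - exp(-x), x >= 0.
sample = [-math.log(1.0 - random.random()) for _ in range(100_000)]

# Empirical check that P{G(theta) <= x} = F(x) at a few points.
for x in (0.5, 1.0, 2.0):
    empirical = sum(s <= x for s in sample) / len(sample)
    print(x, round(empirical, 4), round(1 - math.exp(-x), 4))
```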
Let X be an arbitrary positive r.v. For any two positive integers m and n, the set

$$\Lambda_{mn} = \left\{ \omega: \frac{n}{2^m} \le X(\omega) < \frac{n+1}{2^m} \right\}$$

belongs to ℱ. For each m, let X_m denote the r.v. belonging to the weighted partition {Λ_{mn}; n/2^m}; thus X_m = n/2^m if and only if n/2^m ≤ X < (n + 1)/2^m. It is easy to see that we have for each m:

$$\forall \omega:\ X_m(\omega) \le X_{m+1}(\omega); \qquad 0 \le X(\omega) - X_m(\omega) < \frac{1}{2^m}.$$

Consequently there is monotone convergence:

$$\forall \omega:\ \lim_{m\to\infty} X_m(\omega) = X(\omega).$$

If for one value of m we have E(X_m) = +∞, then we define E(X) = +∞; otherwise we define

$$E(X) = \lim_{m\to\infty} E(X_m),$$

and call it “the integral of X (with respect to P) over the set Λ”. We shall say that X is integrable with respect to P over Λ iff the integral above exists and is finite.
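The dyadic r.v.'s X_m are simply X truncated to m binary places, X_m = [2^m X]/2^m. A two-line numerical sketch (illustrative only; the sample value is arbitrary) exhibits the monotone convergence just stated:

```python
import math

def X_m(x, m):
    # X_m = n/2^m on {n/2^m <= X < (n+1)/2^m}, i.e. X_m = floor(2^m X)/2^m
    return math.floor(2 ** m * x) / 2 ** m

x = math.pi                      # a sample value X(omega)
for m in range(6):
    # X_m increases to x, and 0 <= x - X_m < 2^{-m}
    print(m, X_m(x, m), x - X_m(x, m), 2.0 ** -m)
```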
In the case of (R¹, 𝓑¹, μ), if we write X = f, ω = x, the integral

$$\int_{\Lambda} X(\omega) \, P(d\omega) = \int_{\Lambda} f(x) \, \mu(dx)$$

to distinguish clearly between the four kinds of intervals (a, b], [a, b], (a, b), [a, b).

In the case of (𝒰, 𝓑, m), the integral reduces to the ordinary Lebesgue integral

$$\int_a^b f(x) \, m(dx) = \int_a^b f(x) \, dx.$$
(ii) Linearity.

$$\int_{\Lambda} (aX + bY) \, dP = a \int_{\Lambda} X \, dP + b \int_{\Lambda} Y \, dP$$

(viii) Dominated convergence theorem. If lim_{n→∞} X_n = X a.e. or merely in measure on Λ and ∀n: |X_n| ≤ Y a.e. on Λ, with ∫_Λ Y dP < ∞, then

$$\lim_{n\to\infty} \int_{\Lambda} X_n \, dP = \int_{\Lambda} X \, dP = \int_{\Lambda} \lim_{n\to\infty} X_n \, dP. \tag{5}$$
It remains to show

$$\sum_{n=0}^{\infty} n P(\Lambda_n) = \sum_{n=1}^{\infty} P(|X| \ge n), \tag{8}$$

finite or infinite. Now the partial sums of the series on the left may be rearranged (Abel's method of partial summation!) to yield, for N ≥ 1,

$$\sum_{n=0}^{N} n \{P(|X| \ge n) - P(|X| \ge n+1)\} = \sum_{n=1}^{N} \{n - (n-1)\} P(|X| \ge n) - N P(|X| \ge N+1) = \sum_{n=1}^{N} P(|X| \ge n) - N P(|X| \ge N+1). \tag{9}$$

Thus we have

$$\sum_{n=1}^{N} n P(\Lambda_n) \le \sum_{n=1}^{N} P(|X| \ge n) \le \sum_{n=1}^{N} n P(\Lambda_n) + N P(|X| \ge N+1). \tag{10}$$

Hence if E(|X|) < ∞, then the last term in (10) converges to zero as N → ∞ and (8) follows with both sides finite. On the other hand, if E(|X|) = ∞, then the second term in (10) diverges with the first as N → ∞, so that (8) is also true with both sides infinite.
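For an integer-valued X ≥ 0, the rearrangement (8) reads E(X) = Σ_{n≥1} P(X ≥ n). A numerical check (a sketch using a geometric distribution, an arbitrary choice made here for illustration):

```python
# P(X = k) = (1 - q) q^k, k = 0, 1, 2, ...; then E(X) = q/(1 - q).
q = 0.6
K = 200                                          # truncation; the neglected tail is ~q^K
pmf = [(1 - q) * q ** k for k in range(K)]

mean = sum(k * p for k, p in enumerate(pmf))     # E(X) = sum_k k P(X = k)
tails = sum(sum(pmf[n:]) for n in range(1, K))   # sum_{n >= 1} P(X >= n)
print(mean, tails, q / (1 - q))                  # the three values agree
```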
EXERCISES
1. If X ≥ 0 a.e. on Λ and ∫_Λ X dP = 0, then X = 0 a.e. on Λ.

2. If E(|X|) < ∞ and lim_{n→∞} P(Λ_n) = 0, then lim_{n→∞} ∫_{Λ_n} X dP = 0. In particular

$$\lim_{n\to\infty} \int_{\{|X| > n\}} X \, dP = 0.$$

3. Let X ≥ 0 and ∫_Ω X dP = A, 0 < A < ∞. Then the set function ν defined on ℱ as follows:

$$\nu(\Lambda) = \frac{1}{A} \int_{\Lambda} X \, dP,$$

is a probability measure on ℱ.

4. Let c be a fixed constant, c > 0. Then E(|X|) < ∞ if and only if

$$\sum_{n=1}^{\infty} P(|X| \ge cn) < \infty.$$

In particular, if the last series converges for one value of c, it converges for all values of c.

5. For any r > 0, E(|X|^r) < ∞ if and only if

$$\sum_{n=1}^{\infty} n^{r-1} P(|X| \ge n) < \infty.$$
$$\rho(\Lambda_1, \Lambda_2) = P(\Lambda_1 \,\Delta\, \Lambda_2);$$

then ρ is a pseudo-metric in the space of sets in ℱ; call the resulting metric space M(ℱ, P). Prove that for each integrable r.v. X the mapping of M(ℱ, P) to R¹ given by Λ → ∫_Λ X dP is continuous. Similarly, the mappings on M(ℱ, P) × M(ℱ, P) to M(ℱ, P) given by

$$(\Lambda_1, \Lambda_2) \to \Lambda_1 \cup \Lambda_2,\ \Lambda_1 \cap \Lambda_2,\ \Lambda_1 \setminus \Lambda_2,\ \Lambda_1 \,\Delta\, \Lambda_2$$

are all continuous. If (see Sec. 4.2 below)

$$\limsup_n \Lambda_n = \liminf_n \Lambda_n$$

modulo a null set, we denote the common equivalence class of these two sets by lim_n Λ_n. Prove that in this case {Λ_n} converges to lim_n Λ_n in the metric ρ. Deduce Exercise 2 above as a special case.
There is a basic relation between the abstract integral with respect to P over sets in ℱ on the one hand, and the Lebesgue–Stieltjes integral with respect to μ over sets in 𝓑¹ on the other, induced by each r.v. We give the version in one dimension first.
Another important application is as follows: let μ² be as in Theorem 3.2.3 and take f(x, y) to be x + y there. We obtain

$$E(X + Y) = \int\int_{R^2} (x + y) \, \mu^2(dx, dy) = \int\int_{R^2} x \, \mu^2(dx, dy) + \int\int_{R^2} y \, \mu^2(dx, dy), \tag{16}$$

and consequently

$$E(X + Y) = E(X) + E(Y).$$

This result is a case of the linearity of E given but not proved here; the proof above reduces this property in the general case of (Ω, ℱ, P) to the corresponding one in the special case (R², 𝓑², μ²). Such a reduction is frequently useful when there are technical difficulties in the abstract treatment.
We end this section with a discussion of “moments”.

Let a be real, r positive, then E(|X − a|^r) is called the absolute moment of X of order r, about a. It may be +∞; otherwise, and if r is an integer, E((X − a)^r) is the corresponding moment. If μ and F are, respectively, the p.m. and d.f. of X, then we have by Theorem 3.2.2:

$$E(|X - a|^r) = \int_{R^1} |x - a|^r \, \mu(dx) = \int_{-\infty}^{\infty} |x - a|^r \, dF(x),$$

$$E((X - a)^r) = \int_{R^1} (x - a)^r \, \mu(dx) = \int_{-\infty}^{\infty} (x - a)^r \, dF(x).$$

$$\operatorname{var}(X) = \sigma^2(X) = E\{(X - E(X))^2\} = E(X^2) - \{E(X)\}^2.$$

We note the inequality σ²(X) ≤ E(X²), which will be used a good deal in Chapter 5. For any positive number p, X is said to belong to L^p = L^p(Ω, ℱ, P) iff E(|X|^p) < ∞.
each u > 0:

$$P\{|X| \ge u\} \le \frac{E\{\varphi(X)\}}{\varphi(u)}.$$

PROOF. We have by the mean value theorem:

$$E\{\varphi(X)\} = \int_{\Omega} \varphi(X) \, dP \ge \int_{\{|X| \ge u\}} \varphi(X) \, dP \ge \varphi(u) P\{|X| \ge u\}.$$

The most familiar application is when ϕ(u) = |u|^p for 0 < p < ∞, so that the inequality yields an upper bound for the “tail” probability in terms of an absolute moment.
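With ϕ(u) = u², the inequality is the classical Chebyshev bound P{|X| ≥ u} ≤ E(X²)/u². The sketch below (an illustration added here; the Gaussian sample and sample size are arbitrary choices) compares empirical tails with the bound:

```python
import random

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]
m2 = sum(x * x for x in xs) / len(xs)               # estimate of E(X^2)

for u in (1.0, 2.0, 3.0):
    tail = sum(abs(x) >= u for x in xs) / len(xs)
    print(u, round(tail, 5), round(m2 / u ** 2, 5))  # tail <= E(X^2)/u^2
```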
EXERCISES
9. Another proof of (14): verify it first for simple r.v.'s and then use Exercise 7 of Sec. 3.2.

10. Prove that if 0 ≤ r < r′ and E(|X|^{r′}) < ∞, then E(|X|^r) < ∞. Also that E(|X|^r) < ∞ if and only if E(|X − a|^r) < ∞ for every a.

11. If E(X²) = 1 and E(|X|) ≥ a > 0, then P{|X| ≥ λa} ≥ (1 − λ)²a² for 0 ≤ λ ≤ 1.

12. If X ≥ 0 and Y ≥ 0, p ≥ 0, then E{(X + Y)^p} ≤ 2^p{E(X^p) + E(Y^p)}. If p > 1, the factor 2^p may be replaced by 2^{p−1}. If 0 ≤ p ≤ 1, it may be replaced by 1.

13. If X_j ≥ 0, then

$$E\left\{ \left( \sum_{j=1}^{n} X_j \right)^{p} \right\} \le \ \text{or}\ \ge \ \sum_{j=1}^{n} E(X_j^p)$$

according as p ≤ 1 or p ≥ 1.

14. If p > 1, we have

$$\left| \frac{1}{n} \sum_{j=1}^{n} X_j \right|^p \le \frac{1}{n} \sum_{j=1}^{n} |X_j|^p$$

and so

$$E\left\{ \left| \frac{1}{n} \sum_{j=1}^{n} X_j \right|^p \right\} \le \frac{1}{n} \sum_{j=1}^{n} E(|X_j|^p);$$

we have also

$$E\left\{ \left| \frac{1}{n} \sum_{j=1}^{n} X_j \right|^p \right\} \le \left\{ \frac{1}{n} \sum_{j=1}^{n} \left( E(|X_j|^p) \right)^{1/p} \right\}^{p}.$$
3.3 Independence
We shall now introduce a fundamental new concept peculiar to the theory of
probability, that of “(stochastic) independence”.
The r.v.’s of an infinite family are said to be independent iff those in every
finite subfamily are. They are said to be pairwise independent iff every two
of them are independent.
Note that (1) implies that the r.v.'s in every subset of {X_j, 1 ≤ j ≤ n} are also independent, since we may take some of the B_j's as R¹. On the other hand, (1) is implied by the apparently weaker hypothesis: for every set of real numbers {x_j, 1 ≤ j ≤ n}:

$$P\left\{ \bigcap_{j=1}^{n} \{X_j \le x_j\} \right\} = \prod_{j=1}^{n} P\{X_j \le x_j\}. \tag{2}$$
The proof of the equivalence of (1) and (2) is left as an exercise. In terms of the p.m. μⁿ induced by the random vector (X₁, …, X_n) on (Rⁿ, 𝓑ⁿ), and the p.m.'s {μ_j, 1 ≤ j ≤ n} induced by each X_j on (R¹, 𝓑¹), the relation (1) may be written as

$$\mu^n\left( \mathop{\times}_{j=1}^{n} B_j \right) = \prod_{j=1}^{n} \mu_j(B_j), \tag{3}$$

and, taking B_j = (−∞, x_j], the joint d.f. satisfies

$$F(x_1, \ldots, x_n) = P\{X_j \le x_j,\ 1 \le j \le n\} = \mu^n\left( \mathop{\times}_{j=1}^{n} (-\infty, x_j] \right);$$
From now on when the probability space is fixed, a set in ℱ will also be called an event. The events {E_j, 1 ≤ j ≤ n} are said to be independent iff their indicators are independent; this is equivalent to: for any subset {j₁, …, j_ℓ} of {1, …, n}, we have

$$P\left( \bigcap_{k=1}^{\ell} E_{j_k} \right) = \prod_{k=1}^{\ell} P(E_{j_k}). \tag{4}$$
Hence we have

$$P\left\{ \bigcap_{j=1}^{n} [f_j(X_j) \in A_j] \right\} = P\left\{ \bigcap_{j=1}^{n} [X_j \in f_j^{-1}(A_j)] \right\} = \prod_{j=1}^{n} P\{X_j \in f_j^{-1}(A_j)\} = \prod_{j=1}^{n} P\{f_j(X_j) \in A_j\}.$$

This being true for every choice of the A_j's, the f_j(X_j)'s are independent by definition.

The proof of the next theorem is similar and is left as an exercise.
are independent.
Theorem 3.3.3. If X and Y are independent and both have finite expectations, then

$$E(XY) = E(X)E(Y). \tag{5}$$

PROOF. Suppose first that X and Y are discrete, belonging respectively to the weighted partitions {Λ_j; c_j} and {M_k; d_k}. Now we have

$$\Omega = \left( \bigcup_j \Lambda_j \right) \cap \left( \bigcup_k M_k \right) = \bigcup_{j,k} (\Lambda_j \cap M_k)$$

and

$$X(\omega) Y(\omega) = c_j d_k \quad\text{if } \omega \in \Lambda_j \cap M_k.$$

Hence XY belongs to the weighted partition {Λ_j ∩ M_k; c_j d_k}, and since Λ_j = {X = c_j} and M_k = {Y = d_k} are independent,

$$E(XY) = \sum_{j,k} c_j d_k P(\Lambda_j \cap M_k) = \sum_{j,k} c_j d_k P(\Lambda_j) P(M_k) = \left( \sum_j c_j P(\Lambda_j) \right) \left( \sum_k d_k P(M_k) \right) = E(X)E(Y).$$
$$P\left( X_m = \frac{n}{2^m},\ Y_m = \frac{n'}{2^m} \right) = P\left( X_m = \frac{n}{2^m} \right) P\left( Y_m = \frac{n'}{2^m} \right).$$

The independence of X_m and Y_m is also a consequence of Theorem 3.3.1, since X_m = [2^m X]/2^m, where [X] denotes the greatest integer in X. Finally, it is clear that X_m Y_m is increasing with m and

$$0 \le XY - X_m Y_m = X(Y - Y_m) + Y_m(X - X_m) \to 0.$$

Hence, by the monotone convergence theorem, we conclude that

$$E(XY) = \lim_{m\to\infty} E(X_m Y_m) = \lim_{m\to\infty} E(X_m) E(Y_m) = E(X)E(Y).$$

Thus (5) is true also in this case. For the general case, we use (2) and (3) of Sec. 3.2 and observe that the independence of X and Y implies that of X⁺ and Y⁺; X⁻ and Y⁻; and so on. This again can be seen directly or as a consequence of Theorem 3.3.1. Hence we have, under our finiteness hypothesis:

$$E(XY) = E((X^+ - X^-)(Y^+ - Y^-)) = E(X^+ Y^+ - X^+ Y^- - X^- Y^+ + X^- Y^-)$$
$$= E(X^+ Y^+) - E(X^+ Y^-) - E(X^- Y^+) + E(X^- Y^-)$$
$$= E(X^+)E(Y^+) - E(X^+)E(Y^-) - E(X^-)E(Y^+) + E(X^-)E(Y^-)$$
$$= \{E(X^+) - E(X^-)\}\{E(Y^+) - E(Y^-)\} = E(X)E(Y).$$
The first proof is completed.
Second proof. Consider the random vector (X, Y) and let the p.m. induced by it be μ²(dx, dy). Then we have by Theorem 3.2.3:

$$E(XY) = \int_{\Omega} XY \, dP = \int\int_{R^2} xy \, \mu^2(dx, dy) = \int\int_{R^2} xy \, \mu_1(dx)\,\mu_2(dy) = \int_{R^1} x \, \mu_1(dx) \int_{R^1} y \, \mu_2(dy) = E(X)E(Y),$$

finishing the proof! Observe that we are using here a very simple form of Fubini's theorem (see below). Indeed, the second proof appears to be so much shorter only because we are relying on the theory of “product measure” μ² = μ₁ × μ₂ on (R², 𝓑²). This is another illustration of the method of reduction.
This follows at once by induction from (5), provided we observe that the two r.v.'s

$$\prod_{j=1}^{k} X_j \quad\text{and}\quad \prod_{j=k+1}^{n} X_j$$

are independent for each k.
Do independent random variables exist? Here we can take the cue from
the intuitive background of probability theory which not only has given rise
historically to this branch of mathematical discipline, but remains a source of
inspiration, inculcating a way of thinking peculiar to the discipline. It may
be said that no one could have learned the subject properly without acquiring
some feeling for the intuitive content of the concept of stochastic indepen-
dence, and through it, certain degrees of dependence. Briefly then: events are
determined by the outcomes of random trials. If an unbiased coin is tossed
and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it
takes these two values with roughly the probabilities ½ each. Repeated tossing
will produce a sequence of outcomes. If now a die is cast, the outcome may
be similarly represented by an r.v. taking the six values 1 to 6; again this
may be repeated to produce a sequence. Next we may draw a card from a
pack or a ball from an urn, or take a measurement of a physical quantity
sampled from a given population, or make an observation of some fortuitous
natural phenomenon, the outcomes in the last two cases being r.v.’s taking
some rational values in terms of certain units; and so on. Now it is very
easy to conceive of undertaking these various trials under conditions such
that their respective outcomes do not appreciably affect each other; indeed it
would take more imagination to conceive the opposite! In this circumstance,
idealized, the trials are carried out “independently of one another” and the
corresponding r.v.’s are “independent” according to definition. We have thus
“constructed” sets of independent r.v.’s in varied contexts and with various
distributions (although they are always discrete on realistic grounds), and the
whole process can be continued indefinitely.
Can such a construction be made rigorous? We begin by an easy special
case.
namely, to the n-tuple (ω₁, …, ω_n) the probability to be assigned is the product of the probabilities originally assigned to each component ω_j by P_j. This p.m. will be called the product measure derived from the p.m.'s {P_j, 1 ≤ j ≤ n} and denoted by ⨉_{j=1}^{n} P_j. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if S_j ∈ 𝒮_j, 1 ≤ j ≤ n, then

$$P^n\left( \mathop{\times}_{j=1}^{n} S_j \right) = \prod_{j=1}^{n} P_j(S_j). \tag{8}$$

To see this, we observe that the left side is, by definition, equal to

$$\sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} P^n(\{\omega_1, \ldots, \omega_n\}) = \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \prod_{j=1}^{n} P_j(\{\omega_j\}) = \prod_{j=1}^{n} \left\{ \sum_{\omega_j \in S_j} P_j(\{\omega_j\}) \right\} = \prod_{j=1}^{n} P_j(S_j),$$
$$S_j = \{\omega_j \in \Omega_j: X_j(\omega_j) \in B_j\}$$

$$P^n\left\{ \bigcap_{j=1}^{n} [X_j \in B_j] \right\} = P^n\left( \mathop{\times}_{j=1}^{n} S_j \right) = \prod_{j=1}^{n} P_j(S_j) = \prod_{j=1}^{n} P_j\{X_j \in B_j\}. \tag{9}$$

Then we have

$$\bigcap_{j=1}^{n} \{\omega: \tilde X_j(\omega) \in B_j\} = \mathop{\times}_{j=1}^{n} \{\omega_j: X_j(\omega_j) \in B_j\}$$

since

$$\{\omega: \tilde X_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j: X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n.$$

Therefore the r.v.'s {X̃_j, 1 ≤ j ≤ n} are independent.
$$X_j(x_1, \ldots, x_n) = f_j(x_j).$$

$$\mu^n\left( \mathop{\times}_{j=1}^{n} B_j \right) = \prod_{j=1}^{n} \mu_j(B_j).$$
It remains to extend this definition to all of 𝓑ⁿ, or, more logically speaking, to prove that there exists a p.m. μⁿ on 𝓑ⁿ that has the above “product property”. The situation is somewhat more complicated than in Example 1, just as Example 3 in Sec. 2.2 is more complicated than Example 1 there. Indeed, the required construction is exactly that of the corresponding Lebesgue–Stieltjes measure in n dimensions. This will be subsumed in the next theorem. Assuming that it has been accomplished, then sets of n independent r.v.'s can be defined just as in Example 2.
This expansion is unique except when x is of the form m/2ⁿ; the set of such x is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes. For the sake of definiteness, let us agree that only expansions with infinitely many digits “1” are used. Now each digit ε_j of x is a function of x taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let {c_j, j ≥ 1} be a given sequence of 0's and 1's. Then the set

$$\{x: \varepsilon_j(x) = c_j,\ 1 \le j \le n\} = \bigcap_{j=1}^{n} \{x: \varepsilon_j(x) = c_j\}$$

is the set of numbers x whose first n digits are the given c_j's, thus x = .c₁c₂⋯c_n ε_{n+1} ε_{n+2} ⋯, with the digits from the (n + 1)st on completely arbitrary. It is clear that this set is just an interval of length 1/2ⁿ, hence of probability 1/2ⁿ. On the other hand for each j, the set {x: ε_j(x) = c_j} has probability ½ for a similar reason. We have therefore

$$P\{\varepsilon_j = c_j,\ 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^{n} \frac{1}{2} = \prod_{j=1}^{n} P\{\varepsilon_j = c_j\}.$$

This being true for every choice of the c_j's, the r.v.'s {ε_j, j ≥ 1} are independent. Let {f_j, j ≥ 1} be arbitrary functions with domain the two points {0, 1}; then {f_j(ε_j), j ≥ 1} are also independent r.v.'s.
This example seems extremely special, but easy extensions are at hand (see
Exercises 13, 14, and 15 below).
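One such easy check is by simulation. The sketch below (an illustration added here; the sample size and digit pattern are arbitrary) estimates the joint and marginal frequencies of the first three binary digits of a uniform x and compares them with 2^{−3}:

```python
import random

def digit(x, j):
    # the j-th binary digit of x in [0, 1): floor(2^j x) mod 2
    return int(x * 2 ** j) % 2

random.seed(2)
xs = [random.random() for _ in range(200_000)]

c = (1, 0, 1)   # an arbitrary choice of digits c_1, c_2, c_3
joint = sum(all(digit(x, j + 1) == c[j] for j in range(3)) for x in xs) / len(xs)
margs = [sum(digit(x, j + 1) == c[j] for x in xs) / len(xs) for j in range(3)]
print(round(joint, 4), round(margs[0] * margs[1] * margs[2], 4), 1 / 8)
```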
We are now ready to state and prove the fundamental existence theorem
of product measures.
in which there exists an r.v. X_n with μ_n as its p.m. Indeed this is possible if we take (Ω_n, ℱ_n, P_n) to be (R¹, 𝓑¹, μ_n) and X_n to be the identical function of the sample point x in R¹, now to be written as ω_n (cf. Exercise 3 of Sec. 3.1). Now define the infinite product space

$$\Omega = \mathop{\times}_{n=1}^{\infty} \Omega_n$$

$$E = \mathop{\times}_{n=1}^{\infty} F_n, \tag{11}$$

where each F_n ∈ ℱ_n and all but a finite number of these F_n's are equal to the corresponding Ω_n's. Thus ω ∈ E if and only if ω_n ∈ F_n, n ≥ 1, but this is actually a restriction only for a finite number of values of n. Let the collection of subsets of Ω, each of which is the union of a finite number of disjoint finite-product sets, be ℱ₀. It is easy to see that the collection ℱ₀ is closed with respect to complementation and pairwise intersection, hence it is a field. We shall take the ℱ in the theorem to be the B.F. generated by ℱ₀. This ℱ is called the product B.F. of the sequence {ℱ_n, n ≥ 1} and denoted by ⨉_{n=1}^{∞} ℱ_n.
where all but a finite number of the factors on the right side are equal to one.

Next, if E ∈ ℱ₀ and

$$E = \bigcup_{k=1}^{n} E_k,$$

does not appear as first coordinate of any point in E, then (E ∣ ω₁⁰) = ∅. Note that if E ∈ ℱ₀, then (E ∣ ω₁⁰) ∈ ℱ₀ for each ω₁⁰. We claim that there exists an ω₁⁰ such that for every n, we have P(C_n ∣ ω₁⁰) ≥ δ/2. To see this we begin with the equation

$$P(C_n) = \int_{\Omega_1} P(C_n \mid \omega_1) \, P_1(d\omega_1). \tag{14}$$

Repeating the argument for the set (C_n ∣ ω₁⁰), we see that there exists an ω₂⁰ such that for every n, P(C_n ∣ ω₁⁰, ω₂⁰) ≥ δ/4, where (C_n ∣ ω₁⁰, ω₂⁰) = ((C_n ∣ ω₁⁰) ∣ ω₂⁰) is of the form Ω₁ × Ω₂ × E₃ and E₃ is the set of (ω₃, ω₄, …) in ⨉_{n=3}^{∞} Ω_n such that (ω₁⁰, ω₂⁰, ω₃, ω₄, …) ∈ C_n; and so forth by induction.
$$\forall m < n:\ \mu_n \circ \pi_{mn} = \mu_m. \tag{16}$$

Then there exists a probability space (Ω, ℱ, P) and a sequence of r.v.'s {X_j} on it such that for each n, μ_n is the n-dimensional p.m. of the vector

$$(X_1, \ldots, X_n).$$

where (Ω_j, ℱ_j) = (R¹, 𝓑¹) for each j; only P is now more general. In terms of d.f.'s, the consistency condition (16) may be stated as follows. For each m ≥ 1 and (x₁, …, x_m) ∈ R^m, we have if n > m:

$$\lim_{x_{m+1}, \ldots, x_n \to \infty} F_n(x_1, \ldots, x_n) = F_m(x_1, \ldots, x_m).$$
EXERCISES
are independent.
4. Fields or B.F.'s ℱ_α (⊂ ℱ) of any family are said to be independent iff any collection of events, one from each ℱ_α, forms a set of independent events. Let ℱ_α⁰ be a field generating ℱ_α. Prove that if the fields ℱ_α⁰ are independent, then so are the B.F.'s ℱ_α. Is the same conclusion true if the fields are replaced by arbitrary generating sets? Prove, however, that the conditions (1) and (2) are equivalent. [HINT: Use Theorem 2.1.2.]

5. If {X_α} is a family of independent r.v.'s, then the B.F.'s generated by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this proposition.]

6. The r.v. X is independent of itself if and only if it is constant with probability one. Can X and f(X) be independent where f ∈ 𝓑¹?

7. If {E_j, 1 ≤ j < ∞} are independent events then

$$P\left( \bigcap_{j=1}^{\infty} E_j \right) = \prod_{j=1}^{\infty} P(E_j),$$
9. If X and Y are independent and E(X) exists, then for any Borel set B, we have

$$\int_{\{Y \in B\}} X \, dP = E(X) P(Y \in B).$$

10. If X and Y are independent and for some p > 0: E(|X + Y|^p) < ∞, then E(|X|^p) < ∞ and E(|Y|^p) < ∞.

11. If X and Y are independent, E(|X|^p) < ∞ for some p ≥ 1, and E(Y) = 0, then E(|X + Y|^p) ≥ E(|X|^p). [This is a case of Theorem 9.3.2; but try a direct proof!]

12. The r.v.'s {ε_j} in Example 4 are related to the “Rademacher functions”:

$$r_j(x) = \operatorname{sgn}(\sin 2^j \pi x).$$
If ∀n: {E_j^{(n)}, 1 ≤ j ≤ n} are independent events, and

$$P\left( \bigcup_{j=1}^{n} E_j^{(n)} \right) \to 0 \quad\text{as } n \to \infty,$$

then

$$P\left( \bigcup_{j=1}^{n} E_j^{(n)} \right) \sim \sum_{j=1}^{n} P(E_j^{(n)}).$$

then

$$\int_{\Omega_1} \left\{ \int_{\Omega_2} f \, dP_2 \right\} dP_1 = \int_{\Omega_2} \left\{ \int_{\Omega_1} f \, dP_1 \right\} dP_2.$$
Recall that our convention stated at the beginning of Sec. 3.1 allows each r.v. a null set on which it may be ±∞. The union of all these sets being still a null set, it may be included in the set N in (1) without modifying the conclusion.
Theorem 4.1.1. The sequence {X_n} converges a.e. to X if and only if for every ε > 0 we have

$$\lim_{m\to\infty} P\{|X_n - X| \le \varepsilon \text{ for all } n \ge m\} = 1; \tag{2}$$

or equivalently

$$\lim_{m\to\infty} P\{|X_n - X| > \varepsilon \text{ for some } n \ge m\} = 0. \tag{2′}$$

Write A_m = A_m(ε) for the event in (2). Then A_m is increasing with m. For each ω₀, the convergence of {X_n(ω₀)} to X(ω₀) implies that given any ε > 0, there exists m(ω₀, ε) such that

$$n \ge m(\omega_0, \varepsilon) \Rightarrow |X_n(\omega_0) - X(\omega_0)| \le \varepsilon. \tag{4}$$

If ω₀ belongs to A, then (4) is true for all ε = 1/n, hence for all ε > 0 (why?). This means {X_n(ω₀)} converges to X(ω₀) for all ω₀ in a set of probability one.

A weaker concept of convergence is of basic importance in probability theory.
Strictly speaking, the definition applies when all Xn and X are finite-
valued. But we may extend it to r.v.’s that are finite a.e. either by agreeing
to ignore a null set or by the logical convention that a formula must first be
defined in order to be valid or invalid. Thus, for example, if X_n(ω) = +∞ and X(ω) = +∞ for some ω, then X_n(ω) − X(ω) is not defined and therefore such an ω cannot belong to the set {|X_n − X| > ε} figuring in (5).

Since (2′) clearly implies (5), we have the immediate consequence below.

Theorem 4.1.2. Convergence a.e. [to X] implies convergence in pr. [to X].
Theorem 4.1.3. The sequence {X_n} converges a.e. if and only if for every ε we have

$$\lim_{m\to\infty} P\{|X_n - X_{n'}| > \varepsilon \text{ for some } n > n' \ge m\} = 0.$$

It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a finite r.v. X such that X_n → X in pr.
DEFINITION OF CONVERGENCE IN L^p, 0 < p < ∞. The sequence {X_n} is said to converge in L^p to X iff X_n ∈ L^p, X ∈ L^p and

$$\lim_{n\to\infty} E(|X_n - X|^p) = 0. \tag{8}$$
is a metric in the space of r.v.'s, provided that we identify r.v.'s that are equal a.e.

PROOF. If ρ(X, Y) = 0, then E(|X − Y|/(1 + |X − Y|)) = 0, hence X = Y a.e. by Exercise 1 of Sec. 3.2. To show that ρ(·, ·) is a metric it is sufficient to show that

$$E\left( \frac{|X + Y|}{1 + |X + Y|} \right) \le E\left( \frac{|X|}{1 + |X|} \right) + E\left( \frac{|Y|}{1 + |Y|} \right).$$

For this we need only verify that for every real x and y:

$$\frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}. \tag{11}$$

By symmetry we may suppose that |y| ≤ |x|; then

$$\frac{|x + y|}{1 + |x + y|} - \frac{|x|}{1 + |x|} = \frac{|x + y| - |x|}{(1 + |x + y|)(1 + |x|)} \le \frac{\bigl||x + y| - |x|\bigr|}{1 + |x|} \le \frac{|y|}{1 + |y|}. \tag{12}$$
For any X the r.v. |X|/(1 + |X|) is bounded by 1, hence by the second part of Theorem 4.1.4 the first assertion of the theorem will follow if we show that |X_n| → 0 in pr. if and only if |X_n|/(1 + |X_n|) → 0 in pr. But |x| ≤ ε is equivalent to |x|/(1 + |x|) ≤ ε/(1 + ε); hence the proof is complete.
Example 1. Convergence in pr. does not imply convergence in L^p, and the latter does not imply convergence a.e.

Take the probability space (Ω, ℱ, P) to be (𝒰, 𝓑, m) as in Example 2 of Sec. 2.2. Let ϕ_{k,j} be the indicator of the interval

$$\left( \frac{j-1}{k}, \frac{j}{k} \right], \qquad k \ge 1,\ 1 \le j \le k.$$

Order these functions lexicographically first according to k increasing, and then for each k according to j increasing, into one sequence {X_n} so that if X_n = ϕ_{k_n, j_n}, then k_n → ∞ as n → ∞. Thus for each p > 0:

$$E(X_n^p) = \frac{1}{k_n} \to 0,$$

and so X_n → 0 in L^p. But for each ω and every k, there exists a j such that ϕ_{k,j}(ω) = 1; hence there exist infinitely many values of n such that X_n(ω) = 1. Similarly there exist infinitely many values of n such that X_n(ω) = 0. It follows that the sequence {X_n(ω)} of 0's and 1's cannot converge for any ω. In other words, the set on which {X_n} converges is empty.

Now if we replace ϕ_{k,j} by k^{1/p} ϕ_{k,j}, where p > 0, then P{X_n > 0} = 1/k_n → 0 so that X_n → 0 in pr., but for each n, we have E(X_n^p) = 1. Consequently lim_{n→∞} E(|X_n − 0|^p) = 1 and X_n does not → 0 in L^p.
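The “sliding indicators” of Example 1 can be listed explicitly. The sketch below (illustrative; the sample point ω is an arbitrary choice) generates the sequence and shows, for one ω, infinitely many n with X_n(ω) = 1 even though E(X_n^p) = 1/k_n → 0:

```python
def index(n):
    # the n-th pair (k, j), 1 <= j <= k, in the lexicographic ordering of the text
    k = 1
    while n > k:
        n -= k
        k += 1
    return k, n

def X(n, w):
    k, j = index(n)
    return 1.0 if (j - 1) / k < w <= j / k else 0.0   # indicator of ((j-1)/k, j/k]

w = 0.37                                              # an arbitrary omega in (0, 1]
hits = [n for n in range(1, 1000) if X(n, w) == 1.0]
print(hits[:12])   # one hit within each block k, so X_n(w) = 1 infinitely often
```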
EXERCISES
$$f(x) = \frac{|x|}{1 + |x|}$$

as in Theorem 4.1.5.]
5. Convergence in L^p implies that in L^r for r < p.

6. If X_n → X, Y_n → Y, both in L^p, then X_n ± Y_n → X ± Y in L^p. If X_n → X in L^p and Y_n → Y in L^q, where p > 1 and 1/p + 1/q = 1, then X_n Y_n → XY in L¹.

7. If X_n → X in pr. and X_n → Y in pr., then X = Y a.e.

8. If X_n → X a.e. and μ_n and μ are the p.m.'s of X_n and X, it does not follow that μ_n(P) → μ(P) even for all intervals P.

9. Give an example in which E(X_n) → 0 but there does not exist any subsequence {n_k} → ∞ such that X_{n_k} → 0 in pr.

10. Let f be a continuous function on R¹. If X_n → X in pr., then f(X_n) → f(X) in pr. The result is false if f is merely Borel measurable. [HINT: Truncate f at ±A for large A.]

11. The extended-valued r.v. X is said to be bounded in pr. iff for each ε > 0, there exists a finite M(ε) such that P{|X| ≤ M(ε)} ≥ 1 − ε. Prove that X is bounded in pr. if and only if it is finite a.e.

12. The sequence of extended-valued r.v.'s {X_n} is said to be bounded in pr. iff sup_n |X_n| is bounded in pr.; {X_n} is said to diverge to +∞ in pr. iff for each M > 0 and ε > 0 there exists a finite n₀(M, ε) such that if n > n₀, then P{|X_n| > M} > 1 − ε. Prove that if {X_n} diverges to +∞ in pr. and {Y_n} is bounded in pr., then {X_n + Y_n} diverges to +∞ in pr.

13. If sup_n X_n = +∞ a.e., there need exist no subsequence {X_{n_k}} that diverges to +∞ in pr.

14. It is possible that for each ω, lim sup_n X_n(ω) = +∞, but there does not exist a subsequence {n_k} and a set Δ of positive probability such that lim_k X_{n_k}(ω) = +∞ on Δ. [HINT: On (𝒰, 𝓑) define X_n(ω) according to the nth digit of ω.]
15. Instead of the metric ρ in Theorem 4.1.5 one may define other metrics as follows. Let ρ₁(X, Y) be the infimum of all ε > 0 such that

$$P(|X - Y| > \varepsilon) \le \varepsilon.$$

Let ρ₂(X, Y) be the infimum of P{|X − Y| > ε} + ε over all ε > 0. Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

16. Convergence in pr. for arbitrary r.v.'s may be reduced to that of bounded r.v.'s by the transformation

$$X' = \arctan X.$$
hence it belongs to

$$\bigcap_{m=1}^{\infty} F_m = \limsup_n E_n.$$

Conversely, if ω belongs to ⋂_{m=1}^{∞} F_m, then ω ∈ F_m for every m. Were ω to belong to only a finite number of the E_n's there would be an m such that ω ∉ E_n for n ≥ m, so that

$$\omega \notin \bigcup_{n=m}^{\infty} E_n = F_m.$$

This contradiction proves that ω must belong to an infinite number of the E_n's. In more intuitive language: the event lim sup_n E_n occurs if and only if the events E_n occur infinitely often. Thus we may write

$$P(\limsup_n E_n) = P(E_n \text{ i.o.})$$
where the abbreviation “i.o.” stands for “infinitely often”. The advantage of such a notation is better shown if we consider, for example, the events “|X_n| ≥ ε” and the probability P{|X_n| ≥ ε i.o.}; see Theorem 4.2.3 below.

(ii) If each E_n ∈ ℱ, then we have

$$P(\limsup_n E_n) = \lim_{m\to\infty} P\left( \bigcup_{n=m}^{\infty} E_n \right); \tag{2}$$

$$P(\liminf_n E_n) = \lim_{m\to\infty} P\left( \bigcap_{n=m}^{\infty} E_n \right). \tag{3}$$

Hence the hypothesis in (4) implies that P(F_m) → 0, and the conclusion in (4) now follows by (2).
As an illustration of the convenience of the new notions, we may restate
Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the
point being stressed here.
Having so chosen {n_k}, we let E_k be the event “|X_{n_k}| > 1/2^k”. Then we have by (4):

$$P\left\{ |X_{n_k}| > \frac{1}{2^k} \text{ i.o.} \right\} = 0.$$

[Note: Here the index involved in “i.o.” is k; such ambiguity being harmless and expedient.] This implies X_{n_k} → 0 a.e. (why?), finishing the proof.
EXERCISES
1. Prove that

$$P(\limsup_n E_n) \ge \limsup_n P(E_n), \qquad P(\liminf_n E_n) \le \liminf_n P(E_n).$$

where

$$\Lambda(m, n, n') = \left\{ \omega: \max_{n < j < k \le n'} |X_j(\omega) - X_k(\omega)| \le \frac{1}{m} \right\}.$$

Hence if the X_n's are r.v.'s, we have

$$P(C) = \lim_{m\to\infty} \lim_{n\to\infty} \lim_{n'\to\infty} P(\Lambda(m, n, n')).$$

then lim_{n→∞} X_n exists a.e. but it may be infinite. [HINT: Consider all pairs of rational numbers (a, b) and take a union over them.]
Now for any x ≥ 0, we have 1 − x ≤ e^{−x}; it follows that the last term above does not exceed

$$\prod_{n=m}^{m'} e^{-P(E_n)} = \exp\left( -\sum_{n=m}^{m'} P(E_n) \right).$$

Letting m′ → ∞, the right member above → 0, since the series in the exponent → +∞ by hypothesis. It follows by monotone property that

$$P\left( \bigcap_{n=m}^{\infty} E_n^c \right) = \lim_{m'\to\infty} P\left( \bigcap_{n=m}^{m'} E_n^c \right) = 0.$$

Thus the right member in (7) is equal to 0, and consequently so is the left member in (7). This is equivalent to P(E_n i.o.) = 1 by (1).
Theorems 4.2.1 and 4.2.4 together will be referred to as the
Borel–Cantelli lemma, the former the “convergence part” and the latter “the
divergence part”. The first is more useful since the events there may be
completely arbitrary. The second has an extension to pairwise independent
r.v.’s; although the result is of some interest, it is the method of proof to be
given below that is more important. It is a useful technique in probability
theory.
4.2 ALMOST SURE CONVERGENCE; BOREL–CANTELLI LEMMA 81
Theorem 4.2.5. The implication (6) remains true if the events fEn g are pair-
wise independent.
PROOF. Let In denote the indicator of En , so that our present hypothesis
becomes
8 8m 6D n:
E Im In D E Im E In .
1
Consider the series of r.v.’s: nD1 In ω. It diverges to C1 if and only if
an infinite number of its terms are equal to one, namely if ω belongs to an
infinite number of the En ’s. Hence the conclusion in (6) is equivalent to
1
9 P In D C1 D 1.
nD1
What has been said so far is true for arbitrary En ’s. Now the hypothesis in
(6) may be written as
1
E In D C1.
nD1
k
Consider the partial sum Jk D nD1 In . Using Chebyshev’s inequality, we
have for every A > 0:
2
Jk 1
10 P fjJk E Jk j A Jk g ½ 1 D1 ,
A 2 Jk
2 A2
2
where J denotes the variance of J. Writing
pn D E In D P En ,
we may calculate 2 Jk by using (8), as follows:
k
E Jk D E
2
In C 2
2
Im In
nD1 1m<nk
k
D E I2n C 2 E Im E In
nD1 1m<nk
k
k
D E In 2 C 2 E Im E In C fE In E In 2 g
nD1 1m<nk nD1
2
k
k
D pn C pn p2n .
nD1 nD1
82 CONVERGENCE CONCEPTS
Hence
& '
k
k
2
Jk D E J2k E Jk D
2
pn p2n D 2
In .
nD1 nD1
This calculation will turn out to be a particular case of a simple basic formula;
see (6) of Sec. 5.1. Since knD1 pn D E Jk ! 1, it follows that
EXERCISES
Jn
lim D 1 a.e.
n!1 E Jn
[HINT: Take a subsequence fkn g such that E Jnk ¾ k 2 ; prove the result first
for this subsequence by estimating P fjJk E Jk j > υE Jk g; the general
case follows because if nk n < nkC1 ,
Jnk /E JnkC1 Jn /E Jn JnkC1 /E Jnk .
P f lim Xn ½ 1g > 0.
n!1
Deduce from this that if (i) n P En D 1 and (ii) there exists c > 0 such
that we have
8m < n: P Em En cP Em P Enm ;
then
P flim sup En g > 0.
n
19. If P En D 1 and
n
⎧ ⎫(
⎨ n n ⎬
n 2
In this situation it is said that masses of amount ˛ and ˇ “have wandered off to C1
and 1 respectively.” The remedy here is obvious: we should consider measures on
the extended line RŁ D [1, C1], with possible atoms at fC1g and f1g. We
leave this to the reader but proceed to give the appropriate definitions which take into
account the two kinds of troubles discussed above.
which proves (ii); indeed a, b] may be replaced by a, b or [a, b or [a, b]
there. Next, since the set of atoms of is countable, the complementary set
D is certainly dense in R1 . If a 2 D, b 2 D, then a, b is a continuity interval
of . This proves (ii) ) (iii). Finally, suppose (iii) is true so that (1) holds.
Given any a, b and > 0, there exist a1 , a2 , b1 , b2 all in D satisfying
a < a1 < a < a2 < a C , b < b1 < b < b2 < b C .
By (1), there exists n0 such that if n ½ n0 , then
j n ai , bj ] ai , bj ]j
for i D 1, 2 and j D 1, 2. It follows that
a C , b a2 , b1 ] n a2 , b1 ] n a, b n a1 , b2 ]
a1 , b2 ] C a , b C C .
Thus (iii) ) (i). The theorem is proved.
As an immediate consequence, the vague limit is unique. More precisely,
if besides (1) we have also
8a 2 D0 , b 2 D0 , a < b: n a, b] ! 0
a, b],
then 0 . For let A be the set of atoms of and of 0
; then if a 2 Ac ,
b 2 Ac , we have by Theorem 4.3.1, (ii):
0
a, b] n a, b] ! a, b]
0
so that a, b] D a, b]. Now A is dense in R1 , hence the two measures
c
Theorem 4.3.2. Let f n g and be p.m.’s. Then (i), (ii), and (iii) in the
preceding theorem are equivalent to the following “uniform” strengthening
of (i).
(i0 ) For any υ > 0 and > 0, there exists n0 υ, such that if n ½ n0
then we have for every interval a, b, possibly infinite:
4 a C υ, b υ n a, b a υ, b C υ C .
4.3 VAGUE CONVERGENCE 87
aj < ajC1 aj C υ, 1 j 1;
and
5 a1 , a c < .
4
By (ii), there exist n0 depending on and (and so on and υ) such that if
n ½ n0 then
6 sup j aj , ajC1 ] n aj , ajC1 ]j .
1j1 4
which states: The set of all s.p.m.’s is sequentially compact with respect to
vague convergence. It is often referred to as “Helly’s extraction (or selection)
principle”.
If n is a p.m., then Fn is just its d.f. (see Sec. 2.2); in general Fn is increasing,
right continuous with Fn 1 D 0 and Fn C1 D n R1 1.
Let D be a countable dense set of R1 , and let frk , k ½ 1g be an
enumeration of it. The sequence of numbers fFn r1 , n ½ 1g is bounded, hence
by the Bolzano–Weierstrass theorem there is a subsequence fF1k , k ½ 1g of
the given sequence such that the limit
lim F1k r1 D 1
k!1
By Sec. 1.1, (vii), F is increasing and right continuous. Let C denote the set
of its points of continuity; C is dense in R1 and we show that
For, let x 2 C and > 0 be given, there exist r, r 0 , and r 00 in D such that
r < r 0 < x < r 00 and Fr 00 Fr . Then we have
Fr Gr 0 Fx Gr 00 Fr 00 Fr C ;
) )
⏐ ⏐
and ⏐ ⏐
⏐ ⏐
as in Theorem 2.2.2. Now the relation (8) yields, upon taking differences:
v
Thus nk ! , and the theorem is proved.
v v
We say that Fn converges vaguely to F and write Fn ! F for n !
where n and are the s.p.m.’s corresponding to the s.d.f.’s Fn and F.
The reader should be able to confirm the truth of the following proposition
about real numbers. Let fxn g be a sequence of real numbers such that every
subsequence that tends to a limit (š1 allowed) has the same value for the
limit; then the whole sequence tends to this limit. In particular a bounded
sequence such that every convergent subsequence has the same limit is
convergent to this limit.
The next theorem generalizes this result to vague convergence of s.p.m.’s.
It is not contained in the preceding proposition but can be reduced to it if we
use the properties of vague convergence; see also Exercise 9 below.
infinity such that the numbers nk a, b converge to a limit, say L 6D a, b.
By Theorem 4.3.3, the sequence f nk , k ½ 1g contains a subsequence, say
f n0 , k ½ 1g, which converges vaguely, hence to by hypothesis of the
k
theorem. Hence again by Theorem 4.3.1, (ii), we have
nk a, b
0 ! a, b.
EXERCISES
(The first assertion is the Vitali–Hahn–Saks theorem and rather deep, but it
can be proved by reducing it to a problem of summability; see A. Rényi, [24].
4.4 CONTINUATION 91
8. If n and are p.m.’s and n E ! E for every open set E, then
this is also true for every Borel set. [HINT: Use (7) of Sec. 2.2.]
9. Prove a convergence theorem in metric space that will include both
Theorem 4.3.3 for p.m.’s and the analogue for real numbers given before the
theorem. [HINT: Use Exercise 9 of Sec. 4.4.]
4.4 Continuation
We proceed to discuss another kind of criterion, which is becoming ever more
popular in measure theory as well as functional analysis. This has to do with
classes of continuous functions on R1 .
CK D the class of continuous functions f each vanishing outside a
compact set Kf;
C0 D the class of continuous functions f such that
limjxj!1 fx D 0;
CB D the class of bounded continuous functions;
C D the class of continuous functions.
The problem is then that of the approximation of the graph of a plane curve
by inscribed or circumscribed polygons, as treated in elementary calculus. But
let us remark that the lemma is also a particular case of the Stone–Weierstrass
theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such
a sledgehammer approach has its merit, as other kinds of approximation
soon to be needed can also be subsumed under the same theorem. Indeed,
the discussion in this section is meant in part to introduce some modern
terminology to the relevant applications in probability theory. We can now
state the following alternative criterion for vague convergence.
v
Theorem 4.4.1. Let f ng
and be s.p.m.’s. Then n ! if and only if
2 8f 2 CK [or C0 ]: fx n dx ! fx dx.
R1 R1
v
PROOF. Suppose n ! ; (2) is true by definition when f is the indicator
of a, b] for a 2 D, b 2 D, where D is the set in (1) of Sec. 4.3. Hence by the
linearity of integrals it is also true when f is any D-valued step function. Now
let f 2 C0 and > 0; by the approximation lemma there exists a D-valued
step function f satisfying (1). We have
3 f d n f d f f d n C f d n f d
C f f d .
By the modulus inequality and mean value theorem for integrals (see Sec. 3.2),
the first term on the right side above is bounded by
jf f j d n d n ;
similarly for the third term. The second term converges to zero as n ! 1
because f is a D-valued step function. Hence the left side of (3) is bounded
by 2 as n ! 1, and so converges to zero since is arbitrary.
Conversely, suppose (2) is true for f 2 CK . Let A be the set of atoms
of as in the proof of Theorem 4.3.2; we shall show that vague convergence
holds with D D Ac . Let g D 1a,b] be the indicator of a, b] where a 2 D,
b 2 D. Then, given > 0, there exists υ > 0 such that a C υ < b υ, and
such that U < where
U D a υ, a C υ [ b υ, b C υ.
Now define g1 to be the function that coincides with g on 1, a] [ [a C
υ, b υ] [ [b, 1 and that is linear in a, a C υ and in b υ, b; g2 to be
the function that coincides with g on 1, a υ] [ [a, b] [ [b C υ, 1 and
4.4 CONTINUATION 93
# #
5 g1 d gd g2 d .
and is arbitrary, it follows that the middle term in (4) also converges to that
in (5), proving the assertion.
by (8). A similar estimate holds with replacing n above, by (7). Now the
argument leading from (3) to (2) finishes the proof of (6) in the same way.
v
This proves that n ! implies (6); the converse has already been proved
in Theorem 4.4.1.
Theorems 4.3.3 and 4.3.4 deal with s.p.m.’s. Even if the given sequence
f n g consists only of strict p.m.’s, the sequential vague limit may not be so.
This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that
such a limit be a p.m. The following criterion is not deep, but applicable.
PROOF. Suppose (11) holds. For any sequence f n g from the family, there
v
exists a subsequence f 0n g such that 0n ! . We show that is a p.m. Let J
be a continuity interval of which contains the I in (11). Then
0 0
R1 ½ J D lim n J ½ lim n I ½ 1 .
n n
There are several equivalent definitions (see, e.g., Royden [5]) but
the following characterization is most useful: f is bounded and lower
semicontinuous if and only if there exists a sequence of functions fk 2 CB
which increases to f everywhere, and we call f upper semicontinuous iff f
is lower semicontinuous. Usually f is allowed to be extended-valued; but to
avoid complications we will deal with bounded functions only and denote
by L and U respectively the classes of bounded lower semicontinuous and
bounded upper semicontinuous functions.
v
Theorem 4.4.4. If f n g and are p.m.’s, then n ! if and only if one
of the two conditions below is satisfied:
13 8f 2 L : lim fx n dx ½ fx dx
n
8g 2 U : lim gx n dx gx dx.
n
PROOF. We begin by observing that the two conditions above are equiv-
v
alent by putting f D g. Now suppose n ! and let fk 2 CB , fk " f.
Then we have
14 lim fx n dx ½ lim fk x n dx D fk x dx
n n
which proves
lim ϕx n dx D ϕx dx.
n
v
Hence n ! by Theorem 4.4.2.
(a) Xn C Yn ! X in dist.
(b) Xn Yn ! 0 in dist.
PROOF. We begin with the remark that for any constant c, Yn ! c in dist.
is equivalent to Yn ! c in pr. (Exercise 4 below). To prove (a), let f 2 CK ,
jfj M. Since f is uniformly continuous, given > 0 there exists υ such
that jx yj υ implies jfx fyj . Hence we have
This means P fjXn j > A0 g < for n > n0 . Furthermore we choose A ½ A0
so that the same inequality holds also for n n0 . Now it is clear that
! " ! "
P fjXn Yn j > g P fjXn j > Ag C P jYn j > C P jYn j > .
A A
The last-written probability tends to zero as n ! 1, and (b) follows.
EXERCISES
1. Let v
n and be p.m.’s such that n ! . Show that the conclusion
in (2) need not hold if (a) f is bounded and Borel measurable and all n
and are absolutely continuous, or (b) f is continuous except at one point
and every n is absolutely continuous. (To find even sharper counterexamples
would not be too easy, in view of Exercise 10 of Sec. 4.5.)
v
2. Let n ! when the n ’s are s.p.m.’s. Then for each f 2 C and
each finite continuity interval I we have I f d n ! I f d .
3. Let
n and be as in Exercise 1. If the fn ’s arebounded continuous
functions converging uniformly to f, then fn d n ! f d .
4. Give an example to show that convergence in dist. does not imply that
in pr. However, show that convergence to the unit mass υa does imply that in
pr. to the constant a.
5. A set f ˛ g of p.m.’s is tight if and only if the corresponding d.f.’s
fF˛ g converge uniformly in ˛ as x ! 1 and as x ! C1.
6. Let the r.v.’s fX˛ g have the p.m.’s f ˛ g. If for some real r > 0,
E fjX˛ jr g is bounded in ˛, then f ˛ g is tight.
7. Prove the Corollary to Theorem 4.4.4.
8. If the r.v.’s X and Y satisfy
P fjX Yj ½ g
where C1 is allowed as a value for each member with the usual convention.
In case of convergence in L r , r > 1, we have by Minkowski’s inequality
(Sec. 3.2), since X D Xn C X Xn D Xn Xn X:
E jXn jr 1/r E jXn Xjr 1/r E jXjr 1/r E jXn jr 1/r C E jXn Xjr 1/r .
Letting n ! 1 we obtain the second assertion of the theorem. For 0 < r 1,
the inequality jx C yjr jxjr C jyjr implies that
E jXn jr E jX Xn jr E jXjr E jXn jr C E jX Xn jr ,
whence the same conclusion.
The next result should be compared with Theorem 4.1.4.
100 CONVERGENCE CONCEPTS
Next we have
1
jfA x x j dFn x
r
jxj dFn x D
r
jXn jr dP
1 jxj>A jXn j>A
1 M
jXn jp dP .
Apr Apr
The last term does not depend
1 on n, and converges to zero as A ! 1. It
1
follows that as A ! 1, 1 fA dFn converges uniformly in n to 1 x r dF.
Hence by a standard theorem on the inversion of repeated limits, we have
1 1 1
4 x r dF D lim fA dF D lim lim fA dFn
1 A!1 1 A!1 n!1 1
1 1
D lim lim fA dFn D lim x r dFn .
n!1 A!1 1 n!1 1
We now introduce the concept of uniform integrability, which is of basic
importance in this connection. It is also an essential hypothesis in certain
convergence questions arising in the theory of martingales (to be treated in
Chapter 9).
uniformly in t 2 T.
4.5 UNIFORM INTEGRABILITY; CONVERGENCE OF MOMENTS 101
Theorem 4.5.3. The family fXt g is uniformly integrable if and only if the
following two conditions are satisfied:
Given > 0, there exists A D A such that the last-written integral is less
than /2 for every t, by (5). Hence (b) will follow if we set υ D /2A. Thus
(5) implies (b).
Conversely, suppose that (a) and (b) are true. Then by the Chebyshev
inequality we have for every t,
E jXt j M
P fjXt j > Ag ,
A A
where M is the bound indicated in (a). Hence if A > M/υ, then P Et < υ
and we have by (b):
jXt j dP < .
Et
valid for all r > 0, together with Theorem 4.5.3 then implies that the sequence
fjXn Xjr g is also uniformly integrable. For each > 0, we have
102 CONVERGENCE CONCEPTS
6 jXn Xjr dP D jXn Xjr dP C jXn Xjr dP
jXn Xj> jXn Xj
jXn Xjr dP C r .
jXn Xj>
where the inequalities follow from the shape of fA , while the limit relation
in the middle as in the proof of Theorem 4.4.5. Subtracting from the limit
relation in (iii), we obtain
lim jXn j dP
r
jXjr dP .
n!1 jXn jr >AC1 jXjr >A
The last integral does not depend on n and converges to zero as A ! 1. This
means: for any > 0, there exists A0 D A0 and n0 D n0 A0 such that
we have
sup jXn jr dP <
n>n0 jXn jr >AC1
provided that A > A0 . Since each jXn jr is integrable, there exists A1 D A1
such that the supremum above may be taken over all n ½ 1 provided that
A > A0 _ A1 . This establishes (i), and completes the proof of the theorem.
we say that “the moment problem is determinate” for the sequence. Of course
an arbitrary sequence of numbers need not be the sequence of moments for
any d.f.; a necessary but far from sufficient condition, for example, is that
the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these
questions here but shall content ourselves with the useful result below, which
is often referred to as the “method of moments”; see also Theorem 6.4.5.
Theorem 4.5.5. Suppose there is a unique d.f. F with the moments fmr , r ½
1g, all finite. Suppose that fFn g is a sequence of d.f.’s, each of which has all
its moments finite: 1
mn D
r
x r dFn .
1
Since mn2k ! m2 < 1, it follows that as A ! 1, the left side converges
uniformly in k to one. Letting A ! 1 along a sequence of points such that
104 CONVERGENCE CONCEPTS
both šA belong to the dense set D involved in the definition of vague conver-
gence, we obtain as in (4) above:
R1 D lim A, CA D lim lim nk A, CA
A!1 A!1 k!1
Now for each r, let p be the next larger even integer. We have
1
x p d nk D mnp
k
! mp ,
1
hence mnp
k
is bounded in k. It follows from Theorem 4.5.2 that
1 1
x r d nk ! xr d .
1 1
r
But the left side also converges to m by (8). Hence by the uniqueness
hypothesis is the p.m. determined by F. We have therefore proved that every
vaguely convergent subsequence of f n g, or equivalently fFn g, has the same
limit , or equivalently F. Hence the theorem follows from Theorem 4.3.4.
EXERCISES
is uniformly integrable.
10. Suppose the distributions of fX , 1 n 1g are absolutely contin-
n
uous with densities fgn g such that gn ! g1 in Lebesgue measure. Then
gn ! g1 in L 1 1, 1, and consequently for every bounded Borel measur-
C
able
function f we have E ffX n g ! E ffX1 g. [ HINT: g1 gn dx D
C
g1 gn dx and g1 gn g1 ; use dominated convergence.]
Law of large numbers.
5 Random series
even though they overlap each other to some extent, for it is just as important
to learn the basic techniques as the results themselves.
The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to
which if Zn is any sequence of r.v.’s, then E Z2n ! 0 implies that Zn ! 0
in pr. and Znk ! 0 a.e. for a subsequence fnk g. Applied to Zn D Sn /n, the
first assertion becomes
Sn
2 E Sn2 D on2 ) ! 0 in pr.
n
Now we can calculate E Sn2 more explicitly as follows:
⎛⎛ ⎞2 ⎞ ⎛ ⎞
⎜ n
⎟ n
3 E Sn D E ⎝
2 ⎝ ⎠
Xj ⎠ D E ⎝ X2j C 2 Xj Xk ⎠
jD1 jD1 1j<kn
n
D E X2j C 2 E Xj Xk .
jD1 1j<kn
Observe that there are n2 terms above, so that even if all of them are bounded
by a fixed constant, only E Sn2 D On2 will result, which falls critically short
of the hypothesis in (2). The idea then is to introduce certain assumptions to
cause enough cancellation among the “mixed terms” in (3). A salient feature
of probability theory and its applications is that such assumptions are not only
permissible but realistic. We begin with the simplest of its kind.
DEFINITION. Two r.v.’s X and Y are said to be uncorrelated iff both have
finite second moments and
4 E XY D E XE Y.
The r.v.’s of any family are said to be uncorrelated [orthogonal] iff every two
of them are.
without it the definitions are hardly useful. Finally, it is obvious that pairwise
independence implies uncorrelatedness, provided second moments are finite.
If fXn g is a sequence of uncorrelated r.v.’s, then the sequence fXn
E Xn g is orthogonal, and for the latter (3) reduces to the fundamental relation
below:
n
6 2
Sn D 2
Xj ,
jD1
which may be called the “additivity of the variance”. Conversely, the validity
of (6) for n D 2 implies that X1 and X2 are uncorrelated. There are only n
terms on the right side of (6), hence if these are bounded by a fixed constant
we have now 2 Sn D On D on2 . Thus (2) becomes applicable, and we
have proved the following result.
Theorem 5.1.1. If the Xj ’s are uncorrelated and their second moments have
a common bound, then (1) is true in L 2 and hence also in pr.
Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds
also a.e.
PROOF. Without loss of generality we may suppose that E Xj D 0 for
each j, so that the Xj ’s are orthogonal. We have by (6):
E Sn2 Mn,
where M is a bound for the second moments. It follows by Chebyshev’s
inequality that for each > 0 we have
Mn M
P fjSn j > ng D 2.
n2 2 n
If we sum this over n, the resulting series on the right diverges. However, if
we confine ourselves to the subsequence fn2 g, then
M
P fjSn2 j > n2 g D < 1.
n n
n2 2
Hence by Theorem 4.2.1 (Borel–Cantelli) we have
7 P fjSn2 j > n2 i.o.g D 0;
5.1 SIMPLE LIMIT THEOREMS 109
Then we have
2
nC1
E Dn2 2nE jSnC12 Sn2 j2 D 2n 2
Xj 4n2 M
jDn2 C1
10 ω D Ðx1 x2 . . . xn . . . .
Except for the countable set of terminating decimals, for which there are two
distinct expansions, this representation is unique. Fix a k: 0 k 9, and let
110 LAW OF LARGE NUMBERS. RANDOM SERIES
n
k ω denote the number of digits among the first n digits of ω that are
equal to k. Then kn ω/n is the relative frequency of the digit k in the first
n places, and the limit, if existing:
n
k ω
11 lim D ϕk ω,
n!1 n
may be called the frequency of k in ω. The number ω is called simply normal
(to the scale 10) iff this limit exists for each k and is equal to 1/10. Intuitively
all ten possibilities should be equally likely for each digit of a number picked
“at random”. On the other hand, one can write down “at random” any number
of numbers that are “abnormal” according to the definition given, such as
Ð1111 . . ., while it is a relatively difficult matter to name even one normal
number in the sense of Exercise 5 below. It turns out that the number
Ð12345678910111213 . . . ,
Theorem 5.1.3. Except for a Borel set of measure zero, every number in
[0, 1] is simply normal.
PROOF. Consider the probability space U , B, m in Example 2 of
Sec. 2.2. Let Z be the subset of the form m/10n for integers n ½ 1, m ½ 1,
then mZ D 0. If ω 2 U nZ, then it has a unique decimal expansion; if ω 2 Z,
it has two such expansions, but we agree to use the “terminating” one for the
sake of definiteness. Thus we have
ω D Ð1 2 . . . n . . . ,
where for each n ½ 1, n Ð is a Borel measurable function of ω. Just as in
Example 4 of Sec. 3.3, the sequence fn , n ½ 1g is a sequence of independent
r.v.’s with
P fn D kg D 101
, k D 0, 1, . . . , 9.
Indeed according to Theorem 5.1.2 we need only verify that the n ’s are
uncorrelated, which is a very simple matter. For a fixed k we define the
5.1 SIMPLE LIMIT THEOREMS 111
r.v. Xn to be the indicator of the set fω: n ω D kg, then E Xn D 1/10,
E Xn 2 D 1/10, and
1
n
Xj ω
n jD1
is the relative frequency of the digit k in the first n places of the decimal for
ω. According to Theorem 5.1.2, we have then
Sn 1
! a.e.
n 10
Hence in the notation of (11), we have P fϕk D 1/10g D 1 for each k and
consequently also 9 #
$
1
P ϕk D D 1,
kD0
10
which means that the set of normal numbers has Borel measure one.
Theorem 5.1.3 is proved.
The preceding theorem makes a deep impression (at least on the older
generation!) because it interprets a general proposition in probability theory
at a most classical and fundamental level. If we use the intuitive language of
probability such as coin-tossing, the result sounds almost trite. For it merely
says that if an unbiased coin is tossed indefinitely, the limiting frequency of
“heads” will be equal to 12 — that is, its a priori probability. A mathematician
who is unacquainted with and therefore skeptical of probability theory tends
to regard the last statement as either “obvious” or “unprovable”, but he can
scarcely question the authenticity of Borel’s theorem about ordinary decimals.
As a matter of fact, the proof given above, essentially Borel’s own, is a
lot easier than a straightforward measure-theoretic version, deprived of the
intuitive content [see, e.g., Hardy and Wright, An introduction to the theory of
numbers, 3rd. ed., Oxford University Press, Inc., New York, 1954].
EXERCISES
1. For any sequence of r.v.’s fXn g, if E X2n ! 0, then (1) is true in pr.
but not necessarily a.e.
2. Theorem 5.1.2 may be sharpened as follows: under the same hypo-
theses we have Sn /n˛ ! 0 a.e. for any ˛ > 34 .
3. Theorem 5.1.2 remains true if the hypothesis of bounded second mo-
ments is weakened to: 2 Xn D On where 0 < 12 . Various combina-
tions of Exercises 2 and 3 are possible.
4. If fX g are independent r.v.’s such that the fourth moments E X4
n n
have a common bound, then (1) is true a.e. [This is Cantelli’s strong law of
112 LAW OF LARGE NUMBERS. RANDOM SERIES
large numbers. Without using Theorem 5.1.2 we may operate with E Sn4 /n4
as we did with E Sn2 /n2 . Note that the full strength of independence is not
needed.]
5. We may strengthen the definition of a normal number by considering
blocks of digits. Let r ½ 1, and consider the successive overlapping blocks of
r consecutive digits in a decimal; there are n r C 1 such blocks in the first
n places. Let n ω denote the number of such blocks that are identical with
a given one; for example, if r D 5, the given block may be “21212”. Prove
that for a.e. ω, we have for every r:
n
ω 1
lim D r
n!1 n 10
The law of large numbers in the form (1) of Sec. 5.1 involves only the first
moment, but so far we have operated with the second. In order to drop any
assumption on the second moment, we need a new device, that of “equivalent
sequences”, due to Khintchine (1894–1959).
5.2 WEAK LAW OF LARGENUMBERS 113
DEFINITION. Two sequences of r.v.’s fXn g and fYn g are said to be equiv-
alent iff
1 P fXn 6D Yn g < 1.
n
1
n
2 Xj Yj ! 0 a.e.
an jD1
PROOF. By the Borel–Cantelli lemma, (1) implies that
P fXn 6D Yn i.o.g D 0.
This means that there exists a null set N with the following property: if
ω 2 nN, then there exists n0 ω such that
Thus for such an ω, the two numerical sequences fXn ωg and fYn ωg differ
only in a finite number of terms (how many depending on ω). In other words,
the series
Xn ω Yn ω
n
consists of zeros from a certain point on. Both assertions of the theorem are
trivial consequences of this fact.
respectively. In particular, if
1
n
Xj
an jD1
1
n
Yj .
an jD1
To prove the last assertion of the corollary, observe that by Theorem 4.1.2
the relation (2) holds also in pr. Hence if
1
n
Xj ! X in pr.,
an jD1
then we have
1 1 1
n n n
Yj D Xj C Yj Xj ! X C 0 D X in pr.
an jD1 an jD1 an jD1
The next law of large numbers is due to Khintchine. Under the stronger
hypothesis of total independence, it will be proved again by an entirely
different method in Chapter 6.
By the corollary above, (3) will follow if (and only if) we can prove Tn /n !
m in pr. Now the Yn ’s are also pairwise independent by Theorem 3.3.1
(applied to each pair), hence they are uncorrelated, since each, being bounded,
has a finite second moment. Let us calculate 2 Tn ; we have by (6) of
Sec. 5.1,
n
n n
2
Tn D 2
Yj E Y2j D x 2 dFx.
jD1 jD1 jD1 jxjj
which is On2 , but not on2 as required by (2) of Sec. 5.1. To improve
on it, let fan g be a sequence of integers such that 0 < an < n, an ! 1 but
an D on. We have
n
x 2 dFx D C
jD1 jxjj jan an <jn
an jxj dFx C an jxj dFx
jan jxjan an <jn jxjan
C n jxj dFx
an <jn an <jxjn
1
nan jxj dFx C n2 jxj dFx.
1 jxj>an
The first term is Onan D on2 ; and the second is n2 o1 D on2 , since
the set fx: jxj > an g decreases to the empty set and so the last-written inte-
gral above converges to zero. We have thus proved that 2 Tn D on2 and
116 LAW OF LARGE NUMBERS. RANDOM SERIES
1
n
Tn E Tn
D fYj E Yj g ! 0 in pr.
n n jD1
It follows that
1
n
Tn
D Yj ! m in pr.,
n n jD1
as was to be proved.
For totally independent r.v.’s, necessary and sufficient conditions for
the weak law of large numbers in the most general formulation, due to
Kolmogorov and Feller, are known. The sufficiency of the following crite-
rion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and
Kolmogorov [12]).
then if we put
n
5 an D x dFj x,
jD1 jxjbn
we have
1
6 Sn an ! 0 in pr.
bn
Next suppose that the Fn ’s have the property that there exists a > 0
such that
7 8n: Fn 0 ½ , 1 Fn 0 ½ .
5.2 WEAK LAW OF LARGENUMBERS 117
Then if (6) holds for the given fbn g and any sequence of real numbers fan g,
the conditions (i) and (ii) must hold.
(6) is proved.
As an application of Theorem 5.2.3 we give an example where the weak
but not the strong law of large numbers holds.
Example. Let fXn g be independent r.v.’s with a common d.f. F such that
c
P fX1 D ng D P fX1 D ng D , n D 3, 4, . . . ,
n2 log n
where c is the constant 1 1
1 1
.
2 nD3
n2 log n
Thus conditions (i) and (ii) are satisfied with bn D n; and we have an D 0 by (5).
Hence Sn /n ! 0 in pr. in spite of the fact that E jX1 j D C1. On the other hand,
we have
c
P fjX1 j > ng ¾ ,
n log n
so that, since X1 and Xn have the same d.f.,
P fjXn j > ng D P fjX1 j > ng D 1.
n n
But jSn Sn1 j D jXn j > n implies jSn j > n/2 or jSn1 j > n/2; it follows that
! n "
P jSn j > i.o. D 1,
2
and so it is certainly false that Sn /n ! 0 a.e. However, we can prove more. For any
A > 0, the same argument as before yields
P fjXn j > An i.o.g D 1
and consequently
An
P jSn j > i.o. D 1.
2
This means that for each A there is a null set ZA such that if ω 2 nZA, then
Sn ω A
10 lim ½ .
n!1 n 2
Let Z D 1 mD1 Zm; then Z is still a null set, and if ω 2 nZ, (10) is true for every
A, and therefore the upper limit is C1. Since X is “symmetric” in the obvious sense,
it follows that
Sn Sn
lim D 1, lim D C1 a.e.
n!1 n n!1 n
EXERCISES
n
Sn D Xj .
jD1
n
lim pk 1 pnk D 0
n!1 k
jknpj>nυ
p
and that an D o nbn .
8. They also imply that
n
1
x dFj x D o1.
bn jD1 bj <jxjbn
[HINT: Use the first part of Exercise 7 and divide the interval of integration
bj < jxj bn into parts of the form k < jxj kC1 with > 1.]
9. A median of the r.v. X is any number ˛ such that
P fX ˛g ½ 12 , P fX ½ ˛g ½ 12 .
Show that such a number always exists but need not be unique.
10. Let fX , 1 n 1g be arbitrary r.v.’s and for each n let m be a
n n
median of Xn . Prove that if Xn ! X1 in pr. and m1 is unique, then mn !
m1 . Furthermore, if there exists any sequence of real numbers fcn g such that
Xn cn ! 0 in pr., then Xn mn ! 0 in pr.
11. Derive the following form of the weak law of large numbers from
Theorem 5.2.3. Let fbn g be as in Theorem 5.2.3 and put Xn D 2bn for n ½ 1.
Then there exists fan g for which (6) holds but condition (i) does not.
5.3 CONVERGENCE OF SERIES 121
then Sn /n ! 0 in pr.
13. Let fXn g be a sequence of identically distributed strictly positive
random variables. For any ϕ such that ϕn/n ! 0 as n ! 1, show that
P fSn > ϕn i.o.g D 1, and so Sn ! 1 a.e. [HINT: Let Nn denote the number
of k n such that Xk ϕn/n. Use Chebyshev’s inequality to estimate
P fNn > n/2g and so conclude P fSn > ϕn/2g ½ 1 2Fϕn/n. This pro-
blem was proposed as a teaser and the rather unexpected solution was given
by Kesten.]
14. Let fbn g be as in Theorem 5.2.3. and put Xn D 2bn for n ½ 1. Then
there exists fan g for which (6) holds, but condition (i) does not hold. Thus
condition (7) cannot be omitted.
let us define
ω D minfj: 1 j n, jSj ωj > g.
Clearly is an r.v. with domain 3. Put
3k D fω: ω D kg D fω: max jSj ωj , jSk ωj > g,
1jk1
where for k D 1, max1j0 jSj ωj is taken to be zero. Thus is the “first
time” that the indicated maximum exceeds , and 3k is the event that this
occurs “for the first time at the kth step”. The 3k ’s are disjoint and we have
n
3D 3k .
kD1
It follows that
n
n
2 Sn dP D
2
Sn2 dP D [Sk C Sn Sk ]2 dP
3 kD1 3k kD1 3k
n
D [Sk2 C 2Sk Sn Sk C Sn Sk 2 ] dP .
kD1 3k
Let ϕk denote the indicator of 3k , then the two r.v.’s ϕk Sk and Sn Sk are
independent by Theorem 3.3.2, and consequently (see Exercise 9 of Sec. 3.3)
Sk Sn Sk dP D ϕk Sk Sn Sk dP
3k
D ϕk Sk dP Sn Sk dP D 0,
since the last-written integral is
n
E Sn Sk D E Xj D 0.
jDkC1
n
½ 2 P 3k D 2 P 3,
kD1
5.3 CONVERGENCE OF SERIES 123
where the last inequality is by the mean value theorem, since jSk j > on 3k by
definition. The theorem now follows upon dividing the inequality above by 2 .
Theorem 5.3.2. Let fXn g be independent r.v.’s with finite means and sup-
pose that there exists an A such that
3 8n: jXn E Xn j A < 1.
Then for every > 0 we have
2A C 42
4 P f max jSj j g 2 S
.
1jn n
PROOF. Let M0 D , and for 1 k n:
1k D Mk1 Mk .
We may suppose that P Mn > 0, for otherwise (4) is trivial. Furthermore,
let S00 D 0 and for k ½ 1,
k
X0k D Xk E Xk , Sk0 D X0j .
jD1
Now we write
0
6 SkC1 akC1 2 dP D Sk0 ak C ak akC1 C X0kC1 2 dP
MkC1 Mk
Sk0 ak C ak akC1 C X0kC1 2 dP
1kC1
and denote the two integrals on the right by I1 and I2 , respectively. Using the
definition of Mk and (3), we have
1
0
jSk ak j D Sk E Sk [Sk E Sk ] dP
P Mk Mk
1
D Sk Sk dP jSk j C ;
P Mk Mk
124 LAW OF LARGE NUMBERS. RANDOM SERIES
1 1
jak akC1 j D Sk dP Sk dP
P Mk Mk P MkC1 MkC1
1
7 XkC1 dP 2 C A.
0
P MkC1 MkC1
It follows that, since jSk j on 1kC1 ,
I2 jSk j C C 2 C A C A2 dP 4 C 2A2 P 1kC1 .
1kC1
The integrals of the last three terms all vanish by (5) and independence, hence
0
0
I1 ½ Sk ak dP C
2
XkC12
dP
Mk Mk
D Sk0 ak 2 dP C P Mk 2
XkC1 .
Mk
n
½ P Mn 2
Xj 4 C 2A2 P nMn ,
jD1
hence
n
2A C 42 ½ P Mn 2
Xj ,
jD1
which is (4).
5.3 CONVERGENCE OF SERIES 125
Theorem 5.3.3. Let fXn g be independent r.v.’s and define for a fixed con-
stant A > 0:
Xn ω, if jXn ωj A;
Yn ω D
0, if jXn ωj > A.
Then the series n Xn converges a.e. if and only if the following three series
all converge:
(i) n P fjXn j > Ag D n P fXn 6D Yn g,
(ii) n E Yn ,
2
(iii) n Yn .
PROOF. Suppose that the three series converge. Applying Theorem 5.3.1
to the sequence fYn E Yn g, we have for every m ½ 1:
⎧ ⎫
⎨ ⎬ n0
k 1
max
fYj E Yj g ½1m 2 2
Yj .
⎩nkn0
P
m ⎭
jDn jDn
Theorem 5.3.4. If fXn g is a sequence of independent r.v.’s, then the conver-
gence of the series n Xn in pr. is equivalent to its convergence a.e.
PROOF. By Theorem 4.1.2, it is sufficient to prove that convergence of
n n in pr. implies its convergence a.e. Suppose the former; then, given
X
: 0 < < 1, there exists m0 such that if n > m > m0 , we have
8 P fjSm,n j > g < ,
where
n
Sm,n D Xj .
jDmC1
where the sets in the union are disjoint. Going to probabilities and using
independence, we obtain
n
P f max jSm,j j 2; jSm,k j > 2gP fjSk,n j g P fjSm,n j > g.
m<jk1
kDmC1
This inequality is due to Ottaviani. By (8), the second factor on the left exceeds
1 , hence if m > m0 ,
1
10 P f max jSm,j j > 2g P fjSm,n j > g < .
m<jn 1 1
Example. n š1/n.
This is meant to be the “harmonic series” with a random choice of signs in each
term, the choices being totally independent and equally likely to be C or in each
case. More precisely, it is the series
Xn
,
n
n
EXERCISES
[HINT: Let
3k D f max Sj < x; Sk ½ xg
1j<k
n p p
then kD1P f3k ; Sn Sk ½ 2ng P fSn ½ x 2ng.]
3. Theorem 5.3.2 has the following companion, which is easier to prove.
Under the joint hypotheses in Theorems 5.3.1 and 5.3.2, we have
A C 2
P f max jSj j g 2 S
.
1jn n
4. Let fXn , X0n , n ½ 1g be independent r.v.’s such that Xn and X0n have
the same distribution. Suppose further that all these r.v.’s are bounded by the
same constant A. Then
Xn X0n
n
Use Exercise 3 to prove this without recourse to Theorem 5.3.3, and so finish
the converse part of Theorem 5.3.3.
5. But neither Theorem 5.3.2 nor the alternative indicated in the prece-
ding exercise is necessary; what we need is merely the following result, which
is an easy consequence of a general theorem in Chapter 7. Let fXn g be a
sequence of independent and uniformly bounded r.v.’s with 2 Sn ! C1.
Then for every A > 0 we have
lim P fjSn j Ag D 0.
n!1
converges a.e. Prove that the limit has the Cantor d.f. discussed in Sec. 1.3.
Do Exercise 11 in that section again; it is easier now.
10. If šX converges a.e. for all choices of š1, where the X ’s are
n n n
arbitrary r.v.’s, then n Xn 2 converges a.e. [HINT: Consider n rn tXn ω
where the rn ’s are coin-tossing r.v.’s and apply Fubini’s theorem to the space
of t, ω.]
xn D an bn bn1
and
1 1 1
n n n1
xj D aj bj bj1 D bn bj ajC1 aj
an jD1 an jD1 an jD0
1
n1
ajC1 aj D 1,
an jD0
and bn ! b1 , we have
1
n
xj ! b1 b1 D 0.
an jD1
Thus, for the r.v.’s fYn E Yn g/an , the series (iii) in Theorem 5.3.3 con-
verges, while the two other series vanish for A D 2, since jYn E Yn j 2an ;
hence
1
5 fYn E Yn g converges a.e.
n
an
Next we have
jE Yn j 1
1
D
x dFn x D x dFn x
an
an jxjan
an jxj>an
n n n
jxj
dFn x,
n jxj>an an
1
where the second equation follows from 1 x dFn x D 0. By the first hypo-
thesis in (1), we have
jxj ϕx
for jxj > an .
an ϕan
It follows that
jE Yn j ϕx E ϕXn
dFn x < 1.
n
an n jxj>an ϕan n
ϕan
This and (5) imply that n Yn /an converges a.e. Finally, since ϕ ", we have
ϕx
P fXn 6D Yn g D dFn x dFn x
n n jxj>an n jxj>an ϕan
E ϕXn
< 1.
n
ϕan
Thus, fXn g and fYn g are equivalent sequences and (3) follows.
132 LAW OF LARGE NUMBERS. RANDOM SERIES
In case all n2 D 1 so that sn2 D n, the above ratio has a denominator that is
close to n1/2 . Later we shall see that n1/2 is a critical order of magnitude
for Sn .
5.4 STRONG LAW OF LARGENUMBERS 133
We now come to the strong version of Theorem 5.2.2 in the totally inde-
pendent case. This result is also due to Kolmogorov.
by Theorem 3.2.1, fXn g and fYn g are equivalent sequences. Let us apply (7)
to fYn E Yn g, with ϕx D x 2 . We have
2 Yn E Y2 1
10 2
2
n
D 2
x 2 dFx.
n
n n
n n
n jxjn
We are obliged to estimate the last written second moment in terms of the first
moment, since this is the only one assumed in the hypothesis. The standard
technique is to split the interval of integration and then invert the repeated
summation, as follows:
1 n
1
2
x 2 dFx
nD1
n jD1 j1<jxjj
1
1
1
D x 2 dFx 2
jD1 j1<jxjj nDj
n
1
1
C
j jxj dFx Ð C Ð jxj dFx
jD1 j1<jxjj j jD1 j1jxjj
D CE jX1 j < 1.
In the above we have used the elementary estimate 1 nDj n
2
Cj1 for
some constant C and all j ½ 1. Thus the first sum in (10) converges, and we
conclude by (7) that
1
n
fYj E Yj g ! 0 a.e.
n jD1
134 LAW OF LARGE NUMBERS. RANDOM SERIES
PROOF. Writing
1
dFx D dFx,
jxj½an kDn ak jxj<akC1
5.4 STRONG LAW OF LARGENUMBERS 135
substituting into (12) and rearranging the double series, we see that the series
in (12) converges if and only if
13 k dFx < 1.
k ak1 jxj<ak
Next, with a0 D 0:
1
Y2n
E x 2 dFx
n
an2 n
an2 jxj<an
1 n
1
D 2
x 2 dFx
nD1
an kD1 ak1 jxj<ak
1
1
1
dFxak2 .
kD1 ak1 jxj<ak a2
nDk n
and so
1
Y2n
E 2k dFx < 1
n
an2 kD1 ak1 jxj<ak
by (13). Hence Yn /an converges (absolutely) a.e. by Theorem 5.4.1, and
so by Kronecker’s lemma:
1
n
15 Yk ! 0 a.e.
an kD1
136 LAW OF LARGE NUMBERS. RANDOM SERIES
because naj /an j for j n. We may now as well replace the n in the
right-hand member above by 1; as N ! 1, it tends to 0 as the remainder of
the convergent series in (13). Thus the quantity in (16) tends to 0 as n ! 1;
combine this with (14) and (15), we obtain the first alternative in (11).
The second alternative is proved in much the same way as in Theorem 5.4.2
and is left as an exercise. Note that when an D n it reduces to (9) above.
EXERCISES
1
n
cj Xj ! 0 a.e.
n jD1
(i) Sn /n ! 0 in pr.,
(ii) S2n /2n ! 0 a.e.;
[This result, due to P. Lévy and Marcinkiewicz, was stated with a superfluous
condition on fan g. Proceed as in Theorem 5.3.3 but truncate Xn at an ; direct
estimates are easy.]
11. Prove the second alternative in Theorem 5.4.3.
12. If E X1 6D 0, then max1kn jXk j/jSn j ! 0 a.e. [HINT: jXn j/n !
0 a.e.]
13. Under the assumptions in Theorem 5.4.2, if Sn /n converges a.e. then
E jX1 j < 1. [Hint: Xn /nconverges to 0 a.e., hence P fjXn j > n i.o.g D 0;
use Theorem 4.2.4 to get n P fjX1 j > ng < 1.]
5.5 Applications
The law of large numbers has numerous applications in all parts of proba-
bility theory and in other related fields such as combinatorial analysis and
statistics. We shall illustrate this by two examples involving certain important
new concepts.
The first deals with so-called “empiric distributions” in sampling theory.
Let fXn , n ½ 1g be a sequence of independent, identically distributed r.v.’s
with the common d.f. F. This is sometimes referred to as the “underlying”
or “theoretical distribution” and is regarded as “unknown” in statistical lingo.
For each ω, the values Xn ω are called “samples” or “observed values”, and
the idea is to get some information on F by looking at the samples. For each
5.5 APPLICATIONS 139
We have then
1
n
Fn x, ω D j x, ω.
n jD1
For each x, the sequence fj xg is totally independent since fXj g is, by
Theorem 3.3.1. Furthermore they have the common “Bernoullian distribution”,
taking the values 1 and 0 with probabilities p and q D 1 p, where
p D Fx, q D 1 Fx;
thus E j x D Fx. The strong law of large numbers in the form
Theorem 5.1.2 or 5.4.2 applies, and we conclude that
ND Nx
x2Q
is again a null set. Hence by the definition of vague convergence in Sec. 4.4,
we can already assert that
and it follows as before that there exists a null set Nx such that if ω 2
nNx, then
3 Fn xC, ω Fn x, ω ! FxC Fx.
Now let N1 D x2Q[J Nx, then N1 is a null set, and if ω 2 nN1 , then (3)
holds for every x 2 J and we have also
4 Fn x, ω ! Fx
for every x 2 Q. Hence the theorem will follow from the following analytical
result.
PROOF. Suppose the contrary, then there exist > 0, a sequence fnk g of
integers tending to infinity, and a sequence fxk g in R1 such that for all k:
In each case let first k ! 1, then r1 " , r2 # ; then the last member of
each chain of inequalities does not exceed a quantity which tends to 0 and a
contradiction is obtained.
Remark. The reader will do well to observe the way the proof above
is arranged. Having chosen a set of ω with probability one, for each fixed
ω in this set we reason with the corresponding sample functions Fn Ð, ω
and FÐ, ω without further intervention of probability. Such a procedure is
standard in the theory of stochastic processes.
strictly positive but may be C1. Now the successive r.v.’s are interpreted as
“lifespans” of certain objects undergoing a process of renewal, or the “return
periods” of certain recurrent phenomena. Typical examples are the ages of a
succession of living beings and the durations of a sequence of services. This
raises theoretical as well as practical questions such as: given an epoch in
time, how many renewals have there been before it? how long ago was the
last renewal? how soon will the next be?
Let us consider the first question. Given the epoch t ½ 0, let Nt, ω
be the number of renewals up to and including the time t. It is clear that
we have
This shows in particular that for each t > 0, Nt D Nt, Ð is a discrete r.v.
whose range is the set of all natural numbers. The family of r.v.’s fNtg
indexed by t 2 [0, 1 may be called a renewal process. If the common distri-
bution F of the Xn ’s is the exponential Fx D 1 ex , x ½ 0; where > 0,
then fNt, t ½ 0g is just the simple Poisson process with parameter .
Let us prove first that
namely that the total number of renewals becomes infinite with time. This is
almost obvious, but the proof follows. Since Nt, ω increases with t, the limit
in (8) certainly exists for every ω. Were it finite on a set of strictly positive
probability, there would exist an integer M such that
which is impossible. (Only because we have laid down the convention long
ago that an r.v. such as X1 should be finite-valued unless otherwise specified.)
Next let us write
and suppose for the moment that m < C1. Then, according to the strong law
of large numbers (Theorem 5.4.2), Sn /n ! m a.e. Specifically, there exists a
5.5 APPLICATIONS 143
Now for each fixed ω0 , if the numerical sequence fan ω0 , n ½ 1g converges
to a finite (or infinite) limit m and at the same time the numerical function
fNt, ω0 , 0 t < 1g tends to C1 as t ! C1, then the very definition of a
limit implies that the numerical function faNt,ω0 ω0 , 0 t < 1g converges
to the limit m as t ! C1. Applying this trivial but fundamental observation to
1
n
an D Xj
n jD1
By the definition of Nt, ω, the numerator on the left side should be close to
t; this will be confirmed and strengthened in the following theorem.
The deduction of (12) from (11) is more tricky than might have been
thought. Since Xn is not zero a.e., there exists υ > 0 such that
8n: P fXn ½ υg D p > 0.
Define
υ, if Xn ω ½ υ;
X0n ω D
0, if Xn ω < υ;
and let Sn0 and N0 t be the corresponding quantities for the sequence fX0n , n ½
1g. It is obvious that Sn0 Sn and N0 t ½ Nt for each t. Since the r.v.’s
fX0n /υg are independent with a Bernoullian distribution, elementary computa-
tions (see Exercise 7 below) show that
2
t
E fN0 t2 g D O 2 as t ! 1.
υ
Hence we have, υ being fixed,
2
Nt 2 N0 t
E E D O1.
t t
Since (11) implies the convergence of Nt/t in distribution to υ1/m , an appli-
cation of Theorem 4.5.2 with Xn D Nn/n and p D 2 yields (12) with t
replaced by n in (12), from which (12) itself follows at once.
1
1
1
D Xj dP D Xj dP
jD1 kDj fNDkg jD1 fN½jg
1
D E Xj Xj dP .
jD1 fNj1g
Now the set fN j 1g and the r.v. Xj are independent, hence the last written
integral is equal to E Xj P fN j 1g. Substituting this into the above, we
obtain
1
1
E SN D E Xj P fN ½ jg D E X1 P fN ½ jg D E X1 E N,
jD1 jD1
Theorem 5.5.4. Let f be a continuous function on [0, 1], and define the
Bernstein polynomials fpn g as follows:
n
k n
15 pn x D f x k 1 xnk .
n k
kD0
n
and let Sn D kD1Xk as usual. We know from elementary probability theory
that
n
P fSn D kg D x k 1 xnk , 0 k n,
k
so that
Sn
pn x D E f .
n
We know from the law of large numbers that Sn /n ! x with probability one,
but it is sufficient to have convergence in probability, which is Bernoulli’s
weak law of large numbers. Since f is uniformly continuous in [0, 1], it
follows as in the proof of Theorem 4.4.5 that
Sn
E f ! E ffxg D fx.
n
We have therefore proved the convergence of pn x to fx for each x. It
remains to check the uniformity. Now we have for any υ > 0:
Sn
16 jpn x fxj E f fx
n
Sn Sn
D E f fx ; x > υ
n n
Sn Sn
C E f fx ; x υ ,
n n
where we have written E fY; 3g for 3 Y dP . Given > 0, there exists υ
such that
jx yj υ ) jfx fyj /2.
With this choice of υ the last term in (16) is bounded by /2. The preceding
term is clearly bounded by
Sn
2jjfjjP x > υ .
n
Now we have by Chebyshev’s inequality, since E Sn D nx, 2 Sn D nx1
x, and x1 x 14 for 0 x 1:
Sn 1 Sn nx1 x 1
P x > υ 2 2 D 2 2
2 .
n υ n υ n 4υ n
This is nothing but Chebyshev’s proof of Bernoulli’s theorem. Hence if n ½
jjfjj/υ2 , we get jpn x fxj in (16). This proves the uniformity.
5.5 APPLICATIONS 147
One should remark not only on the lucidity of the above derivation but
also the meaningful construction of the approximating polynomials. Similar
methods can be used to establish a number of well-known analytical results
with relative ease; see Exercise 11 below for another example.
EXERCISES
1. Show that equality can hold somewhere in (1) with strictly positive
probability if and only if the discrete part of F does not vanish.
2. Let F and F be as in Theorem 5.5.1; then the distribution of
n
is the same for all continuous F. [HINT: Consider FX, where X has the
d.f. F.]
3. Find the distribution of Ynk , 1 k n, in (1). [These r.v.’s are called
order statistics.]
4. Let S and Nt be as in Theorem 5.5.2. Show that
n
1
E fNtg D P fSn tg.
nD1
if such an n exists, or C1 if not. If P X1 6D 0 > 0, then for every t > 0 and
r > 0 we have P f t > ng n for some < 1 and all large n; consequently
E f tr g < 1. This implies the corollary of Theorem 5.5.2 without recourse
to the law of large numbers. [This is Charles Stein’s theorem.]
7. Consider the special case of renewal where the r.v.’s are Bernoullian
taking the values 1 and 0 with probabilities p and 1 p, where 0 < p < 1.
148 LAW OF LARGE NUMBERS. RANDOM SERIES
Find explicitly the d.f. of 0 as defined in Exercise 6, and hence of t
for every t > 0. Find E f tg and E f t2 g. Relate t, ω to the Nt, ω in
Theorem 5.5.2 and so calculate E fNtg and E fNt2 g.
8. Theorem 5.5.3 remains true if E X1 is defined, possibly C1 or 1.
9. In Exercise 7, find the d.f. of X for a given t. E fX g is the mean
t t
lifespan of the object living at the epoch t; should it not be the same as E fX1 g,
the mean lifespan of the given species? [This is one of the best examples of
the use or misuse of intuition in probability theory.]
10. Let be a positive integer-valued r.v. that is independent of the Xn ’s.
Suppose that both and X1 have finite second moments, then
2
S D E 2
X1 C 2
E X1 2 .
11. Let f be continuous and belong to L r 0, 1 for some r > 1, and
1
g D et ft dt.
0
Then
1n1 n n n
fx D lim gn1 ,
n!1 n 1! x x
Prove that
1
lim log n, ω exists a.e.
n!1 n
Bibliographical Note
An important tool in the study of r.v.’s and their p.m.’s or d.f.’s is the char-
acteristic function (ch.f.). For any r.v. X with the p.m. and d.f. F, this is
defined to be the function f on R1 as follows, 8t 2 R1 :
1
1 ft D E e D
itX
e itXω
P dω D e itx
dx D eitx dFx.
R1 1
(i) 8t 2 R1 :
jftj 1 D f0; ft D ft,
where z denotes the conjugate complex of z.
(ii) f is uniformly continuous in R1 .
To see this, we write for real t and h:
ft C h ft D eitChx eitx dx,
jft C h ftj je jje
itx ihx
1j dx D jeihx 1j dx.
1
For if f n ,
n ½ 1g are the corresponding p.m.’s, then nD1 n n is a p.m.
whose ch.f. is 1 nD1 n fn .
(v) If ffj , 1 j ng are ch.f.’s, then
n
fj
jD1
is a ch.f.
and written as
F D F1 Ł F2 .
It is easy to verify that F is indeed a d.f. The other basic properties of
convolution are consequences of the following theorem.
This reduces to (4). The second equation above, evaluating the double integral
by an iterated one, is an application of Fubini’s theorem (see Sec. 3.3).
and written as
p D p 1 Ł p2 .
Now let g be the indicator of the set B, then for each y, the function gy defined
by gy x D gx C y is the indicator of the set B y. Hence
gx C y 1 dx D 1 B y
R1
6.1 GENERAL PROPERTIES; CONVOLUTIONS 155
and, substituting into the right side of (8), we see that it reduces to (7) in this
case. The general case is proved in the usual way by first considering simple
functions g and then passing to the limit for integrable functions.
As an instructive example, let us calculate the ch.f. of the convolution
1 Ł 2 . We have by (8)
e 1 Ł 2 du D
itu
eity eitx 1 dx 2 dy
D e itx
1 dx eity 2 dy.
This is as it should be by (v), since the first term above is the ch.f. of X C Y,
where X and Y are independent with 1 and 2 as p.m.’s. Let us restate the
results, after an obvious induction, as follows.
To prove the corollary, let X have the ch.f. f. Then there exists on some
(why?) an r.v. Y independent of X and having the same d.f., and so also
the same ch.f. f. The ch.f. of X Y is
E eitXY D E eitX E eitY D ftft D jftj2 .
The technique of considering X Y and jfj2 instead of X and f will
be used below and referred to as “symmetrization” (see the end of Sec. 6.2).
This is often expedient, since a real and particularly a positive-valued ch.f.
such as jfj2 is easier to handle than a general one.
Let us list a few well-known ch.f.’s together with their d.f.’s or p.d.’s
(probability densities), the last being given in the interval outside of which
they vanish.
2
(12) Cauchy distribution with parameter a > 0:
a
p.d. in 1, 1; ch.f. eajtj .
a C x 2
2
6.1 GENERAL PROPERTIES; CONVOLUTIONS 157
Thus the first assertion follows by differentiation under the integral of the last
term in (9), which is justified by elementary rules of calculus. The second
assertion is proved by standard estimation as follows, for any > 0:
1
jfx fυ xj jfx fx yjnυ y dy
1
sup jfx fx yj C 2jjfjj nυ y dy.
jyj jyj>
v
then n! .
EXERCISES
is a ch.f. for any d.f. G. In particular, if f is a ch.f. such that limt!1 ft
exists and G a d.f. with G0 D 0, then
1
t
f dGu is a ch.f.
0 u
3. Find the d.f. with the following ch.f.’s ˛ > 0, ˇ > 0:
˛2 1 1
, , .
˛2 C t2 1 ˛itˇ 1 C ˛ˇ ˛ˇeit 1/ˇ
[HINT: The second and third steps correspond respectively to the gamma and
Pólya distributions.]
6.1 GENERAL PROPERTIES; CONVOLUTIONS 159
converges in the sense of infinite product for each t and is the ch.f. of S1 .
5. If F1 and F2 are d.f.’s such that
F1 D bj υaj
j
and F2 has density p, show that F1 Ł F2 has a density and find it.
6. Prove that the convolution of two discrete d.f.’s is discrete; that of a
continuous d.f. with any d.f. is continuous; that of an absolutely continuous
d.f. with any d.f. is absolutely continuous.
7. The convolution of two discrete distributions with exactly m and n
atoms, respectively, has at least m C n 1 and at most mn atoms.
8. Show that the family of normal (Cauchy, Poisson) distributions is
closed with respect to convolution in the sense that the convolution of any
two in the family with arbitrary parameters is another in the family with some
parameter(s).
9. Find the nth iterated convolution of an exponential distribution.
10. Let fX , j ½ 1g be a sequence of independent r.v.’s having the
j
common exponential distribution with mean 1/, > 0. For given x > 0 let
be the maximum of n such that Sn x, where S0 D 0, Sn D njD1 Xj as
usual. Prove that the r.v. has the Poisson distribution with mean x. See
Sec. 5.5 for an interpretation by renewal theory.
11. Let X have the normal distribution 8. Find the d.f., p.d., and ch.f.
of X2 .
12. Let fXj , 1 j ng be independent r.v.’s each having the d.f. 8.
Find the ch.f. of
n
X2j
jD1
and show that the corresponding p.d. is 2n/2 0n/21 x n/21 ex/2 in 0, 1.
This is called in statistics the “ 2 distribution with n degrees of freedom”.
13. For any ch.f. f we have for every t:
R[1 ft] ½ 14 R[1 f2t].
14. Find an example of two r.v.’s X and Y with the same p.m. that are
not independent but such that X C Y has the p.m. Ł . [HINT: Take X D Y
and use ch.f.]
160 CHARACTERISTIC FUNCTION
QF is called the Lévy concentration function of F. Prove that the sup above
is attained, and if G is also a d.f., we have
8h > 0: QFŁ G h QF h ^ QG h.
16. If 0 < h 2, then there is an absolute constant A such that
A
QF h jftj dt,
0
where f is the ch.f. of F. [HINT: Use Exercise 2 of Sec. 6.2 below.]
17. Let F be a symmetric d.f., with ch.f. f ½ 0 then
1 1
h2
ϕF h D 2 C x2
dFx D h eht ft dt
1 h 0
1
1 cos ˛x
3 dx D j˛j.
0 x2 2
The substitution ˛x D u shows at once that it is sufficient to prove all three
formulas for ˛ D 1. The inequality (1) is proved by partitioning the interval
[0, 1 with positive multiples of so as to convert the integral into a series
of alternating signs and decreasing moduli. The integral in (2) is a standard
exercise in contour integration, as is also that in (3). However, we shall indicate
the following neat heuristic calculations, leaving the justifications, which are
not difficult, as exercises.
1 1 # 1 $ 1 # 1 $
sin x
dx D sin x exu du dx D exu sin x dx du
0 x 0 0 0 0
1
du
D D ;
0 1Cu 2 2
1 1 # x $ 1 # 1 $
1 cos x 1 dx
dx D sin u du dx D sin u du
0 x2 0 x2 0 0 u x2
1
sin u
D du D .
0 u 2
We are ready to answer the question: given a ch.f. f, how can we find
the corresponding d.f. F or p.m. ? The formula for doing this, called the
inversion formula, is of theoretical importance, since it will establish a one-
to-one correspondence between the class of d.f.’s or p.m.’s and the class of
ch.f.’s (see, however, Exercise 12 below). It is somewhat complicated in its
most general form, but special cases or variants of it can actually be employed
to derive certain properties of a d.f. or p.m. from its ch.f.; see, e.g., (14) and
(15) of Sec. 6.4.
D 1
2
fx1 g C x1 , x2 C 1
2
fx2 g.
This proves the theorem. For the justification mentioned above, we invoke
Fubini’s theorem and observe that
itxx1
e eitxx2 x2 itu
it D e du jx1 x2 j,
x1
Theorem 6.2.2. If two p.m.’s or d.f.’s have the same ch.f., then they are the
same.
PROOF. If neither x1 nor x2 is an atom of , the inversion formula (4)
shows that the value of on the interval x1 , x2 is determined by its ch.f. It
follows that two p.m.’s having the same ch.f. agree on each interval whose
endpoints are not atoms for either measure. Since each p.m. has only a count-
able set of atoms, points of R1 that are not atoms for either measure form a
dense set. Thus the two p.m.’s agree on a dense set of intervals, and therefore
they are identical by the corollary to Theorem 2.2.3.
We give next an important particular case of Theorem 6.2.1.
PROOF. Since the set of atoms is countable, all but a countable number
of terms in the sum above vanish, making the sum meaningful with a value
bounded by 1. Formula (9) can be established directly in the manner of (5)
and (7), but the following proof is more illuminating. As noted in the proof
of the corollary to Theorem 6.1.4, jfj2 is the ch.f. of the r.v. X Y there,
whose distribution is Ł 0 , where 0 B D B for each B 2 B. Applying
Theorem 6.2.4 with x0 D 0, we see that the left member of (9) is equal to
0
Ł f0g.
By (7) of Sec. 6.1, the latter may be evaluated as
0
fyg dy D fyg fyg,
R1
y2R1
DEFINITION. The r.v. X is called symmetric iff X and X have the same
distribution.
ft D ft
166 CHARACTERISTIC FUNCTION
EXERCISES
4. If ft/t 2 L 1 1, 1, then for each ˛ > 0 such that š˛ are points
of continuity of F, we have
1 1 sin ˛t
F˛ F˛ D ft dt.
1 t
6.2 UNIQUENESS AND INVERSION 167
10. Prove the following form of the inversion formula (due to Gil-
Palaez):
T itx
1 1 e ft eitx ft
fFxC C Fxg D C lim dt.
2 2 T"1
υ#0
υ 2it
[HINT: Use the method of proof of Theorem 6.2.1 rather than the result.]
11. Theorem 6.2.3 has an analogue in L 2 . If the ch.f. f of F belongs
2
to L , then F is absolutely continuous. [HINT: By Plancherel’s theorem, there
exists ϕ 2 L 2 such that
x 1 itx
1 e 1
ϕu du D p ft dt.
0 2 0 it
Now use the inversion formula to show that
x
1
Fx F0 D p ϕu du.]
2 0
12. Prove Theorem 6.2.2 by the Stone–Weierstrass theorem. [HINT: Cf.
Theorem 6.6.2 below, but beware of the differences. Approximate uniformly
g1 and g2 in the proof of Theorem 4.4.3 by a periodic function with “arbitrarily
large” period.]
13. The uniqueness theorem holds as well for signed measures [or func-
tions of bounded variations]. Precisely, if each i , i D 1, 2, is the difference
of two finite measures such that
8t: eitx 1 dx D eitx 2 dx,
then 1 2.
14. There is a deeper supplement to the inversion formula (4) or
Exercise 10 above, due to B. Rosén. Under the condition
1
1 C log jxj dFx < 1,
1
we have
1 1
N
sinx yt N
dt
dFy dt D sinx yt dFy.
1 0 t 0 t 1
6.3 CONVERGENCE THEOREMS 169
for any > 0, suitable A and n ½ n0 A, . The equicontinuity of ffn g follows.
u
This and the pointwise convergence fn ! f1 imply fn !f1 by a simple
compactness argument (the “3 argument”) left to the reader.
Then we have
v
(˛) n ! 1 , where 1 is a p.m.;
(ˇ) f1 is the ch.f. of 1.
PROOF. Let us first relax the conditions (a) and (b) to require only conver-
gence of fn in a neighborhood υ0 , υ0 of t D 0 and the continuity of the
limit function f (defined only in this neighborhood) at t D 0. We shall prove
that any vaguely convergent subsequence of f n g converges to a p.m. . For
this we use the following lemma, which illustrates a useful technique for
obtaining estimates on a p.m. from its ch.f.
Since the integrand on the right side is bounded by 1 for all x (it is defined to be
1 at x D 0), and by jTxj1 2TA1 for jxj > 2A, the integral is bounded by
1
[2A, 2A] C f1 [2A, 2A]g
2TA
1 1
D 1 [2A, 2A] C .
2TA 2TA
The first term on the right side tends to 1 as υ # 0, since f0 D 1 and f
is continuous at 0; for fixed υ the second term tends to 0 as n ! 1, by
bounded convergence since jfn fj 2. It follows that for any given > 0,
6.3 CONVERGENCE THEOREMS 171
there exist υ D υ < υ0 and n0 D n0 such that if n ½ n0 , then the left
member of (4) has a value not less than 1 . Hence by (2)
1 1
5 n [2υ , 2υ ] ½ 21 1 ½ 1 2.
fn t D 1
2
C 12 eint ,
which does not converge as n ! 1, except when t is equal to a multiple of 2.
and
0, if t 6D 0;
fn t ! ft D
1, if t D 0.
Later we shall see that (a) cannot be relaxed to read: fn t converges in
jtj T for some fixed T (Exercise 9 of Sec. 6.5).
The convergence theorem above settles the question of vague conver-
gence of p.m.’s to a p.m. What about just vague convergence without restric-
tion on the limit? Recalling Theorem 4.4.3, this suggests first that we replace
the integrand eitx in the ch.f. f by a function in C0 . Secondly, going over the
last part of the proof of Theorem 6.3.2, we see that the choice should be made
so as to determine uniquely an s.p.m. (see Sec. 4.3). Now the Fourier–Stieltjes
transform of an s.p.m. is well defined and the inversion formula remains valid,
so that there is unique correspondence just as in the case of a p.m. Thus a
natural choice of g is given by an “indefinite integral” of a ch.f., as follows:
u # $
eiux 1
7 gu D eitx dx dt D dx.
0 R1 R1 ix
Let us call g the integrated characteristic function of the s.p.m. . We are
thus led to the following companion of (6), the details of the proof being left
as an exercise.
It is easy to verify that this is a metric on CB and that convergence in this metric
is equivalent to uniform convergence on compacts; clearly the denominator
1 C t2 may be replaced by any function continuous on R1 , bounded below
by a strictly positive constant, and tending to C1 as jtj ! 1. Since there is
a one-to-one correspondence between ch.f.’s and p.m.’s, we may transfer the
6.3 CONVERGENCE THEOREMS 173
This means that for each and given > 0, there exists υ , such that:
h , i1 υ , ) h , i2 ,
h , i2 υ , ) h , i1 .
Theorem 6.3.4 needs no new proof, since it is merely a paraphrasing of (6) in
new words. However, it is important to notice the dependence of υ on (as
well as ) above. The sharper statement without this dependence, which would
mean the equivalence of the uniform structures induced by the two metrics,
is false with a vengeance; see Exercises 10 and 11 below (Exercises 3 and 4
are also relevant).
EXERCISES
1
sin t t
D cos n
t nD1
2
Prove that either factor on the right is the ch.f. of a singular distribution. Thus
the convolution of two such may be absolutely continuous. [HINT: Use the
same r.v.’s as for the Cantor distribution in Exercise 9 of Sec. 5.3.]
10. Using the strong law of large numbers, prove that the convolution of
two Cantor d.f.’s is still singular. [HINT: Inspect the frequency of the digits in
the sum of the corresponding random series; see Exercise 9 of Sec. 5.3.]
11. Let F , G be the d.f.’s of
n n n , n , and fn , gn their ch.f.’s. Even if
supx2R1 jFn x Gn xj ! 0, it does not follow that hfn , gn i2 ! 0; indeed
it may happen that hfn , gn i2 D 1 for every n. [HINT: Take two step functions
“out of phase”.]
12. In the notation of Exercise 11, even if supt2R1 jfn t gn tj ! 0,
it does not follow that hFn , Gn i ! 0; indeed it may ! 1. [HINT: Let f be any
ch.f. vanishing outside (1, 1), fj t D einj t fmj t, gj t D einj t fmj t, and
Fj , Gj be the corresponding d.f.’s. Note that if mj n1 j ! 0, then Fj x ! 1,
Gj x ! 0 for every x, and that fj gj vanishes outside mj1 , mj1 and
2
is 0sin nj t near t D 0. If mj D 2j and nj D jmj then j fj gj is
1 1
uniformly bounded in t: for nkC1 < t nk consider j > k, j D k, j < k sepa-
rately. Let
n
n
fŁn D n1 fj , gŁn D n1 gj ,
jD1 jD1
then sup jfŁn gŁn j D On1 while FŁn GŁn ! 0. This example is due to
Katznelson, rivaling an older one due to Dyson, which is as follows. For
6.4 SIMPLE APPLICATIONS 175
t [eajtj ebjtj ]
ft gt D i .
jtj b
log
a
If a is large, then hF, Gi is near 1. If b/a is large, then hf, gi is near 0.]
Theorem 6.4.1. If the d.f. has a finite absolute moment of positive integral
order k, then its ch.f. has a bounded continuous derivative of order k given by
1
1 f t D
k
ixk eitx dFx.
1
eihx 2 C eihx
D lim dFx
h!0 h2
1 cos hx
2 D 2 lim dFx.
h!0 h2
As h ! 0, we have by Fatou’s lemma,
1 cos hx 1 cos hx
x dFx D 2 lim
2
dFx lim 2 dFx
h!0 h2 h!0 h2
D f00 0.
Thus F has a finite second moment, and the validity of (1) for k D 2 now
follows from the first assertion of the theorem.
The general case can again be reduced to this by induction, as follows.
Suppose the second assertion of the theorem is true for 2k 2, and that
f2k 0 is finite. Then f2k2 t exists and is continuous in the neighborhood
of t D 0, and by the induction hypothesis we have in particular
1 k1
x 2k2 dFx D f2k2 0.
x
Put Gx D 1 y 2k2 dFy for every x, then GÐ/G1 is a d.f. with the
ch.f.
1 1k1 f2k2 t
t D eitx x 2k2 dFx D .
G1 G1
00
Hence exists, and by the case k D 2 proved above, we have
1 1
2 0 D x 2 dGx D x 2k dFx
G1 G1
Upon cancelling G1, we obtain
1k f2k 0 D x 2k dFx,
which proves the finiteness of the 2kth moment. The argument above fails
if G1 D 0, but then we have (why?) F D υ0 , f D 1, and the theorem is
trivial.
Although the next theorem is an immediate corollary to the preceding
one, it is so important as to deserve prominent mention.
k1 j
i j j k
0
3 ft D m t C k
jtjk ;
jD0
j! k!
from (1), we obtain (3) from (4), and 30 from 40 .
It should be remarked that the form of Taylor expansion given in (4)
is not always given in textbooks as cited above, but rather under stronger
assumptions, such as “f has a finite kth derivative in the neighborhood of
0”. [For even k this stronger condition is actually implied by the weaker one
stated in the proof above, owing to Theorem 6.4.1.] The reader is advised
to learn the sharper result in calculus, which incidentally also yields a quick
proof of the first equation in (2). Observe that (3) implies 30 if the last term
in 30 is replaced by the more ambiguous Ojtjk , but not as it stands, since
the constant in “O” may depend on the function f and not just on k .
By way of illustrating the power of the method of ch.f.’s without the
encumbrance of technicalities, although anticipating more elaborate develop-
ments in the next chapter, we shall apply at once the results above to prove two
classical limit theorems: the weak law of large numbers (cf. Theorem 5.2.2),
and the central limit theorem in the identically distributed and finite variance
case. We begin with an elementary lemma from calculus, stated here for the
sake of clarity.
Theorem 6.4.5. In the notation of Theorem 4.5.5, if (8) there holds together
with the following condition:
mk tk
6 8t 2 R1 : lim D 0,
k!1 k!
v
then Fn !F.
PROOF. Let fn be the ch.f. of Fn . For fixed t and an odd k we have by
the Taylor expansion for eitx with a remainder term:
⎧ ⎫
⎨ k kC1 ⎬
itxj
jitxj
fn t D eitx dFn x D C dFn x
⎩ j! k C 1! ⎭
jD0
k
itj mnkC1 tkC1
D mnj C ,
jD0
j! k C 1!
Given > 0, by condition (6) there exists an odd k D k such that for the
fixed t we have
2mkC1 C 1tkC1
8 .
k C 1! 2
Since we have fixed k, there exists n0 D n0 such that if n ½ n0 , then
mnkC1 mkC1 C 1,
and moreover,
max jmnj mj j ejtj .
1jk 2
Then the right side of (7) will not exceed in modulus:
k
jtjj tkC1 2mkC1 C 1
ejtj C .
jD0
j! 2 k C 1!
180 CHARACTERISTIC FUNCTION
Hence fn t ! ft for each t, and since f is a ch.f., the hypotheses of
v
Theorem 6.3.2 are satisfied and so Fn !F.
As another kind of application of the convergence theorem in which a
limiting process is implicit rather than explicit, let us prove the following
characterization of the normal distribution.
EXERCISES
[If we assume only p P fX1 6D 0g > 0, E jX1 j < 1 and E X1 D 0, then we
have E jSn j ½ C n for some constant C and all n; this p is known as
Hornich’s inequality.] [HINT: In case 2 D 1, if limn E j Sn / n < 1, then
p
there exists fnk g such that Sn / nk converges in distribution; use an extension
p 2n
of Exercise 1 to show jft/ nj ! 0. This is due to P. Matthews.]
3. Let P fX D kg D pk , 1 k < 1, kD1 pk D 1. The sum Sn of
n independent r.v.’s having the same distribution as X is said to have a
multinomial distribution. Define it explicitly. Prove that [Sn E Sn ]/ Sn
converges to 8 in distribution as n ! 1, provided that X > 0.
4. Let X have the binomial distribution with parameter n, p , and
n n
suppose that npn ! ½ 0. Prove that Xn converges in dist. to the Poisson d.f.
with parameter . (In the old days this was called the law of small numbers.)
5. Let X have the Poisson distribution with parameter . Prove that
[X ]/1/2 converges in dist. to 8 as ! 1.
6. Prove that in Theorem 6.4.4, S / pn does not converge in proba-
n p
p
bility. [HINT: Consider Sn / n and S2n / 2n.]
7. Let f be the ch.f. of the d.f. F. Suppose that as t ! 0,
ft 1 D Ojtj˛ ,
[HINT: Integrate cos tx dFx Ct˛ over t in 0, A.]
jxj>A 1
8. If 0 < ˛ < 1 and jxj˛ dFx < 1, then ft 1 D ojtj˛ as t ! 0.
1 ˛ < 2 the same result is true under the additional assumption that
For
x dFx D 0. [HINT: The case 1 ˛ < 2 is harder. Consider the real and
imaginary parts of ft 1 separately and write the latter as
sin tx dFx C sin tx dFx.
jxj/t jxj>/jtj
182 CHARACTERISTIC FUNCTION
The second is bounded by jtj/˛ jxj>/jtj jxj˛ dFx D ojtj˛ for fixed . In
the first integral use sin tx D tx C Ojtxj3 ,
tx dFx D t x dFx,
jxj/jtj jxj>/jtj
1
jtxj dFx
3 3˛
jtxj˛ dFx.]
jxj/jtj 1
10. Suppose F satisfies the condition that for every > 0 such that as
A ! 1,
dFx D OeA .
jxj>A
Then all moments of F are finite, and condition (6) in Theorem 6.4.5 is satisfied.
11. Let X and Y be independent
p with the common d.f. F of mean 0 and
variance 1. Suppose that X C Y/ 2 also has the d.f. F. Then F 8. [HINT:
Imitate Theorem 6.4.5.]
12. Let fX , j ½ 1g be independent, identically distributed r.v.’s with
j
mean 0 and variance 1. Prove that both
n
p
n
Xj n Xj
jD1 jD1
/ and
0n
n
0 X2j
1 X2j
jD1 jD1
n
n log n
which is an absolutely convergent Fourier series. Note that the degenerate d.f.
υa with ch.f. eait is a particular case. We have the following characterization.
The integrand is positive everywhere and vanishes if and only if for some
integer j,
0 2
xD Cj .
t0 t0
It follows that the support of must be contained in the set of x of this form
in order that equation (13) may hold, for the integral of a strictly positive
function over a set of strictly positive measure is strictly positive. The theorem
is therefore proved, with a D 0 /t0 and d D 2/t0 in the definition of a lattice
distribution.
184 CHARACTERISTIC FUNCTION
EXERCISES
f or fn is a ch.f. below.
14. If jftj D 1, jft0 j D 1 and t/t0 is an irrational number, then f is
degenerate. If for a sequence ftk g of nonvanishing constants tending to 0 we
have jftk j D 1, then f is degenerate.
15. If jf tj ! 1 for every t as n ! 1, and F is the d.f. corre-
n n
v
sponding to fn , then there exist constants an such that Fn x C an !υ0 . [HINT:
Symmetrize and take an to be a median of Fn .]
16. Suppose b > 0 and jfb tj converges everywhere to a ch.f. that
n n
is not identically 1, then bn converges to a finite and strictly positive limit.
[HINT: Show that it is impossible that a subsequence of bn converges to 0 or
to C1, or that two subsequences converge to different finite limits.]
17. Suppose c is real and that ecn it converges to a limit for every t
n
in a set of strictly positive Lebesgue measure. Then cn converges to a finite
6.4 SIMPLE APPLICATIONS 185
limit. [HINT: Proceed as in Exercise 16, and integrate over t. Beware of any
argument using “logarithms”, as given in some textbooks, but see Exercise 12
of Sec. 7.6 later.]
18. Let f and g be two nondegenerate ch.f.’s. Suppose that there exist
real constants an and bn > 0 such that for every t:
t
fn t ! ft and e itan /bn
fn ! gt.
bn
[Two d.f.’s F and G such that Gx D Fbx C a for every x, where b > 0
and a is real, are said to be of the same “type”. The two preceding exercises
deal with the convergence of types.]
20. Show by using (14) that j cos tj is not a ch.f. Thus the modulus of a
ch.f. need not be a ch.f., although the squared modulus always is.
21. The span of an integer lattice distribution is the greatest common
divisor of the set of all differences between points of jump.
22. Let fs, t be the ch.f. of a 2-dimensional p.m. . If jfs0 , t0 j D 1
for some s0 , t0 6D 0, 0, what can one say about the support of ?
23. If fX g is a sequence of independent and identically distributed r.v.’s,
n
then there does not exist a sequence of constants fcn g such that n Xn cn
converges a.e., unless the common d.f. is degenerate.
In Exercises 24 to 26, let Sn D njD1 Xj , where the X0j s are independent r.v.’s
with a common d.f. F of the integer lattice type with span 1, and taking both
>0 and <0 values.
24. If x dFx D 0, x 2 dFx D 2 , then for each integer j:
1
n1/2 P fSn D jg ! p .
2
[HINT: Proceed as in Theorem 6.4.4, but use (15).]
186 CHARACTERISTIC FUNCTION
[HINT: Reduce to the case where the d.f. has zero mean and finite variance by
translating and truncating.]
28. Let Qn be the concentration function of Sn D njD1 Xj , where the
Xj ’s are independent r.v.’s having a common nondegenerate d.f. F. Then for
every h > 0,
Qn h An1/2
[HINT: Use Exercise 27 above and Exercise 16 of Sec. 6.1. This result is due
to Lévy and Doeblin, but the proof is due to Rosén.]
In Exercises 29 to 35, or k is a p.m. on U D 0, 1].
29. Define for each n:
f n D e2inx dx.
U
34. Suppose that the space U is replaced by its closure [0, 1] and the
two points 0 and 1 are identified; in other words, suppose U is regarded as the
6.5 REPRESENTATIVE THEOREMS 187
Since jfT tj jftj 1 by Theorem 6.5.1, and 1 cos t/t2 belongs to
L 1 1, 1, we have by dominated convergence:
ˇ
1 ˛ 1 1 t 1 cos t
6 lim dˇ pT x dx D lim fT dt
˛!1 ˛ 0 ˇ 1 ˛!1 ˛ t2
1 1 1 cos t
D dt D 1.
1 t2
Note that the last two integrals are in reality over finite intervals. Letting
˛ ! 1, we obtain by bounded convergence as before:
1
8 eix pT x dx D fT ,
1
the integral on the left existing by (7). Since equation (8) is valid for each ,
we have proved that fT is the ch.f. of the density function pT . Finally, since
fT ! f as T ! 1 for every , and f is by hypothesis continuous at
D 0, Theorem 6.3.2 yields the desired conclusion that f is a ch.f.
As a typical application of the preceding theorem, consider a family of
(real-valued) r.v.’s fXt , t 2 RC g, where RC D [0, 1, satisfying the following
conditions, for every s and t in RC :
6.5 REPRESENTATIVE THEOREMS 191
(i) E X2t D 1;
(ii) there exists a function rÐ on R1 such that E Xs Xt D rs t;
(iii) limt#0 E X0 Xt 2 D 0.
This R is called the spectral distribution of the process and is essential in its
further analysis.
Theorem 6.5.2 is not practical in recognizing special ch.f.’s or in
constructing them. In contrast, there is a sufficient condition due to Pólya
that is very easy to apply.
and left-hand derivatives everywhere that are equal except on a countable set,
that f is the integral of either one of them, say the right-hand one, which will
be denoted simply by f0 , and that f0 is increasing. Under the conditions of
the theorem it is also clear that f is decreasing and f0 is negative in RC .
Now consider the fT as defined in (5) above, and observe that
t
1
1 f0 t C ft, if 0 < t < T;
f0T t D T T
0, if t ½ T.
Thus f0T is positive and decreasing in RC . We have for each x 6D 0:
1 1
2 1
eitx fT t dt D 2 cos txfT t dt D sin txf0T t dt
1 0 x 0
1 kC1/x
2
D sin txf0T t dt.
x kD0 k/x
The terms of the series alternate in sign, beginning with a positive one, and
decrease in magnitude, hence the sum ½ 0. [This is the argument indicated
for formula (1) of Sec. 6.2.] For x D 0, it is trivial that
1
fT t dt ½ 0.
1
f˛ t D ejtj
˛
is a ch.f.
PROOF. For 0 < ˛ 1, this is a quick consequence of Pólya’s theorem
above. Other conditions there being obviously satisfied, we need only check
that f˛ is convex in [0, 1. This is true because its second derivative is
equal to
et f˛2 t2˛2 ˛˛ 1t˛2 g > 0
˛
for the range of ˛ in question. No such luck for 1 < ˛ < 2, and there are
several different proofs in this case. Here is the one given by Lévy which
6.5 REPRESENTATIVE THEOREMS 193
being continuous at t D 0, is also a ch.f. by the basic Theorem 6.3.2, and the
constant c˛ may be absorbed by a change of scale. Finally, for ˛ D 2, f˛ is
the ch.f. of a normal distribution. This completes the proof of the theorem.
Actually Lévy, who discovered these ch.f.’s around 1923, proved also
that there are complex constants ˛ such that e˛ jtj is a ch.f., and determined
˛
the exact form of these constants (see Gnedenko and Kolmogorov [12]). The
corresponding d.f.’s are called stable distributions, and those with real posi-
tive ˛ the symmetric stable ones. The parameter ˛ is called the exponent.
These distributions are part of a much larger class called the infinitely divisible
distributions to be discussed in Chapter 7.
Using the Cauchy ch.f. ejtj we can settle a question of historical interest.
Draw the graph of this function, choose an arbitrary T > 0, and draw the
tangents at šT meeting the abscissa axis at šT0 , where T0 > T. Now define
194 CHARACTERISTIC FUNCTION
the function fT to be f in [T, T], linear in [T0 , T] and in [T, T0 ], and
zero in 1, T0 and T0 , 1. Clearly fT also satisfies the conditions of
Theorem 6.5.3 and so is a ch.f. Furthermore, f D fT in [T, T]. We have
thus established the following theorem and shown indeed how abundant the
desired examples are.
Theorem 6.5.5. There exist two distinct ch.f.’s that coincide in an interval
containing the origin.
is the ch.f. of the Poisson distribution which should be familiar to the reader.
6.5 REPRESENTATIVE THEOREMS 195
EXERCISES
7. Construct a ch.f. that vanishes in [b, a] and [a, b], where 0 < a <
b, but nowhere else. [HINT: Let fm be the ch.f. in Exercise 6 and consider
pm fm , where pm ½ 0, pm D 1,
m m
is a ch.f.
9. Show that in Theorem 6.3.2, the hypothesis (a) cannot be relaxed to
require convergence of ffn g only in a finite interval jtj T.
Propositions (i) to (v) of Sec. 6.1 have their obvious analogues. The inversion
formula may be formulated as follows. Call an “interval” (rectangle)
fx, y: x1 x x2 , y1 y y2 g
an interval of continuity iff the -measure of its boundary (the set of points
on its four sides) is zero. For such an interval I, we have
T T isx1
1 e eisx2 eity1 eity2
I D lim fs, t ds dt.
T!1 22 T T is it
The proof is entirely similar to that in one dimension. It follows, as there, that
f uniquely determines . Only the following result is noteworthy.
Theorem 6.6.2. Let F O j be the Laplace transform of the d.f. Fj with support
O1 D F
in RC , j D 1, 2. If F O 2 , then F1 D F2 .
PROOF. We shall apply the Stone–Weierstrass theorem to the algebra
generated by the family of functions fex , ½ 0g, defined on the closed
positive real line: RC D [0, 1], namely the one-point compactification of
RC D [0, 1. A continuous function of x on RC is one that is continuous
in RC and has a finite limit as x ! 1. This family separates points on RC
and vanishes at no point of RC (at the point C1, the member e0x D 1 of the
family does not vanish!). Hence the set of polynomials in the members of the
6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 199
family, namely the algebra generated by it, is dense in the uniform topology,
in the space CB RC of bounded continuous functions on RC . That is to say,
given any g 2 CB RC , and > 0, there exists a polynomial of the form
n
g x D cj ej x ,
jD1
and consequently,
g xdF1 x D g xdF2 x.
for each g 2 CB RC ; second, that this also holds for each g that is the
indicator of an interval in RC (even one of the form a, 1]; third, that
the two p.m.’s induced by F1 and F2 are identical, and finally that F1 D F2
as asserted.
Turning to the “if ” part, let us first prove that f is quasi-analytic in 0, 1,
namely it has a convergent Taylor series there. Let 0 < 0 < < , then, by
6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 201
Taylor’s theorem, with the remainder term in the integral form, we have
k1
fj
6 f D j
jD0
j!
k 1
C 1 tk1 fk C t dt.
k 1! 0
Because of (4), the last term in (6) is positive and does not exceed
k 1
1 tk1 fk C 0 t dt.
k 1! 0
For if k is even, then fk # and k ½ 0, while if k is odd then fk "
and k 0. Now, by (6) with replaced by 0 , the last expression is
equal to
⎡ ⎤
k
k1 k
⎣f0
f j
⎦
0
j
f0 ,
0 jD0
j! 0
where the inequality is trivial, since each term in the sum on the left is positive
by (4). Therefore, as k ! 1, the remainder term in (6) tends to zero and the
Taylor series for f converges.
Now for each n ½ 1, define the discrete s.d.f. Fn by the formula:
[nx] j
n
7 Fn x D 1j fj n.
jD0
j!
This is indeed an s.d.f., since for each > 0 and k ½ 1 we have from (6):
k1
fj n
1 D f0C ½ f ½ nj .
jD0
j!
Letting # 0 and then k " 1, we see that Fn 1 1. The Laplace transform
of Fn is plainly, for > 0:
1
x nj j
e dFn x D ej/n f n
RC jD0
j!
1
1
D n1 e/n nj fj n D fn1 e/n ,
jD0
j!
the last equation from the Taylor series. Letting n ! 1, we obtain for the
limit of the last term f, since f is continuous at each . It follows from
202 CHARACTERISTIC FUNCTION
Theorem 6.6.3 that fFn g converges vaguely, say to F, and that the Laplace
transform of F is f. Hence F1 D f0 D 1, and F is a d.f. The theorem
is proved.
EXERCISES
[HINT: Show that Fn x e2xυ fυ for each υ > 0 and all large n, where Fn
is defined in (7). Alternatively, apply Theorem 6.6.4 to f C n1 /fn1
for ½ 0 and use Exercise 3.]
12. Let fgn , 1 n 1g on RC satisfy the conditions: (i) for each
n, gn Ð is positive and decreasing; (ii) g1 x is continuous; (iii) for each
> 0, 1 1
x
lim e gn x dx D ex g1 x dx.
n!1 0 0
Then
lim gn x D g1 x for every x 2 RC .
n!1
1
[HINT: For > 0 consider the sequence 0 ex gn x dx and show that
b b
lim ex gn x dx D ex g1 x dx, lim gn b g1 b,
n!1 a a n!1
and so on.]
8 2 RC : h D g.
PROOF. For each integer m ½ 1, the function hm defined by
1
xn
hm z D e dFx D
zx
z n
dFx
[0,m] nD0 [0,m] n!
Bibliographical Note
For standard references on ch.f.’s, apart from Lévy [7], [11], Cramér [10], Gnedenko
and Kolmogorov [12], Loève [14], Rényi [15], Doob [16], we mention:
S. Bochner, Vorlesungen über Fouriersche Integrale. Akademische Ver-
laggesellschaft, Konstanz, 1932.
E. Lukacs, Characteristic functions. Charles Griffin, London, 1960.
The proof of Theorem 6.6.4 is taken from
Willy Feller, Completely monotone functions and sequences, Duke J. 5 (1939),
661–674.
Central limit theorem and
7 its ramifications
we see that we are really dealing with a double array, as follows. For each
n ½ 1 let there be kn r.v.’s fXnj , 1 j kn g, where kn ! 1 as n ! 1:
X11 , X12 , . . . , X1k1 ;
2 X21 , X22 , . . . , X2k2 ;
.................
Xn1 , Xn2 , . . . , Xnkn ;
.................
206 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
The r.v.’s with n as first subscript will be referred to as being in the nth row.
Let Fnj be the d.f., fnj the ch.f. of Xnj ; and put
kn
Sn D Sn,kn D Xnj .
jD1
The particular case kn D n for each n yields a triangular array, and if, further-
more, Xnj D Xj for every n, then it reduces to the initial sections of a single
sequence fXj , j ½ 1g.
We shall assume from now on that the r.v.’s in each row in (2) are
independent, but those in different rows may be arbitrarily dependent as in
the case just mentioned — indeed they may be defined on different probability
spaces without any relation to one another. Furthermore, let us introduce the
following notation for moments, whenever they are defined, finite or infinite:
E Xnj D ˛nj , 2
Xnj D nj2
,
kn
kn
E Sn D ˛nj D ˛n , 2
Sn D nj D sn ,
2 2
are “negligible” in comparison with the sum itself. Historically, this arose
from the assumption that “small errors” accumulate to cause probabilistically
predictable random mass phenomena. We shall see later that such a hypothesis
is indeed necessary in a reasonable criterion for the central limit theorem such
as Theorem 7.2.1.
In order to clarify the intuitive notion of the negligibility, let us consider
the following hierarchy of conditions, each to be satisfied for every > 0:
(a) 8j: lim P fjXnj j > g D 0;
n!1
kn
(d) lim P fjXnj j > g D 0.
n!1
jD1
It is clear that (d) ) (c) ) (b) ) (a); see Exercise 1 below. It turns out that
(b) is the appropriate condition, which will be given a name.
DEFINITION. The double array (2) is said to be holospoudic Ł iff (b) holds.
and consequently
Ł I am indebted to Professor M. Wigodsky for suggesting this word, the only new term coined
in this book.
208 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
Then we have
kn
7 1 C nj ! e .
jD1
PROOF. By (i), there exists n0 such that if n ½ n0 , then jnj j 12 for all
j, so that 1 C nj 6D 0. We shall consider only such large values of n below,
and we shall denote by log 1 C nj the determination of logarithm with an
angle in , ]. Thus
(This 3 is not the same as before, but bounded by the same 1!). It follows
from (ii) and (i) that
kn
kn
9 jnj j2 max jnj j jnj j M max jnj j ! 0;
1jkn 1jkn
jD1 jD1
Theorem 7.1.2. Assume that (4) and (5) hold for the double array (2) and
that nj is finite for every n and j. If
10 0n ! 0
as n ! 1, then Sn converges in dist. to 8.
PROOF. For each n, the range of j below will be from 1 to kn . It follows
from the assumption (10) and Liapounov’s inequality that
11 max 3
nj max nj 0n ! 0.
j j
nj D 12 2 2
nj t C 3njnj jtj3 .
210 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
It follows that
kn
fnj t ! et
2
/2
.
jD1
This establishes the theorem by the convergence theorem of Sec. 6.3, since
the left member is the ch.f. of Sn .
Corollary. Without supposing that E Xnj D 0, suppose that for each n and
j there is a finite constant Mnj such that jXnj j Mnj a.e., and that
D 2 max Mnj .
1jkn
n
n
n
13 Sn D Xj , sn2 D 2
j, 0n D j ,
jD1 jD1 jD1
is as follows.
7.1 LIAPOUNOV’S THEOREM 211
If
0n
14 ! 0,
sn3
then Sn /sn converges in dist. to 8.
This is obtained by setting Xnj D Xj /sn . It should be noticed that the double
scheme gets rid of cumbersome fractions in proofs as well as in statements.
We proceed to another proof of Liapounov’s theorem for a single
sequence by the method of Lindeberg, who actually used it to prove his
version of the central limit theorem, see Theorem 7.2.1 below. In recent times
this method has been developed into a general tool by Trotter and Feller.
The idea of Lindeberg is to approximate the sum X1 C Ð Ð Ð C Xn in (13)
successively by replacing one X at a time with a comparable normal (Gaussian)
r.v. Y, as follows. Let fYj , j ½ 1g be r.v.’s having the normal distribution
N0, j2 ; thus Yj has the same mean and variance as the corresponding Xj
above; let all the X’s and Y’s be totally independent. Now put
Zj D Y1 C Ð Ð Ð C Yj1 C XjC1 C Ð Ð Ð C Xn , 1 j n,
with the obvious convention that
Z1 D X2 C Ð Ð Ð C Xn , Zn D Y1 C Ð Ð Ð C Yn1 .
To compare the distribution of Xj C Zj /sn with that of Yj C Zj /sn , we
use Theorem 6.1.6 by comparing the expectations of test functions. Namely,
we estimate the difference below for a suitable class of functions f:
X1 C Ð Ð Ð C Xn Y1 C Ð Ð Ð C Yn
15 E f E f
sn sn
n # $
Xj C Zj Yj C Zj
D E f E f .
jD1
sn sn
Note that the r.v.’s f, f0 , and f00 are bounded hence integrable. If
is another r.v. independent of and having the same mean and variance as ,
and E fjj3 g < 1, we obtain by replacing with in (16) and then taking the
difference:
M
17 jE ff C g E ff C gj E fjj3 C jj3 g.
6
This key formula is applied to each term on the right side of (15), with
D Zj /sn , D Xj /sn , D Yj /sn . The bounds on the right-hand side of (17)
then add up to
M j
n
c j3
18 C
6 jD1 sn3 sn3
p
where c D 8/ since the absolute third moment of N0, 2 is equal to c j3 .
By Liapounov’s inequality (Sec. 3.2) j3 j , so that the quantity in (18) is
O0n /sn3 . Let us introduce a unit normal r.v. N for convenience of notation,
so that Y1 C Ð Ð Ð C Yn /sn may be replaced by N so far as its distribution is
concerned. We have thus obtained the following estimate:
Sn 0n
19 8f 2 C3 : E f E ffNg O .
sn s3n
Consequently, under the condition (14), this converges to zero as n ! 1. It
follows by the general criterion for vague convergence in Theorem 6.1.6 that
Sn /sn converges in distribution to the unit normal. This is Liapounov’s form of
the central limit theorem proved above by the method of ch.f.’s. Lindeberg’s
idea yields a by-product, due to Pinsky, which will be proved under the same
assumptions as in Liapounov’s theorem above.
Using (19) for f D fn and f D gn , and combining the results with (22) and
(23), we obtain
0n
P fN ½ xn C 1g O P fSn ½ xn sn g
sn3
0n
24 P fN ½ xn 1g C O .
sn3
Now an elementary estimate yields for x ! C1:
1 2 2
1 y 1 x
P fN ½ xg D p exp dy ¾ p exp ,
2 x 2 2x 2
(see Exercise 4 of Sec. 7.4), and a quick computation shows further that
# 2 $
x
P fN ½ x š 1g D exp 1 C o1 , x ! C1.
2
Thus (24) may be written as
# 2 $
x 0n
P fSn ½ xn sn g D exp n 1 C o1 C O .
2 sn3
Suppose n is so large that the o1 above is strictly less than in absolute
value; in order to conclude (23) it is sufficient to have
# 2 $
0n xn
D o exp 1 C , n ! 1.
sn3 2
This is the sense of the condition (20), and the theorem is proved.
214 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
on the logarithmic scale, which is just to say that xn2 is small in absolute
value compared with xn2 /2. Thus the estimates in (21) is useful on such a
scale, and may be applied in the proof of the law of the iterated logarithm in
Sec. 7.5.
EXERCISES
1. Prove that for arbitrary r.v.’s fX g in the array (2), the implications
nj
d ) c ) b ) a are all strict. On the other hand, if the Xnj ’s are
independent in each row, then d c.
2. For any sequence of r.v.’s fYn g, if Yn /bn converges in dist. for an
increasing sequence of constants fbn g, then Yn /bn0 converges in pr. to 0 if
bn D obn0 . In particular, make precise the following statement: “The central
limit theorem implies the weak law of large numbers.”
3. For the double array (2), it is possible that Sn /bn converges in dist.
for a sequence of strictly positive constants bn tending to a finite limit. Is it
still possible if bn oscillates between finite limits?
4. Let fXj g be independent r.v.’s such that max1jn jXj j/bn ! 0 in
pr. and Sn an /bn converges to a nondegenerate d.f. Then bn ! 1,
bnC1 /bn ! 1, and anC1 an /bn ! 0.
5. In Theorem 7.1.2 let the d.f. of S be F . Prove that given any > 0,
n n
there exists a υ such that 0n υ ) LFn , 8 , where L is Levy
distance. Strengthen the conclusion to read:
sup jFn x 8xj .
x2R1
6. Prove the assertion made in Exercise 5 of Sec. 5.3 using the methods
of this section. [HINT: use Exercise 4 of Sec. 4.3.]
Hence (ii) follows from (1); indeed even the stronger form of negligibility (d)
in Sec. 7.1 follows. Now for a fixed , 0 < < 1, we truncate Xnj as follows:
Xnj , if jXnj j ;
3 X0nj D
0, otherwise.
kn
Put Sn0 D 0
jD1 Xnj , Sn0 D s0 2n . We have, since E Xnj D 0,
2
0
E Xnj D x dFnj x D x dFnj x.
jxj jxj>
Hence
1
jE X0nj j jxj dFnj x x 2 dFnj x
jxj> jxj>
and so by (1),
1 n k
jE Sn0 j x 2 dFnj x ! 0.
jD1 jxj>
and consequently
0
2
Xnj D x dFnj x
2
E X0nj 2 ½ x 2 dFnj x.
jxj jxj jxj>
Thus as n ! 1, we have
sn0 ! 1 and E Sn0 ! 0.
Since
Sn0 E Sn0 E Sn0
Sn0 D C sn0
sn0 sn0
we conclude (see Theorem 4.4.6, but the situation is even simpler here) that
if [Sn0 E Sn0 ] j /sn0 converges in dist., so will Sn0 /sn0 to the same d.f.
Now we try to apply the corollary to Theorem 7.1.2 to the double array
fX0nj g. We have jX0nj j , so that the left member in (12) of Sec. 7.1 corre-
sponding to this array is bounded by . But although is at our disposal and
may be taken as small as we please, it is fixed with respect to n in the above.
Hence we cannot yet make use of the cited corollary. What is needed is the
following lemma, which is useful in similar circumstances.
PROOF. It is the statement of the lemma and its necessity in our appli-
cation that requires a certain amount of sophistication; the proof is easy. For
each m, there is an nm such that n ½ nm ) um, n 1/m. We may choose
fnm , m ½ 1g inductively so that nm increases strictly with m. Now define
n0 D 1
mn D m for nm n < nmC1 .
Then
1
umn , n for nm n < nmC1 ,
m
and consequently the lemma is proved.
7.2 LINDEBERG–FELLER THEOREM 217
Now we can go back and modify the definition in (3) by replacing with
n . As indicated above, the cited corollary becomes applicable and yields the
convergence of [Sn0 E Sn0 ]/sn0 in dist. to 8, hence also that of Sn0 /sn0 as
remarked.
Finally we must go from Sn0 to Sn . The idea is similar to Theorem 5.2.1
but simpler. Observe that, for the modified Xnj in (3) with replaced by n ,
we have
⎧ ⎫
⎨ kn ⎬ kn
P fSn 6D Sn0 g P [Xnj 6D X0nj ] P fjXnj j > n g
⎩ ⎭
jD1 jD1
kn
1
2
x 2 dFnj x,
jD1 n jxj>n
the last inequality from (2). As n ! 1, the last term tends to 0 by the above,
hence Sn must have the same limit distribution as Sn0 (why?) and the sufficiency
of Lindeberg’s condition is proved. Although this method of proof is somewhat
longer than a straightforward approach by means of ch.f.’s (Exercise 4 below),
it involves several good ideas that can be used on other occasions. Indeed the
sufficiency part of the most general form of the central limit theorem (see
below) can be proved in the same way.
Necessity. Here we must resort entirely to manipulations with ch.f.’s. By
the convergence theorem of Sec. 6.3 and Theorem 7.1.1, the conditions (i)
and (ii) are equivalent to:
kn
fnj t D et
2 /2
4 8t: lim ;
n!1
jD1
By Theorem 6.3.1, the convergence in (4) is uniform in jtj T for each finite
T : similarly for (5) by Theorem 7.1.1. Hence for each T there exists n0 T
218 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
We shall consider only such values of n below. We may take the distinguished
logarithms (see Theorems 7.6.2 and 7.6.3 below) to conclude that
kn
t2
6 lim log fnj t D .
n!1
jD1
2
t2 1 2 t2
x dFnj x D .
2 j 1 2
Hence it follows from (5) and (9) that the left member of (8) tends to 0 as
n ! 1. From this, (7), and (6) we obtain
t2
lim ffnj t 1g D .
n!1
j
2
Hence for each > 0, if we split the integral into two parts and transpose one
of them, we obtain
2
t
lim 1 cos tx dFnj x
n!1 2 jxj
j
D lim 1 cos tx dFnj x
n!1 jxj>
j
7.2 LINDEBERG–FELLER THEOREM 219
lim 2 dFnj x
n!1 jxj>
j
nj
2
2
lim 2 D ,
n!1
j
2 2
For υ D 1 this condition is just (10) of Sec. 7.1. In the general case the asser-
tion follows at once from the following inequalities:
jxj2Cυ
x 2 dFnj x υ
dFnj x
j jxj> j jxj>
1 1 2Cυ
υ jxj dFnj x,
j 1
Theorem 7.2.2. For the double array (2) of Sec. 7.1 (with independence in
each row), in order that there exists a sequence of constants fan g such that
n
(i) kjD1 Xnj an converges in dist. to 8, and (ii) the array is holospoudic,
it is necessary and sufficient that the following two conditions hold for every
> 0:
kn
(a) jD1 jxj> dFnj x ! 0;
kn
(b) jD1 f jxj x 2 dFnj x jxj x dFnj x2 g ! 1.
This formula, due to DeMoivre, can be derived from Stirling’s formula for
factorials in a rather messy way. But it is just an application of Theorem 6.4.4
(or 7.1.2), where each Xj has the Bernoullian d.f. pυ1 C qυ0 .
More interesting and less expected applications to combinatorial analysis
will be illustrated by the following example, which incidentally demonstrates
the logical necessity of a double array even in simple situations.
Consider all n! distinct permutations a1 , a2 , . . . , an of the n integers
(1, 2, . . . , n). The sample space D n consists of these n! points, and P
assigns probability 1/n! to each of the points. For each j, 1 j n, and
each ω D a1 , a2 , . . . , an let Xnj be the number of “inversions” caused by
j in ω; namely Xnj ω D m if and only if j precedes exactly m of the inte-
gers 1, . . . , j 1 in the permutation ω. The basic structure of the sequence
fXnj , 1 j ng is contained in the lemma below.
7.2 LINDEBERG–FELLER THEOREM 221
Lemma 2. For each n, the r.v.’s fXnj , 1 j ng are independent with the
following distributions:
1
P fXnj D mg D for 0 m j 1.
j
EXERCISES
max nj ! 0.
1jkn
3. Prove that in Theorem 7.2.1, (i) does not imply (1). [HINT: Consider
r.v.’s with normal distributions.]
4. Prove the sufficiency part of Theorem 7.2.1 without using
Theorem 7.1.2, but by elaborating the proof of the latter. [HINT: Use the
expansion
tx2
eitx D 1 C itx C for jxj >
2
and
tx2 jtxj3
eitx D 1 C itx C 0 for jxj .
2 6
As a matter of fact, Lindeberg’s original proof does not even use ch.f.’s; see
Feller [13, vol. 2].]
5. Derive Theorem 6.4.4 from Theorem 7.2.1.
6. Prove that if υ < υ0 , then the condition (10) implies the similar one
when υ is replaced by υ0 .
7.2 LINDEBERG–FELLER THEOREM 223
is the first cycle of the decomposition. Next, begin with the least integer,
say b, not in the first cycle and apply to it successively; and so on. We
say 1 ! 1 is the first step, . . . , k1 1 ! 1 the kth step, b ! b the
k C 1st step of the decomposition, and so on. Now define Xnj ω to be equal
to 1 if in the decomposition of ω, a cycle is completed at the jth step; otherwise
to be 0. Prove that for each n, fXnj , 1 j ng is a set of independent r.v.’s
with the following distributions:
1
P fXnj D 1g D ,
njC1
1
P fXnj D 0g D 1 .
njC1
Deduce the central limit theorem for the number of cycles of a permutation.
It follows from the hypothesis of m-dependence and Theorem 3.3.2 that the
Yj ’s are independent; so are the Zj ’s, provided njC1 m C 1 nj > m,
which is the case if n/k is large enough. Although Sn0 and Sn00 are not
independent of each other, we shall show that the latter is comparatively
negligible so that Sn behaves like Sn0 . Observe that each term Xr in Sn00
is independent of every term Xs in Sn0 except at most m terms, and that
E Xr Xs D 0 when they are independent, while jE Xr Xs j M2 otherwise.
Since there are km terms in Sn00 , it follows that
jE Sn0 Sn00 j km Ð m Ð M2 D kmM2 .
We have also
k1
E S00 n D
2
E Z2j kmM2 .
jD0
we obtain
jE Sn2 E S0 n j 3km2 M2 .
2
Hence, first, Sn00 /sn ! 0 in pr. (Theorem 4.1.4) and, second, since
Sn s0 S 0 S00
D n 0n C n ,
sn sn sn sn
Sn /sn will converge in dist. to 8 if Sn0 /sn0 does.
Since kn is a function of n, in the notation above Yj should be replaced
by Ynj to form the double array fYnj , 0 j kn 1, 1 ng, which retains
independence in each row. We have, since each Ynj is the sum of no more
than [n/kn ] C 1 of the Xn ’s,
n
jYnj j C 1 M D On1/3 D osn D osn0 ,
kn
226 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
the last relation from (1) and the one preceding it from a hypothesis of the
theorem. Thus for each > 0, we have for all sufficiently large n:
x 2 dFnj x D 0, 0 j kn 1,
jxj>sn0
where Fnj is the d.f. of Ynj . Hence Lindeberg’s condition is satisfied for the
double array fYnj /sn0 g, and we conclude that Sn0 /sn0 converges in dist. to 8.
This establishes the theorem as remarked above.
The next extension of the central limit theorem is to the case of a random
number of terms (cf. the second part of Sec. 5.5). That is, we shall deal with
the r.v. S n whose value at ω is given by S n ω ω, where
n
Sn ω D Xj ω
jD1
The second factor on the right converges to 1 in pr., by (3). Hence a simple
argument used before (Theorem 4.4.6) shows that the theorem will be proved
if we show that
S n S[cn]
4 p !0 in pr.
[cn]
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 227
By (3), there exists n0 such that if n ½ n0 , then the set
3 D fω: an n ω bn g
has probability ½1 . If ω is in this set, then S n ω ω is one of the sums
Sj with an j bn . For [cn] < j bn , we have
9
P n D j; max jSj S[cn] j > [cn] C Pf n D jg
an jbn
an jbn j2[a
/ n ,bn ]
9
P max jSj S[cn] j > [cn] C P f n 2
/ [an , bn ]g
an jbn
2 C 1 P f3g 3.
where a, b D 1 if ab < 0 and 0 otherwise. Thus the last two examples
represent, respectively, the “number of sums ½ a” and the “number of changes
of sign”. Now the central idea, originating with Erdös and Kac, is that the
asymptotic behavior of these functionals of Sn should be the same regardless of
the special properties of fXj g, so long as the central limit theorem applies to it
(at least when certain regularity conditions are satisfied, such as the finiteness
of a higher moment). Thus, in order to obtain the asymptotic distribution of
one of these functionals one may calculate it in a very particular case where
the calculations are feasible. We shall illustrate this method, which has been
called an “invariance principle”, by carrying it out in the case of max Sm ; for
other cases see Exercises 6 and 7 below.
Let us therefore put, for a given x:
p
Pn x D P max Sm x n .
1mn
Let also
p p
Ej D fω: Sm ω x n, 1 m < j; Sj ω > x ng;
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 229
n
p
C P fEj ; jSnj Sj j > ng D C ,
jD1 1 2
p
say. Since Ej is independent of fjSnj Sj j > ng and 2
Snj Sj
n/k, we have by Chebyshev’s inequality:
n
n
1
P Ej 2k 2 .
2 jD1
n k
p p
On the pother hand, since Sj > x n and jSnj Sj j n imply Snj >
x n, we have
p
P max Sn > x n D 1 Rnk x .
1k
1
It follows that
1
5 Pn x D 1 ½ Rnk x .
1 2
2 k
Since it is trivial that Pn x Rnk x, we obtain from (5) the following
inequalities:
1
6 Pn x Rnk x Pn x C C .
2 k
We shall show that for fixed x and k, limn!1 Rnk x exists. Since
p p p
Rnk x D P fSn1 x n, Sn2 x n, . . . , Snk x ng,
nj being asymptotically equal to n/k for each j. It is well known that the
ch.f. given in (7) is that of the k-dimensional normal distribution, but for our
purpose it is sufficient to know that the convergence theorem for ch.f.’s holds
in any dimension, and so Rnk converges vaguely to R1k , where R1k is some
fixed k-dimensional distribution.
Now suppose for a special sequence fX Q j g satisfying the same conditions
as fXj g, the corresponding PQ n can be shown to converge (“pointwise” in fact,
but “vaguely” if need be):
1
R1k x Gx C C .
2 k
Letting k ! 1, we conclude that Pn converges vaguely to G, since is
arbitrary.
It remains to prove (8) for a special choice of fXj g and determine G. This
can be done most expeditiously by taking the common d.f. of the Xj ’s to be
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 231
the symmetric Bernoullian 12 υ1 C υ1 . In this case we can indeed compute
the more specific probability
9 P max Sm < x; Sn D y ,
1mn
where x and y are two integers such that x > 0, x > y. If we observe that
in our particular case max1mn Sm ½ x if and only if Sj D x for some j,
1 j n, the probability in (9) is seen to be equal to
P fSn D yg P max Sm ½ x; Sn D y
1mn
n
D P fSn D yg P fSm < x, 1 m < j; Sj D x; Sn D yg
jD1
n
D P fSn D yg P fSm < x, 1 m < j; Sj D x; Sn Sj D y xg
jD1
n
D P fSn D yg P fSm < x, 1 m < j; Sj D xgP fSn Sj D y xg,
jD1
n
D P fSn D yg P fSm < x, 1 m < j; Sj D x; Sn Sj D x yg
jD1
n
D P fSn D yg P fSm < x, 1 m < j; Sj D x; Sn D 2x yg
jD1
D P fSn D yg P max Sm ½ x; Sn D 2x y .
1mn
and we have proved that the value of the probability in (9) is given by
(10). The trick used above, which consists in changing the sign of every Xj
after the first time when Sn reaches the value x, or, geometrically speaking,
reflecting the path fj, Sj , j ½ 1g about the line Sj D x upon first reaching it,
is called the “reflection principle”. It is often attributed to Desiré André in its
combinatorial formulation of the so-called “ballot problem” (see Exercise 5
below).
The value of (10) is, of course, well known in the Bernoullian case, and
summing over y we obtain, if n is even:
1 n n
P max Sm < x D n y n 2x C y
1mn
y<x
2n
2 2
1 n n
D n y n C 2x y
2n
y<x 2 2
1 n
D n
2 j
nx nCx
2 <j 2
1 n 1 n
D n C n nCx ,
2 n x
j 2
jj j< 2
2 2
6 7 p
where nj D 0 if jjj > n or if j is not an integer. Replacing x by x n (or
p
[x n] if one is pedantic) in the last expression, and using the central limit
theorem for the Bernoullian case in the form of (12) of Sec. 7.2 with p D q D
1
2 , we see that the preceding probability tends to the limit
+
x x
1 y 2 /2 2
ey
2 /2
p e dy D dy
2 x 0
EXERCISES
1
1 n n
D n n C 2kx C y z n C 2kx y z .
2 kD1
2 2
This can be done by starting the sample path at z and reflecting at both barriers
0 and x (Kelvin’s method of images). Next, show that
p p
lim P fz n < Sm < x z n for 1 m ng
n!1
1 2kC1xz 2kxz
1 2
Dp ey /2 dy.
2 kD1 2kxz 2k1xz
Finally, use the Fourier series for the function h of period 2x:
1, if x z < y < z;
hy D
C1, if z < y < x z;
where
1 12j
p2j D P fS2j D 0g D ¾p .
22j j
j
as j ! 1. To evaluate the multiple sum, say r, use induction on r as
follows. If
n r/2
r ¾ cr
2
7.4 ERROR ESTIMATION 235
as n ! 1, then
1
cr n rC1/2
r C 1 ¾ p z1/2 1 zr/2 dz .
0 2
Thus
r
0 C1
crC1 D cr 2 .
rC1
0 C1
2
Finally
rC1
r 2r/2 0 1
Nn 0r C 1 2
E p ! r D D x r dGx.
n r/2
2 0 C1 1 0
2 0
2
This result remains valid if the common d.f. F of Xj is of the integer lattice
type with mean 0 and variance 1. If F is not of the lattice type, no Sn need
ever be zero — but the “next nearest thing”, to wit the number of changes of
sign of Sn , is asymptotically distributed as G, at least under the additional
assumption of a finite third absolute moment.]
Set
1
2 1D sup jFx Gxj.
2M x
Then there exists a real number a such that we have for every T > 0:
T1
1 cos x
3 2MT1 3 dx
0 x2
1
1 cos Tx
fFx C a Gx C ag dx .
x 2
1
PROOF. Clearly the 1 in (2) is finite, since G is everywhere bounded by
(i) and (ii). We may suppose that the left member of (3) is strictly positive, for
otherwise there is nothing to prove; hence 1 > 0. Since F G vanishes at
š1 by (i), there exists a sequence of numbers fxn g converging to a finite limit
7.4 ERROR ESTIMATION 237
b such that Fxn Gxn converges to 2M1 or 2M1. Hence either Fb
Gb D 2M1 or Fb Gb D 2M1. The two cases being similar, we
shall treat the second one. Put a D b 1; then if jxj < 1, we have by (ii)
and the mean value theorem of differential calculus:
Gx C a ½ Gb C x 1M
and consequently
Fx C a Gx C a Fb [Gb C x 1M] D Mx C 1.
It follows that
1 1
1 cos Tx 1 cos Tx
2
fFx C a Gx C ag dx M x C 1 dx
1 x 1 x2
1
1 cos Tx
D 2M1 dx;
0 x2
1 1
1 cos Tx
C fFx C a Gx C ag dx
x 2
1 1
1 1 1
1 cos Tx 1 cos Tx
2M1 C 2
dx D 4M1 dx.
1 1 x 1 x2
Adding these inequalities, we obtain
1 1 1
1 cos Tx
fFx C a Gx C ag dx 2M1 C2
1 x2 0 1
1 1
1 cos Tx 1 cos Tx
dx D 2M1 3 C2 dx.
x2 0 0 x2
This reduces to (3), since
1
1 cos Tx T
dx D
0 x2 2
by (3) of Sec. 6.2, provided that T is so large that the left member of (3) is
positive; otherwise (3) is trivial.
Let 1 1
ft D eitx dFx, gt D eitx dGx.
1 1
238 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
Then we have
1 T
jft gtj 12
4 1 dt C .
M 0 t T
PROOF. That the integral on the right side of (4) is finite will soon be
apparent, and as a Lebesgue integral the value of the integrand at t D 0 may
be overlooked. We have, by a partial integration, which is permissible on
account of condition (iii):
1
5 ft gt D it fFx Gxgeitx dx;
1
and consequently
1
ft gt ita
e D fFx C a Gx C ageitx dx.
it 1
In particular, the left member above is bounded for all t 6D 0 by condition (iv).
Multiplying by T jtj and integrating, we obtain
T
ft gt ita
6 e T jtj dt
T it
T 1
D fFx C a Gx C ageitx T jtj dx dt.
T 1
We may invert the repeated integral by Fubini’s theorem and condition (iv),
and obtain (cf. Exercise 2 of Sec. 6.2):
1 T
1 cos Tx jft gtj
fFx C a Gx C ag dx T dt.
x 2 t
1 0
for the asymptotic expansion mentioned above). This should be compared with
the general discussion around Theorem 6.3.4. We shall now apply the lemma
to the specific case of Fn and 8 in Theorem 7.4.1. Let the ch.f. of Fn be fn ,
so that
kn
fn t D fnj t.
jD1
so that
2 3
nj 2 nj t 1 1 1
t C < C < .
2 6 8 48 4
or explicitly:
2
log fn t C t 1 0n jtj3 .
2 2
Since jeu 1j jujejuj for all u, it follows that
# $
t2 /2 0n jtj3 0n jtj3
jfn te 1j exp .
2 2
Since 0n jtj3 /2 1/16 and e1/16 2, this implies (8).
3
for the range of t specified in the lemma, proving (10).
Note that Lemma 4 is weaker than Lemma 3 but valid in a wider range.
We now combine them.
PROOF. If jtj < 1/20n 1/3 , this is implied by (8). If 1/20n 1/3 jtj <
1/40n , then 1 80n jtj3 , and so by (10):
jfn t et j jfn tj C et 2et 160n jtj3 et
2 2 2 2
/2 /2 /3 /3
.
PROOF OF THEOREM 7.4.1. Apply Lemma 2 with F D Fn and G D 8. The
M in condition (ii) of Lemma 1 may be taken to be 12 , since both Fn and 8
have mean 0 and variance 1, it follows from Chebyshev’s inequality that
1
Fx _ Gx
, if x < 0,
x2
1
1 Fx _ 1 Gx 2 , if x > 0;
x
and consequently
1
8x: jFx Gxj .
x2
Thus condition (iv) of Lemma 2 is satisfied. In (4) we take T D 1/40n ; we
have then from (4) and (11):
2 1/40n jfn t et /2 j
2
96
sup jFn x 8xj dt C p 0n
x 0 t 23
320n 1/40n 2 t2 /3 96
t e dt C p 0n
0 23
1
32 96
t2 et /3 dt C p
2
0n .
0 23
This establishes (1) with a numerical value for A0 (which may be somewhat
improved).
Although Theorem 7.4.1 gives the best possible uniform estimate of the
remainder Fn x 8x, namely one that does not depend on x, it becomes
less useful if x D xn increases with n even at a moderate speed. For instance,
we have 1
1
ey /2 dy C O0n ,
2
1 Fn xn D p
2 xn
where the first “principal” term on the right is asymptotically equal to
1
exn /2 .
2
p
2xn
p
Hence already when xn D 2 log1/0n this will be o0n for 0n ! 0 and
absorbed by the remainder. For such “large deviations”, what is of interest is
242 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
an asymptotic evaluation of
1 Fn xn
1 8xn
as xn ! 1 more rapidly than indicated above. This requires a different type
of approximation by means of “bilateral Laplace transforms”, which will not
be discussed here.
EXERCISES
Theorem 5.4.1); and On log log n1/2 were obtained successively by Haus-
dorff (1913), Hardy and Littlewood (1914), and Khintchine (1922); but in
1924 Khintchine gave the definitive answer:
n
Nn ω
lim + 2 D1
n!1 1
n log log n
2
for almost every ω. This sharp result with such a fine order of infinity as “log
log” earned its celebrated name. No less celebrated is the following extension
given by Kolmogorov
(1929). Let fXn , n ½ 1g be a sequence of independent
r.v.’s, Sn D njD1 Xj ; suppose that E Xn D 0 for each n and
sn
1 sup jXn ωj D o p ,
ω log log sn
where sn2 D 2
Sn , then we have for almost every ω:
Sn ω
2 lim 9 D 1.
n!1 2sn2 log log sn
The condition (1) was shown by Marcinkiewicz and Zygmund to be of the
best possible kind, but an interesting complement was added by Hartman and
Wintner that (2) also holds if the Xn ’s are identically distributed with a finite
second moment. Finally, further sharpening of (2) was given by Kolmogorov
and by Erdös in the Bernoullian case, and in the general case under exact
conditions by Feller; the “last word” being as follows: for any increasing
sequence ϕn , we have
!
P fSn ω > sn ϕn i.o.g D
0
1
according as the series
1
<
ϕn
eϕn
2
/2
1.
nD1
n D
We shall prove the result (2) under a condition different from (1) and
apparently overlapping it. This makes it possible to avoid an intricate estimate
concerning “large deviations” in the central limit theorem and to replace it by
an immediate consequence of Theorem 7.4.1.Ł It will become evident that the
Ł An alternative which bypasses Sec. 7.4 is to use Theorem 7.1.3; the details are left as an
exercise.
244 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
proof of such a “strong limit theorem” (bounds with probability one) as the
law of the iterated logarithm depends essentially on the corresponding “weak
limit theorem” (convergence of distributions) with a sufficiently good estimate
of the remainder term.
The vital link just mentioned will be given below as Lemma 1. In the
rest of this section “A” will denote a generic strictly positive constant, not
necessarily the same at each appearance, and 3 will denote a constant such
that j3j A. We shall also use the notation in the preceding statement of
Kolmogorov’s theorem, and
n
n D E jXn j3 , 0n D j
jD1
as in Sec. 7.4, but for a single sequence of r.v.’s. Let us set also
<
ϕ, x D 2x 2 log log x, > 0, x > 0.
This dominates the second (remainder) term on the right side of (6), by (3)
since 0 < υ < . Hence (4) and (5) follow as rather weak consequences.
To establish (2), let us write for each fixed υ, 0 < υ < :
En C D fω: Sn ω > ϕ1 C υ, sn g,
En D fω: Sn ω > ϕ1 υ, sn g,
and proceed by steps.
1° . We prove first that
8 P fEn C i.o.g D 0
in the notation introduced in Sec. 4.2, by using the convergence part of
the
Borel–Cantelli lemma there. But it is evident from (4) that the series
C
npP E n is far from convergent, since sn is expected to be of the order
of n. The main trick is to apply the lemma to a crucial subsequence fnk g
(see Theorem 5.1.2 for a crude form of the trick) chosen to have two prop-
erties: first, k P Enk C converges, and second, “En C i.o.” already implies
“Enk C i.o.” nearly, namely if the given υ is slightly decreased. This modified
implication is a consequence of a simple but essential probabilistic argument
spelled out in Lemma 2 below.
Given c > 1, let nk be the largest value of n satisfying sn ck , so that
snk ck < snk C1 .
Since max1jn j /sn ! 0 (why?), we have snk C1 /snk ! 1, and so
9 snk ¾ ck
as k ! 1. Now for each k, consider the range of j below:
10 nk j < nkC1
and put
11 Fj D fω: jSnkC1 Sj j < snkC1 g.
By Chebyshev’s inequality, we have
sn2 kC1 sn2 k 1
P Fj ½ 1 ! ;
sn2 kC1 c2
hence P Fj ½ A > 0 for all sufficiently large k.
that there exists a constant A > 0 such that P Fj ½ A for every j. Then
we have
⎛ ⎞ ⎛ ⎞
n n
12 P⎝ Ej Fj ⎠ ½ AP ⎝ Ej ⎠ .
jD1 jD1
n
½ P Ec1 Ð Ð Ð Ecj1 Ej Ð A,
jD1
(i) SnkC1 ω Snk ω > ϕ1 υ/2, tk for infinitely many k;
(ii) Snk ω ½ ϕ2, snk for all sufficiently large k.
Using (9) and log log tk2 ¾ log log sn2 kC1 , we see that the expression in the right
side of (17) is asymptotically greater than
& 8 p '
υ 1 2
1 1 2 ϕ1, snkC1 > ϕ1 υ, snkC1 ,
2 c c
Theorem9 7.5.1. Under the condition (3), the lim sup and lim inf, as n ! 1,
of Sn / 2sn2 log log sn are respectively C1 and 1, with probability one.
The assertion about lim inf follows, of course, from (2) if we apply it
to fXj , j ½ 1g. Recall that (3) is more than sufficient to ensure the validity
of the central limit theorem, namely that Sn /sn converges in dist. to 8. Thus
the law of the iterated logarithm complements the central limit theorem by
circumscribing the extraordinary fluctuations of the sequence fSn , n ½ 1g. An
immediate consequence is that for almost every ω, the sample sequence Sn ω
changes sign infinitely often. For much more precise results in this direction
see Chapter 8.
In view of the discussion preceding Theorem 7.3.3, one may wonder
about the almost everywhere bounds for
It is interesting to observe that as far as the lim supn is concerned, these two
functionals behave exactly like Sn itself (Exercise 2 below). However, the
question of lim infn is quite different. In the case of max1mn jSm j, another
law of the (inverted) iterated logarithm holds as follows. For almost every ω,
we have
max1mn jSm ωj
lim 8 D 1;
n!1 2 sn 2
8 log log sn
under a condition analogous to but stronger than (3). Finally, one may wonder
about an asymptotic lower bound for jSn j. It is rather trivial to see that this
is always osn when the central limit theorem is applicable; but actually it is
even osn1 in some general cases. Indeed in the integer lattice case, under
the conditions of Exercise 9 of 7.3, we have “Sn D 0 i.o. a.e.” This kind of
phenomenon belongs really to the recurrence properties of the sequence fSn g,
to be discussed in Chapter 8.
EXERCISES
Show that for sufficiently large k the event ek \ fk implies the complement
of ekC1 ; hence deduce
⎛ ⎞
kC1
k
P⎝ ej ⎠ P ej0 [1 P fj ]
jDj0 jDj0
converges in dist. to F. What is the class of such F’s, and when does
such a convergence take place? For a single sequence of independent r.v.’s
fXj , j ½ 1g, similar questions may be posed for the “normed sums” Sn
an /bn .
These questions have been answered completely by the work of Lévy, Khint-
chine, Kolmogorov, and others; for a comprehensive treatment we refer to the
book by Gnedenko and Kolmogorov [12]. Here we must content ourselves
with a modest introduction to this important and beautiful subject.
We begin by recalling other cases of the above-mentioned limiting distri-
butions, conveniently displayed by their ch.f.’s:
1
ecjtj ,
it ˛
ee , > 0; 0 < ˛ < 2, c > 0.
The former is the Poisson distribution; the latter is called the symmetric
stable distribution of exponent ˛ (see the discussion in Sec. 6.5), including
the Cauchy distribution for ˛ D 1. We may include the normal distribution
7.6 INFINITE DIVISIBILITY 251
among the latter for ˛ D 2. All these are exponentials and have the further
property that their “nth roots”:
1
ec/njtj ,
2 2 it ˛
eait/n , e1/naitb t
, e/ne ,
are also ch.f.’s. It is remarkable that this simple property already characterizes
the class of distributions we are looking for (although we shall prove only
part of this fact here).
DEFINITION OF INFINITE DIVISIBILITY. A ch.f. f is called infinitely divisible iff
for each integer n ½ 1, there exists a ch.f. fn such that
1 f D fn n .
In terms of d.f.’s, this becomes in obvious notation:
F D FnŁ
n DF Ł Fn Ł Ð Ð Ð Ł Fn .
=n >? @
n factors
Theorem 7.6.1. An infinitely divisible ch.f. never vanishes (for real t).
PROOF. We shall see presently that a complex-valued ch.f. is trouble-some
when its “nth root” is to be extracted. So let us avoid this by going to the real,
using the Corollary to Theorem 6.1.4. Let f and fn be as in (1) and write
g D jfj2 , gn D jfn j2 .
For each t 2 R1 , gt being real and positive, though conceivably vanishing,
its real positive nth root is uniquely defined; let us denote it by [gt]1/n . Since
by (1) we have
gt D [gn t]n ,
and gn t ½ 0, it follows that
3 8t: gn t D [gt]1/n .
252 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
fn t D [ft]1/n
for some “determination” of the multiple-valued nth root on the right side. This
can be done by a simple process of “continuous continuation” of a complex-
valued function of a real variable. Although merely an elementary exercise in
“complex variables”, it has been treated in the existing literature in a cavalier
fashion and then misused or abused. For this reason the main propositions
will be spelled out in meticulous detail here.
Next, since f is uniformly continuous in [T, T], there exists a υT , 0 < υT <
T , such that if t and t0 both belong to [T, T] and jt t0 j υT , then jft
ft0 j T /2 12 . Now divide [T, T] into equal parts of length less than
υT , say:
T D tl < Ð Ð Ð < t1 < t0 D 0 < t1 < Ð Ð Ð < t D T.
For t1 t t1 , we define as follows:
1
1j
5 t D fft 1gj .
jD1
j
similarly in [tk1 , tk ] by replacing tk with tk everywhere on the right side
above. Since we have, for tk t tkC1 ,
T
ft ftk
2 D 1,
ftk T 2
the power series in (6) converges uniformly in [tk , tkC1 ] and represents a
continuous function there equal to that determination of the logarithm of the
function ft/ftk 1 which is 0 for t D tk . Specifically, for the “schlicht
neighborhood” jz 1j 12 , let
1
1j
7 Lz D z 1j
jD1
j
with a similar expression for tk1 t tk . Thus (4) is satisfied in [t1 , t1 ],
and if it is satisfied for t D tk , then it is satisfied in [tk , tkC1 ], since
ft
et D etk CLft/ftk D ftk D ft.
ftk
Thus (4) is satisfied in [T, T] by induction, and the theorem is proved for
such an interval. To prove it for 1, 1 let us observe that, having defined
in [n, n], we can extend it to [n 1, n C 1] with the previous method,
by dividing [n, n C 1], for example, into small equal parts whose length must
be chosen dependent on n at each stage (why?). The continuity of is clear
from the construction.
To prove the uniqueness of , suppose that 0 has the same properties as
. Since both satisfy equation (4), it follows that for each t, there exists an
integer mt such that
t 0 t D 2i mt.
The left side being continuous in t, mÐ must be a constant (why?), which
must be equal to m0 D 0. Thus t D 0 t.
and consequently
k ft
9
sup L 1.
jtjT ft
Since for each t, the exponentials of k t t and Lk ft/ft are equal,
there exists an integer-valued function k mt, jtj T, such that
k ft
10 L Dk t t C 2i k mt, jtj T.
ft
7.6 INFINITE DIVISIBILITY 255
Theorem 7.6.4. For each n, the fn in (1) is just the distinguished nth root
of f.
PROOF. It follows from Theorem 7.6.1 and (1) that the ch.f. fn never
vanishes in 1, 1, hence its distinguished logarithm n is defined. Taking
multiple-valued logarithms in (1), we obtain as in (10):
8t: t nn t D 2imn t,
where mn Ð takes only integer values. We conclude as before that mn Ð 0,
and consequently
distribution can be obtained as the limiting distribution of Sn D njD1 Xnj in
such an array.
It is trivial that the product of two infinitely divisible ch.f.’s is again such
a one, for we have in obvious notation:
1 f Ð2 f D 1 fn n Ð 2 fn n D 1 fn Ð2 fn n .
The next proposition lies deeper.
is an infinitely divisible ch.f., since it is obtained from the Poisson ch.f. with
parameter a by substituting ut for t. We shall call such a ch.f. a generalized
Poisson ch.f. A finite product of these:
⎡ ⎤
k k
16 Pt; aj , uj D exp ⎣ aj eituj 1⎦
jD1 jD1
is an infinitely divisible ch.f. Now it turns out that although this falls some-
what short of being the most general form of an infinitely divisible ch.f.,
we have nevertheless the following qualititive result, which is a complete
generalization of (16).
Theorem 7.6.6. For each infinitely divisible ch.f. f, there exists a double
array of pairs of real constants anj , unj , 1 j kn , 1 n, where aj > 0,
such that
kn
18 ft D lim Pt; anj , unj .
n!1
jD1
The converse is also true. Thus the class of infinitely divisible d.f.’s coincides
with the closure, with respect to vague convergence, of convolutions of a finite
number of generalized Poisson d.f.’s.
PROOF. Let f and fn be as in (1) and let be the distinguished logarithm
of f, Fn the d.f. corresponding to fn . We have for each t, as n ! 1:
n[fn t 1] D n[et/n 1] ! t
and consequently
19 en[fn t1] ! et D ft.
Actually the first member in (19) is a ch.f. by Theorem 6.5.6, so that the
convergence is uniform in each finite interval, but this fact alone is neither
necessary nor sufficient for what follows. We have
1
n[fn t 1] D eitu 1n dFn u.
1
258 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
fanj , unj ; 1 j kn g
where 1 < un1 < un2 < Ð Ð Ð < unkn < 1 and anj D n[Fn un,j Fn un,j1 ],
such that
1
kn itu 1
20
sup e nj
1anj e 1n dFn u .
itu
jtjn jD1 1 n
This and (19) imply (18). The converse is proved at once by Theorem 7.6.5.
We are now in a position to state the fundamental theorem on infinitely
divisible ch.f.’s, due to P. Lévy and Khintchine.
Theorem 7.6.7. Every infinitely divisible ch.f. f has the following canonical
representation:
# 1 $
itu 1 C u2
ft D exp ait C eitu 1 dGu
1 1 C u2 u2
where a is a real constant, G is a bounded increasing function in 1, 1,
and the integrand is defined by continuity to be t2 /2 at u D 0. Furthermore,
the class of infinitely divisible ch.f.’s coincides with the class of limiting ch.f.’s
n
of kjD1 Xnj an in a holospoudic double array
fXnj , 1 j kn , 1 ng,
Note that we have proved above that every infinitely divisible ch.f. is in
the class of limiting ch.f.’s described here, although we did not establish the
canonical representation. Note also that if the hypothesis of “holospoudicity”
is omitted, then every ch.f. is such a limit, trivially (why?). For a complete
proof of the theorem, various special cases, and further developments, see the
book by Gnedenko and Kolmogorov [12].
7.6 INFINITE DIVISIBILITY 259
Let us end this section with an interesting example. Put s D C it, >1
and t real; consider the Riemann zeta function:
1
1
1 1
s D D 1 s ,
nD1
ns p
p
where p ranges over all prime numbers. Fix > 1 and define
C it
ft D .
We assert that f is an infinitely divisible ch.f. For each p and every real t,
the complex number 1 p it lies within the circle fz : jz 1j < 12 g. Let
log z denote that determination of the logarithm with an angle in (, ]. By
looking at the angles, we see that
1 p
log D log1 p log1 p it
1 p it
1
1
D m
em log pit 1
mD1
mp
1
D log Pt; m1 pm , m log p.
mD1
Since
ft D lim Pt; m1 pm , m log p,
n!1
pn
EXERCISES
Indeed, they do not even imply the existence of two subsequences fm g and
fn g such that
8t: lim um n t D ut.
!1
then ϕ satisfies Cauchy’s functional equation and must be of the form ecit ,
which is a ch.f. These approaches are fancier than the simple one indicated
in the hint for the said exercise, but they are interesting. There is no known
quick proof by “taking logarithms”, as some authors have done.]
Bibliographical Note
The most comprehensive treatment of the material in Secs. 7.1, 7.2, and 7.6 is by
Gnedenko and Kolmogorov [12]. In this as well as nearly all other existing books
on the subject, the handling of logarithms must be strengthened by the discussion in
Sec. 7.6.
For an extensive development of Lindeberg’s method (the operator approach) to
infinitely divisible laws, see Feller [13, vol. 2].
Theorem 7.3.2 together with its proof as given here is implicit in
W. Doeblin, Sur deux problèmes de M. Kolmogoroff concernant les chaines
dénombrables, Bull. Soc. Math. France 66 (1938), 210–220.
It was rediscovered by F. J. Anscombe. For the extension to the case where the constant
c in (3) of Sec. 7.3 is replaced by an arbitrary r.v., see H. Wittenberg, Limiting distribu-
tions of random sums of independent random variables, Z. Wahrscheinlichkeitstheorie
1 (1964), 7–18.
Theorem 7.3.3 is contained in
P. Erdös and M. Kac, On certain limit theorems of the theory of probability, Bull.
Am. Math. Soc. 52 (1946) 292–302.
The proof given for Theorem 7.4.1 is based on
P. L. Hsu, The approximate distributions of the mean and variance of a sample of
independent variables, Ann. Math. Statistics 16 (1945), 1–29.
This paper contains historical references as well as further extensions.
For the law of the iterated logarithm, see the classic
A. N. Kolmogorov, Über das Gesetz des iterierten Logarithmus, Math. Annalen
101 (1929), 126–136.
For a survey of “classical limit theorems” up to 1945, see
W. Feller, The fundamental limit theorems in probability, Bull. Am. Math. Soc.
51 (1945), 800–832.
262 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS
Kai Lai Chung, On the maximum partial sums of sequences of independent random
variables. Trans. Am. Math. Soc. 64 (1948), 205–233.
Infinitely divisible laws are treated in Chapter 7 of Lévy [11] as the analytical
counterpart of his full theory of additive processes, or processes with independent
increments (later supplemented by J. L. Doob and K. Ito).
New developments in limit theorems arose from connections with the Brownian
motion process in the works by Skorohod and Strassen. For an exposition see references
[16] and [22] of General Bibliography.
8 Random walk
In this chapter we adopt the notation N for the set of strictly positive integers,
and N0 for the set of positive integers; used as an index set, each is endowed
with the natural ordering and interpreted as a discrete time parameter. Simi-
larly, for each n 2 N, Nn denotes the ordered set of integers from 1 to n (both
inclusive); N0n that of integers from 0 to n (both inclusive); and N0n that of
integers beginning with n C 1.
On the probability triple , F , P , a sequence fXn , n 2 Ng where each
Xn is an r.v. (defined on and finite a.e.), will be called a (discrete parameter)
stochastic process. Various Borel fields connected with such a process will now
be introduced. For any sub-B.F. G of F , we shall write
1 X2G
Recall that the union 1 nD1 Fn is a field but not necessarily a B.F. The
smallest B.F. containing it, or equivalently containing every Fn , n 2 N, is
denoted by
1
A
F1 D Fn ;
nD1
2 P 313 .
PROOF. Let G be the collection of sets 3 for which the assertion of the
theorem is true. Suppose 3k 2 G for each k 2 N and 3k " 3 or 3k # 3. Then
3 also belongs to G , as we can easily see by first taking k large and then
applying the asserted property to 3k . Thus G is a monotone class. Since it is
trivial that G contains the field 1 nD1 Fn that generates F1 , G must contain
F1 by the Corollary to Theorem 2.1.2, proving the theorem.
Without using Theorem 2.1.2, one can verify that G is closed with respect
to complementation (trivial), finite union (by Exercise 1 of Sec. 2.1), and
countable union (as increasing limit of finite unions). Hence G is a B.F. that
must contain F1 .
It will be convenient to use a particular type of sample space . In the
notation of Sec. 3.4, let
ð ,
1
D n
nD1
where each n is a “copy” of the real line R1 . Thus is just the space of all
infinite sequences of real numbers. A point ω will be written as fωn , n 2 Ng,
and ωn as a function of ω will be called the nth coordinate (function) of ω.
8.1 ZERO-OR-ONE LAWS 265
Each n is endowed with the Euclidean B.F. B1 , and the product Borel field
F D F1 in the notation above) on is defined to be the B.F. generated by
the finite-product sets of the form
k
3 fω: ωnj 2 Bnj g
jD1
where Fn0 is the Borel field generated by fωk , k > ng. This is obvious by (4)
if 3 is of the form above, and since the class of 3 for which the assertion
holds is a B.F., the result is true in general.
Observe the following general relation, valid for each point mapping
and the associated inverse set mapping 1 , each function Y on and each
subset A of R1 :
1 clearly forms a B.F., which may be called the “all-or-nothing” field. This
B.F. will also be referred to as “almost trivial”.
Now that we have defined all these concepts for the general stochastic
process, it must be admitted that they are not very useful without further
specifications of the process. We proceed at once to the particular case below.
Aspects of this type of process have been our main object of study,
usually under additional assumptions on the distributions. Having christened
it, we shall henceforth focus our attention on “the evolution of the process
as a whole” — whatever this phrase may mean. For this type of process, the
specific probability triple described above has been constructed in Sec. 3.3.
Indeed, F D F1 , and the sequence of independent r.v.’s is just that of the
successive coordinate functions fωn , n 2 Ng, which, however, will also be
interchangeably denoted by fXn , n 2 Ng. If ϕ is any Borel measurable func-
tion, then fϕXn , n 2 Ng is another such process.
The following result is called Kolmogorov’s “zero-or-one law”.
Theorem 8.1.2. For an independent process, each remote event has proba-
bility zero or one.
PROOF. Let 3 2 1 0
nD1 Fn and suppose that P 3 > 0; we are going to
prove that P 3 D 1. Since Fn and Fn0 are independent fields (see Exercise 5
of Sec. 3.3), 3 is independent of every set in Fn for each n 2 N; namely, if
M2 1 nD1 Fn , then
If we set
P 3 \ M
P3 M D
P 3
PQ 3 D P 1 3.
Since 1 maps disjoint sets into disjoint sets, it is clear that PQ is a p.m. For
a finite-product set 3, such as the one in (3), it follows from (4) that
k
PQ 3 D Bnj D P 3.
jD1
Hence P and PQ coincide also on each set that is the union of a finite number
of such disjoint sets, and so on the B.F. F generated by them, according to
Theorem 2.2.3. This proves (7); (8) is proved similarly.
The following companion to Theorem 8.1.2, due to Hewitt and Savage,
is very useful.
P 3 1 Mk k .
we deduce that
Since 1 0
kDm Mk 2 Fnm , the set lim sup Mk belongs to
1 0
kDm Fnk , which is seen
to coincide with the remote field. Thus lim sup Mk has probability zero or one
by Theorem 8.1.2, and the same must be true of 3, since is arbitrary in the
inequality above.
Here is a more transparent proof of the theorem based on the metric on
the measure space , F , P given in Exercise 8 of Sec. 3.2. Since 3k and
Mk are independent, we have
3k \ Mk ! 3 \ 3
in the same sense. Since convergence of events in this metric implies conver-
gence of their probabilities, it follows that P 3 \ 3 D P 3P 3, and the
theorem is proved.
EXERCISES
and F are the infinite product space and field specified above.
1. Find an example of a remote field that is not the trivial one; to make
it interesting, insist that the r.v.’s are not identical.
2. An r.v. belongs to the all-or-nothing field if and only if it is constant
a.e.
3. If 3 is invariant then 3 D 3; the converse is false.
4. An r.v. is invariant [permutable] if and only if it belongs to the
invariant [permutable] field.
5. The set of convergence of an arbitrary
sequence of r.v.’s fYn , n 2 Ng
or of the sequence of their partial sums njD1 Yj are both permutable. Their
limits are permutable r.v.’s with domain the set of convergence.
270 RANDOM WALK
6. If a > 0, lim
n n!1 an exists > 0 finite
or infinite, and limn!1 anC1 /an
D 1, then the set of convergence of fan 1 njD1 Yj g is invariant. If an ! C1,
the upper and lower limits of this sequence are invariant r.v.’s.
7. The set fY 2 A i.o.g, where A 2 B1 , is remote but not necessarily
2n
invariant; the set f njD1 Yj A i.o.g is permutable but not necessarily remote.
Find some other essentially different examples of these two kinds.
8. Find trivial examples of independent processes where the three
numbers P 1 3, P 3, P 3 take the values 1, 0, 1; or 0, 12 , 1.
9. Prove that an invariant event is remote and a remote event is
permutable.
10. Consider the bi-infinite product space of all bi-infinite sequences of
real numbers fωn , n 2 Ng, where N is the set of all integers in its natural
(algebraic) ordering. Define the shift as in the text with N replacing N, and
show that it is 1-to-1 on this space. Prove the analogue of (7).
11. Show that the conclusion of Theorem 8.1.4 holds true for a sequence
of independent r.v.’s, not necessarily stationary, but satisfying the following
condition: for every j there exists a k > j such that Xk has the same distribu-
tion as Xj . [This remark is due to Susan Horn.]
n
12. Let fXn , n ½ 1g be independent r.v.’s with P fXn D 4 g D P fXn D
4n g D 2 . Then the remote field of fSn , n ½ 1g, where Sn D njD1 Xj , is not
1
trivial.
have, however, decided to call it a “random walk (process)”, although the use
of this term is frequently restricted to the case when is of the integer lattice
type or even more narrowly a Bernoullian distribution.
DEFINITION OF RANDOM WALK. A random walk is the process fSn , n 2 Ng
defined in (1) where fXn , n 2 Ng is a stationary independent process. By
convention we set also S0 0.
A similar definition applies in a Euclidean space of any dimension, but
we shall be concerned only with R1 except in some exercises later.
Let us observe that even for an independent process fXn , n 2 Ng, its
remotefield is in general different from the remote field of fSn , n 2 Ng, where
Sn D njD1 Xj . They are almost the same, being both almost trivial, for a
stationary independent process by virtue of Theorem 8.1.4, since the remote
field of the random walk is clearly contained in the permutable field of the
corresponding stationary independent process.
We add that, while the notion of remoteness applies to any process,
“(shift)-invariant” and “permutable” will be used here only for the underlying
“coordinate process” fωn , n 2 Ng or fXn , n 2 Ng.
The following relation will be much used below, for m < n:
nm
nm
Snm m ω D Xj m ω D XjCm ω D Sn ω Sm ω.
jD1 jD1
It follows from Theorem 8.1.3 that Snm and Sn Sm have the same distri-
bution. This is obvious directly, since it is just nmŁ .
As an application of the results of Sec. 8.1 to a random walk, we state
the following consequence of Theorem 8.1.4.
Theorem 8.2.1. Let Bn 2 B1 for each n 2 N. Then
P fSn 2 Bn i.o.g
is equal to zero or one.
PROOF. If is a permutation on Nm , then Sn ω D Sn ω for n ½ m,
hence the set
1
3m D fSn 2 Bn g
nDm
1
is unchanged under or . Since 3m decreases as m increases, it
follows that
1
3m
mD1
3 [f˛ D ng \ 3n ],
1n1
PROOF. Both assertions are summarized in the formula below. For any
3 2 F˛ , k 2 N, Bj 2 B1 , 1 j k, we have
k
8 P f3; X˛Cj 2 Bj , 1 j kg D P f3g Bj .
jD1
To prove (8), we observe that it follows from the definition of ˛ and F˛ that
9 3 \ f˛ D ng D 3n \ f˛ D ng 2 Fn ,
where 3n 2 Fn for each n 2 N. Consequently we have
P f3; ˛ D n; X˛Cj 2 Bj , 1 j kg D P f3n ; ˛ D n; XnCj 2 Bj , 1 j kg
D P f3; ˛ D ngP fXnCj 2 Bj , 1 j kg
k
D P f3; ˛ D ng Bj ,
jD1
and
8n 2 N: f˛ D ng D fSj 0 for 1 j n 1; Sn > 0g.
Define also the r.v. Mn as follows:
12 8n 2 N0 : Mn ω D max Sj ω.
0jn
The inclusion of S0 above in the maximum may seem artificial, but it does
not affect the next theorem and will be essential in later developments in the
next section. Since each Xn is assumed to be finite a.e., so is each Sn and Mn .
Since Mn increases with n, it tends to a limit, finite or positive infinite, to be
denoted by
13 Mω D lim Mn ω D sup Sj ω.
n!1 0j<1
Theorem 8.2.4. The statements (a), (b), and (c) below are equivalent; the
statements a0 , b0 , and c0 are equivalent.
(a) P f˛ < C1g D 1; a0 ) P f˛ < C1g < 1;
(b) P f lim Sn D C1g D 1; b0 ) P f lim Sn D C1g D 0;
n!1 n!1
0
(c) P fM D C1g D 1; c ) P fM D C1g D 0.
PROOF. If (a) is true, we may suppose ˛ < 1 everywhere. Consider the
r.v. S˛ : it is strictly positive by definition and so 0 < E S˛ C1. By the
Corollary to Theorem 8.2.3, fSˇkC1 Sˇk , k ½ 1g is a sequence of indepen-
dent and identically distributed r.v.’s. Hence the strong law of large numbers
(Theorem 5.4.2 supplemented by Exercise 1 of Sec. 5.4) asserts that, if ˛0 0
and S˛0 0:
1
n1
S ˇn
D Sˇ Sˇk ! E S˛ > 0 a.e.
n n kD0 kC1
This implies (b). Since limn!1 Sn M, (b) implies (c). It is trivial that (c)
implies (a). We have thus proved the equivalence of (a), (b), and (c). If a0
is true, then (a) is false, hence (b) is false. But the set
f lim Sn D C1g
n!1
is clearly permutable (it is even invariant, but this requires a little more reflec-
tion), hence b0 is true by Theorem 8.1.4. Now any numerical sequence with
finite upper limit is bounded above, hence b0 implies c0 . Finally, if c0 is
true then (c) is false, hence (a) is false, and a0 is true. Thus a0 , b0 , and
c0 are also equivalent.
8.2 BASIC NOTIONS 277
Theorem 8.2.5. For the general random walk, there are four mutually exclu-
sive possibilities, each taking place a.e.:
(i) 8n 2 N: Sn D 0;
(ii) Sn ! 1;
(iii) Sn ! C1;
(iv) 1 D limn!1 Sn < limn!1 Sn D C1.
PROOF. If X D 0 a.e., then (i) happens. Excluding this, let ϕ1 D limn Sn .
Then ϕ1 is a permutable r.v., hence a constant c, possibly š1, a.e. by
Theorem 8.1.4. Since
lim Sn D X1 C limSn X1 ,
n n
These double alternatives yield the new possibilities (ii), (iii), or (iv), other
combinations being impossible.
This last possibility will be elaborated upon in the next section.
EXERCISES
8.3 Recurrence
A basic question about the random walk is the range of the whole process:
1
nD1 Sn ω for a.e. ω; or, “where does it ever go?” Theorem 8.2.5 tells us
that, ignoring the trivial case where it stays put at 0, it either goes off to 1
or C1, or fluctuates between them. But how does it fluctuate? Exercise 9
below will show that the random walk can take large leaps from one end to
the other without stopping in any middle range more than a finite number of
times. On the other hand, it may revisit every neighborhood of every point an
infinite number of times. The latter circumstance calls for a definition.
8.3 RECURRENCE 279
Theorem 8.3.1. The set < is either empty or a closed additive group of real
numbers. In the latter case it reduces to the singleton f0g if and only if X 0
a.e.; otherwise < is either the whole R1 or the infinite cyclic group generated
by a nonzero number c, namely fšnc: n 2 N0 g.
PROOF. Suppose < 6D throughout the proof. To prove that < is a group,
let us show that if x is a possible value and y 2 <, then y x 2 <. Suppose
not; then there is a strictly positive probability that from a certain value of n
on, Sn will not be in a certain neighborhood of y x. Let us put for z 2 R1 :
It is clear from the preceding theorem that the key to recurrence is the
value 0, for which we have the criterion below.
then
5 P fjSn j < i.o.g D 0
then
7 P fjSn j < i.o.g D 1
Remark. Actually if (4) or (6) holds for any > 0, then it holds for
every > 0; this fact follows from Lemma 1 below but is not needed here.
PROOF. The first assertion follows at once from the convergence part of
the Borel–Cantelli lemma (Theorem 4.2.1). To prove the second part consider
F D lim inffjSn j ½ g;
n
namely F is the event that jSn j < for only a finite number of values of n.
For each ω on F, there is an mω such that jSn ωj ½ for all n ½ mω; it
follows that if we consider “the last time that jSn j < ”, we have
1
P F D P fjSm j < ; jSn j ½ for all n ½ m C 1g.
mD0
Since the two independent events jSm j < and jSn Sm j ½ 2 together imply
that jSn j ½ , we have
1
1 ½ P F ½ P fjSm j < gP fjSn Sm j ½ 2 for all n ½ m C 1g
mD1
1
D P fjSm j < gp2,1 0
mD1
8.3 RECURRENCE 281
by the previous notation (2), since Sn Sm has the same distribution as Snm .
Consequently (6) cannot be true unless p2,1 0 D 0. We proceed to extend
this to show that p2,k D 0 for every k 2 N. To this aim we fix k and consider
the event
Am D fjSm j < , jSn j ½ for all n ½ m C kg;
then Am and Am0 are disjoint whenever m0 ½ m C k and consequently (why?)
1
k½ P Am .
mD1
The argument above for the case k D 1 can now be repeated to yield
1
k½ P fjSm j < gp2,k 0,
mD1
Theorem 8.3.3. If the weak law of large numbers holds for the random walk
fSn , n 2 Ng in the form that Sn /n ! 0 in pr., then < 6D .
PROOF. We need two lemmas, the first of which is also useful elsewhere.
Lemma 2. Let the positive numbers fun mg, where n 2 N and m is a real
number ½1, satisfy the following conditions:
Then we have
1
10 un 1 D 1.
nD0
Remark. If (ii) is true for all integer m ½ 1, then it is true for all real
m ½ 1, with c doubled.
PROOF OF LEMMA 2. Suppose not; then for every A > 0:
1
1
1 1
[Am]
1> un 1 ½ un m ½ un m
nD0
cm nD0 cm nD0
8.3 RECURRENCE 283
1
[Am]
n
½ un .
cm nD0 A
Then condition (i) is obviously satisfied and condition (ii) with c D 2 follows
from Lemma 1. The hypothesis that Sn /n ! 0 in pr. may be written as
Sn
un υn D P < υ ! 1
n
for every υ > 0 as n ! 1, hence condition (iii) is also satisfied. Thus Lemma 2
yields
1
P fjSn j < 1g D C1.
nD0
Applying this to the “magnified” random walk with each Xn replaced by Xn /,
which does not disturb the hypothesis of the theorem, we obtain (6) for every
> 0, and so the theorem is proved.
In practice, the following criterion is more expedient (Chung and Fuchs,
1951).
Theorem 8.3.4. Suppose that at least one of E XC and E X is finite. The
< 6D if and only if E X D 0; otherwise case (ii) or (iii) of Theorem 8.2.5
happens according as E X < 0 or > 0.
PROOF. If 1 E X < 0 or 0 < E X C1, then by the strong law
of large numbers (as amended by Exercise 1 of Sec. 5.4), we have
Sn
! E X a.e.,
n
so that either (ii) or (iii) happens as asserted. If E X D 0, then the same law
or its weaker form Theorem 5.2.2 applies; hence Theorem 8.3.3 yields the
conclusion.
284 RANDOM WALK
The most exact kind of recurrence happens when the Xn ’s have a common
distribution which is concentrated on the integers, such that every integer is
a possible value (for some Sn ), and which has mean zero. In this case for
each integer c we have P fSn D c i.o.g D 1. For the symmetrical Bernoullian
random walk this was first proved by Pólya in 1921.
We shall give another proof of the recurrence part of Theorem 8.3.4,
namely that E X D 0 is a sufficient condition for the random walk to be recur-
rent as just defined. This method is applicable in cases not covered by the two
preceding theorems (see Exercises 6–9 below), and the analytic machinery
of ch.f.’s which it employs opens the way to the considerations in the next
section.
The starting point is the integrated form of the inversion formula in
Exercise 3 of Sec. 6.2. Talking x D 0, u D , and F to be the d.f. of Sn , we
have
1
1 1 1 cos t
11 P jSn j < ½ P jSn j < u du D ftn dt.
0 1 t2
Thus the series in (6) may be bounded below by summing the last expres-
sion in (11). The latter does not sum well as it stands, and it is natural to resort
to a summability method. The Abelian method suits it well and leads to, for
0 < r < 1:
1 1
1 1 cos t 1
12 r n P jSn j < ½ R dt,
nD0
1 t 2 1 rft
where R and I later denote the real and imaginary parts or a complex quan-
tity. Since
1 1r
13 R ½ >0
1 rft j1 rftj2
and 1 cos t/t2 ½ C2 for jtj < 1 and some constant C, it follows that
for < 1n the right member of (12) is not less than
C 1
14 R dt.
1 rft
Now the existence of E jXj implies by Theorem 6.4.2 that 1 ft D ot
as t ! 0. Hence for any given υ > 0 we may choose the above so that
j1 rftj2 1 r C r[1 Rft]2 C rIft2
8.3 RECURRENCE 285
As r " 1, the right member above tends to n3υ; since υ is arbitrary, we have
proved that the right member of (12) tends to C1 as r " 1. Since the series
in (6) dominates that on the left member of (12) for every r, it follows that
(6) is true and so 0 2 < by Theorem 8.3.2.
EXERCISES
f is the ch.f. of .
then the random walk is not recurrent. [HINT: Use Exercise 3 of Sec. 6.2 to
show that there exists a constant C such that
u
1 cos 1 x C 1/
P jSn j < C n dx du ftn dt,
R1 x2 2 0 u
then the random walk is recurrent. This implies by Exercise 5 above that
almost every Brownian motion path is everywhere dense in R2 . [HINT: Use
the generalization of Exercise 6 and show that
1 c
R ½ 2
1 ft1 , t2 t1 C t22
for sufficiently small jt1 j C jt2 j. One can also make a direct estimate:
c0
P jSn j < ½ .]
n
12. Prove that no truly 3-dimensional random walk, namely one whose
common distribution does not have its support in a plane, is recurrent. [HINT:
There exists A > 0 such that
A 3
2
ti x i dx
A iD1
8.3 RECURRENCE 287
then
Rf1 ft1 , t2 , t3 g ½ CQt1 , t2 , t3 .]
17. The basic argument in the proof of Theorem 8.3.2 was the “last time
in , ”. A harder but instructive argument using the “first time” may be
given as follows.
Ł This form of condition (iii0 ) is due to Hsu Pei; see also Chung and Lindvall, Proc. Amer. Math.
Soc. Vol. 78 (1980), p. 285.
288 RANDOM WALK
and consequently
1n
P fSn > 0g < 1.
n
n
[HINT: For the first series, consider the last time that Sn 0; for the third series,
apply Du Bois–Reymond’s test. Cf. Theorem 8.4.4 below; this exercise will
be completed in Exercise 15 of Sec. 8.5.]
where 0 < r < 1, t is real, and f is the ch.f. of X. Applying the principle just
enunciated, we break this up into two parts:
˛1 1
E n itSn
r e CE r n eitSn
nD0 nD˛
8.4 FINE STRUCTURE 289
with the understanding that on the set f˛ D 1g, the first sum above is 1 nD0
while the second is empty and hence equal to zero. Now the second part may
be written as
1 1
2 E ˛Cn itS˛Cn
r e DE r e ˛ itS˛
r n eitS˛Cn S˛ .
nD0 nD0
We have
1
4 1 E fr e ˛ itS˛
gD1 r n
eitSn dP ;
nD1 f˛Dng
and
˛1 1
k1
5 E n itSn
r e D r n eitSn dP
nD0 kD1 f˛Dkg nD0
1
D1C r n
eitSn dP
nD1 f˛>ng
where
p0 t 1, pn t D e itSn
dP D eitx Un dx;
f˛Dng R1
q0 t 1, qn t D eitSn dP D eitx Vn dx.
f˛>ng R1
Thus we have
1
1 rn
7 D exp ftn
1 rft nD1
n
1
r n # $
D exp e dP C
itSn itSn
e dP
nD1
n fSn >0g fSn 0g
where
1
rn
fC r, t D exp eitx n dx ,
nD1
n 0,1
1
rn
f r, t D exp C eitx n dx ,
nD1
n 1,0]
where each ϕn is the Fourier transform of a measure in (0, 1), while each
n is the Fourier transform of a measure in (1, 0]. Substituting (7) into (6)
and multiplying through by fC r, t, we obtain
The next theorem below supplies the basic analytic technique for this devel-
opment, known as the Wiener–Hopf technique.
where p0 t q0 t pŁ0 t q0Ł t 1; and for n ½ 1, pn and pŁn as func-
tions of t are Fourier transforms of measures with support in (0, 1); qn and
qnŁ as functions of t are Fourier transforms of measures in (1, 0]. Suppose
that for some r0 > 0 the four power series converge for r in (0, r0 ) and all
real t, and the identity
The theorem is also true if (0, 1) and (1, 0] are replaced by [0, 1) and
(1, 0), respectively.
PROOF. It follows from (9) and the identity theorem for power series that
for every n ½ 0:
n
n
Ł
10 pk tqnk t D pŁk tqnk t.
kD0 kD0
By hypothesis, the left member above is the Fourier transform of a finite signed
measure 1 with support in (0, 1), while the right member is the Fourier
transform of a finite signed measure with support in (1, 0]. It follows from
the uniqueness theorem for such transforms (Exercise 13 of Sec. 6.2) that
we must have 1 2 , and so both must be identically zero since they have
disjoint supports. Thus p1 pŁ1 and q1 q1Ł . To proceed by induction on n,
suppose that we have proved that pj pŁj and qj qjŁ for 0 j n 1.
Then it follows from (10) that
Exactly the same argument as before yields that pn pŁn and qn qnŁ . Hence
the induction is complete and the first assertion of the theorem is proved; the
second is proved in the same way.
finite or infinite.
(B) If cn are complex numbers and 1 nD0 cn r converges for 0 r 1,
n
1
(D) If cni ½ 0, i n
nD0 cn r converges for 0 r < 1 and diverges for
r D 1, i D 1, 2; and
n
n
ck 1 ¾ K cn2
kD0 kD0
as r " 1.
(E) If cn ½ 0 and
1
1
cn r n ¾
nD0
1r
as r " 1, then
n1
ck ¾ n.
kD0
Observe that (C) is a partial converse of (B), and (E) is a partial converse
of (D). There is also an analogue of (D), which is actually a consequence of
(B): if cn are complex numbers converging to a finite limit c, then as r " 1,
1
c
cn r n ¾ .
nD0
1r
We have
1
1
14 P f˛ < 1g D 1 if and only if P [Sn 2 A] D 1;
nD1
n
in which case
1
1
15 E f˛g D exp P [Sn 2 Ac ] .
nD1
n
PROOF. Setting t D 0 in (11), we obtain the first equation in (13), from
which the second follows at once through
1 1 1
1 rn rn rn
D exp D exp P [Sn 2 A] C P [Sn 2 Ac ] .
1r nD1
n nD1
n nD1
n
Since
1
1
lim E fr ˛ g D lim P f˛ D ngr n D P f˛ D ng D P f˛ < 1g
r"1 r"1
nD1 nD1
by proposition (A), the middle term in (13) tends to a finite limit, hence also
the power series in r there (why?). By proposition (A), the said limit may be
obtained by setting r D 1 in the series. This establishes (14). Finally, setting
t D 0 in (12), we obtain
˛1 1
rn
16 E r n
D exp C P [Sn 2 Ac ] .
nD0 nD1
n
Rewriting the left member in (16) as in (5) and letting r " 1, we obtain
1
1
17 lim r P [˛ > n] D
n
P [˛ > n] D E f˛g 1
r"1
nD0 nD0
by proposition (A). The right member of (16) tends to the right member of
(15) by the same token, proving (15).
When is E f˛g in (15) finite? This is answered by the next theorem.Ł
Theorem 8.4.4. Suppose that X 6 0 and at least one of E XC and E X
is finite; then
18 E X > 0 ) E f˛0,1 g < 1;
19 E X 0 ) E f˛[0,1 g D 1.
Ł It can be shown that S ! C1 a.e. if and only if E f˛
n 0,1 g < 1; see A. J. Lemoine, Annals
of Probability 2(1974).
8.4 FINE STRUCTURE 295
PROOF. If E X > 0, then P fSn ! C1g D 1 by the strong law of large
numbers. Hence P flimn!1 Sn D 1g D 0, and this implies by the dual
of Theorem 8.2.4 that P f˛1,0 < 1g < 1. Let us sharpen this slightly to
P f˛1,0] < 1g < 1. To see this, write ˛0 D ˛1,0] and consider Sˇn0 as
in the proof of Theorem 8.2.4. Clearly Sˇn0 0, and so if P f˛0 < 1g D 1,
one would have P fSn 0 i.o.g D 1, which is impossible. Now apply (14) to
˛1,0] and (15) to ˛0,1 to infer
1
1
E f˛0,1 g D exp P [Sn 0] < 1,
nD1
n
Corollary. We have
1
1
20 P [Sn D 0] < 1.
nD1
n
The astonishing part of Theorem 8.4.4 is the case when the random walk
is recurrent, which is the case if E X D 0 by Theorem 8.3.4. Then the set
[0, 1, which is more than half of the whole range, is revisited an infinite
number of times. Nevertheless (19) says that the expected time for even one
visit is infinite! This phenomenon becomes more paradoxical if one reflects
that the same is true for the other half 1, 0], and yet in a single step one
of the two halves will certainly be visited. Thus we have:
˛1,0] ^ ˛[0,1 D 1, E f˛1,0] g D E f˛[0,1 g D 1.
Similarly we obtain 8 > 0 : P fSn < 2n i.o.g D 0, and the last two rela-
tions together mean exactly Sn /n ! 0 a.e. (cf. Theorem 4.2.2).
Having investigated the stopping time ˛, we proceed to investigate the
stopping place S˛ , where ˛ < 1. The crucial case will be handled first.
the justification for termwise differentiation being easy, since E fjSnjg nE fjXjg.
If we now set D 0 in (25), the result is
1
1
rn rn
C
26 E fr S˛ g D
˛
E fSn g exp P [Sn > 0] .
nD1
n nD1
n
since
1 2n 1
¾p .
22n n n
It follows from proposition (D) above that
1
1
rn rn
E fSnC g ¾ p 1 r 1/2
D p exp C .
nD1
n 2 2 nD1
2n
Substituting into (26), and observing that as r " 1, the left member of (26)
tends to E fS˛ g 1 by the monotone convergence theorem, we obtain
1
rn # 1 $
27 E fS˛ g D p lim exp P Sn > 0 .
2 r"1 nD1
n 2
It remains to prove that the limit above is finite, for then the limit of the
power series is also finite (why? it is precisely here that the Laplace transform
saves the day for us), and since the coefficients are o1/n by the central
298 RANDOM WALK
limit theorem, and certainly O1/n in any event, proposition (C) above will
identify it as the right member of (23) with A D 0, 1.
Now by analogy with (27), replacing 0, 1 by 1, 0] and writing
˛1,0] as ˇ, we have
1
rn # 1 $
28 E fSˇ g D p lim exp P Sn 0 .
2 r"1 nD1
n 2
Clearly the product of the two exponentials in (27) and (28) is just exp 0 D 1,
hence if the limit in (27) were C1, that in (28) would have to be 0. But
since E X D 0 and E X2 > 0, we have P X < 0 > 0, which implies at
once P Sˇ < 0 > 0 and consequently E fSˇ g < 0. This contradiction proves
that the limits in (27) and (28) must both be finite, and the theorem is proved.
Theorem 8.4.7. Suppose that X 6 0 and at least one of E XC and E X
is finite; and let ˛ D ˛0,1 , ˇ D ˛1,0] .
(i) If E X > 0 but may be C1, then E S˛ D E ˛E X.
(ii) If E X D 0, then E S˛ and E Sˇ are both finite if and only if
E X2 < 1.
PROOF. The assertion (i) is a consequence of (18) and Wald’s equation
(Theorem 5.5.3 and Exercise 8 of Sec. 5.5). The “if” part of assertion (ii)
has been proved in the preceding theorem; indeed we have even “evaluated”
E S˛ . To prove the “only if” part, we apply (11) to both ˛ and ˇ in the
preceding proof and multiply the results together to obtain the remarkable
equation:
1
rn
29 [1 E fr e g][1 E fr e g] D exp
˛ itS˛ ˇ itSˇ
ftn D 1 rft.
nD1
n
8.5 Continuation
Our next study concerns the r.v.’s Mn and M defined in (12) and (13) of
Sec. 8.2. It is convenient to introduce a new r.v. Ln , which is the first time
8.5 CONTINUATION 299
n
n1
D fSn > Snk D fSn > Sk g D fLn D ng.
kD1 kD0
Applying (5) and (12) of Sec. 8.4 to ˇ and substituting (4), we obtain
1
1 rn
5 r n
e dP D exp C
itSn
eitSn dP ;
nD0 fLn Dng nD1
n fSn >0g
applying (5) and (12) of Sec. 8.4 to ˛ and substituting the obvious relation
f˛ > ng D fLn D 0g, we obtain
1
1 rn
6 r n
e dP D exp C
itSn
eitSn dP .
nD0 fL n D0g nD1
n fSn 0g
We are ready for the main results for Mn and M, known as Spitzer’s identity:
n
eiuSnk ° dP
k
D eitSk dP
kD0 fLk Dkg fLnk ° k D0g
n
D eitSk dP eiuSnk dP .
kD0 fLk Dkg fLnk D0g
It follows that
1
12 r n E feitMn eiuSn Mn g
nD0
1
1
D rn eitSn dP Ð rn eiuSn dP .
nD0 fLn Dng nD0 fLn D0g
Setting u D 0 in (12) and using (5) as it stands and (6) with t D 0, we obtain
1
1 r n # $
r E fe
n itMn
g D exp eitSn dP C 1dP ,
nD0 nD1
n fSn >0g fSn 0g
and consequently
1
E feitM g D lim1 r r n E feitMn g
r"1
nD0
1
1
rn rn C
D lim exp exp E eitSn
r"1
nD1
n nD1
n
1
rn C
D lim exp [E eitSn 1] ,
r"1
nD1
n
where the first equation is by proposition (B) in Sec. 8.4. Since
1 1
1 C 2
jE eitSn 1j P [Sn > 0] < 1
nD1
n nD1
n
by (8), the last-written limit above is equal to the right member of (9), by
proposition (B). Theorem 8.5.1. is completely proved.
By switching to Laplace transforms in (7) as done in the proof of
Theorem 8.4.6 and using proposition (E) in Sec. 8.4, it is possible to give
an “analytic” derivation of the second assertion of Theorem 8.5.1 without
recourse to the “probabilistic” result in Theorem 8.2.4. This kind of mathemat-
ical gambit should appeal to the curious as well as the obstinate; see Exercise 9
below. Another interesting exercise is to establish the following result:
n
1
13 E Mn D E SkC
kD1
k
by differentiating (7) with respect to t. But a neat little formula such as (13)
deserves a simpler proof, so here it is.
Proof of (13). Writing
⎛ ⎞C
A
n A
n
Mn D Sj D ⎝ Sj ⎠
jD0 jD1
& '
A
n A
n
D X1 C 0 Sk X1 C Sk
fMn >0;Sn >0g kD2 fMn >0;Sn 0g kD1
& '
A
n A
n1
D X1 C 0 Sk X1 C Sk .
fSn >0g fSn >0g kD2 fMn1 >0;Sn 0g kD1
302 RANDOM WALK
Call the last three integrals 1 , 2 , and 3 . We have on grounds of symmetry
1
D Sn .
1 n fSn >0g
Apply the cyclical permutation
1, 2, . . . , n
2, 3, . . . , 1
to 2 to obtain
D Mn1 .
2 fSn >0g
Obviously, we have
D Mn1 D Mn1 .
3 fMn1 >0;Sn 0g fSn 0g
1
E SnC C E Mn1 ,
D
n
and (13) follows by recursion.
Another interesting quantity in the development was the “number of
strictly positive terms” in the random walk. We shall treat this by combinatorial
methods as an antidote to the analytic skulduggery above. Let us define
For easy reference let us repeat two previous definitions together with two
new ones below, for n 2 N0 :
Mn ω D max Sj ω; Ln ω D minfj 2 N0n : Sj ω D Mn ωg;
0jn
M0n ω D min Sj ω; Ln0 ω D maxfj 2 N0n : Sj ω D M0n ωg.
0jn
Now for n ω the first n C 1 partial sums Sj n ω, j 2 N0n , are
0, ωn , ωn C ωn1 , . . . , ωn C Ð Ð Ð C ωnjC1 , . . . , ωn C Ð Ð Ð C ω1 ,
D Pf n1 D k; Sn xg,
304 RANDOM WALK
where F is the common d.f. of each Xn . Now observe that on the set fSn 0g
we have Ln D Ln1 by definition, hence if x 0:
The left members in (21) and (22) below are equal by the lemma above, while
the right members are trivially equal since n C n0 D n:
subtracting (22) from (23), we obtain the equation in (20) for x ½ 0; hence it
is true for every x, proving the assertion about (16). Similarly for (160 ), and
the induction is complete.
As an immediate consequence of Theorem 8.5.2, the obvious relation
(10) is translated into the by-no-means obvious relation (24) below:
PROOF. Let us denote the number on right side of (25), which is equal to
1 2k 2n 2k
,
22n k nk
Pf 1 D 0g D P f 1 D 1g D 1
2 D an 0 D an n,
so that (25) holds trivially. Suppose now that it holds when n is replaced by
n 1; then for k 2 Nn1 we have by (24):
12 12
Pf n D kg D 1k 1nk D an k.
k nk
It follows that
n1
Pf n D 0g C P fvn D ng D 1 Pf n D kg
kD1
n1
D1 an k D an 0 C an n.
kD1
Under the hypotheses of the theorem, it is clear by considering the dual random
walk that the two terms in the first member above are equal; since the two
terms in the last member are obviously equal, they are all equal and the
theorem is proved.
Stirling’s formula and elementary calculus now lead to the famous “arcsin
law”, first discovered by Paul Lévy (1939) for Brownian motion.
This limit theorem also holds for an independent, not necessarily stationary
process, in which each Xn has mean 0 and variance 1 and such that the
classical central limit theorem is applicable. This can be proved by the same
method (invariance principle) as Theorem 7.3.3.
306 RANDOM WALK
EXERCISES
exists and is finite, say D . Now use proposition (E) in Sec. 8.4 and apply
the convergence theorem for Laplace transforms (Theorem 6.6.3).
10. Define a sequence of r.v.’s fY , n 2 N0 g as follows:
n
[HINT: Consider
1
lim1 r1/2 rnP [ n D 0]
r"1
nD0
308 RANDOM WALK
then pn ¾ n1/2 .]
14. Prove Theorem 8.5.4.
15. For an arbitrary random walk, we have
1n
P fSn > 0g < 1.
n
n
[HINT: Half of the result is given in Exercise 18 of Sec. 8.3. For the remaining
case, apply proposition (C) in the O-form to equation (5) of Sec. 8.5 with Ln
replaced by n and t D 0. This result is due to D. L. Hanson and M. Katz.]
Bibliographical Note
P 3 \ E
1 P3 E D
P 3
Then we have
1
1
3 P E D P 3n \ E D P 3n P3n E;
nD1 nD1
1
1
4 E Y D YωP dω D P 3n E3n Y,
nD1 3n nD1
Thus EG Y is a discrete r.v. that assumes that value E3n Y on the set 3n ,
for each n. Now we can rewrite (4) as follows:
E Y D EG Y dP D EG Y dP .
n 3n
In particular, this shows that EG Y is integrable. Formula (6) equates two
integrals over the same set with an essential difference: while the integrand
Y on the left belongs to F , the integrand EG Y on the right belongs to the
subfield G . [The fact that EG Y is discrete is incidental to the nature of G .]
It holds for every 3 in the subfield G , but not necessarily for a set in F nG .
Now suppose that there are two functions ϕ1 and ϕ2 , both belonging to G ,
such that
83 2 G : Y dP D ϕi dP , i D 1, 2.
3 3
(a) it belongs to G ;
(b) it has the same integral as Y over any set in G .
It follows from the definition that for an integrable r.v. Y and a Borel
subfield G , we have
[Y E Y j G ] dP D 0,
3
Then Z D limm ϕm X D ϕX, and ϕ is Borel measurable, proving the lemma.
To prove the second assertion: given any B 2 B1 , let 3 D X1 B, then
by Theorem 3.2.2 we have
E Y j X dP D 1B XϕX dP D 1B xϕx d D ϕx d .
3 R1 B
9.1 BASIC PROPERTIES OF CONDITIONAL EXPECTATION 315
Hence by (6),
B D Y dP D ϕx d .
3 B
This being true for every B in B1 , it follows that ϕ is a version of the derivative
d/d . Theorem 9.1.2 is proved.
As a consequence of the theorem, the function E Y j X of ω is constant
a.e. on each set on which Xω is constant. By an abuse of notation, the ϕx
above is sometimes written as E Y j X D x. We may then write, for example,
for each real c:
Y dP D E Y j X D x dP fX xg.
fXcg 1,c]
E Y j J D E Y, E Y j F D Y; a.e.
where J is the trivial field f∅, g. If G is the field generated by one set
3 : f∅, 3, 3c , g, then E Y j G is equal to E Y j 3 on 3 and E Y j 3c
on 3c . All these equations, as hereafter, are between equivalent classes of
r.v.’s.
We shall suppose the pair F , P to be complete and each Borel subfield
G of F to be augmented (see Exercise 20 of Sec. 2.2). But even if G is
not augmented and G is its augmentation, it follows from the definition that
E Y j G D E Y j G , since an r.v. belonging to G is equal to one belonging
to G almost everywhere (why ?). Finally, if G0 is a field generating G , or just
a collection of sets whose finite disjoint unions form such a field, then the
validity of (6) for each 3 in G0 is sufficient for (6) as it stands. This follows
easily from Theorem 2.2.3.
The next result is basic.
If we try to extend one of the usual proofs based on the positiveness of the
quadratic form in : E X C Y2 j G , the question arises that for each
the quantity is defined only up to a null set N , and the union of these over
all cannot be ignored without comment. The reader is advised to think this
difficulty through to its logical end, and then get out of it by restricting the
’s to the rationals. Here is another way out: start from the following trivial
inequality:
jXjjYj X2 Y2
2 C 2,
˛ˇ 2˛ 2ˇ
where ˛ D E X2 j G 1/2 , ˇ D E Y2 j G 1/2 , and ˛ˇ>0; apply the operation
E f j G g using (ii) and (iii) above to obtain
2 2
jXYj 1 X 1 Y
C G .
ˇ2
E G E 2
G E
˛ˇ 2 ˛ 2
Now use Theorem 9.1.3 to infer that this can be reduced to
1 1 ˛2 1 ˇ2
E fjXYj j G g C D 1,
˛ˇ 2 ˛2 2 ˇ2
the desired inequality.
The following theorem is a generalization of Jensen’s inequality in Sec. 3.2.
318 CONDITIONING. MARKOV PROPERTY. MARTINGALE
n
E ϕX j G D ϕyj P 3j j G ,
jD1
where njD1 P 3j j G D 1 a.e. Hence (11) is true in this case by the property
of convexity. In general let fXm g be a sequence of simple r.v.’s converging to
X a.e. and satisfying jXm j jXj for all m (see Exercise 7 of Sec. 3.2). If we
let m ! 1 below:
the left-hand member converges to the left-hand member of (11) by the conti-
nuity of ϕ, but we need dominated convergence on the right-hand side. To get
this we first consider ϕn which is obtained from ϕ by replacing the graph of
ϕ outside n, n with tangential lines. Thus for each n there is a constant
Cn such that
8x 2 R1 : jϕn xj Cn jxj C 1.
Consequently, we have
and the last term is integrable by hypothesis. It now follows from property
(vii) of conditional expectations that
and
EF1 Y 2 F1 ² F2 ,
the second equation in (14) follows from the same observation. It remains
to prove the first equation in (14). Let 3 2 F1 , then 3 2 F2 ; applying the
defining relation twice, we obtain
EF1 EF2 Y dP D EF2 Y dP D Y dP .
3 3 3
Hence EF1 EF2 Y satisfies the defining relation for EF1 Y; since it belongs
to F1 , it is equal to the latter.
As a particular case, we note, for example,
15 E fE Y j X1 , X2 j X1 g D E Y j X1 D E fE Y j X1 j X1 , X2 g.
partitions yields sets of the form f3j \ Mk g. The “inner” expectation on the
left of (15) is the result of replacing Y by its “average” over each 3j \ Mk .
Now if we replace this average r.v. again by its average over each 3j , then
the result is the same as if we had simply replaced Y by its average over each
3j . The second equation has a similar interpretation.
Another kind of simple situation is afforded by the probability triple
U n , Bn , mn discussed in Example 2 of Sec. 3.3. Let x1 , . . . , xn be the coor-
dinate r.v.’s žžž D fx1 , . . . , xn , where f is (Borel) measurable and inte-
grable. It is easy to see that for 1 k n 1,
1 1
E y j x1 , . . . , xk D ÐÐÐ fx1 , . . . , xn dxkC1 Ð Ð Ð dxn ,
0 0
while for k D n, the left side is just y (a.e.). Thus, taking conditional expec-
tation with respect to certain coordinate r.v.’s here amounts to integrating out
the other r.v.’s. The first equation in (15) in this case merely asserts the possi-
bility of iterated integration, while the second reduces to a banality, which we
leave to the reader to write out.
EXERCISES
8. In the case above, there exists an integrable function ϕÐ, Ð with the
property that for each B 2 B1 ,
ϕx, y dy
B
The term “wide sense” refers to the fact that these are functions on R1 rather
than on ; see Doob [1, Sec. I.9].]
9. Let the pÐ, Ð above be the 2-dimensional normal density:
2
1 1 x 2xy y2
9 exp C 2 ,
2 1 2 1 2 21 2 2
1 1 2 2
where 1 >0, 2 >0, 0 < < 1. Find the ϕ mentioned in Exercise 8 and
1
yϕx, y dy.
1
E Y2 j G D X2 , E Y j G D X.
Then Y D X a.e.
11. As in Exercise 10 but suppose now for any f 2 CK ;
Then Y D X a.e. [HINT: By a monotone class theorem the equations hold for
f D 1B , B 2 B1 ; now apply Exercise 10 with G D F fXg.]
12. Recall that Xn in L 1 converges weakly in L 1 to X iff E Xn Y !
E XY for every bounded r.v. Y. Prove that this implies E Xn j G converges
weakly in L 1 to E X j G for any Borel subfield G of F .
13. Let S be an r.v. such that P fS > tg D et , t>0. Compute E fS j S3tg
and E fS j S _ tg for each t > 0.
322 CONDITIONING. MARKOV PROPERTY. MARTINGALE
Theorem 9.2.1. For each ˛ 2 A let F ˛ denote the smallest B.F. containing
all Fˇ , ˇ 2 A f˛g. Then the F˛ ’s are conditionally independent relative to
G if and only if for each ˛ and 3˛ 2 F˛ we have
P 3˛ j F ˛
_ G D P 3˛ j G ,
where F ˛
_ G denotes the smallest B.F. containing F ˛
and G .
PROOF. It is sufficient to prove this for two B.F.’s F1 and F2 , since the
general result follows by induction (how?). Suppose then that for each 3 2 F1
we have
1 P 3 j F2 _ G D P 3 j G .
Let M 2 F2 , then
where the first equation follows from Theorem 9.1.5, the second and fourth
from Theorem 9.1.3, and the third from (1). Thus F1 and F2 are conditionally
independent relative to G . Conversely, suppose the latter assertion is true, then
E fP 3 j G 1M j G g D P 3 j G P M j G
D P 3M j G D E fP 3 j F2 _ G 1M j G g,
9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY 323
where the first equation follows from Theorem 9.1.3, the second by hypothesis,
and the third as shown above. Hence for every 1 2 G , we have
P 3 j G dP D P 3 j F2 _ G dP D P 3M1
M1 M1
It follows from Theorem 2.1.2 (take F0 to be finite disjoint unions of sets like
M1) or more quickly from Exercise 10 of Sec. 2.1 that this remains true if
M1 is replaced by any set in F2 _ G . The resulting equation implies (1), and
the theorem is proved.
When G is trivial and each F˛ is generated by a single r.v., we have the
following corollary.
Corollary. Let fX˛ , ˛ 2 Ag be an arbitrary set of r.v.’s. For each ˛ let F ˛
denote the Borel field generated by all the r.v.’s in the set except X˛ . Then
the X˛ ’s are independent if and only if: for each ˛ and each B 2 B1 , we have
P fX˛ 2 B j F ˛
g D P fX˛ 2 Bg a.e.
An equivalent form of the corollary is as follows: for each integrable r.v.
Y belonging to the Borel field generated by X˛ , we have
2 E fY j F ˛
g D E fYg.
This is left as an exercise.
Roughly speaking, independence among r.v.’s is equivalent to the lack
of effect by conditioning relative to one another. However, such intuitive
statements must be treated with caution, as shown by the following example
which will be needed later.
If F1 , F2 , and F3 are three Borel fields such that F1 _ F2 is independent
of F3 , then for each integrable X 2 F1 , we have
3 E fX j F2 _ F3 g D E fX j F2 g.
Instead of a direct verification, which is left to the reader, it is interesting
to deduce this from Theorem 9.2.1 by proving the following proposition.
If F1 _ F2 is independent of F3 , then F1 and F3 are conditionally inde-
pendent relative to F2 .
To see this, let 31 2 F1 , 33 2 F3 . Since
P 31 32 33 D P 31 32 P 33 D P 31 j F2 P 33 dP
32
where Bn D B. Sets of the form of 3 generate the Borel field F fS1 , . . . , Sn1 g.
It is therefore sufficient to verify that for each such 3, we have
n Bn Sn1 dP D P f3; Sn 2 Bn g.
3
If we proceed as before and write sn D njD1 xj , the left side above is equal to
ÐÐÐ n Bn sn1
n1
dx1 , . . . , dxn1
sj 2Bj ,1jn1
⎧ ⎫
⎨n ⎬
D ÐÐÐ n
dx1 , . . . , dxn D P [Sj 2 Bj ] ,
⎩ ⎭
sj 2Bj ,1jn jD1
as was to be shown. In the first equation above, we have made use of the fact
that the set of x1 , . . . , xn in Rn for which xn 2 Bn sn1 is exactly the set
for which sn 2 Bn , which is the formal counterpart of the heuristic argument
above. The theorem is proved.
The fact of the equality of the two extreme terms in (5) is a fundamental
property of the sequence fSn g. We have indeed made frequent use of this
in the study of sums of independent r.v.’s, particularly for a random walk
(Chapter 8), although no explicit mention has been made of it. It would be
instructive for the reader to review the material there and locate some instances
of its application. As an example, we prove the following proposition, where
the intuitive picture of conditioning is particularly clear.
Then we have
8n ½ 1: P fSj 2 0, A] for 1 j ng 1 υn .
Furthermore, given any finite interval I, there exists an >0 such that
P fSj 2 I, for 1 j ng 1 n .
PROOF. We write 3n for the event that Sj 2 0, A] for 1 j n; then
P f3n g D P f3n1 ; 0 < Sn Ag.
By the definition of conditional probability and (5), the last-written probability
is equal to
326 CONDITIONING. MARKOV PROPERTY. MARTINGALE
P f0 < Sn A j S1 , . . . , Sn1 g dP
3n1
D [Fn A Sn1 Fn 0 Sn1 ] dP ,
3n1
and the first assertion of the theorem follows by iteration. The second is proved
similarly by observing that P fXn C Ð Ð Ð C XnCm1 ½ mAg > υm and choosing
m so that mA exceeds the length of I. The details are left to the reader.
Let N0 D f0g [ N denote the set of positive integers. For a given sequence
of r.v.’s fXn , n 2 N0 g let us denote by F1 the Borel field generated by fXn , n 2
Ig, where I is a subset of N0 , such as [0, n], n, 1, or fng. Thus Ffng , F [0,n] ,
and Fn,1 have been denoted earlier by F fXn g, Fn , and F 0 n , respectively.
Theorem 9.2.4. The Markov property is equivalent to either one of the two
propositions below:
(7) 8n 2 N, M 2 Fn,1 : P fM j F [0,n] g D P fM j Xn g.
(8) 8n 2 N, M1 2 F [0,n] , M2 2 Fn,1 : P fM1 M2 j Xn g
D P fM1 j Xn gP fM2 j Xn g.
where the second and fourth equations follow from Theorems 9.1.3, the third
from assumption (7), and the fifth from Theorem 9.1.5.
Conversely, to prove that (8) implies (7), let 3 2 Ffng , M1 2 F [0,n , M2 2
Fn,1 . By the second equation in (9) applied to the fourth equation below,
328 CONDITIONING. MARKOV PROPERTY. MARTINGALE
we have
P M2 j Xn dP D E Y2 j Xn dP D Y1 E Y2 j Xn dP
3M1 3M1 3
D E fY1 E Y2 j Xn j Xn g dP
3
D E Y1 j Xn E Y2 j Xn dP
3
D P M1 j Xn P M2 j Xn dP
3
D P M1 M2 j Xn dP D P 3M1 M2 .
3
Since disjoint unions of sets of the form 3M1 as specified above generate the
Borel field F [0,n] , the uniqueness of P M2 j F [0,n] shows that it is equal to
P M2 j Xn , proving (7).
Finally, we prove the equivalence of the Markov property and the propo-
sition (7). Clearly the former is implied by the latter; to prove the converse
we shall operate with conditional expectations instead of probabilities and use
induction. Suppose that it has been shown that for every n 2 N, and every
bounded f belonging to F [nC1,nCk] , we have
D P f˛ D m; X˛ 2 B; Mg,
where the second equation follows from an application of (7), and the third from the optionality of $\alpha$, namely
$$\{\alpha=m\}\in\mathcal{F}_{[0,m]}.$$
This establishes (12).

Now let $\Lambda\in\mathcal{F}_\alpha$; then [cf. (3) of Sec. 8.2] we have
$$\Lambda = \bigcup_{n=0}^{\infty}\big(\{\alpha=n\}\cap\Lambda_n\big),$$
where $\Lambda_n\in\mathcal{F}_{[0,n]}$. It follows that
$$P\{\Lambda M\} = \sum_{n=0}^{\infty} P\{\alpha=n;\ \Lambda_n;\ M_n\} = \sum_{n=0}^{\infty}\int_{\{\alpha=n\}\cap\Lambda_n} P\{M_n\mid\mathcal{F}_{[0,n]}\}\,dP$$
$$= \sum_{n=0}^{\infty}\int_{\Lambda} P\{M_n\mid X_n\}\,1_{\{\alpha=n\}}\,dP = \int_{\Lambda} P\{M\mid\alpha,X_\alpha\}\,dP,$$
where the third equation is by an application of (7) while the fourth is by (12). This being true for each $\Lambda$, we obtain (11). The theorem is proved.

When $\alpha$ is a constant $n$, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the $\varphi_n$ above may be chosen to be independent of $n$).
There is a very general method of constructing a Markov process with given "transition probability functions", as follows. Let $P_0(\cdot)$ be an arbitrary p.m. on $(R^1,\mathcal{B}^1)$. For each $n\ge 1$ let $P_n(\cdot,\cdot)$ be a function of the pair $(x,B)$, where $x\in R^1$ and $B\in\mathcal{B}^1$, having the measurability properties below:
above to be $R^1$, and that the resulting collection of p.m.'s is mutually consistent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of the process $\{X_n,\ n\in N^0\}$ with these given "marginal distributions" is omitted here, the procedure being similar to but somewhat more sophisticated than that given in Theorem 3.3.4, which is a particular case. Assuming the existence, we will now show that it has the Markov property by verifying (6) briefly. By Theorem 9.1.5, it will be sufficient to show that one version of the left member of (6) is given by $P_{n+1}(X_n,B)$, which belongs to $\mathcal{F}_{\{n\}}$ by condition (b) above. Let then
$$\Lambda = \bigcap_{j=0}^{n}[X_j\in B_j],$$
and let $\mu^{(n+1)}$ be the $(n+1)$-dimensional p.m. of the random vector $(X_0,\dots,X_n)$. It follows from Theorem 3.2.3 and (13) used twice that
$$\int_{\Lambda} P_{n+1}(X_n,B)\,dP = \int\cdots\int_{B_0\times\cdots\times B_n} P_{n+1}(x_n,B)\,\mu^{(n+1)}(dx_0,\dots,dx_n)$$
$$= \int\cdots\int_{B_0\times\cdots\times B_n\times B} P_0(dx_0)\prod_{j=1}^{n+1} P_j(x_{j-1},dx_j),$$
and call it the "$n$-step transition probability function"; when $n=1$, the qualifier "1-step" is usually dropped. We also put $P^{(0)}(x,B) = 1_B(x)$. It is easy to see that
(15) $$P^{(n+1)}(x,B) = \int_{R^1} P^{(n)}(y,B)\,P^{(1)}(x,dy),$$
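To make the construction above concrete, here is a minimal simulation sketch. The particular kernels are assumptions of this illustration, not part of the text: $P_0$ is taken to be the standard normal p.m. and each $P_n(x,\cdot)$ the normal p.m. centered at $x$, so that $\{X_n\}$ is a Gaussian random walk; the $n$-step function of (15) is then estimated by Monte Carlo.

```python
import random

# Hypothetical choices (not from the text): P0 = N(0,1), Pn(x, .) = N(x,1).
def sample_P0():
    return random.gauss(0.0, 1.0)

def sample_Pn(x):              # one-step transition kernel P_n(x, .)
    return random.gauss(x, 1.0)

def sample_path(n):
    """Construct X_0, ..., X_n as in the text: X_0 ~ P0, X_{k+1} ~ P_{k+1}(X_k, .)."""
    x = sample_P0()
    path = [x]
    for _ in range(n):
        x = sample_Pn(x)
        path.append(x)
    return path

def n_step_prob(x, n, trials=20000):
    """Monte Carlo estimate of the n-step transition P^(n)(x, B) for B = (-1, 1]."""
    hits = 0
    for _ in range(trials):
        y = x
        for _ in range(n):
            y = sample_Pn(y)
        hits += (-1.0 < y <= 1.0)
    return hits / trials

print(n_step_prob(0.0, 3))     # compare with the N(0,3) mass of (-1,1], about 0.437
```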
Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. $\alpha$ which is optional relative to the process, the pre-$\alpha$ and post-$\alpha$ fields are conditionally independent relative to $X_\alpha$, namely:
$$\forall\Lambda\in\mathcal{F}_\alpha,\ M\in\mathcal{F}'_\alpha:\quad P\{\Lambda M\mid X_\alpha\} = P\{\Lambda\mid X_\alpha\}\,P\{M\mid X_\alpha\}.$$
Furthermore, the post-$\alpha$ process $\{X_{\alpha+n},\ n\in N\}$ is a homogeneous Markov process with the same transition probability function as the original one.
EXERCISES

In this form we can define a Markov process $\{X_t\}$ with a continuous parameter $t$ ranging in $[0,\infty)$.

$$g(x,B) = f(x,B) - \sum_{n=1}^{\infty}\int_B P^{(n)}(x,dy)\,[1 - f(y,B)].$$
11. Suppose that for a homogeneous Markov process the initial distribution has support in $N^0$ as a subset of $R^1$, and that for each $i\in N^0$,

is an infinite matrix called the "transition matrix". Show that $P^{(n)}$ as a matrix is just the $n$th power of $P^{(1)}$. Express the probability $P\{X_{t_k}=i_k,\ 1\le k\le n\}$ in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]
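A small numerical check of the assertion in Exercise 11 (the two-state matrix is an arbitrary assumption of this sketch): composing one-step transitions via the relation (15) gives the same matrix as the $n$th matrix power.

```python
import numpy as np

P1 = np.array([[0.9, 0.1],       # an assumed two-state transition matrix
               [0.4, 0.6]])

def n_step(P, n):
    """n-step transition matrix built by repeated one-step composition,
    i.e. the matrix form of the Chapman-Kolmogorov relation (15)."""
    Q = np.eye(P.shape[0])       # P^(0)(i, j) = 1_{i = j}
    for _ in range(n):
        Q = Q @ P
    return Q

n = 5
# ... which agrees with the matrix power (P1)^n, as Exercise 11 asserts.
assert np.allclose(n_step(P1, n), np.linalg.matrix_power(P1, n))
print(n_step(P1, n))
```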
12. A process $\{X_n,\ n\in N^0\}$ is said to possess the "$r$th-order Markov property", where $r\ge 1$, iff (6) is replaced by

for $n\ge r-1$. Show that if $r<s$, then the $r$th-order Markov property implies the $s$th. The ordinary Markov property is the case $r=1$.

13. Let $Y_n$ be the random vector $(X_n, X_{n+1},\dots,X_{n+r-1})$. Then the vector process $\{Y_n,\ n\in N^0\}$ has the ordinary Markov property (trivially generalized to vectors) if and only if $\{X_n,\ n\in N^0\}$ has the $r$th-order Markov property.

14. Let $\{X_n,\ n\in N^0\}$ be an independent process. Let
$$S_n^{(1)} = \sum_{j=0}^{n} X_j,\qquad S_n^{(r+1)} = \sum_{j=0}^{n} S_j^{(r)};$$
let $\{x_n,\ n\in N\}$ denote independent r.v.'s with mean zero and write $X_n = \sum_{j=1}^{n} x_j$ for the partial sum. Then we have
$$E(X_{n+1}\mid x_1,\dots,x_n) = E(X_n + x_{n+1}\mid x_1,\dots,x_n) = X_n + E(x_{n+1}\mid x_1,\dots,x_n) = X_n + E(x_{n+1}) = X_n.$$
Note that the conditioning with respect to $x_1,\dots,x_n$ may be replaced by conditioning with respect to $X_1,\dots,X_n$ (why?). Historically, the equation above led to the consideration of dependent r.v.'s $\{x_n\}$ satisfying the condition
(1) $$E(x_{n+1}\mid x_1,\dots,x_n) = 0.$$
It is astonishing that this simple property should delineate such a useful class of stochastic processes, which will now be introduced. In what follows, where the index set for $n$ is not specified, it is understood to be either $N$ or some initial segment $N_m$ of $N$.
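A quick numerical sanity check of the computation above may be useful (a sketch only; the uniform steps are an assumption of the illustration): averaging $X_{n+1}$ over many independent continuations of a fixed past reproduces $E(X_{n+1}\mid x_1,\dots,x_n) = X_n$ up to Monte Carlo error.

```python
import random

random.seed(7)

# A fixed past x_1, ..., x_n of zero-mean steps (uniform on [-1,1] assumed here).
past = [random.uniform(-1, 1) for _ in range(10)]
X_n = sum(past)

# E(X_{n+1} | x_1,...,x_n) = X_n + E(x_{n+1}) = X_n: estimate by fresh steps.
trials = 200000
avg = sum(X_n + random.uniform(-1, 1) for _ in range(trials)) / trials
print(X_n, avg)    # the two numbers agree up to Monte Carlo error
```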
It is called a supermartingale iff the "$=$" in (c) above is replaced by "$\ge$", and a submartingale iff it is replaced by "$\le$". For abbreviation we shall use the term smartingale to cover all three varieties. In case $\mathcal{F}_n = \mathcal{F}_{[1,n]}$ as defined in Sec. 9.2, we shall omit $\mathcal{F}_n$ and write simply $\{X_n\}$; more frequently however we shall consider $\{\mathcal{F}_n\}$ as given in advance and omitted from the notation. Condition (a) is nowadays referred to as: $\{X_n\}$ is adapted to $\{\mathcal{F}_n\}$. Condition (b) says that all the r.v.'s are integrable; we shall have to impose stronger conditions to obtain most of our results. A particularly important one is the uniform integrability of the sequence $\{X_n\}$, which is discussed in Sec. 4.5. A weaker condition is given by
(2) $$\sup_n E(|X_n|) < \infty;$$
when this is satisfied we shall say that $\{X_n\}$ is $L^1$-bounded. Condition (c) leads at once to the more general relation:
(3) $$n<m \Rightarrow X_n = E(X_m\mid\mathcal{F}_n).$$
This follows from Theorem 9.1.5 by induction since
$$E(X_m\mid\mathcal{F}_n) = E\big(E(X_m\mid\mathcal{F}_{m-1})\mid\mathcal{F}_n\big) = E(X_{m-1}\mid\mathcal{F}_n).$$
It is often safer to use the explicit formula (4) rather than (3), because condi-
tional expectations can be slippery things to handle. We shall refer to (3) or
(4) as the defining relation of a martingale; similarly for the “super” and “sub”
varieties.
Let us observe that in the form (3) or (4), the definition of a smartingale
is meaningful if the index set N is replaced by any linearly ordered set, with
“<” as the strict order. For instance, it may be an interval or the set of
rational numbers in the interval. But even if we confine ourselves to a discrete
parameter (as we shall do) there are other index sets to be considered below.
It is scarcely worth mentioning that $\{X_n\}$ is a supermartingale if and only if $\{-X_n\}$ is a submartingale, and that a martingale is both. However the
extension of results from a martingale to a smartingale is not always trivial, nor
is it done for the sheer pleasure of generalization. For it is clear that martingales
are harder to come by than the other varieties. As between the super and sub
cases, though we can pass from one to the other by simply changing signs,
our force of habit may influence the choice. The next proposition is a case in
point.
PROOF. For a martingale we have equality in (5) for any convex $\varphi$, hence we may take $\varphi(x) = |x|$, $|x|^p$ or $|x|\log^+|x|$ in the proof above.

Thus for a martingale $\{X_n\}$, all three transmutations $\{X_n^+\}$, $\{X_n^-\}$ and $\{|X_n|\}$ are submartingales. For a submartingale $\{X_n\}$, nothing is said about the last two.
(i) $Z_1 = 0$; $Z_n \le Z_{n+1}$ for $n\ge 1$;
(ii) $E(Z_n) < \infty$ for each $n$.

It follows that $Z_\infty = \lim_{n\to\infty}\uparrow Z_n$ exists but may take the value $+\infty$; $Z_\infty$ is integrable if and only if $\{Z_n\}$ is $L^1$-bounded as defined above, which means here $\lim_{n\to\infty}\uparrow E(Z_n) < \infty$. This is also equivalent to the uniform integrability of $\{Z_n\}$ because of (i). We can now state the result as follows.
(8) $$\{\alpha\le n\}\in\mathcal{F}_n.$$
(9) $$\Lambda\cap\{\alpha\le n\}\in\mathcal{F}_n,$$

$$\Lambda = \bigcup_n \Lambda_n = \bigcup_n\big[\{\alpha=n\}\cap\Lambda_n\big],$$
where the index $n$ ranges over $N_\infty$. This is (3) of Sec. 8.2. The reader should now do Exercises 1–4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: $\mathcal{F}_\alpha$ is a B.F. and $\alpha\in\mathcal{F}_\alpha$; if $\alpha$ is optional then so is $\alpha\wedge n$ for each $n\in N$; if $\alpha\le\beta$ where $\beta$ is also optional, then $\mathcal{F}_\alpha\subset\mathcal{F}_\beta$; in particular $\mathcal{F}_{\alpha\wedge n}\subset\mathcal{F}_\alpha\cap\mathcal{F}_n$, and in fact this inclusion is an equation.
Next we assume $X_\infty$ has been defined and $X_\infty\in\mathcal{F}_\infty$. We then define $X_\alpha$ as follows:
(11) $$X_\alpha(\omega) = X_{\alpha(\omega)}(\omega);$$
in other words,
$$X_\alpha(\omega) = X_n(\omega)\ \text{ on }\ \{\alpha=n\},\quad n\in N_\infty.$$
This definition makes sense for any $\alpha$ taking values in $N_\infty$, but for an optional $\alpha$ we can assert moreover that
(12) $$X_\alpha\in\mathcal{F}_\alpha.$$
This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption $X_n\in\mathcal{F}_n$ for every $n$. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.'s for constants. For this purpose conditions must sometimes be imposed either on $\alpha$ or on the smartingale $\{X_n\}$. Let us however begin with a perfect case which turns out to be also very important.

We introduce a class of martingales as follows. For any integrable r.v. $Y$ we put
(13) $$X_n = E(Y\mid\mathcal{F}_n),\qquad n\in N_\infty.$$
By Theorem 9.1.5, if $n\le m$:
(14) $$X_n = E\big\{E(Y\mid\mathcal{F}_m)\mid\mathcal{F}_n\big\} = E\{X_m\mid\mathcal{F}_n\},$$
which shows $\{X_n,\mathcal{F}_n\}$ is a martingale, not only on $N$ but also on $N_\infty$. The following properties extend both (13) and (14) to optional times.
(15) $$X_\alpha = E(Y\mid\mathcal{F}_\alpha).$$

$$X_\alpha = E\big\{E(Y\mid\mathcal{F}_\beta)\mid\mathcal{F}_\alpha\big\} = E\{X_\beta\mid\mathcal{F}_\alpha\},$$

(16) $$\alpha_1\le\alpha_2\le\cdots\le\alpha_n\le\cdots,$$
Theorem 9.3.4. Let $\alpha$ and $\beta$ be two bounded optional r.v.'s such that $\alpha\le\beta$. Then for any [super]martingale $\{X_n\}$, $\{X_\alpha,\mathcal{F}_\alpha;\ X_\beta,\mathcal{F}_\beta\}$ forms a [super]martingale.

PROOF. Let $\Lambda\in\mathcal{F}_\alpha$; using (10) again we have for each $k\ge j$:
$$\Lambda_j\cap\{\beta>k\}\in\mathcal{F}_k,$$
and consequently
$$\int_{\Lambda_j\cap\{\beta\ge k\}} X_k\,dP \ \ge\ \int_{\Lambda_j\cap\{\beta=k\}} X_k\,dP + \int_{\Lambda_j\cap\{\beta>k\}} X_{k+1}\,dP.$$
Rewriting this as
$$\int_{\Lambda_j\cap\{\beta\ge k\}} X_k\,dP - \int_{\Lambda_j\cap\{\beta\ge k+1\}} X_{k+1}\,dP \ \ge\ \int_{\Lambda_j\cap\{\beta=k\}} X_\beta\,dP,$$
we see that a summation over $k\ge j$ telescopes the left members. Another summation over $j$ from 1 to $m$ yields the desired result. In the case of a martingale the inequalities above become equations.
A particular case of a bounded optional r.v. is $\alpha_n = \alpha\wedge n$, where $\alpha$ is an arbitrary optional r.v. and $n$ is a positive integer. Applying the preceding theorem to the sequence $\{\alpha_n\}$ as under Theorem 9.3.3, we have the following corollary.

Theorem 9.3.5. Let $\alpha$ and $\beta$ be two arbitrary optional r.v.'s such that $\alpha\le\beta$. Then the conclusion of Theorem 9.3.4 holds true for any supermartingale $\{X_n,\mathcal{F}_n;\ n\in N_\infty\}$.
Since the integrands are positive, the integrals exist and we may let $m\to\infty$ and then sum over $j\in N$. The result is
$$\int_{\Lambda\cap\{\alpha<\infty\}} X_\alpha\,dP \ \ge\ \int_{\Lambda\cap\{\beta<\infty\}} X_\beta\,dP,$$
which falls short of the goal. But we can add the equation
$$\int_{\Lambda\cap\{\alpha=\infty\}} X_\alpha\,dP = \int_{\Lambda\cap\{\alpha=\infty\}} X_\infty\,dP = \int_{\Lambda\cap\{\beta=\infty\}} X_\infty\,dP = \int_{\Lambda\cap\{\beta=\infty\}} X_\beta\,dP.$$

Since $1$ and $\alpha\wedge n$ are two bounded optional r.v.'s satisfying $1\le\alpha\wedge n$, the right-hand side of (19) does not exceed $E(X_1)$ by Theorem 9.3.4. This shows $X_\alpha$ is integrable since it is positive.

(b) In the general case we put
$$X'_n = E\{X_\infty\mid\mathcal{F}_n\},\qquad X''_n = X_n - X'_n.$$
Then $\{X'_n,\mathcal{F}_n;\ n\in N_\infty\}$ is a martingale of the kind introduced in (13), and $X_n\ge X'_n$ by the defining property of supermartingale applied to $X_n$ and $X_\infty$. Hence the difference $\{X''_n,\mathcal{F}_n;\ n\in N\}$ is a positive supermartingale with $X''_\infty = 0$ a.e. By Theorem 9.3.3, $\{X'_\alpha,\mathcal{F}_\alpha;\ X'_\beta,\mathcal{F}_\beta\}$ forms a martingale; by case (a), $\{X''_\alpha,\mathcal{F}_\alpha;\ X''_\beta,\mathcal{F}_\beta\}$ forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.
The two preceding theorems are the basic cases of Doob’s optional
sampling theorem. They do not cover all cases of optional sampling (see e.g.
Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many
applications, some of which will be given later.
Martingale theory has its intuitive background in gambling. If Xn is inter-
preted as the gambler’s capital at time n, then the defining property postulates
that his expected capital after one more game, played with the knowledge of
the entire past and present, is exactly equal to his current capital. In other
words, his expected gain is zero, and in this sense the game is said to be
“fair”. Similarly a smartingale is a game consistently biased in one direc-
tion. Now the gambler may opt to play the game only at certain preferred
times, chosen with the benefit of past experience and present observation,
but without clairvoyance into the future. [The inclusion of the present status
in his knowledge seems to violate raw intuition, but examine the example
below and Exercise 13.] He hopes of course to gain advantage by devising
such a “system” but Doob’s theorem forestalls him, at least mathematically.
We have already mentioned such an interpretation in Sec. 8.2 (see in partic-
ular Exercise 11 of Sec. 8.2; note that ˛ C 1 rather than ˛ is the optional
time there.) The present generalization consists in replacing a stationary inde-
pendent process by a smartingale. The classical problem of “gambler’s ruin”
illustrates very well the ideas involved, as follows.
Let $\{S_n,\ n\in N^0\}$ be a random walk in the notation of Chapter 8, and let $S_1$ have the Bernoullian distribution $\tfrac12\delta_1 + \tfrac12\delta_{-1}$. It follows from Theorem 8.3.4, or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost certainly leave the interval $[-a,b]$, where $a$ and $b$ are strictly positive integers; and since it can move only one unit at a time, it must reach either $-a$ or $b$. This means that if we set
(20) $$\alpha = \min\{n\ge1:\ S_n = -a\},\qquad \beta = \min\{n\ge1:\ S_n = b\},$$
then $\gamma = \alpha\wedge\beta$ is a finite optional r.v. It follows from the Corollary to Theorem 9.3.4 that $\{S_{\gamma\wedge n}\}$ is a martingale. Now
(21) $$\lim_{n\to\infty} S_{\gamma\wedge n} = S_\gamma\quad\text{a.e.},$$
and clearly $S_\gamma$ takes only the values $-a$ and $b$. The question is: with what probabilities? In the gambling interpretation: if two gamblers play a fair coin-tossing game and possess, respectively, $a$ and $b$ units of the constant stake as initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first that the two r.v.'s $\{S_1, S_\gamma\}$ form a martingale, for then
(22) $$E(S_\gamma) = E(S_1) = 0,$$
which is to say that
$$-a\,P\{S_\gamma=-a\} + b\,P\{S_\gamma=b\} = 0,$$
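Together with $P\{S_\gamma=-a\} + P\{S_\gamma=b\} = 1$, this gives $P\{S_\gamma=b\} = a/(a+b)$. A direct simulation agrees with this value (a minimal sketch; the walk and the stopping rule follow the text):

```python
import random

def ruin_prob(a, b, trials=50000):
    """Estimate P{S_gamma = b} for the symmetric walk stopped on leaving (-a, b)."""
    wins = 0
    for _ in range(trials):
        s = 0
        while -a < s < b:                 # run until the walk hits -a or b
            s += random.choice((-1, 1))
        wins += (s == b)
    return wins / trials

a, b = 3, 7
print(ruin_prob(a, b))                    # about a/(a+b) = 0.3
```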
EXERCISES
where $X_0 = 0$ and $\mathcal{F}_0$ is trivial. Then for any two optional r.v.'s $\alpha$ and $\beta$ such that $\alpha\le\beta$ and $E(\beta)<\infty$, $\{X_\alpha,\mathcal{F}_\alpha;\ X_\beta,\mathcal{F}_\beta\}$ is a [super]martingale. This is another case of optional sampling given by Doob, which includes Wald's equation (Theorem 5.5.3) as a special case. [HINT: Dominate the integrand in the second integral in (17) by $Y_\beta$, where $X_0 = 0$ and $Y_m = \sum_{n=1}^{m}|X_n - X_{n-1}|$. We have
$$E(Y_\beta) = \sum_{n=1}^{\infty}\int_{\{\beta\ge n\}} |X_n - X_{n-1}|\,dP \ \le\ M\,E(\beta).]$$
$\beta$ at $n$. This example shows why in optional sampling the option may be taken even with the knowledge of the present moment under certain conditions. In the case here the present (namely $\beta\wedge n$) may leave one no choice!

14. In the gambler's ruin problem, suppose that $S_1$ has the distribution
$$p\,\delta_1 + (1-p)\,\delta_{-1},\qquad p\ne\tfrac12;$$
$$\le E(X_n^+) - E(X_1).$$

since it takes only a finite number of values, Theorem 9.3.4 shows that the pair $\{X_\alpha, X_n\}$ forms a submartingale. If we write
$$M = \Big\{\max_{1\le j\le n} X_j \ge \lambda\Big\},$$
$$M_k = \Big\{\min_{1\le j\le k} X_j \le -\lambda\Big\}.$$
We now come to a new kind of inequality, which will be the tool for proving the main convergence theorem below. Given any sequence of r.v.'s $\{X_j\}$, for each sample point $\omega$, the convergence properties of the numerical sequence $\{X_j(\omega)\}$ hinge on the oscillation of the finite segments $\{X_j(\omega),\ j\in N_n\}$ as $n\to\infty$. In particular the sequence will have a limit, finite or infinite, if and only if the number of its oscillations between any two [rational] numbers $a$ and $b$ is finite (depending on $a$, $b$ and $\omega$). This is a standard type of argument used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The interesting thing is that for a smartingale, a sharp estimate of the expected number of oscillations is obtainable.

Let $a<b$. The number $\nu$ of "upcrossings" of the interval $[a,b]$ by a numerical sequence $\{x_1,\dots,x_n\}$ is defined as follows. Set
$$\alpha_1 = \min\{j:\ 1\le j\le n,\ x_j\le a\},\qquad \alpha_2 = \min\{j:\ \alpha_1<j\le n,\ x_j\ge b\};$$
if any one of these is undefined, then all the subsequent ones will be undefined. Let $\alpha_\ell$ be the last defined one, with $\ell=0$ if $\alpha_1$ is undefined; then $\nu$ is defined to be $[\ell/2]$. Thus $\nu$ is the actual number of successive times that the sequence crosses from $\le a$ to $\ge b$. Although the exact number is not essential, since a couple of crossings more or less would make no difference, we must adhere to a rigid way of counting in order to be accurate below.
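The rigid counting rule just described translates directly into a routine; the following sketch simply implements the alternating times $\alpha_1, \alpha_2, \dots$ defined above.

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] by the finite sequence xs, following
    the alternating stopping rule alpha_1, alpha_2, ... in the text."""
    count = 0
    below = True               # first look for a term <= a, then for one >= b
    for x in xs:
        if below:
            if x <= a:
                below = False
        else:
            if x >= b:
                below = True
                count += 1     # a crossing from <= a to >= b is completed
    return count

print(upcrossings([1, -1, 3, 0, 1, 2], 0, 2))   # two completed upcrossings
```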
Remark. Since
$$E(|X_n|) = 2E(X_n^+) - E(X_n) \le 2E(X_n^+) - E(X_1),$$
the condition of $L^1$-boundedness is equivalent to the apparently weaker one below:
(9) $$\sup_n E(X_n^+) < \infty.$$
PROOF. Let $\nu_{[a,b]} = \lim_n \nu^{(n)}_{[a,b]}$. Our hypothesis implies that the last term in (5) is bounded in $n$; letting $n\to\infty$, we obtain $E\{\nu_{[a,b]}\}<\infty$ for every $a$ and $b$, and consequently $\nu_{[a,b]}$ is finite with probability one. Hence, for each pair of rational numbers $a<b$, the set
$$\Lambda_{[a,b]} = \Big\{\liminf_n X_n < a < b < \limsup_n X_n\Big\}$$
is a null set; and so is the union over all such pairs. Since this union contains the set where $\liminf_n X_n < \limsup_n X_n$, the limit exists a.e. It must be finite a.e. by Fatou's lemma applied to the sequence $\{|X_n|\}$.
Then for each $k$ it is possible to choose the $n_i$'s successively so that the differences $n_i - n_{i-1}$ for $1\le i\le 2k$ are so large that "most" of $\Lambda_{[a,b]}$ is contained in $\bigcap_{i=1}^{2k}\Lambda_i$, so that
$$P\Big\{\bigcap_{i=1}^{2k}\Lambda_i\Big\} > \epsilon.$$

where the equalities follow from the martingale property. Upon subtraction we obtain
$$(b-a)\,P\Big\{\bigcap_{i=1}^{2j}\Lambda_i\Big\} - a\,P\Big\{\bigcap_{i=1}^{2j-1}\Lambda_i\cap\Lambda_{2j}^c\Big\} \ \le\ \int_{\bigcap_{i=1}^{2j-1}\Lambda_i\cap\Lambda_{2j}^c} X_n\,dP,$$
Once Theorem 9.4.4 has been proved for a martingale, we can extend it easily to a positive or uniformly integrable supermartingale by using Doob's decomposition. Suppose $\{X_n\}$ is a positive supermartingale and $X_n = Y_n - Z_n$ as in Theorem 9.3.2. Then $0\le Z_n\le Y_n$ and consequently
$$E(Z_\infty) = \lim_{n\to\infty} E(Z_n) \le E(Y_1);$$
next we have
$$E(Y_n) = E(X_n) + E(Z_n) \le E(X_1) + E(Z_\infty).$$
Hence $\{Y_n\}$ is an $L^1$-bounded martingale and so converges to a finite limit as $n\to\infty$. Since $Z_n\uparrow Z_\infty<\infty$ a.e., the convergence of $\{X_n\}$ follows. The case of a uniformly integrable supermartingale is just as easy by the corollary to Theorem 9.3.2.
It is trivial that a positive submartingale need not converge, since the sequence $\{n\}$ is such a one. The classical random walk $\{S_n\}$ (coin-tossing game) is an example of a martingale that does not converge (why?). An interesting and not so trivial consequence is that both $E(S_n^+)$ and $E(|S_n|)$ must diverge to $+\infty$! (Cf. Exercise 2 of Sec. 6.4.) Further examples are furnished by "stopped random walks". For the sake of concreteness, let us stay with the classical case and define $\gamma$ to be the first time the walk reaches $+1$. As in our previous discussion of the gambler's-ruin problem, the modified random walk $\{\tilde S_n\}$, where $\tilde S_n = S_{\gamma\wedge n}$, is still a martingale, hence in particular we have for each $n$:
$$E(\tilde S_n) = E(\tilde S_1) = \int_{\{\gamma=1\}} S_1\,dP + \int_{\{\gamma>1\}} S_1\,dP = E(S_1) = 0.$$

since $\gamma<\infty$ a.e., but this convergence now also follows from Theorem 9.4.4, since $\tilde S_n^+ \le 1$. Observe, however, that
$$E(\tilde S_n) = 0 < 1 = E(\tilde S_\infty).$$
Next, we change the definition of $\gamma$ to be the first time $\ge 1$ the walk "returns" to 0, as usual supposing $S_0 = 0$. Then $\tilde S_\infty = 0$ and we have indeed $E(\tilde S_n) = E(\tilde S_\infty)$. But for each $n$,
$$\int_{\{\tilde S_n>0\}} \tilde S_n\,dP \ >\ 0 \ =\ \int_{\{\tilde S_n>0\}} \tilde S_\infty\,dP,$$
Theorem 9.4.5. The three propositions below are equivalent for a submartingale $\{X_n,\mathcal{F}_n;\ n\in N\}$:

Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above are equivalent to (c′) or (d) below:

(c′) it converges a.e. to an integrable $X_\infty$ such that $\{X_n,\mathcal{F}_n;\ n\in N_\infty\}$ is a martingale;

(d) there exists an integrable r.v. $Y$ such that $X_n = E(Y\mid\mathcal{F}_n)$ for each $n\in N$.

PROOF. (b) $\Rightarrow$ (c′) as before; (c′) $\Rightarrow$ (a) as before if we observe that $E(X_n) = E(X_\infty)$ for every $n$ in the present case, or more rapidly by considering $|X_n|$ instead of $X_n^+$ as below. (c′) $\Rightarrow$ (d) is trivial, since we may take the $Y$ in (d) to be the $X_\infty$ in (c′). To prove (d) $\Rightarrow$ (a), let $n<n'$; then by Theorem 9.1.5:
$$E(X_{n'}\mid\mathcal{F}_n) = E\big(E(Y\mid\mathcal{F}_{n'})\mid\mathcal{F}_n\big) = E(Y\mid\mathcal{F}_n) = X_n,$$
$$P\{|X_n|>\lambda\} \le \frac{1}{\lambda}E(|X_n|) \le \frac{1}{\lambda}E(|Y|),$$
which together imply (a).

The following conditions are equivalent, and they are automatically satisfied in the case of a martingale with "submartingale" replaced by "martingale" in (c):
$$E\{\nu^{(n)}_{[a,b]}\} \le \frac{E(X_1^+) + |a|}{b-a}.$$
Letting $n\to\infty$ and arguing as in the proof of Theorem 9.4.4, we conclude (11) by observing that
$$E(X_{-\infty}^+) \le \lim_n E(X_n^+) \le E(X_1^+) < \infty.$$

The proofs of (a) $\Rightarrow$ (b) $\Rightarrow$ (c) are entirely similar to those in Theorem 9.4.5. (c) $\Rightarrow$ (d) is trivial, since $-\infty < E(X_1) \le E(X_n)$ for each $n$. It remains to prove (d) $\Rightarrow$ (a). Letting $C$ denote the limit in (d), we have for each $\lambda>0$:
(12) $$P\{|X_n|>\lambda\} \le \frac{1}{\lambda}E(|X_n|) = \frac{1}{\lambda}\big[2E(X_n^+) - E(X_n)\big] \le \frac{1}{\lambda}\big[2E(X_1^+) - C\big] < \infty.$$
By (d), we may choose $m$ so large that $E(X_n - X_m) > -\epsilon$ for any given $\epsilon>0$ and for every $n<m$. Having fixed such an $m$, we may choose $\lambda$ so large that
$$\sup_n \int_{\{X_n<-\lambda\}} |X_m|\,dP < \epsilon$$
Let $\{Y_n\}$ be r.v.'s indexed by the set of all integers. If the B.F.'s and r.v.'s are only given on $N$ or $-N$, they can be trivially extended to all integers by putting $\mathcal{F}_n = \mathcal{F}_1$, $Y_n = Y_1$ for all $n\le 0$, or $\mathcal{F}_n = \mathcal{F}_{-1}$, $Y_n = Y_{-1}$ for all $n\ge 0$. The following convergence theorem is very useful.
Hence the equations hold also for $\Lambda\in\mathcal{F}_\infty$ (why?), and this shows that $X_\infty$ has the defining property of $E(Y\mid\mathcal{F}_\infty)$, since $X_\infty\in\mathcal{F}_\infty$. Similarly, the limit $X_{-\infty}$ in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there, for each $\Lambda\in\mathcal{F}_{-\infty}$:
$$\int_\Lambda X_{-\infty}\,dP = \int_\Lambda X_n\,dP = \int_\Lambda Y\,dP.$$

Corollary. If $\Lambda\in\mathcal{F}_\infty$, then
(16) $$\lim_{n\to\infty} P(\Lambda\mid\mathcal{F}_n) = 1_\Lambda\quad\text{a.e.}$$

The reader is urged to ponder over the intuitive meaning of this result and judge for himself whether it is "obvious" or "incredible".
EXERCISES
Then $\{X_{\alpha\wedge n},\mathcal{F}_{\alpha\wedge n};\ n\in N_\infty\}$ is a submartingale. [HINT: for $\Lambda\in\mathcal{F}_{\alpha\wedge n}$ bound $\int_\Lambda (X_\alpha - X_{\alpha\wedge n})\,dP$ below by interposing $X_{\alpha\wedge m}$ where $n<m$.]

$$X_n = E(Z_\infty\mid\mathcal{F}_n) - Z_n.$$

$$\lim_{s\uparrow t,\ s\in Q} X_s,\qquad \lim_{s\downarrow t,\ s\in Q} X_s.$$
[HINT: Let $\{Q_n,\ n\ge 1\}$ be finite subsets of $Q$ such that $Q_n\uparrow Q$; apply the upcrossing inequality to $\{X_s,\ s\in Q_n\}$, then let $n\to\infty$.]
9.5 Applications
Although some of the major successes of martingale theory lie in the field of
continuous-parameter stochastic processes, which cannot be discussed here, it
has also made various important contributions within the scope of this work.
We shall illustrate these below with a few that are related to our previous
topics, and indicate some others among the exercises.
These have been a recurring theme in Chapters 4, 5, 8, and 9, and play important roles in the theory of random walk and its generalization to Markov processes. Let $\{X_n,\ n\in N^0\}$ be an arbitrary stochastic process; the notation for fields in Sec. 9.2 will be used. For each $n$ consider the events:
$$\Lambda_n = \bigcup_{j=n}^{\infty}\{X_j\in B_j\},$$
$$M = \bigcap_{n=1}^{\infty}\Lambda_n = \{X_j\in B_j\ \text{i.o.}\},$$
where the $B_n$ are arbitrary Borel sets.

then we have
(3) $$P\big\{[X_j\in A_j\ \text{i.o.}]\setminus[X_j\in B_j\ \text{i.o.}]\big\} = 0.$$
PROOF. Let $\Delta = \{X_j\in A_j\ \text{i.o.}\}$ and use the notation $\Lambda_n$ and $M$ above. We may ignore the null sets in (1) and (2). Then if $\omega\in\Delta$, our hypothesis implies that
$$P\{\Lambda_{n+1}\mid X_n\}(\omega) \ge \delta\quad\text{i.o.}$$

where the second equation follows by Exercise 8 of Sec. 9.2 and the third by the Markov property. This proves the lemma in the harmonic case; the other case is similar. (Why not also the "sub" case?)
The most important example of a harmonic function is the $g(\cdot,B)$ of Exercise 10 of Sec. 9.2 for a given $B$; that of a superharmonic function is the $f(\cdot,B)$ of Exercise 9 there. These assertions follow easily from their probabilistic meanings given in the cited exercises, but purely analytic verifications are also simple and instructive. Finally, if for some $B$ we have
$$\varphi(x) = \sum_{n=0}^{\infty} P^{(n)}(x,B) < \infty$$
for every $x$, then $\varphi(\cdot)$ is superharmonic and is called the "potential" of the set $B$.
The first inequality in (1) of Sec. 9.4 is of a type familiar in ergodic theory and leads to the result below, which has been called the "dominated ergodic theorem" by Wiener. In the case where $X_n$ is the sum of independent r.v.'s with mean 0, it is due to Marcinkiewicz and Zygmund. We write $\|X\|_p$ for the $L^p$-norm of $X$: $\|X\|_p^p = E(|X|^p)$.

Theorem 9.5.4. Let $1<p<\infty$ and $1/p + 1/q = 1$. Suppose that $\{X_n,\ n\in N\}$ is a positive submartingale satisfying the condition
(5) $$\sup_n E\{X_n^p\} < \infty.$$
Now it turns out that such an inequality for any two r.v.'s $Y$ and $X_\infty$ implies the inequality $\|Y\|_p \le q\,\|X_\infty\|_p$, from which (6) follows by Fatou's lemma. This is shown by the calculation below, where $G(\lambda) = P\{Y\ge\lambda\}$:
$$E(Y^p) = -\int_0^\infty \lambda^p\,dG(\lambda) \le \int_0^\infty p\lambda^{p-1}G(\lambda)\,d\lambda \le \int_0^\infty p\lambda^{p-1}\left[\frac{1}{\lambda}\int_{\{Y\ge\lambda\}} X_\infty\,dP\right]d\lambda$$
$$= \int_\Omega X_\infty\left[\int_0^{Y} p\lambda^{p-2}\,d\lambda\right]dP = q\int_\Omega X_\infty\,Y^{p-1}\,dP \le q\,\|X_\infty\|_p\,\|Y^{p-1}\|_q,$$
(8) $$Z_n = \frac{e^{itS_n}}{\varphi_n(t)}.$$
Then each $Z_n$ is integrable; indeed the sequence $\{Z_n\}$ is uniformly bounded. We have for each $n$, if $\mathcal{F}_n$ denotes the Borel field generated by $S_1,\dots,S_n$:
$$E\{Z_{n+1}\mid\mathcal{F}_n\} = E\left\{\frac{e^{itS_n}}{\varphi_n(t)}\cdot\frac{e^{itX_{n+1}}}{f_{n+1}(t)}\ \Big|\ \mathcal{F}_n\right\} = \frac{e^{itS_n}}{\varphi_n(t)f_{n+1}(t)}\,E\big\{e^{itX_{n+1}}\mid\mathcal{F}_n\big\} = \frac{e^{itS_n}}{\varphi_n(t)f_{n+1}(t)}\,f_{n+1}(t) = Z_n,$$
where the second equation follows from Theorem 9.1.3 and the third from independence. Thus $\{Z_n,\mathcal{F}_n\}$ is a martingale, in the sense that its real and imaginary parts are both martingales. Since it is uniformly bounded, it follows from Theorem 9.4.4 that $Z_n$ converges a.e. This means, for each $t$ with $|t|\le t_0$, there is a set $\Omega_t$ with $P(\Omega_t)=1$ such that if $\omega\in\Omega_t$, then the sequence of complex numbers $e^{itS_n(\omega)}/\varphi_n(t)$ converges, and so also does $e^{itS_n(\omega)}$. But how does one deduce from this the convergence of $S_n(\omega)$? The argument below may seem unnecessarily tedious, but it is of a familiar and indispensable kind in certain parts of stochastic processes.
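Before the Fubini argument, a numerical sketch of the martingale (8) may be reassuring. The symmetric Bernoulli steps are an assumption of this illustration, so that each $f_j(t)=\cos t$ and $\varphi_n(t)=(\cos t)^n$; the sample averages of $Z_n$ then hover near $E(Z_n)=1$.

```python
import cmath, math, random

t = 0.5                        # a fixed t with cos(t) != 0 (Bernoulli steps assumed)

def Z(n):
    """One sample of Z_n = exp(i t S_n) / phi_n(t) for the symmetric Bernoulli walk."""
    S = sum(random.choice((-1, 1)) for _ in range(n))
    return cmath.exp(1j * t * S) / (math.cos(t) ** n)

trials = 50000
for n in (1, 5, 10):
    avg = sum(Z(n) for _ in range(trials)) / trials
    print(n, avg)              # each average is close to E(Z_n) = 1 + 0j
```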
Consider $e^{itS_n(\omega)}$ as a function of $(t,\omega)$ in the product space $T\times\Omega$, where $T = [-t_0,t_0]$, with the product measure $m\times P$, where $m$ is the Lebesgue measure on $T$. Since this function is measurable in $(t,\omega)$ for each $n$, the set $C$ of $(t,\omega)$ for which $\lim_{n\to\infty} e^{itS_n(\omega)}$ exists is measurable with respect to $m\times P$. Each section of $C$ by a fixed $t$ has full measure $P(\Omega_t)=1$ as just shown, hence Fubini's theorem asserts that almost every section of $C$ by a fixed $\omega$ must also have full measure $m(T)=2t_0$. This means that there exists an $\tilde\Omega$ with $P(\tilde\Omega)=1$, and for each $\omega\in\tilde\Omega$ there is a subset $T_\omega$ of $T$ with $m(T_\omega)=m(T)$, such that if $t\in T_\omega$, then $\lim_{n\to\infty} e^{itS_n(\omega)}$ exists. Now we are in a position to apply Exercise 17 of Sec. 6.4 to conclude the convergence of $S_n(\omega)$ for $\omega\in\tilde\Omega$, thus finishing the proof of the theorem.
Our next example is a new proof of the classical strong law of large numbers
in the form of Theorem 5.4.2, (8). This basically different approach, which
has more a measure-theoretic than an analytic flavor, is one of the striking
successes of martingale theory. It was given by Doob (1949).
where the second equation follows from the argument above. On the other hand, the first limit is a remote (even invariant) r.v. in the sense of Sec. 8.1, since for every $m\ge 1$ we have
$$\lim_{n\to\infty}\frac{S_n(\omega)}{n} = \lim_{n\to\infty}\frac{\sum_{j=m}^{n} X_j(\omega)}{n};$$
The method used in the preceding example can also be applied to the theory of exchangeable events. The events $\{E_n,\ n\in N\}$ are said to be exchangeable iff for every $k\ge 1$, the probability of the joint occurrence of any $k$ of them is the same for every choice of $k$ members from the sequence, namely we have
(11) $$P\{E_{n_1}\cap\cdots\cap E_{n_k}\} = w_k,\qquad k\in N,$$
for any subset $\{n_1,\dots,n_k\}$ of $N$. Let us denote the indicator of $E_n$ by $e_n$, and put
$$N_n = \sum_{j=1}^{n} e_j;$$
then $N_n$ is the number of occurrences among the first $n$ events of the sequence. Denote by $\mathcal{G}_n$ the B.F. generated by $\{N_j,\ j\ge n\}$, and
$$\mathcal{G} = \bigcap_{n\in N}\mathcal{G}_n.$$
and that this conditional expectation is the same for any subset $(n_1,\dots,n_k)$ of $(1,\dots,n)$. Put then $f_n^{(0)} = 1$ and
$$f_n^{(k)} = \sum_{(n_1,\dots,n_k)}\ \prod_{j=1}^{k} e_{n_j},\qquad 1\le k\le n,$$
where the sum is extended over all $\binom{n}{k}$ choices; this is the "elementary symmetric function" of degree $k$ formed by $e_1,\dots,e_n$. Introducing an indeterminate $z$ we have the formal identity in $z$:
$$\sum_{j=0}^{n} f_n^{(j)} z^j = \prod_{j=1}^{n}(1 + e_j z).$$
But it is trivial that $(1 + e_j z) = (1+z)^{e_j}$ since $e_j$ takes only the values 0 and 1, hence
$$\sum_{j=0}^{n} f_n^{(j)} z^j = \prod_{j=1}^{n}(1+z)^{e_j} = (1+z)^{N_n}.$$
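The identity just derived can be checked mechanically for a fixed realization of the indicators (the particular 0–1 values below are an assumption of this sketch): the elementary symmetric functions $f_n^{(k)}$ must equal the binomial coefficients $\binom{N_n}{k}$ of $(1+z)^{N_n}$.

```python
from itertools import combinations
from math import comb

e = [1, 0, 1, 1, 0, 1]            # an assumed realization of e_1, ..., e_n
N_n = sum(e)

def f(k):
    """Elementary symmetric function of degree k formed by e_1, ..., e_n."""
    return sum(all(e[i] for i in idx) for idx in combinations(range(len(e)), k))

# Since each e_j is 0 or 1, (1 + e_j z) = (1 + z)^{e_j}, so f_n^{(k)} = C(N_n, k).
for k in range(len(e) + 1):
    assert f(k) == comb(N_n, k)
print([f(k) for k in range(len(e) + 1)])   # the coefficients of (1+z)^{N_n}
```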
The sequence $\{Q_n^2(X),\ n\in N\}$ associated with $X$ is called its squared variation process and is useful in many applications. We begin with the algebraic identity:
(15) $$X_n^2 = \Big(\sum_{j=1}^{n} x_j\Big)^2 = \sum_{j=1}^{n} x_j^2 + 2\sum_{j=2}^{n} X_{j-1}x_j.$$
If $X_n\in L^2$ for every $n$, then all terms above are integrable, and we have for each $j$:
(16) $$E(X_{j-1}x_j) = E\big(X_{j-1}\,E(x_j\mid\mathcal{F}_{j-1})\big) = 0.$$
It follows that
$$E(X_n^2) = \sum_{j=1}^{n} E(x_j^2) = E(Q_n^2).$$
When $X_n$ is the $n$th partial sum of a sequence of independent r.v.'s with zero mean and finite variance, the preceding formula reduces to the additivity of variances; see (6) of Sec. 5.1.
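A quick simulation illustrates this additivity (a sketch; the uniform zero-mean steps are an assumption of the illustration): the sample means of $X_n^2$ and of $Q_n^2$ both agree with the sum of the variances.

```python
import random

random.seed(1)
n, trials = 8, 100000

# Independent zero-mean steps (uniform on [-1, 1] assumed), so X_n is a
# martingale and (16) gives E(X_n^2) = sum_j E(x_j^2) = E(Q_n^2).
tot_sq, tot_Q = 0.0, 0.0
for _ in range(trials):
    x = [random.uniform(-1, 1) for _ in range(n)]
    X = sum(x)
    tot_sq += X * X                       # sample of X_n^2
    tot_Q += sum(v * v for v in x)        # sample of Q_n^2
print(tot_sq / trials, tot_Q / trials)    # both near n/3, the sum of variances
```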
Now suppose that $\{X_n\}$ is a positive bounded supermartingale such that $0\le X_n\le A$ for all $n$, where $A$ is a constant. Then the quantity in (16) is negative and bounded below by
$$A\,E\big(E(x_j\mid\mathcal{F}_{j-1})\big) = A\,E(x_j) \le 0.$$
In this case we obtain from (15):
$$A\,E(X_n) \ge E(X_n^2) \ge E(Q_n^2) + 2A\sum_{j=2}^{n} E(x_j) = E(Q_n^2) + 2A\big[E(X_n) - E(X_1)\big];$$
so that
(17) $$E(Q_n^2) \le 2A\,E(X_1) \le 2A^2.$$

By Theorem 9.4.1, the first term on the right is bounded by $\frac{1}{\lambda}E(X_1)$. The second term may be estimated by Chebyshev's inequality followed by (17) applied to $X\wedge\lambda$:
$$P\{Q_n(X\wedge\lambda)\ge\lambda\} \le \frac{1}{\lambda^2}\,E\{Q_n^2(X\wedge\lambda)\} \le \frac{2}{\lambda}\,E(X_1).$$
(VIII) Derivation

$$Y(\omega) = \lim_{n\to\infty}\frac{\mu\big(\Delta_n(\omega)\big)}{P\big(\Delta_n(\omega)\big)},$$

For in this case $\mathcal{F}_\infty$ will contain each open interval and so also $\mathcal{B}$. If $\mu$ is not absolutely continuous, the procedure above will lead to the derivative of its absolutely continuous part (see Exercise 14 below). In particular, if $F$ is the d.f. associated with the p.m. $\mu$, and we put
$$f_n(x) = 2^n\left[F\left(\frac{k+1}{2^n}\right) - F\left(\frac{k}{2^n}\right)\right]\qquad\text{for }\ \frac{k}{2^n} < x \le \frac{k+1}{2^n},$$
where $k$ ranges over all integers, then we have
$$\lim_{n\to\infty} f_n(x) = F'(x)$$
for almost all $x$ with respect to $m$; and $F'$ is the density of the absolutely continuous part of $F$; see Theorem 1.3.1. So we have come around to the beginning of this course, and the book is hereby ended.
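As a parting illustration, here is a sketch of the dyadic procedure above; the standard normal d.f. is an assumed choice of $F$, and the boundary convention of the cells is handled crudely.

```python
import math

def F(x):
    """An assumed d.f. for the illustration: the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def f_n(x, n):
    """Dyadic difference quotient 2^n [F((k+1)/2^n) - F(k/2^n)] at x."""
    k = math.floor(x * 2**n)      # the cell of x (boundary cases ignored)
    return 2**n * (F((k + 1) / 2**n) - F(k / 2**n))

x = 0.3
for n in (2, 5, 10, 15):
    print(n, f_n(x, n))           # tends to F'(x) = exp(-x^2/2)/sqrt(2*pi), about 0.381
```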
EXERCISES
$$P\{X_{k+j} = x_j,\ 1\le j\le n\} = p_n(x_1,\dots,x_n).$$
[HINT: Consider $P(\Lambda\mid\mathcal{F}'_n)$ and apply 9.4.8. This is due to Blackwell and Freedman.]

5. In the notation of Theorem 9.5.2, suppose that there exists $\delta>0$ such that
$$P\{X_j\in B_j\ \text{i.o.}\mid X_n\} \le 1-\delta\quad\text{a.e. on } \{X_n\in A_n\};$$
then we have
$$P\{X_j\in A_j\ \text{i.o. and } X_j\in B_j\ \text{i.o.}\} = 0.$$
Define $\alpha$ to be the first time that $Z_n > A$ and show that $E(Z^+_{\alpha\wedge n})$ is bounded in $n$. Apply Theorem 9.4.4 to $\{Z_{\alpha\wedge n}\}$ for each $A$ to show that $Z_n$ converges on the set where $\limsup_n Z_n < \infty$; similarly also on the set where $\liminf_n Z_n > -\infty$. The situation is reminiscent of Theorem 8.2.5.]
9. Let $\{Y_k,\ 1\le k\le n\}$ be independent r.v.'s with mean zero and finite variances $\sigma_k^2$;
$$S_k = \sum_{j=1}^{k} Y_j,\qquad s_k^2 = \sum_{j=1}^{k}\sigma_j^2 > 0,\qquad Z_k = S_k^2 - s_k^2.$$

Thus we obtain
$$P\Big\{\max_{1\le k\le n}|S_k|\le\lambda\Big\} \le \frac{(\lambda+A)^2}{s_n^2}.$$
$$\sum_{n=1}^{\infty}\frac{1}{n}\int_{\{\alpha=n\}} |X_n|\,dP = \sum_{n=1}^{\infty}\frac{c_{n-1}}{n}\int_{\{|X_1|>n\}} |X_1|\,dP;$$
$$\sum_{n=1}^{\infty}\frac{1}{n}\int_{\{\alpha=n\}} |S_{n-1}|\,dP < \infty.]$$

11. Deduce from Exercise 10 that $E(\sup_n |S_n|/n) < \infty$ if and only if $E(|X_1|\log^+|X_1|) < \infty$. [HINT: Apply Exercise 7 to the martingale $\{\dots, S_n/n,\dots, S_2/2, S_1\}$ in Example (V).]
12. In Example (VI) show that (i) $\mathcal{G}$ is generated by $\eta$; (ii) the events $\{E_n,\ n\in N\}$ are conditionally independent given $\eta$; (iii) for any $l$ events $E_{n_j}$, $1\le j\le l$ and any $k\le l$ we have
$$P\{E_{n_1}\cap\cdots\cap E_{n_k}\cap E^c_{n_{k+1}}\cap\cdots\cap E^c_{n_l}\} = \int_0^1 x^k(1-x)^{l-k}\,G(dx),$$
Bibliographical Note
Most of the results can be found in Chapter 7 of Doob [17]. Another useful account is
given by Meyer [20]. For an early but stimulating account of the connections between
random walks and partial differential equations, see
A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung. Springer-
Verlag, Berlin, 1933.
374 CONDITIONING. MARKOV PROPERTY. MARTINGALE
For basic mathematical vocabulary and notation the reader is referred to §1.1
and §2.1 of the main text.
1 Construction of measure
Let $\Omega$ be an abstract space and $\mathcal{S}$ its total Borel field; then $A\in\mathcal{S}$ means $A\subset\Omega$.
Let us show that the properties (b) and (c) for outer measure hold for a measure $\mu$, provided all the sets involved belong to $\mathcal{F}_0$.

If $A_1\in\mathcal{F}_0$, $A_2\in\mathcal{F}_0$, and $A_1\subset A_2$, then $A_1^c A_2\in\mathcal{F}_0$ because $\mathcal{F}_0$ is a field; $A_2 = A_1\cup A_1^c A_2$ and so by (d):

and so by (d), since each member of the disjoint union above belongs to $\mathcal{F}_0$:
$$\mu\Big(\bigcup_j A_j\Big) = \mu(A_1) + \mu(A_1^c A_2) + \mu(A_1^c A_2^c A_3) + \cdots$$
$$\bigcup_j AB_j = A\in\mathcal{F}_0;$$
It follows from (2) that $\mu^*(A)\le\mu(A)$. Thus $\mu^* = \mu$ on $\mathcal{F}_0$.

To prove $\mu^*$ is an outer measure, the properties (a) and (b) are trivial. To prove (c), let $\epsilon>0$. For each $j$, by the definition of $\mu^*(A_j)$, there exists a covering $\{B_{jk}\}$ of $A_j$ such that
$$\sum_k \mu(B_{jk}) \le \mu^*(A_j) + \frac{\epsilon}{2^j}.$$
DEFINITION 4. A set $A\subset\Omega$ belongs to $\mathcal{F}^*$ iff for every $Z\subset\Omega$ we have
(3) $$\mu^*(Z) = \mu^*(AZ) + \mu^*(A^cZ).$$
If in (3) we change "$=$" into "$\le$", the resulting inequality holds by (c); hence (3) is equivalent to the reverse inequality when "$=$" is changed into "$\ge$".

Theorem 2. $\mathcal{F}^*$ is a Borel field and contains $\mathcal{F}_0$. On $\mathcal{F}^*$, $\mu^*$ is a measure.
PROOF. Let $A\in\mathcal{F}_0$. For any $Z\subset\Omega$ and any $\epsilon>0$, there exists a covering $\{B_j\}$ of $Z$ such that
(4) $$\sum_j \mu(B_j) \le \mu^*(Z) + \epsilon.$$
Since $\bigcup_{j=1}^{n} B_j\in\mathcal{F}^*$, we have by (7) and the monotonicity of $\mu^*$:
$$\mu^*(Z) = \mu^*\Big(Z\bigcup_{j=1}^{n}B_j\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^{n}B_j\Big)^c\Big) \ \ge\ \sum_{j=1}^{n}\mu^*(ZB_j) + \mu^*\Big(Z\Big(\bigcup_{j=1}^{\infty}B_j\Big)^c\Big).$$
Letting $n\uparrow\infty$ and using property (c) of $\mu^*$, we obtain
$$\mu^*(Z) \ \ge\ \mu^*\Big(Z\bigcup_{j=1}^{\infty}B_j\Big) + \mu^*\Big(Z\Big(\bigcup_{j=1}^{\infty}B_j\Big)^c\Big);$$
that establishes $\bigcup_{j=1}^{\infty}B_j\in\mathcal{F}^*$. Thus $\mathcal{F}^*$ is a Borel field.

Finally, let $\{B_j\}$ be a sequence of disjoint sets in $\mathcal{F}^*$. By the property (b) of $\mu^*$ and (7) with $Z=\Omega$, we have
$$\mu^*\Big(\bigcup_{j=1}^{\infty}B_j\Big) \ \ge\ \limsup_n \mu^*\Big(\bigcup_{j=1}^{n}B_j\Big) = \lim_n \sum_{j=1}^{n}\mu^*(B_j) = \sum_{j=1}^{\infty}\mu^*(B_j).$$
2 Characterization of extensions
We have proved that
$$\mathcal{S} \supset \mathcal{F}^* \supset \mathcal{F} \supset \mathcal{F}_0,$$
where some of the "$\supset$" may turn out to be "$=$". Since we have extended the measure from $\mathcal{F}_0$ to $\mathcal{F}^*$ in Theorem 2, what about $\mathcal{F}$? The answer will appear in the sequel.
Both these collections belong to $\mathcal{F}$ because the Borel field is closed under countable union and intersection, and these operations may be iterated, here twice only, for each collection. If $B\in\mathcal{F}_0$, then $B$ belongs to both $\mathcal{F}_{0\sigma\delta}$ and $\mathcal{F}_{0\delta\sigma}$ because we can take $B_{mn} = B$. Finally, $A\in\mathcal{F}_{0\sigma\delta}$ if and only if $A^c\in\mathcal{F}_{0\delta\sigma}$ because
$$\Big(\bigcap_m\bigcup_n B_{mn}\Big)^c = \bigcup_m\bigcap_n B^c_{mn}.$$
Put
$$B_m = \bigcup_n B_{mn};\qquad B = \bigcap_m B_m;$$
Hence we have
$$\Omega_n A^c \subset \Omega_n B_n;\qquad \mu^*(\Omega_n A^c) = \mu^*(\Omega_n B_n).$$
Taking complements with respect to $\Omega_n$, we have, since $\mu(\Omega_n)<\infty$:
$$\Omega_n A \supset \Omega_n B_n^c;$$
$$\mu^*(\Omega_n A) = \mu(\Omega_n) - \mu^*(\Omega_n A^c) = \mu(\Omega_n) - \mu^*(\Omega_n B_n) = \mu^*(\Omega_n B_n^c).$$
Since $\Omega_n\in\mathcal{F}_0$ and $B_n^c\in\mathcal{F}_{0\delta\sigma}$, it is easy to verify that $\Omega_n B_n^c\in\mathcal{F}_{0\delta\sigma}$ by the distributive law for the intersection with a union. Put
$$C = \bigcup_n \Omega_n B_n^c.$$

$$A = \bigcup_n \Omega_n A \supset C.$$
Consequently, we have
$$\mu^*(A) \ge \mu^*(C) \ge \liminf_n \mu^*(\Omega_n B_n^c) = \liminf_n \mu^*(\Omega_n A) = \mu^*(A),$$
the last equation owing to property (e) of the measure $\mu^*$. Thus $\mu^*(A) = \mu^*(C)$, and the assertion is proved.
The measure $\mu^*$ on $\mathcal{F}^*$ is constructed from the measure $\mu$ on the field $\mathcal{F}_0$. The restriction of $\mu^*$ to the minimal Borel field $\mathcal{F}$ containing $\mathcal{F}_0$ will henceforth be denoted by $\mu$ instead of $\mu^*$.

In a general measure space $(\Omega,\mathcal{G},\mu)$, let us denote by $\mathcal{N}(\mathcal{G},\mu)$ the class of all sets $A$ in $\mathcal{G}$ with $\mu(A) = 0$. They are called the null sets when $\mathcal{G}$ and $\mu$ are understood, or $\mu$-null sets when $\mathcal{G}$ is understood. Beware that if $A\subset B$ and $B$ is a null set, it does not follow that $A$ is a null set because $A$ may not be in $\mathcal{G}$! This remark introduces the following definition.
(i) $A\subset\Omega$ and the outer measure $\mu^*(A) = 0$;
(ii) $A\in\mathcal{F}^*$ and $\mu^*(A) = 0$;
(iii) $A\subset B$ where $B\in\mathcal{F}$ and $\mu(B) = 0$.

It is the collection $\mathcal{N}(\mathcal{F}^*,\mu^*)$.

PROOF. If $\mu^*(A) = 0$, we will prove $A\in\mathcal{F}^*$ by verifying the criterion (3). For any $Z\subset\Omega$, we have by properties (a) and (b) of $\mu^*$:
$$0 \le \mu^*(ZA) \le \mu^*(A) = 0;\qquad \mu^*(ZA^c) \le \mu^*(Z);$$
where the symbol "$-$" denotes strict difference of sets, namely $B-C = BC^c$ where $C\subset B$. Finally we define a function $\bar\mu$ on $\bar{\mathcal{F}}$ as follows, for the $A$ shown in (9):

We will legitimize this definition and with the same stroke prove the monotonicity of $\bar\mu$. Suppose then
(11) $$B_1 - C_1 \subset B_2 - C_2,\qquad B_i\in\mathcal{F},\ C_i\in\mathcal{C},\ i=1,2.$$

Since the class $\mathcal{C}$ is closed under countable union, this shows that $\bar{\mathcal{F}}$ is closed under countable intersection. Next let $C\subset D$, $D\in\mathcal{N}(\mathcal{F},\mu)$; then
$$A^c = B^c\cup C = B^c\cup BC = B^c\cup BDC = (B^c\cup BD) - (BD - C).$$
Since $\bar\mu = \mu$ on $\mathcal{F}$, the first and third members above are equal to, respectively:
$$\bar\mu\Big(\bigcup_n B_n - \bigcup_n D_n\Big) = \mu\Big(\bigcup_n B_n\Big) = \sum_n \mu(B_n) = \sum_n \bar\mu(A_n);$$
$$\bar\mu\Big(\bigcup_n B_n\Big) = \sum_n \mu(B_n) = \sum_n \bar\mu(A_n).$$
Therefore we have
$$\bar\mu\Big(\bigcup_n A_n\Big) = \sum_n \bar\mu(A_n).$$
We must still define $\mu(\Omega)$. Observe that by the properties of a measure, we have $\mu(\Omega) \ge \sum_{n\in N}\mu(n) = s$, say.

Now we use Definition 3 to determine the outer measure $\mu^*$. It is easy to see that for any $A\subset N$, we have
$$\mu^*(A) = \sum_{n\in A}\mu(n).$$
In particular $\mu^*(N) = s$. Next we have
$$\mu^*(\{\omega\}) = \inf_{A\in N_f}\mu(A^c) = \mu(\Omega) - \sup_{A\in N_f}\mu(A) = \mu(\Omega) - s,$$
3 Measures in R
Let $R = (-\infty,+\infty)$ be the set of real numbers, alias the real line, with its Euclidean topology. For $-\infty\le a < b\le +\infty$,
(12) $$(a,b] = \{x\in R:\ a<x\le b\}$$
is an interval of a particular shape, namely open at the left end and closed at the right end. For $b = +\infty$, $(a,+\infty] = (a,+\infty)$ because $+\infty$ is not in $R$. By choice of the particular shape, the complement of such an interval is the union of two intervals of the same shape:
$$(a,b]^c = (-\infty,a]\cup(b,\infty].$$
When $a=b$, of course $(a,a] = \emptyset$ is the empty set. A finite or countably infinite number of such intervals may merge end to end into a single one as illustrated below:
(13) $$(0,2] = (0,1]\cup(1,2];\qquad (0,1] = \bigcup_{n=1}^{\infty}\Big(\frac{1}{n+1},\frac{1}{n}\Big].$$
both exist. We shall write $\infty$ for $+\infty$ sometimes. Next, $F$ has unilateral limits everywhere, and is right-continuous:
$$F(x-) \le F(x) = F(x+).$$
The right-continuity follows from the monotone limit properties (e) and (f) of $m$ and the primary assumption (14). The measure of a single point $x$ is given by
$$m(\{x\}) = F(x) - F(x-).$$
We shall denote a point and the set consisting of it (singleton) by the same symbol.
The simplest example of $F$ is given by $F(x)\equiv x$. In this case $F$ is continuous everywhere and (14) becomes
$$m((a,b]) = b-a.$$
We can replace $(a,b]$ above by $(a,b)$, $[a,b)$ or $[a,b]$ because $m(\{x\}) = 0$ for each $x$. This measure is the length of the line-segment from $a$ to $b$. It was in this classic case that the following extension was first conceived by Émile Borel (1871–1956).

We shall follow the methods in §§1–2, due to H. Lebesgue and C. Carathéodory. Given $F$ as specified above, we are going to construct a measure $m$ on $\mathcal{B}$ and a larger Borel field $\mathcal{B}^*$ that fulfill the prescription (16).
The first step is to determine the minimal field $\mathcal{B}_0$ containing all $(a,b]$. Since a field is closed under finite union, it must contain all sets of the form
(17) $$B = \bigcup_{j=1}^{n} I_j,\qquad I_j = (a_j,b_j],\ 1\le j\le n;\ n\in N.$$
whenever $l$ is finite, and moreover when $l=\infty$ and the union $\bigcup_{k=1}^{\infty}B_k$ happens to be in $\mathcal{B}_0$.

The case for a finite $l$ is really clear. If each $B_k$ is represented as in (17), then the disjoint union of a finite number of them is represented in a similar manner by pooling together all the disjoint $I_j$'s from the $B_k$'s. Then the equation (19) just means that a finite double array of numbers can be summed in two orders.

If that is so easy, what is the difficulty when $l=\infty$? It turns out, as Borel saw clearly, that the crux of the matter lies in the following fabulous "banality."
where $a_j < b_j$ for each $j$, and the intervals $(a_j,b_j]$ are disjoint, then we have
(21) $$F(b) - F(a) = \sum_{j=1}^{\infty}\big(F(b_j) - F(a_j)\big).$$
This small but giant step shortens the original $(a,b]$ to $(a,c_1]$. Obviously we can repeat the process and shorten it to $(a,c_2]$ where $a\le c_2 < c_1 = b$, and so by mathematical induction we obtain a sequence $a\le c_n < \cdots < c_2 < c_1 = b$. Needless to say, if for some $n$ we have $c_n = a$, then we have accomplished our purpose, but this cannot happen under our specific assumptions because we have not used up all the infinite number of intervals in the union. Therefore the process must go on ad infinitum. Suppose then $c_n > c_{n+1}$ for all $n\in N$, so that $c_\omega = \lim_n \downarrow c_n$ exists; then $c_\omega \ge a$. If $c_\omega = a$ (which can easily happen, see (13)), then we are done and (21) follows, although the terms in the series have been gathered step by step in a (possibly) different order. What if $c_\omega > a$? In this case there is a unique $j$ such that $b_j = c_\omega$; rename the corresponding $a_j$ as $c_{\omega 1}$. We have now
(23) $$(a,c_\omega] = \bigcup_{j=1}^{\infty}(a'_j,b'_j],$$
where the $(a'_j,b'_j]$'s are the leftovers from the original collection in (20) after an infinite number of them have been removed in the process. The interval $(c_{\omega 1},c_\omega]$ is contained in the reduced new collection and we can begin a new process by first removing it from both sides of (23), then the next, to be denoted by $(c_{\omega 2},c_{\omega 1}]$, and so on. If for some $n$ we have $c_{\omega n} = a$, then (21) is proved because at each step a term in the sum is gathered. Otherwise there exists the limit $\lim_n \downarrow c_{\omega n} = c_{\omega\omega} \ge a$. If $c_{\omega\omega} = a$, then (21) follows in the limit. Otherwise $c_{\omega\omega}$ must be equal to some $b_j$ (why?), and the induction goes on.
Let us spare ourselves of the cumbersome notation for the successive well-ordered ordinal numbers. But will this process stop after a countable number of steps, namely, does there exist an ordinal number $\alpha$ of countable cardinality such that $c_\alpha = a$? The answer is "yes" because there are only countably many intervals in the union (20).
The preceding proof (which may be made logically formal) reveals the
possibly complex structure hidden in the “order-blind” union in (20). Borel in
his Thèse (1894) adopted a similar argument to prove a more general result
that became known as his Covering Theorem (see below). A proof of the latter
can be found in any text on real analysis, without the use of ordinal numbers.
We will use the covering theorem to give another proof of Borel’s lemma, for
the sake of comparison (and learning).
This second proof establishes the equation (21) by two inequalities in opposite directions. The first inequality is easy by considering the first $n$ terms in the disjoint union (20):
$$F(b) - F(a) \ \ge\ \sum_{j=1}^{n}\big(F(b_j) - F(a_j)\big).$$
Then there exists a finite integer $l$ such that when $l$ is substituted for $\infty$ in the above, the inclusion remains valid.

In other words, a finite subset of the original infinite set of open intervals suffices to do the covering. This theorem is also called the Heine–Borel Theorem; see Hardy [1] (in the general Bibliography) for two proofs by Besicovitch.

To apply (24) to (20), we must alter the shape of the intervals $(a_j,b_j]$ to fit the picture in (24). Let $-\infty < a < b < \infty$, and $\epsilon > 0$. Choose $a'$ in $(a,b)$, and for each $j$ choose $b'_j > b_j$ such that
(25) $$F(a') - F(a) < \frac{\epsilon}{2};\qquad F(b'_j) - F(b_j) < \frac{\epsilon}{2^{j+1}}.$$
These choices are possible because $F$ is right continuous; and now we have
$$[a',b] \subset \bigcup_{j=1}^{\infty}(a_j,b'_j)$$
as required in (24). Hence by Borel's theorem, there exists a finite $l$ such that
(26) $$[a',b] \subset \bigcup_{j=1}^{l}(a_j,b'_j).$$

(27) $$F(b) - F(a') \ \le\ \sum_{j=1}^{l}\big(F(b'_j) - F(a_j)\big).$$
If we intersect both sides of (26) with the complement of $(a_k,b'_k)$, we obtain
$$[b'_k,b] \subset \bigcup_{\substack{j=1\\ j\ne k}}^{l}(a_j,b'_j).$$
Here the number of intervals on the right side is $l-1$; hence by the induction hypothesis we have
$$F(b) - F(b'_k) \ \le\ \sum_{\substack{j=1\\ j\ne k}}^{l}\big(F(b'_j) - F(a_j)\big).$$
Adding this to (28) we obtain (27), and the induction is complete. It follows from (27) and (25) that
$$F(b) - F(a) \ \le\ \sum_{j=1}^{l}\big(F(b_j) - F(a_j)\big) + \epsilon.$$

Then for any $a$ in $(-\infty,b)$, (21) holds with "$=$" replaced by "$\le$". Letting $a\to-\infty$ we obtain the desired result. The case $b=+\infty$ is similar. Q.E.D.
In the following, all $I$ with subscripts denote intervals of the shape $(a,b]$; the symbol $\uplus$ denotes union of disjoint sets. Let $B\in\mathcal{B}_0$; $B_j\in\mathcal{B}_0$, $j\in N$. Thus
$$B = \biguplus_{i=1}^{n} I_i;\qquad B_j = \biguplus_{k=1}^{n_j} I_{jk}.$$
Suppose
$$B = \biguplus_{j=1}^{\infty} B_j,$$
so that
(29) $$\biguplus_{i=1}^{n} I_i = \biguplus_{j=1}^{\infty}\biguplus_{k=1}^{n_j} I_{jk}.$$
We will prove
(30) $$\sum_{i=1}^{n} m(I_i) = \sum_{j=1}^{\infty}\sum_{k=1}^{n_j} m(I_{jk}).$$
For $n=1$, (29) is of the form (20) since a countable set of sets can be ordered as a sequence. Hence (30) follows by Borel's lemma. In general, simple geometry shows that each $I_i$ in (29) is the union of a subcollection of the $I_{jk}$'s. This is easier to see if we order the $I_i$'s in algebraic order and, after merging where possible, separate them at nonzero distances. Therefore (30) follows by adding $n$ equations, each of which results from Borel's lemma. This completes the proof of the countable additivity of $m$ on $\mathcal{B}_0$, namely (19) is true as stipulated there for $l=\infty$ as well as $l<\infty$.
The general method developed in §1 can now be applied to $(R,\mathcal{B}_0,m)$. Substituting $\mathcal{B}_0$ for $\mathcal{F}_0$ and $m$ for $\mu$ in Definition 3, we obtain the outer measure $m^*$. It is remarkable that the countable additivity of $m$ on $\mathcal{B}_0$, for which two painstaking proofs were given above, is used in exactly one place, at the beginning of Theorem 1, to prove that $m^* = m$ on $\mathcal{B}_0$. Next, we define the Borel field $\mathcal{B}^*$ as in Definition 4. By Theorem 6, $(R,\mathcal{B}^*,m^*)$ is a complete measure space. By Definition 5, $m$ is $\sigma$-finite on $\mathcal{B}_0$ because $(-n,n]\uparrow(-\infty,\infty)$ as $n\uparrow\infty$ and $m((-n,n])$ is finite by our primary assumption (14). Hence by Theorem 3, the restriction of $m^*$ to $\mathcal{B}$ is the unique extension of $m$ from $\mathcal{B}_0$ to $\mathcal{B}$.

In the most important case where $F(x)\equiv x$, the measure $m$ on $\mathcal{B}_0$ is the length: $m((a,b]) = b-a$. It was Borel who, around the turn of the twentieth century, first conceived of the notion of a countably additive "length" on an extensive class of sets, now named after him: the Borel field $\mathcal{B}$. A member of this class is called a Borel set. The larger Borel field $\mathcal{B}^*$ was first constructed by Lebesgue from an outer and an inner measure (see pp. 28–29 of main text). The latter was later bypassed by Carathéodory, whose method is adopted here. A member of $\mathcal{B}^*$ is usually called Lebesgue-measurable. The intimate relationship between $\mathcal{B}$ and $\mathcal{B}^*$ is best seen from Theorem 7.

The generalization to a generalized distribution function $F$ is sometimes referred to as Borel–Lebesgue–Stieltjes. See §2.2 of the main text for the special case of a probability distribution.

The generalization to a Euclidean space of higher dimension presents no new difficulty and is encumbered with tedious geometrical "baggage".
The generalization to a Euclidean space of higher dimension presents no
new difficulty and is encumbered with tedious geometrical “baggage”.
It can be proved that the cardinal number of all Borel sets is that of the
real numbers (viz. all points in R), commonly denoted by C (the continuum).
On the other hand, if Z is a Borel set of cardinal C with mZ D 0, such
as the Cantor ternary set (p. 13 of main text), then by the remark preceding
Theorem 6, all subsets of Z are Lebesgue-measurable and hence their totality
has cardinal $2^C$, which is strictly greater than $C$ (see e.g. [3]). It follows that
there are incomparably more Lebesgue-measurable sets than Borel sets.
It is however not easy to exhibit a set in BŁ but not in B; see Exercise
No. 15 on p. 15 of the main text for a clue, but that example uses a non-
Lebesgue-measurable set to begin with.
Are there non-Lebesgue-measurable sets? Using the Axiom of Choice,
we can "define" such a set rather easily; see e.g. [3] or [5]. However, Paul
Cohen has proved that the axiom is independent of the other logical axioms
known as Zermelo–Fraenkel system commonly adopted in mathematics; and
Robert Solovay has proved that in a certain model without the axiom of
choice, all sets of real numbers are Lebesgue-measurable. In the notation of
Definition 1 in §1 in this case, BŁ D S and the outer measure mŁ is a measure
on S .
N.B. Although no explicit invocation is made of the axiom of choice in
the main text of this book, a weaker version of it under the prefix “countable”
must have been casually employed on the q.t. Without the latter, allegedly it is
impossible to show that the union of a countable collection of countable sets
is countable. This kind of logical finesse is beyond the scope of this book.
4 Integral
The measure space $(\Omega,\mathcal{F},\mu)$ is fixed. A function $f$ with domain $\Omega$ and range in $R^* = [-\infty,+\infty]$ is called $\mathcal{F}$-measurable iff for each real number $c$ we have
$$\{f\le c\} = \{\omega\in\Omega:\ f(\omega)\le c\}\in\mathcal{F}.$$
We write $f\in\mathcal{F}$ in this case. It follows that for each set $A\in\mathcal{B}$, namely a Borel set, we have
$$\{f\in A\}\in\mathcal{F};$$
and both $\{f=+\infty\}$ and $\{f=-\infty\}$ also belong to $\mathcal{F}$. Properties of measurable functions are given in Chapter 3, although the measure there is a probability measure.

A function $f\in\mathcal{F}$ with range a countable set in $[0,\infty]$ will be called a basic function. Let $\{a_j\}$ be its range (which may include "$\infty$"), and $A_j = \{f=a_j\}$. Then the $A_j$'s are disjoint sets with union $\Omega$, and
(31) $$f = \sum_j a_j 1_{A_j}.$$
DEFINITION 8(a). For the basic function $f$ in (31), its integral is defined to be
(32) $$E(f) = \sum_j a_j\,\mu(A_j).$$
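A sketch of Definition 8(a) in code (the three-point space and its weights are assumptions of this illustration): the integral is computed by grouping the points of $\Omega$ according to the value that $f$ takes there, exactly as in (32).

```python
from fractions import Fraction

# A finite sample space with assumed weights mu({w}) summing to 1.
mu = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}

def integral(f):
    """E(f) = sum_j a_j mu(A_j), where A_j = {f = a_j} partitions Omega."""
    weights = {}                                    # value a_j -> mu(A_j)
    for w, m in mu.items():
        weights[f(w)] = weights.get(f(w), Fraction(0)) + m
    return sum(a * m for a, m in weights.items())

f = {"w1": 2, "w2": 2, "w3": 5}.get                 # a basic function on Omega
print(integral(f))          # 2*(1/2 + 1/3) + 5*(1/6) = 5/2
```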
The order of summation in the second double series may be reversed, and the result follows by the countable additivity of $\mu$.

(iii) If $f$ and $g$ are basic functions, and $a$ and $b$ positive numbers, then $af+bg$ is basic and
$$E(af+bg) = aE(f) + bE(g).$$
PROOF. It is trivial that $af$ is basic and
$$E(af) = aE(f).$$
Hence it is sufficient to prove the result for $a=b=1$. Using the double decomposition in (ii), we have
$$E(f+g) = \sum_j\sum_k (a_j + b_k)\,\mu(A_j\cap B_k).$$
Splitting the double series in two and then summing in two orders, we obtain the result.
It is a good time to state a general result that contains the double series theorem used above and some other versions of it that will be used below.

Then we have
$$\lim_j\uparrow\ \lim_k\uparrow\ C_{jk} = \lim_k\uparrow\ \lim_j\uparrow\ C_{jk} \ \le\ +\infty.$$
Theorem 8. Let $\{f_n\}$ and $\{g_n\}$ be two increasing sequences of basic functions such that
(36) $$\lim_n\uparrow f_n = \lim_n\uparrow g_n.$$

$$= \lim_n\uparrow E(A;g_n) = \lim_n\uparrow E(g_n),$$
where the last equation is due to (38). Now let $n\uparrow\infty$ to obtain

Corollary. Let $f_n$ and $f$ be basic functions such that $f_n\uparrow f$; then $E(f_n)\uparrow E(f)$.

$$[0] = 0;\qquad [\infty] = \infty;\qquad [x] = n-1\ \text{ for } x\in(n-1,n],\ n\in N.$$
that yields property (iii) for $\mathcal{F}_+$, together with $E(af_m)\uparrow aE(f)$, for $a\ge 0$.

$$A \to E(A;f)$$
is a measure.

PROOF. We need only prove that if $A = \bigcup_{n=1}^{\infty} A_n$ where the $A_n$'s are disjoint sets in $\mathcal{F}$, then
$$E(A;f) = \sum_{n=1}^{\infty} E(A_n;f).$$
For a basic $f$, this follows from properties (iii) and (iv). The extension to $\mathcal{F}_+$ can be done by the double limit theorem and is left as an exercise.
Since $f_n\uparrow f$, the numbers $[2^m f_n(\omega)]\uparrow[2^m f(\omega)]$ as $n\uparrow\infty$, owing to the left continuity of $x\to[x]$. Hence by the Corollary to Theorem 8,
(43) $$\lim_n\uparrow E(f_n^{(m)}) = E(f^{(m)}).$$
It follows that
$$\lim_m\uparrow\ \lim_n\uparrow\ E(f_n^{(m)}) = \lim_m\uparrow E(f^{(m)}) = E(f).$$
(a) $\lim_n f_n = 0$;
(b) $E(\sup_n f_n) < \infty$.

Then we have
(44) $$\lim_n E(f_n) = 0.$$
PROOF. Put for $n\in N$:
(45) $$g_n = \sup_{k\ge n} f_k.$$

Substituting into the preceding relation and cancelling the finite $E(g_1)$, we obtain $E(g_n)\downarrow 0$. Since $0\le f_n\le g_n$, so that $0\le E(f_n)\le E(g_n)$ by property (ii) for $\mathcal{F}_+$, (44) follows.
The next result is known as Fatou's lemma, of the same vintage (1906) as Beppo Levi's. It has the virtue of "no assumptions" with the consequent one-sided conclusion, which is however often useful.

then
$$\liminf_n f_n = \lim_n\uparrow g_n.$$
Hence by Theorem 9,

The left member above is in truth the right member of (47); therefore (46) follows as a milder but neater conclusion.

We have derived Theorem 11 from Theorem 9. Conversely, it is easy to go the other way. For if $f_n\uparrow f$, then (46) yields $E(f)\le\lim_n\uparrow E(f_n)$. Since $f\ge f_n$, $E(f)\ge\lim_n\uparrow E(f_n)$; hence there is equality.

We can also derive Theorem 10 directly from Theorem 11. Using the notation in (45), we have $0\le g_1 - f_n\le g_1$. Hence by condition (a) and (46),
$$E(g_1) = E\big(\liminf_n\ (g_1 - f_n)\big) \ \le\ \liminf_n\ \big(E(g_1) - E(f_n)\big),$$
so that
$$0 \le E(1_A|f|) \le E(1_A\cdot\infty) = \mu(A)\cdot\infty = 0.$$
This implies (51).

To prove (iii), let
$$A_n = \{|f|\ge n\}.$$
Then $A_n\in\mathcal{F}$ and
$$n\,\mu(A_n) = E(A_n; n) \le E(A_n; |f|) \le E(|f|).$$
Hence
(52) $$\mu(A_n) \le \frac{1}{n}E(|f|).$$
Letting $n\uparrow\infty$, so that $A_n\downarrow\{|f|=\infty\}$; since $\mu(A_1)\le E(|f|)<\infty$, we have by property (f) of the measure $\mu$:
$$\mu(\{|f|=\infty\}) = \lim_n\downarrow \mu(A_n) = 0.$$
as follows:
$$f^+ + g^- \ \le\ g^+ + f^-.$$
By the assumptions of $L^1$, all the four quantities above are finite numbers. Transposing back we obtain the desired conclusion.

(iii) if $f\in L^1$, $g\in L^1$, then $f+g\in L^1$, and
$$E(f+g) = E(f) + E(g).$$
Let us leave this as an exercise. If we assume only that both $E(f)$ and $E(g)$ exist and that the right member in the equation above is defined, namely not $(+\infty)+(-\infty)$ or $(-\infty)+(+\infty)$, does $E(f+g)$ then exist and equal the sum? We leave this as a good exercise for the curious, and return to Theorem 10 in its practical form.
Theorem 10′. Let $f_n\in\mathcal{F}$; suppose

(a) $\lim_n f_n = f$ a.e.;
(b) there exists $\varphi\in L^1$ such that for all $n$: $|f_n|\le\varphi$ a.e.

Then we have

(c) $\lim_n E(|f_n - f|) = 0$.

PROOF. Observe first that
$$\big|\lim_n f_n\big| \le \sup_n |f_n|,$$
provided the left member is defined. Since the union of a countable collection of null sets is a null set, under the hypotheses (a) and (b) there is a null set $A$ such that on $\Omega - A$ we have $\sup_n |f_n|\le\varphi$; hence by Theorem 12 (iv), all $|f_n|$, $|f|$, $|f_n-f|$ are integrable, and therefore we can substitute their finite versions without affecting their integrals, and moreover $\lim_n |f_n - f| = 0$ on $\Omega - A$. (Remember that $f_n - f$ need not be defined before the substitutions!) By using Theorem 12 (ii) once more if need be, we obtain the conclusion (c) from the positive version of Theorem 10.
This theorem is known as Lebesgue's dominated convergence theorem, vintage 1908. When $\mu(\Omega)<\infty$, any constant $C$ is integrable and may be used for $\varphi$; hence in this case the result is called the bounded convergence theorem. Curiously, the best known part of the theorem is the corollary below with a fixed $B$.

Corollary. We have
$$\lim_n \int_B f_n\,d\mu = \int_B f\,d\mu$$
uniformly in $B\in\mathcal{F}$.
5 Applications
The general theory of integration applied to a probability space is summarized in §§3.1–3.2 of the main text. The specialization to $R$ expounded in §3 above will now be described and illustrated.

A function $f$ defined on $R$ with range in $[-\infty,+\infty]$ is called a Borel function iff $f\in\mathcal{B}$; it is called a Lebesgue-measurable function iff $f\in\mathcal{B}^*$. The domain of definition of $f$ may be an arbitrary Borel set or Lebesgue-measurable set $D$. This case is reduced to that for $D = R$ by extending the definition of $f$ to be zero outside $D$. The integral of $f\in\mathcal{B}^*$ corresponding to the measure $m^*$ constructed from $F$ is denoted by
$$E(f) = \int_{-\infty}^{\infty} f(x)\,dF(x).$$
In case $F(x)\equiv x$, this is called the Lebesgue integral of $f$; in this case the usual notation is, for $A\in\mathcal{B}^*$:
$$\int_A f(x)\,dx = E(A;f).$$
Suppose the infinite series $\sum_k u_k(x)$ converges on $I$; then in the usual notation:
$$\lim_{n\to\infty}\sum_{k=1}^{n} u_k(x) = \sum_{k=1}^{\infty} u_k(x) = s(x).$$
Question: does the numerical series above converge? and if so, is the sum of integrals equal to the integral of the sum:
$$\sum_{k=1}^{\infty}\int_I u_k(x)\,dx = \int_I\Big(\sum_{k=1}^{\infty} u_k(x)\Big)dx = \int_I s(x)\,dx?$$

The Taylor series of an analytic function always converges uniformly and absolutely in any compact subinterval of its interval of convergence. Thus the result above is fruitful.
Another example of term-by-term integration goes back to Theorem 8. Let $f_n = \sum_{k=1}^{n} u_k$; then $f_n\in L^1$ and $f_n\uparrow f = \sum_{k=1}^{\infty} u_k$. Hence by monotone convergence
$$E(f) = \lim_n E(f_n),$$
that is, (54).

If this is finite, then the same is true when $|u_k|$ is replaced by $u_k^+$ and $u_k^-$. It then follows by subtraction that (54) is also true. This result of term-by-term integration may be regarded as a special case of the Fubini–Tonelli theorem (pp. 63–64), where one of the measures is the counting measure on $N$.
For another perspective, we will apply the Borel–Lebesgue theory of integral to the older Riemann context.

and put
$$\delta(P) = \max_{1\le k\le n}(x_k - x_{k-1}).$$

$$E(f;P) = \sum_{k=1}^{n} f(\xi_k)(x_k - x_{k-1}).$$
The sum above is called a Riemann sum; when the $\xi_k$ are chosen as in (55), they are called lower and upper sums, respectively.

Now let $\{P_n,\ n\in N\}$ be a sequence of partitions such that $\delta(P_n)\to 0$ as $n\to\infty$. Since $f$ is continuous on a compact set, it is bounded. It follows that there is a constant $C$ such that
$$\sup_{n\in N}\ \sup_{x\in I}\ |f^{(P_n)}(x)| < C.$$
Since $I$ is bounded, we can apply the bounded convergence theorem to conclude that

The finite existence of the limit above signifies the Riemann-integrability of $f$, and the limit is then its Riemann-integral $\int_a^b f(x)\,dx$. Thus we have proved that a continuous function on a compact interval is Riemann-integrable, and its Riemann-integral is equal to the Lebesgue integral. Let us recall that in the new theory, any bounded measurable function is integrable over any bounded measurable set. For example, the function
$$\sin\frac{1}{x},\qquad x\in(0,1],$$
being bounded by 1 is integrable. But from the strict Riemannian point of view it has only an "improper" integral, because $(0,1]$ is not closed and the function is not continuous on $[0,1]$; indeed it is not definable there. Yet the limit
$$\lim_{\epsilon\downarrow 0}\int_{\epsilon}^{1}\sin\frac{1}{x}\,dx$$
exists and can be defined to be $\int_0^1 \sin(1/x)\,dx$. As a matter of fact, the Riemann sums do converge despite the unceasing oscillation of $f$ between $-1$ and $+1$ as $x\downarrow 0$.
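The following sketch exhibits this convergence numerically; the small cutoff near 0 and the left-endpoint tags are assumptions of the illustration, not part of the text.

```python
import math

def riemann_sum(f, a, b, n):
    """Left Riemann sum of f over [a, b] with n equal subintervals."""
    h = (b - a) / n
    return sum(f(a + k * h) * h for k in range(n))

f = lambda x: math.sin(1.0 / x)
for n in (10**3, 10**4, 10**5, 10**6):
    # despite the wild oscillation near 0, the sums settle down (near 0.504)
    print(n, riemann_sum(f, 1e-9, 1.0, n))
```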
It follows that $E(f^+) = +\infty$. Similarly $E(f^-) = +\infty$; therefore by Definition 8(c) $E(f)$ does not exist! This example is a splendid illustration of the following

Non-Theorem. Let $f\in\mathcal{B}$ and $f_n = f\,1_{(0,n)}$, $n\in N$. Then $f_n\in\mathcal{B}$ and $f_n\to f$ as $n\to\infty$. Even when the $f_n$'s are "totally bounded", it does not follow that
Example 6. The most notorious example of a simple function that is not Riemann-integrable, and that baffled a generation of mathematicians, is the function $1_Q$, where $Q$ is the set of rational numbers. Its Riemann sums can be made to equal any real number between 0 and 1, when we confine $Q$ to the unit interval $(0,1)$. The function is so totally discontinuous that the Riemannian way of approximating it, horizontally so to speak, fails utterly. But of course it is ludicrous even to consider this indicator function rather than the set $Q$ itself. There was a historical reason for this folly: integration was regarded as the inverse operation to differentiation, so that to integrate was meant to "find the primitive" whose derivative is to be the integrand, for example,
$$\int_0^{\xi} x\,dx = \frac{\xi^2}{2},\qquad \frac{d}{d\xi}\Big(\frac{\xi^2}{2}\Big) = \xi;$$
$$\int_1^{\xi}\frac{1}{x}\,dx = \log\xi,\qquad \frac{d}{d\xi}\log\xi = \frac{1}{\xi}.$$
A primitive is called an "indefinite integral", and $\int_1^2 (1/x)\,dx$ e.g. is called a "definite integral." Thus the unsolvable problem was to find $\int_0^{\xi} 1_Q(x)\,dx$, $0<\xi<1$.
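A sketch of the pathology just described: by bookkeeping rational tags exactly and irrational tags separately (an artificial device of this illustration, not a general rationality test), the Riemann sums of $1_Q$ over $(0,1)$ can be driven to 1 or to 0 at will.

```python
import math
from fractions import Fraction

def is_rational(x):
    # Exact bookkeeping: rational tags are carried as Fraction, irrational as float.
    return isinstance(x, Fraction)

def riemann_sum_with_tags(n, tag):
    """Riemann sum of 1_Q over (0,1) with n equal subintervals and chosen tags."""
    h = Fraction(1, n)
    total = Fraction(0)
    for k in range(n):
        point = tag(k, n)                 # a sample point in the k-th cell
        total += (1 if is_rational(point) else 0) * h
    return total

rational_tag   = lambda k, n: Fraction(k, n)                  # rational tag in cell
irrational_tag = lambda k, n: k / n + math.sqrt(2) / (2 * n)  # irrational tag in cell

print(riemann_sum_with_tags(100, rational_tag))     # 1: every tag is rational
print(riemann_sum_with_tags(100, irrational_tag))   # 0: every tag is irrational
```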
The notion of measure as length, area, and volume is much more ancient than Newton's fluxion (derivative), not to mention the primitive measure of counting with fingers (and toes). The notion of "countable additivity" of a measure, although seemingly natural and facile, somehow did not take hold until Borel saw that
$$m(Q) = \sum_{q\in Q} m(\{q\}) = \sum_{q\in Q} 0 = 0.$$
There can be no question that the "length" of a single point $q$ is zero. Euclid gave it "zero dimension".
This is the beginning of MEASURE. An INTEGRAL is a weighted measure,
as is obvious from Definition 8(a). The rest is approximation, vertically as in Defini-
tion 8(b), and convergence, as in all analysis.
As for the connexion with differentiation, Lebesgue made it, and a clue is given
in §1.3 of the main text.
General bibliography
[The five divisions below are merely a rough classification and are not meant to be
mutually exclusive.]
1. Basic analysis
[4] P. R. Halmos, Measure theory. D. Van Nostrand Co., Inc., Princeton, N.J.,
1956.
[5] H. L. Royden, Real analysis. The Macmillan Company, New York, 1963.
[6] J. Neveu, Mathematical foundations of the calculus of probability. Holden-
Day, Inc., San Francisco, 1965.
3. Probability theory
4. Stochastic processes
[17] J. L. Doob, Stochastic processes. John Wiley & Sons, Inc., New York, 1953.
[18] Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd
ed. Springer-Verlag, Berlin, 1967 [1st ed., 1960].
[19] Frank Spitzer, Principles of random walk. D. Van Nostrand Co., Princeton,
N.J., 1964.
[20] Paul-André Meyer, Probabilités et potentiel. Hermann (Éditions Scientifiques), Paris, 1966.
[21] G. A. Hunt, Martingales et processus de Markov. Dunod, Paris, 1966.
[22] David Freedman, Brownian motion and diffusion. Holden-Day, Inc., San
Francisco, 1971.
5. Supplementary reading