
A
COURSE IN
PROBABILITY
THEORY

THIRD EDITION

Kai Lai Chung


Stanford University

San Diego San Francisco New York


Boston London Sydney Tokyo
This book is printed on acid-free paper.

COPYRIGHT © 2001, 1974, 1968 BY ACADEMIC PRESS


ALL RIGHTS RESERVED.
NO PART OF THIS PUBLICATION MAY BE REPRODUCED OR TRANSMITTED IN ANY FORM OR BY ANY
MEANS, ELECTRONIC OR MECHANICAL, INCLUDING PHOTOCOPY, RECORDING, OR ANY INFORMATION
STORAGE AND RETRIEVAL SYSTEM, WITHOUT PERMISSION IN WRITING FROM THE PUBLISHER.

Requests for permission to make copies of any part of the work should be mailed to the
following address: Permissions Department, Harcourt, Inc., 6277 Sea Harbor Drive, Orlando,
Florida 32887-6777.

ACADEMIC PRESS
A Harcourt Science and Technology Company
525 B Street, Suite 1900, San Diego, CA 92101-4495, USA
http://www.academicpress.com

ACADEMIC PRESS
Harcourt Place, 32 Jamestown Road, London, NW1 7BY, UK
http://www.academicpress.com

Library of Congress Cataloging in Publication Data: 00-106712

International Standard Book Number: 0-12-174151-6

PRINTED IN THE UNITED STATES OF AMERICA


00 01 02 03 IP 9 8 7 6 5 4 3 2 1
Contents

Preface to the third edition
Preface to the second edition
Preface to the first edition

1 Distribution function
  1.1 Monotone functions
  1.2 Distribution functions
  1.3 Absolutely continuous and singular distributions

2 Measure theory
  2.1 Classes of sets
  2.2 Probability measures and their distribution functions

3 Random variable. Expectation. Independence
  3.1 General definitions
  3.2 Properties of mathematical expectation
  3.3 Independence

4 Convergence concepts
  4.1 Various modes of convergence
  4.2 Almost sure convergence; Borel–Cantelli lemma
  4.3 Vague convergence
  4.4 Continuation
  4.5 Uniform integrability; convergence of moments

5 Law of large numbers. Random series
  5.1 Simple limit theorems
  5.2 Weak law of large numbers
  5.3 Convergence of series
  5.4 Strong law of large numbers
  5.5 Applications
  Bibliographical Note

6 Characteristic function
  6.1 General properties; convolutions
  6.2 Uniqueness and inversion
  6.3 Convergence theorems
  6.4 Simple applications
  6.5 Representation theorems
  6.6 Multidimensional case; Laplace transforms
  Bibliographical Note

7 Central limit theorem and its ramifications
  7.1 Liapounov's theorem
  7.2 Lindeberg–Feller theorem
  7.3 Ramifications of the central limit theorem
  7.4 Error estimation
  7.5 Law of the iterated logarithm
  7.6 Infinite divisibility
  Bibliographical Note

8 Random walk
  8.1 Zero-or-one laws
  8.2 Basic notions
  8.3 Recurrence
  8.4 Fine structure
  8.5 Continuation
  Bibliographical Note

9 Conditioning. Markov property. Martingale
  9.1 Basic properties of conditional expectation
  9.2 Conditional independence; Markov property
  9.3 Basic properties of smartingales
  9.4 Inequalities and convergence
  9.5 Applications
  Bibliographical Note

Supplement: Measure and Integral
  1 Construction of measure
  2 Characterization of extensions
  3 Measures in R
  4 Integral
  5 Applications

General Bibliography

Index
Preface to the third edition

In this new edition, I have added a Supplement on Measure and Integral. The subject matter is first treated in a general setting pertinent to an abstract
measure space, and then specified in the classic Borel-Lebesgue case for the
real line. The latter material, an essential part of real analysis, is presupposed
in the original edition published in 1968 and revised in the second edition
of 1974. When I taught the course under the title “Advanced Probability”
at Stanford University beginning in 1962, students from the departments of
statistics, operations research (formerly industrial engineering), electrical engi-
neering, etc. often had to take a prerequisite course given by other instructors
before they enlisted in my course. In later years I prepared a set of notes,
lithographed and distributed in the class, to meet the need. This forms the
basis of the present Supplement. It is hoped that the result may as well serve
in an introductory mode, perhaps also independently for a short course in the
stated topics.
The presentation is largely self-contained with only a few particular refer-
ences to the main text. For instance, after (the old) §2.1 where the basic notions
of set theory are explained, the reader can proceed to the first two sections of
the Supplement for a full treatment of the construction and completion of a
general measure; the next two sections contain a full treatment of the mathe-
matical expectation as an integral, of which the properties are recapitulated in
§3.2. In the final section, application of the new integral to the older Riemann
integral in calculus is described and illustrated with some famous examples.
Throughout the exposition, a few side remarks, pedagogic, historical, even judgmental, of the kind I used to drop in the classroom, are approximately reproduced.
In drafting the Supplement, I consulted Patrick Fitzsimmons on several
occasions for support. Giorgio Letta and Bernard Bru gave me encouragement
for the uncommon approach to Borel’s lemma in §3, for which the usual proof
always left me disconsolate as being too devious for the novice’s appreciation.
A small number of additional remarks and exercises have been added to
the main text.
Warm thanks are due: to Vanessa Gerhard of Academic Press who deci-
phered my handwritten manuscript with great ease and care; to Isolde Field
of the Mathematics Department for unfailing assistance; to Jim Luce for a
mission accomplished. Last and evidently not least, my wife and my daughter
Corinna performed numerous tasks indispensable to the undertaking of this
publication.
Preface to the second edition

This edition contains a good number of additions scattered throughout the book as well as numerous voluntary and involuntary changes. The reader who
is familiar with the first edition will have the joy (or chagrin) of spotting new
entries. Several sections in Chapters 4 and 9 have been rewritten to make the
material more adaptable to application in stochastic processes. Let me reiterate
that this book was designed as a basic study course prior to various possible
specializations. There is enough material in it to cover an academic year in
class instruction, if the contents are taken seriously, including the exercises.
On the other hand, the ordering of the topics may be varied considerably to
suit individual tastes. For instance, Chapters 6 and 7 dealing with limiting
distributions can be easily made to precede Chapter 5 which treats almost
sure convergence. A specific recommendation is to take up Chapter 9, where
conditioning makes a belated appearance, before much of Chapter 5 or even
Chapter 4. This would be more in the modern spirit of an early weaning from
the independence concept, and could be followed by an excursion into the
Markovian territory.
Thanks are due to many readers who have told me about errors, obscuri-
ties, and inanities in the first edition. An incomplete record includes the names
below (with apology for forgotten ones): Geoff Eagleson, Z. Govindarajulu,
David Heath, Bruce Henry, Donald Iglehart, Anatole Joffe, Joseph Marker,
P. Masani, Warwick Millar, Richard Olshen, S. M. Samuels, David Siegmund,
T. Thedéen, A. González Villalobos, Michel Weil, and Ward Whitt. The
revised manuscript was checked in large measure by Ditlev Monrad. The
galley proofs were read by David Kreps and myself independently, and it was
fun to compare scores and see who missed what. But since not all parts of
the old text have undergone the same scrutiny, readers of the new edition
are cordially invited to continue the fault-finding. Martha Kirtley and Joan
Shepard typed portions of the new material. Gail Lemmond took charge of
the final page-by-page revamping and it was through her loving care that the
revision was completed on schedule.
In the third printing a number of misprints and mistakes, mostly minor, are
corrected. I am indebted to the following persons for some of these corrections:
Roger Alexander, Steven Carchedi, Timothy Green, Joseph Horowitz, Edward
Korn, Pierre van Moerbeke, David Siegmund.
In the fourth printing, an oversight in the proof of Theorem 6.3.1 is
corrected, a hint is added to Exercise 2 in Section 6.4, and a simplification
made in (VII) of Section 9.5. A number of minor misprints are also corrected. I
am indebted to several readers, including Asmussen, Robert, Schatte, Whitley
and Yannaros, who wrote me about the text.
Preface to the first edition

A mathematics course is not a stockpile of raw material nor a random selection of vignettes. It should offer a sustained tour of the field being surveyed and
a preferred approach to it. Such a course is bound to be somewhat subjective
and tentative, neither stationary in time nor homogeneous in space. But it
should represent a considered effort on the part of the author to combine his
philosophy, conviction, and experience as to how the subject may be learned
and taught. The field of probability is already so large and diversified that
even at the level of this introductory book there can be many different views
on orientation and development that affect the choice and arrangement of its
content. The necessary decisions being hard and uncertain, one too often takes
refuge by pleading a matter of “taste.” But there is good taste and bad taste
in mathematics just as in music, literature, or cuisine, and one who dabbles in
it must stand judged thereby.
It might seem superfluous to emphasize the word “probability” in a book
dealing with the subject. Yet on the one hand, one used to hear such specious
utterance as “probability is just a chapter of measure theory”; on the other
hand, many still use probability as a front for certain types of analysis such as
combinatorial, Fourier, functional, and whatnot. Now a properly constructed
course in probability should indeed make substantial use of these and other
allied disciplines, and a strict line of demarcation need never be drawn. But
PROBABILITY is still distinct from its tools and its applications not only in
the final results achieved but also in the manner of proceeding. This is perhaps
best seen in the advanced study of stochastic processes, but will already be
abundantly clear from the contents of a general introduction such as this book.
Although many notions of probability theory arise from concrete models
in applied sciences, recalling such familiar objects as coins and dice, genes
and particles, a basic mathematical text (as this pretends to be) can no longer
indulge in diverse applications, just as nowadays a course in real variables
cannot delve into the vibrations of strings or the conduction of heat. Inciden-
tally, merely borrowing the jargon from another branch of science without
treating its genuine problems does not aid in the understanding of concepts or
the mastery of techniques.
A final disclaimer: this book is not the prelude to something else and does
not lead down a strait and righteous path to any unique fundamental goal.
Fortunately nothing in the theory deserves such single-minded devotion, as
apparently happens in certain other fields of mathematics. Quite the contrary,
a basic course in probability should offer a broad perspective of the open field
and prepare the student for various further possibilities of study and research.
To this aim he must acquire knowledge of ideas and practice in methods, and
dwell with them long and deeply enough to reap the benefits.
A brief description will now be given of the nine chapters, with some
suggestions for reading and instruction. Chapters 1 and 2 are preparatory. A
synopsis of the requisite “measure and integration” is given in Chapter 2,
together with certain supplements essential to probability theory. Chapter 1 is
really a review of elementary real variables; although it is somewhat expend-
able, a reader with adequate background should be able to cover it swiftly and
confidently — with something gained from the effort. For class instruction it
may be advisable to begin the course with Chapter 2 and fill in from Chapter 1
as the occasions arise. Chapter 3 is the true introduction to the language and
framework of probability theory, but I have restricted its content to what is
crucial and feasible at this stage, relegating certain important extensions, such
as shifting and conditioning, to Chapters 8 and 9. This is done to avoid over-
loading the chapter with definitions and generalities that would be meaningless
without frequent application. Chapter 4 may be regarded as an assembly of
notions and techniques of real function theory adapted to the usage of proba-
bility. Thus, Chapter 5 is the first place where the reader encounters bona fide
theorems in the field. The famous landmarks shown there serve also to intro-
duce the ways and means peculiar to the subject. Chapter 6 develops some of
the chief analytical weapons, namely Fourier and Laplace transforms, needed
for challenges old and new. Quick testing grounds are provided, but for major
battlefields one must await Chapters 7 and 8. Chapter 7 initiates what has been
called the “central problem” of classical probability theory. Time has marched
on and the center of the stage has shifted, but this topic remains without
doubt a crowning achievement. In Chapters 8 and 9 two different aspects of
(discrete parameter) stochastic processes are presented in some depth. The
random walks in Chapter 8 illustrate the way probability theory transforms
other parts of mathematics. It does so by introducing the trajectories of a
process, thereby turning what was static into a dynamic structure. The same
revolution is now going on in potential theory by the injection of the theory of
Markov processes. In Chapter 9 we return to fundamentals and strike out in
major new directions. While Markov processes can be barely introduced in the
limited space, martingales have become an indispensable tool for any serious
study of contemporary work and are discussed here at length. The fact that
these topics are placed at the end rather than the beginning of the book, where
they might very well be, testifies to my belief that the student of mathematics
is better advised to learn something old before plunging into the new.
A short course may be built around Chapters 2, 3, 4, selections from
Chapters 5, 6, and the first one or two sections of Chapter 9. For a richer fare,
substantial portions of the last three chapters should be given without skipping
any one of them. In a class with solid background, Chapters 1, 2, and 4 need
not be covered in detail. At the opposite end, Chapter 2 may be filled in with
proofs that are readily available in standard texts. It is my hope that this book
may also be useful to mature mathematicians as a gentle but not so meager
introduction to genuine probability theory. (Often they stop just before things
become interesting!) Such a reader may begin with Chapter 3, go at once to
Chapter 5 with a few glances at Chapter 4, skim through Chapter 6, and take
up the remaining chapters seriously to get a real feeling for the subject.
Several cases of exclusion and inclusion merit special comment. I chose
to construct only a sequence of independent random variables (in Section 3.3),
rather than a more general one, in the belief that the latter is better absorbed in a
course on stochastic processes. I chose to postpone a discussion of conditioning
until quite late, in order to follow it up at once with varied and worthwhile
applications. With a little reshuffling Section 9.1 may be placed right after
Chapter 3 if so desired. I chose not to include a fuller treatment of infinitely
divisible laws, for two reasons: the material is well covered in two or three
treatises, and the best way to develop it would be in the context of the under-
lying additive process, as originally conceived by its creator Paul Lévy. I
took pains to spell out a peripheral discussion of the logarithm of charac-
teristic function to combat the errors committed on this score by numerous
existing books. Finally, and this is mentioned here only in response to a query
by Doob, I chose to present the brutal Theorem 5.3.2 in the original form
given by Kolmogorov because I want to expose the student to hardships in
mathematics.
There are perhaps some new things in this book, but in general I have
not striven to appear original or merely different, having at heart the interests
of the novice rather than the connoisseur. In the same vein, I favor as a
rule of writing (euphemistically called “style”) clarity over elegance. In my
opinion the slightly decadent fashion of conciseness has been overwrought,
particularly in the writing of textbooks. The only valid argument I have heard
for an excessively terse style is that it may encourage the reader to think for
himself. Such an effect can be achieved equally well, for anyone who wishes
it, by simply omitting every other sentence in the unabridged version.
This book contains about 500 exercises consisting mostly of special cases
and examples, second thoughts and alternative arguments, natural extensions,
and some novel departures. With a few obvious exceptions they are neither
profound nor trivial, and hints and comments are appended to many of them.
If they tend to be somewhat inbred, at least they are relevant to the text and
should help in its digestion. As a bold venture I have marked a few of them
with * to indicate a “must,” although no rigid standard of selection has been
used. Some of these are needed in the book, but in any case the reader’s study
of the text will be more complete after he has tried at least those problems.
Over a span of nearly twenty years I have taught a course at approx-
imately the level of this book a number of times. The penultimate draft of
the manuscript was tried out in a class given in 1966 at Stanford University.
Because of an anachronism that allowed only two quarters to the course (as
if probability could also blossom faster in the California climate!), I had to
omit the second halves of Chapters 8 and 9 but otherwise kept fairly closely
to the text as presented here. (The second half of Chapter 9 was covered in a
subsequent course called “stochastic processes.”) A good fraction of the exer-
cises were assigned as homework, and in addition a great majority of them
were worked out by volunteers. Among those in the class who cooperated
in this manner and who corrected mistakes and suggested improvements are:
Jack E. Clark, B. Curtis Eaves, Susan D. Horn, Alan T. Huckleberry, Thomas
M. Liggett, and Roy E. Welsch, to whom I owe sincere thanks. The manuscript
was also read by J. L. Doob and Benton Jamison, both of whom contributed
a great deal to the final revision. They have also used part of the manuscript
in their classes. Aside from these personal acknowledgments, the book owes
of course to a large number of authors of original papers, treatises, and text-
books. I have restricted bibliographical references to the major sources while
adding many more names among the exercises. Some oversight is perhaps
inevitable; however, inconsequential or irrelevant “name-dropping” is delib-
erately avoided, with two or three exceptions which should prove the rule.
It is a pleasure to thank Rosemarie Stampfel and Gail Lemmond for their
superb job in typing the manuscript.
1 Distribution function

1.1 Monotone functions

We begin with a discussion of distribution functions as a traditional way of introducing probability measures. It serves as a convenient bridge from elementary analysis to probability theory, upon which the beginner may pause to review his mathematical background and test his mental agility. Some of the methods as well as results in this chapter are also useful in the theory of stochastic processes.

In this book we shall follow the fashionable usage of the words “positive”, “negative”, “increasing”, “decreasing” in their loose interpretation. For example, “x is positive” means “x ≥ 0”; the qualifier “strictly” will be added when “x > 0” is meant. By a “function” we mean in this chapter a real finite-valued one unless otherwise specified.

Let then f be an increasing function defined on the real line (−∞, +∞). Thus for any two real numbers x_1 and x_2,

(1)  x_1 < x_2 ⇒ f(x_1) ≤ f(x_2).

We begin by reviewing some properties of such a function. The notation “t ↑ x” means “t < x, t → x”; “t ↓ x” means “t > x, t → x”.
(i) For each x, both unilateral limits

(2)  lim_{t↑x} f(t) = f(x−)  and  lim_{t↓x} f(t) = f(x+)

exist and are finite. Furthermore the limits at infinity

lim_{t↓−∞} f(t) = f(−∞)  and  lim_{t↑+∞} f(t) = f(+∞)

exist; the former may be −∞, the latter may be +∞.

This follows from monotonicity; indeed

f(x−) = sup_{−∞<t<x} f(t),   f(x+) = inf_{x<t<+∞} f(t).

(ii) For each x, f is continuous at x if and only if

f(x−) = f(x) = f(x+).

To see this, observe that the continuity of a monotone function f at x is equivalent to the assertion that

lim_{t↑x} f(t) = f(x) = lim_{t↓x} f(t).

By (i), the limits above exist as f(x−) and f(x+) and

(3)  f(x−) ≤ f(x) ≤ f(x+),

from which (ii) follows.

In general, we say that the function f has a jump at x iff the two limits in (2) both exist but are unequal. The value of f at x itself, viz. f(x), may be arbitrary, but for an increasing f the relation (3) must hold. As a consequence of (i) and (ii), we have the next result.

(iii) The only possible kind of discontinuity of an increasing function is a jump. [The reader should ask himself what other kinds of discontinuity there are for a function in general.]

If there is a jump at x, we call x a point of jump of f and the number f(x+) − f(x−) the size of the jump or simply “the jump” at x.

It is worthwhile to observe that points of jump may have a finite point of accumulation and that such a point of accumulation need not be a point of jump itself. Thus, the set of points of jump is not necessarily a closed set.
Example 1. Let x_0 be an arbitrary real number, and define a function f as follows:

f(x) = 0          for x ≤ x_0 − 1;
     = 1 − 1/n    for x_0 − 1/n ≤ x < x_0 − 1/(n+1), n = 1, 2, …;
     = 1          for x ≥ x_0.

The point x_0 is a point of accumulation of the points of jump {x_0 − 1/n, n ≥ 1}, but f is continuous at x_0.

Before we discuss the next example, let us introduce a notation that will be used throughout the book. For any real number t, we set

(4)  δ_t(x) = 0 for x < t;  δ_t(x) = 1 for x ≥ t.

We shall call the function δ_t the point mass at t.

Example 2. Let {a_n, n ≥ 1} be any given enumeration of the set of all rational numbers, and let {b_n, n ≥ 1} be a set of positive (>0) numbers such that Σ_{n=1}^∞ b_n < ∞. For instance, we may take b_n = 2^{−n}. Consider now

(5)  f(x) = Σ_{n=1}^∞ b_n δ_{a_n}(x).

Since 0 ≤ δ_{a_n}(x) ≤ 1 for every n and x, the series in (5) is absolutely and uniformly convergent. Since each δ_{a_n} is increasing, it follows that if x_1 < x_2,

f(x_2) − f(x_1) = Σ_{n=1}^∞ b_n [δ_{a_n}(x_2) − δ_{a_n}(x_1)] ≥ 0.

Hence f is increasing. Thanks to the uniform convergence (why?) we may deduce that for each x,

(6)  f(x+) − f(x−) = Σ_{n=1}^∞ b_n [δ_{a_n}(x+) − δ_{a_n}(x−)].

But for each n, the number in the square brackets above is 0 or 1 according as x ≠ a_n or x = a_n. Hence if x is different from all the a_n's, each term on the right side of (6) vanishes; on the other hand if x = a_k, say, then exactly one term, that corresponding to n = k, does not vanish and yields the value b_k for the whole series. This proves that the function f has jumps at all the rational points and nowhere else.
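The construction in Example 2 is easy to experiment with numerically. The following minimal Python sketch (an added illustration, not part of the original text) truncates the series (5), with b_n = 2^{−n} and a small initial segment of an enumeration of the rationals as assumed inputs, and checks that the jump at a rational a_k is approximately b_k.

```python
from fractions import Fraction

def point_mass(t, x):
    """delta_t(x): 0 for x < t, 1 for x >= t."""
    return 1 if x >= t else 0

# A finite initial segment of an enumeration of the rationals (illustrative).
rationals = [Fraction(p, q) for q in range(1, 6) for p in range(-5, 6)]
a = list(dict.fromkeys(rationals))             # drop duplicates, keep order
b = [2.0 ** -(n + 1) for n in range(len(a))]   # b_n = 2^{-n}, n = 1, 2, ...

def f(x):
    """Truncated version of f(x) = sum_n b_n * delta_{a_n}(x)."""
    return sum(bn * point_mass(an, x) for an, bn in zip(a, b))

# The jump of f at the rational a_k = 1/2 is approximately b_k:
k = a.index(Fraction(1, 2))
eps = 1e-9
print(f(0.5 + eps) - f(0.5 - eps), b[k])   # both close to b_k
```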
This example shows that the set of points of jump of an increasing function may be everywhere dense; in fact the set of rational numbers in the example may be replaced by an arbitrary countable set without any change of the argument. We now show that the condition of countability is indispensable. By “countable” we mean always “finite (possibly empty) or countably infinite”.

(iv) The set of discontinuities of f is countable.

We shall prove this by a topological argument of some general applicability. In Exercise 3 after this section another proof based on an equally useful counting argument will be indicated. For each point of jump x consider the open interval I_x = (f(x−), f(x+)). If x′ is another point of jump and x < x′, say, then there is a point x̃ such that x < x̃ < x′. Hence by monotonicity we have

f(x+) ≤ f(x̃) ≤ f(x′−).

It follows that the two intervals I_x and I_{x′} are disjoint, though they may abut on each other if f(x+) = f(x′−). Thus we may associate with the set of points of jump in the domain of f a certain collection of pairwise disjoint open intervals in the range of f. Now any such collection is necessarily a countable one, since each interval contains a rational number, so that the collection of intervals is in one-to-one correspondence with a certain subset of the rational numbers and the latter is countable. Therefore the set of discontinuities is also countable, since it is in one-to-one correspondence with the set of intervals associated with it.

(v) Let f_1 and f_2 be two increasing functions and D a set that is (everywhere) dense in (−∞, +∞). Suppose that

∀x ∈ D: f_1(x) = f_2(x).

Then f_1 and f_2 have the same points of jump of the same size, and they coincide except possibly at some of these points of jump.

To see this, let x be an arbitrary point and let t_n ∈ D, t′_n ∈ D, t_n ↑ x, t′_n ↓ x. Such sequences exist since D is dense. It follows from (i) that

f_1(x−) = lim_n f_1(t_n) = lim_n f_2(t_n) = f_2(x−),
f_1(x+) = lim_n f_1(t′_n) = lim_n f_2(t′_n) = f_2(x+).

In particular

∀x: f_1(x+) − f_1(x−) = f_2(x+) − f_2(x−).

The first assertion in (v) follows from this equation and (ii). Furthermore if f_1 is continuous at x, then so is f_2 by what has just been proved, and we have

f_1(x) = f_1(x−) = f_2(x−) = f_2(x),

proving the second assertion.


How can f_1 and f_2 differ at all? This can happen only when f_1(x) and f_2(x) assume different values in the interval (f_1(x−), f_1(x+)) = (f_2(x−), f_2(x+)). It will turn out in Chapter 2 (see in particular Exercise 21 of Sec. 2.2) that the precise value of f at a point of jump is quite unessential for our purposes and may be modified, subject to (3), to suit our convenience. More precisely, given the function f, we can define a new function f̃ in several different ways, such as

f̃(x) = f(x−),   f̃(x) = f(x+),   f̃(x) = (f(x−) + f(x+))/2,

and use one of these instead of the original one. The third modification is found to be convenient in Fourier analysis, but either one of the first two is more suitable for probability theory. We have a free choice between them and we shall choose the second, namely, right continuity.

(vi) If we put

∀x: f̃(x) = f(x+),

then f̃ is increasing and right continuous everywhere.

Let us recall that an arbitrary function g is said to be right continuous at x iff lim_{t↓x} g(t) exists and the limit, to be denoted by g(x+), is equal to g(x). To prove the assertion (vi) we must show that

∀x: lim_{t↓x} f(t+) = f(x+).

This is indeed true for any f such that f(t+) exists for every t. For then: given any ε > 0, there exists δ > 0 such that

∀s ∈ (x, x+δ): |f(s) − f(x+)| ≤ ε.

Let t ∈ (x, x+δ) and let s ↓ t in the above; then we obtain

|f(t+) − f(x+)| ≤ ε,

which proves that f̃ is right continuous. It is easy to see that it is increasing if f is so.

Let D be dense in (−∞, +∞), and suppose that f is a function with the domain D. We may speak of the monotonicity, continuity, uniform continuity, and so on of f on its domain of definition if in the usual definitions we restrict ourselves to the points of D. Even if f is defined in a larger domain, we may still speak of these properties “on D” by considering the “restriction of f to D”.

(vii) Let f be increasing on D, and define f̃ on (−∞, +∞) as follows:

∀x: f̃(x) = inf_{x<t∈D} f(t).

Then f̃ is increasing and right continuous everywhere.

This is a generalization of (vi). f̃ is clearly increasing. To prove right continuity let an arbitrary x_0 and ε > 0 be given. There exists t_0 ∈ D, t_0 > x_0, such that

f(t_0) − ε ≤ f̃(x_0) ≤ f(t_0).

Hence if t ∈ D, x_0 < t < t_0, we have

0 ≤ f(t) − f̃(x_0) ≤ f(t_0) − f̃(x_0) ≤ ε.

This implies by the definition of f̃ that for x_0 < x < t_0 we have

0 ≤ f̃(x) − f̃(x_0) ≤ ε.

Since ε is arbitrary, it follows that f̃ is right continuous at x_0, as was to be shown.

EXERCISES

1. Prove that for the f in Example 2 we have

f(−∞) = 0,   f(+∞) = Σ_{n=1}^∞ b_n.

2. Construct an increasing function on (−∞, +∞) with a jump of size one at each integer, and constant between jumps. Such a function cannot be represented as Σ_{n=1}^∞ b_n δ_n(x) with b_n = 1 for each n, but a slight modification will do. Spell this out.

*3. Suppose that f is increasing and that there exist real numbers A and B such that ∀x: A ≤ f(x) ≤ B. Show that for each ε > 0, the number of jumps of size exceeding ε is at most (B − A)/ε. Hence prove (iv), first for bounded f and then in general.

* indicates specially selected exercises (as mentioned in the Preface).

4. Let f be an arbitrary function on (−∞, +∞) and L be the set of x where f is right continuous but not left continuous. Prove that L is a countable set. [HINT: Consider L ∩ M_n, where M_n = {x | ω(f; x) > 1/n} and ω(f; x) is the oscillation of f at x.]

*5. Let f and f̃ be as in (vii). Show that the continuity of f on D does not imply that of f̃ on (−∞, +∞), but uniform continuity does imply uniform continuity.

6. Given any extended-valued f on (−∞, +∞), there exists a countable set D with the following property. For each t, there exist t_n ∈ D, t_n → t such that f(t) = lim_{n→∞} f(t_n). This assertion remains true if “t_n → t” is replaced by “t_n ↓ t” or “t_n ↑ t”. [This is the crux of “separability” for stochastic processes. Consider the graph (t, f(t)) and introduce a metric.]

1.2 Distribution functions

Suppose now that f is bounded as well as increasing and not constant. We have then

∀x: −∞ < f(−∞) ≤ f(x) ≤ f(+∞) < +∞.

Consider the “normalized” function:

(1)  f̃(x) = (f(x) − f(−∞)) / (f(+∞) − f(−∞)),

which is bounded and increasing with

(2)  f̃(−∞) = 0,   f̃(+∞) = 1.

Owing to the simple nature of the linear transformation in (1) from f to f̃ and vice versa, we may without loss of generality assume the normalizing conditions (2) in dealing with a bounded increasing function. To avoid needless complications, we shall also assume that f̃ is right continuous as discussed in Sec. 1.1. These conditions will now be formalized.

DEFINITION OF A DISTRIBUTION FUNCTION. A real-valued function F with domain (−∞, +∞) that is increasing and right continuous with F(−∞) = 0, F(+∞) = 1 is called a distribution function, to be abbreviated hereafter as “d.f.” A d.f. that is a point mass as defined in (4) of Sec. 1.1 is said to be “degenerate”, otherwise “nondegenerate”.
Of course all the properties given in Sec. 1.1 hold for a d.f. Indeed the added assumption of boundedness does not appreciably simplify the proofs there. In particular, let {a_j} be the countable set of points of jump of F and b_j the size of the jump at a_j; then

F(a_j) − F(a_j−) = b_j

since F(a_j+) = F(a_j). Consider the function

F_d(x) = Σ_j b_j δ_{a_j}(x),

which represents the sum of all the jumps of F in the half-line (−∞, x]. It is clearly increasing, right continuous, with

(3)  F_d(−∞) = 0,   F_d(+∞) = Σ_j b_j ≤ 1.

Hence F_d is a bounded increasing function. It should constitute the “jumping part” of F, and if it is subtracted out from F, the remainder should be positive, contain no more jumps, and so be continuous. These plausible statements will now be proved — they are easy enough but not really trivial.

Theorem 1.2.1. Let

F_c(x) = F(x) − F_d(x);

then F_c is positive, increasing, and continuous.

PROOF. Let x < x′; then we have

(4)  F_d(x′) − F_d(x) = Σ_{x<a_j≤x′} b_j = Σ_{x<a_j≤x′} [F(a_j) − F(a_j−)] ≤ F(x′) − F(x).

It follows that both F_d and F_c are increasing, and if we put x = −∞ in the above, we see that F_d ≤ F and so F_c is indeed positive. Next, F_d is right continuous since each δ_{a_j} is and the series defining F_d converges uniformly in x; the same argument yields (cf. Example 2 of Sec. 1.1)

F_d(x) − F_d(x−) = b_j if x = a_j,  = 0 otherwise.

Now this evaluation holds also if F_d is replaced by F according to the definition of a_j and b_j; hence we obtain for each x:

F_c(x) − F_c(x−) = F(x) − F(x−) − [F_d(x) − F_d(x−)] = 0.

This shows that F_c is left continuous; since it is also right continuous, being the difference of two such functions, it is continuous.
Theorem 1.2.2. Let F be a d.f. Suppose that there exist a continuous function G_c and a function G_d of the form

G_d(x) = Σ_j b′_j δ_{a′_j}(x)

[where {a′_j} is a countable set of real numbers and Σ_j |b′_j| < ∞], such that

F = G_c + G_d;

then

G_c = F_c,   G_d = F_d,

where F_c and F_d are defined as before.

PROOF. If F_d ≠ G_d, then either the sets {a_j} and {a′_j} are not identical, or we may relabel the a′_j so that a′_j = a_j for all j but b′_j ≠ b_j for some j. In either case we have for at least one j, and ã = a_j or a′_j:

[F_d(ã) − F_d(ã−)] − [G_d(ã) − G_d(ã−)] ≠ 0.

Since F_c − G_c = G_d − F_d, this implies that

F_c(ã) − G_c(ã) − [F_c(ã−) − G_c(ã−)] ≠ 0,

contradicting the fact that F_c − G_c is a continuous function. Hence F_d = G_d and consequently F_c = G_c.

DEFINITION. A d.f. F that can be represented in the form

F = Σ_j b_j δ_{a_j},

where {a_j} is a countable set of real numbers, b_j > 0 for every j and Σ_j b_j = 1, is called a discrete d.f. A d.f. that is continuous everywhere is called a continuous d.f.

Suppose F_c ≢ 0, F_d ≢ 0 in Theorem 1.2.1; then we may set α = F_d(∞), so that 0 < α < 1,

F_1 = (1/α) F_d,   F_2 = (1/(1−α)) F_c,

and write

(5)  F = αF_1 + (1−α)F_2.

Now F_1 is a discrete d.f., F_2 is a continuous d.f., and F is exhibited as a convex combination of them. If F_c ≡ 0, then F is discrete and we set α = 1, F_1 ≡ F, F_2 ≡ 0; if F_d ≡ 0, then F is continuous and we set α = 0, F_1 ≡ 0, F_2 ≡ F; in either extreme case (5) remains valid. We may now summarize the two theorems above as follows.

Theorem 1.2.3. Every d.f. can be written as the convex combination of a discrete and a continuous one. Such a decomposition is unique.
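As a concrete illustration of (5) (added here, not in the original text), the sketch below takes α = 1/2, F_1 the point mass δ_0, and F_2 the uniform d.f. on [0, 1] — all illustrative choices — and evaluates the mixture F = αF_1 + (1−α)F_2, which has a single jump of size α at 0 and is continuous elsewhere.

```python
def F1(x):
    """Discrete part: point mass delta_0 at 0."""
    return 1.0 if x >= 0 else 0.0

def F2(x):
    """Continuous part: uniform d.f. on [0, 1]."""
    return min(max(x, 0.0), 1.0)

alpha = 0.5

def F(x):
    """F = alpha*F1 + (1 - alpha)*F2, a d.f. by Theorem 1.2.3."""
    return alpha * F1(x) + (1 - alpha) * F2(x)

eps = 1e-12
print(F(0.0) - F(-eps))   # jump at 0: ~ alpha = 0.5
print(F(0.75))            # 0.5 + 0.5*0.75 = 0.875
```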

EXERCISES

1. Let F be a d.f. Then for each x,

lim_{ε↓0} [F(x+ε) − F(x−ε)] = 0

unless x is a point of jump of F, in which case the limit is equal to the size of the jump.

*2. Let F be a d.f. with points of jump {a_j}. Prove that the sum

Σ_{x−ε<a_j<x} [F(a_j) − F(a_j−)]

converges to zero as ε ↓ 0, for every x. What if the summation above is extended to x−ε < a_j ≤ x instead? Give another proof of the continuity of F_c in Theorem 1.2.1 by using this problem.

3. A plausible verbal definition of a discrete d.f. may be given thus: “It is a d.f. that has jumps and is constant between jumps.” [Such a function is sometimes called a “step function”, though the meaning of this term does not seem to be well established.] What is wrong with this? But suppose that the set of points of jump is “discrete” in the Euclidean topology, then the definition is valid (apart from our convention of right continuity).

4. For a general increasing function F there is a similar decomposition F = F_c + F_d, where both F_c and F_d are increasing, F_c is continuous, and F_d is “purely jumping”. [HINT: Let a be a point of continuity, put F_d(a) = F(a), add jumps in (a, ∞) and subtract jumps in (−∞, a) to define F_d. Cf. Exercise 2 in Sec. 1.1.]

5. Theorem 1.2.2 can be generalized to any bounded increasing function. More generally, let f be the difference of two bounded increasing functions on (−∞, +∞); such a function is said to be of bounded variation there. Define its purely discontinuous and continuous parts and prove the corresponding decomposition theorem.

*6. A point x is said to belong to the support of the d.f. F iff for every ε > 0 we have F(x+ε) − F(x−ε) > 0. The set of all such x is called the support of F. Show that each point of jump belongs to the support, and that each isolated point of the support is a point of jump. Give an example of a discrete d.f. whose support is the whole line.

7. Prove that the support of any d.f. is a closed set, and the support of any continuous d.f. is a perfect set.

1.3 Absolutely continuous and singular distributions

Further analysis of d.f.'s requires the theory of Lebesgue measure. Throughout the book this measure will be denoted by m; “almost everywhere” on the real line without qualification will refer to it and be abbreviated to “a.e.”; an integral written in the form ∫ … dt is a Lebesgue integral; a function f is said to be “integrable” in (a, b) iff

∫_a^b f(t) dt

is defined and finite [this entails, of course, that f be Lebesgue measurable]. The class of such functions will be denoted by L^1(a, b), and L^1(−∞, ∞) is abbreviated to L^1. The complement of a subset S of an understood “space” such as (−∞, +∞) will be denoted by S^c.

DEFINITION. A function F is called absolutely continuous [in (−∞, ∞) and with respect to the Lebesgue measure] iff there exists a function f in L^1 such that we have for every x < x′:

(1)  F(x′) − F(x) = ∫_x^{x′} f(t) dt.

It follows from a well-known proposition (see, e.g., Natanson [3]*) that such a function F has a derivative equal to f a.e. In particular, if F is a d.f., then

(2)  f ≥ 0 a.e.  and  ∫_{−∞}^{∞} f(t) dt = 1.

Conversely, given any f in L^1 satisfying the conditions in (2), the function F defined by

(3)  ∀x: F(x) = ∫_{−∞}^x f(t) dt

is easily seen to be a d.f. that is absolutely continuous.
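For instance, taking for f the standard exponential density — an illustrative choice, not one made in the text — the d.f. in (3) can be approximated by a crude Riemann sum; a minimal sketch:

```python
import math

def density(t):
    """f(t) = e^{-t} for t >= 0, 0 otherwise; integrates to 1."""
    return math.exp(-t) if t >= 0 else 0.0

def F(x, steps=10_000):
    """Approximate F(x) = integral of f over (-infinity, x] (here the
    density vanishes below 0) by a midpoint Riemann sum."""
    if x <= 0:
        return 0.0
    h = x / steps
    return sum(density((k + 0.5) * h) for k in range(steps)) * h

# F is increasing, continuous, with F(-inf) = 0 and F(+inf) = 1:
print(F(1.0), 1 - math.exp(-1.0))   # both ~ 0.6321
```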

DEFINITION. A function F is called singular iff it is not identically zero and F′ (exists and) equals zero a.e.
* Numbers in brackets refer to the General Bibliography.
The next theorem summarizes some basic facts of real function theory; see, e.g., Natanson [3].

Theorem 1.3.1. Let F be bounded increasing with F(−∞) = 0, and let F′ denote its derivative wherever existing. Then the following assertions are true.

(a) If S denotes the set of all x for which F′(x) exists with 0 ≤ F′(x) < ∞, then m(S^c) = 0.

(b) This F′ belongs to L^1, and we have for every x < x′:

(4)  ∫_x^{x′} F′(t) dt ≤ F(x′) − F(x).

(c) If we put

(5)  ∀x: F_ac(x) = ∫_{−∞}^x F′(t) dt,   F_s(x) = F(x) − F_ac(x),

then F′_ac = F′ a.e., so that F′_s = F′ − F′_ac = 0 a.e. and consequently F_s is singular if it is not identically zero.

DEFINITION. Any positive function f that is equal to F′ a.e. is called a density of F. F_ac is called the absolutely continuous part, F_s the singular part of F. Note that the previous F_d is part of F_s as defined here.

It is clear that F_ac is increasing and F_ac ≤ F. From (4) it follows that if x < x′,

F_s(x′) − F_s(x) = F(x′) − F(x) − ∫_x^{x′} f(t) dt ≥ 0.

Hence F_s is also increasing and F_s ≤ F. We are now in a position to announce the following result, which is a refinement of Theorem 1.2.3.

Theorem 1.3.2. Every d.f. F can be written as the convex combination of a discrete, a singular continuous, and an absolutely continuous d.f. Such a decomposition is unique.

EXERCISES

1. A d.f. F is singular if and only if F = F_s; it is absolutely continuous if and only if F ≡ F_ac.

2. Prove Theorem 1.3.2.

*3. If the support of a d.f. (see Exercise 6 of Sec. 1.2) is of measure zero, then F is singular. The converse is false.

*4. Suppose that F is a d.f. and (3) holds with a continuous f. Then F′ = f ≥ 0 everywhere.

5. Under the conditions in the preceding exercise, the support of F is the closure of the set {t | f(t) > 0}; the complement of the support is the interior of the set {t | f(t) = 0}.

6. Prove that a discrete distribution is singular. [Cf. Exercise 13 of Sec. 2.2.]

7. Prove that a singular function as defined here is (Lebesgue) measurable but need not be of bounded variation even locally. [HINT: Such a function is continuous except on a set of Lebesgue measure zero; use the completeness of the Lebesgue measure.]

The remainder of this section is devoted to the construction of a singular continuous distribution. For this purpose let us recall the construction of the Cantor (ternary) set (see, e.g., Natanson [3]). From the closed interval [0, 1], the “middle third” open interval (1/3, 2/3) is removed; from each of the two remaining disjoint closed intervals the middle third, (1/9, 2/9) and (7/9, 8/9), respectively, are removed and so on. After n steps, we have removed

1 + 2 + ⋯ + 2^{n−1} = 2^n − 1

disjoint open intervals and are left with 2^n disjoint closed intervals each of length 1/3^n. Let these removed ones, in order of position from left to right, be denoted by J_{n,k}, 1 ≤ k ≤ 2^n − 1, and their union by U_n. We have

m(U_n) = 1/3 + 2/3^2 + 4/3^3 + ⋯ + 2^{n−1}/3^n = 1 − (2/3)^n.

As n ↑ ∞, U_n increases to an open set U; the complement C of U with respect to [0, 1] is a perfect set, called the Cantor set. It is of measure zero since

m(C) = 1 − m(U) = 1 − 1 = 0.

Now for each n and k, n ≥ 1, 1 ≤ k ≤ 2^n − 1, we put

c_{n,k} = k/2^n,

and define a function F on U as follows:

(7)  F(x) = c_{n,k} for x ∈ J_{n,k}.

This definition is consistent since two intervals, J_{n,k} and J_{n′,k′}, are either disjoint or identical, and in the latter case so are c_{n,k} = c_{n′,k′}. The last assertion becomes obvious if we proceed step by step and observe that

J_{n+1,2k} = J_{n,k},   c_{n+1,2k} = c_{n,k}   for 1 ≤ k ≤ 2^n − 1.

The value of F is constant on each J_{n,k} and is strictly greater on any other J_{n′,k′} situated to the right of J_{n,k}. Thus F is increasing and clearly we have

lim_{x↓0} F(x) = 0,   lim_{x↑1} F(x) = 1.

Let us complete the definition of F by setting

F(x) = 0 for x ≤ 0,   F(x) = 1 for x ≥ 1.

F is now defined on the domain D = (−∞, 0) ∪ U ∪ (1, ∞) and increasing there. Since each J_{n,k} is at a distance ≥ 1/3^n from any distinct J_{n,k′} and the total variation of F over each of the 2^n disjoint intervals that remain after removing J_{n,k}, 1 ≤ k ≤ 2^n − 1, is 1/2^n, it follows that

0 ≤ x′ − x ≤ 1/3^n ⇒ 0 ≤ F(x′) − F(x) ≤ 1/2^n.

Hence F is uniformly continuous on D. By Exercise 5 of Sec. 1.1, there exists a continuous increasing F̃ on (−∞, +∞) that coincides with F on D. This F̃ is a continuous d.f. that is constant on each J_{n,k}. It follows that F̃′ = 0 on U and so also on (−∞, +∞) − C. Thus F̃ is singular. Alternatively, it is clear that none of the points in D is in the support of F̃; hence the latter is contained in C and of measure 0, so that F̃ is singular by Exercise 3 above. [In Exercise 13 of Sec. 2.2, it will become obvious that the measure corresponding to F̃ is singular because there is no mass in U.]
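The ternary-digit description of F̃ (cf. Exercise 9 below) yields a direct numerical procedure; the following sketch (an added illustration, with an arbitrary truncation depth) evaluates the Cantor d.f.:

```python
def cantor_F(x, depth=40):
    """Approximate the Cantor d.f. on [0, 1] via ternary digits: a base-3
    digit 2 contributes a binary digit 1, a digit 0 contributes 0, and the
    first digit 1 (x lies in a removed middle third) contributes 1 and
    ends the scan."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    value, scale = 0.0, 0.5
    for _ in range(depth):
        x *= 3
        digit = int(x)
        x -= digit
        if digit == 1:                  # inside some J_{n,k}
            value += scale
            break
        value += scale * (digit // 2)
        scale /= 2
    return value

# F is constant (= 1/2) on the removed interval (1/3, 2/3),
# and F(1/4) = 1/3 since 1/4 = 0.0202... in base 3:
print(cantor_F(0.4), cantor_F(0.5), cantor_F(0.6))   # all 0.5
print(cantor_F(0.25))                                # ~ 1/3
```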

EXERCISES

The F in these exercises is the F̃ defined above.

8. Prove that the support of F is exactly C.

*9. It is well known that any point x in C has a ternary expansion without the digit 1:

x = Σ_{n=1}^∞ a_n/3^n,   a_n = 0 or 2.

Prove that for this x we have

F(x) = Σ_{n=1}^∞ a_n/2^{n+1}.

10. For each x ∈ [0, 1], we have

2F(x/3) = F(x),   2F(2/3 + x/3) − 1 = F(x).

11. Calculate

∫_0^1 x dF(x),   ∫_0^1 x^2 dF(x),   ∫_0^1 e^{itx} dF(x).

[HINT: This can be done directly or by using Exercise 10; for a third method see Exercise 9 of Sec. 5.3.]

12. Extend the function F on [0, 1] trivially to (−∞, ∞). Let {r_n} be an enumeration of the rationals and

G(x) = Σ_{n=1}^∞ (1/2^n) F(r_n + x).

Show that G is a d.f. that is strictly increasing for all x and singular. Thus we have a singular d.f. with support (−∞, ∞).

*13. Consider F on [0, 1]. Modify its inverse F^{−1} suitably to make it single-valued in [0, 1]. Show that F^{−1} so modified is a discrete d.f. and find its points of jump and their sizes.

14. Given any closed set C in (−∞, +∞), there exists a d.f. whose support is exactly C. [HINT: Such a problem becomes easier when the corresponding measure is considered; see Sec. 2.2 below.]

*15. The Cantor d.f. F is a good building block of “pathological” examples. For example, let H be the inverse of the homeomorphic map of [0, 1] onto itself: x → ½[F(x) + x]; and E a subset of [0, 1] which is not Lebesgue measurable. Show that

1_{H(E)} ∘ H = 1_E,

where H(E) is the image of E, 1_B is the indicator function of B, and ∘ denotes the composition of functions. Hence deduce: (1) a Lebesgue measurable function of a strictly increasing and continuous function need not be Lebesgue measurable; (2) there exists a Lebesgue measurable function that is not Borel measurable.
2 Measure theory

2.1 Classes of sets

Let Ω be an “abstract space”, namely a nonempty set of elements to be called “points” and denoted generically by ω. Some of the usual operations and relations between sets, together with the usual notation, are given below.

Union: E ∪ F, ⋃_n E_n
Intersection: E ∩ F, ⋂_n E_n
Complement: E^c = Ω \ E
Difference: E \ F = E ∩ F^c
Symmetric difference: E Δ F = (E \ F) ∪ (F \ E)
Singleton: {ω}

Containing (for subsets of Ω as well as for collections thereof):

E ⊂ F, F ⊃ E (not excluding E = F)
A ⊂ B, B ⊃ A (not excluding A = B)

Belonging (for elements as well as for sets):

ω ∈ E, E ∈ A

Empty set: ∅
The reader is supposed to be familiar with the elementary properties of these operations.

A nonempty collection A of subsets of Ω may have certain “closure properties”. Let us list some of those used below; note that j is always an index for a countable set and that commas as well as semicolons are used to denote “conjunctions” of premises.

(i) E ∈ A ⇒ E^c ∈ A.
(ii) E_1 ∈ A, E_2 ∈ A ⇒ E_1 ∪ E_2 ∈ A.
(iii) E_1 ∈ A, E_2 ∈ A ⇒ E_1 ∩ E_2 ∈ A.
(iv) ∀n ≥ 2: E_j ∈ A, 1 ≤ j ≤ n ⇒ ⋃_{j=1}^n E_j ∈ A.
(v) ∀n ≥ 2: E_j ∈ A, 1 ≤ j ≤ n ⇒ ⋂_{j=1}^n E_j ∈ A.
(vi) E_j ∈ A; E_j ⊂ E_{j+1}, 1 ≤ j < ∞ ⇒ ⋃_{j=1}^∞ E_j ∈ A.
(vii) E_j ∈ A; E_j ⊃ E_{j+1}, 1 ≤ j < ∞ ⇒ ⋂_{j=1}^∞ E_j ∈ A.
(viii) E_j ∈ A, 1 ≤ j < ∞ ⇒ ⋃_{j=1}^∞ E_j ∈ A.
(ix) E_j ∈ A, 1 ≤ j < ∞ ⇒ ⋂_{j=1}^∞ E_j ∈ A.
(x) E_1 ∈ A, E_2 ∈ A, E_1 ⊂ E_2 ⇒ E_2 \ E_1 ∈ A.

It follows from simple set algebra that under (i): (ii) and (iii) are equivalent; (vi) and (vii) are equivalent; (viii) and (ix) are equivalent. Also, (ii) implies (iv) and (iii) implies (v) by induction. It is trivial that (viii) implies (ii) and (vi); (ix) implies (iii) and (vii).

DEFINITION. A nonempty collection F of subsets of Ω is called a field iff (i) and (ii) hold. It is called a monotone class (M.C.) iff (vi) and (vii) hold. It is called a Borel field (B.F.) iff (i) and (viii) hold.

Theorem 2.1.1. A field is a B.F. if and only if it is also an M.C.

PROOF. The “only if” part is trivial; to prove the “if” part we show that (iv) and (vi) imply (viii). Let E_j ∈ A for 1 ≤ j < ∞; then

F_n = ⋃_{j=1}^n E_j ∈ A

by (iv), which holds in a field, F_n ⊂ F_{n+1} and

⋃_{j=1}^∞ E_j = ⋃_{j=1}^∞ F_j;

hence ⋃_{j=1}^∞ E_j ∈ A by (vi).

The collection S of all subsets of Ω is a B.F. called the total B.F.; the collection of the two sets {∅, Ω} is a B.F. called the trivial B.F. If A is any index set and if for every α ∈ A, F_α is a B.F. (or M.C.), then the intersection ⋂_{α∈A} F_α of all these B.F.'s (or M.C.'s), namely the collection of sets each of which belongs to all F_α, is also a B.F. (or M.C.). Given any nonempty collection C of sets, there is a minimal B.F. (or field, or M.C.) containing it; this is just the intersection of all B.F.'s (or fields, or M.C.'s) containing C, of which there is at least one, namely the S mentioned above. This minimal B.F. (or field, or M.C.) is also said to be generated by C. In particular if F_0 is a field there is a minimal B.F. (or M.C.) containing F_0.
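For a finite Ω the minimal field containing a given collection can be computed by brute force, by closing under complement and pairwise union until nothing new appears (for finite Ω a field is automatically a B.F.). The sketch below is an added illustration, with Ω and the generating sets chosen arbitrarily.

```python
def generated_field(omega, collection):
    """Return the minimal field of subsets of omega containing the given
    collection, by iterated closure under complement and union."""
    sets = {frozenset(s) for s in collection} | {frozenset(), frozenset(omega)}
    while True:
        new = {frozenset(omega) - s for s in sets}    # complements
        new |= {s | t for s in sets for t in sets}    # pairwise unions
        if new <= sets:                               # fixpoint reached
            return sets
        sets |= new

omega = {1, 2, 3, 4}
field = generated_field(omega, [{1}, {1, 2}])
print(len(field))   # 8: the field generated by the atoms {1}, {2}, {3, 4}
```

Intersections need not be handled separately: closure under complement and union yields them by the De Morgan identities used in the proof below.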

Theorem 2.1.2. Let F_0 be a field, G the minimal M.C. containing F_0, F the minimal B.F. containing F_0; then F = G.

PROOF. Since a B.F. is an M.C., we have F ⊃ G. To prove F ⊂ G it is sufficient to show that G is a B.F. Hence by Theorem 2.1.1 it is sufficient to show that G is a field. We shall show that it is closed under intersection and complementation. Define two classes of subsets of G as follows:

C_1 = {E ∈ G : E ∩ F ∈ G for all F ∈ F_0},
C_2 = {E ∈ G : E ∩ F ∈ G for all F ∈ G}.

The identities

F ∩ (⋃_{j=1}^∞ E_j) = ⋃_{j=1}^∞ (F ∩ E_j),
F ∩ (⋂_{j=1}^∞ E_j) = ⋂_{j=1}^∞ (F ∩ E_j)

show that both C_1 and C_2 are M.C.'s. Since F_0 is closed under intersection and contained in G, it is clear that F_0 ⊂ C_1. Hence G ⊂ C_1 by the minimality of G, and so G = C_1. This means for any F ∈ F_0 and E ∈ G we have F ∩ E ∈ G, which in turn means F_0 ⊂ C_2. Hence G = C_2 and this means G is closed under intersection.

Next, define another class of subsets of G as follows:

C_3 = {E ∈ G : E^c ∈ G}.

The (DeMorgan) identities

(⋃_{j=1}^∞ E_j)^c = ⋂_{j=1}^∞ E_j^c,
(⋂_{j=1}^∞ E_j)^c = ⋃_{j=1}^∞ E_j^c

show that C_3 is an M.C. Since F_0 ⊂ C_3, it follows as before that G = C_3, which means G is closed under complementation. The proof is complete.

Corollary. Let F_0 be a field, F the minimal B.F. containing F_0; C a class of sets containing F_0 and having the closure properties (vi) and (vii); then C contains F.

The theorem above is one of a type called monotone class theorems. They are among the most useful tools of measure theory, and serve to extend certain relations which are easily verified for a special class of sets or functions to a larger class. Many versions of such theorems are known; see Exercises 10, 11, and 12 below.

EXERCISES

*1. (⋃_j A_j) \ (⋃_j B_j) ⊂ ⋃_j (A_j \ B_j); (⋂_j A_j) \ (⋂_j B_j) ⊂ ⋃_j (A_j \ B_j). When is there equality?

*2. The best way to define the symmetric difference is through indicators of sets as follows:

1_{A Δ B} = 1_A + 1_B (mod 2),

where we have arithmetical addition modulo 2 on the right side. All properties of Δ follow easily from this definition, some of which are rather tedious to verify otherwise. As examples:

(A Δ B) Δ C = A Δ (B Δ C),
(A Δ B) Δ (B Δ C) = A Δ C,
(A Δ B) Δ (C Δ D) = (A Δ C) Δ (B Δ D),
A Δ B = C ⟺ A = B Δ C,
A Δ B = C Δ D ⟺ A Δ C = B Δ D.

3. If Ω has exactly n points, then S has 2^n members. The B.F. generated by n given sets “without relations among them” has 2^{2^n} members.

4. If Ω is countable, then S is generated by the singletons, and conversely. [HINT: All countable subsets of Ω and their complements form a B.F.]

5. The intersection of any collection of B.F.'s {F_α, α ∈ A} is the maximal B.F. contained in all of them; it is indifferently denoted by ⋂_{α∈A} F_α or ⋀_{α∈A} F_α.

*6. The union of a countable collection of B.F.'s {F_j} such that F_j ⊂ F_{j+1} need not be a B.F., but there is a minimal B.F. containing all of them, denoted by ⋁_j F_j. In general ⋁_{α∈A} F_α denotes the minimal B.F. containing all F_α, α ∈ A. [HINT: Ω = the set of positive integers; F_j = the B.F. generated by those up to j.]

7. A B.F. is said to be countably generated iff it is generated by a countable collection of sets. Prove that if each F_j is countably generated, then so is ⋁_{j=1}^∞ F_j.

*8. Let F be a B.F. generated by an arbitrary collection of sets {E_α, α ∈ A}. Prove that for each E ∈ F, there exists a countable subcollection {E_{α_j}, j ≥ 1} (depending on E) such that E belongs already to the B.F. generated by this subcollection. [HINT: Consider the class of all sets with the asserted property and show that it is a B.F. containing each E_α.]

9. If F is a B.F. generated by a countable collection of disjoint sets {Λ_n} such that ⋃_n Λ_n = Ω, then each member of F is just the union of a countable subcollection of these Λ_n's.

10. Let D be a class of subsets of Ω having the closure property (iii); let A be a class of sets containing Ω as well as D, and having the closure properties (vi) and (x). Then A contains the B.F. generated by D. (This is Dynkin's form of a monotone class theorem which is expedient for certain applications. The proof proceeds as in Theorem 2.1.2 by replacing F_0 and G with D and A respectively.)

11. Take Ω = R^n or a separable metric space in Exercise 10 and let D be the class of all open sets. Let H be a class of real-valued functions on Ω satisfying the following conditions.

(a) 1 ∈ H and 1_D ∈ H for each D ∈ D;
(b) H is a vector space, namely: if f_1 ∈ H, f_2 ∈ H and c_1, c_2 are any two real constants, then c_1 f_1 + c_2 f_2 ∈ H;
(c) H is closed with respect to increasing limits of positive functions, namely: if f_n ∈ H, 0 ≤ f_n ≤ f_{n+1} for all n, and f = lim_n ↑ f_n < ∞, then f ∈ H.

Then H contains all Borel measurable functions on Ω, namely all finite-valued functions measurable with respect to the topological Borel field (= the minimal B.F. containing all open sets of Ω). [HINT: Let C = {E ⊂ Ω : 1_E ∈ H}; apply Exercise 10 to show that C contains the B.F. just defined. Each positive Borel measurable function is the limit of an increasing sequence of simple (finitely-valued) functions.]

12. Let C be an M.C. of subsets of R^n (or a separable metric space) containing all the open sets and closed sets. Prove that C ⊃ B^n (the topological Borel field defined in Exercise 11). [HINT: Show that the minimal such class is a field.]

2.2 Probability measures and their distribution functions

Let Ω be a space, F a B.F. of subsets of Ω. A probability measure P(·) on F is a numerically valued set function with domain F, satisfying the following axioms:

(i) ∀E ∈ F : P(E) ≥ 0.
(ii) If {E_j} is a countable collection of (pairwise) disjoint sets in F, then

P(⋃_j E_j) = Σ_j P(E_j).

(iii) P(Ω) = 1.

The abbreviation “p.m.” will be used for “probability measure”.

These axioms imply the following consequences, where all sets are members of F.

(iv) P(E) ≤ 1.
(v) P(∅) = 0.
(vi) P(E^c) = 1 − P(E).
(vii) P(E ∪ F) + P(E ∩ F) = P(E) + P(F).
(viii) E ⊂ F ⇒ P(E) = P(F) − P(F \ E) ≤ P(F).
(ix) Monotone property. E_n ↑ E or E_n ↓ E ⇒ P(E_n) → P(E).
(x) Boole's inequality. P(⋃_j E_j) ≤ Σ_j P(E_j).
Axiom (ii) is called “countable additivity”; the corresponding axiom restricted to a finite collection {E_j} is called “finite additivity”.

The following proposition

(1)  E_n ↓ ∅ ⇒ P(E_n) → 0

is called the “axiom of continuity”. It is a particular case of the monotone property (ix) above, which may be deduced from it or proved in the same way as indicated below.

Theorem 2.2.1. The axioms of finite additivity and of continuity together are equivalent to the axiom of countable additivity.

PROOF. Let E_n ↓. We have the obvious identity:

E_n = ⋃_{k=n}^∞ (E_k \ E_{k+1}) ∪ ⋂_{k=1}^∞ E_k.

If E_n ↓ ∅, the last term is the empty set. Hence if (ii) is assumed, we have

∀n ≥ 1: P(E_n) = Σ_{k=n}^∞ P(E_k \ E_{k+1});

the series being convergent, we have lim_{n→∞} P(E_n) = 0. Hence (1) is true.

Conversely, let {E_k, k ≥ 1} be pairwise disjoint; then

⋃_{k=n+1}^∞ E_k ↓ ∅

(why?) and consequently, if (1) is true, then

lim_{n→∞} P(⋃_{k=n+1}^∞ E_k) = 0.

Now if finite additivity is assumed, we have

P(⋃_{k=1}^∞ E_k) = P(⋃_{k=1}^n E_k) + P(⋃_{k=n+1}^∞ E_k) = Σ_{k=1}^n P(E_k) + P(⋃_{k=n+1}^∞ E_k).

This shows that the infinite series Σ_{k=1}^∞ P(E_k) converges, as it is bounded by the first member above. Letting n → ∞, we obtain

P(⋃_{k=1}^∞ E_k) = lim_{n→∞} Σ_{k=1}^n P(E_k) + lim_{n→∞} P(⋃_{k=n+1}^∞ E_k) = Σ_{k=1}^∞ P(E_k).

Hence (ii) is true.

Remark. For a later application (Theorem 3.3.4) we note the following extension. Let P be defined on a field F which is finitely additive and satisfies axioms (i), (iii), and (1). Then (ii) holds whenever ⋃_k E_k ∈ F. For then ⋃_{k=n+1}^∞ E_k also belongs to F, and the second part of the proof above remains valid.

The triple (Ω, F, P) is called a probability space (triple); Ω alone is called the sample space, and ω is then a sample point.

Let Δ ⊂ Ω; then the trace of the B.F. F on Δ is the collection of all sets of the form Δ ∩ F, where F ∈ F. It is easy to see that this is a B.F. of subsets of Δ, and we shall denote it by Δ ∩ F. Suppose Δ ∈ F and P(Δ) > 0; then we may define the set function P_Δ on Δ ∩ F as follows:

∀E ∈ Δ ∩ F : P_Δ(E) = P(E) / P(Δ).

It is easy to see that P_Δ is a p.m. on Δ ∩ F. The triple (Δ, Δ ∩ F, P_Δ) will be called the trace of (Ω, F, P) on Δ.

Example 1. Let $\Omega$ be a countable set: $\Omega = \{\omega_j, j \in J\}$, where $J$ is a countable index set, and let $\mathscr{F}$ be the total B.F. of $\Omega$. Choose any sequence of numbers $\{p_j, j \in J\}$ satisfying
$$\forall j \in J\colon\ p_j \ge 0; \qquad \sum_{j \in J} p_j = 1; \tag{2}$$
and define a set function $\mathscr{P}$ on $\mathscr{F}$ as follows:
$$\forall E \in \mathscr{F}\colon\quad \mathscr{P}(E) = \sum_{\omega_j \in E} p_j. \tag{3}$$
In words, we assign $p_j$ as the value of the "probability" of the singleton $\{\omega_j\}$, and for an arbitrary set of $\omega_j$'s we assign as its probability the sum of all the probabilities assigned to its elements. Clearly axioms (i), (ii), and (iii) are satisfied. Hence $\mathscr{P}$ so defined is a p.m.

Conversely, let any such $\mathscr{P}$ be given on $\mathscr{F}$. Since $\{\omega_j\} \in \mathscr{F}$ for every $j$, $\mathscr{P}(\{\omega_j\})$ is defined; let its value be $p_j$. Then (2) is satisfied. We have thus exhibited all the possible p.m.'s on $\Omega$, or rather on the pair $(\Omega, \mathscr{S})$; this will be called a discrete sample space. The entire first volume of Feller's well-known book [13] with all its rich content is based on just such spaces.
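For readers who like to experiment, the discrete case is easy to realize on a machine. The following is a minimal sketch (in Python; the three-point space and its weights are illustrative choices, not taken from the text) of the assignment (3), with axioms (ii) and (iii) checked on a small example.

from fractions import Fraction

# A discrete sample space: weights p_j >= 0 with sum 1, as in (2).
p = {"w1": Fraction(1, 2), "w2": Fraction(1, 3), "w3": Fraction(1, 6)}
assert all(q >= 0 for q in p.values()) and sum(p.values()) == 1

def P(E):
    # The p.m. of (3): the probability of E is the sum of the weights
    # of the points belonging to E.
    return sum(p[w] for w in E)

assert P(set(p)) == 1                      # axiom (iii)
A, B = {"w1"}, {"w2", "w3"}                # disjoint sets
assert P(A | B) == P(A) + P(B)             # additivity, axiom (ii)
print(P({"w1", "w3"}))                     # 2/3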

Example 2. Let $\mathscr{U} = (0, 1]$, $\mathscr{C}$ the collection of intervals:
$$\mathscr{C} = \{(a, b]\colon 0 < a < b \le 1\};$$
$\mathscr{B}$ the minimal B.F. containing $\mathscr{C}$, $m$ the Borel–Lebesgue measure on $\mathscr{B}$. Then $(\mathscr{U}, \mathscr{B}, m)$ is a probability space.

Let $\mathscr{B}_0$ be the collection of subsets of $\mathscr{U}$ each of which is the union of a finite number of members of $\mathscr{C}$. Thus a typical set $B$ in $\mathscr{B}_0$ is of the form
$$B = \bigcup_{j=1}^{n} (a_j, b_j] \quad\text{where } a_1 < b_1 < a_2 < b_2 < \cdots < a_n < b_n.$$
It is easily seen that $\mathscr{B}_0$ is a field that is generated by $\mathscr{C}$ and in turn generates $\mathscr{B}$. If we take $\mathscr{U} = [0, 1]$ instead, then $\mathscr{B}_0$ is no longer a field since $\mathscr{U} \notin \mathscr{B}_0$, but $\mathscr{B}$ and $m$ may be defined as before. The new $\mathscr{B}$ is generated by the old $\mathscr{B}$ and the singleton $\{0\}$.

Example 3. Let $R^1 = (-\infty, +\infty)$, $\mathscr{C}$ the collection of intervals of the form $(a, b]$, $-\infty < a < b < +\infty$. The field $\mathscr{B}_0$ generated by $\mathscr{C}$ consists of finite unions of disjoint sets of the form $(a, b]$, $(-\infty, a]$ or $(b, \infty)$. The Euclidean B.F. $\mathscr{B}^1$ on $R^1$ is the B.F. generated by $\mathscr{C}$ or $\mathscr{B}_0$. A set in $\mathscr{B}^1$ will be called a (linear) Borel set when there is no danger of ambiguity. However, the Borel–Lebesgue measure $m$ on $R^1$ is not a p.m.; indeed $m(R^1) = +\infty$, so that $m$ is not a finite measure but it is $\sigma$-finite on $\mathscr{B}_0$, namely: there exists a sequence of sets $E_n \in \mathscr{B}_0$, $E_n \uparrow R^1$ with $m(E_n) < \infty$ for each $n$.

EXERCISES

1. For any countably infinite set $\Omega$, the collection of its finite subsets and their complements forms a field $\mathscr{F}$. If we define $\mathscr{P}(E)$ on $\mathscr{F}$ to be 0 or 1 according as $E$ is finite or not, then $\mathscr{P}$ is finitely additive but not countably so.
*2. Let $\Omega$ be the space of natural numbers. For each $E \subset \Omega$ let $N_n(E)$ be the cardinality of the set $E \cap [0, n]$ and let $\mathscr{C}$ be the collection of $E$'s for which the following limit exists:
$$\mathscr{P}(E) = \lim_{n \to \infty} \frac{N_n(E)}{n}.$$
$\mathscr{P}$ is finitely additive on $\mathscr{C}$ and is called the "asymptotic density" of $E$. Let $E = \{\text{all odd integers}\}$, $F = \{\text{all odd integers in } [2^{2n}, 2^{2n+1}] \text{ and all even integers in } [2^{2n+1}, 2^{2n+2}] \text{ for } n \ge 0\}$. Show that $E \in \mathscr{C}$, $F \in \mathscr{C}$, but $E \cap F \notin \mathscr{C}$. Hence $\mathscr{C}$ is not a field.
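The phenomenon in Exercise 2 can be observed numerically. A small sketch (in Python; half-open dyadic blocks and the particular cutoffs are choices made here for definiteness): the relative frequency of $E \cap F$ oscillates between roughly $1/6$ and $1/3$ along the dyadic cutoffs, while those of $E$ and $F$ settle near $1/2$.

def in_F(k):
    # F of Exercise 2: odd integers in blocks [2^(2n), 2^(2n+1)) and even
    # integers in blocks [2^(2n+1), 2^(2n+2)); half-open blocks are used
    # here for definiteness.
    m = k.bit_length() - 1                 # k lies in [2^m, 2^(m+1))
    return k % 2 == 1 if m % 2 == 0 else k % 2 == 0

def density(indicator, n):
    # N_n / n, the relative frequency up to n.
    return sum(indicator(k) for k in range(1, n + 1)) / n

for n in (2**10, 2**11, 2**12, 2**13):
    print(n,
          round(density(lambda k: k % 2 == 1, n), 3),               # E
          round(density(in_F, n), 3),                               # F
          round(density(lambda k: k % 2 == 1 and in_F(k), n), 3))   # E cap F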
3. In the preceding example show that for each real number $\alpha$ in $[0, 1]$ there is an $E$ in $\mathscr{C}$ such that $\mathscr{P}(E) = \alpha$. Is the set of all primes in $\mathscr{C}$? Give an example of $E$ that is not in $\mathscr{C}$.
4. Prove the nonexistence of a p.m. on $(\Omega, \mathscr{S})$, where $(\Omega, \mathscr{S})$ is as in Example 1, such that the probability of each singleton has the same value. Hence criticize a sentence such as: "Choose an integer at random".
5. Prove that the trace of a B.F. $\mathscr{F}$ on any subset $\Delta$ of $\Omega$ is a B.F. Prove that the trace of $(\Omega, \mathscr{F}, \mathscr{P})$ on any $\Delta$ in $\mathscr{F}$ is a probability space, if $\mathscr{P}(\Delta) > 0$.
*6. Now let $\Delta \notin \mathscr{F}$ be such that
$$\Delta \subset F \in \mathscr{F} \;\Rightarrow\; \mathscr{P}(F) = 1.$$
Such a set is called thick in $(\Omega, \mathscr{F}, \mathscr{P})$. If $E = \Delta \cap F$, $F \in \mathscr{F}$, define $\mathscr{P}^*(E) = \mathscr{P}(F)$. Then $\mathscr{P}^*$ is a well-defined (what does it mean?) p.m. on $(\Delta, \Delta \cap \mathscr{F})$. This procedure is called the adjunction of $\Delta$ to $(\Omega, \mathscr{F}, \mathscr{P})$.
7. The B.F. $\mathscr{B}^1$ on $R^1$ is also generated by the class of all open intervals, or all closed intervals, or all half-lines of the form $(-\infty, a]$ or $(a, \infty)$, or these intervals with rational endpoints. But it is not generated by all the singletons of $R^1$ nor by any finite collection of subsets of $R^1$.
8. $\mathscr{B}^1$ contains every singleton, countable set, open set, closed set, $G_\delta$ set, $F_\sigma$ set. (For the last two kinds of sets see, e.g., Natanson [3].)
*9. Let $\mathscr{C}$ be a countable collection of pairwise disjoint subsets $\{E_j, j \ge 1\}$ of $R^1$, and let $\mathscr{F}$ be the B.F. generated by $\mathscr{C}$. Determine the most general p.m. on $\mathscr{F}$ and show that the resulting probability space is "isomorphic" to that discussed in Example 1.
10. Instead of requiring that the $E_j$'s be pairwise disjoint, we may make the broader assumption that each of them intersects only a finite number in the collection. Carry through the rest of the problem.

The question of probability measures on $\mathscr{B}^1$ is closely related to the theory of distribution functions studied in Chapter 1. There is in fact a one-to-one correspondence between the set functions on the one hand, and the point functions on the other. Both points of view are useful in probability theory. We establish first the easier half of this correspondence.

Lemma. Each p.m. $\mu$ on $\mathscr{B}^1$ determines a d.f. $F$ through the correspondence
$$\forall x \in R^1\colon\quad \mu((-\infty, x]) = F(x). \tag{4}$$
As a consequence, we have for $-\infty < a < b < +\infty$:
$$\begin{aligned}
\mu((a, b]) &= F(b) - F(a),\\
\mu((a, b)) &= F(b-) - F(a),\\
\mu([a, b)) &= F(b-) - F(a-),\\
\mu([a, b]) &= F(b) - F(a-).
\end{aligned} \tag{5}$$
Furthermore, let $D$ be any dense subset of $R^1$; then the correspondence is already determined by that in (4) restricted to $x \in D$, or by any of the four relations in (5) when $a$ and $b$ are both restricted to $D$.

PROOF. Let us write
$$\forall x \in R^1\colon\quad I_x = (-\infty, x].$$
Then $I_x \in \mathscr{B}^1$ so that $\mu(I_x)$ is defined; call it $F(x)$ and so define the function $F$ on $R^1$. We shall show that $F$ is a d.f. as defined in Chapter 1. First of all, $F$ is increasing by property (viii) of the measure. Next, if $x_n \downarrow x$, then $I_{x_n} \downarrow I_x$; hence we have by (ix)
$$F(x_n) = \mu(I_{x_n}) \downarrow \mu(I_x) = F(x). \tag{6}$$
Hence $F$ is right continuous. [The reader should ascertain what changes should be made if we had defined $F$ to be left continuous.] Similarly, as $x \downarrow -\infty$, $I_x \downarrow \emptyset$; as $x \uparrow +\infty$, $I_x \uparrow R^1$. Hence it follows from (ix) again that
$$\lim_{x \downarrow -\infty} F(x) = \mu(\emptyset) = 0; \qquad \lim_{x \uparrow +\infty} F(x) = \mu(R^1) = 1.$$
This ends the verification that $F$ is a d.f. The relations in (5) follow easily from the following complement to (4):
$$\mu((-\infty, x)) = F(x-).$$
To see this let $x_n < x$ and $x_n \uparrow x$. Since $I_{x_n} \uparrow (-\infty, x)$, we have by (ix):
$$F(x-) = \lim_{n \to \infty} F(x_n) = \lim_{n \to \infty} \mu((-\infty, x_n]) = \mu((-\infty, x)).$$
To prove the last sentence in the lemma we show first that (4) restricted to $x \in D$ implies (4) unrestricted. For this purpose we note that $\mu((-\infty, x])$, as well as $F(x)$, is right continuous as a function of $x$, as shown in (6). Hence the two members of the equation in (4), being both right continuous functions of $x$ and coinciding on a dense set, must coincide everywhere. Now suppose, for example, the second relation in (5) holds for rational $a$ and $b$. For each real $x$ let $a_n$, $b_n$ be rational such that $a_n \downarrow -\infty$ and $b_n > x$, $b_n \downarrow x$. Then $\mu((a_n, b_n)) \to \mu((-\infty, x])$ and $F(b_n-) - F(a_n) \to F(x)$. Hence (4) follows.

Incidentally, the correspondence (4) "justifies" our previous assumption that $F$ be right continuous, but what if we had assumed it to be left continuous?

Now we proceed to the second half of the correspondence.

Theorem 2.2.2. Each d.f. $F$ determines a p.m. $\mu$ on $\mathscr{B}^1$ through any one of the relations given in (5), or alternatively through (4).

This is the classical theory of Lebesgue–Stieltjes measure; see, e.g., Halmos [4] or Royden [5]. However, we shall sketch the basic ideas as an important review. The d.f. $F$ being given, we may define a set function $\mu$ for intervals of the form $(a, b]$ by means of the first relation in (5). Such a function is seen to be countably additive on its domain of definition. (What does this mean?) Now we proceed to extend its domain of definition while preserving this additivity. If $S$ is a countable union of such intervals which are disjoint:
$$S = \bigcup_i (a_i, b_i],$$
we are forced to define $\mu(S)$, if at all, by
$$\mu(S) = \sum_i \mu((a_i, b_i]) = \sum_i \{F(b_i) - F(a_i)\}.$$
But a set $S$ may be representable in the form above in different ways, so we must check that this definition leads to no contradiction: namely that it depends really only on the set $S$ and not on the representation. Next, we notice that any open interval $(a, b)$ is in the extended domain (why?) and indeed the extended definition agrees with the second relation in (5). Now it is well known that any open set $U$ in $R^1$ is the union of a countable collection of disjoint open intervals [there is no exact analogue of this in $R^n$ for $n > 1$], say $U = \bigcup_i (c_i, d_i)$; and this representation is unique. Hence again we are forced to define $\mu(U)$, if at all, by
$$\mu(U) = \sum_i \mu((c_i, d_i)) = \sum_i \{F(d_i-) - F(c_i)\}.$$
Having thus defined the measure for all open sets, we find that its values for all closed sets are thereby also determined by property (vi) of a probability measure. In particular, its value for each singleton $\{a\}$ is determined to be $F(a) - F(a-)$, which is nothing but the jump of $F$ at $a$. Now we also know its value on all countable sets, and so on, all this provided that no contradiction is ever forced on us so far as we have gone. But even with the class of open and closed sets we are still far from the B.F. $\mathscr{B}^1$. The next step will be the $G_\delta$ sets and the $F_\sigma$ sets, and there already the picture is not so clear. Although it has been shown to be possible to proceed this way by transfinite induction, this is a rather difficult task. There is a more efficient way to reach the goal via the notions of outer and inner measures as follows. For any subset $S$ of $R^1$ consider the two numbers:
$$\mu^*(S) = \inf_{U \text{ open},\ U \supset S} \mu(U), \qquad \mu_*(S) = \sup_{C \text{ closed},\ C \subset S} \mu(C).$$
$\mu^*$ is the outer measure, $\mu_*$ the inner measure (both with respect to the given $F$). It is clear that $\mu^*(S) \ge \mu_*(S)$. Equality does not in general hold, but when it does, we call $S$ "measurable" (with respect to $F$). In this case the common value will be denoted by $\mu(S)$. This new definition requires us at once to check that it agrees with the old one for all the sets for which $\mu$ has already been defined. The next task is to prove that: (a) the class of all measurable sets forms a B.F., say $\mathscr{L}$; (b) on this $\mathscr{L}$, the function $\mu$ is a p.m. Details of these proofs are to be found in the references given above. To finish: since $\mathscr{L}$ is a B.F., and it contains all intervals of the form $(a, b]$, it contains the minimal B.F. $\mathscr{B}^1$ with this property. It may be larger than $\mathscr{B}^1$, indeed it is (see below), but this causes no harm, for the restriction of $\mu$ to $\mathscr{B}^1$ is a p.m. whose existence is asserted in Theorem 2.2.2.

Let us mention that the introduction of both the outer and inner measures is useful for approximations. It follows, for example, that for each measurable set $S$ and $\epsilon > 0$, there exists an open set $U$ and a closed set $C$ such that $U \supset S \supset C$ and
$$\mu(U) - \epsilon \le \mu(S) \le \mu(C) + \epsilon. \tag{7}$$

There is an alternative way of defining measurability through the use of the outer measure alone, based on Carathéodory's criterion.

It should also be remarked that the construction described above for $(R^1, \mathscr{B}^1, \mu)$ is that of a "topological measure space", where the B.F. is generated by the open sets of a given topology on $R^1$, here the usual Euclidean one. In the general case of an "algebraic measure space", in which there is no topological structure, the role of the open sets is taken by an arbitrary field $\mathscr{F}_0$, and a measure given on $\mathscr{F}_0$ may be extended to the minimal B.F. $\mathscr{F}$ containing $\mathscr{F}_0$ in a similar way. In the case of $R^1$, such an $\mathscr{F}_0$ is given by the field $\mathscr{B}_0$ of sets, each of which is the union of a finite number of intervals of the form $(a, b]$, $(-\infty, b]$, or $(a, \infty)$, where $a \in R^1$, $b \in R^1$. Indeed the definition of the outer measure given above may be replaced by the equivalent one:
$$\mu^*(E) = \inf \sum_n \mu(U_n), \tag{8}$$
where the infimum is taken over all countable unions $\bigcup_n U_n$ such that each $U_n \in \mathscr{B}_0$ and $\bigcup_n U_n \supset E$. For another case where such a construction is required see Sec. 3.3 below.

There is one more question: besides the $\mu$ discussed above is there any other p.m. $\nu$ that corresponds to the given $F$ in the same way? It is important to realize that this question is not answered by the preceding theorem. It is also worthwhile to remark that any p.m. $\nu$ that is defined on a domain strictly containing $\mathscr{B}^1$ and that coincides with $\mu$ on $\mathscr{B}^1$ (such as the $\mu$ on $\mathscr{L}$ as mentioned above) will certainly correspond to $F$ in the same way, and strictly speaking such a $\nu$ is to be considered as distinct from $\mu$. Hence we should phrase the question more precisely by considering only p.m.'s on $\mathscr{B}^1$. This will be answered in full generality by the next theorem.

Theorem 2.2.3. Let $\mu$ and $\nu$ be two measures defined on the same B.F. $\mathscr{F}$, which is generated by the field $\mathscr{F}_0$. If either $\mu$ or $\nu$ is $\sigma$-finite on $\mathscr{F}_0$, and $\mu(E) = \nu(E)$ for every $E \in \mathscr{F}_0$, then the same is true for every $E \in \mathscr{F}$, and thus $\mu = \nu$.

PROOF. We give the proof only in the case where $\mu$ and $\nu$ are both finite, leaving the rest as an exercise. Let
$$\mathscr{C} = \{E \in \mathscr{F}\colon \mu(E) = \nu(E)\};$$
then $\mathscr{C} \supset \mathscr{F}_0$ by hypothesis. But $\mathscr{C}$ is also a monotone class, for if $E_n \in \mathscr{C}$ for every $n$ and $E_n \uparrow E$ or $E_n \downarrow E$, then by the monotone property of $\mu$ and $\nu$, respectively,
$$\mu(E) = \lim_n \mu(E_n) = \lim_n \nu(E_n) = \nu(E).$$
It follows from Theorem 2.1.2 that $\mathscr{C} \supset \mathscr{F}$, which proves the theorem.

Remark. In order that $\mu$ and $\nu$ coincide on $\mathscr{F}_0$, it is sufficient that they coincide on a collection $\mathscr{G}$ such that finite disjoint unions of members of $\mathscr{G}$ constitute $\mathscr{F}_0$.

Corollary. Let $\mu$ and $\nu$ be $\sigma$-finite measures on $\mathscr{B}^1$ that agree on all intervals of one of the eight kinds: $(a, b]$, $(a, b)$, $[a, b)$, $[a, b]$, $(-\infty, b]$, $(-\infty, b)$, $[a, \infty)$, $(a, \infty)$, or merely on those with the endpoints in a given dense set $D$; then they agree on $\mathscr{B}^1$.

PROOF. In order to apply the theorem, we must verify that any of the hypotheses implies that $\mu$ and $\nu$ agree on a field that generates $\mathscr{B}^1$. Let us take intervals of the first kind and consider the field $\mathscr{B}_0$ defined above. If $\mu$ and $\nu$ agree on such intervals, they must agree on $\mathscr{B}_0$ by countable additivity. This finishes the proof.
Returning to Theorems 2.2.1 and 2.2.2, we can now add the following complement, of which the first part is trivial.

Theorem 2.2.4. Given the p.m. $\mu$ on $\mathscr{B}^1$, there is a unique d.f. $F$ satisfying (4). Conversely, given the d.f. $F$, there is a unique p.m. $\mu$ satisfying (4) or any of the relations in (5).

We shall simply call $\mu$ the p.m. of $F$, and $F$ the d.f. of $\mu$.


Instead of $(R^1, \mathscr{B}^1)$ we may consider its restriction to a fixed interval $[a, b]$. Without loss of generality we may suppose this to be $\mathscr{U} = [0, 1]$, so that we are in the situation of Example 2. We can either proceed analogously or reduce it to the case just discussed, as follows. Let $F$ be a d.f. such that $F = 0$ for $x \le 0$ and $F = 1$ for $x \ge 1$. The probability measure $\mu$ of $F$ will then have support in $[0, 1]$, since $\mu((-\infty, 0)) = 0 = \mu((1, \infty))$ as a consequence of (4). Thus the trace of $(R^1, \mathscr{B}^1, \mu)$ on $\mathscr{U}$ may be denoted simply by $(\mathscr{U}, \mathscr{B}, \mu)$, where $\mathscr{B}$ is the trace of $\mathscr{B}^1$ on $\mathscr{U}$. Conversely, any p.m. on $\mathscr{B}$ may be regarded as such a trace. The most interesting case is when $F$ is the "uniform distribution" on $\mathscr{U}$:
$$F(x) = \begin{cases} 0 & \text{for } x < 0,\\ x & \text{for } 0 \le x \le 1,\\ 1 & \text{for } x > 1. \end{cases}$$
The corresponding measure $m$ on $\mathscr{B}$ is the usual Borel measure on $[0, 1]$, while its extension on $\mathscr{L}$ as described in Theorem 2.2.2 is the usual Lebesgue measure there. It is well known that $\mathscr{L}$ is actually larger than $\mathscr{B}$; indeed $(\mathscr{L}, m)$ is the completion of $(\mathscr{B}, m)$, to be discussed below.
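As a small numerical illustration of the correspondence (4)–(5), the sketch below (in Python; the d.f. with a jump at 0 is a made-up example, and $F(x-)$ is approximated by evaluating $F$ slightly to the left) computes the measures of the four kinds of intervals directly from $F$.

def F(x):
    # A made-up d.f.: a jump of size 1/4 at 0, then uniform mass on (0, 1].
    if x < 0:
        return 0.0
    return min(0.25 + 0.75 * x, 1.0)

def F_left(x, h=1e-9):
    # Numerical stand-in for the left limit F(x-).
    return F(x - h)

def mu(a, b, left_open=True, right_closed=True):
    # Measure of an interval with endpoints a < b via the relations (5).
    hi = F(b) if right_closed else F_left(b)
    lo = F(a) if left_open else F_left(a)
    return hi - lo

print(mu(-1, 0))                        # (-1, 0]: 0.25, the jump at 0
print(mu(-1, 0, right_closed=False))    # (-1, 0): 0.0
print(mu(0, 1))                         # (0, 1]:  0.75
print(mu(0, 1, left_open=False))        # [0, 1]:  1.0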

DEFINITION. The probability space $(\Omega, \mathscr{F}, \mathscr{P})$ is said to be complete iff any subset of a set in $\mathscr{F}$ with probability zero also belongs to $\mathscr{F}$.

Any probability space $(\Omega, \mathscr{F}, \mathscr{P})$ can be completed according to the next theorem. Let us call a set in $\mathscr{F}$ with probability zero a null set. A property that holds except on a null set is said to hold almost everywhere (a.e.), almost surely (a.s.), or for almost every $\omega$.

Theorem 2.2.5. Given the probability space $(\Omega, \mathscr{F}, \mathscr{P})$, there exists a complete space $(\Omega, \overline{\mathscr{F}}, \overline{\mathscr{P}})$ such that $\mathscr{F} \subset \overline{\mathscr{F}}$ and $\overline{\mathscr{P}} = \mathscr{P}$ on $\mathscr{F}$.

PROOF. Let $\mathscr{N}$ be the collection of sets that are subsets of null sets, and let $\overline{\mathscr{F}}$ be the collection of subsets of $\Omega$ each of which differs from a set in $\mathscr{F}$ by a subset of a null set. Precisely:
$$\overline{\mathscr{F}} = \{E \subset \Omega\colon E \,\triangle\, F \in \mathscr{N} \text{ for some } F \in \mathscr{F}\}. \tag{9}$$
It is easy to verify, using Exercise 1 of Sec. 2.1, that $\overline{\mathscr{F}}$ is a B.F. Clearly it contains $\mathscr{F}$. For each $E \in \overline{\mathscr{F}}$, we put
$$\overline{\mathscr{P}}(E) = \mathscr{P}(F),$$
where $F$ is any set that satisfies the condition indicated in (9). To show that this definition does not depend on the choice of such an $F$, suppose that
$$E \,\triangle\, F_1 \in \mathscr{N}, \qquad E \,\triangle\, F_2 \in \mathscr{N}.$$
Then by Exercise 2 of Sec. 2.1,
$$(E \,\triangle\, F_1) \,\triangle\, (E \,\triangle\, F_2) = (F_1 \,\triangle\, F_2) \,\triangle\, (E \,\triangle\, E) = F_1 \,\triangle\, F_2.$$
Hence $F_1 \,\triangle\, F_2 \in \mathscr{N}$ and so $\mathscr{P}(F_1 \,\triangle\, F_2) = 0$. This implies $\mathscr{P}(F_1) = \mathscr{P}(F_2)$, as was to be shown. We leave it as an exercise to show that $\overline{\mathscr{P}}$ is a measure on $\overline{\mathscr{F}}$. If $E \in \mathscr{F}$, then $E \,\triangle\, E = \emptyset \in \mathscr{N}$, hence $\overline{\mathscr{P}}(E) = \mathscr{P}(E)$.

Finally, it is easy to verify that if $E \in \overline{\mathscr{F}}$ and $\overline{\mathscr{P}}(E) = 0$, then $E \in \mathscr{N}$. Hence any subset of $E$ also belongs to $\mathscr{N}$ and so to $\overline{\mathscr{F}}$. This proves that $(\Omega, \overline{\mathscr{F}}, \overline{\mathscr{P}})$ is complete.

What is the advantage of completion? Suppose that a certain property, such as the existence of a certain limit, is known to hold outside a certain set $N$ with $\mathscr{P}(N) = 0$. Then the exact set on which it fails to hold is a subset of $N$, not necessarily in $\mathscr{F}$, but will be in $\overline{\mathscr{F}}$ with $\overline{\mathscr{P}}$-measure zero. We need the measurability of the exact exceptional set to facilitate certain dispositions, such as defining or redefining a function on it; see Exercise 25 below.

EXERCISES

In the following, $\mu$ is a p.m. on $\mathscr{B}^1$ and $F$ is its d.f.

*11. An atom of any measure $\mu$ on $\mathscr{B}^1$ is a singleton $\{x\}$ such that $\mu(\{x\}) > 0$. The number of atoms of any $\sigma$-finite measure is countable. For each $x$ we have $\mu(\{x\}) = F(x) - F(x-)$.
12. $\mu$ is called atomic iff its value is zero on any set not containing any atom. This is the case if and only if $F$ is discrete. $\mu$ is without any atom, or atomless, if and only if $F$ is continuous.
13. $\mu$ is called singular iff there exists a set $Z$ with $m(Z) = 0$ such that $\mu(Z^c) = 0$. This is the case if and only if $F$ is singular. [HINT: One half is proved by using Theorems 1.3.1 and 2.1.2 to get $\int_B F'(x)\,dx \le \mu(B)$ for $B \in \mathscr{B}^1$; the other requires Vitali's covering theorem.]
14. Translate Theorem 1.3.2 in terms of measures.
*15. Translate the construction of a singular continuous d.f. in Sec. 1.3 in terms of measures. [It becomes clearer and easier to describe!] Generalize the construction by replacing the Cantor set with any perfect set of Borel measure zero. What if the latter has positive measure? Describe a probability scheme to realize this.
16. Show by a trivial example that Theorem 2.2.3 becomes false if the field $\mathscr{F}_0$ is replaced by an arbitrary collection that generates $\mathscr{F}$.
17. Show that Theorem 2.2.3 may be false for measures that are $\sigma$-finite on $\mathscr{F}$. [HINT: Take $\Omega$ to be $\{1, 2, \ldots, \infty\}$ and $\mathscr{F}_0$ to be the finite sets excluding $\infty$ and their complements, $\mu(E) =$ the number of points in $E$, and let $\nu$ agree with $\mu$ except that $\nu(\{\infty\}) \ne \mu(\{\infty\})$.]
18. Show that the $\overline{\mathscr{F}}$ in (9) is also the collection of sets of the form $F \cup N$ [or $F \setminus N$] where $F \in \mathscr{F}$ and $N \in \mathscr{N}$.
19. Let $\mathscr{N}$ be as in the proof of Theorem 2.2.5 and $\mathscr{N}_0$ be the set of all null sets in $(\Omega, \mathscr{F}, \mathscr{P})$. Then both these collections are monotone classes, and closed with respect to the operation "$\setminus$".
*20. Let $(\Omega, \mathscr{F}, \mathscr{P})$ be a probability space and $\mathscr{F}_1$ a Borel subfield of $\mathscr{F}$. Prove that there exists a minimal B.F. $\mathscr{F}_2$ satisfying $\mathscr{F}_1 \subset \mathscr{F}_2 \subset \mathscr{F}$ and $\mathscr{N}_0 \subset \mathscr{F}_2$, where $\mathscr{N}_0$ is as in Exercise 19. A set $E$ belongs to $\mathscr{F}_2$ if and only if there exists a set $F$ in $\mathscr{F}_1$ such that $E \,\triangle\, F \in \mathscr{N}_0$. This $\mathscr{F}_2$ is called the augmentation of $\mathscr{F}_1$ with respect to $(\Omega, \mathscr{F}, \mathscr{P})$.
21. Suppose that $\tilde{F}$ has all the defining properties of a d.f. except that it is not assumed to be right continuous. Show that Theorem 2.2.2 and the Lemma remain valid with $F$ replaced by $\tilde{F}$, provided that we replace $F(x)$, $F(b)$, $F(a)$ in (4) and (5) by $\tilde{F}(x+)$, $\tilde{F}(b+)$, $\tilde{F}(a+)$, respectively. What modification is necessary in Theorem 2.2.4?
22. For an arbitrary measure $\mathscr{P}$ on a B.F. $\mathscr{F}$, a set $E$ in $\mathscr{F}$ is called an atom of $\mathscr{P}$ iff $\mathscr{P}(E) > 0$ and $F \subset E$, $F \in \mathscr{F}$ imply $\mathscr{P}(F) = \mathscr{P}(E)$ or $\mathscr{P}(F) = 0$. $\mathscr{P}$ is called atomic iff its value is zero over any set in $\mathscr{F}$ that is disjoint from all the atoms. Prove that for a measure $\mu$ on $\mathscr{B}^1$ this new definition is equivalent to that given in Exercise 11 above provided we identify two sets which differ by a $\mathscr{P}$-null set.
23. Prove that if the p.m. $\mathscr{P}$ is atomless, then given any $\alpha$ in $[0, 1]$ there exists a set $E \in \mathscr{F}$ with $\mathscr{P}(E) = \alpha$. [HINT: Prove first that there exists $E$ with "arbitrarily small" probability. A quick proof then follows from Zorn's lemma by considering a maximal collection of disjoint sets, the sum of whose probabilities does not exceed $\alpha$. But an elementary proof without using any maximality principle is also possible.]
*24. A point $x$ is said to be in the support of a measure $\mu$ on $\mathscr{B}^n$ iff every open neighborhood of $x$ has strictly positive measure. The set of all such points is called the support of $\mu$. Prove that the support is a closed set whose complement is the maximal open set on which $\mu$ vanishes. Show that the support of a p.m. on $\mathscr{B}^1$ is the same as that of its d.f., defined in Exercise 6 of Sec. 1.2.
*25. Let $f$ be measurable with respect to $\mathscr{F}$, and $Z$ be contained in a null set. Define
$$\tilde{f} = \begin{cases} f & \text{on } Z^c,\\ K & \text{on } Z, \end{cases}$$
where $K$ is a constant. Then $\tilde{f}$ is measurable with respect to $\overline{\mathscr{F}}$ provided that $(\Omega, \overline{\mathscr{F}}, \overline{\mathscr{P}})$ is complete. Show that the conclusion may be false otherwise.
3  Random variable. Expectation. Independence

3.1 General definitions

Let the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ be given. $R^1 = (-\infty, +\infty)$ is the (finite) real line, $R^* = [-\infty, +\infty]$ the extended real line, $\mathscr{B}^1$ the Euclidean Borel field on $R^1$, $\mathscr{B}^*$ the extended Borel field. A set in $\mathscr{B}^*$ is just a set in $\mathscr{B}^1$ possibly enlarged by one or both points $\pm\infty$.

DEFINITION OF A RANDOM VARIABLE. A real, extended-valued random variable is a function $X$ whose domain is a set $\Delta$ in $\mathscr{F}$ and whose range is contained in $R^* = [-\infty, +\infty]$ such that for each $B$ in $\mathscr{B}^*$, we have
$$\{\omega\colon X(\omega) \in B\} \in \Delta \cap \mathscr{F}, \tag{1}$$
where $\Delta \cap \mathscr{F}$ is the trace of $\mathscr{F}$ on $\Delta$. A complex-valued random variable is a function on a set $\Delta$ in $\mathscr{F}$ to the complex plane whose real and imaginary parts are both real, finite-valued random variables.

This definition in its generality is necessary for logical reasons in many applications, but for a discussion of basic properties we may suppose $\Delta = \Omega$ and that $X$ is real and finite-valued with probability one. This restricted meaning
of a "random variable", abbreviated as "r.v.", will be understood in the book unless otherwise specified. The general case may be reduced to this one by considering the trace of $(\Omega, \mathscr{F}, \mathscr{P})$ on $\Delta$, or on the "domain of finiteness" $\Delta_0 = \{\omega\colon |X(\omega)| < \infty\}$, and taking real and imaginary parts.

Consider the "inverse mapping" $X^{-1}$ from $R^1$ to $\Omega$, defined (as usual) as follows:
$$\forall A \subset R^1\colon\quad X^{-1}(A) = \{\omega\colon X(\omega) \in A\}.$$
Condition (1) then states that $X^{-1}$ carries members of $\mathscr{B}^1$ onto members of $\mathscr{F}$:
$$\forall B \in \mathscr{B}^1\colon\quad X^{-1}(B) \in \mathscr{F}; \tag{2}$$
or in the briefest notation:
$$X^{-1}(\mathscr{B}^1) \subset \mathscr{F}.$$
Such a function is said to be measurable (with respect to $\mathscr{F}$). Thus, an r.v. is just a measurable function from $\Omega$ to $R^1$ (or $R^*$).
The next proposition, a standard exercise on inverse mapping, is essential.

Theorem 3.1.1. For any function X from  to R1 (or RŁ ), not necessarily


an r.v., the inverse mapping X1 has the following properties:
X1 Ac  D X1 Ac .
 
X1 A˛ D X1 A˛ ,
˛ ˛
 
 
X1 A˛ D X1 A˛ .
˛ ˛

where ˛ ranges over an arbitrary index set, not necessarily countable.

Theorem 3.1.2. $X$ is an r.v. if and only if for each real number $x$, or each real number $x$ in a dense subset of $R^1$, we have
$$\{\omega\colon X(\omega) \le x\} \in \mathscr{F}.$$

PROOF. The preceding condition may be written as
$$\forall x\colon\quad X^{-1}((-\infty, x]) \in \mathscr{F}. \tag{3}$$
Consider the collection $\mathscr{A}$ of all subsets $S$ of $R^1$ for which $X^{-1}(S) \in \mathscr{F}$. From Theorem 3.1.1 and the defining properties of the Borel field $\mathscr{F}$, it follows that if $S \in \mathscr{A}$, then
$$X^{-1}(S^c) = (X^{-1}(S))^c \in \mathscr{F};$$
if $\forall j\colon S_j \in \mathscr{A}$, then
$$X^{-1}\Bigl(\bigcup_j S_j\Bigr) = \bigcup_j X^{-1}(S_j) \in \mathscr{F}.$$
Thus $S^c \in \mathscr{A}$ and $\bigcup_j S_j \in \mathscr{A}$, and consequently $\mathscr{A}$ is a B.F. This B.F. contains all intervals of the form $(-\infty, x]$, which generate $\mathscr{B}^1$ even if $x$ is restricted to a dense set; hence $\mathscr{A} \supset \mathscr{B}^1$, which means that $X^{-1}(B) \in \mathscr{F}$ for each $B \in \mathscr{B}^1$. Thus $X$ is an r.v. by definition. This proves the "if" part of the theorem; the "only if" part is trivial.

Since $\mathscr{P}(\cdot)$ is defined on $\mathscr{F}$, the probability of the set in (1) is defined and will be written as
$$\mathscr{P}\{X(\omega) \in B\} \quad\text{or}\quad \mathscr{P}\{X \in B\}.$$

The next theorem relates the p.m. $\mathscr{P}$ to a p.m. on $(R^1, \mathscr{B}^1)$ as discussed in Sec. 2.2.

Theorem 3.1.3. Each r.v. on the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ induces a probability space $(R^1, \mathscr{B}^1, \mu)$ by means of the following correspondence:
$$\forall B \in \mathscr{B}^1\colon\quad \mu(B) = \mathscr{P}(X^{-1}(B)) = \mathscr{P}\{X \in B\}. \tag{4}$$

PROOF. Clearly $\mu(B) \ge 0$. If the $B_n$'s are disjoint sets in $\mathscr{B}^1$, then the $X^{-1}(B_n)$'s are disjoint by Theorem 3.1.1. Hence
$$\mu\Bigl(\bigcup_n B_n\Bigr) = \mathscr{P}\Bigl(X^{-1}\Bigl(\bigcup_n B_n\Bigr)\Bigr) = \mathscr{P}\Bigl(\bigcup_n X^{-1}(B_n)\Bigr) = \sum_n \mathscr{P}(X^{-1}(B_n)) = \sum_n \mu(B_n).$$
Finally $X^{-1}(R^1) = \Omega$, hence $\mu(R^1) = 1$. Thus $\mu$ is a p.m.

The collection of sets $\{X^{-1}(S), S \subset R^1\}$ is a B.F. for any function $X$. If $X$ is an r.v., then the collection $\{X^{-1}(B), B \in \mathscr{B}^1\}$ is called the B.F. generated by $X$. It is the smallest Borel subfield of $\mathscr{F}$ which contains all sets of the form $\{\omega\colon X(\omega) \le x\}$, where $x \in R^1$. Thus (4) is a convenient way of representing the measure $\mathscr{P}$ when it is restricted to this subfield; symbolically we may write it as follows:
$$\mu = \mathscr{P} \circ X^{-1}.$$
This $\mu$ is called the "probability distribution measure" or p.m. of $X$, and its associated d.f. $F$ according to Theorem 2.2.4 will be called the d.f. of $X$.
Specifically, $F$ is given by
$$F(x) = \mu((-\infty, x]) = \mathscr{P}\{X \le x\}.$$
While the r.v. $X$ determines $\mu$ and therefore $F$, the converse is obviously false. A family of r.v.'s having the same distribution is said to be "identically distributed".

Example 1. Let $(\Omega, \mathscr{S})$ be a discrete sample space (see Example 1 of Sec. 2.2). Every numerically valued function is an r.v.

Example 2. $(\mathscr{U}, \mathscr{B}, m)$.
In this case an r.v. is by definition just a Borel measurable function. According to the usual definition, $f$ on $\mathscr{U}$ is Borel measurable iff $f^{-1}(\mathscr{B}^1) \subset \mathscr{B}$. In particular, the function $f$ given by $f(\omega) \equiv \omega$ is an r.v. The two r.v.'s $\omega$ and $1 - \omega$ are not identical but are identically distributed; in fact their common distribution is the underlying measure $m$.

Example 3. $(R^1, \mathscr{B}^1, \mu)$.
The definition of a Borel measurable function is not affected, since no measure is involved; so any such function is an r.v., whatever the given p.m. $\mu$ may be. As in Example 2, there exists an r.v. with the underlying $\mu$ as its p.m.; see Exercise 3 below.

We proceed to produce new r.v.'s from given ones.

Theorem 3.1.4. If $X$ is an r.v., $f$ a Borel measurable function [on $(R^1, \mathscr{B}^1)$], then $f(X)$ is an r.v.

PROOF. The quickest proof is as follows. Regarding the function $f(X)$ of $\omega$ as the "composite mapping":
$$f \circ X\colon\ \omega \to f(X(\omega)),$$
we have $(f \circ X)^{-1} = X^{-1} \circ f^{-1}$ and consequently
$$(f \circ X)^{-1}(\mathscr{B}^1) = X^{-1}(f^{-1}(\mathscr{B}^1)) \subset X^{-1}(\mathscr{B}^1) \subset \mathscr{F}.$$
The reader who is not familiar with operations of this kind is advised to spell out the proof above in the old-fashioned manner, which takes only a little longer.

We must now discuss the notion of a random vector. This is just a vector each of whose components is an r.v. It is sufficient to consider the case of two dimensions, since there is no essential difference in higher dimensions apart from complication in notation.
We recall first that in the 2-dimensional Euclidean space $R^2$, or the plane, the Euclidean Borel field $\mathscr{B}^2$ is generated by rectangles of the form
$$\{(x, y)\colon a < x \le b,\ c < y \le d\}.$$
A fortiori, it is also generated by product sets of the form
$$B_1 \times B_2 = \{(x, y)\colon x \in B_1,\ y \in B_2\},$$
where $B_1$ and $B_2$ belong to $\mathscr{B}^1$. The collection of sets, each of which is a finite union of disjoint product sets, forms a field $\mathscr{B}_0^2$. A function from $R^2$ into $R^1$ is called a Borel measurable function (of two variables) iff $f^{-1}(\mathscr{B}^1) \subset \mathscr{B}^2$. Written out, this says that for each 1-dimensional Borel set $B$, viz., a member of $\mathscr{B}^1$, the set
$$\{(x, y)\colon f(x, y) \in B\}$$
is a 2-dimensional Borel set, viz. a member of $\mathscr{B}^2$.

Now let $X$ and $Y$ be two r.v.'s on $(\Omega, \mathscr{F}, \mathscr{P})$. The random vector $(X, Y)$ induces a probability $\nu$ on $\mathscr{B}^2$ as follows:
$$\forall A \in \mathscr{B}^2\colon\quad \nu(A) = \mathscr{P}\{(X, Y) \in A\}, \tag{5}$$
the right side being an abbreviation of $\mathscr{P}(\{\omega\colon (X(\omega), Y(\omega)) \in A\})$. This $\nu$ is called the (2-dimensional, probability) distribution or simply the p.m. of $(X, Y)$.

Let us also define, in imitation of $X^{-1}$, the inverse mapping $(X, Y)^{-1}$ by the following formula:
$$\forall A \in \mathscr{B}^2\colon\quad (X, Y)^{-1}(A) = \{\omega\colon (X, Y) \in A\}.$$
This mapping has properties analogous to those of $X^{-1}$ given in Theorem 3.1.1, since the latter is actually true for a mapping of any two abstract spaces. We can now easily generalize Theorem 3.1.4.

Theorem 3.1.5. If $X$ and $Y$ are r.v.'s and $f$ is a Borel measurable function of two variables, then $f(X, Y)$ is an r.v.

PROOF.
$$[f \circ (X, Y)]^{-1}(\mathscr{B}^1) = (X, Y)^{-1}(f^{-1}(\mathscr{B}^1)) \subset (X, Y)^{-1}(\mathscr{B}^2) \subset \mathscr{F}.$$
The last inclusion says that the inverse mapping $(X, Y)^{-1}$ carries each 2-dimensional Borel set into a set in $\mathscr{F}$. This is proved as follows. If $A = B_1 \times B_2$, where $B_1 \in \mathscr{B}^1$, $B_2 \in \mathscr{B}^1$, then it is clear that
$$(X, Y)^{-1}(A) = X^{-1}(B_1) \cap Y^{-1}(B_2) \in \mathscr{F}$$
by (2). Now the collection of sets $A$ in $R^2$ for which $(X, Y)^{-1}(A) \in \mathscr{F}$ forms a B.F. by the analogue of Theorem 3.1.1. It follows from what has just been shown that this B.F. contains $\mathscr{B}_0^2$, hence it must also contain $\mathscr{B}^2$. Hence each set in $\mathscr{B}^2$ belongs to the collection, as was to be proved.
Here are some important special cases of Theorems 3.1.4 and 3.1.5. Throughout the book we shall use the notation for numbers as well as functions:
$$x \vee y = \max(x, y), \qquad x \wedge y = \min(x, y). \tag{6}$$

Corollary. If $X$ is an r.v. and $f$ is a continuous function on $R^1$, then $f(X)$ is an r.v.; in particular $X^r$ for positive integer $r$, $|X|^r$ for positive real $r$, $e^{\lambda X}$ and $e^{itX}$ for real $\lambda$ and $t$, are all r.v.'s (the last being complex-valued). If $X$ and $Y$ are r.v.'s, then
$$X \vee Y, \quad X \wedge Y, \quad X + Y, \quad X - Y, \quad X \cdot Y, \quad X/Y$$
are r.v.'s, the last provided $Y$ does not vanish.

Generalization to a finite number of r.v.'s is immediate. Passing to an infinite sequence, let us state the following theorem, although its analogue in real function theory should be well known to the reader.

Theorem 3.1.6. If $\{X_j, j \ge 1\}$ is a sequence of r.v.'s, then
$$\inf_j X_j, \quad \sup_j X_j, \quad \liminf_j X_j, \quad \limsup_j X_j$$
are r.v.'s, not necessarily finite-valued with probability one though everywhere defined, and
$$\lim_{j \to \infty} X_j$$
is an r.v. on the set $\Delta$ on which there is either convergence or divergence to $\pm\infty$.

PROOF. To see, for example, that $\sup_j X_j$ is an r.v., we need only observe the relation
$$\forall x \in R^1\colon\quad \Bigl\{\sup_j X_j \le x\Bigr\} = \bigcap_j \{X_j \le x\}$$
and use Theorem 3.1.2. Since
$$\limsup_j X_j = \inf_n \sup_{j \ge n} X_j,$$
and $\lim_{j \to \infty} X_j$ exists [and is finite] on the set where $\limsup_j X_j = \liminf_j X_j$ [and is finite], which belongs to $\mathscr{F}$, the rest follows.

Here already we see the necessity of the general definition of an r.v. given at the beginning of this section.

DEFINITION. An r.v. $X$ is called discrete (or countably valued) iff there is a countable set $B \subset R^1$ such that $\mathscr{P}(X \in B) = 1$.

It is easy to see that $X$ is discrete if and only if its d.f. is. Perhaps it is worthwhile to point out that a discrete r.v. need not have a range that is discrete in the sense of Euclidean topology, even apart from a set of probability zero. Consider, for example, an r.v. with the d.f. in Example 2 of Sec. 1.1.

The following terminology and notation will be used throughout the book for an arbitrary set $\Omega$, not necessarily the sample space.

DEFINITION. For each $\Delta \subset \Omega$, the function $1_\Delta(\cdot)$ defined as follows:
$$\forall \omega \in \Omega\colon\quad 1_\Delta(\omega) = \begin{cases} 1, & \text{if } \omega \in \Delta,\\ 0, & \text{if } \omega \in \Omega \setminus \Delta, \end{cases}$$
is called the indicator (function) of $\Delta$.

Clearly $1_\Delta$ is an r.v. if and only if $\Delta \in \mathscr{F}$.

A countable partition of $\Omega$ is a countable family of disjoint sets $\{\Lambda_j\}$, with $\Lambda_j \in \mathscr{F}$ for each $j$ and such that $\Omega = \bigcup_j \Lambda_j$. We have then
$$1 = 1_\Omega = \sum_j 1_{\Lambda_j}.$$
More generally, let $b_j$ be arbitrary real numbers; then the function $\varphi$ defined below:
$$\forall \omega \in \Omega\colon\quad \varphi(\omega) = \sum_j b_j 1_{\Lambda_j}(\omega),$$
is a discrete r.v. We shall call $\varphi$ the r.v. belonging to the weighted partition $\{\Lambda_j; b_j\}$. Each discrete r.v. $X$ belongs to a certain partition. For let $\{b_j\}$ be the countable set in the definition of $X$ and let $\Lambda_j = \{\omega\colon X(\omega) = b_j\}$; then $X$ belongs to the weighted partition $\{\Lambda_j; b_j\}$. If $j$ ranges over a finite index set, the partition is called finite and the r.v. belonging to it simple.

EXERCISES

1. Prove Theorem 3.1.1. For the "direct mapping" $X$, which of these properties of $X^{-1}$ holds?
2. If two r.v.'s are equal a.e., then they have the same p.m.
*3. Given any p.m. $\mu$ on $(R^1, \mathscr{B}^1)$, define an r.v. whose p.m. is $\mu$. Can this be done in an arbitrary probability space?
*4. Let $\theta$ be uniformly distributed on $[0, 1]$. For each d.f. $F$, define $G(y) = \sup\{x\colon F(x) \le y\}$. Then $G(\theta)$ has the d.f. $F$. (A numerical sketch of this construction follows the exercises.)
*5. Suppose $X$ has the continuous d.f. $F$; then $F(X)$ has the uniform distribution on $[0, 1]$. What if $F$ is not continuous?
6. Is the range of an r.v. necessarily Borel or Lebesgue measurable?
7. The sum, difference, product, or quotient (denominator nonvanishing) of two discrete r.v.'s is discrete.
8. If $\Omega$ is discrete (countable), then every r.v. is discrete. Conversely, every r.v. in a probability space is discrete if and only if the p.m. is atomic. [HINT: Use Exercise 23 of Sec. 2.2.]
9. If $f$ is Borel measurable, and $X$ and $Y$ are identically distributed, then so are $f(X)$ and $f(Y)$.
10. Express the indicators of $\Lambda_1 \cup \Lambda_2$, $\Lambda_1 \cap \Lambda_2$, $\Lambda_1 \setminus \Lambda_2$, $\Lambda_1 \,\triangle\, \Lambda_2$, $\limsup_n \Lambda_n$, $\liminf_n \Lambda_n$ in terms of those of $\Lambda_1$, $\Lambda_2$, or $\Lambda_n$. [For the definitions of the limits see Sec. 4.2.]
*11. Let $\mathscr{F}\{X\}$ be the minimal B.F. with respect to which $X$ is measurable. Show that $\Lambda \in \mathscr{F}\{X\}$ if and only if $\Lambda = X^{-1}(B)$ for some $B \in \mathscr{B}^1$. Is this $B$ unique? Can there be a set $A \notin \mathscr{B}^1$ such that $\Lambda = X^{-1}(A)$?
12. Generalize the assertion in Exercise 11 to a finite set of r.v.'s. [It is possible to generalize even to an arbitrary set of r.v.'s.]
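The map $G$ of Exercise 4 underlies the simulation method known as inverse transform sampling. A minimal sketch (in Python; the exponential d.f. is an illustrative choice, and the supremum is located by bisection rather than by a closed formula):

import math
import random

def F(x):
    # An illustrative continuous d.f. (exponential with mean 1).
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def G(y, lo=0.0, hi=50.0):
    # G(y) = sup{x : F(x) <= y}, located by bisection since F is
    # continuous and strictly increasing on (0, infinity).
    for _ in range(60):
        mid = (lo + hi) / 2
        if F(mid) <= y:
            lo = mid
        else:
            hi = mid
    return lo

random.seed(0)
sample = [G(random.random()) for _ in range(100_000)]  # theta uniform on [0, 1]
for x in (0.5, 1.0, 2.0):
    emp = sum(s <= x for s in sample) / len(sample)
    print(x, round(emp, 3), round(F(x), 3))            # empirical d.f. vs F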

3.2 Properties of mathematical expectation

The concept of "(mathematical) expectation" is the same as that of integration in the probability space with respect to the measure $\mathscr{P}$. The reader is supposed to have some acquaintance with this, at least in the particular case $(\mathscr{U}, \mathscr{B}, m)$ or $(R^1, \mathscr{B}^1, m)$. [In the latter case, the measure not being finite, the theory of integration is slightly more complicated.] The general theory is not much different and will be briefly reviewed. The r.v.'s below will be tacitly assumed to be finite everywhere to avoid trivial complications.

For each positive discrete r.v. $X$ belonging to the weighted partition $\{\Lambda_j; b_j\}$, we define its expectation to be
$$E(X) = \sum_j b_j\, \mathscr{P}(\Lambda_j). \tag{1}$$
This is either a positive finite number or $+\infty$. It is trivial that if $X$ belongs to different partitions, the corresponding values given by (1) agree. Now let $X$ be an arbitrary positive r.v. For any two positive integers $m$ and $n$, the set
$$\Lambda_{mn} = \Bigl\{\omega\colon \frac{n}{2^m} \le X(\omega) < \frac{n+1}{2^m}\Bigr\}$$
belongs to $\mathscr{F}$. For each $m$, let $X_m$ denote the r.v. belonging to the weighted partition $\{\Lambda_{mn}; n/2^m\}$; thus $X_m = n/2^m$ if and only if $n/2^m \le X < (n+1)/2^m$. It is easy to see that we have for each $m$:
$$\forall \omega\colon\quad X_m(\omega) \le X_{m+1}(\omega); \qquad 0 \le X(\omega) - X_m(\omega) < \frac{1}{2^m}.$$
Consequently there is monotone convergence:
$$\forall \omega\colon\quad \lim_{m \to \infty} X_m(\omega) = X(\omega).$$
The expectation of $X_m$ has just been defined; it is
$$E(X_m) = \sum_{n=0}^{\infty} \frac{n}{2^m}\, \mathscr{P}\Bigl(\frac{n}{2^m} \le X < \frac{n+1}{2^m}\Bigr).$$
If for one value of $m$ we have $E(X_m) = +\infty$, then we define $E(X) = +\infty$; otherwise we define
$$E(X) = \lim_{m \to \infty} E(X_m),$$
the limit existing, finite or infinite, since $E(X_m)$ is an increasing sequence of real numbers. It should be shown that when $X$ is discrete, this new definition agrees with the previous one.
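The dyadic approximants $X_m$ are easy to realize numerically. A sketch (in Python; the exponential sample standing in for a positive r.v. $X$ with $E(X) = 1$ is an illustrative choice):

import math
import random

random.seed(1)
xs = [-math.log(random.random()) for _ in range(200_000)]  # exponential, E(X) = 1

def E_Xm(m):
    # Empirical E(X_m), where X_m = n/2^m on {n/2^m <= X < (n+1)/2^m},
    # i.e. X_m = floor(2^m X)/2^m.
    scale = 2.0 ** m
    return sum(math.floor(x * scale) / scale for x in xs) / len(xs)

for m in range(6):
    print(m, round(E_Xm(m), 4))   # increasing in m, within 2^(-m) of E(X)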
For an arbitrary $X$, put as usual
$$X = X^+ - X^- \quad\text{where } X^+ = X \vee 0,\ X^- = (-X) \vee 0. \tag{2}$$
Both $X^+$ and $X^-$ are positive r.v.'s, and so their expectations are defined. Unless both $E(X^+)$ and $E(X^-)$ are $+\infty$, we define
$$E(X) = E(X^+) - E(X^-) \tag{3}$$
with the usual convention regarding $\infty$. We say $X$ has a finite or infinite expectation (or expected value) according as $E(X)$ is a finite number or $\pm\infty$. In the excepted case we shall say that the expectation of $X$ does not exist. The expectation, when it exists, is also denoted by
$$\int_\Omega X(\omega)\, \mathscr{P}(d\omega).$$
More generally, for each $\Lambda$ in $\mathscr{F}$, we define
$$\int_\Lambda X(\omega)\, \mathscr{P}(d\omega) = E(X \cdot 1_\Lambda) \tag{4}$$
and call it "the integral of $X$ (with respect to $\mathscr{P}$) over the set $\Lambda$". We shall say that $X$ is integrable with respect to $\mathscr{P}$ over $\Lambda$ iff the integral above exists and is finite.
In the case of $(R^1, \mathscr{B}^1, \mu)$, if we write $X = f$, $\omega = x$, the integral
$$\int_\Lambda X(\omega)\, \mathscr{P}(d\omega) = \int_\Lambda f(x)\, \mu(dx)$$
is just the ordinary Lebesgue–Stieltjes integral of $f$ with respect to $\mu$. If $F$ is the d.f. of $\mu$ and $\Lambda = (a, b]$, this is also written as
$$\int_{(a, b]} f(x)\, dF(x).$$
This classical notation is really an anachronism, originated in the days when a point function was more popular than a set function. The notation above must then be amended to
$$\int_{a+0}^{b+0}, \qquad \int_{a-0}^{b+0}, \qquad \int_{a+0}^{b-0}, \qquad \int_{a-0}^{b-0}$$
to distinguish clearly between the four kinds of intervals $(a, b]$, $[a, b]$, $(a, b)$, $[a, b)$.

In the case of $(\mathscr{U}, \mathscr{B}, m)$, the integral reduces to the ordinary Lebesgue integral
$$\int_a^b f(x)\, m(dx) = \int_a^b f(x)\, dx.$$
Here $m$ is atomless, so the notation is adequate and there is no need to distinguish between the different kinds of intervals.
The general integral has the familiar properties of the Lebesgue integral on $[0, 1]$. We list a few below for ready reference, some being easy consequences of others. As a general notation, the left member of (4) will be abbreviated to $\int_\Lambda X\, d\mathscr{P}$. In the following, $X$, $Y$ are r.v.'s; $a$, $b$ are constants; $\Lambda$ is a set in $\mathscr{F}$.

(i) Absolute integrability. $\int_\Lambda X\, d\mathscr{P}$ is finite if and only if
$$\int_\Lambda |X|\, d\mathscr{P} < \infty.$$
(ii) Linearity.
$$\int_\Lambda (aX + bY)\, d\mathscr{P} = a \int_\Lambda X\, d\mathscr{P} + b \int_\Lambda Y\, d\mathscr{P},$$
provided that the right side is meaningful, namely not $+\infty - \infty$ or $-\infty + \infty$.
(iii) Additivity over sets. If the $\Lambda_n$'s are disjoint, then
$$\int_{\bigcup_n \Lambda_n} X\, d\mathscr{P} = \sum_n \int_{\Lambda_n} X\, d\mathscr{P}.$$
(iv) Positivity. If $X \ge 0$ a.e. on $\Lambda$, then
$$\int_\Lambda X\, d\mathscr{P} \ge 0.$$
(v) Monotonicity. If $X_1 \le X \le X_2$ a.e. on $\Lambda$, then
$$\int_\Lambda X_1\, d\mathscr{P} \le \int_\Lambda X\, d\mathscr{P} \le \int_\Lambda X_2\, d\mathscr{P}.$$
(vi) Mean value theorem. If $a \le X \le b$ a.e. on $\Lambda$, then
$$a\,\mathscr{P}(\Lambda) \le \int_\Lambda X\, d\mathscr{P} \le b\,\mathscr{P}(\Lambda).$$
(vii) Modulus inequality.
$$\Bigl|\int_\Lambda X\, d\mathscr{P}\Bigr| \le \int_\Lambda |X|\, d\mathscr{P}.$$
(viii) Dominated convergence theorem. If $\lim_{n \to \infty} X_n = X$ a.e. or merely in measure on $\Lambda$ and $\forall n\colon |X_n| \le Y$ a.e. on $\Lambda$, with $\int_\Lambda Y\, d\mathscr{P} < \infty$, then
$$\lim_{n \to \infty} \int_\Lambda X_n\, d\mathscr{P} = \int_\Lambda X\, d\mathscr{P} = \int_\Lambda \lim_{n \to \infty} X_n\, d\mathscr{P}. \tag{5}$$
(ix) Bounded convergence theorem. If $\lim_{n \to \infty} X_n = X$ a.e. or merely in measure on $\Lambda$ and there exists a constant $M$ such that $\forall n\colon |X_n| \le M$ a.e. on $\Lambda$, then (5) is true.
(x) Monotone convergence theorem. If $X_n \ge 0$ and $X_n \uparrow X$ a.e. on $\Lambda$, then (5) is again true provided that $+\infty$ is allowed as a value for either member. The condition "$X_n \ge 0$" may be weakened to: "$E(X_n) > -\infty$ for some $n$".
(xi) Integration term by term. If
$$\sum_n \int_\Lambda |X_n|\, d\mathscr{P} < \infty,$$
then $\sum_n |X_n| < \infty$ a.e. on $\Lambda$, so that $\sum_n X_n$ converges a.e. on $\Lambda$ and
$$\int_\Lambda \sum_n X_n\, d\mathscr{P} = \sum_n \int_\Lambda X_n\, d\mathscr{P}.$$
(xii) Fatou's lemma. If $X_n \ge 0$ a.e. on $\Lambda$, then
$$\int_\Lambda \bigl(\liminf_{n \to \infty} X_n\bigr)\, d\mathscr{P} \le \liminf_{n \to \infty} \int_\Lambda X_n\, d\mathscr{P}.$$

Let us prove the following useful theorem as an instructive example.

Theorem 3.2.1. We have
$$\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n) \le E(|X|) \le 1 + \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n), \tag{6}$$
so that $E(|X|) < \infty$ if and only if the series above converges.

PROOF. By the additivity property (iii), if $\Lambda_n = \{n \le |X| < n+1\}$,
$$E(|X|) = \sum_{n=0}^{\infty} \int_{\Lambda_n} |X|\, d\mathscr{P}.$$
Hence by the mean value theorem (vi) applied to each set $\Lambda_n$:
$$\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) \le E(|X|) \le \sum_{n=0}^{\infty} (n+1)\,\mathscr{P}(\Lambda_n) = 1 + \sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n). \tag{7}$$
It remains to show
$$\sum_{n=0}^{\infty} n\,\mathscr{P}(\Lambda_n) = \sum_{n=1}^{\infty} \mathscr{P}(|X| \ge n), \tag{8}$$
finite or infinite. Now the partial sums of the series on the left may be rearranged (Abel's method of partial summation!) to yield, for $N \ge 1$,
$$\begin{aligned}
\sum_{n=0}^{N} n\{\mathscr{P}(|X| \ge n) - \mathscr{P}(|X| \ge n+1)\}
&= \sum_{n=1}^{N} \{n - (n-1)\}\,\mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1)\\
&= \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) - N\,\mathscr{P}(|X| \ge N+1).
\end{aligned} \tag{9}$$
Thus we have
$$\sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) \le \sum_{n=1}^{N} \mathscr{P}(|X| \ge n) \le \sum_{n=1}^{N} n\,\mathscr{P}(\Lambda_n) + N\,\mathscr{P}(|X| \ge N+1). \tag{10}$$
Another application of the mean value theorem gives
$$N\,\mathscr{P}(|X| \ge N+1) \le \int_{\{|X| \ge N+1\}} |X|\, d\mathscr{P}.$$
Hence if $E(|X|) < \infty$, then the last term in (10) converges to zero as $N \to \infty$ and (8) follows with both sides finite. On the other hand, if $E(|X|) = \infty$, then the second term in (10) diverges with the first as $N \to \infty$, so that (8) is also true with both sides infinite.

Corollary. If $X$ takes only positive integer values, then
$$E(X) = \sum_{n=1}^{\infty} \mathscr{P}(X \ge n).$$
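The corollary can be checked directly for a concrete integer-valued distribution. A sketch (in Python; the geometric distribution and the truncation of the series at 2000 terms are illustrative choices):

# X geometric on {1, 2, ...}: P(X = k) = q^(k-1) p, so P(X >= n) = q^(n-1).
p = 0.3
q = 1.0 - p
N = 2000  # truncation point; the neglected tails are negligibly small here

mean = sum(k * q ** (k - 1) * p for k in range(1, N))   # E(X) directly
tails = sum(q ** (n - 1) for n in range(1, N))          # sum of P(X >= n)
print(round(mean, 6), round(tails, 6), round(1 / p, 6)) # all approximately 10/3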

EXERCISES

1. If $X \ge 0$ a.e. on $\Lambda$ and $\int_\Lambda X\, d\mathscr{P} = 0$, then $X = 0$ a.e. on $\Lambda$.
*2. If $E(|X|) < \infty$ and $\lim_{n \to \infty} \mathscr{P}(\Lambda_n) = 0$, then $\lim_{n \to \infty} \int_{\Lambda_n} X\, d\mathscr{P} = 0$. In particular
$$\lim_{n \to \infty} \int_{\{|X| > n\}} X\, d\mathscr{P} = 0.$$
3. Let $X \ge 0$ and $\int_\Omega X\, d\mathscr{P} = A$, $0 < A < \infty$. Then the set function $\nu$ defined on $\mathscr{F}$ as follows:
$$\nu(\Lambda) = \frac{1}{A} \int_\Lambda X\, d\mathscr{P},$$
is a probability measure on $\mathscr{F}$.
4. Let $c$ be a fixed constant, $c > 0$. Then $E(|X|) < \infty$ if and only if
$$\sum_{n=1}^{\infty} \mathscr{P}(|X| \ge cn) < \infty.$$
In particular, if the last series converges for one value of $c$, it converges for all values of $c$.
5. For any $r > 0$, $E(|X|^r) < \infty$ if and only if
$$\sum_{n=1}^{\infty} n^{r-1}\, \mathscr{P}(|X| \ge n) < \infty.$$
*6. Suppose that $\sup_n |X_n| \le Y$ on $\Lambda$ with $\int_\Lambda Y\, d\mathscr{P} < \infty$. Deduce from Fatou's lemma:
$$\int_\Lambda \bigl(\limsup_{n \to \infty} X_n\bigr)\, d\mathscr{P} \ge \limsup_{n \to \infty} \int_\Lambda X_n\, d\mathscr{P}.$$
Show this is false if the condition involving $Y$ is omitted.
*7. Given the r.v. $X$ with finite $E(X)$, and $\epsilon > 0$, there exists a simple r.v. $X_\epsilon$ (see the end of Sec. 3.1) such that
$$E(|X - X_\epsilon|) < \epsilon.$$
Hence there exists a sequence of simple r.v.'s $X_m$ such that
$$\lim_{m \to \infty} E(|X - X_m|) = 0.$$
We can choose $\{X_m\}$ so that $|X_m| \le |X|$ for all $m$.
*8. For any two sets $\Lambda_1$ and $\Lambda_2$ in $\mathscr{F}$, define
$$\rho(\Lambda_1, \Lambda_2) = \mathscr{P}(\Lambda_1 \,\triangle\, \Lambda_2);$$
then $\rho$ is a pseudo-metric in the space of sets in $\mathscr{F}$; call the resulting metric space $M(\mathscr{F}, \mathscr{P})$. Prove that for each integrable r.v. $X$ the mapping of $M(\mathscr{F}, \mathscr{P})$ to $R^1$ given by $\Lambda \to \int_\Lambda X\, d\mathscr{P}$ is continuous. Similarly, the mappings on $M(\mathscr{F}, \mathscr{P}) \times M(\mathscr{F}, \mathscr{P})$ to $M(\mathscr{F}, \mathscr{P})$ given by
$$(\Lambda_1, \Lambda_2) \to \Lambda_1 \cup \Lambda_2, \quad \Lambda_1 \cap \Lambda_2, \quad \Lambda_1 \setminus \Lambda_2, \quad \Lambda_1 \,\triangle\, \Lambda_2$$
are all continuous. If (see Sec. 4.2 below)
$$\limsup_n \Lambda_n = \liminf_n \Lambda_n$$
modulo a null set, we denote the common equivalence class of these two sets by $\lim_n \Lambda_n$. Prove that in this case $\{\Lambda_n\}$ converges to $\lim_n \Lambda_n$ in the metric $\rho$. Deduce Exercise 2 above as a special case.
There is a basic relation between the abstract integral with respect to $\mathscr{P}$ over sets in $\mathscr{F}$ on the one hand, and the Lebesgue–Stieltjes integral with respect to $\mu$ over sets in $\mathscr{B}^1$ on the other, induced by each r.v. We give the version in one dimension first.

Theorem 3.2.2. Let $X$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(R^1, \mathscr{B}^1, \mu)$ according to Theorem 3.1.3 and let $f$ be Borel measurable. Then we have
$$\int_\Omega f(X(\omega))\, \mathscr{P}(d\omega) = \int_{R^1} f(x)\, \mu(dx), \tag{11}$$
provided that either side exists.


PROOF. Let $B \in \mathscr{B}^1$ and $f = 1_B$; then the left side in (11) is $\mathscr{P}(X \in B)$ and the right side is $\mu(B)$. They are equal by the definition of $\mu$ in (4) of Sec. 3.1. Now by the linearity of both integrals in (11), it will also hold if
$$f = \sum_j b_j 1_{B_j}, \tag{12}$$
namely the r.v. on $(R^1, \mathscr{B}^1, \mu)$ belonging to an arbitrary weighted partition $\{B_j; b_j\}$. For an arbitrary positive Borel measurable function $f$ we can define, as in the discussion of the abstract integral given above, a sequence $\{f_m, m \ge 1\}$ of the form (12) such that $f_m \uparrow f$ everywhere. For each of them we have
$$\int_\Omega f_m(X)\, d\mathscr{P} = \int_{R^1} f_m\, d\mu; \tag{13}$$
hence, letting $m \to \infty$ and using the monotone convergence theorem, we obtain (11) whether the limits are finite or not. This proves the theorem for $f \ge 0$, and the general case follows in the usual way.
We shall need the generalization of the preceding theorem in several dimensions. No change is necessary except for notation, which we will give in two dimensions. Instead of the $\nu$ in (5) of Sec. 3.1, let us write the "mass element" as $\mu^2(dx, dy)$, so that
$$\nu(A) = \iint_A \mu^2(dx, dy).$$

Theorem 3.2.3. Let $(X, Y)$ on $(\Omega, \mathscr{F}, \mathscr{P})$ induce the probability space $(R^2, \mathscr{B}^2, \mu^2)$ and let $f$ be a Borel measurable function of two variables. Then we have
$$\int_\Omega f(X(\omega), Y(\omega))\, \mathscr{P}(d\omega) = \iint_{R^2} f(x, y)\, \mu^2(dx, dy). \tag{14}$$

Note that $f(X, Y)$ is an r.v. by Theorem 3.1.5.

As a consequence of Theorem 3.2.2, we have: if $\mu_X$ and $F_X$ denote, respectively, the p.m. and d.f. induced by $X$, then
$$E(X) = \int_{R^1} x\, \mu_X(dx) = \int_{-\infty}^{\infty} x\, dF_X(x);$$
and more generally
$$E(f(X)) = \int_{R^1} f(x)\, \mu_X(dx) = \int_{-\infty}^{\infty} f(x)\, dF_X(x), \tag{15}$$
with the usual proviso regarding existence and finiteness.
Another important application is as follows: let $\mu^2$ be as in Theorem 3.2.3 and take $f(x, y)$ to be $x + y$ there. We obtain
$$E(X + Y) = \iint_{R^2} (x + y)\, \mu^2(dx, dy) = \iint_{R^2} x\, \mu^2(dx, dy) + \iint_{R^2} y\, \mu^2(dx, dy). \tag{16}$$
On the other hand, if we take $f(x, y)$ to be $x$ or $y$, respectively, we obtain
$$E(X) = \iint_{R^2} x\, \mu^2(dx, dy), \qquad E(Y) = \iint_{R^2} y\, \mu^2(dx, dy),$$
and consequently
$$E(X + Y) = E(X) + E(Y). \tag{17}$$
This result is a case of the linearity of $E$ given but not proved here; the proof above reduces this property in the general case of $(\Omega, \mathscr{F}, \mathscr{P})$ to the corresponding one in the special case $(R^2, \mathscr{B}^2, \mu^2)$. Such a reduction is frequently useful when there are technical difficulties in the abstract treatment.
We end this section with a discussion of "moments".

Let $a$ be real, $r$ positive; then $E(|X - a|^r)$ is called the absolute moment of $X$ of order $r$, about $a$. It may be $+\infty$; otherwise, and if $r$ is an integer, $E((X - a)^r)$ is the corresponding moment. If $\mu$ and $F$ are, respectively, the p.m. and d.f. of $X$, then we have by Theorem 3.2.2:
$$E(|X - a|^r) = \int_{R^1} |x - a|^r\, \mu(dx) = \int_{-\infty}^{\infty} |x - a|^r\, dF(x),$$
$$E((X - a)^r) = \int_{R^1} (x - a)^r\, \mu(dx) = \int_{-\infty}^{\infty} (x - a)^r\, dF(x).$$
For $r = 1$, $a = 0$, this reduces to $E(X)$, which is also called the mean of $X$. The moments about the mean are called central moments. That of order 2 is particularly important and is called the variance, $\mathrm{var}(X)$; its positive square root the standard deviation $\sigma(X)$:
$$\mathrm{var}(X) = \sigma^2(X) = E\{(X - E(X))^2\} = E(X^2) - \{E(X)\}^2.$$
We note the inequality $\sigma^2(X) \le E(X^2)$, which will be used a good deal in Chapter 5. For any positive number $p$, $X$ is said to belong to $L^p = L^p(\Omega, \mathscr{F}, \mathscr{P})$ iff $E(|X|^p) < \infty$.
The well-known inequalities of Hölder and Minkowski (see, e.g., Natanson [3]) may be written as follows. Let $X$ and $Y$ be r.v.'s, $1 < p < \infty$ and $1/p + 1/q = 1$; then
$$|E(XY)| \le E(|XY|) \le E(|X|^p)^{1/p}\, E(|Y|^q)^{1/q}, \tag{18}$$
$$\{E(|X + Y|^p)\}^{1/p} \le E(|X|^p)^{1/p} + E(|Y|^p)^{1/p}. \tag{19}$$
If $Y \equiv 1$ in (18), we obtain
$$E(|X|) \le E(|X|^p)^{1/p}; \tag{20}$$
for $p = 2$, (18) is called the Cauchy–Schwarz inequality. Replacing $|X|$ by $|X|^r$, where $0 < r < p$, and writing $r' = pr$ in (20), we obtain
$$E(|X|^r)^{1/r} \le E(|X|^{r'})^{1/r'}, \qquad 0 < r < r' < \infty. \tag{21}$$
The last will be referred to as the Liapounov inequality. It is a special case of the next inequality, of which we will sketch a proof.

Jensen's inequality. If $\varphi$ is a convex function on $R^1$, and $X$ and $\varphi(X)$ are integrable r.v.'s, then
$$\varphi(E(X)) \le E(\varphi(X)). \tag{22}$$

PROOF. Convexity means: for every positive $\lambda_1, \ldots, \lambda_n$ with sum 1 we have
$$\varphi\Bigl(\sum_{j=1}^{n} \lambda_j y_j\Bigr) \le \sum_{j=1}^{n} \lambda_j \varphi(y_j). \tag{23}$$
This is known to imply the continuity of $\varphi$, so that $\varphi(X)$ is an r.v. We shall prove (22) for a simple r.v. and refer the general case to Theorem 9.1.4. Let then $X$ take the value $y_j$ with probability $\lambda_j$, $1 \le j \le n$. Then we have by definition (1):
$$E(X) = \sum_{j=1}^{n} \lambda_j y_j, \qquad E(\varphi(X)) = \sum_{j=1}^{n} \lambda_j \varphi(y_j).$$
Thus (22) follows from (23).
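Both (21) and (22) are easy to probe numerically for a simple r.v. A sketch (in Python; the three-point r.v. and the convex function phi(x) = e^x are illustrative choices):

import math

ys = [0.0, 1.0, 3.0]       # values of a simple r.v. X
lam = [0.5, 0.3, 0.2]      # lambda_j, the probabilities attached to them

EX = sum(l * y for l, y in zip(lam, ys))
E_phiX = sum(l * math.exp(y) for l, y in zip(lam, ys))
print(math.exp(EX) <= E_phiX)            # Jensen (22) with phi = exp: True

def norm(r):
    # E(|X|^r)^(1/r); by Liapounov (21) this is increasing in r.
    return sum(l * abs(y) ** r for l, y in zip(lam, ys)) ** (1.0 / r)

print(norm(1) <= norm(2) <= norm(3))     # True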


Finally, we prove a famous inequality that is almost trivial but very
useful.

Chebyshev inequality. If ϕ is a strictly positive and increasing function


on 0, 1, ϕu D ϕu, and X is an r.v. such that E fϕXg < 1, then for
3.2 PROPERTIES OF MATHEMATICAL EXPECTATION 51

each u > 0:
E fϕXg
P fjXj ½ ug  .
ϕu
PROOF. We have by the mean value theorem:
 
E fϕXg D ϕXdP ½ ϕXdP ½ ϕuP fjXj ½ ug
 fjXj½ug

from which the inequality follows.

The most familiar application is when ϕu D jujp for 0 < p < 1, so
that the inequality yields an upper bound for the “tail” probability in terms of
an absolute moment.
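A sketch comparing the Chebyshev bound with an empirical tail probability (in Python; phi(u) = u^2 and the Gaussian sample are illustrative choices):

import random

random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(200_000)]

E_phi = sum(x * x for x in xs) / len(xs)        # E(phi(X)) with phi(u) = u^2
for u in (1.0, 2.0, 3.0):
    tail = sum(abs(x) >= u for x in xs) / len(xs)
    print(u, round(tail, 4), "<=", round(E_phi / u ** 2, 4))

The bound is crude for large u (the true Gaussian tail decays much faster), which is exactly what the generality of the inequality costs.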

EXERCISES

9. Another proof of (14): verify it first for simple r.v.'s and then use Exercise 7 of Sec. 3.2.
10. Prove that if $0 \le r < r'$ and $E(|X|^{r'}) < \infty$, then $E(|X|^r) < \infty$. Also that $E(|X|^r) < \infty$ if and only if $E(|X - a|^r) < \infty$ for every $a$.
*11. If $E(X^2) = 1$ and $E(|X|) \ge a > 0$, then $\mathscr{P}(|X| \ge \lambda a) \ge (1 - \lambda)^2 a^2$ for $0 \le \lambda \le 1$.
*12. If $X \ge 0$ and $Y \ge 0$, $p \ge 0$, then $E((X + Y)^p) \le 2^p\{E(X^p) + E(Y^p)\}$. If $p > 1$, the factor $2^p$ may be replaced by $2^{p-1}$. If $0 \le p \le 1$, it may be replaced by 1.
*13. If $X_j \ge 0$, then
$$E\Bigl\{\Bigl(\sum_{j=1}^{n} X_j\Bigr)^p\Bigr\} \;\le\; \text{or} \;\ge\; \sum_{j=1}^{n} E(X_j^p)$$
according as $p \le 1$ or $p \ge 1$.
*14. If $p > 1$, we have
$$\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p \le \frac{1}{n} \sum_{j=1}^{n} |X_j|^p$$
and so
$$E\Bigl\{\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p\Bigr\} \le \frac{1}{n} \sum_{j=1}^{n} E(|X_j|^p);$$
we have also
$$E\Bigl\{\Bigl|\frac{1}{n} \sum_{j=1}^{n} X_j\Bigr|^p\Bigr\} \le \Bigl\{\frac{1}{n} \sum_{j=1}^{n} E(|X_j|^p)^{1/p}\Bigr\}^p.$$
Compare the inequalities.
15. If $p > 0$, $E(|X|^p) < \infty$, then $x^p\, \mathscr{P}(|X| > x) = o(1)$ as $x \to \infty$. Conversely, if $x^p\, \mathscr{P}(|X| > x) = o(1)$, then $E(|X|^{p-\epsilon}) < \infty$ for $0 < \epsilon < p$.
*16. For any d.f. $F$ and any $a \ge 0$, we have
$$\int_{-\infty}^{\infty} [F(x + a) - F(x)]\, dx = a.$$
17. If $F$ is a d.f. such that $F(0-) = 0$, then
$$\int_0^{\infty} \{1 - F(x)\}\, dx = \int_0^{\infty} x\, dF(x) \le +\infty.$$
Thus if $X$ is a positive r.v., then we have
$$E(X) = \int_0^{\infty} \mathscr{P}(X > x)\, dx = \int_0^{\infty} \mathscr{P}(X \ge x)\, dx.$$
18. Prove that $\int_{-\infty}^{\infty} |x|\, dF(x) < \infty$ if and only if
$$\int_{-\infty}^{0} F(x)\, dx < \infty \quad\text{and}\quad \int_0^{\infty} [1 - F(x)]\, dx < \infty.$$
*19. If $\{X_n\}$ is a sequence of identically distributed r.v.'s with finite mean, then
$$\lim_{n \to \infty} \frac{1}{n}\, E\bigl\{\max_{1 \le j \le n} |X_j|\bigr\} = 0.$$
[HINT: Use Exercise 17 to express the mean of the maximum.]


*20. For $r > 1$, we have
$$\int_0^{\infty} \frac{1}{u^r}\, E(X \wedge u^r)\, du = \frac{r}{r - 1}\, E(X^{1/r}).$$
[HINT: By Exercise 17,
$$E(X \wedge u^r) = \int_0^{u^r} \mathscr{P}(X > x)\, dx = \int_0^{u} \mathscr{P}(X^{1/r} > v)\, r v^{r-1}\, dv;$$
substitute and invert the order of the repeated integrations.]


3.3 Independence

We shall now introduce a fundamental new concept peculiar to the theory of probability, that of "(stochastic) independence".

DEFINITION OF INDEPENDENCE. The r.v.'s $\{X_j, 1 \le j \le n\}$ are said to be (totally) independent iff for any linear Borel sets $\{B_j, 1 \le j \le n\}$ we have
$$\mathscr{P}\Bigl(\bigcap_{j=1}^{n} \{X_j \in B_j\}\Bigr) = \prod_{j=1}^{n} \mathscr{P}(X_j \in B_j). \tag{1}$$
The r.v.'s of an infinite family are said to be independent iff those in every finite subfamily are. They are said to be pairwise independent iff every two of them are independent.

Note that (1) implies that the r.v.'s in every subset of $\{X_j, 1 \le j \le n\}$ are also independent, since we may take some of the $B_j$'s as $R^1$. On the other hand, (1) is implied by the apparently weaker hypothesis: for every set of real numbers $\{x_j, 1 \le j \le n\}$:
$$\mathscr{P}\Bigl(\bigcap_{j=1}^{n} \{X_j \le x_j\}\Bigr) = \prod_{j=1}^{n} \mathscr{P}(X_j \le x_j). \tag{2}$$
The proof of the equivalence of (1) and (2) is left as an exercise. In terms of the p.m. $\mu^n$ induced by the random vector $(X_1, \ldots, X_n)$ on $(R^n, \mathscr{B}^n)$, and the p.m.'s $\{\mu_j, 1 \le j \le n\}$ induced by each $X_j$ on $(R^1, \mathscr{B}^1)$, the relation (1) may be written as
$$\mu^n(B_1 \times \cdots \times B_n) = \prod_{j=1}^{n} \mu_j(B_j), \tag{3}$$
where $B_1 \times \cdots \times B_n$ is the product set discussed in Sec. 3.1. Finally, we may introduce the $n$-dimensional distribution function corresponding to $\mu^n$, which is defined by the left side of (2) or in alternative notation:
$$F(x_1, \ldots, x_n) = \mathscr{P}\{X_j \le x_j,\ 1 \le j \le n\} = \mu^n\bigl((-\infty, x_1] \times \cdots \times (-\infty, x_n]\bigr);$$
then (2) may be written as
$$F(x_1, \ldots, x_n) = \prod_{j=1}^{n} F_j(x_j).$$
From now on, when the probability space $(\Omega, \mathscr{F}, \mathscr{P})$ is fixed, a set in $\mathscr{F}$ will also be called an event. The events $\{E_j, 1 \le j \le n\}$ are said to be independent iff their indicators are independent; this is equivalent to: for any subset $\{j_1, \ldots, j_\ell\}$ of $\{1, \ldots, n\}$, we have
$$\mathscr{P}\Bigl(\bigcap_{k=1}^{\ell} E_{j_k}\Bigr) = \prod_{k=1}^{\ell} \mathscr{P}(E_{j_k}). \tag{4}$$

Theorem 3.3.1. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s and $\{f_j, 1 \le j \le n\}$ are Borel measurable functions, then $\{f_j(X_j), 1 \le j \le n\}$ are independent r.v.'s.

PROOF. Let $A_j \in \mathscr{B}^1$; then $f_j^{-1}(A_j) \in \mathscr{B}^1$ by the definition of a Borel measurable function. By Theorem 3.1.1, we have
$$\bigcap_{j=1}^{n} \{f_j(X_j) \in A_j\} = \bigcap_{j=1}^{n} \{X_j \in f_j^{-1}(A_j)\}.$$
Hence we have
$$\mathscr{P}\Bigl(\bigcap_{j=1}^{n} \{f_j(X_j) \in A_j\}\Bigr) = \mathscr{P}\Bigl(\bigcap_{j=1}^{n} \{X_j \in f_j^{-1}(A_j)\}\Bigr) = \prod_{j=1}^{n} \mathscr{P}\{X_j \in f_j^{-1}(A_j)\} = \prod_{j=1}^{n} \mathscr{P}\{f_j(X_j) \in A_j\}.$$
This being true for every choice of the $A_j$'s, the $f_j(X_j)$'s are independent by definition.
The proof of the next theorem is similar and is left as an exercise.

Theorem 3.3.2. Let $1 \le n_1 < n_2 < \cdots < n_k = n$; $f_1$ a Borel measurable function of $n_1$ variables, $f_2$ one of $n_2 - n_1$ variables, $\ldots$, $f_k$ one of $n_k - n_{k-1}$ variables. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s, then the $k$ r.v.'s
$$f_1(X_1, \ldots, X_{n_1}), \quad f_2(X_{n_1+1}, \ldots, X_{n_2}), \quad \ldots, \quad f_k(X_{n_{k-1}+1}, \ldots, X_{n_k})$$
are independent.

Theorem 3.3.3. If $X$ and $Y$ are independent and both have finite expectations, then
$$E(XY) = E(X)\,E(Y). \tag{5}$$


PROOF. We give two proofs in detail of this important result to illustrate the methods. Cf. the two proofs of (14) in Sec. 3.2, one indicated there, and one in Exercise 9 of Sec. 3.2.

First proof. Suppose first that the two r.v.'s $X$ and $Y$ are both discrete, belonging respectively to the weighted partitions $\{\Lambda_j; c_j\}$ and $\{M_k; d_k\}$ such that $\Lambda_j = \{X = c_j\}$, $M_k = \{Y = d_k\}$. Thus
$$E(X) = \sum_j c_j\, \mathscr{P}(\Lambda_j), \qquad E(Y) = \sum_k d_k\, \mathscr{P}(M_k).$$
Now we have
$$\Omega = \Bigl(\bigcup_j \Lambda_j\Bigr) \cap \Bigl(\bigcup_k M_k\Bigr) = \bigcup_{j,k} (\Lambda_j \cap M_k)$$
and
$$X(\omega)Y(\omega) = c_j d_k \quad\text{if } \omega \in \Lambda_j \cap M_k.$$
Hence the r.v. $XY$ is discrete and belongs to the superposed partition $\{\Lambda_j \cap M_k; c_j d_k\}$ with both the $j$ and the $k$ varying independently of each other. Since $X$ and $Y$ are independent, we have for every $j$ and $k$:
$$\mathscr{P}(\Lambda_j \cap M_k) = \mathscr{P}(X = c_j;\ Y = d_k) = \mathscr{P}(X = c_j)\,\mathscr{P}(Y = d_k) = \mathscr{P}(\Lambda_j)\,\mathscr{P}(M_k);$$
and consequently by definition (1) of Sec. 3.2:
$$E(XY) = \sum_{j,k} c_j d_k\, \mathscr{P}(\Lambda_j \cap M_k) = \Bigl\{\sum_j c_j\, \mathscr{P}(\Lambda_j)\Bigr\}\Bigl\{\sum_k d_k\, \mathscr{P}(M_k)\Bigr\} = E(X)\,E(Y).$$
Thus (5) is true in this case.

Now let $X$ and $Y$ be arbitrary positive r.v.'s with finite expectations. Then, according to the discussion at the beginning of Sec. 3.2, there are discrete r.v.'s $X_m$ and $Y_m$ such that $E(X_m) \uparrow E(X)$ and $E(Y_m) \uparrow E(Y)$. Furthermore, for each $m$, $X_m$ and $Y_m$ are independent. Note that for the independence of discrete r.v.'s it is sufficient to verify the relation (1) when the $B_j$'s are their possible values and "$\in$" is replaced by "$=$" (why?). Here we have
$$\begin{aligned}
\mathscr{P}\Bigl(X_m = \frac{n}{2^m};\ Y_m = \frac{n'}{2^m}\Bigr)
&= \mathscr{P}\Bigl(\frac{n}{2^m} \le X < \frac{n+1}{2^m};\ \frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Bigr)\\
&= \mathscr{P}\Bigl(\frac{n}{2^m} \le X < \frac{n+1}{2^m}\Bigr)\,\mathscr{P}\Bigl(\frac{n'}{2^m} \le Y < \frac{n'+1}{2^m}\Bigr)\\
&= \mathscr{P}\Bigl(X_m = \frac{n}{2^m}\Bigr)\,\mathscr{P}\Bigl(Y_m = \frac{n'}{2^m}\Bigr).
\end{aligned}$$
The independence of $X_m$ and $Y_m$ is also a consequence of Theorem 3.3.1, since $X_m = [2^m X]/2^m$, where $[X]$ denotes the greatest integer in $X$. Finally, it is clear that $X_m Y_m$ is increasing with $m$ and
$$0 \le XY - X_m Y_m = X(Y - Y_m) + Y_m(X - X_m) \to 0.$$
Hence, by the monotone convergence theorem, we conclude that
$$E(XY) = \lim_{m \to \infty} E(X_m Y_m) = \lim_{m \to \infty} E(X_m)\,E(Y_m) = \lim_{m \to \infty} E(X_m) \lim_{m \to \infty} E(Y_m) = E(X)\,E(Y).$$
Thus (5) is true also in this case. For the general case, we use (2) and (3) of Sec. 3.2 and observe that the independence of $X$ and $Y$ implies that of $X^+$ and $Y^+$; $X^-$ and $Y^-$; and so on. This again can be seen directly or as a consequence of Theorem 3.3.1. Hence we have, under our finiteness hypothesis:
$$\begin{aligned}
E(XY) &= E((X^+ - X^-)(Y^+ - Y^-))\\
&= E(X^+ Y^+ - X^+ Y^- - X^- Y^+ + X^- Y^-)\\
&= E(X^+ Y^+) - E(X^+ Y^-) - E(X^- Y^+) + E(X^- Y^-)\\
&= E(X^+)E(Y^+) - E(X^+)E(Y^-) - E(X^-)E(Y^+) + E(X^-)E(Y^-)\\
&= \{E(X^+) - E(X^-)\}\{E(Y^+) - E(Y^-)\} = E(X)\,E(Y).
\end{aligned}$$
The first proof is completed.
Second proof. Consider the random vector $(X, Y)$ and let the p.m. induced by it be $\mu^2(dx, dy)$. Then we have by Theorem 3.2.3:
$$E(XY) = \int_\Omega XY\, d\mathscr{P} = \iint_{R^2} xy\, \mu^2(dx, dy).$$
By (3), the last integral is equal to
$$\iint_{R^2} xy\, \mu_1(dx)\, \mu_2(dy) = \int_{R^1} x\, \mu_1(dx) \int_{R^1} y\, \mu_2(dy) = E(X)\,E(Y),$$
finishing the proof! Observe that we are using here a very simple form of Fubini's theorem (see below). Indeed, the second proof appears to be so much shorter only because we are relying on the theory of "product measure" $\mu^2 = \mu_1 \times \mu_2$ on $(R^2, \mathscr{B}^2)$. This is another illustration of the method of reduction mentioned in connection with the proof of (17) in Sec. 3.2.
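Relation (5), and its failure without independence, can be seen in a quick simulation (in Python; the uniform variables and the deliberately dependent pair $(X, X)$ are illustrative choices):

import random

random.seed(3)
n = 200_000
X = [random.random() for _ in range(n)]
Y = [random.random() for _ in range(n)]   # generated independently of X

def mean(zs):
    return sum(zs) / len(zs)

# Independent case: E(XY) and E(X)E(Y) agree, both near 1/4.
print(round(mean([x * y for x, y in zip(X, Y)]), 4), round(mean(X) * mean(Y), 4))

# Dependent case Y = X: E(X^2) is near 1/3 while E(X)^2 is near 1/4.
print(round(mean([x * x for x in X]), 4), round(mean(X) ** 2, 4))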


Corollary. If $\{X_j, 1 \le j \le n\}$ are independent r.v.'s with finite expectations, then
$$E\Bigl(\prod_{j=1}^{n} X_j\Bigr) = \prod_{j=1}^{n} E(X_j). \tag{6}$$

This follows at once by induction from (5), provided we observe that the two r.v.'s
$$\prod_{j=1}^{k} X_j \quad\text{and}\quad \prod_{j=k+1}^{n} X_j$$
are independent for each $k$, $1 \le k \le n - 1$. A rigorous proof of this fact may be supplied by Theorem 3.3.2.

Do independent random variables exist? Here we can take the cue from
the intuitive background of probability theory which not only has given rise
historically to this branch of mathematical discipline, but remains a source of
inspiration, inculcating a way of thinking peculiar to the discipline. It may
be said that no one could have learned the subject properly without acquiring
some feeling for the intuitive content of the concept of stochastic indepen-
dence, and through it, certain degrees of dependence. Briefly then: events are
determined by the outcomes of random trials. If an unbiased coin is tossed
and the two possible outcomes are recorded as 0 and 1, this is an r.v., and it
takes these two values with roughly the probabilities $\frac{1}{2}$ each. Repeated tossing
will produce a sequence of outcomes. If now a die is cast, the outcome may
be similarly represented by an r.v. taking the six values 1 to 6; again this
may be repeated to produce a sequence. Next we may draw a card from a
pack or a ball from an urn, or take a measurement of a physical quantity
sampled from a given population, or make an observation of some fortuitous
natural phenomenon, the outcomes in the last two cases being r.v.’s taking
some rational values in terms of certain units; and so on. Now it is very
easy to conceive of undertaking these various trials under conditions such
that their respective outcomes do not appreciably affect each other; indeed it
would take more imagination to conceive the opposite! In this circumstance,
idealized, the trials are carried out “independently of one another” and the
corresponding r.v.’s are “independent” according to definition. We have thus
“constructed” sets of independent r.v.’s in varied contexts and with various
distributions (although they are always discrete on realistic grounds), and the
whole process can be continued indefinitely.
Can such a construction be made rigorous? We begin by an easy special
case.
58 RANDOM VARIABLE. EXPECTATION. INDEPENDENCE

Example 1. Let $n \ge 2$ and $(\Omega_j, \mathcal{S}_j, \mathcal{P}_j)$, $1 \le j \le n$, be $n$ discrete probability spaces. We define the product space

$$
\Omega^n = \Omega_1 \times \cdots \times \Omega_n \quad (n \text{ factors})
$$

to be the space of all ordered $n$-tuples $\omega^n = (\omega_1, \ldots, \omega_n)$, where each $\omega_j \in \Omega_j$. The product B.F. $\mathcal{S}^n$ is simply the collection of all subsets of $\Omega^n$, just as $\mathcal{S}_j$ is that for $\Omega_j$. Recall that (Example 1 of Sec. 2.2) the p.m. $\mathcal{P}_j$ is determined by its value for each point of $\Omega_j$. Since $\Omega^n$ is also a countable set, we may define a p.m. $\mathcal{P}^n$ on $\mathcal{S}^n$ by the following assignment:

$$
\mathcal{P}^n(\{\omega^n\}) = \prod_{j=1}^{n} \mathcal{P}_j(\{\omega_j\}), \tag{7}
$$

namely, to the $n$-tuple $(\omega_1, \ldots, \omega_n)$ the probability to be assigned is the product of the probabilities originally assigned to each component $\omega_j$ by $\mathcal{P}_j$. This p.m. will be called the product measure derived from the p.m.'s $\{\mathcal{P}_j, 1 \le j \le n\}$ and denoted by $\times_{j=1}^{n} \mathcal{P}_j$. It is trivial to verify that this is indeed a p.m. Furthermore, it has the following product property, extending its definition (7): if $S_j \in \mathcal{S}_j$, $1 \le j \le n$, then

$$
\mathcal{P}^n\Bigl(\times_{j=1}^{n} S_j\Bigr) = \prod_{j=1}^{n} \mathcal{P}_j(S_j). \tag{8}
$$

To see this, we observe that the left side is, by definition, equal to

$$
\sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \mathcal{P}^n(\{(\omega_1, \ldots, \omega_n)\})
= \sum_{\omega_1 \in S_1} \cdots \sum_{\omega_n \in S_n} \prod_{j=1}^{n} \mathcal{P}_j(\{\omega_j\})
= \prod_{j=1}^{n} \Bigl\{\sum_{\omega_j \in S_j} \mathcal{P}_j(\{\omega_j\})\Bigr\}
= \prod_{j=1}^{n} \mathcal{P}_j(S_j),
$$

the second equation being a matter of simple algebra.


Now let $X_j$ be an r.v. (namely an arbitrary function) on $\Omega_j$; $B_j$ be an arbitrary Borel set; and $S_j = X_j^{-1}(B_j)$, namely:

$$
S_j = \{\omega_j \in \Omega_j : X_j(\omega_j) \in B_j\}
$$

so that $S_j \in \mathcal{S}_j$. We have then by (8):

$$
\mathcal{P}^n\Bigl(\times_{j=1}^{n} [X_j \in B_j]\Bigr) = \mathcal{P}^n\Bigl(\times_{j=1}^{n} S_j\Bigr) = \prod_{j=1}^{n} \mathcal{P}_j(S_j) = \prod_{j=1}^{n} \mathcal{P}_j\{X_j \in B_j\}. \tag{9}
$$

To each function $X_j$ on $\Omega_j$ let correspond the function $\tilde{X}_j$ on $\Omega^n$ defined below, in which $\omega = (\omega_1, \ldots, \omega_n)$ and each "coordinate" $\omega_j$ is regarded as a function of the point $\omega$:

$$
\forall \omega \in \Omega^n : \quad \tilde{X}_j(\omega) = X_j(\omega_j).
$$
Then we have

$$
\bigcap_{j=1}^{n} \{\omega : \tilde{X}_j(\omega) \in B_j\} = \times_{j=1}^{n} \{\omega_j : X_j(\omega_j) \in B_j\}
$$

since

$$
\{\omega : \tilde{X}_j(\omega) \in B_j\} = \Omega_1 \times \cdots \times \Omega_{j-1} \times \{\omega_j : X_j(\omega_j) \in B_j\} \times \Omega_{j+1} \times \cdots \times \Omega_n.
$$

It follows from (9) that

$$
\mathcal{P}^n\Bigl(\bigcap_{j=1}^{n} [\tilde{X}_j \in B_j]\Bigr) = \prod_{j=1}^{n} \mathcal{P}^n\{\tilde{X}_j \in B_j\}.
$$

Therefore the r.v.'s $\{\tilde{X}_j, 1 \le j \le n\}$ are independent.
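For concreteness, here is a small editorial sketch in Python (the two spaces and their point masses are made up for illustration) that builds the product measure (7) for two discrete spaces and checks the product property (8) exactly:

```python
from itertools import product
from fractions import Fraction as F

# Two discrete probability spaces given by their point masses (illustrative).
P1 = {"a": F(1, 3), "b": F(2, 3)}
P2 = {0: F(1, 4), 1: F(1, 4), 2: F(1, 2)}

# Product measure on Omega_1 x Omega_2, as in (7): the mass of a pair is
# the product of the masses of its coordinates.
Pn = {(w1, w2): p1 * p2 for (w1, p1), (w2, p2) in product(P1.items(), P2.items())}
assert sum(Pn.values()) == 1          # P^n is again a p.m.

# Product property (8) for S1 = {"a"}, S2 = {1, 2}:
S1, S2 = {"a"}, {1, 2}
lhs = sum(p for (w1, w2), p in Pn.items() if w1 in S1 and w2 in S2)
rhs = sum(P1[w] for w in S1) * sum(P2[w] for w in S2)
assert lhs == rhs                     # exact equality, thanks to Fractions
```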

Example 2. Let $U^n$ be the $n$-dimensional cube (immaterial whether it is closed or not):

$$
U^n = \{(x_1, \ldots, x_n) : 0 \le x_j \le 1,\ 1 \le j \le n\}.
$$

The trace on $U^n$ of $(R^n, \mathcal{B}^n, m^n)$, where $R^n$ is the $n$-dimensional Euclidean space, $\mathcal{B}^n$ and $m^n$ the usual Borel field and measure, is a probability space. The p.m. $m^n$ on $U^n$ is a product measure having the property analogous to (8). Let $\{f_j, 1 \le j \le n\}$ be $n$ Borel measurable functions of one variable, and

$$
X_j(x_1, \ldots, x_n) = f_j(x_j).
$$

Then $\{X_j, 1 \le j \le n\}$ are independent r.v.'s. In particular if $f_j(x_j) \equiv x_j$, we obtain the $n$ coordinate variables in the cube. The reader may recall the term "independent variables" used in calculus, particularly for integration in several variables. The two usages have some accidental rapport.

Example 3. The point of Example 2 is that there is a ready-made product measure there. Now on $(R^n, \mathcal{B}^n)$ it is possible to construct such a one based on given p.m.'s on $(R^1, \mathcal{B}^1)$. Let these be $\{\mu_j, 1 \le j \le n\}$; we define $\mu^n$ for product sets, in analogy with (8), as follows:

$$
\mu^n\Bigl(\times_{j=1}^{n} B_j\Bigr) = \prod_{j=1}^{n} \mu_j(B_j).
$$

It remains to extend this definition to all of $\mathcal{B}^n$, or, more logically speaking, to prove that there exists a p.m. $\mu^n$ on $\mathcal{B}^n$ that has the above "product property". The situation is somewhat more complicated than in Example 1, just as Example 3 in Sec. 2.2 is more complicated than Example 1 there. Indeed, the required construction is exactly that of the corresponding Lebesgue–Stieltjes measure in $n$ dimensions. This will be subsumed in the next theorem. Assuming that it has been accomplished, then sets of $n$ independent r.v.'s can be defined just as in Example 2.
Example 4. Can we construct r.v.'s on the probability space $(\mathcal{U}, \mathcal{B}, m)$ itself, without going to a product space? Indeed we can, but only by imbedding a product structure in $\mathcal{U}$. The simplest case will now be described and we shall return to it in the next chapter.

For each real number $x$ in $(0, 1]$, consider its binary digital expansion

$$
x = .\epsilon_1\epsilon_2 \cdots \epsilon_n \cdots = \sum_{n=1}^{\infty} \frac{\epsilon_n}{2^n}, \quad \text{each } \epsilon_n = 0 \text{ or } 1. \tag{10}
$$

This expansion is unique except when $x$ is of the form $m/2^n$; the set of such $x$ is countable and so of probability zero, hence whatever we decide to do with them will be immaterial for our purposes. For the sake of definiteness, let us agree that only expansions with infinitely many digits "1" are used. Now each digit $\epsilon_j$ of $x$ is a function of $x$ taking the values 0 and 1 on two Borel sets. Hence they are r.v.'s. Let $\{c_j, j \ge 1\}$ be a given sequence of 0's and 1's. Then the set

$$
\{x : \epsilon_j(x) = c_j, 1 \le j \le n\} = \bigcap_{j=1}^{n} \{x : \epsilon_j(x) = c_j\}
$$

is the set of numbers $x$ whose first $n$ digits are the given $c_j$'s, thus

$$
x = .c_1 c_2 \cdots c_n \epsilon_{n+1} \epsilon_{n+2} \cdots
$$

with the digits from the $(n+1)$st on completely arbitrary. It is clear that this set is just an interval of length $1/2^n$, hence of probability $1/2^n$. On the other hand for each $j$, the set $\{x : \epsilon_j(x) = c_j\}$ has probability $\frac{1}{2}$ for a similar reason. We have therefore

$$
\mathcal{P}\{\epsilon_j = c_j, 1 \le j \le n\} = \frac{1}{2^n} = \prod_{j=1}^{n} \frac{1}{2} = \prod_{j=1}^{n} \mathcal{P}\{\epsilon_j = c_j\}.
$$

This being true for every choice of the $c_j$'s, the r.v.'s $\{\epsilon_j, j \ge 1\}$ are independent. Let $\{f_j, j \ge 1\}$ be arbitrary functions with domain the two points $\{0, 1\}$; then $\{f_j(\epsilon_j), j \ge 1\}$ are also independent r.v.'s.

This example seems extremely special, but easy extensions are at hand (see Exercises 13, 14, and 15 below).
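A direct empirical check of this independence is easy. The following editorial Python sketch (sample size and digit positions are arbitrary) extracts the first three binary digits of uniform samples and compares the joint frequencies with the product $\bigl(\frac{1}{2}\bigr)^3$:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random(200_000)                       # uniform samples from (0, 1)

# The j-th binary digit of x is floor(x * 2^j) mod 2, as in (10).
eps = [(np.floor(x * 2.0**j) % 2).astype(int) for j in (1, 2, 3)]

# P{eps_j = c_j, 1 <= j <= 3} should equal (1/2)^3 for every pattern.
for c in [(0, 0, 0), (1, 0, 1), (1, 1, 1)]:
    freq = np.mean((eps[0] == c[0]) & (eps[1] == c[1]) & (eps[2] == c[2]))
    print(c, freq)                            # each frequency is close to 0.125
```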

We are now ready to state and prove the fundamental existence theorem of product measures.

Theorem 3.3.4. Let a finite or infinite sequence of p.m.'s $\{\mu_j\}$ on $(R^1, \mathcal{B}^1)$, or equivalently their d.f.'s, be given. There exists a probability space $(\Omega, \mathcal{F}, \mathcal{P})$ and a sequence of independent r.v.'s $\{X_j\}$ defined on it such that for each $j$, $\mu_j$ is the p.m. of $X_j$.

PROOF. Without loss of generality we may suppose that the given sequence is infinite. (Why?) For each $n$, let $(\Omega_n, \mathcal{F}_n, \mathcal{P}_n)$ be a probability space
in which there exists an r.v. $X_n$ with $\mu_n$ as its p.m. Indeed this is possible if we take $(\Omega_n, \mathcal{F}_n, \mathcal{P}_n)$ to be $(R^1, \mathcal{B}^1, \mu_n)$ and $X_n$ to be the identical function of the sample point $x$ in $R^1$, now to be written as $\omega_n$ (cf. Exercise 3 of Sec. 3.1). Now define the infinite product space

$$
\Omega = \times_{n=1}^{\infty} \Omega_n
$$

on the collection of all "points" $\omega = \{\omega_1, \omega_2, \ldots, \omega_n, \ldots\}$, where for each $n$, $\omega_n$ is a point of $\Omega_n$. A subset $E$ of $\Omega$ will be called a "finite-product set" iff it is of the form

$$
E = \times_{n=1}^{\infty} F_n, \tag{11}
$$

where each $F_n \in \mathcal{F}_n$ and all but a finite number of these $F_n$'s are equal to the corresponding $\Omega_n$'s. Thus $\omega \in E$ if and only if $\omega_n \in F_n$, $n \ge 1$, but this is actually a restriction only for a finite number of values of $n$. Let the collection of subsets of $\Omega$, each of which is the union of a finite number of disjoint finite-product sets, be $\mathcal{F}_0$. It is easy to see that the collection $\mathcal{F}_0$ is closed with respect to complementation and pairwise intersection, hence it is a field. We shall take the $\mathcal{F}$ in the theorem to be the B.F. generated by $\mathcal{F}_0$. This $\mathcal{F}$ is called the product B.F. of the sequence $\{\mathcal{F}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathcal{F}_n$.

We define a set function $\mathcal{P}$ on $\mathcal{F}_0$ as follows. First, for each finite-product set such as the $E$ given in (11) we set

$$
\mathcal{P}(E) = \prod_{n=1}^{\infty} \mathcal{P}_n(F_n), \tag{12}
$$

where all but a finite number of the factors on the right side are equal to one. Next, if $E \in \mathcal{F}_0$ and

$$
E = \bigcup_{k=1}^{n} E_k,
$$

where the $E_k$'s are disjoint finite-product sets, we put

$$
\mathcal{P}(E) = \sum_{k=1}^{n} \mathcal{P}(E_k). \tag{13}
$$

If a given set $E$ in $\mathcal{F}_0$ has two representations of the form above, then it is not difficult to see though it is tedious to explain (try drawing some pictures!) that the two definitions of $\mathcal{P}(E)$ agree. Hence the set function $\mathcal{P}$ is uniquely
defined on $\mathcal{F}_0$; it is clearly positive with $\mathcal{P}(\Omega) = 1$, and it is finitely additive on $\mathcal{F}_0$ by definition. In order to verify countable additivity it is sufficient to verify the axiom of continuity, by the remark after Theorem 2.2.1. Proceeding by contraposition, suppose that there exist a $\delta > 0$ and a sequence of sets $\{C_n, n \ge 1\}$ in $\mathcal{F}_0$ such that for each $n$, we have $C_n \supset C_{n+1}$ and $\mathcal{P}(C_n) \ge \delta > 0$; we shall prove that $\bigcap_{n=1}^{\infty} C_n \ne \emptyset$. Note that each set $E$ in $\mathcal{F}_0$, as well as each finite-product set, is "determined" by a finite number of coordinates in the sense that there exists a finite integer $k$, depending only on $E$, such that if $\omega = (\omega_1, \omega_2, \ldots)$ and $\omega' = (\omega_1', \omega_2', \ldots)$ are two points of $\Omega$ with $\omega_j = \omega_j'$ for $1 \le j \le k$, then either both $\omega$ and $\omega'$ belong to $E$ or neither of them does. To simplify the notation and with no real loss of generality (why?) we may suppose that for each $n$, the set $C_n$ is determined by the first $n$ coordinates. Given $\omega_1^0$, for any subset $E$ of $\Omega$ let us write $(E \mid \omega_1^0)$ for $\Omega_1 \times E_1$, where $E_1$ is the set of points $(\omega_2, \omega_3, \ldots)$ in $\times_{n=2}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2, \omega_3, \ldots) \in E$. If $\omega_1^0$ does not appear as first coordinate of any point in $E$, then $(E \mid \omega_1^0) = \emptyset$. Note that if $E \in \mathcal{F}_0$, then $(E \mid \omega_1^0) \in \mathcal{F}_0$ for each $\omega_1^0$. We claim that there exists an $\omega_1^0$ such that for every $n$, we have $\mathcal{P}(C_n \mid \omega_1^0) \ge \delta/2$. To see this we begin with the equation

$$
\mathcal{P}(C_n) = \int_{\Omega_1} \mathcal{P}(C_n \mid \omega_1)\, \mathcal{P}_1(d\omega_1). \tag{14}
$$

This is trivial if $C_n$ is a finite-product set by definition (12), and follows by addition for any $C_n$ in $\mathcal{F}_0$. Now put $B_n = \{\omega_1 : \mathcal{P}(C_n \mid \omega_1) \ge \delta/2\}$, a subset of $\Omega_1$; then it follows that

$$
\delta \le \int_{B_n} 1\, \mathcal{P}_1(d\omega_1) + \int_{B_n^c} \frac{\delta}{2}\, \mathcal{P}_1(d\omega_1)
$$

and so $\mathcal{P}_1(B_n) \ge \delta/2$ for every $n \ge 1$. Since $B_n$ is decreasing with $C_n$, we have $\mathcal{P}_1(\bigcap_{n=1}^{\infty} B_n) \ge \delta/2$. Choose any $\omega_1^0$ in $\bigcap_{n=1}^{\infty} B_n$. Then $\mathcal{P}(C_n \mid \omega_1^0) \ge \delta/2$. Repeating the argument for the set $(C_n \mid \omega_1^0)$, we see that there exists an $\omega_2^0$ such that for every $n$, $\mathcal{P}(C_n \mid \omega_1^0, \omega_2^0) \ge \delta/4$, where $(C_n \mid \omega_1^0, \omega_2^0) = ((C_n \mid \omega_1^0) \mid \omega_2^0)$ is of the form $\Omega_1 \times \Omega_2 \times E_3$ and $E_3$ is the set of $(\omega_3, \omega_4, \ldots)$ in $\times_{n=3}^{\infty} \Omega_n$ such that $(\omega_1^0, \omega_2^0, \omega_3, \omega_4, \ldots) \in C_n$; and so forth by induction. Thus for each $k \ge 1$, there exists $\omega_k^0$ such that

$$
\forall n : \quad \mathcal{P}(C_n \mid \omega_1^0, \ldots, \omega_k^0) \ge \frac{\delta}{2^k}.
$$

Consider the point $\omega^0 = (\omega_1^0, \omega_2^0, \ldots, \omega_n^0, \ldots)$. Since $(C_k \mid \omega_1^0, \ldots, \omega_k^0) \ne \emptyset$, there is a point in $C_k$ whose first $k$ coordinates are the same as those of $\omega^0$; since $C_k$ is determined by the first $k$ coordinates, it follows that $\omega^0 \in C_k$. This is true for all $k$, hence $\omega^0 \in \bigcap_{k=1}^{\infty} C_k$, as was to be shown.
We have thus proved that $\mathcal{P}$ as defined on $\mathcal{F}_0$ is a p.m. The next theorem, which is a generalization of the extension theorem discussed in Theorem 2.2.2 and whose proof can be found in Halmos [4], ensures that it can be extended to $\mathcal{F}$ as desired. This extension is called the product measure of the sequence $\{\mathcal{P}_n, n \ge 1\}$ and denoted by $\times_{n=1}^{\infty} \mathcal{P}_n$ with domain the product field $\times_{n=1}^{\infty} \mathcal{F}_n$.

Theorem 3.3.5. Let $\mathcal{F}_0$ be a field of subsets of an abstract space $\Omega$, and $\mathcal{P}$ a p.m. on $\mathcal{F}_0$. There exists a unique p.m. on the B.F. generated by $\mathcal{F}_0$ that agrees with $\mathcal{P}$ on $\mathcal{F}_0$.

The uniqueness is proved in Theorem 2.2.3.

Returning to Theorem 3.3.4, it remains to show that the r.v.'s $\{\omega_j\}$ are independent. For each $k \ge 2$, the independence of $\{\omega_j, 1 \le j \le k\}$ is a consequence of the definition in (12); this being so for every $k$, the independence of the infinite sequence follows by definition.
We now give a statement of Fubini's theorem for product measures, which has already been used above in special cases. It is sufficient to consider the case $n = 2$. Let $\Omega = \Omega_1 \times \Omega_2$, $\mathcal{F} = \mathcal{F}_1 \times \mathcal{F}_2$ and $\mathcal{P} = \mathcal{P}_1 \times \mathcal{P}_2$ be the product space, B.F., and measure respectively. Let $(\overline{\mathcal{F}}_1, \overline{\mathcal{P}}_1)$, $(\overline{\mathcal{F}}_2, \overline{\mathcal{P}}_2)$ and $(\overline{\mathcal{F}_1 \times \mathcal{F}_2}, \overline{\mathcal{P}_1 \times \mathcal{P}_2})$ be the completions of $(\mathcal{F}_1, \mathcal{P}_1)$, $(\mathcal{F}_2, \mathcal{P}_2)$, and $(\mathcal{F}_1 \times \mathcal{F}_2, \mathcal{P}_1 \times \mathcal{P}_2)$, respectively, according to Theorem 2.2.5.

Fubini's theorem. Suppose that $f$ is measurable with respect to $\overline{\mathcal{F}_1 \times \mathcal{F}_2}$ and integrable with respect to $\overline{\mathcal{P}_1 \times \mathcal{P}_2}$. Then

(i) for each $\omega_1 \in \Omega_1 \setminus N_1$ where $N_1 \in \overline{\mathcal{F}}_1$ and $\overline{\mathcal{P}}_1(N_1) = 0$, $f(\omega_1, \cdot)$ is measurable with respect to $\overline{\mathcal{F}}_2$ and integrable with respect to $\overline{\mathcal{P}}_2$;

(ii) the integral

$$
\int_{\Omega_2} f(\cdot, \omega_2)\, \overline{\mathcal{P}}_2(d\omega_2)
$$

is measurable with respect to $\overline{\mathcal{F}}_1$ and integrable with respect to $\overline{\mathcal{P}}_1$;

(iii) the following equation between a double and a repeated integral holds:

$$
\iint_{\Omega_1 \times \Omega_2} f(\omega_1, \omega_2)\, \overline{\mathcal{P}_1 \times \mathcal{P}_2}(d\omega) = \int_{\Omega_1} \Bigl[\int_{\Omega_2} f(\omega_1, \omega_2)\, \overline{\mathcal{P}}_2(d\omega_2)\Bigr] \overline{\mathcal{P}}_1(d\omega_1). \tag{15}
$$

Furthermore, suppose $f$ is positive and measurable with respect to $\overline{\mathcal{F}_1 \times \mathcal{F}_2}$; then if either member of (15) exists finite or infinite, so does the other also and the equation holds.
Finally if we remove all the completion signs "$\,\overline{\phantom{x}}\,$" in the hypotheses as well as the conclusions above, the resulting statement is true with the exceptional set $N_1$ in (i) empty, provided the word "integrable" in (i) be replaced by "has an integral."

The theorem remains true if the p.m.'s are replaced by $\sigma$-finite measures; see, e.g., Royden [5].

The reader should readily recognize the particular cases of the theorem with one or both of the factor measures discrete, so that the corresponding integral reduces to a sum. Thus it includes the familiar theorem on evaluating a double series by repeated summation.
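In the wholly discrete case the theorem is just the interchange of two finite or infinite sums, which can be exhibited in a few lines. An editorial Python sketch (the weight vectors standing in for $\mathcal{P}_1$, $\mathcal{P}_2$ and the integrand matrix are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
# Two discrete p.m.'s given as weight vectors, and a bounded integrand f(i, j).
p1 = np.array([0.2, 0.5, 0.3])
p2 = np.array([0.1, 0.9])
f = rng.normal(size=(3, 2))

double_integral = np.sum(f * np.outer(p1, p2))  # integral against P1 x P2
repeated = p1 @ (f @ p2)                        # inner sum over omega_2, then omega_1
assert np.isclose(double_integral, repeated)    # the two members of (15) agree
```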
We close this section by stating a generalization of Theorem 3.3.4 to the case where the finite-dimensional joint distributions of the sequence $\{X_j\}$ are arbitrarily given, subject only to mutual consistency. This result is a particular case of Kolmogorov's extension theorem, which is valid for an arbitrary family of r.v.'s.

Let $m$ and $n$ be integers: $1 \le m < n$, and define $\pi_{mn}$ to be the "projection map" of $\mathcal{B}^m$ onto $\mathcal{B}^n$ given by

$$
\forall B \in \mathcal{B}^m : \quad \pi_{mn}(B) = \{(x_1, \ldots, x_n) : (x_1, \ldots, x_m) \in B\}.
$$

Theorem 3.3.6. For each $n \ge 1$, let $\mu^n$ be a p.m. on $(R^n, \mathcal{B}^n)$ such that

$$
\forall m < n : \quad \mu^n \circ \pi_{mn} = \mu^m. \tag{16}
$$

Then there exists a probability space $(\Omega, \mathcal{F}, \mathcal{P})$ and a sequence of r.v.'s $\{X_j\}$ on it such that for each $n$, $\mu^n$ is the $n$-dimensional p.m. of the vector

$$
(X_1, \ldots, X_n).
$$

Indeed, the $\Omega$ and $\mathcal{F}$ may be taken exactly as in Theorem 3.3.4 to be the product space

$$
\times_j \Omega_j \quad \text{and} \quad \times_j \mathcal{F}_j,
$$

where $(\Omega_j, \mathcal{F}_j) = (R^1, \mathcal{B}^1)$ for each $j$; only $\mathcal{P}$ is now more general. In terms of d.f.'s, the consistency condition (16) may be stated as follows. For each $m \ge 1$ and $(x_1, \ldots, x_m) \in R^m$, we have if $n > m$:

$$
\lim_{\substack{x_{m+1} \to \infty\\ \cdots\\ x_n \to \infty}} F_n(x_1, \ldots, x_m, x_{m+1}, \ldots, x_n) = F_m(x_1, \ldots, x_m).
$$

For a proof of this first fundamental theorem in the theory of stochastic processes, see Kolmogorov [8].
EXERCISES

1. Show that two r.v.'s on $(\Omega, \mathcal{F})$ may be independent according to one p.m. $\mathcal{P}$ but not according to another!

*2. If $X_1$ and $X_2$ are independent r.v.'s each assuming the values $+1$ and $-1$ with probability $\frac{1}{2}$, then the three r.v.'s $\{X_1, X_2, X_1X_2\}$ are pairwise independent but not totally independent. Find an example of $n$ r.v.'s such that every $n-1$ of them are independent but not all of them. Find an example where the events $\Lambda_1$ and $\Lambda_2$ are independent, $\Lambda_1$ and $\Lambda_3$ are independent, but $\Lambda_1$ and $\Lambda_2 \cup \Lambda_3$ are not independent.

*3. If the events $\{E_\alpha, \alpha \in A\}$ are independent, then so are the events $\{F_\alpha, \alpha \in A\}$, where each $F_\alpha$ may be $E_\alpha$ or $E_\alpha^c$; also if $\{A_\beta, \beta \in B\}$, where $B$ is an arbitrary index set, is a collection of disjoint countable subsets of $A$, then the events

$$
\bigcap_{\alpha \in A_\beta} E_\alpha, \quad \beta \in B,
$$

are independent.

4. Fields or B.F.'s $\mathcal{F}_\alpha (\subset \mathcal{F})$ of any family are said to be independent iff any collection of events, one from each $\mathcal{F}_\alpha$, forms a set of independent events. Let $\mathcal{F}_\alpha^0$ be a field generating $\mathcal{F}_\alpha$. Prove that if the fields $\mathcal{F}_\alpha^0$ are independent, then so are the B.F.'s $\mathcal{F}_\alpha$. Is the same conclusion true if the fields are replaced by arbitrary generating sets? Prove, however, that the conditions (1) and (2) are equivalent. [HINT: Use Theorem 2.1.2.]

5. If $\{X_\alpha\}$ is a family of independent r.v.'s, then the B.F.'s generated by disjoint subfamilies are independent. [Theorem 3.3.2 is a corollary to this proposition.]

6. The r.v. $X$ is independent of itself if and only if it is constant with probability one. Can $X$ and $f(X)$ be independent where $f \in \mathcal{B}^1$?

7. If $\{E_j, 1 \le j < \infty\}$ are independent events, then

$$
\mathcal{P}\Bigl(\bigcap_{j=1}^{\infty} E_j\Bigr) = \prod_{j=1}^{\infty} \mathcal{P}(E_j),
$$

where the infinite product is defined to be the obvious limit; similarly

$$
\mathcal{P}\Bigl(\bigcup_{j=1}^{\infty} E_j\Bigr) = 1 - \prod_{j=1}^{\infty} \bigl(1 - \mathcal{P}(E_j)\bigr).
$$

8. Let $\{X_j, 1 \le j \le n\}$ be independent with d.f.'s $\{F_j, 1 \le j \le n\}$. Find the d.f. of $\max_j X_j$ and $\min_j X_j$.
*9. If $X$ and $Y$ are independent and $E(X)$ exists, then for any Borel set $B$, we have

$$
\int_{\{Y \in B\}} X \, d\mathcal{P} = E(X)\,\mathcal{P}(Y \in B).
$$

*10. If $X$ and $Y$ are independent and for some $p > 0$: $E(|X + Y|^p) < \infty$, then $E(|X|^p) < \infty$ and $E(|Y|^p) < \infty$.

11. If $X$ and $Y$ are independent, $E(|X|^p) < \infty$ for some $p \ge 1$, and $E(Y) = 0$, then $E(|X + Y|^p) \ge E(|X|^p)$. [This is a case of Theorem 9.3.2; but try a direct proof!]

12. The r.v.'s $\{\epsilon_j\}$ in Example 4 are related to the "Rademacher functions":

$$
r_j(x) = \operatorname{sgn}(\sin 2^j \pi x).
$$

What is the precise relation?

13. Generalize Example 4 by considering any $s$-ary expansion where $s \ge 2$ is an integer:

$$
x = \sum_{n=1}^{\infty} \frac{\epsilon_n}{s^n}, \quad \text{where } \epsilon_n = 0, 1, \ldots, s - 1.
$$

*14. Modify Example 4 so that to each $x$ in $[0, 1]$ there corresponds a sequence of independent and identically distributed r.v.'s $\{\epsilon_n, n \ge 1\}$, each taking the values 1 and 0 with probabilities $p$ and $1 - p$, respectively, where $0 < p < 1$. Such a sequence will be referred to as a "coin-tossing game (with probability $p$ for heads)"; when $p = \frac{1}{2}$ it is said to be "fair".

15. Modify Example 4 further so as to allow $\epsilon_n$ to take the values 1 and 0 with probabilities $p_n$ and $1 - p_n$, where $0 < p_n < 1$, but $p_n$ depends on $n$.

16. Generalize (14) to

$$
\mathcal{P}(C) = \int_{C(1,\ldots,k)} \mathcal{P}(C \mid \omega_1, \ldots, \omega_k)\, \mathcal{P}_1(d\omega_1) \cdots \mathcal{P}_k(d\omega_k) = \int_{\Omega} \mathcal{P}(C \mid \omega_1, \ldots, \omega_k)\, \mathcal{P}(d\omega),
$$

where $C(1, \ldots, k)$ is the set of $(\omega_1, \ldots, \omega_k)$ which appears as the first $k$ coordinates of at least one point in $C$ and in the second equation $\omega_1, \ldots, \omega_k$ are regarded as functions of $\omega$.

*17. For arbitrary events $\{E_j, 1 \le j \le n\}$, we have

$$
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j\Bigr) \ge \sum_{j=1}^{n} \mathcal{P}(E_j) - \sum_{1 \le j < k \le n} \mathcal{P}(E_j E_k).
$$
If $\forall n$: $\{E_j^{(n)}, 1 \le j \le n\}$ are independent events, and

$$
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j^{(n)}\Bigr) \to 0 \quad \text{as } n \to \infty,
$$

then

$$
\mathcal{P}\Bigl(\bigcup_{j=1}^{n} E_j^{(n)}\Bigr) \sim \sum_{j=1}^{n} \mathcal{P}(E_j^{(n)}).
$$

18. Prove that $\overline{\mathcal{B}} \times \overline{\mathcal{B}} \ne \overline{\mathcal{B} \times \mathcal{B}}$, where $\overline{\mathcal{B}}$ is the completion of $\mathcal{B}$ with respect to the Lebesgue measure; similarly for $\overline{\mathcal{B}} \times \mathcal{B}$.

19. If $f \in \overline{\mathcal{F}_1 \times \mathcal{F}_2}$ and

$$
\iint_{\Omega_1 \times \Omega_2} |f| \, d(\overline{\mathcal{P}_1 \times \mathcal{P}_2}) < \infty,
$$

then

$$
\int_{\Omega_1} \Bigl[\int_{\Omega_2} f \, d\overline{\mathcal{P}}_2\Bigr] d\overline{\mathcal{P}}_1 = \int_{\Omega_2} \Bigl[\int_{\Omega_1} f \, d\overline{\mathcal{P}}_1\Bigr] d\overline{\mathcal{P}}_2.
$$

20. A typical application of Fubini's theorem is as follows. If $f$ is a Lebesgue measurable function of $(x, y)$ such that $f(x, y) = 0$ for each $x \in R^1$ and $y \notin N_x$, where $m(N_x) = 0$ for each $x$, then we have also $f(x, y) = 0$ for each $y \notin N$ and $x \notin N_y'$, where $m(N) = 0$ and $m(N_y') = 0$ for each $y \notin N$.
4 Convergence concepts

4.1 Various modes of convergence

As numerical-valued functions, the convergence of a sequence of r.v.'s $\{X_n, n \ge 1\}$, to be denoted simply by $\{X_n\}$ below, is a well-defined concept. Here and hereafter the term "convergence" will be used to mean convergence to a finite limit. Thus it makes sense to say: for every $\omega \in \Delta$, where $\Delta \in \mathcal{F}$, the sequence $\{X_n(\omega)\}$ converges. The limit is then a finite-valued r.v. (see Theorem 3.1.6), say $X(\omega)$, defined on $\Delta$. If $\Omega = \Delta$, then we have "convergence everywhere", but a more useful concept is the following one.

DEFINITION OF CONVERGENCE "ALMOST EVERYWHERE" (a.e.). The sequence of r.v.'s $\{X_n\}$ is said to converge almost everywhere [to the r.v. $X$] iff there exists a null set $N$ such that

$$
\forall \omega \in \Omega \setminus N : \quad \lim_{n\to\infty} X_n(\omega) = X(\omega) \quad \text{finite}. \tag{1}
$$

Recall that our convention stated at the beginning of Sec. 3.1 allows each r.v. a null set on which it may be $\pm\infty$. The union of all these sets being still a null set, it may be included in the set $N$ in (1) without modifying the
conclusion. This type of trivial consideration makes it possible, when dealing with a countable set of r.v.'s that are finite a.e., to regard them as finite everywhere.

The reader should learn at an early stage to reason with a single sample point $\omega_0$ and the corresponding sample sequence $\{X_n(\omega_0), n \ge 1\}$ as a numerical sequence, and translate the inferences on the latter into probability statements about sets of $\omega$. The following characterization of convergence a.e. is a good illustration of this method.

Theorem 4.1.1. The sequence $\{X_n\}$ converges a.e. to $X$ if and only if for every $\epsilon > 0$ we have

$$
\lim_{m\to\infty} \mathcal{P}\{|X_n - X| \le \epsilon \text{ for all } n \ge m\} = 1; \tag{2}
$$

or equivalently

$$
\lim_{m\to\infty} \mathcal{P}\{|X_n - X| > \epsilon \text{ for some } n \ge m\} = 0. \tag{2$'$}
$$

PROOF. Suppose there is convergence a.e. and let $\Omega_0 = \Omega \setminus N$ where $N$ is as in (1). For $m \ge 1$ let us denote by $A_m(\epsilon)$ the event exhibited in (2), namely:

$$
A_m(\epsilon) = \bigcap_{n=m}^{\infty} \{|X_n - X| \le \epsilon\}. \tag{3}
$$

Then $A_m(\epsilon)$ is increasing with $m$. For each $\omega_0$, the convergence of $\{X_n(\omega_0)\}$ to $X(\omega_0)$ implies that given any $\epsilon > 0$, there exists $m(\omega_0, \epsilon)$ such that

$$
n \ge m(\omega_0, \epsilon) \Rightarrow |X_n(\omega_0) - X(\omega_0)| \le \epsilon. \tag{4}
$$

Hence each such $\omega_0$ belongs to some $A_m(\epsilon)$ and so $\Omega_0 \subset \bigcup_{m=1}^{\infty} A_m(\epsilon)$. It follows from the monotone convergence property of a measure that $\lim_{m\to\infty} \mathcal{P}(A_m(\epsilon)) = 1$, which is equivalent to (2).

Conversely, suppose (2) holds, then we see above that the set $A(\epsilon) = \bigcup_{m=1}^{\infty} A_m(\epsilon)$ has probability equal to one. For any $\omega_0 \in A(\epsilon)$, (4) is true for the given $\epsilon$. Let $\epsilon$ run through a sequence of values decreasing to zero, for instance $\{1/n\}$. Then the set

$$
A = \bigcap_{n=1}^{\infty} A\Bigl(\frac{1}{n}\Bigr)
$$

still has probability one since

$$
\mathcal{P}(A) = \lim_{n} \mathcal{P}\Bigl(A\Bigl(\frac{1}{n}\Bigr)\Bigr).
$$
If $\omega_0$ belongs to $A$, then (4) is true for all $\epsilon = 1/n$, hence for all $\epsilon > 0$ (why?). This means $\{X_n(\omega_0)\}$ converges to $X(\omega_0)$ for all $\omega_0$ in a set of probability one.
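The probabilities in (2) can be estimated by simulation. The following editorial Python sketch uses sample means of coin tosses, $X_n = S_n/n$, which converge a.e. to $1/2$ (anticipating the strong law of large numbers of Chapter 5 only as a convenient test sequence):

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_max, eps = 2000, 500, 0.05

tosses = rng.integers(0, 2, size=(n_paths, n_max))
X = tosses.cumsum(axis=1) / np.arange(1, n_max + 1)   # X_n = S_n / n, one row per omega

# Estimate P{|X_n - 1/2| <= eps for all m <= n <= n_max}, a finite-horizon
# proxy for the probability in (2); it should increase toward 1 with m.
for m in (10, 50, 200):
    ok = np.all(np.abs(X[:, m - 1:] - 0.5) <= eps, axis=1)
    print(m, ok.mean())
```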
A weaker concept of convergence is of basic importance in probability theory.

DEFINITION OF CONVERGENCE "IN PROBABILITY" (in pr.). The sequence $\{X_n\}$ is said to converge in probability to $X$ iff for every $\epsilon > 0$ we have

$$
\lim_{n\to\infty} \mathcal{P}\{|X_n - X| > \epsilon\} = 0. \tag{5}
$$

Strictly speaking, the definition applies when all $X_n$ and $X$ are finite-valued. But we may extend it to r.v.'s that are finite a.e. either by agreeing to ignore a null set or by the logical convention that a formula must first be defined in order to be valid or invalid. Thus, for example, if $X_n(\omega) = +\infty$ and $X(\omega) = +\infty$ for some $\omega$, then $X_n(\omega) - X(\omega)$ is not defined and therefore such an $\omega$ cannot belong to the set $\{|X_n - X| > \epsilon\}$ figuring in (5).

Since (2$'$) clearly implies (5), we have the immediate consequence below.

Theorem 4.1.2. Convergence a.e. [to $X$] implies convergence in pr. [to $X$].

Sometimes we have to deal with questions of convergence when no limit is in evidence. For convergence a.e. this is immediately reducible to the numerical case where the Cauchy criterion is applicable. Specifically, $\{X_n\}$ converges a.e. iff there exists a null set $N$ such that for every $\omega \in \Omega \setminus N$ and every $\epsilon > 0$, there exists $m(\omega, \epsilon)$ such that

$$
n' > n \ge m(\omega, \epsilon) \Rightarrow |X_n(\omega) - X_{n'}(\omega)| \le \epsilon.
$$

The following analogue of Theorem 4.1.1 is left to the reader.

Theorem 4.1.3. The sequence $\{X_n\}$ converges a.e. if and only if for every $\epsilon > 0$ we have

$$
\lim_{m\to\infty} \mathcal{P}\{|X_n - X_{n'}| > \epsilon \text{ for some } n' > n \ge m\} = 0. \tag{6}
$$

For convergence in pr., the obvious analogue of (6) is

$$
\lim_{\substack{n\to\infty\\ n'\to\infty}} \mathcal{P}\{|X_n - X_{n'}| > \epsilon\} = 0. \tag{7}
$$

It can be shown (Exercise 6 of Sec. 4.2) that this implies the existence of a finite r.v. $X$ such that $X_n \to X$ in pr.
DEFINITION OF CONVERGENCE IN $L^p$, $0 < p < \infty$. The sequence $\{X_n\}$ is said to converge in $L^p$ to $X$ iff $X_n \in L^p$, $X \in L^p$ and

$$
\lim_{n\to\infty} E(|X_n - X|^p) = 0. \tag{8}
$$

In all these definitions above, $X_n$ converges to $X$ if and only if $X_n - X$ converges to 0. Hence there is no loss of generality if we put $X \equiv 0$ in the discussion, provided that any hypothesis involved can be similarly reduced to this case. We say that $X$ is dominated by $Y$ if $|X| \le Y$ a.e., and that the sequence $\{X_n\}$ is dominated by $Y$ iff this is true for each $X_n$ with the same $Y$. We say that $X$ or $\{X_n\}$ is uniformly bounded iff the $Y$ above may be taken to be a constant.

Theorem 4.1.4. If $X_n$ converges to 0 in $L^p$, then it converges to 0 in pr. The converse is true provided that $\{X_n\}$ is dominated by some $Y$ that belongs to $L^p$.

Remark. If $X_n \to X$ in $L^p$, and $\{X_n\}$ is dominated by $Y$, then $\{X_n - X\}$ is dominated by $Y + |X|$, which is in $L^p$. Hence there is no loss of generality to assume $X \equiv 0$.

PROOF. By the Chebyshev inequality with $\varphi(x) \equiv |x|^p$, we have

$$
\mathcal{P}\{|X_n| \ge \epsilon\} \le \frac{E(|X_n|^p)}{\epsilon^p}. \tag{9}
$$

Letting $n \to \infty$, the right member $\to 0$ by hypothesis, hence so does the left, which is equivalent to (5) with $X = 0$. This proves the first assertion. If now $|X_n| \le Y$ a.e. with $E(Y^p) < \infty$, then we have

$$
E(|X_n|^p) = \int_{\{|X_n| < \epsilon\}} |X_n|^p \, d\mathcal{P} + \int_{\{|X_n| \ge \epsilon\}} |X_n|^p \, d\mathcal{P} \le \epsilon^p + \int_{\{|X_n| \ge \epsilon\}} Y^p \, d\mathcal{P}.
$$

Since $\mathcal{P}\{|X_n| \ge \epsilon\} \to 0$, the last-written integral $\to 0$ by Exercise 2 in Sec. 3.2. Letting first $n \to \infty$ and then $\epsilon \to 0$, we obtain $E(|X_n|^p) \to 0$; hence $X_n$ converges to 0 in $L^p$.

As a corollary, for a uniformly bounded sequence $\{X_n\}$ convergence in pr. and in $L^p$ are equivalent. The general result is as follows.

Theorem 4.1.5. $X_n \to 0$ in pr. if and only if

$$
E\Bigl(\frac{|X_n|}{1 + |X_n|}\Bigr) \to 0. \tag{10}
$$

Furthermore, the functional $\rho(\cdot, \cdot)$ given by

$$
\rho(X, Y) = E\Bigl(\frac{|X - Y|}{1 + |X - Y|}\Bigr)
$$
is a metric in the space of r.v.'s, provided that we identify r.v.'s that are equal a.e.

PROOF. If $\rho(X, Y) = 0$, then $E\bigl(|X - Y|/(1 + |X - Y|)\bigr) = 0$, hence $X = Y$ a.e. by Exercise 1 of Sec. 3.2. To show that $\rho(\cdot, \cdot)$ is a metric it is sufficient to show that

$$
E\Bigl(\frac{|X - Y|}{1 + |X - Y|}\Bigr) \le E\Bigl(\frac{|X|}{1 + |X|}\Bigr) + E\Bigl(\frac{|Y|}{1 + |Y|}\Bigr).
$$

For this we need only verify that for every real $x$ and $y$:

$$
\frac{|x + y|}{1 + |x + y|} \le \frac{|x|}{1 + |x|} + \frac{|y|}{1 + |y|}. \tag{11}
$$

By symmetry we may suppose that $|y| \le |x|$; then

$$
\frac{|x + y|}{1 + |x + y|} - \frac{|x|}{1 + |x|} = \frac{|x + y| - |x|}{(1 + |x + y|)(1 + |x|)} \le \frac{\bigl||x + y| - |x|\bigr|}{1 + |x|} \le \frac{|y|}{1 + |y|}. \tag{12}
$$

For any $X$ the r.v. $|X|/(1 + |X|)$ is bounded by 1, hence by the second part of Theorem 4.1.4 the first assertion of the theorem will follow if we show that $|X_n| \to 0$ in pr. if and only if $|X_n|/(1 + |X_n|) \to 0$ in pr. But $|x| \le \epsilon$ is equivalent to $|x|/(1 + |x|) \le \epsilon/(1 + \epsilon)$; hence the proof is complete.
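Criterion (10) is easy to watch numerically. An editorial Python sketch (the particular sequence $X_n = n\,1_{(0,1/n)}$ is chosen because it converges in pr. but not in $L^1$):

```python
import numpy as np

rng = np.random.default_rng(7)
omega = rng.random(1_000_000)          # uniform sample points in (0, 1)

for n in (10, 100, 10_000):
    Xn = np.where(omega < 1.0 / n, float(n), 0.0)   # X_n = n on (0, 1/n), else 0
    # E(X_n) = 1 for every n (no L^1 convergence), yet the functional in (10)
    # equals (1/n) * n/(1+n) = 1/(n+1) -> 0, detecting convergence in pr.
    print(n, Xn.mean(), np.mean(np.abs(Xn) / (1.0 + np.abs(Xn))))
```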

Example 1. Convergence in pr. does not imply convergence in $L^p$, and the latter does not imply convergence a.e.

Take the probability space $(\Omega, \mathcal{F}, \mathcal{P})$ to be $(\mathcal{U}, \mathcal{B}, m)$ as in Example 2 of Sec. 2.2. Let $\varphi_{k,j}$ be the indicator of the interval

$$
\Bigl(\frac{j-1}{k}, \frac{j}{k}\Bigr], \quad k \ge 1,\ 1 \le j \le k.
$$

Order these functions lexicographically first according to $k$ increasing, and then for each $k$ according to $j$ increasing, into one sequence $\{X_n\}$ so that if $X_n = \varphi_{k_n, j_n}$, then $k_n \to \infty$ as $n \to \infty$. Thus for each $p > 0$:

$$
E(X_n^p) = \frac{1}{k_n} \to 0,
$$

and so $X_n \to 0$ in $L^p$. But for each $\omega$ and every $k$, there exists a $j$ such that $\varphi_{k,j}(\omega) = 1$; hence there exist infinitely many values of $n$ such that $X_n(\omega) = 1$. Similarly there exist infinitely many values of $n$ such that $X_n(\omega) = 0$. It follows that the sequence $\{X_n(\omega)\}$ of 0's and 1's cannot converge for any $\omega$. In other words, the set on which $\{X_n\}$ converges is empty.
Now if we replace $\varphi_{k,j}$ by $k^{1/p}\varphi_{k,j}$, where $p > 0$, then $\mathcal{P}\{X_n > 0\} = 1/k_n \to 0$ so that $X_n \to 0$ in pr., but for each $n$, we have $E(X_n^p) = 1$. Consequently $\lim_{n\to\infty} E(|X_n - 0|^p) = 1$ and $X_n$ does not $\to 0$ in $L^p$.
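The "typewriter" sequence of this example can be generated explicitly. An editorial Python sketch (block range and sample point are arbitrary) exhibits both the shrinking expectations and the failure of pointwise convergence:

```python
# The typewriter sequence: indicators of ((j-1)/k, j/k], ordered in (k, j).
blocks = [(k, j) for k in range(1, 201) for j in range(1, k + 1)]

omega = 0.37                        # a fixed sample point in (0, 1]
values = [float((j - 1) / k < omega <= j / k) for (k, j) in blocks]

# E(X_n^p) = 1/k_n for every p > 0; it tends to 0 along the sequence:
print([1 / k for (k, j) in blocks[:5]])
# Yet at this fixed omega both values 0 and 1 keep recurring forever:
print(values.count(1), values.count(0))
```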

Example 2. Convergence a.e. does not imply convergence in $L^p$.

In $(\mathcal{U}, \mathcal{B}, m)$ define

$$
X_n(\omega) = \begin{cases} 2^n, & \text{if } \omega \in \bigl(0, \frac{1}{n}\bigr);\\[2pt] 0, & \text{otherwise.} \end{cases}
$$

Then $E(|X_n|^p) = 2^{np}/n \to +\infty$ for each $p > 0$, but $X_n \to 0$ everywhere.

The reader should not be misled by these somewhat "artificial" examples to think that such counterexamples are rare or abnormal. Natural examples abound in more advanced theory, such as that of stochastic processes, but they are not as simple as those discussed above. Here are some culled from later chapters, tersely stated. In the symmetrical Bernoullian random walk $\{S_n, n \ge 1\}$, let $\xi_n = 1_{\{S_n = 0\}}$. Then $\lim_{n\to\infty} E(\xi_n^p) = 0$ for every $p > 0$, but $\mathcal{P}\{\lim_{n\to\infty} \xi_n \text{ exists}\} = 0$ because of intermittent return of $S_n$ to the origin (see Sec. 8.3). This kind of example can be formulated for any recurrent process such as a Brownian motion. On the other hand, if $\{\eta_n, n \ge 1\}$ denotes the random walk above stopped at 1, then $E(\eta_n) = 0$ for all $n$ but $\mathcal{P}\{\lim_{n\to\infty} \eta_n = 1\} = 1$. The same holds for the martingale that consists in tossing a fair coin and doubling the stakes until the first head (see Sec. 9.4). Another striking example is furnished by the simple Poisson process $\{N(t), t \ge 0\}$ (see Sec. 5.5). If $\xi(t) = N(t)/t$, then $E(\xi(t)) = \lambda$, the intensity of the process, for all $t > 0$; but $\mathcal{P}\{\lim_{t\downarrow 0} \xi(t) = 0\} = 1$ because for almost every $\omega$, $N(t, \omega) = 0$ for all sufficiently small values of $t$. The continuous parameter may be replaced by a sequence $t_n \downarrow 0$.

Finally, we mention another kind of convergence which is basic in functional analysis, but confine ourselves to $L^1$. The sequence of r.v.'s $\{X_n\}$ in $L^1$ is said to converge weakly in $L^1$ to $X$ iff for each bounded r.v. $Y$ we have

$$
\lim_{n\to\infty} E(X_n Y) = E(XY), \quad \text{finite}.
$$

It is easy to see that $X \in L^1$ and is unique up to equivalence by taking $Y = 1_{\{X \ne X'\}}$ if $X'$ is another candidate. Clearly convergence in $L^1$ defined above implies weak convergence; hence the former is sometimes referred to as "strong". On the other hand, Example 2 above shows that convergence a.e. does not imply weak convergence; whereas Exercises 19 and 20 below show that weak convergence does not imply convergence in pr. (nor even in distribution; see Sec. 4.4).
EXERCISES

1. $X_n \to +\infty$ a.e. if and only if $\forall M > 0$: $\mathcal{P}\{X_n < M \text{ i.o.}\} = 0$.

2. If $0 \le X_n$, $X_n \le X \in L^1$ and $X_n \to X$ in pr., then $X_n \to X$ in $L^1$.

3. If $X_n \to X$, $Y_n \to Y$ both in pr., then $X_n \pm Y_n \to X \pm Y$, $X_n Y_n \to XY$, all in pr.

*4. Let $f$ be a bounded uniformly continuous function in $R^1$. Then $X_n \to 0$ in pr. implies $E\{f(X_n)\} \to f(0)$. [Example:

$$
f(x) = \frac{|x|}{1 + |x|}
$$

as in Theorem 4.1.5.]

5. Convergence in $L^p$ implies that in $L^r$ for $r < p$.

6. If $X_n \to X$, $Y_n \to Y$, both in $L^p$, then $X_n \pm Y_n \to X \pm Y$ in $L^p$. If $X_n \to X$ in $L^p$ and $Y_n \to Y$ in $L^q$, where $p > 1$ and $1/p + 1/q = 1$, then $X_n Y_n \to XY$ in $L^1$.

7. If $X_n \to X$ in pr. and $X_n \to Y$ in pr., then $X = Y$ a.e.

8. If $X_n \to X$ a.e. and $\mu_n$ and $\mu$ are the p.m.'s of $X_n$ and $X$, it does not follow that $\mu_n(P) \to \mu(P)$ even for all intervals $P$.

9. Give an example in which $E(X_n) \to 0$ but there does not exist any subsequence $\{n_k\} \to \infty$ such that $X_{n_k} \to 0$ in pr.

*10. Let $f$ be a continuous function on $R^1$. If $X_n \to X$ in pr., then $f(X_n) \to f(X)$ in pr. The result is false if $f$ is merely Borel measurable. [HINT: Truncate $f$ at $\pm A$ for large $A$.]

11. The extended-valued r.v. $X$ is said to be bounded in pr. iff for each $\epsilon > 0$, there exists a finite $M(\epsilon)$ such that $\mathcal{P}\{|X| \le M(\epsilon)\} \ge 1 - \epsilon$. Prove that $X$ is bounded in pr. if and only if it is finite a.e.

12. The sequence of extended-valued r.v.'s $\{X_n\}$ is said to be bounded in pr. iff $\sup_n |X_n|$ is bounded in pr.; $\{X_n\}$ is said to diverge to $+\infty$ in pr. iff for each $M > 0$ and $\epsilon > 0$ there exists a finite $n_0(M, \epsilon)$ such that if $n > n_0$, then $\mathcal{P}\{|X_n| > M\} > 1 - \epsilon$. Prove that if $\{X_n\}$ diverges to $+\infty$ in pr. and $\{Y_n\}$ is bounded in pr., then $\{X_n + Y_n\}$ diverges to $+\infty$ in pr.

13. If $\sup_n X_n = +\infty$ a.e., there need exist no subsequence $\{X_{n_k}\}$ that diverges to $+\infty$ in pr.

14. It is possible that for each $\omega$, $\limsup_n X_n(\omega) = +\infty$, but there does not exist a subsequence $\{n_k\}$ and a set $\Delta$ of positive probability such that $\lim_k X_{n_k}(\omega) = +\infty$ on $\Delta$. [HINT: On $(\mathcal{U}, \mathcal{B})$ define $X_n(\omega)$ according to the $n$th digit of $\omega$.]
*15. Instead of the $\rho$ in Theorem 4.1.5 one may define other metrics as follows. Let $\rho_1(X, Y)$ be the infimum of all $\epsilon > 0$ such that

$$
\mathcal{P}(|X - Y| > \epsilon) \le \epsilon.
$$

Let $\rho_2(X, Y)$ be the infimum of $\mathcal{P}\{|X - Y| > \epsilon\} + \epsilon$ over all $\epsilon > 0$. Prove that these are metrics and that convergence in pr. is equivalent to convergence according to either metric.

*16. Convergence in pr. for arbitrary r.v.'s may be reduced to that of bounded r.v.'s by the transformation

$$
X' = \arctan X.
$$

In a space of uniformly bounded r.v.'s, convergence in pr. is equivalent to that in the metric $\rho_0(X, Y) = E(|X - Y|)$; this reduces to the definition given in Exercise 8 of Sec. 3.2 when $X$ and $Y$ are indicators.

17. Unlike convergence in pr., convergence a.e. is not expressible by means of metric. [HINT: Metric convergence has the property that if $\rho(x_n, x) \nrightarrow 0$, then there exist $\epsilon > 0$ and $\{n_k\}$ such that $\rho(x_{n_k}, x) \ge \epsilon$ for every $k$.]

18. If $X_n \downarrow X$ a.s., each $X_n$ is integrable and $\inf_n E(X_n) > -\infty$, then $X_n \to X$ in $L^1$.

19. Let $f_n(x) = 1 + \cos 2\pi n x$, $f(x) = 1$ in $[0, 1]$. Then for each $g \in L^1[0, 1]$ we have

$$
\int_0^1 f_n g \, dx \to \int_0^1 f g \, dx,
$$

but $f_n$ does not converge to $f$ in measure. [HINT: This is just the Riemann–Lebesgue lemma in Fourier series, but we have made $f_n \ge 0$ to stress a point.]

20. Let $\{X_n\}$ be a sequence of independent r.v.'s with zero mean and unit variance. Prove that for any bounded r.v. $Y$ we have $\lim_{n\to\infty} E(X_n Y) = 0$. [HINT: Consider $E\{[Y - \sum_{k=1}^{n} E(X_k Y) X_k]^2\}$ to get Bessel's inequality $E(Y^2) \ge \sum_{k=1}^{n} E(X_k Y)^2$. The stated result extends to the case where the $X_n$'s are assumed only to be uniformly integrable (see Sec. 4.5) but now we must approximate $Y$ by a function of $(X_1, \ldots, X_m)$, cf. Theorem 8.1.1.]

4.2 Almost sure convergence; Borel–Cantelli lemma


An important concept in set theory is that of the “lim sup” and “lim inf ” of
a sequence of sets. These notions can be defined for subsets of an arbitrary
space $\Omega$.
DEFINITION. Let $E_n$ be any sequence of subsets of $\Omega$; we define

$$
\limsup_n E_n = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} E_n, \qquad \liminf_n E_n = \bigcup_{m=1}^{\infty} \bigcap_{n=m}^{\infty} E_n.
$$

Let us observe at once that

$$
\liminf_n E_n = \Bigl(\limsup_n E_n^c\Bigr)^c, \tag{1}
$$

so that in a sense one of the two notions suffices, but it is convenient to employ both. The main properties of these sets will be given in the following two propositions.

(i) A point belongs to $\limsup_n E_n$ if and only if it belongs to infinitely many terms of the sequence $\{E_n, n \ge 1\}$. A point belongs to $\liminf_n E_n$ if and only if it belongs to all terms of the sequence from a certain term on. It follows in particular that both sets are independent of the enumeration of the $E_n$'s.

PROOF. We shall prove the first assertion. Since a point belongs to infinitely many of the $E_n$'s if and only if it does not belong to all $E_n^c$ from a certain value of $n$ on, the second assertion will follow from (1). Now if $\omega$ belongs to infinitely many of the $E_n$'s, then it belongs to

$$
F_m = \bigcup_{n=m}^{\infty} E_n \quad \text{for every } m;
$$

hence it belongs to

$$
\bigcap_{m=1}^{\infty} F_m = \limsup_n E_n.
$$

Conversely, if $\omega$ belongs to $\bigcap_{m=1}^{\infty} F_m$, then $\omega \in F_m$ for every $m$. Were $\omega$ to belong to only a finite number of the $E_n$'s there would be an $m$ such that $\omega \notin E_n$ for $n \ge m$, so that

$$
\omega \notin \bigcup_{n=m}^{\infty} E_n = F_m.
$$

This contradiction proves that $\omega$ must belong to an infinite number of the $E_n$'s.

In more intuitive language: the event $\limsup_n E_n$ occurs if and only if the events $E_n$ occur infinitely often. Thus we may write

$$
\mathcal{P}(\limsup_n E_n) = \mathcal{P}(E_n \text{ i.o.})
$$
where the abbreviation "i.o." stands for "infinitely often". The advantage of such a notation is better shown if we consider, for example, the events "$|X_n| \ge \epsilon$" and the probability $\mathcal{P}\{|X_n| \ge \epsilon \text{ i.o.}\}$; see Theorem 4.2.3 below.

(ii) If each $E_n \in \mathcal{F}$, then we have

$$
\mathcal{P}(\limsup_n E_n) = \lim_{m\to\infty} \mathcal{P}\Bigl(\bigcup_{n=m}^{\infty} E_n\Bigr); \tag{2}
$$

$$
\mathcal{P}(\liminf_n E_n) = \lim_{m\to\infty} \mathcal{P}\Bigl(\bigcap_{n=m}^{\infty} E_n\Bigr). \tag{3}
$$

PROOF. Using the notation in the preceding proof, it is clear that $F_m$ decreases as $m$ increases. Hence by the monotone property of p.m.'s:

$$
\mathcal{P}\Bigl(\bigcap_{m=1}^{\infty} F_m\Bigr) = \lim_{m\to\infty} \mathcal{P}(F_m),
$$

which is (2); (3) is proved similarly or via (1).

Theorem 4.2.1. We have for arbitrary events $\{E_n\}$:

$$
\sum_n \mathcal{P}(E_n) < \infty \Rightarrow \mathcal{P}(E_n \text{ i.o.}) = 0. \tag{4}
$$

PROOF. By Boole's inequality for p.m.'s, we have

$$
\mathcal{P}(F_m) \le \sum_{n=m}^{\infty} \mathcal{P}(E_n).
$$

Hence the hypothesis in (4) implies that $\mathcal{P}(F_m) \to 0$, and the conclusion in (4) now follows by (2).
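The convergence part can be watched in simulation. An editorial Python sketch ($\mathcal{P}(E_n) = 1/n^2$ is an arbitrary summable choice); the last occurrence index stays near the beginning even though thousands of trials are available:

```python
import numpy as np

rng = np.random.default_rng(4)
n_paths, n_events = 5000, 2000

# Independent events E_n with P(E_n) = 1/n^2, so that sum_n P(E_n) < infinity.
p = 1.0 / np.arange(1, n_events + 1) ** 2
occurs = rng.random((n_paths, n_events)) < p

# By (4), only finitely many E_n occur a.s.; locate the last occurrence.
last = np.where(occurs.any(axis=1),
                occurs.shape[1] - 1 - occurs[:, ::-1].argmax(axis=1), -1)
print(np.quantile(last, [0.5, 0.99]))   # small indices, not growing with n_events
```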
As an illustration of the convenience of the new notions, we may restate Theorem 4.1.1 as follows. The intuitive content of condition (5) below is the point being stressed here.

Theorem 4.2.2. $X_n \to 0$ a.e. if and only if

$$
\forall \epsilon > 0: \quad \mathcal{P}\{|X_n| > \epsilon \text{ i.o.}\} = 0. \tag{5}
$$

PROOF. Using the notation $A_m = \bigcap_{n=m}^{\infty} \{|X_n| \le \epsilon\}$ as in (3) of Sec. 4.1 (with $X \equiv 0$), we have

$$
\{|X_n| > \epsilon \text{ i.o.}\} = \bigcap_{m=1}^{\infty} \bigcup_{n=m}^{\infty} \{|X_n| > \epsilon\} = \bigcap_{m=1}^{\infty} A_m^c.
$$
According to Theorem 4.1.1, $X_n \to 0$ a.e. if and only if for each $\epsilon > 0$, $\mathcal{P}(A_m^c) \to 0$ as $m \to \infty$; since $A_m^c$ decreases as $m$ increases, this is equivalent to (5) as asserted.

Theorem 4.2.3. If $X_n \to X$ in pr., then there exists a sequence $\{n_k\}$ of integers increasing to infinity such that $X_{n_k} \to X$ a.e. Briefly stated: convergence in pr. implies convergence a.e. along a subsequence.

PROOF. We may suppose $X \equiv 0$ as explained before. Then the hypothesis may be written as

$$
\forall k > 0: \quad \lim_{n\to\infty} \mathcal{P}\Bigl\{|X_n| > \frac{1}{2^k}\Bigr\} = 0.
$$

It follows that for each $k$ we can find $n_k$ such that

$$
\mathcal{P}\Bigl\{|X_{n_k}| > \frac{1}{2^k}\Bigr\} \le \frac{1}{2^k}
$$

and consequently

$$
\sum_k \mathcal{P}\Bigl\{|X_{n_k}| > \frac{1}{2^k}\Bigr\} \le \sum_k \frac{1}{2^k} < \infty.
$$

Having so chosen $\{n_k\}$, we let $E_k$ be the event "$|X_{n_k}| > 1/2^k$". Then we have by (4):

$$
\mathcal{P}\Bigl\{|X_{n_k}| > \frac{1}{2^k} \text{ i.o.}\Bigr\} = 0.
$$

[Note: Here the index involved in "i.o." is $k$; such ambiguity being harmless and expedient.] This implies $X_{n_k} \to 0$ a.e. (why?) finishing the proof.
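The choice of $\{n_k\}$ can be mimicked on the typewriter sequence of Example 1 of Sec. 4.1. An editorial Python sketch (there $\mathcal{P}\{X_n \ne 0\} = 1/k_n$, so it suffices to wait until the block length $k_n$ exceeds $2^k$; the sample point is arbitrary):

```python
# The typewriter sequence: indicator of ((j-1)/k, j/k], ordered in (k, j).
blocks = [(k, j) for k in range(1, 300) for j in range(1, k + 1)]

# Pick n_m as the first index whose block length exceeds 2^m, so that
# P{|X_{n_m}| > 2^-m} = 1/k_{n_m} <= 2^-m, a summable bound as in the proof.
idx = [next(i for i, (k, _) in enumerate(blocks) if k > 2 ** m) for m in range(1, 8)]

omega = 0.37                                   # a fixed sample point
sub = [float((j - 1) / k < omega <= j / k) for (k, j) in (blocks[i] for i in idx)]
print(sub)   # along this subsequence the values vanish from some point on
```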

EXERCISES

1. Prove that

$$
\mathcal{P}(\limsup_n E_n) \ge \limsup_n \mathcal{P}(E_n), \qquad \mathcal{P}(\liminf_n E_n) \le \liminf_n \mathcal{P}(E_n).
$$

2. Let $\{B_n\}$ be a countable collection of Borel sets in $\mathcal{U}$. If there exists a $\delta > 0$ such that $m(B_n) \ge \delta$ for every $n$, then there is at least one point in $\mathcal{U}$ that belongs to infinitely many $B_n$'s.

3. If $\{X_n\}$ converges to a finite limit a.e., then for any $\epsilon$ there exists $M(\epsilon) < \infty$ such that $\mathcal{P}\{\sup_n |X_n| \le M(\epsilon)\} \ge 1 - \epsilon$.
*4. For any sequence of r.v.'s $\{X_n\}$ there exists a sequence of constants $\{A_n\}$ such that $X_n/A_n \to 0$ a.e.

*5. If $\{X_n\}$ converges on the set $C$, then for any $\epsilon > 0$, there exists $C_0 \subset C$ with $\mathcal{P}(C \setminus C_0) < \epsilon$ such that $X_n$ converges uniformly in $C_0$. [This is Egorov's theorem. We may suppose $C = \Omega$ and the limit to be zero. Let $F_{mk} = \bigcap_{n=k}^{\infty} \{\omega : |X_n(\omega)| \le 1/m\}$; then $\forall m$, $\exists k(m)$ such that

$$
\mathcal{P}(F_{m,k(m)}) > 1 - \epsilon/2^m.
$$

Take $C_0 = \bigcap_{m=1}^{\infty} F_{m,k(m)}$.]

6. Cauchy convergence of $\{X_n\}$ in pr. (or in $L^p$) implies the existence of an $X$ (finite a.e.), such that $X_n$ converges to $X$ in pr. (or in $L^p$). [HINT: Choose $n_k$ so that

$$
\sum_k \mathcal{P}\Bigl\{|X_{n_{k+1}} - X_{n_k}| > \frac{1}{2^k}\Bigr\} < \infty;
$$

cf. Theorem 4.2.3.]

*7. $\{X_n\}$ converges in pr. to $X$ if and only if every subsequence $\{X_{n_k}\}$ contains a further subsequence that converges a.e. to $X$. [HINT: Use Theorem 4.1.5.]

8. Let $\{X_n, n \ge 1\}$ be any sequence of functions on $\Omega$ to $R^1$ and let $C$ denote the set of $\omega$ for which the numerical sequence $\{X_n(\omega), n \ge 1\}$ converges. Show that

$$
C = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \Bigl\{|X_n - X_{n'}| \le \frac{1}{m}\Bigr\} = \bigcap_{m=1}^{\infty} \bigcup_{n=1}^{\infty} \bigcap_{n'=n+1}^{\infty} \Lambda(m, n, n')
$$

where

$$
\Lambda(m, n, n') = \Bigl\{\omega : \max_{n < j < k \le n'} |X_j(\omega) - X_k(\omega)| \le \frac{1}{m}\Bigr\}.
$$

Hence if the $X_n$'s are r.v.'s, we have

$$
\mathcal{P}(C) = \lim_{m\to\infty} \lim_{n\to\infty} \lim_{n'\to\infty} \mathcal{P}\bigl(\Lambda(m, n, n')\bigr).
$$

9. As in Exercise 8 show that

$$
\Bigl\{\omega : \lim_{n\to\infty} X_n(\omega) = 0\Bigr\} = \bigcap_{m=1}^{\infty} \bigcup_{k=1}^{\infty} \bigcap_{n=k}^{\infty} \Bigl\{|X_n| \le \frac{1}{m}\Bigr\}.
$$

10. Suppose that for $a < b$, we have

$$
\mathcal{P}\{X_n < a \text{ i.o. and } X_n > b \text{ i.o.}\} = 0;
$$
then $\lim_{n\to\infty} X_n$ exists a.e. but it may be infinite. [HINT: Consider all pairs of rational numbers $(a, b)$ and take a union over them.]

Under the assumption of independence, Theorem 4.2.1 has a striking complement.

Theorem 4.2.4. If the events $\{E_n\}$ are independent, then

$$
\sum_n \mathcal{P}(E_n) = \infty \Rightarrow \mathcal{P}(E_n \text{ i.o.}) = 1. \tag{6}
$$

PROOF. By (3) we have

$$
\mathcal{P}(\liminf_n E_n^c) = \lim_{m\to\infty} \mathcal{P}\Bigl(\bigcap_{n=m}^{\infty} E_n^c\Bigr). \tag{7}
$$

The events $\{E_n^c\}$ are independent as well as $\{E_n\}$, by Exercise 3 of Sec. 3.3; hence we have if $m' > m$:

$$
\mathcal{P}\Bigl(\bigcap_{n=m}^{m'} E_n^c\Bigr) = \prod_{n=m}^{m'} \mathcal{P}(E_n^c) = \prod_{n=m}^{m'} \bigl(1 - \mathcal{P}(E_n)\bigr).
$$

Now for any $x \ge 0$, we have $1 - x \le e^{-x}$; it follows that the last term above does not exceed

$$
\prod_{n=m}^{m'} e^{-\mathcal{P}(E_n)} = \exp\Bigl(-\sum_{n=m}^{m'} \mathcal{P}(E_n)\Bigr).
$$

Letting $m' \to \infty$, the right member above $\to 0$, since the series in the exponent $\to +\infty$ by hypothesis. It follows by the monotone property that

$$
\mathcal{P}\Bigl(\bigcap_{n=m}^{\infty} E_n^c\Bigr) = \lim_{m'\to\infty} \mathcal{P}\Bigl(\bigcap_{n=m}^{m'} E_n^c\Bigr) = 0.
$$

Thus the right member in (7) is equal to 0, and consequently so is the left member in (7). This is equivalent to $\mathcal{P}(E_n \text{ i.o.}) = 1$ by (1).

Theorems 4.2.1 and 4.2.4 together will be referred to as the Borel–Cantelli lemma, the former the "convergence part" and the latter "the divergence part". The first is more useful since the events there may be completely arbitrary. The second has an extension to pairwise independent r.v.'s; although the result is of some interest, it is the method of proof to be given below that is more important. It is a useful technique in probability theory.
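For the divergence part, $\mathcal{P}(E_n) = 1/n$ is the standard borderline example. An editorial Python sketch (the sizes are arbitrary); the occurrence counts grow like the harmonic number, consistent with $\mathcal{P}(E_n \text{ i.o.}) = 1$:

```python
import numpy as np

rng = np.random.default_rng(5)
n_paths, n_events = 2000, 5000

# Independent events with P(E_n) = 1/n: the sum diverges, so (6) predicts
# that E_n occurs infinitely often with probability one.
p = 1.0 / np.arange(1, n_events + 1)
occurs = rng.random((n_paths, n_events)) < p

counts = occurs.sum(axis=1)                        # occurrences per sample path
print(counts.mean(), np.log(n_events) + 0.5772)    # mean ~ harmonic number H_N
print((counts >= 5).mean())                        # nearly every path has many
```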
Theorem 4.2.5. The implication (6) remains true if the events $\{E_n\}$ are pairwise independent.

PROOF. Let $I_n$ denote the indicator of $E_n$, so that our present hypothesis becomes

$$
\forall m \ne n: \quad E(I_m I_n) = E(I_m) E(I_n). \tag{8}
$$

Consider the series of r.v.'s: $\sum_{n=1}^{\infty} I_n(\omega)$. It diverges to $+\infty$ if and only if an infinite number of its terms are equal to one, namely if $\omega$ belongs to an infinite number of the $E_n$'s. Hence the conclusion in (6) is equivalent to

$$
\mathcal{P}\Bigl\{\sum_{n=1}^{\infty} I_n = +\infty\Bigr\} = 1. \tag{9}
$$

What has been said so far is true for arbitrary $E_n$'s. Now the hypothesis in (6) may be written as

$$
\sum_{n=1}^{\infty} E(I_n) = +\infty.
$$

Consider the partial sum $J_k = \sum_{n=1}^{k} I_n$. Using Chebyshev's inequality, we have for every $A > 0$:

$$
\mathcal{P}\{|J_k - E(J_k)| \le A\,\sigma(J_k)\} \ge 1 - \frac{\sigma^2(J_k)}{A^2 \sigma^2(J_k)} = 1 - \frac{1}{A^2}, \tag{10}
$$

where $\sigma^2(J)$ denotes the variance of $J$. Writing

$$
p_n = E(I_n) = \mathcal{P}(E_n),
$$

we may calculate $\sigma^2(J_k)$ by using (8), as follows:

$$
\begin{aligned}
E(J_k^2) &= E\Bigl(\sum_{n=1}^{k} I_n^2 + 2 \sum_{1 \le m < n \le k} I_m I_n\Bigr)\\
&= \sum_{n=1}^{k} E(I_n^2) + 2 \sum_{1 \le m < n \le k} E(I_m)E(I_n)\\
&= \sum_{n=1}^{k} \bigl(E(I_n)\bigr)^2 + 2 \sum_{1 \le m < n \le k} E(I_m)E(I_n) + \sum_{n=1}^{k} \bigl\{E(I_n) - \bigl(E(I_n)\bigr)^2\bigr\}\\
&= \Bigl(\sum_{n=1}^{k} p_n\Bigr)^2 + \sum_{n=1}^{k} (p_n - p_n^2).
\end{aligned}
$$
Hence

$$
\sigma^2(J_k) = E(J_k^2) - \bigl(E(J_k)\bigr)^2 = \sum_{n=1}^{k} (p_n - p_n^2) = \sum_{n=1}^{k} \sigma^2(I_n).
$$

This calculation will turn out to be a particular case of a simple basic formula; see (6) of Sec. 5.1. Since $\sum_{n=1}^{k} p_n = E(J_k) \to \infty$, it follows that

$$
\sigma(J_k) \le \bigl(E(J_k)\bigr)^{1/2} = o\bigl(E(J_k)\bigr)
$$

in the classical "$o$, $O$" notation of analysis. Hence if $k > k_0(A)$, (10) implies

$$
\mathcal{P}\Bigl\{J_k > \tfrac{1}{2} E(J_k)\Bigr\} \ge 1 - \frac{1}{A^2}
$$

(where $\frac{1}{2}$ may be replaced by any constant $< 1$). Since $J_k$ increases with $k$ the inequality above holds a fortiori when the first $J_k$ there is replaced by $\lim_{k\to\infty} J_k$; after that we can let $k \to \infty$ in $E(J_k)$ to obtain

$$
\mathcal{P}\Bigl\{\lim_{k\to\infty} J_k = +\infty\Bigr\} \ge 1 - \frac{1}{A^2}.
$$

Since the left member does not depend on $A$, and $A$ is arbitrary, this implies that $\lim_{k\to\infty} J_k = +\infty$ a.e., namely (9).

Corollary. If the events $\{E_n\}$ are pairwise independent, then

$$
\mathcal{P}(\limsup_n E_n) = 0 \text{ or } 1
$$

according as $\sum_n \mathcal{P}(E_n) < \infty$ or $= \infty$.

This is an example of a "zero-or-one" law to be discussed in Chapter 8, though it is not included in any of the general results there.

EXERCISES

Below $X_n$, $Y_n$ are r.v.'s, $E_n$ events.

11. Give a trivial example of dependent $\{E_n\}$ satisfying the hypothesis but not the conclusion of (6); give a less trivial example satisfying the hypothesis but with $\mathcal{P}(\limsup_n E_n) = 0$. [HINT: Let $E_n$ be the event that a real number in $[0, 1]$ has its $n$-ary expansion begin with 0.]

*12. Prove that the probability of convergence of a sequence of independent r.v.'s is equal to zero or one.

13. If $\{X_n\}$ is a sequence of independent and identically distributed r.v.'s not constant a.e., then $\mathcal{P}\{X_n \text{ converges}\} = 0$.
*14. If $\{X_n\}$ is a sequence of independent r.v.'s with d.f.'s $\{F_n\}$, then $\mathcal{P}\{\lim_n X_n = 0\} = 1$ if and only if $\forall \epsilon > 0$: $\sum_n \{1 - F_n(\epsilon) + F_n(-\epsilon)\} < \infty$.

15. If $\sum_n \mathcal{P}(|X_n| > n) < \infty$, then

$$
\limsup_n \frac{|X_n|}{n} \le 1 \quad \text{a.e.}
$$

*16. Strengthen Theorem 4.2.4 by proving that

$$
\lim_{n\to\infty} \frac{J_n}{E(J_n)} = 1 \quad \text{a.e.}
$$

[HINT: Take a subsequence $\{n_k\}$ such that $E(J_{n_k}) \sim k^2$; prove the result first for this subsequence by estimating $\mathcal{P}\{|J_k - E(J_k)| > \delta E(J_k)\}$; the general case follows because if $n_k \le n < n_{k+1}$,

$$
J_{n_k}/E(J_{n_{k+1}}) \le J_n/E(J_n) \le J_{n_{k+1}}/E(J_{n_k}).]
$$

17. If $E(X_n) = 1$ and $E(X_n^2)$ is bounded in $n$, then

$$
\mathcal{P}\Bigl\{\limsup_{n\to\infty} X_n \ge 1\Bigr\} > 0.
$$

[This is due to Kochen and Stone. Truncate $X_n$ at $A$ to $Y_n$ with $A$ so large that $E(Y_n) > 1 - \epsilon$ for all $n$; then apply Exercise 6 of Sec. 3.2.]

*18. Let $\{E_n\}$ be events and $\{I_n\}$ their indicators. Prove the inequality

$$
\mathcal{P}\Bigl(\bigcup_{k=1}^{n} E_k\Bigr) \ge \Bigl\{E\Bigl(\sum_{k=1}^{n} I_k\Bigr)\Bigr\}^2 \Bigl/ E\Bigl(\Bigl(\sum_{k=1}^{n} I_k\Bigr)^2\Bigr). \tag{11}
$$

Deduce from this that if (i) $\sum_n \mathcal{P}(E_n) = \infty$ and (ii) there exists $c > 0$ such that we have

$$
\forall m < n: \quad \mathcal{P}(E_m E_n) \le c\,\mathcal{P}(E_m)\mathcal{P}(E_{n-m});
$$

then

$$
\mathcal{P}\{\limsup_n E_n\} > 0.
$$

19. If $\sum_n \mathcal{P}(E_n) = \infty$ and

$$
\lim_n \Bigl\{\sum_{j=1}^{n} \sum_{k=1}^{n} \mathcal{P}(E_j E_k)\Bigr\} \Bigl/ \Bigl\{\sum_{k=1}^{n} \mathcal{P}(E_k)\Bigr\}^2 = 1,
$$

then $\mathcal{P}\{\limsup_n E_n\} = 1$. [HINT: Use (11) above.]
20. Let $\{E_n\}$ be arbitrary events satisfying

$$
\text{(i)} \quad \lim_n \mathcal{P}(E_n) = 0, \qquad \text{(ii)} \quad \sum_n \mathcal{P}(E_n E_{n+1}^c) < \infty;
$$

then $\mathcal{P}\{\limsup_n E_n\} = 0$. [This is due to Barndorff-Nielsen.]

4.3 Vague convergence

If a sequence of r.v.'s $\{X_n\}$ tends to a limit, the corresponding sequence of p.m.'s $\{\mu_n\}$ ought to tend to a limit in some sense. Is it true that $\lim_n \mu_n(A)$ exists for all $A \in \mathcal{B}^1$ or at least for all intervals $A$? The answer is no from trivial examples.

Example 1. Let $X_n = c_n$ where the $c_n$'s are constants tending to zero. Then $X_n \to 0$ deterministically. For any interval $I$ such that $0 \notin \bar{I}$, where $\bar{I}$ is the closure of $I$, we have $\lim_n \mu_n(I) = 0 = \mu(I)$; for any interval such that $0 \in I^\circ$, where $I^\circ$ is the interior of $I$, we have $\lim_n \mu_n(I) = 1 = \mu(I)$. But if $\{c_n\}$ oscillates between strictly positive and strictly negative values, and $I = (a, 0)$ or $(0, b)$, where $a < 0 < b$, then $\mu_n(I)$ oscillates between 0 and 1, while $\mu(I) = 0$. On the other hand, if $I = (a, 0]$ or $[0, b)$, then $\mu_n(I)$ oscillates as before but $\mu(I) = 1$. Observe that $\{0\}$ is the sole atom of $\mu$ and it is the root of the trouble.

Instead of the point masses concentrated at $c_n$, we may consider, e.g., r.v.'s $\{X_n\}$ having uniform distributions over intervals $(c_n, c_n')$ where $c_n < 0 < c_n'$ and $c_n \to 0$, $c_n' \to 0$. Then again $X_n \to 0$ a.e. but $\mu_n((a, 0))$ may not converge at all, or converge to any number between 0 and 1.

Next, even if $\{\mu_n\}$ does converge in some weaker sense, is the limit necessarily a p.m.? The answer is again no.

Example 2. Let $X_n = c_n$ where $c_n \to +\infty$. Then $X_n \to +\infty$ deterministically. According to our definition of an r.v., the constant $+\infty$ indeed qualifies. But for any finite interval $(a, b)$ we have $\lim_n \mu_n((a, b)) = 0$, so any limiting measure must also be identically zero. This example can be easily ramified; e.g. let $a_n \to -\infty$, $b_n \to +\infty$ and

$$
X_n = \begin{cases} a_n & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ b_n & \text{with probability } \beta. \end{cases}
$$

Then $X_n \to X$ where

$$
X = \begin{cases} -\infty & \text{with probability } \alpha,\\ 0 & \text{with probability } 1 - \alpha - \beta,\\ +\infty & \text{with probability } \beta. \end{cases}
$$

For any finite interval $(a, b)$ containing 0 we have

$$
\lim_n \mu_n((a, b)) = \mu(\{0\}) = 1 - \alpha - \beta.
$$
In this situation it is said that masses of amount $\alpha$ and $\beta$ "have wandered off to $-\infty$ and $+\infty$ respectively." The remedy here is obvious: we should consider measures on the extended line $R^* = [-\infty, +\infty]$, with possible atoms at $\{+\infty\}$ and $\{-\infty\}$. We leave this to the reader but proceed to give the appropriate definitions which take into account the two kinds of troubles discussed above.
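The escape of mass is visible numerically. An editorial Python sketch ($a_n = -n$, $b_n = n$ and the weights are arbitrary choices); the mass of any fixed finite interval settles at $1 - \alpha - \beta$:

```python
import numpy as np

rng = np.random.default_rng(6)
alpha, beta, m = 0.3, 0.2, 100_000

# X_n as in Example 2 with a_n = -n, b_n = +n (an illustrative choice).
def sample(n):
    u = rng.random(m)
    return np.where(u < alpha, -float(n), np.where(u < alpha + beta, float(n), 0.0))

for n in (10, 1000, 100000):
    x = sample(n)
    print(n, np.mean((-50 < x) & (x < 50)))   # mu_n((-50, 50)): 1.0, then 0.5
```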

DEFINITION. A measure $\mu$ on $(R^1, \mathcal{B}^1)$ with $\mu(R^1) \le 1$ will be called a subprobability measure (s.p.m.).

DEFINITION OF VAGUE CONVERGENCE. A sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely to an s.p.m. $\mu$ iff there exists a dense subset $D$ of $R^1$ such that

$$
\forall a \in D, b \in D, a < b: \quad \mu_n((a, b]) \to \mu((a, b]). \tag{1}
$$

This will be denoted by

$$
\mu_n \xrightarrow{v} \mu \tag{2}
$$

and $\mu$ is called the vague limit of $\{\mu_n\}$. We shall see that it is unique below.

For brevity's sake, we will write $\mu(a, b]$ for $\mu((a, b])$ below, and similarly for other kinds of intervals. An interval $(a, b)$ is called a continuity interval of $\mu$ iff neither $a$ nor $b$ is an atom of $\mu$; in other words iff $\mu(a, b) = \mu[a, b]$. As a notational convention, $\mu(a, b) = 0$ when $a > b$.

Theorem 4.3.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. The following propositions are equivalent.

(i) For every finite interval $(a, b)$ and $\epsilon > 0$, there exists an $n_0(a, b, \epsilon)$ such that if $n \ge n_0$, then

$$
\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu_n(a, b) \le \mu(a - \epsilon, b + \epsilon) + \epsilon. \tag{3}
$$

Here and hereafter the first term is interpreted as 0 if $a + \epsilon > b - \epsilon$.

(ii) For every continuity interval $(a, b]$ of $\mu$, we have

$$
\mu_n(a, b] \to \mu(a, b].
$$

(iii) $\mu_n \xrightarrow{v} \mu$.

PROOF. To prove that (i) $\Rightarrow$ (ii), let $(a, b)$ be a continuity interval of $\mu$. It follows from the monotone property of a measure that

$$
\lim_{\epsilon \downarrow 0} \mu(a + \epsilon, b - \epsilon) = \mu(a, b) = \mu[a, b] = \lim_{\epsilon \downarrow 0} \mu(a - \epsilon, b + \epsilon).
$$

Letting $n \to \infty$, then $\epsilon \downarrow 0$ in (3), we have

$$
\mu(a, b) \le \liminf_n \mu_n(a, b) \le \limsup_n \mu_n[a, b] \le \mu[a, b] = \mu(a, b),
$$
which proves (ii); indeed $(a, b]$ may be replaced by $(a, b)$ or $[a, b)$ or $[a, b]$ there. Next, since the set of atoms of $\mu$ is countable, the complementary set $D$ is certainly dense in $R^1$. If $a \in D$, $b \in D$, then $(a, b)$ is a continuity interval of $\mu$. This proves (ii) $\Rightarrow$ (iii). Finally, suppose (iii) is true so that (1) holds. Given any $(a, b)$ and $\epsilon > 0$, there exist $a_1, a_2, b_1, b_2$ all in $D$ satisfying

$$
a - \epsilon < a_1 < a < a_2 < a + \epsilon, \qquad b - \epsilon < b_1 < b < b_2 < b + \epsilon.
$$

By (1), there exists $n_0$ such that if $n \ge n_0$, then

$$
|\mu_n(a_i, b_j] - \mu(a_i, b_j]| \le \epsilon
$$

for $i = 1, 2$ and $j = 1, 2$. It follows that

$$
\mu(a + \epsilon, b - \epsilon) - \epsilon \le \mu(a_2, b_1] - \epsilon \le \mu_n(a_2, b_1] \le \mu_n(a, b) \le \mu_n(a_1, b_2] \le \mu(a_1, b_2] + \epsilon \le \mu(a - \epsilon, b + \epsilon) + \epsilon.
$$

Thus (iii) $\Rightarrow$ (i). The theorem is proved.
As an immediate consequence, the vague limit is unique. More precisely, if besides (1) we have also

$$
\forall a \in D', b \in D', a < b: \quad \mu_n(a, b] \to \mu'(a, b],
$$

then $\mu \equiv \mu'$. For let $A$ be the set of atoms of $\mu$ and of $\mu'$; then if $a \in A^c$, $b \in A^c$, we have by Theorem 4.3.1, (ii):

$$
\mu(a, b] \leftarrow \mu_n(a, b] \to \mu'(a, b]
$$

so that $\mu(a, b] = \mu'(a, b]$. Now $A^c$ is dense in $R^1$, hence the two measures $\mu$ and $\mu'$ coincide on a set of intervals that is dense in $R^1$ and must therefore be identical (see Corollary to Theorem 2.2.3).

Another consequence is: if $\mu_n \xrightarrow{v} \mu$ and $(a, b)$ is a continuity interval of $\mu$, then $\mu_n(I) \to \mu(I)$, where $I$ is any of the four kinds of intervals with endpoints $a, b$. For it is clear that we may replace $\mu_n(a, b)$ by $\mu_n[a, b]$ in (3). In particular, we have $\mu_n(\{a\}) \to 0$, $\mu_n(\{b\}) \to 0$.

The case of strict probability measures will now be treated.

Theorem 4.3.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then (i), (ii), and (iii) in the preceding theorem are equivalent to the following "uniform" strengthening of (i).

(i$'$) For any $\delta > 0$ and $\epsilon > 0$, there exists $n_0(\delta, \epsilon)$ such that if $n \ge n_0$ then we have for every interval $(a, b)$, possibly infinite:

$$
\mu(a + \delta, b - \delta) - \epsilon \le \mu_n(a, b) \le \mu(a - \delta, b + \delta) + \epsilon. \tag{4}
$$
PROOF. The employment of two numbers $\delta$ and $\epsilon$ instead of one $\epsilon$ as in (3) is an apparent extension only (why?). Since (i$'$) $\Rightarrow$ (i), the preceding theorem yields at once (i$'$) $\Rightarrow$ (ii) $\Leftrightarrow$ (iii). It remains to prove (ii) $\Rightarrow$ (i$'$) when the $\mu_n$'s and $\mu$ are p.m.'s. Let $A$ denote the set of atoms of $\mu$. Then there exist an integer $\ell$ and $a_j \in A^c$, $1 \le j \le \ell$, satisfying:

$$
a_j < a_{j+1} \le a_j + \delta, \quad 1 \le j \le \ell - 1;
$$

and

$$
\mu((a_1, a_\ell)^c) < \frac{\epsilon}{4}. \tag{5}
$$

By (ii), there exists $n_0$ depending on $\ell$ and $\epsilon$ (and so on $\epsilon$ and $\delta$) such that if $n \ge n_0$ then

$$
\sup_{1 \le j \le \ell - 1} |\mu(a_j, a_{j+1}] - \mu_n(a_j, a_{j+1}]| \le \frac{\epsilon}{4\ell}. \tag{6}
$$

It follows by additivity that

$$
|\mu(a_1, a_\ell] - \mu_n(a_1, a_\ell]| < \frac{\epsilon}{4}
$$

and consequently by (5):

$$
\mu_n((a_1, a_\ell)^c) < \frac{\epsilon}{2}. \tag{7}
$$

From (5) and (7) we see that by ignoring the part of $(a, b)$ outside $(a_1, a_\ell)$, we commit at most an error $< \epsilon/2$ in either one of the inequalities in (4). Thus it is sufficient to prove (4) with $\delta$ and $\epsilon/2$, assuming that $(a, b) \subset (a_1, a_\ell)$. Let then $a_j \le a < a_{j+1}$ and $a_k \le b < a_{k+1}$, where $1 \le j \le k \le \ell - 1$. The desired inequalities follow from (6), since when $n \ge n_0$ we have

$$
\mu(a + \delta, b - \delta) - \frac{\epsilon}{4} \le \mu(a_{j+1}, a_k] - \frac{\epsilon}{4} \le \mu_n(a_{j+1}, a_k] \le \mu_n(a, b) \le \mu_n(a_j, a_{k+1}] \le \mu(a_j, a_{k+1}] + \frac{\epsilon}{4} \le \mu(a - \delta, b + \delta) + \frac{\epsilon}{4}.
$$

The set of all s.p.m.'s on $R^1$ bears close analogy to the set of all real numbers in $[0, 1]$. Recall that the latter is sequentially compact, which means: Given any sequence of numbers in the set, there is a subsequence which converges, and the limit is also a number in the set. This is the fundamental Bolzano–Weierstrass theorem. We have the following analogue
which states: The set of all s.p.m.'s is sequentially compact with respect to vague convergence. It is often referred to as "Helly's extraction (or selection) principle".

Theorem 4.3.3. Given any sequence of s.p.m.'s, there is a subsequence that converges vaguely to an s.p.m.

PROOF. Here it is convenient to consider the subdistribution function (s.d.f.) $F_n$ defined as follows:

$$
\forall x: \quad F_n(x) = \mu_n((-\infty, x]).
$$

If $\mu_n$ is a p.m., then $F_n$ is just its d.f. (see Sec. 2.2); in general $F_n$ is increasing, right continuous with $F_n(-\infty) = 0$ and $F_n(+\infty) = \mu_n(R^1) \le 1$.

Let $D$ be a countable dense set of $R^1$, and let $\{r_k, k \ge 1\}$ be an enumeration of it. The sequence of numbers $\{F_n(r_1), n \ge 1\}$ is bounded, hence by the Bolzano–Weierstrass theorem there is a subsequence $\{F_{1k}, k \ge 1\}$ of the given sequence such that the limit

$$
\lim_{k\to\infty} F_{1k}(r_1) = \ell_1
$$

exists; clearly $0 \le \ell_1 \le 1$. Next, the sequence of numbers $\{F_{1k}(r_2), k \ge 1\}$ is bounded, hence there is a subsequence $\{F_{2k}, k \ge 1\}$ of $\{F_{1k}, k \ge 1\}$ such that

$$
\lim_{k\to\infty} F_{2k}(r_2) = \ell_2
$$

where $0 \le \ell_2 \le 1$. Since $\{F_{2k}\}$ is a subsequence of $\{F_{1k}\}$, it converges also at $r_1$ to $\ell_1$. Continuing, we obtain

$$
\begin{aligned}
&F_{11}, F_{12}, \ldots, F_{1k}, \ldots &&\text{converging at } r_1;\\
&F_{21}, F_{22}, \ldots, F_{2k}, \ldots &&\text{converging at } r_1, r_2;\\
&\qquad \cdots\\
&F_{j1}, F_{j2}, \ldots, F_{jk}, \ldots &&\text{converging at } r_1, r_2, \ldots, r_j;\\
&\qquad \cdots
\end{aligned}
$$

Now consider the diagonal sequence $\{F_{kk}, k \ge 1\}$. We assert that it converges at every $r_j$, $j \ge 1$. To see this let $r_j$ be given. Apart from the first $j - 1$ terms, the sequence $\{F_{kk}, k \ge 1\}$ is a subsequence of $\{F_{jk}, k \ge 1\}$, which converges at $r_j$ and hence $\lim_{k\to\infty} F_{kk}(r_j) = \ell_j$, as desired.

We have thus proved the existence of an infinite subsequence $\{n_k\}$ and a function $G$ defined and increasing on $D$ such that

$$
\forall r \in D: \quad \lim_{k\to\infty} F_{n_k}(r) = G(r).
$$

From $G$ we define a function $F$ on $R^1$ as follows:

$$
\forall x \in R^1: \quad F(x) = \inf_{x < r \in D} G(r).
$$
By Sec. 1.1, (vii), $F$ is increasing and right continuous. Let $C$ denote the set of its points of continuity; $C$ is dense in $R^1$ and we show that

$$
\forall x \in C: \quad \lim_{k\to\infty} F_{n_k}(x) = F(x). \tag{8}
$$

For, let $x \in C$ and $\epsilon > 0$ be given, there exist $r$, $r'$, and $r''$ in $D$ such that $r < r' < x < r''$ and $F(r'') - F(r) \le \epsilon$. Then we have

$$
F(r) \le G(r') \le F(x) \le G(r'') \le F(r'') \le F(r) + \epsilon;
$$

and

$$
F_{n_k}(r') \le F_{n_k}(x) \le F_{n_k}(r'').
$$

From these (8) follows, since $F_{n_k}(r') \to G(r')$, $F_{n_k}(r'') \to G(r'')$, and $\epsilon$ is arbitrary.

To $F$ corresponds a (unique) s.p.m. $\mu$ such that

$$
F(x) - F(-\infty) = \mu((-\infty, x])
$$

as in Theorem 2.2.2. Now the relation (8) yields, upon taking differences:

$$
\forall a \in C, b \in C, a < b: \quad \lim_{k\to\infty} \mu_{n_k}(a, b] = \mu(a, b].
$$

Thus $\mu_{n_k} \xrightarrow{v} \mu$, and the theorem is proved.
We say that $F_n$ converges vaguely to $F$ and write $F_n \xrightarrow{v} F$ for $\mu_n \xrightarrow{v} \mu$ where $\mu_n$ and $\mu$ are the s.p.m.'s corresponding to the s.d.f.'s $F_n$ and $F$.

The reader should be able to confirm the truth of the following proposition about real numbers. Let $\{x_n\}$ be a sequence of real numbers such that every subsequence that tends to a limit ($\pm\infty$ allowed) has the same value for the limit; then the whole sequence tends to this limit. In particular a bounded sequence such that every convergent subsequence has the same limit is convergent to this limit.

The next theorem generalizes this result to vague convergence of s.p.m.'s. It is not contained in the preceding proposition but can be reduced to it if we use the properties of vague convergence; see also Exercise 9 below.

Theorem 4.3.4. If every vaguely convergent subsequence of the sequence of s.p.m.'s $\{\mu_n\}$ converges to the same $\mu$, then $\mu_n \xrightarrow{v} \mu$.

PROOF. To prove the theorem by contraposition, suppose $\mu_n$ does not converge vaguely to $\mu$. Then by Theorem 4.3.1, (ii), there exists a continuity interval $(a, b)$ of $\mu$ such that $\mu_n(a, b)$ does not converge to $\mu(a, b)$. By the Bolzano–Weierstrass theorem there exists a subsequence $\{n_k\}$ tending to
infinity such that the numbers $\mu_{n_k}(a, b)$ converge to a limit, say $L \ne \mu(a, b)$. By Theorem 4.3.3, the sequence $\{\mu_{n_k}, k \ge 1\}$ contains a subsequence, say $\{\mu_{n_k'}, k \ge 1\}$, which converges vaguely, hence to $\mu$ by hypothesis of the theorem. Hence again by Theorem 4.3.1, (ii), we have

$$
\mu_{n_k'}(a, b) \to \mu(a, b).
$$

But the left side also $\to L$, which is a contradiction.

EXERCISES

1. Perhaps the most logical approach to vague convergence is as follows. The sequence $\{\mu_n, n \ge 1\}$ of s.p.m.'s is said to converge vaguely iff there exists a dense subset $D$ of $R^1$ such that for every $a \in D$, $b \in D$, $a < b$, the sequence $\{\mu_n(a, b], n \ge 1\}$ converges. The definition given before implies this, of course, but prove the converse.

2. Prove that if (1) is true, then there exists a dense set $D'$, such that $\mu_n(I) \to \mu(I)$ where $I$ may be any of the four intervals $(a, b)$, $(a, b]$, $[a, b)$, $[a, b]$ with $a \in D'$, $b \in D'$.

3. Can a sequence of absolutely continuous p.m.'s converge vaguely to a discrete p.m.? Can a sequence of discrete p.m.'s converge vaguely to an absolutely continuous p.m.?

4. If a sequence of p.m.'s converges vaguely to an atomless p.m., then the convergence is uniform for all intervals, finite or infinite. (This is due to Pólya.)

5. Let $\{f_n\}$ be a sequence of functions increasing in $R^1$ and uniformly bounded there: $\sup_{n,x} |f_n(x)| \le M < \infty$. Prove that there exists an increasing function $f$ on $R^1$ and a subsequence $\{n_k\}$ such that $f_{n_k}(x) \to f(x)$ for every $x$. (This is a form of Theorem 4.3.3 frequently given; the insistence on "every $x$" requires an additional argument.)

6. Let $\{\mu_n\}$ be a sequence of finite measures on $\mathcal{B}^1$. It is said to converge vaguely to a measure $\mu$ iff (1) holds. The limit $\mu$ is not necessarily a finite measure. But if $\mu_n(R^1)$ is bounded in $n$, then $\mu$ is finite.

7. If $\mathcal{P}_n$ is a sequence of p.m.'s on $(\Omega, \mathcal{F})$ such that $\mathcal{P}_n(E)$ converges for every $E \in \mathcal{F}$, then the limit is a p.m. $\mathcal{P}$. Furthermore, if $f$ is bounded and $\mathcal{F}$-measurable, then

$$
\int_\Omega f \, d\mathcal{P}_n \to \int_\Omega f \, d\mathcal{P}.
$$

(The first assertion is the Vitali–Hahn–Saks theorem and rather deep, but it can be proved by reducing it to a problem of summability; see A. Rényi, [24].)
8. If $\mu_n$ and $\mu$ are p.m.'s and $\mu_n(E) \to \mu(E)$ for every open set $E$, then this is also true for every Borel set. [HINT: Use (7) of Sec. 2.2.]

9. Prove a convergence theorem in metric space that will include both Theorem 4.3.3 for p.m.'s and the analogue for real numbers given before the theorem. [HINT: Use Exercise 9 of Sec. 4.4.]

4.4 Continuation
We proceed to discuss another kind of criterion, which is becoming ever more popular in measure theory as well as functional analysis. This has to do with classes of continuous functions on $R^1$:

$C_K$ = the class of continuous functions $f$ each vanishing outside a compact set $K(f)$;
$C_0$ = the class of continuous functions $f$ such that $\lim_{|x|\to\infty} f(x) = 0$;
$C_B$ = the class of bounded continuous functions;
$C$ = the class of continuous functions.

We have $C_K \subset C_0 \subset C_B \subset C$. It is well known that $C_0$ is the closure of $C_K$ with respect to uniform convergence.
An arbitrary function $f$ defined on an arbitrary space is said to have support in a subset $S$ of the space iff it vanishes outside $S$. Thus if $f \in C_K$, then it has support in a certain compact set, hence also in a certain compact interval. A step function on a finite or infinite interval $(a, b)$ is one with support in it such that $f(x) = c_j$ for $x \in (a_j, a_{j+1})$ for $1 \le j \le \ell$, where $\ell$ is finite, $a = a_1 < \cdots < a_{\ell+1} = b$, and the $c_j$'s are arbitrary real numbers. It will be called $D$-valued iff all the $a_j$'s and $c_j$'s belong to a given set $D$. When the interval $(a, b)$ is $R^1$, $f$ is called just a step function. Note that the values of $f$ at the points $a_j$ are left unspecified to allow for flexibility; frequently they are defined by right or left continuity. The following lemma is basic.

Approximation Lemma. Suppose that $f \in C_K$ has support in the compact interval $[a, b]$. Given any dense subset $A$ of $R^1$ and $\varepsilon > 0$, there exists an $A$-valued step function $f_\varepsilon$ on $(a, b)$ such that

$$(1)\qquad \sup_{x \in R^1} |f(x) - f_\varepsilon(x)| \le \varepsilon.$$

If $f \in C_0$, the same is true if $(a, b)$ is replaced by $R^1$.

This lemma becomes obvious as soon as its geometric meaning is grasped. In fact, for any $f$ in $C_K$, one may even require that either $f_\varepsilon \le f$ or $f_\varepsilon \ge f$.

The problem is then that of the approximation of the graph of a plane curve
by inscribed or circumscribed polygons, as treated in elementary calculus. But
let us remark that the lemma is also a particular case of the Stone–Weierstrass
theorem (see, e.g., Rudin [2]) and should be so verified by the reader. Such
a sledgehammer approach has its merit, as other kinds of approximation
soon to be needed can also be subsumed under the same theorem. Indeed,
the discussion in this section is meant in part to introduce some modern
terminology to the relevant applications in probability theory. We can now
state the following alternative criterion for vague convergence.
Theorem 4.4.1. Let $\{\mu_n\}$ and $\mu$ be s.p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if

$$(2)\qquad \forall f \in C_K\ [\text{or } C_0]:\quad \int_{R^1} f(x)\,\mu_n(dx) \to \int_{R^1} f(x)\,\mu(dx).$$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$; (2) is true by definition when $f$ is the indicator of $(a, b]$ for $a \in D$, $b \in D$, where $D$ is the set in (1) of Sec. 4.3. Hence by the linearity of integrals it is also true when $f$ is any $D$-valued step function. Now let $f \in C_0$ and $\varepsilon > 0$; by the approximation lemma there exists a $D$-valued step function $f_\varepsilon$ satisfying (1). We have

$$(3)\qquad \left|\int f\,d\mu_n - \int f\,d\mu\right| \le \left|\int (f - f_\varepsilon)\,d\mu_n\right| + \left|\int f_\varepsilon\,d\mu_n - \int f_\varepsilon\,d\mu\right| + \left|\int (f_\varepsilon - f)\,d\mu\right|.$$

By the modulus inequality and mean value theorem for integrals (see Sec. 3.2), the first term on the right side above is bounded by

$$\int |f - f_\varepsilon|\,d\mu_n \le \varepsilon \int d\mu_n \le \varepsilon;$$

similarly for the third term. The second term converges to zero as $n \to \infty$ because $f_\varepsilon$ is a $D$-valued step function. Hence the left side of (3) is bounded by $2\varepsilon$ as $n \to \infty$, and so converges to zero since $\varepsilon$ is arbitrary.

Conversely, suppose (2) is true for $f \in C_K$. Let $A$ be the set of atoms of $\mu$ as in the proof of Theorem 4.3.2; we shall show that vague convergence holds with $D = A^c$. Let $g = 1_{(a,b]}$ be the indicator of $(a, b]$ where $a \in D$, $b \in D$. Then, given $\varepsilon > 0$, there exists $\delta > 0$ such that $a + \delta < b - \delta$, and such that $\mu(U) < \varepsilon$, where

$$U = (a - \delta, a + \delta) \cup (b - \delta, b + \delta).$$

Now define $g_1$ to be the function that coincides with $g$ on $(-\infty, a] \cup [a + \delta, b - \delta] \cup [b, \infty)$ and that is linear in $(a, a + \delta)$ and in $(b - \delta, b)$; $g_2$ to be the function that coincides with $g$ on $(-\infty, a - \delta] \cup [a, b] \cup [b + \delta, \infty)$ and that is linear in $(a - \delta, a)$ and $(b, b + \delta)$. It is clear that $g_1 \le g \le g_2 \le g_1 + 1_U$ and consequently

$$(4)\qquad \int g_1\,d\mu_n \le \int g\,d\mu_n \le \int g_2\,d\mu_n,$$

$$(5)\qquad \int g_1\,d\mu \le \int g\,d\mu \le \int g_2\,d\mu.$$

Since $g_1 \in C_K$, $g_2 \in C_K$, it follows from (2) that the extreme terms in (4) converge to the corresponding terms in (5). Since

$$\int g_2\,d\mu - \int g_1\,d\mu \le \int_U 1\,d\mu = \mu(U) < \varepsilon,$$

and $\varepsilon$ is arbitrary, it follows that the middle term in (4) also converges to that in (5), proving the assertion.
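Criterion (2) lends itself to a quick numerical check. The sketch below is ours, not the author's: it is Python with NumPy, the tent-shaped test function is an arbitrary member of $C_K$, and $\mu_n$ is the p.m. putting mass $1/n$ at each point $k/n$. Since $\mu_n$ converges vaguely to the uniform distribution on $(0, 1]$, the integrals should approach $1/4$, the integral of the tent.

    import numpy as np

    def f(x):
        # a continuous "tent" in C_K, supported on the compact interval [1/4, 3/4]
        return np.maximum(0.0, 1.0 - 4.0 * np.abs(x - 0.5))

    exact = 0.25                          # integral of f over [0, 1], by hand
    for n in (10, 100, 1000, 10000):
        grid = np.arange(1, n + 1) / n    # atoms of mu_n, each with mass 1/n
        approx = f(grid).mean()           # integral of f against mu_n
        print(n, approx, abs(approx - exact))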

Corollary. If $\{\mu_n\}$ is a sequence of s.p.m.'s such that for every $f \in C_K$,

$$\lim_n \int_{R^1} f(x)\,\mu_n(dx)$$

exists, then $\{\mu_n\}$ converges vaguely.

For by Theorem 4.3.3 a subsequence converges vaguely, say to $\mu$. By Theorem 4.4.1, the limit above is equal to $\int_{R^1} f(x)\,\mu(dx)$. This must then be the same for every vaguely convergent subsequence, according to the hypothesis of the corollary. The vague limit of every such subsequence is therefore uniquely determined (why?) to be $\mu$, and the corollary follows from Theorem 4.3.4.
Theorem 4.4.2. Let $\{\mu_n\}$ and $\mu$ be p.m.'s. Then $\mu_n \xrightarrow{v} \mu$ if and only if

$$(6)\qquad \forall f \in C_B:\quad \int_{R^1} f(x)\,\mu_n(dx) \to \int_{R^1} f(x)\,\mu(dx).$$

PROOF. Suppose $\mu_n \xrightarrow{v} \mu$. Given $\varepsilon > 0$, there exist $a$ and $b$ in $D$ such that

$$(7)\qquad \mu((a, b]^c) = 1 - \mu((a, b]) < \varepsilon.$$

It follows from vague convergence that there exists $n_0(\varepsilon)$ such that if $n \ge n_0(\varepsilon)$, then

$$(8)\qquad \mu_n((a, b]^c) = 1 - \mu_n((a, b]) < \varepsilon.$$

Let $f \in C_B$ be given and suppose that $|f| \le M < \infty$. Consider the function $f_\varepsilon$ which is equal to $f$ on $[a, b]$, to zero on $(-\infty, a-1] \cup [b+1, \infty)$, and which is linear in $[a-1, a]$ and in $[b, b+1]$. Then $f_\varepsilon \in C_K$ and $|f - f_\varepsilon| \le 2M$. We have by Theorem 4.4.1

$$(9)\qquad \int_{R^1} f_\varepsilon\,d\mu_n \to \int_{R^1} f_\varepsilon\,d\mu.$$

On the other hand, we have

$$(10)\qquad \int_{R^1} |f - f_\varepsilon|\,d\mu_n \le 2M \int_{(a,b]^c} d\mu_n \le 2M\varepsilon$$

by (8). A similar estimate holds with $\mu$ replacing $\mu_n$ above, by (7). Now the argument leading from (3) to (2) finishes the proof of (6) in the same way. This proves that $\mu_n \xrightarrow{v} \mu$ implies (6); the converse has already been proved in Theorem 4.4.1.
Theorems 4.3.3 and 4.3.4 deal with s.p.m.'s. Even if the given sequence $\{\mu_n\}$ consists only of strict p.m.'s, the sequential vague limit may not be one. This is the sense of Example 2 in Sec. 4.3. It is sometimes demanded that such a limit be a p.m. The following criterion is not deep, but applicable.

Theorem 4.4.3. Let a family of p.m.'s $\{\mu_\alpha, \alpha \in A\}$ be given on an arbitrary index set $A$. In order that every sequence of them contains a subsequence which converges vaguely to a p.m., it is necessary and sufficient that the following condition be satisfied: for any $\varepsilon > 0$, there exists a finite interval $I$ such that

$$(11)\qquad \inf_{\alpha \in A} \mu_\alpha(I) > 1 - \varepsilon.$$

PROOF. Suppose (11) holds. For any sequence $\{\mu_n\}$ from the family, there exists a subsequence $\{\mu_n'\}$ such that $\mu_n' \xrightarrow{v} \mu$. We show that $\mu$ is a p.m. Let $J$ be a continuity interval of $\mu$ which contains the $I$ in (11). Then

$$\mu(R^1) \ge \mu(J) = \lim_n \mu_n'(J) \ge \varlimsup_n \mu_n'(I) \ge 1 - \varepsilon.$$

Since $\varepsilon$ is arbitrary, $\mu(R^1) = 1$. Conversely, suppose the condition involving (11) is not satisfied; then there exist $\varepsilon > 0$, a sequence of finite intervals $I_n$ increasing to $R^1$, and a sequence $\{\mu_n\}$ from the family such that

$$\forall n:\quad \mu_n(I_n) \le 1 - \varepsilon.$$

Let $\{\mu_n'\}$ and $\mu$ be as before and $J$ any continuity interval of $\mu$. Then $J \subset I_n$ for all sufficiently large $n$, so that

$$\mu(J) = \lim_n \mu_n'(J) \le \varlimsup_n \mu_n'(I_n) \le 1 - \varepsilon.$$

Thus $\mu(R^1) \le 1 - \varepsilon$ and $\mu$ is not a p.m. The theorem is proved.
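The role of (11) can be watched in a small simulation (ours, for illustration only). In the hypothetical family below, $\mu_n = \frac12 N(0,1) + \frac12 \delta_n$: half of the mass escapes to infinity, so (11) fails for $\varepsilon < \frac12$, and the integrals of a $C_K$ test function converge to those of the vague limit $\frac12 N(0,1)$, an s.p.m. of total mass $\frac12$.

    import numpy as np

    rng = np.random.default_rng(0)

    def f(x):
        # C_K test function supported on [-3, 3]
        return np.maximum(0.0, 1.0 - np.abs(x) / 3.0)

    size = 200000
    for n in (5, 50, 500):
        # sample from mu_n = (1/2) N(0,1) + (1/2) delta_n
        x = np.where(rng.random(size) < 0.5, rng.standard_normal(size), float(n))
        print(n, f(x).mean())   # settles near one half of the N(0,1) integral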

A family of p.m.’s satisfying the condition above involving (11) is said to


be tight. The preceding theorem can be stated as follows: a family of p.m.’s is
relatively compact if and only if it is tight. The word “relatively” purports that
the limit need not belong to the family; the word “compact” is an abbreviation
of “sequentially vaguely convergent to a strict p.m.” Extension of the result
to p.m.’s in more general topological spaces is straight-forward but plays an
important role in the convergence of stochastic processes.
The new definition of vague convergence in Theorem 4.4.1 has the
advantage over the older ones in that it can be at once carried over to measures
in more general topological spaces. There is no substitute for “intervals” in
such a space but the classes CK , C0 and CB are readily available. We will
illustrate the general approach by indicating one more result in this direction.
Recall the notion of a lower semicontinuous function on $R^1$, defined by:

$$(12)\qquad \forall x \in R^1:\quad f(x) \le \varliminf_{\substack{y \to x \\ y \ne x}} f(y).$$

There are several equivalent definitions (see, e.g., Royden [5]) but the following characterization is most useful: $f$ is bounded and lower semicontinuous if and only if there exists a sequence of functions $f_k \in C_B$ which increases to $f$ everywhere; and we call $f$ upper semicontinuous iff $-f$ is lower semicontinuous. Usually $f$ is allowed to be extended-valued; but to avoid complications we will deal with bounded functions only and denote by $L$ and $U$ respectively the classes of bounded lower semicontinuous and bounded upper semicontinuous functions.
Theorem 4.4.4. If $\{\mu_n\}$ and $\mu$ are p.m.'s, then $\mu_n \xrightarrow{v} \mu$ if and only if one of the two conditions below is satisfied:

$$(13)\qquad \forall f \in L:\ \varliminf_n \int f(x)\,\mu_n(dx) \ge \int f(x)\,\mu(dx); \qquad \forall g \in U:\ \varlimsup_n \int g(x)\,\mu_n(dx) \le \int g(x)\,\mu(dx).$$

PROOF. We begin by observing that the two conditions above are equivalent by putting $f = -g$. Now suppose $\mu_n \xrightarrow{v} \mu$ and let $f_k \in C_B$, $f_k \uparrow f$. Then we have

$$(14)\qquad \varliminf_n \int f(x)\,\mu_n(dx) \ge \varliminf_n \int f_k(x)\,\mu_n(dx) = \int f_k(x)\,\mu(dx)$$

by Theorem 4.4.2. Letting $k \to \infty$, the last integral above converges to $\int f(x)\,\mu(dx)$ by monotone convergence. This gives the first inequality in (13). Conversely, suppose the latter is satisfied and let $\varphi \in C_B$; then $\varphi$ belongs to both $L$ and $U$, so that

$$\int \varphi(x)\,\mu(dx) \le \varliminf_n \int \varphi(x)\,\mu_n(dx) \le \varlimsup_n \int \varphi(x)\,\mu_n(dx) \le \int \varphi(x)\,\mu(dx),$$

which proves

$$\lim_n \int \varphi(x)\,\mu_n(dx) = \int \varphi(x)\,\mu(dx).$$

Hence $\mu_n \xrightarrow{v} \mu$ by Theorem 4.4.2.

Remark. (13) remains true if, e.g., $f$ is lower semicontinuous, with $+\infty$ as a possible value but bounded below.

Corollary. The conditions in (13) may be replaced by the following:

for every open $O$: $\varliminf_n \mu_n(O) \ge \mu(O)$;
for every closed $C$: $\varlimsup_n \mu_n(C) \le \mu(C)$.

We leave this as an exercise.


Finally, we return to the connection between the convergence of r.v.’s
and that of their distributions.

DEFINITION OF CONVERGENCE "IN DISTRIBUTION" (in dist.). A sequence of r.v.'s $\{X_n\}$ is said to converge in distribution to $F$ iff the sequence $\{F_n\}$ of corresponding d.f.'s converges vaguely to the d.f. $F$.

If $X$ is an r.v. that has the d.f. $F$, then by an abuse of language we shall also say that $\{X_n\}$ converges in dist. to $X$.

Theorem 4.4.5. Let $\{F_n\}$, $F$ be the d.f.'s of the r.v.'s $\{X_n\}$, $X$. If $X_n \to X$ in pr., then $F_n \xrightarrow{v} F$. More briefly stated, convergence in pr. implies convergence in dist.

PROOF. If $X_n \to X$ in pr., then for each $f \in C_K$ we have $f(X_n) \to f(X)$ in pr., as easily seen from the uniform continuity of $f$ (actually this is true for any continuous $f$; see Exercise 10 of Sec. 4.1). Since $f$ is bounded, the convergence holds also in $L^1$ by Theorem 4.1.4. It follows that

$$E\{f(X_n)\} \to E\{f(X)\},$$

which is just the relation (2) in another guise; hence $\mu_n \xrightarrow{v} \mu$.
Convergence of r.v.’s in dist. is merely a convenient turn of speech; it
does not have the usual properties associated with convergence. For instance,
4.4 CONTINUATION 97

if Xn ! X in dist. and Yn ! Y in dist., it does not follow by any means


that Xn C Yn will converge in dist. to X C Y. This is in contrast to the true
convergence concepts discussed before; cf. Exercises 3 and 6 of Sec. 4.1. But
if Xn and Yn are independent, then the preceding assertion is indeed true as a
property of the convergence of convolutions of distributions (see Chapter 6).
However, in the simple situation of the next theorem no independence assump-
tion is needed. The result is useful in dealing with limit distributions in the
presence of nuisance terms.

Theorem 4.4.6. If $X_n \to X$ in dist. and $Y_n \to 0$ in dist., then

(a) $X_n + Y_n \to X$ in dist.;
(b) $X_n Y_n \to 0$ in dist.

PROOF. We begin with the remark that for any constant $c$, $Y_n \to c$ in dist. is equivalent to $Y_n \to c$ in pr. (Exercise 4 below). To prove (a), let $f \in C_K$, $|f| \le M$. Since $f$ is uniformly continuous, given $\varepsilon > 0$ there exists $\delta$ such that $|x - y| \le \delta$ implies $|f(x) - f(y)| \le \varepsilon$. Hence we have

$$E\{|f(X_n + Y_n) - f(X_n)|\} \le \varepsilon\,P\{|f(X_n + Y_n) - f(X_n)| \le \varepsilon\} + 2M\,P\{|f(X_n + Y_n) - f(X_n)| > \varepsilon\} \le \varepsilon + 2M\,P\{|Y_n| > \delta\}.$$

The last-written probability tends to zero as $n \to \infty$; it follows that

$$\lim_{n\to\infty} E\{f(X_n + Y_n)\} = \lim_{n\to\infty} E\{f(X_n)\} = E\{f(X)\}$$

by Theorem 4.4.1, and (a) follows by the same theorem.

To prove (b), for given $\varepsilon > 0$ we choose $A_0$ so that both $\pm A_0$ are points of continuity of the d.f. of $X$, and so large that

$$\lim_{n\to\infty} P\{|X_n| > A_0\} = P\{|X| > A_0\} < \varepsilon.$$

This means $P\{|X_n| > A_0\} < \varepsilon$ for $n > n_0(\varepsilon)$. Furthermore we choose $A \ge A_0$ so that the same inequality holds also for $n \le n_0(\varepsilon)$. Now it is clear that

$$P\{|X_n Y_n| > \varepsilon\} \le P\{|X_n| > A\} + P\left\{|Y_n| > \frac{\varepsilon}{A}\right\} \le \varepsilon + P\left\{|Y_n| > \frac{\varepsilon}{A}\right\}.$$

The last-written probability tends to zero as $n \to \infty$, and (b) follows.

Corollary. If $X_n \to X$, $\alpha_n \to a$, $\beta_n \to b$, all in dist., where $a$ and $b$ are constants, then $\alpha_n X_n + \beta_n \to aX + b$ in dist.
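Theorem 4.4.6, often cited elsewhere as Slutsky's theorem, is easy to observe numerically. In the sketch below (ours; the particular distributions are an arbitrary choice) the $X_n$ are standardized binomials converging in dist. to a standard normal $X$, and $Y_n$ is a small nuisance term tending to 0 in pr.; the empirical d.f. of $X_n + Y_n$ should be close to $\Phi(t) \approx 0.159,\ 0.5,\ 0.933$ at the three points tested.

    import numpy as np

    rng = np.random.default_rng(1)
    samples, m = 100000, 400
    # X_n: standardized Binomial(m, 1/2), close in dist. to N(0,1)
    xn = (rng.binomial(m, 0.5, samples) - m / 2) / np.sqrt(m / 4)
    # Y_n: nuisance term going to 0 in pr. as m grows
    yn = rng.standard_normal(samples) / np.sqrt(m)
    for t in (-1.0, 0.0, 1.5):
        print(t, np.mean(xn + yn <= t))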

EXERCISES

*1. Let $\mu_n$ and $\mu$ be p.m.'s such that $\mu_n \xrightarrow{v} \mu$. Show that the conclusion in (2) need not hold if (a) $f$ is bounded and Borel measurable and all $\mu_n$ and $\mu$ are absolutely continuous, or (b) $f$ is continuous except at one point and every $\mu_n$ is absolutely continuous. (To find even sharper counterexamples would not be too easy, in view of Exercise 10 of Sec. 4.5.)

2. Let $\mu_n \xrightarrow{v} \mu$ when the $\mu_n$'s are s.p.m.'s. Then for each $f \in C$ and each finite continuity interval $I$ we have $\int_I f\,d\mu_n \to \int_I f\,d\mu$.

*3. Let $\mu_n$ and $\mu$ be as in Exercise 1. If the $f_n$'s are bounded continuous functions converging uniformly to $f$, then $\int f_n\,d\mu_n \to \int f\,d\mu$.

*4. Give an example to show that convergence in dist. does not imply that in pr. However, show that convergence to the unit mass $\delta_a$ does imply that in pr. to the constant $a$.

5. A set $\{\mu_\alpha\}$ of p.m.'s is tight if and only if the corresponding d.f.'s $\{F_\alpha\}$ converge uniformly in $\alpha$ as $x \to -\infty$ and as $x \to +\infty$.

6. Let the r.v.'s $\{X_\alpha\}$ have the p.m.'s $\{\mu_\alpha\}$. If for some real $r > 0$, $E\{|X_\alpha|^r\}$ is bounded in $\alpha$, then $\{\mu_\alpha\}$ is tight.

7. Prove the Corollary to Theorem 4.4.4.

8. If the r.v.'s $X$ and $Y$ satisfy

$$P\{|X - Y| \ge \varepsilon\} \le \varepsilon$$

for some $\varepsilon$, then their d.f.'s $F$ and $G$ satisfy the inequalities:

$$(15)\qquad \forall x \in R^1:\quad F(x - \varepsilon) - \varepsilon \le G(x) \le F(x + \varepsilon) + \varepsilon.$$

Derive another proof of Theorem 4.4.5 from this.

*9. The Lévy distance of two s.d.f.'s $F$ and $G$ is defined to be the infimum of all $\varepsilon > 0$ satisfying the inequalities in (15). Prove that this is indeed a metric in the space of s.d.f.'s, and that $F_n$ converges to $F$ in this metric if and only if $F_n \xrightarrow{v} F$ and $\int_{-\infty}^{\infty} dF_n \to \int_{-\infty}^{\infty} dF$.

10. Find two sequences of p.m.'s $\{\mu_n\}$ and $\{\nu_n\}$ such that

$$\forall f \in C_K:\quad \int f\,d\mu_n - \int f\,d\nu_n \to 0;$$

but for no finite $(a, b)$ is it true that

$$\mu_n(a, b) - \nu_n(a, b) \to 0.$$

[HINT: Let $\mu_n = \delta_{r_n}$, $\nu_n = \delta_{s_n}$ and choose $\{r_n\}$, $\{s_n\}$ suitably.]

11. Let $\{\mu_n\}$ be a sequence of p.m.'s such that for each $f \in C_B$, the sequence $\int_{R^1} f\,d\mu_n$ converges; then $\mu_n \xrightarrow{v} \mu$, where $\mu$ is a p.m. [HINT: If the hypothesis is strengthened to include every $f$ in $C$, and convergence of real numbers is interpreted as usual as convergence to a finite limit, the result is easy by taking an $f$ going to $\infty$. In general one may proceed by contradiction using an $f$ that oscillates at infinity.]

*12. Let $F_n$ and $F$ be d.f.'s such that $F_n \xrightarrow{v} F$. Define $G_n$ and $G$ as in Exercise 4 of Sec. 3.1. Then $G_n \to G$ a.e. [HINT: Do this first when $F_n$ and $F$ are continuous and strictly increasing. The general case is obtained by smoothing $F_n$ and $F$ by convoluting with a uniform distribution in $[-\delta, +\delta]$ and letting $\delta \downarrow 0$; see the end of Sec. 6.1 below.]

4.5 Uniform integrability; convergence of moments


The function $|x|^r$, $r > 0$, is in $C$ but not in $C_B$, hence Theorem 4.4.2 does not apply to it. Indeed, we have seen in Example 2 of Sec. 4.1 that even convergence a.e. does not imply convergence of any moment of order $r > 0$. For, given $r$, a slight modification of that example will yield $X_n \to X$ a.e., $E(|X_n|^r) = 1$ but $E(|X|^r) = 0$.

It is useful to have conditions to ensure the convergence of moments when $X_n$ converges a.e. We begin with a standard theorem in this direction from classical analysis.

Theorem 4.5.1. If $X_n \to X$ a.e., then for every $r > 0$:

$$(1)\qquad E(|X|^r) \le \varliminf_{n\to\infty} E(|X_n|^r).$$

If $X_n \to X$ in $L^r$, and $X \in L^r$, then $E(|X_n|^r) \to E(|X|^r)$.

PROOF. (1) is just a case of Fatou's lemma (see Sec. 3.2):

$$\int_\Omega |X|^r\,dP = \int_\Omega \lim_n |X_n|^r\,dP \le \varliminf_n \int_\Omega |X_n|^r\,dP,$$

where $+\infty$ is allowed as a value for each member with the usual convention. In case of convergence in $L^r$, $r \ge 1$, we have by Minkowski's inequality (Sec. 3.2), since $X = X_n + (X - X_n)$ and $X_n = X + (X_n - X)$:

$$E(|X_n|^r)^{1/r} - E(|X_n - X|^r)^{1/r} \le E(|X|^r)^{1/r} \le E(|X_n|^r)^{1/r} + E(|X_n - X|^r)^{1/r}.$$

Letting $n \to \infty$ we obtain the second assertion of the theorem. For $0 < r < 1$, the inequality $|x + y|^r \le |x|^r + |y|^r$ implies that

$$E(|X_n|^r) - E(|X - X_n|^r) \le E(|X|^r) \le E(|X_n|^r) + E(|X - X_n|^r),$$

whence the same conclusion.
The next result should be compared with Theorem 4.1.4.

Theorem 4.5.2. If $\{X_n\}$ converges in dist. to $X$, and for some $p > 0$, $\sup_n E\{|X_n|^p\} = M < \infty$, then for each $r < p$:

$$(2)\qquad \lim_{n\to\infty} E(|X_n|^r) = E(|X|^r) < \infty.$$

If $r$ is a positive integer, then we may replace $|X_n|^r$ and $|X|^r$ above by $X_n^r$ and $X^r$.

PROOF. We prove the second assertion, since the first is similar. Let $F_n$, $F$ be the d.f.'s of $X_n$, $X$; then $F_n \xrightarrow{v} F$. For $A > 0$ define $f_A$ on $R^1$ as follows:

$$(3)\qquad f_A(x) = \begin{cases} x^r, & \text{if } |x| \le A; \\ A^r, & \text{if } x > A; \\ (-A)^r, & \text{if } x < -A. \end{cases}$$

Then $f_A \in C_B$, hence by Theorem 4.4.4 the "truncated moments" converge:

$$\int_{-\infty}^{\infty} f_A(x)\,dF_n(x) \to \int_{-\infty}^{\infty} f_A(x)\,dF(x).$$

Next we have

$$\int_{-\infty}^{\infty} |f_A(x) - x^r|\,dF_n(x) \le \int_{|x|>A} |x|^r\,dF_n(x) = \int_{|X_n|>A} |X_n|^r\,dP \le \frac{1}{A^{p-r}} \int_\Omega |X_n|^p\,dP \le \frac{M}{A^{p-r}}.$$

The last term does not depend on $n$, and converges to zero as $A \to \infty$. It follows that, as $A \to \infty$, $\int_{-\infty}^{\infty} f_A\,dF_n$ converges uniformly in $n$ to $\int_{-\infty}^{\infty} x^r\,dF_n$. Hence by a standard theorem on the inversion of repeated limits, we have

$$(4)\qquad \int_{-\infty}^{\infty} x^r\,dF = \lim_{A\to\infty} \int_{-\infty}^{\infty} f_A\,dF = \lim_{A\to\infty}\lim_{n\to\infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n\to\infty}\lim_{A\to\infty} \int_{-\infty}^{\infty} f_A\,dF_n = \lim_{n\to\infty} \int_{-\infty}^{\infty} x^r\,dF_n.$$
We now introduce the concept of uniform integrability, which is of basic
importance in this connection. It is also an essential hypothesis in certain
convergence questions arising in the theory of martingales (to be treated in
Chapter 9).

DEFINITION OF UNIFORM INTEGRABILITY. A family of r.v.'s $\{X_t\}$, $t \in T$, where $T$ is an arbitrary index set, is said to be uniformly integrable iff

$$(5)\qquad \lim_{A\to\infty} \int_{|X_t|>A} |X_t|\,dP = 0$$

uniformly in $t \in T$.

Theorem 4.5.3. The family $\{X_t\}$ is uniformly integrable if and only if the following two conditions are satisfied:

(a) $E(|X_t|)$ is bounded in $t \in T$;
(b) for every $\varepsilon > 0$, there exists $\delta(\varepsilon) > 0$ such that for any $E \in \mathcal{F}$:

$$P(E) < \delta(\varepsilon) \Longrightarrow \int_E |X_t|\,dP < \varepsilon \quad \text{for every } t \in T.$$

PROOF. Clearly (5) implies (a). Next, let $E \in \mathcal{F}$ and write $E_t$ for the set $\{\omega: |X_t(\omega)| > A\}$. We have by the mean value theorem

$$\int_E |X_t|\,dP = \left( \int_{E \cap E_t} + \int_{E \setminus E_t} \right) |X_t|\,dP \le \int_{E_t} |X_t|\,dP + A\,P(E).$$

Given $\varepsilon > 0$, there exists $A = A(\varepsilon)$ such that the last-written integral is less than $\varepsilon/2$ for every $t$, by (5). Hence (b) will follow if we set $\delta = \varepsilon/(2A)$. Thus (5) implies (b).

Conversely, suppose that (a) and (b) are true. Then by the Chebyshev inequality we have for every $t$,

$$P\{|X_t| > A\} \le \frac{E(|X_t|)}{A} \le \frac{M}{A},$$

where $M$ is the bound indicated in (a). Hence if $A > M/\delta$, then $P(E_t) < \delta$ and we have by (b):

$$\int_{E_t} |X_t|\,dP < \varepsilon.$$

Thus (5) is true.
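Conditions (a) and (b), or (5) directly, can be probed empirically. The sketch below is ours; the two families and the sample sizes are arbitrary. It estimates $E(|X|;\ |X| > A)$ for increasing $A$: for the normal family the tail contribution dies out, while for r.v.'s of the type $X = n\,1_{\{U < 1/n\}}$ with $n = 10^4$, which have $E(|X|) = 1$, the whole unit of mean sits beyond every $A$ shown, the empirical signature of a family that is not uniformly integrable.

    import numpy as np

    rng = np.random.default_rng(2)

    def tail_means(sample, levels):
        # empirical version of (5): E(|X|; |X| > A) for a range of A's
        a = np.abs(sample)
        return [round(a[a > A].sum() / len(a), 4) for A in levels]

    levels = [1, 10, 100, 1000]
    print(tail_means(rng.standard_normal(10**6), levels))  # dies out
    n = 10**4
    x = np.where(rng.random(10**6) < 1.0 / n, float(n), 0.0)
    print(tail_means(x, levels))                           # stays near 1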

Theorem 4.5.4. Let $0 < r < \infty$, $X_n \in L^r$, and $X_n \to X$ in pr. Then the following three propositions are equivalent:

(i) $\{|X_n|^r\}$ is uniformly integrable;
(ii) $X_n \to X$ in $L^r$;
(iii) $E(|X_n|^r) \to E(|X|^r) < \infty$.

PROOF. Suppose (i) is true; since $X_n \to X$ in pr., by Theorem 4.2.3 there exists a subsequence $\{n_k\}$ such that $X_{n_k} \to X$ a.e. By Theorem 4.5.1 and (a) above, $X \in L^r$. The easy inequality

$$|X_n - X|^r \le 2^r \{|X_n|^r + |X|^r\},$$

valid for all $r > 0$, together with Theorem 4.5.3 then implies that the sequence $\{|X_n - X|^r\}$ is also uniformly integrable. For each $\varepsilon > 0$, we have

$$(6)\qquad \int_\Omega |X_n - X|^r\,dP = \int_{|X_n - X|>\varepsilon} |X_n - X|^r\,dP + \int_{|X_n - X|\le\varepsilon} |X_n - X|^r\,dP \le \int_{|X_n - X|>\varepsilon} |X_n - X|^r\,dP + \varepsilon^r.$$

Since $P\{|X_n - X| > \varepsilon\} \to 0$ as $n \to \infty$ by hypothesis, it follows from (b) above that the last written integral in (6) also tends to zero. This being true for every $\varepsilon > 0$, (ii) follows.

Suppose (ii) is true; then we have (iii) by the second assertion of Theorem 4.5.1.

Finally suppose (iii) is true. To prove (i), let $A > 0$ and consider a function $f_A$ in $C_K$ satisfying the following conditions:

$$f_A(x) \begin{cases} = |x|^r & \text{for } |x|^r \le A; \\ \le |x|^r & \text{for } A < |x|^r \le A + 1; \\ = 0 & \text{for } |x|^r > A + 1; \end{cases}$$

cf. the proof of Theorem 4.4.1 for the construction of such a function. Hence we have

$$\varliminf_{n\to\infty} \int_{|X_n|^r \le A+1} |X_n|^r\,dP \ge \lim_{n\to\infty} E\{f_A(X_n)\} = E\{f_A(X)\} \ge \int_{|X|^r \le A} |X|^r\,dP,$$

where the inequalities follow from the shape of $f_A$, while the limit relation in the middle is as in the proof of Theorem 4.4.5. Subtracting from the limit relation in (iii), we obtain

$$\varlimsup_{n\to\infty} \int_{|X_n|^r > A+1} |X_n|^r\,dP \le \int_{|X|^r > A} |X|^r\,dP.$$

The last integral does not depend on $n$ and converges to zero as $A \to \infty$. This means: for any $\varepsilon > 0$, there exist $A_0 = A_0(\varepsilon)$ and $n_0 = n_0(A_0, \varepsilon)$ such that we have

$$\sup_{n>n_0} \int_{|X_n|^r > A+1} |X_n|^r\,dP < \varepsilon$$

provided that $A > A_0$. Since each $|X_n|^r$ is integrable, there exists $A_1 = A_1(\varepsilon)$ such that the supremum above may be taken over all $n \ge 1$ provided that $A > A_0 \vee A_1$. This establishes (i), and completes the proof of the theorem.

In the remainder of this section the term "moment" will be restricted to a moment of positive integral order. It is well known (see Exercise 5 of Sec. 6.6) that on $(\mathcal{U}, \mathcal{B})$ any p.m., or equivalently its d.f., is uniquely determined by its moments of all orders. Precisely, if $F_1$ and $F_2$ are two d.f.'s such that $F_i(0) = 0$, $F_i(1) = 1$ for $i = 1, 2$, and

$$\forall n \ge 1:\quad \int_0^1 x^n\,dF_1(x) = \int_0^1 x^n\,dF_2(x),$$

then $F_1 \equiv F_2$. The corresponding result is false in $(R^1, \mathcal{B}^1)$ and a further condition on the moments is required to ensure uniqueness. The sufficient condition due to Carleman is as follows:

$$\sum_{r=1}^{\infty} \frac{1}{(m_{2r})^{1/2r}} = +\infty,$$

where $m_r$ denotes the moment of order $r$. When a given sequence of numbers $\{m_r, r \ge 1\}$ uniquely determines a d.f. $F$ such that

$$(7)\qquad m_r = \int_{-\infty}^{\infty} x^r\,dF(x),$$

we say that "the moment problem is determinate" for the sequence. Of course an arbitrary sequence of numbers need not be the sequence of moments for any d.f.; a necessary but far from sufficient condition, for example, is that the Liapounov inequality (Sec. 3.2) be satisfied. We shall not go into these questions here but shall content ourselves with the useful result below, which is often referred to as the "method of moments"; see also Theorem 6.4.5.

Theorem 4.5.5. Suppose there is a unique d.f. $F$ with the moments $\{m_r, r \ge 1\}$, all finite. Suppose that $\{F_n\}$ is a sequence of d.f.'s, each of which has all its moments finite:

$$m_r^{(n)} = \int_{-\infty}^{\infty} x^r\,dF_n.$$

Finally, suppose that for every $r \ge 1$:

$$(8)\qquad \lim_{n\to\infty} m_r^{(n)} = m_r.$$

Then $F_n \xrightarrow{v} F$.

PROOF. Let $\mu_n$ be the p.m. corresponding to $F_n$. By Theorem 4.3.3 there exists a subsequence of $\{\mu_n\}$ that converges vaguely. Let $\{\mu_{n_k}\}$ be any subsequence converging vaguely to some $\mu$. We shall show that $\mu$ is indeed a p.m. with the d.f. $F$. By the Chebyshev inequality, we have for each $A > 0$:

$$\mu_{n_k}((-A, +A]) \ge 1 - A^{-2} m_2^{(n_k)}.$$

Since $m_2^{(n_k)} \to m_2 < \infty$, it follows that as $A \to \infty$, the left side converges uniformly in $k$ to one. Letting $A \to \infty$ along a sequence of points such that both $\pm A$ belong to the dense set $D$ involved in the definition of vague convergence, we obtain as in (4) above:

$$\mu(R^1) = \lim_{A\to\infty} \mu((-A, +A]) = \lim_{A\to\infty}\lim_{k\to\infty} \mu_{n_k}((-A, +A]) = \lim_{k\to\infty}\lim_{A\to\infty} \mu_{n_k}((-A, +A]) = \lim_{k\to\infty} \mu_{n_k}(R^1) = 1.$$

Now for each $r$, let $p$ be the next larger even integer. We have

$$\int_{-\infty}^{\infty} x^p\,d\mu_{n_k} = m_p^{(n_k)} \to m_p,$$

hence $m_p^{(n_k)}$ is bounded in $k$. It follows from Theorem 4.5.2 that

$$\int_{-\infty}^{\infty} x^r\,d\mu_{n_k} \to \int_{-\infty}^{\infty} x^r\,d\mu.$$

But the left side also converges to $m_r$ by (8). Hence by the uniqueness hypothesis $\mu$ is the p.m. determined by $F$. We have therefore proved that every vaguely convergent subsequence of $\{\mu_n\}$, or equivalently $\{F_n\}$, has the same limit $\mu$, or equivalently $F$. Hence the theorem follows from Theorem 4.3.4.
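As a concrete illustration of the method of moments (ours, not from the text): the binomial distributions $B(n, \lambda/n)$ have every moment converging to the corresponding moment of the Poisson distribution with parameter $\lambda$, and the Poisson moments satisfy Carleman's condition, so Theorem 4.5.5 recovers the classical Poisson limit theorem. The sketch below computes both sides numerically.

    import numpy as np
    from math import comb, exp

    lam = 2.0

    def binom_moment(n, r):
        p = lam / n
        pmf = np.array([comb(n, k) * p**k * (1 - p)**(n - k)
                        for k in range(n + 1)])
        return (np.arange(n + 1)**r * pmf).sum()

    def poisson_moment(r, cut=200):
        # build the pmf recursively to avoid huge factorials
        pk, total = exp(-lam), 0.0
        for k in range(cut):
            total += k**r * pk
            pk *= lam / (k + 1)
        return total

    for r in range(1, 5):
        print(r, [round(binom_moment(n, r), 4) for n in (10, 100, 1000)],
              round(poisson_moment(r), 4))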

EXERCISES

1. If $\sup_n |X_n| \in L^p$ and $X_n \to X$ a.e., then $X \in L^p$ and $X_n \to X$ in $L^p$.

2. If $\{X_n\}$ is dominated by some $Y$ in $L^p$, and converges in dist. to $X$, then $E(|X_n|^p) \to E(|X|^p)$.

3. If $X_n \to X$ in dist., and $f \in C$, then $f(X_n) \to f(X)$ in dist.

*4. Exercise 3 may be reduced to Exercise 10 of Sec. 4.1 as follows. Let $F_n$, $1 \le n \le \infty$, be d.f.'s such that $F_n \xrightarrow{v} F_\infty$. Let $\xi$ be uniformly distributed on $[0, 1]$ and put $X_n = F_n^{-1}(\xi)$, $1 \le n \le \infty$, where $F_n^{-1}(y) = \sup\{x: F_n(x) \le y\}$ (cf. Exercise 4 of Sec. 3.1). Then $X_n$ has d.f. $F_n$ and $X_n \to X_\infty$ in pr.

5. Find the moments of the normal d.f. $\Phi$ and the positive normal d.f. $\Phi^+$ below:

$$\Phi(x) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\,dy, \qquad \Phi^+(x) = \begin{cases} \sqrt{\dfrac{2}{\pi}} \displaystyle\int_0^x e^{-y^2/2}\,dy, & \text{if } x \ge 0; \\ 0, & \text{if } x < 0. \end{cases}$$

Show that in either case the moments satisfy Carleman's condition.

6. If $\{X_t\}$ and $\{Y_t\}$ are uniformly integrable, then so are $\{X_t\} \cup \{Y_t\}$ and $\{X_t + Y_t\}$.

7. If $\{X_n\}$ is dominated by some $Y$ in $L^1$ or if it is identically distributed with finite mean, then it is uniformly integrable.

*8. If $\sup_n E(|X_n|^p) < \infty$ for some $p > 1$, then $\{X_n\}$ is uniformly integrable.

9. If $\{X_n\}$ is uniformly integrable, then the sequence

$$\left\{ \frac{1}{n} \sum_{j=1}^{n} X_j,\ n \ge 1 \right\}$$

is uniformly integrable.

*10. Suppose the distributions of $\{X_n, 1 \le n \le \infty\}$ are absolutely continuous with densities $\{g_n\}$ such that $g_n \to g_\infty$ in Lebesgue measure. Then $g_n \to g_\infty$ in $L^1(-\infty, \infty)$, and consequently for every bounded Borel measurable function $f$ we have $E\{f(X_n)\} \to E\{f(X_\infty)\}$. [HINT: $\int (g_\infty - g_n)^+\,dx = \int (g_\infty - g_n)^-\,dx$ and $(g_\infty - g_n)^+ \le g_\infty$; use dominated convergence.]
5  Law of large numbers. Random series

5.1 Simple limit theorems


The various concepts of Chapter 4 will be applied to the so-called "law of large numbers", a famous name in the theory of probability. This has to do with partial sums

$$S_n = \sum_{j=1}^{n} X_j$$

of a sequence of r.v.'s. In the most classical formulation, the "weak" or the "strong" law of large numbers is said to hold for the sequence according as

$$(1)\qquad \frac{S_n - E(S_n)}{n} \to 0$$

in pr. or a.e. This, of course, presupposes the finiteness of $E(S_n)$. A natural generalization is as follows:

$$\frac{S_n - a_n}{b_n} \to 0,$$

where $\{a_n\}$ is a sequence of real numbers and $\{b_n\}$ a sequence of positive numbers tending to infinity. We shall present several stages of the development, even though they overlap each other to some extent, for it is just as important to learn the basic techniques as the results themselves.
The simplest cases follow from Theorems 4.1.4 and 4.2.3, according to which if $Z_n$ is any sequence of r.v.'s, then $E(Z_n^2) \to 0$ implies that $Z_n \to 0$ in pr. and $Z_{n_k} \to 0$ a.e. for a subsequence $\{n_k\}$. Applied to $Z_n = S_n/n$, the first assertion becomes

$$(2)\qquad E(S_n^2) = o(n^2) \Longrightarrow \frac{S_n}{n} \to 0 \text{ in pr.}$$

Now we can calculate $E(S_n^2)$ more explicitly as follows:

$$(3)\qquad E(S_n^2) = E\left( \left( \sum_{j=1}^{n} X_j \right)^2 \right) = \sum_{j=1}^{n} E(X_j^2) + 2 \sum_{1\le j<k\le n} E(X_j X_k).$$

Observe that there are $n^2$ terms above, so that even if all of them are bounded by a fixed constant, only $E(S_n^2) = O(n^2)$ will result, which falls critically short of the hypothesis in (2). The idea then is to introduce certain assumptions to cause enough cancellation among the "mixed terms" in (3). A salient feature of probability theory and its applications is that such assumptions are not only permissible but realistic. We begin with the simplest of its kind.

DEFINITION. Two r.v.’s X and Y are said to be uncorrelated iff both have
finite second moments and
4 E XY D E XE Y.

They are said to be orthogonal iff (4) is replaced by


5 E XY D 0.

The r.v.’s of any family are said to be uncorrelated [orthogonal] iff every two
of them are.

Note that (4) is equivalent to


E fX  E XY  E Yg D 0,

which reduces to (5) when E X D E Y D 0. The requirement of finite


second moments seems unnecessary, but it does ensure the finiteness of
E XY (Cauchy–Schwarz inequality!) as well as that of E X and E Y, and
108 LAW OF LARGE NUMBERS. RANDOM SERIES

without it the definitions are hardly useful. Finally, it is obvious that pairwise
independence implies uncorrelatedness, provided second moments are finite.
If $\{X_n\}$ is a sequence of uncorrelated r.v.'s, then the sequence $\{X_n - E(X_n)\}$ is orthogonal, and for the latter (3) reduces to the fundamental relation below:

$$(6)\qquad \sigma^2(S_n) = \sum_{j=1}^{n} \sigma^2(X_j),$$

which may be called the "additivity of the variance". Conversely, the validity of (6) for $n = 2$ implies that $X_1$ and $X_2$ are uncorrelated. There are only $n$ terms on the right side of (6), hence if these are bounded by a fixed constant we have now $\sigma^2(S_n) = O(n) = o(n^2)$. Thus (2) becomes applicable, and we have proved the following result.

Theorem 5.1.1. If the $X_j$'s are uncorrelated and their second moments have a common bound, then (1) is true in $L^2$ and hence also in pr.

This simple theorem is actually due to Chebyshev, who invented his famous inequalities for its proof. The next result, due to Rajchman (1932), strengthens the conclusion by proving convergence a.e. This result is interesting by virtue of its simplicity, and serves well to introduce an important method, that of taking subsequences.

Theorem 5.1.2. Under the same hypotheses as in Theorem 5.1.1, (1) holds also a.e.

PROOF. Without loss of generality we may suppose that $E(X_j) = 0$ for each $j$, so that the $X_j$'s are orthogonal. We have by (6):

$$E(S_n^2) \le Mn,$$

where $M$ is a bound for the second moments. It follows by Chebyshev's inequality that for each $\varepsilon > 0$ we have

$$P\{|S_n| > n\varepsilon\} \le \frac{Mn}{n^2\varepsilon^2} = \frac{M}{n\varepsilon^2}.$$

If we sum this over $n$, the resulting series on the right diverges. However, if we confine ourselves to the subsequence $\{n^2\}$, then

$$\sum_n P\{|S_{n^2}| > n^2\varepsilon\} \le \sum_n \frac{M}{n^2\varepsilon^2} < \infty.$$

Hence by Theorem 4.2.1 (Borel–Cantelli) we have

$$(7)\qquad P\{|S_{n^2}| > n^2\varepsilon \text{ i.o.}\} = 0;$$

and consequently by Theorem 4.2.2

$$(8)\qquad \frac{S_{n^2}}{n^2} \to 0 \quad \text{a.e.}$$

We have thus proved the desired result for a subsequence; and the "method of subsequences" aims in general at extending to the whole sequence a result proved (relatively easily) for a subsequence. In the present case we must show that $S_k$ does not differ enough from the nearest $S_{n^2}$ to make any real difference.

Put for each $n \ge 1$:

$$D_n = \max_{n^2 \le k < (n+1)^2} |S_k - S_{n^2}|.$$

Then we have, by the Cauchy–Schwarz inequality applied to each increment $S_k - S_{n^2}$ and then orthogonality,

$$E(D_n^2) \le 2n\,E(|S_{(n+1)^2-1} - S_{n^2}|^2) = 2n \sum_{j=n^2+1}^{(n+1)^2-1} \sigma^2(X_j) \le 4n^2 M,$$

and consequently by Chebyshev's inequality

$$P\{D_n > n^2\varepsilon\} \le \frac{4M}{\varepsilon^2 n^2}.$$

It follows as before that

$$(9)\qquad \frac{D_n}{n^2} \to 0 \quad \text{a.e.}$$

Now it is clear that (8) and (9) together imply (1), since

$$\frac{|S_k|}{k} \le \frac{|S_{n^2}| + D_n}{n^2}$$

for $n^2 \le k < (n+1)^2$. The theorem is proved.
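A simulation of the theorem (ours; any bounded uncorrelated sequence would do, here independent centered digits) shows $S_n/n$ shrinking along the whole sequence, not only along $\{n^2\}$:

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.integers(0, 10, size=10**6) - 4.5   # bounded, mean 0, independent
    s = np.cumsum(x)
    for n in (10**2, 10**3, 10**4, 10**5, 10**6):
        print(n, s[n - 1] / n)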
The hypotheses of Theorems 5.1.1 and 5.1.2 are certainly satisfied for a sequence of independent r.v.'s that are uniformly bounded or that are identically distributed with a finite second moment. The most celebrated, as well as the very first, case of the strong law of large numbers, due to Borel (1909), is formulated in terms of the so-called "normal numbers." Let each real number in $[0, 1]$ be expanded in the usual decimal system:

$$(10)\qquad \omega = .x_1 x_2 \cdots x_n \cdots.$$

Except for the countable set of terminating decimals, for which there are two distinct expansions, this representation is unique. Fix a $k$: $0 \le k \le 9$, and let $\nu_k^{(n)}(\omega)$ denote the number of digits among the first $n$ digits of $\omega$ that are equal to $k$. Then $\nu_k^{(n)}(\omega)/n$ is the relative frequency of the digit $k$ in the first $n$ places, and the limit, if existing:

$$(11)\qquad \lim_{n\to\infty} \frac{\nu_k^{(n)}(\omega)}{n} = \varphi_k(\omega),$$

may be called the frequency of $k$ in $\omega$. The number $\omega$ is called simply normal (to the scale 10) iff this limit exists for each $k$ and is equal to $1/10$. Intuitively all ten possibilities should be equally likely for each digit of a number picked "at random". On the other hand, one can write down "at random" any number of numbers that are "abnormal" according to the definition given, such as $.1111\cdots$, while it is a relatively difficult matter to name even one normal number in the sense of Exercise 5 below. It turns out that the number

$$.12345678910111213\cdots,$$

which is obtained by writing down in succession all the natural numbers in the decimal system, is a normal number to the scale 10 even in the stringent definition of Exercise 5 below, but the proof is not so easy. As for determining whether certain well-known numbers such as $e - 2$ or $\pi - 3$ are normal, the problem seems beyond the reach of our present capability for mathematics. In spite of these difficulties, Borel's theorem below asserts that in a perfectly precise sense almost every number is normal. Furthermore, this striking proposition is merely a very particular case of Theorem 5.1.2 above.

Theorem 5.1.3. Except for a Borel set of measure zero, every number in $[0, 1]$ is simply normal.

PROOF. Consider the probability space $(\mathcal{U}, \mathcal{B}, m)$ in Example 2 of Sec. 2.2. Let $Z$ be the subset of numbers of the form $m/10^n$ for integers $n \ge 1$, $m \ge 1$; then $m(Z) = 0$. If $\omega \in \mathcal{U} \setminus Z$, then it has a unique decimal expansion; if $\omega \in Z$, it has two such expansions, but we agree to use the "terminating" one for the sake of definiteness. Thus we have

$$\omega = .\xi_1 \xi_2 \cdots \xi_n \cdots,$$

where for each $n \ge 1$, $\xi_n(\cdot)$ is a Borel measurable function of $\omega$. Just as in Example 4 of Sec. 3.3, the sequence $\{\xi_n, n \ge 1\}$ is a sequence of independent r.v.'s with

$$P\{\xi_n = k\} = \tfrac{1}{10}, \qquad k = 0, 1, \ldots, 9.$$

Indeed, according to Theorem 5.1.2 we need only verify that the $\xi_n$'s are uncorrelated, which is a very simple matter. For a fixed $k$ we define the r.v. $X_n$ to be the indicator of the set $\{\omega: \xi_n(\omega) = k\}$; then $E(X_n) = 1/10$, $E(X_n^2) = 1/10$, and

$$\frac{1}{n} \sum_{j=1}^{n} X_j(\omega)$$

is the relative frequency of the digit $k$ in the first $n$ places of the decimal for $\omega$. According to Theorem 5.1.2, we have then

$$\frac{S_n}{n} \to \frac{1}{10} \quad \text{a.e.}$$

Hence in the notation of (11), we have $P\{\varphi_k = 1/10\} = 1$ for each $k$ and consequently also

$$P\left( \bigcap_{k=0}^{9} \left\{ \varphi_k = \frac{1}{10} \right\} \right) = 1,$$

which means that the set of simply normal numbers has Borel measure one. Theorem 5.1.3 is proved.
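The digit frequencies of the number $.123456789101112\cdots$ mentioned above can be examined empirically. The sketch below (ours) only counts digits in a finite prefix, so it proves nothing, but the drift of all ten frequencies toward $1/10$ is plainly visible.

    # concatenate the natural numbers and count digit frequencies in prefixes
    digits = "".join(str(k) for k in range(1, 200000))
    for n in (10**3, 10**5, len(digits)):
        prefix = digits[:n]
        print(n, [round(prefix.count(str(d)) / n, 4) for d in range(10)])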
The preceding theorem makes a deep impression (at least on the older generation!) because it interprets a general proposition in probability theory at a most classical and fundamental level. If we use the intuitive language of probability such as coin-tossing, the result sounds almost trite. For it merely says that if an unbiased coin is tossed indefinitely, the limiting frequency of "heads" will be equal to $\frac12$, that is, its a priori probability. A mathematician who is unacquainted with and therefore skeptical of probability theory tends to regard the last statement as either "obvious" or "unprovable", but he can scarcely question the authenticity of Borel's theorem about ordinary decimals. As a matter of fact, the proof given above, essentially Borel's own, is a lot easier than a straightforward measure-theoretic version, deprived of the intuitive content [see, e.g., Hardy and Wright, An Introduction to the Theory of Numbers, 3rd ed., Oxford University Press, New York, 1954].

EXERCISES

1. For any sequence of r.v.'s $\{X_n\}$, if $E(X_n^2) \to 0$, then (1) is true in pr. but not necessarily a.e.

*2. Theorem 5.1.2 may be sharpened as follows: under the same hypotheses we have $S_n/n^\alpha \to 0$ a.e. for any $\alpha > \frac34$.

3. Theorem 5.1.2 remains true if the hypothesis of bounded second moments is weakened to: $\sigma^2(X_n) = O(n^\theta)$ where $0 \le \theta < \frac12$. Various combinations of Exercises 2 and 3 are possible.

*4. If $\{X_n\}$ are independent r.v.'s such that the fourth moments $E(X_n^4)$ have a common bound, then (1) is true a.e. [This is Cantelli's strong law of large numbers. Without using Theorem 5.1.2 we may operate with $E(S_n^4/n^4)$ as we did with $E(S_n^2/n^2)$. Note that the full strength of independence is not needed.]

5. We may strengthen the definition of a normal number by considering blocks of digits. Let $r \ge 1$, and consider the successive overlapping blocks of $r$ consecutive digits in a decimal; there are $n - r + 1$ such blocks in the first $n$ places. Let $\nu^{(n)}(\omega)$ denote the number of such blocks that are identical with a given one; for example, if $r = 5$, the given block may be "21212". Prove that for a.e. $\omega$, we have for every $r$:

$$\lim_{n\to\infty} \frac{\nu^{(n)}(\omega)}{n} = \frac{1}{10^r}.$$

[HINT: Reduce the problem to disjoint blocks, which are independent.]

*6. The above definition may be further strengthened if we consider different scales of expansion. A real number in $[0, 1]$ is said to be completely normal iff the relative frequency of each block of length $r$ in the scale $s$ tends to the limit $1/s^r$ for every $s$ and $r$. Prove that almost every number in $[0, 1]$ is completely normal.

7. Let $\alpha$ be completely normal. Show that by looking at the expansion of $\alpha$ in some scale we can rediscover the complete works of Shakespeare from end to end without a single misprint or interruption. [This is Borel's paradox.]

*8. Let $X$ be an arbitrary r.v. with an absolutely continuous distribution. Prove that with probability one the fractional part of $X$ is a normal number. [HINT: Let $N$ be the set of normal numbers and consider $P\{X - [X] \in N\}$.]

9. Prove that the set of real numbers in $[0, 1]$ whose decimal expansions do not contain the digit 2 is of measure zero. Deduce from this the existence of two sets $A$ and $B$ both of measure zero such that every real number is representable as a sum $a + b$ with $a \in A$, $b \in B$.

*10. Is the sum of two normal numbers, modulo 1, normal? Is the product? [HINT: Consider the differences between a fixed abnormal number and all normal numbers: this is a set of probability one.]

5.2 Weak law of large numbers

The law of large numbers in the form (1) of Sec. 5.1 involves only the first
moment, but so far we have operated with the second. In order to drop any
assumption on the second moment, we need a new device, that of “equivalent
sequences”, due to Khintchine (1894–1959).

DEFINITION. Two sequences of r.v.'s $\{X_n\}$ and $\{Y_n\}$ are said to be equivalent iff

$$(1)\qquad \sum_n P\{X_n \ne Y_n\} < \infty.$$

In practice, an equivalent sequence is obtained by "truncating" in various ways, as we shall see presently.

Theorem 5.2.1. If $\{X_n\}$ and $\{Y_n\}$ are equivalent, then

$$\sum_n (X_n - Y_n) \quad \text{converges a.e.}$$

Furthermore if $a_n \uparrow \infty$, then

$$(2)\qquad \frac{1}{a_n} \sum_{j=1}^{n} (X_j - Y_j) \to 0 \quad \text{a.e.}$$

PROOF. By the Borel–Cantelli lemma, (1) implies that

$$P\{X_n \ne Y_n \text{ i.o.}\} = 0.$$

This means that there exists a null set $N$ with the following property: if $\omega \in \Omega \setminus N$, then there exists $n_0(\omega)$ such that

$$n \ge n_0(\omega) \Longrightarrow X_n(\omega) = Y_n(\omega).$$

Thus for such an $\omega$, the two numerical sequences $\{X_n(\omega)\}$ and $\{Y_n(\omega)\}$ differ only in a finite number of terms (how many depending on $\omega$). In other words, the series

$$\sum_n \{X_n(\omega) - Y_n(\omega)\}$$

consists of zeros from a certain point on. Both assertions of the theorem are trivial consequences of this fact.

Corollary. With probability one, the expression

$$\sum_n X_n \qquad \text{or} \qquad \frac{1}{a_n} \sum_{j=1}^{n} X_j$$

converges, diverges to $+\infty$ or $-\infty$, or fluctuates in the same way as

$$\sum_n Y_n \qquad \text{or} \qquad \frac{1}{a_n} \sum_{j=1}^{n} Y_j,$$

respectively. In particular, if

$$\frac{1}{a_n} \sum_{j=1}^{n} X_j$$

converges to $X$ in pr., then so does

$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j.$$

To prove the last assertion of the corollary, observe that by Theorem 4.1.2 the relation (2) holds also in pr. Hence if

$$\frac{1}{a_n} \sum_{j=1}^{n} X_j \to X \quad \text{in pr.},$$

then we have

$$\frac{1}{a_n} \sum_{j=1}^{n} Y_j = \frac{1}{a_n} \sum_{j=1}^{n} X_j + \frac{1}{a_n} \sum_{j=1}^{n} (Y_j - X_j) \to X + 0 = X \quad \text{in pr.}$$

(see Exercise 3 of Sec. 4.1).

The next law of large numbers is due to Khintchine. Under the stronger
hypothesis of total independence, it will be proved again by an entirely
different method in Chapter 6.

Theorem 5.2.2. Let $\{X_n\}$ be pairwise independent and identically distributed r.v.'s with finite mean $m$. Then we have

$$(3)\qquad \frac{S_n}{n} \to m \quad \text{in pr.}$$

PROOF. Let the common d.f. be $F$, so that

$$m = E(X_n) = \int_{-\infty}^{\infty} x\,dF(x), \qquad E(|X_n|) = \int_{-\infty}^{\infty} |x|\,dF(x) < \infty.$$

By Theorem 3.2.1 the finiteness of $E(|X_1|)$ is equivalent to

$$\sum_n P\{|X_1| > n\} < \infty.$$

Hence we have, since the $X_n$'s have the same distribution:

$$(4)\qquad \sum_n P\{|X_n| > n\} < \infty.$$

We introduce a sequence of r.v.'s $\{Y_n\}$ by "truncating at $n$":

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le n; \\ 0, & \text{if } |X_n(\omega)| > n. \end{cases}$$

This is equivalent to $\{X_n\}$ by (4), since $P\{|X_n| > n\} = P\{X_n \ne Y_n\}$. Let

$$T_n = \sum_{j=1}^{n} Y_j.$$

By the corollary above, (3) will follow if (and only if) we can prove $T_n/n \to m$ in pr. Now the $Y_n$'s are also pairwise independent by Theorem 3.3.1 (applied to each pair), hence they are uncorrelated, since each, being bounded, has a finite second moment. Let us calculate $\sigma^2(T_n)$; we have by (6) of Sec. 5.1,

$$\sigma^2(T_n) = \sum_{j=1}^{n} \sigma^2(Y_j) \le \sum_{j=1}^{n} E(Y_j^2) = \sum_{j=1}^{n} \int_{|x|\le j} x^2\,dF(x).$$

The crudest estimate of the last term yields

$$\sum_{j=1}^{n} \int_{|x|\le j} x^2\,dF(x) \le \sum_{j=1}^{n} j \int_{|x|\le j} |x|\,dF(x) \le \frac{n(n+1)}{2} \int_{-\infty}^{\infty} |x|\,dF(x),$$

which is $O(n^2)$, but not $o(n^2)$ as required by (2) of Sec. 5.1. To improve on it, let $\{a_n\}$ be a sequence of integers such that $0 < a_n < n$, $a_n \to \infty$ but $a_n = o(n)$. We have

$$\sum_{j=1}^{n} \int_{|x|\le j} x^2\,dF(x) = \sum_{j\le a_n} + \sum_{a_n<j\le n} \le \sum_{j\le a_n} a_n \int_{|x|\le a_n} |x|\,dF(x) + \sum_{a_n<j\le n} \left\{ a_n \int_{|x|\le a_n} |x|\,dF(x) + n \int_{a_n<|x|\le n} |x|\,dF(x) \right\}$$
$$\le n a_n \int_{-\infty}^{\infty} |x|\,dF(x) + n^2 \int_{|x|>a_n} |x|\,dF(x).$$

The first term is $O(na_n) = o(n^2)$; and the second is $n^2 o(1) = o(n^2)$, since the set $\{x: |x| > a_n\}$ decreases to the empty set and so the last-written integral above converges to zero. We have thus proved that $\sigma^2(T_n) = o(n^2)$ and consequently, by (2) of Sec. 5.1,

$$\frac{T_n - E(T_n)}{n} = \frac{1}{n} \sum_{j=1}^{n} \{Y_j - E(Y_j)\} \to 0 \quad \text{in pr.}$$

Now it is clear that as $n \to \infty$, $E(Y_n) \to E(X_1) = m$; hence also

$$\frac{1}{n} \sum_{j=1}^{n} E(Y_j) \to m.$$

It follows that

$$\frac{T_n}{n} = \frac{1}{n} \sum_{j=1}^{n} Y_j \to m \quad \text{in pr.},$$

as was to be proved.
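Theorem 5.2.2 asks only for a finite mean, and a simulation makes the point exactly where Theorem 5.1.1 is silent. The Pareto-type r.v.'s below (our choice, not from the text) have finite mean $\alpha/(\alpha - 1) = 3$ but infinite variance; the sample means still settle near 3.

    import numpy as np

    rng = np.random.default_rng(4)
    alpha = 1.5                               # density 1.5/x^2.5 on (1, oo)
    for n in (10**3, 10**4, 10**5, 10**6):
        x = rng.random(n) ** (-1.0 / alpha)   # inverse-c.d.f. sampling
        print(n, x.mean())                    # compare with the mean 3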
For totally independent r.v.'s, necessary and sufficient conditions for the weak law of large numbers in the most general formulation, due to Kolmogorov and Feller, are known. The sufficiency of the following criterion is easily proved, but we omit the proof of its necessity (cf. Gnedenko and Kolmogorov [12]).

Theorem 5.2.3. Let $\{X_n\}$ be a sequence of independent r.v.'s with d.f.'s $\{F_n\}$, and $S_n = \sum_{j=1}^{n} X_j$. Let $\{b_n\}$ be a given sequence of real numbers increasing to $+\infty$. Suppose that we have

(i) $\displaystyle \sum_{j=1}^{n} \int_{|x|>b_n} dF_j(x) = o(1)$;

(ii) $\displaystyle \frac{1}{b_n^2} \sum_{j=1}^{n} \int_{|x|\le b_n} x^2\,dF_j(x) = o(1)$;

then if we put

$$(5)\qquad a_n = \sum_{j=1}^{n} \int_{|x|\le b_n} x\,dF_j(x),$$

we have

$$(6)\qquad \frac{1}{b_n}(S_n - a_n) \to 0 \quad \text{in pr.}$$

Next suppose that the $F_n$'s have the property that there exists a $\lambda > 0$ such that

$$(7)\qquad \forall n:\quad F_n(0) \ge \lambda, \qquad 1 - F_n(0-) \ge \lambda.$$

Then if (6) holds for the given $\{b_n\}$ and any sequence of real numbers $\{a_n\}$, the conditions (i) and (ii) must hold.

Remark. Condition (7) may be written as

$$P\{X_n \le 0\} \ge \lambda, \qquad P\{X_n \ge 0\} \ge \lambda;$$

when $\lambda = \frac12$ this means that 0 is a median for each $X_n$; see Exercise 9 below. In general it ensures that none of the distributions is too far off center, and it is certainly satisfied if all $F_n$ are the same; see also Exercise 11 below.

It is possible to replace the $a_n$ in (5) by

$$\sum_{j=1}^{n} \int_{|x|\le b_j} x\,dF_j(x)$$

and maintain (6); see Exercise 8 below.


PROOF OF SUFFICIENCY. Define for each $n \ge 1$ and $1 \le j \le n$:

$$Y_{n,j} = \begin{cases} X_j, & \text{if } |X_j| \le b_n; \\ 0, & \text{if } |X_j| > b_n; \end{cases}$$

and write

$$T_n = \sum_{j=1}^{n} Y_{n,j}.$$

Then condition (i) may be written as

$$\sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1);$$

and it follows from Boole's inequality that

$$(8)\qquad P\{T_n \ne S_n\} \le P\left( \bigcup_{j=1}^{n} \{Y_{n,j} \ne X_j\} \right) \le \sum_{j=1}^{n} P\{Y_{n,j} \ne X_j\} = o(1).$$

Next, condition (ii) may be written as

$$\sum_{j=1}^{n} E\left( \left( \frac{Y_{n,j}}{b_n} \right)^2 \right) = o(1);$$

from which it follows, since $\{Y_{n,j}, 1 \le j \le n\}$ are independent r.v.'s:

$$\sigma^2\left( \frac{T_n}{b_n} \right) = \sum_{j=1}^{n} \sigma^2\left( \frac{Y_{n,j}}{b_n} \right) \le \sum_{j=1}^{n} E\left( \left( \frac{Y_{n,j}}{b_n} \right)^2 \right) = o(1).$$

Hence as in (2) of Sec. 5.1,

$$(9)\qquad \frac{T_n - E(T_n)}{b_n} \to 0 \quad \text{in pr.}$$

It is clear (why?) that (8) and (9) together imply

$$\frac{S_n - E(T_n)}{b_n} \to 0 \quad \text{in pr.}$$

Since

$$E(T_n) = \sum_{j=1}^{n} E(Y_{n,j}) = \sum_{j=1}^{n} \int_{|x|\le b_n} x\,dF_j(x) = a_n,$$

(6) is proved.
As an application of Theorem 5.2.3 we give an example where the weak but not the strong law of large numbers holds.

Example. Let $\{X_n\}$ be independent r.v.'s with a common d.f. $F$ such that

$$P\{X_1 = n\} = P\{X_1 = -n\} = \frac{c}{n^2 \log n}, \qquad n = 3, 4, \ldots,$$

where $c$ is the constant determined by

$$\frac{1}{2c} = \sum_{n=3}^{\infty} \frac{1}{n^2 \log n}.$$

We have then, for large values of $n$,

$$n \int_{|x|>n} dF(x) = 2cn \sum_{k>n} \frac{1}{k^2 \log k} \sim \frac{2c}{\log n},$$

$$\frac{1}{n^2} \sum_{j=1}^{n} \int_{|x|\le n} x^2\,dF(x) \le \frac{2c}{n} \sum_{k=3}^{n} \frac{1}{\log k} \sim \frac{2c}{\log n}.$$

Thus conditions (i) and (ii) are satisfied with $b_n \equiv n$; and we have $a_n = 0$ by (5), the distribution being symmetric. Hence $S_n/n \to 0$ in pr. in spite of the fact that $E(|X_1|) = +\infty$. On the other hand, we have

$$P\{|X_1| > n\} \sim \frac{2c}{n \log n},$$

so that, since $X_1$ and $X_n$ have the same d.f.,

$$\sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} = \infty.$$

Hence by Theorem 4.2.4 (Borel–Cantelli),

$$P\{|X_n| > n \text{ i.o.}\} = 1.$$

But $|S_n - S_{n-1}| = |X_n| > n$ implies $|S_n| > n/2$ or $|S_{n-1}| > n/2$; it follows that

$$P\left\{ |S_n| > \frac{n}{2} \text{ i.o.} \right\} = 1,$$

and so it is certainly false that $S_n/n \to 0$ a.e. However, we can prove more. For any $A > 0$, the same argument as before yields

$$P\{|X_n| > An \text{ i.o.}\} = 1,$$

and consequently

$$P\left\{ |S_n| > \frac{An}{2} \text{ i.o.} \right\} = 1.$$

This means that for each $A$ there is a null set $Z(A)$ such that if $\omega \in \Omega \setminus Z(A)$, then

$$(10)\qquad \varlimsup_{n\to\infty} \frac{|S_n(\omega)|}{n} \ge \frac{A}{2}.$$

Let $Z = \bigcup_{m=1}^{\infty} Z(m)$; then $Z$ is still a null set, and if $\omega \in \Omega \setminus Z$, (10) is true for every $A$, and therefore the upper limit is $+\infty$. Since $X_1$ is "symmetric" in the obvious sense, it follows that

$$\varliminf_{n\to\infty} \frac{S_n}{n} = -\infty, \qquad \varlimsup_{n\to\infty} \frac{S_n}{n} = +\infty \quad \text{a.e.}$$

EXERCISES

In these exercises,

$$S_n = \sum_{j=1}^{n} X_j.$$

1. For any sequence of r.v.'s $\{X_n\}$, and any $p \ge 1$:

$$X_n \to 0 \text{ a.e.} \Longrightarrow \frac{S_n}{n} \to 0 \text{ a.e.}, \qquad X_n \to 0 \text{ in } L^p \Longrightarrow \frac{S_n}{n} \to 0 \text{ in } L^p.$$

The second result is false for $p < 1$.

2. Even for a sequence of independent r.v.'s $\{X_n\}$,

$$X_n \to 0 \text{ in pr.} \not\Longrightarrow \frac{S_n}{n} \to 0 \text{ in pr.}$$

[HINT: Let $X_n$ take the values $2^n$ and 0 with probabilities $n^{-1}$ and $1 - n^{-1}$.]

3. For any sequence $\{X_n\}$:

$$\frac{S_n}{n} \to 0 \text{ in pr.} \Longrightarrow \frac{X_n}{n} \to 0 \text{ in pr.}$$

More generally, this is true if $n$ is replaced by $b_n$, where $b_{n+1}/b_n \to 1$.

*4. For any $\delta > 0$, we have

$$\lim_{n\to\infty} \sum_{|k-np|>n\delta} \binom{n}{k} p^k (1-p)^{n-k} = 0$$

uniformly in $p$: $0 < p < 1$.

*5. Let $P\{X_1 = 2^n\} = 1/2^n$, $n \ge 1$; and let $\{X_n, n \ge 1\}$ be independent and identically distributed. Show that the weak law of large numbers does not hold for $b_n = n$; namely, with this choice of $b_n$ no sequence $\{a_n\}$ exists for which (6) is true. [This is the St. Petersburg paradox, in which you win $2^n$ if it takes $n$ tosses of a coin to obtain a head. What would you consider as a fair entry fee? and what is your mathematical expectation?]

*6. Show on the contrary that a weak law of large numbers does hold for $b_n = n \log n$ and find the corresponding $a_n$. [HINT: Apply Theorem 5.2.3.]

7. Conditions (i) and (ii) in Theorem 5.2.3 imply that for any $\delta > 0$,

$$\sum_{j=1}^{n} \int_{|x|>\delta b_n} dF_j(x) = o(1),$$

and that $a_n = o(\sqrt{n}\,b_n)$.

8. They also imply that

$$\frac{1}{b_n} \sum_{j=1}^{n} \int_{b_j<|x|\le b_n} x\,dF_j(x) = o(1).$$

[HINT: Use the first part of Exercise 7 and divide the interval of integration $b_j < |x| \le b_n$ into parts of the form $\lambda^k < |x| \le \lambda^{k+1}$ with $\lambda > 1$.]

9. A median of the r.v. $X$ is any number $\alpha$ such that

$$P\{X \le \alpha\} \ge \tfrac12, \qquad P\{X \ge \alpha\} \ge \tfrac12.$$

Show that such a number always exists but need not be unique.

*10. Let $\{X_n, 1 \le n \le \infty\}$ be arbitrary r.v.'s and for each $n$ let $m_n$ be a median of $X_n$. Prove that if $X_n \to X_\infty$ in pr. and $m_\infty$ is unique, then $m_n \to m_\infty$. Furthermore, if there exists any sequence of real numbers $\{c_n\}$ such that $X_n - c_n \to 0$ in pr., then $X_n - m_n \to 0$ in pr.

11. Derive the following form of the weak law of large numbers from Theorem 5.2.3. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n \equiv 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds but condition (i) does not.

12. Theorem 5.2.2 may be slightly generalized as follows. Let $\{X_n\}$ be pairwise independent with a common d.f. $F$ such that

$$\text{(i)}\ \int_{|x|\le n} x\,dF(x) = o(1), \qquad \text{(ii)}\ n \int_{|x|>n} dF(x) = o(1);$$

then $S_n/n \to 0$ in pr.

13. Let $\{X_n\}$ be a sequence of identically distributed strictly positive random variables. For any $\varphi$ such that $\varphi(n)/n \to 0$ as $n \to \infty$, show that $P\{S_n > \varphi(n) \text{ i.o.}\} = 1$, and so $S_n \to \infty$ a.e. [HINT: Let $N_n$ denote the number of $k \le n$ such that $X_k \le \varphi(n)/n$. Use Chebyshev's inequality to estimate $P\{N_n > n/2\}$ and so conclude $P\{S_n > \varphi(n)/2\} \ge 1 - 2F(\varphi(n)/n)$. This problem was proposed as a teaser and the rather unexpected solution was given by Kesten.]

14. Let $\{b_n\}$ be as in Theorem 5.2.3 and put $X_n \equiv 2b_n$ for $n \ge 1$. Then there exists $\{a_n\}$ for which (6) holds, but condition (i) does not hold. Thus condition (7) cannot be omitted.

5.3 Convergence of series


If the terms of an infinite series are independent r.v.’s, then it will be shown
in Sec. 8.1 that the probability of its convergence is either zero or one. Here
we shall establish a concrete criterion for the latter alternative. Not only is
the result a complete answer to the question of convergence of independent
r.v.’s, but it yields also a satisfactory form of the strong law of large numbers.
This theorem is due to Kolmogorov (1929). We begin with his two remarkable
inequalities. The first is also very useful elsewhere; the second may be circum-
vented (see Exercises 3 to 5 below), but it is given here in Kolmogorov’s
original form as an example of true virtuosity.

Theorem 5.3.1. Let $\{X_n\}$ be independent r.v.'s such that

$$\forall n:\quad E(X_n) = 0, \qquad E(X_n^2) = \sigma^2(X_n) < \infty.$$

Then we have for every $\varepsilon > 0$:

$$(1)\qquad P\{\max_{1\le j\le n} |S_j| > \varepsilon\} \le \frac{\sigma^2(S_n)}{\varepsilon^2}.$$

Remark. If we replace the $\max_{1\le j\le n} |S_j|$ in the formula by $|S_n|$, this becomes a simple case of Chebyshev's inequality, of which it is thus an essential improvement.

PROOF. Fix $\varepsilon > 0$. For any $\omega$ in the set

$$\Lambda = \{\omega: \max_{1\le j\le n} |S_j(\omega)| > \varepsilon\},$$

let us define

$$\nu(\omega) = \min\{j: 1 \le j \le n,\ |S_j(\omega)| > \varepsilon\}.$$

Clearly $\nu$ is an r.v. with domain $\Lambda$. Put

$$\Lambda_k = \{\omega: \nu(\omega) = k\} = \{\omega: \max_{1\le j\le k-1} |S_j(\omega)| \le \varepsilon,\ |S_k(\omega)| > \varepsilon\},$$

where for $k = 1$, $\max_{1\le j\le 0} |S_j(\omega)|$ is taken to be zero. Thus $\nu$ is the "first time" that the indicated maximum exceeds $\varepsilon$, and $\Lambda_k$ is the event that this occurs "for the first time at the $k$th step". The $\Lambda_k$'s are disjoint and we have

$$\Lambda = \bigcup_{k=1}^{n} \Lambda_k.$$

It follows that

$$(2)\qquad \int_\Lambda S_n^2\,dP = \sum_{k=1}^{n} \int_{\Lambda_k} S_n^2\,dP = \sum_{k=1}^{n} \int_{\Lambda_k} [S_k + (S_n - S_k)]^2\,dP = \sum_{k=1}^{n} \int_{\Lambda_k} [S_k^2 + 2S_k(S_n - S_k) + (S_n - S_k)^2]\,dP.$$

Let $\varphi_k$ denote the indicator of $\Lambda_k$; then the two r.v.'s $\varphi_k S_k$ and $S_n - S_k$ are independent by Theorem 3.3.2, and consequently (see Exercise 9 of Sec. 3.3)

$$\int_{\Lambda_k} S_k(S_n - S_k)\,dP = \int_\Omega \varphi_k S_k (S_n - S_k)\,dP = \int_\Omega \varphi_k S_k\,dP \int_\Omega (S_n - S_k)\,dP = 0,$$

since the last-written integral is

$$E(S_n - S_k) = \sum_{j=k+1}^{n} E(X_j) = 0.$$

Using this in (2), we obtain

$$\sigma^2(S_n) = \int_\Omega S_n^2\,dP \ge \int_\Lambda S_n^2\,dP \ge \sum_{k=1}^{n} \int_{\Lambda_k} S_k^2\,dP \ge \varepsilon^2 \sum_{k=1}^{n} P(\Lambda_k) = \varepsilon^2 P(\Lambda),$$

where the last inequality is by the mean value theorem, since $|S_k| > \varepsilon$ on $\Lambda_k$ by definition. The theorem now follows upon dividing the inequality above by $\varepsilon^2$.
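A Monte Carlo check of (1) (ours; coin-tossing summands and arbitrary $n$, $\varepsilon$) illustrates how the single bound $\sigma^2(S_n)/\varepsilon^2$ covers the maximum of all $n$ partial sums, and not merely the last one:

    import numpy as np

    rng = np.random.default_rng(5)
    n, reps, eps = 50, 100000, 15.0
    x = rng.choice([-1.0, 1.0], size=(reps, n))     # E X = 0, sigma^2(X) = 1
    s = np.cumsum(x, axis=1)
    print("P(max |S_j| > eps) ~", (np.abs(s).max(axis=1) > eps).mean())
    print("P(|S_n| > eps)     ~", (np.abs(s[:, -1]) > eps).mean())
    print("Kolmogorov bound    =", n / eps**2)      # sigma^2(S_n)/eps^2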

Theorem 5.3.2. Let $\{X_n\}$ be independent r.v.'s with finite means and suppose that there exists an $A$ such that

$$(3)\qquad \forall n:\quad |X_n - E(X_n)| \le A < \infty.$$

Then for every $\varepsilon > 0$ we have

$$(4)\qquad P\{\max_{1\le j\le n} |S_j| \le \varepsilon\} \le \frac{(2A + 4\varepsilon)^2}{\sigma^2(S_n)}.$$

PROOF. Let $M_0 = \Omega$, and for $1 \le k \le n$:

$$M_k = \{\omega: \max_{1\le j\le k} |S_j| \le \varepsilon\}, \qquad \Delta_k = M_{k-1} \setminus M_k.$$

We may suppose that $P(M_n) > 0$, for otherwise (4) is trivial. Furthermore, let $S'_0 = 0$ and for $k \ge 1$,

$$X'_k = X_k - E(X_k), \qquad S'_k = \sum_{j=1}^{k} X'_j.$$

Define numbers $a_k$, $0 \le k \le n$, as follows:

$$a_k = \frac{1}{P(M_k)} \int_{M_k} S'_k\,dP,$$

so that

$$(5)\qquad \int_{M_k} (S'_k - a_k)\,dP = 0.$$

Now we write

$$(6)\qquad \int_{M_{k+1}} (S'_{k+1} - a_{k+1})^2\,dP = \int_{M_k} (S'_k - a_k + a_k - a_{k+1} + X'_{k+1})^2\,dP - \int_{\Delta_{k+1}} (S'_k - a_k + a_k - a_{k+1} + X'_{k+1})^2\,dP$$

and denote the two integrals on the right by $I_1$ and $I_2$, respectively. Using the definition of $M_k$ and (3), we have

$$|S'_k - a_k| = \left| S_k - E(S_k) - \frac{1}{P(M_k)} \int_{M_k} [S_k - E(S_k)]\,dP \right| = \left| S_k - \frac{1}{P(M_k)} \int_{M_k} S_k\,dP \right| \le |S_k| + \varepsilon;$$

$$(7)\qquad |a_k - a_{k+1}| = \left| \frac{1}{P(M_k)} \int_{M_k} S_k\,dP - \frac{1}{P(M_{k+1})} \int_{M_{k+1}} S_k\,dP - \frac{1}{P(M_{k+1})} \int_{M_{k+1}} X'_{k+1}\,dP \right| \le 2\varepsilon + A.$$

It follows that, since $|S_k| \le \varepsilon$ on $\Delta_{k+1}$,

$$I_2 \le \int_{\Delta_{k+1}} (|S_k| + \varepsilon + 2\varepsilon + A + A)^2\,dP \le (4\varepsilon + 2A)^2 P(\Delta_{k+1}).$$

On the other hand, we have

$$I_1 = \int_{M_k} \{(S'_k - a_k)^2 + (a_k - a_{k+1})^2 + X'^2_{k+1} + 2(S'_k - a_k)(a_k - a_{k+1}) + 2(S'_k - a_k)X'_{k+1} + 2(a_k - a_{k+1})X'_{k+1}\}\,dP.$$

The integrals of the last three terms all vanish by (5) and independence, hence

$$I_1 \ge \int_{M_k} (S'_k - a_k)^2\,dP + \int_{M_k} X'^2_{k+1}\,dP = \int_{M_k} (S'_k - a_k)^2\,dP + P(M_k)\,\sigma^2(X_{k+1}).$$

Substituting into (6), and using $M_k \supset M_n$, we obtain for $0 \le k \le n-1$:

$$\int_{M_{k+1}} (S'_{k+1} - a_{k+1})^2\,dP - \int_{M_k} (S'_k - a_k)^2\,dP \ge P(M_n)\,\sigma^2(X_{k+1}) - (4\varepsilon + 2A)^2 P(\Delta_{k+1}).$$

Summing over $k$, and using $|S'_n - a_n| \le |S_n| + \varepsilon \le 2\varepsilon$ on $M_n$:

$$4\varepsilon^2 P(M_n) \ge \int_{M_n} (|S_n| + \varepsilon)^2\,dP \ge \int_{M_n} (S'_n - a_n)^2\,dP \ge P(M_n) \sum_{j=1}^{n} \sigma^2(X_j) - (4\varepsilon + 2A)^2 P(\Omega \setminus M_n),$$

hence, since $4\varepsilon^2 \le (2A + 4\varepsilon)^2$,

$$(2A + 4\varepsilon)^2 \ge P(M_n) \sum_{j=1}^{n} \sigma^2(X_j),$$

which is (4).

We can now prove the "three series theorem" of Kolmogorov (1929).

Theorem 5.3.3. Let $\{X_n\}$ be independent r.v.'s and define for a fixed constant $A > 0$:

$$Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le A; \\ 0, & \text{if } |X_n(\omega)| > A. \end{cases}$$

Then the series $\sum_n X_n$ converges a.e. if and only if the following three series all converge:

(i) $\sum_n P\{|X_n| > A\} = \sum_n P\{X_n \ne Y_n\}$;
(ii) $\sum_n E(Y_n)$;
(iii) $\sum_n \sigma^2(Y_n)$.

PROOF. Suppose that the three series converge. Applying Theorem 5.3.1 to the sequence $\{Y_n - E(Y_n)\}$, we have for every $m \ge 1$:

$$P\left\{ \max_{n\le k\le n'} \left| \sum_{j=n}^{k} \{Y_j - E(Y_j)\} \right| \le \frac{1}{m} \right\} \ge 1 - m^2 \sum_{j=n}^{n'} \sigma^2(Y_j).$$

If we denote the probability on the left by $P(m, n, n')$, it follows from the convergence of (iii) that for each $m$:

$$\lim_{n\to\infty} \lim_{n'\to\infty} P(m, n, n') = 1.$$

This means that the tail of $\sum_n \{Y_n - E(Y_n)\}$ converges to zero a.e., so that the series converges a.e. Since (ii) converges, so does $\sum_n Y_n$. Since (i) converges, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences; hence $\sum_n X_n$ also converges a.e. by Theorem 5.2.1. We have therefore proved the "if" part of the theorem.

Conversely, suppose that $\sum_n X_n$ converges a.e. Then for each $A > 0$:

$$P\{|X_n| > A \text{ i.o.}\} = 0.$$

It follows from the Borel–Cantelli lemma that the series (i) must converge. Hence as before $\sum_n Y_n$ also converges a.e. But since $|Y_n - E(Y_n)| \le 2A$, we have by Theorem 5.3.2

$$P\left\{ \max_{n\le k\le n'} \left| \sum_{j=n}^{k} \{Y_j - E(Y_j)\} \right| \le 1 \right\} \le \frac{(4A + 4)^2}{\displaystyle\sum_{j=n}^{n'} \sigma^2(Y_j)}.$$

Were the series (iii) to diverge, the probability above would tend to zero as $n' \to \infty$ for each $n$; hence the tail of $\sum_n \{Y_n - E(Y_n)\}$ almost surely would not be bounded by 1, so the series could not converge. This contradiction proves that (iii) must converge. Finally, consider the series $\sum_n \{Y_n - E(Y_n)\}$ and apply the proven part of the theorem to it. We have $P\{|Y_n - E(Y_n)| > 2A\} = 0$ and $E(Y_n - E(Y_n)) = 0$, so that the two series corresponding to (i) and (ii), with $2A$ for $A$, vanish identically, while (iii) has just been shown to converge. It follows from the sufficiency of the criterion that $\sum_n \{Y_n - E(Y_n)\}$ converges a.e., and so by equivalence the same is true of $\sum_n \{X_n - E(Y_n)\}$. Since $\sum_n X_n$ also converges a.e. by hypothesis, we conclude by subtraction that the series (ii) converges. This completes the proof of the theorem.
The convergence of a series of r.v.'s is, of course, by definition the same as that of its partial sums, in any sense of convergence discussed in Chapter 4. For series of independent terms we have, however, the following theorem due to Paul Lévy.

Theorem 5.3.4. If $\{X_n\}$ is a sequence of independent r.v.'s, then the convergence of the series $\sum_n X_n$ in pr. is equivalent to its convergence a.e.
PROOF. By Theorem 4.1.2, it is sufficient to prove that convergence of $\sum_n X_n$ in pr. implies its convergence a.e. Suppose the former; then, given $\varepsilon$: $0 < \varepsilon < 1$, there exists $m_0$ such that if $n > m \ge m_0$, we have

$$(8)\qquad P\{|S_{m,n}| > \varepsilon\} < \varepsilon,$$

where

$$S_{m,n} = \sum_{j=m+1}^{n} X_j.$$

It is obvious that for $m < k \le n$ we have

$$(9)\qquad \bigcup_{k=m+1}^{n} \left\{ \max_{m<j\le k-1} |S_{m,j}| \le 2\varepsilon;\ |S_{m,k}| > 2\varepsilon;\ |S_{k,n}| \le \varepsilon \right\} \subset \{|S_{m,n}| > \varepsilon\},$$

where the sets in the union are disjoint. Going to probabilities and using independence, we obtain

$$\sum_{k=m+1}^{n} P\left\{ \max_{m<j\le k-1} |S_{m,j}| \le 2\varepsilon;\ |S_{m,k}| > 2\varepsilon \right\} P\{|S_{k,n}| \le \varepsilon\} \le P\{|S_{m,n}| > \varepsilon\}.$$

If we suppress the factors $P\{|S_{k,n}| \le \varepsilon\}$, then the sum is equal to

$$P\left\{ \max_{m<j\le n} |S_{m,j}| > 2\varepsilon \right\}$$

(cf. the beginning of the proof of Theorem 5.3.1). It follows that

$$P\left\{ \max_{m<j\le n} |S_{m,j}| > 2\varepsilon \right\} \min_{m<k\le n} P\{|S_{k,n}| \le \varepsilon\} \le P\{|S_{m,n}| > \varepsilon\}.$$

This inequality is due to Ottaviani. By (8), the second factor on the left exceeds $1 - \varepsilon$; hence if $m \ge m_0$,

$$(10)\qquad P\left\{ \max_{m<j\le n} |S_{m,j}| > 2\varepsilon \right\} \le \frac{1}{1-\varepsilon} P\{|S_{m,n}| > \varepsilon\} < \frac{\varepsilon}{1-\varepsilon}.$$

Letting $n \to \infty$, then $m \to \infty$, and finally $\varepsilon \to 0$ through a sequence of values, we see that the triple limit of the first probability in (10) is equal to zero. This proves the convergence a.e. of $\sum_n X_n$ by Exercise 8 of Sec. 4.2.

It is worthwhile to observe the underlying similarity between the inequalities (10) and (1). Both give a "probability upper bound" for the maximum of a number of summands in terms of the last summand as in (10), or its variance as in (1). The same principle will be used again later.

Let us remark also that even the convergence of $S_n$ in dist. is equivalent to that in pr. or a.e. as just proved; see Theorem 9.5.5.
We give some examples of convergence of series.


Example. $\sum_n \pm 1/n$.

This is meant to be the "harmonic series" with a random choice of signs in each term, the choices being totally independent and equally likely to be $+$ or $-$ in each case. More precisely, it is the series

$$\sum_n \frac{X_n}{n},$$

where $\{X_n, n \ge 1\}$ is a sequence of independent, identically distributed r.v.'s taking the values $\pm 1$ with probability $\frac12$ each.

We may take $A = 1$ in Theorem 5.3.3 so that the two series (i) and (ii) vanish identically. Since $\sigma^2(X_n/n) = \sigma^2(Y_n/n) = 1/n^2$, the series (iii) converges. Hence $\sum_n \pm 1/n$ converges a.e. by the criterion above. The same conclusion applies to $\sum_n \pm 1/n^\theta$ if $\frac12 < \theta \le 1$. Clearly there is no absolute convergence. For $0 \le \theta \le \frac12$, the probability of convergence is zero by the same criterion and by Theorem 8.1.2 below.
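The criterion can also be watched numerically. The sketch below (ours) tracks the partial sums of $\sum_n \pm 1/n^\theta$ for a few $\theta$; for $\theta > \frac12$ the late partial sums agree to several decimals, while for $\theta = 0.4$ they keep wandering. A finite computation of course proves nothing about convergence; it is only an illustration.

    import numpy as np

    rng = np.random.default_rng(6)
    n = 10**6
    signs = rng.choice([-1.0, 1.0], size=n)
    k = np.arange(1, n + 1)
    for theta in (1.0, 0.75, 0.4):
        partial = np.cumsum(signs / k**theta)
        print(theta, partial[10**4 - 1], partial[10**5 - 1], partial[-1])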

EXERCISES

1. Theorem 5.3.1 has the following “one-sided” analogue. Under the


same hypotheses, we have
2
Sn 
P f max Sj ½ g  .
1jn 2 C 2 Sn 

[This is due to A. W. Marshall.]


128 LAW OF LARGE NUMBERS. RANDOM SERIES

 2. Let fX g be independent and identically distributed with mean 0 and


n
variance 1. Then we have for every x:
p
P f max Sj ½ xg  2P fSn ½ x  2ng.
1jn

[HINT: Let
3k D f max Sj < x; Sk ½ xg
1j<k
n p p
then kD1P f3k ; Sn  Sk ½  2ng  P fSn ½ x  2ng.]
3. Theorem 5.3.2 has the following companion, which is easier to prove. Under the joint hypotheses in Theorems 5.3.1 and 5.3.2, we have
$$P\Big\{\max_{1\le j\le n} |S_j| \le \epsilon\Big\} \le \frac{(A+\epsilon)^2}{\sigma^2(S_n)}.$$

4. Let $\{X_n, X_n', n \ge 1\}$ be independent r.v.'s such that $X_n$ and $X_n'$ have the same distribution. Suppose further that all these r.v.'s are bounded by the same constant $A$. Then
$$\sum_n (X_n - X_n')$$
converges a.e. if and only if
$$\sum_n \sigma^2(X_n) < \infty.$$
Use Exercise 3 to prove this without recourse to Theorem 5.3.3, and so finish the converse part of Theorem 5.3.3.
*5. But neither Theorem 5.3.2 nor the alternative indicated in the preceding exercise is necessary; what we need is merely the following result, which is an easy consequence of a general theorem in Chapter 7. Let $\{X_n\}$ be a sequence of independent and uniformly bounded r.v.'s with $\sigma^2(S_n) \to +\infty$. Then for every $A > 0$ we have
$$\lim_{n\to\infty} P\{|S_n| \le A\} = 0.$$
Show that this is sufficient to finish the proof of Theorem 5.3.3.


*6. The following analogue of the inequalities of Kolmogorov and Ottaviani is due to P. Lévy. Let $S_n$ be the sum of $n$ independent r.v.'s and $S_n^0 = S_n - m_0(S_n)$, where $m_0(S_n)$ is a median of $S_n$. Then we have
$$P\Big\{\max_{1\le j\le n} |S_j^0| > \epsilon\Big\} \le 3P\Big\{|S_n^0| > \frac{\epsilon}{2}\Big\}.$$
[HINT: Try "4" in place of "3" on the right.]

7. For arbitrary $\{X_n\}$, if
$$\sum_n E(|X_n|) < \infty,$$
then $\sum_n X_n$ converges absolutely a.e.
8. Let $\{X_n\}$, where $n = 0, \pm1, \pm2, \ldots$, be independent and identically distributed according to the normal distribution $\Phi$ (see Exercise 5 of Sec. 4.5). Then the series of complex-valued r.v.'s
$$xX_0 + \sum_{n=1}^{\infty} \frac{e^{inx} X_n}{in} + \sum_{n=-1}^{-\infty} \frac{e^{inx} X_n}{in},$$
where $i = \sqrt{-1}$ and $x$ is real, converges a.e. and uniformly in $x$. (This is Wiener's representation of the Brownian motion process.)
*9. Let $\{X_n\}$ be independent and identically distributed, taking the values 0 and 2 with probability $\frac12$ each; then
$$\sum_{n=1}^{\infty} \frac{X_n}{3^n}$$
converges a.e. Prove that the limit has the Cantor d.f. discussed in Sec. 1.3. Do Exercise 11 in that section again; it is easier now.
*10. If $\sum_n \pm X_n$ converges a.e. for all choices of $\pm1$, where the $X_n$'s are arbitrary r.v.'s, then $\sum_n X_n^2$ converges a.e. [HINT: Consider $\sum_n r_n(t)X_n(\omega)$ where the $r_n$'s are coin-tossing r.v.'s and apply Fubini's theorem to the space of $(t, \omega)$.]

5.4 Strong law of large numbers

To return to the strong law of large numbers, the link is furnished by the following lemma on "summability".

Kronecker's lemma. Let $\{x_k\}$ be a sequence of real numbers, $\{a_k\}$ a sequence of numbers $>0$ and $\uparrow +\infty$. Then
$$\sum_n \frac{x_n}{a_n} \text{ converges} \implies \frac{1}{a_n}\sum_{j=1}^{n} x_j \to 0.$$

PROOF. For $1 \le n \le \infty$ let
$$b_n = \sum_{j=1}^{n} \frac{x_j}{a_j}.$$

If we also write $a_0 = 0$, $b_0 = 0$, we have
$$x_n = a_n(b_n - b_{n-1})$$
and
$$\frac{1}{a_n}\sum_{j=1}^{n} x_j = \frac{1}{a_n}\sum_{j=1}^{n} a_j(b_j - b_{j-1}) = b_n - \frac{1}{a_n}\sum_{j=0}^{n-1} b_j(a_{j+1} - a_j)$$
(Abel's method of partial summation). Since $a_{j+1} - a_j \ge 0$,
$$\frac{1}{a_n}\sum_{j=0}^{n-1}(a_{j+1} - a_j) = 1,$$
and $b_n \to b_\infty$, the last sum is a weighted average of the $b_j$'s in which, $a_n$ being $\uparrow +\infty$, the weight carried by any fixed initial segment tends to 0; hence it also converges to $b_\infty$, and we have
$$\frac{1}{a_n}\sum_{j=1}^{n} x_j \to b_\infty - b_\infty = 0.$$
The lemma is proved.
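A small numerical illustration (the concrete choice $a_n = n$, $x_j = (-1)^j\sqrt{j}$ is ours, not from the text): the series $\sum_j x_j/a_j = \sum_j (-1)^j/\sqrt{j}$ converges as an alternating series, so the lemma predicts $(1/a_n)\sum_{j\le n} x_j \to 0$.

```python
def scaled_partial_sum(n):
    """(1/a_n) * sum_{j<=n} x_j with a_n = n and x_j = (-1)^j * sqrt(j)."""
    s = 0.0
    for j in range(1, n + 1):
        s += (-1) ** j * j ** 0.5
    return s / n

for n in (100, 10_000, 1_000_000):
    print(n, scaled_partial_sum(n))   # tends to 0 as n grows
```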


Now let $\varphi$ be a positive, even, and continuous function on $R^1$ such that as $|x|$ increases,

(1) $\displaystyle\frac{\varphi(x)}{|x|}\uparrow,\qquad \frac{\varphi(x)}{x^2}\downarrow.$

Theorem 5.4.1. Let $\{X_n\}$ be a sequence of independent r.v.'s with $E(X_n) = 0$ for every $n$; and $0 < a_n \uparrow \infty$. If $\varphi$ satisfies the conditions above and

(2) $\displaystyle\sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty,$

then

(3) $\displaystyle\sum_n \frac{X_n}{a_n}$ converges a.e.

PROOF. Denote the d.f. of $X_n$ by $F_n$. Define for each $n$:

(4) $Y_n(\omega) = \begin{cases} X_n(\omega), & \text{if } |X_n(\omega)| \le a_n;\\ 0, & \text{if } |X_n(\omega)| > a_n.\end{cases}$

Then
$$\sum_n E\left(\frac{Y_n^2}{a_n^2}\right) = \sum_n \int_{|x|\le a_n} \frac{x^2}{a_n^2}\, dF_n(x).$$

By the second hypothesis in (1), we have
$$\frac{x^2}{a_n^2} \le \frac{\varphi(x)}{\varphi(a_n)} \quad\text{for } |x| \le a_n.$$
It follows that
$$\sum_n \sigma^2\left(\frac{Y_n}{a_n}\right) \le \sum_n E\left(\frac{Y_n^2}{a_n^2}\right) \le \sum_n \int_{|x|\le a_n} \frac{\varphi(x)}{\varphi(a_n)}\, dF_n(x) \le \sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

Thus, for the r.v.'s $\{(Y_n - E(Y_n))/a_n\}$, the series (iii) in Theorem 5.3.3 converges, while the two other series vanish for $A = 2$, since $|Y_n - E(Y_n)| \le 2a_n$; hence

(5) $\displaystyle\sum_n \frac{1}{a_n}\{Y_n - E(Y_n)\}$ converges a.e.

Next we have
$$\sum_n \frac{|E(Y_n)|}{a_n} = \sum_n \frac{1}{a_n}\left|\int_{|x|\le a_n} x\, dF_n(x)\right| = \sum_n \frac{1}{a_n}\left|\int_{|x|>a_n} x\, dF_n(x)\right| \le \sum_n \int_{|x|>a_n} \frac{|x|}{a_n}\, dF_n(x),$$
where the second equation follows from $\int_{-\infty}^{\infty} x\, dF_n(x) = 0$. By the first hypothesis in (1), we have
$$\frac{|x|}{a_n} \le \frac{\varphi(x)}{\varphi(a_n)} \quad\text{for } |x| > a_n.$$
It follows that
$$\sum_n \frac{|E(Y_n)|}{a_n} \le \sum_n \int_{|x|>a_n} \frac{\varphi(x)}{\varphi(a_n)}\, dF_n(x) \le \sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

This and (5) imply that $\sum_n (Y_n/a_n)$ converges a.e. Finally, since $\varphi \uparrow$, we have
$$\sum_n P\{X_n \ne Y_n\} = \sum_n \int_{|x|>a_n} dF_n(x) \le \sum_n \int_{|x|>a_n} \frac{\varphi(x)}{\varphi(a_n)}\, dF_n(x) \le \sum_n \frac{E(\varphi(X_n))}{\varphi(a_n)} < \infty.$$

Thus, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences and (3) follows.
Applying Kronecker's lemma to (3) for each $\omega$ in a set of probability one, we obtain the next result.

Corollary. Under the hypotheses of the theorem, we have

(6) $\displaystyle\frac{1}{a_n}\sum_{j=1}^{n} X_j \to 0$ a.e.

Particular cases. (i) Let $\varphi(x) = |x|^p$, $1 \le p \le 2$; $a_n = n$. Then we have

(7) $\displaystyle\sum_n \frac{E(|X_n|^p)}{n^p} < \infty \implies \frac{1}{n}\sum_{j=1}^{n} X_j \to 0$ a.e.

For $p = 2$, this is due to Kolmogorov; for $1 \le p < 2$, it is due to Marcinkiewicz and Zygmund.

(ii) Suppose for some $\delta$, $0 < \delta \le 1$, and $M < \infty$ we have
$$\forall n: E(|X_n|^{1+\delta}) \le M.$$
Then the hypothesis in (7) is clearly satisfied with $p = 1 + \delta$. This case is due to Markov. Cantelli's theorem under total independence (Exercise 4 of Sec. 5.1) is a special case.

(iii) By proper choice of $\{a_n\}$, we can considerably sharpen the conclusion (6). Suppose
$$\forall n: \sigma^2(X_n) = \sigma_n^2 < \infty, \qquad \sigma^2(S_n) = s_n^2 = \sum_{j=1}^{n} \sigma_j^2 \to \infty.$$
Choose $\varphi(x) = x^2$ and $a_n = s_n(\log s_n)^{1/2+\epsilon}$, $\epsilon > 0$, in the corollary to Theorem 5.4.1. Then
$$\sum_n \frac{E(X_n^2)}{a_n^2} = \sum_n \frac{\sigma_n^2}{s_n^2(\log s_n)^{1+2\epsilon}} < \infty$$
by Dini's theorem, and consequently
$$\frac{S_n}{s_n(\log s_n)^{1/2+\epsilon}} \to 0 \quad\text{a.e.}$$

In case all $\sigma_n^2 = 1$ so that $s_n^2 = n$, the above ratio has a denominator that is close to $n^{1/2}$. Later we shall see that $n^{1/2}$ is a critical order of magnitude for $S_n$.

We now come to the strong version of Theorem 5.2.2 in the totally independent case. This result is also due to Kolmogorov.

Theorem 5.4.2. Let $\{X_n\}$ be a sequence of independent and identically distributed r.v.'s. Then we have

(8) $E(|X_1|) < \infty \implies \dfrac{S_n}{n} \to E(X_1)$ a.e.;

(9) $E(|X_1|) = \infty \implies \limsup\limits_{n\to\infty} \dfrac{|S_n|}{n} = +\infty$ a.e.

PROOF. To prove (8) define $\{Y_n\}$ as in (4) with $a_n = n$. Since
$$\sum_n P\{X_n \ne Y_n\} = \sum_n P\{|X_n| > n\} = \sum_n P\{|X_1| > n\} < \infty$$
by Theorem 3.2.1, $\{X_n\}$ and $\{Y_n\}$ are equivalent sequences. Let us apply (7) to $\{Y_n - E(Y_n)\}$, with $\varphi(x) = x^2$. We have

(10) $\displaystyle\sum_n \frac{\sigma^2(Y_n)}{n^2} \le \sum_n \frac{E(Y_n^2)}{n^2} = \sum_n \frac{1}{n^2}\int_{|x|\le n} x^2\, dF(x).$

We are obliged to estimate the last written second moment in terms of the first moment, since this is the only one assumed in the hypothesis. The standard technique is to split the interval of integration and then invert the repeated summation, as follows:
$$\sum_{n=1}^{\infty} \frac{1}{n^2}\sum_{j=1}^{n}\int_{j-1<|x|\le j} x^2\, dF(x) = \sum_{j=1}^{\infty}\int_{j-1<|x|\le j} x^2\, dF(x)\sum_{n=j}^{\infty}\frac{1}{n^2}$$
$$\le \sum_{j=1}^{\infty}\frac{C}{j}\int_{j-1<|x|\le j} j\,|x|\, dF(x) \le C\sum_{j=1}^{\infty}\int_{j-1<|x|\le j} |x|\, dF(x) = C\,E(|X_1|) < \infty.$$

In the above we have used the elementary estimate $\sum_{n=j}^{\infty} n^{-2} \le Cj^{-1}$ for some constant $C$ and all $j \ge 1$. Thus the first sum in (10) converges, and we conclude by (7) that
$$\frac{1}{n}\sum_{j=1}^{n}\{Y_j - E(Y_j)\} \to 0 \quad\text{a.e.}$$

Clearly $E(Y_n) \to E(X_1)$ as $n \to \infty$; hence also
$$\frac{1}{n}\sum_{j=1}^{n} E(Y_j) \to E(X_1),$$
and consequently
$$\frac{1}{n}\sum_{j=1}^{n} Y_j \to E(X_1) \quad\text{a.e.}$$

By Theorem 5.2.1, the left side above may be replaced by $(1/n)\sum_{j=1}^{n} X_j$, proving (8).

To prove (9), we note that $E(|X_1|) = \infty$ implies $E(|X_1|/A) = \infty$ for each $A > 0$ and hence, by Theorem 3.2.1,
$$\sum_n P\{|X_1| > An\} = +\infty.$$
Since the r.v.'s are identically distributed, it follows that
$$\sum_n P\{|X_n| > An\} = +\infty.$$
Now the argument in the example at the end of Sec. 5.2 may be repeated without any change to establish the conclusion of (9).

Let us remark that the first part of the preceding theorem is a special case of G. D. Birkhoff's ergodic theorem, but it was discovered a little earlier and the proof is substantially simpler.

N. Etemadi proved an unexpected generalization of Theorem 5.4.2: (8) is true when the (total) independence of $\{X_n\}$ is weakened to pairwise independence (An elementary proof of the strong law of large numbers, Z. Wahrscheinlichkeitstheorie 55 (1981), 119–122).
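A minimal simulation sketch of (8) (the Exponential(1) distribution, with $E(X_1) = 1$, and the seed are assumptions made for illustration):

```python
import random

rng = random.Random(42)
s = 0.0
for n in range(1, 1_000_001):
    s += rng.expovariate(1.0)       # X_n ~ Exponential(1), E(X_1) = 1
    if n in (10, 1_000, 1_000_000):
        print(f"n={n:>8}: S_n/n = {s / n:.4f}")   # approaches E(X_1) = 1
```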
Here is an interesting extension of the law of large numbers when the mean is infinite, due to Feller (1946).

Theorem 5.4.3. Let $\{X_n\}$ be as in Theorem 5.4.2 with $E(|X_1|) = \infty$. Let $\{a_n\}$ be a sequence of positive numbers satisfying the condition $a_n/n \uparrow$. Then we have

(11) $\limsup\limits_{n} \dfrac{|S_n|}{a_n} = 0$ a.e., or $= +\infty$ a.e.,

according as

(12) $\displaystyle\sum_n P\{|X_n| \ge a_n\} = \sum_n \int_{|x|\ge a_n} dF(x) < \infty$, or $= \infty$.

PROOF. Writing
$$\int_{|x|\ge a_n} dF(x) = \sum_{k=n}^{\infty}\int_{a_k\le|x|<a_{k+1}} dF(x),$$
substituting into (12) and rearranging the double series, we see that the series in (12) converges if and only if

(13) $\displaystyle\sum_k k\int_{a_{k-1}\le|x|<a_k} dF(x) < \infty.$

Assuming this is the case, we put
$$\mu_n = \int_{|x|<a_n} x\, dF(x); \qquad Y_n = \begin{cases} X_n - \mu_n & \text{if } |X_n| < a_n,\\ -\mu_n & \text{if } |X_n| \ge a_n.\end{cases}$$

Thus $E(Y_n) = 0$. We have by (12),

(14) $\displaystyle\sum_n P\{Y_n \ne X_n - \mu_n\} < \infty.$

Next, with $a_0 = 0$:
$$\sum_n E\left(\frac{Y_n^2}{a_n^2}\right) \le \sum_n \frac{1}{a_n^2}\int_{|x|<a_n} x^2\, dF(x) = \sum_{n=1}^{\infty}\frac{1}{a_n^2}\sum_{k=1}^{n}\int_{a_{k-1}\le|x|<a_k} x^2\, dF(x) \le \sum_{k=1}^{\infty}\int_{a_{k-1}\le|x|<a_k} dF(x)\, a_k^2\sum_{n=k}^{\infty}\frac{1}{a_n^2}.$$

Since $a_n/n \ge a_k/k$ for $n \ge k$, we have
$$\sum_{n=k}^{\infty}\frac{1}{a_n^2} \le \frac{k^2}{a_k^2}\sum_{n=k}^{\infty}\frac{1}{n^2} \le \frac{2k}{a_k^2},$$
and so
$$\sum_n E\left(\frac{Y_n^2}{a_n^2}\right) \le \sum_{k=1}^{\infty} 2k\int_{a_{k-1}\le|x|<a_k} dF(x) < \infty$$
by (13). Hence $\sum_n Y_n/a_n$ converges (absolutely) a.e. by Theorem 5.4.1, and so by Kronecker's lemma:

(15) $\displaystyle\frac{1}{a_n}\sum_{k=1}^{n} Y_k \to 0$ a.e.

We now estimate the quantity

(16) $\displaystyle\frac{1}{a_n}\sum_{k=1}^{n} \mu_k = \frac{1}{a_n}\sum_{k=1}^{n}\int_{|x|<a_k} x\, dF(x)$

as $n \to \infty$. Clearly for any $N < n$, it is bounded in absolute value by

(17) $\displaystyle\frac{n}{a_n}\left(a_N + \int_{a_N<|x|<a_n} |x|\, dF(x)\right).$

Since $E(|X_1|) = \infty$, the series in (12) cannot converge if $a_n/n$ remains bounded (see Exercise 4 of Sec. 3.2). Hence for fixed $N$ the term $(n/a_n)a_N$ in (17) tends to 0 as $n \to \infty$. The rest is bounded by

(18) $\displaystyle\frac{n}{a_n}\sum_{j=N+1}^{n} a_j\int_{a_{j-1}\le|x|<a_j} dF(x) \le \sum_{j=N+1}^{n} j\int_{a_{j-1}\le|x|<a_j} dF(x)$

because $na_j/a_n \le j$ for $j \le n$. We may now as well replace the $n$ in the right-hand member above by $\infty$; as $N \to \infty$, it tends to 0 as the remainder of the convergent series in (13). Thus the quantity in (16) tends to 0 as $n \to \infty$; combining this with (14) and (15), we obtain the first alternative in (11).

The second alternative is proved in much the same way as in Theorem 5.4.2 and is left as an exercise. Note that when $a_n = n$ it reduces to (9) above.

Corollary. Under the conditions of the theorem, we have

(19) $P\{|S_n| \ge a_n \text{ i.o.}\} = P\{|X_n| \ge a_n \text{ i.o.}\}.$

This follows because the second probability above is equal to 0 or 1 according as the series in (12) converges or diverges, by the Borel–Cantelli lemma. The result is remarkable in suggesting that the $n$th partial sum and the $n$th individual term of the sequence $\{X_n\}$ have comparable growth in a certain sense. This is in contrast to the situation when $E(|X_1|) < \infty$ and (19) is false for $a_n = n$ (see Exercise 12 below).

EXERCISES

The $X_n$'s are independent throughout; in Exercises 1, 6, 8, 9, and 12 they are also identically distributed; $S_n = \sum_{j=1}^{n} X_j$.

*1. If $E(X_1^+) = +\infty$, $E(X_1^-) < \infty$, then $S_n/n \to +\infty$ a.e.

*2. There is a complement to Theorem 5.4.1 as follows. Let $\{a_n\}$ and $\varphi$ be as there except that the conditions in (1) are replaced by the condition that $\varphi(x) \uparrow$ and $\varphi(x)/|x| \downarrow$. Then again (2) implies (3).

3. Let $\{X_n\}$ be independent and identically distributed r.v.'s such that $E(|X_1|^p) < \infty$ for some $p$: $0 < p < 2$; in case $p > 1$, we assume also that $E(X_1) = 0$. Then $S_n n^{-1/p} \to 0$ a.e. For $p = 1$ the result is weaker than Theorem 5.4.2.

4. Both Theorem 5.4.1 and its complement in Exercise 2 above are "best possible" in the following sense. Let $\{a_n\}$ and $\varphi$ be as before and suppose that $b_n > 0$,
$$\sum_n \frac{b_n}{\varphi(a_n)} = \infty.$$
Then there exists a sequence of independent and identically distributed r.v.'s $\{X_n\}$ such that $E(X_n) = 0$, $E(\varphi(X_n)) = b_n$, and
$$P\left\{\sum_n \frac{X_n}{a_n} \text{ converges}\right\} = 0.$$
[HINT: Make $\sum_n P\{|X_n| \ge a_n\} = \infty$ by letting each $X_n$ take two or three values only according as $b_n/\varphi(a_n) \le 1$ or $> 1$.]

5. Let $X_n$ take the values $\pm n^\theta$ with probability $\frac12$ each. If $0 \le \theta < \frac12$, then $S_n/n \to 0$ a.e. What if $\theta \ge \frac12$? [HINT: To answer the question, use Theorem 5.2.3 or Exercise 12 of Sec. 5.2; an alternative method is to consider the characteristic function of $S_n/n$ (see Chapter 6).]

6. Let $E(X_1) = 0$ and $\{c_n\}$ be a bounded sequence of real numbers. Then
$$\frac{1}{n}\sum_{j=1}^{n} c_j X_j \to 0 \quad\text{a.e.}$$
[HINT: Truncate $X_n$ at $n$ and proceed as in Theorem 5.4.2.]

7. We have $S_n/n \to 0$ a.e. if and only if the following two conditions are satisfied:

(i) $S_n/n \to 0$ in pr.;
(ii) $S_{2^n}/2^n \to 0$ a.e.;

an alternative set of conditions is (i) and

(ii$'$) $\forall \epsilon > 0$: $\sum_n P(|S_{2^{n+1}} - S_{2^n}| > 2^n\epsilon) < \infty$.

*8. If $E(|X_1|) < \infty$, then the sequence $\{S_n/n\}$ is uniformly integrable and $S_n/n \to E(X_1)$ in $L^1$ as well as a.e.

9. Construct an example where $E(X_1^+) = E(X_1^-) = +\infty$ and $S_n/n \to +\infty$ a.e. [HINT: Let $0 < \alpha < \beta < 1$ and take a d.f. $F$ such that $1 - F(x) \sim x^{-\alpha}$ as $x \to \infty$, and $\int_{-\infty}^{0} |x|^\beta\, dF(x) < \infty$. Show that
$$\sum_n P\Big\{\max_{1\le j\le n} X_j^+ \le n^{1/\alpha'}\Big\} < \infty$$
for every $\alpha' > \alpha$ and use Exercise 3 for $\sum_{j=1}^{n} X_j^-$. This example is due to Derman and Robbins.] A necessary and sufficient condition for $S_n/n \to +\infty$ has been given recently by K. B. Erickson.

10. Suppose there exist an $\alpha$, $0 < \alpha < 2$, $\alpha \ne 1$, and two constants $A_1$ and $A_2$ such that
$$\forall n, \forall x > 0: \frac{A_1}{x^\alpha} \le P\{|X_n| > x\} \le \frac{A_2}{x^\alpha}.$$
If $\alpha > 1$, suppose also that $E(X_n) = 0$ for each $n$. Then for any sequence $\{a_n\}$ increasing to infinity, we have
$$P\{|S_n| > a_n \text{ i.o.}\} = 0 \text{ or } 1 \quad\text{according as}\quad \sum_n \frac{n}{a_n^\alpha} < \infty \text{ or } = \infty.$$
[This result, due to P. Lévy and Marcinkiewicz, was stated with a superfluous condition on $\{a_n\}$. Proceed as in Theorem 5.3.3 but truncate $X_n$ at $a_n$; direct estimates are easy.]

11. Prove the second alternative in Theorem 5.4.3.

12. If $E(X_1) \ne 0$, then $\max_{1\le k\le n} |X_k|/|S_n| \to 0$ a.e. [HINT: $|X_n|/n \to 0$ a.e.]

13. Under the assumptions in Theorem 5.4.2, if $S_n/n$ converges a.e. then $E(|X_1|) < \infty$. [HINT: $X_n/n$ converges to 0 a.e., hence $P\{|X_n| > n \text{ i.o.}\} = 0$; use Theorem 4.2.4 to get $\sum_n P\{|X_1| > n\} < \infty$.]

5.5 Applications

The law of large numbers has numerous applications in all parts of probability theory and in other related fields such as combinatorial analysis and statistics. We shall illustrate this by two examples involving certain important new concepts.

The first deals with so-called "empiric distributions" in sampling theory. Let $\{X_n, n \ge 1\}$ be a sequence of independent, identically distributed r.v.'s with the common d.f. $F$. This is sometimes referred to as the "underlying" or "theoretical distribution" and is regarded as "unknown" in statistical lingo. For each $\omega$, the values $X_n(\omega)$ are called "samples" or "observed values", and the idea is to get some information on $F$ by looking at the samples. For each $n$, and each $\omega \in \Omega$, let the $n$ real numbers $\{X_j(\omega), 1 \le j \le n\}$ be arranged in increasing order as

(1) $Y_{n1}(\omega) \le Y_{n2}(\omega) \le \cdots \le Y_{nn}(\omega).$

Now define a discrete d.f. $F_n(\cdot, \omega)$ as follows:
$$F_n(x, \omega) = 0, \quad\text{if } x < Y_{n1}(\omega);$$
$$F_n(x, \omega) = \frac{k}{n}, \quad\text{if } Y_{nk}(\omega) \le x < Y_{n,k+1}(\omega),\ 1 \le k \le n-1;$$
$$F_n(x, \omega) = 1, \quad\text{if } x \ge Y_{nn}(\omega).$$

In other words, for each $x$, $nF_n(x, \omega)$ is the number of values of $j$, $1 \le j \le n$, for which $X_j(\omega) \le x$; or again $F_n(x, \omega)$ is the observed frequency of sample values not exceeding $x$. The function $F_n(\cdot, \omega)$ is called the empiric distribution function based on $n$ samples from $F$.

For each $x$, $F_n(x, \cdot)$ is an r.v., as we can easily check. Let us introduce also the indicator r.v.'s $\{\xi_j(x), j \ge 1\}$ as follows:
$$\xi_j(x, \omega) = \begin{cases} 1 & \text{if } X_j(\omega) \le x,\\ 0 & \text{if } X_j(\omega) > x.\end{cases}$$

We have then
$$F_n(x, \omega) = \frac{1}{n}\sum_{j=1}^{n} \xi_j(x, \omega).$$

For each $x$, the sequence $\{\xi_j(x)\}$ is totally independent since $\{X_j\}$ is, by Theorem 3.3.1. Furthermore they have the common "Bernoullian distribution", taking the values 1 and 0 with probabilities $p$ and $q = 1 - p$, where
$$p = F(x), \quad q = 1 - F(x);$$
thus $E(\xi_j(x)) = F(x)$. The strong law of large numbers in the form Theorem 5.1.2 or 5.4.2 applies, and we conclude that

(2) $F_n(x, \omega) \to F(x)$ a.e.

Matters end here if we are interested only in a particular value of $x$, or a finite number of values, but since both members in (2) contain the parameter $x$, which ranges over the whole real line, how much better it would be to make a global statement about the functions $F_n(\cdot, \omega)$ and $F(\cdot)$. We shall do this in the theorem below. Observe first the precise meaning of (2): for each $x$, there exists a null set $N(x)$ such that (2) holds for $\omega \in \Omega\setminus N(x)$. It follows that (2) also holds simultaneously for all $x$ in any given countable set $Q$, such as the set of rational numbers, for $\omega \in \Omega\setminus N$, where
$$N = \bigcup_{x\in Q} N(x)$$

is again a null set. Hence by the definition of vague convergence in Sec. 4.4, we can already assert that
$$F_n(\cdot, \omega) \xrightarrow{v} F(\cdot) \quad\text{for a.e. } \omega.$$
This will be further strengthened in two ways: convergence for all $x$ and uniformity. The result is due to Glivenko and Cantelli.

Theorem 5.5.1. We have as $n \to \infty$
$$\sup_{-\infty<x<\infty} |F_n(x, \omega) - F(x)| \to 0 \quad\text{a.e.}$$

PROOF. Let $J$ be the countable set of jumps of $F$. For each $x \in J$, define
$$\eta_j(x, \omega) = \begin{cases} 1, & \text{if } X_j(\omega) = x;\\ 0, & \text{if } X_j(\omega) \ne x.\end{cases}$$
Then for $x \in J$:
$$F_n(x, \omega) - F_n(x-, \omega) = \frac{1}{n}\sum_{j=1}^{n} \eta_j(x, \omega),$$
and it follows as before that there exists a null set $N(x)$ such that if $\omega \in \Omega\setminus N(x)$, then

(3) $F_n(x, \omega) - F_n(x-, \omega) \to F(x) - F(x-).$

Now let $N_1 = \bigcup_{x\in Q\cup J} N(x)$; then $N_1$ is a null set, and if $\omega \in \Omega\setminus N_1$, then (3) holds for every $x \in J$ and we have also

(4) $F_n(x, \omega) \to F(x)$

for every $x \in Q$. Hence the theorem will follow from the following analytical result.

Lemma. Let $F_n$ and $F$ be (right continuous) d.f.'s, $Q$ and $J$ as before. Suppose that we have
$$\forall x \in Q: F_n(x) \to F(x);$$
$$\forall x \in J: F_n(x) - F_n(x-) \to F(x) - F(x-).$$
Then $F_n$ converges uniformly to $F$ in $R^1$.

PROOF. Suppose the contrary; then there exist $\epsilon > 0$, a sequence $\{n_k\}$ of integers tending to infinity, and a sequence $\{x_k\}$ in $R^1$ such that for all $k$:

(5) $|F_{n_k}(x_k) - F(x_k)| \ge \epsilon > 0.$

This is clearly impossible if $x_k \to +\infty$ or $x_k \to -\infty$. Excluding these cases, we may suppose by taking a subsequence that $x_k \to \xi \in R^1$. Now consider four possible cases and the respective inequalities below, valid for all sufficiently large $k$, where $r_1 \in Q$, $r_2 \in Q$, $r_1 < \xi < r_2$.

Case 1. $x_k \uparrow \xi$, $x_k < \xi$:
$$\epsilon \le F_{n_k}(x_k) - F(x_k) \le F_{n_k}(\xi-) - F(r_1) \le F_{n_k}(\xi-) - F_{n_k}(\xi) + F_{n_k}(r_2) - F(r_2) + F(r_2) - F(r_1).$$

Case 2. $x_k \uparrow \xi$, $x_k < \xi$:
$$\epsilon \le F(x_k) - F_{n_k}(x_k) \le F(\xi-) - F_{n_k}(r_1) = F(\xi-) - F(r_1) + F(r_1) - F_{n_k}(r_1).$$

Case 3. $x_k \downarrow \xi$, $x_k \ge \xi$:
$$\epsilon \le F(x_k) - F_{n_k}(x_k) \le F(r_2) - F_{n_k}(\xi) \le F(r_2) - F(r_1) + F(r_1) - F_{n_k}(r_1) + F_{n_k}(\xi-) - F_{n_k}(\xi).$$

Case 4. $x_k \downarrow \xi$, $x_k \ge \xi$:
$$\epsilon \le F_{n_k}(x_k) - F(x_k) \le F_{n_k}(r_2) - F(\xi) = F_{n_k}(r_2) - F_{n_k}(r_1) + F_{n_k}(r_1) - F(r_1) + F(r_1) - F(\xi).$$

In each case let first $k \to \infty$, then $r_1 \uparrow \xi$, $r_2 \downarrow \xi$; then the last member of each chain of inequalities does not exceed a quantity which tends to 0 and a contradiction is obtained.

Remark. The reader will do well to observe the way the proof above is arranged. Having chosen a set of $\omega$ with probability one, for each fixed $\omega$ in this set we reason with the corresponding sample functions $F_n(\cdot, \omega)$ and $F(\cdot)$ without further intervention of probability. Such a procedure is standard in the theory of stochastic processes.
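A brief numerical sketch of Theorem 5.5.1 (the Uniform(0, 1) choice, for which $F(x) = x$ on $[0, 1]$, and the seed are our assumptions): the sup-distance over the order statistics shrinks as $n$ grows.

```python
import random

def sup_distance(n, seed=1):
    """sup_x |F_n(x) - x| for n Uniform(0,1) samples, computed at the jumps."""
    xs = sorted(random.Random(seed).random() for _ in range(n))
    return max(max(abs((k + 1) / n - x), abs(k / n - x))
               for k, x in enumerate(xs))

for n in (100, 10_000, 1_000_000):
    print(n, round(sup_distance(n), 4))   # tends to 0 a.e.
```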

Our next application is to renewal theory. Let $\{X_n, n \ge 1\}$ again be a sequence of independent and identically distributed r.v.'s. We shall further assume that they are positive, although this hypothesis can be dropped, and that they are not identically zero a.e. It follows that the common mean is strictly positive but may be $+\infty$. Now the successive r.v.'s are interpreted as "lifespans" of certain objects undergoing a process of renewal, or the "return periods" of certain recurrent phenomena. Typical examples are the ages of a succession of living beings and the durations of a sequence of services. This raises theoretical as well as practical questions such as: given an epoch in time, how many renewals have there been before it? how long ago was the last renewal? how soon will the next be?

Let us consider the first question. Given the epoch $t \ge 0$, let $N(t, \omega)$ be the number of renewals up to and including the time $t$. It is clear that we have

(6) $\{\omega: N(t, \omega) = n\} = \{\omega: S_n(\omega) \le t < S_{n+1}(\omega)\},$

valid for $n \ge 0$, provided $S_0 \equiv 0$. Summing over $n \le m - 1$, we obtain

(7) $\{\omega: N(t, \omega) < m\} = \{\omega: S_m(\omega) > t\}.$

This shows in particular that for each $t > 0$, $N(t) = N(t, \cdot)$ is a discrete r.v. whose range is the set of all natural numbers. The family of r.v.'s $\{N(t)\}$ indexed by $t \in [0, \infty)$ may be called a renewal process. If the common distribution $F$ of the $X_n$'s is the exponential $F(x) = 1 - e^{-\lambda x}$, $x \ge 0$, where $\lambda > 0$, then $\{N(t), t \ge 0\}$ is just the simple Poisson process with parameter $\lambda$.

Let us prove first that

(8) $\lim\limits_{t\to\infty} N(t) = +\infty$ a.e.,

namely that the total number of renewals becomes infinite with time. This is almost obvious, but the proof follows. Since $N(t, \omega)$ increases with $t$, the limit in (8) certainly exists for every $\omega$. Were it finite on a set of strictly positive probability, there would exist an integer $M$ such that
$$P\Big\{\sup_{0\le t<\infty} N(t, \omega) < M\Big\} > 0.$$
This implies by (7) that
$$P\{S_M(\omega) = +\infty\} > 0,$$
which is impossible. (Only because we have laid down the convention long ago that an r.v. such as $X_1$ should be finite-valued unless otherwise specified.)

Next let us write

(9) $0 < m = E(X_1) \le +\infty,$

and suppose for the moment that $m < +\infty$. Then, according to the strong law of large numbers (Theorem 5.4.2), $S_n/n \to m$ a.e. Specifically, there exists a null set $Z_1$ such that
$$\forall \omega \in \Omega\setminus Z_1: \lim_{n\to\infty} \frac{X_1(\omega) + \cdots + X_n(\omega)}{n} = m.$$
We have just proved that there exists a null set $Z_2$ such that
$$\forall \omega \in \Omega\setminus Z_2: \lim_{t\to\infty} N(t, \omega) = +\infty.$$

Now for each fixed $\omega_0$, if the numerical sequence $\{a_n(\omega_0), n \ge 1\}$ converges to a finite (or infinite) limit $m$ and at the same time the numerical function $\{N(t, \omega_0), 0 \le t < \infty\}$ tends to $+\infty$ as $t \to +\infty$, then the very definition of a limit implies that the numerical function $\{a_{N(t,\omega_0)}(\omega_0), 0 \le t < \infty\}$ converges to the limit $m$ as $t \to +\infty$. Applying this trivial but fundamental observation to
$$a_n = \frac{1}{n}\sum_{j=1}^{n} X_j$$
for each $\omega$ in $\Omega\setminus(Z_1 \cup Z_2)$, we conclude that

(10) $\displaystyle\lim_{t\to\infty} \frac{S_{N(t,\omega)}(\omega)}{N(t, \omega)} = m$ a.e.

By the definition of $N(t, \omega)$, the numerator on the left side should be close to $t$; this will be confirmed and strengthened in the following theorem.

Theorem 5.5.2. We have

(11) $\displaystyle\lim_{t\to\infty} \frac{N(t)}{t} = \frac{1}{m}$ a.e.

and

(12) $\displaystyle\lim_{t\to\infty} \frac{E\{N(t)\}}{t} = \frac{1}{m};$

both being true even if $m = +\infty$, provided we take $1/m$ to be 0 in that case.

PROOF. It follows from (6) that for every $\omega$:
$$S_{N(t,\omega)}(\omega) \le t < S_{N(t,\omega)+1}(\omega)$$
and consequently, as soon as $t$ is large enough to make $N(t, \omega) > 0$,
$$\frac{S_{N(t,\omega)}(\omega)}{N(t,\omega)} \le \frac{t}{N(t,\omega)} < \frac{S_{N(t,\omega)+1}(\omega)}{N(t,\omega)+1}\cdot\frac{N(t,\omega)+1}{N(t,\omega)}.$$
Letting $t \to \infty$ and using (8) and (10) (together with Exercise 1 of Sec. 5.4 in case $m = +\infty$), we conclude that (11) is true.
The deduction of (12) from (11) is more tricky than might have been thought. Since $X_n$ is not zero a.e., there exists $\delta > 0$ such that
$$\forall n: P\{X_n \ge \delta\} = p > 0.$$
Define
$$X_n'(\omega) = \begin{cases} \delta, & \text{if } X_n(\omega) \ge \delta;\\ 0, & \text{if } X_n(\omega) < \delta;\end{cases}$$
and let $S_n'$ and $N'(t)$ be the corresponding quantities for the sequence $\{X_n', n \ge 1\}$. It is obvious that $S_n' \le S_n$ and $N'(t) \ge N(t)$ for each $t$. Since the r.v.'s $\{X_n'/\delta\}$ are independent with a Bernoullian distribution, elementary computations (see Exercise 7 below) show that
$$E\{N'(t)^2\} = O\left(\frac{t^2}{\delta^2}\right) \quad\text{as } t \to \infty.$$
Hence we have, $\delta$ being fixed,
$$E\left\{\left(\frac{N(t)}{t}\right)^2\right\} \le E\left\{\left(\frac{N'(t)}{t}\right)^2\right\} = O(1).$$
Since (11) implies the convergence of $N(t)/t$ in distribution to $\delta_{1/m}$, an application of Theorem 4.5.2 with $X_n = N(n)/n$ and $p = 2$ yields (12) with $t$ replaced by $n$, from which (12) itself follows at once.

Corollary. For each $t$, $E\{N(t)\} < \infty$.
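A simulation sketch of (11) (the Exponential lifespans with rate 2, hence $m = 1/2$ and $1/m = 2$, and the seed are assumptions for illustration):

```python
import random

def renewals_up_to(t, rate=2.0, seed=7):
    """N(t): the number of renewals up to time t, lifespans ~ Exponential(rate)."""
    rng, s, n = random.Random(seed), 0.0, 0
    while True:
        s += rng.expovariate(rate)
        if s > t:
            return n
        n += 1

for t in (10.0, 1_000.0, 100_000.0):
    print(t, renewals_up_to(t) / t)   # approaches 1/m = 2
```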

An interesting relation suggested by (10) is that $E\{S_{N(t)}\}$ should be close to $mE\{N(t)\}$ when $t$ is large. The precise result is as follows:
$$E\{X_1 + \cdots + X_{N(t)+1}\} = E\{X_1\}\,E\{N(t) + 1\}.$$
This is a striking generalization of the additivity of expectations when the number of terms as well as the summands is "random." This follows from the following more general result, known as "Wald's equation".

Theorem 5.5.3. Let $\{X_n, n \ge 1\}$ be a sequence of independent and identically distributed r.v.'s with finite mean. For $k \ge 1$ let $\mathcal{F}_k$, $1 \le k < \infty$, be the Borel field generated by $\{X_j, 1 \le j \le k\}$. Suppose that $N$ is an r.v. taking positive integer values such that

(13) $\forall k \ge 1: \{N \le k\} \in \mathcal{F}_k,$

and $E(N) < \infty$. Then we have
$$E(S_N) = E(X_1)E(N).$$

PROOF. Since $S_0 = 0$ as usual, we have

(14) $\displaystyle E(S_N) = \int_\Omega S_N\, dP = \sum_{k=1}^{\infty}\int_{\{N=k\}} S_k\, dP = \sum_{k=1}^{\infty}\sum_{j=1}^{k}\int_{\{N=k\}} X_j\, dP$
$$= \sum_{j=1}^{\infty}\sum_{k=j}^{\infty}\int_{\{N=k\}} X_j\, dP = \sum_{j=1}^{\infty}\int_{\{N\ge j\}} X_j\, dP = \sum_{j=1}^{\infty}\left(E(X_j) - \int_{\{N\le j-1\}} X_j\, dP\right).$$

Now the set $\{N \le j-1\}$ and the r.v. $X_j$ are independent, hence the last written integral is equal to $E(X_j)P\{N \le j-1\}$. Substituting this into the above, we obtain
$$E(S_N) = \sum_{j=1}^{\infty} E(X_j)P\{N \ge j\} = E(X_1)\sum_{j=1}^{\infty} P\{N \ge j\} = E(X_1)E(N),$$
the last equation by the corollary to Theorem 3.2.1.

It remains to justify the interchange of summations in (14), which is essential here. This is done by replacing $X_j$ with $|X_j|$ and obtaining as before that the repeated sum is equal to $E(|X_1|)E(N) < \infty$.

We leave it to the reader to verify condition (13) for the r.v. $N(t) + 1$ above. Such an r.v. will be called "optional" in Chapter 8 and will play an important role there.
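A numerical check of Wald's equation (an assumed setup, not from the text): take $X_j$ i.i.d. Exponential(1) and $N = \min\{n: S_n > 1\}$; since the $X_j$ are positive, $\{N \le k\} = \{S_k > 1\} \in \mathcal{F}_k$, so (13) holds, and $E(S_N)$ should equal $E(X_1)E(N) = E(N)$.

```python
import random

rng = random.Random(0)
trials, sum_SN, sum_N = 100_000, 0.0, 0
for _ in range(trials):
    s, n = 0.0, 0
    while s <= 1.0:          # stop at the first n with S_n > 1
        s += rng.expovariate(1.0)
        n += 1
    sum_SN += s
    sum_N += n
print("E(S_N) ~", sum_SN / trials, "  E(X_1)E(N) ~", sum_N / trials)
```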
Our last example is a noted triumph of the ideas of probability theory applied to classical analysis. It is S. Bernstein's proof of Weierstrass' theorem on the approximation of continuous functions by polynomials.

Theorem 5.5.4. Let $f$ be a continuous function on $[0, 1]$, and define the Bernstein polynomials $\{p_n\}$ as follows:

(15) $\displaystyle p_n(x) = \sum_{k=0}^{n} f\left(\frac{k}{n}\right)\binom{n}{k} x^k(1-x)^{n-k}.$

Then $p_n$ converges uniformly to $f$ in $[0, 1]$.

PROOF. For each $x$, consider a sequence of independent Bernoullian r.v.'s $\{X_n, n \ge 1\}$ with success probability $x$, namely:
$$X_n = \begin{cases} 1 & \text{with probability } x,\\ 0 & \text{with probability } 1 - x;\end{cases}$$
and let $S_n = \sum_{k=1}^{n} X_k$ as usual. We know from elementary probability theory that
$$P\{S_n = k\} = \binom{n}{k} x^k(1-x)^{n-k}, \quad 0 \le k \le n,$$
so that
$$p_n(x) = E\left\{f\left(\frac{S_n}{n}\right)\right\}.$$
We know from the law of large numbers that $S_n/n \to x$ with probability one, but it is sufficient to have convergence in probability, which is Bernoulli's weak law of large numbers. Since $f$ is uniformly continuous in $[0, 1]$, it follows as in the proof of Theorem 4.4.5 that
$$E\left\{f\left(\frac{S_n}{n}\right)\right\} \to E\{f(x)\} = f(x).$$
We have therefore proved the convergence of $p_n(x)$ to $f(x)$ for each $x$. It remains to check the uniformity. Now we have for any $\delta > 0$:

(16) $\displaystyle |p_n(x) - f(x)| \le E\left\{\left|f\left(\frac{S_n}{n}\right) - f(x)\right|\right\}$
$$= E\left\{\left|f\left(\frac{S_n}{n}\right) - f(x)\right|;\ \left|\frac{S_n}{n} - x\right| > \delta\right\} + E\left\{\left|f\left(\frac{S_n}{n}\right) - f(x)\right|;\ \left|\frac{S_n}{n} - x\right| \le \delta\right\},$$

where we have written $E\{Y; \Lambda\}$ for $\int_\Lambda Y\, dP$. Given $\epsilon > 0$, there exists $\delta(\epsilon)$ such that
$$|x - y| \le \delta \implies |f(x) - f(y)| \le \epsilon/2.$$
With this choice of $\delta$ the last term in (16) is bounded by $\epsilon/2$. The preceding term is clearly bounded by
$$2\|f\|\,P\left\{\left|\frac{S_n}{n} - x\right| > \delta\right\}.$$
Now we have by Chebyshev's inequality, since $E(S_n) = nx$, $\sigma^2(S_n) = nx(1-x)$, and $x(1-x) \le \frac14$ for $0 \le x \le 1$:
$$P\left\{\left|\frac{S_n}{n} - x\right| > \delta\right\} \le \frac{1}{\delta^2}\frac{\sigma^2(S_n)}{n^2} = \frac{nx(1-x)}{\delta^2 n^2} \le \frac{1}{4\delta^2 n}.$$
This is nothing but Chebyshev's proof of Bernoulli's theorem. Hence if $n \ge \|f\|/(\epsilon\delta^2)$, we get $|p_n(x) - f(x)| \le \epsilon$ in (16). This proves the uniformity.

One should remark not only on the lucidity of the above derivation but also the meaningful construction of the approximating polynomials. Similar methods can be used to establish a number of well-known analytical results with relative ease; see Exercise 11 below for another example.
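A direct evaluation sketch of the polynomials (15) (the test function $f(x) = |x - \frac12|$ and the grid are our illustrative choices):

```python
from math import comb

def bernstein(f, n, x):
    """p_n(x) as in (15)."""
    return sum(f(k / n) * comb(n, k) * x ** k * (1 - x) ** (n - k)
               for k in range(n + 1))

f = lambda x: abs(x - 0.5)
for n in (10, 100, 1000):
    err = max(abs(bernstein(f, n, i / 200) - f(i / 200)) for i in range(201))
    print(f"n={n:>4}: sup-error ~ {err:.4f}")   # decreases with n
```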

EXERCISES

$\{X_n\}$ is a sequence of independent and identically distributed r.v.'s; $S_n = \sum_{j=1}^{n} X_j$.

1. Show that equality can hold somewhere in (1) with strictly positive probability if and only if the discrete part of $F$ does not vanish.

*2. Let $F_n$ and $F$ be as in Theorem 5.5.1; then the distribution of
$$\sup_{-\infty<x<\infty} |F_n(x, \omega) - F(x)|$$
is the same for all continuous $F$. [HINT: Consider $F(X)$, where $X$ has the d.f. $F$.]

3. Find the distribution of $Y_{nk}$, $1 \le k \le n$, in (1). [These r.v.'s are called order statistics.]

*4. Let $S_n$ and $N(t)$ be as in Theorem 5.5.2. Show that
$$E\{N(t)\} = \sum_{n=1}^{\infty} P\{S_n \le t\}.$$
This remains true if $X_1$ takes both positive and negative values.

5. If $E(X_1) > 0$, then
$$\lim_{t\to-\infty}\sum_{n=1}^{\infty} P[S_n \le t] = 0.$$

*6. For each $t > 0$, define
$$\nu(t, \omega) = \min\{n: |S_n(\omega)| > t\}$$
if such an $n$ exists, or $+\infty$ if not. If $P(X_1 \ne 0) > 0$, then for every $t > 0$ and $r > 0$ we have $P\{\nu(t) > n\} \le \lambda^n$ for some $\lambda < 1$ and all large $n$; consequently $E\{\nu(t)^r\} < \infty$. This implies the corollary of Theorem 5.5.2 without recourse to the law of large numbers. [This is Charles Stein's theorem.]

*7. Consider the special case of renewal where the r.v.'s are Bernoullian taking the values 1 and 0 with probabilities $p$ and $1 - p$, where $0 < p < 1$. Find explicitly the d.f. of $\nu(0)$ as defined in Exercise 6, and hence of $\nu(t)$ for every $t > 0$. Find $E\{\nu(t)\}$ and $E\{\nu(t)^2\}$. Relate $\nu(t, \omega)$ to the $N(t, \omega)$ in Theorem 5.5.2 and so calculate $E\{N(t)\}$ and $E\{N(t)^2\}$.

8. Theorem 5.5.3 remains true if $E(X_1)$ is defined, possibly $+\infty$ or $-\infty$.

*9. In Exercise 7, find the d.f. of $X_{\nu(t)}$ for a given $t$. $E\{X_{\nu(t)}\}$ is the mean lifespan of the object living at the epoch $t$; should it not be the same as $E\{X_1\}$, the mean lifespan of the given species? [This is one of the best examples of the use or misuse of intuition in probability theory.]

10. Let $\nu$ be a positive integer-valued r.v. that is independent of the $X_n$'s. Suppose that both $\nu$ and $X_1$ have finite second moments, then
$$\sigma^2(S_\nu) = E(\nu)\,\sigma^2(X_1) + \sigma^2(\nu)\,(E(X_1))^2.$$

*11. Let $f$ be continuous and belong to $L^r(0, \infty)$ for some $r > 1$, and
$$g(\lambda) = \int_0^{\infty} e^{-\lambda t} f(t)\, dt.$$
Then
$$f(x) = \lim_{n\to\infty} \frac{(-1)^{n-1}}{(n-1)!}\left(\frac{n}{x}\right)^n g^{(n-1)}\left(\frac{n}{x}\right),$$
where $g^{(n-1)}$ is the $(n-1)$st derivative of $g$, uniformly in every finite interval. [HINT: Let $\lambda > 0$, $P\{X_1(\lambda) \le t\} = 1 - e^{-\lambda t}$. Then
$$E\{f(S_n(\lambda))\} = [(-1)^{n-1}/(n-1)!]\,\lambda^n g^{(n-1)}(\lambda)$$
and $S_n(n/x) \to x$ in pr. This is a somewhat easier version of Widder's inversion formula for Laplace transforms.]

12. Let $P\{X_1 = k\} = p_k$, $1 \le k \le \ell$, $\sum_{k=1}^{\ell} p_k = 1$. Let $N_k(n, \omega)$ be the number of values of $j$, $1 \le j \le n$, for which $X_j = k$, and
$$\pi(n, \omega) = \prod_{k=1}^{\ell} p_k^{N_k(n,\omega)}.$$
Prove that
$$\lim_{n\to\infty} \frac{1}{n}\log \pi(n, \omega) \text{ exists a.e.}$$
and find the limit. [This is from information theory.]

Bibliographical Note

Borel's theorem on normal numbers, as well as the Borel–Cantelli lemma in Secs. 4.2–4.3, is contained in

Émile Borel, Sur les probabilités dénombrables et leurs applications arithmétiques, Rend. Circ. Mat. Palermo 26 (1909), 247–271.

This pioneering paper, despite some serious gaps (see Fréchet [9] for comments), is well worth reading for its historical interest. In Borel's Jubilé Selecta (Gauthier-Villars, 1940), it is followed by a commentary by Paul Lévy with a bibliography on later developments.

Every serious student of probability theory should read:

A. N. Kolmogoroff, Über die Summen durch den Zufall bestimmten unabhängiger Grössen, Math. Annalen 99 (1928), 309–319; Bemerkungen, 102 (1929), 484–488.

This contains Theorems 5.3.1 to 5.3.3 as well as the original version of Theorem 5.2.3.

For all convergence questions regarding sums of independent r.v.'s, the deepest study is given in Chapter 6 of Lévy's book [11]. After three decades, this book remains a source of inspiration.

Theorem 5.5.2 is taken from

J. L. Doob, Renewal theory from the point of view of probability, Trans. Am. Math. Soc. 63 (1942), 422–438.

Feller's book [13], both volumes, contains an introduction to renewal theory as well as some of its latest developments.
6 Characteristic function

6.1 General properties; convolutions

An important tool in the study of r.v.'s and their p.m.'s or d.f.'s is the characteristic function (ch.f.). For any r.v. $X$ with the p.m. $\mu$ and d.f. $F$, this is defined to be the function $f$ on $R^1$ as follows, $\forall t \in R^1$:

(1) $\displaystyle f(t) = E(e^{itX}) = \int_\Omega e^{itX(\omega)}\, P(d\omega) = \int_{R^1} e^{itx}\, \mu(dx) = \int_{-\infty}^{\infty} e^{itx}\, dF(x).$

The equality of the third and fourth terms above is a consequence of Theorem 3.2.2, while the rest is by definition and notation. We remind the reader that the last term in (1) is defined to be the one preceding it, where the one-to-one correspondence between $\mu$ and $F$ is discussed in Sec. 2.2. We shall use both of them below. Let us also point out, since our general discussion of integrals has been confined to the real domain, that $f$ is a complex-valued function of the real variable $t$, whose real and imaginary parts are given respectively by
$$Rf(t) = \int \cos xt\, \mu(dx), \qquad If(t) = \int \sin xt\, \mu(dx).$$
6.1 GENERAL PROPERTIES; CONVOLUTIONS 151

Here and hereafter, integrals without an indicated domain of integration are


taken over R1 .
Clearly, the ch.f. is a creature associated with or F, not with X, but
the first equation in (1) will prove very convenient for us, see e.g. (iii) and (v)
below. In analysis, the ch.f. is known as the Fourier–Stieltjes transform of
or F. It can also be defined over a wider class of or F and, furthermore,
be considered as a function of a complex variable t, under certain conditions
that ensure the existence of the integrals in (1). This extension is important in
some applications, but we will not need it here except in Theorem 6.6.5. As
specified above, it is always well defined (for all real t) and has the following
simple properties.

(i) 8t 2 R1 :
jftj  1 D f0; ft D ft,
where z denotes the conjugate complex of z.
(ii) f is uniformly continuous in R1 .
To see this, we write for real t and h:

ft C h  ft D eitChx  eitx  dx,
 
jft C h  ftj  je jje
itx ihx
 1j dx D jeihx  1j dx.

The last integrand is bounded by 2 and tends to 0 as h ! 0, for each x. Hence,


the integral converges to 0 by bounded convergence. Since it does not involve
t, the convergence is surely uniform with respect to t.
(iii) If we write fX for the ch.f. of X, then for any real numbers a and
b, we have
faXCb t D fX ateitb ,
fX t D fX t.
This is easily seen from the first equation in (1), for
E eitaXCb  D E eitaX Ð eitb  D E eitaX eitb .

(iv) If ffn , n ½ 1g are ch.f.’s, n ½ 0, 1 nD1 n D 1, then
1

n fn
nD1

is a ch.f. Briefly: a convex combination of ch.f.’s is a ch.f.


152 CHARACTERISTIC FUNCTION

1
For if f n ,
n ½ 1g are the corresponding p.m.’s, then nD1 n n is a p.m.
whose ch.f. is 1 nD1 n fn .
(v) If ffj , 1  j  ng are ch.f.’s, then

n
fj
jD1

is a ch.f.

By Theorem 3.3.4, there exist independent r.v.’s fXj , 1  j  ng with


probability distributions f j , 1  j  ng, where j is as in (iv). Letting

n
Sn D Xj ,
jD1

we have by the corollary to Theorem 3.3.3:


⎛ ⎞

n n 
n
E eitSn  D E ⎝ eitXj ⎠ D E eitXj  D fj t;
jD1 jD1 jD1

or in the notation of (iii):



n
2 fSn D fXj .
jD1

(For an extension to an infinite number of fj ’s see Exercise 4 below.)


The ch.f. of $S_n$ being so neatly expressed in terms of the ch.f.'s of the summands, we may wonder about the d.f. of $S_n$. We need the following definitions.

DEFINITION. The convolution of two d.f.'s $F_1$ and $F_2$ is defined to be the d.f. $F$ such that

(3) $\displaystyle\forall x \in R^1: F(x) = \int_{-\infty}^{\infty} F_1(x-y)\, dF_2(y),$

and written as
$$F = F_1 * F_2.$$

It is easy to verify that $F$ is indeed a d.f. The other basic properties of convolution are consequences of the following theorem.

Theorem 6.1.1. Let $X_1$ and $X_2$ be independent r.v.'s with d.f.'s $F_1$ and $F_2$, respectively. Then $X_1 + X_2$ has the d.f. $F_1 * F_2$.
PROOF. We wish to show that

(4) $\forall x: P\{X_1 + X_2 \le x\} = (F_1 * F_2)(x).$

For this purpose we define a function $f$ of $(x_1, x_2)$ as follows, for fixed $x$:
$$f(x_1, x_2) = \begin{cases} 1, & \text{if } x_1 + x_2 \le x;\\ 0, & \text{otherwise.}\end{cases}$$
$f$ is a Borel measurable function of two variables. By Theorem 3.3.3 and using the notation of the second proof of Theorem 3.3.3, we have
$$\int_\Omega f(X_1, X_2)\, dP = \iint_{R^2} f(x_1, x_2)\, \mu^2(dx_1, dx_2)$$
$$= \int_{R^1} \mu_2(dx_2)\int_{R^1} f(x_1, x_2)\, \mu_1(dx_1) = \int_{R^1} \mu_2(dx_2)\int_{(-\infty, x-x_2]} \mu_1(dx_1)$$
$$= \int_{-\infty}^{\infty} F_1(x - x_2)\, dF_2(x_2).$$
This reduces to (4). The second equation above, evaluating the double integral by an iterated one, is an application of Fubini's theorem (see Sec. 3.3).

Corollary. The binary operation of convolution $*$ is commutative and associative.

For the corresponding binary operation of addition of independent r.v.'s has these two properties.

DEFINITION. The convolution of two probability density functions $p_1$ and $p_2$ is defined to be the probability density function $p$ such that

(5) $\displaystyle\forall x \in R^1: p(x) = \int_{-\infty}^{\infty} p_1(x-y)p_2(y)\, dy,$

and written as
$$p = p_1 * p_2.$$

We leave it to the reader to verify that $p$ is indeed a density, but we will spell out the following connection.

Theorem 6.1.2. The convolution of two absolutely continuous d.f.'s with densities $p_1$ and $p_2$ is absolutely continuous with density $p_1 * p_2$.
PROOF. We have by Fubini's theorem:
$$\int_{-\infty}^{x} p(u)\, du = \int_{-\infty}^{x} du\int_{-\infty}^{\infty} p_1(u-v)p_2(v)\, dv$$
$$= \int_{-\infty}^{\infty}\left[\int_{-\infty}^{x} p_1(u-v)\, du\right] p_2(v)\, dv$$
$$= \int_{-\infty}^{\infty} F_1(x-v)p_2(v)\, dv$$
$$= \int_{-\infty}^{\infty} F_1(x-v)\, dF_2(v) = (F_1 * F_2)(x).$$
This shows that $p$ is a density of $F_1 * F_2$.
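A discretized sketch of the density convolution (5) (the grid, the step size, and the Uniform(0, 1) factors are our assumptions): convolving two Uniform(0, 1) densities should reproduce the triangular density, which peaks at 1 at $x = 1$.

```python
def convolve_density(p1, p2, grid, dx):
    """Riemann-sum approximation of (p1 * p2)(x) over the given grid."""
    return [dx * sum(p1(x - y) * p2(y) for y in grid) for x in grid]

dx = 0.01
grid = [i * dx for i in range(200)]                  # covers [0, 2)
uniform = lambda u: 1.0 if 0.0 <= u < 1.0 else 0.0
p = convolve_density(uniform, uniform, grid, dx)
print(p[100])    # value near x = 1.0; approximately 1, the triangular peak
```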


What is the p.m., to be denoted by $\mu_1 * \mu_2$, that corresponds to $F_1 * F_2$? For arbitrary subsets $A$ and $B$ of $R^1$, we denote their vector sum and difference by $A + B$ and $A - B$, respectively:

(6) $A \pm B = \{x \pm y: x \in A, y \in B\};$

and write $x \pm B$ for $\{x\} \pm B$, $-B$ for $0 - B$. There should be no danger of confusing $A - B$ with $A\setminus B$.

Theorem 6.1.3. For each $B \in \mathcal{B}$, we have

(7) $\displaystyle(\mu_1 * \mu_2)(B) = \int_{R^1} \mu_1(B - y)\, \mu_2(dy).$

For each Borel measurable function $g$ that is integrable with respect to $\mu_1 * \mu_2$, we have

(8) $\displaystyle\int_{R^1} g(u)\, (\mu_1 * \mu_2)(du) = \int_{R^1}\int_{R^1} g(x+y)\, \mu_1(dx)\, \mu_2(dy).$

PROOF. It is easy to verify that the set function $(\mu_1 * \mu_2)(\cdot)$ defined by (7) is a p.m. To show that its d.f. is $F_1 * F_2$, we need only verify that its value for $B = (-\infty, x]$ is given by the $F(x)$ defined in (3). This is obvious, since the right side of (7) then becomes
$$\int_{R^1} F_1(x-y)\, \mu_2(dy) = \int_{-\infty}^{\infty} F_1(x-y)\, dF_2(y).$$
Now let $g$ be the indicator of the set $B$; then for each $y$, the function $g_y$ defined by $g_y(x) = g(x+y)$ is the indicator of the set $B - y$. Hence
$$\int_{R^1} g(x+y)\, \mu_1(dx) = \mu_1(B-y)$$
and, substituting into the right side of (8), we see that it reduces to (7) in this case. The general case is proved in the usual way by first considering simple functions $g$ and then passing to the limit for integrable functions.

As an instructive example, let us calculate the ch.f. of the convolution $\mu_1 * \mu_2$. We have by (8)
$$\int e^{itu}\, (\mu_1 * \mu_2)(du) = \iint e^{ity}e^{itx}\, \mu_1(dx)\, \mu_2(dy) = \int e^{itx}\, \mu_1(dx)\int e^{ity}\, \mu_2(dy).$$
This is as it should be by (v), since the first term above is the ch.f. of $X + Y$, where $X$ and $Y$ are independent with $\mu_1$ and $\mu_2$ as p.m.'s. Let us restate the results, after an obvious induction, as follows.

Theorem 6.1.4. Addition of (a finite number of) independent r.v.'s corresponds to convolution of their d.f.'s and multiplication of their ch.f.'s.

Corollary. If $f$ is a ch.f., then so is $|f|^2$.

To prove the corollary, let $X$ have the ch.f. $f$. Then there exists on some $\Omega$ (why?) an r.v. $Y$ independent of $X$ and having the same d.f., and so also the same ch.f. $f$. The ch.f. of $X - Y$ is
$$E(e^{it(X-Y)}) = E(e^{itX})E(e^{-itY}) = f(t)\overline{f(t)} = |f(t)|^2.$$
The technique of considering $X - Y$ and $|f|^2$ instead of $X$ and $f$ will be used below and referred to as "symmetrization" (see the end of Sec. 6.2). This is often expedient, since a real and particularly a positive-valued ch.f. such as $|f|^2$ is easier to handle than a general one.
Let us list a few well-known ch.f.'s together with their d.f.'s or p.d.'s (probability densities), the last being given in the interval outside of which they vanish.

(1) Point mass at $a$:
d.f. $\delta_a$; ch.f. $e^{iat}$.

(2) Symmetric Bernoullian distribution with mass $\frac12$ each at $+1$ and $-1$:
d.f. $\frac12(\delta_1 + \delta_{-1})$; ch.f. $\cos t$.

(3) Bernoullian distribution with "success probability" $p$, and $q = 1 - p$:
d.f. $q\delta_0 + p\delta_1$; ch.f. $q + pe^{it} = 1 + p(e^{it} - 1)$.

(4) Binomial distribution for $n$ trials with success probability $p$:
d.f. $\sum_{k=0}^{n}\binom{n}{k}p^k q^{n-k}\delta_k$; ch.f. $(q + pe^{it})^n$.

(5) Geometric distribution with success probability $p$:
d.f. $\sum_{n=0}^{\infty} q^n p\,\delta_n$; ch.f. $p(1 - qe^{it})^{-1}$.

(6) Poisson distribution with (mean) parameter $\lambda$:
d.f. $\sum_{n=0}^{\infty} e^{-\lambda}\dfrac{\lambda^n}{n!}\delta_n$; ch.f. $e^{\lambda(e^{it}-1)}$.

(7) Exponential distribution with mean $\lambda^{-1}$:
p.d. $\lambda e^{-\lambda x}$ in $[0, \infty)$; ch.f. $(1 - \lambda^{-1}it)^{-1}$.

(8) Uniform distribution in $[-a, +a]$:
p.d. $\dfrac{1}{2a}$ in $[-a, a]$; ch.f. $\dfrac{\sin at}{at}$ ($= 1$ for $t = 0$).

(9) Triangular distribution in $[-a, a]$:
p.d. $\dfrac{a - |x|}{a^2}$ in $[-a, a]$; ch.f. $\dfrac{2(1 - \cos at)}{a^2t^2} = \left(\dfrac{\sin\frac{at}{2}}{\frac{at}{2}}\right)^2$.

(10) Reciprocal of (9):
p.d. $\dfrac{1 - \cos ax}{\pi a x^2}$ in $(-\infty, \infty)$; ch.f. $\left(1 - \dfrac{|t|}{a}\right)\vee 0$.

(11) Normal distribution $N(m, \sigma^2)$ with mean $m$ and variance $\sigma^2$:
p.d. $\dfrac{1}{\sqrt{2\pi}\,\sigma}\exp\left[-\dfrac{(x-m)^2}{2\sigma^2}\right]$ in $(-\infty, \infty)$;
ch.f. $\exp\left(imt - \dfrac{\sigma^2 t^2}{2}\right)$.

Unit normal distribution $N(0, 1) = \Phi$ with mean 0 and variance 1:
p.d. $\dfrac{1}{\sqrt{2\pi}}e^{-x^2/2}$ in $(-\infty, \infty)$; ch.f. $e^{-t^2/2}$.

(12) Cauchy distribution with parameter $a > 0$:
p.d. $\dfrac{a}{\pi(a^2 + x^2)}$ in $(-\infty, \infty)$; ch.f. $e^{-a|t|}$.
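A quick numerical check against entry (11) of the list (the sample size and the seed are our illustrative choices): the empirical average of $e^{itX}$ over $N(0, 1)$ samples should match the ch.f. $e^{-t^2/2}$.

```python
import cmath, random

rng = random.Random(3)
xs = [rng.gauss(0.0, 1.0) for _ in range(200_000)]
for t in (0.5, 1.0, 2.0):
    emp = sum(cmath.exp(1j * t * x) for x in xs) / len(xs)
    print(t, abs(emp - cmath.exp(-t * t / 2)))   # small sampling error
```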

Convolution is a smoothing operation widely used in mathematical analysis, for instance in the proof of Theorem 6.5.2 below. Convolution with the normal kernel is particularly effective, as illustrated below.

Let $n_\delta$ be the density of the normal distribution $N(0, \delta^2)$, namely
$$n_\delta(x) = \frac{1}{\sqrt{2\pi}\,\delta}\exp\left(-\frac{x^2}{2\delta^2}\right), \quad -\infty < x < \infty.$$
For any bounded measurable function $f$ on $R^1$, put

(9) $\displaystyle f_\delta(x) = (f * n_\delta)(x) = \int_{-\infty}^{\infty} f(x-y)n_\delta(y)\, dy = \int_{-\infty}^{\infty} n_\delta(x-y)f(y)\, dy.$

It will be seen below that the integrals above converge. Let $C_B^{(\infty)}$ denote the class of functions on $R^1$ which have bounded derivatives of all orders; $C_U$ the class of bounded and uniformly continuous functions on $R^1$.

Theorem 6.1.5. For each $\delta > 0$, we have $f_\delta \in C_B^{(\infty)}$. Furthermore if $f \in C_U$, then $f_\delta \to f$ uniformly in $R^1$.

PROOF. It is easily verified that $n_\delta \in C_B^{(\infty)}$. Moreover its $k$th derivative $n_\delta^{(k)}$ is dominated by $c_{k,\delta}\,n_{2\delta}$, where $c_{k,\delta}$ is a constant depending only on $k$ and $\delta$, so that
$$\left|\int_{-\infty}^{\infty} n_\delta^{(k)}(x-y)f(y)\, dy\right| \le c_{k,\delta}\|f\|\int_{-\infty}^{\infty} n_{2\delta}(x-y)\, dy = c_{k,\delta}\|f\|.$$
Thus the first assertion follows by differentiation under the integral of the last term in (9), which is justified by elementary rules of calculus. The second assertion is proved by standard estimation as follows, for any $\eta > 0$:
$$|f(x) - f_\delta(x)| \le \int_{-\infty}^{\infty} |f(x) - f(x-y)|n_\delta(y)\, dy \le \sup_{|y|\le\eta}|f(x) - f(x-y)| + 2\|f\|\int_{|y|>\eta} n_\delta(y)\, dy.$$

Here is the probability idea involved. If $f$ is integrable over $R^1$, we may think of $f$ as the density of an r.v. $X$, and $n_\delta$ as that of an independent normal r.v. $Y_\delta$. Then $f_\delta$ is the density of $X + Y_\delta$ by Theorem 6.1.2. As $\delta \downarrow 0$, $Y_\delta$ converges to 0 in probability and so $X + Y_\delta$ converges to $X$ likewise, hence also in distribution by Theorem 4.4.5. This makes it plausible that the densities will also converge under certain analytical conditions.
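A sketch of the smoothing (9) under assumed choices (the indicator of $[0, 1]$ as $f$, a truncated midpoint quadrature over $|y| \le 6\delta$): $f_\delta$ is smooth for each $\delta$ and approaches $f$ away from the jumps as $\delta$ shrinks.

```python
import math

def f_delta(x, delta, f=lambda u: 1.0 if 0.0 <= u <= 1.0 else 0.0):
    """(f * n_delta)(x), integrating f(x - y) n_delta(y) over |y| <= 6*delta."""
    steps, half = 2000, 6 * delta
    dy, total = 2 * half / steps, 0.0
    for k in range(steps):
        y = -half + (k + 0.5) * dy
        total += f(x - y) * math.exp(-y * y / (2 * delta ** 2)) * dy
    return total / (delta * math.sqrt(2 * math.pi))

for delta in (0.5, 0.1, 0.01):
    print(delta, round(f_delta(0.5, delta), 4), round(f_delta(2.0, delta), 6))
```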
As a corollary, we have shown that the class $C_B^{(\infty)}$ is dense in the class $C_U$ with respect to the uniform topology on $R^1$. This is a basic result in the theory of "generalized functions" or "Schwartz distributions". By way of application, we state the following strengthening of Theorem 4.4.1.

Theorem 6.1.6. If $\{\mu_n\}$ and $\mu$ are s.p.m.'s such that
$$\forall f \in C_B^{(\infty)}: \int_{R^1} f(x)\, \mu_n(dx) \to \int_{R^1} f(x)\, \mu(dx),$$
then $\mu_n \xrightarrow{v} \mu$.

This is an immediate consequence of Theorem 4.4.1 and Theorem 6.1.5, if we observe that $C_0 \subset C_U$. The reduction of the class of "test functions" from $C_0$ to $C_B^{(\infty)}$ is often expedient, as in Lindeberg's method for proving central limit theorems; see Sec. 7.1 below.

EXERCISES

1. If $f$ is a ch.f., and $G$ a d.f. with $G(0-) = 0$, then the following functions are all ch.f.'s:
$$\int_0^1 f(ut)\, du, \quad \int_0^{\infty} f(ut)e^{-u}\, du, \quad \int_0^{\infty} e^{-|t|u}\, dG(u),$$
$$\int_0^{\infty} e^{-t^2u}\, dG(u), \quad \int_0^{\infty} f(ut)\, dG(u).$$

*2. Let $f(u, t)$ be a function on $(-\infty, \infty)\times(-\infty, \infty)$ such that for each $u$, $f(u, \cdot)$ is a ch.f. and for each $t$, $f(\cdot, t)$ is a continuous function; then
$$\int_{-\infty}^{\infty} f(u, t)\, dG(u)$$
is a ch.f. for any d.f. $G$. In particular, if $f$ is a ch.f. such that $\lim_{t\to\infty} f(t)$ exists and $G$ a d.f. with $G(0-) = 0$, then
$$\int_0^{\infty} f\left(\frac{t}{u}\right) dG(u) \text{ is a ch.f.}$$

3. Find the d.f. with the following ch.f.'s ($\alpha > 0$, $\beta > 0$):
$$\frac{\alpha^2}{\alpha^2 + t^2}, \quad \frac{1}{(1 - \alpha it)^\beta}, \quad \frac{1}{(1 + \alpha\beta - \alpha\beta e^{it})^{1/\beta}}.$$
[HINT: The second and third correspond respectively to the gamma and Pólya distributions.]
4. Let $S_n$ be as in (v) and suppose that $S_n \to S_\infty$ in pr. Prove that
$$\prod_{j=1}^{\infty} f_j(t)$$
converges in the sense of infinite product for each $t$ and is the ch.f. of $S_\infty$.

5. If $F_1$ and $F_2$ are d.f.'s such that
$$F_1 = \sum_j b_j\delta_{a_j}$$
and $F_2$ has density $p$, show that $F_1 * F_2$ has a density and find it.

*6. Prove that the convolution of two discrete d.f.'s is discrete; that of a continuous d.f. with any d.f. is continuous; that of an absolutely continuous d.f. with any d.f. is absolutely continuous.

7. The convolution of two discrete distributions with exactly $m$ and $n$ atoms, respectively, has at least $m + n - 1$ and at most $mn$ atoms.

8. Show that the family of normal (Cauchy, Poisson) distributions is closed with respect to convolution in the sense that the convolution of any two in the family with arbitrary parameters is another in the family with some parameter(s).

9. Find the $n$th iterated convolution of an exponential distribution.

*10. Let $\{X_j, j \ge 1\}$ be a sequence of independent r.v.'s having the common exponential distribution with mean $1/\lambda$, $\lambda > 0$. For given $x > 0$ let $\nu$ be the maximum of $n$ such that $S_n \le x$, where $S_0 = 0$, $S_n = \sum_{j=1}^{n} X_j$ as usual. Prove that the r.v. $\nu$ has the Poisson distribution with mean $\lambda x$. See Sec. 5.5 for an interpretation by renewal theory.

11. Let $X$ have the normal distribution $\Phi$. Find the d.f., p.d., and ch.f. of $X^2$.

12. Let $\{X_j, 1 \le j \le n\}$ be independent r.v.'s each having the d.f. $\Phi$. Find the ch.f. of
$$\sum_{j=1}^{n} X_j^2$$
and show that the corresponding p.d. is $2^{-n/2}\Gamma(n/2)^{-1}x^{n/2-1}e^{-x/2}$ in $(0, \infty)$. This is called in statistics the "$\chi^2$ distribution with $n$ degrees of freedom".

13. For any ch.f. $f$ we have for every $t$:
$$R[1 - f(t)] \ge \tfrac14 R[1 - f(2t)].$$

14. Find an example of two r.v.'s $X$ and $Y$ with the same p.m. $\mu$ that are not independent but such that $X + Y$ has the p.m. $\mu * \mu$. [HINT: Take $X = Y$ and use ch.f.]
*15. For a d.f. $F$ and $h \ge 0$, define
$$Q_F(h) = \sup_x [F(x+h) - F(x-)];$$
$Q_F$ is called the Lévy concentration function of $F$. Prove that the sup above is attained, and if $G$ is also a d.f., we have
$$\forall h > 0: Q_{F*G}(h) \le Q_F(h) \wedge Q_G(h).$$

16. If $0 < h \le 2\pi/T$, then there is an absolute constant $A$ such that
$$Q_F(h) \le \frac{A}{T}\int_0^T |f(t)|\, dt,$$
where $f$ is the ch.f. of $F$. [HINT: Use Exercise 2 of Sec. 6.2 below.]

*17. Let $F$ be a symmetric d.f. with ch.f. $f \ge 0$; then
$$\varphi_F(h) = \int_{-\infty}^{\infty} \frac{h^2}{h^2 + x^2}\, dF(x) = h\int_0^{\infty} e^{-ht}f(t)\, dt$$
is a sort of average concentration function. Prove that if $G$ is also a d.f. with ch.f. $g \ge 0$, then we have $\forall h > 0$:
$$\varphi_{F*G}(h) \le \varphi_F(h) \wedge \varphi_G(h);$$
$$1 - \varphi_{F*G}(h) \le [1 - \varphi_F(h)] + [1 - \varphi_G(h)].$$

*18. Let the support of the p.m. $\mu$ on $R^1$ be denoted by supp $\mu$. Prove that
$$\text{supp}\,(\mu * \nu) = \text{closure of } (\text{supp}\,\mu + \text{supp}\,\nu);$$
$$\text{supp}\,(\mu_1 * \mu_2 * \cdots) = \text{closure of } (\text{supp}\,\mu_1 + \text{supp}\,\mu_2 + \cdots),$$
where "$+$" denotes vector sum.

6.2 Uniqueness and inversion

To study the deeper properties of Fourier–Stieltjes transforms, we shall need certain "Dirichlet integrals". We begin with three basic formulas, where "sgn $\alpha$" denotes $1$, $0$, or $-1$, according as $\alpha > 0$, $= 0$, or $< 0$.

(1) $\displaystyle\forall y \ge 0: 0 \le \text{sgn}\,\alpha\int_0^y \frac{\sin \alpha x}{x}\, dx \le \int_0^{\pi} \frac{\sin x}{x}\, dx.$

(2) $\displaystyle\int_0^{\infty} \frac{\sin \alpha x}{x}\, dx = \frac{\pi}{2}\,\text{sgn}\,\alpha.$

(3) $\displaystyle\int_0^{\infty} \frac{1 - \cos \alpha x}{x^2}\, dx = \frac{\pi}{2}|\alpha|.$

The substitution $\alpha x = u$ shows at once that it is sufficient to prove all three formulas for $\alpha = 1$. The inequality (1) is proved by partitioning the interval $[0, \infty)$ with positive multiples of $\pi$ so as to convert the integral into a series of alternating signs and decreasing moduli. The integral in (2) is a standard exercise in contour integration, as is also that in (3). However, we shall indicate the following neat heuristic calculations, leaving the justifications, which are not difficult, as exercises:
$$\int_0^{\infty} \frac{\sin x}{x}\, dx = \int_0^{\infty} \sin x\left[\int_0^{\infty} e^{-xu}\, du\right] dx = \int_0^{\infty}\left[\int_0^{\infty} e^{-xu}\sin x\, dx\right] du = \int_0^{\infty} \frac{du}{1+u^2} = \frac{\pi}{2};$$
$$\int_0^{\infty} \frac{1 - \cos x}{x^2}\, dx = \int_0^{\infty} \frac{1}{x^2}\left[\int_0^x \sin u\, du\right] dx = \int_0^{\infty} \sin u\left[\int_u^{\infty} \frac{dx}{x^2}\right] du = \int_0^{\infty} \frac{\sin u}{u}\, du = \frac{\pi}{2}.$$
We are ready to answer the question: given a ch.f. $f$, how can we find the corresponding d.f. $F$ or p.m. $\mu$? The formula for doing this, called the inversion formula, is of theoretical importance, since it will establish a one-to-one correspondence between the class of d.f.'s or p.m.'s and the class of ch.f.'s (see, however, Exercise 12 below). It is somewhat complicated in its most general form, but special cases or variants of it can actually be employed to derive certain properties of a d.f. or p.m. from its ch.f.; see, e.g., (14) and (15) of Sec. 6.4.

Theorem 6.2.1. If $x_1 < x_2$, then we have

(4) $\displaystyle\mu((x_1, x_2)) + \frac12\mu(\{x_1\}) + \frac12\mu(\{x_2\}) = \lim_{T\to\infty}\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-itx_1} - e^{-itx_2}}{it}f(t)\, dt$

(the integrand being defined by continuity at $t = 0$).

PROOF. Observe first that the integrand above is bounded by $|x_1 - x_2|$ everywhere and is $O(|t|^{-1})$ as $|t| \to \infty$; yet we cannot assert that the "infinite integral" $\int_{-\infty}^{\infty}$ exists (in the Lebesgue sense). Indeed, it does not in general (see Exercise 9 below). The fact that the indicated limit, the so-called Cauchy limit, does exist is part of the assertion.
We shall prove (4) by actually substituting the definition of $f$ into (4) and carrying out the integrations. We have

(5) $\displaystyle\frac{1}{2\pi}\int_{-T}^{T}\frac{e^{-itx_1} - e^{-itx_2}}{it}\left[\int_{-\infty}^{\infty} e^{itx}\, \mu(dx)\right] dt = \int_{-\infty}^{\infty}\left[\int_{-T}^{T}\frac{e^{it(x-x_1)} - e^{it(x-x_2)}}{2\pi it}\, dt\right]\mu(dx).$

Leaving aside for the moment the justification of the interchange of the iterated integral, let us denote the quantity in square brackets above by $I(T, x, x_1, x_2)$. Trivial simplification yields
$$I(T, x, x_1, x_2) = \frac{1}{\pi}\int_0^T \frac{\sin t(x - x_1)}{t}\, dt - \frac{1}{\pi}\int_0^T \frac{\sin t(x - x_2)}{t}\, dt.$$
It follows from (2) that
$$\lim_{T\to\infty} I(T, x, x_1, x_2) = \begin{cases} (-\frac12) - (-\frac12) = 0 & \text{for } x < x_1,\\ 0 - (-\frac12) = \frac12 & \text{for } x = x_1,\\ \frac12 - (-\frac12) = 1 & \text{for } x_1 < x < x_2,\\ \frac12 - 0 = \frac12 & \text{for } x = x_2,\\ \frac12 - \frac12 = 0 & \text{for } x > x_2.\end{cases}$$
Furthermore, $I$ is bounded in $T$ by (1). Hence we may let $T \to \infty$ under the integral sign in the right member of (5) by bounded convergence, since
$$|I(T, x, x_1, x_2)| \le \frac{2}{\pi}\int_0^{\pi}\frac{\sin x}{x}\, dx$$
by (1). The result is
$$\int_{(-\infty,x_1)} 0\,d\mu + \int_{\{x_1\}}\frac12\,d\mu + \int_{(x_1,x_2)} 1\,d\mu + \int_{\{x_2\}}\frac12\,d\mu + \int_{(x_2,\infty)} 0\,d\mu = \frac12\mu(\{x_1\}) + \mu((x_1, x_2)) + \frac12\mu(\{x_2\}).$$
This proves the theorem. For the justification mentioned above, we invoke Fubini's theorem and observe that
$$\left|\frac{e^{it(x-x_1)} - e^{it(x-x_2)}}{it}\right| = \left|\int_{x_1}^{x_2} e^{it(x-u)}\, du\right| \le |x_1 - x_2|,$$
where the integral is taken along the real axis, and
$$\int_{R^1}\int_{-T}^{T} |x_1 - x_2|\, dt\, \mu(dx) \le 2T|x_1 - x_2| < \infty,$$
so that the integrand on the right of (5) is dominated by a finitely integrable function with respect to the finite product measure $dt\cdot\mu(dx)$ on $[-T, +T]\times R^1$. This suffices.
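A numerical sketch of the inversion formula (4) under assumed choices (the $N(0, 1)$ ch.f. $f(t) = e^{-t^2/2}$, the truncation $T$, and the midpoint quadrature are ours): with $x_1 = -1$, $x_2 = 1$ the limit should be $\mu((-1, 1)) \approx 0.6827$, the endpoints carrying no mass.

```python
import cmath, math

def invert(f, x1, x2, T=20.0, steps=100_000):
    """Midpoint-rule value of the right member of (4) for a given T."""
    dt, total = 2 * T / steps, 0.0
    for k in range(steps):
        t = -T + (k + 0.5) * dt            # never hits t = 0
        total += ((cmath.exp(-1j * t * x1) - cmath.exp(-1j * t * x2))
                  / (1j * t) * f(t)).real * dt
    return total / (2 * math.pi)

print(invert(lambda t: math.exp(-t * t / 2), -1.0, 1.0))   # ~0.6827
```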

Remark. If $x_1$ and $x_2$ are points of continuity of $F$, the left side of (4) is $F(x_2) - F(x_1)$.

The following result is often referred to as the "uniqueness theorem" for the "determining" $\mu$ or $F$ (see also Exercise 12 below).

Theorem 6.2.2. If two p.m.'s or d.f.'s have the same ch.f., then they are the same.

PROOF. If neither $x_1$ nor $x_2$ is an atom of $\mu$, the inversion formula (4) shows that the value of $\mu$ on the interval $(x_1, x_2)$ is determined by its ch.f. It follows that two p.m.'s having the same ch.f. agree on each interval whose endpoints are not atoms for either measure. Since each p.m. has only a countable set of atoms, points of $R^1$ that are not atoms for either measure form a dense set. Thus the two p.m.'s agree on a dense set of intervals, and therefore they are identical by the corollary to Theorem 2.2.3.
We give next an important particular case of Theorem 6.2.1.

Theorem 6.2.3. If $f \in L^1(-\infty, +\infty)$, then $F$ is continuously differentiable, and we have

(6) $\displaystyle F'(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f(t)\, dt.$

PROOF. Applying (4) for $x_2 = x$ and $x_1 = x - h$ with $h > 0$ and using $F$ instead of $\mu$, we have
$$\frac{F(x) + F(x-)}{2} - \frac{F(x-h) + F((x-h)-)}{2} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{ith} - 1}{it}e^{-itx}f(t)\, dt.$$
Here the infinite integral exists by the hypothesis on $f$, since the integrand above is dominated by $|hf(t)|$. Hence we may let $h \to 0$ under the integral sign by dominated convergence and conclude that the left side is 0. Thus, $F$ is left continuous and so continuous in $R^1$. Now we can write
$$\frac{F(x) - F(x-h)}{h} = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{ith} - 1}{ith}e^{-itx}f(t)\, dt.$$
The same argument as before shows that the limit exists as $h \to 0$. Hence $F$ has a left-hand derivative at $x$ equal to the right member of (6), the latter being clearly continuous [cf. Proposition (ii) of Sec. 6.1]. Similarly, $F$ has a right-hand derivative given by the same formula. Actually it is known for a continuous function that, if one of the four "derivates" exists and is continuous at a point, then the function is continuously differentiable there (see, e.g., Titchmarsh, The theory of functions, 2nd ed., Oxford Univ. Press, New York, 1939, p. 355).

The derivative $F'$ being continuous, we have (why?)
$$\forall x: F(x) = \int_{-\infty}^{x} F'(u)\, du.$$
Thus $F'$ is a probability density function. We may now state Theorem 6.2.3 in a more symmetric form familiar in the theory of Fourier integrals.

Corollary. If $f \in L^1$, then $p \in L^1$, where
$$p(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-ixt}f(t)\, dt,$$
and
$$f(t) = \int_{-\infty}^{\infty} e^{itx}p(x)\, dx.$$

The next two theorems yield information on the atoms of $\mu$ by means of $f$ and are given here as illustrations of the method of "harmonic analysis".

Theorem 6.2.4. For each $x_0$, we have

(7) $\displaystyle\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} e^{-itx_0}f(t)\, dt = \mu(\{x_0\}).$

PROOF. Proceeding as in the proof of Theorem 6.2.1, we obtain for the integral average on the left side of (7):

(8) $\displaystyle\int_{R^1\setminus\{x_0\}}\frac{\sin T(x - x_0)}{T(x - x_0)}\, \mu(dx) + \int_{\{x_0\}} 1\, \mu(dx).$

The integrand of the first integral above is bounded by 1 and tends to 0 as $T \to \infty$ everywhere in the domain of integration; hence the integral converges to 0 by bounded convergence. The second term is simply the right member of (7).

Theorem 6.2.5. We have

(9) $\displaystyle\lim_{T\to\infty}\frac{1}{2T}\int_{-T}^{T} |f(t)|^2\, dt = \sum_{x\in R^1}\mu(\{x\})^2.$

PROOF. Since the set of atoms is countable, all but a countable number of terms in the sum above vanish, making the sum meaningful with a value bounded by 1. Formula (9) can be established directly in the manner of (5) and (7), but the following proof is more illuminating. As noted in the proof of the corollary to Theorem 6.1.4, $|f|^2$ is the ch.f. of the r.v. $X - Y$ there, whose distribution is $\mu * \mu'$, where $\mu'(B) = \mu(-B)$ for each $B \in \mathcal{B}$. Applying Theorem 6.2.4 with $x_0 = 0$, we see that the left member of (9) is equal to
$$(\mu * \mu')(\{0\}).$$
By (7) of Sec. 6.1, the latter may be evaluated as
$$\int_{R^1}\mu(\{-y\})\, \mu'(dy) = \sum_{y\in R^1}\mu(\{-y\})\,\mu'(\{y\}) = \sum_{x\in R^1}\mu(\{x\})^2,$$
since the integrand above is zero unless $y$ is an atom of $\mu'$, which is the case if and only if $-y$ is an atom of $\mu$; and $\mu'(\{y\}) = \mu(\{-y\})$. This gives the right member of (9). The reader may prefer to carry out the argument above using the r.v.'s $X$ and $Y$ in the proof of the Corollary to Theorem 6.1.4.
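A numerical sketch of (9) under assumed choices (the Bernoullian d.f. $q\delta_0 + p\delta_1$ with $p = 0.3$, and a midpoint quadrature): the time average of $|f(t)|^2 = |q + pe^{it}|^2$ should tend to $p^2 + q^2 = 0.58$, the sum of the squared atom masses.

```python
import cmath

p, q = 0.3, 0.7
for T in (10.0, 100.0, 1000.0):
    steps = 100_000
    dt = 2 * T / steps
    avg = sum(abs(q + p * cmath.exp(1j * (-T + (k + 0.5) * dt))) ** 2
              for k in range(steps)) * dt / (2 * T)
    print(T, avg)      # approaches p**2 + q**2 = 0.58
```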

Corollary. $\mu$ is atomless ($F$ is continuous) if and only if the limit in the left member of (9) is zero.

This criterion is occasionally practicable.

DEFINITION. The r.v. $X$ is called symmetric iff $X$ and $-X$ have the same distribution.

For such an r.v., the distribution $\mu$ has the following property:
$$\forall B \in \mathcal{B}: \mu(B) = \mu(-B).$$
Such a p.m. may be called symmetric; an equivalent condition on its d.f. $F$ is as follows:
$$\forall x \in R^1: F(x) = 1 - F((-x)-)$$
(the awkwardness of using d.f. being obvious here).

Theorem 6.2.6. $X$ or $\mu$ is symmetric if and only if its ch.f. is real-valued (for all $t$).

PROOF. If $X$ and $-X$ have the same distribution, they must "determine" the same ch.f. Hence, by (iii) of Sec. 6.1, we have
$$f(t) = \overline{f(t)},$$
and so $f$ is real-valued. Conversely, if $f$ is real-valued, the same argument shows that $X$ and $-X$ must have the same ch.f. Hence, they have the same distribution by the uniqueness theorem (Theorem 6.2.2).

EXERCISES

$f$ is the ch.f. of $F$ below.

1. Show that
$$\int_0^{\infty}\left(\frac{\sin x}{x}\right)^2 dx = \frac{\pi}{2}.$$

*2. Show that for each $T > 0$:
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{(1 - \cos Tx)\cos tx}{x^2}\, dx = (T - |t|)\vee 0.$$
Deduce from this that for each $T > 0$, the function of $t$ given by
$$\left(1 - \frac{|t|}{T}\right)\vee 0$$
is a ch.f. Next, show that as a particular case of Theorem 6.2.3,
$$\frac{1 - \cos Tx}{x^2} = \frac12\int_{-T}^{T}(T - |t|)e^{itx}\, dt.$$
Finally, derive the following particularly useful relation (a case of Parseval's relation in Fourier analysis), for arbitrary $a$ and $T > 0$:
$$\int_{-\infty}^{\infty}\frac{1 - \cos T(x-a)}{[T(x-a)]^2}\, dF(x) = \frac{1}{2T^2}\int_{-T}^{T}(T - |t|)e^{-ita}f(t)\, dt.$$

*3. Prove that for each $\alpha > 0$:
$$\int_0^{\alpha}[F(x+u) - F(x-u)]\, du = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{1 - \cos \alpha t}{t^2}e^{-itx}f(t)\, dt.$$
As a sort of reciprocal, we have
$$\frac12\int_0^{\alpha} du\int_{-u}^{u} f(t)\, dt = \int_{-\infty}^{\infty}\frac{1 - \cos \alpha x}{x^2}\, dF(x).$$

4. If $f(t)/t \in L^1(-\infty, \infty)$, then for each $\alpha > 0$ such that $\pm\alpha$ are points of continuity of $F$, we have
$$F(\alpha) - F(-\alpha) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin \alpha t}{t}f(t)\, dt.$$

5. What is the special case of the inversion formula when $f \equiv 1$? Deduce also the following special cases, where $\alpha > 0$:
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin \alpha t\sin t}{t^2}\, dt = \alpha\wedge 1,$$
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin \alpha t(\sin t)^2}{t^3}\, dt = \alpha - \frac{\alpha^2}{4} \text{ for } \alpha \le 2;\ = 1 \text{ for } \alpha > 2.$$
6. For each $n \ge 0$, we have
$$\frac{1}{\pi}\int_{-\infty}^{\infty}\left(\frac{\sin t}{t}\right)^{n+2} dt = \int_0^2\left[\int_0^u \varphi_n(t)\, dt\right] du,$$
where $\varphi_1 = \frac12 1_{[-1,1]}$ and $\varphi_n = \varphi_{n-1} * \varphi_1$ for $n \ge 2$.

*7. If $F$ is absolutely continuous, then $\lim_{|t|\to\infty} f(t) = 0$. Hence, if the absolutely continuous part of $F$ does not vanish, then $\limsup_{t\to\infty} |f(t)| < 1$. If $F$ is purely discontinuous, then $\limsup_{t\to\infty} |f(t)| = 1$. [The first assertion is the Riemann–Lebesgue lemma; prove it first when $F$ has a density that is a simple function, then approximate. The third assertion is an easy part of the observation that such an $f$ is "almost periodic".]

8. Prove that for $0 < r < 2$ we have
$$\int_{-\infty}^{\infty} |x|^r\, dF(x) = C_r\int_{-\infty}^{\infty}\frac{1 - Rf(t)}{|t|^{r+1}}\, dt,$$
where
$$C_r = \left[\int_{-\infty}^{\infty}\frac{1 - \cos u}{|u|^{r+1}}\, du\right]^{-1} = \frac{\Gamma(r+1)}{\pi}\sin\frac{r\pi}{2},$$
thus $C_1 = 1/\pi$. [HINT:
$$|x|^r = C_r\int_{-\infty}^{\infty}\frac{1 - \cos xt}{|t|^{r+1}}\, dt.]$$

*9. Give a trivial example where the right member of (4) cannot be replaced by the Lebesgue integral
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} R\left[\frac{e^{-itx_1} - e^{-itx_2}}{it}f(t)\right] dt.$$
But it can always be replaced by the improper Riemann integral:
$$\lim_{\substack{T_1\to-\infty\\ T_2\to+\infty}}\frac{1}{2\pi}\int_{T_1}^{T_2} R\left[\frac{e^{-itx_1} - e^{-itx_2}}{it}f(t)\right] dt.$$
10. Prove the following form of the inversion formula (due to Gil-Pelaez):
$$\frac12\{F(x+) + F(x-)\} = \frac12 + \frac{1}{\pi}\lim_{\substack{T\uparrow\infty\\ \delta\downarrow 0}}\int_\delta^T \frac{e^{itx}f(-t) - e^{-itx}f(t)}{2it}\, dt.$$
[HINT: Use the method of proof of Theorem 6.2.1 rather than the result.]

11. Theorem 6.2.3 has an analogue in $L^2$. If the ch.f. $f$ of $F$ belongs to $L^2$, then $F$ is absolutely continuous. [HINT: By Plancherel's theorem, there exists $\varphi \in L^2$ such that
$$\int_0^x \varphi(u)\, du = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\frac{e^{-itx} - 1}{-it}f(t)\, dt.$$
Now use the inversion formula to show that
$$F(x) - F(0) = \frac{1}{\sqrt{2\pi}}\int_0^x \varphi(u)\, du.]$$

*12. Prove Theorem 6.2.2 by the Stone–Weierstrass theorem. [HINT: Cf. Theorem 6.6.2 below, but beware of the differences. Approximate uniformly $g_1$ and $g_2$ in the proof of Theorem 4.4.3 by a periodic function with "arbitrarily large" period.]

13. The uniqueness theorem holds as well for signed measures [or functions of bounded variations]. Precisely, if each $\mu_i$, $i = 1, 2$, is the difference of two finite measures such that
$$\forall t: \int e^{itx}\, \mu_1(dx) = \int e^{itx}\, \mu_2(dx),$$
then $\mu_1 \equiv \mu_2$.

14. There is a deeper supplement to the inversion formula (4) or Exercise 10 above, due to B. Rosén. Under the condition
$$\int_{-\infty}^{\infty}(1 + \log^+|x|)\, dF(x) < \infty,$$
the improper Riemann integral in Exercise 10 may be replaced by a Lebesgue integral. [HINT: It is a matter of proving the existence of the latter. Since
$$\int_{-\infty}^{\infty} dF(y)\int_0^N\left|\frac{\sin(x-y)t}{t}\right| dt \le \int_{-\infty}^{\infty} dF(y)\{1 + \log(1 + N|x-y|)\} < \infty,$$
we have
$$\int_{-\infty}^{\infty} dF(y)\int_0^N \frac{\sin(x-y)t}{t}\, dt = \int_0^N \frac{dt}{t}\int_{-\infty}^{\infty}\sin(x-y)t\, dF(y).$$
For fixed $x$, we have
$$\int_{y\ne x} dF(y)\left|\int_N^{\infty}\frac{\sin(x-y)t}{t}\, dt\right| \le C_1\int_{|x-y|\ge 1/N}\frac{dF(y)}{N|x-y|} + C_2\int_{0<|x-y|<1/N} dF(y),$$
both integrals on the right converging to 0 as $N \to \infty$.]

6.3 Convergence theorems


For purposes of probability theory the most fundamental property of the ch.f.
is given in the two propositions below, which will be referred to jointly as the
convergence theorem, due to P. Lévy and H. Cramér. Many applications will
be given in this and the following chapter. We begin with the easier half.

Theorem 6.3.1. Let f n , 1  n  1g be p.m.’s on R1 with ch.f.’s ffn , 1 


n  1g. If n converges vaguely to 1 , then fn converges to f1 uniformly
in every finite interval. We shall write this symbolically as
v u
1 n! 1 ) fn !f1 .
Furthermore, the family ffn g is equicontinuous on R1 .
PROOF. Since eitx is a bounded continuous function on R1 , although
complex-valued, Theorem 4.4.2 applies to its real and imaginary parts and
yields (1) at once, apart from the asserted uniformity. Now for every t and h,
we have, as in (ii) of Sec. 6.1:
 
jfn t C h  fn tj  jeihx  1j n dx  jhxj n dx
jxjA
 
C2 n dx  jhjA C 2 dx C 
jxj>A jxj>A

for any  > 0, suitable A and n ½ n0 A, . The equicontinuity of ffn g follows.
u
This and the pointwise convergence fn ! f1 imply fn !f1 by a simple
compactness argument (the “3 argument”) left to the reader.

Theorem 6.3.2. Let f n, 1  n < 1g be p.m.’s on R1 with ch.f.’s ffn , 1 


n < 1g. Suppose that

(a) fn converges everywhere in R1 and defines the limit function f1 ;


(b) f1 is continuous at t D 0.
170 CHARACTERISTIC FUNCTION

Then we have
v
(˛) n ! 1 , where 1 is a p.m.;
(ˇ) f1 is the ch.f. of 1.

PROOF. Let us first relax the conditions (a) and (b) to require only conver-
gence of fn in a neighborhood υ0 , υ0  of t D 0 and the continuity of the
limit function f (defined only in this neighborhood) at t D 0. We shall prove
that any vaguely convergent subsequence of f n g converges to a p.m. . For
this we use the following lemma, which illustrates a useful technique for
obtaining estimates on a p.m. from its ch.f.

Lemma. For each A > 0, we have


 1 
 A 
 
2 [2A, 2A] ½ A  ft dt  1.
 A1 

PROOF OF THE LEMMA. By (8) of Sec. 6.2, we have


T  1
1 sin Tx
3 ft dt D dx.
2T T 1 Tx

Since the integrand on the right side is bounded by 1 for all x (it is defined to be
1 at x D 0), and by jTxj1  2TA1 for jxj > 2A, the integral is bounded by
1
[2A, 2A] C f1  [2A, 2A]g
2TA
 
1 1
D 1 [2A, 2A] C .
2TA 2TA

Putting T D A1 in (3), we obtain


  1 
A A  1
  1
 ft dt  [2A, 2A] C ,
 2 A1  2 2

which reduces to (2). The lemma is proved.


Now for each υ, 0 < υ < υ0 , we have
  υ    υ  
1  1  1 υ
4   
fn t dt ½  
ft dt  jfn t  ftj dt.
 2υ 2υ υ 2υ υ

The first term on the right side tends to 1 as υ # 0, since f0 D 1 and f
is continuous at 0; for fixed υ the second term tends to 0 as n ! 1, by
bounded convergence since jfn  fj  2. It follows that for any given  > 0,
6.3 CONVERGENCE THEOREMS 171

there exist υ D υ < υ0 and n0 D n0  such that if n ½ n0 , then the left
member of (4) has a value not less than 1  . Hence by (2)
1 1
5 n [2υ , 2υ ] ½ 21    1 ½ 1  2.

Let f nk g be a vaguely convergent subsequence of f n g, which always


exists by Theorem 4.3.3; and let the vague limit be , which is always an
s.p.m. For each υ satisfying the conditions above, and such that neither 2υ1
nor 2υ1 is an atom of , we have by the property of vague convergence
and (5):
R1  ½ [2υ1 , 2υ1 ]
1 1
D lim n [2υ , 2υ ] ½ 1  2.
n!1

Since  is arbitrary, we conclude that is a p.m., as was to be shown.


Let f be the ch.f. of . Then, by the preceding theorem, fnk ! f every-
where; hence under the original hypothesis (a) we have f D f1 . Thus every
vague limit considered above has the same ch.f. and therefore by the unique-
ness theorem is the same p.m. Rename it 1 so that 1 is the p.m. having
v
the ch.f. f1 . Then by Theorem 4.3.4 we have n ! 1 . Both assertions ˛
and ˇ are proved.
As a particular case of the above theorem: if f n , 1  n  1g and
ffn , 1  n  1g are corresponding p.m.’s and ch.f.’s, then the converse of
(1) is also true, namely:
v u
6 n! 1 , fn !f1 .
This is an elegant statement, but it lacks the full strength of Theorem 6.3.2,
which lies in concluding that f1 is a ch.f. from more easily verifiable condi-
tions, rather than assuming it. We shall see presently how important this is.
Let us examine some cases of inapplicability of Theorems 6.3.1 and 6.3.2.

Example 1. Let n have mass 12 at 0 and mass 1


2
at n. Then n ! 1, where 1
has mass 12 at 0 and is not a p.m. We have

fn t D 1
2
C 12 eint ,
which does not converge as n ! 1, except when t is equal to a multiple of 2.

Example 2. Let n be the uniform distribution [n, n]. Then n ! 1, where 1


is identically zero. We have
 sin nt
, if t 6D 0;
fn t D nt
1, if t D 0;
172 CHARACTERISTIC FUNCTION

and

0, if t 6D 0;
fn t ! ft D
1, if t D 0.

Thus, condition (a) is satisfied but (b) is not.

Later we shall see that (a) cannot be relaxed to read: fn t converges in
jtj  T for some fixed T (Exercise 9 of Sec. 6.5).
The convergence theorem above settles the question of vague conver-
gence of p.m.’s to a p.m. What about just vague convergence without restric-
tion on the limit? Recalling Theorem 4.4.3, this suggests first that we replace
the integrand eitx in the ch.f. f by a function in C0 . Secondly, going over the
last part of the proof of Theorem 6.3.2, we see that the choice should be made
so as to determine uniquely an s.p.m. (see Sec. 4.3). Now the Fourier–Stieltjes
transform of an s.p.m. is well defined and the inversion formula remains valid,
so that there is unique correspondence just as in the case of a p.m. Thus a
natural choice of g is given by an “indefinite integral” of a ch.f., as follows:
 u # $ 
eiux  1
7 gu D eitx dx dt D dx.
0 R1 R1 ix
Let us call g the integrated characteristic function of the s.p.m. . We are
thus led to the following companion of (6), the details of the proof being left
as an exercise.

Theorem 6.3.3. A sequence of s.p.m.’s f n , 1  n < 1g converges (to 1 )


if and only if the corresponding sequence of integrated ch.f.’s fgn g converges
(to the integrated ch.f. of 1 ).

Another question concerning (6) arises naturally. We know that vague


convergence for p.m.’s is metric (Exercise 9 of Sec. 4.4); let the metric be
denoted by hÐ, Ði1 . Uniform convergence on compacts (viz., in finite intervals)
for uniformly bounded subsets of CB R1  is also metric, with the metric
denoted by hÐ, Ði2 , defined as follows:
jft  gtj
hf, gi2 D sup .
t2R1 1 C t2

It is easy to verify that this is a metric on CB and that convergence in this metric
is equivalent to uniform convergence on compacts; clearly the denominator
1 C t2 may be replaced by any function continuous on R1 , bounded below
by a strictly positive constant, and tending to C1 as jtj ! 1. Since there is
a one-to-one correspondence between ch.f.’s and p.m.’s, we may transfer the
6.3 CONVERGENCE THEOREMS 173

metric hÐ, Ði2 to the latter by setting


h , i2 D hf , f i2
in obvious notation. Now the relation (6) may be restated as follows.

Theorem 6.3.4. The topologies induced by the two metrics h i1 and h i2 on


the space of p.m.’s on R1 are equivalent.

This means that for each and given  > 0, there exists υ ,  such that:
h , i1  υ ,  ) h , i2  ,
h , i2  υ ,  ) h , i1  .
Theorem 6.3.4 needs no new proof, since it is merely a paraphrasing of (6) in
new words. However, it is important to notice the dependence of υ on (as
well as ) above. The sharper statement without this dependence, which would
mean the equivalence of the uniform structures induced by the two metrics,
is false with a vengeance; see Exercises 10 and 11 below (Exercises 3 and 4
are also relevant).

EXERCISES

1. Prove the uniform


 convergence of fn in Theorem 6.3.1 by an inte-
gration by parts of eitx dFn x.
 2. Instead of using the Lemma in the second part of Theorem 6.3.2,
prove that is a p.m. by integrating the inversion formula, as in Exercise 3
of Sec. 6.2. (Integration is a smoothing operation and a standard technique in
taming improper integrals: cf. the proof of the second part of Theorem 6.5.2
below.)
3. Let F be a given absolutely continuous d.f. and let Fn be a sequence
of step functions with equally spaced steps that converge to F uniformly in
R1 . Show that for the corresponding ch.f.’s we have
8n: sup jft  fn tj D 1.
t2R1

4. Let Fn , Gn be d.f.’s with ch.f.’s fn and gn . If fn  gn ! 0 a.e.,


then for each f 2 CK we have f dFn  f dGn ! 0 (see Exercise 10
of Sec. 4.4). This does not imply the Lévy distance hFn , Gn i1 ! 0; find
a counterexample. [HINT: Use Exercise 3 of Sec. 6.2 and proceed as in
Theorem 4.3.4.]
5. Let F be a discrete d.f. with points of jump faj , j ½ 1g and sizes of
jump fbj , j ½ 1g. Consider the approximating s.d.f.’s Fn with the same jumps
v
but restricted to j  n. Show that Fn !F.
174 CHARACTERISTIC FUNCTION

 6. If the sequence of ch.f.’s ff g converges uniformly in a neighborhood


n
of the origin, then ffn g is equicontinuous, and there exists a subsequence that
converges to a ch.f. [HINT: Use Ascoli-Arzela’s theorem.]
v v v
7. If Fn !F and Gn !G, then Fn Ł Gn !F Ł G. [A proof of this simple
result without the use of ch.f.’s would be tedious.]
 8. Interpret the remarkable trigonometric identity

1
sin t t
D cos n
t nD1
2

in terms of ch.f.’s, and hence by addition of independent r.v.’s. (This is an


example of Exercise 4 of Sec. 6.1.)
9. Rewrite the preceding formula as
1  1 
sin t  t  t
D cos 2k1 cos 2k .
t kD1
2 kD1
2

Prove that either factor on the right is the ch.f. of a singular distribution. Thus
the convolution of two such may be absolutely continuous. [HINT: Use the
same r.v.’s as for the Cantor distribution in Exercise 9 of Sec. 5.3.]
10. Using the strong law of large numbers, prove that the convolution of
two Cantor d.f.’s is still singular. [HINT: Inspect the frequency of the digits in
the sum of the corresponding random series; see Exercise 9 of Sec. 5.3.]
 11. Let F , G be the d.f.’s of
n n n , n , and fn , gn their ch.f.’s. Even if
supx2R1 jFn x  Gn xj ! 0, it does not follow that hfn , gn i2 ! 0; indeed
it may happen that hfn , gn i2 D 1 for every n. [HINT: Take two step functions
“out of phase”.]
12. In the notation of Exercise 11, even if supt2R1 jfn t  gn tj ! 0,
it does not follow that hFn , Gn i ! 0; indeed it may ! 1. [HINT: Let f be any
ch.f. vanishing outside (1, 1), fj t D einj t fmj t, gj t D einj t fmj t, and
Fj , Gj be the corresponding d.f.’s. Note that if mj n1 j ! 0, then Fj x ! 1,
Gj x ! 0 for every x, and that fj  gj vanishes outside mj1 , mj1  and
2 
is 0sin nj t near t D 0. If mj D 2j and nj D jmj then j fj  gj  is
1 1
uniformly bounded in t: for nkC1 < t  nk consider j > k, j D k, j < k sepa-
rately. Let
n 
n
fŁn D n1 fj , gŁn D n1 gj ,
jD1 jD1

then sup jfŁn  gŁn j D On1  while FŁn  GŁn ! 0. This example is due to
Katznelson, rivaling an older one due to Dyson, which is as follows. For
6.4 SIMPLE APPLICATIONS 175

b > a > 0, let  


x 2 C b2
log
1 x 2 C a2
Fx D  
2 b
a log
a

for x < 0 and D 1 for x > 0; Gx D 1  Fx. Then,

t [eajtj  ebjtj ]
ft  gt D i   .
jtj b
log
a

If a is large, then hF, Gi is near 1. If b/a is large, then hf, gi is near 0.]

6.4 Simple applications

A common type of application of Theorem 6.3.2 depends on power-series


expansions of the ch.f., for which we need its derivatives.

Theorem 6.4.1. If the d.f. has a finite absolute moment of positive integral
order k, then its ch.f. has a bounded continuous derivative of order k given by
 1
1 f t D
k
ixk eitx dFx.
1

Conversely, if f has a finite derivative of even order k at t D 0, then F has


a finite moment of order k.
PROOF. For k D 1, the first assertion follows from the formula:
 1 itChx
ft C h  ft e  eitx
D dFx.
h 1 h
An elementary inequality already used in the proof of Theorem
 6.2.1 shows
that the integrand above is dominated by jxj. Hence if jxj dFx < 1, we
may let h ! 0 under the integral sign and obtain (1). Uniform continuity of
the integral as a function of t is proved as in (ii) of Sec. 6.1. The case of a
general k follows easily by induction.
To prove the second assertion, let k D 2 and suppose that f00 0 exists
and is finite. We have
fh  2f0 C fh
f00 0 D lim
h!0 h2
176 CHARACTERISTIC FUNCTION


eihx  2 C eihx
D lim dFx
h!0 h2

1  cos hx
2 D 2 lim dFx.
h!0 h2
As h ! 0, we have by Fatou’s lemma,
  
1  cos hx 1  cos hx
x dFx D 2 lim
2
dFx  lim 2 dFx
h!0 h2 h!0 h2
D f00 0.
Thus F has a finite second moment, and the validity of (1) for k D 2 now
follows from the first assertion of the theorem.
The general case can again be reduced to this by induction, as follows.
Suppose the second assertion of the theorem is true for 2k  2, and that
f2k 0 is finite. Then f2k2 t exists and is continuous in the neighborhood
of t D 0, and by the induction hypothesis we have in particular

1 k1
x 2k2 dFx D f2k2 0.
x
Put Gx D 1 y 2k2 dFy for every x, then GÐ/G1 is a d.f. with the
ch.f. 
1 1k1 f2k2 t
t D eitx x 2k2 dFx D .
G1 G1
00
Hence exists, and by the case k D 2 proved above, we have
 
1 1
 2 0 D x 2 dGx D x 2k dFx
G1 G1
Upon cancelling G1, we obtain

1k f2k 0 D x 2k dFx,

which proves the finiteness of the 2kth moment. The argument above fails
if G1 D 0, but then we have (why?) F D υ0 , f D 1, and the theorem is
trivial.
Although the next theorem is an immediate corollary to the preceding
one, it is so important as to deserve prominent mention.

Theorem 6.4.2. If F has a finite absolute moment of order k, k an integer


½1, then f has the following expansion in the neighborhood of t D 0:
k
ij j j
3 ft D m t C ojtjk ,
jD0
j!
6.4 SIMPLE APPLICATIONS 177


k1 j
i j j k
0
3  ft D m t C k
jtjk ;
jD0
j! k!

where mj is the moment of order j, k


is the absolute moment of order k,
and jk j  1.
PROOF. According to a theorem in calculus (see, e.g., Hardy [1], p. 290]),
if f has a finite kth derivative at the point t D 0, then the Taylor expansion
below is valid:
k
fj 0 j
4 ft D t C ojtjk .
jD0
j!
If f has a finite kth derivative in the neighborhood of t D 0, then

k1
fj 0 fk t k
40  ft D tj C t , jj  1.
jD0
j! k!
Since the absolute moment of order j is finite for 1  j  k, and
fj 0 D ij mj , jfk tj  k

from (1), we obtain (3) from (4), and 30  from 40 .
It should be remarked that the form of Taylor expansion given in (4)
is not always given in textbooks as cited above, but rather under stronger
assumptions, such as “f has a finite kth derivative in the neighborhood of
0”. [For even k this stronger condition is actually implied by the weaker one
stated in the proof above, owing to Theorem 6.4.1.] The reader is advised
to learn the sharper result in calculus, which incidentally also yields a quick
proof of the first equation in (2). Observe that (3) implies 30  if the last term
in 30  is replaced by the more ambiguous Ojtjk , but not as it stands, since
the constant in “O” may depend on the function f and not just on k .
By way of illustrating the power of the method of ch.f.’s without the
encumbrance of technicalities, although anticipating more elaborate develop-
ments in the next chapter, we shall apply at once the results above to prove two
classical limit theorems: the weak law of large numbers (cf. Theorem 5.2.2),
and the central limit theorem in the identically distributed and finite variance
case. We begin with an elementary lemma from calculus, stated here for the
sake of clarity.

Lemma. If the complex numbers cn have the limit c, then


cn n
5 lim 1 C D ec .
n!1 n
(For real cn ’s this remains valid for c D C1.)
178 CHARACTERISTIC FUNCTION

Now let fXn ,


n ½ 1g be a sequence of independent r.v.’s with the common
d.f. F, and Sn D njD1 Xj , as in Chapter 5.

Theorem 6.4.3. If F has a finite mean m, then


Sn
! m in pr.
n
PROOF. Since convergence to the constant m is equivalent to that in dist.
to υm (Exercise 4 of Sec. 4.4), it is sufficient by Theorem 6.3.2 to prove that
the ch.f. of Sn /n converges to eimt (which is continuous). Now, by (2) of
Sec. 6.1 we have
#  $n
t
Ee itSn /n
 D Ee it/nSn
D f .
n
By Theorem 6.4.2, the last term above may be written as
  n
t t
1 C im C o
n n
for fixed t and n ! 1. It follows from (5) that this converges to eimt as
desired.
2
Theorem 6.4.4. If F has mean m and finite variance > 0, then
Sn  mn
p !8 in dist.
n
where 8 is the normal distribution with mean 0 and variance 1.
PROOF. We may suppose m D 0 by considering the r.v.’s Xj  m, whose
second moment is 2 . As in the preceding proof, we have
    n
Sn t
E exp it p Df p
n n
  2   n
i2 2 t jtj 2
D 1C p Co p
2 n n
   n
t2 t2
! et /2 .
2
D 1 Co
2n n
The limit being the ch.f. of 8, the proof is ended.
The convergence theorem for ch.f.’s may be used to complete the method
of moments in Theorem 4.5.5, yielding a result not very far from what ensues
from that theorem coupled with Carleman’s condition mentioned there.
6.4 SIMPLE APPLICATIONS 179

Theorem 6.4.5. In the notation of Theorem 4.5.5, if (8) there holds together
with the following condition:
mk tk
6 8t 2 R1 : lim D 0,
k!1 k!
v
then Fn !F.
PROOF. Let fn be the ch.f. of Fn . For fixed t and an odd k we have by
the Taylor expansion for eitx with a remainder term:
⎧ ⎫
  ⎨ k kC1 ⎬
itxj
jitxj
fn t D eitx dFn x D C dFn x
⎩ j! k C 1! ⎭
jD0


k
itj mnkC1 tkC1
D mnj C  ,
jD0
j! k C 1!

where  denotes a “generic” complex number of modulus 1, not necessarily


the same in different appearances. (The use of such a symbol, to be further
indulged in the next chapter, avoids the necessity of transposing terms and
taking moduli of long expressions. It is extremely convenient, but one must
occasionally watch the dependence of  on various quantities involved.) It
follows that

k
itj tkC1
7 fn t  ft D mnj  mj  C mkC1 C mkC1 .
jD0
j! k C 1! n

Given  > 0, by condition (6) there exists an odd k D k such that for the
fixed t we have
2mkC1 C 1tkC1 
8  .
k C 1! 2
Since we have fixed k, there exists n0 D n0  such that if n ½ n0 , then
mnkC1  mkC1 C 1,
and moreover,

max jmnj  mj j  ejtj .
1jk 2
Then the right side of (7) will not exceed in modulus:

k
jtjj  tkC1 2mkC1 C 1
ejtj C  .
jD0
j! 2 k C 1!
180 CHARACTERISTIC FUNCTION

Hence fn t ! ft for each t, and since f is a ch.f., the hypotheses of
v
Theorem 6.3.2 are satisfied and so Fn !F.
As another kind of application of the convergence theorem in which a
limiting process is implicit rather than explicit, let us prove the following
characterization of the normal distribution.

Theorem 6.4.6. Let X and Y be independent, identically distributed r.v.’s


with mean 0 and variance 1. If X C Y and X  Y are independent then the
common distribution of X and Y is 8.
Let f be the ch.f., then by (1), f0 0 D 0, f00 0 D 1. The ch.f.
PROOF.
of X C Y is ft2 and that of X  Y is ftft. Since these two r.v.’s are
independent, the ch.f. f2t of their sum 2X must satisfy the following relation:
9 f2t D ft3 ft.
It follows from (9) that the function f never vanishes. For if it did at t0 , then
it would also at t0 /2, and so by induction at t0 /2n for every n ½ 1. This is
impossible, since limn!1 ft0 /2n  D f0 D 1. Setting for every t:
ft
t D ,
ft
we obtain
10 2t D t2 .
Hence we have by iteration, for each t:
 2n   2n
t t
t D  n
D 1Co !1
2 2n
by Theorem 6.4.2 and the previous lemma. Thus t  1, ft  ft, and
(9) becomes
11 f2t D ft4 .
Repeating the argument above with (11), we have
 4n    &  ' 4n
t 1 t 2 t 2
! et
2
ft D f D 1  C o /2
.
2n 2 2n 2n
This proves the theorem without the use of “logarithms” (see Sec. 7.6).

EXERCISES

 1. If f is the ch.f. of X, and



ft  1  2
lim D > 1,
t#0 t2 2
6.4 SIMPLE APPLICATIONS 181

then E X D 0 and E X2  D 2 . In particular, if ft D 1 C ot2  as t ! 0,


then f  1.
 2. Let fX g be independent, identically distributed with mean 0 and vari-
n
ance 2 , 0  2  1. Prove that
   C +
jSn j S 2
lim E p D 2 lim E pn D
n!1 n n!1 n 

[If we assume only p P fX1 6D 0g > 0, E jX1 j < 1 and E X1  D 0, then we
have E jSn j ½ C n for some constant C and all n; this p is known as
Hornich’s inequality.] [HINT: In case 2 D 1, if limn E j Sn / n < 1, then
p
there exists fnk g such that Sn / nk converges in distribution; use an extension
p 2n
of Exercise 1 to show jft/ nj ! 0. This is due to P. Matthews.]

3. Let P fX D kg D pk , 1  k   < 1, kD1 pk D 1. The sum Sn of
n independent r.v.’s having the same distribution as X is said to have a
multinomial distribution. Define it explicitly. Prove that [Sn  E Sn ]/ Sn 
converges to 8 in distribution as n ! 1, provided that X > 0.
 4. Let X have the binomial distribution with parameter n, p , and
n n
suppose that npn !  ½ 0. Prove that Xn converges in dist. to the Poisson d.f.
with parameter . (In the old days this was called the law of small numbers.)
5. Let X have the Poisson distribution with parameter . Prove that
[X  ]/1/2 converges in dist. to 8 as  ! 1.
 6. Prove that in Theorem 6.4.4, S / pn does not converge in proba-
n p
p
bility. [HINT: Consider Sn / n and S2n / 2n.]
7. Let f be the ch.f. of the d.f. F. Suppose that as t ! 0,

ft  1 D Ojtj˛ ,

where 0 < ˛  2, then as A ! 1,



dFx D OA˛ .
jxj>A


[HINT: Integrate  cos tx dFx  Ct˛ over t in 0, A.]
jxj>A 1

8. If 0 < ˛ < 1 and jxj˛ dFx < 1, then ft  1 D ojtj˛  as t ! 0.
 1  ˛ < 2 the same result is true under the additional assumption that
For
x dFx D 0. [HINT: The case 1  ˛ < 2 is harder. Consider the real and
imaginary parts of ft  1 separately and write the latter as
 
sin tx dFx C sin tx dFx.
jxj/t jxj>/jtj
182 CHARACTERISTIC FUNCTION


The second is bounded by jtj/˛ jxj>/jtj jxj˛ dFx D ojtj˛  for fixed . In
the first integral use sin tx D tx C Ojtxj3 ,
 
tx dFx D t x dFx,
jxj/jtj jxj>/jtj
  1
jtxj dFx  
3 3˛
jtxj˛ dFx.]
jxj/jtj 1

9. Suppose that ecjtj , where c ½ 0, 0 < ˛  2, is a ch.f. (Theorem 6.5.4


˛

below). Let fXj , j ½ 1g be independent and identically distributed r.v.’s with


a common ch.f. of the form
1  ˇjtj˛ C ojtj˛ 

as t ! 0. Determine the constants b and  so that the ch.f. of Sn /bn converges


to ejtj .
˛

10. Suppose F satisfies the condition that for every  > 0 such that as
A ! 1, 
dFx D OeA .
jxj>A

Then all moments of F are finite, and condition (6) in Theorem 6.4.5 is satisfied.
11. Let X and Y be independent
p with the common d.f. F of mean 0 and
variance 1. Suppose that X C Y/ 2 also has the d.f. F. Then F  8. [HINT:
Imitate Theorem 6.4.5.]
 12. Let fX , j ½ 1g be independent, identically distributed r.v.’s with
j
mean 0 and variance 1. Prove that both
n
p 
n
Xj n Xj
jD1 jD1
/ and
0n 
n
0 X2j
1 X2j
jD1 jD1

converge in dist. to 8. [Hint: Use the law of large numbers.]


13. The converse part of Theorem 6.4.1 is false for an odd k. Example.
F is a discrete symmetric d.f. with mass C/n2 log n for integers n ½ 3, where
O is the appropriate constant, and k D 1. [HINT: It is well known that the series
 sin nt

n
n log n

converges uniformly in t.]


6.4 SIMPLE APPLICATIONS 183

We end this section with a discussion of a special class of distributions


and their ch.f.’s that is of both theoretical and practical importance.
A distribution or p.m. on R1 is said to be of the lattice type iff its
support is in an arithmetical progression — that is, it is completely atomic
(i.e., discrete) with all the atoms located at points of the form fa C jdg, where
a is real, d > 0, and j ranges over a certain nonempty set of integers. The
corresponding lattice d.f. is of form:
1

Fx D pj υaCjd x,
jD1
1
where pj ½ 0 and jD1 pj D 1. Its ch.f. is
1

12 ft D eait pj ejdit ,
jD1

which is an absolutely convergent Fourier series. Note that the degenerate d.f.
υa with ch.f. eait is a particular case. We have the following characterization.

Theorem 6.4.7. A ch.f. is that of a lattice distribution if and only if there


exists a t0 6D 0 such that jft0 j D 1.
PROOF. The “only if ” part is trivial, since it is obvious from (12) that jfj
is periodic of period 2/d. To prove the “if ” part, let ft0  D ei0 , where 0
is real; then we have

1 D ei0 ft0  D eit0 x0  dx

and consequently, taking real parts and transposing:



13 0 D [1  cost0 x  0 ] dx.

The integrand is positive everywhere and vanishes if and only if for some
integer j,  
0 2
xD Cj .
t0 t0
It follows that the support of must be contained in the set of x of this form
in order that equation (13) may hold, for the integral of a strictly positive
function over a set of strictly positive measure is strictly positive. The theorem
is therefore proved, with a D 0 /t0 and d D 2/t0 in the definition of a lattice
distribution.
184 CHARACTERISTIC FUNCTION

It should be observed that neither “a” nor “d” is uniquely determined


above; for if we take, e.g., a0 D a C d0 and d0 a divisor of d, then the support
of is also contained in the arithmetical progression fa0 C jd0 g. However,
unless is degenerate, there is a unique maximum d, which is called the
“span” of the lattice distribution. The following corollary is easy.

Corollary. Unless jfj  1, there is a smallest t0 > 0 such that ft0  D 1.


The span is then 2/t0 .

Of particular interest is the “integer lattice” when a D 0, d D 1; that is,


when the support of is a set of integers at least two of which differ by 1. We
have seen many examples of these. The ch.f. f of such an r.v. X has period
2, and the following simple inversion formula holds, for each integer j:
 
1
14 P X D j D pj D ftejit dt,
2 
where the range of integration may be replaced by any interval of length
2. This, of course, is nothing but the well-known formula for the “Fourier
coefficients” of f. If fXk g is a sequence
 of independent, identically distributed
r.v.’s with the ch.f. f, then Sn D n Xk has the ch.f. fn , and the inversion
formula above yields:
 
1
15 P Sn D j D [ft]n ejit dt.
2 
This may be used to advantage to obtain estimates for Sn (see Exercises 24
to 26 below).

EXERCISES

f or fn is a ch.f. below.
14. If jftj D 1, jft0 j D 1 and t/t0 is an irrational number, then f is
degenerate. If for a sequence ftk g of nonvanishing constants tending to 0 we
have jftk j D 1, then f is degenerate.
 15. If jf tj ! 1 for every t as n ! 1, and F is the d.f. corre-
n n
v
sponding to fn , then there exist constants an such that Fn x C an !υ0 . [HINT:
Symmetrize and take an to be a median of Fn .]
 16. Suppose b > 0 and jfb tj converges everywhere to a ch.f. that
n n
is not identically 1, then bn converges to a finite and strictly positive limit.
[HINT: Show that it is impossible that a subsequence of bn converges to 0 or
to C1, or that two subsequences converge to different finite limits.]
 17. Suppose c is real and that ecn it converges to a limit for every t
n
in a set of strictly positive Lebesgue measure. Then cn converges to a finite
6.4 SIMPLE APPLICATIONS 185

limit. [HINT: Proceed as in Exercise 16, and integrate over t. Beware of any
argument using “logarithms”, as given in some textbooks, but see Exercise 12
of Sec. 7.6 later.]
 18. Let f and g be two nondegenerate ch.f.’s. Suppose that there exist
real constants an and bn > 0 such that for every t:
 
t
fn t ! ft and e itan /bn
fn ! gt.
bn

Then an ! a, bn ! b, where a is finite, 0 < b < 1, and gt D eita/b ft/b.


[HINT: Use Exercises 16 and 17.]
 19. Reformulate Exercise 18 in terms of d.f.’s and deduce the following
consequence. Let Fn be a sequence of d.f.’s an , an0 real constants, bn > 0,
bn0 > 0. If
v v
Fn bn x C an  ! Fx and Fn bn0 x C an0  ! Fx,

where F is a nondegenerate d.f., then


bn an  an0
! 1 and ! 0.
bn0 bn

[Two d.f.’s F and G such that Gx D Fbx C a for every x, where b > 0
and a is real, are said to be of the same “type”. The two preceding exercises
deal with the convergence of types.]
20. Show by using (14) that j cos tj is not a ch.f. Thus the modulus of a
ch.f. need not be a ch.f., although the squared modulus always is.
21. The span of an integer lattice distribution is the greatest common
divisor of the set of all differences between points of jump.
22. Let fs, t be the ch.f. of a 2-dimensional p.m. . If jfs0 , t0 j D 1
for some s0 , t0  6D 0, 0, what can one say about the support of ?
 23. If fX g is a sequence of independent and identically distributed r.v.’s,
n 
then there does not exist a sequence of constants fcn g such that n Xn  cn 
converges a.e., unless the common d.f. is degenerate.

In Exercises 24 to 26, let Sn D njD1 Xj , where the X0j s are independent r.v.’s
with a common d.f. F of the integer lattice type with span 1, and taking both
>0 and <0 values.
 24. If  x dFx D 0,  x 2 dFx D 2 , then for each integer j:

1
n1/2 P fSn D jg ! p .
2
[HINT: Proceed as in Theorem 6.4.4, but use (15).]
186 CHARACTERISTIC FUNCTION

25. If F 6D υ0 , then there exists a constant A such that for every j:


P fSn D jg  An1/2 .

[HINT: Use a special case of Exercise 27 below.]



26. If F is symmetric and jxj dFx < 1, then
nP fSn D jg ! 1.

[HINT: 1  ft D ojtj as t ! 0.]


27. If f is any nondegenerate ch. f, then there exist constants A > 0
and υ > 0 such that
jftj  1  At2 for jtj  υ.

[HINT: Reduce to the case where the d.f. has zero mean and finite variance by
translating and truncating.]

28. Let Qn be the concentration function of Sn D njD1 Xj , where the
Xj ’s are independent r.v.’s having a common nondegenerate d.f. F. Then for
every h > 0,
Qn h  An1/2

[HINT: Use Exercise 27 above and Exercise 16 of Sec. 6.1. This result is due
to Lévy and Doeblin, but the proof is due to Rosén.]
In Exercises 29 to 35, or k is a p.m. on U D 0, 1].
29. Define for each n:

f n D e2inx dx.
U

Prove by Weierstrass’s approximation theorem (by trigonometrical polyno-


mials) that if f 1 n D f 2 n for every n ½ 1, then 1  2 . The conclusion
becomes false if U is replaced by [0, 1].
30. Establish the inversion formula expressing in terms of the f n’s.
Deduce again the uniqueness result in Exercise 29. [HINT: Find the Fourier
series of the indicator function of an interval contained in U .]
31. Prove that jf nj D 1 if and only if has its support in the set
f0 C jn1 , 0  j  n  1g for some 0 in (0, n1 ].
 32. is equidistributed on the set fjn1 , 0  j  n  1g if and only if
f j D 0 or 1 according to j  n or j j n.
 33. v
k! if and only if f Ð ! f Ð everywhere.
k

34. Suppose that the space U is replaced by its closure [0, 1] and the
two points 0 and 1 are identified; in other words, suppose U is regarded as the
6.5 REPRESENTATIVE THEOREMS 187

circumference  U of a circle; then we can improve the result in Exercise 33


as follows. If there exists a function g on the integers such that f k Ð ! gÐ
everywhere, then there exists a p.m. on 
v
U such that g D f and k !
on  U .
35. Random variables defined on  U are also referred to as defined
“modulo 1”. The theory of addition of independent r.v.’s on  U is somewhat
simpler than on R1 , as exemplified by the following theorem. Let fXj , j ½ 1g

be independent and identically distributed r.v.’s on U and let Sk D kjD1 Xj .
Then there are only two possibilities for the asymptotic distributions of Sk .
Either there exist a constant c and an integer n ½ 1 such that Sk  kc converges
in dist. to the equidistribution on fjn1 , 0  j  n  1g; or Sk converges in
dist. to the uniform distribution on  U . [HINT: Consider the possible limits of
f nk as k ! 1, for each n.]

6.5 Representation theorems


A ch.f. is defined to be the Fourier–Stieltjes transform of a p.m. Can it be char-
acterized by some other properties? This question arises because frequently
such a function presents itself in a natural fashion while the underlying measure
is hidden and has to be recognized. Several answers are known, but the
following characterization, due to Bochner and Herglotz, is most useful. It
plays a basic role in harmonic analysis and in the theory of second-order
stationary processes.
A complex-valued function f defined on R1 is called positive definite iff
for any finite set of real numbers tj and complex numbers zj (with conjugate
complex zj ), 1  j  n, we have

n 
n
1 ftj  tk zj zk ½ 0.
jD1 kD1

Let us deduce at once some elementary properties of such a function.

Theorem 6.5.1. If f is positive definite, then for each t 2 R1 :


ft D ft, jftj  f0.
If f is continuous at t D 0, then it is uniformly continuous in R1 . In
this case, we have for every continuous complex-valued function  on R1 and
every T > 0:
 T T
2 fs  tst ds dt ½ 0.
0 0
188 CHARACTERISTIC FUNCTION

PROOF. Taking n D 1, t1 D 0, z1 D 1 in (1), we see that


f0 ½ 0.
Taking n D 2, t1 D 0, t2 D t, z1 D z2 D 1, we have
2f0 C ft C ft ½ 0;
changing z2 to i, we have
f0 C fti  fti C f0 ½ 0.
Hence ft C ft is real and ft  ft is pure imaginary, which imply
that ft D ft. Changing z1 to ft and z2 to jftj, we obtain
2f0jftj2  2jftj3 ½ 0.
Hence f0 ½ jftj, whether jftj D 0 or ½ 0. Now, if f0 D 0, then
fÐ  0; otherwise we may suppose that f0 D 1. If we then take n D 3,
t1 D 0, t2 D t, t3 D t C h, a well-known result in positive definite quadratic
forms implies that the determinant below must have a positive value:
 
 f0 ft ft  h 
 
 ft f0 fh 
 
ft C h fh f0
D 1  jftj2  jft C hj2  jfhj2 C 2Rfftfhft C hg ½ 0.
It follows that
jft  ft C hj2 D jftj2 C jft C hj2  2Rfftft C hg
 1  jfhj2 C 2Rfftft C h[fh  1]g
 1  jfhj2 C 2j1  fhj  4j1  fhj.
Thus the “modulus of continuity” at each t is bounded by twice the square
root of that at 0; and so continuity at 0 implies uniform continuity every-
where. Finally, the integrand of the double integral in (2) being continuous,
the integral is the limit of Riemann sums, hence it is positive because these
sums are by (1).

Theorem 6.5.2. f is a ch.f. if and only if it is positive definite and continuous


at 0 with f0 D 1.

Remark. It follows trivially that f is the Fourier–Stieltjes transform



eitx dx
R1
6.5 REPRESENTATIVE THEOREMS 189

of a finite measure if and only if it is positive definite and finite continuous


at 0; then R1  D f0.
PROOF. If f is the ch.f. of the p.m. , then we need only verify that it is
positive definite. This is immediate, since the left member of (1) is then
 2
  n    
n
 n

x
itj x
e zj e zk dx D 
itk  itj 
x
e zj  dx ½ 0.
jD1 kD1  jD1 

Conversely, if f is positive definite and continuous at 0, then by


Theorem 6.5.1, (2) holds for t D eitx . Thus
 T T
1
3 fs  teistx ds dt ½ 0.
2T 0 0
Denote the left member by pT x; by a change of variable and partial integra-
tion, we have
 T 
1 jtj
4 pT x D 1 fteitx dt.
2 T T
Now observe that for ˛ > 0,
  ˇ 
1 ˛ 1 ˛ 2 sin ˇt 21  cos ˛t
dˇ eitx dx D dˇ D
˛ 0 ˇ ˛ 0 t ˛t2

(where at t D 0 the limit value is meant, similarly later); it follows that


  ˇ   
1 ˛ 1 T jtj 1  cos ˛t
dˇ pT x dx D 1 ft dt
˛ 0 ˇ  T T ˛t2

1 1 1  cos ˛t
D fT t dt
 1 ˛t2
  
1 1 t 1  cos t
D fT dt,
 1 ˛ t2
where
⎧ 
⎨ 1  jtj ft, if jtj  T;
5 fT t D T

0, if jtj > T.
Note that this is the product of f by a ch.f. (see Exercise 2 of Sec. 6.2)
and corresponds to the smoothing of the would-be density which does not
necessarily exist.
190 CHARACTERISTIC FUNCTION

Since jfT tj  jftj  1 by Theorem 6.5.1, and 1  cos t/t2 belongs to
L 1 1, 1, we have by dominated convergence:
  ˇ   
1 ˛ 1 1 t 1  cos t
6 lim dˇ pT x dx D lim fT dt
˛!1 ˛ 0 ˇ  1 ˛!1 ˛ t2

1 1 1  cos t
D dt D 1.
 1 t2

Here the second equation follows from the continuity of fT at 0 with


fT 0 D 1, and the third equation from formula (3) of Sec. 6.2. Since pT ½ 0,

the integral ˇ pT x dx is increasing in ˇ, hence the existence of the limit of
its “integral average” in (6) implies that of the plain limit as ˇ ! 1, namely:
 1  ˇ
7 pT x dx D lim pT x dx D 1.
1 ˇ!1 ˇ

Therefore pT is a probability density function. Returning to (3) and observing


that for real :  ˇ
2 sin ˇ  t
eix eitx dx D ,
ˇ t

we obtain, similarly to (4):


   1
1 ˛ ˇ
1 1  cos ˛  t
dˇ e pT x dx D
ix
fT t dt
˛ 0 ˇ  1 ˛  t2
  
1 1 t 1  cos t
D fT   dt.
 1 ˛ t2

Note that the last two integrals are in reality over finite intervals. Letting
˛ ! 1, we obtain by bounded convergence as before:
 1
8 eix pT x dx D fT ,
1

the integral on the left existing by (7). Since equation (8) is valid for each ,
we have proved that fT is the ch.f. of the density function pT . Finally, since
fT  ! f as T ! 1 for every , and f is by hypothesis continuous at
 D 0, Theorem 6.3.2 yields the desired conclusion that f is a ch.f.
As a typical application of the preceding theorem, consider a family of
(real-valued) r.v.’s fXt , t 2 RC g, where RC D [0, 1, satisfying the following
conditions, for every s and t in RC :
6.5 REPRESENTATIVE THEOREMS 191

(i) E X2t  D 1;
(ii) there exists a function rÐ on R1 such that E Xs Xt  D rs  t;
(iii) limt#0 E X0  Xt 2  D 0.

A family satisfying (i) and (ii) is called a second-order stationary process or


stationary process in the wide sense; and condition (iii) means that the process
is continuous in L 2 , F , P .
For every finite set of tj and zj as in the definitions of positive definite-
ness, we have
⎧ 2 ⎫
⎨
⎪  ⎪
n
 ⎬  n n n  n
0E   
Xtj zj  D E Xtj Xtk zj zk D rtj  tk zj zk .

⎩ jD1  ⎪⎭ jD1 kD1 jD1 kD1

Thus r is a positive definite function. Next, we have


r0  rt D E X0 X0  Xt ,
hence by the Cauchy–Schwarz inequality,

jrt  r0j2  E X20 E X0  Xt 2 .


It follows that r is continuous at 0, with r0 D 1. Hence, by Theorem 6.5.2,
r is the ch.f. of a uniquely determined p.m. R:

rt D eitx Rdx.
R1

This R is called the spectral distribution of the process and is essential in its
further analysis.
Theorem 6.5.2 is not practical in recognizing special ch.f.’s or in
constructing them. In contrast, there is a sufficient condition due to Pólya
that is very easy to apply.

Theorem 6.5.3. Let f on R1 satisfy the following conditions, for each t:


9 f0 D 1, ft ½ 0, ft D ft,
f is decreasing and continuous convex in RC D [0, 1. Then f is a ch.f.
PROOF. Without loss of generality we may suppose that
f1 D lim ft D 0;
t!1

otherwise we consider [ft  f1]/[f0  f1] unless f1 D 1, in


which case f  1. It is well known that a convex function f has right-hand
192 CHARACTERISTIC FUNCTION

and left-hand derivatives everywhere that are equal except on a countable set,
that f is the integral of either one of them, say the right-hand one, which will
be denoted simply by f0 , and that f0 is increasing. Under the conditions of
the theorem it is also clear that f is decreasing and f0 is negative in RC .
Now consider the fT as defined in (5) above, and observe that
  t

1
 1 f0 t C ft, if 0 < t < T;
f0T t D T T
0, if t ½ T.
Thus f0T is positive and decreasing in RC . We have for each x 6D 0:
 1  1 
2 1
eitx fT t dt D 2 cos txfT t dt D sin txf0T t dt
1 0 x 0

1  kC1/x
2
D sin txf0T t dt.
x kD0 k/x

The terms of the series alternate in sign, beginning with a positive one, and
decrease in magnitude, hence the sum ½ 0. [This is the argument indicated
for formula (1) of Sec. 6.2.] For x D 0, it is trivial that
 1
fT t dt ½ 0.
1

We have therefore proved that the pT defined in (4) is positive everywhere,


and the proof there shows that f is a ch.f. (cf. Exercise 1 below).
Next we will establish an interesting class of ch.f.’s which are natural
extensions of the ch.f.’s corresponding to the normal and the Cauchy distri-
butions.

Theorem 6.5.4. For each ˛ in the range (0, 2],

f˛ t D ejtj
˛

is a ch.f.
PROOF. For 0 < ˛  1, this is a quick consequence of Pólya’s theorem
above. Other conditions there being obviously satisfied, we need only check
that f˛ is convex in [0, 1. This is true because its second derivative is
equal to
et f˛2 t2˛2  ˛˛  1t˛2 g > 0
˛

for the range of ˛ in question. No such luck for 1 < ˛ < 2, and there are
several different proofs in this case. Here is the one given by Lévy which
6.5 REPRESENTATIVE THEOREMS 193

works for 0 < ˛ < 2. Consider the density function


 ˛
if jxj > 1,
px D 2jxj˛C1
0 if jxj  1;
and compute its ch.f. f as follows, using symmetry:
 1  1
1  cos tx
1  ft D 1  eitx px dx D ˛ dx
1 1 x ˛C1
 1  t 
1  cos u 1  cos u
D ˛jtj˛ du  du ,
0 u˛C1 0 u˛C1

after the change of variables tx D u. Since 1  cos u ¾ 12 u2 near u D 0, the first


integral in the last member above is finite while the second is asymptotically
equivalent to

1 t u2 1
du D t2˛
2 0 u ˛C1 22  ˛
as t # 0. Therefore we obtain

ft D 1  c˛ jtj˛ C Ot2 

where c˛ is a positive constant depending on ˛.


It now follows that
 n   2 n
t c˛ jtj˛ t
f D 1 CO
n1/˛ n n2/˛
is also a ch.f. (What is the probabilistic meaning in terms of r.v.’s?) For each
t, as n ! 1, the limit is equal to ec˛ jtj (the lemma in Sec. 6.4 again!). This,
˛

being continuous at t D 0, is also a ch.f. by the basic Theorem 6.3.2, and the
constant c˛ may be absorbed by a change of scale. Finally, for ˛ D 2, f˛ is
the ch.f. of a normal distribution. This completes the proof of the theorem.
Actually Lévy, who discovered these ch.f.’s around 1923, proved also
that there are complex constants ˛ such that e˛ jtj is a ch.f., and determined
˛

the exact form of these constants (see Gnedenko and Kolmogorov [12]). The
corresponding d.f.’s are called stable distributions, and those with real posi-
tive ˛ the symmetric stable ones. The parameter ˛ is called the exponent.
These distributions are part of a much larger class called the infinitely divisible
distributions to be discussed in Chapter 7.
Using the Cauchy ch.f. ejtj we can settle a question of historical interest.
Draw the graph of this function, choose an arbitrary T > 0, and draw the
tangents at šT meeting the abscissa axis at šT0 , where T0 > T. Now define
194 CHARACTERISTIC FUNCTION

the function fT to be f in [T, T], linear in [T0 , T] and in [T, T0 ], and
zero in 1, T0  and T0 , 1. Clearly fT also satisfies the conditions of
Theorem 6.5.3 and so is a ch.f. Furthermore, f D fT in [T, T]. We have
thus established the following theorem and shown indeed how abundant the
desired examples are.

Theorem 6.5.5. There exist two distinct ch.f.’s that coincide in an interval
containing the origin.

That the interval of coincidence can be “arbitrarily large” is, of course,


trivial, for if f1 D f2 in [υ, υ], then g1 D g2 in [nυ, nυ], where
   
t t
g1 t D f1 , g2 t D f2 .
n n

Corollary. There exist three ch.f.’s, f1 , f2 , f3 , such that f1 f3  f2 f3 but


f1 6 f2 .

To see this, take f1 and f2 as in the theorem, and take f3 to be any


ch.f. vanishing outside their interval of coincidence, such as the one described
above for a sufficiently small value of T. This result shows that in the algebra
of ch.f.’s, the cancellation law does not hold.
We end this section by another special but useful way of constructing
ch.f.’s.

Theorem 6.5.6. If f is a ch.f., then so is ef1 for each  ½ 0.


PROOF. For each  ½ 0, as soon as the integer n ½ , the function
  f  1
1 C f D1C
n n n
is a ch.f., hence so is its nth power; see propositions (iv) and (v) of Sec. 6.1.
As n ! 1,  
f  1 n
1C ! ef1
n
and the limit is clearly continuous. Hence it is a ch.f. by Theorem 6.3.2.
Later we shall see that this class of ch.f.’s is also part of the infinitely
divisible family. For ft D eit , the corresponding
1

it 1 e n
ee D eitn
nD0
n!

is the ch.f. of the Poisson distribution which should be familiar to the reader.
6.5 REPRESENTATIVE THEOREMS 195

EXERCISES

1. If f is continuous in R1 and satisfies (3) for each x 2 R1 and each


T > 0, then f is positive definite.
2. Show that the following functions are ch.f.’s:

1 1  jtj˛ , if jtj  1;
, ft D
1 C jtj 0, if jtj ½ 1; 0 < ˛  1,

1  jtj, if 0  jtj  12 ;
ft D 1
, if jtj ½ 12 .
4jtj
3. If fXn g are
 independent r.v.’s with the same stable distribution of
exponent ˛, then nkD1 Xk /n1/˛ has the same distribution. [This is the origin
of the name “stable”.]
 1 4. rIf F is a symmetric stable distribution of exponent ˛, 0 < ˛ < 2, then
1 jxj dFx < 1 for r < ˛ and D 1 for r ½ ˛. [HINT: Use Exercises 7
and 8 of Sec. 6.4.]
 5. Another proof of Theorem 6.5.3 is as follows. Show that
 1
t df0 t D 1
0

and define the d.f. G on RC by



Gu D t df0 t.
[0,u]

Next show that  1  


jtj
1 dGu D ft.
0 u
Hence if we set  
jtj
fu, t D 1  _0
u
(see Exercise 2 of Sec. 6.2), then

ft D fu, t dGu.
[0,1

Now apply Exercise 2 of Sec. 6.1.


6. Show that there is a ch.f. that has period 2m, m an integer ½ 1, and
that is equal to 1  jtj in [1, C1]. [HINT: Compute the Fourier series of such
a function to show that the coefficients are positive.]
196 CHARACTERISTIC FUNCTION

7. Construct a ch.f. that vanishes in [b, a] and [a, b], where 0 < a <
b, but nowhere else. [HINT: Let fm be the ch.f. in Exercise 6 and consider
 
pm fm , where pm ½ 0, pm D 1,
m m

and the pm ’s are strategically chosen.]


8. Suppose ft, u is a function on R2 such that for each u, fÐ, u is a
ch.f.; and for each t, ft, Ð is continuous. Then for any d.f. G,
 1 
exp [ft, u  1] dGu
1

is a ch.f.
9. Show that in Theorem 6.3.2, the hypothesis (a) cannot be relaxed to
require convergence of ffn g only in a finite interval jtj  T.

6.6 Multidimensional case; Laplace transforms


We will discuss very briefly the ch.f. of a p.m. in Euclidean space of more than
one dimension, which we will take to be two since all extensions are straight-
forward. The ch.f. of the random vector (X, Y) or of its 2-dimensional p.m.
is defined to be the function fÐ, Ð on R2 :

1 fs, t D fX,Y s, t D E eisXCtY  D eisxCty dx, dy.
R2

Propositions (i) to (v) of Sec. 6.1 have their obvious analogues. The inversion
formula may be formulated as follows. Call an “interval” (rectangle)
fx, y: x1  x  x2 , y1  y  y2 g
an interval of continuity iff the -measure of its boundary (the set of points
on its four sides) is zero. For such an interval I, we have
 T  T isx1
1 e  eisx2 eity1  eity2
I D lim fs, t ds dt.
T!1 22 T T is it
The proof is entirely similar to that in one dimension. It follows, as there, that
f uniquely determines . Only the following result is noteworthy.

Theorem 6.6.1. Two r.v.’s X and Y are independent if and only if


2 8s, 8t: fX,Y s, t D fX sfY t,
where fX and fY are the ch.f.’s of X and Y, respectively.
6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 197

The condition (2) is to be contrasted with the following identify in one


variable:
8t: fXCY t D fX tfY t,
where fXCY is the ch.f. of X C Y. This holds if X and Y are independent, but
the converse is false (see Exercise 14 of Sec. 6.1).
PROOF OF THEOREM 6.6.1. If X and Y are independent, then so are eisX and
itY
e for every s and t, hence
E eisXCtY  D E eisX Ð eitY  D E eisX E eitY ,
which is (2). Conversely, consider the 2-dimensional product measure 1 ð 2 ,
where 1 and 2 are the 1-dimensional p.m.’s of X and Y, respectively, and
the product is defined in Sec. 3.3. Its ch.f. is given by definition as
 
eisxCty  1 ð 2 dx, dy D eisx Ð eity 1 dx 2 dy
R2 R2
 
D e isx
1 dx eity 2 dy D fX sfY t
R1 R1
(Fubini’s theorem!). If (2) is true, then this is the same as fX,Y s, t, so that
1 ð 2 has the same ch.f. as , the p.m. of (X, Y). Hence, by the uniqueness
theorem mentioned above, we have 1 ð 2 D . This is equivalent to the
independence of X and Y.
The multidimensional analogues of the convergence theorem, and some
of its applications such as the weak law of large numbers and central limit
theorem, are all valid without new difficulties. Even the characterization of
Bochner has an easy extension. However, these topics are better pursued with
certain objectives in view, such as mathematical statistics, Gaussian processes,
and spectral analysis, that are beyond the scope of this book, and so we shall
not enter into them.
We shall, however, give a short introduction to an allied notion, that of
the Laplace transform, which is sometimes more expedient or appropriate than
the ch.f., and is a basic tool in the theory of Markov processes.
Let X be a positive ½0 r.v. having the d.f. F so that F has support in
[0, 1, namely F0 D 0. The Laplace transform of X or F is the function
FO on RC D [0, 1 given by

3 O
F D E eX  D ex dFx.
[0,1

It is obvious (why?) that


O
F0 O
D lim F D 1, O
F1 O
D lim F D F0.
#0 !1
198 CHARACTERISTIC FUNCTION

More generally, we can define the Laplace transform of an s.d.f. or a function


G of bounded variation satisfying certain “growth condition” at infinity. In
particular, if F is an s.d.f., the Laplace transform of its indefinite integral
 x
Gx D Fu du
0

is finite for  > 0 and given by


  
O
G D ex Fx dx D ex dx dy
[0,1 [0,1 [0,x]
  1 
1 1
D dy ex dx D ey dy D O
F,
[0,1 y  [0,1 
where is the s.p.m. of F. The calculation above, based on Fubini’s theorem,
replaces a familiar “integration by parts”. However, the reader should beware
of the latter operation. For instance, according to the usual definition, as
given in Rudin [1] (and several other standard textbooks!), the value of the
Riemann–Stieltjes integral
 1
ex dυ0 x
0

is 0 rather than 1, but


 1  1
x x A
e dυ0 x D lim υ0 xe jj C υ0 xex dx
#0
0 A"1 0

is correct only if the left member is taken in the Lebesgue-Stieltjes sense, as


is always done in this book.
There are obvious analogues of propositions (i) to (v) of Sec. 6.1.
However, the inversion formula requiring complex integration will be omitted
and the uniqueness theorem, due to Lerch, will be proved by a different method
(cf. Exercise 12 of Sec. 6.2).

Theorem 6.6.2. Let F O j be the Laplace transform of the d.f. Fj with support
O1 D F
in RC , j D 1, 2. If F O 2 , then F1 D F2 .
PROOF. We shall apply the Stone–Weierstrass theorem to the algebra
generated by the family of functions fex ,  ½ 0g, defined on the closed
positive real line: RC D [0, 1], namely the one-point compactification of
RC D [0, 1. A continuous function of x on RC is one that is continuous
in RC and has a finite limit as x ! 1. This family separates points on RC
and vanishes at no point of RC (at the point C1, the member e0x D 1 of the
family does not vanish!). Hence the set of polynomials in the members of the
6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 199

family, namely the algebra generated by it, is dense in the uniform topology,
in the space CB RC  of bounded continuous functions on RC . That is to say,
given any g 2 CB RC , and  > 0, there exists a polynomial of the form

n
g x D cj ej x ,
jD1

where cj are real and j ½ 0, such that


sup jgx  g xj  .
xRC
Consequently, we have

jgx  g xj dFj x  , j D 1, 2.

By hypothesis, we have for each  ½ 0:


 
ex dF1 x D ex dF2 x,

and consequently,
 
g xdF1 x D g xdF2 x.

It now follows, as in the proof of Theorem 4.4.1, first that


 
gx dF1 x D gx dF2 x

for each g 2 CB RC ; second, that this also holds for each g that is the
indicator of an interval in RC (even one of the form a, 1]; third, that
the two p.m.’s induced by F1 and F2 are identical, and finally that F1 D F2
as asserted.

Remark. Although it is not necessary here, we may also extend the


domain of definition of a d.f. F to RC ; thus F1 D 1, but with the new
meaning that F1 is actually the value of F at the point 1, rather than
a notation for limx!1 Fx as previously defined. F is thus continuous at
1. In terms of the p.m. , this means we extend its domain to RC but set
f1g D 0. On other occasions it may be useful to assign a strictly positive
value to f1g.

Passing to the convergence theorem, we shall prove it in a form closest


to Theorem 6.3.2 and refer to Exercise 4 below for improvements.

Theorem 6.6.3. Let fFn , 1  n < 1g be a sequence of s.d.f.’s with supports


v
in RC and fFO n g the corresponding Laplace transforms. Then Fn !F1 , where
F1 is a d.f., if and only if:
200 CHARACTERISTIC FUNCTION

(a) limn!1 F O n  exists for every  > 0;


(b) the limit function has the limit 1 as  # 0.

Remark. The limit in (a) exists at  D 0 and equals 1 if the Fn ’s are


d.f.’s, but even so (b) is not guaranteed, as the example Fn D υn shows.
PROOF. The “only if ” part follows at once from Theorem 4.4.1 and the
remark that the Laplace transform of an s.d.f. is a continuous function in RC .
Conversely, suppose lim F O n  D G,  > 0; extended G to RC by setting
G0 D 1 so that G is continuous in RC by hypothesis (b). As in the proof
of Theorem 6.3.2, consider any vaguely convergent subsequence Fnk with the
vague limit F1 , necessarily an s.d.f. (see Sec. 4.3). Since for each  > 0,
ex 2 C0 , Theorem 4.4.1 applies to yield FO nk  ! F O 1  for  > 0, where
FO 1 is the Laplace transform of F1 . Thus F O 1  D G for  > 0, and
consequently for  ½ 0 by continuity of F O 1 and G at  D 0. Hence every
vague limit has the same Laplace transform, and therefore is the same F1 by
v
Theorem 6.6.2. It follows by Theorem 4.3.4 that Fn !F1 . Finally, we have
F1 1 D F O 1 0 D G0 D 1, proving that F1 is a d.f.
There is a useful characterization, due to S. Bernstein, of Laplace trans-
forms of measures on RC . A function is called completely monotonic in an
interval (finite or infinite, of any kind) iff it has derivatives of all orders there
satisfying the condition:

4 1n fn  ½ 0

for each n ½ 0 and each  in the domain of definition.

Theorem 6.6.4. A function f on 0, 1 is the Laplace transform of a d.f. F:



5 f D ex dFx,
RC

if and only if it is completely monotonic in 0, 1 with f0C D 1.

Remark. We then extend f to RC by setting f0 D 1, and (5) will


hold for  ½ 0.
PROOF. The “only if ” part is immediate, since

fn  D xn ex dFx.
RC

Turning to the “if ” part, let us first prove that f is quasi-analytic in 0, 1,
namely it has a convergent Taylor series there. Let 0 < 0 <  < , then, by
6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 201

Taylor’s theorem, with the remainder term in the integral form, we have

k1
fj  
6 f D   j
jD0
j!

  k 1
C 1  tk1 fk  C   t dt.
k  1! 0

Because of (4), the last term in (6) is positive and does not exceed

  k 1
1  tk1 fk  C 0  t dt.
k  1! 0
For if k is even, then fk # and   k ½ 0, while if k is odd then fk "
and   k  0. Now, by (6) with  replaced by 0 , the last expression is
equal to
⎡ ⎤
 k 
k1  k
 ⎣f0  
f j
  ⎦

0   
j
f0 ,
0  jD0
j! 0 

where the inequality is trivial, since each term in the sum on the left is positive
by (4). Therefore, as k ! 1, the remainder term in (6) tends to zero and the
Taylor series for f converges.
Now for each n ½ 1, define the discrete s.d.f. Fn by the formula:

[nx] j
n
7 Fn x D 1j fj n.
jD0
j!

This is indeed an s.d.f., since for each  > 0 and k ½ 1 we have from (6):

k1
fj n
1 D f0C ½ f ½   nj .
jD0
j!

Letting  # 0 and then k " 1, we see that Fn 1  1. The Laplace transform
of Fn is plainly, for  > 0:
 1
x nj j
e dFn x D ej/n f n
RC jD0
j!

1
1
D n1  e/n   nj fj n D fn1  e/n ,
jD0
j!

the last equation from the Taylor series. Letting n ! 1, we obtain for the
limit of the last term f, since f is continuous at each . It follows from
202 CHARACTERISTIC FUNCTION

Theorem 6.6.3 that fFn g converges vaguely, say to F, and that the Laplace
transform of F is f. Hence F1 D f0 D 1, and F is a d.f. The theorem
is proved.

EXERCISES

1. If X and Y are independent r.v.’s with normal d.f.’s of the same


variance, then X C Y and X  Y are independent.
 2. Two uncorrelated r.v.’s with a joint normal d.f. of arbitrary parameters
are independent. Extend this to any finite number of r.v.’s.
 3. Let F and G be s.d.f.’s. If  > 0 and F
0 O O
D G for all  ½ 0 , then
F D G. More generally, if FnO 0  D O
Gn 0  for integer n ½ 1, then F D G.
[HINT: In order to apply the Stone–Weierstrass theorem as cited, adjoin the
constant 1 to the family fex ,  ½ 0 g; show that a function in C0 can actually
be uniformly approximated without a constant.]
 4. Let fF g be s.d.f.’s. If  > 0 and lim
n 0 n!1 F O n  exists for all  ½ 0 ,
then fFn g converges vaguely to an s.d.f.
5. Use Exercise 3 to prove that for any d.f. whose support is in a finite
interval, the moment problem is determinate.
6. Let F be an s.d.f. with support in RC . Define G0 D F,
 x
Gn x D Gn1 u du
0

for n ½ 1. Find GO n  in terms of F.


O
 1 x
O
7. Let f D 0 e fx dx where f 2 L 1 0, 1. Suppose that f has
a finite right-hand derivative f0 0 at the origin, then
O
f0 D lim f,
!1
0 O
f 0 D lim [f  f0].
!1

 8. In the notation of Exercise 7, prove that for every , 2 R :


C
 1 1
   O
esC t fs C t ds dt D f O .
 f
0 0

9. Given a function on RC that is finite, continuous, and decreasing


to zero at infinity, find a -finite measure on RC such that

8t > 0: t  s ds D 1.
[0,t]

[HINT: Assume 0 D 1 and consider the d.f. 1  .]


6.6 MULTIDIMENSIONAL CASE; LAPLACE TRANSFORMS 203

 10. If f > 0 on 0, 1 and has a derivative f0 that is completely mono-


tonic there, then 1/f is also completely monotonic.
11. If f is completely monotonic in 0, 1 with f0C D C1, then f
is the Laplace transform of an infinite measure on RC :

f D ex dx.
RC

[HINT: Show that Fn x  e2xυ fυ for each υ > 0 and all large n, where Fn
is defined in (7). Alternatively, apply Theorem 6.6.4 to f C n1 /fn1 
for  ½ 0 and use Exercise 3.]
12. Let fgn , 1  n  1g on RC satisfy the conditions: (i) for each
n, gn Ð is positive and decreasing; (ii) g1 x is continuous; (iii) for each
 > 0,  1  1
x
lim e gn x dx D ex g1 x dx.
n!1 0 0

Then
lim gn x D g1 x for every x 2 RC .
n!1
1
[HINT: For  > 0 consider the sequence 0 ex gn x dx and show that
 b  b
lim ex gn x dx D ex g1 x dx, lim gn b  g1 b,
n!1 a a n!1

and so on.]

Formally, the Fourier transform f and the Laplace transform F O of a p.m.


with support in RC can be obtained from each other by the substitution t D i
or  D it in (1) of Sec. 6.1 or (3) of Sec. 6.6. In practice, a certain expression
may be derived for one of these transforms that is valid only for the pertinent
range, and the question arises as to its validity for the other range. Interesting
cases of this will be encountered in Secs. 8.4 and 8.5. The following theorem
is generally adequate for the situation.

Theorem 6.6.5. The function h of the complex variable z given by



hz D ezx dFx
RC

is analytic in Rz < 0 and continuous in Rz  0. Suppose that g is another


function of z that is analytic in Rz < 0 and continuous in Rz  0 such that
8t 2 R1 : hit D git.
204 CHARACTERISTIC FUNCTION

Then hz  gz in Rz  0; in particular

8 2 RC : h D g.
PROOF. For each integer m ½ 1, the function hm defined by
 1
 
xn
hm z D e dFx D
zx
z n
dFx
[0,m] nD0 [0,m] n!

is clearly an entire function of z. We have


 
e dFx 
zx
dFx
m,1 m,1

in Rz  0; hence the sequence hm converges uniformly there to h, and h


is continuous in Rz  0. It follows from a basic proposition in the theory of
analytic functions that h is analytic in the interior Rz < 0. Next, the difference
h  g is analytic in Rz < 0, continuous is Rz  0, and equal to zero on the
line Rz D 0 by hypothesis. Hence, by Schwarz’s reflection principle, it can be
analytically continued across the line and over the whole complex plane. The
resulting entire functions, being zero on a line, must be identically zero in the
plane. In particular h  g  0 in Rz  0, proving the theorem.

Bibliographical Note

For standard references on ch.f.’s, apart from Lévy [7], [11], Cramér [10], Gnedenko
and Kolmogorov [12], Loève [14], Rényi [15], Doob [16], we mention:
S. Bochner, Vorlesungen über Fouriersche Integrale. Akademische Ver-
laggesellschaft, Konstanz, 1932.
E. Lukacs, Characteristic functions. Charles Griffin, London, 1960.
The proof of Theorem 6.6.4 is taken from
Willy Feller, Completely monotone functions and sequences, Duke J. 5 (1939),
661–674.
Central limit theorem and
7 its ramifications

7.1 Liapounov’s theorem


The name “central limit theorem” refers to a result that asserts the convergence
in dist. of a “normed” sum of r.v.’s, Sn  an /bn , to the unit normal d.f. 8.
We have already proved a neat, though special, version in Theorem 6.4.4.
Here we begin by generalizing the set-up. If we write
⎛ ⎞
Sn  an  n
X a
D⎝ ⎠  n,
j
1
bn b
jD1 n
bn

we see that we are really dealing with a double array, as follows. For each
n ½ 1 let there be kn r.v.’s fXnj , 1  j  kn g, where kn ! 1 as n ! 1:
X11 , X12 , . . . , X1k1 ;
2 X21 , X22 , . . . , X2k2 ;
.................
Xn1 , Xn2 , . . . , Xnkn ;
.................
206 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

The r.v.’s with n as first subscript will be referred to as being in the nth row.
Let Fnj be the d.f., fnj the ch.f. of Xnj ; and put


kn
Sn D Sn,kn D Xnj .
jD1

The particular case kn D n for each n yields a triangular array, and if, further-
more, Xnj D Xj for every n, then it reduces to the initial sections of a single
sequence fXj , j ½ 1g.
We shall assume from now on that the r.v.’s in each row in (2) are
independent, but those in different rows may be arbitrarily dependent as in
the case just mentioned — indeed they may be defined on different probability
spaces without any relation to one another. Furthermore, let us introduce the
following notation for moments, whenever they are defined, finite or infinite:
E Xnj  D ˛nj , 2
Xnj  D nj2
,
kn 
kn

E Sn  D ˛nj D ˛n , 2
Sn  D nj D sn ,
2 2

3 jD1 jD1



kn
E jXnj j3  D nj , 0n D nj .
jD1

In the special case of (1), we have


2
Xj Xj 
Xnj D , 2
Xnj  D .
bn bn2
If we take bn D sn , then

kn
4 2
Xnj  D 1.
jD1

By considering Xnj  ˛nj instead of Xnj , we may suppose


5 8n, 8j: ˛nj D 0
whenever the means exist. The reduction (sometimes called “norming”)
leading to (4) and (5) is always available if each Xnj has a finite second
moment, and we shall assume this in the following.
In dealing with the double array (2), it is essential to impose a hypothesis
that individual terms in the sum

kn
Sn D Xnj
jD1
7.1 LIAPOUNOV’S THEOREM 207

are “negligible” in comparison with the sum itself. Historically, this arose
from the assumption that “small errors” accumulate to cause probabilistically
predictable random mass phenomena. We shall see later that such a hypothesis
is indeed necessary in a reasonable criterion for the central limit theorem such
as Theorem 7.2.1.
In order to clarify the intuitive notion of the negligibility, let us consider
the following hierarchy of conditions, each to be satisfied for every  > 0:
(a) 8j: lim P fjXnj j > g D 0;
n!1

(b) lim max P fjXnj j > g D 0;


n!1 1jkn
 
(c) lim P max jXnj j >  D 0;
n!1 1jkn


kn
(d) lim P fjXnj j > g D 0.
n!1
jD1

It is clear that (d) ) (c) ) (b) ) (a); see Exercise 1 below. It turns out that
(b) is the appropriate condition, which will be given a name.

DEFINITION. The double array (2) is said to be holospoudic Ł iff (b) holds.

Theorem 7.1.1. A necessary and sufficient condition for (2) to be


holospoudic is:

6 8t 2 R1 : lim max jfnj t  1j D 0.


n!1 1jkn

PROOF. Assuming that (b) is true, we have


  
jfnj t  1j  jeitx  1j dFnj x D C
jxj> jxj
 
 2 dFnj x C jtj jxj dFnj x
jxj> jxj

2 dFnj x C jtj;
jxj>

and consequently

max jfnj t  1j  2 max P fjXnj j > g C jtj.


j j

Ł I am indebted to Professor M. Wigodsky for suggesting this word, the only new term coined
in this book.
208 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Letting n ! 1, then  ! 0, we obtain (6). Conversely, we have by the


inequality (2) of Sec. 6.3,
    
  
dFnj x  2   fnj tdt  j1  fnj tj dt;
jxj> 2 jtj2/ 2 jtj2/
and consequently


max P fjXnj j > g  max j1  fnj tj dt.
j 2 jtj2/ j

Letting n ! 1, the right-hand side tends to 0 by (6) and bounded conver-


gence; hence (b) follows. Note that the basic independence assumption is not
needed in this theorem.
We shall first give Liapounov’s form of the central limit theorem
involving the third moment by a typical use of the ch.f. From this we deduce
a special case concerning bounded r.v.’s. From this, we deduce the sufficiency
of Lindeberg’s condition by direct arguments. Finally we give Feller’s proof
of the necessity of that condition by a novel application of the ch.f.
In order to bring out the almost mechanical nature of Liapounov’s result
we state and prove a lemma in calculus.

Lemma. Let fnj , 1  j  kn , 1  ng be a double array of complex numbers


satisfying the following conditions as n ! 1:

(i) max1jkn jnj j ! 0;


kn
(ii) jD1 jnj j  M < 1, where M does not depend on n;
kn
(iii) jD1 nj ! , where  is a (finite) complex number.

Then we have

kn
7 1 C nj  ! e .
jD1

PROOF. By (i), there exists n0 such that if n ½ n0 , then jnj j  12 for all
j, so that 1 C nj 6D 0. We shall consider only such large values of n below,
and we shall denote by log 1 C nj  the determination of logarithm with an
angle in , ]. Thus

8 log1 C nj  D nj C 3jnj j2 ,


where 3 is a complex number depending on various variables but bounded
by some absolute constant not depending on anything, and is not necessarily
7.1 LIAPOUNOV’S THEOREM 209

the same in each appearance. In the present case we have in fact


1 
 1m1   1
jnj jm
 m
j log1 C nj   nj j D  nj  
 m  m
mD2 mD2
1  
jnj j2  1 m2
 D jnj j2  1,
2 mD2 2
so that the absolute constant mentioned above may be taken to be 1. (The
reader will supply such a computation next time.) Hence

kn 
kn 
kn
log1 C nj  D nj C 3 jnj j2 .
jD1 jD1 jD1

(This 3 is not the same as before, but bounded by the same 1!). It follows
from (ii) and (i) that

kn 
kn
9 jnj j2  max jnj j jnj j  M max jnj j ! 0;
1jkn 1jkn
jD1 jD1

and consequently we have by (iii),



kn
log1 C nj  ! .
jD1

This is equivalent to (7).

Theorem 7.1.2. Assume that (4) and (5) hold for the double array (2) and
that nj is finite for every n and j. If
10 0n ! 0
as n ! 1, then Sn converges in dist. to 8.
PROOF. For each n, the range of j below will be from 1 to kn . It follows
from the assumption (10) and Liapounov’s inequality that
11 max 3
nj  max nj  0n ! 0.
j j

By 30  of Theorem 6.4.2, we have


fnj t D 1  1 2 2
2 nj
t C 3njnj jtj3 ,

where j3nj j  16 . We apply the lemma above, for a fixed t, to

nj D  12 2 2
nj t C 3njnj jtj3 .
210 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Condition (i) is satisfied since


t2
max jnj j  max 2
nj C 3jtj3 max nj ! 0
j 2 j j

by (11). Condition (ii) is satisfied since


 t2
jnj j  C 3jtj3 0n
j
2

is bounded by (11); similarly condition (iii) is satisfied since


 t2 t2
nj D  C 3jtj3 0n !  .
j
2 2

It follows that

kn
fnj t ! et
2
/2
.
jD1

This establishes the theorem by the convergence theorem of Sec. 6.3, since
the left member is the ch.f. of Sn .

Corollary. Without supposing that E Xnj  D 0, suppose that for each n and
j there is a finite constant Mnj such that jXnj j  Mnj a.e., and that

12 max Mnj ! 0.


1jkn

Then Sn  E Sn  converges in dist. to 8.

This follows at once by the trivial inequality



kn 
kn
E jXnj  E Xnj j   2 max Mnj
3 2
Xnj 
1jkn
jD1 jD1

D 2 max Mnj .
1jkn

The usual formulation of Theorem 7.1.2 for a single sequence of inde-


pendent r.v.’s fXj g with E Xj  D 0, 2 Xj  D j2 < 1, E jXj j3  D j < 1,


n 
n 
n
13 Sn D Xj , sn2 D 2
j, 0n D j ,
jD1 jD1 jD1

is as follows.
7.1 LIAPOUNOV’S THEOREM 211

If
0n
14 ! 0,
sn3
then Sn /sn converges in dist. to 8.
This is obtained by setting Xnj D Xj /sn . It should be noticed that the double
scheme gets rid of cumbersome fractions in proofs as well as in statements.
We proceed to another proof of Liapounov’s theorem for a single
sequence by the method of Lindeberg, who actually used it to prove his
version of the central limit theorem, see Theorem 7.2.1 below. In recent times
this method has been developed into a general tool by Trotter and Feller.
The idea of Lindeberg is to approximate the sum X1 C Ð Ð Ð C Xn in (13)
successively by replacing one X at a time with a comparable normal (Gaussian)
r.v. Y, as follows. Let fYj , j ½ 1g be r.v.’s having the normal distribution
N0, j2 ; thus Yj has the same mean and variance as the corresponding Xj
above; let all the X’s and Y’s be totally independent. Now put
Zj D Y1 C Ð Ð Ð C Yj1 C XjC1 C Ð Ð Ð C Xn , 1  j  n,
with the obvious convention that
Z1 D X2 C Ð Ð Ð C Xn , Zn D Y1 C Ð Ð Ð C Yn1 .
To compare the distribution of Xj C Zj /sn with that of Yj C Zj /sn , we
use Theorem 6.1.6 by comparing the expectations of test functions. Namely,
we estimate the difference below for a suitable class of functions f:
     
X1 C Ð Ð Ð C Xn Y1 C Ð Ð Ð C Yn
15 E f E f
sn sn
n #      $
Xj C Zj Yj C Zj
D E f E f .
jD1
sn sn

This equation follows by telescoping since Yj C Zj D XjC1 C ZjC1 . We take


f in C3B , the class of bounded continuous functions with three bounded contin-
uous derivatives. By Taylor’s theorem, we have for every x and y:
 # $
 00  3
fx C y  fx C f0 xy C f x y 2   Mjyj
 2  6
where M D supx2R1 jf3 xj. Hence if  and  are independent r.v.’s such that
E fjj3 g < 1, we have by substitution followed by integration:
1
jE ff C g  E ffg  E ff0 gE fg  E ff00 gE f2 gj
2
M
16  E fjj3 g.
6
212 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Note that the r.v.’s f, f0 , and f00  are bounded hence integrable. If 
is another r.v. independent of  and having the same mean and variance as ,
and E fjj3 g < 1, we obtain by replacing  with  in (16) and then taking the
difference:
M
17 jE ff C g  E ff C gj  E fjj3 C jj3 g.
6
This key formula is applied to each term on the right side of (15), with
 D Zj /sn ,  D Xj /sn ,  D Yj /sn . The bounds on the right-hand side of (17)
then add up to

M  j
n
c j3
18 C
6 jD1 sn3 sn3
p
where c D 8/ since the absolute third moment of N0, 2  is equal to c j3 .
By Liapounov’s inequality (Sec. 3.2) j3  j , so that the quantity in (18) is
O0n /sn3 . Let us introduce a unit normal r.v. N for convenience of notation,
so that Y1 C Ð Ð Ð C Yn /sn may be replaced by N so far as its distribution is
concerned. We have thus obtained the following estimate:
      
 Sn  0n
19 8f 2 C3 : E f  E ffNg  O .
sn s3n
Consequently, under the condition (14), this converges to zero as n ! 1. It
follows by the general criterion for vague convergence in Theorem 6.1.6 that
Sn /sn converges in distribution to the unit normal. This is Liapounov’s form of
the central limit theorem proved above by the method of ch.f.’s. Lindeberg’s
idea yields a by-product, due to Pinsky, which will be proved under the same
assumptions as in Liapounov’s theorem above.

Theorem 7.1.3. Let fxn g be a sequence of real numbers increasing to C1


but subject to the growth condition: for some  > 0,
0n xn2
20 log C 1 C  ! 1
sn3 2
as n ! 1. Then for this , there exists N such that for all n ½ N we have
# 2 $ # 2 $
xn xn
21 exp  1 C   P fSn ½ xn sn g  exp  1   .
2 2
PROOF. This is derived by choosing particular test functions in (19). Let
f 2 C3 be such that
fx D 0 for x   12 ; 0  fx  1 for  12 < x < 12 ;
fx D 1 for x ½ 12 ;
7.1 LIAPOUNOV’S THEOREM 213

and put for all x:

fn x D fx  xn  12 , gn x D fx  xn C 12 .

Thus we have, denoting by IB the indicator function of B ² R1 :


I[xn C1,1  fn x  I[xn,1  gn x  I[xn 1,1 .
It follows that
     
Sn Sn
22 E fn  P fSn ½ xn sn g  E gn
sn sn
whereas
23 P fN ½ xn C 1g  E ffn Ng  E fgn Ng  P fN ½ xn  1g.

Using (19) for f D fn and f D gn , and combining the results with (22) and
(23), we obtain
 
0n
P fN ½ xn C 1g  O  P fSn ½ xn sn g
sn3
 
0n
24  P fN ½ xn  1g C O .
sn3
Now an elementary estimate yields for x ! C1:
 1  2  2
1 y 1 x
P fN ½ xg D p exp  dy ¾ p exp  ,
2 x 2 2x 2
(see Exercise 4 of Sec. 7.4), and a quick computation shows further that
# 2 $
x
P fN ½ x š 1g D exp  1 C o1 , x ! C1.
2
Thus (24) may be written as
# 2 $  
x 0n
P fSn ½ xn sn g D exp  n 1 C o1 C O .
2 sn3
Suppose n is so large that the o1 above is strictly less than  in absolute
value; in order to conclude (23) it is sufficient to have
 # 2 $
0n xn
D o exp  1 C  , n ! 1.
sn3 2
This is the sense of the condition (20), and the theorem is proved.
214 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Recalling that sn is the standard deviation of Sn , we call the probability


in (21) that of a “large deviation”. Observe that the two estimates of this
probability given there have a ratio equal to ex n2 which is large for each ,
as xn ! C1. Nonetheless, it is small relative to the principal factor exn
2/2

on the logarithmic scale, which is just to say that xn2 is small in absolute
value compared with xn2 /2. Thus the estimates in (21) is useful on such a
scale, and may be applied in the proof of the law of the iterated logarithm in
Sec. 7.5.

EXERCISES

 1. Prove that for arbitrary r.v.’s fX g in the array (2), the implications
nj
d ) c ) b ) a are all strict. On the other hand, if the Xnj ’s are
independent in each row, then d  c.
2. For any sequence of r.v.’s fYn g, if Yn /bn converges in dist. for an
increasing sequence of constants fbn g, then Yn /bn0 converges in pr. to 0 if
bn D obn0 . In particular, make precise the following statement: “The central
limit theorem implies the weak law of large numbers.”
3. For the double array (2), it is possible that Sn /bn converges in dist.
for a sequence of strictly positive constants bn tending to a finite limit. Is it
still possible if bn oscillates between finite limits?
4. Let fXj g be independent r.v.’s such that max1jn jXj j/bn ! 0 in
pr. and Sn  an /bn converges to a nondegenerate d.f. Then bn ! 1,
bnC1 /bn ! 1, and anC1  an /bn ! 0.
 5. In Theorem 7.1.2 let the d.f. of S be F . Prove that given any  > 0,
n n
there exists a υ such that 0n  υ ) LFn , 8  , where L is Levy
distance. Strengthen the conclusion to read:
sup jFn x  8xj  .
x2R1

 6. Prove the assertion made in Exercise 5 of Sec. 5.3 using the methods
of this section. [HINT: use Exercise 4 of Sec. 4.3.]

7.2 Lindeberg–Feller theorem


We can now state the Lindeberg–Feller theorem for the double array (2) of
Sec. 7.1 (with independence in each row).

Theorem 7.2.1. Assume nj 2


< 1 for each n and j and the reduction
hypotheses (4) and (5) of Sec. 7.1. In order that as n ! 1 the two conclusions
below both hold:
7.2 LINDEBERG–FELLER THEOREM 215

(i) Sn converges in dist. to 8,


(ii) the double array (2) of Sec. 7.1 is holospoudic;

it is necessary and sufficient that for each  > 0, we have


kn 

1 x 2 dFnj x ! 0.
jD1 jxj>

The condition (1) is called Lindeberg’s condition; it is manifestly


equivalent to
kn 

0
1  x 2 dFnj x ! 1.
jD1 jxj

PROOF. Sufficiency. By the argument that leads to Chebyshev’s inequality,


we have

1
2 P fjXnj j > g  x 2 dFnj x.
2 jxj>

Hence (ii) follows from (1); indeed even the stronger form of negligibility (d)
in Sec. 7.1 follows. Now for a fixed , 0 <  < 1, we truncate Xnj as follows:

Xnj , if jXnj j  ;
3 X0nj D
0, otherwise.
kn
Put Sn0 D 0
jD1 Xnj , Sn0  D s0 2n . We have, since E Xnj  D 0,
2

 
0
E Xnj  D x dFnj x D  x dFnj x.
jxj jxj>

Hence  
1
jE X0nj j  jxj dFnj x  x 2 dFnj x
jxj>  jxj>

and so by (1),

1 n k
jE Sn0 j  x 2 dFnj x ! 0.
 jD1 jxj>

Next we have by the Cauchy–Schwarz inequality


 
[E X0nj ]2  x 2 dFnj x 1 dFnj x,
jxj> jxj>
216 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

and consequently
   
0
2
Xnj  D x dFnj x 
2
E X0nj 2 ½  x 2 dFnj x.
jxj jxj jxj>

It follows by 10  that


kn 
  
1 D sn2 ½ s0 n ½
2
 x 2 dFnj x ! 1.
jD1 jxj jxj>

Thus as n ! 1, we have
sn0 ! 1 and E Sn0  ! 0.
Since  
Sn0  E Sn0  E Sn0 
Sn0 D C sn0
sn0 sn0
we conclude (see Theorem 4.4.6, but the situation is even simpler here) that
if [Sn0  E Sn0 ] j /sn0 converges in dist., so will Sn0 /sn0 to the same d.f.
Now we try to apply the corollary to Theorem 7.1.2 to the double array
fX0nj g. We have jX0nj j  , so that the left member in (12) of Sec. 7.1 corre-
sponding to this array is bounded by . But although  is at our disposal and
may be taken as small as we please, it is fixed with respect to n in the above.
Hence we cannot yet make use of the cited corollary. What is needed is the
following lemma, which is useful in similar circumstances.

Lemma 1. Le t u(m, n) be a function of positive integers m and n such that


8m: lim um, n D 0.
n!1

Then there exists a sequence fmn g increasing to 1 such that


lim umn , n D 0.
n!1

PROOF. It is the statement of the lemma and its necessity in our appli-
cation that requires a certain amount of sophistication; the proof is easy. For
each m, there is an nm such that n ½ nm ) um, n  1/m. We may choose
fnm , m ½ 1g inductively so that nm increases strictly with m. Now define
n0 D 1
mn D m for nm  n < nmC1 .
Then
1
umn , n  for nm  n < nmC1 ,
m
and consequently the lemma is proved.
7.2 LINDEBERG–FELLER THEOREM 217

We apply the lemma to (1) as follows. For each m ½ 1, we have


kn 

lim m2 x 2 dFnj x D 0.
n!1 jxj>1/m
jD1

It follows that there exists a sequence fn g decreasing to 0 such that


kn 
1 
x 2 dFnj x ! 0.
2n jD1 jxj>n

Now we can go back and modify the definition in (3) by replacing  with
n . As indicated above, the cited corollary becomes applicable and yields the
convergence of [Sn0  E Sn0 ]/sn0 in dist. to 8, hence also that of Sn0 /sn0 as
remarked.
Finally we must go from Sn0 to Sn . The idea is similar to Theorem 5.2.1
but simpler. Observe that, for the modified Xnj in (3) with  replaced by n ,
we have
⎧ ⎫
⎨ kn ⎬  kn
P fSn 6D Sn0 g  P [Xnj 6D X0nj ]  P fjXnj j > n g
⎩ ⎭
jD1 jD1

kn 
1
 2
x 2 dFnj x,

jD1 n jxj>n

the last inequality from (2). As n ! 1, the last term tends to 0 by the above,
hence Sn must have the same limit distribution as Sn0 (why?) and the sufficiency
of Lindeberg’s condition is proved. Although this method of proof is somewhat
longer than a straightforward approach by means of ch.f.’s (Exercise 4 below),
it involves several good ideas that can be used on other occasions. Indeed the
sufficiency part of the most general form of the central limit theorem (see
below) can be proved in the same way.
Necessity. Here we must resort entirely to manipulations with ch.f.’s. By
the convergence theorem of Sec. 6.3 and Theorem 7.1.1, the conditions (i)
and (ii) are equivalent to:

kn
fnj t D et
2 /2
4 8t: lim ;
n!1
jD1

5 8t: lim max jfnj t  1j D 0.


n!1 1jkn

By Theorem 6.3.1, the convergence in (4) is uniform in jtj  T for each finite
T : similarly for (5) by Theorem 7.1.1. Hence for each T there exists n0 T
218 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

such that if n ½ n0 T, then


max max jfnj t  1j  12 .
jtjT 1jkn

We shall consider only such values of n below. We may take the distinguished
logarithms (see Theorems 7.6.2 and 7.6.3 below) to conclude that

kn
t2
6 lim log fnj t D  .
n!1
jD1
2

By (8) and (9) of Sec. 7.1, we have


7 log fnj t D fnj t  1 C 3jfnj t  1j2 ;

kn 
kn
8 jfnj t  1j  max jfnj t  1j
2
jfnj t  1j.
1jkn
jD1 jD1

Now the last-written sum is, with some , jj  1:


  1    1 
  t2 x 2
 

9  
e  1 dFnj x D
itx  itx C  dFnj x
  2
j 1 j 1


t2  1 2 t2
 x dFnj x D .
2 j 1 2

Hence it follows from (5) and (9) that the left member of (8) tends to 0 as
n ! 1. From this, (7), and (6) we obtain
 t2
lim ffnj t  1g D  .
n!1
j
2

Taking real parts, we have


 1
t2
lim 1  cos tx dFnj x D .
n!1
j 1 2

Hence for each  > 0, if we split the integral into two parts and transpose one
of them, we obtain
 
 2  
t 
lim   1  cos tx dFnj x
n!1  2 jxj 
j
 
  
 
D lim   1  cos tx dFnj x
n!1  jxj> 
j
7.2 LINDEBERG–FELLER THEOREM 219


 lim 2 dFnj x
n!1 jxj>
j

 nj
2
2
 lim 2 D ,
n!1
j
2 2

the last inequality above by Chebyshev’s inequality. Since 0  1  cos  


 2 /2 for every real , this implies
⎧ ⎫
2 ⎨ t2  t2  ⎬
½ lim  x 2
dFnj x ½ 0,
2 n!1 ⎩ 2 j
2 jxjn ⎭

the quantity in braces being clearly positive. Thus


⎧ ⎫
⎨ kn  ⎬
4
½ lim 1  x 2
dFnj x ;
t2 2 n!1 ⎩ jxj
jD1

t being arbitrarily large while  is fixed; this implies Lindeberg’s condition


(10 ). Theorem 7.2.1 is completely proved.
Lindeberg’s theorem contains both the identically distributed case
(Theorem 6.4.4) and Liapounov’s theorem. Let us derive the latter, which
assets that, under (4) and (5) of Sec. 7.1, the condition below for any one
value of υ > 0 is a sufficient condition for Sn to converge in dist. to 8:
kn 
 1
10 jxj2Cυ dFnj x ! 0.
jD1 1

For υ D 1 this condition is just (10) of Sec. 7.1. In the general case the asser-
tion follows at once from the following inequalities:
  jxj2Cυ
x 2 dFnj x  υ
dFnj x
j jxj> j jxj> 

1  1 2Cυ
 υ jxj dFnj x,
 j 1

showing that (10) implies (1).


The essence of Theorem 7.2.1 is the assumption of the finiteness of the
second moments together with the “classical” norming factor sn , which is the
standard deviation of the sum Sn ; see Exercise 10 below for an interesting
possibility. In the most general case where “nothing is assumed,” we have the
following criterion, due to Feller and in a somewhat different form to Lévy.
220 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Theorem 7.2.2. For the double array (2) of Sec. 7.1 (with independence in
each row), in order that there exists a sequence of constants fan g such that
n
(i) kjD1 Xnj  an converges in dist. to 8, and (ii) the array is holospoudic,
it is necessary and sufficient that the following two conditions hold for every
 > 0:
kn 
(a) jD1 jxj> dFnj x ! 0;
kn  
(b) jD1 f jxj x 2 dFnj x   jxj x dFnj x2 g ! 1.

We refer to the monograph by Gnedenko and Kolmogorov [12] for this


result and its variants, as well as the following criterion for a single sequence
of independent, identically distributed r.v.’s due to Lévy.

Theorem 7.2.3.  Let fXj , j ½ 1g be independent r.v.’s having the common


d.f. F; and Sn D njD1 Xj . In order that there exist constants an and bn > 0
(necessarily bn ! C1) such that Sn  an /bn converges in dist. to 8, it is
necessary and sufficient that we have, as y ! C1:
  
11 y 2
dFx D o 2
x dFx .
jxj>y jxjy

The central limit theorem, applied to concrete cases, leads to asymptotic


formulas. The best-known one is perhaps the following, for 0 < p < 1, p C
q D 1, and x1 < x2 , as n ! 1:
12    x2
 n 1
ey /2 dy.
2
pk qnk ¾ 8x2   8x1  D p
p p k 2 x1
x1 npqknpx2 npq

This formula, due to DeMoivre, can be derived from Stirling’s formula for
factorials in a rather messy way. But it is just an application of Theorem 6.4.4
(or 7.1.2), where each Xj has the Bernoullian d.f. pυ1 C qυ0 .
More interesting and less expected applications to combinatorial analysis
will be illustrated by the following example, which incidentally demonstrates
the logical necessity of a double array even in simple situations.
Consider all n! distinct permutations a1 , a2 , . . . , an  of the n integers
(1, 2, . . . , n). The sample space  D n consists of these n! points, and P
assigns probability 1/n! to each of the points. For each j, 1  j  n, and
each ω D a1 , a2 , . . . , an  let Xnj be the number of “inversions” caused by
j in ω; namely Xnj ω D m if and only if j precedes exactly m of the inte-
gers 1, . . . , j  1 in the permutation ω. The basic structure of the sequence
fXnj , 1  j  ng is contained in the lemma below.
7.2 LINDEBERG–FELLER THEOREM 221

Lemma 2. For each n, the r.v.’s fXnj , 1  j  ng are independent with the
following distributions:
1
P fXnj D mg D for 0  m  j  1.
j

The lemma is a striking example that stochastic independence need not


be an obvious phenomenon even in the simplest problems and may require
verification as well as discovery. It is often disposed of perfunctorily but a
formal proof is lengthier than one might think. Observe first that the values of
Xn1 , . . . , Xnj are determined as soon as the positions of the integers f1, . . . , jg
are known, irrespective of those of the rest. Given j arbitrary positions among
n ordered slots, the number of ω’s in which f1, . . . , jg occupy these positions in
some order is j!n  j!. Among these the number of ω’s in which j occupies
the j  mth place, where 0  m  j  1, (in order from left to right) of the
given positions is j  1!n  j!. This position being fixed for j, the integers
f1, . . . , j  1g may occupy the remaining given positions in j  1! distinct
ways, each corresponding uniquely to a possible value of the random vector
fXn1 , . . . , Xn,j1 g.
That this correspondence is 1  1 is easily seen if we note that the total
number of possible values of this vector is precisely 1 Ð 2 Ð Ð Ð j  1 D
j  1!. It follows that for any such value c1 , . . . , cj1  the number of
ω’s in which, first, f1, . . . , jg occupy the given positions and, second,
Xn1 ω D c1 , . . . , Xn,j1 ω D cj1 , Xnj ω D m, is equal to n  j!.6 Hence
7
the number of ω’s satisfying the second condition alone is equal to nj n 
j! D n!/j!. Summing over m from 0 to j  1, we obtain the number of
ω’s in which Xn1 ω D c1 , . . . , Xn,j1 ω D cj1 to be jn!/j! D n!/j  1!.
Therefore we have
n!
P fXn1 D c1 , . . . , Xn,j1 D cj1 , Xnj D mg j! 1
D D .
P fXn1 D c1 , . . . , Xn,j1 D cj1 g n! j
j  1!
This, as we know, establishes the lemma. The reader is urged to see for himself
whether he can shorten this argument while remaining scrupulous.
The rest is simple algebra. We find
j1 j2  1 n2 n3
E Xnj  D , 2
nj D , E S n  ¾ , sn2 ¾ .
2 12 4 36
For each  > 0, and sufficiently large n, we have
jXnj j  j  1  n  1 < sn .
222 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Hence Lindeberg’s condition is satisfied for the double array


 
Xnj
; 1  j  n, 1  n
sn
(in fact the Corollary to Theorem 7.1.2 is also applicable) and we obtain the
central limit theorem:
⎧ ⎫

⎪ n2 ⎪

⎨ Sn  ⎬
lim P 4  x D 8x.
n!1 ⎪
⎪ n 3/2 ⎪

⎩ ⎭
6

Here for each permutation ω, Sn ω D njD1 Xnj ω is the total number of
inversions in ω; and the result above asserts that among the n! permutations
on f1, . . . , ng, the number of those in which there are n2 /4 C xn3/2 /6
inversions has a proportion 8x, as n ! 1. In particular, for example, the
number of those with n2 /4 inversions has an asymptotic proportion of 12 .

EXERCISES

1. Restate Theorem 7.1.2 in terms of normed sums of a single sequence.


2. Prove that Lindeberg’s condition (1) implies that

max nj ! 0.
1jkn

 3. Prove that in Theorem 7.2.1, (i) does not imply (1). [HINT: Consider
r.v.’s with normal distributions.]
 4. Prove the sufficiency part of Theorem 7.2.1 without using
Theorem 7.1.2, but by elaborating the proof of the latter. [HINT: Use the
expansion
tx2
eitx D 1 C itx C  for jxj > 
2
and
tx2 jtxj3
eitx D 1 C itx  C 0 for jxj  .
2 6
As a matter of fact, Lindeberg’s original proof does not even use ch.f.’s; see
Feller [13, vol. 2].]
5. Derive Theorem 6.4.4 from Theorem 7.2.1.
6. Prove that if υ < υ0 , then the condition (10) implies the similar one
when υ is replaced by υ0 .
7.2 LINDEBERG–FELLER THEOREM 223

 7. Find an example where Lindeberg’s condition is satisfied but


Liapounov’s is not for any υ > 0.
In Exercises 8 to 10 below fXj , j ½ 1g is a sequence of independent r.v.’s.
8. For each j let Xj have the uniform distribution in [j, j]. Show that
Lindeberg’s condition is satisfied and state the resulting central limit theorem.
9. Let Xj be defined as follows for some ˛ > 1:



1
⎨ šj˛ , with probability 2˛1 each;
6j
Xj D

⎪ 1
⎩ 0, with probability 1  2˛1 .
3j

Prove that Lindeberg’s condition is satisfied if and only if ˛ < 3/2.


 10. It is important to realize that the failure of Lindeberg’s condition
means only the failure of either (i) or (ii) in Theorem 7.2.1 with the specified
constants sn . A central limit theorem may well hold with a different sequence
of constants. Let
⎧ 1

⎪ šj ,
⎪ 2
with probability each;

⎪ 12j2

1
Xj D šj, with probability each;

⎪ 12



⎩ 0,
1
with probability 1   2 .
1
6 6j

Prove that Lindeberg’s condition is not satisfied. Nonetheless if we take bn2 D


n3 /18, then Sn /bn converges in dist. to 8. The point is that abnormally large
values may not count! [HINT: Truncate out the abnormal value.]
1
11. Prove that 1 x 2 dFx < 1 implies the condition (11), but not
vice versa.
 12. The following combinatorial problem is similar to that of the number
of inversions. Let  and P be as in the example in the text. It is standard
knowledge that each permutation
 
1 2 ÐÐÐ n
a1 a2 ÐÐÐ an

can be uniquely decomposed into the product of cycles, as follows. Consider


the permutation as a mapping  from the set 1, . . . , n onto itself such
that j D aj . Beginning with 1 and applying the mapping successively,
1 ! 1 ! 2 1 ! Ð Ð Ð, until the first k such that k 1 D 1. Thus
1, 1, 2 1, . . . , k1 1
224 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

is the first cycle of the decomposition. Next, begin with the least integer,
say b, not in the first cycle and apply  to it successively; and so on. We
say 1 ! 1 is the first step, . . . , k1 1 ! 1 the kth step, b ! b the
k C 1st step of the decomposition, and so on. Now define Xnj ω to be equal
to 1 if in the decomposition of ω, a cycle is completed at the jth step; otherwise
to be 0. Prove that for each n, fXnj , 1  j  ng is a set of independent r.v.’s
with the following distributions:
1
P fXnj D 1g D ,
njC1
1
P fXnj D 0g D 1  .
njC1
Deduce the central limit theorem for the number of cycles of a permutation.

7.3 Ramifications of the central limit theorem


As an illustration of a general method of extending the central limit theorem
to certain classes of dependent r.v.’s, we prove the following result. Further
elaboration along the same lines is possible, but the basic idea, attributed to
S. Bernstein, consists always in separating into blocks and neglecting small ones.
Let fXn , n ½ 1g be a sequence of r.v.’s; let Fn be the Borel field generated
by fXk , 1  k  ng, and Fn0 that by fXk , n < k < 1g. The sequence is called
m-dependent iff there exists an integer m ½ 0 such that for every n the fields
0
Fn and FnCm are independent. When m D 0, this reduces to independence.

Theorem 7.3.1. Suppose that fXn g is a sequence of m-dependent, uniformly


bounded r.v.’s such that
Sn 
! C1
n1/3
as n ! 1. Then [Sn  E Sn ]/ Sn  converges in dist. to 8.
PROOF. Let the uniform bound be M. Without loss of generality we may
suppose that E Xn  D 0 for each n. For an integer k ½ 1 let nj D [jn/k],
0  j  k, and put for large values of n:
Yj D Xnj C1 C Xnj C2 C Ð Ð Ð C Xnj C1m ;
Zj D XnjC1 mC1 C XnjC1 mC2 C Ð Ð Ð C XnjC1 .
We have then

k1 
k1
Sn D Yj C Zj D Sn0 C Sn00 , say.
jD0 jD0
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 225

It follows from the hypothesis of m-dependence and Theorem 3.3.2 that the
Yj ’s are independent; so are the Zj ’s, provided njC1  m C 1  nj > m,
which is the case if n/k is large enough. Although Sn0 and Sn00 are not
independent of each other, we shall show that the latter is comparatively
negligible so that Sn behaves like Sn0 . Observe that each term Xr in Sn00
is independent of every term Xs in Sn0 except at most m terms, and that
E Xr Xs  D 0 when they are independent, while jE Xr Xs j  M2 otherwise.
Since there are km terms in Sn00 , it follows that
jE Sn0 Sn00 j  km Ð m Ð M2 D kmM2 .
We have also

k1
E S00 n  D
2
E Z2j   kmM2 .
jD0

From these inequalities and the identity

E Sn2  D E S0 n  C 2E Sn0 Sn00  C E S00 n 


2 2

we obtain
jE Sn2   E S0 n j  3km2 M2 .
2

Now we choose k D kn D [n2/3 ] and write sn2 D E Sn2  D 2


Sn , s0 2n D
E S0 2n  D 2 Sn0 . Then we have, as n ! 1.
sn0
1 ! 1,
sn
and
 2
Sn00
2 E ! 0.
sn

Hence, first, Sn00 /sn ! 0 in pr. (Theorem 4.1.4) and, second, since
Sn s0 S 0 S00
D n 0n C n ,
sn sn sn sn
Sn /sn will converge in dist. to 8 if Sn0 /sn0 does.
Since kn is a function of n, in the notation above Yj should be replaced
by Ynj to form the double array fYnj , 0  j  kn  1, 1  ng, which retains
independence in each row. We have, since each Ynj is the sum of no more
than [n/kn ] C 1 of the Xn ’s,
 
n
jYnj j  C 1 M D On1/3  D osn  D osn0 ,
kn
226 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

the last relation from (1) and the one preceding it from a hypothesis of the
theorem. Thus for each  > 0, we have for all sufficiently large n:

x 2 dFnj x D 0, 0  j  kn  1,
jxj>sn0

where Fnj is the d.f. of Ynj . Hence Lindeberg’s condition is satisfied for the
double array fYnj /sn0 g, and we conclude that Sn0 /sn0 converges in dist. to 8.
This establishes the theorem as remarked above.
The next extension of the central limit theorem is to the case of a random
number of terms (cf. the second part of Sec. 5.5). That is, we shall deal with
the r.v. S n whose value at ω is given by S n ω ω, where

n
Sn ω D Xj ω
jD1

as before, and f n ω, n ½ 1g is a sequence of r.v.’s. The simplest case,


but not a very useful one, is when all the r.v.’s in the “double family”
fXn , n , n ½ 1g are independent. The result below is more interesting and is
distinguished by the simple nature of its hypothesis. The proof relies essentially
on Kolmogorov’s inequality given in Theorem 5.3.1.

Theorem 7.3.2. Let fXj , j ½ 1g be a sequence of independent, identically


distributed r.v.’s with mean 0 and variance 1. Let f n , n ½ 1g be a sequence
of r.v.’s taking only strictly positive integer values such that
n
3 !c in pr.,
n
p
where c is a constant: 0 < c < 1. Then S n / n converges in dist. to 8.
p
PROOF. We know from Theorem 6.4.4 that Sn / n converges in dist. to
8, so that our conclusion means that we can substitute n for n there. The
remarkable thing is that no kind of independence is assumed, but only the limit
property in (3). First of all, we observe that in the result of Theorem 6.4.4 we
may substitute [cn]p (D integer part of cn) for n to conclude the convergence
in dist. of S[cn] / [cn] to 8 (why?). Next we write
 8
Sn S[cn] S n  S[cn] [cn]
p D p C p .
n [cn] [cn] n

The second factor on the right converges to 1 in pr., by (3). Hence a simple
argument used before (Theorem 4.4.6) shows that the theorem will be proved
if we show that
S n  S[cn]
4 p !0 in pr.
[cn]
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 227

Let  be given, 0 <  < 1; put

an D [1  3 [cn]], bn D [1 C 3 [cn]]  1.

By (3), there exists n0  such that if n ½ n0 , then the set

3 D fω: an  n ω  bn g

has probability ½1  . If ω is in this set, then S n ω ω is one of the sums
Sj with an  j  bn . For [cn] < j  bn , we have

Sj  S[cn] D X[cn]C1 C X[cn]C2 C Ð Ð Ð C Xj ;

hence by Kolmogorov’s inequality


 
p 2
Sbn  S[cn]  3 [cn]
P max jSj  S[cn] j >  cn    .
[cn]jbn 2 cn 2 cn
A similar inequality holds for an  j < [cn]; combining the two, we obtain
p
P f max jSj  S[cn] j >  cng  2.
an jbn

Now we have, if n ½ n0 :


  
 S n  S[cn] 
P  p >
[cn] 
1    
 S n  S[cn] 
D 
n D j;  p
 >
[cn] 
P
jD1

   
9
 P n D j; max jSj  S[cn] j >  [cn] C Pf n D jg
an jbn
an jbn j2[a
/ n ,bn ]
 
9
P max jSj  S[cn] j >  [cn] C P f n 2
/ [an , bn ]g
an jbn

 2 C 1  P f3g  3.

Since  is arbitrary, this proves (4) and consequently the theorem.


As a third, somewhat deeper application of the central limit theorem, we
give an instance of another limiting distribution inherently tied up with the
normal. In this application the role of the central limit theorem, indeed in
high dimensions, is that of an underlying source giving rise to multifarious
manifestations. The topic really belongs to the theory of Brownian motion
process, to which one must turn for a true understanding.
228 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Let fXj , j ½ 1g be a sequence of independent and identically distributed


r.v.’s with mean 0 and variance 1; and

n
Sn D Xj .
jD1

It will become evident that these assumptions may be considerably weakened,


and the basic hypothesis is that the central limit theorem should be applicable.
Consider now the infinite sequence of successive sums fSn , n ½ 1g. This is
the kind of stochastic process to which an entire chapter will be devoted later
(Chapter 8). The classical limit theorems we have so far discussed deal with
the individual terms of the sequence fSn g itself, but there are various other
sequences derived from it that are no less interesting and useful. To give a
few examples:
jSm j
max Sm , min Sm , max jSm j, max p ,
1mn 1mn 1mn 1mn m

n 
n
υa Sm , Sm , SmC1 ;
mD1 mD1

where a, b D 1 if ab < 0 and 0 otherwise. Thus the last two examples
represent, respectively, the “number of sums ½ a” and the “number of changes
of sign”. Now the central idea, originating with Erdös and Kac, is that the
asymptotic behavior of these functionals of Sn should be the same regardless of
the special properties of fXj g, so long as the central limit theorem applies to it
(at least when certain regularity conditions are satisfied, such as the finiteness
of a higher moment). Thus, in order to obtain the asymptotic distribution of
one of these functionals one may calculate it in a very particular case where
the calculations are feasible. We shall illustrate this method, which has been
called an “invariance principle”, by carrying it out in the case of max Sm ; for
other cases see Exercises 6 and 7 below.
Let us therefore put, for a given x:
 
p
Pn x D P max Sm  x n .
1mn

For an integer k ½ 1 let nj D [jn/k], 0  j  k, and define


 
p
Rnk x D P max Snj  x n .
1jk

Let also
p p
Ej D fω: Sm ω  x n, 1  m < j; Sj ω > x ng;
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 229

and for each j, define j by

nj1 < j  nj .

Now we write, for 0 <  < x:


  
p n
p
P max Sm > x n D P fEj ; jSnj  Sj j   ng
1mn
jD1


n
p  
C P fEj ; jSnj  Sj j >  ng D C ,
jD1 1 2
p
say. Since Ej is independent of fjSnj  Sj j >  ng and 2
Snj  Sj  
n/k, we have by Chebyshev’s inequality:
n
 
n
1
 P Ej  2k  2 .
2 jD1
 n  k
p p
On the pother hand, since Sj > x n and jSnj  Sj j   n imply Snj >
x   n, we have
  
p
 P max Sn > x   n D 1  Rnk x  .
1k
1

It follows that
  1
5 Pn x D 1   ½ Rnk x    .
1 2
2 k

Since it is trivial that Pn x  Rnk x, we obtain from (5) the following
inequalities:
1
6 Pn x  Rnk x  Pn x C  C .
2 k
We shall show that for fixed x and k, limn!1 Rnk x exists. Since
p p p
Rnk x D P fSn1  x n, Sn2  x n, . . . , Snk  x ng,

it is sufficient to show that the sequence of k-dimensional random vectors


+ + + 
k k k
Sn , Sn , . . . , Sn
n 1 n 2 n k
230 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

converges in distribution as n ! 1. Now its ch.f. ft1 , . . . , tk  is given by


9 9
E fexpi k/nt1 Sn1 C Ð Ð Ð C tk Snk g D E fexpi k/n[t1 C Ð Ð Ð C tk Sn1
C t2 C Ð Ð Ð C tk Sn2  Sn1 
C Ð Ð Ð C tk Snk  Snk1 ]g,
which converges to
: ; : ; 6 7
7 exp  12 t1 C Ð Ð Ð C tk 2 exp  12 t2 C Ð Ð Ð C tk 2 Ð Ð Ð exp  12 tk2 ,
since the ch.f.’s of
+ + +
k k k
Sn1 , Sn2  Sn1 , . . . , Sn  Snk1 
n n n k
all converge to et /2 by the central limit theorem (Theorem 6.4.4), njC1 
2

nj being asymptotically equal to n/k for each j. It is well known that the
ch.f. given in (7) is that of the k-dimensional normal distribution, but for our
purpose it is sufficient to know that the convergence theorem for ch.f.’s holds
in any dimension, and so Rnk converges vaguely to R1k , where R1k is some
fixed k-dimensional distribution.
Now suppose for a special sequence fX Q j g satisfying the same conditions
as fXj g, the corresponding PQ n can be shown to converge (“pointwise” in fact,
but “vaguely” if need be):

8 8x: lim PQ n x D Gx.


n!1

Then, applying (6) with Pn replaced by PQ n and letting n ! 1, we obtain,


since R1k is a fixed distribution:
1
Gx  R1k x  Gx C  C .
2 k
Substituting back into the original (6) and taking upper and lower limits,
we have
1 1
Gx    2  R1k x    2  lim Pn x
 k  k n

1
 R1k x  Gx C  C .
2 k
Letting k ! 1, we conclude that Pn converges vaguely to G, since  is
arbitrary.
It remains to prove (8) for a special choice of fXj g and determine G. This
can be done most expeditiously by taking the common d.f. of the Xj ’s to be
7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 231

the symmetric Bernoullian 12 υ1 C υ1 . In this case we can indeed compute
the more specific probability
 
9 P max Sm < x; Sn D y ,
1mn

where x and y are two integers such that x > 0, x > y. If we observe that
in our particular case max1mn Sm ½ x if and only if Sj D x for some j,
1  j  n, the probability in (9) is seen to be equal to
 
P fSn D yg  P max Sm ½ x; Sn D y
1mn


n
D P fSn D yg  P fSm < x, 1  m < j; Sj D x; Sn D yg
jD1


n
D P fSn D yg  P fSm < x, 1  m < j; Sj D x; Sn  Sj D y  xg
jD1


n
D P fSn D yg  P fSm < x, 1  m < j; Sj D xgP fSn  Sj D y  xg,
jD1

where the last step is by independence. Now, the r.v.



n
Sn  Sj D Xm
mDjC1

being symmetric, we have P fSn  Sj D y  xg D P fSn  Sj D x  yg.


Substituting this and reversing the steps, we obtain
n
P fSn D yg  P fSm < x, 1  m < j; Sj D xgP fSn  Sj D x  yg
jD1


n
D P fSn D yg  P fSm < x, 1  m < j; Sj D x; Sn  Sj D x  yg
jD1


n
D P fSn D yg  P fSm < x, 1  m < j; Sj D x; Sn D 2x  yg
jD1
 
D P fSn D yg  P max Sm ½ x; Sn D 2x  y .
1mn

Since 2x  y > x, Sn D 2x  y implies max1mn Sm ½ x, hence the last line


reduces to
10 P fSn D yg  P fSn D 2x  yg,
232 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

and we have proved that the value of the probability in (9) is given by
(10). The trick used above, which consists in changing the sign of every Xj
after the first time when Sn reaches the value x, or, geometrically speaking,
reflecting the path fj, Sj , j ½ 1g about the line Sj D x upon first reaching it,
is called the “reflection principle”. It is often attributed to Desiré André in its
combinatorial formulation of the so-called “ballot problem” (see Exercise 5
below).
The value of (10) is, of course, well known in the Bernoullian case, and
summing over y we obtain, if n is even:
  
  
 1 n n
P max Sm < x D n  y  n  2x C y
1mn
y<x
2n
2 2
  
 1  n  n
D n  y  n C 2x  y
2n
y<x 2 2
  
1 n
D n
2 j
nx nCx
2 <j 2
   
1  n 1 n
D n C n nCx ,
2 n x
j 2
jj j< 2
2 2

6 7 p
where nj D 0 if jjj > n or if j is not an integer. Replacing x by x n (or
p
[x n] if one is pedantic) in the last expression, and using the central limit
theorem for the Bernoullian case in the form of (12) of Sec. 7.2 with p D q D
1
2 , we see that the preceding probability tends to the limit

 + 
x x
1 y 2 /2 2
ey
2 /2
p e dy D dy
2 x  0

as n ! 1. It should be obvious, even without a similar calculation, that the


same limit obtains for odd values of n. Finally, since this limit as a function
of x is a d.f. with support in 0, 1, the corresponding limit for x  0 must
be 0. We state the final result below.

Theorem 7.3.3. Let fXj , j ½ 0g be independent and identically


p distributed
r.v.’s with mean 0 and variance 1, then max1mn Sm / n converges in dist.
to the “positive normal d.f.” G, where

8x: Gx D 28x  1 _ 0.


7.3 RAMIFICATIONS OF THE CENTRAL LIMIT THEOREM 233

EXERCISES

1. Let fXj , j ½ 1g be a sequence of independent r.v.’s, and f a Borel


measurable function of m variables. Then if k D fXkC1 , . . . , XkCm , the
sequence fk , k ½ 1g is m  1-dependent.
 2. Let fX , j ½ 1g be a sequence of independent r.v.’s having the
j
Bernoullian d.f. pυ1 C 1  pυ0 , 0 < p < 1. An r-run of successes in the
sample sequence fXj ω, j ½ 1g is defined to be a sequence of r consecutive
“ones” preceded and followed by “zeros”. Let Nn be the number of r-runs in
the first n terms of the sample sequence. Prove a central limit theorem for Nn .
3. Let fXj , j , j ½ 1g be independent r.v.’s such that the j ’s are integer-
valued, j !1 a.e., and the central limit theorem applies to Sn  an /bn ,
where Sn D njD1 Xj , an , bn are real constants, bn ! 1. Then it also applies
to S n  a n /b n .
 4. Give an example of a sequence of independent and identically
distributed r.v.’s fXn g with mean 0 and variance 1 and a sequence of positive
integer-valued r.v.’s n tending to 1 a.e. such that S n /s n does not converge
in distribution. [HINT: The easiest way is to use Theorem 8.3.3 below.]
 5. There are a ballots marked A and b ballots marked B. Suppose that
these a C b ballots are counted in random order. What is the probability that
the number of ballots for A always leads in the counting?
6. If fXn g are independent, identically distributed symmetric r.v.’s, then
for every x ½ 0,

P fjSn j > xg ½ 12 P f max jXk j > xg ½ 12 [1  enP fjX1 j>xg ].


1kn

7. Deduce from Exercise 6 that for a symmetric stable r.v. X with


exponent ˛, 0 < ˛ < 2 (see Sec. 6.5), there exists a constant c > 0 such that
P fjXj > n1/˛ g ½ c/n. [This is due to Feller; use Exercise 3 of Sec. 6.5.]
8. Under p the same hypothesis as in Theorem 7.3.3, prove that
max1mn jSm j/ n converges in dist. to the d.f. H, where Hx D 0 for
x  0 and
1 # $
4  1k 2k C 12 2
Hx D exp  for x > 0.
 kD0 2k C 1 8x 2

[HINT: There is no difficulty in proving that the limiting distribution is the


same for any sequence satisfying the hypothesis. To find it for the symmetric
Bernoullian case, show that for 0 < z < x we have
 
P z < min Sm  max Sm < x  z; Sn D y  z
1mn 1mn
234 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

1
   
1  n n
D n n C 2kx C y  z  n C 2kx  y  z .
2 kD1
2 2
This can be done by starting the sample path at z and reflecting at both barriers
0 and x (Kelvin’s method of images). Next, show that
p p
lim P fz n < Sm < x  z n for 1  m  ng
n!1
1  2kC1xz  2kxz 
1  2
Dp  ey /2 dy.
2 kD1 2kxz 2k1xz

Finally, use the Fourier series for the function h of period 2x:

1, if  x  z < y < z;
hy D
C1, if  z < y < x  z;

to convert the above limit to


1 # $
4 1 2k C 1z 2k C 12 2
sin exp  .
 kD0 2k C 1 x 2x 2

This gives the asymptotic joint distribution of

min Sm and max Sm ,


1mn 1mn

of which that of max1mn jSm j is a particular case.


9. Let fXj ½ 1g be independent r.v.’s with the symmetric Bernoullian
distribution. Let Nn ω be the number of zerospin the first n terms of the
sample sequence fSj ω, j ½ 1g. Prove that Nn / n converges in dist. to the
same G as in Theorem 7.3.3. [HINT: Use the method of moments. Show that
for each integer r ½ 1:

E Nrn  ¾ r! p2j1 p2j2 j1  Ð Ð Ð p2jr jr1 
0<j1 <ÐÐÐ<jr n/2

where  
1 12j
p2j D P fS2j D 0g D ¾p .
22j j
j

as j ! 1. To evaluate the multiple sum, say r, use induction on r as
follows. If
 n r/2
r ¾ cr
2
7.4 ERROR ESTIMATION 235

as n ! 1, then
  1
cr n rC1/2
r C 1 ¾ p z1/2 1  zr/2 dz .
 0 2

Thus
r
0 C1
crC1 D cr  2 .
rC1
0 C1
2

Finally
 
rC1
 r  2r/2 0  1
Nn 0r C 1 2
E p ! r D   D x r dGx.
n r/2
2 0 C1 1 0
2 0
2

This result remains valid if the common d.f. F of Xj is of the integer lattice
type with mean 0 and variance 1. If F is not of the lattice type, no Sn need
ever be zero — but the “next nearest thing”, to wit the number of changes of
sign of Sn , is asymptotically distributed as G, at least under the additional
assumption of a finite third absolute moment.]

7.4 Error estimation

Questions of convergence lead inevitably to the question of the “speed” of


convergence — in other words, to an investigation of the difference between
the approximating expression and its limit. Specifically, if a sequence of d.f.’s
Fn converge to the unit normal d.f. 8, as in the central limit theorem, what
can one say about the “remainder term” Fn x  8x? An adequate estimate
of this term is necessary in many mathematical applications, as well as for
numerical computation. Under Liapounov’s condition there is a neat “order
bound” due to Berry and Esseen, who improved upon Liapounov’s older result,
as follows.

Theorem 7.4.1. Under the hypotheses of Theorem 7.1.2, there is a universal


constant A0 such that

1 sup jFn x  8xj  A0 0n


x

where Fn is the d.f. of Sn .


236 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

In the case of a single sequence of independent and identically distributed


r.v.’s fXj , j ½ 1g with mean 0, variance 2 , and third absolute moment  < 1,
the right side of (1) reduces to
n A0  1
A0 2 3/2
D 3 1/2 .
n  n
H. Cramér and P. L. Hsu have shown that under somewhat stronger condi-
tions, one may even obtain an asymptotic expansion of the form:
H1 x H2 x H3 x
Fn x D 8x C C C 3/2 C Ð Ð Ð ,
n1/2 n n
where the H’s are explicit functions involving the Hermite polynomials. We
shall not go into this, as the basic method of obtaining such an expansion is
similar to the proof of the preceding theorem, although considerable technical
complications arise. For this and other variants of the problem see Cramér
[10], Gnedenko and Kolmogorov [12], and Hsu’s paper cited at the end of
this chapter.
We shall give the proof of Theorem 7.4.1 in a series of lemmas, in which
the machinery of operating with ch.f.’s is further exploited and found to be
efficient.

Lemma 1. Let F be a d.f., G a real-valued function satisfying the condi-


tions below:

(i) limx!1 Gx D 0, limx!C1 Gx D 1;


(ii) G has a derivative that is bounded everywhere: supx jG0 xj  M.

Set
1
2 1D sup jFx  Gxj.
2M x
Then there exists a real number a such that we have for every T > 0:
  T1 
1  cos x
3 2MT1 3 dx  
0 x2
 1 
 1  cos Tx 
  fFx C a  Gx C ag dx  .
x 2
1
PROOF. Clearly the 1 in (2) is finite, since G is everywhere bounded by
(i) and (ii). We may suppose that the left member of (3) is strictly positive, for
otherwise there is nothing to prove; hence 1 > 0. Since F  G vanishes at
š1 by (i), there exists a sequence of numbers fxn g converging to a finite limit
7.4 ERROR ESTIMATION 237

b such that Fxn   Gxn  converges to 2M1 or 2M1. Hence either Fb 
Gb D 2M1 or Fb  Gb D 2M1. The two cases being similar, we
shall treat the second one. Put a D b  1; then if jxj < 1, we have by (ii)
and the mean value theorem of differential calculus:
Gx C a ½ Gb C x  1M
and consequently
Fx C a  Gx C a  Fb  [Gb C x  1M] D Mx C 1.
It follows that
 1  1
1  cos Tx 1  cos Tx
2
fFx C a  Gx C ag dx  M x C 1 dx
1 x 1 x2
 1
1  cos Tx
D 2M1 dx;
0 x2
 1  1  
 1  cos Tx 
 C fFx C a  Gx C ag dx 
 x 2
1 1
 1  1   1
1  cos Tx 1  cos Tx
 2M1 C 2
dx D 4M1 dx.
1 1 x 1 x2
Adding these inequalities, we obtain
 1   1  1
1  cos Tx
fFx C a  Gx C ag dx  2M1  C2
1 x2 0 1
  1  1
1  cos Tx 1  cos Tx
dx D 2M1 3 C2 dx.
x2 0 0 x2
This reduces to (3), since
 1
1  cos Tx T
dx D
0 x2 2
by (3) of Sec. 6.2, provided that T is so large that the left member of (3) is
positive; otherwise (3) is trivial.

Lemma 2. In addition to the assumptions of Lemma 1, we assume that

(iii) G is of bounded variation in 1, 1;


1
(iv) 1 jFx  Gxj dx < 1.

Let  1  1
ft D eitx dFx, gt D eitx dGx.
1 1
238 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

Then we have

1 T
jft  gtj 12
4 1 dt C .
M 0 t T
PROOF. That the integral on the right side of (4) is finite will soon be
apparent, and as a Lebesgue integral the value of the integrand at t D 0 may
be overlooked. We have, by a partial integration, which is permissible on
account of condition (iii):
 1
5 ft  gt D it fFx  Gxgeitx dx;
1

and consequently
 1
ft  gt ita
e D fFx C a  Gx C ageitx dx.
it 1

In particular, the left member above is bounded for all t 6D 0 by condition (iv).
Multiplying by T  jtj and integrating, we obtain
 T
ft  gt ita
6 e T  jtj dt
T it
 T 1
D fFx C a  Gx C ageitx T  jtj dx dt.
T 1

We may invert the repeated integral by Fubini’s theorem and condition (iv),
and obtain (cf. Exercise 2 of Sec. 6.2):
 1   T
 1  cos Tx  jft  gtj
 fFx C a  Gx C ag dx  T dt.
 x 2  t
1 0

In conjunction with (3), this yields


  T1   T
1  cos x jft  gtj
7 2M1 3 2
dx    dt.
0 x 0 t
The quantity in braces in (7) is not less than
 1  1
1  cos x 2  6
3 2
dx  3 2
dx   D  .
0 x T1 x 2 T1
Using this in (7), we obtain (4).
Lemma 2 bounds the maximum difference between two d.f.’s satisfying
certain regularity conditions by means of a certain average difference between
their ch.f.’s (It is stated for a function of bounded variation, since this is needed
7.4 ERROR ESTIMATION 239

for the asymptotic expansion mentioned above). This should be compared with
the general discussion around Theorem 6.3.4. We shall now apply the lemma
to the specific case of Fn and 8 in Theorem 7.4.1. Let the ch.f. of Fn be fn ,
so that

kn
fn t D fnj t.
jD1

Lemma 3. For jtj < 1/201/3


n , we have

jfn t  et j  0n jtj3 et


2 /2 2 /2
8 .
PROOF. We shall denote by  below a “generic” complex number with
jj  1, otherwise unspecified and not necessarily the same at each appearance.
By Taylor’s expansion:
2
nj 2 nj 3
fnj t D 1  t C t .
2 6
For the range of t given in the lemma, we have by Liapounov’s inequality:

9 j nj tj  jnj 1/3 tj  j0n 1/3 tj < 12 ,

so that  
 2 3
 nj 2 nj t  1 1 1
 t C < C < .
 2 6  8 48 4

Using (8) of Sec. 7.1 with 3 D /2, we may write


 2
2 2 2
nj 2 nj 3  nj t nj t3
log fnj t D  t C t C  C .
2 6 2 2 6

The absolute value of the last term above is less than


4 4 2 6    
nj t nj t nj jtj nj jtj3 1 1
C  C nj jtj 
3
C nj jtj3
4 36 4 36 4.2 36.8
by (9); hence
2   2
nj 2 1 1 1 nj 2 
log fnj t D  t C C C nj jtj3 D  t C nj t3 .
2 6 8 288 2 2
Summing over j, we have
t2 
log fn t D  C 0n t3 ,
2 2
240 CENTRAL LIMIT THEOREM AND ITS RAMIFICATIONS

or explicitly:  
 2
log fn t C t   1 0n jtj3 .
 2 2
Since jeu  1j  jujejuj for all u, it follows that
# $
t2 /2 0n jtj3 0n jtj3
jfn te  1j  exp .
2 2
Since 0n jtj3 /2  1/16 and e1/16  2, this implies (8).

Lemma 4. For jtj < 1/40n , we have

jfn tj  et


2
/3
10 .
PROOF. We symmetrize (see end of Sec. 6.2) to facilitate the estimation.
We have
 1  1
jfnj tj2 D cos tx  ydFnj x dFnj y,
1 1

since jfnj j2 is real. Using the elementary inequalities


 2 

cos u  1 C u   juj ,
3
 2  6
jx  yj3  4jxj3 C jyj3 ;
we see that the double integral above does not exceed
 1 1 
t2 2 2 3 3
1  x  2xy C y  C jtj jxj C jyj  dFnj x dFnj y
2 3
1 1 2 3
 
4 4
D 1  nj t C nj jtj  exp  nj t C nj jtj .
2 2 3 2 2 3
3 3
Multiplying over j, we obtain
 
4
jfn tj2  exp t2 C 0n jtj3  e2/3t
2

3
for the range of t specified in the lemma, proving (10).
Note that Lemma 4 is weaker than Lemma 3 but valid in a wider range.
We now combine them.

Lemma 5. For jtj < 1/40n , we have

jfn t  et j  160n jtj3 et


2 /2 2 /3
11 .
7.4 ERROR ESTIMATION 241

PROOF. If jtj < 1/20n 1/3 , this is implied by (8). If 1/20n 1/3   jtj <
1/40n , then 1  80n jtj3 , and so by (10):

jfn t  et j  jfn tj C et  2et  160n jtj3 et
2 2 2 2
/2 /2 /3 /3
.
PROOF OF THEOREM 7.4.1. Apply Lemma 2 with F D Fn and G D 8. The
M in condition (ii) of Lemma 1 may be taken to be 12 , since both Fn and 8
have mean 0 and variance 1, it follows from Chebyshev’s inequality that
1
Fx _ Gx 
, if x < 0,
x2
1
1  Fx _ 1  Gx  2 , if x > 0;
x
and consequently
1
8x: jFx  Gxj  .
x2
Thus condition (iv) of Lemma 2 is satisfied. In (4) we take T D 1/40n ; we
have then from (4) and (11):

2 1/40n  jfn t  et /2 j
2
96
sup jFn x  8xj  dt C p 0n
x  0 t 23

320n 1/40n  2 t2 /3 96
 t e dt C p 0n
 0 23
  1 
32 96
t2 et /3 dt C p
2
 0n .
 0 23
This establishes (1) with a numerical value for A0 (which may be somewhat
improved).
Although Theorem 7.4.1 gives the best possible uniform estimate of the remainder $F_n(x) - \Phi(x)$, namely one that does not depend on $x$, it becomes less useful if $x = x_n$ increases with $n$ even at a moderate speed. For instance, we have

$$1 - F_n(x_n) = \frac{1}{\sqrt{2\pi}}\int_{x_n}^{\infty} e^{-y^2/2}\,dy + O(\Gamma_n),$$

where the first "principal" term on the right is asymptotically equal to

$$\frac{1}{\sqrt{2\pi}\,x_n}e^{-x_n^2/2}.$$

Hence already when $x_n = \sqrt{2\log(1/\Gamma_n)}$ this will be $o(\Gamma_n)$ for $\Gamma_n \to 0$ and absorbed by the remainder. For such "large deviations", what is of interest is an asymptotic evaluation of

$$\frac{1 - F_n(x_n)}{1 - \Phi(x_n)}$$

as $x_n \to \infty$ more rapidly than indicated above. This requires a different type of approximation by means of "bilateral Laplace transforms", which will not be discussed here.
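The order of magnitude in Theorem 7.4.1 is easy to observe empirically. The following Monte Carlo sketch (an aside, assuming numpy; the parameters are arbitrary) compares $\sup_x |F_n(x) - \Phi(x)|$ with $\Gamma_n$ for symmetric Bernoullian summands, for which $\sigma = 1$, $\gamma = E|X|^3 = 1$, and $\Gamma_n = n\gamma/s_n^3 = n^{-1/2}$:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, trials = 100, 100_000

# X_j = +1 or -1 with probability 1/2 each, so S_n = 2*Binomial(n, 1/2) - n
s = (2.0 * rng.binomial(n, 0.5, size=trials) - n) / sqrt(n)

grid = np.linspace(-4.0, 4.0, 801)
F_n = np.searchsorted(np.sort(s), grid, side="right") / trials  # empirical d.f.
Phi = np.array([0.5 * (1.0 + erf(x / sqrt(2.0))) for x in grid])

print(np.abs(F_n - Phi).max(), 1.0 / sqrt(n))  # both of order n^{-1/2}
```

For such lattice summands the discrepancy is in fact of exact order $n^{-1/2}$; cf. Exercise 3 below.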
EXERCISES

1. If $F$ and $G$ are d.f.'s with finite first moments, then
$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx < \infty.$$
[HINT: Use Exercise 18 of Sec. 3.2.]
2. If $f$ and $g$ are ch.f.'s such that $f(t) = g(t)$ for $|t| \le T$, then
$$\int_{-\infty}^{\infty} |F(x) - G(x)|\,dx \le \frac{\pi}{T}.$$
This is due to Esseen (Acta Math., 77 (1944)).
*3. There exists a universal constant $A > 0$ such that for any sequence of independent, identically distributed integer-valued r.v.'s $\{X_j\}$ with mean 0 and variance 1, we have
$$\sup_x |F_n(x) - \Phi(x)| \ge \frac{A}{n^{1/2}},$$
where $F_n$ is the d.f. of $\sum_{j=1}^n X_j/\sqrt n$. [HINT: Use Exercise 24 of Sec. 6.4.]
4. Prove that for every $x > 0$:
$$\frac{x}{1+x^2}\,e^{-x^2/2} \le \int_x^{\infty} e^{-y^2/2}\,dy \le \frac{1}{x}\,e^{-x^2/2}.$$
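The two-sided estimate in Exercise 4 will be used as (7) of the next section. It is also easy to check numerically via $\int_x^\infty e^{-y^2/2}\,dy = \sqrt{\pi/2}\;\mathrm{erfc}(x/\sqrt2)$; a minimal sketch (an aside, using only the Python standard library):

```python
import math

for x in (0.5, 1.0, 2.0, 4.0):
    # int_x^infty e^{-y^2/2} dy = sqrt(pi/2) * erfc(x/sqrt(2))
    tail = math.sqrt(math.pi / 2) * math.erfc(x / math.sqrt(2))
    lower = (x / (1 + x * x)) * math.exp(-x * x / 2)
    upper = math.exp(-x * x / 2) / x
    assert lower < tail < upper
    print(f"x={x}: {lower:.6g} < {tail:.6g} < {upper:.6g}")
```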
7.5 Law of the iterated logarithm
The law of the iterated logarithm is a crowning achievement in classical probability theory. It had its origin in attempts to perfect Borel's theorem on normal numbers (Theorem 5.1.3). In its simplest but basic form, this asserts: if $N_n(\omega)$ denotes the number of occurrences of the digit 1 in the first $n$ places of the binary (dyadic) expansion of the real number $\omega$ in $[0,1]$, then $N_n(\omega) \sim n/2$ for almost every $\omega$ in Borel measure. What can one say about the deviation $N_n(\omega) - n/2$? The order bounds $O(n^{(1/2)+\epsilon})$, $\epsilon > 0$; $O((n\log n)^{1/2})$ (cf. Theorem 5.4.1); and $O((n\log\log n)^{1/2})$ were obtained successively by Hausdorff (1913), Hardy and Littlewood (1914), and Khintchine (1922); but in 1924 Khintchine gave the definitive answer:

$$\limsup_{n\to\infty} \frac{N_n(\omega) - \dfrac{n}{2}}{\sqrt{\dfrac12\,n\log\log n}} = 1$$

for almost every $\omega$. This sharp result with such a fine order of infinity as "log log" earned its celebrated name. No less celebrated is the following extension given by Kolmogorov (1929). Let $\{X_n, n \ge 1\}$ be a sequence of independent r.v.'s, $S_n = \sum_{j=1}^n X_j$; suppose that $E(X_n) = 0$ for each $n$ and

$$(1)\qquad \sup_\omega |X_n(\omega)| = o\left(\frac{s_n}{\sqrt{\log\log s_n}}\right),$$

where $s_n^2 = \sigma^2(S_n)$; then we have for almost every $\omega$:

$$(2)\qquad \limsup_{n\to\infty} \frac{S_n(\omega)}{\sqrt{2s_n^2\log\log s_n}} = 1.$$

The condition (1) was shown by Marcinkiewicz and Zygmund to be of the best possible kind, but an interesting complement was added by Hartman and Wintner that (2) also holds if the $X_n$'s are identically distributed with a finite second moment. Finally, further sharpening of (2) was given by Kolmogorov and by Erdös in the Bernoullian case, and in the general case under exact conditions by Feller; the "last word" being as follows: for any increasing sequence $\varphi_n$, we have

$$P\{S_n(\omega) > s_n\varphi_n \text{ i.o.}\} = 0 \text{ or } 1$$

according as the series

$$\sum_{n=1}^{\infty} \frac{\varphi_n}{n}\,e^{-\varphi_n^2/2} < \infty \text{ or } = \infty.$$
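Before the formal development, the "log log" normalization can be watched at work. The sketch below (an illustrative aside, assuming numpy; the horizon $10^6$ is arbitrary, and the convergence in (2) is notoriously slow) tracks the running maximum of $S_n/\sqrt{2n\log\log n}$ in the Bernoullian case, where $s_n^2 = n$:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
S = np.cumsum(rng.choice([-1.0, 1.0], size=N))   # Bernoullian case: s_n^2 = n

n = np.arange(3, N + 1)                          # log log n requires n >= 3
ratio = S[2:] / np.sqrt(2.0 * n * np.log(np.log(n)))
for m in (10**3, 10**4, 10**5, 10**6):
    print(m, ratio[: m - 2].max())               # running sup, slowly approaching 1
```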
We shall prove the result (2) under a condition different from (1) and apparently overlapping it. This makes it possible to avoid an intricate estimate concerning "large deviations" in the central limit theorem and to replace it by an immediate consequence of Theorem 7.4.1.* It will become evident that the proof of such a "strong limit theorem" (bounds with probability one) as the law of the iterated logarithm depends essentially on the corresponding "weak limit theorem" (convergence of distributions) with a sufficiently good estimate of the remainder term.

*An alternative which bypasses Sec. 7.4 is to use Theorem 7.1.3; the details are left as an exercise.

The vital link just mentioned will be given below as Lemma 1. In the rest of this section "$A$" will denote a generic strictly positive constant, not necessarily the same at each appearance, and $\Lambda$ will denote a constant such that $|\Lambda| \le A$. We shall also use the notation in the preceding statement of Kolmogorov's theorem, and

$$\gamma_n = E|X_n|^3, \qquad \Gamma_n = \sum_{j=1}^n \gamma_j$$

as in Sec. 7.4, but for a single sequence of r.v.'s. Let us set also

$$\varphi(\lambda, x) = \sqrt{2\lambda x^2\log\log x}, \qquad \lambda > 0,\ x > 0.$$
Lemma 1. Suppose that for some $\epsilon$, $0 < \epsilon < 1$, we have

$$(3)\qquad \frac{\Gamma_n}{s_n^3} \le \frac{A}{(\log s_n)^{1+\epsilon}}.$$

Then for each $\delta$, $0 < \delta < \epsilon$, we have

$$(4)\qquad P\{S_n > \varphi(1+\delta, s_n)\} \le \frac{A}{(\log s_n)^{1+\delta}};$$
$$(5)\qquad P\{S_n > \varphi(1-\delta, s_n)\} \ge \frac{A}{(\log s_n)^{1-\delta/2}}.$$

PROOF. By Theorem 7.4.1, we have for each $x$:

$$(6)\qquad P\{S_n > xs_n\} = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} e^{-y^2/2}\,dy + \Lambda\frac{\Gamma_n}{s_n^3}.$$

We have as $x\to\infty$:

$$(7)\qquad \int_x^{\infty} e^{-y^2/2}\,dy \sim \frac{e^{-x^2/2}}{x}$$

(see Exercise 4 of Sec. 7.4). Substituting $x = \sqrt{2(1\pm\delta)\log\log s_n}$, the first term on the right side of (6) is, by (7), asymptotically equal to

$$\frac{1}{\sqrt{4\pi(1\pm\delta)\log\log s_n}}\cdot\frac{1}{(\log s_n)^{1\pm\delta}}.$$

This dominates the second (remainder) term on the right side of (6), by (3), since $0 < \delta < \epsilon$. Hence (4) and (5) follow as rather weak consequences.
To establish (2), let us write for each fixed $\delta$, $0 < \delta < \epsilon$:

$$E_n^+ = \{\omega: S_n(\omega) > \varphi(1+\delta, s_n)\}, \qquad E_n^- = \{\omega: S_n(\omega) > \varphi(1-\delta, s_n)\},$$

and proceed by steps.

1°. We prove first that

$$(8)\qquad P\{E_n^+ \text{ i.o.}\} = 0$$

in the notation introduced in Sec. 4.2, by using the convergence part of the Borel–Cantelli lemma there. But it is evident from (4) that the series $\sum_n P(E_n^+)$ is far from convergent, since $s_n$ is expected to be of the order of $\sqrt n$. The main trick is to apply the lemma to a crucial subsequence $\{n_k\}$ (see Theorem 5.1.2 for a crude form of the trick) chosen to have two properties: first, $\sum_k P(E_{n_k}^+)$ converges, and second, "$E_n^+$ i.o." already implies "$E_{n_k}^+$ i.o." nearly, namely if the given $\delta$ is slightly decreased. This modified implication is a consequence of a simple but essential probabilistic argument spelled out in Lemma 2 below.

Given $c > 1$, let $n_k$ be the largest value of $n$ satisfying $s_n \le c^k$, so that $s_{n_k} \le c^k < s_{n_k+1}$. Since $\max_{1\le j\le n}\sigma_j/s_n \to 0$ (why?), we have $s_{n_k+1}/s_{n_k} \to 1$, and so

$$(9)\qquad s_{n_k} \sim c^k$$

as $k\to\infty$. Now for each $k$, consider the range of $j$ below:

$$(10)\qquad n_k \le j < n_{k+1},$$

and put

$$(11)\qquad F_j = \{\omega: |S_{n_{k+1}}(\omega) - S_j(\omega)| < s_{n_{k+1}}\}.$$

By Chebyshev's inequality, we have

$$P(F_j) \ge 1 - \frac{s_{n_{k+1}}^2 - s_{n_k}^2}{s_{n_{k+1}}^2} \to \frac{1}{c^2};$$

hence $P(F_j) \ge A > 0$ for all sufficiently large $k$.
Lemma 2. Let $\{E_j\}$ and $\{F_j\}$, $1 \le j \le n < \infty$, be two sequences of events. Suppose that for each $j$, the event $F_j$ is independent of $E_1^c\cdots E_{j-1}^c E_j$, and that there exists a constant $A > 0$ such that $P(F_j) \ge A$ for every $j$. Then we have

$$(12)\qquad P\left(\bigcup_{j=1}^n E_jF_j\right) \ge A\,P\left(\bigcup_{j=1}^n E_j\right).$$

PROOF. The left member in (12) is equal to

$$P\left(\bigcup_{j=1}^n [(E_1F_1)^c\cdots(E_{j-1}F_{j-1})^c E_jF_j]\right) \ge P\left(\bigcup_{j=1}^n [E_1^c\cdots E_{j-1}^c E_jF_j]\right)$$
$$= \sum_{j=1}^n P(E_1^c\cdots E_{j-1}^c E_j)\,P(F_j) \ge \sum_{j=1}^n P(E_1^c\cdots E_{j-1}^c E_j)\cdot A,$$

which is equal to the right member in (12).
Applying Lemma 2 to $E_j^+$ and the $F_j$ in (11), we obtain

$$(13)\qquad P\left\{\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+F_j\right\} \ge A\,P\left\{\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+\right\}.$$

It is clear that the event $E_j^+ \cap F_j$ implies

$$S_{n_{k+1}} > S_j - s_{n_{k+1}} > \varphi(1+\delta, s_j) - s_{n_{k+1}},$$

which is, by (9) and (10), asymptotically greater than

$$\varphi\left(\frac{1}{c^2}\left(1 + \frac34\delta\right), s_{n_{k+1}}\right).$$

Choose $c$ so close to 1 that $(1 + \frac34\delta)/c^2 > 1 + \delta/2$, and put

$$G_k = \left\{\omega: S_{n_{k+1}} > \varphi\left(1 + \frac{\delta}{2}, s_{n_{k+1}}\right)\right\};$$

note that $G_k$ is just $E_{n_{k+1}}^+$ with $\delta$ replaced by $\delta/2$. The above implication may be written as

$$E_j^+F_j \subset G_k$$

for sufficiently large $k$ and all $j$ in the range given in (10); hence we have

$$(14)\qquad \bigcup_{j=n_k}^{n_{k+1}-1} E_j^+F_j \subset G_k.$$
It follows from (4) that

$$\sum_k P(G_k) \le \sum_k \frac{A}{(\log s_{n_k})^{1+\delta/2}} \le A\sum_k \frac{1}{(k\log c)^{1+\delta/2}} < \infty.$$

In conjunction with (13) and (14) we conclude that

$$\sum_k P\left\{\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+\right\} < \infty;$$

and consequently by the Borel–Cantelli lemma that

$$P\left\{\bigcup_{j=n_k}^{n_{k+1}-1} E_j^+ \text{ i.o.}\right\} = 0.$$

This is equivalent (why?) to the desired result (8).
2°. Next we prove that with the same subsequence $\{n_k\}$ but an arbitrary $c$, if we put $t_k^2 = s_{n_{k+1}}^2 - s_{n_k}^2$ and

$$D_k = \left\{\omega: S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\left(1 - \frac{\delta}{2}, t_k\right)\right\},$$

then we have

$$(15)\qquad P(D_k \text{ i.o.}) = 1.$$

Since the differences $S_{n_{k+1}} - S_{n_k}$, $k \ge 1$, are independent r.v.'s, the divergence part of the Borel–Cantelli lemma is applicable to them. To estimate $P(D_k)$, we may apply Lemma 1 to the sequence $\{X_{n_k+j}, j \ge 1\}$. We have

$$t_k \sim \sqrt{1 - \frac{1}{c^2}}\;s_{n_{k+1}} \sim \sqrt{1 - \frac{1}{c^2}}\;c^{k+1},$$

and consequently by (3):

$$\frac{\Gamma_{n_{k+1}} - \Gamma_{n_k}}{t_k^3} \le \frac{A\,\Gamma_{n_{k+1}}}{s_{n_{k+1}}^3} \le \frac{A}{(\log t_k)^{1+\epsilon/2}}.$$

Hence by (5),

$$P(D_k) \ge \frac{A}{(\log t_k)^{1-\delta/4}} \ge \frac{A}{k^{1-\delta/4}},$$

and so $\sum_k P(D_k) = \infty$ and (15) follows.
3°. We shall use (8) and (15) together to prove

$$(16)\qquad P(E_n^- \text{ i.o.}) = 1.$$

This requires just a rough estimate entailing a suitable choice of $c$. By (8) applied to $\{-X_n\}$ and (15), for almost every $\omega$ the following two assertions are true:

(i) $S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi(1-\delta/2, t_k)$ for infinitely many $k$;
(ii) $S_{n_k}(\omega) \ge -\varphi(2, s_{n_k})$ for all sufficiently large $k$.

For such an $\omega$, we have then

$$(17)\qquad S_{n_{k+1}}(\omega) > \varphi\left(1 - \frac{\delta}{2}, t_k\right) - \varphi(2, s_{n_k}) \quad\text{for infinitely many } k.$$

Using (9) and $\log\log t_k^2 \sim \log\log s_{n_{k+1}}^2$, we see that the expression on the right side of (17) is asymptotically greater than

$$\left[\sqrt{\left(1 - \frac{\delta}{2}\right)\left(1 - \frac{1}{c^2}\right)} - \frac{\sqrt2}{c}\right]\varphi(1, s_{n_{k+1}}) > \varphi(1-\delta, s_{n_{k+1}}),$$

provided that $c$ is chosen sufficiently large. Doing this, we have therefore proved that

$$(18)\qquad P(E_{n_{k+1}}^- \text{ i.o.}) = 1,$$

which certainly implies (16).

4°. The truth of (8) and (16), for each fixed $\delta$, $0 < \delta < \epsilon$, means exactly the conclusion (2), by an argument that should by now be familiar to the reader.
Theorem 7.5.1. Under the condition (3), the lim sup and lim inf, as $n\to\infty$, of $S_n/\sqrt{2s_n^2\log\log s_n}$ are, respectively, $+1$ and $-1$, with probability one.

The assertion about the lim inf follows, of course, from (2) if we apply it to $\{-X_j, j\ge 1\}$. Recall that (3) is more than sufficient to ensure the validity of the central limit theorem, namely that $S_n/s_n$ converges in dist. to $\Phi$. Thus the law of the iterated logarithm complements the central limit theorem by circumscribing the extraordinary fluctuations of the sequence $\{S_n, n\ge 1\}$. An immediate consequence is that for almost every $\omega$, the sample sequence $S_n(\omega)$ changes sign infinitely often. For much more precise results in this direction see Chapter 8.

In view of the discussion preceding Theorem 7.3.3, one may wonder about the almost everywhere bounds for

$$\max_{1\le m\le n} S_m, \qquad \max_{1\le m\le n} |S_m|, \quad\text{and so on.}$$
It is interesting to observe that as far as the $\limsup_n$ is concerned, these two functionals behave exactly like $S_n$ itself (Exercise 2 below). However, the question of $\liminf_n$ is quite different. In the case of $\max_{1\le m\le n}|S_m|$, another law of the (inverted) iterated logarithm holds as follows. For almost every $\omega$, we have

$$\liminf_{n\to\infty}\;\max_{1\le m\le n}|S_m(\omega)|\sqrt{\frac{8\log\log s_n}{\pi^2 s_n^2}} = 1,$$

under a condition analogous to but stronger than (3). Finally, one may wonder about an asymptotic lower bound for $|S_n|$. It is rather trivial to see that this is always $o(s_n)$ when the central limit theorem is applicable; but actually it is even $o(s_n^{1-\epsilon})$ in some general cases. Indeed in the integer lattice case, under the conditions of Exercise 9 of 7.3, we have "$S_n = 0$ i.o. a.e." This kind of phenomenon belongs really to the recurrence properties of the sequence $\{S_n\}$, to be discussed in Chapter 8.
EXERCISES

1. Show that condition (3) is fulfilled if the $X_j$'s have a common d.f. with a finite third moment.
*2. Prove that whenever (2) holds, then the analogous relations with $S_n$ replaced by $\max_{1\le m\le n} S_m$ or $\max_{1\le m\le n}|S_m|$ also hold.
*3. Let $\{X_j, j\ge 1\}$ be a sequence of independent, identically distributed r.v.'s with mean 0 and variance 1, and $S_n = \sum_{j=1}^n X_j$. Then
$$P\left\{\liminf_{n\to\infty}\frac{|S_n(\omega)|}{\sqrt n} = 0\right\} = 1.$$
[HINT: Consider $S_{n_{k+1}} - S_{n_k}$ with $n_k \sim k^k$. A quick proof follows from Theorem 8.3.3 below.]
4. Prove that in Exercise 9 of Sec. 7.3 we have $P\{S_n = 0 \text{ i.o.}\} = 1$.
*5. The law of the iterated logarithm may be used to supply certain counterexamples. For instance, if the $X_n$'s are independent and $X_n = \pm n^{1/2}/\log\log n$ with probability $\frac12$ each, then $S_n/n \to 0$ a.e., but Kolmogorov's sufficient condition (see case (i) after Theorem 5.4.1) $\sum_n E(X_n^2)/n^2 < \infty$ fails. (A numerical illustration appears after Exercise 6 below.)
6. Prove that $P\{|S_n| > \varphi(1-\delta, s_n) \text{ i.o.}\} = 1$, without use of (8), as follows. Let
$$e_k = \{\omega: |S_{n_k}(\omega)| < \varphi(1-\delta, s_{n_k})\};$$
$$f_k = \left\{\omega: S_{n_{k+1}}(\omega) - S_{n_k}(\omega) > \varphi\left(1 - \frac{\delta}{2}, s_{n_{k+1}}\right)\right\}.$$
Show that for sufficiently large $k$ the event $e_k \cap f_k$ implies the complement of $e_{k+1}$; hence deduce
$$P\left(\bigcap_{j=j_0}^{k+1} e_j\right) \le P(e_{j_0})\prod_{j=j_0}^{k}[1 - P(f_j)]$$
and show that the product $\to 0$ as $k\to\infty$.
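For Exercise 5, both claims can be made tangible numerically, though a finite simulation of course proves neither. The sketch below (an aside, assuming numpy) exhibits the slow divergence of $\sum_n 1/(n(\log\log n)^2)$ and the drift of $S_n/n$ toward 0 on a sample path:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
n = np.arange(3, N)                                  # start at 3 so log log n is defined
X = rng.choice([-1.0, 1.0], size=len(n)) * np.sqrt(n) / np.log(np.log(n))

# Kolmogorov's series sum E(X_n^2)/n^2 = sum 1/(n (log log n)^2): partial sums keep growing
partial = np.cumsum(1.0 / (n * np.log(np.log(n)) ** 2))
S = np.cumsum(X)
for k in (10**3, 10**5, len(n) - 1):
    print(n[k], partial[k], S[k] / n[k])             # series drifting up; S_n/n drifting to 0
```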
7.6 Infinite divisibility
The weak law of large numbers and the central limit theorem are concerned, respectively, with the convergence in dist. of sums of independent r.v.'s to a degenerate and a normal d.f. It may seem strange that we should be so much occupied with these two apparently unrelated distributions. Let us point out, however, that in terms of ch.f.'s these two may be denoted, respectively, by

$$e^{ait} \quad\text{and}\quad e^{ait - b^2t^2}$$

— exponentials of polynomials of the first and second degree in $(it)$. This explains the considerable similarity between the two cases, as evidenced particularly in Theorems 6.4.3 and 6.4.4.

Now the question arises: what other limiting d.f.'s are there when small independent r.v.'s are added? Specifically, consider the double array (2) in Sec. 7.1, in which independence in each row and holospoudicity are assumed. Suppose that for some sequence of constants $a_n$,

$$S_n - a_n = \sum_{j=1}^{k_n} X_{nj} - a_n$$

converges in dist. to $F$. What is the class of such $F$'s, and when does such a convergence take place? For a single sequence of independent r.v.'s $\{X_j, j\ge 1\}$, similar questions may be posed for the "normed sums" $(S_n - a_n)/b_n$. These questions have been answered completely by the work of Lévy, Khintchine, Kolmogorov, and others; for a comprehensive treatment we refer to the book by Gnedenko and Kolmogorov [12]. Here we must content ourselves with a modest introduction to this important and beautiful subject.

We begin by recalling other cases of the above-mentioned limiting distributions, conveniently displayed by their ch.f.'s:

$$e^{\lambda(e^{it}-1)}, \quad \lambda > 0; \qquad e^{-c|t|^{\alpha}}, \quad 0 < \alpha < 2,\ c > 0.$$

The former is the Poisson distribution; the latter is called the symmetric stable distribution of exponent $\alpha$ (see the discussion in Sec. 6.5), including the Cauchy distribution for $\alpha = 1$. We may include the normal distribution among the latter for $\alpha = 2$.
All these are exponentials and have the further property that their "$n$th roots":

$$e^{ait/n}, \quad e^{(1/n)(ait - b^2t^2)}, \quad e^{(\lambda/n)(e^{it}-1)}, \quad e^{-(c/n)|t|^{\alpha}},$$

are also ch.f.'s. It is remarkable that this simple property already characterizes the class of distributions we are looking for (although we shall prove only part of this fact here).

DEFINITION OF INFINITE DIVISIBILITY. A ch.f. $f$ is called infinitely divisible iff for each integer $n \ge 1$, there exists a ch.f. $f_n$ such that

$$(1)\qquad f = (f_n)^n.$$

In terms of d.f.'s, this becomes in obvious notation:

$$F = F_n^{n*} = \underbrace{F_n * F_n * \cdots * F_n}_{n\text{ factors}}.$$

In terms of r.v.'s this means, for each $n \ge 1$, in a suitable probability space (why "suitable"?): there exist r.v.'s $X$ and $X_{nj}$, $1 \le j \le n$, the latter being independent among themselves, such that $X$ has ch.f. $f$, $X_{nj}$ has ch.f. $f_n$, and

$$(2)\qquad X = \sum_{j=1}^n X_{nj}.$$

$X$ is thus "divisible" into $n$ independent and identically distributed parts, for each $n$. That such an $X$, and only such a one, can be the "limit" of sums of small independent terms as described above seems at least plausible.
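A concrete instance of (2): a Poisson($\lambda$) variable splits into $n$ independent Poisson($\lambda/n$) summands. The sketch below (an illustrative aside, assuming numpy; the parameters are arbitrary) checks the agreement in distribution by comparing empirical ch.f.'s:

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, trials = 2.0, 7, 100_000

X = rng.poisson(lam, size=trials)                             # law with ch.f. f
X_split = rng.poisson(lam / n, size=(trials, n)).sum(axis=1)  # n i.i.d. parts, ch.f. (f_n)^n

for t in (0.5, 1.0, 2.0):                                     # compare empirical ch.f.'s
    print(t, np.exp(1j * t * X).mean(), np.exp(1j * t * X_split).mean())
```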
A vital, though not characteristic, property of an infinitely divisible ch.f. will be given first.

Theorem 7.6.1. An infinitely divisible ch.f. never vanishes (for real $t$).

PROOF. We shall see presently that a complex-valued ch.f. is troublesome when its "$n$th root" is to be extracted. So let us avoid this by going to the real, using the Corollary to Theorem 6.1.4. Let $f$ and $f_n$ be as in (1) and write

$$g = |f|^2, \qquad g_n = |f_n|^2.$$

For each $t \in \mathcal R^1$, $g(t)$ being real and positive, though conceivably vanishing, its real positive $n$th root is uniquely defined; let us denote it by $[g(t)]^{1/n}$. Since by (1) we have

$$g(t) = [g_n(t)]^n,$$

and $g_n(t) \ge 0$, it follows that

$$(3)\qquad \forall t: g_n(t) = [g(t)]^{1/n}.$$

But $0 \le g(t) \le 1$, hence $\lim_{n\to\infty}[g(t)]^{1/n}$ is 0 or 1 according as $g(t) = 0$ or $g(t) \ne 0$. Thus $\lim_{n\to\infty} g_n(t)$ exists for every $t$, and the limit function, say $h(t)$, can take at most the two possible values 0 and 1. Furthermore, since $g$ is continuous at $t = 0$ with $g(0) = 1$, there exists a $t_0 > 0$ such that $g(t) \ne 0$ for $|t| \le t_0$. It follows that $h(t) = 1$ for $|t| \le t_0$. Therefore the sequence of ch.f.'s $g_n$ converges to the function $h$, which has just been shown to be continuous at the origin. By the convergence theorem of Sec. 6.3, $h$ must be a ch.f. and so continuous everywhere. Hence $h$ is identically equal to 1, and so by the remark after (3) we have

$$\forall t: |f(t)|^2 = g(t) \ne 0.$$

The theorem is proved.
Theorem 7.6.1 immediately shows that the uniform distribution on $[-1,1]$ is not infinitely divisible, since its ch.f. is $\sin t/t$, which vanishes for some $t$, although in a literal sense it has infinitely many divisors! (See Exercise 8 of Sec. 6.3.) On the other hand, the ch.f.

$$\frac{2 + \cos t}{3}$$

never vanishes, but for it (1) fails even when $n = 2$; when $n \ge 3$, the failure of (1) for this ch.f. is an immediate consequence of Exercise 7 of Sec. 6.1, if we notice that the corresponding p.m. consists of exactly 3 atoms.

Now that we have proved Theorem 7.6.1, it seems natural to go back to an arbitrary infinitely divisible ch.f. and establish the generalization of (3):

$$f_n(t) = [f(t)]^{1/n}$$

for some "determination" of the multiple-valued $n$th root on the right side. This can be done by a simple process of "continuous continuation" of a complex-valued function of a real variable. Although merely an elementary exercise in "complex variables", it has been treated in the existing literature in a cavalier fashion and then misused or abused. For this reason the main propositions will be spelled out in meticulous detail here.
Theorem 7.6.2. Let a complex-valued function $f$ of the real variable $t$ be given. Suppose that $f(0) = 1$ and that for some $T > 0$, $f$ is continuous in $[-T,T]$ and does not vanish in the interval. Then there exists a unique (single-valued) function $\lambda$ of $t$ in $[-T,T]$ with $\lambda(0) = 0$ that is continuous there and satisfies

$$(4)\qquad f(t) = e^{\lambda(t)}, \qquad -T \le t \le T.$$

The corresponding statement when $[-T,T]$ is replaced by $(-\infty,\infty)$ is also true.
PROOF. Consider the range of $f(t)$, $t \in [-T,T]$; this is a closed set of points in the complex plane. Since it does not contain the origin, we have

$$\inf_{-T\le t\le T} |f(t) - 0| = \rho_T > 0.$$

Next, since $f$ is uniformly continuous in $[-T,T]$, there exists a $\delta_T$, $0 < \delta_T < \rho_T$, such that if $t$ and $t'$ both belong to $[-T,T]$ and $|t - t'| \le \delta_T$, then $|f(t) - f(t')| \le \rho_T/2 \le \frac12$. Now divide $[-T,T]$ into equal parts of length less than $\delta_T$, say:

$$-T = t_{-\ell} < \cdots < t_{-1} < t_0 = 0 < t_1 < \cdots < t_{\ell} = T.$$

For $t_{-1} \le t \le t_1$, we define $\lambda$ as follows:

$$(5)\qquad \lambda(t) = \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j}\{f(t) - 1\}^j.$$

This is a continuous function of $t$ in $[t_{-1}, t_1]$, representing that determination of $\log f(t)$ which equals 0 for $t = 0$. Suppose that $\lambda$ has already been defined in $[t_{-k}, t_k]$; then we define $\lambda$ in $[t_k, t_{k+1}]$ as follows:

$$(6)\qquad \lambda(t) = \lambda(t_k) + \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j}\left(\frac{f(t) - f(t_k)}{f(t_k)}\right)^j;$$

similarly in $[t_{-k-1}, t_{-k}]$ by replacing $t_k$ with $t_{-k}$ everywhere on the right side above. Since we have, for $t_k \le t \le t_{k+1}$,

$$\left|\frac{f(t) - f(t_k)}{f(t_k)}\right| \le \frac{\rho_T}{2}\Big/\rho_T = \frac12,$$

the power series in (6) converges uniformly in $[t_k, t_{k+1}]$ and represents a continuous function there equal to that determination of the logarithm of the function $f(t)/f(t_k)$ which is 0 for $t = t_k$. Specifically, for the "schlicht neighborhood" $|z - 1| \le \frac12$, let

$$(7)\qquad L(z) = \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j}(z - 1)^j$$

be the unique determination of $\log z$ vanishing at $z = 1$. Then (5) and (6) become, respectively:

$$\lambda(t) = L(f(t)), \qquad t_{-1} \le t \le t_1;$$
$$\lambda(t) = \lambda(t_k) + L\left(\frac{f(t)}{f(t_k)}\right), \qquad t_k \le t \le t_{k+1};$$

with a similar expression for $t_{-k-1} \le t \le t_{-k}$. Thus (4) is satisfied in $[t_{-1}, t_1]$, and if it is satisfied for $t = t_k$, then it is satisfied in $[t_k, t_{k+1}]$, since

$$e^{\lambda(t)} = e^{\lambda(t_k) + L(f(t)/f(t_k))} = f(t_k)\,\frac{f(t)}{f(t_k)} = f(t).$$

Thus (4) is satisfied in $[-T,T]$ by induction, and the theorem is proved for such an interval. To prove it for $(-\infty,\infty)$, let us observe that, having defined $\lambda$ in $[-n,n]$, we can extend it to $[-n-1, n+1]$ with the previous method, by dividing $[n, n+1]$, for example, into small equal parts whose length must be chosen dependent on $n$ at each stage (why?). The continuity of $\lambda$ is clear from the construction.

To prove the uniqueness of $\lambda$, suppose that $\lambda'$ has the same properties as $\lambda$. Since both satisfy equation (4), it follows that for each $t$, there exists an integer $m(t)$ such that

$$\lambda(t) - \lambda'(t) = 2\pi i\,m(t).$$

The left side being continuous in $t$, $m(\cdot)$ must be a constant (why?), which must be equal to $m(0) = 0$. Thus $\lambda(t) = \lambda'(t)$.
Remark. It may not be amiss to point out that $\lambda(t)$ is a single-valued function of $t$ but not of $f(t)$; see Exercise 7 below.
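The "continuous continuation" of the proof is easily mechanized: starting from $\lambda(0) = 0$, march along a fine grid and add the principal logarithm of each successive ratio $f(t_j)/f(t_{j-1})$, which stays near 1 when the grid is fine. The sketch below (an aside, assuming numpy; the grid size is an unexamined assumption) does this for the "trivial" case $f(t) = e^{ait}$ of Exercise 7 and recovers $\lambda(t) = ait$, while the principal logarithm of $f(t)$ itself wraps around:

```python
import numpy as np

a, T, steps = 3.0, 5.0, 10_000
ts = np.linspace(0.0, T, steps + 1)
vals = np.exp(1j * a * ts)                    # f(t) = e^{ait}, Exercise 7's trivial case

# continuous continuation: lambda(0) = 0, then accumulate principal logs of the ratios
lam = np.concatenate(([0.0 + 0.0j], np.cumsum(np.log(vals[1:] / vals[:-1]))))

print(lam[-1], 1j * a * T)   # both ~ 15j: the distinguished logarithm is ait
print(np.log(vals[-1]))      # the principal logarithm wraps into (-pi, pi] instead
```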
Theorem 7.6.3. For a fixed $T$, let each ${}_kf$, $k \ge 1$, as well as $f$, satisfy the conditions for $f$ in Theorem 7.6.2, and denote the corresponding $\lambda$ by ${}_k\lambda$. Suppose that ${}_kf$ converges uniformly to $f$ in $[-T,T]$; then ${}_k\lambda$ converges uniformly to $\lambda$ in $[-T,T]$.

PROOF. Let $L$ be as in (7); then there exists a $\delta$, $0 < \delta < \frac12$, such that

$$|L(z)| \le 1 \quad\text{if } |z - 1| \le \delta.$$

By the hypothesis of uniformity, there exists $k_1(T)$ such that if $k \ge k_1(T)$, then we have

$$(8)\qquad \sup_{|t|\le T}\left|\frac{{}_kf(t)}{f(t)} - 1\right| \le \delta,$$

and consequently

$$(9)\qquad \sup_{|t|\le T}\left|L\left(\frac{{}_kf(t)}{f(t)}\right)\right| \le 1.$$

Since for each $t$, the exponentials of ${}_k\lambda(t) - \lambda(t)$ and $L({}_kf(t)/f(t))$ are equal, there exists an integer-valued function ${}_km(t)$, $|t| \le T$, such that

$$(10)\qquad L\left(\frac{{}_kf(t)}{f(t)}\right) = {}_k\lambda(t) - \lambda(t) + 2\pi i\,{}_km(t), \qquad |t| \le T.$$

Since $L$ is continuous in $|z - 1| \le \delta$, it follows that ${}_km(\cdot)$ is continuous in $|t| \le T$. Since it is integer-valued and equals 0 at $t = 0$, it is identically zero. Thus (10) reduces to

$$(11)\qquad {}_k\lambda(t) - \lambda(t) = L\left(\frac{{}_kf(t)}{f(t)}\right), \qquad |t| \le T.$$

The function $L$ being continuous at $z = 1$, the uniform convergence of ${}_kf/f$ to 1 in $[-T,T]$ implies that of ${}_k\lambda - \lambda$ to 0, as asserted by the theorem.
Thanks to Theorem 7.6.1, Theorem 7.6.2 is applicable to each infinitely divisible ch.f. $f$ in $(-\infty,\infty)$. Henceforth we shall call the corresponding $\lambda$ the distinguished logarithm, and $e^{\lambda(t)/n}$ the distinguished $n$th root of $f$. We can now extract the correct $n$th root in (1) above.

Theorem 7.6.4. For each $n$, the $f_n$ in (1) is just the distinguished $n$th root of $f$.

PROOF. It follows from Theorem 7.6.1 and (1) that the ch.f. $f_n$ never vanishes in $(-\infty,\infty)$, hence its distinguished logarithm $\lambda_n$ is defined. Taking multiple-valued logarithms in (1), we obtain as in (10):

$$\forall t: \lambda(t) - n\lambda_n(t) = 2\pi i\,m_n(t),$$

where $m_n(\cdot)$ takes only integer values. We conclude as before that $m_n(\cdot) \equiv 0$, and consequently

$$(12)\qquad f_n(t) = e^{\lambda_n(t)} = e^{\lambda(t)/n},$$

as asserted.

Corollary. If $f$ is a positive infinitely divisible ch.f., then for every $t$ the $f_n(t)$ in (1) is just the real positive $n$th root of $f(t)$.

PROOF. Elementary analysis shows that the real-valued logarithm of a real number $x$ in $(0,\infty)$ is a continuous function of $x$. It follows that this is a continuous solution of equation (4) in $(-\infty,\infty)$. The uniqueness assertion in Theorem 7.6.2 then identifies it with the distinguished logarithm of $f$, and the corollary follows, since the real positive $n$th root is the exponential of $1/n$ times the real logarithm.
As an immediate consequence of (12), we have

$$\forall t: \lim_{n\to\infty} f_n(t) = 1.$$

Thus by Theorem 7.1.1 the double array $\{X_{nj}, 1 \le j \le n, 1 \le n\}$ giving rise to (2) is holospoudic. We have therefore proved that each infinitely divisible distribution can be obtained as the limiting distribution of $S_n = \sum_{j=1}^n X_{nj}$ in such an array.

It is trivial that the product of two infinitely divisible ch.f.'s is again such a one, for we have in obvious notation:

$${}_1f\cdot{}_2f = ({}_1f_n)^n\cdot({}_2f_n)^n = ({}_1f_n\cdot{}_2f_n)^n.$$

The next proposition lies deeper.
Theorem 7.6.5. Let $\{{}_kf, k \ge 1\}$ be a sequence of infinitely divisible ch.f.'s converging everywhere to the ch.f. $f$. Then $f$ is infinitely divisible.

PROOF. The difficulty is to prove first that $f$ never vanishes. Consider, as in the proof of Theorem 7.6.1: $g = |f|^2$, ${}_kg = |{}_kf|^2$. For each $n > 1$, let $x^{1/n}$ denote the real positive $n$th root of a real positive $x$. Then we have, by the hypothesis of convergence and the continuity of $x^{1/n}$ as a function of $x$,

$$(13)\qquad \forall t: [{}_kg(t)]^{1/n} \to [g(t)]^{1/n}.$$

By the Corollary to Theorem 7.6.4, the left member in (13) is a ch.f. The right member is continuous everywhere. It follows from the convergence theorem for ch.f.'s that $[g(\cdot)]^{1/n}$ is a ch.f. Since $g$ is its $n$th power, and this is true for each $n \ge 1$, we have proved that $g$ is infinitely divisible and so never vanishes. Hence $f$ never vanishes and has a distinguished logarithm $\lambda$ defined everywhere. Let that of ${}_kf$ be ${}_k\lambda$. Since the convergence of ${}_kf$ to $f$ is necessarily uniform in each finite interval (see Sec. 6.3), it follows from Theorem 7.6.3 that ${}_k\lambda \to \lambda$ everywhere, and consequently

$$(14)\qquad \exp({}_k\lambda(t)/n) \to \exp(\lambda(t)/n)$$

for every $t$. The left member in (14) is a ch.f. by Theorem 7.6.4, and the right member is continuous by the definition of $\lambda$. Hence it follows as before that $e^{\lambda(t)/n}$ is a ch.f. and $f$ is infinitely divisible.

The following alternative proof is interesting. There exists a $\delta > 0$ such that $f$ does not vanish for $|t| \le \delta$, hence $\lambda$ is defined in this interval. For each $n$, (14) holds uniformly in this interval by Theorem 7.6.3. By Exercise 6 of Sec. 6.3, this is sufficient to ensure the existence of a subsequence from $\{\exp({}_k\lambda(t)/n), k \ge 1\}$ converging everywhere to some ch.f. $\varphi_n$. The $n$th power of this subsequence then converges to $(\varphi_n)^n$; but being a subsequence of $\{{}_kf\}$ it also converges to $f$. Hence $f = (\varphi_n)^n$, and we conclude again that $f$ is infinitely divisible.
Using the preceding theorem, we can construct a wide class of ch.f.'s that are infinitely divisible. For each $a > 0$ and real $u$, the function

$$(15)\qquad P(t; a, u) = e^{a(e^{iut} - 1)}$$

is an infinitely divisible ch.f., since it is obtained from the Poisson ch.f. with parameter $a$ by substituting $ut$ for $t$. We shall call such a ch.f. a generalized Poisson ch.f. A finite product of these:

$$(16)\qquad \prod_{j=1}^k P(t; a_j, u_j) = \exp\left[\sum_{j=1}^k a_j(e^{itu_j} - 1)\right]$$

is then also infinitely divisible. Now if $G$ is any bounded increasing function, the integral $\int_{-\infty}^{\infty}(e^{itu} - 1)\,dG(u)$ may be approximated by sums of the kind appearing as exponent in the right member of (16), for all $t$ in $\mathcal R^1$ and indeed uniformly so in every finite interval (why?). It follows that for each such $G$, the function

$$(17)\qquad f(t) = \exp\left[\int_{-\infty}^{\infty}(e^{itu} - 1)\,dG(u)\right]$$

is an infinitely divisible ch.f. Now it turns out that although this falls somewhat short of being the most general form of an infinitely divisible ch.f., we have nevertheless the following qualitative result, which is a complete generalization of (16).
Theorem 7.6.6. For each infinitely divisible ch.f. $f$, there exists a double array of pairs of real constants $(a_{nj}, u_{nj})$, $1 \le j \le k_n$, $1 \le n$, where $a_{nj} > 0$, such that

$$(18)\qquad f(t) = \lim_{n\to\infty}\prod_{j=1}^{k_n} P(t; a_{nj}, u_{nj}).$$

The converse is also true. Thus the class of infinitely divisible d.f.'s coincides with the closure, with respect to vague convergence, of convolutions of a finite number of generalized Poisson d.f.'s.

PROOF. Let $f$ and $f_n$ be as in (1) and let $\lambda$ be the distinguished logarithm of $f$, $F_n$ the d.f. corresponding to $f_n$. We have for each $t$, as $n\to\infty$:

$$n[f_n(t) - 1] = n[e^{\lambda(t)/n} - 1] \to \lambda(t),$$

and consequently

$$(19)\qquad e^{n[f_n(t)-1]} \to e^{\lambda(t)} = f(t).$$

Actually the first member in (19) is a ch.f. by Theorem 6.5.6, so that the convergence is uniform in each finite interval, but this fact alone is neither necessary nor sufficient for what follows. We have

$$n[f_n(t) - 1] = \int_{-\infty}^{\infty}(e^{itu} - 1)\,n\,dF_n(u).$$

For each $n$, $nF_n$ is a bounded increasing function, hence there exists

$$\{a_{nj}, u_{nj};\ 1 \le j \le k_n\},$$

where $-\infty < u_{n1} < u_{n2} < \cdots < u_{nk_n} < \infty$ and $a_{nj} = n[F_n(u_{n,j}) - F_n(u_{n,j-1})]$, such that

$$(20)\qquad \sup_{|t|\le n}\left|\sum_{j=1}^{k_n}(e^{itu_{nj}} - 1)a_{nj} - \int_{-\infty}^{\infty}(e^{itu} - 1)\,n\,dF_n(u)\right| \le \frac{1}{n}.$$

(Which theorem in Chapter 6 implies this?) Taking exponentials and using the elementary inequality $|e^z - e^{z'}| \le |e^z|(e^{|z-z'|} - 1)$, we conclude that as $n\to\infty$,

$$(21)\qquad \sup_{|t|\le n}\left|e^{n[f_n(t)-1]} - \prod_{j=1}^{k_n} P(t; a_{nj}, u_{nj})\right| = O\left(\frac{1}{n}\right).$$

This and (19) imply (18). The converse is proved at once by Theorem 7.6.5.
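The approximation used in the proof can be imitated numerically: discretize $n\,dF_n$ into atoms $a_{nj}$ at points $u_{nj}$ and multiply the corresponding generalized Poisson ch.f.'s. The sketch below is an illustrative aside (assuming numpy; the choice $f(t) = e^{-t^2/2}$, for which $F_n$ is the normal d.f. with variance $1/n$, and the grid are arbitrary):

```python
import numpy as np

n = 200                                   # the n of (1): f = (f_n)^n
u = np.linspace(-1.0, 1.0, 4001)          # atoms u_{nj}
du = u[1] - u[0]

# f(t) = e^{-t^2/2}; f_n is the N(0, 1/n) ch.f., so n dF_n has the density below
dens = np.sqrt(n / (2 * np.pi)) * np.exp(-n * u ** 2 / 2)
a = n * dens * du                         # a_{nj}, approximating n dF_n(u_{nj})

for t in (0.5, 1.0, 2.0):
    prod = np.exp(np.sum(a * (np.exp(1j * t * u) - 1.0)))  # product of P(t; a_nj, u_nj)
    print(t, prod, np.exp(-t * t / 2))                     # close agreement
```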
We are now in a position to state the fundamental theorem on infinitely divisible ch.f.'s, due to P. Lévy and Khintchine.

Theorem 7.6.7. Every infinitely divisible ch.f. $f$ has the following canonical representation:

$$f(t) = \exp\left[ait + \int_{-\infty}^{\infty}\left(e^{itu} - 1 - \frac{itu}{1+u^2}\right)\frac{1+u^2}{u^2}\,dG(u)\right],$$

where $a$ is a real constant, $G$ is a bounded increasing function in $(-\infty,\infty)$, and the integrand is defined by continuity to be $-t^2/2$ at $u = 0$. Furthermore, the class of infinitely divisible ch.f.'s coincides with the class of limiting ch.f.'s of $\sum_{j=1}^{k_n} X_{nj} - a_n$ in a holospoudic double array

$$\{X_{nj},\ 1 \le j \le k_n,\ 1 \le n\},$$

where $k_n \to \infty$ and for each $n$, the r.v.'s $\{X_{nj}, 1 \le j \le k_n\}$ are independent.

Note that we have proved above that every infinitely divisible ch.f. is in the class of limiting ch.f.'s described here, although we did not establish the canonical representation. Note also that if the hypothesis of "holospoudicity" is omitted, then every ch.f. is such a limit, trivially (why?). For a complete proof of the theorem, various special cases, and further developments, see the book by Gnedenko and Kolmogorov [12].
Let us end this section with an interesting example. Put $s = \sigma + it$, $\sigma > 1$ and $t$ real; consider the Riemann zeta function:

$$\zeta(s) = \sum_{n=1}^{\infty}\frac{1}{n^s} = \prod_p\left(1 - \frac{1}{p^s}\right)^{-1},$$

where $p$ ranges over all prime numbers. Fix $\sigma > 1$ and define

$$f(t) = \frac{\zeta(\sigma + it)}{\zeta(\sigma)}.$$

We assert that $f$ is an infinitely divisible ch.f. For each $p$ and every real $t$, the complex number $1 - p^{-\sigma-it}$ lies within the circle $\{z: |z - 1| < \frac12\}$. Let $\log z$ denote that determination of the logarithm with an angle in $(-\pi,\pi]$. By looking at the angles, we see that

$$\log\frac{1 - p^{-\sigma}}{1 - p^{-\sigma-it}} = \log(1 - p^{-\sigma}) - \log(1 - p^{-\sigma-it}) = \sum_{m=1}^{\infty}\frac{1}{mp^{m\sigma}}(e^{-imt\log p} - 1) = \sum_{m=1}^{\infty}\log P(t; m^{-1}p^{-m\sigma}, -m\log p).$$

Since

$$f(t) = \lim_{n\to\infty}\prod_{p\le n}\prod_{m=1}^{\infty} P(t; m^{-1}p^{-m\sigma}, -m\log p),$$

it follows that $f$ is an infinitely divisible ch.f.

So far as known, this famous relationship between two "big names" has produced no important issue.
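The assertion is easy to test numerically by truncating both the Euler product and the series over $m$; the two sides then agree prime by prime. A minimal sketch (an aside, assuming numpy; the truncation points are arbitrary):

```python
import numpy as np

primes = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47]
sigma, t = 2.0, 1.5

# zeta(sigma + it)/zeta(sigma), both by the (truncated) Euler product
num = np.prod([1.0 / (1.0 - p ** (-(sigma + 1j * t))) for p in primes])
den = np.prod([1.0 / (1.0 - p ** (-sigma)) for p in primes])

# exponent of the product of P(t; m^{-1} p^{-m sigma}, -m log p)
expo = sum(
    (np.exp(-1j * m * t * np.log(p)) - 1.0) / (m * p ** (m * sigma))
    for p in primes
    for m in range(1, 40)
)
print(num / den, np.exp(expo))   # agreement to near machine precision
```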
EXERCISES

1. Is the convex combination of infinitely divisible ch.f.'s also infinitely divisible?
2. If $f$ is an infinitely divisible ch.f. and $\lambda$ its distinguished logarithm, $r > 0$, then the $r$th power of $f$ is defined to be $e^{r\lambda(t)}$. Prove that for each $r > 0$ it is an infinitely divisible ch.f.
*3. Let $f$ be a ch.f. such that there exists a sequence of positive integers $n_k$ going to infinity and a sequence of ch.f.'s $\varphi_k$ satisfying $f = (\varphi_k)^{n_k}$; then $f$ is infinitely divisible.
4. Give another proof that the right member of (17) is an infinitely divisible ch.f. by using Theorem 6.5.6.
5. Show that $f(t) = (1-b)/(1-be^{it})$, $0 < b < 1$, is an infinitely divisible ch.f. [HINT: Use canonical form.]
6. Show that the d.f. with density $\beta^{\alpha}\Gamma(\alpha)^{-1}x^{\alpha-1}e^{-\beta x}$, $\alpha > 0$, $\beta > 0$, in $(0,\infty)$, and 0 otherwise, is infinitely divisible.
7. Carry out the proof of Theorem 7.6.2 specifically for the "trivial" but instructive case $f(t) = e^{ait}$, where $a$ is a fixed real number.
*8. Give an example to show that in Theorem 7.6.3, if the uniformity of convergence of ${}_kf$ to $f$ is omitted, then ${}_k\lambda$ need not converge to $\lambda$. [HINT: ${}_kf(t) = \exp\{2\pi i|kt|(1 + |kt|)^{-1}\}$.]
9. Let $f(t) = 1 - t$, $f_k(t) = 1 - t + (-1)^k itk^{-1}$, $0 \le t \le \sqrt2$, $k \ge 1$. Then $f_k$ never vanishes and converges uniformly to $f$ in $[0,\sqrt2]$. Let $\sqrt{f_k}$ denote the distinguished square root of $f_k$ in $[0,\sqrt2]$. Show that $\sqrt{f_k}$ does not converge in any neighborhood of $t = 1$. Why is Theorem 7.6.3 not applicable? [This example is supplied by E. Reich.]
*10. Some writers have given the proof of Theorem 7.6.6 by apparently considering each fixed $t$ and using an analogue of (21) without the "$\sup_{|t|\le n}$" there. Criticize this "quick proof". [HINT: Show that the two relations

$$\forall t \text{ and } \forall n: \lim_{m\to\infty} u_{mn}(t) = u_{\cdot n}(t),$$
$$\forall t: \lim_{n\to\infty} u_{\cdot n}(t) = u(t),$$

do not imply the existence of a sequence $\{m_n\}$ such that

$$\forall t: \lim_{n\to\infty} u_{m_n n}(t) = u(t).$$

Indeed, they do not even imply the existence of two subsequences $\{m_{\nu}\}$ and $\{n_{\nu}\}$ such that

$$\forall t: \lim_{\nu\to\infty} u_{m_{\nu}n_{\nu}}(t) = u(t).$$

Thus the extension of Lemma 1 in Sec. 7.2 is false.]

The three "counterexamples" in Exercises 7 to 9 go to show that the cavalierism alluded to above is not to be shrugged off easily.

11. Strengthening Theorem 6.5.5, show that two infinitely divisible ch.f.'s may coincide in a neighborhood of 0 without being identical.
*12. Reconsider Exercise 17 of Sec. 6.4 and try to apply Theorem 7.6.3. [HINT: The latter is not immediately applicable, owing to the lack of uniform convergence. However, show first that if $e^{c_n it}$ converges for $t \in A$, where $m(A) > 0$, then it converges for all $t$. This follows from a result due to Steinhaus, asserting that the difference set $A - A$ contains a neighborhood of 0 (see, e.g., Halmos [4, p. 68]), and from the equation $e^{c_n it}e^{c_n it'} = e^{c_n i(t+t')}$. Let $\{b_n\}$, $\{b_n'\}$ be any two subsequences of $\{c_n\}$; then $e^{(b_n - b_n')it} \to 1$ for all $t$. Since 1 is a ch.f., the convergence is uniform in every finite interval by the convergence theorem for ch.f.'s. Alternatively, if

$$\varphi(t) = \lim_{n\to\infty} e^{c_n it},$$

then $\varphi$ satisfies Cauchy's functional equation and must be of the form $e^{cit}$, which is a ch.f. These approaches are fancier than the simple one indicated in the hint for the said exercise, but they are interesting. There is no known quick proof by "taking logarithms", as some authors have done.]
Bibliographical Note
The most comprehensive treatment of the material in Secs. 7.1, 7.2, and 7.6 is by
Gnedenko and Kolmogorov [12]. In this as well as nearly all other existing books
on the subject, the handling of logarithms must be strengthened by the discussion in
Sec. 7.6.
For an extensive development of Lindeberg’s method (the operator approach) to
infinitely divisible laws, see Feller [13, vol. 2].
Theorem 7.3.2 together with its proof as given here is implicit in
W. Doeblin, Sur deux problèmes de M. Kolmogoroff concernant les chaines
dénombrables, Bull. Soc. Math. France 66 (1938), 210–220.
It was rediscovered by F. J. Anscombe. For the extension to the case where the constant
c in (3) of Sec. 7.3 is replaced by an arbitrary r.v., see H. Wittenberg, Limiting distribu-
tions of random sums of independent random variables, Z. Wahrscheinlichkeitstheorie
1 (1964), 7–18.
Theorem 7.3.3 is contained in
P. Erdös and M. Kac, On certain limit theorems of the theory of probability, Bull.
Am. Math. Soc. 52 (1946), 292–302.
The proof given for Theorem 7.4.1 is based on
P. L. Hsu, The approximate distributions of the mean and variance of a sample of
independent variables, Ann. Math. Statistics 16 (1945), 1–29.
This paper contains historical references as well as further extensions.
For the law of the iterated logarithm, see the classic
A. N. Kolmogorov, Über das Gesetz des iterierten Logarithmus, Math. Annalen
101 (1929), 126–136.
For a survey of “classical limit theorems” up to 1945, see
W. Feller, The fundamental limit theorems in probability, Bull. Am. Math. Soc.
51 (1945), 800–832.
Kai Lai Chung, On the maximum partial sums of sequences of independent random
variables. Trans. Am. Math. Soc. 64 (1948), 205–233.
Infinitely divisible laws are treated in Chapter 7 of Lévy [11] as the analytical
counterpart of his full theory of additive processes, or processes with independent
increments (later supplemented by J. L. Doob and K. Ito).
New developments in limit theorems arose from connections with the Brownian
motion process in the works by Skorohod and Strassen. For an exposition see references
[16] and [22] of General Bibliography.
8 Random walk
8.1 Zero-or-one laws
In this chapter we adopt the notation $N$ for the set of strictly positive integers, and $N^0$ for the set of nonnegative integers; used as an index set, each is endowed with the natural ordering and interpreted as a discrete time parameter. Similarly, for each $n \in N$, $N_n$ denotes the ordered set of integers from 1 to $n$ (both inclusive); $N_n^0$ that of integers from 0 to $n$ (both inclusive); and $N_n'$ that of integers beginning with $n+1$.

On the probability triple $(\Omega, \mathcal F, P)$, a sequence $\{X_n, n \in N\}$, where each $X_n$ is an r.v. (defined on $\Omega$ and finite a.e.), will be called a (discrete parameter) stochastic process. Various Borel fields connected with such a process will now be introduced. For any sub-B.F. $\mathcal G$ of $\mathcal F$, we shall write

$$(1)\qquad X \in \mathcal G$$

and use the expression "$X$ belongs to $\mathcal G$" or "$\mathcal G$ contains $X$" to mean that $X^{-1}(\mathcal B^1) \subset \mathcal G$ (see Sec. 3.1 for notation): in the customary language $X$ is said to be "measurable with respect to $\mathcal G$". For each $n \in N$, we define two B.F.'s as follows:
$\mathcal F_n$ = the augmented B.F. generated by the family of r.v.'s $\{X_k, k \in N_n\}$; that is, $\mathcal F_n$ is the smallest B.F. containing all $X_k$ in the family and all null sets;

$\mathcal F_n'$ = the augmented B.F. generated by the family of r.v.'s $\{X_k, k \in N_n'\}$.

Recall that the union $\bigcup_{n=1}^{\infty}\mathcal F_n$ is a field but not necessarily a B.F. The smallest B.F. containing it, or equivalently containing every $\mathcal F_n$, $n \in N$, is denoted by

$$\mathcal F_{\infty} = \bigvee_{n=1}^{\infty}\mathcal F_n;$$

it is the B.F. generated by the stochastic process $\{X_n, n \in N\}$. On the other hand, the intersection $\bigcap_{n=1}^{\infty}\mathcal F_n'$ is a B.F., denoted also by $\bigwedge_{n=1}^{\infty}\mathcal F_n'$. It will be called the remote field of the stochastic process and a member of it a remote event.

Since $\mathcal F_{\infty} \subset \mathcal F$, $P$ is defined on $\mathcal F_{\infty}$. For the study of the process $\{X_n, n \in N\}$ alone, it is sufficient to consider the reduced triple $(\Omega, \mathcal F_{\infty}, P|_{\mathcal F_{\infty}})$. The following approximation theorem is fundamental.
Theorem 8.1.1. Given $\epsilon > 0$ and $\Lambda \in \mathcal F_{\infty}$, there exists $\Lambda_{\epsilon} \in \bigcup_{n=1}^{\infty}\mathcal F_n$ such that

$$(2)\qquad P(\Lambda \triangle \Lambda_{\epsilon}) \le \epsilon.$$

PROOF. Let $\mathcal G$ be the collection of sets $\Lambda$ for which the assertion of the theorem is true. Suppose $\Lambda_k \in \mathcal G$ for each $k \in N$ and $\Lambda_k \uparrow \Lambda$ or $\Lambda_k \downarrow \Lambda$. Then $\Lambda$ also belongs to $\mathcal G$, as we can easily see by first taking $k$ large and then applying the asserted property to $\Lambda_k$. Thus $\mathcal G$ is a monotone class. Since it is trivial that $\mathcal G$ contains the field $\bigcup_{n=1}^{\infty}\mathcal F_n$ that generates $\mathcal F_{\infty}$, $\mathcal G$ must contain $\mathcal F_{\infty}$ by the Corollary to Theorem 2.1.2, proving the theorem.

Without using Theorem 2.1.2, one can verify that $\mathcal G$ is closed with respect to complementation (trivial), finite union (by Exercise 1 of Sec. 2.1), and countable union (as increasing limit of finite unions). Hence $\mathcal G$ is a B.F. that must contain $\mathcal F_{\infty}$.
It will be convenient to use a particular type of sample space $\Omega$. In the notation of Sec. 3.4, let

$$\Omega = \mathop{\times}_{n=1}^{\infty}\Omega_n,$$

where each $\Omega_n$ is a "copy" of the real line $\mathcal R^1$. Thus $\Omega$ is just the space of all infinite sequences of real numbers. A point $\omega$ will be written as $\{\omega_n, n \in N\}$, and $\omega_n$ as a function of $\omega$ will be called the $n$th coordinate (function) of $\omega$. Each $\Omega_n$ is endowed with the Euclidean B.F. $\mathcal B^1$, and the product Borel field $\mathcal F$ ($= \mathcal F_{\infty}$ in the notation above) on $\Omega$ is defined to be the B.F. generated by the finite-product sets of the form

$$(3)\qquad \bigcap_{j=1}^k\{\omega: \omega_{n_j} \in B_{n_j}\},$$

where $(n_1, \ldots, n_k)$ is an arbitrary finite subset of $N$ and where each $B_{n_j} \in \mathcal B^1$. In contrast to the case discussed in Sec. 3.3, however, no restriction is made on the p.m. $P$ on $\mathcal F$. We shall not enter here into the matter of a general construction of $P$. The Kolmogorov extension theorem (Theorem 3.3.6) asserts that on the concrete space-field $(\Omega, \mathcal F)$ just specified, there exists a p.m. $P$ whose projection on each finite-dimensional subspace may be arbitrarily preassigned, subject only to consistency (where one such subspace contains the other). Theorem 3.3.4 is a particular case of this theorem.

In this chapter we shall use the concrete probability space above, of the so-called "function space type", to simplify the exposition, but all the results below remain valid without this specification. This follows from a measure-preserving homomorphism between an abstract probability space and one of the function-space type; see Doob [17, chap. 2].

The chief advantage of this concrete representation of $\Omega$ is that it enables us to define an important mapping on the space.

DEFINITION OF THE SHIFT. The shift $\tau$ is a mapping of $\Omega$ such that

$$\tau: \omega = \{\omega_n, n \in N\} \to \tau\omega = \{\omega_{n+1}, n \in N\};$$

in other words, the image of a point has as its $n$th coordinate the $(n+1)$st coordinate of the original point.
Clearly $\tau$ is an $\infty$-to-1 mapping and it is from $\Omega$ onto $\Omega$. Its iterates are defined as usual by composition: $\tau^0 = $ identity, $\tau^k = \tau\circ\tau^{k-1}$ for $k \ge 1$. It induces a direct set mapping $\tau$ and an inverse set mapping $\tau^{-1}$ according to the usual definitions. Thus

$$\tau^{-1}\Lambda = \{\omega: \tau\omega \in \Lambda\},$$

and $\tau^{-n}$ is the $n$th iterate of $\tau^{-1}$. If $\Lambda$ is the set in (3), then

$$(4)\qquad \tau^{-1}\Lambda = \bigcap_{j=1}^k\{\omega: \omega_{n_j+1} \in B_{n_j}\}.$$

It follows from this that $\tau^{-1}$ maps $\mathcal F$ into $\mathcal F$; more precisely,

$$\forall\Lambda \in \mathcal F:\ \tau^{-n}\Lambda \in \mathcal F_n', \quad n \in N,$$

where $\mathcal F_n'$ is the Borel field generated by $\{\omega_k, k > n\}$. This is obvious by (4) if $\Lambda$ is of the form above, and since the class of $\Lambda$ for which the assertion holds is a B.F., the result is true in general.
DEFINITION. A set $\Lambda$ in $\mathcal F$ is called invariant (under the shift) iff $\Lambda = \tau^{-1}\Lambda$. An r.v. $Y$ on $\Omega$ is invariant iff $Y(\tau\omega) = Y(\omega)$ for every $\omega \in \Omega$.

Observe the following general relation, valid for each point mapping $\tau$ and the associated inverse set mapping $\tau^{-1}$, each function $Y$ on $\Omega$ and each subset $A$ of $\mathcal R^1$:

$$(5)\qquad \tau^{-1}\{\omega: Y(\omega) \in A\} = \{\omega: Y(\tau\omega) \in A\}.$$

This follows from $\tau^{-1}\circ Y^{-1} = (Y\circ\tau)^{-1}$.

We shall need another kind of mapping of $\Omega$. A permutation on $N_n$ is a 1-to-1 mapping of $N_n$ to itself, denoted as usual by

$$\begin{pmatrix} 1, & 2, & \ldots, & n\\ \sigma(1), & \sigma(2), & \ldots, & \sigma(n)\end{pmatrix}.$$

The collection of such mappings forms a group with respect to composition. A finite permutation on $N$ is by definition a permutation on a certain "initial segment" $N_n$ of $N$. Given such a permutation $\sigma$ as shown above, we define $\sigma\omega$ to be the point in $\Omega$ whose coordinates are obtained from those of $\omega$ by the corresponding permutation, namely

$$(\sigma\omega)_j = \begin{cases} \omega_{\sigma(j)}, & \text{if } j \in N_n;\\ \omega_j, & \text{if } j \in N_n'.\end{cases}$$

As usual, $\sigma$ induces a direct set mapping $\sigma$ and an inverse set mapping $\sigma^{-1}$, the latter being also the direct set mapping induced by the "group inverse" $\sigma^{-1}$ of $\sigma$. In analogy with the preceding definition we have the following.

DEFINITION. A set $\Lambda$ in $\mathcal F$ is called permutable iff $\Lambda = \sigma\Lambda$ for every finite permutation $\sigma$ on $N$. A function $Y$ on $\Omega$ is permutable iff $Y(\omega) = Y(\sigma\omega)$ for every finite permutation $\sigma$ and every $\omega \in \Omega$.

It is fairly obvious that an invariant set is remote and a remote set is permutable; also that each of the collections: all invariant events, all remote events, all permutable events, forms a sub-B.F. of $\mathcal F$. If each of these B.F.'s is augmented (see Exercise 20 of Sec. 2.2), the resulting augmented B.F.'s will be called "almost invariant", "almost remote", and "almost permutable", respectively. Finally, the collection of all sets in $\mathcal F$ of probability either 0 or 1 clearly forms a B.F., which may be called the "all-or-nothing" field. This B.F. will also be referred to as "almost trivial".
Now that we have defined all these concepts for the general stochastic process, it must be admitted that they are not very useful without further specifications of the process. We proceed at once to the particular case below.

DEFINITION. A sequence of independent r.v.'s will be called an independent process; it is called a stationary independent process iff the r.v.'s have a common distribution.

Aspects of this type of process have been our main object of study, usually under additional assumptions on the distributions. Having christened it, we shall henceforth focus our attention on "the evolution of the process as a whole" — whatever this phrase may mean. For this type of process, the specific probability triple described above has been constructed in Sec. 3.3. Indeed, $\mathcal F = \mathcal F_{\infty}$, and the sequence of independent r.v.'s is just that of the successive coordinate functions $\{\omega_n, n \in N\}$, which, however, will also be interchangeably denoted by $\{X_n, n \in N\}$. If $\varphi$ is any Borel measurable function, then $\{\varphi(X_n), n \in N\}$ is another such process.

The following result is called Kolmogorov's "zero-or-one law".

Theorem 8.1.2. For an independent process, each remote event has probability zero or one.

PROOF. Let $\Lambda \in \bigcap_{n=1}^{\infty}\mathcal F_n'$ and suppose that $P(\Lambda) > 0$; we are going to prove that $P(\Lambda) = 1$. Since $\mathcal F_n$ and $\mathcal F_n'$ are independent fields (see Exercise 5 of Sec. 3.3), $\Lambda$ is independent of every set in $\mathcal F_n$ for each $n \in N$; namely, if $M \in \bigcup_{n=1}^{\infty}\mathcal F_n$, then

$$(6)\qquad P(\Lambda \cap M) = P(\Lambda)P(M).$$

If we set

$$P_{\Lambda}(M) = \frac{P(\Lambda \cap M)}{P(\Lambda)}$$

for $M \in \mathcal F$, then $P_{\Lambda}(\cdot)$ is clearly a p.m. (the conditional probability relative to $\Lambda$; see Chapter 9). By (6) it coincides with $P$ on $\bigcup_{n=1}^{\infty}\mathcal F_n$ and consequently also on $\mathcal F_{\infty}$ by Theorem 2.2.3. Hence we may take $M$ to be $\Lambda$ in (6) to conclude that $P(\Lambda) = P(\Lambda)^2$, whence $P(\Lambda) = 1$.

The usefulness of the notions of shift and permutation in a stationary independent process is based on the next result, which says that both $\tau^{-1}$ and $\sigma$ are "measure-preserving".
Theorem 8.1.3. For a stationary independent process, if $\Lambda \in \mathcal F$ and $\sigma$ is any finite permutation, we have

$$(7)\qquad P(\tau^{-1}\Lambda) = P(\Lambda);$$
$$(8)\qquad P(\sigma\Lambda) = P(\Lambda).$$

PROOF. Define a set function $\tilde P$ on $\mathcal F$ as follows:

$$\tilde P(\Lambda) = P(\tau^{-1}\Lambda).$$

Since $\tau^{-1}$ maps disjoint sets into disjoint sets, it is clear that $\tilde P$ is a p.m. For a finite-product set $\Lambda$, such as the one in (3), it follows from (4) that

$$\tilde P(\Lambda) = \prod_{j=1}^k \mu(B_{n_j}) = P(\Lambda).$$

Hence $P$ and $\tilde P$ coincide also on each set that is the union of a finite number of such disjoint sets, and so on the B.F. $\mathcal F$ generated by them, according to Theorem 2.2.3. This proves (7); (8) is proved similarly.
The following companion to Theorem 8.1.2, due to Hewitt and Savage, is very useful.

Theorem 8.1.4. For a stationary independent process, each permutable event has probability zero or one.

PROOF. Let $\Lambda$ be a permutable event. Given $\epsilon > 0$, we may choose $\epsilon_k > 0$ so that

$$\sum_{k=1}^{\infty}\epsilon_k \le \epsilon.$$

By Theorem 8.1.1, there exists $\Lambda_k \in \mathcal F_{n_k}$ such that $P(\Lambda \triangle \Lambda_k) \le \epsilon_k$, and we may suppose that $n_k \uparrow \infty$. Let

$$\sigma = \begin{pmatrix} 1, \ldots, n_k, & n_k+1, \ldots, 2n_k\\ n_k+1, \ldots, 2n_k, & 1, \ldots, n_k\end{pmatrix}$$

and $M_k = \sigma\Lambda_k$. Then clearly $M_k \in \mathcal F_{n_k}'$. Since $\Lambda$ is permutable, $\sigma\Lambda = \Lambda$ and so $\Lambda \triangle M_k = \sigma(\Lambda \triangle \Lambda_k)$; it follows from (8) that

$$P(\Lambda \triangle M_k) \le \epsilon_k.$$

For any sequence of sets $\{E_k\}$ in $\mathcal F$, we have

$$P\left(\limsup_k E_k\right) = P\left(\bigcap_{m=1}^{\infty}\bigcup_{k=m}^{\infty}E_k\right) \le \sum_{k=1}^{\infty}P(E_k).$$

Applying this to $E_k = \Lambda \triangle M_k$, and observing the simple identity

$$\limsup_k(\Lambda \triangle M_k) = \left(\Lambda\setminus\liminf_k M_k\right)\cup\left(\limsup_k M_k\setminus\Lambda\right),$$

we deduce that

$$P\left(\Lambda \triangle \limsup_k M_k\right) \le P\left(\limsup_k(\Lambda \triangle M_k)\right) \le \epsilon.$$

Since $\bigcup_{k=m}^{\infty}M_k \in \mathcal F_{n_m}'$, the set $\limsup_k M_k$ belongs to $\bigcap_{m=1}^{\infty}\mathcal F_{n_m}'$, which is seen to coincide with the remote field. Thus $\limsup_k M_k$ has probability zero or one by Theorem 8.1.2, and the same must be true of $\Lambda$, since $\epsilon$ is arbitrary in the inequality above.

Here is a more transparent proof of the theorem based on the metric on the measure space $(\Omega, \mathcal F, P)$ given in Exercise 8 of Sec. 3.2. Since $\Lambda_k$ and $M_k$ are independent, we have

$$P(\Lambda_k \cap M_k) = P(\Lambda_k)P(M_k).$$

Now $\Lambda_k \to \Lambda$ and $M_k \to \Lambda$ in the metric just mentioned, hence also

$$\Lambda_k \cap M_k \to \Lambda \cap \Lambda$$

in the same sense. Since convergence of events in this metric implies convergence of their probabilities, it follows that $P(\Lambda \cap \Lambda) = P(\Lambda)P(\Lambda)$, and the theorem is proved.

Corollary. For a stationary independent process, the B.F.'s of almost permutable or almost remote or almost invariant sets all coincide with the all-or-nothing field.
EXERCISES

$\Omega$ and $\mathcal F$ below are the infinite product space and field specified above.

1. Find an example of a remote field that is not the trivial one; to make it interesting, insist that the r.v.'s are not identical.
2. An r.v. belongs to the all-or-nothing field if and only if it is constant a.e.
3. If $\Lambda$ is invariant then $\Lambda = \tau\Lambda$; the converse is false.
4. An r.v. is invariant [permutable] if and only if it belongs to the invariant [permutable] field.
5. The set of convergence of an arbitrary sequence of r.v.'s $\{Y_n, n \in N\}$ or of the sequence of their partial sums $\sum_{j=1}^n Y_j$ are both permutable. Their limits are permutable r.v.'s with domain the set of convergence.
*6. If $a_n > 0$, $\lim_{n\to\infty}a_n$ exists, $> 0$, finite or infinite, and $\lim_{n\to\infty}a_{n+1}/a_n = 1$, then the set of convergence of $\{a_n^{-1}\sum_{j=1}^n Y_j\}$ is invariant. If $a_n \to +\infty$, the upper and lower limits of this sequence are invariant r.v.'s.
*7. The set $\{Y_n \in A \text{ i.o.}\}$, where $A \in \mathcal B^1$, is remote but not necessarily invariant; the set $\{\sum_{j=1}^n Y_j \in A \text{ i.o.}\}$ is permutable but not necessarily remote. Find some other essentially different examples of these two kinds.
8. Find trivial examples of independent processes where the three numbers $P(\tau^{-1}\Lambda)$, $P(\Lambda)$, $P(\tau\Lambda)$ take the values 1, 0, 1; or 0, $\frac12$, 1.
9. Prove that an invariant event is remote and a remote event is permutable.
*10. Consider the bi-infinite product space of all bi-infinite sequences of real numbers $\{\omega_n, n \in \tilde N\}$, where $\tilde N$ is the set of all integers in its natural (algebraic) ordering. Define the shift as in the text with $\tilde N$ replacing $N$, and show that it is 1-to-1 on this space. Prove the analogue of (7).
11. Show that the conclusion of Theorem 8.1.4 holds true for a sequence of independent r.v.'s, not necessarily stationary, but satisfying the following condition: for every $j$ there exists a $k > j$ such that $X_k$ has the same distribution as $X_j$. [This remark is due to Susan Horn.]
12. Let $\{X_n, n \ge 1\}$ be independent r.v.'s with $P\{X_n = 4^n\} = P\{X_n = -4^n\} = \frac12$. Then the remote field of $\{S_n, n \ge 1\}$, where $S_n = \sum_{j=1}^n X_j$, is not trivial.
8.2 Basic notions
From now on we consider only a stationary independent process $\{X_n, n \in N\}$ on the concrete probability triple specified in the preceding section. The common distribution of $X_n$ will be denoted by $\mu$ (p.m.) or $F$ (d.f.); when only this is involved, we shall write $X$ for a representative $X_n$, thus $E(X)$ for $E(X_n)$.

Our interest in such a process derives mainly from the fact that it underlies another process of richer content. This is obtained by forming the successive partial sums as follows:

$$(1)\qquad S_n = \sum_{j=1}^n X_j, \qquad n \in N.$$

An initial r.v. $S_0 \equiv 0$ is adjoined whenever this serves notational convenience, as in $X_n = S_n - S_{n-1}$ for $n \in N$. The sequence $\{S_n, n \in N\}$ is then a very familiar object in this book, but now we wish to find a proper name for it. An officially correct one would be "stochastic process with stationary independent differences"; the name "homogeneous additive process" can also be used. We have, however, decided to call it a "random walk (process)", although the use of this term is frequently restricted to the case when $\mu$ is of the integer lattice type or even more narrowly a Bernoullian distribution.

DEFINITION OF RANDOM WALK. A random walk is the process $\{S_n, n \in N\}$ defined in (1) where $\{X_n, n \in N\}$ is a stationary independent process. By convention we set also $S_0 \equiv 0$.

A similar definition applies in a Euclidean space of any dimension, but we shall be concerned only with $\mathcal R^1$ except in some exercises later.

Let us observe that even for an independent process $\{X_n, n \in N\}$, its remote field is in general different from the remote field of $\{S_n, n \in N\}$, where $S_n = \sum_{j=1}^n X_j$. They are almost the same, being both almost trivial, for a stationary independent process by virtue of Theorem 8.1.4, since the remote field of the random walk is clearly contained in the permutable field of the corresponding stationary independent process.

We add that, while the notion of remoteness applies to any process, "(shift-)invariant" and "permutable" will be used here only for the underlying "coordinate process" $\{\omega_n, n \in N\}$ or $\{X_n, n \in N\}$.
The following relation will be much used below, for $m < n$:

$$S_{n-m}(\tau^m\omega) = \sum_{j=1}^{n-m}X_j(\tau^m\omega) = \sum_{j=1}^{n-m}X_{j+m}(\omega) = S_n(\omega) - S_m(\omega).$$

It follows from Theorem 8.1.3 that $S_{n-m}$ and $S_n - S_m$ have the same distribution. This is obvious directly, since it is just $\mu^{(n-m)*}$.

As an application of the results of Sec. 8.1 to a random walk, we state the following consequence of Theorem 8.1.4.

Theorem 8.2.1. Let $B_n \in \mathcal B^1$ for each $n \in N$. Then

$$P\{S_n \in B_n \text{ i.o.}\}$$

is equal to zero or one.

PROOF. If $\sigma$ is a permutation on $N_m$, then $S_n(\sigma\omega) = S_n(\omega)$ for $n \ge m$, hence the set

$$\Lambda_m = \bigcup_{n=m}^{\infty}\{S_n \in B_n\}$$

is unchanged under $\sigma$ or $\sigma^{-1}$. Since $\Lambda_m$ decreases as $m$ increases, it follows that

$$\bigcap_{m=1}^{\infty}\Lambda_m$$

is permutable, and the theorem is proved.
Even for a fixed $B = B_n$ the result is significant, since it is by no means evident that the set $\{S_n > 0 \text{ i.o.}\}$, for instance, is even vaguely invariant or remote with respect to $\{X_n, n \in N\}$ (cf. Exercise 7 of Sec. 8.1). Yet the preceding theorem implies that it is in fact almost invariant. This is the strength of the notion of permutability as against invariance or remoteness.

For any serious study of the random walk process, it is imperative to introduce the concept of an "optional r.v." This notion has already been used more than once in the book (where?) but has not yet been named. Since the basic inferences are very simple and are supposed to be intuitively obvious, it has been the custom in the literature until recently not to make the formal introduction at all. However, the reader will profit by meeting these fundamental ideas for the theory of stochastic processes at the earliest possible time. They will be needed in the next chapter, too.

DEFINITION OF OPTIONAL r.v. An r.v. $\alpha$ is called optional relative to the arbitrary stochastic process $\{Z_n, n \in N\}$ iff it takes strictly positive integer values or $+\infty$ and satisfies the following condition:

$$(2)\qquad \forall n \in N\cup\{\infty\}: \{\omega: \alpha(\omega) = n\} \in \mathcal F_n,$$

where $\mathcal F_n$ is the B.F. generated by $\{Z_k, k \in N_n\}$.

Similarly if the process is indexed by $N^0$ (as in Chapter 9), then the range of $\alpha$ will be $N^0$. Thus if the index $n$ is regarded as the time parameter, then $\alpha$ effects a choice of time (an "option") for each sample point $\omega$. One may think of this choice as a time to "stop", whence the popular alias "stopping time", but this is usually rather a momentary pause after which the process proceeds again: time marches on!

Associated with each optional r.v. $\alpha$ there are two or three important objects. First, the pre-$\alpha$ field $\mathcal F_{\alpha}$ is the collection of all sets in $\mathcal F_{\infty}$ of the form

$$(3)\qquad \bigcup_{1\le n\le\infty}[\{\alpha = n\}\cap\Lambda_n],$$

where $\Lambda_n \in \mathcal F_n$ for each $n \in N\cup\{\infty\}$. This collection is easily verified to be a B.F. (how?). If $\Lambda \in \mathcal F_{\alpha}$, then we have clearly $\Lambda\cap\{\alpha = n\} \in \mathcal F_n$, for every $n$. This property also characterizes the members of $\mathcal F_{\alpha}$ (see Exercise 1 below). Next, the post-$\alpha$ process is the process $\{Z_{\alpha+n}, n \in N\}$ defined on the trace of the original probability triple on the set $\{\alpha < \infty\}$, where

$$(4)\qquad \forall n \in N: Z_{\alpha+n}(\omega) = Z_{\alpha(\omega)+n}(\omega).$$

Each $Z_{\alpha+n}$ is seen to be an r.v. with domain $\{\alpha < \infty\}$; indeed it is finite a.e. there provided the original $Z_n$'s are finite a.e. It is easy to see that $Z_{\alpha} \in \mathcal F_{\alpha}$. The post-$\alpha$ field $\mathcal F_{\alpha}'$ is the B.F. generated by the post-$\alpha$ process: it is a sub-B.F. of $\{\alpha < \infty\}\cap\mathcal F_{\infty}$.
Instead of requiring $\alpha$ to be defined on all $\Omega$ but possibly taking the value $\infty$, we may suppose it to be defined on a set $\Delta$ in $\mathcal F_{\infty}$. Note that a strictly positive integer $n$ is an optional r.v. The concepts of pre-$\alpha$ and post-$\alpha$ fields reduce in this case to the previous $\mathcal F_n$ and $\mathcal F_n'$.

A vital example of optional r.v. is that of the first entrance time into a given Borel set $A$:

$$(5)\qquad \alpha_A(\omega) = \begin{cases} \min\{n \in N: Z_n(\omega) \in A\} & \text{on } \bigcup_{n=1}^{\infty}\{\omega: Z_n(\omega) \in A\};\\ +\infty & \text{elsewhere.}\end{cases}$$

To see that this is optional, we need only observe that for each $n \in N$:

$$\{\omega: \alpha_A(\omega) = n\} = \{\omega: Z_j(\omega) \in A^c, 1 \le j \le n-1;\ Z_n(\omega) \in A\},$$

which clearly belongs to $\mathcal F_n$; similarly for $n = \infty$.

Concepts connected with optionality have everyday counterparts, implicit in phrases such as "within thirty days of the accident (should it occur)". Historically, they arose from "gambling systems", in which the gambler chooses opportune times to enter his bets according to previous observations, experiments, or whatnot. In this interpretation, $\alpha + 1$ is the time chosen to gamble and is determined by events strictly prior to it. Note that, along with $\alpha$, $\alpha + 1$ is also an optional r.v., but the converse is false.
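As a concrete illustration of (5), the following sketch (an aside, assuming numpy; the walk and horizon are arbitrary) computes the first entrance time into $A = (0,\infty)$ along simulated paths; within a finite horizon one can of course only report "not yet" on the event $\{\alpha_A = \infty\}$:

```python
import numpy as np

rng = np.random.default_rng(4)

def first_entrance(path, in_A):
    """alpha_A on one path: min{n in N: Z_n in A}, or None if not reached in the horizon."""
    for n, z in enumerate(path, start=1):
        if in_A(z):
            return n
    return None

for _ in range(5):
    S = np.cumsum(rng.choice([-1.0, 1.0], size=50))     # Z_n = S_n, a simple random walk
    print(first_entrance(S, lambda z: z > 0))           # A = (0, infinity)
```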
So far the notions are valid for an arbitrary process on an arbitrary triple. We now return to a stationary independent process on the specified triple and extend the notion of "shift" to an "$\alpha$-shift" as follows: $\tau^{\alpha}$ is a mapping on $\{\alpha < \infty\}$ such that

$$(6)\qquad \tau^{\alpha}\omega = \tau^n\omega \quad\text{on } \{\omega: \alpha(\omega) = n\}.$$

Thus the post-$\alpha$ process is just the process $\{X_n(\tau^{\alpha}\omega), n \in N\}$. Recalling that $X_n$ is a mapping on $\Omega$, we may also write

$$(7)\qquad X_{\alpha+n}(\omega) = X_n(\tau^{\alpha}\omega) = (X_n\circ\tau^{\alpha})(\omega)$$

and regard $X_n\circ\tau^{\alpha}$, $n \in N$, as the r.v.'s of the new process. The inverse set mapping $(\tau^{\alpha})^{-1}$, to be written more simply as $\tau^{-\alpha}$, is defined as usual:

$$\tau^{-\alpha}\Lambda = \{\omega: \tau^{\alpha}\omega \in \Lambda\}.$$

Let us now prove the fundamental theorem about "stopping" a stationary independent process.
Theorem 8.2.2. For a stationary independent process and an almost everywhere finite optional r.v. $\alpha$ relative to it, the pre-$\alpha$ and post-$\alpha$ fields are independent. Furthermore the post-$\alpha$ process is a stationary independent process with the same common distribution as the original one.

PROOF. Both assertions are summarized in the formula below. For any $\Lambda \in \mathcal F_{\alpha}$, $k \in N$, $B_j \in \mathcal B^1$, $1 \le j \le k$, we have

$$(8)\qquad P\{\Lambda;\ X_{\alpha+j} \in B_j, 1 \le j \le k\} = P\{\Lambda\}\prod_{j=1}^k\mu(B_j).$$

To prove (8), we observe that it follows from the definition of $\alpha$ and $\mathcal F_{\alpha}$ that

$$(9)\qquad \Lambda\cap\{\alpha = n\} = \Lambda_n\cap\{\alpha = n\} \in \mathcal F_n,$$

where $\Lambda_n \in \mathcal F_n$ for each $n \in N$. Consequently we have

$$P\{\Lambda;\ \alpha = n;\ X_{\alpha+j} \in B_j, 1 \le j \le k\} = P\{\Lambda_n;\ \alpha = n;\ X_{n+j} \in B_j, 1 \le j \le k\}$$
$$= P\{\Lambda;\ \alpha = n\}\,P\{X_{n+j} \in B_j, 1 \le j \le k\} = P\{\Lambda;\ \alpha = n\}\prod_{j=1}^k\mu(B_j),$$

where the second equation is a consequence of (9) and the independence of $\mathcal F_n$ and $\mathcal F_n'$. Summing over $n \in N$, we obtain (8).
An immediate corollary is the extension of (7) of Sec. 8.1 to an α-shift.

Corollary. For each Λ ∈ F_∞ we have

(10)    P(θ^{−α} Λ) = P(Λ).

Just as we iterate the shift θ, we can iterate θ^α. Put α¹ = α, and define
α^k inductively by

    α^{k+1}(ω) = α^k(θ^α ω), k ∈ N.

Each α^k is finite a.e. if α is. Next, define β⁰ = 0, and

    β^k = ∑_{j=1}^k α^j, k ∈ N.

We are now in a position to state the following result, which will be
needed later.

Theorem 8.2.3. Let α be an a.e. finite optional r.v. relative to a stationary
independent process. Then the random vectors {V_k, k ∈ N}, where

    V_k(ω) = (α^k(ω), X_{β^{k−1}+1}(ω), . . . , X_{β^k}(ω)),

are independent and identically distributed.
PROOF. The independence follows from Theorem 8.2.2 by our showing
that V₁, . . . , V_{k−1} belong to the pre-β^{k−1} field, while V_k belongs to the
post-β^{k−1} field. The details are left to the reader; cf. Exercise 6 below.

To prove that V_k and V_{k+1} have the same distribution, we may suppose
that k = 1. Then for each n ∈ N, and each n-dimensional Borel set A, we have

    {ω: α²(ω) = n; (X_{α¹+1}(ω), . . . , X_{α¹+α²}(ω)) ∈ A}
      = {ω: α¹(θ^α ω) = n; (X₁(θ^α ω), . . . , X_{α¹}(θ^α ω)) ∈ A},

since

    X_{α¹}(θ^α ω) = X_{α¹(θ^α ω)}(θ^α ω) = X_{α²(ω)}(θ^α ω) = X_{α¹(ω)+α²(ω)}(ω) = X_{α¹+α²}(ω)

by the quirk of notation that denotes by X_α(·) the function whose value at
ω is given by X_{α(ω)}(ω), and by (7) with n = α²(ω). By (5) of Sec. 8.1, the
preceding set is the θ^{−α}-image (inverse image under θ^α) of the set

    {ω: α¹(ω) = n; (X₁(ω), . . . , X_{α¹}(ω)) ∈ A},

and so by (10) has the same probability as the latter. This proves our assertion.

Corollary. The r.v.'s {Y_k, k ∈ N}, where

    Y_k(ω) = ∑_{n=β^{k−1}+1}^{β^k} φ(X_n(ω))

and φ is a Borel measurable function, are independent and identically
distributed.

For φ ≡ 1, Y_k reduces to α^k. For φ(x) ≡ x, Y_k = S_{β^k} − S_{β^{k−1}}. The reader
is advised to get a clear picture of the quantities α^k, β^k, and Y_k before
proceeding further, perhaps by considering a special case such as (5).
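To help form that picture, here is a simulation sketch (not from the text; the drift 0.1 and sample size are arbitrary choices). For the case A = (0, ∞) of (5), β^k is the k-th time the walk attains a new strict maximum, α^k = β^k − β^{k−1}, and S_{β^k} − S_{β^{k−1}} is the k-th "ladder height":

```python
import numpy as np

rng = np.random.default_rng(1)

def ladder_quantities(S):
    """Ladder epochs beta^k (times of new strict maxima of the walk), their
    increments alpha^k, and the ladder heights S_{beta^k} - S_{beta^{k-1}}."""
    alphas, betas, heights = [], [], []
    running_max, last_beta = 0.0, 0
    for n, s in enumerate(S, start=1):
        if s > running_max:                # first entrance above the old maximum
            alphas.append(n - last_beta)   # alpha^k = beta^k - beta^{k-1}
            betas.append(n)
            heights.append(s - running_max)
            running_max, last_beta = s, n
    return alphas, betas, heights

S = np.cumsum(rng.normal(0.1, 1.0, size=10_000))  # positive drift: alpha finite a.e.
alphas, betas, heights = ladder_quantities(S)
# By Theorem 8.2.3 the pairs (alpha^k, k-th ladder height) are i.i.d.
print(len(betas), np.mean(alphas), np.mean(heights))
```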
We shall now apply these considerations to obtain results on the "global
behavior" of the random walk. These will be broad qualitative statements
distinguished by their generality without any additional assumptions.
The optional r.v. to be considered is the first entrance time into the strictly
positive half of the real line, namely A = (0, ∞) in (5) above. Similar results
hold for [0, ∞); and then by taking the negative of each X_n, we may deduce
the corresponding result for (−∞, 0] or (−∞, 0). Results obtained in this way
will be labeled as "dual" below. Thus, omitting A from the notation:

(11)    α(ω) = min{n ∈ N: S_n(ω) > 0}  on ⋃_{n=1}^∞ {ω: S_n(ω) > 0};
        α(ω) = +∞  elsewhere;

and

    ∀n ∈ N: {α = n} = {S_j ≤ 0 for 1 ≤ j ≤ n − 1; S_n > 0}.

Define also the r.v. M_n as follows:

(12)    ∀n ∈ N⁰: M_n(ω) = max_{0≤j≤n} S_j(ω).

The inclusion of S₀ above in the maximum may seem artificial, but it does
not affect the next theorem and will be essential in later developments in the
next section. Since each X_n is assumed to be finite a.e., so is each S_n and M_n.
Since M_n increases with n, it tends to a limit, finite or positive infinite, to be
denoted by

(13)    M(ω) = lim_{n→∞} M_n(ω) = sup_{0≤j<∞} S_j(ω).

Theorem 8.2.4. The statements (a), (b), and (c) below are equivalent; the
statements (a′), (b′), and (c′) are equivalent.

(a) P{α < +∞} = 1;      (a′) P{α < +∞} < 1;
(b) P{lim sup_{n→∞} S_n = +∞} = 1;   (b′) P{lim sup_{n→∞} S_n = +∞} = 0;
(c) P{M = +∞} = 1;      (c′) P{M = +∞} = 0.

PROOF. If (a) is true, we may suppose α < ∞ everywhere. Consider the
r.v. S_α: it is strictly positive by definition and so 0 < E(S_α) ≤ +∞. By the
Corollary to Theorem 8.2.3, {S_{β^{k+1}} − S_{β^k}, k ≥ 1} is a sequence of independent
and identically distributed r.v.'s. Hence the strong law of large numbers
(Theorem 5.4.2 supplemented by Exercise 1 of Sec. 5.4) asserts that, if α⁰ ≡ 0
and S_{β⁰} ≡ 0:

    S_{β^n}/n = (1/n) ∑_{k=0}^{n−1} (S_{β^{k+1}} − S_{β^k}) → E(S_α) > 0 a.e.

This implies (b). Since lim sup_{n→∞} S_n ≤ M, (b) implies (c). It is trivial that (c)
implies (a). We have thus proved the equivalence of (a), (b), and (c). If (a′)
is true, then (a) is false, hence (b) is false. But the set

    {lim sup_{n→∞} S_n = +∞}

is clearly permutable (it is even invariant, but this requires a little more reflection),
hence (b′) is true by Theorem 8.1.4. Now any numerical sequence with
finite upper limit is bounded above, hence (b′) implies (c′). Finally, if (c′) is
true then (c) is false, hence (a) is false, and (a′) is true. Thus (a′), (b′), and
(c′) are also equivalent.
Theorem 8.2.5. For the general random walk, there are four mutually exclusive
possibilities, each taking place a.e.:

(i) ∀n ∈ N: S_n = 0;
(ii) S_n → −∞;
(iii) S_n → +∞;
(iv) −∞ = lim inf_{n→∞} S_n < lim sup_{n→∞} S_n = +∞.

PROOF. If X ≡ 0 a.e., then (i) happens. Excluding this, let φ₁ = lim sup_n S_n.
Then φ₁ is a permutable r.v., hence a constant c, possibly ±∞, a.e. by
Theorem 8.1.4. Since

    lim sup_n S_n = X₁ + lim sup_n (S_n − X₁),

we have φ₁ = X₁ + φ₂, where φ₂(ω) = φ₁(θω) = c a.e. Since X₁ ≢ 0, it
follows that c = +∞ or −∞. This means that

    either lim sup_n S_n = +∞ or lim sup_n S_n = −∞.

By symmetry we have also

    either lim inf_n S_n = −∞ or lim inf_n S_n = +∞.

These double alternatives yield the new possibilities (ii), (iii), or (iv), other
combinations being impossible.
This last possibility will be elaborated upon in the next section.

EXERCISES

In Exercises 1–6, the stochastic process is arbitrary.

*1. α is optional if and only if ∀n ∈ N: {α ≤ n} ∈ F_n.
*2. For each optional α we have α ∈ F_α and X_α ∈ F_α. If α and β are both
optional and α ≤ β, then F_α ⊂ F_β.
3. If α₁ and α₂ are both optional, then so are α₁ ∧ α₂, α₁ ∨ α₂, α₁ + α₂. If
α is optional and Δ ∈ F_α, then α_Δ defined below is also optional:

    α_Δ = α on Δ;  α_Δ = +∞ on Ω∖Δ.

*4. If α is optional and β is optional relative to the post-α process, then
α + β is optional (relative to the original process).
5. ∀k ∈ N: α¹ + ··· + α^k is optional. [For the α in (11), this has been
called the kth ladder variable.]
*6. Prove the following relations:

    α^k = α ∘ θ^{β^{k−1}};  θ^{β^{k−1}} ∘ θ^α = θ^{β^k};  X_{β^{k−1}+j} ∘ θ^α = X_{β^k+j}.

7. If α and β are any two optional r.v.'s, then

    X_{β+j}(θ^α ω) = X_{α(ω)+β(θ^α ω)+j}(ω);
    (θ^β ∘ θ^α)(ω) = θ^{β(θ^α ω)+α(ω)}(ω) ≠ θ^{β+α}(ω) in general.

*8. Find an example of two optional r.v.'s α and β such that α ≤ β but
F′_α ⊅ F′_β. However, if γ is optional relative to the post-α process and β =
α + γ, then indeed F′_α ⊃ F′_β. As a particular case, F′_{β^k} is decreasing (while
F_{β^k} is increasing) as k increases.
9. Find an example of two optional r.v.'s α and β such that α < β but
β − α is not optional.
10. Generalize Theorem 8.2.2 to the case where the domain of definition
and finiteness of α is Δ with 0 < P(Δ) < 1. [This leads to a useful extension
of the notion of independence. For a given Δ in F with P(Δ) > 0, two
events Λ and M, where M ⊂ Δ, are said to be independent relative to Δ iff
P{Λ ∩ Δ ∩ M} = P{Λ ∩ Δ} P_Δ{M}.]
*11. Let {X_n, n ∈ N} be a stationary independent process and {α_k, k ∈ N}
a sequence of strictly increasing finite optional r.v.'s. Then {X_{α_k+1}, k ∈ N} is
a stationary independent process with the same common distribution as the
original process. [This is the gambling-system theorem first given by Doob in
1936.]
12. Prove the Corollary to Theorem 8.2.2.
13. State and prove the analogue of Theorem 8.2.4 with α replaced by
α_{[0,∞)}. [The inclusion of 0 in the set of entrance causes a small difference.]
14. In an independent process where all X_n have a common bound,
E{α} < ∞ implies E{S_α} < ∞ for each optional α [cf. Theorem 5.5.3].

8.3 Recurrence

A basic question about the random walk is the range of the whole process:
⋃_{n=1}^∞ {S_n(ω)} for a.e. ω; or, "where does it ever go?" Theorem 8.2.5 tells us
that, ignoring the trivial case where it stays put at 0, it either goes off to −∞
or +∞, or fluctuates between them. But how does it fluctuate? Exercise 9
below will show that the random walk can take large leaps from one end to
the other without stopping in any middle range more than a finite number of
times. On the other hand, it may revisit every neighborhood of every point an
infinite number of times. The latter circumstance calls for a definition.
DEFINITION. The number x ∈ R¹ is called a recurrent value of the random
walk {S_n, n ∈ N}, iff for every ε > 0 we have

(1)    P{|S_n − x| < ε i.o.} = 1.

The set of all recurrent values will be denoted by ℜ.

Taking a sequence of ε decreasing to zero, we see that (1) implies the
apparently stronger statement that the random walk is in each neighborhood
of x i.o. a.e.
Let us also call the number x a possible value of the random walk iff
for every ε > 0, there exists n ∈ N such that P{|S_n − x| < ε} > 0. Clearly a
recurrent value is a possible value of the random walk (see Exercise 2 below).

Theorem 8.3.1. The set ℜ is either empty or a closed additive group of real
numbers. In the latter case it reduces to the singleton {0} if and only if X ≡ 0
a.e.; otherwise ℜ is either the whole R¹ or the infinite cyclic group generated
by a nonzero number c, namely {±nc: n ∈ N⁰}.

PROOF. Suppose ℜ ≠ ∅ throughout the proof. To prove that ℜ is a group,
let us show that if x is a possible value and y ∈ ℜ, then y − x ∈ ℜ. Suppose
not; then there is a strictly positive probability that from a certain value of n
on, S_n will not be in a certain neighborhood of y − x. Let us put for z ∈ R¹:

(2)    p_{ε,m}(z) = P{|S_n − z| ≥ ε for all n ≥ m};

so that p_{2ε,m}(y − x) > 0 for some ε > 0 and m ∈ N. Since x is a possible
value, for the same ε we have a k such that P{|S_k − x| < ε} > 0. Now the
two independent events |S_k − x| < ε and |S_n − S_k − (y − x)| ≥ 2ε together
imply that |S_n − y| > ε; hence

(3)    p_{ε,k+m}(y) = P{|S_n − y| ≥ ε for all n ≥ k + m}
         ≥ P{|S_k − x| < ε} P{|S_n − S_k − (y − x)| ≥ 2ε for all n ≥ k + m}.

The last-written probability is equal to p_{2ε,m}(y − x), since S_n − S_k has the
same distribution as S_{n−k}. It follows that the first term in (3) is strictly positive,
contradicting the assumption that y ∈ ℜ. We have thus proved that ℜ is an
additive subgroup of R¹. It is trivial that ℜ as a subset of R¹ is closed in the
Euclidean topology. A well-known proposition (proof?) asserts that the only
closed additive subgroups of R¹ are those mentioned in the second sentence
of the theorem. Since ℜ is not empty, 0 ∈ ℜ. Unless X ≡ 0 a.e., it has at least
one possible value x ≠ 0, and the argument above shows that −x = 0 − x ∈ ℜ
and consequently also x = 0 − (−x) ∈ ℜ. Hence ℜ is not a singleton. The
theorem is completely proved.
It is clear from the preceding theorem that the key to recurrence is the
value 0, for which we have the criterion below.

Theorem 8.3.2. If for some ε > 0 we have

(4)    ∑_n P{|S_n| < ε} < ∞,

then

(5)    P{|S_n| < ε i.o.} = 0

(for the same ε) so that 0 ∉ ℜ. If for every ε > 0 we have

(6)    ∑_n P{|S_n| < ε} = ∞,

then

(7)    P{|S_n| < ε i.o.} = 1

for every ε > 0 and so 0 ∈ ℜ.

Remark. Actually if (4) or (6) holds for any ε > 0, then it holds for
every ε > 0; this fact follows from Lemma 1 below but is not needed here.
PROOF. The first assertion follows at once from the convergence part of
the Borel–Cantelli lemma (Theorem 4.2.1). To prove the second part consider

    F = lim inf_n {|S_n| ≥ ε};

namely F is the event that |S_n| < ε for only a finite number of values of n.
For each ω in F, there is an m(ω) such that |S_n(ω)| ≥ ε for all n ≥ m(ω); it
follows that if we consider "the last time that |S_n| < ε", we have

    P(F) = ∑_{m=0}^∞ P{|S_m| < ε; |S_n| ≥ ε for all n ≥ m + 1}.

Since the two independent events |S_m| < ε and |S_n − S_m| ≥ 2ε together imply
that |S_n| ≥ ε, we have

    1 ≥ P(F) ≥ ∑_{m=1}^∞ P{|S_m| < ε} P{|S_n − S_m| ≥ 2ε for all n ≥ m + 1}
      = ∑_{m=1}^∞ P{|S_m| < ε} p_{2ε,1}(0)

by the previous notation (2), since S_n − S_m has the same distribution as S_{n−m}.
Consequently (6) cannot be true unless p_{2ε,1}(0) = 0. We proceed to extend
this to show that p_{2ε,k}(0) = 0 for every k ∈ N. To this aim we fix k and consider
the event

    A_m = {|S_m| < ε; |S_n| ≥ ε for all n ≥ m + k};

then A_m and A_{m′} are disjoint whenever m′ ≥ m + k and consequently (why?)

    k ≥ ∑_{m=1}^∞ P(A_m).

The argument above for the case k = 1 can now be repeated to yield

    k ≥ ∑_{m=1}^∞ P{|S_m| < ε} p_{2ε,k}(0),

and so p_{2ε,k}(0) = 0 for every ε > 0. Thus

    P(F) = lim_{k→∞} p_{ε,k}(0) = 0,

which is equivalent to (7).


A simple sufficient condition for 0 ∈ ℜ, or equivalently for ℜ ≠ ∅, will
now be given.

Theorem 8.3.3. If the weak law of large numbers holds for the random walk
{S_n, n ∈ N} in the form that S_n/n → 0 in pr., then ℜ ≠ ∅.

PROOF. We need two lemmas, the first of which is also useful elsewhere.

Lemma 1. For any ε > 0 and m ∈ N we have

(8)    ∑_{n=0}^∞ P{|S_n| < mε} ≤ 2m ∑_{n=0}^∞ P{|S_n| < ε}.

PROOF OF LEMMA 1. It is sufficient to prove that if the right member of (8)
is finite, then so is the left member and (8) is true. Put

    I = (−ε, ε),  J = [jε, (j + 1)ε),

for a fixed j ∈ N; and denote by φ_I, φ_J the respective indicator functions.
Denote also by α the first entrance time into J, as defined in (5) of Sec. 8.2
with Z_n replaced by S_n and A by J. We have

(9)    ∑_{n=1}^∞ E(φ_J(S_n)) = ∑_{k=1}^∞ ∫_{α=k} [∑_{n=1}^∞ φ_J(S_n)] dP.

The typical integral on the right side is by definition of α equal to

    ∫_{α=k} [1 + ∑_{n=k+1}^∞ φ_J(S_n)] dP ≤ ∫_{α=k} [1 + ∑_{n=k+1}^∞ φ_I(S_n − S_k)] dP,

since {α = k} ⊂ {S_k ∈ J} and {S_k ∈ J} ∩ {S_n ∈ J} ⊂ {S_n − S_k ∈ I}. Now {α =
k} and S_n − S_k are independent, hence the last-written integral is equal to

    P(α = k)[1 + ∑_{n=k+1}^∞ E(φ_I(S_n − S_k))] = P(α = k) ∑_{n=0}^∞ E(φ_I(S_n)),

since φ_I(S₀) = 1. Summing over k and observing that φ_J(0) = 1 only if j = 0,
in which case J ⊂ I and the inequality below is trivial, we obtain for each j:

    ∑_{n=0}^∞ E(φ_J(S_n)) ≤ ∑_{n=0}^∞ E(φ_I(S_n)).

Now if we write J_j for J and sum over j from −m to m − 1, the inequality
(8) ensues in disguised form.
This lemma is often demonstrated by a geometrical type of argument.
We have written out the preceding proof in some detail as an example of the
maxim: whatever can be shown by drawing pictures can also be set down in
symbols!

Lemma 2. Let the positive numbers {u_n(m)}, where n ∈ N and m is a real
number ≥ 1, satisfy the following conditions:

(i) ∀n: u_n(m) is increasing in m and tends to 1 as m → ∞;
(ii) ∃c > 0: ∑_{n=0}^∞ u_n(m) ≤ cm ∑_{n=0}^∞ u_n(1) for all m ≥ 1;
(iii) ∀δ > 0: lim_{n→∞} u_n(δn) = 1.

Then we have

(10)    ∑_{n=0}^∞ u_n(1) = ∞.

Remark. If (ii) is true for all integer m ≥ 1, then it is true for all real
m ≥ 1, with c doubled.

PROOF OF LEMMA 2. Suppose not; then for every A > 0:

    ∞ > ∑_{n=0}^∞ u_n(1) ≥ (1/cm) ∑_{n=0}^∞ u_n(m) ≥ (1/cm) ∑_{n=0}^{[Am]} u_n(m)
      ≥ (1/cm) ∑_{n=0}^{[Am]} u_n(n/A).

Letting m → ∞ and applying (iii) with δ = A^{−1}, we obtain

    ∑_{n=0}^∞ u_n(1) ≥ A/c;

since A is arbitrary, this is a contradiction, which proves the lemma.

To return to the proof of Theorem 8.3.3, we apply Lemma 2 with

    u_n(m) = P{|S_n| < m}.

Then condition (i) is obviously satisfied and condition (ii) with c = 2 follows
from Lemma 1. The hypothesis that S_n/n → 0 in pr. may be written as

    u_n(δn) = P{|S_n/n| < δ} → 1

for every δ > 0 as n → ∞, hence condition (iii) is also satisfied. Thus Lemma 2
yields

    ∑_{n=0}^∞ P{|S_n| < 1} = +∞.

Applying this to the "magnified" random walk with each X_n replaced by X_n/ε,
which does not disturb the hypothesis of the theorem, we obtain (6) for every
ε > 0, and so the theorem is proved.
In practice, the following criterion is more expedient (Chung and Fuchs,
1951).

Theorem 8.3.4. Suppose that at least one of E(X⁺) and E(X⁻) is finite. Then
ℜ ≠ ∅ if and only if E(X) = 0; otherwise case (ii) or (iii) of Theorem 8.2.5
happens according as E(X) < 0 or > 0.

PROOF. If −∞ ≤ E(X) < 0 or 0 < E(X) ≤ +∞, then by the strong law
of large numbers (as amended by Exercise 1 of Sec. 5.4), we have

    S_n/n → E(X) a.e.,

so that either (ii) or (iii) happens as asserted. If E(X) = 0, then the same law
or its weaker form Theorem 5.2.2 applies; hence Theorem 8.3.3 yields the
conclusion.
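The trichotomy in Theorems 8.2.5 and 8.3.4 is easy to observe empirically. The following sketch is an illustration only, not part of any proof; the drifts ±0.05 and the sample size are arbitrary choices. It contrasts a centered walk, which keeps returning near 0, with drifted walks, which escape to −∞ or +∞:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 200_000

for mean, label in [(-0.05, "E(X)<0: S_n -> -inf"),
                    (0.0, "E(X)=0: recurrent, oscillates"),
                    (0.05, "E(X)>0: S_n -> +inf")]:
    S = np.cumsum(rng.normal(mean, 1.0, size=N))
    # crude diagnostics: terminal value and number of visits to (-1, 1)
    visits = int(np.sum(np.abs(S) < 1.0))
    print(f"{label}: S_N = {S[-1]:.0f}, visits to (-1,1) = {visits}")
```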
DEFINITION OF RECURRENT RANDOM WALK. A random walk will be called
recurrent iff ℜ ≠ ∅; it is degenerate iff ℜ = {0}; and it is of the lattice type
iff ℜ is generated by a nonzero element.

The most exact kind of recurrence happens when the X_n's have a common
distribution which is concentrated on the integers, such that every integer is
a possible value (for some S_n), and which has mean zero. In this case for
each integer c we have P{S_n = c i.o.} = 1. For the symmetrical Bernoullian
random walk this was first proved by Pólya in 1921.
We shall give another proof of the recurrence part of Theorem 8.3.4,
namely that E(X) = 0 is a sufficient condition for the random walk to be recurrent
as just defined. This method is applicable in cases not covered by the two
preceding theorems (see Exercises 6–9 below), and the analytic machinery
of ch.f.'s which it employs opens the way to the considerations in the next
section.
The starting point is the integrated form of the inversion formula in
Exercise 3 of Sec. 6.2. Taking x = 0, u = ε, and F to be the d.f. of S_n, we
have

(11)    P(|S_n| < ε) ≥ (1/ε) ∫_0^ε P(|S_n| < u) du = (1/π) ∫_{−∞}^∞ [(1 − cos εt)/(εt²)] f(t)ⁿ dt.

Thus the series in (6) may be bounded below by summing the last expression
in (11). The latter does not sum well as it stands, and it is natural to resort
to a summability method. The Abelian method suits it well and leads to, for
0 < r < 1:

(12)    ∑_{n=0}^∞ rⁿ P(|S_n| < ε) ≥ (1/π) ∫_{−∞}^∞ [(1 − cos εt)/(εt²)] R{1/(1 − rf(t))} dt,

where R and I here and later denote the real and imaginary parts of a complex
quantity. Since

(13)    R{1/(1 − rf(t))} ≥ (1 − r)/|1 − rf(t)|² > 0

and (1 − cos εt)/(εt²) ≥ Cε for |t| < 1/ε and some constant C, it follows that
for ε < 1/η the right member of (12) is not less than

(14)    (Cε/π) ∫_{−η}^{η} R{1/(1 − rf(t))} dt.

Now the existence of E(|X|) implies by Theorem 6.4.2 that 1 − f(t) = o(t)
as t → 0. Hence for any given δ > 0 we may choose the η above so that for |t| ≤ η:

    |1 − rf(t)|² ≤ {(1 − r) + r[1 − R f(t)]}² + {r I f(t)}²
      ≤ 2(1 − r)² + 2(rδt)² + (rδt)² = 2(1 − r)² + 3r²δ²t².

The integral in (14) is then not less than

    ∫_{−η}^{η} (1 − r) dt/[2(1 − r)² + 3r²δ²t²] ≥ (1/3) ∫_{−η(1−r)^{−1}}^{η(1−r)^{−1}} ds/(1 + δ²s²).

As r ↑ 1, the right member above tends to π/(3δ); since δ is arbitrary, we have
proved that the right member of (12) tends to +∞ as r ↑ 1. Since the series
in (6) dominates that on the left member of (12) for every r, it follows that
(6) is true and so 0 ∈ ℜ by Theorem 8.3.2.

EXERCISES

f is the ch.f. of μ.

1. Generalize Theorems 8.3.1 and 8.3.2 to R^d. (For d ≥ 3 the generalization
is illusory; see Exercise 12 below.)
*2. If a random walk in R^d is recurrent, then every possible value is a
recurrent value.
3. Prove the Remark after Theorem 8.3.2.
*4. Assume that P{X₁ = 0} < 1. Prove that x is a recurrent value of the
random walk if and only if

    ∑_{n=1}^∞ P{|S_n − x| < ε} = ∞ for every ε > 0.

5. For a recurrent random walk that is neither degenerate nor of the
lattice type, the countable set of points {S_n(ω), n ∈ N} is everywhere dense in
R¹ for a.e. ω. Hence prove the following result in Diophantine approximation:
if θ is irrational, then given any real x and ε > 0 there exist integers m and n
such that |mθ + n − x| < ε.
*6. If there exists a δ > 0 such that (the integral below being real-valued)

    lim_{r↑1} ∫_{−δ}^δ dt/(1 − rf(t)) = ∞,

then the random walk is recurrent.
*7. If there exists a δ > 0 such that

    sup_{0<r<1} ∫_{−δ}^δ dt/(1 − rf(t)) < ∞,

then the random walk is not recurrent. [HINT: Use Exercise 3 of Sec. 6.2 to
show that there exists a constant C such that

    P(|S_n| < ε) ≤ Cε² ∫_{R¹} [(1 − cos(x/ε))/x²] μ_n(dx) = (Cε²/2) ∫_0^{1/ε} du ∫_{−u}^{u} f(t)ⁿ dt,

where μ_n is the distribution of S_n, and use (13). Exercises 6 and 7 give,
respectively, a sufficient and a necessary condition for recurrence. If μ is of
the integer lattice type with span one, then

    lim_{r↑1} ∫_{−π}^{π} dt/(1 − rf(t)) = ∞

is such a condition according to Kesten and Spitzer.]
*8. Prove that the random walk with f(t) = e^{−|t|} (Cauchy distribution) is
recurrent.
*9. Prove that the random walk with f(t) = e^{−|t|^α}, 0 < α < 1, (stable
law) is not recurrent, but (iv) of Theorem 8.2.5 holds.
10. Generalize Exercises 6 and 7 above to R^d.
*11. Prove that in R² if the common distribution of the random vector
(X, Y) has mean zero and finite second moment, namely:

    E(X) = 0, E(Y) = 0, 0 < E(X² + Y²) < ∞,

then the random walk is recurrent. This implies by Exercise 5 above that
almost every Brownian motion path is everywhere dense in R². [HINT: Use
the generalization of Exercise 6 and show that

    R{1/(1 − f(t₁, t₂))} ≥ c/(t₁² + t₂²)

for sufficiently small |t₁| + |t₂|. One can also make a direct estimate:

    P(|S_n| < ε) ≥ c′ε²/n.]

*12. Prove that no truly 3-dimensional random walk, namely one whose
common distribution does not have its support in a plane, is recurrent. [HINT:
There exists A > 0 such that

    ∫_{−A}^{A} (∑_{i=1}^3 t_i x_i)² μ(dx)

is a strictly positive quadratic form Q in (t₁, t₂, t₃). If

    ∑_{i=1}^3 |t_i| < A^{−1},

then

    R{1 − f(t₁, t₂, t₃)} ≥ CQ(t₁, t₂, t₃).]

13. Generalize Lemma 1 in the proof of Theorem 8.3.3 to R^d. For d = 2
the constant 2m in the right member of (8) is to be replaced by 4m², and
"|S_n| < ε" means S_n is in the open square with center at the origin and side
length 2ε.
14. Extend Lemma 2 in the proof of Theorem 8.3.3 as follows. Keep
condition (i) but replace (ii) and (iii) by

(ii′) ∑_{n=0}^∞ u_n(m) ≤ cm² ∑_{n=0}^∞ u_n(1);
(iii′) There exists d > 0 such that for every b > 1 and m ≥ m(b):
       u_n(m) ≥ dm²/n for m² ≤ n ≤ bm².

Then (10) is true.*
[*This form of condition (iii′) is due to Hsu Pei; see also Chung and Lindvall,
Proc. Amer. Math. Soc. 78 (1980), p. 285.]
15. Generalize Theorem 8.3.3 to R² as follows. If the central limit
theorem applies in the form that S_n/√n converges in dist. to the unit
normal, then the random walk is recurrent. [HINT: Use Exercises 13 and 14 and
Exercise 4 of § 4.3. This is sharper than Exercise 11. No proof of Exercise 12
using a similar method is known.]
16. Suppose E(X) = 0, 0 < E(X²) < ∞, and μ is of the integer lattice
type; then

    P{S_{n²} = 0 i.o.} = 1.

17. The basic argument in the proof of Theorem 8.3.2 was the "last time
in (−ε, ε)". A harder but instructive argument using the "first time" may be
given as follows. Put

    f_m^{(n)} = P{|S_j| ≥ ε for m ≤ j ≤ n − 1; |S_n| < ε};
    g^{(n)}(ε) = P{|S_n| < ε}.

Show that for 1 ≤ m < M:

    ∑_{n=m}^{M} g^{(n)}(ε) ≤ ∑_{n=m}^{M} f_m^{(n)} ∑_{n=0}^{M} g^{(n)}(2ε).

It follows by a form of Lemma 1 in this section that

    lim_{m→∞} ∑_{n=m}^∞ f_m^{(n)} ≥ 1/4;

now use Theorem 8.1.4.
18. For an arbitrary random walk, if P{∀n ∈ N: S_n > 0} > 0, then

    ∑_n P{S_n ≤ 0, S_{n+1} > 0} < ∞.

Hence if in addition P{∀n ∈ N: S_n ≤ 0} > 0, then

    ∑_n |P{S_n > 0} − P{S_{n+1} > 0}| < ∞;

and consequently

    ∑_n [(−1)ⁿ/n] P{S_n > 0} converges.

[HINT: For the first series, consider the last time that S_n ≤ 0; for the third series,
apply Du Bois-Reymond's test. Cf. Theorem 8.4.4 below; this exercise will
be completed in Exercise 15 of Sec. 8.5.]

8.4 Fine structure

In this section we embark on a probing in depth of the r.v. α defined in (11)
of Sec. 8.2 and some related r.v.'s.
The r.v. α being optional, the key to its introduction is to break up the
time sequence into a pre-α and a post-α era, as already anticipated in the
terminology employed with a general optional r.v. We do this with a sort of
characteristic functional of the process which has already made its appearance
in the last section:

(1)    ∑_{n=0}^∞ rⁿ E(e^{itS_n}) = ∑_{n=0}^∞ rⁿ f(t)ⁿ = 1/(1 − rf(t)),

where 0 < r < 1, t is real, and f is the ch.f. of X. Applying the principle just
enunciated, we break this up into two parts:

    E(∑_{n=0}^{α−1} rⁿ e^{itS_n}) + E(∑_{n=α}^∞ rⁿ e^{itS_n}),

with the understanding that on the set f˛ D 1g, the first sum above is 1 nD0
while the second is empty and hence equal to zero. Now the second part may
be written as
1  1
 
2 E ˛Cn itS˛Cn
r e DE r e ˛ itS˛
r n eitS˛Cn S˛  .
nD0 nD0

It follows from (7) of Sec. 8.2 that


S˛Cn  S˛ D Sno  ˛
has the same distribution as Sn , and by Theorem 8.2.2 that for each n it is
independent of S˛ . [Note that the same fact has been used more than once
before, but for a constant ˛.] Hence the right member of (2) is equal to
1
 1
E fr e gE
˛ itS˛
r n eitSn D E fr ˛ eitS˛ g ,
nD0
1  rft

where r ˛ eitS˛ is taken to be 0 for ˛ D 1. Substituting this into (1), we obtain


 ˛1
1 
3 [1  E fr e g] D E
˛ itS˛
r n eitSn .
1  rft nD0

We have

(4)    1 − E{r^α e^{itS_α}} = 1 − ∑_{n=1}^∞ rⁿ ∫_{α=n} e^{itS_n} dP;

and

(5)    E(∑_{n=0}^{α−1} rⁿ e^{itS_n}) = ∑_{k=1}^∞ ∫_{α=k} (∑_{n=0}^{k−1} rⁿ e^{itS_n}) dP
         = 1 + ∑_{n=1}^∞ rⁿ ∫_{α>n} e^{itS_n} dP

by an interchange of summation. Let us record the two power series appearing
in (4) and (5) as

    P(r, t) = 1 − E{r^α e^{itS_α}} = ∑_{n=0}^∞ rⁿ p_n(t);
    Q(r, t) = E(∑_{n=0}^{α−1} rⁿ e^{itS_n}) = ∑_{n=0}^∞ rⁿ q_n(t),

where

    p₀(t) ≡ 1,  p_n(t) = −∫_{α=n} e^{itS_n} dP = −∫_{R¹} e^{itx} U_n(dx);
    q₀(t) ≡ 1,  q_n(t) = ∫_{α>n} e^{itS_n} dP = ∫_{R¹} e^{itx} V_n(dx).

Now U_n(·) = P{α = n; S_n ∈ ·} is a finite measure with support in (0, ∞),
while V_n(·) = P{α > n; S_n ∈ ·} is a finite measure with support in (−∞, 0].
Thus each p_n is the Fourier transform of a finite signed measure in (0, ∞), while
each q_n is the Fourier transform of a finite measure in (−∞, 0]. Returning to
(3), we may now write it as

(6)    P(r, t) · 1/(1 − rf(t)) = Q(r, t).
The next step is to observe the familiar Taylor series:

    1/(1 − x) = exp(∑_{n=1}^∞ xⁿ/n), |x| < 1.

Thus we have

(7)    1/(1 − rf(t)) = exp{∑_{n=1}^∞ (rⁿ/n) f(t)ⁿ}
         = exp{∑_{n=1}^∞ (rⁿ/n)[∫_{S_n>0} e^{itS_n} dP + ∫_{S_n≤0} e^{itS_n} dP]}
         = f₊(r, t)^{−1} f₋(r, t),

where

    f₊(r, t) = exp{−∑_{n=1}^∞ (rⁿ/n) ∫_{(0,∞)} e^{itx} μⁿ(dx)},
    f₋(r, t) = exp{+∑_{n=1}^∞ (rⁿ/n) ∫_{(−∞,0]} e^{itx} μⁿ(dx)},

and μⁿ(·) = P{S_n ∈ ·} is the distribution of S_n. Since the convolution of
two measures both with support in (0, ∞) has support in (0, ∞), and the
convolution of two measures both with support in (−∞, 0] has support in
(−∞, 0], it follows by expansion of the exponential functions above and
rearrangements of the resulting double series that

    f₊(r, t) = 1 + ∑_{n=1}^∞ rⁿ φ_n(t),  f₋(r, t) = 1 + ∑_{n=1}^∞ rⁿ ψ_n(t),

where each φ_n is the Fourier transform of a signed measure in (0, ∞), while each
ψ_n is the Fourier transform of a signed measure in (−∞, 0]. Substituting (7) into (6)
and multiplying through by f₊(r, t), we obtain

(8)    P(r, t) f₋(r, t) = Q(r, t) f₊(r, t).

The next theorem below supplies the basic analytic technique for this development,
known as the Wiener–Hopf technique.

Theorem 8.4.1. Let

    P(r, t) = ∑_{n=0}^∞ rⁿ p_n(t),  Q(r, t) = ∑_{n=0}^∞ rⁿ q_n(t),
    P*(r, t) = ∑_{n=0}^∞ rⁿ p*_n(t),  Q*(r, t) = ∑_{n=0}^∞ rⁿ q*_n(t),

where p₀(t) ≡ q₀(t) ≡ p*₀(t) ≡ q*₀(t) ≡ 1; and for n ≥ 1, p_n and p*_n as functions
of t are Fourier transforms of measures with support in (0, ∞); q_n and
q*_n as functions of t are Fourier transforms of measures in (−∞, 0]. Suppose
that for some r₀ > 0 the four power series converge for r in (0, r₀) and all
real t, and the identity

(9)    P(r, t) Q*(r, t) ≡ P*(r, t) Q(r, t)

holds there. Then

    P ≡ P*,  Q ≡ Q*.

The theorem is also true if (0, ∞) and (−∞, 0] are replaced by [0, ∞) and
(−∞, 0), respectively.
PROOF. It follows from (9) and the identity theorem for power series that
for every n ≥ 0:

(10)    ∑_{k=0}^n p_k(t) q*_{n−k}(t) = ∑_{k=0}^n p*_k(t) q_{n−k}(t).

Then for n = 1 equation (10) reduces to the following:

    p₁(t) − p*₁(t) = q₁(t) − q*₁(t).

By hypothesis, the left member above is the Fourier transform of a finite signed
measure ν₁ with support in (0, ∞), while the right member is the Fourier
transform of a finite signed measure ν₂ with support in (−∞, 0]. It follows from
the uniqueness theorem for such transforms (Exercise 13 of Sec. 6.2) that
we must have ν₁ ≡ ν₂, and so both must be identically zero since they have
disjoint supports. Thus p₁ ≡ p*₁ and q₁ ≡ q*₁. To proceed by induction on n,
suppose that we have proved that p_j ≡ p*_j and q_j ≡ q*_j for 0 ≤ j ≤ n − 1.
Then it follows from (10) that

    p_n(t) + q*_n(t) ≡ p*_n(t) + q_n(t).

Exactly the same argument as before yields that p_n ≡ p*_n and q_n ≡ q*_n. Hence
the induction is complete and the first assertion of the theorem is proved; the
second is proved in the same way.

Applying the preceding theorem to (8), we obtain the next theorem in
the case A = (0, ∞); the rest is proved in exactly the same way.

Theorem 8.4.2. If α = α_A is the first entrance time into A, where A is one
of the four sets: (0, ∞), [0, ∞), (−∞, 0), (−∞, 0], then we have

(11)    1 − E{r^α e^{itS_α}} = exp{−∑_{n=1}^∞ (rⁿ/n) ∫_{S_n∈A} e^{itS_n} dP};

(12)    E(∑_{n=0}^{α−1} rⁿ e^{itS_n}) = exp{+∑_{n=1}^∞ (rⁿ/n) ∫_{S_n∈A^c} e^{itS_n} dP}.

From this result we shall deduce certain analytic expressions involving
the r.v. α. Before we do that, let us list a number of classical Abelian and
Tauberian theorems below for ready reference.

(A) If c_n ≥ 0 and ∑_{n=0}^∞ c_n rⁿ converges for 0 ≤ r < 1, then

(*)    lim_{r↑1} ∑_{n=0}^∞ c_n rⁿ = ∑_{n=0}^∞ c_n,

finite or infinite.
(B) If c_n are complex numbers and ∑_{n=0}^∞ c_n rⁿ converges for 0 ≤ r ≤ 1,
then (*) is true.
(C) If c_n are complex numbers such that c_n = o(n^{−1}) [or just O(n^{−1})]
as n → ∞, and the limit in the left member of (*) exists and is finite, then
(*) is true.
(D) If c_n^{(i)} ≥ 0, ∑_{n=0}^∞ c_n^{(i)} rⁿ converges for 0 ≤ r < 1 and diverges for
r = 1, i = 1, 2; and

    ∑_{k=0}^n c_k^{(1)} ∼ K ∑_{k=0}^n c_k^{(2)}

[or more particularly c_n^{(1)} ∼ K c_n^{(2)}] as n → ∞, where 0 ≤ K ≤ +∞, then

    ∑_{n=0}^∞ c_n^{(1)} rⁿ ∼ K ∑_{n=0}^∞ c_n^{(2)} rⁿ

as r ↑ 1.
(E) If c_n ≥ 0 and

    ∑_{n=0}^∞ c_n rⁿ ∼ 1/(1 − r)

as r ↑ 1, then

    ∑_{k=0}^{n−1} c_k ∼ n.

Observe that (C) is a partial converse of (B), and (E) is a partial converse
of (D). There is also an analogue of (D), which is actually a consequence of
(B): if c_n are complex numbers converging to a finite limit c, then as r ↑ 1,

    ∑_{n=0}^∞ c_n rⁿ ∼ c/(1 − r).

Proposition (A) is trivial. Proposition (B) is Abel's theorem, and proposition
(C) is Tauber's theorem in the "little o" version and Littlewood's theorem
in the "big O" version; only the former will be needed below. Proposition (D)
is an Abelian theorem, and proposition (E) a Tauberian theorem, the latter
being sometimes referred to as that of Hardy–Littlewood–Karamata. All four
can be found in the admirable book by Titchmarsh, The theory of functions
(2nd ed., Oxford University Press, Inc., New York, 1939, pp. 9–10, 224 ff.).

Theorem 8.4.3. The generating function of α in Theorem 8.4.2 is given by

(13)    E{r^α} = 1 − exp{−∑_{n=1}^∞ (rⁿ/n) P[S_n ∈ A]}
          = 1 − (1 − r) exp{+∑_{n=1}^∞ (rⁿ/n) P[S_n ∈ A^c]}.

We have

(14)    P{α < ∞} = 1 if and only if ∑_{n=1}^∞ (1/n) P[S_n ∈ A] = ∞;

in which case

(15)    E{α} = exp{+∑_{n=1}^∞ (1/n) P[S_n ∈ A^c]}.

PROOF. Setting t = 0 in (11), we obtain the first equation in (13), from
which the second follows at once through

    1/(1 − r) = exp{∑_{n=1}^∞ rⁿ/n} = exp{∑_{n=1}^∞ (rⁿ/n) P[S_n ∈ A] + ∑_{n=1}^∞ (rⁿ/n) P[S_n ∈ A^c]}.

Since

    lim_{r↑1} E{r^α} = lim_{r↑1} ∑_{n=1}^∞ P{α = n} rⁿ = ∑_{n=1}^∞ P{α = n} = P{α < ∞}

by proposition (A), the middle term in (13) tends to a finite limit, hence also
the power series in r there (why?). By proposition (A), the said limit may be
obtained by setting r = 1 in the series. This establishes (14). Finally, setting
t = 0 in (12), we obtain

(16)    E(∑_{n=0}^{α−1} rⁿ) = exp{+∑_{n=1}^∞ (rⁿ/n) P[S_n ∈ A^c]}.

Rewriting the left member in (16) as in (5) and letting r ↑ 1, we obtain

(17)    lim_{r↑1} ∑_{n=0}^∞ rⁿ P[α > n] = ∑_{n=0}^∞ P[α > n] = E{α} ≤ ∞

by proposition (A). The right member of (16) tends to the right member of
(15) by the same token, proving (15).
When is E{α} in (15) finite? This is answered by the next theorem.
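Formula (13) lends itself to a numerical check. The sketch below is an illustration only: it takes the symmetric ±1 walk with A = (0, ∞), computes P[S_n > 0] exactly from the binomial distribution, and compares the right member of (13) at r = 0.9 with a Monte Carlo estimate of E{r^α}; the truncation N and sample size are arbitrary choices.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(3)
r, N, reps = 0.9, 200, 20_000

n = np.arange(1, N + 1)
p_pos = binom.sf(n / 2, n, 0.5)                  # exact P[S_n > 0] for the +-1 walk
rhs = 1 - np.exp(-np.sum(r**n / n * p_pos))      # right member of (13), truncated

paths = np.cumsum(rng.choice([-1, 1], size=(reps, N)), axis=1)
ever = (paths > 0).any(axis=1)
alpha = (paths > 0).argmax(axis=1) + 1.0         # first n with S_n > 0, where 'ever'
lhs = np.where(ever, r**alpha, 0.0).mean()       # Monte Carlo E{r^alpha}
print(lhs, rhs)                                  # agree to about two decimals
```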

Theorem 8.4.4. Suppose that X ≢ 0 and at least one of E(X⁺) and E(X⁻)
is finite; then

(18)    E(X) > 0 ⟹ E{α_{(0,∞)}} < ∞;
(19)    E(X) ≤ 0 ⟹ E{α_{[0,∞)}} = ∞.

[It can be shown that S_n → +∞ a.e. if and only if E{α_{(0,∞)}} < ∞; see A. J.
Lemoine, Annals of Probability 2 (1974).]

PROOF. If E(X) > 0, then P{S_n → +∞} = 1 by the strong law of large
numbers. Hence P{lim inf_{n→∞} S_n = −∞} = 0, and this implies by the dual
of Theorem 8.2.4 that P{α_{(−∞,0)} < ∞} < 1. Let us sharpen this slightly to
P{α_{(−∞,0]} < ∞} < 1. To see this, write α′ = α_{(−∞,0]} and consider S_{β′ⁿ} as
in the proof of Theorem 8.2.4. Clearly S_{β′ⁿ} ≤ 0, and so if P{α′ < ∞} = 1,
one would have P{S_n ≤ 0 i.o.} = 1, which is impossible. Now apply (14) to
α_{(−∞,0]} and (15) to α_{(0,∞)} to infer

    E{α_{(0,∞)}} = exp{∑_{n=1}^∞ (1/n) P[S_n ≤ 0]} < ∞,

proving (18). Next if E(X) = 0, then P{α_{(−∞,0)} < ∞} = 1 by Theorem 8.3.4.
Hence, applying (14) to α_{(−∞,0)} and (15) to α_{[0,∞)}, we infer

    E{α_{[0,∞)}} = exp{∑_{n=1}^∞ (1/n) P[S_n < 0]} = ∞.

Finally, if E(X) < 0, then P{α_{[0,∞)} = ∞} > 0 by an argument dual to that
given above for α_{(−∞,0]}, and so E{α_{[0,∞)}} = ∞ trivially.
Incidentally we have shown that the two r.v.'s α_{(0,∞)} and α_{[0,∞)} have
both finite or both infinite expectations. Comparing this remark with (15), we
derive an analytic by-product as follows.

Corollary. We have

(20)    ∑_{n=1}^∞ (1/n) P[S_n = 0] < ∞.

This can also be shown, purely analytically, by means of Exercise 25 of
Sec. 6.4.

The astonishing part of Theorem 8.4.4 is the case when the random walk
is recurrent, which is the case if E(X) = 0 by Theorem 8.3.4. Then the set
[0, ∞), which is more than half of the whole range, is revisited an infinite
number of times. Nevertheless (19) says that the expected time for even one
visit is infinite! This phenomenon becomes more paradoxical if one reflects
that the same is true for the other half (−∞, 0], and yet in a single step one
of the two halves will certainly be visited. Thus we have:

    α_{(−∞,0]} ∧ α_{[0,∞)} = 1,  E{α_{(−∞,0]}} = E{α_{[0,∞)}} = ∞.

Another curious by-product concerning the strong law of large numbers
is obtained by combining Theorem 8.4.3 with Theorem 8.2.4.
Theorem 8.4.5. S_n/n → m a.e. for a finite constant m if and only if for
every ε > 0 we have

(21)    ∑_{n=1}^∞ (1/n) P{|S_n/n − m| > ε} < ∞.

PROOF. Without loss of generality we may suppose m = 0. We know from
Theorem 5.4.2 that S_n/n → 0 a.e. if and only if E(|X|) < ∞ and E(X) = 0.
If this is so, consider the stationary independent process {X′_n, n ∈ N}, where
X′_n = X_n − ε, ε > 0; and let S′_n and α′ = α′_{(0,∞)} be the corresponding r.v.'s
for this modified process. Since E(X′) = −ε, it follows from the strong law
of large numbers that S′_n → −∞ a.e., and consequently by Theorem 8.2.4 we
have P{α′ < ∞} < 1. Hence we have by (14) applied to α′:

(22)    ∑_{n=1}^∞ (1/n) P[S_n − nε > 0] < ∞.

By considering X_n + ε instead of X_n − ε, we obtain a similar result with
"S_n − nε > 0" in (22) replaced by "S_n + nε < 0". Combining the two, we
obtain (21) when m = 0.
Conversely, if (21) is true with m = 0, then the argument above yields
P{α′ < ∞} < 1, and so by Theorem 8.2.4, P{lim sup_{n→∞} S′_n = +∞} = 0. A
fortiori we have

    ∀ε > 0: P{S′_n > nε i.o.} = P{S_n > 2nε i.o.} = 0.

Similarly we obtain ∀ε > 0: P{S_n < −2nε i.o.} = 0, and the last two relations
together mean exactly S_n/n → 0 a.e. (cf. Theorem 4.2.2).
Having investigated the stopping time α, we proceed to investigate the
stopping place S_α, where α < ∞. The crucial case will be handled first.

Theorem 8.4.6. If E(X) = 0 and 0 < E(X²) = σ² < ∞, then

(23)    E{S_α} = (σ/√2) exp{∑_{n=1}^∞ (1/n)[(1/2) − P(S_n ∈ A)]} < ∞.

PROOF. Observe that E(X) = 0 implies each of the four r.v.'s α is finite
a.e. by Theorem 8.3.4. We now switch from Fourier transform to Laplace
transform in (11), and suppose for the sake of definiteness that A = (0, ∞).
It is readily verified that Theorem 6.6.5 is applicable, which yields

(24)    1 − E{r^α e^{−λS_α}} = exp{−∑_{n=1}^∞ (rⁿ/n) ∫_{S_n>0} e^{−λS_n} dP}

for 0 ≤ r < 1, 0 ≤ λ < ∞. Letting r ↑ 1 in (24), we obtain an expression for
the Laplace transform of S_α, but we must go further by differentiating (24)
with respect to λ to obtain

(25)    E{r^α S_α e^{−λS_α}}
          = [∑_{n=1}^∞ (rⁿ/n) ∫_{S_n>0} S_n e^{−λS_n} dP] · exp{−∑_{n=1}^∞ (rⁿ/n) ∫_{S_n>0} e^{−λS_n} dP},

the justification for termwise differentiation being easy, since E{|S_n|} ≤ n E{|X|}.
If we now set λ = 0 in (25), the result is

(26)    E{r^α S_α} = [∑_{n=1}^∞ (rⁿ/n) E(S_n⁺)] exp{−∑_{n=1}^∞ (rⁿ/n) P[S_n > 0]}.

By Exercise 2 of Sec. 6.4, we have as n → ∞,

    E(S_n⁺) ∼ σ √(n/(2π)),

so that the coefficients in the first power series in (26) are asymptotically equal
to σ/√2 times those of

    (1 − r)^{−1/2} = ∑_{n=0}^∞ (1/2^{2n}) C(2n, n) rⁿ,

since

    (1/2^{2n}) C(2n, n) ∼ 1/√(πn).

It follows from proposition (D) above that

    ∑_{n=1}^∞ (rⁿ/n) E(S_n⁺) ∼ (σ/√2)(1 − r)^{−1/2} = (σ/√2) exp{+∑_{n=1}^∞ rⁿ/(2n)}.

Substituting into (26), and observing that as r ↑ 1, the left member of (26)
tends to E{S_α} ≤ ∞ by the monotone convergence theorem, we obtain

(27)    E{S_α} = (σ/√2) lim_{r↑1} exp{∑_{n=1}^∞ (rⁿ/n)[(1/2) − P(S_n > 0)]}.

It remains to prove that the limit above is finite, for then the limit of the
power series is also finite (why? it is precisely here that the Laplace transform
saves the day for us), and since the coefficients are o(1/n) by the central
limit theorem, and certainly O(1/n) in any event, proposition (C) above will
identify it as the right member of (23) with A = (0, ∞).
Now by analogy with (27), replacing (0, ∞) by (−∞, 0] and writing
α_{(−∞,0]} as β, we have

(28)    E{S_β} = −(σ/√2) lim_{r↑1} exp{∑_{n=1}^∞ (rⁿ/n)[(1/2) − P(S_n ≤ 0)]}.

Clearly the product of the two exponentials in (27) and (28) is just exp(0) = 1,
hence if the limit in (27) were +∞, that in (28) would have to be 0. But
since E(X) = 0 and E(X²) > 0, we have P(X < 0) > 0, which implies at
once P(S_β < 0) > 0 and consequently E{S_β} < 0. This contradiction proves
that the limits in (27) and (28) must both be finite, and the theorem is proved.
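As a sanity check on (23): for the symmetric ±1 walk with A = (0, ∞) we have S_α = 1 identically, so the right member must equal 1 (here σ = 1). The sketch below (illustrative only) evaluates the partial sums of the exponent with exact binomial probabilities; the series converges slowly, so a large truncation is used.

```python
import numpy as np
from scipy.stats import binom

N = 200_000                                 # truncation; tail error is O(N**-0.5)
n = np.arange(1, N + 1)
p_pos = binom.sf(n / 2, n, 0.5)             # exact P[S_n > 0] for the +-1 walk
c = np.sum((0.5 - p_pos) / n)               # exponent in (23) with A = (0, inf)
print(np.exp(c) / np.sqrt(2))               # should be close to E{S_alpha} = 1
```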

Theorem 8.4.7. Suppose that X ≢ 0 and at least one of E(X⁺) and E(X⁻)
is finite; and let α = α_{(0,∞)}, β = α_{(−∞,0]}.

(i) If E(X) > 0 (it may be +∞), then E(S_α) = E(α) E(X).
(ii) If E(X) = 0, then E(S_α) and E(S_β) are both finite if and only if
E(X²) < ∞.

PROOF. The assertion (i) is a consequence of (18) and Wald's equation
(Theorem 5.5.3 and Exercise 8 of Sec. 5.5). The "if" part of assertion (ii)
has been proved in the preceding theorem; indeed we have even "evaluated"
E(S_α). To prove the "only if" part, we apply (11) to both α and β in the
preceding proof and multiply the results together to obtain the remarkable
equation:

(29)    [1 − E{r^α e^{itS_α}}][1 − E{r^β e^{itS_β}}] = exp{−∑_{n=1}^∞ (rⁿ/n) f(t)ⁿ} = 1 − rf(t).

Setting r = 1, we obtain for t ≠ 0:

    (1 − f(t))/t² = [(1 − E{e^{itS_α}})/(−it)] · [(1 − E{e^{itS_β}})/(it)].

Letting t ↓ 0, the right member above tends to E{S_α} E{−S_β} by Theorem 6.4.2.
Hence the left member has a real and finite limit. This implies E(X²) < ∞ by
Exercise 1 of Sec. 6.4.

8.5 Continuation

Our next study concerns the r.v.'s M_n and M defined in (12) and (13) of
Sec. 8.2. It is convenient to introduce a new r.v. L_n, which is the first time
(necessarily belonging to N⁰_n) that M_n is attained:

(1)    ∀n ∈ N⁰: L_n(ω) = min{k ∈ N⁰_n: S_k(ω) = M_n(ω)};

note that L₀ = 0. We shall also use the abbreviation

(2)    α = α_{(0,∞)}, β = α_{(−∞,0]}

as in Theorems 8.4.6 and 8.4.7.
For each n consider the special permutation below:

(3)    π_n: (1, 2, . . . , n) → (n, n − 1, . . . , 1),

which amounts to a "reversal" of the ordered set of indices in N_n. Recalling
that β ∘ π_n is the r.v. whose value at ω is β(π_n ω), and S_k ∘ π_n = S_n − S_{n−k}
for 1 ≤ k ≤ n, we have

    {β ∘ π_n > n} = ⋂_{k=1}^n {S_k ∘ π_n > 0}
      = ⋂_{k=1}^n {S_n > S_{n−k}} = ⋂_{k=0}^{n−1} {S_n > S_k} = {L_n = n}.
It follows from (8) of Sec. 8.1 that

(4)    ∫_{β>n} e^{itS_n} dP = ∫_{β∘π_n>n} e^{it(S_n∘π_n)} dP = ∫_{L_n=n} e^{itS_n} dP.

Applying (5) and (12) of Sec. 8.4 to β and substituting (4), we obtain

(5)    ∑_{n=0}^∞ rⁿ ∫_{L_n=n} e^{itS_n} dP = exp{+∑_{n=1}^∞ (rⁿ/n) ∫_{S_n>0} e^{itS_n} dP};

applying (5) and (12) of Sec. 8.4 to α and substituting the obvious relation
{α > n} = {L_n = 0}, we obtain

(6)    ∑_{n=0}^∞ rⁿ ∫_{L_n=0} e^{itS_n} dP = exp{+∑_{n=1}^∞ (rⁿ/n) ∫_{S_n≤0} e^{itS_n} dP}.

We are ready for the main results for M_n and M, known as Spitzer's identity:

Theorem 8.5.1. We have for 0 < r < 1:

(7)    ∑_{n=0}^∞ rⁿ E{e^{itM_n}} = exp{∑_{n=1}^∞ (rⁿ/n) E(e^{itS_n⁺})}.

M is finite a.e. if and only if

(8)    ∑_{n=1}^∞ (1/n) P{S_n > 0} < ∞,

in which case we have

(9)    E{e^{itM}} = exp{∑_{n=1}^∞ (1/n)[E(e^{itS_n⁺}) − 1]}.
PROOF. Observe the basic equation that follows at once from the meaning
of L_n:

(10)    {L_n = k} = {L_k = k} ∩ {L_{n−k} ∘ θ^k = 0},

where θ^k is the kth iterate of the shift. Since the two events on the right side
of (10) are independent, we obtain for each n ∈ N⁰, and real t and u:

(11)    E{e^{itM_n} e^{iu(S_n−M_n)}} = ∑_{k=0}^n ∫_{L_n=k} e^{itS_k} e^{iu(S_n−S_k)} dP
          = ∑_{k=0}^n ∫_{L_k=k} e^{itS_k} dP ∫_{L_{n−k}∘θ^k=0} e^{iu(S_{n−k}∘θ^k)} dP
          = ∑_{k=0}^n ∫_{L_k=k} e^{itS_k} dP ∫_{L_{n−k}=0} e^{iuS_{n−k}} dP.

It follows that

(12)    ∑_{n=0}^∞ rⁿ E{e^{itM_n} e^{iu(S_n−M_n)}}
          = [∑_{n=0}^∞ rⁿ ∫_{L_n=n} e^{itS_n} dP] · [∑_{n=0}^∞ rⁿ ∫_{L_n=0} e^{iuS_n} dP].

Setting u = 0 in (12) and using (5) as it stands and (6) with t = 0, we obtain

    ∑_{n=0}^∞ rⁿ E{e^{itM_n}} = exp{∑_{n=1}^∞ (rⁿ/n)[∫_{S_n>0} e^{itS_n} dP + ∫_{S_n≤0} 1 dP]},

which reduces to (7).
Next, by Theorem 8.2.4, M < ∞ a.e. if and only if P{α < ∞} < 1 or
equivalently by (14) of Sec. 8.4 if and only if (8) holds. In this case the
convergence theorem for ch.f.'s asserts that

    E{e^{itM}} = lim_{n→∞} E{e^{itM_n}},

and consequently

    E{e^{itM}} = lim_{r↑1} (1 − r) ∑_{n=0}^∞ rⁿ E{e^{itM_n}}
      = lim_{r↑1} exp{−∑_{n=1}^∞ rⁿ/n} exp{∑_{n=1}^∞ (rⁿ/n) E(e^{itS_n⁺})}
      = lim_{r↑1} exp{∑_{n=1}^∞ (rⁿ/n)[E(e^{itS_n⁺}) − 1]},

where the first equation is by proposition (B) in Sec. 8.4. Since

    ∑_{n=1}^∞ (1/n) |E(e^{itS_n⁺}) − 1| ≤ ∑_{n=1}^∞ (2/n) P[S_n > 0] < ∞

by (8), the last-written limit above is equal to the right member of (9), by
proposition (B). Theorem 8.5.1 is completely proved.
By switching to Laplace transforms in (7) as done in the proof of
Theorem 8.4.6 and using proposition (E) in Sec. 8.4, it is possible to give
an "analytic" derivation of the second assertion of Theorem 8.5.1 without
recourse to the "probabilistic" result in Theorem 8.2.4. This kind of mathematical
gambit should appeal to the curious as well as the obstinate; see Exercise 9
below. Another interesting exercise is to establish the following result:

(13)    E(M_n) = ∑_{k=1}^n (1/k) E(S_k⁺)

by differentiating (7) with respect to t. But a neat little formula such as (13)
deserves a simpler proof, so here it is.

Proof of (13). Writing

    M_n = max_{0≤j≤n} S_j = (max_{1≤j≤n} S_j)⁺

and dropping "dP" in the integrals below:

    E(M_n) = ∫_{M_n>0} max_{1≤k≤n} S_k
      = ∫_{M_n>0; S_n>0} {X₁ + [0 ∨ max_{2≤k≤n} (S_k − X₁)]} + ∫_{M_n>0; S_n≤0} max_{1≤k≤n} S_k
      = ∫_{S_n>0} X₁ + ∫_{S_n>0} [0 ∨ max_{2≤k≤n} (S_k − X₁)] + ∫_{M_{n−1}>0; S_n≤0} max_{1≤k≤n−1} S_k.

Call the last three integrals ∫₁, ∫₂, and ∫₃. We have on grounds of symmetry

    ∫₁ = (1/n) ∫_{S_n>0} S_n.

Apply the cyclical permutation

    (1, 2, . . . , n) → (2, 3, . . . , 1)

to ∫₂ to obtain

    ∫₂ = ∫_{S_n>0} M_{n−1}.

Obviously, we have

    ∫₃ = ∫_{M_{n−1}>0; S_n≤0} M_{n−1} = ∫_{S_n≤0} M_{n−1}.

Gathering these, we obtain

    E(M_n) = (1/n) ∫_{S_n>0} S_n + ∫_{S_n>0} M_{n−1} + ∫_{S_n≤0} M_{n−1}
      = (1/n) E(S_n⁺) + E(M_{n−1}),

and (13) follows by recursion.
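Formula (13) is also easy to verify by simulation; the sketch below is illustrative only (normal steps, arbitrary sizes) and compares both sides:

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 20, 200_000

S = np.cumsum(rng.standard_normal((reps, n)), axis=1)
M_n = np.maximum(S.max(axis=1), 0.0)       # M_n includes S_0 = 0
lhs = M_n.mean()
rhs = sum(np.maximum(S[:, k - 1], 0.0).mean() / k for k in range(1, n + 1))
print(lhs, rhs)                            # (13): E(M_n) = sum_{k<=n} E(S_k^+)/k
```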
Another interesting quantity in the development was the "number of
strictly positive terms" in the random walk. We shall treat this by combinatorial
methods as an antidote to the analytic skulduggery above. Let us define

    ν_n(ω) = the number of k in N_n such that S_k(ω) > 0;
    ν′_n(ω) = the number of k in N_n such that S_k(ω) ≤ 0.

For easy reference let us repeat two previous definitions together with two
new ones below, for n ∈ N⁰:

    M_n(ω) = max_{0≤j≤n} S_j(ω);  L_n(ω) = min{j ∈ N⁰_n: S_j(ω) = M_n(ω)};
    M′_n(ω) = min_{0≤j≤n} S_j(ω);  L′_n(ω) = max{j ∈ N⁰_n: S_j(ω) = M′_n(ω)}.

Since the reversal given in (3) is a 1-to-1 measure preserving mapping
that leaves S_n unchanged, it is clear that for each Λ ∈ F_∞, the measures on
B¹ below are equal:

(14)    P{Λ; S_n ∈ ·} = P{π_n(Λ); S_n ∈ ·}.
Lemma. For each n ∈ N⁰, the two random vectors

    (L_n, S_n) and (n − L′_n, S_n)

have the same distribution.

PROOF. This will follow from (14) if we show that

(15)    ∀k ∈ N⁰_n: π_n^{−1}{L_n = k} = {L′_n = n − k}.

Now for π_n ω the first n + 1 partial sums S_j(π_n ω), j ∈ N⁰_n, are

    0, ω_n, ω_n + ω_{n−1}, . . . , ω_n + ··· + ω_{n−j+1}, . . . , ω_n + ··· + ω₁,

which is the same as

    S_n − S_n, S_n − S_{n−1}, S_n − S_{n−2}, . . . , S_n − S_{n−j}, . . . , S_n − S₀,

from which (15) follows by inspection.

Theorem 8.5.2. For each n ∈ N⁰, the random vectors

(16)    (L_n, S_n) and (ν_n, S_n)

have the same distribution; and the random vectors

(16′)    (L′_n, S_n) and (ν′_n, S_n)

have the same distribution.

PROOF. For n = 0 there is nothing to prove; for n = 1, the assertion about
(16) is trivially true since {L₁ = 0} = {S₁ ≤ 0} = {ν₁ = 0}; similarly for (16′).
We shall prove the general case by simultaneous induction, supposing that both
assertions have been proved when n is replaced by n − 1. For each k ∈ N⁰_{n−1}
and y ∈ R¹, let us put

    G(y) = P{L_{n−1} = k; S_{n−1} ≤ y},  H(y) = P{ν_{n−1} = k; S_{n−1} ≤ y}.

Then the induction hypothesis implies that G ≡ H. Since X_n is independent
of F_{n−1} and so of the vector (L_{n−1}, S_{n−1}), we have for each x ∈ R¹:

(17)    P{L_{n−1} = k; S_n ≤ x} = ∫_{−∞}^∞ F(x − y) dG(y)
          = ∫_{−∞}^∞ F(x − y) dH(y) = P{ν_{n−1} = k; S_n ≤ x},

where F is the common d.f. of each X_n. Now observe that on the set {S_n ≤ 0}
we have L_n = L_{n−1} by definition, hence if x ≤ 0:

(18)    {ω: L_n(ω) = k; S_n(ω) ≤ x} = {ω: L_{n−1}(ω) = k; S_n(ω) ≤ x}.

On the other hand, on the set {S_n ≤ 0} we have ν_{n−1} = ν_n, so that if x ≤ 0,

(19)    {ω: ν_n(ω) = k; S_n(ω) ≤ x} = {ω: ν_{n−1}(ω) = k; S_n(ω) ≤ x}.

Combining (17) with (18) and (19), we obtain

(20)    ∀k ∈ N⁰_n, x ≤ 0: P{L_n = k; S_n ≤ x} = P{ν_n = k; S_n ≤ x}.

Next if k ∈ N⁰_n, and x ≥ 0, then by similar arguments:

    {ω: L′_n(ω) = n − k; S_n(ω) > x} = {ω: L′_{n−1}(ω) = n − k; S_n(ω) > x},
    {ω: ν′_n(ω) = n − k; S_n(ω) > x} = {ω: ν′_{n−1}(ω) = n − k; S_n(ω) > x}.

Using (16′) when n is replaced by n − 1, we obtain the analogue of (20):

(21)    ∀k ∈ N⁰_n, x ≥ 0: P{L′_n = n − k; S_n > x} = P{ν′_n = n − k; S_n > x}.

The left members in (21) and (22) below are equal by the lemma above, while
the right members are trivially equal since ν_n + ν′_n = n:

(22)    ∀k ∈ N⁰_n, x ≥ 0: P{L_n = k; S_n > x} = P{ν_n = k; S_n > x}.

Combining (20) and (22) for x = 0, we have

(23)    ∀k ∈ N⁰_n: P{L_n = k} = P{ν_n = k};

subtracting (22) from (23), we obtain the equation in (20) for x ≥ 0; hence it
is true for every x, proving the assertion about (16). Similarly for (16′), and
the induction is complete.
As an immediate consequence of Theorem 8.5.2, the obvious relation
(10) is translated into the by-no-means obvious relation (24) below:

Theorem 8.5.3. We have for k ∈ N⁰_n:

(24)    P{ν_n = k} = P{ν_k = k} P{ν_{n−k} = 0}.

If the common distribution of each X_n is symmetric with no atom at zero,
then

(25)    ∀k ∈ N⁰_n: P{ν_n = k} = (−1)ⁿ C(−1/2, k) C(−1/2, n − k),

where C(a, j) denotes the binomial coefficient.

PROOF. Let us denote the number on the right side of (25), which is equal to

    (1/2^{2n}) C(2k, k) C(2(n − k), n − k),

by a_n(k). Then for each n ∈ N, {a_n(k), k ∈ N⁰_n} is a well-known probability
distribution. For n = 1 we have

    P{ν₁ = 0} = P{ν₁ = 1} = 1/2 = a₁(0) = a₁(1),

so that (25) holds trivially. Suppose now that it holds when n is replaced by
n − 1; then for k ∈ N_{n−1} we have by (24):

    P{ν_n = k} = (−1)^k C(−1/2, k) · (−1)^{n−k} C(−1/2, n − k) = a_n(k).

It follows that

    P{ν_n = 0} + P{ν_n = n} = 1 − ∑_{k=1}^{n−1} P{ν_n = k}
      = 1 − ∑_{k=1}^{n−1} a_n(k) = a_n(0) + a_n(n).

Under the hypotheses of the theorem, it is clear by considering the dual random
walk that the two terms in the first member above are equal; since the two
terms in the last member are obviously equal, they are all equal and the
theorem is proved.
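A quick Monte Carlo check of (25), illustrative only; the step law below is standard normal, which is symmetric with no atom at zero:

```python
import numpy as np
from math import comb

rng = np.random.default_rng(5)
n, reps = 6, 400_000

S = np.cumsum(rng.standard_normal((reps, n)), axis=1)
nu = (S > 0).sum(axis=1)                       # nu_n = #{k <= n: S_k > 0}
empirical = np.bincount(nu, minlength=n + 1) / reps

a = [comb(2 * k, k) * comb(2 * (n - k), n - k) / 4**n for k in range(n + 1)]
print(np.round(empirical, 4))
print(np.round(a, 4))                          # (25): the two rows should agree
```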
Stirling’s formula and elementary calculus now lead to the famous “arcsin
law”, first discovered by Paul Lévy (1939) for Brownian motion.

Theorem 8.5.4. If the common distribution of a stationary independent


process is symmetric, then we have
! " 
n 2 p 1 x du
8x 2 [0, 1] : lim P  x D arc sin x D p .
n!1 n   0 u1  u

This limit theorem also holds for an independent, not necessarily stationary
process, in which each Xn has mean 0 and variance 1 and such that the
classical central limit theorem is applicable. This can be proved by the same
method (invariance principle) as Theorem 7.3.3.
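The arcsine law itself is just as easy to see numerically. A simulation sketch (illustrative; n, sample size, and x are arbitrary choices; with x = 0.25 the limit is exactly 1/3):

```python
import numpy as np

rng = np.random.default_rng(6)
n, reps, x = 400, 10_000, 0.25

S = np.cumsum(rng.standard_normal((reps, n)), axis=1)
frac_positive = (S > 0).mean(axis=1)           # nu_n / n for each path
print((frac_positive <= x).mean())             # empirical P{nu_n/n <= x}
print(2 / np.pi * np.arcsin(np.sqrt(x)))       # the arcsine limit, ~0.3333
```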
306 RANDOM WALK

EXERCISES

1. Derive (3) of Sec. 8.4 by considering

    E{r^α e^{itS_α}} = ∑_{n=1}^∞ rⁿ [∫_{α>n−1} e^{itS_n} dP − ∫_{α>n} e^{itS_n} dP].

*2. Under the conditions of Theorem 8.4.6, show that

    E{S_α} = −lim_{n→∞} ∫_{α>n} S_n dP.

3. Find an expression for the Laplace transform of S_α. Is the corresponding
formula for the Fourier transform valid?
*4. Prove (13) of Sec. 8.5 by differentiating (7) there; justify the steps.
5. Prove that

    ∑_{n=0}^∞ rⁿ P[M_n = 0] = exp{∑_{n=1}^∞ (rⁿ/n) P[S_n ≤ 0]}

and deduce that

    P[M = 0] = exp{−∑_{n=1}^∞ (1/n) P[S_n > 0]}.

6. If M < ∞ a.e., then it has an infinitely divisible distribution.
7. Prove that

    ∑_{n=0}^∞ P{L_n = 0; S_n = 0} = exp{∑_{n=1}^∞ (1/n) P[S_n = 0]}.

[HINT: One way to deduce this is to switch to Laplace transforms in (6) of
Sec. 8.5 and let λ → ∞.]
8. Prove that the left member of the equation in Exercise 7 is equal to

    [1 − ∑_{n=1}^∞ P{α′ = n; S_n = 0}]^{−1},

where α′ = α_{[0,∞)}; hence prove its convergence.
*9. Prove that

    ∑_{n=1}^∞ (1/n) P[S_n > 0] < ∞
implies P(M < ∞) = 1 as follows. From the Laplace transform version of
(7) of Sec. 8.5, show that for λ > 0,

    lim_{r↑1} (1 − r) ∑_{n=0}^∞ rⁿ E{e^{−λM_n}}

exists and is finite, say = ψ(λ). Now use proposition (E) in Sec. 8.4 and apply
the convergence theorem for Laplace transforms (Theorem 6.6.3).
*10. Define a sequence of r.v.'s {Y_n, n ∈ N⁰} as follows:

    Y₀ = 0, Y_{n+1} = (Y_n + X_{n+1})⁺, n ∈ N⁰,

where {X_n, n ∈ N} is a stationary independent sequence. Prove that for each
n, Y_n and M_n have the same distribution. [This approach is useful in queuing
theory.]
11. If −∞ ≤ E(X) < 0, then

    E(X) = E(V ∧ 0), where V = sup_{1≤j<∞} S_j.

[HINT: If V_n = max_{1≤j≤n} S_j, then

    E(e^{itV_n⁺}) + E(e^{it(V_n∧0)}) − 1 = E(e^{itV_n}) = E(e^{itM_{n−1}}) f(t);

let n → ∞, then t ↓ 0. For the case E(X) = −∞, truncate. This result is due
to S. Port.]
*12. If P{α_{(0,∞)} < ∞} < 1, then ν_n → ν, L_n → L, both limits being finite
a.e. and having the generating functions:

    E{r^ν} = E{r^L} = exp{∑_{n=1}^∞ ((rⁿ − 1)/n) P[S_n > 0]}.

[HINT: Consider lim_{m→∞} ∑_{n=0}^∞ P{ν_m = n} rⁿ and use (24) of Sec. 8.5.]
13. If E(X) = 0, E(X²) = σ², 0 < σ² < ∞, then as n → ∞ we have

    P[ν_n = 0] ∼ e^{c}/√(πn),  P[ν_n = n] ∼ e^{−c}/√(πn),

where

    c = ∑_{n=1}^∞ (1/n)[(1/2) − P(S_n > 0)].

[HINT: Consider

    lim_{r↑1} (1 − r)^{1/2} ∑_{n=0}^∞ rⁿ P[ν_n = 0]

as in the proof of Theorem 8.4.6, and use the following lemma: if {p_n} is a
decreasing sequence of positive numbers such that

    ∑_{k=1}^n p_k ∼ 2n^{1/2},

then p_n ∼ n^{−1/2}.]
*14. Prove Theorem 8.5.4.
15. For an arbitrary random walk, we have

    ∑_n [(−1)ⁿ/n] P{S_n > 0} converges.

[HINT: Half of the result is given in Exercise 18 of Sec. 8.3. For the remaining
case, apply proposition (C) in the O-form to equation (5) of Sec. 8.5 with L_n
replaced by ν_n and t = 0. This result is due to D. L. Hanson and M. Katz.]

Bibliographical Note

For Sec. 8.3, see


K. L. Chung and W. H. J. Fuchs, On the distribution of values of sums of random
variables, Mem. Am. Math. Soc. 6 (1951), 1–12.
K. L. Chung and D. Ornstein, On the recurrence of sums of random variables,
Bull. Am. Math. Soc. 68 (1962) 30–32.
Theorems 8.4.1 and 8.4.2 are due to G. Baxter; see
G. Baxter, An analytical approach to finite fluctuation problems in probability,
J. d’Analyse Math. 9 (1961), 31–70.
Historically, recurrence problems for Bernoullian random walks were first discussed
by Pólya in 1921.
The rest of Sec. 8.4, as well as Theorem 8.5.1, is due largely to F. L. Spitzer; see
F. L. Spitzer, A combinatorial lemma and its application to probability theory,
Trans. Am. Math. Soc. 82 (1956), 323–339.
F. L. Spitzer, A Tauberian theorem and its probability interpretation, Trans. Am.
Math. Soc. 94 (1960), 150–169.
See also his book below — which, however, treats only the lattice case:
F. L. Spitzer, Principles of random walk. D. Van Nostrand Co., Inc., Princeton,
N.J., 1964.
8.5 CONTINUATION 309

The latter part of Sec. 8.5 is based on


W. Feller, On combinatorial methods in fluctuation theory, Surveys in Proba-
bility and Statistics (the Harald Cramér volume), 75–91. Almquist & Wiksell,
Stockholm, 1959.
Theorem 8.5.3 is due to E. S. Andersen. Feller [13] also contains much material on
random walks.
9  Conditioning. Markov property. Martingale

9.1 Basic properties of conditional expectation

If Λ is any set in F with P(Λ) > 0, we define P_Λ(·) on F as follows:

(1)    P_Λ(E) = P(Λ ∩ E)/P(Λ).

Clearly P_Λ is a p.m. on F; it is called the "conditional probability relative to
Λ". The integral with respect to this p.m. is called the "conditional expectation
relative to Λ":

(2)    E_Λ(Y) = ∫_Ω Y(ω) P_Λ(dω) = (1/P(Λ)) ∫_Λ Y(ω) P(dω).

If P(Λ) = 0, we decree that P_Λ(E) = 0 for every E ∈ F. This convention is
expedient as in (3) and (4) below.
Let now {Λ_n, n ≥ 1} be a countable measurable partition of Ω, namely:

    Ω = ⋃_{n=1}^∞ Λ_n, Λ_n ∈ F, Λ_m ∩ Λ_n = ∅ if m ≠ n.
Then we have

(3)    P(E) = ∑_{n=1}^∞ P(Λ_n ∩ E) = ∑_{n=1}^∞ P(Λ_n) P_{Λ_n}(E);

(4)    E(Y) = ∑_{n=1}^∞ ∫_{Λ_n} Y(ω) P(dω) = ∑_{n=1}^∞ P(Λ_n) E_{Λ_n}(Y),

provided that E(Y) is defined. We have already used such decompositions
before, for example in the proof of Kolmogorov's inequality (Theorem 5.3.1):

    ∫_Λ S_n² dP = ∑_{k=1}^n P(Λ_k) E_{Λ_k}(S_n²).

Another example is in the proof of Wald's equation (Theorem 5.5.3), where

    E(S_N) = ∑_{k=1}^∞ P(N = k) E_{N=k}(S_N).

Thus the notion of conditioning gives rise to a decomposition when a given
event or r.v. is considered on various parts of the sample space, on each of
which some particular information may be obtained.
Let us, however, reflect for a few moments on the even more elementary
example below. From a pack of 52 playing cards one card is drawn and seen
to be a spade. What is the probability that a second card drawn from the
remaining deck will also be a spade? Since there are 51 cards left, among
which are 12 spades, it is clear that the required probability is 12/51. But is
this the conditional probability P_Λ(E) defined above, where Λ = "first card
is a spade" and E = "second card is a spade"? According to the definition,

    P(Λ ∩ E)/P(Λ) = [(13·12)/(52·51)] / (13/52) = 12/51,

where the denominator and numerator have been separately evaluated by
elementary combinatorial formulas. Hence the answer to the question above is
indeed "yes"; but this verification would be futile if we did not have another
way to evaluate P_Λ(E) as first indicated. Indeed, conditional probability is
often used to evaluate a joint probability by turning the formula (1) around as
follows:

    P(Λ ∩ E) = P(Λ) P_Λ(E) = (13/52) · (12/51).

In general, it is used to reduce the calculation of a probability or expectation
to a modified one which can be handled more easily.
312 CONDITIONING. MARKOV PROPERTY. MARTINGALE

Let G be the Borel field generated by a countable partition f3n g, for


example that by a discrete r.v. X, where 3n D fX D an g. Given an integrable
r.v. Y, we define the function EG Y on  by:

5 EG Y D E3n Y13n Ð.
n

Thus EG Y is a discrete r.v. that assumes that value E3n Y on the set 3n ,
for each n. Now we can rewrite (4) as follows:
 
E Y D EG Y dP D EG Y dP .
n 3n 

Furthermore, for any 3 2 G , 3 is the union of a subcollection of the 3n ’s


(see Exercise 9 of Sec. 2.1), and the same manipulation yields
 
6 83 2 G : Y dP D EG Y dP .
3 3

In particular, this shows that EG Y is integrable. Formula (6) equates two
integrals over the same set with an essential difference: while the integrand
Y on the left belongs to F , the integrand EG Y on the right belongs to the
subfield G . [The fact that EG Y is discrete is incidental to the nature of G .]
It holds for every 3 in the subfield G , but not necessarily for a set in F nG .
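Formula (5) is directly computable. The sketch below (illustrative, not from the text) builds E_G(Y) for the partition generated by a discrete r.v. X, and checks the defining relation (6) on Λ = Ω:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100_000

X = rng.integers(0, 4, size=N)       # discrete r.v. generating the partition
Y = X + rng.normal(size=N)           # an integrable Y

E_G_Y = np.empty(N)
for a in np.unique(X):               # cell Lambda_a = {X = a}
    cell = (X == a)
    E_G_Y[cell] = Y[cell].mean()     # E_{Lambda_a}(Y), constant on the cell

# Defining relation (6) with Lambda = Omega: the two integrals (means) agree.
print(Y.mean(), E_G_Y.mean())
```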
Now suppose that there are two functions φ₁ and φ₂, both belonging to G,
such that

    ∀Λ ∈ G: ∫_Λ Y dP = ∫_Λ φ_i dP, i = 1, 2.

Let Λ = {ω: φ₁(ω) > φ₂(ω)}; then Λ ∈ G and so

    ∫_Λ (φ₁ − φ₂) dP = 0.

Hence P(Λ) = 0; interchanging φ₁ and φ₂ above we conclude that φ₁ = φ₂
a.e. We have therefore proved that the E_G(Y) in (6) is unique up to an equivalence.
Let us agree to use E_G(Y) or E(Y | G) to denote the corresponding
equivalence class, and let us call any particular member of the class a "version"
of the conditional expectation.
The results above are valid for an arbitrary Borel subfield G and will be
stated in the theorem below.

Theorem 9.1.1. If E(|Y|) < ∞ and G is a Borel subfield of F, then there
exists a unique equivalence class of integrable r.v.'s E(Y | G) belonging to G
such that (6) holds.
PROOF. Consider the set function ν on G:

    ∀Λ ∈ G: ν(Λ) = ∫_Λ Y dP.

It is finite-valued and countably additive, hence a "signed measure" on G.
If P(Λ) = 0, then ν(Λ) = 0; hence it is absolutely continuous with respect
to P: ν ≪ P. The theorem then follows from the Radon–Nikodym theorem
(see, e.g., Royden [5] or Halmos [4]), the resulting "derivative" dν/dP being
what we have denoted by E(Y | G).
Having established the existence and uniqueness, we may repeat the definition
as follows.
DEFINITION OF CONDITIONAL EXPECTATION. Given an integrable r.v. Y and a
Borel subfield G, the conditional expectation E(Y | G) of Y relative to G is
any one of the equivalence class of r.v.'s on Ω satisfying the two properties:

(a) it belongs to G;
(b) it has the same integral as Y over any set in G.

We shall refer to (b), or equivalently formula (6) above, as the "defining
relation" of the conditional expectation. In practice as well as in theory, the
identification of conditional expectations or relations between them is established
by verifying the two properties listed above. When Y = 1_Δ, where
Δ ∈ F, we write

    P(Δ | G) = E(1_Δ | G)

and call it the "conditional probability of Δ relative to G". Specifically, P(Δ |
G) is any one of the equivalence class of r.v.'s belonging to G and satisfying
the condition

(7)    ∀Λ ∈ G: P(Δ ∩ Λ) = ∫_Λ P(Δ | G) dP.

It follows from the definition that for an integrable r.v. Y and a Borel
subfield G, we have

    ∫_Λ [Y − E(Y | G)] dP = 0

for every Λ ∈ G, and consequently also

    E{[Y − E(Y | G)] Z} = 0

for every bounded Z ∈ G (why?). This implies the decomposition:

    Y = Y′ + Y″ where Y′ = E(Y | G) and Y″ ⊥ G,
where "Y″ ⊥ G" means that E(Y″Z) = 0 for every bounded Z ∈ G. In the
language of Banach space, Y′ is the "projection" of Y on G and Y″ its "orthogonal
complement".
For the Borel field F{X} generated by the r.v. X, we write also E(Y | X)
for E(Y | F{X}); similarly for E(Y | X₁, . . . , X_n). The next theorem clarifies
certain useful connections.

Theorem 9.1.2. One version of the conditional expectation E Y j X is given


by ϕX, where ϕ is a Borel measurable function on R1 . Furthermore, if we
define the signed measure  on B1 by

8B 2 B : B D
1
Y dP ,
x1 B
and the p.m. of X by , then ϕ is one version of the Radon–Nikodym deriva-
tive d/d .
PROOF. The first assertion of the theorem is a particular case of the
following lemma.

Lemma. If Z 2 F fXg, then Z D ϕX for some extended-valued Borel


measurable function ϕ.
PROOF OF THE LEMMA. It is sufficient to prove this for a bounded positive Z
(why?). Then there exists a sequence of simple functions Zm which increases
to Z everywhere, and each Zm is of the form


cj 13j
jD1

where 3j 2 F fXg. Hence 3j D X1 Bj  for some Bj 2 B1 (see Exercise 11


of Sec. 3.1). Thus if we take


ϕm D cj 1Bj ,
jD1

we have Zm D ϕm X. Since ϕm X ! Z, it follows that ϕm converges on the


range of X. But this range need not be Borel or even Lebesgue measurable
(Exercise 6 of Sec. 3.1). To overcome this nuisance, we put,
8x 2 R1 : ϕx D lim ϕm x.
m!1

Then Z D limm ϕm X D ϕX, and ϕ is Borel measurable, proving the lemma.
To prove the second assertion: given any B 2 B1 , let 3 D X1 B, then
by Theorem 3.2.2 we have
   
E Y j X dP D 1B XϕX dP D 1B xϕx d D ϕx d .
3  R1 B
Hence by (6),

\[
\nu(B) = \int_\Lambda Y\,dP = \int_B \varphi(x)\,d\mu.
\]

This being true for every $B$ in $\mathscr B^1$, it follows that $\varphi$ is a version of the derivative $d\nu/d\mu$. Theorem 9.1.2 is proved.

As a consequence of the theorem, the function $E(Y \mid X)$ of $\omega$ is constant a.e. on each set on which $X(\omega)$ is constant. By an abuse of notation, the $\varphi(x)$ above is sometimes written as $E(Y \mid X = x)$. We may then write, for example, for each real $c$:

\[
\int_{\{X \le c\}} Y\,dP = \int_{(-\infty, c]} E(Y \mid X = x)\,dP\{X \le x\}.
\]

Generalization to a finite number of $X$'s is straightforward. Thus one version of $E(Y \mid X_1, \ldots, X_n)$ is $\varphi(X_1, \ldots, X_n)$, where $\varphi$ is an $n$-dimensional Borel measurable function, and by $E(Y \mid X_1 = x_1, \ldots, X_n = x_n)$ is meant $\varphi(x_1, \ldots, x_n)$.
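When $X$ is discrete, the measures $\nu$ and $\mu$ of Theorem 9.1.2 reduce to sums over atoms and $\varphi = d\nu/d\mu$ is literally a ratio. The following Python sketch (our illustration, with arbitrary choices of $X$ and $Y$) computes $\varphi$ atom by atom.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch: X takes the values {0,1,2,3}, so nu and mu live on four atoms and
# phi = d(nu)/d(mu) is computed atom by atom.  Y = X + noise is arbitrary.
X = rng.integers(0, 4, size=200_000)
Y = X + rng.standard_normal(X.size)

for x in range(4):
    atom = X == x
    mu_x = atom.mean()             # mu({x}) = P{X = x}
    nu_x = Y[atom].sum() / X.size  # nu({x}) = integral of Y over X^{-1}({x})
    print(x, nu_x / mu_x)          # phi(x) = E(Y | X = x), close to x here
```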
It is worthwhile to point out the extreme cases of $E(Y \mid \mathscr G)$:

\[
E(Y \mid \mathscr J) = E(Y), \qquad E(Y \mid \mathscr F) = Y; \quad \text{a.e.},
\]

where $\mathscr J$ is the trivial field $\{\emptyset, \Omega\}$. If $\mathscr G$ is the field generated by one set $\Lambda$: $\{\emptyset, \Lambda, \Lambda^c, \Omega\}$, then $E(Y \mid \mathscr G)$ is equal to $E(Y \mid \Lambda)$ on $\Lambda$ and $E(Y \mid \Lambda^c)$ on $\Lambda^c$. All these equations, as hereafter, are between equivalence classes of r.v.'s.

We shall suppose the pair $(\mathscr F, P)$ to be complete and each Borel subfield $\mathscr G$ of $\mathscr F$ to be augmented (see Exercise 20 of Sec. 2.2). But even if $\mathscr G$ is not augmented and $\tilde{\mathscr G}$ is its augmentation, it follows from the definition that $E(Y \mid \mathscr G) = E(Y \mid \tilde{\mathscr G})$, since an r.v. belonging to $\tilde{\mathscr G}$ is equal to one belonging to $\mathscr G$ almost everywhere (why?). Finally, if $\mathscr G_0$ is a field generating $\mathscr G$, or just a collection of sets whose finite disjoint unions form such a field, then the validity of (6) for each $\Lambda$ in $\mathscr G_0$ is sufficient for (6) as it stands. This follows easily from Theorem 2.2.3.

The next result is basic.
Theorem 9.1.3. Let $Y$ and $YZ$ be integrable r.v.'s and $Z \in \mathscr G$; then we have

\[
(8) \qquad E(YZ \mid \mathscr G) = Z\,E(Y \mid \mathscr G) \quad \text{a.e.}
\]

[Here "a.e." is necessary, since we have not stipulated to regard $Z$ as an equivalence class of r.v.'s, although conditional expectations are so regarded by definition. Nevertheless we shall sometimes omit such obvious "a.e.'s" from now on.]

PROOF. As usual we may suppose $Y \ge 0$, $Z \ge 0$ (see property (ii) below). The proof consists in observing that the right member of (8) belongs to $\mathscr G$ and satisfies the defining relation for the left member, namely:

\[
(9) \qquad \forall \Lambda \in \mathscr G:\quad \int_\Lambda Z\,E(Y \mid \mathscr G)\,dP = \int_\Lambda ZY\,dP.
\]

For (9) is true if $Z = 1_\Delta$, where $\Delta \in \mathscr G$; hence it is true if $Z$ is a simple r.v. belonging to $\mathscr G$ and consequently also for each $Z$ in $\mathscr G$ by monotone convergence, whether the limits are finite or positive infinite. Note that the integrability of $Z\,E(Y \mid \mathscr G)$ is part of the assertion of the theorem.
Recall that when $\mathscr G$ is generated by a partition $\{\Lambda_n\}$, we have exhibited a specific version (5) of $E(Y \mid \mathscr G)$. Now consider the corresponding $P(M \mid \mathscr G)$ as a function of the pair $(M, \omega)$:

\[
P(M \mid \mathscr G)(\omega) = \sum_n P(M \mid \Lambda_n)\,1_{\Lambda_n}(\omega).
\]

For each fixed $M$, as a function of $\omega$ this is a specific version of $P(M \mid \mathscr G)$. For each fixed $\omega_0$, as a function of $M$ this is a p.m. on $\mathscr F$ given by $P\{\cdot \mid \Lambda_m\}$ for $\omega_0 \in \Lambda_m$. Let us denote for a moment the family of p.m.'s arising in this manner by $C(\omega_0, \cdot)$. We have then for each integrable r.v. $Y$ and each $\omega_0$ in $\Omega$:

\[
(10) \qquad E(Y \mid \mathscr G)(\omega_0) = \sum_n E(Y \mid \Lambda_n)\,1_{\Lambda_n}(\omega_0) = \int_\Omega Y\,C(\omega_0, d\omega).
\]

Thus the specific version of $E(Y \mid \mathscr G)$ may be evaluated, at each $\omega_0 \in \Omega$, by integrating $Y$ with respect to the p.m. $C(\omega_0, \cdot)$. In this case the conditional expectation $E(\cdot \mid \mathscr G)$ as a functional on integrable r.v.'s is an integral in the literal sense. But in general such a representation is impossible (see Doob [16, Sec. 1.9]) and we must fall back on the defining relations to deduce its properties, usually from the unconditional analogues. Below are some of the simplest examples, in which $X$ and $X_n$ are integrable r.v.'s.
(i) If $X \in \mathscr G$, then $E(X \mid \mathscr G) = X$ a.e.; this is true in particular if $X$ is a constant a.e.
(ii) $E(X_1 + X_2 \mid \mathscr G) = E(X_1 \mid \mathscr G) + E(X_2 \mid \mathscr G)$.
(iii) If $X_1 \le X_2$, then $E(X_1 \mid \mathscr G) \le E(X_2 \mid \mathscr G)$.
(iv) $|E(X \mid \mathscr G)| \le E(|X| \mid \mathscr G)$.
(v) If $X_n \uparrow X$, then $E(X_n \mid \mathscr G) \uparrow E(X \mid \mathscr G)$.
(vi) If $X_n \downarrow X$, then $E(X_n \mid \mathscr G) \downarrow E(X \mid \mathscr G)$.
(vii) If $|X_n| \le Y$ where $E(Y) < \infty$ and $X_n \to X$, then $E(X_n \mid \mathscr G) \to E(X \mid \mathscr G)$.
To illustrate, (iii) is proved by observing that for each $\Lambda \in \mathscr G$:

\[
\int_\Lambda E(X_1 \mid \mathscr G)\,dP = \int_\Lambda X_1\,dP \le \int_\Lambda X_2\,dP = \int_\Lambda E(X_2 \mid \mathscr G)\,dP.
\]

Hence if $\Lambda = \{E(X_1 \mid \mathscr G) > E(X_2 \mid \mathscr G)\}$, we have $P(\Lambda) = 0$. The inequality (iv) may be proved by (ii), (iii), and the equation $X = X^+ - X^-$. To prove (v), let the limit of $E(X_n \mid \mathscr G)$ be $Z$, which exists a.e. by (iii). Then for each $\Lambda \in \mathscr G$, we have by the monotone convergence theorem:

\[
\int_\Lambda Z\,dP = \lim_n \int_\Lambda E(X_n \mid \mathscr G)\,dP = \lim_n \int_\Lambda X_n\,dP = \int_\Lambda X\,dP.
\]

Thus $Z$ satisfies the defining relation for $E(X \mid \mathscr G)$, and it belongs to $\mathscr G$ with the $E(X_n \mid \mathscr G)$'s; hence $Z = E(X \mid \mathscr G)$.
To appreciate the caution that is necessary in handling conditional expectations, let us consider the Cauchy–Schwarz inequality:

\[
E(|XY| \mid \mathscr G)^2 \le E(X^2 \mid \mathscr G)\,E(Y^2 \mid \mathscr G).
\]

If we try to extend one of the usual proofs based on the positiveness of the quadratic form in $\lambda$: $E((X + \lambda Y)^2 \mid \mathscr G)$, the question arises that for each $\lambda$ the quantity is defined only up to a null set $N_\lambda$, and the union of these over all $\lambda$ cannot be ignored without comment. The reader is advised to think this difficulty through to its logical end, and then get out of it by restricting the $\lambda$'s to the rationals. Here is another way out: start from the following trivial inequality:

\[
\frac{|X|\,|Y|}{\alpha\beta} \le \frac{X^2}{2\alpha^2} + \frac{Y^2}{2\beta^2},
\]

where $\alpha = E(X^2 \mid \mathscr G)^{1/2}$, $\beta = E(Y^2 \mid \mathscr G)^{1/2}$, and $\alpha\beta > 0$; apply the operation $E\{\cdot \mid \mathscr G\}$, using (ii) and (iii) above, to obtain

\[
E\left\{\frac{|XY|}{\alpha\beta} \,\Big|\, \mathscr G\right\} \le E\left\{\frac{X^2}{2\alpha^2} \,\Big|\, \mathscr G\right\} + E\left\{\frac{Y^2}{2\beta^2} \,\Big|\, \mathscr G\right\}.
\]

Now use Theorem 9.1.3 to infer that this can be reduced to

\[
\frac{1}{\alpha\beta}\,E\{|XY| \mid \mathscr G\} \le \frac{1}{2}\frac{\alpha^2}{\alpha^2} + \frac{1}{2}\frac{\beta^2}{\beta^2} = 1,
\]

the desired inequality.
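A quick numerical sanity check of the conditional Cauchy–Schwarz inequality can be run on a two-set partition; the sketch below is our illustration (the choices of $X$, $Y$, and the generating set are arbitrary).

```python
import numpy as np

rng = np.random.default_rng(2)

# Sketch: G is generated by the single set {X > 0}; on each cell the
# conditional moments are plain averages, and we verify
# E(|XY| | G)^2 <= E(X^2 | G) E(Y^2 | G) cell by cell.
X = rng.standard_normal(100_000)
Y = X + rng.standard_normal(100_000)

for cell in (X > 0.0, ~(X > 0.0)):
    lhs = np.abs(X[cell] * Y[cell]).mean() ** 2
    rhs = (X[cell] ** 2).mean() * (Y[cell] ** 2).mean()
    assert lhs <= rhs
    print(lhs, rhs)
```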
The following theorem is a generalization of Jensen's inequality in Sec. 3.2.

Theorem 9.1.4. If $\varphi$ is a convex function on $R^1$ and $X$ and $\varphi(X)$ are integrable r.v.'s, then for each $\mathscr G$:

\[
(11) \qquad \varphi(E(X \mid \mathscr G)) \le E(\varphi(X) \mid \mathscr G).
\]

PROOF. If $X$ is a simple r.v. taking the values $\{y_j\}$ on the sets $\{\Lambda_j\}$, $1 \le j \le n$, which form a partition of $\Omega$, we have

\[
E(X \mid \mathscr G) = \sum_{j=1}^n y_j\,P(\Lambda_j \mid \mathscr G),
\]
\[
E(\varphi(X) \mid \mathscr G) = \sum_{j=1}^n \varphi(y_j)\,P(\Lambda_j \mid \mathscr G),
\]

where $\sum_{j=1}^n P(\Lambda_j \mid \mathscr G) = 1$ a.e. Hence (11) is true in this case by the property of convexity. In general let $\{X_m\}$ be a sequence of simple r.v.'s converging to $X$ a.e. and satisfying $|X_m| \le |X|$ for all $m$ (see Exercise 7 of Sec. 3.2). If we let $m \to \infty$ below:

\[
(12) \qquad \varphi(E(X_m \mid \mathscr G)) \le E(\varphi(X_m) \mid \mathscr G),
\]

the left-hand member converges to the left-hand member of (11) by the continuity of $\varphi$, but we need dominated convergence on the right-hand side. To get this we first consider $\varphi_n$, which is obtained from $\varphi$ by replacing the graph of $\varphi$ outside $(-n, n)$ with tangential lines. Thus for each $n$ there is a constant $C_n$ such that

\[
\forall x \in R^1:\quad |\varphi_n(x)| \le C_n(|x| + 1).
\]

Consequently, we have

\[
|\varphi_n(X_m)| \le C_n(|X_m| + 1) \le C_n(|X| + 1),
\]

and the last term is integrable by hypothesis. It now follows from property (vii) of conditional expectations that

\[
\lim_{m \to \infty} E(\varphi_n(X_m) \mid \mathscr G) = E(\varphi_n(X) \mid \mathscr G).
\]

This establishes (11) when $\varphi$ is replaced by $\varphi_n$. Letting $n \to \infty$ we have $\varphi_n \uparrow \varphi$ and $\varphi_n(X)$ is integrable; hence (11) follows for a general convex $\varphi$ by monotone convergence (v).

Here is an alternative proof, slightly more elegant and more delicate. We have for any $x$ and $y$:

\[
\varphi(x) - \varphi(y) \ge \varphi'(y)(x - y),
\]

where $\varphi'$ is the right-hand derivative of $\varphi$. Hence

\[
\varphi(X) - \varphi(E(X \mid \mathscr G)) \ge \varphi'(E(X \mid \mathscr G))\,[X - E(X \mid \mathscr G)].
\]

The right member may not be integrable; but let $\Lambda = \{\omega\colon |E(X \mid \mathscr G)| \le A\}$ for $A > 0$. Replace $X$ by $X 1_\Lambda$ in the above, take expectations of both sides, and let $A \uparrow \infty$. Observe that

\[
E\{\varphi(X 1_\Lambda) \mid \mathscr G\} = E\{\varphi(X)1_\Lambda + \varphi(0)1_{\Lambda^c} \mid \mathscr G\} = E\{\varphi(X) \mid \mathscr G\}1_\Lambda + \varphi(0)1_{\Lambda^c}.
\]
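Conditional Jensen can also be checked in the elementary partition setting; the short sketch below (again an illustration of ours) uses the convex function $\varphi(x) = x^2$.

```python
import numpy as np

rng = np.random.default_rng(3)

# Sketch of (11) with phi(x) = x^2 and G generated by the set {X > 0.5}:
# on each cell, phi(E(X | G)) should not exceed E(phi(X) | G).
X = rng.standard_normal(100_000)
L = X > 0.5

for cell in (L, ~L):
    assert X[cell].mean() ** 2 <= (X[cell] ** 2).mean()
```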
We now come to the most important property of conditional expectation relating to changing fields. Note that when $\Lambda = \Omega$, the defining relation (6) may be written as

\[
E_{\mathscr J}(E_{\mathscr G}(Y)) = E_{\mathscr J}(Y) = E_{\mathscr G}(E_{\mathscr J}(Y)).
\]

This has an immediate generalization.
Theorem 9.1.5. If $Y$ is integrable and $\mathscr F_1 \subset \mathscr F_2$, then

\[
(13) \qquad E_{\mathscr F_1}(Y) = E_{\mathscr F_2}(Y) \quad \text{if and only if} \quad E_{\mathscr F_2}(Y) \in \mathscr F_1;
\]

and

\[
(14) \qquad E_{\mathscr F_1}(E_{\mathscr F_2}(Y)) = E_{\mathscr F_1}(Y) = E_{\mathscr F_2}(E_{\mathscr F_1}(Y)).
\]

PROOF. Since $Y$ satisfies trivially the defining relation for $E(Y \mid \mathscr F_1)$, it will be equal to the latter if and only if $Y \in \mathscr F_1$. Now if we replace our basic $\mathscr F$ by $\mathscr F_2$ and $Y$ by $E_{\mathscr F_2}(Y)$, the assertion (13) ensues. Next, since

\[
E_{\mathscr F_1}(Y) \in \mathscr F_1 \subset \mathscr F_2,
\]

the second equation in (14) follows from the same observation. It remains to prove the first equation in (14). Let $\Lambda \in \mathscr F_1$; then $\Lambda \in \mathscr F_2$. Applying the defining relation twice, we obtain

\[
\int_\Lambda E_{\mathscr F_1}(E_{\mathscr F_2}(Y))\,dP = \int_\Lambda E_{\mathscr F_2}(Y)\,dP = \int_\Lambda Y\,dP.
\]

Hence $E_{\mathscr F_1}(E_{\mathscr F_2}(Y))$ satisfies the defining relation for $E_{\mathscr F_1}(Y)$; since it belongs to $\mathscr F_1$, it is equal to the latter.

As a particular case, we note, for example,

\[
(15) \qquad E\{E(Y \mid X_1, X_2) \mid X_1\} = E(Y \mid X_1) = E\{E(Y \mid X_1) \mid X_1, X_2\}.
\]
To understand the meaning of this formula, we may think of $X_1$ and $X_2$ as discrete, each producing a countable partition. The superimposition of both partitions yields sets of the form $\{\Lambda_j \cap M_k\}$. The "inner" expectation on the left of (15) is the result of replacing $Y$ by its "average" over each $\Lambda_j \cap M_k$. Now if we replace this average r.v. again by its average over each $\Lambda_j$, then the result is the same as if we had simply replaced $Y$ by its average over each $\Lambda_j$. The second equation has a similar interpretation.

Another kind of simple situation is afforded by the probability triple $(\mathscr U^n, \mathscr B^n, m^n)$ discussed in Example 2 of Sec. 3.3. Let $x_1, \ldots, x_n$ be the coordinate r.v.'s and $y = f(x_1, \ldots, x_n)$, where $f$ is (Borel) measurable and integrable. It is easy to see that for $1 \le k \le n - 1$,

\[
E(y \mid x_1, \ldots, x_k) = \int_0^1 \cdots \int_0^1 f(x_1, \ldots, x_n)\,dx_{k+1} \cdots dx_n,
\]

while for $k = n$, the left side is just $y$ (a.e.). Thus, taking conditional expectation with respect to certain coordinate r.v.'s here amounts to integrating out the other r.v.'s. The first equation in (15) in this case merely asserts the possibility of iterated integration, while the second reduces to a banality, which we leave to the reader to write out.
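As a concrete illustration of integrating out the other coordinates, take $n = 2$; the Python sketch below uses an arbitrary $f$ and an arbitrary point $x_1 = 0.3$ (neither is from the text).

```python
import numpy as np

# Sketch on the unit square with Lebesgue measure: E(y | x1) is obtained by
# integrating out x2.  Both f and the evaluation point are arbitrary choices.
f = lambda x1, x2: x1 * x2 + np.sin(x1)

x2 = np.linspace(0.0, 1.0, 200_001)
x1 = 0.3
cond = f(x1, x2).mean()                 # Riemann approximation of the dx2-integral
print(cond, x1 * 0.5 + np.sin(x1))      # closed form for comparison: x1/2 + sin(x1)
```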
EXERCISES

1. Prove Lemma 2 in Sec. 7.2 by using conditional probabilities.
2. Let $\{\Lambda_n\}$ be a countable measurable partition of $\Omega$, and $E \in \mathscr F$ with $P(E) > 0$; then we have for each $m$:

\[
P_E(\Lambda_m) = \frac{P(\Lambda_m)\,P_{\Lambda_m}(E)}{\sum_n P(\Lambda_n)\,P_{\Lambda_n}(E)}.
\]

[This is Bayes' rule.]
*3. If $X$ is an integrable r.v., $Y$ a bounded r.v., and $\mathscr G$ a Borel subfield, then we have

\[
E\{E(X \mid \mathscr G)\,Y\} = E\{X\,E(Y \mid \mathscr G)\}.
\]

4. Prove Fatou's lemma and Lebesgue's dominated convergence theorem for conditional expectations.
*5. Give an example where $E(E(Y \mid X_1) \mid X_2) \ne E(E(Y \mid X_2) \mid X_1)$. [HINT: It is sufficient to give an example where $E(X \mid Y) \ne E\{E(X \mid Y) \mid X\}$; consider an $\Omega$ with three points.]
*6. Prove that $\sigma^2(E(Y \mid \mathscr G)) \le \sigma^2(Y)$, where $\sigma^2$ is the variance.
7. If the random vector $(X, Y)$ has the probability density function $p(\cdot, \cdot)$ and $X$ is integrable, then one version of $E(X \mid X + Y = z)$ is given by

\[
\int x\,p(x, z - x)\,dx \Big/ \int p(x, z - x)\,dx.
\]
*8. In the case above, there exists an integrable function $\varphi(\cdot, \cdot)$ with the property that for each $B \in \mathscr B^1$,

\[
\int_B \varphi(x, y)\,dy
\]

is a version of $P\{Y \in B \mid X = x\}$. [This is called a "conditional density function in the wide sense" and

\[
\Phi(x, \eta) = \int_{-\infty}^{\eta} \varphi(x, y)\,dy
\]

is the corresponding conditional distribution in the wide sense:

\[
\Phi(x, \eta) = P\{Y \le \eta \mid X = x\}.
\]

The term "wide sense" refers to the fact that these are functions on $R^1$ rather than on $\Omega$; see Doob [16, Sec. I.9].]
9. Let the $p(\cdot, \cdot)$ above be the 2-dimensional normal density:

\[
\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right],
\]

where $\sigma_1 > 0$, $\sigma_2 > 0$, $0 < \rho < 1$. Find the $\varphi$ mentioned in Exercise 8 and

\[
\int_{-\infty}^{\infty} y\,\varphi(x, y)\,dy.
\]

The latter should be a version of $E(Y \mid X = x)$; verify it.
10. Let $\mathscr G$ be a B.F., $X$ and $Y$ two r.v.'s such that

\[
E(Y^2 \mid \mathscr G) = X^2, \qquad E(Y \mid \mathscr G) = X.
\]

Then $Y = X$ a.e.
11. As in Exercise 10 but suppose now for any $f \in C_K$:

\[
E\{X^2 \mid f(X)\} = E\{Y^2 \mid f(X)\}; \qquad E\{X \mid f(X)\} = E\{Y \mid f(X)\}.
\]

Then $Y = X$ a.e. [HINT: By a monotone class theorem the equations hold for $f = 1_B$, $B \in \mathscr B^1$; now apply Exercise 10 with $\mathscr G = \mathscr F\{X\}$.]
12. Recall that $X_n$ in $L^1$ converges weakly in $L^1$ to $X$ iff $E(X_n Y) \to E(XY)$ for every bounded r.v. $Y$. Prove that this implies $E(X_n \mid \mathscr G)$ converges weakly in $L^1$ to $E(X \mid \mathscr G)$ for any Borel subfield $\mathscr G$ of $\mathscr F$.
*13. Let $S$ be an r.v. such that $P\{S > t\} = e^{-t}$, $t > 0$. Compute $E\{S \mid S \wedge t\}$ and $E\{S \mid S \vee t\}$ for each $t > 0$. (A numerical check of the first computation is sketched below.)
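For Exercise 13, the field generated by $S \wedge t$ determines $S$ exactly on $\{S < t\}$, while on $\{S \ge t\}$ memorylessness of the exponential law gives $E\{S \mid S \wedge t\} = t + 1$. The simulation sketch below (ours, not the book's) checks the latter value.

```python
import numpy as np

rng = np.random.default_rng(4)

# Sketch for Exercise 13: S exponential with P{S > t} = e^{-t}.  On {S >= t},
# S ^ t = t carries no further information, and E(S | S ^ t = t) = t + 1.
S = rng.exponential(1.0, size=1_000_000)
t = 0.7
print(S[S >= t].mean(), t + 1.0)        # both approximately 1.7
```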
9.2 Conditional independence; Markov property

In this section we shall first apply the concept of conditioning to independent r.v.'s and to random walks, the two basic types of stochastic processes that have been extensively studied in this book; then we shall generalize to a Markov process. Another important generalization, the martingale theory, will be taken up in the next three sections.

All the B.F.'s (Borel fields) below will be subfields of $\mathscr F$. The B.F.'s $\{\mathscr F_\alpha, \alpha \in A\}$, where $A$ is an arbitrary index set, are said to be conditionally independent relative to the B.F. $\mathscr G$, iff for any finite collection of sets $\Lambda_1, \ldots, \Lambda_n$ such that $\Lambda_j \in \mathscr F_{\alpha_j}$ and the $\alpha_j$'s are distinct indices from $A$, we have

\[
P\left(\bigcap_{j=1}^n \Lambda_j \,\Big|\, \mathscr G\right) = \prod_{j=1}^n P(\Lambda_j \mid \mathscr G).
\]

When $\mathscr G$ is the trivial B.F., this reduces to unconditional independence.

Theorem 9.2.1. For each ˛ 2 A let F ˛ denote the smallest B.F. containing
all Fˇ , ˇ 2 A  f˛g. Then the F˛ ’s are conditionally independent relative to
G if and only if for each ˛ and 3˛ 2 F˛ we have

P 3˛ j F ˛
_ G  D P 3˛ j G ,

where F ˛
_ G denotes the smallest B.F. containing F ˛
and G .
PROOF. It is sufficient to prove this for two B.F.’s F1 and F2 , since the
general result follows by induction (how?). Suppose then that for each 3 2 F1
we have

1 P 3 j F2 _ G  D P 3 j G .

Let M 2 F2 , then

P 3M j G  D E fP 3M j F2 _ G  j G g D E fP 3 j F2 _ G 1M j G g


D E fP 3 j G 1M j G g D P 3 j G P M j G ,

where the first equation follows from Theorem 9.1.5, the second and fourth
from Theorem 9.1.3, and the third from (1). Thus F1 and F2 are conditionally
independent relative to G . Conversely, suppose the latter assertion is true, then

E fP 3 j G 1M j G g D P 3 j G P M j G 
D P 3M j G  D E fP 3 j F2 _ G 1M j G g,
9.2 CONDITIONAL INDEPENDENCE; MARKOV PROPERTY 323

where the first equation follows from Theorem 9.1.3, the second by hypothesis,
and the third as shown above. Hence for every 1 2 G , we have
 
P 3 j G  dP D P 3 j F2 _ G  dP D P 3M1
M1 M1
It follows from Theorem 2.1.2 (take F0 to be finite disjoint unions of sets like
M1) or more quickly from Exercise 10 of Sec. 2.1 that this remains true if
M1 is replaced by any set in F2 _ G . The resulting equation implies (1), and
the theorem is proved.
When $\mathscr G$ is trivial and each $\mathscr F_\alpha$ is generated by a single r.v., we have the following corollary.

Corollary. Let $\{X_\alpha, \alpha \in A\}$ be an arbitrary set of r.v.'s. For each $\alpha$ let $\mathscr F^{(\alpha)}$ denote the Borel field generated by all the r.v.'s in the set except $X_\alpha$. Then the $X_\alpha$'s are independent if and only if: for each $\alpha$ and each $B \in \mathscr B^1$, we have

\[
P\{X_\alpha \in B \mid \mathscr F^{(\alpha)}\} = P\{X_\alpha \in B\} \quad \text{a.e.}
\]

An equivalent form of the corollary is as follows: for each integrable r.v. $Y$ belonging to the Borel field generated by $X_\alpha$, we have

\[
(2) \qquad E\{Y \mid \mathscr F^{(\alpha)}\} = E\{Y\}.
\]

This is left as an exercise.

Roughly speaking, independence among r.v.'s is equivalent to the lack of effect by conditioning relative to one another. However, such intuitive statements must be treated with caution, as shown by the following example which will be needed later.

If $\mathscr F_1$, $\mathscr F_2$, and $\mathscr F_3$ are three Borel fields such that $\mathscr F_1 \vee \mathscr F_2$ is independent of $\mathscr F_3$, then for each integrable $X \in \mathscr F_1$, we have

\[
(3) \qquad E\{X \mid \mathscr F_2 \vee \mathscr F_3\} = E\{X \mid \mathscr F_2\}.
\]

Instead of a direct verification, which is left to the reader, it is interesting to deduce this from Theorem 9.2.1 by proving the following proposition.

If $\mathscr F_1 \vee \mathscr F_2$ is independent of $\mathscr F_3$, then $\mathscr F_1$ and $\mathscr F_3$ are conditionally independent relative to $\mathscr F_2$.

To see this, let $\Lambda_1 \in \mathscr F_1$, $\Lambda_3 \in \mathscr F_3$. Since

\[
P(\Lambda_1\Lambda_2\Lambda_3) = P(\Lambda_1\Lambda_2)\,P(\Lambda_3) = \int_{\Lambda_2} P(\Lambda_1 \mid \mathscr F_2)\,P(\Lambda_3)\,dP
\]

for every $\Lambda_2 \in \mathscr F_2$, we have

\[
P(\Lambda_1\Lambda_3 \mid \mathscr F_2) = P(\Lambda_1 \mid \mathscr F_2)\,P(\Lambda_3) = P(\Lambda_1 \mid \mathscr F_2)\,P(\Lambda_3 \mid \mathscr F_2),
\]

which proves the proposition.
Next we ask: if $X_1$ and $X_2$ are independent, what is the effect of conditioning $X_1 + X_2$ by $X_1$?

Theorem 9.2.2. Let $X_1$ and $X_2$ be independent r.v.'s with p.m.'s $\mu_1$ and $\mu_2$; then for each $B \in \mathscr B^1$:

\[
(4) \qquad P\{X_1 + X_2 \in B \mid X_1\} = \mu_2(B - X_1) \quad \text{a.e.}
\]

More generally, if $\{X_n, n \ge 1\}$ is a sequence of independent r.v.'s with p.m.'s $\{\mu_n, n \ge 1\}$, and $S_n = \sum_{j=1}^n X_j$, then for each $B \in \mathscr B^1$:

\[
(5) \qquad P\{S_n \in B \mid S_1, \ldots, S_{n-1}\} = \mu_n(B - S_{n-1}) = P\{S_n \in B \mid S_{n-1}\} \quad \text{a.e.}
\]

PROOF. To prove (4), since its right member belongs to $\mathscr F\{X_1\}$, it is sufficient to verify that it satisfies the defining relation for its left member. Let $\Lambda \in \mathscr F\{X_1\}$; then $\Lambda = X_1^{-1}(A)$ for some $A \in \mathscr B^1$. It follows from Theorem 3.2.2 that

\[
\int_\Lambda \mu_2(B - X_1)\,dP = \int_A \mu_2(B - x_1)\,\mu_1(dx_1).
\]

Writing $\mu = \mu_1 \times \mu_2$ and applying Fubini's theorem to the right side above, then using Theorem 3.2.3, we obtain

\[
\int_A \mu_1(dx_1) \int_{x_1 + x_2 \in B} \mu_2(dx_2) = \iint_{x_1 \in A,\ x_1 + x_2 \in B} \mu(dx_1, dx_2) = \int_{\{X_1 \in A;\, X_1 + X_2 \in B\}} dP = P\{X_1 \in A;\ X_1 + X_2 \in B\}.
\]

This establishes (4).

To prove (5), we begin by observing that the second equation has just been proved. Next we observe that since $\{X_1, \ldots, X_n\}$ and $\{S_1, \ldots, S_n\}$ obviously generate the same Borel field, the left member of (5) is just

\[
P\{S_n \in B \mid X_1, \ldots, X_{n-1}\}.
\]

Now it is trivial that as a function of $(X_1, \ldots, X_{n-1})$, $S_n$ "depends on them only through their sum $S_{n-1}$". It thus appears obvious that the first term in (5) should depend only on $S_{n-1}$, namely belong to $\mathscr F\{S_{n-1}\}$ (rather than the larger $\mathscr F\{S_1, \ldots, S_{n-1}\}$). Hence the equality of the first and third terms in (5) should be a consequence of the assertion (13) in Theorem 9.1.5. This argument, however, is not rigorous, and requires the following formal substantiation.

Let $\mu^{(n)} = \mu_1 \times \cdots \times \mu_n = \mu^{(n-1)} \times \mu_n$ and

\[
\Lambda = \bigcap_{j=1}^{n-1} S_j^{-1}(B_j),
\]

where $B_n = B$. Sets of the form of $\Lambda$ generate the Borel field $\mathscr F\{S_1, \ldots, S_{n-1}\}$. It is therefore sufficient to verify that for each such $\Lambda$, we have

\[
\int_\Lambda \mu_n(B_n - S_{n-1})\,dP = P\{\Lambda;\ S_n \in B_n\}.
\]

If we proceed as before and write $s_n = \sum_{j=1}^n x_j$, the left side above is equal to

\[
\int \cdots \int_{s_j \in B_j,\, 1 \le j \le n-1} \mu_n(B_n - s_{n-1})\,\mu^{(n-1)}(dx_1, \ldots, dx_{n-1})
\]
\[
= \int \cdots \int_{s_j \in B_j,\, 1 \le j \le n} \mu^{(n)}(dx_1, \ldots, dx_n) = P\left\{\bigcap_{j=1}^n [S_j \in B_j]\right\},
\]

as was to be shown. In the first equation above, we have made use of the fact that the set of $(x_1, \ldots, x_n)$ in $R^n$ for which $x_n \in B_n - s_{n-1}$ is exactly the set for which $s_n \in B_n$, which is the formal counterpart of the heuristic argument above. The theorem is proved.
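A simulation can make (4) visible; the sketch below (our illustration, with assumed normal summands and $B = (-\infty, 0]$) compares the conditional frequency of $\{X_1 + X_2 \in B\}$, binned on $X_1$, with $\mu_2(B - X_1) = \Phi(-X_1)$.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)
Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))    # standard normal d.f.

# Sketch of (4): X1, X2 independent N(0,1) and B = (-inf, 0], so that
# mu2(B - X1) = Phi(-X1).  Thin bins of X1 stand in for conditioning on X1.
X1 = rng.standard_normal(1_000_000)
X2 = rng.standard_normal(1_000_000)
hit = (X1 + X2) <= 0.0

for lo in (-1.0, 0.0, 1.0):
    cell = (X1 >= lo) & (X1 < lo + 0.2)
    x = X1[cell].mean()
    print(x, hit[cell].mean(), Phi(-x))             # empirical vs. mu2(B - x)
```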
The fact of the equality of the two extreme terms in (5) is a fundamental property of the sequence $\{S_n\}$. We have indeed made frequent use of this in the study of sums of independent r.v.'s, particularly for a random walk (Chapter 8), although no explicit mention has been made of it. It would be instructive for the reader to review the material there and locate some instances of its application. As an example, we prove the following proposition, where the intuitive picture of conditioning is particularly clear.

Theorem 9.2.3. Let $\{X_n, n \ge 1\}$ be an independent (but not necessarily stationary) process such that for $A > 0$ there exists $\delta > 0$ satisfying

\[
\inf_n P\{X_n \ge A\} > \delta.
\]

Then we have

\[
\forall n \ge 1:\quad P\{S_j \in (0, A] \text{ for } 1 \le j \le n\} \le (1 - \delta)^n.
\]

Furthermore, given any finite interval $I$, there exists an $\epsilon > 0$ such that

\[
P\{S_j \in I \text{ for } 1 \le j \le n\} \le (1 - \epsilon)^n.
\]

PROOF. We write $\Lambda_n$ for the event that $S_j \in (0, A]$ for $1 \le j \le n$; then

\[
P\{\Lambda_n\} = P\{\Lambda_{n-1};\ 0 < S_n \le A\}.
\]

By the definition of conditional probability and (5), the last-written probability is equal to

\[
\int_{\Lambda_{n-1}} P\{0 < S_n \le A \mid S_1, \ldots, S_{n-1}\}\,dP = \int_{\Lambda_{n-1}} [F_n(A - S_{n-1}) - F_n(0 - S_{n-1})]\,dP,
\]

where $F_n$ is the d.f. of $\mu_n$. [A quirk of notation forbids us to write the integrand on the right as $P\{-S_{n-1} < X_n \le A - S_{n-1}\}$!] Now for each $\omega_0$ in $\Lambda_{n-1}$, $S_{n-1}(\omega_0) > 0$, hence

\[
F_n(A - S_{n-1}(\omega_0)) \le P\{X_n < A\} \le 1 - \delta
\]

by hypothesis. It follows that

\[
P\{\Lambda_n\} \le \int_{\Lambda_{n-1}} (1 - \delta)\,dP = (1 - \delta)\,P\{\Lambda_{n-1}\},
\]

and the first assertion of the theorem follows by iteration. The second is proved similarly by observing that $P\{X_n + \cdots + X_{n+m-1} \ge mA\} > \delta^m$ and choosing $m$ so that $mA$ exceeds the length of $I$. The details are left to the reader.
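The first bound is easy to watch in simulation; in the sketch below the step distribution is an assumption of ours (uniform on $(-1.5, 1.5)$, $A = 1$), for which one may take $\delta = P\{X_n \ge 1\} = 1/6$.

```python
import numpy as np

rng = np.random.default_rng(6)

# Sketch of Theorem 9.2.3: uniform(-1.5, 1.5) steps, A = 1, delta = 1/6.
# The frequency of paths with S_j in (0, A] for all j <= n should not
# exceed (1 - delta)^n.
A, delta, n, paths = 1.0, 1.0 / 6.0, 10, 200_000
S = rng.uniform(-1.5, 1.5, size=(paths, n)).cumsum(axis=1)
stay = ((S > 0.0) & (S <= A)).all(axis=1)
print(stay.mean(), (1.0 - delta) ** n)   # empirical frequency <= the bound
```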
Let $N^0 = \{0\} \cup N$ denote the set of nonnegative integers. For a given sequence of r.v.'s $\{X_n, n \in N^0\}$ let us denote by $\mathscr F_I$ the Borel field generated by $\{X_n, n \in I\}$, where $I$ is a subset of $N^0$, such as $[0, n]$, $(n, \infty)$, or $\{n\}$. Thus $\mathscr F_{\{n\}}$, $\mathscr F_{[0,n]}$, and $\mathscr F_{(n,\infty)}$ have been denoted earlier by $\mathscr F\{X_n\}$, $\mathscr F_n$, and $\mathscr F'_n$, respectively.

DEFINITION OF MARKOV PROCESS. The sequence of r.v.'s $\{X_n, n \in N^0\}$ is said to be a Markov process or to possess the "Markov property" iff for every $n \in N^0$ and every $B \in \mathscr B^1$, we have

\[
(6) \qquad P\{X_{n+1} \in B \mid X_0, \ldots, X_n\} = P\{X_{n+1} \in B \mid X_n\}.
\]

This property may be verbally announced as: the conditional distribution (in the wide sense!) of each r.v. relative to all the preceding ones is the same as that relative to the last preceding one. Thus if $\{X_n\}$ is an independent process as defined in Chapter 8, then both the process itself and the process of the successive partial sums $\{S_n\}$ are Markov processes by Theorems 9.2.1 and 9.2.2. The latter category includes random walk as a particular case; note that in this case our notation $X_n$ rather than $S_n$ differs from that employed in Chapter 8.

Equation (6) is equivalent to the apparently stronger proposition: for every integrable $Y \in \mathscr F_{\{n+1\}}$, we have

\[
(6') \qquad E\{Y \mid X_0, \ldots, X_n\} = E\{Y \mid X_n\}.
\]

It is clear that (6') implies (6). To see the converse, let $Y_m$ be a sequence of simple r.v.'s belonging to $\mathscr F_{\{n+1\}}$ and increasing to $Y$. By (6) and property (ii) of conditional expectation in Sec. 9.1, (6') is true when $Y$ is replaced by $Y_m$; hence by property (v) there, it is also true for $Y$.
The following remark will be repeatedly used. If $Y$ and $Z$ are integrable, $Z \in \mathscr F_I$, and

\[
\int_\Lambda Y\,dP = \int_\Lambda Z\,dP
\]

for each $\Lambda$ of the form

\[
\bigcap_{j \in I_0} X_j^{-1}(B_j),
\]

where $I_0$ is an arbitrary finite subset of $I$ and each $B_j$ is an arbitrary Borel set, then $Z = E(Y \mid \mathscr F_I)$. This follows from the uniqueness of conditional expectation and a previous remark given before Theorem 9.1.3, since finite disjoint unions of sets of this form constitute a field that generates $\mathscr F_I$.

If the index $n$ is regarded as a discrete time parameter, as is usual in the theory of stochastic processes, then $\mathscr F_{[0,n]}$ is the field of "the past and the present", while $\mathscr F_{(n,\infty)}$ is that of "the future"; whether the present is adjoined to the past or the future is often a matter of convenience. The Markov property just defined may be further characterized as follows.
Theorem 9.2.4. The Markov property is equivalent to either one of the two propositions below:

\[
(7) \qquad \forall n \in N,\ M \in \mathscr F_{(n,\infty)}:\quad P\{M \mid \mathscr F_{[0,n]}\} = P\{M \mid X_n\}.
\]
\[
(8) \qquad \forall n \in N,\ M_1 \in \mathscr F_{[0,n]},\ M_2 \in \mathscr F_{(n,\infty)}:\quad P\{M_1 M_2 \mid X_n\} = P\{M_1 \mid X_n\}\,P\{M_2 \mid X_n\}.
\]

These conclusions remain true if $\mathscr F_{(n,\infty)}$ is replaced by $\mathscr F_{[n,\infty)}$.

PROOF. To prove (7) implies (8), let $Y_i = 1_{M_i}$, $i = 1, 2$. We then have

\[
(9) \qquad P\{M_1 \mid X_n\}P\{M_2 \mid X_n\} = E\{Y_1 \mid X_n\}E\{Y_2 \mid X_n\} = E\{Y_1 E(Y_2 \mid X_n) \mid X_n\}
\]
\[
= E\{Y_1 E(Y_2 \mid \mathscr F_{[0,n]}) \mid X_n\} = E\{E(Y_1 Y_2 \mid \mathscr F_{[0,n]}) \mid X_n\} = E\{Y_1 Y_2 \mid X_n\} = P\{M_1 M_2 \mid X_n\},
\]

where the second and fourth equations follow from Theorem 9.1.3, the third from assumption (7), and the fifth from Theorem 9.1.5.

Conversely, to prove that (8) implies (7), let $\Lambda \in \mathscr F_{\{n\}}$, $M_1 \in \mathscr F_{[0,n)}$, $M_2 \in \mathscr F_{(n,\infty)}$. By the second equation in (9) applied to the fourth equation below, we have

\[
\int_{\Lambda M_1} P(M_2 \mid X_n)\,dP = \int_{\Lambda M_1} E(Y_2 \mid X_n)\,dP = \int_\Lambda Y_1 E(Y_2 \mid X_n)\,dP
\]
\[
= \int_\Lambda E\{Y_1 E(Y_2 \mid X_n) \mid X_n\}\,dP = \int_\Lambda E(Y_1 \mid X_n)E(Y_2 \mid X_n)\,dP
\]
\[
= \int_\Lambda P(M_1 \mid X_n)P(M_2 \mid X_n)\,dP = \int_\Lambda P(M_1 M_2 \mid X_n)\,dP = P(\Lambda M_1 M_2).
\]

Since disjoint unions of sets of the form $\Lambda M_1$ as specified above generate the Borel field $\mathscr F_{[0,n]}$, the uniqueness of $P(M_2 \mid \mathscr F_{[0,n]})$ shows that it is equal to $P(M_2 \mid X_n)$, proving (7).

Finally, we prove the equivalence of the Markov property and the proposition (7). Clearly the former is implied by the latter; to prove the converse we shall operate with conditional expectations instead of probabilities and use induction. Suppose that it has been shown that for every $n \in N$, and every bounded $f$ belonging to $\mathscr F_{[n+1,n+k]}$, we have

\[
(10) \qquad E(f \mid \mathscr F_{[0,n]}) = E(f \mid \mathscr F_{\{n\}}).
\]

This is true for $k = 1$ by (6'). Let $g$ be bounded, $g \in \mathscr F_{[n+1,n+k+1]}$; we are going to show that (10) remains true when $f$ is replaced by $g$. For this purpose it is sufficient to consider a $g$ of the form $g_1 g_2$, where $g_1 \in \mathscr F_{[n+1,n+k]}$, $g_2 \in \mathscr F_{\{n+k+1\}}$, both bounded. The successive steps, in slow motion fashion, are as follows:

\[
E\{g \mid \mathscr F_{[0,n]}\} = E\{E(g \mid \mathscr F_{[0,n+k]}) \mid \mathscr F_{[0,n]}\} = E\{g_1 E(g_2 \mid \mathscr F_{[0,n+k]}) \mid \mathscr F_{[0,n]}\}
\]
\[
= E\{g_1 E(g_2 \mid \mathscr F_{\{n+k\}}) \mid \mathscr F_{[0,n]}\} = E\{g_1 E(g_2 \mid \mathscr F_{\{n+k\}}) \mid \mathscr F_{\{n\}}\}
\]
\[
= E\{g_1 E(g_2 \mid \mathscr F_{[n,n+k]}) \mid \mathscr F_{\{n\}}\} = E\{E(g_1 g_2 \mid \mathscr F_{[n,n+k]}) \mid \mathscr F_{\{n\}}\}
\]
\[
= E\{g_1 g_2 \mid \mathscr F_{\{n\}}\} = E\{g \mid \mathscr F_{\{n\}}\}.
\]

It is left to the reader to scrutinize each equation carefully for explanation, except that the fifth one is an immediate consequence of the Markov property $E\{g_2 \mid \mathscr F_{\{n+k\}}\} = E\{g_2 \mid \mathscr F_{[0,n+k]}\}$ and (13) of Sec. 9.1. This establishes (7) for $M_2 \in \bigcup_{k=1}^{\infty} \mathscr F_{(n,n+k]}$, which is a field generating $\mathscr F_{(n,\infty)}$. Hence (7) is true (why?). The last assertion of the theorem is left as an exercise.
The property embodied in (8) may be announced as follows: "The past and the future are conditionally independent given the present". In this form there is a symmetry that is not apparent in the other equivalent forms.

The Markov property has an extension known as the "strong Markov property". In the case discussed here, where the time index is $N^0$, it is an automatic consequence of the ordinary Markov property, but it is of great conceptual importance in applications. We recall the notions of an optional r.v. $\alpha$, the r.v. $X_\alpha$, and the fields $\mathscr F_\alpha$ and $\mathscr F'_\alpha$, which are defined for a general sequence of r.v.'s in Sec. 8.2. Note that here $\alpha$ may take the value 0, which is included in $N^0$. We shall give the extension in the form (7).

Theorem 9.2.5. Let $\{X_n, n \in N^0\}$ be a Markov process and $\alpha$ a finite optional r.v. relative to it. Then for each $M \in \mathscr F'_\alpha$ we have

\[
(11) \qquad P\{M \mid \mathscr F_\alpha\} = P\{M \mid \alpha, X_\alpha\}.
\]

PROOF. Since $\alpha \in \mathscr F_\alpha$ and $X_\alpha \in \mathscr F_\alpha$ (Exercise 2 of Sec. 8.2), the right member above belongs to $\mathscr F_\alpha$. To prove (11) it is then sufficient to verify that the right member satisfies the defining relation for the left, when $M$ is of the form

\[
\bigcap_{j=1}^{\ell} \{X_{\alpha+j} \in B_j\}, \qquad B_j \in \mathscr B^1,\ 1 \le j \le \ell;\ 1 \le \ell < \infty.
\]

Put for each $n$,

\[
M_n = \bigcap_{j=1}^{\ell} \{X_{n+j} \in B_j\} \in \mathscr F_{(n,\infty)}.
\]

Now the crucial step is to show that

\[
(12) \qquad \sum_{n=0}^{\infty} P\{M_n \mid X_n\}\,1_{\{\alpha = n\}} = P\{M \mid \alpha, X_\alpha\}.
\]

By the lemma in the proof of Theorem 9.1.2, there exists a Borel measurable function $\varphi_n$ such that $P\{M_n \mid X_n\} = \varphi_n(X_n)$, from which it follows that the left member of (12) belongs to the Borel field generated by the two r.v.'s $\alpha$ and $X_\alpha$. Hence we shall prove (12) by verifying that its left member satisfies the defining relation for its right member, as follows. For each $m \in N^0$ and $B \in \mathscr B^1$, we have

\[
\int_{\{\alpha = m;\, X_\alpha \in B\}} \sum_{n=0}^{\infty} P\{M_n \mid X_n\}1_{\{\alpha = n\}}\,dP = \int_{\{\alpha = m;\, X_m \in B\}} P\{M_m \mid X_m\}\,dP
\]
\[
= \int_{\{\alpha = m;\, X_m \in B\}} P\{M_m \mid \mathscr F_{[0,m]}\}\,dP = P\{\alpha = m;\ X_m \in B;\ M_m\} = P\{\alpha = m;\ X_\alpha \in B;\ M\},
\]
where the second equation follows from an application of (7), and the third from the optionality of $\alpha$, namely

\[
\{\alpha = m\} \in \mathscr F_{[0,m]}.
\]

This establishes (12).

Now let $\Lambda \in \mathscr F_\alpha$; then [cf. (3) of Sec. 8.2] we have

\[
\Lambda = \bigcup_{n=0}^{\infty} (\{\alpha = n\} \cap \Lambda_n),
\]

where $\Lambda_n \in \mathscr F_{[0,n]}$. It follows that

\[
P\{\Lambda M\} = \sum_{n=0}^{\infty} P\{\alpha = n;\ \Lambda_n;\ M_n\} = \sum_{n=0}^{\infty} \int_{\{\alpha = n\} \cap \Lambda_n} P\{M_n \mid \mathscr F_{[0,n]}\}\,dP
\]
\[
= \sum_{n=0}^{\infty} \int_\Lambda P\{M_n \mid X_n\}1_{\{\alpha = n\}}\,dP = \int_\Lambda P\{M \mid \alpha, X_\alpha\}\,dP,
\]

where the third equation is by an application of (7) while the fourth is by (12). This being true for each $\Lambda$, we obtain (11). The theorem is proved.

When $\alpha$ is a constant $n$, it may be omitted from the right member of (11), and so (11) includes (7) as a particular case. It may be omitted also in the homogeneous case discussed below (because then the $\varphi_n$ above may be chosen to be independent of $n$).

There is a very general method of constructing a Markov process with given "transition probability functions", as follows. Let $P^0(\cdot)$ be an arbitrary p.m. on $(R^1, \mathscr B^1)$. For each $n \ge 1$ let $P^n(\cdot, \cdot)$ be a function of the pair $(x, B)$, where $x \in R^1$ and $B \in \mathscr B^1$, having the measurability properties below:

(a) for each $x$, $P^n(x, \cdot)$ is a p.m. on $\mathscr B^1$;
(b) for each $B$, $P^n(\cdot, B) \in \mathscr B^1$.

It is a consequence of Kolmogorov's extension theorem (see Sec. 3.3) that there exists a sequence of r.v.'s $\{X_n, n \in N^0\}$ on some probability space with the following "finite-dimensional joint distributions": for each $0 \le n < \infty$, $B_j \in \mathscr B^1$, $0 \le j \le n$:

\[
(13) \qquad P\left\{\bigcap_{j=0}^n [X_j \in B_j]\right\} = \int_{B_0} P^0(dx_0) \int_{B_1} P^1(x_0, dx_1) \times \cdots \times \int_{B_n} P^n(x_{n-1}, dx_n).
\]

There is no difficulty in verifying that (13) yields an $(n+1)$-dimensional p.m. on $(R^{n+1}, \mathscr B^{n+1})$ and so also on each subspace by setting some of the $B_j$'s

above to be R1 , and that the resulting collection of p.m.’s are mutually consis-
tent in the obvious sense [see (16) of Sec. 3.3]. But the actual construction of
the process fXn , n 2 N0 g with these given “marginal distributions” is omitted
here, the procedure being similar to but somewhat more sophisticated than that
given in Theorem 3.3.4, which is a particular case. Assuming the existence,
we will now show that it has the Markov property by verifying (6) briefly.
By Theorem 9.1.5, it will be sufficient to show that one version of the left
member of (6) is given by PnC1 Xn , B, which belongs to Ffng by condition
(b) above. Let then
n
3D [Xj 2 Bj ]
jD0

and nC1
be the n C 1-dimensional p.m. of the random vector
X0 , . . . , Xn . It follows from Theorem 3.2.3 and (13) used twice that
  
PnC1 Xn , B dP D Ð Ð Ð PnC1 xn , Bd nC1
3
B0 ðÐÐÐðBn
  
nC1
D ÐÐÐ P0 dx0  Pj xj1 , dxj 
B0 ðÐÐÐðBn ðB jD1

D P 3; XnC1 2 B.


This is what was to be shown.
We call $P^0(\cdot)$ the initial distribution of the Markov process and $P^n(\cdot, \cdot)$ its "$n$th-stage transition probability function". The case where the latter is the same for all $n \ge 1$ is particularly important, and the corresponding Markov process is said to be "(temporally) homogeneous" or "with stationary transition probabilities". In this case we write, with $x = x_0$:

\[
(14) \qquad P_n(x, B) = \int \cdots \int_{R^1 \times \cdots \times R^1 \times B} \prod_{j=0}^{n-1} P(x_j, dx_{j+1}),
\]

and call it the "$n$-step transition probability function"; when $n = 1$, the qualifier "1-step" is usually dropped. We also put $P_0(x, B) = 1_B(x)$. It is easy to see that

\[
(15) \qquad P_{n+1}(x, B) = \int_{R^1} P_n(y, B)\,P_1(x, dy),
\]

so that all $P_n$ are just the iterates of $P_1$.

It follows from Theorem 9.2.2 that for the Markov process $\{S_n, n \in N\}$ there, we have

\[
P^n(x, B) = \mu_n(B - x).
\]

In particular, a random walk is a homogeneous Markov process with the 1-step transition probability function

\[
P_1(x, B) = \mu(B - x).
\]

In the homogeneous case Theorem 9.2.4 may be sharpened to read like Theorem 8.2.2, which becomes then a particular case. The proof is left as an exercise.
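For a finite state space (cf. Exercise 11 below) the identities (14) and (15), together with the Chapman–Kolmogorov equations of Exercise 6 below, are matrix identities. The short sketch below uses an assumed 3-state transition matrix of ours.

```python
import numpy as np

# Sketch: a homogeneous chain on {0,1,2}.  P_n is the n-th matrix power of
# P_1, and Chapman-Kolmogorov P_{m+n} = P_m P_n is matrix multiplication.
P1 = np.array([[0.9, 0.1, 0.0],
               [0.2, 0.6, 0.2],
               [0.0, 0.3, 0.7]])
P2 = P1 @ P1                                   # the 2-step transition matrix
P5 = np.linalg.matrix_power(P1, 5)
assert np.allclose(np.linalg.matrix_power(P1, 3) @ P2, P5)
print(P2)
```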
Theorem 9.2.6. For a homogeneous Markov process and a finite r.v. $\alpha$ which is optional relative to the process, the pre-$\alpha$ and post-$\alpha$ fields are conditionally independent relative to $X_\alpha$, namely:

\[
\forall \Lambda \in \mathscr F_\alpha,\ M \in \mathscr F'_\alpha:\quad P\{\Lambda M \mid X_\alpha\} = P\{\Lambda \mid X_\alpha\}\,P\{M \mid X_\alpha\}.
\]

Furthermore, the post-$\alpha$ process $\{X_{\alpha+n}, n \in N\}$ is a homogeneous Markov process with the same transition probability function as the original one.

Given a Markov process $\{X_n, n \in N^0\}$, the distribution of $X_0$ is a p.m., and for each $B \in \mathscr B^1$ there is according to Theorem 9.1.2 a Borel measurable function $\varphi_n(\cdot, B)$ such that

\[
P\{X_{n+1} \in B \mid X_n = x\} = \varphi_n(x, B).
\]

It seems plausible that the function $\varphi_n(\cdot, \cdot)$ would correspond to the transition probability function just discussed. The trouble is that while condition (b) above may be satisfied for each $B$ by a particular choice of $\varphi_n(\cdot, B)$, it is by no means clear why the resulting collection for varying $B$ would satisfy condition (a). Although it is possible to ensure this by means of conditional distributions in the wide sense alluded to in Exercise 8 of Sec. 9.1, we shall not discuss it here (see Doob [16, chap. 2]).

The theory of Markov processes is the most highly developed branch of stochastic processes. Special cases such as Markov chains, diffusion, and processes with independent increments have been treated in many monographs, a few of which are listed in the Bibliography at the end of the book.
EXERCISES

*1. Prove that the Markov property is also equivalent to the following proposition: if $t_1 < \cdots < t_n < t_{n+1}$ are indices in $N^0$ and $B_j$, $1 \le j \le n + 1$, are Borel sets, then

\[
P\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_1}, \ldots, X_{t_n}\} = P\{X_{t_{n+1}} \in B_{n+1} \mid X_{t_n}\}.
\]

In this form we can define a Markov process $\{X_t\}$ with a continuous parameter $t$ ranging in $[0, \infty)$.
2. Does the Markov property imply the following (notation as in Exercise 1):

\[
P\{X_{n+1} \in B_{n+1} \mid X_1 \in B_1, \ldots, X_n \in B_n\} = P\{X_{n+1} \in B_{n+1} \mid X_n \in B_n\}?
\]

3. Unlike independence, the Markov property is not necessarily preserved by a "functional" $\{f(X_n), n \in N^0\}$. Give an example of this, but show that it is preserved by each one-to-one Borel measurable mapping $f$.
*4. Prove the strong Markov property in the form of (8).
5. Prove Theorem 9.2.6.
*6. For the $P_n$ defined in (14), prove the "Chapman–Kolmogorov equations":

\[
\forall m \in N,\ n \in N:\quad P_{m+n}(x, B) = \int_{R^1} P_m(x, dy)\,P_n(y, B).
\]

7. Generalize the Chapman–Kolmogorov equation in the nonhomogeneous case.
*8. For the homogeneous Markov process constructed in the text, show that for each $f \ge 0$ we have

\[
E\{f(X_{m+n}) \mid X_m\} = \int_{R^1} f(y)\,P_n(X_m, dy).
\]
*9. Let $B$ be a Borel set, $f_1(x, B) = P(x, B)$, and define $f_n$ for $n \ge 2$ inductively by

\[
f_n(x, B) = \int_{B^c} P(x, dy)\,f_{n-1}(y, B);
\]

put $f(x, B) = \sum_{n=1}^{\infty} f_n(x, B)$. Prove that $f(X_n, B)$ is a version of the conditional probability $P\{\bigcup_{j=n+1}^{\infty} [X_j \in B] \mid X_n\}$ for the homogeneous Markov process with transition probability function $P(\cdot, \cdot)$.
*10. Using the $f$ defined in Exercise 9, put

\[
g(x, B) = f(x, B) - \sum_{n=1}^{\infty} \int_B P_n(x, dy)\,[1 - f(y, B)].
\]

Prove that $g(X_n, B)$ is a version of the conditional probability

\[
P\{\limsup_j\,[X_j \in B] \mid X_n\}.
\]

*11. Suppose that for a homogeneous Markov process the initial distribution has support in $N^0$ as a subset of $R^1$, and that for each $i \in N^0$, the transition probability function $P(i, \cdot)$ also has support in $N^0$. Thus

\[
\{P(i, j);\ (i, j) \in N^0 \times N^0\}
\]

is an infinite matrix called the "transition matrix". Show that $P_n$ as a matrix is just the $n$th power of $P_1$. Express the probability $P\{X_{t_k} = i_k, 1 \le k \le n\}$ in terms of the elements of these matrices. [This is the case of homogeneous Markov chains.]
12. A process $\{X_n, n \in N^0\}$ is said to possess the "$r$th-order Markov property", where $r \ge 1$, iff (6) is replaced by

\[
P\{X_{n+1} \in B \mid X_0, \ldots, X_n\} = P\{X_{n+1} \in B \mid X_n, \ldots, X_{n-r+1}\}
\]

for $n \ge r - 1$. Show that if $r < s$, then the $r$th-order Markov property implies the $s$th. The ordinary Markov property is the case $r = 1$.
13. Let $Y_n$ be the random vector $(X_n, X_{n+1}, \ldots, X_{n+r-1})$. Then the vector process $\{Y_n, n \in N^0\}$ has the ordinary Markov property (trivially generalized to vectors) if and only if $\{X_n, n \in N^0\}$ has the $r$th-order Markov property.
14. Let $\{X_n, n \in N^0\}$ be an independent process. Let

\[
S_n^{(1)} = \sum_{j=0}^n X_j, \qquad S_n^{(r+1)} = \sum_{j=0}^n S_j^{(r)}
\]

for $r \ge 1$. Then $\{S_n^{(r)}, n \in N^0\}$ has the $r$th-order Markov property. For $r = 2$, give an example to show that it need not be a Markov process.
15. If $\{S_n, n \in N\}$ is a random walk such that $P\{S_1 \ne 0\} > 0$, then for any finite interval $[a, b]$ there exists an $\epsilon < 1$ such that

\[
P\{S_j \in [a, b],\ 1 \le j \le n\} \le \epsilon^n.
\]

[This is just Exercise 6 of Sec. 5.5 again.]
16. The same conclusion is true if the random walk above is replaced by a homogeneous Markov process for which, e.g., there exist $\delta > 0$ and $\epsilon > 0$ such that $P(x, R^1 - (x - \delta, x + \delta)) \ge \epsilon$ for every $x$.
9.3 Basic properties of smartingales

The sequence of sums of independent r.v.'s has motivated the generalization to a Markov process in the preceding section; in another direction it will now motivate a martingale. Changing our previous notation to conform with later usage, let $\{x_n, n \in N\}$ denote independent r.v.'s with mean zero and write $X_n = \sum_{j=1}^n x_j$ for the partial sum. Then we have

\[
E(X_{n+1} \mid x_1, \ldots, x_n) = E(X_n + x_{n+1} \mid x_1, \ldots, x_n) = X_n + E(x_{n+1} \mid x_1, \ldots, x_n) = X_n + E(x_{n+1}) = X_n.
\]

Note that the conditioning with respect to $x_1, \ldots, x_n$ may be replaced by conditioning with respect to $X_1, \ldots, X_n$ (why?). Historically, the equation above led to the consideration of dependent r.v.'s $\{x_n\}$ satisfying the condition

\[
(1) \qquad E(x_{n+1} \mid x_1, \ldots, x_n) = 0.
\]

It is astonishing that this simple property should delineate such a useful class of stochastic processes, which will now be introduced. In what follows, where the index set for $n$ is not specified, it is understood to be either $N$ or some initial segment $N_m$ of $N$.
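Before the formal definition, here is a small simulation sketch of the computation above (the $\pm 1$ steps are an arbitrary choice of ours). For such a walk, conditioning on the whole past up to time $n$ can be exhibited by grouping paths on the current value $X_n$.

```python
import numpy as np

rng = np.random.default_rng(7)

# Sketch of E(X_{n+1} | x_1,...,x_n) = X_n for partial sums of mean-zero steps.
paths, n = 500_000, 5
x = rng.choice([-1.0, 1.0], size=(paths, n + 1))
X = x.cumsum(axis=1)                      # columns hold X_1, ..., X_{n+1}

for v in np.unique(X[:, n - 1]):          # column n-1 holds X_n
    cell = X[:, n - 1] == v
    print(v, X[cell, n].mean())           # conditional mean of X_{n+1}: close to v
```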
DEFINITION OF MARTINGALE. The sequence of r.v.'s and B.F.'s $\{X_n, \mathscr F_n\}$ is called a martingale iff we have for each $n$:

(a) $\mathscr F_n \subset \mathscr F_{n+1}$ and $X_n \in \mathscr F_n$;
(b) $E(|X_n|) < \infty$;
(c) $X_n = E(X_{n+1} \mid \mathscr F_n)$, a.e.

It is called a supermartingale iff the "$=$" in (c) above is replaced by "$\ge$", and a submartingale iff it is replaced by "$\le$". For abbreviation we shall use the term smartingale to cover all three varieties. In case $\mathscr F_n = \mathscr F_{[1,n]}$ as defined in Sec. 9.2, we shall omit $\mathscr F_n$ and write simply $\{X_n\}$; more frequently however we shall consider $\{\mathscr F_n\}$ as given in advance and omitted from the notation. Condition (a) is nowadays referred to as: $\{X_n\}$ is adapted to $\{\mathscr F_n\}$. Condition (b) says that all the r.v.'s are integrable; we shall have to impose stronger conditions to obtain most of our results. A particularly important one is the uniform integrability of the sequence $\{X_n\}$, which is discussed in Sec. 4.5. A weaker condition is given by

\[
(2) \qquad \sup_n E(|X_n|) < \infty;
\]

when this is satisfied we shall say that $\{X_n\}$ is $L^1$-bounded. Condition (c) leads at once to the more general relation:

\[
(3) \qquad n < m \Rightarrow X_n = E(X_m \mid \mathscr F_n).
\]

This follows from Theorem 9.1.5 by induction since

\[
E(X_m \mid \mathscr F_n) = E(E(X_m \mid \mathscr F_{m-1}) \mid \mathscr F_n) = E(X_{m-1} \mid \mathscr F_n).
\]
An equivalent form of (3) is as follows: for each $\Lambda \in \mathscr F_n$ and $n \le m$, we have

\[
(4) \qquad \int_\Lambda X_n\,dP = \int_\Lambda X_m\,dP.
\]

It is often safer to use the explicit formula (4) rather than (3), because conditional expectations can be slippery things to handle. We shall refer to (3) or (4) as the defining relation of a martingale; similarly for the "super" and "sub" varieties.

Let us observe that in the form (3) or (4), the definition of a smartingale is meaningful if the index set $N$ is replaced by any linearly ordered set, with "$<$" as the strict order. For instance, it may be an interval or the set of rational numbers in the interval. But even if we confine ourselves to a discrete parameter (as we shall do) there are other index sets to be considered below.

It is scarcely worth mentioning that $\{X_n\}$ is a supermartingale if and only if $\{-X_n\}$ is a submartingale, and that a martingale is both. However the extension of results from a martingale to a smartingale is not always trivial, nor is it done for the sheer pleasure of generalization. For it is clear that martingales are harder to come by than the other varieties. As between the super and sub cases, though we can pass from one to the other by simply changing signs, our force of habit may influence the choice. The next proposition is a case in point.
Theorem 9.3.1. Let $\{X_n, \mathscr F_n\}$ be a submartingale and let $\varphi$ be an increasing convex function defined on $R^1$. If $\varphi(X_n)$ is integrable for every $n$, then $\{\varphi(X_n), \mathscr F_n\}$ is also a submartingale.

PROOF. Since $\varphi$ is increasing, and

\[
X_n \le E\{X_{n+1} \mid \mathscr F_n\},
\]

we have

\[
(5) \qquad \varphi(X_n) \le \varphi(E\{X_{n+1} \mid \mathscr F_n\}).
\]

By Jensen's inequality (Sec. 9.1), the right member above does not exceed $E\{\varphi(X_{n+1}) \mid \mathscr F_n\}$; this proves the theorem. As forewarned in 9.1, we have left out some "a.e." above and shall continue to do so.

Corollary 1. If $\{X_n, \mathscr F_n\}$ is a submartingale, then so is $\{X_n^+, \mathscr F_n\}$. Thus $E(X_n^+)$ as well as $E(X_n)$ is increasing with $n$.

Corollary 2. If $\{X_n, \mathscr F_n\}$ is a martingale, then $\{|X_n|, \mathscr F_n\}$ is a submartingale; and $\{|X_n|^p, \mathscr F_n\}$, $1 < p < \infty$, is a submartingale provided that every $X_n \in L^p$; similarly for $\{|X_n| \log^+ |X_n|, \mathscr F_n\}$ where $\log^+ x = (\log x) \vee 0$ for $x \ge 0$.
PROOF. For a martingale we have equality in (5) for any convex $\varphi$; hence we may take $\varphi(x) = |x|$, $|x|^p$ or $|x| \log^+ |x|$ in the proof above.

Thus for a martingale $\{X_n\}$, all three transmutations: $\{X_n^+\}$, $\{X_n^-\}$ and $\{|X_n|\}$ are submartingales. For a submartingale $\{X_n\}$, nothing is said about the last two.

Corollary 3. If $\{X_n, \mathscr F_n\}$ is a supermartingale, then so is $\{X_n \wedge A, \mathscr F_n\}$, where $A$ is any constant.

PROOF. We leave it to the reader to deduce this from the theorem, but here is a quick direct proof:

\[
X_n \wedge A \ge E(X_{n+1} \mid \mathscr F_n) \wedge E(A \mid \mathscr F_n) \ge E(X_{n+1} \wedge A \mid \mathscr F_n).
\]

It is possible to represent any smartingale as a martingale plus or minus something special. Let us call a sequence of r.v.'s $\{Z_n, n \in N\}$ an increasing process iff it satisfies the conditions:

(i) $Z_1 = 0$; $Z_n \le Z_{n+1}$ for $n \ge 1$;
(ii) $E(Z_n) < \infty$ for each $n$.

It follows that $Z_\infty = \lim_{n \to \infty} \uparrow Z_n$ exists but may take the value $+\infty$; $Z_\infty$ is integrable if and only if $\{Z_n\}$ is $L^1$-bounded as defined above, which means here $\lim_{n \to \infty} \uparrow E(Z_n) < \infty$. This is also equivalent to the uniform integrability of $\{Z_n\}$ because of (i). We can now state the result as follows.
Theorem 9.3.2. Any submartingale $\{X_n, \mathscr F_n\}$ can be written as

\[
(6) \qquad X_n = Y_n + Z_n,
\]

where $\{Y_n, \mathscr F_n\}$ is a martingale, and $\{Z_n\}$ is an increasing process.

PROOF. From $\{X_n\}$ we define its difference sequence as follows:

\[
(7) \qquad x_1 = X_1, \quad x_n = X_n - X_{n-1}, \quad n \ge 2,
\]

so that $X_n = \sum_{j=1}^n x_j$, $n \ge 1$ (cf. the notation in the first paragraph of this section). The defining relation for a submartingale then becomes

\[
E\{x_n \mid \mathscr F_{n-1}\} \ge 0,
\]

with equality for a martingale. Furthermore, we put

\[
y_1 = x_1, \quad y_n = x_n - E\{x_n \mid \mathscr F_{n-1}\}, \quad Y_n = \sum_{j=1}^n y_j;
\]
\[
z_1 = 0, \quad z_n = E\{x_n \mid \mathscr F_{n-1}\}, \quad Z_n = \sum_{j=1}^n z_j.
\]

Then clearly $x_n = y_n + z_n$ and (6) follows by addition. To show that $\{Y_n, \mathscr F_n\}$ is a martingale, we may verify that $E\{y_n \mid \mathscr F_{n-1}\} = 0$ as indicated a moment ago, and this is trivial by Theorem 9.1.5. Since each $z_n \ge 0$, it is equally obvious that $\{Z_n\}$ is an increasing process. The theorem is proved.

Observe that $Z_n \in \mathscr F_{n-1}$ for each $n$, by definition. This has important consequences; see Exercise 9 below. The decomposition (6) will be called Doob's decomposition. For a supermartingale we need only change the "$+$" there into "$-$", since $\{-Y_n, \mathscr F_n\}$ is a martingale. The following complement is useful.
Corollary. If $\{X_n\}$ is $L^1$-bounded [or uniformly integrable], then both $\{Y_n\}$ and $\{Z_n\}$ are $L^1$-bounded [or uniformly integrable].

PROOF. We have from (6):

\[
E(Z_n) \le E(|X_n|) - E(Y_1),
\]

since $E(Y_n) = E(Y_1)$. Since $Z_n \ge 0$ this shows that if $\{X_n\}$ is $L^1$-bounded, then so is $\{Z_n\}$; and $\{Y_n\}$ is too because

\[
E(|Y_n|) \le E(|X_n|) + E(Z_n).
\]

Next if $\{X_n\}$ is uniformly integrable, then it is $L^1$-bounded by Theorem 4.5.3, hence $\{Z_n\}$ is $L^1$-bounded and therefore uniformly integrable as remarked before. The uniform integrability of $\{Y_n\}$ then follows from the last-written inequality.
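For a concrete instance of Doob's decomposition (our own choice of submartingale, not the book's): with $X_n = S_n^2$ for a $\pm 1$ walk, $z_n = E(x_n \mid \mathscr F_{n-1}) = 1$ for $n \ge 2$, so $Z_n = n - 1$ and $Y_n = S_n^2 - (n - 1)$. A simulation sketch:

```python
import numpy as np

rng = np.random.default_rng(8)

# Sketch of (6) for X_n = S_n^2, S_n a +/-1 random walk: Z_n = n - 1 is an
# increasing process (Z_1 = 0), and Y_n = S_n^2 - (n - 1) is the martingale part.
paths, N = 400_000, 8
S = rng.choice([-1.0, 1.0], size=(paths, N)).cumsum(axis=1)
Y = S**2 - np.arange(N)                  # column j holds Y_{j+1} = S_{j+1}^2 - j
print(Y.mean(axis=0))                    # every E(Y_n) equals E(Y_1) = 1
```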
We come now to the fundamental notion of optional sampling of a smartingale. This consists in substituting certain random variables for the original index $n$ regarded as the time parameter of the process. Although this kind of thing has been done in Chapter 8, we will reintroduce it here in a slightly different way for the convenience of the reader. To begin with we adjoin a last index $\infty$ to the set $N$ and call it $N_\infty = \{1, 2, \ldots, \infty\}$. This is an example of a linearly ordered set mentioned above. Next, adjoin $\mathscr F_\infty = \bigvee_{n=1}^{\infty} \mathscr F_n$ to $\{\mathscr F_n\}$.

A r.v. $\alpha$ taking values in $N_\infty$ is called optional (relative to $\{\mathscr F_n, n \in N_\infty\}$) iff for every $n \in N_\infty$ we have

\[
(8) \qquad \{\alpha \le n\} \in \mathscr F_n.
\]

Since $\mathscr F_n$ increases with $n$, the condition in (8) is unchanged if $\{\alpha \le n\}$ is replaced by $\{\alpha = n\}$. Next, for an optional $\alpha$, the pre-$\alpha$ field $\mathscr F_\alpha$ is defined to be the class of all subsets $\Lambda$ of $\mathscr F_\infty$ satisfying the following condition: for each $n \in N_\infty$ we have

\[
(9) \qquad \Lambda \cap \{\alpha \le n\} \in \mathscr F_n,
\]

where again $\{\alpha \le n\}$ may be replaced by $\{\alpha = n\}$. Writing then

\[
(10) \qquad \Lambda_n = \Lambda \cap \{\alpha = n\},
\]

we have $\Lambda_n \in \mathscr F_n$ and

\[
\Lambda = \bigcup_n \Lambda_n = \bigcup_n [\{\alpha = n\} \cap \Lambda_n],
\]

where the index $n$ ranges over $N_\infty$. This is (3) of Sec. 8.2. The reader should now do Exercises 1–4 in Sec. 8.2 to get acquainted with the simplest properties of optionality. Here are some of them which will be needed soon: $\mathscr F_\alpha$ is a B.F. and $\alpha \in \mathscr F_\alpha$; if $\alpha$ is optional then so is $\alpha \wedge n$ for each $n \in N$; if $\alpha \le \beta$ where $\beta$ is also optional then $\mathscr F_\alpha \subset \mathscr F_\beta$; in particular $\mathscr F_{\alpha \wedge n} \subset \mathscr F_\alpha \cap \mathscr F_n$, and in fact this inclusion is an equation.

Next we assume $X_\infty$ has been defined and $X_\infty \in \mathscr F_\infty$. We then define $X_\alpha$ as follows:

\[
(11) \qquad X_\alpha(\omega) = X_{\alpha(\omega)}(\omega);
\]

in other words,

\[
X_\alpha(\omega) = X_n(\omega) \quad \text{on } \{\alpha = n\},\ n \in N_\infty.
\]

This definition makes sense for any $\alpha$ taking values in $N_\infty$, but for an optional $\alpha$ we can assert moreover that

\[
(12) \qquad X_\alpha \in \mathscr F_\alpha.
\]

This is an exercise the reader should not miss; observe that it is a natural but nontrivial extension of the assumption $X_n \in \mathscr F_n$ for every $n$. Indeed, all the general propositions concerning optional sampling aim at the same thing, namely to make optional times behave like constant times, or again to enable us to substitute optional r.v.'s for constants. For this purpose conditions must sometimes be imposed either on $\alpha$ or on the smartingale $\{X_n\}$. Let us however begin with a perfect case which turns out to be also very important.

We introduce a class of martingales as follows. For any integrable r.v. $Y$ we put

\[
(13) \qquad X_n = E(Y \mid \mathscr F_n), \quad n \in N_\infty.
\]

By Theorem 9.1.5, if $n \le m$:

\[
(14) \qquad X_n = E\{E(Y \mid \mathscr F_m) \mid \mathscr F_n\} = E\{X_m \mid \mathscr F_n\},
\]

which shows $\{X_n, \mathscr F_n\}$ is a martingale, not only on $N$ but also on $N_\infty$. The following properties extend both (13) and (14) to optional times.
Theorem 9.3.3. For any optional $\alpha$, we have

\[
(15) \qquad X_\alpha = E(Y \mid \mathscr F_\alpha).
\]

If $\alpha \le \beta$ where $\beta$ is also optional, then $\{X_\alpha, \mathscr F_\alpha;\ X_\beta, \mathscr F_\beta\}$ forms a two-term martingale.

PROOF. Let us first show that $X_\alpha$ is integrable. It follows from (13) and Jensen's inequality that

\[
|X_n| \le E(|Y| \mid \mathscr F_n).
\]

Since $\{\alpha = n\} \in \mathscr F_n$, we may apply this to get

\[
\int_\Omega |X_\alpha|\,dP = \sum_n \int_{\{\alpha = n\}} |X_n|\,dP \le \sum_n \int_{\{\alpha = n\}} |Y|\,dP = \int_\Omega |Y|\,dP < \infty.
\]

Next if $\Lambda \in \mathscr F_\alpha$, we have, using the notation in (10):

\[
\int_\Lambda X_\alpha\,dP = \sum_n \int_{\Lambda_n} X_n\,dP = \sum_n \int_{\Lambda_n} Y\,dP = \int_\Lambda Y\,dP,
\]

where the second equation holds by (13) because $\Lambda_n \in \mathscr F_n$. This establishes (15). Now if $\alpha \le \beta$, then $\mathscr F_\alpha \subset \mathscr F_\beta$ and consequently by Theorem 9.1.5,

\[
X_\alpha = E\{E(Y \mid \mathscr F_\beta) \mid \mathscr F_\alpha\} = E\{X_\beta \mid \mathscr F_\alpha\},
\]

which proves the second assertion of the theorem.

As an immediate corollary, if $\{\alpha_n\}$ is a sequence of optional r.v.'s such that

\[
(16) \qquad \alpha_1 \le \alpha_2 \le \cdots \le \alpha_n \le \cdots,
\]

then $\{X_{\alpha_n}, \mathscr F_{\alpha_n}\}$ is a martingale. This new martingale is obtained by sampling the original one at the optional times $\{\alpha_j\}$. We now proceed to extend the second part of Theorem 9.3.3 to a supermartingale. There are two important cases which will be discussed separately.
Theorem 9.3.4. Let $\alpha$ and $\beta$ be two bounded optional r.v.'s such that $\alpha \le \beta$. Then for any [super]martingale $\{X_n\}$, $\{X_\alpha, \mathscr F_\alpha;\ X_\beta, \mathscr F_\beta\}$ forms a [super]martingale.

PROOF. Let $\Lambda \in \mathscr F_\alpha$; using (10) again we have for each $k \ge j$:

\[
\Lambda_j \cap \{\beta > k\} \in \mathscr F_k,
\]

because $\Lambda_j \in \mathscr F_j \subset \mathscr F_k$, whereas $\{\beta > k\} = \{\beta \le k\}^c \in \mathscr F_k$. It follows from the defining relation of a supermartingale that

\[
\int_{\Lambda_j \cap \{\beta > k\}} X_k\,dP \ge \int_{\Lambda_j \cap \{\beta > k\}} X_{k+1}\,dP,
\]

and consequently

\[
\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\,dP \ge \int_{\Lambda_j \cap \{\beta = k\}} X_k\,dP + \int_{\Lambda_j \cap \{\beta > k\}} X_{k+1}\,dP.
\]

Rewriting this as

\[
\int_{\Lambda_j \cap \{\beta \ge k\}} X_k\,dP - \int_{\Lambda_j \cap \{\beta \ge k+1\}} X_{k+1}\,dP \ge \int_{\Lambda_j \cap \{\beta = k\}} X_\beta\,dP;
\]

summing over $k$ from $j$ to $m$, where $m$ is an upper bound for $\beta$; and then replacing $X_j$ by $X_\alpha$ on $\Lambda_j$, we obtain

\[
(17) \qquad \int_{\Lambda_j \cap \{\beta \ge j\}} X_\alpha\,dP - \int_{\Lambda_j \cap \{\beta \ge m+1\}} X_{m+1}\,dP \ge \int_{\Lambda_j \cap \{j \le \beta \le m\}} X_\beta\,dP.
\]

Since $\beta \le m$ and, on $\Lambda_j$, $\beta \ge \alpha = j$, the second integral on the left vanishes while the two remaining sets of integration reduce to $\Lambda_j$; hence

\[
\int_{\Lambda_j} X_\alpha\,dP \ge \int_{\Lambda_j} X_\beta\,dP.
\]

Another summation over $j$ from 1 to $m$ yields the desired result. In the case of a martingale the inequalities above become equations.

A particular case of a bounded optional r.v. is $\alpha_n = \alpha \wedge n$, where $\alpha$ is an arbitrary optional r.v. and $n$ is a positive integer. Applying the preceding theorem to the sequence $\{\alpha_n\}$ as under Theorem 9.3.3, we have the following corollary.

Corollary. If $\{X_n, \mathscr F_n\}$ is a [super]martingale and $\alpha$ is an arbitrary optional r.v., then $\{X_{\alpha \wedge n}, \mathscr F_{\alpha \wedge n}\}$ is a [super]martingale.
In the next basic theorem we shall assume that the [super]martingale is given on the index set $N_\infty$. This is necessary when the optional r.v. can take the value $+\infty$, as required in many applications; see the typical example in (5) of Sec. 8.2. It turns out that if $\{X_n\}$ is originally given only for $n \in N$, we may take $X_\infty = \lim_{n \to \infty} X_n$ to extend it to $N_\infty$ under certain conditions; see Theorems 9.4.5 and 9.4.6 and Exercise 6 of Sec. 9.4. A trivial case occurs when $\{X_n, \mathscr F_n;\ n \in N\}$ is a positive supermartingale; we may then take $X_\infty = 0$.

Theorem 9.3.5. Let $\alpha$ and $\beta$ be two arbitrary optional r.v.'s such that $\alpha \le \beta$. Then the conclusion of Theorem 9.3.4 holds true for any supermartingale $\{X_n, \mathscr F_n;\ n \in N_\infty\}$.
Remark. For a martingale $\{X_n, \mathscr F_n;\ n \in N_\infty\}$ this theorem is contained in Theorem 9.3.3, since we may take the $Y$ in (13) to be $X_\infty$ here.

PROOF. (a) Suppose first that the supermartingale is positive with $X_\infty = 0$ a.e. The inequality (17) is true for every $m \in N$, but now the second integral there is positive so that we have

\[
\int_{\Lambda_j} X_\alpha\,dP \ge \int_{\Lambda_j \cap \{\beta \le m\}} X_\beta\,dP.
\]

Since the integrands are positive, the integrals exist and we may let $m \to \infty$ and then sum over $j \in N$. The result is

\[
\int_{\Lambda \cap \{\alpha < \infty\}} X_\alpha\,dP \ge \int_{\Lambda \cap \{\beta < \infty\}} X_\beta\,dP,
\]

which falls short of the goal. But we can add the equality

\[
\int_{\Lambda \cap \{\alpha = \infty\}} X_\alpha\,dP = \int_{\Lambda \cap \{\alpha = \infty\}} X_\infty\,dP = \int_{\Lambda \cap \{\beta = \infty\}} X_\infty\,dP = \int_{\Lambda \cap \{\beta = \infty\}} X_\beta\,dP,
\]

which is trivial because $X_\infty = 0$ a.e. This yields the desired

\[
(18) \qquad \int_\Lambda X_\alpha\,dP \ge \int_\Lambda X_\beta\,dP.
\]

Let us show that $X_\alpha$ and $X_\beta$ are in fact integrable. Since $X_n \ge X_\infty$ we have $X_\alpha \le \liminf_{n \to \infty} X_{\alpha \wedge n}$, so that by Fatou's lemma,

\[
(19) \qquad E(X_\alpha) \le \liminf_{n \to \infty} E(X_{\alpha \wedge n}).
\]

Since 1 and $\alpha \wedge n$ are two bounded optional r.v.'s satisfying $1 \le \alpha \wedge n$, the right-hand side of (19) does not exceed $E(X_1)$ by Theorem 9.3.4. This shows $X_\alpha$ is integrable, since it is positive.

(b) In the general case we put

\[
X_n' = E\{X_\infty \mid \mathscr F_n\}, \qquad X_n'' = X_n - X_n'.
\]

Then $\{X_n', \mathscr F_n;\ n \in N_\infty\}$ is a martingale of the kind introduced in (13), and $X_n \ge X_n'$ by the defining property of supermartingale applied to $X_n$ and $X_\infty$. Hence the difference $\{X_n'', \mathscr F_n;\ n \in N\}$ is a positive supermartingale with $X_\infty'' = 0$ a.e. By Theorem 9.3.3, $\{X_\alpha', \mathscr F_\alpha;\ X_\beta', \mathscr F_\beta\}$ forms a martingale; by case (a), $\{X_\alpha'', \mathscr F_\alpha;\ X_\beta'', \mathscr F_\beta\}$ forms a supermartingale. Hence the conclusion of the theorem follows simply by addition.

The two preceding theorems are the basic cases of Doob's optional sampling theorem. They do not cover all cases of optional sampling (see e.g. Exercise 11 of Sec. 8.2 and Exercise 11 below), but are adequate for many applications, some of which will be given later.
Martingale theory has its intuitive background in gambling. If $X_n$ is interpreted as the gambler's capital at time $n$, then the defining property postulates that his expected capital after one more game, played with the knowledge of the entire past and present, is exactly equal to his current capital. In other words, his expected gain is zero, and in this sense the game is said to be "fair". Similarly a smartingale is a game consistently biased in one direction. Now the gambler may opt to play the game only at certain preferred times, chosen with the benefit of past experience and present observation, but without clairvoyance into the future. [The inclusion of the present status in his knowledge seems to violate raw intuition, but examine the example below and Exercise 13.] He hopes of course to gain advantage by devising such a "system", but Doob's theorem forestalls him, at least mathematically. We have already mentioned such an interpretation in Sec. 8.2 (see in particular Exercise 11 of Sec. 8.2; note that $\alpha + 1$ rather than $\alpha$ is the optional time there). The present generalization consists in replacing a stationary independent process by a smartingale. The classical problem of "gambler's ruin" illustrates very well the ideas involved, as follows.

Let $\{S_n, n \in N^0\}$ be a random walk in the notation of Chapter 8, and let $S_1$ have the Bernoullian distribution $\frac{1}{2}\delta_1 + \frac{1}{2}\delta_{-1}$. It follows from Theorem 8.3.4, or the more elementary Exercise 15 of Sec. 9.2, that the walk will almost certainly leave the interval $[-a, b]$, where $a$ and $b$ are strictly positive integers; and since it can move only one unit a time, it must reach either $-a$ or $b$. This means that if we set

\[
(20) \qquad \alpha = \min\{n \ge 1\colon S_n = -a\}, \qquad \beta = \min\{n \ge 1\colon S_n = b\},
\]

then $\gamma = \alpha \wedge \beta$ is a finite optional r.v. It follows from the Corollary to Theorem 9.3.4 that $\{S_{\gamma \wedge n}\}$ is a martingale. Now

\[
(21) \qquad \lim_{n \to \infty} S_{\gamma \wedge n} = S_\gamma \quad \text{a.e.},
\]

and clearly $S_\gamma$ takes only the values $-a$ and $b$. The question is: with what probabilities? In the gambling interpretation: if two gamblers play a fair coin-tossing game and possess, respectively, $a$ and $b$ units of the constant stake as initial capitals, what is the probability of ruin for each?

The answer is immediate ("without any computation"!) if we show first that the two r.v.'s $\{S_1, S_\gamma\}$ form a martingale, for then

\[
(22) \qquad E(S_\gamma) = E(S_1) = 0,
\]

which is to say that

\[
-a\,P\{S_\gamma = -a\} + b\,P\{S_\gamma = b\} = 0,
\]

so that the probability of ruin is inversely proportional to the initial capital of the gambler, a most sensible solution.
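A simulation sketch of this conclusion (the parameters $a = 3$, $b = 7$ are arbitrary choices of ours):

```python
import numpy as np

rng = np.random.default_rng(9)

# Sketch of the ruin problem: a fair +/-1 walk stopped at -a or b.  By (22),
# P{S_gamma = -a} = b/(a + b): ruin is inversely proportional to capital.
a, b, paths = 3, 7, 100_000
hits = np.empty(paths)
for i in range(paths):
    s = 0
    while s != -a and s != b:            # gamma = alpha ^ beta is a.s. finite
        s += 1 if rng.random() < 0.5 else -1
    hits[i] = s
print((hits == -a).mean(), b / (a + b))  # empirical ruin frequency vs. 0.7
print(hits.mean())                       # E(S_gamma) = 0
```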
To show that the pair $\{S_1, S_\gamma\}$ forms a martingale we use Theorem 9.3.5, since $\{S_{\gamma \wedge n}, n \in N_\infty\}$ is a bounded martingale. The more elementary Theorem 9.3.4 is inapplicable, since $\gamma$ is not bounded. However, there is a simpler way out in this case: (21) and the boundedness just mentioned imply that

\[
E(S_\gamma) = \lim_{n \to \infty} E(S_{\gamma \wedge n}),
\]

and since $E(S_{\gamma \wedge 1}) = E(S_1)$, (22) follows directly.

The ruin problem belonged to the ancient history of probability theory, and can be solved by elementary methods based on difference equations (see, e.g., Uspensky, Introduction to mathematical probability, McGraw-Hill, New York, 1937). The approach sketched above, however, has all the main ingredients of an elaborate modern theory. The little equation (22) is the prototype of a "harmonic equation", and the problem itself is a "boundary-value problem". The steps used in the solution, to wit: the introduction of a martingale, its optional stopping, its convergence to a limit, and the extension of the martingale property to include the limit with the consequent convergence of expectations, are all part of a standard procedure now ensconced in the general theory of Markov processes and the allied potential theory.
EXERCISES

1. The defining relation for a martingale may be generalized as follows. For each optional r.v. $\alpha \le n$, we have $E\{X_n \mid \mathscr F_\alpha\} = X_\alpha$. Similarly for a smartingale.
*2. If $X$ is an integrable r.v., then the collection of (equivalence classes of) r.v.'s $E(X \mid \mathscr G)$, with $\mathscr G$ ranging over all Borel subfields of $\mathscr F$, is uniformly integrable.
3. Suppose $\{X_n^{(k)}, \mathscr F_n\}$, $k = 1, 2$, are two [super]martingales, $\alpha$ is a finite optional r.v., and $X_\alpha^{(1)} = [\ge]\, X_\alpha^{(2)}$. Define $X_n = X_n^{(1)} 1_{\{n \le \alpha\}} + X_n^{(2)} 1_{\{n > \alpha\}}$; show that $\{X_n, \mathscr F_n\}$ is a [super]martingale. [HINT: Verify the defining relation in (4) for $m = n + 1$.]
4. Suppose each $X_n$ is integrable and

\[
E\{X_{n+1} \mid X_1, \ldots, X_n\} = n^{-1}(X_1 + \cdots + X_n);
\]

then $\{n^{-1}(X_1 + \cdots + X_n), n \in N\}$ is a martingale.
5. Every sequence of integrable r.v.'s is the sum of a supermartingale and a submartingale.
6. If $\{X_n, \mathscr F_n\}$ and $\{X_n', \mathscr F_n\}$ are martingales, then so is $\{X_n + X_n', \mathscr F_n\}$. But it may happen that $\{X_n\}$ and $\{X_n'\}$ are martingales while $\{X_n + X_n'\}$ is not. [HINT: Let $x_1$ and $x_1'$ be independent Bernoullian r.v.'s; and $x_2 = x_2' = +1$ or $-1$ according as $x_1 + x_1' = 0$ or $\ne 0$; notation as in (7).]
7. Find an example of a positive martingale which is not uniformly inte-
grable. [HINT: You win 2n if it’s heads n times in a row, and you lose everything
as soon as it’s tails.]
8. Find an example of a martingale fXn g such that Xn ! 1 a.e. This
implies that even in a “fair” game one player may be bound to lose an
arbitrarily large amount if he plays long enough (and no limit is set to the
liability of the other player). [HINT: Try sums of independent but not identically
distributed r.v.’s with mean 0.]
 9. Prove that if fY , F g is a martingale such that Y 2 F , then for
n n n n1
every n, Yn D Y1 a.e. Deduce from this result that Doob’s decomposition (6)
is unique (up to equivalent r.v.’s) under the condition that Zn 2 Fn1 for every
n ½ 2. If this condition is not imposed, find two different decompositions.
10. If fXn g is a uniformly integrable submartingale, then for any optional
r.v. ˛ we have

(i) fX˛3n g is a uniformly integrable submartingale;


(ii) E X1   E X˛   supn E Xn .

[HINT: jX˛3n j  jX˛ j C jXn j.]


 11. Let fX , F ; n 2 Ng be a [super]martingale satisfying the following
n n
condition: there exists a constant M such that for every n ½ 1:

E fjXn  Xn1 jFn1 g  Ma.e.

where X0 D 0 and F0 is trivial. Then for any two optional r.v.’s ˛ and ˇ
such that ˛  ˇ and E ˇ < 1, fX˛ , F˛ ; Xˇ , Fˇ g is a [super]martingale. This
is another case of optional sampling given by Doob, which includes Wald’s
equation (Theorem 5.5.3) as a special case. [HINT: Dominate  the integrand in
the second integral in (17) by Yˇ where X0 D 0 and Ym D m nD1 jXn  Xn1 j.
We have
1 
E Yˇ  D jXn  Xn1 j dP  ME ˇ.]
nD1 fˇ½ng

12. Apply Exercise 11 to the gambler’s ruin problem discussed in the


text and conclude that for the ˛ in (20) we must have E ˛ D C1. Verify
this by elementary computation.
 13. In the gambler’s ruin problem take b D 1 in (20). Compute E S 
ˇ^n
for a fixed n and show that fS0 , Sˇ^n g forms a martingale. Observe that fS0 , Sˇ g
does not form a martingale and explain in gambling terms the effect of stopping
346 CONDITIONING. MARKOV PROPERTY. MARTINGALE

ˇ at n. This example shows why in optional sampling the option may be taken
even with the knowledge of the present moment under certain conditions. In
the case here the present (namely ˇ ^ n) may leave one no choice!
14. In the gambler’s ruin problem, suppose that S1 has the distribution
pυ1 C 1  pυ1 , p 6D 12 ;

and let d D 2p  1. Show that E S  D dE . Compute the probabilities of


ruin by using difference equations to deduce E , and vice versa.
15. Prove that for any L 1 -bounded smartingale fXn , Fn , n 2 N1 g, and
any optional ˛, we have E jX˛ j < 1. [HINT: Prove the result first for a
martingale, then use Doob’s decomposition.]
 16. Let fX , F g be a martingale: x D X , x D X  X
n n 1 1 n n n1 for n ½ 2;
let vn 2 Fn1 for n ½ 1 where F0 D F1 ; now put

n
Tn D vj xj .
jD1

Show that fTn , Fn g is a martingale provided that Tn is integrable for every n.


The martingale may be replaced by a smartingale if vn ½ 0 for every n. As
a particular case take vn D 1fn˛g where ˛ is an optional r.v. relative to fFn g.
What then is Tn ? Hence deduce the Corollary to Theorem 9.3.4.
17. As in the preceding exercise, deduce a new proof of Theorem 9.3.4
by taking vn D 1f˛<nˇg .

9.4 Inequalities and convergence

We begin with two inequalities, the first of which is a generalization of
Kolmogorov's inequality (Theorem 5.3.1).

Theorem 9.4.1. If {X_j, F_j, j ∈ N_n} is a submartingale, then for each real λ
we have

(1)  λ P{max_{1≤j≤n} X_j ≥ λ} ≤ ∫_{{max_{1≤j≤n} X_j ≥ λ}} X_n dP ≤ E(X_n⁺);

(2)  λ P{min_{1≤j≤n} X_j ≤ −λ} ≤ E(X_n) − E(X_1) − ∫_{{min_{1≤j≤n} X_j ≤ −λ}} X_n dP
       ≤ E(X_n⁺) − E(X_1).
PROOF. Let α be the first j such that X_j ≥ λ if there is such a j in N_n;
otherwise let α = n (optional stopping at n). It is clear that α is optional;
since it takes only a finite number of values, Theorem 9.3.4 shows that the
pair {X_α, X_n} forms a submartingale. If we write

M = {max_{1≤j≤n} X_j ≥ λ},

then M ∈ F_α (why?) and X_α ≥ λ on M; hence the first inequality follows
from

λ P(M) ≤ ∫_M X_α dP ≤ ∫_M X_n dP;

the second is just a cruder consequence.

Similarly let β be the first j such that X_j ≤ −λ if there is such a j in
N_n; otherwise let β = n. Put also

M_k = {min_{1≤j≤k} X_j ≤ −λ}.

Then {X_1, X_β} is a submartingale by Theorem 9.3.4, and so

E(X_1) ≤ E(X_β) = ∫_{{β≤n−1}} X_β dP + ∫_{M_n M_{n−1}^c} X_n dP + ∫_{M_n^c} X_n dP

 ≤ −λ P(M_n) + E(X_n) − ∫_{M_n} X_n dP,

which reduces to (2).
Corollary 1. If {X_n} is a martingale, then for each λ > 0:

(3)  P{max_{1≤j≤n} |X_j| ≥ λ} ≤ (1/λ) ∫_{{max_{1≤j≤n} |X_j| ≥ λ}} |X_n| dP ≤ (1/λ) E(|X_n|).

If in addition E(X_n²) < ∞ for each n, then we have also

(4)  P{max_{1≤j≤n} |X_j| ≥ λ} ≤ (1/λ²) E(X_n²).

These are obtained by applying the theorem to the submartingales {|X_n|}
and {X_n²}. In case X_n is the S_n in Theorem 5.3.1, (4) is precisely the Kolmogorov
inequality there.

Corollary 2. Let 1 ≤ m ≤ n, Λ_m ∈ F_m and M = {max_{m≤j≤n} X_j ≥ λ}; then

λ P{Λ_m ∩ M} ≤ ∫_{Λ_m ∩ M} X_n dP.

This is proved just as (1) and will be needed later.
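Inequality (3) is easy to probe numerically. The following Python sketch (ours, under the assumption that the symmetric simple random walk serves as the martingale, with level c = λ√n) compares the two sides:

```python
import random

def maximal_inequality_demo(n=200, lam=3.0, trials=20_000, seed=2):
    """Check P{max_j |X_j| >= c} <= E(|X_n|)/c  (Corollary 1, (3))
    for X_j a symmetric simple random walk and c = lam * sqrt(n)."""
    rng = random.Random(seed)
    c = lam * n ** 0.5
    exceed, abs_end = 0, 0.0
    for _ in range(trials):
        s, m = 0, 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            m = max(m, abs(s))
        exceed += (m >= c)
        abs_end += abs(s)
    print(exceed / trials, "<=", (abs_end / trials) / c)

maximal_inequality_demo()
```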
We now come to a new kind of inequality, which will be the tool for
proving the main convergence theorem below. Given any sequence of r.v.'s
{X_j}, for each sample point ω, the convergence properties of the numerical
sequence {X_j(ω)} hinge on the oscillation of the finite segments {X_j(ω), j ∈
N_n} as n → ∞. In particular the sequence will have a limit, finite or infinite, if
and only if the number of its oscillations between any two [rational] numbers a
and b is finite (depending on a, b and ω). This is a standard type of argument
used in measure and integration theory (cf. Exercise 10 of Sec. 4.2). The
interesting thing is that for a smartingale, a sharp estimate of the expected
number of oscillations is obtainable.

Let a < b. The number ν of "upcrossings" of the interval [a, b] by a
numerical sequence {x_1, . . . , x_n} is defined as follows. Set

α_1 = min{j: 1 ≤ j ≤ n, x_j ≤ a},
α_2 = min{j: α_1 < j ≤ n, x_j ≥ b};

if either α_1 or α_2 is not defined because no such j exists, we define ν = 0. In
general, for k ≥ 2 we set

α_{2k−1} = min{j: α_{2k−2} < j ≤ n, x_j ≤ a},
α_{2k} = min{j: α_{2k−1} < j ≤ n, x_j ≥ b};

if any one of these is undefined, then all the subsequent ones will be undefined.
Let α_ℓ be the last defined one, with ℓ = 0 if α_1 is undefined; then ν is defined
to be [ℓ/2]. Thus ν is the actual number of successive times that the sequence
crosses from ≤ a to ≥ b. Although the exact number is not essential, since a
couple of crossings more or less would make no difference, we must adhere
to a rigid way of counting in order to be accurate below.
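This rigid counting is easy to mechanize. The small Python function below (our own illustration, not part of the text) scans a finite sequence exactly as prescribed: alternately seek a term ≤ a, then a term ≥ b, and count the completed pairs:

```python
def upcrossings(xs, a, b):
    """Number of upcrossings of [a, b] by the finite sequence xs, counted
    exactly as above: alternately find x_j <= a, then x_j >= b; each
    completed pair is one upcrossing."""
    count, seeking_low = 0, True
    for x in xs:
        if seeking_low:
            if x <= a:
                seeking_low = False
        elif x >= b:
            seeking_low = True
            count += 1
    return count

print(upcrossings([0, 2, -1, 3, 0, 5], a=0, b=2))   # prints 3
```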
Theorem 9.4.2. Let {X_j, F_j, j ∈ N_n} be a submartingale and −∞ < a <
b < ∞. Let ν_{[a,b]}^{(n)}(ω) denote the number of upcrossings of [a, b] by the sample
sequence {X_j(ω); j ∈ N_n}. We have then

(5)  E{ν_{[a,b]}^{(n)}} ≤ (E{(X_n − a)⁺} − E{(X_1 − a)⁺}) / (b − a) ≤ (E{X_n⁺} + |a|) / (b − a).
PROOF. Consider first the case where X_j ≥ 0 for every j and 0 = a < b,
so that ν_{[a,b]}^{(n)}(ω) becomes ν_{[0,b]}^{(n)}(ω), and X_{α_j(ω)}(ω) = 0 if j is odd, where α_j =
α_j(ω) is defined as above with x_j = X_j(ω). For each ω, the sequence α_j(ω)
is defined only up to ℓ(ω), where 0 ≤ ℓ(ω) ≤ n. But now we modify the
definition so that α_j(ω) is defined for 1 ≤ j ≤ n by setting it to be equal to n
wherever it was previously undefined. Since for some ω a previously defined
α_j(ω) may also be equal to n, this apparent confusion will actually simplify
the formulas below. In the same vein we set α_0 ≡ 1. Observe that α_n = n in
any case, so that

X_n − X_1 = X_{α_n} − X_{α_0} = Σ_{j=0}^{n−1} (X_{α_{j+1}} − X_{α_j}) = Σ_{j even} + Σ_{j odd}.

If j is odd and j + 1 ≤ ℓ(ω), then

X_{α_{j+1}}(ω) ≥ b > 0 = X_{α_j}(ω);

if j is odd and j = ℓ(ω), then

X_{α_{j+1}}(ω) = X_n(ω) ≥ 0 = X_{α_j}(ω);

if j is odd and ℓ(ω) < j, then

X_{α_{j+1}}(ω) = X_n(ω) = X_{α_j}(ω).

Hence in all cases we have

(6)  Σ_{j odd} (X_{α_{j+1}}(ω) − X_{α_j}(ω)) ≥ Σ_{j odd, j+1≤ℓ(ω)} (X_{α_{j+1}}(ω) − X_{α_j}(ω))
       ≥ [ℓ(ω)/2] b = ν_{[0,b]}^{(n)}(ω) b.

Next, observe that {α_j, 0 ≤ j ≤ n} as modified above is in general of the form
1 = α_0 ≤ α_1 < α_2 < ··· < α_ℓ ≤ α_{ℓ+1} = ··· = α_n = n, and since constants
are optional, this is an increasing sequence of optional r.v.'s. Hence by
Theorem 9.3.4, {X_{α_j}, 0 ≤ j ≤ n} is a submartingale, so that for each j, 0 ≤
j ≤ n − 1, we have E{X_{α_{j+1}} − X_{α_j}} ≥ 0 and consequently

E{ Σ_{j even} (X_{α_{j+1}} − X_{α_j}) } ≥ 0.

Adding to this the expectations of the extreme terms in (6), we obtain

(7)  E(X_n − X_1) ≥ E(ν_{[0,b]}^{(n)}) b,

which is the particular case of (5) under consideration.

In the general case we apply the case just proved to {(X_j − a)⁺, j ∈ N_n},
which is a submartingale by Corollary 1 to Theorem 9.3.1. It is clear that
the number of upcrossings of [a, b] by the given submartingale is exactly
that of [0, b − a] by the modified one. The inequality (7) becomes the first
inequality in (5) after the substitutions, and the second one follows since
(X_n − a)⁺ ≤ X_n⁺ + |a|. Theorem 9.4.2 is proved.
The corresponding result for a supermartingale will be given below; but
after such a painstaking definition of upcrossing, we may leave the dual definition
of downcrossing to the reader.

Theorem 9.4.3. Let {X_j, F_j; j ∈ N_n} be a supermartingale and let −∞ <
a < b < ∞. Let ν̃_{[a,b]}^{(n)} be the number of downcrossings of [a, b] by the sample
sequence {X_j(ω), j ∈ N_n}. We have then

(8)  E{ν̃_{[a,b]}^{(n)}} ≤ (E{X_1 ∧ b} − E{X_n ∧ b}) / (b − a).

PROOF. {−X_j, j ∈ N_n} is a submartingale, and ν̃_{[a,b]}^{(n)} is ν_{[−b,−a]}^{(n)} for this
submartingale. Hence the first part of (5) becomes

E{ν̃_{[a,b]}^{(n)}} ≤ (E{(−X_n + b)⁺} − E{(−X_1 + b)⁺}) / ((−a) − (−b)) = (E{(b − X_n)⁺} − E{(b − X_1)⁺}) / (b − a).

Since (b − x)⁺ = b − (b ∧ x), this is the same as in (8).

Corollary. For a positive supermartingale we have for 0 ≤ a < b < ∞

E{ν̃_{[a,b]}^{(n)}} ≤ b / (b − a).

G. Letta proved the sharper "dual":

E{ν_{[a,b]}^{(n)}} ≤ a / (b − a).

(Martingales et intégration stochastique, Quaderni, Pisa, 1984, 48–49.)
The basic convergence theorem is an immediate consequence of the
upcrossing inequality.

Theorem 9.4.4. If {X_n, F_n; n ∈ N} is an L¹-bounded submartingale, then
{X_n} converges a.e. to a finite limit.

Remark. Since

E(|X_n|) = 2E(X_n⁺) − E(X_n) ≤ 2E(X_n⁺) − E(X_1),

the condition of L¹-boundedness is equivalent to the apparently weaker one
below:

(9)  sup_n E(X_n⁺) < ∞.

PROOF. Let ν_{[a,b]} = lim_n ν_{[a,b]}^{(n)}. Our hypothesis implies that the last term
in (5) is bounded in n; letting n → ∞, we obtain E{ν_{[a,b]}} < ∞ for every a
and b, and consequently ν_{[a,b]} is finite with probability one. Hence, for each
pair of rational numbers a < b, the set

Λ_{[a,b]} = {lim inf_n X_n < a < b < lim sup_n X_n}

is a null set; and so is the union over all such pairs. Since this union contains
the set where lim inf_n X_n < lim sup_n X_n, the limit exists a.e. It must be finite a.e. by
Fatou's lemma applied to the sequence |X_n|.
Corollary. Every uniformly bounded smartingale converges a.e. Every positive
supermartingale and every negative submartingale converge a.e.

It may be instructive to sketch a direct proof of Theorem 9.4.4 which is
done "by hand", so to speak. This is the original proof given by Doob (1940)
for a martingale.

Suppose that the set Λ_{[a,b]} above has probability > η > 0. For each ω in
Λ_{[a,b]}, the sequence {X_n(ω), n ∈ N} takes an infinite number of values < a
and an infinite number of values > b. Let 1 = n_0 < n_1 < ···, and put

Λ_{2j−1} = {min_{n_{2j−2} ≤ i ≤ n_{2j−1}} X_i < a},  Λ_{2j} = {max_{n_{2j−1} < i ≤ n_{2j}} X_i > b}.

Then for each k it is possible to choose the n_i's successively so that the differences
n_i − n_{i−1} for 1 ≤ i ≤ 2k are so large that "most" of Λ_{[a,b]} is contained
in ∩_{i=1}^{2k} Λ_i, so that

P( ∩_{i=1}^{2k} Λ_i ) > η.

Fixing an n > n_{2k} and applying Corollary 2 to Theorem 9.4.1 to {X_i} as
well as {−X_i}, we have

a P( ∩_{i=1}^{2j−1} Λ_i ) ≥ ∫_{∩_{i=1}^{2j−1} Λ_i} X_{n_{2j−1}} dP = ∫_{∩_{i=1}^{2j−1} Λ_i} X_n dP,

b P( ∩_{i=1}^{2j} Λ_i ) ≤ ∫_{∩_{i=1}^{2j} Λ_i} X_{n_{2j}} dP = ∫_{∩_{i=1}^{2j} Λ_i} X_n dP,

where the equalities follow from the martingale property. Upon subtraction
we obtain

(b − a) P( ∩_{i=1}^{2j} Λ_i ) − a P( (∩_{i=1}^{2j−1} Λ_i) ∖ Λ_{2j} ) ≤ −∫_{(∩_{i=1}^{2j−1} Λ_i) ∩ Λ_{2j}^c} X_n dP,

and consequently, upon summing over 1 ≤ j ≤ k:

k(b − a)η − |a| ≤ E(|X_n|).

This is impossible if k is large enough, since {X_n} is L¹-bounded.
Once Theorem 9.4.4 has been proved for a martingale, we can extend it
easily to a positive or uniformly integrable supermartingale by using Doob's
decomposition. Suppose {X_n} is a positive supermartingale and X_n = Y_n − Z_n
as in Theorem 9.3.2. Then 0 ≤ Z_n ≤ Y_n and consequently

E(Z_∞) = lim_{n→∞} E(Z_n) ≤ E(Y_1);

next we have

E(Y_n) = E(X_n) + E(Z_n) ≤ E(X_1) + E(Z_∞).

Hence {Y_n} is an L¹-bounded martingale and so converges to a finite limit
as n → ∞. Since Z_n ↑ Z_∞ < ∞ a.e., the convergence of {X_n} follows. The
case of a uniformly integrable supermartingale is just as easy by the corollary
to Theorem 9.3.2.

It is trivial that a positive submartingale need not converge, since the
sequence {n} is such a one. The classical random walk {S_n} (coin-tossing
game) is an example of a martingale that does not converge (why?). An
interesting and not so trivial consequence is that both E(S_n⁺) and E(|S_n|) must
diverge to +∞! (Cf. Exercise 2 of Sec. 6.4.) Further examples are furnished
by "stopped random walk". For the sake of concreteness, let us stay with the
classical case and define τ to be the first time the walk reaches +1. As in our
previous discussion of the gambler's-ruin problem, the modified random walk
{S̃_n}, where S̃_n = S_{τ∧n}, is still a martingale, hence in particular we have for
each n:

E(S̃_n) = E(S̃_1) = ∫_{{τ=1}} S_1 dP + ∫_{{τ>1}} S_1 dP = E(S_1) = 0.

As in (21) of Sec. 9.3 we have, writing S̃_∞ = S_τ = 1,

lim_n S̃_n = S̃_∞ a.e.,

since τ < ∞ a.e., but this convergence now also follows from Theorem 9.4.4,
since S̃_n⁺ ≤ 1. Observe, however, that

E(S̃_n) = 0 < 1 = E(S̃_∞).

Next, we change the definition of τ to be the first time ≥ 1 the walk "returns"
to 0, as usual supposing S_0 ≡ 0. Then S̃_∞ = 0 and we have indeed E(S̃_n) =
E(S̃_∞). But for each n,

∫_{{S̃_n>0}} S̃_n dP > 0 = ∫_{{S̃_n>0}} S̃_∞ dP,

so that the "extended sequence" {S̃_1, . . . , S̃_n, . . . , S̃_∞} is no longer a martingale.
These diverse circumstances will be dealt with below.
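A short simulation illustrates the first stopped-walk example: the sample mean of S̃_n stays near 0 for every n, even though the fraction of paths already stopped at +1 creeps toward one. (A Python sketch of ours; names and parameters are illustrative.)

```python
import random

def stopped_walk(n, trials=20_000, seed=4):
    """S~_n = S_{tau ^ n}, where tau = first time the walk reaches +1.
    Returns (sample mean of S~_n, fraction of paths with tau <= n)."""
    rng = random.Random(seed)
    total, hit = 0, 0
    for _ in range(trials):
        s = 0
        for _ in range(n):
            if s == 1:
                break
            s += 1 if rng.random() < 0.5 else -1
        total += s
        hit += (s == 1)
    return total / trials, hit / trials

for n in (10, 100, 1000):
    print(n, stopped_walk(n))   # mean stays near 0; the hit fraction grows slowly
```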
Theorem 9.4.5. The three propositions below are equivalent for a submartingale
{X_n, F_n; n ∈ N}:

(a) it is a uniformly integrable sequence;
(b) it converges in L¹;
(c) it converges a.e. to an integrable X_∞ such that {X_n, F_n; n ∈ N_∞} is
a submartingale and E(X_n) converges to E(X_∞).

PROOF. (a) ⇒ (b): under (a) the condition in Theorem 9.4.4 is satisfied so
that X_n → X_∞ a.e. This together with uniform integrability implies X_n → X_∞
in L¹ by Theorem 4.5.4 with r = 1.

(b) ⇒ (c): under (b) let X_n → X_∞ in L¹; then E(|X_n|) → E(|X_∞|) <
∞ and so X_n → X_∞ a.e. by Theorem 9.4.4. For each Λ ∈ F_n and n < n′,
we have

∫_Λ X_n dP ≤ ∫_Λ X_{n′} dP

by the defining relation. The right member converges to ∫_Λ X_∞ dP by L¹-convergence,
and the resulting inequality shows that {X_n, F_n; n ∈ N_∞} is a
submartingale. Since L¹-convergence also implies convergence of expectations,
all three conditions in (c) are proved.

(c) ⇒ (a): under (c), {X_n⁺, F_n; n ∈ N_∞} is a submartingale; hence we
have for every λ > 0:

(10)  ∫_{{X_n⁺>λ}} X_n⁺ dP ≤ ∫_{{X_n⁺>λ}} X_∞⁺ dP,

which shows that {X_n⁺, n ∈ N} is uniformly integrable. Since X_n⁺ → X_∞⁺ a.e.,
this implies E(X_n⁺) → E(X_∞⁺). Since by hypothesis E(X_n) → E(X_∞), it follows
that E(X_n⁻) → E(X_∞⁻). This and X_n⁻ → X_∞⁻ a.e. imply that {X_n⁻} is uniformly
integrable by Theorem 4.5.4 for r = 1. Hence so is {X_n}.
Theorem 9.4.6. In the case of a martingale, propositions (a) and (b) above
are equivalent to (c′) or (d) below:

(c′) it converges a.e. to an integrable X_∞ such that {X_n, F_n; n ∈ N_∞} is
a martingale;
(d) there exists an integrable r.v. Y such that X_n = E(Y | F_n) for each
n ∈ N.

PROOF. (b) ⇒ (c′) as before; (c′) ⇒ (a) as before if we observe that
E(X_n) = E(X_∞) for every n in the present case, or more rapidly by considering
|X_n| instead of X_n⁺ as below. (c′) ⇒ (d) is trivial, since we may take
the Y in (d) to be the X_∞ in (c′). To prove (d) ⇒ (a), let n < n′; then by
Theorem 9.1.5:

E(X_{n′} | F_n) = E(E(Y | F_{n′}) | F_n) = E(Y | F_n) = X_n,

hence {X_n, F_n, n ∈ N; Y, F} is a martingale by definition. Consequently
{|X_n|, F_n, n ∈ N; |Y|, F} is a submartingale, and we have for each λ > 0:

∫_{{|X_n|>λ}} |X_n| dP ≤ ∫_{{|X_n|>λ}} |Y| dP,

P{|X_n| > λ} ≤ (1/λ) E(|X_n|) ≤ (1/λ) E(|Y|),

which together imply (a).

Corollary. Under (d), {X_n, F_n, n ∈ N; X_∞, F_∞; Y, F} is a martingale, where
X_∞ is given in (c′).

Recall that we have introduced martingales of the form in (d) earlier in
(13) of Sec. 9.3. Now we know this class coincides with the class of uniformly
integrable martingales.
We have already observed that the defining relation for a smartingale
is meaningful on any linearly ordered (or even partially ordered) index set.
The idea of extending the latter to a limit index is useful in applications to
continuous-time stochastic processes, where, for example, a martingale may
be defined on a dense set of real numbers in (t_1, t_2] and extended to t_2.
This corresponds to the case of extension from N to N_∞. The dual extension
corresponding to that to t_1 will now be considered. Let −N denote the set of
strictly negative integers in their natural order, let −∞ precede every element
in −N, and denote by −N_∞ the set {−∞} ∪ (−N) in the prescribed order.
If {F_n, n ∈ −N} is a decreasing (with decreasing n) sequence of Borel fields,
their intersection ∩_{n∈−N} F_n will be denoted by F_{−∞}.

The convergence results for a submartingale on −N are simpler because
the right side of the upcrossing inequality (5) involves the expectation of the
r.v. with the largest index, which in this case is the fixed −1 rather than the
previous varying n. Hence for mere convergence there is no need for an extra
condition such as (9).
Theorem 9.4.7. Let {X_n, n ∈ −N} be a submartingale. Then

(11)  lim_{n→−∞} X_n = X_{−∞}, where −∞ ≤ X_{−∞} < ∞ a.e.

The following conditions are equivalent, and they are automatically satisfied
in case of a martingale with "submartingale" replaced by "martingale" in (c):

(a) {X_n} is uniformly integrable;
(b) X_n → X_{−∞} in L¹;
(c) {X_n, n ∈ −N_∞} is a submartingale;
(d) lim_{n→−∞} ↓ E(X_n) > −∞.

PROOF. Let ν_{[a,b]}^{(n)} be the number of upcrossings of [a, b] by the sequence
{X_{−n}, . . . , X_{−1}}. We have from Theorem 9.4.2:

E{ν_{[a,b]}^{(n)}} ≤ (E(X_{−1}⁺) + |a|) / (b − a).

Letting n → ∞ and arguing as in the proof of Theorem 9.4.4, we conclude (11)
by observing that

E(X_{−∞}⁺) ≤ lim inf_n E(X_n⁺) ≤ E(X_{−1}⁺) < ∞.

The proofs of (a) ⇒ (b) ⇒ (c) are entirely similar to those in Theorem 9.4.5.
(c) ⇒ (d) is trivial, since −∞ < E(X_{−∞}) ≤ E(X_n) for each n. It remains
to prove (d) ⇒ (a). Letting C denote the limit in (d), we have for each λ > 0:

(12)  λ P{|X_n| > λ} ≤ E(|X_n|) = 2E(X_n⁺) − E(X_n) ≤ 2E(X_{−1}⁺) − C < ∞.

It follows that P{|X_n| > λ} converges to zero uniformly in n as λ → ∞. Since

∫_{{X_n⁺>λ}} X_n⁺ dP ≤ ∫_{{X_n⁺>λ}} X_{−1}⁺ dP,

this implies that {X_n⁺} is uniformly integrable. Next, if n < m, then

0 ≥ ∫_{{X_n<−λ}} X_n dP = E(X_n) − ∫_{{X_n≥−λ}} X_n dP

 ≥ E(X_n) − ∫_{{X_n≥−λ}} X_m dP

 = E(X_n − X_m) + E(X_m) − ∫_{{X_n≥−λ}} X_m dP

 = E(X_n − X_m) + ∫_{{X_n<−λ}} X_m dP.

By (d), we may choose m so that E(X_n − X_m) > −ε for any given ε > 0
and for every n < m. Having fixed such an m, we may choose λ so large that

sup_n ∫_{{X_n<−λ}} |X_m| dP < ε

by the remark after (12). It follows that {X_n⁻} is also uniformly integrable, and
therefore (a) is proved.
The next result will be stated for the index set of all integers in their
natural order:

{. . . , −n, . . . , −2, −1, 0, 1, 2, . . . , n, . . .}.

Let {F_n} be increasing B.F.'s on this index set, namely: F_n ⊂ F_m if n ≤ m. We may
"close" them at both ends by adjoining the B.F.'s below:

F_{−∞} = ⋀_n F_n,  F_∞ = ⋁_n F_n.

Let {Y_n} be r.v.'s indexed by the same set. If the B.F.'s and r.v.'s are only given on N
or −N, they can be trivially extended by putting F_n = F_1, Y_n = Y_1 for
all n ≤ 0, or F_n = F_{−1}, Y_n = Y_{−1} for all n ≥ 0. The following convergence
theorem is very useful.

Theorem 9.4.8. Suppose that the Y_n's are dominated by an integrable r.v. Z:

(13)  sup_n |Y_n| ≤ Z,

and lim_n Y_n = Y_∞ or Y_{−∞} as n → +∞ or −∞. Then we have

(14a)  lim_{n→∞} E{Y_n | F_n} = E{Y_∞ | F_∞};

(14b)  lim_{n→−∞} E{Y_n | F_n} = E{Y_{−∞} | F_{−∞}}.

In particular, for a fixed integrable r.v. Y we have

(15a)  lim_{n→∞} E{Y | F_n} = E{Y | F_∞};

(15b)  lim_{n→−∞} E{Y | F_n} = E{Y | F_{−∞}},

where the convergence holds also in L¹ in both cases.
PROOF. We prove (15) first. Let X_n = E{Y | F_n}. For n ∈ N, {X_n, F_n}
is a martingale already introduced in (13) of Sec. 9.3; the same is true for
n ∈ −N. To prove (15a), we apply Theorem 9.4.6 to deduce (c′) there. It
remains to identify the limit X_∞ with the right member of (15a). For each
Λ ∈ F_n, we have

∫_Λ Y dP = ∫_Λ X_n dP = ∫_Λ X_∞ dP.

Hence the equations hold also for Λ ∈ F_∞ (why?), and this shows that X_∞
has the defining property of E(Y | F_∞), since X_∞ ∈ F_∞. Similarly, the limit
X_{−∞} in (15b) exists by Theorem 9.4.7; to identify it, we have by (c) there,
for each Λ ∈ F_{−∞}:

∫_Λ X_{−∞} dP = ∫_Λ X_n dP = ∫_Λ Y dP.

This shows that X_{−∞} is equal to the right member of (15b).

We can now prove (14a). Put for m ∈ N:

W_m = sup_{n≥m} |Y_n − Y_∞|;

then |W_m| ≤ 2Z and lim_{m→∞} W_m = 0 a.e. Applying (15a) to W_m we obtain

lim sup_{n→∞} E{|Y_n − Y_∞| | F_n} ≤ lim_{n→∞} E{W_m | F_n} = E{W_m | F_∞}.

As m → ∞, the last term above converges to zero by dominated convergence
(see (vii) of Sec. 9.1). Hence the first term must be zero and this clearly
implies (14a). The proof of (14b) is completely similar.

Although the corollary below is a very special case we give it for historical
interest. It is called Paul Lévy's zero-or-one law (1935) and includes
Theorem 8.1.1 as a particular case.

Corollary. If Λ ∈ F_∞, then

(16)  lim_{n→∞} P(Λ | F_n) = 1_Λ a.e.

The reader is urged to ponder over the intuitive meaning of this result and
judge for himself whether it is "obvious" or "incredible".
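Lévy's zero-or-one law can be watched numerically. In the Python sketch below (ours; the event and helper function are illustrative assumptions), Λ = {at least k heads among the first N tosses of a fair coin} belongs to F_∞ (indeed to F_N), and the exact conditional probability P(Λ | F_n) is computed from the binomial tail; along each simulated path it is driven to 0 or 1 = 1_Λ:

```python
import math
import random

def cond_prob(heads, n, N, k):
    """P(Lambda | F_n) for Lambda = {at least k heads among the first N tosses}:
    given `heads` heads in n tosses, the remaining count is Binomial(N - n, 1/2)."""
    need, m = k - heads, N - n
    if need <= 0:
        return 1.0
    if need > m:
        return 0.0
    return sum(math.comb(m, i) for i in range(need, m + 1)) / 2 ** m

rng = random.Random(5)
N, k, heads = 100, 50, 0
for n in range(1, N + 1):
    heads += rng.random() < 0.5
    if n % 20 == 0:
        print(n, cond_prob(heads, n, N, k))   # tends to 0 or 1 = indicator of Lambda
```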
EXERCISES

*1. Prove that for any smartingale, we have for each λ > 0:

λ P{sup_n |X_n| ≥ λ} ≤ 3 sup_n E(|X_n|).

For a martingale or a positive or negative smartingale the constant 3 may be
replaced by 1.
2. Let {X_n} be a positive supermartingale. Then for almost every ω,
X_k(ω) = 0 implies X_n(ω) = 0 for all n ≥ k. [This is the analogue of a
minimum principle in potential theory.]
3. Generalize the upcrossing inequality for a submartingale {X_n, F_n} as
follows:

E{ν_{[a,b]}^{(n)} | F_1} ≤ (E{(X_n − a)⁺ | F_1} − (X_1 − a)⁺) / (b − a).

Similarly, generalize the downcrossing inequality for a positive supermartingale
{X_n, F_n} as follows:

E{ν̃_{[a,b]}^{(n)} | F_1} ≤ (X_1 ∧ b) / (b − a).

*4. As a sharpening of Theorems 9.4.2 and 9.4.3 we have, for a positive
supermartingale {X_n, F_n, n ∈ N}:

P{ν_{[a,b]}^{(n)} ≥ k} ≤ (E(X_1 ∧ a)/b)(a/b)^{k−1},

P{ν̃_{[a,b]}^{(n)} ≥ k} ≤ (E(X_1 ∧ b)/b)(a/b)^{k−1}.

These inequalities are due to Dubins. Derive Theorems 9.3.6 and 9.3.7 from
them. [HINT:

b P{α_{2j} < n} ≤ ∫_{{α_{2j}<n}} X_{α_{2j}} dP ≤ ∫_{{α_{2j−1}<n}} X_{α_{2j}} dP
  ≤ ∫_{{α_{2j−1}<n}} X_{α_{2j−1}} dP ≤ a P{α_{2j−1} < n},

since {α_{2j−1} < n} ∈ F_{α_{2j−1}}.]
*5. Every L¹-bounded martingale is the difference of two positive L¹-bounded
martingales. This is due to Krickeberg. [HINT: Take one of them to
be lim_{k→∞} E{X_k⁺ | F_n}.]
*6. A smartingale {X_n, F_n; n ∈ N} is said to be closable [on the right]
iff there exists a r.v. X_∞ such that {X_n, F_n; n ∈ N_∞} is a smartingale of the
same kind. Prove that if so then we can always take X_∞ = lim_{n→∞} X_n. This
supplies a missing link in the literature. [HINT: For a supermartingale consider
X_n = E(X_∞ | F_n) + Y_n; then {Y_n, F_n} is a positive supermartingale so we
may apply the convergence theorems to both terms of the decomposition.]
7. Prove a result for closability [on the left] which is similar to Exercise 6
but for the index set −N. Give an example to show that in the case of N we may
have lim_{n→∞} E(X_n) ≠ E(X_∞), whereas in the case of −N closability implies
lim_{n→−∞} E(X_n) = E(X_{−∞}).
8. Let {X_n, F_n, n ∈ N} be a submartingale and let α be a finite optional
r.v. satisfying the conditions: (a) E(|X_α|) < ∞, and (b)

lim_{n→∞} ∫_{{α>n}} |X_n| dP = 0.

Then {X_{α∧n}, F_{α∧n}; n ∈ N_∞} is a submartingale. [HINT: for Λ ∈ F_{α∧n} bound
∫_Λ (X_α − X_{α∧n}) dP below by interposing X_{α∧m} where n < m.]
9. Let {X_n, F_n; n ∈ N} be a supermartingale satisfying the condition
lim_{n→∞} E(X_n) > −∞. Then we have the representation X_n = X_n′ + X_n″, where
{X_n′, F_n} is a martingale and {X_n″, F_n} is a positive supermartingale such that
lim_{n→∞} X_n″ = 0 in L¹ as well as a.e. This is the analogue of F. Riesz's
decomposition of a superharmonic function, X_n′ being the harmonic part and
X_n″ the potential part. [HINT: Use Doob's decomposition X_n = Y_n − Z_n and
put X_n″ = E(Z_∞ | F_n) − Z_n.]
10. Let {X_n, F_n} be a potential; namely a positive supermartingale such
that lim_{n→∞} E(X_n) = 0; and let X_n = Y_n − Z_n be the Doob decomposition
[cf. (6) of Sec. 9.3]. Show that

X_n = E(Z_∞ | F_n) − Z_n.

*11. If {X_n} is a martingale or positive submartingale such that
sup_n E(X_n²) < ∞, then {X_n} converges in L² as well as a.e.
12. Let {ξ_n, n ∈ N} be a sequence of independent and identically
distributed r.v.'s with zero mean and unit variance, and S_n = Σ_{j=1}^n ξ_j.
Then for any optional r.v. α relative to {ξ_n} such that E(√α) < ∞, we
have E(|S_α|) ≤ √2 E(√α) and E(S_α) = 0. This is an extension of Wald's
equation due to Louis Gordon. [HINT: Truncate α and put η_k = S_k²/√k −
S_{k−1}²/√(k−1); then

E{S_α²/√α} = Σ_{k=1}^∞ ∫_{{α≥k}} η_k dP ≤ Σ_{k=1}^∞ P{α ≥ k}/√k ≤ 2E(√α);

now use Schwarz's inequality followed by Fatou's lemma.]

The next two problems are meant to give an idea of the passage from
discrete parameter martingale theory to the continuous parameter theory.

13. Let {X_t, F_t; t ∈ [0, 1]} be a continuous parameter supermartingale.
For each t ∈ [0, 1] and sequence {t_n} decreasing to t, {X_{t_n}} converges a.e. and
in L¹. For each t ∈ [0, 1] and sequence {t_n} increasing to t, {X_{t_n}} converges a.e.
but not necessarily in L¹. [HINT: In the second case consider X_{t_n} − E{X_t | F_{t_n}}.]
*14. In Exercise 13 let Q be the set of rational numbers in [0, 1]. For
each t ∈ (0, 1) both limits below exist a.e.:

lim_{s↑t, s∈Q} X_s,  lim_{s↓t, s∈Q} X_s.

[HINT: Let {Q_n, n ≥ 1} be finite subsets of Q such that Q_n ↑ Q; and apply the
upcrossing inequality to {X_s, s ∈ Q_n}, then let n → ∞.]
9.5 Applications
Although some of the major successes of martingale theory lie in the field of
continuous-parameter stochastic processes, which cannot be discussed here, it
has also made various important contributions within the scope of this work.
We shall illustrate these below with a few that are related to our previous
topics, and indicate some others among the exercises.
(I) The notions of "at least once" and "infinitely often"

These have been a recurring theme in Chapters 4, 5, 8, and 9 and play important
roles in the theory of random walk and its generalization to Markov
processes. Let {X_n, n ∈ N⁰} be an arbitrary stochastic process; the notation
for fields in Sec. 9.2 will be used. For each n consider the events:

Λ_n = ∪_{j=n}^∞ {X_j ∈ B_j},

M = ∩_{n=1}^∞ Λ_n = {X_j ∈ B_j i.o.},

where the B_n are arbitrary Borel sets.

Theorem 9.5.1. We have

(1)  lim_{n→∞} P{Λ_{n+1} | F_{[0,n]}} = 1_M a.e.,

where F_{[0,n]} may be replaced by F_{{n}} or X_n if the process is Markovian.

PROOF. By Theorem 9.4.8, (14a), the limit is

P{M | F_{[0,∞)}} = 1_M.

The next result is a "principle of recurrence" which is useful in Markov
processes; it is an extension of the idea in Theorem 9.2.3 (see also Exercises 15
and 16 of Sec. 9.2).

Theorem 9.5.2. Let {X_n, n ∈ N⁰} be a Markov process and A_n, B_n Borel
sets. Suppose that there exists δ > 0 such that for every n,

(2)  P{∪_{j=n+1}^∞ [X_j ∈ B_j] | X_n} ≥ δ a.e. on the set {X_n ∈ A_n};

then we have

(3)  P{[X_j ∈ A_j i.o.] ∖ [X_j ∈ B_j i.o.]} = 0.

PROOF. Let Δ = {X_j ∈ A_j i.o.} and use the notation Λ_n and M above.
We may ignore the null sets in (1) and (2). Then if ω ∈ Δ, our hypothesis
implies that

P{Λ_{n+1} | X_n}(ω) ≥ δ i.o.

In view of (1) this is possible only if ω ∈ M. Thus Δ ⊂ M, which implies (3).

The intuitive meaning of the preceding theorem has been given by
Doeblin as follows: if the chance of a pedestrian's getting run over is greater
than δ > 0 each time he crosses a certain street, then he will not be crossing
it indefinitely (since he will be killed first)! Here {X_n ∈ A_n} is the event of
the nth crossing, {X_n ∈ B_n} that of being run over at the nth crossing.
(II) Harmonic and superharmonic functions for a Markov process

Let {X_n, n ∈ N⁰} be a homogeneous Markov process as discussed in Sec. 9.2
with the transition probability function P(·, ·). An extended-valued function
f on R¹ is said to be harmonic (with respect to P) iff it is integrable with
respect to the measure P(x, ·) for each x and satisfies the following "harmonic
equation":

(4)  ∀x ∈ R¹: f(x) = ∫_{R¹} P(x, dy) f(y).

It is superharmonic (with respect to P) iff the "=" in (4) is replaced by "≥";
in this case f may take the value +∞.

Lemma. If f is [super]harmonic, then {f(X_n), n ∈ N⁰}, where X_0 ≡ x_0 for
some given x_0 in R¹, is a [super]martingale.

PROOF. We have, recalling (14) of Sec. 9.2,

E{f(X_n)} = ∫_{R¹} P^n(x_0, dy) f(y) < ∞,

as is easily seen by iterating (4) and applying an extended form of Fubini's
theorem (see, e.g., Neveu [6]). Next we have, upon substituting X_n for x in (4):

f(X_n) = ∫_{R¹} P(X_n, dy) f(y) = E{f(X_{n+1}) | X_n} = E{f(X_{n+1}) | F_{[0,n]}},

where the second equation follows by Exercise 8 of Sec. 9.2 and the third by
the Markov property. This proves the lemma in the harmonic case; the other case
is similar. (Why not also the "sub" case?)
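For a finite-state chain the harmonic equation (4) becomes the matrix identity Pf = f, and the lemma can be checked directly. A small Python sketch (the chain and the function f are our own illustrative choices):

```python
# Transition matrix of a simple random walk on {0,...,5}, absorbed at 0 and 5
# (a finite-state stand-in for P(x, dy); chain and f below are illustrative).
K = 5
P = [[0.0] * (K + 1) for _ in range(K + 1)]
P[0][0] = P[K][K] = 1.0
for x in range(1, K):
    P[x][x - 1] = P[x][x + 1] = 0.5

f = [x / K for x in range(K + 1)]     # f(x) = x/K, chance of absorption at K

# f satisfies the harmonic equation (4): f(x) = sum_y P(x, y) f(y).
print(all(abs(sum(P[x][y] * f[y] for y in range(K + 1)) - f[x]) < 1e-12
          for x in range(K + 1)))     # True

# Hence f(X_n) is a martingale: E{f(X_n)} stays at f(x0) for every n.
mu = [0.0] * (K + 1); mu[2] = 1.0     # X_0 = x0 = 2
for n in range(4):
    print(n, sum(mu[x] * f[x] for x in range(K + 1)))   # always 0.4
    mu = [sum(mu[x] * P[x][y] for x in range(K + 1)) for y in range(K + 1)]
```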
The most important example of a harmonic function is the g(·, B) of
Exercise 10 of Sec. 9.2 for a given B; that of a superharmonic function is the
f(·, B) of Exercise 9 there. These assertions follow easily from their probabilistic
meanings given in the cited exercises, but purely analytic verifications
are also simple and instructive. Finally, if for some B we have

φ(x) = Σ_{n=0}^∞ P^n(x, B) < ∞

for every x, then φ(·) is superharmonic and is called the "potential" of the
set B.

Theorem 9.5.3. Suppose that the remote field of {X_n, n ∈ N⁰} is trivial.
Then each bounded harmonic function is a constant a.e. with respect to each
μ_n, where μ_n is the p.m. of X_n.

PROOF. By Theorem 9.4.5, f(X_n) converges a.e. to Z such that

{f(X_n), F_{[0,n]}; Z, F_{[0,∞)}}

is a martingale. Clearly Z belongs to the remote field and so is a constant c
a.e. Since

f(X_n) = E{Z | F_{[0,n]}},

each f(X_n) is the same constant c a.e. Mapped into R¹, the last assertion
becomes the conclusion of the theorem.
(III) The supremum of a submartingale

The first inequality in (1) of Sec. 9.4 is of a type familiar in ergodic theory
and leads to the result below, which has been called the "dominated ergodic
theorem" by Wiener. In the case where X_n is the sum of independent r.v.'s
with mean 0, it is due to Marcinkiewicz and Zygmund. We write ||X||_p for
the L^p-norm of X: ||X||_p^p = E(|X|^p).

Theorem 9.5.4. Let 1 < p < ∞ and 1/p + 1/q = 1. Suppose that {X_n, n ∈
N} is a positive submartingale satisfying the condition

(5)  sup_n E{X_n^p} < ∞.

Then sup_{n∈N} X_n ∈ L^p and

(6)  || sup_n X_n ||_p ≤ q sup_n ||X_n||_p.

PROOF. The condition (5) implies that {X_n} is uniformly integrable
(Exercise 8 of Sec. 4.5); hence by Theorem 9.4.5, X_n → X_∞ a.e. and {X_n, n ∈
N_∞} is a submartingale. Writing Y for sup_n X_n, we have by an obvious
extension of the first inequality in (1) of Sec. 9.4:

(7)  ∀λ > 0: λ P{Y ≥ λ} ≤ ∫_{{Y≥λ}} X_∞ dP.

Now it turns out that such an inequality for any two r.v.'s Y and X_∞ implies
the inequality ||Y||_p ≤ q ||X_∞||_p, from which (6) follows by Fatou's lemma.
This is shown by the calculation below, where G(λ) = P{Y ≥ λ}:

E(Y^p) = −∫_0^∞ λ^p dG(λ) ≤ ∫_0^∞ p λ^{p−1} G(λ) dλ

 ≤ ∫_0^∞ p λ^{p−1} [ λ^{−1} ∫_{{Y≥λ}} X_∞ dP ] dλ

 = ∫ X_∞ [ ∫_0^Y p λ^{p−2} dλ ] dP

 = q ∫ X_∞ Y^{p−1} dP ≤ q ||X_∞||_p ||Y^{p−1}||_q

 = q ||X_∞||_p {E(Y^p)}^{1/q}.

Since we do not yet know E(Y^p) < ∞, it is necessary to replace Y first
with Y ∧ m, where m is a constant, and verify the truth of (7) after this
replacement, before dividing through in the obvious way. We then let m ↑ ∞
and conclude (6).

The result above is false for p = 1 and is replaced by a more complicated
one (Exercise 7 below).
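Here is a rough numerical check of (6) for p = q = 2, taking the positive submartingale X_k = |S_k| with S_k a symmetric simple random walk (a Python sketch of ours, not part of the text):

```python
import random

def doob_lp_check(p=2.0, n=200, trials=20_000, seed=7):
    """Compare ||max_k X_k||_p with q ||X_n||_p for the positive
    submartingale X_k = |S_k|, S_k a symmetric simple random walk."""
    q = p / (p - 1)
    rng = random.Random(seed)
    sup_p = end_p = 0.0
    for _ in range(trials):
        s, m = 0, 0
        for _ in range(n):
            s += 1 if rng.random() < 0.5 else -1
            m = max(m, abs(s))
        sup_p += m ** p
        end_p += abs(s) ** p
    print((sup_p / trials) ** (1 / p), "<=", q * (end_p / trials) ** (1 / p))

doob_lp_check()
```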
(IV) Convergence of sums of independent r.v.'s

We return to Theorem 5.3.4 and complete the discussion there by showing
that the convergence in distribution of the series Σ_n X_n already implies its
convergence a.e. This can also be proved by an analytic method based on
estimation of ch.f.'s, but the martingale approach is more illuminating.

Theorem 9.5.5. If {X_n, n ∈ N} is a sequence of independent r.v.'s such that
S_n = Σ_{j=1}^n X_j converges in distribution as n → ∞, then S_n converges a.e.

PROOF. Let f_j be the ch.f. of X_j, so that

φ_n = ∏_{j=1}^n f_j

is the ch.f. of S_n. By the convergence theorem of Sec. 6.3, φ_n converges
everywhere to φ, the ch.f. of the limit distribution of S_n. We shall need only
this fact for |t| ≤ t_0, where t_0 is so small that φ(t) ≠ 0 for |t| ≤ t_0; then this
is also true of φ_n for all sufficiently large n. For such values of n and a fixed
t with |t| ≤ t_0, we define the complex-valued r.v. Z_n as follows:

(8)  Z_n = e^{itS_n} / φ_n(t).

Then each Z_n is integrable; indeed the sequence {Z_n} is uniformly bounded.
We have for each n, if F_n denotes the Borel field generated by S_1, . . . , S_n:

E{Z_{n+1} | F_n} = E{ (e^{itS_n}/φ_n(t)) · (e^{itX_{n+1}}/f_{n+1}(t)) | F_n }

 = (e^{itS_n}/φ_n(t)) E{e^{itX_{n+1}} | F_n}/f_{n+1}(t) = (e^{itS_n}/φ_n(t)) · (f_{n+1}(t)/f_{n+1}(t)) = Z_n,

where the second equation follows from Theorem 9.1.3 and the third from
independence. Thus {Z_n, F_n} is a martingale, in the sense that its real and
imaginary parts are both martingales. Since it is uniformly bounded, it follows
from Theorem 9.4.4 that Z_n converges a.e. This means, for each t with |t| ≤ t_0,
there is a set Ω_t with P(Ω_t) = 1 such that if ω ∈ Ω_t, then the sequence of
complex numbers e^{itS_n(ω)}/φ_n(t) converges, and so also does e^{itS_n(ω)}. But how
does one deduce from this the convergence of S_n(ω)? The argument below
may seem unnecessarily tedious, but it is of a familiar and indispensable kind
in certain parts of stochastic processes.

Consider e^{itS_n(ω)} as a function of (t, ω) in the product space T × Ω, where
T = [−t_0, t_0], with the product measure m × P, where m is the Lebesgue
measure on T. Since this function is measurable in (t, ω) for each n, the
set C of (t, ω) for which lim_{n→∞} e^{itS_n(ω)} exists is measurable with respect to
m × P. Each section of C by a fixed t has full measure P(Ω_t) = 1 as just
shown; hence Fubini's theorem asserts that almost every section of C by a
fixed ω must also have full measure m(T) = 2t_0. This means that there exists
an Ω̃ with P(Ω̃) = 1, and for each ω ∈ Ω̃ there is a subset T_ω of T with
m(T_ω) = m(T), such that if t ∈ T_ω, then lim_{n→∞} e^{itS_n(ω)} exists. Now we are
in a position to apply Exercise 17 of Sec. 6.4 to conclude the convergence of
S_n(ω) for ω ∈ Ω̃, thus finishing the proof of the theorem.

According to the preceding proof, due to Doob, the hypothesis of
Theorem 9.5.5 may be further weakened to the assumption that the sequence of ch.f.'s of
S_n converges on a set of t of strictly positive Lebesgue measure. In particular,
if an infinite product ∏_n f_n of ch.f.'s converges on such a set, then it converges
everywhere.
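The martingale property forces E(Z_n) = 1 for every n, which can be seen numerically. In the Python sketch below (ours; for fair ±1 steps f_j(t) = cos t, so φ_n(t) = (cos t)^n), the Monte Carlo estimate of E(e^{itS_n})/φ_n(t) stays near 1; since φ_n(t) → 0, the estimate degrades for large n, so n is kept moderate:

```python
import cmath
import math
import random

def chf_martingale_demo(t=0.3, ns=(1, 5, 10, 20), trials=100_000, seed=11):
    """Estimate E(Z_n) = E(exp(itS_n))/phi_n(t) for fair +/-1 steps,
    where phi_n(t) = (cos t)^n; each estimate should be close to 1."""
    rng = random.Random(seed)
    sums = {n: 0j for n in ns}
    for _ in range(trials):
        s = 0
        for step in range(1, max(ns) + 1):
            s += 1 if rng.random() < 0.5 else -1
            if step in sums:
                sums[step] += cmath.exp(1j * t * s)
    for n in ns:
        z = (sums[n] / trials) / (math.cos(t) ** n)
        print(n, round(z.real, 3), round(z.imag, 3))

chf_martingale_demo()
```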
(V) The strong law of large numbers

Our next example is a new proof of the classical strong law of large numbers
in the form of Theorem 5.4.2, (8). This basically different approach, which
has more a measure-theoretic than an analytic flavor, is one of the striking
successes of martingale theory. It was given by Doob (1949).

Theorem 9.5.6. Let {S_n, n ∈ N} be a random walk (in the sense of
Chapter 8) with E{|S_1|} < ∞. Then we have

lim_{n→∞} S_n/n = E{S_1} a.e.

PROOF. Recall that S_n = Σ_{j=1}^n X_j and consider for 1 ≤ k ≤ n:

(9)  E{X_k | S_n, S_{n+1}, . . .} = E{X_k | G_n},

where G_n is the Borel field generated by {S_j, j ≥ n}. Thus

G_n ↓ G ≡ ∩_{n∈N} G_n

as n increases. By Theorem 9.4.8, (15b), the right side of (9) converges to
E{X_k | G}. Now G_n is also generated by S_n and {X_j, j ≥ n + 1}; hence it
follows from the independence of the latter from the pair (X_k, S_n), and (3) of
Sec. 9.2, that we have

(10)  E{X_k | S_n, X_{n+1}, X_{n+2}, . . .} = E{X_k | S_n} = E{X_1 | S_n},

the second equation by reason of symmetry (proof?). Summing over k from
1 to n and taking the average, we infer that

S_n/n = E{X_1 | G_n},

so that if Y_{−n} = S_n/n for n ∈ N, {Y_n, n ∈ −N} is a martingale. In particular,

lim_{n→∞} S_n/n = lim_{n→∞} E{X_1 | G_n} = E{X_1 | G},

where the second equation follows from the argument above. On the other
hand, the first limit is a remote (even invariant) r.v. in the sense of Sec. 8.1,
since for every m ≥ 1 we have

lim_{n→∞} S_n(ω)/n = lim_{n→∞} (Σ_{j=m}^n X_j(ω))/n;

hence it must be a constant a.e. by Theorem 8.1.2. [Alternatively we may use
Theorem 8.1.4, the limit above being even more obviously a permutable r.v.]
This constant must be E{E{X_1 | G}} = E{X_1}, proving the theorem.
(VI) Exchangeable events

The method used in the preceding example can also be applied to the theory
of exchangeable events. The events {E_n, n ∈ N} are said to be exchangeable
iff for every k ≥ 1, the probability of the joint occurrence of any k of them is
the same for every choice of k members from the sequence; namely, we have

(11)  P{E_{n_1} ∩ ··· ∩ E_{n_k}} = w_k, k ∈ N,

for any subset {n_1, . . . , n_k} of N. Let us denote the indicator of E_n by e_n,
and put

N_n = Σ_{j=1}^n e_j;

then N_n is the number of occurrences among the first n events of the sequence.
Denote by G_n the B.F. generated by {N_j, j ≥ n}, and

G = ∩_{n∈N} G_n.

Then the definition of exchangeability implies that if n_j ≤ n for 1 ≤ j ≤
k, then

(12)  E( ∏_{j=1}^k e_{n_j} | G_n ) = E( ∏_{j=1}^k e_{n_j} | N_n ),

and that this conditional expectation is the same for any subset (n_1, . . . , n_k)
of (1, . . . , n). Put then f_n^{(0)} = 1 and

f_n^{(k)} = Σ_{(n_1,...,n_k)} ∏_{j=1}^k e_{n_j}, 1 ≤ k ≤ n,

where the sum is extended over all C(n, k) choices of the subset (C(·, ·) denoting
the binomial coefficient); this is the "elementary symmetric function" of degree
k formed by e_1, . . . , e_n. Introducing an indeterminate z we have the formal
identity in z:

Σ_{j=0}^n f_n^{(j)} z^j = ∏_{j=1}^n (1 + e_j z).

But it is trivial that 1 + e_j z = (1 + z)^{e_j}, since e_j takes only the values 0 and
1, hence†

Σ_{j=0}^n f_n^{(j)} z^j = ∏_{j=1}^n (1 + z)^{e_j} = (1 + z)^{N_n}.

From this we obtain by comparing the coefficients:

(13)  f_n^{(k)} = C(N_n, k), 0 ≤ k ≤ n.

It follows that the right member of (12) is equal to

C(N_n, k) / C(n, k).

Letting n → ∞ and using Theorem 9.4.8, (15b), in the left member of (12),
we conclude that

(14)  E( ∏_{j=1}^k e_{n_j} | G ) = lim_{n→∞} (N_n/n)^k.

This is the key to the theory. It shows first that the limit below exists almost
surely:

lim_{n→∞} N_n/n = η,

and clearly η is a r.v. satisfying 0 ≤ η ≤ 1. Going back to (14) we have
established the formula

(14′)  P(E_{n_1} ∩ ··· ∩ E_{n_k} | G) = η^k, k ∈ N;

and taking expectations we have identified the w_k in (11):

w_k = E(η^k).

Thus {w_k, k ∈ N} is the sequence of moments of the distribution of η. This is
de Finetti's theorem, as proved by D. G. Kendall. We leave some easy consequences
as exercises below. An interesting classical example of exchangeable
events is Pólya's urn scheme; see Rényi [24] and Chung [25].

† I owe this derivation to David Klarner.
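A classical instance: in the Pólya urn started with one red and one black ball, the events E_j = {jth draw is red} are exchangeable, η = lim N_n/n is uniformly distributed on (0, 1), and hence w_k = E(η^k) = 1/(k + 1). The Python sketch below (our own illustration) estimates the first few moments from simulated urns:

```python
import random

def polya_moments(n=2_000, trials=2_000, seed=8):
    """Polya urn with 1 red and 1 black ball: draw, then return the ball
    plus one more of the same color. Estimates w_k = E(eta^k), eta ~ N_n/n."""
    rng = random.Random(seed)
    etas = []
    for _ in range(trials):
        red, black, reds_drawn = 1, 1, 0
        for _ in range(n):
            if rng.random() < red / (red + black):
                red += 1
                reds_drawn += 1
            else:
                black += 1
        etas.append(reds_drawn / n)
    for k in (1, 2, 3):
        print(k, sum(e ** k for e in etas) / trials, 1 / (k + 1))

polya_moments()
```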
(VII) Squared variation

Here is a small sample of the latest developments in martingale theory. Let
X = {X_n, F_n} be a martingale; using the notation in (7) of Sec. 9.3, we put

Q_n² = Q_n²(X) = Σ_{j=1}^n x_j².

The sequence {Q_n²(X), n ∈ N} associated with X is called its squared variation
process and is useful in many applications. We begin with the algebraic
identity:

(15)  X_n² = (Σ_{j=1}^n x_j)² = Σ_{j=1}^n x_j² + 2 Σ_{j=2}^n X_{j−1} x_j.

If X_n ∈ L² for every n, then all terms above are integrable, and we have for
each j:

(16)  E(X_{j−1} x_j) = E(X_{j−1} E(x_j | F_{j−1})) = 0.

It follows that

E(X_n²) = Σ_{j=1}^n E(x_j²) = E(Q_n²).

When X_n is the nth partial sum of a sequence of independent r.v.'s with zero
mean and finite variance, the preceding formula reduces to the additivity of
variances; see (6) of Sec. 5.1.
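The identity E(X_n²) = E(Q_n²) is easy to confirm by simulation; here is a minimal Python sketch (ours), with independent standard normal increments so that Q_n² is genuinely random:

```python
import random

def squared_variation_check(n=50, trials=50_000, seed=9):
    """For the martingale X_k = sum of independent N(0,1) increments,
    check E(X_n^2) = E(Q_n^2) (both equal n); here x_j = X_j - X_{j-1}."""
    rng = random.Random(seed)
    ex2 = eq2 = 0.0
    for _ in range(trials):
        s = q2 = 0.0
        for _ in range(n):
            x = rng.gauss(0.0, 1.0)
            s += x
            q2 += x * x
        ex2 += s * s
        eq2 += q2
    print(ex2 / trials, eq2 / trials)   # both close to n = 50

squared_variation_check()
```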
Now suppose that {X_n} is a positive bounded supermartingale such that
0 ≤ X_n ≤ λ for all n, where λ is a constant. Then the quantity in (16) is
negative and bounded below by

λ E(E(x_j | F_{j−1})) = λ E(x_j) ≤ 0.

In this case we obtain from (15):

λ E(X_n) ≥ E(X_n²) ≥ E(Q_n²) + 2λ Σ_{j=2}^n E(x_j) = E(Q_n²) + 2λ[E(X_n) − E(X_1)];

so that

(17)  E(Q_n²) ≤ 2λ E(X_1) ≤ 2λ².

If X is a positive martingale, then X ∧ λ is a supermartingale of the kind
just considered, so that (17) is applicable to it. Letting X* = sup_{1≤n<∞} X_n,
we have

P{Q_n(X) ≥ λ} ≤ P{X* > λ} + P{X* ≤ λ; Q_n(X ∧ λ) ≥ λ}.

By Theorem 9.4.1, the first term on the right is bounded by λ^{−1} E(X_1). The
second term may be estimated by Chebyshev's inequality followed by (17)
applied to X ∧ λ:

P{Q_n(X ∧ λ) ≥ λ} ≤ (1/λ²) E{Q_n²(X ∧ λ)} ≤ (2/λ) E(X_1).

We have therefore established the inequality:

(18)  P{Q_n(X) ≥ λ} ≤ (3/λ) E(X_1)

for any positive martingale. Letting n → ∞ and then λ → ∞, we obtain

Σ_{j=1}^∞ x_j² = lim_{n→∞} Q_n²(X) < ∞ a.e.

Using Krickeberg's decomposition (Exercise 5 of Sec. 9.4) the last result
extends at once to any L¹-bounded martingale. This was first proved by D. G.
Austin. Similarly, the inequality (18) extends to any L¹-bounded martingale
as follows:

(19)  P{Q_n(X) ≥ λ} ≤ (6/λ) sup_n E(|X_n|).

The details are left as an exercise. This result is due to D. Burkholder. The
simplified proofs given above are due to A. Garsia.
(VIII) Derivation

Our final example is a feedback to the beginning of this chapter, namely to
use martingale theory to obtain a Radon–Nikodym derivative. Let Y be an
integrable r.v. and consider, as in the proof of Theorem 9.1.1, the countably
additive set function ν below:

(20)  ν(Λ) = ∫_Λ Y dP.

For any countable measurable partition {Λ_j^{(n)}, j ∈ N} of Ω, let F_n be the Borel
field generated by it. Define the approximating function X_n as follows:

(21)  X_n = Σ_j ( ν(Λ_j^{(n)}) / P(Λ_j^{(n)}) ) 1_{Λ_j^{(n)}},

where the fraction is taken to be zero if the denominator vanishes. According
to the discussion of Sec. 9.1, we have

X_n = E{Y | F_n}.

Now suppose that the partitions become finer as n increases, so that {F_n, n ∈ N}
is an increasing sequence of Borel fields. Then we obtain by Theorem 9.4.8:

lim_{n→∞} X_n = E{Y | F_∞}.

In particular if Y ∈ F_∞ we have obtained the Radon–Nikodym derivative
Y = dν/dP as the almost everywhere limit of "increment ratios" over a "net":

Y(ω) = lim_{n→∞} ν(Λ_{j(ω)}^{(n)}) / P(Λ_{j(ω)}^{(n)}),

where j(ω) is the unique j such that ω ∈ Λ_j^{(n)}.

If (Ω, F, P) is (U, B, m) as in Example 2 of Sec. 3.1, and ν is an
arbitrary measure which is absolutely continuous with respect to the Lebesgue
measure m, we may take the nth partition to be 0 = λ_0^{(n)} < ··· < λ_n^{(n)} = 1
such that

max_{0≤k≤n−1} (λ_{k+1}^{(n)} − λ_k^{(n)}) → 0.

For in this case F_∞ will contain each open interval and so also B. If ν is not
absolutely continuous, the procedure above will lead to the derivative of its
absolutely continuous part (see Exercise 14 below). In particular, if F is the
d.f. associated with the p.m. ν, and we put

f_n(x) = 2^n [ F((k+1)/2^n) − F(k/2^n) ] for k/2^n < x ≤ (k+1)/2^n,

where k ranges over all integers, then we have

lim_{n→∞} f_n(x) = F′(x)

for almost all x with respect to m; and F′ is the density of the absolutely
continuous part of F; see Theorem 1.3.1. So we have come around to the
beginning of this course, and the book is hereby ended.
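The dyadic scheme is immediately computable. In the short Python sketch below (ours; the d.f. F(x) = x² on (0, 1] is just an example with density F′(x) = 2x), the increment ratios f_n(x) converge to the density:

```python
import math

def dyadic_density(F, x, n):
    """Increment ratio of F over the dyadic interval of rank n containing x,
    as in the displayed formula for f_n (boundary convention: k = floor(x 2^n))."""
    k = math.floor(x * 2 ** n)
    return 2 ** n * (F((k + 1) / 2 ** n) - F(k / 2 ** n))

F = lambda x: x * x                       # d.f. on (0,1] with density 2x
for n in (4, 8, 16, 24):
    print(n, dyadic_density(F, 0.3, n))   # tends to F'(0.3) = 0.6
```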
EXERCISES

*1. Suppose that {X_n, n ∈ N} is a sequence of integer-valued r.v.'s having
the following property. For each n, there exists a function p_n of n integers
such that for every k ∈ N, we have

P{X_{k+j} = x_j, 1 ≤ j ≤ n} = p_n(x_1, . . . , x_n).

Define for a fixed x_0:

Z_n = p_{n+1}(x_0, X_1, . . . , X_n) / p_n(X_1, . . . , X_n)

if the denominator > 0; otherwise Z_n = 0. Then {Z_n, n ∈ N} is a martingale
that converges a.e. and in L¹. [This is from information theory.]
2. Suppose that for each n, {X_j, 1 ≤ j ≤ n} and {X_j′, 1 ≤ j ≤ n} have
respectively the n-dimensional probability density functions p_n and q_n. Define

Y_n = q_n(X_1, . . . , X_n) / p_n(X_1, . . . , X_n)

if the denominator > 0, and = 0 otherwise. Then {Y_n, n ∈ N} is a supermartingale
that converges a.e. [This is from statistics.]
3. Let {Z_n, n ∈ N⁰} be positive integer-valued r.v.'s such that Z_0 = 1
and for each n ≥ 1, the conditional distribution of Z_n given Z_0, . . . , Z_{n−1} is
that of Z_{n−1} independent r.v.'s with the common distribution {p_k, k ∈ N⁰},
where p_1 < 1 and

0 < m = Σ_{k=0}^∞ k p_k < ∞.

Then {W_n, n ∈ N⁰}, where W_n = Z_n/m^n, is a martingale that converges, the
limit being zero if m ≤ 1. [This is from branching processes.]
4. Let {X_n, n ∈ N} be an arbitrary stochastic process and let F_n′ be as in
Sec. 8.1. Prove that the remote field is almost trivial if and only if for each
Λ ∈ F_∞ we have

lim_{n→∞} sup_{M∈F_n′} |P(ΛM) − P(Λ)P(M)| = 0.

[HINT: Consider P(Λ | F_n′) and apply 9.4.8. This is due to Blackwell and
Freedman.]
*5. In the notation of Theorem 9.5.2, suppose that there exists δ > 0 such
that

P{X_j ∈ B_j i.o. | X_n} ≤ 1 − δ a.e. on {X_n ∈ A_n};

then we have

P{X_j ∈ A_j i.o. and X_j ∈ B_j i.o.} = 0.

6. Let f be a real bounded continuous function on R¹ and μ a p.m. on
R¹ such that

∀x ∈ R¹: f(x) = ∫_{R¹} f(x + y) μ(dy).

Then f(x + s) = f(x) for each s in the support of μ. In particular, if μ is not of
the lattice type, then f is constant everywhere. [This is due to G. A. Hunt, who
used it to prove renewal limit theorems. The approach was later rediscovered
by other authors, and the above result in somewhat more general context is
now referred to as Choquet–Deny's theorem.]
*7. The analogue of Theorem 9.5.4 for p = 1 is as follows: if X_n ≥ 0 for
all n, then

E{sup_n X_n} ≤ (e/(e − 1)) [1 + sup_n E{X_n log⁺ X_n}],

where log⁺ x = log x if x ≥ 1 and 0 if x ≤ 1.
*8. As an example of a more sophisticated application of the martingale
convergence theorem, consider the following result due to Paul Lévy. Let
{X_n, n ∈ N⁰} be a sequence of uniformly bounded r.v.'s; then the two series

Σ_n X_n and Σ_n E{X_n | X_1, . . . , X_{n−1}}

converge or diverge together. [HINT: Let

Y_n = X_n − E{X_n | X_1, . . . , X_{n−1}} and Z_n = Σ_{j=1}^n Y_j.

Define α to be the first time that Z_n > A and show that E(Z_{α∧n}⁺) is bounded
in n. Apply Theorem 9.4.4 to {Z_{α∧n}} for each A to show that Z_n converges on
the set where lim sup_n Z_n < ∞; similarly also on the set where lim inf_n Z_n > −∞.
The situation is reminiscent of Theorem 8.2.5.]
9. Let {Y_k, 1 ≤ k ≤ n} be independent r.v.'s with mean zero and finite
variances σ_k²;

S_k = Σ_{j=1}^k Y_j,  s_k² = Σ_{j=1}^k σ_j² > 0,  Z_k = S_k² − s_k².

Prove that {Z_k, 1 ≤ k ≤ n} is a martingale. Suppose now all Y_k are bounded
by a constant A, and define α and M as in the proof of Theorem 9.4.1, with
the X_k there replaced by the S_k here. Prove that

s_n² P(M^c) ≤ E(s_α²) = E(S_α²) ≤ (λ + A)².

Thus we obtain

P{max_{1≤k≤n} |S_k| ≤ λ} ≤ (λ + A)²/s_n²,

an improvement on Exercise 3 of Sec. 5.3. [This is communicated by Doob.]
10. Let {X_n, n ∈ N} be a sequence of independent, identically distributed
r.v.'s with E(|X_1|) < ∞, and S_n = Σ_{j=1}^n X_j. Define α = inf{n ≥ 1: |X_n| >
n}. Prove that if E{(|S_α|/α) 1_{{α<∞}}} < ∞, then E(|X_1| log⁺ |X_1|) < ∞. This
is due to McCabe and Shepp. [HINT:

c_n = ∏_{j=1}^n P{|X_j| ≤ j} → c > 0;

Σ_{n=1}^∞ (1/n) ∫_{{α=n}} |X_n| dP = Σ_{n=1}^∞ (c_{n−1}/n) ∫_{{|X_1|>n}} |X_1| dP;

Σ_{n=1}^∞ (1/n) ∫_{{α=n}} |S_{n−1}| dP < ∞.]

11. Deduce from Exercise 10 that E(sup_n |S_n|/n) < ∞ if and only
if E(|X_1| log⁺ |X_1|) < ∞. [HINT: Apply Exercise 7 to the martingale
{. . . , S_n/n, . . . , S_2/2, S_1} in Example (V).]
12. In Example (VI) show that (i) G is generated by η; (ii) the events
{E_n, n ∈ N} are conditionally independent given η; (iii) for any l events E_{n_j},
1 ≤ j ≤ l, and any k ≤ l we have

P{E_{n_1} ∩ ··· ∩ E_{n_k} ∩ E_{n_{k+1}}^c ∩ ··· ∩ E_{n_l}^c} = ∫_0^1 x^k (1 − x)^{l−k} G(dx),

where G is the distribution of η.
13. Prove the inequality (19).
*14. Prove that if ν is a measure on F_∞ that is singular with respect to
P, then the X_n's in (21) converge a.e. to zero. [HINT: Show that

ν(Λ) ≥ ∫_Λ X_n dP for Λ ∈ F_m, m ≤ n,

and apply Fatou's lemma. {X_n} is a supermartingale, not necessarily a martingale!]
15. In the case of (U, B, m), suppose that ν = δ_1 and the nth partition
is obtained by dividing U into 2^n equal parts: what are the X_n's in (21)? Use
this to "explain away" the St. Petersburg paradox (see Exercise 5 of Sec. 5.2).
Bibliographical Note

Most of the results can be found in Chapter 7 of Doob [17]. Another useful account is
given by Meyer [20]. For an early but stimulating account of the connections between
random walks and partial differential equations, see

A. Khintchine, Asymptotische Gesetze der Wahrscheinlichkeitsrechnung, Springer-Verlag, Berlin, 1933.

Theorems 9.5.1 and 9.5.2 are contained in

Kai Lai Chung, The general theory of Markov processes according to Doeblin, Z. Wahrscheinlichkeitstheorie 2 (1964), 230–254.

For Theorem 9.5.4 in the case of an independent process, see

J. Marcinkiewicz and A. Zygmund, Sur les fonctions indépendantes, Fund. Math. 29 (1937), 60–90,

which contains other interesting and useful relics.

The following article serves as a guide to the recent literature on martingale
inequalities and their uses:

D. L. Burkholder, Distribution function inequalities for martingales, Ann. Probability 1 (1973), 19–42.
Supplement: Measure and Integral

For basic mathematical vocabulary and notation the reader is referred to §1.1
and §2.1 of the main text.

1 Construction of measure

Let Ω be an abstract space and S its total Borel field; then A ∈ S means
A ⊂ Ω.

DEFINITION 1. A function μ* with domain S and range in [0, ∞] is an
outer measure iff the following properties hold:

(a) μ*(∅) = 0;
(b) (monotonicity) if A_1 ⊂ A_2, then μ*(A_1) ≤ μ*(A_2);
(c) (subadditivity) if {A_j} is a countable sequence of sets in S, then

μ*( ∪_j A_j ) ≤ Σ_j μ*(A_j).

DEFINITION 2. Let F_0 be a field in Ω. A function μ with domain F_0 and
range in [0, ∞] is a measure on F_0 iff (a) and the following property hold:
(d) (additivity) if {B_j} is a countable sequence of disjoint sets in F_0 and
∪_j B_j ∈ F_0, then

(1)  μ( ∪_j B_j ) = Σ_j μ(B_j).

Let us show that the properties (b) and (c) for an outer measure hold for a measure
μ, provided all the sets involved belong to F_0.

If A_1 ∈ F_0, A_2 ∈ F_0, and A_1 ⊂ A_2, then A_1^c A_2 ∈ F_0 because F_0 is a field;
A_2 = A_1 ∪ A_1^c A_2 and so by (d):

μ(A_2) = μ(A_1) + μ(A_1^c A_2) ≥ μ(A_1).

Next, if each A_j ∈ F_0, and furthermore if ∪_j A_j ∈ F_0 (this must be assumed
for a countably infinite union because it is not implied by the definition of a
field!), then

∪_j A_j = A_1 ∪ A_1^c A_2 ∪ A_1^c A_2^c A_3 ∪ ···

and so by (d), since each member of the disjoint union above belongs to F_0:

μ( ∪_j A_j ) = μ(A_1) + μ(A_1^c A_2) + μ(A_1^c A_2^c A_3) + ···
 ≤ μ(A_1) + μ(A_2) + μ(A_3) + ···

by property (b) just proved.

The symbol N denotes the sequence of natural numbers (strictly positive
integers); when used as an index set, it will frequently be omitted as understood.
For instance, the index j used above ranges over N or a finite segment of N.

Now let us suppose that the field F_0 is a Borel field to be denoted by F
and that μ is a measure on it. Then if A_n ∈ F for each n ∈ N, the countable
union ∪_n A_n and countable intersection ∩_n A_n both belong to F. In this case
we have the following fundamental properties.

(e) (increasing limit) if A_n ⊂ A_{n+1} for all n and A_n ↑ A = ∪_n A_n, then

lim_n ↑ μ(A_n) = μ(A).

(f) (decreasing limit) if A_n ⊃ A_{n+1} for all n, A_n ↓ A = ∩_n A_n, and for
some n we have μ(A_n) < ∞, then

lim_n ↓ μ(A_n) = μ(A).
The additional assumption in (f) is essential. For a counterexample let
A_n = (n, ∞) in R; then A_n ↓ ∅, the empty set, but the measure (length!) of A_n
is +∞ for all n, while ∅ surely must have measure 0. See §3 for formalities
of this trivial example. It can even be made discrete if we use the counting
measure # of natural numbers: let A_n = {n, n + 1, n + 2, . . .} so that #(A_n) =
+∞, #(∩_n A_n) = 0.

Beginning with a measure μ on a field F_0, not a Borel field, we proceed
to construct a measure on the Borel field F generated by F_0, namely the
minimal Borel field containing F_0 (see §2.1). This is called an extension of
μ from F_0 to F, when the notation μ is maintained. Curiously, we do this
by first constructing an outer measure μ* on the total Borel field S and then
showing that μ* is in truth a measure on a certain Borel field to be denoted
by F* that contains F_0. Then of course F* must contain the minimal F, and
so μ* restricted to F is an extension of the original μ from F_0 to F. But we
have obtained a further extension to F* that is in general "larger" than F and
possesses a further desirable property to be discussed.

DEFINITION 3. Given a measure μ on a field F_0 in Ω, we define μ* on
S as follows, for any A ∈ S:

(2)  μ*(A) = inf { Σ_j μ(B_j) : B_j ∈ F_0 for all j and ∪_j B_j ⊃ A }.

A countable (possibly finite) collection of sets {B_j} satisfying the conditions
indicated in (2) will be referred to below as a "covering" of A. The infimum
taken over all such coverings exists because the single set Ω constitutes a
covering of A, so that

0 ≤ μ*(A) ≤ μ(Ω) ≤ +∞.

It is not trivial that μ*(A) = μ(A) if A ∈ F_0, which is part of the next theorem.

Theorem 1. We have μ* = μ on F_0; μ* on S is an outer measure.

PROOF. Let A ∈ F_0; then the single set A serves as a covering of A; hence
μ*(A) ≤ μ(A). For any covering {B_j} of A, we have AB_j ∈ F_0 and

∪_j (AB_j) = A ∈ F_0;

hence by property (c) of μ on F_0 followed by property (b):

μ(A) = μ( ∪_j AB_j ) ≤ Σ_j μ(AB_j) ≤ Σ_j μ(B_j).
It follows from (2) that μ(A) ≤ μ*(A). Thus μ* = μ on F_0.

To prove μ* is an outer measure, the properties (a) and (b) are trivial.
To prove (c), let ε > 0. For each j, by the definition of μ*(A_j), there exists a
covering {B_{jk}} of A_j such that

Σ_k μ(B_{jk}) ≤ μ*(A_j) + ε/2^j.

The double sequence {B_{jk}} is a covering of ∪_j A_j such that

Σ_j Σ_k μ(B_{jk}) ≤ Σ_j μ*(A_j) + ε.

Hence for any ε > 0:

μ*( ∪_j A_j ) ≤ Σ_j μ*(A_j) + ε,

which establishes (c) for μ*, since ε is arbitrarily small.

With the outer measure μ*, a class of sets F* is associated as follows.

DEFINITION 4. A set A ⊂ Ω belongs to F* iff for every Z ⊂ Ω we have

(3)  μ*(Z) = μ*(AZ) + μ*(A^c Z).

If in (3) we change "=" into "≤", the resulting inequality holds by (c); hence
(3) is equivalent to the reverse inequality when "=" is changed into "≥".

Theorem 2. F* is a Borel field and contains F_0. On F*, μ* is a measure.

PROOF. Let A ∈ F_0. For any Z ⊂ Ω and any ε > 0, there exists a covering
{B_j} of Z such that

(4)  Σ_j μ(B_j) ≤ μ*(Z) + ε.

Since AB_j ∈ F_0, {AB_j} is a covering of AZ and {A^c B_j} is a covering of A^c Z; hence

(5)  μ*(AZ) ≤ Σ_j μ(AB_j),  μ*(A^c Z) ≤ Σ_j μ(A^c B_j).

Since μ is a measure on F_0, we have for each j:

(6)  μ(AB_j) + μ(A^c B_j) = μ(B_j).
It follows from (4), (5), and (6) that

μ*(AZ) + μ*(A^c Z) ≤ μ*(Z) + ε.

Letting ε ↓ 0 establishes the criterion (3) in its "≥" form. Thus A ∈ F*, and
we have proved that F_0 ⊂ F*.

To prove that F* is a Borel field, it is trivial that it is closed under
complementation because the criterion (3) is unaltered when A is changed
into A^c. Next, to show that F* is closed under union, let A ∈ F* and B ∈ F*.
Then for any Z ⊂ Ω, we have by (3) with A replaced by B and Z replaced by
ZA or ZA^c:

μ*(ZA) = μ*(ZAB) + μ*(ZAB^c);
μ*(ZA^c) = μ*(ZA^c B) + μ*(ZA^c B^c).

Hence by (3) again as written:

μ*(Z) = μ*(ZAB) + μ*(ZAB^c) + μ*(ZA^c B) + μ*(ZA^c B^c).

Applying (3) with Z replaced by Z(A ∪ B), we have

μ*(Z(A ∪ B)) = μ*(Z(A ∪ B)A) + μ*(Z(A ∪ B)A^c)
 = μ*(ZA) + μ*(ZA^c B)
 = μ*(ZAB) + μ*(ZAB^c) + μ*(ZA^c B).

Comparing the two preceding equations, we see that

μ*(Z) = μ*(Z(A ∪ B)) + μ*(Z(A ∪ B)^c).

Hence A ∪ B ∈ F*, and we have proved that F* is a field.

Now let {A_j} be an infinite sequence of sets in F*; put

B_1 = A_1,  B_j = A_j ∖ ( ∪_{i=1}^{j−1} A_i ) for j ≥ 2.

Then {B_j} is a sequence of disjoint sets in F* (because F* is a field) and has
the same union as {A_j}. For any Z ⊂ Ω, we have for each n ≥ 1:

μ*( Z ∩ ∪_{j=1}^n B_j ) = μ*( (Z ∩ ∪_{j=1}^n B_j) B_n ) + μ*( (Z ∩ ∪_{j=1}^n B_j) B_n^c )
 = μ*(ZB_n) + μ*( Z ∩ ∪_{j=1}^{n−1} B_j ),

because B_n ∈ F*. It follows by induction on n that

(7)  μ*( Z ∩ ∪_{j=1}^n B_j ) = Σ_{j=1}^n μ*(ZB_j).

Since ∪_{j=1}^n B_j ∈ F*, we have by (7) and the monotonicity of μ*:

μ*(Z) = μ*( Z ∩ ∪_{j=1}^n B_j ) + μ*( Z ∩ (∪_{j=1}^n B_j)^c )
 ≥ Σ_{j=1}^n μ*(ZB_j) + μ*( Z ∩ (∪_{j=1}^∞ B_j)^c ).

Letting n ↑ ∞ and using property (c) of μ*, we obtain

μ*(Z) ≥ μ*( Z ∩ ∪_{j=1}^∞ B_j ) + μ*( Z ∩ (∪_{j=1}^∞ B_j)^c ),

which establishes ∪_{j=1}^∞ B_j ∈ F*. Thus F* is a Borel field.

Finally, let {B_j} be a sequence of disjoint sets in F*. By the property (b)
of μ* and (7) with Z = Ω, we have

μ*( ∪_{j=1}^∞ B_j ) ≥ lim sup_n μ*( ∪_{j=1}^n B_j ) = lim_n Σ_{j=1}^n μ*(B_j) = Σ_{j=1}^∞ μ*(B_j).

Combined with the property (c) of μ*, we obtain the countable additivity of
μ* on F*, namely the property (d) for a measure:

μ*( ∪_{j=1}^∞ B_j ) = Σ_{j=1}^∞ μ*(B_j).

The proof of Theorem 2 is complete.
2 Characterization of extensions
We have proved that
Ł
S ¦F ¦ F ¦ F0 ,
where some of the “¦” may turn out to be “D”. Since we have extended the
measure from F0 to F Ł in Theorem 2, what for F ? The answer will appear
in the sequel.
2 CHARACTERIZATION OF EXTENSIONS 381

The triple , F ,  where F is a Borel field of subsets of , and is a


measure on F , will be called a measure space. It is qualified by the adjective
“finite” when  < 1, and by the noun “probability” when  D 1.
A more general case is defined below.

DEFINITION 5. A measure on a field F0 (not necessarily Borel field) is


said to be -finite iff there exists a sequence of sets fn , n 2 Ng in F0 such
that n  < 1 for each n, and n n D . In this case the measure space
, F , , where F is the minimal Borel field containing F0 , is said to be
“ -finite on F0 ”.

Theorem 3. Let F0 be a field and F the Borel field generated by F0 . Let 1


and 2 be two measures on F that agree on F0 . If one of them, hence both
are -finite on F0 , then they agree on F .
PROOF. Let fn g be as in Definition 5. Define a class C of subsets of 
as follows:

C D fA ² : 1 n A D 2 n A for all n 2 Ng.

Since n 2 F0 , for any A 2 F0 we have n A 2 F0 for all n; hence C ¦ F0 .


Suppose Ak 2 C , Ak ² AkC1 for all k 2 N and Ak " A. Then by property (e)
of 1 and 2 as measures on F , we have for each n:

1 n A D lim " 1 n Ak  D lim " 2 n Ak  D 2 n A.


k k

Thus A 2 C . Similarly by property (f), and the hypothesis 1 n  D


2 n  < 1, if Ak 2 C and Ak # A, then A 2 C . Therefore C is closed under
both increasing and decreasing limits; hence C ¦ F by Theorem 2.1.2 of the
main text. This implies for any A 2 F :

1 A D lim " 1 n A D lim " 2 n A D 2 A


n n

by property (e) once again. Thus 1 and 2 agree on F .


It follows from Theorem 3 that under the -finite assumption there, the
outer measure Ł in Theorem 2 restricted to the minimal Borel field F
containing F0 is the unique extension of from F0 to F . What about the
more extensive extension to F Ł ? We are going to prove that it is also unique
when a further property is imposed on the extension. We begin by defining
two classes of special sets in F .

DEFINITION 6. Given the field F0 of sets in , let F0 υ be the collection


of all sets of the form 1 1
mD1 nD1 Bmn where each Bmn 2 F0 , and F0υ be the
collection of all sets of the form 1 1
mD1 nD1 Bmn where each Bmn 2 F0 .
382 SUPPLEMENT: MEASURE AND INTEGRAL

Both these collections belong to F because the Borel field is closed under
countable union and intersection, and these operations may be iterated, here
twice only, for each collection. If B 2 F0 , then B belongs to both F0 υ and
F0υ because we can take Bmn D B. Finally, A 2 F0 υ if and only if Ac 2 F0υ
because  c
 
Bmn D c
Bmn .
m n m n

Theorem 4. Let A 2 F Ł . There exists B 2 F0 υ such that


Ł Ł
A ² B; A D B.

If is -finite on F0 , then there exists C 2 F0υ such that


Ł Ł
C ² A; C D A.
PROOF. For each m, there exists fBmn g in F such that
 1
Ł Ł
A² Bmn ; Bmn   A C .
n n
m

Put 
Bm D Bmn ; BD Bm ;
n m

then A ² B and B 2 F0 υ . We have


 1
Ł Ł Ł Ł
B  Bm   Bmn   A C .
n
m

Letting m " 1 we see that Ł B  Ł


A; hence Ł
B D Ł
A. The first
assertion of the theorem is proved.
To prove the second assertion, let n be as in Definition 5. Applying the
first assertion to n Ac , we have Bn 2 F0 υ such that
Ł Ł
n Ac ² Bn ; n Ac  D Bn .

Hence we have
Ł Ł
n Ac ² n Bn ; n Ac  D n Bn .
Ł
Taking complements with respect to n , we have since n  < 1:
n A ¦ n Bnc ;
Ł Ł Ł Ł Ł Ł
n A D n   n Ac  D n   n Bn  D n Bnc .
2 CHARACTERIZATION OF EXTENSIONS 383

Since n 2 F0 and Bnc 2 F0υ , it is easy to verify that n Bnc 2 F0υ by the
distributive law for the intersection with a union. Put

CD n Bnc .
n

It is trivial that C 2 F0υ and

AD n A ¦ C.
n

Consequently, we have
Ł Ł Ł
A ½ C ½ lim inf n Bnc 
n
Ł Ł
D lim inf n A D A,
n

the last equation owing to property (e) of the measure Ł . Thus Ł A D
Ł
C, and the assertion is proved.
The measure Ł on F Ł is constructed from the measure on the field
F0 . The restriction of Ł to the minimal Borel field F containing F0 will
henceforth be denoted by instead of Ł .
In a general measure space , G , , let us denote by N G ,  the class
of all sets A in G with A D 0. They are called the null sets when G and
are understood, or -null sets when G is understood. Beware that if A ² B
and B is a null set, it does not follow that A is a null set because A may not
be in G ! This remark introduces the following definition.

DEFINITION 7. The measure space , G ,  is called complete iff any


subset of a null set is a null set.

Theorem 5. The following three collections of subsets of  are idential:

Ł
(i) A ²  and the outer measure A D 0;
Ł Ł
(ii) A 2 F and A D 0;
(iii) A ² B where B 2 F and B D 0.
It is the collection N F Ł , Ł
.
PROOF. If Ł A D 0, we will prove A 2 F Ł by verifying the criterion
(3). For any Z ² , we have by properties (a) and (b) of Ł :
Ł Ł Ł Ł
0 ZA  A D 0; ZAc   Z;
384 SUPPLEMENT: MEASURE AND INTEGRAL

and consequently by property (c):


Ł Ł Ł Ł Ł
Z D ZA [ ZAc   ZA C ZAc   Z.
Thus (3) is satisfied and we have proved that (i) and (ii) are equivalent.
Next, let A 2 F Ł and Ł A D 0. Then we have by the first assertion in
Theorem 4 that there exists B 2 F such that A ² B and Ł A D B. Thus
A satisfies (iii). Conversely, if A satisfies (iii), then by property (b) of outer
measure: Ł A  Ł B D B D 0, and so (i) is true.
As consequence, any subset of a F Ł , Ł -null set is a F Ł , Ł -null set.
This is the first assertion in the next theorem.

Theorem 6. The measure space , F Ł , Ł


 is complete. Let , G ,  be a
complete measure space; G ¦ F0 and D on F0 . If is -finite on F0 then
Ł Ł
G ¦F and D on F Ł .
PROOF. Let A 2 F Ł , then by Theorem 4 there exists B 2 F and C 2 F
such that
Ł
8 C ² A ² B; C D A D B.
Since D on F0 , we have by Theorem 3, D on F . Hence by (8) we
have B  C D 0. Since A  C ² B  C and B  C 2 G , and , G ,  is
complete, we have A  C 2 G and so A D C [ A  C 2 G .
Moreover, since C, A, and B belong to G , it follows from (8) that
C D C  A  B D B
and consequently by (8) again A D A. The theorem is proved.
To summarize the gist of Theorems 4 and 6, if the measure on the field
F0 is -finite on F0 , then F ,  is its unique extension to F , and F Ł , Ł 
is its minimal complete extension. Here one is tempted to change the notation
to 0 on F0 !
We will complete the picture by showing how to obtain F Ł , Ł  from
F , , reversing the order of previous construction. Given the measure space
, F , , let us denote by C the collection of subsets of  as follows: A 2 C
iff there exists B 2 N F ,  such that A ² B. Clearly C has the “hereditary”
property: if A belongs to C , then all subsets of A belong to C ; C is also closed
under countable union. Next, we define the collection
9 F D fA ²  j A D B  C where B 2 F , C 2 C g.

where the symbol “” denotes strict difference of sets, namely B  C D BCc
where C ² B. Finally we define a function on F as follows, for the A
2 CHARACTERIZATION OF EXTENSIONS 385

shown in (9):

10 A D B.

We will legitimize this definition and with the same stroke prove the mono-
tonicity of . Suppose then

11 B1  C1 ² B2  C2 , Bi 2 F , Ci 2 C , i D 1, 2.

Let C1 ² D 2 N F , . Then B1 ² B2 [ D and so B1   B2 [ D D


B2 . When the ² in (11) is “D”, we can interchange B1 and B2 to conclude
that B1  D B2 , so that the definition (10) is legitimate.

Theorem 7. F is a Borel field and is a measure on F .


PROOF. Let An 2 F , n 2 N; so that An D Bn Ccn as in (9). We have then
1
1  1 c
 
An D Bn \ Cn .
nD1 nD1 nD1

Since the class C is closed under countable union, this shows that F is closed
under countable intersection. Next let C ² D, D 2 N F , ; then

Ac D Bc [ C D Bc [ BC D Bc [ BD  D  C
D Bc [ BD  BD  C.

Since BD  C ² D, we have BD  C 2 C ; hence the above shows that F


is also closed under complementation and therefore is a Borel field. Clearly
F ¦ F because we may take C D  in (9).
To prove is countably additive on F , let fAn g be disjoint sets in
F . Then
An D B n  C n , B n 2 F , C n 2 C .
1
There exists D in N F ,  containing nD1 Cn . Then fBn  Dg are
disjoint and
1 1 1
Bn  D ² An ² Bn .
nD1 nD1 nD1

All these sets belong to F and so by the monotonicity of :


     
Bn  D  An  Bn .
n n n
386 SUPPLEMENT: MEASURE AND INTEGRAL

Since D on F , the first and third members above are equal to, respec-
tively:
 
  
Bn  D D Bn  D D Bn  D An ;
n n n n
 
 
Bn  Bn  D An .
n n n

Therefore we have  

An D An .
n n

Since  D    D  D 0, is a measure on F .


Ł Ł
Corollary. In truth: F D F and D .
Ł
PROOF. For any A 2 F , by the first part of Theorem 4, there exists B 2
F such that
Ł
A D B  B  A, B  A D 0.
Hence by Theorem 5, B  A 2 C and so A 2 F by (9). Thus F Ł ² F . Since
F ² F Ł and C 2 F Ł by Theorem 6, we have F ² F Ł by (9). Hence F D
F Ł . It follows from the above that Ł A D B D A. Hence Ł D on
FŁDF.
The question arises naturally whether we can extend from F0 to F
directly without the intervention of F Ł . This is indeed possible by a method of
transfinite induction originally conceived by Borel; see an article by LeBlanc
and G.E. Fox: “On the extension of measure by the method of Borel”, Cana-
dian Journal of Mathematics, 1956, pp. 516–523. It is technically lengthier
than the method of outer measure expounded here.
Although the case of a countable space  can be treated in an obvious
way, it is instructive to apply the general theory to see what happens.
Let  D N [ ω; F0 is the minimal field (not Borel field) containing each
singleton n in N, but not ω. Let Nf denote the collection of all finite subsets
of N; then F0 consists of Nf and the complements of members of Nf (with
respect to ), the latter all containing ω. Let 0  n < 1 for all n 2 N; a
measure is defined on F0 as follows:

A D n if A 2 Nf ; Ac  D   A.
n2A

We must still 
define . Observe that by the properties of a measure, we
have  ½ n2N n D s, say.
3 MEASURES IN R 387

Ł
Now we use Definition 3 to determine the outer measure . It is easy
to see that for any A ² N, we have

Ł
A D n.
n2A
Ł
In particular N D s. Next we have
Ł
ω D inf Ac  D   sup A D   s
A2Nf A2Nf

provided s < 1; otherwise the inf above is 1. Thus we have


Ł Ł
ω D   s if  < 1; ω D 1 if  D 1.
It follows that for any A ² :

Ł Ł
A D n
n2A

where Ł n D n for n 2 N. Thus Ł is a measure on S , namely, F Ł D S .


But it is obvious that F D S since F contains N as countable union and
so contains ω as complement. Hence F D F Ł D S .
If  D 1 and s D 1, the extension Ł of to S is not unique,
because we can define ω to be any positive number and get an exten-
sion. Thus is not -finite on F0 , by Theorem 3. But we can verify this
directly when  D 1, whether s D 1 or s < 1. Thus in the latter case,
ω D 1 is also the unique extension of from F0 to S . This means that
the condition of -finiteness on F0 is only a sufficient and not a necessary
condition for the unique extension.
As a ramification of the example above, let  D N [ ω1 [ ω2 , with two
extra points adjoined to N, but keep F0 as before. Then F DF Ł  is strictly
smaller than S because neither ω1 nor ω2 belongs to it. From Definition 3 we
obtain
Ł
ω1 [ ω2  D Ł ω1  D Ł ω2 .
Thus Ł is not even two-by-two additive on S unless the three quantities
above are zero. The two points ω1 and ω2 form an inseparable couple. We
leave it to the curious reader to wonder about other possibilities.

3 Measures in R
Let R D 1, C1 be the set of real members, alias the real line, with its
Euclidean topology. For 1  a < b  C1,
12 a, b] D fx 2 R: a < x  bg
388 SUPPLEMENT: MEASURE AND INTEGRAL

is an interval of a particular shape, namely open at left end and closed at right
end. For b D C1, a, C1] D a, C1 because C1 is not in R. By choice
of the particular shape, the complement of such an interval is the union of
two intervals of the same shape:
a, b]c D 1, a] [ b, 1].
When a D b, of course a, a] D  is the empty set. A finite or countably
infinite number of such intervals may merge end to end into a single one as
illustrated below:
1  $
1 1
13 0, 2] D 0, 1] [ 1, 2]; 0, 1] D , .
nD1
nC1 n

Apart from this possibility, the representation of a, b] is unique.


The minimal Borel field containing all a, b] will be denoted by B and
called the Borel field of R. Since a bounded open interval is the countable
union of intervals like a, b], and any open set in R is the countable union
of (disjount) bounded open intervals, the Borel field B contains all open
sets; hence by complementation it contains all closed sets, in particular all
compact sets. Starting from one of these collections, forming countable union
and countable intersection successively, a countable number of times, one can
build up B through a transfinite induction.
Now suppose a measure m has been defined on B, subject to the sole
assumption that its value for a finite (alias bounded) interval be finite, namely
if 1 < a < b < C1, then
14 0  ma, b] < 1.
We associate a point function F on R with the set function m on B, as follows:
15 F0 D 0; Fx D m0, x] for x > 0; Fx D mx, 0] for x < 0.
This function may be called the “generalized distribution” for m. We see that
F is finite everywhere owing to (14), and
16 ma, b] D Fb  Fa.
F is increasing (viz. nondecreasing) in R and so the limits
FC1 D lim Fx  C1, F1 D lim Fx ½ 1
x!C1 x!1

both exist. We shall write 1 for C1 sometimes. Next, F has unilateral limits
everywhere, and is right-continuous:
Fx  Fx D FxC.
3 MEASURES IN R 389

The right-continuity follows from the monotone limit properties e and f
of m and the primary assumption (14). The measure of a single point x is
given by
mx D Fx  Fx.

We shall denote a point and the set consisting of it (singleton) by the same
symbol.
The simplest example of F is given by Fx  x. In this case F is continu-
ous everywhere and (14) becomes
ma, b] D b  a.
We can replace a, b] above by a, b, [a, b or [a, b] because mx D 0 for
each x. This measure is the length of the line-segment from a to b. It was
in this classic case that the following extension was first conceived by Émile
Borel (1871–1956).
We shall follow the methods in §§1–2, due to H. Lebesgue and
C. Carathéodory. Given F as specified above, we are going to construct a
measure m on B and a larger Borel field BŁ that fulfills the prescription (16).
The first step is to determine the minimal field B0 containing all a, b].
Since a field is closed under finite union, it must contain all sets of the form
n
17 BD Ij , Ij D aj , bj ], 1  j  n; n 2 N.
jD1

Without loss of generality, we may suppose the intervals Ij to be disjoint, by


merging intersections as illustiated by
1, 3] [ 2, 4] D 1, 4].
Then it is clear that the complement Bc is of the same form. The union of
two sets like B is also of the same form. Thus the collection of all sets like
B already forms a field and so it must be B0 . Of course it contains (includes)
the empty set  D a, a] and R. However, it does not contain any a, b except
R, [a, b, [a, b], or any single point!
Next we define a measure m on B0 satisfying (16). Since the condition
d in Definition 2 requires it to be finitely additive, there is only one way:
for the generic B in (17) with disjoint Ij we must put

n 
n
18 mB D mIj  D Fbj   Faj .
jD1 jD1

Having so defined m on B0 , we must now prove that it satisfies the condi-


tion d in toto, in order to proclaim it to be a measure on B0 . Namely, if
390 SUPPLEMENT: MEASURE AND INTEGRAL

fBk , 1  k  l  1g is a finite or countable sequence of disjoint sets in B0 ,


we must prove
 l 
 l
19 m Bk D mBk .
kD1 kD1

whenever l is finite, and moreover when l D 1 and the union [1 kD1 Bk happens
to be in B0 .
The case for a finite l is really clear. If each Bk is represented as in
(17), then the disjoint union of a finite number of them is represented in a
similar manner by pooling together all the disjoint Ij ’s from the Bk ’s. Then the
equation (19) just means that a finite double array of numbers can be summed
in two orders.
If that is so easy, what is the difficulty when l D 1? It turns out, as
Borel saw clearly, that the crux of the matter lies in the following fabulous
“banality.”

Borel’s lemma. If 1  a < b  C1 and


1
20 a, b] D aj , bj ],
jD1

where aj < bj for each j, and the intervals aj , bj ] are disjoint, then we have
1
 6 7
21 Fb  Fa D Fbj   Faj  .
jD1

PROOF. We will first give a transfinite argument that requires knowledge


of ordinal numbers. But it is so intuitively clear that it can be appreciated
without that prerequisite. Looking at (20) we see there is a unique index
j such that bj D b; name that index k and rename ak as c1 . By removing
ak , bk ] D c1 , b] from both sides of (20) we obtain
1
22 a, c1 ] D aj , bj ].
jD1
j6Dk

This small but giant step shortens the original a, b] to a, c1 ]. Obviously we
can repeat the process and shorten it to a, c2 ] where a  c2 < c1 D b, and so
by mathematical induction we obtain a sequence a  cn < Ð Ð Ð < c2 < c1 D b.
Needless to say, if for some n we have cn D a, then we have accom-
plished our purpose, but this cannot happen under our specific assumptions
because we have not used up all the infinite number of intervals in the union.
3 MEASURES IN R 391

Therefore the process must go on ad infinitum. Suppose then cn > cnC1 for
all n 2 N, so that cω D limn # cn exists, then cω ½ a. If cω D a (which can
easily happen, see (13)), then we are done and (21) follows, although the terms
in the series have been gathered step by step in a (possibly) different order.
What if cω > a? In this case there is a unique j such that bj D cω ; rename
the corresponding aj as cω1 . We have now

1
23 a, cω ] D aj0 , bj0 ],
jD1

where the aj0 , bj0 ]’s are the leftovers from the original collection in (20) after
an infinite number of them have been removed in the process. The interval
cω1 , cω ] is contained in the reduced new collection and we can begin a new
process by first removing it from both sides of (23), then the next, to be
denoted by [cω2 , cω1 ], and so on. If for some n we have cωn D a, then (21)
is proved because at each step a term in the sum is gathered. Otherwise there
exists the limit lim # cωn D cωω ½ a. If cωω D a, then (21) follows in the limit.
n
Otherwise cωω must be equal to some bj (why?), and the induction goes on.
Let us spare ourselves of the cumbersome notation for the successive well-
ordered ordinal numbers. But will this process stop after a countable number
of steps, namely, does there exist an ordinal number ˛ of countable cardinality
such that c˛ D a? The answer is “yes” because there are only countably many
intervals in the union (20).
The preceding proof (which may be made logically formal) reveals the
possibly complex structure hidden in the “order-blind” union in (20). Borel in
his Thèse (1894) adopted a similar argument to prove a more general result
that became known as his Covering Theorem (see below). A proof of the latter
can be found in any text on real analysis, without the use of ordinal numbers.
We will use the covering theorem to give another proof of Borel’s lemma, for
the sake of comparison (and learning).
This second proof establishes the equation (21) by two inequalities in
opposite direction. The first inequality is easy by considering the first n terms
in the disjoint union (20):


n
Fb  Fa ½ Fbj   Faj .
jD1

As n goes to infinity we obtain (21) with the “D” replaced by “½”.


The other half is more subtle: the reader should pause and think why?
The previous argument with ordinal numbers tells the story.
392 SUPPLEMENT: MEASURE AND INTEGRAL

Borel’s covering theorem. Let [a, b] be a compact interval, and aj , bj ,


j 2 N, be bounded open intervals, which may intersect arbitrarily, such that
1
24 [a, b] ² aj , bj .
jD1

Then there exists a finite integer l such that when l is substituted for 1 in
the above, the inclusion remains valid.
In other words, a finite subset of the original infinite set of open inter-
vals suffices to do the covering. This theorem is also called the Heine–Borel
Theorem; see Hardy [1] (in the general Bibliography) for two proofs by Besi-
covitch.

To apply (24) to (20), we must alter the shape of the intervals aj , bj ] to
fit the picture in (24).
Let 1 < a < b < 1; and  > 0. Choose a0 in a, b, and for each j
choose bj0 > bj such that

 
25 Fa0   Fa < ; Fbj0   Fbj  < .
2 2jC1
These choices are possible because F is right continuous; and now we have
1
[a0 , b] ² aj , bj0 
jD1

as required in (24). Hence by Borel’s theorem, there exists a finite l such that
l
26 [a0 , b] ² aj , bj0 .
jD1

From this it follows “easily” that


l
0
27 Fb  Fa   Fbj0   Faj .
jD1

We will spell out the proof by induction on l. When l D 1 it is obvious.


Suppose the assertion has been proved for l  1, l ½ 2. From (26) as written,
there is k, 1  k  l, such that ak < a0 < bk0 and so

28 Fbk0   Fa0   Fbk0   Fak .


3 MEASURES IN R 393

If we intersect both sides of (26) with the complement of ak , bk0 , we obtain
l
[bk0 , b] ² aj , bj0 .
jD1
j6Dk

Here the number of intervals on the right side is l  1; hence by the induction
hypothesis we have

l
Fb  Fbk0   Fbj0   Faj .
jD1
j6Dk

Adding this to (28) we obtain (27), and the induction is complete. It follows
from (27) and (25) that

l
Fb  Fa  Fbj   Faj  C .
jD1

Beware that the l above depends on . However, if we change l to 1 (back to


infinity!) then the infinite series of course does not depend on . Therefore we
can let  ! 0 to obtain (21) when the “D” there is changed to “ 00 , namely
the other half of Borel’s lemma, for finite a and b.
It remains to treat the case a D 1 and/or b D C1. Let
1
1, b] ² aj , bj ].
jD1

Then for any a in 1, b, (21) holds with “D” replaced by “”. Letting
a ! 1 we obtain the desired result. The case b D C1 is similar. Q.E.D.

 In the following, all I with subscripts denote intervals of the shape a, b];
denotes union of disjoint sets. Let B 2 B0 ; Bj 2 B0 , j 2 N. Thus

n 
nj
BD Ii ; Bj D Ijk .
iD1 kD1

Suppose
1

BD Bj
jD1

so that

n 1 
 nj
29 Ii D Ijk .
iD1 jD1 kD1
394 SUPPLEMENT: MEASURE AND INTEGRAL

We will prove


n 1 
 nj
30 mIi  D mIjk .
iD1 jD1 kD1

For n D 1, (29) is of the form (20) since a countable set of sets can be
ordered as a sequence. Hence (30) follows by Borel’s lemma. In general,
simple geometry shows that each Ii in (29) is the union of a subcollection of
the Ijk ’s. This is easier to see if we order the Ii ’s in algebraic order and, after
merging where possible, separate them at nonzero distances. Therefore (30)
follows by adding n equations, each of which results from Borel’s lemma.
This completes the proof of the countable additivity of m on B0 , namely
(19) is true as stipulated there for l D 1 as well as l < 1.
The general method developed in §1 can now be applied to R, B0 , m.
Substituting B0 for F0 , m for in Definition 3, we obtain the outer measure
mŁ . It is remarkable that the countable additivity of m on B0 , for which two
painstaking proofs were given above, is used exactly in one place, at the begin-
ning of Theorem 1, to prove that mŁ D m on B0 . Next, we define the Borel
field BŁ as in Definition 4. By Theorem 6, R, BŁ , mŁ  is a complete measure
space. By Definition 5, m is -finite on B0 because n, n] " 1, 1 as
n " 1 and mn, n] is finite by our primary assumption (14). Hence by
Theorem 3, the restriction of mŁ to B is the unique extension of m from B0
to B.
In the most important case where Fx  x, the measure m on B0 is the
length: ma, b] D b  a. It was Borel who, around the turn of the twentieth
century, first conceived of the notion of a countably additive “length” on an
extensive class of sets, now named after him: the Borel field B. A member of
this class is called a Borel set. The larger Borel field BŁ was first constructed
by Lebesgue from an outer and an inner measure (see pp. 28–29 of main
text). The latter was later bypassed by Carathéodory, whose method is adopted
here. A member of BŁ is usually called Lebesgue-measurable. The intimate
relationship between B and BŁ is best seen from Theorem 7.
The generalization to a generalized distribution function F is sometimes
referred to as Borel–Lebesgue–Stieltjes. See §2.2 of the main text for the
special case of a probability distribution.
The generalization to a Euclidean space of higher dimension presents no
new difficulty and is encumbered with tedious geometrical “baggage”.
It can be proved that the cardinal number of all Borel sets is that of the
real numbers (viz. all points in R), commonly denoted by C (the continuum).
On the other hand, if Z is a Borel set of cardinal C with mZ D 0, such
as the Cantor ternary set (p. 13 of main text), then by the remark preceding
Theorem 6, all subsets of Z are Lebesgue-measurable and hence their totality
4 INTEGRAL 395

has cardinal 2C which is strictly greater than C (see e.g. [3]). It follows that
there are incomparably more Lebesgue-measurable sets than Borel sets.
It is however not easy to exhibit a set in BŁ but not in B; see Exercise
No. 15 on p. 15 of the main text for a clue, but that example uses a non-
Lebesgue-measurable set to begin with.
Are there non-Lebesgue-measurable sets? Using the Axiom of Choice,
we can “define” such a set rather easily; see example [3] or [5]. However, Paul
Cohen has proved that the axiom is independent of the other logical axioms
known as Zermelo–Fraenkel system commonly adopted in mathematics; and
Robert Solovay has proved that in a certain model without the axiom of
choice, all sets of real numbers are Lebesgue-measurable. In the notation of
Definition 1 in §1 in this case, BŁ D S and the outer measure mŁ is a measure
on S .
N.B. Although no explicit invocation is made of the axiom of choice in
the main text of this book, a weaker version of it under the prefix “countable”
must have been casually employed on the q.t. Without the latter, allegedly it is
impossible to show that the union of a countable collection of countable sets
is countable. This kind of logical finesse is beyond the scope of this book.

4 Integral
The measure space , F ,  is fixed. A function f with domain  and
range in RŁ D [1, C1] is called F -measurable iff for each real number c
we have
ff  cg D fω 2 : fω  cg 2 F .
We write f 2 F in this case. It follows that for each set A 2 B, namely a
Borel set, we have
ff 2 Ag 2 F ;
and both ff D C1g and ff D 1g also belong to F . Properties of measur-
able functions are given in Chapter 3, although the measure there is a proba-
bility measure.
A function f 2 F with range a countable set in [0, 1] will be called
a basic function. Let faj g be its range (which may include “1”), and Aj D
ff D aj g. Then the Aj ’s are disjoint sets with union  and

31 fD aj 1Aj
j

where the sum is over a countable set of j.


We proceed to define an integral for functions in F , in three stages,
beginning with basic functions.
396 SUPPLEMENT: MEASURE AND INTEGRAL

DEFINITION 8(a). For the basic function f in (31), its integral is defined
to be

32 Ef D aj Aj 
j

and is also denoted by


 
fd D fω dω.


If a term in (32) is 0.1 or 1.0, it is taken to be 0. In particular if f  0, then


E0 D 0 even if  D 1. If A 2 F and A D 0, then the basic function
1.1A C 0.1Ac
has integral equal to
1.0 C 0. Ac  D 0.

We list some of the properties of the integral.


(i) Let fBj g be a countable set of disjoint sets in F , with union  and fbj g
arbitrary positive numbers or 1, not necessarily distinct. Then the function

33 bj 1Bj
j

is basic, and its integral is equal to



bj Bj .
j

PROOF. Collect all equal bj ’s into aj and the corresponding Bj ’s into Aj


as in (31). The result follows from the theorem on double series of positive
terms that it may be summed in any order to yield a unique sum, possibly C1.
(ii) If f and g are basic and f  g, then
Ef  Eg.
In particular if Ef D C1, then Eg D C1.
PROOF. Let f be as in (31) and g as in (33). The doubly indexed set
fAj \ Bk g are disjoint and their union is . We have using (i):

Ef D aj Aj \ Bk ;
j k

Eg D bk Aj \ Bk .
k j
4 INTEGRAL 397

The order of summation in the second double series may be reversed, and the
result follows by the countable additivity of .
(iii) If f and g are basic functions, a and b positive numbers, then af C
bg is basic and
Eaf C bg D aEf C bEg.
PROOF. It is trivial that af is basic and
Eaf D aEf.
Hence it is sufficient to prove the result for a D b D 1. Using the double
decomposition in (ii), we have

Ef C g D aj C bk  Aj \ Bk .
j k

Splitting the double series in two and then summing in two orders, we obtain
the result.
It is good time to state a general result that contains the double series
theorem used above and some other version of it that will be used below.

Double Limit Lemma. Let fCjk ; j 2 N, k 2 Ng be a doubly indexed array


of real numbers with the following properties:

(a) for each fixed j, the sequence fCjk ; k 2 Ng is increasing in k;


(b) for each fixed k, the sequence fCjk ; j 2 Ng is increasing in j.

Then we have
lim " lim " Cjk D lim " lim " Cjk  C1.
j k k j

The proof is surprisingly simple. Both repeated limits exist by funda-


mental analysis. Suppose first that one of these is the finite number C. Then
for any  > 0, there exist j0 and k0 such that Cj0 k0 > C  . This implies
that the other limit > C  . Since  is arbitrary and the two indices are
interchangeable, the two limits must be equal. Next if the C above is C1,
then changing C   into 1 finishes the same argument.
As an easy exercise, the reader should derive the cited theorem on double
series from the Lemma.
Let A 2 F and f be a basic function. Then the product 1A f is a basic
function and its integral will be denoted by
 
34 EA; f D fω dω D fd .
A A
398 SUPPLEMENT: MEASURE AND INTEGRAL

(iv) Let An 2 F , An ² AnC1 for all n and A D [n An . Then we have


35 lim EAn ; f D EA; f.
n

PROOF. Denote f by (33), so that 1A f D j bj 1ABj . By (i),

EA; f D bj ABj 
j

with a similar equation


m 1 A is replaced by An . Since An Bj  " ABj 
where
as n " 1, and jD1 " jD1 as m " 1, (35) follows by the double limit
theorem.
Consider now an increasing sequence ffn g of basic functions, namely,
fn  fnC1 for all n. Then f D limn " fn exists and f 2 F , but of course f
need not be basic; and its integral has yet to be defined. By property (ii), the
numerical sequence Efn  is increasing and so limn " Efn  exists, possibly
equal to C1. It is tempting to define Ef to be that limit, but we need the
following result to legitimize the idea.

Theorem 8. Let ffn g and fgn g be two increasing sequences of basic func-
tions such that
36 lim " fn D lim " gn
n n

(everywhere in ). Then we have


37 lim " Efn  D lim " Egn .
n n

PROOF. Denote the common limit function in (36) by f and put


A D fω 2 : fω > 0g,
then A 2 F . Since 0  gn  f, we have 1Ac gn D 0 identically; hence by prop-
erty (iii):
38 Egn  D EA; gn  C EAc ; gn  D EA; gn .
Fix an n and put for each k 2 N:
 
n1
Ak D ω 2 : fk ω > gn ω .
n
Since fk  fkC1 , we have Ak ² AkC1 for all k. We are going to prove that
1
39 Ak D A.
kD1
4 INTEGRAL 399

If ω 2 Ak , then fω ½ fk ω > [n  1/n]gn ω ½ 0; hence ω 2 A. On the


other hand, if ω 2 A then

lim " fk ω D fω ½ gn ω


k

and fω > 0; hence there exists an index k such that


n1
fk ω > gn ω
n
and so ω 2 Ak . Thus (39) is proved. By property (ii), since
 
n1
fk ½ 1Ak fk ½ 1Ak gn
n
we have
n1
Efk  ½ EAk ; fk  ½ EAk ; gn .
n
Letting k " 1, we obtain by property (iv):
n1
lim " Efk  ½ lim EAk ; gn 
k n k

n1 n1
D EA; gn  D Egn 
n n
where the last equation is due to (38). Now let n " 1 to obtain

lim " Efk  ½ lim " Egn .


k n

Since ffn g and fgn g are interchangeable, (37) is proved.

Corollary. Let fn and f be basic functions such that fn " f, then Efn  "
Ef.

PROOF. Take gn D f for all n in the theorem.


The class of positive F -measurable functions will be denoted by FC .
Such a function can be approximated in various ways by basic functions. It is
nice to do so by an increasing sequence, and of course we should approximate
all functions in FC in the same way. We choose a particular mode as follows.
Define a function on [0, 1] by the (uncommon) symbol ) ]:

0] D 0; 1] D 1;
x] D n  1 for x 2 n  1, n], n 2 N.
400 SUPPLEMENT: MEASURE AND INTEGRAL

Thus ] D 3, 4] D 3. Next we define for any f 2 FC the approximating


sequence ffm g, m 2 N, by
2m fω]
40 fm ω D .
2m
Each fm is a basic function with range in the set of dyadic (binary) numbers:
fk/2m g where k is a nonnegative integer or 1. We have fm  fmC1 for
all m, by the magic property of bisection. Finally fm " f owing to the
left-continuity of the function x !x].

DEFINITION 8(b). For f 2 FC , its integral is defined to be

41 Ef D lim " Efm .


m

When f is basic, Definition 8(b) is consistent with 8(a), by Corollary to


Theorem 8. The extension of property (ii) of integrals to FC is trivial, because
f  g implies fm  gm . On the contrary, f C gm is not fm C gm , but
since fm C gm " f C g, it follows from Theorem 8 that

lim " Efm C gm  D lim " Ef C gm 


m m

that yields property (iii) for FC , together with Eafm  " aEf, for a ½ 0.

Property (iv) for FC will be given in an equivalent form as follows.


(iv) For f 2 FC , the function of sets defined on F by

A ! EA; f

is a measure.
PROOF. We need only prove that if A D [1
nD1 An where the An ’s are
disjoint sets in F , then
1

EA; f D EAn ; f.
nD1

For a basic f, this follows from properties (iii) and (iv). The extension to FC
can be done by the double limit theorem and is left as an exercise.

There are three fundamental theorems relating the convergence of func-


tions with the convergence of their integrals. We begin with Beppo Levi’s
theorem on Monotone Convergence (1906), which is the extension of Corol-
lary to Theorem 8 to FC .
4 INTEGRAL 401

Theorem 9. Let ffn g be an increasing sequence of functions in FC with


limit f: fn " f. Then we have
lim " Efn  D Ef  C1.
n
We have f 2 FC ; hence by Definition 8(b), (41) holds. For each
PROOF.
fn , we have, using analogous notation:
42 lim " Efm
n  D Efn .
m

Since fn " f, the numbers 2m fn ω] "2m fω] as n " 1, owing to the
left continuity of x !x]. Hence by Corollary to Theorem 8,
43 lim " Efm
n  D Ef
m
.
n

It follows that
lim " lim " Efm
n  D lim " Ef
m
 D Ef.
m n m

On the other hand, it follows from (42) that


lim " lim " Efm
n  D lim " Efn .
n m n

Therefore the theorem is proved by the double limit lemma.


From Theorem 9 we derive Lebesgue’s theorem in its pristine positive
guise.

Theorem 10. Let fn 2 FC , n 2 N. Suppose

(a) limn fn D 0;
(b) Esupn fn  < 1.

Then we have
44 lim Efn  D 0.
n
PROOF. Put for n 2 N:
45 gn D sup fk .
k½n

Then gn 2 FC , and as n " 1, gn # lim supn fn D 0 by (a); and g1 D supn fn


so that Eg1  < 1 by (b).
Now consider the sequence fg1  gn g, n 2 N. This is increasing with limit
g1 . Hence by Theorem 9, we have
lim " Eg1  gn  D Eg1 .
n
402 SUPPLEMENT: MEASURE AND INTEGRAL

By property (iii) for FC ,

Eg1  gn  C Egn  D Eg1 .

Substituting into the preceding relation and cancelling the finite Eg1 , we
obtain Egn  # 0. Since 0  fn  gn , so that 0  Efn   Egn  by property
(ii) for FC , (44) follows.
The next result is known as Fatou’s lemma, of the same vintage 1906
as Beppo Levi’s. It has the virtue of “no assumptions” with the consequent
one-sided conclusion, which is however often useful.

Theorem 11. Let ffn g be an arbitrary sequence of functions in fC . Then


we have

46 Elim inf fn   lim inf Efn .


n n

PROOF. Put for n 2 N:


gn D inf fk ,
k½n

then
lim inf fn D lim " gn .
n n

Hence by Theorem 9,

47 Elim inf fn  D lim " Egn .


n n

Since gn  fn , we have Egn   Efn  and

lim inf Egn   lim inf Efn .


n n

The left member above is in truth the right member of (47); therefore (46)
follows as a milder but neater conclusion.
We have derived Theorem 11 from Theorem 9. Conversely, it is easy to
go the other way. For if fn " f, then (46) yields Ef  limn " Efn . Since
f ½ fn , Ef ½ limn " Efn ; hence there is equality.
We can also derive Theorem 10 directly from Theorem 11. Using the
notation in (45), we have 0  g1  fn  g1 . Hence by condition (a) and (46),
Eg1  D Elim infg1  fn   lim infEg1   Efn 
n n

D Eg1   lim sup Efn 


n

that yields (44).


4 INTEGRAL 403

The three theorems 9, 10, and 11 are intimately woven.


We proceed to the final stage of the integral. For any f 2 F with range
in [1, C1], put
 
C f on ff ½ 0g,  f on ff  0g,
f D f D
0 on ff < 0g; 0 on ff > 0g.
Then fC 2 FC , f 2 FC , and
f D fC  f ; jfj D fC C f
By Definition 8(b) and property (iii):
48 Ejfj D EfC  C Ef .

DEFINITION 8(c). For f 2 F , its integral is defined to be


49 Ef D EfC   Ef ,
provided the right side above is defined, namely not 1  1. We say f is
integrable, or f 2 L 1 , iff both EfC  and Ef  are finite; in this case Ef
is a finite number. When Ef exists but f is not integrable, then it must be
equal to C1 or 1, by (49).
A set A in  is called a null set iff A 2 F and A D 0. A mathematical
proposition is said to hold almost everywhere, or a.e. iff there is a null set A
such that it holds outside A, namely in Ac .
A number of important observations are collected below.

Theorem 12. (i) The function f in F is integrable if and only if jfj is


integrable; we have
50 jEfj  Ejfj.
(ii) For any f 2 F and any null set A, we have
 
51 EA; f D f d D 0; Ef D EAc ; f D fd .
A Ac

(iii) If f 2 L 1 , then the set fω 2 : jfωj D 1g is a null set.


(iv) If f 2 L 1 , g 2 F , and jgj  jfj a.e., then g 2 L 1 .
(v) If f 2 F , g 2 F , and g D f a.e., then Eg exists if and only if Ef
exists, and then Eg D Ef.
(vi) If  < 1, then any a.e. bounded F -measurable function is inte-
grable.
PROOF. (i) is trivial from (48) and (49); (ii) follows from
1A jfj  1A .1
404 SUPPLEMENT: MEASURE AND INTEGRAL

so that
0  E1A jfj  E1A .1 D A.1 D 0.
This implies (51).
To prove (iii), let
An D fjfj ½ ng.
Then An 2 F and
n An D EAn; n  EAn; jfj  Ejfj.
Hence
1
52 An  Ejfj.
n
Letting n " 1, so that An # fjfj D 1g; since A1  Ejfj < 1, we
have by property (f) of the measure :
fjfj D 1g D lim # An D 0.
n

To prove (iv), let jgj  jfj on A , where


c
A D 0. Then
jgj  1A .1 C 1Ac .jfj
and consequently
Ejgj  A.1 C EAc ; jfj  0.1 C Ejfj D Ejfj.
Hence g 2 L 1 if f 2 L 1 .
The proof of (v) is similar to that of (iv) and is left as an exercise. The
assertion (vi) is a special case of (iv) since a constant is integrable when
 < 1.

Remark. A case of (52) is known as Chebyshev’s inequality; see p. 48


of the main text. Indeed, it can be strengthened as follows:
53 lim n An  lim EAn; jfj D 0.
n n

This follows from property (f) of the measure


A ! EA; jfj;
see property (iv) of the integral for FC .
There is also a strengthening of (ii), as follows.
If Bk 2 F and Bk  ! 0 as k ! 1, then
lim EBk ; f D 0.
k
4 INTEGRAL 405

To prove this we may suppose, without loss of generality, that f 2 FC . We


have then
EBk ; f D EBk \ An; f C EBk \ Anc ; f
 EAn; f C EBk n.
Hence
lim sup EBk ; f  EAn; f
k

and the result follows by letting n ! 1 and using (53).


It is convenient to define, for any f in F , a class of functions denoted
by Cf, as follows: g 2 Cf iff g D f a.e. When , F ,  is a complete
measure space, such a g is automatically in F . To see this, let B D fg 6D fg.
Our definition of “a.e.” means only that B is a subset of a null set A; in plain
English this does not say whether g is equal to f or not anywhere in A  B.
However if the measure is complete, then any subset of a null set is also a
null set, so that not only the set B but all its subsets are null, hence in F .
Hence for any real number c,
fg  cg D fg D f; g  cg [ fg 6D f; g  cg
belongs to F , and so g 2 F .
A member of Cf may be called a version of f, and may be substituted
for f wherever a null set “does not count”. This is the point of (iv) and (v) in
Theorem 12. Note that when the measure space is complete, the assumption
“g 2 F ” there may be omitted. A particularly simple version of f is the
following finite version:

f on fjfj < 1g,
fD
0 on fjfj D 1g;
where 0 may be replaced by some other number, e.g., by 1 in Elog f.
In functional analysis, it is the class Cf rather than an individual f
that is a member of L 1 .
As examples of the preceding remarks, let us prove properties (ii) and
(iii) for integrable functions.
(ii) if f 2 L 1 , g 2 L 1 , and f  g a.e., then
Ef  Eg.
PROOF. We have, except on a null set:
fC  f  gC  g
but we cannot transpose terms that may be C1! Now substitute finite versions
of f and g in the above (without changing their notation) and then transpose
406 SUPPLEMENT: MEASURE AND INTEGRAL

as follows:
fC C g  gC C f .

Applying properties (ii) and (iii) for FC , we obtain

EfC  C Eg   EgC  C Ef .

By the assumptions of L 1 , all the four quantities above are finite numbers.
Transposing back we obtain the desired conclusion.
(iii) if f 2 L 1 , g 2 L 1 , then f C g 2 L 1 , and

Ef C g D Ef C Eg.

Let us leave this as an exercise. If we assume only that both Ef and
Eg exist and that the right member in the equation above is defined, namely
not C1 C 1 or 1 C C1, does Ef C g then exist and equal
to the sum? We leave this as a good exercise for the curious, and return to
Theorem 10 in its practical form.
Theorem 101 . Let fn 2 F ; suppose
(a) limn fn D f a.e.;
(b) there exists ϕ 2 L 1 such that for all n:

jfn j  ϕ a.e.

Then we have
(c) limn Ejfn  fj D 0.
PROOF. observe first that
j lim fn j  sup jfn j;
n n

jfn  fj  jfn j C jfj  2 sup jfn j;


n

provided the left members are defined. Since the union of a countable collec-
tion of null sets is a null set, under the hypotheses (a) and (b) there is a null set
A such that on   A, we have supn jfn j  ϕ hence by Theorem 12 (iv), all
jfn j, jfj, jfn  fj are integrable, and therefore we can substitute their finite
versions without affecting their integrals, and moreover limn jfn  fj D 0 on
  A. (Remember that fn  f need not be defined before the substitutions!).
By using Theorem 12 (ii) once more if need be, we obtain the conclusion (c)
from the positive version of Theorem 10.
This theorem is known as Lebesgue’s dominated convergence theorem,
vintage 1908. When  < 1, any constant C is integrable and may be used
for ϕ; hence in this case the result is called bounded convergence theorem.
5 APPLICATIONS 407

Curiously, the best known part of the theorem is the corollary below with a
fixed B.

Corollary. We have
 
lim fn d D fd
n B B

uniformly in B 2 F .

This is trivial from (c), because, in alternative notation:


jEB; fn   EB; fj  EB; jfn  fj  Ejfn  fj.
In the particular case where B D , the Corollary contains a number of useful
results such as the integration term by term of power series or Fourier series.
A glimpse of this is given below.

5 Applications
The general theory of integration applied to a probability space is summarized
in §§3.1–3.2 of the main text. The specialization to R expounded in §3 above
will now be described and illustrated.
A function f defined on R with range in [1, C1] is called a Borel
function iff f 2 B; it is called a Lebesgue-measurable function iff f 2 BŁ .
The domain of definition f may be an arbitrary Borel set or Lebesgue-
measurable set D. This case is reduced to that for D D R by extending the
definition of f to be zero outside D. The integral of f 2 BŁ corresponding
to the measure mŁ constructed from F is denoted by
 1
Ef D fx dFx.
1

In case Fx  x, this is called the Lebesgue integral of f; in this case the
usual notation is, for A 2 BŁ :

fx dx D EA; f.
A

Below are some examples of the application of preceding theorems to classic


analysis.

Example 1. Let I be a bounded interval in R; fuk g a sequence of functions on I; and


for x 2 I:
n
sn x D uk x, n 2 N.
kD1
408 SUPPLEMENT: MEASURE AND INTEGRAL


Suppose the infinite series k uk x converges I; then in the usual notation:


n 1

lim uk x D uk x D sx
n!1
kD1 kD1

exists and is finite. Now suppose each uk is Lebesgue-integrable, then so is each sn ,


by property (iii) of the integral; and
 n 

sn x dx D uk x dx.
I kD1 I

Question: does the numerical series above converge? and if so is the sum of integrals
equal to the integral of the sum:
1 
  
1 
uk x dx D uk x dx D sx dx?
kD1 I I kD1 I

This is the problem of integration term by term.


A very special but important case is when the interval I D [a, b]  is compact
and the functions uk are all continuous in I. If we assume that the series 1 kD1 uk x
converges uniformly in I, then it follows from elementary analysis that the sequence
of partial sums fsn xg is totally bounded, that is,

sup sup jsn xj D sup sup jsn xj < 1.


n x x n

Since mI < 1, the bounded convergence theorem applies to yield


 
lim sn x dx D lim sn x dx.
n I I n

The Taylor series of an analytic function always converges uniformly and abso-
lutely in any compact subinterval of its interval of convergence. Thus the result above
is fruitful.
Another example of term-by-term integration goes back to Theorem 8.

Example 2. Let uk ½ 0, uk 2 L1 , then


 b  1
 1  b

54 uk d D uk d .
a kD1 kD1 a

n 1
Let fn D kD1 uk , then fn 2 L1 , fn " f D kD1 uk . Hence by monotone conver-
gence
Ef D lim Efn 
n

that is (54).
5 APPLICATIONS 409

When uk is general the preceding result may be applied to juk j to obtain


 
1  1 

juk j d D juk j d .
kD1 kD1

If this is finite, then the same is true when juk j is replaced by ukC and uk . It then
follows by subtraction that (54) is also true. This result of term-by-term integration
may be regarded as a special case of the Fubini–Tonelli theorem (pp. 63–64), where
one of the measures is the counting measure on N.
For another perspective, we will apply the Borel–Lebesgue theory of integral to
the older Riemann context.

Example 3. Let I, BŁ , m be as in the preceding example, but let I D [a, b] be


compact. Let f be a continuous function on I. Denote by P a partition of I as follows:

a D x0 < x1 < x2 < Ð Ð Ð < xn D b;

and put
υP D max xk  xk1 .
1kn

For each k, choose a point k in [xk1 , xk ], and define a function fP as follows:



1 , for x 2 [x0 , x1 ],
fP x D
k , for x 2 xk1 , xk ], 2  k  n.

Particular choices of k are: k D fxk1 ; k D fxk ;

55 k D min fx; k D max fx.


xk1 xxk xk1 xxk

The fP is called a step function; it is an approximant of f. It is not basic by Defini-


tion 8(a) but fC 
P and fP are. Hence by Definitions 8(a) and 8(c), we have


n
EfP  D fk xk  xk1 .
kD1

The sum above is called a Riemann sum; when the k are chosen as in (55), they are
called lower and upper sums, respectively.
Now let fPn, n 2 Ng be a sequence of partitions such that υPn ! 0 as
n ! 1. Since f is continuous on a compact set, it is bounded. It follows that there
is a constant C such that
sup sup jfPn xj < C.
n2N x2I

Since I is bounded, we can apply the bounded convergence theorem to conclude that

lim EfPn  D Ef.


n
410 SUPPLEMENT: MEASURE AND INTEGRAL

The finite existence of the limit above signifies the Riemann-integrability of f, and the
b
limit is then its Riemann-integral a fx dx. Thus we have proved that a continuous
function on a compact interval is Riemann-integrable, and its Riemann-integral is equal
to the Lebesgue integral. Let us recall that in the new theory, any bounded measurable
function is integrable over any bounded measurable set. For example, the function
1
sin , x 2 0, 1]
x
being bounded by 1 is integrable. But from the strict Riemannian point of view it
has only an “improper” integral because (0, 1] is not closed and the function is not
continuous on [0, 1], indeed it is not definable there. Yet the limit
 1
1
lim sin dx
#0  x
1
exists and can be defined to be 0 sin1/x dx, As a matter of fact, the Riemann sums
do converge despite the unceasing oscillation of f between 0 and 1 as x # 0.

Example 4. The Riemann integral of a function on 0, 1 is called an “infinite


integral” and is definable as follows:
 1  n
fx dx D lim fx dx
0 n!1 0

when the limit exists and is finite. A famous example is


sin x
56 fx D , x 2 0, 1.
x
This function is bounded by 1 and is continuous. It can be extended to [0, 1 by
defining f0 D 1 by continuity. A cute calculation (see §6.2 of main text) yields the
result (useful in Optics):  n
sin x 
lim dx D .
n 0 x 2
By contrast, the function jfj is not Lebesgue-integrable. To show this, we use
trigonometry:
   
sin x C 1 1  3
½ p D Cn for x 2 2n C , 2n C D In .
x 2 2n C 1 4 4
Thus for x > 0:  C 1

sin x
½ Cn 1In x.
x nD1

The right member above is a basic function, with its integral:


  
Cn mIn  D p D C1.
n n 22n C 12
5 APPLICATIONS 411

It follows that EfC  D C1. Similarly Ef  D C1; therefore by Definition 8(c)
Ef does not exist! This example is a splendid illustration of the following
Non-Theorem.
Let f 2 B and fn D f10,n , n 2 N. Then fn 2 B and fn ! f as n ! 1.
Even when the f0n s are “totally bounded”, it does not follow that

57 lim Efn  D Ef;


n

indeed Ef may not exist.


On the other hand, if we assume, in addition, either (a) f ½ 0; or (b) Ef
exists, in particular f 2 L 1 ; then the limit relation will hold, by Theorems 9 and 10,
respectively. The next example falls in both categories.

Example 5. The square of the function f in (56):


 
sin x 2
fx2 D , x2R
x
is integrable in the Lebesgue sense, and is also improperly integrable in the Riemann
sense.
We have
1
fx2  11,C1 C 11,1[C1,C1 2
x
and the function on the right side is integrable, hence so is f2 .
Incredibly, we have
 1   1
sin x 2  sin x
dx D D RI dx,
0 x 2 0 x
where we have inserted an “RI” to warn against taking the second integral as a
Lebesgue integral. See §6.2 for the calculation. So far as I know, nobody has explained
the equality of these two integrals.

Example 6. The most notorious example of a simple function that is not Riemann-
integrable and that baffled a generation of mathematicians is the function 1Q , where Q
is the set of rational numbers. Its Riemann sums can be made to equal any real number
between 0 and 1, when we confine Q to the unit interval (0, 1). The function is so
totally discontinuous that the Riemannian way of approximating it, horizontally so to
speak, fails utterly. But of course it is ludicrous even to consider this indicator function
rather than the set Q itself. There was a historical reason for this folly: integration was
regarded as the inverse operation to differentiation, so that to integrate was meant to
“find the primitive” whose derivative is to be the integrand, for example,
 
2 d 2
x dx D , D ;
2 d 2
 
1 d 1
dx D log , log  D .
x d 
412 SUPPLEMENT: MEASURE AND INTEGRAL

2
A primitive is called “indefinite integral”, and 1 1/x dx e.g. is called a “definite

integral.” Thus the unsolvable problem was to find 0 1Q x dx, 0 <  < 1.
The notion of measure as length, area, and volume is much more ancient than
Newton’s fluxion (derivative), not to mention the primitive measure of counting with
fingers (and toes). The notion of “countable additivity” of a measure, although seem-
ingly natural and facile, somehow did not take hold until Borel saw that
 
mQ D mq D 0 D 0.
q2Q q2Q

There can be no question that the “length” of a single point q is zero. Euclid gave it
“zero dimension”.
This is the beginning of MEASURE. An INTEGRAL is a weighted measure,
as is obvious from Definition 8(a). The rest is approximation, vertically as in Defini-
tion 8(b), and convergence, as in all analysis.
As for the connexion with differentiation, Lebesgue made it, and a clue is given
in §1.3 of the main text.
General bibliography

[The five divisions below are merely a rough classification and are not meant to be
mutually exclusive.]

1. Basic analysis

[1] G. H. Hardy, A course of pure mathematics, 10th ed. Cambridge University


Press, New York, 1952.
[2] W. Rudin, Principles of mathematical analysis, 2nd ed. McGraw-Hill Book
Company, New York, 1964.
[3] I. P. Natanson, Theory of functions of a real variable (translated from the
Russian). Frederick Ungar Publishing Co., New York, 1955.

2. Measure and integration

[4] P. R. Halmos, Measure theory. D. Van Nostrand Co., Inc., Princeton, N.J.,
1956.
[5] H. L. Royden, Real analysis. The Macmillan Company, New York, 1963.
[6] J. Neveu, Mathematical foundations of the calculus of probability. Holden-
Day, Inc., San Francisco, 1965.

3. Probability theory

[7] Paul Lévy, Calcul des probabilités. Gauthier-Villars, Paris, 1925.


[8] A. Kolmogoroff, Grundbegriffe der Wahrscheinlichkeitsrechnung. Springer-
Verlag, Berlin, 1933.
414 GENERAL BIBIOGRAPHY

[9] M. Fréchet, Généralités sur les probabilités. Variables aléatoires. Gauthier-


Villars, Paris, 1937.
[10] H. Cramér, Random variables and probability distributions, 3rd ed. Cam-
bridge University Press, New York, 1970 [1st ed., 1937].
[11] Paul Lévy, Théorie de l’addition des variables aléatoires, 2nd ed. Gauthier-
Villars, Paris, 1954 [1st ed., 1937].
[12] B. V. Gnedenko and A. N. Kolmogorov, Limit distributions for sums of inde-
pendent random variables (translated from the Russian). Addison-Wesley Publishing
Co., Inc., Reading, Mass., 1954.
[13] William Feller, An introduction to probability theory and its applications,
vol. 1 (3rd ed.) and vol. 2 (2nd ed.). John Wiley & Sons, Inc., New York, 1968 and
1971 [1st ed. of vol. 1, 1950].
[14] Michel Loève, Probability theory, 3rd ed. D. Van Nostrand Co., Inc., Prince-
ton, N.J., 1963.
[15] A. Rényi, Probability theory. North-Holland Publishing Co., Amsterdam,
1970.
[16] Leo Breiman, Probability. Addison-Wesley Publishing Co., Reading, Mass.,
1968.

4. Stochastic processes

[17] J. L. Doob, Stochastic processes. John Wiley & Sons, Inc., New York, 1953.
[18] Kai Lai Chung, Markov chains with stationary transition probabilities, 2nd
ed. Springer-Verlag, Berlin, 1967 [1st ed., 1960].
[19] Frank Spitzer, Principles of random walk. D. Van Nostrand Co., Princeton,
N.J., 1964.
[20] Paul–André Meyer, Probabilités et potential. Herman (Editions Scienti-
fiques), Paris, 1966.
[21] G. A. Hunt, Martingales et processus de Markov. Dunod, Paris, 1966.
[22] David Freedman, Brownian motion and diffusion. Holden-Day, Inc., San
Francisco, 1971.

5. Supplementary reading

[23] Mark Kac, Statistical independence in probability, analysis and number


theory, Carus Mathematical Monograph 12. John Wiley & Sons, Inc., New York,
1959.
[24] A. Rényi, Foundations of probability. Holden-Day, Inc., San Francisco,
1970.
[25] Kai Lai Chung, Elementary probability theory with stochastic processes.
Springer-Verlag, Berlin, 1974.
Index

A

Abelian theorem, 292
Absolutely continuous part of d.f., 12
Adapted, 335
Additivity of variance, 108
Adjunction, 25
a.e., see Almost everywhere
Algebraic measure space, 28
Almost everywhere, 30
Almost invariant, permutable, remote, trivial, 266
Approximation lemmas, 91, 264
Arcsin law, 305
Arctan transformation, 75
Asymptotic density, 24
Asymptotic expansion, 236
Atom, 31, 32
Augmentation, 32
Axiom of continuity, 22

B

B0, B, B*, 389
Ballot problem, 232
Basic function, 395
Bayes' rule, 320
Belonging (of function to B.F.), 263
Beppo Levi's theorem, 401
Bernstein's theorem, 200
Berry–Esseen theorem, 235
B.F., see Borel field
Bi-infinite sequence, 270
Binary expansion, 60
Bochner–Herglotz theorem, 187
Boole's inequality, 21
Borel–Cantelli lemma, 80
Borel field, 18
  generated, 18
Borel field on R, 388, 394
Borel–Lebesgue measure, 24, 389
Borel–Lebesgue–Stieltjes measure, 394
Borel measurable function, 37
Borel's Covering Theorem, 392
Borel set, 24, 38, 394
Borel's Lemma, 390
Borel's theorem on normal numbers, 109
Boundary value problem, 344
Bounded convergence theorem, 44
Branching process, 371
Brownian motion, 129, 227

C

CK, C0, CB, C, 91
C1B, CU, 157
Canonical representation of infinitely divisible laws, 258
Cantor distribution, 13–15, 129, 174
Carathéodory criterion, 28, 389, 394
Carleman's condition, 103
Cauchy criterion, 70
Cauchy distribution, 156
Cauchy–Schwarz inequality, 50, 317
Central limit theorem, 205
  classical form, 177
  for dependent r.v.'s, 224
  for random number of terms, 226
  general form, 211
  Liapounov form, 208
  Lindeberg–Feller form, 214
  Lindeberg's method, 211
  with remainder, 235
Chapman–Kolmogorov equations, 333
Characteristic function, 150
  list of, 155–156
  multidimensional, 197
Chebyshev's inequality, 50
  strengthened, 404
ch.f., see Characteristic function
Coin-tossing, 111
Complete probability space, 30
Completely monotonic, 200
Concentration function, 160
Conditional density, 321
Conditional distribution, 321
Conditional expectation, 313
  as integration, 316
  as martingale limit, 369
Conditional independence, 322
Conditional probability, 310, 313
Continuity interval, 85, 196
Convergence a.e., 68
  in dist., 96
  in L^p, 71
  in pr., 70
  weak in L^1, 73
Convergence theorem:
  for ch.f., 169
  for Laplace transform, 200
Convolution, 152, 157
Coordinate function, 58, 264
Countable, 4

D

de Finetti's theorem, 367
De Moivre's formula, 220
Density of d.f., 12
d.f., see Distribution function
Differentiation:
  of ch.f., 175
  of d.f., 163
Diophantine approximation, 285
Discrete random variable, 39
Distinguished logarithm, nth root, 255
Distribution function, 7
  continuous, 9
  degenerate, 7
  discrete, 9
  n-dimensional, 53
  of p.m., 30
Dominated convergence theorem, 44
Dominated ergodic theorem, 362
Dominated r.v., 71
Doob's martingale convergence, 351
Doob's sampling theorem, 342
Double array, 205
Double limit lemma, 397
Downcrossing inequality, 350

E

Egorov's theorem, 79
Empiric distribution, 138
Equi-continuity of ch.f.'s, 169
Equivalent sequences, 112
Event, 54
Exchangeable event, 366
Existence theorem for independent r.v.'s, 60
Expectation, 41
Extension theorem, see Kolmogorov's extension theorem

F

F0, F, F*, 380
F*-measurability, 378
  approximation of F*-measurable set, 382
Fatou's lemma, 45, 402
Feller's dichotomy, 134
Field, 17
Finite-product set, 61
First entrance time, 273
Fourier series, 183
Fourier–Stieltjes transform, 151
  of a complex variable, 203
Fubini's theorem, 63
Functionals of Sn, 228
Function space, 265

G

Gambler's ruin, 343
Gambling system, 273, 278, see also Optional sampling
Gamma distribution, 158
Generalized Poisson distribution, 257
Glivenko–Cantelli theorem, 140

H

Harmonic analysis, 164
Harmonic equation, 344
Harmonic function, 361
Helly's extraction principle, 88
Hölder's inequality, 50
Holospoudicity, 207
Homogeneous Markov chain, process, 333

I

Identically distributed, 37
Independence, 53
  characterizations, 197, 322
Independent process, 267
Indicator, 40
In dist., see In distribution
In distribution, 96
Induced measure, 36
Infinitely divisible, 250
Infinitely often, 77, 360
Infinite product space, 61
Information theory, 148, 414
Inner measure, 28
In pr., see In probability
In probability, 70
Integrable random variable, 43
Integrable function, 403
Integral, 396, 400, 403
  additivity, 406
  convergence theorems, 401, 406–408
Integrated ch.f., 172
Integration term by term, 44, 408
Invariance principle, 228
Invariant, 266
Inverse mapping, 35
Inversion formula for ch.f., 161, 196
Inversions, 220
i.o., see Infinitely often

J

Jensen's inequality, 50, 317
Jump, 3
Jumping part of d.f., 7

K

Kolmogorov extension theorem, 64, 265
Kolmogorov's inequalities, 121, 125
Kronecker's lemma, 129

L

Laplace transform, 196
Large deviations, 214, 241
Lattice d.f., 183
Law of iterated logarithm, 242, 248
Law of large numbers
  converse, 138
  for pairwise independent r.v.'s, 134
  for uncorrelated r.v.'s, 108
  identically distributed case, 133, 295, 365
  strong, 109, 129
  weak, 114, 177
Law of small numbers, 181
Lebesgue measurable, 394
Lebesgue's dominated and bounded convergence, 401, 406
Levi's monotone convergence, 401
Lévy–Cramér theorem, 169
Lévy distance, 98
Lévy's formula, see Canonical representation
Lévy's theorem on series of independent r.v.'s, 126
  strengthened, 363
Liapounov's central limit theorem, 208
Liapounov's inequality, 50
Liminf, Limsup, 75
Lindeberg's method, 211
Lindeberg–Feller theorem, 214
Lower semicontinuous function, 95

M

Marcinkiewicz–Zygmund inequality, 362
Markov chain, homogeneous, 333
Markov process, 326
  homogeneous, 330
  of rth order, 333
Markov property, 326
  strong, 327
Martingale, 335, see also Submartingale, Supermartingale
  difference sequence, 337
  Krickeberg's decomposition, 358
  of conditional expectations, 338
  stopped, 341
Maximum of Sn, 228, 299
Maximum of submartingale, 346, 362
M.C., see Monotone class
m-dependent, 224
Mean, 49
Measurable with respect to F, 35
Measure
  Completion, 384
  Extension, 377, 380
  σ-finite, 381
  Transfinite construction, 386
  Uniqueness, 381
Measure space, 381
  discrete, 386
Median, 117
Method of moments, 103, 178
Metric
  for ch.f.'s, 172
  for convergence in pr., 71
  for d.f.'s, 98
  for measure space, 47
  for p.m.'s, 172
Minimal B.F., 18
Minkowski's inequality, 50
Moment, 49
Moment problem, 103
Monotone class theorems, 19
Monotone convergence theorem, 44, 401
Monotone property of measure, 21
Multinomial distribution, 181

N

n-dimensional distribution, 53
Negligibility, 206
Non-measurable set, 395
Normal d.f., 104
  characterization of, 180, 182
  convergence to, see Central limit theorem
  kernel, 157
  positive, 104, 232
Normal number, 109, 112
Norming, 206
Null set, 30, 383
Number of positive terms, 302
Number of zeros, 234

O

Optional r.v., 259, 338
Optional sampling, 338, 346
Order statistics, 147
Orthogonal, 107
Outcome, 57
Outer measure, 28, 375, 377–378

P

Pairwise independence, 53
Partition, 40
Permutable, 266
p.m., see Probability measure
Poisson d.f., 194
Poisson process, 142
Pólya criterion, 191
Pólya distribution, 158
Positive definite, 187
Positive normal, see Normal
Post-α, 259
Potential, 359, 361
Pre-α, 259
Probability distribution, 36
  on circle, 107
Probability measure, 20
  of d.f., 30
Probability space, triple, 23
Product B.F., measure, space, 58
Projection, 64

Q

Queuing theory, 307

R

Radon–Nikodym theorem, 313, 369
Random variable, 34, see also Discrete, integrable, simple, symmetric random variable
Random vector, 37
Random walk, 271
  recurrent, 284
Recurrence principle, 360
Recurrent value, 278
Reflection principle, 232
Remainder term, 235
Remote event, field, 264
Renewal theory, 142
Riemann integral, 409
  improper, infinite, 410
  primitive, 411
Riemann–Lebesgue lemma, 167
Riemann–Stieltjes integral, 198
Riemann zeta function, 259
Riesz decomposition, 359
Right continuity, 5
r.v., see Random variable

S

Sample function, 141
Sample space, 23
  discrete, 23
s.d.f., see Subdistribution function
Second-order stochastic process, 191
Sequentially vaguely compact, 88
Set of convergence, 79
Shift, 265
  optional, 273
Simple random variable, 40
Singleton, 16
Singular d.f., 12
  measure, 31
Singular part of d.f., 12
Smartingale, 335
  closeable, 359
Span, 184
Spectral distribution, 191
St. Petersburg paradox, 120, 373
Spitzer's identity, 299
Squared variation, 367
Stable distribution, 193
Standard deviation, 49
Stationary independent process, 267
Stationary transition probabilities, 331
Step function, 92
Stochastic process, with discrete parameter, 263
Stone–Weierstrass theorem, 92, 198
Stopped (super)martingale, 340–341
Stopping time, see Optional sampling
Subdistribution function, 88
Submartingale, 335
  convergence theorems for, 350
  convex transformation of, 336
  Doob decomposition, 337
  inequalities, 346
Subprobability measure, 85
Subsequences, method of, 109
Superharmonic function, 361
Supermartingale, 335
  optional sampling, 340
Support of d.f., 10
Support of p.m., 32
Symmetric difference Δ, 16
Symmetric random variable, 165
Symmetric stable law, 193, 250
Symmetrization, 155
System theorem, see Gambling system, Submartingale

T

Tauberian theorem, 292
Taylor's series, 177
Three series theorem, 125
Tight, 95
Topological measure space, 28
Trace, 23
Transition probability function, 331
Trial, 57
Truncation, 115
Types of d.f., 184

U

Uncorrelated, 107
Uniform distribution, 30
Uniform integrability, 99
Uniqueness theorem
  for ch.f., 160, 168
  for Laplace transform, 199, 201
  for measure, 30
Upcrossing inequality, 348

V

Vague convergence, 85, 92
  general criterion, 96
Variance, 49

W

Wald's equation, 144, 345
Weak convergence, 73
Weierstrass theorem, 145
Wiener–Hopf technique, 291

Z

Zero-or-one law
  Hewitt–Savage, 268
  Kolmogorov, 267
  Lévy, 357