S. Axler
F. W. Gehring
K.A. Ribet
Discrete Probability
Springer
Hugh Gordon
Department of Mathematics
SUNY at Albany
Albany, NY 12222
USA
Editorial Board

S. Axler
Department of Mathematics
San Francisco State University
San Francisco, CA 94132
USA

F.W. Gehring
Department of Mathematics
East Hall
University of Michigan
Ann Arbor, MI 48109
USA

K.A. Ribet
Department of Mathematics
University of California at Berkeley
Berkeley, CA 94720
USA
ISBN 978-1-4612-7359-2
This book is dedicated
to my parents.
Preface
must believe that everything follows definite rules, which for the
most part are not known to us; thus to say something depends on
chance is to say its actual cause is hidden. According to this definition
we may say that the life of a human being is a game where chance
rules."
On the other hand, at the present time, there are those who
claim that in quantum mechanics we find situations in which there
is nothing but blind chance, with no causality behind it; all that
actually exists before we make an observation is certain probabilities.
We need not go into this point of view here, beyond noting the
following: If we see from experiments that the rules of probability
theory are followed in certain situations, for many purposes it is not
necessary to know why they are followed. Be that as it may, the
examples which appear in this book are all of the kind Montmort
had in mind. We speak of whether a coin falls heads as a random
event, because we do not have the detailed information necessary
to compute from the laws of physics how the coin will fall.
* * *
Hugh Gordon
Contents
Preface vii
1 Introduction 1
2 Counting 17
2.1 order counts, with replacement . . . . . . . 20
2.2 order counts, without replacement . . . . . 20
2.3 order does not count, without replacement 23
2.4 order does not count, with replacement . 25
4 Random Variables 75
4.1 Expected Value and Variance . . . . . . . . . 76
4.2 Computation of Expected Value and Variance 90
Answers 253
Index 263
CHAPTER 1
Introduction
But how large is "large"? Well, one million is surely large. However,
as common sense suggests, the chance of getting exactly five hun-
dred thousand heads in a million tosses is small; in Chapter Six we
shall learn just how small. (Meanwhile, which of the following do
you guess is closest: 1/13, 1/130, 1/1300, 1/130000, 1/1300000000?)
Of course, when we say "half heads," we mean approximately half.
Furthermore, no matter how many times we toss the coin, there is a
chance-perhaps very small-of getting heads every time. Thus all
we can say is the following: In a "large" number of tosses, we would
"probably" get "approximately" half heads. That statement is rather
vague. Before doing anything mathematical with it, we must figure
out exactly what we are trying to say. (We shall do that in Chapter
Five.)
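For readers who want to test their guess before Chapter Six, here is a short numerical sketch (the function name is ours). It uses the exact formula C(n, n/2)/2^n for the probability of exactly half heads, computed through log-factorials so that the enormous intermediate numbers never appear:

```python
import math

def prob_exactly_half(n):
    # P(exactly n/2 heads in n fair tosses) = C(n, n/2) / 2**n,
    # computed via log-gamma so the huge factorials never materialize
    k = n // 2
    log_p = math.lgamma(n + 1) - 2 * math.lgamma(k + 1) - n * math.log(2)
    return math.exp(log_p)

print(prob_exactly_half(10**6))   # about 0.0008, i.e. roughly 1 in 1250
```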
Now there are still additional difficulties with the concept of
repetition. In 1693, in one of the earliest recorded discussions of
probability, Samuel Pepys (1633-1703), who of course was a diarist,
but not a mathematician, proposed a probability problem to Isaac
Newton (1642-1727). When Newton began talking about many rep-
etitions, Pepys reacted by rewording his question to make repetition
impossible. (See Exercise 21 of Chapter Three for the reworded ques-
tion.) Pepys was concerned with what would happen if something
were done just once. (Curiously, Newton's letters to Pepys appear to
be the earliest documented use of the idea of repetition in discussing
probabilities. However, the idea must have been in the background
for some time when Newton wrote about it. In fact, despite his great
contributions in other areas, Newton does not seem to have done
anything original in probability theory.) Another case where repeti-
tion is impossible occurs in weather forecasting. What does it mean
to say the probability of rain here tomorrow is 60%? Even though
there is considerable doubt in our minds as to exactly what we are
saying, we do seem to be saying something.
Often the word "odds" is used in describing how likely something
is to happen. For example, one might say, "The odds are 2 to 1 that it
will rain tomorrow." Sometimes the odds given refer to the amount
of money each party to a bet will stake. However, frequently it is
not intended that any gambling take place. When that is so, making
a statement associating an event with odds of 2:1 is simply a way
of saying that of the two probabilities, namely, that the event
will happen and that the event won't happen, one is twice the
other. But which is twice which? Since "odds" basically refers to
an advantage, and one side's advantage is the other's disadvantage,
the matter is somewhat complicated. In addition, the use of neg-
ative words, for example, by saying "odds against," causes further
difficulty. The meaning is usually clear from the exact wording and
context, but no simple rule can be given. Under the circumstances,
we shall never use the word "odds" again; the term "probability" is
much clearer and more convenient.
Still before starting our formal theory, we begin to indicate the
setting in which we shall work. We suppose that we are going to
observe something that can turn out in more than one way. We
may be going to take some action, or we may just watch. In ei-
ther case, we speak, for convenience, of doing an experiment. We
assume that which of the various possible outcomes actually takes
place is "due to chance." In other words, we have no knowledge of
the mechanism, if there is one, that produces one outcome rather
than another. From our point of view, the way the experiment turns
out is arbitrary, although some possibilities may be more likely than
others, whatever that means. We restrict our attention to one experi-
ment at a time. This one experiment often involves doing something
more than once. While, over the course of time, we consider many
different examples of experiments, all our theory assumes we are
studying a single experiment. The basic idea of probability theory is
that we assign numbers to hypothetical events, the numbers indicat-
ing how likely the events are to occur. Clearly we must discuss what
an event is before we can consider assigning the numbers. From now
on, when we call something an event, we do not mean that it has
already taken or necessarily will take place. An event is, informally,
something that possibly could occur and for which we want to know
the chances that it will occur. A more formal, and more abstract,
definition of an event occurs in the next paragraph.
From the point of view of abstract mathematical theory, a sam-
ple space is just a set, any set. Since we are going to study only
a certain portion of probability theory, which we shall call discrete
probability, we shall very soon put a restriction on which sets may
be used. However before doing anything else, we want to describe
the intuitive ideas behind our mathematical abstractions. We make a
negative comment first: The word "sample," which we use for histor-
ical reasons, does not give any indication of what we have in mind.
The sample space, which we shall always denote by Ω, is simply the
set of all possible ways our experiment can turn out. In other words,
doing the experiment determines just one element of Ω. Thus, when
the experiment is done, one and only one element of Ω "happens."
The choice of a sample space then amounts to specifying all possi-
ble outcomes, without omission or repetition. A subset of the sample
space is called an event; we shall assign probabilities to events. We
illustrate the concepts of sample space and event in the following
examples:
1. A die is to be thrown. We are interested in which number the die
"shows"; by the number shown, we mean, of course, the number
on the top of the die after it comes to rest. A possible choice of
sample space Ω would be {1, 2, 3, 4, 5, 6}. Other choices of Ω are
also possible. For example, we could classify the way the die falls
considering also which face most nearly points north. However,
since we are interested only in which face of the die is up, the
simplest choice for Ω would be {1, 2, 3, 4, 5, 6}, and we would be
most unlikely to make any other choice. We consider some exam-
ples of events: Let A be "The die shows an even number." Then A
is an event, and A = {2, 4, 6}. Let B be "The die shows a number
no greater than 3." Then B is an event, and B = {1, 2, 3}. Let C be
"The die shows a number no less than 10." Then C is an event,
and C = ∅, the empty set. Let D be "The die shows a number
no greater than 6." Then D is an event, and D = Ω. (We note
in passing that ∅ and Ω are events for any experiment and any
sample space.)
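The events in this example are literally sets, so they can be written down directly in code. The following is an illustrative sketch (the variable names are ours, not the book's):

```python
omega = {1, 2, 3, 4, 5, 6}                 # sample space for one die

A = {u for u in omega if u % 2 == 0}       # "the die shows an even number"
B = {u for u in omega if u <= 3}           # "no greater than 3"
C = {u for u in omega if u >= 10}          # "no less than 10" -- empty
D = {u for u in omega if u <= 6}           # "no greater than 6" -- all of omega

print(A, B, C == set(), D == omega)        # {2, 4, 6} {1, 2, 3} True True
```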
2. Suppose two dice, one red and one blue, are thrown. Suppose
also that we are concerned only with the total of the numbers
shown on the dice. One choice of a sample space would be Ω =
{2, 3, 4, ..., 11, 12}. Another choice for Ω would be the set
{(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6),
 (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),
 (3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6),
 (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),
 (5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6),
 (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)},
with 36 elements; here the notation (a, b) means the red die shows
the number a and the blue die shows the number b. The first
choice has the advantage of being smaller; we have only 11 points
in Ω instead of 36. The second choice has the advantage that the
points of Ω are equally likely. More precisely, it is reasonable
to assume, on the basis of common sense and perhaps practical
experience, that the 36 points are equally likely. At this time, we
make no decision as to which Ω to use.
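The trade-off between the two sample spaces can be seen by tabulating, from the 36-point space, how many pairs give each total. A brief sketch (ours, not the book's):

```python
from collections import Counter

# The 36 equally likely outcomes for two distinguishable dice
omega = [(a, b) for a in range(1, 7) for b in range(1, 7)]
totals = Counter(a + b for a, b in omega)

for t in range(2, 13):
    print(t, totals[t])   # e.g. the total 7 arises from 6 of the 36 pairs
```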
3. Three coins are tossed; we note which fall heads. The first ques-
tion is whether we can tell the coins apart. We shall always
assume that we can distinguish between any two objects. How-
ever, whether we do make the distinction depends on where we
are going. In the case of the three coins, we decide to designate
them as the first coin, the second coin, and the third coin in some
arbitrary manner. Now we have an obvious notation for the way
the coins fall: hht will mean the first and second coins fall heads
and the third tails; tht will mean the first and third coins fall tails
and the second heads; etc. A good choice of a sample space is
Ω = {hhh, hht, hth, htt, thh, tht, tth, ttt}.
By way of example, let A be the event, "Just two coins fall heads."
Then A = {hht, hth, thh}. The event, "The number of tails is
even" is {hhh, htt, tht, tth}. It is worth noting that it would not
have mattered if we tossed one coin three times instead of three
coins once each.
4. A coin is tossed repeatedly until it falls the same way twice in a
row. A possible sample space, with obvious notation, is
Ω = {hh, tt, htt, thh, hthh, thtt, hthtt, ththh, ...}.
Examples of events are "The coin is tossed exactly five times" and
"The coin is tossed at least five times."
In studying probability, it makes a great deal of sense to begin
with the simplest case, especially since this case is tremendously im-
portant in applications. In certain situations that arise later, we shall
use addition where more complicated examples require integration.
The difference between addition and integration is precisely the dif-
ference between discreteness and continuity. We are going to stick
to the discrete case. We want the points of our sample space to come
Thus, C = A ∪ B. The logic here is, of course, the logic behind the set
theoretic identity that the complement of A ∩ B is Ā ∪ B̄. Likewise one
can interpret, in terms of events, the identity that the complement of
A ∪ B is Ā ∩ B̄. [These last two identities are called DeMorgan's Laws,
after Augustus DeMorgan (1806-1871).]
Often we have two ways to say the same thing, one in the lan-
guage of sets and one in the language of events. Suppose A and B are
events. We can put the equation A ∩ B = ∅ in words by saying that the
sets A and B are disjoint. Or we can say equivalently that A and B are
mutually exclusive; two events are called mutually exclusive when it
is impossible for them to happen together. Consider the set theoretic
identity, which we shall need soon, A ∪ B = A ∪ (B \ A). A logician
might seek a formal proof of that identity, but such a proof need not
concern us here. We convince ourselves informally that the equation
is correct, whether we regard it as a statement about sets or about
events, just by thinking about what it means. Additional equations
that will be needed soon are A ∩ (B \ A) = ∅, (A ∩ B) ∪ (A \ B) = A,
and (A ∩ B) ∩ (A \ B) = ∅. In each case, the reader should think over
what is being said and realize that it is obvious.
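These identities are also easy to spot-check mechanically. A small sketch of our own, using Python's set operators (| for union, & for intersection, - for difference):

```python
A, B = {1, 2, 3}, {3, 4, 5}            # any two concrete sets will do

assert A | B == A | (B - A)            # A union B = A union (B \ A)
assert A & (B - A) == set()            # A intersect (B \ A) is empty
assert (A & B) | (A - B) == A          # (A intersect B) union (A \ B) = A
assert (A & B) & (A - B) == set()      # those two pieces are disjoint
print("all identities hold")
```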
Now the key idea of probability theory is that real numbers are
assigned to events to measure how likely the events are. We have
already discussed this idea informally; now we are ready to consider
it as a mathematical abstraction. We begin by preparing the ground-
work as to notation. The probability of an event A will be denoted by
P(A); in conformity with general mathematical notation, P(A) is the
value the function P assigns to A. In other words, P is a function that
assigns numbers to events; P(A) is the number assigned by P to A.
Certain events contain just one point of the sample space. We
shall call such an event an elementary event. Consider Example 3
above, where three coins are tossed. The event, "Three coins fall
heads" is an elementary event, since it consists of just one of the
eight points of the sample space. This event is {hhh}, and its prob-
ability will be denoted by P({hhh}), according to the last paragraph.
To simplify notation, we shall write P(hhh) as an abbreviation for
P({hhh}). In general, if u ∈ Ω, P(u) means P({u}).
We shall need to refer to the sum of the numbers P(u) for all
u ∈ A, where A is an event. We denote this sum by

    Σ_{u∈A} P(u).
    P(A) = Σ_{u∈A} P(u);

in particular, P(∅) = 0.
Some consequences of these assumptions are too immediate to be
made into formal theorems. We simply list these statements with
brief parenthetical explanations; throughout, A and B denote events.
a. If A ⊂ B, then P(A) ≤ P(B). (P(A) is the sum of the numbers P(u)
for all u ∈ A; P(B) adds in additional nonnegative terms.)
b. P(A) ≤ 1. (This is the special case B = Ω of the last statement.)
c. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B).
(P(A) is the sum of all P(u) for certain points u of Ω; P(B) is the
sum of all P(u) for certain other points of Ω. P(A ∪ B) is the overall
sum of all of these P(u).)
d. P(Ā) = 1 − P(A). (This amounts to the case B = Ā of the last
statement, since P(A ∪ Ā) = P(Ω) = 1.)
Most, but not all, of the sample spaces we shall be using in this
book are of a special kind. They are sample spaces that consist of
finitely many "equally likely" points. A little more formally, assume
that all elementary events have the same probability. In other words,
there is a number a such that P(u) = a for all u ∈ Ω. Denote the
number of points in Ω by N. Now we have

    1 = P(Ω) = Σ_{u∈Ω} P(u) = a + a + ⋯ + a (N terms) = Na.
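Since Na = 1 forces a = 1/N, the probability of an event in such a sample space is just (number of points in the event)/(number of points in Ω). A sketch of that rule, with names of our own choosing:

```python
from fractions import Fraction

def prob(event, omega):
    # Equally likely points: P(A) = |A| / |Omega|, kept as an exact fraction
    return Fraction(len(event & omega), len(omega))

omega = set(range(1, 7))                  # one fair die
print(prob({2, 4, 6}, omega))             # 1/2
```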
to us, we must judge from the precise wording of the question what
is intended to be equally likely.
Another question of language is best indicated by an example.
Four coins are going to be tossed. To be specific, suppose they are a
penny, a nickel, a dime, and a quarter. We bet that three coins will
fall heads. If all four coins fall heads, do we win? We can argue, for
example, that three coins, namely the penny, the nickel, and the
dime, have fallen heads. Our opponent would argue that four coins
fell heads and four is not three. Who is right? There is no answer
because the bet was too vague. To simplify matters in the future,
we adopt a convention on this point. Mathematics books differ as to
what convention they choose; different areas of mathematics make
different terminology convenient. In our context, the easiest way to
proceed is to say that three means just three and no more. From now
on, when we say something happens n times, we mean the number
of times it happens is n; in other words, it happens only n times. We
can still, for emphasis, make statements like, "just five of the dice
fall 'six'." But the meaning would be the same if the word "just" were
omitted.
Throughout the book, we shall insert various brief comments on
the history of the study of probability. In particular, we shall include
very short biographies of some of the people who devised probability
theory. At this point, we are ready to conclude the chapter with a
discussion of how it all started.
The earliest records of the study of probability theory are some
letters exchanged by Blaise Pascal and Pierre Fermat in the summer
of 1654. While neither man published anything about probabil-
ity, and only a few particular problems were discussed in the
correspondence, informal communication spread the word in the
mathematical community. (A particular case of this is somewhat
amusing. Fermat, in writing to Pascal, says at one point that Pas-
cal will have already understood the solution to a certain problem.
Fermat goes on to announce that he will spell out the details any-
how for the benefit of M. de Roberval. Gilles Personne de Roberval
(1602-1675) was professor of mathematics at the Collège Royal in
Paris; he apparently had some difficulty in understanding the work
of Pascal and Fermat. In fact, Gottfried Leibniz (1646-1716) refers
to "the beautiful work on chance of Fermat, Pascal, and Huygens,
about which M. Roberval is neither willing nor able to understand
Pierre Fermat
(French, 1601-1665)
Fermat's father was a prosperous leather merchant. Fermat himself
studied law. His father's wealth enabled Fermat to purchase a suc-
cession of judicial offices in the parlement of Toulouse; at that time in
France, the usual way of becoming a judge was to purchase a judge-
ship. These positions entitled him to add "de" to his name; however
he only did so in his official capacity. Despite his profession, Fermat
is usually described in reference books as a mathematician. Cer-
tainly his mathematical accomplishments are enormous. Since the
brief biographies, of which this is the first, will not describe the sub-
ject's mathematical accomplishments, we just conclude by pointing
out that little is known about Fermat's life.
Blaise Pascal
(French, 1623-1662)
Pascal was descended from a long line of senior civil servants. In
1470 the Pascal family was "ennobled" and thus received the right to
add "de" to their name; however they did not do so. Blaise Pascal's
father, Etienne, had a substantial interest in mathematics. In 1637,
Etienne Pascal, Fermat, and Roberval jointly sent Rene Descartes
(1596-1650) a critical comment on his geometry. Blaise Pascal's
health was very poor almost from the time of his birth; he was
more or less an invalid throughout his life. Blaise, together with
his two sisters, was educated at home by his father. The sisters
were Gilberte, born in 1620, and Jacqueline, born in 1625. All three
children were child prodigies. In 1638, Etienne protested the omis-
sion of interest payments on some government bonds he owned,
and, as a result, Cardinal Richelieu (1585-1642) ordered his arrest.
The family was forced to flee. Then the two sisters participated
in a children's performance of a dramatic tragedy sponsored by
the cardinal. After the performance, Jacqueline read a poem she
had written to the cardinal, who was so impressed that he recon-
sidered his feelings about Etienne. As a result, Etienne was given
the post of intendent at Rouen. This office made him the adminis-
trator of, and chief representative of the king for, the surrounding
province. It is curious that Fermat was a member of a parlement
while Etienne Pascal was an intendent. At this time, there was great
political tension between the intendents, representing the central au-
thority of the king, and the local parlements, representing regional
interests. In fact, in a revolution that took place between 1648 and
1653, the intendents, including Etienne Pascal, were driven out of
office. Etienne died in 1651. During the 1640s, the family had be-
come more and more taken up with the Jansenist movement. Their
ties to it had been strengthened by Gilberte's marriage in 1641. In
1652, shortly after the death of her father, Jacqueline became a
nun at the Jansenist convent of Port-Royal. In 1655, Blaise entered
the convent. While he turned his attention primarily to theology,
he did do some important mathematical work on the cycloid
in 1658. Jacqueline, broken-hearted by the pope's suppression of
Jansenism, died in 1661. On March 18, 1662, an invention made
by Blaise Pascal went into operation in Paris: carriages running on
fixed routes at frequent intervals could be boarded by anyone on
the payment of five sous. In short, Pascal invented buses. Shortly
after seeing the success of his bus system, Pascal died at the age
of 39. His surviving sister, who was then Gilberte Perier, was in-
Christiaan Huygens
(Dutch, 1629-1695)
Christiaan Huygens's father, Constantijn, was a diplomat and poet
of some note. He was also a friend of Rene Descartes (1596-1650).
Descartes was much impressed by the young Christiaan's mathemat-
ical ability when he visited the Huygens's home. Christiaan studied
both mathematics and law at the University of Leiden. He went on
to devote his time to mathematics, physics, and astronomy; since
he was supported by his wealthy father, he did not need to seek
gainful employment. While some of his work was theoretical, he
was also an experimenter and observer. He devised, made, and used
clocks and telescopes. When he visited Paris for the first time in
1655, he met Roberval and other French mathematicians. In 1666,
he became a founding member of the French Academy of Sciences.
He was awarded a stipend larger than that of any other member
and given an apartment in which to live. Accordingly he moved to
Paris, remaining there even while France and Holland were at war.
In 1681, serious illness forced him to return home to Holland. Vari-
ous political complications made it impossible for him to go back to
France, and so he spent the rest of his life in Holland.
Exercises
1. Allen, Baker, Cabot, and Dean are to speak at a dinner. They will
draw lots to determine the order in which they will speak.
a. List all the elements of a suitable sample space, if all orders
need to be distinguished.
b. Mark with a check the elements of the event, "Allen speaks
before Cabot."
c. Mark with a cross the elements of the event, "Cabot's speech
comes between those of Allen and Baker."
d. Mark with a star the elements of the event, "The four persons
speak in alphabetical order."
2. An airport limousine can carry up to seven passengers. It stops
to pick up people at each of two hotels.
a. Describe a sample space that we can use in studying how many
passengers get on at each hotel.
b. Can your sample space still be used if we are concerned only
with how many passengers arrive at the airport?
3. A coin is tossed five times. By counting the elements in the
following events, determine the probability of each event.
a. Heads never occurs twice in a row.
b. Neither heads nor tails ever occurs twice in a row.
c. Both heads and tails occur at least twice in a row.
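Exercises of this kind can be checked by brute-force enumeration over the 2⁵ = 32 equally likely sequences. A sketch of our own (it verifies part a only and does not replace the counting argument):

```python
from itertools import product
from fractions import Fraction

outcomes = list(product("HT", repeat=5))   # the 32 equally likely sequences

def has_run(s, ch):
    # True if ch occurs (at least) twice in a row somewhere in s
    return any(x == ch == y for x, y in zip(s, s[1:]))

N = len(outcomes)
p_a = Fraction(sum(1 for s in outcomes if not has_run(s, "H")), N)
print(p_a)   # probability that heads never occurs twice in a row
```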
4. Two dice are thrown. Consider the sample space with 36 elements
described above. Let:
A be "The total is two."
B be "The total is seven."
C be "The number shown on the first die is odd."
D be "The number shown on the second die is odd."
E be "The total is odd."
By counting the appropriate elements of the sample space if nec-
essary, find the probability of each of the following events: a. A;
b. B; c. C; d. D; e. E; f. A ∪ B; g. A ∩ B; h. A ∪ C; i. A ∩ C; j. A \ C;
k. C \ A; l. D \ C; m. B ∪ D; n. B ∩ D; o. C ∩ D; p. D ∩ E; q. C ∩ E;
r. C ∩ D ∩ E.
CHAPTER 2
Counting
of a round table is that all seats are alike. If each person moves one
seat to the right, the arrangement of the people is still the same.
As between different arrangements, someone would have to have a
different right-hand neighbor. When we consider a round table, we
want to determine how many different, in that sense, arrangements
are possible.
Now we return to the president and cabinet. As is often the
case, introducing a time factor will help explain our computation.
Of course, the president sits down first. It makes no difference where
the first person to be seated sits; all chairs at a round table are alike.
After the president is seated, the position of each chair may be
specified in relation to the president's chair. For example, one chair
is four chairs to the president's right; some chair is three chairs to the
president's left, etc. Thus after the president is seated, we consider
the number of ways ten people can be placed into ten different
chairs. This can be done in 10! = 3,628,800 ways, and that is the
answer to our question.
It is now a trivial exercise to give a formula for the number of
ways n people can sit at a round table. Problems involving other
shapes of tables can be done on the basis of analogous reasoning.
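The reasoning above gives (n − 1)! seatings for n people at a round table; as a one-line sketch (the function name is ours):

```python
import math

def round_table_seatings(n):
    # Fix one person's chair; the remaining n - 1 people fill n - 1 chairs
    return math.factorial(n - 1)

print(round_table_seatings(11))   # president plus ten cabinet members: 3628800
```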
The next topic, which is also a digression, is most easily de-
scribed by way of examples. In how many ways can the letters of
the word SCHOOL be arranged? There are some obvious conven-
tions in understanding what the question means. The letters are to
be arranged in a row from left to right, as we normally write letters.
Thus HOOLCS is one arrangement; SCLOOH is another. The point
of the question is that the two Os cannot be distinguished from each
other. The way to answer the question is first to compute the num-
ber of arrangements there would be if the Os differed, say by being
printed in different fonts, and then adjusting. Six distinct letters can
be arranged in 6! = 720 ways. But in fact we cannot tell the two Os
apart. Thus the count of 720 treats two arrangements that differ only
by interchanging the two Os, such as the two printings of SHOLOC, as
different, when they should have been counted as the same. Clearly the
answer to our question is 720 divided by 2, since either O can come
first. In other words, the answer is 720/2 = 360, because the Os can
be arranged in order in two ways.
Now let us try a more complicated problem. In how many ways
can the letters of the phrase
SASKATOON SASKATCHEWAN
be arranged? Note that the question says "letters"; the space is to
be ignored. The first step is simply to count how many times each
distinct letter appears. There are five As; four Ss; two each of K, T,
O, and N; and one each of C, H, E, and W. That makes a total of 21
letters, counting repetitions. If all the letters were different, there
would be 21! ways they could be arranged. As in the last problem,
we divide by two because there are two Os. Then we again divide by
two for the two Ks; then again for the Ts; and finally a fourth time for
the Ns. At this point we have 21!/2⁴. There are four Ss; they can be
arranged in 4! ways. Thus there are 4! times as many arrangements
if we distinguish the Ss as there are if we do not. We divide by 4!
because we do not want to distinguish the Ss. Finally we divide by
5! because the five As are all alike. Our final answer is thus

    21! / (5! 4! (2!)⁴).
To make the procedure by which we obtained this number clearer,
we may write the number as

    21! / (5! 1! 1! 1! 2! 2! 2! 4! 2! 1!),

where the numbers 5, 1, 1, 1, 2, 2, 2, 4, 2, 1 refer to the letters A, C, E,
H, K, N, O, S, T, W in alphabetical order. (With a calculator, we find
that the answer just given is roughly eleven hundred trillion.) Since
the ideas here are not made any clearer by writing out a general
formula, we shall just assume that the method is now obvious and
go on.
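The whole procedure, dividing n! by k! for each letter that appears k times, fits in a few lines of code. A sketch (the function name is ours):

```python
import math
from collections import Counter

def arrangements(phrase):
    # n! divided by k! for each letter repeated k times; spaces ignored
    letters = phrase.replace(" ", "")
    total = math.factorial(len(letters))
    for k in Counter(letters).values():
        total //= math.factorial(k)
    return total

print(arrangements("SCHOOL"))                   # 360, as computed above
print(arrangements("SASKATOON SASKATCHEWAN"))   # roughly eleven hundred trillion
```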
    x = y/r! = (n!/(n − r)!) · (1/r!) = n!/(r! (n − r)!)

    (100 choose 10) = 100!/(10! 90!)
                    = (100 · 99 · 98 · 97 · 96 · 95 · 94 · 93 · 92 · 91)/(10 · 9 · 8 · 7 · 6 · 5 · 4 · 3 · 2 · 1)
ways to arrange the letters. Thus we have a new proof of the formula
of the last paragraph.
**|****||***|***
In this example, the first object was selected twice, the second object
four times, the third object was never selected, and the fourth and
fifth objects were selected three times each. Each way of making
r selections from n objects, with replacement, disregarding order
of selection, corresponds to just one arrangement of the stars and
bars. Furthermore, each such arrangement corresponds to a certain
selection. Thus the number we seek is simply the number of ways
to put the stars and bars in order. As noted above, there are n - 1
bars. Since there is a star for each selection, there are r stars. The
number of ways of arranging n - 1 bars and r stars in a row was
discussed earlier; it is
    (n − 1 + r)! / ((n − 1)! r!).
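The stars-and-bars count is easy to evaluate numerically. A sketch of our own:

```python
import math

def with_replacement_unordered(n, r):
    # Arrange r stars and n - 1 bars in a row: (n - 1 + r)! / ((n - 1)! r!)
    return math.factorial(n - 1 + r) // (math.factorial(n - 1) * math.factorial(r))

print(with_replacement_unordered(5, 12))   # the star-and-bar diagram above
```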
We now have

    (5 + 14 − 1 choose 14) = (18 choose 14) = 3060.
guaranteed minimum of two bars per child. The number of ways the
remaining four bars can be distributed is given using n = 5, r = 4 in
our formula; thus this time our answer is
    (n choose r) = n! / (r! (n − r)!),
we see that
1
1 1
The next row lists (2 choose 0), (2 choose 1), (2 choose 2), namely,
1, 2, 1. As stated above, these numbers are placed as follows:
1
1 1
1 2 1
(3 choose 1) = (2 choose 0) + (2 choose 1) = 1 + 2 = 3,
(3 choose 2) = (2 choose 1) + (2 choose 2) = 2 + 1 = 3.
We show this computation by
1
1 1
1 2 1
1 3 3 1
We continue in this way. Each row begins and ends with a 1. Each
of the other entries, according to the formula, is the sum of the two
numbers most nearly directly above it. The
array described above, part of which is shown in the table below,
is known as Pascal's Triangle. (A short biography of Pascal appears
in Chapter One.) However the triangle was invented by Jia Xian,
sometimes spelled Chia Hsien, in China in the eleventh century.
Jia Xian devised his triangle to use in raising sums of two terms to
powers. Even in Europe, something was known about the triangle
before the time of Pascal. But the important properties of the triangle
were first formally established by Pascal. We note in passing that
Pascal invented the method now called "mathematical induction"
for just this purpose. Pascal's 1tiangle provides a convenient way to
compute binomial coefficients when we need all (;) for a given, not
1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1
1 8 28 56 70 56 28 8 1
1 9 36 84 126 126 84 36 9 1
1 10 45 120 210 252 210 120 45 10 1
1 11 55 165 330 462 462 330 165 55 11 1
1 12 66 220 495 792 924 792 495 220 66 12 1
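The rule that each entry is the sum of the two entries above it translates directly into code. A sketch of our own that regenerates the rows of the table:

```python
def pascal_rows(n):
    # Build rows 0..n of Pascal's Triangle; each interior entry is the
    # sum of the two numbers most nearly directly above it
    rows = [[1]]
    for _ in range(n):
        prev = rows[-1]
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows

for row in pascal_rows(6):
    print(row)
```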
Exercises
1. a. In how many ways can the letters of the word
FLUFF
be arranged?
b. In how many ways can the letters of the word
ROTOR
be arranged leaving the T in the middle?
c. In how many ways can the letters of the word
REDIVIDER
be arranged so that we still have a palindrome, that is, the
letters read the same backwards as forwards?
2. a. In how many ways can the letters of the word
SINGULAR
be arranged?
b. In how many ways can the letters of the word
DOUBLED
be arranged?
c. In how many ways can the letters of the word
REPETITIOUS
be arranged?
3. a. In how many ways can the letters of the word
KRAKATOA
be arranged?
b. In how many ways can the letters of the word
MISSISSIPPI
be arranged?
c. In how many ways can the letters of the phrase
MINNEAPOLIS MINNESOTA
be arranged?
d. In how many ways can the letters of the phrase
NINETEEN TENNIS NETS
be arranged?
4. Suppose your campus bookstore has left in stock three copies of
the calculus book, four copies of the linear algebra book, and five
copies of the discrete probability book. In how many different
orders can these books be arranged on the shelf?
5. How many numbers can be made each using all the digits 1, 2,
3, 4, 4, 5, 5, 5?
6. How many numbers can be made each using all the digits 1, 2,
2, 3, 3, 3, 0?
7. Five persons, A, B, C, D, and E, are going to speak at a meeting.
a. In how many orders can they take their turns if B must speak
after A?
b. How many if B must speak immediately after A?
8. In how many ways can the letters of the word
MUHAMMADAN
d. In how many ways can four couples sit at the round table if
each husband sits next to his wife?
e. In how many ways can four couples sit at a square table with
one couple on each side?
17. At a rectangular dining table, the host and hostess sit one at each
end. In how many ways can each of the following sit at the table:
a. six guests, three on each side?
b . four male and four female guests, four on each side, in such
a way that no two persons of the same sex sit next to each
other?
c. eight guests, four on each side, so that two particular guests
sit next to each other?
**18. Show that four persons of each of n nationalities can stand in
a row in 12^n (2n)! ways with each person standing next to a
compatriot.
19. A store has in stock one copy of a certain book, two copies
of another book, and four copies of a third book. How many
different purchases can a customer make of these books? (A
purchase can be anything from one copy of one book to the
entire stock of the store.)
20. A restaurant offers its patrons the following choices for a
complete dinner:
i. choose one appetizer out of four;
ii. choose one entree out of five;
iii. choose two different items from a list of three kinds of
potatoes, three vegetables, and one salad;
iv. choose one dessert out of four;
v. choose one beverage out of three.
a. How many different dinners can be ordered without ordering
more than one kind of potato, assuming that no course is
omitted?
b. How many different dinners can be ordered with no more
than one kind of potato if one item, other than the entree, is
omitted?
21. Of 12 men, just two are named Smith. In how many ways, disre-
garding the order of selection, can seven of the men be chosen:
a) with no restrictions? b) if both Smiths must be included? c) if
neither Smith may be included? d) if just one Smith must be in-
cluded? e) if at least one Smith must be included? f) if no more
than one Smith may be included?
22. Consider computer words consisting of six characters each. If the
characters are chosen from a, b, c, d, 2, 3, 4, 5, 6, each of which
may be used at most once, how many words can be formed:
a) with no other restrictions? b) if the third and fourth characters
must be digits and the other characters must be letters? c) if
there must be three digits and three letters? d) if no two digits
and no two letters may be adjacent?
23. Do the last problem with the modification that the characters
may be used more than once.
24. For each of k = 0, 1, 2, 3, 4, find the probability that a poker
hand (five cards) contains just k aces.
25. Of ten twenty-dollar bills, two are counterfeit. Six bills are cho-
sen at random. What is the probability that both counterfeit bills
are chosen?
26. An urn contains eight balls-two red, two blue, two orange, and
two green. The balls are separated at random into two sets of
four balls each. What is the probability that each set contains
one ball of each color?
27. A bin contains 100 balls numbered consecutively from 1 to 100.
Two balls are chosen at random without replacement. What is
the probability that the total of the numbers on the chosen balls
is even?
28. What is the probability that a bridge hand (13 cards) contains all
four aces?
29. a. How many five-person committees can be chosen from a club
with 15 members?
b . In how many ways can a five-person committee consisting of
a chair, a secretary, and three other members be chosen from
the club?
35. In how many ways can five apples and six oranges be distributed
among seven children, disregarding the order of distribution?
36. How many ways are there to choose three letters from the phrase
MISS MISSISSIPPI NEVER EVER SIMPERS,
ignoring the order of selection?
37. How many different computer words of six characters each can
be made with letters chosen from the phrase
E PLURIBUS UNUM?
38. A store has in stock 20 cans of each of four flavors of a particular
brand of soda.
a. In how many ways can a customer purchase ten of these cans
of soda?
b . In how many ways can a customer purchase 65 of these cans
of soda?
39. What is the probability that four randomly chosen people were
born on four different days of the week?
40. A die is thrown four times. What is the probability that each
number thrown is higher than all those that were thrown earlier?
41. A die is thrown four times. What is the probability that each
number thrown is at least as high as all of the numbers that
were thrown earlier?
42. In how many ways can 11 # signs and 8 * signs be arranged in a
row so that no two * signs come together?
43. In how many ways can 11 men and 8 women stand in a row so
that no two women stand next to each other?
**44. In how many ways can 11 women and 8 men sit at a round table
so that no two men sit next to each other?
45. In how many orders can the letters of the word
INDIVISIBILITY
be arranged without having two I's together?
46. One arrangement of the letters of the word
MISSISSIPPI
is chosen at random.
a. What is the probability no two Ss are together?
b . What is the probability all the Ss are together?
47. How many arrangements of the letters of the phrase
MINNEAPOLIS MINNESOTA
have no two vowels together?
48. In a certain city the streets all run north-south and the avenues
run east-west. The avenues are numbered consecutively, and
the streets are denoted by letters in alphabetical order. A po-
lice car is patrolling the city. At each intersection it either goes
straight ahead, makes a left turn, or makes a right turn-it never
makes a U-turn. The car starts by entering the intersection of
J Street and Twelfth Avenue coming from I Street. How many
different routes can it take if:
a. it reaches L Street and Eighth Avenue after going just six
blocks?
b. it returns to J Street and Twelfth Avenue after going four
blocks?
c. it returns to J Street and Twelfth Avenue after going six blocks?
49. In the city described in the last problem, suppose a police car is
at B Street and Second Avenue when it gets word of an accident
at J Street and Tenth Avenue. If the car is instructed to proceed
to the accident by as short a route as possible, from how many
routes can the driver choose?
50. How many collections of letters, in no particular order, can be
formed by choosing one or more letters from the phrase
MISSISSIPPI RIVER?
**51. How many computer words of six characters each can be made
by choosing letters from the phrase
NINETEEN TENNIS NETS?
(Consider various cases separately.)
52. a. A disgruntled letter carrier has to place ten different letters
into seven mail boxes. He pays no attention to the addresses
a. To win the Jackpot, the player must have picked all six
winning numbers (in any order).
b. To win a Second Prize, the player must have picked five
winning numbers.
c. To win a Third Prize, the player must have picked four
winning numbers.
d. To win a Fourth Prize, the player must have picked three
winning numbers and the supplementary number.
Find the probability of winning each of the prizes.
56. According to Laplace, in his time the lottery of France was
played as follows: Ninety numbers were used and five were
randomly drawn, without replacement, each time the game was
played. The player could make a bet on from one to five dif-
ferent numbers and won if all these numbers were drawn. (Of
course, the amount won depended on how many numbers the
player chose.) Find the probability that the player won for each
of the five possible kinds of bet.
**57. Let
Show that
have the same parity, that is, either both are odd or both are
even.
**59. Show that there are infinitely many rows of Pascal's Triangle
that consist entirely of odd numbers.
**60. We have m balls of each of three colors. We also have three boxes,
each of a different shape. In how many ways can the balls be
distributed among the boxes so that each box contains m balls?
CHAPTER 3

Independence and Conditional Probability
In this chapter we study how the knowledge that one event definitely
occurs influences our judgement as to the chances for some other
event to occur. The case we consider first is that in which there is
no influence.
3.1 Independence
We begin by describing the intuitive background. We may as well
work with specific numbers. Suppose P(A) = 1/3 and P(B) = 1/4.
Then, in many repetitions of our experiment, A occurs one-third
and B occurs one-fourth of the time. If the occurrence of A does
not change the likelihood of B, then B would occur on one-fourth of
those occasions when A occurs. Since A occurs one-third of the time,
A n B would occur one-fourth of one-third of the time. In symbols,
P(A n B) = (1/3)(1/4) = 1/12. Of course, the conclusion P(A n B) =
P(A)P(B) would remain valid if we used different numbers.
Now, with the last paragraph in the back of our minds, we
make the following definition: Two events A and B are called
independent if

    P(A ∩ B) = P(A)P(B).
is John Doe and offers to make a bet with you. He begins to describe
the bet with a demonstration. John selects some cards from the deck
and arranges them into three stacks. He lets you see which cards are
in each stack. You note that:
Stack S contains 2 tens and 3 threes.
Stack T contains 1 nine, 3 sevens, 3 sixes, and 3 fives.
Stack U contains 4 eights and 2 twos.
John offers to give you first choice of stack; you may select
whichever stack you think is best. Then he will choose from the
remaining stacks the one he wants. Next you will pick a card at
random from your stack and John will pick a card at random from
his. High card wins. Your opponent is willing to bet at even money
despite the fact that you get first choice of stack. Is the bet to your
advantage?
Let us compute your chances of winning. Since which stacks are
used is not a matter of chance, but of choice, we must make several
separate computations.
First, suppose you select stack U. Then John is free to choose
stack S; suppose he does so. If you draw a two, you lose. In order for
you to win, you must draw an eight and your opponent must draw
a three. The probabilities of these two independent events are 2/3
and 3/5. Thus the probability that you win is only (2/3)(3/5) = 2/5;
more likely than not, you will lose.
Perhaps you think you'll do better by using stack S. Suppose you
do choose S; then John can select T. With these choices, you will
win if you draw a ten and lose with a three, regardless of which card
your opponent picks from T. Thus the probability that you win is
again 2/5 and you must try to do better.
By now you may think T is the stack to choose. Suppose you pick
stack T and your opponent then picks stack U. He will win if he gets
an eight while you get a seven, six or five; otherwise, you win. The
probability that he gets an eight is 2/3; the probability that you will
get a seven, six or five is 9/10. Thus the probability that you
win is 1 - (2/3)(9/10) = 2/5.
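These three computations can be confirmed by brute-force enumeration. A minimal Python sketch (the stack contents are as listed in the text):

```python
from fractions import Fraction
from itertools import product

# Card values in each stack, as described above.
S = [10, 10, 3, 3, 3]
T = [9, 7, 7, 7, 6, 6, 6, 5, 5, 5]
U = [8, 8, 8, 8, 2, 2]

def p_beats(mine, his):
    """Probability my randomly drawn card is higher than his."""
    wins = sum(1 for a, b in product(mine, his) if a > b)
    return Fraction(wins, len(mine) * len(his))

print(p_beats(U, S))  # you take U, John takes S: 2/5
print(p_beats(S, T))  # you take S, John takes T: 2/5
print(p_beats(T, U))  # you take T, John takes U: 2/5
```

Whichever stack you pick, John can pick one that beats it three-fifths of the time; no ties are possible since no value appears in two stacks.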
In short, if you bet at all, you will probably lose. Accordingly, you
shouldn't bet. Later on, in Chapter Five, we will see that, assuming
that you are fool enough to play many times, you are essentially
sure to come out behind. How long it takes for you to go broke will
be discussed in Chapter Eight.
Let us consider the problem of when we should call more than
two events independent. We begin with three events, A, B, and C.
We would not call the events independent unless

    P(A ∩ B ∩ C) = P(A)P(B)P(C).

Nor would we use the word unless each pair of the events is independent. Thus we call A, B, and C independent if all four of the following hold:

    P(A ∩ B ∩ C) = P(A)P(B)P(C),
    P(A ∩ B) = P(A)P(B),
    P(A ∩ C) = P(A)P(C),
    P(B ∩ C) = P(B)P(C).
a different toss of a die, for the reason we discussed above in the case
of two events, we take the events B1, B2, ..., Bm to be independent.
Exercises
1. Prove: If A and B are mutually exclusive, and A and B are also
independent, then either A or B has probability zero.
2. a. Prove: If A and B are independent, then so are A and B̄.
b. Prove: If A and B are independent, then so are Ā and B̄.
c. Give an example of events A, B, and C such that
P(A ∩ B ∩ C) = P(A)P(B)P(C),
but
P(A ∩ B) ≠ P(A)P(B).
3. Prove:
a. Suppose that
P(A ∩ B ∩ C) = P(A)P(B)P(C),
P(Ā ∩ B ∩ C) = P(Ā)P(B)P(C),
P(A ∩ B̄ ∩ C) = P(A)P(B̄)P(C)
and
P(A ∩ B ∩ C̄) = P(A)P(B)P(C̄).
Then, A, B, and C are independent.
b. Suppose A, B, and C are independent. Then so are Ā, B, and
C.
c. Suppose A, B, and C are independent. Then so are Ā, B̄, and
C.
d. Suppose A, B, and C are independent. Then so are Ā, B̄ and
C̄.
4. Prove: If

    P(A)/P(A ∩ B) + P(B)/P(A ∩ B) = 1/P(A) + 1/P(B),

then A and B are independent.
Since C(n, 0) = C(n, n) = 1,
the conclusion obviously holds for k = 0 and k = n. Thus, for any
integers n and k with 0 ≤ k ≤ n, the probability of exactly k successes in n
Bernoulli trials is

    C(n, k) p^k q^(n−k).
going and never get r successes? In the first place, we shall see later
that this cannot happen; sooner or later we will have the r successes.
In the second place, the question is irrelevant anyhow. We want to
find the probability that it takes just k trials to get the r successes.
After the kth trial, we shall know whether or not that event hap-
pened. To say that it takes just k trials to get r successes is to say that
the rth success occurs on the kth trial. Rewording our requirements
once more, we see we must have a success, the rth success, on the
kth trial, and we must have had r - 1 successes on the earlier trials.
We finally get to the point: It takes k trials to get r successes exactly
when there are r - 1 successes on the first k - 1 trials and then a suc-
cess on the kth trial. As we saw in the last paragraph, the probability
of r − 1 successes in k − 1 trials is

    C(k − 1, r − 1) p^(r−1) q^(k−r).

Multiplying this by p, the probability of a success on the kth trial, we find
that the probability that the rth success occurs on the kth trial is

    C(k − 1, r − 1) p^r q^(k−r).
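Both formulas are easy to evaluate exactly. A minimal Python sketch (the values r = 3, k = 5, p = 1/2 below are illustrative, not from the text):

```python
from fractions import Fraction
from math import comb

def prob_rth_success_on_kth_trial(r, k, p):
    """C(k-1, r-1) * p^r * q^(k-r): the rth success occurs on trial k."""
    q = 1 - p
    return comb(k - 1, r - 1) * p**r * q**(k - r)

# Example: with a fair coin, the 3rd head occurring on the 5th toss.
p = Fraction(1, 2)
print(prob_rth_success_on_kth_trial(3, 5, p))  # C(4,2)(1/2)^5 = 6/32 = 3/16
```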
Exercises
15. A coin is tossed repeatedly until it falls heads for the fourth
time. What is the probability that the fourth head occurs on the
seventh toss?
16. If 101 coins are tossed, what is the probability that at least 51 fall
heads?
**22. Two men, Ben and Frank, want to settle who will pay a dinner
check, which must be placed on a single credit card. To prolong
the excitement, they do not want to decide on the basis of a
single toss of a coin. Ben suggests that they each toss a coin 20
times and the one who gets the most heads wins. Frank points
out that there could be a tie. Ben then proposes the following:
Frank wins in case they toss the same number of heads, but Ben
will get to toss 21 times to Frank's 20. Is this fair?
3.3 The Most Likely Number of Successes

With

    a_k = C(n, k) p^k q^(n−k)

the probability of exactly k successes, for which k is a_k largest? Towards finding an answer, we compare
a_k to a_(k−1). Whether a_k is larger or smaller than a_(k−1) corresponds to
whether a_k/a_(k−1) is larger or smaller than 1. We have

    a_k/a_(k−1) = [n!/(k!(n−k)!)] p^k q^(n−k) ÷ [n!/((k−1)!(n−k+1)!)] p^(k−1) q^(n−k+1)
                = [(k−1)!(n−k+1)!]/[k!(n−k)!] · (p/q)
                = [(n − k + 1)/k] · (p/q)
                = (np − kp + p)/(kq).
Consider, for example, n = 8 trials with probability 4/5 of success on each
trial. Of course, the numbers in the table of probabilities are
approximations accurate to the number of decimal places shown.
The most likely number of successes is clearly seven, and we have
[p(n + 1)] = [7.2] = 7.
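Assuming the example's parameters are n = 8 and p = 4/5 (so that p(n + 1) = 7.2), the rule can be checked directly; a sketch:

```python
from fractions import Fraction
from math import comb, floor

def binom_pmf(n, k, p):
    """Probability of exactly k successes in n Bernoulli trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, Fraction(4, 5)
probs = {k: binom_pmf(n, k, p) for k in range(n + 1)}
mode = max(probs, key=probs.get)   # most likely number of successes
print(mode, floor(p * (n + 1)))    # both give 7
```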
Exercises
23. n coins are tossed. For each of n = 100, 101, 102, and 103, what
is the most likely number of coins to fall heads?
24. n dice are thrown. For each of n = 14, 15, 16, 17, 18, and 19,
what is the most likely number of dice to fall "six"?
25. Twenty-two red dice and ten blue dice are thrown. What is the
most likely number of "sixes" to be shown on:
a. the red dice?
b. the blue dice?
c. all the dice?
26. n pairs of dice are thrown. For each of n = 2, 4, 6, and 8, what is
the most likely number of pairs to total six?
27. Urn A contains two red balls and eight blue balls. Urn B contains
two red balls and ten green balls. Six balls are drawn from urn A
and four are drawn from urn B; in each case, each ball is replaced
before the next one is drawn. What is the most likely number of
blue balls to be drawn? What is the most likely number of green
balls to be drawn?
28. Prove: In a system of Bernoulli trials, the first success is more
likely to occur on the first trial than on any other trial.
29. Prove: Consider a system of Bernoulli trials and a fixed positive
integer s. Let

    x = [(s − q)/p].
With the exception noted below, the sth success is more likely
to occur on the xth trial than on any other trial. However, if
3.4 Conditional Probability
    1/6 = (1/15)/(2/5)
is the ratio of the number of days that both have high absenteeism
and are Monday or Friday to the number of days that are Monday
or Friday.
Let us now generalize. Let A and B be events. We define
    P(A | B) = P(A ∩ B)/P(B).
there are 3 aces left among the 51 cards. Thus, P(B | A) = 3/51.
Hence P(A ∩ B) = (4/52)(3/51) = 1/221.
2. Suppose three cards are drawn from the deck, without replace-
ment. What is the probability all three are aces? Let C be "The
first two cards are both aces." Let D be "The third card is an
ace." We seek P(C ∩ D) = P(D | C)P(C). By the last example,
P(C) = (4/52)(3/51). If the first two cards are aces, the third card is
chosen from 50 cards, of which 2 are aces. Thus, P(D | C) = 2/50.
Hence P(C ∩ D) = (4/52)(3/51)(2/50) = 1/5525.
3. What is the probability that all five cards in a poker hand are
spades? Since the problem is like the ones we just did about aces,
we shall be brief. The probability the first card is a spade is 13/52.
The probability that the first two cards are both spades is
(13/52)(12/51). The probability the first three cards are all spades is
(13/52)(12/51)(11/50). The probability the first four cards are all
spades is (13/52)(12/51)(11/50)(10/49). Thus the answer to the
original question is

    (13/52)(12/51)(11/50)(10/49)(9/48) = 33/66640.
Note that this answer could have been written down directly if
we had omitted the explanation.
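As a check on examples 2 and 3, the products of successive conditional probabilities can be computed exactly; a minimal sketch:

```python
from fractions import Fraction
from math import prod

def all_draws_hit(total, good, draws):
    """P(each of `draws` successive cards, drawn without replacement
    from `total` cards, is among the `good` ones)."""
    return prod(Fraction(good - i, total - i) for i in range(draws))

print(all_draws_hit(52, 4, 3))   # three aces in a row: 1/5525
print(all_draws_hit(52, 13, 5))  # five spades in a row: 33/66640
```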
4. Given that a poker hand (five cards) contains both black aces,
what is the probability that it contains at least three aces? We
work this problem by several different methods to illustrate both
the methods and the flexibility of the concept of conditional prob-
ability. The first method by which we solve the problem begins
by defining certain events:
    P(B | A) = P(A ∩ B)/P(A).
We find P(A) first. How many poker hands contain both black
aces? As many as there are ways to choose the other three cards
the hand contains in addition to the black aces. Thus there are
    C(50, 3) = (50 · 49 · 48)/(3 · 2 · 1) = 19,600
hands with both black aces; hence
    P(A) = 19,600/C(52, 5).
Now consider P(A ∩ B). We need hands with both black aces and
at least one additional ace. That includes the 48 hands with all
four aces. To get the hands with both black aces and one red ace,
we must pick one red ace out of two red aces and two non-aces
out of 48 non-aces. Thus there are 2 · C(48, 2) = 2256 such hands,
for a total of 48 + 2256 = 2304. Hence

    P(A ∩ B) = 2304/C(52, 5).
When we divide P(A ∩ B) by P(A), the denominators of C(52, 5)
cancel out and we get

    2304/19,600 = 144/1225
as our answer.
Our second method is basically the same as the first. The differ-
ence is that we anticipate the fact that the total number of poker
hands, C(52, 5), will cancel out. We simply compute the number, 19,600,
of hands that contain both black aces and determine that 2304 of
these hands contain at least three aces. Thus 2304/19,600 gives us
our answer.
The third method is an obvious modification of the second. In
the last paragraph, we particularly studied the three cards in the
poker hand besides the black aces. We now restrict our attention
entirely to these three cards. These three cards can be any of the 50
cards left when the black aces are excluded. The three cards are as
likely to be any three of the 50 as any other three of the 50. Thus we
are trying to find the probability that when three cards are chosen
from a certain 50, at least one ace is chosen. The obvious way to find
that probability is to begin by finding the probability that no aces are
chosen. Reasoning in this way, we find the answer to our original
question to be

    1 − C(48, 3)/C(50, 3) = 1 − 17,296/19,600 = 144/1225.
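The third method amounts to one line of counting; a sketch:

```python
from fractions import Fraction
from math import comb

# Three cards chosen from the 50 cards other than the black aces,
# of which 2 are (red) aces and 48 are non-aces.
p_no_ace = Fraction(comb(48, 3), comb(50, 3))  # none of the three is an ace
answer = 1 - p_no_ace
print(answer)  # 144/1225
```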
each i,
    P(A_i | B) = P(B | A_i)P(A_i) / [P(B | A_1)P(A_1) + P(B | A_2)P(A_2) + ··· + P(B | A_n)P(A_n)].
Proof. From the definition of conditional probability we have
    P(A_2 | B) = P(B | A_2)P(A_2) / [P(B | A_1)P(A_1) + P(B | A_2)P(A_2) + P(B | A_3)P(A_3) + P(B | A_4)P(A_4)].
Thus we have

    P(A_2 | B) = (1/16)(1/6) / [0 · (1/2) + (1/16)(1/6) + (5/32)(1/6) + (15/64)(1/6)] = 4/29.
We give another application of Bayes's Formula. Even when a
gizmo-making machine is in good working order it occasionally mal-
functions; each gizmo made has independently a 1 % chance to be
defective. A certain gizmo-making machine has just had a new part
installed. The first gizmo it makes is defective. It is known that one
in every 50 new parts installed is a dud. When a dud is used in a
gizmo-making machine, only 10% of its output is usable. Should the
new part be replaced?
All we can do is compute the probability that the part is a dud.
Let us introduce some events:
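Writing D for "the new part is a dud" and F for "the first gizmo is defective" (event names chosen here for illustration), Bayes's Formula gives the probability in question; a sketch of the computation:

```python
from fractions import Fraction

p_dud = Fraction(1, 50)             # one in every 50 new parts is a dud
p_def_given_dud = Fraction(9, 10)   # a dud part: only 10% of output usable
p_def_given_good = Fraction(1, 100) # a good part: 1% defect rate

# P(D | F) by Bayes's Formula.
p_dud_given_def = (p_def_given_dud * p_dud) / (
    p_def_given_dud * p_dud + p_def_given_good * (1 - p_dud)
)
print(p_dud_given_def, float(p_dud_given_def))  # 90/139, about 0.65
```

So one defective gizmo already makes a dud part more likely than not.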
Thomas Bayes
The Reverend Thomas Bayes (English, 1702-1761)
Thomas Bayes's father, Joshua, was one of the first six men to be
publicly ordained Nonconformist ministers. That is to say, when it
became legal to be a minister outside of the established Anglican
Church, Joshua Bayes was one of the first such ministers. He was a
respected theologian and also a fellow of the Royal Society. Thomas
followed in his father's footsteps, becoming minister of the Presbyte-
rian Chapel in Tunbridge Wells. Thomas Bayes published two works
in his lifetime. The first was about theology. The second, published
anonymously, was an attempt to refute Bishop Berkeley's objections
to calculus as illogical (George Berkeley, 1685-1753). Bayes did the
best that could be done at the time; it was not until the late nine-
teenth century that the reasoning behind calculus was brought up to
modern standards. Bayes was elected a fellow of the Royal Society
in 1742. He retired a rich man in 1750. His most important work,
the "Essay Towards Solving a Problem in the Doctrine of Chances,"
was not published until 1763, but was judged important enough to
be worth republishing for its ideas as recently as 1958.
Laplace
Pierre Simon, Marquis de Laplace (French, 1749-1827)
Laplace was born in Normandy; reports about the circumstances of
his family are conflicting. Laplace attended a school where half the
students had their expenses paid for by the king; most students went
on to the church or the army. Laplace demonstrated aptitude in all
areas and particularly distinguished himself in debating the subtle
points of theology. Nevertheless, his primary interest was mathe-
matics. Upon completing his studies at the school, he immediately
Exercises
30. Suppose A, B, and C are events none of whose probabilities are
zero. Prove:
a. P(A ∪ B) = 1 − P(B̄)P(Ā | B̄).
b. P(A ∩ B | B ∪ C) = P(A ∩ B | B)P(B | B ∪ C).
c. P(B ∩ C | A) = P(C | A)P(B | A ∩ C) if P(A ∩ C) ≠ 0.
d. P(A | B)P(B | C)P(C | A) = P(B | A)P(C | B)P(A | C).
e. P(A | A ∪ B)/P(B | A ∪ B) = P(A)/P(B).
39. Find the probability that a poker hand (five cards) contains both
black aces given that it contains at least three aces.
40. In a bridge game, Players Nand S each are dealt 13 cards. If
Player N has exactly two aces in his hand, what is the probability
that Player S has at least one ace?
41. Suppose that in a hog calling contest the more skillful participant
always wins. Suppose that Alphonse, Beatrice, and Charlene are
randomly chosen hog callers. If Alphonse beats Charlene in hog
calling, what is the probability that Alphonse is a better hog
caller than Beatrice?
42. A coin is tossed ten times. Given that heads occurred for the
first time on the second toss, what is the probability that heads
occurred for the fifth time on the last toss?
43. We select from a deck of cards the four kings and the four queens.
From these eight cards, we draw one card at a time, without
replacement, until all eight cards have been drawn. Find the
probability that:
a. All kings are drawn before the queen of spades.
b. There is at least one queen that is drawn after all the kings.
c. Each queen is drawn before each of the kings.
d. Each queen is drawn before the king of the same suit.
e. The last king to be drawn is the sixth card to be drawn.
44. A closet contains three pairs of black socks and two pairs of white
socks, all new and all of the same size and style. Two socks are
chosen at random. What is the probability that these two socks
can be worn together without exciting comment?
45. Do the last problem with "socks" replaced by "shoes" throughout.
46. Each employee of a certain large hotel works five days a week,
getting two consecutive days off. The days off of the various
employees are scattered evenly through the calendar week.
a. If Smith and Jones each work for the hotel, what is the
probability that they have a day off in common?
b. If Doe also works for the hotel, what is the probability that
no two of these three persons have a day off in common?
heads, two balls are drawn.) What is the probability that at least
one red ball is drawn?
54. Urn H contains six red balls and four white balls. Urn T contains
two red balls and three white balls. A coin is tossed. If it falls
heads, a ball is chosen from Urn H. If the coin falls tails, a ball
is chosen from Urn T.
a. If the ball is chosen from Urn H, what is the probability that
it is red?
b. What is the probability that the chosen ball is red and is
chosen from Urn H?
c. What is the probability that the chosen ball is red?
d. If a red ball is chosen, what is the probability that it came
from Urn H?
55. Urn A contains two red balls and three white balls. Urn B con-
tains five red balls and one blue ball. A ball is chosen at random
from Urn A and placed into Urn B. Then a ball is chosen at
random from Urn B.
a. What is the probability that both balls are red?
b. What is the probability that the second ball is red?
c. Given that the second ball is red, what is the probability that
the first ball was red?
d. Given that the second ball is blue, what is the probability that
the first ball was red?
56. Urn A contains six red balls and six blue balls. Urn B contains
four red balls and 16 green balls. A die is thrown. If the die
falls "six," a ball is chosen at random from Urn A. Otherwise, a
ball is chosen from Urn B. If the chosen ball is red, what is the
probability that the die fell "six"?
57. A medical test is not completely accurate. When people who
have a certain disease are tested, 90% of them have a "positive"
reaction. But 5% of people without the disease also have a "pos-
itive" reaction. In a certain city, 20% of the population have the
disease. A person from this city is chosen at random and tested;
if the reaction is "positive," what is the probability the person has
the disease?
role. Assume that, when two people play, the better player wins
three-fourths of the time. Alice and Bob are randomly chosen
members of the club. What is the probability Alice is the better
player if:
a. Alice and Bob play one game and Alice wins?
b. Alice and Bob play three games and Alice wins just two of
them?
c. Alice and Bob agree to decide a match on the basis of two out
of three and Alice wins the match?
d. Using a calculator to get approximate answers where conve-
nient, do parts band c with "three" and "two" replaced by
"nine" and "five," respectively.
**69. We again consider the Piquet Club of the last problem. Suppose
Chuck, Debby, and Ed are randomly chosen members of the
club.
a. If Chuck plays a game with Debby and Chuck wins, what is
the probability that Chuck is a better player than Ed?
b. If Chuck now also plays Ed and Chuck wins, what is the
probability that Chuck is a better player than Debby?
CHAPTER 4

Random Variables
coins that fall heads is a random variable X. The value of the coins
that fall heads would be another random variable Y. We should not
let the need for a formal definition obscure the basic simplicity of
the concept.
A function that assigns a number to each point in the sample
space is called a random variable. (Calling a function a variable, or a
variable a function, is not unusual, even ifit is somewhat confusing.)
One obvious way in which random variables arise is in the situation
where a bet is made on the outcome of our experiment. Then the
amount of money paid is a random variable X. In more detail, if
the point u of the sample space is the one that occurs, the number
X(u) assigned by X to u is the amount actually paid. Some very
important random variables came very close to being introduced
in the last chapter. The number of successes in a predetermined
number of Bernoulli trials is a random variable. So is the number of
trials needed to get a predetermined number of successes.
We next introduce some notation that is almost self-explanatory.
Suppose we are given a random variable X and a number t. When
our experiment is done, we get a particular point u of the sample
space, and X assigns a certain value X(u) to this point u. Either X(u)
is t or it is not; we consider the probability that X(u) = t. We write
P(X = t) for this probability. The event A involved here is simply the
set of those u ∈ Ω for which X(u) = t; P(X = t) means, by definition,
P(A).
    E(X) = Σ_{u ∈ Ω} P(u)X(u).
(3/8)2, where 3/8 is the probability that two of the coins fall heads.
The computation of E(X) then would be
It appears that we have a term in the sum for every number t. But
if P(X = t) is zero, which will be the case for almost all values of t,
the term is zero, and we simply ignore the term. Thus what the last
equation really says is that we may find E(X) as follows: For each
number t such that P(X = t) ≠ 0, multiply t by P(X = t) and then
add up all the products.
Let us consider another example of the use of this formula. A coin
is tossed ten times. John pays Tom $1 if the first toss is heads, $2 if
heads occurs for the first time on the second toss, $4 if heads occurs
for the first time on the third toss, etc. In other words, for each of
n = 1, 2, ..., 10, if heads occurs for the first time on the nth toss, John
pays Tom 2^(n−1) dollars. If, however, heads never is thrown, that is, if
all ten tosses result in tails, Tom must pay John $5,000. We seek to
find out which player has an advantage, and how big this advantage
is. Let X be the net amount John pays Tom; in particular, if tails
does come up all ten times, X would be −5000. E(X) is computed
as follows: P(X = 1) = 1/2, since John pays $1 exactly when the
first throw is heads; 1(1/2) = 1/2. The probability that $2 is paid is
(1/2)(1/2) = 1/4; 2(1/4) = 1/2. Likewise, tP(X = t) = 1/2 for each
of t = 4, 8, ..., 512. The remaining value t for which P(X = t) ≠ 0
is t = −5000; P(X = −5000) = 1/2^10. Thus E(X) is the sum of
(−5000)(1/2^10) and ten terms each equal to 1/2. In short,

    E(X) = 10 · (1/2) − 5000/2^10 = 15/128 ≈ 0.12,

so on the average John pays Tom about 12 cents per game.
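The expected value just described can be checked term by term; a minimal sketch:

```python
from fractions import Fraction

# John pays $2^(n-1) if heads first occurs on toss n = 1, ..., 10;
# Tom pays $5000 (so X = -5000) if all ten tosses are tails.
E = sum(Fraction(1, 2)**n * 2**(n - 1) for n in range(1, 11))
E += Fraction(1, 2)**10 * (-5000)
print(E, float(E))  # 15/128, about $0.12 in Tom's favor
```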
= E(X) + E(Y). □

Corollary. Let X_1, X_2, ..., X_n be random variables. Then

    E(X_1 + X_2 + ··· + X_n) = E(X_1) + E(X_2) + ··· + E(X_n).
will turn out, although it is not obvious at this point, that studying
(X - m)2 gives us a particularly nice theory. Therefore we will study
(X - m)2.
We denote E([X − E(X)]²) by Var(X). We call Var(X) the variance
of X. The reasons for studying the variance of X were discussed in
the last paragraph: Var(X) tells us the extent to which X fluctuates
from one performance of the experiment to another.
Let us illustrate with an example. Moe and Joe have an agreement
whereby Moe will pay Joe an amount determined by the throw of
a die. In detail, the amount to be paid is described in the first two
columns of the table below: If u is the number shown on the die,
then X(u) is the amount paid.
u X(u) Y(u) X(u) - 10 Y(u) -10 [X(u) - 10]2 [Y(u) - 10]2
1 7 -2990 -3 -3000 9 9000000
2 8 -1990 -2 -2000 4 4000000
3 9 -990 -1 -1000 1 1000000
4 11 1010 1 1000 1 1000000
5 12 2010 2 2000 4 4000000
6 13 3010 3 3000 9 9000000
Using the table, it is easy to find the expected value of the random
variable X. We have
E(X) = (1/6)·7 + (1/6)·8 + (1/6)·9 + (1/6)·11 + (1/6)·12 + (1/6)·13 = 10.
We contrast Moe and Joe with Sue and Pru. Sue and Pru have an
agreement similar to that of Moe and Joe. But the amount Y that Sue
pays Pru is described by the first and third columns of the table. For
example, if the die falls "two," Sue pays Pru $-1990; in other words,
in that case, Pru pays Sue $1990. We compute
E(Y) = (1/6)(-2990) + (1/6)(-1990) + (1/6)(-990) + (1/6)(1010)
+ (1/6)(2010) + (1/6)(3010) = 10.
Thus, E(X) = E(Y). The difference between X and Y is revealed only
when we examine their variances.
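A short Python sketch (an illustration, not from the text) makes the contrast concrete by computing the mean and variance of X and Y exactly from the table:

```python
from fractions import Fraction

# The payment tables for the two agreements (columns X and Y of the table).
X = {1: 7, 2: 8, 3: 9, 4: 11, 5: 12, 6: 13}
Y = {1: -2990, 2: -1990, 3: -990, 4: 1010, 5: 2010, 6: 3010}

def mean(rv):
    # each die face has probability 1/6
    return sum(Fraction(v, 6) for v in rv.values())

def variance(rv):
    m = mean(rv)
    return sum(Fraction(1, 6) * (v - m) ** 2 for v in rv.values())

# Both variables have mean 10, but their variances differ enormously.
print(mean(X), mean(Y))          # 10 10
print(variance(X), variance(Y))  # 14/3 versus 14000000/3
```

The identical means and wildly different variances are exactly the point of the example: Var measures fluctuation, not average.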
as required. □
We next work a simple problem two ways, first directly and then
using the formula just derived. As we shall see, in a simple problem it
makes little difference which method we use. In a more complicated
situation, the advantage of the new formula is more substantial. To
get to the point: Suppose a die is thrown. Bob pays Ray $2 if the
die falls "one," "two," or "three." And Bob pays Ray $3 if the die falls
"four" or "five." But if the die falls "six," Bob pays Ray $600. In seeking
Var(X), we must first find E(X). We have
E(X) = (1/2)·2 + (1/3)·3 + (1/6)·600 = 102.
Directly from the definition of Var(X) we have
Var(X) = (1/2)(-100)^2 + (1/3)(-99)^2 + (1/6)(498)^2 = 49601.
With easier arithmetic we can find
E(X^2) = (1/2)·2^2 + (1/3)·3^2 + (1/6)·600^2 = 60005.
Thus
Var(X) = E(X^2) - [E(X)]^2 = 60005 - 102^2 = 49601,
as before.
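Both routes to Var(X) are easy to mechanize; this Python sketch (an illustration, not from the text) computes the variance directly from the definition and by the formula E(X^2) - [E(X)]^2:

```python
from fractions import Fraction

# Bob pays Ray: $2 with probability 1/2, $3 with probability 1/3,
# and $600 with probability 1/6.
dist = {2: Fraction(1, 2), 3: Fraction(1, 3), 600: Fraction(1, 6)}

m = sum(t * p for t, p in dist.items())                   # E(X) = 102
var_direct = sum(p * (t - m) ** 2 for t, p in dist.items())
second_moment = sum(p * t ** 2 for t, p in dist.items())  # E(X^2) = 60005
var_formula = second_moment - m ** 2

print(m, second_moment)         # 102 60005
print(var_direct, var_formula)  # both 49601
```

The two methods agree, as they must; the second avoids subtracting the mean from each value.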
Exercises
1. John provides and tosses a dime and a half-dollar. Richard gets
to keep whichever coins fall heads.
a. Find the expected value of the amount Richard gets.
b . How much should Richard pay John in advance to make the
game fair?
2. Fred tosses two coins. If both fall heads, he wins $10. If just one
falls heads, he wins $4. But if both coins fall tails, he must pay a
$2 penalty. Find the expected value of Fred's gain.
3. In a certain lottery, 5,000,000 tickets are sold for $1 each.
1 ticket wins a prize of $1,000,000.
10 tickets win prizes of $100,000 each.
100 tickets win prizes of $1,000 each.
10,000 tickets win prizes of $10 each.
1,000,000 tickets receive a refund of the purchase price.
a. Find the mean value of the amount a ticket gets.
b. Find the expected net gain for each ticket.
4. Six coins are tossed. Alice pays Betty according to the following
table:
If no heads $200
If 1 head $50
If 2 heads $10
If 3 heads $5
If 4 heads $20
If 5 heads $25
If 6 heads $80
If X is the amount Alice pays, find E(X) .
5. Alan tosses a coin 20 times. Bob pays Alan $1 if the first toss falls
heads, $2 if the first toss falls tails and the second heads, $4 if
the first two tosses both fall tails and the third heads, $8 if the
first three tosses fall tails and the fourth heads, etc. If the game
is to be fair, how much should Alan pay Bob for the right to play
the game?
6. Find the mean and variance of each of the random variables
described below; each of parts a-o refers to a different random
variable.
a. P(X = -1) = 1/4, P(X = 0) = 1/2, P(X = 1) = 1/4.
b. P(X = -1) = 1/4, P(X = 0) = 1/2, P(X = 5) = 1/4.
c. P(X = -5) = 1/4, P(X = 0) = 1/2, P(X = 5) = 1/4.
d. P(X = -5) = .01, P(X = 0) = .98, P(X = 5) = .01.
e. P(X = -50) = .0001, P(X = 0) = .9998, P(X = 50) = .0001.
f. P(X = 1) = 1.
g. P(X = 0) = 1/2, P(X = 2) = 1/2.
h . P(X = .01) = .01, P(X = 1.01) = .99.
i. P(X = 0) = .99, P(X = 100) = .01.
j. P(X = 0) = .999999, P(X = 1000000) = .000001.
k. P(X = 0) = 1/2, P(X = 2) = 1/2.
l. P(X = 0) = 3/5, P(X = 2) = 1/5, P(X = 3) = 1/5.
m. P(X = 0) = 4/7, P(X = 2) = 2/7, P(X = 3) = 1/7.
n. P(X = 0) = 5/8, P(X = 2) = 1/8, P(X = 3) = 1/4.
o. P(X = 0) = 2/3, P(X = 3) = 1/3.
7. A coin is tossed repeatedly until heads has occurred twice or
tails has occurred twice, whichever comes first. Let X be the
number of times the coin is tossed. Find:
a. E(X) .
b . Var(X) .
that Ω may contain only finitely many points. Use your best
judgement as to how to proceed.)
14. A box contains one twenty-dollar bill and four one-dollar
bills. Two bills are randomly drawn, one at a time, without
replacement.
a. Find the expected value of the bill drawn first.
b. Find the expected value of the total amount drawn.
15. A hat contains two tickets each marked $2, one ticket marked $4,
and one ticket marked $20. Mr. Smith draws a ticket and keeps
it. Then Ms. Jones draws a ticket. Each receives the number of
dollars stated on the ticket. Let X be the amount Mr. Smith gets
and Y the amount Ms. Jones gets. Find:
a. E(X).
b. E(Y).
c. Var(X).
d. Var(Y).
16. A hat contains one thousand-dollar bill and four five-dollar bills.
Five persons, one at a time, each draw a bill at random and keep
it. Let Xl be the value of the first person's bill, X2 the value of
the second person's bill, etc. Find:
a. E(X1).
b. E(X5).
c. Var(X1).
d. Var(X3).
e. E(X1 + ... + X5).
f. Var(X1 + ... + X5).
E(XY) = E(X)E(Y)
and
No. of heads   Probability   Value of X   Probability times value of X
0 1/32 5 5/32
1 5/32 3 15/32
2 10/32 1 10/32
3 10/32 1 10/32
4 5/32 3 15/32
5 1/32 5 5/32
Total 60/32
Thus E(X) = 60/32 = 15/8.
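The table can be regenerated in a few lines of Python (an illustration, not from the text); the "Value of X" column matches |heads − tails| = |2k − 5| for k heads out of five tosses, which I assume is the random variable intended:

```python
from fractions import Fraction
from math import comb

# Five fair coins: P(k heads) = C(5, k)/32; take X = |heads - tails| = |2k - 5|.
expectation = sum(Fraction(comb(5, k), 32) * abs(2 * k - 5) for k in range(6))
print(expectation)  # 15/8, the 60/32 of the table
```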
E(XY) = E(X)E(Y).
Proof Let t1, ..., tn be all the numbers r for which P(X = r) ≠ 0; let
s1, ..., sm be the numbers r for which P(Y = r) ≠ 0. Then
E(X) = t1 P(X = t1) + ... + tn P(X = tn),
E(Y) = s1 P(Y = s1) + ... + sm P(Y = sm).
If we multiply together the sums on the right in these two equations,
we get a sum of terms of the form
ti sj P(X = ti) P(Y = sj);
in this last sum, we have one such term for each choice of i and j.
Since X and Y are independent, P(X = ti)P(Y = sj) = P(X = ti, Y = sj).
Let Bij consist of those u ∈ Ω such that X(u) = ti and Y(u) = sj. We
have just seen that E(X)E(Y) is the sum of ti sj P(Bij) over all choices
of i and j. Given u ∈ Ω, there is just one i for which X(u) = ti and
just one j for which Y(u) = sj; thus u belongs to just one of the sets
Bij. In other words, the sets Bij form a partition of Ω. If u ∈ Bij, then
X(u) = ti, Y(u) = sj, and hence (XY)(u) = ti sj. Thus by the lemma,
E(XY) is the sum of ti sj P(Bij) over all choices of i and j. Since we have
already noted that E(X)E(Y) is equal to the same sum, the proof is
complete. □
We have
E(X^2) = (1/6)·1 + (1/6)·4 + (1/6)·9 + (1/6)·16 + (1/6)·25 + (1/6)·36 = 91/6,
and so
Var(X) = 91/6 - (7/2)^2 = 91/6 - 49/4 = 35/12.
Var(tX) = t^2 Var(X).
Proof
Var(tX) = E((tX)^2) - [E(tX)]^2 = E(t^2 X^2) - [tE(X)]^2
= t^2 E(X^2) - t^2 [E(X)]^2 = t^2 Var(X). □
We return to the 100 dice thrown just before the statement of the
theorem. Let Y be the average number of spots shown on the 100
dice. In other words, let Y = X/100. We have E(Y) = E(X/100) =
E(X)/100 = 7/2. Also, Var(Y) = Var(X/100) = Var(X)/10000 =
7/240. Note that the expected value is the same as that for throwing
a single die, but the variance is much less than that for a single die.
We leave a discussion of the significance of this computation for the
next chapter.
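The scaling rules used here, additivity of variance for independent variables and Var(tX) = t^2 Var(X), can be sketched in Python (an illustration, not part of the text):

```python
from fractions import Fraction

# One fair die: mean 7/2 and variance 35/12, computed from the definition.
faces = range(1, 7)
m = sum(Fraction(k, 6) for k in faces)                      # 7/2
var_one = sum(Fraction(1, 6) * (k - m) ** 2 for k in faces) # 35/12

# X = total of 100 independent dice; Y = X/100 is the average.
var_sum = 100 * var_one        # variances of independent variables add
var_avg = var_sum / 100 ** 2   # Var(tX) = t^2 Var(X) with t = 1/100

print(m, var_one)  # 7/2 35/12
print(var_avg)     # 7/240
```

The variance of the average shrinks by a factor of 100 relative to a single die, which is the phenomenon the next chapter takes up.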
Now we are ready to remove the restriction that the sample
space contain only finitely many points. Suppose Ω contains in-
finitely many points. In accordance with the assumption we made
in Chapter One, we may designate these points as u1, u2, u3, .... We
make the obvious definition of E(X); that is, we let
E(X) = Σ_{j=1}^{∞} P(uj) X(uj).
There are some problems: The series may be divergent. In that case,
X has no expected value. The choice of which point of Ω is u1, which
u2, etc. is arbitrary. If we change the order, we may change the sum
of the series. To avoid this difficulty, we agree to define E(X) only
when the series above is absolutely convergent. It is a property of
such series that the series remains absolutely convergent with the
same sum if the terms are put into a different order. To repeat, if the
series above is not absolutely convergent, X has no expected value.
If the series is absolutely convergent, we say, "E(X) exists." With the
new definition of E(X), all the theory we just did goes through with
obvious minor modifications. We give a description of the changes
necessary in the next paragraph.
We restate the theorems just discussed here partly to include
the case of infinite Ω and partly just for the convenience of having
them all together. We shall not worry about proofs; the proofs are
essentially unchanged from the finite case, except that we must
use the properties of infinite series. Note that the definition of
variance, Var(X) = E([X - E(X)]^2), still makes sense. If X has no
expected value, then it has no variance. If X has an expected value
but [X - E(X)]2 does not, then X still has no variance. Now we list
the theorems:
Theorem Let X and Y be random variables. Suppose E(X) and E(Y)
exist. Then E(X + Y) exists and
E(X + Y) = E(X) + E(Y).
Theorem Let X be a random variable and t be a number. Suppose
E(X) exists. Then E(tX) exists and
E(tX) = tE(X).
Theorem Let X be a random variable. Suppose E(X) and E(X^2) exist.
Then Var(X) exists and
Var(X) = E(X^2) - [E(X)]^2.
Theorem Let X and Y be independent random variables. Suppose E(X)
and E(Y) exist. Then E(XY) exists and
E(XY) = E(X)E(Y).
Theorem Suppose Xl, ... , Xn are independent random variables, each
having a variance. Then Var(Xl + ... + Xn) exists and
Var(X1 + ... + Xn) = Var(X1) + ... + Var(Xn).

4.2. Computation of Expected Value and Variance

Theorem Let X be a random variable and t a number. Suppose Var(X)
exists. Then Var(tX) exists and
Var(tX) = t^2 Var(X).
There is a certain method for finding expected values that often
greatly simplifies their computation. We shall illustrate this method
in several examples, the first two of which are very important. The
method is absurdly simple. Suppose we want to find the expected
value of a somewhat complicated random variable X. If we can find
much simpler random variables X1, ..., Xk such that X = X1 + ... + Xk,
we can use the fact that E(X) = E(X1) + ... + E(Xk) to find E(X).
Further details of the method become clear in the examples we
work next.
Suppose we are going to make n Bernoulli trials. We let p and
q have their usual meanings. Let X be the number of successes we
shall get; we seek E(X). Let
Next we define
P(A) = Σ_{k∈A} P(k)
Example 1
Suppose a committee with six members is to be formed by randomly
selecting six of the twelve senators from the New England states.
X1 = { 1 if Maine is represented,
  0 if Maine is not represented.
Likewise, define X2, ..., X6 corresponding to the other five states (in
any order). Then, if the number of states represented is denoted
by X, we have X = X1 + ... + X6. Since clearly E(X1) = E(X2) =
... = E(X6), it follows that E(X) = E(X1) + ... + E(X6) = 6E(X1). To
find E(X1), we find the probability that Maine is represented. The
probability that Maine is not represented is
(10 choose 6)/(12 choose 6) = 210/924 = 5/22.
Thus E(X1) = 1·P(X1 = 1) + 0·P(X1 = 0) = P(X1 = 1) = 1 - 5/22 =
17/22. We conclude that E(X) = 6(17/22) = 51/11. □
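A short exact computation in Python (an illustration; the senator counts are those of the example) reproduces these numbers:

```python
from fractions import Fraction
from math import comb

# Six of the twelve New England senators (two per state) are chosen at
# random.  A given state is missing exactly when both of its senators
# are left out, i.e. all six choices come from the other ten senators.
p_state_missing = Fraction(comb(10, 6), comb(12, 6))
expected_states = 6 * (1 - p_state_missing)  # E(X) = 6 E(X1)

print(p_state_missing)  # 5/22
print(expected_states)  # 51/11
```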
Example 2
Suppose that an urn contains 100 balls, numbered from 1 to 100. Jane
is to draw balls at random, one at a time, without replacement, until
she draws a ball with a lower number than one she drew earlier. She
will be paid $1 for each ball she draws. How much does she get? Of
course we mean: how much does she get on the average?
Now, for each positive integer r, the probability that Jane draws
just r balls is not hard to find. But we shall see that we do not need
these probabilities; there is an easier way to do the problem. The first
step we take is the same whether we find the probabilities or not. We
change the description of the procedure Jane follows to provide that
she continues to draw balls until all the balls are drawn. However,
she still is paid a dollar only for each ball up to and including the
first ball that has a lower number than a ball drawn earlier. Thus
she receives k or more dollars if and only if the first k - 1 balls are
drawn in numerical order; the probability of this event is clearly
1/(k - I)!. Let X be the amount Jane receives. We define random
variables Xl, ... ,XlOO as follows:
Xk = { 1 if Jane receives at least k dollars,
  0 if Jane receives no more than k - 1 dollars.
In forming X1 + X2 + ... + X100, we add up zeros and ones; we con-
sider how many ones. In those circumstances where Jane receives r
dollars, Xk = 1 exactly when k ≤ r; thus in this case there are r ones.
It follows that X = X1 + ... + X100. Thus E(X) = E(X1) + ... + E(X100).
We already noted, but stated it a different way, that P(Xk = 1) =
1/(k - 1)!. Thus, E(Xk) = 1/(k - 1)!, and hence
E(X) = 1/0! + 1/1! + 1/2! + ... + 1/99!.
E(X) is thus very, very close to
e = 1 + 1 + 1/2! + 1/3! + ⋯ .
To the nearest cent, Jane receives $2.72 on the average.
The reasoning just completed would work even if we had con-
sidered starting with a different number of balls. We would have
obtained
E(X) = 1 + 1 + 1/2! + 1/3! + ... + 1/(n - 1)!
if we had used n balls. The surprising thing is how little the amount
Jane gets depends on the number of balls originally in the urn. The
problem really only makes sense with at least three balls; thus we
can say E(X) is always between $2.50 and $2.72. It is $2.72 to the
nearest cent if there are at least six balls in all. 0
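The formula E(X) = Σ 1/k! is easy to tabulate; this Python sketch (an illustration, not from the text) shows how quickly the expected payout approaches e as the number of balls grows:

```python
from fractions import Fraction
from math import factorial, e

# E(X) for Jane's game with n balls: since P(X >= k) = 1/(k-1)!,
# E(X) = 1/0! + 1/1! + ... + 1/(n-1)!.
def expected_payout(n):
    return sum(Fraction(1, factorial(k)) for k in range(n))

for n in (3, 6, 100):
    print(n, float(expected_payout(n)))  # rises from 2.5 toward e = 2.71828...
```

Even with only six balls the payout already rounds to $2.72, illustrating how weakly the answer depends on the size of the urn.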
Example 3
Suppose we have 100 letters, each addressed to a different person by
name. We prepare an envelope for each letter. Suppose we are called
away for a while. In our absence, a helpful, but not very clever, per-
son puts the letters in the envelopes and mails them off. How many
people get the correct letter? Obviously, that number is a random
variable X. The probabilities of X taking various values are some-
what hard to find; we defer the computation of these probabilities
to Chapter Seven. However, E(X) is very easy to find. Number the
letters from 1 to 100. Let
Xi = { 1 if the ith letter is in the correct envelope,
  0 otherwise.
Then X = X1 + ... + X100; hence, E(X) = E(X1) + ... + E(X100). Since
clearly E(X1) = ... = E(X100), we have E(X) = 100E(X1). P(X1 = 1)
is the probability that the first letter gets in the correct one out of
100 envelopes. Thus, P(X1 = 1) = 1/100, and hence E(X1) = 1/100.
Therefore E(X) = 1; on the average, just one person gets the right
letter. It is somewhat surprising that the number 100 did not affect
the final answer; we would have gotten E(X) = 1 even if we had used
any other number in place of 100. This problem is a good example
of a situation in which it is easy to find an expected value, but hard
to find the probabilities behind it. 0
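A Monte Carlo check in Python (the trial count and seed are arbitrary choices, not from the text) confirms that a random matching of 100 letters to 100 envelopes yields, on average, about one correct delivery:

```python
import random

# Simulate the careless helper: a random permutation of the envelopes,
# counting fixed points (letters that land in their own envelope).
random.seed(1)

def correct_deliveries(n):
    envelopes = list(range(n))
    random.shuffle(envelopes)
    return sum(1 for i, env in enumerate(envelopes) if i == env)

trials = 20000
average = sum(correct_deliveries(100) for _ in range(trials)) / trials
print(average)  # close to 1
```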
Exercises
17. Suppose X and Y have the same mean and have the same
variance. Suppose also X and Y are independent. Show that
E((X - Y)^2) = 2Var(X).
19. Suppose that X and Y are independent and that each has mean
3 and variance 1. Find the mean and variance of X + Y and XY.
20. Suppose X and Y are random variables such that E(XY) = 0.
Suppose also each of X and Y has mean 1 and variance 3. Find
the variance of X + Y.
21. Show that
27. Cards are drawn as in the last problem. Let Y be the number of
draws necessary so that ten of the draws result in spades. Find
E(Y).
28. Suppose that in bobbing for apples there is a probability of 2/3
of getting an apple on any particular try.
a. Let X be the total number of apples obtained in eight tries.
Find the mean and standard deviation of X.
b. Let Y be the total number of tries it takes to get a total of four
apples. Find E(Y).
29. Suppose in a certain game a player receives $1 for each "five"
and each "six" thrown on a single die.
a. If the die is thrown six times, find the mean and variance of
the amount the player receives.
b. If the player throws repeatedly until he gets $10, find the
mean of the number of throws necessary.
30. Alice and Betty each provide and toss 100 dimes. Alice keeps all
the dimes that fall heads and Betty keeps those that fall tails. Let
X be Alice's net gain in cents. Find E(X) and Var(X).
31. A die is thrown and eight coins are tossed. Let X be the number
shown on the die and Y be the number of coins that fall heads.
Find:
a. E(X) .
b. E(Y) .
c. Var(X).
d. Var(Y).
e. E(X + Y).
f. Var(X + Y).
g. E(XY).
h. Var(XY).
32. A Kennedy half-dollar, an Eisenhower dollar, an Anthony dollar,
and a Roosevelt dime are tossed once. Ms. Kear gets to keep all
that fall heads. What are the mean and variance of the amount
she gets to keep?
33. $3.00 worth of nickels and $2.40 worth of dimes are tossed. Find
the mean and variance of the value of the coins that fall heads.
34. Ten dimes, 30 Eisenhower dollars, 20 Anthony dollars, and 20
nickels are tossed. Ms. Dean gets to keep those that fall heads.
Let X be the number of coins Ms. Dean gets and Y be their value
in cents. Find:
a. E(X) .
b . E(Y).
c. Var(X).
d. Var(Y).
e. E(X + Y).
f. Var(X + Y) .
35. A bag contains 12 coins as follows: 3 nickels, 4 dimes, and 5
quarters. Four coins are drawn from the bag at random without
replacement. Let X be the value of the coins drawn. Find E(X).
36. Assume that 40% of the "public" likes a certain television pro-
gram. Find the mean and standard deviation of the number of
persons in a randomly chosen group of 100 that like the program.
37. A coin is tossed repeatedly until tails appears. Find the mean of
the number of tosses necessary.
38. A coin is tossed repeatedly until heads has appeared four times.
Find the mean of the number of tosses necessary.
39. A coin is tossed repeatedly until either tails appears or heads
has appeared four times, whichever comes first. Find the mean
and variance of the number of tosses necessary.
40. Eight United States pennies and four Canadian pennies are
tossed. Let X be the number of U.S. pennies and Y the number
of Canadian pennies that fall heads. Complete the table below.
(Warning: The hard part of this problem is to decide in what
order the required items should be computed.)
E(X) =       E(X^2) =        Var(X) =
E(Y) =       E(Y^2) =        Var(Y) =
E(X + Y) =   E((X + Y)^2) =  Var(X + Y) =
E(XY) =      E((XY)^2) =     Var(XY) =
The last chapter covered the basic facts about random variables. In
this chapter we discuss a number of different topics related only in
that all involve random variables. The first section covers some very
important theoretical matters that are necessary for understanding
the basic ideas of probability theory. The second section discusses
the computation of expected values in certain circumstances; this
material is used extensively in Chapters Eight and Nine. The optional
third section is concerned with finding variances. The three sections
may be read in any order.
112 5. More About Random Variables
simple assumptions forced our theory to work out along the lines we
envisaged. We are almost ready to make a precise statement about a
coin falling heads half the time, but first we need a few more facts.
Let X be a random variable. We suppose that X has a variance
and, consequently, a mean. We denote the mean of X by m and
the standard deviation of X by σ; that is, we set m = E(X) and
σ^2 = Var(X). We seek to study the probability that X takes a value
"far" from m. We must first say what "far" means. In fact, an arbitrary
standard of closeness will do. Let t be any positive number. We
investigate how likely it is that X takes a value differing from m by t
or more. In other words, we consider P(|X - m| ≥ t). The definition
of variance is
σ^2 = Σ_u (X(u) - m)^2 P(u).
Let A be the set of all u such that |X(u) - m| ≥ t, and split the sum
into the terms with u ∈ A and the remaining terms. Each term of the
second sum is nonnegative; thus that sum itself is nonnegative. We
have then
σ^2 ≥ Σ_{u∈A} (X(u) - m)^2 P(u) ≥ Σ_{u∈A} t^2 P(u) = t^2 P(|X - m| ≥ t).
Dividing by t^2, we have
σ^2/t^2 ≥ P(|X - m| ≥ t).
This last inequality was first established by Irénée-Jules Bienaymé
(1795-1878). Nevertheless it is usually called the Chebyshev Inequal-
ity. We shall explain this name, and say more about Chebyshev, after
we study the implications of the inequality.
5.1. The Law of Large Numbers 113
for every c > 0; this form is obtained by simply setting t = cσ in the
original form of the Chebyshev Inequality.
Let us think about what the new form of the Chebyshev Inequal-
ity says. We do this by considering examples. First suppose c = 2.
Then we have
P(|X - m| ≥ 2σ) ≤ 1/4.
Next suppose c = 10. Then we have P(|X - m| ≥ 10σ) ≤ 1/100, which
makes it clear that it is unlikely for any random variable to miss its
mean by as much as ten standard deviations. The specific number
1/100 could quite possibly be much reduced if we had additional
information. Finally we give a silly example. Let c = 1/2. Then we
have
P(|X - m| ≥ σ/2) ≤ 4.
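It can be instructive to compare the Chebyshev bound 1/c^2 with the exact probability for a concrete random variable; this Python sketch (an illustration, not from the text) does so for a fair die:

```python
from fractions import Fraction

# Chebyshev: P(|X - m| >= c*sigma) <= 1/c^2.  Exact check for one fair die.
faces = range(1, 7)
m = sum(Fraction(k, 6) for k in faces)                   # 7/2
var = sum(Fraction(1, 6) * (k - m) ** 2 for k in faces)  # 35/12

def prob_far(c):
    """Exact P(|X - m| >= c*sigma), i.e. P((X - m)^2 >= c^2 * sigma^2)."""
    return sum(Fraction(1, 6) for k in faces if (k - m) ** 2 >= c ** 2 * var)

for c in (1, 2, 3):
    print(c, prob_far(c), "bound", Fraction(1, c * c))
```

For the die the bound is far from tight (already at c = 2 the exact probability is 0), which illustrates that Chebyshev trades sharpness for complete generality.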
back,
P(|Y - p| ≥ t) ≤ 1/(nt^2),
that is,
a ≤ 1/(nt^2).
Let N = 1/(st^2). Then, for n ≥ N, we have
a ≤ 1/(nt^2) ≤ 1/(Nt^2) = st^2/t^2 = s,
oped until a long time after Bernoulli proved his theorem. Bernoulli's
proof involved very complicated estimates of the probabilities of var-
ious numbers of successes. While these computations were far from
easy, they do not constitute Bernoulli's real achievement. As a mat-
ter of fact, as we shall see in the next chapter, DeMoivre was able
to reach a much more detailed conclusion only a few years after
Bernoulli published his theorem. But the basic idea of a law of large
numbers remains of great importance.
Probability theory forms only a small part of the mathematical
work of Bernoulli; this part was included in his book, Ars conjectandi.
However, even in that book, Bernoulli included much that is no
longer regarded as belonging to probability theory. The history of
the book is somewhat complicated and involves four Bernoullis.
When Jakob Bernoulli died in 1705, his manuscript was inherited by
his brother, Nikolaus (1662-1716) . Nikolaus turned the manuscript
over to his son, also Nikolaus (1687-1759), who prepared it for pub-
lication. By the time the book was finally published, the first edition
of Montmort's book on probability had already appeared. We shall
refer to the correspondence between Johann Bernoulli (1667-1748),
another brother of Jakob, and Montmort in Chapter Eight. Thus it is
far from clear which Bernoulli did what.
Jakob Bernoulli
(a/k/a James B. and Jacques B.) (Swiss, 1654-1705)
The Bernoulli family left Antwerp in 1583, going first to Frankfort
and then to Basel, to escape persecution as Protestants by the Span-
ish. Having settled in Switzerland, a country of many languages, the
family continued to use many languages; the result is that the first
problem in sorting out the many Bernoullis is that each of them is
often referred to under a number of translations of what is basically
the same name. Even more confusing is that they, as many families
do, tended to use the same first name for different members of the
family. The following table is intended to help the reader keep track
of the names. The table includes only those Bernoullis we shall have
occasion to mention in this book. We omit the obvious variations in
spelling, for example, Jacob and Jacobi for Jakob, that often appear
in print.
First Generation
Nikolaus (1623-1708)
Second Generation
(Sons of Nikolaus (1623-1708))
Jakob = James = Jacques (1654-1705)
Nikolaus (1662-1716)
Johann = John = Jean (1667-1748)
Third Generation
Nikolaus (1687-1759), son of Nikolaus (1662-1716)
Daniel (1700-1782), son of Johann (1667-1748)
The three most eminent Bernoullis were Jakob, Johann, and Daniel.
They were, successively in that order, professors of mathematics at
Basel. Jakob and Johann are often known as the "Bernoulli broth-
ers." [The third brother, Nikolaus (1662-1716), was a senator and a
painter, rather than a mathematician.] Daniel made many discover-
ies in mathematical physics, and the Bernoulli Principle is named
for him. Let us now turn to Jakob, whose work in probability is more
important than that of the others. Jakob became a mathematician
despite his father, who insisted that he study theology. Jakob was
forced to teach himself mathematics, and even to hide his mathemat-
ics books from his father. As a young man, Jakob taught mathematics
to a blind girl; later he wrote a book on how to teach mathematics
to the blind. Also in his youth, he travelled extensively, meeting
the leading mathematicians and physicists of the day. In 1687, he
became a professor at Basel. Jakob Bernoulli suggested the use of
the word "integral," as a term in calculus, to Leibniz.
P(|(X1 + ... + Xn)/n - m| ≥ t) ≤ s.
Proof Let
Y = (X1 + ... + Xn)/n.
Then we have
E(Y) = (1/n)E(X1 + ... + Xn) = (1/n)(m + ... + m)  (n terms)
= m,
and we have
Var(Y) = (1/n^2)Var(X1 + ... + Xn)
= (1/n^2)[Var(X1) + ... + Var(Xn)]
= (1/n^2)(σ^2 + ... + σ^2)  (n terms)
= σ^2/n.
Applying the Chebyshev Inequality, we have
P(|Y - m| ≥ t) ≤ σ^2/(t^2 n).
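The bound P(|Y - m| ≥ t) ≤ σ^2/(t^2 n) can also be watched at work empirically; in this Python sketch (trial counts and the seed are arbitrary choices, not from the text) the observed frequency of large deviations of the average of n dice shrinks as n grows:

```python
import random

# Empirical illustration of P(|Y - m| >= t) <= sigma^2/(t^2 n) for Y the
# average of n fair dice, where m = 7/2 and sigma^2 = 35/12.
random.seed(7)

def freq_far(n, t, trials=1000):
    """Observed frequency with which |average of n dice - 3.5| >= t."""
    far = 0
    for _ in range(trials):
        avg = sum(random.randint(1, 6) for _ in range(n)) / n
        if abs(avg - 3.5) >= t:
            far += 1
    return far / trials

t = 0.25
for n in (10, 100, 1000):
    bound = (35 / 12) / (t ** 2 * n)
    print(n, freq_far(n, t), "Chebyshev bound", min(bound, 1.0))
```

With n = 10 deviations of 0.25 from the mean are common; by n = 1000 they essentially never occur, just as the inequality predicts.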
the need to transliterate the Cyrillic. Watch out especially for such
forms as Tchebycheff that begin with a T. Turning now to Cheby-
shev's work on probability, we begin by noting that his applications
of the inequality that bears his name are sufficient justification for
naming the inequality after him. Chebyshev was the first to prove
the Law of Large Numbers in the form we just gave; he gave the
same proof, using the Chebyshev Inequality, as we did above. In
the first place, Chebyshev's proof is a most elegant replacement for
Bernoulli's complicated computations. What is more important, not
only is the Law of Large Numbers, as stated above, more general
than the version given by Bernoulli's Theorem, but Chebyshev's ap-
proach opens up a whole field of study, extending well beyond the
discrete probability to which we limit ourselves in this book.
P.L. Chebyshev
Pafnuty Lvovich Chebyshev (Russian, 1821-1894)
Chebyshev was one of nine children of a retired army officer. His
brother Vladimir became a distinguished artillery general. When
the family moved to Moscow in 1832, Pafnuty finished his sec-
ondary education at home before entering Moscow University. At
the university, he started on a long list of distinguished mathemat-
ical accomplishments. When the time came to seek employment,
he went to St. Petersburg. Like his brother, he concerned himself
with ballistics, although Pafnuty's concern was theoretical; he made
computations for use by the army. Pafnuty Chebyshev was also
most interested in designing machinery. Not only did he investigate
the theoretical principles, but he actually made some calculating
machines. He continued to turn out mathematical papers until his
death.
Exercises
1. Explain why the exercises for this section differ so much from
the exercises for the other sections of the book.
2. Given a number a ≥ 1, describe a random variable X such that
P(|X - E(X)| ≥ aσ) = 1/a^2,
where σ^2 = Var(X). Show that if random variables X1 and X2 both
have this property, E(X1) = E(X2), and Var(X1) = Var(X2), then
P(X1 = t) = P(X2 = t) for all t.
3. We could, but will not, use the ideas above and the Chebyshev
Inequality to find an estimate of the minimum number of times a
coin must be tossed to be 99.44% sure that it falls heads between
49.9% and 50.1% of the time. Explain why one would expect the
Chebyshev Inequality to give a very substantial overestimate. (In
the next chapter we shall learn how to give a reliable estimate.)
4. Prove: Let numbers m, σ, and s be given. Suppose s and m are pos-
itive. Then there is a number N with the following property: Let
X1, ..., Xn be independent random variables with n ≥ N. Suppose
E(Xi) = m and Var(Xi) = σ^2 for all i. Then P(X1 + ... + Xn < 0) ≤ s.
5. Show that if in the Law of Large Numbers we replace the assump-
tion that all the Xi have the same variance σ^2 by the assumption
they all have variance less than some number σ^2, the law remains
valid.
6. Show that if in the Law of Large Numbers we omit the assumption
that all the Xi have the same mean and replace the m in the
conclusion by [E(X1) + ... + E(Xn)]/n, the law remains valid.
7. Combine the ideas of the last two exercises to find a generalized
version of the Law of Large Numbers.
8. Combine the ideas of exercises 4, 6, and perhaps 5 to find a
statement about random variables with positive means.
Of course, this definition makes sense only when P(A) ≠ 0. Infor-
mally speaking, E(X | A) may be described as follows: We do our
experiment many times and each time note whether A happens.
We record the value of X only on those occasions when A does
happen. Then the average of those values of X should be close to
E(X I A) . This intuitive description of E(X I A) corresponds exactly
to the more extensive discussion of E(X) with which we started the
last chapter. Likewise, the formal theory of conditional expectation
is completely analogous to that given above for expectation. For
example, the equation,
E(X | A) = Σ_t t P(X = t | A),
holds for exactly the same reasons as the corresponding formula for
E(X). We proceed at once to a theorem that is not a trivial modification.
We need to evaluate P(u | Ai) for each particular u; let B = {u}. Then
we have
P(u | Ai) = P(B | Ai) = P(B ∩ Ai)/P(Ai).
Exercises
9. Two dice are thrown. The total is noted, and that number of
coins are tossed. What is the expected number of heads to be
obtained?
10. Two coins are tossed. A die is thrown for each coin that falls
heads. What is the expected number of spots shown on the dice?
11. A pair of dice is thrown repeatedly until the total obtained on
the first throw is obtained again. What is the expected number
of throws necessary?
12. A die is thrown and the number noted. Then the die is thrown
repeatedly until a number at least as high as the number ob-
tained on the first throw is thrown. Find the mean number of
times the die is thrown including the first throw.
13. In the game of craps, a player throws a pair of dice. The game
ends with the first throw if a 2, 3, 7, 11, or 12 is thrown. Oth-
erwise, the player throws the dice repeatedly until either the
number obtained on the first throw occurs again or a 7 is thrown.
What is the expected number of throws to complete the game?
14. Two coins are tossed. If two heads are thrown, letters are chosen
from the word TWOFOLD. If one head is thrown, letters are
chosen from the word ONEROUS. If no heads are thrown, letters
are chosen from the word NOMAD. Whichever word is used, we
draw letters one at a time, with replacement, until a vowel is
obtained. What is the expected number of letters then drawn?
15. Repeat the last exercise assuming the drawing to be without
replacement.
16. The words
TATTLETALE, TATTOO, TATTING
contain a total of ten T's. We choose one of those ten T's at random.
We continue to use the word containing that T and discard the
other words. After replacing the T, we choose letters one at a
time, with replacement, from the word in use until we get an A.
On the average, how many times do we choose a letter, including
the original choice of a T?
17. Repeat the last exercise assuming the drawing to be without
replacement, except for the original T.
18. A die is thrown. Then cards are drawn from the deck, one at
a time, with replacement, until the number of draws that have
resulted in a spade is the number shown on the die. How many
draws are made, on the average?
19. An urn contains two red balls and four green balls. Balls are
drawn from the urn, one at a time, with replacement. The draw-
ing continues until a red ball is drawn. If five or fewer of the
draws result in green balls, no money changes hands.
5.3. Computation of Variance 127
the variances. We shall see next that this difficulty can be easily
overcome.
Before actually working a problem, let us get ready. Some ter-
minology may help. We have seen that we often work with random
variables of a certain kind. Specifically, let X be a random variable
that assumes the value 1 when a certain event A happens and 0
when it does not happen. In this situation we call X the indicator
of A. (Outside of probability theory, one would ordinarily say the
same thing in different words by saying that X is the characteristic
function of A.) If X is the indicator of A, then obviously E(X) = P(A),
a fact we have often used. Suppose X and Y are the indicators of A
and B. Since of course 1 · 1 = 1 and 0 · 1 = 1 · 0 = 0 · 0 = 0, we see
that XY, like X and Y, assumes only the values 0 and 1. In detail, XY
assumes the value 1 exactly when both X and Y assume the value 1.
Thus, XY is the indicator of A ∩ B, and hence E(XY) = P(A ∩ B). Now
we are ready for examples.
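Before the examples, the indicator facts above are easy to verify by brute force. The following Python sketch (ours, not the text's) enumerates the outcomes of two dice and checks E(X) = P(A) and E(XY) = P(A ∩ B) for A = "first die shows 6" and B = "the sum is at least 10":

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of two dice.
outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))

# X is the indicator of A = "first die shows 6";
# Y is the indicator of B = "the sum is at least 10".
X = {w: int(w[0] == 6) for w in outcomes}
Y = {w: int(w[0] + w[1] >= 10) for w in outcomes}

def expect(Z):
    # Expected value of a random variable given as a table of values.
    return sum(p * Z[w] for w in outcomes)

P_A = expect(X)                                    # E(X) = P(A) = 1/6
P_AB = expect({w: X[w] * Y[w] for w in outcomes})  # E(XY) = P(A ∩ B) = 1/12
assert P_A == Fraction(1, 6)
assert P_AB == Fraction(1, 12)
```

The event A ∩ B here consists of the three outcomes (6,4), (6,5), (6,6), so P(A ∩ B) = 3/36 = 1/12, in agreement with E(XY).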
What are the mean and variance of the number of aces in a bridge
hand (13 cards)? There are two alternate versions of the same basic
method that we are about to use. We will do the problem twice for
extra practice. Our first computation uses the events A_1, ..., A_13,
where A_i is the event, "The ith card in the hand is an ace." Let
X_1, ..., X_13 be the indicators of A_1, ..., A_13. Then X_1 + ... + X_13 is the
number of aces in the hand; let X = X_1 + ... + X_13. As we have done
before, we find

E(X) = E(X_1) + ... + E(X_13) = 13 E(X_1) = 13 · (4/52) = 1.
Thus we have

E(X²) = 4 · (1/4) + 4 · 3 · (1/17) = 29/17.

The first 4 is the number of ways to choose one of the four aces, and
the 4 · 3 is the number of ways to choose two different aces in order.
As before, we find Var(X) = 29/17 - 1 = 12/17.
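Neither computation is needed to check the arithmetic: the exact distribution of the number of aces is hypergeometric, and a short Python sketch (ours, not the book's) recovers E(X) = 1, E(X²) = 29/17, and Var(X) = 12/17 directly:

```python
from fractions import Fraction
from math import comb

# Exact distribution of the number of aces in a 13-card bridge hand:
# P(X = j) = C(4, j) C(48, 13 - j) / C(52, 13).
pmf = {j: Fraction(comb(4, j) * comb(48, 13 - j), comb(52, 13)) for j in range(5)}
assert sum(pmf.values()) == 1

mean = sum(j * p for j, p in pmf.items())
second = sum(j * j * p for j, p in pmf.items())
assert mean == 1
assert second == Fraction(29, 17)
assert second - mean ** 2 == Fraction(12, 17)
```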
Now we can find the variance of many of the random variables
we considered in the last chapter. We give an example here. Suppose
a committee with six members is to be formed by randomly selecting
six of the twelve senators from the New England states. Let X be the
number of states represented on the committee. In the last chapter
we found E(X); now we find Var(X). A standard trick will help. Let
Y be the number of states not represented on the committee. Then
X + Y = 6, since there are six states in all. It is easy to see that
E(X) = 6 - E(Y) and Var(X) = Var(Y). Let each of A_1, ..., A_6 be
the event that a different one of the states is not represented on
the committee; let Y_1, ..., Y_6 be the indicators of A_1, ..., A_6. Then
Y = Y_1 + ... + Y_6 and we have, similarly to the examples already
worked,

E(Y) = 6 P(A_1),
E(Y²) = 6 P(A_1) + 6 · 5 · P(A_1 ∩ A_2).
There are several ways to compute the probabilities we need. For
example,
P(A_1) = (10 choose 6) / (12 choose 6) = 5/22,

P(A_1 ∩ A_2) = (8 choose 6) / (12 choose 6) = 1/33.
Substituting, we have
E(Y) = 6 · (5/22) = 15/11,

E(Y²) = 6 · (5/22) + 6 · 5 · (1/33) = 25/11,

Var(Y) = 25/11 - (15/11)² = 50/121.

Thus we conclude

E(X) = 6 - 15/11 = 51/11,

Var(X) = 50/121.
We may as well record a portion of the technique under consider-
ation in the form of a formula. Since the proof of the formula is just
a simple application of the discussion above, we leave this proof as
an exercise. Let X_1, ..., X_n be the indicators of the events A_1, ..., A_n.
We suppose there are numbers s and t such that

P(A_i) = s for all i,
P(A_i ∩ A_j) = t whenever i ≠ j.

Then

Var(X_1 + ... + X_n) = ns + n(n - 1)t - n²s².
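As a check on this formula (in Python, ours, not the text's), the committee example has n = 6, s = 5/22, and t = 1/33, and the formula does return Var(Y) = 50/121:

```python
from fractions import Fraction

def indicator_sum_variance(n, s, t):
    # Var(X1 + ... + Xn) = n s + n(n - 1) t - n^2 s^2
    return n * s + n * (n - 1) * t - n * n * s * s

# Committee example: n = 6, s = P(Ai) = 5/22, t = P(Ai ∩ Aj) = 1/33.
var_Y = indicator_sum_variance(6, Fraction(5, 22), Fraction(1, 33))
assert var_Y == Fraction(50, 121)
```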
we note that
t = P(A_i ∩ A_j) = (R/T) · ((R - 1)/(T - 1)) whenever i ≠ j.
Substituting, we find
Var(X) = nR/T + nR(n - 1)(R - 1)/(T(T - 1)) - n²R²/T²

= [nRT(T - 1) + nRT(n - 1)(R - 1) - n²R²(T - 1)] / (T²(T - 1))

= nR[T(T - 1) + T(n - 1)(R - 1) - nR(T - 1)] / (T²(T - 1))

= nR(T² - Tn - TR + nR) / (T²(T - 1))

= nR(T - n)(T - R) / (T²(T - 1)).
Recalling T = R + G, we have
Var(X) = (n(T - n)/(T - 1)) · (R/T) · (G/T).
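The closed form can be compared with a direct computation from the hypergeometric probabilities P(X = j) = C(R, j) C(G, n - j) / C(T, n). The following Python sketch (ours) confirms agreement for a few choices of n, R, G, including the bridge-hand example n = 13, R = 4, G = 48, where the variance is 12/17:

```python
from fractions import Fraction
from math import comb

def var_formula(n, R, G):
    # Var(X) = n(T - n)/(T - 1) * (R/T) * (G/T)
    T = R + G
    return Fraction(n * (T - n), T - 1) * Fraction(R, T) * Fraction(G, T)

def var_direct(n, R, G):
    # Variance straight from the probabilities C(R,j) C(G,n-j) / C(T,n).
    T = R + G
    pmf = [Fraction(comb(R, j) * comb(G, n - j), comb(T, n)) for j in range(n + 1)]
    mean = sum(j * p for j, p in enumerate(pmf))
    return sum(j * j * p for j, p in enumerate(pmf)) - mean ** 2

for n, R, G in [(13, 4, 48), (6, 5, 7), (3, 2, 4)]:
    assert var_formula(n, R, G) == var_direct(n, R, G)
```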
We now record our result and give a name to the distribution
involved. Let positive integers n, R, and G be given; set T = R + G.
Suppose the random variable X is such that
Exercises
23. In Example 3 of the last section of Chapter Four, about putting
letters in envelopes, we introduced a random variable X. Find
Var(X).
24. Find the variance of the amount of money the contestant wins
in Exercise 44 of Chapter Four.
25. Find the variance of the random variable in each of parts a-h of
Exercise 46 of Chapter Four.
26. Find the variance of the random variable in Exercise 48 of
Chapter Four.
27. Find the variance of the random variable in Exercise 49 of
Chapter Four.
28. Find the variance of the random variable in part a of Exercise
50 of Chapter Four.
29. Find the variance of the random variable in Exercise 51 of
Chapter Four.
30. Find the variance of the random variable in Exercise 52 of
Chapter Four.
31. Find the variance of the random variable in Exercise 54 of
Chapter Four.
32. Find the variance of the random variable in part b of Exercise
55 of Chapter Four.
(2000 choose 1000)² / (4000 choose 2000).
But how large is that? For example, is it more or less than .001? That's
far from obvious. We can rewrite the answer as follows:
[2000! / (1000!)²]² / [4000! / (2000!)²] = (2000!)⁴ / ((1000!)⁴ · 4000!).
As we shall soon learn how to find out, 4000! has 12,674 digits. Using
a computer to put the last fraction in lowest terms, we find that
the numerator and denominator have about 1000 digits each. That
doesn't help us. Since there is no way to express the exact answer
in a form we can comprehend, we should seek an approximation.
In this chapter, we consider several ways of approximating numbers
that appear in the theory of probability.
lim_{n→∞} (n choose k) p^k q^(n-k),
6.1. The Poisson Distribution 137
A_n = (n choose k) p^k q^(n-k).
Then we have
A_n = [n! / (k!(n - k)!)] (m/n)^k q^(n-k) = [n(n - 1) ... (n - k + 1) / k!] m^k (1/n)^k q^(n-k).

The numerator of the fraction to the right of the last equal sign
contains k factors; thus we may multiply each of these factors by 1/n
and delete the expression (1/n)^k. Now we have

A_n = [(1/n)n · (1/n)(n - 1) · (1/n)(n - 2) ... (1/n)(n - k + 1) / k!] m^k q^n q^(-k).
q^(-k) = (1 - m/n)^(-k) → 1,

and each factor (1/n)(n - i) = 1 - i/n → 1 as n → ∞.
It is a standard result of calculus that
(1 - m/n)^n → e^(-m).
[One way to show this is to note

log[(1 - m/n)^n] = log(1 - m/n) / (1/n)

and apply l'Hospital's Rule; the rule is named for G.F.A. de l'Hospital
(1661-1704), but it was devised by Johann Bernoulli.] Using these
results in the last expression for A_n, we have

lim_{n→∞} A_n = (m^k / k!) e^(-m).
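Numerically the convergence is fast. In the Python sketch below (ours, not the text's), we fix m = 2 and k = 3 and let n grow; A_n approaches (m³/3!) e^(-2) ≈ 0.18045:

```python
from math import comb, exp, factorial

m, k = 2.0, 3
poisson = m ** k / factorial(k) * exp(-m)   # the Poisson limit, about 0.18045
for n in [10, 100, 10000]:
    p = m / n
    A_n = comb(n, k) * p ** k * (1 - p) ** (n - k)
    print(n, A_n)
print("limit:", poisson)
```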
S.D. Poisson
many large earthquakes there will be next year. Since, so far, earth-
quakes have proved unpredictable, we may regard them, for present
purposes, as due to chance. To find out how common earthquakes
are, we consult The Cambridge Encyclopedia of Earth Sciences (Cam-
bridge University Press, 1981). In that book, we find the statement,
"Great earthquakes with magnitudes exceeding 8 [on the Richter
scale] occur about once every five to ten years." For simplicity, we'll
settle for five years. In other words, the average number of "great"
earthquakes per year is 1/5. How many opportunities for earth-
quakes are there worldwide in a year? In other words, what is n? We
don't know, and we don't care. As long as n is large and p is small,
we may use
(m^k / k!) e^(-m)
as an approximation to the probability of just k successes. This for-
mula involves m and k; it does not mention nand p. In the case at
hand, m = .2. The values given by the formula, for certain values of
k, are as shown in the following table.
Probability of k great
k earthquakes in a year
0 .81873
1 .16374
2 .01637
3 .00109
4 .00005
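The table can be reproduced in a line or two of Python (ours, not the book's); the entries agree with (m^k/k!) e^(-m) at m = .2 to within rounding in the last digit:

```python
from math import exp, factorial

# Poisson probabilities for m = 0.2, compared with the table's entries.
m = 0.2
book = {0: .81873, 1: .16374, 2: .01637, 3: .00109, 4: .00005}
for k, value in book.items():
    assert abs(m ** k / factorial(k) * exp(-m) - value) < 1e-5
```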
E(X) = m e^(-m) + m² e^(-m) + (m³/2!) e^(-m) + (m⁴/3!) e^(-m) + ...

= m e^(-m) (1 + m + m²/2! + m³/3! + ...)

= m.
We shall show in the next chapter that Var(X) is also m.
Exercises
1. Suppose a book of 200 pages contains 100 misprints distributed
among the pages at random. Find the probability that page 72
contains exactly two misprints.
2. A book of 500 pages contains 750 misprints.
a. What is the probability that a certain page contains no
misprints?
b. At least two misprints?
3. Suppose that the number of telephone calls an operator receives
from 9:00 to 9:05 A.M. follows a Poisson distribution with mean
3. Find the probability that the operator will receive:
a. no calls in that interval tomorrow.
b. three or more calls in that interval the day after tomorrow.
4. Suppose raisin bread "averages" six raisins per slice. What is the
probability that a slice contains at least three raisins?
142 6. Approximating Probabilities
The statement

n! ∼ n^n e^(-n) √(2πn)

is usually called Stirling's Formula; the sign ∼ may be taken as mean-
ing the expression to the right of the sign approximates the one to
the left for large n. More formally, the statement means that

lim_{n→∞} n^n e^(-n) √(2πn) / n! = 1.
It might be better to call this statement Stirling's approximation,
rather than Stirling's Formula. (We shall say something about who
Stirling was shortly.) Knowing the value of the limit above is often
helpful in working with theory. But in a practical case, for exam-
ple, when n = 50, we need to have some idea how accurate our
approximation is. We shall see in the Appendix to this chapter that
.92 < (n^n e^(-n) √(2πn)) / n! < 1
for all n, and
for n ≥ 9.
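For instance, at n = 50 the ratio is about .9983, comfortably inside the stated bounds. A Python sketch (ours, not the book's):

```python
from math import e, factorial, pi, sqrt

def stirling(n):
    # The Stirling approximation n^n e^(-n) sqrt(2 pi n).
    return n ** n * e ** (-n) * sqrt(2 * pi * n)

# The ratio stays between .92 and 1, and is close to 1 even for modest n.
for n in range(1, 60):
    assert 0.92 < stirling(n) / factorial(n) < 1.0
print(stirling(50) / factorial(50))   # about .9983
```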
The appearance of the quantity π in the last paragraph raises a
number of questions. One of them, how does π get involved here at
all, must be deferred to the Appendix as to details. But we can point
out that π appears "all over the place" in mathematics. A more prac-
tical question is why we use π when all we get is an approximation
anyhow. Using the approximation 2.5 for √(2π), we could have noted
that
James Stirling
first expenditure from this fund was to pay for a silver teakettle to
reward Stirling for surveying the River Clyde.
Exercises
15. Find the number of digits in 100!. (Actually, find the number
of digits in the Stirling's Formula approximation to 100!. Note:
e^100 = 10^43.43 approximately.)
16. Use Stirling's Formula to approximate:
a. (2n choose n);
6.3. The Normal Distribution 147
b. (2n choose n) / (4n choose 2n);
c. [(2n)!]² / (n!(3n)!).
17. Let

x = (2000 · 1998 · 1996 · 1994 ... 2) / (1999 · 1997 · 1995 · 1993 ... 1).

a. Show that

x = (2^1000 · 1000!)² / 2000!.

b. Use Stirling's Formula to approximate x.
18. Use Stirling's Formula to approximate the probability that a coin
that is tossed 2000 times falls heads exactly 1000 times.
19. In how many ways can 10 different objects be selected
from among 30 (disregarding the order of selection)? Answer
approximately using Stirling's Formula.
20. Suppose X has a Poisson distribution with E(X) = 10. Use
Stirling's Formula to approximate P(X = 10).
21. In a system of Bernoulli trials with n, p, and q as usual, sup-
pose np is an integer. Use Stirling's Formula to show that the
probability of exactly np successes is approximately

1 / √(2πnpq).
Unfortunately, we're not quite done. The indefinite integral of f(x) =
e^(-x²/2) cannot be given explicitly in terms of the functions studied in
calculus. That doesn't really matter; we would expect to evaluate the
function with the aid of a table in any case. Most tables, including
the one given just before the exercises, give the value of
F(t) = (1/√(2π)) ∫_0^t e^(-x²/2) dx.

We then have

(1/√(2π)) ∫_c^d e^(-x²/2) dx = (1/√(2π)) ∫_0^d e^(-x²/2) dx - (1/√(2π)) ∫_0^c e^(-x²/2) dx

= F(d) - F(c).
It follows that F(t) = - F( -t) for all t. Thus for negative t, we find
F(t) = - F( -t) by looking up -t in the table. [But don't forget the
minus sign in front of "F( -t)" in the last equation; actually picturing
the area under the curve is a good precaution.] The table that is
included here is a very short one, just long enough for our exercises.
Of course, far more extensive tables are readily available. Most such
tables have a title indicating that they are tables of the "Standard
Normal Distribution." The word "distribution" is being used here in a
more general sense than we have been using it. The standard normal
distribution is a continuous distribution and thus falls outside of
our topic of discrete probability. The approximation we have just
introduced is often called the Normal approximation. The theorem
that justifies the use of this approximation is the simplest example
of a class of theorems called central limit theorems.
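As an illustration of the approximation (in Python, ours, not DeMoivre's), take 720 throws of a die with "success" meaning "six," so m = np = 120 and σ = √(npq) = 10. The exact probability of between 100 and 130 sixes is within a couple of hundredths of F(1) - F(-2) = F(1) + F(2):

```python
from fractions import Fraction
from math import comb, erf, sqrt

n, m, sigma = 720, 120, 10   # 720 dice, p = 1/6: m = np, sigma = sqrt(npq)

# Exact P(100 <= number of sixes <= 130), as a rational number.
exact = sum(Fraction(comb(n, k) * 5 ** (n - k), 6 ** n) for k in range(100, 131))

def F(t):
    # F(t) = (1/sqrt(2 pi)) * integral from 0 to t of e^(-x^2/2) dx
    return 0.5 * erf(t / sqrt(2))

approx = F((130 - m) / sigma) - F((100 - m) / sigma)
print(float(exact), approx)
```

The normal value is about .819; the discrepancy from the exact answer shrinks further if the interval endpoints are adjusted by 1/2 (a refinement not discussed here).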
The method we just described was devised by Abraham
DeMoivre. He describes it in the second edition of his book, The Doc-
trine of Chances, in language as informal as that we used. DeMoivre
suggests power series and the Newton-Cotes formulas as ways to
find approximate values for the definite integrals involved here; of
course, tables of the values of these integrals had yet to be con-
structed. (Newton refers to the obvious person; Cotes refers to
Roger Cotes, 1682-1716.) It seems appropriate to us to
call the result simply DeMoivre's Theorem, as is often done, rather
than to mention Laplace, as is also often done. In this respect, as
well as others, DeMoivre had by 1738 made very substantial progress
beyond the work of Montmort, about whom we shall say more in
Chapter Eight, and Jakob Bernoulli. Without minimizing the theo-
retical importance of Bernoulli's Law of Large Numbers, we do point
out that its use is confined to theory. It says certain probabilities are
"small" when the number of trials is "large," but it does not state how
small or how large. It gives no hint of how to say anything about
probabilities that are not small. DeMoivre gives an approximation,
sufficiently accurate for practical purposes, to the probabilities of
different numbers of successes in a fixed number of Bernoulli trials.
In particular, Bernoulli's Theorem is a corollary of DeMoivre's.
Abraham DeMoivre
(English, 1667-1754)
Abraham DeMoivre was born in France. His father, a surgeon, sent
him away to school in 1673. Abraham was proud throughout his life
that he was able to write a letter to his parents at that time. Like Jakob
Bernoulli, he was forced to try to keep his study of mathematics
secret. At one point, his teacher asked him what the "little rogue
meant to do with all those cyphers." When the Edict of Nantes, which
provided for the toleration of Protestants in France, was revoked
in 1685, DeMoivre fled to England. It appears that he added the
"de" to his name at this time. (We are following the usual practice,
when writing in English, of treating "de" as an integral part of the
name for citizens of English-speaking countries, but not for others.
Thus we say "DeMoivre" and place the name alphabetically under
D. However, for example, we say "Fermat" and place the name under
F.) DeMoivre supported himself by travelling from house to house
doing tutoring. He took with him pages from Newton's Principia,
F(t) = (1/√(2π)) ∫_0^t e^(-x²/2) dx

t    F(t)           1/2 - F(t)
0    0              5.0 · 10⁻¹
...
5    .4999997133    2.9 · 10⁻⁷
Exercises
22. 4500 dice are thrown. Find the probability of getting between
775 and 800 "sixes."
23. 720 dice are thrown. Find the probability of getting between 100
and 130 "sixes."
24. Find the probability that, when 1620 dice are thrown, between
255 and 300 of them fall "six."
25. Find the probability that, when 2880 dice are thrown, between
430 and 450 of them fall "six."
26. 10,000 coins are tossed.
a. What is the probability that between 4950 and 5050 of them
fall heads?
b. Between 4850 and 5150?
c. Between 4995 and 5005?
27. By expressing, approximately, the probabilities as definite inte-
grals, determine which is more likely-that a coin that is tossed
2500 times falls heads between 1215 and 1290 times or that a
die that is thrown 720 times turns up "six" between 106 and 132
times.
28. By expressing, approximately, the probabilities as definite inte-
grals, determine which is more likely-that a coin that is tossed
400 times falls heads between 182 and 214 times or that a die
that is thrown 180 times falls "six" between 24 and 39 times.
29. The probability that a randomly selected voter will answer "Yes"
to a certain political question is 5/9. 720 voters are chosen at
random, and each is asked the question. What is the probability
that between 380 and 420 answer "Yes"? More than 360 answer
"Yes"?
30. Suppose just two-thirds of all voters support a certain proposi-
tion. 800 voters are chosen at random. Find the probability that
a majority of these 800 oppose the proposition.
31. An examination consists of 100 true-false questions. Find the
probability that a student who answers all questions at random
gets at least 65 right.
Appendix
We now prove Stirling's Formula. We begin by fixing our attention on
some one integer n ≥ 3 and trying to estimate n!. The idea behind
our work is that

∫_{i-1}^{i} log x dx ≈ (1/2)[log(i - 1) + log i].
A_2 = ∫_1^2 log x dx - (1/2) log 2.
Now let
B_2 = A_2
B_3 = A_2 + A_3
B_4 = A_2 + A_3 + A_4
∫_{3/2}^{n-1/2} log x dx = ∫_{3/2}^{5/2} log x dx + ... + ∫_{n-3/2}^{n-1/2} log x dx

≤ log 2 + log 3 + ... + log(n - 1).
From the third diagram,
∫_{n-1/2}^{n} log x dx ≤ (1/2) log n.
for all n.
We have shown that B_2 ≤ B_3 ≤ ..., but that B_n ≤ a certain
number fixed for all n. Thus the sequence B_2, B_3, ... must converge
to some number. Call this number C; then we have, in symbols,
B_n → C.
We can actually evaluate the integral that appears in the
expression for Bn. We have
Also we have
log 2 + log 3 + ... + log(n - 1) + (1/2) log n = log n! - (1/2) log n.
Thus
B_n = n log n - n + 1 - log n! + (1/2) log n.
Our goal was information about n!. We have

n! = e^(log n!)
= e^(-B_n + n log n - n + 1 + (1/2) log n)
= e^(-B_n) e^(n log n) e^(-n) e e^(log √n)
= e^(-B_n) e n^n e^(-n) √n.
Hence, since B_n → C,
C ≥ ∫_1^{3/2} log x dx.
Then
lim_{n→∞} P_n = (1/√(2π)) ∫_c^d e^(-x²/2) dx.
and
From the first of these inequalities, we have √(np) ≥ |c|√q ≥ -c√q,
and hence np ≥ -c√(npq). It follows that np + c√(npq) ≥ 0. From
n ≥ d²p/q, we see in the same way that nq ≥ d√(npq). Thus we have
np + d√(npq) ≤ np + nq = n.
Since we are concerned only with a limit as n -+ 00, we may
set a minimum size for n. From now on, we shall always assume
that n satisfies the conditions of the last paragraph. We also assume
that n is large enough so that there is at least one integer between
np + cJnpq and np + dJnpq.
Next we make a remark as to notation. The letter n will, of course,
take different values during our discussion. Various other variables,
k, t, R, S, m, σ, T, Δx, h, k_1, ..., k_h, x_1, ..., x_h, will be introduced and
defined in terms of n. We shall not show the dependence of these
variables on n in our notation. On the other hand, c, d, p, and q are
constant and do not depend on n.
These preliminaries being completed, we now make a computa-
tion relating to a specific number k of successes for each number of
trials. For each n, we choose an integer k such that
Then we have
c/√(npq) ≤ t ≤ d/√(npq);
hence t → 0 as n → ∞. We put the Stirling's Formula approximation
to
(n choose k) p^k q^(n-k)
in terms of t. This approximation is RS, where
and
S = √(2πn) / (√(2πk) · √(2π(n - k))).
We treat R first. We have
Integrating
1/(1 + x) = 1 - x + x² - ...,

we obtain the well-known series for log(1 + x), namely,

log(1 + x) = x - x²/2 + x³/3 - ....
(That the last equation does not hold for all values of x is not
important. We are only concerned with t close to zero.) Thus
log R = -(np + npqt)(qt - (qt)²/2 + (qt)³/3 - ...)
Now we turn to S:
S = √(2πn) / (√(2πk) · √(2π(n - k)))

= (1/√(2π)) √(n / (k(n - k)))

= (1/√(2π)) √(n / ((np + npqt)(nq - npqt)))

= (1/√(2πn)) · 1/√((p + pqt)(q - pqt)).
For small t, we have approximately
S = (1/√(2πn)) √(1/(pq)) = 1/√(2πnpq).
Putting everything together, we have just seen that the limit of the
ratio of the probability of k successes in n trials, as described above,
to
tends to 1 as n → ∞.
At this point, some common abbreviations are helpful. Let m =
np and σ² = npq. Then the approximation in the last paragraph is

(1/(σ√(2π))) e^(-(k-m)²/(2σ²)).
The probability of between m + cσ and m + dσ successes in n trials is
the sum of the probabilities of k successes for those integers k within
this range. Denote these integers by k_1, ..., k_h with k_1 ≤ ... ≤ k_h;
note that we obtain entirely different sets of integers for different
values of n. The sum of probabilities just mentioned is, according to
the last paragraph, approximated by

(1/(σ√(2π))) Σ_{i=1}^{h} e^(-(k_i - m)²/(2σ²)).
Now turn to the integral
∫_c^d e^(-x²/2) dx.
for each i = 1, ..., h. Now k_1, ..., k_h include in order all the integers
between m + cσ and m + dσ. Thus, k_1 - 1 < m + cσ, k_h + 1 > m + dσ,
and k_i = k_{i-1} + 1 for all i = 2, ..., h. Using k_1 - m < cσ + 1, we have

x_1 = (k_1 - m)/σ < (cσ + 1)/σ = c + 1/σ = c + Δx.
Finally, note that k_1 ≥ m + cσ implies that x_1 ≥ c; similarly x_h ≤ d.
Thus we have

c ≤ x_1 ≤ c + Δx,
c + Δx ≤ x_2 ≤ c + 2Δx,
c + 2Δx ≤ x_3 ≤ c + 3Δx,
∫_c^d e^(-x²/2) dx.
Note that we had a slight problem adjusting the last piece, but, since
we have
0 ≤ d - c - (h - 1)Δx ≤ x_h + Δx - x_{h-1} = 2Δx
(1/√(2π)) ∫_c^d e^(-x²/2) dx.
This completes the proof. □
However we left a gap in our reasoning earlier in this Appendix.
When we first mentioned √(2π), we simply announced without ex-
planation that a certain constant had the value √(2π). To stick to what
we really have established, every appearance of √(2π) in this chapter
so far should be replaced by an unknown constant; call it k. Now we
evaluate k. By choosing t large enough, we can make
as close as we like to

(1/k) ∫_{-∞}^{∞} e^(-x²/2) dx.

Thus

(1/k) ∫_{-∞}^{∞} e^(-x²/2) dx = 1, that is, k = ∫_{-∞}^{∞} e^(-x²/2) dx.

Multiplying by e^(-y²/2), we have

k e^(-y²/2) = ∫_{-∞}^{∞} e^(-(x²+y²)/2) dx.

It follows that

k² = ∬ e^(-(x²+y²)/2) dA.
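Though the exact evaluation of k is deferred, a numerical check (in Python, ours, not the book's) already makes k = √(2π) ≈ 2.5066 plausible:

```python
from math import exp, pi, sqrt

# Midpoint-rule estimate of the integral of e^(-x^2/2) over the whole line
# (the tails beyond |x| = 10 are negligible).
def integral(a, b, steps=200000):
    h = (b - a) / steps
    return h * sum(exp(-(a + (i + 0.5) * h) ** 2 / 2) for i in range(steps))

k = integral(-10.0, 10.0)
print(k, sqrt(2 * pi))
```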
7. Generating Functions
Thus
We also have
E(X) = f'(1),
Var(X) = f''(1) + f'(1) - [f'(1)]².
This is a geometric series with first term pz and ratio qz. It follows
f(z) = pz/(1 - qz),

f'(z) = [p(1 - qz) - pz(-q)]/(1 - qz)² = p/(1 - qz)²,

f''(z) = [(-2)p/(1 - qz)³](-q) = 2pq/(1 - qz)³.
Now we put in z = 1:

f'(1) = p/(1 - q)² = p/p² = 1/p,
Thus
E(X) = m,
Var(X) = m² + m - m² = m.
Before continuing our computations of important generating
functions, we establish a general fact that we shall need. When the
generating functions of independent random variables X and Y are
known, it is very easy to find the generating function of X + Y. We
next derive the appropriate formula. Let X and Y be independent
and have generating functions f and g. Then X + Y obviously has
a generating function, since the sum of nonnegative integers is a
nonnegative integer. To find the generating function of X + Y, we
need to find P(X + Y = n) for each n. For the event, X + Y = n, to
occur
either (X = 0 and Y = n)
or (X = 1 and Y = n - 1)
or (X = 2 and Y = n - 2)
It can be shown that the obvious way to multiply these last two series
gives the correct result, namely, the series given for h(z) above. Thus
the generating function of X + Y is fg, the product of the generating
functions of X and Y.
Now we consider the generating function for X, the number of
successes in n Bernoulli trials. We first consider the special case
where n = 1; that is, where X has a Bernoulli distribution. Thus we
seek the number of successes in one trial. Then P(X = 1) = p and
P(X = 0) = q. It follows that the generating function of X is
g(z) = q+ pz.
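The two facts just established, g(z) = q + pz for one trial and multiplication of generating functions for independent sums, can be combined in a short Python sketch (ours, not the text's): multiplying out (q + pz)^n coefficient by coefficient reproduces the binomial probabilities.

```python
from fractions import Fraction
from math import comb

p, q = Fraction(1, 3), Fraction(2, 3)
n = 5

def poly_mul(f, g):
    # Multiply two polynomials given as coefficient lists (constant term first).
    h = [Fraction(0)] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            h[i + j] += a * b
    return h

gf = [Fraction(1)]             # generating function of the constant 0
for _ in range(n):
    gf = poly_mul(gf, [q, p])  # one Bernoulli trial: g(z) = q + pz

# Coefficient of z^k must be the binomial probability C(n,k) p^k q^(n-k).
assert gf == [comb(n, k) * p ** k * q ** (n - k) for k in range(n + 1)]
```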
g(z) = pz/(1 - qz)
when r = 1. Reasoning again as we just did, we see that Y has the
generating function
f(z) = (pz/(1 - qz))^r.
ways to select which two letters are to be right. Thus the probability
that exactly two letters are in the correct envelopes is
[n(n - 1)/2] · (1/n) · (1/(n - 1)) · a_{n-2} = a_{n-2}/2.
The analogous computation for three letters is, of course, a_{n-3}/3!.
In general, the probability that just k letters are correct is a_{n-k}/k! for
k = 0, 1, ..., n. Thus, as soon as we find the values of a_0, a_1, a_2, ...,
we shall have our whole problem solved.
The last paragraph not only reduces our problem to that of
evaluating the ai, but it also gives us a means of completing that eval-
uation. With n letters, the number of letters in the correct envelopes
is some number from 0 to n. Thus the corresponding probabilities
total 1. Thus we have
We conclude
a_n = 1/2! - 1/3! + 1/4! - ... + (-1)^n/n!.
We can thus use
1 - 1 + 1/2! - 1/3! + 1/4! - ... = e^(-1)
to approximate an. The remarkable thing is how little an varies with
n. The relative error in using 1/e in place of a_n is less than 2% for
all n ≥ 4. Assuming we have at least four letters, the probability of
getting them all in the wrong envelopes is .37, to two decimal places,
regardless of how many letters we have.
The problem just solved was first treated by Pierre Montmort.
We shall have more to say about him at the beginning of the next
chapter. Montmort used a different method from ours to find the
probability of no matches. The problem is often referred to by the
French name of rencontre.
Given n objects in a row, we may consider the number of rear-
rangements of the objects, still in a row, that leave no object in its
original place. This number Dn is called the number of derangements
of the objects. Clearly we have D_n = n! a_n. Using the properties of
alternating series studied in calculus, we can see that D_n differs from
n!/e by less than 1/2. In short, for all n, D_n is the integer closest to
n!/e.
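A Python sketch (ours, not the text's), using the standard recurrence D_n = (n - 1)(D_{n-1} + D_{n-2}) (a recurrence not derived in the text), confirms that D_n is the integer closest to n!/e:

```python
from math import e, factorial

def derangements(n):
    # D_n via the recurrence D_n = (n - 1)(D_{n-1} + D_{n-2}); D_0 = 1, D_1 = 0.
    if n == 0:
        return 1
    d_prev, d = 1, 0          # D_0, D_1
    for i in range(2, n + 1):
        d_prev, d = d, (i - 1) * (d + d_prev)
    return d

for n in range(1, 15):
    assert derangements(n) == round(factorial(n) / e)
```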
We note in passing an application of generating functions that is
not directly related to probability theory. Suppose we need to find
the sum of the series
1/3 + 2/9 + 3/27 + 4/81 + ....
Were it not for the numerators, 2, 3, 4, ... , the problem would be
easy. But since those numbers are there, we replace the problem by
an apparently still harder problem. Evaluate
f(z) = 1/3 + (2/9)z + (3/27)z² + (4/81)z³ + ...;
the special case Z = 1 is the original problem. If we integrate each
term and set
g(z) = (1/3)z + (1/9)z² + (1/27)z³ + (1/81)z⁴ + ...,
then g'(z) = f(z). Since the series for g(z) is a geometric series, we
have
g(z) = (1/3)z / (1 - (1/3)z) = z/(3 - z).
It follows

f(z) = g'(z) = [(3 - z) - z(-1)]/(3 - z)² = 3/(3 - z)².
Thus we have
1/3 + 2/9 + 3/27 + 4/81 + ... = f(1) = 3/4,
answering the original question.
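A quick numerical check (in Python, ours, not the book's): the partial sums do approach 3/4.

```python
# Partial sums of 1/3 + 2/9 + 3/27 + 4/81 + ... ; the limit should be 3/4.
partial = sum(k / 3 ** k for k in range(1, 60))
print(partial)
```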
Let us try a somewhat different question. Find the sum of
1/(3 · 4) + 1/(4 · 16) + 1/(5 · 64) + 1/(6 · 256) + ....
Again there are some integers, 3, 4, 5, 6, ... , that are in our way.
To remove these integers from the denominators, we shall have to
differentiate, instead of integrating as we did in the last problem. We
put in whatever powers of z will do the job of removing 3, 4, 5, 6, ....
Then we have
f(z) = [1/(3 · 4)]z³ + [1/(4 · 16)]z⁴ + [1/(5 · 64)]z⁵ + ...;
again we need f(1). Now we have

f'(z) = z²/4 + z³/16 + z⁴/64 + ... = (z²/4) / (1 - (1/4)z) = z²/(4 - z).
Integrating, to get back to f(z), is a bit of a chore. We have
z²/(4 - z) = -z - 4 + 16/(4 - z).
Thus
f(z) = -z²/2 - 4z - 16 log(4 - z) + C.
(Don't forget the constant of integration!) To find C, note that
obviously from its definition f(0) = 0. Thus

0 = -16 log 4 + C;
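From 0 = -16 log 4 + C we get C = 16 log 4, and hence f(1) = -1/2 - 4 - 16 log 3 + 16 log 4 = 16 log(4/3) - 9/2 ≈ .1029. A Python check (ours, not the book's) of this value against the partial sums:

```python
from math import log

# C = 16 log 4, so f(1) = -1/2 - 4 - 16 log 3 + 16 log 4 = 16 log(4/3) - 9/2.
closed_form = 16 * log(4 / 3) - 4.5          # about .10291

partial = sum(1 / (k * 4 ** (k - 2)) for k in range(3, 60))
print(partial, closed_form)
```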
Exercises
1. Find the generating function of each of the following sequences:
a. 3, 6, 12, 24, ... ;
b. 1, 3, 9, 27, ...;
Find:
a. P(X = 0);
b . E(X) ;
c. Var(X).
10. For the random variable of Exercise 6, find:
a. the mean;
b . the variance.
11. A random variable X assumes only the values 0, 1, 2, 3, ..., and
the probabilities that X assumes these values are
2/3, (2/3)(1/3), (2/3)(1/3)², (2/3)(1/3)³, ...,
respectively. Find:
a. E(X);
b. Var(X).
12. Suppose X is a random variable with generating function
f(z) = e^z - e + 2 - z.
Find:
a. P(X = 3);
b. E(X);
c. Var(X).
13. A random variable takes only the values 3, 4, 5, 6, ... , and the
probabilities that it assumes these values are given by
P(X = k) = 48/4^k for k = 3, 4, 5, ....
Find the:
a. mean;
b. variance.
14. A random variable takes only the values 2, 4, 6, 8, ... , and the
probabilities that it assumes these values are given by

P(X = k) = 3/2^k for k = 2, 4, 6, ....
Find the:
a. mean;
b. variance.
15. A random variable X assumes only the values 0, 1,2,3, ... , and
the probabilities that X assumes these values are
1111111
l-log1.5'3' 322' 333' 34 4' ....
a. E(X);
b. Var(X).
8. Random Walks
Remond de Montmort
Note that Pierre Montmort was hardly neutral between Pierre and
Paul. We still root for Peter and tend to look at things from his point of
view. For example, we shall use the letter p, which we have used for
the probability of success, for the probability that Peter wins a game;
q, corresponding to failure, is the probability Paul wins. Sometimes it
is important to be aware that, notation apart, the roles of the players
are completely interchangeable. There is a basic symmetry in all of
our ideas, if not in our terminology.
Let us explain the words, "random walk." We shall be concerned
here only with random walks on the line, that is, one-dimensional
random walks. Random walks in higher dimensions are also im-
portant. We shall be assuming that Peter and Paul play a series of
games. Suppose they bet a fixed amount on each game. Imagine the
amount of money that Peter possesses to be continuously displayed
graphically on an electric sign. A spot of light moves back and forth
along a line to illustrate how Peter is doing. Each time Peter wins,
the spot moves one unit to the right; when he loses, it moves one
unit to the left. The reader should visualize the sign from time to
time, even though we don't explicitly refer to it again. In fact, drawing
a diagram showing the line and key points, such as the starting
point, on it can be very helpful. We should note that both physical
and metaphorical random walks occur often in the real world; their
We next try to find f1, f2, ..., one at a time. We begin by noting the
following: The first time Peter is ahead, he is necessarily ahead by
just one game. f1 is easy. If Peter wins the first game, he is, at that
point, ahead by just one game, obviously for the first time. Thus
f1 = 1/3.
8.1. The Probability Peter Wins 187
f1 = 1/3,
f3 = (1/3)²(2/3),
f5 = 2(1/3)³(2/3)²,
f7 = 5(1/3)⁴(2/3)³.
p < 1/2

p²q = p(pq) < (1/2)(1/4) = (1/2)³,
p³q² = p(pq)² < (1/2)(1/4)² = (1/2)⁵,
p⁴q³ = p(pq)³ < (1/2)(1/4)³ = (1/2)⁷,
etc.
Looking at the last paragraph, we see that each term of the series for
h is less than the corresponding term of the series for 1. Thus h < 1;
190 8. Random Walks
1 if p ≥ q,
(p/q)^k if p < q.
In the discussion just concluded, play continued no matter how
far Peter was behind. Suppose a one-dollar bet is made on each
game. Then we were considering a situation where Paul was willing
to extend Peter unlimited credit. An alternative situation arises when
Peter and Paul each start with only so much money and play stops
when one of them goes broke. We now study this new situation.
It is clear that one or the other of the players must go broke.
Towards seeing why, we first consider the case where p ::: q. We saw
above that, if play continues long enough, Peter will be one dollar
ahead. If play continues from that point, he will get to be a second
dollar ahead. Thus eventually, Peter will be ahead by any amount
named in advance. Thus Peter will win all Paul's money unless play
is stopped by Peter's going broke. If p < q, we may interchange the
roles of the players and still conclude that one of them must go
broke. We may be satisfied to know just that much, but Peter and
Paul will want to discuss their individual chances.
Before deriving the general formulas, it is instructive to study
some special cases. The method used here is fundamental in this
chapter and the next one. The method may also be employed in
many cases where the precise circumstances necessary for the
formulas to hold do not apply.
In the first problem we work, besides the assumptions already
announced, we suppose that Peter starts with $1 and Paul starts with
$2. We also suppose p = q = 1/2. We are discussing the situation
where each player bets $1 on each game and play continues until
one of the players goes broke. What is the probability Paul is the one
to go broke? We present two ways to work this problem.
In both solutions, we note that Peter must win the first game in
order to have a chance of bankrupting Paul. One way to proceed
is now to consider the cases, "Peter wins the first two games" and
"Peter wins the first game, but loses the second." For brevity, we
denote these possibilities by WW and WL. If WW, Paul goes broke.
The probability of Paul's going broke because WW occurs is 1/4.
After WL, the players each have the same amount of money as they
started with; thus they each have the same chance of winning as
they did at the outset. Denote the chance that, starting from scratch,
Paul goes broke by x. The probability of WL is 1/4. After WL, x is the
probability that Paul goes broke. Thus the probability that Paul goes
broke with play starting with WL is (1/4)x. Combining all this,

x = 1/4 + (1/4)x.
It follows x = 1/3; that is, the probability that Paul goes broke is 1/3.
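This answer is easy to check by simulation. The sketch below (our own code, not from the text) plays the game many times with a fair coin, Peter starting with $1 and Paul with $2:

```python
import random

def paul_goes_broke(peter=1, paul=2, p=0.5, rng=random):
    """Play $1 games until one player is broke; True if Paul goes broke."""
    while peter > 0 and paul > 0:
        if rng.random() < p:   # Peter wins this game
            peter, paul = peter + 1, paul - 1
        else:
            peter, paul = peter - 1, paul + 1
    return paul == 0

rng = random.Random(42)
trials = 100_000
estimate = sum(paul_goes_broke(rng=rng) for _ in range(trials)) / trials
# estimate settles near 1/3
```

The fraction of runs in which Paul goes broke comes out close to 1/3, in agreement with both derivations.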
Now consider the second method of solving our problem. Unless
Peter wins the first game, Paul cannot go broke. If Peter does win
the first game, after that game, Peter has $2 and Paul $1, just the
reverse of the amounts they started with. Thus, at this point, after
Peter has won the first game, the probability of Paul's going broke
is equal to what was the probability of Peter's going broke when
they started. Let x be the probability Paul goes broke, starting from
scratch. Then 1 - x is the probability of Peter's going broke, also
starting from scratch. Therefore 1 - x is the probability of Paul's
going broke after the first game has been won by Peter. For Paul to
go broke, we must first have Peter winning the first game, which has
probability 1/2, and then have Paul going broke after that, which has
probability 1 - x. Thus
x = (1/2)(1 - x).
Again x = 1/3, as we found by the other method.
Now we change the conditions a little. We still suppose Peter
begins with $1 and Paul with $2. We now assume p = 2/3 and
q = 1/3. Now what is the probability that Paul goes broke? With
p_2 - p_1 = r(p_1 - p_0) = r·p_1,
...
p_t - p_{t-1} = r^{t-1}·p_1.
Exercises
In those exercises below that mention Peter and Paul, we
continue to make our basic assumptions about how those per-
sons gamble. These assumptions are described in the seventh
paragraph of this chapter.
1. Peter and Paul bet one dollar each on each game. Peter starts
with s dollars and Paul with t - s dollars. They play until one
of them is broke. For practice in a certain kind of reasoning,
answer the following questions without using the general for-
mulas. Some parts of the exercise require the use of the result
of a previous part. What is the probability that Peter wins all the
money if:
        p =     s =     t =
   a.   1/2     2       4
   b.   1/2     1       4
   c.   1/2     2       5
   d.   1/2     1       5
   e.   1/2     2       6
   f.   1/2     1       6
   g.   1/2     3       6
   h.   1/3     1       3
   i.   1/3     2       3
   j.   1/3     2       4
   k.   1/3     1       4
   l.   1/3     3       4
2. Peter and Paul bet one dollar each on each game. For this ex-
ercise only, we modify our basic assumptions as follows: Peter
is nervous on the first game, and the probability of his winning
that game is 1/3. Thereafter, Peter and Paul each have probabil-
ity 1/2 of winning each game. They play until one of them has
a net loss of $2. What is the probability Paul is the one with that
net loss?
3. To obtain a certain job, a student must pass a certain exam. The
exam may be taken many times. If the student passes on the first
try, she gets the job. If not, she still gets the job if at some time
the number of passing attempts exceeds the number of failures
by two. If at any time the number of failures exceeds the number
of passes by two, she is not allowed ever to take the exam again.
The probability of the student passing on each particular try is
1/2. What is the probability she gets the job?
4. David and Carol play a game as follows: David throws a die, and
Carol tosses a coin. If the die falls "six," David wins. If the die does
not fall "six" and the coin does fall heads, Carol wins. If neither
the die falls "six" nor the coin falls heads, the foregoing is to
be repeated as many times as necessary to determine a winner.
What is the probability that David wins?
5. Three persons, A, B, and C, take turns in throwing a die. They
throw in the order A, B, C, A, B, C, A, B, etc., until someone
wins. A wins by throwing a "one." B wins by throwing a "one" or
a "two." C wins by throwing a "one," a "two," or a "three." Find the
probability that each of the players is the winner.
6. Four persons, A, B, C, and D, take turns in tossing a coin. They
throw in the order A, B, C, D, A, B, C, D, A, B, etc., until someone
gets heads. The one who throws heads wins. Find the probability
that each of the players is the winner.
7. Adam and Eve alternately toss a coin.
a. The first one to throw heads wins. If Adam has the first toss,
what is the probability that he wins?
** b. Eve must throw heads twice to win, but Adam need throw
heads only once to win. If Adam goes first, find the probability
he wins. If Eve goes first, find the probability that Adam wins.
**c. Each player needs two heads to win, and Adam goes first.
Find the probability that Adam wins.
8. Peter and Paul bet one dollar each on each game. Each is willing
to allow the other unlimited credit. Use a calculator to make a
table showing, to four decimal places, for each of p = 1/10, 1/3,
.49, .499, .501, .51, 2/3, 9/10, the probabilities that Peter is ever
ahead by $10, by $100, and by $1000.
9. Suppose Peter and Paul bet $1 on each game and Paul starts with
$5. For each game, the probability that Peter wins is 1/10. If Paul
extends Peter unlimited credit, what is the probability that Peter
will eventually have all of Paul's $5?
10. Repeat the last exercise assuming, for each game, the probability
Peter wins is .499.
11. Peter needs $100 for a special purpose; he will stop gambling
when he gets it. Peter and Paul bet $10 on each game. If p = .48
and Paul extends Peter unlimited credit, what is the probability
that Peter gets the $100?
12. Peter has probability 2/3 of winning each game. Peter and Paul
bet $1 on each game. If Peter starts with $3 and Paul with $5,
what is the probability Paul goes broke before Peter is broke?
13. Peter has probability 1/4 of winning each game. Peter and Paul
each bet $100 on each game. They each start with $400 and play
until one of them goes broke. What is the probability that Paul
goes broke?
14. Peter and Paul each bet $1 on each game, and each player starts
with $10. Peter has probability 1/3 of winning in each game.
What is the probability that Peter is $3 ahead at some time before
he is $7 behind?
15. Peter and Paul each have probability 1/2 of winning in each
game. They bet $10 each on each game. What is the probability
that Peter is $100 ahead at some time before he is $50 behind?
16. Peter has probability 2/3 of winning in each game. Peter and
Paul each bet $100 on each game. Peter starts with $200 and
Paul with $600. They play until one of them goes broke. What is
the probability that Peter goes broke?
17. Peter starts with $10,000 and Paul with $1,000. They bet $100
each on each game. In each game, each player has the same
chance of winning. What is the probability that Peter goes broke?
18. Peter and Paul each start with $32 and p = .6. They bet $1
each on each game. Use a calculator to find the approximate
probability that Peter bankrupts Paul. [Note: If your calculator
does not have a button for raising numbers to arbitrary powers,
you can find the 32nd power of a number by squaring five times
in a row, since (x^2)^2 = x^4, (x^4)^2 = x^8, (x^8)^2 = x^16, and
(x^16)^2 = x^32. More generally, the (2^n)th power may be found by
squaring n times.]
19. Do the last exercise modified so that Peter starts with $8 and
Paul with $56.
20. Do Exercise 18 modified to provide p = .51, Peter starts with $8,
and Paul with $56.
21. Do Exercise 18 modified to provide p = .501, Peter starts with
$256, and Paul with $768.
22. Peter starts with $2048 (2048 = 2^11) and p = .501. Paul starts
with billions. Use a calculator to find the approximate probability
that Peter bankrupts Paul, assuming each player bets $1 on each
game. (See the note to Exercise 18.)
23. Show that, for each positive integer k, there is a sequence a0, a1,
a2, ... of nonnegative integers such that the probability Peter is
ever k games ahead is

a0·p^k + a1·p^{k+1}·q + a2·p^{k+2}·q^2 + a3·p^{k+3}·q^3 + ...,

for all values of p.
**24. If w0, w1, ... are the numbers defined in the beginning of this
chapter, show

wn = (1/(n + 1)) · (2n choose n),

where (2n choose n) denotes the binomial coefficient.
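The repeated-squaring trick in the note to Exercise 18 is, in programming terms, exponentiation by squaring; a minimal sketch (the function name is our own):

```python
def power_by_squaring(x, n):
    """Compute x**n for a nonnegative integer n by repeated squaring.

    Squaring five times in a row gives the 32nd power; in general only
    about log2(n) multiplications are needed.
    """
    result = 1.0
    while n > 0:
        if n % 2 == 1:   # odd exponent: peel off one factor of x
            result *= x
        x *= x           # square the base
        n //= 2          # halve the exponent
    return result
```

For Exercise 18 one would compute, e.g., power_by_squaring(0.6, 32) instead of multiplying 0.6 by itself 31 times.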
8.2. The Duration of Play
We suppose that Peter and Paul agree to bet $1 each on each of certain
games. Whether or not a bet is made on a particular game may be
a matter of chance, perhaps depending on the outcome of previous
games. However, whether a bet is made on a game and the outcome
of that game are to be independent events. In other words, given
that a bet is made on a certain game, the conditional probability
that Peter wins that game is p. Let X be the net amount that Peter
wins overall and Y be the number of games on which bets are made.
Whether E(Y) is defined depends on the agreement between Peter
and Paul; the average number of games on which bets will be made
need not be finite. We shall determine the relationship between E(X)
and E(Y) on the assumption that E(Y) is defined.
We begin by defining, for each n = 1, 2, ..., random variables
Xn and Yn as follows:
Thus, for all n, we have E(Xn) = E(Yn)(p - q). Under the assumption
that E(Y) is defined, it can be shown that E(X) = E(X1) + E(X2) + ...
and E(Y) = E(Y1) + E(Y2) + ..., even though there are infinitely
many terms in each sum. Hence we can conclude that

E(X) = E(Y)(p - q).
It can be shown, using Exercise 49, that E(Y) is defined in this case.
We have then, assuming p ≠ q,

E(Y) = (p*·t - s)/(p - q).

This formula may be used to find E(Y) when p ≠ q.
We next consider the case p = q. In this case, E(X) = E(Y)(p - q)
becomes simply E(X) = 0. That won't help us find E(Y), but we can
now derive the formula for p*, stated above without proof. [We may
as well at least announce E(Y) = s(t - s), even though we leave
the proof for the Appendix to this chapter.] We have, as in the last
paragraph, 0 = E(X) = p*·t - s. Thus, p* = s/t, as claimed.
We pause briefly for an historical sidelight. DeMoivre, whose
biography appears in Chapter Six, found an ingenious way to derive
the formula for p* in the case where p ≠ q. He found a way to use the
same simple method we just used for the case p = q. See Exercise 51
for the details.
Now we return to the situation considered first. We suppose Paul
extends Peter unlimited credit and they continue betting until Peter
is k dollars ahead. To be sure that Peter will eventually be k dollars
ahead, we need p ≥ q. Since play stops when Peter is k dollars ahead,
X = k, no matter how the individual games turn out. For p > q, it
can be shown that E(Y) is defined (see Exercise 52). Thus, for p > q,
we have k = E(X) = E(Y)(p - q) and can conclude

E(Y) = k/(p - q);

in other words, it takes k/(p - q) games on the average before Peter
is k games ahead. If p = q, the formula E(X) = E(Y)(p - q) yields
k = 0·E(Y) = 0, which is impossible. The formula fails because E(Y)
is not defined. The original definition of expected value, we recall,
sometimes involves an infinite series. When the series diverges, the
expected value is undefined. We have just shown that Y has no
expected value; in other words, we have shown that a certain series
diverges. Informally, if p = q, it is certain that Peter will sooner
Unlimited Credit

                                             p < q       p = q       p > q

No Credit
Peter starts with s. Paul starts with t - s.
Play continues until someone is broke. r = q/p.

                                             p ≠ q                      p = q

Probability Paul goes broke                  p* = (1 - r^s)/(1 - r^t)   s/t

Number of games until someone is broke       (p*·t - s)/(p - q)         s(t - s)
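The two no-credit formulas can be spot-checked by simulation; a sketch with parameter values of our own choosing (p = .6, s = 3, t = 8):

```python
import random

def play_until_broke(s, t, p, rng):
    """Peter starts with s, Paul with t - s; $1 bets until someone is broke.
    Returns (paul_went_broke, number_of_games)."""
    peter, games = s, 0
    while 0 < peter < t:
        peter += 1 if rng.random() < p else -1
        games += 1
    return peter == t, games

p, q, s, t = 0.6, 0.4, 3, 8
r = q / p
p_star = (1 - r ** s) / (1 - r ** t)       # probability Paul goes broke
mean_games = (p_star * t - s) / (p - q)    # average number of games

rng = random.Random(1)
trials = 100_000
runs = [play_until_broke(s, t, p, rng) for _ in range(trials)]
sim_p_star = sum(broke for broke, _ in runs) / trials
sim_mean_games = sum(g for _, g in runs) / trials
```

Both simulated values land close to the formulas' predictions, about .732 and 14.3 for these parameters.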
Exercises
25. Reconsider each of the situations described in Exercise 1. Again
for practice in a certain kind of reasoning, answer the follow-
ing question without using the general formulas. What is the
**49. Consider Peter and Paul with p > 1/2. Assume play continues
until Peter is ahead. Let Y be the number of games until that
happens. Show that, with w0, w1, w2, ... as early in the chapter,

E(Y) = w0·p + 3w1·p^2·q + 5w2·p^3·q^2 + 7w3·p^4·q^3 + ....

Show that this series converges by comparing it to

w0(1/2) + w1(1/2)^3 + w2(1/2)^5 + w3(1/2)^7 + ...;

we have already shown that the latter series converges to 1.
**50. Consider Peter and Paul with p > 1/2. Assume play continues
until Peter is k games ahead. Let Y be the number of games until
that happens. Show that E(Y) is finite by combining the ideas of
the last exercise with those of Exercise 23.
Appendix
We now prove the formula E(Y) = s(t - s) announced above. The circum-
stances, besides the general assumptions of this chapter, are as
follows: Peter starts with s dollars and Paul with t - s. They each
bet a dollar on each game, and play continues until someone goes
broke. Also, each player is as likely to win in each game as his oppo-
nent; i.e., p = q = 1/2. We seek to show that it requires an average
of set - s) games before one player has all the money.
Barring trivialities, we suppose s ≥ 1 and t - s ≥ 1; if s = t - s = 1,
it obviously takes just one game to bankrupt one of the players. Thus
the formula gives the correct value in this case. We go on to check
the formula for t = 3, 4, 5, ... in turn. (This process of basing the
proof for each value of t on having established the formula for lower
values of t is called mathematical induction.) The proof for t = 3 will
use the formula for t = 2; the proof for t = 4 will use the formula
for t ≤ 3, etc. In other words, we show, for each particular value of
t, that the formula holds on the assumption that it holds whenever
Peter and Paul start with fewer than t dollars between them.
We need to distinguish three cases according to whether s < t/2,
s > t/2, or s = t/2. First suppose s < t/2. Then, since Peter starts
with less than half the money, Paul starts with more than half. Thus,
sooner or later, either Peter will go broke or Paul will have lost all
but s of his dollars. How long does that take?
We are concerned with either Peter losing s dollars or Paul losing
t - 2s dollars, whichever comes first. Since s + (t - 2s) < t, by our
assumption about what happens when fewer than t dollars are in-
volved, it takes s(t - 2s) games, on the average, for this situation
to arise. If at this point Peter is broke, play is over. If Paul has s
dollars left, additional games are necessary. The probability of Peter
winning t - 2s dollars before Paul wins s dollars is
s s
----=--
s + t - 2s t- s
Thus that is the probability that additional games are necessary. How
many additional games will be needed? We are discussing reaching
a situation where Paul has s dollars and Peter t - s, just the reverse
of the way they started. Thus as many additional games are needed,
on the average, as were needed, on the average, overall when play
started. Denote this number of games by x. We have
x = s(t - 2s) + (s/(t - s))·x.
Solving for x, we have x = s(t - s) in the case s < t/2.
Now we cover the other cases. The case s > t/2 is the same
as the first case with the roles of the players interchanged. Thus
the formula holds for s > t/2 also. Finally, suppose s = t/2. Then
each player starts with s dollars. After one game, one player has
s - 1 dollars and the other s + 1. At that point, by the cases already
covered, (s - 1)(s + 1) games remain to be played, on the average.
The total expected number of games needed is thus
1 + (s - 1)(s + 1) = 1 + s^2 - 1 = s^2 = s(t - s).
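The pivotal step of the induction, that x = s(t - s) does satisfy x = s(t - 2s) + (s/(t - s))x, can be checked exactly for many values of s and t; a sketch using exact rational arithmetic (our own code):

```python
from fractions import Fraction

def satisfies_recursion(s, t):
    """Check that x = s(t - s) solves x = s(t - 2s) + (s/(t - s)) * x exactly."""
    x = Fraction(s * (t - s))
    return x == s * (t - 2 * s) + Fraction(s, t - s) * x

# Check every case with s < t/2 for t up to 30.
all_ok = all(satisfies_recursion(s, t)
             for t in range(3, 31)
             for s in range(1, (t - 1) // 2 + 1))
```

Since the arithmetic is exact, each check confirms the identity st - 2s^2 + s^2 = st - s^2 with no rounding involved.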
9. Markov Chains
The point is that we get the same value for this probability no matter
when "now" is or what the "additional information" is. Of course, the
past may affect the future, but only by acting through the present.
As long as we are where we are, it doesn't matter how we got there.
At this point, we have described the properties of a Markov chain.
Let us just agree that when we speak of a Markov chain, we are
saying that we are considering a system along the lines described
above. It doesn't really help to select some particular object, such as
the transition matrix to be described shortly, and say that it is the
Markov chain.
In a moment, we'll give the short biography of the Markov for
whom Markov chains are named. However, we should first explicitly
state that Markov did invent Markov chains. Markov chains formed
only a small part of Markov's work on probability. And probability
formed only a part of Markov's mathematical work.
A.A. Markov
Academy, to which Gorky had just been elected, Markov was partic-
ularly active in protesting. In 1913, when the authorities organized
a celebration of the 300th anniversary of the Romanovs, Markov or-
ganized a celebration of the 200th anniversary of Bernoulli's Law of
Large Numbers.
Exercises
1. A museum owns three paintings by Renoir, two by
Cezanne, and one by Monet. It has room to display only one
of these paintings. Therefore the painting on display is changed
once a month. At that time, the painting on display is replaced by
a randomly chosen one of the other five paintings. Let E1 be "A
Renoir is on display," E2 be "A Cezanne is on display," and E3 be
"A Monet is on display." Find the transition matrix for the Markov
chain just described.
2. Do the last problem modified as follows: The next painting to
be displayed is randomly chosen from among those paintings by
different artists from the painting being replaced.
3. Brands A, B, C, and D of detergent are placed in that order on the
supermarket shelf. A customer either buys again the same brand
as last purchased or changes to a brand displayed next to it on the
shelf; if two brands are adjacent to the old brand, and a change
of brands is made, a random choice is made between those two.
The probability of buying Brand A given that Brand A was the one
purchased last is .9; the corresponding probabilities for Brands B,
C, and D are .9, .8, and .9, respectively. Find the transition matrix
for the Markov chain involved.
4. The floor plan of a portion of a museum is shown in the diagram
below.
[Floor plan diagram: several rooms, including B, D, and F, with an EXIT reachable from Room F.]
Mr. Hadd E. Nuff wants to leave the museum, but he has become
flustered. He is wandering from room to room at random; he is
as likely to leave a room by any one door as by any other door.
However, when he reaches Room F, he can see the exit, and thus
he will definitely go that way and not return. Find the transition
matrix for the path taken by Mr. Nuff; use one state for each room
and one for "out of the area."
5. A sales representative divides her time among six cities, located
as shown in the sketch map below. The map also shows which
pairs of these cities have scheduled air service; such pairs are
connected by a line segment marked with the number of flights
per day in each direction. The representative, for reasons that we
do not explain, chooses a flight at random from those going from
the city she is in to one of the other five cities; after completing
Exercises 217
her business in the latter city, she repeats this process. Find the
transition matrix of the Markov chain involved here.
b.
    0    0    0    0   1/2   0   1/2   0
    0   1/3   0    0    0   2/3   0    0
    0   1/3   0    0   1/3   0    0   1/3
    0    0    0    1    0    0    0    0
   1/2   0    0    0   1/2   0    0    0
    0   1/2   0    0    0   1/2   0    0
   1/3   0    0    0   1/3   0   1/3   0
    0    0   1/2  1/2   0    0    0    0
c.
    0    0    0   2/3   0    0   1/3   0    0
    0    0    0    1    0    0    0    0    0
    0    0    0    0    1    0    0    0    0
    0    1    0    0    0    0    0    0    0
    0    0    0    0    0    0    1    0    0
    0    0    0    0    0   1/2   0    0   1/2
    0    0    1    0    0    0    0    0    0
    0   1/3   0    0    0   2/3   0    0    0
    0    0    0    0    0    1    0    0    0
We are instructed to treat all hij with the same last subscript
together. Thus we start by finding h11, h21, h31, h41, h51. Since we are
discussing the chances of ever reaching E1, it is not surprising that
we treat the numbers in the first column of the transition matrix
differently from those in the other columns. Starting in E1, there
is a probability of 1/2 of reaching E1 again in one step. There is a
probability of 1/3 of reaching E2 in the first step and a probability of
h21 of going on from there to E1 later. Similarly for 1/6, E3, and h31.
Thus we have

...
h41 = (2/3)h41 + (1/3)h51,
h51 = (3/4)h41 + (1/4)h51.
We have five equations in five unknowns. But the third equation,
h31 = h31, is useless. And each of the last two equations simplifies to
h41 = h51. Thus these five equations by themselves are not enough
to determine the hij. However, either by determining the ..., or by
inspection, we can find out which hij are zero. In the case at hand,
clearly h31 = 0, since we can never leave E3. Likewise from E4 and
E5 only E4 and E5 can be reached. Thus, h41 = h51 = 0. Inserting
these zeros in our equations we have the new system,

h12 = (1/2)h12 + 1/3,
h22 = (2/5)h12 + 1/5.
As for those hij with j = 3, the ones that obviously are zero are
h43 and h53. Our procedure therefore calls for writing the equations,
The first two equations yield h14 = 1/2, h24 = 3/4. The last two
equations yield h44 = 1, h54 = 1. (Alternatively, we could have
found h44 and h54 another way. The abstract principle involved is
the following: If Ei is recurrent, then either hij = 0 or hij = 1; nothing
in between is possible. Because, assuming a start in Ei, there will be
infinitely many returns to Ei; if it is at all possible to reach Ej, sooner
or later Ej will be reached. In the case at hand, E4 is recurrent, and
thus h44 ≠ 0. Also h54 ≠ 0 since p54 ≠ 0, and E5 is recurrent. It
follows that h44 = h54 = 1.)
Finally we treat the hij with j = 5. Clearly h35 = 0. We have the
equations,
v11 = (2/3)/(1 - 2/3) = 2,
v21 = (1/2)/(1 - 2/3) = 3/2,
v12 = (2/3)/(1 - 7/15) = 5/4,
v22 = (7/15)/(1 - 7/15) = 7/8.
Exercises
9. For the transition matrix
[1/2
0 0
1/4 1/4
1/2 0
0 0
find:
a. h21.
b. h22.
c. All of the other hij.
10. For the transition matrix of the last problem, find all vij.
224 9. Markov Chains
[1/2
1/2
1/2 0
1/2 0
1/2 0 1/4
0 1/2 0 1/2
19. Find all vij for the transition matrix
[1/2
1/2
1/2
0
0 0
20. Find all vij for the transition matrix
1 0 0 0 0
1/3 0 1/3 1/3 0
0 0 1 0 0
0 1/3 1/3 0 1/3
0 0 0 0 1
21. Find all vij for the transition matrix
[If 1/3
1
1/2 1/2
rij = 1 + Σ* pik·rkj.
By Σ* we mean, "sum over all k for which k ≠ j and pik ≠ 0." Note that
the equations are almost, but not quite, identical to those we used to
find the hij. As with the hij, those rij with a particularj are determined
together. The equations of the form just given that involve these rij
always constitute a system of linear equations that can be shown to
have a unique solution.
We give an example of finding the rij. Suppose our Markov chain
has transition matrix
[ 1/2   0    0   1/2 ]
[  0    0    1    0  ]
[  0   1/4  3/4   0  ]
[  0   1/5   0   4/5 ]
9.3. How Long Does It Take?
The first step is to determine for which i and j there is an rij. Recall
rij is defined only when hij = 1. It might appear that we therefore
need to compute all of the hij as a first step towards finding the rij.
We are prepared to do just that if necessary. In an example as simple
as the one we are working, however, it is clear by inspection which
hij are 1. Since E1 cannot be reached from any other state, we have
h21 = h31 = h41 = 0. From E1 there is a chance of going to E4, from
which return to E1 is impossible; thus h11 < 1. We conclude rij is not
defined for any i if j = 1. Now consider j = 4. It is clear that, from
E1, sooner or later we will go to E4. Thus h14 = 1 and r14 is defined.
Since E4 cannot be reached at all from either E2 or E3, we see that
both r24 and r34 are meaningless. Wherever we start, it is clear that
eventually we shall be wandering between E2 and E3, reaching each
of them from time to time; thus, hij = 1 for j = 2, 3. Therefore, rij is
defined provided j = 2 or 3. To summarize, r12, r13, r14, r22, r23, r32,
r33, r42, r43 are the only rij that make sense.
We next evaluate the rij. We begin with j = 2. In doing that, we
treat the second column of the transition matrix differently from the
other columns. Since p12 = 0, to get from E1 to E2 we make a first
step and then a number of additional steps, how many additional
steps depending on where we went with the first step. Either by
looking back at the last paragraph, or by thinking it out, we see

r12 = 1 + (1/2)r12 + (1/2)r42,
r22 = 1 + r32,
r32 = 1 + (3/4)r32,
r42 = 1 + (4/5)r42.

In writing the last two equations, we used the fact that if the first
step is to E2, no additional steps are necessary to reach E2. We find
in turn r32 = 4, r42 = 5, r22 = 5, r12 = 7. The next set of equations,
for j = 3, is
r13 = 1 + (1/2)r13 + (1/2)r43,
r23 = 1,
r33 = 1 + (1/4)r23,
r43 = 1 + (1/5)r23 + (4/5)r43.

Solving we find r23 = 1, r43 = 6, r13 = 8, r33 = 5/4. The last set of
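These systems can also be solved mechanically. A sketch in Python using exact rational arithmetic; the first equation of the j = 3 system is truncated in the text, and the (1/2)·r43 term used below is our inference (it reproduces the solutions stated above):

```python
from fractions import Fraction as F

# j = 3 system, solved by back-substitution.
r23 = F(1)                                  # r23 = 1, as stated in the text
r43 = (1 + F(1, 5) * r23) / (1 - F(4, 5))   # from r43 = 1 + (1/5)r23 + (4/5)r43
r33 = 1 + F(1, 4) * r23                     # from r33 = 1 + (1/4)r23
r13 = (1 + F(1, 2) * r43) / (1 - F(1, 2))   # from r13 = 1 + (1/2)r13 + (1/2)r43
```

This reproduces r23 = 1, r43 = 6, r33 = 5/4, r13 = 8.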
x_c = 1 + p·x_{c-1} + q·x_c.
We can simplify matters by subtracting each equation from its
successor. We have then
x_2 - x_1 = p·x_1,
x_3 - x_2 = p(x_2 - x_1) = p^2·x_1,
x_4 - x_3 = p(x_3 - x_2) = p^3·x_1,
...
Adding we have
x_c - x_1 = x_1(p + p^2 + ... + p^{c-1}),
x_c = x_1(1 + p + p^2 + ... + p^{c-1}).
Using the first equation of (*) again, we have
x_c = (1 + q·x_c)(1 + p + ... + p^{c-1}),
q·x_c = (1 + q·x_c)(1 - p)(1 + p + ... + p^{c-1})
      = (1 + q·x_c)(1 - p^c)
      = 1 + q·x_c - p^c - p^c·q·x_c.
Hence
x_c = (1 - p^c)/(p^c·q).
That answers our question; (1 - p^c)/(p^c·q) is the average number of
Bernoulli trials necessary to get c successes in a row.
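The formula is easy to confirm numerically. For a fair coin (p = q = 1/2) and c = 3 it gives (1 - 1/8)/((1/8)(1/2)) = 14 trials on average; a simulation sketch (our own code):

```python
import random

def trials_until_run(c, p, rng):
    """Count Bernoulli(p) trials until c successes occur in a row."""
    run = count = 0
    while run < c:
        count += 1
        run = run + 1 if rng.random() < p else 0
    return count

p, q, c = 0.5, 0.5, 3
formula = (1 - p ** c) / (p ** c * q)   # = 14.0 here

rng = random.Random(7)
trials = 20_000
sim = sum(trials_until_run(c, p, rng) for _ in range(trials)) / trials
```

The simulated average hovers near 14, as the formula predicts.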
We started this chapter by announcing we were going to gener-
alize the ideas of the last chapter. Recall that in the last chapter one
question discussed was how long it takes for someone, either Peter
or Paul, to go broke. We answered that question even though we
didn't know which player would go broke. In our present language,
we were finding, for certain Markov chains, how long it takes to be-
fore an absorbing state is reached. In other words, how many of the
steps taken go to transient states?
We may as well generalize by considering an arbitrary Markov
chain. As we have already noted, there will come a time after which
only recurrent states are visited. How long does it take for this sit-
uation to occur? We already have at hand almost all the machinery
needed to answer this question.
Our first chore, however, remains to be described. This chore
consists of replacing the given Markov chain with a new chain. The
transient states are left unchanged. We replace all the recurrent
states by a single state, E* . Since the new states are described in
terms of the old, the transition probabilities are just the already-
known probabilities of going from one state to another. To spell this
out, the probability of going from any transient state to another
remains the same. The new state E* is to be absorbing, since it
is impossible to go from a recurrent state to a transient state. The
probability of going from a transient state Ei to E* is the sum of the
probabilities of going from Ei to the various recurrent states of the old
chain. At the risk of undue repetition, let us state what is to be done
mechanically. We assume the new state E* is to be considered the
last state corresponding to the bottom row and right-hand column
of the matrix. The first step is to strike out from the given matrix
the rows and columns corresponding to the recurrent states. Then a
new column is inserted at the right; the entries in this new column
are determined by the fact that the total of each row must be one.
Finally a new row, consisting of Os followed by a single I , is added at
the bottom. Now we are ready to use the method developed earlier
in this section.
At this point, we have completely described a method of find-
ing the answer to the question raised two paragraphs back: Given a
Markov chain, for each transient state, assuming a start in that state,
what is the average number of steps needed to reach a recurrent
state? First we modify the transition matrix in the manner just de-
scribed. That rij that corresponds to going from the given transient
state to the new state E* is the answer to our question.
Let us consider an example. Suppose a Markov chain has
transition matrix
1 0 0 0 0 0
1/2 0 1/3 0 1/6 0
0 0 1/2 1/2 0 0
0 0 1/4 3/4 0 0
0 1/7 0 0 2/7 4/7
0 0 0 0 0 1
E1 and E6 are absorbing, and hence recurrent. E3 and E4 are clearly
also recurrent. Now we see that E2 and E5 are transient. Accord-
ingly we delete the first, third, fourth, and sixth rows and the
corresponding columns. Now we have the matrix
[  0   1/6 ]
[ 1/7  2/7 ]

Inserting the new column for E* and the new bottom row, we obtain

[  0   1/6  5/6 ]
[ 1/7  2/7  4/7 ]
[  0    0    1  ]
For this last matrix we have
r13 = 1 + (1/6)r23,
r23 = 1 + (1/7)r13 + (2/7)r23.
From this we find r23 = 48/29 and r13 = 37/29. In other words,
returning to the original Markov chain, it takes an average of 48/29
steps to reach a recurrent state if we start in E5, and an average of
37/29 steps if we start in E2.
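The whole computation can be reproduced mechanically. In the sketch below (our own code), Q holds the transient-to-transient probabilities, the E2 and E5 rows and columns of the matrix above, and the system r_i = 1 + Σ_k Q[i][k]·r_k, equivalently (I - Q)r = (1, 1), is solved exactly by Cramer's rule:

```python
from fractions import Fraction as F

# Transient-to-transient part of the chain (states E2 and E5):
Q = [[F(0), F(1, 6)],      # from E2: to E2, to E5
     [F(1, 7), F(2, 7)]]   # from E5: to E2, to E5

# Expected steps to reach the merged recurrent state E* satisfy
# (I - Q) r = (1, 1); solve the 2x2 system by Cramer's rule.
a, b = 1 - Q[0][0], -Q[0][1]
c, d = -Q[1][0], 1 - Q[1][1]
det = a * d - b * c
r_from_E2 = (d - b) / det   # = 37/29
r_from_E5 = (a - c) / det   # = 48/29
```

This is the same pair of equations displayed above, just written in matrix form, and it yields the same values 37/29 and 48/29.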
Exercises
22. Find all rij for the transition matrix
[1/2
1/2
1/2
0
0 0
[~
1/2
0
1/2]
o .
1 0
25. Find all rij for the transition matrix
1/2 0
[1/2
1/2 1/2 0
1/2 0 1/4
0 1/2 1/4 1/2
[ 1/2 1/2]
1/3 2/3
29. Find all rij for the transition matrix
[ 1/2  1/2   0  ]
[ 3/4   0   1/4 ]
[  0   1/4  3/4 ]
30. Show that the formulas of Chapters Four and Nine agree as to
the number of Bernoulli trials needed to get one success.
31. a. Assuming a coin to be tossed once a second, how long would
it take, on the average, for each of c = 5, 10, 15, 20, 25, 30, 40, 50
to get c heads in a row?
b. Assuming a die to be thrown once a second, how long would
it take, on the average, for each of c = 5, 10, 15, 20 to get c
"sixes" in a row?
c. Assuming a die to be thrown once a second, how long would
it take, on the average, for each of c = 5, 10, 25, 50, 75, 100,
125, 150, 175, 200 to get c consecutive throws none of which
is a "six"?
32. Letters are chosen at random, with replacement, from the word
BANANA
until each of the three different letters B, A, and N has been
drawn at least once. Find the expected number of times a letter
is drawn. (Hint: Assign a state to each of "nothing," B, A, N, BA,
BN, AN, BAN.)
33. On the average, how many times must one throw a die to obtain a
"one," a "two," and a "three," in that order, on consecutive throws?
Suggestion: Let E4 be "mission accomplished." Let E3 be "one"
and "two" just thrown, "three" still needed. Let E2 be "one" just
thrown, "two" and "three" still needed. Let E1 be nothing useful
yet done. Be careful; note for example: If you are in E3, needing
only a "three," and you throw a "one," then you go to E2, not to E1.
34. On the average, how many times must one throw a die to obtain
a "one," a "two," and another "one," in that order, on consecutive
throws? Suggestion: Modify your solution to the last problem.
35. On the average, how many times must one throw a die to obtain
a "one," another "one," and a "two," in that order, on consecutive
throws?
36. On the average, how many times must one throw a die to obtain
a "one," a "two," and another "two," in that order, on consecutive
throws?
37. On the average, how many times must one throw a die to obtain
three consecutive throws that fall alike?
38. On the average, how many times must one throw a die to obtain
three consecutive throws no two of which fall alike?
39. A coin is tossed repeatedly until it falls heads-tails-heads, in that
order, on three consecutive tosses. Find the expected number of
tosses necessary.
40. A coin is tossed repeatedly until it falls tails-tails-heads, in that
order, on three consecutive tosses. Find the expected number of
tosses necessary.
41. A coin is tossed repeatedly until it falls heads-tails-tails, in that
order, on three consecutive tosses. Find the expected number of
tosses necessary.
42. On the average, how many times must one toss a coin to obtain
three consecutive tosses that fall alike?
43. On the average, how many times must one toss a coin to obtain
three consecutive tosses that do not fall all alike?
44. On the average, how many times must one throw a die to get six
consecutive throws that fall alike?
* *45. A die is thrown repeatedly until it falls "one," "two," "three,"
"four," "five," "six," in that order, on consecutive throws. Find the
expected number of throws made.
* *46. A die is thrown repeatedly until six consecutive throws show six
different numbers. Find the expected number of throws made.
47. In the situation in Exercise 3, if the customer buys Brand A now,
how long will it take before the customer buys Brand D?
[  1    0    0    0  ]
[ 1/4  1/4  1/4  1/4 ]
[  0   1/4  1/4  1/2 ]
[  0    0    0    1  ]
53. For each state, assuming a start in that state, how long does it
take, on the average, to reach a recurrent state for the transition
matrix
[  1    0    0    0  ]
[ 1/4  1/2   0   1/4 ]
[  0   1/2   0   1/2 ]
[  0    0    0    1  ]
54. For each state, assuming a start in that state, how long does it
take, on the average, to reach a recurrent state for the transition
matrix
1 0 0 0 0
1/5 4/5 0 0 0
0 1/3 1/3 1/3 0
0 1/4 0 1/2 1/4
0 0 0 0 1
55. For each state, assuming a start in that state, how long does it
take, on the average, to reach a recurrent state for the transition
matrix
[ 1/2  1/2  0    0   ]
[ 1/2  1/2  0    0   ]
[ 1/2  0    1/4  1/4 ]
[ 0    0    1/2  1/2 ]
56. For each state, assuming a start in that state, how long does it
take, on the average, to reach a recurrent state for the transition
matrix
[ 1    0    0    0    0    0   ]
[ 0    0    1/2  1/2  0    0   ]
[ 0    1/3  0    2/3  0    0   ]
[ 0    1/4  1/2  1/4  0    0   ]
[ 1/4  0    0    0    1/2  1/4 ]
[ 0    1/2  0    0    1/2  0   ]
These formulas may be used to find all the pij(2). There are two
corresponding formulas for the pij(3). To get from Ei to Ej in three
9.4. What Happens in the Long Run? 237
pij(4) = Σk pik(2) pkj(2),

pij(4) = Σk pik pkj(3).
are the probabilities of being in the various states exactly n steps after
a start in Ei. Thus, of course, each of these numbers is nonnegative,
and the numbers total 1. We generalize. Any list of s nonnegative
numbers that total 1 will be called a probability vector. (The word
vector is used here simply to indicate that we have a list of numbers.)
To restate the definition, (a1, a2, ..., as) is a probability vector if

aj ≥ 0 for all j = 1, ..., s; and a1 + a2 + ... + as = 1.
Suppose the probability vector (a1, ..., as) gives the probabilities
of being in the various states at some time. We mean, of course, that
aj is the probability of being in Ej at that time. For convenience, call
the time in question "now." What is the probability bj of being in Ej
exactly n steps from now? The conditional probability, given being
in Ei now, of being in Ej after n steps is pij(n). Since we must be
somewhere now,

bj = Σi ai pij(n)

for all j.
A probability vector (m1, ..., ms) is called a fixed probability vector
if, when it gives the probabilities of being in the various states now,
it also gives those probabilities at every later time. In symbols,

mj = Σi mi pij(n)

for all n.
At present, we do not know which Markov chains have a unique
fixed probability vector, but, given such a chain, finding the fixed
probability vector is easy. We may as well discuss how to do that now,
even though we do not yet know the significance of such vectors. A
fixed probability vector (m1, ..., ms) is simply a solution of the linear
equations

mj = Σi mi pij

whose entries are nonnegative and total 1. Consider, for example,
the Markov chain with transition matrix
[ 2/3  1/3  0   ]
[ 1/4  1/2  1/4 ]
[ 0    4/5  1/5 ]
We shall show that there is just one fixed probability vector and find
this vector. Suppose x, y, and z are the probabilities of being in E1, E2,
and E3 at a particular time. Let us compute the probability of being
in E1 one stage later. We can get to E1 directly from E1; the probability
we are in E1 and then stay in E1 is x(2/3). Likewise, the probability of
being in E2 and then going to E1 in a single step is y(1/4). We cannot
get from E3 to E1 in a single step. Thus the probability of being in E1
one step after the "particular time" is

(2/3)x + (1/4)y.
For (x, y, z) to be a fixed probability vector, this must equal x:

x = (2/3)x + (1/4)y.

Likewise, we need

y = (1/3)x + (1/2)y + (4/5)z,
z = (1/4)y + (1/5)z.

We seek nonnegative x, y, and z with

x + y + z = 1.

The first equation gives x = (3/4)y, and the third gives z = (5/16)y.
Substituting,

(3/4)y + y + (5/16)y = 1.
Thus y = 16/33, and hence x = 4/11 and z = 5/33. Our fixed
probability vector is thus (4/11, 16/33, 5/33). We now return to theory
and discuss, among other things, the significance of a fixed probability
vector.
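The arithmetic above is easy to check with exact fractions. The following sketch is our own, not from the text; it verifies that (4/11, 16/33, 5/33) satisfies mP = m and totals 1.

```python
from fractions import Fraction as F

# The transition matrix of the worked example above.
P = [[F(2, 3), F(1, 3), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(0), F(4, 5), F(1, 5)]]
m = [F(4, 11), F(16, 33), F(5, 33)]

# A fixed probability vector satisfies m P = m and has total 1.
mP = [sum(m[i] * P[i][j] for i in range(3)) for j in range(3)]
print(mP == m, sum(m) == 1)  # True True
```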
We are ready now to turn to the main subject of this section,
namely, what happens in a Markov chain after we reach a recurrent
state? We have already noted that sooner or later we definitely will
reach some recurrent state. If this state is absorbing, we just stay
there, and nothing worth mentioning occurs further. However, if we
reach a recurrent state that is not absorbing, we still have something
to study. Of course, one possibility is to start in a recurrent state.
As we already know, starting in a certain state is entirely equivalent
to reaching this state later, as far as what happens next. Therefore,
the discussion that follows does not distinguish between starting in
a recurrent state and reaching that state after a while. Our detailed
work will all be based on the assumption that a particular recurrent
state, which we shall designate Ei, is reached. Since, for some chains
and starting points, we do not know in advance which recurrent
states will be reached, we may have to apply the discussion that fol-
lows to more than one choice of Ei. Now we consider what happens
afterwards if we do reach a certain recurrent state Ei.
Let a particular recurrent state Ei be chosen. There typically will
be states that cannot be reached from Ei. Since we are concerned
only with what happens after being in Ei, we shall soon discard those
states; they are of no further interest to us. But there is something
we should discuss first. Recall that Ri is the set of states that can be
reached from Ei. Since Ei is recurrent, we can return from any state
in Ri to Ei. It follows we can get from any state in Ri to any state
in Ri. On the other hand, it follows from the definition of Ri that
one cannot get from a state in Ri to a state that is not in Ri. Now
we are ready to discard all states outside of Ri. In mechanical terms,
we simply delete from the transition matrix the rows and columns
corresponding to all states that cannot be reached from Ei. We then
have the transition matrix for a new Markov chain. This new chain
has the property that it is possible to get from any state to any state.
Studying this new chain is equivalent to discussing what happens in
the original chain after we reach Ei.
Until further notice, we shall confine our attention to the new
Markov chain introduced in the last paragraph. To say the same thing
another way, for the time being we confine our attention to chains
in which we can get from any state to any state. Clearly in such
a chain all states are recurrent. Over the course of time, wherever
we start, we shall pay infinitely many visits to every state. What
fraction of the time we spend, on the average, in each state is an
obvious question to raise.
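That question can be explored numerically. Here is a small simulation sketch (our own, not from the text); the chain used is the three-state example solved earlier, whose fixed probability vector is (4/11, 16/33, 5/33) ≈ (.364, .485, .152).

```python
import random

def visit_fractions(P, steps=200_000, start=0, seed=1):
    """Run the chain for `steps` steps and return the fraction of
    those steps spent in each state."""
    rng = random.Random(seed)
    counts = [0] * len(P)
    state = start
    for _ in range(steps):
        # choose the next state with probabilities from row `state`
        state = rng.choices(range(len(P)), weights=P[state])[0]
        counts[state] += 1
    return [c / steps for c in counts]

P = [[2/3, 1/3, 0],
     [1/4, 1/2, 1/4],
     [0, 4/5, 1/5]]
# Wherever we start, the fractions settle near (.364, .485, .152).
print([round(f, 3) for f in visit_fractions(P)])
```

Changing `start` leaves the long-run fractions essentially unchanged, which is the phenomenon the theory below accounts for.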
The first job to do is to make it clear exactly what we mean by
"fraction of the time." We begin doing that by choosing a positive
integer N and fixing our attention on the first N steps. Later we can
get an overall view by considering the limit as N tends to infinity.
As we just said, let N be a fixed positive integer. Consider certain
states Ei and Ej. We assume a start in Ei. We let the random variable
X be the number of steps, among the first N steps, that go to Ej.
We evaluate X in the usual way. Let random variables X1, ..., XN be
defined as follows:

Xn = 1 if the nth step is to Ej,
Xn = 0 otherwise.
We may write

X = X1 + X2 + ... + XN.

It follows, since E(Xn) = pij(n), that

E(X) = pij(1) + pij(2) + ... + pij(N).

Hence, writing xj(N) = [pij(1) + ... + pij(N)]/N for the expected
fraction of the first N steps that go to Ej, we have

lim [pij(N) + pij(N - 1) + ... + pij(1)]/N = 1/rjj,

where rjj is the mean recurrence time of Ej. We just showed

lim xj(N) = 1/rjj.

We have

Σj xj(N) = Σj [pij(1) + ... + pij(N)]/N.

Now we have

Σk (1/rkk) pkj = Σk [lim xk(N)] pkj = lim Σk xk(N) pkj = 1/rjj.

By definition then, (1/r11, ..., 1/rss) is a fixed probability vector.
In the last paragraph we showed there is at least one fixed prob-
ability vector, namely, (1/r11, ..., 1/rss). Now we show that there are no
other fixed probability vectors. Consider any fixed probability vector
(m1, ..., ms). Then, as noted above, we have

mj = Σk mk pkj(n) for all n.
Thus the arbitrary fixed probability vector (m1, ..., ms) is identical
to (1/r11, ..., 1/rss). It follows that (1/r11, ..., 1/rss) is the only fixed
probability vector. Recall we are working under the hypothesis that
one can get from any state to any state. The fact that, under this
hypothesis, there is a unique fixed probability vector is important in
itself.

It is now clear how to find the values of the 1/rjj efficiently.
All we need do is find a fixed probability vector. This vector will
necessarily be (1/r11, ..., 1/rss). An example showing how to do this
will be found above.
At this point a warning may be in order. We illustrate what we
were talking about with an example. Consider the Markov chain with
transition matrix:

[ 0  1 ]
[ 1  0 ]

By direct computation we find that (1/2, 1/2) is the only fixed prob-
ability vector. It follows, and it was obvious from the matrix, that
wherever we start we spend half the time in each state. But it does
not follow that, for example,

p12(3456789) = 1/2.

If we start in E1, the probability of being in E2 exactly 3,456,789
steps later, or any odd number of steps later, is 1. After an even
number of steps, that probability is 0.
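The oscillation is easy to see by computing matrix powers. A small sketch (our own) for the two-state chain that alternates deterministically between its states:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

P = [[0, 1], [1, 0]]   # each step switches to the other state
Pn = P
for n in range(1, 5):
    # P^n alternates: the flip matrix for odd n, the identity for
    # even n, so p12(n) is 1, 0, 1, 0, ... and has no limit.
    print(n, Pn)
    Pn = mat_mul(Pn, P)
```

The long-run averages still converge to (1/2, 1/2); it is the individual probabilities p12(n) that fail to converge.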
Exercises
57. Find a fixed probability vector for

[ 1/2  1/2  0   ]
[ 3/4  0    1/4 ]
[ 0    1/4  3/4 ]
58. Find a fixed probability vector for

[ 0  1/2  1/2 ]
[ 1  0    0   ]
[ 0  1    0   ]

59. Find a fixed probability vector for

[ 1/2  0    1/2 ]
[ 0    1/2  1/2 ]
[ 0    1/3  2/3 ]
60. Find a fixed probability vector for

[ 1/2  1/2  0    0   ]
[ 1/3  1/3  1/3  0   ]
[ 0    0    0    1   ]
[ 1/4  1/4  0    1/2 ]

61. Find a fixed probability vector for

[ 0    1/2  0    1/2 ]
[ 1/3  0    2/3  0   ]
[ 0    0    0    1   ]
[ 1/3  2/3  0    0   ]
Pascal:  values r, r+1, ...;  P(X = k) = C(k-1, r-1) p^r q^(k-r);
         mean r/p;  variance rq/p^2;  generating function [pz/(1 - qz)]^r.
         Number of Bernoulli trials needed to get r successes.

Poisson: values 0, 1, 2, ...;  P(X = k) = e^(-m) m^k / k!;
         mean m;  variance m;  generating function e^(mz - m).
         Approximates the binomial for n large, but m = np not large.
Answers

Chapter One
3a 13/32  3b 1/16  3c 1/4  4a 1/36  4b 1/6
4c 1/2  4d 1/2  4e 1/2  4f 7/36  4g 0
4h 1/2  4i 1/36  4j 0  4k 17/36  4l 1/4
4m 7/12  4n 1/12  4o 1/4  4p 1/4  4q 1/4
4r 0
Chapter Two
1a 20  1b 6  1c 24  2a 40,320  2b 2520
2c 4,989,600  3a 3,360  3b 34,650  3c 5.2797 · 10^14
Chapter Three
5 Yes, Yes, No  6 TFFF TTTF  7a .125, .25, .3125, .3125
7b .1552, .2688, .29952, .27648  8a .015625  8b .09375
8c .1875  8d .140625  9 1/6  10a .056314
10b .001953  11a .062500  11b .197531  11c .135031
12a .401878  12b .131687  12c .270190  13 .038580
14 .169753  15 .15625  16 1/2  17a .38580
Chapter Four
1a $.30  1b $.30  2 $4  3 $.64, -.36  4 $20
5 $10  6a 0, .5  6b 1, 5.5  6c 0, 12.5  6d 0, .5
6e 0, .5  6f 1, 0  6g 1, 1  6h 1, .0099  6i 1, 99
6j 1, 999,999  6k 1, 1  6l 1, 1.6  6m 1, 1.43
6n 1, 1.75  6o 1, 2  7a 5/2  7b 1/4  10a -2.375, 2.7344
10b 2.375, 2.7344  11a 3  11b 2  12a 200
12b 5,000  13 7/3, 14/9  14 $4.80, 9.60  15a 7
15b 7  15c 57  15d 57  16a 204  16b 204
16c 158,404  16d 158,404  16e 1,020  16f 0
19 X + Y: 6 & 2, XY: 9 & 19  20 4  22 2, 1  23 210, 175
24a 3  24b 2  24c 600  24d 400  24e 3
Chapter Five
9 3.5  10 3.5  11 12  12 3.45  13 3.3758
14 19/8  15 59/30  16 6.9  17 283/60  18 14
19a $.26337  19b $.70233  20 16/11  21 $479.60
22a 49/24  22b 26/15  23 1  24 21,998,000.25
25a .86397  25b .40462  25c .32718  25d .39320
25e 1.86397  25f .04874  25g .70588  25h 1.10385
26 104/9  27 10  28 710/17
29 with replacement: .27430656 (exactly), without replacement: 571/2205
30 11.25  31 3.15689  32 35/12  33 14/9
34a 23/3  34b 104/63  35 3e - e^2 (approx.)
Chapter Six
1 .075816  2a .223  2b .442  3a .0498  3b .577
4 .938  5a .135  5b .323  6 4.6
7 .751, .215, .0307, .00291, .000209  8 .384  9 .184  10a .5
10b .153  10c 182.5  10d 56  12 12  13 5
14 1  15 158  16a 4^n/√(πn)  16b √(2/(πn))
16c 2(16/27)^n/√3  17 56.05  18 .0178  19 30 · 10^6
20 .126  22 .1359  23 .8186  24 .8186  25 .0606
26a .6827  26b .9973  26c .0797  29 .8664, .9987
30 7.6 · 10^-24  31 .0013
Chapter Seven
1a 3/(1 - 2z)  1b 1/(1 - 3z)  1c 1/(1 + 3z)
1d 3/(1 - 3z)  1e z/(1 - 3z)  1f 1/(4 - z)
1g 3/(3 - z)  1h 2/(1 - z)  1i 2/(2 + z)  2 z/(2 - z)
3 z/2 + z^2/4 + z^3/8 + z^4/8  4a (z + z^2 + ... + z^6)/6
4b z/(6 - 5z)  5a 1, 1, 1, 1; 1/(1 - z)
5b 1 - a0, 1 - a1, 1 - a2, 1 - a3; 1/(1 - z) - f(z)
5c a1, a2, a3, a4; [f(z) - a0]/z  5d 0, a0, a1, a2; zf(z)
5e f(z)/(1 - z)  5f [1 - f(z)]/(1 - z)  5g zf(z)/(1 - z)
5h [1 - zf(z)]/(1 - z)
6a 1/6  6b 1/4  6c 1/2  6d 5/12
8a (1/6)(z + z^2 + ... + z^6)  8b 1/4 + z/2 + z^2/4
8c z/24 + z^2/8 + (z^3 + ... + z^6)/6 + z^7/8 + z^8/24
8d (z^2 + z^4 + ... + z^12)/6  9a 1/4  9b 2  9c 4
10a 3/2  10b 19/12  11a 1/2  11b 3/4  12a 1/6
12b e - 1  12c 4e - e^2 - 2  13a 10/3  13b 4/9
14a 8/3  14b 16/9  15a 1/2  15b 1/2  16a .3240
16b .7616  16c 1.1816  17a L  17b 2L - L^2
17c 1/log 2  18a 0, 1/2, 1/4, 1/8, ...  18b z^2/(2 - z)
18c 3  18d 2  19a $.97, 2.58, 3.55  22a 4
22b 21/4  22c 1  22d 5/9  22e 2 log 2
22f 9 log 1.5 - 3  22g 4
Chapter Eight
2 5/12  3 5/8  4 2/7  5 3/13, 5/13, 5/13
6 8/15, 4/15, 2/15, 1/15  7a 2/3  7b 8/9, 7/9
7c 16/27  9 .0000169  10 .9802  11 .449
12 .8784  13 .01220  14 .1241  15 1/3  16 .2471
17 1/11  18 .9999977  19 .9610  20 .2968
21 .6517  22 .9997  26 6  27 4  28 2.5
29 43/13  30a 8/3, 10/3  30b 16/3  31 30
33 12.1  34 7.8  35 17.3  36 50  37 12.1
38 1000  39 160.0  40 267.5  41 549.8
42 205,665  43 11 days 14 hrs  44a .0183, .4492, .6662, .8686, .8898
44b 48,993, 5,058.5, 1,336.0, 44.5, 10.6  45 6
46 14  47 42
Chapter Nine
3
.9  .1  0
.05  .8
0  .1  .9
4
[ 0    1/2  1/2  0    0    0    0   ]
[ 1/4  0    1/4  1/4  1/4  0    0   ]
[ 1/4  1/4  0    1/4  0    1/4  0   ]
[ 0    1/2  1/2  0    0    0    0   ]
[ 0    1/2  0    0    0    1/2  0   ]
[ 0    0    0    0    0    0    1   ]
[ 0    0    0    0    0    0    1   ]
5
[ 3/8  0    5/8 ]
[ 1/3  1/3  1/3 ]
[ 0    0    1   ]
13c 1/8 1/3 1/3 7/8
14 v12 = v42 = v13 = v43 = 0, v22 = v32 = v23 = v33 = 1/2
15a 1  15b 1/2
15c
[ 1    0    0    0    0   ]
[ 1    1/2  0    0    0   ]
[ 1/3  1/3  1/4  1/3  1/3 ]
[ 0    0    0    1    0   ]
[ 0    0    0    0    1   ]
16 v12 = v42 = v52 = v13 = v23 = v43 = v53 = 0, v22 = 1, v32 = 2/3,
v33 = 1/3  17 v15 = v25 = v35 = v45 = 0, v55 = 1/2
18 v13 = v23 = v43 = v14 = v24 = 0, v33 = 1/3, v34 = 2/3, v44 = 1
19 v11 = 3, v21 = 2, v31 = 0, v12 = 2, v22 = 1, v32 = 0
20 v12 = v32 = v52 = v14 = v34 = v54 = 0, v22 = 1/8, v42 = 3/8,
v24 = 3/8, v44 = 1/8  21 v11 = 1/2, v21 = 0, v31 = 0, v13 = 1,
v23 = 0, v33 = 1  22 r12 = 2, r13 = 6, r23 = 4, r33 = 1
distinct rooms: 4.38, 4.32, 3.81, 4.38, 2.81, 0 (In each case the start
is excluded.)  51 77.5  52 1, 2, 2, 1  53 1, 2, 2, 1
54 1, 5, 25/4, 9/2, 1  55 1, 1, 3, 5  56 1, 1, 1, 1, 10/3, 8/3
57 (3/7, 2/7, 2/7)  58 (2/5, 2/5, 1/5)  59 (0, 2/5, 3/5)
60 (1/3, 1/3, 1/9, 2/9)  61 (1/5, 3/10, 1/5, 3/10)
62 (3/14, 3/14, 2/7, 2/7)  63 (5/24, 1/4, 7/24, 1/4)
64 (0, 0, 1)  65a 0, 0, 0, 2/5, 3/5  65b 2/9, 1/3, 4/9, 0, 0
66a 1/2, 0, 1/2, 0, 0, 0, 0  66b 0, 0, 0, 1, 0, 0, 0
67 1/2, 1/3, 1/6  68 9/22, 4/11, 5/22
69 1/5, 2/5, 1/5, 1/5  70 6/29, 5/29, 6/29, 9/58, 6/29, 3/58
72 19/75, 38/75, 18/75
Index
fair game, 81
Fermat, Pierre de, 11-14, 56, 78, 184
fixed probability vector, 238
Maclaurin, Colin, 165
Maclaurin series, 166
negative binomial distribution, see Pascal distribution
Newton, Isaac, 2, 51, 145, 150, 151, 184
Normal approximation, 149
Normal Distribution, 149
odd and even, see parity
Ω, 4
P(A | B), 56
P(A), 7
pij, 212
pij(n), 236
random variable, 76
random walks, 183-207
recurrent, 214
rencontre, 174
Ri, 213
Richelieu, Cardinal, 13
Riemann, Bernhard, 160
Roberval, Gilles Personne de, 11-14
runs of successes, see successes, consecutive
sample space, 4
Shakespeare, William, 142
standard deviation, 85
Stirling, James, 144, 145, 156
Stirling's Formula, 144
transient, 214
transition matrix, 212
transition probabilities, 212
Var(X), 84
variance, 84, 96
vij, 223
w0, w1, w2, ..., 188, 206
W0, W1, ..., 198
with replacement, 19
without replacement, 19
Xian, Jia, 30