CHAPTER 4
PROBABILITY CONCEPTS
LEARNING OUTCOMES
After completing this chapter, you will be able to do the following:
■ Define a random variable.
174 Chapter 4 Probability Concepts
1 INTRODUCTION
All investment decisions are made in an environment of risk. The tools that allow us to
make decisions with consistency and logic in this setting come under the heading of prob-
ability. This chapter presents the essential probability tools needed to frame and address
many real world problems involving risk. We illustrate how these tools apply to such issues
as predicting investment manager performance, forecasting financial variables, and pricing
a bond so that it fairly compensates bondholders for default risk. In contrast to most intro-
ductions to probability, we de-emphasize mathematics but explore concepts important to
investments more fully. One such concept is independence, as independence relates to the
predictability of returns and financial variables. Another concept which receives special at-
tention is expectation, as analysts continually look to the future in their analyses and deci-
sions. Analysts and investors must also cope with variability. We present variance, or dis-
persion around expectation, as a risk concept important in investments. You will acquire
specific skills in using portfolio expected return and variance.
The basic tools of probability, including expected value and variance, are set out in
Section 2 of this chapter. Section 3 introduces covariance and correlation (measures of
relatedness between random quantities) and the principles for calculating portfolio expected
return and variance. Two topics end the chapter: Bayes' formula and outcome counting.
Bayes' formula is a procedure for updating beliefs based on new information. In several
areas, including a widely used option pricing model, the calculation of probabilities
involves defining and counting outcomes. The chapter ends with a discussion of principles
and shortcuts for counting.
2 PROBABILITY, EXPECTED VALUE, AND VARIANCE
In the above definition, the term mutually exclusive events means that only one
event can occur at a time; exhaustive means that the events cover all possible outcomes.
The most basic kind of mutually exclusive and exhaustive events is the set of the distinct
possible outcomes of the random variable. If we have that set and the assignment of
probabilities to those outcomes (the probability distribution of the random variable), we have
a complete description of the random variable.
Suppose we have a statement of the possible outcomes of stock returns and we know
their probabilities. But we are interested in the probability of a more complex event than a
particular outcome: What is the probability that the stock earns a return above the risk-free
rate? (We use italics to highlight statements that define events, in this chapter.) The
probability of any event is the sum of the probabilities of the distinct outcomes (here, stock
return outcomes) included in the definition of the event. So if the risk-free rate is 4 percent,
we would sum the probabilities of returns above 4 percent. And that raises a question: How
do we, in practice, obtain probabilities?
In investments, the probability of an event is very often estimated from data, as a rel-
ative frequency of occurrence. This is an empirical probability. We will point out empiri-
cal probabilities in several places in which they are used in this chapter. Relationships have
to be stable through time for empirical probabilities to be accurate. We cannot calculate an
empirical probability of an event not in the historical record, or a reliable empirical proba-
bility for a very rare event. There are cases, then, in which we may adjust an empirical prob-
ability to take account of perceptions of changing relationships. In other cases, we do not
have an empirical probability to use at all. We may also make a personal assessment of prob-
ability without reference to any particular data. Each of these three probabilities is a sub-
jective probability, one drawing on personal or subjective judgment. Subjective probabili-
ties are of great importance in investments. Investors, in making buy and sell decisions that
determine asset prices, often draw on subjective probabilities. Subjective probabilities ap-
pear in various places in this chapter, notably in our discussion of Bayes' formula. In a more
narrow range of well-defined problems, we can sometimes deduce probabilities by reason-
ing about the problem. The resulting probability is an a priori probability, one based on
logical analysis rather than on observation or personal judgment. We will use this type of
probability in Example 4-6. The counting methods we discuss later are particularly impor-
tant in calculating an a priori probability. Because a priori and empirical probabilities gen-
erally do not vary from person to person, they are often grouped as objective probabilities.
In business, we often meet probabilities stated in terms of odds, as "the odds for E,"
or the "odds against E," for example.¹ These terms can be defined as follows:
1. Odds for E = P(E)/[1 - P(E)]. In words, the odds for E are the probability of E
divided by 1 minus the probability of E. Given odds for E of "a to b" (for example,
"7 to 2"), the implied probability of E is a/(a + b).
2. Odds against E = [1 - P(E)]/P(E), the reciprocal of odds for E. Given odds
against E of "a to b," the implied probability of E is b/(a + b).
As an example of Statement 1, if P(E) = 1/3, the odds for E are (1/3)/(2/3) = 1/2, or "1 to
2." For odds of "1 to 2," the implied probability is 1/(1 + 2) = 1/3, as expected. As
an example of Statement 2, in wagering it is common to speak in terms of the odds against
something. For odds of "2 to 1" against E (an implied probability of E of 1/3), a $1 wager
on E, if successful, returns $2 in profits plus the $1 staked in the wager. The bet's
anticipated profit is $0 because (1/3 probability of winning) × ($2 profit if the wager is won) +
(2/3 probability of losing) × (-$1 loss if the wager is lost) = 0. This is an example of an
expected value calculation, which we define later.
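The odds conversions and the wager calculation above can be checked with a short Python sketch. The function names are illustrative choices, not notation from the chapter, and exact fractions are used only to avoid rounding noise.

```python
from fractions import Fraction

def odds_for(p):
    """Odds for E, P(E) / [1 - P(E)], as an exact fraction."""
    p = Fraction(p)
    return p / (1 - p)

def prob_from_odds_for(a, b):
    """Implied P(E) given odds for E of 'a to b'."""
    return Fraction(a, a + b)

def prob_from_odds_against(a, b):
    """Implied P(E) given odds against E of 'a to b'."""
    return Fraction(b, a + b)

p = Fraction(1, 3)
print(odds_for(p))                   # 1/2, i.e. "1 to 2"
print(prob_from_odds_for(1, 2))      # 1/3
print(prob_from_odds_against(2, 1))  # 1/3

# Anticipated profit of a $1 wager on E at odds of 2 to 1 against E:
# win $2 with probability 1/3, lose $1 with probability 2/3.
expected_profit = p * 2 + (1 - p) * (-1)
print(expected_profit)               # 0
```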
You are examining the common stock of two firms in the same industry in which an
important antitrust decision will be announced next week. The first firm, SmithCo
Corporation, will benefit by a governmental decision that there is no antitrust obsta-
cle related to a merger in which it is involved. You believe that SmithCo's share price
reflects a 0.85 probability of such a decision. A second firm, Selbert Corporation,
will equally benefit from a "go ahead" ruling. Surprisingly, you believe Selbert stock
1
In certain econometric and statistical applications, probability is also stated as odds.
0.50 0.85
The 0.50 probability column shows that Selbert shares are a better value than
SmithCo shares. Selbert shares are also a better value if a 0.85 probability is accu-
rate. On average, SmithCo shares are overvalued and Selbert shares are undervalued.
Your investment actions depend on your confidence in your analysis and on
any investment constraints you face (such as constraints on selling stock short).² A
conservative strategy would be to buy Selbert shares and reduce or eliminate any
current position in SmithCo. The most aggressive strategy is to short SmithCo stock
(relatively overvalued) and simultaneously buy the stock of Selbert (relatively un-
dervalued). This is known as a pairs arbitrage trade: a trade in two closely related
stocks involving the short sale of one and the purchase of the other.
The prices of SmithCo and Selbert shares reflect probabilities that are not
consistent. According to the Dutch Book Theorem,³ one of the most important
probability results for investments, inconsistent probabilities create profit opportunities. In
our example, investors, by their buy and sell decisions to exploit the inconsistent
probabilities, should eliminate the profit opportunity and inconsistency.
2
Selling short or shorting stock is selling borrowed shares in the hope that you can repurchase them later at a
lower price.
3
The theorem's name comes from the terminology of wagering. Suppose someone places a $100 bet on X at
odds of 10 to 1 against X, and later he is able to place a $600 bet against X at odds of 1 to 1 against X. Whatever
the outcome of X, that person makes a riskless profit ($400 if X occurs, $500 if X does not occur) because the
implied probabilities are inconsistent. He is said to have made a Dutch book in X. Ramsey (1931) presented the
problem of consistent probabilities. See also Lo (1999).
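The payoffs of the two wagers in the footnote above can be tabulated directly. This is a minimal sketch; the helper names are hypothetical, and it assumes that at odds of "a to b" against X a stake on X wins a/b per unit staked, while a stake against X wins b/a per unit staked.

```python
def profit_bet_on(stake, a, b, x_occurs):
    """Profit from staking `stake` on X at odds of 'a to b' against X."""
    return stake * a / b if x_occurs else -stake

def profit_bet_against(stake, a, b, x_occurs):
    """Profit from staking `stake` against X at odds of 'a to b' against X."""
    return -stake if x_occurs else stake * b / a

def dutch_book_profit(x_occurs):
    # $100 on X at 10 to 1 against X, plus $600 against X at 1 to 1 against X
    return (profit_bet_on(100, 10, 1, x_occurs)
            + profit_bet_against(600, 1, 1, x_occurs))

print(dutch_book_profit(True))    # 400.0 (X occurs)
print(dutch_book_profit(False))   # 500.0 (X does not occur)
```

The profit is positive in both states because the two sets of odds imply different probabilities of X: 1/11 for the first wager and 1/2 for the second.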
marginal probabilities.⁴ Suppose the question is: What is the probability that the stock
earns a return above the risk-free rate? The answer is an unconditional probability that can
be viewed as the ratio of two quantities. In the numerator is the sum of the probabilities of
stock returns above the risk-free rate. In the denominator is 1, the sum of the probabilities
of all possible returns.
Contrast the question, What is the probability of A? with the question, What is the
probability of A, given that B has occurred? The probability in answer to this last question
is a conditional probability, denoted P(A | B) (read: "the probability of A given B"). For
example, suppose we want to know the probability that the stock earns a return above the
risk-free rate, given that the stock earns a positive return. With the words "given that" we
are restricting returns to those larger than 0 percent; this is a new element in contrast to the
question that brought forth an unconditional probability. The conditional probability is
calculated as the ratio of two quantities. The numerator is the sum of the probabilities of stock
returns above the risk-free rate; in this particular case, the numerator is the same as it was
in the unconditional case. The denominator, however, changes from 1 to the sum of the
probabilities for all outcomes (returns) above 0 percent; the denominator is a number less
than 1, as negative returns are possible.⁵ To review, an unconditional probability is the
probability of an event without any restriction; it might even be thought of as a stand-alone
probability. A conditional probability, in contrast, is a probability of an event given that
another event has occurred.
Investors continually seek an information edge that will help improve their forecasts.
In mathematical terms, they are attempting to frame their view of the future using proba-
bilities conditioned on relevant information or events. Investors do not ignore useful infor-
mation; they adjust their probabilities to reflect it. Thus, the concepts of conditional proba-
bility and conditional expectation, which are discussed later, are extremely important in
investment analysis and financial markets. To state an exact definition of conditional prob-
ability, we need to introduce the concept of joint probability.
Suppose we ask the question: What is the probability of both A and B happening?
The answer to this question is a joint probability, denoted P(AB) (read: "the probability of
A and B"). If we think of the probability of A and the probability of B as sets built of the
outcomes of one or more random variables, the joint probability of A and B is the sum of
the probabilities of the outcomes they have in common. For example, consider two events:
the stock earns a return above the risk-free rate (A) and the stock earns a positive return
(B). The outcomes of A are contained within (are a subset of) the outcomes of B, so P(AB)
equals P(A). We can now state a definition of conditional probability that provides a for-
mula for calculating it.
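The stock-return illustration can be made concrete with a small Python sketch. The return distribution below is a hypothetical assumption chosen for illustration (the chapter does not give one); only the 4 percent risk-free rate comes from the text. The conditional probability is computed as the joint probability divided by P(B), which is the multiplication rule rearranged.

```python
# Hypothetical discrete distribution of stock returns (percent -> probability).
# These numbers are illustrative assumptions, not taken from the chapter.
returns = {-10: 0.20, 2: 0.15, 6: 0.40, 12: 0.25}
risk_free = 4  # percent, as in the chapter's illustration

def prob(event):
    """Sum the probabilities of the outcomes included in the event."""
    return sum(p for r, p in returns.items() if event(r))

def above_rf(r):      # event A: return above the risk-free rate
    return r > risk_free

def positive(r):      # event B: positive return
    return r > 0

p_a = prob(above_rf)                                 # unconditional P(A)
p_b = prob(positive)                                 # unconditional P(B)
p_ab = prob(lambda r: above_rf(r) and positive(r))   # joint P(AB)
p_a_given_b = p_ab / p_b                             # conditional P(A | B)

print(round(p_a, 4), round(p_b, 4), round(p_ab, 4), round(p_a_given_b, 4))
```

Because every outcome in A is also in B, the joint probability equals P(A), and conditioning on a positive return raises the probability from 0.65 to 0.8125 under these assumed numbers.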
4
In analyses of probabilities presented in tables, unconditional probabilities usually appear at the ends or
margins of the table, thus the term marginal probability. Because of possible confusion with the way marginal
is used in economics (roughly meaning incremental), we use the term unconditional probability throughout this
discussion.
5
In this example, the conditional probability is larger than the unconditional probability. We cannot generalize
from this example, however. For instance, the probability that the stock earns a return above the risk-free rate
given that the stock earns a negative return is 0.
Sometimes we know the conditional probability P(A | B) and we want to know the joint
probability P(AB). We can obtain the joint probability from the following multiplication
rule for probabilities, which is Equation 4-1 rearranged.
• Multiplication Rule for Probabilities. The joint probability of A and B can be
expressed as
$P(AB) = P(A \mid B)P(B)$ (4-2)
Equation 4-2 states that the joint probability of A and B equals the probability of A given B
times the probability of B. As P(AB) = P(BA), the expression P(AB) = P(BA) =
P(B | A)P(A) is equivalent to Equation 4-2.
Solution to 2.
From row 1:
From row 2:
When we have two events, A and B, that we are interested in, we often want to know
the probability that either A or B occurs. Here by or we mean an inclusive-or: that either A
or B occurs, or both A and B occur. To put this another way, the probability of A or B is the
probability that at least one of the two events occurs. Such probabilities are calculated
using the addition rule for probabilities.
• Addition Rule for Probabilities. Given events A and B, the probability that A or B
occurs, or both occur, is equal to the probability that A occurs, plus the probability
that B occurs, minus the probability that both A and B occur.
probability of outcomes in B net of the probability of any outcomes already counted when
we computed P(A). This is illustrated in Figure 4-1, where we avoid double-counting the
outcomes in the intersection of A and B by subtracting P(AB). As an example of the
calculation, if P(A) = 0.50, P(B) = 0.40, and P(AB) = 0.20, then P(A or B) = 0.50 + 0.40 -
0.20 = 0.70. Only if the two events A and B were mutually exclusive, so that P(AB) = 0,
would it be correct to state that P(A or B) = P(A) + P(B).
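As a quick check of the arithmetic, the addition rule can be written as a one-line Python function (the function name is an illustrative choice):

```python
def prob_a_or_b(p_a, p_b, p_ab):
    """Addition rule: P(A or B) = P(A) + P(B) - P(AB)."""
    return p_a + p_b - p_ab

print(round(prob_a_or_b(0.50, 0.40, 0.20), 2))  # 0.7
print(round(prob_a_or_b(0.50, 0.40, 0.0), 2))   # 0.9, the mutually exclusive case
```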
[Figure 4-1: Venn diagram of events A and B; the overlapping region is the intersection, "A and B".]
The next example shows how much useful information can be obtained using the few prob-
ability rules presented to this point.
Solution to 1. The probability is 0.35. The calculation uses the addition rule
for probabilities:
P(order 1 executes or order 2 executes) = P(order 1 executes) + P(order 2 executes)
- P(order 1 executes and order 2 executes) = 0.35 + 0.25 - 0.25 = 0.35
Here P(order 1 executes and order 2 executes) = P(order 1 executes | order 2 executes)
P(order 2 executes) = 1 × 0.25 = 0.25, where P(order 1 executes | order 2 executes) = 1
because, if order 2 executes, it is certain that order 1 also executes: Price
must pass through $10 to reach $9.75.
Note that the outcomes for which order 2 executes are a subset of the outcomes for
which order 1 executes. After you count the probability that order 1 executes, you
have counted the probability of the outcomes for which order 2 also executes.
Therefore, the answer to the question is the probability that order 1 executes, 0.35.
Solution to 2. If the first order executes, the probability that the second
order executes (stated as a percent) is 71.4 percent. In the solution to Part 1, you
found that P(order 1 executes and order 2 executes) = P(order 1 executes | order 2
executes)P(order 2 executes) = 1 × 0.25 = 0.25. An equivalent way to state this
joint probability is useful here:
P(order 1 executes and order 2 executes) = 0.25 = P(order 2 executes | order 1 executes)
P(order 1 executes)
Now P(order 1 executes) = 0.35 was a given, so you have one equation in one
unknown:
0.25 = P(order 2 executes | order 1 executes)(0.35)
You conclude that P(order 2 executes | order 1 executes) = 0.25/0.35 = 5/7, or about
0.714.
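The two solutions chain the multiplication and addition rules. A short Python sketch of the same steps, using the probabilities quoted in the solutions (the variable names are illustrative):

```python
p_order1 = 0.35      # P(order 1 executes), given
p_order2 = 0.25      # P(order 2 executes), given
p_1_given_2 = 1.0    # order 2 can execute only if order 1 has executed

p_both = p_1_given_2 * p_order2            # multiplication rule
p_either = p_order1 + p_order2 - p_both    # addition rule
p_2_given_1 = p_both / p_order1            # multiplication rule rearranged

print(round(p_both, 4), round(p_either, 4), round(p_2_given_1, 3))  # 0.25 0.35 0.714
```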
Of great interest to investment analysts are the concepts of independence and de-
pendence. These concepts bear on such basic investment questions as which financial vari-
ables are useful for investment analysis, whether asset returns can be predicted, and
whether superior investment managers can be selected on the basis of their past records.
Two events are independent if the occurrence of one event does not affect the proba-
bility of occurrence of the other event.
• Definition of Independent Events. Two events A and B are independent if and only
if P(A | B) = P(A) or, equivalently, P(B | A) = P(B).
When two events are not independent, they are dependent: the occurrence of one is related
to the probability of occurrence of the other. If we are trying to forecast one event, infor-
mation about a dependent event may be useful, but information about an independent event
will not be useful.
When two events are independent, the multiplication rule for probabilities, Equation
4-2, simplifies as follows.
• Multiplication Rule for Independent Events. When two events are independent,
the joint probability of A and B equals the product of the individual probabilities of
A and B.
$P(AB) = P(A)P(B)$
Thus, if we are interested in two independent events with probabilities of 0.75 and 0.50,
respectively, the probability that they both occur is 0.375 = 0.75 × 0.50. The multiplication
rule for independent events generalizes to more than two events; for example, if A, B, and
C are independent events, then P(ABC) = P(A)P(B)P(C).
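A minimal Python sketch of the rule and of a simple independence check follows. The function names are illustrative; the second and third checks reuse probabilities from the addition-rule illustration and the limit-order example earlier in the chapter.

```python
from math import isclose, prod

def joint_prob_independent(*probs):
    """P(all events occur) for independent events: product of the individual probabilities."""
    return prod(probs)

def are_independent(p_a, p_b, p_ab):
    """Check whether P(AB) = P(A)P(B), up to floating-point rounding."""
    return isclose(p_ab, p_a * p_b)

print(joint_prob_independent(0.75, 0.50))   # 0.375
print(are_independent(0.50, 0.40, 0.20))    # True: 0.50 x 0.40 = 0.20
print(are_independent(0.35, 0.25, 0.25))    # False: the two limit orders are dependent
```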
1. What is the probability that 3Q:2001 EPS will be larger than 2Q:2001 EPS (a
positive change in sequential EPS)?
2. What is the probability of two negative changes in sequential EPS (3Q:2001
EPS smaller than 2Q:2001 EPS, and 4Q:2001 EPS smaller than 3Q:2001
EPS)?
The following example illustrates how hard it is to satisfy a set of independent crite-
ria even when, individually, the criteria may not be stringent.
6
Sequential comparisons of quarterly EPS are with the immediately prior quarter. A sequential comparison
stands in contrast to a comparison with the same quarter one year ago (another frequent type of comparison).
Only 23 stocks out of 1,000 pass through your screen. If you define five
events (the stock passes the first valuation criterion, the stock passes the second
valuation criterion, the stock passes the analyst coverage criterion, the company
passes the profitability criterion, the company passes the financial strength criterion;
say events A, B, C, D, and E, respectively), then the probability that a stock will pass
all five criteria, under independence, is
$P(ABCDE) = P(A)P(B)P(C)P(D)P(E) = 0.023031$
Although only one of the five criteria is even moderately strict (the strictest lets
25 percent of stocks through), the probability that a stock can pass all five is only
0.023031, or about 2 percent. The size of the list of candidate investments is
0.023031 × 1,000 = 23.031 or 23 stocks.
An area of intense interest to investment managers and their clients is whether past
records of performance are useful in identifying repeat winners and losers. The following
example shows how this issue relates to the concept of independence.
The purpose of the Kahn and Rudd (1995) study, introduced in Example 4-2, was to
address the question of repeat mutual fund winners and losers. If whether a fund is a
loser in one period is independent of whether it is a winner in the next period, the
practical value of performance ranking is questionable. Using the four events defined in
Example 4-2 as building blocks, we can define the following events to address the
issue of predictability of mutual fund performance:
If the ranking in one period is independent of the ranking in the next period,
what would you expect P(fund is a period 2 loser and fund is a period 1 loser) to be?
Interpret the calculated probability 0.266.
By the multiplication rule for independent events, P(fund is a period 2 loser
and fund is a period 1 loser) = P(fund is a period 2 loser) × P(fund is a period 1
loser). Because 50 percent of funds are categorized as losers in each period, the
unconditional probability that a fund is labeled a loser in either period is 0.50. Thus
P(fund is a period 2 loser) × P(fund is a period 1 loser) = 0.50 × 0.50 = 0.25. If
whether a fund is a loser in one period is independent of whether a fund is a loser in
the other period, we conclude that P(fund is a period 2 loser and fund is a period 1
In investments, the question of whether one event (or characteristic) provides infor-
mation about another event (or characteristic) arises in both time-series settings (across
time) and cross-sectional settings (across units at a given point in time). Examples 4-4 and
4-6 illustrated independence in a time-series setting. Example 4-5 illustrated independence
in a cross-sectional setting. Independence/dependence relationships are often also
explored in both settings using regression analysis, a technique we discuss in a later chapter.
In many practical problems, we logically analyze a problem as follows: We formu-
late scenarios that we think are important for understanding the likelihood of an event that
we are interested in. We then estimate the probability of the event, given the scenario.
When the scenarios (conditioning events) are mutually exclusive and exhaustive, no possi-
ble outcomes are left out. We can then analyze the event using the total probability rule.
This rule explains the unconditional probability of the event in terms of probabilities con-
ditional on the scenarios.
The total probability rule is stated below for two cases. Part 1 gives the simplest case,
where we have two scenarios. One new notation is introduced. If we have an event or
scenario S, the event not-S, called the complement of S, is written S^C.⁷ Note that
P(S) + P(S^C) = 1, as either S or not-S must occur. Part 2 states the rule for the general
case of n mutually exclusive and exhaustive events or scenarios.
• Total Probability Rule.
1. $P(A) = P(A \mid S)P(S) + P(A \mid S^C)P(S^C)$
2. $P(A) = P(A \mid S_1)P(S_1) + P(A \mid S_2)P(S_2) + \cdots + P(A \mid S_n)P(S_n)$ (4-6)
where S_1, S_2, ..., S_n are mutually exclusive and exhaustive scenarios or events.
Equation 4-6 states the following: The probability of any event [P(A)] can be
expressed as a weighted average of the probabilities of the event, given scenarios [terms such
as P(A | S_1)]; the weights applied to these conditional probabilities are the respective
probabilities of the scenarios [terms such as P(S_1) multiplying P(A | S_1)], and the scenarios must
be mutually exclusive and exhaustive. Among other applications, this rule is needed to
understand Bayes' formula, which we discuss later in the chapter.
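The weighted-average reading of the rule translates directly into code. The sketch below is illustrative: the function name and the three-scenario inputs are hypothetical, not taken from the chapter.

```python
from math import isclose

def total_probability(cond_probs, scenario_probs):
    """P(A) = sum over scenarios of P(A | S_i) * P(S_i)."""
    if not isclose(sum(scenario_probs), 1.0):
        raise ValueError("scenario probabilities must sum to 1 (exhaustive scenarios)")
    return sum(pa_s * ps for pa_s, ps in zip(cond_probs, scenario_probs))

# Hypothetical inputs: P(A | S_i) for three scenarios, and the scenario probabilities.
p_a = total_probability([0.90, 0.50, 0.10], [0.20, 0.50, 0.30])
print(round(p_a, 4))  # 0.46
```

The guard on the scenario probabilities reflects the requirement that the scenarios be exhaustive; it cannot check that they are mutually exclusive, which remains a modeling assumption.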
In the next example, we use the total probability rule to develop a consistent set of
views about BankCorp's earnings per share.
7
For readers familiar with mathematical treatments of probability, S, a notation usually reserved for a concept
called the sample space, is being appropriated to stand for scenario.
You are continuing your investigation into whether you can predict the direction of
changes in BankCorp's quarterly EPS. You define four events:
Event                                                                Probability
A = change in sequential EPS is positive next quarter                0.55
A^C = change in sequential EPS is 0 or negative next quarter         0.45
S = change in sequential EPS is positive the prior quarter           0.55
S^C = change in sequential EPS is 0 or negative the prior quarter    0.45
On inspecting the data, you observe some persistence in EPS changes: increases tend
to be followed by increases, and decreases by decreases. The first probability
estimate you develop is P(change in sequential EPS is positive next quarter | change in
sequential EPS is 0 or negative the prior quarter) = P(A | S^C) = 0.40. The most
recent quarter's EPS (2Q:2001) is announced, and the change is a positive sequential
change (the event S). You are interested in forecasting EPS for 3Q:2001.
1. Write this statement in probability notation: "The probability that the change
in sequential EPS is positive next quarter, given that the change in sequential
EPS is positive the prior quarter."
2. Calculate the probability in Part 1. (Calculate the probability that is consistent
with your other probabilities or beliefs.)
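One way to find the probability consistent with the other beliefs is to solve the two-scenario total probability rule for the single unknown, P(A | S). The sketch below uses only the probabilities stated in the example; the variable names are illustrative.

```python
p_a = 0.55             # P(A): positive change next quarter
p_s = 0.55             # P(S): positive change in the prior quarter
p_s_c = 0.45           # P(S^C)
p_a_given_s_c = 0.40   # P(A | S^C), the first estimate developed

# Total probability rule: P(A) = P(A | S)P(S) + P(A | S^C)P(S^C).
# Rearranged to isolate the unknown P(A | S):
p_a_given_s = (p_a - p_a_given_s_c * p_s_c) / p_s
print(round(p_a_given_s, 4))   # 0.6727
```

A value above the unconditional 0.55 is in line with the persistence in EPS changes observed in the data.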
In the chapter on statistical concepts and market returns, we discussed the concept of
a weighted average or weighted mean. The example highlighted in that chapter was that
portfolio return is a weighted average of the returns on the individual assets in the portfo-
lio, where the weight applied to each asset's return is the fraction of the portfolio invested
in that asset. The total probability rule, which is a rule for stating an unconditional proba-
Expected value (for example, expected stock return) looks either to the future, as a
forecast, or to the "true" value of the mean (the population mean, discussed in the chapter on
statistical concepts and market returns). We should distinguish expected value from the
concepts of historical or sample mean. The sample mean also summarizes in a single
number a central value. However, the sample mean presents a central value for a particular set
of observations as an equally weighted average of those observations. To summarize, the
contrast is forecast versus historical, or population versus sample.
You continue with your analysis of BankCorp's EPS. In Table 4-3, you have
recorded a probability distribution for BankCorp's EPS for the current fiscal year.
Probability    EPS
0.15           $2.60
0.45           $2.45
0.24           $2.20
0.16           $2.00
Sum = 1.00
What is the expected value of BankCorp's EPS for the current fiscal year?
Following the definition of expected value, list each outcome, weight it by its
probability, and sum the terms.
$E(X) = P(x_1)x_1 + P(x_2)x_2 + \cdots + P(x_n)x_n = \sum_{i=1}^{n} P(x_i)x_i$
where x_i is one of n possible outcomes of the random variable X.⁸
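Applied to the BankCorp EPS distribution tabulated above, the probability-weighted sum can be sketched in Python as follows (the names are illustrative):

```python
# Probability distribution of BankCorp's EPS, as (probability, EPS) pairs from the table
eps_distribution = [(0.15, 2.60), (0.45, 2.45), (0.24, 2.20), (0.16, 2.00)]

def expected_value(dist):
    """E(X) = sum of P(x_i) * x_i over all outcomes."""
    return sum(p * x for p, x in dist)

print(round(expected_value(eps_distribution), 4))   # 2.3405, about $2.34
```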
The expected value is our forecast. Because we are discussing random quantities, we
cannot count on an individual forecast being realized (although we hope that, on average,
forecasts will be accurate). It is important, as a result, to measure the risk we face. Variance
and standard deviation measure the dispersion of outcomes around the expected value or
forecast.
The two notations for variance are σ²(X) and Var(X).
Variance is a number greater than or equal to 0 because it is the sum of squared terms. If
variance is 0, there is no dispersion or risk. The outcome is certain, and the quantity X is
not random at all. Variance greater than 0 indicates dispersion of outcomes. Increasing
variance indicates increasing dispersion, all else equal. Variance of X is a quantity in the
squared units of X. For example, if the random variable is return in percent, variance of
return is in units of percent squared. Standard deviation is easier to interpret than variance, as
it is in the same units as the random variable. If the random variable is return in percent,
standard deviation of return is also in units of percent.
The best way to become familiar with these concepts is to work examples.
In Example 4-8, you calculated the expected value of BankCorp's EPS as $2.34,
which is your forecast. Now you want to measure the dispersion around your fore-
cast. Table 4-4 shows your view of the probability distribution of EPS.
8
For simplicity, we model all random variables in this chapter as discrete random variables, which have a
countable set of outcomes. For continuous random variables, which are discussed along with discrete random
variables in the chapter on common probability distributions, the operation corresponding to summation is
integration.
Probability    EPS
0.15           $2.60
0.45           $2.45
0.24           $2.20
0.16           $2.00
Sum = 1.00
What are the variance and standard deviation of BankCorp's EPS for the current fis-
cal year?
The order of calculation is always expected value, then variance, then standard
deviation. Expected value has already been calculated. Following the definition of
variance above, calculate the deviation of each outcome from the mean or expected
value, square each deviation, weight (multiply) each squared deviation by its proba-
bility of occurrence, then sum these terms.
$\sigma^2(X) = P(x_1)[x_1 - E(X)]^2 + P(x_2)[x_2 - E(X)]^2 + \cdots + P(x_n)[x_n - E(X)]^2$
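The order of calculation described above (expected value, then variance, then standard deviation) can be sketched in Python for the BankCorp EPS distribution. The names are illustrative, and the unrounded expected value is used, so intermediate figures may differ slightly from a hand calculation based on $2.34.

```python
from math import sqrt

# Probability distribution of BankCorp's EPS, as (probability, EPS) pairs
eps_distribution = [(0.15, 2.60), (0.45, 2.45), (0.24, 2.20), (0.16, 2.00)]

# Step 1: expected value
mean = sum(p * x for p, x in eps_distribution)

# Step 2: variance, the probability-weighted sum of squared deviations from the mean
variance = sum(p * (x - mean) ** 2 for p, x in eps_distribution)

# Step 3: standard deviation, in the same units as EPS (dollars)
std_dev = sqrt(variance)

print(round(variance, 6), round(std_dev, 2))   # 0.038785 0.2
```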
$E(X \mid S) = P(x_1 \mid S)x_1 + P(x_2 \mid S)x_2 + \cdots + P(x_n \mid S)x_n$ (4-10)
Parallel to the total probability rule for stating unconditional probabilities in terms of
conditional probabilities, there is a principle for stating (unconditional) expected values in
terms of conditional expected values. This principle is the total probability rule for
expected value.
2. $E(X) = E(X \mid S_1)P(S_1) + E(X \mid S_2)P(S_2) + \cdots + E(X \mid S_n)P(S_n)$ (4-12)
where S_1, S_2, ..., S_n are mutually exclusive and exhaustive scenarios or events.
The general case, Part 2, states that the expected value of X equals the expected value of X
given Scenario 1, E(X | S_1), times the probability of Scenario 1, P(S_1), plus the expected
value of X given Scenario 2, E(X | S_2), times the probability of Scenario 2, P(S_2), and so
forth.
To use this principle, we formulate mutually exclusive and exhaustive scenarios that
are useful for understanding the outcomes of the random variable. This approach was
employed in developing the probability distribution of BankCorp's EPS in Examples 4-8 and
4-9.
The earnings of BankCorp are interest rate sensitive, benefiting from a declining in-
terest rate environment. Suppose there is a 0.60 probability that BankCorp will operate in
a declining interest rate environment in the current fiscal year, and a 0.40 probability that it
will operate in a stable interest rate environment (assessing the chance of an increasing in-
terest rate environment as negligible). If a declining interest rate environment occurs, the
probability that EPS will be $2.60 is estimated at 0.25, and the probability that EPS will be
$2.45 is estimated at 0.75. Note that 0.60, the probability of declining interest rate envi-
ronment, times 0.25, the probability of $2.60 EPS given a declining interest rate environ-
ment, equals 0.15, the (unconditional) probability of $2.60 given in the table in Examples
4-8 and 4-9 above. The probabilities are consistent. Also, 0.60 × 0.75 = 0.45, the
probability of $2.45 EPS given in Table 4-2. The tree diagram in Figure 4-2 shows the rest of
the analysis.
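[Figure 4-2: Tree diagram of BankCorp's EPS. Root: E(EPS) = $2.34. Declining interest rates
(probability 0.60): EPS = $2.60 with probability 0.15, EPS = $2.45 with probability 0.45. Stable
interest rates (probability 0.40): EPS = $2.20 with probability 0.24, EPS = $2.00 with
probability 0.16.]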
Given a declining interest rate environment, we are at the node of the tree that
branches off to outcomes of $2.60 and $2.45. We can find expected EPS given a declining
interest rate environment as follows, using Equation 4-10:
Once we have the new piece of information that interest rates are stable, for example, we
revise our original expectation of EPS from $2.34 downward to $2.12. Now using the total
probability rule for expected value (Part 1)
These are conditional variances, the variance of EPS given a declining interest rate
environment and the variance of EPS given a stable interest rate environment. The rela-
tionship between unconditional variance and conditional variance is a relatively advanced
topic.⁹ The main points are that variance, like expected value, has a conditional counterpart
to the unconditional concept, and that we can use conditional variance to assess risk given
a particular scenario.
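The conditional expected values and conditional variances for the two interest rate scenarios can be reproduced with a short Python sketch. The scenario probabilities and the conditional probabilities for the declining-rate scenario come from the text; the stable-rate conditional probabilities (0.60 and 0.40) are derived here as joint probability divided by scenario probability (0.24/0.40 and 0.16/0.40), which is an inference from the tree rather than a quoted figure. The names are illustrative.

```python
# scenario -> (scenario probability, conditional EPS distribution as (probability, EPS) pairs)
scenarios = {
    "declining": (0.60, [(0.25, 2.60), (0.75, 2.45)]),
    "stable":    (0.40, [(0.60, 2.20), (0.40, 2.00)]),  # derived: 0.24/0.40, 0.16/0.40
}

def mean(dist):
    """Conditional expected value: sum of P(x_i | S) * x_i."""
    return sum(p * x for p, x in dist)

def variance(dist):
    """Conditional variance around the conditional expected value."""
    m = mean(dist)
    return sum(p * (x - m) ** 2 for p, x in dist)

cond_mean = {s: mean(d) for s, (_, d) in scenarios.items()}
cond_var = {s: variance(d) for s, (_, d) in scenarios.items()}

# Total probability rule for expected value
e_eps = sum(p * cond_mean[s] for s, (p, _) in scenarios.items())

# Unconditional variance = expected conditional variance + variance of conditional means
term1 = sum(p * cond_var[s] for s, (p, _) in scenarios.items())
term2 = sum(p * (cond_mean[s] - e_eps) ** 2 for s, (p, _) in scenarios.items())

print({s: round(m, 4) for s, m in cond_mean.items()})
print(round(e_eps, 4))
print({s: round(v, 6) for s, v in cond_var.items()})
print(round(term1 + term2, 6))
```

Under these inputs the conditional expectations are $2.4875 and $2.12, and the two variance terms sum to the unconditional variance of EPS, about 0.0388.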
Continuing with BankCorp, you focus now on BankCorp's cost structure. One
model you are researching for BankCorp's operating costs is
Y = a + bX
where Y is a forecast of operating costs in millions of dollars and X is the number of
branch offices. (This model was developed using regression analysis, which we will
discuss in a later chapter.) You interpret the intercept a as fixed costs and b as
variable costs. You estimate the equation as
Y = 12.5 + 0.65X
BankCorp currently has 66 branch offices, and the equation estimates operating costs of
12.5 + 0.65 × 66 = $55.4 million. You have two scenarios for growth, pictured in
the tree diagram in Figure 4-3.
[Figure 4-3: Tree diagram of BankCorp's forecasted operating costs. Initial box: Expected Op.
Costs = ? High-growth scenario: Branches = 125 or Branches = 100. Low-growth scenario
(Probability = 0.20): Branches = 80 or Branches = 70. Each terminal box asks Op. Costs = ?
and Prob = ?]
9
The unconditional variance of EPS is the sum of two terms: (1) the expected value (probability-weighted
average) of the conditional variances (parallel to the total probability rules), and (2) the variance of conditional
expected values of EPS. The second term arises because the variability in conditional expected value is a source
of risk. Term (1) is E[σ²(EPS | interest rate environment)] = P(declining interest rate environment) ×
σ²(EPS | declining interest rate environment) + P(stable interest rate environment) ×
σ²(EPS | stable interest rate environment) = (0.60 × 0.004219) + (0.40 × 0.0096) = 0.006371.
Term (2) is σ²[E(EPS | interest rate environment)] = 0.60 × ($2.4875 - $2.34)² +
0.40 × ($2.12 - $2.34)² = 0.032414. Summing the two terms, the unconditional variance is
0.006371 + 0.032414 = 0.038785.
1. Compute the forecasted operating costs given the different levels of operating
costs, using Y = 12.5 + 0.65X. State the probability of each level of the
number of branch offices. These are the answers to the questions in the terminal
boxes of the tree diagram.
2. Compute the expected value of operating costs, given the high-growth scenario.
Also calculate the expected value of operating costs, given the low-growth
scenario.
3. Answer the question in the initial box of the tree: What are BankCorp's expected
operating costs?
Solution to 1. Using Y = 12.5 + 0.65X, from top to bottom you have
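As a quick check on the cost forecasts, the four terminal boxes can be evaluated with a short Python sketch. It assumes only the estimated equation Y = 12.5 + 0.65X and the branch counts shown in the tree; the function and variable names are illustrative, and the branch probabilities are not computed here.

```python
# Estimated cost model from the example: Y = 12.5 + 0.65X, with Y in $ millions.
FIXED_COSTS = 12.5        # intercept of the estimated equation
COST_PER_BRANCH = 0.65    # slope of the estimated equation


def forecast_operating_costs(branches: int) -> float:
    """Return forecasted operating costs ($ millions) for a number of branches."""
    return FIXED_COSTS + COST_PER_BRANCH * branches


# Branch counts in the terminal boxes of the tree, top to bottom.
branch_levels = [125, 100, 80, 70]
forecasts = {x: forecast_operating_costs(x) for x in branch_levels}

for x, y in forecasts.items():
    print(f"Branches = {x}: forecasted operating costs = ${y:.2f} million")
```

The same function reproduces the current estimate of $55.4 million when called with 66 branches.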
We will see conditional probabilities again when we discuss Bayes' formula. This
section has only introduced some of the problems that can be addressed using probability
tools. The following problem draws on these tools, as well as on analytical skills.
maturity) on the debt instrument and \(R_F\) is the risk-free rate, the default risk premium
is \(R - R_F\). You assess the probability that the bond defaults as P(the bond defaults) =
0.06. Looking at current money market yields, you find that one-year Treasury bills
(T-bills) are offering a return of 5.8 percent, an estimate of \(R_F\). As a first step, you
make the simplifying assumption that bondholders will recover nothing in the event
of a default. What is the minimum default risk premium you should require for this
instrument?
The challenge in this type of problem is to find a starting point. In many prob-
lems, including this one, an effective first step is to divide up the possible outcomes
into mutually exclusive and exhaustive events in an economically logical way. Here,
from the viewpoint of a bondholder, the two events that affect returns are the bond
defaults and the bond does not default. These two events cover all outcomes. How do
these events affect a bondholder's returns? A second step is to compute the value of
the bond for the two events. We don't have specifics on bond face value, but we can
compute value per $1 or one unit of currency invested. (It is useful to use symbols so
that a sensitivity analysis can be done.)
The third step is to find the expected value of the bond (per $1 invested).
Solving for the promised return on the bond, you find \(R = \{(1 + R_F)/[1 - P(\text{the bond defaults})]\} - 1\).
Substituting in the values in the statement of the problem, R =
[1.058/(1 − 0.06)] − 1 = 1.12553 − 1 = 0.12553 or about 12.55 percent, and the default
risk premium is \(R - R_F\) = 12.55% − 5.8% = 6.75%.
You require a default risk premium of at least 675 basis points. You can state
the matter as follows: If the bond is priced to yield 12.55 percent, you will earn a 675
basis-point spread and receive the bond principal with 94 percent probability. If the
bond defaults, however, you will lose everything. With a premium of 675 basis
points, you expect to just break even relative to an investment in T-bills. Because an
investment in the zero-coupon bond has variability, if you are risk averse you might
demand a higher risk premium than 675 basis points.
This analysis is a starting point. Bondholders usually recover part of their in-
vestment after a default. A next step would be to incorporate a recovery rate. That
problem is left for the end-of-chapter problems.
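The break-even calculation in this example can be sketched in Python. The sketch assumes, as the example does, that bondholders recover nothing in default, so the expected payoff per $1 invested is (1 − P(default)) × (1 + R); the function and variable names are illustrative.

```python
def break_even_promised_yield(risk_free_rate: float, default_prob: float) -> float:
    """Promised yield R solving (1 - p)(1 + R) = 1 + R_F, assuming zero recovery."""
    if not 0.0 <= default_prob < 1.0:
        raise ValueError("default_prob must be in [0, 1)")
    return (1.0 + risk_free_rate) / (1.0 - default_prob) - 1.0


risk_free_rate = 0.058   # one-year T-bill return from the example
default_prob = 0.06      # assessed probability that the bond defaults

promised_yield = break_even_promised_yield(risk_free_rate, default_prob)
default_risk_premium = promised_yield - risk_free_rate

# Expected payoff per $1 invested: full payoff if no default, nothing otherwise.
expected_payoff = (1.0 - default_prob) * (1.0 + promised_yield)

print(f"Promised yield:       {promised_yield:.4%}")
print(f"Default risk premium: {default_risk_premium:.4%}")
print(f"Expected payoff/$1:   {expected_payoff:.4f}")
```

Keeping the inputs as named variables makes a simple sensitivity analysis possible: changing `default_prob` or `risk_free_rate` shows how the required premium responds.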
Portfolio Expected Return and Variance 195
In this section, we have treated random variables such as EPS as stand-alone quanti-
ties. We have not explored how descriptors such as expected value and variance of EPS
may be functions of other random variables such as sales and costs. To analyze portfolios,
we must understand how portfolio expected return and variance of return are a function of
characteristics of the individual securities' returns. When we look at the dispersion or vari-
ance of portfolio return, we see that how individual security returns move together or co-
vary is important. New concepts, covariance and correlation, are needed. These new con-
cepts are introduced in the next section, which deals with portfolio expected return and
variance of return.
The first question is: What is the expected return on the portfolio? In the previous
section, we defined the expected value of a random variable as the probability-weighted
average of the possible outcomes. Portfolio return, we know, is a weighted average of the
returns on the securities in the portfolio. Similarly, the expected return on a portfolio is a
weighted average of the expected returns on the securities in the portfolio, using exactly
the same weights. When we have estimated the expected returns on the individual securi-
ties, we immediately have portfolio expected return. This convenient fact follows from the
properties of expected value.
10
Although we outline a number of basic concepts in this section, we do not present mean-variance analysis per
se. For extended treatments, consult standard investment textbooks such as Bodie, Kane, and Marcus (1999),
Elton and Gruber (1995), Reilly and Brown (2000), and Sharpe, Alexander, and Bailey (1998).
2. The expected value of a weighted sum of random variables equals the weighted
sum of the expected values, using the same weights.
Suppose we have a random variable with a given expected value. We then multiply each
outcome by 2, doubling the value of each outcome. The random variable's expected value
doubles as well. That is the meaning of Part 1. The second statement generalizes the principle;
it is the rule that directly leads to the expression for portfolio expected return. A portfolio
with n securities is defined by its portfolio weights, \(w_1, w_2, \ldots, w_n\), which sum to 1.
So portfolio return, \(R_p\), is \(R_p = w_1R_1 + w_2R_2 + \cdots + w_nR_n\). We can state the following
principle:
Suppose we have estimated expected returns on the assets in the portfolio, as given in
Table 4-6.
tion? In the chapter on statistical concepts and market returns, we learned how to calculate
a historical or sample variance based on a sample of returns. Now we are considering vari-
ance in a forward-looking sense. We will use information about the individual assets in the
portfolio to obtain portfolio variance of return. To avoid clutter in notation, we write \(ER_p\)
for \(E(R_p)\). We need the concept of covariance.
• Definition of Covariance. Given two random variables \(R_i\) and \(R_j\), the covariance between
\(R_i\) and \(R_j\) is
\[ \mathrm{Cov}(R_i, R_j) = E[(R_i - ER_i)(R_j - ER_j)] \qquad \text{(4-14)} \]
Equation 4-14 states that the covariance between two random variables is the probability-
weighted average of the cross-product of each random variable's deviation from its own
expected value. We will return to discuss covariance after we establish the need for the
concept. Working from the definition of variance, we find
(4-15)
The last step follows from the definitions of variance and covariance. 11 For the italicized
covariance terms below the diagonal, we used the fact that the order of variables in covari-
11
The calculations leading to Equation 4-15 demonstrate the first of the following useful facts about variance.
Let w be any constant, and let R be any random variable: (1) The variance of a constant times a random variable
equals the constant squared times the variance of the random variable, or \(\sigma^2(wR) = w^2\sigma^2(R)\); (2) The variance
of a constant plus a random variable equals the variance of the random variable, or \(\sigma^2(w + R) = \sigma^2(R)\).
tion 4-15 is \(\sigma^2(R_p) = \sum_{i=1}^{3}\sum_{j=1}^{3} w_i w_j \mathrm{Cov}(R_i, R_j)\). The double summation signs say: "Set
i = 1 then let j run from 1 to 3; then set i = 2 and let j run from 1 to 3; next set i = 3 and
let j run from 1 to 3; finally add the nine terms." This expression generalizes for a portfolio
of any size n to
\[ \sigma^2(R_p) = \sum_{i=1}^{n}\sum_{j=1}^{n} w_i w_j \mathrm{Cov}(R_i, R_j) \qquad \text{(4-16)} \]
We see from Equation 4-15 that individual variances of return (the bolded diagonal
terms) constitute part, but not all, of portfolio variance. The three variances are actually
outnumbered by the six covariance terms off the diagonal. For three assets, the ratio is 1 to
2, or 50 percent. If there are 20 assets, there are 20 variance terms and 20 × 20 −
20 = 380 off-diagonal covariance terms. The ratio of variance terms to off-diagonal
covariance terms is less than 6 to 100, or 6 percent. A first observation, then, is this: As the
number of holdings increases, covariance becomes increasingly important, all else equal.
What exactly is the effect of covariance on portfolio variance? The covariance terms
capture how the co-movements of returns affect portfolio variance. For example, consider
two stocks: one tends to have high returns (relative to its expected return) when the other
has low returns (relative to its expected return). The returns on one stock tend to offset the
returns on the other stock, lowering the variability or variance of returns on the portfolio.
Like variance, the units of covariance are hard to interpret, and we will introduce a more
intuitive concept shortly. Meanwhile, from the definition of covariance we can establish
two essential observations about covariance.
A complete list of the covariances constitutes all the statistical data needed to compute
portfolio variance of return. Covariances are often presented in a square format called a
covariance matrix. Table 4-7 summarizes the inputs for portfolio expected return and
variance of return.
Stock    A              B              C
         E(RA)          E(RB)          E(RC)

Stock    A              B              C
A        Cov(RA, RA)    Cov(RA, RB)    Cov(RA, RC)
B        Cov(RB, RA)    Cov(RB, RB)    Cov(RB, RC)
C        Cov(RC, RA)    Cov(RC, RB)    Cov(RC, RC)
With three assets, the covariance matrix has \(3^2 = 3 \times 3 = 9\) entries, but it is customary
to treat the diagonal terms, the variances, separately from the off-diagonal terms.
This is natural, as security variance is a single-variable concept. So there are 9 − 3 = 6
covariances, excluding variances. But \(\mathrm{Cov}(R_B, R_A) = \mathrm{Cov}(R_A, R_B)\), \(\mathrm{Cov}(R_C, R_A) =
\mathrm{Cov}(R_A, R_C)\), and \(\mathrm{Cov}(R_C, R_B) = \mathrm{Cov}(R_B, R_C)\). The covariance matrix below the diagonal
is the mirror image of the covariance matrix above the diagonal. As a result, there are only
6/2 = 3 distinct covariance terms to estimate. In general, for n securities there are
n(n − 1)/2 distinct covariances to estimate, and n variances to estimate.
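The counting argument above can be checked with a few lines of Python. This is an illustrative sketch; the function name and the dictionary keys are assumptions introduced for the example.

```python
def covariance_matrix_counts(n: int) -> dict:
    """Count the entries of an n-by-n covariance matrix by type."""
    return {
        "entries": n * n,                            # all cells of the matrix
        "variances": n,                              # diagonal terms
        "off_diagonal": n * n - n,                   # covariance terms, both triangles
        "distinct_covariances": n * (n - 1) // 2,    # one triangle only
    }


three_assets = covariance_matrix_counts(3)
twenty_assets = covariance_matrix_counts(20)

# Ratio of variance terms to off-diagonal covariance terms for 20 assets.
ratio_20 = twenty_assets["variances"] / twenty_assets["off_diagonal"]

print(three_assets)
print(twenty_assets)
print(f"Variance-to-covariance ratio, 20 assets: {ratio_20:.4f}")
```

For 20 assets the ratio is 20/380, a little over 5 percent, consistent with the statement that it is less than 6 to 100.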
Suppose we have the covariance matrix shown in Table 4-8:
S&P 500    U.S. Long-Term Corporate Bonds    MSCI EAFE
Let us take Equation 4-15 and group variance terms together. We have:
(4-17)
Let us look at the first three terms in the calculation above. Their sum, 132.625 =
100 + 5.0625 + 27.5625, is the contribution of the individual variances to portfolio vari-
ance. If the returns on the three assets were independent, according to a fact given above,
covariances would be 0 and the standard deviation of portfolio return would be
\((132.625)^{1/2} = 11.52\) percent as compared to 14 percent before. The portfolio would have
less risk. Suppose the covariance terms were negative. Then a negative number would be
added to 132.625, so portfolio variance and risk would be even smaller. At the same time,
we have not changed expected return. For the same expected portfolio return, the portfo-
lio has less risk. This risk reduction is a diversification benefit, meaning a risk-reduction
benefit from holding a portfolio of assets. The diversification benefit increases with decreasing
covariance. This observation is a key insight of modern portfolio theory. It is
even more intuitively stated when we can use the concept of correlation. Then we can say
that as long as security returns are not perfectly positively correlated, diversification ben-
efits are possible. Furthermore, the smaller the correlation between security returns, the
greater the cost of not diversifying (in terms of risk reduction benefits forgone), all else
equal.
• Definition of Correlation. The correlation between two random variables, \(R_i\) and \(R_j\),
is defined as \(\rho(R_i, R_j) = \mathrm{Cov}(R_i, R_j)/[\sigma(R_i)\sigma(R_j)]\). Alternative notations are \(\mathrm{Corr}(R_i, R_j)\)
and \(\rho_{ij}\).
S&P 500    U.S. Long-Term Corporate Bonds    MSCI EAFE
For example, the covariance between long-term bonds and EAFE is 38, from Table 4-8.
The standard deviation of long-term bond returns is \((81)^{1/2} = 9\) percent, that of EAFE returns
is 21 percent, from diagonal terms in Table 4-8. The correlation
\(\rho(\text{Return on long-term bonds}, \text{Return on EAFE})\) is \((38\%^2)/[(9\%)(21\%)] = 0.201\), rounded to 0.20.
The correlation of the S&P 500 with itself equals 1: The calculation is its own covariance, which is
variance, divided by its standard deviation squared, which equals variance.
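The conversion from covariance to correlation can be sketched in Python. The inputs are the figures quoted in the text (covariance of 38, bond variance of 81); the EAFE variance of 441 is inferred here from the stated standard deviation of 21 percent, and the function name is illustrative.

```python
import math


def correlation(covariance: float, variance_i: float, variance_j: float) -> float:
    """Correlation = covariance divided by the product of standard deviations."""
    return covariance / (math.sqrt(variance_i) * math.sqrt(variance_j))


bond_variance = 81.0        # long-term bonds, so standard deviation is 9 percent
eafe_variance = 21.0 ** 2   # inferred from the 21 percent standard deviation

bond_eafe_corr = correlation(38.0, bond_variance, eafe_variance)

# The covariance of a variable with itself is its variance, so the
# correlation of any return series with itself is 1.
self_corr = correlation(bond_variance, bond_variance, bond_variance)

print(f"Correlation(bonds, EAFE) = {bond_eafe_corr:.3f}")
print(f"Correlation(bonds, bonds) = {self_corr:.1f}")
```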
• Properties of Correlation.
1. Correlation is a number between −1 and +1:
\(-1 \le \rho(X, Y) \le +1\)
Fund    A              B
        E(RA) = 20%    E(RB) = 12%

Covariance Matrix
Fund    A      B
A       625    120
B       120    196

\[ \rho(R_A, R_B) = \mathrm{Cov}(R_A, R_B)/[\sigma(R_A)\sigma(R_B)] = 120/(25 \times 14) = 0.342857, \text{ or } 0.34. \]
13
If the correlation is 0, \(R_1 = a + bR_2 + \text{error}\), with b = 0.
14
If the correlation is positive, \(R_1 = a + bR_2 + \text{error}\), with b > 0. If the correlation is negative, b < 0.
A B
A 1.00 0.34
B 0.34 1.00
Solution to 3.
\[
\begin{aligned}
\sigma^2(R_p) &= w_A^2\sigma^2(R_A) + w_B^2\sigma^2(R_B) + 2w_Aw_B\mathrm{Cov}(R_A, R_B) \\
&= (0.75)^2(625) + (0.25)^2(196) + 2(0.75)(0.25)(120) \\
&= 351.5625 + 12.25 + 45 = 408.8125 \\
\sigma(R_p) &= (408.8125)^{1/2} = 20.22 \text{ percent}
\end{aligned}
\]
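The two-fund calculation is an instance of the general double-sum expression for portfolio variance, and can be sketched in Python as follows. The weights (0.75 and 0.25), expected returns, and covariance matrix are those of the example; the function names are illustrative, and the portfolio expected return printed here is computed from those inputs rather than quoted from the text.

```python
import math


def portfolio_expected_return(weights, expected_returns):
    """Weighted average of expected returns, using the portfolio weights."""
    return sum(w * r for w, r in zip(weights, expected_returns))


def portfolio_variance(weights, cov):
    """Double sum of w_i * w_j * Cov(R_i, R_j) over all pairs (i, j)."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j]
               for i in range(n) for j in range(n))


weights = [0.75, 0.25]              # Fund A, Fund B
expected_returns = [20.0, 12.0]     # in percent
cov = [[625.0, 120.0],              # covariance matrix; variances on the diagonal
       [120.0, 196.0]]

exp_return = portfolio_expected_return(weights, expected_returns)
variance = portfolio_variance(weights, cov)
std_dev = math.sqrt(variance)

print(f"Expected return: {exp_return:.2f} percent")
print(f"Variance: {variance:.4f}")
print(f"Standard deviation: {std_dev:.2f} percent")
```

Because the double sum visits both (A, B) and (B, A), the covariance term is counted twice, which matches the factor of 2 in the two-asset formula.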
The expected return on BankCorp stock is (0.20 X 25%) + (0.50 X 12%) + (0.30 X
10%) = 14%. The expected return on NewBank stock is (0.20 X 20%) + (0.50 X
16%) + (0.30 X 10%) = 15%. The joint probability function above might reflect an
analysis based on whether banking industry conditions are good, average, or poor. Table
4-13 presents the calculation of covariance.
15
See any of the textbooks mentioned in footnote 10.
The first and second columns of numbers show, respectively, the deviations of BankCorp
and NewBank returns from their mean or expected value. The next column shows the
product of the deviations. For example, for good industry conditions, (25 − 14) ×
(20 − 15) = 11 × 5 = 55. Then 55 is multiplied or weighted by 0.20, the probability that
banking industry conditions are good: 55 × 0.20 = 11. The calculations for average and
poor banking conditions follow the same pattern. Summing up these probability-weighted
products, we find that \(\mathrm{Cov}(R_B, R_N) = 16\).
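The covariance calculation can be reproduced with a short Python sketch. The three scenarios, their probabilities, and the returns are taken from the expected-return calculations above; the scenario labels and variable names are illustrative.

```python
# (probability, BankCorp return %, NewBank return %) for each industry condition.
scenarios = {
    "good":    (0.20, 25.0, 20.0),
    "average": (0.50, 12.0, 16.0),
    "poor":    (0.30, 10.0, 10.0),
}

# Expected returns: probability-weighted averages of the outcomes.
exp_bankcorp = sum(p * rb for p, rb, _ in scenarios.values())
exp_newbank = sum(p * rn for p, _, rn in scenarios.values())

# Covariance: probability-weighted average of the cross-products of deviations.
weighted_products = {
    name: p * (rb - exp_bankcorp) * (rn - exp_newbank)
    for name, (p, rb, rn) in scenarios.items()
}
covariance = sum(weighted_products.values())

print(f"E(BankCorp) = {exp_bankcorp:.1f}%, E(NewBank) = {exp_newbank:.1f}%")
for name, value in weighted_products.items():
    print(f"{name}: probability-weighted product = {value:.1f}")
print(f"Covariance = {covariance:.1f}")
```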
A formula for computing the covariance between random variables \(R_i\) and \(R_j\) is
The formula tells us to sum all possible cross-products of the two random variables
weighted by the appropriate joint probability. In the example we just worked, as you can
see from Table 4-13, only three joint probabilities are non-zero. Therefore, in computing
the covariance of returns in this case, we need to consider only three cross-products:
One theme of this chapter has been independence. Two random variables are independent
when every possible pair of events (one event corresponding to a value of X and
another event corresponding to a value of Y) are independent events. When two random
variables are independent, their joint probability function simplifies.
For example, given independence, P(3, 2) = P(3)P(2). We multiply the individual proba-
bilities. Independence is a stronger property than uncorrelatedness because correlation ad-
dresses only linear relationships. The following condition holds for uncorrelated random
variables, and therefore also holds for independent random variables.
• Multiplication Rule for the Expected Value of the Product of Uncorrelated Random
Variables. The expected value of the product of uncorrelated random variables
is the product of their expected values.
4 TOPICS IN PROBABILITY
In the remainder of the chapter we discuss two topics that can be important in solving in-
vestment problems. We start with Bayes' formula: what probability theory has to say about
learning from experience. Then we move to a discussion of shortcuts and principles for
counting.
4.1 BAYES' FORMULA
When we make decisions involving investments, we often start with viewpoints based on
our experience and knowledge. These viewpoints may be changed or confirmed by new
knowledge and observations. Bayes' formula is a rational method for adjusting our viewpoints
as we confront new information.17 Bayes' formula and related concepts have been
applied in many business and investment decision-making contexts, including the evaluation
of mutual fund performance.18
Bayes' formula makes use of Equation 4-6, the total probability rule. To review, that
rule expressed the probability of an event as a weighted average of the probabilities of the
event, given a set of scenarios. Bayes' formula works in reverse, or more precisely, reverses
the "given that" information. Bayes' formula uses the occurrence of the event to infer the
probability of the scenario generating it.19 In many applications, including the one illustrating
its use in this section, an individual is updating his beliefs concerning the causes
that may have produced a new observation.
To illustrate Bayes' formula, we work through an investment example that you can
adapt to any actual problem. Suppose you are an investor in the stock of DriveMed, Inc.
Security analysts make forecasts of earnings per share of the firms they cover, and various
services report consensus EPS estimates. Positive earnings surprises relative to consensus
EPS estimates often result in positive stock returns, and negative surprises often have the
opposite effect. DriveMed will release last quarter's EPS and you are interested in which of
these three events happened: last quarter's earnings exceeded the consensus EPS estimate,
or last quarter's earnings exactly met the consensus EPS estimate, or last quarter's earn-
ings fell short of the consensus EPS estimate. This list of the alternatives is mutually ex-
clusive and exhaustive. You expect that when the actual earnings become public, you will
be benefited or hurt as an investor by the reaction of the stock price to the news.
On the basis of your own research, you jot down the following prior probabilities
(or priors, for short) concerning these three events:
16
Otherwise, the calculation depends on conditional expected value; the calculation can be expressed as
E(XY) = E[X E(Y | X)].
17
Named after the Reverend Thomas Bayes (1702-1761).
18
See Baks, Metrick, and Wachter (2001).
19
For that reason, Bayes' formula is sometimes called an inverse probability.
Topics in Probability 205
These probabilities are prior in the sense that they reflect only what you know now, before
the arrival of any new information.
The next day, DriveMed announces that it is expanding factory capacity in Singapore
and Ireland to meet increased sales demand. You now assess this new information. The de-
cision to expand capacity relates not only to current demand, but probably also to the prior
quarter's sales demand. You know that sales demand is positively related to EPS. So now it
appears more likely that last quarter's EPS will exceed the consensus.
The question you have is this: In light of the new information, what is my updated
probability that the prior quarter's EPS exceeded the consensus estimate?
Bayes' formula provides a rational method for accomplishing this updating. We can
abbreviate the new information as DriveMed expands. The first step in applying Bayes'
formula is to calculate the probability of the new information (here: DriveMed expands),
given a list of events or scenarios that may have generated it. The list of events should
cover all possibilities, as it does here. Formulating these conditional probabilities is the key
step in the updating process. Suppose your view is
P(DriveMed expands) =
P(DriveMed expands | EPS exceeded consensus) ×
P(EPS exceeded consensus) +
P(DriveMed expands | EPS met consensus) ×
P(EPS met consensus) +
P(DriveMed expands | EPS fell short of consensus) ×
P(EPS fell short of consensus)
= 0.75 × 0.45 + 0.20 × 0.30 + 0.05 × 0.25 = 0.41, or 41%
This is Equation 4-6, the total probability rule, in action. Now we can answer the question
on your mind. According to Bayes' formula,
Prior to DriveMed's announcement, you thought the probability that DriveMed would beat
consensus expectations was 45 percent. On the basis of your interpretation of the an-
nouncement, you update that probability to 82.3 percent. This updated probability is called
your posterior probability because it reflects or comes after the new information.
The Bayes' calculation takes the prior probability, which was 45 percent, and multiplies
it by a ratio, the first term on the right-hand side of the equal sign. In the denominator
of the ratio is the probability that DriveMed expands, as you view it without considering
(conditioning on) anything else. Therefore, this probability is unconditional. The
numerator is the probability that DriveMed expands, if last quarter's EPS actually exceeded
the consensus estimate. This last probability is larger than the unconditional probability
in the denominator, so the ratio (1.83 roughly) is greater than 1. As a result, your updated
or posterior probability is larger than your prior probability. Thus, the ratio reflects
the impact of the new information on your prior beliefs. The following is a general state-
ment of Bayes' formula:
• Bayes' Formula. Given a set of prior probabilities for an event of interest, if you re-
ceive new information, the rule for updating your probability of the event is
EXAMPLE 4-13. Inferring Whether DriveMed's EPS Met Consensus EPS.
You are still an investor in DriveMed stock. To review the givens, your prior probabilities
are P(EPS exceeded consensus) = 0.45, P(EPS met consensus) = 0.30, and
P(EPS fell short of consensus) = 0.25. You also have the following conditional probabilities:
Recall that you updated your probability that last quarter's EPS exceeded the con-
sensus estimate from 45 percent to 82.3 percent after DriveMed announced that it
would expand. Now you want to update your other priors.
The probability P(DriveMed expands) is found by taking each of the three conditional
probabilities in the statement of the problem, such as P(DriveMed expands |
EPS exceeded consensus); multiplying each one by the prior probability of the
conditioning event, such as P(EPS exceeded consensus); then adding the three
products. The calculation is unchanged from the problem in the text above:
P(DriveMed expands) = 0.75 × 0.45 + 0.20 × 0.30 + 0.05 × 0.25 = 0.41, or 41
percent. The other probabilities needed, P(DriveMed expands | EPS met consensus)
= 0.20 and P(EPS met consensus) = 0.30, are givens. So
As a result of the announcement, you have revised your probability that DriveMed's
EPS fell short of consensus from 25 percent (your prior probability) to 3 percent.
Solution to 3. The sum of the three updated probabilities is
The three events (EPS exceeded consensus, EPS met consensus, EPS fell short of
consensus) are mutually exclusive and exhaustive: One of these events or statements
must be true, so the conditional probabilities must sum to 1. Whether we are talking
about conditional or unconditional probabilities, whenever we have a complete list
of the distinct possible events or outcomes, the probabilities must sum to 1. This is a
check on your work.
Solution to 4. According to Bayes' formula, P(EPS exceeded consensus |
DriveMed expands) = [0.75/(1/3)] × (1/3) = 0.75 or 75 percent. This probability
is identical to your estimate of P(DriveMed expands | EPS exceeded consensus).
This holds true in general: When a decision-maker is uninformed, his beliefs are
completely determined by the data or new information. The assumption of equal
prior probabilities is called a diffuse prior.
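The updating steps in the DriveMed example can be sketched in Python. The priors and conditional probabilities are the ones used in the text; the function name and dictionary keys are illustrative. The sketch applies the total probability rule to get P(DriveMed expands), then multiplies each prior by the ratio of its conditional probability to that unconditional probability.

```python
def bayes_update(priors: dict, likelihoods: dict):
    """Return (P(information), posterior probabilities) for each event."""
    # Total probability rule: unconditional probability of the new information.
    p_information = sum(likelihoods[event] * priors[event] for event in priors)
    # Bayes' formula: posterior = [P(info | event) / P(info)] * prior.
    posteriors = {
        event: likelihoods[event] / p_information * priors[event]
        for event in priors
    }
    return p_information, posteriors


priors = {"exceeded": 0.45, "met": 0.30, "fell_short": 0.25}
# P(DriveMed expands | event) for each event.
likelihoods = {"exceeded": 0.75, "met": 0.20, "fell_short": 0.05}

p_expands, posteriors = bayes_update(priors, likelihoods)
print(f"P(DriveMed expands) = {p_expands:.2f}")
for event, prob in posteriors.items():
    print(f"P(EPS {event} | DriveMed expands) = {prob:.4f}")
print(f"Sum of posteriors = {sum(posteriors.values()):.4f}")

# Diffuse prior: equal prior probabilities for the three events.
diffuse = {event: 1 / 3 for event in priors}
_, diffuse_posteriors = bayes_update(diffuse, likelihoods)
print(f"Diffuse-prior posterior for 'exceeded' = {diffuse_posteriors['exceeded']:.2f}")
```

Checking that the posteriors sum to 1 mirrors the check on your work described in the example.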
4.2 PRINCIPLES OF COUNTING
The first step in addressing a question often involves determining the different logical
possibilities. We may also want to know the number of ways each of these possibilities can
happen. In the back of our mind is often a question about probability. How likely is it that I
will observe this particular possibility? Records of success and failure are an example.
When we evaluate a market timer's record, one well-known evaluation method uses counting
methods presented in this section.20 An important investment model, the Binomial Option
Pricing Model, incorporates the combination formula that you will learn shortly. The
methods of this section are also useful for calculating what were called a priori probabili-
ties in Section 2. When we can assume that the possible outcomes of a random variable are
equally likely, the probability of an event equals the number of possible outcomes favor-
able for the event divided by the total number of outcomes.
In counting, enumeration (counting the outcomes one by one) is of course the most
basic resource. What we discuss in this section are shortcuts and principles. Without these
shortcuts and principles, counting the total number of outcomes can be very difficult and
prone to error. The first and basic principle of counting is the multiplication rule.
• Multiplication Rule of Counting. If one thing can be done in \(n_1\) ways, and a second
thing, given the first, can be done in \(n_2\) ways, and a third thing, given the first two
things, can be done in \(n_3\) ways, and so on for k things, then the number of ways the k
things can be done is \(n_1 \times n_2 \times n_3 \times \cdots \times n_k\).
Suppose we have three steps in an investment decision process. The first step can
be done in two ways, the second in four ways, and the third in three ways. Following the
multiplication rule, there are 2 X 4 X 3 = 24 ways in which we can carry out the three
steps.
Another illustration is the assignment of members of a group to an equal number of
positions. For example, suppose you want to assign three security analysts to cover three
different industries. In how many ways can the assignments be made? The first analyst
may be assigned in three different ways. Then two industries remain. The second analyst
can be assigned in two different ways. Then one industry remains. The third and last ana-
lyst can be assigned in only one way. The total number of different assignments equals
3 × 2 × 1 = 6. The compact notation for the multiplication we have just performed is 3!
(read: 3 factorial). If we had n analysts, the number of ways we could assign them to n
tasks would be
\[ n! = n \times (n - 1) \times (n - 2) \times (n - 3) \times \cdots \times 1 \]
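Both counts above can be verified in Python with the standard library. The industry names are hypothetical placeholders introduced only so that the assignments can be enumerated one by one and compared with 3!.

```python
import math
from itertools import permutations

# Multiplication rule: three steps that can be done in 2, 4, and 3 ways.
ways_three_steps = math.prod([2, 4, 3])

# Assigning three analysts to three industries (hypothetical names).
analysts = ["Analyst 1", "Analyst 2", "Analyst 3"]
industries = ["Banking", "Energy", "Retail"]

# Each ordering of the industries is one assignment: the k-th industry
# in the ordering goes to the k-th analyst.
assignments = [dict(zip(analysts, order)) for order in permutations(industries)]

print(f"Ways to carry out the three steps: {ways_three_steps}")
print(f"Assignments by enumeration: {len(assignments)}")
print(f"3! = {math.factorial(3)}")
```

Enumeration and the factorial agree for small n; the factorial is the practical route as n grows.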
20
Henriksson and Merton (1981).
21
The shortest explanation of n factorial is that it is the number of ways we can order n objects in a row. A
characteristic of the problems to which we apply this counting method is that we use up all the members of a
group (sampling without replacement).
22
This discussion follows Kemeny, Schleifer, Snell, and Thompson (1972) in terminology and approach.
can we take 18 mutual funds and label 4 of them high risk, 4 above-average risk, 3 average
risk, 4 below-average risk, and 3 low risk, so each fund is labeled?
The answer is close to 13 billion. We can label 18 funds high risk (the first slot), then
17 funds, then 16 funds, then 15 funds (now we have 4 funds in the high risk group); then
we can label 14 funds above-average risk, then 13 funds, and so forth. There are 18! possible
sequences. However, order of assignment within a category does not matter. For example,
whether a fund occupies the first or third slot of the four funds labeled high risk, the
fund has the same label (high risk). Thus, there are 4! ways to assign a given group of 4
funds to the 4 high risk slots. Making the same argument for the other categories, in total
there are 4! × 4! × 3! × 4! × 3! equivalent sequences. To eliminate such redundancies
from the 18! total, we divide 18! by 4! × 4! × 3! × 4! × 3!. We have 18!/(4! × 4! ×
3! × 4! × 3!) = 18!/(24 × 24 × 6 × 24 × 6) = 12,864,852,000. This procedure generalizes
as follows.
• Multinomial Formula (The General Formula for Labeling Problems). The number
of ways that n objects can be labeled with k different labels, with \(n_1\) of the first
type, \(n_2\) of the second type, and so on, with \(n_1 + n_2 + \cdots + n_k = n\), is given by
\[ \frac{n!}{n_1!\,n_2! \cdots n_k!} \]
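The labeling count for the 18 mutual funds can be reproduced with a small Python function. This is a sketch using exact integer arithmetic; the function name is illustrative.

```python
import math


def multinomial_count(group_sizes) -> int:
    """Number of ways to label sum(group_sizes) objects with the given group sizes."""
    n = sum(group_sizes)
    ways = math.factorial(n)
    # Divide out the orderings within each label, which do not matter.
    for size in group_sizes:
        ways //= math.factorial(size)
    return ways


# 18 funds: 4 high risk, 4 above-average, 3 average, 4 below-average, 3 low risk.
fund_labelings = multinomial_count([4, 4, 3, 4, 3])
print(f"{fund_labelings:,}")

# With only two labels the formula reduces to the number of ways
# to choose one group, for example 3 objects out of 5.
two_label_case = multinomial_count([3, 2])
print(two_label_case)
```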
The special case of the general rule for when there are just two different labels (k =
2) is especially important. The special case is called the combination formula. A combina-
tion is a listing in which order of listing does not matter. We state the combination formula
in a traditional way, but no new concepts are involved. Using the notation in the formula
below, the number of objects with the first label is \(r = n_1\), and the number with the second
label is \(n - r = n_2\) (there are just two categories, so \(n_1 + n_2 = n\)). Here is the formula.
• Combination Formula (The Binomial Formula). The number of ways that we can
choose r objects from a total of n objects, where the order in which the r objects are
listed does not matter, is
\[ {}_{n}C_{r} = \binom{n}{r} = \frac{n!}{(n - r)!\,r!} \]
Here \({}_{n}C_{r}\) and \(\binom{n}{r}\) are shorthand notations for \(n!/[(n - r)!\,r!]\) (read: n choose r, or n
combination r).
If we label the r objects as belongs to the group and the remaining objects as does
not belong to the group, whatever the group of interest, the combination formula tells us
how many ways we can select a group of size r. We can illustrate this formula with the bi-
nomial option pricing model (BOPM). The BOPM describes the movement of the under-
lying asset as a series of moves, price up (U) or price down (D). For example, two se-
quences of five moves containing three up moves, such as UUUDD and UDUUD, result in
the same final stock price. At least for an option with a payoff dependent on final stock
price, the number but not the order of up moves in a sequence matters. How many se-
quences of five moves belong to the group with three up moves? The answer is 10, calcu-
lated using the combination formula ("5 choose 3"):
• Permutation Formula. The number of ways that we can choose r objects from a
total of n objects, where the order in which the r objects are listed does matter, is
\[ {}_{n}P_{r} = \frac{n!}{(n - r)!} \]
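The combination and permutation formulas can be checked against direct enumeration in Python. The sketch below counts the five-move up/down sequences with exactly three up moves, as in the binomial option pricing illustration, and then applies the permutation formula to the same n and r purely for comparison; the variable names are illustrative, and `math.comb` and `math.perm` require Python 3.8 or later.

```python
import math
from itertools import product

n_moves, n_up = 5, 3

# Combination formula: n! / [(n - r)! r!]
by_formula = math.comb(n_moves, n_up)

# Enumeration: every sequence of U/D moves of length 5 with exactly 3 up moves.
sequences = [
    "".join(moves)
    for moves in product("UD", repeat=n_moves)
    if moves.count("U") == n_up
]

# Permutation formula: n! / (n - r)!, where order of the r objects matters.
ordered_choices = math.perm(n_moves, n_up)

print(f"5 choose 3 by formula: {by_formula}")
print(f"Sequences found by enumeration: {len(sequences)}")
print(f"Permutations of 3 objects chosen from 5: {ordered_choices}")
```

Each combination of 3 objects can be ordered in 3! = 6 ways, which is why the permutation count is six times the combination count.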
1. Does the thing that I want to count have a finite number of possible outcomes? If
the answer is yes, you may be able to use a tool in this section, and you can go to
the second question. If the answer is no, the number of outcomes is infinite, and the
tools in this section do not apply.
2. Do I want to assign every member of a group of size n to one of n slots (or tasks)?
If the answer is yes, use n factorial. If the answer is no, go to the third question.
3. Do I want to count the number of ways to apply one of three or more labels to each
member of a group? If the answer is yes, use the multinomial formula. If the an-
swer is no, go to the fourth question.
4. Do I want to count the number of ways that I can choose r objects from a total of n,
where the order in which I list the r objects does not matter (can I give the r objects
a label)? If the answer to these questions is yes, the combination formula applies. If
the answer is no, go to the fifth question.
23
A more formal definition states that a permutation is an ordered subset of n distinct objects.
Summary 211
5. Do I want to count the number of ways I can choose r objects from a total of n,
where the order in which I list the r objects is important? If the answer is yes, the
permutation formula applies. If the answer is no, go to question 6.
6. Can the multiplication rule of counting be used? If it cannot, you may have to count
the possibilities one by one, or use more advanced techniques than those presented
here. 24
5 SUMMARY
In this chapter, we have discussed the essential concepts and tools of probability. We have
applied probability, expected value, and variance to a range of investment problems.
• Probability is a number between 0 and 1 that describes the chance that a stated event
will occur.
• A random variable is a quantity whose outcome is uncertain.
• An event is any outcome or specified set of outcomes of a random variable.
• The probability of an event E is denoted P(E).
• Mutually exclusive events can only occur one at a time. Exhaustive events cover or
contain all possible outcomes.
• The two defining properties of a probability are, first, that 0 ≤ probability of any
event ≤ 1 and, second, that the sum of the probabilities of any list of mutually exclusive
and exhaustive events equals 1.
• A probability estimated from data as a relative frequency of occurrence is an empir-
ical probability. A probability obtained based on logical analysis is an a priori prob-
ability. A probability drawing on personal or subjective judgment is a subjective
probability.
• A probability of an event E, P(E), can be stated as odds for E = P(E)/[1 - P(E)] or
odds against E = [1 - P(E)]/P(E).
• Probabilities that are not consistent create profit opportunities, according to the
Dutch Book Theorem.
• A probability of an event not conditioned on another event is an unconditional prob-
ability. The unconditional probability of an event A is denoted P(A). Unconditional
probabilities are also called marginal probabilities.
• A probability of an event given (conditioned on) another event is a conditional prob-
ability. The probability of an event A given an event B is denoted P(A | B).
• The probability of both A and B occurring is the joint probability of A and B, denoted
P(AB).
• P(A | B) = P(AB)/P(B), P(B) ≠ 0.
24 Feller (1957) contains a very full treatment of counting problems and solution methods.
• Two events A and B are independent if and only if P(A | B) = P(A) and
P(B | A) = P(B).
• The multiplication rule for independent events states that if A and B are independent
events, P(AB) = P(A)P(B).
• If S1, S2, ..., Sn are mutually exclusive and exhaustive scenarios or events, then
P(A) = P(A | S1)P(S1) + P(A | S2)P(S2) + ... + P(A | Sn)P(Sn).
• The expected value of a random variable is a probability-weighted average of the
possible outcomes of the random variable. For a random variable X, the expected
value of X is denoted E(X).
• The variance of a random variable is the expected value (the probability-weighted
average) of squared deviations from its expected value E(X): σ²(X) = E{[X - E(X)]²},
where σ²(X) stands for the variance of X. An alternative notation for the
variance of X is Var(X).
• Variance is a measure of dispersion about the mean. Increasing variance indicates in-
creasing dispersion. Variance is measured in squared units of the original variable.
• Standard deviation is the positive square root of variance.
• Standard deviation measures dispersion (as does variance), but it is measured in the
same units as the variable.
• If w1, w2, ..., wn are constants and R1, R2, ..., Rn are random variables, then
E(w1R1 + w2R2 + ... + wnRn) = w1E(R1) + w2E(R2) + ... + wnE(Rn).
• The properties of variance include the following, where w and a are constants and R
is a random variable: σ²(wR) = w²σ²(R) and σ²(a + R) = σ²(R).
• Covariance is a measure of the co-movement (linear association) between random
variables.
• The covariance between two random variables Ri and Rj is the expected value of the
cross-product of the deviations of the two random variables from their respective
means: Cov(Ri, Rj) = E{[Ri - E(Ri)][Rj - E(Rj)]}.
• The covariance of a random variable with itself is its own variance:
Cov(R, R) = σ²(R).
• Correlation is a number between -1 and +1 that measures the co-movement (linear
association) between two random variables: ρ(Ri, Rj) = Cov(Ri, Rj)/[σ(Ri) σ(Rj)].
• When return correlation is less than + 1, diversification reduces risk.
• To calculate the variance of return on a portfolio of n assets, the inputs needed are the
n expected returns on the individual securities, n variances of return on the individ-
ual securities, and n(n - 1)/2 distinct covariances.
• Bayes' formula is expressed as follows: Updated probability of event given the new
information = [(Probability of the new information given event)/(Unconditional
probability of the new information)] X Prior probability of event.
• The multiplication rule of counting says, for example, that if the first step in a
process can be done in 10 ways, the second step, given the first, can be done in 5
ways, and the third step, given the first two, can be done in 7 ways, then the steps can
be carried out in 10 × 5 × 7 = 350 ways.
• The number of ways to assign every member of a group of size n to n slots is
n! = n × (n - 1) × (n - 2) × (n - 3) × ... × 1. (By convention, 0! = 1.)
• The number of ways that n objects can be labeled with k different labels, with n1 of
the first type, n2 of the second type, and so on, with n1 + n2 + ... + nk = n, is given
by n!/(n1! × n2! × ... × nk!). This expression is the multinomial formula.
• A special case of the multinomial formula is the combination formula. The number
of ways that we can choose r objects from a total of n objects, where the order in
which the r objects are listed does not matter, is
nCr = (n choose r) = n!/[(n - r)! × r!]
• The number of ways that we can choose r objects from a total of n objects, where the
order in which the r objects are listed does matter, is
nPr = n!/(n - r)!
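The expected value, variance, covariance, and correlation definitions summarized above can be checked numerically with a short Python sketch. The three scenarios and their returns are hypothetical numbers chosen only for illustration, and the function names are assumptions rather than anything defined in the chapter.

```python
import math

# Hypothetical joint scenarios: (probability, return on asset i, return on asset j).
scenarios = [
    (0.20, 0.12, 0.08),
    (0.50, 0.06, 0.03),
    (0.30, -0.02, 0.02),
]

def expected_value(probs, outcomes):
    """Probability-weighted average of the outcomes."""
    return sum(p * x for p, x in zip(probs, outcomes))

def covariance(probs, xs, ys):
    """Expected cross-product of deviations from the respective means."""
    mean_x = expected_value(probs, xs)
    mean_y = expected_value(probs, ys)
    return sum(p * (x - mean_x) * (y - mean_y) for p, x, y in zip(probs, xs, ys))

probs = [s[0] for s in scenarios]
r_i = [s[1] for s in scenarios]
r_j = [s[2] for s in scenarios]

mean_i = expected_value(probs, r_i)
var_i = covariance(probs, r_i, r_i)   # Cov(R, R) is the variance of R
var_j = covariance(probs, r_j, r_j)
std_i = math.sqrt(var_i)              # same units as the return itself
std_j = math.sqrt(var_j)
cov_ij = covariance(probs, r_i, r_j)
corr_ij = cov_ij / (std_i * std_j)

print(f"E(Ri) = {mean_i:.4f}, Var(Ri) = {var_i:.6f}, corr = {corr_ij:.4f}")
```

Computing the variance as the covariance of a variable with itself mirrors the bullet stating that Cov(R, R) equals the variance of R.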
If the criteria are independent, how many companies will pass the screen?
7. You apply both valuation criteria and financial strength criteria in choosing stocks.
The probability that a randomly selected stock (from your investment universe)
meets your valuation criteria is 0.25. Given that a stock meets your valuation criteria,
the probability that the stock meets your financial strength criteria is 0.40. What is the
probability that a stock meets both your valuation and financial strength criteria?
8. Suppose that 5 percent of the stocks meeting your stock selection criteria are in the
telecommunications (telecom) industry. Also, dividend-paying telecom stocks are 1
percent of the total number of stocks meeting your selection criteria. What is the prob-
ability that a stock is dividend-paying, given that it is a telecom stock that has met your
stock selection criteria?
9. The following two facts were cited in a report from Fitch data service. 25
• In 2000, the volume of defaulted U.S. high-yield debt was $27.9 billion. The aver-
age market size of the high-yield bond market during 2000 was $550 billion.
• The average recovery rate for defaulted U.S. high-yield bonds in 2000 (defined as
average price one month after default) was $0.27 on the dollar.
Probability    Sales
0.20           $275
0.40           $250
0.25           $200
0.10           $190
0.05           $180
Sum = 1.00
25 "High Yield Defaults Soar in 2000," February 12, 2001.
result in recovery of $0.90 per $1 principal value with probability 0.45, or in recovery of
$0.80 per $1 principal value with probability 0.55. Scenario 2 has probability 0.25 and
will result in recovery of $0.50 per $1 principal value with probability 0.85, or in re-
covery of $0.40 per $1 principal value with probability 0.15.
a. Compute the probability of each of the four possible recovery amounts: $0.90,
$0.80, $0.50, and $0.40.
b. Compute the expected recovery, given the first scenario.
c. Compute the expected recovery, given the second scenario.
d. Compute the expected recovery.
e. Graph the information in a tree diagram.
12. Suppose we have the expected daily returns (in terms of U.S. dollars), standard devia-
tions, and correlations shown in the table below.
Correlation Matrix
                 U.S. Bonds    German Bonds    Italian Bonds
U.S. Bonds       1             0.09            0.10
German Bonds                   1               0.70
Italian Bonds                                  1
a. Using the data given above, construct a covariance matrix for the daily returns on
U.S., German, and Italian bonds.
b. State the expected return and variance of return on a portfolio 70 percent
invested in U.S. bonds, 20 percent in German bonds, and 10 percent in Italian
bonds.
c. Calculate the standard deviation of return on a portfolio 70 percent invested
in U.S. bonds, 20 percent in German bonds, and 10 percent in Italian
bonds.
13. The variance of a portfolio of stocks depends on the variances of each individual stock
in the portfolio and also the covariances among the stocks in the portfolio. If you have
five stocks, how many unique covariances (excluding variances) must you use in order
to compute the variance of return on your portfolio? (Recall that the covariance of a
stock with itself is the stock's variance.)
14. Calculate the covariance of the returns on Bedolf Corporation (RB) with the returns on
Zedock Corporation (RZ), using the following data.
Rz = 15% Rz = 10% Rz = 5%
15. You have developed a set of criteria for evaluating distressed credits. Firms that do not
receive a passing score are classed as likely to go bankrupt within 12 months. You
gathered the following information when validating the criteria:
• Forty percent of the companies to which the test is administered will go bankrupt
within 12 months: P(non-survivor) = 0.40.
• Fifty-five percent of the companies to which the test is administered pass it: P(pass)
= 0.55.
• The probability that a firm will pass the test (and be classed as a 12-month survivor),
given that it will subsequently survive 12 months, is 0.85: P(pass test | survivor) =
0.85.
SOLUTIONS
1. a. Probability is defined by the following two properties: (1) the probability of any
event is a number between 0 and 1, and (2) the sum of the probabilities of any list
of mutually exclusive and exhaustive events equals 1.
b. Conditional probability is the probability of a stated event, given that another event
has occurred. For example, P(A | B) is the probability of A, given that B has
occurred.
c. An event is any specified outcome or set of outcomes of a random variable.
d. Two events are independent if the occurrence of one event does not affect the
probability of occurrence of the other event. In symbols, two events A and B are
independent if and only if P(A | B) = P(A) or, equivalently, P(B | A) = P(B).
e. The variance of a random variable is the expected value (the probability-weighted
average) of squared deviations from the random variable's expected value. In
symbols, σ²(X) = E{[X - E(X)]²}.
2. One logical set of three mutually exclusive and exhaustive events for the reaction of a
firm's stock price on the day of a corporate earnings announcement is as follows
(wording may vary):
In fact, there are an unlimited number of ways to split up the possible outcomes into
three mutually exclusive and exhaustive events. For example, the following list also
answers this question satisfactorily:
• Stock price increases by more than 4 percent on the day of the announcement.
• Stock price increases by 0 percent to 4 percent on the day of the announcement.
• Stock price decreases on the day of the announcement.
[Tree diagram, partially recovered. Expected recovery = $0.755. Branch shown without its scenario label: Recovery = $0.80, Prob = 0.4125. Scenario 2 (probability = 0.25): Recovery = $0.50, Prob = 0.2125; Recovery = $0.40, Prob = 0.0375.]
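The tree for Problem 11 can be rebuilt with a short Python sketch that applies the total probability rule to the recovery amounts. The figures come from the problem statement, except that the 0.75 probability for Scenario 1 is inferred here as 1 - 0.25. The variable names are illustrative assumptions.

```python
# Scenario probabilities and conditional recovery distributions from Problem 11.
# The 0.75 probability for Scenario 1 is inferred as 1 - 0.25.
scenarios = {
    "Scenario 1": {"prob": 0.75, "recoveries": {0.90: 0.45, 0.80: 0.55}},
    "Scenario 2": {"prob": 0.25, "recoveries": {0.50: 0.85, 0.40: 0.15}},
}

joint_probs = {}        # unconditional probability of each recovery amount
conditional_means = {}  # expected recovery given each scenario
expected_recovery = 0.0

for name, s in scenarios.items():
    # Expected recovery conditional on this scenario.
    cond_mean = sum(amount * p for amount, p in s["recoveries"].items())
    conditional_means[name] = cond_mean
    # Weight the conditional expectation by the scenario probability.
    expected_recovery += s["prob"] * cond_mean
    for amount, p in s["recoveries"].items():
        joint_probs[amount] = s["prob"] * p

for amount, p in sorted(joint_probs.items(), reverse=True):
    print(f"Recovery = ${amount:.2f}  Prob = {p:.4f}")
print(f"Expected recovery = ${expected_recovery:.3f}")
```

The four unconditional probabilities sum to 1, which is a quick check that the branches are mutually exclusive and exhaustive.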
12. a. The diagonal entries in the covariance matrix are the variances, found by squaring
the standard deviations.
Var(U.S. bond returns) = 0.409² = 0.167281
Var(German bond returns) = 0.606² = 0.367236
Var(Italian bond returns) = 0.635² = 0.403225
• Cov(U.S. bond returns, German bond returns) = ρ(U.S. bond returns, German
bond returns)σ(U.S. bond returns)σ(German bond returns) = 0.09 × 0.409 ×
0.606 = 0.022307
• Cov(U.S. bond returns, Italian bond returns) = ρ(U.S. bond returns, Italian
bond returns)σ(U.S. bond returns)σ(Italian bond returns) = 0.10 × 0.409 ×
0.635 = 0.025972
• Cov(German bond returns, Italian bond returns) = ρ(German bond returns, Italian
bond returns)σ(German bond returns)σ(Italian bond returns) = 0.70 ×
0.606 × 0.635 = 0.269367
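The covariance matrix for Problem 12(a) can be assembled with a short Python sketch. Each entry is the correlation of the pair multiplied by the two standard deviations, with a correlation of 1 on the diagonal so that the diagonal entries are the variances. The dictionary layout and names are illustrative assumptions. The standard deviations (0.409, 0.606, 0.635) and correlations (0.09, 0.10, 0.70) are those used in this solution.

```python
# Standard deviations and pairwise correlations used in the solution to Problem 12(a).
std = {"US": 0.409, "German": 0.606, "Italian": 0.635}
corr = {
    ("US", "German"): 0.09,
    ("US", "Italian"): 0.10,
    ("German", "Italian"): 0.70,
}

names = list(std)
cov = {}
for a in names:
    for b in names:
        if a == b:
            rho = 1.0  # a variable is perfectly correlated with itself
        else:
            # Correlations are symmetric, so look the pair up in either order.
            rho = corr.get((a, b), corr.get((b, a)))
        cov[(a, b)] = rho * std[a] * std[b]

# Print the matrix row by row.
for a in names:
    print(a.ljust(8), "  ".join(f"{cov[(a, b)]:.6f}" for b in names))
```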
σ²(Rp) = w1²σ²(R1) + w2²σ²(R2) + w3²σ²(R3) + 2w1w2Cov(R1, R2) + 2w1w3Cov(R1, R3) + 2w2w3Cov(R2, R3)
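The three-asset portfolio variance and standard deviation asked for in Problem 12(b) and (c) can be sketched in Python as a double sum over weights and covariances. The weights (70, 20, and 10 percent) come from the problem statement. The function name `portfolio_variance` is a hypothetical helper, and the expected return is omitted because the expected-return inputs do not appear in this excerpt.

```python
import math

weights = [0.70, 0.20, 0.10]   # U.S., German, Italian bonds (Problem 12)
std = [0.409, 0.606, 0.635]    # standard deviations of daily returns
corr = [
    [1.00, 0.09, 0.10],
    [0.09, 1.00, 0.70],
    [0.10, 0.70, 1.00],
]

def portfolio_variance(w, sd, rho):
    """Sum of w_i * w_j * Cov(R_i, R_j) over all pairs (i, j)."""
    n = len(w)
    return sum(
        w[i] * w[j] * rho[i][j] * sd[i] * sd[j]
        for i in range(n)
        for j in range(n)
    )

port_var = portfolio_variance(weights, std, corr)
port_std = math.sqrt(port_var)

print(f"Portfolio variance = {port_var:.6f}")
print(f"Portfolio standard deviation = {port_std:.6f}")
```

Summing over all ordered pairs counts each off-diagonal covariance twice, which is where the factor of 2 on the covariance terms comes from.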
15. a. We can set up the equation using the total probability rule:
The information that a firm passes the test causes you to update your probability
that the firm is a survivor from 0.60 to approximately 0.927.
c. According to Bayes' formula, P(non-survivor | not pass test) = [P(not pass test |
non-survivor)/P(not pass test)] × P(non-survivor) = [P(not pass test |
non-survivor)/0.45] × 0.40.
We can set up the following equation to obtain P(not pass test | non-survivor):
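The chain of calculations in Problem 15 can be sketched in Python using only the three inputs given in the problem: P(non-survivor) = 0.40, P(pass test) = 0.55, and P(pass test | survivor) = 0.85. The sketch first solves the total probability rule for P(pass test | non-survivor) and then applies Bayes' formula. The variable names are illustrative assumptions, and the route taken is one reasonable way to obtain the figures, not necessarily the exact steps of the original solution.

```python
# Inputs stated in Problem 15.
p_nonsurvivor = 0.40
p_pass = 0.55
p_pass_given_survivor = 0.85

p_survivor = 1 - p_nonsurvivor
p_not_pass = 1 - p_pass

# Total probability rule:
#   P(pass) = P(pass | survivor)P(survivor) + P(pass | non-survivor)P(non-survivor)
# solved for P(pass | non-survivor).
p_pass_given_nonsurvivor = (p_pass - p_pass_given_survivor * p_survivor) / p_nonsurvivor

# Bayes' formula: updated probability of survival given that the firm passes.
p_survivor_given_pass = (p_pass_given_survivor / p_pass) * p_survivor

# Complement, then Bayes' formula again for a firm that does not pass.
p_not_pass_given_nonsurvivor = 1 - p_pass_given_nonsurvivor
p_nonsurvivor_given_not_pass = (p_not_pass_given_nonsurvivor / p_not_pass) * p_nonsurvivor

print(f"P(pass | non-survivor)     = {p_pass_given_nonsurvivor:.3f}")
print(f"P(survivor | pass)         = {p_survivor_given_pass:.3f}")
print(f"P(non-survivor | not pass) = {p_nonsurvivor_given_not_pass:.3f}")
```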
17. We find the answer using the combination formula nCr = n!/[(n - r)!r!]. Here,
n = 10 and r = 4, so the answer is 10!/[(10 - 4)!4!] = 3,628,800/(720 ×
24) = 210.
18. a. The two events that affect a bondholder's returns are "the bond defaults" and
"the bond does not default." First, compute the value of the bond for the two events per
$1 invested.
On the other hand, the expected value of the T-bill is the certain value (1 + Rf).
Setting the expected value of the bond to the expected value of the T-bill permits us
to find the promised return on the bond such that bondholders expect to break even.
b. For this problem, Rf = 0.058, P(the bond defaults) = 0.06, 1 - P(the bond
defaults) = 0.94, and θ = 0.35.
With a recovery rate of 35 cents on the dollar, a minimum default risk premium
of about 430 basis points is required, calculated as 10.1% - 5.8% = 4.3%.
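The break-even calculation in Problem 18(b) can be sketched in Python. The setup equation itself is not shown in this excerpt, so the sketch rests on a stated assumption: on default, the bondholder receives the recovery rate times (1 + R) per $1 invested, where R is the promised return. Under that assumption the result is consistent with the 10.1 percent promised return and the premium of about 430 basis points quoted above. Variable names are illustrative.

```python
risk_free_rate = 0.058   # Rf
p_default = 0.06         # P(the bond defaults)
recovery_rate = 0.35     # recovery per dollar in default

# Assumption (not shown in this excerpt): in default the bondholder receives
# recovery_rate * (1 + R) per $1 invested, so the break-even condition is
#   (1 - p) * (1 + R) + p * recovery_rate * (1 + R) = 1 + Rf
promised_return = (1 + risk_free_rate) / (
    (1 - p_default) + p_default * recovery_rate
) - 1

default_risk_premium = promised_return - risk_free_rate

print(f"Promised return      = {promised_return:.1%}")
print(f"Default risk premium = {default_risk_premium * 10_000:.0f} basis points")
```

If the recovery were instead a flat amount per dollar, independent of R, the break-even return would differ, so the assumption matters for the final figure.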