P9-Conditional Distribution
§9.1 Introduction
9.1.1 Let f (x, y), fX (x) and fY (y) be the joint probability (density or mass) function of (X, Y ) and
the marginal probability functions of X and Y , respectively. Consider, for some small δ > 0,
the conditional probability
\[
P(X \le x \mid y \le Y \le y+\delta)
= \int_{-\infty}^{x}\!\int_{y}^{y+\delta} f(u,v)\,du\,dv \Big/ \int_{y}^{y+\delta} f_Y(v)\,dv
\approx \delta\int_{-\infty}^{x} f(u,y)\,du \Big/ \delta f_Y(y)
= \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du,
\]
which suggests
\[
\lim_{\delta\to 0} P(X \le x \mid y \le Y \le y+\delta) = \int_{-\infty}^{x} \frac{f(u,y)}{f_Y(y)}\,du.
\]
[The discrete case can be treated in a similar way.]
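To see this limit numerically, the following sketch (an illustration only, using the hypothetical joint density f(x, y) = x + y on the unit square, for which f_Y(y) = y + 1/2; neither is from the notes) computes the band probability P(X ≤ x | y ≤ Y ≤ y + δ) by Riemann sums and compares it with the limiting conditional cdf as δ shrinks.

```python
# Illustration of 9.1.1 with the hypothetical joint pdf f(x, y) = x + y on [0,1]^2.
import numpy as np

def f(u, v):
    return u + v                       # assumed joint pdf (not from the notes)

def f_Y(v):
    return v + 0.5                     # its marginal pdf of Y

def band_prob(x, y, delta, n=1000):
    """P(X <= x | y <= Y <= y + delta) via midpoint Riemann sums of f."""
    us = (np.arange(n) + 0.5) * (x / n)            # midpoints of [0, x]
    vs = y + (np.arange(n) + 0.5) * (delta / n)    # midpoints of [y, y + delta]
    U, V = np.meshgrid(us, vs)
    numer = f(U, V).sum() * (x / n) * (delta / n)  # P(X <= x, y <= Y <= y + delta)
    denom = f_Y(vs).sum() * (delta / n)            # P(y <= Y <= y + delta)
    return numer / denom

def limit_cdf(x, y, n=1000):
    """The limit in 9.1.1: integral over [0, x] of f(u, y) / f_Y(y) du."""
    us = (np.arange(n) + 0.5) * (x / n)
    return (f(us, y) / f_Y(y)).sum() * (x / n)

x, y = 0.6, 0.3
for delta in (0.2, 0.05, 0.01, 0.002):
    print(delta, band_prob(x, y, delta))
print("limit:", limit_cdf(x, y))                   # band probabilities approach this
```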
9.1.2 By conditioning on {Y = y}, we limit our scope to those outcomes of X that are possible
when Y is observed to be y.
Example. Let X = no. of casualties at a road accident and Y = total weight (in tons) of vehicles
involved. Then the conditional distributions of X given different values of Y may be very different, i.e.
distribution of X|Y = 10 may be different from distribution of X|Y = 1, say.
9.1.4 X, Y independent if and only if fX|Y (x|y) = fX (x) for all x, y.
9.1.5 Conditional distributions may similarly be defined for groups of random variables.
For example, for random variables X = (X1 , . . . , Xr ) and Y = (Y1 , . . . , Ys ), let
\[
f_{X\mid Y}(x_1,\dots,x_r\mid y_1,\dots,y_s) = \frac{f(x_1,\dots,x_r,y_1,\dots,y_s)}{f_Y(y_1,\dots,y_s)}
\]
be the conditional probability (density or mass) function of X given Y = (y1 , . . . , ys ), where f and fY denote the joint probability function of (X, Y ) and the marginal probability function of Y , respectively.
9.1.7 Concepts previously established for “unconditional” distributions can be obtained analogously
for conditional distributions by substituting conditional distribution or probability functions,
i.e. F (·|·) or f (·|·), for their unconditional counterparts, i.e. F (·) or f (·).
for all x1 , . . . , xn , y ∈ [−∞, ∞] and any n ∈ {1, 2, . . .}. The latter condition is equivalent to
\[
f_{X_1,\dots,X_n\mid Y}(x_1,\dots,x_n\mid y) = \prod_{i=1}^{n} f_{X_i\mid Y}(x_i\mid y)
\]
for all x1 , . . . , xn , y ∈ [−∞, ∞] and any n ∈ {1, 2, . . .}, where fX1 ,...,Xn |Y (x1 , . . . , xn |y) denotes
the joint probability function of (X1 , . . . , Xn ) conditional on Y = y.
9.1.9 Examples.
(i) Let X1 , X2 , Y be Bernoulli random variables such that fY (y) = (1/2) 1{y = 0 or 1} and
\[
f_{X_1,X_2\mid Y}(x_1,x_2\mid y) =
\begin{cases}
1\{x_1 = 0\}\,1\{x_2 = 0\}, & y = 0,\\[2pt]
(1/2)\,1\{x_1 = 0 \text{ or } 1\} \times (1/2)\,1\{x_2 = 0 \text{ or } 1\}, & y = 1.
\end{cases}
\]
Since fX1 ,X2 |Y (x1 , x2 |y) can be factorised into a product fX1 |Y (x1 |y)fX2 |Y (x2 |y) for y = 0
or 1, X1 and X2 are conditionally independent given Y .
However, the (unconditional) joint mass function of (X1 , X2 ),
\[
f_{X_1,X_2}(x_1,x_2) = \sum_{y=0}^{1} f_{X_1,X_2\mid Y}(x_1,x_2\mid y)\, f_Y(y) = (1/2)\,1\{x_1 = x_2 = 0\} + 1/8,
\]
cannot be factorised into a product fX1 (x1 )fX2 (x2 ). Therefore, X1 , X2 are not indepen-
dent.
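Both factorisation claims in this example can be confirmed by brute-force enumeration. The sketch below (plain Python, using only the conditional pmf displayed above and fY ) tabulates the unconditional joint pmf of (X1 , X2 ) and compares it with the product of its marginals.

```python
# Enumeration check for Example 9.1.9(i).
from itertools import product

def f_cond(x1, x2, y):
    """f_{X1,X2|Y}(x1, x2 | y) as displayed above."""
    if y == 0:
        return 1.0 if (x1, x2) == (0, 0) else 0.0
    return 0.25                                    # y = 1: four equally likely pairs

f_Y = {0: 0.5, 1: 0.5}

# Unconditional joint pmf of (X1, X2): sum over y of f_{X1,X2|Y} * f_Y.
joint = {(x1, x2): sum(f_cond(x1, x2, y) * f_Y[y] for y in (0, 1))
         for x1, x2 in product((0, 1), repeat=2)}
print(joint)                                       # (0,0) has mass 5/8, the rest 1/8 each

# Marginals of X1 and X2, then the factorisation (independence) test.
pX1 = {a: sum(joint[a, b] for b in (0, 1)) for a in (0, 1)}
pX2 = {b: sum(joint[a, b] for a in (0, 1)) for b in (0, 1)}
for a, b in product((0, 1), repeat=2):
    print((a, b), joint[a, b], pX1[a] * pX2[b])    # the two columns differ => not independent
```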
(ii) Toss a coin N times, where N ∼ Poisson (λ). Suppose the coin has probability p of
turning up “head”. Let X = no. of heads and Y = N − X = no. of tails. Then
\[
f_{X\mid N}(x\mid n) = \binom{n}{x}\, p^{x} (1-p)^{n-x}\, 1\{x \in \{0,1,\dots,n\}\},
\]
\[
f_{Y\mid N}(y\mid n) = \binom{n}{y}\, p^{n-y} (1-p)^{y}\, 1\{y \in \{0,1,\dots,n\}\}.
\]
Joint mass function of (X, Y ):
\[
f(x,y) = \sum_{n=0}^{\infty} f_{X,Y\mid N}(x,y\mid n)\, P(N = n)
= \sum_{n=0}^{\infty} \frac{n!}{x!\,y!}\, p^{x} (1-p)^{y}\, 1\{x, y \in \{0,1,\dots,n\},\ x + y = n\}\, \frac{\lambda^{n} e^{-\lambda}}{n!}
\]
\[
= \frac{(x+y)!}{x!\,y!}\, p^{x} (1-p)^{y}\, \frac{\lambda^{x+y} e^{-\lambda}}{(x+y)!}
= \frac{(p\lambda)^{x} e^{-p\lambda}}{x!} \cdot \frac{\big((1-p)\lambda\big)^{y} e^{-(1-p)\lambda}}{y!},
\]
so that X and Y are independent Poisson random variables with means pλ and (1 − p)λ,
respectively.
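A quick Monte Carlo sanity check of this conclusion; the values of λ and p below are arbitrary choices.

```python
# Simulation check of Example 9.1.9(ii): X and Y should look like independent
# Poisson(p*lam) and Poisson((1-p)*lam) variables.
import numpy as np

rng = np.random.default_rng(0)
lam, p, n_sim = 3.0, 0.4, 200_000

N = rng.poisson(lam, size=n_sim)       # number of tosses
X = rng.binomial(N, p)                 # heads
Y = N - X                              # tails

print("mean of X:", X.mean(), "vs p*lam     =", p * lam)
print("mean of Y:", Y.mean(), "vs (1-p)*lam =", (1 - p) * lam)
print("var  of X:", X.var(), "  (should also be near p*lam for a Poisson)")
print("corr(X,Y):", np.corrcoef(X, Y)[0, 1], "  (should be near 0)")
```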
(iii) Joint pdf:
f (x, y, z) = 40 xz 1 {x, y, z ≥ 0, x + z ≤ 1, y + z ≤ 1}.
It has been derived in Example §7.1.11(c) that, for x, y, z ∈ [0, 1],
\[
f_X(x) = \tfrac{20}{3}\, x (1-x)^2 (1+2x), \qquad
f_Y(y) = \tfrac{5}{3}\, (1 - 4y^3 + 3y^4), \qquad
f_Z(z) = 20\, z (1-z)^3 .
\]
Thus, for x, y, z ∈ [0, 1], conditional pdf’s can be obtained as follows.
– X, Y |Z = z ∼ ?
\[
f_{X,Y\mid Z}(x,y\mid z) = \frac{f(x,y,z)}{f_Z(z)} = \frac{2x\,1\{x \le 1-z\}}{(1-z)^2} \cdot \frac{1\{y \le 1-z\}}{1-z}.
\]
The above factorisation suggests that X and Y are conditionally independent given
Z, and therefore
\[
f_{X\mid Y,Z}(x\mid y,z) = f_{X\mid Z}(x\mid z) = \frac{2x\,1\{x \le 1-z\}}{(1-z)^2},
\]
\[
f_{Y\mid X,Z}(y\mid x,z) = f_{Y\mid Z}(y\mid z) = \frac{1\{y \le 1-z\}}{1-z}
\quad\Longrightarrow\quad Y\mid Z = z \sim U[0,\, 1-z].
\]
– X, Z|Y = y ∼ ?
\[
f_{X,Z\mid Y}(x,z\mid y) = \frac{f(x,y,z)}{f_Y(y)} = \frac{24\, xz\, 1\{z \le 1 - \max\{x,y\}\}}{1 - 4y^3 + 3y^4}.
\]
Note: fX,Z|Y (x, z|y) cannot be expressed as a product of a function of (x, y) and a function
of (z, y), which implies that X and Z are not conditionally independent given Y .
– Z|(X, Y ) = (x, y) ∼ ?
Note: The above equality may not hold if X, Y are not independent.
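The marginal pdf's quoted in Example §9.1.9(iii) can also be verified numerically, by integrating f(x, y, z) = 40xz over the region with midpoint Riemann sums; the printed discrepancies should be small, reflecting only the discretisation of the region's boundary.

```python
# Numerical check of the marginals of f(x,y,z) = 40xz on {x,y,z>=0, x+z<=1, y+z<=1}.
import numpy as np

n = 150
g = (np.arange(n) + 0.5) / n                      # midpoints of a grid on [0, 1]
d = 1.0 / n
X, Y, Z = np.meshgrid(g, g, g, indexing="ij")
f = 40.0 * X * Z * (X + Z <= 1) * (Y + Z <= 1)    # joint pdf on the grid

fX_num = f.sum(axis=(1, 2)) * d * d               # integrate out (y, z)
fY_num = f.sum(axis=(0, 2)) * d * d               # integrate out (x, z)
fZ_num = f.sum(axis=(0, 1)) * d * d               # integrate out (x, y)

fX_exact = 20 / 3 * g * (1 - g) ** 2 * (1 + 2 * g)
fY_exact = 5 / 3 * (1 - 4 * g ** 3 + 3 * g ** 4)
fZ_exact = 20 * g * (1 - g) ** 3

print(np.abs(fX_num - fX_exact).max(),            # all three should be small
      np.abs(fY_num - fY_exact).max(),
      np.abs(fZ_num - fZ_exact).max())
```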
X1 ≥ X2 ⇒ E[X1 |Y ] ≥ E[X2 |Y ].
For any functions α(Y ), β(Y ), γ(Y ) of Y ,
E[α(Y)X1 + β(Y)X2 + γ(Y) | Y] = α(Y) E[X1 |Y] + β(Y) E[X2 |Y] + γ(Y).
E[X|Y] ≤ E[ |X| | Y ].
9.2.5 Concepts built upon E[ · ] can be extended to a conditional version. For example,
CONDITIONAL VARIANCE
– Var(X|Y) = E[ (X − E[X|Y])^2 | Y ] = E[X^2 |Y] − (E[X|Y])^2.
– For any functions a(Y), b(Y) of Y, Var( a(Y)X + b(Y) | Y ) = a(Y)^2 Var(X|Y).
– Var(X|Y) ≥ 0.
– Var(X|Y) = 0 iff P( X = h(Y) | Y ) = 1 for some function h(Y) of Y.
– X1 , . . . , Xn conditionally independent given Y ⇒ Var( ∑i Xi | Y ) = ∑i Var(Xi |Y).
CONDITIONAL QUANTILE
Conditional αth quantile of X given Y is inf{x ∈ R : FX|Y (x|Y ) > α}.
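A toy illustration of these conditional summaries: for an arbitrary made-up joint pmf on {0, 1, 2} × {0, 1} (not taken from the notes), the sketch computes E[X|Y = y], Var(X|Y = y) and the conditional median (α = 0.5) directly from f_{X|Y}.

```python
# Conditional mean, variance and quantile from a small joint pmf table.
import numpy as np

xs = np.array([0, 1, 2])
joint = np.array([[0.10, 0.20],        # rows: x = 0, 1, 2;  columns: y = 0, 1
                  [0.30, 0.10],
                  [0.10, 0.20]])       # entries are f(x, y); they sum to 1

for j, y in enumerate((0, 1)):
    f_cond = joint[:, j] / joint[:, j].sum()       # f_{X|Y}(x | y)
    mean = (xs * f_cond).sum()                     # E[X | Y = y]
    var = ((xs - mean) ** 2 * f_cond).sum()        # Var(X | Y = y)
    cdf = np.cumsum(f_cond)                        # F_{X|Y}(x | y)
    alpha = 0.5
    quantile = xs[np.argmax(cdf > alpha)]          # inf{x : F_{X|Y}(x|y) > alpha}
    print(y, mean, var, quantile)
```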
9.2.6 Proposition. (Law of total expectation) For any function g(x), E[ E[g(X)|Y] ] = E[g(X)].
Proof: Consider the continuous case (discrete case similar).
\[
E\big[\,E[g(X)\mid Y]\,\big] = \int E\big[g(X)\mid Y = y\big]\, f_Y(y)\,dy
= \int \left( \int g(x)\, f_{X\mid Y}(x\mid y)\,dx \right) f_Y(y)\,dy
\]
\[
= \int g(x) \left( \int f(x,y)\,dy \right) dx = \int g(x)\, f_X(x)\,dx = E[g(X)].
\]
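A Monte Carlo check of this proposition for a hypothetical hierarchical model (not from the notes): Y ~ Exponential(1) and X | Y ~ Poisson(Y), with g(x) = x^2. Here E[g(X)|Y] = Y + Y^2, so both sides should be close to E[Y] + E[Y^2] = 3.

```python
# Simulation check of the law of total expectation (9.2.6).
import numpy as np

rng = np.random.default_rng(1)
n_sim = 500_000
Y = rng.exponential(1.0, size=n_sim)   # Y ~ Exponential(1)
X = rng.poisson(Y)                     # X | Y ~ Poisson(Y)

print("E[g(X)]        ~", (X.astype(float) ** 2).mean())
print("E[ E[g(X)|Y] ] ~", (Y + Y ** 2).mean())        # both should be close to 3
```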
9.2.7 Proposition. For any event A, E[ P(A|Y) ] = P(A).
Proof: Note that
\[
E\big[\,1\{A\} \mid Y\,\big] = 1 \times P\big(1\{A\} = 1 \mid Y\big) + 0 \times P\big(1\{A\} = 0 \mid Y\big) = P(A\mid Y).
\]
Similarly, E[1{A}] = P(A). The result follows by applying Proposition §9.2.6 with X = 1{A}.
9.2.8 Proposition. (Law of total variance) Var(X) = E[ Var(X|Y) ] + Var( E[X|Y] ).
Proof: By Proposition §9.2.6 (applied to g(x) = x and g(x) = x^2) and the definition of conditional variance,
\[
E\big[\operatorname{Var}(X\mid Y)\big] + \operatorname{Var}\big(E[X\mid Y]\big)
= E\big[E[X^2\mid Y] - (E[X\mid Y])^2\big] + E\big[(E[X\mid Y])^2\big] - \big(E\big[E[X\mid Y]\big]\big)^2
= E[X^2] - \big(E[X]\big)^2 = \operatorname{Var}(X).
\]
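A numerical check of the law of total variance in the setting of Example §9.1.9(ii): there E[X|N] = pN and Var(X|N) = Np(1 − p), so the right-hand side equals λp(1 − p) + λp^2 = pλ, consistent with X ∼ Poisson(pλ). The values of λ and p below are arbitrary.

```python
# Simulation check of Var(X) = E[Var(X|N)] + Var(E[X|N]) for the coin-tossing example.
import numpy as np

rng = np.random.default_rng(2)
lam, p, n_sim = 5.0, 0.3, 500_000

N = rng.poisson(lam, size=n_sim)
X = rng.binomial(N, p)

lhs = X.var()
rhs = (N * p * (1 - p)).mean() + (N * p).var()   # E[Var(X|N)] + Var(E[X|N])
print(lhs, rhs, p * lam)                         # all three should be close
```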
9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where Y is replaced by
(Y1 , . . . , Ys ).
– Given (X, Y )
\[
E\big[XYZ \mid X, Y\big] = XY\, E\big[Z \mid X, Y\big] = XY \int_0^1 z\, f_{Z\mid X,Y}(z\mid X,Y)\,dz
= \frac{2XY}{\big(1 - \max\{X,Y\}\big)^2} \int_0^{1 - \max\{X,Y\}} z^2\,dz
= \frac{2}{3}\, XY \big(1 - \max\{X,Y\}\big).
\]
As expected, the above results agree with the value derived in Example §8.1.4(iii):
\[
E\big[\,E[XYZ\mid Z]\,\big] = E\big[\,E[XYZ\mid X, Y]\,\big] = E[XYZ] = 5/126.
\]
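As a sanity check of the value 5/126 ≈ 0.0397, the sketch below samples from the pdf f(x, y, z) = 40xz by rejection (the pdf is bounded by 10 on the region, attained at x = z = 1/2) and averages XYZ over the accepted points.

```python
# Rejection-sampling estimate of E[XYZ] for f(x,y,z) = 40xz on the region above.
import numpy as np

rng = np.random.default_rng(3)
n_prop = 2_000_000
x, y, z = rng.uniform(size=(3, n_prop))               # uniform proposals on [0,1]^3
f = 40.0 * x * z * (x + z <= 1) * (y + z <= 1)
accept = rng.uniform(size=n_prop) * 10.0 < f          # accept with probability f / 10

estimate = (x[accept] * y[accept] * z[accept]).mean()
print("accepted:", accept.sum(), " E[XYZ] ~", estimate, " vs 5/126 =", 5 / 126)
```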
Note: The expectation of X can be treated as a weighted average of the conditional expectations
of X given disjoint sectors of the sample space. The weights are determined by the probabilities of
the sectors. The special case X = 1 {B} reduces to the “law of total probability”.
9.2.12 Proposition. For a random variable X and an event A with P(A) > 0,
\[
E[X\mid A] = \frac{E\big[X\,1\{A\}\big]}{P(A)}.
\]
Proof: Applying Proposition §9.2.11 with A1 = A, A2 = A^c and X replaced by X 1{A}, we have
\[
E\big[X\,1\{A\}\big] = E\big[X\,1\{A\}\mid A\big]\, P(A) + E\big[X\,1\{A\}\mid A^c\big]\, P(A^c) = E[X\mid A]\, P(A).
\]
Note: The special case X = 1 {B} reduces to the definition of the conditional probability P(B|A).
9.2.13 Example. A person is randomly selected from an adult population and his/her height X
measured. It is known that the mean height of a man is 1.78m, and that of a woman is 1.68m.
Men account for 48% of the population. Calculate the mean height of the adult population,
E[X].
Answer:
E[X] = E[X|{man}] P(man) + E[X|{woman}] P(woman) = 1.78m × 0.48 + 1.68m × 0.52 = 1.728m.
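The weighted average can be checked in one line; the simulation below additionally assumes, purely for illustration, normal heights within each group (the assumed standard deviations affect the spread but not the mean).

```python
# Check of Example 9.2.13 by direct computation and by simulation.
import numpy as np

p_man, mu_man, mu_woman = 0.48, 1.78, 1.68
print("weighted average:", mu_man * p_man + mu_woman * (1 - p_man))   # 1.728

rng = np.random.default_rng(4)
n_sim = 1_000_000
is_man = rng.uniform(size=n_sim) < p_man
heights = np.where(is_man,
                   rng.normal(mu_man, 0.07, size=n_sim),    # assumed sd 0.07 m
                   rng.normal(mu_woman, 0.06, size=n_sim))  # assumed sd 0.06 m
print("simulated mean  :", heights.mean())                  # should be close to 1.728
```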
Putting j = 0, 1, 2, respectively, we have
\[
P(Y > X\pi) = \frac{1}{2}\left(\frac{1}{2} + \frac{1}{2}\right) = \frac{1}{2},
\]
\[
E\big[Y^{-1}\,1\{Y > X\pi\}\big] = \frac{1}{2}\left(\frac{\ln 2}{2\pi} + \frac{2\ln 2 - 1}{2\pi}\right) = \frac{3\ln 2 - 1}{4\pi},
\]
\[
E\big[Y^{-2}\,1\{Y > X\pi\}\big] = \frac{1}{2}\left(\frac{1}{4\pi^2} + \frac{1 - \ln 2}{4\pi^2}\right) = \frac{2 - \ln 2}{8\pi^2}.
\]
It follows that
\[
E\big[Y^{-1}\mid Y > X\pi\big] = \frac{E\big[Y^{-1}\,1\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{3\ln 2 - 1}{2\pi}.
\]
Similarly, we have
\[
E\big[Y^{-2}\mid Y > X\pi\big] = \frac{E\big[Y^{-2}\,1\{Y > X\pi\}\big]}{P(Y > X\pi)} = \frac{2 - \ln 2}{4\pi^2},
\]
so that
\[
\operatorname{Var}\big(Y^{-1}\mid Y > X\pi\big) = E\big[Y^{-2}\mid Y > X\pi\big] - \Big(E\big[Y^{-1}\mid Y > X\pi\big]\Big)^2
= \frac{2 - \ln 2}{4\pi^2} - \left(\frac{3\ln 2 - 1}{2\pi}\right)^2 = \frac{1 + 5\ln 2 - 9(\ln 2)^2}{4\pi^2}.
\]
(a) Find C.
(b) Find the conditional pdf fX|Y .
(c) Find the conditional pdf of X given X > Y .
9.3.2 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any
number, m say, of players. The players take turns to spin a four-sided top, the dreydel, whose
sides are marked with the letters N, G, H and S. Before the game starts, each player contributes
one unit to the pot, which thus contains m units. Depending on the outcome of the spin, the
spinning player
does nothing if N turns up,
receives the entire pot if G turns up,
receives half the pot if H turns up,
contributes 1 unit to the pot if S turns up.
If G turns up, all m players must each contribute one unit to the pot to start the game again.
(a) Show that in the long run, the pot contains 2(m + 1)/3 units on average.
(b) Is Dreydel a fair game, i.e. one in which no player has an advantage over the others?
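A simulation sketch that can be used to check part (a); it assumes the reading of the rules given above (N leaves the pot unchanged) and allows half-pots that are not whole numbers of units.

```python
# Long-run average pot size in Dreydel, to compare with 2(m + 1)/3.
import numpy as np

rng = np.random.default_rng(5)
m, n_spins = 4, 200_000
pot, total = float(m), 0.0              # the m players ante one unit each

for _ in range(n_spins):
    total += pot                        # record the pot seen before each spin
    letter = rng.integers(4)            # 0 = N, 1 = G, 2 = H, 3 = S, equally likely
    if letter == 1:                     # G: spinner takes the pot, all re-ante
        pot = float(m)
    elif letter == 2:                   # H: spinner takes half the pot
        pot /= 2.0
    elif letter == 3:                   # S: spinner adds one unit
        pot += 1.0

print("average pot:", total / n_spins, " vs 2(m+1)/3 =", 2 * (m + 1) / 3)
```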
9.3.3 Two players compete in a card-drawing game. Each player is given a full pack of 52 cards. Each
draws cards from the pack repeatedly and with replacement, until a stopping condition is met.
Player 1 stops whenever a queen is followed by a queen, which is then followed by a king. Player
2 stops whenever a queen is followed by a king, which is then followed by a queen. Whoever stops
first is the winner. Compare the expected numbers of cards drawn by the two players. Which player
does the game favour more?