P9-Conditional Distribution
§9.1 Introduction
9.1.1 Let f (x, y), fX (x) and fY (y) be the joint probability (density or mass) function of (X, Y ) and
the marginal probability functions of X and Y respectively. Then the conditional probability
function of X given Y = y is
$$f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)},$$
and the corresponding conditional cdf is
$$F_{X|Y}(x \mid y) = \sum_{u \le x} f_{X|Y}(u \mid y) \ \text{(discrete case)} \quad\text{or}\quad \int_{-\infty}^{x} f_{X|Y}(u \mid y)\, du \ \text{(continuous case)}.$$
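In the discrete case this definition amounts to renormalising the column of the joint probability table corresponding to the observed value of Y. A minimal sketch in Python (the 2 × 3 joint pmf below is an arbitrary illustrative example, not taken from these notes):

```python
import numpy as np

# Conditional pmf from a joint pmf (discrete version of 9.1.1).
# The joint table is an arbitrary illustrative example.
joint = np.array([[0.10, 0.25, 0.05],    # rows: x = 0, 1
                  [0.20, 0.15, 0.25]])   # cols: y = 0, 1, 2
f_Y = joint.sum(axis=0)                  # marginal pmf of Y
f_X_given_Y = joint / f_Y                # column j holds f_{X|Y}(. | y = j)
print(f_X_given_Y)
print(f_X_given_Y.sum(axis=0))           # each column sums to 1
```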
9.1.2 By conditioning on {Y = y}, we limit our scope to those outcomes of X that are possible
when Y is observed to be y.
Example. Let X = no. of casualties at a road accident and Y = total weight (in tons) of vehicles
involved. Then the conditional distributions of X given different values of Y may be very different, i.e.
distribution of X|Y = 10 may be different from distribution of X|Y = 1, say.
9.1.5 Conditional distributions may similarly be defined for groups of random variables.
For example, for random variables X = (X1 , . . . , Xr ) and Y = (Y1 , . . . , Ys ), let
(a) conditional (joint) probability function of X given Y = (y1 , . . . , ys ):
$$f_{\mathbf{X}|\mathbf{Y}}(x_1, \dots, x_r \mid y_1, \dots, y_s) = \frac{f(x_1, \dots, x_r, y_1, \dots, y_s)}{f_{\mathbf{Y}}(y_1, \dots, y_s)};$$
9.1.7 Concepts previously established for “unconditional” distributions can be obtained analogously
for conditional distributions by substituting conditional probabilities P(·|·) for P(·).
9.1.8 Random variables X1, X2, . . . are said to be conditionally independent given Y if
$$P(X_1 \le x_1, \dots, X_n \le x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i \le x_i \mid Y = y)$$
for all x1, . . . , xn, y ∈ (−∞, ∞) and any n ∈ {1, 2, . . .}. This condition is equivalent to
$$f_{(X_1,\dots,X_n)|Y}(x_1, \dots, x_n \mid y) = \prod_{i=1}^{n} f_{X_i|Y}(x_i \mid y)$$
for all x1, . . . , xn, y ∈ (−∞, ∞) and any n ∈ {1, 2, . . .}, where f_{(X_1,\dots,X_n)|Y}(x_1, . . . , xn | y) denotes the joint probability function of (X1, . . . , Xn) conditional on Y = y.
9.1.9 Examples.
(i) Toss a coin N times, where N ∼ Poisson (λ). Suppose the coin has probability p of
turning up “head”. Let X = no. of heads and Y = N − X = no. of tails. Then
$$f_{X|N}(x \mid n) = \binom{n}{x} p^{x} (1-p)^{n-x}\, 1\{x \in \{0, 1, \dots, n\}\},$$
$$f_{Y|N}(y \mid n) = \binom{n}{y} p^{n-y} (1-p)^{y}\, 1\{y \in \{0, 1, \dots, n\}\}.$$
Conditional joint mass function of (X, Y) given N = n:
$$f_{(X,Y)|N}(x, y \mid n) = \binom{n}{x} p^{x} (1-p)^{n-x}\, 1\{x \in \{0, 1, \dots, n\},\ y = n - x\}.$$
Hence, for x, y ∈ {0, 1, 2, . . .},
$$f(x, y) = f_{(X,Y)|N}(x, y \mid x + y)\, P(N = x + y) = \binom{x+y}{x} p^{x} (1-p)^{y}\, \frac{e^{-\lambda}\lambda^{x+y}}{(x+y)!} = \frac{e^{-p\lambda}(p\lambda)^{x}}{x!} \cdot \frac{e^{-(1-p)\lambda}\big((1-p)\lambda\big)^{y}}{y!},$$
so that X and Y are independent Poisson random variables with means pλ and (1 − p)λ, respectively.
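The thinning result above is easy to check by simulation. The sketch below assumes NumPy; the values λ = 5 and p = 0.3 are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of Example 9.1.9(i): toss N ~ Poisson(lam) coins with
# head-probability p; then X (heads) and Y = N - X (tails) behave like
# independent Poisson(p*lam) and Poisson((1-p)*lam) variables.
rng = np.random.default_rng(0)
lam, p, reps = 5.0, 0.3, 200_000

N = rng.poisson(lam, size=reps)
X = rng.binomial(N, p)                   # heads, given N tosses
Y = N - X                                # tails

print(X.mean(), X.var(), p * lam)        # empirical mean/variance vs p*lam
print(Y.mean(), Y.var(), (1 - p) * lam)  # empirical mean/variance vs (1-p)*lam
print(np.cov(X, Y)[0, 1])                # covariance, should be near 0
```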
(ii) Joint pdf:
f (x, y, z) = 40 xz 1 {x, y, z ≥ 0, x + z ≤ 1, y + z ≤ 1}.
It has been derived in Example §7.1.11(c) that, for x, y, z ∈ [0, 1],
$$f_X(x) = \frac{20}{3}\, x(1-x)^2(1+2x), \qquad f_Y(y) = \frac{5}{3}\,(1 - 4y^3 + 3y^4), \qquad f_Z(z) = 20\, z(1-z)^3.$$
Thus, for x, y, z ∈ [0, 1], conditional pdf’s can be obtained as follows.
– Given Z
$$f_{(X,Y)|Z}(x, y \mid z) = \frac{f(x, y, z)}{f_Z(z)} = \frac{2x\, 1\{x \le 1-z\}}{(1-z)^2} \cdot \frac{1\{y \le 1-z\}}{1-z}.$$
The above decomposition suggests that X and Y are conditionally independent given
Z, and therefore
$$f_{X|(Y,Z)}(x \mid y, z) = f_{X|Z}(x \mid z) = \frac{2x\, 1\{x \le 1-z\}}{(1-z)^2},$$
$$f_{Y|(X,Z)}(y \mid x, z) = f_{Y|Z}(y \mid z) = \frac{1\{y \le 1-z\}}{1-z} \;\Rightarrow\; Y \mid (X, Z) = (x, z) \sim U[0, 1-z].$$
– Given Y
$$f_{(X,Z)|Y}(x, z \mid y) = \frac{f(x, y, z)}{f_Y(y)} = \frac{24\, x z\, 1\{x + z \le 1,\ z \le 1 - y\}}{1 - 4y^3 + 3y^4},$$
$$f_{X|Y}(x \mid y) = \int_0^1 f_{(X,Z)|Y}(x, z \mid y)\, dz = \frac{12\, x \big(1 - \max\{x, y\}\big)^2}{1 - 4y^3 + 3y^4},$$
$$f_{Z|Y}(z \mid y) = \int_0^1 f_{(X,Z)|Y}(x, z \mid y)\, dx = \frac{12\, z(1-z)^2\, 1\{z \le 1 - y\}}{1 - 4y^3 + 3y^4},$$
$$f_{Z|(X,Y)}(z \mid x, y) = \frac{f_{(X,Z)|Y}(x, z \mid y)}{f_{X|Y}(x \mid y)} = \frac{2\, z\, 1\{z \le 1 - \max\{x, y\}\}}{\big(1 - \max\{x, y\}\big)^2}.$$
– Given X
$$f_{(Y,Z)|X}(y, z \mid x) = \frac{f(x, y, z)}{f_X(x)} = \frac{6\, z\, 1\{z \le 1 - x,\ y + z \le 1\}}{(1-x)^2(1+2x)},$$
$$f_{Y|X}(y \mid x) = \int_0^1 f_{(Y,Z)|X}(y, z \mid x)\, dz = \frac{3\, \big(1 - \max\{x, y\}\big)^2}{(1-x)^2(1+2x)},$$
$$f_{Z|X}(z \mid x) = \int_0^1 f_{(Y,Z)|X}(y, z \mid x)\, dy = \frac{6\, z(1-z)\, 1\{z \le 1 - x\}}{(1-x)^2(1+2x)}.$$
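As a numerical sanity check, each conditional pdf derived above should integrate to 1 in its first argument for any fixed value of the conditioning variable. The sketch below assumes NumPy; the conditioning values y = 0.3 and x = 0.6 are arbitrary.

```python
import numpy as np

# Check that f_{X|Y}(.|y) and f_{Z|X}(.|x) integrate to 1 (midpoint rule).
def f_x_given_y(x, y):
    return 12 * x * (1 - np.maximum(x, y)) ** 2 / (1 - 4 * y**3 + 3 * y**4)

def f_z_given_x(z, x):
    return 6 * z * (1 - z) * (z <= 1 - x) / ((1 - x) ** 2 * (1 + 2 * x))

dx = 1e-5
grid = np.arange(dx / 2, 1, dx)             # midpoints of a fine grid on [0, 1]
y0, x0 = 0.3, 0.6                           # arbitrary conditioning values
print((f_x_given_y(grid, y0) * dx).sum())   # ~ 1
print((f_z_given_x(grid, x0) * dx).sum())   # ~ 1
```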
9.2.2 Let ψ(y) = E[g(X) | Y = y], a function of y. The random variable ψ(Y) is usually written
as E[g(X) | Y] for brevity, so that E[g(X) | Y = y] is a realisation of E[g(X) | Y] when Y is
observed to be y.
9.2.3 E[g(X) | X] = g(X).
9.2.4 X, Y independent ⇒ E[X | Y] = E[X] and E[Y | X] = E[Y].
Proof: ∀ y, E[X | Y = y] = ∫ x f_{X|Y}(x|y) dx = ∫ x f_X(x) dx = E[X] (discrete case similar).
9.2.5 Proposition. E[E[X|Y]] = E[X].
Proof: Consider the continuous case (discrete case similar).
$$E\big[E[X|Y]\big] = \int E[X \mid Y = y]\, f_Y(y)\, dy = \int \Big( \int x\, f_{X|Y}(x \mid y)\, dx \Big) f_Y(y)\, dy = \int x \Big( \int f(x, y)\, dy \Big) dx = \int x\, f_X(x)\, dx = E[X].$$
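The proposition can also be illustrated by simulation. The model below is an arbitrary illustrative choice (not from the notes): Y ∼ Poisson(3) and X | Y = y ∼ Binomial(y, 0.4), so that E[X | Y] = 0.4Y and E[E[X|Y]] = 1.2 = E[X].

```python
import numpy as np

# Monte Carlo illustration of Proposition 9.2.5: E[E[X|Y]] = E[X].
rng = np.random.default_rng(1)
reps = 500_000
Y = rng.poisson(3.0, size=reps)
X = rng.binomial(Y, 0.4)                                   # X | Y = y ~ Binomial(y, 0.4)

values = np.unique(Y)
cond_mean = np.array([X[Y == y].mean() for y in values])   # empirical E[X | Y = y]
weights = np.array([(Y == y).mean() for y in values])      # empirical P(Y = y)
print((weights * cond_mean).sum(), X.mean())               # both close to 1.2
```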
9.2.6 Proposition. For any event A, E[P(A|Y)] = P(A).
Proof: Note that
$$E\big[1\{A\} \mid Y\big] = 1 \times P\big(1\{A\} = 1 \mid Y\big) + 0 \times P\big(1\{A\} = 0 \mid Y\big) = P(A \mid Y).$$
Similarly, E[1{A}] = P(A). The result follows by applying Proposition §9.2.5 with X = 1{A}.
9.2.8 Concepts derived from E[ · ] can be extended to a conditional version. For example,
• CONDITIONAL VARIANCE
– Var(X|Y) = E[(X − E[X|Y])² | Y] = E[X²|Y] − (E[X|Y])².
Note: Var(X) = E[Var(X|Y)] + Var(E[X|Y]) (see the simulation check after this list).
– For any functions a(Y), b(Y) of Y, Var(a(Y)X + b(Y) | Y) = a(Y)² Var(X|Y).
– Var(X|Y) ≥ 0.
– Var(X|Y) = 0 iff P(X = h(Y) | Y) = 1 for some function h(Y) of Y.
– X1, . . . , Xn conditionally independent given Y ⇒ Var(∑_i X_i | Y) = ∑_i Var(X_i|Y).
• CONDITIONAL QUANTILE
Conditional αth quantile of X given Y is inf{x ∈ R : F_{X|Y}(x|Y) > α}.
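The variance decomposition in the Note above can be checked by a quick Monte Carlo experiment. The model below (Y uniform on {1, . . . , 5} and X | Y = y ∼ N(y, y²)) is an arbitrary illustrative choice, not taken from the notes.

```python
import numpy as np

# Monte Carlo check of Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
rng = np.random.default_rng(2)
reps = 500_000
Y = rng.integers(1, 6, size=reps)        # uniform on {1, ..., 5}
X = rng.normal(loc=Y, scale=Y)           # X | Y = y ~ N(y, y^2)

values = np.arange(1, 6)
w = np.array([(Y == y).mean() for y in values])             # P(Y = y)
m = np.array([X[Y == y].mean() for y in values])            # E[X | Y = y]
v = np.array([X[Y == y].var() for y in values])             # Var(X | Y = y)

lhs = X.var()                                               # Var(X)
rhs = (w * v).sum() + (w * (m - (w * m).sum()) ** 2).sum()  # E[Var] + Var[E]
print(lhs, rhs)                                             # should be close
```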
9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where X is replaced
by (X1 , . . . , Xr ) and Y replaced by (Y1 , . . . , Ys ).
– Given Y
$$E[XYZ \mid Y] = Y\, E[XZ \mid Y] = Y \int_0^1 \!\!\int_0^1 x z\, f_{(X,Z)|Y}(x, z \mid Y)\, dx\, dz$$
$$= \frac{24Y}{1 - 4Y^3 + 3Y^4} \int_0^{1-Y} z^2 \int_0^{1-z} x^2\, dx\, dz = \frac{2Y(1-Y)(1 + 3Y + 6Y^2 + 10Y^3)}{15(1 + 2Y + 3Y^2)}.$$
– Given X
$$E[XYZ \mid X] = X\, E[YZ \mid X] = X \int_0^1 \!\!\int_0^1 y z\, f_{(Y,Z)|X}(y, z \mid X)\, dy\, dz$$
$$= \frac{6X}{(1-X)^2(1+2X)} \int_0^{1-X} z^2 \int_0^{1-z} y\, dy\, dz = \frac{X(1-X)(1 + 3X + 6X^2)}{10(1+2X)}.$$
Similarly,
$$E\big[E[XYZ \mid Y]\big] = \int_0^1 \frac{2y(1-y)(1 + 3y + 6y^2 + 10y^3)}{15(1 + 2y + 3y^2)} \cdot \frac{5}{3}(1 - 4y^3 + 3y^4)\, dy = 5/126,$$
$$E\big[E[XYZ \mid Z]\big] = \int_0^1 \frac{z(1-z)^2}{3} \cdot 20\, z(1-z)^3\, dz = 5/126.$$
As expected, the above results agree with the value derived in Example §8.1.4(iii):
$$E\big[E[XYZ \mid X]\big] = E\big[E[XYZ \mid Y]\big] = E\big[E[XYZ \mid Z]\big] = E[XYZ] = 5/126.$$
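The common value 5/126 ≈ 0.0397 can also be confirmed numerically from the joint pdf itself, e.g. by a midpoint Riemann sum over a grid on the unit cube (a rough sketch; the grid size is an arbitrary choice).

```python
import numpy as np

# Numerical check that E[XYZ] = 5/126 under f(x,y,z) = 40xz on
# {x, y, z >= 0, x + z <= 1, y + z <= 1}.
n = 100
pts = (np.arange(n) + 0.5) / n                       # midpoints on [0, 1]
x, y, z = np.meshgrid(pts, pts, pts, indexing="ij")
f = 40 * x * z * ((x + z <= 1) & (y + z <= 1))       # joint pdf on the grid
cell = 1.0 / n**3                                    # volume of one grid cell
print((f * cell).sum())                              # ~ 1 (pdf integrates to 1)
print((x * y * z * f * cell).sum())                  # ~ 5/126 ≈ 0.0397
```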
Proof: Consider
$$E\big[X\, 1\{A\}\big] = E\Big[E\big[X\, 1\{A\} \,\big|\, 1\{A\}\big]\Big] = E\Big[1\{A\}\, E\big[X \,\big|\, 1\{A\}\big]\Big] = E\big[X \,\big|\, 1\{A\} = 1\big]\, P\big(1\{A\} = 1\big) = E[X \mid A]\, P(A).$$
9.2.13 Example. A person is randomly selected from an adult population and his/her height X
measured. It is known that the mean height of a man is 1.78m, and that of a woman is 1.68m.
Men account for 48% of the population. Calculate the mean height of the adult population,
E[X].
Answer:
E[X] = E[X|{man}] P(man) + E[X|{woman}] P(woman) = 1.78m × 0.48 + 1.68m × 0.52 = 1.728m.
(a) Find C.
(b) Find the marginal pdf’s of X and Y .
(c) Find the conditional pdf’s fX|Y and fY |X .
9.3.2 Let X1 , X2 , . . . be a sequence of independent and identically distributed random variables with a
common mean µ and common variance σ 2 . Let N be a random positive integer, independent of
X1, X2, . . . , with E[N] = ν and Var(N) = τ². Define S = ∑_{i=1}^{N} X_i, e(n) = E[S|N = n] and
v(n) = Var(S|N = n), for n = 1, 2, . . . .
(a) Write down explicit expressions for the functions e(n) and v(n).
(b) Show that e(N ) has mean µν and variance µ2 τ 2 .
(c) Show that v(N ) has mean σ 2 ν.
(d) Deduce from (b) and (c), or otherwise, that S has mean µν and variance µ2 τ 2 + σ 2 ν.
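A Monte Carlo check of the formulas in part (d) is sketched below. The particular choices of distributions for the X_i and N (exponential with mean 2, geometric with success probability 0.25) are arbitrary and only for illustration.

```python
import numpy as np

# Random sum S = X_1 + ... + X_N: check E[S] = mu*nu, Var(S) = mu^2*tau^2 + sigma^2*nu.
rng = np.random.default_rng(3)
reps = 100_000
mu, sigma2 = 2.0, 4.0                    # mean/variance of Exp with mean 2
nu, tau2 = 4.0, 12.0                     # mean/variance of Geometric(0.25)

N = rng.geometric(0.25, size=reps)       # random number of summands
S = np.array([rng.exponential(mu, size=n).sum() for n in N])

print(S.mean(), mu * nu)                         # ~ 8
print(S.var(), mu**2 * tau2 + sigma2 * nu)       # ~ 64
```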
9.3.3 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any
number, m say, of players. Each player takes turns to spin a four-sided top, the dreydel, whose
four sides are marked with the letters N, G, H and S. Before the game starts, each player contributes one
unit to the pot which thus contains m units. Depending on the outcome of his spin, the spinning
player
If G turns up, all m players must each contribute one unit to the pot to start the game again.
(a) Show that in the long run, the pot contains 2(m + 1)/3 units on average.
(b) Is Dreydel a fair game, i.e. no player has advantages over the others?
9.3.4 A mysterious organism is capable of resurrections, so that it can start a new life immediately after
death and repeat this cycle indefinitely. Let Xi be the duration of the ith life of the organism so
that S_n = ∑_{i=1}^{n} X_i gives the time of its nth resurrection. Define S_0 = 0 by convention. Assume
that X1, X2, . . . are independent unit-rate exponential random variables with the density function
f(x) = e^{−x} 1{x > 0}.
(b) It is known that Sn has the Gamma (n, 1) density function
$$g_n(x) = \frac{x^{n-1} e^{-x}}{(n-1)!}\, 1\{x > 0\}.$$
Show that $\sum_{n=1}^{\infty} g_n(x) = 1$ for x > 0.
(c) Suppose that the organism is living its Nth life at time t, so that N is a positive random
integer.
(i) Show that
$$P(X_N \le x) = \sum_{n=1}^{\infty} P(X_n \le x,\ S_{n-1} < t \le S_n).$$
(iii) Deduce from (b) and (c)(ii) that XN has the density function