
§9 Conditional distribution

§9.1 Introduction
9.1.1 Let f(x, y), f_X(x) and f_Y(y) be the joint probability (density or mass) function of (X, Y) and the marginal probability functions of X and Y respectively. Then the conditional probability function of X given Y = y is
\[
f_{X|Y}(x \mid y) = \frac{f(x, y)}{f_Y(y)},
\]
and the corresponding conditional cdf is
\[
F_{X|Y}(x \mid y) = P(X \le x \mid Y = y) =
\begin{cases}
\displaystyle\sum_{\{u:\, u \le x\}} f_{X|Y}(u \mid y) = \sum_{\{u:\, u \le x\}} P(X = u \mid Y = y) & \text{(discrete case)}, \\[8pt]
\displaystyle\int_{-\infty}^{x} f_{X|Y}(u \mid y)\, du & \text{(continuous case)}.
\end{cases}
\]
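As a quick numerical illustration of the discrete case (a sketch only; the joint pmf below is made up), the conditional pmf is obtained by dividing each column of the joint mass table by the corresponding marginal P(Y = y):

```python
# Sketch: conditional pmf f_{X|Y}(x|y) from a made-up discrete joint pmf.
import numpy as np

# Joint pmf f(x, y): rows are x in {0, 1, 2}, columns are y in {0, 1}.
f = np.array([[0.10, 0.20],
              [0.30, 0.15],
              [0.20, 0.05]])

f_Y = f.sum(axis=0)             # marginal f_Y(y): sum over x
f_X_given_Y = f / f_Y           # divide each column by f_Y(y)

print(f_X_given_Y)              # column y=0: [1/6, 1/2, 1/3]
print(f_X_given_Y.sum(axis=0))  # each column sums to 1: [1. 1.]
```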

9.1.2 By conditioning on {Y = y}, we limit our scope to those outcomes of X that are possible
when Y is observed to be y.
Example. Let X = no. of casualties at a road accident and Y = total weight (in tons) of the vehicles involved. The conditional distributions of X given different values of Y may then be very different, e.g. the distribution of X | Y = 10 may differ from that of X | Y = 1.

9.1.3 Special case: Y = 1{A} (for some event A).
For brevity, we write f(x|A) = f_{X|Y}(x|1) and f(x|A^c) = f_{X|Y}(x|0), and similarly for conditional cdf's.

9.1.4 X, Y independent if and only if f_{X|Y}(x|y) = f_X(x) for all x, y.

9.1.5 Conditional distributions may similarly be defined for groups of random variables. For example, for random variables X = (X_1, ..., X_r) and Y = (Y_1, ..., Y_s), let
\[
f(x_1, \ldots, x_r, y_1, \ldots, y_s), \quad f_X(x_1, \ldots, x_r) \quad \text{and} \quad f_Y(y_1, \ldots, y_s)
\]
be the joint probability functions of (X, Y), X and Y, respectively. Then

(a) conditional (joint) probability function of X given Y = (y_1, ..., y_s):
\[
f_{X|Y}(x_1, \ldots, x_r \mid y_1, \ldots, y_s) = \frac{f(x_1, \ldots, x_r, y_1, \ldots, y_s)}{f_Y(y_1, \ldots, y_s)};
\]

(b) conditional (joint) cdf of X given Y = (y_1, ..., y_s):
\[
F_{X|Y}(x_1, \ldots, x_r \mid y_1, \ldots, y_s) = P(X_1 \le x_1, \ldots, X_r \le x_r \mid Y_1 = y_1, \ldots, Y_s = y_s).
\]

9.1.6 X, Y independent if and only if
\[
f_{X|Y}(x_1, \ldots, x_r \mid y_1, \ldots, y_s) = f_X(x_1, \ldots, x_r), \quad \text{for all } x_1, \ldots, x_r, y_1, \ldots, y_s.
\]

9.1.7 Concepts previously established for "unconditional" distributions can be defined analogously for conditional distributions by substituting conditional probabilities P(·|·) for P(·).

9.1.8 CONDITIONAL INDEPENDENCE

X_1, X_2, ... are conditionally independent given Y iff
\[
P(X_1 \le x_1, \ldots, X_n \le x_n \mid Y = y) = \prod_{i=1}^{n} P(X_i \le x_i \mid Y = y)
\]
for all x_1, ..., x_n, y ∈ (−∞, ∞) and any n ∈ {1, 2, ...}. The latter condition is equivalent to
\[
f_{(X_1, \ldots, X_n)|Y}(x_1, \ldots, x_n \mid y) = \prod_{i=1}^{n} f_{X_i|Y}(x_i \mid y)
\]
for all x_1, ..., x_n, y ∈ (−∞, ∞) and any n ∈ {1, 2, ...}, where f_{(X_1, \ldots, X_n)|Y}(x_1, \ldots, x_n \mid y) denotes the joint probability function of (X_1, ..., X_n) conditional on Y = y.

9.1.9 Examples.

(i) Toss a coin N times, where N ∼ Poisson(λ). Suppose the coin has probability p of turning up "head". Let X = no. of heads and Y = N − X = no. of tails. Then
\[
f_{X|N}(x \mid n) = \binom{n}{x} p^{x} (1-p)^{n-x}\, 1\{x \in \{0, 1, \ldots, n\}\},
\]
\[
f_{Y|N}(y \mid n) = \binom{n}{y} p^{n-y} (1-p)^{y}\, 1\{y \in \{0, 1, \ldots, n\}\}.
\]

Conditional joint mass function of (X, Y) given N = n:
\[
f_{(X,Y)|N}(x, y \mid n) = P(X = x, Y = y \mid N = n) = \binom{n}{x} p^{x} (1-p)^{n-x}\, 1\{x = n - y \in \{0, 1, \ldots, n\}\} \ne f_{X|N}(x \mid n)\, f_{Y|N}(y \mid n)
\]
in general — hence X, Y are not conditionally independent given N.


Joint mass function of (X, Y):
\[
f(x, y) = \sum_{n=0}^{\infty} f_{(X,Y)|N}(x, y \mid n)\, P(N = n)
= \sum_{n=0}^{\infty} \binom{n}{x} p^{x} (1-p)^{n-x}\, 1\{x = n - y \in \{0, 1, \ldots, n\}\}\, \frac{\lambda^{n} e^{-\lambda}}{n!}
\]
\[
= \binom{x+y}{x} p^{x} (1-p)^{y}\, \frac{\lambda^{x+y} e^{-\lambda}}{(x+y)!}
= \frac{(p\lambda)^{x} e^{-p\lambda}}{x!} \cdot \frac{\big((1-p)\lambda\big)^{y} e^{-(1-p)\lambda}}{y!},
\]
so that X and Y are independent Poisson random variables with means pλ and (1 − p)λ, respectively.
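The conclusion lends itself to a quick simulation check; in the sketch below, λ = 5 and p = 0.3 are arbitrary choices:

```python
# Sketch: Poisson "thinning" as in 9.1.9(i); lambda = 5, p = 0.3 arbitrary.
import numpy as np

rng = np.random.default_rng(0)
lam, p, trials = 5.0, 0.3, 200_000

N = rng.poisson(lam, size=trials)   # N ~ Poisson(lambda)
X = rng.binomial(N, p)              # heads: X | N = n ~ Binomial(n, p)
Y = N - X                           # tails

print(X.mean(), X.var())            # both ~ p*lambda = 1.5 (Poisson: mean = variance)
print(Y.mean(), Y.var())            # both ~ (1-p)*lambda = 3.5
print(np.cov(X, Y)[0, 1])           # ~ 0: consistent with independence of X and Y
```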
(ii) Joint pdf:
\[
f(x, y, z) = 40\, xz\, 1\{x, y, z \ge 0,\; x + z \le 1,\; y + z \le 1\}.
\]
It has been derived in Example §7.1.11(c) that, for x, y, z ∈ [0, 1],
\[
f_X(x) = \frac{20}{3}\, x(1-x)^{2}(1+2x), \quad f_Y(y) = \frac{5}{3}\,(1 - 4y^{3} + 3y^{4}), \quad f_Z(z) = 20\, z(1-z)^{3}.
\]
Thus, for x, y, z ∈ [0, 1], conditional pdf’s can be obtained as follows.
– Given Z:
\[
f_{(X,Y)|Z}(x, y \mid z) = \frac{f(x, y, z)}{f_Z(z)} = \left( \frac{2x\, 1\{x \le 1-z\}}{(1-z)^{2}} \right) \left( \frac{1\{y \le 1-z\}}{1-z} \right).
\]
The above decomposition shows that X and Y are conditionally independent given Z, and therefore
\[
f_{X|(Y,Z)}(x \mid y, z) = f_{X|Z}(x \mid z) = \frac{2x\, 1\{x \le 1-z\}}{(1-z)^{2}},
\]
\[
f_{Y|(X,Z)}(y \mid x, z) = f_{Y|Z}(y \mid z) = \frac{1\{y \le 1-z\}}{1-z} \;\Rightarrow\; Y \mid (X, Z) = (x, z) \sim U[0, 1-z].
\]

– Given Y:
\[
f_{(X,Z)|Y}(x, z \mid y) = \frac{f(x, y, z)}{f_Y(y)} = \frac{24\, xz\, 1\{x + z \le 1,\; z \le 1-y\}}{1 - 4y^{3} + 3y^{4}}
\]
cannot be expressed as a product of a function of x and a function of z, which implies that X and Z are not conditionally independent given Y.
From f_{(X,Z)|Y}(x, z|y) we may also derive:
\[
f_{X|Y}(x \mid y) = \int_0^1 f_{(X,Z)|Y}(x, z \mid y)\, dz = \frac{12\, x \left(1 - \max\{x, y\}\right)^{2}}{1 - 4y^{3} + 3y^{4}},
\]
\[
f_{Z|Y}(z \mid y) = \int_0^1 f_{(X,Z)|Y}(x, z \mid y)\, dx = \frac{12\, z(1-z)^{2}\, 1\{z \le 1-y\}}{1 - 4y^{3} + 3y^{4}},
\]
\[
f_{Z|(X,Y)}(z \mid x, y) = \frac{f_{(X,Z)|Y}(x, z \mid y)}{f_{X|Y}(x \mid y)} = \frac{2\, z\, 1\{z \le 1 - \max\{x, y\}\}}{\left(1 - \max\{x, y\}\right)^{2}}.
\]

– Given X:
\[
f_{(Y,Z)|X}(y, z \mid x) = \frac{f(x, y, z)}{f_X(x)} = \frac{6\, z\, 1\{z \le 1-x,\; y + z \le 1\}}{(1-x)^{2}(1+2x)}
\]
cannot be expressed as a product of a function of y and a function of z, which implies that Y and Z are not conditionally independent given X.
From f_{(Y,Z)|X}(y, z|x) we may also derive:
\[
f_{Y|X}(y \mid x) = \int_0^1 f_{(Y,Z)|X}(y, z \mid x)\, dz = \frac{3\left(1 - \max\{x, y\}\right)^{2}}{(1-x)^{2}(1+2x)},
\]
\[
f_{Z|X}(z \mid x) = \int_0^1 f_{(Y,Z)|X}(y, z \mid x)\, dy = \frac{6\, z(1-z)\, 1\{z \le 1-x\}}{(1-x)^{2}(1+2x)}.
\]
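These conditional densities can be checked by Monte Carlo; the sketch below samples from f(x, y, z) = 40xz by rejection (the density is bounded by 10 on the unit cube) and compares E[X | Z ≈ z₀] with the value 2(1 − z₀)/3 implied by f_{X|Z}. The slice z₀ = 0.4, the slab width, and the sample size are arbitrary choices:

```python
# Sketch: rejection sampling from f(x,y,z) = 40xz on
# {x,y,z >= 0, x+z <= 1, y+z <= 1}, then a check of E[X | Z ~ z0]
# against the analytic value  int_0^{1-z0} x * 2x/(1-z0)^2 dx = 2(1-z0)/3.
import numpy as np

rng = np.random.default_rng(1)
n = 2_000_000
x, y, z, u = rng.random((4, n))

dens = 40 * x * z * ((x + z <= 1) & (y + z <= 1))  # f, zero off the region
keep = u * 10 < dens                               # accept with probability f/10
xs, zs = x[keep], z[keep]

z0 = 0.4
band = np.abs(zs - z0) < 0.01                      # thin slab around Z = z0
print(xs[band].mean())                             # Monte Carlo E[X | Z ~ z0]
print(2 * (1 - z0) / 3)                            # analytic value: 0.4
```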

§9.2 Conditional expectation


9.2.1 Let f_{X|Y}(x|y) be the conditional probability function of X given Y = y. Then, for any function g(·),
\[
E\big[g(X) \mid Y = y\big] =
\begin{cases}
\displaystyle\sum_{x \in X(\Omega)} g(x)\, f_{X|Y}(x \mid y) & \text{(discrete case)}, \\[8pt]
\displaystyle\int_{-\infty}^{\infty} g(x)\, f_{X|Y}(x \mid y)\, dx & \text{(continuous case)}.
\end{cases}
\]

9.2.2 Let ψ(y) = E[g(X) | Y = y], a function of y. The random variable ψ(Y) is usually written as E[g(X) | Y] for brevity, so that E[g(X) | Y = y] is a realisation of E[g(X) | Y] when Y is observed to be y.
9.2.3 E[g(X) | X] = g(X).

9.2.4 X, Y independent ⇒ E[X|Y] = E[X] and E[Y|X] = E[Y].
Proof: For all y, E[X|Y = y] = \int x\, f_{X|Y}(x \mid y)\, dx = \int x\, f_X(x)\, dx = E[X] (discrete case similar).
9.2.5 Proposition. E[E[X|Y]] = E[X].
Proof: Consider the continuous case (discrete case similar).
\[
E\big[E[X|Y]\big] = \int E[X \mid Y = y]\, f_Y(y)\, dy = \int \left( \int x\, f_{X|Y}(x \mid y)\, dx \right) f_Y(y)\, dy
= \int x \left( \int f(x, y)\, dy \right) dx = \int x\, f_X(x)\, dx = E[X].
\]
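A Monte Carlo check of the proposition; the model below (Y ∼ Bernoulli(0.3), X | Y = y ∼ N(2y, 1)) is made up for the sketch:

```python
# Sketch: checking E[E[X|Y]] = E[X] with a made-up two-group model.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
Y = rng.binomial(1, 0.3, size=n)       # Y ~ Bernoulli(0.3)
X = rng.normal(2 * Y, 1)               # X | Y = y ~ Normal(2y, 1)

# psi(y) = E[X | Y = y], estimated within each group of the sample
psi = {y: X[Y == y].mean() for y in (0, 1)}
outer = psi[0] * (Y == 0).mean() + psi[1] * (Y == 1).mean()

print(outer, X.mean())                 # both ~ 2 * 0.3 = 0.6
```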

 
9.2.6 Proposition. For any event A, E[P(A|Y)] = P(A).
Proof: Note that
\[
E\big[1\{A\} \mid Y\big] = 1 \times P\big(1\{A\} = 1 \mid Y\big) + 0 \times P\big(1\{A\} = 0 \mid Y\big) = P(A \mid Y).
\]
Similarly, E[1{A}] = P(A). The result follows by applying Proposition §9.2.5 with X = 1{A}.

9.2.7 Standard properties of E[·] still hold for conditional expectations E[· | Y]:

• X_1 ≥ X_2 given Y ⇒ E[X_1|Y] ≥ E[X_2|Y].
• For any functions α(Y), β(Y) of Y,
\[
E\big[\alpha(Y) X_1 + \beta(Y) X_2 \mid Y\big] = \alpha(Y)\, E[X_1|Y] + \beta(Y)\, E[X_2|Y].
\]
• |E[X|Y]| ≤ E[|X| \mid Y].
• g(X_1, ..., X_n) = h(Y) ⇒ E[g(X_1, ..., X_n) | Y] = h(Y).
• X_1, X_2 conditionally independent given Y ⇒ E[X_1 X_2|Y] = E[X_1|Y] E[X_2|Y].

9.2.8 Concepts derived from E[·] can be extended to a conditional version. For example (a simulation sketch of the variance decomposition follows this list):

• CONDITIONAL VARIANCE

– Var(X|Y) = E[(X − E[X|Y])^2 | Y] = E[X^2|Y] − (E[X|Y])^2.
  Note: Var(X) = E[Var(X|Y)] + Var(E[X|Y]).
– For any functions a(Y), b(Y) of Y, Var(a(Y)X + b(Y) | Y) = a(Y)^2 Var(X|Y).
– Var(X|Y) ≥ 0.
– Var(X|Y) = 0 iff P(X = h(Y) | Y) = 1 for some function h(Y) of Y.
– X_1, ..., X_n conditionally independent given Y ⇒ Var(\sum_i X_i | Y) = \sum_i Var(X_i|Y).

• CONDITIONAL COVARIANCE/CORRELATION COEFFICIENT
\[
\mathrm{Cov}(X_1, X_2 \mid Y) = E\big[(X_1 - E[X_1|Y])(X_2 - E[X_2|Y]) \mid Y\big] = E[X_1 X_2 \mid Y] - E[X_1|Y]\, E[X_2|Y],
\]
\[
\rho(X_1, X_2 \mid Y) = \frac{\mathrm{Cov}(X_1, X_2 \mid Y)}{\sqrt{\mathrm{Var}(X_1|Y)\, \mathrm{Var}(X_2|Y)}}.
\]

• CONDITIONAL QUANTILE
The conditional αth quantile of X given Y is inf{x ∈ ℝ : F_{X|Y}(x|Y) > α}.
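The simulation sketch promised above, checking Var(X) = E[Var(X|Y)] + Var(E[X|Y]); the model (Y uniform on {0, 1, 2}, X | Y = y ∼ N(y, (1 + y)^2)) is made up:

```python
# Sketch: the conditional variance decomposition on a made-up model.
import numpy as np

rng = np.random.default_rng(3)
n = 500_000
Y = rng.integers(0, 3, size=n)             # Y uniform on {0, 1, 2}
X = rng.normal(Y, 1 + Y)                   # X | Y = y ~ Normal(y, (1+y)^2)

w = np.array([(Y == k).mean() for k in range(3)])            # P(Y = k)
cond_mean = np.array([X[Y == k].mean() for k in range(3)])   # E[X | Y = k]
cond_var = np.array([X[Y == k].var() for k in range(3)])     # Var(X | Y = k)

e_var = (w * cond_var).sum()                                 # E[Var(X|Y)]
var_e = (w * cond_mean**2).sum() - (w * cond_mean).sum()**2  # Var(E[X|Y])

print(X.var(), e_var + var_e)   # both ~ 14/3 + 2/3 = 16/3
```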

9.2.9 The results of §9.2.1 to §9.2.8 can be extended to the multivariate case where X is replaced
by (X1 , . . . , Xr ) and Y replaced by (Y1 , . . . , Ys ).

9.2.10 Examples — (cont’d from §9.1.9)

(i) X|N ∼ Binomial(N, p) ⇒ E[X|N] = Np. Then
\[
E\big[E[X|N]\big] = p\, E[N] = p\lambda.
\]
But, unconditionally, X ∼ Poisson(pλ), which implies E[X] = pλ. This confirms E[E[X|N]] = E[X].
(ii) Consider conditional expectations of g(X, Y, Z) = XY Z given X, Y, Z, respectively.
– Given Z:
Since X and Y are conditionally independent given Z,
\[
E[XYZ \mid Z] = Z\, E[XY \mid Z] = Z\, E[X \mid Z]\, E[Y \mid Z]
= Z \int_0^{1-Z} \frac{2x^2}{(1-Z)^2}\, dx \int_0^{1-Z} \frac{y}{1-Z}\, dy = \frac{Z(1-Z)^2}{3}.
\]

– Given Y:
\[
E[XYZ \mid Y] = Y\, E[XZ \mid Y] = Y \int_0^1 \int_0^1 xz\, f_{(X,Z)|Y}(x, z \mid Y)\, dx\, dz
= \frac{24Y}{1 - 4Y^3 + 3Y^4} \int_0^{1-Y} z^2 \int_0^{1-z} x^2\, dx\, dz = \frac{2Y(1-Y)(1 + 3Y + 6Y^2 + 10Y^3)}{15(1 + 2Y + 3Y^2)}.
\]

– Given X:
\[
E[XYZ \mid X] = X\, E[YZ \mid X] = X \int_0^1 \int_0^1 yz\, f_{(Y,Z)|X}(y, z \mid X)\, dy\, dz
= \frac{6X}{(1-X)^2(1+2X)} \int_0^{1-X} z^2 \int_0^{1-z} y\, dy\, dz = \frac{X(1-X)(1 + 3X + 6X^2)}{10(1 + 2X)}.
\]

Recall from Example §7.1.11(c) that, for x, y, z ∈ [0, 1],
\[
f_X(x) = \frac{20}{3}\, x(1-x)^{2}(1+2x), \quad f_Y(y) = \frac{5}{3}\,(1 - 4y^{3} + 3y^{4}), \quad f_Z(z) = 20\, z(1-z)^{3}.
\]
Thus,
\[
E\big[E[XYZ|X]\big] = \int_0^1 E[XYZ \mid X = x]\, f_X(x)\, dx
= \int_0^1 \frac{x(1-x)(1 + 3x + 6x^2)}{10(1+2x)} \cdot \frac{20}{3}\, x(1-x)^{2}(1+2x)\, dx = 5/126.
\]

Similarly,
\[
E\big[E[XYZ|Y]\big] = \int_0^1 \frac{2y(1-y)(1 + 3y + 6y^2 + 10y^3)}{15(1 + 2y + 3y^2)} \cdot \frac{5}{3}\,(1 - 4y^{3} + 3y^{4})\, dy = 5/126,
\]
\[
E\big[E[XYZ|Z]\big] = \int_0^1 \frac{z(1-z)^2}{3} \cdot 20\, z(1-z)^{3}\, dz = 5/126.
\]

As expected, the above results agree with the value derived in Example §8.1.4(iii):
\[
E\big[E[XYZ|X]\big] = E\big[E[XYZ|Y]\big] = E\big[E[XYZ|Z]\big] = E[XYZ] = 5/126.
\]
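The common value can also be confirmed symbolically; a sketch using sympy, integrating xyz · f(x, y, z) = 40x²yz² over the region (x and y each running from 0 to 1 − z):

```python
# Sketch: E[XYZ] = 5/126 by direct symbolic integration with sympy.
from sympy import symbols, integrate

x, y, z = symbols('x y z', nonnegative=True)
# integrate 40*x^2*y*z^2 over x in [0, 1-z], y in [0, 1-z], z in [0, 1]
print(integrate(40 * x**2 * y * z**2,
                (x, 0, 1 - z), (y, 0, 1 - z), (z, 0, 1)))   # 5/126
```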

9.2.11 Proposition. For a random variable X and an event A,
\[
E\big[X\, 1\{A\}\big] = E[X|A]\, P(A).
\]
Proof: Consider
\[
E\big[X\, 1\{A\}\big] = E\Big[ E\big[X\, 1\{A\} \mid 1\{A\}\big] \Big] = E\Big[ 1\{A\}\, E\big[X \mid 1\{A\}\big] \Big]
= E\big[X \mid 1\{A\} = 1\big]\, P\big(1\{A\} = 1\big) = E[X|A]\, P(A).
\]

9.2.12 Proposition. Suppose Ω = A_1 ∪ A_2 ∪ · · ·, where the A_j's are mutually exclusive. Then
\[
E[X] = E[X|A_1]\, P(A_1) + E[X|A_2]\, P(A_2) + \cdots.
\]
The expectation of X can be treated as a weighted average of the conditional expectations of X given disjoint sectors of the sample space. The weights are determined by the probabilities of the sectors. The special case X = 1{B} reduces to the "law of total probability".
Proof: Clearly, X = X 1{A_1} + X 1{A_2} + · · ·. The result follows from Proposition §9.2.11.

9.2.13 Example. A person is randomly selected from an adult population and his/her height X
measured. It is known that the mean height of a man is 1.78m, and that of a woman is 1.68m.
Men account for 48% of the population. Calculate the mean height of the adult population,
E[X].
Answer:
\[
E[X] = E[X \mid \text{man}]\, P(\text{man}) + E[X \mid \text{woman}]\, P(\text{woman}) = 1.78\,\text{m} \times 0.48 + 1.68\,\text{m} \times 0.52 = 1.728\,\text{m}.
\]
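The same weighted average as a two-line check:

```python
# The calculation of 9.2.13: E[X] = sum over sectors of E[X|sector] * P(sector).
mean_height = {"man": 1.78, "woman": 1.68}
prob = {"man": 0.48, "woman": 0.52}
print(sum(mean_height[g] * prob[g] for g in prob))   # 1.728
```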

§9.3 *** More challenges ***


9.3.1 Let X and Y be jointly continuous random variables with joint density function
\[
f(x, y) = \begin{cases}
C\, y, & x, y \ge 0,\; x + y \le 1, \\
C\, (x + y)^{-3}, & x, y \ge 0,\; x + y > 1, \\
0, & \text{otherwise},
\end{cases}
\]
for some constant C > 0.

(a) Find C.
(b) Find the marginal pdf’s of X and Y .
(c) Find the conditional pdf's f_{X|Y} and f_{Y|X}.

9.3.2 Let X_1, X_2, ... be a sequence of independent and identically distributed random variables with a common mean µ and common variance σ^2. Let N be a random positive integer, independent of X_1, X_2, ..., with E[N] = ν and Var(N) = τ^2. Define S = \sum_{i=1}^{N} X_i, e(n) = E[S|N = n] and v(n) = Var(S|N = n), for n = 1, 2, ....

(a) Write down explicit expressions for the functions e(n) and v(n).
(b) Show that e(N ) has mean µν and variance µ2 τ 2 .
(c) Show that v(N ) has mean σ 2 ν.
(d) Deduce from (b) and (c), or otherwise, that S has mean µν and variance µ^2 τ^2 + σ^2 ν. (A simulation sketch follows this list.)
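The simulation sketch for (d) mentioned above; the distributional choices X_i ∼ Exp(1) (so µ = σ^2 = 1) and N ∼ Geometric(0.2) on {1, 2, ...} (so ν = 5, τ^2 = 20) are made up, giving E[S] = 5 and Var(S) = 25:

```python
# Sketch: a random sum S = X_1 + ... + X_N with N independent of the X_i.
import numpy as np

rng = np.random.default_rng(4)
trials = 200_000
N = rng.geometric(0.2, size=trials)  # N on {1,2,...}: nu = 5, tau^2 = 20
S = rng.standard_gamma(N)            # sum of n iid Exp(1) draws is Gamma(n, 1)

print(S.mean())   # ~ mu * nu = 5
print(S.var())    # ~ mu^2 * tau^2 + sigma^2 * nu = 20 + 5 = 25
```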

9.3.3 Dreydel is an ancient game played by Jews at the Chanukah festival. It can be played by any number, m say, of players. The players take turns to spin a four-sided top — the dreydel — whose sides are marked with the letters N, G, H and S. Before the game starts, each player contributes one unit to the pot, which thus contains m units. Depending on the outcome of his spin, the spinning player

• receives no payoff if N turns up,
• receives the entire pot if G turns up,
• receives half the pot if H turns up,
• contributes 1 unit to the pot if S turns up.

If G turns up, all m players must each contribute one unit to the pot to start the game again.

(a) Show that in the long run, the pot contains 2(m + 1)/3 units on average. (A simulation sketch appears after part (b).)
(b) Is Dreydel a fair game, i.e. does no player have an advantage over the others?
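The simulation sketch for (a) referred to above. Modelling assumptions (not stated in the problem): one spin per time step, H takes exactly half the pot with fractional units allowed, and after G the m players immediately restock the pot with m units:

```python
# Sketch: long-run average pot in Dreydel, to compare with 2(m+1)/3.
import numpy as np

rng = np.random.default_rng(5)
m, spins = 4, 1_000_000
pot, total = float(m), 0.0

for outcome in rng.integers(0, 4, size=spins):  # 0=N, 1=G, 2=H, 3=S
    total += pot                  # pot as seen by the player about to spin
    if outcome == 1:              # G: spinner takes the pot; all m restart it
        pot = float(m)
    elif outcome == 2:            # H: spinner takes half the pot
        pot /= 2
    elif outcome == 3:            # S: spinner adds one unit
        pot += 1                  # (N: pot unchanged)

print(total / spins)              # ~ 2*(m+1)/3 = 10/3 for m = 4
```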

9.3.4 A mysterious organism is capable of resurrections, so that it can start a new life immediately after death and repeat this cycle indefinitely. Let X_i be the duration of the ith life of the organism, so that S_n = \sum_{i=1}^{n} X_i gives the time of its nth resurrection. Define S_0 = 0 by convention. Assume that X_1, X_2, ... are independent unit-rate exponential random variables with the density function
\[
f(x) = e^{-x}\, 1\{x > 0\}.
\]

(a) Find the mean lifetime, that is E[X1 ], of the organism.

64
(b) It is known that S_n has the Gamma(n, 1) density function
\[
g_n(x) = \frac{x^{n-1} e^{-x}}{(n-1)!}\, 1\{x > 0\}.
\]
Show that \sum_{n=1}^{\infty} g_n(x) = 1 for x > 0.
(c) Suppose that the organism is living its N th life at time t, so that N is a positive random
integer.
(i) Show that
\[
P(X_N \le x) = \sum_{n=1}^{\infty} P(X_n \le x,\; S_{n-1} < t \le S_n).
\]

(ii) Deduce from (i) that
\[
P(X_N \le x) = P(t \le X_1 \le x) + \sum_{n=2}^{\infty} \int_0^t P(t - s \le X_n \le x)\, g_{n-1}(s)\, ds.
\]

(iii) Deduce from (b) and (c)(ii) that X_N has the density function
\[
h(x) = x e^{-x}\, 1\{0 < x \le t\} + (1 + t)\, e^{-x}\, 1\{x > t\}.
\]

(iv) Show that E[X_N] = 2 − e^{-t}.
[Hint: You may find the following integrals useful:
\[
\int_0^u x e^{-x}\, dx = 1 - (1 + u)e^{-u} \quad \text{and} \quad \int_0^u x^2 e^{-x}\, dx = 2 - (2 + 2u + u^2)e^{-u},
\]
for any u > 0.]


(v) Do your answers to (a) and (c)(iv) contradict each other? Explain.
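A simulation sketch bearing on (c)(iv) and (v): find the life straddling time t and average its duration. t = 1.5 is an arbitrary choice; the observed mean should be close to 2 − e^{−t} ≈ 1.777 rather than E[X_1] = 1:

```python
# Sketch: the duration X_N of the life in progress at time t (9.3.4(c)).
import numpy as np

rng = np.random.default_rng(6)
t, trials = 1.5, 200_000
durations = []

for _ in range(trials):
    s = 0.0                        # time of the most recent resurrection
    while True:
        x = rng.exponential(1.0)   # next lifetime, Exp(1)
        if s + x >= t:             # this life straddles t: S_{n-1} < t <= S_n
            durations.append(x)
            break
        s += x

print(np.mean(durations))          # Monte Carlo E[X_N]
print(2 - np.exp(-t))              # analytic value ~ 1.7769
```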
