
Slide set 8: Conditional expectation

∙ Conditioning on an event
∙ Law of total expectation
∙ Conditional expectation as a r.v.
∙ Iterated expectation
∙ Conditional variance
∙ MSE estimation
∙ Quantization
∙ Summary

© Copyright Abbas El Gamal


Conditioning on an event

∙ Let X ∼ p_X(x) be a r.v. and A ⊂ ℝ be a nonzero probability event.
  The conditional pmf of X given {X ∈ A} is defined as

    p_{X|A}(x) = P{X = x | X ∈ A} = P{X = x, X ∈ A} / P{X ∈ A}
               = p_X(x) / P{X ∈ A}  for x ∈ A,  and 0 otherwise

  Note that p_{X|A}(x) is a pmf on X: it is ≥ 0 and sums to 1

∙ Similarly, for X ∼ f_X(x), the conditional pdf of X given {X ∈ A} is defined as

    f_{X|A}(x) = f_X(x) / P{X ∈ A}  for x ∈ A,  and 0 otherwise,

  and it is a pdf on X

∙ For (X, Y) ∼ f_{X,Y}(x, y) and A ⊂ ℝ², we can similarly define

    f_{X,Y|A}(x, y) = f_{X,Y}(x, y) / P{(X, Y) ∈ A}  for (x, y) ∈ A,  and 0 otherwise

  (see HW problem)
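As a small numerical illustration of the conditional pmf definition, here is a hypothetical fair-die computation (the example and code are illustrative, not from the slides):

```python
import numpy as np

# Conditional pmf of a fair die X given A = {X is even}:
# p_{X|A}(x) = p_X(x)/P{X in A} for x in A, and 0 otherwise.
support = np.arange(1, 7)
p_X = np.full(6, 1 / 6)

in_A = support % 2 == 0                # the event A = {2, 4, 6}
p_A = p_X[in_A].sum()                  # P{X in A} = 1/2

p_X_given_A = np.where(in_A, p_X / p_A, 0.0)
print(p_X_given_A)                     # [0, 1/3, 0, 1/3, 0, 1/3]
print(p_X_given_A.sum())               # 1.0 -- a valid pmf
```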

 / 
Conditioning on an event

∙ Example: Let X ∼ Exp(λ) and A = (a, ∞) for some constant a > 0;
  find the conditional pdf of X given {X ∈ A}

  The conditional pdf is

    f_{X|A}(x) = λe^{−λx} / P{X > a}  for x > a,  and 0 otherwise

  Since X ∼ Exp(λ), P{X > a} = e^{−λa}, hence

    f_{X|A}(x) = λe^{−λ(x−a)}  for x > a,  and 0 otherwise

  That is, given {X > a}, X − a ∼ Exp(λ); this is the memoryless property
  of the exponential distribution

 / 
Conditional expectation
∙ Since f_{X|A}(x) is a pdf on X, we can define the conditional expectation
  of a function g(X) given A as

    E(g(X) | A) = ∫_{−∞}^{∞} g(x) f_{X|A}(x) dx

∙ Example: Find E(X | A) and E(X² | A) for the previous example

  Recall that the mean and second moment of X ∼ Exp(λ) are

    E(X) = ∫_0^∞ x λe^{−λx} dx = 1/λ,    E(X²) = ∫_0^∞ x² λe^{−λx} dx = 2/λ²

  The mean and second moment conditioned on A = (a, ∞) are

    E(X | A) = ∫_a^∞ x λe^{−λ(x−a)} dx = ∫_0^∞ (y + a) λe^{−λy} dy = 1/λ + a,
    where y = x − a

    E(X² | A) = ∫_a^∞ x² λe^{−λ(x−a)} dx = ∫_0^∞ (y + a)² λe^{−λy} dy
              = ∫_0^∞ y² λe^{−λy} dy + 2a ∫_0^∞ y λe^{−λy} dy + a² ∫_0^∞ λe^{−λy} dy
              = 2/λ² + 2a/λ + a²
 / 
Law of total expectation
∙ Let X ∼ f_X(x) and A_1, A_2, . . . , A_n ⊂ ℝ be disjoint, nonzero probability events
  such that Σ_{i=1}^n P{X ∈ A_i} = 1. Then

    E(g(X)) = Σ_{i=1}^n P{X ∈ A_i} E(g(X) | A_i)

∙ Proof: By the law of total probability, f_X(x) = Σ_{i=1}^n P{X ∈ A_i} f_{X|A_i}(x), therefore

    E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx
            = ∫_{−∞}^{∞} g(x) Σ_{i=1}^n P{X ∈ A_i} f_{X|A_i}(x) dx
            = Σ_{i=1}^n P{X ∈ A_i} ∫_{−∞}^{∞} g(x) f_{X|A_i}(x) dx
            = Σ_{i=1}^n P{X ∈ A_i} E(g(X) | A_i)

∙ This law can be very useful in computing expectations


 / 
Mean and variance of piecewise uniform r.v.

∙ A piecewise uniform pdf is commonly used to estimate or approximate a pdf

∙ Example: Let X be piecewise uniform with the pdf f_X(x) shown in the figure.
  Find the mean and variance of X

  [Figure: piecewise-constant pdf f_X(x) over [0, 3]]

∙ Let A_1 = [0, 1], A_2 = (1, 2], A_3 = (2, 3]. They are disjoint, have nonzero
  probabilities, and their probabilities sum to 1; f_{X|A_i}(x), i = 1, 2, 3,
  are Unif[0, 1], Unif(1, 2], Unif(2, 3], respectively

  Recall that for Y ∼ Unif[a, b], E(Y) = (a + b)/2 and E(Y²) = (b³ − a³)/(3(b − a))

  Using the law of total expectation with p_i = P{X ∈ A_i} read off the pdf,
  the mean and second moment of X are

    E(X) = Σ_{i=1}^3 p_i E(X | A_i) = p_1 ⋅ 1/2 + p_2 ⋅ 3/2 + p_3 ⋅ 5/2

    E(X²) = Σ_{i=1}^3 p_i E(X² | A_i) = p_1 ⋅ 1/3 + p_2 ⋅ 7/3 + p_3 ⋅ 19/3

  Thus the variance of X is Var(X) = E(X²) − (E(X))²
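A numerical version of this example; since the pdf figure is unavailable, the weights p below are an assumed illustration:

```python
import numpy as np

# Mean and variance of a piecewise uniform r.v. via the law of total
# expectation. The weights p = [0.5, 0.25, 0.25] are illustrative only.
p = np.array([0.5, 0.25, 0.25])            # p_i = P{X in A_i}, must sum to 1
edges = [(0, 1), (1, 2), (2, 3)]           # A_1 = [0,1], A_2 = (1,2], A_3 = (2,3]

# For Y ~ Unif[a, b]: E(Y) = (a+b)/2, E(Y^2) = (b^3 - a^3)/(3(b - a))
m1 = np.array([(a + b) / 2 for a, b in edges])
m2 = np.array([(b**3 - a**3) / (3 * (b - a)) for a, b in edges])

EX, EX2 = p @ m1, p @ m2
print(EX, EX2 - EX**2)                     # mean and variance of X
```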
 / 
Finding expectation of a mixed r.v.
∙ Let X be a mixed r.v. How do we find E(g(X))?

∙ We express X as

    X = continuous r.v. Y with prob. 1 − p,  or  discrete r.v. Z with prob. p

∙ Let 𝒵 be the set of points at which F_X(x) is discontinuous, and define
  p = Σ_{x ∈ 𝒵} P{X = x}

∙ By the law of total probability,

    F_X(x) = (1 − p) P{X ≤ x | X ∉ 𝒵} + p P{X ≤ x | X ∈ 𝒵}
           = (1 − p) F_Y(x) + p F_Z(x)

  [Figure: a mixed cdf F_X(x) with jumps, decomposed into a continuous part
  (1 − p) F_Y(x) and a staircase part p F_Z(x)]

∙ To find E(g(X)), we use the law of total expectation:

    E(g(X)) = (1 − p) E(g(X) | X ∉ 𝒵) + p E(g(X) | X ∈ 𝒵)
            = (1 − p) E_Y(g(Y)) + p E_Z(g(Z)),

  where f_Y(y) = dF_Y(y)/dy, and p_Z(z) is given by the steps of F_Z(z)
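A minimal numeric sketch of this decomposition, using a made-up mixed r.v. (X = 0 with probability p = 0.3, else X ∼ Exp(1)) and g(x) = x²:

```python
import numpy as np

# E(g(X)) for a mixed r.v. via the law of total expectation:
#   E(X^2) = (1 - p) E(Y^2) + p E(Z^2) = 0.7 * 2 + 0.3 * 0 = 1.4,
# since E(Y^2) = 2 for Y ~ Exp(1) and Z = 0 here.
rng = np.random.default_rng(1)
p = 0.3
exact = (1 - p) * 2.0 + p * 0.0

# Monte Carlo check: mix the discrete and continuous parts.
discrete = rng.random(1_000_000) < p
x = np.where(discrete, 0.0, rng.exponential(1.0, size=1_000_000))
print(exact, (x**2).mean())
```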
 / 
Conditional expectation as a r.v.

∙ Let (X, Y) ∼ f_{X,Y}(x, y). We defined the conditional pdf of X given {Y = y} as

    f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y),   f_Y(y) ≠ 0

∙ We know that f_{X|Y}(x|y) is a pdf for X (for any given y), so we can define
  the expectation of any function g(X, Y) with respect to f_{X|Y}(x|y) as

    E(g(X, Y) | Y = y) = ∫_{−∞}^{∞} g(x, y) f_{X|Y}(x|y) dx

∙ In particular, for g(X, Y) = X, the conditional expectation of X given {Y = y} is

    E(X | Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx

∙ We can similarly define this conditional expectation for two discrete r.v.s and
  for one discrete and one continuous r.v.

 / 
Example
∙ Let

    f_{X,Y}(x, y) = 2  for x, y ≥ 0, x + y ≤ 1,  and 0 otherwise

  Find E(X | Y = y) and E(XY | Y = y)

∙ We already know (derived in an earlier slide set) that

    f_Y(y) = 2(1 − y)  for 0 ≤ y ≤ 1,  and 0 otherwise

  Hence,

    f_{X|Y}(x|y) = f_{X,Y}(x, y) / f_Y(y) = 1/(1 − y)  for x, y ≥ 0, x + y < 1,
    and 0 otherwise

  i.e., X | {Y = y} ∼ Unif[0, 1 − y]

  Thus

    E(X | Y = y) = ∫_0^{1−y} x ⋅ 1/(1 − y) dx = (1 − y)/2,   0 ≤ y < 1

  Now to find E(XY | Y = y), note that

    E(XY | Y = y) = E(X ⋅ y | Y = y) = y E(X | Y = y) = y(1 − y)/2,   0 ≤ y < 1
 / 
Conditional expectation as a r.v.
∙ Since the conditional expectation E(g(X, Y) | Y = y) is a function of y,
  we can define the random variable E(g(X, Y) | Y) as a function of Y
∙ In particular, the r.v. E(X | Y) is the conditional expectation of X given Y
∙ For the previous example, find the pdf of E(X | Y) = (1 − Y)/2
∙ The pdf of Y is f_Y(y) = 2(1 − y) for 0 ≤ y ≤ 1, and 0 otherwise

  We want to find the pdf of Z = E(X | Y) = (1 − Y)/2

  We use the formula for the pdf of a linear function Z = aY + b,

    f_Z(z) = (1/|a|) f_Y((z − b)/a)  with a = −1/2, b = 1/2
           = 2 × 2(1 − (z − 1/2)/(−1/2))
           = 8z  for 0 < z ≤ 1/2,  and 0 otherwise

  [Figure: f_Z(z) = 8z on (0, 1/2]]
 / 
Mean of conditional expectation

∙ Since E(g(X, Y) | Y) is a r.v. that is a function of Y, we can define its expectation as

    E_Y[E(g(X, Y) | Y)] = ∫_{−∞}^{∞} E(g(X, Y) | Y = y) f_Y(y) dy

∙ Example: Consider our running example with E(X | Y) = (1 − Y)/2

  We know that f_Y(y) = 2(1 − y), 0 ≤ y ≤ 1, so we can find the expectation as

    E_Y[E(X | Y)] = E_Y((1 − Y)/2) = ∫_0^1 ((1 − y)/2) ⋅ 2(1 − y) dy
                  = ∫_0^1 (1 − y)² dy = 1/3

∙ We also know that f_X(x) = 2(1 − x) for 0 ≤ x ≤ 1, hence

    E(X) = ∫_0^1 x ⋅ 2(1 − x) dx = 1 − 2/3 = 1/3

∙ Hence for this example, E_Y[E(X | Y)] = E(X)
∙ It turns out that this equality holds for all r.v.s (X, Y)!
 / 
Iterated expectation

∙ Let (X, Y) be r.v.s and g(x, y) be a function. Then

    E_Y[E_X(g(X, Y) | Y)] = E(g(X, Y)),

  where E_X is expectation w.r.t. f_{X|Y}(x|y)

∙ Proof: To show this, consider

    E_Y[E_X(g(X, Y) | Y)] = ∫_{−∞}^{∞} E_X(g(X, Y) | Y = y) f_Y(y) dy
                          = ∫_{−∞}^{∞} ( ∫_{−∞}^{∞} g(x, y) f_{X|Y}(x|y) dx ) f_Y(y) dy
                          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f_{X,Y}(x, y) dx dy = E(g(X, Y))

∙ This result can be very useful in computing expectations

 / 
Examples

1. A coin has a random bias P with f_P(p) = 2(1 − p) for 0 ≤ p ≤ 1.
   Flip it n times independently. Let N be the number of heads; find E(N)

   Of course we could first find the pmf of N, then find its expectation.
   Using iterated expectation, it's much easier to find the expectation of N:

     E(N) = E_P[E_N(N | P)] = E_P(nP) = n ∫_0^1 2(1 − p) p dp = n/3

2. Let E(X | Y) = Y² and Y ∼ Unif[0, 1]; find E(X)

   Here we cannot first find the pdf of X, since we do not know f_{X|Y}(x|y).
   However, using iterated expectation we can easily find E(X):

     E(X) = E_Y[E_X(X | Y)] = ∫_0^1 y² dy = 1/3
 / 
Recap

∙ Let (X, Y) ∼ fX,Y (x, y) and fX|Y (x|y) be the conditional pdf of X given Y
∙ Using fX|Y (x|y), we can define the conditional expectation of g(X) given {Y = y}
as E(g(X) | Y = y)
∙ The conditional expectation E(g(X) | Y) is a r.v. that takes values E(g(X) | Y = y)
∙ Iterated expectation: EY [E(g(X) | Y)] = E(g(X))
∙ In particular, for g(X) = X, the conditional expectation of X given Y is E(X | Y)
and EY [E(X | Y)] = E(X)
∙ Up next:
  ▸ Conditional variance as a r.v. and its expectation
  ▸ Application: mean squared error estimation

 / 
Conditional variance

∙ Let (X, Y) be r.v.s. The conditional variance of X given {Y = y} is

    Var(X | Y = y) = E[(X − E(X | Y = y))² | Y = y]
                   = E(X² | Y = y) − [E(X | Y = y)]²

∙ Define the r.v. Var(X | Y) – a function of Y that takes values Var(X | Y = y) – as

    Var(X | Y) = E[(X − E(X | Y))² | Y] = E(X² | Y) − [E(X | Y)]²

∙ Example: Let f_{X|Y}(x|y) = 1/(1 − y) for x, y ≥ 0, x + y < 1; find Var(X | Y)

  We already know that E(X | Y) = (1 − Y)/2. Now consider

    E(X² | Y = y) = ∫_0^{1−y} x²/(1 − y) dx = (1 − y)²/3

  Hence,

    Var(X | Y) = (1 − Y)²/3 − (1 − Y)²/4 = (1 − Y)²/12

 / 
Conditional variance
∙ The expected value of Var(X | Y) can be computed as

    E_Y[Var(X | Y)] = E_Y[E(X² | Y) − (E(X | Y))²] = E(X²) − E[(E(X | Y))²]   (1)

∙ For our example, f_Y(y) = 2(1 − y), 0 ≤ y ≤ 1, and f_X(x) = 2(1 − x), 0 ≤ x ≤ 1, hence

    E[Var(X | Y)] = E(X²) − E[(E(X | Y))²]
                  = ∫_0^1 x² ⋅ 2(1 − x) dx − ∫_0^1 ((1 − y)²/4) ⋅ 2(1 − y) dy
                  = 1/6 − 1/8 = 1/24

∙ Compare to Var(X) = E(X²) − [E(X)]² = 1/6 − 1/9 = 1/18 ≥ E[Var(X | Y)]

∙ Since E(X | Y) is a r.v., it has a variance

    Var(E(X | Y)) = E_Y[(E(X | Y) − E[E(X | Y)])²] = E[(E(X | Y))²] − (E(X))²   (2)

∙ Law of conditional variances: adding (1) and (2), we have

    E(Var(X | Y)) + Var(E(X | Y)) = Var(X)

  Thus in general, Var(X) ≥ E[Var(X | Y)]
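A Monte Carlo check of these numbers for the triangle-density running example (the rejection sampler is an added detail):

```python
import numpy as np

# Law of conditional variances for the triangle density:
# E[Var(X|Y)] = 1/24, Var(E(X|Y)) = 1/72, and their sum is Var(X) = 1/18.
rng = np.random.default_rng(5)
pts = rng.random((4_000_000, 2))
x, y = pts[pts.sum(axis=1) <= 1].T          # uniform on the triangle

cond_mean = (1 - y) / 2                     # E(X | Y)
cond_var = (1 - y) ** 2 / 12                # Var(X | Y)

print(cond_var.mean(), 1 / 24)              # E[Var(X | Y)]
print(cond_mean.var(), 1 / 72)              # Var(E(X | Y))
print(x.var(), 1 / 18)                      # Var(X)
```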
 / 
MSE Estimation

∙ Again consider the signal estimation problem

    X → Sensor → Y → Estimator → X̂

∙ We wish to find the estimate X̂ that minimizes the mean squared error

    MSE = E[(X − X̂)²]

∙ The X̂ that achieves the minimum MSE is called the MMSE estimate of X given Y
∙ Note that in general the MMSE estimate is nonlinear, and its MSE is ≤ the MSE
  of the linear MMSE estimate, i.e., it is a better estimate

 / 
MMSE Estimate
∙ Theorem: The MMSE estimate of X given the observation Y is X̂ = E(X | Y),
  and its MSE is

    MMSE = E[(X − E(X | Y))²] = E[E((X − E(X | Y))² | Y)] = E_Y[Var(X | Y)]

∙ Computing the above requires knowledge of the distribution of (X, Y)
∙ In contrast, the linear MMSE estimate requires knowledge only of the means,
  variances, and covariance, which are far easier to estimate from data
  (see the sketch after this list)
∙ Properties of the MMSE estimate:
  ▸ Since by iterated expectation E_Y[E(X | Y)] = E(X), the MMSE estimate is unbiased
  ▸ If X and Y are independent, then the MMSE estimate is E(X)
  ▸ The conditional expectation of the estimation error for any Y = y is

      E(X − X̂ | Y = y) = E(X | Y = y) − E(X̂ | Y = y)
                        = E(X | Y = y) − E(X | Y = y) = 0,

    i.e., the error is unbiased for every Y = y
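As a brief sketch of the remark about the linear MMSE estimate being easy to fit from data: the code below estimates X̂ = aY + b from sample moments of synthetic data (all values illustrative):

```python
import numpy as np

# Linear MMSE from moments alone: a = Cov(X, Y)/Var(Y), b = E(X) - a E(Y).
# Synthetic data: X ~ N(0, 1), Y = X + noise.
rng = np.random.default_rng(6)
x = rng.normal(size=100_000)
y = x + 0.5 * rng.normal(size=100_000)

a = np.cov(x, y)[0, 1] / y.var()
b = x.mean() - a * y.mean()
print(a, b)                                  # ≈ 0.8, 0.0 for this setup
print(np.mean((x - (a * y + b)) ** 2))       # its MSE, ≈ (1 − ρ²)σ_X² = 0.2
```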
 / 
Proof of theorem

∙ Recall that min_b E[(X − b)²] = Var(X), achieved for b = E(X)   (∗)
∙ We will use this result to show that E(X | Y) is the MMSE estimate of X given Y
∙ First we use iterated expectation to write

    min_{X̂(Y)} E[(X − X̂(Y))²] = min_{X̂(Y)} E_Y[E_X((X − X̂(Y))² | Y)]
                               = min_{X̂(Y)} Σ_y E_X((X − X̂(y))² | Y = y) p_Y(y)
                               = Σ_y min_{X̂(y)} [E_X((X − X̂(y))² | Y = y)] p_Y(y),

  since p_Y(y) ≥ 0 for every y

∙ From (∗), it follows that min_{X̂(y)} E_X[(X − X̂(y))² | Y = y] = Var(X | Y = y),
  achieved for X̂(y) = E(X | Y = y)
∙ Hence, the MMSE estimate is X̂(Y) = E(X | Y) and its MSE is E_Y[Var(X | Y)]

 / 
Example

∙ Let the service time of a packet be X | {Λ = λ} ∼ Exp(λ), where Λ ∼ Unif[0, 1].
  Find the MMSE estimate of Λ given the observation X
∙ The MMSE estimate is E(Λ | X), so we need to find the conditional pdf

    f_{Λ|X}(λ|x) = f_{X|Λ}(x|λ) f_Λ(λ) / ∫_0^1 f_{X|Λ}(x|u) f_Λ(u) du
                 = λe^{−λx} / ∫_0^1 u e^{−ux} du
                 = x² λe^{−λx} / (1 − (x + 1)e^{−x}),   0 ≤ λ ≤ 1, x ≥ 0

  Hence,

    E(Λ | X = x) = x² / (1 − (x + 1)e^{−x}) ⋅ ∫_0^1 λ² e^{−λx} dλ
                 = x² / (1 − (x + 1)e^{−x}) ⋅ (2 − (x² + 2x + 2)e^{−x}) / x³
                 = (2 − (x² + 2x + 2)e^{−x}) / (x (1 − (x + 1)e^{−x})),   x ≥ 0
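A sanity check of this closed form against direct numerical integration of the posterior moments (the test points are illustrative):

```python
import numpy as np
from scipy.integrate import quad

# E(Λ | X = x) as derived above, versus the ratio of posterior integrals
# ∫ λ² e^{-λx} dλ / ∫ λ e^{-λx} dλ over [0, 1].
def mmse(x):
    num = 2 - (x**2 + 2 * x + 2) * np.exp(-x)
    den = x * (1 - (x + 1) * np.exp(-x))
    return num / den

for x in [0.5, 2.0, 10.0]:
    num, _ = quad(lambda lam: lam**2 * np.exp(-lam * x), 0, 1)
    den, _ = quad(lambda lam: lam * np.exp(-lam * x), 0, 1)
    print(x, num / den, mmse(x))            # the two values should agree
```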
 / 
Example: MMSE versus linear MMSE estimates

  [Figure: MMSE and linear MMSE (LMMSE) estimates of Λ plotted versus the
  observation x, for 0 ≤ x ≤ 20; the vertical axis runs from 0 to 1.5]
 / 
Constant versus Linear versus MSE estimation

                     Constant              Linear                               MSE
                     MSE estimation        MSE estimation                       estimation

Signal               X                     X                                    X
Observation          none                  Y                                    Y
Problem              min_b E[(X − b)²]     min_{X̂ = aY + b} E[(X − X̂)²]       min_{X̂(Y)} E[(X − X̂)²]
Optimal estimate     E(X)                  ρ_{X,Y} σ_X (Y − E(Y))/σ_Y + E(X)    E(X | Y)
MMSE                 Var(X)                (1 − ρ²_{X,Y}) σ_X²                  E_Y[Var(X | Y)]
Information needed   E(X), Var(X)          E(X), E(Y), σ_X, σ_Y, ρ_{X,Y}        joint distribution of (X, Y)
Computation          trivial               efficient                            (very) hard
 / 
Quantization

∙ Real-world signals (music, speech, . . . ) are analog (continuous) waveforms

∙ To store or transmit them digitally, the waveform is first sampled
∙ Each sample X is then quantized and reproduced using a quantization system

    X → Encoder → I → Decoder → X̂

∙ The encoder (quantizer) assigns an index I ∈ {0, 1, . . . , 2^k − 1} to each x as follows:
  ▸ The real line is divided into 2^k intervals (a_i, a_{i+1}], i = 0, . . . , 2^k − 1,
    where a_0 = −∞, a_{2^k} = ∞, and −∞ < a_1 < ⋯ < a_{2^k − 1} < ∞
  ▸ A k-bit index I = i is then assigned to every x ∈ (a_i, a_{i+1}]
∙ The decoder maps I into an estimate X̂ ∈ {x̂_0, x̂_1, . . . , x̂_{2^k − 1}},
  where x̂_i ∈ (a_i, a_{i+1}]

  [Figure: the thresholds a_1, . . . , a_{2^k − 1} partition the real line, with one
  reproduction point x̂_i in each interval (a_i, a_{i+1}] and index I = i assigned to it]

 / 
Quantization

∙ We choose the {a_i} and {x̂_i} that minimize MSE = E[(X − X̂(I))²]
∙ From the MSE estimation result, we know that X̂ = E(X | I) minimizes the MSE
∙ For I = i, from our earlier discussion of conditioning on an event,

    x̂_i = E(X | I = i) = E(X | X ∈ (a_i, a_{i+1}])
         = ∫_{a_i}^{a_{i+1}} x f_X(x) / P{X ∈ (a_i, a_{i+1}]} dx
         = ∫_{a_i}^{a_{i+1}} x f_X(x) / (F_X(a_{i+1}) − F_X(a_i)) dx

  So, if you know the {a_i}, you can find the optimal {x̂_i} using the above formula

 / 
1-bit quantizer of a Gaussian sample

∙ Let X ∼ N(0, σ²), k = 1, and divide the line into the two intervals (−∞, 0], (0, ∞)

  [Figure: Gaussian pdf f_X(x) split at 0, with reproduction points x̂_0 < 0 < x̂_1]

∙ Hence, I = 0 if X ≤ 0 and I = 1 if X > 0
∙ The estimate for I = 0 is

    x̂_0 = ∫_{−∞}^0 x f_X(x) / (F_X(0) − F_X(−∞)) dx = ∫_{−∞}^0 x f_X(x) / 0.5 dx
         = (2/√(2πσ²)) ∫_{−∞}^0 x e^{−x²/(2σ²)} dx = −√(2/π) σ

∙ Similarly, for I = 1, x̂_1 = √(2/π) σ
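A quick Monte Carlo confirmation of these reproduction points, with the illustrative choice σ = 1:

```python
import numpy as np

# 1-bit quantizer of X ~ N(0, 1): the conditional means of each half-line
# should equal ±sqrt(2/π) ≈ ±0.7979.
rng = np.random.default_rng(7)
x = rng.normal(size=2_000_000)

print(x[x > 0].mean(), np.sqrt(2 / np.pi))    # x̂_1 = E(X | X > 0)
print(x[x <= 0].mean(), -np.sqrt(2 / np.pi))  # x̂_0 = E(X | X ≤ 0)
```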
 / 
Problem  of HW 

∙ Finding the optimal {ai }, {̂xi } for a given k is computationally difficult in general
∙ In HW , you will explore an iterative procedure called the Lloyd algorithm
󳶳 Fix an initial set {ai } and find the set {̂xi } using the formula
a i+
x fX (x)
x̂ i = 󵐐 dx
ai FX (ai+ ) − FX (ai )
󳶳 Then fix the resulting {̂xi } and find the {ai }s that minimize the MSE, specifically
x̂ i− + x̂ i
ai =

󳶳 Repeat the above procedure until your MSE doesn’t change much
∙ This procedure is not guaranteed to minimize the MSE
∙ Quantization is closely related to clustering, which you’ll also explore in HW 
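Here is a minimal sketch of the Lloyd algorithm for X ∼ N(0, 1); the initialization, stopping tolerance, and the closed-form Gaussian centroid step are illustrative choices, not the HW's prescription:

```python
import numpy as np
from scipy.stats import norm

# Lloyd algorithm for a k-bit quantizer of X ~ N(0, 1).
k = 2
levels = 2**k

xhat = np.linspace(-2, 2, levels)                 # initial reproduction points
for _ in range(100):
    # Thresholds: midpoints between adjacent reproduction points.
    a = np.concatenate(([-np.inf], (xhat[:-1] + xhat[1:]) / 2, [np.inf]))
    # Centroids: for the standard normal, ∫ x φ(x) dx = φ(a_i) − φ(a_{i+1}).
    mass = norm.cdf(a[1:]) - norm.cdf(a[:-1])
    xhat_new = (norm.pdf(a[:-1]) - norm.pdf(a[1:])) / mass
    if np.allclose(xhat_new, xhat, atol=1e-10):
        break
    xhat = xhat_new

print(xhat)   # with k = 1 this recovers ±sqrt(2/π) ≈ ±0.7979
```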
x̂  x̂  x̂ i x̂ k− x̂ k −

a a ai ai+ ak− ak − x

I:   i k− k − 
 / 
Summary

∙ Expectation conditioned on an event


  ▸ Law of total expectation

∙ Conditional expectation as a r.v.:


  ▸ Iterated expectation

∙ The MMSE estimate is the conditional expectation


∙ Quantization

 / 

You might also like