Conditional Expectation: Scott Sheffield

This document summarizes a lecture on conditional expectation. It begins by reviewing conditional probability distributions, defining the conditional probability mass or density function. It then defines conditional expectation as the expected value of a random variable X under the conditional probability measure given some other random variable Y equals y. Conditional expectation can be written as a sum or integral involving the conditional probability distribution. Finally, it notes that conditional expectation E[X|Y] can itself be viewed as a random variable that depends on the value of Y.


18.600: Lecture 25
Conditional expectation

Scott Sheffield

MIT
Outline

Conditional probability distributions

Conditional expectation

Interpretation and examples



Recall: conditional probability distributions

- It all starts with the definition of conditional probability: P(A|B) = P(AB)/P(B).
- If X and Y are jointly discrete random variables, we can use this to define a probability mass function for X given Y = y.
- That is, we write p_{X|Y}(x|y) = P{X = x | Y = y} = p(x, y)/p_Y(y).
- In words: first restrict the sample space to pairs (x, y) with the given y value. Then divide the original mass function by p_Y(y) to obtain a probability mass function on the restricted space.
- We do something similar when X and Y are continuous random variables. In that case we write f_{X|Y}(x|y) = f(x, y)/f_Y(y).
- It is often useful to think of sampling (X, Y) as a two-stage process. First sample Y from its marginal distribution, obtaining Y = y for some particular y. Then sample X from its probability distribution given Y = y.
- The marginal law of X is a weighted average of the conditional laws.
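The last bullet can be checked numerically. The following Python sketch (not part of the lecture; the joint pmf is a made-up example) builds p_{X|Y} from a joint pmf and verifies that the marginal of X equals the p_Y-weighted average of the conditional laws:

```python
# Hypothetical joint pmf on {0,1} x {0,1}; any pmf summing to 1 would do.
p = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

xs = sorted({x for x, _ in p})
ys = sorted({y for _, y in p})

p_Y = {y: sum(p[(x, y)] for x in xs) for y in ys}            # marginal of Y
cond = {(x, y): p[(x, y)] / p_Y[y] for x in xs for y in ys}  # p_{X|Y}(x|y)

# Marginal of X two ways: directly, and as a weighted average of conditionals.
p_X_direct = {x: sum(p[(x, y)] for y in ys) for x in xs}
p_X_mixed = {x: sum(p_Y[y] * cond[(x, y)] for y in ys) for x in xs}

for x in xs:
    assert abs(p_X_direct[x] - p_X_mixed[x]) < 1e-12
```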
Example

- Let X be the value on one die roll, Y the value on a second die roll, and write Z = X + Y.
- What is the probability distribution for X given that Y = 5?
- Answer: uniform on {1, 2, 3, 4, 5, 6}.
- What is the probability distribution for Z given that Y = 5?
- Answer: uniform on {6, 7, 8, 9, 10, 11}.
- What is the probability distribution for Y given that Z = 5?
- Answer: uniform on {1, 2, 3, 4}.
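A quick Monte Carlo sketch in Python (a supplementary check, not from the slides) confirms the two conditional distributions involving a conditioning event: given Y = 5, X is roughly uniform on {1,...,6}, and given Z = 5, Y is roughly uniform on {1, 2, 3, 4}:

```python
import random
from collections import Counter

random.seed(0)
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(200_000)]

x_given_y5 = Counter(x for x, y in rolls if y == 5)          # X | Y = 5
y_given_z5 = Counter(y for x, y in rolls if x + y == 5)      # Y | Z = 5

# Each conditional frequency should be close to uniform.
n1 = sum(x_given_y5.values())
assert all(abs(x_given_y5[x] / n1 - 1 / 6) < 0.02 for x in range(1, 7))

n2 = sum(y_given_z5.values())
assert all(abs(y_given_z5[y] / n2 - 1 / 4) < 0.02 for y in range(1, 5))
```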
Outline

Conditional probability distributions

Conditional expectation

Interpretation and examples


Conditional expectation

- Now, what do we mean by E[X | Y = y]? This should just be the expectation of X in the conditional probability measure for X given that Y = y.
- Can write this as E[X | Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x p_{X|Y}(x|y).
- Can make sense of this in the continuum setting as well.
- In the continuum setting we had f_{X|Y}(x|y) = f(x, y)/f_Y(y). So E[X | Y = y] = ∫ x f(x, y)/f_Y(y) dx.
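The discrete formula can be computed exactly for the two-dice example, where p(x, y) = 1/36 for all pairs. This Python sketch (illustrative, not from the lecture) evaluates E[X | Y = y] = Σ_x x · p(x, y)/p_Y(y):

```python
from fractions import Fraction

# Joint pmf of two independent fair dice.
p = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

def cond_expectation(y):
    """E[X | Y = y] computed from the conditional pmf."""
    p_y = sum(p[(x, y)] for x in range(1, 7))        # p_Y(y)
    return sum(x * p[(x, y)] / p_y for x in range(1, 7))

# Given Y = 5, X is uniform on {1,...,6}, so its mean is 7/2.
assert cond_expectation(5) == Fraction(7, 2)
```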
Example

- Let X be the value on one die roll, Y the value on a second die roll, and write Z = X + Y.
- What is E[X | Y = 5]?
- What is E[Z | Y = 5]?
- What is E[Y | Z = 5]?
- Answers: by the conditional distributions from the earlier example, E[X | Y = 5] = 7/2, E[Z | Y = 5] = 17/2, and E[Y | Z = 5] = 5/2.
Conditional expectation as a random variable

- Can think of E[X|Y] as a function of the random variable Y. When Y = y it takes the value E[X|Y = y].
- So E[X|Y] is itself a random variable. It happens to depend only on the value of Y.
- Thinking of E[X|Y] as a random variable, we can ask what its expectation is. What is E[E[X|Y]]?
- Very useful fact: E[E[X|Y]] = E[X].
- In words: what you expect to expect X to be after learning Y is the same as what you now expect X to be.
- Proof in discrete case: E[X|Y = y] = Σ_x x P{X = x | Y = y} = Σ_x x p(x, y)/p_Y(y).
- Recall that, in general, E[g(Y)] = Σ_y p_Y(y) g(y).
- So E[E[X|Y]] = Σ_y p_Y(y) Σ_x x p(x, y)/p_Y(y) = Σ_y Σ_x x p(x, y) = E[X].
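The tower property E[E[X|Y]] = E[X] can be verified exactly on a small example. This Python sketch (with a made-up joint pmf, chosen only for illustration) computes both sides with exact rational arithmetic:

```python
from fractions import Fraction

# A hypothetical joint pmf on {1,2} x {0,1}; the weights sum to 1.
p = {(1, 0): Fraction(1, 6), (2, 0): Fraction(1, 3),
     (1, 1): Fraction(1, 4), (2, 1): Fraction(1, 4)}

xs, ys = [1, 2], [0, 1]
p_Y = {y: sum(p[(x, y)] for x in xs) for y in ys}

def e_x_given(y):
    """E[X | Y = y] = sum_x x * p(x, y) / p_Y(y)."""
    return sum(x * p[(x, y)] / p_Y[y] for x in xs)

lhs = sum(p_Y[y] * e_x_given(y) for y in ys)   # E[E[X|Y]]
rhs = sum(x * p[(x, y)] for (x, y) in p)       # E[X]
assert lhs == rhs
```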
Conditional variance

- Definition: Var(X|Y) = E[(X − E[X|Y])² | Y] = E[X² − E[X|Y]² | Y].
- Var(X|Y) is a random variable that depends on Y. It is the variance of X in the conditional distribution for X given Y.
- Note E[Var(X|Y)] = E[E[X²|Y]] − E[E[X|Y]²] = E[X²] − E[E[X|Y]²].
- If we subtract E[X]² from the first term and add the equivalent value E[E[X|Y]]² to the second, the right-hand side becomes Var(X) − Var(E[X|Y]), which implies the following:
- Useful fact: Var(X) = Var(E[X|Y]) + E[Var(X|Y)].
- One can discover X in two stages: first sample Y from its marginal distribution and compute E[X|Y], then sample X from its distribution given the Y value.
- The fact above breaks the variance into two parts, corresponding to these two stages.
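The two-stage view suggests a simulation check. In this Python sketch (an assumed toy model, not from the slides), Y is a fair coin and, given Y, X is normal with mean 2Y and standard deviation 1 + Y. Then Var(E[X|Y]) = Var(2Y) = 1 and E[Var(X|Y)] = (1² + 2²)/2 = 2.5, so Var(X) should be about 3.5:

```python
import random
import statistics

random.seed(1)
ys = [random.randint(0, 1) for _ in range(200_000)]          # stage 1: sample Y
xs = [random.gauss(2 * y, 1 + y) for y in ys]                # stage 2: X given Y

total_var = statistics.pvariance(xs)

# Var(X) = Var(E[X|Y]) + E[Var(X|Y)] = 1 + 2.5 = 3.5 in this model.
assert abs(total_var - 3.5) < 0.1
```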
Example

- Let X be a random variable of variance σ_X² and Y an independent random variable of variance σ_Y², and write Z = X + Y. Assume E[X] = E[Y] = 0.
- What are the covariances Cov(X, Y) and Cov(X, Z)?
- How about the correlation coefficients ρ(X, Y) and ρ(X, Z)?
- What is E[Z|X]? And how about Var(Z|X)?
- Both of these values are functions of X. The former is just X. The latter happens to be a constant-valued function of X, i.e., happens not to actually depend on X. We have Var(Z|X) = σ_Y².
- Can we check the formula Var(Z) = Var(E[Z|X]) + E[Var(Z|X)] in this case?
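The check can be carried out exactly for a concrete choice of X and Y: two independent fair dice, each with variance 35/12. Here Var(E[Z|X]) = Var(X) and E[Var(Z|X)] = Var(Y), so both sides should equal 35/12 + 35/12. A Python sketch (illustrative, assuming this dice setup):

```python
from fractions import Fraction

vals = range(1, 7)
mean = Fraction(sum(vals), 6)                    # 7/2
var = sum((v - mean) ** 2 for v in vals) / 6     # 35/12

# Left side: Var(Z) computed over the uniform joint pmf on all 36 pairs.
z_mean = 2 * mean
var_z = sum(Fraction(1, 36) * (x + y - z_mean) ** 2
            for x in vals for y in vals)

# Right side: E[Z|X] = X + 7/2 has variance Var(X); Var(Z|X) = Var(Y) always.
assert var_z == var + var
```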
Outline

Conditional probability distributions

Conditional expectation

Interpretation and examples


Interpretation

- Sometimes we think of the expectation E[Y] as a best guess or best predictor of the value of Y.
- It is best in the sense that, among all constants m, the expectation E[(Y − m)²] is minimized when m = E[Y].
- But what if we allow non-constant predictors? What if the predictor is allowed to depend on the value of a random variable X that we can observe directly?
- Let g(x) be such a function. Then E[(Y − g(X))²] is minimized when g(X) = E[Y|X].
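This best-predictor property can be seen in a simulation. In the Python sketch below (an assumed toy model, not from the lecture), Y = X² plus standard normal noise, so E[Y|X] = X²; its mean squared error beats some alternative predictors:

```python
import random

random.seed(2)
pairs = [(x, x * x + random.gauss(0, 1))
         for x in (random.uniform(-1, 1) for _ in range(100_000))]

def mse(g):
    """Empirical estimate of E[(Y - g(X))^2]."""
    return sum((y - g(x)) ** 2 for x, y in pairs) / len(pairs)

best = mse(lambda x: x * x)          # g(X) = E[Y|X]

# Any other predictor should do at least as badly (up to sampling noise).
for other in (lambda x: 0.0, lambda x: x, lambda x: x * x + 0.5):
    assert best < mse(other)
```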
Examples

- Toss 100 coins. What's the conditional expectation of the number of heads given that there are k heads among the first fifty tosses?
- k + 25
- What's the conditional expectation of the number of aces in a five-card poker hand given that the first two cards in the hand are aces?
- 2 + 3 · 2/50
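The coin answer can be checked by simulation: the last fifty tosses are independent of the first fifty, so they contribute 25 heads on average regardless of k. A Python sketch (supplementary, with k = 20 chosen arbitrarily):

```python
import random

random.seed(3)
k, totals = 20, []
for _ in range(300_000):
    first = bin(random.getrandbits(50)).count("1")     # heads in first 50 tosses
    if first == k:
        last = bin(random.getrandbits(50)).count("1")  # heads in last 50 tosses
        totals.append(first + last)

avg = sum(totals) / len(totals)
assert abs(avg - (k + 25)) < 0.5   # conditional expectation is k + 25
```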
