The document discusses expected value and how it is used to describe the weighted average of potential outcomes of a random variable or process. It provides examples of calculating expected value for situations like coin flips, dice rolls, and lotteries. It also introduces related concepts like variance, which measures the spread of possible values, and how expected value can be used to describe profits and losses from games of chance. Moment generating functions are presented as another tool to analyze distributions and calculate moments like mean and variance.
The expected value of a random variable indicates its weighted average.
Ex. How many heads would you expect if you flipped a coin twice?
X = number of heads, taking values {0, 1, 2} with p(0) = 1/4, p(1) = 1/2, p(2) = 1/4.
Weighted average = 0*1/4 + 1*1/2 + 2*1/4 = 1.
Draw PDF
Definition: Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... The mean or expected value of X is defined by E(X) = sum_k x_k p(x_k).
Interpretations: (i) The expected value measures the center of the probability distribution (center of mass). (ii) Long-run frequency (law of large numbers; we'll get to this soon).
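As an illustration of interpretation (ii), here is a minimal simulation sketch for the two-coin-flip example above (not part of the original notes; the seed and trial count are arbitrary):

import random

# Long-run-frequency check for the two-coin-flip example: the average
# number of heads over many repetitions should settle near E(X) = 1.
random.seed(0)
n_trials = 100_000
total_heads = sum(random.randint(0, 1) + random.randint(0, 1)
                  for _ in range(n_trials))
print(total_heads / n_trials)  # close to 1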
Expectations can be used to describe the potential gains and losses from games.
Ex. Roll a die. If the side that comes up is odd, you win the $ equivalent of that side. If it is even, you lose $4.
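For reference, the computation works out as follows: letting W denote the winnings, W takes the values 1, 3, 5 each with probability 1/6 and -4 with probability 3/6, so E(W) = (1 + 3 + 5)/6 - 4*(1/2) = 3/2 - 2 = -1/2, i.e. an expected loss of 50 cents per play.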
Ex. Lottery You pick 3 different numbers between 1 and 12. If you pick all the numbers correctly you win $100. What are your expected earnings if it costs $1 to play?
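For reference, assuming the order of the three numbers does not matter, the probability of winning is 1/C(12,3) = 1/220, so the expected earnings are 100*(1/220) - 1 = 100/220 - 1 = -6/11, roughly -$0.55 per play.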
Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... For any function g, the mean or expected value of g(X) is defined by E(g(X)) = sum_k g(x_k) p(x_k).
Ex. Roll a fair die. Let X = number of dots on the side that comes up.
E(X) is the expected value or 1st moment of X. E(X^n) is called the nth moment of X.
Calculate E(sqrt(X)) = sum_{i=1}^{6} sqrt(i) p(i). Calculate E(e^X) = sum_{i=1}^{6} e^i p(i). (Do at home.)
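A minimal sketch of how such expectations can be checked numerically (not part of the original notes; the helper function `expectation` is introduced here only for illustration):

from math import sqrt, exp

# Fair die: X takes the values 1..6, each with probability 1/6.
pmf = {x: 1/6 for x in range(1, 7)}

def expectation(g, pmf):
    # E(g(X)) = sum over x of g(x) * p(x)
    return sum(g(x) * p for x, p in pmf.items())

print(expectation(lambda x: x, pmf))   # E(X) = 3.5
print(expectation(sqrt, pmf))          # E(sqrt(X)), about 1.805
print(expectation(exp, pmf))           # E(e^X), about 106.1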
Ex. An indicator variable for the event A is defined as the random variable that takes on the value 1 when event A happens and 0 otherwise.
I_A = 1 if A occurs, 0 if A^c occurs.
P(I_A = 1) = P(A) and P(I_A = 0) = P(A^c). The expectation of this indicator is E(I_A) = 1*P(A) + 0*P(A^c) = P(A).
This gives a one-to-one correspondence between probabilities and expectations (of indicators).
If a and b are constants, then E(aX + b) = aE(X) + b.
Proof: E(aX + b) = sum_k (a x_k + b) p(x_k) = a sum_k x_k p(x_k) + b sum_k p(x_k) = aE(X) + b.
Variance
We often seek to summarize the essential properties of a random variable in as simple terms as possible.
The mean is one such property.
Let X = 0 with probability 1
Let Y = -2 with prob. 1/3, -1 with prob. 1/6, 1 with prob. 1/6, 2 with prob. 1/3.
Both X and Y have the same expected value, but are quite different in other respects. One such respect is in their spread. We would like a measure of spread.
Definition: If X is a random variable with mean E(X), then the variance of X, denoted by Var(X), is defined by Var(X) = E((X - E(X))^2).
A small variance indicates a small spread.
Var(X) = E(X^2) - (E(X))^2
Var(X) = E((X - E(X))^2) = sum_x (x - E(X))^2 p(x) = sum_x (x^2 - 2x E(X) + E(X)^2) p(x) = sum_x x^2 p(x) - 2E(X) sum_x x p(x) + E(X)^2 sum_x p(x) = E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2
Ex. Roll a fair die. Let X = number of dots on the side that comes up.
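For reference, the calculation works out as follows: E(X) = (1 + 2 + ... + 6)/6 = 7/2, E(X^2) = (1 + 4 + 9 + 16 + 25 + 36)/6 = 91/6, so Var(X) = 91/6 - (7/2)^2 = 91/6 - 49/4 = 35/12, about 2.92.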
If a and b are constants, then Var(aX + b) = a^2 Var(X).
Proof: E(aX + b) = aE(X) + b, so Var(aX + b) = E[(aX + b - (aE(X) + b))^2] = E(a^2 (X - E(X))^2) = a^2 E((X - E(X))^2) = a^2 Var(X).
The square root of Var(X) is called the standard deviation of X: SD(X) = sqrt(Var(X)). It measures the scale (spread) of X.
Means, modes, and medians
Best estimate under squared loss: mean
i.e., the number m that minimizes E[(X-m)^2] is m=E(X). Proof: expand and differentiate with respect to m.
Best estimate under absolute loss: median. i.e., m=median minimizes E[|X-m|]. Proof in book. Note that median is nonunique in general.
Best estimate under 0-1 loss, i.e. the loss 1 - 1(X = x): mode. That is, choosing the mode maximizes the probability of being exactly right. The proof is easy for discrete r.v.s; a limiting argument is required for continuous r.v.s, since P(X = x) = 0 for any x.
Moment Generating Functions
The moment generating function of the random variable X, denoted M_X(t), is defined for all real values of t by
M_X(t) = E(e^{tX}) = sum_x e^{tx} p(x) if X is discrete with pmf p(x), and M_X(t) = E(e^{tX}) = integral e^{tx} f(x) dx if X is continuous with pdf f(x).
The reason M_X(t) is called a moment generating function is that all the moments of X can be obtained by successively differentiating M_X(t) and evaluating the result at t = 0.
First Moment:
M_X'(t) = d/dt E(e^{tX}) = E(d/dt e^{tX}) = E(X e^{tX})
M_X'(0) = E(X)
(For any of the distributions we will use, we can move the derivative inside the expectation.)
Second moment:
M_X''(t) = d/dt M_X'(t) = d/dt E(X e^{tX}) = E(d/dt (X e^{tX})) = E(X^2 e^{tX})
M_X''(0) = E(X^2)
kth moment:
M_X^(k)(t) = E(X^k e^{tX})
M_X^(k)(0) = E(X^k)
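As an aside, here is a minimal sketch of generating moments by symbolic differentiation of an MGF (not part of the original notes; it assumes the sympy library is available, and uses a fair die as the example):

import sympy as sp

t = sp.symbols('t')
# MGF of a fair die: M_X(t) = (1/6) * sum_{i=1}^{6} e^{i t}
M = sp.Rational(1, 6) * sum(sp.exp(i * t) for i in range(1, 7))

first_moment = sp.diff(M, t).subs(t, 0)       # E(X) = 7/2
second_moment = sp.diff(M, t, 2).subs(t, 0)   # E(X^2) = 91/6
print(first_moment, second_moment)
print(sp.simplify(second_moment - first_moment**2))  # Var(X) = 35/12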
Ex. Binomial random variable with parameters n and p.
Calculate M_X(t):
M_X(t) = E(e^{tX}) = sum_{k=0}^{n} e^{tk} C(n,k) p^k (1-p)^{n-k} = sum_{k=0}^{n} C(n,k) (pe^t)^k (1-p)^{n-k} = (pe^t + 1 - p)^n
M_X'(t) = n(pe^t + 1 - p)^{n-1} pe^t
M_X''(t) = n(n-1)(pe^t + 1 - p)^{n-2} (pe^t)^2 + n(pe^t + 1 - p)^{n-1} pe^t
E(X) = M_X'(0) = n(p + 1 - p)^{n-1} p = np
E(X^2) = M_X''(0) = n(n-1)(p + 1 - p)^{n-2} p^2 + n(p + 1 - p)^{n-1} p = n(n-1)p^2 + np
Var(X) = E(X^2) - (E(X))^2 = n(n-1)p^2 + np - (np)^2 = np(1 - p)
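A quick numerical sanity check of these formulas (a sketch, not part of the original notes; it assumes scipy is available, and the particular n and p are arbitrary):

from scipy.stats import binom

# Check E(X) = np and Var(X) = np(1-p) for one (arbitrary) choice of n, p.
n, p = 10, 0.3
mean, var = binom.stats(n, p, moments='mv')
print(mean, n * p)            # both close to 3.0
print(var, n * p * (1 - p))   # both close to 2.1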
Later we'll see an even easier way to calculate these moments, by using the fact that a binomial X is the sum of n i.i.d. simpler (Bernoulli) r.v.s.
Fact: Suppose that for two random variables X and Y, the moment generating functions exist and are given by M_X(t) and M_Y(t), respectively. If M_X(t) = M_Y(t) for all values of t, then X and Y have the same probability distribution.
If the moment generating function of X exists and is finite in some region about t=0, then the distribution is uniquely determined.
Properties of Expectation
Proposition:
If X and Y have a joint probability mass function p_XY(x,y), then E(g(X,Y)) = sum_x sum_y g(x,y) p_XY(x,y). If X and Y have a joint probability density function f_XY(x,y), then E(g(X,Y)) = integral integral g(x,y) f_XY(x,y) dx dy.
It is important to note that if the function g(x,y) depends only on x or only on y, the formula above reduces to the one-dimensional case.
Ex. Suppose X and Y have a joint pdf f_XY(x,y). Calculate E(X).
E(X) = integral integral x f_XY(x,y) dy dx = integral x (integral f_XY(x,y) dy) dx = integral x f_X(x) dx
Ex. An accident occurs at a point X that is uniformly distributed on a road of length L. At the time of the accident an ambulance is at location Y that is also uniformly distributed on the road. Assuming that X and Y are independent, find the expected distance between the ambulance and the point of the accident.
Compute E(|X-Y|).
Both X and Y are uniform on the interval (0, L). By independence, the joint pdf is f_XY(x,y) = 1/L^2, 0 < x < L, 0 < y < L.
E(|X - Y|) = integral_0^L integral_0^L |x - y| (1/L^2) dy dx = (1/L^2) integral_0^L integral_0^L |x - y| dy dx
For fixed x, integral_0^L |x - y| dy = integral_0^x (x - y) dy + integral_x^L (y - x) dy = x^2/2 + (L^2/2 - xL + x^2/2) = x^2 - xL + L^2/2
E(|X - Y|) = (1/L^2) integral_0^L (x^2 - xL + L^2/2) dx = (1/L^2)(L^3/3 - L^3/2 + L^3/2) = L/3
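A quick Monte Carlo check of the L/3 answer (a sketch, not part of the original notes; L = 2 and the trial count are arbitrary choices):

import random

# E|X - Y| for X, Y independent Uniform(0, L) should be close to L/3.
random.seed(1)
L, n = 2.0, 200_000
est = sum(abs(random.uniform(0, L) - random.uniform(0, L)) for _ in range(n)) / n
print(est, L / 3)  # both close to 0.667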
Expectation of sums of random variables
Ex. Let X and Y be continuous random variables with joint pdf f XY (x,y). Assume that E(X) and E(Y) are finite. Calculate E(X+Y).
E(X + Y) = integral integral (x + y) f_XY(x,y) dx dy = integral integral x f_XY(x,y) dx dy + integral integral y f_XY(x,y) dx dy = integral x f_X(x) dx + integral y f_Y(y) dy = E(X) + E(Y)
Same result holds in discrete case.
Proposition: In general, if E(X_i) is finite for all i = 1, ..., n, then E(X_1 + ... + X_n) = E(X_1) + ... + E(X_n).
Proof: Use the example above and prove by induction.
Let X_1, ..., X_n be independent and identically distributed random variables having distribution function F_X and expected value μ. Such a sequence of random variables is said to constitute a sample from the distribution F_X. The quantity X̄, defined by X̄ = (1/n) sum_{i=1}^{n} X_i, is called the sample mean.
Calculate E(X̄).
We know that E(X_i) = μ. Then E(X̄) = E((1/n) sum_{i=1}^{n} X_i) = (1/n) sum_{i=1}^{n} E(X_i) = (1/n)*nμ = μ.
When the mean of a distribution is unknown, the sample mean is often used in statistics to estimate it. (Unbiased estimate)
Ex. Let X be a binomial random variable with parameters n and p. X represents the number of successes in n trials. We can write X as follows:
X = X_1 + X_2 + ... + X_n
where X_i = 1 if trial i is a success, 0 if trial i is a failure.
The X_i are Bernoulli random variables with parameter p.
E(X_i) = 1*p + 0*(1 - p) = p, so E(X) = E(X_1) + E(X_2) + ... + E(X_n) = np.
Ex. A group of N people throw their hats into the center of a room. The hats are mixed, and each person randomly selects one. Find the expected number of people that select their own hat.
Let X = the number of people who select their own hat.
Number the people from 1 to N. Let
X_i = 1 if person i chooses his own hat, 0 otherwise,
then X = X_1 + X_2 + ... + X_N.
Each person is equally likely to select any of the N hats, so P(X_i = 1) = 1/N and E(X_i) = 1*(1/N) + 0*(1 - 1/N) = 1/N.
Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_N) = N*(1/N) = 1.
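A quick simulation check that the answer is 1 regardless of N (a sketch, not part of the original notes; N = 100, the seed, and the trial count are arbitrary):

import random

# Shuffle N hats and count how many people get their own hat back;
# the average count over many trials should be close to E(X) = 1.
random.seed(3)
N, n_trials = 100, 50_000
total = 0
for _ in range(n_trials):
    hats = list(range(N))
    random.shuffle(hats)
    total += sum(hats[i] == i for i in range(N))
print(total / n_trials)  # close to 1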
Ex. Twenty people, consisting of 10 married couples, are to be seated at five different tables, with four people at each table. If the seating is done at random, what is the expected number of married couples that are seated at the same table?
Let X = the number of married couples at the same table.
Number the couples from 1 to 10 and let
X_i = 1 if couple i is seated at the same table, 0 otherwise.
Then X = X_1 + X_2 + ... + X_10.
To calculate E(X) we need to know E(X_i).
Consider the table where husband i is sitting. There is room for three other people at his table, and those three seats are filled from the remaining 19 people.
P(X_i = 1) = C(1,1)*C(18,2)/C(19,3) = 3/19, so E(X_i) = 1*(3/19) + 0*(16/19) = 3/19.
Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_10) = 10*(3/19) = 30/19.
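A Monte Carlo check of the 30/19 (about 1.58) answer (a sketch, not part of the original notes; couples are labelled (0,1), (2,3), ..., (18,19), and the seed and trial count are arbitrary):

import random

# Seat 20 people at random, 4 per table at 5 tables, and count the
# couples seated together; averaging over many trials approximates E(X).
random.seed(2)
n_trials = 50_000
total = 0
for _ in range(n_trials):
    people = list(range(20))
    random.shuffle(people)
    tables = [set(people[4*t:4*t + 4]) for t in range(5)]
    total += sum(any(2*i in tab and 2*i + 1 in tab for tab in tables)
                 for i in range(10))
print(total / n_trials, 30 / 19)  # both close to 1.579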
Proposition: If X and Y are independent, then for any functions h and g,
E(g(X)h(Y)) = E(g(X))E(h(Y)).
Proof:
E(g(X)h(Y)) = integral integral g(x)h(y) f_XY(x,y) dx dy = integral integral g(x)h(y) f_X(x) f_Y(y) dx dy = integral g(x) f_X(x) dx * integral h(y) f_Y(y) dy = E(g(X))E(h(Y))
In fact, this is an equivalent way to characterize independence: if E(g(X)h(Y)) = E(g(X))E(h(Y)) for all functions g and h (functions of X and of Y separately, not arbitrary joint functions f(X,Y)), then X and Y are independent. To see this, take g and h to be indicator functions.
Fact: The moment generating function of the sum of independent random variables equals the product of the individual moment generating functions.
Proof: M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX})E(e^{tY}) = M_X(t)M_Y(t)
Covariance and correlation
Previously, we have discussed the absence or presence of a relationship between two random variables, i.e. independence or dependence. But if there is in fact a relationship, the relationship may be either weak or strong.
Ex. (a) Let X = weight of a sample of water, Y = volume of the same sample of water.
There is an extremely strong relationship between X and Y.
(b) Let X = a person's weight, Y = the same person's height.
There is a relationship between X and Y, but not as strong as in (a).
We would like a measure that can quantify this difference in the strength of a relationship between two random variables.
Definition: The covariance between X and Y, denoted by Cov(X,Y), is defined by
Cov(X,Y) = E[(X - E(X))(Y - E(Y))].
As with the variance, we can rewrite this:
Cov(X,Y) = E[(X - E(X))(Y - E(Y))] = E[XY - X E(Y) - Y E(X) + E(X)E(Y)] = E(XY) - E(X)E(Y) - E(X)E(Y) + E(X)E(Y) = E(XY) - E(X)E(Y)
Note that if X and Y are independent, Cov(X,Y) = E(XY) - E(X)E(Y) = E(X)E(Y) - E(X)E(Y) = 0.
The converse is however NOT true.
Counter-Example: Define X and Y so that,
P(X=0) = P(X=1) = P(X=-1) = 1/3 and
Y = 0 if X ≠ 0, 1 if X = 0. X and Y are clearly dependent.
But XY = 0 always, so E(XY) = E(X) = 0, and hence Cov(X,Y) = E(XY) - E(X)E(Y) = 0.
Proposition:
(i) Cov(X,Y) = Cov(Y,X)
(ii) Cov(X,X) = Var(X)
(iii) Cov(aX,Y) = a Cov(X,Y)
(iv) Cov(sum_{i=1}^{n} X_i, sum_{j=1}^{m} Y_j) = sum_{i=1}^{n} sum_{j=1}^{m} Cov(X_i, Y_j)
Proof: (i)-(iii): verify yourselves.
(iv) Let μ_i = E(X_i) and ν_j = E(Y_j).
Then E(sum_{i=1}^{n} X_i) = sum_{i=1}^{n} μ_i and E(sum_{j=1}^{m} Y_j) = sum_{j=1}^{m} ν_j.
Cov(sum_i X_i, sum_j Y_j) = E[(sum_i X_i - sum_i μ_i)(sum_j Y_j - sum_j ν_j)] = E[(sum_i (X_i - μ_i))(sum_j (Y_j - ν_j))] = E[sum_i sum_j (X_i - μ_i)(Y_j - ν_j)] = sum_i sum_j E[(X_i - μ_i)(Y_j - ν_j)] = sum_i sum_j Cov(X_i, Y_j)
Proposition: Var(sum_{i=1}^{n} X_i) = sum_{i=1}^{n} Var(X_i) + 2 sum_{i<j} Cov(X_i, X_j). In particular, Var(X + Y) = Var(X) + Var(Y) + 2Cov(X,Y).
Proof:
Var(sum_{i=1}^{n} X_i) = Cov(sum_{i=1}^{n} X_i, sum_{i=1}^{n} X_i) = sum_{i=1}^{n} sum_{j=1}^{n} Cov(X_i, X_j) = sum_{i=1}^{n} Var(X_i) + sum_{i≠j} Cov(X_i, X_j) = sum_{i=1}^{n} Var(X_i) + 2 sum_{i<j} Cov(X_i, X_j)
If X_1, ..., X_n are pairwise independent, so that Cov(X_i, X_j) = 0 for i ≠ j, then Var(sum_{i=1}^{n} X_i) = sum_{i=1}^{n} Var(X_i).
Ex. Let X_1, ..., X_n be independent and identically distributed random variables having expected value μ and variance σ^2. Let X̄ = (1/n) sum_{i=1}^{n} X_i be the sample mean. The random variable S^2 = (1/(n-1)) sum_{i=1}^{n} (X_i - X̄)^2 is called the sample variance.
Calculate (a) Var(X̄) and (b) E(S^2).
(a) We know that Var(X_i) = σ^2. Then Var(X̄) = Var((1/n) sum_{i=1}^{n} X_i) = (1/n^2) Var(sum_{i=1}^{n} X_i) = (1/n^2) sum_{i=1}^{n} Var(X_i) = σ^2/n.
(b) Writing X_i - X̄ = (X_i - μ) - (X̄ - μ) and using sum_{i=1}^{n} (X_i - μ) = n(X̄ - μ),
sum_{i=1}^{n} (X_i - X̄)^2 = sum_{i=1}^{n} (X_i - μ)^2 - 2(X̄ - μ) sum_{i=1}^{n} (X_i - μ) + n(X̄ - μ)^2 = sum_{i=1}^{n} (X_i - μ)^2 - n(X̄ - μ)^2.
Hence E(S^2) = (1/(n-1)) E[sum_{i=1}^{n} (X_i - μ)^2 - n(X̄ - μ)^2] = (1/(n-1))[nσ^2 - n Var(X̄)] = (1/(n-1))[nσ^2 - σ^2] = σ^2.
(The sample variance is an unbiased estimate of the variance)
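A simulation sketch of this unbiasedness (not part of the original notes; it assumes numpy is available, and the normal distribution with σ^2 = 4, the sample size, and the seed are arbitrary choices):

import numpy as np

# The average of the sample variance S^2 (with the 1/(n-1) factor) over
# many samples of size n should be close to the true variance sigma^2 = 4.
rng = np.random.default_rng(0)
n, n_trials, sigma2 = 5, 200_000, 4.0
samples = rng.normal(0.0, np.sqrt(sigma2), size=(n_trials, n))
s2 = samples.var(axis=1, ddof=1)  # ddof=1 gives the 1/(n-1) version
print(s2.mean(), sigma2)          # close to 4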
Ex. A group of N people throw their hats into the center of a room. The hats are mixed, and each person randomly selects one.
Let X = the number of people who select their own hat.
Number the people from 1 to N. Let
X_i = 1 if person i chooses his own hat, 0 otherwise,
then X = X_1 + X_2 + ... + X_N.
We showed last time that E(X)=1.
Calculate Var(X).
Var(X) = Var(sum_{i=1}^{N} X_i) = sum_{i=1}^{N} Var(X_i) + 2 sum_{i<j} Cov(X_i, X_j)
Recall that since each person is equally likely to select any of the N hats, P(X_i = 1) = 1/N. Hence, E(X_i) = 1*(1/N) + 0*(1 - 1/N) = 1/N and E(X_i^2) = 1^2*(1/N) + 0*(1 - 1/N) = 1/N.
Var(X_i) = E(X_i^2) - (E(X_i))^2 = 1/N - (1/N)^2 = 1/N - 1/N^2.
Cov(X_i, X_j) = E(X_i X_j) - E(X_i)E(X_j)
X_i X_j = 1 if both persons i and j choose their own hats, 0 otherwise.
P(X_i X_j = 1) = P(X_i = 1, X_j = 1) = P(X_i = 1 | X_j = 1) P(X_j = 1) = (1/(N-1))*(1/N)
E(X_i X_j) = 1*(1/(N(N-1))) + 0*(1 - 1/(N(N-1))) = 1/(N(N-1))
Cov(X_i, X_j) = 1/(N(N-1)) - (1/N)^2 = 1/(N^2(N-1))
Hence, Var(X) = N(1/N - 1/N^2) + 2 C(N,2) * 1/(N^2(N-1)) = (1 - 1/N) + 1/N = 1.
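For small N this can be checked exactly by enumerating all permutations (a sketch, not part of the original notes; N = 7 is an arbitrary small choice):

from itertools import permutations

# The number of fixed points of a random permutation of N items
# (people getting their own hat) has mean 1 and variance 1.
N = 7
matches = [sum(p[i] == i for i in range(N)) for p in permutations(range(N))]
mean = sum(matches) / len(matches)
var = sum((m - mean) ** 2 for m in matches) / len(matches)
print(mean, var)  # 1.0 1.0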
Definition: The correlation between X and Y, denoted by ρ(X,Y), is defined, as long as Var(X) and Var(Y) are positive, by
ρ(X,Y) = Cov(X,Y) / sqrt(Var(X) Var(Y)).
It can be shown that -1 ≤ ρ(X,Y) ≤ 1, with equality only if Y = aX + b (assuming E(X^2) and E(Y^2) are both finite). This is called the Cauchy-Schwarz inequality.
Proof: Applying the claim to the centered variables X - E(X) and Y - E(Y), it suffices to prove (E(XY))^2 <= E(X^2)E(Y^2). The basic idea is to look at the expectations E[(aX+bY)^2] and E[(aX-bY)^2], with a = sqrt(E(Y^2)) and b = sqrt(E(X^2)). Expanding and using linearity of expectation,
0 <= E[(aX+bY)^2] = a^2 E(X^2) + 2ab E(XY) + b^2 E(Y^2) = 2ab(ab + E(XY)), so E(XY) >= -ab,
0 <= E[(aX-bY)^2] = a^2 E(X^2) - 2ab E(XY) + b^2 E(Y^2) = 2ab(ab - E(XY)), so E(XY) <= ab.
Thus |E(XY)| <= ab = sqrt(E(X^2)E(Y^2)), and applied to the centered variables this is equivalent to the inequality -1 ≤ ρ(X,Y) ≤ 1. For equality to hold, either E[(aX+bY)^2] = 0 or E[(aX-bY)^2] = 0, i.e., X and Y are linearly related with a negative or positive slope, respectively.
The correlation coefficient is therefore a measure of the degree of linearity between X and Y. If ρ(X,Y) = 0 then this indicates no linearity, and X and Y are said to be uncorrelated.
Conditional Expectation
Recall that if X and Y are discrete random variables, the conditional mass function of X, given Y=y, is defined for all y such that P(Y=y)>0, by
p_{X|Y}(x|y) = P(X = x | Y = y) = p_XY(x,y) / p_Y(y).
Definition: If X and Y are discrete random variables, the conditional expectation of X, given Y=y, is defined for all y such that P(Y=y)>0, by
E(X | Y = y) = sum_x x P(X = x | Y = y) = sum_x x p_{X|Y}(x|y).
Similarly, if X and Y are continuous random variables, the conditional pdf of X, given Y = y, is defined for all y such that f_Y(y) > 0, by
f_{X|Y}(x|y) = f_XY(x,y) / f_Y(y).
Definition: If X and Y are continuous random variables, the conditional expectation of X, given Y = y, is defined for all y such that f_Y(y) > 0, by
E(X | Y = y) = integral x f_{X|Y}(x|y) dx.
The conditional expectation of X given Y = y is just the expected value computed on a reduced sample space consisting only of outcomes where Y = y. E(X | Y = y) is a function of y; viewed as a function of the random variable Y, the conditional expectation E(X | Y) is itself a random variable.
It is important to note that conditional expectations satisfy all the properties of regular expectations:
1. E[g(X) | Y = y] = sum_x g(x) p_{X|Y}(x|y) if X and Y are discrete.
2. E[g(X) | Y = y] = integral g(x) f_{X|Y}(x|y) dx if X and Y are continuous.
3. E[sum_{i=1}^{n} X_i | Y = y] = sum_{i=1}^{n} E[X_i | Y = y]
Proposition: E(X) = E(E(X | Y))
If Y is discrete, E(X) = E(E(X | Y)) = sum_y E(X | Y = y) p_Y(y). If Y is continuous, E(X) = E(E(X | Y)) = integral E(X | Y = y) f_Y(y) dy.
Proof: (Discrete case)
E(E(X | Y)) = sum_y E(X | Y = y) p_Y(y) = sum_y sum_x x p_{X|Y}(x|y) p_Y(y) = sum_y sum_x x (p_XY(x,y)/p_Y(y)) p_Y(y) = sum_x x sum_y p_XY(x,y) = sum_x x p_X(x) = E(X)
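A small numerical check of this "tower property" on a made-up joint pmf (a sketch, not part of the original notes; the pmf values are arbitrary):

# Check E(X) = E(E(X | Y)) for a small discrete joint pmf p_XY(x, y).
p_xy = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.1, (2, 1): 0.3}

# Direct computation of E(X).
ex_direct = sum(x * p for (x, y), p in p_xy.items())

# Tower property: E(E(X | Y)) = sum_y E(X | Y = y) * p_Y(y).
ex_tower = 0.0
for y0 in {y for (_, y) in p_xy}:
    p_y = sum(p for (x, y), p in p_xy.items() if y == y0)
    e_x_given_y = sum(x * p / p_y for (x, y), p in p_xy.items() if y == y0)
    ex_tower += e_x_given_y * p_y
print(ex_direct, ex_tower)  # both 1.0 (up to float rounding)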