
School of Mathematics and Statistics

UNSW Sydney

Introduction to Probability and Stochastic Processes

OPEN LEARNING
Chapter 5
Transformations,
Conditional Expectation,
and
Conditional Variance

Outline:

5.1 Introduction

5.2 Linear Transformation

5.3 Probability Integral Transformation

5.4 Bivariate Transformations

5.5 Sum of Independent Random Variables


① Sum of Independent Poisson Random Variables
② Sum of Independent Continuous Random Variables
③ Sum of Independent Exponential Random Variables
④ Sum of Independent Gamma Random Variables
⑤ Sum of Independent Normal Random Variables
5.6 Moment Generating Function Approach

5.7 Conditional Expectation and Conditional Variance

5.8 Supplementary Material


5.1 Introduction

Definition
If X is a random variable, Y = h(X) for some function h is
a transformation of X.

Result
For discrete X,
P(Y = y) = P(h(X) = y) = Σ_{x : h(x) = y} P(X = x).
Example
Suppose the probability mass function of X is
x          −1     0     1     2
P(X = x)   1/8   1/4   1/2   1/8

If Y = X², we have

y           0     1     4
P(Y = y)   1/4   5/8   1/8

since, for example,

P(Y = 0) = P(X² = 0) = P(X = 0) = 1/4,
P(Y = 1) = P(X² = 1) = P(X = −1) + P(X = 1) = 1/8 + 1/2 = 5/8,
P(Y = 2) = P(X² = 2) = P(X = −√2) + P(X = √2) = 0,
P(Y = 4) = P(X² = 4) = P(X = −√4) + P(X = √4) = 0 + 1/8 = 1/8.
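
A minimal R sketch of this computation (using the pmf from the example above): the pmf of Y = h(X) is obtained by summing P(X = x) over all x with h(x) = y.

  x   <- c(-1, 0, 1, 2)
  p_x <- c(1/8, 1/4, 1/2, 1/8)
  y   <- x^2                   # apply h(x) = x^2 to each support point
  p_y <- tapply(p_x, y, sum)   # group the probabilities by the value of y
  p_y                          # 0.250, 0.625, 0.125 for y = 0, 1, 4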
The density function of a transformed continuous variable
is simple to determine when the transformation is
monotonic.

Definition
Let h be a real-valued function defined over the set A where
A is a subset of R. Then h is a monotonic transformation
if h is strictly increasing or decreasing over A.

Example
Classify the following functions as monotonic or non-monotonic for
the domain x ∈ R. If a function is non-monotonic, specify a domain
over which it is monotonic.

➊ h₁(x) = x²

Solution:
➊ h₁(x) = x² is non-monotonic on x ∈ R. However, it is monotonic
on x ∈ (−∞, 0) and on x ∈ (0, ∞).
Example
Classify the following functions as monotonic or non-monotonic for
the domain x ∈ R. If a function is non-monotonic, specify a domain
over which it is monotonic.

➋ h₂(x) = 7(x − 4)³

Solution:
➋ h₂(x) = 7(x − 4)³ is monotonic on x ∈ R.
Example
Classify the following functions as monotonic or non-monotonic for
the domain x ∈ R. If a function is non-monotonic, specify a domain
over which it is monotonic.

➌ h₃(x) = sin(x)

Solution:
➌ h₃(x) = sin(x) is non-monotonic on x ∈ R. However, it is monotonic
on x ∈ (−π/2, π/2), on x ∈ (π/2, 3π/2), etc.
Result:
For continuous random variable X, if h is monotonic over
the set {x : fX (x) > 0}, then

fY(y) = fX(x) |dx/dy| = fX(h⁻¹(y)) |dx/dy|

for y such that fX(h⁻¹(y)) > 0.
Proof.

FY(y) = P(Y ≤ y) = P{h(X) ≤ y}

      = P(X ≤ h⁻¹(y)) = FX(h⁻¹(y))        if h ↑
      = P(X ≥ h⁻¹(y)) = 1 − FX(h⁻¹(y))    if h ↓

Differentiating with respect to y,

fY(y) =  fX(h⁻¹(y)) dh⁻¹(y)/dy =  fX(x) dx/dy    if h ↑
      = −fX(h⁻¹(y)) dh⁻¹(y)/dy = −fX(x) dx/dy    if h ↓

Now dy/dx > 0 if h ↑ and dy/dx < 0 if h ↓, and so in both cases

fY(y) = fX(x) |dx/dy|.
Example
Suppose the probability density function of X is

fX(x) = 3x², 0 < x < 1.

Find fY(y) where Y = 2X − 1.

Solution:
First, we observe that the function Y = 2X − 1 is a monotonic
transformation.

We have x = (y + 1)/2 and dx/dy = 1/2. By the transformation formula,

fY(y) = fX(x) |dx/dy| = 3((y + 1)/2)² · (1/2) = (3/8)(y + 1)², −1 < y < 1.
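
A quick R simulation sketch of this result: since FX(x) = x³ on (0, 1), we can draw X as U^(1/3) for U ∼ Uniform(0, 1) (anticipating the probability integral transformation of Section 5.3) and compare the histogram of Y = 2X − 1 with (3/8)(y + 1)². The sample size and plotting choices are mine.

  set.seed(1)
  u <- runif(1e5)
  x <- u^(1/3)                 # X has cdf x^3 on (0,1), so X = U^(1/3)
  y <- 2*x - 1
  hist(y, breaks = 50, freq = FALSE, main = "Y = 2X - 1")
  curve(3/8 * (x + 1)^2, from = -1, to = 1, add = TRUE, col = "red")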
Example
Let X ∼ Exponential(β), i.e., fX(x) = (1/β) e^(−x/β), x > 0; β > 0.

Find fY(y), where Y = X/β.

Solution:
First, we observe that h(x) = x/β is a monotonic transformation on x ∈ R.

Now let x = yβ, so dx/dy = β. By the transformation formula,

fY(y) = fX(x) |dx/dy| = (1/β) e^(−(yβ)/β) |β| = e^(−y), y > 0.

That is, Y ∼ Exponential(1).
Example
Let X ∼ Exponential(β), i.e., fX(x) = (1/β) e^(−x/β), x > 0; β > 0.

We observe that Y ∼ Exponential(1) when Y = X/β.

Note that this result shows that any exponential variable can be
transformed to the exponential distribution with parameter 1 by
dividing by the parameter β.

Hence β can be interpreted as a scale parameter (i.e., a parameter
that does not change the shape of the distribution, it just changes
the scale of X).
Example
Suppose that X has probability density function

fX(x) = √(2/π) e^(−x²/2), x > 0.

Let Y = X²/2. Find the density function of Y.

Solution:
We observe that y = h(x) = x²/2 is monotonic over x ∈ R⁺.

So we have x = √(2y) and dx/dy = (2y)^(−1/2). By the transformation formula,

fY(y) = fX(x) |dx/dy| = √(2/π) e^(−y) |(2y)^(−1/2)| = (1/√π) e^(−y) y^(−1/2), y > 0.

Note: Γ(1/2) = ∫₀^∞ e^(−y) y^(−1/2) dy = √π.
5.2 Linear Transformations

The simplest monotonic transformations are linear
transformations:

h(x) = ax + b for a ≠ 0.

This transformation leads to the following special case.
Result
For a continuous random variable X, if Y = aX + b is a linear
transformation of X with a ≠ 0, then

fY(y) = (1/|a|) fX((y − b)/a)

for all y such that fX((y − b)/a) > 0.

This result follows directly from the result in the previous section
for general monotonic transformations.

The implication is that linear transformations only change the
location and scale of a density function. They do not change its shape.
The following figure illustrates this for a bimodal density
function fX , and different choices of linear transformation
parameters.

5.3 Probability Integral
Transformation

Result
If X has density function fX(x) and cumulative distribution
function FX(x), then

Y = FX(X) ∼ Uniform(0, 1).
Proof.
Let Y = FX(X). Then y = FX(x) and dy/dx = fX(x). By the
transformation formula,

fY(y) = fX(x) |dx/dy| = fX(x) · 1/fX(x) = 1, 0 < y < 1.

Alternatively, from first principles,

FY(y) = P(Y ≤ y) = P{FX(X) ≤ y}
      = P(X ≤ FX⁻¹(y))     since FX ↑
      = FX(FX⁻¹(y)) = y.

∴ fY(y) = 1, 0 < y < 1.
Example
Suppose X ∼ Exp(1), i.e.,

fX(x) = e^(−x), x > 0, and
FX(x) = 1 − e^(−x), x > 0.

Hence, Y = 1 − e^(−X) ∼ Uniform(0, 1).
The Probability Integral Transformation allows for easy simulation of
random variables from any distribution for which the inverse cdf FX⁻¹
is easily computed.

A computer can generate U where U ∼ Uniform(0, 1).

If we require an observation X where X has cdf FX, then

U = FX(X) ∼ Uniform(0, 1) ⇐⇒ X = FX⁻¹(U).

Hence, we have the following result.
A Universal Random Number Generator
To generate a sample from any distribution X:

1. Use a computer to generate a random sample u from U ∼ Uniform(0, 1).

2. Calculate your random sample from X as x = FX⁻¹(u).

A notable exception is the normal distribution (among others), whose
cumulative distribution function cannot be written in closed form.
Example
We require an observation from the distribution with cumulative
distribution function

FX(x) = 1 − e^(−x), x > 0.

Explain how we can generate a random observation from this distribution.
Example

Solution:
Step 1: Generate a random observation (number) u from Uniform(0, 1).

(In RStudio, u <- runif(n) generates n random numbers.)

Step 2: The random observation from X can be calculated as
x = FX⁻¹(u) = −ln(1 − u).

(In RStudio, x <- -log(1 - u) then gives n such observations.)
Example

Solution - continued:
That is, let y = FX(x).

Then x = FX⁻¹(y) and y = 1 − e^(−x) =⇒ x = −ln(1 − y), so

FX⁻¹(y) = −ln(1 − y).

If X = FX⁻¹(U) = −ln(1 − U), where U ∼ Uniform(0, 1), then X has
the cumulative distribution function

FX(x) = 1 − e^(−x), x > 0.
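
A short R sketch of this generator, checking the samples against the Exp(1) density (the sample size and plotting choices are mine):

  set.seed(2)
  u <- runif(1e5)
  x <- -log(1 - u)             # inverse-cdf (probability integral) method
  hist(x, breaks = 60, freq = FALSE, main = "Inverse-cdf samples")
  curve(exp(-x), from = 0, to = 8, add = TRUE, col = "red")   # target Exp(1) density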
5.4 Bivariate Transformations

Suppose X and Y have joint probability density function fX,Y(x, y)
and U is a function of X and Y.

We can find the probability density function of U by calculating

FU(u) = P(U ≤ u)

and differentiating.
Example
Suppose fX,Y(x, y) = 1, 0 < x < 1, 0 < y < 1.

Let U = X + Y.

Find fU(u).
Example

Solution:
FU(u) = P(U ≤ u) = P(X + Y ≤ u)

      = ∫₀^u ∫₀^(u−y) 1 dx dy,               0 < u < 1
      = 1 − ∫_(u−1)^1 ∫_(u−y)^1 1 dx dy,     1 < u < 2

      = u²/2,                0 < u < 1
      = 2u − u²/2 − 1,       1 < u < 2.

Thus

fU(u) = u,       0 < u < 1
      = 2 − u,   1 < u < 2.
An alternative way to find the density of U is by way of a
bivariate transformation, using the following result:

Result
If U and V are functions of continuous random variables X
and Y , then

fU,V (u, v) = fX,Y (x, y) · |J|,

where

J = | ∂x/∂u   ∂x/∂v |
    | ∂y/∂u   ∂y/∂v |

is a determinant called the Jacobian of the transformation.
The full specification of fU,V (u, v) requires that the range of
(u, v) values corresponding to those (x, y) for which
fX,Y (x, y) > 0 is determined.

So to find fU(u) by bivariate transformation:

1. Define some bivariate transformation to (U, V).

2. Find fU,V(u, v).

3. We want the marginal distribution of U, so find
   fU(u) = ∫_{−∞}^{∞} fU,V(u, v) dv.
Example
X and Y are independent Uniform(0, 1) variables:

fX,Y(x, y) = 1, 0 < x < 1, 0 < y < 1.

Let U = X + Y and V = Y. Use a bivariate transformation to (U, V)
to find the density function of U.

First, note that X = U − V and Y = V.

x = u − v, y = v gives ∂x/∂u = 1, ∂x/∂v = −1, ∂y/∂u = 0, ∂y/∂v = 1, and

J = | 1  −1 | = 1.
    | 0   1 |

Now 0 < x < 1 ⇔ 0 < u − v < 1 and 0 < y < 1 ⇔ 0 < v < 1.

∴ fU,V(u, v) = fX,Y(x, y) |J| = 1, v < u < 1 + v, 0 < v < 1.
Example

fU(u) = ∫₀^u 1 dv = u,              0 < u < 1
      = ∫_(u−1)^1 1 dv = 2 − u,     1 < u < 2.

Note that V = Y, so fV(v) = 1, 0 < v < 1.
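
A quick R simulation sketch of this triangular density (the sample size and plotting choices are mine):

  set.seed(3)
  u <- runif(1e5) + runif(1e5)
  hist(u, breaks = 60, freq = FALSE, main = "U = X + Y")
  curve(ifelse(x < 1, x, 2 - x), from = 0, to = 2, add = TRUE, col = "red")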
Example
Suppose the joint probability density function of X and Y is

fX,Y(x, y) = 3y, 0 < x < y < 1.

Let U = X + Y and V = Y − X. Then

X = (U − V)/2,  Y = (U + V)/2.

Now x = (u − v)/2, y = (u + v)/2 give

∂x/∂u = 1/2, ∂x/∂v = −1/2, ∂y/∂u = 1/2, ∂y/∂v = 1/2.

∴ J = | 1/2  −1/2 | = 1/2, and
      | 1/2   1/2 |
Example

0 < x < y ⇐⇒ 0 < (u − v)/2 < (u + v)/2 ⇐⇒ 0 < v < u,
y < 1      ⇐⇒ (u + v)/2 < 1             ⇐⇒ u + v < 2.
Example

∴ fU,V(u, v) = (3(u + v)/2) |1/2| = (3/4)(u + v),   0 < v < u, u + v < 2.

For 0 < u < 1,  fU(u) = ∫₀^u (3/4)(u + v) dv = 9u²/8.

For 1 < u < 2,  fU(u) = ∫₀^(2−u) (3/4)(u + v) dv = 3/2 − 3u²/8.

As an aside, note that

fV(v) = ∫_v^(2−v) (3/4)(u + v) du = (3/2)(1 − v²), 0 < v < 1.
Example
The lifetimes X and Y of two brands of components of a system are
independent with

fX(x) = x e^(−x), x > 0, and fY(y) = e^(−y), y > 0.

The relative efficiency of the components is measured as U = Y/X.

Find the density function of the relative efficiency using a
bivariate transformation.
Example

Solution:
We have the joint probability density function of X and Y as

fX,Y(x, y) = x e^(−(x+y)), x > 0, y > 0.

Now U = Y/X. Let V = X. Then

X = V, Y = UV, and if x = v, y = uv,

∂x/∂u = 0, ∂x/∂v = 1, ∂y/∂u = v, ∂y/∂v = u, so

J = | 0  1 | = −v.
    | v  u |

Now x > 0 ⇐⇒ v > 0, and y > 0 ⇐⇒ uv > 0 =⇒ u > 0 since v > 0.
Example

Solution - continued:

∴ fU,V(u, v) = fX,Y(x, y) |J|
             = v e^(−(v+uv)) |−v|
             = v² e^(−v(1+u)), u > 0, v > 0.

∴ fU(u) = ∫₀^∞ v² e^(−v(1+u)) dv = 2/(1 + u)³, u > 0.

Note also,

E(U) = ∫₀^∞ u · 2/(1 + u)³ du = 1.

(See the derivation of this integral in the supplementary material, Section 5.8.)
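
A simulation sketch of this example in R: X ∼ Gamma(2, 1) has density x e^(−x), Y ∼ Exp(1), and U = Y/X should have cdf FU(u) = 1 − 1/(1 + u)². The sample size is my choice.

  set.seed(4)
  x <- rgamma(1e5, shape = 2, rate = 1)
  y <- rexp(1e5)
  u <- y / x
  c(mean(u <= 1), 1 - 1/(1 + 1)^2)   # empirical vs exact P(U <= 1) = 0.75
  mean(u)                            # roughly E(U) = 1; convergence is slow since U has infinite variance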
Example
Suppose X denotes the total time from arrival to exit from a service
queue and Y denotes the time spent in the queue before being served.
Suppose also that we want the density of U = X − Y, the amount of
time spent being served, when

fX,Y(x, y) = e^(−x), 0 < y < x < ∞.

Find the density function of U, using a bivariate transformation.
Example

Solution:
Let V = Y. We have

X = U + V, Y = V, and if x = u + v, y = v,

then ∂x/∂u = 1, ∂x/∂v = 1, ∂y/∂u = 0, ∂y/∂v = 1.

∴ J = | 1  1 | = 1
      | 0  1 |

and

0 < y < x < ∞ ⇐⇒ 0 < v < u + v < ∞
              =⇒ v > 0, u > 0, and u + v < ∞ =⇒ u < ∞, v < ∞.
Example

Solution - continued:
∴ fU,V(u, v) = e^(−(u+v)), u > 0, v > 0.

Thus, the time spent being served, U, and the time spent in the queue
before service, V, are independent random variables. Also,

fU(u) = e^(−u), u > 0, and fV(v) = e^(−v), v > 0.
5.5 Sum of Independent Random
Variables

Result
Suppose that X and Y are independent random variables
taking only non-negative integer values (here X and Y are
discrete random variables), and let Z = X + Y . Then, the
probability mass function of Z is

P(Z = z) = Σ_{y=0}^{z} P(X = z − y) P(Y = y), z = 0, 1, 2, . . .
Proof.

P(Z = z) = P(X + Y = z)
         = P(X = z, Y = 0) + P(X = z − 1, Y = 1) + · · · + P(X = 0, Y = z)
         = P(X = z) P(Y = 0) + P(X = z − 1) P(Y = 1) + · · · + P(X = 0) P(Y = z)
           (by independence)
         = Σ_{y=0}^{z} P(X = z − y) P(Y = y), z = 0, 1, 2, . . . .
Example
Suppose X and Y are independent with probability mass functions given by

P(X = k) = P(Y = k) = (1 − θ) θ^k, k = 0, 1, 2, . . . , 0 < θ < 1.

Find the probability mass function of Z = X + Y.

Solution:

P(Z = z) = Σ_{k=0}^{z} (1 − θ) θ^k (1 − θ) θ^(z−k)
         = Σ_{k=0}^{z} (1 − θ)² θ^z
         = (z + 1) (1 − θ)² θ^z, z = 0, 1, 2, . . . .
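
A numerical check of this convolution in R (the value θ = 0.4 and the truncation point are arbitrary choices):

  theta <- 0.4
  p     <- (1 - theta) * theta^(0:40)        # pmf of X (and of Y) on 0, 1, ..., 40
  conv  <- sapply(0:20, function(z) sum(p[1:(z + 1)] * rev(p[1:(z + 1)])))
  exact <- (0:20 + 1) * (1 - theta)^2 * theta^(0:20)
  max(abs(conv - exact))                     # essentially zero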
5.5.1 Sum of independent
Poisson random variables is
again a Poisson random variable.

Example
Suppose that X ∼ Poisson(λ₁) and Y ∼ Poisson(λ₂) are independent. That is,

P(X = k) = e^(−λ₁) λ₁^k / k!, k = 0, 1, 2, . . . , λ₁ > 0,
P(Y = k) = e^(−λ₂) λ₂^k / k!, k = 0, 1, 2, . . . , λ₂ > 0.

Find the probability mass function of Z = X + Y.
Example

Solution:

P(Z = z) = P(X + Y = z)
         = Σ_{k=0}^{z} [e^(−λ₁) λ₁^k / k!] · [e^(−λ₂) λ₂^(z−k) / (z − k)!]
         = [e^(−(λ₁+λ₂)) / z!] Σ_{k=0}^{z} (z choose k) λ₁^k λ₂^(z−k)
         = e^(−(λ₁+λ₂)) (λ₁ + λ₂)^z / z!, z = 0, 1, 2, . . .

(by the Binomial Theorem).

Thus Z = X + Y ∼ Poisson(λ₁ + λ₂).
The next result is obtained by induction.

Result
If X₁, X₂, . . . , Xₙ are independent with Xᵢ ∼ Poisson(λᵢ), then

Σ_{i=1}^{n} Xᵢ ∼ Poisson(Σ_{i=1}^{n} λᵢ).

This is an important and useful property of Poisson random variables.
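
A quick R check of this property by simulation (the rates 2 and 3 are arbitrary choices):

  set.seed(5)
  z <- rpois(1e5, 2) + rpois(1e5, 3)
  rbind(empirical = as.vector(table(factor(z, levels = 0:10))) / 1e5,
        exact     = dpois(0:10, 5))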
5.5.2 Sum of Independent
Continuous Random Variables.

Result
Suppose that X and Y are independent continuous random variables
with probability density functions fX(x) and fY(y), respectively.

Let Z = X + Y. Then, the probability density function of Z is

fZ(z) = ∫_{all values of y} fX(z − y) fY(y) dy.
Proof.

FZ(z) = P(Z ≤ z) = P(X + Y ≤ z)
      = ∫∫_{x+y≤z} fX,Y(x, y) dx dy
      = ∫_{all possible y} ∫_{−∞}^{z−y} fX(x) fY(y) dx dy
      = ∫_{all possible y} [ ∫_{−∞}^{z−y} fX(x) dx ] fY(y) dy
      = ∫_{all possible y} FX(z − y) fY(y) dy.

∴ fZ(z) = d/dz FZ(z) = ∫_{all possible y} fX(z − y) fY(y) dy.
Example
Suppose that X and Y are independent random variables with
probability density functions fX(x) = e^(−x), x > 0, and
fY(y) = e^(−y), y > 0, respectively.

Find the density function of Z = X + Y.

Solution:
By the continuous convolution formula, the probability density
function of Z = X + Y is

fZ(z) = ∫_{all possible y} fX(z − y) fY(y) dy.

Now fX(x) = e^(−x) only for x > 0. So

fX(z − y) = e^(−(z−y)) for z − y > 0, i.e. y < z.
Example

Solution - continued:
This means that the range of possible values of y is 0 < y < z. Therefore

fZ(z) = ∫₀^z e^(−(z−y)) e^(−y) dy
      = ∫₀^z e^(−z) dy
      = [y e^(−z)]₀^z
      = z e^(−z), z > 0.

Notice that the answer is the probability density function of a
Gamma(2, 1) random variable.
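
A simulation sketch of this convolution in R, comparing the sum of two independent Exp(1) draws with the Gamma(2, 1) density (the sample size and plotting choices are mine):

  set.seed(6)
  z <- rexp(1e5) + rexp(1e5)
  hist(z, breaks = 60, freq = FALSE, main = "Z = X + Y, with X, Y ~ Exp(1)")
  curve(dgamma(x, shape = 2, rate = 1), from = 0, to = 12, add = TRUE, col = "red")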
5.5.3 Sum of independent
Exponential random variables is
a Gamma random variable.

Thus, the sum of two independent Exponential(1) random variables is
a Gamma(2, 1) random variable.

In general, if

X and Y are independent, and
X ∼ Gamma(α₁, 1) and Y ∼ Gamma(α₂, 1),

then X + Y ∼ Gamma(α₁ + α₂, 1).
5.5.4 Sum of independent
Gamma random variables is
again a Gamma random variable

The arguments given in the previous example can be
extended to derive the following result.

Result
If X₁, X₂, . . . , Xₙ are independent with Xᵢ ∼ Gamma(αᵢ, β), then

Σ_{i=1}^{n} Xᵢ ∼ Gamma(Σ_{i=1}^{n} αᵢ, β).
5.5.5 Sum of independent Normal
random variables is again a Normal
random variable.

Example
Suppose that X ∼ N(0, 1), Y ∼ N(0, 1). That is,

fX(x) = (1/√(2π)) e^(−x²/2), −∞ < x < ∞,
fY(y) = (1/√(2π)) e^(−y²/2), −∞ < y < ∞.

Find the probability density function of Z = X + Y.

Solution:
By the continuous convolution formula, the probability density
function of Z = X + Y is

fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy.
Example

Solution - continued:
We have

fX(z − y) = (1/√(2π)) e^(−(z−y)²/2), −∞ < z − y < ∞.

For any fixed z ∈ (−∞, ∞), we can consider fX(z − y) as a function of y, i.e.,

fX(z − y) = (1/√(2π)) e^(−(z−y)²/2), −∞ < y < ∞.
Example

Solution - continued:

∴ fZ(z) = ∫_{−∞}^{∞} (1/√(2π)) e^(−(z−y)²/2) · (1/√(2π)) e^(−y²/2) dy
        = (1/(2π)) ∫_{−∞}^{∞} e^(−z²/2 + zy − y²) dy
        = (e^(−z²/2)/(2π)) ∫_{−∞}^{∞} e^(−(y² − zy + z²/4) + z²/4) dy
        = (e^(−z²/4)/(2π)) ∫_{−∞}^{∞} e^(−(y − z/2)²) dy.
Example

Solution - continued:
Now for µ and σ, we have

∫_{−∞}^{∞} (1/(σ√(2π))) e^(−(y−µ)²/(2σ²)) dy = 1,

and hence

∫_{−∞}^{∞} e^(−(y−µ)²/(2σ²)) dy = σ√(2π).
Example

Solution - continued:
Put σ² = 1/2 and µ = z/2. Then ∫_{−∞}^{∞} e^(−(y − z/2)²) dy = √π.

∴ fZ(z) = (e^(−z²/4)/(2π)) √π = (1/(2√π)) e^(−z²/4), −∞ < z < ∞.

Thus Z ∼ N(0, 2).

More generally, we can show that the sum of any two independent
normal random variables is also normal.
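
A short R simulation check of this result (the sample size and plotting choices are mine):

  set.seed(7)
  z <- rnorm(1e5) + rnorm(1e5)
  c(mean = mean(z), var = var(z))    # approximately 0 and 2
  hist(z, breaks = 60, freq = FALSE, main = "Z = X + Y")
  curve(dnorm(x, mean = 0, sd = sqrt(2)), add = TRUE, col = "red")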
5.6 Moment Generating Function
Approach

Result
Suppose that X and Y are independent random variables with
moment-generating functions mX and mY, respectively. Then

mX+Y(u) = mX(u) · mY(u).
Proof.
If X and Y are independent, then Z = X + Y has moment generating function

mZ(u) = E(e^(u(X+Y))) = E(e^(uX) · e^(uY))
      = E(e^(uX)) · E(e^(uY))      (by independence)
      = mX(u) · mY(u).
Alternative Proof
Recall, the probability density function of Z is

fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy.

Thus,

E(e^(uZ)) = ∫_{−∞}^{∞} e^(uz) fZ(z) dz
          = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^(uz) fX(z − y) fY(y) dy dz
          = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} e^(u(z−y)) fX(z − y) dz ] e^(uy) fY(y) dy
          = ∫_{−∞}^{∞} e^(ux) fX(x) dx ∫_{−∞}^{∞} e^(uy) fY(y) dy,
            with x = z − y and dx = dz,
          = mX(u) ∫_{−∞}^{∞} e^(uy) fY(y) dy
          = mX(u) · mY(u).
Sum of Independent Random Variables
Using Moment Generating Functions

We can use this approach to find the distribution of the
sum of independent random variables using the one-to-one
correspondence between distributions and moment
generating functions.

The required result, which follows directly from the previous result, is:

Result
If X₁, X₂, . . . , Xₙ are independent random variables, then
Σ_{i=1}^{n} Xᵢ has the moment generating function

m_{X₁+···+Xₙ}(u) = Π_{i=1}^{n} m_{Xᵢ}(u).
Proof.

m_{X₁+···+Xₙ}(u) = E(e^(u Σᵢ Xᵢ))
                 = E(Π_{i=1}^{n} e^(u Xᵢ))
                 = Π_{i=1}^{n} E(e^(u Xᵢ))      (by independence)
                 = Π_{i=1}^{n} m_{Xᵢ}(u).
Example
Let X₁, X₂, . . . , Xₙ be independent Bernoulli(p) random variables. Thus each Xᵢ
has probability function

fX(x) = p^x (1 − p)^(1−x), x = 0, 1; 0 < p < 1,

and the moment generating function (mgf) of X is

mX(u) = E(e^(uX)) = Σ_{x=0}^{1} e^(ux) P(X = x)
      = e^(u·0) P(X = 0) + e^(u·1) P(X = 1)
      = 1 − p + p e^u.

Therefore,

m_{X₁+···+Xₙ}(u) = Π_{i=1}^{n} m_{Xᵢ}(u) = (1 − p + p e^u)^n,

which is the mgf of a Binomial(n, p) random variable. Thus we can conclude that
Σ_{i=1}^{n} Xᵢ ∼ Binomial(n, p).
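
An R simulation sketch of this conclusion (n = 10, p = 0.3 and the number of replicates are arbitrary choices):

  set.seed(8)
  n <- 10; p <- 0.3
  s <- colSums(matrix(rbinom(n * 1e4, size = 1, prob = p), nrow = n))
  rbind(empirical = as.vector(table(factor(s, levels = 0:n))) / 1e4,
        exact     = dbinom(0:n, size = n, prob = p))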
Example
Note that if Z ∼ Binomial(n, p), then

mZ(u) = E(e^(uZ)) = Σ_{all z} e^(uz) P(Z = z)
      = Σ_{z=0}^{n} e^(uz) (n choose z) p^z (1 − p)^(n−z)
      = Σ_{z=0}^{n} (n choose z) (p e^u)^z (1 − p)^(n−z)
      = (p e^u + 1 − p)^n
      = (1 − p + p e^u)^n.
Example
Suppose that X₁, X₂, . . . , Xₙ are independent random variables with
Xᵢ ∼ Poisson(λᵢ) for each i = 1, . . . , n.

If X ∼ Poisson(λ), then X has moment generating function

mX(u) = E(e^(uX)) = Σ_{x=0}^{∞} e^(ux) e^(−λ) λ^x / x!
      = e^(−λ) Σ_{x=0}^{∞} (λ e^u)^x / x!
      = e^(−λ) · e^(λ e^u) = e^(λ(e^u − 1)).
Example
Thus Xᵢ has the moment generating function

m_{Xᵢ}(u) = E(e^(u Xᵢ)) = e^(λᵢ(e^u − 1)),

and Σ_{i=1}^{n} Xᵢ has the moment generating function

m_{X₁+···+Xₙ}(u) = E(e^(u Σᵢ Xᵢ)) = Π_{i=1}^{n} E(e^(u Xᵢ)) = e^((Σᵢ λᵢ)(e^u − 1)).

∴ Σ_{i=1}^{n} Xᵢ ∼ Poisson(Σ_{i=1}^{n} λᵢ).

This agrees with what was stated earlier.
Sum of Independent Normal Random
Variables is again Normal.

Sums of independent normal random variables are also
normal, with the means and variances added.

Result
If X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) are independent, then

X + Y ∼ N(µX + µY, σX² + σY²).
The following extension is for general weighted sums.

Result
Let 1 ≤ i ≤ n. If the Xᵢ ∼ N(µᵢ, σᵢ²) are independent, then for any
set of constants a₁, . . . , aₙ,

Σ_{i=1}^{n} aᵢXᵢ ∼ N(Σ_{i=1}^{n} aᵢµᵢ, Σ_{i=1}^{n} aᵢ²σᵢ²).

The proofs of these last two results are straightforward via moment
generating functions.
Example
Suppose that

X₁ ∼ N(3, 8), X₂ ∼ N(−5, 5), X₃ ∼ N(2, 11),

and that X₁, X₂ and X₃ are independent. Let

U = 6X₁ + 3X₂ − 9X₃.

Then U is normal with mean

E(U) = E(6X₁ + 3X₂ − 9X₃)
     = 6 E(X₁) + 3 E(X₂) − 9 E(X₃)
     = 6 × 3 + 3 × (−5) − 9 × 2 = −15
Example

and variance

Var(U) = Var(6X₁ + 3X₂ − 9X₃)
       = 6² Var(X₁) + 3² Var(X₂) + (−9)² Var(X₃)
       = 36 × 8 + 9 × 5 + 81 × 11 = 1224.

In summary,

U ∼ N(−15, 1224).
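
An R sketch checking this by simulation (note that rnorm is parameterised by the standard deviation, so the variances 8, 5 and 11 enter as square roots; the sample size is my choice):

  set.seed(9)
  u <- 6 * rnorm(1e5, mean = 3,  sd = sqrt(8)) +
       3 * rnorm(1e5, mean = -5, sd = sqrt(5)) -
       9 * rnorm(1e5, mean = 2,  sd = sqrt(11))
  c(mean = mean(u), var = var(u))    # approximately -15 and 1224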
5.7 Conditional Expectation and
Conditional Variance
Suppose X, Y and Z are random variables.

These random variables can be discrete or continuous.

The proofs of the following results will be given for discrete random
variables but are also valid for continuous random variables.
Result 1

E(aY + bZ | X) = a E(Y | X) + b E(Z | X) for a, b ∈ R.

Proof.
For a, b ∈ R,

E(aY + bZ | X = x) = Σ_{y,z} (ay + bz) P(Y = y, Z = z | X = x)
                   = Σ_{y,z} ay P(Y = y, Z = z | X = x) + Σ_{y,z} bz P(Y = y, Z = z | X = x)
                   = a Σ_y y P(Y = y | X = x) + b Σ_z z P(Z = z | X = x)
                   = a E(Y | X = x) + b E(Z | X = x).
Result 2
E(Y | X) ≥ 0 if Y ≥ 0.

Result 3
E(1 | X) = 1.

Result 4
If X and Y are independent, then E(Y | X) = E(Y).

Result 5 - Pull-Through Property
E(Y g(X) | X) = g(X) E(Y | X) for any suitable function g.
Result 6
1. E[E(Y | X)] = E(Y)
2. E[E(g(Y) | X)] = E(g(Y)) for any suitable function g.

Proof.
We will prove ❷ since ❶ is the special case of ❷ in which g is the
identity function.

E[E(g(Y) | X)] = E[ Σ_y g(y) P(Y = y | X = x) ]
               = Σ_x [ Σ_y g(y) P(Y = y | X = x) ] P(X = x)
               = Σ_y g(y) [ Σ_x P(Y = y | X = x) P(X = x) ]
               = Σ_y g(y) [ Σ_x P(Y = y, X = x) ]
               = Σ_y g(y) P(Y = y)
               = E(g(Y)).
Result 7 - Tower Property

E[E(Y | X, Z) | X] = E(Y | X) = E[E(Y | X) | X, Z].

Proof.

E[E(Y | X, Z) | X = x]
  = Σ_z [ Σ_y y P(Y = y | X = x, Z = z) ] P(X = x, Z = z | X = x)
  = Σ_z Σ_y y · [P(Y = y, X = x, Z = z)/P(X = x, Z = z)] · [P(X = x, Z = z)/P(X = x)]
  = Σ_y y P(Y = y | X = x)
  = E(Y | X = x).

The second equality, E[E(Y | X) | X, Z] = E(Y | X), follows by Result 5,
since E(Y | X) is a function of X.
Result 8 - Conditional Variance Formula

Var(Y) = E[Var(Y | X)] + Var[E(Y | X)].

Proof.
The natural definitions are

Var(Y | X = x) = E[ (Y − E(Y | X = x))² | X = x ],
Var(Y) = E[ (Y − E(Y))² ].

Then

Var(Y) = E[ (Y − E(Y))² ]
       = E[ E( (Y − E(Y | X) + E(Y | X) − E(Y))² | X ) ]
       = E[Var(Y | X)] + Var[E(Y | X)],

since the mean of E(Y | X) is E(Y) and the cross-product term reduces to zero:

2 E[ E( (Y − E(Y | X)) (E(Y | X) − E(Y)) | X ) ]
  = 2 E[ (E(Y | X) − E(Y)) E( Y − E(Y | X) | X ) ]
  = 0,

because E(Y − E(Y | X) | X) = E(Y | X) − E(Y | X) = 0.
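
An R simulation sketch of Results 6 and 8 for a simple hierarchy of my choosing (not from the notes): take X ∼ Exp(1) and Y | X ∼ Poisson(X), so E(Y | X) = Var(Y | X) = X, giving E(Y) = 1 and Var(Y) = E(X) + Var(X) = 2.

  set.seed(10)
  x <- rexp(1e6)
  y <- rpois(1e6, lambda = x)
  mean(y)   # E(Y) = E[E(Y | X)] = E(X) = 1                  (Result 6)
  var(y)    # Var(Y) = E[Var(Y | X)] + Var[E(Y | X)] = 2     (Result 8)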
5.8 Supplementary Material

Supplementary Material - Binomial Theorem

Binomial Theorem
Let a and b be constants. Then

(a + b)^m = Σ_{y=0}^{m} (m choose y) a^y b^(m−y)
          = Σ_{y=0}^{m} [m!/(y!(m − y)!)] a^y b^(m−y).
Supplementary Material - Integration by Parts Formula

Recall the integration by parts formula,

∫ x dy = xy − ∫ y dx.

Let x = u and dy = 2(1 + u)^(−3) du, so dx = du and y = −(1 + u)^(−2).

By the integration by parts formula,

∫₀^∞ u · 2/(1 + u)³ du = [u · (−(1 + u)^(−2))]₀^∞ + ∫₀^∞ (1 + u)^(−2) du
                       = 0 + [−(1 + u)^(−1)]₀^∞ = 1.

Return to the example.
