Math5846_chapter5
UNSW Sydney
OPEN LEARNING
Chapter 5
Transformations,
Conditional Expectation,
and
Conditional Variance
Outline:
5.1 Introduction
5.2 Linear Transformations
5.3 Probability Integral Transformation
5.4 Bivariate Transformations
5.5 Sum of Independent Random Variables
5.6 Moment Generating Function Approach
5.7 Conditional Expectation and Conditional Variance
5.8 Supplementary Material

5.1 Introduction
Definition
If X is a random variable, Y = h(X) for some function h is a transformation of X.

Result
For discrete X,
  P(Y = y) = P(h(X) = y) = Σ_{x : h(x) = y} P(X = x).
Example
Suppose the probability mass function of X is
  x          −1    0    1    2
  P(X = x)  1/8  1/4  1/2  1/8
If Y = X², we have
  y          0    1    4
  P(Y = y)  1/4  5/8  1/8
since, for example, P(Y = 0) = P(X² = 0) = P(X = 0) = 1/4.
The density function of a transformed continuous variable
is simple to determine when the transformation is
monotonic.
Definition
Let h be a real-valued function defined over the set A where
A is a subset of R. Then h is a monotonic transformation
if h is strictly increasing or decreasing over A.
Example
Classify the following functions as monotonic or non-monotonic on the domain x ∈ R.
Solution:
➊ h1(x) = x² is non-monotonic on x ∈ R. However, it is monotonic on x ∈ (−∞, 0) and on x ∈ (0, ∞).
➋ h2(x) = 7(x − 4)³ is monotonic on x ∈ R.
➌ h3(x) = sin(x) is non-monotonic on x ∈ R. However, it is monotonic on x ∈ (−π/2, π/2), x ∈ (π/2, 3π/2), etc.
Result:
For continuous random variable X, if h is monotonic over the set {x : fX(x) > 0}, then
  fY(y) = fX(x) |dx/dy| = fX(h⁻¹(y)) |dx/dy|.
Proof.
FY(y) = P(Y ≤ y) = P(h(X) ≤ y)
      = P(X ≤ h⁻¹(y)) = FX(h⁻¹(y))        if h ↑
      = P(X ≥ h⁻¹(y)) = 1 − FX(h⁻¹(y))    if h ↓.
∴ fY(y) = fX(h⁻¹(y)) dh⁻¹(y)/dy = fX(x) dx/dy      if h ↑
        = −fX(h⁻¹(y)) dh⁻¹(y)/dy = −fX(x) dx/dy    if h ↓.
Now dy/dx > 0 if h ↑ and dy/dx < 0 if h ↓, and so in both cases
  fY(y) = fX(x) |dx/dy|.
Example
Suppose the probability density function of X is
  fX(x) = 3x², 0 < x < 1,
and let Y = 2X − 1. Find the density function of Y.
Solution:
Here x = (y + 1)/2 and dx/dy = 1/2, so
  fY(y) = fX(x) |dx/dy| = 3((y + 1)/2)² · (1/2) = (3/8)(y + 1)²,  −1 < y < 1.
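For readers who want to check this numerically, here is a minimal Monte Carlo sketch in Python (assuming NumPy is available, and using the transformation Y = 2X − 1 as reconstructed above). It simulates X by inverting FX(x) = x³ and compares a histogram of Y with (3/8)(y + 1)².

import numpy as np

rng = np.random.default_rng(0)
u = rng.uniform(size=200_000)
x = u ** (1 / 3)                      # inverse-cdf sampling: F_X(x) = x^3 on (0, 1)
y = 2 * x - 1                         # the linear transformation Y = 2X - 1

edges = np.linspace(-1, 1, 21)
hist, _ = np.histogram(y, bins=edges, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - 3 / 8 * (mid + 1) ** 2)))   # small (sampling and binning error)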
Example
Let X ∼ Exponential(β), i.e., fX(x) = (1/β) e^{−x/β}, x > 0; β > 0.
Let Y = X/β. Find the density function of Y.
Solution:
Here x = yβ and dx/dy = β, so
  fY(y) = fX(x) |dx/dy| = (1/β) e^{−(yβ)/β} · |β| = e^{−y},  y > 0.
That is, Y = X/β ∼ Exponential(1).
Example
Suppose that X has probability density function
  fX(x) = √(2/π) e^{−x²/2},  x > 0.
Let Y = X²/2. Find the density function of Y.
Solution:
We observe that y = h(x) = x²/2 is monotonic over x ∈ R⁺.
So we have x = √(2y) and dx/dy = (2y)^{−1/2}. By the Transformation Formula,
  fY(y) = fX(x) |dx/dy| = √(2/π) e^{−y} |(2y)^{−1/2}| = (1/√π) y^{−1/2} e^{−y},  y > 0.
Note: Γ(1/2) = ∫₀^∞ e^{−y} y^{−1/2} dy = √π.
5.2 Linear Transformations
The simplest monotonic transformations are linear transformations:
  h(x) = ax + b for a ≠ 0.
Result
For a continuous random variable X, if Y = aX + b is a linear transformation of X with a ≠ 0, then
  fY(y) = (1/|a|) fX((y − b)/a).
This result follows directly from the result in the previous section for general monotonic transformations.
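As a quick numerical illustration, here is a short Python sketch (SciPy assumed; the Exponential choice of fX is only for illustration) comparing this formula with the density SciPy assigns to Y = aX + b through its loc/scale parametrisation.

import numpy as np
from scipy import stats

a, b = 2.0, 3.0
X = stats.expon()                              # f_X(x) = e^{-x}, x > 0
y = np.linspace(3.1, 15.0, 50)
lhs = stats.expon(loc=b, scale=a).pdf(y)       # density of Y = aX + b for a > 0
rhs = X.pdf((y - b) / a) / abs(a)              # (1/|a|) f_X((y - b)/a)
print(np.allclose(lhs, rhs))                   # True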
The following figure illustrates this for a bimodal density
function fX , and different choices of linear transformation
parameters.
5.3 Probability Integral
Transformation
Result
If X has density function fX(x) and cumulative distribution function FX(x), then
  Y = FX(X) ∼ Uniform(0, 1).
Proof.
Let Y = FX(X). Then y = FX(x) and dy/dx = fX(x). By the transformation formula,
  fY(y) = fX(x) |dx/dy| = fX(x) · 1/fX(x) = 1,  0 < y < 1.
Equivalently, FY(y) = P(FX(X) ≤ y) = P(X ≤ FX⁻¹(y)) = FX(FX⁻¹(y)) = y,  0 < y < 1.
Example
Suppose X ∼ Exp(1), i.e., fX(x) = e^{−x}, x > 0, so that FX(x) = 1 − e^{−x}.
Then Y = FX(X) = 1 − e^{−X} ∼ Uniform(0, 1).
The Probability Integral Transformation allows for easy
simulation of random variables from any distribution for
which the inverse cdf FX−1 is easily computed.
A Universal Random Number Generator
To generate a sample from any distribution X:
Step 1: Generate an observation u from Uniform(0, 1).
Step 2: Set x = FX⁻¹(u), i.e., solve u = FX(x) for x. Then x is an observation from the distribution of X.
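A minimal Python sketch of this recipe (NumPy assumed; the Exponential(β) example is only an illustration, chosen because its inverse cdf FX⁻¹(u) = −β log(1 − u) is available in closed form):

import numpy as np

def sample_inverse_cdf(inv_cdf, n, rng):
    """Step 1: draw u ~ Uniform(0,1); Step 2: return x = F_X^{-1}(u)."""
    u = rng.uniform(size=n)
    return inv_cdf(u)

beta = 2.0
rng = np.random.default_rng(1)
x = sample_inverse_cdf(lambda u: -beta * np.log(1 - u), n=100_000, rng=rng)
print(x.mean(), x.var())   # approximately beta and beta^2 for Exponential(beta)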
Example
We require an observation from the distribution with a given cumulative distribution function FX.
Solution:
Step 1: Generate a random observation (number) y from Uniform(0, 1).
Step 2: Set y = FX(x) and solve for x; the resulting x is the required observation.
5.4 Bivariate Transformations

Suppose
  X and Y have joint probability density function fX,Y(x, y), and
  U is a function of X and Y.
We wish to find the density function fU(u) of U.
Example
Suppose fX,Y (x, y) = 1, 0 < x < 1, 0 < y < 1.
Let U = X + Y .
Find fU (u).
Example
Solution:
FU(u) = P(U ≤ u) = P(X + Y ≤ u)
      = ∫₀^u ∫₀^{u−y} 1 dx dy = u²/2,                        for 0 < u < 1,
      = 1 − ∫_{u−1}^{1} ∫_{u−y}^{1} 1 dx dy = 2u − u²/2 − 1,  for 1 < u < 2.
Thus
  fU(u) = u,      0 < u < 1,
        = 2 − u,  1 < u < 2.
An alternative way to find the density of U is by way of a bivariate transformation, using the following result:

Result
If U and V are functions of the continuous random variables X and Y, and the transformation is one-to-one with inverse x = x(u, v), y = y(u, v), then
  fU,V(u, v) = fX,Y( x(u, v), y(u, v) ) |J|,
where
  J = | ∂x/∂u  ∂x/∂v |
      | ∂y/∂u  ∂y/∂v |.

The full specification of fU,V(u, v) requires that the range of (u, v) values corresponding to those (x, y) for which fX,Y(x, y) > 0 is determined.
Example
Let X, Y be independent Uniform(0, 1) variables and U = X + Y. Take V = Y, so that x = u − v, y = v and J = 1. Then
  fU,V(u, v) = fX,Y(u − v, v) · |1| = 1,  for 0 < v < 1 and 0 < u − v < 1.
Hence
  fU(u) = ∫₀^u 1 dv = u,            0 < u < 1,
  fU(u) = ∫_{u−1}^{1} 1 dv = 2 − u,  1 < u < 2.
Example
Suppose the joint probability density function of X and Y is
  fX,Y(x, y) = 3y,  0 < x < y < 1.
Let U = X + Y and V = Y − X. Then
  X = (U − V)/2,  Y = (U + V)/2.
Now x = (u − v)/2, y = (u + v)/2 give
  ∂x/∂u = 1/2,  ∂x/∂v = −1/2,  ∂y/∂u = 1/2,  ∂y/∂v = 1/2.
∴ J = | 1/2  −1/2 |
      | 1/2   1/2 |  = 1/2,
and
  0 < x < y ⇐⇒ 0 < (u − v)/2 < (u + v)/2 ⇐⇒ 0 < v < u,
  y < 1 ⇐⇒ (u + v)/2 < 1 ⇐⇒ u + v < 2.
∴ fU,V(u, v) = 3 · (u + v)/2 · |1/2| = (3/4)(u + v),  0 < v < u, u + v < 2.
For 0 < u < 1,  fU(u) = ∫₀^u (3/4)(u + v) dv = 9u²/8.
For 1 < u < 2,  fU(u) = ∫₀^{2−u} (3/4)(u + v) dv = 3/2 − 3u²/8.
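This example can also be verified symbolically; a small SymPy sketch (SymPy assumed) computes the Jacobian and both pieces of fU(u):

import sympy as sp

u, v = sp.symbols('u v', positive=True)
x = (u - v) / 2
y = (u + v) / 2

# Jacobian of (x, y) with respect to (u, v)
J = sp.Matrix([[sp.diff(x, u), sp.diff(x, v)],
               [sp.diff(y, u), sp.diff(y, v)]]).det()      # 1/2
f_uv = 3 * y * sp.Abs(J)                                    # (3/4)(u + v)

print(sp.simplify(sp.integrate(f_uv, (v, 0, u))))           # 9*u**2/8        (0 < u < 1)
print(sp.expand(sp.integrate(f_uv, (v, 0, 2 - u))))         # 3/2 - 3*u**2/8  (1 < u < 2)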
Example
The lifetimes X and Y of two brands of components of a system are independent, with
  fX(x) = x e^{−x}, x > 0,  and  fY(y) = e^{−y}, y > 0.
Find the density function of U = Y/X.
Example
Solution:
We have the joint probability density function of X and Y as
  fX,Y(x, y) = x e^{−(x+y)},  x > 0, y > 0.
Now U = Y/X. Let V = X. Then X = V, Y = UV, and with x = v, y = uv,
  ∂x/∂u = 0,  ∂x/∂v = 1,  ∂y/∂u = v,  ∂y/∂v = u,  so
  J = | 0  1 |
      | v  u |  = −v.
Now x > 0 ⇐⇒ v > 0, and y > 0 ⇐⇒ uv > 0 =⇒ u > 0 since v > 0.
Hence
  fU,V(u, v) = fX,Y(v, uv) |J| = v e^{−v(1+u)} · v = v² e^{−v(1+u)},  u > 0, v > 0.
Example
Solution - continued:
∴ fU(u) = ∫₀^∞ v² e^{−v(1+u)} dv = 2/(1 + u)³,  u > 0.
Note also,
  E(U) = ∫₀^∞ u · 2/(1 + u)³ du = 1.
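A Monte Carlo sketch of this answer (NumPy assumed; the marginals X ∼ Gamma(2, 1) and Y ∼ Exponential(1) are inferred from the joint density x e^{−(x+y)} above):

import numpy as np

rng = np.random.default_rng(2)
n = 500_000
x = rng.gamma(shape=2.0, scale=1.0, size=n)   # f_X(x) = x e^{-x}
y = rng.exponential(scale=1.0, size=n)        # f_Y(y) = e^{-y}
u = y / x

edges = np.linspace(0.0, 4.0, 41)
hist, _ = np.histogram(u, bins=edges, density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - 2 / (1 + mid) ** 3)))   # small (binning + sampling error)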
Example
Solution:
We have X = U + V and Y = V. If x = u + v, y = v, then
  ∂x/∂u = 1,  ∂x/∂v = 1,  ∂y/∂u = 0,  ∂y/∂v = 1.
∴ J = | 1  1 |
      | 0  1 |  = 1,
and
Example
Solution - continued:
∴ fU,V(u, v) = e^{−(u+v)},  u > 0, v > 0.
Thus, the time spent being served, U, and the time spent in the queue before service, V, are independent random variables. Also, since fU,V(u, v) factorises as e^{−u} · e^{−v}, each of U and V has an Exponential(1) distribution.
5.5 Sum of Independent Random
Variables
Result
Suppose that X and Y are independent random variables taking only non-negative integer values (here X and Y are discrete random variables), and let Z = X + Y. Then the probability mass function of Z is
  P(Z = z) = Σ_{y=0}^{z} P(X = z − y) P(Y = y),  z = 0, 1, 2, . . .
Proof.
P(Z = z) = P(X + Y = z)
         = P(X = z, Y = 0) + P(X = z − 1, Y = 1) + · · · + P(X = 0, Y = z)
         = P(X = z) P(Y = 0) + P(X = z − 1) P(Y = 1) + · · · + P(X = 0) P(Y = z)   (by independence)
         = Σ_{y=0}^{z} P(X = z − y) P(Y = y),  z = 0, 1, 2, . . . .
Example
Suppose X and Y are independent with probability mass functions given by
  P(X = k) = P(Y = k) = (1 − θ) θ^k,  k = 0, 1, 2, . . . ,  0 < θ < 1.
Find the probability mass function of Z = X + Y.
Solution:
P(Z = z) = Σ_{k=0}^{z} (1 − θ) θ^k · (1 − θ) θ^{z−k}
         = (1 − θ)² θ^z Σ_{k=0}^{z} 1
         = (z + 1) (1 − θ)² θ^z,  z = 0, 1, 2, . . . .
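A quick numerical check of this pmf (NumPy assumed), using np.convolve on truncated copies of the geometric pmf:

import numpy as np

theta = 0.4
k = np.arange(60)
p = (1 - theta) * theta ** k               # pmf of X (and Y), truncated at k = 59

pz = np.convolve(p, p)[:60]                 # pmf of Z = X + Y for z = 0, ..., 59
z = np.arange(60)
exact = (z + 1) * (1 - theta) ** 2 * theta ** z
print(np.max(np.abs(pz - exact)))           # essentially 0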
5.5.1 Sum of independent
Poisson random variables is
again a Poisson random variable.
Example
Suppose that X ∼ Poisson(λ1) and Y ∼ Poisson(λ2) are independent. That is,
  P(X = k) = e^{−λ1} λ1^k / k!,  k = 0, 1, 2, . . . ,  λ1 > 0,
  P(Y = k) = e^{−λ2} λ2^k / k!,  k = 0, 1, 2, . . . ,  λ2 > 0.
Find the probability mass function of Z = X + Y.
Example
Solution:
P(Z = z) = P(X + Y = z)
         = Σ_{k=0}^{z} [ e^{−λ1} λ1^k / k! ] · [ e^{−λ2} λ2^{z−k} / (z − k)! ]
         = ( e^{−(λ1+λ2)} / z! ) Σ_{k=0}^{z} (z choose k) λ1^k λ2^{z−k}
         = e^{−(λ1+λ2)} (λ1 + λ2)^z / z!,  z = 0, 1, 2, . . .
(by the Binomial Theorem).
Thus Z = X + Y ∼ Poisson(λ1 + λ2).
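Again this can be checked numerically (NumPy and SciPy assumed) by convolving the two Poisson pmfs and comparing with Poisson(λ1 + λ2):

import numpy as np
from scipy import stats

lam1, lam2 = 2.0, 3.5
z = np.arange(40)
px = stats.poisson(lam1).pmf(z)
py = stats.poisson(lam2).pmf(z)

pz = np.convolve(px, py)[:40]               # pmf of Z = X + Y for z = 0, ..., 39
print(np.max(np.abs(pz - stats.poisson(lam1 + lam2).pmf(z))))   # ~1e-16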
The next result is obtained by induction.
Result
If X1, X2, . . . , Xn are independent with Xi ∼ Poisson(λi), then
  Σ_{i=1}^{n} Xi ∼ Poisson( Σ_{i=1}^{n} λi ).
5.5.2 Sum of Independent
Continuous Random Variables.
Result
Suppose that X and Y are independent continuous random variables with probability density functions fX(x) and fY(y), respectively. Then Z = X + Y has probability density function
  fZ(z) = ∫_{all possible y} fX(z − y) fY(y) dy
(the continuous convolution formula).
Proof.
FZ(z) = P(Z ≤ z) = P(X + Y ≤ z)
      = ∬_{x+y≤z} fX,Y(x, y) dx dy
      = ∫_{all possible y} ∫_{−∞}^{z−y} fX(x) fY(y) dx dy
      = ∫_{all possible y} [ ∫_{−∞}^{z−y} fX(x) dx ] fY(y) dy
      = ∫_{all possible y} FX(z − y) fY(y) dy.
∴ fZ(z) = d/dz FZ(z) = ∫_{all possible y} fX(z − y) fY(y) dy.
Example
Suppose that X and Y are independent random variables with probability density functions fX(x) = e^{−x}, x > 0 and fY(y) = e^{−y}, y > 0, respectively. Find the probability density function of Z = X + Y.
Solution:
Here fX(z − y) > 0 requires y < z, and fY(y) > 0 requires y > 0.
Example
Solution - continued:
This means that the range of possible values of y is 0 < y < z. Therefore
  fZ(z) = ∫₀^z e^{−(z−y)} e^{−y} dy
        = ∫₀^z e^{−z} dy
        = [ y e^{−z} ]₀^z
        = z e^{−z},  z > 0.
5.5.3 Sum of independent Exponential random variables is a Gamma random variable.

Thus, the sum of two independent Exponential(1) random variables is a Gamma(2, 1) random variable.
In general, if X1, X2, . . . , Xn are independent Exponential(β) random variables, then
  Σ_{i=1}^{n} Xi ∼ Gamma(n, β).
5.5.4 Sum of independent
Gamma random variables is
again a Gamma random variable
The arguments given in the previous example can be extended to derive the following result.
Result
If X1, X2, . . . , Xn are independent with Xi ∼ Gamma(αi, β), then
  Σ_{i=1}^{n} Xi ∼ Gamma( Σ_{i=1}^{n} αi, β ).
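A numerical sketch of this result for n = 2 (SciPy assumed; the shape and scale values are arbitrary illustrations), computing the convolution integral directly and comparing it with the Gamma(α1 + α2, β) density at one point:

import numpy as np
from scipy import stats
from scipy.integrate import quad

a1, a2, beta = 1.5, 2.3, 2.0
fX = stats.gamma(a=a1, scale=beta).pdf
fY = stats.gamma(a=a2, scale=beta).pdf

def fZ(z):
    # convolution integral: f_Z(z) = int_0^z f_X(z - y) f_Y(y) dy
    val, _ = quad(lambda y: fX(z - y) * fY(y), 0.0, z)
    return val

z0 = 4.0
print(fZ(z0), stats.gamma(a=a1 + a2, scale=beta).pdf(z0))   # the two values agree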
5.5.5 Sum of independent Normal
random variables is again a Normal
random variable.
Example
Suppose that X ∼ N(0, 1) and Y ∼ N(0, 1) are independent. That is,
  fX(x) = (1/√(2π)) e^{−x²/2},  −∞ < x < ∞,
  fY(y) = (1/√(2π)) e^{−y²/2},  −∞ < y < ∞.
Find the probability density function of Z = X + Y.
Solution:
By the continuous convolution formula, the probability density function of Z = X + Y is
  fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy.
Example
Solution - continued:
We have
  fX(z − y) = (1/√(2π)) e^{−(z−y)²/2},  −∞ < z − y < ∞.
For any fixed z ∈ (−∞, ∞), we can consider fX(z − y) as a function of y, i.e.,
  fX(z − y) = (1/√(2π)) e^{−(z−y)²/2},  −∞ < y < ∞.
Example
Solution - continued:
∴ fZ(z) = ∫_{−∞}^{∞} (1/√(2π)) e^{−(z−y)²/2} · (1/√(2π)) e^{−y²/2} dy
        = (1/(2π)) ∫_{−∞}^{∞} e^{−z²/2 + zy − y²} dy
        = ( e^{−z²/2} / (2π) ) ∫_{−∞}^{∞} e^{−( y² − zy + z²/4 ) + z²/4} dy
        = ( e^{−z²/4} / (2π) ) ∫_{−∞}^{∞} e^{−( y − z/2 )²} dy.
Example
Solution - continued:
Now for any µ and σ > 0, we have
  ∫_{−∞}^{∞} (1/(σ√(2π))) e^{−(y−µ)²/(2σ²)} dy = 1,
and so
  ∫_{−∞}^{∞} e^{−(y−µ)²/(2σ²)} dy = σ√(2π).
Example
Solution - continued:
Put σ² = 1/2 and µ = z/2. Then ∫_{−∞}^{∞} e^{−(y − z/2)²} dy = √π, so
  fZ(z) = ( e^{−z²/4} / (2π) ) · √π = (1/(2√π)) e^{−z²/4},  −∞ < z < ∞.
That is, Z = X + Y ∼ N(0, 2).
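That conclusion is easy to confirm numerically (NumPy and SciPy assumed): the derived density coincides with the N(0, 2) density.

import numpy as np
from scipy import stats

z = np.linspace(-4.0, 4.0, 9)
derived = np.exp(-z ** 2 / 4) / (2 * np.sqrt(np.pi))                        # density obtained above
print(np.allclose(derived, stats.norm(loc=0, scale=np.sqrt(2)).pdf(z)))     # True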
5.6 Moment Generating Function
Approach
Result
Suppose that X and Y are independent random variables with moment generating functions mX and mY, respectively. Then
  mX+Y(u) = mX(u) · mY(u).
Proof.
If X and Y are independent, then Z = X + Y has moment generating function
  mZ(u) = E(e^{uZ}) = E(e^{u(X+Y)}) = E(e^{uX} e^{uY}) = E(e^{uX}) E(e^{uY}) = mX(u) · mY(u),
where the second-to-last equality uses the independence of X and Y.
Alternative Proof
Recall, the probability density function of Z is
  fZ(z) = ∫_{−∞}^{∞} fX(z − y) fY(y) dy.
Thus,
  E(e^{uZ}) = ∫_{−∞}^{∞} e^{uz} fZ(z) dz
            = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{uz} fX(z − y) fY(y) dy dz
            = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} e^{u(z−y)} fX(z − y) dz ] · e^{uy} fY(y) dy
            = ∫_{−∞}^{∞} e^{ux} fX(x) dx · ∫_{−∞}^{∞} e^{uy} fY(y) dy,   with x = z − y and dx = dz,
            = mX(u) ∫_{−∞}^{∞} e^{uy} fY(y) dy
            = mX(u) · mY(u).
Sum of Independent Random Variables
Using Moment Generating Functions
We can use this approach to find the distribution of the sum of independent random variables using the one-to-one correspondence between distributions and moment generating functions.

Result
If X1, X2, . . . , Xn are independent random variables, then
  m_{X1+···+Xn}(u) = Π_{i=1}^{n} mXi(u).
Proof.
m_{X1+···+Xn}(u) = E( e^{u Σ_{i=1}^{n} Xi} )
                 = E( Π_{i=1}^{n} e^{u Xi} )
                 = Π_{i=1}^{n} E( e^{u Xi} )     (by independence)
                 = Π_{i=1}^{n} mXi(u).
Example
Let X1, X2, . . . , Xn be independent Bernoulli(p) random variables. Thus each Xi has probability function P(Xi = 1) = p and P(Xi = 0) = 1 − p, and moment generating function
  mX(u) = E(e^{uX}) = Σ_{x=0}^{1} e^{ux} P(X = x)
        = e^{u·0} P(X = 0) + e^{u·1} P(X = 1)
        = 1 − p + p e^{u}.
Therefore,
  m_{X1+···+Xn}(u) = Π_{i=1}^{n} mXi(u) = (1 − p + p e^{u})^n,
which is the mgf of a Binomial(n, p) random variable. Thus we can conclude that Σ_{i=1}^{n} Xi ∼ Binomial(n, p).
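A small simulation sketch of this conclusion (NumPy and SciPy assumed; n and p are arbitrary illustrative values):

import numpy as np
from scipy import stats

n, p = 10, 0.3
rng = np.random.default_rng(3)
s = rng.binomial(1, p, size=(200_000, n)).sum(axis=1)   # sums of n Bernoulli(p) variables

z = np.arange(n + 1)
emp = np.bincount(s, minlength=n + 1) / s.size
print(np.max(np.abs(emp - stats.binom(n, p).pmf(z))))    # small sampling error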
Example
Note that if Z ∼ Binomial(n, p), then
  mZ(u) = E(e^{uZ}) = Σ_{all z} e^{uz} P(Z = z)
        = Σ_{z=0}^{n} e^{uz} (n choose z) p^z (1 − p)^{n−z}
        = Σ_{z=0}^{n} (n choose z) (p e^{u})^z (1 − p)^{n−z}
        = (p e^{u} + 1 − p)^n
        = (1 − p + p e^{u})^n.
Example
Suppose that X1, X2, . . . , Xn are independent random variables with Xi ∼ Poisson(λi) for each i = 1, . . . , n. Find the distribution of Σ_{i=1}^{n} Xi.
Example
Thus Xi has the moment generating function
  mXi(u) = E(e^{uXi}) = e^{λi (e^{u} − 1)},
and Σ_{i=1}^{n} Xi has the moment generating function
  m_{X1+···+Xn}(u) = E( e^{u Σ_{i=1}^{n} Xi} ) = Π_{i=1}^{n} E(e^{uXi}) = e^{( Σ_{i=1}^{n} λi )(e^{u} − 1)}.
∴ Σ_{i=1}^{n} Xi ∼ Poisson( Σ_{i=1}^{n} λi ).
Sum of Independent Normal Random
Variables is again Normal.
Sums of independent normal random variables are also normal, with the means and variances added.
Result
If X ∼ N(µX, σX²) and Y ∼ N(µY, σY²) are independent, then
  X + Y ∼ N(µX + µY, σX² + σY²).
The following extension is for general weighted sums.
Result
Let 1 ≤ i ≤ n. If Xi ∼ N(µi, σi²) are independent, then for any set of constants a1, . . . , an,
  Σ_{i=1}^{n} ai Xi ∼ N( Σ_{i=1}^{n} ai µi, Σ_{i=1}^{n} ai² σi² ).
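A short simulation sketch of this result (NumPy assumed; the constants ai, µi, σi are arbitrary illustrative values):

import numpy as np

rng = np.random.default_rng(4)
mu = np.array([1.0, -2.0, 0.5])
sigma = np.array([1.0, 2.0, 0.5])
a = np.array([2.0, -1.0, 3.0])

x = rng.normal(mu, sigma, size=(500_000, 3))
s = x @ a                                    # sum_i a_i X_i

print(s.mean(), a @ mu)                      # both close to 5.5
print(s.var(), a ** 2 @ sigma ** 2)          # both close to 10.25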
Example
Suppose that U is a weighted sum of independent normal random variables as above. In summary,
  U ∼ N(−15, 1224).
5.7 Conditional Expectation and Conditional Variance

Suppose X, Y and Z are random variables.
Result 1
For constants a, b ∈ R,
  E(aY + bZ | X) = a E(Y | X) + b E(Z | X).
Proof.
For a, b ∈ R,
  E(aY + bZ | X = x) = Σ_{y,z} (ay + bz) P(Y = y, Z = z | X = x)
                     = Σ_{y,z} ay P(Y = y, Z = z | X = x) + Σ_{y,z} bz P(Y = y, Z = z | X = x)
                     = a Σ_y y P(Y = y | X = x) + b Σ_z z P(Z = z | X = x)
                     = a E(Y | X = x) + b E(Z | X = x).
Result 2
E(Y | X) ≥ 0 if Y ≥ 0.
Result 3
E(1 | X) = 1.
Result 4
If X and Y are independent, then E(Y | X) = E(Y).
Result 6
❶ E[ E(Y | X) ] = E(Y).
❷ E[ E(g(Y) | X) ] = E(g(Y)) for any suitable function g.
Proof.
We will prove ❷ since ❶ is the special case of ❷ with g(y) = y.
  E[ E(g(Y) | X) ] = E[ Σ_y g(y) P(Y = y | X) ]
                   = Σ_x [ Σ_y g(y) P(Y = y | X = x) ] P(X = x)
                   = Σ_y g(y) [ Σ_x P(Y = y | X = x) P(X = x) ]
                   = Σ_y g(y) [ Σ_x P(Y = y, X = x) ]
                   = Σ_y g(y) P(Y = y)
                   = E(g(Y)).
Result 7 - Tower Property
  E[ E(Y | X, Z) | X ] = E(Y | X) = E[ E(Y | X) | X, Z ].
Proof.
  E[ E(Y | X, Z) | X = x ] = Σ_z [ Σ_y y P(Y = y | X = x, Z = z) ] P(X = x, Z = z | X = x)
                           = Σ_z Σ_y y · [ P(Y = y, X = x, Z = z) / P(X = x, Z = z) ] · [ P(X = x, Z = z) / P(X = x) ]
                           = Σ_y y P(Y = y | X = x)
                           = E(Y | X = x).
Also, E(Y | X) = E[ E(Y | X) | X, Z ] by Result 5.
Result 8 - Conditional Variance Formula
  Var(Y) = E[ Var(Y | X) ] + Var[ E(Y | X) ].
Proof.
The natural definitions are given by
  Var(Y | X = x) = E[ ( Y − E(Y | X = x) )² | X = x ],
  Var(Y) = E[ ( Y − E(Y) )² ].
Then
  Var(Y) = E[ ( Y − E(Y) )² ]
         = E[ E( ( Y − E(Y) )² | X ) ]
         = E[ E( ( Y − E(Y | X) + E(Y | X) − E(Y) )² | X ) ]
         = E[ Var(Y | X) ] + Var[ E(Y | X) ],
since the mean of E(Y | X) is E(Y) and the cross product reduces to zero because
  2 E[ E( ( Y − E(Y | X) ) ( E(Y | X) − E(Y) ) | X ) ] = 2 E[ ( E(Y | X) − E(Y) ) E( Y − E(Y | X) | X ) ] = 0,
that is, E( Y − E(Y | X) | X ) = E(Y | X) − E(Y | X) = 0.
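A simulation sketch of this formula (NumPy assumed; the hierarchical model X ∼ Poisson(3), Y | X = x ∼ N(2x, 1) is an arbitrary illustration):

import numpy as np

rng = np.random.default_rng(5)
n = 400_000
x = rng.poisson(3.0, size=n)                 # X ~ Poisson(3)
y = rng.normal(loc=2.0 * x, scale=1.0)       # Y | X = x ~ N(2x, 1)

# E[Var(Y|X)] = 1 and Var[E(Y|X)] = Var(2X) = 4 * 3 = 12, so Var(Y) should be about 13
print(y.var(), 1 + 4 * 3.0)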
5.8 Supplementary Material
Supplementary Material - Binomial Theorem
Binomial Theorem
Let a and b be constants. Then,
  (a + b)^m = Σ_{y=0}^{m} (m choose y) a^y b^{m−y}
            = Σ_{y=0}^{m} [ m! / ( y! (m − y)! ) ] a^y b^{m−y}.
Supplementary Material - Integration by Parts Formula
Integration by Parts
  ∫_a^b u(x) v′(x) dx = [ u(x) v(x) ]_a^b − ∫_a^b u′(x) v(x) dx.