
MA2216/ST2131 Probability

Notes 5
Distribution of a Function of a Random Variable
and
Miscellaneous Remarks
1. Change-of-Variable.
To begin with, let us recall the affine transformation of a normal r.v. mentioned in §8.8 of Lecture Notes 4.
To be precise, let $X \sim N(\mu, \sigma^2)$. Put
$$Y = aX + b.$$
Then, it has been pointed out in §8.8 of Notes 4 that, for $a \neq 0$,
$$Y \sim N(a\mu + b,\; a^2\sigma^2).$$
Equivalently, for $y \in \mathbb{R}$, the p.d.f. of $Y$ is given by
$$f_Y(y) = \frac{1}{\sqrt{2\pi(a\sigma)^2}}\, e^{-\frac{(y - a\mu - b)^2}{2(a\sigma)^2}}. \tag{1.1}$$

Derivation of (1.1).

Recall that
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}.$$

It takes two steps for us to determine the distribution of $Y$: to find $F_Y$, the d.f. of $Y$, first, and then to differentiate $F_Y$ to get $f_Y$, the p.d.f. of $Y$.
Consider the case $a > 0$ first.
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(aX + b \le y) = \mathbb{P}\Big(X \le \frac{y - b}{a}\Big) = F_X\Big(\frac{y - b}{a}\Big).$$

To identify the distribution of $Y$, we need to find its p.d.f., $f_Y$, which is the derivative of $F_Y$: for $y \in \mathbb{R}$,
$$f_Y(y) = \frac{d}{dy} F_Y(y) = \frac{d}{dy} F_X\Big(\frac{y - b}{a}\Big) = f_X\Big(\frac{y - b}{a}\Big)\,\frac{1}{a} = \frac{1}{a}\,\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\left(\frac{y-b}{a} - \mu\right)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi(a\sigma)^2}}\, e^{-\frac{(y - a\mu - b)^2}{2(a\sigma)^2}}.$$
We recognize that it is the p.d.f. of $N(a\mu + b,\; a^2\sigma^2)$, which is (1.1).


If $a < 0$, then
$$F_Y(y) = \mathbb{P}\Big(X \ge \frac{y - b}{a}\Big) = 1 - F_X\Big(\frac{y - b}{a}\Big).$$
Hence,
$$f_Y(y) = -\frac{1}{a}\,\frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\left(\frac{y-b}{a} - \mu\right)^2}{2\sigma^2}} = \frac{1}{\sqrt{2\pi(a\sigma)^2}}\, e^{-\frac{(y - a\mu - b)^2}{2(a\sigma)^2}},$$
which is obviously the p.d.f. of the normal distribution $N(a\mu + b,\; a^2\sigma^2)$, and therefore, we are done.
In particular, if we set
$$Z = \frac{X - \mu}{\sigma},$$
then $Z$ is a standard normal r.v., i.e., $Z \sim N(0, 1)$. This is perhaps the most frequently used transformation.
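As a quick numerical illustration (not part of the original notes), the following minimal Python sketch simulates $Y = aX + b$ for one arbitrary choice of $\mu, \sigma, a, b$ and compares the sample mean and variance of $Y$ with $a\mu + b$ and $(a\sigma)^2$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, a, b = 1.0, 2.0, -3.0, 5.0          # arbitrary illustrative values

y = a * rng.normal(mu, sigma, size=1_000_000) + b

print(y.mean(), a * mu + b)        # sample mean vs. a*mu + b
print(y.var(), (a * sigma) ** 2)   # sample variance vs. (a*sigma)^2
```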

In the following, we will see a few additional examples on the derivation of the distribution of a transformed r.v.
1. Example. Let $Z \sim N(0, 1)$. What are the d.f. and p.d.f. of $Y$, where $Y = Z^2$?
Note that $Y$ is known in the literature as the chi-square ($\chi^2$) r.v. of degree 1.
Sol. First note that $Y$ takes nonnegative values. Obviously, $Y$ is continuous, and hence we determine its c.d.f. first. For $y > 0$, we have
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(Z^2 \le y) = \mathbb{P}(-\sqrt{y} \le Z \le \sqrt{y}) = \frac{1}{\sqrt{2\pi}} \int_{-\sqrt{y}}^{\sqrt{y}} e^{-z^2/2}\, dz = \frac{2}{\sqrt{2\pi}} \int_0^{\sqrt{y}} e^{-z^2/2}\, dz,$$
hence
$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{2}{\sqrt{2\pi}}\, e^{-y/2}\, \frac{1}{2\sqrt{y}} = \frac{y^{-1/2} e^{-y/2}}{\sqrt{2\pi}} = \frac{(\tfrac{1}{2})^{\frac{1}{2}}}{\Gamma(\tfrac{1}{2})}\, y^{\frac{1}{2} - 1} e^{-\frac{1}{2} y}.$$
We conclude that
$$F_Y(y) = \begin{cases} 0, & \text{for } y \le 0, \\[4pt] \dfrac{2}{\sqrt{2\pi}} \displaystyle\int_0^{\sqrt{y}} e^{-u^2/2}\, du, & \text{for } y > 0, \end{cases}$$
and
$$f_Y(y) = \begin{cases} 0, & \text{for } y \le 0, \\[4pt] \dfrac{(\tfrac{1}{2})^{\frac{1}{2}}}{\Gamma(\tfrac{1}{2})}\, y^{\frac{1}{2} - 1} e^{-\frac{1}{2} y}, & \text{for } y > 0. \end{cases}$$
Remark. Do you recognize the distribution of the above $Y$?
In fact, the distribution of $Z^2$ with $Z \sim N(0, 1)$ is $\Gamma(\tfrac{1}{2}, \tfrac{1}{2})$. Thus,
$$Z^2 \sim \Gamma\Big(\frac{1}{2}, \frac{1}{2}\Big). \tag{1.2}$$
This is a very useful result, which will be quoted often throughout.

Addendum. Refer to §1.4 of Notes 7, in which the following property will be established:
If $X_i$, $i = 1, 2, \ldots, n$, are independent gamma r.v.s with respective parameters $(\alpha_i, \lambda)$, $i = 1, 2, \ldots, n$, then
$$\sum_{i=1}^{n} X_i \sim \Gamma\Big(\sum_{i=1}^{n} \alpha_i,\; \lambda\Big). \tag{1.3}$$
Now, apply this proposition to the sum $\sum_{i=1}^{n} Z_i^2$, where $Z_1, Z_2, \ldots, Z_n$ is a sequence of i.i.d. standard normal random variables. Put
$$X = \sum_{i=1}^{n} Z_i^2.$$
The distribution of $X$ is called the chi-square ($\chi^2$) distribution of $n$ degrees of freedom, which is the same as $\Gamma(\tfrac{n}{2}, \tfrac{1}{2})$. That is,
$$\sum_{i=1}^{n} Z_i^2 \sim \Gamma\Big(\frac{n}{2}, \frac{1}{2}\Big). \tag{1.4}$$
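The relation (1.4) is easy to check by simulation. The sketch below (illustrative only; $n = 4$ is an arbitrary choice) compares the empirical c.d.f. of $\sum_{i=1}^n Z_i^2$ with the $\Gamma(n/2, 1/2)$ c.d.f.; note that scipy parametrizes the gamma distribution by a scale equal to $1/\lambda$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 4
x = (rng.standard_normal((200_000, n)) ** 2).sum(axis=1)   # sum of n squared normals

for q in (1.0, 2.0, 5.0, 10.0):
    print(q, (x <= q).mean(), stats.gamma.cdf(q, a=n / 2, scale=2.0))
```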

2. Example. Let $X \sim N(0, 1)$. Define $Y = e^X$, commonly known as the lognormal random variable. Find the p.d.f. $f_Y$.
Sol. First note that $Y$ takes nonnegative values. Therefore, for $y > 0$, we have
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(e^X \le y) = \mathbb{P}(X \le \ln y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\ln y} e^{-x^2/2}\, dx,$$
hence
$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{1}{\sqrt{2\pi}}\, \frac{1}{y}\, e^{-(\ln y)^2/2}.$$
To be precise,
$$f_Y(y) = \begin{cases} 0, & \text{for } y \le 0, \\[4pt] \dfrac{1}{y\sqrt{2\pi}}\, e^{-(\ln y)^2/2}, & \text{for } y > 0. \end{cases}$$
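For readers who want a numerical check (not part of the notes), the derived c.d.f. $F_Y(y) = \Phi(\ln y)$ can be compared with the empirical c.d.f. of simulated values of $e^X$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
y = np.exp(rng.standard_normal(500_000))   # Y = e^X with X ~ N(0, 1)

for q in (0.5, 1.0, 2.0, 5.0):
    print(q, (y <= q).mean(), stats.norm.cdf(np.log(q)))   # empirical vs. Phi(ln q)
```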

3. Example. Let $X$ be a continuous random variable with p.d.f. $f_X$. Let $Y = X^n$ where $n$ is odd. Find the p.d.f. $f_Y$.
Sol. Let $y \in \mathbb{R}$. We have
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(X^n \le y) = \mathbb{P}(X \le y^{1/n}) = F_X(y^{1/n}),$$
hence
$$f_Y(y) = \frac{dF_Y(y)}{dy} = \frac{1}{n}\, y^{\frac{1}{n} - 1}\, f_X(y^{1/n}).$$

4. Example. Let $X$ be an exponential r.v. with parameter $\lambda$. That is, $X \sim \mathrm{Exp}(\lambda)$. Set
$$Y = [[X]] + 1.$$
(Note that $[[x]] = k$ iff $k \le x < k + 1$ for integers $k$.) Find the distribution of $Y$.
Sol. Obviously, $Y$ is a discrete r.v., taking values in the set of positive integers. For $y = 1, 2, \ldots$,
$$f_Y(y) = \mathbb{P}\{Y = y\} = \mathbb{P}\{[[X]] + 1 = y\} = \mathbb{P}\{[[X]] = y - 1\} = \mathbb{P}\{y - 1 \le X < y\} = \mathbb{P}\{X \ge y - 1\} - \mathbb{P}\{X \ge y\} = e^{-\lambda(y-1)} - e^{-\lambda y} = e^{-\lambda(y-1)}\big(1 - e^{-\lambda}\big).$$
Let $p = 1 - e^{-\lambda}$ and $q = e^{-\lambda}$. The above p.d.f. can now be written as
$$f_Y(y) = q^{y-1} p$$
for $y = 1, 2, \ldots$, and we conclude that $Y \sim \mathrm{Geom}(p)$.
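A short simulation sketch (illustrative only; $\lambda = 0.7$ is an arbitrary choice) comparing the empirical p.m.f. of $[[X]] + 1$ with $q^{y-1}p$:

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.7
x = rng.exponential(scale=1 / lam, size=1_000_000)   # numpy uses scale = 1/lambda
y = np.floor(x).astype(int) + 1

p = 1 - np.exp(-lam)
for k in range(1, 6):
    print(k, (y == k).mean(), (1 - p) ** (k - 1) * p)   # empirical vs. q^{k-1} p
```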

5. Let $X \sim \mathrm{Exp}(1)$. Set $Y = X/\lambda$, where $\lambda > 0$. Determine the distribution of $Y$.
Sol. Obviously, $Y$ is continuous, taking non-negative values. For $y > 0$,
$$F_Y(y) = \mathbb{P}\{Y \le y\} = \mathbb{P}\{X/\lambda \le y\} = \mathbb{P}\{X \le \lambda y\} = 1 - e^{-\lambda y}.$$
The p.d.f. $f_Y(\cdot)$ is thus obtained: for $y > 0$,
$$f_Y(y) = \frac{dF_Y(y)}{dy} = \lambda e^{-\lambda y}.$$
We conclude that $Y \sim \mathrm{Exp}(\lambda)$.
Alternatively, if $Y \sim \mathrm{Exp}(\lambda)$ is given, and we let $X = \lambda Y$, then $X \sim \mathrm{Exp}(1)$.
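A minimal simulation sketch of this scaling relation (the value of $\lambda$ is an arbitrary choice; recall that $\mathrm{Exp}(\lambda)$ has mean $1/\lambda$ and variance $1/\lambda^2$):

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 2.5
y = rng.exponential(scale=1.0, size=1_000_000) / lam   # Exp(1) sample divided by lam

print(y.mean(), 1 / lam)       # mean of Exp(lam) is 1/lambda
print(y.var(), 1 / lam ** 2)   # variance of Exp(lam) is 1/lambda^2
```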

6. Example. Let $X$ be a r.v. whose c.d.f. is given by
$$F_X(x) = \begin{cases} 0, & \text{for } x \le \nu, \\[4pt] 1 - \exp\Big\{-\Big(\dfrac{x - \nu}{\alpha}\Big)^{\beta}\Big\}, & \text{for } x > \nu. \end{cases}$$
(Note that this distribution is called a Weibull distribution with parameters $\nu$, $\alpha > 0$, and $\beta > 0$. Such a distribution is widely used in engineering practice due to its versatility. Differentiating $F_X$ yields
$$f_X(x) = \begin{cases} 0, & \text{for } x \le \nu, \\[4pt] \dfrac{\beta}{\alpha}\Big(\dfrac{x - \nu}{\alpha}\Big)^{\beta - 1} \exp\Big\{-\Big(\dfrac{x - \nu}{\alpha}\Big)^{\beta}\Big\}, & \text{for } x > \nu, \end{cases}$$
which is the p.d.f. of $X$.)
Let $Y = \Big(\dfrac{X - \nu}{\alpha}\Big)^{\beta}$. Determine the distribution of $Y$.
Sol. $Y$ takes non-negative values. For $y > 0$,
$$F_Y(y) = \mathbb{P}\{Y \le y\} = \mathbb{P}\Big\{\Big(\frac{X - \nu}{\alpha}\Big)^{\beta} \le y\Big\} = \mathbb{P}\big\{X \le \alpha y^{1/\beta} + \nu\big\} = 1 - e^{-y},$$
which is the c.d.f. of an exponential distribution with parameter $\lambda = 1$. Thus,
$$Y \sim \mathrm{Exp}(1).$$
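The following sketch (illustrative only) generates a Weibull sample by inverting $F_X$ for one arbitrary choice of $(\nu, \alpha, \beta)$ and checks that $Y = ((X - \nu)/\alpha)^{\beta}$ has mean and variance close to 1, as an $\mathrm{Exp}(1)$ r.v. should.

```python
import numpy as np

rng = np.random.default_rng(5)
nu, alpha, beta = 1.0, 2.0, 1.5                   # arbitrary illustrative values
u = rng.uniform(size=1_000_000)
x = nu + alpha * (-np.log(1 - u)) ** (1 / beta)   # X = F_X^{-1}(U), a Weibull sample

y = ((x - nu) / alpha) ** beta
print(y.mean(), y.var())                          # both close to 1, as for Exp(1)
```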

7. Theorem. Let $X$ be a continuous r.v. having p.d.f. $f_X$. Suppose that $g(x)$ is a strictly monotone (increasing or decreasing) differentiable function of $x$. Then the r.v. $Y$ defined by $Y = g(X)$ has a p.d.f. given by
$$f_Y(y) = \begin{cases} f_X[g^{-1}(y)]\, \Big|\dfrac{d}{dy}\, g^{-1}(y)\Big|, & \text{if } y = g(x) \text{ for some } x, \\[6pt] 0, & \text{if } y \neq g(x) \text{ for all } x, \end{cases}$$
where $g^{-1}(y)$ is defined to equal that value of $x$ such that $g(x) = y$, i.e., $g^{-1}(y)$ is the inverse function of $g$.
Sketch of a Proof. Let us try to determine the distribution function of $Y$ first.
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(g(X) \le y) = \mathbb{P}(X \le g^{-1}(y)) \quad \text{(assuming $g$ is increasing)} = F_X(g^{-1}(y)).$$
To find $f_Y(y)$, we differentiate $F_Y(y)$ w.r.t. $y$, and get
$$f_Y(y) = \frac{d}{dy}\, F_X(g^{-1}(y)) = \frac{d}{dx}\, F_X(x)\Big|_{x = g^{-1}(y)} \cdot \frac{d}{dy}\, g^{-1}(y) = f_X[g^{-1}(y)]\, \frac{d}{dy}\, g^{-1}(y).$$
If $g$ is decreasing, we have to modify the above derivation a little bit:
$$F_Y(y) = \mathbb{P}(Y \le y) = \mathbb{P}(g(X) \le y) = \mathbb{P}(X \ge g^{-1}(y)) \quad \text{(assuming $g$ is decreasing)} = 1 - F_X(g^{-1}(y)).$$
Thus,
$$f_Y(y) = \frac{d}{dy}\big[1 - F_X(g^{-1}(y))\big] = -\frac{d}{dx}\, F_X(x)\Big|_{x = g^{-1}(y)} \cdot \frac{d}{dy}\, g^{-1}(y) = f_X[g^{-1}(y)]\Big[-\frac{d}{dy}\, g^{-1}(y)\Big] = f_X[g^{-1}(y)]\, \Big|\frac{d}{dy}\, g^{-1}(y)\Big|.$$
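As an illustration of the theorem (not part of the original notes), the sketch below takes $X \sim N(0, 1)$ and the strictly increasing choice $g(x) = x^3$, and compares empirical interval probabilities of $Y = X^3$ with integrals of the density $f_X(g^{-1}(y))\,\big|\frac{d}{dy} g^{-1}(y)\big|$.

```python
import numpy as np
from scipy import stats, integrate

def f_Y(y):
    ginv = np.cbrt(y)                           # g^{-1}(y) = y^{1/3}
    dginv = np.abs(y) ** (-2.0 / 3.0) / 3.0     # |d/dy g^{-1}(y)|
    return stats.norm.pdf(ginv) * dginv

rng = np.random.default_rng(6)
y = rng.standard_normal(500_000) ** 3           # Y = X^3 with X ~ N(0, 1)

# intervals are chosen away from the integrable singularity at y = 0
for a, b in [(-2.0, -0.1), (0.1, 1.0), (1.0, 8.0)]:
    emp = ((y > a) & (y <= b)).mean()           # empirical P(a < Y <= b)
    thy, _ = integrate.quad(f_Y, a, b)          # integral of the claimed density
    print((a, b), emp, thy)
```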

8. Remark. Let $X \sim N(\mu, \sigma^2)$. Put $Y = aX + b = g(X)$, where $g(x) = ax + b$.
Assume $a \neq 0$. Thus, the function $g$ is a strictly monotone differentiable function of $x$, i.e., $g(\cdot)$ satisfies the assumption in the above theorem. Obviously,
$$g^{-1}(y) = \frac{y - b}{a},$$
with $g'(x) = a$ and $\dfrac{d}{dy}\, g^{-1}(y) = \dfrac{1}{a}$.
As a result, we arrive at the p.d.f. of $Y$:
$$f_Y(y) = f_X[g^{-1}(y)]\, \Big|\frac{d}{dy}\, g^{-1}(y)\Big| = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{\left(\frac{y-b}{a} - \mu\right)^2}{2\sigma^2}}\, \Big|\frac{1}{a}\Big| = \frac{1}{\sqrt{2\pi a^2\sigma^2}}\, e^{-\frac{[y - (a\mu + b)]^2}{2 a^2 \sigma^2}} = \frac{1}{\sqrt{2\pi(a\sigma)^2}}\, e^{-\frac{(y - a\mu - b)^2}{2(a\sigma)^2}}.$$

9. Remark. Can the above theorem be applied to the example in item 1?
10. Theorem 7 concerns the change of variable in the one-dimensional setting. Later in the course, we will learn how to make a change of variables in the multi-dimensional setting. (See Lecture Notes 8.)

2. Remarks & Examples

Several examples/remarks are presented below for reference only. (We won't go through them in class.)
1. Stirling's Formula & De Moivre-Laplace's Thm.
(Refer to Kai Lai Chung's book Elementary Probability Theory with Stochastic Processes, Appendix 2 of Chapter 7.)
$$n! \sim \sqrt{2\pi}\, n^{n + 1/2}\, e^{-n} \quad \text{as } n \to \infty, \quad \text{i.e.,} \quad \lim_{n \to \infty} \frac{n!\, e^{n}}{n^{n + 1/2}} = \sqrt{2\pi}. \tag{2.1}$$
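A quick numerical illustration of (2.1) (not part of the notes), computing the ratio $n!\, e^n / n^{n + 1/2}$ via the log-gamma function to avoid overflow:

```python
import math

for n in (5, 20, 100, 500):
    log_ratio = math.lgamma(n + 1) + n - (n + 0.5) * math.log(n)   # ln(n! e^n / n^{n+1/2})
    print(n, math.exp(log_ratio), math.sqrt(2 * math.pi))          # ratio -> sqrt(2*pi)
```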

2. It has been shown that an exponential distribution possesses the so-called memorylessness property. Indeed, it is the unique distribution (on $\mathbb{R}_+$) possessing this property.
Proof. To see this, suppose $X$ is a non-negative real-valued r.v. which admits the memorylessness property, and let
$$\bar{F}(x) = \mathbb{P}\{X > x\}.$$
Then, the memorylessness yields
$$\bar{F}(s + t) = \bar{F}(s)\, \bar{F}(t).$$
In other words, $\bar{F}(\cdot)$ satisfies the functional equation
$$g(s + t) = g(s)\, g(t).$$
Obviously,
$$g\Big(\frac{2}{n}\Big) = g\Big(\frac{1}{n} + \frac{1}{n}\Big) = g\Big(\frac{1}{n}\Big)^2,$$
and repeating this yields $g\big(\tfrac{m}{n}\big) = g\big(\tfrac{1}{n}\big)^m$.
Similarly,
$$g(1) = g\Big(\frac{1}{n} + \cdots + \frac{1}{n}\Big) = g\Big(\frac{1}{n}\Big)^n, \quad \text{or} \quad g\Big(\frac{1}{n}\Big) = g(1)^{1/n}.$$
Hence $g\big(\tfrac{m}{n}\big) = g(1)^{m/n}$. Recall that a distribution function is defined to be right-continuous. Clearly, $\bar{F}(\cdot) = 1 - F_X(\cdot)$ is right-continuous as well. By making use of the right-continuity of $\bar{F}$, we claim that
$$g(x) = g(1)^x.$$
In turn, we conclude that for $x > 0$,
$$g(x) = e^{-\lambda x},$$
where $\lambda = -\ln g(1)$.

3. A variation of the exponential distribution is the distribution of a r.v. that is equally likely to be either positive or negative and whose absolute value is exponentially distributed with parameter $\lambda > 0$. Determine the d.f. and p.d.f. of this r.v.
Sol. Let $X$ denote such a r.v. By assumption, $X$ is equally likely to be either positive or negative, and hence $X$ and $-X$ have the identical distribution. Thus, for $x > 0$,
$$\mathbb{P}\{X < -x\} = \mathbb{P}\{X > x\}.$$
On the one hand, for $x > 0$,
$$\mathbb{P}\{|X| \le x\} = \mathbb{P}\{-x \le X \le x\} = 1 - 2\, \mathbb{P}\{X > x\},$$
and on the other hand, $\mathbb{P}\{|X| \le x\} = 1 - e^{-\lambda x}$. We obtain, for $x > 0$,
$$\mathbb{P}\{X > x\} = \frac{1}{2}\, e^{-\lambda x}.$$
The c.d.f. of $X$ is hence given by
$$\mathbb{P}\{X \le x\} = \begin{cases} \tfrac{1}{2}\, e^{-\lambda |x|}, & x < 0, \\[4pt] 1 - \tfrac{1}{2}\, e^{-\lambda x}, & x \ge 0. \end{cases}$$
Differentiating w.r.t. $x$ yields the p.d.f., which is given by
$$f_X(x) = \frac{\lambda}{2}\, e^{-\lambda |x|}, \quad \text{for } x \in \mathbb{R}.$$
Remark. The above distribution is called the Laplacian distribution with parameter $\lambda$. It is sometimes called the double exponential distribution with parameter $\lambda$.
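A short simulation sketch (illustrative only; $\lambda = 1.5$ is arbitrary): multiplying an $\mathrm{Exp}(\lambda)$ magnitude by a random sign reproduces the c.d.f. derived above.

```python
import numpy as np

rng = np.random.default_rng(7)
lam = 1.5
size = 500_000
x = rng.choice([-1, 1], size=size) * rng.exponential(scale=1 / lam, size=size)

def laplace_cdf(t):
    return 0.5 * np.exp(lam * t) if t < 0 else 1 - 0.5 * np.exp(-lam * t)

for t in (-1.0, -0.2, 0.0, 0.5, 2.0):
    print(t, (x <= t).mean(), laplace_cdf(t))   # empirical vs. derived c.d.f.
```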
4. Suppose that if you are $s$ minutes early for an appointment, then you incur a cost $cs$, and if you are $s$ minutes late, then you incur a cost $ks$. Suppose the travelling time from where you are to the location of your appointment is a continuous random variable having p.d.f. $f$. Determine the time at which you should depart if you want to minimize your expected cost.
Sol. Let $X$ be your travel time. If you leave $t$ minutes before your appointment, then the cost, call it $C_t(X)$, is given by
$$C_t(X) = \begin{cases} c(t - X), & \text{if } X \le t, \\ k(X - t), & \text{if } X > t. \end{cases}$$
Therefore,
$$\mathbb{E}[C_t(X)] = \int_0^{\infty} C_t(x)\, f(x)\, dx = \int_0^{t} c(t - x)\, f(x)\, dx + \int_t^{\infty} k(x - t)\, f(x)\, dx$$
$$= ct \int_0^{t} f(x)\, dx - c \int_0^{t} x f(x)\, dx + k \int_t^{\infty} x f(x)\, dx - kt \int_t^{\infty} f(x)\, dx$$
$$= ct\, F(t) - c \int_0^{t} x f(x)\, dx + k \int_t^{\infty} x f(x)\, dx - kt\, [1 - F(t)].$$
The value of $t$ which minimizes $\mathbb{E}[C_t(X)]$ is obtained by calculus. Differentiation yields
$$\frac{d}{dt}\, \mathbb{E}[C_t(X)] = c F(t) + ct f(t) - ct f(t) - kt f(t) - k\,[1 - F(t)] + kt f(t) = (k + c)\, F(t) - k.$$
Equating to 0, the minimal expected cost is obtained when you leave $t^*$ minutes before your appointment, where $t^*$ satisfies
$$F(t^*) = \frac{k}{k + c},$$
that is, $t^* = F^{-1}\big(\tfrac{k}{k+c}\big)$ if $F^{-1}$ exists.
Question. How do we know that this $t^*$ gives us a minimum and not a maximum?
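For a concrete illustration (not from the notes), suppose the travel time is exponential with mean 30 minutes and $c = 1$, $k = 3$; then $t^* = F^{-1}(3/4) = 30 \ln 4 \approx 41.6$ minutes. The sketch below checks this against a brute-force search over $t$, which also supports the claim that $t^*$ is a minimizer.

```python
import numpy as np

rng = np.random.default_rng(8)
c, k, mean_travel = 1.0, 3.0, 30.0                # illustrative values
x = rng.exponential(scale=mean_travel, size=200_000)

t_star = -mean_travel * np.log(1 - k / (k + c))   # F^{-1}(k/(k+c)) for this F

ts = np.linspace(5, 90, 171)
costs = [np.mean(np.where(x <= t, c * (t - x), k * (x - t))) for t in ts]
print(t_star, ts[int(np.argmin(costs))])          # both close to 41.6
```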
5. A stick of length 1 is broken at random. Determine the expected length of the piece that contains the point $P$ at distance $p$ from one end, call it $A$, where $0 \le p \le 1$.
Sol. Let $U$ be the point (measured from $A$) where the stick is broken into two pieces. Then, $U$ is uniformly distributed, i.e., the p.d.f. of $U$ is given by
$$f_U(u) = \begin{cases} 1, & \text{for } 0 < u < 1, \\ 0, & \text{otherwise.} \end{cases}$$
Let $L_p(U)$ denote the length of the piece which contains the point $P$, and note that
$$L_p(U) = \begin{cases} 1 - U, & \text{if } U < p, \\ U, & \text{if } U > p. \end{cases}$$
Hence,
$$\mathbb{E}[L_p(U)] = \int_{-\infty}^{\infty} L_p(u)\, f_U(u)\, du = \int_0^{1} L_p(u) \cdot 1\, du = \int_0^{p} (1 - u)\, du + \int_p^{1} u\, du = \frac{1}{2} + p(1 - p).$$
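A minimal simulation check (illustrative only; $p = 0.3$ is arbitrary) of the answer $\tfrac{1}{2} + p(1 - p)$:

```python
import numpy as np

rng = np.random.default_rng(9)
p = 0.3
u = rng.uniform(size=1_000_000)       # break point, uniform on (0, 1)
length = np.where(u < p, 1 - u, u)    # length of the piece containing the point at p
print(length.mean(), 0.5 + p * (1 - p))
```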

6. Buses arrive at a specified stop at 15-minute intervals starting at 7 am. That is, they arrive at 7, 7:15, 7:30, 7:45, and so on. If a passenger arrives at the stop at a time that is uniformly distributed between 7 and 7:30, find the probability that he waits
(i) less than 5 minutes for a bus;
(ii) more than 10 minutes for a bus.
Sol. Let $X$ denote the arrival time of the passenger (in minutes after 7). Then,
$$X \sim U(0, 30).$$
(i) The passenger waits less than 5 minutes for a bus when and only when he arrives (a) between 7:10 and 7:15 or (b) between 7:25 and 7:30. Therefore the desired probability is given by
$$\mathbb{P}(10 < X < 15) + \mathbb{P}(25 < X < 30) = \frac{15 - 10}{30} + \frac{30 - 25}{30} = \frac{1}{3}.$$
(ii) The passenger waits more than 10 minutes for a bus when and only when he arrives (a) between 7:00 and 7:05 or (b) between 7:15 and 7:20. And so the desired probability is
$$\mathbb{P}(0 < X < 5) + \mathbb{P}(15 < X < 20) = \frac{5}{30} + \frac{20 - 15}{30} = \frac{1}{3}.$$
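A small simulation sketch (illustrative only) of the two waiting-time probabilities:

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.uniform(0, 30, size=1_000_000)      # arrival time, minutes after 7:00
wait = np.where(x <= 15, 15 - x, 30 - x)    # time until the next bus (7:15 or 7:30)
print((wait < 5).mean())    # about 1/3
print((wait > 10).mean())   # about 1/3
```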
