
STAT2372 Probability

2020

Topic 3
Continuous Random Variables

STAT2372 2020 Topic 3 1


Continuous random variables
• We’ve seen examples of discrete random variables which take on
an infinite but countable number of values.
• There are rvs which take on an uncountable number of values.
• Two examples:
– the time that a train arrives at a specified stop;
– the lifetime of a transistor.
• Since random variables of this type can have a continuum of
possible values, they are called continuous random variables.
• Continuous random variables can take any value over some range.
• It is possible for a rv to be a mixture of a continuous and discrete
rv. Examples:
– Amount of rainfall in a day. On a dry day, the amount of rainfall is exactly zero; on a wet day the amount of rainfall is continuous on (0, ∞).
– The length of time it takes for a customer to commence
service in a single-server queue with continuously distributed
service times. There is a non-zero probability that the queue
is empty and therefore a strictly positive probability that the
time until commencement of service is zero. However, if the
queue is non-empty, the time until commencement of service
is continuous on (0, ∞) .
A random variable X is said to be continuous if it takes on an
uncountably infinite number of values, and if there is a function
fX (x) , called the probability density function, or pdf, such that:
1. fX(x) ≥ 0, ∀x ∈ R;

2. ∫_{−∞}^{∞} fX(x) dx = 1;

3. P(a < X ≤ b) = ∫_a^b fX(x) dx.

[Figure: two panels showing a pdf fX(x) against x; the shaded region in the right panel represents P(−1 < X < 2) as an area under the curve.]

Important:
• Probabilities are represented by areas under the pdf.

• Note that this is not true for discrete distributions.

• The height of fX(x) is never interpreted as P(X = x) for a continuous rv.

The Uniform Distribution U (0, 1)


• Every value between 0 and 1 is “equally likely”.

fX(x) = { 1 ; 0 ≤ x ≤ 1
        { 0 ; otherwise

[Figure: pdf of U(0, 1); fX(x) = 1 on [0, 1] and 0 elsewhere.]


• If X is uniformly distributed on [0, 1], then, if a ≥ 0 and b ≤ 1,
P(a ≤ X ≤ b) = P(a < X < b) = ∫_a^b 1 dx = [x]_a^b = b − a.

[Figure: pdf of U(0, 1) with the area between a and b shaded.]


• From above, for any a,

P(X = a) = lim_{b→a⁺} P(a ≤ X ≤ b) = lim_{b→a⁺} (b − a) = 0.

• This will be true for any continuous rv, i.e. if X is a continuous rv, then P(X = x) = 0, ∀x.

Cumulative Distribution Function (cdf)


• For any rv X, continuous, discrete, or neither, let FX be defined
by
FX (x) = P (X ≤ x) .
• FX (x) is called the distribution function of X or cumulative
distribution function (cdf) of X.



e.g. For the uniform random variable above:



• When b < 0, P (X ≤ b) = 0.

• When b > 1, P (X ≤ b) = 1.

• When 0 < b < 1,


P(X ≤ b) = ∫_0^b 1 dx = [x]_0^b = b.

Therefore we can write the cdf as

FX(b) = P(X ≤ b) = { 0 ; b ≤ 0
                   { b ; 0 < b < 1
                   { 1 ; b ≥ 1



• The domain of the cdf is R. Note that FX (x) needs to be
specified for all values of x, and not just the values of x for which
x is a ‘possible value’ of X.
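As a quick numerical check of the piecewise cdf above, here is a small sketch in Python (these notes use R for their own examples; the function names here are illustrative only):

```python
def uniform01_cdf(x: float) -> float:
    # Cdf of U(0, 1): 0 for x <= 0, x for 0 < x < 1, 1 for x >= 1.
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return x

def uniform01_prob(a: float, b: float) -> float:
    # P(a < X <= b) = F(b) - F(a) = b - a when 0 <= a <= b <= 1.
    return uniform01_cdf(b) - uniform01_cdf(a)
```

Note that, as the slide says, the cdf is defined on all of R: `uniform01_cdf(-3.0)` is 0 and `uniform01_cdf(7.0)` is 1, not an error.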

Properties of a cdf

• FX(−∞) = lim_{x→−∞} P(X ≤ x) = 0.
• FX(∞) = lim_{x→∞} P(X ≤ x) = 1.
• Let x2 ≥ x1. Then FX(x2) ≥ FX(x1).
Proof:

FX(x2) = P(X ≤ x2)
       = P({X ≤ x1} ∪ {x1 < X ≤ x2}).

Since (−∞, x1] and (x1, x2] are mutually exclusive,

FX(x2) = P(X ≤ x1) + P(x1 < X ≤ x2)
       ≥ P(X ≤ x1),

since all probabilities are non-negative. Thus FX(x2) ≥ FX(x1).

• If a continuous rv X has lowest possible value ℓ and greatest possible value u, then FX(ℓ) = 0 and FX(u) = 1.


Example of shapes of cdfs

The picture below is of the cdf of a discrete rv. Since X is discrete, FX(x) is a step-function.

[Figure: step-function cdf of a discrete rv.]



The picture below is of the cdf of a continuous rv. Since X is continuous, FX(x) is continuous.

[Figure: continuous cdf of a continuous rv.]



The Quantile Function

The quantile function of a probability distribution is the inverse F⁻¹ of its cdf F.
If the probability distribution is discrete, then the quantile function is

Q(p) = F⁻¹(p) = inf{x ∈ R : p ≤ F(x)}

for a probability 0 < p < 1; the quantile function returns the minimum value of x for which p ≤ F(x) holds.
If the probability distribution is continuous, then the quantile function is Q(p) = F⁻¹(p).



Specialized Quantiles

• If p = 0.5, then the quantile is called the median.


• If p = 0.25, then the quantile is called the lower quartile.
• If p = 0.75, then the quantile is called the upper quartile.
• If p = 0.1, 0.2, . . . , 0.9, then the quantiles are called deciles.
• If p = 0.01, 0.02, . . . , 0.99, then the quantiles are called
percentiles.
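The inf-based definition above can be implemented directly. A minimal Python sketch (the die example and function names are mine, for illustration):

```python
def quantile(p, support, cdf):
    # Q(p) = inf{x in support : p <= F(x)}, for 0 < p < 1,
    # scanning the sorted support of a discrete rv.
    for x in support:
        if p <= cdf(x):
            return x
    raise ValueError("p must lie in (0, 1)")

# Example: a fair six-sided die, F(x) = x/6 on {1, ..., 6}.
die = [1, 2, 3, 4, 5, 6]
die_cdf = lambda x: x / 6

median = quantile(0.5, die, die_cdf)           # 3, since F(3) = 0.5 exactly
lower_quartile = quantile(0.25, die, die_cdf)  # 2, the smallest x with F(x) >= 0.25
upper_quartile = quantile(0.75, die, die_cdf)  # 5
```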



Median

[Figure: two cdfs F(x), each with the median marked where F(x) = 0.5; the first over x ∈ [0, 5], the second over x ∈ [−0.5, 2].]


Probability Density Functions (pdf)
• The probability of observing a specific value x of a continuous
random variable X is zero. It is thus the cdf, FX (x) , which is
used to define probabilities.
• For a continuous rv X,
FX(x) = ∫_{−∞}^{x} fX(u) du.

Note that the dummy variable in the integral is u, not x.


• The fundamental theorem of calculus gives us the relationship

d/dx FX(x) = fX(x),

i.e. the probability density function fX is the derivative of the cdf FX.



• When X is discrete, there is also a relationship between fX and FX: ∀x,

fX(x) = FX(x) − lim_{y→x⁻} FX(y).

The Uniform Distribution U (a, b)


• Suppose that a random variable X is equally likely to lie in any small sub-interval of (a, b), and that it cannot lie outside this interval. We then say that X ∼ U(a, b), and have

fX(x) = { 1/(b − a) ; a ≤ x ≤ b
        { 0         ; otherwise.



• Thus, if x ∈ (a, b), we have

FX(x) = ∫_{−∞}^{x} fX(u) du
      = ∫_{−∞}^{a} 0 du + ∫_a^x 1/(b − a) du
      = 0 + (1/(b − a)) [u]_{u=a}^{x}
      = (x − a)/(b − a).
• Obviously, we have FX (x) = 0 if x ≤ a and FX (x) = 1 if x ≥ b.
Hence the proper specification of FX is




FX(x) = { 0               ; x < a
        { (x − a)/(b − a) ; a ≤ x ≤ b
        { 1               ; x > b



Checks:

FX(a⁺) = (a − a)/(b − a) = 0 (lower limit, from above)
FX(b⁻) = (b − a)/(b − a) = 1 (upper limit, from below)

Note: The cdf is continuous but not everywhere differentiable, since it is not differentiable at a or b.
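A direct Python translation of this piecewise cdf, as a sketch (the notes' own code is in R; the name here is illustrative):

```python
def unif_cdf(x: float, a: float, b: float) -> float:
    # Cdf of U(a, b): 0 below a, (x - a)/(b - a) on [a, b], 1 above b.
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

# Checks from the slides: F(a) = 0 and F(b) = 1.
```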

The Triangular Distribution (Symmetric)


• If two independent (this will be defined later) rvs are both
U (a, b) , then their sum is triangular on (2a, 2b) .
• The pdf is given in the picture below, where c = 2a and d = 2b.
• We shall prove this later in more generality. The pdf of the sum
of two independent rvs is known as the convolution of the pdfs of
the two rvs.

• For the case where a = 0 and b = 1, we have

fX(x) = { x     ; 0 < x ≤ 1
        { 2 − x ; 1 < x ≤ 2
        { 0     ; otherwise.

• The cdf FX(x) is a little complicated. When x ∈ (0, 1),

FX(x) = ∫_0^x u du = [u²/2]_0^x = x²/2,



while, when x ∈ (1, 2),

FX(x) = P(X ≤ 1) + ∫_1^x (2 − u) du
      = FX(1) + [2u − u²/2]_1^x
      = 1/2 + (2x − x²/2) − (2 − 1/2)
      = 2x − x²/2 − 1
      = 1 − (1/2)(2 − x)².



Thus

FX(x) = { 0                 ; x ≤ 0
        { x²/2              ; 0 < x ≤ 1
        { 1 − (1/2)(2 − x)² ; 1 < x ≤ 2
        { 1                 ; x > 2.
• Note the symmetry about x = 1.
• Checks:

FX(0⁺) = 0²/2 = 0
FX(1⁻) = 1²/2 = 1/2
FX(1⁺) = 2(1) − 1²/2 − 1 = 1/2
FX(2⁻) = 2(2) − 2²/2 − 1 = 1
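The closed-form cdf for a = 0, b = 1 (so the sum lives on (0, 2)) can be checked numerically; a Python sketch:

```python
def tri_cdf(x: float) -> float:
    # Cdf of the symmetric triangular distribution on (0, 2).
    if x <= 0:
        return 0.0
    if x <= 1:
        return x * x / 2
    if x <= 2:
        return 1 - (2 - x) ** 2 / 2
    return 1.0

# Checks from the slides: F(0) = 0, F(1) = 1/2, F(2) = 1,
# plus the symmetry F(1 - t) = 1 - F(1 + t).
```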
The Cauchy Distribution
• What value of k makes k/(1 + x²) a valid pdf on R?

• First, does the function satisfy the non-negativity condition? Obviously, provided k > 0.
• What about its integral?

∫_{−∞}^{∞} k/(1 + x²) dx = k [tan⁻¹x]_{−∞}^{∞}
                         = k[(π/2) − (−π/2)]
                         = kπ.

• The function is thus a pdf if kπ = 1, i.e. if k = 1/π.


• The pdf of a Cauchy random variable is thus

fX(x) = 1/(π(1 + x²)).

• What is the cdf?



• ∀x,

FX(x) = ∫_{−∞}^{x} f(u) du
      = ∫_{−∞}^{x} 1/(π(1 + u²)) du
      = (1/π) [tan⁻¹u]_{−∞}^{x}
      = (1/π) [tan⁻¹x − (−π/2)]
      = 1/2 + (1/π) tan⁻¹x.



Checks:

FX(−∞) = 1/2 + (1/π) tan⁻¹(−∞) = 1/2 + (1/π)(−π/2) = 0,
FX(∞) = 1/2 + (1/π) tan⁻¹(∞) = 1/2 + (1/π)(π/2) = 1,
FX(0) = 1/2 + (1/π) tan⁻¹(0) = 1/2 + 0 = 1/2.
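The closed form FX(x) = 1/2 + (1/π) tan⁻¹x is easy to verify numerically; a Python sketch:

```python
import math

def cauchy_cdf(x: float) -> float:
    # Standard Cauchy cdf: 1/2 + arctan(x)/pi.
    return 0.5 + math.atan(x) / math.pi

# tan(pi/4) = 1, so F(1) = 1/2 + 1/4 = 3/4; by symmetry F(-1) = 1/4.
```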
The Exponential Distribution
• Let V be the time between two successive events in a Poisson
process {X (t)} with parameter λ.
• As the numbers of events in non-overlapping intervals are
independent, this will have the same distribution as the time it
takes a single event to occur, given that X (0) = 0.
• Consider the event {V > v} . This is the same as the event
{X (v) = 0} . To see this, note that if X (v) = 0, then
X (t) = 0, ∀t ≤ v, and so V > v. Conversely, if V > v, then the
time at which the next event occurs is greater than v, and so
X (v) = 0.



• Thus

P(V > v) = P(X(v) = 0 | X(0) = 0)
         = e^{−λv}(λv)⁰/0!
         = e^{−λv}.

Now

FV(v) = P(V ≤ v) = 1 − e^{−λv}.

Hence, for v > 0,

fV(v) = d/dv (1 − e^{−λv}) = λe^{−λv}.



• The pdf

fV(v) = { λe^{−λv} ; v > 0
        { 0        ; v ≤ 0

is known as the exponential distribution with parameter λ.



Some examples of quantities that have been modelled as
exponentially distributed random variables:
• The time it takes for someone to pick up a ringing phone;
• The amount of time until two vehicles meet at a one-lane bridge;
• The amount of time (from now) until an earthquake occurs;

“Lack of Memory” or Memoryless Property

• Let X be distributed exponentially with parameter λ. Consider


the conditional probability

P(X > a + b | X > a).

This is the probability that X exceeds a + b, given that we know X exceeds a. [Note: a and b must both be ≥ 0.]



• By definition,

P(X > a + b | X > a) = P({X > a + b} ∩ {X > a}) / P(X > a)
                     = P(X > a + b) / P(X > a),

by the rule of conditional probability, noting that the event {X > a + b} is a subset of the event {X > a}. Thus



P(X > a + b | X > a) = (1 − P(X ≤ a + b)) / (1 − P(X ≤ a))
                     = (1 − FX(a + b)) / (1 − FX(a))
                     = (1 − {1 − e^{−λ(a+b)}}) / (1 − {1 − e^{−λa}})
                     = e^{−λ(a+b)} / e^{−λa}
                     = e^{−λb}
                     = 1 − (1 − e^{−λb})
                     = 1 − FX(b)
                     = P(X > b).


Thus
P (X > a + b | X > a) = P (X > b)
provided a and b are both non-negative.
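The cancellation above can be seen numerically from the survival function P(X > x) = e^{−λx}; a Python sketch (the values of λ, a, b are arbitrary choices of mine):

```python
import math

def survival(x: float, lam: float) -> float:
    # P(X > x) = exp(-lam * x) for an exponential rv, x >= 0.
    return math.exp(-lam * x)

lam, a, b = 1.0, 1.0, 0.5
cond = survival(a + b, lam) / survival(a, lam)  # P(X > a + b | X > a)
unc = survival(b, lam)                          # P(X > b)
# cond equals unc: the exponential has no memory.
```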

• How do we interpret this? Suppose we have modelled the life of a


lightbulb as being exponentially distributed with parameter 1
(year). Suppose the lightbulb has been working for 1 year. The
above shows that the remaining life is also exponentially
distributed with parameter 1 (year). We say that the exponential
distribution has no memory. This should indicate that the
exponential model is probably not a very good model for the life
of a lightbulb.



The Erlangian Distribution
• Now consider the total time until the second event occurs in a
Poisson process. Let W be the time until the second event occurs
in a Poisson process {X (t)} with parameter λ, when X (0) = 0.
Now

P(W > w) = P(2nd event occurs at a time > w)
         = P(X(w) = 0 or 1).



Thus

1 − FW(w) = P(0 events in (0, w)) + P(1 event in (0, w))
          = e^{−λw}(λw)⁰/0! + e^{−λw}(λw)¹/1!
          = e^{−λw}(1 + λw),

and so

FW(w) = { 1 − e^{−λw}(1 + λw) ; w > 0
        { 0                   ; w ≤ 0

• The pdf of W is found by differentiating:

fW(w) = d/dw FW(w)
      = { λe^{−λw}(1 + λw) − e^{−λw}λ ; w > 0
        { 0                           ; w ≤ 0

Thus

fW(w) = { λ²we^{−λw} ; w > 0
        { 0          ; w ≤ 0.

• Generalisation: What is the pdf of the time until the kth event in
a Poisson process with parameter λ?



The Gamma Function

• The gamma function is defined by

Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx,

where the condition α > 0 is needed for the integral to be finite.

• Properties:

Γ(1) = ∫_0^∞ e^{−x} dx = [−e^{−x}]_0^∞ = [(−0) − (−1)] = 1.



Now

Γ(2) = ∫_0^∞ x e^{−x} dx = ∫_0^∞ (−x) d(e^{−x}).

Thus, integrating by parts,

Γ(2) = [−xe^{−x}]_0^∞ − ∫_0^∞ (−e^{−x}) dx
     = [(−0) − (−0)] + Γ(1)
     = 1.



More generally, when α > 0,

Γ(α + 1) = αΓ(α).

Proof:

Γ(α + 1) = ∫_0^∞ x^α e^{−x} dx
         = ∫_0^∞ (−x^α) d(e^{−x})
         = −[x^α e^{−x}]_0^∞ + α ∫_0^∞ x^{α−1} e^{−x} dx
         = 0 + αΓ(α).



Note that for integer α,

Γ(α) = (α − 1)Γ(α − 1)
     = (α − 1)(α − 2)Γ(α − 2)
     = ···
     = (α − 1)(α − 2) · · · 1 · Γ(1)
     = (α − 1)!
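Python's `math.gamma` implements Γ, so the recurrence and the factorial identity can be checked directly (a sketch; the value α = 2.7 is an arbitrary choice of mine):

```python
import math

# Recurrence: Gamma(alpha + 1) = alpha * Gamma(alpha), for alpha > 0.
alpha = 2.7
lhs = math.gamma(alpha + 1)
rhs = alpha * math.gamma(alpha)

# Integer case: Gamma(5) = 4! = 24.
g5 = math.gamma(5)
```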

• If α > 0, but is not integer, then the Γ function is still defined. In


fact, as long as we know the values of Γ(x) for x ∈ (0, 1), we can
calculate all values of Γ(x). For example,

Γ(3 1/2) = Γ(7/2) = (5/2) Γ(5/2)
                  = (5/2) × (3/2) Γ(3/2)
                  = (5/2) × (3/2) × (1/2) Γ(1/2)
                  = (15/8) Γ(1/2).

It can be shown that Γ(1/2) = √π.

Let W = Wk be the time until the occurrence of the kth event in a Poisson process with parameter λ. Then, for k = 1, 2, 3, . . .

1 − FW(w) = P(W > w)
          = P(0 or 1 or . . . or (k − 1) events occur in (0, w))
          = Σ_{r=0}^{k−1} e^{−λw}(λw)^r / r!

(Note that the kth event has to occur at a time > w.) Hence

FW(w) = { 1 − e^{−λw} Σ_{r=0}^{k−1} (λw)^r / r! ; w > 0
        { 0                                     ; w ≤ 0



and

fW(w) = −d/dw { e^{−λw} Σ_{r=0}^{k−1} (λw)^r / r! }
      = λe^{−λw} Σ_{r=0}^{k−1} (λw)^r / r! − e^{−λw} Σ_{r=1}^{k−1} rλ(λw)^{r−1} / r!
      = λe^{−λw} Σ_{r=0}^{k−1} (λw)^r / r! − λe^{−λw} Σ_{s=0}^{k−2} (λw)^s / s!
      = λe^{−λw} (λw)^{k−1} / (k − 1)!
      = λ^k w^{k−1} e^{−λw} / (k − 1)!
Special cases:



1. k = 1. W is exponential with parameter λ. As expected,

   fW(w) = { λe^{−λw} ; w > 0
           { 0        ; w ≤ 0

2. k integer. W is said to have the Erlangian distribution. The rv W can be thought of as the sum of k independent and identically distributed exponential rvs, each with parameter λ. This has resulted in the Erlangian distribution being used in instances where the lack of memory of the exponential prevents it from being used.

   fW(w) = { λ^k w^{k−1} e^{−λw} / (k − 1)! ; w > 0
           { 0                              ; w ≤ 0.

3. Although we have assumed that k is an integer in the above, we

know that when α > 0, the function

   f(w) = { λ^α w^{α−1} e^{−λw} / Γ(α) ; w > 0
          { 0                          ; w ≤ 0

is a pdf, since, using the substitution x = λw,

   ∫_0^∞ λ^α w^{α−1} e^{−λw} / Γ(α) dw = ∫_0^∞ x^{α−1} e^{−x} / Γ(α) dx = 1.

4. The parameter α is also known as the shape parameter. It is usual to put β = 1/λ; β is called the scale parameter. The pdf is then known as the Gamma distribution with parameters α and β:

   fX(x) = x^{α−1} e^{−x/β} / (β^α Γ(α))



This is denoted as X ∼ G(α, β).
5. The chi-squared distribution with n degrees of freedom, χ²_n, is a particular case of G(α, β) with α = n/2, β = 2. The pdf of χ²_n, n > 0, is

   fX(x) = { x^{n/2−1} e^{−x/2} / (2^{n/2} Γ(n/2)) ; x > 0
           { 0                                     ; x ≤ 0.
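The densities above are related; a Python sketch checking that the Erlangian is the integer-α Gamma, and that χ² with 2 degrees of freedom is exponential with λ = 1/2 (the function names are mine, for illustration):

```python
import math

def gamma_pdf(w: float, alpha: float, lam: float) -> float:
    # lam^alpha * w^(alpha-1) * e^(-lam*w) / Gamma(alpha), for w > 0.
    if w <= 0:
        return 0.0
    return lam ** alpha * w ** (alpha - 1) * math.exp(-lam * w) / math.gamma(alpha)

def erlang_pdf(w: float, k: int, lam: float) -> float:
    # lam^k * w^(k-1) * e^(-lam*w) / (k-1)!, for w > 0, k a positive integer.
    if w <= 0:
        return 0.0
    return lam ** k * w ** (k - 1) * math.exp(-lam * w) / math.factorial(k - 1)

def chi2_pdf(x: float, n: float) -> float:
    # Chi-squared with n df: the Gamma pdf with alpha = n/2, lam = 1/2.
    return gamma_pdf(x, n / 2, 0.5)
```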



Some shapes of the Gamma pdf are shown below.

[Figure: Gamma pdfs fX(x) for (α, β) = (1, 5), (2, 5), (3, 5), plotted for x from 0 to 30.]



The Beta Distribution

This is a two-parameter density function defined over the closed interval [0, 1]. As such, it is often used as a model for proportions, such as the proportion of impurities in a chemical product or the proportion of time that a machine is in a state of being repaired, etc.

The beta function

• The beta function is defined by

B(α, β) = ∫_0^1 x^{α−1} (1 − x)^{β−1} dx.

• For the existence of the integral, we need α > 0 and β > 0.



• We’ll now show that

B(α, β) = Γ(α)Γ(β) / Γ(α + β).

• By definition,

Γ(α)Γ(β) = ∫_0^∞ e^{−x} x^{α−1} dx ∫_0^∞ e^{−y} y^{β−1} dy
         = ∫_0^∞ e^{−x} x^{α−1} ( ∫_0^∞ e^{−y} y^{β−1} dy ) dx.

• This is the volume between the surface z = e^{−x} x^{α−1} e^{−y} y^{β−1} and the x–y plane.

• Let’s make the change of variable u = x + y in the inner integral. Note that dy = du. The above integral is then

∫_0^∞ ( ∫_x^∞ e^{−u} x^{α−1} (u − x)^{β−1} du ) dx.

• Changing the order of integration, the integral becomes

∫_0^∞ ( ∫_0^u e^{−u} x^{α−1} (u − x)^{β−1} dx ) du.



• Now put x = uz. We get dx = u dz and the integral becomes

∫_0^∞ ( ∫_0^1 e^{−u} (uz)^{α−1} (u − uz)^{β−1} u dz ) du
= ∫_0^∞ ( ∫_0^1 u^{α+β−1} e^{−u} z^{α−1} (1 − z)^{β−1} dz ) du
= ( ∫_0^∞ u^{α+β−1} e^{−u} du ) ( ∫_0^1 z^{α−1} (1 − z)^{β−1} dz )
= Γ(α + β) B(α, β).

• Hence

Γ(α)Γ(β) = Γ(α + β) B(α, β)

and

B(α, β) = Γ(α)Γ(β) / Γ(α + β).
• For the special case where α = β = 1/2, we obtain, since Γ(1) = ∫_0^∞ e^{−x} dx = 1,

{Γ(1/2)}² = Γ(1) B(1/2, 1/2) = B(1/2, 1/2).

• Now

B(1/2, 1/2) = ∫_0^1 x^{−1/2} (1 − x)^{−1/2} dx.

• Put x = sin²θ. Then dx = 2 sinθ cosθ dθ and so

B(1/2, 1/2) = ∫_0^{π/2} (1/(sinθ cosθ)) · 2 sinθ cosθ dθ
            = 2 ∫_0^{π/2} dθ = π.

• We thus have

{Γ(1/2)}² = π



and so

∫_0^∞ (e^{−x}/√x) dx = √π.
• We’ll need this soon.
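The identity B(α, β) = Γ(α)Γ(β)/Γ(α + β) and the value B(1/2, 1/2) = π can be verified numerically in Python (midpoint-rule integration; the helper names are mine):

```python
import math

def beta_via_gamma(a: float, b: float) -> float:
    # B(a, b) computed from the gamma-function identity.
    return math.gamma(a) * math.gamma(b) / math.gamma(a + b)

def beta_via_integral(a: float, b: float, n: int = 10000) -> float:
    # B(a, b) from its defining integral, by the midpoint rule.
    h = 1.0 / n
    return h * sum(((i + 0.5) * h) ** (a - 1) * (1 - (i + 0.5) * h) ** (b - 1)
                   for i in range(n))
```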



The Beta Distribution

• The random variable X is said to follow the beta distribution with parameters α and β (α > 0 and β > 0) if

fX(x) = { (1/B(α, β)) x^{α−1} (1 − x)^{β−1} ; x ∈ (0, 1)
        { 0                                 ; elsewhere

       = { (Γ(α + β)/(Γ(α)Γ(β))) x^{α−1} (1 − x)^{β−1} ; x ∈ (0, 1)
         { 0                                           ; elsewhere.



• The graphs of a few beta pdfs are shown below.

[Figure: beta pdfs for several (α, β) values.]
• The standard uniform distribution (on the range 0 to 1) is just the beta with parameters α = 1 and β = 1.



The Normal Distribution
• The normal or Gaussian distribution is perhaps the most widely used of all the continuous probability distributions. The shape of the pdf is the familiar bell-shaped curve. The two parameters, µ and σ², completely determine the location (µ) and shape (σ²) of the normal pdf:

f(x) = (1/√(2πσ²)) exp{ −(1/2) ((x − µ)/σ)² } ; −∞ < x < ∞

• Why is this a pdf? From above, putting y = (x − µ)/σ, we have

∫_{−∞}^{∞} exp{ −(1/2) ((x − µ)/σ)² } dx = ∫_{−∞}^{∞} exp(−y²/2) σ dy
                                         = 2σ ∫_0^∞ exp(−y²/2) dy.




Now put u = y²/2. Then y = √(2u), dy = (1/√(2u)) du, and so the above is

2σ ∫_0^∞ (1/√(2u)) exp(−u) du = √2 σ Γ(1/2) = √(2πσ²).

• Thus f (x) is a pdf.


• When X has this pdf, we write X ∼ N(µ, σ²).

The standard normal has µ = 0 and σ² = 1, and the notation Z is usually reserved to denote a rv having this special distribution. We shall interpret the parameters µ and σ² later.



1. [Figure: pdf of Z ∼ N(0, 1)]

2. [Figure: pdf of X ∼ N(µ, σ²)]


Checks:
1. f is obviously non-negative;
2. We’ve shown its integral is 1.
Hence

f(x) = (1/(√(2π) σ)) e^{−(1/2)((x−µ)/σ)²} ; −∞ < x < ∞

is a valid probability density function.
We can obtain a random variable X which has the N(µ, σ²) distribution from Z which has the N(0, 1) distribution via the equations

Z = (X − µ)/σ and X = µ + σZ.

They have the same shapes but different scales and locations.
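The standardisation can be checked numerically. Python's standard library has no normal cdf, but Φ can be written with the error function, Φ(z) = (1 + erf(z/√2))/2 (a sketch; function names are mine):

```python
import math

def phi(z: float) -> float:
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_cdf(x: float, mu: float, sigma: float) -> float:
    # P(X <= x) for X ~ N(mu, sigma^2), by standardising: Phi((x - mu)/sigma).
    return phi((x - mu) / sigma)
```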



Cumulative Distribution Function

• The normal cdf cannot be written down in a nice closed form.


But neither can the log, exp, sin and cos functions.
• The normal cdf is important enough to be tabulated. It is also
very easy to calculate it using the well-known function erf, which
is called the “error function”.
• If X ∼ N(µ, σ²), then the transformation

Z = (X − µ)/σ ∼ N(0, 1)

can be used to determine “areas under the curve” for any normal distribution.
can be used to determine “areas under the curve” for any normal
distribution.
• Some tables give the area between 0 and x (x > 0), some between −∞ and x, and some between x and ∞. All areas can be evaluated from any of these sets of tables.
• The cdf of a standard normal random variable is commonly written as Φ(z), while the probability density function is written as φ(z). Thus

Φ(z) = ∫_{−∞}^{z} φ(u) du,

where

φ(z) = (2π)^{−1/2} e^{−z²/2}.

• Note that Φ(−z) = 1 − Φ(z), since φ(z) is symmetric about z = 0.



The Normal Approximation to the Binomial distribution

• The Poisson distribution can be used to approximate the


Binomial when n is large and p is very small. However, when p is
“medium” and n “large”, the binomial is better approximated by
the normal distribution. We shall prove this later on.
• If
Y ∼ Bin (n, p)
then we may approximate Y ’s distribution with that of a normal
rv with

µ = np
σ 2 = np (1 − p) .

The interpretation of µ and σ 2 will come later.



 
• Thus we may approximate P(Y ≤ y) by Φ((y − np)/√(np(1 − p))).

Continuity Correction

• Since the binomial is discrete and the normal is continuous, an


additional factor must be included to account for this.



[Figure: binomial probabilities P(X = x) for x = 11 to 18, drawn as bars.]

• The B(n, p) probability P(X = x) is approximated by the area under the N(np, np(1 − p)) curve between x − 1/2 and x + 1/2.
• As an example, consider the probability of getting between 70
and 72 heads (inclusive) when tossing a biased coin 100 times,



where the probability of a heads appearing on any one toss is 0.6.
• Let Y be the number of heads in 100 tosses of the coin. Then

Y ∼ Bin (100, 0.6) .

• Now, writing C(n, r) for the binomial coefficient,

P(70 ≤ Y ≤ 72) = C(100, 70)(0.6)^70 (0.4)^30 + C(100, 71)(0.6)^71 (0.4)^29
                 + C(100, 72)(0.6)^72 (0.4)^28
              ≃ 0.0201824



• Using the normal approximation, we have

µ = np = 100(0.6) = 60
σ² = np(1 − p) = 100(0.6)(0.4) = 24

• Without Continuity Correction: the area under the curve between 70 and 72 is

P(70 ≤ Y ≤ 72) = P((70 − 60)/√24 ≤ (Y − µ)/σ ≤ (72 − 60)/√24)
              ≃ P(2.04 ≤ Z ≤ 2.45)
              = 0.0136

The relative error is 33%.
• With Continuity Correction: the area under the curve between 70 − 1/2 and 72 + 1/2 is

P(70 ≤ Y ≤ 72) = P(70 − 1/2 ≤ Y ≤ 72 + 1/2)
              = P((69.5 − 60)/√24 ≤ (Y − µ)/σ ≤ (72.5 − 60)/√24)
              ≃ P(1.939 ≤ Z ≤ 2.552)
              ≃ 0.9946 − 0.9737 = 0.0209

which is a much better approximation. The relative error is now


only 3.1%.
• What is the probability of obtaining fewer than 50 heads?



• The exact answer is

P(Y < 50) = P(Y ≤ 49)
          = Σ_{r=0}^{49} C(100, r) (0.6)^r (0.4)^{100−r}

The R command pbinom(49,100,0.6) gives 0.01676169.

• Using the normal approximation with continuity correction,

P(Y < 50) = P(Y ≤ 49)
          = P((Y − µ)/σ ≤ (49.5 − 60)/√24)
          ≃ P(Z < −2.14)
          ≃ 0.0162,

for which the relative error is −3.6%.



• Without continuity correction, we obtain

P(Y < 50) = P((Y − µ)/σ < (50 − 60)/√24)
          ≃ P(Z < −2.04)
          ≃ 0.0207,

for which the relative error is 23%.
• Some books call the normal approximation to the binomial the
De Moivre-Laplace Limit Theorem.
• This approximation is quite good for values of n and p satisfying np(1 − p) ≥ 10.
There are no rules, in general, how big a sample size should be
before the approximation is arbitrarily accurate. The n = 30 rule
you may have seen before is just a rule of thumb.
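The whole worked example can be reproduced in a few lines of Python (math.comb for the binomial coefficients, erf for Φ); a sketch:

```python
import math

def phi(z: float) -> float:
    # Standard normal cdf via the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def binom_pmf(n: int, p: float, y: int) -> float:
    # P(Y = y) for Y ~ Bin(n, p).
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

n, p = 100, 0.6
mu = n * p                          # 60
sigma = math.sqrt(n * p * (1 - p))  # sqrt(24)

exact = sum(binom_pmf(n, p, y) for y in range(70, 73))         # ~0.0202
no_cc = phi((72 - mu) / sigma) - phi((70 - mu) / sigma)        # ~0.0135
with_cc = phi((72.5 - mu) / sigma) - phi((69.5 - mu) / sigma)  # ~0.0208
```

The continuity-corrected value lands much closer to the exact binomial probability, as the slides report.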



R functions

x = seq(-0.5, 2, 0.1)
Fx = x^2
Fx[x <= 0] = 0
Fx[x > 1] = 1
plot(x, Fx, type = "l", ylab = "F(x)",
     main = "Cumulative distribution function", lwd = 2)



[Figure: the resulting plot of F(x) against x for x from −0.5 to 2, titled “Cumulative distribution function”.]



For each standard probability distribution, there are 4 R functions. For example, for the uniform distribution:
• dunif(x,min=a,max=b) returns the value of the uniform
probability density function U(a, b) at point x.
> dunif(0.1,0,2)
[1] 0.5
• punif(x,min=a,max=b) returns the value of the cumulative
distribution function (cdf) of the U(a, b) at point x.
> punif(0.1,0,2)
[1] 0.05
• qunif(q,min=a,max=b) returns the q-quantile of the uniform
distribution U(a, b).
> qunif(0.7,0,2)
[1] 1.4



• runif(M,min=a,max=b) generates M random draws from the uniform distribution U(a, b).
> runif(5,0,2)
[1] 0.8512345 0.6658686 1.4998520 0.9878262 1.2498115
Similarly, there are 4 R functions for other continuous distributions
considered in this topic:
• Cauchy dcauchy, pcauchy, qcauchy, rcauchy
• exponential dexp, pexp, qexp, rexp
• Gamma dgamma, pgamma, qgamma, rgamma
• chi-squared dchisq, pchisq, qchisq, rchisq
• Beta dbeta, pbeta, qbeta, rbeta
• normal dnorm, pnorm, qnorm, rnorm
