Continuous Random Variables and Their Probability Distributions

This document outlines key concepts related to probability density functions (PDFs) for continuous random variables. It defines a PDF as a non-negative function whose area under the curve between any two values equals the probability that the random variable falls within that range. Several examples are provided to demonstrate calculating probabilities using PDF graphs and formulas. Common PDF shapes like uniform, normal, and exponential are also introduced.

Chapter 4

Continuous Random Variables and Their Probability Distributions

1 / 59
Outline

4.1 Probability Density Functions

4.2 Cumulative Distribution Functions and Expected Values

4.3 The Normal Distribution

4.4 The Exponential and Gamma Distributions

4.5 Other Continuous Distributions

4.6 Probability Plots

2 / 59
Probability Density Functions
Just as probability is conceived as a long-run relative frequency, the idea
of a probability density curve for a continuous probability distribution draws
on the relative frequency histogram for a large number of measurements.
Example
The times (in seconds) for 158 firefighters to place a ladder against a
building, pull out a section of fire hose, drag a weighted object, and
crawl in a simulated attic were reported as follows.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 425 427 287 296 270 294 365 279 319 289 297 267 336 374 380
2 386 334 238 281 222 261 370 291 334 350 291 294 389 417 256
3 266 302 254 356 400 276 312 353 305 291 268 421 386 342 286
4 228 285 293 399 352 294 276 269 323 438 378 269 317 317 254
5 354 438 313 297 333 386 320 331 300 226 276 312 264 236 287
6 262 304 285 264 289 368 321 291 254 327 277 285 363 289 240
7 317 299 339 417 286 280 278 288 266 303 350 273 303 296 261
8 292 403 269 221 247 395 228 275 278 317 255 276 284 373 416
9 225 283 336 240 347 403 278 328 305 247 302 410 385 268 302
10 296 264 322 268 234 292 252 339 342 257 315 317 308 259 246
11 232 250 266 279 286 328 304 406

3 / 59
Probability Density Functions (cont’d)
Recall that a relative frequency histogram has the following properties:
1 The total area under the histogram is 1.
2 For two points a and b such that each is a boundary point of some class, the relative frequency of measurements in the interval a to b is the area under the histogram above this interval.
Relative frequency histograms are given below for the data grouped into class intervals of width 40 and of width 10, together with a smoothed density curve (bandwidth = 25).

[Figure: Relative frequency histograms of the physical-test times (class widths 40 and 10) with a smoothed density curve overlaid; horizontal axis: Time (250-450), vertical axis: Density.]
4 / 59
Probability Density Functions (cont’d)
By proceeding in this manner, even further refinements of relative frequency
histogram can be imagined with smaller class intervals. The jumps between
consecutive rectangles tend to dampen out, and the top of the histogram
approximates the shape of a smooth curve.
Because the probability is interpreted as long-run relative frequency, the curve
obtained as the limiting form of the relative frequency histograms represents the
manner in which the total probability 1 is distributed over the interval of possible
values of the random variable X . This curve is called the probability density
curve of the continuous random variable. The mathematical function f (x) whose
graph produces this curve is called its probability density function.

Definition
Let X be a continuous r.v. Then a probability distribution or probability
density function (pdf) of X is a non-negative function f (x) such that for
any two numbers a and b with a ≤ b, P[a ≤ X ≤ b] = ∫_a^b f (x) dx. The
graph of f (x) is often referred to as the density curve.

5 / 59
Probability Density Functions (cont’d)
For f (x) to be a legitimate pdf it must satisfy the following conditions.
1 f (x) ≥ 0 for all possible x.
2 For any two numbers a and b with a ≤ b, P[a ≤ X ≤ b] = ∫_a^b f (x) dx.
3 The total area under the probability density curve is 1, i.e. ∫_{all possible x} f (x) dx = 1, or ∫_{−∞}^{∞} f (x) dx = 1 if X can take any real value.
Unlike a discrete probability function, the probability density f (x) of a
continuous random variable does not represent the probability that the random
variable exactly equals x. Instead, it is only meaningful to talk about the
probability that X lies in an interval.
With a continuous random variable, the probability that X takes any specific
value is always 0.
Because of the zero probability at a single point, for a continuous random
variable X,
P[a ≤ X ≤ b] = P[a < X ≤ b] = P[a ≤ X < b] = P[a < X < b],
and P[a < X < b] = {area to the left of b} − {area to the left of a}.

6 / 59
Shapes of Density Curves

[Figure: Common density-curve shapes, e.g. uniform (flat), normal (bell-shaped), and exponential (right-skewed).]
7 / 59
Probability Calculation Using Density Curves
Example
The direction of an imperfection with respect to a reference line on a
circular object such as a tire, brake rotor, or flywheel is, in general, subject
to uncertainty. Consider the reference line connecting the valve stem on a
tire to the center point, and let X be the angle measured clockwise to the
location of an imperfection. One possible pdf (uniform distribution) for X is
f (x) = 1/360 for 0 ≤ x < 360; 0 otherwise.
Find the probabilities that
(a) the angle is between 90◦ and 180◦;
(b) the angle of occurrence is within 90◦ of the reference line.
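Both answers are plain area calculations under the flat density; a minimal Python sketch (illustrative only — the slides themselves use R for computation):

```python
# Uniform pdf on [0, 360): an interval's probability is its length / 360.
def p_uniform(a, b, lo=0.0, hi=360.0):
    """P[a <= X <= b] for X ~ UNIF(lo, hi), clipping [a, b] to the support."""
    a, b = max(a, lo), min(b, hi)
    return max(b - a, 0.0) / (hi - lo)

# (a) angle between 90 and 180 degrees
p_a = p_uniform(90, 180)                      # 90/360 = 0.25
# (b) within 90 degrees of the reference line: [0, 90) or (270, 360)
p_b = p_uniform(0, 90) + p_uniform(270, 360)  # 180/360 = 0.5
print(p_a, p_b)
```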

Definition
A continuous r.v. X is said to have a uniform distribution on [α, β],
denoted by UNIF(α, β), if the pdf of X is
f (x; α, β) = 1/(β − α) for α ≤ x ≤ β; 0 otherwise.
8 / 59
Probability Calculation Using Density Curves (cont’d)
Example
The actual tracking weight of a stereo cartridge that is set to track at 3 g on a
particular changer can be regarded as a continuous rv X with pdf
f (x) = k[1 − (x − 3)²] for 2 ≤ x ≤ 4; 0 otherwise.
1 Sketch the graph of f (x).
2 Find the value of k.
3 What is the probability that the actual tracking weight is greater than the prescribed weight?
4 What is the probability that the actual tracking weight is within 0.25 g of the prescribed weight?
5 What is the probability that the actual tracking weight differs from the prescribed weight by more than 0.5 g?
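The slide leaves items 2-5 as exercises; as an illustrative numerical sketch (Python with Simpson's rule, rather than the R used elsewhere in these slides), each answer is a single integral of the pdf:

```python
# f(x) = k[1 - (x-3)^2] on [2, 4]; find k from "total area = 1", then integrate.
def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b) \
        + 4 * sum(g(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1)) \
        + 2 * sum(g(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

base = lambda x: 1 - (x - 3) ** 2
k = 1 / simpson(base, 2, 4)          # total area must be 1  ->  k = 3/4
f = lambda x: k * base(x)
p_gt3  = simpson(f, 3, 4)            # item 3: P[X > 3] = 0.5 by symmetry
p_near = simpson(f, 2.75, 3.25)      # item 4: P[|X - 3| <= 0.25]
p_far  = 1 - simpson(f, 2.5, 3.5)    # item 5: P[|X - 3| > 0.5]
print(k, p_gt3, p_near, p_far)
```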

9 / 59
Probability Calculation Using Density Curves (cont’d)
Example
Find k so that the following function can serve as the probability density of
a random variable:
f (x) = kx e^{−4x²} for x > 0; 0 for x ≤ 0.

Example
"Time headway" in traffic flow is the elapsed time between the time that one car
finishes passing a fixed point and the instant that the next car begins to pass that
point. Let X be the time headway for two randomly selected cars on a freeway
during a period of heavy flow. The following pdf of X was suggested in Leo
Breiman, R. Lawrence, D. Goodwin, and B. Bailey (1977, The statistical
properties of freeway traffic, Transportation Research, 11(4), 221-228):
f (x) = 0.15 e^{−0.15(x−0.5)} for x ≥ 0.5; 0 otherwise.
(a) Is this a valid pdf? Sketch the graph of f (x).
(b) Find the probability that headway time is at most 5 seconds.
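Integrating the pdf gives a closed-form cdf, which answers both parts; a short Python check (illustrative, stdlib only):

```python
import math

# Headway pdf f(x) = 0.15*exp(-0.15*(x - 0.5)) for x >= 0.5; integrating gives
# the cdf F(x) = 1 - exp(-0.15*(x - 0.5)) for x >= 0.5 (and 0 below 0.5).
def F(x):
    return 0.0 if x < 0.5 else 1.0 - math.exp(-0.15 * (x - 0.5))

# (a) validity: f >= 0 everywhere and F(x) -> 1 as x grows (total area 1)
print(F(1e9))
# (b) probability that headway time is at most 5 seconds: 1 - exp(-0.675)
p = F(5.0)
print(round(p, 4))
```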

10 / 59
Probability Calculation Using Density Curves (cont’d)

Example
Let X denote the vibratory stress (psi) on a wind turbine blade at a
particular wind speed in a wind tunnel. Veers, P.S. (1982, Blade fatigue
life assessment with application to VAWTS, Journal of Solar Energy
Engineering, 104(2), 107-111) proposed the Rayleigh distribution, with pdf
f (x; θ) = (x/θ²) e^{−x²/(2θ²)} for x > 0; 0 otherwise.
1 Verify that f (x; θ) is a legitimate pdf.
2 Suppose θ = 100 (a value suggested by a graph in the article). What
is the probability that X is at most 200? Less than 200? At least 200?
3 What is the probability that X is between 100 and 200 (again
assuming θ = 100)?
4 Give an expression for P(X ≤ x).
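Item 4's answer, F(x) = 1 − e^{−x²/(2θ²)}, follows by substitution in the integral, and items 2-3 then reduce to evaluating this cdf; a Python sketch (illustrative values only):

```python
import math

# Rayleigh cdf: integrating f(t) = (t/theta^2) * exp(-t^2/(2 theta^2)) from 0
# to x gives F(x) = 1 - exp(-x^2/(2 theta^2)); F -> 1, so the pdf is legitimate.
def rayleigh_cdf(x, theta):
    return 0.0 if x <= 0 else 1.0 - math.exp(-x * x / (2 * theta ** 2))

theta = 100.0
p_at_most = rayleigh_cdf(200, theta)        # P[X <= 200] = 1 - e^-2
# P[X < 200] is the same, since X is continuous
p_at_least = 1 - p_at_most                  # P[X >= 200] = e^-2
p_between = rayleigh_cdf(200, theta) - rayleigh_cdf(100, theta)
print(round(p_at_most, 4), round(p_at_least, 4), round(p_between, 4))
```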

11 / 59
Cumulative Distribution Functions
Recall that the cumulative distribution function (cdf) F (x) = P(X ≤ x) for a discrete
rv X is obtained by summing the pmf f (t) over all possible
values satisfying t ≤ x. The cdf F (x) = P(X ≤ x) of a continuous rv X is
obtained by integrating the pdf f (t) between the limits −∞ and x.
Definition
The cumulative distribution function F (x) for a continuous rv X is
defined for every number x by F (x) = P(X ≤ x) = ∫_{−∞}^{x} f (t) dt.
[Figure: Left panel ("Cumulative Probability Function and Density Curve"): density curve f(x) with the shaded area up to 7 representing F(7); right panel ("Cumulative Distribution Function"): the corresponding cdf F(x) rising from 0 to 1, with F(7) marked.]
12 / 59
Cumulative Distribution Functions (cont’d)
Example
Let X, the thickness of a certain metal sheet, have a uniform distribution
on [A, B]. The pdf is f (x) = 1/(B − A) for A ≤ x ≤ B; 0 otherwise.
Verify that the cdf is
F (x) = 0 for x < A;
F (x) = (x − A)/(B − A) for A ≤ x ≤ B;
F (x) = 1 for x > B.

Using F (x) to compute probabilities


Let X be a continuous rv with pdf f (x) and cdf F (x). Then for any
number a,
P(X ≤ a) = F (a), P(X > a) = 1 − F (a),
and for any two numbers a and b with a < b,

P(a ≤ X ≤ b) = F (b) − F (a).

13 / 59
Cumulative Distribution Functions (cont’d)

Example
Suppose the pdf of the magnitude X of a dynamic load on a bridge (in
newtons) is given by
f (x) = 1/8 + (3/8)x for 0 ≤ x ≤ 2; 0 otherwise.
(a). Verify that
F (x) = 0 for x < 0;
F (x) = x/8 + (3/16)x² for 0 ≤ x ≤ 2;
F (x) = 1 for x > 2.
(b). P(1 ≤ X ≤ 1.5) = F (1.5) − F (1) = 0.297.


(c). P(X > 1) = 1 − P(X ≤ 1) = 0.688.
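Parts (b) and (c) come straight from the cdf; a quick Python check (illustrative sketch):

```python
# cdf of the bridge-load example: F(x) = x/8 + 3*x^2/16 on [0, 2].
def F(x):
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x / 8 + 3 * x ** 2 / 16

p_b = F(1.5) - F(1.0)   # (b) P[1 <= X <= 1.5] = 0.296875 ~ 0.297
p_c = 1 - F(1.0)        # (c) P[X > 1] = 0.6875 ~ 0.688
print(p_b, p_c)
```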

14 / 59
Cumulative Distribution Functions (cont’d)
Example
Let X be the amount of time for which a book on 2-hour reserve at a
college library is checked out by a randomly selected student, and suppose
that X has density function f (x) = 0.5x for 0 ≤ x ≤ 2; 0 otherwise.
(a). Verify that the cdf of X is
F (x) = 0 for x < 0;
F (x) = x²/4 for 0 ≤ x ≤ 2;
F (x) = 1 for x ≥ 2.
(b). Use both the pdf and the cdf to show that P[X ≤ 1] = 0.25,
P[0.5 ≤ X ≤ 1] = 0.1875, and P[X > 0.5] = 0.9375.
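A quick Python check of part (b) from the cdf (note that 0.1875 arises as F(1) − F(0.5), the probability assigned to the interval [0.5, 1]):

```python
# cdf of the 2-hour-reserve example: F(x) = x^2/4 on [0, 2] (f(x) = 0.5x there).
def F(x):
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x * x / 4

p1 = F(1.0)            # P[X <= 1]        = 0.25
p2 = F(1.0) - F(0.5)   # P[0.5 <= X <= 1] = 0.1875
p3 = 1 - F(0.5)        # P[X > 0.5]       = 0.9375
print(p1, p2, p3)
```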

Obtaining f (x) from F (x): For a discrete rv, the pmf is obtained from the
cdf by taking the difference between two F (x) values. The continuous
analog of a difference is a derivative.
If X is a continuous rv with pdf f (x) and cdf F (x), then at every x at
which the derivative F ′(x) exists, F ′(x) = f (x).
15 / 59
Percentiles of a Continuous Distribution
Let p be a number between 0 and 1. The (100p)th percentile of the
distribution of a continuous rv X, denoted by η(p), is defined by
p = ∫_{−∞}^{η(p)} f (t) dt.
According to the definition, η(p) is the x value on the horizontal axis such
that 100p% of the area under the graph of f (x) lies to the left of η(p) and
100(1 − p)% lies to the right.
The first quartile, second quartile (median), and third quartile are the 25th,
50th, and 75th percentiles.

[Figure: Left: density curve f(x) with shaded area p to the left of η(p); right: cdf F(x) rising to 1, with F(η(p)) = p.]
16 / 59
Percentiles of a Continuous Distribution (cont’d)
Example
The distribution of the amount of gravel (in tons) sold by a particular
construction supply company in a given week is a continuous rv X with pdf
f (x) = 1.5(1 − x²) for 0 ≤ x ≤ 1; 0 otherwise.
The cdf of sales for any x between 0 and 1 is
F (x) = ∫_0^x 1.5(1 − t²) dt = 1.5(x − x³/3).
The (100p)th percentile of the distribution satisfies the equation
p = 1.5(η(p) − η(p)³/3), or η(p)³ − 3η(p) + 2p = 0.
The median of a continuous distribution, denoted by µ̃, is the 50th
percentile; that is, F (µ̃) = 0.5. Half the area under the density curve is to
the left of µ̃ and half is to the right of µ̃.

> f<-function(x) x^3-3*x+2*0.5
> uniroot(f,lower=0,upper=1)$root
[1] 0.3472963
> f<-function(x) x^3-3*x+2*0.8
> uniroot(f,lower=0,upper=1)$root
[1] 0.6084005
> f<-function(x) x^3-3*x+2*0.9
> uniroot(f,lower=0,upper=1)$root
[1] 0.7293046

17 / 59
Expected Values and Variances
Definition
The expected or mean value of a continuous rv X with pdf f (x) is
µ_X = E (X) = ∫_{−∞}^{∞} t f (t) dt.
If h(X) is any function of X, then E [h(X)] = µ_{h(X)} = ∫_{−∞}^{∞} h(t) f (t) dt.
The variance of a continuous rv X with pdf f (x) and mean µ is
σ²_X = V (X) = E (X − µ)² = ∫_{−∞}^{∞} (t − µ)² f (t) dt.
The standard deviation (SD) of X is σ_X = √V (X).

Note that V (X) = E (X²) − [E (X)]².

The standardized variable is
Z = (X − µ_X)/σ_X = (variable − mean)/(standard deviation).
Usually, the mean of a rv X is to the left of its median if the distribution is
skewed to the left, and the mean is to the right of the median if the
distribution is skewed to the right.
18 / 59
Expected Values and Variances (cont’d)

Example
The distribution of the amount of gravel (in tons) sold by a particular
construction supply company in a given week is a continuous rv X with pdf
f (x) = 1.5(1 − x²) for 0 ≤ x ≤ 1; 0 otherwise.
Show that E (X) = 3/8 and V (X) = 19/320.

Example
Two species are competing in a region for control of a limited amount of a certain
resource. Let X be the proportion of the resource controlled by species one, with
pdf f (x) = 1 for 0 ≤ x ≤ 1; 0 otherwise. Show that the majority share of the
resource controlled by one species, h(X) = max(X, 1 − X), has E [h(X)] = 3/4 and
V [h(X)] = 1/48.

19 / 59
Expected Values and Variances (cont’d)

Example
Prof. Jay Devore commutes to work. He must first get on a bus near his house and then
transfer to a second bus. If the waiting time (in minutes) at each stop has a uniform
distribution on [0, 5] and the two buses are independent, then it can be shown that his
total waiting time X has pdf
f (x) = x/25 for 0 ≤ x < 5;
f (x) = 2/5 − x/25 for 5 ≤ x < 10;
f (x) = 0 otherwise.

1 What is the probability that total waiting time is at most 3 minutes?


2 What is the probability that total waiting time is between 3 and 8 minutes?
3 What is the probability that total waiting time is either less than 2 minutes or
more than 6 minutes?
4 Compute the cdf of X .
5 Obtain an expression for the 100pth percentile.
6 Compute E (X ) and V (X ). How do these compare with the expected waiting times
and variances for two individual buses?
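The integrals in items 1-6 are simple enough to check with a basic midpoint rule; a Python sketch (illustrative only) also confirms that E(X) and V(X) are the sums of the two buses' uniform means (2.5 each) and variances (25/12 each):

```python
# Triangular pdf of the total waiting time on [0, 10].
def f(x):
    if 0 <= x < 5:
        return x / 25
    if 5 <= x < 10:
        return 2 / 5 - x / 25
    return 0.0

def integral(g, a, b, n=20000):
    """Midpoint rule; plenty accurate for this piecewise-polynomial integrand."""
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

p1 = integral(f, 0, 3)                          # 1. P[X <= 3]
p2 = integral(f, 3, 8)                          # 2. P[3 <= X <= 8]
p3 = integral(f, 0, 2) + integral(f, 6, 10)     # 3. P[X < 2 or X > 6]
mean = integral(lambda x: x * f(x), 0, 10)      # 6. E(X) = 5 = 2.5 + 2.5
var = integral(lambda x: x * x * f(x), 0, 10) - mean ** 2  # V(X) = 25/6 = 2*(25/12)
print(p1, p2, p3, mean, var)
```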

20 / 59
The Normal Distribution
The normal distribution is the most important distribution in probability and statistics.
Many statistical populations have distributions that can be fitted very closely by
appropriate normal curves. Here are some examples: human heights, weights, and other
physical characteristics, measurement errors in scientific experiments, etc.
Even when the underlying distribution is discrete, the normal curve often gives an
excellent approximation.
Although individual random variables themselves are not normally distributed, sums and
averages of them will have approximately normal distributions under suitable conditions.

Definition
A continuous rv X is said to have a normal distribution with parameters µ and σ, where
−∞ < µ < ∞ and σ > 0, if the pdf of X is
f (x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, for −∞ < x < ∞.
The cdf is
F (x) = ∫_{−∞}^{x} (1/(√(2π) σ)) e^{−(t−µ)²/(2σ²)} dt,
denoted by X ∼ N(µ, σ²).

21 / 59
Normal Distribution (cont’d)
The normal density is f (x) = (1/(√(2π) σ)) e^{−(x−µ)²/(2σ²)}, with E [X] = µ and
V [X] = σ², where µ is the location parameter and σ is the scale parameter.

[Figure: Normal density curve, cupped downward between the points of inflection at µ − σ and µ + σ and cupped upward outside them; portraits of Abraham de Moivre, Pierre-Simon Laplace, Carl Friedrich Gauss, and Sir Francis Galton.]

22 / 59
Normal Distribution (cont’d)
[Figure: Left ("Normal Distributions with Different Means"): densities of N(−2,1), N(0,1), N(2,1); right ("Normal Distributions with Different Variances"): densities of N(0,0.25), N(0,1), N(0,4).]
When µ = 0 and σ = 1, the density and cumulative distribution functions of
a standard normal Z ∼ N(0, 1) are
φ(z) = (1/√(2π)) e^{−z²/2}, Φ(z) = ∫_{−∞}^{z} (1/√(2π)) e^{−t²/2} dt.
z_α denotes the value on the x axis for which α of the area under the z
curve lies to the right of z_α; in other words, z_α is the 100(1 − α)th
percentile of the standard normal distribution, and is usually called a z
critical value.
23 / 59
Probability Calculations from Normal Distribution
Calculating probabilities from a normal distribution is not easy; let's
start with the standard normal distribution.
1 Approximation: Derenzo's approximation is handy for calculator work for
0 < z < 12:
P[Z ≥ z] = 1 − Φ(z) ≈ 0.5 exp{−[(83z + 351)z + 562]z/(703 + 165z)}.
2 Use the standard normal table, Appendix Table A.3. The tabled value is the
shaded area under the N(0,1) curve to the left of z:

z     0.00    0.04    0.05    0.06    0.07
1.0   0.8413  0.8508  0.8531  0.8554  0.8577
1.3   0.9032  0.9099  0.9115  0.9131  0.9147
1.5   0.9332  0.9382  0.9394  0.9406  0.9418
1.6   0.9452  0.9495  0.9505  0.9515  0.9525
1.9   0.9713  0.9738  0.9744  0.9750  0.9756
2.0   0.9772  0.9793  0.9798  0.9803  0.9808
2.1   0.9821  0.9838  0.9842  0.9846  0.9850
3.0   0.9987  0.9988  0.9989  0.9989  0.9989

3 Computer software: R, S-PLUS, SAS, Minitab, Matlab, Mathematica, Maple, etc.
24 / 59
Probability Calculation from Normal Distribution (cont’d)
1 P[Z ≤ 0] = 0.5.
2 P[Z ≤ −z] = 1 − P[Z ≤ z] = P[Z ≥ z].

[Figure: By the symmetry of the N(0,1) curve, the area to the right of z₁ equals 1 minus the area to the left of z₁, and also equals the area to the left of −z₁; the area between z₁ and z₂ is the difference of the two left-tail areas.]
25 / 59
Probability Calculation from Normal Distribution (cont’d)
If X is distributed as N(µ, σ²), then
P[a ≤ X ≤ b] = P[(a − µ)/σ ≤ Z ≤ (b − µ)/σ] = Φ((b − µ)/σ) − Φ((a − µ)/σ),
P[X ≤ a] = Φ((a − µ)/σ), and P[X ≥ b] = 1 − Φ((b − µ)/σ).
Z = (X − µ)/σ is called the z-score of X.

Example
The number of calories in one type of salad on the lunch menu is normally
distributed with mean 200 and sd 5. Find the probability that a randomly
selected salad of this type will contain
(a) more than 208 calories (0.0548);
(b) between 190 and 200 calories (0.4772).
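The z-score formulas above reduce both parts to evaluations of Φ, which Python's standard library can do via the error function (an illustrative sketch; the slides themselves use tables or R):

```python
import math

def Phi(z):
    """Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

mu, sd = 200.0, 5.0
p_a = 1 - Phi((208 - mu) / sd)                      # P[X > 208], z = 1.6
p_b = Phi((200 - mu) / sd) - Phi((190 - mu) / sd)   # P[190 < X < 200]
print(round(p_a, 4), round(p_b, 4))
```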

26 / 59
Percentiles of Normal Distribution
The (100p)th percentile of a normal distribution with mean µ and
standard deviation σ can be obtained from the (100p)th percentile of the
standard normal distribution, i.e.,
(100p)th percentile of N(µ, σ²) = µ + [(100p)th percentile of N(0, 1)] × σ.
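This recipe can be sketched numerically: compute Φ from the standard library's error function, invert it by bisection, then shift and scale. The µ = 200, σ = 5 below reuse the earlier salad-calorie example purely for illustration:

```python
import math

def Phi(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def z_percentile(p, lo=-10.0, hi=10.0):
    """Invert Phi by bisection: returns z such that Phi(z) = p."""
    for _ in range(100):
        mid = (lo + hi) / 2
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

mu, sigma = 200.0, 5.0
q95 = mu + z_percentile(0.95) * sigma          # 95th percentile of N(200, 25)
areas = [Phi(k) - Phi(-k) for k in (1, 2, 3)]  # P[|Z| <= 1], [<= 2], [<= 3]
print(round(q95, 2), [round(a, 4) for a in areas])
```

The three `areas` values reproduce the 0.68 / 0.95 / 0.997 rule shown in the figure on this slide.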

[Figure: Areas under a normal curve: 0.3413 on each side of µ within one σ (shaded area 0.6826), 0.1359 in each band between one and two σ (shaded area 0.9544), and 0.0215 in each band between two and three σ (shaded area 0.9974).]
27 / 59
Normal Approximation to Binomial Probabilities
[Figure: Probability histograms of Bin(5, 0.3), Bin(30, 0.3), Bin(100, 0.3), and Bin(200, 0.3); as n grows, the histogram becomes increasingly symmetric and bell-shaped.]
28 / 59
Normal Approximation to Binomial Probabilities (cont’d)
Let X ∼ BIN(n, p). When np ≥ 10 and n(1 − p) ≥ 10 (sometimes 5 or 15 are used), the
binomial distribution is well approximated by the normal distribution with mean np and
standard deviation √(np(1 − p)). Therefore,
P[X = x] = P[x − 0.5 < X ≤ x + 0.5] ≈ Φ((x + 0.5 − np)/√(np(1 − p))) − Φ((x − 0.5 − np)/√(np(1 − p)));
P[a ≤ X ≤ b] = P[a − 0.5 < X ≤ b + 0.5] ≈ Φ((b + 0.5 − np)/√(np(1 − p))) − Φ((a − 0.5 − np)/√(np(1 − p)));
B(x; n, p) = P[−0.5 < X ≤ x + 0.5] ≈ Φ((x + 0.5 − np)/√(np(1 − p))) − Φ((−0.5 − np)/√(np(1 − p))).
The addition and subtraction of 0.5 is called the continuity correction.
Recall that the Poisson distribution also provides an approximation to the binomial
distribution when n is large and p is small so that np is moderate. Schader, M. and
Schmid, F. (1989, Two rules of thumb for the approximation of the binomial distribution
by the normal distribution, The American Statistician 43(1), 23-24) studied two rules of
thumb for approximating the binomial distribution by the normal distribution —
np(1 − p) > 9, versus np > 5 for 0 < p ≤ 0.5 and n(1 − p) > 5 for 0.5 < p < 1 — and
found that the maximum absolute error depends on p.
29 / 59
Normal Approximation to Binomial Probabilities (cont’d)
P[6 ≤ X ≤ 8] when X ∼ BIN(n, 0.3):
n    Binomial     Poisson      Normal
10   0.04720530   0.08011495   0.04217526
20   0.47029771   0.40155785   0.48511880
30   0.35492315   0.33996208   0.33945790
40   0.10239112   0.13468675   0.10114008
50   0.01753049   0.03465406   0.02074592

P[6 ≤ X ≤ 8] when X ∼ BIN(n, 0.05):
n    Binomial     Poisson      Normal
10   0.00000275   0.00001416   0.00000000
20   0.00032910   0.00059306   0.00000195
30   0.00327181   0.00442825   0.00040284
40   0.01374737   0.01632616   0.00555464
50   0.03702019   0.04088079   0.02573849

Example
In a state where 25% of licensed drivers don't have insurance, a random sample of
50 drivers is taken. Let X be the number of uninsured drivers among the 50
drivers. Then X ∼ BIN(50, 0.25).
1 What is the probability that no more than 10 people don’t have insurance?
How about less than 10? (Binomial: 0.2622023; Poisson: 0.2970747;
Normal: 0.2568037 (0.2568146 in book); Binomial: 0.1636839; Poisson:
0.2014311; Normal: 0.1635825 (0.1635934 in book))
2 What is the probability that between 5 and 15 (inclusive) drivers don’t have
insurance? (Binomial: 0.8348084; Poisson: 0.8006835; Normal: 0.8319162)

30 / 59
Gamma Distributions
A continuous random variable X has a gamma distribution with
parameters α and β, denoted by X ∼ GAM(α, β), if its pdf is
f (x) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β} for x > 0, α > 0, β > 0; 0 elsewhere.
The name comes from its relationship to a function called the gamma
function, given by Γ(α) = ∫_0^∞ t^{α−1} e^{−t} dt.
Some properties of the gamma function are Γ(α) = (α − 1)Γ(α − 1) for α > 1;
Γ(n) = (n − 1)! for n = 1, 2, . . . ; and Γ(1/2) = √π.
When the shape parameter α = 1, the gamma distribution becomes the
exponential distribution.
When α = ν/2 and β = 2, the gamma distribution becomes the
chi-square distribution with ν degrees of freedom, denoted by χ²(ν).
When the scale parameter β = 1, the distribution is called the standard
gamma distribution.
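The gamma-function identities listed above can be checked directly with `math.gamma` from the Python standard library (an illustrative sketch):

```python
import math

g5 = math.gamma(5)                               # Gamma(5) = 4! = 24
rec = math.gamma(4.5) - 3.5 * math.gamma(3.5)    # Gamma(a) - (a-1)Gamma(a-1) = 0
half_sq = math.gamma(0.5) ** 2                   # Gamma(1/2)^2 = pi
pi_diff = half_sq - math.pi
print(g5, rec, half_sq)
```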
31 / 59
Gamma Distributions (cont’d)
[Figure: Left: gamma densities for α = 0.6, 1, 2 with various scale parameters β; right: standard gamma densities for α = 2, 3, 4, 5 with β = 1.]

The mean and variance of a random variable having a gamma distribution
GAM(α, β) are E (X) = µ = αβ and V (X) = σ² = αβ².
When X is a standard gamma rv, the cdf of X,
I (x; α) = (1/Γ(α)) ∫_0^x t^{α−1} e^{−t} dt for x > 0,
is called the incomplete gamma function. The cdf of GAM(α, β) is
P[X ≤ x] = F (x; α, β) = I (x/β; α).
Usage in R (shape is alpha and scale is beta):
dgamma(x, shape, rate = 1, scale = 1/rate, log = FALSE)
pgamma(q, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
qgamma(p, shape, rate = 1, scale = 1/rate, lower.tail = TRUE, log.p = FALSE)
rgamma(n, shape, rate = 1, scale = 1/rate)
32 / 59
The Exponential Distribution
A continuous random variable X is said to have an exponential
distribution with parameter λ > 0, denoted by EXP(λ), if its pdf is
f (x; λ) = λ e^{−λx} for x ≥ 0; 0 otherwise.
For an exponential distribution, µ = 1/λ and σ² = 1/λ². Its cdf is
F (x; λ) = 0 for x < 0; 1 − e^{−λx} for x ≥ 0.
Memoryless property: P[X ≥ t + t₀ | X ≥ t₀] = P[X ≥ t].
Applications: time between occurrences of successive events, such as
customers arriving at a service facility or calls coming into a switchboard
in a Poisson process, or component lifetime.

[Figure: Exponential densities for λ = 0.5, 1, 2.]

Usage in R:
dexp(x, rate = 1, log = FALSE)
pexp(q, rate = 1, lower.tail = TRUE, log.p = FALSE)
qexp(p, rate = 1, lower.tail = TRUE, log.p = FALSE)
rexp(n, rate = 1)
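The memoryless property follows from the survival function P[X ≥ x] = e^{−λx}: the conditional probability is a ratio of two exponentials. A short Python check (λ, t, t₀ below are arbitrary illustrative values):

```python
import math

def exp_sf(x, lam):
    """Exponential survival function P[X >= x] = exp(-lam * x) for x >= 0."""
    return math.exp(-lam * x) if x >= 0 else 1.0

lam, t, t0 = 2.0, 1.3, 0.7
lhs = exp_sf(t + t0, lam) / exp_sf(t0, lam)  # P[X >= t + t0 | X >= t0]
rhs = exp_sf(t, lam)                         # P[X >= t]
print(lhs, rhs)                              # the two agree: "no memory"
```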

33 / 59
Karl Pearson’s χ2 Distribution
A continuous random variable X has a chi-squared distribution with ν
degrees of freedom, denoted by X ∼ χ²(ν), if and only if its probability
density function is given by
f (x; ν) = (1/(2^{ν/2} Γ(ν/2))) x^{(ν−2)/2} e^{−x/2} for x > 0; 0 elsewhere.
E (X) = ν and V (X) = 2ν.
Let X₁, X₂, . . . , Xₙ be a sample from N(µ, σ²). Then (n − 1)S²/σ² ∼ χ²(n − 1).
Relationship between χ²(ν) and POI(λ): F_pois(k; λ) = 1 − F_χ²(2λ; 2(k + 1)).

[Figure: Chi-square densities with ν = 2, 5, 10.]

Usage in R:
dchisq(x, df, ncp=0, log = FALSE)
pchisq(q, df, ncp=0, lower.tail = TRUE, log.p = FALSE)
qchisq(p, df, ncp=0, lower.tail = TRUE, log.p = FALSE)
rchisq(n, df, ncp=0)
34 / 59
The χ2 Table: Table A.7
Let χ²_{α,ν} be the point on the measurement axis for which the area under the χ²
density curve with ν > 0 df to the right of χ²_{α,ν} is α; χ²_{α,ν} is called a χ² critical
value. For ν > 40,
χ²_{α,ν} ≈ ν(1 − 2/(9ν) + z_α √(2/(9ν)))³,
based on the Wilson-Hilferty transformation
(χ²_ν/ν)^{1/3} ∼ N(1 − 2/(9ν), 2/(9ν)), approximately (Wilson, E.B. and Hilferty, M.M.,
1931, The distribution of chi-squared, Proceedings of the National Academy of Sciences
of the United States of America 17(12), 684-688).

[Figure: χ²(ν) density curve with shaded right-tail area α beyond χ²_{α,ν}.]

ν \ α   ···   0.05     0.025    0.01     ···
1       ···   3.843    5.025    6.637    ···
2       ···   5.992    7.378    9.210    ···
3       ···   7.815    9.348    11.344   ···
4       ···   9.488    11.143   13.277   ···
6       ···   12.592   14.440   16.812   ···
7       ···   14.067   16.012   18.474   ···
8       ···   15.507   17.534   20.090   ···
9       ···   16.919   19.022   21.665   ···
26      ···   38.885   41.923   45.642   ···
27      ···   40.113   43.194   46.962   ···
28      ···   41.337   44.461   48.278   ···
29      ···   42.557   45.772   49.586   ···
35 / 59
Student’s t Distribution
A continuous random variable X is said to have a Student's t distribution
with ν degrees of freedom, denoted by t(ν), if it has pdf
f (t; ν) = (Γ((ν + 1)/2)/(Γ(ν/2)√(νπ))) (1 + t²/ν)^{−(ν+1)/2}
for −∞ < t < ∞ and ν > 0. (William Sealy Gosset, 1876-1937.)
1. The density curve is bell-shaped and centered at 0.
2. The density curve of the t distribution has heavier tails than the standard
normal distribution.
3. As ν → ∞, the t distribution approaches the standard normal (in practice,
for ν > 30).
4. E (X) = 0 and V (X) = ν/(ν − 2) for ν > 2.
5. Let X₁, X₂, . . . , Xₙ be a sample from N(µ, σ²). Then
√n(X̄ − µ)/S ∼ t(n − 1).

[Figure: Densities of t(2), t(5), t(30) versus N(0,1).]

Usage in R:
dt(x, df, ncp, log = FALSE)
pt(q, df, ncp, lower.tail = TRUE, log.p = FALSE)
qt(p, df, ncp, lower.tail = TRUE, log.p = FALSE)
rt(n, df, ncp)

36 / 59
The Student t Tables: Tables A.5 and A.8
Let t_{α,ν} be the point on the measurement axis for which the area under the t
curve with ν df to the right of t_{α,ν} is α; t_{α,ν} is called a t critical value.

[Figure: t(ν) density curve with shaded right-tail area α beyond t_{α,ν}.]

Table A.5 (t critical values):
ν \ α   ···   0.10    0.05    0.025   ···
1       ···   3.078   6.314   12.706  ···
5       ···   1.476   2.015   2.571   ···
9       ···   1.383   1.833   2.262   ···
∞       ···   1.282   1.645   1.960   ···

Table A.8 (areas to the right of t under the t_ν curve):
t \ ν   ···   8       9       10      ···
0.0     ···   0.500   0.500   0.500   ···
0.1     ···   0.461   0.461   0.461   ···
2.6     ···   0.016   0.014   0.013   ···
2.7     ···   0.014   0.012   0.011   ···
2.8     ···   0.012   0.010   0.009   ···
2.9     ···   0.010   0.009   0.008   ···
37 / 59
The Fisher-Snedecor’s F Distribution
A continuous random variable X is said to have an F distribution with
parameters ν₁ > 0 and ν₂ > 0, denoted by F(ν₁, ν₂), if it has pdf
g (f ) = (Γ((ν₁ + ν₂)/2)/(Γ(ν₁/2)Γ(ν₂/2))) (ν₁/ν₂)^{ν₁/2} f^{ν₁/2 − 1} (1 + ν₁f/ν₂)^{−(ν₁+ν₂)/2} for f > 0;
0 elsewhere,
where Γ(α) = ∫_0^∞ x^{α−1} e^{−x} dx.

1. E (X) = ν₂/(ν₂ − 2) for ν₂ > 2;
V (X) = 2ν₂²(ν₁ + ν₂ − 2)/(ν₁(ν₂ − 2)²(ν₂ − 4)) for ν₂ > 4.
2. Let X₁, X₂, . . . , X_{n₁} be a sample from N(µ₁, σ₁²) and Y₁, Y₂, . . . , Y_{n₂} be a
sample from N(µ₂, σ₂²). Then
(σ₂² S_X²)/(σ₁² S_Y²) ∼ F(n₁ − 1, n₂ − 1).

[Figure: F densities for F(6,6), F(6,2), F(2,6).]

Usage in R:
df(x, df1, df2, log = FALSE)
pf(q, df1, df2, ncp=0, lower.tail = TRUE, log.p = FALSE)
qf(p, df1, df2, lower.tail = TRUE, log.p = FALSE)
rf(n, df1, df2)

38 / 59
The F Table: Table A.9
F_{α;ν₁,ν₂} is the upper α point of the F distribution; that is, the area to the right
of F_{α;ν₁,ν₂} is α.

[Figure: F(ν₁, ν₂) density curve with shaded right-tail area α beyond F_{α;ν₁,ν₂}.]

              ν₁
ν₂   α      ···   8      9      10     ···
8    0.1    ···   2.59   2.56   2.54   ···
     0.05   ···   3.44   3.39   3.35   ···
     0.01   ···   6.03   5.91   5.81   ···
     0.001  ···   12.05  11.77  11.54  ···
9    0.1    ···   2.47   2.44   2.42   ···
     0.05   ···   3.23   3.18   3.14   ···
     0.01   ···   5.47   5.35   5.26   ···
     0.001  ···   10.37  10.11  9.89   ···

Note that order matters: F_{0.1;8,9} = 2.47 but F_{0.1;9,8} = 2.56. Find a ≥ 0 and b > 0
such that P[a < F(8, 9) < b] = 0.90.
It can be shown that F_{1−α;ν₁,ν₂} = 1/F_{α;ν₂,ν₁}, e.g.
F_{0.95;8,9} = 1/F_{0.05;9,8} = 1/3.39 ≈ 0.295.
Relation to the binomial distribution:
F_BIN(x; n, p) = F_F((x + 1)(1 − p)/(p(n − x)); 2(n − x), 2(x + 1)) (Ling, Robert F.,
1992, Just Say No to Binomial (and other Discrete Distributions) Tables, The American
Statistician 46(1), 53-54).
39 / 59
Uniform Distribution
A continuous random variable X is said to have a uniform
distribution with parameters α and β, denoted by UNIF(α, β), if it
has probability density function
f (x; α, β) = 1/(β − α) for α ≤ x ≤ β; 0 elsewhere.
Its cumulative distribution function is
F (x) = 0 for x < α;
F (x) = (x − α)/(β − α) for α ≤ x ≤ β;
F (x) = 1 for x > β.
µ = (α + β)/2, σ² = (β − α)²/12.
Usage in R
dunif(x, min=0, max=1, log = FALSE)
punif(q, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
qunif(p, min=0, max=1, lower.tail = TRUE, log.p = FALSE)
runif(n, min=0, max=1)
40 / 59
The Weibull Distribution
The family of Weibull distributions was introduced by the Swedish physicist
Waloddi Weibull (1887-1979) in 1939. His 1951 hallmark article ”A Statistical
Distribution Function of Wide Applicability” (Journal of Applied Mechanics, 18,
293-297) discussed a number of applications.
A continuous random variable X has a Weibull distribution with shape parameter
α > 0 and scale parameter β > 0, denoted by WEI(α, β), if its probability density
function is
f (x) = (α/β^α) x^{α−1} e^{−(x/β)^α} for x > 0; 0 elsewhere.
When α = 1 the Weibull distribution becomes the exponential distribution. However,
the Weibull distribution is not the same as the gamma distribution.
E (X) = βΓ(1 + 1/α), V (X) = β²{Γ(1 + 2/α) − [Γ(1 + 1/α)]²}.
The cdf of the Weibull distribution is
F (x) = 1 − e^{−(x/β)^α} for x ≥ 0; 0 for x < 0.

41 / 59
The Weibull Distribution (cont’d)
[Figure: Weibull densities for (α, β) = (1, 1), (2, 1), (4, 2), (10, 4), (10, 6).]

Usage in R
dweibull(x, shape, scale = 1, log = FALSE)
pweibull(q, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)
qweibull(p, shape, scale = 1, lower.tail = TRUE, log.p = FALSE)
rweibull(n, shape, scale = 1)

42 / 59
The Weibull Distribution (cont’d)
The Weibull distribution has many applications in reliability studies, such as voltage
breakdown of electric circuits, and in other disciplines, such as studies of crystallization
in physics, studies of tides in climatology, and studies of the time to complete a
task in cognitive psychology.
A continuous random variable is said to have a Weibull distribution with shape
parameter α > 0, scale parameter β > 0, and shift parameter γ if its pdf is
f (x) = (α/β)((x − γ)/β)^{α−1} e^{−((x−γ)/β)^α} for x > γ; 0 elsewhere.
E (X) = γ + βΓ(1 + 1/α), V (X) = β²{Γ(1 + 2/α) − [Γ(1 + 1/α)]²}.
The cdf of the three-parameter Weibull distribution is
F (x) = 1 − e^{−((x−γ)/β)^α} for x ≥ γ; 0 for x < γ.
Equivalently, a continuous random variable has a three-parameter Weibull
distribution if Y = ((X − γ)/β)^α has the standard exponential distribution with
pdf f (y) = e^{−y} for y > 0.
43 / 59
Applications of Weibull Distribution

Example
In recent years the Weibull distribution has been used to model engine emission of
various pollutants. Frey, H.C. and Bammi, S. (2002, Quantification of variability and
uncertainty in lawn and garden equipment NOx and total hydrocarbon emission factors,
Journal of the Air & Waste Management Association, 52(4), 435-448) suggested that
the amount of NOx emission, X (g/gal), from a randomly selected four-stroke engine of
a certain type follows a Weibull distribution with α = 2 and β = 10.
1 P[X ≤ 25] = 0.998, so the distribution is almost entirely concentrated on values
between 0 and 25.
2 The value c that separates the 5% of all engines having the largest amounts of
NOx emission from the remaining 95% satisfies

0.95 = 1 − e^(−(c/10)²),  so c ≈ 17.3.

> pweibull(25,2,10)
[1] 0.9980695
> qweibull(0.95,2,10)
[1] 17.30818
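The same two numbers can be reproduced without R by inverting the Weibull cdf by hand; an illustrative Python sketch:

```python
import math

alpha, beta = 2, 10  # NOx emission model from the example

# P[X <= 25] from the cdf F(x) = 1 - exp(-(x/beta)^alpha)
p25 = 1 - math.exp(-((25 / beta) ** alpha))

# 95th percentile: solve 0.95 = 1 - exp(-(c/beta)^alpha) for c
c = beta * (-math.log(1 - 0.95)) ** (1 / alpha)

print(p25)  # matches pweibull(25,2,10)
print(c)    # matches qweibull(0.95,2,10)
```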

Applications of Weibull Distribution (cont’d)
Example
Volumetric properties such as air voids, voids in mineral aggregate, voids filled with
asphalt, and bulk specific gravity of asphalt mixtures play an important role in
determining the resistance of asphalt mixtures to major pavement distresses including
rutting, fatigue cracking, and low temperature cracking. They also influence the
durability of asphalt mixtures in terms of aging and stripping. Tan, Yiqiu, Xu, Huining,
and Li, Xiaomin (2009, Is normal distribution the most appropriate statistical
distribution for volumetric properties in asphalt mixtures, Journal of Testing and
Evaluation 37(5), 1-11) used the analysis of some sample data to recommend that, for a
particular mixture, the air void volume (X in %) be modeled by a three-parameter
Weibull distribution with γ = 4, α = 1.3, and β = 0.8. What are the differences between
two models?
1 E(X) = 4 + 0.8Γ(1 + 1/1.3) = 4.7389 and
V(X) = 0.8²{Γ(1 + 2/1.3) − [Γ(1 + 1/1.3)]²} = 0.3285.
2 For the Weibull distribution, F(x; α, β, γ) = F(x; 1.3, 0.8, 4) = 1 − e^(−[(x−4)/0.8]^1.3).
Therefore, P[5 ≤ X ≤ 6] = F(6; 1.3, 0.8, 4) − F(5; 1.3, 0.8, 4) ≈ 0.2255.
3 For the normal distribution, P[5 ≤ X ≤ 6] ≈ 0.3104.

Applications of Weibull Distribution (cont’d)

Example

[Figure: fitted normal and three-parameter Weibull density curves for air void
volume (%), plotted for 3 ≤ x ≤ 7.]

> library("FAdist")
> pweibull3(6,1.3,0.8,4)-pweibull3(5,1.3,0.8,4)
[1] 0.2255341
> pnorm(6,4.738861,sqrt(0.3284997))-pnorm(5,4.738861,sqrt(0.3284997))
[1] 0.3104407
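If the FAdist package is unavailable, the same probability follows directly from the shifted cdf above; an illustrative Python check:

```python
import math

def weibull3_cdf(x, alpha, beta, gamma):
    # Three-parameter Weibull cdf: F(x) = 1 - exp(-((x - gamma)/beta)^alpha), x >= gamma
    if x < gamma:
        return 0.0
    return 1 - math.exp(-(((x - gamma) / beta) ** alpha))

# Air-void model: alpha = 1.3, beta = 0.8, gamma = 4
p = weibull3_cdf(6, 1.3, 0.8, 4) - weibull3_cdf(5, 1.3, 0.8, 4)
print(p)  # ≈ 0.2255, matching pweibull3(6,...) - pweibull3(5,...)
```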

Log-Normal Distribution
The log-normal distribution has many practical applications in medicine (incubation
period of infectious diseases, time to recover from illness), business and economics
(duration of employment, time to first purchase in a population of potential
buyers), and sociology (time to marriage and divorce, lengths of telephone calls).
Edwin L. Crow and Kunio Shimizu (1988, Lognormal Distributions: Theory and
Applications) give a more detailed discussion of the genesis and applications of the
lognormal distribution.
Limpert, Eckhard, Stahel, Werner A., and Abbt, Markus (2001, Log-normal
distributions across the sciences: Keys and clues, BioScience, 51(5), 341-352)
provides updated information.
A continuous random variable is said to have a log-normal distribution with
parameters µ and σ, denoted by LN(µ, σ), if its probability density function is

f(x) = (1/(√(2π)σ)) x^(−1) e^(−(ln x − µ)²/(2σ²))  for x > 0, σ > 0,  and
f(x) = 0 elsewhere.

A positive random variable X is log-normally distributed if and only if Y = ln X is
normally distributed with mean µ and variance σ².

Log-Normal Distribution (cont’d)
F(x; µ, σ) = P(X ≤ x) = P[ln(X) ≤ ln(x)] = P[Z ≤ (ln(x) − µ)/σ] = Φ((ln(x) − µ)/σ)
for x > 0.
E(X) = e^(µ+σ²/2), V(X) = e^(2µ+σ²)(e^(σ²) − 1).
Usage in R
dlnorm(x, meanlog = 0, sdlog = 1, log = FALSE)
plnorm(q, meanlog = 0, sdlog = 1, lower.tail = TRUE, log.p = FALSE)
qlnorm(p, meanlog = 0, sdlog = 1, lower.tail = TRUE, log.p = FALSE)
rlnorm(n, meanlog = 0, sdlog = 1)
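The cdf and moment formulas above translate directly into code; a Python sketch alongside the R functions, using the error function for Φ (via the identity Φ(z) = (1 + erf(z/√2))/2):

```python
import math

def lognorm_cdf(x, mu, sigma):
    # F(x) = Phi((ln x - mu)/sigma), with Phi(z) = (1 + erf(z/sqrt(2)))/2
    z = (math.log(x) - mu) / sigma
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def lognorm_mean(mu, sigma):
    # E(X) = exp(mu + sigma^2/2)
    return math.exp(mu + sigma**2 / 2)

def lognorm_var(mu, sigma):
    # V(X) = exp(2*mu + sigma^2) * (exp(sigma^2) - 1)
    return math.exp(2 * mu + sigma**2) * (math.exp(sigma**2) - 1)

print(lognorm_mean(0, 1))  # e^(1/2), the mean of LN(0, 1)
```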

Density graph
[Figure: log-normal density curves for (meanlog, sdlog) = (0, 1), (0.5, 1), (0, 0.1),
and (0, 0.3), plotted for 0 ≤ x ≤ 4.]

Applications of Log-normal Distribution
Velazquez, J.C., Caleyo, F., Valor, A. and Hallen, J.M. (2009, Predictive model
for pitting corrosion in buried oil and gas pipelines, Corrosion 65(5), 332-342)
suggested that the log-normal distribution is the best model for describing the
distribution of maximum pit depth data from cast iron pipes in soil. The authors
recommended a log-normal distribution with µ = 0.353 and σ = 0.754 as an
appropriate model for the maximum pit depth (X, in mm) of a randomly selected
buried pipeline.
E(X) = e^(0.353+0.754²/2) = 1.8913 and
V(X) = e^(2(0.353)+0.754²) · (e^(0.754²) − 1) = 2.7387.

P[1 ≤ X ≤ 2] = P[ln(1) ≤ ln(X) ≤ ln(2)]
= P[(0 − 0.353)/0.754 ≤ Z ≤ (0.693 − 0.353)/0.754] = 0.354.
What value c is such that only 1% of all specimens have a maximum pit depth
exceeding c?

0.99 = P[X ≤ c] = P[Z ≤ (ln(c) − 0.353)/0.754].

Therefore, (ln(c) − 0.353)/0.754 = 2.3263, or c = 8.2241.
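Equivalently, c = e^(µ + z₀.₉₉σ); a quick Python check using the same normal quantile z₀.₉₉ ≈ 2.3263 as in the slide:

```python
import math

mu, sigma = 0.353, 0.754
z99 = 2.3263  # 99th percentile of N(0,1), as used above
c = math.exp(mu + z99 * sigma)
print(c)  # ≈ 8.224
```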
Beta Distribution
A continuous random variable X is said to have a standard beta
distribution with parameters α and β, denoted by BETA(α, β), if its
probability density function

f(x; α, β) = [Γ(α+β)/(Γ(α)Γ(β))] x^(α−1) (1 − x)^(β−1)  for 0 < x < 1,
α > 0, β > 0,  and f(x; α, β) = 0 elsewhere.

µ = α/(α+β), σ² = αβ/[(α+β)²(α+β+1)].
The general beta distribution with parameters α and β has
probability density function
f(x; α, β, A, B) = [1/(B−A)] · [Γ(α+β)/(Γ(α)Γ(β))] · ((x−A)/(B−A))^(α−1) · ((B−x)/(B−A))^(β−1)
for A ≤ x ≤ B, α > 0, β > 0,  and f(x; α, β, A, B) = 0 otherwise.

µ = A + (B − A)·α/(α+β), σ² = (B−A)²αβ/[(α+β)²(α+β+1)].
Usage in R
dbeta(x, shape1, shape2, ncp=0, log = FALSE)
pbeta(q, shape1, shape2, ncp=0, lower.tail = TRUE, log.p = FALSE)
qbeta(p, shape1, shape2, lower.tail = TRUE, log.p = FALSE)
rbeta(n, shape1, shape2)
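The general-beta moment formulas above can be coded directly; a minimal Python sketch (A = 0, B = 1 recovers the standard beta):

```python
def beta_moments(alpha, beta, A=0.0, B=1.0):
    """Mean and variance of the general beta distribution on [A, B]."""
    mu = A + (B - A) * alpha / (alpha + beta)
    var = (B - A) ** 2 * alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    return mu, var

print(beta_moments(2, 2))        # symmetric standard beta: mean 0.5
print(beta_moments(2, 3, 2, 5))  # time-to-lay-the-foundation example: (3.2, 0.36)
```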

Beta Distribution (cont’d)
Density graph

[Figure: beta density curves on (0, 1) for (α, β) = (1, 1), (1, 2), (2, 1), (2, 2),
(0.5, 0.5), (2, 4), (4, 2), (0.2, 1), and (1, 0.2).]

Applications of Beta Distribution
The standard beta distribution is commonly used to model variation in the proportion or
percentage of a quantity occurring in different samples such as the proportion of a
24-hour day that an individual is asleep or the proportion of a certain element in a
chemical compound. The general beta distribution is useful for modeling the time to
finish a specific task.

Example
Project managers often use PERT (program evaluation and review technique) to
coordinate the various activities making up a large project. A standard assumption in
PERT analysis is that the time necessary to complete any particular activity, once it
has been started, has a beta distribution with the optimistic time A (if everything
goes well) and the pessimistic time B (if everything goes badly).
Suppose that the time (X in days) necessary for laying the foundation has a beta
distribution with A = 2, B = 5, α = 2, and β = 3.
E(X) = 2 + (5 − 2) · 2/(2+3) = 3.2 and V(X) = (5−2)²·2·3/[(2+3)²·(2+3+1)] = 0.36.
The probability that it takes at most 3 days to lay the foundation is

P[X ≤ 3] = ∫₂³ (1/3) · [4!/(1!2!)] · ((x−2)/3) · ((5−x)/3)² dx = 0.407.
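This integral can be checked without numerical integration: for integer shape parameters the regularized incomplete beta function satisfies the standard identity I_p(a, b) = P[Bin(a + b − 1, p) ≥ a]. With p = (3 − 2)/(5 − 2) = 1/3, a = 2, b = 3, a Python check:

```python
from math import comb

a, b, p = 2, 3, 1 / 3  # beta shapes and the standardized value of x = 3
n = a + b - 1          # = 4 Bernoulli trials in the equivalent binomial tail
# P[X <= 3] = I_p(a, b) = P[Bin(n, p) >= a]
prob = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(a, n + 1))
print(prob)  # 33/81 ≈ 0.4074
```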

Assessing Distributional Assumptions about Data
At the heart of probabilistic statistical analysis is the assumption that a set of data
arises as a sample from a distribution in some class of probability distributions.
The reasons for making distributional assumptions about data are several.
1 If we can describe a set of data as a sample from a certain theoretical
distribution, say, a normal distribution, then we can achieve a valuable
compactness of description for the data.
2 The assumptions can lead us to useful statistical procedures.
3 The assumptions allow us to characterize the sampling distribution of
statistics computed during the analysis and thereby make inferences and
probabilistic statements about unknown aspects of the underlying
distribution.
4 Understanding the distribution of a set of data can sometimes shed light on
the physical mechanisms involved in generating the data.
5 Analyses based on specific distributional assumptions about data are not
valid if the assumptions are not met to a reasonable degree.
6 Garbage in, garbage out!
Data from Various Populations
Researchers in the development of new treatments for cancer patients often evaluate the
effectiveness of new therapies by reporting the proportion of patients who survive for a
specified period of time after completion of treatment. A new genetic treatment of 870
patients with a particular type of cancer resulted in 330 patients surviving at least five
years after treatment.
Beginning in June, 1944, London was the target of attacks by V-1 ”flying bombs”
launched primarily from France. The attacks continued through the summer and ended as
the launch sites were overrun by advancing allied forces. Clarke, R.D. (1946, An
application of the Poisson distribution Journal of the Institute of Actuaries 72, 481)
presented an analysis of the distribution of V-1 impacts on London. The city was divided
into 576 small areas of one-quarter square kilometers each and the number of areas hit
exactly 0, 1, 2, 3, 4, 5 or more than 5 times was counted: 0 (229), 1 (211), 2 (93),
3 (35), 4 (7), 5+ (1). There were a total of 537 hits, so the average number of hits per
area was 0.9323.
Andreas Lindén and Samu Mäntyniemi (2011, Using the negative binomial distribution to
model overdispersion in ecological count data, Ecology 92, 1414-1421) reported count
data on migrating woodlarks at Hanko bird observatory from 2007 to 2009. The migration
period considered ranges from the first day of September (day-of-year 245) to the 10th of
November (day-of-year 315). Here are the 2007 data 249(1), 252(7), 254(6), 255(3),
258(6), 259(2), 262(53), 264(3), 267(49), 268(7), 269(15), 271(2), 272(16), 273(4),
274(58), 275(2), 276(1), 278(1), 280(10), 281(2), 282(4), 285(13), 287(3), 291(2),
292(21), 293(11), 294(1), 296(2), 300(2), 305(4), 309(2), and all other days have zero.
The following data are the oxygen uptakes (milliliters) during incubation of a random
sample of 15 cell suspension: 14.0, 14.1, 14.5, 13.2, 11.2, 14.0, 14.1, 12.2, 11.1, 13.7,
13.2, 16.0, 12.8, 14.4, 12.9.
Probability Plots or Quantile-Quantile (QQ) Plots
The concept of quantile is closely connected with the familiar concept
of percentile. The only difference between percentile and quantile is
that percentile refers to a percent of the set of data and quantile
refers to a fraction of the set of data.
A convenient, operational definition of quantile: Let
x(1) < x(2) < . . . < x(n) be the sorted data and p represent any
fraction between 0 and 1. Define the quantile Q(p) to be x(i) if p is one of
the fractions pi = (i − 0.5)/n, for i = 1 to n. The reason for using
(i − 0.5)/n is that each observation is counted as being half in the
lower group and half in the upper group. If p is a fraction λ of the
way from pi to pi+1, then
Q(p) = (1 − λ)Q(pi ) + λQ(pi+1 ).
Outside this range, set Q(p) = x(1) for p < p1 and
Q(p) = x(n) for p > pn . Probability plots or Q-Q plots are obtained
by plotting the quantiles of the data against the corresponding
quantiles of the theoretical distribution.
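The operational definition above can be written as a short function; an illustrative Python sketch (note this is not the default definition used by R's quantile()):

```python
def quantile(data, p):
    """Q(p) with p_i = (i - 0.5)/n and linear interpolation between order statistics."""
    x = sorted(data)
    n = len(x)
    if p <= 0.5 / n:          # p <= p_1: return the smallest value
        return x[0]
    if p >= (n - 0.5) / n:    # p >= p_n: return the largest value
        return x[-1]
    h = p * n + 0.5           # h = i + lambda, where p_i <= p < p_{i+1}
    i = int(h)
    lam = h - i
    return (1 - lam) * x[i - 1] + lam * x[i]

print(quantile([1, 2, 3, 4], 0.5))  # halfway between x(2) and x(3): 2.5
```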
Assessing Distributional Assumptions about Data (cont’d)

If the data are from the distribution, all points in the probability plot will fall
near the line y = x. If any large or systematic departures from the line occur,
they should be judged as indicating lack of fit of the distribution to the data.
If the observed configuration follows a line that is parallel to the line y = x,
we would conclude that the data come from a distribution that is compatible
with the theoretical distribution. The two distributions have different
locations or centers.
If the points have a nearly straight configuration that passes through the
origin but is not parallel to the line y = x, we would conclude that the data
and the theoretical distribution match except for a difference of spread.
If adding a constant is also required to map the configuration onto the line
y = x then we would judge that the data and the reference distribution
differ both in location and spread. It is the straightness of the probability
plot that is used to judge whether the data and reference distribution have
the same distributional shape.

Assessing Distributional Assumptions about Data (cont’d)
When there are departures from linearity in a probability plot, it
usually matches the following cases:
◮ Stragglers at either end indicate outliers
◮ Curvature at both ends indicates long (the left end is under the line
and the right end is above the line) or short tails (the left end is above
the line and the right end is below the line).
◮ Convex or concave curvature relates to asymmetry: If the slope of the
curve is increasing (decreasing) from left to right, the data have
an asymmetric distribution that is skewed to the right (left). In
other words, if we compare pairs of quantiles moving inward from the
ends of the sorted data, we find that the quantiles above the median
are farther from the median than their counterparts below it. If the data
were skewed to the left, then the lower-tail quantiles would be farther from
the median than the upper-tail quantiles.
◮ Horizontal segments, plateaus, or gaps: Horizontal segments reveal
repeated occurrences of a specific value in the data. Plateaus show
clusters in the data. Gaps show regions with no values in the data.

Assessing Distributional Assumptions about Data (cont’d)
Probability plots are powerful for exploring distributional properties of data
but they must be used with care. Two important facts that must be kept
in mind are
1 the natural variability of the data generates departures from
straightness even if the distributional model is valid;
2 each probability plot only compares the data from one variable with a
theoretical distribution and all other information, in particular the
relationship of this variable to others, is ignored.
Quantile-Quantile Plots

Description
qqnorm is a generic function the default method of which produces a normal QQ plot of the values in y.
qqline adds a line to a normal quantile-quantile plot which passes through the first and third quartiles.
qqplot produces a QQ plot of two data sets.

Usage
qqnorm(y, ylim, main = "Normal Q-Q Plot",xlab = "Theoretical Quantiles", ylab = "Sample Quantiles",
plot.it = TRUE, datax = FALSE, ...)

qqline(y, datax = FALSE, ...)

qqplot(x, y, plot.it = TRUE, xlab = deparse(substitute(x)),ylab = deparse(substitute(y)), ...)

QQ-Plots for Some Continuous Distributions
[Figure: six normal Q-Q plots of samples drawn from N(0,1), N(3,16),
chi square(5), Beta(6,2), t(5), and uniform[0,1]; sample quantiles are plotted
against theoretical N(0,1) quantiles.]

