Probability Distributions and Special Probability Distributions
The preceding unit reviewed the concepts of probability as well as
techniques for calculating the probability of an event.
A probability distribution shows the relationship between the values
of a random variable and their probabilities of occurrence.
Introduction to Statistical Analysis
Probability Distributions
Random Variables
Let X denote the number of heads obtained in an experiment
where a coin is tossed five times.
The possible values X can assume are 0, 1, 2, ..., 5.
This is an example of a numerically valued variable whose value
depends on the outcome of the experiment.
It is called a random variable.
Definition
A random variable is a real-valued function defined on a sample
space S.
Here are some examples of random variables:
A die is rolled twice. Let Y represent the sum of points
obtained in this experiment. Then the possible values of Y are
2, 3, 4, ..., 12. The variable takes only particular real values.
Such a variable is called a discrete random variable.
Let W be the weight of a newborn normal baby, where the
minimum and maximum weights are 1.8 kg and 3.5 kg
respectively. So the possible values of W lie in the interval [1.8, 3.5].
Notes
In the first example, the random variable takes values that are
distinct from each other. This variable is called a discrete
random variable.
W can assume any value in the interval [1.8, 3.5]. It is called
a continuous random variable.
If each value or range of values is accompanied by its
probability, then we have a probability distribution.
A formula may also be used to determine the probabilities. Such a
formula is called a probability density function (pdf). In these
notes the term pdf is used for both discrete and continuous
random variables.
Discrete Random Variable
Example
The probability density function of a discrete random variable
Y is given by
\[
P(Y = y) = \begin{cases} ky, & \text{for } y = 12, 13, 14 \\ 0, & \text{otherwise.} \end{cases}
\]
Find the value of the constant k.
Solution
Since the function P(Y = y) = ky is a probability distribution
function, it follows that
\[
\sum_{y=12}^{14} P(Y = y) = 1.
\]
Thus 12k + 13k + 14k = 39k = 1. Therefore, k = 1/39. So
\[
P(Y = y) = \frac{y}{39}, \quad y = 12, 13, 14.
\]
The probability distribution is given in the table below.
y           12      13      14
P(Y = y)    12/39   13/39   14/39
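The normalisation step can be checked mechanically. The following is a minimal Python sketch, not part of the original notes; the names `support`, `weights` and `pmf` are illustrative. It uses exact fractions so that k is obtained as a fraction rather than a rounded decimal.

```python
from fractions import Fraction

# Support of Y and the unnormalised weights g(y) = y, so that P(Y = y) = k * g(y).
support = [12, 13, 14]
weights = {y: Fraction(y) for y in support}

# k is the reciprocal of the total weight, because the probabilities must sum to 1.
k = 1 / sum(weights.values())
pmf = {y: k * w for y, w in weights.items()}

print("k =", k)
for y in support:
    print(f"P(Y = {y}) = {pmf[y]}")
print("total probability =", sum(pmf.values()))
```

The same pattern applies to any discrete pdf known up to a constant: sum the unnormalised weights and divide.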
Continuous Random Variables
Example
A continuous random variable X has a pdf $f(x) = \frac{x}{6} + C$ for
0 ≤ x ≤ 3.
Evaluate C.
Find P(1 ≤ X ≤ 2).
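Solution
Since the total probability is 1,
\[
P(0 \le X \le 3) = \int_0^3 f(x)\,dx = \int_0^3 \left(\frac{x}{6} + C\right) dx = \frac{3}{4} + 3C = 1.
\]
Thus C = 1/12.
\[
P(1 \le X \le 2) = \int_1^2 f(x)\,dx = \int_1^2 \left(\frac{x}{6} + \frac{1}{12}\right) dx = \frac{1}{3}.
\]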
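Both integrals can be verified numerically. The sketch below is an illustration only; the helper `integrate` is a hypothetical midpoint-rule routine written for this example, not a library function.

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

C = 1 / 12  # value of the constant found in the solution

def pdf(x):
    # f(x) = x/6 + C on 0 <= x <= 3
    return x / 6 + C

total = integrate(pdf, 0, 3)   # should be very close to 1
p_1_2 = integrate(pdf, 1, 2)   # should be very close to 1/3

print(f"total probability = {total:.6f}")
print(f"P(1 <= X <= 2)    = {p_1_2:.6f}")
```

The midpoint rule is exact for a linear integrand up to floating-point rounding, which is why a modest number of subintervals is enough here.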
Have you ever imagined that we can draw a histogram of a
pdf? Consider the following activity.
For the continuous distribution defined by the pdf:
\[
f(x) = \frac{1}{80}(3x^2 + 4), \quad \text{for } 0 \le x \le 4
\]
The cumulative distribution function (or distribution function
for short) gives the probability that a random variable will
assume a value less than or equal to some specified value, b.
For X continuous, with pdf f(x), the distribution function
F(b) is given by
\[
F(b) = \int_{-\infty}^{b} f(x)\,dx.
\]
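Example
A discrete random variable X has a pdf P(X = x) given by
\[
P(X = x) = \begin{cases} \frac{1}{25}(x^2 - 4x + 5), & x = 0, 1, 2, 3, 4, 5 \\ 0, & \text{otherwise.} \end{cases}
\]
Evaluating the pdf at each value of x and accumulating the
probabilities gives the distribution function F(x):
x           0      1      2      3       4       5
P(X = x)    5/25   2/25   1/25   2/25    5/25    10/25
F(x)        5/25   7/25   8/25   10/25   15/25   25/25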
F (b) = 1 if b is the maximum value that can be taken by the
random variable.
E.g. For a discrete random variable X , the distribution
function is given by F (x) = kx, x = 1, 2, 3. Find (a) the
value of the constant k, (b) P(X < 3), (c) the probability
distribution of X .
The Cumulative Distribution Function
Solution
We are given the cumulative distribution function. We need
to find the pdf before we can obtain a probability distribution.
Recall that F (b) = 1 if b is the maximum value a random
variable can attain. So F(3) = 3k = 1. Thus k = 1/3.
Since P(X = x) = F(x) − F(x − 1), we have P(X = 1) = F(1) = 1/3
and P(X = 2) = P(X = 3) = 1/3. Also, P(X < 3) = F(2) = 2/3.
The probability distribution is summarised in the following table.
x       1     2     3
P(x)    1/3   1/3   1/3
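Recovering a discrete pdf from a distribution function amounts to taking successive differences. A minimal Python sketch of that step, with illustrative variable names, is:

```python
from fractions import Fraction

values = [1, 2, 3]

# F(3) = 3k must equal 1 because 3 is the largest value X can take.
k = Fraction(1, values[-1])
cdf = {x: k * x for x in values}

# P(X = x) is the jump of F at x: F(x) minus F at the previous value.
pmf = {}
previous = Fraction(0)
for x in values:
    pmf[x] = cdf[x] - previous
    previous = cdf[x]

# P(X < 3) is the distribution function at the largest value below 3.
p_less_than_3 = cdf[2]

print("k =", k)
print("pmf =", {x: str(p) for x, p in pmf.items()})
print("P(X < 3) =", p_less_than_3)
```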
Example
A continuous random variable Y has a pdf of the form
$f(y) = A(y^2 + 4)$ for 0 ≤ y ≤ 1.
Calculate the value of the constant A.
Derive an expression for the distribution function of this
random variable.
Solution
If f(y) is a pdf, it follows that
\[
\int_0^1 A(y^2 + 4)\,dy = 1.
\]
Thus $\frac{13}{3}A = 1$. Therefore, $A = \frac{3}{13}$. The pdf of Y is
\[
f(y) = \frac{3}{13}(y^2 + 4).
\]
Let the distribution function be F(b). Then, for 0 ≤ b ≤ 1,
\[
F(b) = \int_0^b \frac{3}{13}(y^2 + 4)\,dy = \frac{b}{13}(b^2 + 12).
\]
Expected Value of a Random Variable
Definition
Let X be a random variable with pdf P(X = x) (or f(x) for
continuous X). Then the expected value of X, denoted by E[X], is
given by
\[
E[X] = \begin{cases} \sum_{x} x\,P(X = x), & \text{for discrete } X \\ \int_{x} x f(x)\,dx, & \text{for continuous } X. \end{cases}
\]
This quantity can take any value.
The quantity E [X ] is usually denoted by the symbol µ.
E.g. Find E[X] = µ for the following distributions:
x           0      1      2      3      4      5
P(X = x)    0.01   0.08   0.23   0.36   0.21   0.11
$f(x) = 1 - \frac{x}{2}$ for 0 ≤ x ≤ 2.
Solution
The random variable in this case is discrete. So
\[
E[X] = \sum_{\text{all } x} x\,P(X = x) = 0 \times 0.01 + 1 \times 0.08 + \cdots + 5 \times 0.11 = 3.01.
\]
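Both expectations in this exercise can be evaluated with a few lines of Python. This is a sketch under stated assumptions: `integrate` is a hypothetical midpoint-rule helper written for the illustration, and the continuous pdf is taken to be f(x) = 1 − x/2 on [0, 2].

```python
def integrate(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Discrete case: E[X] is the probability-weighted sum of the values.
xs = [0, 1, 2, 3, 4, 5]
ps = [0.01, 0.08, 0.23, 0.36, 0.21, 0.11]
mean_discrete = sum(x * p for x, p in zip(xs, ps))

# Continuous case: E[X] is the integral of x * f(x) over the support.
def pdf(x):
    return 1 - x / 2

mean_continuous = integrate(lambda x: x * pdf(x), 0, 2)

print(f"discrete   E[X] = {mean_discrete:.2f}")
print(f"continuous E[X] = {mean_continuous:.4f}")
```

For the continuous pdf the exact integral is 2 − 8/6 = 2/3, which the numerical value should match to several decimal places.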
Example
A committee of three is to be chosen from four girls and seven
boys. Find the expected number of girls on the committee, if
the members of the committee are chosen at random.
Solution
Let Y be the number of girls on the committee. Then
\[
P(Y = y) = \frac{\binom{4}{y}\binom{7}{3-y}}{\binom{11}{3}} \quad \text{for } y = 0, 1, 2, 3.
\]
We need to find E[Y].
Start with the probability distribution of Y.
y           0       1       2       3
P(Y = y)    0.212   0.509   0.255   0.024
So the expectation of Y is given by
\[
E[Y] = 0 \times 0.212 + 1 \times 0.509 + 2 \times 0.255 + 3 \times 0.024 = 1.091.
\]
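The table of probabilities and the expectation can be reproduced with `math.comb` from the Python standard library. The function name `p_girls` is an illustrative choice, not something defined in these notes.

```python
from math import comb

GIRLS, BOYS, SIZE = 4, 7, 3

def p_girls(y):
    """P(Y = y): choose y of the girls and the remaining members from the boys."""
    return comb(GIRLS, y) * comb(BOYS, SIZE - y) / comb(GIRLS + BOYS, SIZE)

dist = {y: p_girls(y) for y in range(SIZE + 1)}
expected_girls = sum(y * p for y, p in dist.items())

for y, p in dist.items():
    print(f"P(Y = {y}) = {p:.3f}")
print(f"E[Y] = {expected_girls:.4f}")
```

Working with unrounded probabilities gives E[Y] = 12/11, so a small difference from a hand calculation based on three-decimal table entries is expected.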
Example
If X is a random variable (discrete or continuous), show that
$\mathrm{Var}[X] = E[X^2] - \mu^2$.
Find Var[X] for the discrete random variable X defined by the pdf
x           0      1      2      3      4
P(X = x)    1/16   3/16   7/16   3/16   2/16
Solution
The expectation of X is found to be E[X] = 2.125.
Now, $E[X^2] = \sum x^2 P(X = x)$. This gives a value of 5.625.
\[
\mathrm{Var}[X] = E[X^2] - \mu^2 = 5.625 - 4.516 = 1.109
\]
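For the first part of the exercise, expanding the definition gives $E[(X-\mu)^2] = E[X^2] - 2\mu E[X] + \mu^2 = E[X^2] - \mu^2$, since E[X] = µ. The Python sketch below, with illustrative names, checks numerically that the definition and the shortcut agree for the tabulated distribution; exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Tabulated pdf of X: value -> probability.
pmf = {0: Fraction(1, 16), 1: Fraction(3, 16), 2: Fraction(7, 16),
       3: Fraction(3, 16), 4: Fraction(2, 16)}

mean = sum(x * p for x, p in pmf.items())               # E[X]
second_moment = sum(x * x * p for x, p in pmf.items())  # E[X^2]

var_shortcut = second_moment - mean ** 2                        # E[X^2] - mu^2
var_direct = sum((x - mean) ** 2 * p for x, p in pmf.items())   # E[(X - mu)^2]

print("E[X]   =", float(mean))
print("E[X^2] =", float(second_moment))
print("Var[X] =", float(var_shortcut))
```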
SOME SPECIAL DISTRIBUTIONS
Binomial Distribution
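The binomial distribution is concerned with experiments for which
there are two possible outcomes: success or failure.
Definition
A random variable R is said to follow a binomial distribution with
parameters n and π if
\[
P(R = r) = \begin{cases} \binom{n}{r}\pi^r (1-\pi)^{n-r}, & r = 0, 1, \ldots, n \\ 0, & \text{otherwise,} \end{cases}
\]
where n is the number of independent trials and π is the probability
of success on each trial.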
Example
The probability that a student, in a particular school, is left
handed is 0.3. If a sample of 10 students is drawn from this
school, find the probability that of the 10 students
none is left handed;
all are left handed;
exactly four are left handed.
Solution
π = 0.3 and 1 − π = 0.7.
The probability that none is left handed is
\[
\binom{10}{0}(0.3)^0 (0.7)^{10} \approx 0.02825.
\]
The probability that all are left handed is
\[
\binom{10}{10}(0.3)^{10}(0.7)^0 = 0.3^{10} \approx 5.9 \times 10^{-6}.
\]
The probability that exactly four are left handed is
\[
\binom{10}{4}(0.3)^4 (0.7)^6 \approx 0.2001.
\]
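The three probabilities follow directly from the binomial formula, as this short Python sketch illustrates. The function name `binomial_pmf` is an assumption made for the example.

```python
from math import comb

def binomial_pmf(r, n, pi):
    """P(R = r) for a binomial distribution with n trials and success probability pi."""
    return comb(n, r) * pi ** r * (1 - pi) ** (n - r)

n, pi = 10, 0.3
p_none = binomial_pmf(0, n, pi)
p_all = binomial_pmf(10, n, pi)
p_four = binomial_pmf(4, n, pi)

print(f"P(none)         = {p_none:.5f}")
print(f"P(all)          = {p_all:.2e}")
print(f"P(exactly four) = {p_four:.4f}")
```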
Example
A large consignment of shelled peas is known to have 2%
discoloured. If a lot of 10000 is dispatched for tinning, find
the expected number of discoloured peas in the lot, and also
the variance.
Solution
Let X be the number of discoloured peas in a lot. Then X
follows a binomial distribution with parameters n = 10000 and
π = 0.02. Thus
E [X ] = nπ
= 10000 × 0.02 = 200
Var[X ] = nπ(1 − π)
= 10000 × 0.02 × 0.98 = 196
The Hypergeometric distribution
Example
Suppose we have 17 students on a study programme with 8 female
students. If a committee of 6 is to be formed from this programme,
what is the chance the committee will include 3 female students?
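Solution
From counting techniques, we note that there are
\[
\binom{8}{3} \times \binom{9}{3}
\]
ways of choosing 3 of the 8 female students together with 3 of the
9 other students.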
Definition
Let N be the population size and k be the number of items in the
population with a particular attribute. Suppose further that n is
the sample size and the random variable X denotes the number of
items in the sample with the attribute. Then
\[
P(X = x) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}.
\]
Example
Refer to the example in the introduction to find the probability that
in a committee of 6, three will be ladies.
Solution
Let X be the number of ladies in the committee. Then
X ∼ Hyper(17, 8, 6), so that
\[
P(X = 3) = \frac{\binom{8}{3}\binom{9}{3}}{\binom{17}{6}} = 0.3801.
\]
Example
A product is shipped in lots of 20. A sampling plan calls for
getting a sample of 5 items from each lot and rejecting the lot if
more than one defective is observed. If a lot contains four
defectives, what is the probability a lot will be rejected?
Solution
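Let X be the number of defective items in a sample of 5 items.
Then X ∼ Hyper(20, 4, 5). The lot is rejected if X ≥ 2.
Thus, we require
\[
P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - \frac{\binom{4}{0}\binom{16}{5}}{\binom{20}{5}} - \frac{\binom{4}{1}\binom{16}{4}}{\binom{20}{5}} \approx 0.2487.
\]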
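The hypergeometric probabilities in the committee and lot-sampling examples can be computed with one small function. This is a sketch; `hypergeom_pmf` is a name chosen for the illustration, with arguments in the order (x, N, k, n) to match the notation Hyper(N, k, n).

```python
from math import comb

def hypergeom_pmf(x, N, k, n):
    """P(X = x) when n items are drawn without replacement from N, k of which have the attribute."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

# Committee of 6 from 17 students, 8 of them female: exactly 3 females.
p_committee = hypergeom_pmf(3, 17, 8, 6)

# Lot of 20 with 4 defectives, sample of 5: rejected when 2 or more defectives are seen.
p_reject = 1 - hypergeom_pmf(0, 20, 4, 5) - hypergeom_pmf(1, 20, 4, 5)

print(f"P(3 females on committee) = {p_committee:.4f}")
print(f"P(lot rejected)           = {p_reject:.4f}")
```

Computing the rejection probability through the complement needs only two terms instead of the three terms for X = 2, 3, 4.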
Theorem
Let X ∼ Hyper(N, k, n). Then
\[
E[X] = n\,\frac{k}{N}, \qquad \mathrm{Var}[X] = n\,\frac{k}{N} \cdot \frac{N-k}{N} \cdot \frac{N-n}{N-1}.
\]
Example
A doctor examines 15 out-patients at a small clinic. Four of these
are malaria cases. What is the expectation and variance of number
of malaria patients in the 5 that are earmarked for a transfer to a
central hospital?
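Solution
Let Y be the number of malaria cases in the sample. Then
Y ∼ Hyper(N = 15, k = 4, n = 5). Thus
\[
E[Y] = 5 \times \frac{4}{15} = \frac{4}{3} = 1.3333.
\]
Similarly, we find
\[
\sigma^2 = 5 \times \frac{4}{15} \times \frac{15-4}{15} \times \frac{15-5}{15-1} = \frac{44}{63} \approx 0.6984.
\]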
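As a cross-check of the theorem, the mean and variance can be computed both from the closed-form expressions and directly from the probability distribution. The sketch below uses exact fractions; all variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, k, n = 15, 4, 5  # clinic patients, malaria cases, patients transferred

# Closed-form expressions from the theorem.
mean_formula = Fraction(n * k, N)
var_formula = Fraction(n * k, N) * Fraction(N - k, N) * Fraction(N - n, N - 1)

# Direct computation from the hypergeometric probabilities.
pmf = {y: Fraction(comb(k, y) * comb(N - k, n - y), comb(N, n))
       for y in range(min(k, n) + 1)}
mean_direct = sum(y * p for y, p in pmf.items())
var_direct = sum(y * y * p for y, p in pmf.items()) - mean_direct ** 2

print("E[Y]   =", mean_formula, "=", float(mean_formula))
print("Var[Y] =", var_formula, "=", float(var_formula))
print("formulas agree with direct sums:",
      mean_formula == mean_direct and var_formula == var_direct)
```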
Poisson Distribution
A discrete random variable X is said to have a Poisson
distribution if it has a pdf of the form
\[
P(X = x) = e^{-\mu}\,\frac{\mu^x}{x!} \quad \text{for } x = 0, 1, 2, 3, \ldots \tag{2}
\]
where µ is the parameter.
The following statements describe what is called a Poisson
process.
Occurrences of events are independent, i.e. the occurrence of an
event in an interval of space or time has no effect on the
probability of a second occurrence of the event in the same, or
any other, interval.
An infinite number of occurrences of the event must be
possible in an interval.
The probability of a single occurrence of the event in a given
interval is proportional to the length of the interval.
In a small portion of an interval, the probability of more than one
occurrence of the event is negligible.
Example
In a study of a certain aquatic organism, a large number of
samples were taken from a pond, and the number of organisms in
each sample was counted. The average number of organisms per
sample was found to be two. Assuming that the number of
organisms follows a Poisson distribution, find the probability that
the next sample taken will contain one or fewer organisms.
Solution
Let Y be the number of organisms in a sample. Note that in
this case µ = 2. So the probability of interest is
\[
F(1) = P(Y \le 1) = P(Y = 0) + P(Y = 1) = e^{-2} + 2e^{-2} = 3e^{-2} \approx 0.4060.
\]
Example
If on a certain stretch of road accidents occur at a rate of 3 per
month, find the probability that
only two accidents occur in a month;
fourteen or fifteen accidents occur in four months.
Solution
Two accidents in a month means we consider a random variable X with
parameter µ = 3. So
\[
P(X = 2) = \frac{e^{-3} \times 3^2}{2!} = 0.2240.
\]
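Both parts of the accident example can be evaluated with a short Python sketch. It assumes, as is usual for a Poisson process, that the parameter scales with the length of the interval, so the four-month count has parameter 4 × 3 = 12. The name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson distribution with parameter mu."""
    return exp(-mu) * mu ** x / factorial(x)

# Exactly two accidents in one month (mu = 3).
p_two = poisson_pmf(2, 3)

# Fourteen or fifteen accidents in four months (mu = 3 * 4 = 12).
mu_four_months = 3 * 4
p_14_or_15 = poisson_pmf(14, mu_four_months) + poisson_pmf(15, mu_four_months)

print(f"P(X = 2 in one month)          = {p_two:.4f}")
print(f"P(14 or 15 in four months)     = {p_14_or_15:.4f}")
```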
Remark
If X ∼ Poiss(µ), then (1) E[X] = µ and (2) Var[X] = µ.
Of the distributions treated in this unit, only the Poisson
distribution always has a mean equal to its variance.
We can fit theoretical Poisson frequencies to a given
frequency distribution.
A binomial distribution with parameters n and π can be
approximated by a Poisson distribution with parameter
µ = nπ if n is large (more than 50) and π is small (usually
less than 1/10).
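The quality of the Poisson approximation to the binomial can be inspected numerically. In the sketch below the values n = 100 and π = 0.02 are hypothetical, chosen only because they satisfy the rule of thumb in the remark; the function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(r, n, pi):
    return comb(n, r) * pi ** r * (1 - pi) ** (n - r)

def poisson_pmf(x, mu):
    return exp(-mu) * mu ** x / factorial(x)

n, pi = 100, 0.02   # hypothetical values: n large, pi small
mu = n * pi         # parameter of the approximating Poisson distribution

# Largest absolute difference between the two pdfs over r = 0, ..., 10.
max_diff = max(abs(binomial_pmf(r, n, pi) - poisson_pmf(r, mu)) for r in range(11))

for r in range(5):
    print(f"r = {r}: binomial {binomial_pmf(r, n, pi):.4f}, Poisson {poisson_pmf(r, mu):.4f}")
print(f"largest difference for r <= 10: {max_diff:.4f}")
```

Repeating the comparison with a larger π (say 0.4) shows the differences growing, which is the reason for the restriction on π.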
Example
In the mass production of an article, 500 samples each of 30
articles are examined. The number of defective items in the
samples are shown in the following table:
No. of defectives   0     1     2    3    4    Total
Frequency           309   142   40   8    1    500
Find the mean number of defective items in each sample;
Show that the distribution is approximately that of a Poisson
distribution with this mean.
Calculate the variances of both distributions
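Solution
Let X be the number of defectives in a sample. Then
\[
\bar{X} = \frac{\sum f_i x_i}{\sum f_i} = \frac{250}{500} = 0.5.
\]
The "Poisson frequencies" 500 × P(X = x) with µ = 0.5 are
approximately 303.3, 151.6, 37.9, 6.3 and 0.8 for x = 0, 1, 2, 3, 4.
Note that the observed frequencies are close to the Poisson
frequencies, except for the second frequency.
Now, the variance of the frequency distribution is given by
\[
S^2 = \frac{\sum f_i (x_i - \bar{X})^2}{500}.
\]
Computations show that $\sum f_i x_i^2 = 390$. Thus
\[
S^2 = \frac{390}{500} - 0.5^2 = 0.53,
\]
which is close to the variance of the fitted Poisson distribution, µ = 0.5.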
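Fitting Poisson frequencies to the observed table can be sketched in Python as follows. The dictionary `observed` simply restates the frequency table of the example; the remaining names are illustrative.

```python
from math import exp, factorial

# Observed frequency table: number of defectives -> number of samples.
observed = {0: 309, 1: 142, 2: 40, 3: 8, 4: 1}
total = sum(observed.values())

# Sample mean, used as the Poisson parameter.
mean = sum(x * f for x, f in observed.items()) / total

# Fitted Poisson frequencies: total * P(X = x) with mu equal to the sample mean.
expected = {x: total * exp(-mean) * mean ** x / factorial(x) for x in observed}

# Variance of the observed frequency distribution.
variance = sum(f * (x - mean) ** 2 for x, f in observed.items()) / total

for x in observed:
    print(f"x = {x}: observed {observed[x]:3d}, Poisson {expected[x]:6.1f}")
print(f"mean = {mean}, variance = {variance:.2f}")
```

A mean and a variance that are close to each other are consistent with a Poisson model, although they do not prove it.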
Normal Distribution
Normal distribution is the most important continuous
distribution in statistics.
It approximates very well data that occur frequently in real-life
situations, e.g. weight, height and age.
A continuous random variable X is said to have a Normal
distribution if its pdf is of the form
\[
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \quad \text{for } -\infty < x < \infty.
\]
The mean, the median and the mode are all equal.
Different values of µ shift the graph of the distribution along
the x-axis.
Different values of σ determine the degree of peakedness or
flatness of the graph of the distribution. See figures below.
Characteristics of the Normal Distribution
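For the standard normal variable Z ∼ N(0, 1), the distribution
function is denoted by Φ:
\[
\Phi(b) = F(b) = P(Z < b) = \int_{-\infty}^{b} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right) dz.
\]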
The Standard Normal Distribution
Tables have been drawn for a wide range of values of b. For
most tables, the area given is shown in the figure.
Usually such tables only give values of F (b) for b ≥ 0.
Probabilities like P(Z < −1.25) and P(Z ≥ 2.33) have to be
changed into probabilities of the form P(Z < b) by using the
symmetric properties of the distribution.
In general, if Z ∼ N(0, 1) and Φ(b) = P(Z < b), then for
a ∈ [0, ∞):
P(Z > a) = 1 − P(Z < a) = 1 − Φ(a);
Φ(−a) = 1 − Φ(a);
and for any b and c (positive or negative) with b < c,
P(b < Z < c) = Φ(c) − Φ(b).
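Where tables are not to hand, Φ can be evaluated from the error function in Python's standard library through the identity Φ(b) = ½(1 + erf(b/√2)). The sketch below, with the illustrative name `phi`, applies the three rules above to values that appear in the examples of this section.

```python
from math import erf, sqrt

def phi(b):
    """Standard normal distribution function P(Z < b)."""
    return 0.5 * (1 + erf(b / sqrt(2)))

# Right-tail rule: P(Z > a) = 1 - phi(a).
p_right = 1 - phi(1.56)

# Symmetry rule: phi(-a) = 1 - phi(a).
p_left_negative = phi(-1.65)

# Interval rule: P(b < Z < c) = phi(c) - phi(b).
p_between = phi(1.843) - phi(0.829)

print(f"P(Z < 2.37)            = {phi(2.37):.4f}")
print(f"P(Z > 1.56)            = {p_right:.4f}")
print(f"P(Z < -1.65)           = {p_left_negative:.4f}")
print(f"P(0.829 < Z < 1.843)   = {p_between:.4f}")
```

Small differences in the fourth decimal place relative to table values are expected, because tables are usually read at two-decimal values of b.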
Example
If Z ∼ N(0, 1), find
P(Z < 0),
P(Z < 2.37),
P(Z > 1.56),
P(Z < −1.65),
P(Z > −2.808)
Solution
By symmetry about 0, P(Z < 0) = 0.5.
From tables, P(Z < 2.37) = 0.9911.
The area in question is to the right of 1.56. So
P(Z > 1.56) = 1 − P(Z < 1.56). Now, from tables
P(Z < 1.56) = 0.9406. Thus P(Z > 1.56) = 0.0594.
By symmetry, P(Z < −1.65) = 1 − P(Z < 1.65) = 1 − 0.9505 = 0.0495.
For P(Z > −2.808), the area of interest is to the right of −2.808.
By symmetry, this is the same as P(Z < 2.808), which is
0.9975.
Example
If Z ∼ N(0, 1), find the values of
P(0.829 < Z < 1.843),
P(−1.764 < Z < 2.083),
P(Z < −1.97 or Z > 2.5),
P(|Z | > 2.326) and
P(|Z | < 1.78).
Solution
P(0.829 < Z < 1.843) = P(Z < 1.843) − P(Z < 0.829). From
tables, P(Z < 1.843) = 0.9673 and P(Z < 0.829) = 0.7964.
Thus P(0.829 < Z < 1.843) = 0.9673 − 0.7964 = 0.1709.
The required area is P(−1.764 < Z < 2.083). Now,
P(−1.764 < Z < 2.083) = P(Z < 2.083) − P(Z < −1.764)
= P(Z < 2.083) + P(Z < 1.764) − 1
= 0.9814 + 0.9611 − 1 = 0.9425.
The area required is to the left of −1.97 and to the right of
2.5. Now
P(Z < −1.97 or Z > 2.5) = P(Z < −1.97) + P(Z > 2.5)
= (1 − P(Z < 1.97)) + (1 − P(Z < 2.5))
= 2 − (P(Z < 1.97) + P(Z < 2.5)).
From tables, P(Z < 1.97) = 0.9756 and P(Z < 2.5) = 0.9938.
Thus P(Z < −1.97 or Z > 2.5) = 2 − (0.9756 + 0.9938) = 0.0306.
P(|Z| < 1.78) is the area between −1.78 and 1.78. Thus
P(|Z| < 1.78) = P(Z < 1.78) − P(Z < −1.78) = 2P(Z < 1.78) − 1
= 2(0.9625) − 1 = 0.9250.
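Any normal random variable X with mean µ and standard deviation σ
can be transformed to a standard normal variable by
\[
Z = \frac{X - \mu}{\sigma}.
\]
The new random variable has mean 0 and a standard
deviation of 1. Confirm that E[Z] = 0 and Var[Z] = 1.
Let's look at some examples on non-standard normal variables.
In the notation X ∼ N(µ, σ²) used below, the second parameter is
the variance.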
If X ∼ N(50, 20), find
P(X > 60.3);
P(X > 48.9);
P(X < 53.5) and
P(X < 47.3)
Transforming any Normal Random Variable to Standard Normal
Solution
To find P(X > 60.3) we proceed as follows.
\[
P(X > 60.3) = P\left(Z > \frac{60.3 - 50}{\sqrt{20}}\right) = P(Z > 2.303) = 1 - P(Z \le 2.303) = 1 - 0.9894 = 0.0106.
\]
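The standardisation step can be mirrored with `statistics.NormalDist` from the Python standard library (available from Python 3.8). Note that `NormalDist` takes the standard deviation, not the variance, as its second argument, so the variance 20 is converted with a square root. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, variance = 50, 20
sigma = sqrt(variance)

X = NormalDist(mu, sigma)   # distribution of X ~ N(50, 20)
Z = NormalDist()            # standard normal, mean 0 and standard deviation 1

# P(X > 60.3) by explicit standardisation and directly from X.
z = (60.3 - mu) / sigma
p_via_z = 1 - Z.cdf(z)
p_direct = 1 - X.cdf(60.3)

print(f"z = {z:.3f}")
print(f"P(X > 60.3) = {p_direct:.4f}")
print(f"P(X > 48.9) = {1 - X.cdf(48.9):.4f}")
print(f"P(X < 53.5) = {X.cdf(53.5):.4f}")
print(f"P(X < 47.3) = {X.cdf(47.3):.4f}")
```

The two routes give the same probability, which is the content of the transformation Z = (X − µ)/σ.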
Example
If X ∼ N(84, 12), find
P(X < 79 or X > 92);
P(76 < X < 82);
P(|X − 84| > 2.9)
Solution
P(X < 79 or X > 92) = P(X < 79) + P(X > 92)
= P(Z < −1.443) + P(Z > 2.309)
= 2 − (P(Z < 1.443) + P(Z < 2.309)) .
For P(76 < X < 82), standardising gives
P(X < 82) = P(Z < −0.577) = 0.2820.
Similarly P(X < 76) = P(Z < −2.309) = 0.0105. Thus
P(76 < X < 82) = 0.2820 − 0.0105 = 0.2715.
By definition of the modulus function,
P(|X − 84| > 2.9) = P(X − 84 < −2.9 or X − 84 > 2.9).
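Therefore, dividing by σ = √12 and using symmetry,
P(|X − 84| > 2.9) = P(Z < −0.837) + P(Z > 0.837)
= 2(1 − P(Z < 0.837)) = 2(1 − 0.7987) = 0.4026.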
A continuity correction factor is introduced because we are
approximating the probabilities of a discrete random variable using a
continuous distribution.
Let k and l be integers. Then
P(X = k) = P(k − 1/2 < X < k + 1/2);
P(X ≤ k) = P(X < k + 1/2);
P(X < k) = P(X < k − 1/2);
P(X ≥ k) = P(X > k − 1/2);
P(X > k) = P(X > k + 1/2);
P(k ≤ X ≤ l) = P(k − 1/2 < X < l + 1/2).
Normal approximation to the Binomial and Poisson distributions
Example
Note how the factor is used for ≤, < and ≥, >.
Consider the following illustrations on the use of the
continuity correction factor.
If X ∼ Bin(200, 0.7), use the normal approximation to find (a)
P(X ≥ 130); (b) P(X < 142); (c) P(136 ≤ X < 148).
Solution
n = 200 and π = 0.7 is reasonably close to 12 ,
Use the normal approximation in all the three cases with
µ = 140 and σ 2 = 42. (a) P(X ≥ 130) = P(X > 130 − 0.5).
Now,
129.5 − 140
P(X > 129.5) = P(Z > √ )
42
= P(Z > −1.62)
= P(Z < 1.62)
142.5 − 140
P(X < 142.5) = P(Z < √ )
42
(c) P(136 ≤ X < 148) = P(135.5 < X < 147.5). This is
expressible as P(X < 147.5) − P(X < 135.5). Now
\[
P(X < 147.5) = P\left(Z < \frac{147.5 - 140}{\sqrt{42}}\right) = P(Z < 1.1573) = 0.8764
\]
and
\[
P(X < 135.5) = P\left(Z < \frac{135.5 - 140}{\sqrt{42}}\right) = P(Z < -0.694) = 1 - P(Z < 0.694) = 0.2437.
\]
Therefore, P(136 ≤ X < 148) = 0.8764 − 0.2437 = 0.6327.
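To see how well the continuity-corrected normal approximation performs for X ∼ Bin(200, 0.7), the approximate probabilities can be compared with exact binomial sums. This is a sketch using only the Python standard library; the names `prob`, `binom_pmf` and `approx` are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, prob = 200, 0.7
mu = n * prob
sigma = sqrt(n * prob * (1 - prob))
approx = NormalDist(mu, sigma)   # approximating normal distribution

def binom_pmf(r):
    """Exact P(X = r) for X ~ Bin(200, 0.7)."""
    return comb(n, r) * prob ** r * (1 - prob) ** (n - r)

# (a) P(X >= 130): exact sum versus the area to the right of 129.5.
exact_a = sum(binom_pmf(r) for r in range(130, n + 1))
approx_a = 1 - approx.cdf(129.5)

# (c) P(136 <= X < 148), i.e. X = 136, ..., 147: area between 135.5 and 147.5.
exact_c = sum(binom_pmf(r) for r in range(136, 148))
approx_c = approx.cdf(147.5) - approx.cdf(135.5)

print(f"(a) exact {exact_a:.4f}, normal approximation {approx_a:.4f}")
print(f"(c) exact {exact_c:.4f}, normal approximation {approx_c:.4f}")
```

Dropping the ±0.5 adjustment in `approx.cdf` and rerunning shows how much accuracy the continuity correction contributes.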
Example
The number of calls received by an office switch-board per
hour follows a Poisson distribution with parameter 30. Use
the normal approximation to find probability that in one hour
there are (a) more than 33 calls; (b) between 25 and 28 calls
(inclusive); and (c) 34 calls.
Solution
We only do the last part; others can be worked out in similar
manner.
Let Y be the number of calls received in one hour. Then Y is
approximately N(30, 30), and we require P(Y = 34). Since 34 lies
between 33.5 and 34.5,
P(Y = 34) = P(33.5 < Y < 34.5). Now,
P(33.5 < Y < 34.5) = P(Y < 34.5) − P(Y < 33.5)
= P(Z < 0.822) − P(Z < 0.639) = 0.0559
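The remaining parts of the switch-board example follow the same pattern. The Python sketch below evaluates all three with the continuity correction and, for part (c), compares the result with the exact Poisson probability. It assumes the approximating normal distribution has mean 30 and variance 30; the names are illustrative.

```python
from math import exp, factorial, sqrt
from statistics import NormalDist

mu = 30
approx = NormalDist(mu, sqrt(mu))   # normal approximation with variance equal to the mean

def poisson_pmf(x):
    """Exact P(Y = x) for Y ~ Poisson(30)."""
    return exp(-mu) * mu ** x / factorial(x)

# (a) more than 33 calls: P(Y > 33) becomes the area to the right of 33.5.
approx_a = 1 - approx.cdf(33.5)

# (b) between 25 and 28 calls inclusive: area between 24.5 and 28.5.
approx_b = approx.cdf(28.5) - approx.cdf(24.5)

# (c) exactly 34 calls: area between 33.5 and 34.5.
approx_c = approx.cdf(34.5) - approx.cdf(33.5)
exact_c = poisson_pmf(34)

print(f"(a) P(Y > 33)        ~ {approx_a:.4f}")
print(f"(b) P(25 <= Y <= 28) ~ {approx_b:.4f}")
print(f"(c) P(Y = 34)        ~ {approx_c:.4f} (exact Poisson {exact_c:.4f})")
```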