Contents

5 Statistical Inference
5.1 Population and random sample
5.1.1 Sample Mean
5.1.2 Sample Variance
5.1.3 Sampling distributions
5.1.4 Sampling distribution of mean X̄
5.1.5 Sampling distribution of a statistic involving S²
5.1.6 Sampling distribution of a statistic involving X̄ (small sample)
5.1.7 Sampling distribution of the difference between two sample means
5.1.8 Sampling distribution of sample proportion
5.1.9 Sampling distribution of difference of sample proportions
5.1.10 Summary
5.2 Point Estimation
5.2.1 Unbiased point estimator
5.2.2 Maximum likelihood (ML) method
5.2.3 Method of moments
5.3 Confidence interval (CI)
5.3.1 CI of mean of normal population with known variance
5.3.2 CI of mean of normal population with unknown variance
5.3.3 CI of variance of normal distribution with unknown mean
5.3.4 CI of difference of population means
5.3.5 CI of population proportion
5.3.6 CI of difference of population proportions
5.3.7 One-sided confidence bounds
5.3.8 Summary
5.4 Hypotheses testing
5.4.1 Hypothesis testing procedure
5.4.2 List of some useful test statistics (two tailed test)
5.4.3 More about P-value
Chapter 5
Statistical Inference
Note: These lecture notes aim to present some topics in Probability and Statistics in a clear and crisp manner. Comments/suggestions are welcome via e-mail: [email protected] to Dr. Suresh Kumar.
5.1 Population and random sample
Suppose we are interested in studying some particular property of a given population; for example, it could be the average height in a given population of students. Suppose the property of interest in each member M of the population is defined or quantified by a random variable X that follows some distribution, say X ∼ D(µ, σ²) with mean µ and variance σ². Next, let us randomly select n members, say M₁, M₂, ..., Mₙ, from the population such that the selections are independent of each other. Let the random variables X₁, X₂, ..., Xₙ define the property of interest for the randomly chosen members M₁, M₂, ..., Mₙ, respectively. Then X₁, X₂, ..., Xₙ are n independent random variables, each following the same distribution as X, that is, Xᵢ ∼ D(µ, σ²). Why? It requires a bit of thinking: X₁, X₂, ..., Xₙ are random variables because they are defined for the randomly chosen members M₁, M₂, ..., Mₙ. Further, each of them follows the same distribution as X because each Xᵢ defines the same property for Mᵢ as X defines for M. Finally, the definition of a random sample reads as follows:
A random sample of size n from a population distribution (X, f(x)) with mean µ and variance σ² is a collection of n independent random variables X₁, X₂, ..., Xₙ, each having the same probability distribution as X. Therefore, E[Xᵢ] = E[X] = µ and V[Xᵢ] = V[X] = σ², for i = 1, 2, ..., n.
The conditions of the random sample can be paraphrased by saying that the Xᵢ's are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, these conditions are satisfied exactly. They are approximately satisfied if sampling is without replacement but the sample size n is much smaller than the population size N. In practice, if n/N ≤ 0.05 (at most 5% of the population is sampled), we can proceed as if the Xᵢ's form a random sample.
Before giving an example of a random sample, we introduce two important sample-based measures, namely the sample mean and the sample variance.
5.1.1 Sample Mean

The sample mean of the random sample X₁, X₂, ..., Xₙ is defined as

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i.$$

Since E[Xᵢ] = µ, linearity of expectation gives

$$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\,n\mu = \mu.$$

Further, since the Xᵢ are independent,

$$V[\bar{X}] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}.$$

Thus, the sample mean X̄ follows a distribution with mean µ and variance σ²/n.
5.1.2 Sample Variance

The sample variance of the random sample X₁, X₂, ..., Xₙ is defined as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right).$$

$$\therefore\ E[S^2] = \frac{1}{n-1}\left(\sum_{i=1}^{n} E[X_i^2] - nE[\bar{X}^2]\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n}\left(V[X_i] + E[X_i]^2\right) - n\left(V[\bar{X}] + E[\bar{X}]^2\right)\right)$$

$$= \frac{1}{n-1}\left(\sum_{i=1}^{n} (\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2)\right) = \frac{1}{n-1}\left[n(\sigma^2 + \mu^2) - (\sigma^2 + n\mu^2)\right] = \sigma^2.$$
Further, it can be shown that

$$V(S^2) = E[(S^2)^2] - E[S^2]^2 = \frac{1}{n}E(X_i - \mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4 = \frac{1}{n}E(X - \mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4,$$

since E(Xᵢ − µ)⁴ = E(X − µ)⁴. Thus, the sample variance S² follows a distribution with mean σ² (the population variance) and variance V(S²) = (1/n)E(X − µ)⁴ − ((n−3)/(n(n−1)))σ⁴.
Statistics and Parameters: Note that random-sample-based statistical measures such as X̄ and S² are called statistics, while population-based statistical measures such as µ and σ² are called parameters.
Ex. Suppose a shop sells three types of MP3 players, priced at 80, 100 and 120 dollars, which account for 20%, 30% and 50% of its sales, respectively. So the revenue X from a single sale takes the values 80, 100, 120 with probabilities 0.2, 0.3, 0.5, giving µ = E[X] = 106 and σ² = V[X] = 244. Suppose on a particular day only two MP3 players are sold. Let X₁ be the revenue from the first sale and X₂ the revenue from the second, assuming that the two sales are independent of each other. Then (X₁, X₂) is a random sample of size 2 from the given population of three types of MP3 players. Note that prior to the sale, X₁ is a random variable since it could take any of the three values of X, and hence follows the same distribution as the population variable X. Similarly, X₂ also follows the distribution of X. So E[X] = E[X₁] = E[X₂] = 106 and V[X] = V[X₁] = V[X₂] = 244. The following table lists all possible (x₁, x₂) pairs, the probability P[X₁ = x₁, X₂ = x₂] of each pair, and the resulting x̄ and s² values. Note that when n = 2, s² = (x₁ − x̄)² + (x₂ − x̄)².
x₁     x₂     P[X₁ = x₁, X₂ = x₂]    x̄      s²
80     80     (0.2)(0.2) = 0.04      80      0
80     100    (0.2)(0.3) = 0.06      90      200
80     120    (0.2)(0.5) = 0.10      100     800
100    80     (0.3)(0.2) = 0.06      90      200
100    100    (0.3)(0.3) = 0.09      100     0
100    120    (0.3)(0.5) = 0.15      110     200
120    80     (0.5)(0.2) = 0.10      100     800
120    100    (0.5)(0.3) = 0.15      110     200
120    120    (0.5)(0.5) = 0.25      120     0
The sampling distributions of X̄ and S² therefore are:

X̄ = x̄        80      90                   100                          110                  120
P[X̄ = x̄]    0.04    0.06 + 0.06 = 0.12   0.10 + 0.09 + 0.10 = 0.29    0.15 + 0.15 = 0.30   0.25

S² = s²       0                            200                                  800
P[S² = s²]   0.04 + 0.09 + 0.25 = 0.38    0.06 + 0.06 + 0.15 + 0.15 = 0.42     0.10 + 0.10 = 0.20

From these tables, E[X̄] = 106 = µ and E[S²] = 244 = σ², as expected.
Also, with n = 2 and E(X − µ)⁴ = (80−106)⁴(0.2) + (100−106)⁴(0.3) + (120−106)⁴(0.5) = 110992,

$$V(S^2) = \frac{1}{n}E(X - \mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4 = \frac{1}{2}(110992) + \frac{1}{2}(244)^2 = 85264.$$
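To double-check these numbers, here is a short Python enumeration of all nine (x₁, x₂) pairs (an illustrative sketch, not part of the original notes) that reproduces the sampling distributions above:

```python
from itertools import product

# Population distribution of the revenue X from one MP3 player sale
pmf = {80: 0.2, 100: 0.3, 120: 0.5}

mu = sum(x * p for x, p in pmf.items())                       # 106
var = sum((x - mu) ** 2 * p for x, p in pmf.items())          # 244

# Enumerate all (x1, x2) samples of size n = 2 with their probabilities
xbar_dist, s2_dist = {}, {}
for (x1, p1), (x2, p2) in product(pmf.items(), repeat=2):
    p = p1 * p2
    xbar = (x1 + x2) / 2
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2                  # n = 2 case
    xbar_dist[xbar] = xbar_dist.get(xbar, 0) + p
    s2_dist[s2] = s2_dist.get(s2, 0) + p

E_xbar = sum(x * p for x, p in xbar_dist.items())             # 106 = mu
V_xbar = sum((x - E_xbar) ** 2 * p for x, p in xbar_dist.items())  # 122 = var/2
E_s2 = sum(s * p for s, p in s2_dist.items())                 # 244 = var
V_s2 = sum((s - E_s2) ** 2 * p for s, p in s2_dist.items())   # 85264
print(E_xbar, V_xbar, E_s2, V_s2)
```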
Notice that it is the pdf of a gamma distribution with α = 2 and β = 1/(2λ). Thus, T̄ is a gamma random variable with mean E[T̄] = αβ = 1/λ and variance V[T̄] = αβ² = 1/(2λ²). Also, the mean and variance of the underlying exponential distribution are µ = 1/λ and σ² = 1/λ². So E[T̄] = 1/λ = µ and V[T̄] = 1/(2λ²) = σ²/2 = σ²/n, as expected for a sample of size n = 2.
Note. The second method of obtaining information about a statistic’s sampling distribution is to perform
a simulation experiment. This method is usually used when a derivation via probability rules is too difficult
or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.
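For instance, here is a minimal simulation sketch (with an assumed exponential population and arbitrarily chosen parameters, not taken from the notes) that approximates the sampling distribution of X̄ and compares its mean and variance with µ and σ²/n:

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 10, 100_000          # rate, sample size, number of simulated samples
mu, sigma2 = 1 / lam, 1 / lam**2         # population mean and variance

# Draw `reps` samples of size n and compute the sample mean of each
samples = rng.exponential(scale=1 / lam, size=(reps, n))
xbars = samples.mean(axis=1)

print(xbars.mean(), mu)                  # both close to 0.5
print(xbars.var(), sigma2 / n)           # both close to 0.025
```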
5.1.3 Sampling distributions
In this section, we will learn about sampling distributions of some useful statistics. First, let us see some
important results in this direction pertaining to the distributions of linear combinations of independent
random variables.
(i) Uniqueness of the moment generating function (mgf): Recall that the mgf of a random variable X is defined as m_X(t) = E(e^{tX}), and its kth derivative at t = 0 gives the kth moment E(X^k). Further, the mgf of a random variable is unique: if two random variables have the same mgf, then they have the same probability distribution, and vice versa. Let us accept this well-known result from the literature.
(ii) mgf of the sum of independent random variables: Let X₁ and X₂ be independent random variables with mgfs m_{X₁}(t) and m_{X₂}(t), respectively. Let Y = X₁ + X₂. Then the mgf of Y is given by

$$m_Y(t) = E(e^{tY}) = E(e^{tX_1}e^{tX_2}) = E(e^{tX_1})\,E(e^{tX_2}) = m_{X_1}(t)\,m_{X_2}(t),$$

where the independence of X₁ and X₂ is used in the third equality.
Let X₁, X₂, ..., Xₙ be n random variables with means µ₁, µ₂, ..., µₙ and variances σ₁², σ₂², ..., σₙ², respectively. Let a₁, a₂, ..., aₙ be n constants. Then the mean and variance of the linear combination Y = a₁X₁ + a₂X₂ + ... + aₙXₙ are given by

$$E[Y] = \sum_{i=1}^{n} a_i\mu_i, \qquad V[Y] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j).$$

In case X₁, X₂, ..., Xₙ are independent random variables, we have Cov(Xᵢ, Xⱼ) = 0 for i ≠ j and Cov(Xᵢ, Xᵢ) = σᵢ². So we get

$$V[Y] = \sum_{i=1}^{n} a_i^2\,\sigma_i^2.$$

Also, for independent X₁, X₂, ..., Xₙ,

$$m_Y(t) = m_{X_1}(a_1 t)\,m_{X_2}(a_2 t)\cdots m_{X_n}(a_n t) = \prod_{i=1}^{n} m_{X_i}(a_i t).$$
Ex. If X₁ and X₂ are two independent Poisson random variables with parameters (means) k₁ and k₂, respectively, then find the distribution of Y = X₁ + X₂.

Sol. We recall that the mgf of a Poisson random variable with mean k is

$$m_X(t) = e^{k(e^t - 1)}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = e^{k_1(e^t-1)}\,e^{k_2(e^t-1)} = e^{(k_1+k_2)(e^t-1)},$$
which is the mgf of a Poisson random variable with mean k1 + k2 . So, by uniqueness of mgf, Y = X1 + X2
follows a Poisson distribution with mean k1 + k2 .
Note that, in general, a linear combination of two independent Poisson random variables is not a Poisson random variable. For example, if Y = 2X₁ + 3X₂, then m_Y(t) = e^{k₁(e^{2t}−1) + k₂(e^{3t}−1)}, which is not the mgf of a Poisson random variable.
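A quick numerical check of this closure property (an illustrative sketch using scipy, with arbitrarily chosen k₁ = 2 and k₂ = 3) is to convolve the two Poisson pmfs and compare against the Poisson(k₁ + k₂) pmf:

```python
import numpy as np
from scipy.stats import poisson

k1, k2 = 2.0, 3.0
N = 60                                    # truncation point; tail mass beyond N is negligible

# pmfs of X1 ~ Poisson(k1) and X2 ~ Poisson(k2) on 0..N
p1 = poisson.pmf(np.arange(N + 1), k1)
p2 = poisson.pmf(np.arange(N + 1), k2)

# pmf of Y = X1 + X2 by discrete convolution, versus Poisson(k1 + k2)
py = np.convolve(p1, p2)[: N + 1]
target = poisson.pmf(np.arange(N + 1), k1 + k2)

print(np.max(np.abs(py - target)))        # ~1e-16: the two pmfs agree
```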
Ex. If X₁ and X₂ are two independent exponential random variables, both with the same parameter (mean) β, then find the distribution of Y = X₁ + X₂.

Sol. We recall that the mgf of an exponential random variable with mean β is

$$m_X(t) = (1 - \beta t)^{-1}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = (1-\beta t)^{-1}(1-\beta t)^{-1} = (1-\beta t)^{-2},$$

which is the mgf of a gamma random variable with parameters α = 2 and β. So, by uniqueness of mgf, Y = X₁ + X₂ follows a gamma distribution with parameters α = 2 and β.

In case X₁ and X₂ are exponential variables with different parameters, say β₁ and β₂, then m_Y(t) = (1 − β₁t)⁻¹(1 − β₂t)⁻¹, which is not the mgf of a gamma random variable.
Ex. If X1 and X2 are two independent χ2 random variables with n1 and n2 dof respectively, then find
the distribution of Y = X1 + X2 .
Sol. We recall that the mgf of a χ² random variable X with n dof is

$$m_X(t) = (1 - 2t)^{-n/2}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = (1-2t)^{-n_1/2}\,(1-2t)^{-n_2/2} = (1-2t)^{-(n_1+n_2)/2},$$

which is the mgf of a χ² random variable with n₁ + n₂ dof. So, by uniqueness of mgf, Y = X₁ + X₂ follows a χ² distribution with n₁ + n₂ dof.
Note that a linear combination of X1 and X2 different from X1 + X2 is not a χ2 random variable
(verify!).
Ex. If X₁ and X₂ are two independent normal random variables, both with mean µ and variance σ², then find the distribution of Y = X₁ + X₂.

Sol. We recall that the mgf of a normal random variable X with mean µ and variance σ² is

$$m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}\,e^{\mu t + \frac{1}{2}\sigma^2 t^2} = e^{2\mu t + \frac{1}{2}(2\sigma^2)t^2},$$

which is the mgf of a normal random variable with mean 2µ and variance 2σ². So, by uniqueness of mgf, Y = X₁ + X₂ follows a normal distribution with mean 2µ and variance 2σ² (consistent with the general formula below with a₁ = a₂ = 1).
Further, if Y = a₁X₁ + a₂X₂, then

$$m_Y(t) = e^{(a_1+a_2)\mu t + \frac{1}{2}(a_1^2 + a_2^2)\sigma^2 t^2}.$$

This shows that the linear combination Y = a₁X₁ + a₂X₂ is a normal random variable with mean (a₁ + a₂)µ and variance (a₁² + a₂²)σ².
In general, if X₁ and X₂ are two independent normal random variables with means µ₁, µ₂ and variances σ₁², σ₂², and Y = a₁X₁ + a₂X₂, then

$$m_Y(t) = e^{(a_1\mu_1 + a_2\mu_2)t + \frac{1}{2}(a_1^2\sigma_1^2 + a_2^2\sigma_2^2)t^2}.$$

This shows that the linear combination Y = a₁X₁ + a₂X₂ is a normal random variable with mean a₁µ₁ + a₂µ₂ and variance a₁²σ₁² + a₂²σ₂².
5.1.4 Sampling distribution of mean X̄

Case (i) Normal population

Let X₁, X₂, ..., Xₙ be a random sample of size n from a normal population with mean µ and variance σ². Then the sample mean X̄ follows a normal distribution with mean µ and variance σ²/n. For, we know that the mgf of a normal random variable X with mean µ and variance σ² is

$$m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

Also, here X̄ = (1/n)X₁ + (1/n)X₂ + ... + (1/n)Xₙ is a linear combination of the n independent normal random variables X₁, X₂, ..., Xₙ. So we have

$$m_{\bar{X}}(t) = m_{X_1}(t/n)\,m_{X_2}(t/n)\cdots m_{X_n}(t/n) = e^{\left(\frac{\mu}{n}+\cdots+\frac{\mu}{n}\right)t + \frac{1}{2}\left(\frac{\sigma^2}{n^2}+\cdots+\frac{\sigma^2}{n^2}\right)t^2} = e^{\mu t + \frac{1}{2}\frac{\sigma^2}{n}t^2},$$

which is the mgf of a normal random variable with mean µ and variance σ²/n. So, by uniqueness of mgf, X̄ is normally distributed with mean µ and variance σ²/n.

Here, we have a very useful and important observation: the variance σ²/n of X̄ decreases as the size n of the random sample increases (for an illustration, see Figure 5.1). It means that as we increase the sample size, the observed values of the statistic X̄ are likely to appear closer and closer to the population mean µ.
Case (ii) Population with known or unknown variance (large sample size)

Let X₁, X₂, ..., Xₙ be a random sample of size n from any population distribution with mean µ and variance σ². Then, for large n, X̄ is approximately normal (exactly normal in case of a normal population) with mean µ and variance σ²/n. Therefore, for large n, Z = (X̄ − µ)/(σ/√n) is approximately standard normal. Further, for large n, the statistic Z = (X̄ − µ)/(S/√n) is also approximately standard normal, where S is the S.D. of the sample.

Figure 5.1: The sampling distributions of the sample mean X̄ with sample sizes n = 2, 4, 16, obtained via 10000 MCMC simulations from a normal population with µ = 1 and σ² = 1. We can see that the sampling distributions are approximately normal, and the variance of the sampling distribution decreases with the increase in the sample size n, as expected.
Thumb rule for applying CLT: The normal approximation for X̄ is generally good if n ≥ 30, provided
the population distribution is not terribly skewed. If n < 30, the approximation is good only if the
population is not too different from a normal distribution.
(i) If a random sample of size n is selected from a normal population having mean µ and variance σ², then the sample mean X̄ follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is a standard normal variable.

(ii) If a random sample of size n ≥ 30 (large sample) is selected from a non-normal population having mean µ and variance σ², then by CLT, the sample mean X̄ approximately follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is approximately a standard normal variable. In case the population variance σ² is unknown, we can use the sample variance S², that is, Z = (X̄ − µ)/(S/√n) is also approximately a standard normal variable.
Ex. The breaking strength of a rivet has a mean value of 10000 psi and a standard deviation of 500 psi. What is the probability that the sample mean breaking strength for a random sample of 40 rivets is between 9950 and 10250?

Sol. Here µ = 10000, σ = 500 and n = 40 > 30. So by CLT, Z = (X̄ − 10000)/(500/√40) is approximately standard normal. Therefore, we have

$$P(9950 \le \bar{X} \le 10250) \approx P\left(\frac{9950-10000}{500/\sqrt{40}} \le Z \le \frac{10250-10000}{500/\sqrt{40}}\right) = P(-0.63 \le Z \le 3.16) = F(3.16) - F(-0.63) = 0.9992 - 0.2643 = 0.7349.$$
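The same probability can be computed directly (an illustrative sketch using scipy.stats, consistent with the hand computation above):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 10_000, 500, 40
se = sigma / sqrt(n)                       # standard error of the sample mean

# P(9950 <= Xbar <= 10250) under the CLT normal approximation
prob = norm.cdf(10_250, loc=mu, scale=se) - norm.cdf(9_950, loc=mu, scale=se)
print(round(prob, 4))                      # ~0.7357 (0.7349 with rounded table values)
```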
Note: To save time, you can use the online probability calculator SticiGui, which provides probabilities for different distributions. Else, use the probability distribution tables given in the textbook. When using a table, you may need to interpolate in order to get a better value in case the exact value is not available. The interpolation procedure is as follows:

Suppose we need P[X ≤ c], that is, F(c), but only F(a) and F(b) are available from the table, where a < c < b. Then, for linear interpolation, use

$$P[X \le c] = F(c) = F(a) + \frac{c-a}{b-a}\,\left(F(b) - F(a)\right).$$
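As a small illustration (with hypothetical table values), the rule is one line of code:

```python
def interpolate_cdf(a: float, b: float, c: float, Fa: float, Fb: float) -> float:
    """Linear interpolation of a cdf table: estimate F(c) for a < c < b."""
    return Fa + (c - a) / (b - a) * (Fb - Fa)

# Example: estimate F(0.6325) from standard normal table values at 0.63 and 0.64
print(interpolate_cdf(0.63, 0.64, 0.6325, 0.7357, 0.7389))   # ~0.7365
```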
5.1.5 Sampling distribution of a statistic involving S²

If S² is the variance of a random sample of size n taken from a normal population having the variance σ², then the statistic

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar{X})^2}{\sigma^2}$$

follows a chi-squared distribution with n − 1 degrees of freedom, with

$$E[\chi^2_{n-1}] = n-1, \qquad V[\chi^2_{n-1}] = 2(n-1).$$

We can use this to get the mean and variance of S² for a normal population:

$$E[S^2] = E\left[\frac{\sigma^2\chi^2_{n-1}}{n-1}\right] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2,$$

$$V[S^2] = V\left[\frac{\sigma^2\chi^2_{n-1}}{n-1}\right] = \frac{\sigma^4}{(n-1)^2}\,V[\chi^2_{n-1}] = \frac{\sigma^4}{(n-1)^2}\,2(n-1) = \frac{2\sigma^4}{n-1}.$$
5.1.6 Sampling distribution of a statistic involving X̄ (small sample)

If X̄ and S² are the mean and variance of a random sample of size n (small, n < 30) from a normal population with mean µ, then the statistic T_{n−1} = (X̄ − µ)/(S/√n) follows a T-distribution with n − 1 degrees of freedom.

Ex. An electrical firm manufactures light bulbs that have a length of life that is normally distributed with mean equal to 800 hours. Find the probability that a random sample of 16 bulbs with a standard deviation of 40 hours will have an average life of less than 775 hours.

Sol. Here µ = 800, n = 16 and S = 40. So T_{n−1} = (X̄ − µ)/(S/√n) = (X̄ − 800)/(40/√16) = T₁₅ follows a T-distribution with 15 degrees of freedom, and therefore

$$P(\bar{X} < 775) = P\left(T_{15} < \frac{775-800}{40/\sqrt{16}}\right) = P(T_{15} < -2.5) \approx 0.0123.$$
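This tail probability is easy to check numerically (an illustrative sketch using scipy.stats):

```python
from math import sqrt
from scipy.stats import t

mu, n, s = 800, 16, 40
t_obs = (775 - mu) / (s / sqrt(n))        # = -2.5
print(t.cdf(t_obs, df=n - 1))             # ~0.0123 = P(T_15 < -2.5)
```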
5.1.7 Sampling distribution of the difference between two sample means

Let X̄ and Ȳ be the means of two independent random samples of sizes n₁ and n₂ drawn from two populations with means µ₁, µ₂ and variances σ₁², σ₂², respectively. Then X̄ − Ȳ has mean

$$E[\bar{X} - \bar{Y}] = \mu_1 - \mu_2$$

and variance

$$V[\bar{X} - \bar{Y}] = V[\bar{X}] + V[\bar{Y}] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

Therefore,

$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a standard normal variable if the two populations are normal, and is approximately standard normal for large n₁ and n₂ (by CLT).
5.1.8 Sampling distribution of sample proportion
Let p denote the proportion of “successes” in a Binomial population, and suppose X is the number of successes in a random sample of size n. Then the mean and variance of X are E[X] = np and V(X) = E[X²] − E[X]² = np(1 − p), respectively. Furthermore, if np ≥ 10 and np(1 − p) ≥ 10, then X approximately follows a normal distribution. Since n is constant, the sample proportion P̂ = X/n also approximately follows a normal distribution, with mean

$$E\left[\frac{X}{n}\right] = \frac{1}{n}E[X] = \frac{1}{n}\,np = p$$

and variance

$$V\left[\frac{X}{n}\right] = E\left[\left(\frac{X}{n}\right)^2\right] - E\left[\frac{X}{n}\right]^2 = \frac{1}{n^2}\left(E[X^2] - E[X]^2\right) = \frac{1}{n^2}\,np(1-p) = \frac{p(1-p)}{n}.$$

Therefore, the statistic

$$Z = \frac{\hat{P} - p}{\sqrt{p(1-p)/n}}$$

is approximately a standard normal variable.
5.1.9 Sampling distribution of difference of sample proportions

If P̂₁ and P̂₂ are the sample proportions from two independent random samples of sizes n₁ and n₂ drawn from two Binomial populations with proportions p₁ and p₂, respectively, then

$$E[\hat{P}_1 - \hat{P}_2] = p_1 - p_2, \qquad V[\hat{P}_1 - \hat{P}_2] = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2},$$

provided the conditions of normality are met by the chosen samples from the Binomial populations. Therefore, the statistic

$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$$

is approximately a standard normal variable.
5.1.10 Summary
Before we proceed further, it is important to recollect the following important points.
• Population is the collection of all objects or observations under study, and any subset of the popu-
lation is a sample.
• In practical applications, we usually need to know the properties (mean, variance, etc.) of a large or infinite population. Due to the large size, we cannot deal with each item of the population. So we collect random (unbiased) sample(s) from the population, and from the collected sample information, we infer the properties of the population.

• The mean of the random sample X₁, X₂, ..., Xₙ is X̄ = (X₁ + X₂ + ... + Xₙ)/n. Note that X̄, being a combination of random variables, is also a random variable. Further, X̄ follows a distribution with mean µ and variance σ²/n, if the random sample is selected from a population having mean µ and variance σ².
• If a random sample of size n is selected from a normal population having mean µ and variance σ², then the sample mean X̄ follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is a standard normal variable.
• If a random sample of size n < 30 (small sample) with mean X̄ and variance S² is selected from a normal distribution with mean µ but unknown variance, then T_{n−1} = (X̄ − µ)/(S/√n) follows a T-distribution with n − 1 degrees of freedom.
• If a random sample of size n ≥ 30 (large sample) is selected from a non-normal population having mean µ and variance σ², then by CLT the sample mean X̄ approximately follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is approximately a standard normal variable. In case the population variance σ² is unknown, we can use the sample variance S², that is, Z = (X̄ − µ)/(S/√n) is also approximately a standard normal variable.
• If S² is the variance of a random sample of size n taken from a normal population having the variance σ², then the statistic χ²_{n−1} = (n−1)S²/σ² = Σᵢ(Xᵢ − X̄)²/σ² follows a chi-squared distribution with n − 1 degrees of freedom.
• If X1 , X2 , ...., Xn are independent normal random variables with means µ1 , µ2 ,...., µn and variances
σ12 , σ22 , ...., σn2 , respectively, then the linear combination a1 X1 + a2 X2 + ... + an Xn is also a normal
random variable with mean a1 µ1 + a2 µ2 + ... + an µn and variance a21 σ12 + a22 σ22 + ... + a2n σn2 .
We shall study the following three approaches of statistical inference about a population parameter.

(i) Point Estimation: We use the sample information to estimate the value of the population parameter; the estimated value is called the point estimate.

(ii) Confidence Interval: We use the sample information to estimate an interval, called the confidence interval, in which the population parameter is likely to lie with some given probability (confidence).

(iii) Hypothesis Testing: We use the sample information to test a given or existing hypothesis or statement about the population parameter, called the null hypothesis, via an alternative hypothesis.
5.2 Point Estimation
Estimating the value of a population parameter θ from the sample information is referred to as point estimation of the population parameter θ.

5.2.1 Unbiased point estimator

A statistic θ̂ is said to be an unbiased estimator of the parameter θ if E[θ̂] = θ. We have seen that

E[X̄] = µ, E[S²] = σ².

Therefore, the sample mean µ̂ = X̄ and the sample variance σ̂² = S² are unbiased estimators of the population mean µ and variance σ², respectively.
5.2.2 Maximum likelihood (ML) method

In the ML method, we estimate a parameter by the value that maximizes the likelihood function of the observed sample, that is, the joint probability (or density) of the observed data regarded as a function of the parameter.

Ex. Suppose 7 heads are observed in 10 tosses of a coin. Find the ML estimate of the probability of success (observing a head) in each trial.

Sol. Let p be the probability of success (observing a head) in each trial. Here, the random experiment is governed by the Binomial distribution. Therefore, the probability of observing 7 heads in 10 tosses (the likelihood function of the given sample) is given by

$$L(p) = \binom{10}{7} p^7 (1-p)^3.$$

This implies

$$\ln L(p) = \ln\binom{10}{7} + 7\ln p + 3\ln(1-p).$$
So d/dp (ln L(p)) = 0 gives

$$\frac{7}{p} - \frac{3}{1-p} = 0.$$

This gives p = 7/10. Thus the ML estimate is simply the observed fraction of successes, which is intuitively obvious. Also, the sample information suggests that the coin is not fair.
Ex. Suppose 15 buses arrive at a bus stop in a span of 3 hours. Find the ML estimate of the average number of buses that arrive at the bus stop per hour.

Sol. Let λ be the average number of buses that arrive at the bus stop in one hour. Here, the random experiment is governed by the Poisson distribution. Therefore, the probability of 15 buses arriving in a span of 3 hours (the likelihood function of the given sample) is given by

$$L(\lambda) = \frac{(3\lambda)^{15}\, e^{-3\lambda}}{15!}.$$

This implies

$$\ln L(\lambda) = 15\ln(3\lambda) - 3\lambda - \ln 15!.$$

So d/dλ (ln L(λ)) = 0 gives

$$\frac{15}{\lambda} - 3 = 0.$$

This gives λ = 5, as expected.
Ex. Let a₁, a₂, ..., aₙ be an observed set of values of a random sample X₁, X₂, ..., Xₙ from a normal population distribution with mean µ and variance σ². Find the ML estimates of µ and σ².

Sol. The density for Xᵢ is

$$f(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i - \mu}{\sigma}\right)^2}, \quad i = 1, 2, ..., n.$$

Since X₁, X₂, ..., Xₙ are independent, their joint density reads as

$$f(x_1, x_2, ..., x_n; \mu, \sigma) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2}.$$

Therefore, the likelihood function providing the observed values a₁, a₂, ..., aₙ of the random sample X₁, X₂, ..., Xₙ is

$$L(\mu, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(a_i - \mu)^2}.$$

$$\therefore\ \ln L(\mu, \sigma) = -n\ln\sqrt{2\pi} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(a_i - \mu)^2.$$

Putting the partial derivatives of ln L(µ, σ) equal to 0, we find

$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} a_i = \bar{a}, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(a_i - \bar{a})^2.$$
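These closed-form estimates are straightforward to evaluate on data (an illustrative sketch with a hypothetical sample; note that the ML variance estimate divides by n, unlike the unbiased S², which divides by n − 1):

```python
import numpy as np

a = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])   # hypothetical observed sample

mu_hat = a.mean()                               # ML estimate of mu
sigma2_hat = ((a - mu_hat) ** 2).mean()         # ML estimate of sigma^2 (divides by n)
s2 = a.var(ddof=1)                              # unbiased sample variance (divides by n - 1)

print(mu_hat, sigma2_hat, s2)
```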
5.2.3 Method of moments
In many cases, the moments involve the parameter θ to be estimated. We can often obtain a reasonable estimator for θ by replacing the theoretical moments by their estimates based on the drawn sample and solving the resulting equations for the estimator θ̂. The kth moment of the random sample X₁, X₂, ..., Xₙ is defined as (1/n)Σᵢ Xᵢᵏ. On the other hand, the kth moment of the population is E(Xᵏ). If there are m unknown parameters, then we use the m equations

$$E(X^k) = \frac{1}{n}\sum_{i=1}^{n} X_i^k, \quad k = 1, 2, ..., m,$$

and solve these equations to obtain the m unknown parameters.
Ex. A forester plants 5 rows of pine seedlings with 20 pine seedlings in each row. Let X denote the number of seedlings per row that survive the first winter. Then X follows a binomial distribution with n = 20 and unknown p. Find an estimate of p given that X₁ = 18, X₂ = 17, X₃ = 15, X₄ = 19, X₅ = 20.

Sol. The first moment of the binomial random variable X is E[X] = np = 20p, while the first moment of the given sample is (1/5)Σᵢ Xᵢ = (1/5)(18 + 17 + 15 + 19 + 20) = 17.8. So solving 20p̂ = 17.8, we find p̂ = 0.89, the estimate for p.
Ex. Suppose 2, 4, 3, 6, 10 are the values of a sample of size 5 from a gamma distribution. Find estimates of the parameters α and β of the gamma distribution.

Sol. The first and second moments of a gamma random variable are αβ and αβ² + α²β², while the given sample moments are (2 + 4 + 3 + 6 + 10)/5 = 5 and (2² + 4² + 3² + 6² + 10²)/5 = 33. So solving α̂β̂ = 5 and α̂β̂² + α̂²β̂² = 33, we find the estimates α̂ = 25/8, β̂ = 8/5.
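The algebra here is simple (α̂β̂² = 33 − 5² = 8, so β̂ = 8/5 and α̂ = 5/β̂ = 25/8), and a short check in Python confirms it:

```python
m1, m2 = 5.0, 33.0            # first and second sample moments

# For a gamma distribution: E[X] = a*b and E[X^2] = a*b^2 + (a*b)^2
beta_hat = (m2 - m1**2) / m1  # a*b^2 = m2 - m1^2, divided by a*b = m1
alpha_hat = m1 / beta_hat

print(alpha_hat, beta_hat)    # 3.125 = 25/8 and 1.6 = 8/5
```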
Note: The estimate obtained from the method of moments often agrees with the one obtained from the ML method. If they disagree in some case, the ML estimate is preferred.

In the above, we learned some methods to find the point estimate of a population parameter θ. Now we learn how to use the sample information to find an interval or range in which the parameter θ is likely to lie with some given probability (confidence).
5.3 Confidence interval (CI)
A 100(1 − α)% confidence interval (CI) for a population parameter θ is a random interval [L1 , L2 ] such
that P [L1 ≤ θ ≤ L2 ] = 1 − α, regardless the value of θ.
X̄ − µ
Z= √
σ/ n
follows a standard normal distribution. We utilize this fact to find CI of µ. From the standard normal
distribution, we know that zα/2 is the value of Z such that P [Z > zα/2 ] = α/2. Then by the symmetry
of the distribution, we have
where x̄ is the computed value of X̄ from the given sample. This means that we can be 100(1 − α)%
√
confident that the error will not exceed zα/2 σ/ n.
Frequently, we wish to know how large a sample is necessary to ensure that the error in estimating µ will be less than a specified amount e. Clearly, we must choose n such that zα/2 σ/√n = e, or n = (zα/2 σ/e)².

Now, let us find the 95% CI for µ. From the normal probability distribution table, we have z0.025 = 1.96, so the 95% CI for µ is [x̄ − 1.96σ/√n, x̄ + 1.96σ/√n].

Ex. Find the 95% CI for the mean µ of a normal population with σ = 3, given a sample of size 16 with mean 13.88.

Sol. Here n = 16, x̄ = 13.88 and σ = 3. So the 95% confidence limits are given by

L₁ = x̄ − 1.96σ/√n = 13.88 − 1.96(3/4) = 12.41,
L₂ = x̄ + 1.96σ/√n = 13.88 + 1.96(3/4) = 15.35.
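The same interval can be obtained in one line (an illustrative sketch using scipy.stats):

```python
from math import sqrt
from scipy.stats import norm

n, xbar, sigma = 16, 13.88, 3
lo, hi = norm.interval(0.95, loc=xbar, scale=sigma / sqrt(n))
print(round(lo, 2), round(hi, 2))        # 12.41 15.35
```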
Ex. How large a sample is required if we want to be 95% confident that our estimate of the mean of a population with S.D. 0.3 is off by less than 0.05?

Sol. Here σ = 0.3, e = 0.05 and z0.025 = 1.96. So n = (z0.025 σ/e)² = (1.96 × 0.3/0.05)² ≈ 138.3. Hence a sample of size n = 139 is required.
Note: If a sample X₁, X₂, ..., Xₙ of size n > 30 (large sample) with mean X̄ and variance S² is drawn from a non-normal population with unknown mean µ, then the CLT suggests that the statistic

$$Z = \frac{\bar{X} - \mu}{S/\sqrt{n}}$$

approximately follows the standard normal distribution. So the 100(1 − α)% CI for µ is

$$[L_1, L_2] = [\bar{x} - z_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + z_{\alpha/2}\,s/\sqrt{n}],$$

where x̄ and s are the computed values of X̄ and S from the given sample.

5.3.2 CI of mean of normal population with unknown variance

If a random sample of size n with mean X̄ and variance S² is drawn from a normal population with unknown mean µ and unknown variance, then T_{n−1} = (X̄ − µ)/(S/√n) follows a T-distribution with n − 1 degrees of freedom, so

$$P\left[-t_{\alpha/2} \le \frac{\bar{X}-\mu}{S/\sqrt{n}} \le t_{\alpha/2}\right] = 1 - \alpha,$$

$$\therefore\ P\left[\bar{X} - t_{\alpha/2}\,S/\sqrt{n} \le \mu \le \bar{X} + t_{\alpha/2}\,S/\sqrt{n}\right] = 1 - \alpha.$$
Thus, the 100(1 − α)% CI for µ is

$$[L_1, L_2] = [\bar{x} - t_{\alpha/2}\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2}\,s/\sqrt{n}],$$

where x̄ and s are the computed values of X̄ and S from the given sample. In particular, the 95% CI for µ is

$$[L_1, L_2] = [\bar{x} - t_{0.025}\,s/\sqrt{n},\ \bar{x} + t_{0.025}\,s/\sqrt{n}].$$
Ex. Find the 95% CI for µ of a normal population based on a sample of size 24 with mean x̄ = 53.92 and S.D. s = 10.07.

Sol. Here n = 24, x̄ = 53.92 and s = 10.07. From the T probability distribution table, for 23 degrees of freedom, we have t0.025 = 2.069. So the 95% confidence limits are given by

L₁ = x̄ − t0.025 s/√n = 53.92 − 2.069(10.07)/√24 = 49.67,
L₂ = x̄ + t0.025 s/√n = 53.92 + 2.069(10.07)/√24 = 58.17.
5.3.3 CI of variance of normal distribution with unknown mean
Let X₁, X₂, ..., Xₙ be a random sample of size n with mean X̄ and variance S² drawn from a normal distribution with unknown mean µ and variance σ². Then we recall that the statistic

$$\chi^2_{n-1} = (n-1)S^2/\sigma^2 = \sum_{i=1}^{n} (X_i - \bar{X})^2/\sigma^2$$

follows a chi-squared distribution with n − 1 degrees of freedom. Further, χ²_{α/2} and χ²_{1−α/2} denote the values of χ²_{n−1} such that P[χ²_{n−1} ≥ χ²_{α/2}] = α/2 and P[χ²_{n−1} ≥ χ²_{1−α/2}] = 1 − α/2. Therefore,

$$P[\chi^2_{1-\alpha/2} \le \chi^2_{n-1} \le \chi^2_{\alpha/2}] = 1 - \alpha,$$

that is,

$$P\left[\frac{(n-1)S^2}{\chi^2_{\alpha/2}} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}}\right] = 1 - \alpha.$$

Thus, the 100(1 − α)% CI for σ² is [(n − 1)s²/χ²_{α/2}, (n − 1)s²/χ²_{1−α/2}], where s² is the computed value of S² from the given sample.
Ex. Find the 95% CI for σ² of a normal population based on a sample of size 25 with s² = 1.408.

Sol. Here n = 25 and s² = 1.408. From the χ² probability distribution table, for 24 degrees of freedom, we have χ²0.025 = 39.4 and χ²0.975 = 12.4. So the 95% confidence limits are given by

L₁ = (n − 1)s²/χ²0.025 = 24(1.408)/39.4 = 0.858,
L₂ = (n − 1)s²/χ²0.975 = 24(1.408)/12.4 = 2.725.
5.3.4 CI of difference of population means

Let X̄ and Ȳ be the means of two independent random samples of sizes n₁ and n₂ drawn from two normal populations with unknown means µ₁, µ₂ and known variances σ₁², σ₂², respectively. Then

$$P\left[\bar{X} - \bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le \bar{X} - \bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = 1 - \alpha.$$

Thus, the 100(1 − α)% CI for µ₁ − µ₂ is

$$[L_1, L_2] = \left[\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right],$$

where x̄ and ȳ are the computed values of X̄ and Ȳ, respectively, from the given samples.
Note. This procedure for estimating the difference between two means is applicable if σ₁² and σ₂² are known. If the variances are not known and the two distributions involved are normal, the T-distribution becomes involved, as in the case of a single sample. If one is not willing to assume normality, large samples (say, greater than 30) allow the use of the computed sample S.D.s s₁ and s₂ in place of σ₁ and σ₂, respectively.
Ex. A study was conducted in which two types of engines, A and B, were compared. Gas mileage, in miles
per gallon, was measured. Fifty experiments were conducted using engine type A and 75 experiments were
done with engine type B. The gasoline used and other conditions were held constant. The average gas
mileage was 36 miles per gallon for engine A and 42 miles per gallon for engine B. Find a 96% confidence
interval on µB −µA , where µA and µB are population mean gas mileages for engines A and B, respectively.
Assume that the population standard deviations are 6 and 8 for engines A and B, respectively.
Sol. Here n₁ = 50, n₂ = 75, x̄ = 36, ȳ = 42, σ₁ = 6 and σ₂ = 8. Also, we find z0.02 = 2.05. So the required CI is

$$\left[\bar{y} - \bar{x} - z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \bar{y} - \bar{x} + z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right] = [3.43,\ 8.57].$$
5.3.5 CI of population proportion

Let P̂ = X/n be the sample proportion of successes in a random sample of size n drawn from a Binomial population with unknown proportion p. For large n, Z = (P̂ − p)/√(p(1−p)/n) is approximately standard normal, and replacing the unknown p(1 − p) by its estimate p̂q̂ gives the approximate 100(1 − α)% CI for p:

$$[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n},\ \hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}],$$

where p̂ is the computed value of P̂ from the given sample, and q̂ = 1 − p̂.
Ex. In a random sample of 500 families owning television sets in Delhi, it is found that 340 subscribe to
HBO. Find a 95% CI for the actual proportion of families with television sets in Delhi that subscribe to
HBO.
Sol. Here n = 500, x = 340. So p̂ = x/n = 340/500 = 0.68 and q̂ = 0.32. Also, z0.025 = 1.96. So the 95% CI for the actual proportion is

$$[\hat{p} - z_{0.025}\sqrt{\hat{p}\hat{q}/n},\ \hat{p} + z_{0.025}\sqrt{\hat{p}\hat{q}/n}] = [0.6391,\ 0.7209].$$
5.3.6 CI of difference of population proportions

Let P̂₁ = X/n₁ and P̂₂ = Y/n₂ be the sample proportions of successes in two independent random samples of sizes n₁ and n₂ drawn from two Binomial populations with unknown proportions p₁ and p₂, respectively. Then the statistic

$$Z = \frac{(\hat{P}_1 - \hat{P}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$$

is approximately a standard normal variable. For large samples, the approximate 100(1 − α)% CI for p₁ − p₂ is obtained as

$$\left[\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\ \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right],$$

where p̂₁, p̂₂ are the computed values of P̂₁, P̂₂ from the given samples, and q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂.
Ex. A certain change in a process for manufacturing component parts is being considered. Samples are
taken under both the existing and the new process so as to determine if the new process results in an
improvement. If 75 of 1500 items from the existing process are found to be defective and 80 of 2000 items
from the new process are found to be defective, find a 90% confidence interval for the true difference in
the proportion of defectives between the existing and the new process.
Sol. Here n₁ = 1500, n₂ = 2000, x = 75, y = 80. So p̂₁ = x/n₁ = 75/1500 = 0.05 and p̂₂ = y/n₂ = 80/2000 = 0.04. Also, z0.05 = 1.645. So the 90% CI for p₁ − p₂ is

$$\left[\hat{p}_1 - \hat{p}_2 - z_{0.05}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\ \hat{p}_1 - \hat{p}_2 + z_{0.05}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right] = [-0.0017,\ 0.0217].$$
5.3.7 One-sided confidence bounds

Sometimes we need only a lower or an upper bound on the parameter. Since P[Z ≤ zα] = 1 − α for the standard normal Z = (X̄ − µ)/(σ/√n), we get P[X̄ − zα σ/√n ≤ µ] = 1 − α. So the 100(1 − α)% lower confidence bound for µ is

$$\bar{x} - z_\alpha\,\sigma/\sqrt{n},$$

where x̄ is the computed value of the sample mean X̄ from the given sample. Likewise, the 100(1 − α)% upper confidence bound for µ is obtained as

$$\bar{x} + z_\alpha\,\sigma/\sqrt{n}.$$
Ex. In a psychological testing experiment, 25 subjects are selected randomly and their reaction time, in seconds, to a particular stimulus is measured. Past experience suggests that the variance in reaction times to these types of stimuli is 4 sec² and that the distribution of reaction times is approximately normal. The average time for the subjects is 6.2 seconds. Give an upper 95% bound for the mean reaction time.

Sol. Here n = 25, x̄ = 6.2, σ = 2 and z0.05 = 1.645. So the 95% upper confidence bound is x̄ + z0.05 σ/√n = 6.2 + 1.645(2/√25) = 6.858. Hence, we are 95% confident that the mean reaction time is less than 6.858 seconds.
5.3.8 Summary
Before we proceed further, it is important to recollect the following important points.
• If X̄ is the mean of a random sample of size n selected from a normal population with unknown mean µ and known variance σ², then the statistic Z = (X̄ − µ)/(σ/√n) is standard normal, and the 100(1 − α)% CI of µ is [L₁, L₂] = [x̄ − zα/2 σ/√n, x̄ + zα/2 σ/√n], where x̄ is the computed value of X̄ from the given sample.
• If X̄ is the mean of a random sample of size n ≥ 30 (large sample) selected from a non-normal population with unknown mean µ and known variance σ², then the statistic Z = (X̄ − µ)/(σ/√n) is approximately standard normal, and the 100(1 − α)% CI of µ is [L₁, L₂] = [x̄ − zα/2 σ/√n, x̄ + zα/2 σ/√n], where x̄ is the computed value of X̄ from the given sample.
• If X̄ is the mean and S² is the variance of a random sample of size n ≥ 30 (large sample) selected from a non-normal population with unknown mean µ and unknown variance, then the statistic Z = (X̄ − µ)/(S/√n) is approximately standard normal, and the 100(1 − α)% CI of µ is [L₁, L₂] = [x̄ − zα/2 s/√n, x̄ + zα/2 s/√n], where x̄ and s are the computed values of X̄ and S, respectively, from the given sample.
• If X̄ is the mean and S² is the variance of a random sample of size n < 30 (small sample) selected from a normal population with unknown mean µ and unknown variance, then the statistic T_{n−1} = (X̄ − µ)/(S/√n) follows a T-distribution with n − 1 degrees of freedom, and the 100(1 − α)% CI of µ is [L₁, L₂] = [x̄ − tα/2 s/√n, x̄ + tα/2 s/√n], where x̄ and s are the computed values of X̄ and S, respectively, from the given sample.
• If X̄ is the mean and S² is the variance of a random sample of size n selected from a normal population with unknown mean and unknown variance σ², then the statistic χ²_{n−1} = (n−1)S²/σ² follows a chi-squared distribution with n − 1 degrees of freedom, and the 100(1 − α)% CI for σ² is [L₁, L₂] = [(n−1)s²/χ²_{α/2}, (n−1)s²/χ²_{1−α/2}], where s is the computed value of S from the given sample.
• If two independent random samples of sizes n₁ and n₂ with means X̄ and Ȳ are drawn from two normal populations with unknown means µ₁ and µ₂ and known variances σ₁² and σ₂², respectively, then the statistic Z = ((X̄ − Ȳ) − (µ₁ − µ₂))/√(σ₁²/n₁ + σ₂²/n₂) is standard normal, and the 100(1 − α)% CI for µ₁ − µ₂ is

$$[L_1, L_2] = \left[\bar{x} - \bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ \bar{x} - \bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}\right],$$
where x̄ and ȳ are computed values of X̄ and Ȳ , respectively, from the given samples.
• If P̂ = X/n is the proportion of successes X in a random sample of size n drawn from a binomial population with unknown success proportion p, then the statistic Z = (P̂ − p)/√(p(1−p)/n) is approximately standard normal under the assumptions np ≥ 10 and np(1 − p) ≥ 10, and the approximate 100(1 − α)% CI of p for large n is [L₁, L₂] = [p̂ − zα/2 √(p̂q̂/n), p̂ + zα/2 √(p̂q̂/n)], where p̂ is the computed value of P̂ from the given sample, and q̂ = 1 − p̂.
• If P̂₁ = X/n₁ and P̂₂ = Y/n₂ are the proportions of successes X and Y in two random samples of sizes n₁ and n₂ drawn from two binomial populations with unknown success proportions p₁ and p₂, respectively, then the statistic Z = ((P̂₁ − P̂₂) − (p₁ − p₂))/√(p₁(1−p₁)/n₁ + p₂(1−p₂)/n₂) is approximately standard normal under suitable assumptions as in the case of a single proportion, and the approximate 100(1 − α)% CI of p₁ − p₂ for large n₁ and n₂ is

$$[L_1, L_2] = \left[\hat{p}_1 - \hat{p}_2 - z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\ \hat{p}_1 - \hat{p}_2 + z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right],$$

where p̂₁, p̂₂ are the computed values of P̂₁, P̂₂ from the given samples, and q̂₁ = 1 − p̂₁, q̂₂ = 1 − p̂₂.
5.4 Hypotheses testing
In a hypotheses-testing problem, there are two contradictory hypotheses under consideration, namely the
null hypothesis and the alternative hypothesis. The null hypothesis, denoted by H0 , is the claim that
is initially assumed to be true (“the prior belief” claim). The alternative hypothesis, denoted by Ha , is
the assertion that is opposite or contradictory to H0 . The following examples illustrate the two hypotheses.
Ex. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature µ is 130 degrees Fahrenheit. To analyze the manufacturer's claim, we need to test Ha versus H0, where

H0 : µ = 130
Ha : µ ≠ 130

Similar hypothesis pairs arise in other settings, for example:

H0 : µ ≤ 1.5 mg versus Ha : µ > 1.5 mg (a company's claim that the average nicotine content of its cigarettes is at most 1.5 mg);
H0 : µ = 1000 hours versus Ha : µ ≠ 1000 hours;
H0 : p = 0.1 versus Ha : p ≠ 0.1;
H0 : p ≤ 0.3 versus Ha : p > 0.3.
Ex. Let σ denote the standard deviation of the distribution of inside diameters (inches) for a certain type of metal sleeve. The company decides to continue the production of sleeves unless sample evidence conclusively demonstrates σ > 0.001. So to continue/stop the production, one needs to test Ha versus H0, where

H0 : σ = 0.001
Ha : σ > 0.001
From the above examples, it is clear that a hypothesis statement about a parameter θ of interest appears in one of the following forms:

(1) H0 : θ = θ0, Ha : θ ≠ θ0 (two-tailed)
(2) H0 : θ = θ0, Ha : θ < θ0 (left-tailed)
(3) H0 : θ = θ0, Ha : θ > θ0 (right-tailed)
(4) H0 : θ ≥ θ0, Ha : θ < θ0 (left-tailed)
(5) H0 : θ ≤ θ0, Ha : θ > θ0 (right-tailed)

where the value θ0 of θ, which separates H0 from Ha, is called the null value of θ, and is included in the statement of the null hypothesis H0. Note that statisticians prefer to write H0 : θ = θ0 in place of H0 : θ ≥ θ0 or H0 : θ ≤ θ0 since, as we will see later, if Ha given in (4) or (5) is favored over H0 : θ = θ0, then it is also favored over H0 : θ ≥ θ0 or H0 : θ ≤ θ0. So the first three statements of hypotheses are used in practice.
Scientific research often involves trying to decide whether a current theory should be replaced by
a more plausible and satisfactory explanation of the phenomenon under investigation. A conservative
approach is to identify the current theory with H0 and the researcher’s alternative explanation with Ha .
Rejection of the current theory will then occur only when evidence is much more consistent with the new
theory. That is why, in many situations, Ha is referred to as the “researcher’s hypothesis”, since it is
the claim that the researcher would really like to validate. The word null means “of no value, effect, or
consequence”, which suggests that H0 should be identified with the hypothesis of no change (from current
opinion), no difference, no improvement, and so on.
The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence suggests that H0 is false. If the sample does not strongly contradict H0, we cannot reject H0. The two possible conclusions from a hypothesis-testing analysis are therefore: reject H0, or fail to reject H0. A researcher or experimenter who puts in a lot of effort, money, etc. to validate the hypothesis Ha would not like to accept H0; rather, they would comment that the existing research evidence is not enough to reject H0.
5.4.1 Hypothesis testing procedure
In a hypothesis testing problem, we are given a prevailing or existing statement (known as the null hypothesis H0) about a population parameter. By setting up an alternative hypothesis Ha (complementary to H0) and employing a suitable statistical method or test, we check the correctness of H0 with certain confidence or probability. The following example illustrates the idea and procedure of hypothesis testing.
Suppose we are given a normal population with mean µ0 and variance σ0². But we suspect that the actual mean of the population may differ from µ0. To test the prevailing or null hypothesis H0 : µ = µ0 about the mean of the population, we set up the alternative hypothesis Ha : µ ≠ µ0. So the hypothesis statement is given by

H0 : µ = µ0
Ha : µ ≠ µ0
Now suppose, for the hypothesis testing purpose, we are given a random sample of size n from the normal
population N (µ0 , σ02 ). Let X̄ be mean of the random sample, and x̄ be its observed value. We will use
random sample mean X̄ and its observed value x̄ for testing the given hypothesis about the population
mean.
We know that the sampling distribution of the mean from the given normal population is also normal, that is, X̄ ∼ N(µ0, σ0²/n) if H0 is true. It follows that

$$P\left[-z_{\alpha/2} \le \frac{\bar{X} - \mu_0}{\sigma_0/\sqrt{n}} \le z_{\alpha/2}\right] = 1 - \alpha,$$

that is,

$$P\left[\mu_0 - z_{\alpha/2}\,\sigma_0/\sqrt{n} \le \bar{X} \le \mu_0 + z_{\alpha/2}\,\sigma_0/\sqrt{n}\right] = 1 - \alpha.$$

In other words, there are 100(1 − α)% chances for the observed value x̄ of X̄ to lie in the above interval. Notice that this interval becomes shorter and shorter in length as we choose the sample size n larger and larger. This increases the possibility of x̄ being very close (or equal) to µ0 in case H0 : µ = µ0 is true! This shows that X̄ is the right choice of test statistic when we test a hypothesis regarding the population mean.
Now let us see how we design a test for hypothesis testing on the basis of the given or observed sample information. First we choose a fixed value of α, known as the level of significance (los) of the test. After fixing the los, we have two ways of hypothesis testing: (i) the critical region test and (ii) the P-value test. In the following, we describe both approaches.

Critical region test

Suppose we fix α = 0.01. Then the 99% CI of X̄ under H0 is [L₁, L₂] = [µ0 − z0.005 σ0/√n, µ0 + z0.005 σ0/√n]. In practical terms, the mean x̄ of a good and fair sample would lie in this interval [L₁, L₂]. Indeed, we can expect this in case µ0 is the true population mean, that is, H0 : µ = µ0 is true. So let us decide that we will reject H0 only when the observed value x̄ of X̄ lies outside this 99% CI [L₁, L₂]. This is a good test because there are 99% chances that the observed value x̄ of X̄ lies in the interval [L₁, L₂], and hence only 1% chances that it lies outside this interval. Indeed, we can suspect the truth of H0 in case x̄ lies outside the 99% CI, and therefore accept Ha. The range of x̄ outside the 99% CI [L₁, L₂] is the critical region or rejection region of H0, or the acceptance region of Ha. So if the observed value x̄ of X̄ lies outside this interval [L₁, L₂], we accept Ha and reject H0 at the given los α = 0.01.
The end points L₁ and L₂ of the 99% CI are the critical values (x̄_cr) of the test statistic X̄. Obviously, the probability of getting x̄ beyond the critical values, that is, in the critical region, is α:

P[X̄ < L₁ or X̄ > L₂ : X̄ ∼ N(µ0, σ0²/n)] = α.

Notice that the observed value x̄ of X̄ may lie in the acceptance region of Ha, X̄ < L₁ or X̄ > L₂, even when H0 is true. Of course, there are only 1% chances of this situation. It means there can be an error of 1% (a very low risk) in our decision of accepting Ha when actually H0 is true. This is the Type I error, and it is under our control as it corresponds to α = 0.01, which we preset on our own. This α value is the level of significance (los) of our test. So accepting Ha at 1% los means we are 99% confident in rejecting H0.
P-value test

In the critical region test, we need to find the critical region explicitly. Alternatively, we can find the probability of X̄ being at least as far from µ0 as its observed value x̄, that is,

P-value = P[|X̄ − µ0| ≥ |x̄ − µ0|].

This probability is called the P-value of our test. If the P-value is less than α, then obviously either x̄ < L₁ or x̄ > L₂. That means x̄ lies in the critical region, and therefore we accept Ha in this case.
Type II error

In any case (critical region test or P-value test), it is very important to note that we do not accept H0 when our test does not allow us to accept Ha. Accepting H0 could be a very big risk when it is not true. The probability of this error, called the Type II error, is denoted by β. Suppose the true value of the population mean is µ1 instead of µ0. Then the probability of x̄ lying in the interval [µ0 − z0.005 σ0/√n, µ0 + z0.005 σ0/√n] when X̄ ∼ N(µ1, σ0²/n) gives the value of β. So

$$\beta = P(\mu_0 - z_{0.005}\,\sigma_0/\sqrt{n} \le \bar{X} \le \mu_0 + z_{0.005}\,\sigma_0/\sqrt{n}) : \bar{X} \sim N(\mu_1, \sigma_0^2/n).$$

It implies that

$$\beta = P\left(\frac{\mu_0 - z_{0.005}\,\sigma_0/\sqrt{n} - \mu_1}{\sigma_0/\sqrt{n}} \le \frac{\bar{X} - \mu_1}{\sigma_0/\sqrt{n}} \le \frac{\mu_0 + z_{0.005}\,\sigma_0/\sqrt{n} - \mu_1}{\sigma_0/\sqrt{n}}\right).$$

Notice that for converting X̄ to the standard normal variable, we have used the true population mean µ1.
Right tailed test: Next, suppose the hypotheses are

H0 : µ ≤ µ0
Ha : µ > µ0
Then, we need to consider 99% upper confidence bound (in case α = 0.01) of X̄, which is given by
√ √
U = µ0 + zα σ0 / n = µ0 + z0.01 σ0 / n.
Clearly, now the critical value is U and the critical region is X̄ > U , which lies in the right tail of the
distribution of X̄. So, if the observed value x̄ of X̄ lies in the critical region, that is, x̄ > U , then we
accept Ha at 1% los.
In this case, the P-value is given by

P-value = P[X̄ ≥ x̄ : X̄ ∼ N(µ0, σ0²/n)].

If the P-value is less than α, then obviously x̄ > U. That means x̄ lies in the critical region, and therefore we accept Ha in this case.

The Type I and Type II errors, in the case of the right tailed test, are given by

$$\alpha = P(\bar{X} > U), \qquad \beta = P\left(\frac{\bar{X} - \mu_1}{\sigma_0/\sqrt{n}} \le \frac{\mu_0 + z_{0.01}\,\sigma_0/\sqrt{n} - \mu_1}{\sigma_0/\sqrt{n}}\right),$$

where X̄ ∼ N(µ1, σ0²/n) in the computation of β.
Left tailed test: Similarly, suppose the hypotheses are

H0 : µ ≥ µ0
Ha : µ < µ0
Then, we need to consider 99% lower confidence bound (in case α = 0.01) of X̄, which is given by
√ √
L = µ0 − zα σ0 / n = µ0 − z0.01 σ0 / n.
Clearly, now the critical value is L and the critical region is X̄ < L, which lies in the left tail of the
distribution of X̄. So, if the observed value x̄ of X̄ lies in the critical region, that is, x̄ < L, then we
accept Ha at 1% los.
In this case, the P-value is given by

P-value = P[X̄ ≤ x̄ : X̄ ∼ N(µ0, σ0²/n)].

If the P-value is less than α, then obviously x̄ < L. That means x̄ lies in the critical region, and so Ha is accepted in this case.

The Type I and Type II errors, in the case of the left tailed test, are given by

$$\alpha = P(\bar{X} < L), \qquad \beta = P\left(\frac{\bar{X} - \mu_1}{\sigma_0/\sqrt{n}} \ge \frac{\mu_0 - z_{0.01}\,\sigma_0/\sqrt{n} - \mu_1}{\sigma_0/\sqrt{n}}\right),$$

where X̄ ∼ N(µ1, σ0²/n) in the computation of β.
5.4.2 List of some useful test statistics (two tailed test)

1. If the hypothesis is about the population mean with null value µ0, then we use the sample mean X̄ as the test statistic.

(i) If the random sample of size n is chosen from a normal population with mean µ0 and variance σ0², or from a non-normal population with mean µ0, variance σ0² and n ≥ 30, then X̄ ∼ N(µ0, σ0²/n), and the 100(1 − α)% CI of X̄ is

[L₁, L₂] = [µ0 − zα/2 σ0/√n, µ0 + zα/2 σ0/√n].

(ii) If the random sample of size n ≥ 30 is chosen from a normal or non-normal population with mean µ0 and unknown variance, then X̄ ∼ N(µ0, s0²/n) approximately, and the 100(1 − α)% CI of X̄ is

[L₁, L₂] = [µ0 − zα/2 s0/√n, µ0 + zα/2 s0/√n],

where s0 is the observed value of the sample S.D. S from the given or chosen sample.

(iii) If the random sample of size n < 30 is chosen from a normal population with mean µ0, then (X̄ − µ0)/(s0/√n) ∼ T_{n−1}, and the 100(1 − α)% CI of X̄ is

[L₁, L₂] = [µ0 − tα/2 s0/√n, µ0 + tα/2 s0/√n],

where s0 is the observed value of the sample S.D. S from the given or chosen sample.
2. If the hypothesis is about the difference of the means of two populations, then we use the difference of the random sample means X̄₁ − X̄₂ as the test statistic. In case two random samples of sizes n₁ and n₂ are selected from two normal populations with distributions N(µ₁, σ₁²) and N(µ₂, σ₂²), then X̄₁ − X̄₂ ∼ N(µ₁ − µ₂, σ₁²/n₁ + σ₂²/n₂), and the 100(1 − α)% CI of X̄₁ − X̄₂ is

[L₁, L₂] = [µ₁ − µ₂ − zα/2 √(σ₁²/n₁ + σ₂²/n₂), µ₁ − µ₂ + zα/2 √(σ₁²/n₁ + σ₂²/n₂)].
3. If the hypothesis is about the population proportion, then we use the random sample proportion P̂ as the test statistic. In case the random sample of size n is chosen from a Binomial population with proportion p such that np ≥ 10 and np(1 − p) ≥ 10, then P̂ ∼ N(p, p(1 − p)/n), and the 100(1 − α)% CI of P̂ is

[L₁, L₂] = [p − zα/2 √(p(1 − p)/n), p + zα/2 √(p(1 − p)/n)].
4. If the hypothesis is about the difference of proportions of two populations, then we use the difference of the random sample proportions P̂₁ − P̂₂ as the test statistic. In case two random samples of sizes n₁ and n₂ are selected from two Binomial populations with proportions p₁ and p₂ such that n₁p₁ ≥ 10, n₁p₁(1 − p₁) ≥ 10 and n₂p₂ ≥ 10, n₂p₂(1 − p₂) ≥ 10, then P̂₁ − P̂₂ ∼ N(p₁ − p₂, p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂), and the 100(1 − α)% CI of P̂₁ − P̂₂ is

[L₁, L₂] = [p₁ − p₂ − zα/2 √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂), p₁ − p₂ + zα/2 √(p₁(1 − p₁)/n₁ + p₂(1 − p₂)/n₂)].
5. If the hypothesis is about the population variance, then we use the random sample variance S² as the test statistic. In case the random sample of size n is chosen from a normal population with variance σ0², then (n − 1)S²/σ0² follows the χ² distribution with n − 1 dof, and the 100(1 − α)% CI of S² is

[L₁, L₂] = [σ0² χ²_{1−α/2}/(n − 1), σ0² χ²_{α/2}/(n − 1)].

It is straightforward to write the one-sided upper or lower confidence bounds for the above statistics, in case we need to apply a right or left tailed test.
Ex. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature is 130°F. A sample of n = 9 systems, when tested, yields a sample average activation temperature of 131.08°F. If the distribution of activation times is normal with standard deviation 1.5°F, do the data contradict the manufacturer's claim at significance level α = 0.01? Also, calculate the Type II error in case the true average system-activation temperature is 131°F.

Sol. The hypotheses are

H0 : µ = 130
Ha : µ ≠ 130

Here the population is normal with µ0 = 130, σ0 = 1.5 and n = 9, so X̄ ∼ N(130, (1.5)²/9) under H0, and we apply the two tailed test.

Critical region test: Since α = 0.01, we find the 99% CI of X̄, which reads

µ0 ± z0.005 σ0/√n = 130 ± 2.58(1.5/√9) = [128.71, 131.29].

The observed value 131.08 of X̄ lies inside this interval. So we are unable to reject H0 at 0.01 los. Hence, the data do not contradict the manufacturer's claim at significance level α = 0.01.
P-value test: Here

P-value = 2P[X̄ ≥ 131.08] = 2P[Z ≥ (131.08 − 130)/(1.5/√9)] = 2P[Z ≥ 2.16] = 2(1 − P[Z ≤ 2.16]) = 2(1 − 0.9846) = 0.0308.

It is greater than α = 0.01. So H0 cannot be rejected at significance level 0.01. Hence, the data do not give strong support to the claim that the true average differs from the design value of 130.

Type II error: If the true mean is µ1 = 131, then, with X̄ ∼ N(131, (0.5)²),

β = P(128.71 ≤ X̄ ≤ 131.29) = P((128.71 − 131)/0.5 ≤ Z ≤ (131.29 − 131)/0.5) = P(−4.58 ≤ Z ≤ 0.58) = 0.7190 − 0.0000 = 0.7190.
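The whole test, including β, can be verified numerically (an illustrative sketch using scipy.stats):

```python
from math import sqrt
from scipy.stats import norm

mu0, sigma, n = 130, 1.5, 9
se = sigma / sqrt(n)                           # 0.5
xbar = 131.08

# Two-tailed P-value under H0: mu = 130
z = (xbar - mu0) / se
p_value = 2 * (1 - norm.cdf(z))
print(round(p_value, 4))                       # ~0.0308

# Type II error at true mean mu1 = 131 with alpha = 0.01
z005 = norm.ppf(0.995)                         # ~2.576
L1, L2 = mu0 - z005 * se, mu0 + z005 * se      # acceptance region for xbar
beta = norm.cdf(L2, loc=131, scale=se) - norm.cdf(L1, loc=131, scale=se)
print(round(beta, 4))                          # ~0.718 (0.7190 with table value 2.58)
```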
Ex. A new engine design with more aluminium is claimed to increase the mean petrol mileage of a car model beyond the existing mean of 26 mpg (normal population with S.D. 5 mpg). A random sample of 36 cars with the new design gives a mean mileage of 28.04 mpg. Test this claim at the 5% level of significance.

Sol. Since we wish to support an increase in the mean, the hypotheses are

H0 : µ ≤ 26
Ha : µ > 26
It is about the population mean. So the appropriate test statistic is the sample mean X̄. The given population is normal with mean µ0 = 26 and S.D. σ0 = 5. Also, the sample size is n = 36. So X̄ ∼ N(µ0, σ0²/n) = N(26, (5)²/36). The given los is α = 0.05. Note that here we need to apply the right tailed test.

Critical region test: Since α = 0.05, we find the 95% upper confidence bound of X̄, which reads

µ0 + z0.05 σ0/√n = 26 + 1.65(5/√36) = 27.37.

The observed value 28.04 of X̄ is greater than 27.37, and hence lies in the critical region. So we reject H0 at 0.05 los. Hence, the sample data support the hypothesis of an increase in the mean petrol mileage at the 5% level of significance.

P-value test: Here

P-value = P[X̄ ≥ 28.04] = P[Z ≥ (28.04 − 26)/(5/√36)] = P[Z ≥ 2.45] = 1 − P[Z ≤ 2.45] = 1 − 0.9929 = 0.0071.

It is less than α = 0.05. So H0 can be rejected at significance level 0.05. Hence, the sample data support the hypothesis of an increase in the mean petrol mileage at the 5% level of significance.
Interpretation of the P-value: There are two explanations of this very small probability 0.0071. First,
the null hypothesis H0 is true and we have observed a very rare sample that by chance has a large mean.
Second, the new design with more aluminium has, in fact, resulted in a higher mean petrol mileage. We
prefer the second explanation as it supports our research hypothesis Ha . That is, we shall reject H0 and
report that P-value of our test is 0.0071. Also, the prestated level of significance is α = 0.05. We can
safely reject H0 at this level of significance since the P-value 0.0071 of our test is less than α = 0.05.
Ex. The maximum acceptable level for exposure to microwave radiation in Mumbai is an average of 10
microwatts per square centimeter (assume normality). It is feared that a large television transmitter may
be polluting the air nearby by pushing the level of microwave radiation above the safe limit. A sample of
25 readings gives an average of 10.3 microwatts per square centimeter with variance 4. Test the hypothesis
whether the transmitter pushes the level of radiation beyond the safe limit at α = 0.1. Also, verify your
decision with the P-value.
Sol. The hypotheses are

H0 : µ ≤ 10
Ha : µ > 10 (unsafe).

It is about the population mean. So the appropriate test statistic is the sample mean X̄. The given population is normal with mean µ0 = 10. The sample size is n = 25, which is less than 30. The sample S.D. is s0 = 2, and the S.D. of the population is unknown. So (X̄ − µ0)/(s0/√n) ∼ T_{n−1}, that is, (X̄ − 10)/(2/√25) ∼ T₂₄. The given los is α = 0.1. Note that here we need to apply the right tailed test.
Critical region test: Since α = 0.1, we find the 90% upper confidence bound of X̄, which reads

µ0 + t0.1,24 s0/√n = 10 + 1.318(2/√25) = 10.527.

The observed value 10.3 of X̄ is less than 10.527, and hence does not lie in the critical region. So we are unable to reject H0 at 0.1 los, and conclude that the observed data do not support the contention that the transmitter is forcing the average microwave level above the safe limit.

P-value test: Here

P-value = P[X̄ ≥ 10.3] = P[T₂₄ ≥ (10.3 − 10)/(2/√25)] = P[T₂₄ ≥ 0.75] = 0.23.

It is greater than α = 0.1. So H0 can not be rejected at significance level 0.1. Hence, the observed data do not support the contention that the transmitter is forcing the average microwave level above the safe limit.
Note: Here, the P-value P[T₂₄ ≥ 0.75] = 0.23 is calculated by using the online calculator SticiGui. This value is not directly available in the T-distribution table in the textbook, but you can still get an idea of the P-value from the two adjacent table values. For, we see P[T₂₄ > 0.685] = 1 − P[T₂₄ ≤ 0.685] = 1 − 0.75 = 0.25. Also, P[T₂₄ > 1.318] = 1 − P[T₂₄ ≤ 1.318] = 1 − 0.9 = 0.1. That means the P-value P[T₂₄ ≥ 0.75] lies between 0.1 and 0.25. So the P-value is greater than α = 0.1, and therefore H0 can not be rejected.
Ex. Transport authorities believe that night accidents happen on a particular highway due to improper
reflective signs put by the engineers. On the other hand, highway engineers claim that the reflective
highway signs do not perform properly because more than 30% of the vehicles on the road have misaimed
headlights. If this contention is supported statistically, a tougher inspection program will be put into
operation by the transport authorities. For this purpose, the transport authorities randomly selected
100 vehicles and found 40 vehicles with misaimed headlights. Test whether this sample data support the
engineers’ claim at the 5% level of significance.
Sol. Let p denote the proportion of vehicles with misaimed headlights. Since the engineers wish to support p > 0.3, the null hypothesis H0 and the research hypothesis Ha are

H0 : p ≤ 0.3
Ha : p > 0.3

It is about the population proportion. So the appropriate test statistic is the sample proportion P̂. The given population is Binomial with null value p = 0.3. The sample size is n = 100 such that np = 30 > 10 and np(1 − p) = 21 > 10. So P̂ ∼ N(p, p(1 − p)/n) = N(0.3, 0.0021) under H0. The given los is α = 0.05. Note that here we need to apply the right tailed test.
Critical region test: Since α = 0.05, we find the 95% upper confidence bound of P̂, which reads

p + z0.05 √(p(1 − p)/n) = 0.3 + 1.65√0.0021 = 0.376.

The observed sample proportion 40/100 = 0.4 is greater than 0.376, and hence lies in the critical region. So we reject H0 at 0.05 los. Thus, the sample data support the engineers' claim at the 5% level of significance.

P-value test: Here

P-value = P[P̂ ≥ 0.4] = P[Z ≥ (0.4 − 0.3)/√0.0021] = P[Z ≥ 2.18] = 0.015.

It is less than α = 0.05. So H0 can be rejected at significance level 0.05. Hence, the sample data support the engineers' claim at the 5% level of significance.
Note: In the above example, the conditions np ≥ 10 and np(1 − p) ≥ 10 are satisfied. In case np < 10, the normal approximation to the Binomial population is not applicable, but we can proceed as explained in the following example.
Ex. Transport authorities believe that night accidents happen on a particular highway due to improper
reflective signs put by the engineers. On the other hand, highway engineers claim that the reflective
highway signs do not perform properly because more than 30% of the vehicles on the road have misaimed
headlights. If this contention is supported statistically, a tougher inspection program will be put into
operation by the transport authorities. For this purpose, the transport authorities randomly selected
15 vehicles and found 9 vehicles with misaimed headlights. Test whether this sample data support the
engineers’ claim at the 5% level of significance. Describe type-II errors of the test.
Sol. Let p denote the proportion of vehicles with misaimed headlights. Since the engineers wish to support p > 0.3, the null hypothesis H0 and the research hypothesis Ha are

H0 : p ≤ 0.3
Ha : p > 0.3

Let X be the random variable denoting the number of vehicles with misaimed headlights. Then, in a sample of 15 vehicles, X can take the values 0, 1, 2, ..., 15. Obviously, X is a binomial random variable with n = 15 and p = 0.3 under the null value, that is, X ∼ B(n = 15, p = 0.3). So X is our test statistic, and its observed value is given as 9. Now we need to find the critical region of X at the 5% level of significance given that X ∼ B(n = 15, p = 0.3). From the binomial distribution cdf table, we find

P[X ≥ 8 : X ∼ B(n = 15, p = 0.3)] = 1 − P[X ≤ 7 : X ∼ B(n = 15, p = 0.3)] = 1 − 0.95 = 0.05.

This shows that the critical region C for the test statistic X at the 5% level of significance is

C = {8, 9, 10, 11, 12, 13, 14, 15}.

Since the observed value 9 of X lies in this critical region, we accept Ha; that is, the sample data support the engineers' claim at the 5% level of significance.
Also, we find
P-value = P [X ≥ 9 : X ∼ B(n = 15, p = 0.3)]
= 1 − P [X ≤ 8 : X ∼ B(n = 15, p = 0.3)] = 1 − 0.9848 = 0.0152,
which is less than α = 0.05. Thus, the sample data support the engineers’ claim at the 5% level of
significance.
Calculating Type-II error: In the above test, we have set α = 0.05. It implies that 5% of the experiments consisting of inspections of 15 vehicles would result in an incorrect rejection of H0 when it is true; that is, the probability of committing a Type I error is α = 0.05.
In contrast to α, there is not a single β. Instead, there is a different β for each different p that exceeds 0.3. Thus there is a value of β for p = 0.4 (in which case X ∼ B(15, 0.4)), another value of β for p = 0.45, and so on. So to calculate the Type II error probability, we need to specify a value of p. For example,

β(p = 0.4) = P[Type II error when p = 0.4]
= P[H0 not rejected when it is false because p = 0.4]
= P[X ≤ 7 : X ∼ B(n = 15, p = 0.4)] = 0.7869 ≈ 0.79.

It means that when p is actually 0.4 rather than 0.3 (a “small” departure from H0), roughly 79% of all experiments of this type would result in H0 being incorrectly not rejected.
Also, notice that β is large because it pertains to the complementary region of the small region of α. So accepting H0 is a big risk when it is not true. That is why we prefer to use the phrase “fail to reject H0” rather than “accept H0” when the sample evidence is not enough to accept Ha. In simple words, rejecting H0 when it is true is a small preset or known risk α, but accepting H0 when it is not true could be a very big unknown risk β.
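These binomial computations are easy to reproduce (an illustrative sketch using scipy.stats.binom):

```python
from scipy.stats import binom

n, p0, x_obs = 15, 0.3, 9

# P(X >= 8 | p = 0.3) ~ 0.05, so the critical region is {8, ..., 15}
print(1 - binom.cdf(7, n, p0))

p_value = 1 - binom.cdf(x_obs - 1, n, p0)   # P(X >= 9 | p = 0.3)
print(round(p_value, 4))                    # ~0.0152

beta = binom.cdf(7, n, 0.4)                 # P(X <= 7 | p = 0.4): Type II error at p = 0.4
print(round(beta, 4))                       # ~0.7869
```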
5.4.3 More about P-value
Suppose we want to test

H0 : p ≤ 0.1
Ha : p > 0.1

based on a sample of size 20. Let the test statistic be X, the number of successes observed in 20 trials. If p = 0.1, the null value of p, then X follows a binomial distribution with mean E[X] = 20(0.1) = 2. So values of X somewhat greater than 2 will lead to the rejection of the null hypothesis. Suppose we want α to be very small, say 0.0001. From the binomial probability distribution table, we have

P[X ≥ 9 : X ∼ B(n = 20, p = 0.1)] = 1 − P[X ≤ 8 : X ∼ B(n = 20, p = 0.1)] ≈ 0.0001.

So the critical region of the test is C = {9, 10, ..., 20}. Now suppose we conduct the test and observe 8 successes. It does not fall into C. So, via our rigid rule of hypothesis testing, we are unable to reject H0. However, a little thought should make us a bit uneasy with this decision. We find

P[X ≥ 8 : X ∼ B(n = 20, p = 0.1)] = 1 − P[X ≤ 7 : X ∼ B(n = 20, p = 0.1)] ≈ 0.0004.

It means we are willing to tolerate 1 chance in 10000 of making a Type I error, but we shall declare 4 chances in 10000 of making such an error too large to risk. There is so little difference between these probabilities that it seems a bit silly to insist on our original cut-off value 9.
Such a problem can be avoided by adopting a technique known as significance testing, where we do not preset α and hence do not specify a rigid critical region. Rather, we evaluate the test statistic and then determine the probability of observing a value of the test statistic at least as extreme as the value noted, under the assumption θ = θ0. This probability is known as the critical level, the descriptive level of significance, or the P-value of the test. We reject H0 if we consider this P-value to be small. In case an α level has been preset to ensure that a traditional or industry maximum acceptable level is met, we compare the P-value with the preset α value. If P-value ≤ α, then we can reject the null hypothesis at least at the stated level of significance.
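For completeness, the two tail probabilities discussed above can be checked in one line each (an illustrative sketch using scipy.stats.binom):

```python
from scipy.stats import binom

# Tail probabilities for X ~ B(20, 0.1) near the cut-off
print(1 - binom.cdf(8, 20, 0.1))   # P(X >= 9) ~ 0.0001
print(1 - binom.cdf(7, 20, 0.1))   # P(X >= 8) ~ 0.0004
```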