
Contents

5 Statistical Inference
  5.1 Population and random sample
    5.1.1 Sample Mean
    5.1.2 Sample Variance
    5.1.3 Sampling distributions
    5.1.4 Sampling distribution of mean X̄
    5.1.5 Sampling distribution of a statistic involving S²
    5.1.6 Sampling distribution of a statistic involving X̄ (small sample)
    5.1.7 Sampling distribution of the difference between two sample means
    5.1.8 Sampling distribution of sample proportion
    5.1.9 Sampling distribution of difference of sample proportions
    5.1.10 Summary
  5.2 Point Estimation
    5.2.1 Unbiased point estimator
    5.2.2 Maximum likelihood (ML) method
    5.2.3 Method of moments
  5.3 Confidence interval (CI)
    5.3.1 CI of mean of normal population with known variance
    5.3.2 CI of mean of normal population with unknown variance
    5.3.3 CI of variance of normal distribution with unknown mean
    5.3.4 CI of difference of population means
    5.3.5 CI of population proportion
    5.3.6 CI of difference of population proportions
    5.3.7 One-sided confidence bounds
    5.3.8 Summary
  5.4 Hypotheses testing
    5.4.1 Hypothesis testing procedure
    5.4.2 List of some useful test statistics (two tailed test)
    5.4.3 More about P-value

Chapter 5

Statistical Inference

Note: These lecture notes aim to give a clear and crisp presentation of some topics in Probability and Statistics. Comments and suggestions are welcome via e-mail: [email protected] (Dr. Suresh Kumar).

5.1 Population and random sample


In statistics, a population refers to the collection of all objects, individuals, measurements, or observations under study. The number of observations in the population is called the size of the population. Any subset of the population is called a sample.

A population may also be of infinite size. For example, all measurements of the depth of a lake, taken from every conceivable position, constitute an infinite population. In statistical studies, a finite but very large population may also be treated as an infinite population.

We often need to determine or estimate the parameters (such as the mean, variance, etc.) of an infinite or large population. Indeed, for practical reasons such as cost and time, we cannot examine every object of such a population. For example, suppose the Government of India has to decide the subsidy on fertilizers based on the average income of farmers in India over the previous five years. Because the population of farmers in India is so large, it is impractical to collect income information from every farmer. In such situations, random samples are collected from the population to infer the population parameter or property of interest.

Random sample
Suppose we are interested in some particular property of a given population, for example, the average height of a population of students. Suppose the property of interest in each member M of the population is quantified by a random variable X that follows some distribution, say X ∼ D(µ, σ²) with mean µ and variance σ². Next, let us randomly select n members, say M1, M2, ..., Mn, from the population such that the selections are independent of each other. Let the random variables X1, X2, ..., Xn describe the property of interest for the randomly chosen members M1, M2, ..., Mn, respectively. Then X1, X2, ..., Xn are n independent random variables, each following the same distribution as X, that is, Xi ∼ D(µ, σ²). Why? It requires a bit of thought: X1, X2, ..., Xn are random variables because they are defined for the randomly chosen members M1, M2, ..., Mn. Further, each follows the same distribution as X because each Xi describes the same property for Mi as X does for M. Finally, the definition of a random sample reads as follows:

A random sample of size n from a population distribution (X, f(X)) with mean µ and variance σ² is a collection of n independent random variables X1, X2, ..., Xn, each having the same probability distribution as X. Therefore, E[Xi] = E[X] = µ and V[Xi] = V[X] = σ² for i = 1, 2, ..., n.

The conditions on a random sample can be paraphrased by saying that the Xi's are independent and identically distributed (iid). If sampling is either with replacement or from an infinite (conceptual) population, the conditions are satisfied exactly. They are approximately satisfied if sampling is without replacement but the sample size n is much smaller than the population size N. In practice, if n/N ≤ 0.05 (at most 5% of the population is sampled), we can proceed as if the Xi's form a random sample.

Before giving an example of a random sample, we introduce two important sample-based measures, namely the sample mean and the sample variance.

5.1.1 Sample Mean


Let X1, X2, ..., Xn be a random sample from the distribution of X. Then the random variable

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$

defines the mean of the sample X1, X2, ..., Xn.


Since E[X] = µ = E[Xi] and V[X] = σ² = V[Xi], it follows that

$$E[\bar{X}] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{1}{n}\,n\mu = \mu,$$

$$V[\bar{X}] = \frac{1}{n^2}\sum_{i=1}^{n} V[X_i] = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{1}{n^2}\,n\sigma^2 = \frac{\sigma^2}{n}.$$

Thus, the sample mean X̄ follows a distribution with mean µ and variance σ²/n.
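These two facts are easy to check numerically with a small simulation. A minimal sketch (assuming NumPy is available; the choice of an exponential population here is ours, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 25, 100_000
mu, sigma2 = 2.0, 4.0          # exponential with mean 2 has variance 4

# reps samples of size n; each row is one random sample X1, ..., Xn
samples = rng.exponential(scale=mu, size=(reps, n))
xbar = samples.mean(axis=1)    # one value of the statistic X̄ per sample

print(xbar.mean())             # ≈ mu (= 2.0)
print(xbar.var())              # ≈ sigma2 / n (= 0.16)
```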

5.1.2 Sample Variance


The random variable

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$$

is called the sample variance, and S = √(S²) is called the sample standard deviation.

The sample variance can also be rewritten as

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i^2 + \bar{X}^2 - 2X_i\bar{X}) = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right).$$

$$\therefore\; E[S^2] = \frac{1}{n-1}\left(\sum_{i=1}^{n} E[X_i^2] - nE[\bar{X}^2]\right) = \frac{1}{n-1}\left(\sum_{i=1}^{n}(V[X_i] + E[X_i]^2) - n(V[\bar{X}] + E[\bar{X}]^2)\right)$$
$$= \frac{1}{n-1}\left(\sum_{i=1}^{n}(\sigma^2 + \mu^2) - n(\sigma^2/n + \mu^2)\right) = \frac{1}{n-1}\left[n(\sigma^2 + \mu^2) - (\sigma^2 + n\mu^2)\right] = \sigma^2.$$

Further, it can be shown that

$$V(S^2) = E[(S^2)^2] - E[S^2]^2 = \frac{1}{n^2}\sum_{i=1}^{n} E(X_i-\mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4 = \frac{1}{n}E(X-\mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4,$$

since E(Xi − µ)⁴ = E(X − µ)⁴. Thus, the sample variance S² follows a distribution with mean σ² (the population variance) and variance V(S²) = (1/n)E(X − µ)⁴ − ((n − 3)/(n(n − 1)))σ⁴.
Statistics and Parameters: Note that sample-based statistical measures such as X̄, S², etc. are called statistics, while population-based measures such as µ, σ², etc. are called parameters.

Illustrative example from a discrete distribution


A certain brand of MP3 player comes in three models I, II, III with prices $80, $100, and $120, respectively. If 20% of all purchasers choose model I, 30% choose model II, and 50% choose model III, then the probability distribution of the cost X of a single randomly selected MP3 player purchase is given by

X = x              80    100   120
f(x) = P[X = x]    0.2   0.3   0.5

The mean and variance of X are µ_X = E[X] = 106 and σ²_X = E[X²] − E[X]² = 244.

Suppose that on a particular day only two MP3 players are sold. Let X1 be the revenue from the first sale and X2 the revenue from the second, and assume the two sales are independent of each other. Then (X1, X2) is a random sample of size 2 from the given population of three types of MP3 players. Note that, prior to the sale, X1 is a random variable since it could take any of the three values of X, and hence follows the same distribution as the population variable X. Similarly, X2 also follows the distribution of X. So E[X] = E[X1] = E[X2] = 106 and V[X] = V[X1] = V[X2] = 244. The following table lists all possible (x1, x2) pairs, the probability P[X1 = x1, X2 = x2] of each pair, and the resulting x̄ and s² values. Note that when n = 2, s² = (x1 − x̄)² + (x2 − x̄)².

x1    x2    P[X1 = x1, X2 = x2]    x̄ = (x1 + x2)/2    s² = (x1 − x̄)² + (x2 − x̄)²

80 80 (0.2)(0.2) = 0.04 80 0
80 100 (0.2)(0.3) = 0.06 90 200
80 120 (0.2)(0.5) = 0.10 100 800
100 80 (0.3)(0.2) = 0.06 90 200
100 100 (0.3)(0.3) = 0.09 100 0
100 120 (0.3)(0.5) = 0.15 110 200
120 80 (0.5)(0.2) = 0.10 100 800
120 100 (0.5)(0.3) = 0.15 110 200
120 120 (0.5)(0.5) = 0.25 120 0

The sampling distribution of X̄ is given as follows:

X̄ = x̄      80     90                   100                          110                  120
P[X̄ = x̄]   0.04   0.06 + 0.06 = 0.12   0.10 + 0.09 + 0.10 = 0.29    0.15 + 0.15 = 0.30   0.25

We find µ_X̄ = E[X̄] = 106 and σ²_X̄ = E[X̄²] − E[X̄]² = 244/2 = σ²/2.

The sampling distribution of S² is given as follows:

S² = s²      0                            200                                 800
P[S² = s²]   0.04 + 0.09 + 0.25 = 0.38    0.06 + 0.06 + 0.15 + 0.15 = 0.42    0.10 + 0.10 = 0.20

We find µ_S² = E[S²] = (0)(0.38) + (200)(0.42) + (800)(0.20) = 244 = σ².


Thus, the X̄ sampling distribution is centered at the population mean µ, and the S² sampling distribution is centered at the population variance σ².

Further, we can verify

$$V(S^2) = E[(S^2)^2] - E[S^2]^2 = 85264.$$

Alternatively, for n = 2, we find

$$V(S^2) = \frac{1}{n}E(X-\mu)^4 - \frac{(n-3)}{n(n-1)}\sigma^4 = 85264.$$
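Because the sample space here is tiny, the whole sampling distribution can be enumerated directly. A minimal sketch (plain Python, no external dependencies; the variable names are ours) that reproduces E[X̄] = 106, E[S²] = 244, and V(S²) = 85264:

```python
from itertools import product

prices = {80: 0.2, 100: 0.3, 120: 0.5}

# enumerate all (x1, x2) pairs with their joint probabilities
E_xbar = E_s2 = E_s2_sq = 0.0
for (x1, p1), (x2, p2) in product(prices.items(), repeat=2):
    p = p1 * p2
    xbar = (x1 + x2) / 2
    s2 = (x1 - xbar) ** 2 + (x2 - xbar) ** 2   # n = 2, so divide by n - 1 = 1
    E_xbar += p * xbar
    E_s2 += p * s2
    E_s2_sq += p * s2 ** 2

print(E_xbar)               # 106.0   = population mean
print(E_s2)                 # 244.0   = population variance
print(E_s2_sq - E_s2 ** 2)  # 85264.0 = V(S^2)
```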

Illustrative example from a continuous distribution


Service time for a certain type of bank transaction is a random variable having an exponential distribution with mean 1/λ. Suppose T1 and T2 are service times for two different customers, assumed independent of each other. Consider the average service time T̄ = (T1 + T2)/2 for the two customers, also a statistic. The cdf of T̄ is, for t ≥ 0,

$$F_{\bar{T}}(t) = P[\bar{T} \le t] = \iint_{t_1+t_2 \le 2t} f(t_1,t_2)\,dt_1\,dt_2 = \int_0^{2t}\!\!\int_0^{2t-t_1} \lambda e^{-\lambda t_1}\,\lambda e^{-\lambda t_2}\,dt_2\,dt_1 = 1 - e^{-2\lambda t} - 2\lambda t e^{-2\lambda t}.$$

The pdf of T̄ is obtained by differentiating F_T̄(t):

$$f_{\bar{T}}(t) = \begin{cases} 4\lambda^2 t e^{-2\lambda t}, & t > 0 \\ 0, & t \le 0 \end{cases}$$

Notice that this is the pdf of a gamma distribution with α = 2 and β = 1/(2λ). Thus, T̄ is a gamma random variable with mean E[T̄] = αβ = 1/λ and variance V[T̄] = αβ² = 1/(2λ²). Also, the mean and variance of the underlying exponential distribution are µ = 1/λ and σ² = 1/λ². So E[T̄] = 1/λ = µ and V[T̄] = 1/(2λ²) = σ²/2, as expected.

Note. The second method of obtaining information about a statistic’s sampling distribution is to perform
a simulation experiment. This method is usually used when a derivation via probability rules is too difficult
or complicated to be carried out. Such an experiment is virtually always done with the aid of a computer.
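As an illustration of this simulation approach, a sketch assuming NumPy/SciPy (the value λ = 0.5 is an arbitrary choice of ours) draws many pairs of exponential service times and compares the simulated distribution of T̄ with the Gamma(α = 2, β = 1/(2λ)) distribution derived above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
lam = 0.5                                   # rate; mean service time 1/lam = 2
t1, t2 = rng.exponential(1 / lam, size=(2, 100_000))
tbar = (t1 + t2) / 2                        # simulated values of the statistic T̄

print(tbar.mean(), 1 / lam)                 # both ≈ 2.0
print(tbar.var(), 1 / (2 * lam**2))         # both ≈ 2.0
# Kolmogorov–Smirnov comparison with Gamma(alpha=2, scale=1/(2*lam))
print(stats.kstest(tbar, stats.gamma(a=2, scale=1 / (2 * lam)).cdf))
```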

5.1.3 Sampling distributions
In this section, we will learn about sampling distributions of some useful statistics. First, let us see some
important results in this direction pertaining to the distributions of linear combinations of independent
random variables.

(i) Uniqueness of the moment generating function (mgf): Recall that the mgf of a random variable X is defined as m_X(t) = E(e^{tX}), and its kth derivative at t = 0 gives the kth moment E(X^k). Moreover, the mgf of a random variable is unique: if two random variables have the same mgf, then they have the same probability distribution, and vice versa. We accept this well-known result from the literature.

(ii) mgf of a linear combination of independent random variables: Let X1 and X2 be independent random variables with mgfs m_{X1}(t) and m_{X2}(t), respectively. Let Y = X1 + X2. Then the mgf of Y is given by

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t).$$

For, we have

$$m_Y(t) = E[e^{tY}] = E[e^{tX_1+tX_2}] = E[e^{tX_1}]E[e^{tX_2}] = m_{X_1}(t)\,m_{X_2}(t),$$

since e^{tX1} and e^{tX2} are independent, as X1 and X2 are independent.

Next, if a1, a2 are constants and Y = a1X1 + a2X2, then it is easy to verify that

$$m_Y(t) = m_{X_1}(a_1 t)\,m_{X_2}(a_2 t).$$

Let X1, X2, ..., Xn be n random variables with means µ1, µ2, ..., µn and variances σ1², σ2², ..., σn², respectively, and let a1, a2, ..., an be n constants. Then the mean and variance of the linear combination Y = a1X1 + a2X2 + ... + anXn are given by

$$E[Y] = a_1E[X_1] + a_2E[X_2] + \cdots + a_nE[X_n] = a_1\mu_1 + a_2\mu_2 + \cdots + a_n\mu_n,$$

$$V[Y] = V[a_1X_1 + a_2X_2 + \cdots + a_nX_n] = \sum_{i=1}^{n}\sum_{j=1}^{n} a_i a_j\,\mathrm{Cov}(X_i, X_j).$$

In case X1, X2, ..., Xn are independent random variables, we have Cov(Xi, Xj) = 0 for i ≠ j and Cov(Xi, Xi) = σi². So we get

$$V[Y] = a_1^2\sigma_1^2 + a_2^2\sigma_2^2 + \cdots + a_n^2\sigma_n^2.$$

Also,

$$m_Y(t) = m_{X_1}(a_1 t)\,m_{X_2}(a_2 t)\cdots m_{X_n}(a_n t) = \prod_{i=1}^{n} m_{X_i}(a_i t).$$

Ex. If X1 and X2 are two independent Poisson random variables with parameters (means) k1 and k2, respectively, then find the distribution of Y = X1 + X2.
Sol. We recall that the mgf of a Poisson random variable with mean k is

$$m_X(t) = e^{k(e^t-1)}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = e^{k_1(e^t-1)}e^{k_2(e^t-1)} = e^{(k_1+k_2)(e^t-1)},$$

which is the mgf of a Poisson random variable with mean k1 + k2. So, by uniqueness of the mgf, Y = X1 + X2 follows a Poisson distribution with mean k1 + k2.

Note that, in general, a linear combination of two Poisson random variables is not a Poisson random variable. For example, if Y = 2X1 + 3X2, then m_Y(t) = e^{k_1(e^{2t}-1)+k_2(e^{3t}-1)}, which is not the mgf of a Poisson random variable.
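A quick numerical sanity check of this closure property (a sketch assuming NumPy/SciPy; the parameter values k1 = 3, k2 = 5 are arbitrary choices of ours):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
k1, k2 = 3.0, 5.0
y = rng.poisson(k1, 200_000) + rng.poisson(k2, 200_000)

print(y.mean(), y.var())        # both ≈ k1 + k2 = 8 (Poisson: mean = variance)
# compare observed frequencies of Y with the Poisson(k1 + k2) pmf
for v in range(5, 12):
    print(v, (y == v).mean().round(4), stats.poisson.pmf(v, k1 + k2).round(4))
```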

Ex. If X1 and X2 are two independent exponential random variables, both with the same parameter (mean) β, then find the distribution of Y = X1 + X2.
Sol. We recall that the mgf of an exponential random variable with mean β is

$$m_X(t) = (1-\beta t)^{-1}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = (1-\beta t)^{-1}(1-\beta t)^{-1} = (1-\beta t)^{-2},$$

which is the mgf of a gamma random variable with parameters α = 2 and β. So, by uniqueness of the mgf, Y = X1 + X2 follows a gamma distribution with parameters α = 2 and β.

In general, the linear combination Y = a1X1 + a2X2 of two independent exponential random variables with the same parameter β is not a gamma random variable.

In case X1 and X2 are exponential variables with different parameters, say β1 and β2, then

$$m_Y(t) = (1-\beta_1 t)^{-1}(1-\beta_2 t)^{-1}$$

is not the mgf of a gamma random variable.

Ex. If X1 and X2 are two independent χ² random variables with n1 and n2 degrees of freedom (dof), respectively, then find the distribution of Y = X1 + X2.
Sol. We recall that the mgf of a χ² random variable X with n dof is

$$m_X(t) = (1-2t)^{-n/2}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = (1-2t)^{-n_1/2}(1-2t)^{-n_2/2} = (1-2t)^{-(n_1+n_2)/2},$$

which is the mgf of a χ² random variable with n1 + n2 dof. So, by uniqueness of the mgf, Y = X1 + X2 follows a χ² distribution with n1 + n2 dof.

Note that a linear combination of X1 and X2 other than X1 + X2 is not a χ² random variable (verify!).

Ex. If X1 and X2 are two independent normal random variables, both with mean µ and variance σ², then find the distribution of Y = X1 + X2.
Sol. We recall that the mgf of a normal random variable X is

$$m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

It follows that

$$m_Y(t) = m_{X_1}(t)\,m_{X_2}(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}\,e^{\mu t + \frac{1}{2}\sigma^2 t^2} = e^{2\mu t + \frac{1}{2}(2\sigma^2)t^2},$$

which is the mgf of a normal random variable with mean 2µ and variance 2σ². So, by uniqueness of the mgf, Y = X1 + X2 follows a normal distribution with mean 2µ and variance 2σ².

Further, if Y = a1X1 + a2X2, then

$$m_Y(t) = e^{(a_1+a_2)\mu t + \frac{1}{2}(a_1^2+a_2^2)\sigma^2 t^2}.$$

This shows that the linear combination Y = a1X1 + a2X2 is a normal random variable with mean (a1 + a2)µ and variance (a1² + a2²)σ².

In general, if X1 and X2 are two independent normal random variables with means µ1, µ2 and variances σ1², σ2², and Y = a1X1 + a2X2, then

$$m_Y(t) = e^{(a_1\mu_1 + a_2\mu_2)t + \frac{1}{2}(a_1^2\sigma_1^2 + a_2^2\sigma_2^2)t^2}.$$

This shows that the linear combination Y = a1X1 + a2X2 is a normal random variable with mean a1µ1 + a2µ2 and variance a1²σ1² + a2²σ2².

So we conclude that any linear combination of independent normal random variables, with the same or different means and variances, is a normal random variable.

5.1.4 Sampling distribution of mean X̄


The sampling distribution of the mean X̄ depends on the nature of the population and/or the size of the chosen sample, as discussed in the following cases:

Case (i) Normal population with known mean and variance


Let X1, X2, ..., Xn be a random sample of size n from a normal distribution with mean µ and variance σ². Then X̄ is normally distributed with mean µ and variance σ²/n.

For, we know that the mgf of a normal random variable X with mean µ and variance σ² is

$$m_X(t) = e^{\mu t + \frac{1}{2}\sigma^2 t^2}.$$

Also, here X̄ = (1/n)X1 + (1/n)X2 + ... + (1/n)Xn is a linear combination of the n independent normal random variables X1, X2, ..., Xn. So we have

$$m_{\bar{X}}(t) = m_{X_1}(t/n)\,m_{X_2}(t/n)\cdots m_{X_n}(t/n) = e^{\left(\frac{\mu}{n}+\cdots+\frac{\mu}{n}\right)t + \frac{1}{2}\left(\frac{\sigma^2}{n^2}+\cdots+\frac{\sigma^2}{n^2}\right)t^2} = e^{\mu t + \frac{1}{2}\frac{\sigma^2}{n}t^2},$$

which is the mgf of a normal random variable with mean µ and variance σ²/n. So, by uniqueness of the mgf, X̄ is normally distributed with mean µ and variance σ²/n.

Here we have a very useful and important observation: the variance σ²/n of X̄ decreases as the size n of the random sample increases (for an illustration, see Figure 5.1). This means that as we increase the sample size, the observed values of the statistic X̄ are likely to fall closer and closer to the population mean µ.

[Figure 5.1: The sampling distributions of the sample mean X̄ with sample sizes n = 2, 4, 16, obtained via 10000 MCMC simulations from a normal population with µ = 1 and σ² = 1. The sampling distributions are approximately normal, and the variance of the sampling distribution decreases as the sample size n increases, as expected.]

Case (ii) Population with known or unknown variance (large sample size)
Let X1, X2, ..., Xn be a random sample of size n from any population distribution with mean µ and variance σ². Then for large n, X̄ is approximately normal (exactly normal in the case of a normal population) with mean µ and variance σ²/n. Therefore, for large n, Z = (X̄ − µ)/(σ/√n) is approximately standard normal. Further, for large n, the statistic Z = (X̄ − µ)/(S/√n) is also approximately standard normal, where S is the standard deviation of the sample.

The above result is due to the Central Limit Theorem (CLT).
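The simulation behind Figure 5.1 can be sketched in a few lines (assuming NumPy; the caption's "10000 MCMC simulations" is reproduced here as plain repeated Monte Carlo draws):

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 1.0
for n in (2, 4, 16):
    # 10000 samples of size n from N(mu, sigma^2); one X̄ per sample
    xbar = rng.normal(mu, sigma, size=(10_000, n)).mean(axis=1)
    print(n, xbar.mean().round(3), xbar.var().round(3))  # variance ≈ sigma^2/n
```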

Thumb rule for applying CLT: The normal approximation for X̄ is generally good if n ≥ 30, provided
the population distribution is not terribly skewed. If n < 30, the approximation is good only if the
population is not too different from a normal distribution.

In summary, we have the following conclusions about the distribution of X̄:

(i) If a random sample of size n is selected from a normal population having mean µ and variance σ², then the sample mean X̄ follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is a standard normal variable.

(ii) If a random sample of size n ≥ 30 (large sample) is selected from a non-normal population having mean µ and variance σ², then by the CLT the sample mean X̄ approximately follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is approximately a standard normal variable. In case the population variance σ² is unknown, we can use the sample variance S²; that is, Z = (X̄ − µ)/(S/√n) is also approximately a standard normal variable.

Ex. The breaking strength of a rivet has a mean value of 10000 psi and a standard deviation of 500 psi. What is the probability that the sample mean breaking strength for a random sample of 40 rivets is between 9950 and 10250?

Sol. Here µ = 10000, σ = 500 and n = 40 > 30. So by the CLT, Z = (X̄ − 10000)/(500/√40) is approximately standard normal. Therefore, we have

$$P(9950 \le \bar{X} \le 10250) \approx P\left(\frac{9950-10000}{500/\sqrt{40}} \le Z \le \frac{10250-10000}{500/\sqrt{40}}\right)$$
$$= P(-0.63 \le Z \le 3.16) = F(3.16) - F(-0.63) = 0.9992 - 0.2643 = 0.7349.$$
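The same probability can be computed directly (a sketch assuming SciPy is available):

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 10_000, 500, 40
se = sigma / sqrt(n)                       # standard error of X̄
p = norm.cdf(10_250, mu, se) - norm.cdf(9_950, mu, se)
print(p)                                   # ≈ 0.7357 (table values round to 0.7349)
```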

Note: To save time, you can use an online probability calculator such as SticiGui, which provides probabilities for different distributions. Otherwise, use the probability distribution tables given in the textbook. When reading values from a table, you may need to interpolate to get a better value in case the exact value is not available. The interpolation procedure is as follows: suppose we need P[X ≤ c], that is, F(c), but the table provides only F(a) and F(b), where a < c < b. Then, by linear interpolation,

$$P[X \le c] = F(c) \approx F(a) + \frac{c-a}{b-a}\,(F(b) - F(a)).$$
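As a small utility (plain Python; the function name is ours), the interpolation rule translates directly into code:

```python
def interp_cdf(c, a, Fa, b, Fb):
    """Linearly interpolate F(c) from tabulated values F(a), F(b), with a < c < b."""
    return Fa + (c - a) / (b - a) * (Fb - Fa)

# e.g. z = 0.632 from a table giving F(0.63) = 0.7357 and F(0.64) = 0.7389
print(interp_cdf(0.632, 0.63, 0.7357, 0.64, 0.7389))  # ≈ 0.7363
```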

5.1.5 Sampling distribution of a statistic involving S 2


From the central limit theorem (CLT), we know that the distribution of the sample mean is approximately normal. What about the sample variance? Unfortunately, there is no CLT analogue for the variance. But there is an important special case: when X1, X2, ..., Xn are from a normal distribution, the distribution of a statistic involving the sample variance is the χ² distribution, as described in the following.

If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the statistic

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i-\bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with n − 1 degrees of freedom.

This follows from the following results:

(i) If X is a normal random variable with mean µ and variance σ², then it can be proved that the square of the standard normal variable, that is, Z² = ((X − µ)/σ)², follows a chi-square distribution with one degree of freedom.

(ii) It can be proved that the sum of independent chi-square random variables is also a chi-square random variable, with degrees of freedom equal to the sum of the degrees of freedom of all the independent random variables. It follows that if X1, X2, ..., Xn is a random sample of size n from a normal distribution with mean µ and variance σ², then

$$\sum_{i=1}^{n}\left(\frac{X_i-\mu}{\sigma}\right)^2$$

is a chi-square random variable with n degrees of freedom.

(iii) If the population mean µ is replaced by the sample mean X̄, then

$$\sum_{i=1}^{n}\left(\frac{X_i-\bar{X}}{\sigma}\right)^2$$

is a chi-square random variable with n − 1 degrees of freedom. Here one degree of freedom is consumed because the population mean µ is estimated by X̄ from the sample.
For the chi-square distribution, it turns out that the mean and variance are

$$E[\chi^2_{n-1}] = n-1, \qquad V[\chi^2_{n-1}] = 2(n-1).$$

We can use this to get the mean and variance of S²:

$$E[S^2] = E\left[\frac{\sigma^2\chi^2_{n-1}}{n-1}\right] = \frac{\sigma^2}{n-1}(n-1) = \sigma^2,$$

$$V[S^2] = V\left[\frac{\sigma^2\chi^2_{n-1}}{n-1}\right] = \frac{\sigma^4}{(n-1)^2}V[\chi^2_{n-1}] = \frac{\sigma^4}{(n-1)^2}\,2(n-1) = \frac{2\sigma^4}{n-1}.$$

5.1.6 Sampling distribution of a statistic involving X̄ (small sample)


Let X1, X2, ..., Xn be a random sample of size n < 30 (small sample) with mean X̄ and variance S², drawn from a normal distribution with mean µ but unknown variance. Then it can be shown that the statistic

$$T_{n-1} = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

follows a T distribution with n − 1 degrees of freedom.

Ex. An electrical firm manufactures light bulbs whose length of life is normally distributed with mean equal to 800 hours. Find the probability that a random sample of 16 bulbs with a standard deviation of 40 hours will have an average life of less than 775 hours.

Sol. Here µ = 800, n = 16 and S = 40. So T_{n−1} = (X̄ − µ)/(S/√n) = (X̄ − 800)/(40/√16) = T15 follows a T distribution with 15 degrees of freedom, and therefore

$$P(\bar{X} < 775) = P(T_{15} < -2.5) = 0.0123.$$
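A check of this tail probability (a sketch assuming SciPy):

```python
from scipy.stats import t

t_stat = (775 - 800) / (40 / 16 ** 0.5)   # = -2.5
print(t.cdf(t_stat, df=15))               # ≈ 0.0123
```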

5.1.7 Sampling distribution of the difference between two sample means


Suppose two independent random samples of sizes n1 and n2 with means X̄ and Ȳ are drawn from two normal populations with unknown means µ1 and µ2 and known variances σ1² and σ2², respectively. Then the statistic X̄ − Ȳ is normally distributed with mean

$$E[\bar{X}-\bar{Y}] = E[\bar{X}] - E[\bar{Y}] = \mu_1 - \mu_2$$

and variance

$$V[\bar{X}-\bar{Y}] = V[\bar{X}] + V[\bar{Y}] = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

Therefore,

$$Z = \frac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a standard normal variable.


Note. This procedure for estimating the difference between two means is applicable if σ12 and σ22 are
known. If the variances are not known and the two distributions involved are normal, the T-distribution
becomes involved, as in the case of a single sample. If one is not willing to assume normality, large
samples (say greater than 30) will allow the use of computed sample S.D. s1 and s2 in place of σ1 and σ2 ,
respectively.

5.1.8 Sampling distribution of sample proportion
Let p denote the proportion of “successes” in a binomial population, and suppose X is the number of successes in a random sample of size n. Then the mean and variance of X are E[X] = np and V(X) = E[X²] − E[X]² = np(1 − p), respectively. Furthermore, if np ≥ 10 and np(1 − p) ≥ 10, then X approximately follows a normal distribution. Since n is constant, the random variable P̂ = X/n (the sample proportion) also approximately follows a normal distribution, with mean

$$E\left[\frac{X}{n}\right] = \frac{1}{n}E[X] = \frac{1}{n}np = p$$

and variance

$$V\left[\frac{X}{n}\right] = E\left[\left(\frac{X}{n}\right)^2\right] - E\left[\frac{X}{n}\right]^2 = \frac{1}{n^2}\left(E[X^2]-E[X]^2\right) = \frac{1}{n^2}np(1-p) = \frac{p(1-p)}{n}.$$

It follows that the statistic

$$Z = \frac{\hat{P}-p}{\sqrt{p(1-p)/n}}$$

follows the standard normal distribution.

5.1.9 Sampling distribution of difference of sample proportions


Following the approach for a single sample, we derive the sampling distribution of P̂1 − P̂2, where P̂1 = X/n1 and P̂2 = Y/n2, with X and Y being the numbers of successes in two random samples of sizes n1 and n2 chosen from two binomial populations with success proportions p1 and p2, respectively. We find that P̂1 − P̂2 follows a normal distribution with

$$E[\hat{P}_1-\hat{P}_2] = p_1-p_2, \qquad V[\hat{P}_1-\hat{P}_2] = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2},$$

provided the conditions of normality are met by the samples chosen from the binomial populations. Therefore, the statistic

$$Z = \frac{(\hat{P}_1-\hat{P}_2) - (p_1-p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

is a standard normal variable.

5.1.10 Summary
Before we proceed further, it is important to recollect the following points.

• Population is the collection of all objects or observations under study, and any subset of the popu-
lation is a sample.

• In practical applications, we usually need to know the properties (mean, variance, etc.) of a large or infinite population. Due to the large size, we cannot deal with each item of the population. So we collect random (unbiased) sample(s) from the population, and from the collected sample information we infer the properties of the population.

• A random sample of size n, by definition, is a collection of n independent random variables X1, X2, ..., Xn from a population distribution (X, f(X)) such that each Xi has the same distribution as X. Therefore, if E[X] = µ and V[X] = σ², then E[Xi] = µ and V[Xi] = σ² for all i = 1, 2, ..., n. Note that the random sample of size n consists of n random variables X1, X2, ..., Xn because, prior to the selection of the n members of the sample, each member can take any value of the population distribution variable X. Obviously, once a sample is collected from the population, the variables X1, X2, ..., Xn take fixed values (as per the given definition of X) corresponding to the members of the collected sample. Recall the example of the MP3 players: there a random sample of size two carries two random variables X1 and X2. If we select two MP3 players with costs $80 and $120, respectively, from the population of MP3 players, then for this selected sample X1 = 80 and X2 = 120.

• The mean of the random sample X1, X2, ..., Xn is X̄ = (X1 + X2 + ... + Xn)/n. Note that X̄, being a combination of random variables, is also a random variable. Further, X̄ follows a distribution with mean µ and variance σ²/n if the random sample is selected from a population having mean µ and variance σ².

• The variance of the random sample X1, X2, ..., Xn is defined by

$$S^2 = \frac{1}{n-1}\left[(X_1-\bar{X})^2 + (X_2-\bar{X})^2 + \cdots + (X_n-\bar{X})^2\right].$$

Notice that in this definition the division is by n − 1 instead of n (you studied the variance with division by n in your high school statistics!). First, the variance is simply a mathematical measure quantifying the deviation of the data from the mean value, so either denominator still quantifies that deviation. But here we have a profound reason for choosing the definition with n − 1: with this definition, we get E[S²] = σ². Therefore, the expected value of the sample variance is the population variance. This is a desirable feature of a representative sample of the population.

• If a random sample of size n is selected from a normal population having mean µ and variance σ², then the sample mean X̄ follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is a standard normal variable.
• If a random sample of size n < 30 (small sample) with mean X̄ and variance S² is selected from a normal distribution with mean µ but unknown variance, then

$$T_{n-1} = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

follows a T distribution with n − 1 degrees of freedom.

• If a random sample of size n ≥ 30 (large sample) is selected from a non-normal population having mean µ and variance σ², then by the CLT the sample mean X̄ approximately follows a normal distribution with mean µ and variance σ²/n. So Z = (X̄ − µ)/(σ/√n) is approximately a standard normal variable. In case the population variance σ² is unknown, we can use the sample variance S²; that is, Z = (X̄ − µ)/(S/√n) is also approximately a standard normal variable.

• If S² is the variance of a random sample of size n taken from a normal population having variance σ², then the statistic

$$\chi^2_{n-1} = \frac{(n-1)S^2}{\sigma^2} = \sum_{i=1}^{n}\frac{(X_i-\bar{X})^2}{\sigma^2}$$

has a chi-squared distribution with n − 1 degrees of freedom.

• If X1, X2, ..., Xn are independent normal random variables with means µ1, µ2, ..., µn and variances σ1², σ2², ..., σn², respectively, then the linear combination a1X1 + a2X2 + ... + anXn is also a normal random variable, with mean a1µ1 + a2µ2 + ... + anµn and variance a1²σ1² + a2²σ2² + ... + an²σn².

Classical methods for Statistical Inference

To study a population parameter or property from the collected sample information, we shall use the following three methods:

(i) Point Estimation: We use the sample information to estimate the value of the population parameter; the estimated value is called the point estimate.

(ii) Confidence Interval: We use the sample information to estimate an interval, called the confidence interval, in which the population parameter is likely to lie with some given probability (confidence).

(iii) Hypothesis Testing: We use the sample information to test a given or existing hypothesis or statement about the population parameter, called the null hypothesis, against an alternative hypothesis.

5.2 Point Estimation
Estimating the value of a population parameter θ from the sample information is referred to as point estimation of the population parameter θ.

5.2.1 Unbiased point estimator


A statistic θ̂ is an unbiased estimator of a population parameter θ if and only if E[θ̂] = θ.
It is desirable that an unbiased estimator θ̂ has small variance. Therefore, if we consider all possible unbiased point estimators of some parameter θ, the one with the smallest variance is called the most efficient estimator of θ.

Unbiased point estimators of the population mean and variance


Suppose X follows a distribution with mean µ and variance σ². Let X1, X2, ..., Xn be a random sample from the distribution of X. Then, as shown earlier,

$$E[\bar{X}] = \mu, \qquad E[S^2] = \sigma^2.$$

Therefore, the sample mean µ̂ = X̄ and the sample variance σ̂² = S² are unbiased estimators of the population mean µ and variance σ², respectively.

Difference between estimator and estimate


An estimator is a function of the random sample variables, that is, it is a rule that tells you how to
calculate an estimate of the parameter from the given sample. An estimate is a value of the estimator
calculated from a given sample. For example, the sample mean X̄ = (X1 +X2 +...+Xn )/n is an estimator
of the population mean µ. But for some given particular values x1 , x2 , ..., xn of the random sample X1 ,
X2 , ..., Xn , the quantity x̄ = (x1 + x2 + ... + xn )/n is a particular value of X̄, and hence x̄ is an estimate
of µ.

5.2.2 Maximum likelihood (ML) method


In this method, we write a function L(θ) (known as the likelihood function) involving the population parameter(s) θ of interest, such that L(θ) gives the probability of the observed sample. The value of θ that maximizes the likelihood function serves as an estimate of the parameter θ.

Ex. Suppose 7 heads are observed in 10 tosses of a coin. Find the ML estimate of the probability of success (observing a head) in each trial.
Sol. Let p be the probability of success (observing a head) in each trial. Here the random experiment is governed by the binomial distribution. Therefore, the probability of observing 7 heads in 10 tosses (the likelihood function of the given sample) is

$$L(p) = \binom{10}{7} p^7 (1-p)^3.$$

This implies

$$\ln L(p) = \ln\binom{10}{7} + 7\ln p + 3\ln(1-p).$$

So d/dp (ln L(p)) = 0 gives

$$\frac{7}{p} - \frac{3}{1-p} = 0.$$

This gives p = 7/10. Thus the ML estimate is simply the observed fraction of successes, which is intuitively obvious. Also, the sample information suggests that the coin is not quite fair.
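The same estimate can be recovered numerically by maximizing the log-likelihood over a grid, which is a useful template when no closed form exists. A minimal sketch (plain Python/NumPy; the grid resolution is an arbitrary choice of ours):

```python
import numpy as np
from math import comb, log

heads, n = 7, 10
p = np.linspace(0.001, 0.999, 9999)
# log-likelihood of observing `heads` successes in n Bernoulli(p) trials
loglik = log(comb(n, heads)) + heads * np.log(p) + (n - heads) * np.log(1 - p)
print(p[np.argmax(loglik)])   # ≈ 0.7 = heads / n
```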

Ex. Suppose 15 buses arrive at a bus stop in a span of 3 hours. Find the ML estimate of the average number of buses that arrive at the bus stop per hour.
Sol. Let λ be the number of buses that arrive at the bus stop in one hour. Here the random experiment is governed by the Poisson distribution. Therefore, the probability of 15 buses arriving in a span of 3 hours (the likelihood function of the given sample) is

$$L(\lambda) = \frac{(3\lambda)^{15} e^{-3\lambda}}{15!}.$$

This implies

$$\ln L(\lambda) = 15\ln(3\lambda) - 3\lambda - \ln 15!.$$

So d/dλ (ln L(λ)) = 0 gives

$$\frac{15}{\lambda} - 3 = 0.$$

This gives λ = 5, as expected.

Ex. Let a1, a2, ..., an be an observed set of values of a random sample X1, X2, ..., Xn from a normal population distribution with mean µ and variance σ². Find the ML estimates of µ and σ².
Sol. The density for Xi is

$$f(x_i) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2}, \quad i = 1, 2, \ldots, n.$$

Since X1, X2, ..., Xn are independent, their joint density reads

$$f(x_1, x_2, \ldots, x_n; \mu, \sigma) = \prod_{i=1}^{n}\frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x_i-\mu}{\sigma}\right)^2} = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2}.$$

Therefore, the likelihood function of the observed values a1, a2, ..., an of the random sample X1, X2, ..., Xn is

$$L(\mu, \sigma) = \left(\frac{1}{\sigma\sqrt{2\pi}}\right)^n e^{-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(a_i-\mu)^2}.$$

$$\therefore\; \ln L(\mu, \sigma) = -n\ln\sqrt{2\pi} - n\ln\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(a_i-\mu)^2.$$

Setting the partial derivatives of ln L(µ, σ) equal to 0, we find

$$\mu = \frac{1}{n}\sum_{i=1}^{n} a_i = \bar{a}, \qquad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(a_i-\bar{a})^2.$$

Thus, the ML estimates of the parameters µ and σ² are

$$\hat{\mu} = \bar{a} \quad\text{and}\quad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}(a_i-\bar{a})^2,$$

where ā is the computed value of X̄ from the given sample.

Note: We see that the ML estimate of σ² is not unbiased.

5.2.3 Method of moments
In many cases, the moments involve the parameter θ to be estimated. We can often obtain a reasonable estimator for θ by replacing the theoretical moments by their estimates based on the drawn sample and solving the resulting equations for the estimator θ̂. The kth moment of the random sample X1, X2, ..., Xn is defined as

$$\frac{1}{n}\sum_{i=1}^{n} X_i^k.$$

On the other hand, the kth moment of the population is E(X^k). If there are m unknown parameters, then we use the m equations

$$E(X^k) = \frac{1}{n}\sum_{i=1}^{n} X_i^k, \quad k = 1, 2, \ldots, m,$$

and solve these equations to obtain the m unknown parameters.

Ex. A forester plants 5 rows of pine seedlings with 20 pine seedlings in each row. Let X denote the number of seedlings per row that survive the first winter. Then X follows a binomial distribution with n = 20 and unknown p. Find an estimate of p given that X1 = 18, X2 = 17, X3 = 15, X4 = 19, X5 = 20.

Sol. The first moment of the binomial random variable X is E[X] = np = 20p, while the first moment of the given sample is (1/5)(18 + 17 + 15 + 19 + 20) = 17.8. So solving 20p̂ = 17.8, we find p̂ = 0.89, the estimate for p.
Ex. Suppose 2, 4, 3, 6, 10 are the values of a sample of size 5 from a gamma distribution. Find estimates of the parameters α and β of the gamma distribution.
Sol. The first and second moments of a gamma random variable are αβ and αβ² + α²β², while the corresponding sample moments are (2 + 4 + 3 + 6 + 10)/5 = 5 and (2² + 4² + 3² + 6² + 10²)/5 = 33. So solving α̂β̂ = 5 and α̂β̂² + α̂²β̂² = 33, we find the estimates α̂ = 25/8, β̂ = 8/5.
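These two moment equations can also be solved mechanically. A sketch assuming NumPy (the closed-form reduction β̂ = (m2 − m1²)/m1 and α̂ = m1/β̂ is just elementary algebra on the two equations):

```python
import numpy as np

x = np.array([2, 4, 3, 6, 10], dtype=float)
m1, m2 = x.mean(), (x ** 2).mean()      # sample moments: 5.0 and 33.0

beta_hat = (m2 - m1 ** 2) / m1          # from alpha*beta^2 = m2 - m1^2
alpha_hat = m1 / beta_hat               # from alpha*beta = m1
print(alpha_hat, beta_hat)              # 3.125 (= 25/8), 1.6 (= 8/5)
```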

Note: The estimate obtained from the method of moments often agrees with that obtained from the ML method. When the two disagree, the ML estimate is preferred.

In the above, we learned some methods to find a point estimate of a population parameter θ. Now we learn how to use the sample information to find an interval, or range, in which the parameter θ is likely to lie with some given probability (confidence).

5.3 Confidence interval (CI)
A 100(1 − α)% confidence interval (CI) for a population parameter θ is a random interval [L1, L2] such that P[L1 ≤ θ ≤ L2] = 1 − α, regardless of the value of θ.

5.3.1 CI of mean of normal population with known variance


Let X1, X2, ..., Xn be a random sample of size n from a normal distribution with unknown mean µ and known variance σ². Then recall that X̄ is normally distributed with mean µ and variance σ²/n. Therefore, the statistic

$$Z = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}$$

follows a standard normal distribution. We use this fact to find a CI for µ. From the standard normal distribution, we know that z_{α/2} is the value of Z such that P[Z > z_{α/2}] = α/2. Then, by the symmetry of the distribution, we have

$$P[-z_{\alpha/2} \le Z \le z_{\alpha/2}] = F(z_{\alpha/2}) - F(-z_{\alpha/2}) = 1-\alpha,$$

or

$$P\left[-z_{\alpha/2} \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha/2}\right] = 1-\alpha.$$

$$\therefore\; P\left[\bar{X} - z_{\alpha/2}\sigma/\sqrt{n} \le \mu \le \bar{X} + z_{\alpha/2}\sigma/\sqrt{n}\right] = 1-\alpha.$$

Thus, the 100(1 − α)% CI for µ is

$$[L_1, L_2] = \left[\bar{x} - z_{\alpha/2}\sigma/\sqrt{n},\; \bar{x} + z_{\alpha/2}\sigma/\sqrt{n}\right],$$

where x̄ is the computed value of X̄ from the given sample. This means that we can be 100(1 − α)% confident that the error of estimation will not exceed z_{α/2}σ/√n.
Frequently, we wish to know how large a sample is necessary to ensure that the error in estimating µ will be less than a specified amount e. Clearly, we must choose n such that z_{α/2}σ/√n = e, that is, n = (z_{α/2}σ/e)².
Now let us find the 95% CI for µ. From the normal probability distribution table, we have

$$P[-1.96 \le Z \le 1.96] = F(1.96) - F(-1.96) = 0.95.$$

So the 95% CI for µ is [L1, L2] = [x̄ − 1.96σ/√n, x̄ + 1.96σ/√n].

Likewise, the 99% CI for µ is [L1, L2] = [x̄ − 2.576σ/√n, x̄ + 2.576σ/√n].

Ex. Find the 95% CI for the mean of a population, given the sample

 8.0  13.6  13.2  13.6
12.5  14.2  14.9  14.5
13.4   8.6  11.5  16.0
14.2  19.0  17.9  17.0

and population variance σ² = 9.

Sol. Here n = 16, x̄ = 13.88 and σ = 3. So the 95% confidence limits are given by

L1 = x̄ − 1.96σ/√n = 13.88 − 1.96(3/4) = 12.41,
L2 = x̄ + 1.96σ/√n = 13.88 + 1.96(3/4) = 15.35.
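The computation is easy to script (a sketch assuming NumPy/SciPy; the data are the sixteen values above):

```python
import numpy as np
from scipy.stats import norm

x = np.array([8.0, 13.6, 13.2, 13.6, 12.5, 14.2, 14.9, 14.5,
              13.4, 8.6, 11.5, 16.0, 14.2, 19.0, 17.9, 17.0])
sigma, alpha = 3.0, 0.05
z = norm.ppf(1 - alpha / 2)                     # 1.96
half = z * sigma / np.sqrt(len(x))              # half-width of the CI
print(x.mean() - half, x.mean() + half)         # ≈ [12.41, 15.35]
```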

Ex. How large a sample is required if we want to be 95% confident that our estimate of the mean of a population with S.D. 0.3 is off by less than 0.05?

Sol. n = [(1.96)(0.3)/(0.05)]² = 138.3, so a sample of size n = 139 is required (rounding up).

Note: If a sample X1, X2, ..., Xn of size n > 30 (large sample) with mean X̄ and variance S² is drawn from a non-normal population with unknown mean µ, then the CLT suggests that the statistic

$$Z = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

approximately follows the standard normal distribution. So the 100(1 − α)% CI for µ is

$$[L_1, L_2] = \left[\bar{x} - z_{\alpha/2}s/\sqrt{n},\; \bar{x} + z_{\alpha/2}s/\sqrt{n}\right],$$

where x̄ and s are the computed values of X̄ and S from the given sample.

5.3.2 CI of mean of normal population with unknown variance


Let X1, X2, ..., Xn be a random sample of size n (< 30) with mean X̄ and variance S², drawn from a normal distribution with unknown mean µ and unknown variance. Then we recall that the statistic

$$T_{n-1} = \frac{\bar{X}-\mu}{S/\sqrt{n}}$$

follows a T distribution with n − 1 degrees of freedom.
We recall that t_{α/2} denotes the value of T_{n−1} such that P[T_{n−1} ≥ t_{α/2}] = α/2. By the symmetry of the T-distribution, we have P[T_{n−1} ≤ −t_{α/2}] = α/2. So

$$P[-t_{\alpha/2} \le T_{n-1} \le t_{\alpha/2}] = 1-\alpha,$$

or

$$P\left[-t_{\alpha/2} \le \frac{\bar{X}-\mu}{S/\sqrt{n}} \le t_{\alpha/2}\right] = 1-\alpha.$$

$$\therefore\; P\left[\bar{X} - t_{\alpha/2}S/\sqrt{n} \le \mu \le \bar{X} + t_{\alpha/2}S/\sqrt{n}\right] = 1-\alpha.$$

Thus, the 100(1 − α)% CI for µ is

$$[L_1, L_2] = \left[\bar{x} - t_{\alpha/2}s/\sqrt{n},\; \bar{x} + t_{\alpha/2}s/\sqrt{n}\right],$$

where x̄ and s are the computed values of X̄ and S from the given sample.
In particular, the 95% CI for µ is

$$[L_1, L_2] = \left[\bar{x} - t_{0.025}s/\sqrt{n},\; \bar{x} + t_{0.025}s/\sqrt{n}\right].$$

Ex. Find the 95% CI for µ of a normal population based on the sample of size 24 given below:

52.7  43.9  41.7  71.5  47.6  55.1
62.2  56.5  33.4  61.8  54.3  50.0
45.3  63.4  53.9  65.5  66.6  70.0
52.4  38.6  46.1  44.4  60.7  56.4

Sol. Here n = 24, x̄ = 53.92 and s = 10.07. From the T probability distribution table, for 23 degrees of freedom, we have t_{0.025} = 2.069. So the 95% confidence limits are given by

L1 = x̄ − t_{0.025}s/√n = 53.92 − 2.069(10.07)/√24 = 49.67,
L2 = x̄ + t_{0.025}s/√n = 53.92 + 2.069(10.07)/√24 = 58.17.
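The same interval via SciPy (a sketch; scipy.stats.t.interval performs the t_{α/2} lookup for us):

```python
import numpy as np
from scipy import stats

x = np.array([52.7, 43.9, 41.7, 71.5, 47.6, 55.1, 62.2, 56.5,
              33.4, 61.8, 54.3, 50.0, 45.3, 63.4, 53.9, 65.5,
              66.6, 70.0, 52.4, 38.6, 46.1, 44.4, 60.7, 56.4])
n = len(x)
se = x.std(ddof=1) / np.sqrt(n)           # s/sqrt(n), with the n-1 divisor
lo, hi = stats.t.interval(0.95, n - 1, loc=x.mean(), scale=se)
print(lo, hi)                             # ≈ [49.67, 58.17]
```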

5.3.3 CI of variance of normal distribution with unknown mean
Let X1, X2, ..., Xn be a random sample of size n with mean X̄ and variance S², drawn from a normal distribution with unknown mean µ and variance σ². Then we recall that the statistic

$$\chi^2_{n-1} = (n-1)S^2/\sigma^2 = \sum_{i=1}^{n}(X_i-\bar{X})^2/\sigma^2$$

follows a chi-squared distribution with n − 1 degrees of freedom. Further, χ²_{α/2} and χ²_{1−α/2} denote the values of χ²_{n−1} such that P[χ²_{n−1} ≥ χ²_{α/2}] = α/2 and P[χ²_{n−1} ≥ χ²_{1−α/2}] = 1 − α/2. Therefore,

$$P[\chi^2_{1-\alpha/2} \le \chi^2_{n-1} \le \chi^2_{\alpha/2}] = 1-\alpha,$$

or

$$P[\chi^2_{1-\alpha/2} \le (n-1)S^2/\sigma^2 \le \chi^2_{\alpha/2}] = 1-\alpha.$$

$$\therefore\; P[(n-1)S^2/\chi^2_{\alpha/2} \le \sigma^2 \le (n-1)S^2/\chi^2_{1-\alpha/2}] = 1-\alpha.$$

Thus, the 100(1 − α)% CI for σ² is

$$[L_1, L_2] = \left[(n-1)s^2/\chi^2_{\alpha/2},\; (n-1)s^2/\chi^2_{1-\alpha/2}\right],$$

where s is the computed value of S from the given sample.

In particular, the 95% CI for σ² is

$$[L_1, L_2] = \left[(n-1)s^2/\chi^2_{0.025},\; (n-1)s^2/\chi^2_{0.975}\right].$$

Ex. Find the 95% CI for σ² of a normal population based on the following sample:

3.4  3.6  4.0  0.4  2.0
3.0  3.1  4.1  1.4  2.5
1.4  2.0  3.1  1.8  1.6
3.5  2.5  1.7  5.1  0.7
4.2  1.5  3.0  3.9  3.0

Sol. Here n = 25 and s² = 1.408. From the χ² probability distribution table, for 24 degrees of freedom, we have χ²_{0.025} = 39.4 and χ²_{0.975} = 12.4. So the 95% confidence limits are given by

L1 = (n − 1)s²/χ²_{0.025} = 24(1.408)/39.4 = 0.858,
L2 = (n − 1)s²/χ²_{0.975} = 24(1.408)/12.4 = 2.725.

5.3.4 CI of difference of population means


Suppose two independent random samples of sizes n1 and n2 with means X̄ and Ȳ are drawn from two normal populations with unknown means µ1 and µ2 and known variances σ1² and σ2², respectively. Then we recall that the statistic

$$Z = \frac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

is a standard normal variable. It implies

$$P\left[-z_{\alpha/2} \le \frac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}} \le z_{\alpha/2}\right] = 1-\alpha$$

$$\implies P\left[\bar{X}-\bar{Y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}} \le \mu_1-\mu_2 \le \bar{X}-\bar{Y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right] = 1-\alpha.$$

Thus, the 100(1 − α)% CI for µ1 − µ2 is

$$\left[\bar{x}-\bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\; \bar{x}-\bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right],$$

where x̄ and ȳ are the computed values of X̄ and Ȳ, respectively, from the given samples.

Note. This procedure for estimating the difference between two means is applicable if σ12 and σ22 are
known. If the variances are not known and the two distributions involved are normal, the T-distribution
becomes involved, as in the case of a single sample. If one is not willing to assume normality, large
samples (say greater than 30) will allow the use of computed sample S.D. s1 and s2 in place of σ1 and σ2 ,
respectively.

Ex. A study was conducted in which two types of engines, A and B, were compared. Gas mileage, in miles per gallon, was measured. Fifty experiments were conducted using engine type A and 75 experiments were done with engine type B. The gasoline used and other conditions were held constant. The average gas mileage was 36 miles per gallon for engine A and 42 miles per gallon for engine B. Find a 96% confidence interval for µB − µA, where µA and µB are the population mean gas mileages for engines A and B, respectively. Assume that the population standard deviations are 6 and 8 for engines A and B, respectively.
Sol. Here n1 = 50, n2 = 75, x̄ = 36, ȳ = 42, σ1 = 6 and σ2 = 8. Also, we find z_{0.02} = 2.05. So the required CI is

$$\left[\bar{y}-\bar{x} - z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\; \bar{y}-\bar{x} + z_{0.02}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right] = [3.43, 8.57].$$

5.3.5 CI of population proportion


Let p denote the proportion of “successes” in a binomial population. Suppose X is the number of successes in a random sample of size n such that np ≥ 10 and np(1 − p) ≥ 10. Let P̂ = X/n. Then we recall that the statistic

$$Z = \frac{\hat{P}-p}{\sqrt{p(1-p)/n}}$$

follows the standard normal distribution. Therefore,

$$P\left[-z_{\alpha/2} \le \frac{\hat{P}-p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right] = 1-\alpha.$$

Solving the inequality for p, we find

$$P\left[\frac{\hat{P} + \frac{1}{2n}z_{\alpha/2}^2}{1+\frac{1}{n}z_{\alpha/2}^2} - z_{\alpha/2}\frac{\sqrt{\frac{1}{n}\hat{P}(1-\hat{P}) + \frac{1}{4n^2}z_{\alpha/2}^2}}{1+\frac{1}{n}z_{\alpha/2}^2} \le p \le \frac{\hat{P} + \frac{1}{2n}z_{\alpha/2}^2}{1+\frac{1}{n}z_{\alpha/2}^2} + z_{\alpha/2}\frac{\sqrt{\frac{1}{n}\hat{P}(1-\hat{P}) + \frac{1}{4n^2}z_{\alpha/2}^2}}{1+\frac{1}{n}z_{\alpha/2}^2}\right] = 1-\alpha.$$

Thus, the 100(1 − α)% CI for p is

$$\left[\frac{\hat{p} + \frac{1}{2n}z_{\alpha/2}^2}{1+\frac{1}{n}z_{\alpha/2}^2} - z_{\alpha/2}\frac{\sqrt{\frac{1}{n}\hat{p}\hat{q} + \frac{1}{4n^2}z_{\alpha/2}^2}}{1+\frac{1}{n}z_{\alpha/2}^2},\; \frac{\hat{p} + \frac{1}{2n}z_{\alpha/2}^2}{1+\frac{1}{n}z_{\alpha/2}^2} + z_{\alpha/2}\frac{\sqrt{\frac{1}{n}\hat{p}\hat{q} + \frac{1}{4n^2}z_{\alpha/2}^2}}{1+\frac{1}{n}z_{\alpha/2}^2}\right],$$

where p̂ is the computed value of P̂ from the given sample, and q̂ = 1 − p̂.
If the sample size n is very large, then a good approximation to the above interval is

$$\left[\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n},\; \hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}\right].$$

Ex. In a random sample of 500 families owning television sets in Delhi, it is found that 340 subscribe to HBO. Find a 95% CI for the actual proportion of families with television sets in Delhi that subscribe to HBO.
Sol. Here n = 500 and x = 340. So p̂ = x/n = 340/500 = 0.68 and q̂ = 0.32. Also, z_{0.025} = 1.96. So the 95% CI for the actual proportion is

$$\left[\hat{p} - z_{0.025}\sqrt{\hat{p}\hat{q}/n},\; \hat{p} + z_{0.025}\sqrt{\hat{p}\hat{q}/n}\right] = [0.6391, 0.7209].$$
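Both the large-sample interval and the exact-style interval above are short scripts. A sketch assuming SciPy (the second block implements the boxed formula with q̂ = 1 − p̂; the small difference between the two intervals is expected):

```python
from math import sqrt
from scipy.stats import norm

n, x = 500, 340
p_hat = x / n
z = norm.ppf(0.975)                           # 1.96

# large-sample interval
half = z * sqrt(p_hat * (1 - p_hat) / n)
print(p_hat - half, p_hat + half)             # ≈ [0.6391, 0.7209]

# interval from the exact inequality above
center = (p_hat + z**2 / (2 * n)) / (1 + z**2 / n)
half = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / (1 + z**2 / n)
print(center - half, center + half)           # ≈ [0.6379, 0.7194]
```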

5.3.6 CI of difference of population proportions


Following the approach for a single sample, a CI for the difference p1 − p2 of the proportions of two binomial populations can be derived by considering the sampling distribution of P̂1 − P̂2, where P̂1 = X/n1 and P̂2 = Y/n2, with X and Y being the numbers of successes in the two random samples of sizes n1 and n2, respectively. Then we recall that the statistic

$$Z = \frac{(\hat{P}_1-\hat{P}_2) - (p_1-p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

is a standard normal variable. For large samples, the approximate 100(1 − α)% CI for p1 − p2 is obtained as

$$\left[\hat{p}_1-\hat{p}_2 - z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\; \hat{p}_1-\hat{p}_2 + z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right],$$

where p̂1, p̂2 are the computed values of P̂1, P̂2 from the given samples, and q̂1 = 1 − p̂1, q̂2 = 1 − p̂2.

Ex. A certain change in a process for manufacturing component parts is being considered. Samples are taken under both the existing and the new process so as to determine whether the new process results in an improvement. If 75 of 1500 items from the existing process are found to be defective and 80 of 2000 items from the new process are found to be defective, find a 90% confidence interval for the true difference in the proportions of defectives between the existing and the new process.

Sol. Here n1 = 1500, n2 = 2000, x = 75, y = 80. So p̂1 = x/n1 = 75/1500 = 0.05 and p̂2 = y/n2 = 80/2000 = 0.04. Also, z_{0.05} = 1.645. So the 90% CI for p1 − p2 is

$$\left[\hat{p}_1-\hat{p}_2 - z_{0.05}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\; \hat{p}_1-\hat{p}_2 + z_{0.05}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right] = [-0.0017, 0.0217].$$

5.3.7 One-sided confidence bounds


One-sided confidence bounds are developed in the same fashion as two-sided intervals. For instance, suppose X̄ is the mean of a random sample of size n from a population with unknown mean µ and known variance σ². Then we have

$$P\left[\frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < z_\alpha\right] = 1-\alpha$$

$$\implies P\left[\mu > \bar{X} - z_\alpha\sigma/\sqrt{n}\right] = 1-\alpha.$$

So the 100(1 − α)% lower confidence bound for µ is

$$\bar{x} - z_\alpha\sigma/\sqrt{n},$$

where x̄ is the computed value of the sample mean X̄ from the given sample. Likewise, the 100(1 − α)% upper confidence bound for µ is obtained as

$$\bar{x} + z_\alpha\sigma/\sqrt{n}.$$

In a similar way, we can obtain one-sided confidence bounds in other cases.

Ex. In a psychological testing experiment, 25 subjects are selected randomly and their reaction time, in seconds, to a particular stimulus is measured. Past experience suggests that the variance in reaction times to these types of stimuli is 4 sec² and that the distribution of reaction times is approximately normal. The average time for the subjects is 6.2 seconds. Give an upper 95% bound for the mean reaction time.

Sol. The upper 95% bound is given by

$$\bar{x} + z_{0.05}\sigma/\sqrt{n} = 6.2 + (1.645)\sqrt{4/25} = 6.858.$$

Hence, we are 95% confident that the mean reaction time is less than 6.858 seconds.

5.3.8 Summary
Before we proceed further, it is important to recollect the following points.
• If X̄ is the mean of a random sample of size n selected from a normal population with unknown mean µ and known variance σ², then the statistic Z = (X̄ − µ)/(σ/√n) is standard normal, and the 100(1 − α)% CI for µ is

  [L1, L2] = [x̄ − z_{α/2}σ/√n, x̄ + z_{α/2}σ/√n],

  where x̄ is the computed value of X̄ from the given sample.

• If X̄ is the mean of a random sample of size n ≥ 30 (large sample) selected from a non-normal population with unknown mean µ and known variance σ², then the statistic Z = (X̄ − µ)/(σ/√n) is approximately standard normal, and the 100(1 − α)% CI for µ is

  [L1, L2] = [x̄ − z_{α/2}σ/√n, x̄ + z_{α/2}σ/√n],

  where x̄ is the computed value of X̄ from the given sample.

• If X̄ is the mean and S² is the variance of a random sample of size n ≥ 30 (large sample) selected from a non-normal population with unknown mean µ and unknown variance, then the statistic Z = (X̄ − µ)/(S/√n) is approximately standard normal, and the 100(1 − α)% CI for µ is

  [L1, L2] = [x̄ − z_{α/2}s/√n, x̄ + z_{α/2}s/√n],

  where x̄ and s are the computed values of X̄ and S, respectively, from the given sample.

• If X̄ is the mean and S² is the variance of a random sample of size n < 30 (small sample) selected from a normal population with unknown mean µ and unknown variance, then the statistic T_{n−1} = (X̄ − µ)/(S/√n) follows a T-distribution with n − 1 degrees of freedom, and the 100(1 − α)% CI for µ is

  [L1, L2] = [x̄ − t_{α/2}s/√n, x̄ + t_{α/2}s/√n],

  where x̄ and s are the computed values of X̄ and S, respectively, from the given sample.

• If X̄ is the mean and S² is the variance of a random sample of size n selected from a normal population with unknown mean and unknown variance σ², then the statistic χ²_{n−1} = (n − 1)S²/σ² follows a chi-squared distribution with n − 1 degrees of freedom, and the 100(1 − α)% CI for σ² is

  [L1, L2] = [(n − 1)s²/χ²_{α/2}, (n − 1)s²/χ²_{1−α/2}],

  where s is the computed value of S from the given sample.

• If two independent random samples of sizes n1 and n2 with means X̄ and Ȳ are drawn from two normal populations with unknown means µ1 and µ2 and known variances σ1² and σ2², respectively, then the statistic

$$Z = \frac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}$$

  is standard normal, and the 100(1 − α)% CI for µ1 − µ2 is

$$[L_1, L_2] = \left[\bar{x}-\bar{y} - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}},\; \bar{x}-\bar{y} + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}\right],$$

  where x̄ and ȳ are the computed values of X̄ and Ȳ, respectively, from the given samples.

• If P̂ = X/n is the proportion of successes X in a random sample of size n drawn from a binomial population with unknown success proportion p, then the statistic Z = (P̂ − p)/√(p(1 − p)/n) is approximately standard normal under the assumptions np ≥ 10 and np(1 − p) ≥ 10, and the approximate 100(1 − α)% CI for p, for large n, is

  [L1, L2] = [p̂ − z_{α/2}√(p̂q̂/n), p̂ + z_{α/2}√(p̂q̂/n)],

  where p̂ is the computed value of P̂ from the given sample, and q̂ = 1 − p̂.

• If P̂1 = X/n1 and P̂2 = Y/n2 are the proportions of successes X and Y in two random samples of sizes n1 and n2 drawn from two binomial populations with unknown success proportions p1 and p2, respectively, then the statistic

$$Z = \frac{(\hat{P}_1-\hat{P}_2) - (p_1-p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}}$$

  is approximately standard normal under assumptions analogous to the single-proportion case, and the approximate 100(1 − α)% CI for p1 − p2, for large n1 and n2, is

$$[L_1, L_2] = \left[\hat{p}_1-\hat{p}_2 - z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2},\; \hat{p}_1-\hat{p}_2 + z_{\alpha/2}\sqrt{\hat{p}_1\hat{q}_1/n_1 + \hat{p}_2\hat{q}_2/n_2}\right],$$

  where p̂1, p̂2 are the computed values of P̂1, P̂2 from the given samples, and q̂1 = 1 − p̂1, q̂2 = 1 − p̂2.

5.4 Hypotheses testing
In a hypotheses-testing problem, there are two contradictory hypotheses under consideration, namely the
null hypothesis and the alternative hypothesis. The null hypothesis, denoted by H0 , is the claim that
is initially assumed to be true (“the prior belief” claim). The alternative hypothesis, denoted by Ha , is
the assertion that is opposite or contradictory to H0 . The following examples illustrate the two hypotheses.

Ex. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true average system-activation temperature µ is 130 degrees Fahrenheit. To analyze the manufacturer's claim, we need to test Ha versus H0, where

H0 : µ = 130

Ha : µ ≠ 130 (The research hypothesis to be tested.)


Ex. A coffee company claims that the average nicotine content µ in its coffee brand is (at most) 1.5 mg, and thus fulfills the safety limit issued by the health authorities. But the health authorities suspect that the coffee brand carries more nicotine than the safety standards allow, and therefore should be banned. It would be unwise to reject the company's claim without strong contradictory evidence from the health authorities. So here it is appropriate to test Ha versus H0, where

H0 : µ ≤ 1.5 mg (The company's claim that the average nicotine content is at most 1.5 mg)

Ha : µ > 1.5 mg (The research hypothesis to be tested by the health authorities)


Ex. Suppose a company is considering putting a new type of coating on the bearings that it produces. The true average wear life with the current coating is known to be 1000 hours. With µ denoting the true average life for the new coating, the company would not want to make a change unless evidence strongly suggests that µ exceeds 1000 hours. So here it is appropriate to test Ha versus H0, where

H0 : µ = 1000 hours

Ha : µ > 1000 hours (The research hypothesis to be tested.)


Ex. Suppose that 10% of all circuit boards produced by a certain manufacturer are found defective.
An engineer has suggested a change in the production process with the belief that it will result in a
reduced defective rate. Let p denote the true proportion of defective boards resulting from the suggested
or changed process. To see whether the engineer’s suggestion has really worked, we need to test Ha versus
H0 , where

H0 : p = 0.1

Ha : p < 0.1 (The research hypothesis to be tested.)


Ex. Transport authorities believe that night accidents happen on a particular highway due to improper
reflective signs put by the engineers. On the other hand, highway engineers claim that the reflective
highway signs do not perform properly because more than 30% of the vehicles on the road have misaimed
headlights. If this contention is supported statistically, a tougher inspection program will be put into
operation by the transport authorities. Let p denote the proportion of vehicles with misaimed headlights.
Since the engineers wish to support p > 0.3, the null hypothesis H0 and the research hypothesis Ha
are

H0 : p ≤ 0.3

Ha : p > 0.3

Ex. Let σ denote the standard deviation of the distribution of inside diameters (inches) for a certain
type of metal sleeve. The company decides to continue the production of the sleeve unless sample evidence
conclusively demonstrates σ > 0.001. So, to decide whether to continue or stop the production, one needs to test Ha versus
H0 where

H0 : σ = 0.001

Ha : σ > 0.001 (The research hypothesis to be tested.)

From the above examples, it is clear that the hypotheses statement about a parameter θ of interest
appears in one of the following forms:

(1) H0 : θ = θ0 , Ha : θ ≠ θ0 (two-tailed)
(2) H0 : θ = θ0 , Ha : θ < θ0 (left-tailed)
(3) H0 : θ = θ0 , Ha : θ > θ0 (right-tailed)
(4) H0 : θ ≥ θ0 , Ha : θ < θ0 (left-tailed)
(5) H0 : θ ≤ θ0 , Ha : θ > θ0 (right-tailed)

where the value θ0 of θ, which separates H0 from Ha , is called the null value of θ, and is included in the
statement of the null hypothesis H0 . Note that statisticians prefer to write H0 : θ = θ0 in place of
H0 : θ ≥ θ0 or H0 : θ ≤ θ0 since, as we will see later, if Ha given in (4) or (5) is favored over H0 : θ = θ0 ,
then it is also favored over H0 : θ ≥ θ0 or H0 : θ ≤ θ0 . So only the first three statements of hypotheses
are used in practice.

Scientific research often involves trying to decide whether a current theory should be replaced by
a more plausible and satisfactory explanation of the phenomenon under investigation. A conservative
approach is to identify the current theory with H0 and the researcher’s alternative explanation with Ha .
Rejection of the current theory will then occur only when evidence is much more consistent with the new
theory. That is why, in many situations, Ha is referred to as the “researcher’s hypothesis”, since it is
the claim that the researcher would really like to validate. The word null means “of no value, effect, or
consequence”, which suggests that H0 should be identified with the hypothesis of no change (from current
opinion), no difference, no improvement, and so on.
The null hypothesis will be rejected in favor of the alternative hypothesis only if sample evidence
suggests that H0 is false. If the sample does not strongly contradict H0 , we cannot reject H0 . The two
possible conclusions from a hypothesis-testing analysis are then to reject H0 or to fail to reject H0 . A
researcher or experimenter who puts a lot of effort and money into validating the hypothesis Ha would
not like to accept H0 , but would rather report that the existing evidence is not enough to reject H0 .

5.4.1 Hypothesis testing procedure
In a hypothesis testing problem, we are given a prevailing or existing statement (known as null hypothesis
H0 ) about the population parameter. By setting up an alternative hypothesis Ha (complementary to H0 )
and employing a suitable statistical method or test, we check the correctness of H0 with a certain confidence
or probability. The following example illustrates the idea and procedure of hypothesis testing.
Suppose we are given a normal population with mean µ0 and variance σ0². But we suspect that the
actual mean of the population may differ from µ0 . To test the prevailing or null hypothesis H0 : µ = µ0
about the mean of the population, the alternative hypothesis is Ha : µ ≠ µ0 . So the hypothesis statement
is given by

H0 : µ = µ0

Ha : µ ≠ µ0
Now suppose, for the hypothesis-testing purpose, we are given a random sample of size n from the normal
population N(µ0 , σ0²). Let X̄ be the mean of the random sample, and x̄ be its observed value. We will
use the random sample mean X̄ and its observed value x̄ for testing the given hypothesis about the
population mean.
We know that the sampling distribution of the mean from the given normal population is also normal,
that is, X̄ ∼ N(µ0 , σ0²/n). It follows that
 
P[−zα/2 ≤ (X̄ − µ0)/(σ0/√n) ≤ zα/2] = 1 − α.

This can be rewritten as


P[µ0 − zα/2 σ0/√n ≤ X̄ ≤ µ0 + zα/2 σ0/√n] = 1 − α.

Thus the 100(1 − α)% CI of X̄ is


[µ0 − zα/2 σ0/√n, µ0 + zα/2 σ0/√n].

In other words, the observed value x̄ of X̄ lies in the above interval with probability 1 − α. Notice
that this interval becomes shorter and shorter as we choose the sample size n larger and larger. This
increases the possibility of x̄ being very close (or equal) to µ0 in case H0 : µ = µ0 is true. This shows
that X̄ is the right choice of test statistic when we test a hypothesis regarding the population mean.
Now let us see how we design a test for hypothesis testing on the basis of given or observed sample
information. First we choose a fixed value of α, known as the level of significance (los) of the test. After
fixing the los, we have two ways of hypothesis testing: (i) Critical region test (ii) P-value test. In the
following, we describe both these approaches.

Critical region test


To understand this test, in the above example of our discussion, let us choose α = 0.01. Then there is a
99% chance that the observed value x̄ of X̄ lies in the interval

[µ0 − z0.005 σ0/√n, µ0 + z0.005 σ0/√n] = [L1 , L2 ],

where L1 = µ0 − z0.005 σ0/√n and L2 = µ0 + z0.005 σ0/√n.

In practical terms, the mean x̄ of a good and fair sample would lie in this interval [L1 , L2 ]. Indeed,
we can expect this in case µ0 is the true population mean, that is, H0 : µ = µ0 is true. So let us decide

that we will reject H0 only when the observed value x̄ of X̄ lies outside this 99% CI [L1 , L2 ]. This is a
good test because there is a 99% chance that the observed value x̄ of X̄ lies in the interval [L1 , L2 ], and
hence only a 1% chance that it lies outside this interval. Indeed, we can suspect the truth of H0 in case
x̄ lies outside the 99% CI, and therefore accept Ha . The range of x̄ outside the 99% CI [L1 , L2 ] is the
critical region or rejection region of H0 , equivalently the acceptance region of Ha . So if the observed value
x̄ of X̄ lies outside the interval [L1 , L2 ], we accept Ha and reject H0 at the given los α = 0.01.
The end points L1 and L2 of the 99% CI [L1 , L2 ] are the critical values (x̄cr ) of the test statistic X̄.
Obviously, the probability of getting x̄ beyond x̄cr , that is, in the critical region, is α:

P (X̄ < L1 or X̄ > L2 ) = α

Notice that the observed value x̄ of X̄ may lie in the acceptance region of Ha , X̄ < L1 or X̄ > L2 ,
even when H0 is true. Of course, there is only a 1% chance of this situation. It means there can be an
error of 1% (a very low risk) in our decision of accepting Ha when actually H0 is true. This is the Type I
error, and it is under our control as it corresponds to the value α = 0.01 that we preset ourselves. This
α value is the level of significance (los) of our test. So accepting Ha at 1% los means we are 99% confident
in rejecting H0 .

P-value test
In the critical region test, we need to find the critical region explicitly. Alternatively, we can find the
probability of X̄ being at least as far from µ0 as its observed value x̄. This probability is called the
P-value of our test. So, for x̄ > µ0 ,

P-value = P(|X̄ − µ0 | > |x̄ − µ0 |) = 2P(X̄ > x̄).

If the P-value is less than α, then obviously either x̄ < L1 or x̄ > L2 . That means x̄ lies in the critical
region, and therefore we accept Ha in this case.

Type II error
In any case (critical region test or P-value test), it is very important to note that we do not accept
H0 in case our test does not allow us to accept Ha . Accepting H0 could be a very big risk when it is not
true. The error of accepting H0 when it is false is the Type II error, and its probability is denoted by β.
Suppose the true value of the population mean is µ1 instead of µ0 . Then the probability of x̄ lying in the
interval [µ0 − z0.005 σ0/√n, µ0 + z0.005 σ0/√n], when X̄ ∼ N(µ1 , σ0²/n), gives the value of β. So

β = P(µ0 − z0.005 σ0/√n ≤ X̄ ≤ µ0 + z0.005 σ0/√n), where X̄ ∼ N(µ1 , σ0²/n).

It implies that

β = P[(µ0 − z0.005 σ0/√n − µ1)/(σ0/√n) ≤ (X̄ − µ1)/(σ0/√n) ≤ (µ0 + z0.005 σ0/√n − µ1)/(σ0/√n)].

Notice that for converting X̄ to the standard normal variable, we have used the true population mean µ1 .
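
Note (computational sketch): The whole two-tailed procedure described above — the critical region test,
the P-value test, and the Type II error β at an assumed true mean µ1 — can be carried out in a few lines
of Python. The function name and inputs below are our own; the example call uses the sprinkler data
from the worked example appearing later in this section:

from scipy.stats import norm

def two_tailed_z_test(xbar, mu0, sigma0, n, alpha=0.01, mu1=None):
    se = sigma0 / n ** 0.5
    z = norm.ppf(1 - alpha / 2)                  # z_{alpha/2}
    L1, L2 = mu0 - z * se, mu0 + z * se          # acceptance interval for the sample mean
    reject = xbar < L1 or xbar > L2              # critical region test
    p_value = 2 * norm.sf(abs(xbar - mu0) / se)  # two-tailed P-value
    beta = None
    if mu1 is not None:                          # Type II error at true mean mu1
        beta = norm.cdf((L2 - mu1) / se) - norm.cdf((L1 - mu1) / se)
    return reject, p_value, beta

print(two_tailed_z_test(xbar=131.08, mu0=130, sigma0=1.5, n=9, alpha=0.01, mu1=131))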

Right tailed test


In the previous discussion, the test applied is two tailed because the acceptance region X̄ < L1 or X̄ > L2
of Ha lies in both the tails of the distribution of X̄. Now suppose the hypothesis statement reads:

H0 : µ ≤ µ0

Ha : µ > µ0

Then, we need to consider the 99% upper confidence bound (in case α = 0.01) of X̄, which is given by

U = µ0 + zα σ0/√n = µ0 + z0.01 σ0/√n.

Clearly, now the critical value is U and the critical region is X̄ > U , which lies in the right tail of the
distribution of X̄. So, if the observed value x̄ of X̄ lies in the critical region, that is, x̄ > U , then we
accept Ha at 1% los.
In this case, the P-value is given by

P-value = P (X̄ > x̄)

If P-value is less than α, then obviously x̄ > U . That means, x̄ lies in the critical region, and therefore
we accept Ha in this case.
Type I and Type II errors, in case of the right tailed test, are given by

α = P(X̄ > U),

β = P[(X̄ − µ1)/(σ0/√n) ≤ (µ0 + z0.01 σ0/√n − µ1)/(σ0/√n)].

Left tailed test


It is similar to the right tailed test. Suppose the hypothesis statement reads:

H0 : µ ≥ µ0

Ha : µ < µ0
Then, we need to consider the 99% lower confidence bound (in case α = 0.01) of X̄, which is given by

L = µ0 − zα σ0/√n = µ0 − z0.01 σ0/√n.

Clearly, now the critical value is L and the critical region is X̄ < L, which lies in the left tail of the
distribution of X̄. So, if the observed value x̄ of X̄ lies in the critical region, that is, x̄ < L, then we
accept Ha at 1% los.
In this case, the P-value is given by

P-value = P (X̄ < x̄)

If P-value is less than α, then obviously x̄ < L. That means, x̄ lies in the critical region, and so Ha is
accepted in this case.
Type I and Type II errors, in case of the left tailed test, are given by

α = P(X̄ < L),

β = P[(X̄ − µ1)/(σ0/√n) ≥ (µ0 − z0.01 σ0/√n − µ1)/(σ0/√n)].
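
Note (computational sketch): The right and left tailed tests differ from the two-tailed test only in the
critical value and the tail probability used. A hedged Python sketch (our own helper, with illustrative
inputs taken from the mileage example later in this section):

from scipy.stats import norm

def one_tailed_z_test(xbar, mu0, sigma0, n, alpha=0.01, tail="right"):
    se = sigma0 / n ** 0.5
    z = norm.ppf(1 - alpha)                          # z_alpha
    if tail == "right":
        U = mu0 + z * se                             # upper confidence bound
        return xbar > U, norm.sf((xbar - mu0) / se)  # (accept Ha?, P-value)
    L = mu0 - z * se                                 # left tail: lower confidence bound
    return xbar < L, norm.cdf((xbar - mu0) / se)

print(one_tailed_z_test(xbar=28.04, mu0=26, sigma0=5, n=36, alpha=0.05, tail="right"))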

5.4.2 List of some useful test statistics (two tailed test)


Here we list some useful test statistics for the two tailed test, which are directly used in hypothesis
testing problems.
1. If the hypothesis is about the mean of the population, then we use the random sample mean X̄ as
the test statistic. For applying the critical region test, we use one of the following 100(1 − α)% CIs
of X̄, depending upon the nature of the population and the size of the sample.

(i) If the random sample of size n is chosen from a normal population with mean µ0 and variance
σ0², or a non-normal population with mean µ0 , variance σ0² and n ≥ 30, then X̄ ∼ N(µ0 , σ0²/n),
and the 100(1 − α)% CI of X̄ is

[L1 , L2 ] = [µ0 − zα/2 σ0/√n, µ0 + zα/2 σ0/√n].

(ii) If the random sample of size n is chosen from a normal population with mean µ0 , or from a
non-normal population with mean µ0 and n ≥ 30, and the population variance is unknown, then
X̄ ∼ N(µ0 , s0²/n) approximately, and the 100(1 − α)% CI of X̄ is

[L1 , L2 ] = [µ0 − zα/2 s0/√n, µ0 + zα/2 s0/√n],

where s0 is the observed value of the sample S.D. S from the given or chosen sample.
(iii) If the random sample of size n < 30 is chosen from a normal population with mean µ0 and
unknown variance, then (X̄ − µ0)/(S/√n) ∼ Tn−1 , and the 100(1 − α)% CI of X̄ is

[L1 , L2 ] = [µ0 − tα/2,n−1 s0/√n, µ0 + tα/2,n−1 s0/√n],

where s0 is the observed value of the sample S.D. S from the given or chosen sample.

2. If the hypothesis is about the difference of means of two populations, then we use the difference of
the random sample means X̄1 − X̄2 as the test statistic. In case two random samples of sizes n1
and n2 are selected from two normal populations with distributions N(µ1 , σ1²) and N(µ2 , σ2²), then
X̄1 − X̄2 ∼ N(µ1 − µ2 , σ1²/n1 + σ2²/n2 ), and the 100(1 − α)% CI of X̄1 − X̄2 is

[L1 , L2 ] = [µ1 − µ2 − zα/2 √(σ1²/n1 + σ2²/n2), µ1 − µ2 + zα/2 √(σ1²/n1 + σ2²/n2)].

3. If the hypothesis is about the population proportion, then we use the random sample proportion
P̂ as the test statistic. In case the random sample of size n is chosen from a Binomial population
with proportion p such that np ≥ 10, np(1 − p) ≥ 10, then P̂ ∼ N(p, p(1 − p)/n) and the 100(1 − α)%
CI of P̂ is

[L1 , L2 ] = [p − zα/2 √(p(1 − p)/n), p + zα/2 √(p(1 − p)/n)].

4. If the hypothesis is about the difference of proportions of two populations, then we use the difference
of the random sample proportions P̂1 − P̂2 as the test statistic. In case two random samples of sizes
n1 and n2 are selected from two Binomial populations with proportions p1 and p2 such that
n1 p1 ≥ 10, n1 p1 (1 − p1 ) ≥ 10 and n2 p2 ≥ 10, n2 p2 (1 − p2 ) ≥ 10, then
P̂1 − P̂2 ∼ N(p1 − p2 , p1 (1 − p1 )/n1 + p2 (1 − p2 )/n2 ) and the 100(1 − α)% CI of P̂1 − P̂2 is

[L1 , L2 ] = [p1 − p2 − zα/2 √(p1(1 − p1)/n1 + p2(1 − p2)/n2), p1 − p2 + zα/2 √(p1(1 − p1)/n1 + p2(1 − p2)/n2)].

5. If the hypothesis is about the population variance, then we use the random sample variance S² as
the test statistic. In case the random sample of size n is chosen from a normal population with
variance σ0², then (n − 1)S²/σ0² follows the χ² distribution with n − 1 dof, and the 100(1 − α)%
CI of S² is

[L1 , L2 ] = [χ²1−α/2 σ0²/(n − 1), χ²α/2 σ0²/(n − 1)].

It is straightforward to write the one-sided upper or lower confidence bounds for the above statistics
in case we need to apply the right or left tailed test (a sketch for the variance case follows).
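
Note (computational sketch): Item 5 above has no worked example in these notes, so here is a hedged
Python sketch of a two-tailed critical region test for the variance, based on (n − 1)S²/σ0² ∼ χ² with
n − 1 dof; the numeric inputs are made-up illustrations:

from scipy.stats import chi2

def variance_two_tailed_test(s2_obs, sigma0_sq, n, alpha=0.05):
    dof = n - 1
    L1 = chi2.ppf(alpha / 2, dof) * sigma0_sq / dof        # lower critical value of S^2
    L2 = chi2.ppf(1 - alpha / 2, dof) * sigma0_sq / dof    # upper critical value of S^2
    return (L1, L2), (s2_obs < L1 or s2_obs > L2)          # (interval, reject H0?)

print(variance_two_tailed_test(s2_obs=1.8e-6, sigma0_sq=1.0e-6, n=20))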

Ex. A manufacturer of sprinkler systems used for fire protection in office buildings claims that the true
average system-activation temperature is 130°F. A sample of n = 9 systems, when tested, yields a sample
average activation temperature of 131.08°F. If the distribution of activation temperatures is normal with
standard deviation 1.5°F, do the data contradict the manufacturer’s claim at significance level α = 0.01?
Also, calculate the Type II error in case the true average system-activation temperature is 131°F.

Sol. Here we need to test the hypotheses

H0 : µ = 130

Ha : µ ≠ 130 (a departure from the claimed value in either direction is of concern)


It is about the population mean. So the appropriate test statistic is the sample mean X̄. The given
population is normal with mean µ0 = 130 and S.D. σ0 = 1.5. Also, the sample size is n = 9. So
X̄ ∼ N(µ0 , σ0²/n) = N(130, (1.5)²/9). The given los is α = 0.01.

Critical region test: Since α = 0.01, we find the 99% CI of X̄, which reads

µ0 ± z0.005 σ0/√n = 130 ± 2.58(1.5/√9) = [128.71, 131.29].

The observed value 131.08 of X̄ lies inside this interval. So we are unable to reject H0 at 0.01 los. Hence,
the data do not contradict the manufacturer’s claim at significance level α = 0.01.

P-value test: We find that

P-value = 2P[X̄ ≥ 131.08] = 2P[Z ≥ (131.08 − 130)/(1.5/√9)] = 2P[Z ≥ 2.16]
= 2(1 − P[Z ≤ 2.16]) = 2(1 − 0.9846) = 0.0308.

It is greater than α = 0.01. So H0 cannot be rejected at significance level 0.01. Hence, the data do
not give strong support to the claim that the true average differs from the design value of 130.

Type II error: Here µ1 = 131, so

β = P[(µ0 − z0.005 σ0/√n − µ1)/(σ0/√n) ≤ (X̄ − µ1)/(σ0/√n) ≤ (µ0 + z0.005 σ0/√n − µ1)/(σ0/√n)]

= P[(128.71 − 131)/(1.5/√9) ≤ Z ≤ (131.29 − 131)/(1.5/√9)] = P(−4.58 ≤ Z ≤ 0.58) ≈ P(Z ≤ 0.58) = 0.719.
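
Note (computational sketch): The numbers in this example can be reproduced in Python (assuming
scipy is available); the exact z0.005 is about 2.576, which the notes round to 2.58:

from scipy.stats import norm

mu0, sigma0, n, xbar, alpha, mu1 = 130, 1.5, 9, 131.08, 0.01, 131
se = sigma0 / n ** 0.5
z = norm.ppf(1 - alpha / 2)                   # ~2.576
L1, L2 = mu0 - z * se, mu0 + z * se
print(L1, L2)                                 # ~[128.71, 131.29]; 131.08 lies inside
print(2 * norm.sf((xbar - mu0) / se))         # P-value ~0.031
print(norm.cdf((L2 - mu1) / se) - norm.cdf((L1 - mu1) / se))  # beta ~0.72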
Ex. Automotive engineers are using more and more aluminium in manufacturing automobiles in the hope
of reducing cost and improving petrol mileage. For a particular model, the mileage on the highway
has a mean of 26 kmpl with a standard deviation of 5 kmpl. It is hoped that a new design manufactured
using more aluminium will increase the mean petrol mileage on the highway while maintaining the
standard deviation of 5 kmpl. To test this hypothesis, 36 vehicles with the new design are tested on the
highway and the mean petrol mileage is found to be 28.04 kmpl. Find the P-value, interpret it, and decide
whether the sample data support the hypothesis of an increase in the mean petrol mileage at the 5% level
of significance.

Sol. We test the hypothesis

H0 : µ ≤ 26

Ha : µ > 26 (the new design increases the petrol mileage on the highway)

It is about the population mean. So the appropriate test statistic is the sample mean X̄. The given
population has mean µ0 = 26 and S.D. σ0 = 5, and the sample size is n = 36 ≥ 30. So, approximately,
X̄ ∼ N(µ0 , σ0²/n) = N(26, (5)²/36). The given los is α = 0.05. Note that here we need to apply the
right tailed test.

Critical region test: Since α = 0.05, we find the 95% upper confidence bound of X̄, which reads

µ0 + z0.05 σ0/√n = 26 + 1.65(5/√36) = 27.37.

The observed value 28.04 of X̄ is greater than 27.37, and hence lies in the critical region. So we reject H0
at 0.05 los. Hence, the sample data support the hypothesis of an increase in the mean petrol mileage at
the 5% level of significance.

P-value test: We find that

P-value = P[X̄ ≥ 28.04] = P[Z ≥ (28.04 − 26)/(5/√36)] = P[Z ≥ 2.45] = 1 − P[Z ≤ 2.45] = 1 − 0.9929 = 0.0071.

It is less than α = 0.05. So H0 can be rejected at significance level 0.05. Hence, the sample data
support the hypothesis of an increase in the mean petrol mileage at the 5% level of significance.

Interpretation of the P-value: There are two explanations of this very small probability 0.0071. First,
the null hypothesis H0 is true and we have observed a very rare sample that by chance has a large mean.
Second, the new design with more aluminium has, in fact, resulted in a higher mean petrol mileage. We
prefer the second explanation as it supports our research hypothesis Ha . That is, we shall reject H0 and
report that P-value of our test is 0.0071. Also, the prestated level of significance is α = 0.05. We can
safely reject H0 at this level of significance since the P-value 0.0071 of our test is less than α = 0.05.
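
Note (computational sketch): A quick Python check of this right tailed test (the exact z0.05 is about
1.645; the notes use 1.65):

from scipy.stats import norm

mu0, sigma0, n, xbar, alpha = 26, 5, 36, 28.04, 0.05
se = sigma0 / n ** 0.5
U = mu0 + norm.ppf(1 - alpha) * se    # upper confidence bound ~27.37
print(U, xbar > U)                    # 28.04 > 27.37: x-bar lies in the critical region
print(norm.sf((xbar - mu0) / se))     # P-value ~0.0071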

Ex. The maximum acceptable level for exposure to microwave radiation in Mumbai is an average of 10
microwatts per square centimeter (assume normality). It is feared that a large television transmitter may
be polluting the air nearby by pushing the level of microwave radiation above the safe limit. A sample of
25 readings gives an average of 10.3 microwatts per square centimeter with variance 4. Test the hypothesis
whether the transmitter pushes the level of radiation beyond the safe limit at α = 0.1. Also, verify your
decision with the P-value.

Sol. Here we need to test the hypotheses:

H0 : µ ≤ 10

Ha : µ > 10 (unsafe).
It is about the population mean. So the appropriate test statistic is the sample mean X̄. The given
population is normal with mean µ0 = 10. The sample size is n = 25, which is less than 30. The sample
S.D. is s0 = 2 (variance 4), while the S.D. of the population is unknown. So (X̄ − µ0)/(S/√n) ∼ Tn−1 ,
that is, (X̄ − 10)/(2/√25) ∼ T24 . The given los is α = 0.1. Note that here we need to apply the right
tailed test.

Critical region test: Since α = 0.1, we find the 90% upper confidence bound of X̄, which reads

µ0 + t0.1,24 s0/√n = 10 + 1.318(2/√25) = 10.527.

The observed value 10.3 of X̄ is less than 10.527, and hence does not lie in the critical region. So we are
unable to reject H0 at 0.1 los, and conclude that the observed data do not support the contention that
the transmitter is forcing the average microwave level above the safe limit.

P-value test: We find that

P-value = P[X̄ ≥ 10.3] = P[T24 ≥ (10.3 − 10)/(2/√25)] = P[T24 ≥ 0.75] = 0.23.

It is greater than α = 0.1. So H0 can not be rejected at significance level 0.1. Hence, the observed
data do not support the contention that the transmitter is forcing the average microwave level above
the safe limit.

Note: Here, the P-value P[T24 ≥ 0.75] = 0.23 is calculated by using the online calculator SticiGui. In the
T-distribution table in the textbook, this value is not directly available. But you can still get an idea of
the P-value by looking at the two adjacent values in the T-distribution table. We see that P[T24 >
0.685] = 1 − P[T24 ≤ 0.685] = 1 − 0.75 = 0.25, and P[T24 > 1.318] = 1 − P[T24 ≤ 1.318] = 1 − 0.9 = 0.1.
That means the P-value P[T24 ≥ 0.75] = 0.23 lies between 0.1 and 0.25. So the P-value is greater than
α = 0.1, and therefore H0 can not be rejected.
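
Note (computational sketch): Instead of bracketing the P-value between two table entries, the t
probabilities can be computed directly, which is what online calculators like SticiGui do. A Python
sketch using scipy:

from scipy.stats import t

mu0, s0, n, xbar, alpha = 10, 2, 25, 10.3, 0.1
se = s0 / n ** 0.5
df = n - 1                            # 24 degrees of freedom
U = mu0 + t.ppf(1 - alpha, df) * se   # ~10.527 (t_{0.1,24} = 1.318)
print(U, xbar > U)                    # 10.3 < 10.527: fail to reject H0
print(t.sf((xbar - mu0) / se, df))    # exact P-value ~0.23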

Ex. Transport authorities believe that night accidents happen on a particular highway due to improper
reflective signs put by the engineers. On the other hand, highway engineers claim that the reflective
highway signs do not perform properly because more than 30% of the vehicles on the road have misaimed
headlights. If this contention is supported statistically, a tougher inspection program will be put into
operation by the transport authorities. For this purpose, the transport authorities randomly selected
100 vehicles and found 40 vehicles with misaimed headlights. Test whether this sample data support the
engineers’ claim at the 5% level of significance.

Sol. Let p denote the proportion of vehicles with misaimed headlights. Since the engineers wish to
support p > 0.3, the null hypothesis H0 and the research hypothesis Ha are

H0 : p ≤ 0.3

Ha : p > 0.3
It is about the population proportion. So the appropriate test statistic is the sample proportion P̂ .
The given population is Binomial with null value p = 0.3. The sample size is n = 100, with np = 30 > 10
and np(1 − p) = 21 > 10. So P̂ ∼ N(p, p(1 − p)/n) = N(0.3, 0.0021). The given los is α = 0.05. Note
that here we need to apply the right tailed test.

Critical region test: Since α = 0.05, we find the 95% upper confidence bound of P̂ , which reads

p + z0.05 √(p(1 − p)/n) = 0.3 + 1.65√0.0021 = 0.376.

The observed sample proportion 40/100 = 0.4 is greater than 0.376, and hence lies in the critical region. So
we reject H0 at 0.05 los. Thus, the sample data support the engineers’ claim at the 5% level of significance.

P-value test: We find that

P-value = P[P̂ ≥ 0.4] = P[Z ≥ (0.4 − 0.3)/√0.0021] = P[Z ≥ 2.18] = 0.015.

It is less than α = 0.05. So H0 can be rejected at significance level 0.05. Hence, the sample data
support the engineers’ claim at the 5% level of significance.
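
Note (computational sketch): A Python check of this large-sample proportion test:

from scipy.stats import norm

p0, n, x, alpha = 0.3, 100, 40, 0.05
se = (p0 * (1 - p0) / n) ** 0.5       # sqrt(0.0021)
U = p0 + norm.ppf(1 - alpha) * se     # upper confidence bound ~0.376
p_hat = x / n
print(U, p_hat > U)                   # 0.4 > 0.376: reject H0
print(norm.sf((p_hat - p0) / se))     # P-value ~0.015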

Note: In the above example, the conditions np ≥ 10 and np(1 − p) ≥ 10 are satisfied. In case np < 10,
the normal approximation of the Binomial population is not applicable, but we can proceed as explained
in the following example.

Ex. Transport authorities believe that night accidents happen on a particular highway due to improper
reflective signs put by the engineers. On the other hand, highway engineers claim that the reflective
highway signs do not perform properly because more than 30% of the vehicles on the road have misaimed
headlights. If this contention is supported statistically, a tougher inspection program will be put into
operation by the transport authorities. For this purpose, the transport authorities randomly selected
15 vehicles and found 9 vehicles with misaimed headlights. Test whether this sample data support the
engineers’ claim at the 5% level of significance. Describe type-II errors of the test.

Sol. Let p denote the proportion of vehicles with misaimed headlights. Since the engineers wish to
support p > 0.3, the null hypothesis H0 and the research hypothesis Ha are

H0 : p ≤ 0.3

Ha : p > 0.3
Let X be the random variable denoting the number of vehicles with misaimed headlights. Then, in a
sample of 15 vehicles, X can take the values 0, 1, 2, ..., 15. Obviously, X is a binomial random variable
with n = 15 and p = 0.3, that is, X ∼ B(n = 15, p = 0.3). So X is our test statistic, and its observed
value is given as 9. Now we need to find the critical region of X at the 5% level of significance given that
X ∼ B(n = 15, p = 0.3). From the binomial distribution cdf table, we find
P[X ≥ 8 : X ∼ B(n = 15, p = 0.3)] = 1 − P[X ≤ 7 : X ∼ B(n = 15, p = 0.3)] = 1 − 0.95 = 0.05.
This shows that the critical region C for the test statistic X at the 5% level of significance is
C = {8, 9, 10, 11, 12, 13, 14, 15}.
Since the observed value 9 of X lies in this critical region, we accept Ha , that is, the sample data
support the engineers’ claim at the 5% level of significance.

Also, we find
P-value = P [X ≥ 9 : X ∼ B(n = 15, p = 0.3)]
= 1 − P [X ≤ 8 : X ∼ B(n = 15, p = 0.3)] = 1 − 0.9848 = 0.0152,
which is less than α = 0.05. Thus, the sample data support the engineers’ claim at the 5% level of
significance.

Calculating Type-II error: In the above test, we have set α = 0.05. It implies that 5% of the
experiments consisting of inspection of 15 vehicles would result in incorrect rejection of H0 when it is
true; that is, the probability of committing a Type I error is α = 0.05.
In contrast to α, there is not a single β. Instead, there is a different β for each different p that exceeds
0.3. Thus there is a value of β for p = 0.4 (in which case X ∼ B(15, 0.4)), another value of β for p = 0.45,
and so on. So to calculate the Type II error probability, we need to specify a value of p. For example,
β(p = 0.4) = P [Type II error when p = 0.4]
= P [H0 not rejected when it is not true because p = 0.4]
= P [X ≤ 7 : X ∼ B(n = 15, p = 0.4)] = 0.7869 ≈ 0.79
It means, when p is actually 0.4 rather than 0.3 (a “small” departure from H0 ), roughly 79% of all
experiments of this type would result in H0 being incorrectly not rejected.
Also, notice that β is large because it pertains to the complement of the small rejection region of
probability α. So accepting H0 is a big risk when it is not true. That is why we prefer the phrase “fail to
reject H0 ” rather than “accept H0 ” when the sample evidence is not enough to accept Ha . In simple
words, rejecting H0 when it is actually true carries a small preset or known risk α, but accepting H0
when it is not true could carry a very big unknown risk β.
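
Note (computational sketch): The exact binomial test above, including the search for the critical region
and the β calculation, can be done with the binomial cdf instead of a table. A Python sketch:

from scipy.stats import binom

n, p0, x_obs, alpha = 15, 0.3, 9, 0.05
# smallest c with P[X >= c : p = p0] <= alpha; the critical region is then C = {c, ..., n}
c = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
print(c)                              # 8, so C = {8, 9, ..., 15}
print(binom.sf(x_obs - 1, n, p0))     # P-value = P[X >= 9 : p = 0.3] ~0.0152
print(binom.cdf(c - 1, n, 0.4))       # beta(p = 0.4) = P[X <= 7 : p = 0.4] ~0.787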

5.4.3 More about P-value
Suppose we want to test

H0 : p ≤ 0.1

Ha : p > 0.1

based on a sample of size 20. Let the test statistic be X, the number of successes observed in 20
trials. If p = 0.1, the null value of p, then X follows a binomial distribution with mean E[X] = 20(0.1) = 2.
So values of X somewhat greater than 2 will lead to the rejection of the null hypothesis. Suppose we want α
to be very small, say 0.0001. From the binomial probability distribution table, we have

P [X ≥ 9 : p = 0.1] = 1 − P [X ≤ 8 : p = 0.1] = 1 − 0.9999 = 0.0001.

So the critical region of the test is C = {9, 10, ..., 20}. Now suppose we conduct the test and observe 8
successes. It does not fall into C. So, via our rigid rule of hypothesis testing, we are unable to reject H0 .
However, a little thought should make us a bit uneasy with this decision. We find

P [X ≥ 8 : p = 0.1] = 1 − P [X ≤ 7 : p = 0.1] = 1 − 0.9996 = 0.0004.

It means we are willing to tolerate 1 chance in 10000 of making a Type I error, but we declare 4
chances in 10000 of making such an error too large a risk. There is so little difference between these
probabilities that it seems a bit silly to insist on our original cutoff value of 9.
Such a problem can be avoided by adopting a technique known as significance testing, where we do not
preset α and hence do not specify a rigid critical region. Rather, we evaluate the test statistic and then
determine the probability of observing a value of the test statistic at least as extreme as the value noted,
under the assumption θ = θ0 . This probability is known as the critical level or descriptive level of
significance or P-value of the test. We reject H0 if we consider this P-value to be small. In case an α level
has been preset to ensure that a traditional or industry maximum acceptable level is met, we compare the
P-value with the preset α value. If P-value ≤ α, then we can reject the null hypothesis at least at the
stated level of significance.
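
Note (computational sketch): The two tail probabilities compared above come straight from the
binomial cdf; in Python:

from scipy.stats import binom

n, p0 = 20, 0.1
print(binom.sf(8, n, p0))    # P[X >= 9 : p = 0.1] ~0.0001
print(binom.sf(7, n, p0))    # P[X >= 8 : p = 0.1] ~0.0004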
