STAT2602B Topic 4 With Exercise Suggested Solution


The University of Hong Kong

Department of Statistics and Actuarial Science


STAT2602B Probability and Statistics II
Semester 2 2023/2024
Topic 4: Interval Estimation: One Population

1 Interval Estimation
A point estimate of a parameter provides no information about the accuracy of the
estimate and can sometimes be misleading. It is therefore desirable to construct a
narrow interval that covers the unknown parameter with a high probability (confidence).

Definition 4.1. Suppose that θ̂ is an estimator of θ based on a random sample
from a population with parameter θ. If θ̂1 and θ̂2 are constructed such that
$$P\left(\hat{\theta}_1 \le \theta \le \hat{\theta}_2\right) = 1 - \alpha \quad \text{for some } 0 < \alpha < 1,$$
then the random interval $[\hat{\theta}_1, \hat{\theta}_2]$ is an interval estimator of θ with specified
probability (1 − α).

Remarks:
- $[\hat{\theta}_1, \hat{\theta}_2]$ is called a 100(1 − α)% confidence interval for θ.
- (1 − α) is the confidence level, confidence coefficient or degree of confidence.
- θ̂1 is a lower 100(1 − α)% confidence limit (bound).
- θ̂2 is an upper 100(1 − α)% confidence limit (bound).
- The open interval (θ̂1, θ̂2) is also referred to as a 100(1 − α)% confidence interval.

For example, when α = 0.05, the confidence level is 0.95 and we get a 95% confidence
interval. It should be understood that, like point estimates, interval estimates of a
given parameter are not unique. Methods of interval estimation are judged by their
various statistical properties. For instance, one desirable property is to have the
width of a 100(1 − α)% confidence interval as narrow as possible.

2 Estimation of the Mean of a Normal Population


In this section, we assume that the population is normally distributed with mean µ
and variance σ².

STAT2602B TST23/24 Topic 4

Theorem 4.1. If x̄ is the value of the mean of a random sample of size n from a
normal population with known variance σ², a 100(1 − α)% confidence interval for
the population mean µ is
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right].$$

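Proof. For a random sample of size n, $\bar{X} \sim N(\mu, \sigma^2/n)$. Then
$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1^2),$$
so that
$$1 - \alpha = P\left(-z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = P\left(\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right),$$
where $z_\alpha$ is such that
$$P(Z \ge z_\alpha) = \alpha.$$
Hence, we conclude that when σ is known,
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right]$$
is a 100(1 − α)% confidence interval for µ.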


Remarks: For a standard normal random variable Z, some values of zα can be found
in Table 3 or the last row of Table 2 of the book Modern Fundamental Statistical
Tables. Other values can be found through Table 1 of the book Modern Fundamental
Statistical Tables. For example, z0.005 ≈ 2.576 (not 2.575 as stated by some secondary
school textbooks) and z0.015 ≈ 2.17.

Figure 1: Quantile of standard normal distribution.
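As a sketch (not part of the original notes), the table values of zα quoted in the remark above can be checked with Python's standard library, assuming Python 3.8+ for `statistics.NormalDist`. The helper name `z_upper` is hypothetical. It returns the upper-tail point zα defined by P(Z ≥ zα) = α, which is the (1 − α) quantile of N(0, 1).

```python
from statistics import NormalDist


def z_upper(alpha: float) -> float:
    """Hypothetical helper: z_alpha with P(Z >= z_alpha) = alpha for Z ~ N(0, 1)."""
    # The upper-tail point z_alpha is the (1 - alpha) quantile of N(0, 1).
    return NormalDist().inv_cdf(1 - alpha)


print(round(z_upper(0.005), 3))  # 2.576
print(round(z_upper(0.015), 2))  # 2.17
print(round(z_upper(0.05), 3))   # 1.645
```

Rounded to three decimals, z0.005 comes out as 2.576, which agrees with the remark rather than with 2.575.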

Example 4.1. A publishing company has just published a new college textbook.
Before the company decides the price of the book, it wants to know the average price
of all such textbooks in the market. The research department at the company took
a random sample of 36 such textbooks and collected information on their prices.
This information produced a mean price of $48.40 for this sample. It is known that
the standard deviation of the prices of all such textbooks is $4.50. Construct a 90%
confidence interval for the mean price of all such college textbooks assuming that
the underlying population has a normal distribution.

From the given information, n = 36, x̄ = 48.40 and σ = 4.50. Now 1 − α = 0.9,
which means α = 0.1 and zα/2 = z0.05 = 1.645. Hence, the 90% confidence interval for the
mean price of all such college textbooks is
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right] = \left[48.4 - 1.645 \times \frac{4.5}{\sqrt{36}},\ 48.4 + 1.645 \times \frac{4.5}{\sqrt{36}}\right] = [47.17, 49.63].$$
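Here is a minimal Python sketch of Theorem 4.1, applied to Example 4.1. The function name `z_interval` is hypothetical. It computes zα/2 with `statistics.NormalDist` instead of reading it from a table, so it uses 1.6449 rather than the rounded 1.645. The rounded endpoints still agree.

```python
import math
from statistics import NormalDist


def z_interval(xbar, sigma, n, conf):
    """Hypothetical helper: 100*conf% CI for a normal mean when sigma is known."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 4.1: n = 36, sample mean 48.40, sigma = 4.50, 90% confidence.
lo, hi = z_interval(48.40, 4.50, 36, 0.90)
print(f"[{lo:.2f}, {hi:.2f}]")  # [47.17, 49.63]
```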

Remarks: How do we interpret this result? Suppose we repeatedly draw samples, observe the
value of x̄ each time and construct a 90% confidence interval for µ from each sample.
We can then expect about 90% of these intervals to include µ and about 10% not to.
Once we observe x̄ = 48.4, we do not know whether the particular interval
[47.17, 49.63] includes the true value of µ, because µ is unknown. We can only say
that it either includes µ or does not. We cannot say that this particular interval
includes µ with probability 0.9. The numbers 47.17 and 49.63 and the parameter µ are all
non-random, so no probability statement applies to them.
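The repeated-sampling interpretation can be illustrated by simulation. The sketch below assumes a hypothetical true mean µ = 48 (any value would do). It draws many samples of size 36 from N(µ, 4.5²) and records the fraction of 90% intervals that contain µ. That fraction should be close to 0.90, although the exact value depends on the random seed.

```python
import math
import random
from statistics import NormalDist, fmean


def coverage(mu, sigma, n, conf, reps, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)                  # seeded for reproducibility
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = z * sigma / math.sqrt(n)      # same width for every sample
    hits = 0
    for _ in range(reps):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - half_width <= mu <= xbar + half_width:
            hits += 1
    return hits / reps


# mu = 48.0 is a hypothetical "true" mean chosen only for the simulation.
rate = coverage(mu=48.0, sigma=4.5, n=36, conf=0.90, reps=10_000)
print(rate)  # close to 0.90
```

Each individual interval either contains 48.0 or does not. The 90% describes the long-run proportion of intervals that do.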

Example 4.2. Suppose the bureau of the census and statistics of a city wants to
estimate the mean family annual income µ for all families in the city, where the
family annual income is known to have a normal distribution. It is known that
the standard deviation σ for the family annual income is 60 thousand dollars. How
large a random sample should the bureau select so that it can assert with probability
0.99 that the sample mean will differ from µ by no more than 5 thousand dollars?

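From the construction of a confidence interval, we have
$$1 - \alpha = P\left(-z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = P\left(|\bar{X} - \mu| < z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right),$$
where 1 − α = 0.99 (so that zα/2 = z0.005 ≈ 2.576) and σ = 60 thousand dollars. It suffices to have
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 5 \;\Rightarrow\; n \ge \left(\frac{60\,z_{\alpha/2}}{5}\right)^2 = \left(\frac{60 \times 2.576}{5}\right)^2 = 955.6.$$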

Hence, the sample size should be at least 956. Note that we round 955.6 up to the
next integer rather than to the nearest one. This is always the case when determining
a sample size, because any smaller n would fail to meet the required precision.
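The sample-size rule of Example 4.2 can be sketched in a few lines of Python. The name `n_for_mean` is hypothetical, and zα/2 is passed in as the table value 2.576. `math.ceil` implements the round-up rule.

```python
import math


def n_for_mean(sigma, margin, z):
    """Smallest integer n with z * sigma / sqrt(n) <= margin (hypothetical helper)."""
    # Solve for n, then always round up: rounding down would break the bound.
    return math.ceil((z * sigma / margin) ** 2)


print(n_for_mean(sigma=60, margin=5, z=2.576))  # 956
```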


Theorem 4.2. If x̄ and s are the values of the sample mean and the sample standard
deviation of a random sample of size n from a normal population, a 100(1 − α)%
confidence interval for the population mean µ is
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right].$$

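Proof. Now σ is unknown. Instead of σ, we make use of S and the fact that
$\frac{\bar{X} - \mu}{S/\sqrt{n}}$ has the t(n − 1) distribution if the population has a normal distribution. Therefore,
$$1 - \alpha = P\left(-t_{\alpha/2,n-1} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{\alpha/2,n-1}\right) = P\left(\bar{X} - t_{\alpha/2,n-1}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,n-1}\frac{S}{\sqrt{n}}\right),$$
where $t_{\alpha,n}$ satisfies
$$P(T \ge t_{\alpha,n}) = \alpha$$
for a random variable T having the t(n) distribution. Hence, we conclude that when σ is unknown,
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right]$$
is a 100(1 − α)% confidence interval for µ.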


Remarks: For a Student’s t random variable T , some values of tα,n can be found in
Table 2 of the book Modern Fundamental Statistical Tables.

Example 4.3. A paint manufacturer wants to determine the average drying time
of a new brand of interior wall paint. If for 12 test areas of equal size he obtained a
mean drying time of 66.3 minutes and a standard deviation of 8.4 minutes, construct
a 95% confidence interval for the true population mean assuming normality.

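Given n = 12, x̄ = 66.3, s = 8.4 and α = 1 − 0.95 = 0.05, we have tα/2,n−1 = t0.025,11 ≈ 2.201.
The 95% confidence interval for µ is
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right] = \left[66.3 - 2.201 \times \frac{8.4}{\sqrt{12}},\ 66.3 + 2.201 \times \frac{8.4}{\sqrt{12}}\right] = [60.96, 71.64].$$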
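Below is a sketch of Theorem 4.2 for Example 4.3. Python's standard library has no t quantile function, so this hypothetical `t_interval` helper takes tα/2,n−1 as an argument. Here that argument is the table value t0.025,11 ≈ 2.201.

```python
import math


def t_interval(xbar, s, n, t_crit):
    """CI for a normal mean with unknown sigma; t_crit = t_{alpha/2, n-1} from a table."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Example 4.3: n = 12, sample mean 66.3, s = 8.4, 95% confidence.
lo, hi = t_interval(66.3, 8.4, 12, t_crit=2.201)
print(f"[{lo:.2f}, {hi:.2f}]")  # [60.96, 71.64]
```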

Remarks: A t-distribution table usually has a row with ∞ degrees of freedom, which
actually shows values of zα. This is because, as n → ∞, the distribution function
of t(n) tends to that of N(0, 1²).


Example 4.4. For a random sample of 50 apprentice geologists, the sample mean
and the sample standard deviation of hourly wage of apprentice geologists employed
by the top 5 oil companies are 14.75 and 3.0, respectively. Construct a 95% confidence
interval for the mean hourly wage of apprentice geologists employed by the
top 5 oil companies if the population of hourly wage has a normal distribution.

Given x̄ = 14.75, s = 3.0, n = 50 and α = 1 − 0.95 = 0.05. Using the Excel function
=T.INV(0.975,49), we obtain t0.025,49 ≈ 2.010, so the 95% confidence interval for µ is
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right] = \left[14.75 - 2.010 \times \frac{3.0}{\sqrt{50}},\ 14.75 + 2.010 \times \frac{3.0}{\sqrt{50}}\right] = [13.90, 15.60].$$
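If neither Excel nor a t table is at hand, tα,n can be approximated numerically. The following sketch uses only the standard library and is an assumption of this rewrite, not the method in the notes. It integrates the t(n) density with Simpson's rule and solves P(T ≥ t) = α by bisection. The names `t_cdf` and `t_upper` are hypothetical. In practice, a library routine such as SciPy's `scipy.stats.t.ppf(0.975, 49)` plays the same role as `=T.INV(0.975,49)`.

```python
import math


def t_cdf(x, df, steps=2000):
    """Approximate P(T <= x) for T ~ t(df) by composite Simpson's rule on [0, |x|]."""
    # Normalising constant of the t(df) density, computed once per call.
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u):
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # approximately P(0 <= T <= |x|)
    # The t density is symmetric about 0.
    return 0.5 + area if x >= 0 else 0.5 - area


def t_upper(alpha, df, tol=1e-8):
    """t_{alpha, df}: the point with P(T >= t) = alpha, for 0 < alpha < 0.5."""
    lo, hi = 0.0, 1.0
    while 1 - t_cdf(hi, df) > alpha:  # widen until the root is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if 1 - t_cdf(mid, df) > alpha:
            lo = mid  # upper tail still too heavy: move right
        else:
            hi = mid
    return (lo + hi) / 2


# Example 4.4: n = 50, so df = 49.
t = t_upper(0.025, 49)
half_width = t * 3.0 / math.sqrt(50)
print(round(t, 3))  # 2.01
print(f"[{14.75 - half_width:.2f}, {14.75 + half_width:.2f}]")  # [13.90, 15.60]
```

Bisection is used because the upper-tail probability decreases monotonically in t. That makes the search simple and reliable, at the cost of a few dozen numerical integrations.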

3 Estimation of the Variance of a Normal Population
Theorem 4.3. If s² is the value of the sample variance of a random sample of size
n from a normal population, a 100(1 − α)% confidence interval for the population
variance σ² is
$$\left[\frac{n-1}{\chi^2_{\alpha/2,n-1}}s^2,\ \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}s^2\right],$$
where $\chi^2_{\alpha,n}$ is the real number satisfying
$$P\left(W \ge \chi^2_{\alpha,n}\right) = \alpha,$$
with W having the χ²(n) distribution.


Proof. Given a random sample of size n from a normal population, by Theorem 1.9,
we have
$$\frac{n-1}{\sigma^2}S^2 \sim \chi^2_{n-1}.$$
It follows that
$$P\left(\chi^2_{1-\alpha/2,n-1} \le \frac{n-1}{\sigma^2}S^2 \le \chi^2_{\alpha/2,n-1}\right) = 1 - \alpha,$$
where $\chi^2_{1-\alpha/2,n-1}$ and $\chi^2_{\alpha/2,n-1}$ are the left and the right quantiles of the chi-squared
distribution with (n − 1) degrees of freedom. Then,
$$P\left(\frac{n-1}{\chi^2_{\alpha/2,n-1}}S^2 \le \sigma^2 \le \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}S^2\right) = 1 - \alpha.$$
Hence,
$$\left[\frac{n-1}{\chi^2_{\alpha/2,n-1}}s^2,\ \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}s^2\right]$$
is a 100(1 − α)% confidence interval for σ².


Remarks: For a chi-squared random variable W , some values of χ2α,n can be found
in Table 4 of the book Modern Fundamental Statistical Tables.
Example 4.5. A machine is set up to fill packages of cookies. A recently taken random
sample of the weights of 25 packages from the production line gave a sample variance
of 2.9, where the weights are known to have a normal distribution. Construct a 95%
confidence interval for the standard deviation of the weight of a randomly selected
package from the production line.

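Given n = 25, s² = 2.9 and α = 0.05, the table gives χ²0.025,24 = 39.36 and χ²0.975,24 = 12.40.
The 95% confidence interval for the population variance is
$$\left[\frac{n-1}{\chi^2_{\alpha/2,n-1}}s^2,\ \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}s^2\right] = \left[\frac{25-1}{39.36} \times 2.9,\ \frac{25-1}{12.40} \times 2.9\right] = [1.768, 5.613].$$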

Remarks: Taking positive square roots, we obtain the 95% confidence interval for
the population standard deviation to be [1.330, 2.369].
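Here is a sketch of Theorem 4.3 for Example 4.5. The chi-squared quantiles are passed in from the table, because the standard library has no chi-squared quantile function. The helper name `variance_interval` is hypothetical. Taking square roots of both limits gives the interval for σ, as in the remark above.

```python
import math


def variance_interval(s2, n, chi2_right, chi2_left):
    """CI for sigma^2 of a normal population (hypothetical helper).

    chi2_right = chi^2_{alpha/2, n-1} (right quantile),
    chi2_left  = chi^2_{1-alpha/2, n-1} (left quantile), both from a table.
    """
    return (n - 1) * s2 / chi2_right, (n - 1) * s2 / chi2_left


# Example 4.5: n = 25, s^2 = 2.9, 95% confidence, df = 24.
lo, hi = variance_interval(2.9, 25, chi2_right=39.36, chi2_left=12.40)
print(f"variance: [{lo:.3f}, {hi:.3f}]")                        # [1.768, 5.613]
print(f"std dev:  [{math.sqrt(lo):.3f}, {math.sqrt(hi):.3f}]")  # [1.330, 2.369]
```

Note that the larger quantile produces the lower limit, because it appears in the denominator.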

4 Estimation of the Population Proportion


In many problems we must estimate proportions, probabilities, percentages or rates,
such as the proportion of defectives in a large shipment of transistors, the percentage
of school children with IQ over 115 or the mortality rate of a disease.
The population proportion, denoted by p, is obtained by taking the ratio of the
number of elements in a population with a specific characteristic to the total number
of elements in the population. The sample proportion, denoted by p̂, gives a similar
ratio of a sample.
In theory, we assume that the population follows a Bernoulli distribution with
unknown parameter p. Here p can be treated as the population proportion for a
specific characteristic. Note that p is also the population mean of a Bernoulli random
variable.
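Theorem 4.4. If p̂ is the sample proportion from a random sample of size n, an approximate
100(1 − α)% confidence interval for the population proportion p is
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$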

Proof. Suppose X1, X2, . . . , Xn constitute a random sample from the population.
Then,
$$P(X_i = 1) = p \quad \text{and} \quad P(X_i = 0) = 1 - p$$
for i = 1, 2, . . . , n. The sample mean $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is just the sample proportion p̂.
Then, p̂ is an unbiased point estimator of p. In addition, p̂ converges in probability
to p. We also know that
$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\bar{X} - p}{\sqrt{p(1-p)/n}}$$
follows approximately N(0, 1²) for large n by the central limit theorem. Therefore,
$$1 - \alpha \approx P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) = P\left(\hat{p} - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right).$$
Since the population proportion p in the variance term is unknown, we replace it
for simplicity by the sample proportion p̂ and conclude that
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right]$$
is an approximate 100(1 − α)% confidence interval for p.
Example 4.6. In a random sample, 136 of 400 persons given a flu vaccine experienced
some discomfort. Construct a 95% confidence interval for the true population
proportion of persons who will experience some discomfort from the vaccine.

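Given n = 400, p̂ = 136/400 = 0.34 and z0.025 = 1.960, the approximate 95%
confidence interval for p is
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right] = \left[0.34 - 1.960 \times \sqrt{\frac{0.34 \times (1 - 0.34)}{400}},\ 0.34 + 1.960 \times \sqrt{\frac{0.34 \times (1 - 0.34)}{400}}\right] = [0.2936, 0.3864].$$
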
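Below is a sketch of Theorem 4.4 applied to Example 4.6. The helper name `proportion_interval` is hypothetical. It computes the large-sample interval of the notes, in which p̂ replaces p in the standard error.

```python
import math
from statistics import NormalDist


def proportion_interval(successes, n, conf):
    """Approximate CI for a population proportion (hypothetical helper)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Example 4.6: 136 of 400 persons, 95% confidence.
lo, hi = proportion_interval(136, 400, 0.95)
print(f"[{lo:.4f}, {hi:.4f}]")  # [0.2936, 0.3864]
```
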
Example 4.7. The reaction of an individual to a stimulus in a psychology experiment
may take one of two forms, A or B. If an experimenter wishes to estimate
the probability p that a person will react in manner A, how many people must be
included in the experiment? Assume that the experimenter will be satisfied if the
estimation error is less than 0.04 with probability equal to 0.9. Assume also that he
expects p to lie somewhere in the neighbourhood of 0.6.

We know that
$$1 - \alpha \approx P\left(-z_{\alpha/2} < \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} < z_{\alpha/2}\right) = P\left(|\hat{p} - p| < z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}\right).$$
Hence, we require
$$z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} < 0.04, \quad \text{or} \quad n > p(1-p)\left(\frac{z_{\alpha/2}}{0.04}\right)^2,$$
where α = 1 − 0.9 = 0.1, so that zα/2 = z0.05 = 1.645. Since p is unknown, we use the guessed value
p = 0.6 provided by the experimenter. Then,
$$n > 0.6 \times (1 - 0.6) \times \left(\frac{1.645}{0.04}\right)^2 \approx 405.9.$$
Hence, the sample size should be at least 406.


Remarks: If we did not know that p ≈ 0.6, we would use p = 0.5, which yields
the maximum value of p(1 − p) because
$$p(1-p) = p - p^2 = \frac{1}{4} - \left(\frac{1}{2} - p\right)^2.$$
Therefore,
$$n > 0.5 \times (1 - 0.5) \times \left(\frac{1.645}{0.04}\right)^2 \approx 422.8,$$
and the final result will be n ≥ 423.
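The sample-size calculation of Example 4.7 and its remark can be sketched as follows. The name `n_for_proportion` is hypothetical, and zα/2 is the table value. The default guess p = 0.5 gives the conservative (largest) sample size.

```python
import math


def n_for_proportion(margin, z, p_guess=0.5):
    """Smallest integer n with z * sqrt(p(1 - p) / n) <= margin (hypothetical helper)."""
    # p_guess = 0.5 maximises p(1 - p), so it gives the most conservative n.
    return math.ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)


print(n_for_proportion(0.04, 1.645, p_guess=0.6))  # 406
print(n_for_proportion(0.04, 1.645))               # 423
```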

Topic 4 Summary

1 Confidence Intervals for Normal Population

(a) Mean µ with known variance σ².
A 100(1 − α)% confidence interval for µ is
$$\left[\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right].$$

(b) Mean µ with unknown variance σ².
A 100(1 − α)% confidence interval for µ is
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right].$$

(c) Variance σ².
A 100(1 − α)% confidence interval for σ² is
$$\left[\frac{n-1}{\chi^2_{\alpha/2,n-1}}s^2,\ \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}s^2\right].$$

2 Confidence Intervals for Non-normal Population
(a) Proportion p.
An approximate 100(1 − α)% confidence interval for p is
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$

Topic 4 Exercise

1. A team of efficiency experts intends to use the mean of a random sample of size
n = 150 to estimate the average mechanical aptitude of assembly-line workers
in a large industry. Based on experience, the efficiency experts can assume
that σ = 6.2 for such data, and the sample yields a mean of 12. Construct a 95%
confidence interval for µ, assuming the population is normally distributed.

2. A random sample of size 10 was selected, with the following results:

25, 28, 26, 29, 32, 22, 24, 26, 33, 30.

The population has the normal distribution but its standard deviation is un-
known. Construct a 90% confidence interval for the population mean.

3. A detergent packaging machine produces packets in 14 independent operations,
with the following quantities (in grams):

200 204 207 204 199 200 203
203 197 201 206 204 202 205

It is known that the population of packet weights has a normal distribution
with mean µ.

(a) Obtain an unbiased estimate of the population mean µ.


(b) Construct a 95% confidence interval for µ.

4. In 16 test runs, the gasoline consumption of an experimental engine has a
sample standard deviation of 2.2 gallons. Construct a 99% confidence interval for σ²,
which measures the true variability of the gasoline consumption of the engine.
Assume the gasoline consumption follows a normal distribution.

5. A manager wants to estimate the true proportion of customers who pay by
credit card. How large a random sample of customers should be taken so that
he is 95% confident that the maximum estimation error of his estimate will be
within 0.025?

Topic 4 Exercise Suggested Solution

1. A 95% confidence interval for µ is
$$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 12 \pm 1.96 \times \frac{6.2}{\sqrt{150}} = [11.01, 12.99].$$
Remarks: Suppose that this is a pilot study used to find the minimum sample size
for which the estimation error is not greater than 0.5 with a 99% degree of confidence.
To determine the minimum sample size, consider
$$z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le 0.5 \;\Rightarrow\; z_{0.01/2}\frac{6.2}{\sqrt{n}} \le 0.5 \;\Rightarrow\; 2.576 \times \frac{6.2}{\sqrt{n}} \le 0.5 \;\Rightarrow\; n \ge 1020.32.$$
Take n = 1021, since the sample size must be an integer and must be rounded up.

2. From the sample we obtain x̄ = 27.5, s = 3.535534 and n = 10. A 90%
confidence interval for µ is
$$\bar{x} \pm t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} = 27.5 \pm 1.833 \times \frac{3.535534}{\sqrt{10}} = [25.45, 29.55].$$
Look up the t-test table with a level of significance of 0.10 for two-tailed tests
(or 0.05 for one-tailed tests) and df = 9. This gives tα/2,n−1 = t0.05,9 = 1.833.

3. Let X be the weight of a detergent packet (in grams), where X ∼ N(µ, σ²). We
obtain x̄ = 202.5, s = 2.821620 and n = 14.

(a) An unbiased estimate of µ is the sample mean, µ̂ = x̄ = 202.5.

(b) A 95% confidence interval for µ is
$$\left[\bar{x} - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}\right] = 202.5 \pm 2.160 \times \frac{2.821620}{\sqrt{14}} = [200.9, 204.1].$$
Look up the t-test table with a level of significance of 0.05 for two-tailed tests
(or 0.025 for one-tailed tests) and df = 13. This gives tα/2,n−1 = t0.025,13 = 2.160.
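Exercises 2 and 3 start from raw data, so s has to be computed with divisor n − 1. The sketch below uses Python's `statistics` module, whose `stdev` is the sample standard deviation with divisor n − 1. The t values are taken from the table, as in the solutions, and `t_interval_from_data` is a hypothetical helper name.

```python
import math
import statistics


def t_interval_from_data(data, t_crit):
    """t-based CI for a normal mean from raw observations (hypothetical helper)."""
    n = len(data)
    xbar = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation, divisor n - 1
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


# Exercise 2: 90% confidence, t_{0.05,9} = 1.833.
ex2 = [25, 28, 26, 29, 32, 22, 24, 26, 33, 30]
lo2, hi2 = t_interval_from_data(ex2, t_crit=1.833)
print(f"[{lo2:.2f}, {hi2:.2f}]")  # [25.45, 29.55]

# Exercise 3: 95% confidence, t_{0.025,13} = 2.160.
ex3 = [200, 204, 207, 204, 199, 200, 203, 203, 197, 201, 206, 204, 202, 205]
lo3, hi3 = t_interval_from_data(ex3, t_crit=2.160)
print(f"[{lo3:.1f}, {hi3:.1f}]")  # [200.9, 204.1]
```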

4. We are given n = 16 and s = 2.2. Look up the χ²-test table with a level of
significance of 0.01 for two-tailed tests (or 0.005 for one-tailed tests) and df = 15.
This gives the left quantile
$$\chi^2_{1-\alpha/2,n-1} = \chi^2_{0.995,15} = 4.60$$
and the right quantile
$$\chi^2_{\alpha/2,n-1} = \chi^2_{0.005,15} = 32.80.$$
A 99% confidence interval for the population variance σ² is
$$\left[\frac{n-1}{\chi^2_{\alpha/2,n-1}}s^2,\ \frac{n-1}{\chi^2_{1-\alpha/2,n-1}}s^2\right] = \left[\frac{16-1}{32.80} \times 2.2^2,\ \frac{16-1}{4.60} \times 2.2^2\right] = [2.213, 15.78].$$

5. An approximate 95% confidence interval for p is
$$\left[\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right].$$
With probability 0.95, the estimation error is at most $z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. We require
the estimation error to be no more than 0.025. Then,
$$1.96 \times \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \le 0.025 \;\Rightarrow\; n \ge \left(\frac{1.96}{0.025} \times \sqrt{\hat{p}(1-\hat{p})}\right)^2 \;\Rightarrow\; n \ge \left(\frac{1.96}{0.025}\right)^2 \times \hat{p}(1-\hat{p})$$
$$\Rightarrow\; n \ge \left(\frac{1.96}{0.025}\right)^2 \times 0.5 \times (1 - 0.5) \;\Rightarrow\; n \ge 1536.64.$$
Since the sample size must be an integer, we take n = 1537.

Remarks: Since no prior information about p̂ is given, we use p̂ = 0.5, which makes
the estimation error as large as possible. Hence, the computed sample size is the most
conservative.
