2.7*_t-test 2

This document covers the use of the t-distribution for calculating confidence intervals and conducting hypothesis tests for means of normal distributions with unknown variances. It outlines objectives such as finding confidence intervals, conducting paired t-tests, and testing differences between means from independent samples. The document also explains the properties of the t-distribution, including degrees of freedom and critical values, along with practical examples.

7 Confidence intervals and tests using the t-distribution
Objectives
After completing this chapter you should be able to:
● Find a confidence interval for the mean of a normal distribution with
unknown variance ← pages 164–170
● Conduct a hypothesis test for the mean of a normal distribution with
unknown variance ← pages 170–174
● Carry out a paired t-test ← pages 174–179
● Find a confidence interval for the difference between means from two
independent normal distributions with equal but unknown variances ← pages 180–184
● Conduct a hypothesis test for the difference between means from two
independent normal distributions with equal but unknown variances ← pages 185–189

Prior knowledge check


1 A random sample of size 20 is taken from
a normally distributed population with a
standard deviation of 2. The mean of the
sample was 16.
Find a 95% confidence interval for the
mean μ. ← Section 5.2

2 A researcher is comparing the heights of children in two towns. A random sample
of 100 children from town A is taken and the sample mean and standard deviation
are 145 cm and 4 cm respectively. An independent random sample of 120
children from town B is taken and the sample mean and standard deviation are
146 cm and 3.5 cm respectively.
Test, at the 5% level of significance, whether there is evidence of a difference
in the mean heights of the children in the two towns. ← Section 5.3

Farmers often try out different diets to see which is the most effective at producing
high-yield animals. They can compare the effectiveness of two diets using a paired
t-test. → Mixed exercise Q13

163

M07_EDALVL_FS2_83381__U07_163-195.indd 163 05/07/2018 13:09


Chapter 7

7.1 Mean of a normal distribution with unknown variance


You know that for a normally distributed random variable X, the sample mean will also be distributed
normally:

$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$

This means that you can calculate a confidence interval for the population mean, μ, as long as
you know the population variance. In most cases, however, if you are taking a sample from a large
population you will not already know the population variance.

If the sample size, n, is large, then you can use the sample variance as an approximation of the
population variance.

Links For a large sample of size n from a normal population, $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ is approximately normal with
distribution $N(0, 1^2)$. ← Section 5.4

However, if n is small, S is unlikely to be very close to σ and $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ can no longer be
modelled by the normal distribution $N(0, 1^2)$.

When n is small we usually use the symbol t to denote the quantity $\frac{\bar{X} - \mu}{S/\sqrt{n}}$.

■ If a random sample X1, X2, … , Xn is selected from a normal distribution with mean μ and
unknown variance σ² then

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$

has a $t_{n-1}$-distribution where S² is an unbiased estimator of σ².

Links $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} X_i^2 - n\bar{X}^2\right)$ ← Section 5.1
There is a family of t-distributions determined by the value of n. This establishes the number of
degrees of freedom, similar to the chi-squared and F-distributions.

The number of degrees of freedom, ν, is equal to n − 1 and as ν → ∞, the t-distribution approaches
the distribution N(0, 1²). For this reason the t-distribution is usually used when the sample size, n, is
small. For larger sample sizes it is more convenient to approximate t with a normal distribution. The
diagram below shows two examples of the t-distribution for different values of ν, together with the
standardised normal distribution.
[Figure: the standard normal curve together with t-distribution curves for ν = 4 and ν = 1, plotted for t from −4 to 4]

Notation W. S. Gosset, who published his works under the pseudonym ‘Student’, first investigated
the probability distribution of $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ for a sample taken from a normal distribution. The resulting
distribution is known as ‘Student’s t-distribution’, or more commonly just the t-distribution.


As with the F-distribution and the χ² distribution, the critical values of the
t-distribution depend on the number of degrees of freedom. The table of values in the
formulae booklet, and on page 217, gives percentage points for the t-distribution for
certain values of ν up to 120.

Note The t-distribution is symmetric in the same way as the normal distribution.

The values in the table are those which a random variable with Student’s t-distribution on ν degrees
of freedom exceeds with the probability shown.
ν     0.10    0.05    0.025    0.01     0.005
1     3.078   6.314   12.706   31.821   63.657
2     1.886   2.920    4.303    6.965    9.925
3     1.638   2.353    3.182    4.541    5.841
4     1.533   2.132    2.776    3.747    4.604
5     1.476   2.015    2.571    3.365    4.032
6     1.440   1.943    2.447    3.143    3.707
7     1.415   1.895    2.365    2.998    3.499
8     1.397   1.860    2.306    2.896    3.355

For example, if X has the t7-distribution with 7 degrees of freedom (n = 8):
● P(X > 1.895) = 0.05
● P(X < 1.895) = 0.95
and by the symmetry of the t-distribution:
● P(X < −1.895) = 0.05

Note You may be able to use your calculator to find the percentage points for the
t-distribution rather than the tables.

[Figure: t-distribution with ν = 7, showing the upper-tail area 0.05 to the right of t = 1.895]

Watch out When working with the t-distribution you are strongly advised to
draw appropriate diagrams so that you are sure in your own mind which areas
under the t-distribution you are dealing with.

Online Explore the t-distribution using GeoGebra and use it to determine critical
values of the sample variances.
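If no tables or calculator function are to hand, percentage points can also be approximated numerically. The sketch below is one possible approach and is not taken from the text: it integrates the t-density with Simpson's rule and then searches for the value of t by bisection. The function names `t_density`, `upper_tail` and `percentage_point` are illustrative, and the strip counts and iteration limits are assumptions chosen to give roughly three-decimal accuracy.

```python
import math


def t_density(x: float, nu: int) -> float:
    """Probability density of the t-distribution with nu degrees of freedom."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + x * x / nu) ** (-(nu + 1) / 2)


def upper_tail(t: float, nu: int) -> float:
    """Approximate P(X > t) for t >= 0 using symmetry and Simpson's rule on [0, t]."""
    if t == 0:
        return 0.5
    n = 2 * max(100, int(20 * t))          # even number of strips (assumed fine enough)
    h = t / n
    total = t_density(0.0, nu) + t_density(t, nu)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * t_density(i * h, nu)
    return 0.5 - total * h / 3             # half the area lies above 0


def percentage_point(nu: int, p: float) -> float:
    """Approximate t such that P(X > t) = p, for 0 < p < 0.5, by bisection."""
    lo, hi = 0.0, 1.0
    while upper_tail(hi, nu) > p:          # widen the bracket until it contains t
        hi *= 2
    for _ in range(40):
        mid = (lo + hi) / 2
        if upper_tail(mid, nu) > p:
            lo = mid                       # tail still too large: move right
        else:
            hi = mid
    return (lo + hi) / 2


# Compare with the nu = 7 row of the table above.
print([round(percentage_point(7, p), 3) for p in (0.10, 0.05, 0.025, 0.01, 0.005)])
```

The same function can be used to see the approach to the normal distribution: for large ν the values it returns move towards the corresponding normal percentage points.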
Example 1
The random variable X has a t-distribution with 10 degrees of freedom. Determine values of t for
which:
a P(X > t) = 0.025
b P(X < t) = 0.95
c P(X < t) = 0.025
d P(|X| > t) = 0.05
e P(|X| < t) = 0.98

Notation |X| means the modulus of X. This is the absolute value of X ignoring the sign, for
example the modulus of −5, written |−5|, is 5. So
P(|X| > t) = P(X < −t) + P(X > t)
P(|X| < t) = P(−t < X < t)


a ν = 10
[Figure: t-distribution curve with area 0.025 in the upper tail, to the right of t]
You are looking for P(X > t) so you can use the tables directly. Look where the ν = 10 row
intersects with the 0.025 column.
t10(0.025) = 2.228

b [Figure: t-distribution curve with area 0.95 to the left of t]
The whole area under the curve = 1, so P(X > t) = 1 − P(X < t).
If P(X < t) = 0.95 then P(X > t) = 1 − 0.95 = 0.05
Look where the ν = 10 row intersects with the 0.05 column.
From the table t10(0.05) = 1.812

c [Figure: t-distribution curve with area 0.025 in the lower tail, to the left of t]
Because the distribution is symmetrical, P(X < −t) = P(X > t). You know from part a that
P(X > t) = 0.025 if t = 2.228, so P(X < t) = 0.025 when t = −2.228.

d [Figure: t-distribution curve with area 0.025 in each tail]
This is two-tailed with probability of 0.025 at each tail.
From a and c, P(|X| > t) = 0.05 when X < −2.228 or X > 2.228.
The two tails are bounded by −2.228 and 2.228, so t = 2.228.

e [Figure: t-distribution curve with area 0.98 between −t and t]
Again, a two-tailed problem. From the diagram you can see that you are looking for tails
each with probability 0.01.
P(X > t) = 0.01 if t = 2.764, and P(|X| < t) = 0.98 means −2.764 < X < 2.764.
The central region is bounded by −2.764 and 2.764, so t = 2.764.
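The conversions used in Example 1 (complement, symmetry, and splitting a probability between two tails) can be written out as small helper functions. This is only a sketch: the dictionary holds the three ν = 10 table values quoted in the example, and the helper names `upper`, `below`, `outside` and `inside` are illustrative. Each helper returns the positive or negative boundary value that would be read from the diagram.

```python
# Upper-tail percentage points for nu = 10, as quoted in Example 1:
# P(X > value) = probability.
T10_UPPER = {0.05: 1.812, 0.025: 2.228, 0.01: 2.764}


def upper(p: float) -> float:
    """t such that P(X > t) = p (direct table look-up)."""
    return T10_UPPER[round(p, 3)]          # rounding guards against float noise


def below(p: float) -> float:
    """t such that P(X < t) = p, using the complement (p > 0.5) or symmetry."""
    return upper(1 - p) if p > 0.5 else -upper(p)


def outside(p: float) -> float:
    """t such that P(|X| > t) = p: probability p/2 in each tail."""
    return upper(p / 2)


def inside(p: float) -> float:
    """t such that P(|X| < t) = p: probability (1 - p)/2 in each tail."""
    return upper((1 - p) / 2)


answers = {
    "a": upper(0.025),
    "b": below(0.95),
    "c": below(0.025),
    "d": outside(0.05),
    "e": inside(0.98),
}
print(answers)
```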


Example 2
The random variable Y has a $t_4$-distribution.
Determine:
a P(Y > 3.747) b P(Y < −2.132)

a ν = 4
From the ν = 4 row of the table you can see that 3.747 is in the 0.01 probability column.
P(Y > 3.747) = 0.01
b From the ν = 4 row of the table you can see that 2.132 is in the 0.05 probability column.
P(Y > 2.132) = 0.05
By symmetry P(Y < −2.132) = 0.05

You can use the t-distribution to find a confidence interval for the mean of a normal distribution
when the variance is unknown.
For a sample taken from a normal population with unknown variance,

$t = \frac{\bar{X} - \mu}{S/\sqrt{n}}$ has a $t_{n-1}$-distribution.

If you want to find a 95% confidence interval for the population mean, μ, then you start by finding a
value of t such that

$P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} > t\right) = 0.025$ and $P\left(\frac{\bar{X} - \mu}{S/\sqrt{n}} < -t\right) = 0.025$

This value of t is called the $t_{n-1}$ value for a probability of 0.025. You can find it using tables
or your calculator.

Notation This value is written as $t_{n-1}(0.025)$.

Thus

$P\left(-t_{n-1}(0.025) < \frac{\bar{X} - \mu}{S/\sqrt{n}} < t_{n-1}(0.025)\right) = (1 - 0.025) - 0.025 = 0.95$

Look at the inequality inside the bracket. You are interested in μ, so here you try to isolate it by:

1 multiplying by $\frac{S}{\sqrt{n}}$

$-t_{n-1}(0.025) \times \frac{S}{\sqrt{n}} < \bar{X} - \mu < t_{n-1}(0.025) \times \frac{S}{\sqrt{n}}$

2 subtracting $\bar{X}$

$-t_{n-1}(0.025) \times \frac{S}{\sqrt{n}} - \bar{X} < -\mu < t_{n-1}(0.025) \times \frac{S}{\sqrt{n}} - \bar{X}$

3 multiplying by −1 and altering the inequality.

$\bar{X} - t_{n-1}(0.025) \times \frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{n-1}(0.025) \times \frac{S}{\sqrt{n}}$

For a particular sample with mean $\bar{x}$ and variance s², this becomes:

$P\left(\bar{x} - t_{n-1}(0.025) \times \frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{n-1}(0.025) \times \frac{s}{\sqrt{n}}\right) = 0.95$

So for a small sample of size n from a normal distribution N(μ, σ²) with unknown mean and variance:
● the 95% confidence limits for μ are given by $\bar{x} \pm t_{n-1}(0.025) \times \frac{s}{\sqrt{n}}$
● the 95% confidence interval for μ is given by
$\left(\bar{x} - t_{n-1}(0.025) \times \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1}(0.025) \times \frac{s}{\sqrt{n}}\right)$

In the same way,
● the 90% confidence limits for μ are given by $\bar{x} \pm t_{n-1}(0.05) \times \frac{s}{\sqrt{n}}$
● the 90% confidence interval for μ is given by
$\left(\bar{x} - t_{n-1}(0.05) \times \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1}(0.05) \times \frac{s}{\sqrt{n}}\right)$

■ In general, for a small sample of size n from a normal distribution N(μ, σ²) with unknown
mean and variance:
• the 100(1 − α)% confidence limits for the population mean are
$\bar{x} \pm t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}}$
• the 100(1 − α)% confidence interval for the population mean is
$\left(\bar{x} - t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}}\right)$

Example 3
A sample of 6 trout taken from a fish farm were caught and their lengths in centimetres were
measured. The lengths of the fish were as follows:
26.8    26.0    25.8    25.5    24.3    24.6
Assuming that the lengths of trout are normally distributed, find a 90% confidence interval for the
mean length of trout in the fish farm.
First find the sample mean and variance.
Using a calculator gives $\bar{x} = 25.5$ and $s^2 = 0.8560$
The standard deviation of the sample can be found by taking the square root of the variance.
$s = \sqrt{0.8560} = 0.9252$
Put your values for $\bar{x}$ and s into the formula, and work out the confidence interval.
The 90% confidence limits for μ are
$\bar{x} \pm t_5(5\%) \times \frac{s}{\sqrt{n}} = 25.5 \pm 2.015 \times \frac{0.9252}{\sqrt{6}} = 25.5 \pm 0.761$
The 90% confidence interval is (24.739, 26.261)
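The calculation in Example 3 can be checked with a few lines of Python. This is a minimal sketch rather than a prescribed method: the function name `t_confidence_interval` is illustrative, and the critical value 2.015 is assumed to be read from the table (the ν = 5 row, 0.05 column) instead of being computed.

```python
import math
import statistics


def t_confidence_interval(sample, t_crit):
    """Return (lower, upper) limits x_bar -/+ t_crit * s / sqrt(n).

    t_crit is the table value t_{n-1}(alpha/2). It is passed in rather than
    computed, so it must match the sample size and the confidence level.
    """
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)            # divisor n - 1, i.e. the unbiased variance
    half_width = t_crit * s / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


trout = [26.8, 26.0, 25.8, 25.5, 24.3, 24.6]
lower, upper = t_confidence_interval(trout, t_crit=2.015)   # t_5(0.05) from the table
print(f"90% confidence interval: ({lower:.3f}, {upper:.3f})")
```

Passing the critical value as an argument keeps the link to the table explicit: changing the confidence level or the sample size means looking up a different value.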

Example 4
The percentage starch content of potatoes is normally distributed with mean μ. In order to assess
the mean value of the starch content, a random sample of 12 potatoes is selected and their starch
content measured. The percentages of starch contents obtained were as follows:
23.2 20.3 18.6 20.0 20.8 21.6 19.4 18.7 22.1 19.5 21.3 22.6
Find a 95% confidence interval for the mean.
You could use a calculator to find these.
$\bar{x} = 20.675$ and s = 1.513
Use the formula.
The 95% confidence limits for μ are
$\bar{x} \pm t_{11}(2.5\%) \times \frac{s}{\sqrt{n}} = 20.675 \pm 2.201 \times \frac{1.513}{\sqrt{12}} = 20.675 \pm 0.961$
Write out the confidence interval.
The 95% confidence interval is (19.714, 21.636)


Exercise 7A
1 Given that the random variable X has a t12-distribution, find values of t such that:
a P(X < t) = 0.025 b P(X > t) = 0.05 c P(|X| > t) = 0.95

2 Find:
a t26(0.01) b t26(0.05)

3 The random variable Y has a tn-distribution. Find a value (or values) of t that satisfies each
of the following.
a n = 10, P(Y < t) = 0.95 b n = 32, P(Y < t) = 0.005 c n = 5, P(Y < t) = 0.025
d n = 16, P(|Y| < t) = 0.98 e n = 18, P(|Y| > t) = 0.10

4 A test on the life (in hours) of a certain make of torch batteries gave the following results.
20.3    17.3    25.0    18.4    16.3    24.8    24.3    21.2
Assuming that the lifetime of batteries is normally distributed, find a 90% confidence interval
for the mean.

5 A sample of size 16 taken from a normal population with unknown variance gave the following
sample values.
$\bar{x} = 12.4$, $s^2 = 21.0$
Find a 95% confidence interval for the population mean.

6 The mean heights (measured in centimetres) of six male students at a college were as follows:
182    178    183    180    169    184
Calculate:
a a 90% confidence interval b a 95% confidence interval
for the mean height of male students at the college.
You may assume that the heights are normally distributed.

E 7 The masses (in grams) of 10 nails selected at random from a bin of 90 mm long nails were:
9.7    10.2    11.2    9.4    11.0    11.2    9.8    9.8    10.0    11.3
a Calculate a 98% confidence interval for the mean mass of the nails in the bin.  (6 marks)
b State one assumption you have made in your calculation.  (1 mark)

E 8 A random sample of the feet of 8 adult males gave the following summary
statistics of length x (in cm):
$\sum x = 224.1$, $\sum x^2 = 6337.39$
Assuming that the length of men’s feet is normally distributed, calculate a 99%
confidence interval for the mean length of men’s feet based upon these results. (6 marks)


E 9 A random sample of 26 students from the sixth form of a school sat an intelligence test that
measured their IQs. The results are summarised below.
$\bar{x} = 122$, $s^2 = 225$
Assuming that IQ is normally distributed, calculate a 95% confidence interval for the mean
IQ of the students. (6 marks)

E/P 10 Add ticks to this table to show the distribution you would use when finding a confidence
interval.

Confidence interval                                                                        Normal   χ²   t
For the population mean, using a sample of size 50 from a population of unknown variance
For the population mean, using a sample of size 6 from a population of known variance
For the population variance, using a sample of size 20

(3 marks)

E/P 11 A company manufactures light bulbs which they state have an average lifespan of 500 days.
The manager is concerned that the production process is faulty and that the light bulbs do not
last as long as stated.
He tests a random sample of 15 bulbs and finds their lifespan, x days. The data is summarised as
$\sum x = 7338$, $\sum x^2 = 3\,618\,260$
a Explain why you need to use the t-distribution to find a confidence interval for the
population mean. (1 mark)
b Find a 90% confidence interval for the mean lifespan, μ days, of the light bulbs.  (6 marks)
c State one assumption you have made in finding your answer to part b. (1 mark)
d Find a 95% confidence interval for the population variance. (4 marks)
Hint Confidence intervals for the population
variance are covered in Chapter 6.  ← Section 6.1

7.2 Hypothesis test for the mean of a normal distribution with unknown variance
Apart from using the t-distribution rather than the normal distribution for finding the critical region,
testing the mean of a normal distribution with unknown variance follows the same steps as you used
when testing the mean of a normal distribution with known variance.

The following steps might help you in answering questions about hypothesis testing of the mean of a
normal distribution with unknown variance.
1 Write down H0.
2 Write down H1.
3 Specify the significance level, α.
4 Write down the number of degrees of freedom, ν.


5 Write down the critical region.
6 Calculate $\bar{x}$, $s^2$ and t using
$\bar{x} = \frac{\sum x}{n}$, $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$ (or $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1}$) and $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
7 Conclusions
The following points should be addressed:
i Is the result significant or not?
ii What are the implications in terms of the original problem?

Example 5
A shopkeeper sells jars of jam. The weights of the jars of jam are normally distributed with a mean
of 150 g. A customer complains that the mean weight of 8 jars she had bought was only 147 g.
An estimate for the standard deviation of the weights of the 8 jars of jam calculated from the
8 observations was 2 g.
a Test, at the 5% significance level, whether 147 g is significantly less than the quoted mean.
b Discuss whether the customer has cause for complaint.

a State your hypotheses and write down the significance level.
H0: μ = 150    H1: μ < 150
Significance level = 0.05 (one-tailed test)

Find the number of degrees of freedom.
ν = 8 − 1 = 7

Look up the critical value in the table on page 217. Note a minus sign is needed since a
left-hand tail is being used.
From tables, the critical value t7 is −1.895

Write down the critical region.
so the critical region is t < −1.895

Calculate $\bar{x}$ and s. Use these to calculate t.
$\bar{x} = 147$, μ = 150, s = 2
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{147 - 150}{2/\sqrt{8}} = -4.2426$

Draw a conclusion.
Now −4.2426 < −1.895 so the result is significant and H0 is rejected.

b Put it in the context of the original problem.
There is evidence to suggest that the mean weight is less than 150 g and the
customer does have a cause for complaint.


Example 6
The temperature (°C) was measured at noon on 10 days during the month of March in West
Cumbria. The readings were:
12.8    11.4    12.9    15.1    15.4    13.5    14.9    15.0    16.0    15.8
Using a 5% significance level, test whether or not this is an increase over the previous year when the
average noon temperature was 13.5 °C.
State your hypotheses and write down the significance level.
H0: μ = 13.5 H1: μ > 13.5
Significance level 5%
ν = 9

Write down the critical region.
From tables, the critical value is t9 = 1.833
so the critical region is t > 1.833

Calculate $\bar{x}$ and s. Note both of these are easily found using a calculator.
$\bar{x} = \frac{12.8 + 11.4 + 12.9 + 15.1 + 15.4 + 13.5 + 14.9 + 15.0 + 16.0 + 15.8}{10} = 14.28$
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{2060.28 - 10 \times 14.28^2}{10 - 1} = 2.344$
s = 1.531

Calculate t.
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{14.28 - 13.5}{1.531/\sqrt{10}} = 1.611$

Draw a conclusion.
1.611 < 1.833, so the result is not significant.
There is not enough evidence to suggest that the average temperature
has increased.
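The working in Example 6 follows the steps listed at the start of this section, and can be sketched in Python as below. The function name `one_sample_t` is illustrative, and the critical value 1.833 is assumed to be taken from the table (ν = 9, 0.05 column) rather than computed.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)               # sample standard deviation, divisor n - 1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1


temps = [12.8, 11.4, 12.9, 15.1, 15.4, 13.5, 14.9, 15.0, 16.0, 15.8]
t, nu = one_sample_t(temps, mu0=13.5)

T_CRIT = 1.833                                 # t_9(0.05) from the table, upper tail
reject_h0 = t > T_CRIT                         # H1: mu > 13.5, critical region t > 1.833
print(f"t = {t:.3f}, nu = {nu}, reject H0: {reject_h0}")
```

For a left-hand tail (as in Example 5) the comparison would instead be `t < -T_CRIT`.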

Example 7
A concrete manufacturer tests cubes of its concrete at regular intervals, and their compressive
strengths in N mm−2 are determined. The mean value of the strengths is required to be 0.47 N mm−2.
A new supplier of cement offers to supply the firm at a cheaper rate than the present supplier, and
a trial bag of cement is used to make 12 concrete cubes. Upon testing, these cubes are found to
have strengths (x) such that ∑x = 5.52 and ∑x2 = 2.542. Assume that the strengths are normally
distributed.
a Stating your hypotheses clearly, test, at the 5% level of significance, whether or not the use of the
new cement has altered the mean strength of the concrete.
b In the light of your conclusion to the test in part a, what would you recommend the
manufacturer to do?

a You are looking to see if the strength has altered up or down so use ≠ in H1.
H0: μ = 0.47 H1: μ ≠ 0.47
ν = 12 − 1 = 11

This is a two-tailed test so halve the significance level to find the probability in each tail.
Probability in each tail = 0.025
From tables the critical value is 2.201
The critical region is |t| > 2.201

Since you are given Σx and Σx² in the question, use these formulae.
$\bar{x} = \frac{\sum x}{n} = \frac{5.52}{12} = 0.46$
$s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{2.542 - 12 \times 0.46^2}{11} = 0.0002545$
s = 0.016

Since $\bar{x}$ < μ, t is negative.
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.46 - 0.47}{0.016/\sqrt{12}} = -2.165$

t lies between −2.201 and 2.201.
Now |−2.165| < |−2.201|

Draw the conclusion in context.
The result is not significant. There is not enough evidence to suggest that the
mean strength has altered.

b Base your recommendation on your conclusion.
Since the mean strength has not altered, the manufacturer should accept the new supplier because
they are cheaper. The two values −2.165 and −2.201 are quite close, however, and a one-tailed test of
whether or not the strength had decreased should be done, or failing this a further sample could be taken.
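When only n, Σx and Σx² are given, as in Example 7, the test statistic can be computed directly from the sums. The sketch below assumes the critical value 2.201 is read from the table (ν = 11, 0.025 column); the function name `t_from_sums` is illustrative. Because the code does not round s to 0.016 before dividing, it gives a value of about −2.17 rather than −2.165; the conclusion is unchanged.

```python
import math


def t_from_sums(n, sum_x, sum_x2, mu0):
    """t statistic for H0: mu = mu0 from n, the sum of x and the sum of x squared."""
    x_bar = sum_x / n
    s2 = (sum_x2 - n * x_bar ** 2) / (n - 1)    # unbiased estimate of the variance
    return (x_bar - mu0) / math.sqrt(s2 / n)


t = t_from_sums(n=12, sum_x=5.52, sum_x2=2.542, mu0=0.47)

T_CRIT = 2.201                                   # t_11(0.025) from the table
significant = abs(t) > T_CRIT                    # two-tailed: reject H0 if |t| > 2.201
print(f"t = {t:.3f}, significant: {significant}")
```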

Exercise 7B
1 Given that the observations 9, 11, 11, 12, 14, have been drawn from a normal distribution, test
H0: μ = 11 against H1: μ > 11. Use a 5% significance level.

2 A random sample of size 28 taken from a normally distributed variable gave the sample values
$\bar{x} = 17.1$ and $s^2 = 4$. Test H0: μ = 19 against H1: μ < 19. Use a 1% level of significance.

3 A random sample of size 13 taken from a normally distributed variable gave the sample values
$\bar{x} = 3.26$ and $s^2 = 0.64$. Test H0: μ = 3 against H1: μ ≠ 3. Use a 5% significance level.

E/P 4 A certain brand of blanched hazelnuts for use in cooking is sold in packets. The weights of the
packets of hazelnuts follow a normal distribution with mean, μ. The manufacturer claims that
μ = 100 g. A sample of 15 packets was taken and the weight, x, of each was measured.
The results are summarised by the following statistics: ∑x = 1473, ∑x² = 148 119.
a Explain why it is not suitable to use a normal approximation for $\frac{\bar{X} - \mu}{S/\sqrt{n}}$ in this
instance. (1 mark)
b Test, at the 5% significance level, whether or not there is evidence to justify the
manufacturer’s claim. (7 marks)


E 5 A manufacturer claims that the lifetimes of its 100‑watt bulbs are normally distributed with a
mean of 1000 hours. A laboratory tests 8 bulbs and finds their lifetimes to be 985, 920, 1110,
1040, 945, 1165, 1170 and 1055 hours.
Stating your hypotheses clearly, examine whether or not the bulbs have a longer mean
lifetime than that claimed. Use a 5% level of significance. (7 marks)

E 6 A fertiliser manufacturer claims that by using brand F fertiliser the yield of fruit bushes will
be increased. A random sample of 14 fruit bushes was fertilised with brand F and the resulting
yields, x, were summarised by ∑x = 90.8, ∑x2 = 600. The yield of bushes fertilised by the usual
fertiliser was normally distributed with a mean of 6 kg of fruit per bush.
Test, at the 2.5% significance level, the manufacturer’s claim. (7 marks)

E 7 A nuclear reprocessing company claims that the amount of radiation within a reprocessing
building in which there had been an accident had been reduced to an acceptable level by their
clean-up team. The amounts of radiation, x, at 20 sites within the building in suitable units are
summarised by ∑x = 21.7, ∑x2 = 28.4. In the same units, the acceptable level of radiation is
given as 1.00.
a By carrying out a suitable test for the population mean, test whether the building falls within
acceptable radiation levels.  (7 marks)
b State one assumption made in carrying out your test.  (1 mark)

E/P 8 Scores in an aptitude test are assumed to be normally distributed with a population mean of
100. A company claims to be able to train people to improve their scores in the test. A random
sample of 20 people is taken and they are trained before taking the test. The sample standard
deviation is found to be 15 and the mean of the scores of the 20 people is found to be 110.
a Test, at the 5% level of significance, whether there is evidence of the training improving
the scores of participants. State your hypotheses clearly. (5 marks)
b Test, at the 10% level of significance, the hypothesis that the population standard
deviation is different from 12. State your hypotheses clearly. (5 marks)

7.3 The paired t-test


There are many occasions when you might want to compare results before and after some treatment,
or the effectiveness of two different types of treatment. You could, for example, be investigating the
effect of alcohol on people’s reactions, or the difference in intelligence levels of identical twins who
were separated at birth and who have been brought up in different family circumstances.

In both cases you need to have a common link between the two sets of results, for instance by
taking the same person’s result before and after drinking alcohol, or by the twins being identical. It
is necessary to have this link so that differences caused by other factors are eliminated as much as
possible. It would, for example, be of little use if you tested one person’s reactions without drinking
alcohol and a different person’s reactions after drinking alcohol because any difference could be due
to normal variations between their reactions. In the same way you would have to use identical twins
in the intelligence experiment, otherwise any difference in intelligence might be due to the normal
variability of intelligence between different people. In these cases, each result in one of the samples is
paired with a result in the other sample; the results are therefore referred to as paired.

In paired experiments such as these you are not really interested in the individual results as such,
but in the difference, D, between the results. In these circumstances you can treat the differences
between pairs of matched subjects as if they were a random sample from a N(μ, σ²) distribution. You
can then proceed as you did for a single sample.

Although you do not need to assume the two populations are normal, you need to assume that the
differences are normally distributed. Given that you are unlikely to know σ² and that n is likely to be
small, then

$t = \frac{\bar{D} - \mu_D}{S/\sqrt{n}} \sim t_{n-1}$

Taking H0: μD = 0 as your null hypothesis, this reduces to

$t = \frac{\bar{D} - 0}{S/\sqrt{n}} \sim t_{n-1}$

Note This is the null hypothesis that on average there is no difference between the two
populations.

■ In a paired experiment with a mean of the differences between the samples of $\bar{D}$,

$\frac{\bar{D} - \mu_D}{S/\sqrt{n}} \sim t_{n-1}$

The paired t-test proceeds in almost the same way as the t-test itself. The steps are given below.
1 Write down the null hypothesis H0.
2 Write down the alternative hypothesis H1.
3 Specify α.
4 Write down the degrees of freedom (remembering that ν = n − 1).
5 Write down the critical region.
6 Calculate the differences d.
Calculate $\bar{d}$ and $s^2$.
Calculate the value of the test statistic $t = \frac{\bar{d} - \mu_D}{s/\sqrt{n}}$
7 Complete the test and state your conclusions. As before, the following points should be addressed:
i Is the result significant or not?
ii What are the implications in terms of the original problem?


Example 8
In an experiment to test the effects of alcohol on the reaction times of people, a group of 10
students took part in the experiment. The students were asked to react to a light going on by
pushing a switch that would switch it off again. Their reaction times were automatically recorded.
After the students had each drunk one pint of beer the experiment was repeated. The results are
shown below.
Student A B C D E F G H I J
Reaction time before (seconds) 0.8 0.2 0.4 0.6 0.4 0.6 0.4 0.8 1.0 0.9
Reaction time after (seconds) 0.7 0.5 0.6 0.8 0.8 0.6 0.7 0.9 1.0 0.7
Difference −0.1 0.3 0.2 0.2 0.4 0 0.3 0.1 0 −0.2

Test, at the 5% significance level, whether or not the consumption of a pint of beer increased the
students’ reaction times.

State your hypotheses.
H0: μD = 0 H1: μD > 0

Write down the significance level.
Significance level = 0.05 (one-tailed test)

Find the number of degrees of freedom.
ν = 10 − 1 = 9

Look up the critical value in the table.
Critical value t9(5%) = 1.833

Write down the critical region.
The critical region is t > 1.833

Calculate $\bar{d}$ and $s^2$.
$\bar{d} = \frac{\sum d}{n} = \frac{1.2}{10} = 0.12$
$s^2 = \frac{\sum d^2 - n\bar{d}^2}{n - 1} = \frac{0.48 - 10(0.12)^2}{9} = 0.037333$

Calculate the value of the test statistic $t = \frac{\bar{d} - \mu_D}{s/\sqrt{n}}$
$t = \frac{0.12 - 0}{\sqrt{0.037333}/\sqrt{10}} = 1.9640$

Always state whether you accept or reject H0 and draw a conclusion (in context if possible).
1.9640 > 1.833. The result is significant: reject H0. There is evidence that consuming
a pint of beer increased the students' reaction times.
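The paired t-test in Example 8 reduces to a one-sample test on the differences, which the following sketch makes explicit. The function name `paired_t` is illustrative, the differences are taken as after minus before (matching the table in the example), and the critical value 1.833 is assumed to come from the table (ν = 9, 0.05 column).

```python
import math
import statistics


def paired_t(before, after, mu_d=0.0):
    """Return (t, nu) for the differences d = after - before under H0: mu_D = mu_d."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s = statistics.stdev(diffs)                 # divisor n - 1
    return (d_bar - mu_d) / (s / math.sqrt(n)), n - 1


before = [0.8, 0.2, 0.4, 0.6, 0.4, 0.6, 0.4, 0.8, 1.0, 0.9]
after = [0.7, 0.5, 0.6, 0.8, 0.8, 0.6, 0.7, 0.9, 1.0, 0.7]
t, nu = paired_t(before, after)

T_CRIT = 1.833                                  # t_9(0.05) from the table (H1: mu_D > 0)
reject_h0 = t > T_CRIT
print(f"t = {t:.4f}, nu = {nu}, reject H0: {reject_h0}")
```

Only the list of differences enters the calculation; the individual before and after values play no further part once the differences have been formed.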


Example 9
In order to compare two methods of measuring the hardness of metals, readings of Brinell
hardness were taken using each method for 8 different metal specimens. The resulting Brinell
hardness readings are given in the table below.
Material Reading method A Reading method B
Aluminium 29 31
Magnesium alloy 64 63
Wrought iron 104 105
Duralumin 116 119
Mild steel 138 140
70/30 brass 156 156
Cast iron 199 200
Nickel chrome steel 385 386

Use a paired t-test, at the 5% level of significance, to test whether or not there is a difference in the
readings given by the two methods.

H0: μd = 0 H1: μd ≠ 0 State your hypotheses.

Probability in each tail = 0.025 This is a two-tailed test so halve the significance level to find the probability in each tail.

ν = 8 − 1 = 7 Find the number of degrees of freedom.

Critical value t7(2.5%) = 2.365 Look up the critical value in the table on page 217.

The critical regions are t < −2.365 and t > 2.365 Write down the critical regions.


Calculate d, d̄ and s².
The differences, d, are 2, −1, 1, 3, 2, 0, 1 and 1
∑d = 9 ∑d² = 21
\[ \bar{d} = \frac{9}{8} = 1.125 \]
\[ s^2 = \frac{\sum d^2 - n\bar{d}^2}{n - 1} = \frac{21 - 8(1.125)^2}{7} = 1.554 \]
Calculate the value of the test statistic $t = \dfrac{\bar{d} - \mu_d}{s/\sqrt{n}}$.
\[ t = \frac{1.125 - 0}{\sqrt{\dfrac{1.554}{8}}} = 2.553 \]
Always state whether you accept or reject H0 and draw a conclusion (in context if possible).
The t value is significant; there is sufficient evidence to reject the null hypothesis. There
is a difference between the mean hardness readings using the two methods.
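As a check on the arithmetic, the sketch below recomputes this example from the summary sums ∑d and ∑d², using the shortcut form of s² rather than a library variance function. It assumes the differences are taken as method B minus method A, as in the working above; the variable names are illustrative only, and the critical value is the table value quoted in the example.

```python
import math

# Brinell hardness readings copied from the table in this example.
method_a = [29, 64, 104, 116, 138, 156, 199, 385]
method_b = [31, 63, 105, 119, 140, 156, 200, 386]

# Differences d = B - A for each specimen.
d = [b - a for a, b in zip(method_a, method_b)]
n = len(d)

sum_d = sum(d)
sum_d2 = sum(x * x for x in d)

d_bar = sum_d / n
s2 = (sum_d2 - n * d_bar ** 2) / (n - 1)  # shortcut form of the unbiased estimate
t = (d_bar - 0) / math.sqrt(s2 / n)       # mu_d = 0 under H0

critical = 2.365  # t_7(2.5%), the table value quoted above
significant = abs(t) > critical           # two-tailed: compare |t| with the critical value

print(sum_d, sum_d2, round(d_bar, 3), round(s2, 3), round(t, 3), significant)
```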


Exercise 7C
E/P 1 It is claimed that completion of a shorthand course has increased the shorthand speeds of the
students.
a If the suggestion that the mean speed of the students has not altered is to be tested,
write down suitable hypotheses for which
i a two-tailed test would be appropriate
ii a one-tailed test would be appropriate. (2 marks)
The table below gives the shorthand speeds of students before and after the course.
Student A B C D E F
Speed before in words/minute 35 40 28 45 30 32
Speed after 42 45 28 45 40 40

b Carry out a paired t-test, at the 5% significance level, to determine whether or not
there has been an increase in shorthand speeds. (7 marks)

E 2 A large number of students took two General Studies papers that were supposed to be of equal
difficulty. The results for 10 students chosen at random are shown below.
Candidate A B C D E F G H I J
Paper 1 18 25 40 10 38 20 25 35 18 43
Paper 2 20 27 39 12 40 23 20 35 20 41

The teacher looked at the marks of the random sample of 10 students, and decided that Paper 2
was easier than Paper 1.
Given that the marks on each paper are normally distributed, carry out an appropriate
test of the teacher’s claim, at the 1% level of significance. (7 marks)

E 3 It is claimed by the manufacturer that by chewing a special flavoured chewing gum, smokers are
able to reduce their craving for cigarettes, and thus cut down on the number of cigarettes smoked
per day. In a trial of the gum on a random selection of 10 people, the no-gum smoking rate and
the smoking rate when chewing the gum were investigated, with the following results.
Person A B C D E F G H I J
Without-gum smoking rate (cigs./day) 20 35 40 32 45 15 22 30 34 40
With-gum smoking rate (cigs./day) 15 25 35 30 45 15 14 25 28 34

a Use a paired t-test at the 5% significance level to test the manufacturer’s claim. (7 marks)
b State any assumptions you have had to make. (1 mark)

E 4 A town council is going to put a new traffic management scheme into operation in the hope that
it will make travel to work in the mornings quicker for most people. Before the scheme is put
into operation, 10 randomly selected workers are asked to record the time it takes them to come
into work on a Wednesday morning. After the scheme is put into place, the same 10 workers
are again asked to record the time it takes them to come into work on a particular Wednesday
morning.


The times in minutes are shown in the table below.


Worker A B C D E F G H I J
Before 23 37 53 42 39 60 54 85 46 38
After 18 35 49 42 34 48 52 79 37 37
Test, at the 5% significance level, whether or not the journey time to work has decreased.
 (7 marks)

E 5 A teacher wants to test the idea that students’ results in mock examinations are good predictors
for their results in actual examinations. He selects 8 students at random from those doing a mock
Statistics examination and records their marks out of 100. Later he collects the same students’
marks in the actual examination. The resulting marks are as follows:
Student A B C D E F G H
Mock examination mark 35 86 70 91 45 64 78 38
Actual examination 45 77 81 86 53 71 68 46
a Use a paired t-test to investigate whether or not the mock examination is a good
predictor. (Use a 10% significance level.) (7 marks)
b State any assumptions you have made. (1 mark)

E/P 6 The manager of a dress-making company took a random sample of 10 of his employees and
recorded the number of dresses made by each. He discovered that the number of dresses made
between 3.00 and 5.00 p.m. was fewer than the same employees achieved between 9.00 and
11.00 a.m. He wondered whether a tea break from 2.45 to 3.00 p.m. would increase productivity
during these last two hours of the day.
The numbers of dresses made by these workers in the last two hours of the day before and after
the introduction of the tea break were as shown below.
Worker A B C D E F G H I J
Before 75 73 75 81 74 73 77 75 75 72
After 80 84 79 84 85 84 78 78 80 83
a Why was the comparison made for the same 10 workers? (1 mark)
b Conduct, at the 5% level of significance, a paired t-test to see whether the introduction
of a tea break has increased productivity between 3.00 and 5.00 p.m. (7 marks)

E 7 A drug administered in tablet form to help people sleep and a placebo were given for two weeks
to a random sample of eight patients in a clinic. The drug and the placebo were given in random
order for one week each. The average numbers of hours sleep that each patient had per night
with the drug and with the placebo are given in the table below.
Patient 1 2 3 4 5 6 7 8
Hours of sleep with drug 10.5 6.7 8.9 6.7 9.2 10.9 11.9 7.6
Hours of sleep with placebo 10.3 6.5 9.0 5.3 8.7 7.5 9.3 7.2
Test, at the 1% level of significance, whether or not the drug increases the mean number of hours
sleep per night. State your hypotheses clearly. (7 marks)


7.4 Difference between means of two independent normal distributions


You need to be able to find a confidence interval for the difference between two means from
independent normal distributions with equal but unknown variances.

To do this, you need to find a pooled estimate of variance.

Watch out: In Section 5.3, you carried out hypothesis tests for the difference between
the means of normal distributions with known variances. In that case the population
distributions could have different variances. The techniques of this section and the next section
only apply to normal distributions with unknown but equal variances.

Suppose that you take random samples from random variables X and Y that have a common
variance, σ². You will have two estimates of σ², namely $s_x^2$ and $s_y^2$. A better estimate of σ² than
either $s_x^2$ or $s_y^2$ can be obtained by pooling the two estimates. You will recall that, for a single
sample, an unbiased estimate of the population variance was given by
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
A similar idea works when you pool two estimates. You have
\[ s_x^2 = \frac{\sum (x - \bar{x})^2}{n_x - 1} \quad \text{and} \quad s_y^2 = \frac{\sum (y - \bar{y})^2}{n_y - 1} \]
so that $(n_x - 1)s_x^2 = \sum (x - \bar{x})^2$ and $(n_y - 1)s_y^2 = \sum (y - \bar{y})^2$.
These are the sums of the squares of the differences of each sample value from the sample mean.
You can add them together to get a total sum of squares of differences:
\[ \sum (x - \bar{x})^2 + \sum (y - \bar{y})^2 = (n_x - 1)s_x^2 + (n_y - 1)s_y^2 \]

You can use this sum to calculate a pooled estimate, $s_p^2$, of σ²:
\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{(n_x - 1) + (n_y - 1)} = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]
■ If a random sample of $n_x$ observations is taken from a normal distribution with unknown
variance σ², and an independent sample of $n_y$ observations is taken from a normal
distribution that also has unknown variance σ², then a pooled estimate for σ² is
\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]
where
\[ s_x^2 = \frac{\sum x^2 - n_x\bar{x}^2}{n_x - 1} \quad \text{and} \quad s_y^2 = \frac{\sum y^2 - n_y\bar{y}^2}{n_y - 1} \]

Note: Notice that if $n_x = n_y = n$, this reduces to
\[ s_p^2 = \frac{(n - 1)(s_x^2 + s_y^2)}{2(n - 1)} = \frac{s_x^2 + s_y^2}{2} \]
which is the mean of the two variances. The pooled estimate of variance is really a
weighted mean of two variances with the two weights being $(n_x - 1)$ and $(n_y - 1)$.


Example 10
A random sample of 15 observations is taken from a population and gives an unbiased estimate
for the population variance of 9.47. A second random sample of 12 observations is taken from
a different population that has the same population variance as the first population, and gives
an unbiased estimate for the variance as 13.84. Calculate an unbiased estimate of the population
variance σ2 using both samples.

Use $s_p^2 = \dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{(n_x - 1) + (n_y - 1)}$.
\[ s_p^2 = \frac{(14 \times 9.47) + (11 \times 13.84)}{14 + 11} = 11.3928 \]
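The pooled estimate is a weighted mean of the two sample variances, and a few lines of code make the weighting explicit. The sketch below is illustrative only: `pooled_variance` is a hypothetical helper name, and the equal-sample-size inputs (variances 4.0 and 6.0 with n = 10) are made-up numbers chosen to show the special case described in the Note in Section 7.4.

```python
def pooled_variance(n_x, s2_x, n_y, s2_y):
    """Pooled estimate of a common variance from two unbiased sample variances.

    Hypothetical helper for illustration; weights are (n_x - 1) and (n_y - 1).
    """
    return ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)


# Example 10: samples of 15 and 12 with variance estimates 9.47 and 13.84.
sp2 = pooled_variance(15, 9.47, 12, 13.84)

# Made-up equal sample sizes: the pooled estimate is the plain mean of the two.
equal_case = pooled_variance(10, 4.0, 10, 6.0)

print(round(sp2, 4), equal_case)
```

Because the larger sample carries the larger weight, the pooled value lies closer to 9.47 than a simple average of the two estimates would.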

In Section 5.4, you saw that if the sample sizes are large then
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{s_x^2}{n_x} + \dfrac{s_y^2}{n_y}}} \]
is approximately normal with distribution N(0, 1²).

When the sample sizes are small you need to make three assumptions:
1 that the populations are normal
2 that the samples are independent
3 that the variances of the two populations are equal.

Links: In many cases, it is reasonable to assume that the variances of the populations are equal. If
you are unsure, you can use the F-distribution to test for equal variance. ← Section 6.4

The third assumption enables you to pool the two sample variances to find an estimator for the
common variance:
\[ S_p^2 = \frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{(n_x - 1) + (n_y - 1)} \]
Substituting $S_p^2$ for $S_x^2$ and $S_y^2$ gives
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{\sqrt{\dfrac{S_p^2}{n_x} + \dfrac{S_p^2}{n_y}}} = \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \]
Now, because the sample sizes are small, this will not, as before, follow a N(0, 1²) distribution.
You have already seen that in the single-sample case
\[ \frac{\bar{X} - \mu_x}{S/\sqrt{n_x}} \]
follows a t-distribution, so you will not be surprised to find that
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \]
also follows a t-distribution.


There are $(n_x + n_y)$ observations in the total sample and two calculated restrictions (namely the
means $\bar{X}$ and $\bar{Y}$), so the number of degrees of freedom will be $n_x + n_y - 2$.

■ If a random sample of $n_x$ observations is taken from a normal distribution that has
unknown variance σ², and an independent sample of $n_y$ observations is taken from a normal
distribution with equal variance, then
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \sim t_{n_x + n_y - 2} \quad \text{where} \quad S_p^2 = \frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2} \]

You can now use tables of values for the t-distribution to find a confidence interval for μx − μy.
For example, for a 95% confidence interval you would start by finding the value tc that is exceeded
with probability 0.025. This would give you:
\[ P(-t_c < t_{n_x + n_y - 2} < t_c) = 0.95 \]
\[ P\left( -t_c < \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} < t_c \right) = 0.95 \]
\[ P\left( -t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} < (\bar{x} - \bar{y}) - (\mu_x - \mu_y) < t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \right) = 0.95 \]

The confidence limits for $(\mu_x - \mu_y)$ are therefore given by
\[ (\bar{x} - \bar{y}) \pm t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \]
and the confidence interval is
\[ \left( (\bar{x} - \bar{y}) - t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}},\ (\bar{x} - \bar{y}) + t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \right) \]

■ The confidence limits for the difference between two means from independent normal
distributions, X and Y, when the variances are equal but unknown are given by
\[ (\bar{x} - \bar{y}) \pm t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \]
where $s_p^2$ is the pooled estimate of the population variance, and $t_c$ is the relevant value taken
from the t-distribution tables.

■ The confidence interval is given by
\[ \left( (\bar{x} - \bar{y}) - t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}},\ (\bar{x} - \bar{y}) + t_c s_p\sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \right) \]


Example 11
In a survey on the petrol consumption of cars, a random sample of 12 cars with 2-litre engines was
compared with a random sample of 15 cars with 1.6-litre engines. The following results show the
consumption, in suitable units, of the cars:
2-litre cars: 34.4, 32.1, 30.1, 32.8, 31.5, 35.8, 28.2, 26.6, 28.8, 28.5, 33.6, 28.8
1.6-litre cars: 35.3, 34.0, 36.7, 40.9, 34.4, 39.8, 33.6, 36.7, 34.0, 39.2, 39.8, 38.7, 40.8, 35.0, 36.7
Calculate a 95% confidence interval for the difference between the two mean petrol consumption
figures. You may assume that the variables are normally distributed and that they have the same
variance.
For the 2-litre engine, $n_y = 12$, $\bar{y} = 30.933$, $s_y^2 = 8.177$
For the 1.6-litre engine, $n_x = 15$, $\bar{x} = 37.04$, $s_x^2 = 6.894$
Use $s_p^2 = \dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$.
\[ s_p^2 = \frac{(14 \times 6.894) + (11 \times 8.177)}{25} = 7.459 \]
\[ s_p = \sqrt{7.459} = 2.731 \]
ν = 12 + 15 − 2 = 25
$t_c = t_{25}(2.5\%) = 2.060$
Use $(\bar{x} - \bar{y}) \pm t_c s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}$.
The confidence limits are
\[ (37.04 - 30.933) \pm 2.060 \times 2.731\sqrt{\frac{1}{15} + \frac{1}{12}} = 6.107 \pm 2.179 \]
= 8.286 and 3.928
The 95% confidence interval is (3.928, 8.286)
or (3.93, 8.29) to 3 s.f.
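The interval in this example can be reproduced directly from the raw data. The following is a minimal sketch using the Python standard library; `pooled_ci` is a hypothetical helper name, and the value of $t_c$ is passed in by hand from the table value used above because the standard library has no t-distribution tables.

```python
import math
import statistics

# Petrol consumption figures copied from Example 11.
two_litre = [34.4, 32.1, 30.1, 32.8, 31.5, 35.8, 28.2, 26.6, 28.8, 28.5, 33.6, 28.8]
one_six_litre = [35.3, 34.0, 36.7, 40.9, 34.4, 39.8, 33.6, 36.7, 34.0, 39.2,
                 39.8, 38.7, 40.8, 35.0, 36.7]


def pooled_ci(x, y, t_c):
    """Confidence interval for mu_x - mu_y assuming equal unknown variances.

    Hypothetical helper; t_c must be supplied from t-distribution tables
    with len(x) + len(y) - 2 degrees of freedom.
    """
    n_x, n_y = len(x), len(y)
    sp2 = ((n_x - 1) * statistics.variance(x)
           + (n_y - 1) * statistics.variance(y)) / (n_x + n_y - 2)
    half_width = t_c * math.sqrt(sp2) * math.sqrt(1 / n_x + 1 / n_y)
    diff = statistics.mean(x) - statistics.mean(y)
    return diff - half_width, diff + half_width


# x is the 1.6-litre sample and y the 2-litre sample, as in the worked solution.
lower, upper = pooled_ci(one_six_litre, two_litre, t_c=2.060)
print(round(lower, 3), round(upper, 3))
```

Small differences in the last decimal place from the hand calculation are possible because the code does not round the intermediate variances.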

Exercise 7D

E 1 A random sample of 10 toothed winkles was taken from a sheltered shore, and a sample of
15 was taken from a non-sheltered shore. The maximum basal width, x mm, of the shells was
measured and the results are summarised below.
Sheltered shore: x̄ = 25, s² = 4
Non-sheltered shore: x̄ = 22, s² = 5.3
a Find a 95% confidence interval for the difference between the means. (6 marks)
b State an assumption that you have made when calculating this interval. (1 mark)

E/P 2 A packet of plant seeds was sown and, when the seeds had germinated and begun to grow,
8 were transferred into pots containing a soil-less compost and 10 were grown on in a soil-based
compost. After 6 weeks of growth, the heights, x, in cm of the plants were measured with the
following results.
Soil-less compost: 9.3, 8.7, 7.8, 10.0, 9.2, 9.5, 7.9, 8.9
Soil-based compost: 12.8, 13.1, 11.2, 10.1, 13.1, 12.0, 12.5, 11.7, 11.9, 12.0
a Assuming that the populations are normally distributed, and that there is a difference between
the two means, calculate a 90% confidence interval for this difference. (6 marks)
b State an additional assumption you have used when calculating this interval and discuss
whether this assumption is reasonable in the context given. (2 marks)


E/P 3 Forty children were randomly selected from all 12-year-old children in a large city to compare
two methods of teaching the spelling of 50 words which were likely to be unfamiliar to the
children. Twenty children were randomly allocated to each method. Six weeks later the children
were tested to see how many of the words they could spell correctly. The summary statistics
for the two methods are given in the table below, where x̄ is the mean number of words spelled
correctly, s² is an unbiased estimate of the variance of the number of words spelled correctly and
n is the number of children taught using each method.
x̄ s² n
Method A 32.7 6.12 20
Method B 38.2 5.22 20

a Calculate a 99% confidence interval for the difference between the mean numbers
of words spelled correctly by children who used Method B and Method A.  (6 marks)
b State two assumptions you have made in carrying out part a. (2 marks)
c Interpret your result.  (1 mark)

E 4 The table below shows summary statistics for the mean daily consumption of cigarettes by a
random sample of 10 smokers before and after their attendance at an anti-smoking workshop
with x̄ representing the mean and s² representing the unbiased estimate of population variance
in each case.
x̄ s² n
Mean daily consumption before the workshop 18.6 32.488 10
Mean daily consumption after the workshop 14.3 33.344 10

Stating clearly any assumption you make, calculate a 90% confidence interval for the
difference in the mean daily consumption of cigarettes before and after the workshop. (7 marks)

E/P 5 Two farmers add different protein supplements to the feed of cows to increase the yield of milk.
A sample of 8 cows is taken from the first farmer, who uses supplement A, and the yield of milk
is measured. A second sample, of size 7, is taken from the cows of the second farmer, who uses
supplement B. The table shows the mean daily yield, in litres, and unbiased estimates for the
population variance in each case.
x̄ s² n
Supplement A 24.5 1.2 8
Supplement B 26.8 1.6 7

a Stating your hypotheses clearly, test, at the 10% level of significance, the hypothesis that
there is a difference in the variability of the yields. State any assumptions you make. (5 marks)
The farmers wish to find a confidence interval for the difference in the average milk yield for the
two supplements.
b Explain how the result from part a can be used to justify the use of a t-distribution to find the
confidence interval. (1 mark)
c Find, correct to 3 significant figures, a 95% confidence interval for the difference in
the average milk yield. (5 marks)


7.5 Hypothesis test for the difference between means


Apart from using the t-distribution rather than the normal distribution for finding the critical values,
testing the difference between means of two independent normal distributions with unknown
variances follows similar steps to those used for testing the difference of means when the variances
are known.

The following steps might help you in answering questions on the difference of means of normal
distributions when the variances are unknown.
1 Write down H0.
2 Write down H1.
3 Specify the significance level, α.
4 Write down the number of degrees of freedom, ν.
5 Write down the critical region.
6 Calculate the sample means and variances, $\bar{x}$, $\bar{y}$, $s_x^2$ and $s_y^2$.
7 Calculate a pooled estimate of the variance:
\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]
8 Calculate the value of t:
\[ t = \frac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \]

9 Complete the test and state your conclusions. The following points should be addressed:
i Is the result significant?
ii What are the implications in terms of the original problem?

Example 12
Two groups of students, X and Y, were taught by different teachers. At the end of their course,
a random sample of students from each class was selected and given a test. The test results out of
50 were as follows:
Group X: 40 37 45 34 30 41 42 43 36
Group Y: 38 43 36 45 35 44 41
The headteacher wishes to find out if there is a significant difference between the results for these
two groups.
a Write down any assumptions that need to be made in order to conduct a difference of means test
on this data.
b Assuming that these assumptions apply, test at the 10% level of significance whether or not there
is a significant difference between the means.

a The assumptions that need to be made are that the two samples come from normal
distributions, are independent and that the populations from which they are taken have the
same variances.
b H0: μx = μy H1: μx ≠ μy State your hypotheses.
Probability in each tail = 0.05 This is a two-tailed test. Halve the significance level to find the
probability in each tail.
ν = 9 + 7 − 2 = 14 Find the number of degrees of freedom (n1 + n2 − 2 in this case).
Critical value t14(0.05) is 1.761 Look up the critical value in the table.
[Figure: t-distribution curve with an area of 0.05 shaded in each tail, beyond t = −1.761 and t = 1.761.]
The critical regions are t < −1.761 and t > 1.761 Write down the critical region. There are two
regions as it is a two-tailed test.
Calculate $\bar{x}$, $\bar{y}$, $s_x^2$ and $s_y^2$.
Using a calculator gives
$n_x = 9$, $\bar{x} = 38.667$, $s_x^2 = 23.0$
$n_y = 7$, $\bar{y} = 40.286$, $s_y^2 = 15.9$
Calculate a pooled estimate of the variance using $\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$.
\[ s_p^2 = \frac{(8 \times 23) + (6 \times 15.9)}{9 + 7 - 2} = 19.957 \]
So $s_p = 4.467$
Calculate t using $\dfrac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$. μx − μy = 0 from hypothesis.
\[ t = \frac{38.667 - 40.286}{4.467\sqrt{\dfrac{1}{9} + \dfrac{1}{7}}} = -0.719 \]
Always state whether you accept or reject H0 and draw a conclusion (in context if possible).
−1.761 < −0.719 < 1.761 so the result is not significant. Accept H0. On the evidence given
by the two samples there is no difference between the means of the two groups.
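The numbered steps of Section 7.5 translate almost line for line into code. The sketch below applies them to the data of this example using only the Python standard library; `two_sample_t` is a hypothetical helper name, and the critical value is the table value quoted above. Because the code keeps the unrounded value of $s_y^2$, its pooled variance comes out at about 19.96 rather than the 19.957 obtained from the rounded figure 15.9.

```python
import math
import statistics

# Test results copied from Example 12.
group_x = [40, 37, 45, 34, 30, 41, 42, 43, 36]
group_y = [38, 43, 36, 45, 35, 44, 41]


def two_sample_t(x, y, diff0=0.0):
    """Pooled two-sample t statistic for H0: mu_x - mu_y = diff0.

    Hypothetical helper; assumes independent normal samples with equal variances.
    Returns (t, degrees_of_freedom, pooled_variance).
    """
    n_x, n_y = len(x), len(y)
    nu = n_x + n_y - 2
    sp2 = ((n_x - 1) * statistics.variance(x)
           + (n_y - 1) * statistics.variance(y)) / nu
    standard_error = math.sqrt(sp2) * math.sqrt(1 / n_x + 1 / n_y)
    t = (statistics.mean(x) - statistics.mean(y) - diff0) / standard_error
    return t, nu, sp2


t, nu, sp2 = two_sample_t(group_x, group_y)
critical = 1.761                 # t_14(5%), the table value quoted above
significant = abs(t) > critical  # two-tailed test at the 10% level

print(round(t, 3), nu, round(sp2, 3), significant)
```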

Example 13
A random sample of the heights, in cm, of sixth form boys and girls was taken with the following
results:
Boys’ heights: 152, 148, 147, 157, 158, 140, 141, 144
Girls’ heights: 142, 146, 132, 125, 138, 131, 143
a Carry out a two-sample t-test at the 5% significance level on these data to see whether the mean
height of boys exceeds the mean height of girls by more than 4 cm.
b State any assumptions that you have made.

Let x be the height of a boy and y be the height of a girl.
a H0: μx = μy + 4 H1: μx > μy + 4 State your hypotheses and write down the significance level.
Significance level = 0.05 (one-tailed test)
ν = 8 + 7 − 2 = 13 Find the number of degrees of freedom.
Critical value t13(5%) = 1.771 Look up the critical value in the table and write down the critical region.
The critical region is t > 1.771
Calculate $\bar{x}$, $\bar{y}$, $s_x^2$ and $s_y^2$.
Using a calculator gives:
For the boys: $\bar{x} = 148.375$, $s_x^2 = 46.554$, $n_x = 8$
For the girls: $\bar{y} = 136.714$, $s_y^2 = 57.905$, $n_y = 7$
Calculate a pooled estimate of the variance using $\dfrac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2}$.
\[ s_p^2 = \frac{(7 \times 46.554) + (6 \times 57.905)}{8 + 7 - 2} = 51.793 \]
So $s_p = 7.197$
Calculate t using $\dfrac{(\bar{x} - \bar{y}) - (\mu_x - \mu_y)}{s_p\sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}}$. μx − μy = 4 from hypothesis.
\[ t = \frac{(148.375 - 136.714) - 4}{7.197\sqrt{\dfrac{1}{8} + \dfrac{1}{7}}} = 2.057 \]
Always state whether you accept or reject H0 and draw a conclusion (in context if possible).
2.057 is in the critical region and so there is sufficient evidence to reject the null
hypothesis. The mean height of boys exceeds the mean height of girls by more than 4 cm.
b The assumptions made are that the two samples are independent, that the variances
of both populations are equal and that the populations are normally distributed.
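When only summary statistics are available, or when the null hypothesis specifies a non-zero difference as it does here, the same test can be run from n, the sample mean and s² alone. The sketch below is one way to do this; `pooled_t_from_summary` is a hypothetical helper name, the inputs are the calculator values quoted in the working above, and the critical value is the table value from the example.

```python
import math


def pooled_t_from_summary(n_x, mean_x, s2_x, n_y, mean_y, s2_y, diff0=0.0):
    """Pooled variance and t statistic for H0: mu_x - mu_y = diff0.

    Hypothetical helper; s2_x and s2_y are unbiased variance estimates.
    """
    sp2 = ((n_x - 1) * s2_x + (n_y - 1) * s2_y) / (n_x + n_y - 2)
    standard_error = math.sqrt(sp2) * math.sqrt(1 / n_x + 1 / n_y)
    t = (mean_x - mean_y - diff0) / standard_error
    return sp2, t


# Boys (x) and girls (y), with the hypothesised difference of 4 cm under H0.
sp2, t = pooled_t_from_summary(8, 148.375, 46.554, 7, 136.714, 57.905, diff0=4)

critical = 1.771  # t_13(5%), the table value quoted above
print(round(sp2, 3), round(t, 3), "reject H0" if t > critical else "do not reject H0")
```

Setting `diff0=0` recovers the usual test of equal means, so the same function covers both kinds of question.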

Exercise 7E
E 1 A random sample of size 20 from a normal population gave x̄ = 16, s² = 12.
A second random sample of size 11 from a normal population gave x̄ = 14, s² = 12.
a Assuming that both populations have the same variance, write down an unbiased estimate for
that variance. (1 mark)
b Test, at the 5% level of significance, the suggestion that the two populations have the same
mean.  (5 marks)

E 2 Salmon reared in Scottish fish farms are generally larger than wild salmon. A fisherman measured
the length of the first 6 wild salmon caught from a river. Their lengths in centimetres were:
42.8 40.0 38.2 37.5 37.0 36.5
Chefs prefer wild salmon to fish-farmed salmon because of their better flavour. A chef was
offered 4 salmon that were claimed to be wild. Their lengths in centimetres were:
42.0 43.0 41.5 40.0
Use the information given above and a suitable t-test at the 5% level of significance to help
the chef to decide if the claim is likely to be correct. You may assume that the populations are
normally distributed.  (8 marks)


E 3 In order to check the effectiveness of three drugs against the E. coli bacillus, 15 cultures of the
bacillus (five for each of three different antibiotics) had discs soaked in the antibiotics placed in
their centre. The 15 cultures were left for a time and the area in cm² per microgram of drug where
the E. coli was killed was measured. The results for the three different drugs are given below.
Streptomycin: 0.210 0.252 0.251 0.210 0.256 0.253
Tetracycline: 0.123 0.090 0.123 0.141 0.142 0.092
Erythromycin: 0.134 0.120 0.123 0.210 0.134 0.134
a It was thought that tetracycline and erythromycin seemed equally effective. Assuming that the
populations are normally distributed, test this at the 5% significance level. (8 marks)
b Streptomycin was thought to be more effective than either of the other two drugs. Treating the
other two as being a single sample of 12, test this assertion at the same level of significance.
 (7 marks)

E/P 4 To test whether a new version of a computer programming language enabled faster task
completion, the same task was performed by 16 programmers, divided at random into two
groups. The first group used the new version of the language, and the time for task completion,
in hours, for each programmer was as follows:
4.9 6.3 9.6 5.2 4.1 7.2 4.0
The second group used the old version, and their times were summarised as follows:
n = 9, ∑x = 71.2, ∑x² = 604.92
a State the null and alternative hypotheses.  (1 mark)
b Perform an appropriate test at the 5% level of significance. (7 marks)
In order to compare like with like, experiments such as this are often performed using the same
individuals in the first and the second groups.
c Give a reason why this strategy would not be appropriate in this case. (1 mark)

E/P 5 A company undertakes investigations to compare the fuel consumption, x, in miles per gallon,
of two different cars, the Volcera and the Spintono, with a view to purchasing a number as
company cars.
For a random sample of 12 Volceras the fuel consumption is summarised by
∑v = 384 ∑v² = 12 480
A statistician incorrectly combines the figures for the sample of 12 Volceras with those of a
random sample of 15 Spintonos, then carries out calculations as if they are all one larger sample
and obtains the results ȳ = 34 and s² = 23.
a Show that, for the sample of 15 Spintonos, ∑x = 534 and ∑x² = 19 330. (2 marks)
b Given that the variance of the fuel consumption for each make of car is σ², obtain an
unbiased estimate for σ². (3 marks)


c Test, at the 5% level of significance, whether there is a difference between the mean fuel
consumption of the two models of car. State your hypotheses and conclusion clearly. (7 marks)
d State any further assumption you made in order to be able to carry out your test in part c.
 (1 mark)
e Give two precautions which could be taken when undertaking an investigation into the fuel
consumption of two models of car to ensure that a fair comparison is made.  (2 marks)


E/P 6 A group of scientists is experimenting with different fertilisers. Fertiliser A is given to a crop of
potatoes and a sample of 11 plants is taken. The weight of potatoes, in kg, from each plant is
measured. Fertiliser B is given to a second crop of potatoes and a sample of 13 plants is taken.
The weight of potatoes, in kg, from each plant is also measured. The table shows the mean
weight of potatoes and unbiased estimates for the population variance in each case.
x̄ s² n
Fertiliser A 42.1 2.1 11
Fertiliser B 46.3 3.3 13

a Stating your hypotheses clearly, test, at the 10% level of significance, the hypothesis that there
is a difference in the variability of the weights. State any assumptions you make. (5 marks)
The scientists wish to test if there is a difference in the average weight of potatoes for each fertiliser.
b Explain how the result in part a can be used to justify the scientists using a two-sample
t-test to test their hypothesis. (1 mark)
c Stating your hypotheses clearly, test, at the 5% level of significance, whether there is a
difference in the average weight of potatoes. (5 marks)

Challenge
For samples of sizes $n_x$ and $n_y$ from populations X and Y with equal but
unknown variance σ², show that the pooled sample variance
\[ S_p^2 = \frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2} \]
is an unbiased estimator of σ². You may assume that the sample
variances $S_x^2$ and $S_y^2$ are each unbiased estimators for σ².

Mixed exercise 7

1 A random sample of 14 observations is taken from a normal distribution. The sample has a
mean x̄ = 30.4 and a sample variance s² = 36.
It is suggested that the population mean is 28. Test this hypothesis at the 5% level of significance.

2 A random sample of 8 observations is taken from a random variable X that is normally


distributed. The sample gave the following summary statistics:
∑x² = 970.25 ∑x = 85
The population mean is thought to be 10. Test this hypothesis against the alternative hypothesis
that the mean is greater than 10. Use the 5% level of significance.

E 3 Six eggs selected at random from the daily output of a brood of hens had the following weights
in grams:
55 50 53 53 52 54
Calculate 95% confidence intervals for:
a the mean (5 marks)
b the variance of the population from which these eggs were taken. (5 marks)
Hint: For part b, use the chi-squared distribution. ← Section 6.1
c What assumption have you made about the distribution of the weights of eggs? (1 mark)

A 4 A sample of size 18 was taken from a random variable X which was normally distributed,
producing the following summary statistics:
$\bar{x} = 9.8$, $s^2 = 0.49$
Calculate 95% confidence intervals for:
a the mean
b the variance of the population.

E 5 A manufacturer claims that the lifetime of its batteries is normally distributed with mean 21.5
hours. A laboratory tests 8 batteries and finds the lifetimes of these batteries to be as follows:
19.7 18.4 22.2 20.8 16.9 25.3 23.2 21.1
Stating clearly your hypotheses, examine whether or not these lifetimes indicate that the batteries
have a shorter mean lifetime than that claimed by the manufacturer.
Use a 5% level of significance. (6 marks)

E 6 A diabetic patient monitors his blood glucose in mmol/l at random times of the day over several
days. The following is a random sample of the results for this patient.
5.1 5.8 6.1 6.8 6.2 5.1 6.3 6.6 6.1 7.9 5.8 6.5
Assuming the data to be normally distributed, calculate a 95% confidence interval for:
a the mean of the population of blood glucose readings (6 marks)
b the standard deviation of the population of blood glucose readings. (6 marks)
The level of blood glucose varies throughout the day according to the consumption of food and
the amount of exercise taken during the day.
c Comment on the suitability of the patient’s method of data collection.  (1 mark)

E 7 In order to discover the possible error in using a stopwatch, a student started the watch and
stopped it again as quickly as she could. The times taken in centiseconds for 6 such attempts are
recorded below:
10 13 14 10 13 9
Assuming that the times are normally distributed, find 95% confidence limits for:
a the mean (6 marks)
b the variance. (6 marks)

E 8 A manufacturer claims that the car batteries which it produces have a mean lifetime of 24
months, with a standard deviation of 4 months. A garage selling the batteries doubts this claim
and suggests that both values are in fact higher.
The garage monitors the lifetimes of 10 randomly selected batteries and finds that they have a
mean lifetime of 27.2 months and a standard deviation of 5.2 months.
Stating clearly your hypotheses and using a 5% level of significance, test the claim made by the
manufacturer for:
a the standard deviation (6 marks)
b the mean. (6 marks)
c State an assumption which has to be made when carrying out these tests. (1 mark)

A/E 9 The distance to takeoff from a standing start of an aircraft was measured on twenty
occasions. The results are summarised in the table.

Distance (m)   Frequency
700–           3
710–           5
720–           9
730–           2
740–750        1

Assuming that distance to takeoff is normally distributed, find 95% confidence intervals for:
a the mean (6 marks)
b the standard deviation. (6 marks)
It has been hypothesised that the mean distance to takeoff is 725 m.
c Comment on this hypothesis in the light of your interval from part a.  (1 mark)

E 10 A company knows from previous experience that the mean time taken by maintenance
engineers to repair a particular electrical fault on a complex piece of electrical equipment is
3.5 hours, with a standard deviation of 0.5 hours.
A new method of repair has been devised, but before converting to this new method the
company took a random sample of 10 of its engineers and each engineer carried out a repair
using the new method. The time, x hours, it took each of them to carry out the repair was
recorded and the data are summarised below:
$\sum x = 34.2$, $\sum x^2 = 121.6$
Assume that the data can be regarded as a random sample from a normal population.
a For the new repair method, calculate an unbiased estimate of the variance. (2 marks)
b Use your estimate from part a to calculate, for the new repair method, a 95% confidence
interval for:
i the mean
ii the standard deviation.  (10 marks)
c Use your calculations and the given data to compare the two repair methods in order to
advise the company as to which method to use.  (2 marks)
d Suggest an alternative way of comparing the two methods of repair using the 10 randomly
chosen engineers.  (1 mark)

E/P 11 A random sample of 60 female raccoons is taken and their heights recorded. The sample mean is
found to be 24 cm and an unbiased estimate for the population variance is found to be 2.1 cm².
a Given that the underlying population is normally distributed, find a 90% confidence interval
for the mean height of female raccoons. State clearly the approximating distribution you
have used to determine this confidence interval. (5 marks)
A second random sample of 6 male raccoons is taken and their heights recorded. The sample
mean is found to be 27 cm and an unbiased estimate for the population variance is found to
be 2.7 cm².
A hypothesis test is to be carried out to test if the mean height of male raccoons is greater
than 25 cm.
b Explain why the approximating distribution used in part a is no longer valid when carrying
out this test. (1 mark)
c Test, at the 5% level of significance, the hypothesis that male raccoons have an average height
greater than 25 cm. State your hypotheses clearly. (5 marks)

A/E 12 A chemist has developed a fuel additive and claims that it reduces the fuel consumption of cars.
To test this claim, 8 randomly selected cars were each filled with 20 litres of fuel and driven
around a race circuit. Each car was tested twice, once with the additive and once without.
The distances, in miles, that each car travelled before running out of fuel are given in the table
below.

Car                         1    2    3    4    5    6    7    8
Distance without additive   163  172  195  170  183  185  161  176
Distance with additive      168  185  187  172  180  189  172  175

Assuming that the distances travelled follow a normal distribution and stating your hypotheses
clearly, test, at the 10% level of significance, whether or not there is evidence to support the
chemist’s claim. (7 marks)

E/P 13 A farmer set up a trial to assess the effect of two different diets on the increase in the weight of
his lambs. He randomly selected 20 lambs. Ten of the lambs were given diet A and the other
10 lambs were given diet B. The gain in weight, in kg, of each lamb over the period of the trial
was recorded.
a State why a paired t-test is not suitable for use with these data. (1 mark)
b Suggest an alternative method for selecting the sample which would make the use of a paired
t-test valid.  (1 mark)
c Suggest two other factors that the farmer might consider when selecting the
sample.  (2 marks)
The following paired data were collected.
Diet A 5 6 7 4.6 6.1 5.7 6.2 7.4 5 3
Diet B 7 7.2 8 6.4 5.1 7.9 8.2 6.2 6.1 5.8

d Using a paired t-test at the 5% significance level, test whether or not there is evidence of
a difference in the weight gained by the lambs using diet A compared with those using
diet B.  (7 marks)
e State, giving a reason, which diet you would recommend the farmer to use for his
lambs.  (1 mark)

E 14 A medical student is investigating two methods of taking a person’s blood pressure. He takes a
random sample of 10 people and measures their blood pressure using an arm cuff and a finger
monitor. The table below shows the blood pressure for each person, measured by each method.
Person A B C D E F G H I J
Arm cuff 140 110 138 127 142 112 122 128 132 160
Finger monitor 154 112 156 152 142 104 126 132 144 180

a Use a paired t-test to determine, at the 10% level of significance, whether or not there
is a difference in the mean blood pressure measured using the two methods. State your
hypotheses clearly.  (7 marks)
b State an assumption about the underlying distribution of measured blood pressure
required for this test. (1 mark)

A/E 15 The weights, in grams, of mice are normally distributed. A biologist takes a random sample of
10 mice. She weighs each mouse and records its weight.
The 10 mice are then fed on a special diet. They are weighed again after two weeks.
Their weights in grams are as follows:
Mouse A B C D E F G H I J
Weight before diet 50.0 48.3 47.5 54.0 38.9 42.7 50.1 46.8 40.3 41.2
Weight after diet 52.1 47.6 50.1 52.3 42.2 44.3 51.8 48.0 41.9 43.6

Stating your hypotheses clearly, and using a 1% level of significance, test whether or not the
diet causes an increase in the mean weight of the mice. (7 marks)

E/P 16 A hospital department installed a new, more sophisticated, piece of equipment to replace an
ageing one in order to speed up the treatment of patients. The treatment times of random
samples of patients during the last week of operation of the old equipment and during the first
week of operation of the new equipment were recorded. The summary results, in minutes, are
shown in the table.
n 2 x x
a Show that the values of s2 for the old and Old equipment 10 225 5136.3
new equipment are 8.2 and 14.5 respectively. New equipment 9 234 6200.0
 (2 marks)
Stating clearly your hypotheses, test:
b whether the variance of the times using the new equipment is greater than the variance of
the times using the old equipment, using a 5% significance level (6 marks)
c whether there is a difference between the mean times for treatment using the new
equipment and the old equipment, using a 2% significance level. (6 marks)
d Find 95% confidence limits for the mean difference in treatment times between the new
and old equipment. (5 marks)
Even if the new equipment would eventually lead to a reduction in treatment times, it might be
that to begin with treatment times using the new equipment would be higher than those using
the old equipment.
e Give one reason why this might be so. (1 mark)
f Suggest how the comparison between the old and new equipment could be
improved. (1 mark)

E/P 17 Two different drugs designed to increase the red blood cell count are administered to two
groups of patients. A sample of 25 patients who took the first drug, A, is taken and the red
blood cell count, in million cells per microlitre, is recorded.
A sample, of size 19, is then taken from the patients who took drug B. The table shows the
mean red blood cell count and unbiased estimates for the population variance in each case.

          $\bar{x}$   $s^2$   n
Drug A    5.9         2.6     25
Drug B    4.8         1.7     19

a Stating your hypotheses clearly, test, at the 10% level of significance, the hypothesis
that there is a difference in the variability of the red blood cell counts.
State any assumptions you make. (5 marks)

Doctors wish to find a confidence interval for the difference in the mean red blood cell counts
for the two drugs.
b With reference to your answer to part a, comment on the suitability of using a t-distribution
to find this confidence interval. (1 mark)
c Find, correct to 3 significant figures, a 95% confidence interval for the difference in the mean
red blood cell count. (5 marks)

Challenge
Three independent random samples of sizes $n_x$, $n_y$ and $n_z$ (where $n_x, n_y, n_z > 1$)
are taken from three populations X, Y and Z respectively, where X, Y and Z
have equal but unknown variance $\sigma^2$.
a Given that the sample variances $S_x^2$, $S_y^2$ and $S_z^2$ are unbiased estimators
for $\sigma^2$, find a pooled estimator $S_p^2$ for $\sigma^2$ based on all three samples, giving
your answer in terms of $n_x$, $n_y$, $n_z$, $S_x^2$, $S_y^2$ and $S_z^2$.
b Show that the estimator found in part a is unbiased.

Summary of key points


1 If a random sample $X_1, X_2, \ldots, X_n$ is selected from a normal distribution with mean $\mu$ and
unknown variance $\sigma^2$ then
\[ t = \frac{\bar{X} - \mu}{S / \sqrt{n}} \]
has a $t_{n-1}$-distribution, where $S^2$ is an unbiased estimator of $\sigma^2$.

2 In general, for a small sample of size $n$ from a normal distribution $N(\mu, \sigma^2)$ with unknown mean
and variance:
• the $100(1 - \alpha)\%$ confidence limits for the population mean are
\[ \bar{x} \pm t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}} \]
• the $100(1 - \alpha)\%$ confidence interval for the population mean is
\[ \left( \bar{x} - t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}},\ \bar{x} + t_{n-1}\left(\frac{\alpha}{2}\right) \times \frac{s}{\sqrt{n}} \right) \]
3 In a paired experiment with a mean of the differences between the samples of $\bar{D}$,
\[ \frac{\bar{D} - \mu_D}{S / \sqrt{n}} \sim t_{n-1} \]

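
Key points 1 to 3 can be sketched in Python as follows. This is a minimal illustration rather than part of the original text: the function names and the sample values are hypothetical, and the critical value `t_crit` is assumed to be read from t-distribution tables rather than computed.

```python
import math
import statistics


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for testing H0: mu = mu0."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # square root of the unbiased variance estimate
    return (x_bar - mu0) / (s / math.sqrt(n)), n - 1


def mean_confidence_interval(sample, t_crit):
    """Return (lower, upper) limits: x_bar -/+ t_crit * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    half_width = t_crit * statistics.stdev(sample) / math.sqrt(n)
    return x_bar - half_width, x_bar + half_width


def paired_t_statistic(first, second, mu_d=0.0):
    """Paired test: apply the one-sample statistic to the differences."""
    differences = [b - a for a, b in zip(first, second)]  # second minus first
    return t_statistic(differences, mu_d)


# Made-up data, for illustration only.
sample = [12.1, 11.6, 12.4, 11.9, 12.5, 12.1]
t, df = t_statistic(sample, mu0=12.0)
# 2.571 is the tabulated two-tailed 5% critical value for 5 degrees of freedom.
low, high = mean_confidence_interval(sample, t_crit=2.571)
print(t, df, low, high)
```

The computed statistic is compared with the tabulated critical value for the stated degrees of freedom. For a paired test the degrees of freedom are one less than the number of pairs.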

4 If a random sample of $n_x$ observations is taken from a normal distribution with unknown
variance $\sigma^2$, and an independent sample of $n_y$ observations is taken from a normal distribution
that also has unknown variance $\sigma^2$, then a pooled estimate for $\sigma^2$ is
\[ s_p^2 = \frac{(n_x - 1)s_x^2 + (n_y - 1)s_y^2}{n_x + n_y - 2} \]
where
\[ s_x^2 = \frac{\sum x^2 - n_x \bar{x}^2}{n_x - 1} \quad \text{and} \quad s_y^2 = \frac{\sum y^2 - n_y \bar{y}^2}{n_y - 1} \]
5 If a random sample of $n_x$ observations is taken from a normal distribution that has unknown
variance $\sigma^2$, and an independent sample of $n_y$ observations is taken from a normal distribution
with equal variance, then
\[ \frac{(\bar{X} - \bar{Y}) - (\mu_x - \mu_y)}{S_p \sqrt{\dfrac{1}{n_x} + \dfrac{1}{n_y}}} \sim t_{n_x + n_y - 2} \]
where
\[ S_p^2 = \frac{(n_x - 1)S_x^2 + (n_y - 1)S_y^2}{n_x + n_y - 2} \]


6 The confidence limits for the difference between two means from independent normal
distributions, X and Y, when the variances are equal but unknown are given by
\[ (\bar{x} - \bar{y}) \pm t_c \, s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \]
where $s_p^2$ is the pooled estimate of the population variance, and $t_c$ is the relevant value taken
from the t-distribution tables.
The confidence interval is given by
\[ \left( (\bar{x} - \bar{y}) - t_c \, s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}},\ (\bar{x} - \bar{y}) + t_c \, s_p \sqrt{\frac{1}{n_x} + \frac{1}{n_y}} \right) \]
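
Key points 4 to 6 can be sketched in the same way. Again this is an illustrative sketch rather than part of the original text: the function names and data are hypothetical, and `t_crit` is assumed to come from t-distribution tables with $n_x + n_y - 2$ degrees of freedom.

```python
import math
import statistics


def pooled_variance(x, y):
    """Pooled estimate of the common variance of two independent samples."""
    nx, ny = len(x), len(y)
    sx2 = statistics.variance(x)  # unbiased estimate, divisor nx - 1
    sy2 = statistics.variance(y)  # unbiased estimate, divisor ny - 1
    return ((nx - 1) * sx2 + (ny - 1) * sy2) / (nx + ny - 2)


def standard_error(x, y):
    """s_p * sqrt(1/nx + 1/ny), the denominator of the two-sample statistic."""
    return math.sqrt(pooled_variance(x, y) * (1 / len(x) + 1 / len(y)))


def two_sample_t(x, y, mu_diff=0.0):
    """Return (t, degrees of freedom) for H0: mu_x - mu_y = mu_diff."""
    diff = statistics.mean(x) - statistics.mean(y)
    return (diff - mu_diff) / standard_error(x, y), len(x) + len(y) - 2


def difference_confidence_interval(x, y, t_crit):
    """Return (lower, upper) limits for mu_x - mu_y."""
    diff = statistics.mean(x) - statistics.mean(y)
    half_width = t_crit * standard_error(x, y)
    return diff - half_width, diff + half_width


# Made-up data, for illustration only.
x = [5.2, 4.8, 6.1, 5.5, 5.0]
y = [4.1, 4.9, 4.4, 5.0, 4.6, 4.3]
t, df = two_sample_t(x, y)
# 2.262 is the tabulated two-tailed 5% critical value for 9 degrees of freedom.
print(t, df, difference_confidence_interval(x, y, t_crit=2.262))
```

These functions assume equal population variances, so a preliminary check of that assumption is appropriate before relying on the pooled estimate.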
