Sample Mean Distribution
You could work out the probability distribution for the sample means, by calculating
the probability of each possible value for the mean. What sort of shape would this
probability distribution have, and what would be the mean and standard deviation of
the distribution?
You can investigate the distribution of sample means using a simple example:
throwing an ordinary, fair die. This means that you are dealing with the population
{1, 2, 3, 4, 5, 6}. Throwing one die is equivalent to taking a sample of size 1 from the
population; throwing two dice is equivalent to taking a sample of size 2 from the
population, and so on.
Samples of size 1
If you throw one die, then there are six possible samples you could obtain:
{1}, {2}, {3}, {4}, {5} and {6}.
Each of these samples is equally likely to occur. The sample mean in each case is,
of course, just the value of the score on the die.
So the probability distribution of the sample means for a sample of size 1 is:
x          1     2     3     4     5     6
P(X = x)   1/6   1/6   1/6   1/6   1/6   1/6
1 of 10 07/02/19 © MEI
integralmaths.org
MEI A level Hypothesis testing 1 Notes and examples
[Figure: graph of probability p against sample mean x for samples of size 1]
Samples of size 2
If you throw two dice, then there are 36 possible samples you could obtain (some of
which are the same, e.g. {1, 2} and {2, 1}).
The table below shows the possible values of the sample mean.
      1     2     3     4     5     6
1     1     1.5   2     2.5   3     3.5
2     1.5   2     2.5   3     3.5   4
3     2     2.5   3     3.5   4     4.5
4     2.5   3     3.5   4     4.5   5
5     3     3.5   4     4.5   5     5.5
6     3.5   4     4.5   5     5.5   6
So the probability distribution of the sample means for a sample of size 2 is:
[Figure: graph of probability p against sample mean x for samples of size 2]
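The counting behind this distribution can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name `sample_mean_distribution` and its parameters are made up for the example, and it assumes every ordered sample is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def sample_mean_distribution(n, faces=(1, 2, 3, 4, 5, 6)):
    """Return {sample mean: probability} for n throws of a fair die.

    Illustrative helper (hypothetical name), not taken from the notes.
    """
    # Every ordered sample, e.g. (1, 2) and (2, 1), is equally likely.
    samples = list(product(faces, repeat=n))
    # Count how many ordered samples give each value of the mean.
    counts = Counter(Fraction(sum(s), n) for s in samples)
    total = len(samples)
    return {mean: Fraction(count, total) for mean, count in sorted(counts.items())}


# Distribution of the sample mean for samples of size 2.
for mean, prob in sample_mean_distribution(2).items():
    print(f"{float(mean):.1f}  {prob}")
```

Calling the same function with n = 3 enumerates all 216 ordered samples in the same way. Fractions are printed in lowest terms, so 2/36 appears as 1/18.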
Samples of size 3
If you throw three dice, then there are 216 possible samples you could obtain (again,
some are the same, such as {1, 1, 2}, {1, 2, 1} and {2, 1, 1}).
If a complete list is made of all the possible samples, and the sample mean
calculated for each, you can find the probability distribution of the sample mean in
the same way as for samples of size 2.
The probability distribution of the sample means for a sample of size 3 is:
z         P(Z = z)
1         1/216
1 1/3     3/216
1 2/3     6/216
2         10/216
2 1/3     15/216
2 2/3     21/216
3         25/216
3 1/3     27/216
3 2/3     27/216
4         25/216
4 1/3     21/216
4 2/3     15/216
5         10/216
5 1/3     6/216
5 2/3     3/216
6         1/216
[Figure: graph of probability p against sample mean x for samples of size 3]
Comparing the distributions for samples of size 1, 2 and 3, you can see that whereas
a sample of size 1 has a uniform distribution, for samples of size 2 and 3 the
distribution has a peak in the centre corresponding to the mean value of 3.5.
In addition, the distribution for sample size 2 is triangular, whereas the one for
sample size 3 is more “bell-shaped”, suggesting that the standard deviation is
smaller. In fact, this trend continues with larger sample sizes.
We have used the theoretical distribution of throwing a die to model the outcomes of
sampling from a very simple population (the numbers 1, 2, 3, 4, 5 and 6). The mean
(3.5) and standard deviation ($\sqrt{\frac{35}{12}}$) of the distribution for sample size 1
are the same as the population mean, μ, and standard deviation, σ (the population
standard deviation is calculated using divisor n, since we are dealing with a complete
population). All three probability distributions have mean 3.5, which is the same as
the population mean μ.
The standard deviation of the distribution for sample size 2 is $\sqrt{\frac{35}{24}}$, which
can be written as $\frac{\sigma}{\sqrt{2}}$. The standard deviation of the distribution for
sample size 3 is $\sqrt{\frac{35}{36}}$, which can be written as $\frac{\sigma}{\sqrt{3}}$.
Notice that the standard deviation of the distribution of sample means (sometimes
called the standard error of the mean) is smaller than the population standard
deviation and decreases as the sample size increases.
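The pattern in these standard deviations can be checked numerically. The sketch below is an illustration rather than part of the notes: `sd_of_sample_means` is a made-up helper that lists every ordered sample of n die throws, computes the standard deviation of the sample means with divisor equal to the number of samples, and compares it with σ divided by the square root of n.

```python
from fractions import Fraction
from itertools import product
from math import sqrt

FACES = (1, 2, 3, 4, 5, 6)


def sd_of_sample_means(n):
    """Standard deviation of the mean of n fair-die throws, by full enumeration.

    Illustrative helper (hypothetical name), not taken from the notes.
    """
    means = [Fraction(sum(s), n) for s in product(FACES, repeat=n)]
    centre = sum(means) / len(means)
    # Divisor is the number of equally likely samples (a complete population).
    variance = sum((m - centre) ** 2 for m in means) / len(means)
    return sqrt(variance)


sigma = sd_of_sample_means(1)  # population standard deviation, sqrt(35/12)
for n in (1, 2, 3):
    # The two printed values on each line should agree.
    print(n, round(sd_of_sample_means(n), 4), round(sigma / sqrt(n), 4))
```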
In this topic we are assuming that the underlying population has a Normal
distribution.
Given a population X with a mean of μ and a standard deviation of σ,
i.e. $X \sim N(\mu, \sigma^2)$, if a sample of size n is taken then the distribution of the
sample means is given by $\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$.
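The mean and variance of the sample mean can be justified with a short derivation. This is a sketch that is not in the original notes. It assumes that the n observations, written here as $X_1, \dots, X_n$ (notation introduced for this example), are independent and each have mean μ and variance σ².

```latex
\begin{align*}
\bar{X} &= \frac{1}{n}\sum_{i=1}^{n} X_i \\
E(\bar{X}) &= \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\cdot n\mu = \mu \\
\operatorname{Var}(\bar{X}) &= \frac{1}{n^2}\sum_{i=1}^{n} \operatorname{Var}(X_i)
  = \frac{1}{n^2}\cdot n\sigma^2 = \frac{\sigma^2}{n}
\end{align*}
```

Taking the square root of the variance gives the standard deviation of the sample mean as $\sigma/\sqrt{n}$, which matches the values found for the dice.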
You can therefore use the skills learnt when working with the Normal distribution to
calculate probabilities with a sample mean.
Take care not to confuse the theoretical distribution with a practical experiment.
If you are conducting a biology experiment, you will normally be collecting just one
sample of data. When analysing the results, you are using the theory of the
theoretical distribution of sample means.
As you saw in the work on the Normal distribution, any Normal distribution
$X \sim N(\mu, \sigma^2)$ can be transformed to the standard Normal distribution $Z \sim N(0, 1)$.
The variable X has mean μ and standard deviation σ, so x, a particular value of X,
is transformed into z by the formula:
$z = \frac{x - \mu}{\sigma}$
So for the distribution of the sample means, $\bar{X}$, you can standardise by using
$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
Hypothesis tests
You have already met hypothesis tests involving the binomial distribution
B(n, p), in which you test whether the population parameter p takes a particular
hypothesised value.
You will now look at hypothesis tests using the Normal distribution $N(\mu, \sigma^2)$, in
which you test whether the population mean takes a particular value.
In the test, you are assuming that the value of the population mean is the one given
in the null hypothesis, and then considering the value of the sample mean. If your
sample mean is too far away from the assumed population mean, then you conclude
that as it is very unlikely that a randomly chosen sample would have such a high (or
low) sample mean, the population mean does not in fact have the value that you
assumed it to have. This means that you are rejecting the null hypothesis.
There are two main approaches that can be used in the hypothesis test. They are
equivalent but you should know both.
Example 1
Test results are normally distributed with a mean of 65 and a standard deviation of 10. After
the introduction of a dynamic new teacher the results for a group of 8 students had a mean of
72. Is there evidence that the results have significantly improved at a 5% level of
significance?
Solution
(You want to see if the results could have come from a distribution where the population
mean has remained unchanged.)
H0 : μ = 65
H1 : μ > 65
where μ is the population mean test score.
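The p-value for this test can be sketched with Python's standard library. This is an illustration of the calculation, not the working from the original notes, and the variable names are chosen for the example. Under H0 the sample mean of 8 results is taken to follow N(65, 10²/8).

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 65, 10, 8   # values under H0, taken from the question
sample_mean = 72

# Under H0 the sample mean is distributed N(mu_0, sigma^2 / n).
sampling_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

# Upper-tail probability of a sample mean at least as large as the one observed.
p_value = 1 - sampling_dist.cdf(sample_mean)
print(round(p_value, 4))
```

The printed value should agree with the p-value quoted in the conclusion that follows.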
Since 0.0239 < 0.05 (the required significance level of 5%) the null hypothesis is rejected.
There is evidence to suggest that the mean score has increased, i.e. the teacher has had some
effect.
You can use your calculator to find the critical value (the boundary of the critical
region). For an alternative hypothesis of the form H1: μ > m, you are looking at the
right-hand tail, so for a 5% significance level you need the inverse Normal value for
0.95 for $N\left(m, \frac{\sigma^2}{n}\right)$.
Example 2
Test results are normally distributed with a mean of 65 and a standard deviation of 10. After
the introduction of a dynamic new teacher the results for a group of 8 students had a mean of
72. Is there evidence that the results have significantly improved at a 5% level of
significance?
Solution
H0 : μ = 65
H1 : μ > 65
where μ is the population mean test score.
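The critical-region approach for this example can be sketched as follows. This is an illustration under the same assumptions as the question (σ = 10 known, n = 8), not the working from the original notes. The variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 65, 10, 8   # values under H0, taken from the question
sample_mean = 72

# Distribution of the sample mean under H0: N(mu_0, sigma^2 / n).
sampling_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

# Upper-tail test at the 5% level: inverse Normal of 0.95.
critical_value = sampling_dist.inv_cdf(0.95)
in_critical_region = sample_mean > critical_value

print(round(critical_value, 2), in_critical_region)
```

If the sample mean lies in the critical region, H0 is rejected. This gives the same decision as comparing the p-value with the significance level.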
Notice from the example above that the conclusion should always be given in terms
of the problem. First state whether H0 is to be accepted or rejected, then make a
statement beginning “there is evidence to suggest that …” or “there is not sufficient
evidence to suggest that …”. You should NOT write “this proves that ….” or “so the
claim is right”. You are not proving anything, only considering evidence.
In reality the standard deviation of the parent population will usually not be known.
So in this case the standard deviation will have to be estimated from the sample
data.
In order for us to proceed with the same style of analysis we require the sample size
to be sufficiently large. It is usual to require the sample size n to be 30 or above.
Given a Normal population X with a mean of μ and unknown standard deviation, the
sampling distribution of the mean is:
$\bar{X} \sim N\left(\mu, \frac{s^2}{n}\right)$
where $s^2$ is the estimated variance from the sample data.
Example 3
The time taken for a bus to go from Oundle to Thrapston is normally distributed with a mean
time of 18 minutes. A new roundabout is introduced, which it is hoped will speed up the
journey.
A large number of observations are taken, following complaints from students that the
journey is now taking longer than 18 minutes.
From the 50 observations, the mean was found to be 19.1 minutes, with a sample standard
deviation of 5 minutes.
Investigate the students’ complaint: state suitable null and alternative hypotheses for the
test and carry out the test at the 5% level of significance, stating your conclusion carefully.
Solution
H0 : μ = 18
H1 : μ > 18
where μ is the population mean journey time.
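One way to carry out this test is to standardise the sample mean using the sample standard deviation and find the upper-tail probability. The sketch below is an illustration, not the working from the original notes. The helper name `upper_tail_p_value` is made up for the example, and the large sample (n = 50) is what justifies using s in place of σ.

```python
from math import sqrt
from statistics import NormalDist


def upper_tail_p_value(sample_mean, mu_0, s, n):
    """P(sample mean >= observed) under H0, using s as the estimate of sigma.

    Illustrative helper (hypothetical name), not taken from the notes.
    """
    z = (sample_mean - mu_0) / (s / sqrt(n))   # standardised test statistic
    return 1 - NormalDist().cdf(z)             # standard Normal upper tail


p_value = upper_tail_p_value(sample_mean=19.1, mu_0=18, s=5, n=50)
reject_h0 = p_value < 0.05                     # 5% significance level

print(round(p_value, 4), reject_h0)
```

With these figures the p-value comes out at roughly 0.06, just above 0.05, so this sketch does not reject H0: there is not sufficient evidence to suggest that the mean journey time is longer than 18 minutes.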
Example 4
The supplier of LITE light bulbs claims that the mean life of a LITE light bulb is 130 hours.
Responding to customer complaints that the light bulbs did not last as long as expected, a
trading standards organisation tested 4000 bulbs and found that the mean was 128.5 hours and
the sample standard deviation was 13 hours.
Is there evidence at a 2% level that the mean is lower than 130 hours?
Solution
H0 : μ = 130
H1 : μ < 130
where μ is the population mean lifetime.
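Because the alternative hypothesis is μ < 130, the critical region is in the lower tail. A minimal sketch of the critical-region calculation follows. It is an illustration rather than the original working, it uses the sample standard deviation as the estimate of σ, and the variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

mu_0, s, n = 130, 13, 4000   # hypothesised mean, sample sd, sample size
sample_mean = 128.5

# Approximate distribution of the sample mean under H0: N(mu_0, s^2 / n).
sampling_dist = NormalDist(mu=mu_0, sigma=s / sqrt(n))

# Lower-tail test at the 2% level: inverse Normal of 0.02.
critical_value = sampling_dist.inv_cdf(0.02)
in_critical_region = sample_mean < critical_value

print(round(critical_value, 2), in_critical_region)
```

The sample is so large that the critical value is only a little below 130, so a sample mean of 128.5 lies well inside the critical region and H0 is rejected: there is evidence to suggest that the mean lifetime is less than 130 hours.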
However, sometimes you will need to look at situations where the alternative
hypothesis is of the form μ ≠ k, in which you are testing whether the mean is as
stated or not, without specifying in which direction it is likely to be wrong. A test like
this is a two-tailed test, as you are looking at both tails of the distribution.
In a two-tailed test, there are two parts to the critical region. If you are asked to give
the critical region for a test, you must give both parts. However, if you are just asked
to carry out the hypothesis test, you need only look at the relevant tail, depending on
whether the sample mean is higher or lower than the value given in the null
hypothesis. At the 5% significance level, you find the lower tail critical region using
the inverse Normal of 0.025, and the upper tail critical region using the inverse
Normal of 0.975, so that the two tails correspond to a total probability of 5%.
Similarly, if you are using p-values, you compare with half the significance level,
since you are looking at just the relevant tail.
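The two parts of a two-tailed critical region can be found with one small function. This is a sketch rather than part of the notes: the name `two_tailed_critical_values` and its parameters are made up for the example, and it assumes σ is known.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_critical_values(mu_0, sigma, n, significance=0.05):
    """Return (lower, upper) critical values for the sample mean.

    Illustrative helper (hypothetical name), not taken from the notes.
    """
    dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))
    half = significance / 2                    # half the level in each tail
    return dist.inv_cdf(half), dist.inv_cdf(1 - half)


# Standard Normal case (mu_0 = 0, sigma = 1, n = 1) at the 5% level.
lower, upper = two_tailed_critical_values(0, 1, 1)
print(round(lower, 2), round(upper, 2))
```

The critical region is then every sample mean below the lower value or above the upper value.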
Example 5
The lengths of the leaves of a certain species of rare plant are Normally distributed with mean
8.6 cm and standard deviation 1.2 cm. A botanist finds a clump of plants and wants to find
out whether they are of the rare species. She collects and measures 50 leaves and finds that
the total of their lengths is 442 cm. Carry out a test at the 5% level. What should the botanist
conclude?
Solution
This is a two-tailed test, as the alternative hypothesis is that the mean is not 8.6, rather than
being specifically more or less than 8.6.
H0: μ = 8.6
H1: μ ≠ 8.6
where μ is the population mean leaf length.
(In this test, we are looking for evidence that the plants are not of the rare species.)
$\bar{x} = \frac{442}{50} = 8.84$
Let X be the length of a leaf. Under H0, the distribution of the sample mean is
$\bar{X} \sim N\left(8.6, \frac{1.2^2}{50}\right)$
As the sample mean is greater than 8.6, we are looking at the right-hand tail.
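For the p-value approach, the right-hand tail probability can be sketched as below. This is an illustration of the calculation, not the working from the original notes, and the variable names are chosen for the example.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma, n = 8.6, 1.2, 50     # values under H0, taken from the question
sample_mean = 442 / n             # total length divided by number of leaves

# Distribution of the sample mean under H0: N(8.6, 1.2^2 / 50).
sampling_dist = NormalDist(mu=mu_0, sigma=sigma / sqrt(n))

# Right-hand tail probability, compared with 2.5% for a two-tailed 5% test.
p_value = 1 - sampling_dist.cdf(sample_mean)
reject_h0 = p_value < 0.05 / 2

print(round(p_value, 4), reject_h0)
```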
Since 0.0787 > 0.025 (the required significance level of 2.5% in each tail) the null hypothesis
is accepted. There is not sufficient evidence to suggest that the plants are not of the rare
species. (This is not the same as evidence that they are of the rare species.)
Method 2: Using critical regions
The critical value for the upper tail is found using the inverse Normal of 0.975.
For $N\left(8.6, \frac{1.2^2}{50}\right)$ this is 8.93.
The critical region is $\bar{X} > 8.93$.
Since the sample mean of 8.84 is not in the critical region, the null hypothesis is accepted.
There is not sufficient evidence to suggest that the plants are not of the rare species.