Statistical and Data Handling Skills in Biology

Roland Ennos
Faculty of Life Sciences, University of Manchester
Pearson Education Limited
Edinburgh Gate
Harlow
Essex CM20 2JE
England
The right of Roland Ennos to be identified as author of this Work has been asserted by him
in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval
system, or transmitted in any form or by any means, electronic, mechanical, photocopying,
recording or otherwise, without either the prior written permission of the publisher or a
licence permitting restricted copying in the United Kingdom issued by the Copyright
Licensing Agency Ltd, Saffron House, 6‐10 Kirby Street, London EC1N 8TS.
All trademarks used herein are the property of their respective owners. The use of any
trademark in this text does not vest in the author or publisher any trademark ownership
rights in such trademarks, nor does the use of such trademarks imply any affiliation
with or endorsement of this book by such owners.
Pearson Education is not responsible for the content of third‐party internet sites.
ISBN 978‐0‐273‐72949‐5
Typeset in 9/12.5 by 75
Printed by Ashford Colour Press Ltd., Gosport
Dedication
For my father
Brief contents
1 An introduction to statistics 1
2 Dealing with variability 10
3 Testing for normality and transforming data 31
4 Testing for differences between one or two groups 43
5 Testing for differences between more than two groups: ANOVA and
its non‐parametric equivalents 83
6 Investigating relationships 132
7 Dealing with categorical data 164
8 Designing experiments 187
9 More complex statistical analysis 203
10 Dealing with measurements and units 213
Glossary 228
Further reading 232
Solutions 233
Statistical tables 258
Index 267
Contents
1 An introduction to statistics 1
1.1 Becoming a research biologist 1
1.2 Awkward questions 1
1.3 Why do biologists have to repeat everything? 2
1.4 Why do biologists have to bother with statistics? 3
1.5 Why is statistical logic so strange? 3
1.6 Why are there so many statistical tests? 4
1.7 Using the decision chart 6
1.8 Using this book 8
Glossary 228
Further reading 232
Solutions 233
Index 267
Supporting resources
Visit www.pearsoned.co.uk/ennos to find valuable online resources
For more information please contact your local Pearson Education sales
representative or visit www.pearsoned.co.uk/ennos
List of figures and tables
Figures
1.1 Decision chart for statistical tests 7
1.2 Flow chart showing how to deal with measurements
and rank data 8
2.1 Methods of presenting the distribution of a sample 11
2.2 Different ways in which data may be distributed 12
2.3 Measurements of the distribution of data 13
2.4 A normal distribution 14
2.5 Length distributions for a randomly breeding population of rats 15
2.6 The effect of sample size 18
2.7 Normal distribution and t distribution 20
2.8 Changes in the mean and 95% confidence intervals for the
mass of the bull elephants from example 2.2 after different
numbers of observations 21
2.9 Presenting descriptive statistics using error bars 23
4.1 Sample means different from an expected value 44
4.2 Mean (± standard error) of the pH of the nine ponds at dawn
and dusk 56
4.3 Overlapping populations 57
4.4 The mean (± standard error) of the masses of 16 bull and
16 cow elephants 63
4.5 Box and whisker plot showing the levels of acne of patients
before and after treatment 74
4.6 Box and whisker plot showing the numbers of beetles caught
in traps in the two fields 79
5.1 The rationale behind ANOVA: hypothetical weights for two
samples of fish 85
5.2 Two contrasting situations 86
5.3 Bar chart showing the means with standard error bars of the
diameters of bacterial colonies subjected to different antibiotic
treatments 95
5.4 Mean sweating rates of soldiers before, during and after exercise 101
5.5 Box and whisker plot showing the medians, quartiles and range
of the test scores of children who had taken different CAL
packages 106
5.6 Box and whisker plot showing the medians, quartiles and range
of the numbers of different flavoured pellets eaten by birds 112
5.7 The yields of wheat grown in a factorial experiment with or
without nitrate and phosphate 117
5.8 Box and whisker plot showing the medians, quartiles and
range of the numbers of snails given the different nitrate and
phosphate treatments 122
5.9 Mean ± standard error lengths of the lice found on fish in
fresh water and sea water 129
6.1 The relationship between the age of eggs and their mass 133
6.2 Ways in which variables can be related 134
6.3 A straight line relationship 135
6.4 Effect of sample size on the likelihood of getting an apparent
association 136
6.5 Correlation 137
6.6 Graph showing the relationship between the heart rate and
blood pressure of elderly patients 144
6.7 Regression 146
6.8 How to describe a power relationship 152
6.9 How to describe an exponential relationship 153
6.10 Graph showing the relationship between the density of
tadpoles and dragonfly larvae in 12 ponds 161
7.1 The binomial distribution 165
8.1 The Latin square design helps avoid unwanted bunching of
treatments 190
8.2 Blocking can help to avoid confounding variables: an agricultural
experiment with two treatments, each with eight replicates 190
8.3 (a) An effect will be detected roughly 50% of the time if the
expected value is two standard errors away from the actual
population mean. (b) To detect a significant difference between
a sample and an expected value 80% of the time, the expected
value should be around three standard errors away from the
population mean 195
10.1 Using logarithms 223
A1 Mean birth weight 233
A2 Graph showing the mean ± standard error of calcium‐binding
protein activity before and at various times after being given
a heat stimulus 239
A3 Graph showing the aluminium concentration in tanks at five‐
weekly intervals after 20 snails had been placed in them (n = 8) 240
A4 Mean ± standard error of yields for two different varieties of
wheat at applications of nitrate of 0, 1 and 2 kg m⁻² 244
Tables
4.1 The effect of nitrogen treatment on sunflower plants. The results
show the means ± standard error for control and high nitrogen
plants of their height, biomass, stem diameter and leaf area. 63
7.1 The numbers of men and women and their smoking status 178
7.2 The numbers of models eaten and left uneaten by the birds 185
10.1 The base and supplementary SI units 214
10.2 Important derived SI units 215
10.3 Prefixes used in SI 215
10.4 Conversion factors from obsolete units to SI 217
10.5 Some useful constants and formulae 220
Preface
It is five years since the second edition of Statistical and Data Handling Skills in
Biology was first published and I am grateful to Pearson Education for allowing
me the opportunity to update and expand the book for a third edition.
A few more years’ experience have prompted me to make some more changes.
There were some errors to correct, of course, but the chief failing of the second
edition was the artificial separation of parametric and non‐parametric tests.
In this edition, the book has been restructured to bring the two types of tests
together into the same chapters, though in all cases the parametric tests
are introduced first, as this seems logical from both a theoretical and a historical
perspective. I include more information about the basic examination of dis-
tributions, while testing for normality is also brought forward to highlight its
importance when deciding which statistical test to perform.
The new edition also includes coverage of additional tests that should take
undergraduates up to their final year. There is now coverage of nested ANOVA,
the Scheirer–Ray–Hare test, analysis of covariance, and logistic regression, while
there is a bigger section on more complex statistical analysis and data explora-
tion. The section on experimental design has also been expanded, with more
formal coverage of power analysis.
Finally, there are now comprehensive instructions about how to carry out the
statistical tests, not only using the latest version of SPSS (version 19) but also
the other common package MINITAB (version 16). I hope that this additional
information does not make the book too big or cumbersome.
Like the earlier editions, the book is based on courses I have given to students
at the University of Manchester’s Faculty of Life Sciences. I am heavily indebted
to our e‐learning team and to those students who have taken these courses for
their feedback. With their help, and with that of several of Pearson Education’s
reviewers, many errors have been eliminated, and I have learnt much more
about statistics, though I take full responsibility for those errors and omissions
that remain.
Finally, I would like to thank Yvonne for her unfailing support during the
writing of all of the editions of the book.
Publisher’s acknowledgements
SPSS screenshots on pages 24, 25 (t), 32, 37 (t, b), 39 (t, b), 48, 53, 59 (t, b), 66,
71, 76, 88, 93, 97, 98, 103, 108, 114, 119, 124 (t, b), 139, 148, 158, 168 (t, b), 173,
174 (t, c, b), 181, 206, 209 from SPSS Inc / IBM. Reprint courtesy of International
Business Machines Corporation, © SPSS, Inc., an IBM Company. SPSS was acquired
by IBM in October 2009.
MINITAB screenshots on pages 25 (b), 26, 28, 29 (t, b), 33 (t, b), 36 (t, b), 40 (t, b),
41 (t, b), 49, 54, 61, 67, 72, 73, 77, 89, 94, 104, 110, 115, 120, 126, 140, 141, 142,
143, 149, 159 (t, b), 169, 175, 183, 196, 207 from MINITAB, portions of the input
and output contained in this publication/book are printed with permission of
Minitab Inc. All material remains the exclusive property and copyright of Minitab
Inc. All rights reserved.
In some instances we have been unable to trace the owners of copyright mate-
rial and we would appreciate any information that would enable us to do so.
1 An introduction to statistics
A biologist can be defined as someone who studies the living world. Much of
a biologist’s training involves learning about what other people have found
out: how organisms operate, and why they work in that way. But knowing
what other people have learnt in the past is not enough: you also have to be
able to find things out for yourself, and so you have to learn how to become a
researcher. Nowadays, almost all research is quantitative, so no biologist’s edu-
cation is complete without a training in how to take measurements, and how to
use the measurements you have taken to answer biological questions.
By the time you have reached advanced level, you will no doubt already
have had to undertake a research project, collected results and analysed them
in some way. However, you were probably not really sure why you had to do
what you did. This opening chapter brings up the sorts of questions that you
might have worried about, and attempts to answer them. Hopefully it will help
you understand why you should bother learning about the world of quantita-
tive biology and statistics. The chapter ends by introducing the subject of how
to choose the correct statistical tests for your research project.
The first thing you are invariably told to do when carrying out a research pro-
ject is to make repeated measurements: to include tens or even hundreds of
people in surveys; or to have large numbers of replicates in experiments. This
seems to be a great deal of wasted effort, so the first question that this book
needs to answer is why do biologists have to repeat everything?
You are then told to subject your results to statistical analysis. Unfortunately,
few subjects are less inviting to most biology students than statistics. For a start
it is a branch of mathematics ‐ not usually a biologist’s strong suit. You might
feel that as you are studying biology you should be able to leave the horrors of
maths behind you. So the second question that any book on biological statistics
needs to answer is why do biologists have to bother with statistics?
Many students also have a problem with the ideas behind statistics. You
might well have found that statisticians seem to think in a weird inverted kind
of way that is at odds with normal scientific logic. So this book also has to
answer the question why is statistical logic so strange?
Finally, students often complain, not unreasonably, about the size of sta-
tistics books and the amount of information they contain. The reason for this
is that there are large numbers of statistical tests, so this book also needs to
answer the question why are there so many different statistical tests?
In this opening chapter I hope that I can answer these questions and so help
put the subject into perspective and encourage you to stick with it. This chap-
ter can be read as an introduction to the information which is set out in what
I hope is a logical order throughout the book; it should help you work through
the book, either in conjunction with a taught course, or on your own. For those
more experienced and confident about statistics, and in particular those with an
experiment to perform or results to analyse, you can go directly to the decision
chart for simple statistical tests (Figure 1.1) introduced later in this chapter on
page 7 and also inside the back cover of the book. This will help you choose the
statistical test you require and direct you to the instructions on how to perform
each test, which are given later in the book. Hence the book can also be used as a
handbook to keep around the laboratory and consult when required.
1.3 Why do biologists have to repeat everything?

Why do biologists have to repeat everything when they are conducting sur-
veys or analysing experiments? After all, physicists don’t need to do it when
they are comparing the masses of sub‐atomic particles. Chemists don’t need to
when they’re comparing the pHs of different acids. And engineers don’t need to
when they are comparing the strength of different shaped girders. They can just
generalise from single observations; if a single neutron is heavier than a single
proton, then that will be the case for all of them.
However, if you decided to compare the heights of fair‐ and dark‐haired
women it is obvious that measuring just one fair‐haired and one dark‐haired
woman would be stupid. If the fair‐haired woman was taller, you couldn’t gen-
eralise from these single observations to tell whether fair‐haired women are on
average taller than dark‐haired ones. The same would be true if you compared
a single man and a single woman, or one rat that had been given growth hor-
mone with another that had not. Why is this? The answer is, of course, that in
contrast to sub‐atomic particles, which are all the same, people (in common
with other organisms, organs and cells) are all different from each other. In
other words they show variability, so no one person or cell or experimentally
treated organism is typical. It is to get over the problem of variability that biolo-
gists have to do so much work and have to use statistics.
To overcome variability, the first thing you have to do is to make replicated
observations of a sample of all the observations you could possibly make.
There are two ways in which you can do this.
1. You can carry out a survey, sampling at random from the existing popula-
tion of people or creatures or cells. You might measure 20 fair‐haired and
20 dark‐haired women, for instance.
2. You can create your own samples by performing an experiment. Your experi-
mental subjects are then essentially samples of the infinite population of
subjects that you could have created if you had infinite time and resources.
You might, for instance, perform an experiment in which 20 experimentally
treated rats were injected with growth hormone and 20 other controls were
kept in exactly the same way except that they received no growth hormone.
1.4 Why do biologists have to bother with statistics?

At first glance it is hard to know exactly what you should do with all the observa-
tions that you make, given that all creatures are different. This is where statistics
comes in; it helps you deal with the variability. The first thing it helps you do is
to examine exactly how your observations vary, in other words to investigate the
distribution of your samples. The second thing it helps you do is calculate rea-
sonable estimates of the situation in the whole population, for instance working
out how tall the women are on average. These estimates are known as descriptive
statistics. How you do both of these things is described in Chapter 2.
Descriptive statistics summarise what you know about your samples. How-
ever, few people are satisfied with simply finding out these sorts of facts; they
usually want to answer questions. You would want to know whether one group
of the women was on average taller than the other, or you might want to know
whether the rats that had been given the growth hormone were heavier than
those which hadn’t. Hypothesis testing enables you to answer these questions.
If you compared the groups, you would undoubtedly find that they were at
least slightly different (let’s say the fair‐haired women were taller than the dark‐
haired) but there could be two reasons for this. It could be because there really
is a difference in height between fair‐ and dark‐haired women. However, it is
also possible that you obtained this difference by chance by virtue of the partic-
ular people you chose. To discount this possibility, you would have to carry out
a statistical test (in this case a two‐sample t test) to work out the probability
that the apparent effects could have occurred by chance. If this probability was
small enough you could make the judgement that you could discount it and
decide that the effect was significant. In this case you would then have decided
that fair‐haired women are significantly taller than dark‐haired.
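This chance element is easy to demonstrate by simulation. The short sketch below, in Python rather than the SPSS and MINITAB packages used later in the book, draws two samples of 20 from one and the same population (the figures of 165 cm and 7 cm are invented purely for illustration) and shows that the sample means still differ:

```python
import numpy as np

rng = np.random.default_rng(42)

# Two samples of 20 heights (cm) drawn from the SAME population, so the
# null hypothesis is true by construction (165 and 7 are invented values).
fair = rng.normal(loc=165, scale=7, size=20)
dark = rng.normal(loc=165, scale=7, size=20)

# The two sample means will almost never be exactly equal:
print(fair.mean() - dark.mean())  # a non-zero difference, purely by chance
```

A statistical test asks how often a difference at least this large would arise when, as here, nothing real is going on.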
All of this has the consequence that the logic of hypothesis testing is rather
counterintuitive. When you are investigating a subject in science, you typi-
cally make a hypothesis that something interesting is happening, for instance
in our case that fair‐haired women are taller than dark‐haired, and then set out
to test it. In statistical hypothesis testing you do the opposite. You construct a null hypothesis that nothing interesting is happening, in this case that fair‐ and dark‐haired women have the same mean height, and then test whether this null hypothesis is true. Statistical tests have four main stages.

null hypothesis: A preliminary assumption in a statistical test that the data shows no differences or associations. A statistical test then works out the probability of obtaining data similar to your own by chance.

Step 1: Formulating a null hypothesis

The null hypothesis you must set up is the opposite of your scientific hypothesis: that there are no differences or relationships. (In the case of the fair‐ and dark‐haired women, the null hypothesis is that they are the same height.)
1.6 Why are there so many statistical tests?

Even if we accept that statistical tests are necessary in biology, and can cope
with the unusual logic, it is perhaps not unreasonable to expect that we should
be able to analyse all our results using just a single statistical test. However,
statistics books such as this one contain large numbers of different tests. Why
are there so many? There are two main reasons for this. First, there are several
very different ways of quantifying things and hence different types of data that
you can collect, and this data can vary in different ways. Second, there are very
different questions you might want to ask about the data you have collected.
1.7 Using the decision chart

The logic of the previous section has been developed and expanded to produce
a decision chart (Figure 1.1 and on the inside cover of the book). Though not
fully comprehensive, the chart includes virtually all of the tests that you are
likely to encounter as an undergraduate. If you are already a research biologist,
it may also include all the tests you are ever likely to use over your working life!
If you follow down from the start at the top and answer each of the questions
in turn, this should lead you to the statistical test you need to perform.
There is only one complication. The final box may have two alternative tests: a
parametric test in bold type and an equivalent non‐parametric test in normal type.
You are always advised to use the parametric test if it is valid, because parametric
tests are more powerful in detecting significant effects. Use the non‐parametric
test if you are dealing with ranked data, or with irregularly distributed data that cannot be successfully transformed.
[Figure 1.1 Decision chart for statistical tests. Start at the top and follow the questions down until you reach the appropriate box. The tests in normal type are non‐parametric equivalents for irregularly distributed or ranked data. The chart first splits into Differences and Relationships; the Relationships branch leads to correlation coefficient (p. 135) or rank correlation (p. 156), and to regression analysis (p. 145) or rank correlation (p. 156).]
[Figure 1.2 Flow chart showing how to deal with measurements and rank data. Start at the top, answer the questions and transform data where appropriate before deciding whether you can use parametric tests or have to make do with non‐parametric ones. The chart splits into Measurements and Ranks, and asks whether the distribution is significantly different from normal.]
1.8 Using this book

For each statistical test covered, the book provides five kinds of information.

1. It will tell you the sorts of questions the test will enable you to answer and
give examples to show the range of situations for which it is suitable. This
will help you make sure you have chosen the right test.
2. It will tell you when it is valid to use the test.
3. It will describe the rationale and mathematical basis for the test; basically it
will tell you how it works.
4. It will show you how to perform the test using a calculator and/or the
computer‐based statistical packages SPSS and MINITAB.
5. It will tell you how to present the results of the statistical tests.
2 Dealing with variability
2.1 Introduction
This chapter tells you how to deal with the problem of variability: it shows how
to examine and present the distribution of data; explains why variation occurs
in the first place; and describes how to quantify it. In other words, it shows how
you can obtain useful quantitative information about a population from the
results of your sample, despite the variation.
2.2 Examining the distribution of data

The first thing to do when you have taken some measurements from your sample is to investigate their distribution. The best way of doing this is to produce a histogram or bar chart.
For continuous data, you should produce a histogram (Figure 2.1a), grouping
the data points into a number of arbitrarily defined size classes of equal width
set out along the x-axis, while the y-axis shows the number of data points in
each class. This gives very useful information about the distribution, in particu-
lar about the relative commonness of different values. The number of classes
you choose should depend on the sample size. If you have a very large sample
you could have anything up to 12 or more classes to produce a detailed distri-
bution. However, with smaller sample sizes the numbers within each class fall
and the distribution is likely to become more bumpy. It is better, then, to have
a smaller number of classes: as few as 5 for small samples of 20 or less.
Discrete data can be treated in just the same way as continuous data, with
each class covering the same number of discrete values. However, if you have
a big enough sample, each discrete value may have enough data points within
it to allow you to draw a bar chart (Figure 2.1b), in which each bar is separated
from the next.
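If you draw your histograms in a general-purpose language such as Python rather than a statistics package, the number of classes can be set directly. A minimal sketch, with invented lengths for a small sample:

```python
import numpy as np
import matplotlib.pyplot as plt

# Invented sample of 16 rat lengths (cm)
lengths = np.array([5.1, 6.3, 7.2, 7.9, 8.1, 8.4, 8.8, 9.0,
                    9.3, 9.7, 10.2, 10.8, 11.5, 12.1, 13.0, 14.2])

# With fewer than 20 measurements, use a small number of classes (here 5)
plt.hist(lengths, bins=5, edgecolor="black")
plt.xlabel("Length (cm)")
plt.ylabel("Number of rats")
plt.show()
```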
[Figure 2.1 Methods of presenting the distribution of a sample. Continuous data should be presented as a histogram (a), which gives the numbers of points within a number of classes of equal width. Discrete data can instead be given in a bar chart (b). Panel (a): number of rats vs length (cm); panel (b): number of flies vs number of thoracic hairs.]
Data may be symmetrically distributed (Figure 2.2a), positively skewed, with more small values than large ones (Figure 2.2b), or negatively skewed, with more large values than small ones (Figure 2.2c). Positively skewed data are particularly common in natural populations because there tend to be many more young (and hence small) organisms in a population than older, larger ones. Data may even be irregularly distributed (Figure 2.2d).
Whichever way the data is distributed, there is no way that anyone else would
be particularly interested in seeing all your histograms; you need a way to sum-
marise and quantify the distribution.
Figure 2.2 Different ways in which data may be distributed. (a) A symmetrical
distribution; (b) positively skewed data; (c) negatively skewed data; (d) irregularly
distributed data.
One measure of the centre of a distribution is the mode, the class in which there are the most data points. I don't recommend you use the mode, as its value will depend on exactly how you have split up your data into size classes. The mean is the arithmetical average of all the data points. As we shall see, in many cases this is extremely useful, but it is not very helpful for skewed data, when the mean will be greatly affected by the few outlying points. The most universally useful measure of the centre of the distribution is the median, which is the point halfway along the ranked data set (or the average of the points above and below the middle if the sample size is even). Finally, the shape of the distribution is best represented by finding the quartiles, the points 25% and 75% down the ranked data set. The interquartile range is the distance between these two points, and is another measure of the width of the distribution.

mean (μ): The average of a population. The estimate of μ is called x̄.

skewed data: Data with an asymmetric distribution.

median: The central value of a distribution (or the average of the middle points if the sample size is even).

quartiles: Upper and lower quartiles are values exceeded by 25% and 75% of the data points, respectively.

[Figure 2.3 Measurements of the distribution of data. The median, quartiles and maximum and minimum values of the positively skewed distribution (a) are best summarised using a box and whisker plot (b), such as this one comparing the heights of fair‐haired and dark‐haired women. Panel (a): number of newts vs length, with the mode, median and mean marked; panel (b): heights of the two groups of women.]

These measures can be combined to produce a box and whisker plot (Figure 2.3b), with the median as a thick bar at the centre, the upper and lower quartiles as the top and bottom of the box, and the maximum and minimum values as the top and bottom of the whiskers. This one simple plot allows you
to see how symmetrical the distribution is, and how much the data is concen-
trated towards the middle. Giving two box and whisker plots side by side of two
different samples also allows you to compare them at a glance. In Figure 2.3b,
for instance, it is clear that there is not really that much difference between fair‐
haired and dark‐haired women.
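For readers working in Python rather than SPSS or MINITAB, the median, quartiles and interquartile range can be obtained directly; the values below are invented for illustration:

```python
import numpy as np

# Invented, positively skewed masses (g)
masses = np.array([1.2, 1.5, 1.7, 1.9, 2.0, 2.3, 2.8, 3.4, 4.1, 6.2])

median = np.percentile(masses, 50)             # halfway along the ranked data
lower_q, upper_q = np.percentile(masses, [25, 75])
iqr = upper_q - lower_q                        # interquartile range

print(median, lower_q, upper_q, iqr)
```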
2.3 The normal distribution

[Figure 2.4 A normal distribution. The centre of the distribution is described by the mean μ and the width by the standard deviation σ, which is the distance to the point of inflexion: 68% of measurements are found within one standard deviation of the mean; 95% are found within 1.96 standard deviations of the mean.]

This section starts by looking at just why data is normally distributed, and then examines how normally distributed data is described.
Figure 2.5 Length distributions for a randomly breeding population of rats. Length
is controlled by a number of factors, each of which is found 50% of the time in the
form which reduces length and 50% in the form which increases length. The graphs
show length control by (a) 1 factor, (b) 2 factors, (c) 4 factors and (d) 8 factors. The
greater the number of influencing factors, the greater the number of peaks and the
more nearly they approximate a smooth curve (dashed outline).
Suppose, for example, that length is controlled by four factors, each of which is found half the time in the form which increases length by 5% and half the time in the form which decreases
it by 5%. In this case, of 16 possible combinations of factors, there is only
one combination in which all four factors are in the long form and one com-
bination in which all are in the short form. The chances of each are therefore
1/2 × 1/2 × 1/2 × 1/2 = 1/16. The rats are much more likely to be intermediate in size,
because there are four possible combinations in which three long and one short
factor (or three short and one long) can be arranged, and six possible combina-
tions in which two long and two short factors can be arranged. It can be seen
that the central peak is higher than those further out. The process is even more
apparent, and the shape of the distribution becomes more obviously humped
if there are eight factors, each of which increases or decreases length by 2.5%
(Figure 2.5d). The resulting distributions are known as binomial distributions. If length were affected by more and more factors, this process would continue; the curve would become smoother and smoother until, if length were affected by an infinite number of factors, we would get the bowler‐hat‐shaped distribution curve we saw in Figure 2.4. This is the so‐called normal distribution (also known as the Z distribution). If we measured an infinite number of rats, most would have length somewhere near the average, and the numbers would tail off on each side.

binomial distributions: The pattern by which the sample frequencies in two groups tend to vary.
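You can watch a binomial distribution approaching the normal by simulation. The sketch below is a simplified, additive version of the rat example (the 10 cm baseline length is invented), treating each of eight factors as adding or subtracting 2.5% with equal probability:

```python
import numpy as np

rng = np.random.default_rng(1)

# Each of 8 factors adds or subtracts 2.5% of a 10 cm baseline length with
# equal probability; their sum follows a binomial-type distribution.
n_rats, n_factors = 100_000, 8
effects = rng.choice([-0.025, 0.025], size=(n_rats, n_factors))
lengths = 10 * (1 + effects.sum(axis=1))

# Counts peak in the middle and tail off on each side, approaching a
# normal curve as the number of factors is increased.
counts, edges = np.histogram(lengths, bins=9)
print(counts)
```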
Unlike general distributions, which need at least five numbers to describe them, the normal distribution of a population can be described by just two numbers or parameters. The position of the centre of the distribution is described by the population mean μ, which on the graph is located at the central peak of the distribution (Figure 2.4). The width of the distribution is described by the population standard deviation σ, which is the distance from the central peak to the point of inflexion of the curve (where it changes from being convex to concave) (Figure 2.4). This is a measure of roughly how much, on average, points differ from the mean. Of course we can never know the population parameters for certain, because we would never have the time to measure the entire population, but we can use the results from a sample of a manageable size to make estimates of the population mean and standard deviation. These estimates are known as statistics.

parameters: A measure, such as the mean and standard deviation, which describes or characterises a population. These are usually represented by Greek letters.

population: A potentially infinite group on which measurements could be taken. Parameters of populations usually have to be estimated from the results of samples.

sample: A subset of a possible population on which measurements are taken. These can be used to estimate parameters of the population.

estimate: A parameter of a population which has been calculated from the results of a sample.

It is very easy to calculate an estimate of the population mean. It is simply the average of the sample, or the sample mean x̄: the sum of all the lengths divided by the number of rats measured. In mathematical terms this is given by the expression

x̄ = Σxᵢ / N    (2.1)

where xᵢ are the values of length and N is the number of rats.
The estimate of the population standard deviation, written s or sₙ₋₁, is given by the expression

s = sₙ₋₁ = √( Σ(xᵢ − x̄)² / (N − 1) )    (2.2)

statistics: An estimate of a population parameter, found by random sampling. Statistics are represented by Latin letters.
It is the square root of the variance, which is the average of the square of the distances of each point from the sample mean. Rather than dividing the sum of squares by N, however, we divide by (N − 1). We use (N − 1) because this expression will give an unbiased estimate of the population standard deviation, whereas using N would tend to underestimate it. To see why this is so, it is perhaps best to consider the case when we have only taken one measurement. Since the estimated mean x̄ necessarily equals the single measurement, the standard deviation we calculate when we use N will be zero. Similarly, if there are two points, the estimated mean will be constrained to be exactly halfway between them, whereas the real mean is probably not. Thus the variance (calculated from the square of the distance of each point to the mean) and hence the standard deviation will probably be underestimated.

variance: A measure of the variability of data: the square of their standard deviation.

degrees of freedom (DF): A concept used in parametric statistics, based on the amount of information you have when you examine samples. The number of degrees of freedom is generally the total number of observations you make minus the number of parameters you estimate from the samples.

The quantity (N − 1) is known as the number of degrees of freedom of the sample. Since the concept of degrees of freedom is repeated throughout the rest of this book, it is important to describe what it means. In a sample of N observations each is free to have any value. However, if we have used the measure-
ments to calculate the sample mean, this restricts the value the last point can
have. Take a sample of two measurements, for instance. If the mean is 17 and
the first measurement is 17 + 3 = 20, the other measurement must have the
value 17 - 3 = 14. Thus, knowing the first measurement fixes the second, and
there will only be one degree of freedom. In the same way, if you calculate the
mean of any sample of size N, you restrict the value of the last measurement, so
there will be only (N -1) degrees of freedom.
It can take time to calculate the standard deviation by hand, but fortunately few people have to bother nowadays; estimates for the mean and standard deviation of the population can readily be found using computer statistics packages or even scientific calculators. All you need to do is type in the data values and press the x̄ button for the mean and the s, sₙ₋₁ or σₙ₋₁ button for the population standard deviation. Do not use the sₙ or σₙ button, since this works out the sample standard deviation, NOT the population standard deviation.
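The same estimates are equally easy to obtain in a language such as Python; note that numpy's std divides by N unless told otherwise, so ddof=1 is needed to get sₙ₋₁. The masses below are invented, since the data table of Example 2.1 is not reproduced here:

```python
import numpy as np

# Invented masses (tonnes) standing in for the 16 bull elephants
masses = np.array([4.4, 4.6, 4.5, 4.7, 4.8, 4.9, 4.6, 4.7,
                   4.5, 4.8, 4.9, 5.0, 4.6, 4.7, 4.8, 4.7])

x_bar = masses.mean()        # equation 2.1
s = masses.std(ddof=1)       # equation 2.2: divides by N - 1
s_biased = masses.std()      # divides by N and tends to underestimate

print(x_bar, s, s_biased)
```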
Example 2.1 The masses (in tonnes) of a sample of 16 bull elephants from a single reserve
in Africa were as follows.
Solution
The estimate for the population mean is 4.70 tonnes and the population
standard deviation is 0.2251 tonne, rounded to 0.23 tonne to two decimal
places. Note that both figures are given to one more degree of precision
than the original data points because so many figures have been combined.
Figure 2.6 The effect of sample size. Changes in the cumulative (a) mean,
(b) standard deviation and (c) standard error of the mass of bull elephants from
Example 2.1 after different numbers of observations. Notice how the values for mean
and standard deviation start to level off as the sample size increases, as you get
better and better estimates of the population parameters. Consequently the standard
error (c), a measure of the variability of the mean, falls.
Note how the fluctuations of the cumulative mean start to get less and less
and how the line starts to level off. Figure 2.6b shows the cumulative standard
deviation. This also tends to level off. If we increased the sample size more and
more, we would expect the fluctuations to get less and less until the sample
mean converged on the population mean and the sample standard deviation
converged on the population standard deviation. The standard error (SE) of the mean is a measure of how much the sample means would on average differ from the population mean. Of course, like the mean and standard deviation, we cannot know the standard error with any certainty, but we can estimate it. Our estimate of the standard error, SE, is given by the expression

SE = s / √N    (2.3)

so that the larger the sample size, the smaller the value of SE, as Figure 2.6c shows. The standard error is an extremely important statistic because it is a measure of just how variable your estimate of the mean is.

standard error (SE): A measure of the spread of sample means: the amount by which they differ from the true mean. Standard error equals standard deviation divided by the square root of the number in the sample. The estimate of SE is called SE.
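Continuing the invented sample from the sketch above, the standard error of equation 2.3 is one line in Python, and scipy provides the same calculation ready-made:

```python
import numpy as np
from scipy import stats

masses = np.array([4.4, 4.6, 4.5, 4.7, 4.8, 4.9, 4.6, 4.7,
                   4.5, 4.8, 4.9, 5.0, 4.6, 4.7, 4.8, 4.7])  # invented values

se_manual = masses.std(ddof=1) / np.sqrt(len(masses))  # equation 2.3
se_scipy = stats.sem(masses)                           # the same calculation

print(se_manual, se_scipy)
```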
2.6 Confidence limits

Once we have our estimate for the mean, x̄, and for the standard error, SE, of the population, it is fairly straightforward to calculate what are known as confidence limits for the population mean μ. The most often used are the 95% confidence limits: numbers between which the real population mean μ will be found 95 times out of 100.

confidence limits: Limits between which estimated parameters have a defined likelihood of occurring. It is common to calculate 95% confidence limits, but 99% and 99.9% confidence limits are also used. The range of values between the upper and lower limits is called the confidence interval.

Because the standard error, SE, is only estimated, the sample mean will not vary precisely according to the normal distribution, but to a slightly wider one, which is known as the t distribution (Figure 2.7). The exact shape of the t distribution depends on the number of degrees of freedom; it becomes progressively more similar to the normal distribution as the number of degrees of freedom increases (and hence as the estimate of standard deviation becomes more exact).

t distribution: The pattern by which the sample means of a normally distributed population tend to vary.

The 95% confidence limits for the population mean μ can be found using the tabulated critical values of the t statistic (Table S1) given in the statistical tables at the end of the book. The critical t value t(N−1)(5%) is the number of standard errors SE away from the estimate of the population mean x̄ within which the real population mean μ will be found 95 times out of 100. The 95% confidence limits define the 95% confidence interval, or 95% CI; this is expressed as follows:

95% CI(mean) = x̄ ± (t(N−1)(5%) × SE)    (2.4)

where (N − 1) is the number of degrees of freedom. It is most common to use a 95% confidence interval, but it is also possible to calculate 99% and 99.9% confidence intervals for the mean by substituting the critical t values for 1% and 0.1% respectively into equation 2.4.

critical values: Tabulated values of test statistics; usually, if the absolute value of a calculated test statistic is greater than or equal to the appropriate critical value, the null hypothesis must be rejected.
Note that the larger the sample size N, the narrower the confidence interval.
This is because as N increases, not only will the standard error SE be lower but
so will the critical t values. Quadrupling the sample size reduces the distance
between the upper and lower limits of the confidence interval by more than one‐half.

[Figure 2.7 Normal distribution and t distribution. The distribution of sample means relative to the estimate of the standard error, calculated from samples with 1, 10 and infinite degrees of freedom. With infinite degrees of freedom the distribution equals the normal distribution. However, it becomes more spread out as the sample size decreases (fewer degrees of freedom) because the estimate of standard error becomes less reliable.]
Take the results for the bull elephants given in Example 2.1. Figure 2.6c
shows the standard error of the weights. Note how the standard error falls as
the sample size increases. Figure 2.8 shows the cumulative 95% confidence
intervals for weight. Note that the distance between the upper and lower inter-
vals falls off extremely rapidly, especially at first; the bigger the sample size the
more confident we can be of the population mean.
Example 2.2 Our survey of the 16 bull elephants gave an estimate of mean mass of 4.70
tonnes and an estimate of standard deviation of 0.2251 tonne. We want to
calculate 95% and 99% confidence limits for the mean mass.
Solution
The estimate of standard error is SE = 0.2251/√16 = 0.0563 tonne, which
is rounded to 0.056 tonne to three decimal places. Notice that standard er-
rors are usually given to one more decimal place than the mean or standard
deviation.
To calculate the 95% confidence limits we must look in Table S1 (p. 258)
for the critical value of t for 16 − 1 = 15 degrees of freedom. In fact t₁₅(5%) = 2.131. Therefore the 95% confidence limits of the population mean are 4.70 ± (2.131 × 0.0563) = 4.70 ± 0.12 = 4.58 and 4.82 tonnes. So 95 times out of 100 the real population mean would be between 4.58 and 4.82 tonnes.
Similarly, t₁₅(1%) = 2.947. Therefore the 99% confidence limits of the population mean are 4.70 ± (2.947 × 0.0563) = 4.70 ± 0.16 = 4.54 and 4.86 tonnes. So 99 times out of 100 the real population mean would be between 4.54 and 4.86 tonnes.
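The same confidence limits can be checked in Python using the summary figures from the example; scipy's t.ppf supplies the critical t value that Table S1 tabulates:

```python
import numpy as np
from scipy import stats

x_bar, s, n = 4.70, 0.2251, 16                 # summary figures from Example 2.1
se = s / np.sqrt(n)                            # 0.0563 tonne

t_crit = stats.t.ppf(1 - 0.05 / 2, df=n - 1)   # two-tailed 5% critical value: 2.131
half_width = t_crit * se                       # about 0.12 tonne

print(x_bar - half_width, x_bar + half_width)  # roughly 4.58 and 4.82 tonnes
```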
2.7 Presenting descriptive statistics and confidence limits
Figure 2.8 Changes in the mean and 95% confidence intervals for the mass of the
bull elephants from Example 2.1 after different numbers of observations. Notice
how the 95% confidence intervals converge rapidly as sample size increases.
the reader can calculate the other statistic. A 95% confidence interval can be given as x̄ ± (t(N−1)(5%) × SE). For example, in our elephant example:
2.7.2 Graphically
The other way to present data is on a point graph or a bar chart (Figure 2.9).
The mean is the central point of the graph or the top of the bar. Error bars are then added. From the mean, bars are drawn both up and down a length equal to either the standard deviation, standard error, or 95% confidence intervals. Finally, lines are drawn across the ends of the bars. You must say in the captions or legends which type of bar you are using.

error bars: Bars drawn upwards and downwards from the mean values on graphs; error bars can represent the standard deviation or the standard error.
The choice of which measure of variation to use depends on what you want
to emphasise about your results. If you want to show how much variation there
is, you should choose standard deviation (Figure 2.9a). On the other hand, if
you want to show how confident you can be of the mean, you should choose
standard error (Figure 2.9b). If you want to show the likely range of the mean
you should choose the 95% confidence intervals (Figure 2.9c).
In general, if two samples have overlapping standard error bars, they are
unlikely to be statistically different (Chapter 4). Since people in general want to
show mean results and tend to want to compare means (see Chapters 4 and 5),
standard error bars are by far the most commonly used, though some people
prefer standard deviation, as it does not hide the variability.
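If you plot in Python, error bars are added with the yerr argument; the means and standard errors below are invented, loosely echoing Figure 2.9:

```python
import numpy as np
import matplotlib.pyplot as plt

species = ["Grass A", "Grass B"]            # invented example values
means = np.array([8.2, 10.1])
standard_errors = np.array([0.6, 0.7])

# yerr draws bars one standard error up and down from each mean
plt.bar(species, means, yerr=standard_errors, capsize=5)
plt.ylabel("Yield")
plt.show()
```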
2.8 Introducing computer packages

In the past we used to have to carry out all of our statistical calculations by hand,
with the help of statistical tables, and from the 1970s onwards electronic calcu-
lators. Fortunately, this is no longer essential (though useful to help you learn
about the basics of statistics), because you can use one of the many computer‐
based statistical packages that are available, such as SPSS (Statistical Package
for the Social Sciences), MINITAB, SAS, SYSTAT or R. You simply enter all your
results straight into a spreadsheet in the computer package and let the com-
puter take the strain. Using a computer package has two advantages: (1) the
computer carries out the calculations more quickly; and (2) you can save the
results for future analysis.
In this book we will examine how to carry out statistical tests on two of the
most commonly used packages, SPSS and MINITAB. They both work in much
the same way. You enter the results from different samples into separate col-
umns of their data spreadsheets. You can then run tests on the different columns
from the command screen of the package using a drop‐down menu. They run
statistical tests, and also produce graphs, which can be useful (though they are
rarely of publishable quality).
Figure 2.9 Presenting descriptive statistics using error bars. (a) The mean yield of
two species of grass with error bars showing their standard deviation; this emphasises
the high degree of variability in each grass, and the fact that the distributions overlap
a good deal. (b) Standard error bars emphasise whether or not the two means are
different; here the error bars do not overlap, suggesting that the means might be
significantly different. (c) 95% confidence intervals emphasise the likely limits to the
mean of each species.
Each package has a different history and its own quirks, so although they
have become more similar recently, it is worth introducing them separately,
before showing how you can use each of them to calculate descriptive statistics
and confidence intervals.
2.8.1 SPSS
SPSS was designed for social scientists and so has a slightly awkward (though very
logical) way of working; it assumes you are putting in lots of data about a certain
number of cases (usually people). It can then look at how a wide range of fac-
tors affect the characteristics of these people (e.g. how does gender affect people’s
income) and how those characteristics are related to each other (e.g. do richer peo-
ple tend to be more healthy). Data is entered into a spreadsheet, rather like that
of Excel. However, in SPSS, each row represents a particular person (or in biology
a particular replicate such as a plant or cell), so you can’t put data about two dif-
ferent groups of people or organisms into different columns and use SPSS to ana-
lyse it. They have to be identified as members of different groups using another
column, and since only numbers are allowed in the spreadsheet, different groups
have to be identified by the use of subscripts. An example is shown below, giving
the heights and genders of eight different people. Here the genders are given as
the subscripts 1 and 2 in a separate column, representing female and male.
Source: All SPSS screenshots from SPSS Inc / IBM, Reprint Courtesy of International Business Machines
Corporation,© SPSS, Inc., an IBM Company. SPSS was acquired by IBM in October, 2009.
Having entered data into the Data View Screen which is shown above (and
which appears when you open the program), you name the columns by click-
ing on the Variable View tab to get to the Variable View Screen (below) and
simply type in the name in the Name box. You can also change the width of
the column (in the Width box) to allow you to get longer numbers on and
change the number of decimal places SPSS shows your data to by altering the
Decimals box.
To return to the data simply click onto the Data View tab at the bottom left.
You can then order statistical tests to be carried out or plot graphs by pulling
down menus with your mouse. The results will be printed out on an Output
Screen. Both the Data and the Output can be separately saved.
As you will see, SPSS produces prodigious amounts of output, most of it irrel-
evant to a working biologist. In this book I will present just the useful parts of
the results it produces.
2.8.2 MINITAB
MINITAB was designed for scientists and has a slightly different layout from SPSS. You
enter your data into a spreadsheet in just the same way, but as it does not expect you to
be inputting data about people, it is a bit more flexible (though not always so logical!).
You can put data in columns representing different organisms side by side (see below)
as well as putting all the data in one column and using subscripts as you do in SPSS.
Source: All MINITAB screenshots from MINITAB, portions of the input and output contained in this
publication/book are printed with permission of Minitab Inc. All material remains the exclusive prop-
erty and copyright of Minitab Inc., All rights reserved.
Having entered data into the Worksheet, the column titles are simply typed
in the boxes below the column number. Finally, you can order statistical tests
to be carried out or plot graphs just as in SPSS by pulling down menus with
your mouse. The results will be printed out on the Session area above the work-
sheet. Both the Worksheet and the Session can be saved separately. MINITAB
usually produces less output than SPSS, but provides most of the relevant
information.
2.9 Calculating descriptive statistics

To demonstrate the workings of the packages we will examine the data on the
weight of bull elephants given in Example 2.1.
Standard Deviation, Standard Error, and the 95% Confidence Intervals for
Mean. Note that the package gives some of the items with too much precision.
Don’t copy things from computer screens without thinking!
[SPSS histogram of the bull elephant masses: mean = 4.70, std. dev. = 0.225, N = 16]
The histogram and box and whisker plot are seen above and below.
[SPSS box and whisker plot of the bull elephant masses]
MINITAB will produce the results shown above, giving everything SPSS does except
the 95% confidence intervals. If you want those you can go into the Graphical
Summary dialogue box. MINITAB will then give you the following graphic.
Problem 2.1
In a population of women, heart rate is normally distributed with a mean of 75 and a
standard deviation of 11. Between which limits will 95% of the women have their heart
rates?
Problem 2.2
The masses (in grams) for a sample of 10 adult mice from a large laboratory population
were measured. The following results were obtained:
5.6 5.2 6.1 5.4 6.3 5.7 5.6 6.0 5.5 5.7
Calculate estimates of the mean and standard deviation of the mass of the mice.
Problem 2.3
Measurements were taken of the pH of nine leaf cells. The results were as follows:
(a) Use the data to calculate estimates of the mean, standard deviation and standard-
error of the mean. Use these estimates to calculate the 95% confidence interval for
cell pH.
(b) Repeat the calculation assuming that you had only taken the first four measurements.
How much wider is the 95% confidence interval?
Problem 2.4
The masses (in kilograms) of 25 newborn babies were as follows.
3.5 2.9 3.4 1.8 4.2 2.6 2.2 2.8 2.9 3.2 2.7 3.4 3.0
3.2 2.8 3.2 3.0 3.5 2.9 2.8 2.5 2.9 3.1 3.3 3.1
Calculate the mean, standard deviation and standard error of the mean and present your
results (a) in figures and (b) in the form of a bar chart with error bars showing standard
deviation.
3 Testing for normality and transforming data
3.2 The Kolmogorov–Smirnov test

3.2.1 Purpose
To test whether the distribution of a sample is significantly different from the
normal distribution.
Example 3.1 Is the distribution of weights of the bull elephants significantly different
from the normal distribution?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the distribution of your sample is not different
from the normal distribution.
Here the null hypothesis is that the distribution of the weights of the
bull elephants is not different from the normal distribution.
Finally click on Continue and then OK to run the test. SPSS will give you
not only the usual descriptive statistics but also an extra table and two more
plots. The important results are shown in the table below.
Tests of normality
Kolmogorov–Smirnov(a) Shapiro–Wilk
Statistic df Sig. Statistic df Sig.
Bulls 0.172 16 0.200* 0.962 16 0.691
Using MINITAB
Click on the Stat menu; from there move onto the Basic Statistics bar, then
click on Normality Test. This brings up the Normality Test window (below).
Click the mouse in the Variable box, highlight C1 bulls and click on Select.
This will put C1 bulls in the Variable box. Next tick Kolmogorov–Smirnov
to give the completed window shown below.
Finally click on OK to run the test. MINITAB will give you the following plot, which presents the Kolmogorov–Smirnov statistic (K‐S). Here it is 0.172.
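Outside SPSS and MINITAB, an equivalent check can be run in Python with scipy. Note that the Kolmogorov–Smirnov figure reported by SPSS uses the Lilliefors correction, which scipy's plain kstest does not apply, so the Shapiro–Wilk test (also shown in the SPSS table above) is the closer match; the masses below are invented:

```python
import numpy as np
from scipy import stats

# Invented masses standing in for the 16 bull elephants
bulls = np.array([4.4, 4.6, 4.5, 4.7, 4.8, 4.9, 4.6, 4.7,
                  4.5, 4.8, 4.9, 5.0, 4.6, 4.7, 4.8, 4.7])

w_stat, p_value = stats.shapiro(bulls)  # Shapiro-Wilk test, as in the SPSS table
print(w_stat, p_value)                  # p > 0.05: no significant departure here
```

(If a Lilliefors-corrected Kolmogorov–Smirnov figure is wanted, the statsmodels package provides a lilliefors function.)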
below 0 or rise above 1, so the distribution is cut off at both ends.

3.4 Examining data in practice

To illustrate how you should investigate the distribution of your data and deal
with it if it does not look normally distributed, it is perhaps best to go through
two examples.
Example 3.2 Data on the masses of 30 newts gave the following results.
Mass (g)
0.91 1.45 5.32 2.97 2.12 0.76 1.85 1.53 2.42 1.92
1.86 3.81 6.54 2.53 1.92 2.33 1.45 1.22 1.68 3.10
1.80 3.51 2.43 1.34 1.09 2.62 1.90 4.32 0.89 1.55
Test to see if this data is normally distributed and, if not, carry out a sen-
sible transformation to make it normally distributed.
Solution
Using SPSS
SPSS gives the following histogram:
[SPSS histogram of newt weight: mean = 2.27, std. dev. = 1.323, N = 30]
Tests of normality
Kolmogorov–Smirnov(a) Shapiro–Wilk
Statistic df Sig. Statistic df Sig.
Newt weight 0.177 30 0.018 0.857 30 0.001
Using MINITAB
Examining the distribution in MINITAB gives the following histogram:
The data is clearly strongly positively skewed with far more small newts
than large ones, and performing a test for normality gives the following results.
3.5 Transforming data
Click on OK and SPSS will produce the new column lognewt next to newt-
weight as seen below.
You can now examine the distribution of log10(newt weight) using SPSS and carry out a further Kolmogorov–Smirnov test. It will give the following results.
[SPSS histogram of lognewt: mean = 0.30, std. dev. = 0.231, N = 30]
Tests of normality
Kolmogorov–Smirnov(a) Shapiro–Wilk
Statistic df Sig. Statistic df Sig.
lognewt 0.095 30 0.200* 0.987 30 0.972
a Lilliefors’ significance correction.
* This is a lower bound of the true significance.
Click on OK and MINITAB will produce the new column C2, which you can
name log weight as seen below.
You can now examine the distribution of log weight and carry out a
Kolmogorov–Smirnov test. It will give the following results.
Whichever package is used, the new histogram is far more symmetrical. Here Sig. and the P‐value > 0.05, so the distribution is not significantly different from normal. This shows that the log transformation has been successful. It is now possible to carry out parametric statistical tests using this transformed data.
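The whole procedure of transforming and then re-testing takes only a few lines in Python, here using the newt masses of Example 3.2 with the Shapiro–Wilk test:

```python
import numpy as np
from scipy import stats

newts = np.array([0.91, 1.45, 5.32, 2.97, 2.12, 0.76, 1.85, 1.53, 2.42, 1.92,
                  1.86, 3.81, 6.54, 2.53, 1.92, 2.33, 1.45, 1.22, 1.68, 3.10,
                  1.80, 3.51, 2.43, 1.34, 1.09, 2.62, 1.90, 4.32, 0.89, 1.55])

log_newts = np.log10(newts)              # the same transformation as lognewt

print(stats.shapiro(newts).pvalue)       # small p: the raw data are non-normal
print(stats.shapiro(log_newts).pvalue)   # p > 0.05 after the transformation
```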
Let’s have a look at another example as a contrast.
Example 3.3 In a survey to investigate the population of ground beetles in a field of wheat,
20 pitfall traps were set out through the field, and samples collected after 3
days. The numbers of ground beetles found in each trap were as follows:
12 15 0 2 26 0 1 18 3 0
0 5 17 0 13 1 10 2 8 13
Solution
Exploring this data in MINITAB and testing for normality gives the follow-
ing relevant statistical results, histogram and box and whisker plots:
The complete process is best summarised by the flow chart in Figure 1.2, presented on page 8 and on the inside cover of the book. One warning is in order, however. The Kolmogorov–Smirnov test is useful, but you must remember that just because it does not show up a significant difference does not mean that your data is normally distributed. Particularly with small sample sizes, a type 2 error can easily occur. Therefore when examining the distribution of data you should also look at histograms. The results of Kolmogorov–Smirnov tests are useful, however, when you are presenting your results to back up claims that a parametric analysis you have carried out is valid.
Problem 3.1
In a survey to examine the relative investment into roots of a self‐supporting plant and
a climber, the following results were obtained for the proportion of total dry mass in the
root system.
Self‐supporting 0.16 0.23 0.28 0.22 0.25 0.20 0.17 0.32 0.24 0.26
Climber 0.15 0.13 0.08 0.11 0.13 0.19 0.14 0.16 0.15 0.24
0.13 0.07.
Problem 3.2
A survey was carried out of the mean lengths of 20 species of the crow family. The results
are shown below.
Species   1   2   3   4   5   6   7   8   9   10   11   12   13   14   15   16   17   18   19   20
Length    7   8   9   10  11  12  12  13  14  15   16   18   19   21   23   24   26   29   32   37
Examine the data using SPSS or MINITAB. Does it look normally distributed and if not
how would you transform it to make it so?
4 Testing for differences between one or two groups

4.1 Introduction
Because of variation, we can never be certain that the differences between our sample means reflect real differences in the population means. We might have got different means just by chance.
Figure 4.1 Sample means different from an expected value. (a) There is a high probability (shaded areas) of obtaining a mean at least one standard error (SE) away from the expected mean μ. (b) There is a very low probability (shaded areas) of getting a mean at least three standard errors (3 SE) away from the expected mean μ.
4.3 How we test for differences

critical values: Tabulated values of test statistics; usually, if the absolute value of a calculated test statistic is greater than or equal to the appropriate critical value, the null hypothesis must be rejected.

confidence limits: Limits between which estimated parameters have a defined likelihood of occurring. It is common to calculate 95% confidence limits, but 99% and 99.9% confidence limits are also used. The range of values between the upper and lower limits is called the confidence interval.

Statisticians have taken a lot of the hard work out of deciding whether to reject the null hypothesis by preparing tables of critical values for test statistics. Several of these tables, including the one for the t statistic, are given at the end of the book. If the absolute value of your test statistic is (usually) greater than or equal to the critical value for the 5% significance level, then there is a less than 5% probability of getting these results by chance. Therefore, you can reject the null hypothesis. It is even easier if you are carrying out a statistical test in SPSS or another computer package. It will work out the significance probability for you, and all you have to do is compare that probability with 0.05.

Sometimes you may find the probability P falls below critical levels of 1 in 100 or 1 in 1000. If this is true, you can reject the null hypothesis at the 1% or 0.1% levels respectively.

Step 5: Calculating confidence limits

Whether or not there is a significant difference, you can calculate confidence limits to give a set of plausible values for the differences of the means.
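If you use a scripting language rather than the printed tables, the same critical values can be looked up directly. A minimal Python sketch (assuming the scipy library; the book's own instructions use SPSS and MINITAB):

    from scipy import stats

    # Two-tailed 5% critical value of t: 2.5% in each tail,
    # so we ask for the 97.5th percentile of the t distribution.
    df = 15
    print(round(stats.t.ppf(0.975, df), 3))   # 2.131, as in Table S1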
two-tailed tests: Tests which ask merely whether observed values are different from an expected value or each other, not whether they are larger or smaller.

Statistical tables often come in two different versions: one-tailed and two-tailed. Most biologists use two-tailed tests. These test whether there are differences from expected values but do not make any presuppositions about which way the differences might be. With our rats, therefore, we would be testing whether they had a different length, but not whether they were longer or shorter than expected. The criterion for rejecting the null hypothesis in the two-tailed test is that the total area in the two tails of the distribution (Figure 4.1) is less than 5%, so each tail must have an area of less than 2.5%. All the statistical tables in this text are two-tailed. The tests carried out by SPSS and MINITAB are also by default two-tailed.
There are three main types of t test, which are used in different situations. The simplest one, the one-sample t test, is used to determine whether the mean of a single sample is different from an expected value. If you want to see if there are differences between two sets of paired observations, you need to use the paired t test. Finally, to test whether the means of two independent sets of measurements are different, you need to carry out a two-sample t test, also known as an independent-sample t test. These tests all have fairly easy-to-grasp logic and are straightforward to carry out mathematically. Therefore, instructions will be given for carrying out these tests both using a calculator and using the computer packages.
parametric test: A statistical test which assumes that data are normally distributed.

All these t tests are so-called parametric tests, which assume that the data is normally distributed. If you have found that your data is not normally distributed (see Section 3.3), and cannot transform it into data that is, you should instead use their non-parametric equivalents: the sign test, the Wilcoxon matched pairs test and the Mann–Whitney U test. These are given at the end of the chapter.

4.6 The one-sample t test
4.6.1 Purpose
To test whether the sample mean of one measurement taken on a single popu-
lation is different from an expected value E.
4.6.2 Rationale
The one‐sample t test calculates how many standard errors the sample mean is
away from the expected value. It is therefore found using the formula

t = (x̄ − E)/SE

where x̄ is the sample mean, E is the expected value and SE is the standard error of the mean. The further away the mean is from the expected value, the larger the value of
t, and the less probable it is that the real population mean could be the expected
value. Note, however, that it does not matter whether t is positive or negative.
It is the difference from zero that matters, so you must consider the absolute
value of t, |t |. If |t | is greater than or equal to a critical value, then the difference
is significant.
4.6.3 Validity
The data must be normally distributed.
Example 4.1 Do the bull elephants we first met in Example 2.1 have a different mean
mass from the mean value for the entire population of African elephants
of 4.50 tonnes?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the mean of the population is not different from
the expected value. Here, therefore, the null hypothesis is that the mean
weight of bull elephants is 4.50 tonnes.
Here the sample mean is 4.70 tonnes and its standard error is 0.0563, so t = (4.70 − 4.50)/0.0563 = 3.55. The mean is 3.55 standard errors away from the expected value.
Using SPSS
To carry out a one sample t test, you first need to enter all the data into a
single column, and give the column a name (here bulls). To run the test,
click on the Analyze menu, then move onto the Compare Means bar and
click on One‐Sample T Test. SPSS will present you with the One‐Sample T
Test dialogue box.
Put the column you want to compare (here bulls) into the Test Variables box, and the value you want to compare it with (here 4.50) into the Test Value box. This will give the following output.

[SPSS output: one-sample statistics and one-sample test tables for bulls]
SPSS gives you all the descriptive statistics you need in the top table, and
the value of t (3.554) in the lower table.
Using MINITAB
Once again, enter all the data into a single column, and name it bulls. To
run the test, click on the Stat menu, then move onto the Basic Statistics
bar and click on 1‐Sample t. MINITAB will present you with the 1‐Sample t
(Test and Confidence Interval) dialogue box.
Put the column you want to compare (here bulls) into the Samples in columns box, tick Perform hypothesis test, and type the value you want to compare with (here 4.5) into the Hypothesised mean box. This will give the following output.
One‐Sample T: bulls
Using a calculator
You must compare your value of | t | with the critical value of the t statistic
for (N - 1) degrees of freedom and at the 5% level [t(N-1)(5%)]. This is given
in Table S1 at the end of the book.
Here, there are 16 − 1 = 15 degrees of freedom, so the critical value that |t| must exceed for the probability to drop below the 5% level is 2.131.
Using a calculator
The 95% confidence limits for the difference are given by the equation

95% CI(difference) = (x̄ − E) ± [t(N−1)(5%) × SE]

Here, the mean is 4.70, with a standard error of 0.0563, and the critical t value for 15 degrees of freedom is 2.131. Therefore

95% CI(difference) = (4.70 − 4.50) ± (2.131 × 0.0563) = 0.08 to 0.32 tonnes
Bull elephants are 95% likely to be between 0.08 and 0.32 tonnes heavier
than 4.5 tonnes.
The weight of the bull elephants, with a mean and standard error of 4.70 ± 0.056 tonnes, was 4% greater than the expected weight of 4.5 tonnes, a difference that a one-sample t test showed was significant (t15 = 3.55, P = 0.003). Note that the subscript after t is the number of degrees of freedom.
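The same arithmetic can be reproduced from the summary statistics alone. A minimal Python sketch (scipy assumed; the figures below are the mean, standard error and sample size quoted in the example, not the raw data):

    from scipy import stats

    mean, se, n, expected = 4.70, 0.0563, 16, 4.50

    t = (mean - expected) / se                # 3.55 standard errors from E
    t_crit = stats.t.ppf(0.975, n - 1)        # 2.131 for 15 degrees of freedom
    lower = (mean - expected) - t_crit * se   # 0.08
    upper = (mean - expected) + t_crit * se   # 0.32
    p = 2 * stats.t.sf(abs(t), n - 1)         # about 0.003
    print(t, (lower, upper), p)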
4.7 The paired t test

4.7.1 Purpose
To test whether the means of two sets of paired measurements are different
from each other. Examples might be a single group of people measured twice,
e.g. before vs after a treatment; or related people measured once, e.g. older vs
younger identical twins.
4.7.2 Rationale
The idea behind the paired t test is that you look at the difference between each
pair of points and then see whether the mean of these values is different from
0. The test therefore has two stages. You first calculate the difference d between
each of the paired measurements you have made. You can then use these figures
to calculate the mean difference between the two sets of measurements and the
standard error of the difference. You then use a one‐sample t test to determine
whether the mean difference d is different from zero. The test statistic t is the
number of standard errors the difference is away from zero. It can be calculated with a calculator or a computer package using the following equation:

t = Mean difference / Standard error of difference = d/SEd   (4.3)
This procedure has the advantage that it removes the variability within each
sample, concentrating only on the differences between each pair, so it improves
your chances of detecting an effect.
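That two-stage logic is easy to see in code. A minimal Python sketch (scipy assumed; the arrays are hypothetical stand-ins, not the pond data of Example 4.2, which appears only in the package screenshots):

    import numpy as np
    from scipy import stats

    # Hypothetical paired measurements on the same nine ponds
    dawn = np.array([5.2, 5.4, 5.1, 5.6, 5.3, 5.5, 5.2, 5.4, 5.3])
    dusk = np.array([5.4, 5.6, 5.2, 5.8, 5.5, 5.7, 5.3, 5.7, 5.4])

    d = dusk - dawn                      # stage 1: the paired differences
    t = d.mean() / stats.sem(d)          # stage 2: equation (4.3)
    print(t)
    print(stats.ttest_rel(dusk, dawn))   # scipy's paired t test gives the same t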
4.7.3 Validity
Both sets of data must be normally distributed.
Example 4.2 Two series of measurements were made of the pH of nine ponds: at dawn
and at dusk. The results are shown below.
Carrying out descriptive statistics shows that the mean difference d = 0.19 and the standard error of the difference SEd = 0.043.
Do the ponds have a different pH at these times?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the mean difference d is not different from zero.
Here, the null hypothesis is that the mean of the differences in the pH is
0, i.e. the ponds have the same pH at dawn and dusk.
Using SPSS
SPSS can readily work out t as well as other important elements in the test.
Simply put the data side by side into two columns so that each pair of
observations has a single row and give each column a name (here dawnph
and duskph). Next, click on the Analyze menu, move onto the Compare
Means bar, and click on the Paired‐Samples T Test bar. SPSS will produce
the Paired‐Samples T Test dialogue box. Put the columns to be compared
into the Variables box as shown below by clicking on both of them and
then on the arrow. The completed data and dialogue box are shown below.
SPSS gives the descriptive statistics in the first table and in the last table calculates t, which here is −4.404. (Note that version 19 of SPSS always takes the column which is second in the alphabet from that which is first.)
Using MINITAB
Simply put the data side by side into two columns so that each pair of obser-
vations has a single row and give each column a name (here Dawn pH and
Dusk pH). Next, click on the Stat menu, move onto the Basic Statistics bar,
and click on Paired t. MINITAB will produce the Paired t (Test and Con-
fidence Interval) dialogue box. Put the columns to be compared into the
First Sample and Second Sample box as shown. The completed data and
dialogue box are shown below.
It gives the descriptive statistics in the top table and the value of t (T-value) as −4.40 in the bottom line.
Using a calculator

You must compare your value of |t| with the critical value of the t statistic for (N − 1) degrees of freedom, where N is the number of pairs of observations, at the 5% level [t(N−1)(5%)]. This is given in Table S1 at the end of the book.

Here there are 9 − 1 = 8 degrees of freedom, so the critical value of t for the 5% level is 2.306.
Using a calculator
The 95% confidence intervals can be calculated from the equation

95% CI(difference) = d ± [t(N−1)(5%) × SEd] = 0.19 ± (2.306 × 0.043) = 0.09 to 0.29
It is 95% likely that the pH at dusk will be between 0.09 and 0.29 higher
than the pH at dawn.
Figure 4.2 Mean (± standard error) of the pH of the nine ponds at dawn and dusk.
4.8 The two-sample t test

4.8.1 Purpose
To test whether the means of two sets of unpaired measurements are different from each other. For instance, it is used to test whether experimentally treated organisms are different from controls, or one species is different from another.
4.8.2 Rationale
This test is rather more complex than the previous two because you have to decide the probability of overlap between the distributions of two sample means (Figure 4.3).

standard error of the difference (SEd): A measure of the spread of the difference between two estimated means.

To do this you have to calculate t by comparing the difference in the means of the two populations with an estimate of the standard error of the difference between the two populations, using the equation

t = Mean difference / Standard error of difference = (x̄A − x̄B)/SEd   (4.5)

In this case, however, it is much more complex to calculate the standard error of the difference SEd, because this would involve comparing each member of the first population with each member of the second. Using a calculator, SEd can be estimated if we assume that the variance of the two populations is the same. It is given by the equation

SEd = √(SEA² + SEB²)

where SEA and SEB are the standard errors of the two populations. If the populations are of similar size, SEd will be about one-and-a-half times as big as either population standard error. Computer packages can also perform a more complex calculation of SEd that makes no such simplifying assumption.
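Both versions of the calculation are available in Python's scipy library (an assumption; the book's own instructions use SPSS and MINITAB). In the sketch below, equal_var=True makes the same equal-variance assumption as the calculator method, while equal_var=False performs the more complex calculation; the two arrays are hypothetical.

    import numpy as np
    from scipy import stats

    def se(x):
        # Standard error of one sample
        return x.std(ddof=1) / np.sqrt(len(x))

    a = np.array([4.6, 4.7, 4.8, 4.7, 4.6, 4.8])   # hypothetical sample A
    b = np.array([4.3, 4.4, 4.5, 4.4, 4.3, 4.5])   # hypothetical sample B

    se_d = np.sqrt(se(a)**2 + se(b)**2)   # standard error of the difference
    print((a.mean() - b.mean()) / se_d)   # equation (4.5); matches the pooled
                                          # test exactly when sample sizes are equal
    print(stats.ttest_ind(a, b, equal_var=True))
    print(stats.ttest_ind(a, b, equal_var=False))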
4.8.3 Validity
Both sets of data must be normally distributed. The two‐sample t test also makes
an important assumption about the measurements: it assumes the two sets of
measurements are independent of each other. This would not be true of the
data on the ponds we examined in Example 4.2, because each measurement has
a pair, a measurement from the same pond at a different time of day. Therefore
it is not valid to carry out a two‐sample t test on this data.
Example 4.3 The following data were obtained by weighing 16 cow elephants as well as
the 16 bull elephants we have already weighed. We will test whether bull
elephants have a different mean mass from cow elephants.
Solution
It looks like bulls are heavier, but are they significantly heavier? The mean mass of the bulls is 4.70 tonnes and that of the cows is 4.40 tonnes, and the standard error of the difference is 0.0680, so

t = (4.70 − 4.40)/0.0680 = 0.30/0.0680 = 4.43
Using SPSS
To perform a two‐sample t test in SPSS, you must first put all the data into the
same column because each measurement is on a different organism. Call it
something like weight. To distinguish between the two groups, you must
create a second, subscript, column with one of two values, here 1 and 2. We
will call it sex. You can also identify the subscripts using the Values box in
the Variable View screen. To do this click on the box and onto the three
dots on the right. The Value Labels dialogue box will come up. Type the
first subscript (here 1) into the Value box and type in its name (here bulls)
in the Label box. Click on Add to save this, then do the same for 2 and
cows. The completed box is shown below.
Once you have entered your data, simply click on the Analyze menu,
move onto the Compare Means bar, and click on Independent‐Samples
T Test. SPSS will come up with the Independent‐Samples T Test dialogue
box.
Put the variable you want to test (here weight) into the Test Variable
box and the subscript column (here sex) into the Grouping Variable box.
Define the groups by clicking on the Define Groups tab to bring up the
Define Groups dialogue box. Put in the values of the subscript column
(here 1 and 2) into that box to give the data set and completed boxes
shown below.
Click on Continue to get back to the main dialogue box and finally click
on OK to run the tests. SPSS comes up with the following results.
In the first box it gives the descriptive statistics for the two sexes (note that I have given names for the values of the subscripts 1 and 2) and then performs the t test, both with and without making the assumption of equal variances. In both cases here, t = 4.431. The two results are usually similar. The first test is only valid if the variances are not significantly different, and this is, in fact, tested by Levene's test for equality of variances, the results of which are given at the left of the second table. If Sig. < 0.05 then the test is not valid. Here Sig. = 0.140, so you could use either test. If in doubt, though, use the second, more accurate statistic.
Using MINITAB
There are two ways in which you can input your data into MINITAB to
perform a two‐sample t test. You can either put the results from the two
samples into separate columns, or (I recommend this so that later more
complex tests don’t come as so much of a shock) put all the data into the
same column as in SPSS. Call it something like weight. To distinguish be-
tween the two groups, you must create a second, subscript, column with
one of two values, here 1 and 2. We will call it sex.
Once you have entered your data, simply click on the Stat menu, move
onto the Basic Statistics bar, and click on 2‐Sample t. MINITAB will come
up with the 2‐Samples t (Test and Confidence Interval) dialogue box.
Put the variable you want to test (here weight) into the Samples box and
the subscript column (here sex) into the Subscripts box to give the data set
and completed boxes shown below.
The means, standard deviations and standard errors are given at the top and the value of t (T-value) on the final line. Here T-value = 4.43.
Using a calculator
95% CI(difference) = (4.70 − 4.40) ± (2.042 × 0.0680) = 0.16 to 0.44 tonnes
Figure 4.4 The mean (± standard error) of the masses of 16 bull and 16 cow elephants.
Note that the P value is not actually zero, even though SPSS and MINITAB report it as 0.000. Indeed P values are never zero, so when a computer package gives a value of 0.000 it just means that it is less than 0.0005. A two-sample t test is usually only significant if the error bars on your bar chart do not overlap each other.
If you have taken several measurements on the same samples (for instance
comparing a control and experimentally treated group) it is usually best to pre-
sent the information in the form of a table as in Table 4.1, with the benefit of
an informative legend.
Table 4.1 The effect of nitrogen treatment on sunflower plants. The results show the means ± standard error for control and high-nitrogen plants of their height, biomass, stem diameter and leaf area. Asterisks denote the degree of significance: * P < 0.05; ** P < 0.01; *** P < 0.001; NS no significant difference.
rank: Numerical order of a data point.

median: The central value of a distribution (or the average of the two middle points if the sample size is even).

Non-parametric tests make no assumption about the shape of the distribution but use only information about the rank of each data point. The tests for differences compare the medians of the groups instead of comparing means, and all the tests look at the probability of getting the ranked data points in a particular order. This means that the tests are intuitively fairly easy to understand. However, it often takes a great deal of time to assign ranks, and to manipulate these ranks to produce the test statistic. Therefore it is often quicker to carry out non-parametric statistics using computer packages, which do all that for you. For this reason we will take a brief look at the rationale and mathematics of each of the non-parametric equivalents of the t tests, before looking at how to carry them out both on a calculator and using the statistical packages.

4.10 The one-sample sign test
4.10.1 Purpose
To test whether the sample median of one measurement taken on a single pop-
ulation is different from an expected value E. It is the non‐parametric equiva-
lent of the one‐sample t test.
4.10.2 Rationale
Like the one‐sample t test, the first stage of the one sample sign test is to calculate the
difference d between each measurement you have made and the expected value, E.
Next, you rank the absolute values of these differences, and give the positive differ-
ences a plus and the negative differences a minus sign. Finally, you add all the nega-
tive ranks together, and separately add all the positive ranks together to give two
values of T: T− and T+. Note that if the median of your sample is lower or higher
than the expected median, one value of T will be large and the other will be small. In
this test the smaller T value is compared with the critical values table for the relevant
group size. The null hypothesis is rejected if your smaller value of T is lower than or
equal to a critical value. Note that this test is a special case of the Wilcoxon matched
pairs test (Section 4.11) but substituting expected values for one of the two samples.
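Because the procedure is the Wilcoxon signed-rank calculation applied to differences from an expected value, it can be scripted with scipy's wilcoxon function. A Python sketch (scipy assumed), using the scores from Example 4.4 below; scipy drops the zero differences by default, just as the ties are ignored here:

    import numpy as np
    from scipy import stats

    # Scores from Example 4.4: 8 ones, 14 twos, 13 threes, 4 fours
    scores = np.array([1]*8 + [2]*14 + [3]*13 + [4]*4)

    # Signed-rank test of (score - expected) against zero; the 13 zero
    # differences are discarded automatically.
    print(stats.wilcoxon(scores - 3))   # two-sided statistic is the smaller
                                        # rank sum, T = 38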
Example 4.4 After a course on statistics, students were required to give their verdict on the
merits of the course in a questionnaire. Students could give scores of 1 = rubbish
through 3 = reasonable to 5 = excellent. The following scores were obtained.
                      Number
Students scoring 1    8
Students scoring 2    14
Students scoring 3    13
Students scoring 4    4
Students scoring 5    0

Test whether the median score was different from 3 (reasonable).
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the median of the scores the students gave was 3.
The ties (where the scores equal the expected value and so Score − E = 0) contribute nothing to the comparison and are ignored. The ranks of the differences are given from 1 = the lowest to 26 = the highest. When there are ties between the ranks of the differences, each one is given the mean of the ranks. Hence in this case there are 18 students with a difference from expected of 1 (14 with −1 and 4 with +1). The mean of ranks 1 to 18 is 9.5, so each of these students is given the rank 9.5. There are also eight students with a difference from the expected value of 2. The mean of ranks 19 to 26 is 22.5, so each of these students is given a rank of 22.5. Adding up the ranks gives a value for T− of 313 and for T+ of 38. The smaller value is T+, so the T value to use is 38.
Using SPSS
Enter your data values into one column (here named, say, scores) and the
expected value (here 3) as a second column (named, say, expected).
Next, click on the Analyze menu, move onto the Nonparametric Tests bar,
onto Legacy Dialogs and click on the Two Related Samples bar. SPSS will
produce the Two Related Samples Tests dialogue box. Put the columns to
be compared into the Test Pair(s) List box, making sure the Wilcoxon test
type is ticked. To get descriptive statistics and quartiles, go into Options and
click on both Descriptives and Quartiles. The completed boxes and data are
shown below.
Finally, click on Continue to get back to the main dialogue box and onto
OK to run the test. SPSS will come up with the following results.
NPar tests

Descriptive statistics

                                                            Percentiles
           N    Mean     Std. deviation   Minimum  Maximum  25th     50th (Median)  75th
Score      39   2.3333   0.92717          1.00     4.00     2.0000   2.0000         3.0000
Expected   39   3.0000   0.00000          3.00     3.00     3.0000   3.0000         3.0000

Ranks

                                     N       Mean rank   Sum of ranks
Expected − score   Negative ranks    4(a)    9.50        38.00
                   Positive ranks    22(b)   14.23       313.00
                   Ties              13(c)
                   Total             39

a Expected < score.
b Expected > score.
c Expected = score.
Test statistics(b)

                            Expected − score
Z                           −3.651(a)
Asymp. Sig. (two-tailed)    0.000

a Based on negative ranks.
b Wilcoxon signed ranks test.
As well as the descriptive statistics, SPSS has given the sums of the ranks, T− = 38 and T+ = 313, and has also given something called the Z statistic. Here Z = −3.651.
NB SPSS now has a new analysis available for this test, through the One Sample bar. I don't recommend this new method, however, because, although it carries out the test, it doesn't actually present you with any statistics!
Using MINITAB
Enter your data values into one column (here named, say, scores). Next,
click on the Stat menu, move onto the Nonparametrics bar, then click on
the 1‐Sample Sign bar. MINITAB will produce the 1‐Sample Sign dialogue
box. Put the column to be compared into the Variables box, tick on Test
Median and enter the value you are testing against (here 3). The completed
box and data are shown below.
Finally, click on OK to run the test. MINITAB will come up with the follow-
ing results.
Using a calculator

You must compare your value of T with the critical value of the Wilcoxon T distribution for N − 1 degrees of freedom, where N is the number of non-tied observations. This is given in Table S4 at the end of the book.

Here there are 39 students, but 13 have the expected median score, so are tied, and hence N = 39 − 13 = 26. The critical value of T for 26 − 1 = 25 degrees of freedom at the 5% level is 89.
Therefore we must reject the null hypothesis. We can say that the median score of the course was significantly different from the expected value of 3. In fact the median score (2) was lower, showing the unpopularity of this course.

The median score of the students was 2, which a one-sample sign test showed was significantly lower than the expected value of 3 (T25 = 38, P < 0.0005).
4.11 The Wilcoxon matched pairs test

4.11.1 Purpose
To test whether the medians of two paired sets of measurements made on an identifiable population are different from each other. Examples might be a single group of people measured twice, e.g. before and after a treatment; or related people measured once, e.g. a single group of husbands and wives. This is the non-parametric equivalent of the paired t test.
4.11.2 Rationale
Like the paired t test, the first stage of the Wilcoxon test is to calculate the dif-
ference d between each of the two paired measurements you have made. Next,
you rank the absolute value of these differences, and give the positive differ-
ences a plus and the negative differences a minus sign. Finally, you add all the
negative ranks together, and separately add all the positive ranks together to
give two values of T: T− and T+. Note that if one set of measurements is larger
than the other, one value of T will be large and the other will be small. In this
test the smaller T value is compared with the critical values table for the rel-
evant group sizes. The null hypothesis is rejected if your smaller value of T is
lower than or equal to a critical value.
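In Python the same test is scipy's wilcoxon function applied to the two paired columns. A sketch (scipy assumed; the before/after scores are hypothetical, not the acne data of Example 4.5 below):

    import numpy as np
    from scipy import stats

    before = np.array([5, 4, 6, 5, 4, 5, 3, 5, 6, 4])   # hypothetical
    after  = np.array([3, 2, 3, 4, 2, 3, 3, 2, 4, 1])   # hypothetical

    # Ties (zero differences) are dropped automatically; the two-sided
    # statistic is the smaller of T+ and T-.
    print(stats.wilcoxon(before, after))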
Example 4.5 A new treatment for acne was being tested. Ten teenage sufferers had their
level of acne judged on an arbitrary rank scale from 0 (totally clear skin)
through to 6 (very bad acne). They were then given the new treatment for
4 weeks and assessed again. The following results were found:
It looks as if the treatment reduces the severity of the acne, but is this a sig-
nificant difference?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there is no difference in the median level before
and after treatment.
The ties (where the level before equals the level after and so d = 0) con-
tribute nothing to the comparison and are ignored. The ranks of the differ-
ences are given from 1 = the lowest to 9 = the highest. When there are ties
between the ranks of the differences, each one is given the mean of the ranks.
Hence in this case the three lowest differences are all equal to 1 (where d is either −1 or +1). They would be given ranks 1, 2 and 3, the mean of which is
2. Similarly there are three points that have the largest difference of 3; these
points would have the ranks of 7, 8 and 9, the mean of which is 8.
Using SPSS
Enter the data into two columns, named, say, skinbefore and skinafter. Next,
click on the Analyze menu, move onto the Nonparametric tests bar, onto
Legacy Dialogs then click on the Two Related Samples bar. SPSS will produce
the Two Related Samples Test dialogue box. Put the columns to be compared
into the Test Pair(s) List box, making sure the Wilcoxon test type is ticked.
To get descriptive statistics and quartiles, go into Options and click on both
Descriptives and Quartiles. The completed boxes and data are shown below.
Finally, click on Continue to get back to the main dialogue box and then
click on OK to run the test. SPSS will come up with the following results.
NPar tests

Descriptive statistics

                                                             Percentiles
             N   Mean     Std. deviation   Minimum  Maximum  25th     50th (Median)  75th
Skinbefore   9   4.5556   0.88192          3.00     6.00     4.0000   5.0000         5.0000
Skinafter    9   2.7778   1.09291          1.00     5.00     2.0000   3.0000         3.0000
Ranks
Test statistics(b)

                          Skinafter − skinbefore
Z                         −2.455(a)
Asymp. Sig. (2-tailed)    0.014

a Based on positive ranks.
b Wilcoxon signed ranks test.
In the top table SPSS has given the sums of the ranks, T− = 43 and T+ = 2, and in the bottom one it has also given something called the Z statistic. Here Z = −2.455.
NB SPSS now has a new analysis available for this test, through the Related Samples bar. I don't recommend this new method, however, because, although it carries out the test, it doesn't actually present you with any statistics!
Using MINITAB

To do this test in MINITAB is a bit awkward. You have to calculate the difference between each pair of data items, and then perform a 1-Sample Wilcoxon test on the differences. Enter the data into two columns, named, say, skinbefore and skinafter. Next, create a difference column. Click on the Calc menu, and click onto the Calculator bar. This opens the Calculator dialogue box. Put the expression for the difference ('skinafter' − 'skinbefore') into the Expression box, and put C3 into the Store result in variable box. The completed data and box are shown below.
Finally, click on OK to run the test. MINITAB will come up with the follow-
ing results.
Wilcoxon signed rank test: difference
MINITAB gives the Wilcoxon test statistic, here 2.0, in the table.
Using a calculator
You must compare your value of T with the critical value of the T statistic
for N degrees of freedom, where N is the number of non‐tied pairs. This is
given in Table S4 at the end of the book.
Here there are 10 pairs, but there is one tie, so you must look up the critical T value for 10 − 1 = 9 degrees of freedom. The critical value of T for the 5% level is 5.
• If T is greater than the critical value, you have no evidence to reject the null hypothesis. Therefore you can say that the mean difference is not significantly different from zero.
• If T is less than or equal to the critical value, you must reject the null hypothesis. Therefore you can say that the mean difference is significantly different from zero.

Here T9 = 2 < 5, so we can reject the null hypothesis.
Figure 4.5 Box and whisker plot showing the levels of acne of patients before
and after treatment.
4.12 The Mann–Whitney U test
4.12.1 Purpose
To test whether the medians of two unpaired sets of measurements are different
from each other. For instance it is used to test whether experimentally treated
organisms are different from controls, or one species is different from another.
4.12.2 Rationale
The Mann–Whitney U test works by comparing the ranks of observations in the
two groups. First the data from the two groups are pooled and the ranks of each
observation are calculated, with rank 1 being the smallest value. Where there are ties, observations are given the average value of the ranks. Next, the ranks in each group are summed separately, to give the values R1 and R2. Finally, two test statistics, U1 and U2, are calculated using the following equations:

U1 = n1n2 + n1(n1 + 1)/2 − R1
U2 = n1n2 + n2(n2 + 1)/2 − R2

where n1 and n2 are the sample sizes of group 1 and group 2, respectively.
Note that if one group has much higher ranks than the other, one value of
U will be large and the other will be small. In this test the smaller U value is
compared with the critical values table for the relevant group sizes. Just like the
Wilcoxon test, the null hypothesis is rejected if your value of U is lower than
or equal to a critical value.
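The rank-sum bookkeeping is short enough to write out in full. A minimal Python sketch (scipy assumed; the split of the observations between the two samples is hypothetical, so the numbers will differ from the worked example below):

    import numpy as np
    from scipy import stats

    def mann_whitney_u(x, y):
        # Compute U1 and U2 from the rank sums, as described above
        n1, n2 = len(x), len(y)
        ranks = stats.rankdata(np.concatenate([x, y]))   # ties get averaged ranks
        r1, r2 = ranks[:n1].sum(), ranks[n1:].sum()
        u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
        u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
        return min(u1, u2)

    field_one = np.array([2, 5, 9, 12, 15, 21, 44])        # hypothetical split
    field_two = np.array([4, 8, 12, 17, 19, 25, 44, 60])   # hypothetical split

    print(mann_whitney_u(field_one, field_two))
    # scipy's built-in version; note that it reports U for the first sample,
    # which may be the larger of the two values.
    print(stats.mannwhitneyu(field_one, field_two))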
Example 4.6 In a field survey to compare the numbers of ground beetles in two fields,
one of which had an arable crop, the other being permanent pasture, sev-
eral pitfall traps were laid down and collected after a week.
The numbers of beetles caught in the traps in the two fields were as follows.
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there is no difference in the median numbers of
beetles between the two fields.
Observation: 2 4 5 8 9 12 12 15 17 19 21 25 44 44 60
Rank: 1 2 3 4 5 6.5 6.5 8 9 10 11 12 13.5 13.5 15
Using SPSS
As for the two‐sample t test, you must first put all the data into the same col-
umn because each measurement is on a different trap. To distinguish between
the two fields, you must create a second, subscript, column with one of two
values, here 1 and 2. Now you can carry out the test. Simply click on the
Analyze menu, move onto the Nonparametric tests bar, Legacy Dialogs and
click on the Two Independent Samples bar. SPSS will come up with the Two
Independent Samples Tests dialogue box. Put the variable you want to test
(here beetles) into the Test Variable box, making sure the Mann–Whitney
U test type is ticked. Put the subscript column (here field) into the Group-
ing Variable box. Define the groups by putting in the values of the subscript
column (here 1 and 2). The completed boxes and data are shown below.
Finally click on Continue and then OK to run the test. SPSS will come up
with the following tables.
Ranks
Test statistics(b)

                                  Beetles
Mann–Whitney U                    8.500
Wilcoxon W                        36.500
Z                                 −2.261
Asymp. Sig. (2-tailed)            0.024
Exact Sig. [2*(1-tailed Sig.)]    0.021(a)

a Not corrected for ties.
b Grouping variable: field.
The Mann–Whitney U value, here 8.5, is given at the top of the last box.
NB SPSS now has a new analysis available for this test, through the Independent Samples bar. I don't recommend this new method, however, because, although it carries out the test, it doesn't actually present you with any statistics!
Using MINITAB
Unlike SPSS, in MINITAB you have to put the data side by side into two separate columns, called something like field one and field two. Next, click
on the Stat menu, move onto the Nonparametrics bar, and click on the
Mann–Whitney bar. MINITAB will produce the Mann–Whitney dialogue
box. Put the columns to be compared into the First Sample and Second
Sample box as shown. The completed data and dialogue box are shown
below.
Finally, click on OK to run the test. MINITAB will come up with the follow-
ing results.

Mann–Whitney Test and CI: field one, field two
Using a calculator
You must compare your value of U with the critical value of the U statistic
for sample sizes of n1 and n2. This is given in Table S5 at the end of the
book.
Looking up in the U distribution for n1 = 7 and n2 = 8 gives a critical
value of U for the 5% level of 10.
Figure 4.6 Box and whisker plot showing the numbers of beetles caught in traps
in the two fields.
4.13 Self-assessment problems
Problem 4.1
The scores (in percent) of 25 students in a statistics test were as follows:
58 65 62 73 70 42 56 53 59 56 60 64 63
78 90 31 65 58 59 21 49 51 58 62 56
Calculate the mean, standard deviation and standard error of the mean for these scores.
The mean mark of students in finals exams is supposed to be 58%. Perform a one‐sample
t test to determine whether these students did significantly differ from expected.
Problem 4.2
The masses (in grams) of 16 randomly chosen tomatoes grown in a commercial glass-
house were as follows.
32 56 43 48 39 61 29 45
53 38 42 47 52 44 36 41
Other growers have found that the mean mass of this sort of tomato is 50 g. Perform a
one‐sample t test to determine whether the mean mass of tomatoes from this glasshouse
is different from the expected mass. Give the 95% confidence intervals for the mean mass.
Problem 4.3
Students were tested on their ability to predict how moving bodies behave, both before
and after attending a course on Newtonian physics. Their marks are tabulated here. Did
attending the course have a significant effect on their test scores, and if so by how much?
Before After
Martha 45 42
Denise 56 50
Betty 32 19
Amanda 76 78
Eunice 65 63
Ivy 52 43
Pamela 60 62
Ethel 87 90
Letitia 49 38
Patricia 59 53
Problem 4.4
The pH of cactus cells was measured at dawn and at dusk using microprobes. The fol-
lowing results were obtained.
Dawn 5.3 5.6 5.2 7.1 4.2 4.9 5.4 5.7 6.3 5.5 5.7 5.6
Dusk 6.7 6.4 7.3 6.2 5.2 5.9 6.2 6.5 7.6 6.4 6.5
(a) Using a statistical package, carry out a two‐sample t test to determine if there is any
significant difference in pH between the cells at these times.
(b) The cactus was identifiable and two sets of measurements were carried out on it. So
why can’t you analyse this experiment using the paired t test?
Problem 4.5
An experiment was carried out to investigate the effect of mechanical support on the
yield of wheat plants. The masses of seed (in grams) produced by 20 control plants and
20 plants whose stems had been supported throughout their life were as follows:
Control 9.6 10.8 7.6 12.0 14.1 9.5 10.1 11.4 9.1 8.8
9.2 10.3 10.8 8.3 12.6 11.1 10.4 9.4 11.9 8.6
Supported 10.3 13.2 9.9 10.3 8.1 12.1 7.9 12.4 10.8 9.7
9.1 8.8 10.7 8.5 7.2 9.7 10.1 11.6 9.9 11.0
Using a statistical package, carry out a two‐sample t test to determine whether the sup-
port has a significant effect on yield.
Problem 4.6
A population of 30 adult deer, which exhibit marked sexual dimorphism, were weighed
at the start and at the end of summer, to investigate if they ‘fatten up’ to last over the
subsequent winter. The following results (in kg) were obtained.
(a) Why is it not possible to transform this data to make it normally distributed?
(b) Carry out a Wilcoxon matched pairs test to determine if the animals had significantly
different weight at the end compared with the start of summer.
Problem 4.7
In a behavioural experiment, scientists compared the amount of time that a macaque
spent pacing back and forth (a sign of distress) when in a traditional cage compared with
when it was in an ‘environmentally enriched’ cage. The animal was observed over 4 days
for periods of 15 minutes every 2 hours, being moved at the end of each day to the other
cage. The following results (in minutes spent pacing) were obtained.
Carry out a Mann–Whitney U test to see if the animal behaved differently in the two cages.
Problem 4.8
A new drug to improve wound healing was tested on students in Manchester. Tiny
experimental lesions were made on their arms. Half were given the drug, while the
other half received a placebo. Six weeks later the extent of scarring was assessed on
an arbitrary scale ranging from 0 (no scar tissue) to 5 (heavy scarring). The following
results were obtained.
Placebo: 1, 3, 2, 4, 3, 3, 2, 3, 3, 2, 3, 1, 0, 4, 3, 2, 3, 4, 3, 2
Drug: 1, 2, 2, 3, 0, 2, 0, 1, 2, 1, 2, 1, 4, 3, 0, 2, 2, 1, 0, 1
(a) Which test should you use to determine whether the drug had any effect on scarring?
(b) Carry out the test.
5 Testing for differences between more than two groups: ANOVA and its non-parametric equivalents

5.1 Introduction
We saw in the last chapter how you can use t tests and their non‐parametric
equivalents to compare one set of measurements with an expected value, or
two sets of measurements with each other. However, there are many occasions
in biology when we might want to save time and effort by comparing three or
more groups.
This chapter describes how you can use a set of tests called analysis of variance
(ANOVA) to help determine whether there are differences and if so between
which of the groups. Non‐parametric equivalents for some of the tests will also
be described. First, however, we must see why it is that you cannot use t tests for
comparing multiple groups.
There is also the problem of convenience. As the number of groups you are comparing goes up, the number of t tests you must carry out rises rapidly, from 3 tests when comparing 3 groups to 45 tests for 10 groups (in general, n(n − 1)/2 tests for n groups). It would take a lot of time to do all these tests and it would be nearly impossible to present the results of them all!

Number of groups    3   4   5    6    7    8    9    10
Number of t tests   3   6   10   15   21   28   36   45

5.2 One-way ANOVA
5.2.1 Purpose
To test whether the means of two or more sets of unrelated measurements are
different from each other. For instance it is used to test whether one or more
groups of experimentally treated organisms are different from controls, or two
or more species are different from one another.
5.2.2 Validity
Like t tests, one‐way ANOVA is only valid if the data within each group is nor-
mally distributed. However, ANOVA also assumes that the variances of the
groups are equal. Fortunately, this is not a strict condition, and research has
shown that the variances of the groups can differ by factors of over four with-
out it affecting the results of the test (Field, 2000).
Figure 5.1 The rationale behind ANOVA: hypothetical weights for two samples of
fish. (a) Calculate the overall mean and the group means. (b) The total variability is
the sum of the squares of the distances of each point from the overall mean; this can
be broken down into between‐group variability and within‐group variability. (c) The
between‐group variability is the sum of the squares of the distances from each point’s
group mean to the overall mean. (d) The within‐group variability is the sum of the
squares of the distances from each point to its group mean.
Suppose the weights of just two small samples of fish are compared (Figure 5.1a). The overall variability is the sum of the squares of the distances from each point to the overall mean (Figure 5.1b); here it is 3² + 2² + 1² + 3² + 2² + 1² = 28. But this can be split into two parts. First, there is the between-group variability, which is due to the differences between the group means. This is the sum of the squares of the distances of each point's group mean from the overall mean (Figure 5.1c);
Figure 5.2 Two contrasting situations. (a) Most of the variability is caused by the
group means being far apart. (b) Most of the variability is caused by differences within
the groups.
here it is (6 × 2²) = 24. Second, there is the within-group variability, which is due to the scatter within each group. This is the sum of the squares of the distances from each point to its group mean (Figure 5.1d); here it is (4 × 1²) + (2 × 0²) = 4.
ANOVA compares the between‐group variability and the within‐group
variability. To show how this helps, let’s look at two contrasting situations. In
Figure 5.2a the two means are far apart and there is little scatter within each
group; the between‐group variability will clearly be much larger than the
within‐group variability. In Figure 5.2b the means are close together and there
is much scatter within each group; the between‐group variability will be lower
than the within‐group variability.
The test statistic in ANOVA tests is the F statistic, a measure of the ratio of
between‐group to within‐group variability. Calculating F is quite a long‐winded
process, however, and involves producing a table like the one shown below and
in Example 5.1.
variance: A measure of the variability of data; the square of their standard deviation.

mean square: The variance due to a particular factor in an analysis of variance (ANOVA).

degrees of freedom (DF): A concept used in parametric statistics, based on the amount of information you have when you examine samples. The number of degrees of freedom is generally the total number of observations you make minus the number of parameters you estimate from the samples.

(a) The first stage is to calculate the variabilities due to each factor to produce the so-called sums of squares (SS).
(b) You cannot directly compare sums of squares, because they are the result of adding up different numbers of points. The next stage is therefore to calculate the actual variance or mean squares (MS) due to each factor. This is calculated by dividing each sum of squares by the correct number of degrees of freedom.
    (i) If there are n groups, the between-group degrees of freedom DFB = n − 1.
    (ii) If there are N items in total and r items in each group, there will be r − 1 degrees of freedom in each group, hence n(r − 1) in total. The within-group degrees of freedom DFW = N − n.
    (iii) If there are N items in total, the total number of degrees of freedom DFT = N − 1.
(c) The last stage is to calculate the test statistic F. This is the ratio of the between-group mean square MSB to the within-group mean square MSW:

F = MSB/MSW   (5.1)

The larger the value of F, the more likely it is that the means are significantly different.
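The whole calculation can be reproduced in a few lines of Python (scipy assumed), using the fish data of Example 5.1 below; scipy's f_oneway gives the same F ratio directly.

    import numpy as np
    from scipy import stats

    g1 = np.array([6.0, 7.0, 8.0])      # masses of group 1
    g2 = np.array([10.0, 11.0, 12.0])   # masses of group 2

    grand = np.concatenate([g1, g2]).mean()                              # overall mean = 9
    ss_between = sum(len(g) * (g.mean() - grand)**2 for g in (g1, g2))   # 24
    ss_within = sum(((g - g.mean())**2).sum() for g in (g1, g2))         # 4
    df_between, df_within = 2 - 1, 6 - 2
    print((ss_between / df_between) / (ss_within / df_within))           # F = 24.0
    print(stats.f_oneway(g1, g2))                                        # F = 24.0, P = 0.008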
You must be able to recognise all of these terms, as different names may be used in scientific papers or different computer packages. Then you can cope with any statistics book or any lecturer!
Example 5.1 The masses of two groups of fish are as follows.

Mass of group 1: 6, 7, 8
Mass of group 2: 10, 11, 12

Are the masses of the two groups of fish significantly different from each other?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the groups have the same mean. In this case the
hypothesis is that the two groups of fish have the same mean weight.
Click on Continue to get back to the original dialogue box and finally click
on OK to start the test. SPSS will print the following results.
Oneway
The first table gives the descriptive statistics for the two groups. The second
table gives the completed ANOVA table which shows that F = 24.00.
Using MINITAB
As in SPSS I recommend putting all the data items into a single column and
then creating a second subscript column with different integer values for
each group (here 1 and 2). [An alternative is to put your groups into differ-
ent columns and run the One‐Way (Unstacked) test.] Next, click on the
Stat menu, move onto the ANOVA bar, and click on One‐Way. MINITAB
will produce the One‐Way Analysis of Variance dialogue box. Put the vari-
able you want to test (here fish weight) into the Response box and the sub-
script column (here sample) into the Factor box to give the data set and
completed box shown below.
The first table gives the completed ANOVA table, which shows that F = 24.00. The second gives the descriptive statistics for the two groups.

Here 0.008 < 0.05, so we can reject the null hypothesis and say that the fish samples have different mean weights. In fact sample two is significantly heavier.
5.3 Deciding which groups are different – post hoc tests
post hoc tests: Statistical tests carried out if an analysis of variance is significant; they are used to determine which groups are different from each other.

A significant ANOVA does not tell you which of the groups are different from each other; this is no problem with only two groups, but it will be a problem if you have three or more groups. Fortunately, statisticians have worked out several different post hoc tests that you can use to see which groups are different from each other, but only if the ANOVA is itself significant. They basically all make allowances for the problems caused by the fact that you are running several comparisons, but they do it in different ways. SPSS and MINITAB both allow you to perform several of these post hoc tests, and different ones can be used depending on what you want to test.
• If you want to compare each group with all the others, the tests most used
by biologists are the Tukey test and the Scheffe test.
• If you want to compare experimental groups with a control, then the test to
use is the Dunnett test.
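If you are scripting in Python, the statsmodels library offers a Tukey test, and recent versions of scipy offer a Dunnett test (stats.dunnett); a minimal sketch with hypothetical colony diameters, assuming statsmodels is installed:

    import numpy as np
    from statsmodels.stats.multicomp import pairwise_tukeyhsd

    # Hypothetical colony diameters for a control and three antibiotic groups
    diam = np.array([5.1, 5.0, 5.2, 4.9,    # control
                     4.1, 4.2, 4.0, 4.3,    # antibiotic A
                     5.0, 5.1, 4.9, 5.2,    # antibiotic B
                     5.1, 4.9, 5.0, 5.2])   # antibiotic C
    group = np.repeat(['control', 'A', 'B', 'C'], 4)

    # Tukey test comparing every group with every other
    print(pairwise_tukeyhsd(diam, group, alpha=0.05))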
Let's have a look at a typical example, to see how to perform a one-way ANOVA and a relevant post hoc test.
Example 5.2 The effect of three different antibiotics on the growth of a bacterium was
examined by adding them to Petri dishes, which were then inoculated with
the bacteria. The diameter of the colonies (in millimetres) was then meas-
ured after three days. A control where no antibiotics were added was also
included. The following results were obtained.
Carry out a one‐way ANOVA test to determine whether any of the antibiotic
treatments affected the growth of the bacteria and if so, which ones.
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there was no difference in the mean diameters
of the four groups of bacteria.
Using SPSS
Repeat the ANOVA test but click on Post Hoc to reveal the Post Hoc Multi-
ple Comparisons dialogue box. Tick the Dunnett box. The control is given
by the subscript 1, so you need to change the Control Category from Last
to First. The completed dialogue box is shown below.
Finally, click on Continue and OK in the main dialogue box. As well as the
results for the ANOVA SPSS will produce the following table.
The important column is the Sig. column. This tells you that of the three treatment groups, 2, 3 and 4, only 2 has a significantly different mean (Sig. = 0.000, which is less than 0.05) from the control. At 4.15 mm its mean diameter is almost 1 mm less than that of the control (5.06 mm). In groups 3 and 4 the significance probabilities (0.978 and 0.987) are well over 0.05, so they are not different from the control.
Using MINITAB
Repeat the ANOVA test but click on Comparisons to reveal the One‐Way
Multiple Comparisons dialogue box. Tick the Dunnett box. The control
is given by the subscript 1, so you need to enter 1 into the Control Group
Level box. The completed dialogue box is shown below.
Finally, click on OK, then OK in the main dialogue box. As well as the re-
sults for the ANOVA MINITAB will produce (amongst other information)
the following table.
This tells you that of the three groups only group 2 has a mean significantly
different from that of the control group 1.
5.4 Presenting the results of one‐way ANOVAs
If you are comparing more than two groups, it is best to present your results in
the form of a bar chart. For instance the results of Example 5.2 can be presented
as in Figure 5.3a or b.
Asterisks should be used, as shown in Figure 5.3a, if you want to emphasise whether individual groups are different from a control, for instance if you have carried out a Dunnett post hoc test on your data. Here only antibiotic A is significantly different from the control.
Figure 5.3 Bar chart showing the means with standard error bars of the diameters of bacterial colonies subjected to different antibiotic treatments. (a) Numbers in brackets give sample size. Asterisks denote the degree of significance of differences of diameter compared with controls: * P < 0.05; ** P < 0.01; *** P < 0.001. (b) Numbers in brackets give sample size. Letters denote significant differences between groups. Groups denoted by the same letter are not significantly different from each other.
The letter notation (Figure 5.3b) is preferable if you want to show which groups are different from each other, and if you have performed a Tukey or Scheffe test to do just that. In this example colonies given antibiotic A were significantly different from all the others. Once again, you should refer to the figure in the text of your results section.

5.5 Repeated measures ANOVA
5.5.1 Purpose
To test whether the means of two or more sets of related measurements are
different from each other. For instance it is used to test whether one group of
experimentally treated organisms are different at several times before and after
a treatment or if two or more sets of measurements taken at known time points
are different from each other.
5.5.2 Rationale
Repeated measures ANOVA acts in the same way as one‐way ANOVA, but it
improves the chances of detecting differences between groups by removing the
within-group variability, just as a paired t test does.
5.5.3 Validity
Like t tests, repeated measures ANOVA is only valid if the data within each
group is normally distributed. It also assumes that the variances of the groups
are equal, though this condition is not strict.
Example 5.3 In an experiment to investigate the time course of the effect of exercise
on the rate of sweating in soldiers in the desert, the following results were
obtained.
Soldier 1 2 3 4 5 6 7 8 9 10
Rate before (litres/hour) 3.6 3.9 4.2 4.0 3.8 3.5 4.2 4.0 3.9 3.8
During 4.5 4.4 4.8 4.3 4.6 4.5 5.0 4.6 4.1 4.6
1 hour after 3.9 4.4 3.7 3.9 3.5 4.2 4.0 4.1 3.6 4.6
Carry out a repeated measures ANOVA to find out whether the rate of sweat-
ing altered during exercise and afterwards compared with before.
Now click on the Define tab to get into the main Repeated Measures dialogue box. Next, you must tell the computer which are the three Within-Subject Variables (time). To do this, click on each of the three columns (before, during and after) in turn and click on the top arrow. The data is now entered.
To get other useful things done you should also click on the Options tab. This brings up the Repeated Measures: Options dialogue box. Click on Descriptives to get the means and standard deviations. Unfortunately there is no Dunnett test to compare groups with a control, but you can carry out tests that compare each group with all the others. Other authors, who know much more about this than myself (see Field, 2000), suggest that the Bonferroni is the most reliable post hoc test to use for repeated measures ANOVA. To perform it, click on time within the Estimated Marginal Means box and move it with the arrow into the Display means for: box. Now you can tick Compare Main Effects and change the Confidence interval adjustment box from LSD (none) to Bonferroni. The data and completed boxes are shown below.
Finally click on the Continue tab, and when the main dialogue box appears again click on OK to run the test. SPSS comes up with a huge mass of results; only the ones that are important for us are shown below.
The first thing you must check is whether the data passes Mauchly's sphericity test in the second table. If the data shows significant non-sphericity, Sig. < 0.05. Here, fortunately, Sig. = 0.216, so we can go on to examine the F ratio for Sphericity Assumed, which is shown in the Tests of Within-Subjects Effects box. Here F = 16.662.
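Although MINITAB cannot run this test (see below), it can be scripted in Python with the statsmodels library (an assumption; pandas is also needed). A sketch using the soldier data above, which should reproduce the sphericity-assumed F ratio:

    import pandas as pd
    from statsmodels.stats.anova import AnovaRM

    before = [3.6, 3.9, 4.2, 4.0, 3.8, 3.5, 4.2, 4.0, 3.9, 3.8]
    during = [4.5, 4.4, 4.8, 4.3, 4.6, 4.5, 5.0, 4.6, 4.1, 4.6]
    after  = [3.9, 4.4, 3.7, 3.9, 3.5, 4.2, 4.0, 4.1, 3.6, 4.6]

    # One row per soldier per time point (long format)
    rows = [(s, t, r)
            for t, rates in [('before', before), ('during', during), ('after', after)]
            for s, r in enumerate(rates)]
    df = pd.DataFrame(rows, columns=['soldier', 'time', 'rate'])

    # One within-subject factor (time)
    print(AnovaRM(df, depvar='rate', subject='soldier', within=['time']).fit())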
Using MINITAB
There is no way of performing a Repeated measures ANOVA in MINITAB.
Here Sig. = 0.000 < 0.05, so we can reject the null hypothesis and say that there are significant differences between the mean rates of sweating at the three times.
Figure 5.4 Mean sweating rates of soldiers before, during and after exercise. For
all groups n = 9. Letters denote significant differences between groups. Groups
denoted by the same letter are not significantly different from each other.
The results for the experiments on the sweating rates of soldiers are shown
in Figure 5.4. A repeated measures ANOVA showed that sweating rates were
significantly different at the three times (F2,18 = 16.662, P < 0.001). Bonferroni
post hoc tests showed that rates were significantly higher only during exercise.

5.6 The Kruskal–Wallis test
5.6.1 Purpose
To test whether the medians of two or more sets of unrelated measurements
are different from each other. It is therefore the non‐parametric version of
one‐way ANOVA. For instance it is used to test whether one or more groups of
experimentally treated organisms are different from controls, or two or more
species are different from one another.
5.6.2 Rationale
The Kruskal–Wallis test starts in the same way as the Mann–Whitney U test,
by assigning each observation its rank within all the measurements. If there
are tied ranks, each is assigned the average value. The sum of the ranks in
each sample, R, is then calculated. Finally the test statistic, K, is calculated using
the formula

K = [12/(N(N + 1))] × Σ(R²/n) − 3(N + 1)

where n is the size of each sample and N is the total number of observations. The
more different the medians of the samples are, the larger will be the sum of R²/n,
so the bigger K will be. In this test, the null hypothesis is rejected if K is greater
than or equal to a critical value.
Using the Explore command, SPSS tells us that the median scores were 11,
10.5 and 11.5, but are these significantly different?
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there is no difference in the median scores of
the three groups.
n      10        10       10
R      156.5     146      162.5
R²     24492.25  21316    26406.25
R²/n   2449.2    2131.6   2640.6

K = [(2449.2 + 2131.6 + 2640.6) × 12/(30 × 31)] − (3 × 31)
K = 93.18 − 93 = 0.18
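For comparison, here is a minimal Python sketch of the same calculation, assuming numpy and scipy are available. SPSS's value of 0.181, shown later, differs very slightly because it applies a correction for tied ranks.

```python
import numpy as np

# Rank sums and sample sizes from the table above
R = np.array([156.5, 146.0, 162.5])
n = np.array([10, 10, 10])
N = n.sum()

# K = [12 / (N(N + 1))] * sum(R^2 / n) - 3(N + 1)
K = 12 / (N * (N + 1)) * np.sum(R**2 / n) - 3 * (N + 1)
print(K)   # 0.18, as in the hand calculation

# Given the raw scores as three arrays, scipy computes the tie-corrected
# statistic (which it calls H) directly:
# from scipy.stats import kruskal
# H, p = kruskal(scores1, scores2, scores3)
```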
Using SPSS
As for one‐way ANOVA, you must first put all the data (called, say, testscore)
into the same column because each measurement is on a different school-
child. To distinguish between the three packages, you must create a second,
subscript, column (called, say, CALtype) with one of three values, here 1, 2
and 3. Simply click on the Analyze menu, move onto the Nonparametric
tests bar, onto Legacy Dialogs and click onto the k Independent Samples bar.
SPSS will come up with the Tests for Several Independent Samples dialogue
box. Put the variable you want to test into the Test Variable box, making
sure the Kruskal–Wallis test type is ticked. Put the subscript column into the
Grouping Variable box. Define the range by clicking on the Define Range tab
to reveal the dialogue box and putting in the minimum and maximum values
of the subscript column (here 1 and 3) in the Range for grouping variable
boxes. The completed dialogue boxes and data are shown below.
Finally, click on Continue and OK to run the test. SPSS will come up with
the following results.
Ranks
CALtype N Mean rank
testscore 1.00 10 15.65
2.00 10 14.60
3.00 10 16.25
Total 30
Test statistics a,b
Test Score
Chi-square 0.181
df 2
Asymp. sig. 0.913
a. Kruskal–Wallis test.
b. Grouping variable: CALtype
SPSS calls K chi‐square and gives the value as 0.181.
NB SPSS now has a new analysis available for this test, through the
Independent Samples bar. I don't recommend this new method,
however, as although it carries out the test it doesn't actually present you
with any statistics!
Using MINITAB
As for one‐way ANOVA, you must first put all the data (called, say, test
score) into the same column because each measurement is on a different
schoolchild. To distinguish between the three packages, you must create a
second, subscript, column (called, say, CAL type) with one of three values,
here 1, 2 and 3. Next, click on the Stat menu, move onto the Nonparametrics
bar, and click on Kruskal–Wallis. MINITAB will produce the Kruskal–Wallis
dialogue box. Put the variable you want to test (here test score)
into the Response box and the subscript column (here CAL type) into the
Factor box to give the data set and completed box shown below.
Using a calculator
You must compare your value of K with the critical value of the χ² statistic
for (G − 1) degrees of freedom, where G is the number of groups. This is
given in Table S3 at the end of the book.
Looking up in the chi‐square distribution for (3 − 1) = 2 degrees of free-
dom gives a critical value of chi‐square = 5.99.
• If χ² is greater than or equal to the critical value, you must reject the
null hypothesis. You can say that the medians of the samples are signifi-
cantly different from each other.
• If χ² is less than the critical value, you cannot reject the null hypothesis.
You can say that there is no significant difference between the medians
of the samples.
• If Asymp. Sig. (two‐tailed) or P ≤ 0.05 you must reject the null hypoth-
esis. Therefore you can say that the medians of the samples are signifi-
cantly different from each other.
• If Asymp. Sig. (two‐tailed) or P > 0.05, you have no evidence to reject
the null hypothesis. Therefore you can say that there is no significant
difference between the medians of the samples.
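If you would rather let the computer do the table lookup, both the critical value and the significance probability can be obtained from the χ² distribution; a minimal sketch using scipy:

```python
from scipy import stats

# Critical value of chi-squared for (G - 1) = 2 degrees of freedom at P = 0.05
print(stats.chi2.ppf(0.95, df=2))    # 5.99

# Or convert K directly into a significance probability
K = 0.18
print(stats.chi2.sf(K, df=2))        # ~0.91, matching SPSS's Asymp. Sig.
```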
[Figure 5.5: box and whisker plot of test scores for the three CAL packages.]
Figure 5.5 Box and whisker plot showing the medians, quartiles and range of
the test scores of children who had taken different CAL packages. For all samples
n = 10.
you want to compare, but use the Dunn–Sidak correction, so that the signifi-
cance probability for each test, rather than being 0.05, is given by the equation
P = 1 − 0.95^(1/k), where k is the number of tests.
The median marks of the children who had been given the different CAL
packages are shown in Figure 5.5. A Kruskal–Wallis test showed that there
was no significant difference between the median scores.
5.7 The Friedman test

5.7.1 Purpose
To test whether the medians of two or more sets of related measurements are
different from each other. For instance it could be used to test whether one
group of experimentally treated organisms is different at several times after a
treatment.

5.7.2 Rationale
Take the case of an investigation looking at whether there are differences in a
measurement between four time points, in 10 experimental animals. The meas-
urements within each of the b blocks are first given ranks. In this case, as there
are 10 animals measured there will be 10 blocks. The ranks are then summed
for each of the a groups. As the animals are measured four times there will be
four groups. The sums for each group are given the term Ri. The test statistic, χ²,
is then calculated using the following formula:

χ² = [12/(ba(a + 1))] × ΣRi² − 3b(a + 1)    (5.3)

Note that the bigger the differences in the medians of the groups, the larger
the value of ΣRi² will be, and so the larger the value of χ².
It looks as if there were differences, with the birds avoiding chilli more
than the other two types of pellet, but were these differences significant?
χ² = [12/(ba(a + 1))] × ΣRi² − 3b(a + 1)

where b = 10, since birds were tested in 10 trials, and a = 3, since there were
three types of pellet being compared.
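The raw counts are not reproduced here, but the rank sums implied by the SPSS output below (mean ranks × 10 trials) let us sketch the calculation in Python; SPSS's slightly larger value of 1.514 reflects a correction for tied ranks.

```python
import numpy as np

a, b = 3, 10                           # three pellet types, ten trials
R = b * np.array([2.10, 2.20, 1.70])   # rank sums for control, quinine, chilli

# Equation 5.3
chi2 = 12 / (b * a * (a + 1)) * np.sum(R**2) - 3 * b * (a + 1)
print(chi2)   # 1.4 (before the tie correction)

# With the raw counts as three equal-length arrays you could instead use:
# from scipy.stats import friedmanchisquare
# chi2, p = friedmanchisquare(control, quinine, chilli)
```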
Using SPSS
As in repeated measures ANOVA, you must first put all the data into three
separate columns, with the results of each trial on the same row. Now click
on the Analyze menu, move onto the Nonparametric tests bar, onto Leg-
acy Dialogs and click onto the k Related Samples bar. SPSS will come up
with the Tests for Several Related Samples dialogue box. Put the three
variables you want to test into the Test Variable box, making sure the
Friedman test type is ticked. To examine the medians and quartiles click
on the Statistics tab and in the Statistics dialogue box tick Quartiles. The
completed dialogue boxes and data are shown below.
Finally, click on Continue and OK to run the test. SPSS will come up
with the following results.
Descriptive statistics
Percentiles
N 25th 50th (Median) 75th
Control 10 2.0000 4.5000 6.5000
Quinine 10 2.7500 4.0000 5.0000
Chili 10 2.0000 4.0000 5.0000
Friedman test
Ranks
Mean rank
Control 2.10
Quinine 2.20
Chili 1.70
Test statistics a
N 10
Chi-square 1.514
df 2
Asymp. Sig. 0.469
a. Friedman test
Using MINITAB
Putting the data into MINITAB for the Friedman test is rather illogical. First,
put all the data into a single column (here called Number eaten) as for the
Kruskal–Wallis test. To distinguish between the three treatments, you must
create a second, subscript, column (called, say, treatment) with one of three
values, here 1, 2 and 3. Finally you also have to create a third column with
subscripts to identify which trial it was. These are given the numbers 1 to
10. Next, click on the Stat menu, move onto the Nonparametrics bar, and
click on Friedman. MINITAB will produce the Friedman dialogue box. Put
the variable you want to test (here Number eaten) into the Response box
and the first subscript column (here treatment) into the Treatment box.
Finally, put the third column (here trial) into the Blocks box, to give the
data set and completed box shown below.
Once again MINITAB doesn’t actually give a value for K, but never mind.
Using a calculator
You must compare your value of χ² with the critical value of the Friedman
χ² statistic for your values of a and b. This is given in Table S6 at the end of
the book.
• If χ² is greater than or equal to the critical value, you must reject the
null hypothesis. You can say that the medians of the samples are signifi-
cantly different from each other.
• If χ² is less than the critical value, you cannot reject the null hypothesis.
You can say that there is no significant difference between the medians
of the samples.
• If Asymp. Sig. (two‐tailed) or P ≤ 0.05 you must reject the null hypoth-
esis. Therefore you can say that the medians of the samples are signifi-
cantly different from each other.
• If Asymp. Sig. (two‐tailed) or P > 0.05, you have no evidence to reject the
null hypothesis. Therefore you can say that there is no significant differ-
ence between the medians of the samples.
[Figure 5.6: box and whisker plot of the numbers of pellets eaten for each flavour.]
Figure 5.6 Box and whisker plot showing the medians, quartiles and range of the
numbers of different flavoured pellets eaten by birds. For all samples n = 10.
To show which groups (if any) are different from each other, it is best to use
the letter notation as used for the Tukey test (Figure 5.3b).
Once again, you should refer to the figure in the text of your results section.

5.8 Two‐way ANOVA

5.8.1 Purpose
To analyse experiments or trials in which you can look at the effect of two fac-
tors at once, for instance:
• You might want to examine the effect on corn yield of adding different
amounts of nitrate and phosphate.
• You might want to examine the effect on yield of adding different amounts
of nitrate to more than one wheat variety.
5.8.2 Rationale
Two‐way ANOVA acts in the same way as one‐way ANOVA, but with two factors
it tests three questions.
5.8.3 Validity
Like other forms of ANOVA, two‐way ANOVA is only valid if the data within
each group is normally distributed. It also assumes that the variances of the
groups are equal, though this condition is not strict.
Example 5.6 In a field trial to look at the effects of fertilisers, wheat was grown at two dif-
ferent levels of nitrogen and at two different levels of phosphorus. To allow
analysis, all possible combinations of nitrogen and phosphorus levels were
grown (so there were 2 × 2 = 4 combinations in total). The yields (t ha−1)
from the experiment are tabulated below.
No nitrate or phosphate 1.4 1.8 2.1 2.4 1.7 1.9 1.5 2.0 2.1
Mean = 1.88, s = 0.32, SE = 0.105
Added nitrate only 2.4 2.7 3.1 2.9 2.8 3.0 2.6 3.1 2.6
Mean = 2.80, s = 0.24, SE = 0.082
Added phosphate only 3.5 3.2 3.7 2.8 4.0 3.2 3.9 3.6 3.1
Mean = 3.44, s = 0.40, SE = 0.132
Added nitrate and phosphate 7.5 6.4 8.1 6.3 7.2 6.8 6.4 6.7 6.5
Mean = 6.88, s = 0.61, SE = 0.203
Solution
Step 1: Formulating the null hypotheses
There are three null hypotheses:
1. That nitrate addition has no effect on yield.
2. That phosphate addition has no effect on yield.
3. That there is no interaction between the effects of nitrate and phosphate.
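Before the SPSS instructions, here is a minimal Python sketch of the same analysis using the data above. It assumes pandas and statsmodels are installed, and the column names are my own; it should reproduce the three F ratios discussed below.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Yields (t/ha) from Example 5.6; factors coded 0 = not added, 1 = added
groups = [
    (0, 0, [1.4, 1.8, 2.1, 2.4, 1.7, 1.9, 1.5, 2.0, 2.1]),
    (1, 0, [2.4, 2.7, 3.1, 2.9, 2.8, 3.0, 2.6, 3.1, 2.6]),
    (0, 1, [3.5, 3.2, 3.7, 2.8, 4.0, 3.2, 3.9, 3.6, 3.1]),
    (1, 1, [7.5, 6.4, 8.1, 6.3, 7.2, 6.8, 6.4, 6.7, 6.5]),
]
rows = [{"nitrate": n, "phosphate": p, "yield_t": y}
        for n, p, ys in groups for y in ys]
df = pd.DataFrame(rows)

# Two-way ANOVA with interaction: three F ratios, one per null hypothesis
model = smf.ols("yield_t ~ C(nitrate) * C(phosphate)", data=df).fit()
print(anova_lm(model))
```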
Finally, click on the Continue tab to get back to the main dialogue box and
click on OK. SPSS will produce lots of results, the most important of which
are shown below.
Just like the one‐way ANOVA we have already looked at, two‐way ANOVA
partitions the variability and variance. However, there will be not two pos-
sible causes of variability but four: the effect of nitrate; the effect of phos-
phate; the interaction between the effects of nitrate and phosphate (shown
here as nitrate * phosphate); and finally, variation within the groups (here
called Error).
These possibilities are here used to produce three F ratios, which test the
null hypotheses.
Using MINITAB
In MINITAB enter all the measurements of yield into a single column and
call it, say, yield. Next create two more columns: a second column (called,
say, nitrate) with subscripts 0 and 1 for no nitrate and added nitrate, re-
spectively; and a third column (called, say, phosphate) with subscripts 0
and 1 for no phosphate and added phosphate, respectively.
MINITAB gives the three F ratios, to test the effect of nitrate, phosphate and
the interaction between them in the second last column. You can also find
out the means and standard deviations of the different groups using the
Basic Statistics bar.
1. Here Sig. = P = 0.000 < 0.05 so we can reject the null hypothesis and say
that nitrate has a significant effect on yield. In fact, looking at the descrip-
tive statistics we can see that adding nitrate increases yield by 0.89 t ha−1.
2. Here Sig. = P = 0.000 < 0.05 so we can reject the null hypothesis and say
that phosphate has a significant effect on yield. In fact, looking at the
descriptive statistics we can see that adding phosphate also increases yield,
by 1.55 t ha−1.
3. Here Sig. = P = 0.000 < 0.05 so we can reject the null hypothesis and say
that nitrate and phosphate have a significant interaction. What does this
mean? Well, looking at the descriptive statistics we can see that the yield
with both nitrate and phosphate is very large. Nitrate and phosphate have
more effect when added together (they increase yield by 4.97 t ha−1) than
would be expected from just adding their effects when applied singly
(0.89 + 1.55 = 2.44 t ha−1). In this case they potentiate each other's effects.
(Though you would also get a significant interaction if they had inhibited
each other's effects.)
The results are shown in Figure 5.7. Both nitrate (F1,32 = 385.1) and phos-
phate (F1,32 = 228.5) increased yield significantly, and there was also a sig-
nificant interaction (F1,32 = 79.2); they potentiated each other's effects.
[Figure 5.7: bar chart of yield (t ha−1) against phosphate treatment (no phosphate, phosphate), with separate bars for no nitrate and nitrate.]
Figure 5.7 The yields of wheat grown in a factorial experiment with or without
nitrate and phosphate. Bars show means ± standard error. For all samples n = 9.
5.9 The Scheirer–Ray–Hare test

5.9.1 Purpose
The Scheirer–Ray–Hare test is the non‐parametric version of two‐way ANOVA,
for use when you have ranked or non‐normally distributed data. It should be
used with caution, however.
5.9.2 Rationale
This test is essentially a two‐way extension of the Kruskal–Wallis test, and like
two‐way ANOVA it tests three questions.
Example 5.7 In the field trial to look at the effects of fertilisers, the numbers of snails in
the different plots were also counted and the results shown below were obtained.
No nitrate or phosphate 0 1 3 2 1 5 4 8 2
Median = 2
Added nitrate only 3 9 5 12 4 9 16 6 1
Median = 6
Added phosphate only 2 7 1 2 0 2 0 6 5
Median = 2
Added nitrate and phosphate 6 3 12 7 2 17 10 4 5
Median = 6
It looks as though areas with higher nitrate had larger numbers of snails, but
is this difference significant?
Solution
Step 1: Formulating the null hypotheses
There are three null hypotheses:
1. That nitrate addition had no effect on snail number.
2. That phosphate addition had no effect on snail number.
3. That there was no interaction between the actions of nitrate and
phosphate.
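Here is a minimal Python sketch of the whole procedure, assuming pandas, scipy and statsmodels are installed (the column names are my own): rank all the counts together, run an ordinary two-way ANOVA on the ranks, and divide each factor's sum of squares by the total mean square.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats
from statsmodels.stats.anova import anova_lm

# Snail counts from Example 5.7; factors coded 0 = not added, 1 = added
groups = [
    (0, 0, [0, 1, 3, 2, 1, 5, 4, 8, 2]),
    (1, 0, [3, 9, 5, 12, 4, 9, 16, 6, 1]),
    (0, 1, [2, 7, 1, 2, 0, 2, 0, 6, 5]),
    (1, 1, [6, 3, 12, 7, 2, 17, 10, 4, 5]),
]
rows = [{"nitrate": n, "phosphate": p, "number": x}
        for n, p, xs in groups for x in xs]
df = pd.DataFrame(rows)

# Rank all 36 counts together (ties get average ranks)
df["rank"] = stats.rankdata(df["number"])
table = anova_lm(smf.ols("rank ~ C(nitrate) * C(phosphate)", data=df).fit())

# Chi-squared for each factor = factor SS / total MS (total MS ~ 109.94 here)
total_ms = table["sum_sq"].sum() / table["df"].sum()
for factor in ["C(nitrate)", "C(phosphate)", "C(nitrate):C(phosphate)"]:
    chi2 = table.loc[factor, "sum_sq"] / total_ms
    print(factor, chi2, stats.chi2.sf(chi2, df=1))
```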
come up with the Sort Cases dialogue box. Put number into the Sort by box
and click on OK. Now you can easily create a new rank column. In many cases
this will just involve typing in an ascending series of numbers, but here there are
many cases with equal rank, so it is a bit more tricky (see part of the completed
column below). Next, carry out a conventional two‐way ANOVA putting rank
into the Dependent Variable box and nitrate and phosphate into the fixed
factor(s) box. The data and completed main dialogue box are shown below.
Next, click on OK and SPSS will produce the ANOVA table below.
a. R Squared = 0.292 (Adjusted R Squared = 0.226).
Unfortunately, this is not the end of the matter. You have to calculate the χ²
test statistics yourself from this table for each factor, where χ² = factor sum
of squares/total mean square. Here the total mean square = corrected total
sum of squares/corrected total df = 3848/35 = 109.94. Therefore
Using MINITAB
In MINITAB enter all the measurements of yield into a single column and
call it, say, number. Next create two more columns: a second column
(called, say, nitrate) with subscripts 0 and 1 for no nitrate and added ni-
trate, respectively; and a third column (called, say, phosphate) with sub-
scripts 0 and 1 for no phosphate and added phosphate, respectively. Next,
rank the data. To do this go into the Data menu and click on Rank. This
brings up the Rank dialogue box. Put number into the Rank data in box
and type rank into the Store ranks in box. The completed data and dia-
logue boxes are shown below.
Click on OK and MINITAB will produce the ranked data in the new column
rank. Now carry out a conventional two‐way ANOVA. Click on the Stat
menu, move onto the ANOVA bar, and click on Two‐Way. MINITAB will
produce the Two‐Way Analysis of Variance dialogue box. Put the vari-
able you want to test (here rank) into the Response box and the subscript
columns (here nitrate and phosphate) into the Row factor and Column
factor boxes. Click on OK to run the tests. MINITAB comes up with the fol-
lowing results.
Unfortunately, this is not the end of the matter. You have to calculate the
χ² test statistics yourself from this table for each factor, where χ² = factor
sum of squares/total mean square. Here the total mean square = total SS/
total DF = 3848/35 = 109.94. Therefore
• If χ² is greater than or equal to the critical value, you must reject the
null hypothesis. You can say that the medians of the samples are signifi-
cantly different from each other.
• If χ² is less than the critical value, you cannot reject the null hypothesis.
You can say that there is no significant difference between the medians
of the samples.
[Figure 5.8: box and whisker plot of snail numbers for the different nitrate and phosphate treatments.]
Figure 5.8 Box and whisker plot showing the medians, quartiles and range of the
numbers of snails given the different nitrate and phosphate treatments. For all
samples n = 9.
The median number of snails in the areas given the different treatments
is shown in Figure 5.8. A Scheirer–Ray–Hare test showed that
there were significantly more snails in areas given high nitrate (χ²1 = 10.21,
P < 0.01), but that neither phosphate nor the interaction between the two
factors had a significant effect.
5.10 Nested ANOVA

5.10.1 Purpose
The purpose of a nested ANOVA is to analyse experiments or trials in which
you are basically looking at the effect of a single factor, but within each repli-
cate you are taking several measurements. It therefore seems at first glance that
there are two factors, the main factor and the replicates, but the second factor
is ‘nested’ within the first. For instance:
• You might want to examine the effect of nitrogen fertilisation on the area of
the individual leaves of trees. Here the main factor is nitrogen fertilisation,
the replicates are trees, and you are measuring several leaves from each tree.
• You might want to examine the effect of disease on the permeability of indi-
vidual kidney cells. Here the main factor is disease, the replicates are people
and you are measuring several cells from each person.
In these situations, it is tempting just to calculate average values for each repli-
cate (e.g. average leaf area for each tree, or average cell permeability for each
person) and analyse the results using a t test or one‐way ANOVA. However, though
this works, you would be missing out information because you would be greatly
reducing your sample size, and you would not be able to gain any information
about whether the replicates were different from each other. Instead you should
carry out a nested ANOVA.
5.10.2 Rationale
Nested ANOVA acts in the same way as one‐way ANOVA, but with one factor
nested within the other it tests two questions.
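The fish-lice data of the example below are not reproduced here, so this sketch uses made-up numbers purely to show the mechanics (assuming pandas, numpy and statsmodels). The key point is that the F ratio for the main factor uses the mean square of the nested replicates, not the residual, as its denominator.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Hypothetical louse lengths (mm): two water types, four fish per type,
# five lice measured on each fish
rng = np.random.default_rng(1)
rows = []
for water, base in [("fresh", 1.5), ("salt", 1.75)]:
    for fish in range(1, 5):
        for length in rng.normal(base + 0.05 * fish, 0.1, 5):
            rows.append({"water": water, "fish": fish, "length": length})
df = pd.DataFrame(rows)

# Fish nested within water type: C(water) + C(water):C(fish)
table = anova_lm(smf.ols("length ~ C(water) + C(water):C(fish)", data=df).fit())

# F for water type = MS(water) / MS(fish within water)
ms = table["sum_sq"] / table["df"]
print(ms["C(water)"] / ms["C(water):C(fish)"])
# Note: anova_lm's own F column tests everything against the residual
# mean square, which is only appropriate for the nested fish term
```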
5.10.3 Validity
Like other forms of ANOVA, nested ANOVA is only valid if the data within each
group is normally distributed. It also assumes that the variances of the groups
are equal, though this condition is not strict.
Note that each fish had different numbers of lice, and there were ac-
tually two more lice on fish in freshwater than salt water. This is not a
problem! It looks as if the lice in salt water fish are longer but is this differ-
ence significant?
Solution
Step 1: Formulating the null hypotheses
There are two null hypotheses:
To get SPSS to carry out the correct nested analysis click on Paste. This brings
up the Syntax window shown below.
You must change the last line of this to /DESIGN = fish(water) water. This
shows that you are looking at two factors: the effect of water type; and the
effect of the fish, which are nested within each water type. To run the test,
click on the green triangle below Add‐ons that points to the right. SPSS
produces results, the most important bits of which are shown below.
Descriptive statistics
Dependent variable:length
The descriptive statistics show that the mean lengths of lice in fresh and salt
water are 1.49 and 1.74 mm, respectively. The main table produces two F
ratios that are used to test the two null hypotheses.
Using MINITAB
In MINITAB enter all the measurements of length into a single column
and call it, say, length. Next create two more columns: a second column
(called, say, water) with subscripts 1 and 2 for fresh and salt water, respec-
tively; and a third column (called, say, fish) with subscripts 1 to 8 for the
different fish.
Next, click on the Stat menu, move onto the ANOVA bar, and click on
General Linear Model. MINITAB will produce the General Linear Model
dialogue box. Put the variable you want to test (here length) into the
Response box and the two factors (here water and fish) into the Model
box. To show that the fish are nested within the water follow fish by water
in brackets. The data set and completed box is shown below.
You can also find out the means and standard deviations of the different
groups using the Basic Statistics bar.
The main table produces two F ratios that are used to test the two null
hypotheses.
1. Here Sig. = P = 0.000 < 0.05 so we can reject the null hypothesis
and say that water type has a significant effect on the length of lice. In
fact, looking at the descriptive statistics we can see that lice in salt water
fish are longer.
2. Here Sig. = P = 0.000 < 0.05 so we can reject the null hypothesis and say
that fish have a significant effect on the length of lice; some fish have
longer lice than others.
The mean lengths of the fish lice on the different fish are shown in Fig-
ure 5.9. A nested ANOVA showed that lice in sea water were significantly
longer than those in freshwater (F1 = 44.07, P < 0.005) and lice growing on
different fish showed significant differences in length (F6 = 7.30, P < 0.005).
Many people also recommend showing the completed ANOVA table, though
I am not so keen!
[Figure 5.9: bar chart of mean louse length (mm) for fish in fresh water and sea water.]
Figure 5.9 Mean (± standard error) lengths of the lice found on fish in fresh water
and sea water. For both water types the number of fish = 4.
5.11 Self‐assessment problems
Problem 5.1
The levels of calcium‐binding protein activity were followed in isolated plant protoplasts
following delivery of a heat shock stimulus. Measurements were taken on six samples of
protoplasts just before and 1, 2, 4 and 8 hours after the stimulus was applied. The follow-
ing results were obtained.
Investigate the way in which protein activity changes during the time course of the experi-
ment. Carry out a one‐way ANOVA and appropriate post hoc tests to determine if any of
the apparent changes are significant.
Problem 5.2
Interpret the following ANOVA table. How many groups were being compared? What
was the total number of observations? And was there a significant difference between
the groups?
Problem 5.3
In an experiment to investigate the uptake of aluminium by snails, 20 snails were placed
in each of eight tanks of water, each of which had an initial aluminium concentration of
20 mM. The water in each tank was sampled at weekly intervals for five weeks after the
start of the experiment and the concentration of aluminium measured. The following re-
sults were obtained.
Tank 1 2 3 4 5 6 7 8
Week 1 16.5 14.3 14.6 15.5 13.1 15.2 14.5 13.9
Week 2 12.1 11.2 12.5 10.9 10.5 11.6 13.2 10.5
Week 3 10.9 8.6 10.2 8.7 8.9 9.3 11.0 9.5
Week 4 10.5 7.8 9.6 7.6 6.8 8.0 9.1 8.5
Week 5 10.2 7.4 8.6 7.9 5.7 7.6 8.4 8.2
Carry out a repeated measures ANOVA to test whether aluminium level changes signifi-
cantly through time. Carry out a post hoc test to determine whether levels continue to fall
throughout the period.
Problem 5.4
An experiment was carried out to test the effectiveness of three different antibiotics on
the germination and growth of bacteria. Bacteria were smeared onto 40 petri dishes:
10 dishes were left as controls, while 10 had antibiotic A, 10 antibiotic B and 10 antibiotic
C added. After three days the numbers of bacterial colonies were counted on each dish.
The following results were obtained.
Control 0, 6, 9, 1, 2, 8, 3, 5, 2, 0
Antibiotic A 0, 2, 1, 3, 0, 0, 1, 0, 0, 2
Antibiotic B 0, 5, 2, 1, 0, 2, 7, 0, 2, 5
Antibiotic C 6, 1, 5, 2, 0, 1, 0, 7, 0, 0
Carry out a Kruskal–Wallis test to see if the antibiotics had any significant effect on the
numbers of bacterial colonies.
Problem 5.5
An experiment was carried out to test the effectiveness over time of an antidepressant
drug. Ten patients were asked to assess their mood on a 1 (depressed) to 5 (ecstatic)
scale before, one day, one week and one month after taking the drug. The following re-
sults were obtained.
Patient 1 2 3 4 5 6 7 8 9 10
Before 2 3 2 4 2 1 3 2 1 2
1 day after 4 5 3 4 4 4 3 3 3 3
1 week after 3 3 4 4 3 3 4 3 4 3
1 month after 3 2 3 4 2 2 2 1 2 2
Carry out a Friedman test to determine if the drug had any significant effect on the pa-
tients’ moods. What pattern emerges of the action of the drug over time and how would
you test for it?
Problem 5.6
In a field trial, two different varieties of wheat, Widgeon and Hereward, were grown at
three different levels of nitrogen. The following results were obtained.
(a) Which of the three possible effects, variety, nitrogen and interaction, are significant?
(b) Examine the descriptive statistics to work out what these results mean in real terms.
6 Investigating relationships
6.1 Introduction
We saw in Chapter 4 that we can use a paired t test or the Wilcoxon matched
pairs test to determine whether two sets of paired measurements are differ-
ent. For instance, we can test whether students have a different heart rate after
drinking coffee compared with before. But we may instead want to know if
and how the two sets of measurements are related. Do the students who have
a higher heart rate before drinking coffee also have a higher heart rate after-
wards? Or we might ask other questions. How are the lengths of snakes related
to their age? How are the wing areas of birds related to their weight? Or how are
the blood pressures of stroke patients related to their heart rate?
This chapter has three sections. First, it shows how to examine data to see
whether variables are related. Second, it shows how you can use statistical tests
to work out whether, despite the inevitable variability, there is a real linear rela-
tionship between the variables, and if so how to determine what it is. Finally,
it describes some of the non‐linear ways in which biological variables can be
related and shows how data can be transformed to make a linear relationship,
the equation of which can be determined statistically.
scatter plot  A point graph between two variables which allows one to visually determine whether they are associated.
independent variable  A variable in a regression which affects another variable but is not itself affected.
dependent variable  A variable in a regression which is affected by another variable.

The first thing you should do if you feel that two variables might be related is
to draw a scatter plot of one against the other. This will allow you to see at a
glance what is going on. For instance, it is clear from Figure 6.1 that as the age
of eggs increases, their mass decreases. But it is important to make sure you plot
the graph the correct way round. This depends on how the variables affect each
other. One of the variables is called the independent variable; the other vari-
able is called the dependent variable. The independent variable affects, or may
affect, the dependent variable but is not itself affected. Plot the independent
variable along the horizontal axis, often called the x‐axis. Plot the dependent
variable along the vertical axis, often called the y‐axis. You would then say
you were plotting the dependent variable against the independent variable. In
Figure 6.1, age is the independent variable and mass is the dependent variable.
This is because age can affect an egg's mass, but mass can't affect an egg's age.

Figure 6.1 The relationship between the age of eggs and their mass. Note that the
dependent variable, mass, is plotted along the vertical axis.
Things are not always so clear‐cut. It is virtually impossible to tell whether
blood pressure would affect heart rate or vice versa. They are probably both
affected by a third variable, artery stiffness. In this case, it does not matter
so much; however, the one you wish to predict from the relationship (if any)
should go on the y axis.
Once you have plotted your graph, you should examine it for associations.
There are several main ways in which variables can be related:
• There may be no relationship: points are scattered all over the graph paper
(Figure 6.2a).
• There may be a positive association (Figure 6.2b): the dependent variable
increases as the independent variable increases.
• There may be a negative association (Figure 6.2c): the dependent variable
decreases as the independent variable increases.
• There may be a more complex relationship: Figure 6.2d shows a relationship
in which the dependent variable rises and falls as the independent variable
increases.
Figure 6.2 Ways in which variables can be related. (a) No association; (b) positive
association; (c) negative association; (d) a complex curvilinear association.

6.4 Linear relationships

There are an infinite number of ways in which two variables can be related, most
of which are rather complex. Perhaps the simplest relationships to describe are
linear ones such as that shown in Figure 6.3. In these cases, the dependent vari-
able y is related to the independent variable x by the general equation

y = a + bx    (6.1)
where b is the slope of the line and a is the constant or intercept. The intercept
is the value of y where the line crosses the y‐axis. Note that this equation is EXACTLY
THE SAME as the equation

y = mx + c    (6.2)

which is the form in which many students encounter it at school.

slope  The gradient of a straight line.
intercept  The point where a straight line crosses the y-axis.
Linear relationships are important because they are by far the easiest to ana-
lyse statistically. When biologists test whether two variables are related, they are
usually testing whether they are linearly related. Fortunately, linear relationships
between variables are surprisingly common in biology. Many other common
relationships can also be converted into straight lines by transforming the data
(see Section 6.8).
Figure 6.3 A straight line relationship. The straight line y = a + bx has y-intercept a
and slope b.
The points on your plots will never exactly follow a straight line, or indeed
any exact mathematical function, because of the variability that is inherent
in biology. There will always be some scatter away from a line. The difficulty
in determining whether two measurements are really related is that when you
were taking a sample you might have chosen points which followed a straight
line even if there were no relationship between the measurement in the popu-
lation. If there appears only to be a slight association and if there are only a few
points, this is quite likely to happen (Figure 6.4a). In contrast it is very unlikely
that you would choose large numbers of points all along a straight line just by
chance if there was no real relationship (Figure 6.4b). Therefore you have to
carry out statistical tests to work out the probability that you could get your
apparent relationship by chance. If there is an association, you may also be able
to work out what the linear relationship is. There are two main tests for associa-
tion: correlation and regression.
6.6 Correlation
6.6.1 Purpose
To test whether two sets of paired measurements, neither of which is clearly
independent of the other, are linearly associated.
6.6.2 Rationale
Stage 1
The first step is to calculate the means of the two sets of measurements, x̄ and ȳ.

Stage 2
The next step is to calculate, for each point, the product of its x and y distances
from the mean, (x − x̄)(y − ȳ). Note that if both x and y are greater than the mean,
this figure will be positive because both (x − x̄) and (y − ȳ) will be positive. It will
also be positive if both x and y are smaller than the mean, because both (x − x̄)
and (y − ȳ) will be negative and their product will be positive. However, if one is
larger than the mean and the other smaller, the product will be negative.
These products are added together to give

Sum = Σ(x − x̄)(y − ȳ)
Figure 6.5 Correlation. (a) Positive correlation: Σ(x − x̄)(y − ȳ) is large and positive;
(b) negative correlation: Σ(x − x̄)(y − ȳ) is large and negative; (c) no correlation:
Σ(x − x̄)(y − ȳ) is small.
• If there is positive association (Figure 6.5a), with points all either above and
to the right or below and to the left of the overall mean, the sum will be large
and positive.
• If there is negative association (Figure 6.5b), with points all either above and
to the left or below and to the right of the overall mean, the sum will be large
and negative.
• If there is no association (Figure 6.5c), points will be on all sides of the overall
mean, and the positive and negative numbers will cancel each other out. The
sum will therefore be small.
Stage 3
The final stage is to scale the sum obtained in stage 2 by dividing it by the prod-
uct of the variation within each of the measurements. The correlation coeffi-
cient r is therefore given by the formula

r = Σ(x − x̄)(y − ȳ) / [Σ(x − x̄)² Σ(y − ȳ)²]^(1/2)    (6.3)
In practice, r is more easily calculated using the equivalent formula

r = [nΣxy − ΣxΣy] / √[(nΣx² − (Σx)²)(nΣy² − (Σy)²)]    (6.4)
6.6.3 Validity
Both sets of data must be normally distributed.
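A minimal Python sketch of the calculation, using equation 6.4 and checking it against scipy; the six paired values here are made up purely for illustration.

```python
import numpy as np
from scipy import stats

x = np.array([62.0, 70.0, 65.0, 80.0, 73.0, 60.0])     # hypothetical data
y = np.array([155.0, 172.0, 160.0, 195.0, 180.0, 150.0])

n = len(x)
# Equation 6.4
r = (n * np.sum(x * y) - x.sum() * y.sum()) / np.sqrt(
    (n * np.sum(x**2) - x.sum()**2) * (n * np.sum(y**2) - y.sum()**2))

# scipy gives the same r plus its significance probability
r_check, p = stats.pearsonr(x, y)
print(r, r_check, p)
```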
Patient  Heart rate  Blood pressure (mm Hg)
21  60  158
22  67  160
23  63  167
24  90  221
25  50  149
26  73  180
27  64  168
28  68  162
29  65  168
30  70  157
Solution
Plotting relationship data in SPSS
As well as carrying out statistical tests, SPSS can first be used to graphically
examine the data. Simply put the data into two columns and name them,
here heart and pressure. Now go into the Graphs menu and click on Chart
Builder. In the Chart Builder dialogue click on Scatter/Dot from within
the Choose from menu and drag the top left simple scatter plot into the big
Chart Preview box. Finally move heart into the Y Axis box and pressure
into the X Axis box (actually it doesn’t matter which way round in this
case). The completed box with the data screen is shown below.
Finally click on OK to get SPSS to draw the graph. It will produce a graph
like the one shown below.
Using SPSS
Click on the Analyze menu, move onto the Correlate bar and click on
Bivariate. SPSS will produce the Bivariate Correlations dialogue box. Move
heart and pressure into the Variables box and make sure that Pearson is
ticked. The completed box is shown below.
Finally click on OK. SPSS will come up with the following results:
Correlations
Using MINITAB
Click on the Stat menu, move onto the Basic Statistics bar and click on
Correlation. MINITAB will produce the Correlation dialogue box. Move
heart and pressure into the Variables box. The completed box is
shown below.
Finally click on OK. MINITAB will come up with the following results:
• If | r | is greater than or equal to the critical value, you must reject the null
hypothesis. You can say that the two variables show significant correlation.
• If | r | is less than the critical value, you cannot reject the null hypothesis.
There is no evidence of a linear association between the two variables.
• If Sig. (two‐tailed) or P‐value ≤ 0.05 you must reject the null hypothesis.
Therefore you can say that there is a significant association between the
variables.
• If Sig. (two‐tailed) or P‐value > 0.05 you have no evidence to reject the
null hypothesis. Therefore you can say that there is no significant asso-
ciation between the variables.
Therefore we must reject the null hypothesis. We can say that heart rate
and blood pressure are significantly correlated. In fact, as r > 0, they show a
significant positive association.
The relationship between blood pressure and heart rate of the patients
is given in Figure 6.6. Correlation analysis showed that there was a
significant positive association between heart rate and blood pressure
(r28 = 0.860, P < 0.001).
[Figure 6.6: scatter plot of heart rate against blood pressure (mm Hg).]
Figure 6.6 Graph showing the relationship between the heart rate and blood pressure
of elderly patients.
6.7 Regression
6.7.1 Purpose
To quantify the linear relationship between two sets of paired measurements,
one of which is clearly independent of the other. Good examples of independent
variables are
• Age or time
• An experimentally manipulated variable, such as temperature or humidity.
6.7.2 Rationale
regression  A statistical test which analyses how one set of measurements is (usually linearly) affected by another.

Regression analysis finds an estimate of the line of best fit y = a + bx through
the scattered points on your graph. If you measure the vertical distance of each
point from the regression line (Figure 6.7a), the line of best fit is the one which
minimises the sum of the squares of the distances.
The estimate of the slope b is actually worked out in a similar way to the cor-
relation coefficient, using the formula

b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²    (6.5)
Figure 6.7 Regression. The line of best fit minimises the variability Σsᵢ² from the line.
(a) Significant regression: Σsᵢ² is low; (b) non-significant regression: Σsᵢ² is high.
Since the line of best fit always passes through the means of x and y, x̄ and ȳ,
the estimate of the constant a can then be found by substituting them into the
equation to give

a = ȳ − bx̄    (6.7)
This is all very well, but data with very different degrees of scatter, such as
those shown in Figure 6.7, can have identical regression lines. In Figure 6.7a
there is clearly a linear relationship. However, in Figure 6.7b there may actually
be no relationship between the variables; you might have chosen a sample that
suggests there is a relationship just by chance.
standard deviation (σ)  A measure of spread of a group of measurements: the amount by which on average they differ from the mean. The estimate of σ is called s.
t tests  Statistical tests which analyse whether there are differences between measurements on a single population and an expected value, between paired measurements, or between two unpaired sets of measurements.
ANOVA  Abbreviation for analysis of variance: a widely used series of tests which can determine whether there are significant differences between groups.

In order to test whether there is really a relationship, therefore, you would
have to carry out one or more statistical tests. You could do this yourself, but
the calculations needed are a bit long and complex, so it is much better now to
use a computer package. SPSS not only calculates the regression equation but
also performs two statistical tests and gives you the information you need to
carry out a whole range of other tests:
• It works out the standard deviations of a and b and uses them to carry out
two separate t tests to determine whether they are significantly different
from zero. The data can also be used to calculate 95% confidence intervals for
a and b.
• It carries out an ANOVA test, which essentially compares the amount of varia-
tion explained by the regression line with that due to the scatter of the points
away from the regression line. This tells you whether there is a significant
slope, i.e. whether b is significantly different from zero.
• It also tells you the percentage of the total variation that the regression line
explains. This r² value is equal to the square of the correlation coefficient.
6.7.3 Validity
Both sets of data must be normally distributed, but you must also be careful to
use regression appropriately; there are many cases where it is not valid:
• Simple regression is not valid for data in which there is no independent vari-
able. For example, you should not regress heart rate against blood pressure,
because each factor could affect the other. In that case you could, in fact,
use reduced major axis regression, though debate continues to rage about
whether even this approach is really valid. Details of how to carry out this
(not too difficult!) analysis can be found in Zar (2000).
• All of your measurements must be independent. Therefore you should not use
regression to analyse repeated measures, such as the height of a single plant at
different times. If that is the case you will need to use growth analysis, which
is a subject in itself.
Example 6.2 In a survey to investigate the way in which chicken eggs lose weight after
they are laid, one egg was collected newly laid every 2 days. Each egg was
put into an incubator, and after 40 days all 20 eggs were weighed. The re-
sults are tabulated here and plotted in Figure 6.1. Carry out a regression
analysis to determine whether age significantly affects egg weight. If there
is a relationship, determine what it is.
Age (days) 2 4 6 8 10 12 14 16 18 20 22
Mass (g) 87 80 79 75 84 75 70 65 64 67 57
Age (days) 24 26 28 30 32 34 36 38 40
Mass (g) 67 53 50 41 41 53 39 36 34
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that age has no effect on egg weight. In other words,
that the slope of the regression line is zero.
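Before the package instructions, here is a minimal Python sketch of the regression using scipy (the variable names are my own):

```python
import numpy as np
from scipy import stats

# Egg ages (days) and masses (g) from Example 6.2
age = np.arange(2, 41, 2)
mass = np.array([87, 80, 79, 75, 84, 75, 70, 65, 64, 67,
                 57, 67, 53, 50, 41, 41, 53, 39, 36, 34])

res = stats.linregress(age, mass)
print(res.slope, res.intercept)   # slope ~ -1.36, intercept ~ 89.4

# t test of the slope against zero, with N - 2 degrees of freedom
t = res.slope / res.stderr
p = 2 * stats.t.sf(abs(t), df=len(age) - 2)
print(t, p)                       # t ~ -14.3, so P is well below 0.05
```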
Finally click on OK. SPSS will produce masses of output, of which the only
useful bit is the following table. This gives you an estimate (B) of the
Constant and slope (day) from equation 6.1. Here, the line of best fit is the
equation

mass = 89.4 − 1.36 × day
Using MINITAB
First put the data into two columns called, say mass and day. Next, click
on the Stat menu, move onto the Regression bar and click on Regression.
MINITAB will produce the Regression dialogue box. Put the dependent
variable (here mass) into the Response box and the independent variable
148
6.7 Regression
(here day) into the Predictors box. The completed box, and the data screen
are shown below.
Finally click on OK. MINITAB will produce masses of output, of which the
most useful bits are the following tables:
MINITAB actually gives the regression equation. The slope of the regression
equation is -1.36, which appears to be well below zero. But is this difference
significant? MINITAB has also calculated the standard error of the slope
(0.09514) and has performed a t test to determine whether the slope is sig-
nificantly different from zero. Here t = -14.31.
• If Sig. or P ≤ 0.05 you should reject the null hypothesis. Therefore you
can say that the slope is significantly different from zero.
• If Sig. or P > 0.05 you have no evidence to reject the null hypothesis.
Therefore you can say that the slope is not significantly different from zero.
Here Sig. = P = 0.000 < 0.05. Therefore we must reject the null hypothesis.
We can say that age has a significant effect on egg weight; in fact older eggs
are lighter.
This shows that the 95% confidence interval for the slope of the line is between
−1.561 and −1.161.
t = (Observed value − 0) / Standard error    (6.8)

However, it is also possible from the computer output to carry out a whole
range of one‐sample t tests to determine whether the slope or constant is dif-
ferent from any expected value. Then t is simply given by the expression

t = (Observed value − Expected value) / Standard error

just like equation 4.1, and you can carry out the t test for N − 2 degrees of free-
dom, just as the computer did to determine whether the slope or constant was
different from zero.
Example 6.3 From the egg weight data in Example 6.2, we want to determine whether the
initial egg weight was significantly different from 90 g, which is the mean
figure for the general population. In other words, we must test whether the
intercept (or Constant as SPSS calls it) is different from 90.
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the constant is equal to 90.
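A minimal continuation of the earlier Python sketch: scipy's linregress also reports the standard error of the intercept, so the one-sample t test takes only a couple of lines.

```python
import numpy as np
from scipy import stats

age = np.arange(2, 41, 2)
mass = np.array([87, 80, 79, 75, 84, 75, 70, 65, 64, 67,
                 57, 67, 53, 50, 41, 41, 53, 39, 36, 34])
res = stats.linregress(age, mass)

# t test of the intercept against the expected value of 90 g
t = (res.intercept - 90) / res.intercept_stderr
p = 2 * stats.t.sf(abs(t), df=len(age) - 2)
print(t, p)   # t ~ -0.25, P > 0.05: not significantly different from 90 g
```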
Similarly, to compare the slopes of two regression lines, t is given by

t = (Slope 1 − Slope 2) / √[(SEslope 1)² + (SEslope 2)²]    (6.10)

where there are N + M − 4 degrees of freedom, N and M being the two
sample sizes.
Analysis also showed that the mass of eggs at day 0 was not significantly different
from 90 g (t18 = −0.247, P > 0.05).
6.8 Studying common non‐linear relationships
As we have seen, not all relationships between variables are linear. There are
in fact two particularly common non‐linear ways in which measurements in
biology may be related. Fortunately, as we shall see, these relationships can be
changed into linear relationships by transforming one or both of the variables,
allowing them to be quantified using regression analysis.
y = ax^b    (6.11)
which a and b can be easily calculated. The first thing to do is to take loga-
rithms of both sides of the equation. We have y = ax^b, so

log10 y = log10 a + b log10 x

Therefore plotting log10 y against log10 x (Figure 6.8b) will produce a straight
line with slope b and intercept log10 a.
y = ae^(bx)    (6.16)
where e is the base of natural logarithms (e = 2.718). Looking at the curve pro-
duced by this sort of relationship (Figure 6.9a), it is very difficult to determine
the values of a and b, just as it was with power relationships. However, we can
again use some clever mathematical tricks to produce a straight line graph. As
before, the first thing to do is to take logarithms of both sides of the equation.
Therefore

loge y = loge(ae^(bx))

and rearranging

loge y = loge a + bx

Therefore plotting loge y against x (Figure 6.9b) will produce a straight line
with slope b and y‐intercept loge a.
Example 6.4 An investigation was carried out into the scaling of heads in worker army
ants. Body length and jaw width were measured in 20 workers of contrast-
ing size. The following results were obtained.
Length (mm) 3.2 3.6 4.2 4.3 4.6 5.0 5.2 5.3 5.5 5.5
Jaw width (mm) 0.23 0.29 0.32 0.38 0.45 0.44 0.55 0.43 0.60 0.58
Length (mm) 5.7 6.2 6.6 6.9 7.4 7.6 8.5 9.2 9.7 9.9
Jaw width (mm) 0.62 0.73 0.74 0.88 0.83 0.93 1.03 1.15 1.09 1.25
It was suggested that these ants showed allometry, the jaws of larger ants being
relatively wider than those of smaller ants. It certainly looks that way as the larg-
est ants are around three times as long as the smallest ones but have heads that
are around four to five times wider. To investigate whether there is a significant
change in proportions, the body length and jaw width must first both be log
transformed using SPSS or MINITAB (to see how to transform data see Section
3.5). This gives data which, when plotted in SPSS, gives the following graph.
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that the slope of the line is equal to 1.
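A minimal Python sketch of the test (assuming scipy): regress log10 jaw width on log10 body length, then test the slope against 1 rather than 0.

```python
import numpy as np
from scipy import stats

# Ant body lengths and jaw widths (mm) from Example 6.4
length = np.array([3.2, 3.6, 4.2, 4.3, 4.6, 5.0, 5.2, 5.3, 5.5, 5.5,
                   5.7, 6.2, 6.6, 6.9, 7.4, 7.6, 8.5, 9.2, 9.7, 9.9])
jaw = np.array([0.23, 0.29, 0.32, 0.38, 0.45, 0.44, 0.55, 0.43, 0.60, 0.58,
                0.62, 0.73, 0.74, 0.88, 0.83, 0.93, 1.03, 1.15, 1.09, 1.25])

# Slope b of the power law jaw = a * length^b
res = stats.linregress(np.log10(length), np.log10(jaw))

# t test of the slope against 1 (isometry), with N - 2 degrees of freedom
t = (res.slope - 1) / res.stderr
p = 2 * stats.t.sf(abs(t), df=len(length) - 2)
print(res.slope, t, p)   # a slope above 1 indicates positive allometry
```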
6.9 Dealing with non‐normally distributed data: rank correlation

rank  Numerical order of a data point.

If you have rank data or if your data is not normally distributed, it is not valid
to use either correlation or regression. In these cases, rank correlation can be
used to see whether there is a relationship between the ranks of the data.
6.9.1 Purpose
To test whether the ranks of two sets of paired measurements are linearly
associated.
6.9.2 Rationale
The Spearman rank correlation coefficient, r, is calculated using the formula

r = 1 − 6Σd² / (n³ − n)    (6.19)

where n is the sample number and d is the difference between ranks for each
point. Note that the higher the correlation, the smaller the differences between
the ranks, so the higher r will be.
Example 6.5 In a field survey which was investigating if there was any relationship be-
tween the density of tadpoles and their dragonfly larvae predators, 12 ponds
were sampled. The following results were found.
Dragonfly density 3, 6, 5, 1, 1, 4, 9, 8, 2, 5, 7, 11
Tadpole density 86, 46, 39, 15, 41, 52, 100, 63, 60, 30, 72, 71
Dragonfly density  Rank  Tadpole density  Rank  d     d²
3                  4     86               11    −7    49
6                  8     46               5     3     9
5                  6.5   39               3     3.5   12.25
1                  1.5   15               1     0.5   0.25
1                  1.5   41               4     −2.5  6.25
4                  5     52               6     −1    1
9                  11    100              12    −1    1
8                  10    63               8     2     4
2                  3     60               7     −4    16
5                  6.5   30               2     4.5   20.25
7                  9     72               10    −1    1
11                 12    71               9     3     9

Σd² = 129
Since r = 1 − 6Σd²/(n³ − n),
r = 1 − [(6 × 129)/(12³ − 12)]
= 1 − (774/1716) = 1 − 0.451 = 0.549
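A minimal Python sketch of the same calculation (assuming scipy); scipy's own spearmanr applies a slightly different tie treatment, so its value may differ in the later decimal places.

```python
import numpy as np
from scipy import stats

dragonfly = np.array([3, 6, 5, 1, 1, 4, 9, 8, 2, 5, 7, 11])
tadpole = np.array([86, 46, 39, 15, 41, 52, 100, 63, 60, 30, 72, 71])

# Equation 6.19, using average ranks for ties
d = stats.rankdata(dragonfly) - stats.rankdata(tadpole)
n = len(d)
r = 1 - 6 * np.sum(d**2) / (n**3 - n)
print(r)   # ~0.549

# scipy also gives a significance probability
rho, p = stats.spearmanr(dragonfly, tadpole)
print(rho, p)
```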
Using SPSS
Enter the data into two columns, named, say, dragonfly and tadpole. Next,
click on the Analyze menu, move onto the Correlate bar, then click on the
Bivariate bar. SPSS will produce the Bivariate Correlations dialogue box. Put
the columns to be compared into the Variables box and tick on the Spearman
correlation coefficient. The completed dialogue box and data are shown below.
Finally click onto OK to run the test. SPSS will come up with the following
results.
Using MINITAB
MINITAB does not calculate the Spearman correlation directly. Instead,
you must convert your data into ranks, and then perform a conventional
Pearson correlation analysis (see Section 6.6) on the ranked data. To con-
vert the two columns of data to ranks, click on the Calc menu, and click
onto the Calculator bar. This opens the Calculator dialogue box. For each
column put the expression RANK(‘Dragonfly’) and RANK(‘Tadpole’) into
the Expression box, and put C3 or C4 into the Store result in variable
box. The final completed data and box are shown below.
Click on OK to complete the ranking of the data. Then move onto the Basic
Statistics bar and click on Correlation. MINITAB will produce the Corre-
lation dialogue box. Move Rank Dragonfly and Rank Tadpole into the
Variables box. The completed box and data are shown below.
Finally click onto OK to run the test. MINITAB will come up with the fol-
lowing results.
Using a calculator
You must compare your value of r with the critical value of the Spearman
correlation coefficient for N - 2 degrees of freedom, where N is the number
of pairs of observations. This is given in Table S7 at the end of the book.
Here looking up r for (12 - 2) = 10 degrees of freedom gives a critical
value of r = 0.648.
[Figure 6.10: scatter plot of tadpole density against dragonfly larva density.]
Figure 6.10 Graph showing the relationship between the density of tadpoles and
dragonfly larvae in 12 ponds.

6.10 Self‐assessment problems

Problem 6.1
Which way round would you plot the following data?
Problem 6.2
A study of the density of stomata in vine leaves of different areas came up with the fol-
lowing results. Calculate the correlation coefficient r between these two variables and
determine whether this is a significant correlation. What can you say about the relation-
ship between leaf area and stomatal density?
Problem 6.3
In a survey to investigate why bones become more brittle in older women, the density of
bone material was measured in 24 post‐menopausal women of contrasting ages. Bone
density is given as a percentage of the average density in young women.
Age (years) 43 49 56 58 61 63 64 66 68 70 72 73
Relative bone density 108 85 92 90 84 83 73 79 80 76 69 71
Age (years) 74 74 76 78 80 83 85 87 89 92 95 98
Relative bone density 65 64 67 58 50 61 59 53 43 52 49 42
Problem 6.4
In an experiment to examine the ability of the polychaete worm Nereis diversicolor to
withstand zinc pollution, worms were grown in solutions containing different concentra-
tions of zinc and their internal zinc concentration was measured. The following results
were obtained.
log10 [Zn]water 1.96 2.27 2.46 2.65 2.86 2.92 3.01 3.24 3.37 3.49
log10 [Zn]worm 2.18 2.23 2.22 2.27 2.25 2.30 2.31 2.34 2.36 2.35
Problem 6.5
A study of the effect of seeding rate on the yield of wheat gave the following results.
Seeding rate (m-2) 50 80 100 150 200 300 400 500 600 800
Yield (tonnes) 2.5 3.9 4.7 5.3 5.6 5.9 5.4 5.2 4.6 3.2
Problem 6.6
(a) The logarithms of the wing area A of birds and their body length L are found to be
related by the straight line relationship log10 A = 0.3 + 2.36 log10 L. What is the
relationship between A and L?
(b) The natural logarithm of the numbers of cells N in a bacterial colony is related to time
T by the equation loge N = 2.3 + 0.1T. What is the relationship between N and T?
Problem 6.7
An investigation was carried out into the temperature dependence of the metabolism
of a species of coecilian (a worm‐like amphibian). A captive animal was kept at tem-
peratures ranging from 0 to 30 °C at intervals of 2 °C and its metabolic rate determined
by measuring the rate of output of carbon dioxide. The following results were obtained.
Temperature 0 2 4 6 8 10 12 14
CO2 production (ml/min) 0.35 0.43 0.45 0.55 0.60 0.78 0.82 0.99
Temperature 16 18 20 22 24 26 28 30
CO2 production (ml/min) 1.32 1.43 1.64 1.71 2.02 2.35 2.99 3.22
Transform the data by taking natural logarithms of CO2 production and use regression
analysis to examine the relationship between temperature and metabolic rate.
Problem 6.8
It was thought that the dominance of male rats might be related to the levels of testos-
terone in their blood. Therefore encounters between a total of 20 rats were observed
and the dominance order, from 1 for the top rat to 20 for the bottom, was then worked
out. Blood samples were also taken from each rat to measure its testosterone level. The
following results were obtained.
Dominance 1 2 3 4 5 6 7 8 9 10
Testosterone 7.8 6.7 7.3 6.8 6.2 8.1 7.8 6.5 6.9 7.0
Dominance 11 12 13 14 15 16 17 18 19 20
Testosterone 6.7 6.4 6.3 5.8 7.6 6.7 6.6 7.1 6.4 6.5
Carry out a Spearman rank correlation analysis to determine whether there was a signifi-
cant association between testosterone and dominance.
7 Dealing with categorical data
7.1 Introduction
At first glance it might seem easy to tell whether character frequencies are differ-
ent. When looking at a sample of sheep, if we found that eight were black and six
white, we might conclude that black ones were commoner than white. Unfortu-
nately, there might easily have been the same number of black and white sheep
in the population and we might just have picked more black ones by chance.
7.2 The problem of variation
A character state is, in fact, unlikely to appear at exactly the same frequency
in a small sample as in the whole population. Let's examine what happens
when we take samples from a population of animals, 50% of which are white and
50% black. In a sample of 2 there is only a 50% chance of getting a 1:1 ratio; the
other times both animals would be either black or white. With 4 animals there
will be a 1:1 ratio only 6 times out of 16; there will be a 3:1 or 1:3 ratio 4 times
out of 16 and a 4:0 or 0:4 ratio once every 16 times.

As the number of animals in the sample increases, the most likely frequencies
are those closer and closer to 1:1, but the frequency will hardly ever equal
1:1 exactly. In fact the probability distribution will follow an increasingly tight
binomial distribution (Figure 7.1), with mean x̄ equal to n/2, where n is the
sample size, and standard deviation s approaching √n/2. The probability that the
ratio is near 1:1 increases, and the chance of it being further away decreases.
However, there is always a finite, if increasingly tiny, chance of getting all white
animals.

binomial distribution  The pattern by which the sample frequencies in two groups tend to vary.
mean (μ)  The average of a population. The estimate of μ is called x̄.
standard deviation (σ)  A measure of spread of a group of measurements: the amount by which on average they differ from the mean. The estimate of σ is called s.
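These binomial probabilities are easy to check numerically. The short Python sketch below (my own illustration; the book itself uses SPSS and MINITAB) uses the scipy.stats.binom distribution to reproduce them:

    from scipy.stats import binom

    # Chance of an exact 1:1 ratio (1 white animal out of 2) in a sample of 2
    print(binom.pmf(1, n=2, p=0.5))    # 0.5

    # Chances of 2:2, 3:1 (or 1:3) and 4:0 (or 0:4) ratios in a sample of 4
    print(binom.pmf(2, n=4, p=0.5))    # 6/16 = 0.375
    print(binom.pmf(3, n=4, p=0.5))    # 4/16 = 0.25
    print(binom.pmf(4, n=4, p=0.5))    # 1/16 = 0.0625

    # The standard deviation of the count is sqrt(n)/2 when p = 0.5
    print(binom.std(n=100, p=0.5))     # 5.0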
Things get more complex if the expected frequencies are different from 1:1
and if there are a larger number of categories, but essentially the same pattern
will occur: as the sample size increases, the frequencies will tend to approach,
but seldom equal, the frequencies in the population. The probability of obtaining
frequencies similar to those of the population rises, but there is still a finite
probability of the frequencies being very different. As expected, in a population
where a character occurs at a frequency p, the mean frequency at which it will
turn up in a sample is also p. However, the standard deviation, s (also confusingly
called the standard error), is given by the rather complex formula

s = √[p(1 − p)/(n − 1)]      (7.1)

where n is the number in the sample.
So if the results from a sample are different from the expected frequency, you
cannot be sure this is because the population you are sampling is really differ-
ent. Even if you sampled 100 animals and all were white, the population still
might have contained a ratio of 1:1; you might just have been very unlucky.
However, the greater the difference and the larger the sample, the less likely
this becomes. To determine whether differences from an expected frequency
are likely to be real, there are several types of test you can use: the chi‐squared
(χ²) test, the G test, the Kolmogorov–Smirnov one‐sample test and Fisher's
exact test. However, by far the most commonly used is the chi‐squared test,
which we will examine here. There are two main types of χ² test:

• The χ² test for differences
• The χ² test for association

chi‐squared (χ²) test  A statistical test which determines whether there are differences between real and expected frequencies in one set of categories, or associations between two sets of categories.

7.3 The χ² test for differences

7.3.1 Purpose
To test whether character frequencies are different from expected values. It is
best used when you expect numbers in different categories to be in particular
ratios. Here are some examples:
7.3.2 Rationale
The test calculates the chi‐squared statistic (χ²); this is a measure of the differ-
ence between the observed frequencies and the expected frequencies. Basically,
the larger χ² is, the less likely it is that the results could have been obtained by
chance if the population frequency was the expected one.

The χ² statistic is given by the simple expression

χ² = Σ(O − E)²/E      (7.2)

where O is the observed frequency and E is the expected frequency for each char-
acter state. The larger the difference between the frequencies, the larger the value
of χ², and the less likely it is that the difference between observed and expected
frequencies arose just by chance. Similarly, the bigger the sample, the larger O and
E, and hence the larger the value of χ²; this is because of the squared term in the
top half of the fraction. So the bigger the sample you take, the more likely you
will be to detect any differences.
The greater the number of possible categories there are, the greater the num-
ber of degrees of freedom; this also tends to increase χ². The distribution of χ²
has been worked out for a range of degrees of freedom, and Table S3 (at the end
of the book) gives the critical values of χ² above which there is less than a 5%,
1% or 0.1% probability of getting the observed values by chance.
Example 7.1  In a breeding experiment, 100 progeny peas were examined, of
which 69 were smooth and 31 were wrinkled. Test whether the ratio of smooth
to wrinkled peas in the 100 progeny is different from the 3:1 ratio predicted by
Mendelian genetics.

Solution  Carrying out the test involves the usual four stages. The null hypothesis
is that the population ratio is 3:1, so of the 100 progeny the expected frequencies
are 75 smooth and 25 wrinkled. So we have

χ² = (69 − 75)²/75 + (31 − 25)²/25 = 0.48 + 1.44 = 1.92
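The same calculation can be checked outside SPSS and MINITAB; for instance, a two‐line Python sketch (my own illustration, not part of the book) using scipy.stats.chisquare gives the identical value:

    from scipy.stats import chisquare

    # 69 smooth and 31 wrinkled peas, against expected frequencies of 75 and 25
    result = chisquare([69, 31], f_exp=[75, 25])
    print(result.statistic)   # 1.92
    print(result.pvalue)      # about 0.17, so not significant at the 5% level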
Using SPSS
Conventionally, carrying out χ² tests for differences in SPSS involves putting
in data about each organism or item and then carrying out the test. Since
sample sizes are invariably large, this approach could be extremely time‐con-
suming. Fortunately, there is a quicker approach which involves weighting
the data you enter.
As usual the first thing is to enter your data correctly. In this case you
will have to enter data into two columns. In the first column you should
enter subscripts for each of the categories (here 1 and 2). You can label these
categories smooth and wrinkled. In the second column you should put the
numbers in each of the categories. You should call this weighting.
To weight the cases, go into Data and click on Weight Cases. This will
bring up the Weight Cases dialogue box. Click on Weight Cases by and
enter Weighting into the Frequency Variable box. The completed dialogue
box and data are shown below.
Click on OK to weight the data. Now you can carry out the test (which
you could also do if you had a column with 69 1's and 31 2's!). Click on
Analyse, go into Nonparametric Tests, move onto Legacy Dialogs and click
on Chi‐Square. This will bring up the Chi‐Square Test dialogue box. Put
the Categories column into the Test Variable List box. Finally you need to
enter the expected ratios of your two categories (here 3:1). To do this click
on Values in the Expected Values box and type in 3 (the expected value for
category 1) and click on Add to enter it. Then enter 1 and click on Add to do
the same for category 2. The completed data and dialogue box are shown at
the bottom of the previous page.
Finally click on OK to run the test. SPSS will print the following tables:
Observed and expected values are given in the first table, and the value of χ²
(here 1.920) in the second.
Using MINITAB
It is straightforward to carry out χ² tests for differences in MINITAB. Put the
frequencies into the first column and the expected proportions (here 0.75
and 0.25) into the second column. Call them something like observed
and expected. Next click on the Stat menu, move onto the Tables bar and
click on Chi‐Square Goodness‐of‐Fit Test (One Variable); enter the observed
and expected columns into the dialogue box that comes up and click on OK
to run the test.

Observed and expected values are given in the first table, and the value of χ²
(here 1.92) in the second. MINITAB will also give you some not very useful
charts. You may stop it producing these by going into the Graphs dialogue box
and unticking the two bar charts.
• If χ² is greater than or equal to the critical value, you must reject the
null hypothesis. You can say that the distribution is significantly differ-
ent from expected.
• If χ² is less than the critical value, you cannot reject the null hypothesis.
You have found no significant difference from the expected distribution.

7.4 The χ² test for association
7.4.1 Purpose
To test whether the character frequencies of two or more groups are different
from each other. In other words, to test whether character states are associated
in some way. It is used when there is no expected frequency. Here are some
examples:
7.4.2 Rationale
The test investigates whether the distribution is different from what it would
be if the character states were distributed randomly among the population.
The tricky part is determining the expected frequencies. Once these have been
determined, however, the value of x2 is found in just the same way as for the x2
test for differences, using equation 6.2.
Example 7.2 A sociological study found that out of 30 men, 18 were smokers and 12
non‐smokers, while of the 60 women surveyed, 12 were smokers and 48
were non‐smokers. Test whether the rates of smoking are significantly dif-
ferent between the sexes.
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there is no association between the character
states. Here, the null hypothesis is that there is no association between gen-
der and the likelihood that a person will smoke, so men and women are
equally likely to smoke.
Step 2: Calculating the expected frequencies
The expected frequency for each combination of character states is given by

expected frequency = (row total × column total)/grand total

where the grand total is the total number of observations (here 90). There-
fore, the expected value for male smokers is found by multiplying its row
total (30) by the column total (30) and dividing by 90, to give 10. These
expected values are then put into the contingency table, written in paren-
theses. It is now straightforward to calculate χ² using equation 7.2:

χ² = Σ(O − E)²/E
   = (18 − 10)²/10 + (12 − 20)²/20 + (12 − 20)²/20 + (48 − 40)²/40
   = 6.4 + 3.2 + 3.2 + 1.6 = 14.4
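As a cross‐check, the whole test can also be run in Python with scipy.stats.chi2_contingency (my own sketch, not part of the book). Note that correction=False is needed to match the hand calculation, because SciPy otherwise applies Yates' continuity correction to 2 × 2 tables:

    from scipy.stats import chi2_contingency

    table = [[18, 12],    # men: smokers, non-smokers
             [12, 48]]    # women: smokers, non-smokers
    chi2, p, dof, expected = chi2_contingency(table, correction=False)
    print(chi2)       # 14.4
    print(p)          # about 0.00015, so highly significant
    print(expected)   # [[10, 20], [20, 40]], matching the values above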
Using SPSS
Once again SPSS can calculate χ² if you enter each person separately in a
large data sheet, with separate columns for each characteristic. Since there
are 90 people here this involves entering 180 numbers. However, you can
also do it much quicker by weighting the data as in Example 7.1.
First, you will need to put the data into two columns, one for gender
(give men the subscript 1 and the women 2), the other for smoking (give
non‐smokers the subscript 0 and smokers 1). Doing it longhand, for the
gender column give the first 30 people the subscript 1 (meaning men), and
for 31–90 give them subscript 2 (meaning women). In the smoking column
give the first 18 men the subscript 1 (meaning smoking) and the final 12
the subscript 0 (meaning non‐smoking). Finally give the first 12 women the
subscript 1 and the final 48 women the subscript 0.
To do it more quickly, produce three columns: a gender column, a smoking
column and a weighting column. You need just four rows, one for each
combination of categories, and the weighting column should give the num-
bers in each of the categories. The completed columns are shown below.
To weight the data click on Data and then on Weight Cases. You then weight
the data by clicking on Weight Cases by and moving Weighting into the
Frequency Variable box. The completed dialogue box is shown below.
To carry out the test, click on Analyse, move onto Descriptive Statistics
and click on Crosstabs. This will bring up the Crosstabs dialogue box. Put
gender into the Row(s) box and smoking into the Column(s) box. Next, click
on the Statistics box to bring up the Crosstab: Statistics dialogue box and tick
Chi‐square. Your dialogue boxes and data screen will look like the following.
Finally click on Continue and OK to run the test. SPSS will come up with
the following useful output.
Crosstabs
The first table is the contingency table, and the statistic you require is the
Pearson chi‐square. Here Pearson chi‐square = 14.400.
Using MINITAB
It is straightforward to carry out χ² tests for associations in MINITAB. Put
the frequencies into the data sheet in the same arrangement as in the con-
tingency table. Call the first column non‐smoking and the second smok-
ing. Next click on the Stat menu, move onto the Tables bar and click on
Chi‐Square Test (Two‐Way Table in Worksheet). MINITAB will produce
the Chi‐Square Test (Table in Worksheet) dialogue box. Move the two
columns into the Columns containing the table: box. The completed data
and box are shown at the bottom of the previous page.
Finally click on OK to run the test. MINITAB will come up with the fol-
lowing output.
The first table is the contingency table with the expected values and
values of χ² for each cell. The line below gives the total χ² value,
which is 14.400.
• If χ² is greater than or equal to the critical value, you must reject the
null hypothesis. You can say that the distribution is significantly differ-
ent from expected, hence there is a significant association between the
characters.
• If χ² is less than the critical value, you cannot reject the null hypothesis.
You have found no significant difference from the expected distribution,
hence no evidence of an association between the characters.
Table 7.1 The numbers of men and women and their smoking status. The table gives
both observed and expected (in brackets) numbers

Table 7.1 shows the numbers of men and women who were smokers or
non‐smokers. A χ² test showed that there were significant differences in
the incidence of smoking between the two genders (χ₁² = 14.4, P < 0.001). Men
were more likely to smoke than women.
7.5 Validity of χ² tests

1. You must only carry out χ² tests on the raw numbers of observations that
you have made. Never use percentages. This is because the larger the number
of observations, the more likely you are to be able to detect differences or
associations with the χ² test.
2. Another point about sample size is that χ² tests are only valid if all expected
values are larger than 5. If any expected values are lower than 5, there are
two possibilities:
• You could combine data from two or more groups, but only if this makes
biological sense. For instance, different species of fly could be combined
in Problem 7.4 because flies have more in common with each other than
with the other insects studied.
• If there is no sensible reason for combining data, small groups should be
left out of the analysis.
Example 7.3 In a survey of the prevalence of a heart condition in 300 people of different
races, the following results were obtained.
Solution
The first thing to notice is that there are too few mixed race people in the
survey. The expected value for mixed race people with the condition, for
instance, is (9 × 60)/300 = 1.8, which is well below 5. There is no justifica-
tion for putting mixed race people into any other category, so they must
be ignored. Now we can carry out a χ² test for association for the other
three races.

category  A character state which cannot meaningfully be represented by a number.
7.6 Logistic regression

7.6.1 Purpose

The purpose of logistic regression is to test whether the frequencies of particu-
lar traits (usually binary traits such as diseased or fit, eaten or ignored, yes or
no) are influenced by other traits of the same organisms or cells. Logistic regres-
sion essentially does the same thing as the χ² test for association, but it can also
investigate how these traits are influenced by ranked and measurement data, as
well as categorical data. Here are some examples:

logistic regression  A statistical test which analyses how a binary outcome is affected by other numerical characteristics.
• Ecological surveys: Are the rates of predation different on animals with differ-
ent levels of protective chemicals?
• Medical surveys: Is the rate of type 2 diabetes different for people with differ-
ent body mass indices?
• Sociological surveys: Do people with different incomes have a different prob-
ability of smoking?
7.6.2 Rationale
The test investigates whether the distribution is different from what it would
be if the character states were distributed randomly among the population. It
looks at how much the distribution changes with the other trait, and performs
a t test just like that done in linear regression to see whether the trait has a sig-
nificant effect.
Example 7.4 In a study into the effectiveness of camouflage on the predation of model
caterpillars, 50 unrealistic models, 50 semi‐camouflaged and 50 completely
camouflaged models were put into a cage into which starlings were intro-
duced. After one hour, 27 unrealistic, 19 semi‐camouflaged and 17 com-
pletely camouflaged models had been eaten. The results can be described in
the following contingency table.
            Unrealistic   Semi‐camouflaged   Completely camouflaged   Total
Eaten            27              19                     17              62
Uneaten          23              31                     33              88
Total            50              50                     50             150
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that there is no association between the character
state and the second trait. Here, the null hypothesis is that the level of
camouflage has no effect on predation, so the different models are equally
likely to be eaten.
Using SPSS
The usual way (and if you are using measurements the only way) of put-
ting data into SPSS is to enter each model separately in a large data sheet,
with separate columns for the two characteristics: its realism (1, 2 or 3), and
whether or not it was eaten (0 or 1). Since there are 150 models this involves
entering 300 numbers. However, you can also do it much quicker in this
case by weighting the data as in Example 7.1. Next, click on the Analyse
menu, move onto the Regression bar and click on Binary Logistic. SPSS
will produce the Logistic Regression dialogue box. Put the dependent vari-
able (here eaten) into the Dependent box and the independent variable
(here realism) into the Covariates box. The completed box and the data
screen are shown at the bottom of the previous page.
Finally click on OK. SPSS will produce masses of output, of which the
only useful bits are the last two tables shown below.
The first table shows how well the logistic regression model predicts the
numbers of eaten and uneaten models. The second table is like the table in
a linear regression; it gives the size of the effect (B) of realism, which is just
like the slope of a regression analysis, and performs a test to see whether B is
significantly different from 0; in other words, it tests whether the model has
significantly improved the fit. Here B = −0.417 with a standard error of 0.207.
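If you prefer a scripting approach, the same logistic regression can be sketched in Python with the statsmodels package (my own illustrative code; the variable names are not the book's):

    import numpy as np
    import statsmodels.api as sm

    # Recreate the 150 models: realism coded 1-3, eaten coded 1/0
    realism = np.repeat([1, 2, 3], 50)
    eaten = np.concatenate([np.ones(27), np.zeros(23),    # unrealistic
                            np.ones(19), np.zeros(31),    # semi-camouflaged
                            np.ones(17), np.zeros(33)])   # fully camouflaged

    model = sm.Logit(eaten, sm.add_constant(realism)).fit()
    print(model.params)   # slope should be about -0.417
    print(model.bse)      # standard error about 0.207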
Using MINITAB
The usual way (and if you are using measurements the only way) of putting
data into MINITAB is to enter each model separately in a large data sheet,
with separate columns for the two characteristics: its realism (1, 2 or 3), and
whether or not it was eaten (0 or 1). Since there are 150 models this involves
entering 300 numbers. You could also do it by putting the frequencies into
the data sheet, as in the χ² test for association, but I don't recommend this as
it gets rather involved. Next, click on the Stat menu, move onto the Regression
bar and click on Binary Logistic Regression. MINITAB will produce the
Binary Logistic Regression dialogue box. Put the dependent variable (here
eaten) into the Response box and the independent variable (here realism)
into the Model box. The completed box and the data screen are shown
below.
Finally click on OK. MINITAB will produce masses of output, of which the
most useful bits are the following tables.
The second table shows the observed and the expected numbers of eaten
and uneaten models according to the model that the logistic regression has
fitted. The first table is like the table in a linear regression; it gives the size
of the effect (Coef) like the slope of a regression analysis and performs a test
to see if Coef is significantly different from 0. In other words it tests whether
the model has significantly improved the fit. Here Coef = -0.4166 with a
standard error (SE Coef) of 0.207.
Here Sig. = P = 0.044 < 0.05. Therefore we must reject the null hypoth-
esis. We can say that the realism of the models significantly affects their
probability of being eaten (the more realistic a model is, the less likely it is
to be eaten).
Table 7.2 The numbers of models eaten and left uneaten by the birds
            Unrealistic   Semi‐camouflaged   Completely camouflaged   Total
Eaten            27              19                     17              62
Uneaten          23              31                     33              88
Total            50              50                     50             150
Table 7.2 shows the numbers of the different types of models that were
eaten or left uneaten. A logistic regression showed that better camouflaged
models were eaten more rarely than less camouflaged ones (b = -0.417,
P = 0.044).
7.7 Self‐assessment problems

Problem 7.1
In an experiment to test the reactions of mice to a potential pheromone, they were run
down a T‐junction maze; the pheromone was released in one of the arms of the T. After
the first 10 trials, 3 mice had turned towards the scent and 7 had turned away. After 100
trials, 34 had turned towards the scent and 66 had turned away. Is there any evidence of
a reaction to the scent?
Problem 7.2
A cross was carried out between peas which were heterozygous in the two characters:
height (tall H or short h) and pea colour (green G or yellow g). The following offspring were
obtained.
Number
Tall plants, green peas 87
Tall plants, yellow peas 34
Short plants, green peas 28
Short plants, yellow peas 11
For unlinked genes the expected ratios of each sort of plant are 9:3:3:1. Carry out a χ²
test to determine whether there is any evidence of gene linkage between these charac-
ters.
Problem 7.3
A study of the incidence of a childhood illness in a small mining town showed that out
of a population of 165 children, 9 had developed the disease. This compares with a rate
of 3.5% in the country as a whole. Is there any evidence of a different rate in the town?
Problem 7.4
In a study of insect pollination, the numbers of insect visitors belonging to different taxo-
nomic groups were investigated at flowers of different colours. The following results were
obtained.
(a) Carry out a χ² test to determine whether there is any association between the types
of insects and the colour of the flowers they visit.
(b) Which cells have the three highest χ² values? What do these results tell you about the
preferences of different insects?
Problem 7.5
A study was carried out to determine whether there is a link between the incidence of skin
cancer and the possession of freckles. Of the 6045 people examined, 978 had freckles,
of whom 33 had developed skin cancer. Of the remaining people without freckles, 95 had
developed skin cancer. Is there any evidence that people with freckles have an increased
risk of developing skin cancer?
Problem 7.6
A field study on the distribution of two species of newt found that of 745 ponds studied,
180 contained just smooth newts, 56 just palmate newts, 236 had both species present
and the remainder had neither. Is there any association between the two species and, if
so, what is it?
8 Designing experiments
8.1 Introduction
The role of experiments and surveys in biology is to help you answer questions
about the natural world by testing hypotheses you have put forward. According
to scientific papers (a highly stylised, and even misleading form of literature)
this is amazingly simple. All you need do is to examine and take measurements
on small numbers of organisms or cells. The results you obtain can then be sub-
jected to statistical analysis to determine whether differences between groups
or relationships between variables are real or could have occurred by chance.
Nothing could appear simpler, but this is because scientific papers leave out the
vast majority of the work that scientists actually do: deciding on, designing and
setting up their experiments.
Good scientists spend a lot of time making sure their experiments will
work before they attempt them. Bad ones can waste a lot of time, effort and
money carrying out badly designed experiments or surveys. Some experi-
ments which they carry out could never work; in others the size of the sam-
ple is either too small to detect the sorts of effects which might be expected
or much larger than necessary. Still other experiments are ruined by a con-
founding variable.
Before you carry out an experiment or survey, you should therefore ask your-
self the following questions:
Most of the work a scientist does is aimed at answering these questions and
is carried out before the final experiment is performed. The key to success
in experimentation is careful forward planning and preliminary work. The
main techniques are preparation, replication, randomisation and blocking.
8.2 Preparation
To find out whether an experiment could work, you must first find out some-
thing about the system you are studying by reading the scientific literature,
by carrying out the sort of rough calculations we examined in Chapter 2 or by
making preliminary examinations yourself.
Example 8.1 Consider an experiment to test whether the lead from petrol fumes affects
the growth of roadside plants. There would be no point in carrying out an
experiment if we already knew from the literature that lead only reduces
growth at concentrations above 250 ppm, while levels measured at our
roadside were only 20 ppm.
In an experiment you are usually investigating the effect of one (or at most two)
factors on organisms, organs or cells, while keeping all the other conditions the
same. For instance, if you want to investigate the effect of a drug treatment on
rats, you should give it to one set of rats (the treated group) while treating a sec-
ond group (the control group) exactly the same except that they are not given
the drug. This should exclude potential confounding variables, which might
affect the two groups differently and mess up the experiment.

confounding variables  Variables which invalidate an experiment if they are not taken into account.

Excluding confounding variables is not as easy as it might first seem, how-
ever. For instance, if you are injecting the treated rats with the drug, you are
not only giving them the drug but also subjecting the rats to the ordeal of
the injections. Therefore your control rats should also be injected, but with
pure saline solution alone. This sort of intervention is known as a procedural
control; you could have both normal and procedural controls for this sort of
treatment.
Use of procedural controls is, of course, obligatory in human drug trials, in
which control patients are treated exactly the same as ones who are given the
test drug, except that they are given an inactive dose or placebo. Drug trials are
also double blind; the researcher does not know which patients are given the
drug, so that the possible confounding variable of their expectations is not
included.

placebo  A non‐active control treatment used in drug trials.
8.4 Replication and pseudoreplication

Everything we have done so far in this book has emphasised that you always
need to repeat observations and to include replication in your experiments.
However, this is not always as straightforward as it sounds. True replicates are
individuals that have been subjected to exactly the same treatments as
members of a different treatment, except for the one experimental treatment in
which they differ. As we have seen, this helps exclude potential confounding
variables.

replication  The use of large numbers of measurements to allow one to estimate population parameters.
replicates  The individual data points.
Clearly this is straightforward for some factors. You can easily arrange an
experiment so that the replicates are all kept at the same temperature, at the
same light levels and over the same period of time. However, it is not possible
to keep every factor the same; for instance, you cannot grow plants in exactly
the same position as each other, or in exactly the same soil, and you cannot
harvest and test them at exactly the same time. In such cases, where differ-
ences are inevitable, systematic errors between samples can be avoided by using
randomisation techniques. You should randomise the position of plants and
the order in which they are tested using random numbers from tables or com-
puters, so that none of the groups is treated consistently differently from the
others.
Failure to randomise can lead to pseudoreplication. You might, for
instance, be interested in the effect of temperature on bacterial growth, but
only have two growth rooms. You might then grow one set of plates in one
growth room set at a low temperature and the other set in the other growth
room at a higher temperature. The problem is that this introduces the con-
founding variable of the growth rooms; they might be different in other
respects than just temperature. Therefore you would be testing whether
there is a growth room effect, not a purely temperature effect. One way of
getting round this is to periodically shift your plates between the rooms,
swapping their temperature regimes. This method of controlling the con-
founding variable is not perfect, but it may be the best you can do with limited
resources.
Another example of pseudoreplication is if you take lots of measurements
on a single subject. For instance if you are looking at the effect of fertiliser on
the leaves of trees, you might grow two trees, one with high and one with low
fertiliser, and take measurements on lots of leaves. You might think that each
leaf was a replicate and use it to show that trees with high fertiliser levels grow
better than ones with low levels. However, in reality you have only one repli-
cate per treatment: the two trees! All you would be testing would be whether
the leaves of tree 1 were different from those of tree 2. To get over
this problem, you should use several trees, and analyse the results on the leaves
using nested ANOVA (Section 5.10).
Pseudoreplication is a particular problem in molecular biology. If you
want to compare ribonucleic acid (RNA) from a tumour, for instance, with
RNA from control tissue, you might take a single RNA extract from each and
process them three times. These look at the effect of the technology on the
variability and are known as technical replicates. Alternatively you could take
three separate RNA replicates from each and process them just once. These
look at the effect of biological variability and are known as biological repli-
cates. Ideally, you need to have several biological replicates as well as several
technical replicates in your experimental design, and to analyse the results
with nested ANOVA.
8.5 Randomisation and blocking

(a)              (b)
A  B  C  B       A  B  C  D
A  C  A  B       B  A  D  C
D  C  B  C       C  D  B  A
D  A  D  D       D  C  A  B

Figure 8.1 The Latin square design helps avoid unwanted bunching of treatments.
(a) A fully random design might result in all of treatment A being towards the top left.
(b) In a Latin square a single replicate of each treatment is arranged in each row and
column.
Example 8.2 An experiment was carried out to test the effect of a food supplement on the
rate of growth of sheep. Individual sheep were kept in small pens and given
identical amounts of feed, except that half the sheep were given the supplement
and half were not. Because of space limitations, however, the sheep had to be
split between three sheds, with 8 sheep in shed 1, 6 in shed 2 and 6 in shed 3. To
ensure adequate blocking and take account of the possible effect of the sheds,
the experimenters made sure that there were equal numbers of sheep that were
given the supplement and sheep without the supplement in each shed. The
sheep were kept for 6 months in these conditions before being weighed.
The following results were obtained:
Shed number 1 1 1 1 1 1 1 1 2 2
Treatment – – – – + + + + – –
Sheep mass (kg) 56 48 54 57 59 61 55 64 67 64
Shed number 2 2 2 2 3 3 3 3 3 3
Treatment – + + + – – – + + +
Sheep mass (kg) 59 65 68 65 45 53 50 57 53 56
Solution
Since there are just two treatments and you are looking for differences be-
tween sheep that were given the supplement and those that were not, it
looks at first glance like it might be best to analyse the results using a
two‐sample t test. However, because there is blocking between the sheds,
you can use shed as a second factor giving six treatments in all. The test
to use is therefore two‐way ANOVA. Carrying out such a test in SPSS gives
the following results:
It can be seen that both shed and supplement have significant effects on
sheep mass (Sig. = 0.000 and 0.008 respectively), and the descriptive
statistics show two things:

1. Sheep in shed 2 were the heaviest and those in shed 3 the lightest.
2. Sheep given the supplement were heavier than control animals.
If a two‐sample t test had been used it would have shown no significant ef-
fect of the supplement (t = 1.855, Sig. = 0.082) because the variability caused
by the sheds swamps the small difference between sheep within each shed.
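For completeness, here is how the same two‐way analysis might be sketched in Python with statsmodels (my own illustration, not the book's method; the column names are invented):

    import pandas as pd
    import statsmodels.api as sm
    from statsmodels.formula.api import ols

    sheep = pd.DataFrame({
        'shed': [1]*8 + [2]*6 + [3]*6,
        'supplement': ['-']*4 + ['+']*4 + ['-']*3 + ['+']*3 + ['-']*3 + ['+']*3,
        'mass': [56, 48, 54, 57, 59, 61, 55, 64, 67, 64,
                 59, 65, 68, 65, 45, 53, 50, 57, 53, 56],
    })

    # Two-way ANOVA with shed as a blocking factor
    model = ols('mass ~ C(shed) + C(supplement)', data=sheep).fit()
    print(sm.stats.anova_lm(model, typ=2))

Both factors should come out significant, as in the SPSS analysis above.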
8.6 Choosing the statistical test

It is at this stage that you should use the decision chart (Figure 1.1) to work out
which statistical test you are going to need to analyse your results. You should
be clear about this. It will help ensure that you have chosen a sensible, straight-
forward experimental design. It will also help you decide how many replicates
you will need in your experiment.
Here are some examples of deciding which tests to use following Figure 1.1.
Example 8.3 You are comparing the seed weights of 4 varieties of winter wheat, and you have
weighed 50 seeds of each variety. What null hypothesis should you test and
which test should you use?
Solution
If you are comparing seed weight (e.g. looking for differences) the null hy-
pothesis is that there is no significant difference between the mean weight of
each of the varieties. So which statistical test should you use?
• You have taken a measurement of weight for all 200 seeds, so you should
choose the left‐hand option.
• You are looking for differences between the varieties, so you should choose
the left‐hand option.
• You have more than two sets of measurements, so you should choose the
right‐hand option.
• You are investigating the effect of only one factor, variety, so you should
choose the left‐hand option.
• The measurements on the seeds are NOT in matched sets, so you should
choose the right‐hand option.
• You are examining a continuous measurement, weight, likely to be nor-
mally distributed and with a sample size of well over 20, so you should
choose the boldface option.
• You should use the parametric test one‐way ANOVA.
Example 8.4 You are investigating how the swimming speed of fish depends on their length,
and you have measured both for 30 fish. What is your null hypothesis and
which statistical test should you use to test it?
Solution
• You have taken measurements on speed and length, so you should choose
the left‐hand option.
• You are looking for a relationship between the two measurements, speed
and length, so you should choose the right‐hand option.
• One measurement, length, is clearly unaffected by the other, speed, so you
should choose the right‐hand option.
• Both length and speed are continuous measurements which are likely to
be normally distributed; therefore you should carry out regression analysis.
Example 8.5 You are investigating the incidence of measles in children resident in hospitals
and comparing it with the national average; you have surveyed 540 children.
Which test should you use?
Solution
If you are comparing incidence of measles between hospitals and the nation-
al average the null hypothesis must be that there is no significant difference
between them. So which statistical test should you use?
• You have counted the frequencies of children in two categories (with and
without measles), so you should choose the right‐hand option.
• You are comparing your sample with an expected outcome (the national
average), so you should choose the left‐hand option.
• You should carry out the χ² test for differences.
Example 8.6 You are investigating the relationship between the weight and social rank of
domestic hens, and you have observed 34 birds. Which test should you use?
Solution
If you are investigating the relationship between weight and social rank,
the null hypothesis is that there is no relationship between them. So which
statistical test should you use?
• You have taken measurements on weight and assigned social rank to your
birds, so you should choose the left‐hand option.
• You are looking for a relationship between the two measurements, weight
and social rank, so you should choose the right‐hand option.
• The measurements are not unaffected by each other – weight can affect
rank, and rank can also affect weight – so you should choose the left‐
hand option.
• Since social rank is by definition rank data you should choose the non‐
parametric option in normal type.
• You should carry out rank correlation.
8.7 Choosing the number of replicates: power calculations

Good experimental design is not enough on its own: you must also have
sufficient replicates in your experiments to detect the effects that you are
interested in. The key to
working out how many replicates you need in your samples is knowledge of
the system you are examining. You must have enough replicates to allow sta-
tistical analysis to tease apart the possible effects from random variability, but
not many more or else you would be wasting your time. Therefore you need
to know two things: the size of the effects you want to be able to detect; and
the variability of your system. In general, the smaller the effect you want to
detect and the greater the variability, the larger the sample sizes you will need.
For parametric tests for differences it is relatively straightforward to work out
how many replicates you need, using so‐called power calculations. Nowadays,
there are many programs (including MINITAB) that allow you to perform power
calculations, but it is important to understand how they work, and how you
can make rough power calculations yourself.
As we saw in Chapter 4, in t tests a difference becomes significant when t is
around 2; in other words, when the mean is around two standard errors away
from the expected value:

D ≈ 2 SE

where D is the difference and SE is the standard error. If the mean value of the
population from which you took your sample was two standard errors away
from the expected mean, you would therefore only be able to detect that there
was a difference exactly half the time (Figure 8.3a) (because half the time the
sample mean would be higher than the population mean and half the time
lower).

Figure 8.3 (a) An effect will be detected roughly 50% of the time if the expected value
is two standard errors away from the actual population mean. (b) To detect a significant
difference between a sample and an expected value 80% of the time, the expected
value should be around three standard errors away from the population mean.

To be able to detect that there was a difference 80% of the time (the
accepted criterion for power calculations) the population mean would have to
be approximately a further standard error away from the expected mean (Figure
8.3b). In other words

D ≈ 3 SE

But the standard error is the standard deviation divided by the square root of
the number in the sample (SE = s/√N). Therefore the smallest detectable differ-
ence, D, is given by

D ≈ 3s/√N      (8.1)

The number of replicates you need, N, to get an 80% chance of detecting the
difference D can therefore be found by rearranging the equation, so that

N ≈ 9(s/D)²      (8.2)
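Equations 8.1 and 8.2 are simple enough to script yourself; here is a minimal Python sketch (my own, for illustration only):

    import math

    def min_detectable_difference(s, n):
        # Smallest difference detectable 80% of the time (equation 8.1)
        return 3 * s / math.sqrt(n)

    def replicates_needed(s, d):
        # Replicates for an 80% chance of detecting difference d (equation 8.2)
        return 9 * (s / d) ** 2

    print(replicates_needed(0.69, 0.43))          # about 23; see Example 8.7
    print(min_detectable_difference(0.69, 124))   # about 0.186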
Before you can carry out a power calculation, therefore, you need an estimate of
the standard deviation of samples. You could do this by looking up values from
the literature or by carrying out a small pilot experiment. This will allow you to
carry out power calculations, either by hand or using MINITAB.
Example 8.7 A survey is to be carried out to determine whether workers on an oil plat-
form have higher levels of stress hormone than do the general population.
The mean and standard deviation for the general population are 2.15 (0.69)
nM. Assuming the workers show the same variation as the general popula-
tion, how many workers would need to be tested to detect
(a) a 20% (0.43 nM) difference or
(b) a 10% (0.215 nM) difference in hormone concentration?
(c) The total number of workers available is 124. What difference would be
detectable if they were all used?
Solution
Using a calculator
(a) Using equation 8.2, N ≈ 9 × (0.69/0.43)² ≈ 23.
(b) N ≈ 9 × (0.69/0.215)² ≈ 93.
(c) Using equation 8.1, D ≈ 3 × 0.69/√124 ≈ 0.186 nM.
Using MINITAB
Click on the Stat menu, move onto the Power and Sample Size bar and
click on 1‐Sample t. MINITAB will produce the Power and Sample Size for
1‐Sample t dialogue box. Put in the value for the standard deviation (here
0.69), give 0.8 as the Power value, and put in either the Sample sizes: or
the Differences: MINITAB will calculate the other. For instance for (a) the
completed dialogue box is shown below:
Finally click on OK to run the calculation. MINITAB will come up with the
following output for (a).
For (a) MINITAB gives a value for the sample size of 23.
For (b) MINITAB gives a value for the sample size of 83.
For (c) MINITAB gives a value of 0.174967.
The rough calculations give reasonable figures, though not exactly the correct ones!
8.7.2 The two‐sample t test

When comparing the means of two samples, similar logic applies: a difference
becomes significant when

D ≈ 3s/√N      (8.3)

To be able to detect that there was a difference 80% of the time (the accepted
criterion for power calculations) the population means would have to be
approximately a further standard error apart. In other words

D ≈ 4 SE

But the standard error is the standard deviation divided by the square root of the
number in the sample (SE = s/√N). Therefore the smallest detectable differ-
ence, D, is given by

D ≈ 4s/√N      (8.4)
The number of replicates you need, N, to get an 80% chance of detecting the
difference D can therefore be found by rearranging the equation, so that
N ≈ 16(s/D)²      (8.5)
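Alternatively, the calculation can be handed to a purpose‐built power routine; for instance, statsmodels provides one (again my own sketch, not the book's method):

    from statsmodels.stats.power import TTestIndPower

    # Effect size is the difference expressed in standard deviations, D/s
    n = TTestIndPower().solve_power(effect_size=20/30, alpha=0.05, power=0.8)
    print(n)   # about 36, close to the rule-of-thumb value 16*(30/20)**2 = 36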
Example 8.8 Tortoises from two different volcanic islands are to be compared. Assum-
ing the standard deviation of each population’s mass is 30 g, how many
tortoises should be sampled from each island to detect (a) a 20 g difference
and (b) a 10 g difference in mean mass?
Solution
Using a calculator
(a) N ≈ 16 × (30/20)² = 36.
(b) N ≈ 16 × (30/10)² = 144.
Using MINITAB
Do the same as for the one-sample t test but click on 2‐Sample t and fill in
the boxes in the same way.
For (a) MINITAB gives a value for the sample sizes of 37.
For (b) MINITAB gives a value for the sample sizes of 143.
8.7.3 ANOVA
The results of power calculations for ANOVA are the same when one has only
two samples. However, for more than two samples the sample sizes needed go
up because one would be carrying out several post hoc tests and the signifi-
cance probability for each has to be reduced.
Power calculations for ANOVA are best carried out in MINITAB by clicking
on the One‐Way ANOVA bar. The dialogue box is then filled in as for t tests
except you also have to give the number of samples by filling in the Number
of levels: box.
Example 8.9 The survey of tortoises in Example 8.8 is extended to end up with four
volcanic islands. Repeat the power calculation s to work out sample sizes to
detect (a) a 20 g difference and (b) a 10 g difference in mean mass.
Solution
Using MINITAB
Do the same as for the 2‐Sample t but put 4 into the Number of levels box
and fill in the boxes in the same way.
For (a) MINITAB gives a value for the sample sizes of 51.
For (b) MINITAB gives a value for the sample sizes of 198.
To get a significant result 80% of the time one would have to have a differ-
ence of a further standard error, so that

d ≈ 3√[p(1 − p)/(N − 1)]      (8.8)

Rearranging equation 8.8 gives an expression for the sample size N that would be
required to detect a proportional difference d:

N ≈ [9p(1 − p)/d²] + 1      (8.9)
Example 8.10 We want to know how many people in a town we need to test to detect
whether there is a 1% difference in the incidence of hair lice from the na-
tional figure of 4%.
Solution
Inserting values for p of 0.04 and for d of 0.01 into the equation, we get
N ≈ [9 × 0.04 × 0.96/0.01²] + 1 ≈ 3457.
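The arithmetic is easily checked; in Python, for instance (my own sketch):

    # Sample size to detect a proportional difference d from a frequency p
    p, d = 0.04, 0.01
    n = 9 * p * (1 - p) / d**2 + 1
    print(n)   # 3457.0, so around 3500 people would have to be tested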
Example 8.11 In an investigation of the relationship between the size of people’s hands
and the length of their feet, 40 students were measured. What is the mini-
mum correlation coefficient that could be detected? What sample size
would be needed to detect a correlation coefficent of 0.2?
Solution
meaning a sample size of around 97 (say 100 to be on the safe side) would
be needed.
You can also use this technique to determine the sample size needed for re-
gression analysis.
During your experiment you should look at your results as soon as possible,
preferably while you are collecting them, and try to think what they are telling
you about the natural world. Calculate the mean and standard deviations of
measurements, plot the results on a graph or look at the frequencies in differ-
ent categories. Once you can see what seems to be happening, you should write
down your ideas in your laboratory book, think about them and then tell your
supervisor or a colleague. Do not put your results into a spreadsheet and forget
about them until the write‐up.
Only after you have worked out what you think is happening should you
carry out your statistical analysis to see if the trends you identified are signifi-
cant. Usually, if a trend is not obvious to the naked eye it is unlikely to be sig-
nificant. So always use statistics as a tool; do not allow it to become the master!
8.9 Self‐assessment problems

Problem 8.1
A clinician wants to find out if there is any link between energy intake (in calories) and heart
rate in old people. She collects data on both of them from 150 volunteers.
What is her null hypothesis? And which statistical test should she choose to determine
whether energy intake and heart rate are linked?
Problem 8.2
An ecologist collects data about the numbers of individuals that belong to five species of
crow feeding in three different habitats: farmland, woodland and mountain. He wants to
determine whether different crows are distributed non‐randomly in different habitats.
What is his null hypothesis and how will he analyse his data to determine whether dif-
ferent crows are distributed non‐randomly in different habitats?
Problem 8.3
A doctor wants to find out if there is any difference in insulin levels between three races of
people: Afro‐Caribbean, Asian and Caucasian. He collects data on insulin levels from 30
people of each race.
What is the null hypothesis and which statistical test should he use to answer his
question?
Problem 8.4
A genetics student wants to find out whether two genes are linked: one for shell colour
(brown dominant, yellow recessive) and one for having a banded or plain shell (banded
dominant, plain recessive). To do this, she crosses two pure‐bred lines of brown‐banded
snails with yellow plain snails. The result is an F1 generation, all of which are brown and
banded. These are crossed to produce an F2 generation.
What is the null hypothesis and which statistical test should she perform to test wheth-
er there is in fact any linkage?
Problem 8.5
An ecologist wants to find out whether the levels of pesticide residue found in kestrels dif-
fer at different times of the year. She measures pesticide levels in ten birds, repeating the
measurements on each bird every 2 months.
What is the null hypothesis and which statistical test should she perform to test wheth-
er pesticide levels are different at different times of the year?
Problem 8.6
A new medication to lower blood pressure is being tested in field trials. Forty patients
were tested before and after taking the drug. Which test should the clinicians use to best
determine whether it is having an effect?
Problem 8.7
It has been suggested that pot plants can help the survival of patients in intensive care
by providing the room with increased levels of oxygen from photosynthesis. Carry out a
rough calculation to work out if this idea is worth testing. (Hint: estimate how fast the plant
is growing and hence laying down carbohydrates and exporting oxygen.)
Problem 8.8
In an investigation into starch metabolism in mutant potatoes, the effect of deleting a gene
is investigated. It is expected that this will reduce the level to which starch builds up. Large
numbers of previous experiments have shown that the mean level of starch in ordinary po-
tatoes is 21 M with standard deviation 7.9 M. Assuming the standard deviation in mutants
is similar to that in ordinary potatoes, how many replicates would have to be examined
to detect a significant difference in mutants whose mean starch level is 16 M, a 5 M fall?
Problem 8.9
An experiment is being designed to test the effect of shaking maize plants on their exten-
sion growth. There will be two groups: shaken plants and unshaken controls. It is known
that maize usually has a mean height of 1.78 m with standard deviation of 0.36 m. How
many replicates of experimental and control plants must be grown to detect a height dif-
ference of 0.25 m?
Problem 8.10
The national rate of breast cancer is given as 3.5% of women over 45 years old. It has
been suggested that silicone implants may increase the rate of such cancers. How many
women with implants would need to be tested to detect a doubling of the risk?
Problem 8.11
Design an experiment to test the relative effects of applying four different amounts of
nitrogen fertiliser, 0, 3.5, 7 and 14 g of nitrogen per square metre per year, in 25 fortnightly
applications (researchers get a fortnight off at Christmas) on to chalk grassland at Wardlaw,
Derbyshire. The field site is split into 16 plots, each of dimensions 1 m × 1 m, in an 8 m ×
2 m grid (see grid). You are supplied with a quantity of 20 × 10⁻³ M ammonium nitrate
solution fertiliser. Mark the different plots and describe exactly what you would apply to
each plot.
9 More complex statistical analysis
So far we have investigated how to analyse fairly simple experiments and sur-
veys: ones in which you are looking at how one (or at most two) factors affect
another, or at the relationships between two sets of measurements or factors. I
would strongly advise most working biologists to limit themselves as far as they
can to such simple experiments and to the statistical tests in the decision chart.
Quite often teachers of statistics find that inexperienced scientists (or just igno-
rant ones!), who so often come to us at the end of their projects for 'help with
their stats', have designed their experiments so poorly that they are not properly
controlled, or there is a confounding variable. This means that the results have to
be analysed using more complicated statistics that the researchers don't really
understand. The
moral is always to think carefully about how your experiments will be analysed
before carrying them out.
However, there are legitimate cases in which biologists may want or need to
do rather more complex things and use more complicated statistical analyses.
• To save time you may want to design experiments that investigate several
factors at once.
• You may not be able to control all the conditions in your experiments, so you
may need to be able to carry out statistical analysis that makes allowances for
this and mixes the two types of analysis: comparing groups and looking at
relationships.
• You may want to investigate the relationships between several sets of meas-
urements you have taken in a survey.
• You may have taken large numbers of measurements on a single group of
organisms or cells and be interested in exploring the data. In particular, you
might want to see whether the measurements allow the groups to be split up
into several subgroups, and which ones are most similar.
This chapter introduces you to the main types of complex statistical analysis
(none of which I have myself ever used!).
In Chapter 5 we saw how you can use two‐way ANOVA and nested ANOVA to
analyse complex experiments in which two factors can be investigated simulta-
neously. It is also possible to examine the effect of three or more factors using
three or more way ANOVA. Such tests can readily be performed in both SPSS
and MINITAB using the same techniques as we used for two‐way ANOVA. Of
course you need to be careful to distinguish between the main factors and
nested factors, but the principles are the same, and the data is analysed within
the General Linear Model dialogue boxes.
There are a few problems you have to remember if you want to perform such
complex analyses. First, if you want to perform a complete factorial experi-
ment, and you have three or more factors the number of treatments you will
have to set up starts to rise alarmingly. For instance, if you have three factors,
each of which is found in two states, you will have 2 × 2 × 2 = 8 treatments.
If each factor is found in three states there will be 27! Second, the analysis
will not only give you results about the main effects of the three factors but
also the interactions between them. For a three‐way ANOVA there will be three
interaction terms, one between each pair of factors, and one extra for the inter-
action of all three factors. These interaction terms become increasingly numer-
ous and difficult to understand. Finally, all these analyses assume that the data
is normally distributed. At present there are no straightforward non‐parametric
tests that can simultaneously test for several factors.
All of the tests for differences we have examined so far investigate the effect
of factors on just a single measurement or variable (such as yield or size). They
are what is known as univariate tests, so all the ANOVA tests we have exam-
ined are examples of univariate ANOVA. Usually biologists carry out this sort of
analysis even when they want to examine the effect of the treatments on several
different measurements. They simply carry out two or more ANOVAs. However,
a theoretically better way of carrying out such analyses is by performing a mul-
tivariate analysis of variance (MANOVA), which investigates whether
there are differences in a combination of measurements. This has the advantage
that it might help find significant differences between groups when the effects
on a number of individual measurements are all close to being significant. It also
allows you to investigate how the various outcome measurements are related.
This is a complex procedure, though, and it is hard to interpret what the results
of a MANOVA actually mean in physical terms. For this reason, this technique is
more often used by psychologists than biologists, so those interested are recom-
mended to look up the theory and practice in Field (2009).
9.3 Experiments in which you cannot control all the variables

Sometimes you may not be able to control all the variables within your experi-
ments. For instance you might have taken measurements of bone density in
two groups of women, one of which was put under an exercise regime, the
other of which was not. However, in both groups you might have women
of widely varying age, which also potentially affects women’s bone density.
Consequently, age might act as a confounding variable, preventing you from
finding a significant result if you simply analysed the experiment using a two‐
sample t test or one‐way ANOVA.
Fortunately, you can make use of the fact that analysis of variance and regres-
sion are actually both manifestations of a single analytical technique. They both
operate by apportioning and comparing the variability due to a possible effect
(of one or more factors in ANOVA and one or more variables in regression) with
the residual variability. The two techniques can therefore be combined (as they
are in SPSS and MINITAB) within the general linear model (GLM).

general linear model (GLM)  A series of tests which combine ANOVA and regression analysis to allow powerful analysis of complex data sets.
ANCOVA  Abbreviation for analysis of covariance: a series of tests that determines whether there are significant differences between groups, while accounting for variability in other measurements.

Analysis of covariance (ANCOVA) is the most commonly used example of
a technique within GLM; it allows you to combine the effect of a single factor
with a single variable. In the example of women's bone density, for instance,
analysing the data using ANCOVA allows you to do two things:

1. To determine whether exercise affects bone density, while allowing for the
effects of age.
2. To determine whether age affects bone density, while allowing for the effects
of the exercise treatment.
Controls          Age      56  50  69  52  72  58  42  80  62  68
                  Density  54  62  45  64  48  56  63  42  56  51
Exercise treated  Age      50  59  75  65  55  60  57  54  67  61
                  Density  67  56  47  65  76  59  63  64  49  65
Solution
Step 1: Formulating the null hypothesis
The null hypothesis is that exercise had no effect on bone density. ANCOVA
also tests the other null hypothesis that age had no effect on bone density.
Finally click on OK. SPSS will produce an output of which the most useful part is shown in the following table:

Dependent variable: density

Source            Sum of squares   df   Mean square   F        Sig.
Corrected model   1060.010         2    530.005       23.538   0.000
...
Total             67798.000        20

R squared = 0.735 (adjusted R squared = 0.703).
This table gives F ratios for the effect of the treatment (9.604) and age
(36.195).
Using MINITAB
First put the data into three columns, say age, treatment and density. Next,
click on the Stat menu, move onto the ANOVA bar and click on General
Linear Model. MINITAB will produce the General Linear Model dialogue
box. Put the dependent variable (here Density) into the Responses box and
Treatment into the Model box. Then click onto the Covariates button and
put Age into the Covariates box. The completed boxes and the data screen
are shown below.
This table gives F ratios for the effect of the treatment (5.11) and age
(36.20), and presents the result of a t test to compare the slope of the
regression line of density vs age with 0. Here T = −6.02.
Here Sig. = 0.007 and P = 0.037, both below 0.05. Therefore we must reject the null hypothesis. We can say that the exercise treatment has a significant effect on bone density: the treatment increases it. There are two things to note. First, SPSS and MINITAB give different results! Second, the analysis also shows that bone density falls significantly with increasing age. If we had not included age in the analysis and had just carried out a one-way ANOVA, the effect of the treatment would not have come out significant (Sig. = 0.071).
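The same ANCOVA can also be sketched in Python with the statsmodels library, using the bone density data tabulated above. This is an illustration rather than the book's procedure; note that the F ratios depend on the type of sums of squares requested (typ=2 below gives Type II), which is one reason different packages can disagree.

# ANCOVA sketch using the bone density data above: density is modelled
# on the exercise treatment (a factor) plus age (a covariate).
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.DataFrame({
    'treatment': ['control'] * 10 + ['exercise'] * 10,
    'age':     [56, 50, 69, 52, 72, 58, 42, 80, 62, 68,
                50, 59, 75, 65, 55, 60, 57, 54, 67, 61],
    'density': [54, 62, 45, 64, 48, 56, 63, 42, 56, 51,
                67, 56, 47, 65, 76, 59, 63, 64, 49, 65],
})

model = smf.ols('density ~ C(treatment) + age', data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F ratios for treatment and age
print(model.params)                      # slope of density against age, etc.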
9.4 Investigating the relationships between several variables

In Chapters 6 and 7 we saw how you can use correlation, regression, the χ2 test and logistic regression to investigate the relationships between two sets of variables. However, biologists often want to investigate the relationships between several sets of measurements. Once again there are several different ways of doing this, and the method you choose will depend on the type of data and the ways in which the different variables could influence each other.
correlation
A statistical test which determines whether there is linear association between two sets of measurements.

Correlation is the simplest method of investigating the relationships between more than two variables, where each variable could conceivably affect the others and vice versa. All you need to do is to make all the possible pairwise comparisons, using Pearson correlation for normally distributed data and rank correlation for non-normally distributed data or ranks. The result will be a large table of correlation coefficients that you can look through for significant correlations. There are two main problems with this
simple approach. The first is that the more correlations you do the more likely
you are to get significant correlations just by chance. You can get over this
problem by decreasing the probability at which you consider significance to be
reached. I recommend using the Dunn–Sidak method, in which you reduce the critical probability according to the equation P = 1 − (0.95)^(1/k), where k is the number of comparisons.
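This correction is easy to compute. A minimal Python sketch (the function name is ours):

# Dunn-Sidak correction: the significance level to use for each of
# k comparisons so that the overall error rate stays at 5%.
def dunn_sidak(k, alpha=0.05):
    return 1 - (1 - alpha) ** (1 / k)

print(dunn_sidak(10))   # about 0.0051 for 10 correlations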
The second problem is not so easy to get over. Many of the relationships you
uncover may not represent real causal relationships. Two variables may be cor-
related, not because there is a causal relationship between them but because one
is co‐correlated with a third variable that does have such a causal relationship.
For instance many measurements on people may be correlated because they
all change with age. However, this does not necessarily mean they affect each
other. To get over this problem, the technique to use is partial correlation.
This produces a measure of the correlation between two variables, when all the
other variables are held constant. The technique is easy to carry out in SPSS
(but not MINITAB!). Let’s investigate the data about heart rate and blood pres-
sure we examined in Example 6.1. The two variables were found to be closely
correlated with a correlation coefficient of 0.860. What about adding a third
measure, artery stiffness? To investigate how the three variables are related you
can carry out several partial correlations.
For instance to see if the relationship between heart rate and pressure
would remain significant, even if artery stiffness was taken into account, go
into the Correlate menu, and click on Partial Correlation. SPSS will produce
the Partial Correlation dialogue box. Put the two variables whose relation-
ship you want to investigate (here heartrate and pressure) into the Variables
box, and the variable for which you are controlling (here artery) into the
Controlling for box. The completed boxes and the data screen are shown
below.
SPSS produces a correlations table giving the partial correlation coefficient between heart rate and blood pressure, controlling for artery stiffness, with 27 degrees of freedom.
This shows that even taking into account artery stiffness, heart rate and blood
pressure are still strongly positively correlated.
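To see what partial correlation is actually doing, it can be computed by hand: regress each of the two variables of interest on the control variable and correlate the residuals. A minimal Python sketch; the call at the end assumes arrays named after the example's variables, whose values are not reproduced here.

# Partial correlation of x and y, controlling for z: correlate the
# residuals left after regressing each variable on the control.
import numpy as np

def partial_corr(x, y, z):
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.asarray(z, dtype=float)
    Z = np.column_stack([np.ones_like(z), z])      # intercept plus control
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# e.g. partial_corr(heartrate, pressure, artery)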
Multiple regression allows you to determine how a single ‘output’ meas-
urement is affected by several independent variables or cofactors. For instance
you might be interested in seeing how the metabolic rate of patients is related
to the levels of several hormones. If each hormone is measured in each patient,
along with the metabolic rate, multiple regression allows you to work out if
any of the hormones have a significant effect on metabolic rate. If so it will
enable you to determine the line of best fit between hormone levels and meta-
bolic rate, and tell you how much of the variability is accounted for by each
hormone.
The main problem with using multiple regression is that there are several different ways in which it can be performed, and each can give different results! One way is simply to include all the variables at once. Another is to build the model up forwards, starting with the variable that explains the most variability and continuing until the last significant variable is added. Alternatively, you can start with all of them and work backwards, removing the ones with the least effect. A final way, stepwise regression, adds new variables like the forwards strategy, but at any stage can add or remove any of the existing variables in the regression. This is a more flexible strategy, but none of them is absolutely guaranteed to find the best possible regression, so you are advised to proceed with caution with this technique. The advantage is that all of these techniques are supported in both SPSS and MINITAB (within the Regression menus) and they are as easy to carry out as simple linear regression. You just put all the possible factors into the Independent or Predictor boxes and choose the technique to use from a menu.
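As an illustration of the simplest (enter-everything) strategy, here is a sketch using Python's statsmodels library; the hormone data are simulated, so none of the numbers come from a real study.

# Multiple regression sketch: an 'output' measurement regressed on
# three predictors at once (simulated hormone data, for illustration).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
hormones = rng.normal(size=(30, 3))                     # three predictors
rate = (2.0 + hormones @ np.array([0.5, -0.3, 0.0])
        + rng.normal(scale=0.1, size=30))               # metabolic rate

X = sm.add_constant(hormones)          # add the intercept term
fit = sm.OLS(rate, X).fit()
print(fit.summary())                   # slopes, t tests and r squared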
logistic regression
A statistical test which analyses how a binary outcome is affected by other numerical characteristics.

Logistic regression allows you to investigate how several variables or character states affect a binary output. Thus you might look at how several factors influence whether an animal is eaten or not, or which factors influence whether people smoke. The factors can be a mixture of character states, ranks and measurements.
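A minimal sketch of a logistic regression using scikit-learn; the size, speed and eaten-or-not data are invented for illustration.

# Logistic regression sketch with scikit-learn: how size and speed
# affect a binary outcome (whether an animal is eaten; toy data).
import numpy as np
from sklearn.linear_model import LogisticRegression

size  = np.array([1.2, 3.4, 2.2, 0.8, 4.0, 2.9, 1.5, 3.8])
speed = np.array([9.0, 3.0, 6.5, 9.5, 2.0, 4.0, 8.0, 2.5])
eaten = np.array([0,   1,   0,   0,   1,   1,   0,   1])

X = np.column_stack([size, speed])
model = LogisticRegression().fit(X, eaten)
print(model.intercept_, model.coef_)
print(model.predict_proba([[2.0, 5.0]]))   # P(not eaten), P(eaten)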
9.5 Exploring data to investigate groupings
Finally, there are several techniques that can be used to investigate the relationships between several variables taken on a single group of subjects. Some techniques help to split up the individuals as much as possible on the basis of differences in a combination of the variables. Other techniques can be used if you have already defined your groups, to see which variables best separate them. Finally, yet others work out how closely the different subjects are related and produce a dendrogram or family tree showing those relationships. These techniques are useful, but they should mainly be used as the basis for further research and to develop hypotheses, which should then be tested if possible by more rigorous experiments or surveys.
9.5.4 A warning
Data exploration techniques can seem very attractive, as they allow the scien-
tist to chuck huge amounts of ‘data’ into an analysis and get lots of information
out. This can allow you to ‘see the wood for the trees’. However, the problem
is that the results they produce are hard to understand and it can be hard to
see how reliable they are. Consequently, they should rarely be used as an end
in themselves but should be used mainly to generate hypotheses, which can be
subsequently tested by experiment. In science you can't find things out about your subjects unless you already have a reasonably good understanding of them, and just collecting data is no substitute for thinking. For this reason, this book
doesn’t go deeply into this subject and the techniques are relevant mostly to
particular areas of biology: morphometrics, systematics, community ecology
and bioinformatics. Students of those subjects will no doubt find useful texts
pointing them to the relevant techniques for their subject.
10 Dealing with measurements and units
10.1 Introduction
10.2 Measuring
10.3 Converting to SI units
10.3.1 SI units
Before carrying out any further manipulation or presentation of your data, you should convert them into the correct SI units. The Système International d'Unités (SI) is the accepted scientific convention for measuring physical quantities, under which the basic units of length, mass and time are metres, kilograms and seconds respectively. The complete list of the basic SI units is given in Table 10.1.
All other units are derived from these basic units. For instance, volume should be expressed in cubic metres or m³. Similarly density, mass per unit volume, should be expressed in kilograms per cubic metre or kg m⁻³. Some important derived units have their own names; the unit of force (kg m s⁻²) is called a newton (N), and the unit of pressure (N m⁻²) is called a pascal (Pa). A list of important derived SI units is given in Table 10.2.
Use of prefixes
Using prefixes, each of which stands for a multiplication factor of 1000 (Table 10.3), is the simplest way to present large or small measurements. Any quantity can then be expressed as a number between 1 and 1000 multiplied by the appropriately prefixed unit.
Table 10.1 The basic SI units

Quantity                Unit       Symbol
Base
  Length                metre      m
  Mass                  kilogram   kg
  Time                  second     s
  Amount of substance   mole       mol
  Temperature           kelvin     K
  Electric current      ampere     A
  Luminous intensity    candela    cd
Supplementary
  Plane angle           radian     rad
  Solid angle           steradian  sr
Table 10.2 Important derived SI units

Quantity                Unit       Symbol   Definition
Mechanics
  Force                 newton     N        kg m s⁻²
  Energy                joule      J        N m
  Power                 watt       W        J s⁻¹
  Pressure              pascal     Pa       N m⁻²
Electricity
  Charge                coulomb    C        A s
  Potential difference  volt       V        J C⁻¹
  Resistance            ohm        Ω        V A⁻¹
  Conductance           siemens    S        Ω⁻¹
  Capacitance           farad      F        C V⁻¹
Light
  Luminous flux         lumen      lm       cd sr⁻¹
  Illumination          lux        lx       lm m⁻²
Others
  Frequency             hertz      Hz       s⁻¹
  Radioactivity         becquerel  Bq       s⁻¹
  Enzyme activity       katal      kat      mol substrate s⁻¹
Table 10.3 SI prefixes

Small numbers
  Multiple  10⁻³   10⁻⁶   10⁻⁹   10⁻¹²  10⁻¹⁵  10⁻¹⁸
  Prefix    milli  micro  nano   pico   femto  atto
  Symbol    m      μ      n      p      f      a

Large numbers
  Multiple  10³    10⁶    10⁹    10¹²   10¹⁵   10¹⁸
  Prefix    kilo   mega   giga   tera   peta   exa
  Symbol    k      M      G      T      P      E
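Choosing a prefix is mechanical enough to automate. Below is a rough Python sketch; the helper with_prefix is our own invention, and it ignores the significant-figure considerations discussed later in the chapter.

# Rescale a value by factors of 1000 and attach the matching SI prefix.
import math

PREFIXES = {-18: 'a', -15: 'f', -12: 'p', -9: 'n', -6: 'μ', -3: 'm',
            0: '', 3: 'k', 6: 'M', 9: 'G', 12: 'T', 15: 'P', 18: 'E'}

def with_prefix(value, unit):
    exp = 0
    if value != 0:
        exp = 3 * math.floor(math.log10(abs(value)) / 3)
        exp = max(-18, min(18, exp))   # clamp to the table above
    return f"{value / 10 ** exp:g} {PREFIXES[exp]}{unit}"

print(with_prefix(192e6, 'N'))     # 192 MN
print(with_prefix(2.13e-1, 'm'))   # 213 mm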
Example 10.1 Convert the following quantities into SI units expressed in scientific notation.
(a) 24 ha
(b) 25 cm

Solution
(a) 1 ha = 10⁴ m², so 24 ha = 24 × 10⁴ m² = 2.4 × 10⁵ m².
(b) 25 cm = 25 × 10⁻² m = 2.5 × 10⁻¹ m.
A mole of a substance contains as many particles as there are atoms in 1 g of hydrogen. In other words, the mass of 1 mole of a substance is its molecular mass in grams.
When working out concentrations of solutions it is probably best to stick to
these units, since most glassware is still calibrated in litres and small balances
in grams.
The molarity M of a solution is obtained as follows:

M = Number of moles / Solution volume (l)
  = Mass (g) / [Molecular mass × Solution volume (l)]
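The formula translates directly into code. A minimal Python sketch, using approximate atomic masses that we have supplied (Cu 63.5, S 32.1, O 16.0):

# Molarity from the formula above: moles per litre of solution.
def molarity(mass_g, molecular_mass, volume_l):
    return mass_g / (molecular_mass * volume_l)

# Copper sulphate, with approximate atomic masses:
cuso4 = 63.5 + 32.1 + 4 * 16.0         # about 159.6 g per mole
print(molarity(23, cuso4, 2.5))        # about 0.058 M (Example 10.2)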
Example 10.2 A solution contains 23 g of copper sulphate (CuSO4) in 2.5 l of water. What
is its concentration?
Solution
The molecular mass of CuSO4 is approximately 63.5 + 32.1 + (4 × 16.0) = 159.6, so M = 23/(159.6 × 2.5) = 5.8 × 10⁻² M.
Non‐metric units
Non‐metric units, such as those based on the old Imperial scale, are also given
in Table 10.4. Again you must simply multiply your measurement by the conversion factor. However, they are more difficult to convert to SI, since they must be multiplied by factors which are not just powers of 10. For instance, 6 ft = 6 × (3.048 × 10⁻¹) m = 1.83 m.
Note that the answer was given as 1.83 m, not the calculated figure of 1.8288 m.
This is because a measure of 6 ft implies that the length was measured to the
nearest inch. The answer we produced is accurate to the nearest centimetre, which is the equivalent level of precision in SI units.
If you have to convert square or cubic measures into metric, simply multiply by the conversion factor to the power of 2 or 3. So 12 ft³ = 12 × (3.048 × 10⁻¹)³ m³ = 3.4 × 10⁻¹ m³ to two significant figures.
Once measurements have been converted into SI units with exponents, they
are extremely straightforward to combine using either pencil and paper or cal-
culator (most calculators use exponents nowadays). When multiplying two
measurements, for instance, you simply multiply the initial numbers, add the exponents and multiply the units together. If the product of the two initial numbers is greater than 10 or less than 1, you simply add or subtract 1 from the exponent. For instance, multiplying lengths of 2.0 × 10⁻¹ m and 3.0 × 10² m gives an area of (2.0 × 3.0) × 10^(−1+2) m² = 6.0 × 10¹ m². Notice that the area is given to two significant figures because that was the degree of precision with which the lengths were measured.
In the same way, when dividing one measurement by another you divide the
first initial number by the second, subtract the second exponent from the first,
and divide the first unit by the second.
When you have completed all calculations, you must be careful how you express
your answer. First, it should be given to the same level of precision as the least
accurate of the measurements from which it was calculated. This book and many
statistical packages use the following convention: the digits 1 to 4 go down, 6 to 9 go up and 5 goes to the nearest even digit. Here are some examples: 2.34 rounds to 2.3, 2.36 rounds to 2.4, 2.35 rounds to 2.4 and 2.45 also rounds to 2.4.
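Conveniently, Python's built-in round() follows exactly this convention (round-half-to-even), so you can check roundings directly:

print(round(2.4))   # 2 (digits 1 to 4 go down)
print(round(2.6))   # 3 (digits 6 to 9 go up)
print(round(2.5))   # 2 (5 goes to the nearest even digit)
print(round(3.5))   # 4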
The various steps can now be carried out to manipulate data to reliably derive
further information. It is important to carry out each step in its turn, produc-
ing an answer before going on to the next step in the calculation. Doing all the
calculations at once can cause confusion and lead to silly mistakes.
Example 10.3 A sample of heartwood taken from an oak tree was 12.1 mm long by 8.2 mm
wide by 9.5 mm deep and had a wet mass of 0.653 g. What was its density?
Solution
Density has units of mass (in kg) per unit volume (in m3). Therefore the first
thing to do is to convert the units into kg and m. The next thing to do is
to calculate the volume in m3. Only then can the final calculation be per-
formed. This slow building up of the calculation is ponderous but is the best
way to avoid making mistakes.
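If you script the calculation, the same stepwise structure applies. A minimal Python sketch of Example 10.3, using the numbers from the example itself:

# Example 10.3, built up step by step: convert to SI, find the
# volume, then the density.
length, width, depth = 12.1e-3, 8.2e-3, 9.5e-3   # dimensions in metres
mass = 0.653e-3                                   # mass in kilograms

volume = length * width * depth    # about 9.43e-07 m3
density = mass / volume            # about 692.8 kg per m3
print(density)                     # quote as 690 kg m-3 (two sig. figs)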
Notice that the answer is given to two significant figures, like the dimen-
sions of the sample.
Frequently, raw data on their own are not enough to work out other important
quantities. You may need to include physical or chemical constants in your
calculations, or insert your data into basic mathematical formulae. Table 10.5 is a list of some useful constants and formulae; many of them are worth memorising. They include:

π = 3.1416
loge x = 2.30 log10 x
Solution
The first thing to calculate is the area of the Petri dish. Since its diameter is 10 cm, its radius R will be 5 cm (or 5 × 10⁻² m). A circle's area A is given by the formula A = πR². Therefore A = 3.1416 × (5 × 10⁻²)² = 7.9 × 10⁻³ m².
10.8 Using calculations

Once you can reliably perform calculations, you can use them for far more than
just working out the results of your experiments from your raw data. You can
use them to put your results into perspective or extrapolate from your results
into a wider context. You can also use calculations to help design your experi-
ments: to work out how much of each ingredient you need, or how much the
experiment will cost. But even more usefully, they can help you to work out
whether a particular experiment is worth attempting in the first place. Calcula-
tions are thus an invaluable tool for the research biologist to help save time,
money and effort. They don’t even have to be very exact calculations. Often, all
that is required is to work out a rough or ballpark figure.
Example 10.5 Elephants are the most practical form of transport through the Indian rain-
forest because of the rough terrain; the only disadvantage is their great
weight. A scientific expedition needs to cross a bridge with a weight limit of
10 tonnes, in order to enter a nature reserve. Will the elephants be able to
cross this bridge safely?
Solution
The first step is to estimate how big elephants are and how dense they are. Since the mass of an object is equal to volume × density, the first
thing to calculate is the volume. What is the volume of an elephant? Well,
elephants are around 2–3 m long and have a (very roughly) cylindrical body
of diameter, say, 1.5 m (so the radius = 0.75 m). The volume of a cylinder
is given by V = πR2L, so with these figures the volume of the elephant is
approximately
V = π × 0.75² × 2  up to  π × 0.75² × 3
V = 3.53 to 5.30 m³
The volume of the legs, trunk, etc. is very much less and can be ignored in
this rough calculation. So what is the density of an elephant? Well, elephants
(like us) can just float in water and certainly swim, so they must have about
the same density as water, 1000 kg m⁻³. The approximate mass of the elephant is therefore 1000 × (3.53 to 5.30) = 3530 to 5300 kg.
Note, however, that the length of the beast was estimated to only one significant figure, so the weight should also be estimated to this low degree of accuracy. The weight of the elephant will be (4–5) × 10³ kg or 4–5 tonnes.
(Textbook figures for weights of elephants range from 3 to 7 tonnes.) The
bridge should easily be able to withstand the weight of an elephant.
This calculation would not have been accurate enough to determine
whether our elephant could cross a bridge with weight limit 4.5 tonnes. It
would have been necessary to devise a method of weighing it.
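Such ballpark calculations are also quick to script. A minimal Python sketch of the estimate above:

# Ballpark mass of an elephant: a cylinder 2-3 m long, radius 0.75 m,
# with roughly the density of water (1000 kg per m3).
import math

radius, density = 0.75, 1000.0
for length in (2.0, 3.0):
    volume = math.pi * radius ** 2 * length     # 3.53 and 5.30 m3
    print(volume * density / 1000, 'tonnes')    # 3.5 and 5.3; quote 4-5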
10.9 Logarithms, graphs and pH
Figure 10.1 Using logarithms. (a) A simple graph showing the relationship between
the size of woods and the number of tree species they contain; the points are
hopelessly congested at the left of the plot. (b) Plotting number of species against
log10 (area) spreads the data out more evenly.
pH
The single most important use of logarithms in biology is in the units for acidity. The unit pH is given by the formula

pH = −log₁₀[H⁺]    (10.1)

where [H⁺] is the hydrogen ion concentration in moles per litre (M). Therefore, a solution containing 2 × 10⁻⁵ mol of hydrogen ions per litre will have a pH of −log₁₀(2 × 10⁻⁵) = 4.7.
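Both directions of the conversion are one-liners. A minimal Python sketch (the function names are ours):

# pH from hydrogen ion concentration (equation 10.1), and the inverse.
import math

def ph(h):                  # h is [H+] in moles per litre
    return -math.log10(h)

def h_conc(ph_value):       # [H+] = 10 to the power -pH
    return 10 ** -ph_value

print(ph(2e-5))             # about 4.7, as in the text
print(h_conc(3.2))          # about 6.3e-4 M (Example 10.6)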
Example 10.6 A solution has a pH of 3.2. What is the hydrogen ion concentration?
Solution
[H⁺] = 10^(−pH) = 10^(−3.2) = 6.3 × 10⁻⁴ M.
10.10 Self-assessment problems

Problem 10.1
What are the SI units for the following measurements?
(a) Area
(b) The rate of height growth for a plant
(c) The concentration of red cells in blood
(d) The ratio of the concentrations of white and red cells in blood
Problem 10.2
How would you express the following quantities using appropriate prefixes?
Problem 10.3
How would you express the following quantities in scientific notation using appropriate
exponents?
Problem 10.4
How would you express the following quantities in scientific notation using the appropriate
exponents?
Problem 10.5
Convert the following to SI units expressed in scientific notation.
Problem 10.6
Convert the following into SI units.
(a) 35 yards
(b) 3 feet 3 inches
(c) 9.5 square yards
Problem 10.7
Perform the following calculations.
Problem 10.8
Give the following expressions in prefix form and to the correct degree of precision.
Problem 10.9
A blood cell count was performed. Within the box on the slide, which had sides of length
1 mm and depth of 100 μm, there were 652 red blood cells. What was the concentration
of cells (in m-3) in the blood?
Problem 10.10
An old‐fashioned rain gauge showed that 0.6 in. of rain had fallen on an experimental plot
of area of 2.6 ha. What volume of water had fallen on the area?
Problem 10.11
What is the concentration of a solution of 25 g of glucose (formula C6H12O6) in a volume
of 2000 ml of water?
Problem 10.12
An experiment to investigate the basal metabolic rate of human beings showed that in 5
minutes the subject breathed out 45 l of air into a Douglas bag. The oxygen concentra-
tion in this air had fallen from 19.6% by volume to 16.0%, so it contained 3.6% CO2 by
volume. What was the mass of this CO2 and at what rate had it been produced?
Problem 10.13
A chemical reaction heated 0.53 l of water by 2.4 K. How much energy had it produced?
Problem 10.14
An experiment which must be repeated around 8 times requires 80 ml of a 3 * 10-3 M
solution of the substance X. Given that X has a molecular mass of 258 and costs £56 per
gram, and given that your budget for the year is £2000, do you think you will be able to
afford to do the experiment?
Problem 10.15
It has been postulated that raised bogs may be major producers of methane and, be-
cause methane is a greenhouse gas, therefore an important cause of the greenhouse
effect. A small microcosm experiment was carried out to investigate the rate at which
methane is produced by a raised bog in North Wales. This showed that the rate of pro-
duction was 21 ml m-2 per day. Given that (1) world production of CO2 by burning fossil
fuels is 25 Gt per year, (2) weight for weight, methane is said to be three times more ef-
ficient a greenhouse gas than CO2 and (3) there is 3.4 * 106 km2 of blanket bog in the
world, what do you think of this idea?
Problem 10.16
Calculate log10 of
(a) 45
(b) 450
(c) 0.000 45
(d) 1 000 000
(e) 1
Problem 10.17
Reconvert the following logarithms to base 10 back to numbers.
(a) 1.4
(b) 2.4
(c) -3.4
(d) 4
(e) 0
Problem 10.18
Calculate the pH of the following solutions.
Problem 10.19
Calculate the mass of sulphuric acid (H2SO4) in 160 ml of a solution which has a pH of 2.1.
Problem 10.20
Calculate the natural logarithm of
(a) 30
(b) 0.024
(c) 1
Problem 10.21
Convert the following natural logarithms back to numbers.
(a) 3
(b) -3
(c) 0
Glossary
ANOVA Abbreviation for analysis of variance: a widely used series of tests which
can determine whether there are significant differences between groups.
chi-squared (χ2) A statistical test which determines whether there are differ-
ences between real and expected frequencies in one set of categories, or associa-
tions between two sets of categories.
critical values Tabulated values of test statistics; usually if the absolute value of
a calculated test statistic is greater than or equal to the appropriate critical value,
the null hypothesis must be rejected.
data Observations or measurements you have taken (your results) which are
used to work things out about the world.
error bars Bars drawn upwards and downwards from the mean values on
graphs; error bars can represent the standard deviation or the standard error.
general linear model (GLM) A series of tests which combine ANOVA and regres-
sion analysis to allow powerful analysis of complex data sets.
median The central value of a distribution (or average of the middle points if
the sample size is even).
metric Units based on the metre, second and kilogram but not necessarily SI.
non-parametric test A statistical test which does not assume that data are normally distributed, but instead uses the ranks of the observations.
null hypothesis A preliminary assumption in a statistical test that the data shows
no differences or associations. A statistical test then works out the probability of
obtaining data similar to your own by chance.
parameter A measure, such as the mean and standard deviation, which describes
or characterises a population. These are usually represented by Greek letters.
parametric test A statistical test which assumes that data are normally
distributed.
post hoc tests Statistical tests carried out if an analysis of variance is significant;
they are used to determine which groups are different from each other.
quartiles Upper and lower quartiles are values exceeded by 25% and 75% of
the data points, respectively.
scatter plot A point graph between two variables which allows one to visually
determine whether they are associated.
standard error (SE) A measure of the spread of sample means: the amount by which they differ from the true mean. Standard error equals standard deviation divided by the square root of the number in the sample; it is estimated from the sample standard deviation s.

standard error of the difference (SEd) A measure of the spread of the difference between two estimated means.
t tests Statistical tests which analyse whether there are differences between
measurements on a single population and an expected value, between paired
measurements, or between two unpaired sets of measurements.
two-tailed tests Tests which ask merely whether observed values are different
from an expected value or each other, not whether they are larger or smaller.
Further reading
Statistics is a huge field, so this short book has of necessity been superficial and
selective in the material it covers. For more background and for more informa-
tion about the theory of statistics the reader is referred to the following titles.
The books vary greatly in their approach and in the degree of mathematical
competence required to read them, so you should choose your texts carefully.
Sokal and Rohlf (2009) is really the bible of biological statistics, but most stu-
dents will struggle with its high mathematical tone. Field (2009) is much easier
to understand. It is based on SPSS and, though really for psychologists, gives a
lot of useful background to the more complex GLM and data exploration tech-
niques. Heath (1995) and Ruxton and Colegrave (2003) give useful advice on
experimental design. In contrast, if you are struggling even with the complex-
ity of this book, a very simple introduction to statistical thinking (without any
equations!) is Rowntree (1991).
Ruxton, G. D. and Colegrave, N. (2003) Experimental Design for the Life Sciences.
Oxford University Press, Oxford.
Solutions
Chapter 2
Problem 2.1
95% will have heart rates around 75 ± (1.96 × 11), i.e. between 53 and 97 beats per
minute.
Problem 2.2
Mean = 5.71 g, s = 0.33 g.
Problem 2.3
(a) Mean = 5.89, s = 0.31, SE = 0.103, 95% CI = 5.89 ± (2.306 × 0.103) = 5.65 to 6.13.
(b) Mean = 5.95, s = 0.45, SE = 0.225, 95% CI = 5.95 ± (3.182 × 0.225) = 5.23 to 6.67. The 95% confidence interval is three times wider than (a).
Problem 2.4
(a) Mean (s) = 3.00 (0.47) kg, n = 25, SE = 0.093 kg.
(b) The bar chart is shown in Figure A1.
Chapter 3
Problem 3.1
Problem 3.2
Exploring the data in SPSS shows it to be strongly positively skewed with many more
small species than large ones.
To obtain a more symmetrical distribution you should carry out a log transforma-
tion to give results like those shown below.
Chapter 4
Problem 4.1
The null hypothesis is that the mean score = 58%. The mean score of the students is x̄ = 58.36 with s = 13.70 and SE = 2.74. The score seems higher than expected, but in the one-sample t test, t = (58.36 − 58)/2.74 = 0.13 to two decimal places. The absolute value of t, 0.13, is therefore well below the value of 2.064 needed for significance at 24 degrees of freedom. SPSS and MINITAB give a significance probability of 0.897. This is much greater than the value of 0.05 needed for significance.
Therefore students did not perform significantly differently from expected.
Problem 4.2
The null hypothesis is that the mean mass of tomatoes is 50 g. Looking at descriptive statistics, x̄ = 44.1 g, s = 8.6 g and SE = 2.15 g. This seems well below 50 g. In the one-sample t test to determine whether the mean mass is significantly different from 50 g, t = (44.1 − 50)/2.15 = −2.74 to two decimal places. The absolute value of t, 2.74, is well above the value of 2.131 required for significance at 15 degrees of freedom (remember it is the magnitude of t, not whether it is positive or negative, that matters).
SPSS and MINITAB give a significance probability of 0.016, well below the 0.05 needed for significance.
Therefore the tomatoes are significantly lighter than the expected 50 g. The 95% confidence interval for mass is 44.1 ± (2.131 × 2.15) = 39.5 to 48.7 g.
Problem 4.3
The null hypothesis is that students’ scores after the course were the same as before
it. The mean score was 58.1 before and 53.8 after. The scores seem to be worse after-
wards, but to find whether the difference is significant you need to carry out a paired
t test. This shows a mean difference d = -4.3, s = 5.7 and SEd = 1.79. Therefore
t = -4.3/1.79 = -2.40 to two decimal places. Its absolute value, 2.40, is larger than
the value of 2.306 needed for significance at 9 - 1 = 8 degrees of freedom.
SPSS and MINITAB give the significance probability as 0.040, below the 0.05
needed for significance.
Therefore the course did have a significant effect. After the course most students
got worse marks!
The 95% confidence interval for the difference is −4.3 ± (2.306 × 1.79) = −8.4 to −0.2.
Problem 4.4
(a) The null hypothesis is that pH was the same at dawn and dusk. The mean at dawn was 5.54 (with s = 0.71 and SE = 0.203) and at dusk was 6.45 (with s = 0.64 and SE = 0.193). The pH seems to be higher at dusk, but to find out you need to carry
out a two sample t test. Using equations 3.5 and 3.6 we can calculate that
to two decimal places. Its absolute value, 3.20, is larger than the value of 2.080
needed for significance at 12 + 11 - 2 = 21 degrees of freedom.
MINITAB and SPSS give the significance probability as 0.004, well below the
0.05 needed for significance.
Therefore the pH is significantly different at dawn and dusk, being signifi-
cantly higher at the end of the day.
(b) You cannot use a paired t test, because the cells which you measured are not
identifiably the same. Indeed, different numbers of cells were examined at dawn
and dusk.
Problem 4.5
The null hypothesis is that the control and supported plants have the same mean
yield. The mean yield of controls was 10.28 (with s = 1.60 and SEd = 0.36) and sup-
ported plants was 10.06 (with s = 1.55 and SEd = 0.35). The yield seems to be higher
in the controls but to find if this is a significant difference you need to carry out a
two‐sample t test. Using equations 3.5 and 3.6 we can calculate that
to two decimal places. Its absolute value, 0.43, is smaller than the value of 2.025
needed for significance at 20 + 20 - 2 = 38 degrees of freedom.
SPSS and MINITAB give the significance probability as 0.669, well above the 0.05
needed for significance.
Therefore the yield is not significantly different between the control and sup-
ported plants.
Problem 4.6
(a) The data cannot be transformed to be normally distributed because the weight
of the deer at the start is bimodally distributed as can be seen in the SPSS plot
below; the males are on average much heavier than the females.
(b) The null hypothesis is that the weight of the deer is the same at the end as at the
start of the summer. Descriptive statistics given by SPSS or MINITAB show that the
median weight at the end is actually higher (57.0 kg) than at the start (53.5 kg) of
summer, but is this a significant difference? Carrying out the Wilcoxon matched
pairs test shows that the sum of negative ranks (63) is far less than that of positive
ranks (402). Looking up in Table S5 for the Wilcoxon T distribution, shows that
the lower value, 63, is far less than the value of 137 needed for significance for
30 matched pairs of data. SPSS and MINITAB give the significance probability as
0.000, well below the 0.05 needed for significance.
Therefore the deer did weigh significantly differently at the end of the summer compared with the start; they were heavier.
Problem 4.7
The null hypothesis is that the animal spent equal time pacing when in the two
cages. The sum of the ranks is lower for cage 2 (132.5) than for cage 1 (167.5), and
results in a value for U of 54.5, but does this show a significant difference? Looking
up in Table S4 shows that for n1 = 12 and n2 = 12 the critical value of U is 107, so our
value is above that needed for significance. SPSS and MINITAB give the significance
probability as 0.294, well above the value of 0.05 needed for significance.
Therefore there is no significant difference in the time the animal spends pacing
in the two cages.
Problem 4.8
(a) You should use the Mann–Whitney U test because there are two groups of stu-
dents, those given the drug and those the placebo, and there was no matching
into pairs of these two groups.
(b) The null hypothesis is that there is no difference in the scarring between patients
given the drug and those given the placebo. The rank sums of the two groups were different (514 and 306), but is this difference significant? A Mann–Whitney U test gives a value of U of 96. Looking up in Table S4 shows that for n1 = 20 and n2 = 20 the critical value of U is 127, so our value is well below
that needed for significance. SPSS and MINITAB give the significance probability
as 0.004, well below the value of 0.05 needed for significance.
Chapter 5
Problem 5.1
The null hypothesis is that the mean activity is the same at each point in time. To test
whether this is the case you should carry out one‐way ANOVA as well as determine
the descriptive statistics for the five time points.
For example SPSS produces the following results.
It is clear that the mean activity rises from 2.95 at the start to a peak of 4.52 after
4 hours, but are these changes significant? Looking at the ANOVA table, SPSS gives
a high value of F = 12.276, while Sig. = 0.000, well below the 0.05 level needed for
significance. Therefore activity is different at different times. But at which times is
the activity raised from the control. To find out you must perform a Dunnett test in
SPSS which will produce the following results.
Looking at the Sig. column, shows that there is a significant difference from the
control (0 hours) after 2 and 4 hours (Sig. = 0.004 and 0.000 respectively) but not at
1 or 8 hours (Sig. = 0.819 and 0.608 respectively).
Therefore activity is significantly increased but only after more than 1 hour, and
it drops off again before 8 hours.
Problem 5.2
Looking at the degrees of freedom (DF), it is clear that 4 + 1 = 5 groups must have
been examined, and in total 29 + 1 = 30 observations must have been made. F is
quite small (1.71) and Sig. is high (0.35 > 0.05). Therefore there was no significant
difference between the groups.
Problem 5.3
The null hypothesis is that the mean aluminium concentration is the same at each
point in time. To test whether this is the case you should carry out a repeated meas-
ures ANOVA in SPSS, as well as determine the descriptive statistics for the five time
points.
SPSS will produce the following results (among other stuff).
The descriptives table shows you that the mean aluminium concentration is falling throughout the time period, from 14.7 at week 1 to 8.0 at week 5, but are these changes significant? Looking at the tests of within-subjects effects table, for Sphericity Assumed,
F = 157.5 and Sig. = 0.000, well below the 0.05 level needed for significance. Therefore aluminium levels are different at different times. But to test whether they continue to fall throughout the experiment we must perform a Bonferroni test, which compares all time points with all the others. This comes up with the following results.
Here it can be seen that all time points are significantly different from each other (Sig. < 0.05) except for weeks 4 and 5, where Sig. = 0.174, which is greater than 0.05.
Therefore the aluminium levels continue to fall significantly only up to week 4
after which they level off.
Problem 5.4
The null hypothesis is that there is no difference between the numbers of colonies
on dishes smeared with different antibiotics. Carrying out the Kruskal–Wallis test in
SPSS gives the following results.
Kruskal‐Wallis Test
The first table shows that the mean ranks of the four treatments are certainly dif-
ferent, but are these differences significant? The second table gives the value of chi‐
square as 4.604, which is well below the value of 7.815 needed for 3 degrees of freedom
(see Table S3). SPSS and MINITAB also directly calculate the significance probability
as 0.203, which is well above the value of 0.05 needed for significance.
Therefore there is no significant difference between the numbers of colonies
growing on plates which have had the different antibiotic treatments.
Problem 5.5
The null hypothesis is that there was no difference in the mood of the patients at
the four times before and after taking the drug. Carrying out a Friedman’s test using
SPSS gives the following results.
The descriptive statistics suggest that the mood of the patients improved 1 day and
1 week after taking the drug, but were these differences significant? The second
table gives the mean ranks, and the third table gives the value of chi‐square (18.805)
which is well above the value of 7.800 needed for four groups and ten blocks (see Table
S6). SPSS and MINITAB also directly calculate the significance probability as 0.000,
which is well below the value of 0.05 needed for significance.
Therefore the mood of the patients is significantly different at different times,
and it looks as if it improves mood for over a week.
Problem 5.6
(a) To investigate which effects are significant we must carry out a two‐way ANOVA.
In SPSS this comes up with the following results.
It can be seen from this table that nitrogen has a significant effect on yield (Sig. =
0.000) as does the interaction between nitrogen and variety (Sig. = 0.000). However,
variety does not have a significant effect (Sig. = 0.618 which is greater than 0.05).
What does this mean? Well, we can find out by looking at the descriptive statistics
and a plot of the means (Figure A4) for each variety at each nitrogen level (below).
Looking at the table and the plot, it is clear that adding nitrogen increases yield in
both varieties. This is the cause of the significant term for nitrogen level.

Figure A4 Mean ± standard error of yields for two different varieties of wheat at applications of nitrate of 0, 1 and 2 kg m⁻².

The average
yield of the two varieties is about the same (the cause of the non‐significant variety
term), but nitrogen has more effect on yield in Hereward than in Widgeon (the cause
of the significant interaction term). Hence Widgeon does better without nitrogen;
Hereward does better with lots of nitrogen.
Chapter 6
Problem 6.1
(a) Plot cell number against time.
(b) Plot chick pecking order against parent pecking order, since the behaviour and
health of chicks is more likely to be affected by their parents than vice versa.
(c) Plot weight against height, because weight is more likely to be affected by height
than vice versa.
(d) You can plot this graph either way, because length and breadth are a measure of
size and may both be affected by the same factors.
Problem 6.2
The null hypothesis is that there is no linear association between leaf area and sto-
matal density. In the correlation analysis, SPSS and MINITAB both calculate a cor-
relation coefficient r of -0.944. This looks like a strong negative correlation, but is it
significant? SPSS and MINITAB give the significance probability as 0.000, well below
the 0.05 needed for significance.
Therefore, leaf area and stomatal density show a significant negative correlation.
Problem 6.3
(a) Bone density is the dependent variable, so it should be plotted along the vertical
axis. The SPSS graph is given below.
(b) SPSS comes up with the following results for the regression analysis amongst
other tables.
From the graph and the above equation it appears that bone density falls signifi-
cantly with age. To determine whether the fall is significant, we must examine
the results of the t test for age. Here t = -14.880 and the significance probability =
0.000, well below the value of 0.05 needed for significance.
Therefore the slope is significantly different from 0. We can say there is a sig-
nificant fall in bone density with age.
(c) Expected density at age 70 is found by inserting the value of 70 into the regres-
sion equation:
Problem 6.4
(a) Worm zinc concentration depends on environmental zinc concentration, not
vice versa, so worm zinc concentration must be plotted along the y‐axis. SPSS
plots the following graph.
(b) In SPSS the regression analysis comes up with the following table amongst others.
It is clear that the zinc concentration in the worms does increase with the zinc
concentration in the water, but the slope is much lower than 1, being only 0.119.
To investigate whether the slope is significantly different from 1 we must test
the null hypothesis that the actual slope equals 1. To do this we carry out the
following t test:

t = (Actual slope − Expected slope)/(Standard deviation of slope)
Here t = (0.119 - 1)/0.012 = -73.4. Its absolute value, 73.4, is much greater than
the value of 2.306 needed for a significant effect at 10 - 2 = 8 degrees of freedom.
Therefore the slope is significantly different from 1 (less in fact). It is clear that
the worms must be actively controlling their internal zinc concentrations.
Problem 6.5
(a) The relationship between seeding rate and yield as drawn by SPSS is shown
below. It looks as if the yield rises to a maximum at a seeding rate of 300 m-2,
before falling again.
(b) In SPSS the regression analysis yields the following results among others:
It is clear that the slope is not significantly different from 0 (Sig. = 0.955 which is
much greater than 0.05) and in fact the regression equation explains essentially
none of the variability (r2 = 0.000).
(c) There is no significant linear relationship between seeding rate and yield; the
relationship is curvilinear. The moral of this exercise is that linear relation-
ships are not the only ones you can get, so it is important to examine the data
graphically.
Problem 6.6
(a) If log10 A = 0.3 + 2.36 log10 L, taking inverse logarithms gives

A = 10^0.3 × L^2.36
A = 2.0L^2.36

(b) Similarly, if loge N = 2.3 + 0.1T, taking inverse logarithms gives

N = e^2.3 × e^0.1T
N = 10e^0.1T
Problem 6.7
(a) The logged data are shown in an SPSS graph below. lnmetabolism is plotted
against temperature because metabolism can be affected by temperature and not
vice versa.
Problem 6.8
The null hypothesis is that there was no correlation between dominance rank and
testosterone level. Carrying out a Spearman’s rank correlation in SPSS gives the value
of r as -0.375. However, 0.375 is less than the critical value for 18 degrees of freedom
of 0.472. SPSS and MINITAB also directly calculate the significance probability as
0.104, well above the level of 0.05 needed for significance.
Therefore there is no significant correlation between the dominance rank of males
and their testosterone levels.
Chapter 7
Problem 7.1
The null hypothesis is that the mice are equally likely to turn to the right as to the
left. Therefore the expected ratio is 1:1.
(a) After 10 trials, expected values are 5 towards the scent and 5 away. χ2 = (−2)²/5 + 2²/5 = 0.80 + 0.80 = 1.60 to two decimal places. This is below the critical value of
3.84 needed for significance for 1 degree of freedom, so there is as yet no evidence
of a reaction.
(b) After 100 trials, expected values are 50 towards the scent and 50 away. χ2 = (−16)²/50 + 16²/50 = 5.12 + 5.12 = 10.24. This is greater than the critical value of
3.84 needed for significance for 1 degree of freedom, so there is clear evidence of
a reaction. The mice seem to avoid the scent, in fact.
This problem shows the importance of taking large samples, as this will improve the
chances of detecting effects.
Problem 7.2
The null hypothesis is that there is no linkage, so in this sample of 160 plants the
expected numbers in each class are 90, 30, 30 and 10. Therefore χ2 = (−3)²/90 + 4²/30 + (−2)²/30 + 1²/10 = 0.100 + 0.533 + 0.133 + 0.100 = 0.87 to two decimal places. This is below the critical value of 7.815 needed for significance for 3 degrees of freedom,
so there is no evidence of a ratio different from 9:3:3:1 and so no evidence of linkage.
Problem 7.3
The null hypothesis is that the incidence of illness in the town was the same as for
the whole country. The expected values for illness in the town = 3.5% of 165 = 5.8,
and therefore 159.2 without the illness. χ2 = (9 − 5.8)²/5.8 + (156 − 159.2)²/159.2 = 3.2²/5.8 + (−3.2)²/159.2 = 1.76 + 0.06 = 1.82. This is below the critical value of 3.84
needed for significance for 1 degree of freedom, so there is no evidence of a different
rate of illness.
Problem 7.4
The null hypothesis is that the insects are randomly distributed about different col-
oured flowers. The completed table with expected values is given here, and it shows
that in many cases the numbers of insects on flowers of particular colours is very
different from expected. But is this a significant association? A χ2 test is needed.
(a) Working with four decimal places for the expected values, e.g. 26.0426 instead of
26.04, we obtain
This value is higher than the critical value of 9.48 needed for (3 − 1) × (3 − 1) = 4
degrees of freedom. SPSS and MINITAB also directly calculate a significance prob-
ability of 0.000. Therefore we can conclude there is a significant association
between insect type and flower colour.
(b) The highest χ2 values are 34.46 for beetles and white flowers; 20.84 for bees
and wasps and blue flowers; and 17.61 for beetles and blue flowers. Looking at
the values, more beetles are found at white flowers than expected, so beetles in
particular favour them; similarly, bees and wasps favour blue flowers; but beetles
seem to avoid blue flowers.
Problem 7.5
The null hypothesis is that people with and without freckles have the same inci-
dence of cancer. The completed contingency table is as shown and seems to indicate
that the number of people with freckles who have cancer is greater than expected.
But is this a significant effect? A χ2 test, working with four decimal places for the expected values, gives a value higher than the critical value of 3.84 needed for 1 degree of freedom. SPSS and MINITAB also calculate a significance probability of 0.003. Therefore there is a significant association: people with freckles have a higher incidence of cancer than expected.
Problem 7.6
The first thing to do is to calculate the number of ponds with neither newt spe-
cies. This equals 745 - (180 + 56 + 236) = 273. The null hypothesis is that there is
no association between the presence in ponds of smooth and palmate newts. The
completed contingency table is shown here and appears to indicate that there are far
more ponds than expected with both species or with neither species present.
Working with expected values to four decimal places, a χ2 test gives a value higher than the critical value of 3.84 needed for (2 − 1) × (2 − 1) = 1
degree of freedom. SPSS and MINITAB also calculate a significance probability of
0.000. Therefore there is a significant association between the presence of the
two species. In fact, the newts seem to be positively associated with each other.
When one species is present, it is more likely that the other species will be present
as well.
Chapter 8
Problem 8.1
Her null hypothesis is that there is no significant relationship between energy intake
and heart rate. The statistical test to use is correlation. She is looking at measure-
ments, looking for an association between two sets of measurements, and neither
variable is clearly independent of the other.
Problem 8.2
His null hypothesis is that there is no significant association between particular hab-
itats and species. The statistical test to use is the χ2 test for association. He is looking
at frequencies in different categories, and looking for an association between two
types of category (species and habitat).
Problem 8.3
His null hypothesis is that there is no significant difference between the insulin
levels of the three races. The statistical test to use is one‐way ANOVA. He has taken
measurements and is looking for differences between groups; there are more than
two groups; the measurements are not matched; and he is investigating just one
factor (race).
Problem 8.4
Her null hypothesis is that there is no significant difference between the observed numbers of the groups of snails and the expected 9:3:3:1 ratio. The statistical test to use is the χ2 test for differences. She is dealing with frequencies in different categories, and there are expected frequencies (9:3:3:1).
Problem 8.5
His null hypothesis is that there is no significant difference in pesticide levels between
the birds at different times of year. The statistical test to use is repeated measures
ANOVA. He has taken measurements and is looking for differences between different
sets of measurements; there are more than two sets of measurements and these are
related, since they are taken on the same birds. Finally pesticide levels are continu-
ous variables which are likely to be normally distributed.
Problem 8.6
Their null hypothesis is that there is no significant difference between blood pres-
sure before and after taking the drug. The statistical test to use is the paired t test.
They are looking at measurements and looking for differences; they will compare
only two groups (before and after), and measurements will be in matched pairs
(before and after).
Problem 8.7
The net production of oxygen by plants via photosynthesis results in them grow-
ing. Therefore if we can estimate how much a pot plant grows, we can estimate how
much oxygen it produces. Let’s suppose it grows at the (fast) rate of 1 g dry mass per
day (so after a year it would have a wet weight of over a kilogram).
Now oxygen is produced by the following reaction:

6CO2 + 6H2O → C6H12O6 + 6O2

But 1 mol of glucose weighs (12 × 6) + 12 + (16 × 6) = 180 g, so the number of moles of glucose produced per day = 1/180 = 5.556 × 10⁻³. For every 1 mol of glucose produced, 6 mol of O2 is also produced. Therefore for every 1 g of dry mass produced by the plant, the number of moles of oxygen produced = 6/180 = 3.333 × 10⁻². Since 1 mol of oxygen takes up 24 l, this makes up a volume of 3.333 × 10⁻² × 24 = 0.80 l = 0.8 × 10⁻³ m³ = 8 × 10⁻⁴ m³. How does this compare with the amount of oxygen in the room? Well, let's imagine a room of 5 m × 4 m × 2.5 m high, containing 20% oxygen. The volume of oxygen = 5 × 4 × 2.5 × 0.2 = 10 m³. This is over 10 000 times
greater. The tiny contribution of the plant will be far too small to make a difference.
There is no point in doing the experiment.
Problem 8.8
The number of replicates required ≈ 9 × (7.9/5)² ≈ 22. MINITAB also gives a figure of 22.
Problem 8.9
Number ≈ 16 × (0.36/0.25)² ≈ 33. MINITAB gives a figure of 34.
Problem 8.10
A doubling of the risk means an increase of 0.035. Therefore N ≈ (9 × 0.035 × 0.965)/0.035² + 1 ≈ 249.
Problem 8.11
The first thing to do is to arrange for replication in your experiment. Each treatment
should be given four plots. Next you must decide how to arrange the treatments
around the plots. You could randomise totally, arranging treatments randomly in
each of the 16 plots. However, in this case one treatment might tend to be restricted
to one end of the site. A better solution is to split the site into four 2 m × 2 m blocks and randomise each of the four treatments within each block (see diagram below).
0 3.5 7 14 3.5 14 0 7
7 14 3.5 0 0 7 3.5 14
Next you must calculate how much fertiliser to apply to each plot. A litre of 1 M ammonium nitrate will contain 1 mol of the substance. The formula of ammonium nitrate is NH4NO3, so this will contain 2 mol of nitrogen, a mass of 28 g (the relative atomic mass of nitrogen is 14). Therefore the mass of nitrogen in 1 l of 20 × 10⁻³ M ammonium nitrate fertiliser is given by 28 × 20 × 10⁻³ = 0.56 g.
Chapter 10
Problem 10.1
(a) m2
(b) m s-1 (though the number will obviously be very low!)
(c) m-3 (number per unit volume)
(d) no units (it’s one concentration divided by another)
Problem 10.2
(a) 192 MN or 0.192 GN
(b) 102 μg or 0.102 mg
(c) 0.12 ms (120 μs would imply that you had measured to three significant figures)
(d) 213 mm or 0.213 m
Problem 10.3
(a) 4.61 × 10⁻⁵ J
(b) 4.61 × 10⁸ s
Problem 10.4
(a) 3.81 × 10⁹ Pa
(b) 4.53 × 10⁻³ W
Problem 10.5
(a) 250 × 10³ kg = 2.50 × 10⁵ kg
(b) 0.3 × 10⁵ Pa = 3 × 10⁴ Pa
(c) 24 × 10⁻¹⁰ m = 2.4 × 10⁻⁹ m
Problem 10.6
In each case use a similar degree of precision as the original measurements.
Problem 10.7
Problem 10.8
(a) 1.3 mmol
(b) 365 MJ or 0.365 GJ
(c) 0.24 μm (not 240 nm, because this implies knowledge to three significant figures)
Problem 10.9
The concentration is the number of cells divided by the volume in which they were found. The dimensions of the box are 1 × 10⁻³ m by 1 × 10⁻³ m by 1 × 10⁻⁴ m, so its volume is 1 × 10^(−3−3−4) = 1 × 10⁻¹⁰ m³. The concentration of blood cells is therefore 652/(1 × 10⁻¹⁰) = 6.52 × 10¹² m⁻³.
Problem 10.10
The volume of water which had fallen is the depth of water multiplied by the area over which it fell: 0.6 × 2.54 × 10⁻² m = 1.5 × 10⁻² m of rain over 2.6 × 10⁴ m², giving a volume of about 4 × 10² m³.
Problem 10.11
The concentration is the number of moles of glucose per litre.

Number of moles = Mass in grams/Molecular mass
                = 25/[(6 × 12) + (12 × 1) + (6 × 16)]
                = 25/180 = 0.139

The concentration is therefore 0.139/2 = 6.9 × 10⁻² M.
Problem 10.12
The first thing to work out is the volume of CO2 that was produced.
And we know that at room temperature and pressure 1 mol of gas takes up
24 l, so
The mass of CO2 produced equals the number of moles multiplied by the mass of
each mole of CO2. Since the mass of 1 mol of CO2 = 12 + (2 × 16) = 44 g, we have
Problem 10.13
The energy produced by the reaction was converted to heat, and heat energy =
mass * specific heat * temperature rise. First, we need to work out the mass of water.
Fortunately, this is easy as 1 l of water weighs 1 kg, so 0.53 l has a mass of 0.53 kg. From Table 10.5 we can see that water has a specific heat of 4.2 × 10³ J K⁻¹ kg⁻¹, therefore the energy produced = 0.53 × (4.2 × 10³) × 2.4 = 5.3 × 10³ J.
Problem 10.14
The first thing to do is work out the number of moles of X you will use:
Number of moles = Volume (in litres) × Concentration (in moles per litre)
                = (8 × 80 × 10⁻³) × (3 × 10⁻³)
                = 1.92 × 10⁻³
This corresponds to a mass of 1.92 × 10⁻³ × 258 = 0.50 g, costing about 0.50 × £56 ≈ £28. Since this is far less than your £2000 budget you will easily be able to afford it.
Problem 10.15
The first thing to do is to work out the volume of methane produced by bogs per year. The rate of production is 21 ml m⁻² per day, which over a year is 21 × 10⁻³ × 365 = 7.67 l m⁻², and the area of bog is 3.4 × 10⁶ km² = 3.4 × 10¹² m². Therefore the volume produced per year is 7.67 × (3.4 × 10¹²) = 2.606 × 10¹³ l. Next you need to work out how many moles this is equal to and hence the mass of methane produced per year. Since 1 mol of gas takes up 24 litres, we have

Number of moles = Volume (in litres)/24
                = (2.606 × 10¹³)/24
                = 1.086 × 10¹²
The mass of methane produced per year is therefore 1.086 × 10¹² × 16 = 1.737 × 10¹³ g = 1.737 × 10¹⁰ kg. However, since methane is said to be three times as effective a greenhouse gas as CO2, this is equivalent to 1.737 × 10¹⁰ × 3 = 5.2 × 10¹⁰ kg of CO2.
How does this compare with the amount of CO2 produced by burning fossil fuels?
This equals 25 Gt. We need to convert to kg: 25 Gt = 25 × 10⁹ t = 2.5 × 10¹³ kg.
This is much more. The ratio of the effect of fossil fuel to the effect of bog methane
production is
(2.5 × 10¹³)/(5.2 × 10¹⁰) ≈ 500
Therefore bog methane will have a negligible effect compared with our use of fossil
fuels.
Problem 10.16
(a) 1.65
(b) 2.65
(c) -3.35
(d) 6
(e) 0
Problem 10.17
(a) 25.1
(b) 251
(c) 3.98 : 10-4
(d) 10⁴ = 10 000
(e) 1
Problem 10.18
(a) In 3 × 10⁻⁴ M HCl the concentration of H+ is [H+] = 3 × 10⁻⁴ M. Therefore pH = 3.5.
(b) In 4 × 10⁻⁶ M H2SO4 the concentration of H+ is [H+] = 8 × 10⁻⁶ M. Therefore pH = 5.1.
Problem 10.19
Concentration of H+ ions at pH 2.1 = 10^(−2.1) = 7.94 × 10⁻³ M. But each molecule of H2SO4 provides two hydrogen ions, so the concentration of H2SO4 = (7.94 × 10⁻³)/2 = 3.97 × 10⁻³ M. The molecular mass of H2SO4 is 2 + 32 + (4 × 16) = 98, so the mass in 160 ml is 3.97 × 10⁻³ × 0.16 × 98 = 6.2 × 10⁻² g.
Problem 10.20
(a) 3.40
(b) −3.73
(c) 0
Problem 10.21
(a) 20.1
(b) 0.050
(c) 1
Statistical tables
Critical values of t at the 5%, 1% and 0.1% significance levels. Reject the null
hypothesis if the absolute value of t is greater than or equal to the tabulated value
at the chosen significance level, for the calculated number of degrees of freedom.
5% 1% 0.1%
Critical values of the correlation coefficient r at the 5%, 1% and 0.1% significance
levels. Reject the null hypothesis if your absolute value of r is greater than or equal
to the tabulated value at the chosen significance level, for the calculated number of
degrees of freedom.
5% 1% 0.1%
Critical values of χ2 at the 5%, 1% and 0.1% significance levels. Reject the null hypothesis if your value of χ2 is greater than or equal to the tabulated value at the chosen significance level, for the calculated number of degrees of freedom.
[Critical values of χ2, tabulated by degrees of freedom at the 5%, 1% and 0.1% levels, are not reproduced in this extract.]
Critical values of T at the 5%, 1% and 0.1% significance levels. Reject the null hypothesis if your value of T is less than or equal to the tabulated value at the chosen significance level, for your number of observations n.

n 5% 1% 0.1%
6 0
7 2
8 3 0
9 5 1
10 8 3
11 10 5 0
12 13 7 1
13 17 9 2
14 21 12 4
15 25 15 6
16 29 19 8
17 34 23 11
18 40 27 14
19 46 32 18
20 52 37 21
21 58 42 25
22 65 48 30
23 73 54 35
24 81 61 40
25 89 68 45
26 98 75 51
27 107 83 57
28 116 91 64
29 126 100 71
30 137 109 78
31 147 118 86
32 159 128 94
33 170 138 102
34 182 148 111
35 195 159 120
Critical values of U at the 5% significance level. Reject the null hypothesis if your value of U is less than or equal to the tabulated value, for the sizes of the two samples, n1 and n2.
n2 \ n1 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
1 – – – – – – – – – – – – – – – – – – – –
2 – – – – – – – 0 0 0 0 1 1 1 1 1 2 2 2 2
3 – – – 0 0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8
4 – – 0 0 1 2 3 4 4 5 6 7 8 9 10 11 11 12 13 14
5 – – 0 1 2 3 5 6 7 8 9 11 12 13 14 15 17 18 19 20
6 – – 1 2 3 5 6 8 10 11 13 14 16 17 19 21 22 24 25 27
7 – – 1 3 5 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34
8 – 0 2 4 6 8 10 13 15 17 19 22 24 26 29 31 34 36 38 41
9 – 0 2 4 7 10 12 15 17 20 23 26 28 31 34 37 39 42 45 48
10 – 0 3 5 8 11 14 17 20 23 26 29 33 36 39 42 45 48 52 55
11 – 0 3 6 9 13 16 19 23 26 30 33 37 40 44 47 51 55 58 62
12 – 1 4 7 11 14 18 22 26 29 33 37 41 45 49 53 57 61 65 69
13 – 1 4 8 12 16 20 24 28 33 37 41 45 50 54 59 63 67 72 76
14 – 1 5 9 13 17 22 26 31 36 40 45 50 55 59 64 67 74 78 83
15 – 1 5 10 14 19 24 29 34 39 44 49 54 59 64 70 75 80 85 90
16 – 1 6 11 15 21 26 31 37 42 47 53 59 64 70 75 81 86 92 98
17 – 2 6 11 17 22 28 34 39 45 51 57 63 67 75 81 87 93 99 105
18 – 2 7 12 18 24 30 36 42 48 55 61 67 74 80 86 93 99 106 112
19 – 2 7 13 19 25 32 38 45 52 58 65 72 78 85 92 99 106 113 119
20 – 2 8 14 20 27 34 41 48 55 62 69 76 83 90 98 105 112 119 127
Critical values of χ2 at the 5%, 1% and 0.1% significance levels. Reject the null hypothesis if your value of χ2 is greater than or equal to the tabulated value, for a groups and b blocks.
a b 5% 1% 0.1%
3 2 – – –
3 3 6.000 – –
3 4 6.500 8.000 –
3 5 6.400 8.400 10.000
3 6 7.000 9.000 12.000
3 7 7.143 8.857 12.286
3 8 6.250 9.000 12.250
3 9 6.222 9.556 12.667
3 10 6.200 9.600 12.600
3 11 6.545 9.455 13.273
3 12 6.167 9.500 12.500
3 13 6.000 9.385 12.923
3 14 6.143 9.000 13.286
3 15 6.400 8.933 12.933
4 2 6.000 – –
4 3 7.400 9.000 –
4 4 7.800 9.600 11.100
4 5 7.800 9.960 12.600
4 6 7.600 10.200 12.800
4 7 7.800 10.371 13.800
4 8 7.650 10.350 13.800
4 9 7.800 10.867 14.467
4 10 7.800 10.800 14.640
4 11 7.909 11.073 14.891
4 12 7.900 11.100 15.000
4 13 7.985 11.123 15.277
4 14 7.886 11.143 15.257
4 15 8.040 11.240 15.400
5 2 7.600 8.000 –
5 3 8.533 10.133 11.467
5 4 8.800 11.200 13.200
5 5 8.960 11.680 14.400
5 6 9.067 11.867 15.200
5 7 9.143 12.114 15.657
5 8 9.300 12.300 16.000
5 9 9.244 12.444 16.356
5 10 9.280 12.480 16.480
6 2 9.143 9.714 –
6 3 9.857 11.762 13.286
6 4 10.286 12.714 15.286
6 5 10.486 13.229 16.429
6 6 10.571 13.619 17.048
6 7 10.674 13.857 17.612
6 8 10.714 14.000 18.000
6 9 10.778 14.143 18.270
6 10 10.800 14.299 18.514
Critical values of the Spearman rank correlation coefficient rs at the 5%, 1% and 0.1% significance levels. Reject the null hypothesis if your absolute value of rs is greater than or equal to the tabulated value at the chosen significance level, for your number of pairs of observations n.

n 5% 1% 0.1%
1 – – –
2 – – –
3 – – –
4 – – –
5 1.000 – –
6 0.886 1.000 –
7 0.786 0.929 1.000
8 0.738 0.881 0.976
9 0.700 0.833 0.933
10 0.648 0.794 0.903
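If SciPy is available, scipy.stats.spearmanr reports rs together with a p-value, which makes the table unnecessary in practice; a minimal sketch with made-up data:

```python
# Spearman rank correlation with an exact p-value for small samples
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6]      # illustrative ranks only, not data from the book
y = [2, 1, 4, 3, 6, 5]
rs, p = spearmanr(x, y)
print(rs, p)                 # rs comes out around 0.83 for these values
```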
Index

[The index is not reproduced in this extract.]