Science and Law Statistics Primer
The use of statistics in legal proceedings
A PRIMER FOR COURTS
2 THE USE OF STATISTICS IN LEGAL PROCEEDINGS: A PRIMER FOR COURTS
Contents
Summary, introduction and scope
4. The role of expert witnesses and what should be expected from them
Appendices
Appendix 1. The use of probability
Appendix 2. Evaluation of trace evidence
Appendix 3. Evaluation of impression evidence
Appendix 4. Statistical significance
Appendix 5. Causation and relative risk
References
Each primer presents an easily understood, accurate position on the scientific topic in
question, and considers the limitations of the science and the challenges associated
with its application. The way scientific evidence is used can vary between jurisdictions,
but the underpinning science and methodologies remain consistent. For this reason
we trust these primers will prove helpful in many jurisdictions throughout the world
and assist the judiciary in their understanding of scientific topics. The primers are not
intended to replace expert scientific evidence; they are intended to help understand
it and assess it, by providing a basic, and so far as possible uncontroversial, statement
of the underlying science.
The production of this primer on the use of statistics in legal proceedings has been
led by Professor Niamh Nic Daéid FRSE. We are most grateful to her, to the Executive
Director of the Royal Society, Dr Julie Maxton CBE, the Chief Executive of the Royal
Society of Edinburgh, Dr Rebekah Widdowfield, and the members of the Primers
Steering Group, the Editorial Board and the Writing Group. Please see the back page
for a full list of acknowledgements.
The aim of this primer is to provide assistance to the judiciary and legal professionals
in understanding the principles of evaluating evidence (that has a statistical basis)
presented in the courts. The primer is presented in two parts. The first part provides
a general introduction to the use of statistical and probabilistic tools within legal
processes with some examples presented, including some relating to evidence types
commonly presented to the courts.
Appendices 4 and 5 relate to specific statistical methods and to their use in assessing
statistical significance and relative risk. These areas generally have more relevance in
civil proceedings.
This short guide cannot equip the judiciary and legal professionals with all the
skills required, but it should help to signpost where problems
may arise and where external expertise may be needed.
• Prediction: eg Given a set of characteristics, what is the chance that the accused
will reoffend²?
All these situations are characterised by uncertainty, and probability theory provides
the tools and language for handling and communicating uncertainty.
Statistical science has developed a wide range of powerful techniques for quantifying
the impact of some sources of uncertainty, eg calculating margins of error from a survey,
measuring the support for a proposition (also called a hypothesis) from observed data
or assessing the probability of a future event. Other sources of uncertainty are not
so easily quantified but can still be informally assessed and communicated, eg those
arising from the reliability of survey respondents, the quality of scientific studies and
the relevance of available and good quality datasets to the facts of a legal case.
Unavoidable uncertainty about the future is often termed chance, also known as
aleatory uncertainty, and the assignment of probabilities to future events is familiar.
Legal cases generally deal with uncertainty in the sense of lack of knowledge, also
known as epistemic uncertainty. Fortunately, the theory of probability can still be applied
in this context. Uncertainty of measurement can also arise and this, in general, can be
characterised for objective measurements (eg how much of a controlled substance may
be in an analysed sample) but is more challenging for more subjective measurements
(eg in the examination of toolmarks).
The way in which statistical science may be used in a legal context is illustrated in
Figure 1.
FIGURE 1
The process by which statistical science may be used in legal proceedings and in
which relevant past data are used to draw conclusions about the facts of a current case.
Estimates are neither wholly right nor wholly wrong, conclusions are not mechanistic
and sometimes the only database available is the experience of the professional. In
such situations transparency is particularly necessary and the experience needs to be
documented with emphasis on the relevance to the case in question. The professional
judgement of the appropriate experts (expert knowledge) is inevitably involved in each
stage of the process outlined in Figure 1. Probability is a conceptual device that helps
us think and reason logically when faced with uncertainty about the occurrence of a
questioned event in the past, present or the future.
FIGURE 2
Expected frequency tree when repeating a double-flip of a coin 100 times. We would
expect the first flip to be heads in 50 of these experiments, and both flips to be heads
in 25.
100 repeats
    First flip heads: 50
        Second flip heads: 25
        Second flip tails: 25
    First flip tails: 50
        Second flip heads: 25
        Second flip tails: 25
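The expected frequencies in Figure 2 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming a fair coin and independent flips; the function name `expected_frequency_tree` is hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def expected_frequency_tree(repeats, p_heads=Fraction(1, 2)):
    """Expected count for each (first flip, second flip) branch.

    Assumes the two flips are independent and the coin is fair by default.
    """
    outcomes = (("Heads", p_heads), ("Tails", 1 - p_heads))
    tree = {}
    for first, p_first in outcomes:
        for second, p_second in outcomes:
            tree[(first, second)] = repeats * p_first * p_second
    return tree


tree = expected_frequency_tree(100)

# Experiments in which the first flip is heads (either second flip).
first_heads = tree[("Heads", "Heads")] + tree[("Heads", "Tails")]

for branch, count in tree.items():
    print(branch, count)
print("First flip heads:", first_heads)
```

Exact fractions are used rather than floating-point numbers so that the counts match the figure without rounding.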
It is a common misapprehension that probabilities can only be used for future events
with some randomness. While it is true that an event has either happened or not,
many statisticians will feel that it is reasonable to assign probabilities to our personal
uncertainty about unknown facts, as the following example shows.
Suppose I have a coin and I ask you for your probability that it will come up heads.
You answer “50:50”, or similar (50% or ½). Then I flip it, cover up the result before either
of us sees it and again ask for your probability that it is heads. You may, after a pause,
say “50:50”. Then I take a quick look at the coin, without showing you, and repeat the
question. Again, if you are like most people, you eventually say “50:50”. This simple
exercise reveals a major distinction between two types of uncertainty: what is known
as aleatory uncertainty before I flip the coin – the ‘chance’ of an unpredictable event –
and epistemic uncertainty after I flip the coin – an expression of our personal ignorance
about an event that is fixed but unknown. In forensic science we are almost always
concerned with epistemic uncertainty about the facts of a past situation.
2. Probability and the principles of evaluating scientific evidence
Specifically, in a legal context, probability can help fact-finders assess the impact of
evidence on the truth or otherwise of a particular proposition. It has a well-documented
history in academic legal literature⁶. Each item of evidence can be used to support one
or more proposition(s). Evidence may, on occasion, point directly to incriminating or
exculpating a suspect of a particular crime, but will more likely have probative value in
discriminating between competing propositions for either the source of some material
found in relation to a scene of the crime or an alleged activity connected to a crime.
But that is not to say that personal probability is conjured up on a whim or a preference,
or to suggest it is not based on acquired data. Where relevant data are available, it is
expected that they will be taken into account in assigning a probability. For example,
suppose that reliable information is available on the proportion of individuals in a target
population that possess a particular observable feature, such as skin, hair or eye colour.
Then, our assessment that a person drawn randomly from that population will show a
particular feature of interest ought to be informed by the available knowledge about the
composition of the population. When new testing systems are used, the frequencies of
given traits in a population will not be widely known.
• What is the probability that I will miss the bus this morning if I have one more
cup of coffee?
• What is the probability that I will be caught if I break into this property?
Some people may attempt to answer the above questions by thinking of past
experiences in similar situations. For example, they may consider how many mornings
in the past they have missed the bus when having one more cup of coffee, though this
may give rise to many other questions, such as the extent to which today’s morning is
comparable to previous experiences.
Statisticians and scientists refer to data on the proportion of times an event has occurred
as a relative frequency. However, and most importantly, there are some situations
for which relative frequencies cannot be meaningfully conceived. In legal cases, for
example, the fact-finder must deal with singular, non-repeatable, one-off events for
which the notion of relative frequency may not be helpful. This does not preclude
the possibility that useful frequency data may be available (eg scientific data on the
occurrence of genetic features or the prevalence of a disease) to help decide aspects
of the case (examples of this type of use are in Appendices 4 and 5). Where such data
are available and are relevant, they ought to be used in probability assignment as one
source of information among others.
• the ability of the expert to compile and store systematically those experiences
in their memory;
• the expert’s ability to avoid and mitigate bias while inputting expert
knowledge; and
In general, the more that experts base their assignments of probability on relevant,
shared and robust data, the greater is the trustworthiness of those assignments. The
more they base their assignments on their recalled experience and knowledge and
on their intuition, the more those assignments will be open to justified challenge.
The Court of Appeal decided in Regina v Abadom⁷ that an expert is entitled to draw
upon material produced by others in the field in which his or her expertise lies, and
indeed where any reliable data are available that bear upon the question the expert
is addressing. It is part of the duty of the expert to take this into account. So, a crucial
judgement concerns the reliability and relevance of available data. The dataset
should have high intrinsic quality, reliability and high relevance to the question being
addressed by the use of the data. A national dataset of informally collected examples
of glass or footwear from people’s homes may be of limited relevance to local criminal
investigations but of value in informing background abundance of the items in question,
while local datasets collected to address specific aspects of criminal incidents or of
suspect populations may have high relevance. Ideally, there will be a sufficiently
large random sample from a population that matches agreed features of the
case, but this is a high bar that is rarely achieved, so expert judgement and
full transparency are required to deal with such limitations.
LRs are typically attached to DNA evidence in which a ‘match’ of some degree is found
between the suspect’s DNA profile and the DNA profile derived from a trace found at
the scene of a crime. The two competing hypotheses are that the DNA profile in the
recovered trace material originates from the suspect or it originates from someone else,
so that we can express the LR as:
\[
\mathrm{LR} = \frac{\text{probability of the DNA profile ‘match’, if the suspect left the trace}}{\text{probability of the DNA profile ‘match’, if the trace was left by someone else}}
\]
The ‘DNA evidence’ is the suspect’s DNA together with the DNA trace from the crime
scene. For the specific situation when the trace contains plenty of DNA and it is deemed
to have come from one person, the LR above can be written, after some mathematical
operations and given some assumptions, as:
\[
\mathrm{LR} = \frac{1}{\text{random match probability}}
\]
The random match probability is the probability of finding an evidence match if selecting
at random from within a particular population. For example, in the context of a DNA
sample⁸, it is the probability of observing a DNA profile of an unknown person that is the
same as the DNA profile from a crime scene stain (and assuming a particular population
genetic model). Typical LRs for DNA evidence are in the millions or billions, although
the exact values may be contested, such as when there are complications due to the
traces containing a mix of DNA from multiple people. Further information is provided in
Forensic DNA analysis: a primer for the courts⁹.
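The relationship between the random match probability and the LR can be illustrated with a short calculation. This is a sketch only: it assumes the simplified single-source situation described above, in which the probability of a ‘match’ if the suspect left the trace is taken to be 1. The value of one in a billion is a hypothetical figure chosen for illustration, not a value from any real case, and the function name is likewise hypothetical.

```python
from fractions import Fraction


def likelihood_ratio_from_rmp(random_match_probability):
    """LR = 1 / random match probability.

    Assumes the numerator of the LR (probability of a 'match' if the
    suspect left the trace) equals 1, as in the simplified case.
    """
    rmp = Fraction(random_match_probability)
    if not 0 < rmp <= 1:
        raise ValueError("random match probability must be in (0, 1]")
    return 1 / rmp


# Hypothetical random match probability of one in a billion.
hypothetical_rmp = Fraction(1, 10**9)
lr = likelihood_ratio_from_rmp(hypothetical_rmp)
print(f"LR = {int(lr):,}")
```

The rarer the profile in the relevant population, the smaller the random match probability and the larger the LR.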
Table 1 shows an example of a verbal scale used for communicating LRs (a similar
example can be found in Willis¹⁰). In most cases (including DNA at an activity level where
the activity that caused the DNA to be deposited is the issue) the LR will be based on
a semi-quantitative (ie an order of magnitude) assessment and a verbal equivalent may
be presented to the court. The reason that verbal expressions are defined numerically
is to provide a consistency in their use rather than to translate available numbers into a
common language. In those cases where a quantitative assignment is possible there is
a strong argument for presenting the LR value to the court without a verbal qualifier but
also an argument for avoiding the risk that lay persons (eg juries) may misunderstand
conclusions stated in numbers as absolute measurements. An LR equal to 1 supports
neither proposition preferentially.
The LR is not a specific measurement; rather, it expresses the weight of the
scientific findings under two competing scenarios (prosecution and defence). There will
almost always be some natural variation in LRs depending on different assumptions
and the quality and relevance of the datasets and what is known about the transfer,
persistence, recovery and background abundance of the particular type of evidence
under scrutiny. The value of the LR on the scale shown in Table 1 is preferably assigned
based on robust data extracted from a relevant dataset.
With good quality and relevant data it may be possible to generate a numerical
assessment using the LR relating to evidential support. However, often the available
data are either poor or non-existent (particularly true for knowledge relating to the
transfer of material and its persistence once transferred). In such cases, the expert
forms a personal opinion based on domain knowledge of processes and on personal
experience that can be disclosed and audited. In these situations, verbal expressions
or orders of magnitude of the LR may be helpful and the basis of any statement of
expert opinion formed this way must always be made clear.
TABLE 1
Example of verbal interpretations of likelihood ratios (LRs) – in this case for source-
level propositions.
For example, suppose a hypothetical screening test for doping in sports is claimed to
be ‘95% accurate’, meaning that if an athlete is doping there is a 95% chance (probability
0.95; sensitivity) of obtaining a positive test result, and if the athlete is a non-doper
there is a 95% chance (probability 0.95; specificity) of obtaining a negative test result.
Such general performance characteristics have been determined through tests under
controlled conditions, ie by applying the test in so-called ground truth cases, where it is
known whether a tested person is doping or not.
Assuming that the prior odds that an athlete is taking drugs, before being subjected to a
screening test, are 1 to 50 (1:50), then if an athlete tests positive, what is the probability
that they are truly doping?
The LR (explained in Section 2.4) is the probability of a positive test given the
proposition that the athlete is doping (95%) divided by the probability of a positive test
given the proposition that the athlete is not doping (5%, ie 1 - specificity). This ratio is 19
(LR = 0.95/0.05 = 19).
Bayes’ theorem tells us that the posterior odds of the athlete having taken drugs can be
computed by multiplying the prior odds of that proposition by the LR provided by the
positive test. In this form, we have to work with odds not probability. Odds are related
mathematically to probability and a very simple conversion can be used to give the
value for probability where the odds of m:n correspond to the probability m/(m + n).
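The conversion between odds and probability can be written down directly. The sketch below uses hypothetical helper names (`odds_to_probability`, `probability_to_odds`) and exact fractions to avoid rounding.

```python
from fractions import Fraction


def odds_to_probability(m, n):
    """Odds of m:n in favour correspond to the probability m / (m + n)."""
    return Fraction(m, m + n)


def probability_to_odds(p):
    """Inverse conversion: probability p corresponds to odds p : (1 - p).

    The odds are returned as the single ratio p / (1 - p).
    """
    p = Fraction(p)
    if not 0 <= p < 1:
        raise ValueError("probability must be in [0, 1)")
    return p / (1 - p)


print(odds_to_probability(1, 1))    # even odds, '50:50'
print(odds_to_probability(1, 50))   # odds of 1:50
print(probability_to_odds(Fraction(1, 51)))
```

Note that odds of 1:50 correspond to a probability of 1/51, not 1/50; the two are close only when the odds are small.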
• the prior odds for the proposition ‘athlete is doping’ versus ‘athlete is not doping’
are 1:50, which correspond to a probability of 1/(1 + 50) or a prior probability of
approximately 0.02 (the actual value is 0.0196, which is equivalent to 1.96%);
• therefore, by Bayes’ theorem, the posterior odds that the athlete is doping are (1:50)
× 19 = 19:50, giving a posterior probability of doping of 19/(19 + 50) ≈ 0.28 or 28%.
So, even though drug testing could be claimed to be ‘95% accurate’ (based on the
sensitivity and specificity metric) this does not mean that, in the event of a positive result,
there is a 95% chance that the athlete is doping. In this example, the probability that the
athlete is doping, given a positive test result, is approximately 28%. The posterior odds
that an athlete is doping crucially depend on the prior odds for the proposition ‘athlete is
doping’ versus ‘athlete is not doping’ (in the example this was 1:50) prior to considering
the result of the screening test (the LR result). This means that if conclusions are drawn
from test results in isolation there could be misinterpretations of what is meant by the
accuracy of the test. This could lead to athletes being incorrectly
accused of doping solely because they failed a drug test.
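The doping calculation can be checked in two ways: by applying Bayes’ theorem in odds form, and by counting expected outcomes in an imagined group of athletes. The sketch below uses the figures from the example (sensitivity and specificity of 95%, prior odds of 1:50). The group size of 5,100 athletes is an assumption made purely so that the counts come out as whole numbers.

```python
from fractions import Fraction

sensitivity = Fraction(95, 100)   # P(positive test | doping)
specificity = Fraction(95, 100)   # P(negative test | not doping)
prior_odds = Fraction(1, 50)      # doping : not doping = 1:50

# Bayes' theorem in odds form: posterior odds = prior odds x LR.
lr = sensitivity / (1 - specificity)
posterior_odds = prior_odds * lr
posterior_probability = posterior_odds / (1 + posterior_odds)

# Cross-check with expected frequencies in a hypothetical group.
athletes = 5100                                     # assumed group size
dopers = athletes * prior_odds / (1 + prior_odds)   # 1 in every 51
non_dopers = athletes - dopers
true_positives = dopers * sensitivity
false_positives = non_dopers * (1 - specificity)
share_doping_among_positives = true_positives / (true_positives + false_positives)

print("LR:", lr)
print("Posterior odds:", posterior_odds)
print("Posterior probability:", round(float(posterior_probability), 2))
print("True positives:", true_positives, "False positives:", false_positives)
```

In the imagined group, 95 of the 100 dopers test positive, but so do 250 of the 5,000 non-dopers, so only 95 of the 345 positive results (about 28%) come from athletes who are doping. Both routes give the same answer.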
In practice, the Court of Appeal has ruled that Bayes’ theorem should not be used by
a jury to combine and weigh evidence¹¹, but LRs assessed by experts are permitted if
they have a sound basis. Many real-world cases involve multiple and related items of
evidence, making probabilistic inference much more complex and intricate than the
illustrative doping example given above.
CASE STUDY 1
An archaeological case
On Saturday 25 August 2012, archaeologists began an excavation for Richard III’s
remains by digging in a car park in Leicester. Within a few hours they found their first
skeleton and the question was whether this was Richard III. Table 2 shows the specific
items of evidence and their likelihood ratios (LRs) regarding the propositions that the
skeleton was that of Richard III. These LRs were, as far as possible, based on sound
statistical evidence, but there was inevitably some uncertainty so that conservative
values were assigned, and verbally interpreted here (because of their uncertainty
and therefore qualitative nature) using the terms in Table 1.
Probability theory permits, given certain assumptions about the evidence, the
multiplication of these LRs to give a final number that represents ‘extremely strong’
evidence to support the proposition that the skeleton was, rather than was not, that of
Richard III. Of course, the final assignation of the skeleton would not be based on the
LR alone but would involve other evidence as well.
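The multiplication of LRs can be sketched as follows. The item names and values below are hypothetical placeholders and are not the values assigned in the Richard III study. The calculation is valid only under the assumption that the items of evidence are conditionally independent given each proposition, which is one of the ‘certain assumptions’ referred to above.

```python
import math


def combine_likelihood_ratios(lrs):
    """Multiply LRs for separate items of evidence.

    Only valid if the items are conditionally independent given each
    of the competing propositions.
    """
    lrs = list(lrs)
    if any(lr <= 0 for lr in lrs):
        raise ValueError("likelihood ratios must be positive")
    return math.prod(lrs)


# Hypothetical LRs for three items of evidence (illustration only).
hypothetical_lrs = {"item A": 2, "item B": 5, "item C": 100}

combined_lr = combine_likelihood_ratios(hypothetical_lrs.values())

# On a log scale, multiplying LRs becomes adding weights of evidence.
log10_weight = sum(math.log10(lr) for lr in hypothetical_lrs.values())

print("Combined LR:", combined_lr)
print("log10 of combined LR:", round(log10_weight, 3))
```

Several items that are individually modest can combine into strong support, which is why the independence assumption needs to be stated and examined rather than taken for granted.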
TABLE 2
In statistical science, however, the uncertainty regarding facts is carried through any
chain of reasoning and influences trust in the final conclusions. The expert can properly
ask what the level of probability or uncertainty is. However, courts are perfectly used to
bringing into the calculation of a primary conclusion uncertain disputed facts along the
way and without necessarily resolving each uncertainty. Sometimes evidence going to
disputed contributory facts, when combined with other evidence going to a different
disputed contributory fact, may enable a conclusion to be reached safely on the
principal fact in issue. Likewise, juries are commonly directed that they do not need to
resolve every dispute in the evidence, so long as they are satisfied beyond reasonable
doubt of the guilt of the accused. Some disputed facts can safely be left unresolved and
scientific findings will generally have a degree of uncertainty rather than a definite value.
The Forensic Science Regulator has the responsibility for reporting standards used
by forensic experts and reports must be structured to provide an understanding of
the probative value of the evidence. Crucially, legal professionals should be able
to recognise complex and non-standard situations in which an expert in probability,
forensic inference or statistical reasoning may need to be consulted.
[S] Knowledge derived from robust systematic studies, ideally published, where the
relevant features have been measured and studied statistically.
[E] Knowledge derived from personal experience, ie the expert’s training and
professional experience in their forensic specialism.
Published scientific data are used wherever possible as a basis for these assessments.
If relevant published data are not available, then data from unpublished sources or
ad hoc experiments may be used as long as they have been peer reviewed and
documented on file. Knowledge such as personal experience in similar cases and
peer consultations may be used provided that the practitioner can justify their use
and demonstrate their basis¹⁵. In addition to the nature of the knowledge invoked, it
is critically important that the expert discloses transparently the nature, provenance,
extent and relevancy of the knowledge used to inform their LR. Transparency is
paramount to ensure scrutiny and ultimately to allow courts to assess credibility, ie
how LRs were derived and their robustness. Because LRs may be based on different
experts’ knowledge, there may be a legitimate and understandable difference in opinion
between two experts.
The lack of a common language among experts, lawyers, judges and lay people
about what is meant by probability and statistics and how these concepts are used
by experts to provide answers to specific case-related questions remains challenging.
This can lead to misunderstandings and confusion.
There are also gaps in data and knowledge related to many types of evidence, including
how materials transfer between people and between people and surfaces. Similarly,
data on the persistence of materials once transferred and on background abundance
of materials are sparse. This requires a greater reliance on the expert’s knowledge and
understanding of evidence and applying this to specific case circumstances.
In some circumstances where data are well known and well defined (eg repetitive
measurements made by a scientific instrument) significance testing (Appendix 4) can
be undertaken to provide a fundamental tool in uncovering relevant information about
the data and what inferences can be made. An example may be the measurement
of uncertainty or error relating to the determination of alcohol or drugs in a sample.
Concepts such as causation and relative risk (Appendix 5) can be explored with the help
of statistical methods; however, the significance of such associations remains primarily a
matter of expert judgement.
This primer forms only a basic introduction to the evaluation of evidence based on
statistical and probabilistic reasoning.
In criminal cases, the fact-finder must believe beyond reasonable doubt and be sure
that the event occurred. There have been attempts to define what ‘beyond reasonable
doubt’ and to ‘be sure’ mean in numerical, probabilistic terms but there is no agreement
on, or even a strong drive to adopt, a particular number.
In contrast, the fact-finder in civil cases relies on the concept of balance of probabilities.
One common understanding is that the fact-finder must form a judgement, based on the
strength of the evidence, as to whether their belief in the plaintiff’s contention is greater
than or less than 50%. The notion of probability may also be invoked in court when
expert evidence is being adduced.
To avoid possible confusion, the expert should explain to the court that ‘highly probable’
is a description of their expectation that this particular pattern of bloodstaining on the
defendant’s shoes would have been observed if the defendant had, as alleged, kicked
the victim. But, for balance, it is necessary also to ask: What is the probability that this
pattern of bloodstaining would have been observed if the defendant had not kicked the
victim? The answer to this second question may be, for example, ‘very low’.
These assignments are the expert's probability judgements made after taking into
account the results of any experimentation aligned to the case circumstances, the body
of documented knowledge and data in the specialism, as well as their own experience
in the field. There are no such things as the ‘right’ or ‘correct’ probabilities in this case.
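To show why both questions must be asked, the two probability assignments can be combined into an LR. The numbers below are hypothetical stand-ins for verbal assignments such as ‘highly probable’ and ‘very low’. They are not taken from any case, and two experts could reasonably assign different values.

```python
from fractions import Fraction


def likelihood_ratio(p_findings_if_hp, p_findings_if_hd):
    """LR = P(findings | prosecution proposition) / P(findings | defence proposition)."""
    p_hp = Fraction(p_findings_if_hp)
    p_hd = Fraction(p_findings_if_hd)
    if not (0 <= p_hp <= 1 and 0 < p_hd <= 1):
        raise ValueError("probabilities must lie in [0, 1], with a non-zero denominator")
    return p_hp / p_hd


# Hypothetical assignments by two experts for the same findings.
expert_a = likelihood_ratio(Fraction(9, 10), Fraction(1, 100))
expert_b = likelihood_ratio(Fraction(8, 10), Fraction(1, 20))

print("Expert A LR:", expert_a)
print("Expert B LR:", expert_b)
```

A high probability of the findings under one proposition carries little weight on its own; the LR depends just as much on how probable the same findings are under the alternative.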
Explanations may offer assistance during the investigative proceedings but are
limited insofar as there is no assessment of the probative value of the findings. The
explanations may not be an exhaustive list of possibilities and there is no assessment
of how probable each explanation may be, rendering them generally not useful for
decision-making. However, while exploring alternative explanations (causes) for the
evidence is a perfectly valid procedure before a trial, questions about alternative
explanations may be posed later in court by defence counsel to dilute the force of
the principal conclusions.
It is important for the people who make probability assignments to declare the
conditioning factors that have influenced those assignments – and it is important for the
recipients of expert information to probe the foundations of those declarations. It is also
vital that the fact-finder is made fully aware of those influences. The type and quantity
of conditioning information taken into account will vary depending on, first, the role of
the person assigning the probability within the fact-finding process and, second, the
question being asked.
For some roles and some types of question, the basis for assigning a probability may be
straightforward, but for others assigning a probability may be problematic because of a
lack of knowledge or a lack of relevant data or because it is not within the competence
of the person being asked the question. Case studies 2 and 3 provide two examples in
a criminal context to illustrate the sources of expert probability assignments.
CASE STUDY 2
A DNA case
The fact-finder is presented with expert evidence of matching DNA profiles extracted
from a sample from a defendant and from a bloodstain left at the scene of a crime.
A question for the fact-finder would be: Is the DNA extracted and analysed from
the bloodstain that of the defendant? And the ultimate question would be: Is the
defendant guilty of an offence? It is the fact-finder’s role to answer the two questions.
Whether they do so probabilistically is entirely their choice. It is not the expert’s role
to answer the two questions, however tempting it may be to answer the first question.
What the expert can do is to provide the fact-finder with their expert, justified view on
the probability of observing the DNA evidence, ie DNA profile from the bloodstain and
the defendant’s DNA profile, under two competing propositions. If we look at the first
question, the pair of competing propositions would be:
The expert can help the fact-finder by providing their probability of obtaining the DNA
evidence if the DNA had originated from the defendant and, alternatively, if it had
originated from someone unrelated to the defendant.
There are databases of DNA profiles from samples of people from various
ethnic groupings. These databases can be consulted to assess the rarity of the
(matching) DNA profile. That statistic can then be used as a basis, among other
considerations (such as genetic relationships among individuals), for assigning a
probability of obtaining a match if the bloodstain had originated from someone
unrelated to the defendant. What the expert brings to that assignment is knowledge
and understanding of the impact of relatedness among people, in the form of a
population-genetic model, and of choosing the most relevant database(s) for the case
in question. It is not just a simple question of using a frequency of occurrence as a
probability: a more subtle treatment is required.
CASE STUDY 3
The fact-finder’s questions would include: Did the firearm discharge residue (FDR) on the defendant’s hand
swabs come from the gun that fired the cartridges? And: What is the probability that
the defendant fired the gun, given matching FDR had been found? The expert’s
role is to offer probabilities for observing the particular FDR, given the truth of the
prosecution proposition and defence alternative that flow from the questions facing
the fact-finder. Looking at the issue of whether the defendant fired the gun, the two
competing propositions would be:
• Hp: The defendant fired the gun (at the relevant time).
• Hd: The defendant did not fire the gun (it was some other person).
The difficulty is that the propositions are contested – we do not know
which of the competing propositions is true. Did the defendant or someone else fire
the gun? The principles of inductive logic (where the conclusion may be probable
based upon the evidence presented) dictate that, to assess the probability that an
uncertain proposition is true, the probabilities of the evidence under the competing
propositions need to be considered (see Section 2.5 on doping). So, how does the
expert assign such probabilities?
• The essential distinction between probabilities for evidence (usually the expert’s
observations and analytical results) and probabilities for propositions (ie the facts
in issue).
• Whether probabilities for propositions are being assigned before or after expert
evidence is presented. If before the consideration of expert evidence, these
probabilities are called prior probabilities; if after the presentation of expert evidence,
they are called posterior probabilities (see Section 2.5 on the doping example).
• Who is best placed, in terms of their roles in the legal process, to provide these
different types of probability?
• What is the information that has conditioned (or influenced) the assignment of
probabilities to the propositions, and is it relevant to the task?
Generally, the probabilities for obtaining expert observations, conditioned on the truth
or otherwise of the proposition in question, are in the domain of the experts. The expert
should have sufficient data and the knowledge and understanding of the evidence to
assign defensible, informed probabilities. The expert should be able to convey and
explain these probabilities to the fact-finder to help them deliberate on the truth or
falsehood of the proposition in question. It is the fact-finder who has received other,
non-expert evidence in a case and who is therefore in the best position to take a view
on the truth of the proposition in question. However, this is not a hard-and-fast rule and
there will be some situations in which the expert can provide informed probabilities for
the truth of the proposition.
Another limitation is the lack of relevant data to inform probabilities in some areas of
expertise. In some areas, such as textile fibres and ‘touch’ DNA, there is an extensive
body of research and survey data on which to draw. In other areas, such as toolmarks
or ballistics, there is only limited published research on important considerations
relevant for assessing probative value. In the absence of reliable, informative and
structured data, the expert must rely on their knowledge and understanding of the
evidence type, provided that the basis of such opinion is documented, can be audited
and is disclosed. It is in such areas particularly that evidence of the reliability of the
expert’s opinions would be highly desirable.
In forensic casework, the objective is often to compare known material and questioned
material. Either can originate from the suspect or from the scene. Examples include
glass from a window compared with glass fragments recovered from a suspect or fibres
from a suspect’s jumper compared with recovered fibres from a victim. The material
might be compared at the source level (eg are the fragments from the same window or
not?) or at the activity level (eg did the suspect break the window or not?). A source-level
comparison alone is rarely sufficient to address the questions relevant to the case. Glass
recovered from a suspect that is indistinguishable from the glass of a broken window is of
little value without an assessment of how probable such a finding is if the suspect broke
the window rather than if the suspect had nothing to do with the breaking.
Analysing trace evidence at the source level alone, without reference to the activities
associated with the evidence, can be misleading. It is often the case that the most
relevant questions are related to the activities which may have led to the trace
materials being transferred. In order to obtain results that are helpful to address these
questions and that are not misleading, other factors need to be considered in addition
to the source-level questions. These other factors include the probabilities of transfer,
persistence and recovery of the material in the context of the alleged activities.
Statistical approaches which only assess the similarity and rarity of the materials can
miss factors which affect relevance within the context of the circumstances of a given
case – this can have a major impact on the evidential weight of the findings.
Trace evidence must be viewed in the context of the case. As described in Section
2, at least two competing propositions should be addressed. In addressing these for
a specific case, the expert considers how probable the findings are under each of these
competing propositions. The result is presented in the form of an LR. This highlights
that the results do not have a stand-alone value; rather, their value is dependent on the
proposition being addressed or the questions to be answered.
The issues to be considered to address activity-level propositions are similar for all
trace materials but the factors affecting the issues vary from one material to another.
To assess how probable it is to find matching materials (glass, fibres, etc) if a particular
action took place, data on transfer, persistence and recovery are needed as well as data
on how common the materials under consideration are in the environment. Levels of
background abundance are also needed when considering the findings if the alternative
is true, ie that the activity did not take place or was not carried out by the suspect.
An example is knowing the background abundance of groups of glass fragments which
would be found on clothing in a given population. Sometimes, where insufficient data
are available, expert judgement or personal opinion is used to assign the required
probabilities. This should be made explicit in the report. Ideally, where data are lacking,
ad hoc experiments should be carried out.
Where any statistical assumptions or datasets have been used to evaluate evidence,
these should be clearly explained and justified in the case report. It is important that
checks have been carried out to test whether the statistical assumptions used are
appropriate for the evidence type and the propositions being assessed. One way of
doing this is to test the statistical approach on an existing dataset where the ground
truth is known and to assess the proportion of times that the LR gives a misleading
or incorrect result.
For example, for a source-level comparison this would mean evaluating the proportion
of times that the LR is greater than 1 when the two sets of material are from different
sources and the proportion of times that the LR is less than 1 when the two sets of
material are from the same source. Both proportions should be small in a model that
fits the evidence type for which it is being used.
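The check described above can be sketched in a few lines. This is a minimal illustration, assuming the LRs have already been computed on a validation dataset where the ground truth is known; the function name and the LR values are hypothetical.

```python
def misleading_rates(same_source_lrs, different_source_lrs):
    """Return the two proportions of misleading LRs on ground-truth data.

    - LR < 1 although the two sets of material share a source
    - LR > 1 although the two sets of material come from different sources
    """
    if not same_source_lrs or not different_source_lrs:
        raise ValueError("both lists of LRs must be non-empty")
    misleading_same = sum(lr < 1 for lr in same_source_lrs) / len(same_source_lrs)
    misleading_different = sum(lr > 1 for lr in different_source_lrs) / len(different_source_lrs)
    return misleading_same, misleading_different


# Hypothetical LRs from a validation exercise (not real data).
same_source = [120.0, 35.0, 0.4, 900.0, 15.0]
different_source = [0.01, 0.2, 3.0, 0.05, 0.001, 0.6, 0.08, 0.3]

rate_same, rate_different = misleading_rates(same_source, different_source)
print(f"LR < 1 for same-source pairs: {rate_same:.3f}")
print(f"LR > 1 for different-source pairs: {rate_different:.3f}")
```

In practice the validation set would need to be much larger than this toy example, and representative of the evidence type and propositions in the case.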
Fibres
Fibres are shed from surfaces of various materials such as clothing, carpets
and car seats. They vary greatly in composition and colour. Studies have found that
fibres which are common, such as blue wool, have not been detected on surfaces
in high numbers except in areas where a known source has been in contact. Hence,
fibres can be very useful in reconstructing the activities that occurred during the contact
between textiles. The tendency to shed fibres is governed by many factors, including
the looseness of the weave, the size of the fibres and the age of the garment. This is
easy to visualise when we consider the difference between the shedding of a new
carpet and that of one that has been in place for some time.
Whether fibres transfer or not depends on the shedability and on the type of contact.
Information on both factors is needed to assess the range of fibres likely to be
transferred. A smooth shell suit will not be expected to yield transferred fibres even
if the contact is prolonged while a woollen jumper will give rise to transferred fibres
with limited contact. Little peer-reviewed published literature exists in relation to the
shedability of fibres. These considerations highlight why case context is important
in assigning the probabilities of the findings in competing scenarios. Fibres are best
considered in the totality of the case and it is rarely useful to consider only their
sources or the presence of a single fibre.
Transfers of fibres can be by direct contact (eg from the suspect’s clothes to the
victim’s clothes) or by indirect contact (eg fibres transferred from the victim’s clothes
to the suspect’s clothes via an intermediary object); the latter is called secondary
transfer. It is not possible for the expert to opine on whether the transfer of fibres was
primary or secondary transfer, or whether or not the transfer occurred during the alleged
activity. However, fibre experts are well placed to assess the probability of particular
findings in either scenario.
A large number of fibres indistinguishable from the bedcover would not be expected
to be recovered from the suspect’s socks given the alternative scenario. In this case, the
LR (ie the weight to be assigned to the findings) would involve the relative consideration
of these probabilities and the pair of competing propositions would be:
Given the above assignments, if a large number of matching fibres were obtained,
we would expect a value of the LR above 1, supporting the proposition that the
assault occurred in the bedroom. The actual values for the probabilities will depend on
factors such as the tendency of the socks and the bedcover to shed fibres, the time
between the incident and the seizure and examination of the items, the frequency of
occurrence of fibre types in given situations and the statistical assumptions used to
link these factors together. The number of recovered fibres indistinguishable from the
bedcover is relevant because it is possible that a small number of bedcover fibres
may be present in the living room. Much of this becomes a matter of professional
judgement and experience.
• Hd: The suspect never had any contact with the victim.
With the information above, the expert will report that the findings are unlikely given Hp
but more expected given Hd. The LR will be less than 1, ie supporting Hd.
Glass
When a window is broken a large number of fragments fall back in the direction of the
blow. Thus, a person delivering the blow is expected to have small fragments in their
hair or on the surface of their clothing depending on how close they were to the window
as it was breaking, the height of the window, the type of glass and the activities that
followed [19]. One of the main tests used to examine glass fragments at the source level
consists of measuring the refractive index, which varies both within a pane of glass
and between sources of glass. The refractive indices of the glass fragments from the
window are measured and glass fragments from the suspect are also analysed and put
into matching and non-matching groups of glass if glass of more than one refractive
index is present [20]. However, assessing the closeness of the ‘match’ between glass
recovered from clothing and the window glass does not provide sufficient information
to evaluate propositions concerning whether or not the suspect broke the window.
Even when additional analytical tests are applied, this activity-related question is not
answered. Other information is required relating to, for example, how glass fragments
are transferred and retained following the breaking of glass objects, or how prevalent
To evaluate the glass fragment results, information is needed on how probable the
findings are if the suspect broke the window(s) against how probable the findings
are if the suspect had nothing to do with breaking the window(s). To address the first
probability, two possible ways that the glass fragments can arise must be considered
– either fragments were transferred when the window was broken and non-matching
glass, if present, was already on the clothing or no glass was transferred from the
window and all the glass on the clothing, both matching and non-matching, was
already there. To assess the probability of the findings (ie glass matching the window
found on the clothing), if the suspect had nothing to do with breaking the glass, we
need information on the probability of finding glass on innocent members of the
population. This information is critical, and it is useful to consider a population as close
to the suspect as possible. One well-known dataset considers glass on the clothing of
persons who come to the attention of the police rather than the general population.
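The two routes described above can be combined in a deliberately simplified sketch. It assumes only two inputs: the probability that matching glass is transferred, persists and is recovered if the suspect broke the window, and the probability of finding matching glass on clothing as background. It ignores the number of fragments, non-matching groups and the closeness of the refractive index match, so it is not a casework model. All names and values are hypothetical.

```python
def simplified_glass_lr(p_transfer: float, p_background: float) -> float:
    """Toy LR for 'matching glass was found on the clothing'.

    Numerator (suspect broke the window): the glass was either transferred,
    or it was not transferred and was already present as background.
    Denominator (suspect had nothing to do with it): background only.
    """
    if not 0.0 <= p_transfer <= 1.0:
        raise ValueError("p_transfer must be between 0 and 1")
    if not 0.0 < p_background <= 1.0:
        raise ValueError("p_background must be greater than 0 and at most 1")
    p_findings_if_broke = p_transfer + (1.0 - p_transfer) * p_background
    p_findings_if_uninvolved = p_background
    return p_findings_if_broke / p_findings_if_uninvolved


# Hypothetical values chosen only to show how the two inputs interact.
print(simplified_glass_lr(p_transfer=0.6, p_background=0.02))
```

The sketch makes the dependence on the background figure explicit: the same transfer probability gives a much smaller LR if matching glass is common in the relevant population.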
Different ammunitions give rise to different residues, but a very high number have
similar compositions, providing little discrimination. Even when the ammunition type is
known, it is considered good practice to compare residue from the discharged cartridge
case, barrel of the gun or the bullet hole – the known material(s) – and the residue
recovered from the hands or clothing of the suspect – the questioned material(s). This is
because variation in the proportions of particles can occur. It is common to see reports
from forensic scientists that simply state the number of particles recovered, or that
firearm discharge residue (FDR) was detected. Such statements are often accompanied by
a disclaimer that the findings are ‘consistent with’ the suspect being close to a person
firing the gun or touching a surface with firearm residue on it. Similarly, a negative
finding is often explained away by loss or time delay. The evaluation of the meaning of such
findings in the context of the case is generally left out of the report.
However, as with other trace evidence types, the more relevant question is whether
or not the suspect undertook an activity which could result in the scientific findings. In
the case of FDR, it is more meaningful to address whether the findings are more or less
likely if the person fired or did not fire the gun in the circumstances of the case. These
assessments will go beyond statements of consistency but should be qualified
in terms of probability.
Example
In a given set of circumstances, the expert may indicate that the probability of finding
FDR on a person who fired a gun is high if the person is sampled soon after the gun had
been fired. That expectation will be balanced against the probability of the finding if the
suspect did not fire the gun. The latter probability is low as surveys on members of the
general population show few instances of FDR. Depending on the circumstances, finding
FDR on a person’s hands is expected to provide support for firing of a gun rather than not
firing a gun, ie the LR will be greater than 1.
To inform these probabilities and assign a meaningful LR, information is needed on the
type of weapon and length of time between the alleged firing and sampling. Ideally, tests
should be carried out under the conditions of the known circumstances of the case.
Data regarding the presence of FDR as a background in a given population (of individuals
or objects) are also required to assess how prevalent the material may be considering
activities other than discharging a firearm. At the current time the understanding of the
transfer, persistence, recovery and background abundance of FDR is limited.
Drugs on banknotes
Banknotes seized from people who have been found guilty of drug crime on average
have higher levels of drug contamination than banknotes found in general circulation.
Different analytical techniques can be used to obtain measurements of drug traces
on banknotes and can result in different measurements. Hence it is important that
comparisons are made using datasets obtained using the same analytical technique. For
some drugs (eg cocaine, which is found on most banknotes) the measurements of drug
found on the set of seized banknotes are related to the quantities of drug on the notes.
For other drugs (such as heroin) the measurements might simply be the presence or
absence of the drug on each banknote.
The strength of this evidence in relation to the following propositions can be evaluated
using an LR and the pair of competing propositions would be:
• Hp: The banknotes are associated with a person involved in drug crime.
Statistical models can be used to evaluate the LR; for some examples, see Wilson
et al. [21]. The assumptions behind the statistical models must be checked and the
models validated. Selecting suitable databases for these statistical models can be a
challenge [22] as there may be both regional variations and variations over time associated
with particular drug use behaviours in different areas. Having relevant localised
ground truth data (analysed samples from known locations over different known time
periods) is essential to assess whether this is the case. It is important that the datasets
are consistent with the propositions. For example, if the propositions are specific
to a particular drug, then the dataset should also be specific to that drug over the
appropriate time period. It is also difficult to obtain a dataset of banknotes ‘associated
with a person involved in drug crime’, which can make it difficult to estimate the
probability of the findings under Hp. It is therefore key that the statistical assumptions
supporting data selection are described and justified.
Appendix 3: Evaluation of
impression evidence
The purpose of this appendix is to explain the type and significance of the conclusions
reached by forensic experts dealing with impression evidence. It will also explain the
basis (statistical or otherwise) upon which these conclusions are formed.
• Were the (finger-/palm-) marks recovered at the scene left by this individual or by
some other unknown person?
• Were the (footwear-/tool-) marks recovered at the scene made by this object
(shoe or tool) or another unknown shoe or tool?
• Was the bullet recovered from a body fired by the seized firearm or by some
other unknown firearm?
TABLE 3
Footwear mark examination: footwear mark; known shoes and their associated reference
prints (reference prints taken in two dimensions or impressions in three dimensions).
It is important to stress that the issue of origin (or source) in all of these specialisms
is generally associated with an implied activity made by a person or carried out using
an object. It is these implied activities that led to the production of the questioned
impressions (Table 4).
These activities are often not referred to specifically in the expert’s report but remain
implied. Indeed, experts will not systematically envisage all conceivable possibilities for
a questioned impression to be produced but will consider the most reasonable activity
arising from the case circumstances. For example, experts, unless instructed otherwise,
will not account for fanciful scenarios such as:
• a fingermark not being left on the surface by a living hand but using a forged
dummy finger obtained from an individual; and
• a footwear mark in the snow being the result of the landing of a shoe after being
discarded from a car.
TABLE 4
Footwear mark examination: A person wearing these shoes walked on the floor.
Toolmark examination: A person using this tool forced the safe, the door or the window.
Firearm examination: A person with that firearm fired a cartridge that led to a bullet and
a cartridge case.
Relevant features are specific to the forensic specialism and are detailed in
Table 5. They can be shared by many (such as the manufacturing size of a tool shared
by all tools produced of that size) or by a few (such as the acquired damage in the
form of cuts on the outsole of a shoe). In other words, the discriminative power
of relevant features varies depending on the specific type of feature considered.
Furthermore, depending on their size and on how the impression is produced (types
of surfaces, residue on the surface, movement, materials, etc), features may not be
reproduced in the impressions, and even when impressions are made by the same source
(eg a shoe) they are never identical. Hence, features in disagreement may be
found between impressions that share the same source. This is because the
respective impressions have been subject to distortion, movement, superimposition or
background noise or have changed appearance over time. On the other hand, while
findings in agreement should be observed when the impressions were produced by
a common source, they may also be found when they were produced by different
sources. Matches between different sources are known as adventitious matches
because another source has produced, by chance, the same level of agreement.
TABLE 5
Fingerprint examination:
• The general flow of the friction-ridge skin (papillary lines), often classified in general
patterns such as arches, loops and whorls.
• The ridge endings and bifurcations (referred to as minutiae or Galton’s details) made by
the papillary lines and the combination thereof.
• The marks left by scars or other damage.
• The specific shapes of edges of the papillary lines and the pores present in them.
Barefoot examination:
• The size (length and width) of the foot from heel to toes and the relative size and
position of the toe impressions.
Footwear mark examination:
• The overall manufacturing design of the outsole (the geometric elements of the design
and how they are arranged relative to each other).
• The size of the outsole as specified by the manufacturer.
• The general level of wear of the outsole at the time of the impressions.
• The acquired features shown in the impressions in the form of cuts, removal of material
or damage to the outsole.
Toolmark examination:
• The width and size of the tool and its shape as given by the manufacturer.
• The acquired defects on the surface, removal of material and small imperfections of the
surface due to its usage.
Firearm examination:
• The calibre, number, widths and twist of the lands and grooves of the barrel through
which a bullet was fired.
• The relative positions of the firing pin, extractor and ejector coming in contact with a
cartridge case.
• The striated marks, due to usage, left on the impressions of lands and grooves on a
fired bullet.
• The breech face impression left on the back of the cartridge when fired.
The results of all these observations on both the known-source and the questioned
impression (in agreement or disagreement) are what will be called the comparison
findings. These findings will then be evaluated with regard to the proposition
of common source against the proposition of different sources. This evaluation
stage encapsulates the interpretation of the findings and generation of associated
conclusions. In the UK the evaluation is carried out holistically by the expert using their
knowledge and experience. It is less common for such an evaluation to be undertaken
using a likelihood ratio (LR) approach. However, such approaches are encouraged
by forensic practitioners and are used elsewhere, particularly in continental Europe.
Finally, it is customary for each examination followed by a conclusion to be reviewed
independently by a second examiner. This is called the verification stage. The four
above stages (analysis, comparison, evaluation and verification) are generally referred to
by the acronym ACE-V. In the UK, it is common to use this approach for the comparison
of fingerprint evidence but ACE-V is not necessarily used for other types of evidence
involving the comparison of visual patterns.
The term ‘identification’ refers to the decision of the expert that the questioned and
known impressions originated from the same source. It is a categorical opinion, and
should not be misconstrued as being a factual certainty. No forensic examination
covered in this appendix can claim to factually demonstrate the source of a questioned
impression. A decision of ‘identification’ is not a fact; it is the opinion of an expert based
on their measurements, observations and experience. It is a statement that the expert’s
probability that the impression was made by a source other than the one in question
is so small that it is negligible.
• that the impression was associated with a specific individual or object to the
exclusion of all others in the world; or
• that it is absolutely certain (or with a 100% certainty) that a specific individual or an
object is the source of the questioned impressions; or
• that it is the result of the comparison of all impressions in the world’s population.
The term ‘exclusion’ refers to the decision of the expert that the questioned and known
impressions did not originate from the same source; in other words, they have been
produced by different sources. When a decision of ‘identification’ or ‘exclusion’ cannot
be reached, the expert may either, depending on the forensic specialism:
• indicate the strength of support the findings will bring to the question of the source;
that strength will be qualified either verbally or numerically.
This can be done by expressing the degree of support that the findings provide in
favour of or against these propositions using an LR based on a qualitative assessment for
impression evidence and expressed using a scale such as that provided in Table 1.
TABLE 6
Exclusion Exclusion
Identification Identification
For example, in a case involving a footwear mark where the expert noted a
correspondence between a mark and a sole in terms of overall design, size, general
level of wear and the presence of three cuts located in the heel area, and there is no
significant discrepancy, they may state:
• the comparison findings are more than 1,000 times more likely to be observed if
the mark has been left by that sole rather than if the mark has been left by some
other sole.
If the findings had been different, for example if the general levels of wear on the
mark and the sole were different and not easily reconcilable (allowing for potential
additional wear over the time period between the recovery of the mark and the seizure of
the shoe), the expert may express (reversing the propositions):
• that the comparison findings, in my opinion, provide moderate support for the view
that the mark has been left by some other sole; or
• that the comparison findings are between 10 and 100 times more likely to be
observed if the mark has been left by some other sole.
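The numerical and verbal statements in the examples above are two renderings of the same LR. The sketch below shows one way such a translation could work, including the reversal of the propositions when the LR is below 1. Only the pairing of "between 10 and 100" with "moderate support" comes from the example above; the other bands, the treatment of boundary values and all names are assumptions made for illustration and may differ from the scale actually used by a given laboratory.

```python
# Assumed verbal bands as (upper bound, label). Only the 10-100 "moderate support"
# band is taken from the example in the text; the rest are illustrative assumptions.
ASSUMED_SCALE = [
    (10, "weak support"),
    (100, "moderate support"),
    (1_000, "moderately strong support"),
    (10_000, "strong support"),
    (1_000_000, "very strong support"),
]


def describe_lr(lr: float, hp: str, hd: str) -> str:
    """Translate a numerical LR into a verbal statement of support."""
    if lr <= 0:
        raise ValueError("an LR must be greater than 0")
    if lr == 1:
        return "The findings do not support either proposition over the other."
    # An LR below 1 supports the alternative: reverse the propositions.
    favoured, strength = (hp, lr) if lr > 1 else (hd, 1.0 / lr)
    label = "extremely strong support"
    for upper_bound, band_label in ASSUMED_SCALE:
        if strength < upper_bound:
            label = band_label
            break
    return f"The findings provide {label} for the view that {favoured}."


hp = "the mark has been left by that sole"
hd = "the mark has been left by some other sole"
print(describe_lr(5000, hp, hd))
print(describe_lr(0.02, hp, hd))
```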
The LR is always assigned numerically first and then, if so chosen, translated into a verbal
expression. In most cases involving impression evidence, the LR will be expressed
by an order of magnitude (10, 100, 1,000, 10,000, 1,000,000, etc). If this methodology is
used then an LR should be a prerequisite for reaching any conclusion, including
an identification or exclusion decision. An LR is obtained by dividing one probability
assignment by another. Both assignments require judgement on the part of the expert based on a corpus of
knowledge that can be divided into two broad categories:
[S] Knowledge derived from robust systematic studies, ideally published, where the
relevant features have been measured and studied statistically.
[E] Knowledge derived from personal experience, ie the expert’s training and
professional experience in the forensic specialism.
Because LRs may be based on different experts’ knowledge, there may be a legitimate
and understandable difference in opinion between two experts. Table 7 gives, for a few
specialisms, examples of the type of knowledge used by experts.
TABLE 7
Forensic specialism: Nature of the knowledge used by the examiner to assign a likelihood
ratio (LR)
Footwear mark examination:
• [S] Studies on the relative frequencies of the general designs of the outsoles and their
sizes in the selected population. Data showing how wear develops on used outsoles.
Firearm examination:
• [S] Studies associated with the systematic search of matching features occurring on
bullets fired by different firearms. Data associated with the evolution of striated features
on bullets due to their successive firings in a given barrel.
• [E] Knowledge of how features produced by the manufacturing process can be
distinguished from features acquired through the use of a firearm.
• A null hypothesis is presented. This is generally the proposition that the discovery is
false, eg a pharmaceutical has no beneficial effect or the Higgs boson does not exist.
This null hypothesis is set up as a default assumption and is only rejected if there is
sufficiently convincing evidence.
• A test statistic is chosen, for which large values would tend to cast doubt on the null
hypothesis. For example, the difference between the average observed benefit in patients
given a particular drug and the average observed benefit in patients within a control group.
• Data are collected and the observed value for the test statistic is calculated.
• The probability of getting the observed value (or a more extreme value) of the test
statistic given that the null hypothesis is assumed to be true is calculated. This is
known as the P-value.
• If the P-value is very small, then the null hypothesis is rejected. The definition of
‘small’ depends on the stringency required: a standard threshold for declaring
statistical significance (that a difference between the tested data and the expected
value is a real difference and not just due to chance alone) is to find a P-value of
less than 0.05 (1 in 20).
To claim the existence of the Higgs boson, physicists required a P-value of less
than 1 in 3.5 million.
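The steps above can be sketched with an exact permutation test, which is one of several ways to obtain a P-value and is chosen here only because it needs no distributional assumptions. The null hypothesis is that the drug has no effect, so the group labels are interchangeable; the test statistic is the difference in average benefit. The function name and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(control, treated):
    """One-sided exact permutation P-value for the difference in means.

    Under the null hypothesis (no effect), every way of splitting the pooled
    measurements into 'treated' and 'control' groups is equally likely.
    """
    observed = mean(treated) - mean(control)
    pooled = list(control) + list(treated)
    pooled_total = sum(pooled)
    n_treated = len(treated)
    n_control = len(pooled) - n_treated
    as_extreme = 0
    n_splits = 0
    for indices in combinations(range(len(pooled)), n_treated):
        treated_total = sum(pooled[i] for i in indices)
        statistic = treated_total / n_treated - (pooled_total - treated_total) / n_control
        n_splits += 1
        # Count splits whose statistic is at least as large as the observed one.
        if statistic >= observed - 1e-12:
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical benefit scores for five control patients and five treated patients.
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [6.0, 7.0, 8.0, 9.0, 10.0]
p_value = permutation_p_value(control, treated)
print(p_value, p_value < 0.05)
```

With these made-up scores only one of the 252 possible splits is as extreme as the observed one, so the P-value is 1/252 and the null hypothesis would be rejected at the 0.05 threshold.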
The P-value is essentially a measure of the incompatibility between the observed data
and a pre-specified hypothesis: if the P-value is very small, either the null hypothesis
is true and a very surprising event has occurred or the null hypothesis is false. Many
problems can arise in the use and interpretation of statistical significance testing.
• The P-value is the probability of extreme evidence, given that the null hypothesis is
true, but it is often interpreted as the probability that the null hypothesis is true, given
the evidence. This is an example of the prosecutor’s fallacy.
• If the null hypothesis is not rejected, it does not mean it is true. This is similar to
someone who is not found guilty in an English court: they are found ‘not guilty’ rather
than ‘innocent’.
• With a large dataset, a statistically significant result may not necessarily be of any
practical significance, eg the difference in cure rates between a new and a standard
drug might be only 1%.
• It is poor scientific practice to conduct multiple tests and only report the most
significant – this is very likely to be a false discovery and will give biased estimates.
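The danger of reporting only the most significant of several tests can be illustrated numerically. The sketch below assumes that every null hypothesis is true and that the tests are independent, which is a simplification; the function name is hypothetical.

```python
def chance_of_false_discovery(n_tests: int, threshold: float = 0.05) -> float:
    """Probability that at least one of n independent tests is 'significant'
    at the given threshold when every null hypothesis is actually true."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return 1.0 - (1.0 - threshold) ** n_tests


for n in (1, 5, 20):
    print(n, round(chance_of_false_discovery(n), 3))
```

Under these assumptions, running 20 tests gives roughly a 64% chance of at least one result with a P-value below 0.05 purely by chance.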
Relative risk (RR) estimates are used with increasing frequency in toxic tort/delict
litigation as evidence for a causal link between the putative toxic exposure and the
personal injury sustained by the claimant. The simplistic phrase ‘doubling the risk’ is
unhelpful, because there is rarely a single risk with no variation by age and sex, and
rarely only one estimate, let alone a very precise estimate of a particular RR.
An RR, or risk ratio, measures the size of the effect of a given risk factor on disease
rates in specific populations. It describes the proportional increase in the probability of
an event occurring in a group exposed to some condition, as measured from a baseline
probability of an event occurring in a comparison group that has not been exposed
to the condition. For example, men who smoke 15–24 cigarettes per day have an RR
of lung cancer compared with never-smokers of about 26. The RR of lung cancer for
regular drinkers of more than four glasses of wine per day compared with those who
drink a glass of wine per week is about 3.2, averaged across smoking habits. After
adjusting for smoking habits and other factors, the RR is 1.4 [23].
The RR of lung cancer with heavy drinking is more than halved by adjusting for smoking;
this illustrates the importance of having a careful definition of a causal hypothesis
and not drawing conclusions from a single estimate. Estimates from a range of
epidemiological studies are needed. A medical statistician or epidemiologist will usually
consider how study results relate to the viewpoints identified by Sir Austin Bradford Hill
for the assessment of causality [24].
The Bradford Hill criteria apply to general scientific conclusions for populations. But we
may also be interested in individual cases, say in civil litigation where courts need to
decide whether a particular exposure (say the asbestos encountered in a job) caused a
negative outcome in a specific person (say John Smith’s lung cancer). It can never be
established with absolute certainty that the asbestos was the cause of the cancer, since
it cannot be proved that the cancer would not have occurred without the exposure.
But some courts have accepted that, on the ‘balance of probabilities’, a direct causal
link has been established if the RR associated with the exposure is greater than two25.
But why two?
Presumably the reasoning behind this conclusion is as follows26:
• Suppose that, in the normal run of things, out of 1,000 men like John Smith, 10 would
get lung cancer. If asbestos more than doubles the risk, then if these 1,000 men had
been exposed to asbestos, perhaps 25 would develop lung cancer.
• Of those exposed to asbestos who go on to develop lung cancer, fewer than half
would have got lung cancer if they had not been exposed to the asbestos.
• So more than half of the lung cancers in this group will have been caused by
the asbestos.
• Since John Smith is one of a group who was diagnosed with lung cancer, then on
the balance of probabilities his lung cancer was caused by the asbestos.
• An incidence rate is the number of new cases identified in a given time period
divided by the total person-time lived in that period by the population under study. If 6
new cases of lung cancer are identified in a year, and 1,000 people were monitored
during that year, the incidence rate is 6 per 1,000 person-years.
• RRs are used to compare incidence rates in two groups (hence rate ratio), usually an
exposed group compared with a non-exposed group. Given a lung cancer incidence
rate of 6 per 1,000 person-years in chimney sweeps, and 2 per 1,000 person-years in
the general population, the RR for chimney sweeps is 6/2 = 3.
• Excess risk is the difference between incidence rates in two groups. The excess risk
of lung cancer for chimney sweeps is 6 - 2 = 4 per 1,000 person-years.
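The definitions above, and the earlier 'more than doubling' argument, come down to a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the figures are those used in the text. The last function assumes, as the reasoning set out earlier does, that the excess cases among the exposed are the ones caused by the exposure, so that the share attributed to it is (RR - 1) / RR.

```python
from fractions import Fraction


def incidence_rate(new_cases, person_years):
    """New cases divided by person-time observed (cases per person-year)."""
    return Fraction(new_cases, person_years)


def rate_ratio(rate_exposed, rate_unexposed):
    """RR: incidence rate in the exposed group relative to the comparison group."""
    return rate_exposed / rate_unexposed


def excess_risk(rate_exposed, rate_unexposed):
    """Difference between the two incidence rates."""
    return rate_exposed - rate_unexposed


def fraction_attributable(rr):
    """Share of cases among the exposed in excess of the baseline: (RR - 1) / RR."""
    return (rr - 1) / rr


# Chimney sweeps versus the general population (figures from the text).
sweeps = incidence_rate(6, 1000)
general = incidence_rate(2, 1000)
rr_sweeps = rate_ratio(sweeps, general)
excess_per_1000 = excess_risk(sweeps, general) * 1000

# Asbestos example: 10 cases per 1,000 unexposed men, perhaps 25 per 1,000 exposed.
rr_asbestos = rate_ratio(Fraction(25, 1000), Fraction(10, 1000))
share_caused = fraction_attributable(rr_asbestos)

print(f"RR for chimney sweeps: {float(rr_sweeps)}")
print(f"Excess risk per 1,000 person-years: {float(excess_per_1000)}")
print(f"RR for asbestos example: {float(rr_asbestos)}")
print(f"Share of exposed cases attributed to asbestos: {float(share_caused)}")
```

With an RR of exactly 2 the attributed share is exactly one half, which is why an RR greater than two is the threshold at which 'more likely than not' is reached under this line of reasoning.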
A prospective cohort study actively enrols the defined cohort and collects baseline
information. Subsequent health status is observed through follow-up. A retrospective
cohort study, or historical cohort study, uses data from past records. In studies of health
issues arising from occupational exposure, data from routine employment medical
examinations and health and safety records can be used. A retrospective cohort study
might provide information relatively quickly.
If the retrospective cohort continues to be followed after the first phase, the study
includes both retrospective and prospective data. Cohort studies can provide
information on a range of factors affecting a range of health states. It is easier to
ensure consistency of measurement or recording of exposure factors and diagnosis in
prospective cohorts than in retrospective cohorts or case-control studies. It is important
that possible exposure or risk factors are clearly defined and that consistent effort is
made to obtain data from the whole cohort. However, cohort studies are typically more
expensive because they are larger. The results of a prospective cohort study can only
be observed after some time, potentially a long time for slowly developing conditions.
Clinical databases, such as a list of children referred to a specialist hospital, are not
regarded as disease registers, because a relevant population cannot be identified.
Parents might travel across regions or countries to seek help33. If cases are collected
from a service which focuses on provision for those with cognitive deficits, accurate
clinical diagnoses and reliable information on people with normal or good cognitive
ability will not be available34, 35.
Systematic review
The Cochrane Collaboration36 defines a systematic review as a review of a clearly
formulated question that uses systematic and explicit methods to identify, select
and critically appraise all relevant research, and to collect and analyse data from the
studies that are included in the review. Statistical methods (meta-analysis) may or may
not be used to analyse and summarise the results of the included studies, depending
on the quality and quantity of information. Some systematic reviews can only provide
descriptions of the main features of the included studies. A summary of the evidence
based on the data for each patient in each study is generally regarded as the optimal
approach, but this is often acknowledged to be impractical owing to constraints of
confidentiality and time. Systematic reviews are contrasted with narrative or expert
reviews, which are based only on research that is known by, easily available to or
acceptable to the reviewers.
Bias
In epidemiology and statistics, bias typically refers to estimates which systematically
misrepresent the quantity of interest. As a simple example, if a lecturer asks the 10
students, out of a class of 100 students, who have come to all his lectures whether his
lectures are worth attending, the answers given cannot be assumed to represent the
view of the whole class.
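The lecturer example can be made concrete with a small sketch. All of the numbers below are hypothetical and chosen only for illustration: the text gives the class size (100) and the number asked (10), but not the students' opinions.

```python
from fractions import Fraction

# Hypothetical class of 100 students. Each entry is
# (attended_all_lectures, thinks_lectures_worth_attending).
class_views = (
    [(True, True)] * 9 + [(True, False)] * 1         # the 10 regular attenders
    + [(False, True)] * 31 + [(False, False)] * 59   # the other 90 students
)


def share_positive(views):
    """Proportion of the given students who think the lectures are worth attending."""
    return Fraction(sum(1 for _, positive in views if positive), len(views))


# The lecturer only asks the students who came to every lecture.
asked = [view for view in class_views if view[0]]

sample_estimate = share_positive(asked)
true_share = share_positive(class_views)
bias = sample_estimate - true_share

print(f"Estimate from students asked: {float(sample_estimate):.0%}")
print(f"Share in the whole class:     {float(true_share):.0%}")
print(f"Systematic overestimate:      {float(bias):.0%}")
```

Under these assumed opinions the estimate is wrong because of who was asked, not because of chance variation, so asking the same group again would not correct it.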
Reporting standard
In order to assess the quality of the research described in a published article, sufficient
information is required. When medical statisticians began to assess the quality of
medical research publications, one difficulty was the lack of information provided
regarding the design and analysis of studies37, 38, 39. Since 2010, a series of statements
and guidelines, with accompanying checklists, have been published to facilitate
understanding of the study and assessment of the validity of results and conclusions.
These can be found on the website of the Equator (Enhancing the Quality and
Transparency of Health Research) Network40.
Often a confounder cannot be precisely estimated and there may be various factors to
consider as outlined in the Bradford Hill criteria illustrated in Regina v Abadom (1983)41.
Strength
‘The strength of the association is expressed as a comparison between a standard or
unexposed population and a population exposed to the putative causal agent. The RR
of lung cancer for smokers is higher compared to never-smokers’42. ‘Strength’ is not well
defined, although an RR of 5 would generally lead to further investigation. However,
the baseline rate of the condition should also be taken into account. Headlines in
2000 which reported a doubling of risk of deep vein thrombosis associated with oral
contraceptives, without reporting the absolute risk of 2 per 10,000 users per year,
resulted in many women abruptly stopping taking their pills. Doubling the risk increases
the absolute rate to 4 per 10,000 users per year.
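The contrast between relative and absolute risk in this example can be checked directly. The sketch below is a minimal illustration using the figures in the text (a baseline of 2 per 10,000 users per year and an RR of 2). The function name is hypothetical, and the 'users per extra case' figure is derived here rather than taken from the source.

```python
from fractions import Fraction


def absolute_effect(baseline_per_10000, rr):
    """Translate a relative risk into absolute terms for a given baseline rate."""
    baseline = Fraction(baseline_per_10000, 10000)
    exposed = baseline * rr
    extra = exposed - baseline
    return {
        "exposed_per_10000": exposed * 10000,
        "extra_per_10000": extra * 10000,
        # Number of users per year for one additional case; undefined if no excess.
        "users_per_extra_case": (1 / extra) if extra != 0 else None,
    }


result = absolute_effect(baseline_per_10000=2, rr=2)
for name, value in result.items():
    print(f"{name}: {float(value)}")
```

A doubling of risk here corresponds to 2 additional cases per 10,000 users per year, that is, one additional case for every 5,000 users per year.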
Consistency
‘If an association is repeatedly observed by different people, in different places,
circumstances and times, it is more reasonable to conclude that the association is not
due to error, or imprecise definition, or a false positive statistical result’43. Further, the
association should be observed in studies with a high methodological standard.
Specificity
‘Consideration should also be given to whether particular diseases only occur among
workers within particular occupations. This is a supporting feature in some cases, but
in other cases one agent might give rise to a range of reasons for death’44. The best
example of a simple specific causal agent is thalidomide: the congenital deformity
known as phocomelia is seen almost exclusively in the population of individuals
exposed to thalidomide during gestation.
Temporality
‘This requires causal factors to be present before the disease’45.
Plausibility
A claim that it is biologically plausible for an exposure to cause an outcome must be based
on scientific reasoning or data, not just prior beliefs. Laboratory experiments might
be possible, especially if the outcome effect can be modified by an appropriate
experimental regime. However, extrapolation from animal experiments to humans is
not straightforward. In the development of drugs, randomised experiments are required.
For side effects with long-term treatments, or industrial exposures, other study designs
are used.
Coherence
‘A cause and effect interpretation should not seriously conflict with generally known
facts of the development and biology of the disease’47.
Experiment
‘Sometimes evidence from laboratory or field experiments might be available’48.
Analogy
Bradford Hill commented: “In some circumstances it would be fair to judge by analogy.
With the effects of thalidomide and rubella before us we would surely be ready to
accept slighter but similar evidence with another drug or another viral disease in
pregnancy.” This criterion has a limited role. If the criteria are not met, one cannot
conclude that there is no causal association. The conclusion is that there might be a
direct causal explanation, or an indirect explanation, or even that the association arose
from some aspects of data collection or analysis. Competing explanations should be
considered that might include unmeasured confounding factors or alternative factors
which have an association of similar strength to the putative causal factor49.
References
1. Straight Statistics. 2009 Question marks over Corby judgement.
See https://straightstatistics.fullfact.org/article/question-marks-over-corby-judgement (accessed 26 May 2020).
2. The Law Society Gazette. 2018 Police chief explains ‘justice by algorithm’ tool.
See https://www.lawgazette.co.uk/news/police-chief-explains-justice-by-algorithm-tool-/5067033.article
(accessed 26 May 2020).
3. Willis W. 2015 ENFSI guideline for evaluative reporting in forensic science. European Network
of Forensic Science Institutes. See http://enfsi.eu/wp-content/uploads/2016/09/m1_guideline.pdf
(accessed 26 May 2020).
4. Royal Society and Royal Society of Edinburgh. 2017 Forensic DNA analysis: a primer for the courts.
See https://royalsociety.org/-/media/about-us/programmes/science-and-law/royal-society-forensic-dna-analysis-primer-for-courts.pdf
(accessed 24 August 2020).
6. Anderson T, Schum D, Twining W. 2005 Analysis of evidence (2nd edn, Law in Context). Cambridge:
Cambridge University Press. doi:10.1017/CBO9780511610585.
8. The Council of the Inns of Court and the Royal Statistical Society. 2017 Statistics and probability
for advocates: understanding the use of statistical evidence in courts and tribunals.
See https://www.statsref.com/ICCA-RSS-guide.pdf (accessed 26 May 2020).
11. Regina v Adams. 1996 England and Wales Court of Appeal (Criminal Division) 222.
13. Part 19, the Criminal Procedure Rules and Criminal Practice Direction 2015 (as amended 2018, 2019).
See https://www.justice.gov.uk/courts/procedure-rules/criminal/docs/2015/crim-proc-rules-2015-part-19.pdf
(accessed 26 May 2020).
14. Part 35, Civil Procedure Rules 1998 (as amended 2019).
See https://www.justice.gov.uk/courts/procedure-rules/civil/rules/part35 (accessed 26 May 2020).
16. Jackson G, Aitken C, Roberts P. 2015 Case assessment and interpretation of expert evidence – guidance
for judges, lawyers, forensic scientists and expert witnesses. London: Royal Statistical Society.
19. Curran J, Hicks T, Buckleton J. 2000 Forensic interpretation of glass evidence. Boca Raton, FL: CRC Press LLC.
20. Ibid.
21. Wilson A, Aitken C, Sleeman R, et al. 2014 The evaluation of evidence relating to traces of cocaine
on banknotes. Forensic Science International, 236, 67 – 76.
22. Aitken C, Wilson A, Sleeman R, et al. 2017 Distribution of cocaine on banknotes in general circulation
in England and Wales. Forensic Science International, 270, 261 – 266.
23. Bagnardi V, Randi G, Lubin J, et al. 2009 Alcohol consumption and lung cancer risk in the Environment
and Genetics in Lung Cancer Etiology (EAGLE) study. American Journal of Epidemiology, 171, 36 – 44.
24. Hill A. 1965 The environment and disease: association or causation? Proceedings of the Royal Society
of Medicine, 58(5), 295 – 300.
25. Spiegelhalter D. 2019 The art of statistics: learning from data. London: Pelican Books.
26. Ibid.
27. Dawid A, Faigman D, Fienberg S. 2014 Fitting science into legal contexts: assessing effects of causes
or causes of effects? Sociological Methods and Research, 43, 359 – 421.
28. Hutton J. 2018 Expert evidence: civil law, epidemiology and data quality. Law, Probability and Risk.
17(2), 101 – 110.
29. Altman D. 1991 Practical statistics for medical research. London, Chapman and Hall, 93 – 96.
30. Sackett D. 1979 Bias in analytical research. Journal of Chronic Diseases, 32, 51 – 63.
31. Strauss D, Shavelle R. 1998 Life expectancy of adults with cerebral palsy. Developmental Medicine
& Child Neurology, 40, 369 – 375.
32. Ludvigsson J, Håberg S, Knudsen G, et al. 2015 Ethical aspects of registry-based research in the
Nordic countries. Clinical Epidemiology, 7, 491 – 508.
33. Hutton J. 2015 Weighing privacy against effective epidemiology. Developmental Medicine & Child
Neurology, 57, 595 – 596.
34. Hutton J. 2006 Cerebral palsy life expectancy. Clinics in Perinatology, 33, 545 – 555.
35. Hutton J, Eccles M, Grimshaw J. 2008 Ethical issues in implementation research: a discussion
of the problems in achieving informed consent. Implementation Science, 3, 52.
39. Gore S, Altman D. 1982 Statistics in practice. London: British Medical Association.
The members of the groups involved in producing this primer are listed below.
The members acted in an individual and not organisational capacity and declared
any conflicts of interest. They contributed on the basis of their own expertise and
good judgement. The Royal Society and the Royal Society of Edinburgh gratefully
acknowledge their contribution.
ISBN: 978-1-78252-486-1
Issued: November 2020 DES6439