Week 9 Chapter 1 Normal
1.0 Introduction
Perhaps the most striking feature of real data from almost any discipline is the
variation they exhibit. If a nurse measures physical dimensions (heights, weights, etc.)
or blood parameters (serum albumin or cholesterol, say) for a sample of patients, they
will (nearly) all be different. If a teacher or psychologist administers a psychological
test to a sample of children, the test scores will vary greatly. If a manufacturing
engineer measures physical dimensions (e.g., outer diameter or length) of
components manufactured to the same specifications, they will not all be the same –
in fact, they will exhibit precisely the same type of variation as the psychological test
scores. Similarly, environmental scientists will find different acidity levels (pH
values) in samples from different lakes or streams, or from one site sampled at
different times. They will find that birds differ in their dimensions in much the same
way that humans do and if they measure birds’ eggs, they will find very much the
same kind of variation as did the engineer, teacher or nurse referred to above.
This variation is the reason why statistical methods and ideas are required across a
wide range of disciplines. Statistics is the science of describing and analysing
chance variation, and statistical methods help us distinguish between systematic
effects and chance variation.
In this chapter, we will be concerned with describing such variation – at first with
pictures in Section 1.1. We will then focus, in Section 1.2, on the Normal curve that
can be adapted to describe the variation encountered in many practical situations,
and which is fundamental to the simple statistical methods that form the basis of
much of this course. We will discuss, in Section 1.3, how to decide if these curves fit
our data. Section 1.4 considers the implications of combining chance quantities (for
example, the implications for the overall length of a project of combining several
consecutive activities, each subject to chance variation). In practice, we are very
often concerned with the behaviour of averages; thus, a medical study might be
carried out to decide if the use of a new drug reduces, on average, the blood
pressure of patients suffering from high blood pressure. In Section 1.5, we will
examine how the behaviour of averages differs from that of ‘raw’ scores – this
carries implications for answering questions such as that related to blood pressure
changes, referred to above. In Section 1.6 we will consider some aspects of
measurement systems – if the same object is measured many times (even by the
same person using the same equipment) the same result is not found. Due to
chance measurement error, a range of values will be obtained, and these values
often exhibit characteristics very similar to those of the industrial or psychological
data discussed above. Finally, Section 1.7 illustrates some applications of a simple
statistical tool (the control chart) which is related to the material introduced in the
earlier sections of the chapter.
Figure 1.1.2 represents capping torques (in lb-in) of 78 container caps; the
capping torque is a measure of the energy expended in breaking the seal of the
screw-capped containers. The data come from a process capability study of a newly
installed filling line in a company that manufactures chemical reagents. It is
interesting to note that the physical measurements display similar characteristics to
the psychological scores – they vary in a roughly symmetrical way around a central
value (about 10.8 for the torques).
Figure 1.1.3 represents petal lengths of a sample of 50 irises from the variety Setosa.
The data were originally published in the Bulletin of the American Iris Society [2], but
their frequent citation in the statistical literature is due to their use in the pioneering
paper on multivariate analysis by R.A. Fisher [3]. This botanical/environmental
science example exhibits the same symmetry characteristics noted above for the
psychological and industrial data.
Figure 1.1.5: Urinary β-hydroxycorticosteroid (nmol/24h) values for 100 obese females
There are of course very many different types of shape that may be encountered for
histograms of real data (more will be presented during the lectures). However,
Figures 1.1.1-1.1.6 were selected for presentation, both to illustrate the wide
applicability of the Normal curve (in terms of discipline type) and the possibility of
transforming data so that they follow this curve, which is, perhaps, the easiest of all
statistical distributions to use – hence its popularity. It is also of fundamental
importance because averages of data sampled from skewed distributions will, in
many cases, follow a Normal curve (see Section 1.5) – this means that methods
developed for Normal data can be applied to such averages, even though the
underlying data are skewed.
If the more or less symmetrical histograms of Section 1.1 were based on several
thousand observations, the bars of the histograms could be made very narrow and
the overall shapes might then be expected to approximate the smooth curve shown
below as Figure 1.2.1.
Figure 1.2.2 shows three Normal curves with mean µ and standard deviation σ. The
curves are drawn in such a way that the total area enclosed by each curve is one.
Approximately 68% of the area lies within 1 standard deviation of the mean; the
corresponding values for two and three standard deviations are 95% and 99.7%,
respectively. These areas will be the same for all Normal curves irrespective of the
values of the means and standard deviations.
Figure 1.2.2: Areas within one, two and three standard deviations of the mean
The implications of this figure are that if the variation between measurements on
individuals/objects (e.g. blood pressure, SAT scores, torques, etc.) can be described
by a Normal curve, then 68% of all values will lie within one standard deviation of the
mean, 95% within two, and 99.7% within three standard deviations of the mean.
Similarly, if we have a measurement error distribution that follows a Normal curve,
then in 32% of cases the chance error will result in a measured value being more
than one standard deviation above or below the mean (the ‘true value’ if the
measurement system is unbiased). In only 5% of cases will the observation be
further than two standard deviations from the mean.
Suppose that the Normal curve shown in Figure 1.2.3 represents the masses of filled
containers sampled from a filling line. The curve is interpreted as an idealised
histogram for which the total area is 1.0. The area between any two points (say x1
and x2) represents the relative frequency of container masses between x1 and x2. If
this area is 0.2 it means that 20% of all containers filled by this line have masses
between x1 and x2 units. The assumption is that, provided the filling process
remains stable, 20% of all future fillings will result in containers with masses
between x1 and x2 units. Since relative frequencies are generally of interest, e.g.
what fraction of values is less than or greater than a given value, or what fraction lies
between two given values, then, where a Normal curve describes the variability in
question, it can be used to calculate the required relative frequency.
Figure 1.2.4: A standard Normal curve (z) and an arbitrary Normal curve (x)
All Normal curves have essentially the same shape once allowance is made for
different standard deviations. This allows for easy calculation of areas, i.e., relative
frequencies or probabilities. If, for any arbitrary curve with mean µ and standard
deviation σ, the area between two values, say x1 and x2, is required, all that is
necessary is to find the corresponding values z1 and z2 on the standard Normal
curve, determine the area between z1 and z2 and this will also be the area between
x1 and x2 (as shown (schematically) in Figure 1.2.4). A standard Normal curve has
mean zero and standard deviation one.
z = (x − µ)/σ
The z value is the answer to the question “by how many standard deviations does x
differ from the mean µ?”
Once the z value is calculated, the area to its left (i.e. the area from minus infinity up
to this value of z) can be read from a standard Normal table (Table ST-1). Thus, a z
value of 2 has an area to its left of 0.9772, while a z value of 1.5 has an area to its
left of 0.9332. Therefore, the area between z=1.5 and z=2 on the standard Normal
curve is 0.9772 - 0.9332 or 0.0440. Similarly, the area between two points which are
1.5 and 2 standard deviations from the mean of any Normal curve will also be
0.0440.
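These table look-ups can also be checked numerically. The short Python sketch below (using the scipy library, which is not part of the course material; treat it simply as an optional check on Table ST-1) reproduces the figures quoted above.

    from scipy.stats import norm   # standard Normal curve: mean 0, standard deviation 1

    print(norm.cdf(2.0))                   # area to the left of z = 2: 0.9772
    print(norm.cdf(1.5))                   # area to the left of z = 1.5: 0.9332
    print(norm.cdf(2.0) - norm.cdf(1.5))   # area between z = 1.5 and z = 2: about 0.0440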
Example 1
Figure 1.1.4 showed a distribution of heights for a population of women. The mean
of the 1794 values was 162.4 cm and the standard deviation was 6.284 cm. If we take the
mean and standard deviation as those of the corresponding population, we can
address questions such as those below.
(i) What fraction of the population has a height greater than 170 cm?
(ii) What fraction of the population has a height less than 145 cm?
(iii) What bounds should we use if we want to enclose the central 95% of
the population within them?
(i) What fraction of the population has a height greater than 170 cm?
To calculate the required proportion of the population, the area to the right of 170 on
the height curve must be found. To do this, the equivalent value on the standard
Normal curve is calculated; the area to the right of this can be found easily and this
gives the required proportion, as shown in Figure 1.2.5.
Figure 1.2.5: The standard Normal curve (z) and the women’s heights curve (x)
The translation from the height (x) to the standard Normal scale (z) is given by:
z = (x − µ)/σ
P(x > 170) = P(Z > (170 − 162.4)/6.284) = P(z > 1.21) = 1 – P(z < 1.21)
= 1 – 0.8869 = 0.1131
We can read P(x >170) as “the proportion of x values greater than 170” or,
equivalently, as “the probability that a single x value, selected at random, will be
greater than 170”.
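For readers who prefer to verify the calculation by computer rather than from Table ST-1, a minimal Python sketch (scipy assumed to be available) is:

    from scipy.stats import norm

    mu, sigma = 162.4, 6.284                   # mean and standard deviation of the heights (cm)
    z = (170 - mu) / sigma                     # standardise: z = (x - mu)/sigma, about 1.21
    print(1 - norm.cdf(z))                     # P(X > 170), about 0.1131
    print(norm.sf(170, loc=mu, scale=sigma))   # the same tail area obtained directly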
(ii) What fraction of the population has a height less than 145 cm?
z = (x − µ)/σ = (145 − 162.4)/6.284 = –2.77
Statistical Table-1 (ST-1) only gives areas for positive values of Z. However, we can
use the symmetry of the curve to find areas corresponding to negative values. Thus,
P(z<2.77) is the area below 2.77; 1–P(z<2.77) is the area above +2.77. By
symmetry this is also the area below –2.77, as shown in Figure 1.2.6
Figure 1.2.6: The symmetry property of the Normal curve (area is 0.0028 in each tail)
P(z < –2.77) = 1 – P(z < 2.77) = 1 – 0.9972 = 0.0028 = P(X < 145).
(iii) What bounds (A, B) should we use if we want to enclose the central 95%
of the population within them?
To find the values that enclose the central 95% of the standard Normal curve (i.e.,
an area of 0.95) we proceed as follows. Enclosing the central 0.95 means we leave
0.025 in each tail. The upper bound, therefore, has an area of 0.975 below it (as
shown in the leftmost curve in Figure 1.2.7). If we inspect the body of Table ST-1,
we find that 0.975 corresponds to a z value of 1.96. By symmetry, the lower bound
is –1.96 (the central curve in Figure 1.2.7). Thus, a standard Normal curve has 95
% of its area between z = –1.96 and z = +1.96. From the relation:
z = (x − µ)/σ
we get:
(B − µ)/σ = 1.96  =>  B = µ + 1.96σ
and:
(A − µ)/σ = –1.96  =>  A = µ – 1.96σ
It is clear, then, that the values that enclose the central 95% of the population for an
arbitrary Normal curve (x) with mean µ and standard deviation σ are µ – 1.96σ and
µ + 1.96σ.
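The same bounds can be obtained numerically; the sketch below uses the inverse of the standard Normal cumulative curve (norm.ppf in scipy) rather than reading 1.96 from Table ST-1.

    from scipy.stats import norm

    mu, sigma = 162.4, 6.284
    lower = norm.ppf(0.025, loc=mu, scale=sigma)   # value with 2.5% of the area below it
    upper = norm.ppf(0.975, loc=mu, scale=sigma)   # value with 97.5% of the area below it
    print(lower, upper)                            # approximately 150.1 and 174.7 cm

These are, of course, just 162.4 ± 1.96(6.284).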
Example 2
Suppose the distribution of capping torques for a certain product container is Normal
with mean µ = 10.6 lb-in and standard deviation σ = 1.4 lb-in (these were the values
obtained from the data on which Figure 1.1.2 is based). To protect against leaks, a
minimum value for the torque must be achieved in the manufacturing process. If the
lower specification limit on torque is set at 7 lb-in, what percentage of capping
torques will be outside the lower specification limit?
To calculate the required percentage, the area to the left of 7 on the capping process
curve must be found. To do this, the equivalent value on the standard Normal curve
is calculated; the area to the left of this can be found easily and this gives the
required proportion (as illustrated schematically in Figure 1.2.8).
Figure 1.2.8: The standard Normal curve and the capping process curve
The translation from the torque scale (x) to the standard Normal scale (z) is given
by:
z = (x − µ)/σ
z = (7 − 10.6)/1.4 = –2.57
Thus, a torque of 7 on the capping process curve corresponds to the point –2.57 on
the standard Normal curve. The area to the left of 2.57 under the standard Normal
curve is 0.9949, so, by symmetry, the area to the left of –2.57 is 1 – 0.9949 = 0.0051:
approximately 0.5% of capping torques will fall below the lower specification limit of 7 lb-in.
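A short computational check of Example 2 (again a sketch only, scipy assumed) also makes it easy to explore the sensitivity of the answer to the assumed standard deviation, as suggested in the footnote following the exercises:

    from scipy.stats import norm

    mu = 10.6                                            # mean capping torque (lb-in)
    for sigma in (1.38, 1.40, 1.41):                     # try slightly different standard deviations
        print(sigma, norm.cdf(7, loc=mu, scale=sigma))   # P(X < 7), roughly 0.005 in each case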
Exercises
1.2.1. Weight uniformity regulations usually refer to weights of tablets lying within a certain range
of the label claim. Suppose the weights of individual tablets are Normally distributed with
mean 100 mg (label claim) and standard deviation 7 mg.
(a) If a tablet is selected at random, what is the probability that its weight is:
(i) less than 85 mg ?
(ii) more than 115 mg ?
(iii) either less than 85 mg or more than 115 mg ?
(b) (i) What weight value K could be quoted such that 99% of all tablets are heavier than K ?
(ii) What weight values A and B can be quoted such that 99% of all tablets lie between A
and B ?
1.2.2 Figures 1.1.5 and 1.1.6 are based on data cited by Strike [5]. The data are urinary β-
hydroxycorticosteroid (β-OHCS) values in nmol/24h, obtained on a random sample of 100
obese adult females drawn from a fully specified target population. According to Strike it is
of interest to define a 95% reference range for the β-OHCS measurements (a reference
interval encloses the central 95% of the population of measured values) that can be used to
assess the clinical status of future subjects drawn from the same population. The
distribution of β-OHCS values is quite skewed; accordingly, in order to construct the
reference interval we first transform the data to see if they can be modelled by a Normal
distribution in a transformed scale. Figure 1.1.6 suggests that this is a reasonable
assumption (this will be discussed further in Section 1.3). In the log10 scale the mean is
2.7423 and the standard deviation is 0.1604.
(i) Calculate the bounds (a, b) for the 95% reference interval in the log scale.
(ii) Use your results to obtain the corresponding reference interval in the original
measurement scale: the bounds will be (10^a, 10^b).
(iii) What value, U, should be quoted such that only 1% of women from this population
would be expected to have a β-OHCS measurement which exceeds U?
¹ Try re-calculating using standard deviations of 1.38 or 1.41 instead of 1.40 and note the effect on
the resulting probabilities. How often (never?) do engineers know standard deviations to this level of
precision?
Our calculations using the Normal curve assumed the mean and standard deviation
were already known. In practice, of course, they have to be calculated from
observations on the system under study. Suppose that a sample of n objects has
been selected and measured (e.g., SAT scores, people’s heights etc.) and the
results are x1, x2, …, xn. The average of these, x̄, gives an estimate of the
corresponding population value, µ. The standard deviation, s, of the set of results
gives an estimate of the corresponding population standard deviation, σ. This is
calculated as:
s = √[ Σ(xᵢ − x̄)² / (n − 1) ], where the sum runs over i = 1, …, n
i.e. the mean is subtracted from each value, the deviation is squared and these
squared deviations are summed. To get the average squared deviation, the sum is
divided by n–1, which is called the 'degrees of freedom'2. The square root of the
average squared deviation is the standard deviation. Note that it has the same units
as the original measurements.
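As an illustration of the calculation, the following Python sketch computes x̄ and s for a small set of made-up values (the numbers are arbitrary and are not taken from the chapter's datasets):

    import numpy as np

    x = np.array([10.2, 11.5, 9.8, 12.6, 10.9])           # hypothetical sample values
    x_bar = x.mean()                                       # sample mean, estimating mu
    s = np.sqrt(((x - x_bar) ** 2).sum() / (len(x) - 1))   # summed squared deviations divided by n - 1
    print(x_bar, s)
    print(x.std(ddof=1))                                   # ddof=1 uses the same n - 1 divisor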
The assumption that the parameters of the Normal distributions are known without
error is reasonable for the height data of Example 1 (where n=1794), but clearly the
sample size for the capping torque data (n=78) is such that we cannot expect that
the ‘true’ process mean or standard deviation is exactly equal to the sample value.
Methods have been developed (tolerance intervals) to allow for the uncertainty
involved in having only sample data when carrying out calculations like those of
the examples above.
² (n–1) is used instead of the sample size n for a mathematical reason; the deviations (before they are
squared) sum to zero, so once (n–1) of the deviations are known, the last one is determined by this
constraint; thus, only (n–1) deviations are free to vary at random – hence, ‘degrees of freedom’.
Obviously, unless n is quite small, effectively the same answer is obtained if we get the average by
dividing by n.
The statistical methods that will be discussed in later chapters assume that the data
come from a Normal distribution. This assumption can be investigated using a
simple graphical technique, which is based on the type of calculation discussed in
the previous section. Here, for simplicity, the logic of the method will be illustrated
using a sample of 10 values, but such a sample size would be much too small in
practice, as it would contain little information on the underlying shape, particularly on
the shapes of the tails of the curve.
The relation between any arbitrary Normal value x and the corresponding standard
Normal value z is given by:
z = (x − µ)/σ
or, equivalently,
x = µ + σz
If x is plotted on the vertical axis and z on the horizontal axis, this is the equation of a
straight line with intercept µ and slope σ, as shown in Figure 1.3.1. Accordingly, if
we make a correspondence between the x values in which we are interested and z
values from the standard Normal table and obtain a straight line, at least
approximately, when they are plotted against each other, we have evidence to
support the Normality assumption.
Figure 1.3.1: The straight line x = µ + σz plotted against z
Using this method for establishing the correspondence between x and z will always
result in the largest x value corresponding to z = ∞ (which presents a challenge when
plotting!). To avoid the infinity, some small perturbation of the calculation of the
fraction i/n is required, such that the result is never i/n=1. Simple examples of
possible perturbations would be (i–0.5)/n or i/(n+0.5). Minitab has several options;
(i–3/8)/(n+1/4), which is based on theoretical considerations, is used in Table 1.3.1.
The areas which result from this are shown in column 5 and the corresponding z
values (Normal scores) in column 6. The choice of perturbation is not important: all
that is required is to draw a graph to determine if the values form an approximately
straight line. We do not expect a perfectly straight line even for truly Normal data,
due to the effect of sampling variability. Thus, 12.6 is the ninth largest value in our
dataset, but any value between 13.1 and 12.2 would also have been the ninth
largest value. Consequently, 12.6 is not uniquely related to 0.841.
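The Normal scores themselves are easy to compute. The sketch below uses ten hypothetical values (only 12.2, 12.6 and 13.1 are actually quoted in the text; the rest are invented for illustration) together with the (i – 3/8)/(n + 1/4) plotting positions mentioned above:

    import numpy as np
    from scipy.stats import norm

    x = np.sort([8.3, 9.6, 10.1, 10.4, 10.9, 11.3, 11.8, 12.2, 12.6, 13.1])  # ordered sample
    n = len(x)
    i = np.arange(1, n + 1)          # ranks of the ordered values
    area = (i - 3/8) / (n + 1/4)     # perturbed cumulative fractions, never equal to 1
    score = norm.ppf(area)           # Normal scores: z values with these areas to their left
    print(area[8])                   # the ninth ordered value corresponds to an area of about 0.841
    # plotting x against score (e.g. with matplotlib) produces the Normal plot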
Figure 1.3.2 shows the Normal plot (or Normal probability plot, as it is also called) for
our dataset – a ‘least squares’ line is superimposed to guide the eye (we will discuss
such lines later). The data are close to the line (hardly surprising for an illustrative
example!).
Figure 1.3.2: Normal plot of the illustrative dataset (X plotted against Normal score)
The schematic diagrams in Figure 1.3.3 indicate some patterns which may be
expected when the underlying distribution does not conform to the Normal curve.
Plots similar to Figure 1.3.3(a) will be encountered commonly, as outliers are often
seen in real datasets. Figure 1.3.3(c) might, for example, arise where the data
represent blood measurements on patients - many blood parameters have skewed
distributions. Figures 1.3.3(b) and (d) are less common but (b) may be seen where
the data are generated by two similar but different sources, e.g. if historical product
yields were being studied and raw materials from two different suppliers had been
used in production. The term ‘heavy tails’ refers to distributions which have more
area in the tails than does the Normal distribution (the t-distribution is an example).
Heavy tails mean that data values are observed further from the mean, in each
direction, more frequently than would be expected from a Normal distribution.
Figure 1.3.4: Normal plot for SAT Verbal scores for 200 university students
Figure 1.3.5: Normal plot for capping torques for 78 screw-capped containers
(mean 10.84, StDev 1.351, N 78, AD 0.206, p-value 0.865)
Figure 1.3.6: Normal plot for urinary β-OHCS values for 100 obese females
Figure 1.3.7: Normal plot for Log10(β-OHCS) values for 100 obese females
(mean 2.742, StDev 0.1604, N 100, AD 0.250, p-value 0.738)
Figure 1.3.8 is a Normal plot of the women’s heights measurements of Figure 1.1.4.
Although it is very close to a straight line, it has a curiously clumped, almost stair-
like, appearance. The p-value is very small indeed. The reason for this is that the
measurements are rounded to integer values (centimetres) and, because the
dataset is so large, there are many repeated values. A Normal distribution is
continuous (the sampled values can have many decimal places) and few, if any,
repeated values would be expected to occur; the AD test detects this unusual
occurrence. To illustrate the effect of rounding on the plots, 1794 random numbers
were generated from a Normal distribution with the same mean and standard
deviation as the real data (the mean and SD of the sample generated are slightly
different from those specified, due to sampling variability).
Figure 1.3.8: Normal plot for the women’s heights measurements (cm)
Figure 1.3.9: Normal plot for 1794 simulated women’s heights measurements (cm)
Figure 1.3.9 shows a Normal plot for the data generated; as expected from random
numbers, the sample values give a very good Normal plot. The data were then
rounded to whole numbers (centimetres) and Figure 1.3.10 was drawn – its
characteristics are virtually identical to those of the real data, showing that the cause
of the odd line is the rounding; note the change in the p-value in moving from Figure
1.3.9 to Figure 1.3.10. Figure 1.3.11 shows a Normal plot for the Iris data of Figure
1.1.3 – the effect of the measurements being rounded to only one decimal place,
resulting in multiple petals having the same length, is quite striking.
Figure 1.3.10: Normal plot for rounded simulated women’s heights measurements
Figure 1.3.11: Normal plot for Iris Setosa petal lengths
(mean 1.464, StDev 0.1735, N 50, AD 1.011, p-value 0.011)
The assumption of data Normality underlies the most commonly used statistical
significance tests and the corresponding confidence intervals. We will use Normal
plots extensively in verifying this assumption in later chapters.
Often the quantities in which we are interested are themselves the result of
combining other quantities that are subject to chance variation. For example, a
journey is composed of two parts: X is the time it takes to travel from A to B, and Y
is the time to travel from B to C. We are interested in S, the sum of the two travel
times.
If we can assume that X is distributed with mean µX = 100 and standard deviation
σX = 10 (units are minutes), while Y is distributed with mean µY = 60 and standard
deviation σY = 5, what can we say about the total travel time, S = X + Y?
Addition Rules
If S = X + Y, then:
µS = µX + µY
The underlying mathematics shows that both the means and the variances (the
squares of the standard deviations) are additive. This means that the standard
deviation of the overall travel time, S, is:
σS = √(σX² + σY²)
We might also ask how much longer the trip from A to B is likely to take than the trip
from B to C (i.e., what can we say about D = X – Y?). To answer such questions we
need subtraction rules.
Subtraction Rules
If D = X – Y, then:
µD = µX – µY
Note that the variances add, even though we are subtracting. This gives:
σD = √(σX² + σY²)
The independence assumption is required for the rules related to the standard
deviation; it is not required for combining means.
Normality Assumptions
The rules given above hold irrespective of the distributions that describe the chance
variation for the two journeys, X and Y. However, if we can assume that X and Y are
Normally distributed, then any linear combination of X and Y (sums or differences,
possibly multiplied by constants) will also be Normally distributed. In such cases,
the calculations required to answer our questions are simple; they are essentially the
same as for Examples 1 and 2, once we have combined the component parts into
the overall measured quantity, and then found its mean and standard deviation using
the above rules.
Example 3
Given our assumptions above about the means and standard deviations of the two
journey times, X and Y, what is the probability that the overall journey time, S,
(i) will take more than three hours (S > 180 minutes)?
(ii) will take less than 2.5 hours (S < 150 minutes)?
σS = √(σX² + σY²) = √(10² + 5²) = √125 = 11.18
Overall travel times are, therefore, Normally distributed with a mean of 160 and
a standard deviation of 11.18, as shown schematically in Figure 1.4.1.
Figure 1.4.1: The standard Normal curve and the travel times curve
The probability that the overall travel time, S, exceeds three hours (180
minutes) is approximately 0.04, as illustrated schematically in Figure 1.4.1.
Figure 1.4.2: The standard Normal curve and the travel times curve
The probability that the overall journey, S, will take less than 2.5 hours (150
minutes) is 0.19, as illustrated schematically in Figure 1.4.2.
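Example 3 can be checked numerically as follows (a sketch only; scipy assumed to be available):

    from scipy.stats import norm

    mu_S = 100 + 60                                  # means add: 160 minutes
    sigma_S = (10**2 + 5**2) ** 0.5                  # variances add: sqrt(125) = 11.18 minutes
    print(norm.sf(180, loc=mu_S, scale=sigma_S))     # P(S > 180), about 0.04
    print(norm.cdf(150, loc=mu_S, scale=sigma_S))    # P(S < 150), about 0.19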
Example 4
Suppose that X (as given in the introduction, i.e. mean µX = 100 and standard
deviation σX = 10 (units are minutes)) is the time it takes a part-time student to travel
to college each evening and she makes ten such trips in a term.
(i) If T is the total travel time to college for the term, what are the mean and
standard deviation for T?
Since only one trip type is involved, we do not need the X sub-script. We have:
T = X1 + X2 + … + X10
µT = 10µ = 10(100) = 1000
σT = √(10σ²) = σ√10 = 31.62
Total travel time is Normally distributed with a mean of 1000 and a standard
deviation of 31.62, where the units are minutes.
(ii) What is the probability that her total travel time for the term will exceed 18 hours
(1080 minutes)?
Figure 1.4.3: The standard Normal curve and the total travel time curve
P(T > 1080) = P(Z > (1080 − 1000)/31.62) = P(Z > 2.53)
= 1 – P(Z < 2.53) = 1 – 0.9943 = 0.0057
The chance that her total travel time to college for the term will exceed 18
hours (1080 minutes) is only about 6 in a thousand, as illustrated schematically
in Figure 1.4.3. Note that the travel characteristics for travelling back from
college could be quite different (different times might mean different traffic
levels), so this would need examination if we were interested in her total
amount of college travel.
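The behaviour of the term total T can also be checked by simulation. The sketch below (arbitrary seed; numpy assumed) generates many terms of ten trips each and confirms the mean of 1000 minutes, the standard deviation of about 31.6 minutes and the small probability of exceeding 1080 minutes.

    import numpy as np

    rng = np.random.default_rng(1)                              # arbitrary seed, for reproducibility
    trips = rng.normal(loc=100, scale=10, size=(100000, 10))    # 100,000 simulated terms of 10 trips
    totals = trips.sum(axis=1)
    print(totals.mean(), totals.std(ddof=1))                    # close to 1000 and 31.62
    print((totals > 1080).mean())                               # close to 0.0057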
Exercises
1.4.1. A final year psychology student approaches a member of staff seeking information on doing a
PhD. In particular, she is concerned at how long it will take. The staff member suggests that
there are essentially four sequential activities involved: Literature Review (reading and
understanding enough of the literature in a particular area to be able to say where there are
gaps that might be investigated), Problem Formulation (stating precisely what will be
investigated, how this will be done, what measurement instruments will be required, what
kinds of conclusions might be expected), Data Collection (developing the necessary
research tools, acquiring subjects, carrying out the data collection, carrying out the statistical
analysis), and Writing up the results (drawing conclusions, relating them to the prior
literature, and writing the thesis).
Table 1.4.1: Estimates for the means and SDs of the four activities (weeks)
Assuming that this is a reasonably accurate description of the situation (in practice, of course,
the activities will overlap), what is the probability that the doctorate will take (i) less than 3 years
(156 weeks)?; (ii) more than 3.5 years (182 weeks)?; (iii) more than 4 years (208 weeks)?
It is rarely the case that decisions are based on single measured values – usually,
several values are averaged and the average forms the basis of decision making.
We need, therefore, to consider the effect of averaging on our data. We will discuss
this in the context of measuring the percentage dissolution of tablets after 24 hours
in a dissolution apparatus. The ideas are general though; precisely the same
statements could be made about examination results, prices of second-hand cars or
the pH measurements made on samples of lake water.
The data on which Figure 1.5.1(b) is based are no longer single tablet dissolution
values. Suppose we randomly group the many thousands of tablet measurements
into sets of four, calculate the means for each set, and draw a histogram of the
means. Four values selected at random from the distribution described by (a) will,
when averaged, tend to give a mean result close to the centre of the distribution. In
order for the mean to be very large (say greater than 92) all four of the randomly
selected tablet measurements would have to be very large, an unlikely occurrence3.
Figure 1.5.1(b) represents an idealised version of this second histogram. Of course
we would never undertake such an exercise in the laboratory, though we might do
so in a computer simulation.
³ If the probability that one tablet was greater than 92 was 0.05, the probability that all four would be
greater than 92 would be 0.05⁴ = 0.00000625!
Figure 1.5.1: Idealised distributions of (a) single tablet dissolution values and (b) means of four values
We will illustrate this interesting and important result using some data simulated from
a skewed distribution, specifically from a chi-square distribution with 7 degrees of
freedom (we will encounter the chi-square distribution later when we discuss
significance tests; here, it is used simply as a typically skewed distribution curve).
Figure 1.5.2 shows the theoretical curve, while Figure 1.5.3 shows a histogram of
200 values randomly generated from this distribution.
Figure 1.5.3: Histogram of 200 values randomly generated from a chi-square distribution with 7 df
Figure 1.5.4 shows the corresponding Normal plot, which has the characteristic
curved shape we saw previously for skewed distributions.
Figure 1.5.4: Normal plot of the 200 simulated values
Thirty values were randomly generated from this distribution and the mean was
obtained. This was repeated 200 times, resulting in a sample of 200 means. A
histogram of these means is shown in Figure 1.5.5, while Figure 1.5.6 represents a
Normal plot of the set of 200 means.
Figure 1.5.5: Histogram of 200 means, each based on 30 values from a chi-square distribution with 7 df
Figure 1.5.6: Normal plot of 200 means each based on 30 values from a chi-square curve
with 7 df
The effect of averaging has been particularly successful here in producing means
which closely follow a Normal distribution (not all samples will be quite so good).
The mathematical theory says that as the sample size tends to infinity, the means
become Normal; it is clear, though, that even with a sample size of 30 in this case,
the Normal approximation is very good.
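A simulation along these lines is easy to reproduce. The following sketch (arbitrary seed; numpy and scipy assumed) generates 200 means of 30 chi-square(7 df) values and provides the ingredients for the Normality check.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)                               # arbitrary seed
    means = rng.chisquare(df=7, size=(200, 30)).mean(axis=1)     # 200 means, each of 30 skewed values
    print(means.mean())                                          # close to 7, the mean of the chi-square(7) curve
    print(stats.anderson(means, dist='norm'))                    # AD statistic and critical values for a Normality check
    # stats.probplot(means, dist='norm') provides the points for a Normal plot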
The same symmetrical characteristics as seen for the data of Section 1.1 are
displayed in Figure 1.6.1, but it is different in one important respect – the picture is
based on 119 repeated measurements of the same pharmaceutical material,
whereas all our earlier examples were based on single measurements on different
people or objects. The material measured was a control material used in monitoring
the measurement process in a quality control analytical laboratory in a
pharmaceutical manufacturing company [See Mullins [7] Chapters 1, 2]. The
methods discussed in Section 1.3 show that a Normal curve is a good model for
these data, also.
⁴ Alanine aminotransferase (ALT) is an enzyme found in most human tissue.
The spread of measurements for System B is greater than that for A – A is said to
have better precision. The width of each curve is measured by the standard
deviation, σ; the shapes of the curves in Figure 1.6.2 mean that σA < σB. There is no
necessary relationship between the two parameters. There is no reason why µA
could not be bigger than µB, while σA was less than σB.
Figure 1.6.3 [7, 8] shows serum ALT concentrations for 129 adults. It demonstrates
that there is no necessary relationship between the shape of the curve describing
variation in the physical system under study and the shape of the curve that
describes variation in the measurement system used to measure the physical
entities. The ALT distribution (Figure 1.6.3(a)) is right-skewed, that is, it has a long
tail on the right-hand side, which means that some individuals have much higher
ALT levels than the majority of the population.
Figure 1.6.3: Distributions of ALT results (a) 129 adults (b) 119 assays of one specimen
Figure 1.6.3(b) shows the distribution of results obtained when one of the 129 serum
specimens was re-analysed 119 times. The shape of the distribution that describes
the chance analytical variability is very different from that which describes the overall
variation in ALT levels between individuals.
Figures 1.6.1 and 1.6.3 underline the importance of the Normal curve for the
analysis of measurement systems, as the graphs of Section 1.1 showed its
importance for describing chance variation in different physical populations.
Exercises
1.6.1. (i) An unbiased analytical system is used to measure an analyte whose true value for
the parameter measured is 100 units. If the standard deviation of repeated
measurements is 0.80 units, what fraction of measurements will be further than one
unit from the true value? Note that this fraction will also be the probability that a
single measurement will produce a test result which is either less than 99 or greater
than 101, when the material is measured. What fraction of measurements will be
further than 1.5 units from the true value? Give the answers to two decimal places.
(ii) Suppose now that the analytical protocol requires making either 2 or 3 measurements
and reporting the mean of these as the measured value. Recalculate the
probabilities for an error of one unit and compare them to those obtained for a single
measurement.
In this chapter we have discussed various aspects of the Normal curve as a model
for statistical variation. This final section illustrates the use of a graphical tool for
monitoring the stability of a system. This tool, the control chart, is based on the
properties of the Normal curve and is an example of a powerful practical tool based
on simple underlying ideas.
Statistical process control (SPC) charts were introduced for monitoring production
systems but, as we shall see, they have much wider applicability. Conceptually they
are very simple: a small sample of product is taken regularly from the process and
some product parameter is measured. The values of a summary statistic (such as
the sample mean) are plotted in time order on a chart; if the chart displays other than
random variation around the expected result it suggests that something has changed
in the process. To help decide if this has happened control limits are plotted on the
chart: the responses are expected to remain inside these limits. Rules are decided
upon which will define non-random behaviour.
Figure 1.7.1 shows a control chart for a critical dimension (mm) of moulded plastic
medical device components, each of which contained a metal insert purchased from
an outside supplier.
Twenty five samples of size n=5 were taken from the stream of parts manufactured
over a period of several weeks, with a view to setting up control charts to monitor the
process. The five sample results were averaged and it is the average that is plotted
in the chart. For now, the centre line (CL) and control limits (upper and lower control
limits, UCL and LCL) will be taken as given; later, the rationale for the limits will be
discussed.
Figure 1.7.1: X-bar control chart for the critical dimension (mm); centre line x̄ = 3.5537, UCL = 3.6713, LCL = 3.4361
The AIAG rules for signalling non-random behaviour are:
• one or more points outside the control limits;
• a run of seven points all above or all below the central line;
• a run of seven points in a row that are consistently increasing or consistently decreasing;
• any other obviously non-random pattern.
Runs of nine points, rather than seven, are also commonly recommended. The
same basic principle underlies all the rules: a system that is in statistical control
should exhibit purely random behaviour - these rules correspond to improbable
events on such an assumption. Accordingly, violation of one of the rules suggests
that a problem has developed in the process and that action is required. The
rationale for the rules is discussed below.
The manufacturing process to which Figure 1.7.1 refers appears, by these rules, to
be in statistical control. No points are outside the control limits and there are no long
runs of points upwards or downwards or at either side of the central line.
Figure 1.7.2: Control chart for an HPLC potency assay (%Potency); centre line x̄ = 94.150, UCL = 95.519, LCL = 92.781
Figure 1.7.2 shows a control chart for an HPLC potency assay of a pharmaceutical
product [7]. The data displayed in the chart were collected over a period of several
months. At each time point two replicate measurements were made on a control
material, which was just a quantity of material from one batch of the production
material routinely manufactured and then measured in the laboratory. These results
were averaged and it is the average that is plotted in the chart.
The average level of the measurement system to which Figure 1.7.2 refers appears,
by the AIAG rules, to be in statistical control. No points are outside the control limits
and there are no long runs of points upwards or downwards or on either side of the
central line. Accordingly, we can feel confident that the analytical system is stable
and producing trustworthy results.
Figure 1.7.3 illustrates the benefits of having control charts in use for both production
results and analytical control data, when product reviews are carried out.
Figure 1.7.3(a) shows potencies for a series of batches of a drug with a clear
downwards shift in results. Such a shift suggests problems in the production
system, but would often lead to production management questioning the quality of
the analytical results. It has to be someone else’s fault!
Figure 1.7.3: Simultaneous control charts for a production system (a) and the analytical
system (b) used to measure the potency of its product.
⁵ In fact, the problem here could be due to changes in an incoming raw material; so a third control
chart might well be worthwhile!
Figure 1.7.4: A control chart for the iron (Fe) content of a water sample (ppb); the centre line is the sample mean, 48.2, with LCL = 42.38
The diagram shows an analytical system which is clearly out of control; in fact, three
gross outliers were removed before this chart was drawn. The data were collected
retrospectively from the laboratory records; the laboratory did not use control charts
routinely. They are presented here because they exhibit the classic features of an
out-of-control system: several points outside the control limits, a run of points above
the centre line and a run of points downwards. There is an obvious need to stabilise
this analytical system.
There are two ways in which control charts are used, viz., for assessing
retrospectively the performance of a system (as in Figure 1.7.4) and for maintaining
the stability of a system, which is their routine use once stability has been
established. Where a chart is being used to monitor a system we would not expect
to see a pattern such as is exhibited between observations 17 and 32 of Figure
1.7.4, where 16 points are all either on or above the centre line. Use of a control
chart should lead to the upward shift suggested here being corrected before such a
long sequence of out-of-control points could develop.
Note that if the chart were being used for on-going control of the measurement
process, the centre line would be set at the reference value of 50 ppb. Here the
interest was in assessing the historical performance of the system, and so the data
were allowed to determine the centre line of the chart.
The centre line (CL) for the chart should be the mean value around which the
measurements are expected to vary at random. In a manufacturing context, this
could be the target value for the process parameter being measured, where there is
a target, but in most cases it is the mean of the most recent observations considered
to be ‘in-control’. Similarly, in a measurement context, the centre line will either be
the mean of recent measurements on the control material, or the accepted ‘true
value’ for an in-house standard or a certified reference material.
The control limits are usually placed three standard deviations above and below the
centre line, Figure 1.7.5(b); the standard deviation of individual measurements is
used if the points plotted are individual measurements, as in Figure 1.7.4, and the
standard error of the mean, if the plotted points are means of several measured
values, as in Figure 1.7.1. This choice may be based on the assumption that the
frequency distribution of chance causes will follow a Normal curve, or it may be
regarded simply as a sensible rule of thumb, without such an assumption. As we
have seen a distribution curve can be thought of as an idealised histogram: the area
under the curve between any two values on the horizontal axis gives the relative
frequency with which observations occur between these two values. Thus, as
shown in Figure 1.7.5(a), 99.74% of the area under any Normal curve lies within
three standard deviations (3σ) of the long-run mean (µ) and so, while the system
remains in control, 99.7% of all plotted points would be expected to fall within the
control limits.
Where only single values are measured (as frequently happens in process
industries, for example) the control limits are:
µ ± 3σ
To obtain the correct control limits when plotting averages, we simply replace σ by
σ/√n in the expressions given above and obtain:
µ ± 3σ/√n
where σ is (as before) the standard deviation of individual values. The chart based
on single measurements is sometimes called an ‘Individuals or X-chart’ while the
chart based on means is called an ‘X-bar chart’; the two charts are, however,
essentially the same. Note that in a laboratory measurement context, where n is
typically 2 or 3, the simplest and safest approach is to average the replicates and
treat the resulting means as individual values (see Mullins [7], Chapter 2).
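To make the formulas concrete, the sketch below reconstructs limits of the kind shown in Figure 1.7.1. The value of σ used is not given in the text; it is simply back-calculated from the printed limits, so treat it as illustrative only.

    import numpy as np

    centre, sigma, n = 3.5537, 0.0877, 5        # sigma is an assumed (back-calculated) value
    # Individuals (X) chart: centre +/- 3*sigma
    # X-bar chart: replace sigma by sigma/sqrt(n)
    ucl = centre + 3 * sigma / np.sqrt(n)
    lcl = centre - 3 * sigma / np.sqrt(n)
    print(lcl, ucl)                             # close to LCL = 3.4361 and UCL = 3.6713 of Figure 1.7.1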
The limits described above are known as ‘three sigma limits’, for obvious reasons;
they are also called ‘action limits’. They were first proposed by Shewhart in the
1920s.
The basis for the second AIAG rule (a run of seven points all above or all below the
central line) is that if the system is in control the probability that any one value is
above or below the central line is 1/2. Accordingly, the probability that seven in a row
will be at one side of the central line is (1/2)7=1/128; again, such an occurrence would
suggest that the process has shifted upwards or downwards. The third rule (a run of
seven points in a row that are consistently increasing (equal to or greater than the
preceding points) or consistently decreasing) has a similar rationale: if successive
values are varying at random about the centre line, we would not expect long runs in
any one direction.
The last catch-all rule (any other obviously non-random pattern) is one to be careful
of: the human eye is adept at finding patterns, even in random data. The advantage
of having clear-cut rules that do not allow for subjective judgement is that the same
decisions will be made, irrespective of who is using the chart. Having said this, if
there really is ‘obviously non-random’ behaviour in the chart (e.g., cycling of results
between day and night shift) it would be foolish to ignore it.
The assumption that the data follow a Normal frequency distribution is more critical
for charts based on individual values than for those based on averages. Averages,
as we saw in Section 1.5, tend to follow the Normal distribution unless the
distribution of the values on which they are based is highly skewed. In principle, this
tendency holds when the averages are based on very large numbers of
observations, but in practice samples of even four or five will often be well behaved
in this regard. If there is any doubt concerning the distribution of the measurements
a Normal probability plot (see Section 1.3) may be used to check the assumption.
Stuart [11] considered the problem of monitoring cash variances in the till of a retail
outlet – a sports club bar. At the end of each day's trading, the cash in the till is
counted. The till is then set to compute the total cash entered through the keys,
which is printed on the till roll. The cash variance is the difference between cash
and till roll totals (note that ‘variance’ as used here is an accounting term, which is
not the same as the statistical use of the same word, i.e., the square of the standard
deviation). Small errors in giving change or in entering cash amounts on the till keys
A control chart covering Sunday cash variances for a year is shown in Figure 1.7.6.
The variances are recorded in pounds (the data pre-date the Euro). The centre line
is placed at zero, the desired process mean. The control limits are placed at ±£12;
the limits were established using data from the previous year.
Figure 1.7.6: Control chart for Sunday cash variances (£) over one year; CL = 0, UCL = +£12, LCL = –£12
As may be seen from the chart, this process was well behaved for the most part,
apart from three points outside the control limits. For the first point, week 12, value
£14.83, the large positive cash variance indicates either too much cash having been
put in the till or not enough entered through the till keys. When the bar manager
queried this, one of the bar assistants admitted a vague recollection of possibly
having given change out of £5 instead of out of £10 to a particular customer. The
customer involved, when asked, indicated that she had suspected an error, but had
not been sure. The £5 was reimbursed.
In the second case, week 34, value –£22.08, an assistant bar manager who was on
duty that night paid a casual bar assistant £20 wages out of the till but did not follow
the standard procedure of crediting the till with the relevant amount.
The last case, week 51, value -£18.63, was unexplained at first. However, other out
of control negative cash variances on other nights of the week were observed, and
these continued into the following year. Suspicion rested on a recently employed
casual bar assistant. An investigation revealed that these cash variances occurred
only on nights when this assistant was on duty.
1.8 Conclusion
This chapter has introduced the concept of statistical variation – the fact that objects
or people selected from the same population or process give different numerical
results when measured on the same characteristic. It also showed that, where the
same object/person was measured repeatedly, different numerical results were
obtained, due to chance measurement error. In many cases, the variation between
numerical results can be described, at least approximately, by a Normal curve.
This curve will often describe the variation between means of repeated samples,
also, even where the raw data on which the means are based come from
distributions that are not themselves Normal. This fact is of great importance for
what follows: the most commonly used statistical procedures are based on means;
therefore, statistical methods based on Normal distributions will very often be
applicable in their analysis. For this reason, the properties of the Normal curve and
its use in some simple situations (e.g., developing reference intervals in medicine, or
use of control charts in industrial manufacturing or in monitoring measurement
systems) were discussed in the chapter.
1.2.1.
(a)
A weight of 85 mg corresponds to a Z value of –2.14 as shown below – the
standard Normal table gives an area of 0.0162 to the left of –2.14, so the area
to the left of 85 is also 0.0162.
z = (x − µ)/σ = (85 − 100)/7 = –2.14
P(X < 85) = P(Z < -2.14) = P(Z>2.14) = 1 – P(Z < 2.14) = 1 – 0.9838 = 0.0162
Approximately 1.6% of tablets will have weights less than 85 mg and, by symmetry,
about 1.6% will have weights greater than 115 mg, so about 3.2% of tablets are
outside these limits.
(b) The calculations for part (b) are carried out in exactly the same way as those
for (a).
If we inspect the standard Normal table to find the value that has an area of
0.99 below it, we find Z=2.33; therefore, -2.33 will have an area of 0.01 below it.
z = (K − 100)/7 = –2.33  =>  K = 100 – 2.33(7) = 83.7 mg
(ii)
z = (B − 100)/7 = 2.575  =>  B = 100 + 2.575(7) = 118.0 mg
and, by symmetry, A = 100 – 2.575(7) = 82.0 mg.
1.2.2.
(i) Calculate the bounds (a, b) for the 95% reference interval in the log
scale.
The bounds are 2.7423 ± 1.96(0.1604), i.e. a = 2.4279 and b = 3.0567.
(ii) Use your results to obtain the corresponding reference interval in the
original measurement scale: the bounds will be (10^a, 10^b).
The reference interval is (10^2.4279, 10^3.0567), i.e. approximately (268, 1139) nmol/24h.
Note also that the calculations above will be quite sensitive to the
number of decimal places used.
(iii) What value, U, should be quoted such that only 1% of women from this
population would be expected to have a -OHCS measurement which
exceeds U?
The critical value on the standard Normal curve which has 0.99 to the left and
0.01 to the right is 2.33. From the relation:
z = (x − µ)/σ
it is clear that the corresponding value for a Normal curve with mean µ = 2.7423
and standard deviation σ = 0.1604 is µ + 2.33σ = 3.1160. Only 1% of
the Log10(β-OHCS) values would be expected to exceed Log10(U) = 3.1160.
The corresponding value in the original measurement scale is U = 10^3.1160 =
1306.2.
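These reference-interval calculations are easily checked by computer; the sketch below (scipy assumed) uses the exact 1% point 2.3263 rather than the rounded table value 2.33, so the final answer differs slightly from 1306.2.

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 2.7423, 0.1604                  # mean and SD in the log10 scale
    a, b = mu - 1.96 * sigma, mu + 1.96 * sigma
    print(a, b, 10**a, 10**b)                   # reference interval in the log and original scales
    U = mu + norm.ppf(0.99) * sigma             # norm.ppf(0.99) is about 2.3263
    print(10**U)                                # about 1304 nmol/24h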
1.4.1. We begin by finding the parameters for the doctorate completion time, T:
µT = µL + µP + µD + µW = 176
σT = √(σL² + σP² + σD² + σW²) = 15.03
Since all four components of the total time are assumed to be Normally
distributed, the total time, T, will also have a Normal distribution.
P(T < 156) = P(Z < (156 − 176)/15.03) = P(Z < –1.33)
= 1 – P(Z < 1.33) = 0.09
The probability that the doctorate will take less than 3 years (156 weeks)
is 0.09.
P(T > 182) = P(Z > (182 − 176)/15.03) = P(Z > 0.3992)
= 1 – P(Z < 0.40) = 1 – 0.655 = 0.345
The chances of the postgraduate work taking more than three and a half
years (182 weeks) are about 0.35.
P(T > 208) = P(Z > (208 − 176)/15.03) = P(Z > 2.13)
= 1 – P(Z < 2.13) = 1 – 0.9834 = 0.0166
1.6.1.
(ii) We saw in Section 1.5 that means are less variable than single
measurements and, specifically, that the variability of means is given by
the standard error (the standard deviation for single values divided by the
square root of the number of values on which the mean is based). To
allow for the averaging in our calculations, therefore, we simply replace
the standard deviation by the standard error.
The probability that a mean of two measurements will exceed 101 is given
by:
P(X̄ > 101) = P(Z > (101 − 100)/(0.8/√2)) = P(Z > 1.77) = 1 – 0.9616 = 0.0384
so the probability of an error of more than one unit in either direction is 2(0.0384) ≈ 0.08.
The value of making even two measurements instead of one is clear – the
probability of an error of one unit has been reduced from 0.21 to 0.08.
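The same calculation for one, two and three measurements can be sketched as follows (scipy assumed); the probabilities of an error of more than one unit come out at roughly 0.21, 0.08 and 0.03.

    import numpy as np
    from scipy.stats import norm

    sigma = 0.80
    for n in (1, 2, 3):
        se = sigma / np.sqrt(n)              # standard error of the mean of n measurements
        p = 2 * (1 - norm.cdf(1 / se))       # P(reported value more than 1 unit from the true value)
        print(n, round(p, 3))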