04 - Mean, SD & Peer - David Plaut QC Article - Without Notes
No one would dispute that the simple answer to these questions is “the more the better.” While that is
indeed a simple answer, it does not really answer our question - we are looking for a useful and practical
number.
My first serious study of the first question began about 30 years ago when I was working with a short-dated
whole blood-based blood gas control. The customary answer of 20 to 30 points (*) was
expensive and impractical: since the product lasted only 60 days, one-third to one-half of the life of the
material would be depleted before the mean was established and ready to use for quality monitoring.
In order to answer the question from a practical standpoint, the group I was working with in Miami
examined the data from several lots of three levels of blood gas control using a form similar to that
shown in Figure 1. The data in the Cumulative Mean column indicate that, by and large, the mean for
these data becomes stable after some 8 data points. This study was published as an abstract for the
1978 national meeting of the American Association for Respiratory Therapy (1).
(*) I had occasion recently to go back and look at the original Levey and Jennings paper from 1954 and
found that while they referenced earlier work on charting procedures and statistics, they did not give any
reason why they chose to run their home-made pooled serum control in duplicate over 20 days!
Rev. 3/20/2005 1
Mean & Peer Group Data A Discussion of Three Common Questions in Quality Control
This material is adapted and enlarged from articles that appeared in Advance in March and April, 2005.
The author thanks Lynn Nace and the publishers of Advance for allowing me to use this material.
There are two aspects of this study that the group felt were important enough to explore in more detail.
First we knew that blood gas instruments, using no true reagents or reactions in the usual sense and
calibrated as they were with primary standards, were more precise than most other instruments in the
laboratory. Second, it seemed to us from the data on the three levels of control for a number of
parameters, including oxygen, carbon dioxide and pH, that there was little, if any, correlation between
the SD or the CV and the number of replicates needed to establish the mean.
With Minitab we could ask for as many data points that were normally distributed (bell-shaped) as we
wished, with whatever mean and SD we wanted, and look at the cumulative means for the data sets.
These data sets were generated hundreds of times with various means and SDs - all returned
the same conclusion: “Eight is enough.”
Figure 3 (normal distribution: 99.7% of values are within +/- 3SD of the mean)
Appendix A is an expanded set of data that leads to the same conclusion. In the table I have added the
% difference of the given mean from the mean after 30 replicates. Also in the table is the SDI which is
defined as
$$\mathrm{SDI} = \frac{\text{mean}_{n} - \text{mean}_{30}}{\text{SD}_{30}}$$
In general if the SDI is less than 1.0 there is little difference between the means. Another way to
evaluate the SDI in the table is to compare it to the SD of the data set for the mean when n = 30. If SDI
is less than SD for n = 30, there is little chance the means are different. Perhaps the most significant test,
though, is Student's t-test. The t value was calculated for the largest difference between the mean when n
was less than 30 and the mean when n = 30. In each case the t value was not statistically significant.
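The columns described above can also be computed outside a spreadsheet. The following Python sketch is one way to do so. It assumes that SD30 is the sample SD of all the raw data points, and the control values in the example are hypothetical, not taken from Appendix A.

```python
from statistics import mean, stdev


def cumulative_table(data):
    """Return rows of (n, value, cumulative mean, abs % diff, SDI).

    The % difference and the SDI are both measured against the mean of the
    full series (n = 30 in the article). The SD used for the SDI is assumed
    to be the sample SD of the full series.
    """
    final_mean = mean(data)
    final_sd = stdev(data)
    rows = []
    for n in range(1, len(data) + 1):
        cum_mean = mean(data[:n])
        pct_diff = abs(cum_mean - final_mean) / final_mean * 100
        sdi = (cum_mean - final_mean) / final_sd
        rows.append((n, data[n - 1], cum_mean, pct_diff, sdi))
    return rows


# Hypothetical control values, for illustration only.
example = [30.1, 29.8, 31.4, 29.5, 30.7, 29.9, 30.3, 29.3, 29.8, 29.8]
for n, value, cum_mean, pct_diff, sdi in cumulative_table(example):
    print(f"{n:2d} {value:6.2f} {cum_mean:6.2f} {pct_diff:5.2f} {sdi:6.2f}")
```

Reading down the printed cumulative mean column shows at which n the % difference and the SDI settle close to zero for your own data.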
As you may wish to explore these ideas on your own, we offer two simple ways that you can do this:
First take your own quality control data for the tests you wish to study and set up an Excel sheet as you
see in Figure 1 or Table 1. The formulas are given in Appendix B. Enter the raw data in the data point
column and the formulas as shown in Appendix B. Should you conclude that eight (or 7 or 9) points are
sufficient, save the data in case you want to demonstrate it to others (e.g. a laboratory inspector). It is
easy to generate quite a few means of n = 8 from a set of 30 points. Again this is outlined in
the appendix.
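As an illustration of generating several means of n = 8 from one set of 30 points, here is a minimal Python sketch. It takes every run of 8 consecutive points, which is only one possible approach and is an assumption on my part rather than the procedure from the appendix. The 30 points are simulated with a hypothetical mean of 100 and SD of 3.

```python
import random
from statistics import mean


def window_means(data, size=8):
    """Means of every run of `size` consecutive points in `data`."""
    return [mean(data[i:i + size]) for i in range(len(data) - size + 1)]


# Hypothetical data set: 30 simulated control values (mean 100, SD 3).
rng = random.Random(0)
points = [rng.gauss(100, 3) for _ in range(30)]

means_of_8 = window_means(points)
print(len(means_of_8), "means of n = 8")
print(f"lowest {min(means_of_8):.2f}, highest {max(means_of_8):.2f}, "
      f"mean of all 30 points {mean(points):.2f}")
```

Because consecutive windows share most of their points, these means are not independent of one another; they simply show how far a mean of 8 tends to sit from the mean of all 30.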
The second approach is to use a nifty tool available in Microsoft Excel – the random number generator.
In essence this algorithm will provide you with as many data points as you wish (say 30) with a given mean and SD.
Keep in mind that since these are random (Gaussian) numbers, the mean and SD returned will differ ever so
slightly from the ones you type in. But the exercise is certainly worth the few minutes it takes. Again we
suggest you save the results.
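The same exercise can be sketched in Python for readers who prefer a script to the Excel tool. The function names, the seed, and the 2% tolerance used to call two means "close" are all assumptions chosen for illustration; the article itself does not define a tolerance.

```python
import random


def cumulative_means(values):
    """Running mean after each successive data point."""
    total = 0.0
    out = []
    for n, value in enumerate(values, start=1):
        total += value
        out.append(total / n)
    return out


def fraction_within(target_mean, sd, n_early=8, n_final=30,
                    tol_pct=2.0, trials=1000, seed=1):
    """Fraction of simulated data sets whose mean at n_early lies within
    tol_pct percent of the mean at n_final (tol_pct is an assumed cutoff)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        data = [rng.gauss(target_mean, sd) for _ in range(n_final)]
        cm = cumulative_means(data)
        if abs(cm[n_early - 1] - cm[-1]) / cm[-1] * 100 <= tol_pct:
            hits += 1
    return hits / trials


# Example: mean 100 and SD 3, the same example values used in Appendix B.
print(fraction_within(100, 3))
```

Changing `target_mean`, `sd` and `n_early` lets you repeat the comparison for other control levels and other candidate values of n.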
Although the data certainly indicate that 8 replicates will suffice to provide a good estimate of the mean
for a new lot of control, there is an additional question to be discussed relating to this: Over what period
do I collect these eight data points? It would not be wise to obtain all 8 data points within a single run.
The best answer is over 8 days, although 3 or 4 days (or runs) will usually suffice. During this period,
when the new lot of control is being phased in, good practice is to run the ‘old’ control
concurrently. By running the old and new controls in parallel, any bad runs should be detected by the old
control and the data on the new control can be discarded. Another way that an outlier from the set of 8
can be detected is to use the following procedure.
The two approaches to the question of how many points it takes to establish the mean
for a new lot of control are pragmatic and based on calculations with data from laboratory QC programs
and similar computer-generated data. Another approach is based on the widely accepted method to
compare two means – the t-test. I mentioned this test earlier, when it was used to determine
whether the means of eight replicates were different from the mean when 30 points were used. It can
also be used to help determine n. This application is discussed in Statistical Methods in Analytical
Chemistry by Meier and Zünd (3) in their section titled “Number of Determinations.” As it is somewhat more
mathematical, I have added it as Appendix D.
Moving on to the second question: Once I have a mean, how do I establish an SD?
When the mean for the new lot of control has been established using say 8 replicates, the next task is to
establish a measure of the imprecision as measured by the SD for the new lot of control in order to set
QC limits. Again the traditional number is 20 – 30 replicates of the control. I have not found any real
statistical discussion of the source of these values for n. In the discussion above we developed the idea
that a mean for the new lot of the control can be set with as few as 8 data points. Few of us would
suggest using 8 replicates of any measurement in the clinical laboratory to establish a measurement of
the inherent variation (random error, SD) in an analytical system. The list of sources of random error in
such measurements would include calibration, new lots of reagents and calibrators, maintenance to the
instrument, line voltage fluctuations (that would create both temperature fluctuations and lamp
fluctuations), and variations between operators. There are certainly more. Given
this, one has to ask whether even as many as 30 points are sufficient to include all the sources of
random error. In other words, is the SD after 30 points really enough? Anyone who has said “My SD
is too tight” may be using an SD that is indeed too tight simply because it was based on too few data
points!
Figure 4
Figure 5
Given that more than 30 replicates may be better for establishing an SD (and CV), I suggest the following approach:
1. Find the mean for the new lot of control from the 8 replicates.
2. Find the cumulative CV from an interlaboratory survey (such as the data from the entire previous year on the previous lot of control).
3. Calculate the new SD using the formula SD (new) = CV (cumulative) * Mean (new from n = 8), dividing by 100 if the CV is expressed as a percentage.
Figure 6
This cumulative CV contains more of the sources of variation than that from 8 or 20 or 30 replicates and should
be a better overall statistic with which to set the QC limits. The CV is used as it is a measure of relative
precision and thus accounts for minor variation from lot-to-lot of control.
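The arithmetic of this approach is short enough to sketch in Python. The function name is hypothetical, the CV is assumed to be expressed as a percentage, and the use of mean +/- 2 SD as the QC limits is an assumption made for illustration; substitute whatever multiple your laboratory uses.

```python
def qc_limits(new_mean, cumulative_cv_pct, n_sd=2):
    """Derive an SD for a new lot from a cumulative CV, then set limits.

    new_mean          mean of the new lot (e.g. from 8 replicates)
    cumulative_cv_pct cumulative CV in percent from the previous lot
    n_sd              width of the limits in SDs (assumed to be 2 here)
    """
    sd_new = new_mean * cumulative_cv_pct / 100
    lower = new_mean - n_sd * sd_new
    upper = new_mean + n_sd * sd_new
    return sd_new, lower, upper


# Hypothetical example: new-lot mean of 100 with a cumulative CV of 3%.
sd_new, lower, upper = qc_limits(100.0, 3.0)
print(f"SD (new) = {sd_new:.2f}, limits {lower:.2f} to {upper:.2f}")
```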
Turning now to the third question: How many laboratory means are needed for a group comparison?
From the above discussion it should be apparent that the number of lab means needed to find a useful
mean for interlaboratory comparisons is probably fewer than expected. In looking for an answer to this
question we need first to keep in mind not only what the data I’ve discussed above have shown us, but
one of the prime tenets of statistics – the central limit theorem. This theorem simply says that the
distribution of means is narrower than the distribution of the raw data making up the means. See Figure
7. In looking at these data you will note that the range of the group of means (85 – 101) is narrower than
the range of the 5 laboratory means (81 – 102). This concept is plotted in Figure 8. It can also be shown
that the t-value for any of these means is not different from the group mean.
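The narrowing of the distribution of means can be checked with a small simulation. The sketch below is illustrative only: the population mean and SD, the number of points per mean, and the seed are hypothetical values. For normally distributed data the SD of the means is expected to be roughly the raw SD divided by the square root of the number of points in each mean.

```python
import random
from statistics import mean, stdev


def spread_of_means(pop_mean, pop_sd, n_per_mean=20, n_means=500, seed=7):
    """Return (SD of the raw data, SD of the means of those data)."""
    rng = random.Random(seed)
    raw = []
    means = []
    for _ in range(n_means):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n_per_mean)]
        raw.extend(sample)
        means.append(mean(sample))
    return stdev(raw), stdev(means)


# Hypothetical analyte: mean 90, SD 5, with 20 points behind each mean.
raw_sd, means_sd = spread_of_means(90, 5)
print(f"SD of raw data: {raw_sd:.2f}")
print(f"SD of means:    {means_sd:.2f}")
```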
Using a data set from a group of laboratories reporting CK on a control (Figure 9), I have first presented
all the unsorted data and in the next column I have calculated the cumulative mean as we did for the
data when establishing the mean for the control in a lab. From the data in the cumulative mean column
you note that at an n of 5 the mean is about 2% different from the mean after 60-some points. This
difference is not statistically significant, nor is it beyond what might be seen from the within-run precision
of the data for any of the laboratories or what might occur after changing reagents or recalibration.
The next table (Figure 10) uses data from an interlaboratory survey where the imprecision was somewhat
higher, as well as data from the Excel tool box. Figure 11 depicts the individual means of
participating laboratories, the group cumulative mean and the percentage difference between the cumulative
mean of the participants and the mean after 30 laboratories were included. Again the t-test did not
detect a significant difference between the cumulative mean after five laboratories were included and the
entire set of 30.
Figure 11 (plot of the individual laboratory means, the cumulative mean and the percentage difference
from the mean after 30 points; x-axis: Lab ID, y-axis: Mean Value)
It would seem from our discussion then that using the Excel tool box to generate data which represent
different laboratories’ means (as opposed to a single laboratory’s data on a control material) would serve
our needs to answer the question of how many labs’ means (data points) are necessary to find the group
mean. Again I suggest we look at Table I for the answer to this question. These data, as we have said
earlier, indicate that an n of 8 (or even 5) is sufficient to establish a mean to use in interpreting an
interlaboratory survey.
Note: It is worthwhile to keep in mind when assessing interlaboratory data from a QC program that one
of the main goals is to pass surveys by keeping the QC data within CLIA limits. Thus only the group
mean and the CLIA limits as reported in some Quality Assurance Programs are necessary to begin setting
QC limits. The group SD is not a necessary component.
provides as good as and often a better estimation of the new SD than 20 or 30 data points.
Appendix A.
ID  | Data   Mean   Abs % diff  SDI   | Data   Mean   Abs % diff  SDI   | Data   Mean   Abs % diff  SDI
    |               (n = 30)          |               (n = 30)          |               (n = 30)
1   | 30.08                           | 29.70                           | 28.12
2   | 29.79                           | 28.72                           | 27.34
3   | 31.45                           | 30.24                           | 27.09
4   | 29.51                           | 31.28                           | 27.93
5   | 30.69  30.23  1.26        0.32  | 31.20  30.23  1.26        0.32  | 31.95  28.49  4.44        -0.54
8   | 29.26  30.06  0.68        0.17  | 29.77  30.06  0.68        0.17  | 33.71  30.00  0.66        0.08
9   | 29.75  30.17  1.07        0.27  | 31.10  30.17  1.07        0.27  | 26.73  29.64  0.57        -0.07
10  | 29.84  30.05  0.65        0.16  | 28.91  30.05  0.65        0.16  | 31.22  29.80  0.04        0.00
15  | 29.68  29.63  0.74        -0.19 | 29.23  29.63  0.74        -0.19 | 31.62  29.77  0.14        -0.02
20  | 31.04  29.56  0.99        -0.25 | 29.63  29.56  0.99        -0.25 | 30.36  29.68  0.45        -0.05
25  | 30.15  29.66  0.64        -0.16 | 29.81  29.66  0.64        -0.16 | 29.83  30.04  0.78        0.09
30  | 29.04  29.85  0.00        0.00  | 29.35  29.85  0.00        0.00  | 25.35  29.81  0.00        0.00
Appendix B.
1. Generation of random numbers using Microsoft Excel.
a. Open Microsoft Excel
b. From the Menu Bar select Tools.
c. From the Tools menu select Data Analysis. Note: The Data Analysis package in Excel is not
automatically installed when the Excel program is first loaded. When selecting the Data Analysis
pack you may be asked for the Microsoft Office CD to complete the installation. Follow the
on-screen instructions to complete this task.
d. In Data Analysis select Random Number
Generator from the Menu that appears.
e. In Random Number Generator there are several cells
where data will be entered (see image below):
1. Under Number of Variables - enter 1.
2. Under Number of Random Numbers - select any
number you want (e.g. 30)
3. Under Distribution - select Normal
4. Under Mean and SD - enter the numbers of the
example mean/SD (e.g. 100 and 3)
5. Under Output enter the column and row in which
you want the first data point (random number) to
appear.
This can be repeated numerous times to build a population to expand on the ideas included in this
paper.
You may wish to add the absolute % difference (from the mean of n = 30) in the column to the right
of your cumulative SD (or cumulative mean if you did not add the column for SD). To do this use the
formula, beginning in cell E4, =ABS(100*(D4-C$30)/C$30). Again copy the formula down the column to
point number 30. Another interesting statistic is the SDI, which measures the difference between the
mean of say n = 8 and n = 30. You may have seen the SDI on your interlaboratory report. There, too, it
measures the difference between two means - in that case between your laboratory’s mean and the
group mean. Use the formula
=(C4-C$30)/STDEV(C1:C30).
Appendix C.
The following illustration shows how to best establish an SD using cumulative data over several months.
Appendix D.
Beginning with the t-test statistic
$$t = \frac{\text{mean} - \mu}{\text{SD of the mean}}$$
If we use the usual value of 2.0 for t, replace $(\text{mean}_{8} - \text{mean}_{30})/\text{SD}_{8}$ with SDI and solve for SDI, we
find that an SDI of 0.70 or greater indicates that the means are different. Referring to Figure 2 and
Table I, we note that this occurs only in rare cases, suggesting again that 8 replicates are sufficient.
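The 0.70 figure can be traced in a few lines. This is a sketch under two assumptions about the intended derivation: that "SD of the mean" is the standard error SD8 divided by the square root of n with n = 8, and that mean30 stands in for µ. Neither is stated explicitly in the text.

```latex
\begin{align*}
t &= \frac{\text{mean}_{8} - \text{mean}_{30}}{\text{SD}_{8}/\sqrt{8}}
   = \sqrt{8}\cdot\frac{\text{mean}_{8} - \text{mean}_{30}}{\text{SD}_{8}}
   = \sqrt{8}\cdot\text{SDI} \\
\text{SDI} &= \frac{t}{\sqrt{8}} = \frac{2.0}{\sqrt{8}} \approx 0.71
\end{align*}
```

Under these assumptions the threshold scales as 2.0 divided by the square root of n, so a smaller n gives a larger SDI cutoff.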
References:
1. Herring K, Matthews H, Pulwer E. Paper presented at the American Association for Respiratory
Therapy, 1978.
2. Surkin K, Hershberger D. Clinical Chemistry 43(6): S140-141, 1997.
3. Meier P, Zünd R. Statistical Methods in Analytical Chemistry. Wiley Interscience, NY, 2000.
David is at [email protected]