Lab Statistics
Fun and Easy
Fifth Edition
by
David G. Rhoads
David G. Rhoads Associates
A Data Innovations brand
120 Kimball Ave, Suite 100
South Burlington, VT 05403
(800) 786-2622 and (802) 658-1955
www.datainnovations.com
Table of Contents
Preface
Copyright Notice................................................................................................................. i
Acknowledgements............................................................................................................. i
Preface for the Fifth Edition.............................................................................................ii
History of Our Company..................................................................................................ii
Clinical Laboratory and Standards Institute Notice.....................................................iii
Chapter 1
Introduction
Chapter 2
Effects of Bad Lab Results
Case of the Disappearing Patients.................................................................................2-2
Case of the Congressional Hearings..............................................................................2-2
Case of the Failed Hospital............................................................................................2-2
Case of the Mistaken Murder Charge..........................................................................2-3
How Lab Error Contributes to Higher Health Care Costs..........................................2-3
Key Role of Clinical Laboratory ..................................................................................2-4
Why include these cases in a Lab Statistics Manual?.................................................2-4
It’s the Culture, Stupid!!................................................................................................2-5
Chapter 3
Relevant Highlights of CLIA ‘88 Regulations
Startup Requirements ...................................................................................................3-2
Periodic Requirements ..................................................................................................3-4
Performance Standards..................................................................................................3-5
Requirements of Other Deemed Organizations...........................................................3-5
Chapter 4
Statistics 101
Major Statistical Concepts.............................................................................................4-2
Statistical Terms used in Clinical Laboratories...........................................................4-8
Chapter 5
Understanding Error and Performance Standards
Experimental Error .......................................................................................................5-1
Concepts of Error...........................................................................................................5-2
Performance Standards (i.e. Total Allowable Error)...................................................5-5
QC Failure.......................................................................................................................5-6
Error Profiles...................................................................................................................5-8
Error Budgets..................................................................................................................5-9
Calculation of Allowable Systematic Error................................................................ 5-11
Assessing Uncertainty................................................................................................... 5-11
Chapter 6
Defining Performance Standards
Performance Standards Defined...................................................................................6-3
Defining PS for Established Methods..........................................................................6-10
Comprehensive Approach for Defining PS.................................................................6-10
Low End Performance Standards............................................................................... 6-11
Limits defined by Performance Standards................................................................. 6-11
Defining Performance Standards - Case Studies....................................................... 6-11
Chapter 7
Managing Quality Control
Target Mean ...................................................................................................................7-2
Target SD ........................................................................................................................7-3
QC Rules..........................................................................................................................7-4
Tips on Managing QC ...................................................................................................7-5
In the Event of QC Failure.............................................................................................7-5
Average of Normals........................................................................................................7-6
Chapter 8
Performance Validation Experiments
Startup Requirements....................................................................................................8-2
Recurring Requirements................................................................................................8-3
Before You Begin. . .........................................................................................................8-4
Chapter 10
Interpreting Linearity Experiments
Definition of Some Statistical Terms...........................................................................10-2
Linearity .......................................................................................................................10-2
Accuracy........................................................................................................................10-5
Reportable Range.........................................................................................................10-7
Calibration Verification................................................................................................10-8
Case Studies...................................................................................................................10-8
Case 1: A Non-Linear Case..........................................................................................10-9
Case 2: Inaccurate Results.........................................................................................10-10
Case 3: Failures Due to Inappropriate Specifications............................................. 10-11
Chapter 11
Understanding Proficiency Testing
Regulatory Requirements............................................................................................ 11-1
Theoretical Approach................................................................................................... 11-2
Bias................................................................................................................................. 11-3
Calculating the Probability of PT Failure.................................................................. 11-4
Statistics......................................................................................................................... 11-7
Strategy to Pass Proficiency Testing............................................................................ 11-9
Proficiency Testing Report......................................................................................... 11-11
Chapter 12
Precision Experiments
Simple Precision Experiment.......................................................................................12-2
Simple Precision Report (Page 1)................................................................................12-3
Complex Precision Experiment...................................................................................12-4
Complex Precision Report (Page 1)............................................................................12-5
Complex Precision Report (Page 2)............................................................................12-6
Chapter 13
Understanding Reference Intervals
Key Concepts.................................................................................................................13-1
Sources of Medical Decision Points.............................................................................13-2
Verifying vs. Establishing vs. Neither.........................................................................13-3
Verifying a Normal Range .......................................................................................13-3
Establishing a Normal Range......................................................................................13-5
MDP’s which are “Cast in Stone”...............................................................................13-7
Outliers..........................................................................................................................13-8
Accuracy and Precision for Normal Ranges..............................................................13-9
Interpreting the Reference Interval Report.............................................................13-10
Case 1: An Uncomplicated Example ........................................................................13-15
Case 2: Skewed Data.................................................................................................13-16
Case 3: Effect of Number of Specimens....................................................................13-17
Case 4: Partitioning (by Gender, Race, etc.)............................................................13-19
Case 5: Effect of Outliers...........................................................................................13-20
Case 6: Effect of Tails.................................................................................................13-22
ERI Report showing Effect of Tails (no Truncation)...............................................13-25
ERI Report showing Effect of Tails (with Truncation)...........................................13-26
Appendix A
Published Performance Standards
Medical Requirements...................................................................................................A-4
Appendix B
Technical CLIA ‘88 Regulations
Subpart K - Quality Systems for Nonwaived Testing . ............................................. B-1
Selected Interpretative Guidelines............................................................................. B-17
Appendix C
Glossary
Appendix D
Bibliography
Appendix E
Our Products
EP Evaluator®, Release 10........................................................................................... E-1
Instrument Manager®.................................................................................................. E-5
Preface
Copyright Notice
This manual is copyrighted 1996 - 2012 by Data Innovations, LLC. All rights are
reserved worldwide. No part of this book may be reproduced, transmitted, tran-
scribed or translated into any language by any means without the express written
consent of Data Innovations, LLC.
Acknowledgements
My wife, Elizabeth A.L. Rhoads, has as always, been very helpful in the produc-
tion of this book. She remains a tower of support.
Gregory R. Vail and David G. Potter, founders of Data Innovations. They deserve
recognition both for their vision that quality in clinical laboratories is important,
and for their commitment to help laboratories achieve that quality.
If you, the reader, have comments or criticism about this book, please share them
with us. In that way, we will be able to improve the next version.
The changes between the Third and Fourth editions were relatively few, but of
major consequence. Their genesis was the changes that we have made in the pre-
sentations for our workshops. This book now better reflects those changes which
were made as part of our process to improve the presentation of clinical labora-
tory statistics.
David G. Rhoads is the individual primarily responsible for the overall design of
EE. He has had 9 years of experience as a hospital based clinical chemist. He has
a Ph.D. in Biochemistry from Brandeis University, is Board Certified in Clinical
Chemistry (DABCC) and has been a member of the AACC since 1975. He is very
concerned with the quality of the work coming from clinical laboratories.
Our customers and partners include large IVD vendors such as Roche Diagnos-
tics, Abbott Laboratories, Sysmex-America and Beckman-Coulter; reference
labs including Quest Diagnostics, and LabCorp plus many hospitals and medical
centers, large and small throughout the world.
For updates of the Standards and/or Guidelines incorporated into this software and
for information about other CLSI publications, the user may write to CLSI at 940
West Valley Road, Suite 1400, Wayne, PA 19087-1898, may call 610-688-0100,
may e-mail CLSI at [email protected], or may send a fax to 610-688-
0700.
Chapter 1
Introduction
Our point, which we will proceed to demonstrate to you, is that the field of Clinical
Laboratory Statistics as we enter the 21st Century is not rocket science.
Chapter 2
Effects of Bad Lab Results
In This Chapter
Poor quality assurance practices in the clinical laboratory can cause real trouble.
We discuss:
• A number of real cases which have ranged from merely expensive to disas-
trous.
• An estimate of how much poor QC practices can cost.
Many people assume that as long as a lab continues to pass its proficiency testing,
there will be no problems. Not true!! Not only can there be negative effects with
respect to patient care, but also with respect to the laboratory and hospital. Here
are several real cases which demonstrate real problems that can occur when labs
produce erroneous results. In all these cases, the laboratory passed proficiency
testing. These documented cases resulted in:
• Patient death
• Hospital failure
• Loss of business
• Loss of management jobs
• Higher costs for the health care system
• Lawsuits against the laboratory
It turns out that this lab was defining its target SD values for the chemistry elec-
trolytes using the mean SD from the BioRad QC results. The reason this is a prob-
lem is that the mean SD is calculated across all the available SD’s. Consequently
it will be 2-3 times what an appropriate value should be.
As soon as the University of Maryland Medical System realized that it had a prob-
lem, to its credit, it immediately replaced senior management both for the hospital
and for the laboratory. In other words, several people lost their jobs because the
laboratory for which they were responsible had such a poor quality assurance
record. (Baltimore Sun - 2004)
The laboratory then attempted to re-test the approximately 1500 patients previ-
ously tested. It was able to find 460 of the patients on which the HIV testing was
done during this period and repeated the test at no charge to the patient. The CEO
who was testifying before the Congressional Subcommittee proudly announced
that 99.6% of the results were the same as before.
If you do the math, you will find that two patients changed. The results for one of
those patients changed from HIV positive to HIV negative. That patient has sued
Maryland General for $5,000,000.
The specimen was rechecked by the lab and again the concentration of the drug
was found to be lethal. Later the specimen was sent to other labs and was found
to be in the therapeutic range. At that point 3 months after the initial indictment,
the charges against the mother were dropped. Potential consequences for the lab
include revocation of its license to operate as well as a lawsuit brought by the
child’s mother.
If the average additional cost for each of these incorrectly diagnosed patients was
only $1,000, then these incorrect results would result in an additional $1.2 million
cost per year for the health care system.
The annual budget for a lab this size is approximately $3 million, so this cost
amounts to 40% of the total budget for a lab performing this volume of work.
One important observation is that if the lab makes a mistake, with few exceptions,
the costs of that mistake are not paid by the laboratory, but instead by some other
cost center. In other words, as far as the lab is concerned, these errors are silent
and lab personnel are usually not aware of the errors or costs that arise from their
work.
I have had many other trips on this airline in which my baggage was not lost.
Perhaps we could say that I have 95% confidence that my luggage would be avail-
able for pickup at my destination. By implication then, for 5% of my trips on this
airline, my luggage will be lost. That is not a happy prospect.
Could it be that this airline has a culture which accepts this degree of failure?
Similarly, one can argue that laboratories have a culture which accepts failure.
The QC rules are designed so that some failure is acceptable. Witness the 2-2s
rule which says that failure occurs when 2 consecutive QC results are outside
the 2 SD limits. If at least one result is inside the limits, the process is officially
in control. One common approach in dealing with this event with 2 of 2 results
outside the limits is to repeat both QC specimens. If during the second analysis, at
least one result is acceptable, then of course one can accept the run and the prob-
lem has gone away at least for the moment.
The fundamental problem is that we have to decide which of the many failure
conditions are not acceptable. This problem is compounded by the fact that in the
stressful world of the clinical laboratory, it is easier (and often within the rules) to
ignore problems than it is to fix them.
Chapter 3
Relevant Highlights of CLIA ‘88 Regulations
In This Chapter
CLIA ‘88 technical regulations define tasks which laboratories are required to do
as part of their quality assurance and quality control program. We discuss:
• The groups of clinical laboratory methods and which tests are included in each
group.
• The tasks required to validate new methods.
• The tasks required to validate methods on a recurring (semi-annual) basis.
• How performance standards (specifications) are to be used.
Complete sections of the relevant technical requirements in the “Final Rule” are
in Appendix B of this manual.
There are two types of technical requirements: a) those that must be done before
results are reported for a new method (Startup); and b) those that must be done
periodically, at least semi-annually (Periodic).
Waived methods: These are methods which are so simple that no one in their
right mind can screw up the results. Some of these methods are sold over the
counter in drug stores. The federal government has reviewed each of these
methods and has waived them. This class of method will not be discussed
further in this book.
Physician Performed Microscopy: A concession to the physician’s lobby. This
covers procedures performed by a physician with his microscope. This class of
methods will not be discussed further in this book.
Unmodified methods: These are methods adopted by a laboratory which are “un-
modified, FDA-cleared or approved test systems.”
Modified methods: These are methods adopted by a laboratory which either are
“home-brew” methods or are modified versions of approved methods.
The CAP has taken the position that all methods are to be treated as “Modified
Methods”. For those labs inspected by the CAP, the regulations pertaining to
Modified Methods apply.
Unmodified methods
The regulations [493.1253(b)(1)] for the methods cleared by the FDA are very
clear. They provide that the laboratory must “demonstrate that it can obtain
performance specifications comparable to those established by the manufacturer”
for accuracy, precision and reportable range. The lab must also “verify that the
manufacturer’s reference intervals (normal values) are appropriate for the
laboratory’s patient population.”
Modified methods
The requirements for “everything else” are significantly more rigorous
[493.1253(b)(2)]. Prior to reporting patient test results, a laboratory must
establish performance specifications for each method, as listed below.

Must be demonstrated using an experiment:
• Accuracy
• Precision
• Reportable range of patient test results
• Reference range(s)
• Sensitivity (e.g. the lowest result which can be reported)

May be documented in the laboratory procedure manual:
• Specificity (interfering substances)
The second set of regulations specifies clearly that validation of the set of instru-
ment performance parameters must be done in all cases for both modified and
unmodified methods. This is a significant change for hematology instruments for
which a method comparison experiment was all that was done in most cases to
satisfy accuracy and reportable range requirements.
Another issue is that while many vendors verify the instrument or method for
their customer, usually that service only takes the form of demonstrating accuracy,
precision and reportable range. They usually do not verify the reference interval.
This means that it is up to the individual labs to verify or establish their reference
intervals.
Verification of reference intervals can be done in two ways: (1) with an explicit
reference interval experiment (establish or verify), or (2) with a method
comparison experiment which demonstrates that there is no difference in the
medical decision points from the previous method. I know that very few labs do
the experiments to establish therapeutic ranges for drugs or do the experiments
to (re-)establish medical decision points for analytes such as PSA (where the
traditional cutoff is 4 ng/mL and no “normal range” is published).
Calibration Verification
CalVer “is required to substantiate the continual accuracy. . . throughout the labo-
ratory’s reportable range of test results for the test system.” CalVer must be done
as follows:
• Following manufacturer’s CalVer instructions.
• Using criteria specified by the laboratory. Materials must include specimens
with a minimal or zero value, a mid-range value, and a maximum value near
the upper limit of the reportable range.
• CalVer must be performed at least every six months, or whenever one of the
following occurs:
• A complete change of reagents, unless the laboratory can demonstrate
that no changes in the analytical system have occurred.
• Major preventative maintenance or replacement of critical parts which
may affect instrument performance.
• Control materials reflect an unusual trend or shift, or are outside of the
laboratory’s acceptable limits, and other means of assessing and
correcting unacceptable control values fail to identify and correct the
problem.
• “The laboratory’s established schedule for verifying the reportable
range for patient test results requires more frequent calibration
verification.”
• All calibration and CalVer procedures must be documented.
If a laboratory performs tests not among the 75 analytes for which proficiency
testing is required, the laboratory must have a system for verifying the accuracy of
its test results at least twice a year. [493.1281]
While the details of the inspection process vary substantially depending on
the organization performing the inspection, all the organizations must meet or
exceed the requirements specified in CLIA ‘88. Each of these organizations has
its own checklist of requirements. Of these checklists, perhaps the CAP’s is the
most rigorous. The following table compares the requirements of CLIA ‘88 and
CAP.
Chapter 4
Statistics 101
In This Chapter
Statistics are the lifeblood of quality assurance in the clinical laboratory. We
discuss:
• Purpose and philosophy of statistics in the clinical laboratory.
• Lists of major statistical measures.
• Concepts of central tendency and dispersion.
• Significance of various degrees of dispersion.
• Criteria for detection and elimination of outliers.
• Definitions of statistical terms used in clinical laboratories.
One of the great dangers of statistics is that they can be used to distort or conceal
the realities of a situation. Politicians often do this deliberately. Laboratorians
often do this because of their ignorance of how a given situation is best described
statistically. Our effort in this book will be to describe the uses of statistics and
error with respect to:
• Understanding how error concepts describe the performance of clinical labo-
ratory methods.
• Understanding the basis for performance standards.
• Understanding how performance standards are established.
• Evaluating clinical laboratory methods with respect to performance standards.
First, we must understand the basic concepts of statistics because they are used to
describe the reality of clinical laboratory tests, namely uncertainty and error. Only
after we have learned the relationship of statistics to error and uncertainty can we
understand how to correctly evaluate, describe or establish appropriate perfor-
mance of a method.
Statistics are useful because they allow us to predict future performance to a
specified probability. We can predict the range within which results are expected
from a series of events. However, predicting the exact outcome of an event is like
hitting the lottery - pure luck.
For example, we can reliably predict that the values we obtain from repeated
assay of a specimen for LDH will be in the range of 31 to 39. We cannot predict
with any degree of assurance that the first specimen will have a value of 38 and
the second a value of 36.
Fact: John Kruk, first baseman for the 1993 Philadelphia Phillies, made an out
66% of the times he came to bat.
This would seem to be terrible performance with failure two-thirds of the time.
Certainly it would be terrible for a professional basketball player shooting foul
shots (typically 20 to 30% failure). However, if you put this in context of what
other players have achieved, Kruk had a wonderful season. The 66% figure
corresponds to a batting average of 0.340. He was in the top 3 percent of hitters
that year. Typical team batting averages are about 0.270. It is a significant
achievement to have a batting average over 0.300. The best batting average of
Ted Williams, one of the great batters of all time, was 0.406, only 20% greater
than Kruk’s average.
The major statistical concepts are:
• Central tendency
• Dispersion
• X-Y Relationships

[Figure: Levey-Jennings chart of SDI versus Day, illustrating central tendency
and dispersion. The dispersion in this case is described by the Standard
Deviation.]
X-Y Relationships
Examples:
• Linear Regression: Height-age charts for children
• Fitted Lines: Weight-age charts for adults
• Levey-Jennings Charts

[Figure: X-Y scatter plot with a fitted line.]
One thing to keep in mind is that there are many different descriptors of the cen-
tral tendency. Use of an inappropriate descriptor can be misleading particularly if
the distribution is highly skewed.
Mean: This statistic is calculated from the sum of all elements in the set
divided by the number of elements. It is meaningful when the distribution of
the numbers is not highly skewed. For example, the mean age of all the people
present in a hospital nursery when the nurse is present is much higher than
when only the infants are present. If only infants are present, the mean age is
several days. Whenever the nurse is present, the mean age becomes years.

Average: See mean.

Figure 4.1: Age (years) in the nursery example

          Infants Only    Infants & Nurse
Mean          0.02              1.3
Median        0.02              0.02

Median: The median is the middle of an ordered list of results. If the number
of items (N) is odd, then it is the middle item. If N is even, it is the average
of the middle two items. In a skewed distribution, it better represents the
central tendency than the mean. In the nursery example above, the median age
would change little whether the nurse is present or not. In both cases, the
median age would be a few days at most.
Linear regression line: One type of central tendency in an X-Y relationship. In
this case, the line fitted through the pairs of results is the central tendency.
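To make the nursery example above concrete, here is a minimal sketch in Python. The ages are hypothetical stand-ins, not data from this book.

    from statistics import mean, median

    # Hypothetical ages in years: five newborns, then add one adult nurse.
    infants = [0.01, 0.02, 0.02, 0.03, 0.02]
    with_nurse = infants + [32.0]

    print(mean(infants), median(infants))        # both about 0.02 years
    print(mean(with_nurse), median(with_nurse))  # mean jumps to ~5.35; median stays 0.02

The single adult drags the mean up by a factor of more than 250, while the median barely moves. That robustness is exactly why the median is the better descriptor for skewed distributions.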
Since statistics only have real meaning when viewed in context, many techniques
describe the distribution of the data and the regions of confidence we have in
them.
Keep the food fat analogy in mind: If the Häagen-Dazs ice cream container
claims that its contents are 85% fat free, you can expect a 15% fat content
(100% - 85% = 15%). Similarly, with a 95% confidence limit, the number is
expected to fall within that range 95% of the time. In other words, 19 times
out of 20, the expectation will be correct. One time out of 20, it will be
wrong.
2 SD Range: The 2 SD range, more accurately, is the interval from the mean -
2SD to mean + 2 SD. It has a 4SD width. It is another form of the 95% Confi-
dence Interval. To be more precise, the 95% CI is a 1.96 SD range. Converse-
ly, the 2 SD range is a 95.5% CI. For practical purposes, these two concepts
are identical.
Standard Error of the Estimate describes the dispersion of data in the environ-
ment of an X-Y relationship, namely the dispersion of data around a linear
regression line.
Table 4.1 illustrates several aspects of dispersion. The three sets of data all have
means of 46. However their distributions are very different. The data in Set 1 has
a range of 3 (from 45 to 48), the data in Set 2 a range of 15 (from 38 to 53) and
the data in Set 3 a range of 25 (from 33 to 58).
Table 4.1
Examples of Dispersion

            Set 1         Set 2         Set 3
            45            43            41
            45            47            53
            47            52            38
            46            46            58
            45            38            33
            48            53            56
            45            42            43
Average     46.0          46.0          46.0
SD          1.2           5.2           9.7
%CV         2.5%          11.2%         21.0%
Range       45 to 48      38 to 53      33 to 58
95% CI      43.7 to 48.3  35.9 to 56.0  27.1 to 65.0
Ratio (1)   1.11          1.56          2.40

(1) The ratio is the upper 95% confidence limit divided by the lower 95% confi-
dence limit.
Keep in mind that widening the dispersions of data means that the significance of
individual results can change. All the numbers within the 95% CI are identical
statistically. In the case of set 1, these statistically identical numbers range from
43.7 to 48.3. For set 3, they range from 27.1 to 65.0. The meaning of a value of
35 is very different when viewed from the perspective of the 95% CI of set 1 ver-
sus that of set 3. When viewed from the perspective of set 1, it is a clear outlier.
When viewed from the perspective of set 3, it is statistically identical to the other
numbers in the set.
The most important thing to realize is that in the clinical laboratory environment,
since all results within a 2 SD range of the mean are acceptable, all results in that
range are statistically identical.
Outliers are points obtained in a set of data which for some reason are considered
to not be representative of that set. The usual reason for declaring a point an out-
lier is because it represents an error.
Example: 45 results were obtained for a precision study from the XYZ analyzer.
44 of the results were in the range of 85 to 97. The 45th result was 81. Examina-
tion of the sample cup for the 45th result showed it to be empty. The other sample
cups had small amounts of specimen remaining in them. Since it can be shown
that the suspected outlier was due to a short sample, it can be legitimately exclud-
ed from the calculation.
Detection of Outliers
Determining whether points are outliers requires honesty and integrity. Otherwise
the quality of the resulting statistics will be compromised. It is best to use criteria
which require that outliers be VERY different from the remaining results.
Remember that the purpose of statistical analysis is to predict future results. There
must be good reason to define a point as an outlier. For example, the CLSI:EP5
Precision document defines outliers as those runs for which duplicate results are
different by more than 5.5 times the SD calculated from the preliminary (within-
run) precision data. Consequently, two results within a run have to be very differ-
ent before the run can be rejected as outliers.
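As a sketch of how that EP5-style screen might be coded (the function name and the sample data here are hypothetical, not from the CLSI document):

    def flag_outlier_runs(duplicate_pairs, within_run_sd, factor=5.5):
        """Flag runs whose duplicates differ by more than factor * within-run SD."""
        limit = factor * within_run_sd
        return [(run, a, b) for run, (a, b) in enumerate(duplicate_pairs, start=1)
                if abs(a - b) > limit]

    runs = [(92, 94), (90, 91), (95, 71), (93, 92)]      # hypothetical duplicates
    print(flag_outlier_runs(runs, within_run_sd=2.0))    # flags run 3: |95-71| = 24 > 11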
One issue is what criteria should be used to find outliers. A common (inappro-
priate) practice is to declare all points that seem to be inconveniently located
as outliers. What should be done is first to check to see if a typographical error
occurred. If more than one outlier exists, then perhaps no outliers at all should be
excluded as this indicates a potentially serious problem.
Chapter 5
Understanding Error and Performance Standards
In This Chapter
Critical to good quality assurance is understanding experimental error. We dis-
cuss:
• Concepts and terms used to describe systematic, random and total error.
• Concepts of allowable total error.
• The basis for error budgets.
• Error budgets including the 25% rule.
• Different applications of 95% confidence limits.
Experimental Error
Definition: Error is the deviation of a single estimate of a quantity from its true
value. (Carey and Garber - 1989)
Synonym: Uncertainty.
Life would be much simpler if every measurement on the same specimen gave the
same result. However, reality is that repeated analysis of the same specimen often
produces different results. This is due to the presence of error in the measuring
process. This chapter will present a general discussion of what error is and how it
can be managed in the clinical laboratory setting.
Managing Error
It is essential that the user understand important related concepts concerning the
management of error. Some important elements are:
• Understanding the concepts of error.
• Defining specifications of allowable error.
• Using statistical tools to measure error.
• Error profiles for quantitative analytical methods.
• The relationship of QC rules to allowable error concepts.
• Motivating and supporting personnel so satisfactory results are produced.
• Providing evidence to regulators that the work is satisfactory.
Concepts of Error
Four major types of experimental error are introduced and discussed: random er-
ror, systematic error, total error and idiosyncratic error. Note that random, system-
atic and total error can only be assessed after analysis of a number of samples.
Linearity Experiment
Regulatory Requirements Addressed: Accuracy, reportable range, calibration
verification and linearity.
Experiment: Analysis of specimens with defined analyte concentrations. The
following elements may be varied in this type of experiment: number of
specimens, type of specimen, number of replicates, and range of specimen
concentrations. Depending on these variables, the experiment can determine
accuracy, reportable range and linearity, as well as verify calibration.
Alternate and EP9 Method Comparison: Analyze results from two or more
methods for the same analyte for specimens with undefined concentrations.
Statistical results include slope, intercept, standard error of the estimate (SEE)
and the bias at the medical decision point.
EP10: Analyzes 10 results a day from 3 specimens over a period of at least 5
days. Statistical results include linearity, precision, accuracy, carryover and
drift. If the low and high specimens challenge the limits of the reportable
range, this experiment will evaluate those limits as well. EP10 is an experi-
ment designed to show whether a method is satisfactory or not. It does not
have sufficient statistical power to determine sources of problems.
EP15: Analyzes two groups of specimens, one for accuracy and reportable range,
the other for precision. This CLSI document is not explicitly implemented in
EP Evaluator®. However, it is possible to perform this type of experiment by
using the Linearity and Complex Precision modules.
If TEa is too large, the quality of results suffers. If it is too small, the cost of
keeping the process in control is excessive. Hopefully the error in results from
the analytical process is less than the allowable total error.
Examples

This table includes several examples of TEa showing various types of error
specification, namely:
• By concentration.
• By percent.
• By concentration and percent, whichever is greater.
• By SD (always 3 SD).

Analyte                    CLIA ‘88 Limits
Erythrocyte count (RBC)    +/- 6%
Prothrombin time           +/- 15%
Potassium                  +/- 0.5 mmol/L
ALT (SGPT)                 +/- 20%
Blood gas pO2              +/- 3 SD
Glucose                    +/- 6 mg/dL or 10% (greater)
HCG                        +/- 3 SD
Digoxin                    +/- 20% or 0.2 ng/mL (greater)
More important to our discussion here is the effect of QC failure. In fact, there are
only two ways in which QC failure shows up externally:
• Bad proficiency testing results.
• Bad patient results
This figure has been set up so that the PT Limit is approximately 1 SD away from
the mean. Statistically, about 17% of the results will be outside the PT limits. This
number is shown by the hatched area to the right of the figure.
Let’s calculate how many results would be wrong if this scenario applied to all
your tests. The typical hospital lab has a test menu of about 200 tests. If PT were
performed on all those tests, 5 results per test or a total of 1000 results would be
submitted to the PT provider (such as the CAP). Of those 1000 results, 17% or
170 results would be outside the PT limits. Not a happy thought. Most lab manag-
ers get upset if there are more than 5 to 10 results (0.5 to 1%) which fail. In fact
you want only a small fraction of 1% to fail.
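You can reproduce that arithmetic with a simple normal model. The exact one-sided tail beyond 1 SD is about 16%, which the discussion above rounds to 17%.

    from statistics import NormalDist

    p_fail = 1 - NormalDist().cdf(1.0)   # area beyond +1 SD, about 0.159
    submitted = 200 * 5                  # 200 tests, 5 PT results each
    print(f"{p_fail:.1%} outside the PT limit, "
          f"about {p_fail * submitted:.0f} of {submitted} results")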
What happens if
• 10% of the specimens with true results below the MDP in fact exceed it?
• 10% of the specimens with true results above the MDP in fact do NOT exceed
it?
In both cases, incorrect diagnoses may occur. These will result in extra cost and
pain for both the health care system and the patient as the system deals with the
consequences of poor lab results.
The reason we are interested in the Error Profile is that it provides us with a
mechanism of determining Total Allowable Error (TEa), otherwise known as Per-
formance Standards.
The equation for the CV is shown below. The key point here is that while the CV
rises rapidly at the low concentrations, it is relatively constant at higher concen-
trations.
CV = 100 * SD / mean
The effect of these observations is to express the error for an analyte in terms of
both SD (for low concentrations) and CV (for higher concentrations).
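A tiny error-profile sketch makes the point. The (mean, SD) pairs below are hypothetical QC data spanning a reportable range, not survey results.

    levels = [(5, 1.5), (50, 3.0), (200, 12.0), (400, 25.0)]   # (mean, SD) pairs

    for m, sd in levels:
        print(f"mean={m:4d}  SD={sd:5.1f}  CV={100 * sd / m:5.1f}%")
    # CV is 30% at the lowest level but about 6% everywhere else, so error is
    # best quoted as an SD at low concentrations and as a CV higher up.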
One case with real numbers is represented by CAP survey results for bHCG for a
major analyzer as shown in this table.

HCG (Vitros ECi) (n=63)
Spec ID    Mean     SD     CV
C-11       26.97    1.65   6.1
C-12       68.29    4.54   6.6
C-13       90.61    6.39   7.1
C-14       52.13    3.57   6.8
C-15       82.47    4.84   5.9

One important observation from this table is that all the CV’s are about the
same (roughly 6 - 7%). While this is not always the case, one often observes
that the CV’s at the upper 60 to 80% of the reportable range tend to be similar,
namely within a range of 20% or so. CV’s in the lower range, untested in this
survey, can be significantly higher.
This technique of using the median value from PT surveys is one of the key ap-
proaches to establishing TEa. The process necessary to convert the median to TEa
is discussed extensively in Chapter 6, Defining Performance Standards under the
category of Total Achievable Error – based on Peer Group Surveys.
Error Budgets
As discussed above, TEa has two major
components: allowable systematic error
(SEa) and random error (RE). An error
budget allocates a fraction of the total error
for SEa and the remainder for RE as shown
at right.
It is clearly a mistake to set SEB at 100%. While you will pass accuracy and
linearity tests relatively easily, it completely neglects the very significant contribu-
tion of random error to the probability of failing PT.
If the actual errors are within these specifications, the probability of failing PT be-
comes exceedingly small. For details, see Chapter 11, Understanding Proficiency
Testing.
In EP Evaluator®, SEB is used to calculate items such as the height of the linear-
ity error bars and maximum bias for recovery purposes. These values are applied
to statistical values such as the mean at each concentration. They are not applied
to individual results.
Assessing Uncertainty
When a result is presented, there should be some indication of its quality. For
example, a result of 10.0 ± 0.1 is very different from a result of 10 ± 3. In the first
case, the range of possible results is 9.9 to 10.1, in the second, it is 7 to 13.
Confidence intervals are often used to show the range within which results are
expected to occur. Confidence intervals typically are given for 95% (2 SD) or
99.7% (3 SD) limits. The 99.7% confidence interval will always be wider than
the 95% interval.
Examples of 95% confidence intervals for a set of data with a mean of 100, an SD
of 10 and various numbers of results (N) are shown below. Note how the
confidence intervals for the mean and SD decrease in size as N gets larger. The
confidence interval for the results themselves is independent of N. Note also
that while N increases by a factor of 4, the confidence intervals only decrease
by a factor of 2.
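The sketch below reproduces that N-dependence with the usual large-sample formula for the CI of the mean (mean +/- 1.96 * SD / sqrt(N)); the N values are arbitrary examples.

    from math import sqrt

    m, sd = 100.0, 10.0
    for n in (20, 80, 320):                  # each step multiplies N by 4
        half = 1.96 * sd / sqrt(n)           # half-width of the CI of the mean
        print(f"N={n:3d}: CI of mean = {m - half:.1f} to {m + half:.1f}")
    # The CI for individual results stays m +/- 1.96*sd (80.4 to 119.6) at any N,
    # while the CI of the mean halves each time N quadruples.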
Important Question!
Imagine this case:
Analyte: Phenobarb
QC: CV=10% at a concentration of 10, therefore 1 SD=1 mg/L
QC Limits: 8 - 12 mg/L
Suppose your instrument is in control for the 5 days you submit the same patient specimen
(nominal concentration of 10 mg/L) to it. If Murphy’s Law (if anything can go wrong, it will) is func-
tioning perfectly, you could report out the results for that specimen over those 5 days as 8, 9, 10,
11 and 12.
• If you were the physician caring for a patient, what conclusion would you draw based on
these results in which the last value is 50% more than the first one?
Chapter 6
Defining Performance Standards
The concept of performance standards (PS) for the clinical laboratory is not new.
This issue has been addressed in numerous conferences and papers spanning more
than 40 years. Furthermore PS were established for about 75 common tests in the
US by federal regulations in 1992 and then were finalized in 2003 (Federal Regis-
ter - 2003).
Of the many reasons PS have not been widely adopted, there are two which are
key: a) In many laboratories, there is no urgency to work according to PS; and b)
PS are not available for most laboratory tests. It has been my observation that:
• There is no consensus in our industry that we need PS except for the CLIA ’88
PT limits.
• There is no consensus on how to establish PS.
• There is no consensus on what those PS should be.
Achieving a consensus on these issues will likely take years. Fortunately, many
are now working on this problem. I hope that this discussion will be an impetus so
a consensus on these issues can eventually be established.
Our industry is heavily quantitative. However it does not universally accept the
concept of quality goals. A common practice in the US is to set convenient preci-
sion targets with the assumption that since these targets are “so tight”, no one will
get hurt.
PS are used to define the two most important metrics for quantitative clinical labo-
ratory tests: (1) allowable bias and (2) target SD values for routine quality control.
These metrics define the quality of our results.
Adding to the complexity of this problem is that there are many types of clinical
laboratories, some of which are listed below. Clearly one single approach will not
work in all cases.
• Clinical laboratories operate in different regulatory environments. Conse-
quently, PS dictated by regulations from one jurisdiction will not apply to all.
• Specimens are obtained from multiple species. While the most common spe-
cies of course is homo sapiens, there is an active community working with
animals. As a consequence, species-specific criteria for establishing PS such
as medical requirements, biological variation and reference intervals are not
universally applicable.
Editorial Note:
PS for clinical laboratory tests traditionally have been expressed as Total Al-
lowable Error (TEa). I prefer the term “Performance Standards” because it
is positive. It is easier for the uninitiated to understand than the formal term
“Total Allowable Error” because the latter is technical and has negative con-
notations.
On what basis should one define PS? Should there be a relationship to clinical
metrics (i.e. medical requirements, biological variation, etc.) or should their val-
ues be based on what is analytically achievable? In years past, I preferred clinical
metrics. However I have found that this is not universally acceptable because as I
will show below, there are many instances in which clinical metrics are unattain-
able, unavailable or inappropriate.
The overall object should be to define PS for every test. These PS should be both
attainable and defensible. Keep in mind also that there is almost always an ac-
ceptable range for a PS, not just a single value.
• Attainable means that the PS is analytically achievable by the process.
• Defensible means that the PS has an acceptable relationship to the require-
ments of the clinical decision making process for that analyte.
The analysis below applies to analytes for which quantitative results are reported.
It does not apply to qualitative or semi-quantitative analytes nor to analytes mea-
sured using amplification processes such as PCR.
Classes
There are three classes of definitions of PS: (1) clinical, (2) establishment and (3)
deployment. Each has its role in understanding PS.
Clinical defines the clinical or operational goals without specifying how the PS
metrics are calculated.
Establishment definitions specify the algorithms which can be used to define PS.
The bulk of this chapter will relate to this class of definitions.
Deployment definitions specify how PS metrics calculated in the Establishment
phase are converted into allowable bias as well as the target SD’s for routine
quality control.
Clinical Definitions
Traditional Definition: “The amount of error that can be tolerated without invali-
dating the medical usefulness of the analytical result ...” (Garber and Carey
- 2010).
Biological Variation Definition: The amount of error derived from an experiment
which estimates inter- and intra-individual biological variation (Fraser - 2001).
Regulatory Definition: The amount of error that can be tolerated without failing
performance requirements established by a regulatory body.
Establishment Definitions
Some definitions are based on clinical requirements, others on what is achievable.
The definitions are listed in order with the most desirable coming first. We will
describe each definition and then discuss its advantages and disadvantages.
Medical Requirements
These values are created by a committee of experts which has carefully examined
an analyte’s clinical use and its analytical properties. This may be done at an inter-
national, national or local level. This type of definition is most satisfying because
of its clinical utility.
Table I shows TEa’s established for six important analytes: four lipids,
creatinine and HbA1c.
The advantage of medical requirements is that experts have considered the issues
related to what an acceptable PS should be. That is very powerful.
Biological Variation
This popular definition is based on the measurement of inter- and intra- individual
variation. The inter-individual variation is closely related to reference intervals, in
this case, the central 95% of a healthy population. The intra-individual variation
corresponds to the scatter of results for an individual around their own biological
set point.
While this approach is attractive because PS are defined by metrics which have
some clinical relevance, the fundamental issues are what that clinical relevance
is and then how to apply it. Fraser (2001) has shown that the clinical relevance
is that given successive results for the same test on the same patient, one can
estimate the probability that those results are different based on intra-individual
variation. While this metric may seem to some to be valuable, a number of ques-
tions both practical and theoretical can be raised which indicate that it may not be
as valuable as one might initially believe.
This very simple approach was proposed (Tonks - 1963) as a metric for what
constitutes an acceptable result for proficiency testing. TEa is 25% of the
reference interval, expressed as a percent of the interval’s midpoint, or 10%,
whichever is less.
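Tonks’ Rule is easy to compute. Here is a minimal sketch; the reference intervals are illustrative, not recommendations.

    def tonks_tea(ri_low, ri_high):
        """TEa = 25% of the reference interval as a % of its midpoint, capped at 10%."""
        width = ri_high - ri_low
        midpoint = (ri_low + ri_high) / 2
        return min(100 * 0.25 * width / midpoint, 10.0)

    print(tonks_tea(135, 145))   # sodium-like interval: ~1.8%, very tight
    print(tonks_tea(3.5, 5.1))   # potassium-like interval: ~9.3%
    print(tonks_tea(0.2, 1.0))   # lower limit near zero: capped at 10%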
This approach seems to be very nice because it is clinically related, namely to the
magnitude of the reference interval (RI). Its problem is that it works with very
few tests. In many cases, TEa is not achievable especially when the lower limit of
RI is near zero.
Those regulating clinical laboratory performance in the United States and else-
where have established maximum performance standards for many important
analytes (Federal Register - 2003). For this, our industry should be grateful as this
was the first list of PS for a significant number of analytes. Values for selected
analytes are shown in Table III.
There are two advantages of this PS determined using the CLIA ‘88 PT Limits:
• By definition, the values are acceptable to the regulators.
• With some exceptions, the values seem to be clinically responsible.
There are a number of disadvantages to these values apart from their clinical util-
ity:
• Values have been established for only 75 analytes.
• There are many labs (i.e. outside the US) which are not required to use these
values.
It must be pointed out that other organizations and jurisdictions have established
regulatory requirements for clinical labs including Australasia and Saskatchewan.
These values are available on our website (www.datainnovations.com).
Unlike several approaches described above, this one is based on the analytical
performance of which a process is capable. Peer Group Survey results may be
easily obtained from PT providers such as the CAP for a large number of analytes.
This approach is derived from the CLIA ’88 specification of “target +/- 3SD.”
The core calculation of this approach is to calculate the median CV for enough
specimens and then to multiply it by three. The CV is one form of SD. This ap-
proach provides a reasonable approximation of the achievable error.
Our approach divides the reportable range into two general regions, an upper
region which typically is the upper 50 - 80% of the reportable range and a lower
region which is everything else. Basically one can use an error profile approach to
calculate a median CV in the upper region and a median SD near the low end of
the reportable range. Once you have determined an appropriate median SD or CV,
then multiply it by 3 to calculate the PS.
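Here is a sketch of the upper-region calculation, reusing the bHCG survey figures from the Chapter 5 table:

    from statistics import median

    # (mean, SD) pairs from the bHCG peer group survey table
    survey = [(26.97, 1.65), (68.29, 4.54), (90.61, 6.39),
              (52.13, 3.57), (82.47, 4.84)]

    median_cv = median(100 * sd / m for m, sd in survey)
    print(f"median CV = {median_cv:.1f}%  ->  TEa = 3 * median CV = {3 * median_cv:.1f}%")
    # Prints a median CV of about 6.6% and a TEa near 20%.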
The key issue that you need to keep in mind when using this approach is that the
median fairly represents the error of your method in the applicable range. Several
things to check:
• Are there enough specimens in the sample from which you calculate the me-
dian?
• Are there enough specimens (at least 4) in the applicable range?
• Are there enough labs (at least 10) participating in the survey?
The advantage of this approach is that PT or EQAS results are readily available
for more analytes. Furthermore, the calculated PS is attainable which is a major
advantage.
This approach defines TEa from a laboratory experiment which resembles EP9
Method Comparison. (ref. CLSI:EP21)
Calculation process:
• Aggressive: TEa = 3 * RPE
• Achievable: TEa = 4 * RPE
The greatest danger with this approach is the potential that it is biased by results
from poor QC practices. This would make the RPE much larger than it should be.
Biological Variation
The algorithm used to calculate TEa uses values already calculated for the target
CV and allowable bias.
Fixed formula
25% Rule allots 25% of TEa to systematic error. In addition, the target SD is 25%
of the TEa. There is nothing special about this rule except that it is a very easy
way to allocate the total allowable error space to both random and system-
atic error. One should also remember that there are cases in which it doesn’t
work. An advantage of this rule is that about 3000 results out of a million will
exceed TEa.
Traditional practice often allocates 50% of TEa to systematic error. The target
SD is 25% of the TEa. With this approach, about 23,000 results per million
will exceed TEa.
Six Sigma is similar to the 25% rule in that it also allots 25% of TEa to system-
atic error. However, the target SD is 16% of TEa. With this approach, only 3
results out of a million will exceed TEa.
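Assuming normally distributed error, you can check these per-million figures yourself. The exact counts depend on how the tails are modeled (for instance, whether the bias direction is fixed), so they will not match the rounded numbers above precisely.

    from statistics import NormalDist

    def ppm_outside_tea(bias_frac, sd_frac):
        """Results per million outside +/- TEa, with bias and SD as fractions of TEa."""
        n = NormalDist()
        p_high = 1 - n.cdf((1 - bias_frac) / sd_frac)
        p_low = n.cdf((-1 - bias_frac) / sd_frac)
        return 1e6 * (p_high + p_low)

    print(ppm_outside_tea(0.25, 0.25))   # 25% rule: ~1,350 ppm with this model
    print(ppm_outside_tea(0.50, 0.25))   # traditional: ~23,000 ppm
    print(ppm_outside_tea(0.25, 1/6))    # six-sigma-style (SD = TEa/6): ~3 ppm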
Relative Allocation
Near the end of this chapter, there are a number of worked out examples. This ap-
proach has been implemented in EP Evaluator®, Release 10.
• If clinical requirements are available, use them. If you have adequate resourc-
es especially interested experts, consider establishing your own.
• If a reference interval has been established, consider Tonks’ Rule. If the
reference interval is near the low end of the reportable range, Tonks’ Rule is
not likely to be useful.
• Consider biological variation data if they exist. In a number of cases, such
values are not achievable. However they may give you a starting point.
• Regulatory limits not necessarily from your jurisdiction.
• Calculate from peer group survey results.
• CLSI:EP21. Implementation of this will require the performance of a carefully
designed experiment in which results are entered for two methods, the pro-
posed method and one which someone else has established.
• Responsible Precision Estimate data will almost always be available.
Some of the values obtained from these sources may be so tight that they are not
achievable. Others are so wide that they are indefensible. We recommend that
you define minimum and maximum values for PS. The minimum value is the
best your equipment can reasonably achieve (i.e. the Aggressive RPE described
above). The maximum value is the regulatory limit or the Maximum TEa as de-
scribed below whichever is less.
Low End Performance Standards
In many cases, the concentrations of the linearity specimens at the low end of
the reportable range are such that the PS calculated for the upper portion of the
reportable range fails. For example, suppose the defined concentration of a low
end specimen is 5 for ALT (TEa=20%, typical reportable range is about 0 to 1000
U/L). The measured result is 7. In most cases, such values are good enough from
a clinical perspective. The problem is that in this case, it fails accuracy because 7
is outside the acceptable range of 4 to 6 (5 +/- 20%).
The purpose is to define a concentration term for TEa. There are two ways to do
it:
• Multiply an SD at a concentration near the low end of the reportable range by
3. (Recommended).
• Define a clinically insignificant concentration which works. The key is that the
concentration really be clinically insignificant.
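One way to deploy the concentration term is the “percent or concentration, whichever is greater” convention already used by CLIA for glucose. The sketch below applies it to the ALT example, assuming a hypothetical low-end SD of 1.5 U/L.

    def acceptable_range(target, tea_pct, low_end_sd):
        """Allowed deviation is the greater of the % term and 3 * low-end SD."""
        dev = max(tea_pct / 100 * target, 3 * low_end_sd)
        return target - dev, target + dev

    print(acceptable_range(5, 20, 1.5))    # (0.5, 9.5): the measured 7 now passes
    print(acceptable_range(500, 20, 1.5))  # (400.0, 600.0): the % term dominates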
PS define upper limits for two major performance metrics. They are:
• Upper limit for bias is 50% of PS.
• Upper limit for a target SD is 25% of PS.
One major implication of this rule is that the maximum %CV for any test should
not exceed 7.5% (25% of the maximum allowable TEa of 30%).
The following examples are meant to illustrate the process. The numbers calcu-
lated may not be applicable to your environment.
Starting Parameters
Units                           mmol/L
Medical Requirements            None
Regulatory Requirements         4 mmol/L    (4/140 = 2.85%)
Peer Group Survey Median CV     1.23%       (3 * PGS = 3.7%)
Low end SD                      n/a
The instrument in this case is the fictitious Eximer 250. PGS data is from New
York State PT Surveys.
Calculation Process:
• Medical Requirements: None
• Regulatory Requirements: 2.85%
• PGS Value: 3.7% -- too large (exceeds maximum of 2.85%)
Sodium is one of those tests for which its QC needs to be monitored very closely.
I know of one lab which assays sodium in duplicate and then reports the mean
in order to improve the precision. I recommend using the tightest feasible values
which in this case corresponds to the regulatory requirements.
Starting Parameters
Units                           mg/dL
Medical Requirements            13%
Regulatory Requirements         30%
Peer Group Survey Median CV     5.4%    (3 * PGS = 16.2%)
The PGS results are from the New York State PT Surveys for the Siemens Dimen-
sion RXL.
In this case, Medical Requirements have been specified. This then sets an upper
limit for TEa of 13% which trumps the Regulatory Requirement of 30%.
Starting Parameters
Units                           mm Hg
Medical Requirements            None
Regulatory Requirements         3 SD
Peer Group Survey Median CV     3.9%    (3 * 3.9 = 11.7%)
The PGS results are from the New York State PT Surveys for the Siemens 845
instrument.
Calculation Process:
• Medical Requirements: None
• Regulatory requirements specify 3 SD. This translates to forcing the use of 3 *
PGS value.
• PGS value: 11.7%
Options:
• Regulatory Requirements (also PGS value): 11.7% rounded up to 12%.
• TEa for pO2 could be set in the range of 8 to 12 mmHg.
Starting Parameters
Units: pmol/L
Reportable Range (AMR): 3.9 to 77.2
Reference Interval: 10.3 to 24.5
Medical Requirements: None
Biological Variation - TEa: 9.9%
Peer Group Survey Median CV: 7.1% (3 × PGS = 21.3%)
Regulatory Requirements: None
Responsible Precision Estimate: 6%
I initially did this calculation during a trip to the Netherlands in 2005. The instru-
ment was the DPC Immulite 2500. Much of the data was taken from the package
insert for this instrument. The Peer Group Survey data was taken from the SKML
(Dutch Clinical Chemistry Society) survey results. The Netherlands at that time
had no regulatory requirements for any clinical laboratory tests. In the US, the
CLIA requirements are +/- 3 SD.
Calculation Process:
• Minimum TEa (3 * RPE): 18% (while possible, difficult over long-term)
• Maximum TEa: 30% (from Rhoads’ maximum TEa rule)
• Medical Requirements: none
• Biological Variation: 9.9% (unachievable)
• PGS value: 21.3% - round to 20 or 25%
• Achievable TEa from 4*RPE: 24%
Options:
• Defensible range: 18 to 25%
• Achievable range: 20 to 25%
CLIA ‘88 requires that for each instrument and analyte for which results are
reported, at least two levels of QC be tested each day of operation. This
requirement is good laboratory practice because it requires the operator to
demonstrate on every day of operation that the analytical system is working correctly.
(Figure: a Levey-Jennings chart plotting SDI against Day over about 25 days.)
In a Levey-Jennings chart, the target mean is the central tendency, which is
the expected result.
The target SD (along with some QC rules) describes the magnitude of the region
around the target mean within which the results will be acceptable.
The chart above shows an example of a well-behaved case. In this case, about
95% of the results are scattered in the 2 SD region above and below the mean.
When QCMs are prepared, the manufacturer attempts to set the concentration of
each analyte at specific levels. QC materials are often marketed as Level I (Nor-
mal) and Level II (Abnormal) materials. In many cases, a third material with dif-
ferent analyte concentrations is also available.
Many labs purchase a large quantity of a single lot of unassayed QCMs to last a
year or more. This is useful because the lab has a substantial investment in estab-
lishing the target values. (It is often difficult to purchase such large amounts of
assayed QC materials.)
Target Mean
Establishing a target mean for assayed materials is relatively easy: Start with the
vendor’s values for your own instrument. Validate the values for your own ana-
lytes.
Establishing a target mean for unassayed materials is more difficult. Set target
mean to the average of the results collected on a daily basis for a month.
In an emergency, you can set the target mean based on as few as 6 results and then
update it every week for the next three weeks based on the later results you’ve
collected.
Maintaining a target mean is not always easy. Ideally, you use the same target
mean during the whole period the QCM is in use. However, some analytes in the
QCM such as enzymes and proteins, are unstable over time. They slowly degrade
over a period of many months. This forces the periodic adjustment of the target
mean. We recommend that for labile analytes like these, the target mean be adjusted as needed.
One way changes in means are validated is to compare one’s own target values
with those for other laboratories (peer group comparison) for the same analytes
and instruments. One’s values should be similar to those of the other laboratories.
Target SD
There are two general approaches to establishing the target SD.
1 Base it on the SD calculated from the set of results used to calculate the target
mean. This is the traditional approach.
2 Establish a value for the target SD which is somewhat larger than the SD cal-
culated from the routine QC results and use it over a long period of time.
Once the target SD is established, in most cases you can continue using it, or a
close equivalent, even across lots of QCM as long as the mean values are similar.
In other words, you can carry an SD, after adjustment, from one lot of Level 1
control to the next. To use it across lots of QCM, calculate the CV from the SD
and mean of the senior (outgoing) lot; then for the junior (incoming) lot,
calculate the SD from that CV and the junior lot's mean.
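A minimal sketch of that carry-over calculation (the function name and the example numbers are illustrative):

    def carry_sd_across_lots(senior_sd, senior_mean, junior_mean):
        """Carry a target SD from the outgoing (senior) lot to the incoming
        (junior) lot by holding the CV constant."""
        cv = senior_sd / senior_mean
        return cv * junior_mean

    # Example: Level 1 control, senior lot mean 100 with SD 3 (CV 3%);
    # the junior lot runs at a mean of 110, so its target SD becomes 3.3.
    print(carry_sd_across_lots(3.0, 100.0, 110.0))  # -> approximately 3.3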
Given identical QC rules, the configuration with the larger target SD will have
fewer instances of QC failure. Therefore it is in the laboratory's economic best
interest to use the larger target SD. If there is no significant downside to
accepting the larger error, then use the larger value. If the target SD is
significantly larger than the observed SD, a different set of QC rules should
be considered for use.
To use the first option, calculate the mean and SD from the QC results obtained
over as many months as possible (at a minimum, the previous month). Avoid basing
the target SD solely on the previous month's calculated SD because it is too
unstable to be reliable.
To use the second option, set target SD’s based on a combination of what is need-
ed and what is possible for that method. If what is possible does not meet minimal
Performance Standard requirements, then that method is unsatisfactory.
The next issue is how to calculate the value of the target SD based on the 25%
Rule. For the 75 analytes for which CLIA has set proficiency testing (PT) limits,
TEa can be considered to be the PT limit at the target mean.
For the enzyme ALT, for example, the PT limits are 20% of the target mean. If
the target mean is 100 IU/L, the PT limits are 20 IU/L (20% of 100 IU/L). The
target SD is then set to 25% of the PT limits, or 5 IU/L. For many instruments,
the observed process SD is about 3 or 4 IU/L. Since this is significantly
smaller than the target SD, the target SD is readily achievable.
For sodium, the PT limits are 4 mmol/L. The ideal target SD would be 1.0
mmol/L. Since on many instruments a value this small is hard to achieve, one
may have to accept a value somewhat larger.
QC Rules
There is a long, rich and complex history to QC rules. Numerous scientific papers
have been written over the last quarter century describing and analyzing many
rules. The most common are known as the Westgard rules, named after James O.
Westgard, Ph.D. (Westgard Website) who popularized them.
Westgard and others have defined a series of rules designed to detect when an ana-
lytical system is no longer producing results which are in total control. An ideal
rule would immediately detect every instance in which a system deviates from
correct operation (no false negatives). Furthermore, it would produce NO false
alarms (no false positives). However, there are no ideal rules. All rules ignore er-
rors which are not totally egregious (at least for a while) and identify other events
as QC failure even though they do not correspond to problems.
1-2S: If one result is more than two SDs away from the mean, QC failure is de-
clared. This rule, popular in many laboratories, detects a great many problems.
However, it also generates a large number of false alarms. If the SD value is
calculated from the routine QC results, one would expect 5% of the values to
be outside the limits. If this rule is implemented for 20 tests on an instrument,
the target SDs are set to the observed SDs, and the process is in control, then
at least one QC failure can be expected about 65% of the time (1 - 0.95^20 = 0.64).
This rule is widely (and mistakenly) used in clinical laboratories.
2-2S: If two consecutive results either in the same or different QC materials are
more than two SDs away from the mean, QC failure is declared. This rule is
recommended in the CLSI document on internal quality control. It is much
less sensitive to false alarms. This rule is recommended for most analytes.
1-3S, 1-3.5S: If one result is more than 3 or 3.5 SDs away from the mean, QC
failure is declared. The purpose of this rule is to detect gross analytical prob-
lems. False alarms from this rule will be relatively rare. The downside is that
it will not detect developing problems at an early stage.
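The rules above reduce to simple checks on z-scores (result minus target mean, divided by target SD). Here is a minimal Python sketch, not a full Westgard implementation; the 2-2S check below looks only at consecutive results in one series, whereas the rule can also be applied across materials.

    def z_scores(results, mean, sd):
        """Convert raw QC results to SD units around the target mean."""
        return [(r - mean) / sd for r in results]

    def rule_1_2s(z):
        # One result more than 2 SD from the mean.
        return any(abs(v) > 2 for v in z)

    def rule_2_2s(z):
        # Two consecutive results beyond 2 SD on the same side of the mean.
        return any((z[i] > 2 and z[i + 1] > 2) or (z[i] < -2 and z[i + 1] < -2)
                   for i in range(len(z) - 1))

    def rule_1_3s(z, limit=3):
        # One result more than 3 (or 3.5) SD from the mean.
        return any(abs(v) > limit for v in z)

    z = z_scores([102, 105, 106, 99], mean=100, sd=2)
    print(rule_1_2s(z), rule_2_2s(z), rule_1_3s(z))  # True True False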
It turns out that not only is selection of the appropriate rules important, but that
the frequency at which the QC samples are run is also important.
Calculate, review and file the actual means and SDs from all your QC results
every month. In a system operating properly, you will not see large changes in the
values. This is an important reality check. The results from each instrument will
be slightly different from the next. One of your goals should be to make sure that
the differences are not significant.
One practice in many labs is that after a failure, the QC specimens are assayed
repeatedly until they pass. This practice is inappropriate because it will not
assure good results. For example, suppose the QC failure occurred because of a
2 SD shift in the mean, the underlying reason being that the reagents had
deteriorated. In this case, you would detect the failure fairly soon. However,
if you repeated the QC several times, inevitably some of the results would pass,
and then according to this practice you would not have to determine the source
of the problem. Consequently, the underlying cause of the failure would continue
and the quality of patient results would be compromised.
CLSI addresses the proper way to deal with QC failure in its document on quality
control (CLSI:C24).
The point of all this is to ensure patient safety. If the correctness of any results is
uncertain, then immediate steps must be taken to ensure that correct results are
available as soon as possible to the caregiver.
Performance Validation
Experiments
In This Chapter
CLIA ‘88 technical regulations and good laboratory practices require that each
laboratory check 3 to 7 statistical parameters for each new quantitative method
before patient results are reported.
Some individuals may believe that all these items have already been checked for
each instrument by the vendor before it applied for FDA clearance. However, just
because a vendor has received clearance to market an instrument does not mean
that the instrument in your lab always reports correct results for all its
analytes. In fact, most instrument vendors check the operation of each instrument
while it is being installed in your lab.
Question: You have just received a tool which you use to provide information
to your clients. How important is it to you that the information is correct?
Many of us know of occasions when a friendly physician will call the labora-
tory and ask if a certain test is running high (or low). However the kindness of
that physician does not excuse poor QC/QA practices.
Accuracy: A CLIA ‘88 requirement. Accuracy is the ability to measure the cor-
rect amount of analyte present in the specimen. Evaluate this parameter using
a linearity style experiment. In EP Evaluator®, the Linearity module assesses
accuracy. See Chapter 10, Interpreting Linearity Experiments for discussion of
this issue.
Linearity: A CAP requirement for some laboratory disciplines. There are several
definitions of linearity, all of which show that a result does not differ from a
straight line by an excessive amount. In EP Evaluator®, the Linearity mod-
ule assesses linearity. See Chapter 10, Interpreting Linearity Experiments for
discussion of this issue.
Precision: A CLIA ‘88 requirement. Precision is the ability to obtain the same
result upon repeated measurement of a specimen. Evaluate this parameter us-
ing a precision experiment. In EP Evaluator®, three modules assess precision:
Simple Precision, Complex Precision, and EP5 Precision. See Chapter 12, Precision
Experiments for discussion of this issue.
Sensitivity: A CLIA ‘88 requirement for methods which are performed differently
than according to manufacturer’s directions. Sensitivity defines the lowest re-
portable concentration for a method. It is of especial interest for analytes such
as TSH or troponin where the low concentrations are clinically important.
Evaluate this parameter using a sensitivity experiment. Three experiments are
suggested, two for Limits of Blank (LOB or Analytical Sensitivity) and one
for Limits of Quantitation (LOQ or Functional Sensitivity). In EP Evaluator®,
the appropriate modules are Sensitivity (LOB) and Sensitivity (LOQ). See
Chapter 14, Sensitivity Experiments for a discussion of these issues.
Specificity: A CLIA ‘88 requirement for high and modified moderate complexity
methods. Also known as interference, these experiments evaluate the degree to
which an interfering substance modifies the measured amount of the intended
analyte. Experiments to determine interference are not easy to design and
execute. Therefore this requirement may be satisfied by including appropriate
literature references in the package insert. It is the only requirement which can
be satisfied this way. In EP Evaluator®, the Interference module assesses this
parameter.
Recurring Requirements
Calibration Verification: A CLIA ‘88 requirement. The equivalent term from
the CAP is Verification of AMR (Analyte Measurement Range). The purpose
of the Calibration Verification experiment is to demonstrate that the method
produces accurate results across the Reportable Range. Note that whether an
instrument passes depends both on whether the results produced by the method
are suitably accurate and on whether the specimens used for the experiment
adequately challenge the limits of the Reportable Range.
There are two types of experiments which can be done to demonstrate method
comparability, both similar. The basic design of these experiments is to assay
6 to 10 specimens whose results cover a suitable portion of the reportable
range. The results from each of the instruments are compared with the target
value. If the difference between the target value and the instrument result
for any of the specimens exceeds performance standards, the method fails.
The difference between the two experiments is whether there are only two
instruments or more than two instruments. See Chapter 9, Interpreting Method
Comparison Experiments for more details.
Interpreting Method
Comparison Experiments
In This Chapter
• Constant Bias: differences between sets of results which are constant at all
concentrations. (Non-zero intercept)
• Proportional Bias: differences between sets of results which are proportional
to the analyte concentration. (Slope significantly different from one.)
• Random error: differences between results due to random effects. (Impreci-
sion or other methodological error.)
• Patterns such as nonlinear results or outliers.
• Other systematic problems such as selective interference or matrix effects.
The sensitivity of the major statistical parameters is shown in Table 9.1. Each of
these parameters is discussed in detail. This table is derived from a similar table
in Westgard and Hunt (1973).
Table 9.1
Sensitivity of Statistical Parameters to Three Classes of Error

Parameter            Random   Proportional   Constant
Slope                No       Yes            No
Intercept            No       No             Yes
Std. Err. Est.       Yes      No             No
Bias                 No       Yes            Yes
Std. Dev. Diff.      Yes      Yes            Yes
t Test               Yes      Yes            Yes
Corr. Coef. (R)*     Yes      No             No
Med. Dec. Pt.        No       No             No
Predicted Bias       No       Yes            Yes
95% Conf. Limits     Yes      Yes            Yes

*Corr. Coef. (R) is also sensitive to the range of the results.
In both these figures, R is about the same, 0.99. However in Figure 9.1., the iden-
tity is poor, while in Figure 9.2. the identity is good. It is clear from these figures
that a good R has NO relationship to the degree of identity.
Please look at three cases (Cases 1, 2 and 3) later in this chapter. One is a case
of good data, one with proportional error (slope = 2) and one with constant error
(intercept about 20). In all cases, R is 0.9976.
There are three limiting values: +1 (results fall on a straight line with a positive
slope), -1 (results fall on a straight line with a negative slope), and 0 (results are a
round cloud).
R is used to determine which type of calculation should be used for the Predicted
Medical Decision point and the associated 95% Confidence Limits. If R is less
than 0.90 (for the Alternate module) or 0.975 (for EP9), then these items are cal-
culated using the Partitioned Bias Procedure. Otherwise they are calculated from
Deming linear regression statistics.
This is a “quality of data” statistic, not a “quality of method” statistic. A high
number indicates that a good range of data is present and that more weight can be
placed on the experimental results.
Keep in mind that different R values are expected for each analyte. Many chemis-
try analytes, such as LDH and glucose, have R values of 0.98 and higher because
they have wide ranges. For other analytes with narrower ranges, such as the
electrolytes, sodium and chloride, and several hematology cell differential param-
eters, R values typically are much lower. For sodium, neutrophils and basophils,
R values of 0.90, 0.80 and 0.30, respectively, are common.
One other aspect of R needs to be kept in mind. That is expressed by the follow-
ing approximate relationship (shown in the equation below) which holds when the
errors of the two methods are similar and R is greater than 0.8:
Corr. Coef (R) = Regular slope / Deming slope
One major implication of this relationship is that the Regular slope is always less
than the Deming slope. In all the experiments that I have simulated, the Deming
slope is a much better approximation of the target (true) slope. Consequently, the
Regular slope always underestimates the true slope. The only issue is by how
much.
A second implication is that if R has a value of 0.995, then the difference between
the slopes (0.5%) is insignificant in most cases. In this case, it probably doesn’t
matter whether the Regular or Deming slope is used. On the other hand, if R is
0.90, then the difference between the slopes (10%) probably is significant. In this
case, the Deming slope must be used.
A sound bite for the t test is “He who lives by the t test, dies by the t test.”
In the case studies at the end of this chapter, check the number of cases in
which the t test indicates problems and the number it misses.
Bias plots are useful because they show the relationships of the differences be-
tween two different results. Outliers are obvious in the plot. Subtle patterns can
show the presence of problems. For example, a group of results significantly
removed from the others could indicate that one instrument was out of control
on a certain day. Other patterns show the presence of non-linear assays (Case 5:
Non-Linear Pattern), random error (Case 4: Effect of Random Error), constant
bias (Case 3: Effect of Constant Error). With the default plotting approach of EP
Evaluator®, a bias plot for a good case is a round cloud.
Intercept is the Y value when X is zero. In other words, it is where the regression
line meets the Y axis.
Regular vs. Deming Linear Regression: In the Regular (ordinary least squares)
linear regression approach, the assumption is that the data plotted on the X
axis are absolutely accurate and have no error. In the Deming approach, the
assumption is that the data plotted on the X axis do have error. Clearly the
Deming assumption better represents data from clinical laboratory procedures
than does the assumption for regular regression.
With the Regular approach, the lines “drawn” from the individual data points
to the regression line (i.e. the values that are minimized) are parallel to the Y
axis. With the Deming approach, the lines are at an angle defined by the ratio
of the relative errors of the two methods. (See Figure 9.3.) Theoretically this
angle can vary from horizontal to vertical. The latter case, when the lines are
vertical, corresponds to the regular regression approach where there is no er-
ror in the X data.
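A minimal Python sketch of both calculations follows, using a standard closed-form Deming estimator. The `ratio` argument is the Y-to-X error-variance ratio (1.0 when the errors are equal, analogous to equal Rep SDs discussed below); this is an illustration, not EP Evaluator®'s internal code.

    import numpy as np

    def regular_slope(x, y):
        """Ordinary least squares slope (all error assumed to be in Y)."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        return np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

    def deming_fit(x, y, ratio=1.0):
        """Deming slope and intercept; `ratio` is the Y-to-X error-variance ratio."""
        x, y = np.asarray(x, float), np.asarray(y, float)
        sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
        sxy = np.cov(x, y, ddof=1)[0, 1]
        slope = (syy - ratio * sxx +
                 np.sqrt((syy - ratio * sxx) ** 2 + 4 * ratio * sxy ** 2)) / (2 * sxy)
        return slope, y.mean() - slope * x.mean()

With tight data (R near 1) the two slopes nearly coincide, consistent with the Corr. Coef (R) = Regular slope / Deming slope relationship given above.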
One feature of the Deming approach is that if you exchange the data plotted
on the X and Y axes and recalculate the Deming slope and intercept, then the
second slope and intercept will be the “reciprocal” of the first. For example,
suppose that with Method A plotted on the X axis and B on the Y axis, you get
a slope of 2 and an intercept of 0. If you then replot Method B on the X axis
and A on the Y axis, you will get a slope of 0.5 and an intercept of 0. This re-
lationship usually fails if you use the values obtained from Regular regression.
If you would like to “correct” the slope and intercept of the Y method so that
the Y method results match those of the X method, use the equations below.
Advantages: (a) The resulting slope and intercept do not reflect the presence
of outliers, so outlier detection schemes are not needed. (b) No estimate of
the relative error of the methods is needed because no assumptions about the
slopes of the residuals are made.
Disadvantages: It does not work well with large numbers of data points because
the number of candidate slopes grows rapidly with the number of pairs of points.
EP Evaluator® does not use it if the number of pairs of results exceeds 500.
95% confidence intervals are useful because they allow a statement of statistical
confidence to be made. Determine if the item (i.e. slope or intercept) is within
the 95% confidence limits. Since the Deming slope of 1.007 is within the interval
of 0.980 to 1.034, one can state that the slope is not significantly different
from 1.00 at the 95% confidence level.
The problem is how to define the magnitude of the error. The way we have chosen
to do it in EP Evaluator® is with a term called the Representative SD. This is a
single number which represents the average error profile of the method. In general, if
two relatively similar instruments are being compared, the Rep SD’s can both be
set to 1.0.
Of interest is how the various regression statistics change as the Rep SD varies
over a wide range. Note that the Regular slope is identical, to 3 significant
figures, to the Deming slope calculated with a Rep SD ratio of 10,000.
The statistics are much more sensitive to differences in Rep SD’s when R is
relatively low. In the table below, for the data with the broad range (50 to 250),
the total difference in the Deming slope is about 2% (0.98 to 1.00) over a range
of values exceeding 1,000,000. For the narrow range data (70 to 100), the corre-
sponding difference exceeds 30% (0.84 to 1.14).
95% Confidence Interval: This popular feature which is often applied to many
quantitative analytes has two major weaknesses: a) Since it is calculated from
the experimental data, its shape and position change depending on the qual-
ity of the data. For example, if an outlier is present, the bounds can expand
dramatically. Consequently it provides little help to the user in a regulatory
environment which specifies total allowable error for many analytes; b) The
curve has a slightly convex shape which is closest to the regression line near
the center of the results and then flares gently outwards at both ends. As a
result, it often does a poor job of detecting outliers especially at the low end of
the curve.
Allowable error: Its advantages are that it is independent of scatter in the experi-
mental data. Consequently it allows the user to judge the quality of the data
against a predefined standard. Unlike the 95% CI above, it does not change
its position relative to the regression line when presented with poor data. Its
shape is defined by the amount of allowable error, either concentration, per-
cent concentration or both. It is plotted around the Deming regression line.
White cell differential counts (the binomial distribution): The scatter plot
bounds are based on the statistical uncertainty of counting cells.
More specifically, it represents the error in finding the “true” fraction in the
limited population that was sampled which usually is the 100 or 200 cells that
were counted manually vs. the 10,000 or so that were counted by an instru-
ment. This approach is independent of the experimental data. The shape of the
bounds is concave and the bounds are centered on the identity (1:1) line. One
expects 5% of the points to be outside these bounds.
An analyte may have one or more MDPs. The best known examples of MDPs
are the limits of the normal range (reference interval). Glucose has at least
four MDPs. From low to high, they are 40 mg/dL (lower critical value), 70
(lower limit of the normal range), 110 (upper limit of the normal range) and
400 (upper critical value). (These values are approximate and may vary from lab to lab.)
Y Method Pred. MDP is based on the experimental results and the specified X
method MDP. It is sensitive to proportional and constant error.
The Predicted Decision Point and the 95% Confidence Limits are calculated
differently depending on the data. In Alternate Method Comparison, they are
calculated by either the Deming linear regression statistics or the Partitioned
Bias Procedure. The magnitude of R defines the approach used. If R is less
than the specified amount (user’s choice of 0.90, 0.95 or 0.975), the Parti-
tioned Bias procedure is used; otherwise, the linear regression procedure is
used.
95% Confidence Limits (95% CL) define the range within which the true Deci-
sion Point for the Y (i.e. new) method will exist 95% of the time if this experi-
ment were done repeatedly. This statistic is sensitive to random error (increas-
es its range), to proportional error (shifts and increases range), and to constant
error (shifts range).
If the OLD decision point is outside the range described by the lower and up-
per Confidence Limits, the Predicted Decision Point is significant. The user
should then consider changing the Medical Decision Point but only if the
quality of the data is good.
Note: Medical decision point statistics are not calculated in the Alternate module
when the number of results is less than 21 because there are insufficient numbers
of points to obtain statistically reliable results.
If the OLD MDP is between the lower and upper 95% CL, this experiment does
not support changing the MDP. If the OLD MDP is outside the range of the 95%
CL, this is evidence that the MDP should be changed. If so, one needs to proceed
cautiously and gather more information.
First one should assay a substantially larger number of specimens to confirm one’s
initial results. If you do confirm that there is a significant difference between the
OLD and NEW MDP’s, then you must re-establish the MDP.
The 95% confidence interval for the slope and intercept is approximately the ±2
SD range. If 1.00 is between the lower and upper limits, one can say with 95%
confidence that this slope is not different from a slope of 1.00.
Similarly, a 95% confidence interval can be calculated for the intercept. If 0.0 is
between these two numbers, one can say with 95% confidence that this intercept
is not different from an intercept of 0.0.
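In code, this equivalence check is just two interval tests. A minimal sketch; the slope interval is the one quoted above, while the intercept interval is hypothetical:

    def within(value, lo, hi):
        """True if value lies inside the closed interval [lo, hi]."""
        return lo <= value <= hi

    slope_equivalent = within(1.00, 0.980, 1.034)    # slope CI from the text above
    intercept_equivalent = within(0.0, -1.5, 0.9)    # hypothetical intercept CI
    print(slope_equivalent and intercept_equivalent)  # True -> methods equivalent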
If two methods are calculated to be statistically equivalent, then their MDPs and
reference ranges should also be statistically equivalent. If this is not the case, this
discrepancy needs to be investigated.
Slope and intercept lose reliability when the correlation coefficient (R) is less than
0.95. There is too much scatter in the data to be able to draw statistically reliable
conclusions from the linear regression data.
Investigative
The significance of the various parameters is beyond the scope of this work.
Case Studies
Case 1: A Good Example
Case 2: Effect of Proportional Error
Case 3: Effect of Constant Error
Case 4: Effect of Random Error
Case 5: Non-Linear Pattern
Case 6: Effect of Outliers
Case 7: Effect of Extreme Range
Case 8: Effect of Range of Results
Case 9: Effect of Number of Specimens
Case 10: Effect of Poor Distribution of Results
The results used in these case studies were simulated. Unless otherwise indicated,
the number of results is 50, the target slope is 1.0, the target intercept is 0.0, the
target range is 50 to 250, the target CV is 2.0% and the medical decision point is
100. The calculated statistics are by the Deming method. The representative SDs
were the defaults of 1.0. The Scatter Plot Bounds shown are calculated with a 6%
allowable error.
The results used in these examples were obtained from a program which simulates
output from a clinical laboratory instrument. The program contained a random
number generator which output results in an approximate Gaussian distribution.
Since the random number generator was started with the same seed in all cases,
the same sequence of random numbers was generated for all cases where the number
of points was identical.
Input to the result generating program includes a slope and intercept, the upper
and lower limits of the range of results before addition of the random error com-
ponent, CVs for both the X and Y methods and the number of points to generate.
The program generates a series of results at even intervals along the range. A
random error component is then added to each of those results. The results for the
nonlinear and outlier examples are arbitrarily “doctored” versions of the “good”
set of results.
The results were then imported into EP Evaluator® through the rapid results entry
facility using the Windows clipboard. Then the statistics were calculated.
If the above statements are all true, then you can make the powerful
statement that the two methods are equivalent within 95% confidence.
Target Slope    Calc. Slope    Std. Err. Est.    Bias     Std. Dev. Diff.
1.0             1.011          3.4               -0.3     3.4
2.0             2.024          8.4               150.0    61.3
To see the impact of random error, compare the range of values on the X axis of
the two bias plots for this case where Y method has a 10% CV (range: -60 to 60)
with where it is only 2% (range: -7 to 10.5). You would not likely detect this dif-
ference as being important unless you were familiar with the test.
Note also that the scatter plot bounds in the scatter plot are much wider than they
are with the case of the Good experiment. They are defined by the 95% confi-
dence interval.
In this example, since the random error is not the same for the two methods, the
representative SDs were set to 2 and 10 respectively for the old and new methods.
The easiest way to recognize non-linear systems is to look at the graphs. While it
shows up on the scatter plot, it is most obvious on the bias plot. The slope in the
report suggests that things aren’t quite right, but the effects there are subtle. A
simple glance at the bias plot shows where the problem is immediately.
Note that if the highest value assayed had been 180 (vs. the actual 250), the non-
linearity would not have been detected.
The effect of outliers is shown in the statistics below. The data are identical
to those of Case 1, except that one point has been increased 50%. Blatant outliers
are easy to spot on the graph. They have the most effect on parameters which are
indicators of random error, such as SDD and SEE.
The slopes and intercepts for a case of extreme range are shown below. In the
second case, the intercept is -39.0, almost 10 times the magnitude of the smallest
number. This is not unreasonable since the largest value of the data is 50,000.
The single statistic which points directly to the problems of these data is the Cor-
relation Coefficient. The ONLY potentially positive item in these statistics is the
bias. All statistics related to slope and intercept are worthless!!
If the number of specimens is small, the quality of the statistics will be poor. The
poor quality shows up as larger 95% confidence intervals for the slope and inter-
cept and a larger range for the 95% confidence limits (95% CL) for the MDP.
Recommendation:
• The statistically rigorous and rugged CLSI EP9 protocol recommends that 40
specimens be assayed in duplicate for a total of 80 results.
• For clinical laboratories, a minimum of 25 specimens should be used. Keep
in mind that a good distribution of results is more important than numbers of
specimens. The quality of your experiment will be much better if you have
results from 25 specimens spread over most of the reportable range than if you
have 100 specimens all in the (relatively narrow) normal range.
This type of method comparison experiment is very different from the types
discussed previously because it uses a different paradigm, as shown below:
This chapter is provided to help you understand the information calculated by
EP Evaluator®'s Linearity module. The Linearity module can evaluate laboratory
methods for any or all of the following:
Best Fit Line is calculated in one of two ways depending on whether an allow-
able error is input or not. If allowable error is defined and linearity is selected,
then the best fit line is obtained using the clinical linearity algorithm; other-
wise it is calculated using regular linear regression.
Slope is the steepness of the best fit line through the results. The ideal slope is 1.00.
Intercept is the value at which the best fit line crosses the Y axis. The ideal
intercept is zero.
Std Err Est (Standard Error of the Estimate) is a measure of the dispersion of the
data points around the linear regression line. Used only with linear regression.
Observed Error is the minimum allowable error which could be defined for a
data set and still have it be linear. Used only with Clinical Linearity.
Proximity Limits are the user-defined acceptable limits for the concentration of
the specimens used to test the reportable range. If that concentration is within
the proximity limits, then the method passes one part of the two part test for
meeting the manufacturer’s claim for the reportable range.
Residual is the difference between a result and the best fit line. For example,
if the slope and intercept of a best fit line are 1.1 and +10, and a result of
55 is obtained for a specimen with a defined concentration (defConc) of 50, the
residual is 55 - (1.1 × 50 + 10) = -10.
Linearity
One of the fundamental problems with Linearity studies at the time of this writing
(Summer, 2005) is that there is no consensus within the clinical laboratory indus-
try concerning either how to determine whether a set of data are linear or whether
determination of linearity even is necessary. One reason for this is that until the
advent of Clinical Linearity, a module unique to EP Evaluator®, no definition of
linearity took allowable error into account.
• A data set is linear if a straight line can be drawn through all the points. (Pen-
cil Width Definition)
• A data set is linear if the fitted polynomial curve is not significantly different
from the “ideal” linear equation. Other qualifiers are also applied. (Polynomial
Regression Definition) (CLSI:EP6 and CAP (1993)).
• A data set is linear if a straight line can be drawn through vertical error bars
centered on the mean of the results for each specimen. The lengths of the ver-
tical error bars are calculated from the user defined allowable error. (Clinical
Linearity Definition) (Rhoads and Castaneda-Mendez - 1994)
We believe that the last of these definitions, that of Clinical Linearity, makes most
sense in a laboratory setting because it is the only one that relates linearity to a
user defined allowable error.
Advantages of the traditional Linear Regression approach:
• Traditionally accepted in the industry;
• Easy to calculate statistics (slope, intercept and standard error of the estimate
(SEE)).
Disadvantages:
• Unable to clearly define what constitutes linearity;
• Unable to determine which points are outliers;
• Unable to relate linearity to a user defined analytical goal;
• Slope and intercept of best fit line may be biased if outliers are present.
Advantages of the Polynomial Regression approach:
• Approved by CLSI and CAP.
• Required by the FDA for certain submissions.
Disadvantages:
• Can incorrectly declare a case non-linear when Allowable Error criterion is
stated in percent and lower limit of data is near zero. (EP6A)
• Non-linear descriptions are hard to understand. (CAP)
• Identifies outliers, but does not exclude them. (Both)
Clinical Linearity
This is the only algorithm in commercially available software which can deter-
mine whether a set of results is linear within a user defined allowable error (Ea).
The functional definition of Clinical Linearity has three steps:
Entry of Allowable Error: The user enters an Ea such as CLIA PT limits for
each method. Ea is expressed in units of concentration or percent of concen-
tration or both.
Error Bars: Vertical error bars for each linearity specimen are calculated from
the Ea. Each error bar is centered on the mean of the results for that specimen.
Determination of Linearity: Our innovative algorithm attempts to draw a straight line
through the error bars. If such a straight line can be drawn, then the user can
claim that the data set is linear within the Ea.
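The core question, can one straight line pass through all the vertical error bars, can be posed as a small linear-programming feasibility problem. Here is a minimal Python sketch of that idea using scipy; it is an illustration of the concept, not the proprietary EP Evaluator® algorithm, and the function name is invented.

    import numpy as np
    from scipy.optimize import linprog

    def line_fits_error_bars(conc, means, ea_pct=0.0, ea_conc=0.0):
        """True if some line y = a + b*x passes through every vertical
        error bar (mean +/- allowable error) -- the Clinical Linearity idea."""
        x = np.asarray(conc, float)
        y = np.asarray(means, float)
        half = np.maximum(ea_pct / 100.0 * np.abs(y), ea_conc)  # bar half-widths
        lo, hi = y - half, y + half
        ones = np.ones_like(x)
        # Feasibility constraints in (a, b):  lo_i <= a + b*x_i <= hi_i
        A_ub = np.vstack([np.column_stack([ones, x]),      #  a + b*x <= hi
                          np.column_stack([-ones, -x])])   # -a - b*x <= -lo
        b_ub = np.concatenate([hi, -lo])
        res = linprog(c=[0.0, 0.0], A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None), (None, None)], method="highs")
        return res.success

    # Five specimens, 10% allowable error: linear within Ea.
    print(line_fits_error_bars([10, 50, 100, 150, 200],
                               [11, 52, 101, 148, 195], ea_pct=10))  # True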
Two additional steps provide additional data:
• Detect and Exclude Outliers: If the system is non-linear, the program looks
for and excludes outliers one by one from the data set. The process of exclud-
ing outliers continues until the remaining points are linear within the allow-
able error or until results for only three specimens remain. Outliers may be
anywhere in the data set including the middle points.
• Calculation of Best Fit Line: Several statistics are based on the best fit line
which is calculated by iteratively adjusting an internal Ea value until only one
line can be drawn through the error bars. The intrinsic error is the internal Ea
of the best fit line. The slope and intercept are those of the best fit line.
Disadvantages:
• Use of the clinical linearity algorithm has not been adopted by others in the
industry primarily because it is proprietary.
The criterion for passing the Linearity test is that the system is linear within
the allowable error by the Clinical Linearity algorithm.
Accuracy
Accuracy is the agreement between a result and its true value. Ideally, the true
value is obtained using a method which produces results equivalent to a definitive
method. Since that rarely is the case, the practical “true value” is usually obtained
by averaging results from many instruments.
A Recovery Plot is shown at right. The significant features of this plot are:
• An outer envelope (dashed lines; in this case, just inside the top and bottom
graph boundaries) which defines allowable total error. A point is considered
to have excessive error if it falls outside the outer envelope.
• An inner envelope (dotted lines) which defines allowable systematic error.
The mean for a specimen is considered to have excessive error if it falls
outside the inner envelope.
• The 100% Recovery line, which is the ideal recovery (i.e. perfect accuracy).
Both the outer and inner envelopes are centered around this line.
The reportable range test is only applied to the lowest and highest specimens in
the series. There are two criteria for passing:
• The specimens pass the proximity test, namely that the defined concentration
be sufficiently close to the limits of reportable range specified by the vendor
for that method.
The purpose of this test is to assure the laboratory that both ends of the report-
able range are adequately tested. Cases are known in which the defined con-
centration of the highest specimen in the series is less than 50% of the upper
limit of the reportable range.
Suppose the vendor specifies that the reportable range for sodium is 110 to
160 mmol/L and that the laboratory specifies that the proximity limit is +/-
10%. The defined concentration of the specimen with the lowest concentration
in the linearity series must be in the range of 99 to 121 (110 +/-10%). If the
defined concentration of that specimen is 125, then the reportable range test
fails.
• The lowest and highest specimens meet accuracy requirements. This test, of
course, requires results from the accuracy test.
CLIA ‘88 requires the establishment or verification of the reportable range for
moderate and high complexity methods before they can be put into service.
In other words, the lab must verify the accuracy and reportable range of each
method periodically. This experiment is identical to the CAP requirement of Veri-
fying the Analytical Measurement Range. This current requirement represents a
significant change from the regulations promulgated in 1992.
Case Studies
These cases are real linearity experiments. Results were obtained during the initial
validation of an instrument. They illustrate a number of the issues that will be
encountered during the method validation process.
Case Analysis
• Results are clearly non-linear.
• Recovery is poor throughout the reportable range. Of the six specimens, two
are declared to be accurate. Four are considered to be linear.
• If a concentration component (5 mg/dL in this case) is added to the TEa, the
accuracy and linearity problems with the lowest specimen are fixed. However
those problems with the highest specimen still remain.
• Clearly there are severe systematic problems for this assay which need to be
resolved.
Case Analysis
• All the results in this case seem to fall about 5 to 7 units below their assigned
values. This seems to be an example of constant error.
• To fix it, one would proceed through the usual troubleshooting process:
repeat the analysis of the existing specimens; prepare fresh linearity and
calibration specimens; then recalibrate and re-assay the linearity specimens.
If those measures fail, then an examination of the instrument is in order.
Case Analysis
• This case represents one in which not all specifications of TEa and Reportable
Range (RR) are correctly specified.
• To fix the low end accuracy problem, one simply needs to add a clinically
insignificant concentration component to TEa. (A value of 8 units works.)
• Another reason for the RR failures is that the RR and/or proximity limit
specifications are unsatisfactory. If the upper RR limit were 1500 units (vs.
the present 2000), it would pass. The CAP-recommended proximity limit of 10%
at the limits of the reportable range may also need to be revisited.
Understanding Proficiency
Testing
In This Chapter
Most CLIA ‘88 requirements are established or verified experimentally. We dis-
cuss:
• The types of experiments which may be used to establish or verify each CLIA
technical requirement.
• Guidelines for testing new instruments.
One of the key elements of the current CLIA ‘88 regulations is proficiency testing
(PT). The intent of PT is to improve the quality of results produced by clinical
laboratories. On examination, it turns out that PT is heavily statistical. Poor labs
are much more likely to fail than good labs. However, the statistical nature of the
process means that even though the odds are against it, very good labs can fail and
likewise very poor labs can pass. This chapter has two purposes: a) to indicate
ways to improve one’s odds for passing PT; and b) to understand the statistics of
PT. But first, the rules. . .
Regulatory Requirements
The CLIA regulations state that at periodic intervals (at least three times per
year), each laboratory must participate in a proficiency testing survey (an
event) unless testing for that analyte is waived by CLIA. Five proficiency samples (challenges)
for each PT analyte are to be analyzed. After the results for the five specimens are
returned to the PT provider, the provider determines if the results fall within the
PT limits around a target value.
A result is graded as passing if it falls within the PT limits and failing if it
falls outside them. A passing grade is obtained when 80% (at least 4 out of 5)
of the results are within the limits. A grade of less than 80% for an analyte,
specialty or subspecialty results in an unsatisfactory performance. An
unsatisfactory performance for two of three consecutive events results in an
unsuccessful performance, at which point sanctions will be imposed.
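In code, the grading logic is tiny. A minimal sketch (function names are illustrative):

    def grade_event(n_within, n_challenges=5):
        """An event passes when at least 80% of results are within PT limits."""
        return (n_within / n_challenges) >= 0.80

    def unsuccessful(last_three_events):
        """Unsuccessful performance: unsatisfactory on 2 of 3 consecutive events."""
        return sum(not passed for passed in last_three_events) >= 2

    print(grade_event(4))                      # True  (4 of 5 within limits passes)
    print(unsuccessful([True, False, False]))  # True  (sanctions territory)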
Theoretical Approach
Since PT is a statistical process, laboratorians should understand on a theoretical
basis how to improve chances of passing. In simplest terms, total error should
be reduced to such a level that the probability of failing PT drops to essentially
zero. Total error has three major components: systematic error, random error, and
idiosyncratic error. These were discussed in Chapter 5, Understanding Error and
Performance Standards.
In this example, the difference (PT Limits minus Bias) is equal to about 1 SD. The
probability of that result being outside the PT limits is about 17%.
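That figure is just the Gaussian tail area. A minimal sketch using scipy, assuming Gaussian random error; the numbers in the example are illustrative:

    from scipy.stats import norm

    def p_outside(pt_limit, bias, sd):
        """Probability a single result falls outside +/- pt_limit when the
        method runs with the given bias and SD (Gaussian model)."""
        return norm.sf((pt_limit - bias) / sd) + norm.cdf((-pt_limit - bias) / sd)

    # (PT limit - bias) equal to 1 SD:
    print(p_outside(pt_limit=4.0, bias=2.0, sd=2.0))  # ~0.16, the "about 17%" above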
(Figure: probability of PT failure. The X axis is the difference (PT Limit - Bias)
expressed in SD units. The Y axis is the percent of results outside PT Limits.)
Note that this concept is very similar to the one discussed earlier in Chapter 6,
Defining Performance Standards. There it was expressed in terms of total allowable
error (vs. PT Limits in this chapter) and systematic error (vs. bias in this chapter).
Bias
Bias can be a serious problem in proficiency testing. If not kept under control,
it can “kill” you! This figure shows that the probability that a result will be
outside the PT limits increases significantly as bias increases.
The page showing the Probability of PT Failure part of the report from EP Evalua-
tor® is shown at the end of this chapter. One of the tables is shown below.
PT Analysis Table
The PT Analysis table shows how biases are calculated and gives probabilities
that a result will be outside its PT limits for each specimen.
Assigned Conc’ns are the same concentrations that were originally input into the
Linearity Results Entry screen.
The situations in which a lab must pass PT are shown in the table below. For the
purposes of this illustration, the present time is assumed to be shortly before the
PT event in December. To avoid an Unsuccessful Performance, the lab must pass
PT in December in cases 1 and 2. Furthermore, in case 2, it also must pass the
event Next April. The lab in case 3 is not in a Must Pass situation.
From now on        Case 1       Case 2       Case 3
December (now)     Must Pass    Must Pass    --
Next April         --           Must Pass    --
Failure Relationships
Of interest is how the Probability Result Outside Limits relates to failing the three
types of PT events. Here the probabilities of event failure are calculated assum-
ing that the failure probabilities for all five specimens are the same. Note the
substantial increase in likelihood of failing once a single event has been failed. In
some cases, such likelihood of failure can increase by a factor of 20. Clearly, it is
worthwhile to avoid failing any events.
Performance Requirements
Once the various probabilities of failure have been calculated, the next major
issue is “the level of bias and precision needed to be sure of passing.” Work to
adjust the bias and SD so that the result calculated with the equation below is
at least 3. The ideal is 4.5, which corresponds to Six Sigma. It is not worthwhile
expending effort to get a value higher than that.
Num SDs = (PT limit - Bias) / SD
The probability of failing the next event as a function of the probability of a
result falling outside the PT limits is shown below. The important message is
that once the probability of a result being outside the PT limits drops below
0.2% (which corresponds to Num SDs >= 3), the probability of failure becomes
vanishingly small, less than 0.01%.
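The arithmetic behind that statement is binomial: an event fails when 2 or more of the 5 results fall outside the limits. A minimal sketch:

    from math import comb

    def p_fail_event(p, n=5, max_fails=1):
        """Probability of failing a PT event: more than `max_fails` of the
        n results outside the PT limits."""
        p_pass = sum(comb(n, k) * p**k * (1 - p)**(n - k)
                     for k in range(max_fails + 1))
        return 1 - p_pass

    # At p = 0.2% per result (Num SDs ~ 3), event failure is vanishingly rare.
    print(p_fail_event(0.002))  # ~4e-05, i.e. less than 0.01%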
There are approximately 12,000 hospital and reference laboratories in the United
States. With each analyzing at least 60 PT analytes, there are a total of 720,000
opportunities for failing each PT event. If the overall probability of an unsatisfac-
tory performance for this entire group is 0.01% (1 chance in 10000), then statisti-
cally, one would predict 72 unsatisfactory performances in this group.
The more PT tests a lab has on its menu, the more opportunities there are for
problems. An insurance actuary would note the increase in “exposure.” If the
worst case is a single test with a 1% probability of failure, the probability of pass-
ing is 99%. However if 60 tests each has a 1% probability of failure, the overall
probability of passing all the tests is 55%.
For example, suppose a lab has 5 analytes with the probabilities of failure as fol-
lows: Sodium (5%), Potassium (0.1%), Glucose (5%), Chloride (1%), and BUN
(0.2%). Overall probability of passing would be (after converting from percent to
fractions):
PP (sodium) = 1 - 0.05 = 0.95
PP (potassium) = 1 - 0.001 = 0.999
PP (glucose) = 1 - 0.05 = 0.95
PP (chloride) = 1 - 0.01 = 0.99
PP (BUN) = 1 - 0.002 = 0.998
The calculations can be simplified by omitting failure probabilities less than
25% of the largest probability, because their contributions are insignificant.
In the example above, if the probabilities for chloride (1%), potassium (0.1%)
and BUN (0.2%) were omitted, the overall PP would change from 89.08% to 90.25%,
an insignificant difference.
Keep in mind that in the period when the PT results are due, other labs may
be having similar problems and that the vendor’s hot line may be very busy.
Consequently, keep your instrument in good condition all the time and avoid
last minute repairs.
The purpose of this exercise is to pass PT. Failing PT can jeopardize the future of
your laboratory. Consequently, you should take all legally available measures to
increase your odds of success.
• Before PT samples are run, run your usual controls to make sure that your
instrument is “on the money.”
• Then carefully prepare and analyze the PT samples as if they were patient
specimens. Remember that you cannot analyze PT samples in duplicate un-
less you routinely assay patient samples in duplicate.
• Enter the results on the report form and check the entries. Get someone else
to check the entries for errors. It is a shame to go through all that work to get
everything working properly and then make a mistake reporting the results.
12
Precision Experiments
In This Chapter
Precision experiments evaluate random error. We discuss:
• Deciding which type of precision experiment to perform.
• Simple Precision Experiments.
• Complex Precision Experiments.
• Interpretation of results from precision experiments.
One decision each lab must make is what type of precision experiment to perform.
The two possible types of experiments are listed below:
The simple precision experiment is a “quick and dirty” way to fulfill the letter
of the precision requirement. Its advantage is that it is easy to do. Its
disadvantage is that it only tells you about one element of the precision
performance of the instrument, namely within-run precision. It says nothing
about the other types of precision (between-run, between-day, and total). If a
laboratory cares about the quality of its results, we encourage it to perform
the complex precision experiment. While it does take a little longer, the
additional information is valuable.
This procedure is valid for verifying manufacturer’s claims only if the manu-
facturer makes within-run precision claims.
Materials: Obtain at least two or possibly three specimens for the precision study.
If two specimens are used, one should be in the lower portion of the reportable
range and the other should be in the upper portion.
Experiment: Assay each specimen at least 20 times in the same run.
Data Required for the Calculations: You need to know an approximate concen-
tration and the results. You have conducted one simple precision experiment
for each concentration of analyte. Concentrations can be entered in the form
of “Low” or “20” or “Level 1” in EP Evaluator®. Optionally, TEa and the ran-
dom error budget can also be entered. If these are entered, then the observed
precision can be checked to see if it meets explicit precision requirements.
Calculations: In EP Evaluator®, the appropriate statistics module is Simple Pre-
cision. Enter the data and calculate the results.
SD and CV: The most important numbers on the page. They should be compared
with the manufacturer’s claims at similar concentrations. Keep in mind that
an SD marginally larger than that claimed by the manufacturer may not be
significant.
Precision Graph: Look for any significant trends in the data. An example would
be all the early points being above the mean followed by a drift downward
over the analysis period.
Mean: Significant only if the manufacturer’s mean is appreciably different.
95% Confidence Interval: This is the range within which the mean is expected
to occur 95% of the time if this experiment were repeated.
2 SD Range: This is the range within which individual results are expected to oc-
cur about 95% of the time in future experiments.
Precision Graphic: Displays the observed SD complete with a 95% confidence
interval. If the observed SD exceeds the Random Allowable Error (REa), then
it fails. However, if the bottom of the 95% CI is less than REa, the user may
legitimately declare the experiment as having passed and override the failure
declaration of the program. Keep in mind however that the magnitude of the
95% CI is also dependent on the number of results (N) included. As N increas-
es, the 95% CI decreases.
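For reference, here is a minimal sketch of the core calculations for a simple (within-run) precision experiment. The claim-verification statistics in the report below involve additional tolerance-limit machinery not reproduced here, and the function name is illustrative.

    import numpy as np
    from scipy import stats

    def simple_precision(results):
        """Mean, SD, CV, 95% CI of the mean, and 2 SD range for one level."""
        r = np.asarray(results, float)
        n, mean, sd = len(r), r.mean(), r.std(ddof=1)
        # t-based confidence interval for the mean
        half = stats.t.ppf(0.975, df=n - 1) * sd / np.sqrt(n)
        return {"mean": mean, "sd": sd, "cv_pct": 100 * sd / mean,
                "ci95_mean": (mean - half, mean + half),
                "range_2sd": (mean - 2 * sd, mean + 2 * sd)}

    # Example: the 20 "Results" values from the glucose report below.
    print(simple_precision([242, 246, 245, 246, 243, 242, 238, 238, 247, 239,
                            241, 240, 249, 241, 250, 245, 246, 242, 243, 240]))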
EP Evaluator GLUCOSE
Clinical Laboratory -- Kennett Community Hospital
Instrument: XYZ
Sample Name: HIGH
EP5 Precision
Claim Evaluation
User's Concentration: 244.2 Claim Concentration: --
Standard Deviation
                 df     %CV    User's SD    Claim    Verification Value (95%)    Pass/Fail
Within run       40     1.2    2.8          2.5      2.95                        Pass
Between run             0.7    1.8
Between day             0.6    1.4
Total            65     1.5    3.6          3.4      3.88                        Pass
Medical Req      65     1.5    3.6          --       --                          --
The calculated value passes if it does not exceed the verification value.
Results:
242 246 245 246 243
242 238 238 247 239
241 240 249 241 250
245 246 242 243 240
Supporting Data

Upper 95% tolerance limit for 95% of user estimates:
df for user's experiment    Within-run SD    Total SD
10                          3.8              4.9
20                          3.5              4.5
30                          3.4              4.3
40                          3.3              4.2
50                          3.3              4.2
60                          3.2              4.1
70                          3.2              4.1
80                          3.2              4.1
90                          3.1              4.0
100                         3.1              4.0
(This table provides data for a manufacturer to include in published materials for users.)

Analyst: Alice Doe
Analysis Date: 12 May 2000 to 31 May 2000
Days (total/excl): 20 / 0
Runs per Day: 2
Reps per Run: 2
Critical Value: 95%
Reagent: AA, Lot ABC 87011
Calibrator: BB, Lot DEF 4700
Units: mg/dL
Verify Mode: Verify Vendor Claim
Allowable Total Error: --
Random Error Budget: --
Allowable Rand Error: --
Comment:
Accepted by:
Signature Date
EP Evaluator 7.0.0.99 Copyright 1991-2005 David G. Rhoads Associates, Inc.
Default Printed: 14 Jun 2005 16:40:48 Page 1
EP Evaluator GLUCOSE
Clinical Laboratory -- Kennett Community Hospital
Instrument: XYZ
Sample Name: HIGH
EP5 Precision
Experimental Results
Date Results Date Results Date Results
12 May 2000 242 246 19 May 2000 245 245 26 May 2000 247 248
245 246 243 245 245 246
13 May 2000 243 242 20 May 2000 243 239 27 May 2000 240 238
238 238 244 245 239 242
14 May 2000 247 239 21 May 2000 244 246 28 May 2000 241 244
241 240 247 239 245 248
15 May 2000 249 241 22 May 2000 252 251 29 May 2000 244 244
250 245 247 241 237 242
16 May 2000 246 242 23 May 2000 249 248 30 May 2000 241 239
243 240 251 246 247 245
17 May 2000 244 245 24 May 2000 242 240 31 May 2000 247 240
251 247 251 245 245 242
18 May 2000 241 246 25 May 2000 246 249
245 247 248 240
'X' indicates an excluded run, 'O' indicates an outlier run, and 'S' indicates a day that does not
have a full complement of results. In all of these cases, the entire day is excluded from the calculations.
Understanding Reference
Intervals
In This Chapter
Verification or establishment of a correct reference interval is a CLIA ‘88 require-
ment. We discuss:
• The definition of a reference interval.
• The major issues with respect to establishing or verifying a reference interval.
• What experiments can be used to verify a proposed reference interval.
• What experiments can be used to adjust or establish a reference interval.
• Issues with respect to interpretation of results from verification of reference
interval experiments.
Key Concepts
Medical Decision Point: that value for an analyte which represents the boundary
between different therapeutic approaches.
Note also another pun, on the word “normal”. There are two relevant definitions
of this term in this context: one referring to a healthy population, the second
a more statistical one referring to a (hoped-for) Gaussian distribution.
Note, in the case of haptoglobin, that both the lower and upper limits are shown
with a 90% CI. The point of this is that reference intervals, just like other
statistics, have some uncertainty.
Reference Interval: A pair of medical decision points which frame the limits of
results expected for a given condition. All normal ranges are reference inter-
vals. Not all reference intervals are normal ranges.
Therapeutic range: Reference interval applied to therapeutic drugs. Establish-
ment of a therapeutic range can be very difficult. One of the major problems is
collecting specimens from patients who often-times are very sick. Examples
are shown in below.
There are an increasing number of tests for which medical decision points and ref-
erence intervals are established by the industry. One might say that these are “cast
in stone.” These are discussed in a later section.
While it is a lot of work on your part to establish a good range, there is much
more effort, pain and expense on the part of your customers to do inappropriate
work-ups because of falsely abnormal results produced in your laboratory. It is
even more of a problem if diagnoses are missed because your results falsely indi-
cated that the marker disease was not present.
Make sure that you use enough specimens to assure the statistical conclusions that
you draw are reasonable. Twenty specimens is enough only to verify the RI. It is
far from enough to adjust it.
Establishing a Normal Range
A reference interval (RI), known more familiarly as a normal range, is that range
within which the results of a healthy patient are expected to fall. RIs are estab-
lished by assaying a large number of specimens from healthy people. Then the re-
sults are ordered by value and the central 95% is used. Effectively what happens
is that the bottom 2.5% and the top 2.5% are both chopped off from this ordered
list.
At first blush, this seems to be simple to fulfill. However some complicating is-
sues quickly arise. These come under the categories of analyte type, pre-analytical
issues and approach to the calculation.
Analyte Type
• If the analyte is not endogenous (i.e. it is a drug), the user may declare the
reference interval / normal range to be zero to zero. However, there is the ad-
ditional problem in establishing the therapeutic range. Usually one relies on
literature values and/or package inserts.
• In addition, there are many analytes for which reference intervals are inap-
propriate, such as cholesterol, HbA1c and the like. Cutoff values are used for
these analytes, not reference intervals. Establishing suitable cutoff values is
non-trivial. ROC software can be used for this.
Pre-Analytical
Issues here relate to the population from which the specimens are taken, the care
with which they are processed, and the number of specimens.
• Ideally, samples are obtained from a healthy population. Often, this require-
ment is difficult to achieve. Possible alternate specimen sources are discussed
below.
• The population used to establish or verify the RI must be the same as the
population to which the RI applies. For example, the population used to
verify or establish the RI for PSA should be males. It should not be the usual
laboratory employee population in which young women predominate.
• RIs can differ dramatically with age, gender and lifestyle. These differences
need to be anticipated during the specimen collection process. Two examples:
a) Hemoglobin concentrations are quite different for a pediatric population
than for an adult population; b) Cholesterol concentrations are much lower
for a Japanese population than for an American population.
Calculation Approach
• In a great many cases, the distribution of results is not Gaussian. Therefore
the simple (un-transformed) parametric calculation of the reference interval
(mean ± 2 SD) will not work.
• Many additional issues with respect to RI calculations and interpretation of
the results are discussed later.
WARNING:
Do not calculate the Normal Range from the mean ± 2 SD unless
it is very clear that the distribution really is Gaussian or unless it is
clear that this approach is the best of all reasonable alternatives.
Excluding a result just because it appears to be a little large or a little small may
cause the calculated normal range to be in error. It is very tempting to conclude
that EVERY result that falls outside some published normal range is an “outlier”
that should be excluded. Not so. To use a deliberate pun, it is nor-mal for some
results to lie outside the normal range. If you exclude every result outside the
published range, you will guarantee that your reference interval estimate is too
narrow.
“Outliers” and “Tails” are actually separate problems. An outlier, in the context
of CLSI:C28, is a single point or two that is very far from the others. A tail is a
larger group of points that cannot be readily separated from the main body of the
data, but gives the histogram a non-Gaussian appearance. Possible causes are
discussed below.
The way we did this was to make the assumption that the reference interval
determined for all the results in a data set was accurate when calculated using
CLSI:C28. Then we randomly sampled that data set 50 times for each trial with
N ranging from 25 to 1000. We then submitted these various sets of data to our
reference interval software and examined the two point estimates and the 90%
confidence intervals for both.
We then evaluated the systematic and random error of the various approaches.
Systematic error was the difference between the mean estimates of the lower and
upper reference interval limits and the correct value. Random error was the dis-
persion of the various calculated limits around those computed reference limits.
Total error was calculated as the sum of systematic error plus 2 times random
error.
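For readers who want to try this kind of resampling study themselves, here is a minimal Python sketch. The data and the simple percentile-based estimator are hypothetical stand-ins; EP Evaluator®'s actual algorithms are not shown.

import numpy as np

rng = np.random.default_rng(42)

def nonparametric_ri(values):
    # Central 95%: chop the bottom and top 2.5% of the ordered results.
    return np.percentile(values, 2.5), np.percentile(values, 97.5)

population = rng.normal(100, 10, 5000)       # stand-in for a real data set
true_ll, true_ul = nonparametric_ri(population)

for n in (25, 50, 120, 500, 1000):
    uls = []
    for _ in range(50):                      # 50 random samples per trial
        sample = rng.choice(population, size=n, replace=False)
        uls.append(nonparametric_ri(sample)[1])
    sys_err = abs(np.mean(uls) - true_ul)    # systematic error (upper limit only)
    rand_err = np.std(uls)                   # random error (dispersion of estimates)
    total_err = sys_err + 2 * rand_err       # total error = SE + 2 * RE
    print(f"N={n:5d}  SE={sys_err:5.2f}  RE={rand_err:5.2f}  TE={total_err:5.2f}")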
We observed that:
• Random error decreases with increasing N for all approaches. If N increases
by a factor of 4, random error decreases by a factor of 2. Of the algorithms
tested, the nonparametric method has the greatest random error.
• While it is somewhat affected by sample size, Systematic Error is more a
function of the estimation method (i.e. parametric, nonparametric) than sam-
ple size. Error performance of the algorithms is exactly reversed, as compared
to random error. The nonparametric method has the least systematic error, and
the parametric method has the most.
• If the results have a nice Gaussian distribution, all the approaches work well
and give similar results. Both systematic error and random error are low.
• If the results have a significant tail, then there can be substantial amounts of
systematic or random error or both regardless of the calculation approach. In
a distressingly large number of cases, total error exceeded 20%, sometimes by
a large amount, even when N was 120.
• The CLSI non-parametric approach is by far the best approach with large
numbers of specimens (N>500) assuming small numbers of outliers (less than
1-2% at either end).
• Transformations can be dangerous with small numbers (<60) of results when
outliers are present, since the outliers have undue weight in estimating the
transformation. Trying to force a Gaussian distribution to “fit” the outlier
makes the reference interval estimate less accurate rather than more accurate.
• With small N (<60), the best approach is parametric, again assuming that the
numbers of outliers are small. One must realize that in this case, the calcu-
lated limits will be relatively inaccurate. However they are more likely to be
closer to the correct value than results calculated using other approaches.
Calculation Approaches
Different types of calculations are used to Establish Reference Intervals in EP
Evaluator® because several complicating issues often occur:
• Distribution of results in the sample. The fundamental issue here is that in
many cases, the distribution of results is non-Gaussian. In other words the
distribution may exhibit skewness and/or kurtosis. In these cases, the conse-
quence is that the simple mean/standard deviation calculations, which assume
a Gaussian distribution, will not accurately calculate a normal range.
• Difficulty of getting adequate numbers of results. Obtaining the required
number of results can be a daunting task in some settings. One of the most
difficult is obtaining an adequate number in pediatric settings. It is not easy to
obtain 60 specimens from healthy children. Additional complications occur
when the RI is age- and gender-related so the number of specimens for a given
RI is reduced even further.
• Reducing the effects of tails and outliers.
The characteristics of each of these types of calculations are described below:
CLSI:C28 - Non-parametric
Advantages: This is the industry standard. It is relatively simple to use and un-
derstand. One of its best attributes is that it makes no assumptions about the
distribution of the data.
Disadvantages: A relatively large number of specimens (minimum of 120) is
required.
How it works: The results are ordered by value. With 120 specimens, the lower
end of the RI is the value of specimen 3. The upper end is the value of speci-
men 118. The 90% confidence interval is defined by the range of several
specimens, in this case specimens 1 to 7 at the lower end.
When to use it: Any time you have enough specimens (minimum of 120).
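A minimal sketch of the rank arithmetic in Python. It assumes the usual C28 rank formula of 0.025*(N+1) and 0.975*(N+1), which reproduces the ranks quoted above (specimens 3 and 118 for N = 120).

def c28_ranks(n):
    # Ranks (1-based) of the specimens that define the central 95%.
    lower = round(0.025 * (n + 1))
    upper = round(0.975 * (n + 1))
    return lower, upper

def nonparametric_ri(results):
    ordered = sorted(results)
    lo, hi = c28_ranks(len(ordered))
    return ordered[lo - 1], ordered[hi - 1]   # convert ranks to 0-based indices

print(c28_ranks(120))   # (3, 118)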
Parametric
Advantages: Easy to understand. Easy to calculate. Often the only reasonable
method for small samples.
Disadvantages: Assumes a Gaussian distribution. Sensitive to tails and outliers.
May give a negative lower limit for skewed data.
How it works: Simple calculation of mean ± 2 SD.
When to use it: Whenever there is a Gaussian distribution of the data, or when
the sample size is very small (N<50) and there are no obvious outliers.
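A minimal sketch of the parametric calculation; the example data are hypothetical.

import statistics

def parametric_ri(results):
    m = statistics.mean(results)
    sd = statistics.stdev(results)      # n-1 denominator
    return m - 2 * sd, m + 2 * sd       # mean ± 2 SD

results = [242, 246, 245, 246, 243, 242, 238, 238, 247, 239]   # example data
print(parametric_ri(results))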
Robust
In an early version of EP Evaluator® robust techniques were used. After an ex-
tensive study, we removed them because we felt they did not generally improve
the calculated RI. While in a few cases there was an improvement, in most cases
there was none, so we removed the option to reduce complexity.
Confidence Interval
The confidence limit indicates the uncertainty of the normal range estimate. In an
example shown earlier (Table 13.1), the point estimate for the lower limit of the
haptoglobin normal range is 25 mg/dL and the 90% confidence limit is from 20
to 33 mg/dL. It is important to determine the uncertainty in one’s result because
it gives a sense of the magnitude of the random error of the estimated value.
For example, suppose you calculate the normal range of calcium (traditional range
is 8.5 to 10.5 mg/dL) as well as the 90% confidence limits (CI). One’s conclu-
sions about the quality of the data are very different if the CI is the relatively tight
8.4 to 8.6 in one case vs. a much looser 8.0 to 9.1 in another.
The major factor affecting the confidence interval is the number of specimens.
The CI will tend to halve in size as the number of specimens increases by a factor
of four.
Also, do not assume that the “best” method in the reference interval report is the
one with the narrowest CI. The confidence intervals for different methods are not
directly comparable. This is particularly true when comparing a parametric meth-
od to a non parametric method.
The Confidence Ratio (CR) is a measure of the relative width of the 90% CIs for
the low and high limits of the reference interval compared to the width of the
reference interval itself. It is undesirable for that value to be greater than 30%,
because a wide CR indicates a lot of uncertainty in the measurement of the
reference interval:

CR = 0.5 * (width of CI for LL + width of CI for UL) / (UL - LL)

where UL and LL are the upper and lower limits of the appropriate reference
interval (URL and LRL).
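A sketch of that calculation in Python. The averaging of the two CI widths is inferred from the worked example later in this chapter (0.5*2.3/5); the numbers below are hypothetical.

def confidence_ratio(lrl, url, lrl_ci, url_ci):
    # Average width of the two 90% CIs, relative to the RI width.
    ci_width = ((lrl_ci[1] - lrl_ci[0]) + (url_ci[1] - url_ci[0])) / 2
    return ci_width / (url - lrl)

# Hypothetical limits: RI of 10 to 15, CI widths of 1.0 and 1.3 -> CR = 0.23 (23%)
print(confidence_ratio(10.0, 15.0, (9.5, 10.5), (14.3, 15.6)))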
The case studies below illustrate how these issues affect the interpretation of the
study.
Plots of Results
Three plots are presented for each set of data.
• Mean and SD
• Median
• Central 95% Index. These are the indices of the two values used to establish
the lower and upper limits of the non-parametric approach.
Transformation Process
The table of Normalizing Transformations shows the exponent and constant for
the transformation process. Non-blank values for the exponent include 0.0 (log),
0.25, 0.5 (square root), 0.75 and 1.00. While it is possible to normalize a wide
variety of data sets using a broader range of exponents, this has not been done
because it is entirely possible to make absurd data seem good.
Keep in mind that significant errors can be introduced by transforming data sets
with small N’s (<60).
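A minimal sketch of the transformation search in Python. It assumes a Shapiro-Wilk statistic as the goodness-of-fit measure (the exact criterion used by the software is not specified here) and omits the additive constant for brevity.

import numpy as np
from scipy import stats

def best_transform(x):
    x = np.asarray(x, dtype=float)
    best = None
    for p in (0.0, 0.25, 0.5, 0.75, 1.0):     # only the exponents listed above
        y = np.log(x) if p == 0.0 else x ** p  # exponent 0.0 means log
        w, _ = stats.shapiro(y)                # higher W = more Gaussian
        if best is None or w > best[1]:
            best = (p, w)
    return best    # (exponent, Shapiro-Wilk W)

rng = np.random.default_rng(1)
skewed = rng.lognormal(mean=3.0, sigma=0.4, size=100)   # hypothetical skewed data
print(best_transform(skewed))   # the log transform (p = 0.0) should fit best here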
The data has an un-skewed Gaussian distribution. Note that in the triple plots,
the histogram shows a more or less even distribution of the data. The data lies
nicely on the probability line in the probability plot (original data). Furthermore
no transformation was found which significantly improves the fit to a Gaussian
distribution. In this case, all the limits are essentially the same. This is expected
when the data are Gaussian and there is no transformation.
One major consequence is that the reference interval limits derived from un-trans-
formed calculations (parametric and robust - shown in gray) are unreliable. The
indicators of this are: a) evidence of skewed data in the triple plots; b) the statis-
tics in the Goodness of Fit table indicate skewed data; and c) the lower limit for
the parametric approach is a negative number.
For example with 40 specimens, the CR is about 22% (0.5*2.3/5). With 800
specimens, the CR drops to 5% (0.5*0.5/5), a 4.5-fold improvement. There are
two issues in this area:
• Making sure that the CR is sufficiently small so that your published RI is
meaningful in the many contexts in which it will be used.
• Realizing that while it is desirable to minimize the CI range, the amount of
work involved in getting improved confidence limits will at some point ex-
ceed the medical usefulness of the added precision of the measurement. This
decision will be closely related to the ease with which you can get additional
specimens.
Results for both men and women were included in the calculation. The values
shown were calculated from Hgb data using the parametric approach. The data
had a Gaussian distribution at all N’s. The first N (ranging from 20 to 800) speci-
mens in the set of 905 results obtained from a health fair were used to perform the
calculations. This experiment simulates the results as if a laboratory had obtained
N specimens from an arbitrary source.
CLSI:C28 points out that separate reference intervals may not be justified un-
less they will be useful and/or are well-grounded physiologically. Obtaining an
adequate number of samples is also an issue. You need twice as many samples to
estimate separate ranges for men and women as to estimate a single range for
both combined. Even if separate intervals are clinically useful, there is a trade-off
between one’s desire to have separate ranges and one’s ability to acquire enough
results to measure a statistically significant difference between the two groups.
Mean (the Z Test): Difference between the groups is statistically significant if the
difference in the means divided by the pooled SD exceeds a Critical Value that
depends on the sample size.
SD: (Ratio test for equality of variance) Separate references are justified if the
larger SD is more than 1.5 times the smaller SD.
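A minimal sketch of both checks in Python. The sample-size-dependent critical value shown is the Harris & Boyd form; that is an assumption on our part, since the formula is not given in this book.

import math

def z_test(m1, sd1, n1, m2, sd2, n2):
    # Difference in means relative to the combined SD of the two groups.
    z = abs(m1 - m2) / math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    z_crit = 3 * math.sqrt((n1 + n2) / 240)   # Harris-Boyd style critical value
    return z, z_crit, z > z_crit              # True -> partition justified

def sd_ratio_test(sd1, sd2):
    # Separate references justified if the larger SD exceeds 1.5x the smaller.
    return max(sd1, sd2) / min(sd1, sd2) > 1.5

# Hypothetical hemoglobin summary statistics for men vs. women:
print(z_test(15.2, 1.0, 120, 13.6, 1.0, 120))
print(sd_ratio_test(1.0, 1.0))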
There are plenty of specimens. Consequently, assuming the participants are rea-
sonably healthy, the results should be reasonably good. Partitioning observations:
Outliers: one or at most two results widely removed from the rest of the data.
True outliers may be the inclusion of specimens from one or two sick persons
in the study.
Tails occur either because the data really is non-Gaussian or it has results from
a mix of multiple populations. There is no clear line of demarcation between
the tail and what we want to believe is the “good data.” We will discuss tails
further in Case 6.
(Note: With a small sample, a tail in the population may look like an outlier in
the sample.)
In practice, this is often difficult to do because the person’s medical history is not
available to the analyst. For example, if you collect specimens at a health fair or
from employee pre-employment physicals, most of the specimens will be from
healthy people. Keep in mind that you rarely have assurance that all your speci-
mens are from healthy people. It doesn’t take many specimens to change the
answer significantly.
In some cases, a tail may occur in the absence of disease. In some of these cases,
it doesn’t matter what that RI is because the results in the tail are not indicative of
disease.
The point is that you just can’t take any set of patient results and get a valid refer-
ence interval. You must think through the specimen acquisition process BEFORE
you acquire specimens as that more than anything else will define your eventual
reference interval.
Sensitivity Experiments
In This Chapter
Verification or establishment of sensitivity is a CLIA ‘88 requirement. We discuss:
• Several definitions of Sensitivity.
• The two types of sensitivity implemented in EP Evaluator® and the situations in
which the use of each is appropriate.
• Two experiments to determine Limits of Blank (analytical sensitivity).
• Experiment to determine Limits of Quantitation (functional sensitivity).
The protocol for performing such an experiment may be found in CLSI:EP17, which
is not implemented in EP Evaluator®.
There was a name change between Releases 6 and 7 for EP Evaluator®. The module called Sensi-
tivity (Limits of Detection) in Release 6 was renamed Sensitivity (Limits of Blank) in Release 7.
The reason for the change was to stay current with the names that CLSI was using for the different
types of sensitivity. The calculation process was unchanged.
Purpose: To determine the sensitivity (limits of Blank) of a method for the case in
which the instrument cannot report out results which are zero or less than zero. If
the instrument can report out results regardless of value, then use the experiment
described below under “Sensitivity (LOB) - Alternate Experiment.”
Materials: Specimen A has a zero concentration of the analyte being studied. Speci-
men B has a low known non-zero concentration. For many methods, the low non-
zero calibrator is a very adequate Specimen B. The zero calibrator can be used as
Specimen A.
Experiment: Assay Specimen A 10 to 20 times. Assay Specimen B 3 to 15 times. All
replicates for both specimens can be assayed in the same run. Record the RE-
SPONSES (i.e. absorbance, fluorescence), not results in the units in which they
are reported (i.e. mmol/L).
Data Required for the Calculations: Instrument responses (not results) for both spec-
imens. Concentration of the non-zero specimen.
Calculations: In EP Evaluator®, Release 9, the appropriate statistics module is
Sensitivity (LOB). A figure showing the basis of these calculations is shown in
Figure 14.1.
Purpose: To determine the sensitivity (Limits of Blank) of a method for the case in
which the instrument can report out results regardless of value. If the instrument
cannot report out zero or less than zero values, then use the experiment described
above under “Sensitivity (Limits of Blank).”
Materials: A specimen with a zero concentration of the analyte being studied. The zero
calibrator specimen may be used.
Experiment: Assay this specimen 20 times. Record the results.
Data Required for the Calculations: Results for this specimen.
Calculations: In EP Evaluator®, the appropriate statistics module is Simple Precision.
Calculate an SD. Multiply that SD by 2 (95% confidence) or by 3 (99.7% confi-
dence) to get the Sensitivity (LOB).
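A minimal sketch of this calculation with hypothetical blank results:

import statistics

# 20 hypothetical replicates of a zero-concentration specimen.
blank = [0.4, -0.2, 0.1, 0.3, -0.1, 0.0, 0.2, -0.3, 0.1, 0.0,
         0.2, -0.1, 0.3, 0.1, -0.2, 0.0, 0.1, 0.2, -0.1, 0.0]

sd = statistics.stdev(blank)
print(f"LOB (95% confidence):   {2 * sd:.2f}")   # SD * 2
print(f"LOB (99.7% confidence): {3 * sd:.2f}")   # SD * 3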
Published Performance Standards
Two sets of performance standards are listed here.
• The first set is the proficiency testing (PT) limits defined for quantitative
assays by the CLIA ‘88 regulations published February 28, 1992. Semi-
quantitative or qualitative analytes are not included. In some jurisdictions,
other PT limits may apply. Unless otherwise specified, the lower and upper
PT limits are obtained by subtracting and adding the specified quantity to the
target value.
• The second set consists of medical requirements specified by national or
governmental organizations.
We have provided these lists for your information and convenience. While we
have checked them for errors, we do not guarantee that the lists are either com-
plete or accurate.
General Immunology
IgG: ± 3 SD
IgM: ± 3 SD
Routine Chemistry
Chloride: ± 5%
Cholesterol, total: ± 10%
Cholesterol, high density lipoprotein: ± 30%
Creatine kinase (CPK): ± 30%
Creatine kinase isoenzymes: MB elevated (presence or absence) or Target value ± 3 SD
Magnesium: ± 25%
Potassium: ± 0.5 mmol/L
Sodium: ± 4 mmol/L
Total protein: ± 10%
Triglycerides: ± 25%
Toxicology
Medical Requirements
The following values were developed by task groups created by the NIH. In all
cases, TEa is defined as bias + 2*CV.
The following are just those CLIA regulations pertaining to technical issues. The
complete CLIA regs can be found at:
http://www.phppo.cdc.gov/clia/regs/toc.aspx
The interpretative guidelines and probes may be found at:
http://www.cms.hhs.gov/CLIA/downloads/apcsubk1.pdf
Preanalytic Systems
For kinetic enzymes, the calibration verification requirements may be met by veri-
fying the procedure using a high enzyme level material such as a control, calibra-
tion material, or patient specimen and diluting it to cover the reportable range.
Control activities routinely used to satisfy the requirement for §493.1256 do not
satisfy the calibration verification requirements.
EXCEPTIONS:
1 For automated cell counters, the calibration verification requirements are
considered met if the laboratory follows the manufacturer’s instructions for
instrument operation and tests 2 levels of control materials each day of testing
provided the control results meet the laboratory’s criteria for acceptability.
2 If the laboratory follows the manufacturer’s instruction for instrument opera-
tion and routinely tests three levels of control materials (lowest level avail-
able, mid-level, and highest level available) more than once each day of test-
ing; the control material results meet the laboratory’s criteria for acceptability
and the control materials are traceable to National Institute of Standards and
Technology (NIST) reference materials, the calibration verification require-
ments are met.
If reagents are obtained from a manufacturer and all of the reagents for a test are
packaged together, the laboratory is not required to perform calibration verifica-
tion for each package of reagents, provided the packages of reagents are received
in the same shipment and contain the same lot number.
...
Probes §493.1255(b)
Glossary
Analyte is the substance being measured. Synonyms for analyte are “tests,” “pa-
rameters” (hematology) and “measurand.” Examples of analytes are glucose
(chemistry), red blood cells (RBC) (hematology), prothrombin time (PT)
(hemostasis), phenytoin (toxicology), TSH (endocrinology) and IgG (immu-
nology) to mention just a few.
Analyte Concentration is the amount of analyte present in whatever units are be-
ing measured, whether an actual concentration (mmol/L), an activity (U/L),
time (seconds), cell count percentage or some other measurable quantity.
Assigned Concentrations refers to the concentration of a set of specimens as
defined (or assigned) by the user; it is used in the linearity and calibration
verification protocols. Eleven approaches are used in EP Evaluator®, Release 9,
to assign concentrations or relative concentrations. For a fuller discussion,
refer to the EP Evaluator® User’s Manual chapter on Linearity.
Bias The difference between two related numbers. There are several definitions of
this term. The one frequently used by statisticians is (Ymean - Xmean). A second
definition is to indicate the difference between results obtained from two
different methods (Yi -Xi). A third is to indicate any difference between two
experimental values.
Bias Plot is a graph of the differences between the two results for a single speci-
men plotted versus the X result or in the case of the Bland-Altman plot, each
difference is plotted versus the average of the X and Y results.
Bins The concept of bins is discussed in Section 3.3 of CLSI:EP9. A suggested
list of bins is given in CLSI:EP9 Table I. Bins represent clinical or statistical
groups into which results for a given analyte can be distributed. For example,
CLSI:EP9 suggests that glucose results be divided into five groups, < 50, 51
to 110, 111 to 150, 151 to 250, and > 250 mg/dl. As the results are entered into
the program, the percentage in each group is counted and displayed on the data
entry screen. The purpose of this concept is to encourage the users to accumu-
late results from specimens over a fairly wide analytical range.
Bland-Altman Plot See Bias Plot.
Carryover occurs if a specimen or reagent used for the assay of one specimen
contaminates the mixture used to assay the next specimen. Usually carryover
causes the second specimen to have a falsely high value.
Case has to do with whether a name is spelled with “CAPITAL LETTERS LIKE
THIS” (upper case), with “little letters like this” (lower case), or with “Both
Capital And Little Letters Like This” (mixed case). (In general, you may input
data in either upper or lower case. If the program cares about the case, it will
make sure that it receives it in the appropriate form.).
Central Tendency is the center point of the data. Examples of central tendencies
are means and medians. See Chapter 4, Statistics 101.
Clinical Linearity is an algorithm by which the linearity of a system can be
evaluated against user-defined allowable error. See Chapter 10, Interpreting
Linearity Experiments, for details.
CLSI is the acronym for Clinical and Laboratory Standards Institute (formerly
known as National Committee for Clinical Laboratory Standards), a voluntary
organization which defines standards for the clinical laboratory industry.
Coefficient of Variation (CV) is a common measure of Dispersion. It is a form of
SD which has been normalized for concentration using the equation below.
CV = (SD / mean) * 100
Concentration is a generic term which refers to the amount of analyte present in a
specimen. It may be expressed in whatever units are appropriate to that ana-
lyte. It may refer to the concentration of a material such as glucose, the activ-
ity of an enzyme such as ALT, clotting time for a hemostasis analyte such as
APTT, or the number of cells such as neutrophils in hematology.
Confidence Interval The range of values in which results from a large fraction of
similar future experiments (usually 95 or 99%) are expected to fall.
Cutoff value: A medical decision point often defined with the use of ROC soft-
ware. For example, there are two cutoff values for cholesterol of 200 and 240
mg/dL. For some analytes such as drugs of abuse, the cutoff values are estab-
lished administratively.
CV. See Coefficient of Variation.
Defined Concentrations: See Assigned Concentrations.
Degrees of Freedom is a statistical term for a corrected number of variables used
to calculate a number. Generally, a larger number of degrees of freedom provides
more reliable statistics.
Deming Regression is a regression calculation made assuming that error exists in
the data plotted on the X axis. See Regular Regression.
Dispersion is the scatter of data around the central tendency. One example of this
would be seen in a Levey-Jennings chart in which typically the results are
scattered around the mean.
Drift is the net shift in results over time, either up or down. It is an indicator of
the instability of the analytical process. In EP10, it is evaluated in each run.
Experiment refers to the process used to evaluate a single analyte by a single
method (i.e. instrument). In many instances, experiments are closely related.
Method refers to the process which makes measurements on a specimen. Ex-
amples of methods include a chemistry instrument which determines glucose
concentrations, a hematology analyzer which measures hematocrits, a RIA
kit which measures estradiol, a manual microscopic process which counts
yeast cells in urine, and a device which measures pH, pO2 and pCO2. Synonym:
Instrument.
Method Comparison is a process which statistically compares two methods. The
usual purpose of the comparison process is to show the statistical relationship
of the methods being compared. The comparison may be either quantitative or
qualitative.
NCCLS: See CLSI.
Normal range is a range of results between two medical decision points which
corresponds to the central 95% of results from a healthy patient population. It
is one form of a reference interval.
Parameter: 1) An item used to describe a property of an analyte such as units, total
allowable error, reference interval and the like; 2) A hematology analyte.
Passing-Bablok is a robust approach to calculating the best straight line through
a series of points in a method comparison study. See Chapter 9, Interpreting
Method Comparison Experiments for a definition.
Performance Standards is a synonym for Allowable Error. The advantage of
using this term is that it is intuitively seen as a positive term. In contrast, the
term Allowable Error has negative implications. See Total Allowable Error for
a discussion of the term.
POC (Point of Care) (also known as Near Patient Testing) refers to tests which
are performed near the patient, as compared to the laboratory which is often at
some distance.
POL is the acronym for Physician Office Lab. This is a clinical lab serving one
or more physician offices and is managed by a physician.
Policy Definitions are the descriptors of the data needed to define an experiment.
Policy definitions allow a user to quickly create an experiment, enter or cap-
ture results and perform calculations. In other words, they are the non-results
type data which must be entered for each experiment such as names, units and
reference intervals for analytes, panels, and serial communication parameters
to mention but a few.
Precision is a measure of the agreement between replicate measurements of the
same specimen.
Prevalence is the frequency with which positives occur in a defined population.
Predictive Value Positive is the probability that a subject with a positive result
actually has the disease. It includes prevalence.
Predictive Value Negative is the probability that a subject with a negative result
actually does not have the disease.
Project is a folder containing a group of experiments from one or more of EP
Evaluator®’s statistical modules. Ideally all those experiments are related; for
example, the linearity, precision and method comparison experiments used to
evaluate a specific new instrument.
Proximity Limits are the acceptable limits for the concentration of the specimen
used to test the reportable range. If that concentration is within the proxim-
ity limits, then the method passes one part of the two part test for meeting the
manufacturer’s claim for the reportable range.
PT Limits (Proficiency Testing Limits) are analytical limits specified by regulato-
ry bodies for surveys. The PT limit for glucose is 6 mg/dL or 10% whichever
is greater. At a target concentration of 50 mg/dL, the PT limits are 44 to 56. At
a target concentration of 200 mg/dL, the PT limits are 180 to 220. For a list of
the PT limits specified by CLIA ‘88, see Appendix A, Published Performance
Standards.
Random Error Budget is that fraction of TEa which is allocated to 1 SD. Rec-
ommended range is 16% (6 SD’s per TEa) to 25% (4 SD’s per TEa). See
Chapter 5, Understanding Error and Performance Standards for additional
details.
Regression Line is the straight line drawn through the results which minimizes
the sum of the square of the distances between each point and the line. Think
of it as the “best fit” line.
Recovery is the amount of substance present in a sample that can be detected by
the analytical system. Usually this term is referred to as percent recovery. A
system in which there is 100% recovery is perfectly accurate.
Reference Interval. See Chapter 13, Understanding Reference Intervals.
Regular Regression is a regression calculation made assuming that no error
exists in the data plotted on the X axis. This is also termed “Ordinary Linear
Regression.” See also Deming Regression.
Residual is usually calculated in a linear regression environment. It refers to the
vertical distance between two numbers, one calculated from a best fit line
(often a linear regression line) and an experimental result.
Restore is the process of restoring data, which had previously been backed up, to
the original disk. See Backup.
Sensitivity: a) The probability that a test will be positive in a population in which
everyone has the disease. The ideal sensitivity is 100%. b) The lowest concen-
tration that can be reported (see Chapter 14, Sensitivity Experiments, for details).
Specificity is the probability that a test will be negative in a population in which
no one has the disease. The ideal specificity is 100%.
Skew refers to the position of the mode (highest point of the curve) of the bell
shaped distribution relative to the mean. If the mode and the mean are signifi-
cantly different, the curve is said to be skewed. See Kurtosis.
SMAD (Scaled Median Absolute Deviation) is a value similar to Standard Error
of the Estimate (SEE) in that it describes the scatter around the best fit line, but
developed with particular relevance to the Passing-Bablok approach as it is
insensitive to outliers.
Standard Deviation (SD) describes the degree of dispersion of data around a
central value or mean. In a set of normally distributed data, the central 2 SD
constitutes about 68% of the results. Similarly the central 4 SD constitutes
about 95% of the results.
Standard Deviation Index (SDI) is a measure of the distance of a point to the
mean described in standard deviation units. The equation for SDI is:
SDI = (result - mean) / SD
Standard Deviation of the Differences (SDD) comes from comparing the X
value with the Y value in a given pair of values. It represents the statistical
difference between the values of X - Y pairs. One helpful analogy is that the
SDD is an “envelope” around the bias similar to the “envelope” of the SD
around the mean in a Levey-Jennings chart. When using the terms of central
tendency and dispersion, SDD is the dispersion component. Bias is the corre-
sponding central tendency.
Standard Error of the Estimate Think of this number (SEE) as the “standard
deviation” of the differences between the linear regression line and the plotted
points. One helpful analogy is that the SEE is an “envelope” around the re-
gression line similar to the “envelope” of the SD around the mean in a Levey-
Jennings chart.
String is a line of one or more characters, typically used as a name or descriptor.
An example of a string is “GLUCOSE.”
Systematic Error Budget is the fraction of TEa that is to be allocated to system-
atic error. Recommended range is 25 to 50%. See Chapter 5, Understanding
Error and Performance Standards for additional details.
TEa: See Total Allowable Error.
Therapeutic Range is a reference interval applied to therapeutic drugs.
Total Allowable Error (TEa) has many definitions. One of these is “the amount of
error that can be tolerated without invalidating the medical usefulness of the
analytical result” (Carey and Garber - 1989). A more quantitative way to think
of it is “This result is expected to be within X% of the true result 99.7% of the
time” where X is the performance standard for this analyte. The “99.7%” may
be other values such as 95% or 99.9997%. Synonym: Performance Standards.
See Chapter 6, Defining Performance Standards for a substantial discussion
on this issue.
Unsatisfactory Performance is defined by CLIA ‘88 regulations as that occasion
when the grade during a proficiency event for an analyte is less than 80.
Unsuccessful Performance is defined by CLIA ‘88 regulations as that occasion
when there have been Unsatisfactory Performances for an analyte in two of
the last three consecutive PT events.
Worksheet is the RRE table into which data is entered prior to moving it into one
of the EP Evaluator® statistical modules.
Bibliography
Aspen Conference (1976) Proceedings of the 1976 Aspen Conference on Analytic
Goals in Clinical Chemistry, College of American Pathologists, Skokie, IL.
C.C. Garber and R.N. Carey (2010) Clinical Chemistry: Theory, Analysis, Correla-
tion, ed: L.A. Kaplan and A.J. Pesce, Mosby, Inc, an affiliate of Elsevier, Inc, St.
Louis.
R.N. Carey, C.C. Garber and D.D. Koch (2000) Concepts and Practices in the
Evaluation of Laboratory Methods. Workshop #2103, AACC 52nd Annual Meet-
ing, San Francisco, CA, July 23, 2000.
CLIA Interpretative Guidelines (2003); For sections related to Calibration and
Calibration Verification: http://www.cms.hhs.gov/CLIA/downloads/apcsubk1.pdf
CLSI Document C28-A2. How to define and determine reference intervals in the
clinical laboratory; Approved guideline-second edition. CLSI, 940 West Valley
Road, Suite 1400, Wayne, PA 19087-1898 USA, 2000. (References to this docu-
ment will be to CLSI:C28.)
CLSI Document EP9-A. Method comparison and bias estimation using patient
samples; Approved guideline. CLSI, 940 West Valley Road, Suite 1400, Wayne,
PA 19087-1898 USA, 1995. (References to this document will be to CLSI:EP9.)
CLSI Document EP12-A. User protocol for evaluation of qualitative test perfor-
mance; Approved guideline. CLSI, 940 West Valley Road, Suite 1400, Wayne, PA
19087-1898 USA, 2002. (References to this document will be to CLSI:EP12.)
P.J. Cornbleet and N. Gochman (1979) Clin Chem, 25, 432, Incorrect Least-
Squares Regression Coefficients in Method-Comparison Analysis.
Federal Register (1992), February 28, 1992, 42 CFR Part 405 et al. Medicaid and
CLIA Programs; Regulations Implementing the Clinical Laboratory Improvement
Amendments of 1988 (CLIA); (Part 493, Subparts I, K and P are relevant to this
program. These are the original CLIA ‘88 regulations.)
Federal Register (2003), January 24, 2003, 42 CFR Part 493. Medicaid and
CLIA Programs; Regulations Implementing the Clinical Laboratory Improvement
Amendments of 1988 (CLIA); (Part 493, Subparts I, K and P are relevant to this
program. This reference is the last and final set of CLIA ‘88 regulations.)
C.G. Fraser (1987) Desirable standards of performance for therapeutic drug moni-
toring. Clin Chem 33, 387.
C.G. Fraser (1987a) Goals for TDM: a correction, Clin Chem 33, 387.
R.S. Galen and S.R. Gambino (1975) Beyond Normality: the predictive value and
efficiency of medical diagnoses. John Wiley and Sons, New York.
Eugene K. Harris and James C. Boyd (1995) Statistical Bases of Reference Values
in Laboratory Medicine. Marcel Dekker, Inc., New York
N. Mielczarek (2004), Mother not killer, state concedes, Tennessean, Nov. 13,
2004, at 1A (accessed 22 Aug 2005)
National Health and Nutrition Survey (1976-1980). National Center for Health
Statistics. National Health and Nutrition Examination Survey, 1976-80. Catalog
No 5411, Version 2. U.S. Department of Health and Human Services, Public
Health Service, Centers for Disease Control.
New York State PT Surveys. Summaries of these results over many years are
available at: http://www.wadsworth.org/labcert/clep/PT/ptindex.html
Plebani and Carraro (1997) Clinical Chemistry, 43, 1348. Mistakes in a stat labo-
ratory: types and frequency.
J.W. Ross (1980) Blood gas internal quality control, Pathologist, 34, 377.
S.J. Soldin, C. Brugnara, J.M. Hicks (1999) Pediatric Reference Ranges, Third
Ed., AACC Press, Washington, DC.
D. Tonks (1963) A study of the accuracy and precision in clinical chemistry deter-
minations in 170 Canadian laboratories, Clin Chem 9, 217.
J.O. Westgard (2002), Basic QC Practices, Second Edition, AACC Press, Wash-
ington, DC.
Appendix E
Purchase Plans
Two general plans are available:
• Perpetual Dating. This is the traditional buy once and own it forever plan. The
user gets 60 days of free telephone support, free updates within their release,
and no upgrades to future releases. This plan is available only in the 50 United
States and Canada.
• Subscriptions. Basically this is a software rental plan. The usual subscription
is for a year. At the end of the initial subscription period, the subscription may
be renewed at a 10% discount off list price. Support is provided in two major
ways:
• Unlimited free telephone support.
• Automatic free upgrades to the next release.
EP Evaluator® - Webcasts
Free, live, interactive webcasts are given frequently at scheduled times either
weekly or monthly on a variety of EP Evaluator® related topics. Advance signup
is required. Consult our website (datainnovations.com) for topics, schedule and
signup.
EP Evaluator® - Online
Several statistical modules are also available from our website (datainnovations.com).
These statistical modules include most of those in the CLIA version plus a
module on Stability.
module on Stability.
The user accesses the on-line facility on the website using their browser. They
are then presented with a data entry screen. After data is entered and submitted to the website,
the website does the calculations and immediately returns a ready-to-print report
to the user.
Reports are virtually identical in appearance to the reports generated by the desk-
top version of EP Evaluator®.
The cost for most modules is about $1 per report. A block of report credits can be
purchased using your credit card on-line or from us directly.
Services
Installation, training, 24x7x365 support, and consulting all help you maximize
your Return on Investment in IM. Four worldwide offices ensure services are
available where and when you need us.
IM is FDA 510(k) cleared and Data Innovations (DI) is ISO 13485 certified.
Availability
IM is available directly from DI and from its numerous business partners. For
more information, contact DI at (802) 264-3470, northamerica-sales@datainnovations.com,
or through our website at www.datainnovations.com.