Use of Statistical Analysis of Cytologic Interpretation to Determine the Causes of Interobserver Disagreement and in Quality Improvement
CANCER
CYTOPATHOLOGY
Andrew A. Renshaw, M.D.
Kenneth R. Lee, M.D.
Scott R. Granter, M.D.

Division of Cytology, Departments of Pathology, Brigham & Women’s Hospital and Harvard Medical School, Boston, Massachusetts.

BACKGROUND. Disagreements in cytologic interpretation can have several causes, including differences in diagnostic threshold and diagnostic accuracy. These can be distinguished by a combination of statistical analyses.
METHODS. For demonstration purposes, a nonrandom collection of 80 cervicovaginal smears, the majority of which (74) were originally diagnosed as atypical cells of undetermined significance (ASCUS), were reviewed by 3 separate observers and classified as either negative, negative and reactive, ASCUS favor reactive, ASCUS not otherwise specified, ASCUS suggestive of a squamous intraepithelial lesion (SIL), low grade SIL, or high grade SIL. The results were compared with corresponding biopsies and analyzed with distribution analysis, the kappa statistic, threshold analysis, and receiver operating characteristic (ROC) curve analysis.
RESULTS. Distribution analysis of diagnoses from the three observers demonstrated statistically significant differences in how cases were classified and a low level of agreement. Kappa analysis confirmed a very poor interobserver agreement. Threshold analysis revealed that one observer used a threshold between negative and ASCUS that was statistically more specific but less sensitive than the other observers. ROC curve analysis showed that another observer was more accurate than this observer.
CONCLUSIONS. Variation in cytologic interpretation may have several causes. Distribution, threshold, and ROC analysis allow distinction between differences in diagnostic accuracy and diagnostic thresholds. This approach to analyzing cytologic interpretation may be useful for quality improvement efforts. Cancer (Cancer Cytopathol) 1997;81:212–9. © 1997 American Cancer Society.
opsy and those in which it did not. This selection bias was used to magnify any differences in interobserver variability, diagnostic threshold, or diagnostic accuracy. When reviewing the cases, each author knew that the original diagnosis for most cases was ASCUS, that each had a subsequent biopsy that was either negative or SIL, and that the cases were selected to be difficult. However, the authors did not know the actual distribution of diagnoses, what was favored, or the biopsy results. All cases were diagnosed according to the Bethesda system.26 Biopsy reports were used as the gold standard; individual biopsies were not reviewed for this study.
All cases were examined by each author and without clinical information or knowledge of the biopsy

Statistical Analysis
Distribution analysis was performed by assigning a numeric value (1 to 7) to each diagnosis (negative through HSIL, respectively) and performing a Mann-Whitney U test to determine whether the median diagnosis was different. Kappa analysis was performed using a one-tailed test model. Kappa values < 0.40 represented poor agreement, and values > 0.60 represented good agreement. Statistical analysis of diagnostic thresholds was performed using a two-tailed chi-square test. ROC analysis was performed using the CORROC2 program available from the Department of Radiology at the University of Chicago, written in part by Dr. Charles E. Metz and Helen B. Kronman.27 The program calculates a maximum likelihood estimate based on an effective pair of underlying bivariate-normal decision variable distributions. Statistical differ-
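As an illustration only, the distribution and kappa steps described in the Statistical Analysis section can be sketched in Python. The observer readings below are hypothetical, and the function names are this sketch's own rather than part of any program used in the study. Two choices are assumptions because the text does not specify them: the Mann-Whitney P value is taken from a tie-corrected normal approximation, and kappa is the unweighted two-observer (Cohen) form.

```python
import math
from collections import Counter

# Ordinal codes 1-7 in the order the categories are listed in the text.
CATEGORIES = ["negative", "negative and reactive", "ASCUS favor reactive",
              "ASCUS NOS", "ASCUS suggestive of SIL", "LSIL", "HSIL"]
CODE = {name: rank for rank, name in enumerate(CATEGORIES, start=1)}


def mann_whitney_u(x, y):
    """Return (U, two-sided P) for two lists of ordinal codes.

    P uses the normal approximation with a tie correction
    (an assumption; the paper does not say how P was computed).
    """
    if not x or not y:
        raise ValueError("both samples must be non-empty")
    n, m = len(x), len(y)
    # U counts the (x, y) pairs in which x is higher; ties count one half.
    u = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a in x for b in y)
    total = n + m
    tie_term = sum(t**3 - t for t in Counter(list(x) + list(y)).values())
    variance = n * m / 12 * ((total + 1) - tie_term / (total * (total - 1)))
    if variance <= 0:  # all codes identical, so no shift can be detected
        return u, 1.0
    z = (u - n * m / 2) / math.sqrt(variance)
    return u, math.erfc(abs(z) / math.sqrt(2))


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two observers rating the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / n**2
    if expected == 1.0:  # both observers used one identical category
        return 1.0
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Cut-offs from the text; the middle label is this sketch's own."""
    if kappa < 0.40:
        return "poor"
    if kappa > 0.60:
        return "good"
    return "intermediate"


# Hypothetical readings of the same six smears by two observers.
observer_a = [1, 4, 6, 2, 7, 3]
observer_b = [1, 5, 6, 4, 7, 4]
u_stat, p_value = mann_whitney_u(observer_a, observer_b)
kappa = cohen_kappa(observer_a, observer_b)
print(f"U = {u_stat}, P = {p_value:.3f}")
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

With three observers, the same two functions would be applied to each pair of observers in turn.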
FIGURE 2. Level of agreement between the three observers. SIL: squamous intraepithelial lesion.
ASCUS: atypical squamous cells of undetermined significance; SIL: squamous intraepithelial lesion.
a Two-tailed P values.

nosis of SIL on biopsy for each observer with a diagnosis of ASCUS or SIL is shown in Table 4. These thresholds represent different points along the ROC curve as demonstrated in Figure 3. Specifically, when the threshold was set at ASCUS, all cytologic cases diagnosed as negative were interpreted as negative and all cytologic cases diagnosed as either ASCUS or SIL were interpreted as positive. In contrast, when the threshold was set at SIL, all negative and ASCUS cytology cases were interpreted as negative and only cytology cases interpreted as SIL were interpreted as positive. Analysis of the differences in these thresholds is shown in Table 5. Observer 2 was significantly (P < 0.05) more sensitive in predicting SIL on biopsy at the negative/ASCUS threshold than Observer 3, and Observer 1 was almost statistically more

DISCUSSION
This study illustrates the use of several different methods to examine interobserver variability and accuracy in the interpretation of cervicovaginal smears. Although cases were not randomly selected, they provided an excellent group of cases to demonstrate the usefulness of these various methods in determining the causes of interobserver variability. A valid interpretation of ASCUS variability awaits application of these methods of analysis to a larger, unbiased, and random sample,28–32 which the authors currently are trying to assemble.
The current data illustrate the fact that poor interobserver variability may have several causes. The distribution analysis data demonstrate that each of the three observers categorized these cases differently. Observer 3 was statistically more likely to categorize a case as less atypical than the other observers, and Observer 2 was more likely to categorize a case as more
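The threshold analysis described in the Results, in which cytology is dichotomized either at negative/ASCUS or at ASCUS/SIL and compared with biopsy, can be sketched as follows. This is a minimal illustration under stated assumptions: the data and the three-level coding are hypothetical, the area under the curve is a simple trapezoidal estimate and not the maximum likelihood binormal fit performed by CORROC2, and the chi-square test is the Pearson form without continuity correction, which the text does not specify.

```python
import math

# Collapsed cytology scale used for thresholding (hypothetical coding).
NEGATIVE, ASCUS, SIL = 0, 1, 2


def sens_spec(cytology, biopsy_sil, threshold):
    """Sensitivity and specificity when codes >= threshold count as positive."""
    tp = fn = tn = fp = 0
    for code, is_sil in zip(cytology, biopsy_sil):
        positive = code >= threshold
        if is_sil:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one SIL and one negative biopsy")
    return tp / (tp + fn), tn / (tn + fp)


def roc_points(cytology, biopsy_sil, thresholds):
    """Each threshold gives one (1 - specificity, sensitivity) point."""
    points = [(0.0, 0.0)]
    for t in sorted(thresholds, reverse=True):  # strictest threshold first
        sens, spec = sens_spec(cytology, biopsy_sil, t)
        points.append((1.0 - spec, sens))
    points.append((1.0, 1.0))
    return points


def trapezoid_auc(points):
    """Area under straight lines joining the ROC points."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df) and two-tailed P for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical readings by one observer and the matching biopsy results.
readings = [NEGATIVE, ASCUS, SIL, ASCUS, NEGATIVE, SIL, ASCUS, NEGATIVE]
biopsies = [False, True, True, False, False, True, True, False]
for name, t in [("negative/ASCUS", ASCUS), ("ASCUS/SIL", SIL)]:
    sens, spec = sens_spec(readings, biopsies, t)
    print(f"{name}: sensitivity {sens:.2f}, specificity {spec:.2f}")
print("AUC:", trapezoid_auc(roc_points(readings, biopsies, [ASCUS, SIL])))
# Hypothetical counts of SIL biopsies detected/missed by two observers.
print(chi_square_2x2(30, 10, 20, 20))
```

Lowering the threshold moves an observer along the same curve toward higher sensitivity and lower specificity, whereas a difference in accuracy appears as a difference in the area under the curve.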
whether a threshold with a higher sensitivity is more useful than one with a higher specificity generally depends on whether the test is being used as a screening or a diagnostic test. In addition, it is possible that an observer with a lower overall accuracy may be more diagnostically useful than an observer with a higher accuracy due to differences in diagnostic thresholds. For example, because of the uncertainties in patient management with a diagnosis of ASCUS, some clinicians may find an observer who is more willing to diagnose borderline specimens as either negative or SIL more useful than one who is more accurate but diagnoses many cases as ASCUS. Nevertheless, the authors believe selecting the most diagnostically useful criteria is much easier when the differences between diagnostic thresholds and accuracy are clearly defined. This allows one to determine the value of several diagnostic criteria more effectively and objectively and select those that are most appropriate.
Finally, although ROC curve analysis is clearly useful, the main disadvantage is that there must be a gold standard. Fortunately, biopsy provides a reasonable gold standard for most cytologic studies and often is available. Although biopsy has limitations related to both sampling and interpretation, especially in cervicovaginal smears,34–39 inherent errors using it as the gold standard should be random and thus should not affect comparisons of accuracy between observers examining large samples.
In conclusion, this study demonstrated that a complete understanding of the results of cytologic interpretation can come only from a comprehensive statistical analysis. Kappa analysis can determine the level of interobserver agreement, but distribution analysis allows one to determine in what way different observers disagree. Threshold analysis is independent of interobserver agreement, but provides only a limited measure of accuracy. Overall accuracy can best be determined using ROC analysis. Poor interobserver agreement can be the result of either differences in diagnostic thresholds or diagnostic accuracy. If one wishes to improve one’s performance, then it is important to know in what way it is deficient; otherwise, efforts may be misdirected. These different statistical methods can be used for quality improvement in cytologic interpretation, and provide an objective measurement in determining overall diagnostic usefulness.

REFERENCES
1. Davey DD, Naryshkin S, Nielsen ML, Kline TS. Atypical squamous cells of undetermined significance: interlaboratory comparison and quality assurance monitors. Diagn Cytopathol 1994;11:390–6.
2. Williams ML, Rimm DL, Pedigo MA, Frable WJ. Atypical squamous cells of undetermined significance: correlative histologic and follow-up studies from an academic center. Diagn Cytopathol 1997;16:1–7.
3. Howell LP, Davis RL. Follow-up of Papanicolaou smears diagnosed as atypical squamous cells of undetermined significance. Diagn Cytopathol 1996;14:20–4.
4. Kaye KS, Dhurandhar NR. Atypical cells of undetermined significance: follow-up biopsy and Pap smear findings. Am J Clin Pathol 1993;99:332.
5. Cocchi V, Carretti D, Fanti S, Baldazzi P, Casotti MT, Piazza R, et al. Intralaboratory quality assurance in cervical/vaginal cytology: evaluation of intercytologist diagnostic reproducibility. Diagn Cytopathol 1997;16:87–92.
6. Sidawy MK, Tabbara SO. Reactive change and atypical squamous cells of undetermined significance in Papanicolaou smears: a cytohistologic correlation. Diagn Cytopathol 1993;9:423–9.
7. Sherman ME, Schiffman MH, Lorincz AT, Manos MM, Scott DR, Kurman RJ, et al. Toward objective quality assurance in cervical cytopathology. Correlation of cytopathologic diagnoses with detection of high-risk human papillomavirus types. Am J Clin Pathol 1994;102:182–7.
8. Epstein JI, Grignon DJ, Humphrey PA, McNeal JE, Sesterhenn IA, Troncoso P, et al. Interobserver reproducibility in the diagnosis of prostatic intraepithelial neoplasia. Am J Surg Pathol 1995;19:873–86.
9. Allam CK, Bostwick DG, Hayes JA, Upton MP, Wade GG, Domanowski GF, et al. Interobserver variability in the diagnosis of high grade prostatic intraepithelial neoplasia and adenocarcinoma. Mod Pathol 1996;9:742–51.
10. Raab SS, Isacson C, Layfield LJ, Lenel JC, Slagel DD, Thomas PA. Atypical glandular cells of undetermined significance. Am J Clin Pathol 1995;104:574–82.
11. Beck JR, Shultz EK. The use of relative operating characteristic (ROC) curves in test performance evaluation. Arch Pathol Lab Med 1986;110:13–20.
12. Raab SS. Diagnostic accuracy in cytopathology. Diagn Cytopathol 1994;10:68–75.
13. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
14. McNeil BJ, Hanley JA. Statistical approaches to the analysis of receiver operating characteristic (ROC) curves. Med Decis Making 1984;4:137–50.
15. Hanley JA. Receiver operating characteristic (ROC) methodology: the state of the art. Crit Rev Diagn Imaging 1989;29:307–35.
16. Metz CE, Kronman HB. Statistical significance tests for binormal ROC curves. J Math Psych 1980;22:218–43.
17. Dorfman DD, Alf E. Maximum-likelihood estimation parameters of signal-detection theory and determination of confidence intervals-rating-method data. J Math Psych 1969;6:487–96.
18. Giard RWM, Hermans J. Interpretation of diagnostic cytology with likelihood ratios. Arch Pathol Lab Med 1990;114:852–4.
19. Raab SS, Slagel DD, Jensen CS, Teague MW, Savell VH, Ozkutlu D, et al. Transitional cell carcinoma: cytologic criteria to improve diagnostic accuracy. Mod Pathol 1996;9:225–31.
20. Cohen MB, Rodgers C, Hales MS, Gonzales JM, Ljung BME, Beckstead JH, et al. Influence of training and experience in fine needle aspiration biopsy of the breast. Arch Pathol Lab Med 1987;111:518–20.
21. Raab SS, Thomas PA, Lenel JC, Bottles K, Fitzsimmons KM, Zaleski MS, et al. Pathology and probability. Am J Clin Pathol 1995;103:588–93.
22. Bacus JW, Wiley EL, Galbraith W, Marshall PN, Wilbanks GD, Weinstein RS. Malignant cell detection and cervical cancer screening. Anal Quant Cytol Histol 1984;6:121–30.
23. Raab SS, Snider TE, Potts SA, McDaniel HL, Robinson RA, Nelson DL, et al. Atypical glandular cells of undetermined significance. Diagnostic accuracy and interobserver variability using select cytologic criteria. Am J Clin Pathol 1997;107:299–307.
24. Langley FA, Buckley CH, Tasker M. The use of ROC curves in histopathologic decision making. Anal Quant Cytol Histol 1985;7:167–73.
25. Hermann GA, Herrera N, Sugiura HT. Comparison of interlaboratory survey data in terms of receiver operating characteristic (ROC) indices. J Nucl Med 1982;23:525–31.
26. The Bethesda Committee. The Bethesda system for reporting cervical/vaginal cytologic diagnoses. Acta Cytol 1993;37:115–24.
27. Metz CE, Wang PL, Kronman HB. A new approach for testing the significance of differences between ROC curves measured from correlated data. In: DeConinck F, editor. Information processing in medical imaging. The Hague: Martinus Nijhoff, 1984:432–45.
28. Chhieng DC, Taylor J, Schmee J, McKenna BJ. Cytologic criteria for subclassification of ASCUS improve correlation with biopsy outcome [abstract]. Mod Pathol 1997;10:32A.
29. Ettler HC, Downing P, Wright VC, Joseph MG. Atypical squamous cells of undetermined significance: a cytohistologic study in a colposcopy unit [abstract]. Mod Pathol 1997;10:33A.
30. Flynn C, Pitman M. Cytohistological correlation of subclassified ASCUS Pap smears [abstract]. Mod Pathol 1997;10:33A.
31. Collins LC, Wang HH, Abu-Jawdeh GM. Qualifiers of atypical squamous cells of undetermined significance help in patient management. Mod Pathol 1996;9:677–81.
32. Kline MJ, Davey DD. Atypical squamous cells of undetermined significance qualified: a follow-up study. Diagn Cytopathol 1996;14:380–4.
33. Cramer SF. Interobserver variability in surgical pathology. In: Weinstein RS, editor. Advances in pathology and laboratory medicine. Vol. 9. St. Louis: C.V. Mosby, 1996:3–82.
34. Rohr R. Quality assurance in gynecologic cytology. Am J Clin Pathol 1990;94:754–8.
35. Dodd LG, Sneige N, Villarreal Y, Fanning CV, Staerkel GA, Caraway NP, et al. Quality-assurance study of simultaneously sampled, non-correlating cervical cytology and biopsies. Diagn Cytopathol 1993;9:138–44.
36. Cramer H, Schlenk E. An analysis of discrepancies between the cervical cytologic diagnosis and subsequent histopathologic diagnosis in 1260 cases [abstract]. Acta Cytol 1994;38:812.
37. Tritz DM, Weeks JA, Spires SE, Sattich M, Banks H, Cibull ML, et al. Etiologies for non-correlating cervical cytologies and biopsies. Am J Clin Pathol 1995;103:594–7.
38. Joste NE, Crum CP, Cibas ES. Cytologic/histologic correlation for quality control in cervicovaginal cytology. Am J Clin Pathol 1995;103:32–4.
39. Jones BA, Novis DA. Cervical biopsy-cytology correlation. Arch Pathol Lab Med 1996;120:523–31.