One hundred thirty-six independent raters, comprised of 91 attending orthopaedic surgeons and 45 orthopaedic residents at variable levels of training, completed open fracture classification forms after viewing the video presentations of the open fractures (Table 1). In some cases, not all raters completed classification forms for all 6 video presentations. Incomplete fracture classification forms were not included in this analysis.

The completed raw data forms were mailed to one of the investigators (J.A.), who compiled the data. Data for each classification category (skin injury, muscle injury, etc.) and for each severity subcategory (1, 2, 3) were reviewed and analyzed for interrater reliability and percent agreement. All analysis was done using SPSS v17 (Chicago, IL) or SAS v9 (Cary, NC). Interrater agreement for multiple raters was assessed using the Inter_Rater Macro for SAS8, which allows for kappa scores for multiple raters on a scale with multiple values. Kappa scores were interpreted similarly to other studies as follows: <0.20 = poor agreement, 0.20–0.40 = fair agreement, 0.40–0.60 = moderate agreement, 0.60–0.80 = good agreement, and 0.80–1.0 = excellent agreement.9 Interrater reliability (k) values were also calculated separately for the cohort of attending orthopaedic surgeons and the cohort of orthopaedic residents. The measures of reliability were used to determine the ability of the classification to differentiate between videos. The measure of percent agreement was used to determine the degree to which the different raters gave each variable within each video the same value.

As an assessment of the quality of the video presentations, participants were asked to grade the adequacy of the videos in 5 categories during one of the presentation sessions. The 5 categories were based upon image quality, the arrangement of videos within the presentation, the length of video shown, an impression of whether the video material represented the setting of the initial fracture debridement or seemed contrived, and whether adequate time was allowed for video viewing, interpretation, and fracture classification.
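To make the statistical procedure concrete, the following short Python sketch reproduces the two quantities reported in this study: a multi-rater (Fleiss) kappa and percent agreement. It is an illustration only, not the authors' SAS Inter_Rater Macro; the statsmodels functions are standard, the rating matrix is invented, and the percent-agreement formula shown (mean pairwise agreement) is one common definition that the text does not itself specify.

```python
# Illustration only: Fleiss' kappa and percent agreement for multiple
# raters on one classification category. The ratings below are invented.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Rows = the 6 video cases, columns = raters; entries = severity (1-3).
ratings = np.array([
    [1, 1, 1, 2, 1],
    [3, 3, 3, 3, 3],
    [2, 2, 1, 2, 2],
    [1, 1, 1, 1, 1],
    [2, 3, 2, 2, 3],
    [1, 1, 2, 1, 1],
])

# Convert subject-by-rater scores into subject-by-category counts.
counts, _ = aggregate_raters(ratings)
kappa = fleiss_kappa(counts, method="fleiss")

def interpret(k: float) -> str:
    """Interpretation bands quoted in the Methods (after Altman, ref 9)."""
    if k < 0.20:
        return "poor"
    if k < 0.40:
        return "fair"
    if k < 0.60:
        return "moderate"
    if k < 0.80:
        return "good"
    return "excellent"

# Percent agreement: mean fraction of rater pairs giving identical scores
# per case (one common definition; the paper does not give its formula).
n_raters = ratings.shape[1]
n_pairs = n_raters * (n_raters - 1) / 2
agreement = np.mean([
    sum(c * (c - 1) / 2 for c in np.bincount(row)) / n_pairs
    for row in ratings
])

print(f"kappa = {kappa:.2f} ({interpret(kappa)} agreement)")
print(f"percent agreement = {100 * agreement:.0f}%")
```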
RESULTS

Each category of the classification was assessed separately. The results shown in Table 2 demonstrate the system to have high reliability and much improvement compared with that found for the Gustilo–Anderson classification. Overall interrater reliability (k) values were highest for arterial injury, with near perfect agreement across all raters and within each value. Skin injury, bone loss, and contamination demonstrated moderate to substantial levels of agreement. Muscle injury had the most disagreement across raters but still demonstrated a fair level of interrater agreement, which is a level of agreement superior to the Gustilo–Anderson classification.

Levels of agreement were similar between attending surgeons and residents for all categories, although agreement was slightly higher for the resident cohort in every category except arterial injury (Table 3), where near perfect agreement was demonstrated for all reviewers.

The percent agreement across all raters within each classification category and severity subcategory for each video case is presented in Table 4. This represents the degree to which identical ratings were given for the categories for each video. The severity subcategory within the arterial injury category representing no arterial injury was the only subcategory to reach perfect agreement amongst all raters, and this was achieved for 3 of 6 videos. The classification categories of muscle injury and contamination consistently demonstrated the lowest percent agreement across almost all videos. There was a trend toward greater levels of agreement amongst raters for classification subcategories representing lesser severity injuries (mild and moderate) and somewhat less agreement for more severe injuries (moderate and severe).

Twenty attending orthopaedic surgeons assessed and rated the quality of the videos and the overall presentations of the cases. The results are presented in Supplemental Digital Content 2 (Appendix 2, http://links.lww.com/BOT/A78). Overall, video quality and arrangement within the presentations were felt to be adequate by most participants. There was a lack of participant agreement on the appropriateness of the amount of material presented and whether the experience of classifying open fracture severity outside the operating theater was contrived. This indicates that at least some raters perceived that it was difficult to assess and classify open fractures in this video presentation format.

TABLE 1. Sites of Open Fracture Video Presentations and Experience Levels of Raters

Site                        Attending Surgeons    Residents
SEFS                        22                    3
Emory University            —                     15
SWOTA                       15                    —
University of Pittsburgh    —                     27
AO meeting                  54                    —

SEFS, Southeastern Fracture Symposium; SWOTA, Southwest Orthopedic Trauma Association.
TABLE 2. Interrater Reliability for the 5 Categories and 3 Levels of Severity (Kappa Values)

Score  Definition                                                                Kappa (k) Value

Skin Injury
1      Can be approximated                                                       0.79
2      Cannot be approximated                                                    0.01
3      Extensive degloving                                                       0.79
       Overall                                                                   0.69

Muscle Injury
1      No muscle in area, no appreciable muscle necrosis, some muscle injury
       with intact muscle function                                               0.51
2      Loss of muscle but the muscle remains functional, some localized
       necrosis in the zone of injury that requires excision, intact
       muscle–tendon unit                                                        0.21
3      Dead muscle, loss of muscle function, partial or complete compartment
       excision, complete disruption of a muscle–tendon unit, muscle defect
       does not approximate                                                      0.52
       Overall                                                                   0.40

Arterial Injury
1      No injury                                                                 0.9
2      Artery injury without ischemia                                            0.9
3      Artery injury with distal ischemia                                        0.0
       Overall                                                                   0.9

Bone Loss
1      None                                                                      0.70
2      Bone missing or devascularized but still some contact between
       proximal and distal fragments                                             0.65
3      Segmental bone loss                                                       0.00
       Overall                                                                   0.65

Contamination
1      None or minimal contamination                                             0.52
2      Surface contamination (easily removed, not embedded in bone or deep
       soft tissues)                                                             0.16
3      Contaminant embedded in bone or deep soft tissues, or high-risk
       environmental conditions (barnyard, fecal, dirty water, etc.)             0.69
       Overall                                                                   0.48
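Table 2 reports a kappa value for each individual severity level as well as an overall value per category. One standard way to obtain such per-level values is sketched below: dichotomize the ratings (this level versus all others) and compute Fleiss' kappa on the binary data. Whether the authors' SAS macro uses exactly this construction is not stated in the text, so this is an assumption.

```python
# Assumption: per-level kappas via level-vs-rest dichotomization; the
# paper's SAS Inter_Rater Macro may use a different construction.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def per_level_kappas(ratings: np.ndarray, levels=(1, 2, 3)) -> dict:
    """ratings: cases x raters array of severity scores (1-3)."""
    kappas = {}
    for level in levels:
        binary = (ratings == level).astype(int)  # this level vs. rest
        counts, _ = aggregate_raters(binary, n_cat=2)
        # Degenerate case: if every rater gives the same answer on every
        # case for this level, chance agreement is 1 and kappa is NaN.
        kappas[level] = fleiss_kappa(counts, method="fleiss")
    return kappas

# Invented example: 3 cases rated by 4 raters.
ratings = np.array([[1, 1, 2, 1],
                    [3, 3, 3, 2],
                    [2, 2, 2, 2]])
print(per_level_kappas(ratings))
```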
DISCUSSION

This study demonstrated that the new OTA Open Fracture Classification has good reliability of classification amongst a wide variety of orthopaedic surgeons. These data support continued development and utilization of the OTA Open Fracture Classification.

The strength of this study is the broad, multicentered, international cohort of orthopaedic surgeons and residents who participated as raters. This was accomplished by using videos as the mechanism to most closely replicate the experience of actually performing or observing the initial open fracture debridement. Reliability was assessed by kappa scores to determine the degree to which observers using the open fracture classification agreed on how to differentiate amongst various injuries. Percent agreement was assessed to determine the degree to which the scores in each category given by the raters were identical. These 2 analyses combined provide information on the overall quality of the measurement tool being evaluated: its ability to discriminate or identify different levels of severity, and the ability of different users of the instrument to choose the same value when assessing the same injury.

When the 6 open fracture cases were presented, the OTA Open Fracture Classification demonstrated moderate to excellent agreement between the raters. This level of agreement was better than that found by Brumback et al5 for the Gustilo–Anderson classification using a similar video format. The average interobserver agreement seen for the Gustilo–Anderson classification was 60%, whereas all categories of the OTA Open Fracture Classification in the current study demonstrated an average of 86% overall agreement (between 52% and 100% agreement for individual categories).

This study also separately assessed the reliability of each of the 5 categories. The arterial injury category had excellent agreement among the raters for all 6 video presentations (k = 0.9), which would be expected as the information needed to assign the correct category was given in the video. The only disagreement seemed to be error in recording results on the part of the rater. The raters also had a good level of agreement for skin injury and bone loss (k = 0.69 and 0.65, respectively). The most disagreement was identified for muscle injury and contamination, which had moderate agreement (k = 0.40 and 0.48, respectively). Muscle injury and contamination are the areas where category definitions are the most subjective and variable. The committee members have reassessed category definitions after each stage of validating the classification to make them as clear as possible. Based on the results of this study, the nomenclature contained within the muscle injury and contamination categories will be re-evaluated. However, there may be limits to the degree of agreement possible for some of these categories regardless of category wording. This is particularly the case when employing the current study design of video assessment of injury. Assessment of muscle injury and contamination might be expected to be better and more reliable with direct observation of the wound.

The level of agreement between residents was similar to the agreement between attending surgeons. This finding was unexpected because more experience should lead to greater interrater reliability for a classification scheme. This supports the appropriateness of the wording of each category. However, this classification was new to all of the observers, minimizing the effect of previous experience. In addition, more experienced surgeons may have had preconceived ideas that made it difficult for them to reliably use these new definitions. They may have substituted their own opinion of the importance of various open fracture characteristics for the ones identified by the classification itself when assigning cases to categories. This level of agreement between residents and attending surgeons shows that this system may be readily applied in teaching hospitals and trauma centers.
In vivo reliability testing for this type of classification system is probably the best scenario. However, having 5 separate categories to score may be perceived as too many to gain widespread clinical use. Yet any scoring method that summarizes all 5 categories of injury into a single number or grouping will potentially lose the specificity of injury severity inherent in independent assessment of each of the 5 categories. It is hoped that further research using the OTA Open Fracture Classification will allow recognition of common important patterns for the classification of severe injuries (subclassification of Gustilo–Anderson 3B) in a clinically relevant manner. Furthermore, grouping may lessen the effect of one category over another. However, broad appeal of the classification for routine nonresearch use may be facilitated by a summary score or by grouping of categories into a summary with 3 (or another small number of) categories. One potential summary score is to use the most severe score in any category to describe the overall injury. Therefore, an injury with skin severity 2, muscle 2, bone loss 1, contamination 1, and arterial injury 1 would be classified as a 2 or "moderate" because the highest severity of any category is 2. The OTA classification and outcomes committee continues to evaluate the potential of these and other options.
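This "highest severity in any category" rule reduces to a one-line maximum. The sketch below is illustrative only; the function name and dictionary keys are hypothetical and are not part of the published classification.

```python
# Sketch of the summary rule discussed above: overall severity is the
# most severe score in any category. Names here are illustrative only.
def summary_score(category_scores: dict) -> int:
    """Return the overall injury severity (1-3) as the maximum score."""
    return max(category_scores.values())

# Worked example from the text: skin 2, muscle 2, bone loss 1,
# contamination 1, arterial injury 1 -> overall 2 ("moderate").
injury = {"skin": 2, "muscle": 2, "bone_loss": 1,
          "contamination": 1, "arterial": 1}
assert summary_score(injury) == 2
```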
In conclusion, in this study, which included a diverse multicenter, multinational cohort of orthopaedic surgeons and residents, the OTA Open Fracture Classification demonstrated moderate to excellent interobserver reliability. This is despite the fact that the new classification provides more detailed injury severity discrimination, particularly in the more severe injuries. The classification nomenclature, which is injury based and not treatment based, requires further scrutiny, particularly in the muscle injury and contamination categories in which interobserver agreement was the weakest. In addition to reliability testing, the degree to which patient outcomes are stratified between the categories of the classification is a current research interest.

REFERENCES

1. Dellinger EP, Miller SD, Wertz MJ, et al. Risk of infection after open fracture of the arm or leg. Arch Surg. 1988;123:1320–1327.
2. Gustilo RB. Management of open fractures. An analysis of 673 cases. Minn Med. 1971;54:185–189.
3. Gustilo RB, Anderson JT. Prevention of infection in the treatment of one thousand and twenty-five open fractures of long bones: retrospective and prospective analyses. J Bone Joint Surg Am. 1976;58:453–458.
4. Gustilo RB, Simpson L, Nixon R, et al. Analysis of 511 open fractures. Clin Orthop Relat Res. 1969;66:148–154.
5. Brumback RJ, Jones AL. Interobserver agreement in the classification of open fractures of the tibia. The results of a survey of two hundred and forty-five orthopaedic surgeons. J Bone Joint Surg Am. 1994;76:1162–1166.
6. Evans AR, Agel J, DeSilva GL, et al; Orthopaedic Trauma Association: Open Fracture Study Group. A new classification scheme for open fractures. J Orthop Trauma. 2010;24:457–465.
7. Kottner J, Audige L, Brorson S, et al. Guidelines for reporting reliability and agreement studies (GRRAS) were proposed. J Clin Epidemiol. 2011;64:96–106.
8. Gwet K. Inter-rater reliability: dependency on trait prevalence and marginal homogeneity. Statistical Methods for Inter-rater Reliability Assessment. 2002;2:1–9.
9. Altman DG. Practical Statistics for Medical Research. London, United Kingdom: Chapman and Hall/CRC; 1991.
Invited Commentary
I read this manuscript with interest, as I believe that advancement in the classification of open fractures is necessary to be able to
compare the results of open fracture care. One unifying problem with the scientific study of open fractures remains the difficulty in creating cohort groups, given the many variables of injury inherent in open fractures. Without differing trauma centers being able to reproducibly group like fractures together, we are at a loss to discover which open fractures need certain treatments and which others are better served by alternative methods. And, while advancements have certainly been made, the level of fracture care still seems to depend on the judgment and experience of the individual treating orthopaedic surgeon, attempting to match the best type and timing of care to each open fracture.
I have long believed that if you hypothetically handed each member of the OTA the exact same open fracture of both
bones of the forearm, the postoperative radiographs would look remarkably similar, but the amount of surgical debridement
performed for that injury would differ markedly. Only when we can accurately classify open fractures can we subdivide injuries
into like groups and, perhaps, learn to optimally treat these subdivisions differently. A reproducible classification of open
fractures should logically lead to a reproducible measure of the accuracy and effectiveness of debridement, which, I believe, has
a direct effect on infectious outcomes. So I envision standardization and acceptance of the classification of open fractures as
a necessary first step to the standardization of open fracture treatment and results.
All good science raises further questions, and that remains true here. There is still unimpressive reliability and agreement in
2 key areas of injury, muscle damage and contamination. Certainly, these are 2 important factors in analyzing outcomes of open
fractures. The authors recognized this and are to be commended for honestly concluding that these areas need further
adjustments of their tested criteria so that greater agreement can occur. It is apparent that certain injuries have significant
agreement in all areas, whereas others seem less so.