Assessment of automatic cephalometric landmark identification using artificial intelligence

DOI: 10.1111/ocr.12542

ORIGINAL ARTICLE

1 Department of Orthodontics, College of Dentistry, University of Illinois, Chicago, Illinois, USA
2 Department of Pediatric Dentistry, University of Illinois, Chicago, Illinois, USA

Correspondence
Galina Bulatova, Department of Orthodontics, College of Dentistry, University of Illinois, Chicago, IL, USA.
Email: [email protected]

Abstract

Objective: To compare the accuracy of cephalometric landmark identification between the artificial intelligence (AI) deep learning convolutional neural network (CNN) You Only Look Once, Version 3 (YOLOv3) algorithm and a manually traced (MT) group.

Setting and sample population: The American Association of Orthodontists Foundation (AAOF) Legacy Denver collection was used to obtain 110 cephalometric images for this study.

Materials and Methods: Lateral cephalograms were digitized and traced by a calibrated senior orthodontic resident using Dolphin Imaging. The same images were uploaded to the AI software Ceppro (DDH Inc). A Cartesian coordinate system with Sella as the reference landmark was used to extract x- and y-coordinates for 16 cephalometric points: Nasion (Na), A point, B point, Menton (Me), Gonion (Go), Upper incisor tip, Lower incisor tip, Upper incisor apex, Lower incisor apex, Anterior Nasal Spine (ANS), Posterior Nasal Spine (PNS), Pogonion (Pg), Pterygomaxillary fissure point (Pt), Basion (Ba), Articulare (Art) and Orbitale (Or). The mean distances were assessed relative to the reference value of 2 mm. Student paired t-tests at a significance level of P < .05 were used to compare the mean differences in each of the x- and y-components. SPSS (IBM, v. 27.0) software was used for the data analysis.

Results: There was no statistical difference for 12 out of 16 points when analysing absolute differences between the MT and AI groups.

Conclusion: AI may increase efficiency without compromising the accuracy of cephalometric tracings in routine clinical practice and in research settings.
© 2021 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd

1 | INTRODUCTION

Despite advances in orthodontic technology, including innovations in imaging systems and software, the approaches used in diagnosis and treatment planning have not experienced similar advances during the past century.1,2 For example, most clinicians use cephalometrics for orthodontic diagnosis and treatment planning. In 2002, 90% of orthodontists in the United States routinely obtained cephalograms.3 However, many of those who take these radiographs routinely do not trace them to obtain the measurements; they simply take them to meet the expectations for the standard of care.4-6 Several systematic reviews and prospective studies have argued that cephalograms are not routinely needed for orthodontic treatment and have no significant impact on treatment planning decisions, particularly in the absence of skeletal concerns.4-7 These studies also stated that tracing lateral cephalograms is time-consuming and that they are only taken for other uses, such as medico-legal reasons, for a teaching environment or because of the practitioner's lack of experience.4-6 When using cephalometric imaging software (such as Dolphin Imaging, QuickCeph, etc), an experienced clinician spends on average 7–15 min to manually place landmarks in each image, which makes the procedure time-consuming and inherently subject to human error.7,8

A large amount of longitudinal data has been collected by multiple institutions around the world. Unfortunately, most of these available data are still in analogue format or are digital scans derived from an analogue format. Many institutions have significant amounts of big data digitized by multiple operators over many years, yielding inconsistency and inaccuracies in digitized cephalogram points. The rapid development of AI in recent years has become important to many aspects of daily life, including the analysis of extensive available data sets. In orthodontic diagnosis and treatment planning, this AI technology could not only deliver an efficient and precise tool for the practising clinician, but would also be capable of significantly increasing the amount of accurately labelled data available, such as the records found in the AAOF Legacy collection.

Despite readily available studies demonstrating different processes to auto-detect craniofacial landmarks, most clinicians use approaches based on image-processing techniques where images of cephalometric radiographs require intense human preparation, such as re-scaling, calibration and labelling. Calibration and other image preparations are time-consuming and, if not done properly, often generate landmark outliers,9-11 as they strongly rely on the quality and size of the cephalometric radiographic images. Many other studies propose different novel frameworks for landmark detection in cephalometric radiographs and demonstrate results with an accuracy of 72%, but again, these are not fully automated procedures.12,13 Current advances in this technology have, in turn, provided hardware and software development that is sufficiently robust to support the large computational requirements of complex AI algorithms and their applications to machine learning. Applications of a variety of deep learning architectures, such as deep neural networks, convolutional deep neural networks, deep belief networks and recurrent neural networks, to the creation of algorithms in important fields such as natural language processing, computer vision, speech recognition and bioinformatics have resulted in efficient and accurate automation of many pragmatic tasks.14-16 However, due to the complexity of this problem, the developed methods were unable to compete with manual landmark identification. In recent years, several Institute of Electrical and Electronics Engineers (IEEE) International Symposium on Biomedical Imaging (ISBI) Grand Challenges were organized on this topic to encourage the development of better algorithms. The results were described as providing a benchmark for any future development.17

The aim of the current study was to establish the accuracy of landmark detection using a fully automated AI-based machine learning and convolutional neural network by DDH Inc and to compare its reliability to that of well-calibrated, experienced orthodontists conducting manual visual inspection. If the correlation is high, then AI may prove to be an accurate and efficient mechanism for completing cephalogram tracings.

2 | MATERIALS AND METHODS

Since the data are publicly available on the web from the AAOF Legacy Denver collection,18 an Institutional Review Board (IRB) exception was applied for this project (IRB: 2019-1180).

The criteria for inclusion and exclusion in this study were as follows. Inclusion criteria were: digital lateral radiographs that have a ruler for calibration or fiducial points on the computer imaging system; and digital lateral radiographs from the same subjects taken at least 3 years apart. Digital lateral radiographs of poor quality that did not allow manual tracing were excluded. After reviewing all records on the AAOF Legacy Denver collection website, 110 subjects remained. Based on previous studies and calculations including power tables, 110 subjects provided a sufficient sample size to detect a moderate difference between the two methods. Out of the 110 subjects, 25 had a 30-mm ruler on their images, and the remaining 85 subjects had a 150-mm distance between two fiducial points (upper right and lower right) for calibration purposes.

Before digitizing the images, intra- and inter-rater reliability checks were performed. For intra-rater reliability verification, 20 lateral cephalometric images were traced twice by the first investigator. For inter-rater reliability, another trained investigator traced the same images to be compared with the first investigator's tracing for each of the 20 subjects. A kappa coefficient of 0.8 was used to demonstrate sufficient calibration. The landmarks that were evaluated and further used in this study included: Nasion (Na), A point (A), B point (B), Menton (Me), Gonion (Go), Upper incisor tip, Lower incisor tip, Upper incisor apex, Lower incisor apex, Anterior Nasal Spine (ANS), Posterior Nasal Spine (PNS), Pogonion (Pg), Pterygomaxillary fissure point (Pt), Basion (Ba), Articulare (Art) and Orbitale (Or) (Table 1).

2.1 | Manually detected landmark group

Each image was uploaded to Dolphin Imaging without changing the size or orientation (Figure 1). Using the 110 unmarked images, the first investigator visually identified and traced landmarks using
the Dolphin Imaging System software with an American Board of Orthodontics cephalometric analysis sequence. After the 110 radiographs were traced, coordinates in absolute values in millimetres (mm), with Sella as the 0:0 point in the Cartesian system, were extracted for each cephalometric image, copied to Excel and saved as the Manually Detected Landmark group.

2.2 | Auto-detected landmark group

Before uploading the radiographs to AI, several preparatory issues had to be resolved. Initially, the coordinate system in the AI program (Ceppro software, DDH Inc) was set with the lower left corner as the 0:0 point. To make the comparison with the Manually Digitized group possible, a software engineer from the DDH company changed the coordinate system into the Cartesian system with Sella as the 0:0 point, allowing comparison between the two groups. A second correction was necessary because the AI software calculated the coordinates in pixels. To transform the pixels into mm, the engineer placed a virtual ruler into the AI software, thus allowing determination of the mm-pixel ratio (how many pixels in 1 mm) and, therefore, presentation of the extracted coordinates in mm.

The radiographs were uploaded by the DDH company software engineer to the AI engine with no further labelling and/or changes, and the landmarks were auto-identified. In the group of radiographs with the 30-mm ruler, the beginning and end of the 30-mm virtual ruler in AI was manually aligned with the ruler on the radiographs to allow comparison (Figure 2). In the group of radiographs with a 150-mm distance between two fiducial points, the image aspect ratio was adjusted to allow comparison between the two groups. Coordinates with Sella as the 0:0 point in the Cartesian system were exported for each cephalometric image and saved in an Excel file as Auto-detected landmarks (ADLs).

2.3 | Statistical analysis

Student paired t-tests at a significance level of P < .05 were used to compare the mean differences for each pair of measurement techniques (MT vs AI) in both the x- and y-components for each landmark. SPSS (IBM, v. 27.0) software was used for the data analysis.
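The coordinate preparation (Section 2.2) and the per-landmark comparison (Section 2.3) can be sketched in a short script. This is an illustrative sketch only, not the authors' actual pipeline: the array layout, ruler endpoints and function names are assumptions, while the Sella origin, pixel-to-mm conversion, paired t-tests and the 2 mm reference follow the text above.

```python
# Illustrative sketch of the landmark-comparison workflow described above.
# Array shapes, ruler endpoints and names are assumptions, not the authors' code.
import numpy as np
from scipy import stats

LANDMARKS = ["Na", "A", "B", "Me", "Go", "U1 tip", "L1 tip", "U1 apex",
             "L1 apex", "ANS", "PNS", "Pg", "Pt", "Ba", "Art", "Or"]

def pixels_to_mm(coords_px, ruler_start_px, ruler_end_px, ruler_mm=30.0):
    """Convert pixel coordinates to mm using the digital ruler's known length."""
    px_per_mm = np.linalg.norm(
        np.asarray(ruler_end_px, dtype=float) - np.asarray(ruler_start_px, dtype=float)
    ) / ruler_mm
    return np.asarray(coords_px, dtype=float) / px_per_mm

def reorigin_at_sella(coords_mm, sella_mm):
    """Express landmark coordinates relative to Sella as the 0:0 point."""
    return np.asarray(coords_mm, dtype=float) - np.asarray(sella_mm, dtype=float)

def compare_groups(mt, ai):
    """Paired t-tests on the x- and y-components for each landmark.

    mt, ai: arrays of shape (n_images, n_landmarks, 2), in mm, Sella-origin.
    Returns {landmark: (p_x, p_y, mean_abs_dx, mean_abs_dy)}.
    """
    out = {}
    for j, name in enumerate(LANDMARKS[: mt.shape[1]]):
        _, p_x = stats.ttest_rel(mt[:, j, 0], ai[:, j, 0])
        _, p_y = stats.ttest_rel(mt[:, j, 1], ai[:, j, 1])
        out[name] = (p_x, p_y,
                     float(np.mean(np.abs(mt[:, j, 0] - ai[:, j, 0]))),
                     float(np.mean(np.abs(mt[:, j, 1] - ai[:, j, 1]))))
    return out

def success_detection_rate(mt, ai, threshold_mm=2.0):
    """Share of placements whose Euclidean MT-AI distance is within the threshold."""
    dist = np.linalg.norm(mt - ai, axis=2)  # shape (n_images, n_landmarks)
    return float(np.mean(dist <= threshold_mm))
```

With real data, `mt` and `ai` would each hold the 110 images' 16 landmark coordinates; `compare_groups` then mirrors the SPSS paired t-tests, and `success_detection_rate` reproduces the within-2-mm criterion.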
FIGURE 1 Sample lateral cephalometric radiograph from manual tracing (MT) group with digital landmarks in Dolphin Imaging [Colour figure can be viewed at wileyonlinelibrary.com]

FIGURE 2 Sample lateral cephalometric radiograph with a 30-mm ruler uploaded to CEPPRO software [Colour figure can be viewed at wileyonlinelibrary.com]
3 | RESULTS

The inter- and intra-class correlation coefficients were 0.80, which indicated 'good' reliability. Points showing no significant difference included Pogonion (X,Y), PT (X,Y), Articulare (X,Y) and Orbitale (Y) (Table 2). Therefore, the success detection rate for AI within 2 mm ± 0.13 mm equals three-fourths of the selected points (Figure 3). Points U1 apex (X,Y), L1 apex (X,Y), Basion (X,Y), Gonion (X,Y) and Orbitale (X) showed significant differences (>2 mm).

TABLE 2

Points with no significant difference (mm):

Landmark     Vx mean   Vx SD    Vy mean   Vy SD
L tip        1.7       2.5      2.1       3.8
ANS          1.7       1.7      2.3       3.3
PNS          2.1       1.9      2.1       1.7
Pogonion     1.9       3.6      2.5       4.1
PT           1.7       1.6      2.1       1.6
Articulare   2         2.5      2         1.7
Orbitale     —         —        2.4       2.4
Mean         2.0       3.0      2.1       3.0

Points with significant difference (P < .05):

Landmark     Vx mean   Vx SD    Vy mean   Vy SD
U apex       3         2        3.4       3.6
L apex       2.6       3        5.4       2.9
Basion       3.1       2.4      4.3       3
Orbitale     3         2.3      —         —
Gonion       8.7       9        4.9       4.7
Mean         4.1       3.7      4.5       3.6

4 | DISCUSSION

A possible explanation for the differences in detecting the root apices lies in the fact that some lateral cephalometric images belonged to growing children, in whom the upper central incisor apices may not have been completely formed. Considering that both the AI and the Dolphin Imaging software have templates for the upper and lower incisors, open root apices could be identified according to the template rather than their real position. The bigger difference in the vertical direction supports this assumption.
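The 'three-fourths' figure can be cross-checked from the counts in the Results: of the 16 compared points, four (U1 apex, L1 apex, Basion and Gonion) differed significantly in both coordinates, while Orbitale exceeded 2 mm in its x-component only and is evidently not counted against the rate. A minimal arithmetic check, with the point names transcribed from the Results:

```python
# Cross-check of the reported success detection rate (2 mm reference).
# The four points below are those the Results flag as significantly
# different in both x and y; Orbitale (x only) is not counted.
ALL_POINTS = 16
significant_points = ["U1 apex", "L1 apex", "Basion", "Gonion"]

success_rate = (ALL_POINTS - len(significant_points)) / ALL_POINTS
print(f"{success_rate:.0%}")  # 75%, i.e. 12 of 16 points within 2 mm
```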
As for Gonion, confusion can come from the fact that this point is
usually an average between the two mandibular angle contours. Basion and Orbitale are generally considered hard to detect and unreliable points in cephalometric analysis.

The results of our study are consistent with the existing literature in terms of AI accuracy when identifying cephalometric landmarks within 2 mm. Our results suggest that AI detects landmarks with reasonable accuracy and precision for most points. In previous studies, the accuracy of AI determined with this method varied from 1.5 to 1.7 mm.7 Park et al evaluated 283 lateral cephalometric images with 19 anthropometric points, and the YOLOv3 algorithm demonstrated accuracy within 2 mm in 80.4% of cases and within 3 mm in 92.0% of cases.19 Similar results were found by Hwang et al, who reported that the mean error between AI and human examiners while tracing 283 images and 80 points was 1.46 ± 2.97 mm.20 In our study, the success detection rate for AI when comparing the MT and AI groups was 75% within 2 mm and 93% within 4 mm, which is considered clinically acceptable.

There were likely two main sources of inaccuracy in the performance of AI in this study. First, AI operates based on the collection of images on which it was trained. CEPPRO software was trained on 15 000 digital contemporary lateral cephalometric images at 1:1 scale obtained from one cephalometric machine at the Seoul National University Dental Hospital (Seoul, Korea). Therefore, using analogue-scanned images with non-uniform image quality, an improper photo angle, non-skeletal objects (metal ear rods from the cephalostat machine) and unnecessary markings (fiducial points) can affect AI accuracy, since the AI was not initially trained to recognize such objects on the image. In our study, the AI produced unintended detection results and marked such data with 'ERROR' in 38 images out of 110. We still used the coordinates from the images marked 'ERROR', because on visual inspection the inaccuracies concerned soft tissue points rather than skeletal points.

A second factor that influences AI inaccuracy is operator error when calibrating images in the AI software. To extract the coordinates of the AI-placed cephalometric points, the images must be calibrated in the CEPPRO software by a human operator. The operator places a digital ruler, showing the AI how many pixels are in 1 mm. The operator can also introduce a mistake while stretching
the ruler, and as a result change the coordinates of the landmarks. Even the smallest mistake in placing the digital ruler alters the number of pixels in 1 mm for the computer and can influence the coordinates of all points.

5 | CONCLUSIONS

AI is a promising tool to facilitate the cephalometric tracing process in routine clinical practice and to analyse big databases for research purposes, making these tasks more affordable. However, the operator should account for the presence of different artefacts, such as cephalostat or chin-cup contours, which can affect the AI results. AI can analyse only what it was trained to detect. Therefore, either no new elements should be introduced to the machine, to avoid mistakes, or the machine should be trained from the beginning to recognize these elements on lateral cephalometric images.

ACKNOWLEDGEMENTS
We thank Mr Thomas Kim, program engineer from DDH Inc, for incorporating many additional features in the software to help us conduct this study.

CONFLICT OF INTEREST
The authors declare that there is no conflict of interest.

AUTHOR CONTRIBUTION
Galina Bulatova was the principal investigator, conducting the research, performing data collection and the software experiment, and writing the initial manuscript. Flavio Jose Castelli Sanchez was responsible for the research idea and design, and the development of the research methodology. Budi Kusnoto supervised the research execution and the writing and editing of the manuscript. Viana Grace provided statistical analysis and data presentation. T. Peter Tsay did critical review and revision. David M. Avenetti did critical review and editing of the final manuscript.

DATA AVAILABILITY STATEMENT
The data used in this research were downloaded from the open-access, internet-based American Association of Orthodontists Foundation Craniofacial Growth Legacy Collection.

ORCID
Galina Bulatova https://orcid.org/0000-0002-2936-5869

REFERENCES
1. Proffit W, Fields H. Contemporary Orthodontics. Mosby, Inc.; 2000.
2. Graber T, Vanarsdall R. Orthodontics: Current Principles and Techniques. Mosby, Inc.; 2000.
3. Keim RG, Gottlieb EL, Nelson AH, Vogels DS III. Study of orthodontic diagnosis and treatment procedures. Part 1. Results and trends. J Clin Orthod. 2002;36:553-568.
4. Rischen RJ, Breuning KH, Bronkhorst EM, Kuijpers-Jagtman AM. Records needed for orthodontic diagnosis and treatment planning: a systematic review. PLoS One. 2013;8(11):e74186.
5. Durão AR, Pittayapat P, Rockenbach MI, et al. Validity of 2D lateral cephalometry in orthodontics: a systematic review. Prog Orthod. 2013;20(14):31.
6. Manosudprasit A, Haghi A, Allareddy V, Masoud M. Diagnosis and treatment planning of orthodontic patients with 3-dimensional dentofacial records. Am J Orthod Dentofac Orthop. 2017;151:1083-1091. doi:10.1016/j.ajodo.2016.10.037
7. Wang C-W, Huang C-T, Hsieh M-C, et al. Evaluation and comparison of anatomical landmark detection methods for cephalometric X-ray images. IEEE Trans Med Imaging. 2015;34(9):1890-1900.
8. El-Fegh I, Galhood M, Sid-Ahmed M, Ahmadi M. Automated 2-D cephalometric analysis of X-ray by image registration approach based on least square approximator. In: 30th International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS 2008). 2008:3949-3952.
9. Cardillo J, Sid-Ahmed MA. An image processing system for locating craniofacial landmarks. IEEE Trans Med Imaging. 1994;13:275-289.
10. Forsyth DB, Davis DN. Assessment of an automated cephalometric analysis system. Eur J Orthod. 1996;18:471-478.
11. Lévy-Mandel AD, Venetsanopoulos AN, Tsotsos K. Knowledge-based landmarking of cephalograms. Comput Biomed Res. 1986;19:282-309.
12. Ibragimov B, Likar B, Pernuš F, Vrtovec T. Automatic cephalometric x-ray landmark detection by applying game theory and random forests. In: Proc. ISBI Int. Symp. on Biomedical Imaging. Springer-Verlag Berlin Heidelberg; 2014.
13. Chakrabartty S, Yagi M, Shibata T, Cauwenberghs G. Robust cephalometric landmark identification using support vector machines. ICASSP; 2003.
14. Collobert R, Weston J. A unified architecture for natural language processing: deep neural networks with multitask learning. In: Proceedings of the 25th International Conference on Machine Learning. ACM; 2008:160-167.
15. Hinton G, Deng L, Yu D, et al. Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process Mag. 2012;29(6):82-97.
16. Alipanahi B, Delong A, Weirauch MT, Frey BJ. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotech. 2015;33(8):831-838.
17. Wang C-W, Huang C-T, Lee J-H, et al. A benchmark for comparison of dental radiography analysis algorithms. Med Image Anal. 2016;31:63-76.
18. American Association of Orthodontists Foundation, Inc. AAOF Legacy Collection.org [Internet]. American Association of Orthodontists Foundation, Inc; 2013.
19. Park J-H, Hwang H-W, Moon J-H, et al. Automated identification of cephalometric landmarks: Part 1 - Comparisons between the latest deep-learning methods YOLOV3 and SSD. Angle Orthod. 2019;89(6):903-909.
20. Hwang HW, Park JH, Moon JH, et al. Automated identification of cephalometric landmarks: Part 2 - Might it be better than human? Angle Orthod. 2020;90(1):69-76.

Tsay TP, Avenetti DM, Sanchez FJC. Assessment of automatic cephalometric landmark identification using artificial intelligence. Orthod Craniofac Res. 2021;24(Suppl. 2):37–42. https://doi.org/10.1111/ocr.12542