
EClinicalMedicine 27 (2020) 100558

Contents lists available at ScienceDirect

EClinicalMedicine
journal homepage: https://www.journals.elsevier.com/eclinicalmedicine

Research Paper

A deep learning algorithm for detection of oral cavity squamous cell carcinoma from photographic images: A retrospective study
Qiuyun Fua,1, Yehansen Chene,1, Zhihang Lie,1, Qianyan Jinge,1, Chuanyu Huf,1, Han Liue,
Jiahao Baoe, Yuming Honge, Ting Shig, Kaixiong Lia, Haixiao Zouh, Yong Songi, Hengkun Wangj,
Xiqian Wangk, Yufan Wangl, Jianying Lium, Hui Liun, Sulin Cheno, Ruibin Chenp, Man Zhangd,
Jingjing Zhaoq, Junbo Xiangc, Bing Liua, Jun Jiaa, Hanjiang Wur, Yifang Zhaoa, Lin Wane,**,
Xuepeng Xionga,b,*
a Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China
b The State Key Laboratory Breeding Base of Basic Science of Stomatology (Hubei-MOST) and Key Laboratory of Oral Biomedicine Ministry of Education, Wuhan University, Wuhan, Hubei 430079, China
c Department of Periodontology, School and Hospital of Stomatology, Wuhan University, Wuhan, China
d Department of Orthodontics, Hubei-MOST KLOS and KLOBM, School and Hospital of Stomatology, Wuhan University, Wuhan, China
e School of Geography and Information Engineering, China University of Geosciences, Wuhan, China
f Center of Stomatology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China
g School of Information Engineering, Wuhan Huaxia University of Technology, Wuhan, China
h Department of Stomatology, the Second Affiliated Hospital of Nanchang University, Nanchang, China
i Department of Stomatology, Liuzhou People's Hospital, Liuzhou, China
j Department of Stomatology, Weihai Municipal Hospital, Weihai, China
k Oral Medical Center, Henan Provincial People's Hospital, School of Clinical Medicine, Henan University, Zhengzhou, China
l Department of Oral and Maxillofacial Surgery, Peking University Shenzhen Hospital, Shenzhen, China
m Department of Stomatology, the People's Hospital of Zhengzhou, Zhengzhou, China
n Department of Oral and Maxillofacial Surgery, Shanghai Stomatological Hospital, Fudan University, Shanghai, China
o Department of Oral Implantology, School and Hospital of Stomatology, Fujian Medical University, Fuzhou, China
p Department of Oral Mucosal Diseases, Xiamen Key Laboratory of Stomatological Disease Diagnosis and Treatment, Stomatological Hospital of Xiamen Medical College, Xiamen, China
q Department of Oral Surgery, Jingmen No.2 People's Hospital, Jingmen, China
r Department of Oral and Maxillofacial Surgery, The Second Xiangya Hospital, Central South University, Changsha, China

A R T I C L E  I N F O

Article History:
Received 19 June 2020
Revised 5 September 2020
Accepted 9 September 2020
Available online 23 September 2020

A B S T R A C T

Background: The overall prognosis of oral cancer remains poor because over half of patients are diagnosed at advanced stages. Previously reported screening and earlier detection methods for oral cancer still largely rely on health workers' clinical experience, and as yet there is no established method. We aimed to develop a rapid, non-invasive, cost-effective, and easy-to-use deep learning approach for identifying oral cavity squamous cell carcinoma (OCSCC) patients using photographic images.

Methods: We developed an automated deep learning algorithm using cascaded convolutional neural networks to detect OCSCC from photographic images. We included all biopsy-proven OCSCC photographs and normal controls of 44,409 clinical images collected from 11 hospitals around China between April 12, 2006, and Nov 25, 2019. We trained the algorithm on a randomly selected part of this dataset (development dataset) and used the rest for testing (internal validation dataset). Additionally, we curated an external validation dataset comprising clinical photographs from six representative journals in the field of dentistry and oral surgery. We also compared the performance of the algorithm with that of seven oral cancer specialists on a clinical validation dataset. We used the pathological reports as the gold standard for OCSCC identification. We evaluated the algorithm performance on the internal, external, and clinical validation datasets by calculating the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity, and specificity with two-sided 95% CIs.

Findings: 1469 intraoral photographic images were used to validate our approach. The deep learning algorithm achieved an AUC of 0·983 (95% CI 0·973–0·991), sensitivity of 94·9% (0·915–0·978), and specificity of 88·7% (0·845–0·926) on the internal validation dataset (n = 401), and an AUC of 0·935 (0·910–0·957), sensitivity of 89·6% (0·847–0·942), and specificity of 80·6% (0·757–0·853) on the external validation dataset (n = 402). In a secondary analysis on the internal validation dataset, the algorithm presented an AUC of 0·995 (0·988–0·999), sensitivity of 97·4% (0·932–1·000), and specificity of 93·5% (0·882–0·979) in detecting early-stage OCSCC. On the clinical validation dataset (n = 666), our algorithm achieved comparable performance to that of the average oral cancer expert in terms of accuracy (92·3% [0·902–0·943] vs 92·4% [0·912–0·936]), sensitivity (91·0% [0·879–0·941] vs 91·7% [0·898–0·934]), and specificity (93·5% [0·909–0·960] vs 93·1% [0·914–0·948]). The algorithm also achieved significantly better performance than the average medical student (accuracy of 87·0% [0·855–0·885], sensitivity of 83·1% [0·807–0·854], and specificity of 90·7% [0·889–0·924]) and the average non-medical student (accuracy of 77·2% [0·757–0·787], sensitivity of 76·6% [0·743–0·788], and specificity of 77·9% [0·759–0·797]).

Interpretation: Automated detection of OCSCC by a deep-learning-powered algorithm is a rapid, non-invasive, low-cost, and convenient method, which yielded performance comparable to that of human specialists and has the potential to be used as a clinical tool for fast screening, earlier detection, and therapeutic efficacy assessment of the cancer.

© 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

* Corresponding author at: Department of Oral and Maxillofacial Surgery, School and Hospital of Stomatology, Wuhan University, Wuhan, Hubei 430079, China.
** Corresponding author at: School of Geography and Information Engineering, China University of Geosciences, Wuhan, Hubei 430078, China.
E-mail addresses: [email protected] (L. Wan), [email protected] (X. Xiong).
1 These authors contributed equally.

https://doi.org/10.1016/j.eclinm.2020.100558
2589-5370/© 2020 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Research in context

Evidence before this study

We searched PubMed on Jan 4, 2020, for articles that described the application of deep learning algorithms to detect oral cancer from images, using the search terms "deep learning" OR "convolutional neural network" AND "oral cavity squamous cell carcinoma" OR "oral cancer" AND "images", with no language or date restrictions. We found that previous research was mainly limited to highly standardized images, such as multidimensional hyperspectral images, laser endomicroscopy images, computed tomography images, positron emission tomography images, histological images, and Raman spectra images. There were only two reports of artificial intelligence-enabled oral lesion classification using photographic images, published on Oct 10 and Dec 5, 2018, respectively. However, both suffered from extreme scarcity of data (<300 images in total) and depended heavily on specialized instruments that generated autofluorescence and white light images. In summary, we identified no research that allows direct comparison with our algorithm.

Added value of this study

To our knowledge, this is the first study to develop a deep learning algorithm for detection of OCSCC from photographic images. The high performance of the algorithm was validated in various scenarios, including detection of early oral cavity cancer lesions (diameters less than two centimetres). We compared the performance of the algorithm with that of oral cancer specialists on a clinical validation dataset and found its competence to be comparable to, or even beyond, that of the oral cancer specialists. The deep learning algorithm was trained and tested with ordinary photographic images (for example, smartphone images) alone and did not require any other highly standardized images from a specialized instrument or invasive biopsy. Specifically, we developed a smartphone app on the basis of our algorithm to provide real-time detection of oral cancer. Our approach also output reasonable scores for one OCSCC lesion during different cycles of chemotherapy, which exhibited a steady decline as the chemotherapy shrank the lesion.

Implications of all the available evidence

Our study reveals that OCSCC lesions carry discriminative visual appearances, which can be identified by a deep learning algorithm. The ability to detect OCSCC in a point-of-care, low-cost, non-invasive, widely available, and effective manner has significant clinical implications for OCSCC detection.

1. Introduction

Oral cancer is one of the common malignancies worldwide. There were an estimated 354,864 new cases and 177,384 deaths in 2018, representing 2% of cancer cases and 1.9% of cancer-related deaths, respectively [1]. Of all oral cavity cancer cases, approximately 90% are squamous cell carcinoma (SCC) [2]. Despite various emerging treatment modalities adopted over the past decades, the overall mortality of oral cavity squamous cell carcinoma (OCSCC) has not decreased significantly since the 1980s, owing to the relatively limited effort towards screening and early detection, which accounts for the stubbornly high rate of diagnosis at advanced stages [3].

The early detection of OCSCC is essential, as the estimated 5-year survival rate for OCSCC shows a distinct decrease from 84% if detected in its early stages (stages I and II) to about 39% if detected in its advanced stages (stages III and IV) [3]. Even worse, patients with advanced-stage disease have to endure a poorer postoperative quality of life as a result of the distressing and costly process of multimodal therapy, including surgery and adjuvant radiation therapy with or without chemotherapy [4].

Unlike other internal organs, the oral cavity allows easy visualization without the need for special instruments. In clinical practice, specialists tend to make suspected diagnoses of oral cancer during visual inspection according to their own experience and knowledge of the visual appearances of cancerous lesions [5,6]. Generally, OCSCC lesions often appear first as white, red, or mixed white-red patches, and the mucosal surface usually exhibits an increasingly irregular, granular, and ulcerated appearance (see appendix p 2 for details) [7,8]. Nevertheless, such visual patterns are easily mistaken for signs of ulceration or other oral mucous membrane diseases by non-specialist medical practitioners [8]. For a long time, there has been no well-established vision-based method for oral cancer detection. The diagnosis of OCSCC has to rely on invasive oral biopsy, which is not only time-consuming but also not guaranteed to be available in primary care or community settings, especially in developing countries [9,10]. Thus, quite often OCSCC patients cannot receive timely diagnosis and referrals [11,12].

There is growing evidence that deep learning techniques have matched or even outperformed human experts in identifying subtle visual patterns from photographic images [13], including classifying skin lesions [14], detecting diabetic retinopathy [15], and identifying facial phenotypes of genetic disorders [16]. These impressive results inspire us to believe that deep learning might also have the potential to capture fine-grained features of oral cancer lesions, which would benefit the early detection of OCSCC.

With the assumption that deep neural networks could identify specific visual patterns of oral cancer like human experts, we developed a deep learning algorithm using photographic images for fully automated OCSCC detection. We evaluated the algorithm performance on the internal and external validation datasets, and compared the model to the average performance of seven oral cancer specialists on a clinical validation dataset.
2. Methods

2.1. Datasets

We retrospectively collected 44,409 clinical oral photographs from 11 hospitals in China between April 12, 2006, and Nov 25, 2019. We included all biopsy-proven OCSCC photographs and normal controls by performing image quality control to remove intraoperative, postoperative, and blurry photographs, and photographs of the same lesion from similar angles. We randomly selected 5775 photographs (development dataset) to develop the algorithm and used the remaining 401 photographs (internal validation dataset) for validation. We also included all photographs of early-stage OCSCC (lesion diameter less than two centimetres) in the internal validation dataset to evaluate the algorithm performance in the early detection of OCSCC [17]. The corresponding pathological reports were used as the gold standard to develop and validate the deep learning algorithm.

We also curated an external validation dataset comprising 420 clinical photographs from six representative journals in the field of dentistry and oral and maxillofacial surgery (listed in the appendix, pp 17-28), which were published between Jan, 2000 and Aug, 2019. We removed black-and-white, intraoperative, and dyed-lesion photographs.

We acquired a clinical validation dataset from the outpatient departments of the Hospital of Stomatology, Wuhan University between Nov 4, 2010, and Oct 8, 2019. This dataset contained 1941 photographs of OCSCC, other diseases or disorders of the oral mucosa (see appendix, pp 8-9 for details), and normal oral mucosa. We included all biopsy-proven photographs and normal controls by performing similar image quality control as mentioned above. All photographs involved in this study were stored in JPG format. We classified photographs of normal mucosa as negative controls.

This study is reported according to STROBE guideline recommendations and was approved by the Institutional Review Board (IRB) of the Ethics Committee of the Hospital of Stomatology, Wuhan University (IRB No. 2019-B21). Informed consent from all participants was exempted by the IRB because of the retrospective nature of this study.

2.2. Algorithm development process

We developed an automated deep learning algorithm using cascaded convolutional neural networks to detect OCSCC from photographic images. A detection network first took an oral photograph as input and generated one bounding box that located the suspected lesion. The lesion area was cropped as a candidate patch according to the detection results returned by the first step. The candidate patch was then fed to a classification network, which produced two confidence scores in the range 0-1 for classification of patients with OCSCC and controls. The backbone networks for detection and classification were initialised with a pre-trained model that had been trained on tens of millions of images in the ImageNet dataset and further finetuned on the development dataset [18]. More details are described in the appendix (pp 3-7).
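The cascaded "detect, crop, classify" design described above can be summarised in a few lines of Python. The sketch below is illustrative only: it assumes a recent PyTorch/torchvision environment and uses the generic SSD detector and DenseNet-121 classifier named in the Discussion; the trained weights, input resolution, score thresholds, and post-processing of the published algorithm are not reproduced here.

import torch
import torchvision
from torchvision import transforms
from PIL import Image

# Stage 1: lesion detector; stage 2: OCSCC-vs-control classifier (illustrative, untrained head).
detector = torchvision.models.detection.ssd300_vgg16(weights="DEFAULT").eval()
classifier = torchvision.models.densenet121(weights="DEFAULT")
classifier.classifier = torch.nn.Linear(classifier.classifier.in_features, 2)  # two confidence scores
classifier.eval()

to_tensor = transforms.ToTensor()
clf_preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def detect_ocscc(photo_path):
    """Return (lesion bounding box, [score_control, score_OCSCC]) for one oral photograph."""
    image = Image.open(photo_path).convert("RGB")
    # Step 1: the detection network proposes candidate boxes; keep the most confident one.
    detections = detector([to_tensor(image)])[0]
    if len(detections["boxes"]) == 0:
        return None, None
    best = detections["scores"].argmax()
    x1, y1, x2, y2 = [int(v) for v in detections["boxes"][best].tolist()]
    # Step 2: crop the candidate patch and classify it as OCSCC or control.
    patch = image.crop((x1, y1, x2, y2))
    logits = classifier(clf_preprocess(patch).unsqueeze(0))
    scores = torch.softmax(logits, dim=1).squeeze(0).tolist()
    return (x1, y1, x2, y2), scores

In this sketch the highest-scoring detection is taken as the lesion patch; the published system was finetuned on the development dataset, so the code should be read as a structural outline rather than a reimplementation.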
We also augmented our data to generate more training samples through image pre-processing such as scaling, rotation, horizontal flipping, and adjustment of saturation and exposure (detailed in the appendix, p 6). Data augmentation was not applied to the datasets used for validation. We developed the deep learning algorithm based on transfer learning (detailed in the appendix, p 5), which helps to shorten the network training time and alleviate overfitting.
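For illustration, the augmentation operations listed above map naturally onto standard torchvision transforms. The parameter ranges below are placeholders rather than the values used in the study, and augmentation is applied to the development set only.

from torchvision import transforms

# Training-time augmentation: scaling, rotation, horizontal flipping, and
# saturation/exposure adjustment, as described in the text (ranges are illustrative).
train_augmentation = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),      # random scaling/cropping
    transforms.RandomRotation(degrees=15),                    # small random rotations
    transforms.RandomHorizontalFlip(p=0.5),                   # horizontal flipping
    transforms.ColorJitter(saturation=0.3, brightness=0.3),   # saturation and exposure jitter
    transforms.ToTensor(),
])

# Validation images are only resized and converted, never augmented.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])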
2.3. Human readers versus the algorithm

We compared the performance of the algorithm with that of 21 human readers on the clinical validation dataset. Readers employed in our study were divided into three panels according to their professional backgrounds and clinical experience. The specialist panel consisted of seven oral cancer specialists from five hospitals. The medical student panel contained seven postgraduates majoring in oral and maxillofacial surgery, and the non-medical student panel recruited seven non-medical undergraduates (readers' detailed information is listed in the appendix, p 12). None of these readers participated in the clinical care or assessment of the enrolled patients, nor did they have access to their medical records.

For the purposes of the present study, we classified the photographs of OCSCC, non-OCSCC malignancies, and oral epithelial dysplasia in the clinical validation dataset as oral cancer lesions, and the others (benign lesions and normal oral mucosa) as negative controls. We chose this classification because both precancerous and cancerous oral lesions should be detected without delay in clinical practice.

Each reader was tested independently on the clinical validation dataset. We asked them to read each photograph in the dataset and record their judgements on the answer sheet. The photograph presentation order was randomized, and the answer sheet used in the test is shown in the appendix (p 13). The performance of readers was assessed by comparing their predictions with the corresponding pathological reports. We aggregated the final results and calculated the overall accuracy, sensitivity, and specificity of each panel.

2.4. Statistical analysis

We used the receiver operating characteristic (ROC) curve to evaluate the performance of the deep learning algorithm in discriminating OCSCC lesions from controls. The ROC curve was plotted by calculating the true positive rate (sensitivity) and the false positive rate (1 - specificity) at different predicted probability thresholds, and we calculated AUC values [19]. We calculated 95% bootstrap CIs for accuracy, sensitivity, and specificity using 10,000 replicates [20]. Sensitivity was calculated as the fraction of photographs of oral cancer patients that were correctly classified, and specificity was calculated as the fraction of photographs of non-cancer individuals that were correctly classified. We used the average accuracy, sensitivity, and specificity of the human readers when comparing with the model. We also employed t-distributed Stochastic Neighbour Embedding (t-SNE) to demonstrate the effectiveness of our deep neural networks in differentiating OCSCC from non-OCSCC oral diseases [21]. All statistical analyses were done using the scipy (version 0.22.1) and scikit-learn (version 1.4.1) Python packages.
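As a concrete illustration of the evaluation just described, the sketch below computes the ROC curve, AUC, and 10,000-replicate bootstrap CIs for accuracy, sensitivity, and specificity with numpy and scikit-learn. The arrays y_true (1 for biopsy-proven cancer photographs, 0 for controls), y_score (the algorithm's confidence), and the 0.5 operating threshold are illustrative assumptions, not the study's actual data or threshold.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def point_estimates(y_true, y_pred):
    # Sensitivity: fraction of cancer photographs correctly classified;
    # specificity: fraction of non-cancer photographs correctly classified.
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    sensitivity = tp / np.sum(y_true == 1)
    specificity = tn / np.sum(y_true == 0)
    accuracy = np.mean(y_pred == y_true)
    return accuracy, sensitivity, specificity

def bootstrap_ci(y_true, y_pred, n_boot=10_000, seed=0):
    """Percentile 95% CIs from bootstrap resampling (10,000 replicates, as in the text)."""
    rng = np.random.default_rng(seed)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), len(y_true))   # resample photographs with replacement
        stats.append(point_estimates(y_true[idx], y_pred[idx]))
    lower, upper = np.percentile(np.array(stats), [2.5, 97.5], axis=0)
    return list(zip(lower, upper))   # CIs for (accuracy, sensitivity, specificity)

# Example usage (y_true and y_score would come from a validation dataset, as numpy arrays):
# fpr, tpr, thresholds = roc_curve(y_true, y_score)
# auc = roc_auc_score(y_true, y_score)
# y_pred = (y_score >= 0.5).astype(int)
# accuracy_ci, sensitivity_ci, specificity_ci = bootstrap_ci(y_true, y_pred)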
2.5. Role of the funding source

There was no funding source for this study. XPX and LW had full access to all the data and had final responsibility for the decision to submit for publication.

3. Results

In the development and internal validation datasets, 28,064 photographs (non-OCSCC diseases [n = 19,271] and non-biopsy-proven [n = 8793]) were excluded. Another 9989 photographs were then removed after image quality control, including intraoperative and postoperative photographs (n = 8551), photographs of the same lesion from similar angles (n = 1167), and blurry photographs (n = 271). We selected 402 of 420 photographs in the external validation dataset after removing 18 photographs: black-and-white (n = 12), intraoperative (n = 4), and dyed-lesion (n = 2) photographs.

Fig. 1. Workflow diagram for the development and evaluation of the OCSCC detection algorithm
*Cancer photographs were images of OCSCC, other malignancies, and epithelial dysplasia while control photographs were images of benign lesions and normal oral mucosa for
the clinical validation dataset. OCSCC=oral cavity squamous cell carcinoma. WHUSS=School and Hospital of Stomatology, Wuhan University.

In the clinical validation dataset, 233 photographs were excluded for unavailable pathological reports. We performed similar quality control by removing intraoperative and postoperative photographs (n = 529), photographs of the same lesion from similar angles (n = 456), and blurry photographs (n = 57). Fig. 1 summarises the workflow for the development and evaluation of the deep learning algorithm.

Baseline characteristics for the development and three validation datasets are summarised in Table 1. In the development dataset, 2055 photographs of OCSCC lesions were included while 3720 normal oral mucosa photographs were used as negative controls. The internal validation dataset contained 179 photographs of OCSCC lesions and 222 normal controls. The clinical validation dataset included 274 photographs of OCSCC lesions, 77 photographs of non-OCSCC oral diseases, and 315 photographs of normal oral mucosa. The external validation dataset consisted of 154 photographs of OCSCC lesions and 248 normal controls. Statistics for the sites of occurrence of OCSCC lesions were compiled according to the International Classification of Diseases 11th Revision (ICD-11) [22].

Table 1
Baseline characteristics.

                                                   Development dataset   Internal validation dataset   Clinical validation dataset   External validation dataset   p value
Number of photographs                              5775   401   666   402   ..
Stage                                              0.033
Number of T1 OCSCC patients                        459   101   51   ..   ..
Number of T2 OCSCC patients                        471   27   54   ..   ..
Number of T3 OCSCC patients                        110   7   18   ..   ..
Number of T4 OCSCC patients                        82   1   8   ..   ..
Number of photographs for which age was unknown    3735   224   316   402   ..
Mean age, years (range)                            55 (19–88)   58 (26–89)   55 (21–83)   ..   <0.0001
Lesion location                                    0.005
Squamous cell carcinoma of lip                     99 (2%)   8 (2%)   6 (1%)   22 (6%)   ..
Squamous cell carcinoma of tongue                  901 (16%)   83 (20%)   120 (18%)   37 (9%)   ..
Squamous cell carcinoma of gum                     272 (5%)   21 (5%)   43 (7%)   34 (8%)   ..
Squamous cell carcinoma of floor of mouth          202 (3%)   16 (4%)   9 (1%)   12 (3%)   ..
Squamous cell carcinoma of palate                  112 (2%)   10 (3%)   12 (2%)   10 (2%)   ..
Squamous cell carcinoma of pharynx                 40 (1%)   5 (1%)   18 (3%)   1 (1%)   ..
Squamous cell carcinoma of other or unspecified parts of mouth   429 (7%)   36 (9%)   66 (10%)   38 (9%)   ..
Non-OCSCC oral mucosal diseases*                   0   0   77 (11%)   0   ..
Normal oral mucosa                                 3720 (64%)   222 (56%)   315 (47%)   248 (62%)   ..

OCSCC = oral cavity squamous cell carcinoma.
Data are n (%), unless otherwise stated.
* Non-OCSCC oral mucosal diseases included non-OCSCC malignancies, epithelial dysplasia, and benign lesions, detailed in the appendix.

Fig. 2. ROC curves for the deep learning algorithm on three validation datasets
In the main analysis, all photographs in the internal validation dataset were used. In the secondary analysis, only photographs of early-stage oral cavity squamous cell carcinoma
(lesion diameter less than two centimetres) and randomly selected negative controls in the internal validation dataset were used. ROC=receiver operating characteristic. AUC=area
under the curve.

Table 2
Algorithm performance.

                                          AUC                   Sensitivity          Specificity          Accuracy
Internal validation dataset (n = 401)     0·983 (0·973–0·991)   94·9% (91·5–97·8)    88·7% (84·5–92·6)    91·5% (88·8–94·3)
Secondary analysis* (n = 170)             0·995 (0·988–0·999)   97·4% (93·2–100·0)   93·5% (88·2–97·9)    95·3% (91·8–98·2)
External validation dataset (n = 402)     0·935 (0·910–0·957)   89·6% (84·7–94·2)    80·6% (75·7–85·3)    84·1% (80·3–87·6)
Clinical validation dataset (n = 666)     0·970 (0·957–0·981)   91·0% (87·9–94·1)    93·5% (90·9–96·0)    92·3% (90·2–94·3)

Data in parentheses are 95% CIs.
* In the secondary analysis, only photographs of early-stage oral cavity squamous cell carcinoma (lesion diameter less than two centimetres) and randomly selected negative controls in the internal validation dataset were used.

Fig. 3. Comparisons between the deep learning algorithm and three panels of human readers
The dots in the left subgraph indicate the performance of each individual. The crosses in the right subgraph demonstrate the average performance and corresponding error bar
of each panel. OCSCC=oral cavity squamous cell carcinoma. AUC=area under the curve.

In the primary analysis on all photographs in the internal validation dataset, the deep learning algorithm achieved an AUC of 0·983 (95% CI, 0·973–0·991), accuracy of 91·5% (88·8–94·3), sensitivity of 94·9% (91·5–97·8), and specificity of 88·7% (84·5–92·6) in detecting OCSCC lesions (Fig. 2 and Table 2). A secondary analysis was performed on all photographs of early-stage OCSCC lesions (n = 77) and randomly selected normal controls (n = 93) in the same dataset, which achieved an AUC of 0·995 (0·988–0·999) with accuracy of 95·3% (91·8–98·2), sensitivity of 97·4% (93·2–100·0), and specificity of 93·5% (88·2–97·9). Similarly, the model also achieved promising performance on the external validation dataset, with an AUC of 0·935 (95% CI, 0·910–0·957), accuracy of 84·1% (80·3–87·6), sensitivity of 89·6% (84·5–94·1), and specificity of 80·6% (75·5–85·4).

The test results for the algorithm and three panels of human readers on the clinical validation dataset are shown in Fig. 3. The algorithm achieved an AUC of 0·970 (95% CI, 0·957–0·981) with accuracy of 92·3% (90·2–94·3), sensitivity of 91·0% (87·9–94·1), and specificity of 93·5% (90·9–96·0) in detecting oral cancer. Among the human readers, the accuracy of the specialist panel was slightly higher than that of the algorithm at 92·4% (95% CI, 91·2–93·6), whereas 87·0%

(85·5–88·5) and 77·2% (75·7–78·7) were obtained for the medical and non-medical student panels, respectively. The sensitivity and specificity varied greatly among the three panels: the model achieved comparable results to the specialist panel (sensitivity of 91·7% [95% CI, 89·8–93·4], and specificity of 93·1% [91·4–94·8]) and demonstrated significantly higher results than the medical student panel (sensitivity of 83·1% [80·7–85·4], and specificity of 90·7% [88·9–92·4]) and the non-medical student panel (sensitivity of 76·6% [74·3–78·8], and specificity of 77·9% [75·9–79·7]). Results for each human reader are listed in the appendix (p 14).

4. Discussion

In this study, we developed a deep learning algorithm that performs well (AUC 0·980 and 0·935 for the clinical and publication validation datasets, respectively) in automated detection of OCSCC from oral photographs, and that compares favourably with the performance of seven oral cancer specialists. Our finding that deep learning can capture fine-grained visual patterns of OCSCC in cluttered oral image backgrounds, with a speed and reliability matching or even exceeding the capabilities of human experts, validates what is, to our knowledge, the first fully automated, photographic-image-based approach for precise localization and recognition of oral cancer lesions. This is of particular importance because such a non-invasive, rapid, and easy-to-use tool has significant clinical implications for early diagnosis or screening of suspected patients in countries that lack medical expertise.

Delays in referral to cancer specialists are a primary cause of late presentation in a considerable proportion of OCSCC patients (approximately 60-65% at advanced stages), leading to a worse prognosis of this cancer [23,24]. Early detection of OCSCC is challenging because of poor public awareness and knowledge about oral cancer, particularly its clinical presentation. Furthermore, it is genuinely hard for patients and even non-specialist healthcare professionals to perceive subtle visual signs of OCSCC given the variability in the appearance of oral mucosa lesions [11,12]. For example, tumours can present as erythematous and ulcerative lesions from the onset, typically producing no prominent signs or discomfort until they progress. Screening programmes based on trained health workers in previous studies did reduce oral cancer mortality, but were costly, time-consuming, labour-intensive, and inefficient owing to a large fraction of inexperienced non-medical undergraduate employees in the task [25-27]. In comparison, our work is novel in that no specific training or expert experience is required: the artificial intelligence (AI)-powered algorithm enables OCSCC lesions to be discriminated easily and automatically from one ordinary smartphone photo containing the suspicious region, achieving performance on par with human experts and far outperforming medical and non-medical students (Fig. 3). Furthermore, we also built a mobile app on the basis of our OCSCC-recognition algorithm (see appendix pp 9-11), which might provide effective, easy, and low-cost medical assessments for more individuals in need than is possible with existing healthcare systems.

Apart from the photographic variability problem, identifying oral cancer from ordinary photographic images is a far trickier task than classifying skin-lesion diseases [14], because OCSCC lesions are often hidden or masked in a complex background by overlapping teeth, buccal mucosa, tongue, palate, and lip. Here we present a two-step deep learning algorithm that detects OCSCC in a 'coarse-to-fine' way. The detection network, Single Shot MultiBox Detector (SSD), first spots highly suspicious areas with OCSCC visual patterns in a given photograph by filtering out unrelated content [28]. DenseNet121 then classifies those targeted regions as OCSCC or not [29]. Our deep neural networks achieve 92·3% (95% CI 0·902–0·943) overall accuracy, whereas the seven tested experts attain 92·4% (0·912–0·936) average accuracy. In addition, the algorithm also shows better generalization performance on varied datasets containing photographs taken by different cameras (see appendix, p 2). These results imply that an application-oriented deep neural network architecture is more effective for improving overall performance than simply stacking more layers into one network.

Our deep learning algorithm also generalizes well to early cancer lesions. Recognizing early-stage oral cavity cancer lesions, which are smaller than two centimetres and carry few visual features [17], can be very difficult, but is effective in improving the curative effect, as the World Health Organization (WHO) has stated. We found our deep neural networks to be helpful in identifying these very small OCSCC lesions in high-risk individuals, achieving a promising result (AUC 0·995) in the secondary analysis on the internal validation dataset.

Another noteworthy finding is that our approach might potentially be used as a quantitative tool to aid assessment of the efficacy of therapeutic regimens. The deep learning algorithm outputs a score based on visual features extracted from lesion photos, which might be considered a measure of the severity of the cancer. It could possibly be helpful for assisting human specialists in rating the curative effects of non-surgical treatment modalities. For instance, a downward trend in the output scores corresponded to a triple of oral photos taken from an OCSCC patient who received two cycles of docetaxel/cisplatin/5-fluorouracil (TPF) induction chemotherapy, photographed at three time points: before the chemotherapy, after the first chemotherapy cycle, and after the second chemotherapy cycle, which shows that the treatment was effective (see appendix pp 10-11). We believe that such a finding should be a fundamental basis of further clinical research and practice.

Despite recent advances in applying deep learning techniques to medical-imaging interpretation tasks, large datasets remain a prerequisite for achieving the performance of human-based diagnosis [14-16,30]. Unlike computed tomography (CT) and magnetic resonance imaging (MRI) images and electrocardiograms, taking oral photographs is not mandatory before treatment. Hence it is extraordinarily difficult to collect large numbers of photographs. The development dataset (5775 images) used in our study is not enough to train a robust deep learning model from scratch. Therefore, we adopted transfer learning by finetuning a pre-trained model trained on large-scale image datasets. Furthermore, data augmentation techniques were utilized to increase the size of our training set. Additional techniques, such as multi-task learning and hard example mining, were also employed to improve our model (see appendix pp 6-7).

A limitation of our study is that the algorithm cannot make definite predictions for other oral diseases, mainly because the photographs used to train the deep neural networks may not fully represent the diversity and heterogeneity of oral disease lesions. Despite this, the internal features learned by the neural networks show that our algorithm has promising potential not only to distinguish OCSCC from non-OCSCC oral diseases, but also to differentiate between non-OCSCC oral diseases and normal oral mucosa (Appendix Fig. 1.6). In the resulting plots of t-SNE representations of these three lesion classes [21], each point represents one oral photo projected from the 1024-dimensional output of the last hidden layer of our neural network into two dimensions. Points of the same lesion class aggregate into one cluster with the same colour, while OCSCC, non-OCSCC oral diseases, and normal oral mucosa are well separated into three clusters. Still, the proposed algorithm fails to distinguish several visually confusing cases such as epulis (see appendix p 15 for more cases in which the algorithm and human experts failed to assess malignancy correctly). A much larger and more diverse training dataset might be one possible solution to this issue and will be tested in future clinical trials.
highly suspicious areas with OCSCC visual patterns in given photo- In conclusion, we report that deep learning methods may offer
graphs by filtering out unrelated contents [28]. DenseNet121 then opportunities for automatically identifying OCSCC patients with the
classified those targeted regions into OCSCC or not [29]. Our deep performance matching or even beyond that of skilled human experts.
neural networks achieve 92¢3% (95% CI 0¢902 0¢943) overall accuracy The developed algorithm with good generalization capability could
whereas seven tested experts attain 92¢4% (0¢912 0¢936) average be used as a handy, non-invasive, and cost-effectiveness tool for non-
accuracy. In addition, it also shows better generalization performance specialist people to detect OCSCC lesions as soon as possible, thereby
for varied datasets contained photographs taken by different cameras enabling early treatment.

Declaration of Competing Interest

We declare no competing interests.

Funding

No funding received.

Contributions

XPX and LW designed and co-supervised the study. ZHL and QYJ implemented the deep learning algorithm and developed the CACalculator app. XPX, HJW, CYH, YS, HKW, YFW, XQW, MZ, YMH, JJ, BL, JBX, SLC, JJZ, QYF, and KXL contributed to data collection. QYF and JHB collated data and checked data sources. QYF, HXZ, JYL, and Hui Liu coordinated the human-AI competition experiment and reader recruitment. XPX, LW, ZHL, QYF, and YHSC drafted the manuscript. QYF, YHSC, and Han Liu did the statistical analysis and data interpretation, and constructed all tables and figures. YFZ and TS reviewed and revised the draft. All authors did several rounds of amendments and approved the final manuscript.

Data sharing

All photographic images used in the study are available from the corresponding authors on reasonable request.

Acknowledgments

We thank the participants from Hospital of Stomatology, Wuhan University, People's Hospital of Zhengzhou, Jing Men No.2 People's Hospital, Tongji Hospital of Tongji Medical College, Huazhong University of Science and Technology, Wuhan Union Hospital of Tongji Medical College, Huazhong University of Science and Technology, The First Affiliated Hospital of Zhengzhou University, Henan Provincial People's Hospital, People's Hospital of Zhengzhou, The Second Affiliated Hospital of Nanchang University, and China University of Geosciences, Wuhan for providing assistance with the interpretation of the clinical validation dataset. We also thank the doctors from The First Affiliated Hospital of Zhengzhou University, The Second Affiliated Hospital of Xinjiang University, Qinghai Red Cross Hospital, and The Second Xiangya Hospital of Central South University for aiding in the data acquisition.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.eclinm.2020.100558.

References

[1] Bray F, Ferlay J, Soerjomataram I, Siegel RL, Torre LA, Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394–424.
[2] Marur S, Forastiere AA. Head and neck cancer: changing epidemiology, diagnosis, and treatment. Mayo Clin Proc 2008;83:489–501.
[3] Howlader N, Noone AM, Krapcho M, Miller D, Bishop K, Kosary CL, et al. SEER cancer statistics review, 1975–2014. Bethesda, MD: Natl Cancer Inst; 2017–2018.
[4] Hammerlid E, Bjordal K, Ahlner-Elmqvist M, Boysen M, Evensen JF, Biörklund A, et al. A prospective study of quality of life in head and neck cancer patients. Part I: at diagnosis. Laryngoscope 2001;111:669–80.
[5] der Waal I, de Bree R, Brakenhoff R, Coebegh JW. Early diagnosis in primary oral cancer: is it possible? Med Oral Patol Oral Cir Bucal 2011;16:e300–5.
[6] Kundel HL. History of research in medical image perception. J Am Coll Radiol 2006;3:402–8.
[7] Chi AC, Day TA, Neville BW. Oral cavity and oropharyngeal squamous cell carcinoma—an update. CA Cancer J Clin 2015;65:401–21.
[8] Bagan J, Sarrion G, Jimenez Y. Oral cancer: clinical features. Oral Oncol 2010;46:414–7.
[9] Moy E, Garcia MC, Bastian B, Rossen LM, Ingram DD, Faul M, et al. Leading causes of death in nonmetropolitan and metropolitan areas - United States, 1999-2014. MMWR Surveill Summ 2017;66:1–8.
[10] Pagedar NA, Kahl AR, Tasche KK, Seaman AT, Christensen AJ, Howren MB, et al. Incidence trends for upper aerodigestive tract cancers in rural United States counties. Head Neck 2019;41:2619–24.
[11] Gigliotti J, Madathil S, Makhoul N. Delays in oral cavity cancer. Int J Oral Maxillofac Surg 2019;48:1131–7.
[12] Liao DZ, Schlecht NF, Rosenblatt G, Kinkhabwala CM, Leonard JA, Ference RS, et al. Association of delayed time to treatment initiation with overall survival and recurrence among patients with head and neck squamous cell carcinoma in an underserved urban population. JAMA Otolaryngol Head Neck Surg 2019;145:1001–9.
[13] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436–44.
[14] Esteva A, Kuprel B, Novoa RA, Ko J, Swetter SM, Blau HM, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature 2017;542:115–8.
[15] Gulshan V, Peng L, Coram M, Stumpe MC, Wu D, Narayanaswamy A, et al. Development and validation of a deep learning algorithm for detection of diabetic retinopathy in retinal fundus photographs. JAMA J Am Med Assoc 2016;316:2402–10.
[16] Gurovich Y, Hanani Y, Bar O, Nadav G, Fleischer N, Gelbman D, et al. Identifying facial phenotypes of genetic disorders using deep learning. Nat Med 2019;25:60–4.
[17] Woolgar JA. Histopathological prognosticators in oral and oropharyngeal squamous cell carcinoma. Oral Oncol 2006;42:229–39.
[18] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, et al. ImageNet large scale visual recognition challenge. Int J Comput Vis 2015;115:211–52.
[19] Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29–36.
[20] Efron B. Better bootstrap confidence intervals. J Am Stat Assoc 1987;82:171–85.
[21] Van Der Maaten L, Hinton G. Visualizing data using t-SNE. J Mach Learn Res 2008;9:2579–625.
[22] International Classification of Diseases 11th Revision. May 25, 2019. https://icd.who.int/en/ (accessed Jan 4, 2020).
[23] Scott SE, Grunfeld EA, Main J, McGurk M. Patient delay in oral cancer: a qualitative study of patients' experiences. Psycho-Oncol J Psychol Soc Behav Dimens Cancer 2006;15:474–85.
[24] De Vicente JC, Recio OR, Pendas SL, López-Arranz JS. Oral squamous cell carcinoma of the mandibular region: a survival study. Head Neck J Sci Spec Head Neck 2001;23:536–43.
[25] Sankaranarayanan R, Ramadas K, Thomas G, Muwonge R, Thara S, Mathew B, et al. Effect of screening on oral cancer mortality in Kerala, India: a cluster-randomised controlled trial. Lancet 2005;365:1927–33.
[26] Sankaranarayanan R, Ramadas K, Thara S, Muwonge R, Thomas G, Anju G, et al. Long term effect of visual screening on oral cancer incidence and mortality in a randomized trial in Kerala, India. Oral Oncol 2013;49:314–21.
[27] Mathew B, Sankaranarayanan R, Sunilkumar KB, Kuruvila B, Pisani P, Krishnan Nair M. Reproducibility and validity of oral visual inspection by trained health workers in the detection of oral precancer and cancer. Br J Cancer 1997;76:390–4.
[28] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu C-Y, et al. SSD: single shot multibox detector. In: Proceedings of the European Conference on Computer Vision; 2016. Published online September 17. doi:10.1007/978-3-319-46448-0_2.
[29] Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; 2017. Published online June 22. doi:10.1109/CVPR.2017.243.
[30] Chilamkurthy S, Ghosh R, Tanamala S, Biviji M, Campeau NG, Venugopal VK, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 2018;392:2388–96.
