Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network


Gaoshuang Liu, Jie Hua, Zhan Wu, Tianfang Meng, Mengxue Sun, Peiyun
Huang, Xiaopu He, Weihao Sun, Xueliang Li, Yang Chen

To cite this version:


Gaoshuang Liu, Jie Hua, Zhan Wu, Tianfang Meng, Mengxue Sun, et al. Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network. Annals of Translational Medicine, AME Publishing Company, 2020, 8 (7). doi:10.21037/atm.2020.03.24. hal-02735846

HAL Id: hal-02735846


https://hal-univ-rennes1.archives-ouvertes.fr/hal-02735846
Submitted on 30 Sep 2020

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Original Article

Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network
Gaoshuang Liu1#, Jie Hua2#, Zhan Wu3,4, Tianfang Meng3,4, Mengxue Sun1, Peiyun Huang1, Xiaopu He1,
Weihao Sun1, Xueliang Li2, Yang Chen3,4,5
1Department of Geriatric Gerontology, 2Department of Gastroenterology, The First Affiliated Hospital of Nanjing Medical University, Nanjing
210029, China; 3Laboratory of Image Science and Technology, School of Computer Science and Engineering, Southeast University, Nanjing 211102,
China; 4The Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211102,
China; 5Centre de Recherche en Information Biomedicale Sino-Francais (LIA CRIBs), Rennes, France
Contributions: (I) Conception and design: J Hua, G Liu, X Li, W Sun; (II) Administrative support: X He, P Huang, Z Wu; (III) Provision of study
materials or patients: J Hua, M Sun, W Sun, Z Wu; (IV) Collection and assembly of data: G Liu, M Sun, X He, P Huang; (V) Data analysis and
interpretation: T Meng; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.
#These authors contributed equally to this work.
Correspondence to: Weihao Sun. Department of Geriatric Gerontology, The First Affiliated Hospital of Nanjing Medical University, Guangzhou Road,
Nanjing 210029, China. Email: [email protected]; Xueliang Li. Department of Gastroenterology, The First Affiliated Hospital of Nanjing
Medical University, Guangzhou Road, Nanjing 210029, China. Email: [email protected]; Yang Chen. Laboratory of Image Science and
Technology, School of Computer Science and Engineering, Southeast University, Moling Street, Southeast University Road, Nanjing 211102, China;
The Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211102, China;
Centre de Recherche en Information Biomedicale Sino-Francais (LIA CRIBs), Rennes, France. Email: [email protected].

Background: The use of deep learning techniques in image analysis is a rapidly emerging field. This study aims to use a convolutional neural network (CNN), a deep learning approach, to automatically classify esophageal cancer (EC) and distinguish it from premalignant lesions.
Methods: A total of 1,272 white-light images were adopted from 748 subjects, including normal cases, premalignant lesions, and cancerous lesions; 1,017 images were used to train the CNN, and another 255 images were used to evaluate the CNN architecture. Our proposed CNN structure consists of two subnetworks (the O-stream and the P-stream). The original images were used as the inputs of the O-stream to extract the color and global features, and the preprocessed esophageal images were used as the inputs of the P-stream to extract the texture and detail features.
Results: The CNN system we developed achieved an accuracy of 85.83%, a sensitivity of 94.23%, and a specificity of 94.67% after the fusion of the 2 streams. The classification accuracies for the normal esophagus, premalignant lesions, and EC were 94.23%, 82.5%, and 77.14%, respectively, a better performance than that of the Local Binary Patterns (LBP) + Support Vector Machine (SVM) and Histogram of Gradient (HOG) + SVM methods. Eight of the 35 EC lesions (22.86%) were categorized as premalignant lesions because of their slightly reddish color and flat appearance.
Conclusions: The two-stream CNN system demonstrated high sensitivity and specificity on the endoscopic images. It obtained better detection performance than the currently used methods on the same datasets and has great application prospects in assisting endoscopists to distinguish esophageal lesion subclasses.

Keywords: Esophageal cancer (EC); endoscopic diagnosis; convolutional neural network (CNN); deep learning

Submitted Nov 11, 2019. Accepted for publication Feb 21, 2020.
doi: 10.21037/atm.2020.03.24
View this article at: http://dx.doi.org/10.21037/atm.2020.03.24

Introduction

Esophageal cancer (EC) is the seventh most common malignant tumor and the sixth leading cause of cancer-related deaths worldwide; approximately 572,034 new EC cases and 508,585 EC-related deaths were recorded in 2018 (1). EC is known for its insidious onset, rapid progression, and poor prognosis, and the stage at which EC is diagnosed determines the prognosis of patients (2). The five-year survival rate of a patient with EC is 20.9% in the advanced stage and greater than 85% in the early stage (3,4). Therefore, early detection is necessary for improving patient survival rates.

In recent decades, esophagogastroduodenoscopy with a biopsy has been the standard procedure for diagnosing EC, and the detection rate of EC has increased with the development of endoscopic technologies (5-7). Moreover, endoscopy can be used for observing premalignant lesions, such as intraepithelial neoplasia and atypical hyperplasia, which can progress to EC. Early detection and determination of EC or premalignant lesions can lead to more effective targeted interventions. However, distinguishing between early EC and premalignant lesions is normally a challenging task because of their similar endoscopic features, such as mucosal erosion, hyperemia, and roughness (Figure 1).

Meta-analyses have shown that the endoscopic miss rate for upper gastrointestinal cancers is 11.3%, and that 33 (23%) subjects with EC had undergone an endoscopy that failed to diagnose their cancer within 1 year before diagnosis (8,9). Moreover, around 7.8% of patients with EC fail to be diagnosed with conventional endoscopy, and most missed cases of EC are in the upper esophagus (5). However, a recent multicenter study found that missed EC accounted for only 6.4% of cases and was associated with a poor survival rate (10). Therefore, although the missed diagnosis rate of EC has decreased, endoscopists must receive long-term professional training and have the ability to detect EC properly in order to improve the survival rate of patients.

In the past several years, computer vision-based techniques have been widely applied in the field of medical image classification and detection (11,12). Traditional machine learning models have been developed using prior data on local features for automatic esophageal lesion diagnosis; however, the performance of many of these traditional methods is highly dependent on manually designed features (13-15). Recently, deep learning has been at the forefront of computational image analysis (16-18). A convolutional neural network (CNN), a classical deep learning algorithm, extracts local features such as edge texture in its lower layers and abstracts high-dimensional features in its deeper layers by simulating the recognition process of the human visual system. CNNs, with their self-learning abilities, are an effective method for medical image classification, segmentation, and detection (17,18). Shichijo et al. (19) applied a deep learning AI-based diagnostic system to diagnose Helicobacter pylori infections, and Hirasawa et al. (20) detected gastric cancer by using a CNN model. Moreover, several studies have constructed computer-aided methods to analyze real-time endoscopic video images of colorectal polyps (21-23). However, there are only a few studies on EC detection. Horie et al. (24) used a CNN to detect EC with a positive predictive value (PPV) of 40%, which is lower than expected. Yang et al. (25) trained a 3D-CNN model with PET image datasets to predict EC outcomes.

We aimed to propose a novel diagnostic method based on a CNN model that can automatically detect EC and distinguish it from premalignant lesions in endoscopic images.

Methods

Datasets and data preparation

Between July 2010 and July 2018, a total of 1,272 esophageal endoscopic images were collected from 748 patients at the First Affiliated Hospital of Nanjing Medical University, the largest comprehensive hospital in Jiangsu, which serves the whole province in medical treatment, medical teaching, scientific research, and hospital ethics activities. The imaging data consisted of 531 normal esophagus images, 387 premalignant images, and 354 EC images. Endoscopic images were captured with Olympus endoscopes (GIF-H260Z, GIF-Q260, GIF-Q260J, GIF-XQ260, GIF-H260, GIF-H260Q, GIF-XQ240; Olympus, Japan). The inclusion criterion for this database was the availability of conventional white-light endoscopy, chromoendoscopy, and narrow-band imaging. Images of poor quality (excessive mucus, foam, blurring, or active bleeding), as well as images captured from patients who had undergone esophageal surgery or endoscopic resection, were excluded. All images were labeled manually by the authors. In our study, ECs included adenocarcinoma and squamous cell carcinoma, and precancerous lesions included low-grade dysplasia and high-grade dysplasia.
Figure 1 Sample images of the three types used by the CNN system; the red boxes indicate the location of the lesion. CNN, convolutional neural network.

Figure 2 Original and preprocessed images.

Data preprocessing

The esophageal images were rescaled to 512×512 through a bilinear interpolation method to reduce the computational complexity (26).

Brightness variation of the endoscopic esophageal images might lead to intraclass differences, which can affect the results of the proposed network. Therefore, instead of the original endoscopic images, the following contrast-enhanced images were used as the inputs for the CNN:

I′(x, y; ε) = αI(x, y) + βG(x, y; ε) * I(x, y) + γ    [1]

where "*" represents the convolution operator, I(x, y) is the original endoscopic image, and G(x, y; ε) is a Gaussian filter with scale ε. The parameter values were empirically selected as α=4, β=−4, ε=512/20, and γ=128.

A large difference and a clear "boundary effect" were observed between the foreground and background of the images, so the images were cropped to 90% of their original size to eliminate the boundary effect. The original and preprocessed images are shown in Figure 2.

Data augmentation

To overcome overfitting on our small-scale esophageal image set, we adopted the following data augmentation measures before training the network: in the training dataset, horizontal and vertical flipping and slight spatial shifts of between −10 and 10 pixels were employed (Figure 3).
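For illustration, below is a minimal sketch of the preprocessing (Eq. [1] with α=4, β=−4, ε=512/20, γ=128, plus the 90% crop) and the augmentation steps, assuming OpenCV and NumPy; the function names and structure are ours, not the paper's released code.

```python
import cv2
import numpy as np

def preprocess(image):
    """Contrast enhancement of Eq. [1]: I' = alpha*I + beta*(G*I) + gamma.

    A sketch of the preprocessing described above; the input is assumed
    to be an 8-bit RGB endoscopic image.
    """
    img = cv2.resize(image, (512, 512), interpolation=cv2.INTER_LINEAR)
    img = img.astype(np.float32)
    alpha, beta, gamma = 4.0, -4.0, 128.0
    eps = 512 / 20.0                                      # Gaussian scale epsilon
    blurred = cv2.GaussianBlur(img, (0, 0), sigmaX=eps)   # G(x, y; eps) * I
    out = np.clip(alpha * img + beta * blurred + gamma, 0, 255).astype(np.uint8)
    # Crop to the central 90% to remove the dark boundary effect.
    h, w = out.shape[:2]
    dh, dw = int(0.05 * h), int(0.05 * w)
    return out[dh:h - dh, dw:w - dw]

def augment(image, rng=np.random):
    """Training-set augmentation: random flips and shifts of -10..10 px."""
    if rng.rand() < 0.5:
        image = np.fliplr(image)      # horizontal flip
    if rng.rand() < 0.5:
        image = np.flipud(image)      # vertical flip
    tx, ty = rng.randint(-10, 11, size=2)
    m = np.float32([[1, 0, tx], [0, 1, ty]])
    image = np.ascontiguousarray(image)
    return cv2.warpAffine(image, m, (image.shape[1], image.shape[0]))
```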
Figure 3 Data augmentation with flipping (B) and mirroring (C) of the original image (A).

CNNs

The basic CNNs consisted of 2 basic operational layers: the convolutional and pooling layers (Figure 4).

Figure 4 The exemplary architecture of the basic CNN: convolutional layer, pooling layer, fully connected layer, and Softmax layer. CNN, convolutional neural network.

The convolutional layer's main function was to extract the features of the image information from the upper layer. Convolution operations use local perception and weight sharing to reduce the number of parameters. The calculation formula of the convolutional layer was as follows:

x_j^L = f(x_j^{L−1} * w_j^L + b_j^L)    [2]

where x_j^L represents the output feature map of the j-th convolution kernel in the L-th layer, x_j^{L−1} is the input feature map from the (L−1)-th layer, "*" represents the convolution operation, w_j^L and b_j^L represent the weight and bias of the j-th convolutional kernel in the L-th layer, and f(·) represents the activation function. In this study, the ReLU activation function was used to alleviate the vanishing gradient problem.

The pooling layer performed dimensionality reduction on an input feature map, reduced the number of parameters, and retained the main feature information. The layer also improved the robustness of the network structure to transformations such as rotation, translation, and stretching of images. The calculation formula of the pooling layer was as follows:

x_j^L = f(β_j^L down(x_j^{L−1}) + b_j^L)    [3]

where down(·) represents a down-sampling function, and β and b represent weight and bias, respectively. In this study, we selected average pooling, which is defined as follows:

down(x_{m×m}) = mean(Σ_{a=1}^{m} Σ_{b=1}^{m} x_{ab})    [4]

Fully connected layer FC(c): each unit of the feature maps in the upper layer is connected with the c units of the fully connected layer. An output layer follows the fully connected layer.
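As a concrete illustration of Eqs. [2]-[4] and the Figure 4 architecture, a minimal Keras sketch of the basic building blocks is shown below; the filter counts and layer sizes are our own illustrative choices, not the paper's configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# A toy instance of the basic CNN in Figure 4; sizes are illustrative.
basic_cnn = models.Sequential([
    layers.Conv2D(32, 3, padding="same", activation="relu",
                  input_shape=(512, 512, 3)),      # Eq. [2] with ReLU
    layers.AveragePooling2D(pool_size=2),          # Eqs. [3]-[4], average pooling
    layers.Conv2D(64, 3, padding="same", activation="relu"),
    layers.AveragePooling2D(pool_size=2),
    layers.Flatten(),
    layers.Dense(256, activation="relu"),          # fully connected layer FC(c)
    layers.Dense(3, activation="softmax"),         # Softmax output, Eq. [5] below
])
basic_cnn.summary()
```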
The Softmax layer was used to normalize the input feature values into the range (0, 1) so that the output values y_m represent the probability of each category. The operation of the Softmax layer can be written as follows:

y_m = e^{θ_m x} / Σ_{m=1}^{n} e^{θ_m x}    [5]

where y_m is the output probability of the m-th class, θ_m is the weight parameter of the m-th class, n is the total number of classes, and x represents the input neurons of the upper layer.

Construction of the two-stream CNN algorithm

A deep neural network structure called Inception-ResNet was employed to construct a reliable AI-based diagnostic system. Inception-ResNet achieved the best results of its time in the ILSVRC image classification benchmark in 2017 (27). The proposed structure consists of 2 streams: the O-stream and the P-stream.

Inception networks can effectively address the problem of computational complexity, and the ResNet architecture can reduce overfitting as the network becomes deeper. The Inception-ResNet network, which combines the Inception network with the ResNet network, achieves improved performance on the test set of the ImageNet classification challenge (28). Figure 5 shows the basic structure of the Inception-ResNet module.

Figure 5 The basic structure of the Inception-ResNet module.

For clarity, H_L(x) denotes the transformation of the L-th building block, where x is the input of the L-th building block and H_L(x) is the desired output. The residual block explicitly forces the stacked nonlinear layers to fit the residual mapping:

F_L(x) = H_L(x) − x    [6]

Therefore, the transformation for the L-th building block is the following:

H_L(x) = F_L(x) + x    [7]

The classic Inception-ResNet module consists of 1×1, 1×3, and 3×1 convolutional layers. The 1×1 convolutional layer is used to reduce the number of channels, and the 1×3 and 3×1 convolutional layers are employed to extract spatial features.

Figure 6 demonstrates that the O-stream and the P-stream employ the same network structure to allow effective feature fusion. The O-stream takes the original image as input and focuses on extracting the global features of the esophageal images; the P-stream takes the preprocessed image as input and focuses on extracting the texture features of the esophageal images (Figure 6).

Figure 6 Proposed two-stream structure. The Inception-ResNet is used as the basic CNN structure. The input of the O-stream is the original image, and the input of the P-stream is the preprocessed image; the two streams are fused, and the fused features are classified as normal, precancerous lesion, or cancer. CNN, convolutional neural network.
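To make Eqs. [6]-[7], the module of Figure 5, and the two-stream wiring of Figure 6 concrete, below is a minimal Keras sketch. It is an illustrative reduction under our own naming (a single residual Inception-style block and a shared stem), not the paper's code; the full model would use an Inception-ResNet-V2 backbone for each stream.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_resnet_block(x, channels=192):
    """Residual mapping H_L(x) = F_L(x) + x (Eqs. [6]-[7]).

    Illustrative reduction of the Figure 5 module: 1x1 convolutions
    reduce the channel number; 1x3 and 3x1 convolutions extract
    spatial features.
    """
    f = layers.Conv2D(channels, (1, 1), padding="same", activation="relu")(x)
    f = layers.Conv2D(channels, (1, 3), padding="same", activation="relu")(f)
    f = layers.Conv2D(channels, (3, 1), padding="same", activation="relu")(f)
    # Linear 1x1 convolution projects F_L(x) back to the input depth.
    f = layers.Conv2D(x.shape[-1], (1, 1), padding="same")(f)
    out = layers.Add()([x, f])               # H_L(x) = F_L(x) + x
    return layers.Activation("relu")(out)

def concat_fusion(x_a, x_b):
    """Concatenation fusion: stack two H x W x D feature maps into one
    H x W x 2D map (Eq. [8] in the next section)."""
    return layers.Concatenate(axis=-1)([x_a, x_b])

# Sketch of the two-stream wiring; a single stem is shared here for
# brevity, whereas the paper trains one backbone per stream.
inp_o = layers.Input((512, 512, 3), name="original")
inp_p = layers.Input((512, 512, 3), name="preprocessed")
stem = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")
feat_o = inception_resnet_block(stem(inp_o))
feat_p = inception_resnet_block(stem(inp_p))
fused = concat_fusion(feat_o, feat_p)
pooled = layers.GlobalAveragePooling2D()(fused)
logits = layers.Dense(3, activation="softmax")(pooled)
two_stream_model = tf.keras.Model([inp_o, inp_p], logits)
```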
Table 1 Size and demographics of the study sample

            Male                   Female                 Total
Group       N    Mean age (SD)     N    Mean age (SD)     N    Mean age (SD)
Cancer      140  63.4 (8.8)        67   64.9 (7.6)        207  63.7 (8.6)
Precancer   178  61.1 (7.5)        78   59.5 (7.8)        256  60.6 (7.7)
Normal      114  45.6 (15.4)       171  47.5 (12.9)       285  46.8 (13.9)
Total       432  57.8 (12.8)       316  53.3 (13.0)       748  56.0 (13.1)

The results of the proposed network and the sub-streams for EC classification are presented in Table 3; the fusion of the 2 streams gives the final results. For the proposed structure, concatenation fusion is employed. For clarity, we define a concatenation fusion function f_cat, two feature maps x^a and x^b, and a fused feature map y, where x^a ∈ R^{H×W×D}, x^b ∈ R^{H×W×D}, and y ∈ R^{H′×W′×D′}, and where W, H, and D are the width, height, and number of channels of the feature maps. The concatenation fusion y = f_cat(x^a, x^b) stacks the two feature maps at the same location (i, j) across the feature channels d:

y_{i,j,d} = x^a_{i,j,d},   y_{i,j,D+d} = x^b_{i,j,d}    [8]

where y ∈ R^{H×W×2D}.

Learning parameters

The key to achieving promising results is training a model with the correct weight parameters, which influence the performance of the entire structure. In training, the weight parameters of the proposed network are learned using mini-batch stochastic gradient descent with a momentum of 0.9. Batches of 10 images are fed to the network with a weight decay of 0.0005. The base learning rate is set to 10^−3 and is dropped further until the loss stops decreasing. The validation loss converges to the range 0.05–0.1, and the average validation accuracy after 10k epochs was 0.8583 (Figure 7).

Figure 7 Training curves (loss and accuracy) of the proposed classification approach on the EC database. EC, esophageal cancer.
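A rough Keras transcription of these hyperparameters is sketched below; `two_stream_model` is the toy model sketched earlier, the training and validation arrays are placeholders for the prepared images and one-hot labels, and the learning rate schedule is our assumption, as the paper only states that the rate is dropped when the loss stops decreasing.

```python
import tensorflow as tf

# Mini-batch SGD with momentum 0.9 and base learning rate 1e-3, as in
# the text; the weight decay of 0.0005 would be attached as an L2
# kernel regularizer (tf.keras.regularizers.l2(5e-4)) on each Conv2D.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9)
two_stream_model.compile(optimizer=optimizer,
                         loss="categorical_crossentropy",
                         metrics=["accuracy"])

# Drop the learning rate further when the validation loss plateaus
# (illustrative schedule; the exact one is not specified in the paper).
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.1, patience=5)

# x_train_orig / x_train_prep: original and preprocessed image arrays;
# y_train: one-hot labels (placeholders for the prepared dataset).
two_stream_model.fit([x_train_orig, x_train_prep], y_train,
                     validation_data=([x_val_orig, x_val_prep], y_val),
                     batch_size=10,          # 10-image batches, as reported
                     epochs=100, callbacks=[reduce_lr])
```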
Experiments and validation parameters

The proposed approach was implemented in the TensorFlow deep learning framework, run on a PC with an NVIDIA GeForce GTX 1080Ti GPU (8 GB) (NVIDIA CUDA framework 8.0 and the cuDNN library).

To eliminate contingencies in the classification results and to evaluate the performance of the proposed EC model, the results were quantitatively evaluated by 3 metrics: accuracy (ACC), sensitivity (SEN), and specificity (SPEC), defined as follows:

Sen = TP / (TP + FN)    [9]

Spec = TN / (TN + FP)    [10]

Acc = (TP + TN) / (TP + TN + FP + FN)    [11]

where true positive (TP) is the number of positive images correctly detected, true negative (TN) is the number of negative images correctly detected, false positive (FP) is the number of negative images wrongly detected as positive, and false negative (FN) is the number of positive images misclassified as negative.

In the evaluation phase, all the metrics were calculated based on the five-fold cross-validation results; the dataset was divided into training (80%) and testing (20%) subsets. The detailed statistics of the EC database are shown in Table 2.
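Eqs. [9]-[11] translate directly into code; a minimal sketch with our own function name is given below, to be applied per class in a one-vs-rest manner and averaged over the five folds.

```python
def binary_metrics(tp, tn, fp, fn):
    """Sensitivity, specificity, and accuracy as in Eqs. [9]-[11]."""
    sen = tp / (tp + fn)                        # Eq. [9]
    spec = tn / (tn + fp)                       # Eq. [10]
    acc = (tp + tn) / (tp + tn + fp + fn)       # Eq. [11]
    return sen, spec, acc
```

For the three-class problem, each class is treated in turn as the positive class, and the resulting counts are taken from the fold's confusion matrix.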
Table 2 Statistics distribution of the EC database

             Images   Normal   Precancerous lesion   Cancer
Train         1,017      424                   310      283
Validation      126       53                    38       35
Test            129       54                    39       36

EC, esophageal cancer.

Table 3 Results of the proposed network and the sub-streams on the EC database

                     SEN (%)   SPEC (%)   ACC (%)
O-stream               98.08      85.33     66.93
P-stream               96.15      88.00     79.53
Proposed structure     94.23      94.67     85.83

EC, esophageal cancer; SEN, sensitivity; SPEC, specificity; ACC, accuracy.

Table 4 Per-category accuracy of the proposed network on the EC database

       Normal   Precancerous lesion   Cancer
ACC    94.23%   82.50%                77.14%

EC, esophageal cancer; ACC, accuracy.

Figure 8 Confusion matrix of the proposed structure on the EC database (rows: true label; columns: predicted label). EC, esophageal cancer.

             Normal   Precancer   Cancer
Normal           49           2        1
Precancer         4          33        3
Cancer            0           8       27

Results

A total of 748 patients were included in this analysis. Table 1 presents the size and demographics of the database. Overall, no significant age difference was observed between males and females in each group. However, the normal control group was on average 15 years younger than the other two groups. The cancer and precancerous lesion groups had more males than females, and both groups were around 60 years old.

The comparative results of the proposed network and its sub-streams (the O-stream and the P-stream) on the database are listed in Table 3. This database contains all images, including those of the normal esophagus, precancerous lesions, and EC, and the results are the overall ACC, SEN, and SPEC of each method. The O-stream focuses on exploiting the color and global features of the esophageal images, and its ACC by itself was 66.93%. Using the preprocessed image as the input, the P-stream focuses on exploiting the texture and detailed features of the esophageal images, and the ACC of the P-stream alone was 79.53%. The fusion of the two streams led to the best result of 85.83%.

Table 4 shows the ACC of each category in the EC database based on the proposed network. The normal type was easier to identify, probably because the amount of data of the normal type was greater than that of the other two types.

Figure 8 presents the confusion matrix for the EC database. In the confusion matrix, the diagonal values are the correctly classified samples of each category, and the off-diagonal values are the confusions between two categories. The method diagnosed 74 lesions in total as esophageal lesions (precancerous lesion or cancer), of which 3 were normal cases, giving a PPV of 95.94% and a negative predictive value (NPV) of 92.45%. The PPV and the NPV for EC were 87.09% and 91.67%, respectively. The accuracy of the cancer category was 77.14%, which implies that EC is easily confused with precancerous lesions.

Table 5 shows a comparison between the proposed method and the LBP+SVM and HOG+SVM methods on the same dataset. The total sensitivity, specificity, and accuracy of our method were 94.23%, 94.67%, and 85.83%, respectively, which are higher than those of the other methods.
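The PPV and NPV values quoted above can be reproduced directly from the Figure 8 confusion matrix; a short check (class order: normal, precancer, cancer):

```python
import numpy as np

# Figure 8 confusion matrix: rows = true label, columns = predicted label.
cm = np.array([[49,  2,  1],    # normal
               [ 4, 33,  3],    # precancer
               [ 0,  8, 27]])   # cancer

# Lesion (precancer or cancer) vs. normal.
lesion_pred = cm[:, 1:].sum()                 # 74 images predicted as lesions
ppv_lesion = cm[1:, 1:].sum() / lesion_pred   # 71/74 = 0.9594
npv_lesion = cm[0, 0] / cm[:, 0].sum()        # 49/53 = 0.9245

# EC (cancer) vs. the rest.
ppv_ec = cm[2, 2] / cm[:, 2].sum()            # 27/31 = 0.8710
npv_ec = cm[:2, :2].sum() / cm[:, :2].sum()   # 88/96 = 0.9167

print(ppv_lesion, npv_lesion, ppv_ec, npv_ec)
```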
Table 5 Comparison of the proposed network with other methods

                   SEN (%)   SPEC (%)   ACC (%)
LBP + SVM            63.27      64.36     64.75
HOG + SVM            57.93      59.82     60.40
Proposed method      94.23      94.67     85.83

SEN, sensitivity; SPEC, specificity; ACC, accuracy; LBP, Local Binary Patterns; SVM, Support Vector Machine; HOG, Histogram of Gradient.

Discussion

Endoscopy plays a crucial role in the diagnosis of EC, which is the sixth leading cause of cancer-related death (1). However, diagnosing EC at an early stage by endoscopy is difficult and requires experienced endoscopists. An alternative approach to EC classification uses deep learning, which has proven helpful and has been applied in various fields, such as computer vision (29) and pattern recognition (30). Deep learning methods achieve complex function approximation through a nonlinear network structure and show powerful learning abilities (31). Compared with traditional recognition algorithms, deep learning combines feature selection or extraction and classifier determination into a single step and can learn features automatically, reducing the manual design workload (32).

The CNN model is one of the most important deep learning models for computer vision and image detection. In a recent study, Hirasawa et al. achieved automatic detection of gastric cancer in endoscopic images using a CNN-based diagnostic system, with an overall sensitivity of 92.2% and a PPV of 30.6% (20). Sakai et al. proposed a CNN-based detection scheme and achieved high accuracy in classifying early gastric cancer and the normal stomach (33). Our study developed a CNN-based framework to classify esophageal lesions with an overall accuracy of 85.83%. The images were preprocessed first; then the features of the image information were extracted, with the images annotated manually; finally, these images were used to train the CNN model. This model was applied to distinguish the normal esophagus and premalignant lesions from EC.

According to our study, the trained network achieved an accuracy of 85.83%, a sensitivity of 94.23%, and a specificity of 94.67% with the fusion of the 2 streams. The accuracy rates for classifying the normal esophagus, premalignant lesions, and EC were 94.23%, 82.5%, and 77.14%, respectively. LBP+SVM and HOG+SVM are classical machine learning methods; compared with them, the system we presented achieved better results. Therefore, the CNN system we proposed can readily distinguish whether samples contain esophageal lesions. In some cases, however, there were discrepancies between EC and precancerous esophageal lesions. For instance, 85% of the lesions misdiagnosed by the CNN as premalignant lesions were EC. The most probable reason for misdiagnosis was that the cancerous foci were extremely localized within the precancerous lesions and their surface characteristics were not obvious. Other reasons may include cancer that was hard to detect on the surface or a poor angle at which the image was taken.

The main contributions of this paper are twofold. First, an esophageal endoscopic database was built, comprising 1,272 endoscopic images of 3 types (normal, premalignant, and cancerous), with a classification label for each image. Second, we presented a two-stream CNN that can automatically extract global and local features from endoscopic images.

A significant strength of the study is that our proposed two-stream CNN consists of 2 subnetworks (the O-stream and the P-stream). The original images were fed to the O-stream to extract the color and global features, and the preprocessed esophageal images were fed to the P-stream to extract the texture and detail features. The advanced Inception-ResNet-V2 was adopted as our CNN framework. Finally, the two-stream CNN effectively fused the features of the two streams and achieved promising results.

This study had some limitations. First, the detection of EC was based on white-light images only; designing a universal detection system with images under more views, such as NBI and chromoendoscopy using indigo carmine, is possible. Second, our sample size was small, and we obtained all endoscopic images from a single center. The type of endoscope and its image resolution are highly variable across facilities, so we will obtain endoscopic images from other centers and use other types of endoscopes in future research. Third, the anatomical structure of the squamocolumnar junction was sometimes misdiagnosed as EC, an error that endoscopists are unlikely to make. If CNNs can learn more systematically about normal anatomical structures and various lesions, the accuracy of EC detection will improve in the future.

In future studies, we will add precise localization of lesion areas and video analysis to allow real-time computer-aided diagnosis of esophageal tumors.
Conclusions

We constructed a CNN system to classify EC and premalignant lesions with high accuracy and specificity. The system distinguished early EC from premalignant lesions and was able to increase the detection rate of early EC. Our method showed better detection performance than the other detection methods evaluated. In the future, such a system could reduce the burden on endoscopists and alleviate the shortage of professionals in primary hospitals.

Acknowledgments

Funding: This research was supported by the Jiangsu Science and Technology Department Basic Research Program of the Natural Science Foundation [No. BK20171508 (DA17)].

Footnote

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at http://dx.doi.org/10.21037/atm.2020.03.24). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was approved by the Ethics Committee of the First Affiliated Hospital of Nanjing Medical University (No. 2019-SR-448). Informed consent for upper gastrointestinal endoscopy (UGE) was obtained in all cases.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Bray F, Ferlay J, Soerjomataram I, et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin 2018;68:394-424.
2. Hu Y, Hu C, Zhang H, et al. How does the number of resected lymph nodes influence TNM staging and prognosis for esophageal carcinoma? Ann Surg Oncol 2010;17:784-90.
3. Janurova K, Bris R. A nonparametric approach to medical survival data: uncertainty in the context of risk in mortality analysis. Reliab Eng Syst Safe 2014;125:145-52.
4. Lee JS, Ahn JY, Choi KD, et al. Synchronous second primary cancers in patients with squamous esophageal cancer: clinical features and survival outcome. Korean J Intern Med 2016;31:253-9.
5. Chadwick G, Groene O, Hoare J, et al. A population-based, retrospective, cohort study of esophageal cancer missed at endoscopy. Endoscopy 2014;46:553-60.
6. Li J, Xu R, Liu M, et al. Lugol chromoendoscopy detects esophageal dysplasia with low levels of sensitivity in a high-risk region of China. Clin Gastroenterol Hepatol 2018;16:1585-92.
7. Khalil Q, Gopalswamy N, Agrawal S. Missed esophageal and gastric cancers after esophagogastroduodenoscopy in a midwestern military veteran population. South Med J 2014;107:225-8.
8. Menon S, Trudgill N. How commonly is upper gastrointestinal cancer missed at endoscopy? A meta-analysis. Endosc Int Open 2014;2:46-50.
9. Visrodia K, Singh S, Krishnamoorthi R, et al. Magnitude of missed esophageal adenocarcinoma after Barrett's esophagus diagnosis: a systematic review and meta-analysis. Gastroenterology 2016;150:599-607.e7; quiz e14-5.
10. Rodríguez de Santiago E, Hernanz N, Marcos-Prieto HM, et al. Rate of missed oesophageal cancer at routine endoscopy and survival outcomes: a multicentric cohort study. United European Gastroenterol J 2019;7:189-98.
11. Yan H. Computer vision applied in medical technology: the comparison of image classification and object detection on medical images. Proceedings of the 2018 International Symposium on Communication Engineering & Computer Science (CECS 2018), 2018:98-103.
12. Fritscher K, Raudaschl P, Zaffino P, et al. Deep neural networks for fast segmentation of 3D medical images. Medical Image Computing and Computer-Assisted Intervention, 2016:158-65.
13. Kage A, Münzenmayer C, Wittenberg T. A knowledge-based system for the computer assisted diagnosis of endoscopic images. Bildverarbeitung für die Medizin 2008:272-6.
14. Van der Sommen F, Zinger S, Schoon EJ, et al. Supportive automatic annotation of early esophageal cancer using local gabor and color features. Neurocomputing 2014;144:92-106.
15. de Souza L, Hook C, Papa JP, et al. Barrett's esophagus analysis using SURF features. Bildverarbeitung für die Medizin: Springer, 2017:141-6.
16. Suzuki K. Overview of deep learning in medical imaging. Radiol Phys Technol 2017;10:257-73.
17. Shin HC, Roth HR, Gao M, et al. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Trans Med Imaging 2016;35:1285-98.
18. Yamashita R, Nishio M, Do RKG, et al. Convolutional neural networks: an overview and application in radiology. Insights Imaging 2018;9:611-29.
19. Shichijo S, Nomura S, Aoyama K, et al. Application of convolutional neural networks in the diagnosis of Helicobacter pylori infection based on endoscopic images. EBioMedicine 2017;25:106-11.
20. Hirasawa T, Aoyama K, Tanimoto T, et al. Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images. Gastric Cancer 2018;21:653-60.
21. Byrne MF, Chapados N, Soudan F, et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut 2019;68:94-100.
22. Komeda Y, Handa H, Watanabe T, et al. Computer-aided diagnosis based on convolutional neural network system for colorectal polyp classification: preliminary experience. Oncology 2017;93:30-4.
23. Zhang R, Zheng Y, Mak TWC, et al. Automatic detection and classification of colorectal polyps by transferring low-level CNN features from nonmedical domain. IEEE J Biomed Health Inform 2017;21:41-7.
24. Horie Y, Yoshio T, Aoyama K, et al. Diagnostic outcomes of esophageal cancer by artificial intelligence using convolutional neural networks. Gastrointest Endosc 2019;89:25-32.
25. Yang CK, Yeh JY, Yu WH, et al. Deep convolutional neural network-based positron emission tomography analysis predicts esophageal cancer outcome. J Clin Med 2019;8:844.
26. Chen G, Clarke D, Giuliani M, et al. Combining unsupervised learning and discrimination for 3D action recognition. Signal Process 2015;110:67-81.
27. Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, Inception-ResNet and the impact of residual connections on learning. Thirty-First AAAI Conference on Artificial Intelligence, 2017:4278-84.
28. Wu Z, Shen C, Hengel AVD. Wider or deeper: revisiting the ResNet model for visual recognition. Pattern Recognit 2019;90:119-33.
29. Zhang J, Luo HB, Hui B, et al. Image interpolation for division of focal plane polarimeters with intensity correlation. Optics Express 2016;24:20799-807.
30. Bengio Y. Learning deep architectures for AI. Foundations and Trends in Machine Learning 2009:1-127.
31. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44.
32. Arel I, Rose D, Coop R. DeSTIN: a scalable deep learning architecture with application to high-dimensional robust pattern recognition. AAAI Fall Symposium: Biologically Inspired Cognitive Architectures 2009:11-5.
33. Sakai Y, Takemoto S, Hori K, et al. Automatic detection of early gastric cancer in endoscopic images using a transferring convolutional neural network. Conf Proc IEEE Eng Med Biol Soc 2018:4138-41.

Cite this article as: Liu G, Hua J, Wu Z, Meng T, Sun M, Huang P, He X, Sun W, Li X, Chen Y. Automatic classification of esophageal lesions in endoscopic images using a convolutional neural network. Ann Transl Med 2020;8(7):486. doi: 10.21037/atm.2020.03.24
