
Received 8 September 2022, accepted 13 September 2022, date of publication 19 September 2022, date of current version 26 September 2022.

Digital Object Identifier 10.1109/ACCESS.2022.3207839

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License. For more information, see https://creativecommons.org/licenses/by-nc-nd/4.0/

Improvement of Gastroscopy Classification Performance Through Image Augmentation Using a Gradient-Weighted Class Activation Map
HYUN-SIK HAM1, HAN-SUNG LEE1, JUNG-WOO CHAE1, HYUN CHIN CHO2, AND HYUN-CHONG CHO1,3, (Member, IEEE)
1 Interdisciplinary Graduate Program for BIT Medical Convergence, Kangwon National University, Chuncheon 24341, South Korea
2 Department of Internal Medicine, School of Medicine, Institute of Health Sciences, Gyeongsang National University Hospital, Gyeongsang National University, Jinju 52727, South Korea
3 Department of Electronics Engineering, Kangwon National University, Chuncheon 24341, South Korea

Corresponding authors: Hyun Chin Cho ([email protected]) and Hyun-Chong Cho ([email protected])
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government [Ministry of
Science and Information and Communication Technology (MSIT)] (No. 2017R1E1A1A03070297). This research was supported by Basic
Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education
(No. 2022R1I1A3053872).

This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was
granted by the Gyeongsang National University Hospital, South Korea, under Application No. GNUH 2017-09-019-003.

ABSTRACT Endoscopic specialists performing gastroscopy, which relies on the naked eye, may benefit from a computer-aided diagnosis (CADx) system that employs deep learning. This report proposes a CADx system that classifies gastroscopy images as normal or abnormal (gastric cancer, gastritis, and gastric ulcer). The CADx system was trained using a deep learning algorithm known as a convolutional neural network (CNN); specifically, Xception, which includes depth-wise separable convolutions, was employed as the CNN. Image augmentation was applied to mitigate a key disadvantage of medical data: it is difficult to collect. A class activation map (CAM), an algorithm that visualizes the region of interest a CNN uses for classification, was used to cut an image area and paste it into another image. The lesion location identified by CAM in an abnormal image was cut out and pasted into a normal image; the normal image was divided into nine equal parts, and the lesion was pasted where the variance difference from the lesion was minimal. Consequently, the number of abnormal images increased by 360,905. Xception was used to train the augmented dataset. A confusion matrix was used to evaluate the performance of the gastroscopy CADx system. The performance criteria were specificity, sensitivity, the F1 score (the harmonic average of precision and sensitivity (recall)), and the AUC. The F1 score of the CADx system trained with the original dataset was 0.792, and the AUC was 0.885. The dataset augmentation approach using CAM presented in this report is shown to be an effective augmentation algorithm, with performance improved to 0.835 and 0.903 in terms of F1 score and AUC, respectively.

INDEX TERMS Class activation map, classification, computer-aided diagnosis (CADx), deep learning, gastroscopy, image augmentation.

The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.

I. INTRODUCTION
According to GLOBOCAN statistics from 2018 (Fig. 1), gastric cancer currently ranks fifth in incidence worldwide [1]. In particular, gastric cancer has recently had the highest incidence in Korea. In 2018, there were 29,279 cases of gastric cancer among the 243,837 cases of all cancer types in Korea [2]. Gastrointestinal diseases, including gastric cancer, have no clear symptoms or exact causes. Therefore, it is difficult to diagnose them based on symptoms alone, and regular gastroscopy should be performed to detect gastrointestinal diseases.
Although all parts of the stomach can be visually observed through gastroscopy, there is a possibility of misdiagnosis, depending on the proficiency and fatigue of the physician who performs the endoscopy. For example, according to the Korea Consumer Agency, there were 78 cases of hospital-responsible cancer misdiagnosis from 2017 to June 2021, of which endoscopy video reading errors accounted for 30.8% (24 cases) [3].

Computer-aided diagnosis (CADx) systems are being researched because they provide doctors with objective information, which tends to increase diagnostic accuracy [4], [5]. As the number of medical imaging devices and endoscopies increases, research on CADx through endoscopic imaging is being conducted. This report proposes a gastroscopy CADx system that classifies normal and abnormal gastroscopy images using a deep learning convolutional neural network (CNN).

II. RELATED WORK
Research on applying artificial intelligence to medical data has been conducted [6], [7]. In particular, CADx is used in various medical imaging techniques, such as computed tomography, X-ray imaging, and magnetic resonance imaging. Research on gastroscopy images is also being conducted. Sakai et al. [8] classified gastric cancer using data from 58 patients who underwent endoscopic submucosal dissection. They collected 926 images, consisting of 228 early gastric cancers and 698 normal tissues. The images were cropped to 224 × 224 and the dataset was enlarged with nine types of augmentation techniques, such as rotation, shift, and shear. Transfer learning was conducted utilizing GoogLeNet, and the performance was evaluated using a confusion matrix [9]. They used a sliding-window method for lesion detection and obtained a heatmap of the cancer. The accuracy was 87.6%, and the sensitivity indicating the gastric cancer detection rate was 80%. Lee et al. [10] used gastroscopy images to perform three tasks: normal vs. cancer, normal vs. benign ulcer, and cancer vs. benign ulcer. ResNet-50, Inception-v3, and VGG-16 were used as classification networks. The dataset consisted of 220 ulcer images, 367 gastric cancer images, and 200 normal images, and 10% of each category was used as a test set. They used adaptive histogram equalization to reduce image variation in brightness and contrast. ResNet-50 performed best on each task after training, scoring 96.5%, 92.6%, and 77.1%. The performance of cancer vs. benign ulcer was only 77.1%, indicating difficulty in classification due to the small difference in appearance. Yoon et al. [11] aimed to develop a model optimized for EGC detection and depth prediction using VGG-16. From 2012 to 2018, a total of 11,539 images were collected from 800 patients, including 896 T1a-EGC, 809 T1b-EGC, and 9,834 non-EGC. They proposed a loss function that used the weighted sum of gradient-weighted class activation mapping and cross-entropy losses. The classification sensitivity and specificity were 91.0% and 97.6%, respectively, and the AUC was 0.981. The predictive sensitivity and specificity of tumor depth were 79.2% and 77.8%, respectively, with an AUC of 0.851. As a result, the Grad-CAM method was effective in improving gastric cancer classification performance. The non-EGC images were much more prevalent than the EGC images in this study, and the non-EGC categories included chronic gastritis, chronic atrophic gastritis, intestinal metaplasia, and erosion. They also explained how larger datasets would enable more accurate depth predictions. Hirasawa et al. [12] also attempted to classify gastric cancer using gastroscopic images. They collected 77 endoscopic videos of gastric cancer from 69 patients and a total of 13,584 endoscopic images: 11,288 images for the training dataset and 2,296 images for the test dataset. The data were collected over more than 10 years, from April 2004 to December 2016. The gastroscopy images were trained with a single-shot multibox detector (SSD), which is an object detection network [13]. The trained network identified 71 out of 77 gastric cancers, resulting in a sensitivity of 92.2%. However, the benign detection rate was low (30.6%). Wu et al. [14] trained a CNN to classify gastric cancer and compared its results with those obtained by six specialists. A total of 3,170 images of early gastric cancer and 5,981 images of benign lesions were collected. VGG16 and ResNet50 were used as classification networks [15], [16]. The accuracy, sensitivity to gastric cancer, and specificity were 92.5%, 94.0%, and 91.0%, respectively, which were similar to the classification results of the specialists. He et al. [17] attempted to classify the stomach area using gastroscopy rather than finding gastric lesions. The area was divided into 11 parts, including the pharynx, esophagus, and duodenum. Through 229 gastroscopies, 5,661 gastroscopy images were obtained, and the dataset was composed of 3,704 selected images. ResNet50, InceptionV3, VGG11, VGG16, and DenseNet121 were used as classification networks, and their performances were compared. DenseNet121 exhibited the best performance with 91.1% accuracy [18]. Kim et al. [19] used deep convolutional generative adversarial networks (DCGAN) and Cifar10 policies to compensate for the lack of data. They generated 200 normal and 200 abnormal images through the DCGAN and increased the data 25-fold through the Cifar10 policy. In addition, the data augmented through the DCGAN and Cifar10 policies were filtered by deep learning models trained with the original dataset, with the classification threshold set to 0.8 or higher. The model trained on the data augmented with Cifar10 and DCGAN showed the best performance, with an AUC of 0.900. This is 5% higher than the AUC of 0.855 achieved by the network trained with the original data.

Zhang et al. [20] attempted to detect gastric polyps using a deep learning object-detection algorithm. In total, 404 images were obtained from 215 patients who underwent gastroscopy. The data collection period was short, so they used rotation to augment the data. After applying image augmentation, the training data comprised 708 images. SSD-GPNet was used, and not only max pooling but also min pooling and second-max pooling were added to the network to detect polyps.

The F1 score was 84.2%, and the mean average precision, which describes the object detection performance, was 90.4%. A study was also conducted to identify gastric cancer using an image segmentation algorithm. Shibata et al. [21] obtained 1,208 gastroscopic images from 42 healthy individuals and 533 gastroscopic images from 93 patients with cancer from 2013 to 2017. A Mask R-CNN was used as the image segmentation algorithm and was trained after masking the gastric cancer area in the endoscopic images [22]. The sensitivity was 96%, and the undetected rate was 1 per 10 images, indicating good detection of gastric cancer lesions. Ikenoyama et al. [23] studied whether gastroscopy CADx results acquired through a CNN could be better than those obtained by endoscopists. The SSD network was trained using 13,584 images collected between 2004 and 2016. The test dataset for performance validation consisted of 2,920 images collected from patients since 2018. The sensitivity of the CADx to gastric cancer on endoscopic imaging alone was 58.4%, which was better than that of 67 endoscopists (31.9%).

In most of these studies, data collection took several years; this characteristic represents a shortcoming of medical data. In addition, the performance in finding lesions has been evaluated mainly through sensitivity.

Zhang et al. [24] conducted a data augmentation study using a class activation map (CAM). Crop and translation are commonly used augmentation methods; however, they have a flaw in that an important aspect of the original image may be lost. As a result, data verification on the augmented dataset was performed by calculating the intersection over true (IoT) between the features of the original image and the deformed image. The CAM was used to extract the features from the original image. ResNet-18, SqueezeNet, and DenseNet-121 were used as classification networks. The results of the study show that the proposed method using CAM improved classification accuracy by 0.4%. Ghiasi et al. [25] proposed a copy-paste augmentation method, which extracts objects from one image and pastes them into another image. Through this method, a mask AP of 49.1 and a box AP of 57.3 were achieved on COCO instance segmentation. The studies by Zhang and Ghiasi showed that copying objects and pasting them onto other images is an effective augmentation method.

In related studies using medical data, the amount of data is significantly smaller than in other fields unless the collection time is long, owing to the nature of the data. Because of this, it is difficult to collect sufficient medical data for research. To overcome the difficulty of data collection, some studies apply augmentation techniques such as rotation and flipping to images. However, lesions with minute characteristics, such as early gastric cancer, can be distorted by such augmentation techniques. Therefore, in this paper, we propose an augmentation method that can increase the amount of data more than existing methods and consequently improve the classification performance for gastrointestinal disease while maintaining the characteristics of the lesion. In this approach, a CADx system is trained to classify normal and abnormal conditions, including gastritis and gastric cancer. To improve the classification performance of the CADx system, image augmentation using CAM, a CNN visualization technique, is applied. Grad-CAM image augmentation extracts the lesion region from an abnormal image. The extracted region is attached to the area of a normal image with the smallest variance difference from the original image. As a result, an entirely new abnormal image is generated. To verify that the proposed augmentation method is effective, a performance comparison was performed on Xception between the original dataset and the dataset augmented with the Cifar10 policy and Grad-CAM.

III. MATERIALS AND METHODS
A. DATABASE
The gastroscopy dataset was obtained from the Department of Internal Medicine of Gyeongsang National University Hospital, South Korea. All endoscopic images were approved by the IRB and were biopsy-verified. Because white-light endoscopy is mainly used during health checkups, narrowband endoscopy images were excluded. The gastroscopic images, obtained from a total of 158 people, included healthy conditions and abnormalities such as gastric cancer, gastritis, submucosal tumor (SMT), ulcer, polyp, and bleeding. In Table 1, ''Others'' includes portal hypertensive gastropathy, extrinsic compression, whitish discoloration, gastric neuroendocrine tumor, gastric xanthoma, gastric telangiectasia, blood clot, fundus diverticulum, and Crohn's disease. Between 1 and 33 images were collected per patient, with an average of 10 images. The dataset consisted of 819 normal and 819 abnormal samples and was divided such that 80% of the data were used for CADx training and the remaining 20% were used for CADx performance verification. The training and test datasets were separated randomly, and the training and test image data were distributed independently. The detailed configuration is presented in Table 1, and an example of this dataset is shown in Fig. 2.

TABLE 1. Configuration of the gastroscopy dataset.
FIGURE 1. Cancer statistics of the world and Korea in 2018: (a) percentages of the top 5 cancer cases globally; (b) percentages of the top 5 cancer cases in Korea.
FIGURE 2. Images of gastric endoscopy: (a) normal; (b) early gastric cancer; (c) benign gastric ulcer; (d) gastric submucosal tumor.
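As a rough illustration of the random 80/20 split described above, a minimal sketch follows; the directory layout, file extension, and seed are assumptions for illustration, not the authors' actual pipeline.

```python
# Hypothetical sketch of the 80/20 random split described above.
import random
from pathlib import Path

random.seed(42)  # fixed seed so the split is reproducible

def split_80_20(image_dir: str):
    """Shuffle one class folder and split it into train/test lists."""
    paths = sorted(Path(image_dir).glob("*.png"))  # assumed layout/extension
    random.shuffle(paths)
    cut = int(0.8 * len(paths))          # 80% for training
    return paths[:cut], paths[cut:]      # train, test

normal_train, normal_test = split_80_20("dataset/normal")        # 819 images
abnormal_train, abnormal_test = split_80_20("dataset/abnormal")  # 819 images
```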


B. XCEPTION
For gastroscopy classification, a CNN, which is a deep learning algorithm, was used. The Xception network, which uses a depth-wise separable convolution structure, was selected as the CNN for gastroscopy classification in this study [26]. Xception is designed to completely separate cross-channel correlations and spatial correlations. Therefore, it uses depth-wise separable convolutions consisting of depth-wise and point-wise parts. The depth-wise part completely separates the channels and performs a 3 × 3 convolution operation, creating one feature map per channel. This reduces the amount of computation in the 3 × 3 convolution, which would otherwise require considerable computation, thus preventing a bottleneck. The point-wise part performs a 1 × 1 convolution operation on each channel of the depth-wise output. The number of output channels delivered to the next layer is adjusted by the number of convolution filters in the point-wise part. Through the depth-wise separable convolution structure, feature extraction can be performed efficiently, and the amount of computation can be reduced. In addition, a deeper network was constructed by applying techniques such as the skip connections of ResNet and batch normalization to Xception. Fig. 3 shows the structure of the depth-wise separable convolution.

FIGURE 3. Depth-wise separable convolution.
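The block described above can be sketched in PyTorch as follows; the channel counts are illustrative and this is not the full Xception definition.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depth-wise 3x3 convolution per channel followed by a 1x1 point-wise
    convolution, as in Fig. 3. Channel sizes here are examples only."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # groups=in_ch applies one 3x3 filter per input channel (depth-wise part)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # 1x1 conv mixes channels and sets the output width (point-wise part)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)  # batch normalization, as in Xception

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.bn(self.pointwise(self.depthwise(x)))

x = torch.randn(1, 64, 56, 56)
print(DepthwiseSeparableConv(64, 128)(x).shape)  # torch.Size([1, 128, 56, 56])
```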

C. CLASS ACTIVATION MAP
Training through a CNN involves a deep network and a series of nonlinear operations such as activation functions, making it difficult to know what kind of process is performed for classification. Therefore, algorithms that estimate and visualize the process through which the trained model predicts the correct answer are being studied. In this study, regions of interest (ROIs) were identified using the gradient-weighted CAM (Grad-CAM) approach [27]. The Grad-CAM technique uses the gradient values generated during backpropagation in the CNN training process. The Grad-CAM process is as follows.


Firstly, the k feature maps A^k that have completed the convolution operations are obtained from the layer at which the ROI is checked. Backpropagation is then performed with the predicted value y^c classified as class c in the classifier. After backpropagation, global average pooling (GAP) is performed on the gradient map of A^k to obtain α_k^c, as expressed in (1), where i and j are the coordinates of the feature map and Z is the number of elements in the feature map. Subsequently, the feature map A^k is multiplied by α_k^c, the value obtained from GAP, and the results are summed over k. Finally, the sum becomes the Grad-CAM L^c for class c after the ReLU operation is performed, as shown in (2).

\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}   (1)

L_{Grad\text{-}CAM}^c = \mathrm{ReLU}\left( \sum_k \alpha_k^c A^k \right)   (2)

In the existing CAM, only the ROI in the last layer of the CNN can be verified, and GAP is essential in the network [28]. Grad-CAM uses backpropagation gradients to identify the ROI in all the layers of the CNN. In addition, it can be applied to general models by performing the GAP operation separately, instead of inside the CNN.

With Grad-CAM, localization is possible without an object detection model; thus, the area of the lesion in gastroscopy can be found. Therefore, if applied to CADx, the area information of a lesion can be transmitted to a specialist with a relatively small amount of computation. However, because the area is the ROI predicted by the CNN, it is not exact, and there is a possibility that the CNN may misclassify the gastroscopy image, confusing specialists. An example of using Grad-CAM for gastroscopic classification is shown in Fig. 4.

FIGURE 4. Example of gastric endoscopy Grad-CAM application: (a) gastroscopic image; (b) ROI detected by CNN; (c) gastroscopy lesion heat map.
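For reference, equations (1) and (2) can be sketched in PyTorch using hooks; the backbone (a randomly initialized ResNet-18) and the target layer are stand-in assumptions here, since the paper uses its trained Xception.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

# Minimal Grad-CAM sketch of eqs. (1)-(2); resnet18 with random weights
# stands in for the trained Xception, and layer4 is an assumed target layer.
model = resnet18(weights=None).eval()
feats, grads = {}, {}
target = model.layer4
target.register_forward_hook(lambda m, inp, out: feats.update(a=out))
target.register_full_backward_hook(lambda m, gin, gout: grads.update(g=gout[0]))

x = torch.randn(1, 3, 224, 224)      # placeholder gastroscopy image
y_c = model(x)[0].max()              # predicted value y^c of the top class c
model.zero_grad()
y_c.backward()

alpha = grads["g"].mean(dim=(2, 3), keepdim=True)  # eq. (1): GAP over gradients
cam = F.relu((alpha * feats["a"]).sum(dim=1))      # eq. (2): weighted sum + ReLU
heatmap = F.interpolate(cam.unsqueeze(0), size=x.shape[2:],
                        mode="bilinear", align_corners=False)[0, 0]
```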

D. IMAGE AUGMENTATION: AutoAugment
Image augmentation was used to compensate for the insufficient gastroscopic dataset and to increase the classification performance. In this study, Google's AutoAugment was used as the image augmentation algorithm [29]. AutoAugment consists of the augmentation operations Shear X/Y, Translate X/Y, Rotate, AutoContrast, Invert, Equalize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout, and Sample Pairing. One subpolicy consists of two augmentation operations, and each operation has two parameters: magnitude and probability. Therefore, depending on the augmentation probability, different results can be obtained even if the same subpolicy is used.

AutoAugment sets a recurrent neural network (RNN) as the controller that determines and creates a subpolicy S. S is applied to the training dataset, and the validation accuracy R is obtained. Subsequently, S is adjusted by the RNN such that R increases. By repeating this process, the 25 subpolicies with the best performance on the training dataset are selected. One image is augmented per subpolicy, resulting in a 25-fold increase; these 25 subpolicies constitute the augmentation policy for one dataset. Fig. 5 illustrates this process.

AutoAugment provides policies based on three datasets. The first policy is based on Cifar10 [30], which consists of 10 classes (airplanes, vehicles, birds, cats, deer, dogs, frogs, horses, ships, and trucks), with 6,000 images per class for a total of 60,000 images. The second policy is based on ImageNet [31], a huge dataset that was used in the ImageNet Large Scale Visual Recognition Challenge until 2017. The final policy is based on SVHN [32], which consists of house numbers collected from Google Street View. When comparing the performance after applying AutoAugment to gastroscopy in a previous study, the Cifar10 policy performed best [33]. Therefore, the gastroscopy dataset was augmented by applying the Cifar10 policy. A total of 1,310 training images were augmented 25 times for a total of 32,750 images.

FIGURE 5. Process of AutoAugment.
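torchvision now ships the learned Cifar10 policy, so the 25-fold expansion described above can be approximated as follows; this is an analogous reconstruction, not the authors' implementation.

```python
from PIL import Image
from torchvision.transforms import AutoAugment, AutoAugmentPolicy

# Learned Cifar10 policy shipped with torchvision; each call randomly
# samples one of the policy's 25 subpolicies, so 25 calls approximate
# the 25-fold expansion described above.
augment = AutoAugment(policy=AutoAugmentPolicy.CIFAR10)

img = Image.new("RGB", (224, 224))             # placeholder training image
augmented = [augment(img) for _ in range(25)]  # 25 augmented copies
```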


E. IMAGE AUGMENTATION: GRAD-CAM
In medical imaging classification, Grad-CAM is generally used to visualize the location of the lesion, and only a few studies have used Grad-CAM for data augmentation. Furthermore, abnormal findings are more significant than normal findings: if a gastrointestinal disease is not detected, treatment may not be timely and the condition may worsen. For this reason, it is imperative not to lose the characteristics of the lesion when increasing data using geometrical transformations such as shear or shift. However, general augmentation methods do not meet this requirement. As a result, we propose an image augmentation method based on Grad-CAM to preserve lesion features while improving anomaly classification performance.

The gastroscopic dataset was augmented with the Cifar10 policy of AutoAugment and trained using Xception. Not all abnormal images were used for augmentation through Grad-CAM. To find a better lesion area using Grad-CAM, we applied the 655 abnormal training images to the previously trained Xception models; out of the 655 abnormal images, only 551 had a classification accuracy greater than 0.9. The ROI of the lesion was extracted from each selected image using Grad-CAM, and the average size of the ROI was measured. Consequently, the average width of the ROI was 96 pixels, and the average height was 99 pixels. The ROI is attached to a normal image to create a new abnormal image. Two procedures were carried out at this point to prevent the ROI of the lesion from being attached to an unrealistic position. We divide the normal image into nine uniform areas; it was divided into nine areas to correspond to the size of the ROI. Subsequently, the variance of the ROI was compared with that of each area, and the ROI was pasted at the position where the variance difference was minimal. However, the size of the ROI is not fixed and differs for each patient, so the lesion may be larger than the divided area. In this case, the size of the divided area was set equal to that of the ROI, and the windowing stride was adjusted so that the comparison was still made nine times. Figure 6 shows an example of this.

FIGURE 6. Example of image augmentation using Grad-CAM: (a) normal image divided into nine areas; (b) ROI of abnormal image; (c) augmented image with ROI pasted onto normal image.

The ROI lesions in the 551 selected abnormal images were pasted onto 655 normal images using the proposed Grad-CAM augmentation method. Consequently, 360,905 augmented abnormal images were generated. In the case of normal images, there were no distinct ROI-like abnormal lesions; therefore, only abnormal images were augmented.
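A minimal NumPy sketch of this paste step follows, assuming an already-extracted ROI crop that fits inside one of the nine cells; the stride adjustment for oversized ROIs is omitted.

```python
import numpy as np

def paste_roi(normal: np.ndarray, roi: np.ndarray) -> np.ndarray:
    """Paste a lesion ROI into the 3x3 cell whose variance is closest to the ROI's.

    Simplified sketch: assumes the ROI fits inside one cell and omits the
    windowing-stride adjustment the paper uses for oversized ROIs.
    """
    h, w = normal.shape[:2]
    ch, cw = h // 3, w // 3                    # nine uniform areas
    roi_var = roi.var()
    best, best_diff = (0, 0), float("inf")
    for r in range(3):
        for c in range(3):
            cell = normal[r * ch:(r + 1) * ch, c * cw:(c + 1) * cw]
            diff = abs(cell.var() - roi_var)   # variance difference to the lesion
            if diff < best_diff:
                best, best_diff = (r, c), diff
    out = normal.copy()
    r, c = best
    rh, rw = roi.shape[:2]
    out[r * ch:r * ch + rh, c * cw:c * cw + rw] = roi  # paste at the best cell
    return out

normal = np.random.randint(0, 255, (300, 300, 3), dtype=np.uint8)  # placeholder
roi = np.random.randint(0, 255, (99, 96, 3), dtype=np.uint8)  # avg. 96x99 ROI
augmented = paste_roi(normal, roi)
```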


F. EVALUATION METHOD
The performance of the classification model was evaluated using a confusion matrix. The confusion matrix consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). For CADx, the early detection of abnormalities that can worsen a patient's condition is important. Therefore, when evaluating the performance, abnormal was set to positive and normal to negative. Through this approach, the sensitivity and precision were obtained, and the overall classification performance of the model was evaluated using the F1 score, the harmonic average of the sensitivity and precision. In this study, the sensitivity, specificity, and F1 score were mainly used to assess the performance improvement. The equations for precision, specificity, sensitivity, and F1 score are shown in (3), (4), (5), and (6), respectively. The performance was also evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The AUC is used as a major evaluation metric in studies using medical data. In the ROC curve, the y-axis is the true positive rate, indicating sensitivity, and the x-axis is the false positive rate, meaning 1 - specificity. The closer the AUC is to 1, the better the classification performance.

The training was performed using the PyTorch framework on Windows 10. The CPU was an Intel Xeon W-2133, and the GPU employed for training was a TITAN RTX 24GB. The batch size was 64, and the learning rate was 0.0001. All datasets were trained for 100 epochs.

\text{Precision} = \frac{TP}{TP + FP}   (3)

\text{Specificity} = \frac{TN}{TN + FP}   (4)

\text{Sensitivity} = \frac{TP}{TP + FN}   (5)

\text{F1 Score} = 2 \times \frac{\text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}}   (6)
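Equations (3)-(6) and the AUC can be computed directly from model predictions; a short scikit-learn sketch with illustrative labels, scores, and threshold follows.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Illustrative labels/scores; abnormal = positive (1), normal = negative (0).
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1])
y_pred = (y_score >= 0.5).astype(int)  # assumed decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)                                     # eq. (3)
specificity = tn / (tn + fp)                                   # eq. (4)
sensitivity = tp / (tp + fn)                                   # eq. (5)
f1 = 2 * precision * sensitivity / (precision + sensitivity)   # eq. (6)
auc = roc_auc_score(y_true, y_score)                           # area under ROC
print(f"P={precision:.3f} Sp={specificity:.3f} Se={sensitivity:.3f} "
      f"F1={f1:.3f} AUC={auc:.3f}")
```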


IV. RESULTS
The performance of six networks was compared to select the network to be used in this study. Each network was trained with the original dataset without augmentation techniques; the comparison networks were Inception-v3, EfficientNet-B3, ResNet-152, VGG-16, ViT-B, and Xception.

Table 2 displays the training results of each network. Although there are differences in performance for each metric, Xception outperforms the other networks, including ViT-B, which has shown good performance in recent classification studies. In particular, Xception showed the highest performance in terms of accuracy, specificity, and AUC. ResNet-152 had a precision of 0.878, the highest value including the proposed method, but it also had the lowest sensitivity, 0.659, so it was not selected as the model for this study. Xception reduces information loss by removing the non-linear function that existed between the depth-wise convolution and point-wise convolution operations. Furthermore, Xception addresses the gradient vanishing problem using residual connections, outperforming the other models. For this reason, we conducted the experiment using Xception. We used AutoAugment's Cifar10 policy and applied augmentation with Grad-CAM. Training was performed with the original dataset, the dataset augmented with AutoAugment's Cifar10 policy, and the Grad-CAM augmented dataset. The detailed process is illustrated in Fig. 7.

TABLE 2. Comparison of network performance.
FIGURE 7. Flow chart of our gastric endoscopy classification system.

Xception was first trained using the original dataset. Missing 38 out of 164 abnormal and 28 out of 164 normal cases resulted in an accuracy of 0.799. The sensitivity and specificity were 0.768 and 0.782, respectively, and the F1 score was 0.792. Subsequently, the dataset was augmented and trained using AutoAugment's Cifar10 policy, and the Grad-CAM images were generated with the weights of the model trained using AutoAugment's Cifar10 policy. The Cifar10-policy dataset and the dataset augmented by Grad-CAM were then combined and trained. The model trained by the proposed method showed improved performance on all evaluation metrics. The accuracy of the proposed method was 0.842, with 52 of the 328 endoscopies missed. The sensitivity was 0.805, with 26 abnormal images classified incorrectly. Compared to the Xception trained with the original data, a performance improvement of about 5% was achieved, from 0.768 to 0.805. The F1 score and AUC were 0.835 and 0.903, respectively, the highest values. The detailed performance evaluation results are presented in Table 2.

Fig. 8 shows the ROC curves and their AUC performance. The AUCs of the models trained on the original and Grad-CAM datasets are 0.885 and 0.903, respectively; the proposed method had the highest AUC value among the models. This result is meaningful because the AUC is an indicator of the classification performance between normal and abnormal.

FIGURE 8. ROC curves for the original and Grad-CAM datasets.

V. DISCUSSION
In this study, the Xception network was used to train on the original gastroscopy dataset and on the dataset augmented with the Cifar10 policy and Grad-CAM, and the performance was compared. The performance indicators were accuracy, specificity, sensitivity, F1 score, and AUC. The original gastroscopy dataset had a lower sensitivity than accuracy, while the specificity and AUC were relatively high; thus, the false-positive rate was low. When training with the Grad-CAM augmented dataset, there was not much difference in the AUCs. However, the sensitivity increased by 5%, and the F1 score increased by 5%. This dataset showed better performance than the others owing to the increase in abnormal classification performance. As a result, the Cifar10 policy of AutoAugment and the Grad-CAM augmentation method, applied with the corresponding weights, yielded improved classification performance compared to the original dataset, which contained fewer data. The Cifar10 augmentation method augments data by applying geometrical transformations, such as rotation, to the image. Therefore, strong transformations may occur depending on the applied policy. Since strong transformations slow training and impede weight convergence, the Cifar10 augmentation method limits the performance improvement. However, since the Grad-CAM augmentation method creates an image by synthesizing the ROI into the original image, augmentation can be performed while maintaining the color of the original image and the morphological characteristics of the lesion. Therefore, unlike Cifar10, it is possible to generate a large amount of data without interfering with the convergence of training. Furthermore, an advantage of the Grad-CAM augmentation method is that the amount of data increase is larger than that of general augmentation methods.


In this study, ROIs were extracted from 551 selected images and pasted onto 655 normal images, so the number of abnormal images increased to 360,905; the amount can vary depending on the size of each dataset. Discrimination of abnormal images is another critical factor for medical data. As a result, the improvement in abnormal classification demonstrates that the Grad-CAM augmentation method used in this study is effective for medical data.

VI. CONCLUSION
CADx can assist specialists in performing endoscopy. In this study, a CADx system that classifies gastroscopy images as normal or abnormal, trained with augmentation using AutoAugment and Grad-CAM, was studied. The classifier was trained by selecting Xception, one of the CNNs that shows strength in image classification. For the augmentation algorithm, the method of extracting the ROI of the lesion through Grad-CAM and pasting it onto another image was applied. Subsequently, we checked whether the proposed method was effective in improving performance, using indicators such as the F1 score and AUC. The evaluation results demonstrated that the classification model to which the proposed Grad-CAM augmentation method is applied has the best classification performance. Compared to the model trained on the original dataset, the overall classification performance improved, with the F1 score increasing from 0.792 to 0.835 and the AUC from 0.885 to 0.903.

Several challenges remain to be addressed in future studies. First, in this study we compared the variance between the lesion and the normal image and pasted the lesion into the area where the variance difference was minimal, to generate an image as realistic as possible. However, improvements in the image synthesis method are required for more realistic augmented images. We believe that this issue can be addressed using algorithms such as generative adversarial networks (GANs) or image blending. Second, the localization performance is poor compared with that of object detection or segmentation. CAM has the advantage of extracting ROIs using only a CNN, without a separate region-proposal layer, but it does not find exact lesion locations. This problem could be addressed by employing segmentation algorithms, such as SLIC superpixels or FRFCM, to find segmented ROIs. In addition, although this study was conducted on white-light endoscopic imaging, which is the most common, narrow-band imaging (NBI) endoscopy also exists, which uses the wavelengths of light absorbed by blood vessels. We plan to develop a model for classifying abnormalities by extending the study from white-light endoscopic imaging to NBI.

REFERENCES
[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, ''Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries,'' CA, Cancer J. Clin., vol. 68, no. 6, pp. 394-424, Nov. 2018.
[2] S. Hong, Y.-J. Won, J. J. Lee, K.-W. Jung, H.-J. Kong, J.-S. Im, and H. G. Seo, ''Cancer statistics in Korea: Incidence, mortality, survival, and prevalence in 2018,'' Cancer Res. Treatment, vol. 53, no. 2, pp. 301-315, Apr. 2021.
[3] Korea Consumer Agency. Cancer Misdiagnosis Consumer Damage Prevention Advisory. Accessed: Jul. 13, 2017. [Online]. Available: https://www.kca.go.kr/kca/
[4] A. Misumi, K. Misumi, A. Murakami, K. Harada, U. Honmyo, and M. Akagi, ''Endoscopic diagnosis of minute, small, and flat early gastric cancers,'' Endoscopy, vol. 21, pp. 159-164, Jul. 1989.
[5] U. Honmyo, A. Misumi, A. Murakami, S. Mizumoto, I. Yoshinaka, M. Maeda, S. Yamamoto, and S. Shimada, ''Mechanisms producing color change in flat early gastric cancers,'' Endoscopy, vol. 29, no. 5, pp. 366-371, Jun. 1997.
[6] I. M. El-Hasnony, O. M. Elzeki, A. Alshehri, and H. Salem, ''Multi-label active learning-based machine learning model for heart disease prediction,'' Sensors, vol. 22, no. 3, p. 1184, Feb. 2022.
[7] H. Torkey, M. Atlam, N. El-Fishawy, and H. Salem, ''A novel deep autoencoder based survival analysis approach for microarray dataset,'' PeerJ Comput. Sci., vol. 7, p. e492, Apr. 2021.
[8] Y. Sakai, S. Takemoto, K. Hori, M. Nishimura, H. Ikematsu, T. Yano, and H. Yokota, ''Automatic detection of early gastric cancer in endoscopic images using a transferring convolutional neural network,'' in Proc. 40th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2018, pp. 4138-4141.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, ''Going deeper with convolutions,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1-9.
[10] J. H. Lee, Y. J. Kim, Y. W. Kim, S. Park, Y.-I. Choi, Y. J. Kim, D. K. Park, K. G. Kim, and J.-W. Chung, ''Spotting malignancies from gastric endoscopic images using deep learning,'' Surgical Endoscopy, vol. 33, no. 11, pp. 3790-3797, Nov. 2019.
[11] H. J. Yoon, S. Kim, J.-H. Kim, J.-S. Keum, S.-I. Oh, J. Jo, J. Chun, Y. H. Youn, H. Park, I. G. Kwon, S. H. Choi, and S. H. Noh, ''A lesion-based convolutional neural network improves endoscopic detection and depth prediction of early gastric cancer,'' J. Clin. Med., vol. 8, no. 9, p. 1310, Aug. 2019.
[12] T. Hirasawa, K. Aoyama, T. Tanimoto, S. Ishihara, S. Shichijo, T. Ozawa, T. Ohnishi, M. Fujishiro, K. Matsuo, J. Fujisaki, and T. Tada, ''Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images,'' Gastric Cancer, vol. 21, no. 4, pp. 653-660, 2018.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, ''SSD: Single shot MultiBox detector,'' in Proc. 14th Eur. Conf. (ECCV). Cham, Switzerland: Springer, Jun. 2016, pp. 21-37.
[14] L. Wu, W. Zhou, X. Wan, J. Zhang, J. Shen, S. Hu, Q. Ding, G. Mu, A. Yin, X. Huang, and J. Liu, ''A deep neural network improves endoscopic detection of early gastric cancer without blind spots,'' Endoscopy, vol. 51, no. 6, pp. 522-531, Jun. 2019.
[15] K. Simonyan and A. Zisserman, ''Very deep convolutional networks for large-scale image recognition,'' 2014, arXiv:1409.1556.
[16] K. He, X. Zhang, S. Ren, and J. Sun, ''Deep residual learning for image recognition,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770-778.
[17] Q. He, S. Bano, O. F. Ahmad, B. Yang, X. Chen, P. Valdastri, L. B. Lovat, D. Stoyanov, and S. Zuo, ''Deep learning-based anatomical site classification for upper gastrointestinal endoscopy,'' Int. J. Comput. Assist. Radiol. Surgery, vol. 15, no. 7, pp. 1085-1094, May 2020.
[18] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, ''Densely connected convolutional networks,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 4700-4708.
[19] Y.-J. Kim, H. C. Cho, and H.-C. Cho, ''Deep learning-based computer-aided diagnosis system for gastroscopy image classification using synthetic data,'' Appl. Sci., vol. 11, no. 2, p. 760, Jan. 2021.
[20] X. Zhang, F. Chen, T. Yu, J. An, Z. Huang, J. Liu, W. Hu, L. Wang, H. Duan, and J. Si, ''Real-time gastric polyp detection using convolutional neural network,'' PLoS ONE, vol. 14, no. 3, Mar. 2019, Art. no. e0214133.
[21] T. Shibata, A. Teramoto, H. Yamada, N. Ohmiya, K. Saito, and H. Fujita, ''Automated detection and segmentation of early gastric cancer from endoscopic images using mask R-CNN,'' Appl. Sci., vol. 10, no. 11, p. 3842, May 2020.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, ''Mask R-CNN,'' in Proc. IEEE Int. Conf. Comput. Vis., Oct. 2017, pp. 2961-2969.
[23] Y. Ikenoyama, T. Hirasawa, M. Ishioka, K. Namikawa, S. Yoshimizu, Y. Horiuchi, A. Ishiyama, T. Yoshio, T. Tsuchida, Y. Takeuchi, and S. Shichijo, ''Detecting early gastric cancer: Comparison between the diagnostic ability of convolutional neural networks and endoscopists,'' Digestive Endoscopy, vol. 33, no. 1, pp. 141-150, Apr. 2021.


[24] W. Zhang and Y. Cao, ''A new data augmentation method of remote sensing dataset based on class activation map,'' J. Phys., Conf. Ser., vol. 1961, no. 1, Jul. 2021, Art. no. 012023.
[25] G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, and B. Zoph, ''Simple copy-paste is a strong data augmentation method for instance segmentation,'' in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit., Jun. 2021, pp. 2918-2928.
[26] F. Chollet, ''Xception: Deep learning with depthwise separable convolutions,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jul. 2017, pp. 1251-1258.
[27] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, ''Grad-CAM: Visual explanations from deep networks via gradient-based localization,'' in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618-626.
[28] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, ''Learning deep features for discriminative localization,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2921-2929.
[29] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, ''AutoAugment: Learning augmentation policies from data,'' 2018, arXiv:1805.09501.
[30] A. Krizhevsky and G. Hinton, ''Learning multiple layers of features from tiny images,'' M.S. thesis, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Apr. 2009.
[31] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, ''ImageNet: A large-scale hierarchical image database,'' in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2009, pp. 248-255.
[32] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, ''Reading digits in natural images with unsupervised feature learning,'' in Proc. NeurIPSW, 2011, pp. 1-9.
[33] S.-A. Lee, D.-H. Kim, and H.-C. Cho, ''Deep learning based gastric lesion classification system using data augmentation,'' Trans. Korean Inst. Electr. Eng., vol. 69, no. 7, pp. 1033-1039, Jul. 2020.

HYUN-SIK HAM received the B.S. degree in electrical and electronic engineering and the M.S. degree in electrical and medical convergent engineering from Kangwon National University, South Korea, in 2020 and 2022, respectively.

HAN-SUNG LEE received the B.S. degree in electrical and electronic engineering from Kangwon National University, South Korea, in 2022, where he is currently pursuing the M.S. degree in the interdisciplinary graduate program for BIT medical convergence.

JUNG-WOO CHAE received the B.S. degree in electrical and electronic engineering and the M.S. degree in electrical and medical convergent engineering from Kangwon National University, South Korea, in 2019 and 2021, respectively, where he is currently pursuing the Ph.D. degree in the interdisciplinary graduate program for BIT medical convergence.

HYUN CHIN CHO received the M.S., Ph.D., and M.D. degrees in internal medicine from the School of Medicine, Gyeongsang National University, Jinju, South Korea, in 2008 and 2014, respectively. She was a fellow at the Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, South Korea, from 2009 to 2010. From 2011 to 2015, she was a Professor at the School of Medicine, Samsung Changwon Hospital, Sungkyunkwan University, Changwon, South Korea. She is currently a Professor with the School of Medicine, Gyeongsang National University Hospital, Gyeongsang National University.

HYUN-CHONG CHO (Member, IEEE) received the M.S. and Ph.D. degrees in electrical and computer engineering from the University of Florida, USA, in 2009. From 2010 to 2011, he was a Research Fellow at the University of Michigan, Ann Arbor, USA. From 2012 to 2013, he was the Chief Research Engineer at LG Electronics, South Korea. He is currently a Professor with the Department of Electronics Engineering and the Interdisciplinary Graduate Program for BIT Medical Convergence, Kangwon National University, South Korea.
