Improvement of Gastroscopy Classification Performance Through Image Augmentation Using A Gradient-Weighted Class Activation Map
Corresponding authors: Hyun Chin Cho ([email protected]) and Hyun-Chong Cho ([email protected])
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean Government [Ministry of
Science and Information and Communication Technology (MSIT)] (No. 2017R1E1A1A03070297). This research was supported by Basic
Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education
(No. 2022R1I1A3053872).
This work involved human subjects or animals in its research. Approval of all ethical and experimental procedures and protocols was
granted by the Gyeongsang National University Hospital, South Korea, under Application No. GNUH 2017-09-019-003.
ABSTRACT Endoscopic specialists performing gastroscopy, which relies on the naked eye, may benefit from a computer-aided diagnosis (CADx) system that employs deep learning. This report proposes a CADx system that classifies gastroscopy images as normal or abnormal, where abnormal covers gastric cancer, gastritis, and gastric ulcer. The CADx system was trained using a deep learning algorithm known as a convolutional neural network (CNN). Specifically, Xception, which includes depth-wise separable convolution, was employed as the CNN. Image augmentation was applied to compensate for the main disadvantage of medical data, namely that it is difficult to collect. A class activation map (CAM), an algorithm that visualizes the region of interest a CNN uses for classification, was used to cut a region from one image and paste it into another. The lesion location identified by the CAM in an abnormal image was pasted into a normal image. The normal image was divided into nine equal parts, and the lesion was pasted where the variance difference from the lesion was minimal. Consequently, the number of abnormal images increased by 360,905. Xception was then trained on the augmented dataset. A confusion matrix was used to evaluate the performance of the gastroscopy CADx system. The performance criteria were specificity, sensitivity, the F1 score (the harmonic mean of precision and sensitivity/recall), and the AUC. The CADx system trained with the original dataset achieved an F1 score of 0.792 and an AUC of 0.885. The CAM-based dataset augmentation approach presented in this report is shown to be an effective augmentation algorithm, improving performance to an F1 score of 0.835 and an AUC of 0.903.

INDEX TERMS Class activation map, classification, computer-aided diagnosis (CADx), deep learning, gastroscopy, image augmentation.
The associate editor coordinating the review of this manuscript and approving it for publication was Yizhang Jiang.

I. INTRODUCTION
According to GLOBOCAN statistics from 2018 (Fig. 1), gastric cancer currently ranks fifth in incidence worldwide [1]. In particular, gastric cancer recently had the highest incidence among the 243,837 cases of all cancer types reported in Korea [2]. Gastrointestinal diseases, including gastric cancer, have no clear symptoms or exact causes. Therefore, regular gastroscopy should be performed to detect gastrointestinal diseases. Although all parts of the stomach can be visually observed through gastroscopy, there is a possibility of misdiagnosis, depending on the proficiency and fatigue of the physician who performs the endoscopy. For example, according to the Korea Consumer Agency, there were 78 cases of hospital-responsible cancer misdiagnosis from 2017 to June 2021, of which endoscopy video reading errors accounted for 30.8% (24 cases) [3].

Computer-aided diagnosis (CADx) systems are being researched because they provide doctors with objective information, which tends to increase diagnostic accuracy [4], [5]. As the number of medical imaging devices and endoscopies increases, research on CADx based on endoscopic imaging is being conducted. This report proposes a gastroscopy CADx system that classifies normal and abnormal gastroscopy images using a deep learning convolutional neural network (CNN).

II. RELATED WORK
Research on applying artificial intelligence to medical data has been conducted [6], [7]. In particular, CADx is used in various medical imaging techniques, such as computed tomography, X-ray imaging, and magnetic resonance imaging. Research on gastroscopy images is also being conducted. Sakai et al. [8] classified gastric cancer using data from 58 patients who underwent endoscopic submucosal dissection. They collected 926 images, consisting of 228 early gastric cancers and 698 normal tissues. The data were cropped to 224 × 224 and expanded with nine types of augmentation techniques, such as rotation, shift, and shear. Transfer learning was conducted using GoogLeNet, and the performance was evaluated using a confusion matrix [9]. They used a sliding-window method for lesion detection and obtained a heat map of the cancer. The accuracy was 87.6%, and the sensitivity indicating the gastric cancer detection rate was 80%. Lee et al. [10] used gastroscopy images to perform three tasks: normal vs. cancer, normal vs. benign ulcer, and cancer vs. benign ulcer. ResNet-50, Inception v3, and VGG-16 were used as classification networks. The dataset consisted of 220 ulcer images, 367 gastric cancer images, and 200 normal images, and 10% of each category was used as a test set. They used adaptive histogram equalization to reduce image variation in brightness and contrast. ResNet-50 performed best for each task after training, scoring 96.5%, 92.6%, and 77.1%. The performance of cancer vs. benign ulcer was 77.1%, indicating difficulty in classification due to the small difference in appearance. Yoon et al. [11] aimed to develop a model optimized for EGC detection and depth prediction using VGG-16. From 2012 to 2018, a total of 11,539 images were collected from 800 patients, including 896 T1a-EGC, 809 T1b-EGC, and 9,834 non-EGC images. They proposed a loss function that used the weighted sum of gradient-weighted class activation mapping and cross-entropy losses. The classification sensitivity and specificity were 91.0% and 97.6%, respectively, and the AUC was 0.981. The predictive sensitivity and specificity of tumor depth were 79.2% and 77.8%, respectively, with an AUC of 0.851. As a result, the Grad-CAM method was effective in improving gastric cancer classification performance. The non-EGC images are much more prevalent than the EGC images in that study, and the kinds of non-EGC include chronic gastritis, chronic atrophic gastritis, intestinal metaplasia, and erosion. They also explained that larger datasets would enable more accurate depth predictions. Hirasawa et al. [12] also attempted to classify gastric cancer using gastroscopic images. They collected 77 endoscopic videos of gastric cancer from 69 patients and a total of 13,584 endoscopic images: 11,288 images for the training dataset and 2,296 images for the test dataset. These data were collected over more than 10 years, from April 2004 to December 2016. The gastroscopy images were used to train a single-shot multibox detector (SSD), an object detection network [13]. The trained network identified 71 of the 77 gastric cancer lesions, but the benign detection rate was low (30.6%). Wu et al. [14] trained a CNN to classify gastric cancer and compared its results with those obtained by six specialists. A total of 3,170 images of early gastric cancer and 5,981 images of benign lesions were collected. VGG16 and ResNet50 were used as classification networks [15], [16]. The accuracy, sensitivity to gastric cancer, and specificity were 92.5%, 94.0%, and 91.0%, respectively, which were similar to the classification results of the specialists. He et al. [17] attempted to classify the stomach area using gastroscopy rather than finding gastric lesions. The area was divided into 11 parts, including the pharynx, esophagus, and duodenum. Through 229 gastroscopies, 5,661 gastroscopy images were obtained, and the dataset was composed of 3,704 of them after image selection. ResNet50, InceptionV3, VGG11, VGG16, and DenseNet121 were used as classification networks, and their performances were compared. DenseNet121 exhibited the best performance, with 91.1% accuracy [18]. Kim et al. [19] used deep convolutional generative adversarial networks (DCGAN) and Cifar10 policies to compensate for the lack of data. They generated 200 normal and 200 abnormal images through the DCGAN and increased the data 25-fold through the Cifar10 policy. In addition, the data augmented through the DCGAN and Cifar10 policies were filtered by deep learning models trained with the original dataset, with the classification threshold set to 0.8 or higher. The model trained on the data augmented with Cifar10 and DCGAN showed the best performance, with an AUC of 0.900. This is 5% higher than the AUC of 0.855 achieved by the network trained with the original data.

Zhang et al. [20] attempted to detect gastric polyps using a deep learning object-detection algorithm. In total, 404 images were obtained from 215 patients who underwent gastroscopy. The data collection period was short, so they used rotation to augment the data. After applying image augmentation, the training data comprised 708 images. SSD-GPNet was used, and not only max pooling but also min pooling and second-max pooling were added to the network to detect polyps. The F1 score was 84.2%, and the mean average precision, which describes the object detection performance, was 90.4%. A study was also conducted to identify gastric cancer using an image segmentation algorithm. Shibata et al. [21] obtained 1,208 gastroscopic images from 42 healthy individuals and 533 gastroscopic images from 93 patients with cancer from 2013 to 2017. A Mask R-CNN was used as the image segmentation algorithm and was trained after masking the gastric cancer area in the endoscopic images [22]. The sensitivity was 96%, and the undetected rate was 1 per 10 images, indicating good detection of gastric cancer lesions. Ikenoyama et al. [23] compared the diagnostic ability of a CNN with that of endoscopists for detecting early gastric cancer. Zhang and Cao [24] and Ghiasi et al. [25] showed that copying objects and pasting them onto other images is an effective augmentation method.

In related studies using medical data, the amount of data is significantly smaller than in other fields unless the collection period is long, owing to the nature of the data; it is difficult to collect sufficient medical data for research. To overcome this difficulty, some studies apply augmentation techniques such as rotation and flipping to images. However, lesions such as early gastric cancer, which have minute characteristics, can be distorted by such augmentation techniques. Therefore, in this paper, we propose an augmentation method that can increase the amount of data more than existing methods and consequently improve the classification performance for gastrointestinal disease while maintaining the characteristics of the lesion. In this approach, a CADx system is trained to classify normal and abnormal conditions, including gastritis and gastric cancer. To improve the classification performance of the CADx system, image augmentation using CAM, a CNN visualization technique, is applied. Grad-CAM image augmentation extracts the lesion region from an abnormal image. The extracted region is attached to the area with the smallest variance difference from the original image. As a result, an entirely new abnormal image is generated. To ensure that the proposed augmentation method is effective, a performance comparison was performed on Xception between the original dataset and the dataset augmented with the Cifar10 policy and Grad-CAM.

For gastroscopy classification, a CNN, which is a deep learning algorithm, was used. The Xception network, which uses a depth-wise separable convolution structure, was selected as the CNN for gastroscopy classification in this study [26]. Xception is designed to completely separate cross-channel correlations and spatial correlations. Therefore, it uses depth-wise separable convolution consisting of depth-wise and point-wise parts. The depth-wise part completely separates the channels and performs a 3 × 3 convolution operation, creating one feature map per channel. This reduces the amount of computation in the 3 × 3 convolution, which would otherwise require considerable computation, thus preventing a bottleneck. The point-wise part performs a 1 × 1 convolution operation on each channel of the depth-wise output. The number of output channels delivered to the next layer is adjusted based on the number of convolution filters in the point-wise part.
Through the depth-wise separable convolution structure, feature extraction can be performed efficiently, and the amount of computation can be reduced. In addition, a deeper network was constructed by applying techniques such as the skip connection of ResNet and batch normalization.
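As an illustration only (not the authors' code), the following sketch shows how one Xception-style depth-wise separable block with a residual shortcut can be written with tf.keras layers; the filter count and input size are assumptions made for the example.

```python
import tensorflow as tf
from tensorflow.keras import layers

def separable_block(x, filters):
    """One Xception-style block: depth-wise 3x3 conv per channel, point-wise 1x1 conv
    across channels, batch normalization, and a strided residual shortcut."""
    shortcut = layers.Conv2D(filters, 1, strides=2, padding="same")(x)
    x = layers.DepthwiseConv2D(3, padding="same")(x)   # spatial filtering, one map per channel
    x = layers.Conv2D(filters, 1, padding="same")(x)   # point-wise: recombines channels
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(3, strides=2, padding="same")(x)
    return layers.Add()([x, shortcut])                 # skip connection, as in ResNet

inputs = tf.keras.Input(shape=(299, 299, 3))
outputs = separable_block(inputs, 128)
# The full pretrained network is also available as tf.keras.applications.Xception.
```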
Grad-CAM is a technique for visualizing the process through which the trained model predicts a class and for identifying the region of interest (ROI) used in the prediction. In this study, the ROI of the lesion was identified using the gradient-weighted class activation map (Grad-CAM) approach [27]. The Grad-CAM technique uses the gradient values generated during backpropagation in the CNN training process. The Grad-CAM process is as follows.
First, the k feature maps A^k that have completed the convolution operations are obtained from the layer in which the ROI is examined. Backpropagation is then performed with the predicted value y^c for class c from the classifier. After backpropagation, global average pooling (GAP) is performed on the gradient map of A^k to obtain α_k^c, as expressed in (1), where i and j are the coordinates of the feature map and Z is the number of elements in the feature map. Subsequently, the feature map A^k is multiplied by α_k^c, the value obtained from GAP, and the results are summed. Finally, the sum becomes the Grad-CAM L^c for class c after the ReLU operation is applied, as shown in (2).

α_k^c = (1/Z) Σ_i Σ_j ∂y^c / ∂A^k_ij    (1)

L^c_Grad-CAM = ReLU( Σ_k α_k^c A^k )    (2)

In the existing CAM, only the ROI in the last layer of the CNN can be verified, and GAP is essential in the network [28]. Grad-CAM uses backpropagation gradients to identify the ROI in all layers of the CNN. In addition, it can be applied to general models by performing the GAP operation separately, instead of inside the CNN. With Grad-CAM, localization is possible without an object detection model; thus, the area of a lesion in gastroscopy can be found. Therefore, if applied to CADx, the area information of a lesion can be transmitted to a specialist as a heat map. However, the localization is not perfectly accurate, and there is a possibility that the CNN may misclassify a gastroscopy image, confusing specialists. An example of using Grad-CAM for gastroscopic classification is shown in Fig. 4.

FIGURE 4. (a) Gastroscopic image; (b) ROI detected by the CNN; (c) gastroscopy lesion heat map.
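The computation in (1) and (2) can be sketched as follows. This is an illustrative implementation only, not the authors' code; the convolution layer name and the preprocessing of the input image are assumptions, and tf.GradientTape is used to obtain the gradients of the class score with respect to the feature maps.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, class_index, conv_layer_name="block14_sepconv2_act"):
    """Compute the Grad-CAM heat map L^c for one preprocessed image (Eqs. (1) and (2))."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        feature_maps, predictions = grad_model(image[np.newaxis, ...])
        class_score = predictions[:, class_index]            # y^c
    grads = tape.gradient(class_score, feature_maps)          # dy^c / dA^k_ij
    alpha = tf.reduce_mean(grads, axis=(1, 2))                 # Eq. (1): GAP over i, j
    cam = tf.nn.relu(tf.reduce_sum(alpha[:, None, None, :] * feature_maps, axis=-1))  # Eq. (2)
    cam = cam[0] / (tf.reduce_max(cam) + 1e-8)                 # normalize to [0, 1]
    return cam.numpy()
```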
D. IMAGE AUGMENTATION: AutoAugment
Image augmentation was used to compensate for the insufficient gastroscopic dataset and to increase the classification performance. In this study, Google's AutoAugment was used as the image augmentation algorithm [29]. AutoAugment consists of the augmentation operations Shear X/Y, Translate X/Y, Rotate, AutoContrast, Invert, Equalize, Posterize, Contrast, Color, Brightness, Sharpness, Cutout, and Sample Pairing. One subpolicy consists of two augmentation operations, and each subpolicy has two parameters: magnitude and probability. Therefore, depending on the augmentation probability, different results can be obtained even if the same subpolicy is used.

AutoAugment sets a recurrent neural network (RNN) as the controller that determines and creates a subpolicy S. S is applied to the training dataset, and the validation accuracy R is obtained. Subsequently, S is adjusted such that R in the RNN can be increased. By repeating this process, the 25 subpolicies with the best performance on the training dataset are selected. One image is augmented per subpolicy, resulting in a 25-fold increase, and the 25 subpolicies constitute the augmentation policy for one dataset. AutoAugment provides policies learned on three reference datasets. The first policy is based on Cifar10 [30], which consists of 10 classes (airplanes, vehicles, birds, cats, deer, dogs, frogs, horses, ships, and trucks) with 6,000 images per class, for a total of 60,000 images. The second policy is based on ImageNet [31], a huge dataset that was used in the ImageNet Large Scale Visual Recognition Challenge until 2017. The final policy is based on SVHN [32], which consists of house numbers collected from Google Street View. When the performance of AutoAugment applied to gastroscopy was compared in a previous study, the Cifar10 policy performed best [33]. Therefore, the gastroscopy dataset was augmented by applying the Cifar10 policy. A total of 1,310 training images were augmented 25 times, for a total of 32,750 images.
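To make the subpolicy mechanics concrete, the sketch below applies one hypothetical two-operation subpolicy (rotate, then equalize), each with its own probability and magnitude, using PIL. The operations, values, and file path are illustrative assumptions, not the learned Cifar10 policy itself; a ready-made version of that policy is available, for example, in torchvision as transforms.AutoAugment with AutoAugmentPolicy.CIFAR10.

```python
import random
from PIL import Image, ImageOps

# One example subpolicy: (operation, probability, magnitude) tuples.
# These values are illustrative, not the learned Cifar10 policy.
SUBPOLICY = [("rotate", 0.7, 15), ("equalize", 0.8, None)]

def apply_subpolicy(img: Image.Image, subpolicy) -> Image.Image:
    """Apply each operation of a subpolicy with its own probability and magnitude."""
    for op, prob, magnitude in subpolicy:
        if random.random() > prob:
            continue
        if op == "rotate":
            img = img.rotate(magnitude, fillcolor=(0, 0, 0))
        elif op == "equalize":
            img = ImageOps.equalize(img)
    return img

# Augmenting each training image once per subpolicy yields a 25-fold increase
# when the full 25-subpolicy augmentation policy is used.
augmented = apply_subpolicy(Image.open("gastroscopy_sample.png").convert("RGB"), SUBPOLICY)
```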
E. IMAGE AUGMENTATION: Grad-CAM
In medical imaging classification, Grad-CAM is generally used to visualize the location of a lesion, and only a few studies have used Grad-CAM for data augmentation. If an abnormality is missed, treatment may not be timely and the condition may worsen. For this reason, the proposed Grad-CAM augmentation focuses on the abnormal class.

The gastroscopic dataset was first augmented with the Cifar10 policy of AutoAugment and trained using Xception. Not all abnormal images were used for augmentation through Grad-CAM: the 655 abnormal training images were input to the previously trained Xception model, and only the 551 images with a classification probability greater than 0.9 were selected. The ROI of the lesion was extracted from each selected image using Grad-CAM, and the average size of the ROI was measured; the average width of the ROI was 96 pixels, and the average height was 99 pixels. The ROI is then attached to a normal image to create a new abnormal image. Two procedures were carried out at this stage to prevent the ROI of the lesion from being attached to an unrealistic position.
First, the normal image was divided into nine uniform areas; nine areas were used to correspond to the size of the ROI. Subsequently, the variance of the ROI was compared with that of each region, and the ROI was pasted at the position where the variance difference was minimal. However, the size of the ROI is not fixed and differs for each patient, so the lesion may be larger than a divided area. In this case, the size of the divided area was set equal to that of the ROI, and the windowing stride was adjusted so that the comparison was still made nine times. Figure 6 shows an example of this.

The ROI lesions of the 551 selected abnormal images were pasted onto 655 normal images using the proposed Grad-CAM augmentation method. Consequently, 360,905 augmented abnormal images were generated. Normal images contain no distinct ROI-like lesions; therefore, only the abnormal class was augmented.
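A simplified sketch of this copy-and-paste step is shown below. It is illustrative only (not the authors' code) and assumes the Grad-CAM heat map has already been computed and resized to the image size (for example with the grad_cam function sketched above), a fixed threshold of 0.5 for extracting the lesion ROI, and a 3 × 3 grid of candidate paste positions of the ROI size.

```python
import numpy as np

def paste_lesion(abnormal_img, normal_img, cam, threshold=0.5):
    """Cut the Grad-CAM ROI out of an abnormal image and paste it onto the
    candidate region of a normal image whose variance is closest to the ROI's."""
    # Bounding box of the thresholded heat map (cam resized to the image size).
    ys, xs = np.where(cam >= threshold)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    roi = abnormal_img[y0:y1, x0:x1]
    roi_var = roi.var()

    # Slide a window of the ROI size over nine candidate positions and keep the
    # position whose variance difference from the ROI is minimal.
    h, w = roi.shape[:2]
    H, W = normal_img.shape[:2]
    rows = np.linspace(0, H - h, 3).astype(int)
    cols = np.linspace(0, W - w, 3).astype(int)
    _, r, c = min(((abs(normal_img[r:r + h, c:c + w].var() - roi_var), r, c)
                   for r in rows for c in cols), key=lambda t: t[0])

    augmented = normal_img.copy()
    augmented[r:r + h, c:c + w] = roi   # paste the lesion into the selected region
    return augmented
```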
F. EVALUATION METHOD
The performance of the classification model was evaluated using a confusion matrix. The confusion matrix consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). For CADx, the early detection of abnormalities that can worsen the patient's condition is important. Therefore, when evaluating the performance, abnormal was set as positive and normal as negative. Through this approach, the sensitivity and precision were obtained, and the overall classification performance of the model was evaluated using the F1 score, the harmonic mean of the sensitivity and precision. In this study, the sensitivity, specificity, and F1 score were mainly used to assess the performance improvement. The equations for precision, specificity, sensitivity, and F1 score are shown in (3), (4), (5), and (6), respectively.

Precision = TP / (TP + FP)    (3)
Specificity = TN / (TN + FP)    (4)
Sensitivity = TP / (TP + FN)    (5)
F1 Score = 2 × (Precision × Sensitivity) / (Precision + Sensitivity)    (6)

The performance was also evaluated using the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. AUC is used as a major evaluation metric in studies using medical data. In the ROC curve, the y-axis is the true positive rate, indicating sensitivity, and the x-axis is the false positive rate, meaning 1 − specificity. The closer the curve is to the upper-left corner, the better the classification performance.
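For reference, these metrics can be computed from a confusion matrix as in the following sketch (illustrative only; scikit-learn is assumed to be available, and the label convention abnormal = 1, normal = 0 matches the description above).

```python
from sklearn.metrics import confusion_matrix, roc_auc_score

def evaluate(y_true, y_pred_label, y_pred_score):
    """Compute the metrics of Eqs. (3)-(6) plus AUC, with abnormal = positive (1)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred_label, labels=[0, 1]).ravel()
    precision = tp / (tp + fp)                        # Eq. (3)
    specificity = tn / (tn + fp)                      # Eq. (4)
    sensitivity = tp / (tp + fn)                      # Eq. (5), i.e. recall
    f1 = 2 * precision * sensitivity / (precision + sensitivity)   # Eq. (6)
    auc = roc_auc_score(y_true, y_pred_score)         # area under the ROC curve
    return {"precision": precision, "specificity": specificity,
            "sensitivity": sensitivity, "f1": f1, "auc": auc}
```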
The experiments were conducted in Windows 10. The CPU used was an Intel Xeon W-2133, and the GPU employed for training was a TITAN RTX with 24 GB of memory. The batch size was 64, and the learning rate was 0.0001.
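As a minimal sketch of this setup (not the authors' code), the stated batch size and learning rate could be used with a Keras Xception classifier as follows; the input size, pretrained weights, optimizer choice, and output head are assumptions.

```python
import tensorflow as tf

# Binary gastroscopy classifier: Xception backbone + sigmoid output (assumed head).
base = tf.keras.applications.Xception(weights="imagenet", include_top=False, pooling="avg")
model = tf.keras.Sequential([base, tf.keras.layers.Dense(1, activation="sigmoid")])

model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),  # learning rate 0.0001
              loss="binary_crossentropy",
              metrics=["accuracy", tf.keras.metrics.AUC(name="auc")])

# model.fit(x_train, y_train, batch_size=64, epochs=...)  # batch size 64, as stated above
```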
IV. RESULTS
Several CNN models were compared, and Xception showed the highest performance in terms of accuracy, specificity, and AUC. ResNet-152 had a precision of 0.878, the highest value including the proposed method, but it also had the lowest sensitivity, 0.659, so it was not selected as the model for this study. Xception reduces information loss by removing the non-linear function between the depth-wise and point-wise convolution operations. Furthermore, Xception alleviates the gradient vanishing problem using residual connections, outperforming the other models. For these reasons, we conducted the experiments using Xception. We used AutoAugment's Cifar10 policy and applied augmentation with Grad-CAM. Training was carried out with the original dataset, the AutoAugment Cifar10 policy dataset, and the Grad-CAM augmented dataset. The detailed process is illustrated in Fig. 7.
Xception was first trained using the original dataset. Misclassifying 38 out of 164 abnormal and 28 out of 164 normal cases resulted in an accuracy of 0.799. The sensitivity and specificity were 0.768 and 0.782, respectively, and the F1 score was 0.792. Subsequently, the dataset was augmented and trained using AutoAugment's Cifar10 policy, and the Grad-CAM images were generated using the weights of the model trained with AutoAugment's Cifar10 policy. The Cifar10-policy dataset and the dataset augmented by Grad-CAM were then combined and used for training. The model trained by the proposed method showed improved performance on all evaluation metrics. The accuracy of the proposed method was 0.842, with 52 of the 328 endoscopic images misclassified. The sensitivity was 0.805, with 26 abnormal images classified incorrectly. Compared with the Xception model trained on the original data, a sensitivity improvement of about 5% was achieved, from 0.768 to 0.805. The F1 score and AUC were 0.835 and 0.903, respectively, the highest values obtained. The detailed performance evaluation results are presented in Table 2.

Fig. 8 shows the ROC curves and the corresponding AUC values. The AUCs of the models trained on the original dataset and the Grad-CAM dataset are 0.885 and 0.903, respectively. The proposed method had the highest AUC value among the compared models. This result is meaningful because the AUC is an indicator of the classification performance between normal and abnormal.

FIGURE 8. ROC curves for the original and Grad-CAM datasets.

V. DISCUSSION
In this study, the Xception network was used to train on the original gastroscopy dataset and on the dataset augmented with the Cifar10 policy and Grad-CAM, and their performance was compared. The performance indicators were accuracy, specificity, sensitivity, F1 score, and AUC. With the original gastroscopy dataset, the sensitivity was lower than the accuracy, whereas the specificity and AUC were relatively high; thus, the false-positive rate was low. When training with the Grad-CAM augmented dataset, there was not much difference in the AUCs, but the sensitivity increased by 5% and the F1 score increased by 5%. This dataset showed better performance than the others owing to the increase in abnormal classification performance. As a result, the Cifar10 policy of AutoAugment and the Grad-CAM augmentation method, with ROIs extracted using the corresponding weights, yielded improved classification performance compared to the original dataset, which contained fewer data. The Cifar10 augmentation method augments data by applying geometric transformations such as rotation to the image. Therefore, strong transformations may occur depending on the applied policy. Since strong transformations slow training and impede weight convergence, the Cifar10 augmentation method limits the performance improvement. However, since the Grad-CAM augmentation method creates an image by synthesizing the ROI into the original image, augmentation can be performed while preserving the characteristics of the original image and the morphological characteristics of the lesion. Therefore, unlike Cifar10, it is possible to generate a large amount of data without interfering with the convergence of training.
Furthermore, an advantage of the Grad-CAM augmentation method is that the amount of increase is larger than with general augmentation methods. In this study, ROIs were extracted from 551 selected images, so the 655 abnormal images were increased 551-fold to 360,905 images, although the amount can vary depending on the size of the dataset. Discrimination of abnormal images is another critical factor in medical data. As a result, the improvement in abnormal classification demonstrates that the Grad-CAM augmentation method used in this study is effective for medical data.

VI. CONCLUSION
CADx can assist specialists in performing endoscopy. In this study, a CADx system that classifies gastroscopy images as normal or abnormal, trained with augmentation using AutoAugment and Grad-CAM, was investigated. The classifier was trained using Xception, one of the CNNs that shows strength in image classification. For the augmentation algorithm, the method of extracting the ROI of the lesion through Grad-CAM and pasting it onto another image was applied. Subsequently, we checked whether the proposed method was effective in improving performance, using indicators such as the F1 score and AUC. The evaluation results demonstrated that the classification model to which the proposed Grad-CAM augmentation method was applied had the best classification performance. Compared to the original dataset, the overall classification performance, F1 score, and AUC increased by 5%.

Several challenges remain to be addressed in future studies. First, in this study we compared the variance between the lesion and the normal image and pasted the lesion into the area where the variance difference was minimal, to generate images that are as realistic as possible. However, an improved image synthesis method is required for more realistic augmented images. We believe that this issue can be solved using algorithms such as generative adversarial networks (GANs) or image blending. Second, the localization performance is poor compared with that of object detection or segmentation. CAM has the advantage of extracting ROIs using only a CNN, without a separate layer for region proposal, but it does not find exact lesion locations. This problem can be addressed by employing segmentation algorithms, such as SLIC superpixels or FRFCM, to find segmented ROIs. In addition, although this study was conducted with white-light endoscopic imaging, which is the most common modality, narrow-band imaging (NBI) endoscopy also exists, which uses wavelengths of light absorbed by blood vessels. We plan to develop a model for classifying abnormalities by extending the study from white-light endoscopic imaging to NBI.

REFERENCES
[1] F. Bray, J. Ferlay, I. Soerjomataram, R. L. Siegel, L. A. Torre, and A. Jemal, "Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries," CA, Cancer J. Clin., vol. 68, no. 6, pp. 394–424, Nov. 2018.
[2] S. Hong, Y.-J. Won, J. J. Lee, K.-W. Jung, H.-J. Kong, J.-S. Im, and H. G. Seo, "Cancer statistics in Korea: Incidence, mortality, survival, and prevalence in 2018," Cancer Res. Treatment, vol. 53, no. 2, pp. 301–315, Apr. 2021.
[3] Korea Consumer Agency, Cancer Misdiagnosis Consumer Damage Prevention Advisory. Accessed: Jul. 13, 2017. [Online]. Available: https://www.kca.go.kr/kca/
[4] A. Misumi, K. Misumi, A. Murakami, K. Harada, U. Honmyo, and M. Akagi, "Endoscopic diagnosis of minute, small, and flat early gastric cancers," Endoscopy, vol. 21, pp. 159–164, Jul. 1989.
[5] U. Honmyo, A. Misumi, A. Murakami, S. Mizumoto, I. Yoshinaka, M. Maeda, S. Yamamoto, and S. Shimada, "Mechanisms producing color change in flat early gastric cancers," Endoscopy, vol. 29, no. 5, pp. 366–371, Jun. 1997.
[6] I. M. El-Hasnony, O. M. Elzeki, A. Alshehri, and H. Salem, "Multi-label active learning-based machine learning model for heart disease prediction," Sensors, vol. 22, no. 3, p. 1184, Feb. 2022.
[7] H. Torkey, M. Atlam, N. El-Fishawy, and H. Salem, "A novel deep autoencoder based survival analysis approach for microarray dataset," PeerJ Comput. Sci., vol. 7, p. e492, Apr. 2021.
[8] Y. Sakai, S. Takemoto, K. Hori, M. Nishimura, H. Ikematsu, T. Yano, and H. Yokota, "Automatic detection of early gastric cancer in endoscopic images using a transferring convolutional neural network," in Proc. 40th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jul. 2018, pp. 4138–4141.
[9] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2015, pp. 1–9.
[10] J. H. Lee, Y. J. Kim, Y. W. Kim, S. Park, Y.-I. Choi, Y. J. Kim, D. K. Park, K. G. Kim, and J.-W. Chung, "Spotting malignancies from gastric endoscopic images using deep learning," Surgical Endoscopy, vol. 33, no. 11, pp. 3790–3797, Nov. 2019.
[11] H. J. Yoon, S. Kim, J.-H. Kim, J.-S. Keum, S.-I. Oh, J. Jo, J. Chun, Y. H. Youn, H. Park, I. G. Kwon, S. H. Choi, and S. H. Noh, "A lesion-based convolutional neural network improves endoscopic detection and depth prediction of early gastric cancer," J. Clin. Med., vol. 8, no. 9, p. 1310, Aug. 2019.
[12] T. Hirasawa, K. Aoyama, T. Tanimoto, S. Ishihara, S. Shichijo, T. Ozawa, T. Ohnishi, M. Fujishiro, K. Matsuo, J. Fujisaki, and T. Tada, "Application of artificial intelligence using a convolutional neural network for detecting gastric cancer in endoscopic images," Gastric Cancer, vol. 21, no. 4, pp. 653–660, 2018.
[13] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," in Proc. 14th Eur. Conf. Comput. Vis. (ECCV). Cham, Switzerland: Springer, 2016, pp. 21–37.
[14] L. Wu, W. Zhou, X. Wan, J. Zhang, J. Shen, S. Hu, Q. Ding, G. Mu, A. Yin, X. Huang, and J. Liu, "A deep neural network improves endoscopic detection of early gastric cancer without blind spots," Endoscopy, vol. 51, no. 6, pp. 522–531, Jun. 2019.
[15] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," 2014, arXiv:1409.1556.
[16] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 770–778.
[17] Q. He, S. Bano, O. F. Ahmad, B. Yang, X. Chen, P. Valdastri, L. B. Lovat, D. Stoyanov, and S. Zuo, "Deep learning-based anatomical site classification for upper gastrointestinal endoscopy," Int. J. Comput. Assist. Radiol. Surgery, vol. 15, no. 7, pp. 1085–1094, May 2020.
[18] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 4700–4708.
[19] Y.-J. Kim, H. C. Cho, and H.-C. Cho, "Deep learning-based computer-aided diagnosis system for gastroscopy image classification using synthetic data," Appl. Sci., vol. 11, no. 2, p. 760, Jan. 2021.
[20] X. Zhang, F. Chen, T. Yu, J. An, Z. Huang, J. Liu, W. Hu, L. Wang, H. Duan, and J. Si, "Real-time gastric polyp detection using convolutional neural network," PLoS ONE, vol. 14, no. 3, Mar. 2019, Art. no. e0214133.
[21] T. Shibata, A. Teramoto, H. Yamada, N. Ohmiya, K. Saito, and H. Fujita, "Automated detection and segmentation of early gastric cancer from endoscopic images using mask R-CNN," Appl. Sci., vol. 10, no. 11, p. 3842, May 2020.
[22] K. He, G. Gkioxari, P. Dollár, and R. Girshick, "Mask R-CNN," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 2961–2969.
[23] Y. Ikenoyama, T. Hirasawa, M. Ishioka, K. Namikawa, S. Yoshimizu, Y. Horiuchi, A. Ishiyama, T. Yoshio, T. Tsuchida, Y. Takeuchi, and S. Shichijo, "Detecting early gastric cancer: Comparison between the diagnostic ability of convolutional neural networks and endoscopists," Digestive Endoscopy, vol. 33, no. 1, pp. 141–150, Apr. 2021.
[24] W. Zhang and Y. Cao, "A new data augmentation method of remote sensing dataset based on class activation map," J. Phys.: Conf. Ser., vol. 1961, no. 1, Jul. 2021, Art. no. 012023.
[25] G. Ghiasi, Y. Cui, A. Srinivas, R. Qian, T.-Y. Lin, E. D. Cubuk, Q. V. Le, and B. Zoph, "Simple copy-paste is a strong data augmentation method for instance segmentation," in Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2021, pp. 2918–2928.
[26] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jul. 2017, pp. 1251–1258.
[27] R. R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, and D. Batra, "Grad-CAM: Visual explanations from deep networks via gradient-based localization," in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Oct. 2017, pp. 618–626.
[28] B. Zhou, A. Khosla, A. Lapedriza, A. Oliva, and A. Torralba, "Learning deep features for discriminative localization," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2016, pp. 2921–2929.
[29] E. D. Cubuk, B. Zoph, D. Mane, V. Vasudevan, and Q. V. Le, "AutoAugment: Learning augmentation policies from data," 2018, arXiv:1805.09501.
[30] A. Krizhevsky and G. Hinton, "Learning multiple layers of features from tiny images," M.S. thesis, Dept. Comput. Sci., Univ. Toronto, Toronto, ON, Canada, Apr. 2009.
[31] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Jun. 2009, pp. 248–255.
[32] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng, "Reading digits in natural images with unsupervised feature learning," in Proc. NeurIPS Workshop, 2011, pp. 1–9.
[33] S.-A. Lee, D.-H. Kim, and H.-C. Cho, "Deep learning based gastric lesion classification system using data augmentation," Trans. Korean Inst. Electr. Eng., vol. 69, no. 7, pp. 1033–1039, Jul. 2020.

HAN-SUNG LEE received the B.S. degree in electrical and electronic engineering from Kangwon National University, South Korea, in 2022, where he is currently pursuing the M.S. degree in the interdisciplinary graduate program for BIT medical convergence.

JUNG-WOO CHAE received the B.S. degree in electrical and electronic engineering and the M.S. degree in electrical and medical convergent engineering from Kangwon National University, South Korea, in 2019 and 2021, respectively, where he is currently pursuing the Ph.D. degree in the interdisciplinary graduate program for BIT medical convergence.

HYUN CHIN CHO received the M.S., Ph.D., and M.D. degrees in internal medicine from the School of Medicine, Gyeongsang National University, Jinju, South Korea, in 2008 and 2014, respectively. She was a fellow at the Samsung Medical Center, School of Medicine, Sungkyunkwan University, Seoul, South Korea, from 2009 to 2010. From 2011 to 2015, she was a Professor at Samsung Changwon Hospital, School of Medicine, Sungkyunkwan University, Changwon, South Korea. She is currently a Professor with the School of Medicine, Gyeongsang National University Hospital, Gyeongsang National University.

HYUN-SIK HAM received the B.S. degree in electrical and electronic engineering and the M.S. degree in electrical and medical convergent engineering from Kangwon National University, South Korea, in 2020 and 2022, respectively.

HYUN-CHONG CHO received the Ph.D. degree in electrical and computer engineering from the University of Florida, USA, in 2009. From 2010 to 2011, he was a Research Fellow at the University of Michigan, Ann Arbor, USA. From 2012 to 2013, he was the Chief Research Engineer at LG Electronics, South Korea. He is currently a Professor with the Department of Electronics Engineering and the Interdisciplinary Graduate Program for BIT Medical Convergence, Kangwon National University, South Korea.