CMMM2021 5940433
Research Article
Gastrointestinal Tract Disease Classification from Wireless
Endoscopy Images Using Pretrained Deep Learning Model
Received 10 May 2021; Revised 3 July 2021; Accepted 16 August 2021; Published 11 September 2021
Copyright © 2021 J. Yogapriya et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Wireless capsule endoscopy is a noninvasive wireless imaging technology that has become increasingly popular in recent years. One of its major drawbacks is that it generates a large number of images that must be analyzed by medical personnel, which takes time. Various research groups have proposed different image processing and machine learning techniques to classify gastrointestinal tract diseases in recent years. In this research, traditional image processing algorithms and a data augmentation technique are combined with adjusted pretrained deep convolutional neural networks to classify diseases in the gastrointestinal tract from wireless endoscopy images. We take advantage of the pretrained models VGG16, ResNet-18, and GoogLeNet, convolutional neural network (CNN) models with adjusted fully connected and output layers. The proposed models are validated with a dataset consisting of 6702 images of 8 classes. The VGG16 model achieved the highest results with 96.33% accuracy, 96.37% recall, 96.5% precision, and 96.5% F1-measure. Compared to other state-of-the-art models, the VGG16 model has the highest Matthews Correlation Coefficient value of 0.95 and Cohen's kappa score of 0.96.
physician to decide about the diseases [7]. The major diseases diagnosed using the WCE are ulcers, bleeding, malignancy, and polyps in the digestive system. The anatomical landmarks, pathological findings, and polyp removal play a vital role in diagnosing diseases in the digestive system from WCE-captured images. It is a more convenient diagnostic method, providing a wide range of visuals [8]. It reduces the patient's discomfort and the complications that arise during treatment in conventional endoscopy methods like computed tomography enteroclysis and enteroscopy. The accuracy of diagnosing tumours and gastrointestinal bleeding, especially in the small intestine, has improved. However, the overall process of analyzing all the frames extracted from each patient is very time-consuming [9]. Furthermore, even the most experienced physicians confront difficulties that necessitate a large amount of time to analyze all of the data, because the contaminated zone in one frame will not emerge in the next. Even though the majority of the frames contain useless material, the physician must go through the entire video in order. Owing to inexperience or negligence, this may often result in a misdiagnosis [10].

Segmentation, classification, detection, and localization are techniques used by researchers to solve this problem. Feature extraction and visualization are an important step that determines the overall accuracy of the computer-aided diagnosis method. Different features are extracted based upon texture analysis, color, points, and edges in the images [11]. The features extracted are insufficient to determine the model's overall accuracy. As a result, feature selection is a time-consuming process that is crucial in determining the model's output. Advancements in the field of deep learning, especially CNNs, can solve this problem [12]. The advancement of CNNs has been promising in the last decades, with automated detection of diseases in various organs of the human body, such as the brain [13], cervical cancer [14], eye diseases [15], and skin cancer [16]. Unlike conventional machine learning algorithms, the CNN model has the advantage of extracting features hierarchically from low to high level. The remainder of the manuscript is organized as follows: Section 2 explains the related work in the field of GIT diagnosis; Section 3 discusses the dataset considered for this study; Section 4 describes the pretrained architectures used to diagnose eight different diseases from WCE images; Section 5 contains the findings derived from the proposed method; Section 6 concludes the work.

2. Related Work

The automated prediction of anatomical landmarks, pathological observations, and polyp groups from images obtained using wireless capsule endoscopy is the subject of this research. The experimental groups from the pictures make it simple for medical experts to make an accurate diagnosis and prescribe a treatment plan. Significant research in this area has led to the automatic detection of infection from a large number of images, saving time and effort for medical experts while simultaneously boosting diagnosis accuracy. Automatically detecting infected images from WCE images has lately been a popular research topic, with a slew of papers published in the field. Traditional machine learning algorithms and deep learning algorithms are used in these studies. Improving the classification of disease areas with a high degree of precision in automatic detection is a great challenge. Advanced deep learning techniques are important in WCE to boost its analytical capability. The AlexNet model is proposed to classify the upper gastrointestinal organs from images captured under different conditions. The model achieves an accuracy of 96.5% in upper gastrointestinal anatomical classification [17]. The author proposed a technique to reduce the review time of endoscopy screening based on factorization analysis. A sliding window mechanism with singular value decomposition is used. The technique achieves an overall precision of 92% [18]. The author proposed a system for automatically detecting irregular WCE images by extracting fractal features using the differential box-counting method. The output is tested on two datasets, both of which contain WCE frames, and achieves binary classification accuracies of 85% and 99% for dataset I and dataset II, respectively [19]. The author uses the pretrained models Inception-v4, Inception ResNet-v2, and NASNet to classify the anatomical landmarks from the WCE images, obtaining 98.45%, 98.48%, and 97.35%, respectively. Of these, the Inception-v4 model achieves a precision of 93.8% [20]. To extract the features from the data, the authors used AlexNet and GoogLeNet. This approach is aimed at addressing the issues of low contrast and abnormal lesions in endoscopy [21]. The author proposed a computer-aided diagnostics tool for classifying ulcerative colitis and achieves an area under the curve of 0.86 for Mayo 0 and 0.98 for Mayo 0-1 [22]. The author proposed a convolutional neural network with four layers to classify different classes of ulcers from WCE video frames. The test results are improved by tweaking the model's hyperparameters, achieving an accuracy of 96.8% [23]. The authors introduced a new virtual reality capsule to simulate and identify normal and abnormal regions. This environment generates new 3D images for gastrointestinal diseases [24]. In [25], local spatial features are retrieved from pixels of interest in a WCE image using a linear separation approach. The proposed probability density function model fitting-based approach not only reduces computing complexity but also results in a more consistent representation of a class. The proposed scheme performs admirably in terms of precision, with a score of 96.77%. In [26], the author proposed a Gabor capsule network for classifying complex images like the Kvasir dataset. The model achieves an overall accuracy of 91.50%. A wavelet transform with a CNN is proposed to classify gastrointestinal tract diseases and achieves an overall average performance of 93.65% in classifying the eight classes [27].

From the literature, the CNN model can provide better results if the size of the dataset is high. But there are several obstacles in each step that will reduce the model's performance. The low-contrast video frames in the dataset make segmenting the regions difficult. The extraction and selection of important traits are another difficult step in identifying disorders including ulcers, bleeding, and polyps. The workflow of the proposed method for disease
Figure 1: Workflow of the proposed method: wireless capsule endoscopy images are labeled by medical experts, augmented, and used for model training; the trained model predicts one of eight classes (normal-pylorus, normal-Z-line, esophagitis, normal-cecum, polyps, ulcerative-colitis, dyed-resection-margins, dyed-lifted-polyps).
classification using wireless endoscopy is shown in Figure 1. The significant contributions of this study are as follows.

(1) A computer-assisted diagnostic system is proposed to classify GIT diseases into many categories, including anatomical landmarks, pathological observations, and polyp removal

(2) A pretrained model is used to overcome small datasets and the overfitting problem, which reduce the model accuracy [28]

(3) The VGG16, ResNet-18, and GoogLeNet pretrained CNN architectures classify gastrointestinal tract diseases from the endoscopic images after slight modification of the architecture

(4) The visual features from which the GIT disease classification decisions are obtained are visualized using the occlusion sensitivity map

(5) We also compared the modified pretrained architectures with other models, which used handcrafted features and in-depth features to detect GIT diseases, in terms of accuracy, recall, precision, F1-measure, receiver operating characteristic (ROC) curve, and Cohen's kappa score

3. Dataset Description

The dataset used in this study consists of GIT images taken with endoscopic equipment at Norway's VV health trust. The training data is obtained from a large gastroenterology department at one of the hospitals in this trust. Medical experts then meticulously annotated the dataset, named Kvasir-V2. This dataset was made available in the fall of 2017 as part of the MediaEval Medical Multimedia Challenge, a benchmarking project that assigns tasks to research groups [29]. Anatomical landmarks, pathological observations, and polyp removal are among the eight groups that make up the dataset, with 1000 images each. The images in the dataset range in resolution from 720 × 576 to 1920 × 1072 pixels. The different diseases with their corresponding class label encoding are provided in Table 1.

An anatomical landmark is a characteristic of the GIT that can be seen through an endoscope. It is necessary for navigation and as a reference point for describing the location of a given finding. It is also possible that the landmarks are specific areas for pathology, such as ulcers or inflammation. Class 0 and class 1 are the two classes of polyp removal. Class 3, class 4, and class 5 are the most important anatomical landmarks. The essential pathological findings are class 2, class 6, and class 7. Sample images from the dataset are shown in Figure 2, and the distribution of the dataset is represented in Figure 3.

4. Proposed Deep Learning Framework

To solve the issue of small dataset sizes, transfer learning was used to fine-tune three major pretrained deep neural networks, VGG16, ResNet-18, and GoogLeNet, on the training images of the augmented Kvasir version 2 dataset.

4.1. Transfer Learning. In the world of medical imaging, classifying multiple diseases using the same deep learning architecture is a difficult task. Transfer learning is a technique for repurposing a model trained on one task for a comparable task that requires some adaptation. When there are not enough training samples to train a model from scratch, transfer learning is particularly beneficial for applications like medical image classification for rare or emerging diseases. This is particularly true for deep neural network models, which must be trained with a huge number of parameters. Transfer learning enables model parameters to start with good initial values that need only minimal tweaks to suit the new problem. Transfer learning can be done in two ways: one approach is training the whole model starting from the top layers, and the other is freezing the top layers of the model and fine-tuning the rest on the new dataset. Since eight different types of diseases are considered in the proposed model, the first approach is used, where the model is trained from the top layers. VGG16, GoogLeNet, and
Figure 3: Dataset distribution (1000 images in each of the eight classes).
Figure 4: VGG16 architecture for gastrointestinal tract disease classification (input, 13 stacked 3 × 3 convolution layers with five pooling stages, three dense layers, and the output layer).
ResNet-18 are the pretrained models used for classifying the different gastrointestinal tract diseases using endoscopic images. The above pretrained models are used as baseline models, and the model performance is increased by using various performance improvement techniques.
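The paper trains its models in Caffe via NVIDIA DIGITS; as a framework-neutral illustration of the two transfer-learning strategies described in Section 4.1 (training all layers versus freezing the pretrained ones), here is a minimal NumPy sketch with a toy two-layer linear model. All names and sizes are illustrative assumptions, not the paper's setup:

```python
import numpy as np

# Toy sketch of the two transfer-learning strategies: W1 stands in for the
# pretrained early layers, W2 for the newly attached output layer.
rng = np.random.default_rng(0)
W1_pretrained = rng.normal(size=(4, 4))

def fine_tune(freeze_early, steps=5, lr=0.1):
    W1 = W1_pretrained.copy()            # start from the "pretrained" weights
    W2 = rng.normal(size=(4, 2))         # fresh output layer for the new classes
    x = rng.normal(size=(1, 4))
    y = np.array([[1.0, 0.0]])
    for _ in range(steps):
        h = x @ W1                       # forward pass through both layers
        d_out = h @ W2 - y               # gradient of 0.5 * ||out - y||^2
        g2 = h.T @ d_out                 # gradient w.r.t. the output layer
        g1 = x.T @ (d_out @ W2.T)        # gradient w.r.t. the early layer
        W2 -= lr * g2
        if not freeze_early:             # strategy 1: update every layer
            W1 -= lr * g1                # (strategy 2 leaves W1 untouched)
    return W1

W1_full = fine_tune(freeze_early=False)
W1_frozen = fine_tune(freeze_early=True)
print(np.allclose(W1_frozen, W1_pretrained))   # True: frozen layers unchanged
print(np.allclose(W1_full, W1_pretrained))     # False: whole network adapted
```

The paper follows the first strategy, since the eight target classes differ substantially from the original ImageNet task.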
4.2. Gastrointestinal Tract Disease Classification Using VGG16. The VGG16 model comprises 16 layers, which consist of 13 convolution layers and three dense layers. The model was initially introduced in 2014 for the ImageNet competition. VGG16 is one of the best models for image classification. Figure 4 depicts the architecture of the VGG16 model.

Instead of a large number of hyperparameters, the model focuses on 3 × 3 convolution layers with stride one and the same padding throughout. The max-pooling layers use a 2 × 2 filter with a stride of two. The model is completed by two dense layers, followed by the softmax layer. There are approximately 138 million parameters in the model [30]. Dense layers 1 and 2 consist of 4096 nodes each. Dense layer 1 contains the largest number of parameters, approximately 100 million. The number of parameters in that particular layer can be reduced without degrading the performance of the model.
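The parameter counts quoted above can be verified with quick arithmetic for the original ImageNet configuration of VGG16 (the modified network in this paper replaces the output layer for 8 classes, so its total differs slightly); a sketch:

```python
# Parameter arithmetic for the standard VGG16: 13 conv layers (3x3 kernels)
# and 3 dense layers with a 1000-way ImageNet output.
conv_channels = [
    (3, 64), (64, 64),                        # block 1
    (64, 128), (128, 128),                    # block 2
    (128, 256), (256, 256), (256, 256),       # block 3
    (256, 512), (512, 512), (512, 512),       # block 4
    (512, 512), (512, 512), (512, 512),       # block 5
]
# Each conv layer: kernel_h * kernel_w * in_channels * out_channels + biases.
conv_params = sum(3 * 3 * c_in * c_out + c_out for c_in, c_out in conv_channels)

dense1 = 7 * 7 * 512 * 4096 + 4096            # ~102.8M: the "100 million" layer
dense2 = 4096 * 4096 + 4096
output = 4096 * 1000 + 1000

total = conv_params + dense1 + dense2 + output
print(total)                                  # 138357544, i.e. ~138 million
```

This also shows why pruning dense layer 1 is the natural place to cut parameters: it alone holds roughly three quarters of the model.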
4.3. Gastrointestinal Tract Disease Classification Using ResNet-18. Another pretrained model for classifying gastrointestinal tract disease from endoscopic images is the ResNet-18 model. Figure 5 depicts the architecture of ResNet-18. This model is based on a convolutional neural network, one of the most common architectures, designed for efficient training. It allows for a smooth gradient flow. The identity shortcut link in the ResNet-18 model skips one or more layers. This allows the network to have a direct connection to the network's first layers, rendering gradient updates much easier for those layers [31]. The ResNet-18 model comprises 17 convolution layers and one fully connected layer.
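The identity shortcut described above can be sketched in a few lines (a conceptual NumPy toy, not the actual convolutional block):

```python
import numpy as np

# Minimal sketch of the identity shortcut: the block output is F(x) + x, so
# even if the learned branch F contributes nothing, the input passes through
# unchanged and gradients can flow directly to earlier layers.
def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, W1, W2):
    f = relu(x @ W1) @ W2      # learned branch (two conv layers in ResNet-18)
    return relu(f + x)         # identity shortcut: add the unmodified input

x = np.array([[1.0, 2.0, 3.0]])
zero = np.zeros((3, 3))
print(residual_block(x, zero, zero))   # [[1. 2. 3.]]: identity survives F == 0
```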
Figure 5: ResNet-18 architecture for gastrointestinal tract disease classification.

4.4. Gastrointestinal Tract Disease Classification Using GoogLeNet. In many transfer learning tasks, the GoogLeNet model is a deep CNN model that obtains good classification accuracy while improving compute efficiency. With a top-5 error rate of 6.67%, GoogLeNet, commonly known as the Inception model, won the ImageNet competition in 2014. The inception module is shown in Figure 6, and the GoogLeNet architecture is shown in Figure 7. It has 22 layers, including 2 convolution layers, 4 max-pooling layers, and 9 linearly stacked inception modules. Average pooling is introduced at the end of the last inception module. To execute the dimension reduction, the 1 × 1 filter is employed before the more expensive 3 × 3 and 5 × 5 operations. Compared to the AlexNet model, the GoogLeNet model has roughly twelve times fewer parameters.

4.5. Data Augmentation. CNN models have proven suitable for many computer vision tasks; however, they require a considerable amount of training data to avoid overfitting. Overfitting occurs when a deep learning model learns a high-variance function that precisely models the training data but has a narrow range of generalizability. But in many cases, especially for medical image datasets, where
obtaining a large amount of data is a tedious task. Different data augmentation techniques are used to increase the size and consistency of the data to solve the issue of overfitting. These techniques produce surrogate data that has been subjected to different rotations, width shifts, height shifts, zooming, and horizontal flips but is not the same as the original data. The rotation range is fixed at 45°, the width- and height-shift range is 0.2, the zooming range is 0.2, and horizontal flipping is enabled. The augmented dataset derived from the original Kvasir version 2 dataset is shown in Figure 8.

5. Results and Discussion

In this work, the Kvasir version 2 dataset is used for the classification of GIT diseases. The entire dataset is divided into an 80% training and 20% validation set. NVIDIA DIGITS uses the Caffe deep learning system to build the pretrained CNN models. The CNN pretrained models are trained and tested on a system configured with an Intel i9 processor and a 32 GB NVIDIA Quadro RTX 6000 GPU. The pretrained models are written with the Caffe deep learning framework in the NVIDIA DIGITS platform. Images with resolutions ranging from 720 × 576 to 1920 × 1072 pixels were transformed to 256 × 256 pixels in the collected dataset. The augmented dataset consists of 33536 images, with 4192 images in each class. The augmented dataset is then divided into an 80% training and 20% validation set: there are 26832 images in the training set and 6704 images in the validation set. The pretrained models are trained from scratch with hyperparameters of 30 epochs, a batch size of 8, the Adam optimizer, and a learning rate of 1e-05 with a step size of 33%, chosen via trial and error while considering the computing facility. The Adam optimizer is used due to its reduced complexity during model training [32]. The softmax classification layer and categorical cross-entropy are used in the output of the pretrained model, as given in equations (1) and (2):

σ(Z)_i = e^{z_i} / Σ_{j=1}^{K} e^{z_j},  (1)

where σ denotes the softmax function, Z denotes the input vector, e^{z_i} denotes the standard exponential of the i-th input element, K denotes the number of classes, and e^{z_j} denotes the standard exponential of the j-th input element.
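Equation (1) is the standard softmax; a small sketch (with the usual max-subtraction for numerical stability, which leaves the result unchanged):

```python
import numpy as np

# Equation (1): softmax over the K class scores. Subtracting max(z) first is
# a standard numerical-stability trick that does not change the result.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)             # three positive values summing to 1 (a distribution)
print(p.argmax())    # 0: the largest score receives the largest probability
```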
Confusion matrix (rows: truth data; columns: classifier results, classes 0-7):
Class 0: 824  11   0    0    0    0    3    0
Class 1:  18 819   1    0    0    0    0    0
Class 2:   0   0 764    0    1   72    1    0
Class 3:   0   0   0  831    0    0    4    3
Class 4:   0   0   0    0  835    0    3    0
Class 5:   0   0  80    0    0  757    1    0
Class 6:   2   0   0    6    2    0  819    9
Class 7:   0   1   0   11    2    0   15  809
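As a consistency check, the overall accuracy implied by this confusion matrix can be recomputed from its diagonal (a sketch; the matrix is transcribed from the figure above):

```python
import numpy as np

# Confusion matrix transcribed row by row (rows: truth data, columns:
# classifier results). Summing the diagonal (the TPV of each class) and
# dividing by the total recovers the overall validation accuracy.
cm = np.array([
    [824,  11,   0,   0,   0,   0,   3,   0],
    [ 18, 819,   1,   0,   0,   0,   0,   0],
    [  0,   0, 764,   0,   1,  72,   1,   0],
    [  0,   0,   0, 831,   0,   0,   4,   3],
    [  0,   0,   0,   0, 835,   0,   3,   0],
    [  0,   0,  80,   0,   0, 757,   1,   0],
    [  2,   0,   0,   6,   2,   0, 819,   9],
    [  0,   1,   0,  11,   2,   0,  15, 809],
])
accuracy = np.trace(cm) / cm.sum()
recall = np.diag(cm) / cm.sum(axis=1)      # per-class recall (row-normalised)
precision = np.diag(cm) / cm.sum(axis=0)   # per-class precision (column-normalised)
print(round(accuracy * 100, 2))            # 96.33: matches the reported accuracy
```

Note the matrix total is 6704 images, i.e. the 20% validation split of the 33536 augmented images.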
and a high training loss of 0.58 after 30 epochs. The GoogLeNet model obtained a top-1 accuracy of 91.21%, a top-5 accuracy of 100%, and a training loss of 0.21.

After the model training is completed, the models are validated with the validation dataset, and the confusion matrix is drawn from it. Figures 12-14 represent the confusion matrices of the three pretrained models validated on the validation dataset. The confusion matrix is drawn with truth data and classifier results. From the confusion matrix, the True Positive Value (TPV), False Positive Value (FPV), True Negative Value (TNV), and False Negative Value (FNV) are calculated. The diagonal elements represent the TPV of the corresponding class. The different performance metrics, such as top-1 accuracy, top-5 accuracy, recall, precision, and Cohen's kappa score, are calculated using the equations mentioned in Table 2.

The kappa coefficient is the de facto norm for assessing rater agreement, as it eliminates the agreement expected due to chance. Cohen's kappa value is obtained by equation (3), where G denotes the overall number of correctly predicted samples, H denotes the total number of elements, c_l denotes the overall number of times class l was predicted, and s_l denotes the overall number of times class l occurred [33]:

CK = (G × H − Σ_{l=1}^{L} c_l × s_l) / (H² − Σ_{l=1}^{L} c_l × s_l).  (3)

The kappa coefficient is used when the number of classes is high to determine classification performance. The kappa score ranges from 0 to 1, and its interpretation is provided in Table 3.
Model name Top_1 ACC (%) Top_5 ACC (%) Recall (%) Precision (%) F1-measure (%) Kappa score
VGG16 96.33 100 96.37 96.50 96.50 0.96
GoogLeNet 90.27 100 90.33 90.27 90.37 0.89
ResNet-18 78.77 99.99 78.91 78.77 78.75 0.75
Method Accuracy
DenseNet-201 [34] 90.74
ResNet-18 [34] 88.43
Baseline+Inceptionv3 + VGGNet [35] 96.11
Ensemble model [36] 93.7
Logistic regression tree [29] 94.2
Proposed method 96.33
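The discussion below also compares these models by Matthews Correlation Coefficient (MCC). The multiclass generalization of MCC (the R_K statistic) can be computed from a confusion matrix as follows; this is a sketch, not the authors' code:

```python
import numpy as np

# Multiclass Matthews Correlation Coefficient (the R_K statistic) from a
# confusion matrix: +1 is perfect prediction, 0 is chance-level prediction.
def mcc(cm):
    cm = np.asarray(cm, dtype=float)
    t = cm.sum(axis=1)          # true occurrences per class
    p = cm.sum(axis=0)          # predictions per class
    c, s = np.trace(cm), cm.sum()
    num = c * s - (t * p).sum()
    den = np.sqrt((s ** 2 - (p ** 2).sum()) * (s ** 2 - (t ** 2).sum()))
    return num / den

print(mcc(np.eye(3) * 10))      # 1.0: perfect classifier
print(mcc(np.ones((2, 2))))     # 0.0: no better than chance
```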
Figure 15: (a) VGG16 ROC curves for GIT classification (area under the curve: class 0 = 0.94, class 1 = 0.94, class 2 = 0.95, class 3 = 0.98, class 4 = 1.00, class 5 = 0.95, class 6 = 0.94, class 7 = 0.96; micro-average = 0.96; macro-average = 0.96). (b) Heat map for test data (original image and heat map; ulcerative-colitis example).
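The occlusion sensitivity test behind the heat map in Figure 15(b) can be sketched as follows; a toy `score_fn` stands in for the trained CNN, and the names and patch size are illustrative:

```python
import numpy as np

# Occlusion sensitivity sketch: mask one patch of the input at a time, re-run
# the model, and record the drop in the predicted class score. Large drops
# mark the image regions the prediction depends on.
def occlusion_map(image, score_fn, patch=4, baseline=0.0):
    h, w = image.shape
    ref = score_fn(image)
    heat = np.zeros((h // patch, w // patch))
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = baseline   # mask one patch
            heat[i // patch, j // patch] = ref - score_fn(occluded)
    return heat

# Dummy "model" whose score depends only on the top-left corner of the image.
score = lambda img: img[:4, :4].sum()
img = np.ones((8, 8))
print(occlusion_map(img, score, patch=4))   # only the top-left cell is nonzero
```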
model, which requires high computation power, and obtained a Matthews Correlation Coefficient (MCC) of 0.826. In [36], a CNN and transfer learning model is proposed to classify GIT diseases using global features. The model achieves an accuracy of 93.7% with an MCC value of 0.71. The logistic model tree proposed in [29] uses handcrafted features from 4000 images and achieves an accuracy of 94.2% but with a poor MCC value of 0.72. A significant disadvantage of such approaches is the knowledge of feature extraction and feature selection techniques they require. The modified pretrained model VGG16 obtained an MCC value of 0.95, which outperforms all the other models. From the MCC of all the state-of-the-art methods, we found that the modified VGG16 method proves to be in almost perfect agreement for classifying GIT diseases.

The time complexity of the modified pretrained models is compared with that of the other models in classifying GIT diseases. The proposed models VGG16, GoogLeNet, and ResNet-18 reported training times of 1 hour 50 minutes, 1 hour 7 minutes, and 57 minutes, respectively. The literature shows that DenseNet-201 [34] and ResNet-18 [34] have been trained for more than 10 hours. The ROC curve in Figure 15(a) depicts the tradeoff between true-positive and false-positive rates, showing the performance of the classification model at different classification thresholds. The ROC is drawn for the eight classes to determine the better threshold for each category. A curve that hugs the top left corner indicates better classification performance. Occlusion sensitivity is used to assess the deep neural network's sensitivity map and identify the input image areas responsible for a predicted diagnosis. The heat map for test data is shown in Figure 15(b). This test procedure identified the region of interest, which was crucial in the development of the VGG16 model. The model's occlusion sensitivity map is visualized to determine the areas of greatest concern when evaluating a diagnosis. The occlusion test's greatest advantage is that it gives insights into neural network decisions, which are otherwise black boxes. The input can be occluded without disrupting the model's performance, since the evaluation was performed at the end of the experiment.

6. Conclusion

These findings show that the most recent pretrained models, such as VGG16, ResNet-18, and GoogLeNet, can be used in medical imaging domains such as image processing and analysis. CNN models can advance medical imaging technology by offering a higher degree of automation while also speeding up processes and increasing efficiency. The algorithm in this study obtained a state-of-the-art result in gastrointestinal tract disease classification, with 96.33% accuracy and equally high sensitivity and specificity. Transfer learning is helpful for various challenging tasks and is one solution to computer vision problems for which only small datasets are often accessible. Medical applications demonstrate that advanced CNN architectures can generalize and acquire very rich features, mapping information on images similar to those in the ImageNet database and correctly classifying very different cases. Compared to the various machine learning and deep learning models used to classify gastrointestinal tract disease, the VGG16 model achieves better results of 96.33% accuracy, a 0.96 Cohen's kappa score, and a 0.95 MCC. The requirement of manually marking data is the algorithm's weakest point. As a result, the network could inherit some flaws from an analyst, as diagnosing diseases correctly is difficult even for humans in many cases. Using a larger dataset labelled by a larger community of experts will be one way to overcome this limitation.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there is no conflict of interest regarding the publication of this article.

References

[1] M. A. Khan, M. A. Khan, F. Ahmed et al., "Gastrointestinal diseases segmentation and classification based on duo-deep architectures," Pattern Recognition Letters, vol. 131, pp. 193-204, 2020.
[2] M. A. Khan, M. Rashid, M. Sharif, K. Javed, and T. Akram, "Classification of gastrointestinal diseases of stomach from WCE using improved saliency-based method and discriminant features selection," Multimedia Tools and Applications, vol. 78, no. 19, pp. 27743-27770, 2019.
[3] T. Rahim, M. A. Usman, and S. Y. Shin, "A survey on contemporary computer-aided tumor, polyp, and ulcer detection methods in wireless capsule endoscopy imaging," Computerized Medical Imaging and Graphics, vol. 85, p. 101767, 2020.
[4] A. Liaqat, M. A. Khan, J. H. Shah, M. Sharif, M. Yasmin, and S. L. Fernandes, "Automated ulcer and bleeding classification from WCE images using multiple features fusion and selection," Journal of Mechanics in Medicine and Biology, vol. 18, no. 4, article 1850038, 2018.
[5] N. Dey, A. S. Ashour, F. Shi, and R. S. Sherratt, "Wireless capsule gastrointestinal endoscopy: direction-of-arrival estimation based localization survey," IEEE Reviews in Biomedical Engineering, vol. 10, pp. 2-11, 2017.
[6] A. S. Ashour, N. Dey, W. S. Mohamed et al., "Colored video analysis in wireless capsule endoscopy: a survey of state-of-the-art," Current Medical Imaging Formerly Current Medical Imaging Reviews, vol. 16, no. 9, pp. 1074-1084, 2020.
[7] Q. Wang, N. Pan, W. Xiong, H. Lu, N. Li, and X. Zou, "Reduction of bubble-like frames using a RSS filter in wireless capsule endoscopy video," Optics & Laser Technology, vol. 110, pp. 152-157, 2019.
[8] M. T. K. B. Ozyoruk, G. I. Gokceler, T. L. Bobrow et al., "EndoSLAM dataset and an unsupervised monocular visual odometry and depth estimation approach for endoscopic videos: endo-SfMLearner," Medical Image Analysis, vol. 71, article 102058, 2021.
[9] M. Islam, B. Chen, J. M. Spraggins, R. T. Kelly, and K. S. Lau, "Use of single-cell -omic technologies to study the gastrointestinal tract and diseases, from single cell identities to patient features," Gastroenterology, vol. 159, no. 2, pp. 453-466.e1, 2020.
[10] T.-C. Hong, J. M. Liou, C. C. Yeh et al., "Endoscopic submucosal dissection comparing with surgical resection in patients with early gastric cancer - a single center experience in Taiwan," Journal of the Formosan Medical Association, vol. 119, no. 12, pp. 1750-1757, 2020.
[11] M. Suriya, V. Chandran, and M. G. Sumithra, "Enhanced deep convolutional neural network for malarial parasite classification," International Journal of Computers and Applications, pp. 1-10, 2019.
[12] T. M. Berzin, S. Parasa, M. B. Wallace, S. A. Gross, A. Repici, and P. Sharma, "Position statement on priorities for artificial intelligence in GI endoscopy: a report by the ASGE Task Force," Gastrointestinal Endoscopy, vol. 92, no. 4, pp. 951-959, 2020.
[13] S. Murugan, C. Venkatesan, M. G. Sumithra et al., "DEMNET: a deep learning model for early diagnosis of Alzheimer diseases and dementia from MR images," IEEE Access, vol. 9, pp. 90319-90329, 2021.
[14] V. Chandran, M. G. Sumithra, A. Karthick et al., "Diagnosis of cervical cancer based on ensemble deep learning network using colposcopy images," vol. 2021, pp. 1-15, 2021.
[15] A. Khosla, P. Khandnor, and T. Chand, "A comparative analysis of signal processing and classification methods for different applications based on EEG signals," Biocybernetics and Biomedical Engineering, vol. 40, no. 2, pp. 649-690, 2020.
[16] P. Tang, Q. Liang, X. Yan et al., "Efficient skin lesion segmentation using separable-Unet with stochastic weight averaging," Computer Methods and Programs in Biomedicine, vol. 178, pp. 289-301, 2019.
[17] S. Igarashi, Y. Sasaki, T. Mikami, H. Sakuraba, and S. Fukuda, "Anatomical classification of upper gastrointestinal organs under various image capture conditions using AlexNet," Computers in Biology and Medicine, vol. 124, article 103950, 2020.
[18] A. Biniaz, R. A. Zoroofi, and M. R. Sohrabi, "Automatic reduction of wireless capsule endoscopy reviewing time based on factorization analysis," Biomedical Signal Processing and Control, vol. 59, p. 101897, 2020.
[19] S. Jain, A. Seal, A. Ojha et al., "Detection of abnormality in wireless capsule endoscopy images using fractal features," Computers in Biology and Medicine, vol. 127, p. 104094, 2020.
[20] T. Cogan, M. Cogan, and L. Tamil, "MAPGI: accurate identification of anatomical landmarks and diseased tissue in gastrointestinal tract using deep learning," Computers in Biology and Medicine, vol. 111, article 103351, 2019.
[21] H. Alaskar, A. Hussain, N. Al-Aseem, P. Liatsis, and D. Al-Jumeily, "Application of convolutional neural networks for automated ulcer detection in wireless capsule endoscopy images," Sensors, vol. 19, no. 6, p. 1265, 2019.
[22] T. Ozawa, S. Ishihara, M. Fujishiro et al., "Novel computer-assisted diagnosis system for endoscopic disease activity in patients with ulcerative colitis," Gastrointestinal Endoscopy, vol. 89, no. 2, pp. 416-421.e1, 2019.
[23] V. Vani and K. M. Prashanth, "Ulcer detection in wireless capsule endoscopy images using deep CNN," Journal of King Saud University - Computer and Information Sciences, 2020.
[24] K. İncetan, I. O. Celik, A. Obeid et al., "VR-Caps: a virtual environment for capsule endoscopy," Medical Image Analysis, vol. 70, p. 101990, 2021.
[25] A. K. Kundu and S. A. Fattah, "Probability density function based modeling of spatial feature variation in capsule endoscopy data for automatic bleeding detection," Computers in Biology and Medicine, vol. 115, article 103478, 2019.
[26] M. Abra Ayidzoe, Y. Yu, P. K. Mensah, J. Cai, K. Adu, and Y. Tang, "Gabor capsule network with preprocessing blocks for the recognition of complex images," Machine Vision and Applications, vol. 32, no. 4, 2021.
[27] S. Mohapatra, J. Nayak, M. Mishra, G. K. Pati, B. Naik, and T. Swarnkar, "Wavelet transform and deep convolutional neural network-based smart healthcare system for gastrointestinal disease detection," Interdisciplinary Sciences: Computational Life Sciences, vol. 13, no. 2, pp. 212-228, 2021.
[28] P. Muruganantham and S. M. Balakrishnan, "A survey on deep learning models for wireless capsule endoscopy image analysis," International Journal of Cognitive Computing in Engineering, vol. 2, pp. 83-92, 2021.
[29] K. Pogorelov, K. R. Randel, C. Griwodz et al., "KVASIR: a multi-class image dataset for computer aided gastrointestinal disease detection," in Proceedings of the 8th ACM on Multimedia Systems Conference, pp. 164-169, New York, NY, USA, 2017.
[30] A. Caroppo, A. Leone, and P. Siciliano, "Deep transfer learning approaches for bleeding detection in endoscopy images," Computerized Medical Imaging and Graphics, vol. 88, article 101852, 2021.
[31] S. Minaee, R. Kafieh, M. Sonka, S. Yazdani, and G. Jamalipour Soufi, "Deep-COVID: predicting COVID-19 from chest X-ray images using deep transfer learning," Medical Image Analysis, vol. 65, p. 101794, 2020.
[32] M. N. Y. Ali, M. G. Sarowar, M. L. Rahman, J. Chaki, N. Dey, and J. M. R. S. Tavares, "Adam deep learning with SOM for human sentiment classification," International Journal of Ambient Computing and Intelligence, vol. 10, no. 3, pp. 92-116, 2019.
[33] M. Grandini, E. Bagli, and G. Visani, "Metrics for multi-class classification: an overview," 2020, https://arxiv.org/abs/2008.05756.
[34] C. Gamage, I. Wijesinghe, C. Chitraranjan, and I. Perera, "GI-Net: anomalies classification in gastrointestinal tract through endoscopic imagery with deep learning," in 2019 Moratuwa Engineering Research Conference (MERCon), pp. 66-71, Moratuwa, Sri Lanka, 2019.
[35] T. Agrawa, R. Gupta, S. Sahu, and C. E. Wilson, "SCL-UMD at the medico task-mediaeval 2017: transfer learning based classification of medical images," CEUR Workshop Proceedings, vol. 1984, pp. 3-5, 2017.
[36] S. S. A. Naqvi, S. Nadeem, M. Zaid, and M. A. Tahir, "Ensemble of texture features for finding abnormalities in the gastrointestinal tract," CEUR Workshop Proceedings, vol. 1984, 2017.