
Shi et al. World Journal of Surgical Oncology (2024) 22:40
https://doi.org/10.1186/s12957-024-03321-9

REVIEW  Open Access

The value of machine learning approaches in the diagnosis of early gastric cancer: a systematic review and meta-analysis

Yiheng Shi1,2†, Haohan Fan2†, Li Li1,4†, Yaqi Hou5, Feifei Qian1,2, Mengting Zhuang1,2, Bei Miao1,3* and Sujuan Fei1,4*

Abstract

Background: The application of machine learning (ML) for identifying early gastric cancer (EGC) has drawn increasing attention. However, evidence-based support for its specific diagnostic performance is lacking. Hence, this systematic review and meta-analysis was implemented to assess the performance of image-based ML in EGC diagnosis.

Methods: We performed a comprehensive electronic search in PubMed, Embase, Cochrane Library, and Web of Science up to September 25, 2022. QUADAS-2 was selected to judge the risk of bias of the included articles. We did the meta-analysis using a bivariate mixed-effects model. Sensitivity analyses and heterogeneity tests were performed.

Results: Twenty-one articles were included. The sensitivity (SEN), specificity (SPE), and SROC of ML-based models were 0.91 (95% CI: 0.87–0.94), 0.85 (95% CI: 0.81–0.89), and 0.94 (95% CI: 0.39–1.00) in the training set and 0.90 (95% CI: 0.86–0.93), 0.90 (95% CI: 0.86–0.92), and 0.96 (95% CI: 0.19–1.00) in the validation set. The SEN, SPE, and SROC of EGC diagnosis by non-specialist clinicians were 0.64 (95% CI: 0.56–0.71), 0.84 (95% CI: 0.77–0.89), and 0.80 (95% CI: 0.29–0.97), and those by specialist clinicians were 0.80 (95% CI: 0.74–0.85), 0.88 (95% CI: 0.85–0.91), and 0.91 (95% CI: 0.37–0.99). With the assistance of ML models, the SEN of non-specialist physicians in the diagnosis of EGC was significantly improved (0.76 vs 0.64).

Conclusion: ML-based diagnostic models show strong performance in the identification of EGC. The diagnostic accuracy of non-specialist clinicians can be improved to the level of specialists with the assistance of ML models. The results suggest that ML models can better assist less experienced clinicians in diagnosing EGC under endoscopy and have broad clinical application value.

Keywords: Machine learning, Gastric cancer, Artificial intelligence, Endoscopy, Neural networks


Yiheng Shi, Haohan Fan, and Li Li contributed equally to this work and share first authorship.
*Correspondence:
Bei Miao
[email protected]
Sujuan Fei
[email protected]
Full list of author information is available at the end of the article

© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Background
Gastric cancer (GC) is among the most prevailing gastrointestinal malignancies. Global Cancer Statistics [1] indicated that in 2020, there were 1,089,103 newly diagnosed GC patients and 768,793 GC-caused deaths, with morbidity ranking 5th and mortality ranking 4th among all types of cancer. This makes it a great hazard to public health worldwide [2]. The popularity of endoscopic screening, the improvements in comprehensive treatment strategies and surgical modalities, and the effective treatment of Helicobacter pylori (HP) infection in recent years have reduced the morbidity of GC, but patients still have poor 5-year survival [3]. The median survival differs between patients at an early stage and those at an advanced stage. Endoscopic therapy is recommended for GC patients staged T1 by the AJCC-TNM system. The 5-year survival rate of these patients can reach more than 95%, and some of them can achieve a complete recovery [3, 4]. In contrast, the median survival of those at an advanced stage (stage IV) is less than 12 months despite systematic treatment [5]. Hence, timely identification of early gastric cancer (EGC) is essential to the prognosis of patients.

Endoscopy is a prevalently used approach in clinical screening for gastrointestinal malignancies, and the identification of EGC depends greatly on endoscopic biopsy. Despite its high sensitivity and capability of identifying most of the cases, there is still a considerable missed-diagnosis rate [6]. It is reported that the missed-diagnosis rate of upper gastrointestinal malignancies reaches 15% in Western populations and can be over 25% in Eastern countries such as Japan [7–9]. Endoscopy-based diagnosis relies largely on the image quality and the endoscopists' level of skill. An obscure image could easily misguide endoscopists to take the mucosal lesions of EGC for chronic atrophic gastritis [10], and the skill of endoscopists requires training and practicing for a long time [8]. In China, due to the large population base, severe imbalance of regional medical development, and uneven levels of doctors, the detection rate of EGC is not ideal. According to reports [11, 12], the detection rate of EGC in China is less than 5%, and the rate of missed diagnosis under endoscopy is about 10%, which is obviously unfavorable to the prognosis of patients. In addition, the identification of EGC in gastroscopy mainly relies on the visual diagnosis and empirical judgment of doctors, which also poses a huge challenge to the accurate detection of EGC. Thus, there is an urgent need for effective approaches that can assist clinicians in endoscopic diagnosis and improve the diagnostic rate of EGC.

Machine learning (ML)-based endoscopy for EGC diagnosis has currently attracted extensive attention in clinical settings [13–15]. Deep learning (DL) methods based on convolutional neural networks (CNN) exhibit great advantages in image recognition, segmentation, and feature extraction. Several studies have confirmed that they can be an auxiliary way to improve the accuracy of cancer diagnosis [16, 17]. However, there are diverse algorithms, and there is significant heterogeneity among different ML models. Even for the same ML model combined with different predictors, the diagnostic effect may vary. Therefore, ML can be a potential tool assisting in the diagnosis of EGC, but its performance lacks evidence-based support. Thereby, this systematic review and meta-analysis was performed to appraise the performance of ML-based endoscopy for EGC diagnosis, to provide evidence to update artificial intelligence (AI) tools in this field.

Methods
We conducted this study in strict accordance with the PRISMA 2020 statement [18]. The protocol of this study has been registered on PROSPERO (registration No. CRD42022374248).

Selection criteria
Inclusion criteria

– Types of participants: Adult EGC patients whose baseline characteristics and image information were recorded
– Types of study: Randomized controlled trial (RCT), case–control study, cohort study, nested case–control study, and case-cohort study
– Constructed a complete ML-based model for EGC diagnosis
– With or without the process of external validation. In ML research, it is difficult to conduct independent external validation due to limited conditions, so validation methods such as K-fold cross-validation or the leave-one-out method are utilized. However, we cannot ignore the contributions that these studies have made, as we need to consider overfitting from the perspective of evidence-based medicine. Therefore, these articles were also included
– Studies using different ML models based on the same data set. On certain publicly authoritative datasets, different ML models have been developed, which were also included
– Reported and published in English

Exclusion criteria

– Other types of study, such as meta-analysis, review, guideline, and expert comments
– Only performed analysis for the risk factors, with no complete ML-based model constructed
– Lacked the following outcome measures: sensitivity (SEN), specificity (SPE), receiver operating characteristic curve (ROC), calibration curve, c-index, accuracy, precision, recall, confusion matrix, diagnostic fourfold table, and F1 score
– Assessed the accuracy using univariate analysis

Search strategy
A comprehensive electronic search was implemented up to September 25, 2022, in PubMed, Embase, Cochrane Library, and Web of Science. The strategy was designed based on Medical Subject Headings (MeSH) and free words. No restrictions were set on region and language.

Study screening and data extraction
We used EndNote X9 for the management of the retrieved papers. Following duplicate-checking, potentially eligible articles were screened by browsing the titles and abstracts, and we downloaded the full texts of potentially eligible articles. Studies that met the pre-set eligibility criteria were included after reading the full texts. A pre-designed form was adopted for extracting the data, which contained the following: title, author, publication date, nationality, study type, EGC cases, total cases, images of EGC, total images, EGC cases in training set, total cases in training set, images of EGC in training set, total images in training set, EGC cases in validation set, total cases in validation set, images of EGC in validation set, total images in validation set, model type, variables for model construction, and comparisons with clinicians. The above processes were completed independently by two reviewers (SYH and MB), and their results were cross-checked. Any disagreements among them were addressed by a third reviewer (FSJ).

Quality assessment
Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) [19] was applied for the evaluation of the risk of bias. QUADAS-2 contains the following 4 domains: patient selection, index test, reference standard, and flow and timing. Each domain includes several items that could be filled as "yes," "no," or "uncertain," corresponding to "low," "high," and "unclear" risk of bias, respectively. If all items in a domain are filled as "yes," the domain is graded as "low" risk of bias. If one item in a domain is filled as "no," there would be potential bias, and the risk should be assessed according to the established guideline. "Unclear" refers to no detailed information being provided in the study, which makes it difficult for reviewers to assess its risk of bias. The above processes were completed independently by the same two reviewers, and their results were cross-checked. Any disagreements among them were addressed by a third reviewer (FSJ).

Statistical analysis
We used a bivariate mixed-effects model for the meta-analysis. The model takes into account both fixed and random effects and better handles heterogeneity across studies and the correlation between SEN and SPE, making the results more robust and reliable [20, 21]. The numbers of true positive (TP), false positive (FP), true negative (TN), and false negative (FN) cases in the original studies were needed, while from several studies we could only obtain the SEN and SPE instead of the above information. Given this situation, we used the SEN and SPE in combination with EGC cases and total cases to calculate TP, FP, FN, and TN. Some studies only provided the ROC. In this case, we used Origin, based on the optimal Youden index, to extract the SEN and SPE from the ROC and subsequently calculated TP, FP, TN, and FN. The outcome variables in the bivariate mixed-effects model contained the SEN and SPE as well as the negative likelihood ratio (NLR), positive likelihood ratio (PLR), diagnostic odds ratio (DOR), and 95% confidence intervals (95% CI). A summarized ROC (SROC) was produced, and the area under the curve was computed. Deeks' funnel plot was utilized for publication bias assessment.

Subgroup analysis was processed based on the data sets (training set and validation set) and modeling variables (fixed images and dynamic videos). Moreover, we summarized the results of non-specialist clinicians/specialist clinicians, non-specialist clinicians/specialist clinicians with the assistance of ML, and video validation.

All the data analyses were done in Stata 15.0, and p < 0.05 implied statistical significance.

Results
Study selection
There were 8758 articles retrieved through the literature search, of which 1394 were from PubMed, 3866 from Embase, 138 from Cochrane Library, and 3360 from Web of Science; 4683 ineligible articles were removed due to duplication and other reasons. We screened the remaining 4075 articles by browsing their titles and abstracts, and 39 articles preliminarily met the inclusion criteria. Among these 39 articles, the full text of 1 study could not be obtained, and the full texts of the other 38 were read. After excluding conference summaries, reviews, studies with the full texts unavailable, and studies for which the diagnostic performance of the ML models could not be assessed, 21 articles were finally included. The flow diagram of study selection is presented in Fig. 1, and the detailed search strategies are shown in Table S1.
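The back-calculation of the 2×2 diagnostic tables and the Youden-index extraction described under Statistical analysis can be sketched as follows. This is a minimal Python sketch with illustrative numbers (not data from any included study); the function names are our own, and the actual pooling was done in Stata 15.0 as stated above.

```python
def confusion_from_rates(sen, spe, n_pos, n_total):
    """Back-calculate TP/FP/FN/TN from sensitivity, specificity,
    and case counts, rounding to whole cases."""
    n_neg = n_total - n_pos
    tp = round(sen * n_pos)   # sensitivity = TP / (TP + FN)
    fn = n_pos - tp
    tn = round(spe * n_neg)   # specificity = TN / (TN + FP)
    fp = n_neg - tn
    return tp, fp, fn, tn

def best_youden_point(roc_points):
    """Pick the (sen, spe) pair maximising Youden's J = sen + spe - 1
    from a list of digitised ROC coordinates."""
    return max(roc_points, key=lambda p: p[0] + p[1] - 1)

# Illustrative study: 200 EGC cases out of 1000, SEN 0.90, SPE 0.85
tp, fp, fn, tn = confusion_from_rates(0.90, 0.85, n_pos=200, n_total=1000)
# tp=180, fp=120, fn=20, tn=680
```

The Youden criterion simply selects the ROC operating point farthest above the diagonal, which is why it is a natural choice when a study reports only the curve.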
Fig. 1 PRISMA 2020 flow diagram of the study selection process
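The QUADAS-2 domain-grading rule described in the Quality assessment subsection of Methods (all items "yes" gives low risk; any "no" signals potential bias; missing information gives unclear) can be sketched as below. Note that collapsing "potential bias pending guideline-based review" directly to "high" is our simplification, not part of QUADAS-2 itself.

```python
def grade_domain(answers):
    """Grade one QUADAS-2 domain from its signalling-question answers,
    each drawn from {"yes", "no", "uncertain"}."""
    if all(a == "yes" for a in answers):
        return "low"      # every signalling question satisfied
    if any(a == "no" for a in answers):
        return "high"     # potential bias; a simplification of the
                          # guideline-based judgement described above
    return "unclear"      # insufficient information reported

print(grade_domain(["yes", "uncertain"]))  # unclear
```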

Characteristics of the included articles
Twenty-one studies were included [13, 22–41], of which 14 studies [13, 22, 24, 26–35, 41] were conducted in China and 7 studies [23, 25, 36–40] in Japan. There were 9 multi-centric studies [22, 24, 27–30, 34, 35, 41] and 4 prospective studies [13, 27, 30, 34]. There were 16,074 participants involved, and 454,528 endoscopic images were obtained, of which 97,950 images involved EGC. Among the included studies, 7 studies [13, 24, 30, 33–35, 39] performed real-time training or validation for ML-based models in videos, and 11 studies [24, 27, 29, 30, 32–38] provided comparisons of the diagnostic performance of the ML-based models with that of clinicians. We roughly divided those clinicians into specialists and non-specialists according to their working experience and the number of endoscopies performed yearly. The ML models involved were as follows: VGG-16, ResNet50, VGG-19, SVM, PLS-DA, ResNet34, DeepLabv3, GoogLeNet, EfficientDet, Darknet-53, ResNet101, and SSD. Detailed study characteristics are presented in Table S2.

Quality assessment
By using QUADAS-2, the included studies were generally graded as high quality. Detailed results of the risk of bias assessment are exhibited in Fig. 2.

Results of meta-analysis
Diagnostic performance of ML models in the image training set
There were 7 studies [24, 26–30, 35] that trained endoscopic image-based ML models for EGC diagnosis. The pooled AUC, SEN, and SPE were 0.94 (95% CI: 0.39–1.00), 0.91 (95% CI: 0.87–0.94), and 0.85 (95% CI: 0.81–0.89) (Fig. 3A, B). The PLR, NLR, and DOR were 6.2 (95% CI:
Fig. 2 Risk of bias and clinical applicability assessment of included studies by QUADAS-2

4.6–8.2), 0.11 (95% CI: 0.07–0.16), and 58 (95% CI: 29–114), respectively. No evident publication bias was found (p = 0.51). More details are provided in Supplementary Fig. 1.

Diagnostic performance of ML models in the image validation set
There were 17 studies [13, 22–24, 26, 29–40] that validated the performance of the ML models for diagnosing EGC, and 6 of them [22, 24, 26, 29, 30, 35] had included more than 1 set of data. The pooled AUC, SEN, and SPE were 0.96 (95% CI: 0.19–1.00), 0.90 (95% CI: 0.86–0.93), and 0.90 (95% CI: 0.86–0.92) (Fig. 4A, B). The PLR, NLR, and DOR were 8.7 (95% CI: 6.6–11.4), 0.11 (95% CI: 0.08–0.15), and 80 (95% CI: 47–138), respectively. No evident publication bias was noted (p = 0.84). More details are provided in Supplementary Fig. 2.
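The PLR, NLR, and DOR reported above are the standard transforms of sensitivity and specificity. A minimal sketch of the definitions follows; since the bivariate model pools all quantities jointly, the published pooled values will not reproduce exactly from the pooled SEN and SPE alone.

```python
def likelihood_ratios(sen, spe):
    """Standard diagnostic transforms of sensitivity and specificity."""
    plr = sen / (1 - spe)   # positive likelihood ratio
    nlr = (1 - sen) / spe   # negative likelihood ratio
    dor = plr / nlr         # diagnostic odds ratio = (TP*TN)/(FP*FN)
    return plr, nlr, dor

# With the validation-set point estimates SEN = SPE = 0.90:
plr, nlr, dor = likelihood_ratios(0.90, 0.90)
# plr ≈ 9.0, nlr ≈ 0.11, dor ≈ 81 -- close to the pooled 8.7, 0.11, and 80
```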
Fig. 3 Diagnostic performance of the ML models in image training set. A SROC; B forest plot of pooled SEN and SPE

Fig. 4 Diagnostic performance of the ML models in image validation set. A SROC; B forest plot of pooled SEN and SPE

Diagnostic performance of clinicians
We divided those clinicians into specialists and non-specialists according to their working experience and the number of endoscopies performed. There were 72 non-specialist clinicians, and the pooled AUC, SEN, and SPE were 0.80 (95% CI: 0.29–0.97), 0.64 (95% CI: 0.56–0.71), and 0.84 (95% CI: 0.77–0.89) (Fig. 5A, B). The PLR, NLR, and DOR were 4 (95% CI: 2.9–5.3), 0.44 (95% CI: 0.37–0.52), and 9 (95% CI: 6–13), respectively. No evident publication bias was noticed (p = 0.94). There were 76 specialist clinicians, and the pooled AUC, SEN, and SPE were 0.91 (95% CI: 0.37–0.99), 0.80 (95% CI: 0.74–0.85), and 0.88 (95% CI: 0.85–0.91) (Fig. 6A, B). The PLR, NLR, and DOR were 6.7 (95% CI: 5.4–8.4), 0.23 (95% CI: 0.18–0.30), and 29 (95% CI: 21–41), respectively. No evident publication bias existed (p = 0.27). More details are provided in Supplementary Figs. 3 and 4.

Diagnostic performance of clinicians with the assistance of ML models
There were 6 studies [13, 24, 29, 30, 35, 41] reporting the performance of clinicians in diagnosing EGC
Fig. 5 Diagnostic performance of non-specialist clinicians in the diagnosis of EGC through endoscopic images. A SROC; B forest plot of pooled SEN and SPE

Fig. 6 Diagnostic performance of specialist clinicians in the diagnosis of EGC by endoscopic images. A SROC; B forest plot of pooled SEN and SPE

with the assistance of ML models. We also divided these clinicians into specialist clinicians and non-specialist clinicians. There were 16 specialist clinicians and 12 non-specialist clinicians. With the assistance of the ML models, the pooled AUC, SEN, and SPE of non-specialist clinicians were 0.90 (95% CI: 0.36–0.99), 0.76 (95% CI: 0.68–0.83), and 0.87 (95% CI: 0.83–0.90) (Fig. 7A, B). The PLR, NLR, and DOR were 6 (95% CI: 4.1–8.3), 0.27 (95% CI: 0.19–0.38), and 21 (95% CI: 11–43). No evident publication bias existed (p = 0.10). With the assistance of the ML models, the pooled AUC, SEN, and SPE of specialist clinicians were 0.93 (95% CI: 0.38–1.00), 0.89 (95% CI: 0.82–0.93), and 0.86 (95% CI: 0.81–0.90), respectively (Fig. 8A, B). The PLR, NLR, and DOR were 6 (95% CI: 4.6–8.6), 0.13 (95% CI: 0.08–0.21), and 48 (95% CI: 26–87), respectively. No evident publication bias was noticed (p = 0.22). More details are provided in Supplementary Figs. 5 and 6.
Fig. 7 Diagnostic performance of non-specialist clinicians with assistance of the machine learning models in the diagnosis of EGC by endoscopic images. A SROC; B forest plot of pooled SEN and SPE

Fig. 8 Diagnostic performance of specialist clinicians with assistance of the machine learning models in the diagnosis of EGC by endoscopic images. A SROC; B forest plot of pooled SEN and SPE

Diagnostic performance of ML models in the video validation set
There were 4 studies [13, 24, 30, 39] that validated the diagnostic performance of ML models in real-time videos. The pooled AUC, SEN, and SPE were 0.94 (95% CI: 0.39–1.00), 0.91 (95% CI: 0.82–0.96), and 0.86 (95% CI: 0.75–0.93) (Fig. 9A, B). The PLR, NLR, and DOR were 6 (95% CI: 3.5–12.1), 0.11 (95% CI: 0.05–0.22), and 60 (95% CI: 20–176), respectively. No evident publication bias existed (p = 0.08). More details are provided in Supplementary Fig. 7.

Diagnostic performance of clinicians in the video validation set
There were 3 studies [13, 30, 39] that validated the performance of clinicians (n = 20) in the diagnosis of EGC in real-time videos. The pooled AUC, SEN, and SPE were 0.90 (95% CI: 0.58–0.98), 0.83 (95% CI: 0.77–0.88), and
Fig. 9 Performance of ML models in the diagnosis of EGC in video validation set. A SROC; B forest plot of pooled SEN and SPE

0.85 (95% CI: 0.77–0.90) (Fig. 10A, B). The PLR, NLR, and DOR were 5 (95% CI: 3.6–8.2), 0.20 (95% CI: 0.15–0.27), and 27 (95% CI: 17–44), respectively. No evident publication bias was noticed (p = 0.51). More details are provided in Supplementary Fig. 8.

Discussion
In this study, we systematically searched articles regarding the application of ML for the diagnosis of EGC, assessed the application value of image-based ML models for EGC diagnosis, and compared the performance of these models with that of clinicians of different skill levels. Moreover, we assessed the diagnostic performance of ML models in real-time videos. The analysis revealed that ML models performed better than clinicians (including specialists and non-specialists) in diagnosing endoscopic images, and that the diagnostic performance of non-specialist clinicians could be improved to the level of the specialists with the assistance of ML models. ML models presented a remarkable performance in real-time video diagnosis, with sensitivity and specificity both higher than those of clinicians.

Fig. 10 Performance of clinicians in the diagnosis of EGC in video validation set. A SROC; B forest plot of pooled SEN and SPE
ML is a crucial part of artificial intelligence. It draws on multiple disciplines and can learn and practice with a large amount of historical data to construct algorithm models that provide accurate prediction and assessment for new data [42, 43], a process that goes from summarizing experience to flexible use. ML techniques have been extensively employed in screening gastrointestinal malignancies, mainly in assisting endoscopic diagnosis, automatic pathological examination, and tumor invasion depth detection, and have produced the desired results [44]. Chang et al. [45] reviewed the diagnostic performance of endoscopic image-based ML for early esophageal cancer; the AUC, SEN, and SPE were 0.97 (95% CI: 0.95–0.99), 0.94 (95% CI: 0.89–0.96), and 0.88 (95% CI: 0.76–0.94). Jiang et al. [46] included 16 articles and found that the AUC, SEN, and SPE of AI-assisted EGC diagnosis were 0.96 (95% CI: 0.94–0.97), 86% (95% CI: 77–92%), and 93% (95% CI: 89–96%). However, Luo et al. [47] included 15 articles and reported that the pooled AUC, SEN, and SPE of endoscopic image-based AI in the detection of EGC were 0.94, 0.87 (95% CI: 0.87–0.88), and 0.88 (95% CI: 0.87–0.88). Variances in the diagnostic performance of ML models among different studies indicate significant heterogeneity among different models. ML models can have overfitting or underfitting problems when dealing with specific datasets, which can limit their application and generalization [48, 49]. Thereby, we strictly differentiated between the results of the training set and the validation set, which could help us analyze whether ML models are at risk of overfitting or underfitting and reflect whether there are any challenges in the goodness-of-fit of the existing ML models from an evidence-based medicine perspective. Fortunately, our results showed no overfitting or underfitting. Additionally, validating model performance in different datasets with adequate external validation is necessary to improve the model and increase its reliability and applicability [50].

There is a current lack of articles comparing the performance of ML-based models with clinicians of different skill levels and with ML-assisted clinicians in EGC diagnosis, as well as studies validating the diagnostic performance of ML models in real-time videos. Our study has filled this gap.

According to our study, the mainstream ML method is the CNN. The CNN is among the most typical DL models and includes multiple algorithm models such as VGG, GoogLeNet, ResNet, and DenseNet [51]. It has excellent image recognition and classification ability and has been widely applied in endoscopic image-based diagnosis [27, 52]. Fang et al. [53] revealed that the AUC, SEN, and SPE of CNNs in endoscopic image-based GC diagnosis were 0.89, 0.83, and 0.94. Md Mohaimenul Islam et al. [54] revealed that the SROC and SEN of the CNN model in EGC diagnosis were 0.95 and 0.89, respectively. Among the articles included, only 2 articles [25, 26] used conventional ML methods (SVM). Miyaki et al. [25] discovered that the mean SVM output value of the cancer lesions was 0.846 ± 0.220, which was evidently higher than that of reddened lesions (0.381 ± 0.349) and surrounding tissues (0.219 ± 0.277). Yuanpeng Li et al. [26] elicited that the SEN, SPE, and accuracy of SVM in diagnosing EGC were all over 90%, indicating its good application value. However, conventional ML methods such as SVM have more limitations compared to DL models: they rely on experienced experts to manually design the image features, require multiple calculations to obtain the best truncation value, and yield poor performance in processing large-scale data sets [44, 55, 56]. All of these problems impede the further development of conventional ML methods.

We observed, in this study, that ML-based models had a higher diagnostic sensitivity than clinicians. These models showed diagnostic performance as good as that of clinical specialists in both images and videos. With the assistance of ML, the diagnostic sensitivity of non-specialists and specialists for EGC was significantly improved, while such an improvement was not observed in the specificity, and the specificity of ML-assisted specialists was slightly lower than that of the ML models. This indicates that the assistance of ML increased the specialists' misdiagnosis rate. Misdiagnosis caused by ML models in the process of image recognition is often attributed to poor endoscopic image resolution leading to an abnormal mucosal background color, which can be induced by residual foam, blood, and food residues at the lesion site, and to confusing tissue structures such as atrophic gastritis, intestinal metaplasia, and ulcers [29, 30]. ML models could interfere with clinical experts' judgment by presenting them with misidentified information, as reported by Tang et al. [24]. In addition, in video diagnosis, the SROC, SEN, and SPE of ML models for EGC were 0.94 (95% CI: 0.39–1.00), 0.91 (95% CI: 0.82–0.96), and 0.86 (95% CI: 0.75–0.93), greater than those of clinicians, whose SROC, SEN, and SPE were 0.90 (95% CI: 0.58–0.98), 0.83 (95% CI: 0.77–0.88), and 0.85 (95% CI: 0.77–0.90). By comparing the performance of ML models in EGC diagnosis between images and real-time videos, we found that video slightly outperformed image on SEN (0.91 vs. 0.90), while image slightly outperformed video on SROC (0.96 vs. 0.94) and SPE (0.90 vs. 0.86). However, this is not enough to clarify whether ML models perform better in images or in real-time videos, because only 4 papers validated the detection performance of ML models in real-time videos, with a significantly
smaller sample size than for images. Thus, more original studies are still needed to validate the diagnostic performance of ML models in real-time videos to better compare their performance. Indeed, video diagnostics also presents unique challenges [57, 58]. First, compared to images, videos contain dynamic and time-dependent information, which makes processing and analysis more difficult. Second, the training and inference of ML models usually require high-performance computers and many computational resources; videos contain a large amount of frame and pixel information and thus impose higher computation and equipment requirements. Finally, due to the specificity of the medical field, the use of ML models for cancer diagnosis may involve many complex regulatory and ethical issues. However, it is undeniable that ML-based models can serve as an adjuvant diagnostic approach for EGC, bringing effective help to clinicians in clinical practice, especially to non-specialists. They could improve non-specialists' diagnostic performance to the level of specialists while reducing costs. The study demonstrates the feasibility of ML methods for EGC diagnosis, which facilitates the development of AI tools to provide diagnostic assistance to inexperienced clinicians and in areas where medical resources are scarce.

This study also has limitations. Firstly, most included articles were of retrospective design, and only a few articles performed prospective validation for the constructed ML models. Retrospective studies may suffer from incomplete data collection, poor quality, and bias, which affect the generalizability of the findings [49, 50]. Therefore, the performance of ML models in EGC diagnosis needs to be validated by more prospective studies. Secondly, most included articles had manually excluded images of poor quality during the image selection process, which might cause an overestimated diagnostic performance of these models. The included images were also less likely to include all types of GC lesions that could be used as controls to EGC, making it difficult to conduct comprehensive training of the models, and their application was subsequently limited. In addition, ML models in most of the included studies were constructed with DL, and subgroup analysis for different types of ML (e.g., VGG-16, ResNet50, VGG-19) could not be performed owing to the limited number of included articles. Due to the limited number of ML methods, we also failed to conduct a more detailed subgroup analysis.

Conclusion
This meta-analysis demonstrates that ML-based diagnostic models have great performance in EGC diagnosis, with sensitivity and specificity both higher than those of clinical specialists. They have great application prospects and can be used as an adjuvant approach to help clinicians make more accurate diagnoses.

Abbreviations
EGC  Early gastric cancer
ML  Machine learning
HP  Helicobacter pylori
AI  Artificial intelligence
DL  Deep learning
CNN  Convolutional neural network
RCT  Randomized controlled trial
MeSH  Medical Subject Headings
TP  True positive
FP  False positive
TN  True negative
FN  False negative
NLR  Negative likelihood ratio
PLR  Positive likelihood ratio
DOR  Diagnostic odds ratio
95% CI  95% confidence interval
SROC  Summarized receiver operator characteristic

Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12957-024-03321-9.

Additional file 1: Supplementary Fig. 1. Meta-analysis of the predictive accuracy of image-based machine learning models in diagnosis of early GC in the training cohort: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 2. Meta-analysis of the predictive accuracy of image-based machine learning models in diagnosis of early GC in the validation cohort: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 3. Meta-analysis of the predictive accuracy of non-specialist clinicians with assistance of endoscopic images in diagnosis of early GC: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 4. Meta-analysis of the predictive accuracy of specialist clinicians with assistance of endoscopic images in the diagnosis of early GC: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 5. Meta-analysis of non-specialist clinicians with assistance of the machine learning models in the diagnosis of early GC by endoscopic images: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 6. Meta-analysis of the predictive accuracy of specialist clinicians with assistance of the machine learning models in the diagnosis of early GC by endoscopic images: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 7. Meta-analysis of the predictive accuracy of machine learning models in diagnosis of early GC in the video validation cohort: (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C) Clinical application nomogram. Supplementary Fig. 8. Meta-analysis of the predictive accuracy of clinicians in diagnosis of early GC in the video validation
sis of different ML models (e.g., CNN, SVM). Lastly, the cohort (A) Funnel plot for publication bias; (B) Heterogeneity box plot; (C)
model construction in the included articles was mostly Clinical application nomogram.
based on static endoscopic images, which is different Additional file 2: Table S1. Literature search strategy. Table S2. Basic
characteristics of the included literature.
from the real-time clinical operation scenarios. More
original articles are needed to further validate the diag-
nostic performance of ML models in real-time videos.
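For reference, all of the accuracy measures abbreviated in this review are derived from the same 2×2 confusion table. A minimal Python sketch (using hypothetical counts for illustration only, not data from any included study) shows how they relate:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic-accuracy measures from a 2x2 confusion table."""
    sen = tp / (tp + fn)      # sensitivity (SEN)
    spe = tn / (tn + fp)      # specificity (SPE)
    plr = sen / (1 - spe)     # positive likelihood ratio (PLR)
    nlr = (1 - sen) / spe     # negative likelihood ratio (NLR)
    dor = plr / nlr           # diagnostic odds ratio (DOR) = (TP*TN)/(FP*FN)
    return {"SEN": sen, "SPE": spe, "PLR": plr, "NLR": nlr, "DOR": dor}

# Hypothetical 2x2 counts, for illustration only
m = diagnostic_metrics(tp=90, fp=15, tn=85, fn=10)
print(m["SEN"], m["SPE"], round(m["DOR"], 1))  # 0.9 0.85 51.0
```

Note that the pooled estimates reported in this meta-analysis are not obtained by averaging such per-study values directly, but by fitting the bivariate mixed-effects model of Chu and Cole [20] to the per-study 2×2 counts.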
Acknowledgements
We would like to thank the researchers and study participants for their contributions.

Authors' contributions
Conceptualization: YS, YH. Data curation: YS, LL. Formal analysis: YS, SF. Investigation: HF, SF. Methodology: FQ, SF. Project administration: MZ. Software: YS, LL. Supervision: LL, BM. Validation: LL, YH. Visualization: HF, YH. Writing - original draft: YS, LL, YH. Writing - review & editing: YS, BM, SF. All authors contributed to the article and approved the submitted version.

Funding
The authors declare that they did not receive any funding from any source.

Availability of data and materials
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Declarations

Ethics approval and consent to participate
Not applicable.

Consent for publication
Not applicable.

Competing interests
The authors declare no competing interests.

Author details
1 Department of Gastroenterology, The Affiliated Hospital of Xuzhou Medical University, 99 West Huaihai Road, Xuzhou 221002, Jiangsu Province, China. 2 First Clinical Medical College, Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China. 3 Institute of Digestive Diseases, Xuzhou Medical University, 84 West Huaihai Road, Xuzhou 221002, Jiangsu Province, China. 4 Key Laboratory of Gastrointestinal Endoscopy, Xuzhou Medical University, Xuzhou 221002, Jiangsu Province, China. 5 College of Nursing, Yangzhou University, Yangzhou 225009, China.

Received: 14 November 2023   Accepted: 23 January 2024

References
1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J Clin. 2021;71(3):209–49.
2. Thrift AP, El-Serag HB. Burden of gastric cancer. Clin Gastroenterol Hepatol. 2020;18(3):534–42.
3. Ajani JA, Lee J, Sano T, Janjigian YY, Fan D, Song S. Gastric adenocarcinoma. Nat Rev Dis Primers. 2017;3:17036.
4. Miller KD, Nogueira L, Devasia T, Mariotto AB, Yabroff KR, Jemal A, Kramer J, Siegel RL. Cancer treatment and survivorship statistics, 2022. CA Cancer J Clin. 2022;72(5):409–36.
5. Ajani JA, D'Amico TA, Bentrem DJ, Chao J, Cooke D, Corvera C, Das P, Enzinger PC, Enzler T, Fanta P, et al. Gastric cancer, version 2.2022, clinical practice guidelines in oncology. J Natl Compr Canc Netw. 2022;20(2):167–92.
6. Hamashima C, Okamoto M, Shabana M, Osaki Y, Kishimoto T. Sensitivity of endoscopic screening for gastric cancer by the incidence method. Int J Cancer. 2013;133(3):653–9.
7. Telford JJ, Enns RA. Endoscopic missed rates of upper gastrointestinal cancers: parallels with colonoscopy. Am J Gastroenterol. 2010;105(6):1298–300.
8. Veitch AM, Uedo N, Yao K, East JE. Optimizing early upper gastrointestinal cancer detection at endoscopy. Nat Rev Gastroenterol Hepatol. 2015;12(11):660–7.
9. Raftopoulos SC, Segarajasingam DS, Burke V, Ee HC, Yusoff IF. A cohort study of missed and new cancers after esophagogastroduodenoscopy. Am J Gastroenterol. 2010;105(6):1292–7.
10. Rugge M, Genta RM, Di Mario F, El-Omar EM, El-Serag HB, Fassan M, Hunt RH, Kuipers EJ, Malfertheiner P, Sugano K, et al. Gastric cancer as preventable disease. Clin Gastroenterol Hepatol. 2017;15(12):1833–43.
11. Ren W, Yu J, Zhang ZM, Song YK, Li YH, Wang L. Missed diagnosis of early gastric cancer or high-grade intraepithelial neoplasia. World J Gastroenterol. 2013;19(13):2092–6.
12. Pimenta-Melo AR, Monteiro-Soares M, Libânio D, Dinis-Ribeiro M. Missing rate for gastric cancer during upper gastrointestinal endoscopy: a systematic review and meta-analysis. Eur J Gastroenterol Hepatol. 2016;28(9):1041–9.
13. Li J, Zhu Y, Dong Z, He X, Xu M, Liu J, Zhang M, Tao X, Du H, Chen D, et al. Development and validation of a feature extraction-based logical anthropomorphic diagnostic system for early gastric cancer: a case-control study. EClinicalMedicine. 2022;46:101366.
14. van der Sommen F, de Groof J, Struyvenberg M, van der Putten J, Boers T, Fockens K, Schoon EJ, Curvers W, de With P, Mori Y, et al. Machine learning in GI endoscopy: practical guidance in how to interpret a novel field. Gut. 2020;69(11):2035–45.
15. Gottlieb K, Daperno M, Usiskin K, Sands BE, Ahmad H, Howden CW, Karnes W, Oh YS, Modesto I, Marano C, et al. Endoscopy and central reading in inflammatory bowel disease clinical trials: achievements, challenges and future developments. Gut. 2021;70(2):418–26.
16. Rezaeijo SM, Chegeni N, Baghaei Naeini F, Makris D, Bakas S. Within-modality synthesis and novel radiomic evaluation of brain MRI scans. Cancers (Basel). 2023;15(14):3565.
17. Khanfari H, Mehranfar S, Cheki M, Mohammadi Sadr M, Moniri S, Heydarheydari S, Rezaeijo SM. Exploring the efficacy of multi-flavored feature extraction with radiomics and deep features for prostate cancer grading on mpMRI. BMC Med Imaging. 2023;23(1):195.
18. Page MJ, McKenzie JE, Bossuyt PM, Boutron I, Hoffmann TC, Mulrow CD, Shamseer L, Tetzlaff JM, Akl EA, Brennan SE, et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ. 2021;372:n71.
19. Whiting PF, Rutjes AW, Westwood ME, Mallett S, Deeks JJ, Reitsma JB, Leeflang MM, Sterne JA, Bossuyt PM. QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Ann Intern Med. 2011;155(8):529–36.
20. Chu H, Cole SR. Bivariate meta-analysis of sensitivity and specificity with sparse data: a generalized linear mixed model approach. J Clin Epidemiol. 2006;59(12):1331–2 (author reply 1332–1333).
21. McDowell M, Jacobs P. Meta-analysis of the effect of natural frequencies on Bayesian reasoning. Psychol Bull. 2017;143(12):1273–312.
22. Yao Z, Jin T, Mao B, Lu B, Zhang Y, Li S, Chen W. Construction and multi-center diagnostic verification of intelligent recognition system for endoscopic images from early gastric cancer based on YOLO-V3 algorithm. Front Oncol. 2022;12:815951.
23. Ueyama H, Kato Y, Akazawa Y, Yatagai N, Komori H, Takeda T, Matsumoto K, Ueda K, Matsumoto K, Hojo M, et al. Application of artificial intelligence using a convolutional neural network for diagnosis of early gastric cancer based on magnifying endoscopy with narrow-band imaging. J Gastroenterol Hepatol. 2021;36(2):482–9.
24. Tang D, Ni M, Zheng C, Ding X, Zhang N, Yang T, Zhan Q, Fu Y, Liu W, Zhuang D, et al. A deep learning-based model improves diagnosis of early gastric cancer under narrow band imaging endoscopy. Surg Endosc. 2022;36(10):7800–10.
25. Miyaki R, Yoshida S, Tanaka S, Kominami Y, Sanomura Y, Matsuo T, Oka S, Raytchev B, Tamaki T, Koide T, et al. A computer system to be used with laser-based endoscopy for quantitative diagnosis of early gastric cancer. J Clin Gastroenterol. 2015;49(2):108–15.
26. Li Y, Xie X, Yang X, Guo L, Liu Z, Zhao X, Luo Y, Jia W, Huang F, Zhu S, et al. Diagnosis of early gastric cancer based on fluorescence hyperspectral imaging technology combined with partial-least-square discriminant analysis and support vector machine. J Biophotonics. 2019;12(5):e201800324.
27. Li L, Chen Y, Shen Z, Zhang X, Sang J, Ding Y, Yang X, Li J, Chen M, Jin C, et al. Convolutional neural network for the diagnosis of early gastric cancer based on magnifying narrow band imaging. Gastric Cancer. 2020;23(1):126–32.
28. Jin T, Jiang Y, Mao B, Wang X, Lu B, Qian J, Zhou H, Ma T, Zhang Y, Li S, et al. Multi-center verification of the influence of data ratio of training sets on test results of an AI system for detecting early gastric cancer based on the YOLO-v4 algorithm. Front Oncol. 2022;12:953090.
29. Hu H, Gong L, Dong D, Zhu L, Wang M, He J, Shu L, Cai Y, Cai S, Su W, et al. Identifying early gastric cancer under magnifying narrow-band images with deep learning: a multicenter study. Gastrointest Endosc. 2021;93(6):1333–41.e3.
30. He X, Wu L, Yu H. Real-time use of artificial intelligence for diagnosing early gastric cancer by endoscopy: a multicenter, diagnostic study. United European Gastroenterol J. 2021;9(Suppl 8):777.
31. Zhou B, Rao X, Xing H, Ma Y, Wang F, Rong L. A convolutional neural network-based system for detecting early gastric cancer in white-light endoscopy. Scand J Gastroenterol. 2022;58(2):157–62.
32. Zhang LM, Zhang Y, Wang L, Wang JY, Liu YL. Diagnosis of gastric lesions through a deep convolutional neural network. Dig Endosc. 2021;33(5):788–96.
33. Wu L, Zhou W, Wan X, Zhang J, Shen L, Hu S, Ding Q, Mu G, Yin A, Huang X, et al. A deep neural network improves endoscopic detection of early gastric cancer without blind spots. Endoscopy. 2019;51(6):522–31.
34. Wu L, He X, Liu M, Xie H, An P, Zhang J, Zhang H, Ai Y, Tong Q, Guo M, et al. Evaluation of the effects of an artificial intelligence system on endoscopy quality and preliminary testing of its performance in detecting early gastric cancer: a randomized controlled trial. Endoscopy. 2021;53(12):1199–207.
35. Tang D, Wang L, Ling T, Lv Y, Ni M, Zhan Q, Fu Y, Zhuang D, Guo H, Dou X, et al. Development and validation of a real-time artificial intelligence-assisted system for detecting early gastric cancer: a multicentre retrospective diagnostic study. EBioMedicine. 2020;62:103146.
36. Noda H, Kaise M, Higuchi K, Koizumi E, Yoshikata K, Habu T, Kirita K, Onda T, Omori J, Akimoto T, et al. Convolutional neural network-based system for endocytoscopic diagnosis of early gastric cancer. BMC Gastroenterol. 2022;22(1):237.
37. Kanesaka T, Lee TC, Uedo N, Lin KP, Chen HZ, Lee JY, Wang HP, Chang HT. Computer-aided diagnosis for identifying and delineating early gastric cancers in magnifying narrow-band imaging. Gastrointest Endosc. 2018;87(5):1339–44.
38. Ikenoyama Y, Hirasawa T, Ishioka M, Namikawa K, Yoshimizu S, Horiuchi Y, Ishiyama A, Yoshio T, Tsuchida T, Takeuchi Y, et al. Detecting early gastric cancer: comparison between the diagnostic ability of convolutional neural networks and endoscopists. Dig Endosc. 2021;33(1):141–50.
39. Horiuchi Y, Hirasawa T, Ishizuka N, Tokai Y, Namikawa K, Yoshimizu S, Ishiyama A, Yoshio T, Tsuchida T, Fujisaki J, et al. Performance of a computer-aided diagnosis system in diagnosing early gastric cancer using magnifying endoscopy videos with narrow-band imaging (with videos). Gastrointest Endosc. 2020;92(4):856–65.e1.
40. Horiuchi Y, Aoyama K, Tokai Y, Hirasawa T, Yoshimizu S, Ishiyama A, Yoshio T, Tsuchida T, Fujisaki J, Tada T. Convolutional neural network for differentiating gastric cancer from gastritis using magnified endoscopy with narrow band imaging. Dig Dis Sci. 2020;65(5):1355–63.
41. Gong L, Wang M, Shu L, He J, Qin B, Xu J, Su W, Dong D, Hu H, Tian J, et al. Automatic captioning of early gastric cancer via magnification endoscopy with narrow band imaging. Gastrointest Endosc. 2022;96(6):929–42.e6.
42. Deo RC. Machine learning in medicine. Circulation. 2015;132(20):1920–30.
43. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. 2018;284(6):603–19.
44. Cao R, Tang L, Fang M, Zhong L, Wang S, Gong L, Li J, Dong D, Tian J, et al. Artificial intelligence in gastric cancer: applications and challenges. Gastroenterol Rep (Oxf). 2022;10:goac064.
45. Bang CS, Lee JJ, Baik GH. Computer-aided diagnosis of esophageal cancer and neoplasms in endoscopic images: a systematic review and meta-analysis of diagnostic test accuracy. Gastrointest Endosc. 2021;93(5):1006–15.e13.
46. Jiang K, Jiang X, Pan J, Wen Y, Huang Y, Weng S, Lan S, Nie K, Zheng Z, Ji S, et al. Current evidence and future perspective of accuracy of artificial intelligence application for early gastric cancer diagnosis with endoscopy: a systematic and meta-analysis. Front Med. 2021;8:629080.
47. Luo D, Kuang F, Du J, Zhou M, Liu X, Luo X, Tang Y, Li B, Su S. Artificial intelligence-assisted endoscopic diagnosis of early upper gastrointestinal cancer: a systematic review and meta-analysis. Front Oncol. 2022;12:855175.
48. Charilaou P, Battat R. Machine learning models and over-fitting considerations. World J Gastroenterol. 2022;28(5):605–7.
49. Hosseinzadeh M, Gorji A, Fathi Jouzdani A, Rezaeijo SM, Rahmim A, Salmanpour MR. Prediction of cognitive decline in Parkinson's disease using clinical and DAT SPECT imaging features, and hybrid machine learning systems. Diagnostics (Basel). 2023;13(10):1691.
50. Heydarheydari S, Birgani MJT, Rezaeijo SM. Auto-segmentation of head and neck tumors in positron emission tomography images using non-local means and morphological frameworks. Pol J Radiol. 2023;88:e365–70.
51. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44.
52. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak J, van Ginneken B, Sánchez CI. A survey on deep learning in medical image analysis. Med Image Anal. 2017;42:60–88.
53. Xie F, Zhang K, Li F, Ma G, Ni Y, Zhang W, Wang J, Li Y. Diagnostic accuracy of convolutional neural network-based endoscopic image analysis in diagnosing gastric cancer and predicting its invasion depth: a systematic review and meta-analysis. Gastrointest Endosc. 2022;95(4):599–609.e7.
54. Islam MM, Poly TN, Walther BA, Lin MC, Li YJ. Artificial intelligence in gastric cancer: identifying gastric cancer using endoscopic images with convolutional neural network. Cancers (Basel). 2021;13(21):5253.
55. Zhou S. Sparse SVM for sufficient data reduction. IEEE Trans Pattern Anal Mach Intell. 2022;44(9):5560–71.
56. Erickson BJ, Korfiatis P, Akkus Z, Kline TL. Machine learning for medical imaging. Radiographics. 2017;37(2):505–15.
57. Chen S, Lu S, Tang Y, Wang D, Sun X, Yi J, Liu B, Cao Y, Chen Y, Liu X. A machine learning-based system for real-time polyp detection (DeFrame): a retrospective study. Front Med (Lausanne). 2022;9:852553.
58. Gong EJ, Bang CS, Lee JJ, Baik GH, Lim H, Jeong JH, Choi SW, Cho J, Kim DY, Lee KB, et al. Deep learning-based clinical decision support system for gastric neoplasms in real-time endoscopy: development and validation study. Endoscopy. 2023;55(8):701–8.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.