
Received: 8 August 2021 | Accepted: 20 January 2022

DOI: 10.1002/cdt3.17

REVIEW

Advancement of deep learning in pneumonia/Covid‐19 classification and localization: A systematic review with qualitative and quantitative analysis

Aakash Shah1 | Manan Shah2

1 Department of Computer Science & Engineering, Institute of Technology, Nirma University, Ahmedabad, India
2 Department of Chemical Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, India

Correspondence
Manan Shah, Department of Chemical Engineering, School of Technology, Pandit Deendayal Energy University, Gandhinagar, Gujarat 382007, India.
Email: [email protected]

Edited by Yi Cui

Abstract
Around 450 million people are affected by pneumonia every year, which results in 2.5 million deaths. Coronavirus disease 2019 (Covid‐19) has also affected 181 million people, which led to 3.92 million casualties. The chances of death in both of these diseases can be significantly reduced if they are diagnosed early. However, the current methods of diagnosing pneumonia (complaints + chest X‐ray) and Covid‐19 (real‐time polymerase chain reaction) require the presence of expert radiologists and time, respectively. With the help of deep learning models, pneumonia and Covid‐19 can be detected instantly from chest X‐rays or computerized tomography (CT) scans. The process of diagnosing pneumonia/Covid‐19 can become faster and more widespread. In this paper, we aimed to elicit, explain, and evaluate qualitatively and quantitatively all advancements in deep learning methods aimed at detecting community‐acquired pneumonia, viral pneumonia, and Covid‐19 from images of chest X‐rays and CT scans. Being a systematic review, the focus of this paper lies in explaining various deep learning model architectures, which have either been modified or created from scratch for the task at hand. For each model, this paper answers the question of why the model is designed the way it is, the challenges that a particular model overcomes, and the tradeoffs that come with modifying a model to the required specifications. A grouped quantitative analysis of all models described in the paper is also provided to quantify the effectiveness of different models with a similar goal. Some tradeoffs cannot be quantified and, hence, they are mentioned explicitly in the qualitative analysis, which is done throughout the paper. By compiling and analyzing a large quantum of research details in one place with all the data sets, model architectures, and results, we aimed to provide a one‐stop solution to beginners and current researchers interested in this field.

KEYWORDS
classification, Covid‐19, deep learning, localization, pneumonia

This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.
© 2022 The Authors. Chronic Diseases and Translational Medicine published by John Wiley & Sons, Ltd on behalf of Chinese Medical Association.

1 | INTRODUCTION

Pneumonia is a respiratory disease responsible for significant morbidity all over the world. It causes a lower respiratory tract infection, leading to inflammation in the lungs' air sacs known as the alveoli. The infected alveoli are filled with fluid, which makes breathing difficult. Pneumonia, a contagious disease, is classified into two main types (hospital‐acquired pneumonia and community‐acquired pneumonia [CAP]) based on


where it is acquired. The majority of pneumonia cases fall under the category of CAP (all cases of pneumonia that are not acquired from the hospital). If CAP is diagnosed early, the chances of 100% recovery are high, with little chance of reinfection. For a complete diagnosis of pneumonia, a combination of clinical awareness, specific microbiological tests, and radiographical studies is necessary. However, plain chest radiography alone can rapidly demonstrate the presence of pulmonary abnormalities in most cases.1 Unfortunately, pneumonia is only one of many pulmonary abnormalities and, hence, radiographical findings often fail to lead to a definitive diagnosis of pneumonia. Consequently, the distinction of pneumonia from other pulmonary diseases cannot be made with certainty on radiological grounds with the current technology.

One of the significant problems of radiographical findings is that the distinction of pneumonia from other pulmonary diseases cannot be made with certainty on radiological grounds alone. Moreover, this is not the only problem with the current procedure of pneumonia diagnosis. A considerable number of medical images are produced in hospitals and medical centers daily. Consequently, radiologists are inundated with a large number of images that they have to analyze manually. In these cases, tried and tested deep learning algorithms might be helpful in assisting doctors by marking the part of the lungs where pneumonia/coronavirus disease 2019 (Covid‐19) is present.

Many automated technologies related to medical imaging have shown promising results over the past few years, but deep learning has quickly gained prominence among them. Researchers have extensively exploited deep learning methods for detecting diseases in various body parts such as the eye, brain,2,3 and skin.4,5 In some medical imaging cases, it was shown that the classification performance of a deep learning model was better than that of medical specialists.6 Since the proposal of AlexNet7 in 2012, deep learning models have improved significantly in image classification tasks. Recent architectures such as ResNet and variations of ResNet have also provided a solid base for accurate object detection and localization. Although single‐shot detectors such as YOLO8 and RetinaNet9 provide speedy detections useful in real time, generative adversarial networks (GANs)10 have played an essential role in unsupervised learning and domain adaptation whenever training images have been scarce. Hence, automated deep learning solutions can solve both problems mentioned above. Deep learning models for pneumonia classification and detection can automatically learn complex features from radiographs that may not be visible to the naked eye. This was proved in 2017 when Rajpurkar et al.6 proposed CheXNet, a deep learning model, which achieved better results than radiologists on pneumonia detection and other pulmonary disease detection tasks.

The fact that deep learning models succeeded not only in the task of pneumonia detection but also in other pulmonary abnormality detection tasks was leveraged by many other researchers to detect other anomalies from the same models or training data. This could prove especially useful in recent situations (in 2021) such as the outbreak of Covid‐19, for the following reasons. Even though real‐time polymerase chain reaction (RT‐PCR) is the accepted standard in the diagnosis of Covid‐19, its sensitivity and specificity are not optimal.11 Other than that, many countries or regions cannot conduct sufficient RT‐PCR testing for thousands of subjects in a small span of time because of the lack of people who can perform these tests. In these cases, deep learning algorithms might help if the country has enough imaging machines but fewer people who can perform the test. RT‐PCR testing may also be delayed in cases of a newly evolved coronavirus, because detection of a newly evolved virus requires the extraction of the new DNA sequence.11 In contrast, deep learning models with anomaly detection capabilities can detect the clustering effect of viral pneumonia occurrences such as Middle East respiratory syndrome (MERS),12 severe acute respiratory syndrome (SARS),13 and Covid‐19, as proved by Zhang et al.11 Thus, deep learning models provide a vital technique that might help in diagnosing pneumonia better and faster.

In this paper, we aimed to elicit, explain, and evaluate qualitatively and quantitatively all advancements in deep learning methods aimed at detecting bacterial or viral pneumonia from radiographical images. Since chest X‐rays and computerized tomography (CT) scans are the most common radiographical tools doctors use today, we have covered deep learning methods that use chest X‐rays, CT scans, or both as input images. As the quantitative results of these models depend on the data sets used, we group these models according to data sets, to perform a fair and uniform quantitative analysis. Although standard data sets are available for bacterial/viral pneumonia detection tasks, the same is not applicable for Covid‐19 data sets due to the disease's novelty (in 2021). However, the models that leverage these data sets have been grouped by the amount and quality of images used for training and testing. This being said, it is not uncommon to find deep learning models that fail to perform well in the real world after being trained on data sets with specific sources. The poor performance in the real world is mainly because of the data set shift between training images and the images used in other hospitals. A significant amount of variability in individual hospital images also accounts for the poor performance of these models. To address this problem, we also evaluate and compare the features learned by various models to predict how well they would perform in the real world. The reason for comprehensively compiling all significant research in deep learning for pneumonia detection is to compare different models

used in each scenario and identify the best deep learning architectures for each of those scenarios. Although similar work was performed by Li et al.,14 we provide a significantly more comprehensive overview of models by including research with CT scans, localization tasks, and Covid‐19 classification.

2 | METHODOLOGY

This review is based upon the qualitative and quantitative analysis of studies in the field of pneumonia/Covid‐19 detection via chest X‐rays and CT scans. The method for collecting relevant papers for this study was as follows. Platforms such as Elsevier, Google Scholar, IEEE Xplore, and Springer were searched with the keywords: "pneumonia detection with deep learning," "Covid‐19 detection with deep learning," "pneumonia localization with deep learning," "Covid‐19 localization with deep learning," "pneumonia detection with chest X‐rays," "pneumonia localization with chest X‐rays," "Covid‐19 detection with chest X‐rays," and "Covid‐19 localization with chest X‐rays." Papers were excluded from the study as follows: all papers not related to deep learning, pneumonia, or Covid‐19 were excluded. After the first exclusion process, all remaining papers were included in the final review according to the following criteria. As the main focus of this review is on the generalizability of models, all studies that made an explicit effort to make their model generalizable were included. Different studies used various metrics for accuracy, so there was no hard limit on accuracy (or performance in general) for a paper to be included in this study. After that, studies were included with the goal of covering as much breadth in deep learning methods as possible. This was done because different deep learning methods often solve different problems (improper images, training data shortage, and insufficient training data variety). Furthermore, if a similar method was followed by more than one paper, then the most generalizable paper with the best performance was chosen.

On the medical front, pneumonia is mainly divided into two types: bacterial pneumonia and viral pneumonia. Although bacterial pneumonia does not have any subcategories worth discussing here, viral pneumonia is often subcategorized according to the virus responsible for causing it. The most recent example of viral pneumonia, and the one of concern to us, is Covid‐19. Owing to these types and subtypes, researchers broadly classify input images into the following: (1) pneumonia/no‐pneumonia, (2) bacterial pneumonia/viral pneumonia/no‐pneumonia, and (3) Covid‐19/all other pneumonia/no‐pneumonia. Although most research papers fall into one of these three categories, some models do not consider no‐pneumonia.

Radiologists use either chest X‐rays or CT scans for diagnosing a patient. Both of these modes have their pros and cons. Although X‐ray machines are portable and enable faster diagnosis, CT scans provide finer detail of the lungs that may be more difficult to see in a plain X‐ray. Similarly, some deep learning models use X‐rays as input images, whereas others use CT scans. This paper gives equal weightage to both modalities mentioned above but discusses them separately in Sections 3 and 4, respectively.

Other than classification, a significant task taken up by some deep learning models is that of detecting and localizing the region where pneumonia is present in the lungs. It is worth noting that some classification models also perform Grad‐CAM analysis to analyze which features are being used to perform classification. These models, even after localizing features, are not considered localization/segmentation models. Localization/segmentation models provide bounding boxes/semantic segmentation in input images around the part of the chest affected by pneumonia. We will include these models in our discussion too. However, their comparison shall only be made with other localization models.

Data sets play one of the most prominent roles in the success or failure of deep learning models. The details of the most frequently used data sets are shown in Table 1. The National Institutes of Health (NIH) data set consists of 15 classes, out of which one is pneumonia, one is no pulmonary disease, and the remaining 13 are other pulmonary diseases. It is worth noting that "other pulmonary diseases" may contribute any number of classes ranging from 0 to 13. This way, if it contributes 0 classes, the classification task simplifies to pneumonia/no‐pneumonia (one sigmoid neuron or two softmax neurons in the output layer). On the other hand, if it contributes 13 classes, the model will classify a chest X‐ray into pneumonia, no‐pneumonia, or any one of the 13 pulmonary diseases (15 softmax neurons in the output layer); see the sketch after Table 1. The classes of the Radiological Society of North America (RSNA) data set are normal, lung opacity, and no lung opacity‐not normal, which can be explained as no pneumonia, pneumonia with visible lung opacity, and some pulmonary disease without visible damage to the lungs. Lastly, the classes of the Kaggle data set are divided as normal, bacterial pneumonia, and viral pneumonia, which need no further explanation.

TABLE 1 Data sets for pneumonia detection

Data set              Images    Classes   Bounding boxes
NIH chest X‐rays      112,120   14        985
RSNA chest X‐rays     26,684    3         9555
Kaggle chest X‐rays   5856      3         0
CheXpert              224,316   14        0
MIMIC‐CXR             371,920   14        0
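To make the output‐layer configurations described above concrete, the following minimal PyTorch sketch (our illustration, not code from any reviewed paper; the backbone is a toy stand‐in for a DenseNet/ResNet‐style feature extractor) shows the one‐sigmoid‐neuron and 15‐softmax‐neuron heads:

```python
import torch
import torch.nn as nn

# Toy backbone standing in for any CNN that maps a 224x224 chest X-ray
# to a 1024-dimensional feature vector.
backbone = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 1024), nn.ReLU(),
)

binary_head = nn.Linear(1024, 1)   # (1) pneumonia/no-pneumonia: one sigmoid neuron
multi_head = nn.Linear(1024, 15)   # (2) pneumonia + normal + 13 diseases: 15-way softmax

x = torch.randn(4, 1, 224, 224)                            # batch of grayscale X-rays
p_pneumonia = torch.sigmoid(binary_head(backbone(x)))      # shape (4, 1)
p_classes = torch.softmax(multi_head(backbone(x)), dim=1)  # shape (4, 15)
```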

2.1 | Detection of pneumonia and its classification among other pulmonary diseases

Rajpurkar et al.6 developed a deep learning model that could achieve radiologist‐level accuracy on pneumonia detection from chest X‐rays. They used the NIH data set, which consists of 112,120 chest X‐ray images from 30,805 patients. This data set was first presented and used by Wang et al.15 for the same task. However, the model was the first one that attained radiologist‐level accuracy, and it also served as a base for many future models. First, the entire data set is split into training and test sets such that no patients are repeated in the respective sets. The images are converted to size 224 × 224 and normalized by the ImageNet16 training data set metrics. For training, these images are fed into the CheXNet model that uses a 121‐layer dense convolution neural network (CNN) known as DenseNet.17 DenseNet improves information flow and backpropagation through the network, which makes the optimization process easier. Hence, the entire model was used as is, except for the output/classification layer. This layer was replaced by a single sigmoid neuron because the classification task was pneumonia/no‐pneumonia. As the NIH data set consists of 15 classes, the classes pneumonia and no‐pneumonia (14 classes including other pulmonary diseases) were highly imbalanced. To get rid of this problem, a weighted loss function is used while training the model. Finally, the model achieved an F1 score of 0.435 and an area under the receiver operating characteristic (AUROC) of 0.76 when tested with 420 images. The data set was randomly split into training (28,744 patients and 98,637 images), validation (1672 patients and 6351 images), and test (389 patients and 420 images) sets. There was no patient overlap between the sets.
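This setup can be reproduced in a few lines with torchvision's DenseNet‐121. The sketch below is an illustration, not the authors' code: the class counts are hypothetical placeholders, and the pos_weight form is one standard way to implement a class‐balanced weighted loss.

```python
import torch
import torch.nn as nn
from torchvision import models

# CheXNet-style setup: ImageNet-pretrained DenseNet-121 with the 1000-way
# classifier swapped for a single pneumonia logit.
model = models.densenet121(weights="IMAGENET1K_V1")
model.classifier = nn.Linear(model.classifier.in_features, 1)

# Weighted loss for the heavy pneumonia/no-pneumonia imbalance:
# up-weight positives by (#negatives / #positives) in the training split.
n_pos, n_neg = 1431, 97206      # hypothetical counts, for illustration only
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([n_neg / n_pos]))

x = torch.randn(8, 3, 224, 224)  # X-rays replicated to 3 channels, ImageNet-normalized
y = torch.randint(0, 2, (8, 1)).float()
loss = criterion(model(x), y)
```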
Zech et al.18 demonstrated that deep learning pneumonia classifiers trained on two different hospital systems predicted results by learning the origin of those hospitals instead of learning the relevant features that cause pneumonia. To address this problem, Janizek et al.19 developed an adversarial training‐based approach. They found that the occurrence of pneumonia in posterior–anterior (PA) chest X‐rays was twice as high as that in anterior–posterior (AP) images (PA images are the ones in which X‐rays enter from the back of the body, whereas AP is vice versa). They also found that pneumonia detection classifiers as in Rajpurkar et al.6 learned to distinguish between the two views (AP and PA) and leveraged that information to classify pneumonia. Their approach was different from standard adversarial approaches, where the classifier learns domain‐invariant features. In their case, the classifier could not learn domain‐invariant features, because they had no images from the target domain. In their adversarial approach, Janizek et al.19 tried to train a classifier in which the final output score would be invariant of the view (AP or PA). Although the training and architecture for their classifier were the same as that of Rajpurkar et al.,6 they also added and trained an adversary network. This adversary network took the output score of the classifier as input and outputted a prediction of the view. The adversary network is a standard 3‐layer feedforward network of 32 neurons each, with rectified linear unit (ReLU) activations. The classifier's objective was to predict output scores such that the adversary could not predict the view of the input image from the output score. In contrast, the adversarial network's objective was to predict the view (AP or PA) from the output score. Both the classifier and the adversary network were trained alternately to optimize their respective objectives. To test their approach, Janizek et al.19 tested their model on the CheXpert data set (source domain) and the Massachusetts Institute of Technology MIMIC‐CXR data set (target domain). Although the standard model (without the adversary network) achieved an AUROC of 0.79 on the source domain, it could only achieve an AUROC of 0.703 on the target domain. Alternatively, the adversarially trained model achieved almost similar AUROCs of 0.747 and 0.739 on the source and target domains.
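The alternating game described above can be sketched as follows. This is a toy illustration with random tensors: `clf` stands in for the CheXNet‐style classifier, and the adversary mirrors the 3‐layer, 32‐neuron ReLU network mentioned in the text.

```python
import torch
import torch.nn as nn

clf = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 1))             # pneumonia logit
adv = nn.Sequential(nn.Linear(1, 32), nn.ReLU(),
                    nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 1))  # view from score
bce = nn.BCEWithLogitsLoss()
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-4)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-4)

for step in range(100):
    x = torch.randn(16, 1, 64, 64)              # toy images
    y = torch.randint(0, 2, (16, 1)).float()    # pneumonia label
    v = torch.randint(0, 2, (16, 1)).float()    # view label (AP=0, PA=1)

    # 1) adversary step: learn to read the view off the classifier's score
    opt_adv.zero_grad()
    bce(adv(clf(x).detach()), v).backward()
    opt_adv.step()

    # 2) classifier step: predict pneumonia while fooling the adversary
    score = clf(x)
    loss = bce(score, y) - bce(adv(score), v)   # minus: maximize adversary's error
    opt_clf.zero_grad()
    loss.backward()
    opt_clf.step()
```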
In April 2020, Lu et al.20 presented MUXConv, a CNN layer specially designed to increase the flow of information by multiplexing channels and spatial input through the network. They also presented a multiobjective algorithm to automatically optimize hyperparameters while training. Although MUXConv was not specially designed for pneumonia classification, it could achieve an AUROC of 84.1% on the same data set used by Rajpurkar et al.6 while using 3× fewer parameters, being 14× more efficient than DenseNet‐121, and without any manual hyperparameter optimization. This result shows the scope of improvement in the accuracy of pneumonia detection through better deep learning architectures alone, i.e., without considering any medical knowledge. In September 2020, the same team presented NSGANetV1, another multiobjective evolutionary algorithm. NSGANetV1 learns the designs of various architectures through the recombination and generation of multiple architectural components. NSGANetV1 improves its efficiency by exploiting various patterns used in successful architectures and estimating their distributions with the help of a Bayesian model. Although made for general‐purpose image classification, this model achieved an AUROC of 84.6% on the NIH data set without modifications or hyperparameter tuning. Moreover, the class activation map (CAM) of NSGANetV1 showed that the model learns relevant features, which can also be used to pinpoint the region where pneumonia is present.

Using architectures such as DenseNet‐121 in the pneumonia detection task is possible because of large data sets such as NIH or CheXpert. If such architectures

are used with smaller data sets such as that of Kaggle, there is a considerable chance of overfitting. Li et al.21 presented PNet, an efficient yet effective architecture for pneumonia detection using a significantly smaller number of images. They collected their own data set from Shenzhen No.2 People's Hospital, consisting of 6339 X‐rays labeled pneumonia and 4445 X‐rays labeled normal. The architecture of PNet is straightforward, consisting of only five convolution blocks, each followed by a max‐pooling layer. This small architecture allows PNet to be 25 times as efficient as AlexNet and about 50 times as efficient as Visual Geometry Group (VGG) 16. Even though PNet has a smaller number of parameters, it outperforms both AlexNet and VGG 16 in the pneumonia detection task, with an accuracy of 92.79% and an F1 score of 0.93. There are many customized architectures such as PNet that also achieve equivalent accuracy; however, only PNet was included in our research because of its excellent results on feature analysis. While analyzing the features of all models, it was found that VGG 16 focuses on the entire lung region instead of focusing on the pneumonia‐affected region, and AlexNet wanders off to the wrong regions. On the other hand, PNet focuses on only those features that correspond to the pneumonia‐affected region in most cases. Hence, PNet is not only good at detecting pneumonia but it can also help doctors by highlighting the pneumonia‐affected area. The detailed results were true positive/false positive/true negative/false negative: 617/86/360/19, with a sensitivity of 0.9701 and specificity of 0.8072.
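An architecture in the spirit of PNet can be written in a handful of lines. The sketch below uses assumed channel widths (the paper's exact widths are not reproduced here); the point is five convolution blocks, each followed by max‐pooling, giving a parameter count orders of magnitude below AlexNet's roughly 61M.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # one convolution block followed by max-pooling, as in the PNet description
    return nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                         nn.ReLU(inplace=True), nn.MaxPool2d(2))

pnet_like = nn.Sequential(
    conv_block(1, 16), conv_block(16, 32), conv_block(32, 64),
    conv_block(64, 128), conv_block(128, 128),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1),  # pneumonia logit
)

print(sum(p.numel() for p in pnet_like.parameters()))  # about 0.24M parameters
```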
Dong et al.22 presented a network architecture that achieved high classification accuracy in pneumonia detection. They used an improved quantum neural network and trained this model on the Kaggle chest X‐ray data set containing 5232 training images. This model was tested using 624 separate images in the test set and achieved an accuracy of 96.07%. They also trained AlexNet, ResNet, and InceptionV3 on the same data, giving 85.30%, 86.38%, and 95.53% accuracy, respectively. Although the authors do not conduct a feature analysis in their paper, chances are few that a quantum neural network would give such high accuracy while learning wrong or irrelevant features. The data set that these authors used was published by the University of California, San Diego. The sensitivity and specificity were 0.9756 and 0.9460, respectively.

Diving deeper into pneumonia detection with small data sets, most intuitively, we come across a solution based on GANs. Khalifa et al.23 used a GAN with various deep learning models to generate more images and used those images to train the deep learning models. They took only 10% of the images from the Kaggle chest X‐ray data set and generated the remaining 90% with the GAN for training purposes. These images were then used for training AlexNet, SqueezeNet, GoogleNet, and ResNet with 8, 18, 12, and 18 layers, respectively. ResNet performed best, with a testing accuracy of 99.0% and a recall of 0.9897. The catch, however, is that they used 624 images to train the GAN, which is the same number of images provided in the testing data set. Although the authors have mentioned that separate trials were conducted, each with a different 10% of the data set, using test images in even one of the trials would drastically change the average accuracy. Nonetheless, the idea of using GANs to generate new data can certainly be applied when there is a dearth of training images.

Dey et al.24 developed a model with an Ensemble Feature Scheme (EFS) for pneumonia detection. Their EFS combines handcrafted features and automatically extracted features from a deep learning model to classify an image into pneumonia or normal. Extraction of handcrafted features is in turn completed by combining continuous wavelet transform, discrete wavelet transform, and gray level co‐occurrence matrix (GLCM) features. The deep learning features are extracted using the standard VGG‐19 architecture. The combined handcrafted features are then concatenated with the features extracted using VGG‐19 through principal component analysis (PCA) and serial feature concatenation. After concatenation, these features are given as an input to a random forest classifier for final classification. This model was trained using 5500 images from the NIH data set and achieved 97% accuracy when tested against 1650 separate images from the NIH data set. Similar to other models mentioned in this paper, the feature activations of this model also point to relevant regions in the lung where pneumonia is present. The detailed metrics were true positive rate/false positive rate/true negative rate/false negative rate: 0.9756/0.0244/0.9808/0.0192, with a sensitivity of 0.9807 and specificity of 0.9757 (Table 2).
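The fusion step of the EFS pipeline can be sketched as below. Random arrays stand in for the real CWT/DWT/GLCM and VGG‐19 features, and the dimensionalities are assumptions for illustration only.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

n = 1000
handcrafted = np.random.rand(n, 512)   # concatenated CWT + DWT + GLCM features
deep = np.random.rand(n, 4096)         # VGG-19 penultimate-layer features
labels = np.random.randint(0, 2, n)    # pneumonia / normal

deep_reduced = PCA(n_components=256).fit_transform(deep)     # compress deep features
fused = np.concatenate([handcrafted, deep_reduced], axis=1)  # serial concatenation

clf = RandomForestClassifier(n_estimators=200).fit(fused, labels)
print(clf.predict(fused[:5]))
```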
2.2 | Detection of Covid‐19 and classification of viral pneumonia from bacterial pneumonia

Capturing a chest X‐ray is one of the primary methods of screening for the occurrence of Covid‐19. However, there is a general dearth of doctors even at places where equipment to capture such X‐rays is available. To tackle this problem, a lot of research has been done to detect Covid‐19 from chest X‐rays automatically. Cases of Covid‐19 emerged all over the world in 2019, but a lot of research in pneumonia detection from chest X‐rays had already been done before. Hence, much research on the detection of Covid‐19 from chest X‐rays is built upon the base provided by previous research into pneumonia detection. Due to the novelty of Covid‐19 (in 2020–2021), no standardized databases are available and almost every research work uses a different database. Hence, the details of all databases and comments on their quality are given while explaining the research work rather than giving an overview of all databases beforehand.

TABLE 2 A comprehensive study on pneumonia detection and classification

Author              Model                                Data set          AUROC   Accuracy
Rajpurkar et al.6   CheXNet (DenseNet‐121)               NIH               0.760   NA
Janizek et al.19    CheXNet (DenseNet + Adversarial)     NIH + MIMIC       0.747   NA
Lu et al.20         MUXConv (multiplexed convolutions)   NIH               0.841   NA
Lu et al.20         NSGANetV1                            NIH               0.846   NA
Li et al.21         P‐Net (customized CNN)               Custom (10,784)   NA      92.79%
Dong et al.22       Quantum neural network               Kaggle            NA      96.07%
Khalifa et al.23    GAN (semi‐supervised)                Kaggle (624)      NA      99.00%
Dey et al.24        EFS (CWT + DWT + GLCM)               NIH (5550)        NA      97.00%

Abbreviations: AUROC, area under the receiver operating characteristic; CNN, convolution neural network; CWT, continuous wavelet transform; DWT, discrete wavelet transform; GAN, generative adversarial network; GLCM, gray level co‐occurrence matrix; NIH, National Institutes of Health.

Haghanifar et al.25 made a hierarchical deep learning model for detecting Covid‐19. In the first level, images of chest X‐rays are classified into normal and pneumonia. In the second level, images classified as pneumonia are further classified into covid positive (CP) or CAP. The data set used by the authors contains 780 Covid‐19‐positive X‐rays, 4600 X‐rays having CAP, and 5000 normal X‐rays. The approach taken by Haghanifar et al.25 was very similar to that of Rajpurkar et al.6 The key difference was that Haghanifar et al.25 first segmented the lungs from the chest X‐ray and then only used the part surrounding those lungs for classification. This approach, to a significant extent, solved the issue of "learning the wrong features to reach the right answer," because the model was then forced to learn only from the lung region rather than learning from the entire X‐ray, which usually contains a lot of regions other than the lungs. U‐Net was used for segmentation of the lung region, and then they performed dilation on the segmented lungs to cover some lung areas that the U‐Net did not segment. After segmentation, they cropped the chest X‐ray image such that only the segmented area was covered. This cropped image was then fed into the DenseNet‐121 model given by Rajpurkar et al.6 This model achieved an accuracy of 81.04% and F‐scores of 0.85 and 0.76 for the CP and CAP classes, respectively. Although the accuracy of this model is 0.4% less than that of CheXNet,6 it is more robust than CheXNet on unseen data because of the cropped images. The precision and recall for (normal/pneumonia/Covid‐19) were P: (0.8251/0.9340/0.9420) and R: (0.9516/0.7797/0.9420), respectively.
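The segment‐dilate‐crop preprocessing described above boils down to a few array operations. A minimal sketch follows, assuming a U‐Net has already produced a binary lung mask; the dilation size is an illustrative guess, not the paper's value.

```python
import numpy as np
import cv2

def crop_to_lungs(image, lung_mask, dilate_px=15):
    """Dilate a (hypothetical) U-Net lung mask, then crop the X-ray to the
    bounding box around the lungs before classification."""
    kernel = np.ones((dilate_px, dilate_px), np.uint8)
    mask = cv2.dilate(lung_mask.astype(np.uint8), kernel)  # recover missed lung edges
    ys, xs = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max(), xs.min(), xs.max()
    return image[y0:y1 + 1, x0:x1 + 1]                     # fed to DenseNet-121 afterwards

xray = np.random.rand(1024, 1024).astype(np.float32)       # placeholder X-ray
mask = np.zeros((1024, 1024)); mask[300:800, 200:850] = 1  # placeholder U-Net output
print(crop_to_lungs(xray, mask).shape)
```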
While on the topic of lung segmentation, we cover another research work by Gu et al.,26 which uses lung segmentation to classify a chest X‐ray into bacterial pneumonia or viral pneumonia. The data set used by them consists of 241 X‐ray images where lungs have been separated manually. The rest of the data set consists of 4513 pediatric chest X‐ray images, out of which 2665 are bacterial pneumonia and 1848 are viral pneumonia. The entire model is divided into three parts. The first part is where the lung region is segmented from the chest X‐ray by an eight‐layer fully convolution network (FCN).27 The FCN model was trained using the 241 segmented images from the Japanese Society of Radiological Technology data set and used pretrained weights from the Pascal visual object class28 segmentation data set. The second part consists of feature extraction, where features are extracted using three different methods. The first method uses a deep CNN (DCNN), the second method uses a mixture of GLCM‐based texture features and histogram of oriented gradients‐based shape features, whereas the third method uses Haar wavelet texture features. The third part of the model uses a simple support vector machine (SVM) classifier to classify a given image into bacterial pneumonia or viral pneumonia. This particular approach achieved an accuracy of 76.92% with an area under curve (AUC) of 82.34%. At this point, it is imperative to reiterate that metrics such as accuracy, F‐scores, and AUC should not be the only parameters used to judge the performance of a deep learning (DL) model. In fact, perfect or close‐to‐perfect metrics often suggest the opposite of a sound model: the underlying model is overfitted, not because of the complexity of the model or the lack of data, but because it has learned irrelevant features that are specific to the source of the training data. The model achieved a sensitivity of 0.5567 and specificity of 0.9267.

Covid‐19 is a type of viral pneumonia, but it is not the only type of viral pneumonia. Several different respiratory diseases such as MERS and SARS fall into the category of viral pneumonia. Moreover, the occurrence of clusters of viral pneumonia cases over a short period can be a signal of an upcoming outbreak or a pandemic. Keeping this in mind, Zhang et al.11 developed a Confidence Aware Anomaly Detection (CAAD) model to detect the occurrence of viral pneumonia from chest X‐rays. To train their model, they used two in‐house data sets named X‐Viral and X‐Covid. The X‐Viral data set

contains 5977 viral pneumonia images, 18,619 nonviral pneumonia images, and 18,774 normal images. The X‐Covid data set contains 106 CP images and 107 normal images. They also used the Open‐Covid data set containing 493 CP images. The CAAD model has three main parts: a feature extractor, an anomaly detector, and a confidence predictor. Before we go any further, it is essential to clarify that the "anomaly" we are trying to predict is viral pneumonia, and all other classes (pneumonia and normal) are considered normal. Moving back to the model, after passing an image to the feature extractor, the features are passed simultaneously into the anomaly detector and the confidence predictor. If the anomaly detector predicts the image as an anomaly or the confidence predictor predicts the model's confidence below a particular threshold, the image is considered an anomaly, i.e., viral pneumonia. The feature extractor is made up of EfficientNet B0.29 The anomaly detector and the confidence predictor were designed by the authors and are not as common as the other modules mentioned in this review, so they deserve an explanation. However, the explanation is too involved and out of the scope of this review, so readers are requested to read the original paper for an explanation of those modules. Coming to the results of this approach, it achieved 80.33% accuracy on the X‐Viral data set with training, and 78.57% accuracy on the X‐Covid and Open‐Covid data sets combined without any training. This shows us that the model could categorize Covid‐19 cases as viral pneumonia without any specific training on Covid‐19 images, which suggests that this model can be useful in predicting upcoming cases and different mutations of viral pneumonia. The sensitivity and specificity on the various data sets for the viral and normal classes were: (X‐Viral: 85.88/79.44), (X‐Covid: 71.70/73.83), (Open‐Covid: 100/100), and (X‐Covid + Open‐Covid: 77.13/78.97).
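The decision rule of CAAD — flag an image as viral pneumonia if either head fires — is simple enough to state in code. This is a sketch with made‐up scores and thresholds; the paper's actual thresholds and score calibration differ.

```python
import torch

def caad_decision(anomaly_score, confidence, t_anomaly=0.5, t_conf=0.5):
    """Flag viral pneumonia when the anomaly detector fires OR the
    confidence predictor is unsure (illustrative thresholds)."""
    return (anomaly_score > t_anomaly) | (confidence < t_conf)

# toy outputs for a batch of 4 images (shared features feed two heads, as in CAAD)
anomaly_score = torch.tensor([0.9, 0.2, 0.4, 0.7])
confidence = torch.tensor([0.8, 0.9, 0.3, 0.6])
print(caad_decision(anomaly_score, confidence))  # tensor([ True, False,  True,  True])
```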
Another instance of a region‐based discriminator for Covid‐19 was given by Wang et al.30 in August 2021. They used the Covid‐CXR data set, consisting of 204 CP X‐rays, and the RSNA pneumonia detection data set, for 2004 CAP and 1314 normal chest X‐rays, to train their model. The authors proposed a Discrimination‐DL and a Localization‐DL, but their approach was completely different. They divided all chest X‐ray images into superpixels first, and then they ran a proposal of lung (POL) regressor over those superpixels. This approach is very similar to that of YOLO,8 with a critical difference that only the outer boundaries of all superpixels inside the POL‐proposed rectangles are used to extract the two lungs. After both lung regions are extracted, they are passed into the Discrimination‐DL, which comprises a ResNet and a feature pyramid network over the ResNet to rebuild the image after feature extraction. Focal loss is then measured between the rebuilt image and the original lung region passed into the Discrimination‐DL. This method helps the Discrimination‐DL in learning optimal features. If the Discrimination‐DL classifies the image into CP, both the softmax score and the original image are passed into the Localization‐DL. The Localization‐DL only gives one out of three results, that is, it classifies the Covid‐19 as either present in the left lung, the right lung, or both lungs. The name Localization‐DL might thus seem misleading, because it is more of a classifier. Nevertheless, the Localization‐DL uses a residual attention mechanism to determine the occurrence of Covid‐19 in both lungs. The residual attention mechanism looks at the features extracted by the feature extractor to determine where the attention of the classifier lies. For a deeper analysis of the residual attention mechanism, the reader is referred to the original paper.31 Coming to the accuracy of this model, it achieves 99%, 90%, and 93% accuracy on the CP, CAP, and normal classes, respectively.
Arias‐Londoño et al.32 presented a thoughtful evaluation approach for DL networks that detect Covid‐19. Not only that, but they also compiled the most extensive known data set of 8573 unique Covid‐19 chest X‐rays. The entire data set consisted of 49,000 normal, 2400 CAP, and 8573 Covid‐19‐positive images. They used the same deep learning model used in Covid‐Net33 and ran three different experiments on this data set and model. The first experiment used raw images as input, with the only preprocessing being histogram equalization. In the second experiment, they used U‐Net to segment the lung region and cropped the image so that only the region encompassing the two lungs remained. In the third experiment, the same segmentation approach was used, but this time they kept only the segmented lung part while the remaining region was filled with a black mask. Upon Grad‐CAM analysis, it was found that only experiment three learned relevant features, even if its accuracy was lower than that of the other two experiments. They also showed that the accuracies on the AP X‐ray projection were significantly higher than those on the PA projection. The findings of this study take us to an important point worth noticing. As discussed below, metrics such as accuracy and F‐scores can be bolstered if the deep learning model is not extracting the right features. However, models made in such a manner may be poor at generalizing to new data from a new source. Hence, Grad‐CAM analysis is crucial to determine whether a given model will be able to perform well in the real world, and one should not judge a model solely based on its metrics, especially if the train/test data are scarce or belong to the same source.
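Grad‐CAM itself is only a few lines once hooks are in place. The sketch below uses ResNet‐18 as a stand‐in backbone (any CNN with a final convolutional stage works the same way) and computes the class‐discriminative heatmap that the lung‐versus‐shortcut checks above rely on.

```python
import torch
from torchvision import models

model = models.resnet18(weights=None).eval()
acts, grads = {}, {}
layer = model.layer4                                   # last conv stage
layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))

x = torch.randn(1, 3, 224, 224, requires_grad=True)   # placeholder X-ray
score = model(x)[0].max()                             # top-class logit
score.backward()

w = grads["g"].mean(dim=(2, 3), keepdim=True)          # channel weights = avg gradients
cam = torch.relu((w * acts["a"]).sum(dim=1)).squeeze() # weighted sum of activations
cam /= cam.max() + 1e-8                                # normalize to [0, 1]
print(cam.shape)                                       # 7x7 map; upsample onto the X-ray
```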
Before we continue with our quest for the best deep learning models for Covid‐19 detection and classification of viral pneumonia from bacterial pneumonia, we should make a note. The constructions of all models discussed above show an explicit effort to make the model perform well in the real world. These efforts are shown in the form of Grad‐CAM evaluations or segmenting the lungs so that the models learn only relevant features. The models described below this point,

however, do not showcase any effort of this kind. Hence, even though the accuracies and other metrics of the models below this point might seem significantly higher than those mentioned above, the reader should keep in mind that they are not proven to generalize well in the real world.

To overcome the problem of a significantly smaller number of Covid‐19 images as compared with normal and CAP images, Sakib et al.34 used a custom GAN to generate more Covid‐19 images for training. The data set used by them consisted of 27,228 normal, 5794 CAP, and 209 Covid‐19 images. On analysis, they found that generating precisely 100%, that is, 209 new Covid‐19 images by GAN, led to the highest classification accuracy. On top of the GAN, they used a customized CNN with exponential linear unit activation and the Adagrad optimizer. The idea of using a customized and lean CNN works well in cases where the data used for training are scarce. In such cases, even if the metrics are not necessarily excellent, we can be assured that the model will not overfit our small data set, ensuring good generalizability. Talking about the results, this model achieved 93.94%, 88.52%, and 95.91% accuracy on CP, CAP, and normal cases, respectively.

Ali et al.35 proposed a dual attention module to classify viral pneumonia and bacterial pneumonia. For training, they used the popular data set available on Kaggle, which consists of 5856 chest X‐rays. The dual attention module consists of a spatial attention module and a channel attention module. For readers who do not know what "attention" is: attention was primarily used for natural language processing (NLP) in recurrent neural networks to allow the network to remember the relevant parts of a sentence. Later on, it was adopted into computer vision to determine the relevance of each feature with respect to the output. After that, each feature is multiplied by its weight to give importance to those features that contribute more to the output. The channel attention module measures the importance of each channel with regard to other channels, whereas the spatial attention module measures the importance of each feature in a channel with regard to other features in the same channel. This model achieved an accuracy of 97.82%.
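A generic dual attention pair of the kind described above can be sketched as follows. This is our formulation for illustration; Ali et al.'s exact module layout may differ.

```python
import torch
import torch.nn as nn

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention (a generic sketch)."""
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.channel = nn.Sequential(          # which channels matter
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())
        self.spatial = nn.Sequential(          # which positions matter
            nn.Conv2d(channels, 1, kernel_size=7, padding=3), nn.Sigmoid())

    def forward(self, x):
        ca = self.channel(x).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        x = x * ca                                        # reweight channels
        return x * self.spatial(x)                        # reweight positions

feat = torch.randn(2, 64, 56, 56)        # feature map from a CNN stage
print(DualAttention(64)(feat).shape)     # torch.Size([2, 64, 56, 56])
```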
Ohata et al.36 used MobileNet to classify chest X‐rays with Covid‐19 against normal chest X‐rays. The data set used consisted of 194 Covid‐19 images, and the normal images were collected from the Kaggle and NIH data sets. They used MobileNet for feature extraction and tried six different classifiers for classification purposes. In the end, they decided to use a linear SVM for classification, which gave an accuracy of 98.62%. Lastly, Chowdhury et al.37 tried using various models such as SqueezeNet, MobileNet, InceptionV3, ResNet18, ResNet101, CheXNet, DenseNet201, and VGG19 on 423 Covid‐19 images, 1579 normal images, and 1485 CAP images. They concluded that DenseNet and CheXNet perform best (99.70% accuracy) in two‐class classification, that is, Covid‐19 and other, whereas DenseNet performs best (97.94% accuracy) in the three‐class classification problem, that is, Covid‐19, CAP, and normal. The sensitivity and specificity were 0.979 and 0.988, respectively (Table 3).
TABLE 3 A systematic study on detection of Covid‐19 and classification of viral pneumonia from bacterial pneumonia

Author                  Model                  Task              Data set           Accuracy
Haghanifar et al.25     U‐Net + DenseNet‐121   CP/N/CAP          780/4600/5000      81.06%
Gu et al.26             FCN + DCNN             Bacterial/viral   2655/1848          76.92%
Zhang et al.11          ResNet + AD + CoP      CP/N/CAP          5977/18619/18774   80.33%
Wang et al.30           POL + ResNet           CP/N/CAP          204/2004/1314      99%/90%/93%
Arias‐Londoño et al.32  U‐Net + Covid‐Net      CP/N/CAP          8573/400/49000     91.53%
Sakib et al.34          GAN + custom CNN       CP/N/CAP          209/5794/27228     94%/88.5%/96%
Ali et al.35            ResNet + attention     Bacterial/viral   Kaggle             97.82%
Ohata et al.36          MobileNet              CP/CN             194/NIH‐RSNA       97.00%
Chowdhury et al.37      Multiple               CP/N/CAP          423/1485/1579      97.94%

Abbreviations: CAP, community‐acquired pneumonia; CN, covid negative; CNN, convolution neural network; Covid‐19, coronavirus disease 2019; CP, covid positive; DCNN, deep convolution neural network; FCN, fully convolution network; GAN, generative adversarial network; N, normal; NIH, National Institutes of Health; RSNA, Radiological Society of North America.

2.3 | Localization of pneumonia in chest X‐rays

Although we have already covered some research that localized the entire lung region with the help of segmentation models such as U‐Net or a YOLO‐like lung regressor, it is worth noting that the research covered previously only localized the entire lung regions and not the pneumonia‐affected regions. Localization of pneumonia‐affected regions in a chest X‐ray can be beneficial in two ways. Mainly, it can assist radiologists in giving a quicker

and more accurate diagnosis. Not only that, but localization also solves a significant problem of generalizability that we have encountered so far. If the primary goal of our deep learning model is to localize pneumonia‐affected regions, we can be assured that the model is not looking at the wrong features to arrive at the right decision. As far as data sets are concerned, only one data set (RSNA) has enough images with bounding boxes to train a DL model that localizes well. Thus, it will be easy to compare all research work in this section based on metrics alone.
We start by explaining the approach38 that won the RSNA Pneumonia Detection Challenge hosted by Kaggle. The authors used an ensemble of five models to localize pneumonia in chest X‐rays. These five models were divided into two groups. The output regions from the first group (three models) were ensembled into one region. Similarly, the output regions from the second group (two models) were separately ensembled into a single region. Finally, the output regions from the two groups were ensembled into one output region using appropriate thresholds. The first group is made up of one Deformable Object Relation Network and two Deformable region‐based FCNs (R‐FCNs). Here, the prefix Deformable simply suggests the use of deformable convolutions in the respective architectures. Deformable convolutions are different from regular convolutions in that every pixel/feature is offset by a certain amount in a certain direction. In this way, the shape of the receptive field of the convolution becomes free and is not limited to a rectangle. The offsets are learnable and thus play an essential role in correctly locating the entire object.
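torchvision ships a deformable convolution operator, so the idea above can be tried directly. This is a minimal sketch; in the actual detectors, the offset predictor sits inside the backbone rather than standing alone.

```python
import torch
from torchvision.ops import DeformConv2d

# Each of the 3x3 kernel positions gets a learnable (dy, dx) offset per output
# pixel, so the receptive field is no longer a fixed rectangle.
conv = DeformConv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
offset_pred = torch.nn.Conv2d(16, 2 * 3 * 3, kernel_size=3, padding=1)  # learns offsets

x = torch.randn(1, 16, 64, 64)
offsets = offset_pred(x)         # (1, 18, 64, 64): one (dy, dx) per kernel tap
out = conv(x, offsets)           # with all-zero offsets this reduces to a regular conv
print(out.shape)                 # torch.Size([1, 32, 64, 64])
```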
The object relation network is not used very commonly and thus deserves some explanation. The object relation module is an adapted version of a basic attention module used in NLP. Although the primitive elements of an NLP attention module are words, the primitive elements of an object relation module are objects. As objects have a two‐dimensional spatial arrangement and vary in terms of scale/shape, their locations and geometrical features are much more complex than the positions of words in a single sentence. Hence, the object relation module has an added geometric weight besides the original weight commonly found in NLP attention modules. The geometric weight considers the relative geometry of objects and models spatial relationships between them.
The second type of module used in the first group is the deformable R‐FCN, which is just an R‐FCN with deformable convolutions. R‐FCN is explained during the discussion of GeminiNet in this section itself.

Moving on, the second group is made up of two RetinaNets. The difference between these two RetinaNets is not in their architectures but in the type of input images used for training. The first RetinaNet, also called the ConcatRetinaNet, uses concatenated images for training. Each concatenated image is made by concatenating a pneumonia‐negative image with a pneumonia‐positive one. This way, the RetinaNet improves its capacity to distinguish between lung opacity with pneumonia and lung opacity without pneumonia. Images of 10 different sizes are given as input to all five models. Hierarchical ensembles are then formed from the two main groups and, finally, the bounding boxes from both groups are ensembled according to different thresholds.

Li et al.39 used 30,000 images to train their model, and the rest of the images from the RSNA data set were used for testing. Before using the raw images as input, they segmented the lung region from the original image using U‐Net, much like Haghanifar et al.25 After segmenting the lung region, they combined the segmented and raw images to make a final data set for training their model. They used the SE‐ResNet34 for localizing regions containing pneumonia.40 SE‐ResNet is short for squeeze‐and‐excitation ResNet, which is basically an encoder–decoder model that serves multiple purposes. The SE‐ResNet acts as a feature extractor, and its side branch can automatically learn weights to assign importance to each channel. Moreover, the model can learn smoothly even over significantly deep layers without risk of degradation because of the residual blocks. Hence, the model works as a channel attention module over a ResNet34. For the final output, each pixel in the output channel represents the probability of that pixel belonging to the pneumonia class. The regions can then be extracted by applying thresholds to those probabilities. Coming to the results of this model, it was able to achieve a mean average precision (mAP) score of 0.262. The mAP was calculated under intersection over union (IoU) thresholds of 0.3, 0.4, 0.5, 0.6, and 0.7.
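The squeeze‐and‐excitation side branch and the final thresholding step described above can be sketched as follows (a generic SE block; the decoder that produces the per‐pixel probability map is assumed and replaced by random values here):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-excitation: global-pool each channel, then learn a
    per-channel importance weight (a generic sketch)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        return x * self.fc(x).unsqueeze(-1).unsqueeze(-1)

# Per-pixel probabilities -> pneumonia regions by thresholding, as in the text.
prob_map = torch.sigmoid(torch.randn(1, 1, 256, 256))  # stand-in decoder output
region_mask = prob_map > 0.5                           # tune threshold on validation data
print(region_mask.float().mean())
```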

Dimitrov's team placed second in the RSNA pneumonia detection challenge hosted by Kaggle; their paper (coauthored by Poplavskiy) describes the model and approach in detail.41 For their model, they used RetinaNet, which is a single‐shot detector. For the base of RetinaNet, they decided to use the encoder part of SE‐ResNext‐101. This particular design was chosen to accommodate both the speed of a single‐shot detector and the accuracy of a deep model such as ResNext‐101. Using this approach, they were able to achieve an mAP score of 0.26097. The official score on the leaderboard was 0.24781, but they optimized the model with heavy augmentations and zero rotation after the competition was over. A lot more trial and error went into making this model, mainly because it was made as a part of a competition. Almost all hyperparameters in this model are optimized, and with good reasons, which are provided in their paper.

Up until now, we have talked about research that uses single‐shot detectors for the localization of pneumonia‐affected regions. However, two‐stage detectors have a significant advantage over single‐shot detectors in terms of accuracy. There is, of course, a time tradeoff involved while using two‐stage detectors, but the question to ask is: how much does the detection time matter? At testing time, the difference between single‐shot and two‐stage detectors is not big enough to make any significant difference, because real‐time detection is not required for any use case of pneumonia localization.

Keeping this in mind, Yao et al.42 presented the GeminiNet in March 2020. Before we begin with the explanation of this study, there is a note worth taking. Some terminology in the following four or five sentences might sound new to beginners, but all of it is elaborated upon in considerable detail in the two successive paragraphs. Continuing with GeminiNet, it is a two‐stage detector that builds upon the concept of R‐FCN.43 The difference between R‐FCN and GeminiNet is that the latter uses RFB44 blocks instead of simple convolution blocks for multiscale context information. Moreover, they changed the base model used for feature extraction: instead of using ResNet‐50, they used DetNet59, because it yielded better performance metrics. This model (DetNet59 + GeminiNet) presented by the authors achieved an mAP score of 0.3259 at IoU thresholds 0.4, 0.5, 0.6, and 0.7.
Now onto the elaboration, the RFB block is much like niNet does not use R‐FCN as it is. The changes are as
an InceptionV1 block, except it has an extra shortcut such shown in Figure 2.
as residual blocks in a ResNet. RFB blocks are especially While on the topic of R‐FCN, the approach of the
useful in object detection scenarios, because they have DeepRadiology Team46 is worth mentioning. They used a
variable receptive fields (e.g., inception) and they can modified version of R‐FCN called CoupleNet.47 Couple-
handle deep models smoothly (e.g., ResNet). Moreover, Net adds a second branch to R‐FCN for processing global
instead of simple convolutions, the authors used dilated features. This way, the resulting architecture learns fea-
convolutions in the RFB block.45 Dilated convolutions tures from a larger area through the global branch by
convolve upon a larger size (say 5 × 5 instead of 3 × 3) but adding extra ROI features and local features learn from
select only a few features (3 × 3 = 9) from the big block the local branch by using PS‐ROI features. The DeepRa-
(5 × 5), thereby keeping the number of parameters small diology Team used an ensemble of four models having
but increasing the receptive field (Figure 1). the same architecture. All four of these models gave
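To make this parameter/receptive-field tradeoff concrete, here is a short PyTorch sketch (our own illustration, not code from the GeminiNet implementation) comparing a plain 3 × 3 convolution with a dilated one:

```python
import torch
import torch.nn as nn

# Plain 3x3 convolution: 9 weights, receptive field 3x3.
plain = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)

# Same 3x3 kernel with dilation=2: still 9 weights, but the taps are
# spread out, so the receptive field grows to 5x5.
dilated = nn.Conv2d(1, 1, kernel_size=3, padding=2, dilation=2, bias=False)

x = torch.randn(1, 1, 32, 32)
assert plain(x).shape == dilated(x).shape  # identical output size
print(sum(p.numel() for p in plain.parameters()),    # 9
      sum(p.numel() for p in dilated.parameters()))  # 9
```

Both layers cost the same nine weights per channel pair; only the spacing of the taps, and hence the receptive field, differs.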
FIGURE 1 Classifications of pneumonia and its detection techniques. Covid-19, coronavirus disease 2019; CT, computerized tomography

Although RFB blocks are important in GeminiNet, its heart is the R-FCN. R-FCN is used as a substitute for Fast R-CNN (Fast region-based convolutional neural network) and Faster R-CNN. Fast R-CNN improves upon the speed of R-CNN by calculating the feature map of the entire image at once and using that feature map to derive regions of interest (ROIs) directly, so feature maps do not have to be calculated for different ROIs separately. R-FCN works by simultaneously generating ROIs and region-based feature maps, thus saving a lot of time. After that step, for all regions generated in the ROI step, the region-based feature maps are checked to vote for the probability of a particular ROI containing a particular part of the entire object. The final vote array (consisting of probabilities from all ROIs) is averaged to determine which object is present in the image. This process of calculating probabilities for all ROIs and storing them in a vote array is called position-sensitive ROI (PS-ROI) pooling. GeminiNet does not use R-FCN as it is; the changes are shown in Figure 2.

FIGURE 2 Architecture of GeminiNet. FCN, fully convolutional network; PS-ROI, position-sensitive ROI

While on the topic of R-FCN, the approach of the DeepRadiology Team46 is worth mentioning. They used a modified version of R-FCN called CoupleNet.47 CoupleNet adds a second branch to R-FCN for processing global features. This way, the resulting architecture learns features from a larger area through the global branch, by adding extra ROI features, while local features are learned from the local branch using PS-ROI features. The DeepRadiology Team used an ensemble of four models having the same architecture. All four of these models gave unique outputs, which were used for generating the final regions. First, all bounding boxes that had a confidence score < 0.5 were eliminated. After that, bounding boxes from all four groups that had an IOU > 0.25 were grouped together. Lastly, the coordinates of all bounding boxes in one group were used to derive a final bounding box. This model was able to achieve an mAP of 0.23089 and placed seventh in the competition.
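As a rough, self-contained illustration of this three-step merging procedure, consider the sketch below; the 0.5 confidence and 0.25 IOU thresholds follow the description above, while the box format, function names, and greedy grouping strategy are our own assumptions:

```python
import numpy as np

def iou(a, b):
    # Boxes are [x1, y1, x2, y2]; standard intersection-over-union.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def merge_ensemble(boxes, scores, conf_thr=0.5, iou_thr=0.25):
    # Step 1: drop boxes whose confidence is below the threshold.
    kept = [b for b, s in zip(boxes, scores) if s >= conf_thr]
    # Step 2: greedily group boxes that overlap by more than the IOU threshold.
    groups = []
    for box in kept:
        for group in groups:
            if iou(box, group[0]) > iou_thr:
                group.append(box)
                break
        else:
            groups.append([box])
    # Step 3: average the coordinates of each group into one final box.
    return [np.mean(g, axis=0) for g in groups]

boxes = [[10, 10, 50, 50], [12, 11, 52, 49], [200, 200, 240, 240]]
scores = [0.9, 0.8, 0.95]
print(merge_ensemble(boxes, scores))  # two boxes: one merged pair, one singleton
```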
Next, we move on to models that use a combination of single-shot detectors and two-stage detectors. Sirazitdinov et al.48 presented a model that used a combination of RetinaNet (single-shot detector) and Mask R-CNN (two-stage detector). RetinaNet worked as the main unit, whereas Mask R-CNN was used as an auxiliary unit to adjust the regions of RetinaNet. The working of the entire model is straightforward: both the RetinaNet and the Mask R-CNN models work separately and predict bounding boxes with corresponding classes. After applying non-max suppression in both models, a weighted average of predictions from both models is calculated, where the weight ratio of RetinaNet to Mask R-CNN predictions is 3:1. This ratio was found by an iterative grid search over many such ratios, ranging from 1:1 to 4:1.
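A toy version of this weighted fusion and ratio search is sketched below; the boxes and the scoring criterion are invented purely for illustration (in the original work the ratio was chosen by validation performance, not a known reference box):

```python
import numpy as np

def fuse(box_a, box_b, w_a, w_b):
    """Weighted average of two matched boxes in [x1, y1, x2, y2] format."""
    return (w_a * np.asarray(box_a) + w_b * np.asarray(box_b)) / (w_a + w_b)

retina_box, mask_box = [10, 10, 50, 50], [14, 12, 54, 48]  # a matched pair
reference = np.array([11, 11, 51, 49])                      # stand-in "truth"

# Grid search over RetinaNet:Mask R-CNN ratios from 1:1 to 4:1.
ratios = [(r, 1.0) for r in np.arange(1.0, 4.01, 0.5)]
best = min(ratios,
           key=lambda w: np.abs(fuse(retina_box, mask_box, *w) - reference).sum())
print(best)
```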
Another research work explored the combination of RetinaNet and Mask R-CNN for pneumonia detection.49 They tried various ensembles of RetinaNet and Mask R-CNN with different sizes and different weights. Finally, a model with RetinaNet 178, RetinaNet 184, RetinaNet 201, Mask R-CNN 150, and Mask R-CNN 162 in the ratio 2:2:3:2:3 was used for detection. This model achieved an mAP of 0.21746, which would have placed approximately 21st in the competition (Table 4).

2.4 | Classification of Covid-19 and CAP via CT scans
Harmon et al.50 built a deep learning model to detect Covid-19 from CT scans using multinational data sets. Their data set consisted of CP scans from China (369), Japan (100), and Italy (57). In total, 1059 scans were used for training and 1397 separate scans were used for testing. Their deep learning model consists of a lung segmentation module and a classifier module. The lung segmentation module segments the lung region from the entire CT scan; after the lung region is segmented, the segmented region is given as input to the classifier, which classifies the input as CP or covid negative. For the lung segmentation module, the AH-Net51 architecture is used. AH-Net is an encoder–decoder-based segmentation module used for three-dimensional (3D) segmentation, and it mostly works like U-Net. The segmented regions used during training had a mean dice score of 0.95. Dice scores are similar to IOU scores and are used widely as a metric in segmentation tasks. Moving on, the classification module is made up of the DenseNet-121 architecture, just like CheXNet, and takes a fixed input of size 192 × 192 × 64. Finally, this model achieved an accuracy of 89.6% with an AUC score of 0.941 on independent testing sets. Although the architecture of the classifier in this model is the same as CheXNet, the number of training images is significantly smaller. Nevertheless, Grad-CAM evaluations of this model show that the model can learn correct features to arrive at the right decision. Hence, the segmentation module that precedes the classification module plays a vital role in the generalizability of this model. The sensitivity and specificity of this model were 0.840 and 0.930, respectively.
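Since dice scores recur as the main segmentation metric throughout the rest of this review, a minimal NumPy implementation (our own, not taken from any cited paper) is worth spelling out:

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2*|A ∩ B| / (|A| + |B|) for binary masks; 1.0 is perfect overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)

pred = np.zeros((4, 4)); pred[1:3, 1:3] = 1    # predicted mask (4 pixels)
truth = np.zeros((4, 4)); truth[1:3, 1:4] = 1  # ground-truth mask (6 pixels)
print(dice_score(pred, truth))                 # 2*4 / (4 + 6) = 0.8
```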

TABLE 4 A comprehensive review on localization of pneumonia in chest X-rays

Author | Model | Type | IOU thresholds | mAP
Li et al.40 | U-Net (SE-ResNet34) | SSD | 0.3–0.7 (0.1) | 0.262
Gabruseva et al.41 | RetinaNet (SE-ResNext101) | SSD | 0.4–0.75 (0.05) | 0.260
Yao et al.42 | GeminiNet (modified R-FCN) | TSD | 0.4–0.7 (0.1) | 0.326
The DeepRadiology Team46 | CoupleNet (modified R-FCN) | TSD | 0.4–0.75 (0.05) | 0.231
Sirazitdinov et al.48 | RetinaNet + Mask R-CNN (3:1) | SSD + TSD | 0.4–0.75 (0.05) | 0.204
Ko et al.49 | RetinaNet + Mask R-CNN (7:5) | SSD + TSD | 0.4–0.75 (0.05) | 0.217
Pan et al.38 | R-FCN + RelNet + RetinaNet | SSD + TSD | 0.4–0.75 (0.05) | 0.255

Abbreviations: mAP, mean average precision; R-FCN, region-based fully convolutional network; SSD, single-shot detector; TSD, two-stage detector.
Ouyang et al.52 presented a deep learning model with dual sampling and an online, trainable class activation mapping (CAM) module to ensure that the model learned important features. The training data set used for this model contains 2186 images, of which 1092 are CP and 1094 are CAP. The data set used for testing is also quite large, with 2796 images, of which 2295 are CP and 501 are CAP. The authors also use a standard lung segmentation module, the VB-Net toolkit,53 for lung segmentation. Feature extraction is then done using a ResNet34. After segmentation, the entire data set is sampled in two ways. The first is uniform sampling, where each minibatch contains images in the same ratio as the entire data set. The second method is size-balanced sampling. Size-balanced sampling is required because the data set has only a small number of Covid-19 images with a small infection area; similarly, only a few images with a large area of infection are available in the CAP category. Hence, size-balanced sampling is applied such that the ratios of CAP images with large infections, CAP images with small infections, covid images with large infections, and covid images with small infections remain approximately the same in each minibatch. This ratio is maintained by oversampling. However, oversampling poses another challenge: overfitting. This challenge is resolved by using a first-of-its-kind online CAM module. The online CAM is generated by applying a 1 × 1 × 1 convolution to the weights of the fully connected layer and then convolving that layer over the feature map; a ReLU operation is applied at the end to get the final activation map. This model achieved 95.4% accuracy with an AUC of 0.988. The sensitivity and specificity of this model were 0.872 and 0.907, respectively.
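Our schematic reading of that online CAM computation is sketched below; the tensor shapes and function name are assumptions, not the authors' code. Because everything here is differentiable, the activation maps can be supervised during training, which is what makes the module "online":

```python
import torch
import torch.nn.functional as F

def online_cam(feature_map, fc_weight):
    """feature_map: (N, C, D, H, W) backbone features.
    fc_weight: (num_classes, C) weights of the final fully connected layer.
    Reshaping fc_weight into a 1x1x1 kernel and sliding it over the feature
    map yields one activation map per class; ReLU keeps positive evidence."""
    kernel = fc_weight.view(*fc_weight.shape, 1, 1, 1)  # (classes, C, 1, 1, 1)
    return F.relu(F.conv3d(feature_map, kernel))        # (N, classes, D, H, W)

cam = online_cam(torch.randn(2, 64, 8, 8, 8), torch.randn(2, 64))
print(cam.shape)  # torch.Size([2, 2, 8, 8, 8])
```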
The work of Wang et al.54 is yet another example of a deep learning model that consists of a lung segmentation module followed by a classifier with attention. Their data set consists of 4657 scans, of which 936 are normal, 2406 are CAP, and 1315 are CP. For segmentation, the authors used the 3D U-Net55 model. After lung lobe segmentation, the images are cropped to a size of 96 × 96 × 96 and passed into the classifier. The classifier consists of two parts, the pneumonia detector and the pneumonia classifier. If an image is detected to have pneumonia by the pneumonia detector, it is passed to the pneumonia classifier, which classifies the image into interstitial lung disease (ILD) or Covid-19. The fact that the pneumonia classifier only comes into action after the pneumonia detector has performed its job was leveraged by using a prior-attention residual block. As shown in Figure 3, the prior-attention residual block has one additional input compared with a regular residual block, which is borrowed from the weights of the final layer of the pneumonia detection module. The prior-attention residual block can thus get the attention weights before backpropagation takes place, and they can be used to train the pneumonia classifier simultaneously. This method ensures that the classifier is trained on the right features. This model achieved an accuracy of 93.3% on the Covid-19 class, 89.4% on the ILD class, and 91.5% on the normal class. The sensitivity and specificity for the normal/viral/Covid-19 classes were (91.5/89.4/93.3) and (93.5/90.6/95.5), respectively.
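A minimal sketch of such a prior-attention residual block is given below; the layer sizes and the exact gating are placeholders based on our reading of the description, not the published implementation:

```python
import torch
import torch.nn as nn

class PriorAttentionResBlock(nn.Module):
    """Residual block whose features are rescaled by an external attention
    map (here, one derived from the pneumonia detector) before the shortcut."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv3d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv3d(channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x, attention):
        out = self.relu(self.conv1(x))
        out = self.conv2(out) * attention  # prior attention gates the features
        return self.relu(out + x)          # standard residual shortcut

block = PriorAttentionResBlock(16)
x = torch.randn(1, 16, 8, 16, 16)
attention = torch.sigmoid(torch.randn(1, 1, 8, 16, 16))  # broadcasts over channels
print(block(x, attention).shape)  # torch.Size([1, 16, 8, 16, 16])
```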
Lai et al.56 proposed the novel coronavirus-infected pneumonia (NCIP)-Net for the detection of Covid-19 from CT scans. The authors of NCIP-Net used a multitask DCNN for determining the presence of Covid-19 from the entire image, segmenting Covid-19 lesions from the entire CT scan, and determining the probability of Covid-19 from the segmented lesions. The data set used for training this model consists of 323 Covid-19-positive CT scans and 501 normal scans. Before providing the images to the model as input, all images went through a lung lobe segmentation process where the lung region was separated from the entire image. The model is constructed like a normal encoder–decoder, but the encoder is connected to three branches. Of those three branches, one is the decoder, which is used for lesion segmentation. The second branch from the encoder is used for the prediction of Covid-19 directly from the image. The third branch is used for determining the probability of Covid-19 based on the ROIs with lesions. The training is divided into two stages. In the first stage, the second branch from the encoder is connected to three convolution layers with a residual block, concatenated with a softmax function, to determine the probability of Covid-19 from the image directly. Still in the first training stage, the features encoded by the encoder are passed on to the decoder for lesion segmentation based on dice loss. In the second stage of training, CT volume patches are used as input, and the third branch extended from the encoder (C-Net) is used to identify a maximum of 10 proposals with the likelihood of lesions to predict the presence of Covid-19. The encoder can predict the proposals with the likelihood of lesions because it was previously trained to segment lesions from the CT scan. This model achieved an accuracy of 74.4% in Covid-19/normal and 82.9% in Covid-19/other lung diseases.

FIGURE 3 Comparison of different attention mechanisms



Looking at all this study work, some patterns clearly stand out. The first and most important one is to segment the lung region from the entire CT scan. This way, a lot of computation time is saved, and the model is forced to learn features from the right region. However, the model can still learn the wrong features from the lung region. To overcome this problem, some kind of attention mechanism, online or offline, is used in all models that are proven to generalize well.

Next, we move on to some research work that distinguishes pneumonia from normal cases and does not include Covid-19 cases. A separate section was not created for the detection of pneumonia via CT scans, because not enough research has been carried out on that topic; detection of pneumonia is usually done with X-rays rather than with CT scans.

Wang et al.57 proposed a multichannel multimodal deep regression framework for the screening of pneumonia from CT scans. For their model, they used 450 pneumonia-positive CT scans and 450 normal CT scans. In addition, they used the complaints of those patients and their demographic information to improve the performance of their model. The entire model is divided into three parts that process demographic information, complaint information, and CT scans, respectively. Intuitively, the demographic information and the complaint information are processed with the help of an LSTM (long short-term memory) network. The CT scans, however, are processed differently. First, three slices from the CT, namely the lung window, high attenuation, and low attenuation (LA), are extracted and concatenated into a three-channel image. This three-channel image is then passed on to an R-CNN with a ResNet-50 base. The R-CNN is an object detection module, so it detects the region of the CT scan where pneumonia is present. The features extracted from the region detected by the R-CNN are then passed on to an LSTM network, for two reasons. First, the authors wanted to use the three channels as a sequence of video frames that were dependent on each other. Second, an LSTM was the only feasible way to concatenate the demographic and complaint information with the spatial information of CT scans. Finally, all three LSTMs are concatenated and used for pneumonia detection. This model achieved an accuracy of 94.6% in the pneumonia detection task. The sensitivity and specificity of this model were 0.933 and 0.922, respectively (Table 5).

TABLE 5 A detailed study on classification of Covid-19 and CAP via CT scans

Author | Model | Task | Data set | Accuracy
Harmon et al.50 | AH-Net + CheXNet | CP/N/CAP | 1059 | 89.6%
Ouyang et al.52 | VB-Net + ResNet34 | CP/CAP | 1092/1094 | 95.4%
Wang et al.54 | 3D U-Net + ResNet | CP/N/CAP | 1315/936/2406 | 93.3%/91.5%/89.4%
Lai et al.56 | NCIP-Net | CP/N | 323/501 | 74.4%
Wang et al.57 | ResNet + LSTM | N/CAP | 450/450 | 94.6%

Abbreviations: CAP, community-acquired pneumonia; CN, covid negative; Covid-19, coronavirus disease 2019; CP, covid positive; CT, computerized tomography; N, normal; 3D, three-dimensional.
2.5 | Localization of Covid-19 in CT scans

Wang et al.58 presented COPLE-Net, a noise-robust model for segmentation of Covid-19 lesions from CT images. To train their model, they used 558 CP CT images. The architecture of COPLE-Net is based on U-Net, with some modifications. First, instead of using only max-pooling or average pooling for downsampling, the authors concatenated both methods, which gave better results. Second, they modified the skip connections of U-Net by adding another convolution layer between the encoder and the decoder. This additional layer contains half as many channels as the encoder, and it was added to alleviate the semantic gap between the decoder's high-level features and the encoder's low-level features by forcing the encoder features to a lower dimension (half the channels). Third, the authors added an atrous spatial pyramid pooling (ASPP)44 layer at the end of the encoder. An ASPP layer contains four parallel layers of dilated convolutions with different dilation rates, so multiscale features can be extracted for both small and large lesion segmentation.

COPLE-Net was trained using an adaptive self-ensembling technique with a noise-robust dice loss. The noise robustness in the dice loss was achieved by using a mean absolute error analogous dice loss instead of the usual mean squared error analogous dice loss. To understand the self-ensembling, we must first understand which models were ensembled. The authors trained two COPLE-Nets via a teacher-student mechanism. The teacher model was an exponential moving average of the student model and was thus more stable than the student model. However, the weights of the moving average were not fixed from the beginning. If the loss of the student model was more than a defined threshold, the student model was not used to update the teacher model at all. Otherwise, the weight of the student model considered to update the teacher model was defined as a function of the difference between the losses of the said models. This model was able to achieve a dice score of 0.8072, or 80.72%.
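A minimal sketch of this thresholded, adaptively weighted EMA update is shown below; the weighting function is a stand-in for the paper's exact formula, which is more involved:

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, student_loss, teacher_loss,
                   loss_threshold=1.0, base_rate=0.01):
    """Adaptive exponential-moving-average update of the teacher COPLE-Net.

    If the student's loss exceeds a threshold, the teacher is left untouched;
    otherwise the student's contribution grows as its loss drops below the
    teacher's (schematic weighting, not the published one)."""
    if student_loss > loss_threshold:
        return  # unstable student step: do not pollute the teacher
    w = base_rate * min(1.0, max(0.0, teacher_loss - student_loss + 0.5))
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(1.0 - w).add_(s, alpha=w)
```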
Gao et al.59 presented a dual-branch combination network (DCN) for performing lesion segmentation and classification at once. Their data set consisted of 1918 CT scans from 1202 subjects across two hospitals. Before the CT image slices are fed into the DCN, the images undergo lung segmentation through a U-Net. These segmented lungs, with a dice score coefficient of 0.99, were then used as input to the DCN model. The model comprises two main parts, one for classification and one for segmentation. The segmentation model is an encoder-decoder model analogous to a U-Net. The classification model uses ResNet-50 as a backbone with lesion attention (LA) modules, shown in brown in Figure 4. The LA module is a combination of (the original CT slice)/(ResNet-50 downsampled slice) and the feature-extracted slice of the corresponding size from the decoder of the segmentation module. A slice from the decoder module is chosen because the decoder has more relevant features corresponding to Covid-19 lesions. Hence, the ResNet-50 classification module is forced to pay attention to features that contain Covid-19 lesions. This model was able to achieve a dice score of 0.8351, or 83.51%. The classification accuracy for internal validation (CT images from the same hospital that the model was trained on) was 96.74%, with an AUC of 0.9864, whereas the accuracy on external validation (CT images from a different hospital) was 92.87%, with an AUC of 0.9771.
Zhou et al.60 presented a three-way segmentation technique for segmenting Covid-19-infected regions from a CT scan. The data set used by them consisted of CT scans of 120 patients; the total number of unique CT scans used is not disclosed in their paper. The authors, however, used a unique data augmentation technique to generate 200 CT scans from each unique patient. The detailed augmentation technique has not been disclosed in the paper, but the principles upon which the augmentation was based were delineated. Hence, the data set consists of about 24,000 CT scans. The authors used three-way segmentation in that they extracted x–y, y–z, and x–z slices from the CT scan and trained three different segmentation models to segment Covid-19 lesions from these slices. This technique is analogous to how radiologists diagnose Covid-19 lesions: if a particular voxel cannot be clearly predicted as lesion or normal, radiologists often look at voxels surrounding that voxel. Similarly, if we have two-dimensional segmentations from all three axes (x–y, y–z, and x–z), the model can classify a voxel into lesion or normal by looking at surrounding voxels without being limited to a particular plane. This model was able to achieve a dice score of 0.783.
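The sketch below illustrates the three-plane idea with one 2D model per axis and a per-voxel fusion of their probability maps; the average-and-threshold fusion rule is our assumption, since the exact combination used by the authors is not detailed above:

```python
import numpy as np

def three_way_segment(volume, model_xy, model_yz, model_xz):
    """Each model_* maps a 2D slice to per-pixel lesion probabilities.
    Averaging the three probability volumes lets every voxel be judged
    with context from all three planes."""
    d, h, w = volume.shape
    p_xy = np.stack([model_xy(volume[i, :, :]) for i in range(d)], axis=0)
    p_yz = np.stack([model_yz(volume[:, :, k]) for k in range(w)], axis=2)
    p_xz = np.stack([model_xz(volume[:, j, :]) for j in range(h)], axis=1)
    return (p_xy + p_yz + p_xz) / 3.0 > 0.5

# Toy check with "models" that simply threshold slice intensities:
vol = np.random.rand(4, 5, 6)
toy_model = lambda s: (s > 0.5).astype(float)
print(three_way_segment(vol, toy_model, toy_model, toy_model).shape)  # (4, 5, 6)
```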

FIGURE 4 Architecture of dual‐branch combination network



FIGURE 5 A detailed architecture of Inf‐Net

Fan et al.61 presented Inf-Net, a semisupervised deep learning model for the segmentation of Covid-19 lesions from CT scans. Their data set consisted of only 50 CT scans, which aptly justifies the semisupervised learning. The architecture of Inf-Net begins with two convolution layers into which a CT scan slice is fed. The first two convolution layers extract the low-level features. In general, low-level features are known to detect edges in computer vision, so these features are passed through a simple convolution layer and compared against the ground-truth segmented region to determine the edge loss. As shown in Figure 5, this edge loss is backpropagated to f2 so that f2 can learn correct edge features. Next, the features of convolution layers 3, 4, and 5 are passed on to a partial decoder, which yields a coarse global map of the region to be segmented. Only high-level features are used as input to the partial decoder because Wu et al.62 pointed out that low-level features are computationally intensive compared to high-level features and contribute little to the process of segmentation. The global map provided by the partial decoder is labeled as coarse in that it contains an extra segmentation region that needs to be removed. Hence, a reverse attention module is used to erase the extra region from the coarse global map. The removal of this extra region is done with the help of edge features from the second convolution layer, so that only the region inside the edge is preserved. Therefore, the reverse attention module takes input from both f2 and the global coarse map. Three such reverse attention modules, R3, R4, and R5, are stacked in a cascade such that the output of R5 is used as an input for the reverse attention module R4, and so on. Finally, the output of R3 is followed by a sigmoid function to give the completely segmented infected region. The semisupervised learning approach of Inf-Net progressively enlarges the data set: some labels are predicted from the limited training data, and the predicted labels are then used as training data together with the original training data. This process is repeated until enough training data is gathered. Inf-Net achieved a dice score of 0.739 on their data set and a dice score of 0.597 on a different data set.
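A bare-bones sketch of the reverse attention operation (our reading of the idea, not the Inf-Net code) looks like this:

```python
import torch

def reverse_attention(features, coarse_map):
    """Invert the coarse prediction (1 - sigmoid) so attention falls on the
    region that is NOT yet confidently segmented, then gate the features.
    A small convolutional head (omitted) would map the result back to one
    channel and refine coarse_map; cascading three such modules gives
    R5 -> R4 -> R3 as described above."""
    attn = 1.0 - torch.sigmoid(coarse_map)  # highlight the residual region
    return features * attn                  # erase already-confident areas

features = torch.randn(1, 32, 44, 44)
coarse = torch.randn(1, 1, 44, 44)          # broadcasts over the 32 channels
print(reverse_attention(features, coarse).shape)  # torch.Size([1, 32, 44, 44])
```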
Yang et al.63 presented a unique approach for the localization of Covid-19 lesions in CT slices. The idea was to train a generator network that would output normal (without Covid-19) slices even if the corresponding input slice had Covid-19 lesions. Afterward, the output slices could be subtracted from the input slices to localize the regions where Covid-19 lesions were present. The generator model was trained against a discriminator model, which tried to distinguish between real and generated normal images. Moreover, a ResNet-18 was also trained on Covid-19-positive images so that the ResNet could grasp the low-level features and concatenate those features with the encoder of the generator network. This was done because the generator network itself was not powerful enough to grasp the low-level features of a CT slice. Finally, both normal and Covid-19-positive CT slices are provided to the generator model, but the loss is only calculated against normal images. In this way, the generator is forced to generate normal CT slices even from CT slices containing Covid-19 lesions. This is analogous to a denoising autoencoder, where noisy images are passed into the autoencoder but the loss is calculated against noise-less images. A major benefit of this model is that it is weakly supervised; hence, while training the generator, labeled image pairs are not necessarily required. This model achieved a dice score of 0.575, which is very competitive for weakly supervised models; however, fully supervised models have a much higher dice score (Table 6).
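The core trick, computing the reconstruction loss only on Covid-19-negative slices while feeding the generator everything, can be sketched as follows; the names and the L1 loss choice are our assumptions, and the adversarial and ResNet-18 feature terms are omitted:

```python
import torch
import torch.nn.functional as F

def generator_reconstruction_loss(generated, inputs, is_normal):
    """is_normal flags the Covid-19-negative slices in the batch. Lesion
    slices still pass through the generator but contribute no reconstruction
    penalty, so the generator learns to emit healthy-looking output for any
    input - the denoising-autoencoder analogy described above."""
    if is_normal.any():
        return F.l1_loss(generated[is_normal], inputs[is_normal])
    return generated.sum() * 0.0  # keep the graph valid if no normals in batch

generated = torch.randn(4, 1, 64, 64, requires_grad=True)
inputs = torch.randn(4, 1, 64, 64)
loss = generator_reconstruction_loss(
    generated, inputs, torch.tensor([True, False, True, True]))
loss.backward()
```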

TABLE 6 A study on localization of Covid-19 in CT scans

Author | Model | Type | Data set | DSC
Wang et al.58 | COPLE-Net (modified U-Net + ASE) | Fully supervised | 558 scans | 0.8072
Gao et al.59 | DCN (modified U-Net + LA + ResNet) | Fully supervised | 1918 scans | 0.8351
Zhou et al.60 | U-Net (x–y, y–z, x–z axes segmentation) | Semisupervised | 120 patients | 0.783
Fan et al.61 | Inf-Net (custom CNN + RA + PD) | Semisupervised | 50 scans | 0.594
Yang et al.63 | GAN + ResNet | Semisupervised | 1252 scans | 0.575

Abbreviations: ASE, adaptive self-ensembling; Covid-19, coronavirus disease 2019; CT, computerized tomography; DSC, dice score coefficient; LA, lesion attention; PD, partial decoder; RA, reverse attention.

3 | CHALLENGES AND FUTURE SCOPE

The end goal of all research into automatic pneumonia/Covid-19 detection and localization is to have a model that can be used in hospitals, chest X-ray centers, and CT scan centers on an everyday basis. For a single model to be used in different centers worldwide, the model should be able to generalize well to different CT scan/X-ray machines and different demographics.

This poses the problem of collecting a data set that contains such a wide variety of data. Although the problem of overfitting to a particular data set has been mitigated by attention mechanisms, Grad-CAM analysis, adversarial training, and segmentation-before-classification, this kind of work needs to be applied to a more distributed data set so that a model can learn correct features from any chest X-ray/CT scan around the world without the need for tedious preprocessing. Hence, the first future scope would be to collect a data set with a wide variety of chest X-rays/CT scans, especially for Covid-19 classification.

Preprocessing an image of a chest X-ray/CT scan before using it as an input for a deep learning model poses another challenge, as most image preprocessing is dependent on the type of image. For example, chest X-rays taken on machine A would require a different kind of image preprocessing mechanism than a chest X-ray taken on machine B. Hence, another future scope would be creating deep learning models that require little to no data-dependent preprocessing.

In this study, a lot of different research that tackles different problems has been illustrated. Although no single work tackles all challenges, a smart combination of some practices used in the mentioned research might yield a truly generalizable model. Furthermore, several small, custom data sets were compiled by different authors for their research. Combining those data sets, or even using semisupervised domain adversarial training with different data sets, would generalize the corresponding deep learning model better.

Practical application of research in such deep learning models might be restricted to assisting doctors in making a better diagnosis instead of working in complete autonomy. Keeping such applications in mind, deep learning models can be modified to output a prediction highlighting the most important features based on which the prediction was made. This way, doctors might get help if they miss some features in the image that are not apparent to the naked eye.

4 | CONCLUSION

The process for automating the detection of pneumonia from chest X-rays and CT scans has evolved a lot over the past few years, especially with the advent of deep learning methods. Looking back at the past 4 years, base deep learning model architectures have evolved a lot. However, base model architectures are not the most effective solutions for the specific task of pneumonia detection. The pioneering models that achieved good metrics on pneumonia detection tasks tweaked the architectures of base models so that the tweaked models were a better fit for the task of pneumonia detection. The models that followed these pioneering models were focused on generalizing the model architecture. This generalization was achieved through techniques such as adversarial training, Grad-CAM analysis, attention mechanisms, and many more.

The task of classifying Covid-19 from chest X-rays and CT scans is not very different from the pneumonia detection task. However, research into Covid-19 detection through deep learning models is relatively new, because Covid-19 is a relatively new disease (as of 2021). Because of the time gap, the models made for detecting Covid-19 from pneumonia use better base model architectures than those initially used in pneumonia detection. However, the techniques used to make the base models more effective toward the specific task of Covid-19 detection are similar to the techniques used for the pneumonia detection task, both for higher metrics and better generalization. This observation leads us to an important inference: the techniques that make base model architectures more effective or more generalizable for a specific task (pneumonia detection) are at least as important, if not more important, than the base models themselves.

Even as base model architectures keep improving, the techniques discussed in this paper can always be applied to the improved base models to further improve the base models' generalizability and effectiveness. With that thought, many different techniques and architecture tweaks, along with their merits, demerits, and tradeoffs, have been explained in this paper.
A quantitative analysis table that corresponds to each section of the paper is also provided so that readers can correlate the qualitative and quantitative results of different models and techniques. With both qualitative and quantitative analysis, this paper can be a one-stop solution for aspiring researchers who want to study the field of pneumonia/Covid-19 detection in depth. Lastly, this paper serves as a means of initiating and propagating new research in the field of automatic pneumonia/Covid-19 detection and localization by providing a wide breadth of techniques, along with enough depth in every technique, to guide aspiring researchers in the right direction for their specific purpose.

CONFLICT OF INTERESTS
The authors declare no conflict of interest.

REFERENCES
1. Franquet T. Imaging of pneumonia: trends and algorithms. Eur Respir J. 2001;18:196-208. doi:10.1183/09031936.01.00213501
2. Kaymak S, Serener A. Automated age-related macular degeneration and diabetic macular edema detection on OCT images using deep learning: IEEE 14th International Conference on Intelligent Computer Communication and Processing (ICCP). IEEE; 2018:265-269.
3. Shi J, Zheng X, Li Y, Zhang Q, Ying S. Multimodal neuroimaging feature learning with multi-modal stacked deep polynomial networks for diagnosis of Alzheimer's disease. IEEE J Biomed Heal Informatics. 2018;22:173-183.
4. Kaymak S, Esmaili P, Serener A. Deep learning for two-step classification of malignant pigmented skin lesions: 14th Symposium on Neural Networks and Applications (NEURAL). IEEE; 2018:1-6.
5. Serte S, Serener A. A generalized deep learning model for glaucoma detection: 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). IEEE; 2019:1-5.
6. Rajpurkar P, Irvin J, Zhu K, et al. CheXNet: radiologist-level pneumonia detection on chest X-rays with deep learning. 2017. http://arxiv.org/abs/1711.05225
7. Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks. Commun ACM. 2017;60:84-90. doi:10.1145/3065386
8. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2016. doi:10.1109/CVPR.2016.91
9. Lin T-Y, Goyal P, Girshick R, He K, Dollar P. Focal loss for dense object detection: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE; 2017. doi:10.1109/ICCV.2017.324
10. Goodfellow I, Pouget-Abadie J, Mirza M, et al. Generative adversarial networks. Commun ACM. 2020;63(11). doi:10.1145/3422622
11. Zhang J, Xie Y, Pang G, et al. Viral pneumonia screening on chest X-ray images using confidence-aware anomaly detection. 2020. http://arxiv.org/abs/2003.12338
12. Drosten C, Kellam P, Memish ZA. Evidence for camel-to-human transmission of MERS coronavirus. N Engl J Med. 2014;371:1359-1360. doi:10.1056/NEJMc1409847
13. Li W, Moore MJ, Vasilieva N, et al. Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature. 2003;426:450-454.
14. Li Y, Zhang Z, Dai C, Dong Q, Badrigilan S. Accuracy of deep learning for automated detection of pneumonia using chest X-ray images: a systematic review and meta-analysis. Comput Biol Med. 2020;123:103898. doi:10.1016/j.compbiomed.2020.103898
15. Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-Ray8: hospital-scale chest X-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. doi:10.1109/CVPR.2017.369
16. Deng J, Dong W, Socher R, Li LJ, Kai L, Li FF. ImageNet: a large-scale hierarchical image database: IEEE Conference on Computer Vision and Pattern Recognition. IEEE; 2009. doi:10.1109/CVPR.2009.5206848
17. Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. doi:10.1109/CVPR.2017.243
18. Zech JR, Badgeley MA, Liu M, Costa AB, Titano JJ, Oermann EK. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: a cross-sectional study. PLoS Med. 2018;15:e1002683. doi:10.1371/journal.pmed.1002683
19. Janizek JD, Erion G, DeGrave AJ, Lee S-I. An adversarial approach for the robust classification of pneumonia from chest radiographs. ACM CHIL 2020 - Proc 2020 ACM Conf Heal Inference, Learn. 2020:69-79. doi:10.1145/3368555.3384458
20. Liang C, Li Y, Luo J. Multiobjective evolutionary design of deep convolutional neural networks for image classification. IEEE Trans Evol Comput. 2021;25:277-291. doi:10.1109/TEVC.2020.3024708
21. Li Z, Yu J, Li X, et al. PNet: an efficient network for pneumonia detection: 12th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics (CISP-BMEI). IEEE; 2019. doi:10.1109/CISP-BMEI48845.2019.8965660
22. Dong Y, Wu M, Zhang J. Recognition of pneumonia image based on improved quantum neural network. IEEE Access. 2020;8:224500-224512. doi:10.1109/ACCESS.2020.3044697
23. Khalifa NEM, Taha MHN, Hassanien AE, Elghamrawy S. Detection of coronavirus (COVID-19) associated pneumonia based on generative adversarial networks and a fine-tuned deep transfer learning model using chest X-ray dataset. 2020:1-15. http://arxiv.org/abs/2004.01184
24. Dey N, Zhang YD, Rajinikanth V, Pugalenthi R, Raja NSM. Customized VGG19 architecture for pneumonia detection in chest X-rays. Pattern Recognit Lett. 2021;143:67-74. doi:10.1016/j.patrec.2020.12.010
25. Haghanifar A, Majdabadi MM, Choi Y, Deivalakshmi S, Ko S. COVID-CXNet: detecting COVID-19 in frontal chest X-ray images using deep learning. 2020. http://arxiv.org/abs/2006.13807
26. Gu X, Pan L, Liang H, Yang R. Classification of bacterial and viral childhood pneumonia using deep learning in chest radiography. ACM Int Conf Proc Ser. 2018:88-93. doi:10.1145/3195588.3195597
27. Long J, Shelhamer E, Darrell T. Fully convolutional networks for semantic segmentation: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2015. doi:10.1109/CVPR.2015.7298965
28. Everingham M, Van Gool L, Williams CKI, Winn J, Zisserman A. The Pascal visual object classes (VOC) challenge. Int J Comput Vis. 2010;88:303-338. doi:10.1007/s11263-009-0275-4
29. Tan M, Le QV. EfficientNet: rethinking model scaling for convolutional neural networks. 36th Int Conf Mach Learn ICML. 2019:10691-10700.
30. Wang Z, Xiao Y, Li Y, et al. Automatically discriminating and localizing COVID-19 from community-acquired pneumonia on chest X-rays. Pattern Recognit. 2021;110:107613. doi:10.1016/j.patcog.2020.107613
31. Wang F, Jiang M, Qian C, et al. Residual attention network for image classification: IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE; 2017. doi:10.1109/CVPR.2017.683
32. Arias-Londono JD, Gomez-Garcia JA, Moro-Velazquez L, Godino-Llorente JI. Artificial intelligence applied to chest X-ray images for the automatic detection of COVID-19. A thoughtful evaluation approach. IEEE Access. 2020;8:226811-226827. doi:10.1109/ACCESS.2020.3044858
33. Wang L, Lin ZQ, Wong A. COVID-Net: a tailored deep convolutional neural network design for detection of COVID-19 cases from chest X-ray images. Sci Rep. 2020;10:19549. doi:10.1038/s41598-020-76550-z
34. Sakib S, Tazrin T, Fouda MM, Fadlullah ZM, Guizani M. DL-CRC: deep learning-based chest radiograph classification for Covid-19 detection: a novel approach. IEEE Access. 2020;8:171575-171589. doi:10.1109/ACCESS.2020.3025010
35. Ali G, Shahin A, Elhadidi M, Elattar M. Convolutional neural network with attention modules for pneumonia detection. 2020 Int Conf Innov Intell Informatics, Comput Technol 3ICT 2020. 2020;13:0-5. doi:10.1109/3ICT51146.2020.9311985
36. Ohata EF, Bezerra GM, Souza das Chagas JV, et al. Automatic detection of COVID-19 infection using chest X-ray images through transfer learning. IEEE/CAA J Autom Sin. 2021;8:239-248. doi:10.1109/JAS.2020.1003393
37. Chowdhury M, Rahman T, Khandakar A, et al. Can AI help in screening viral and COVID-19 pneumonia? IEEE Access. 2020;8:132665-132676. doi:10.1109/ACCESS.2020.3010287
38. Pan I, Cadrin-Chênevert A, Cheng PM. Tackling the radiological society of North America pneumonia detection challenge. AJR Am J Roentgenol. 2019;213:568-574.
39. Li B, Kang G, Cheng K, Zhang N. Attention-guided convolutional neural network for detecting pneumonia on chest X-rays. Proc Annu Int Conf IEEE Eng Med Biol Soc EMBS. 2019;2019:4851-4854. doi:10.1109/EMBC.2019.8857277
40. Hu J, Shen L, Albanie S, Sun G, Wu E. Squeeze-and-excitation networks. IEEE Trans Pattern Anal Mach Intell. 2020;42:2011-2023. doi:10.1109/TPAMI.2019.2913372
41. Gabruseva T, Poplavskiy D, Kalinin A. Deep learning for automatic pneumonia detection. IEEE Comput Soc Conf Comput Vis Pattern Recognit Work. 2020;2020:1436-1443. doi:10.1109/CVPRW50498.2020.00183
42. Yao S, Chen Y, Tian X, Jiang R. GeminiNet: combine fully convolution network with structure of receptive fields for object detection. IEEE Access. 2020;8:60305-60313. doi:10.1109/ACCESS.2020.2982939
43. Dai J, Li Y, He K, Sun J. R-FCN: object detection via region-based fully convolutional networks. Adv Neural Inf Process Syst. 2016:379-387.
44. Liu S, Huang D, Wang Y. Receptive field block net for accurate and fast object detection. Lect Notes Comput Sci. 2018;45(11215):404-419. doi:10.1007/978-3-030-01252-6_24
45. Chen LC, Papandreou G, Kokkinos I, Murphy K, Yuille AL. DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans Pattern Anal Mach Intell. 2018;40:834-848. doi:10.1109/TPAMI.2017.2699184
46. The DeepRadiology Team. Pneumonia detection in chest radiographs. 2018. http://arxiv.org/abs/1811.08939
47. Zhu Y, Zhao C, Wang J, Zhao X, Wu Y, Lu H. CoupleNet: coupling global structure with local parts for object detection: IEEE International Conference on Computer Vision (ICCV). IEEE; 2017. doi:10.1109/ICCV.2017.444
48. Sirazitdinov I, Kholiavchenko M, Mustafaev T, Yixuan Y, Kuleev R, Ibragimov B. Deep neural network ensemble for pneumonia localization from a large-scale chest X-ray database. Comput Electr Eng. 2019;78:388-399. doi:10.1016/j.compeleceng.2019.08.004
49. Ko H, Ha H, Cho H, Seo K, Lee J. Pneumonia detection with weighted voting ensemble of CNN models: 2nd Int Conf Artif Intell Big Data, ICAIBD 2019. 2019:306-310. doi:10.1109/ICAIBD.2019.8837042
50. Harmon SA, Sanford TH, Xu S, et al. Artificial intelligence for the detection of COVID-19 pneumonia on chest CT using multinational datasets. Nat Commun. 2020;11:1-7. doi:10.1038/s41467-020-17971-2
51. Liu S, Xu D, Zhou SK, et al. 3D anisotropic hybrid network: transferring convolutional features from 2D images to 3D anisotropic volumes. Lect Notes Comput Sci. 2018;11071:851-858. doi:10.1007/978-3-030-00934-2_94
52. Ouyang X, Huo J, Xia L, et al. Dual-sampling attention network for diagnosis of COVID-19 from community-acquired pneumonia. IEEE Trans Med Imaging. 2020;39:2595-2605. doi:10.1109/TMI.2020.2995508
53. Shan F, Gao Y, Wang J, et al. Abnormal lung quantification in chest CT images of COVID-19 patients with deep learning and its application to severity prediction. Med Phys. 2021;48:1633-1645. doi:10.1002/mp.14609
54. Wang J, Bao Y, Wen Y, et al. Prior-attention residual learning for more discriminative COVID-19 screening in CT images. IEEE Trans Med Imaging. 2020;39:2572-2583. doi:10.1109/TMI.2020.2994908
55. Çiçek Ö, Abdulkadir A, Lienkamp SS, Brox T, Ronneberger O. 3D U-Net: learning dense volumetric segmentation from sparse annotation. In: Ourselin S, Joskowicz L, Sabuncu M, Unal G, Wells W, eds. Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016. 2016:424-432. doi:10.1007/978-3-319-46723-8_49
56. Lai Y, Li G, Wu D, et al. 2019 novel coronavirus-infected pneumonia on CT: a feasibility study of few-shot learning for computerized diagnosis of emergency diseases. IEEE Access. 2020;8:194158-194165. doi:10.1109/ACCESS.2020.3033069
57. Wang Q, Yang D, Li Z, Zhang X, Liu C. Deep regression via multi-channel multi-modal learning for pneumonia screening. IEEE Access. 2020;8:78530-78541. doi:10.1109/ACCESS.2020.2990423
58. Wang G, Liu X, Li C, et al. A noise-robust framework for automatic segmentation of COVID-19 pneumonia lesions from CT images. IEEE Trans Med Imaging. 2020;39:2653-2663. doi:10.1109/TMI.2020.3000314
59. Gao K, Su J, Jiang Z, et al. Dual-branch combination network (DCN): towards accurate diagnosis and lesion segmentation of COVID-19 using CT images. Med Image Anal. 2021;67:101836. doi:10.1016/j.media.2020.101836
60. Zhou L, Li Z, Zhou J, et al. A rapid, accurate and machine-agnostic segmentation and quantification method for CT-based COVID-19 diagnosis. IEEE Trans Med Imaging. 2020;39:2638-2652. doi:10.1109/TMI.2020.3001810
61. Fan DP, Zhou T, Ji GP, et al. Inf-Net: automatic COVID-19 lung infection segmentation from CT images. IEEE Trans Med Imaging. 2020;39:2626-2637. doi:10.1109/TMI.2020.2996645
62. Wu Z, Su L, Huang Q. Cascaded partial decoder for fast and accurate salient object detection. Proc IEEE Comput Soc Conf Comput Vis Pattern Recognit. 2019;2019:3902-3911. doi:10.1109/CVPR.2019.00403
63. Yang Z, Zhao L, Wu S, Chen CYC. Lung lesion localization of COVID-19 from chest CT image: a novel weakly supervised learning method. IEEE J Biomed Heal Informatics. 2021;25:1864-1872. doi:10.1109/JBHI.2021.3067465

How to cite this article: Shah A, Shah M. Advancement of deep learning in pneumonia/Covid-19 classification and localization: a systematic review with qualitative and quantitative analysis. Chronic Dis Transl Med. 2022;8:154-171. doi:10.1002/cdt3.17

You might also like