Article

COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal
Albatoul S. Althenayan 1,2, * , Shada A. AlSalamah 1,3,4 , Sherin Aly 5 , Thamer Nouh 6 , Bassam Mahboub 7 ,
Laila Salameh 8 , Metab Alkubeyyer 9 and Abdulrahman Mirza 1
1 Information Systems Department, College of Computer and Information Sciences, King Saud University,
Riyadh 11543, Saudi Arabia; [email protected] (S.A.A.); [email protected] (A.M.)
2 Information Systems Department, College of Computer and Information Sciences, Imam Mohammed Bin
Saud Islamic University, Riyadh 11432, Saudi Arabia
3 National Health Information Center, Saudi Health Council, Riyadh 13315, Saudi Arabia
4 Digital Health and Innovation Department, Science Division, World Health Organization,
1211 Geneva, Switzerland
5 Institute of Graduate Studies and Research, Alexandria University, Alexandria 21526, Egypt;
[email protected]
6 Trauma and Acute Care Surgery Unit, College of Medicine, King Saud University, Riyadh 12271,
Saudi Arabia; [email protected]
7 Clinical Sciences Department, College of Medicine, University of Sharjah, Sharjah P.O. Box 27272,
United Arab Emirates; [email protected]
8 Sharjah Institute for Medical Research, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates;
[email protected]
9 Department of Radiology and Medical Imaging, King Khalid University Hospital, King Saud University,
Riyadh 12372, Saudi Arabia; [email protected]
* Correspondence: [email protected]; Tel.: +966-506061614
Academic Editors: Dong Xiao and Yahui Li

Received: 5 March 2024; Revised: 15 April 2024; Accepted: 18 April 2024; Published: 20 April 2024

Citation: Althenayan, A.S.; AlSalamah, S.A.; Aly, S.; Nouh, T.; Mahboub, B.; Salameh, L.; Alkubeyyer, M.; Mirza, A. COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal. Sensors 2024, 24, 2641. https://doi.org/10.3390/s24082641

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract: Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to the limited time and availability of expert radiologists, as well as the limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often utilized a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and to demonstrate the value of incorporating tabular medical data in achieving more accurate diagnoses. Resnet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchical structure of pneumonia classification within our approach to achieve improved classification outcomes. Since an imbalanced dataset is common in this field, a variety of versions of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private datasets of 4523 patients, achieved a macro-avg F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a Resnet-based structure. In conclusion, in this study, we were able to create an accurate deep learning multi-modal to diagnose COVID-19 and differentiate it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.

Keywords: artificial intelligence; COVID-19; CXR; hierarchical; deep learning; multi-modal; diagnosis; image classification; multi-classes; pneumonia
Figure 1. Proposed hierarchical class structure of pneumonia: Pneumonia divides into Viral and Bacterial; the Viral branch divides into SARSr-CoV-2, Influenza, respiratory syncytial virus (RSV), and Adenoviruses.
As illustrated previously, the hierarchical structure of COVID-19 indicates that this is a hierarchical classification problem. The classes in high-level nodes at hierarchical levels are known as coarse-grained nodes because they have unique features that will be transmitted to their child nodes along with all the features from their parent node. Furthermore, the final level of nodes in the structure, the leaf node, is referred to as fine-grained since it lacks descendants and inherits all of its parent's features.
Applying high-accuracy artificial intelligence (AI) models to diagnose medical imaging problems is a current trend in healthcare. Convolutional neural networks are able to detect and learn significant details that radiologists find difficult to recognize with the naked eye [11], and they produce promising results for learning complex problems in radiology [12]. Many of the previously reviewed studies [13] have employed deep learning models to diagnose and detect COVID-19 pneumonia utilizing medical imaging in a theoretical manner that cannot be implemented clinically. CT scans have primarily been considered the main radiological imaging modality for all infected cases during the ongoing pandemic. In addition, most studies on COVID-19 image classification are misleading due to the use of binary classification and disproportionately large COVID-19 samples. Despite this, the ability of AI systems to differentiate between various classes increases as they learn from a greater number of classes. Most approaches use a flat structure, while pneumonia naturally falls into a hierarchical structure. Despite the several techniques used for COVID-19 detection and classification, limited research has addressed multi-modal deep learning models for heterogeneous data types. Most existing models focus on a single feature modality, while multi-modal features combine multiple aspects of COVID-19 health information, contributing to superior disease diagnostic processes.
In several fields, especially in diagnosis by medical assistants, deep learning ap-
proaches have accomplished significant advances in multi-modal structures by learning
features from different sources of data [14]. This clearly explains the effectiveness of adding
various medical data in addition to CXR images for the diagnostic process.
One variant of the conventional flat classification problem is hierarchical classification
(HC). In a flat classification approach, cases are categorized into classes without following
any predefined structure.
Proving that hierarchical classification is more effective than flat classification in
this domain is not the purpose of this work, as this has already been addressed in the
literature [15]. In this work, we investigated how clinical data affect COVID-19 classification utilizing CXR images with a hierarchical classification framework to detect different types of pneumonia caused by multiple pathogens and differentiate them from normal lungs. To achieve this, we collected a private, imbalanced dataset in which some types of pneumonia are much more common than others. To address this imbalance, we applied variants of the GAN model to balance the class distribution. We first applied multi-modal hierarchical classification utilizing a deep learning approach for two predefined models in a hierarchical structure using a hybrid approach on the CXR images; then, the medical tabular data were added using early fusion. It is important to note that the newly released WHO normative guidance for applying artificial intelligence in health recognizes the risks, and we are in compliance with its recommendations for safe and effective implementation [16,17].
This paper is organized as follows: Section 2 covers the related works in the literature.
The proposed methodology and details of the dataset used in this paper and its analysis, as
well as the techniques used to preprocess either the CXR images or the tabular dataset, are
discussed in Section 3. After that, Section 4 details the proposed architecture of hierarchical
multi-modals and the training procedure. The experimental setup and obtained results and
a discussion are summarized in Section 5. Finally, the conclusions of the current work and
some possibilities for future works are described in Section 6.
2. Related Work
As a result of the COVID-19 pandemic, COVID-19 medical image classification has
recently attracted a lot of scientific interest. Researchers from a variety of disciplines
have developed deep learning detection and classification models to diagnose COVID-19
reliably and quickly by analyzing radiological images. We published a review paper called
“Detection and Classification of COVID-19 by Radiological Imaging Modalities Using
Deep Learning Techniques: A Literature Review” [13], which attempts to explore all related
remarkable works in the literature and study and analyze them to explain how most current
key approaches to the COVID-19 classification challenge have gaps and untapped potential.
In addition, in the aforementioned paper, we provided some recommendations addressing
various aspects that may help researchers in this field.
3. Proposed Methods and Materials

The proposed approach that is applied in this study consists of five phases, as demonstrated in Figure 2. These phases are as follows: collecting the required dataset (CXR images and patient medical data in tabular format), preparing and preprocessing the collected dataset, generating a synthetic dataset to balance the data, feeding the preprocessed dataset into a hierarchical multi-modal network, and evaluating the classification output. The details of each phase are described in the following sections.
Figure 2. Proposed framework for multi-modal classification of CXR images and tabular data.
To obtain the datasets from the hospital databases, a physician gathered the required viral patients by searching for the desired test name and range of years. Since there is no test for bacterial infection, patients whose diagnosis contained the words "bacterial pneumonia" were selected as cases for the bacterial class. It is important to note that the data in the normal category were collected from patients who were scheduled for surgery, to ensure that the patients' lungs were healthy. The CXR images from the first dataset were produced using an Optima XR240amx (General Electric Healthcare, Chicago, United States). All CXR images in datasets (I) and (II) are posterior-anterior (PA) or anterior-posterior (AP) views. Lateral-view CXR images, images whose date gap between diagnosis and acquisition exceeded 48 h, and/or cases without tabular data were excluded. Some samples of the dataset for different classes are shown in Figure 3. The process of obtaining IRB approval from the hospitals and collecting, cleaning, and organizing the data took approximately a full year. The COVID-19 cases were collected for patients who visited from the year 2019 to the year 2022, while the remaining cases were collected for patients from the year 2014 to the year 2023.

Table 1. Patient characteristics for dataset (I).

Category                  Value
Number of patients        3306
Gender: Male              1584
Gender: Female            1722
Diagnosis: SARSr-CoV-2    630
Diagnosis: Normal         1272
Diagnosis: Bacterial      248
Diagnosis: Influenza      1120
Diagnosis: RSV            21
Diagnosis: Adenovirus     15
Age range                 0-103

Table 2. Patient characteristics for dataset (II).

Category                  Value
Number of patients        1217
Gender: Male              1070
Gender: Female            147
Diagnosis: SARSr-CoV-2    1217
Age range                 19-87
The dataset consists of CXR images with the corresponding medical tabular data for each
patient. It includes demographic, vital signs, clinical, and medication data; the attributes in the
data record are described in Table 3. The tabular dataset includes 644 features with categorical
and numerical data types. Some of the features have been removed since they were not
significant in diagnosing pneumonia. In addition, there are a total of 60 different nationalities
represented in the patient sample for the 4523 patients who make up the entire dataset.
The MEWS (modified early warning score) feature is a clinical tool used in healthcare
settings to assess a patient’s vital signs. More hospitals are currently using it to help track
changes between each set of vitals [18]. The MEWS score typically consists of several physio-
logical parameters including blood pressure, body temperature, pulse rate, respiratory rate,
and the AVPU (A = Awake, V = Verbal, P = Pain, U = Unresponsive) score, which is used to
determine a patient’s level of consciousness. A score is given for each parameter based on
specified standards. The overall MEWS score is then determined by summing the scores
for each parameter [19]. According to their MEWS score, patients may be classified into risk categories using the MEWS scoring system: Normal: 0-1, Low Risk: 2-3, Moderate Risk: 4-6, High Risk: 7-8, Critical: >8. The concern for clinical deterioration increases as the MEWS score rises.
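To make the banding concrete, the following minimal Python sketch maps a summed MEWS score to the risk categories above; the per-parameter scoring that produces the sum follows hospital-specific tables and is assumed to be done upstream:

def mews_risk_category(mews_score: int) -> str:
    """Map a total MEWS score to the risk bands described in the text:
    Normal 0-1, Low 2-3, Moderate 4-6, High 7-8, Critical >8."""
    if mews_score <= 1:
        return "Normal"
    if mews_score <= 3:
        return "Low Risk"
    if mews_score <= 6:
        return "Moderate Risk"
    if mews_score <= 8:
        return "High Risk"
    return "Critical"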
With the assistance of a knowledgeable pharmacist, medication prescriptions were
also limited to the most fundamental categories without doses, which helped to decrease
the enormous number of medications from 3500 to 614.
Table 4. Patient’s medical information characteristics (*: data with statistical significance; a: chi-square
test; b: Student’s t-test; c: Kruskal–Wallis H test).
As we can see, there is a significant difference in age, gender, and BMI between the
two groups (all p < 0.001). The MEWS score shows a significant difference (p < 0.001),
and an abnormal MEWS score was more often observed in COVID-19 patients. Signif-
icant differences were not found in some lab tests between the two groups, including
tHb (p = 0.729), Potassium Lvl (p = 0.606), Alk Phos (p = 0.187), Bili Total (p = 0.477), INR
(p = 0.680), LDH (p = 0.553), Ferritin Lvl (p = 0.419), BNP (p = 0.569), and Vitamin D 25 OH
(p = 0.818). WBC, Plt, Lymph Auto #, Sodium Lvl, BUN, Creatinine, Albumin Lvl, ALT, and
Total CK were significantly different between the two groups (p < 0.001). In addition, AST,
Procalcitonin, CRP, Hgb A1c, and D-Dimer also showed significant differences (p = 0.021,
p = 0.036, p = 0.009, p = 0.001, and p = 0.030, respectively). Although medications were
observed in COVID-19 patients, they were not statistically different compared with those
in the non-COVID-19 group.
Saudi citizens make up almost 59.5% of the patients across all classes. The distribution of the data
was also shown using a number of visualizations. Figure 4 shows the distribution of some
continuous and categorical features; many medical features were observed. Chart (A) in
Figure 4, showing the age distribution across all datasets, reveals that the majority of
patients fall within the 20–60-year age range, followed by those aged between 60 and 80,
while the smallest sample size belongs to patients over 80 years old. The distribution of
patient gender is presented in chart (B), where we find that the total females (represented
by 1) are 1869 and the total males (represented by 0) are 2654 patients. Approximately 57%
of the total patients had a normal MEWS score, while 1.2% from all classes were critical
cases with an abnormal MEWS score, as shown in chart (C). Charts (D), (E), and (F) show
that the majority of patients typically have a normal white blood cell (WBC) count, platelet
(PLT) count, and C-reactive protein (CRP) result, respectively. Vitamin D 25 OH in chart
(H) shows a right-skewed distribution, and the lymphocyte percentage (Lymph Auto #) in
chart (G) shows an almost normal distribution.
The Pearson correlation coefficient was used to obtain the relationships between the
continuous features. Cramér’s V was used to measure the association between the cate-
gorical features. Figure 5 shows the correlation of the continuous features; the heat map
shows the correlation between twenty-four continuous features. A correlation value of
0.75 was recorded between both creatinine and total CK. Furthermore, a correlation of 0.72
was observed between total AST and ALT. Table 5 indicates that there is no association
between age and nationality, while there is a weak association between age, nationality, and
MEWS score.
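For reference, a minimal sketch of the two association measures used here; the column names in the usage comments are assumptions:

import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramer's V association between two categorical series:
    V = sqrt(chi2 / (n * (min(r, c) - 1))) for an r x c contingency
    table with n observations; 0 means no association, 1 a perfect one."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, c = table.shape
    return float(np.sqrt(chi2 / (n * (min(r, c) - 1))))

# Hypothetical usage on the tabular dataset (column names assumed):
# df = pd.read_csv("patients.csv")
# print(cramers_v(df["NATIONALITY"], df["MEWS Score"]))
# print(df[continuous_cols].corr(method="pearson"))  # Pearson heat-map input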
Figure 5. Correlation of the continuous features in the dataset.
Table 5. Correlation of the categorical features in the dataset.

Features                       Correlation
(AGE, NATIONALITY)             0.000000
(AGE, MEWS Score)              0.196708
(NATIONALITY, AGE)             0.000000
(NATIONALITY, MEWS Score)      0.187111
(MEWS Score, AGE)              0.196708
(MEWS Score, NATIONALITY)      0.187111
3.3. Data Preprocessing

Efficient preprocessing of data can have a major effect on the reliability and quality of deep learning model results. It assists in guaranteeing that the data are accurate, in the right format, free of errors, and in line with the objectives of the modeling tasks [20].

To preserve data privacy, we anonymized the identity of the patients in the CXR images and the tabular data because it is not included in the analysis. In the following subsections, we discuss each preprocessing step for the tabular data and the CXR images in detail and illustrate its main methods.
3.3.2. Images

All the CXR images were downloaded from picture archiving and communication systems (PACS). The radiology consultant was provided with files containing patient file numbers according to each class; the radiology consultant fetched all the CXR images of the specific range of years (e.g., COVID-19 from 2020 to 2022) that related to the listed patients. Each image was selected and labeled with the represented class by matching the date when the patient visited the hospital and was diagnosed with the disease against the date when the CXR image was taken.

The CXR images were obtained in the DICOM (Digital Imaging and Communications in Medicine) format. The MicroDicom DICOM viewer (version DM_PLATFORM_XRAY_GANAPATI_4.10.2_2020_FW41.1_158) was used to convert the images to JPG files [22].
To help the deep learning model focus on the chest area (especially the lungs), the
images were manually cropped to guarantee that no tiny part of the lungs was removed in
any way, cutting out a chest region and removing any other parts of the body that appeared
in the image. Applying image enhancement is also important to improve the classification
result. The ability of the gamma-correction-based technique to detect COVID-19 from CXR
images outperforms other methods [23]. Trying different thresholds, the gamma threshold
value (0.9) was chosen with the contrast enhancement threshold value (1.5) to enhance
the contrast of the CXR images. Combining them enables a more thorough adjustment
of the appearance and tonal range of the image. If P is the set of pixel values within the range [0, 255] and x is a pixel's grayscale value (x ∈ P), the output of the gamma correction function g(x) is calculated with Equation (1):

g(x) = 255 × (x/255)^(1/γ(x))    (1)
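A minimal sketch of this enhancement step, assuming a constant gamma per image and a simple linear contrast gain, since the exact contrast operator is not specified in the text:

import numpy as np

def enhance_cxr(img_gray: np.ndarray, gamma: float = 0.9,
                contrast: float = 1.5) -> np.ndarray:
    """Gamma correction per Equation (1), with a constant gamma assumed,
    followed by a linear contrast stretch around mid-gray. The thresholds
    (gamma = 0.9, contrast = 1.5) are the values the text reports choosing."""
    x = img_gray.astype(np.float32)
    # g(x) = 255 * (x / 255) ** (1 / gamma)
    g = 255.0 * np.power(x / 255.0, 1.0 / gamma)
    # Linear contrast enhancement around the mid-gray level (assumption).
    out = contrast * (g - 127.5) + 127.5
    return np.clip(out, 0, 255).astype(np.uint8)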
In addition, we apply image denoising using the total variation filter (TVF) method to
remove the noise from the images. Based on the literature, combining contrast enhancement
(gamma correction) and image denoising (TVF) approaches produces outstanding results
on COVID-19 images [24]. Moreover, transformations are used to preprocess the images
before the training phase. Because the CXR images came from different sources and in different sizes, they were standardized by resizing. The images were resized to 128 × 128 pixels, which gave better results throughout the experiments. The images were
converted to grayscale and the image pixel values were normalized to a range by dividing
by 255 (the maximum pixel value for 8-bit images) and then converting to a tensor for
integration with the TensorFlow framework.
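The denoising and standardization chain could look like the following sketch; the total variation weight is an assumption, as the paper does not report it:

import numpy as np
from skimage.restoration import denoise_tv_chambolle
from skimage.transform import resize

def preprocess_cxr(img_gray: np.ndarray) -> np.ndarray:
    """Denoise with a total variation filter, resize to 128 x 128, and
    scale pixel values to [0, 1], as described above. The TV weight is
    an assumption."""
    den = denoise_tv_chambolle(img_gray / 255.0, weight=0.1)
    small = resize(den, (128, 128), anti_aliasing=True)
    return small.astype(np.float32)  # ready to wrap as a framework tensor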
For rib shadow elimination (Figure 6), after lung masking and Sobel edge detection, dilation was applied to merge nearby bright regions and increase their size; an opening operation was also performed on the dilated images. Erosion was applied to the opened images to shrink the bright regions and refine the image by reducing the size of the remaining bright regions after the opening operation. To prepare the images for parabola fitting, connected component analysis was performed on the images to define the connected components. The first (background) and last (foreground) components were excluded as they were not useful; then, fine tuning to the best number of connected components was performed. Thereafter, parabola fitting was calculated using Equation (2):

f(x) = ax² + bx + c    (2)
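A minimal sketch of the connected-component selection and the Equation (2) fit described here, using OpenCV and NumPy; the number of retained components is an assumption, since the text only says this number was fine-tuned:

import cv2
import numpy as np

def fit_rib_parabolas(edge_img: np.ndarray, keep: int = 6):
    """Connected-component analysis followed by the parabola fit of
    Equation (2) on each kept component. `keep` is an assumed value;
    the first/last components are skipped as background/foreground."""
    n, labels = cv2.connectedComponents(edge_img.astype(np.uint8))
    curves = []
    for lab in range(1, min(n - 1, keep + 1)):   # skip first and last components
        ys, xs = np.where(labels == lab)
        if len(xs) < 3:
            continue
        a, b, c = np.polyfit(xs, ys, deg=2)      # f(x) = a*x^2 + b*x + c
        x_range = np.arange(xs.min(), xs.max())
        pts = np.stack([x_range, a * x_range**2 + b * x_range + c], axis=1)
        curves.append(pts.astype(np.int32))
    return curves

# Drawing the fitted curves onto the image:
# cv2.polylines(img, curves, isClosed=False, color=255)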
The most fitting connected components were considered, and all the curves were plotted on the images using the polylines function. Finally, the rib region was acquired, shadow estimation was clearly defined, and suppression was performed by removing the shadows from the image, achieved by adjusting the pixel values in the shadow regions based on the average BGR color values.

Applying the previous approach to our private CXR images, as shown in Figure 6, delivers unacceptable results on most images. It removes a lot of nodules from the lung area, which affects the prediction results. It is important to note that the effectiveness of shadow suppression may depend on the characteristics of the images; here, the poor results could be attributed to a lack of clarity and, in most images, an indistinct lung boundary, despite the images being preprocessed. Testing on a variety of images is often necessary for a robust shadow removal model. For that reason, we decided not to apply rib elimination in this experiment.
Figure 6. An example of the rib shadow elimination process and result: (A) original image; (B) lung mask; (C) Sobel edge detection; (D) dilation; (E) calculation of connected components; (F) shadow estimation; (G) rib suppression; (H) shadow subtraction.
3.4. Generating Synthetic Dataset

One potential pitfall to consider with this approach is the presence of imbalanced datasets. When there is a significant skew in the distribution of classes, with some classes having far fewer samples than others, the model can become biased towards the majority class. This means that the model might perform well in identifying the common class but struggle to accurately classify the less frequent ones. This bias can lead to misleading results and limit the generalizability of the approach to real-world scenarios with a more balanced class distribution. To address bias in model training, we need to balance the dataset to ensure that the number of instances for each class is roughly the same. Balanced datasets often lead to better model performance [27]. Since the number of cases for the classes at each level of the hierarchy structure is not balanced, we needed to generate synthetic data to balance the dataset. Given that the most common augmentation methods used to increase the dataset do not fit the type of dataset used in this research, we chose GANs as the base model. A variety of versions of the model have been developed, each with a particular purpose [28].
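As a reference point only, a minimal DCGAN-style generator for 128 × 128 grayscale CXR images is sketched below in PyTorch; the study evaluates several GAN variants, and none of the layer sizes here are taken from it:

import torch
import torch.nn as nn

class Generator(nn.Module):
    """Illustrative DCGAN-style generator producing 128 x 128 grayscale
    images; all layer sizes are assumptions for the sketch."""
    def __init__(self, z_dim: int = 100):
        super().__init__()
        chans = [512, 256, 128, 64, 32]
        layers = [nn.ConvTranspose2d(z_dim, chans[0], 4, 1, 0), nn.ReLU(True)]
        for cin, cout in zip(chans, chans[1:]):          # 4 -> 8 -> ... -> 64
            layers += [nn.ConvTranspose2d(cin, cout, 4, 2, 1),
                       nn.BatchNorm2d(cout), nn.ReLU(True)]
        layers += [nn.ConvTranspose2d(chans[-1], 1, 4, 2, 1), nn.Tanh()]  # 128 x 128
        self.net = nn.Sequential(*layers)

    def forward(self, z):                                # z: (batch, z_dim, 1, 1)
        return self.net(z)

fake = Generator()(torch.randn(8, 100, 1, 1))            # -> (8, 1, 128, 128)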
Figure 7. Samples of synthetic CXR images.
4. Hierarchical Model Architecture

To develop a hierarchical classification by applying deep learning models, we adapted four pre-trained models to tackle the hierarchical classification process. It has been observed that Visual Geometry Group (VGG)-based and Residual Network (Resnet)-based models are widely utilized in this field and provide outstanding results [29,30]. We adopted VGG11 and Resnet18 as the basic models for this challenge because both are well suited to the moderate size of our dataset. The dataset consisted of eight hierarchical paths, which are shown in Table 6. The number of samples in the table represents only the number of cases in the original datasets. The details of all models are explained in the following subsections.
Table 6. Dataset distribution for hierarchical classification.

Label Path                       #Samples
Level#1
Normal                           1273
Pneumonia                        3270
Level#2
Pneumonia\Bacterial              248
Pneumonia\Viral                  3165
Level#3
Pneumonia\Viral\SARSr-CoV-2      1848
Pneumonia\Viral\Influenza        1281
Pneumonia\Viral\RSV              21
Pneumonia\Viral\Adenoviruses     15
Figure 8. VGG-like multi-modal architecture.

The input image size was changed to 128 × 128 pixels. The initial CNN layer for the first-level decision consists of one 2D convolution with thirty-two filters with a kernel size of 3 × 3 using ReLU activation, and Max Pooling with a kernel size of 2 × 2. The input for the initial layer is 1 channel, and the output is 32 channels. The flattened output from the initial CNN layer is concatenated with the tabular data. These concatenated data are then passed to two hidden layers (Dense) of 128 neurons with ReLU activation. The final layer of the initial branch (decision #1) represents the probabilities of the normal/pneumonia classes. The pneumonia CNN layer for the second-level decision consists of two convolutional layers with the same kernel size, activation, and Max Pooling layer. The output (decision #2) represents the probabilities of the viral/bacterial classes. The viral branch for the third-level decision is similar to the pneumonia branch, but it has a different final layer with four output units (decision #3) corresponding to SARS-CoV-2, influenza, RSV, and adenovirus.
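As an illustration, a minimal PyTorch sketch of the VGG-like multi-modal just described; the channel counts of the deeper blocks and other details the text does not state are assumptions, and the framework choice is for concreteness only:

import torch
import torch.nn as nn

class VGGLikeMultiModal(nn.Module):
    """A small CNN trunk per decision level; the flattened image features
    are fused (early fusion) with the tabular vector before each head."""
    def __init__(self, n_tabular: int):
        super().__init__()
        # Initial CNN layer: 1 -> 32 channels, 3x3 kernel, ReLU, 2x2 max-pool.
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2))
        # Deeper blocks for the pneumonia (level-2) and viral (level-3) branches.
        self.block2 = nn.Sequential(
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))
        self.block3 = nn.Sequential(
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2))

        def head(feat_dim, n_out):
            return nn.Sequential(nn.Linear(feat_dim + n_tabular, 128), nn.ReLU(),
                                 nn.Linear(128, 128), nn.ReLU(),
                                 nn.Linear(128, n_out))
        self.head1 = head(32 * 64 * 64, 2)   # normal vs. pneumonia
        self.head2 = head(32 * 32 * 32, 2)   # viral vs. bacterial
        self.head3 = head(32 * 16 * 16, 4)   # SARS-CoV-2/influenza/RSV/adenovirus

    def forward(self, img, tab):               # img: (B, 1, 128, 128)
        f1 = self.block1(img)                   # (B, 32, 64, 64)
        d1 = self.head1(torch.cat([f1.flatten(1), tab], dim=1))
        f2 = self.block2(f1)                    # (B, 32, 32, 32)
        d2 = self.head2(torch.cat([f2.flatten(1), tab], dim=1))
        f3 = self.block3(f2)                    # (B, 32, 16, 16)
        d3 = self.head3(torch.cat([f3.flatten(1), tab], dim=1))
        return d1, d2, d3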
The second architecture is the VGG-backbone multi-modal, which adapts the original VGG architecture by utilizing a pre-trained VGG11 model as a feature extractor, followed by three branches of fully connected layers (ANNs) for the hierarchical decision-making task. It has the same input image size: 128 × 128 pixels. The first convolutional layer was modified to handle a single input channel, applying 64 filters of size 3 × 3 followed by a Max Pooling layer with a kernel size of 2 × 2. This is followed by the second convolutional layer with 64 filters of size 3 × 3 followed by a Max Pooling layer with a kernel size of 2 × 2 using ReLU activation. This is followed by two convolutional layers and a Max Pooling layer with a kernel size of 2 × 2 using ReLU activation. This pattern of two convolutional layers followed by a Max Pooling layer is repeated for several blocks, progressively increasing the number of filters (typically doubling) to extract more intricate features. After the convolutional layers, the flattened features are combined with the tabular features and passed to three separate branches of fully connected layers. These layers perform computations on all activations from the previous layers. The fully connected layers of the original architecture are replaced with custom layers designed to make hierarchical decisions specific to pneumonia classification, as shown in Figure 9. Each branch has a sequential block with two Dense layers and ReLU activation. The first Dense layer has 128 units. The second Dense layer has a specific number of units depending on the classification task. The first and second branches (decision #1 and decision #2) have two units each (normal vs. pneumonia and viral vs. bacterial, respectively). The third branch (decision #3) has four units for the four viral subtypes (SARS-CoV-2 vs. influenza vs. RSV vs. adenovirus). This adaptation allows the models to focus on the most relevant features for each decision level.
Figure 10. ResNet-like multi-modal architecture.
The ResNet-backbone multi-modal is the second architecture, which is an adaptation of the original ResNet architecture utilizing a pre-trained Resnet18 model as a feature extractor, followed by three branches of fully connected layers (ANNs) for the hierarchical decision-making task, as shown in Figure 11. It has a grayscale input image size of 128 × 128 pixels. The adaptations were designed to process single-channel (grayscale) CXR images by modifying the first convolutional layer to accept a single input channel. The model consists of 18 convolutional layers with ReLU activation. The extracted features from the CXR images are flattened, transforming the 2D feature maps into 1D vectors suitable for fully connected layers. The flattened features from the CNN block combine image-derived features with additional tabular patient information. After utilizing the pre-trained network as a feature extractor, it employs three branches of fully connected layers (ANNs) for the hierarchical decision-making process. Instead of using standard fully connected layers, this architecture distributes the features into three branches, each containing a sequence of two Dense layers with a ReLU activation function in between. The first Dense layer in all branches has 128 units. The key difference lies in the second Dense layer, which adapts its number of units based on the classification task: two units for normal vs. pneumonia (decision #1), two units for viral vs. bacterial (decision #2, assuming pneumonia), and four units for the four viral subtypes (decision #3). This branching approach with custom layers allows the model to make hierarchical decisions tailored to each classification level. Adapting ResNet's residual learning principle, the model efficiently learns features from CXR images, which is crucial for medical imaging tasks where interpretability and accuracy are paramount.
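A minimal PyTorch sketch of this ResNet-backbone multi-modal, using torchvision's pre-trained resnet18 for concreteness; details the text does not state (e.g., whether the backbone is frozen) are assumptions:

import torch
import torch.nn as nn
from torchvision import models

class ResNetBackboneMultiModal(nn.Module):
    """Pre-trained Resnet18 feature extractor modified for grayscale
    input, with the 512-d features fused with the tabular vector and
    fed to three decision branches (2 / 2 / 4 units)."""
    def __init__(self, n_tabular: int):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        # Accept single-channel CXR input instead of 3-channel RGB.
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Identity()            # keep the 512-d feature vector
        self.backbone = backbone

        def branch(n_out):
            return nn.Sequential(nn.Linear(512 + n_tabular, 128), nn.ReLU(),
                                 nn.Linear(128, n_out))
        self.branch1 = branch(2)   # normal vs. pneumonia
        self.branch2 = branch(2)   # viral vs. bacterial
        self.branch3 = branch(4)   # four viral subtypes

    def forward(self, img, tab):                # img: (B, 1, 128, 128)
        feats = self.backbone(img)               # (B, 512)
        fused = torch.cat([feats, tab], dim=1)   # early fusion with tabular data
        return self.branch1(fused), self.branch2(fused), self.branch3(fused)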
4.3. Training the Hierarchical Multi-Modal Methodology

We are now ready to train the four multi-modals using the hierarchical multi-modal approach. The pseudocode provided in Algorithm 1 illustrates the sequential training and classification strategy for a hierarchical model focused on pneumonia detection from CXR images and tabular data. This methodology enables the model to learn distinctive features relevant to each decision level, improving its ability to generalize and accurately classify new data.
Algorithm 1. Pneumonia Hierarchical Classification
1: Input: dataset_path, num_epochs, batch_size
2: Output: Trained Hierarchical Model
3: 1. Initialize transformations for dataset preprocessing
4: 2. Load and split dataset into training and testing sets
5: 3. Define model, loss function, and optimizer
6: Function TrainModelForDecision(model, train_data, decision_point, loss_weights)
7:   For each epoch in num_epochs do
8:     For each batch in train_data do
9:       Perform forward pass for the current decision_point
10:      Compute loss using decision-specific loss_weights
11:      Perform backward pass and update model parameters
12:    End For
13:  End For
14: End Function
15: Sequentially train model for each decision point in the hierarchy
16:   a. For decision_point in [decision_1, decision_2, decision_3] do
17:     i. Set appropriate loss_weights for the current decision_point
18:     ii. Call TrainModelForDecision with the current decision_point
19:     iii. Optionally adjust model for next decision_point
20:   b. End For
21: Function ClassifyImage(image, tabular_data, model)
22:   Perform model inference on the combined features
23:   Extract and return decision outcomes for each hierarchy level
24: End Function
25: 4. Demonstrate classification with a sample image and tabular data using the trained model
The hierarchal multi-modal first determines whether the image shows signs of pneu-
monia. If pneumonia is detected, it then classifies the pneumonia as either viral or bacterial.
If viral pneumonia is detected, the model further classifies the type of viral pneumonia.
The hierarchical inference function returns a tuple of decisions, each corresponding to a
level in the decision hierarchy. Algorithm 2 details the proposed inference with conditional
flow in the form of pseudocode.
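Since Algorithm 2 itself is not reproduced in this excerpt, the following minimal sketch illustrates the conditional flow it describes; the three-head model interface, the class orderings, and single-sample batching are assumptions:

import torch

LEVEL1 = ["normal", "pneumonia"]
LEVEL2 = ["viral", "bacterial"]
LEVEL3 = ["SARSr-CoV-2", "influenza", "RSV", "adenovirus"]

@torch.no_grad()
def hierarchical_inference(model, image, tabular):
    """Conditional top-down inference: stop at the first level whose
    decision makes the deeper levels inapplicable. Assumes a model
    returning three logit heads and a batch of one sample."""
    d1, d2, d3 = model(image, tabular)           # logits for the 3 decisions
    decision1 = LEVEL1[d1.argmax(dim=1).item()]
    if decision1 == "normal":
        return ("normal", None, None)
    decision2 = LEVEL2[d2.argmax(dim=1).item()]
    if decision2 == "bacterial":
        return (decision1, "bacterial", None)
    decision3 = LEVEL3[d3.argmax(dim=1).item()]
    return (decision1, decision2, decision3)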
The training methodology adopted for the VGG-like and ResNet-like multi-modals
involves a sequential and focused approach, targeting one decision point at a time within
the hierarchical structure of the problem. This approach ensures that the models learn to
accurately classify at each level of decision making, from distinguishing between normal
and pneumonia cases to identifying specific types of pneumonia. The training process begins with the first decision point, which distinguishes between normal and pneumonia cases, as follows:
• Train the VGG-like or ResNet-like model, focusing solely on the first decision point.
• Set the loss weight for the first decision point (e.g., normal vs. pneumonia) to 1.
• Set the loss weights for subsequent decision points (e.g., viral vs. bacterial, viral sub-
types) to 0. This ensures that the model concentrates its learning on accurately classifying
the initial coarse categories without being influenced by the more detailed classifications
that follow.
• Train the model until it achieves satisfactory performance on the first decision to
distinguish normal from pneumonia cases.
• The training proceeds to the next decision point (bacterial vs. viral). For this phase,
the model’s weights from the previous training step are retained, ensuring continuity
and leveraging learned features. The loss weight for the current decision is now set
to a higher value (e.g., 0.9 for decision #2), while the loss weight for the first decision
might be reduced (e.g., 0.1) to maintain its knowledge, and the loss weight for the
third decision is set to 0. This process is repeated for each subsequent decision point,
gradually shifting the model's focus down the hierarchy (a sketch of this loss-weight schedule follows this list).
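A minimal sketch of this phase-wise schedule follows; the 1.0/0.9/0.1/0 loss weights are the examples given above, while the phase-3 row is assumed by analogy:

import torch.nn as nn

ce = nn.CrossEntropyLoss()

# Phase-wise loss weights: the 1.0 / 0.9 / 0.1 / 0.0 values come from the
# text above; the phase-3 row is an assumption by analogy.
PHASES = [
    (1.0, 0.0, 0.0),  # phase 1: normal vs. pneumonia only
    (0.1, 0.9, 0.0),  # phase 2: focus on viral vs. bacterial
    (0.1, 0.1, 0.9),  # phase 3: focus on the viral subtypes (assumed)
]

def train_phase(model, loader, optimizer, weights, num_epochs):
    """One phase of the sequential schedule: a weighted sum of the three
    heads' cross-entropy losses. Masking of labels that do not apply to
    a sample (e.g., viral subtype for a normal case) is omitted here."""
    w1, w2, w3 = weights
    for _ in range(num_epochs):
        for images, tabular, (y1, y2, y3) in loader:
            d1, d2, d3 = model(images, tabular)
            loss = w1 * ce(d1, y1) + w2 * ce(d2, y2) + w3 * ce(d3, y3)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

# Model weights are retained between phases, so later phases build on
# what the earlier decision levels have already learned:
# for weights in PHASES:
#     train_phase(model, train_loader, optimizer, weights, num_epochs=10)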
Macro-average accuracy measures the average ratio of correct predictions across all classes, as shown in Equation (3):

Macro-average Accuracy = (1/C) × Σi (TPi + TNi)/(TPi + TNi + FPi + FNi)    (3)

Macro-average precision measures the average ratio of true positives among all predicted positives across all classes, as shown in Equation (4):

Macro-average Precision = (1/C) × Σi TPi/(TPi + FPi)    (4)

Macro-average sensitivity measures the average ratio of true positives among all actual positives across all classes, as shown in Equation (5):

Macro-average Sensitivity = (1/C) × Σi TPi/(TPi + FNi)    (5)

Macro-average F1-score combines both precision and recall into a single metric, providing an overall measure of the model's performance in terms of correctly identifying true positives and minimizing false positives and negatives, as shown in Equation (6):

Macro-average F1-score = (1/C) × Σi (2 × TPi)/(2 × TPi + FPi + FNi)    (6)

where C is the number of classes and TPi, TNi, FPi, and FNi denote the true positives, true negatives, false positives, and false negatives for class i, respectively.
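For reference, the macro-averaged metrics of Equations (3)-(6) can be computed as in the following sketch; scikit-learn supplies the macro precision, sensitivity, and F1-score directly, and the macro accuracy is derived from one-vs-rest confusion terms:

import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

def macro_metrics(y_true, y_pred):
    """Macro-averaged metrics per Equations (3)-(6); macro accuracy is
    computed per class from one-vs-rest confusion terms."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    prec = precision_score(y_true, y_pred, average="macro")
    sens = recall_score(y_true, y_pred, average="macro")
    f1 = f1_score(y_true, y_pred, average="macro")
    accs = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        tn = np.sum((y_pred != c) & (y_true != c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        accs.append((tp + tn) / (tp + tn + fp + fn))
    return float(np.mean(accs)), prec, sens, f1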
Figure 12. Comparison of macro-average accuracy for all models with and without a second dataset (first + second datasets vs. first dataset only): Resnet-backbone 82.06 vs. 75.85; VGG-backbone 82.2 vs. 78.53; Resnet-like 85.77 vs. 83.48; VGG-like 84.25 vs. 81.45.
Table 7. Results of decisions at each level for each hierarchical classification schema using only CXR
images.
Table 8. Comparison of COVID-19 classification results for each hierarchical classification schema
using only CXR images.
Table 9. Comparison of macro-avg results for each hierarchical classification schema using only CXR
images.
In the last experiment, we applied the multi-modal approach by combining both the
CXR images and tabular data. Compared to the third experiment, the results improved very
clearly for all models after integrating the medical tabular data into the CXR images. This
indicates the importance of adding medical data to CXR images for the diagnostic process and
that depending only on the imaging modality does not achieve the required accuracy results.
Key demographic factors that contribute to classifying pneumonia include the patient’s age,
body mass, MEWS score (indicating overall clinical severity and vital signs), nationality,
and gender. Additionally, in terms of lab tests, blood clotting, muscle damage, blood clots,
inflammation, tissue damage, Albumin level, kidney function, vitamin D, and white blood
cell count are the most crucial factors and help to determine whether pneumonia is present
as well as to determine its type. Finally, medications for heart failure, nausea, arthritis, blood
pressure, diabetes, acidosis, and iron deficiency are among the most important to consider.
What was dispensed to a patient while infected with pneumonia points to the most significant symptoms and side effects associated with the infection.
Figure 13. Testing accuracy (one-fold) against the number of epochs for the Resnet-backbone multi-modal in the last experiments.
Table 10. Results of decisions at each level for each hierarchical classification schema using CXR images and tabular data.

Models            Decision #     Accuracy   Sensitivity   Precision   F1-Score
VGG-like          Decision #1    93.63      93.63         94.67       93.83
                  Decision #2    96.06      96.06         96.15       96.09
                  Decision #3    91.62      91.62         91.70       91.65
Resnet-like       Decision #1    95.90      95.90         96.25       95.98
                  Decision #2    93.29      93.29         93.89       93.41
                  Decision #3    93.38      93.38         93.44       93.41
VGG-backbone      Decision #1    97.66      97.66         97.68       97.67
                  Decision #2    96.68      96.68         96.80       96.71
                  Decision #3    91.40      91.40         91.49       91.45
Resnet-backbone   Decision #1    98.13      98.13         98.17       98.15
                  Decision #2    96.42      96.42         96.49       96.44
                  Decision #3    93.35      93.35         93.38       93.36

Table 11. Comparison of COVID-19 classification results for each hierarchical classification schema using CXR images and tabular data.
Table 12. Comparison of macro-avg results for each hierarchical classification schema using CXR
images and tabular data.
Figure 14 shows the training and testing loss chart; the training loss starts high and steadily decreases over the epochs, which means that the models learn from the training data and improve their performance. The testing loss also decreases over the epochs, and it is close to the training loss. In this case, the two curves are very close, suggesting that there is no observed overfitting for any of the multi-modals.
The 6 × 6 confusion-matrix plots for the four multi-modals are depicted in Figure 15. Instead of showing all eight possible classes, the confusion matrix only presents six. This is because some classes are grouped together. Imagine a hierarchy in which the classes (pneumonia and viral) are super classes of others (such as influenza). The confusion matrix focuses on the specific types (leaf nodes) because the overall pneumonia number is just the sum of its subtypes, and it presents the actual predicted cases. The horizontal axes correspond to the predicted classes, and the vertical axes correspond to the true classes, which represent the actual classifications. The diagonal cells in the confusion matrix represent the correct predictions (TP and TN). The off-diagonal cells represent incorrect predictions (FP and FN). From observing the number of false prediction cells, the Resnet-backbone model achieved a low overall misclassification rate of 4.03%. Bacterial pneumonia proved the most challenging with 74 falsely predicted cases, while the model perfectly classified all adenovirus cases. There were 5 falsely predicted cases for the normal class and 16 for COVID-19 classification, which is an acceptable outcome. There were 19 and 29 misclassification cases for influenza and RSV, respectively. Though viral pneumonia (including influenza, COVID-19, adenovirus, and likely RSV) saw 64 misclassifications, the overall pneumonia category naturally had a higher rate of 138 out of 1905 cases. Other models (VGG-backbone, Resnet-like, and VGG-like) exhibited slightly higher misclassification rates between 4.75% and 6.23%. On all metrics, the multi-modals seem to perform well; moreover, the misclassification rate was very low, especially with the Resnet-backbone multi-modal.

The macro-average ROC curve in Figure 16 demonstrates that the VGG-like multi-modal (AUC = 0.95) has the best overall performance across all classes, followed closely by the VGG-backbone and Resnet-backbone multi-modals (AUC = 0.93 and 0.92). Taking into consideration the other performance metrics, the Resnet-backbone multi-modal achieved superior performance in the classification process.
Figure 16. Macro-average ROC curve across all decisions for each multi-modal in the last experiments.
While many studies have explored various approaches to the classification and identification of COVID-19, to our knowledge, no approach has attempted to classify COVID-19 by employing a hierarchical classification architecture that combines CXR image features with tabular medical data within a single model. This unique approach differentiates our work from existing research on COVID-19 classification, which is particularly evident when comparing the results of our proposed approach with those of similar works in the literature, as shown in Table 13. While binary classification and CT scan-based studies can achieve high accuracy, we opted not to compare our work to them for two reasons. Firstly, simplifying the problem into a binary classification might not reflect the complexities of real-world scenarios with more granular classifications. Secondly, CT scans, while valuable for diagnosis, can be impractical due to the limitations mentioned previously in the literature review. Our focus here is on more applicable, similar approaches.
Table 13. Comparison to related studies applying similar approaches.

Model                    Hierarchal   Multi-Modal   Accuracy   F1-Score
Pereira et al. [15]      Yes          No            -          65%
Attaullah et al. [33]    No           Yes           77.88%     -
Cheng et al. [34]        No           Yes           73.2%      70.7%
Loey et al. [35]         No           No            80.56%     82.32%
Rajaraman et al. [36]    No           No            91.77%     91.41%
Proposed Model           Yes          Yes           95.97%     95.98%
Compared to previous research using a hierarchical structure for pneumonia classification [15], our proposed approach achieved better performance than the hierarchical model in that work. The overall performance of that model achieved a macro-average F1-score of 0.65, while the identification of COVID-19 cases specifically achieved an F1-score of 0.89 for this class. That study also faced limitations in the feature extraction phase. It relied on hand-crafted features, potentially missing more intricate patterns, and extracted features from a single modality. Additionally, the sample size was restricted, with only 1144 CXR images (1000 normal and a concerningly low 144 pneumonia cases, including COVID-19). This limited dataset might hinder the generalizability of the findings.

The multi-modal approach significantly outperforms previous work by Attaullah et al. [33], which classified COVID-19 using a public dataset with five classes using symptom data and CXR images and achieved an accuracy of 77.88%. Also, the research in [34] combined clinical data with the CXR image features fed into a neural network architecture, achieving 73.2% accuracy and a 70.7% F1-score. While previous approaches have their merits, our findings yield demonstrably superior outcomes.
However, our study also surpasses flat classification studies that relied on CXR images only. Previous work [35] addressed a limited dataset (307 images) by employing a two-stage approach (data augmentation followed by deep learning); its best result, obtained with GoogLeNet, reached only 80.6% accuracy for multiclass pneumonia classification.
In addition, the ensemble learning approach for COVID-19 detection in [36] categorized standardized CXR images into different classes; the developed deep learning system identified COVID-19 pneumonia with an accuracy of 91.77% and an F1-score of 91.41%. Our approach surpasses these results as well, demonstrating superior performance in accurately predicting COVID-19 pneumonia through a hierarchical classification architecture that combines CXR image features with tabular medical data within a unified model.
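Conceptually, this early-fusion design concatenates the CNN image embedding with features derived from the tabular record before a shared classifier. The following is a minimal PyTorch sketch under our own assumptions (a ResNet-50 backbone, a 16-feature tabular branch, and eight output classes); it illustrates the wiring only, not the exact architecture or hyperparameters evaluated in this study.

```python
# Minimal early-fusion sketch in PyTorch (illustrative only; layer sizes,
# the tabular feature count, and the classifier head are assumptions).
import torch
import torch.nn as nn
from torchvision import models

class EarlyFusionNet(nn.Module):
    def __init__(self, num_tabular_features: int = 16, num_classes: int = 8):
        super().__init__()
        # Pre-trained CNN backbone used purely as an image feature extractor.
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Identity()          # expose the 2048-d pooled features
        self.image_branch = backbone
        # Small dense branch for the tabular medical data.
        self.tabular_branch = nn.Sequential(
            nn.Linear(num_tabular_features, 64), nn.ReLU(),
        )
        # Fusion: concatenate both feature vectors, then classify.
        self.classifier = nn.Sequential(
            nn.Linear(2048 + 64, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, num_classes),
        )

    def forward(self, image: torch.Tensor, tabular: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.image_branch(image), self.tabular_branch(tabular)], dim=1)
        return self.classifier(fused)

# Example forward pass with dummy inputs.
model = EarlyFusionNet()
logits = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16))
print(logits.shape)  # torch.Size([2, 8])
```

Fusing at the feature level lets one classifier weigh radiological and clinical evidence jointly, which is the property the comparison above attributes to the multi-modal models.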
6. Conclusions
This paper proposes a novel approach for classifying COVID-19 and distinguishing it from other types of pneumonia and normal lungs using CXR images and tabular medical data in four different hierarchical architectures based on Resnet and VGG pre-trained models. This study used a private dataset obtained from King Khalid University Hospital and Rashid Hospital, containing a total of 4544 cases. It aimed to enhance the process of diagnosing COVID-19 and to demonstrate that combining CXR images with clinical data can achieve significant improvements in the hierarchical classification process. Overall, the performance metrics of all the hierarchical deep learning models improved after the medical data were combined with the CXR images; the Resnet backbone achieved the highest performance, with an accuracy of 95.97%, a precision of 96.01%, and an F1-score of 95.98%. The proposed approach showed promising results, especially the hierarchical deep learning multi-modal model. Our findings could aid in the development of better diagnostic tools for upcoming respiratory disease outbreaks. However, this study suffers from a data imbalance caused by the lack of available patient medical data for some classes, which complicates the evaluation of the model's performance. Generating a synthetic dataset makes the model more robust; however, it could also introduce biases or inaccuracies, potentially leading to unreliable results. We are broadly satisfied with the quality of the generated dataset, but there is still room to improve the synthetic data and thereby optimize the model's performance. In future work, we plan to explore more datasets from different sources, including different classes of pneumonia and lung diseases.
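To illustrate the hierarchical idea in code, the sketch below shows a simplified two-level inference scheme (normal vs. pneumonia first, then the pneumonia subtype). The two-node decomposition, the class names, and the reuse of the early-fusion network from the earlier sketch are our own simplifying assumptions and do not reproduce the four architectures evaluated in this study.

```python
# Minimal sketch of two-level hierarchical inference (illustrative only;
# the level-1/level-2 decomposition and class names are assumptions).
import torch
import torch.nn as nn

def hierarchical_predict(
    level1: nn.Module,          # node 1: normal vs. pneumonia
    level2: nn.Module,          # node 2: COVID-19 vs. other pneumonia
    image: torch.Tensor,        # a single case, batch size 1
    tabular: torch.Tensor,
) -> str:
    subtype_names = ["COVID-19 pneumonia", "other pneumonia"]
    with torch.no_grad():
        # Level 1: decide whether the case is pneumonia at all.
        is_pneumonia = level1(image, tabular).argmax(dim=1).item() == 1
        if not is_pneumonia:
            return "normal"
        # Level 2: only pneumonia cases reach the subtype classifier,
        # mirroring how the hierarchy narrows the decision at each node.
        subtype = level2(image, tabular).argmax(dim=1).item()
        return subtype_names[subtype]
```

With the early-fusion sketch above, a call such as hierarchical_predict(EarlyFusionNet(num_classes=2), EarlyFusionNet(num_classes=2), torch.randn(1, 3, 224, 224), torch.randn(1, 16)) would return one of the three labels, with each node of the hierarchy trained on its own label subset.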
Author Contributions: Conceptualization, S.A.A., A.M. and A.S.A.; methodology, S.A.A., S.A. and
A.S.A.; software, A.S.A. and M.A.; validation, S.A.A., S.A., T.N., B.M. and A.M.; formal analysis,
S.A.A., S.A. and A.S.A.; investigation, A.S.A.; resources, S.A.A., S.A., L.S., M.A. and A.S.A.; data
curation, A.S.A.; writing—original draft preparation, A.S.A.; writing—review and editing, S.A.A.,
S.A., T.N., B.M., L.S., M.A. and A.M.; visualization, S.A.A., S.A., T.N., A.M., B.M. and A.S.A.;
supervision, S.A.A. and A.M.; project administration, S.A.A., A.M. and A.S.A.; funding acquisition,
A.S.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: This study was conducted in accordance with the declaration
and guidelines of the Dubai Scientific Research Ethics Committee, DHA (DSREC-12/2021_01), and
the King Saud University Institutional Review Board Committee (E-251-5939).
Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author due to ethical reasons.
Acknowledgments: The authors would like to thank the Deanship of Scientific Research at King Saud University, Riyadh, Saudi Arabia, for funding and supporting this research through the DSR Graduate Students Research Support (GSR) initiative, and the Dubai Scientific Research Ethics Committee (DSREC), Dubai Health Authority, and Rashid Hospital for their support of this study. In addition, we give special thanks to the editor and reviewers for spending their valuable time reviewing and polishing this article.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Pal, M.; Berhanu, G.; Desalegn, C.; Kandi, V. Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2): An Update.
Cureus 2020, 12, e7423. [CrossRef]
2. COVID-19 Cases|WHO COVID-19 Dashboard. Datadot. Available online: https://data.who.int/dashboards/covid19/cases
(accessed on 20 January 2024).
3. Chowdhury, M.E.H.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M.A.; Bin Mahbub, Z.; Islam, K.R.; Khan, M.S.; Iqbal, A.; Al
Emadi, N.; et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access 2020, 8, 132665–132676. [CrossRef]
4. Maharjan, N.; Thapa, N.; Magar, B.P.; Maharjan, M.; Tu, J. COVID-19 Diagnosed by Real-Time Reverse Transcriptase-Polymerase
Chain Reaction in Nasopharyngeal Specimens of Suspected Cases in a Tertiary Care Center: A Descriptive Cross-sectional Study.
J. Nepal Med. Assoc. 2021, 59, 464–467. [CrossRef]
5. Swapnarekha, H.; Behera, H.S.; Nayak, J.; Naik, B. Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review.
Chaos Solitons Fractals 2020, 138, 109947. [CrossRef] [PubMed]
6. Yang, T.; Wang, Y.-C.; Shen, C.-F.; Cheng, C.-M. Point-of-Care RNA-Based Diagnostic Device for COVID-19. Diagnostics 2020, 10,
165. [CrossRef]
7. Helmy, Y.A.; Fawzy, M.; Elaswad, A.; Sobieh, A.; Kenney, S.P.; Shehata, A.A. The COVID-19 Pandemic: A Comprehensive Review
of Taxonomy, Genetics, Epidemiology, Diagnosis, Treatment, and Control. J. Clin. Med. 2020, 9, 1225. [CrossRef]
8. Candemir, S.; Antani, S. A review on lung boundary detection in chest X-rays. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 563–576.
[CrossRef] [PubMed]
9. Jacobi, A.; Chung, M.; Bernheim, A.; Eber, C. Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review. Clin.
Imaging 2020, 64, 35–42. [CrossRef] [PubMed]
10. ICD-10 Version:2019. Available online: https://icd.who.int/browse10/2019/en#/ (accessed on 25 January 2024).
11. Kiryu, S.; Yasaka, K.; Akai, H.; Nakata, Y.; Sugomori, Y.; Hara, S.; Seo, M.; Abe, O.; Ohtomo, K. Deep learning to differentiate
parkinsonian disorders separately using single midsagittal MR imaging: A proof of concept study. Eur. Radiol. 2019, 29, 6891–6899.
[CrossRef]
12. Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.;
et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest
radiographs and CT scans. Nat. Mach. Intell. 2021, 3, 199–217. [CrossRef]
13. Althenayan, A.S.; AlSalamah, S.A.; Aly, S.; Nouh, T.; Mirza, A.A. Detection and Classification of COVID-19 by Radiological
Imaging Modalities Using Deep Learning Techniques: A Literature Review. Appl. Sci. 2022, 12, 10535. [CrossRef]
14. Gao, J.; Li, P.; Chen, Z.; Zhang, J. A Survey on Deep Learning for Multimodal Data Fusion. Neural Comput. 2020, 32, 829–864.
[CrossRef]
15. Pereira, R.M.; Bertolini, D.; Teixeira, L.O.; Silla, C.N.; Costa, Y.M. COVID-19 identification in chest X-ray images on flat and
hierarchical classification scenarios. Comput. Methods Programs Biomed. 2020, 194, 105532. [CrossRef]
16. Regulatory Considerations on Artificial Intelligence for Health. Available online: https://www.who.int/publications-detail-
redirect/9789240078871 (accessed on 30 March 2024).
17. Ethics and Governance of Artificial Intelligence for Health: GUIDANCE on Large Multi-Modal Models. Available online:
https://www.who.int/publications-detail-redirect/9789240084759 (accessed on 30 March 2024).
18. Barnett, W.R.; Radhakrishnan, M.; Macko, J.; Hinch, B.T.; Altorok, N.; Assaly, R. Initial MEWS score to predict ICU admission or
transfer of hospitalized patients with COVID-19: A retrospective study. J. Infect. 2021, 82, 282–327. [CrossRef]
19. Gardner-Thorpe, J.; Love, N.; Wrightson, J.; Walsh, S.; Keeling, N. The Value of Modified Early Warning Score (MEWS) in Surgical In-Patients: A Prospective Observational Study. Ann. R. Coll. Surg. Engl. 2006, 88, 571–575. [CrossRef]
20. Menéndez, C.; Ordieres, J.; Ortega, F. Importance of information pre-processing in the improvement of neural network results.
Expert Syst. 1996, 13, 95–103. [CrossRef]
21. Liu, M.; Li, S.; Yuan, H.; Ong, M.E.H.; Ning, Y.; Xie, F.; Saffari, S.E.; Shang, Y.; Volovici, V.; Chakraborty, B.; et al. Handling missing
values in healthcare data: A systematic review of deep learning-based imputation techniques. Artif. Intell. Med. 2023, 142, 102587.
[CrossRef]
22. Varma, D.R. Managing DICOM images: Tips and tricks for the radiologist. Indian J. Radiol. Imaging 2012, 22, 4–13. [CrossRef]
23. Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Kashem, S.B.A.; Islam, M.T.; Al Maadeed, S.; Zughaier, S.M.;
Khan, M.S.; et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images.
Comput. Biol. Med. 2021, 132, 104319. [CrossRef]
24. Sharma, A.; Mishra, P.K. Image enhancement techniques on deep learning approaches for automated diagnosis of COVID-19
features using CXR images. Multimed. Tools Appl. 2022, 81, 42649–42690. [CrossRef]
25. Gordienko, Y.; Gang, P.; Hui, J.; Zeng, W.; Kochura, Y.; Alienin, O.; Rokovyi, O.; Stirenko, S. Deep Learning with Lung Segmentation
and Bone Shadow Exclusion Techniques for Chest X-Ray Analysis of Lung Cancer. In Advances in Computer Science for Engineering and
Education; Hu, Z., Petoukhov, S., Dychka, I., He, M., Eds.; Advances in Intelligent Systems and Computing; Springer International
Publishing: Cham, Switzerland, 2019; Volume 754, pp. 638–647.
26. Oğul, H.; Oğul, B.B.; Ağıldere, A.M.; Bayrak, T.; Sümer, E. Eliminating rib shadows in chest radiographic images providing
diagnostic assistance. Comput. Methods Programs Biomed. 2016, 127, 174–184. [CrossRef] [PubMed]
27. Bennin, K.E.; Keung, J.; Monden, A.; Kamei, Y.; Ubayashi, N. Investigating the Effects of Balanced Training and Testing Datasets
on Effort-Aware Fault Prediction Models. In Proceedings of the 2016 IEEE 40th Annual Computer Software and Applications
Conference (COMPSAC), Atlanta, GA, USA, 10–14 June 2016; pp. 154–163. [CrossRef]
28. Abedi, M.; Hempel, L.; Sadeghi, S.; Kirsten, T. GAN-Based Approaches for Generating Structured Data in the Medical Domain.
Appl. Sci. 2022, 12, 7075. [CrossRef]
29. Andrade-Girón, D.C.; Marín-Rodriguez, W.J.; Lioo-Jordán, F.d.M.; Villanueva-Cadenas, G.J.; Salinas, F.d.M.G.-T.d. Neural
Networks for the Diagnosis of COVID-19 in Chest X-ray Images: A Systematic Review and Meta-Analysis. EAI Endorsed Trans.
Pervasive Health Technol. 2023, 9. [CrossRef]
30. Saini, K.; Devi, R. A systematic scoping review of the analysis of COVID-19 disease using chest X-ray images with deep learning
models. J. Auton. Intell. 2023, 7. [CrossRef]
31. Silla, C.N.; Freitas, A.A. A survey of hierarchical classification across different application domains. Data Min. Knowl. Discov.
2011, 22, 31–72. [CrossRef]
32. Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag.
Process 2015, 5, 1–11. [CrossRef]
33. Attaullah, M.; Ali, M.; Almufareh, M.F.; Ahmad, M.; Hussain, L.; Jhanjhi, N.; Humayun, M. Initial Stage COVID-19 Detection
System Based on Patients’ Symptoms and Chest X-ray Images. Appl. Artif. Intell. 2022, 36, 2055398. [CrossRef]
34. Cheng, J.; Sollee, J.; Hsieh, C.; Yue, H.; Vandal, N.; Shanahan, J.; Choi, J.W.; Tran, T.M.L.; Halsey, K.; Iheanacho, F.; et al. COVID-19
mortality prediction in the intensive care unit with deep learning based on longitudinal chest X-rays and clinical data. Eur. Radiol.
2022, 32, 4446–4456. [CrossRef] [PubMed]
35. Loey, M.; Smarandache, F.; Khalifa, N.E.M. Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based
on GAN and Deep Transfer Learning. Symmetry 2020, 12, 651. [CrossRef]
36. Rajaraman, S.; Sornapudi, S.; Alderson, P.O.; Folio, L.R.; Antani, S.K. Analyzing inter-reader variability affecting deep ensemble
learning for COVID-19 detection in chest radiographs. PLoS ONE 2020, 15, e0242301. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.