
sensors

Article
COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal
Albatoul S. Althenayan 1,2, * , Shada A. AlSalamah 1,3,4 , Sherin Aly 5 , Thamer Nouh 6 , Bassam Mahboub 7 ,
Laila Salameh 8 , Metab Alkubeyyer 9 and Abdulrahman Mirza 1

1 Information Systems Department, College of Computer and Information Sciences, King Saud University,
Riyadh 11543, Saudi Arabia; [email protected] (S.A.A.); [email protected] (A.M.)
2 Information Systems Department, College of Computer and Information Sciences, Imam Mohammed Bin
Saud Islamic University, Riyadh 11432, Saudi Arabia
3 National Health Information Center, Saudi Health Council, Riyadh 13315, Saudi Arabia
4 Digital Health and Innovation Department, Science Division, World Health Organization,
1211 Geneva, Switzerland
5 Institute of Graduate Studies and Research, Alexandria University, Alexandria 21526, Egypt;
[email protected]
6 Trauma and Acute Care Surgery Unit, College of Medicine, King Saud University, Riyadh 12271,
Saudi Arabia; [email protected]
7 Clinical Sciences Department, College of Medicine, University of Sharjah, Sharjah P.O. Box 27272,
United Arab Emirates; [email protected]
8 Sharjah Institute for Medical Research, University of Sharjah, Sharjah P.O. Box 27272, United Arab Emirates;
[email protected]
9 Department of Radiology and Medical Imaging, King Khalid University Hospital, King Saud University,
Riyadh 12372, Saudi Arabia; [email protected]
* Correspondence: [email protected]; Tel.: +966-506061614

Abstract: Coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult due to limited time and availability of expert radiologists, as well as the limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, a sophisticated machine learning technique, leverages radiological imaging modalities for disease diagnosis and image classification tasks. Previous research on COVID-19 classification has encountered several limitations, including binary classification methods, single-feature modalities, small public datasets, and reliance on CT diagnostic processes. Additionally, studies have often utilized a flat structure, disregarding the hierarchical structure of pneumonia classification. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19, distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images and related tabular medical data, and demonstrating the value of incorporating tabular medical data in achieving more accurate diagnoses. Resnet-based and VGG-based pre-trained convolutional neural network (CNN) models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. We leveraged the hierarchical structure of pneumonia classification within our approach to achieve improved classification outcomes. Since an imbalanced dataset is common in this field, a variety of versions of generative adversarial networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private datasets of 4523 patients, achieved a macro-avg F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a Resnet-based structure. In conclusion, in this study we were able to create an accurate deep learning multi-modal to diagnose COVID-19 and differentiate it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.

Keywords: artificial intelligence; COVID-19; CXR; hierarchical; deep learning; multi-modal; diagnosis; image classification; multi-classes; pneumonia

Citation: Althenayan, A.S.; AlSalamah, S.A.; Aly, S.; Nouh, T.; Mahboub, B.; Salameh, L.; Alkubeyyer, M.; Mirza, A. COVID-19 Hierarchical Classification Using a Deep Learning Multi-Modal. Sensors 2024, 24, 2641. https://doi.org/10.3390/s24082641

Academic Editors: Dong Xiao and Yahui Li

Received: 5 March 2024; Revised: 15 April 2024; Accepted: 18 April 2024; Published: 20 April 2024

Copyright: © 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).



1. Introduction

Healthcare systems worldwide are seriously endangered by a type of viral pneumonia known as COVID-19. As the virus that causes COVID-19 has a high pathogenicity similar to SARS-CoV, it is known as severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) [1]. Respiratory insufficiency can arise from the virus's impact on the patient's respiratory system, specifically the lungs. As of 11 February 2024, the World Health Organization (WHO) has received reports of 774,631,444 confirmed cases of COVID-19, as well as 7,031,216 deaths from respiratory failure and injury to other major organs [2]. China reported its first instances of the virus to the WHO in December 2019. Hospitalization rates have increased globally due to the disease's rapid spread [3]. However, while the peak of the initial crisis may be over, the world is still dealing with the lasting effects of the COVID-19 pandemic. Continuous research in this field is important to improving the diagnosis of future respiratory disease pandemics.

RT-PCR testing is a standard, highly sensitive method for diagnosing COVID-19 [4]. The method has some limitations that make its use a challenge in critical cases [5]. In emergency situations, radiologists can assist physicians in diagnosing lung diseases by using imaging modalities such as chest computed tomography (CT scans) or CXR. RT-PCR limitations can be overcome with accurate analysis of radiological imaging. These procedures are challenging due to time constraints and the large number of patients. Furthermore, erroneous diagnoses can result from radiologists with different levels of expertise [6].
Chest radiography is a significant diagnostic tool [7], particularly considering the accuracy with which CT scans can identify and detect pneumonia; however, in this context, CXR may be more appropriate. This is due to several reasons: CXR is inexpensive, simple, portable, available, and the most widely used tool for diagnosing the lungs [8]; it has also been recommended by the American College of Radiology (ACR) to use CXR imaging as a certified modality to reduce the possibility of disease transmission [9].
It is important to note that all kinds of pneumonia that infect the lungs are classified in a hierarchical structure by the International Statistical Classification of Diseases (ICD). COVID-19, scientifically referred to as SARSr-CoV-2, was added to the hierarchical viral family in the ICD 10th revision (ICD-10) [10]. Figure 1 presents the types of pneumonia for which datasets were obtained, organized in a hierarchical structure. There are eight classes in total (with a normal class), six of which are leaf nodes.

Figure 1. Proposed hierarchal class structure of pneumonia. The tree has Pneumonia at the root, Viral and Bacterial at the second level, and SARSr-CoV-2, Influenza, respiratory syncytial virus (RSV), and Adenoviruses under the Viral branch.

As illustrated previously, the hierarchical structure of COVID-19 indicates that this is a hierarchical classification problem. The classes in high-level nodes at hierarchical levels are known as coarse-grained nodes because they have unique features that will be transmitted to their child nodes along with all the features from their parent node. Furthermore, the final level of nodes in the structure, the leaf node, is referred to as fine-grained since it lacks descendants and inherits all of its parent's features.
Applying high-accuracy artificial intelligence (AI) models to diagnose medical imaging
problems is a current trend in healthcare. Convolutional neural networks are able to detect
and learn the significant details that radiologists find difficult to recognize with their naked
eyes [11]. They produce promising results for learning complex problems in radiology [12].
Many of the previously reviewed studies [13] have employed deep learning models to
diagnose and detect COVID-19 pneumonia utilizing medical imaging in a theoretical
manner that cannot be implemented clinically. CT scans have primarily been considered as
the main radiological imaging modality for all infected cases during the ongoing pandemic.
In addition, most studies on COVID-19 image classification are misleading due to the use of
binary classification and disproportionately large samples for the COVID-19 class. Despite this, the ability
of AI systems to differentiate between various classes is increasing as they learn from a
greater number of classes. Most approaches use a flat structure, while pneumonia naturally
falls into a hierarchical structure. Despite several techniques used for COVID-19 detection
and classification, limited research has addressed multi-modal deep learning models
for heterogeneous data types. Most existing models focus on a single feature modality,
while multi-modal features combine multiple aspects of COVID-19 health information,
contributing to superior disease diagnostic processes.
In several fields, especially in diagnosis by medical assistants, deep learning ap-
proaches have accomplished significant advances in multi-modal structures by learning
features from different sources of data [14]. This clearly explains the effectiveness of adding
various medical data in addition to CXR images for the diagnostic process.
One variant of the conventional flat classification problem is hierarchical classification
(HC). In a flat classification approach, cases are categorized into classes without following
any predefined structure.
Proving that hierarchical classification is more effective than flat classification in
this domain is not the purpose of this work, as this has already been addressed in the
literature [15]. In this work, we investigated how clinical data affect COVID-19 classification
utilizing CXR images with a hierarchical classification framework to detect different types
of pneumonia caused by multiple pathogens and differentiate them from normal lungs. To
achieve this, we collected a private, imbalanced dataset in which some types of pneumonia
are much more common than others; therefore, we applied variants of the GAN model
to balance the class distribution. We first applied multi-modal hierarchical classification
utilizing a deep learning approach for two predefined models in a hierarchical structure
using a hybrid approach to the CXR images; then, the medical tabular data were added
using early fusion. It is important to note that the newly released WHO normative guidance
for applying artificial intelligence in health recognizes the risks, and we are in compliance
with their recommendations for safe and effective implementation [16,17].
This paper is organized as follows: Section 2 covers the related works in the literature.
The proposed methodology and details of the dataset used in this paper and its analysis, as
well as the techniques used to preprocess either the CXR images or the tabular dataset, are
discussed in Section 3. After that, Section 4 details the proposed architecture of hierarchical
multi-modals and the training procedure. The experimental setup and obtained results and
a discussion are summarized in Section 5. Finally, the conclusions of the current work and
some possibilities for future works are described in Section 6.

2. Related Work
As a result of the COVID-19 pandemic, COVID-19 medical image classification has
recently attracted a lot of scientific interest. Researchers from a variety of disciplines
have developed deep learning detection and classification models to diagnose COVID-19
reliably and quickly by analyzing radiological images. We published a review paper called
“Detection and Classification of COVID-19 by Radiological Imaging Modalities Using
Deep Learning Techniques: A Literature Review” [13], which attempts to explore all related
remarkable works in the literature and study and analyze them to explain how most current
key approaches to the COVID-19 classification challenge have gaps and untapped potential.
In addition, in the aforementioned paper, we provided some recommendations addressing
various aspects that may help researchers in this field.

3. Proposed Methods and Materials

The proposed approach that is applied in this study consists of five phases, as demonstrated in Figure 2. These phases are as follows: collecting the required dataset (CXR images and patient medical data in tabular format), preparing and preprocessing the collected dataset, generating a synthetic dataset to balance the data, feeding the preprocessed dataset into a hierarchal multi-modal network, and evaluating the classification output. The details of each phase are described in the following sections.

Figure 2. Proposed framework for multi-modal classification of CXR images and tabular data.

3.1. Dataset Description

Our study used two anonymized private datasets: (I) The first dataset was collected from the database at King Khalid University Hospital, Riyadh, KSA, under Institutional Review Board (IRB) approval (E-21-5939) in collaboration with Dr. Thamer Nouh and Dr. Metab Alkubeyyer. It contains 3326 patients of normal, bacterial, SARS-CoV-2, influenza, respiratory syncytial virus (RSV), and adenovirus cases, as shown in Table 1. (II) The second dataset was obtained from Rashid Hospital, Dubai, UAE, reviewed and approved by the Dubai Scientific Research Ethical Committee (DSREC), Dubai Health Authority (DHA); IRB approval was acquired (DSREC-12/2021_01) in collaboration with Dr. Bassam Mahboub and Dr. Laila Salameh, containing 1218 SARS-CoV-2 patients, as shown in Table 2. Efforts were made to obtain data from different sources to improve the model's generalization capabilities.

To obtain the datasets from the hospital databases, a physician gathered the required viral patients by searching for the desired test name and range of years. While there is no test for bacterial infection, the patients whose diagnosis contains the word "bacterial pneumonia" were selected as cases for the bacterial class. It is important to note that the data in the normal category were collected from patients who were scheduled for surgery, to ensure that the patients' lungs were healthy. The CXR images from the first dataset were produced using an Optima XR240amx (General Electric Healthcare, Chicago, United States). All CXR images in datasets (I) and (II) are posterior–anterior (PA) or anterior–posterior (AP) views, and both views were included. CXR images of lateral views, those where the date gap between the diagnosis and obtaining the CXR images was more than 48 h, and/or those that did not have tabular data were excluded. Some samples of the dataset for different classes are shown in Figure 3. The process of obtaining IRB approval from hospitals and collecting, cleaning, and organizing the data took approximately a full year. The COVID-19 cases were collected for the patients that visited from the year 2019 to the year 2022, while the remaining cases were collected for patients from the year 2014 to the year 2023.

Table 1. Patient characteristics for dataset (I).

Category | Value
Number of patients | 3306
Gender: Male | 1584
Gender: Female | 1722
Diagnosis: SARSr-CoV-2 | 630
Diagnosis: Normal | 1272
Diagnosis: Bacterial | 248
Diagnosis: Influenza | 1120
Diagnosis: RSV | 21
Diagnosis: Adenovirus | 15
Age: Range | 0–103

Table 2. Patient characteristics for dataset (II).

Category | Value
Number of patients | 1217
Gender: Male | 1070
Gender: Female | 147
Diagnosis: SARSr-CoV-2 | 1217
Age: Range | 19–87

Figure 3. CXR samples of the dataset. Panels: (A) Normal; (B) Bacterial; (C) Adenovirus; (D) SARSr-CoV-2; (E) Influenza; (F) RSV.



The dataset consists of CXR images with the corresponding medical tabular data for each
patient. It includes demographic, vital signs, clinical, and medication data; the attributes in the
data record are described in Table 3. The tabular dataset includes 644 features with categorical
and numerical data types. Some of the features have been removed since they were not
significant in diagnosing pneumonia. In addition, there are a total of 60 different nationalities
represented in the patient sample for the 4523 patients who make up the entire dataset.

Table 3. Description of attributes of the tabular data.

Feature Type | Feature Name | Description | Data Type
Demographic | Age | Patient's age at the time of diagnosis. | Numerical
Demographic | Gender | Patient's gender. | Categorical
Demographic | Nationality | Nation of origin of the patient. | Categorical
Vital signs | BMI | Patient's body mass index. | Numerical
Vital signs | MEWS Score | A calculation performed on a patient after checking their vital signs and AVPU score. | Categorical
Lab test | WBC | Test that measures the number of white blood cells. | Numerical
Lab test | tHb | Test that measures the total hemoglobin. | Numerical
Lab test | Plt | Test that measures the blood platelets. | Numerical
Lab test | Lymph Auto # | Test that measures the percentage of lymphocytes in the blood. | Numerical
Lab test | Sodium Lvl | Test of the sodium level in the blood. | Numerical
Lab test | Potassium Lvl | Test of the potassium level in the blood. | Numerical
Lab test | BUN | Test of the blood urea nitrogen. | Numerical
Lab test | Creatinine | Test that measures the level of creatinine in the blood. | Numerical
Lab test | Alk Phos | Test that measures the level of alkaline phosphatase in the blood. | Numerical
Lab test | AST | Test that measures the levels of the aspartate aminotransferase enzyme in the blood. | Numerical
Lab test | Albumin Lvl | Test that checks the amount of albumin in the blood. | Numerical
Lab test | ALT | Test that measures the amount of alanine transaminase in the blood. | Numerical
Lab test | Bili Total | Test that measures the levels of bilirubin in the blood. | Numerical
Lab test | INR | The international normalized ratio is a blood test that measures how long it takes for blood to clot. | Numerical
Lab test | LDH | Test that measures the level of lactate dehydrogenase in the blood. | Numerical
Lab test | Procalcitonin | Test that measures the level of procalcitonin in the blood. | Numerical
Lab test | CRP | Test to check the C-reactive protein level in the blood. | Numerical
Lab test | Ferritin Lvl | Test that measures the amount of ferritin in the blood. | Numerical
Lab test | Hgb A1c | Test that measures the percentage of hemoglobin proteins in the blood that are coated with sugar. | Numerical
Lab test | D-Dimer | Test that measures D-dimer, a protein fragment that the body makes when a blood clot dissolves. | Numerical
Lab test | BNP | A B-type natriuretic peptide test measures the levels of a certain type of hormone in the blood. | Numerical
Lab test | Total CK | Test that measures the amount of creatine kinase in the blood. | Numerical
Lab test | Vitamin D 25 OH | Test that measures the level of active vitamin D in the blood. | Numerical
Medication | 614 Medications | Total number of medications that were prescribed to the patient when visiting the hospital. | Categorical

The MEWS (modified early warning score) feature is a clinical tool used in healthcare
settings to assess a patient’s vital signs. More hospitals are currently using it to help track
changes between each set of vitals [18]. The MEWS score typically consists of several physio-
logical parameters including blood pressure, body temperature, pulse rate, respiratory rate,
and the AVPU (A = Awake, V = Verbal, P = Pain, U = Unresponsive) score, which is used to
determine a patient’s level of consciousness. A score is given for each parameter based on
specified standards. The overall MEWS score is then determined by summing the scores
for each parameter [19]. According to their MEWS score, patients may be classified into
risk categories using the MEWS scoring system: Normal: 0–1 score, Low Risk: 2–3 score,
Moderate Risk: 4–6, High Risk: 7–8, Critical: >8. The concern for clinical deterioration
increases as the MEWS score rises.
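
To make the stratification concrete, here is a minimal Python sketch of the banding described above; the function name and structure are our illustration, not part of the study's software.

```python
# A minimal sketch of the MEWS risk stratification described in the text.
# The thresholds follow the quoted ranges; the function name is hypothetical.

def mews_risk_category(total_score: int) -> str:
    """Map a summed MEWS total to the risk bands used in the text."""
    if total_score <= 1:
        return "Normal"         # 0-1
    if total_score <= 3:
        return "Low Risk"       # 2-3
    if total_score <= 6:
        return "Moderate Risk"  # 4-6
    if total_score <= 8:
        return "High Risk"      # 7-8
    return "Critical"           # >8

print(mews_risk_category(5))  # -> "Moderate Risk"
```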
With the assistance of a knowledgeable pharmacist, medication prescriptions were
also limited to the most fundamental categories without doses, which helped to decrease
the enormous number of medications from 3500 to 614.

3.2. Data Analysis


Statistical analysis was conducted with Python libraries related to statistical testing. The t-test, Kruskal–Wallis test, and chi-squared test, accessed from the "scipy.stats" module, were applied for continuous, categorical, and binary categorical values, respectively. The reported significance levels were two-sided, and the statistical significance level was set to 0.05. Categorical features were expressed as frequency (%), and continuous features as mean (µ) and standard deviation (σ). Table 4 presents a comparison of the features between the two patient groups for the raw dataset, the first with COVID-19 and the second for all remaining classes.
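
As a hedged illustration of these tests, the sketch below applies the same scipy.stats calls to a hypothetical patient table; the file and column names ("patients.csv", "covid", "Age", "Gender", "MEWS") are placeholders, not the study's schema.

```python
# A minimal sketch of the group comparisons described above using scipy.stats.
import pandas as pd
from scipy import stats

df = pd.read_csv("patients.csv")          # hypothetical file
covid = df[df["covid"] == 1]
other = df[df["covid"] == 0]

# Continuous feature: two-sided Student's t-test
t, p_age = stats.ttest_ind(covid["Age"].dropna(), other["Age"].dropna())

# Binary categorical feature: chi-squared test on a contingency table
table = pd.crosstab(df["covid"], df["Gender"])
chi2, p_gender, dof, expected = stats.chi2_contingency(table)

# Ordinal/categorical feature: Kruskal-Wallis H test
h, p_mews = stats.kruskal(covid["MEWS"].dropna(), other["MEWS"].dropna())

print(p_age < 0.05, p_gender < 0.05, p_mews < 0.05)
```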

Table 4. Patient’s medical information characteristics (*: data with statistical significance; a: chi-square
test; b: Student’s t-test; c: Kruskal–Wallis H test).

COVID-19 Non-COVID-19 Overall


Characteristics p Value
(n = 1847) (n = 2677) (n = 4524)
Age (years) 48.9 ± 16.3 37.4 ± 20.95 42.65 ± 19.81 <0.001 *b
Gender
Male 1402, 75.9% 1252, 46.8% 2655, 58.7% <0.001 *a
Female 445, 24.1% 1424, 53.2% 1869, 41.3%
Vital signs
BMI 28.53 ± 10.86 39.03 ± 20.3 33.89 ± 17.19 <0.001 *b
MEWS score (normal) 581, 31.5% 2057, 76.8% 2638, 58.3% <0.001 *c
MEWS score (low-risk) 603, 32.6% 365, 13.6% 968, 21.4%
MEWS score (moderate-risk) 491, 26.6% 185, 6.9% 676, 14.9%
MEWS score (high-risk) 106, 5.7% 59, 2.2% 165, 3.6%
MEWS score (critical) 27, 1.5% 11, 0.4% 38, 0.8%
Lab test
WBC 9.69 ± 5.83 8.8 ± 4.46 8.98 ± 4.78 <0.001 *b
tHb 12.97 ± 2.18 13.03 ± 2.55 12.99 ± 2.32 0.729 b
Plt 232.81 ± 99.41 nan ± nan 232.81 ± 99.41 <0.001 *b
Lymph Auto # 1.3 ± 2.25 2.13 ± 1.44 1.92 ± 1.72 <0.001 *b
Sodium Lvl 135.77 ± 6.11 138.02 ± 3.65 136.99 ± 5.06 <0.001 *b
Potassium Lvl 4.14 ± 0.59 4.15 ± 0.49 4.15 ± 0.54 0.606 b
BUN 21.81 ± 20.96 5.13 ± 4.32 12.77 ± 16.75 <0.001 *b
Sensors 2024, 24, 2641 8 of 30

Table 4. Cont.

COVID-19 Non-COVID-19 Overall


Characteristics p Value
(n = 1847) (n = 2677) (n = 4524)
Creatinine 36.81 ± 84.86 78.3 ± 104.39 59.27 ± 98.12 <0.001 *b
Alk Phos 99.38 ± 68.16 103.83 ± 79.9 102.72 ± 77.15 0.187 b
AST 71.5 ± 388.91 42.73 ± 222.26 54.68 ± 303.08 0.021 b
Albumin Lvl 27.04 ± 8.74 34.54 ± 5.2 32.68 ± 7.05 <0.001 *b
ALT 77.44 ± 114.05 44.16 ± 100.25 45.66 ± 101.99 <0.001 *b
Bili Total 11.18 ± 18.35 10.57 ± 19.74 10.72 ± 19.42 0.477 b
INR 1.08 ± 0.28 1.08 ± 0.26 1.08 ± 0.27 0.680 b
LDH 376.81 ± 331.01 363.38 ± 298.75 375.34 ± 327.59 0.553 b
Procalcitonin 1.67 ± 12.4 3.54 ± 12.36 1.91 ± 12.4 0.036 b
CRP 82.72 ± 84.94 68.91 ± 84.35 80.59 ± 84.97 0.009 b
Ferritin Lvl 1007.16 ± 2892.55 709.05 ± 6046.88 960.46 ± 3574.55 0.419 b
Hgb A1c 8.19 ± 5.34 7.47 ± 2.16 7.96 ± 4.58 0.001 b
D-Dimer 1.61 ± 2.51 2.37 ± 3.58 1.66 ± 2.61 0.030 b
BNP 2872.61 ± 14459.7 2505.09 ± 4094.67 2757.2 ± 12191.43 0.569 b
Total CK 126.63 ± 610.22 282.06 ± 505.17 170.01 ± 586.74 <0.001 *b
Vitamin D 25 OH 43.57 ± 33.94 42.62 ± 32.02 42.7 ± 32.16 0.818 b
Medication
614 Medications (No) 1,231,221, 98.5% 1,623,034, 98.6% 3,269,812, 98.7% 1.000 a
614 Medications (Yes) 18,623, 1.5% 23,306, 1.4% 41,976, 1.3%

As we can see, there is a significant difference in age, gender, and BMI between the
two groups (all p < 0.001). The MEWS score shows a significant difference (p < 0.001),
and an abnormal MEWS score was more often observed in COVID-19 patients. Signif-
icant differences were not found in some lab tests between the two groups, including
tHb (p = 0.729), Potassium Lvl (p = 0.606), Alk Phos (p = 0.187), Bili Total (p = 0.477), INR
(p = 0.680), LDH (p = 0.553), Ferritin Lvl (p = 0.419), BNP (p = 0.569), and Vitamin D 25 OH
(p = 0.818). WBC, Plt, Lymph Auto #, Sodium Lvl, BUN, Creatinine, Albumin Lvl, ALT, and
Total CK were significantly different between the two groups (p < 0.001). In addition, AST,
Procalcitonin, CRP, Hgb A1c, and D-Dimer also showed significant differences (p = 0.021,
p = 0.036, p = 0.009, p = 0.001, and p = 0.030, respectively). Although medications were
observed in COVID-19 patients, they were not statistically different compared with those
in the non-COVID-19 group.
Saudi citizens make up almost 59.5% of patients across all classes. The distribution of the data
was also shown using a number of visualizations. Figure 4 shows the distribution of some
continuous and categorical features; many medical features were observed. Chart (A) in
Figure 4, showing the age distribution across all datasets, reveals that the majority of
patients fall within the 20–60-year age range, followed by those aged between 60 and 80,
while the smallest sample size belongs to patients over 80 years old. The distribution of
patient gender is presented in chart (B), where we find that the total females (represented
by 1) are 1869 and the total males (represented by 0) are 2654 patients. Approximately 57%
of the total patients had a normal MEWS score, while 1.2% from all classes were critical
cases with an abnormal MEWS score, as shown in chart (C). Charts (D), (E), and (F) show
that the majority of patients typically have a normal white blood cell (WBC) count, platelet
(PLT) count, and C-reactive protein (CRP) result, respectively. Vitamin D 25 OH in chart
(H) shows a right-skewed distribution, and the lymphocyte percentage (Lymph Auto #) in
chart (G) shows an almost normal distribution.
Figure 4. Exploratory analysis of the numeric features. Panels: (A) Age; (B) Gender; (C) MEWS Score; (D) WBC; (E) Plt; (F) CRP; (G) Lymph Auto #; (H) Vitamin D 25 OH.

The Pearson correlation coefficient was used to obtain the relationships between the
continuous features. Cramér’s V was used to measure the association between the cate-
gorical features. Figure 5 shows the correlation of the continuous features; the heat map
shows the correlation between twenty-four continuous features. A correlation value of 0.75 was recorded between creatinine and total CK. Furthermore, a correlation of 0.72 was observed between AST and ALT. Table 5 indicates that there is no association
between age and nationality, while there is a weak association between age, nationality, and
MEWS score.
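
The following sketch shows how these two measures can be computed; the cramers_v helper is our own illustration derived from the chi-squared statistic, not code from the paper, and the file and column names are placeholders.

```python
# A hedged sketch of the association measures named above: Pearson's r
# for continuous features and Cramér's V for categorical pairs.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V from the chi-squared statistic of a contingency table."""
    table = pd.crosstab(x, y)
    chi2 = chi2_contingency(table)[0]
    n = table.to_numpy().sum()
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / (min(r, k) - 1)))

df = pd.read_csv("patients.csv")                  # hypothetical file
pearson = df[["Creatinine", "Total CK"]].corr()   # Pearson by default
v = cramers_v(df["NATIONALITY"], df["MEWS Score"])
print(pearson, v)
```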
Figure 5. Correlation of the continuous features in the dataset.
Table 5. Correlation of the categorical features in the dataset.

Features | Correlation
(AGE, NATIONALITY) | 0.000000
(AGE, MEWS Score) | 0.196708
(NATIONALITY, AGE) | 0.000000
(NATIONALITY, MEWS Score) | 0.187111
(MEWS Score, AGE) | 0.196708
(MEWS Score, NATIONALITY) | 0.187111
3.3. Data Preprocessing

Efficient preprocessing of data can have a major effect on the reliability and quality of deep learning model results. It assists in guaranteeing that the data are accurate, in the right format, free of errors, and in line with the objectives of the modeling tasks [20].

To preserve data privacy, we anonymized the identity of the patients in the CXR images and the tabular data because it is not included in the analysis. In the following subsections, we discuss each preprocessing step for the tabular data and the CXR images in detail and illustrate its main methods.

3.3.1. Tabular Data


The raw dataset contains tabular data that were obtained in their original, unprocessed
form, and we cleaned and organized the data in a unified tabular structure in line with the
Dubai COVID-19 data format.
Before addressing the missing values in the tabular data, the data type was checked
to make sure that the data in the dataset were correctly formatted and that the data types
were consistent. Since the presence of string values in the data is minimal, the appropriate
action would be to convert these string values to null and then proceed as missing values.
As is well known, managing the missing values in medical data is not straightfor-
ward [21]. This is due to the nature and sensitivity of these data, where replacing the missing
values with a value of 0 has different meanings in this field. For this reason, we predict the
values for the features that have less than 75% of their values missing for each file. Due to the
importance of all features and an inability to remove any of the existing ones, we combined
all files and repeated the process for the features in which the percentage of missing values
was higher than 75%.
Extreme Gradient Boosting (XGBoost) and Random Forest are machine learning models
used to impute continuous and categorical missing values. For each class, the data are split
into two parts: a training dataset (non-missing values) and a test dataset (missing values to
be imputed). The XGBoost Regressor and Random Forest Regressor are used for continuous
features, while categorical features are imputed using the XGBoost Classifier and Random
Forest Classifier. Thereafter, box and bar plots are utilized to visualize the distribution of
continuous and categorical values, respectively, and spot the outliers. The interquartile
range (IQR) method is used to identify the outliers; values outside the range of the lower
bound [q1 − 1.5 × iqr] and upper bound [q3 + 1.5 × iqr] are considered outliers. Considering
the importance of each individual patient in the dataset, no patients were removed. The
outlier values are replaced with “NaN” and then imputed using the XGBoost Regressor
and Random Forest Regressor for continuous features. Subsequently, the features where the
percentage of missing data exceeds 75% are returned and imputed; each column in this list is
considered a target for imputation. The same approach is used to create an initial predictive
model using the target column and common non-null features from the trainable datasets.
Following each training cycle, a decision tree structure visualization of the model is created
and saved. This visualization aids in understanding the model’s decision pathways. Once
these predictions are made, the missing values in the original datasets are updated.
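
A minimal sketch of this impute-then-mask-outliers loop, assuming a pandas DataFrame per class and using XGBoost for the continuous case, is shown below; the helper names and column handling are simplified for illustration and are not the study's code.

```python
# A hedged sketch of the imputation scheme described above: train a
# regressor on rows where the target feature is present, then predict
# the missing entries. The IQR bounds follow those quoted in the text.
import pandas as pd
from xgboost import XGBRegressor

def impute_continuous(df: pd.DataFrame, target: str, features: list[str]) -> pd.DataFrame:
    known = df[df[target].notna()]      # training split (non-missing values)
    missing = df[df[target].isna()]     # test split (values to be imputed)
    model = XGBRegressor(n_estimators=200)
    model.fit(known[features], known[target])
    df.loc[missing.index, target] = model.predict(missing[features])
    return df

def mask_outliers_iqr(s: pd.Series) -> pd.Series:
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Replace outliers with NaN so they can be re-imputed, as in the text
    return s.mask((s < lower) | (s > upper))
```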
Then, a log transformation is applied to the continuous columns in the dataset to make the data more normally distributed, and the numerical values are standardized to a mean of 0 and a standard deviation of 1, ensuring that the data are appropriate for subsequent modeling steps. Principal component analysis (PCA) is used to extract features from the high-dimensional tabular data. Maximum likelihood estimation (MLE) is used to reduce the number of principal components from 644 to 218, significantly decreasing the complexity of the data while retaining the essential variance to focus on the most informative aspects of the data for modeling. The most important demographic and health-related data, in order of importance, are the following: age, BMI, MEWS score, nationality, and then gender. The most important lab tests, in order, are INR, Total CK, D-Dimer, CRP, LDH, Albumin Lvl, BUN, Vitamin D 25 OH, and WBC. The following medications were among the most important: Spironolactone, Granisetron, Colchicine, Sulfasalazine, Nifedipine, Gliclazide, Sodium Bicarbonate, Ferrous Sulfate, Metformin, and Pioglitazone, respectively.
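
A hedged sketch of this transform-standardize-reduce chain with scikit-learn follows; the random input is merely a stand-in for the 644-feature table, and n_components="mle" invokes the MLE estimate of dimensionality mentioned above.

```python
# A minimal sketch of log transform + standardization + PCA with MLE.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X = np.random.rand(1000, 644)                  # stand-in for the 644 features

X_log = np.log1p(X)                            # log transformation
X_std = StandardScaler().fit_transform(X_log)  # mean 0, standard deviation 1

pca = PCA(n_components="mle", svd_solver="full")  # MLE picks the dimensionality
X_reduced = pca.fit_transform(X_std)
print(X_reduced.shape)  # on the real data the text reports 644 -> 218 components
```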

3.3.2. Images
All the CXR images were downloaded from picture archiving and communication
systems (PACS). The radiology consultant was provided with files containing patient file
numbers according to each class; the radiology consultant fetched all the CXR images of the
Sensors 2024, 24, 2641 12 of 30

specific range of years (e.g., COVID-19 from 2020 to 2022) that related to the listed patients.
Each image was selected and labeled by the represented class based on matching the date
when the patient visited the hospital and was diagnosed with the disease with the date that
the CXR image was taken.
The CXR images were obtained in the DICOM (Digital Imaging and Communications
in Medicine) extension. The MicroDicom DICOM viewer version (DM_PLATFORM_XRAY_
GANAPATI_4.10.2_2020_FW41.1_158) was used to convert the images to the appropriate
format of a JPG file [22].
To help the deep learning model focus on the chest area (especially the lungs), the
images were manually cropped to guarantee that no tiny part of the lungs was removed in
any way, cutting out a chest region and removing any other parts of the body that appeared
in the image. Applying image enhancement is also important to improve the classification
result. The ability of the gamma-correction-based technique to detect COVID-19 from CXR
images outperforms other methods [23]. Trying different thresholds, the gamma threshold
value (0.9) was chosen with the contrast enhancement threshold value (1.5) to enhance
the contrast of the CXR images. Combining them enables a more thorough adjustment
of the appearance and tonal range of the image. If P is the pixel value within the range
[0,255], then x is the pixel’s grayscale value (x ∈ P). The output pixel vector of the gamma
correction function g(x) is calculated with Equation (1).
g(x) = 255 · (x/255)^(1/γ(x))    (1)
In addition, we apply image denoising using the total variation filter (TVF) method to
remove the noise from the images. Based on the literature, combining contrast enhancement
(gamma correction) and image denoising (TVF) approaches produces outstanding results
on COVID-19 images [24]. Moreover, transformations are used to preprocess the images
before the training phase. Due to dealing with CXR images from different resources and
sizes, standardization was applied by resizing the images. The images were resized to
128 × 128 pixels, which gave a better result through the experiments. The images were
converted to grayscale and the image pixel values were normalized to a range by dividing
by 255 (the maximum pixel value for 8-bit images) and then converting to a tensor for
integration with the TensorFlow framework.
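
The pipeline can be sketched as follows, assuming OpenCV and scikit-image; the gamma and contrast thresholds mirror the values quoted above, while the composition itself is our illustration rather than the authors' released code.

```python
# A hedged sketch of the image preprocessing described above: contrast
# enhancement (factor 1.5), gamma correction (gamma = 0.9, Eq. (1)),
# total variation denoising, resizing to 128 x 128, and scaling to [0, 1].
import numpy as np
import cv2
from skimage.restoration import denoise_tv_chambolle

GAMMA, CONTRAST = 0.9, 1.5

def preprocess_cxr(path: str) -> np.ndarray:
    img = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)
    img = np.clip(img * CONTRAST, 0, 255)                 # contrast enhancement
    img = 255.0 * (img / 255.0) ** (1.0 / GAMMA)          # gamma correction, Eq. (1)
    img = denoise_tv_chambolle(img / 255.0, weight=0.1)   # TV filter denoising
    img = cv2.resize(img.astype(np.float32), (128, 128))  # standardize size
    return img[..., np.newaxis]                           # H x W x 1 tensor in [0, 1]
```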

3.3.3. Eliminating Rib Shadows in CXR Images


A significant challenge in the study of chest radiographs is the invisibility of anomalies
due to the superimposition of normal anatomical components, such as ribs, over the primary
tissue under examination [25]. Therefore, it would be helpful to eliminate the ribs without
losing any information about the original tissue when trying to increase the nodule visibility
and identify nodules on a chest radiograph. For that reason, we tried to apply a method [26]
for removing the rib shadows from the infected lungs to improve nodule detection and
enhance the diagnostic process. A hybrid self-template approach was used wherein the
algorithm tried to first identify the ribs. An unsupervised regression model was then used
to suppress the identified ribs. We attempted to adapt the paper’s approach to our private
dataset in order to achieve the same good results for rib elimination from their CXR images.
After preprocessing the images, the lung area was defined using a Gaussian filter to
extract the mask from each image. Rib detection was performed using a bilateral filter, and
the result was then converted to grayscale using Extreme Level Eliminating Histogram
Equalization (ELEHE) to improve the visibility of features in images. After that, Sobel edge
detection was applied to the equalized image to define the edge thickness. Then, dilation
was applied to merge nearby bright regions and increase their size; an opening operation
was also performed on the dilated images. Erosion was applied to the opened images to
shrink the bright regions and refine the image by reducing the size of the remaining bright
regions after the opening operation. To prepare the images for parabola fitting, connected
component analysis was performed on the images to define the connected components.
The first (background) and last (foreground) components were excluded as they were not useful; then, fine-tuning to the best number of connected components was performed. Thereafter, parabola fitting was calculated using Equation (2):

f(x) = ax^2 + bx + c    (2)

The most fitting connected components were considered, and all the curves were plotted on the images using the polylines function. Finally, the rib region was acquired, shadow estimation was clearly defined, and suppression was performed by removing the shadows from an image, achieved by adjusting the pixel values in the shadow regions based on the average BGR color values.

Applying the previous approach to our private CXR images, as shown in Figure 6, delivers unacceptable results on most images. It removes a lot of nodules from the lung area, which affects the prediction results. It is important to note that the effectiveness of shadow suppression may depend on the characteristics of the images, which could be attributed to a lack of clarity of vision and sometimes of the boundary of the lungs for most images, despite them being preprocessed. Testing on a variety of images is often necessary for a robust shadow removal model. For that reason, we decided not to apply rib elimination in this experiment.

Figure 6. An example of the rib shadow elimination process and result. Panels: (A) original image; (B) lung mask; (C) Sobel edge detection; (D) dilation; (E) calculation of connected components; (F) shadow estimation; (G) rib suppression; (H) shadow subtraction.

3.4. Generating Synthetic Dataset

One potential pitfall to consider with this approach is the presence of imbalanced datasets. When there is a significant skew in the distribution of classes, with some classes having far fewer samples than others, the model can become biased towards the majority class. This means that the model might perform well in identifying the common class but struggle to accurately classify the less frequent ones. This bias can lead to misleading results and limit the generalizability of the approach to real-world scenarios with a more balanced class distribution. To address bias in model training, we need to balance the dataset to
ensure that the number of instances for each class is roughly the same. Balanced datasets
often lead to better model performance [27]. Since the numbers of cases for the classes in each level of the hierarchy structure are not balanced, we need to generate synthetic data to balance the dataset. Given that the most common augmentation methods used to increase the dataset do not fit with the type of dataset used in this research, we chose
GANs as the base model. A variety of versions of the model have been developed, each
with a particular purpose [28].

A conditional tabular generative adversarial network (CTGAN) was used to generate synthetic tabular data. A CTGAN synthesizer is the generator that is responsible for creating
synthetic data samples. It is initialized with the best epoch value for each data file that
yields the optimal loss values, fits the synthesizer to the data, and generates synthetic data
using the fitted synthesizer. The desired samples of records that we needed to generate
were specified to achieve balance among each level of the dataset. The discriminator acts
as a critic, aiming to distinguish between real data samples from the training set and the
synthetic samples generated by the generator. The CTGAN generator consists of multiple
fully connected layers with a leaky ReLU activation function and batch normalization to
improve stability during training. The discriminator has the same structure, but it outputs
a probability score between 0 and 1, indicating how likely the input data sample (real or
synthetic) originates from the true data distribution.
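
A minimal sketch of this synthesis step, assuming the open-source ctgan package (SDV project), is shown below; the epoch count, file name, discrete column list, and sample size are placeholders for the per-file values tuned in the study.

```python
# A hedged sketch of CTGAN-based tabular synthesis with the `ctgan` package.
import pandas as pd
from ctgan import CTGAN

data = pd.read_csv("tabular_class_file.csv")          # hypothetical class file
discrete_columns = ["Gender", "Nationality", "MEWS Score"]

synthesizer = CTGAN(epochs=300)                       # best epoch value per file
synthesizer.fit(data, discrete_columns)               # fit to the real records
synthetic = synthesizer.sample(1000)                  # records needed for balance
```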
The evaluation of the quality of the synthetic data against the real data yielded strong results. The CTGAN-generated synthetic data demonstrated high fidelity, with overall quality scores of 96.21%, 97.26%, 98.46%, etc., for the respective datasets, closely mimicking the real data in terms of column shapes and pair trends. This
high degree of similarity suggests that the synthetic data can accurately represent the real
dataset, thereby minimizing the risk of introducing bias through unrepresentative samples.
Furthermore, the successful evaluation metrics highlight that the synthetic data cover over
90% of the categories and continuous ranges found in the real data and adhere well to the
minimum and maximum boundaries, ensuring that the integrity of data distributions and
limits is maintained. The synthesis quality checks confirm that the generated data maintain
a high level of quality and consistency with the original dataset, reducing the likelihood of
introducing artificial trends or biases.
To generate fake CXR images from corresponding fake tabular data, we used a com-
bination of tabular data and an image as inputs to a customized conditional generative
adversarial network (CGAN). The model is trained by initially learning from the original
tabular data and CXR images. Subsequently, it utilizes synthetic tabular data to generate cor-
responding synthetic CXR images. The generator is designed to produce 500 × 500 images,
and the discriminator processes both tabular and image data. It is important to note that
numerical-to-image synthesis with CGAN is a challenging task. The generator model has
10 layers, and the discriminator model has 8 layers; the count includes various types of
layers, such as Dense, Conv2D, Conv2DTranspose, Batch Normalization, Dropout, ReLU,
LeakyReLU, and Flatten layers. Both the generator and discriminator use the Adam opti-
mizer with a learning rate of 0.0001, a beta_1 of 0.5, and a dropout rate of 0.2 used in the
discriminator.
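
A hedged Keras sketch of such a conditional generator is shown below; it mirrors the layer types and Adam settings quoted above (learning rate 0.0001, beta_1 0.5), but the layer counts and a 128 × 128 output are simplified for brevity rather than reproducing the authors' 10-layer, 500 × 500 design.

```python
# A minimal sketch of a conditional generator mapping noise + a tabular
# condition vector to a synthetic grayscale CXR image.
import tensorflow as tf
from tensorflow.keras import layers

NOISE_DIM, TABULAR_DIM = 100, 218

def build_generator() -> tf.keras.Model:
    noise = layers.Input(shape=(NOISE_DIM,))
    cond = layers.Input(shape=(TABULAR_DIM,))      # synthetic tabular record
    x = layers.Concatenate()([noise, cond])
    x = layers.Dense(16 * 16 * 128)(x)
    x = layers.LeakyReLU(0.2)(x)
    x = layers.Reshape((16, 16, 128))(x)
    for filters in (128, 64, 32):                  # upsample 16 -> 32 -> 64 -> 128 px
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU(0.2)(x)
    img = layers.Conv2D(1, 3, padding="same", activation="tanh")(x)
    return tf.keras.Model([noise, cond], img)

generator = build_generator()
gen_opt = tf.keras.optimizers.Adam(learning_rate=1e-4, beta_1=0.5)
```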
The model was trained for a number of epochs for each class to generate synthetic
data. The model’s Structural Similarity Index Measure (SSIM) averaged around 0.67, which
indicates a good degree of similarity to the original images. This value, although not perfect,
suggests that the synthetic images capture much of the structural integrity and texture
present in the real CXR images. Such a level of SSIM is often indicative of acceptable synthetic images. The peak signal-to-noise ratio (PSNR) was calculated to be higher than the average,
which typically ranges between 20 and 40 dB for medical images. The PSNR value indicates
that the synthetic images have a lower level of error or noise compared to the original
images. The exact value of the PSNR that we targeted reflects a quality level that is generally
accepted as good in the field of medical image analysis; some samples are shown in Figure 7.
In the next section, we try to investigate how synthetic data improve classification accuracy,
especially when a small amount of data is available.
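
These checks can be reproduced with scikit-image, as in the following sketch; the random arrays are placeholders for matched real and synthetic CXR pairs.

```python
# A minimal sketch of the SSIM/PSNR evaluation described above.
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

real = np.random.rand(128, 128)   # placeholder for a real CXR in [0, 1]
fake = np.random.rand(128, 128)   # placeholder for a synthetic CXR

ssim = structural_similarity(real, fake, data_range=1.0)
psnr = peak_signal_noise_ratio(real, fake, data_range=1.0)
print(f"SSIM={ssim:.2f}, PSNR={psnr:.1f} dB")   # the study averaged SSIM ~0.67
```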
Figure 7. Samples of synthetic CXR images. Panels: (A) Bacterial; (B) Adenovirus; (C) Influenza; (D) RSV.

4. Hierarchal Model Architecture

To develop a hierarchical classification by applying deep learning models, we adapted four pre-trained models to tackle the hierarchy classification process. It has been observed that Visual Geometry Group (VGG)-based and Residual Network (Resnet)-based models are widely utilized in this field and provide outstanding results [29,30]. We adopted VGG11 and Resnet18 as the basic models for this challenge because both are more suitable for the moderate size of our dataset. The dataset consisted of eight hierarchical paths that are shown in Table 6. The number of samples in the table represents only the number of cases in the original datasets. The details of all models are explained in the following subsections.
Table 6. Dataset distribution for hierarchical classification.

Label Path | #Samples
Level #1
Normal | 1273
Pneumonia | 3270
Level #2
Pneumonia\Bacterial | 248
Pneumonia\Viral | 3165
Level #3
Pneumonia\Viral\SARSr-CoV-2 | 1848
Pneumonia\Viral\Influenza | 1281
Pneumonia\Viral\RSV | 21
Pneumonia\Viral\Adenoviruses | 15

4.1. Hierarchical Convolutional Neural Network Based on the VGG Architecture


4.1. Hierarchical Convolutional Neural Network Based on the VGG Architecture
A VGG-based neural network was mainly used as a deep learning multi-modal for the proposed method, with two architectures. The first architecture, called the VGG-like multi-modal, adapts the architectural principles of the VGG neural network architecture, which utilizes repetitive blocks of convolutional layers followed by Max Pooling layers to effectively extract features from CXR images. Our VGG-like multi-modal simplifies and tailors the original design for hierarchical decision making in pneumonia classification from CXR images, as shown in Figure 8. The adaptations were designed to process single-channel (grayscale) CXR images by modifying the first convolutional layer to accept a single input channel. The depth of the network was also adjusted: the model includes fewer convolutional layers than some VGG models (e.g., VGG16, VGG19), making it effective for the targeted dataset. The model introduces branching points to make hierarchical decisions at different levels of pneumonia classification: normal vs. pneumonia, bacterial vs. viral pneumonia, and further subclassification of viral pneumonia. This hierarchical approach is novel and not present in the standard VGG architecture. After the initial shared convolutional layers, the network branches out to make specific decisions, with each branch having its own set of convolutional and fully connected layers tailored to its classification task. Considering the dataset size, the fully connected (ANN) layers in the branches are simplified compared to the Dense layers in the original VGG models, reducing the model's complexity and the risk of overfitting on medical imaging datasets, which are typically smaller than ImageNet.

Figure 8. VGG-like multi-modal architecture.

The input image size was changed to 128 × 128 pixels. The initial CNN layer for the first-level decision consists of one 2D convolution with thirty-two filters with a kernel size of 3 × 3, using ReLU activation, and Max Pooling with a kernel size of 2 × 2. The input for the initial layer is 1 channel, and the output is 32 channels. The flattened output from the initial CNN layer is concatenated with the tabular data. These concatenated data are then passed to two hidden layers (Dense) of 128 neurons with ReLU activation. The final layer of the initial branch (decision #1) represents the probabilities of the normal/pneumonia classes. The pneumonia CNN layer for the second-level decision consists of two convolutional layers with the same kernel size, activation, and Max Pooling layer. The output (decision #2) represents the probabilities of the viral/bacterial classes. The viral branch for the third-level decision is similar to the pneumonia branch, but it has a different final layer with four output units (decision #3) corresponding to SARS-CoV-2, influenza, RSV, and adenovirus.
VGG architecture by utilizing a pre-trained VGG11 model as a feature extractor, followed by
three branches of fully connected layers (ANNs) for the hierarchical decision-making task. It
has the same input image size: 128 × 128 pixels. The first convolutional layer was modified
to handle a single input channel, applying 64 filters of size 3 × 3 followed by a Max Pooling
layer with a kernel size of 2 × 2. This is followed by the second convolutional layer with
64 filters of size 3 × 3 followed by a Max Pooling layer with a kernel size of 2 × 2 using ReLU
activation. This is followed by two convolutional layers and a Max Pooling layer with a
kernel size of 2 × 2 using ReLU activation. This pattern of two convolutional layers followed
by a Max Pooling layer is repeated for several blocks, progressively increasing the number
of filters (typically doubling) to extract more intricate features. After the convolutional
layers, the Flatten features are combined with the tabular feature and are passed to three
separated branches of fully connected layers. These layers perform computations on all acti-
vations from the previous layers. The fully connected layers of the original architecture are
replaced with custom layers designed to make hierarchical decisions specific to pneumonia
classification, as shown in Figure 9. Each branch has a sequential block with two Dense
layers followed by a Max Pooling layer is repeated for several blocks, progressively in-
creasing the number of filters (typically doubling) to extract more intricate features. After
the convolutional layers, the Flatten features are combined with the tabular feature and
are passed to three separated branches of fully connected layers. These layers perform
Sensors 2024, 24, 2641 computations on all activations from the previous layers. The fully connected layers of the
17 of 30
original architecture are replaced with custom layers designed to make hierarchical deci-
sions specific to pneumonia classification, as shown in Figure 9. Each branch has a sequen-
tial block with two Dense layers and ReLU activation. The first Dense layer has 128 units.
layers and ReLU
The second activation.
Dense layer hasThe first Dense
a specific numberlayer has 128
of units units. The
depending second
on the Dense layer
classification
hastask. The first and second branches (decision #1 and decision #2) have two units and
a specific number of units depending on the classification task. The first second
(normal
branches
vs. pneumonia and viral vs. bacterial, respectively). The third branch (decision #3) hasviral
(decision #1 and decision #2) have two units (normal vs. pneumonia and
vs.four
bacterial,
units ofrespectively). The third(SARS-CoV-2
the four viral subtypes branch (decision #3) has vs.
vs. influenza four units
RSV vs. of the four viral
adenovirus).
subtypes (SARS-CoV-2 vs. influenza vs. RSV vs. adenovirus). This adaptation
This adaptation allows the models to focus on the most relevant features for each decision allows the
models
level. to focus on the most relevant features for each decision level.

Figure 9. VGG-backbone multi-modal architecture.
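A hedged sketch of this backbone adaptation is given below, using torchvision's pre-trained VGG11 with its first convolution swapped for a single-channel one and three custom Dense branches; the module name, the tabular width, and the branch helper are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg11, VGG11_Weights

class VGGBackboneMultiModal(nn.Module):
    """Sketch: pre-trained VGG11 convolutional blocks as a feature extractor,
    fused with tabular data and fed to three hierarchical decision branches."""

    def __init__(self, n_tab: int = 32):  # n_tab: assumed tabular feature width
        super().__init__()
        backbone = vgg11(weights=VGG11_Weights.DEFAULT)
        # Accept single-channel (grayscale) CXR input instead of RGB
        backbone.features[0] = nn.Conv2d(1, 64, kernel_size=3, padding=1)
        self.features = backbone.features  # the original ANN head is discarded
        fused = 512 * 4 * 4 + n_tab        # 128x128 input -> 512x4x4 feature maps

        def branch(out_units: int) -> nn.Sequential:
            return nn.Sequential(nn.Linear(fused, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, out_units))

        self.decision1 = branch(2)  # normal vs. pneumonia
        self.decision2 = branch(2)  # viral vs. bacterial
        self.decision3 = branch(4)  # SARS-CoV-2 / influenza / RSV / adenovirus

    def forward(self, image, tabular):
        x = torch.cat([self.features(image).flatten(1), tabular], dim=1)
        return self.decision1(x), self.decision2(x), self.decision3(x)
```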
4.2. Hierarchical Convolutional Neural Network Based on the ResNet Architecture
In addition to the VGG-based multi-modal, the ResNet-based multi-modal was also used as a deep learning multi-modal, with two architectures. The first architecture was the ResNet-like multi-modal, which was inspired by the ResNet architecture. The ResNet-like multi-modal adapts the ResNet architecture for the same hierarchical decision-making task in pneumonia classification, as shown in Figure 10. The input grayscale image size is 128 × 128 pixels. The key adaptations are modified in the first convolutional layer to accept grayscale images, reflecting the single-channel nature of CXR images. The model employs customized residual blocks that match the task's complexity and data characteristics. Each block consists of convolutional layers with batch normalization and ReLU activation, similar to ResNet's design, but the number and configuration of blocks are tailored to the pneumonia classification task. The initial block (decision #1) processes the entire CXR image to extract general features helpful for identifying patterns. It has one convolutional layer that applies 32 3 × 3-sized filters with a stride of one and padding of one using ReLU activation, followed by the batch normalization layer. The second is the pneumonia block (decision #2), which aims to classify the CXR as viral or showing signs of bacterial infection. It consists of two convolutional layers with 64 filters of size 3 × 3 followed by a batch normalization layer and ReLU activation. The last is the viral block (decision #3), which has the same specifications as the second block.
Similar to the VGG-like model, the ResNet-like model incorporates branching points for
hierarchical classification decisions. This structure leverages the deep feature representation
capability of ResNet while providing specialized decision paths for different classification
levels. The introduction of skip connections in each block ensures effective training and
feature propagation, even with the model’s depth. The model concludes with simplified,
fully connected layers in each branch for decision-specific classification. Each branch of the
network uses a sequence of two fully connected layers with a ReLU activation function in
between. The first Dense layer in each branch has 128 units. However, the second Dense
layer has a different number of units depending on the specific classification output it
performs. Branch 1 (decision #1) focuses on the normal vs. pneumonia classification, which
represents the two possible outcomes. The second branch (decision #2) assuming pneumonia
is detected in the first branch; this branch classifies the pneumonia type as viral or bacterial. It utilizes a second Dense layer with two units, corresponding to the two classifications. The third branch (decision #3) aims to classify the specific viral subtype. It employs a second Dense layer with four units, representing the four possible viral subtypes (SARS-CoV-2, influenza, RSV, and adenovirus). Overall, the branching architecture leverages Dense layers with varying output sizes to handle different classification tasks within the overall pneumonia and viral subtype detection processes.

Figure 10. ResNet-like multi-modal architecture.
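The customized residual block described above could be sketched as follows; the block layout (two 3 × 3 convolutions with batch normalization, ReLU, and a shortcut) follows this subsection's description, while the class name and the projection shortcut for channel changes are our own assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Sketch of one customized residual block: two 3x3 convolutions with batch
    normalization and ReLU, plus a skip connection for feature propagation."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
        )
        # 1x1 projection keeps the shortcut shape-compatible when channels change
        self.skip = (nn.Identity() if in_ch == out_ch
                     else nn.Conv2d(in_ch, out_ch, kernel_size=1))
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.body(x) + self.skip(x))

# e.g., the pneumonia block (decision #2) could stack two 64-filter blocks:
# pneumonia_block = nn.Sequential(ResidualBlock(32, 64), ResidualBlock(64, 64))
```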

The ResNet-backbone multi-modal is the second architecture, which is an adaptation of the original ResNet architecture utilizing a pre-trained Resnet18 model as a feature extractor, followed by three branches of fully connected layers (ANNs) for the hierarchical decision-making task, as shown in Figure 11. It has a grayscale input image size of 128 × 128 pixels. The adaptations were designed to process single-channel (grayscale) CXR images by modifying the first convolutional layer to accept a single input channel. The model consists of 18 convolutional layers with ReLU activation. The extracted features from the CXR images are flattened, transforming the 2D feature maps into 1D vectors suitable for fully connected layers. The flattened features from the CNN block combine image-derived features with additional tabular patient information. After utilizing the pre-trained network as a feature extractor, it employs three branches of fully connected layers (ANNs) for the hierarchical decision-making process. Instead of using standard fully connected layers, this architecture distributes the features into three branches, each containing a sequence of two Dense layers with a ReLU activation function in between. The first Dense layer in all branches has 128 units. The key difference lies in the second Dense layer, which adapts its number of units based on the classification task: two units for normal vs. pneumonia (decision #1), two units for viral vs. bacterial (decision #2, assuming pneumonia), and four units for the four viral subtypes (decision #3). This branching approach with custom layers allows the model to make hierarchical decisions tailored to each classification level. Adapting ResNet's residual learning principle, the model efficiently learns features from CXR images, which is crucial for medical imaging tasks where interpretability and accuracy are paramount.

Figure 11. ResNet-backbone multi-modal architecture.
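A minimal sketch of this design, parallel to the VGG-backbone sketch above, assumes torchvision's pre-trained ResNet18 with a modified first convolution and an identity head so the 512-dimensional pooled features can be fused with the tabular vector; the names and tabular width are illustrative.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

class ResNetBackboneMultiModal(nn.Module):
    """Sketch: pre-trained ResNet18 as a grayscale feature extractor whose
    512-d pooled features are fused with tabular data for three branches."""

    def __init__(self, n_tab: int = 32):  # n_tab: assumed tabular feature width
        super().__init__()
        backbone = resnet18(weights=ResNet18_Weights.DEFAULT)
        # First convolution modified to accept a single (grayscale) input channel
        backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)
        backbone.fc = nn.Identity()  # keep the 512-d pooled feature vector
        self.backbone = backbone

        def branch(out_units: int) -> nn.Sequential:
            return nn.Sequential(nn.Linear(512 + n_tab, 128), nn.ReLU(inplace=True),
                                 nn.Linear(128, out_units))

        self.decision1, self.decision2, self.decision3 = branch(2), branch(2), branch(4)

    def forward(self, image, tabular):
        x = torch.cat([self.backbone(image), tabular], dim=1)
        return self.decision1(x), self.decision2(x), self.decision3(x)
```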

4.3. Training the Hierarchical Multi-Modal Methodology
Methodology
We are now ready to train the four multi-modals using the hierarchical multi-modal approach. The pseudocode provided in Algorithm 1 illustrates the sequential training and classification strategy for a hierarchical model focused on pneumonia detection from CXR images and tabular data. This methodology enables the model to learn distinctive features relevant to each decision level, improving its ability to generalize and accurately classify new data.
Algorithm 1. Pneumonia Hierarchical Classification
1: Input: dataset_path, num_epochs, batch_size
2: Output: Trained Hierarchical Model
3: 1. Initialize transformations for dataset preprocessing
4: 2. Load and split dataset into training and testing sets
5: 3. Define model, loss function, and optimizer
6: Function TrainModelForDecision(model, train_data, decision_point, loss_weights)
7: For each epoch in num_epochs do
8: For each batch in train_data do
9: Perform forward pass for the current decision_point
10: Compute loss using decision-specific loss_weights
11: Perform backward pass and update model parameters
12: End For
13: End For
14: End Function
15: Sequentially train model for each decision point in the hierarchy
16: a. For decision_point in [decision_1, decision_2, decision_3] do
17: i. Set appropriate loss_weights for the current decision_point
18: ii. Call TrainModelForDecision with the current decision_point
19: iii. Optionally adjust model for next decision_point
20: b. End For
21: Function ClassifyImage(image, tabular data, model)
22: Perform model inference on the combined features
23: Extract and return decision outcomes for each hierarchy level
24: End Function
25: 4. Demonstrate classification with a sample image and tabular data using the trained model

The hierarchical multi-modal first determines whether the image shows signs of pneumonia. If pneumonia is detected, it then classifies the pneumonia as either viral or bacterial. If viral pneumonia is detected, the model further classifies the type of viral pneumonia. The hierarchical inference function returns a tuple of decisions, each corresponding to a level in the decision hierarchy. Algorithm 2 details the proposed inference with conditional flow in the form of pseudocode.

Algorithm 2. Inference with Conditional Flow


1: Input: CXR images, Tabular data, trained model
2: Output: Normal OR Pneumonia/Bacterial OR Pneumonia/Viral/(SARSr-CoV-2/Influenza/RSV/Adenovirus)
3: Function Hierarchical_Inference (image, tabular data, model)
4: decision_1_probs = model.forward_pass (image, tabular data, decision_point = 'decision_1')
5: decision_1 = ArgMax(decision_1_probs) # Normal vs. Pneumonia classification
6: If decision_1 is 'Pneumonia' Then
7: decision_2_probs = model.forward_pass (image, tabular data, decision_point = 'decision_2')
8: decision_2 = ArgMax(decision_2_probs) # Viral vs. Bacterial classification
9: If decision_2 is 'Viral' Then
10: decision_3_probs = model.forward_pass (image, tabular data, decision_point = 'decision_3')
11: decision_3 = ArgMax(decision_3_probs) # Subtypes of Viral Pneumonia classification
12: Return ('Pneumonia', 'Viral', decision_3) # SARSr-CoV-2/Influenza/RSV/Adenovirus
13: Else
14: Return ('Pneumonia', 'Bacterial', 'N/A') # No further subclassification
15: End If
16: Else
17: Return ('Normal', 'N/A', 'N/A') # No pneumonia detected, no further classification
18: End If
19: End Function
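For illustration, a Python rendering of Algorithm 2 is given below; it assumes a model that returns all three branch outputs in one forward pass (the pseudocode's per-decision forward_pass is collapsed into a single call) and assumes particular class-index encodings, so it is a sketch rather than the authors' code.

```python
import torch

SUBTYPES = ["SARSr-CoV-2", "Influenza", "RSV", "Adenovirus"]

@torch.no_grad()
def hierarchical_inference(image, tabular, model):
    """Top-down conditional flow of Algorithm 2 for a single case; assumes the
    model returns (decision1, decision2, decision3) logits in one forward pass
    and assumes the class-index encodings noted in the comments below."""
    d1, d2, d3 = model(image, tabular)
    if d1.argmax(dim=1).item() == 0:              # assumed: index 0 = Normal
        return ("Normal", "N/A", "N/A")           # no further classification
    if d2.argmax(dim=1).item() == 1:              # assumed: index 1 = Bacterial
        return ("Pneumonia", "Bacterial", "N/A")  # no further subclassification
    return ("Pneumonia", "Viral", SUBTYPES[d3.argmax(dim=1).item()])
```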

The training methodology adopted for the VGG-like and ResNet-like multi-modals
involves a sequential and focused approach, targeting one decision point at a time within
the hierarchical structure of the problem. This approach ensures that the models learn to
accurately classify at each level of decision making, from distinguishing between normal
and pneumonia cases to identifying specific types of pneumonia. Starting with the first decision point (normal vs. pneumonia), the training process proceeds as follows (a minimal training sketch is given after the list):
• Train the VGG-like or ResNet-like model, focusing solely on the first decision point.
• Set the loss weight for the first decision point (e.g., normal vs. pneumonia) to 1.
• Set the loss weights for subsequent decision points (e.g., viral vs. bacterial, viral subtypes) to 0. This ensures that the model concentrates its learning on accurately classifying
the initial coarse categories without being influenced by the more detailed classifications
that follow.
• Train the model until it achieves satisfactory performance on the first decision to
distinguish normal from pneumonia cases.
• The training proceeds to the next decision point (bacterial vs. viral). For this phase,
the model’s weights from the previous training step are retained, ensuring continuity
and leveraging learned features. The loss weight for the current decision is now set
to a higher value (e.g., 0.9 for decision #2), while the loss weight for the first decision
might be reduced (e.g., 0.1) to maintain its knowledge, and the loss weight for the
third decision is set to 0. This process is repeated for each subsequent decision point,
gradually shifting the model’s focus down the hierarchy.
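A minimal sketch of one phase of this sequential scheme is given below, assuming a model that returns the three branch logits and a loader that yields labels for all three levels; the handling of samples without a valid label at a given level (e.g., normal cases at decision #2) is omitted for brevity.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

def train_decision(model, loader, optimizer, loss_weights):
    """One epoch of the sequential scheme: the three branch losses are combined
    with decision-specific weights, e.g. (1.0, 0.0, 0.0) for phase 1 and
    (0.1, 0.9, 0.0) for phase 2; label masking at unused levels is omitted."""
    model.train()
    for image, tabular, y1, y2, y3 in loader:  # assumed batch layout
        optimizer.zero_grad()
        d1, d2, d3 = model(image, tabular)
        loss = (loss_weights[0] * criterion(d1, y1)
                + loss_weights[1] * criterion(d2, y2)
                + loss_weights[2] * criterion(d3, y3))
        loss.backward()
        optimizer.step()
```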

Although the VGG-backbone and ResNet-backbone models are not hierarchical in
architecture, the training process incorporates hierarchical principles to align with the struc-
tured decision-making process of the problem. For these models, the training methodology
mimics the sequential focus used for the hierarchical models, adapting the learning process
to emphasize one level of classification at a time. This structured approach ensures that the
backbone models, which are powerful feature extractors due to their pre-trained weights,
are finely tuned to the specific requirements of each decision point in the classification task.
While cases in hierarchical classification must follow a predefined hierarchy structure, local (top-down) and global (big-bang) approaches are the two main ways of addressing hierarchical classification [31]. Both are implemented in the hybrid approach adopted in this work. From the local perspective, the inference function proceeds in a top-down manner, starting with a broad classification (decision_1) and refining the classification based on subsequent decisions (decision_2 and decision_3). From the global perspective, the function considers the entire hierarchy of classifications, as it defines the possible outcomes at each level and makes decisions based on the entire set of possibilities.
Among the four architectures, the ResNet-like model requires the most computation time due to its complex structure. The VGG-backbone follows in terms of computational demands, while the ResNet-backbone and VGG-like models require less computation time.

4.4. Evaluation Metrics


To analyze the general classification performance, we must choose the right evaluation
method. As mentioned, we addressed the class imbalance issue in the dataset by generating
a significant amount of synthetic data. Since the data were now balanced, we decided to
use macro-avg as the primary metric to calculate the mean evaluation metrics between
the classes. The proposed models were assessed using common evaluation metrics such
as accuracy, precision, sensitivity, and F1-score. The research targets hierarchical multiclass classification, using a 6 × 6 confusion matrix to output four values—true positive (TP), true negative (TN), false positive (FP), and false negative (FN)—used to calculate the following measures [32]:
Macro-average accuracy measures the average ratio of correctly predicted cases in each class to the total number of instances evaluated, as shown in Equation (3):

\[
\text{Macro-average Accuracy} = \frac{1}{C} \times \sum_{i=1}^{C} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i} \tag{3}
\]

Macro-average precision measures the average ratio of true positives among all predicted positives across all classes, as shown in Equation (4):

\[
\text{Macro-average Precision} = \frac{1}{C} \times \sum_{i=1}^{C} \frac{TP_i}{TP_i + FP_i} \tag{4}
\]

Macro-average sensitivity measures the average ability of the model to correctly identify true positives across all classes, as shown in Equation (5):

\[
\text{Macro-average Sensitivity} = \frac{1}{C} \times \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i} \tag{5}
\]

Macro-average F1-score combines both precision and recall into a single metric, providing an overall measure of the model's performance in terms of correctly identifying true positives and minimizing false positives and negatives, as shown in Equation (6):

\[
\text{Macro-average F1-score} = \frac{1}{C} \times \sum_{i=1}^{C} \frac{2 \times TP_i}{2 \times TP_i + FP_i + FN_i} \tag{6}
\]

where
$C$ is the number of classes in the classification task;
$TP_i$ is the number of true positives for class $i$;
$TN_i$ is the number of true negatives for class $i$;
$FP_i$ is the number of false positives for class $i$;
$FN_i$ is the number of false negatives for class $i$.
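As an illustration, the four macro-averaged measures can be derived from a C × C confusion matrix as follows; this NumPy sketch assumes rows hold true classes and columns hold predicted classes, and that every class has at least one predicted and one true sample (so no division by zero occurs).

```python
import numpy as np

def macro_metrics(cm: np.ndarray) -> dict:
    """Derive per-class TP/FP/FN/TN from a CxC confusion matrix (rows = true
    classes, columns = predicted classes) and macro-average Equations (3)-(6)."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp                  # predicted as class i but wrong
    fn = cm.sum(axis=1) - tp                  # class i cases missed by the model
    tn = cm.sum() - tp - fp - fn
    return {
        "accuracy":    np.mean((tp + tn) / (tp + tn + fp + fn)),  # Eq. (3)
        "precision":   np.mean(tp / (tp + fp)),                   # Eq. (4)
        "sensitivity": np.mean(tp / (tp + fn)),                   # Eq. (5)
        "f1":          np.mean(2 * tp / (2 * tp + fp + fn)),      # Eq. (6)
    }
```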

5. Results and Discussion


The experiments were implemented using the Python 3.10.12 language and PyTorch
2.1.0+cu121 for the deep learning models. The training of the model was performed using
a PC running the 64-bit Windows 11 Pro operating system. The PC had an Intel® Core™ i7-
10700 CPU @ 2.90GHz and 32GB of RAM (Intel, Santa Clara, CA, USA). Due to the limited
size of some classes in the dataset (total of 4543 patients) and to obtain a more robust
model, the data splitting approach was 70/30 for training and testing, respectively, using
k-fold cross-validation with k set to five. Random searching within an iterative process
was used to adjust the hyperparameter values of the models. This iterative approach
drew inspiration from the value ranges employed in relevant research focused on similar
problems and utilizing the same model architecture. Each model used the Adam optimizer
with a learning rate of 0.001, a learning scheduler patience of three, an early stopping
patience of five for the model without tabular data and fifteen for the model with tabular
data, and cross-entropy loss as a loss function. The models were trained in ranges of 20 to
40 epochs with batch sizes of 32. The hierarchical multi-modal consistently performed well,
given that the cross-validation score was consistent across all cross-folds.
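A sketch of this training configuration is shown below; the function name, the return structure, and the assumption that early stopping is tracked manually in the (omitted) epoch loop are our own illustrative choices, while the hyperparameter values follow those reported above.

```python
import torch
import torch.nn as nn

def configure_training(model: nn.Module, with_tabular: bool = True):
    """Sketch of the reported setup: Adam (lr = 0.001), a plateau scheduler with
    patience 3, cross-entropy loss, and early-stopping patience 5 (images only)
    or 15 (with tabular data); the epoch loop and 5-fold split are omitted."""
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=3)
    early_stopping_patience = 15 if with_tabular else 5
    criterion = nn.CrossEntropyLoss()
    return optimizer, scheduler, early_stopping_patience, criterion
```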
As mentioned in Section 4, four hierarchical deep learning multi-modals have been created to diagnose COVID-19 using CXR images and clinical tabular data and have been applied in many experiments. The first and second experiments were conducted to measure the performance of hierarchical deep learning models with and without a second dataset (used only CXR images), which is mentioned in Section 3, and before integrating the synthetic CXR images. Figure 12 shows the macro-average accuracy of the models before and after integrating the second dataset. The clear enhancement of all models' performance after integrating the second dataset signifies a positive impact on the models' ability to classify pneumonia accurately. The increased performance range (2.29–6.21%) indicates a significant improvement for all models.

[Figure 12 chart data — macro-average accuracy: VGG-like 81.45 (first dataset) vs. 84.25 (first + second); Resnet-like 83.48 vs. 85.77; VGG-backbone 78.53 vs. 82.2; Resnet-backbone 75.85 vs. 82.06.]

Figure 12. Comparison of macro-average accuracy for all models with and without a second dataset.

The third experiment was performed for all models after integrating the synthetic CXR images with the original CXR images from all datasets. Table 7 shows the results of all decisions at each level for each hierarchical classification schema. We observed that the best results for decision #1 and decision #2 (which are binary classifications) were obtained using the Resnet-like model, while the best results for decision #3 (multi-classification) were obtained using the Resnet-backbone model. In addition, the results of the comparison of COVID-19 classification for each hierarchical classification schema are shown in Table 8. COVID-19 classification using the Resnet-backbone model is higher than with other models, demonstrating an F1-score of 85.82% and an accuracy of 92.88%. Table 9 represents the macro-avg results that were achieved from all hierarchical classification models. Compared to the first and second experiments, the results improved very clearly for all models after integrating the synthetic CXR images to balance the dataset. The Resnet-like model achieved the best results among all models, with an F1-score of 92.65% and an accuracy of 92.61% for classifying all the classes.

Table 7. Results of decisions at each level for each hierarchical classification schema using only CXR
images.

Models            Decision #    Accuracy   Sensitivity   Precision   F1-Score
VGG-like          Decision #1   94.92      94.92         95.12       94.98
                  Decision #2   90.21      90.21         90.80       90.37
                  Decision #3   88.95      88.95         89.06       88.98
Resnet-like       Decision #1   95.05      95.05         95.26       95.12
                  Decision #2   92.82      92.82         92.90       92.85
                  Decision #3   89.95      89.95         90.01       89.98
VGG-backbone      Decision #1   93.92      93.92         94.40       94.04
                  Decision #2   88.41      88.41         89.51       88.67
                  Decision #3   89.36      89.36         89.43       89.39
Resnet-backbone   Decision #1   93.90      93.90         94.52       94.04
                  Decision #2   89.02      89.02         90.23       89.28
                  Decision #3   90.32      90.32         90.49       90.33

Table 8. Comparison of COVID-19 classification results for each hierarchical classification schema
using only CXR images.

Models Accuracy Sensitivity Precision F1-Score


VGG-like 90.82 80.80 83.57 82.17
Resnet-like 91.55 84.99 83.11 84.04
VGG-backbone 91.42 84.29 83.13 83.71
Resnet-backbone 92.88 82.37 89.56 85.82

Table 9. Comparison of macro-avg results for each hierarchical classification schema using only CXR
images.

Models Loss of Test Set Accuracy Sensitivity Precision F1-Score


VGG-like 0.43 91.36 91.36 91.66 91.45
Resnet-like 0.37 92.61 92.61 92.72 92.65
VGG-backbone 0.37 90.56 90.56 91.11 90.70
Resnet-backbone 0.30 91.08 91.07 91.75 91.22

In the last experiment, we applied the multi-modal approach by combining both the
CXR images and tabular data. Compared to the third experiment, the results improved very
clearly for all models after integrating the medical tabular data into the CXR images. This
indicates the importance of adding medical data to CXR images for the diagnostic process and
that depending only on the imaging modality does not achieve the required accuracy results.
Key demographic factors that contribute to classifying pneumonia include the patient’s age,
body mass, MEWS score (indicating overall clinical severity and vital signs), nationality,
and gender. Additionally, in terms of lab tests, blood clotting, muscle damage, blood clots,
inflammation, tissue damage, Albumin level, kidney function, vitamin D, and white blood
cell count are the most crucial factors and help to determine whether pneumonia is present
as well as to determine its type. Finally, medications for heart failure, nausea, arthritis, blood pressure, diabetes, acidosis, and iron deficiency are among the most important to consider; what was dispensed to a patient during a pneumonia infection points to the most significant symptoms and side effects associated with the infection.

Based on the training evaluation metrics, the Resnet-backbone hierarchical multi-modal performed the best among the four in this experiment. Figure 13 shows the testing (validation) accuracy over epochs for the Resnet-backbone multi-modal in one fold, which illustrates how the accuracy of the model improves over epochs. The VGG-based hierarchical multi-modal also shows good performance. Table 10 presents the results of different evaluation metrics of all decisions at each level for each hierarchical classification schema. We observed a clear enhancement in all decisions. The best results for decision #1 were obtained using the Resnet-backbone multi-modal, while the best results for decision #2 were obtained using the VGG-backbone multi-modal. However, regarding decision #3, the results from the Resnet-based multi-modals were the best. The COVID-19 classification was also improved, as shown in Table 11, using the Resnet-based multi-modal. An accuracy of 93.72% and an F1-score of 88.24% were achieved using the Resnet-like multi-modal. Table 12 summarizes the macro-avg results achieved for this last experiment, where the integration of the CXR images and the tabular data produced superior diagnosis performance compared to the previous experiment. Resnet-backbone outperforms the other multi-modals with an accuracy of 95.97% and an F1-score of 95.98%.

Figure 13. Testing accuracy (one-fold) against the number of epochs for the Resnet-backbone multi-modal in the last experiments.

Table 10. Results of decisions at each level for each hierarchical classification schema using CXR images and tabular data.
Models            Decision #    Accuracy   Sensitivity   Precision   F1-Score
VGG-like          Decision #1   93.63      93.63         94.67       93.83
                  Decision #2   96.06      96.06         96.15       96.09
                  Decision #3   91.62      91.62         91.70       91.65
Resnet-like       Decision #1   95.90      95.90         96.25       95.98
                  Decision #2   93.29      93.29         93.89       93.41
                  Decision #3   93.38      93.38         93.44       93.41
VGG-backbone      Decision #1   97.66      97.66         97.68       97.67
                  Decision #2   96.68      96.68         96.80       96.71
                  Decision #3   91.40      91.40         91.49       91.45
Resnet-backbone   Decision #1   98.13      98.13         98.17       98.15
                  Decision #2   96.42      96.42         96.49       96.44
                  Decision #3   93.35      93.35         93.38       93.36

Table 11. Comparison of COVID-19 classification results for each hierarchical classification schema
using CXR images and tabular data.

Models Accuracy Sensitivity Precision F1-Score


VGG-like 92.30 85.71 82.64 84.15
Resnet-like 93.72 88.69 87.79 88.24
VGG-backbone 91.88 83.51 84.62 84.06
Resnet-backbone 93.89 87.96 86.98 87.47

Table 12. Comparison of macro-avg results for each hierarchical classification schema using CXR
images and tabular data.

Models Loss of Test Set Accuracy Sensitivity Precision F1-Score


VGG-like 0.23 93.77 93.77 94.18 93.86
Resnet-like 0.20 94.19 94.19 94.53 94.26
VGG-backbone 0.24 95.25 95.25 95.32 95.27
Resnet-backbone 0.33 95.97 95.97 96.01 95.98

Figure 14 shows the training and testing loss chart; the training loss starts high and steadily decreases over the epochs, which means that the models learn from the training data and improve their performance. The testing loss also decreases over the epochs, and it is close to the training loss. In this case, the two curves are very close, suggesting that there is no observed overfitting for any of the multi-modals.

(A) Resnet-backbone multi-modal (B) Resnet-like multi-modal
(C) VGG-backbone multi-modal (D) VGG-like multi-modal
Figure 14. Training and testing loss against the number of epochs for each multi-modal in the last experiments.

The 6 × 6 confusion-matrix plots for the four multi-modals are depicted in Figure 15. Instead of showing all eight possible classes, the confusion matrix only presents six. This is because some classes are grouped together. Imagine a hierarchy in which the classes (pneumonia and viral) are super classes of others (such as influenza). The confusion matrix focuses on the specific types (leaf nodes) because the overall pneumonia number is just the sum of its subtypes, and it presents the actual predicted cases. The horizontal axes correspond to the predicted classes, and the vertical axes correspond to the true classes, which represent the actual classifications. The diagonal cells in the confusion matrix represent the correct predictions (TP and TN). The off-diagonal cells represent incorrect predictions (FP and FN). From observing the number of false prediction cells, the Resnet-backbone model achieved a low overall misclassification rate of 4.03%. Bacterial pneumonia proved the most challenging with 74 falsely predicted cases, while the model perfectly classified all adenovirus cases. There were 5 falsely predicted cases for the normal class and 16 for COVID-19 classification, which is an acceptable outcome. There were 19 and 29 misclassification cases for influenza and RSV, respectively. Though viral pneumonia (including influenza, COVID-19, adenovirus, and likely RSV) saw 64 misclassifications, the overall pneumonia category naturally had a higher rate of 138 out of 1905 cases. Other models (VGG-backbone, Resnet-like, and VGG-like) exhibited slightly higher misclassification rates between 4.75% and 6.23%. All the multi-modals seem to perform well, and the misclassification rate was very low, especially with the Resnet-backbone multi-modal.

(A) Resnet-backbone multi-modal (B) Resnet-like multi-modal

(C) VGG-backbone multi-modal (D) VGG-like multi-modal


Figure 15. Confusion matrix for each multi-modal in the last experiments.

The macro-average ROC curve in Figure 16 demonstrates that the VGG-like multi-
modal (AUC = 0.95) has the best overall performance across all classes, followed closely
by the VGG-backbone and Resnet-backbone multi-modals (AUC = 0.93~0.92). Taking into
consideration the other performance metrics, the Resnet-backbone multi-modal achieved
superior performance in the classification process.

Figure 16. Macro-average ROC curve across all decisions for each multi-modal in the last experiments.
While many studies have explored various approaches to the classification and identification of COVID-19, to our knowledge, no approach has attempted to classify COVID-19 by employing a hierarchical classification architecture that combines CXR image features with tabular medical data within a single model. This unique approach differentiates our work from existing research on COVID-19 classification, which is particularly evident when comparing the results of our proposed approach with those of similar works in the literature, as shown in Table 13. While binary classification and CT scan-based studies can achieve high accuracy, we opted not to compare our work to them for two reasons. Firstly, simplifying the problem into a binary classification might not reflect the complexities of real-world scenarios with more granular classifications. Secondly, CT scans, while valuable for diagnosis, can be impractical due to the limitations mentioned previously in the literature review. Our focus here is on more applicable, similar approaches.

Table 13. Comparison to related studies applying similar approaches.

Model                   Hierarchical   Multi-Modal   Accuracy   F1-Score
Pereira et al. [15]     Yes            No            -          65%
Attaullah et al. [33]   No             Yes           77.88%     -
Cheng et al. [34]       No             Yes           73.2%      70.7%
Loey et al. [35]        No             No            80.56%     82.32%
Rajaraman et al. [36]   No             No            91.77%     91.41%
Proposed Model          Yes            Yes           95.97%     95.98%
Compared to previous research using a hierarchical structure for pneumonia classification [15], our proposed approach achieved better performance than the hierarchical model in this work. The overall performance of the model achieved a macro-average F1-score of 0.65, while the identification of COVID-19 cases specifically achieved an F1-score of 0.89 for this class. This study also faced limitations in the feature extraction phase. It relied on hand-crafted features, potentially missing more intricate patterns, and extracted features from a single modality. Additionally, the sample size was restricted, with only 1144 CXR images (1000 normal and a concerningly low 144 pneumonia cases, including COVID-19). This limited dataset might hinder the generalizability of the findings.
The multi-modal approach significantly outperforms previous work by Attaullah et al. [33] for classifying COVID-19 using a public dataset with five classes using the data of symptoms and CXR images, which achieved an accuracy of 78.88%. Also, the research in [34] combined clinical data with the CXR image features fed into a neural network architecture, achieving 73.2% accuracy and a 70.7% F1-score. While previous approaches have their merits, our findings yield demonstrably superior outcomes.
our study surpasses previous work [35] that addressed a limited dataset (307 images) by
employing a two-stage approach (data augmentation and deep learning). While their best
result with GoogLeNet achieved 80.6% accuracy for multiclass pneumonia classification.
In addition, the ensemble learning for the COVID-19 detection module in this research [36]
categorized standardized CXR images into different classes, with the findings indicating
that the developed deep learning system successfully identified COVID-19 pneumonia
with an accuracy rate of 91.77% and an F1-score of 91.41%. Furthermore, it is noteworthy
that our innovative approach surpasses these results, demonstrating superior performance
in accurately predicting COVID-19 pneumonia through the integration of hierarchical
classification architecture, combining CXR image features with tabular medical data within
a unified model.

6. Conclusions
This paper proposes a novel approach for classifying COVID-19 and distinguishing it from other types of pneumonia and normal lungs using CXR images and medical tabular data in four different hierarchical architectures based on Resnet and VGG pre-trained models. This study used a private dataset obtained from King Khalid University Hospital and Rashid Hospital, containing a total of 4544 cases. This study aims to enhance the process of diagnosing COVID-19 and prove that combining CXR images with clinical data can achieve significant improvements in the hierarchical classification process. Overall, the performance metrics for all the hierarchical deep learning models were enhanced after combining the medical data with CXR images. Resnet-backbone achieved the highest performance with an accuracy of 95.97%, a precision of 96.01%, and an F1-score of 95.98%. The proposed approach showed promising results, especially the hierarchical deep learning multi-modal. Our findings could aid in the development of better diagnostic tools for
upcoming respiratory disease outbreaks. However, this study suffers from a data imbalance
due to the lack of available patient medical data for some classes. This challenge affects the
evaluation of the model’s performance. Generating a synthetic dataset makes the model
more robust; however, it could also introduce biases or inaccuracies, potentially leading to
unreliable results. To some extent, we are satisfied with the quality of our generated dataset
so far, but we believe that there is room to enhance the quality of the synthetic dataset to
optimize the model’s performance. In future work, we plan to explore more datasets from
different resources, including different classes of pneumonia and lung diseases.

Author Contributions: Conceptualization, S.A.A., A.M. and A.S.A.; methodology, S.A.A., S.A. and
A.S.A.; software, A.S.A. and M.A.; validation, S.A.A., S.A., T.N., B.M. and A.M.; formal analysis,
S.A.A., S.A. and A.S.A.; investigation, A.S.A.; resources, S.A.A., S.A., L.S., M.A. and A.S.A.; data
curation, A.S.A.; writing—original draft preparation, A.S.A.; writing—review and editing, S.A.A.,
S.A., T.N., B.M., L.S., M.A. and A.M.; visualization, S.A.A., S.A., T.N., A.M., B.M. and A.S.A.;
supervision, S.A.A. and A.M.; project administration, S.A.A., A.M. and A.S.A.; funding acquisition,
A.S.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: This study was conducted in accordance with the declaration
and guidelines of the Dubai Scientific Research Ethics Committee, DHA (DSREC-12/2021_01), and
the King Saud University Institutional Review Board Committee (E-251-5939).
Informed Consent Statement: Informed consent was obtained from all subjects involved in this study.
Data Availability Statement: The data presented in this study are available on request from the corresponding author for ethical reasons.

Acknowledgments: The authors would like to thank Deanship of scientific research in King Saud
University, Riyadh, Saudi Arabia for funding and supporting this research through the initiative of
DSR Graduate Students Research Support (GSR), and the Dubai Scientific Research Ethical Committee
(DSREC), Dubai Health Authority and Rashid Hospital, for their support in this study. In addition,
we give special thanks to the editor and reviewers for spending their valuable time reviewing and
polishing this article.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Pal, M.; Berhanu, G.; Desalegn, C.; Kandi, V. Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2): An Update.
Cureus 2020, 12, e7423. [CrossRef]
2. COVID-19 Cases|WHO COVID-19 Dashboard. Datadot. Available online: https://data.who.int/dashboards/covid19/cases
(accessed on 20 January 2024).
3. Chowdhury, M.E.H.; Rahman, T.; Khandakar, A.; Mazhar, R.; Kadir, M.A.; Bin Mahbub, Z.; Islam, K.R.; Khan, M.S.; Iqbal, A.; Al
Emadi, N.; et al. Can AI Help in Screening Viral and COVID-19 Pneumonia? IEEE Access 2020, 8, 132665–132676. [CrossRef]
4. Maharjan, N.; Thapa, N.; Magar, B.P.; Maharjan, M.; Tu, J. COVID-19 Diagnosed by Real-Time Reverse Transcriptase-Polymerase
Chain Reaction in Nasopharyngeal Specimens of Suspected Cases in a Tertiary Care Center: A Descriptive Cross-sectional Study.
J. Nepal Med. Assoc. 2021, 59, 464–467. [CrossRef]
5. Swapnarekha, H.; Behera, H.S.; Nayak, J.; Naik, B. Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review.
Chaos Solitons Fractals 2020, 138, 109947. [CrossRef] [PubMed]
6. Yang, T.; Wang, Y.-C.; Shen, C.-F.; Cheng, C.-M. Point-of-Care RNA-Based Diagnostic Device for COVID-19. Diagnostics 2020, 10,
165. [CrossRef]
7. Helmy, Y.A.; Fawzy, M.; Elaswad, A.; Sobieh, A.; Kenney, S.P.; Shehata, A.A. The COVID-19 Pandemic: A Comprehensive Review
of Taxonomy, Genetics, Epidemiology, Diagnosis, Treatment, and Control. J. Clin. Med. 2020, 9, 1225. [CrossRef]
8. Candemir, S.; Antani, S. A review on lung boundary detection in chest X-rays. Int. J. Comput. Assist. Radiol. Surg. 2019, 14, 563–576.
[CrossRef] [PubMed]
9. Jacobi, A.; Chung, M.; Bernheim, A.; Eber, C. Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review. Clin.
Imaging 2020, 64, 35–42. [CrossRef] [PubMed]
10. ICD-10 Version:2019. Available online: https://icd.who.int/browse10/2019/en#/ (accessed on 25 January 2024).
11. Kiryu, S.; Yasaka, K.; Akai, H.; Nakata, Y.; Sugomori, Y.; Hara, S.; Seo, M.; Abe, O.; Ohtomo, K. Deep learning to differentiate
parkinsonian disorders separately using single midsagittal MR imaging: A proof of concept study. Eur. Radiol. 2019, 29, 6891–6899.
[CrossRef]
12. Roberts, M.; Driggs, D.; Thorpe, M.; Gilbey, J.; Yeung, M.; Ursprung, S.; Aviles-Rivero, A.I.; Etmann, C.; McCague, C.; Beer, L.;
et al. Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest
radiographs and CT scans. Nat. Mach. Intell. 2021, 3, 199–217. [CrossRef]
13. Althenayan, A.S.; AlSalamah, S.A.; Aly, S.; Nouh, T.; Mirza, A.A. Detection and Classification of COVID-19 by Radiological
Imaging Modalities Using Deep Learning Techniques: A Literature Review. Appl. Sci. 2022, 12, 10535. [CrossRef]
14. Gao, J.; Li, P.; Chen, Z.; Zhang, J. A Survey on Deep Learning for Multimodal Data Fusion. Neural Comput. 2020, 32, 829–864.
[CrossRef]
15. Pereira, R.M.; Bertolini, D.; Teixeira, L.O.; Silla, C.N.; Costa, Y.M. COVID-19 identification in chest X-ray images on flat and
hierarchical classification scenarios. Comput. Methods Programs Biomed. 2020, 194, 105532. [CrossRef]
16. Regulatory Considerations on Artificial Intelligence for Health. Available online: https://www.who.int/publications-detail-
redirect/9789240078871 (accessed on 30 March 2024).
17. Ethics and Governance of Artificial Intelligence for Health: GUIDANCE on Large Multi-Modal Models. Available online:
https://www.who.int/publications-detail-redirect/9789240084759 (accessed on 30 March 2024).
18. Barnett, W.R.; Radhakrishnan, M.; Macko, J.; Hinch, B.T.; Altorok, N.; Assaly, R. Initial MEWS score to predict ICU admission or
transfer of hospitalized patients with COVID-19: A retrospective study. J. Infect. 2021, 82, 282–327. [CrossRef]
19. Gardner-Thorpe, J.; Love, N.; Wrightson, J.; Walsh, S.; Keeling, N. The Value of Modified Early Warning Score (MEWS) in Surgical In-Patients: A Prospective Observational Study. Ann. R. Coll. Surg. Engl. 2006, 88, 571–575. [CrossRef]
20. Menéndez, C.; Ordieres, J.; Ortega, F. Importance of information pre-processing in the improvement of neural network results.
Expert Syst. 1996, 13, 95–103. [CrossRef]
21. Liu, M.; Li, S.; Yuan, H.; Ong, M.E.H.; Ning, Y.; Xie, F.; Saffari, S.E.; Shang, Y.; Volovici, V.; Chakraborty, B.; et al. Handling missing
values in healthcare data: A systematic review of deep learning-based imputation techniques. Artif. Intell. Med. 2023, 142, 102587.
[CrossRef]
22. Varma, D.R. Managing DICOM images: Tips and tricks for the radiologist. Indian J. Radiol. Imaging 2012, 22, 4–13. [CrossRef]
23. Rahman, T.; Khandakar, A.; Qiblawey, Y.; Tahir, A.; Kiranyaz, S.; Kashem, S.B.A.; Islam, M.T.; Al Maadeed, S.; Zughaier, S.M.;
Khan, M.S.; et al. Exploring the effect of image enhancement techniques on COVID-19 detection using chest X-ray images.
Comput. Biol. Med. 2021, 132, 104319. [CrossRef]
24. Sharma, A.; Mishra, P.K. Image enhancement techniques on deep learning approaches for automated diagnosis of COVID-19
features using CXR images. Multimed. Tools Appl. 2022, 81, 42649–42690. [CrossRef]
25. Gordienko, Y.; Gang, P.; Hui, J.; Zeng, W.; Kochura, Y.; Alienin, O.; Rokovyi, O.; Stirenko, S. Deep Learning with Lung Segmentation
and Bone Shadow Exclusion Techniques for Chest X-Ray Analysis of Lung Cancer. In Advances in Computer Science for Engineering and
Education; Hu, Z., Petoukhov, S., Dychka, I., He, M., Eds.; Advances in Intelligent Systems and Computing; Springer International
Publishing: Cham, Switzerland, 2019; Volume 754, pp. 638–647.
26. Oğul, H.; Oğul, B.B.; Ağıldere, A.M.; Bayrak, T.; Sümer, E. Eliminating rib shadows in chest radiographic images providing
diagnostic assistance. Comput. Methods Programs Biomed. 2016, 127, 174–184. [CrossRef] [PubMed]
27. Bennin, K.E.; Keung, J.; Monden, A.; Kamei, Y.; Ubayashi, N. Investigating the Effects of Balanced Training and Testing Datasets
on Effort-Aware Fault Prediction Models. In Proceedings of the 2016 IEEE 40th Annual Computer Software and Applications
Conference (COMPSAC), Atlanta, GA, USA, 10–14 June 2016; pp. 154–163. [CrossRef]
28. Abedi, M.; Hempel, L.; Sadeghi, S.; Kirsten, T. GAN-Based Approaches for Generating Structured Data in the Medical Domain.
Appl. Sci. 2022, 12, 7075. [CrossRef]
29. Andrade-Girón, D.C.; Marín-Rodriguez, W.J.; Lioo-Jordán, F.d.M.; Villanueva-Cadenas, G.J.; Salinas, F.d.M.G.-T.d. Neural
Networks for the Diagnosis of COVID-19 in Chest X-ray Images: A Systematic Review and Meta-Analysis. EAI Endorsed Trans.
Pervasive Health Technol. 2023, 9. [CrossRef]
30. Saini, K.; Devi, R. A systematic scoping review of the analysis of COVID-19 disease using chest X-ray images with deep learning
models. J. Auton. Intell. 2023, 7. [CrossRef]
31. Silla, C.N.; Freitas, A.A. A survey of hierarchical classification across different application domains. Data Min. Knowl. Discov.
2011, 22, 31–72. [CrossRef]
32. Hossin, M.; Sulaiman, M.N. A Review on Evaluation Metrics for Data Classification Evaluations. Int. J. Data Min. Knowl. Manag.
Process 2015, 5, 1–11. [CrossRef]
33. Attaullah, M.; Ali, M.; Almufareh, M.F.; Ahmad, M.; Hussain, L.; Jhanjhi, N.; Humayun, M. Initial Stage COVID-19 Detection
System Based on Patients’ Symptoms and Chest X-ray Images. Appl. Artif. Intell. 2022, 36, 2055398. [CrossRef]
34. Cheng, J.; Sollee, J.; Hsieh, C.; Yue, H.; Vandal, N.; Shanahan, J.; Choi, J.W.; Tran, T.M.L.; Halsey, K.; Iheanacho, F.; et al. COVID-19
mortality prediction in the intensive care unit with deep learning based on longitudinal chest X-rays and clinical data. Eur. Radiol.
2022, 32, 4446–4456. [CrossRef] [PubMed]
35. Loey, M.; Smarandache, F.; Khalifa, N.E.M. Within the Lack of Chest COVID-19 X-ray Dataset: A Novel Detection Model Based
on GAN and Deep Transfer Learning. Symmetry 2020, 12, 651. [CrossRef]
36. Rajaraman, S.; Sornapudi, S.; Alderson, P.O.; Folio, L.R.; Antani, S.K. Analyzing inter-reader variability affecting deep ensemble
learning for COVID-19 detection in chest radiographs. PLoS ONE 2020, 15, e0242301. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.