sensors-2926618-peer-review-v1
Albatoul Althinyan 1,2, Shada A. AlSalamah 1,3,4, Sherin Aly 5, Thamer Nouh 6, Bassam Mahboub 7, Laila Salameh 8, Metab Alkubeyyer 9 and Abdulrahman Mirza 1
1 Information Systems Department, College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
2 Information Systems Department, College of Computer and Information Sciences, Imam Mohammed Bin Saud Islamic University, Riyadh, Saudi Arabia
3 National Health Information Center, Saudi Health Council, Riyadh, Saudi Arabia
4 Digital Health and Innovation Department, Science Division, World Health Organization, Geneva, Switzerland
5 Institute of Graduate Studies and Research, Alexandria University, Alexandria, Egypt
6 Trauma and Acute Care Surgery Unit, College of Medicine, King Saud University, Riyadh, Saudi Arabia
7 Clinical Sciences Department, College of Medicine, University of Sharjah, Sharjah, United Arab Emirates
8 Sharjah Institute for Medical Research, University of Sharjah, Sharjah, United Arab Emirates
9 Department of Radiology and Medical Imaging, King Khalid University Hospital, King Saud University, Riyadh, Saudi Arabia
* Correspondence: [email protected]; Tel.: +966-50-606-1614
Abstract: The coronavirus disease 2019 (COVID-19), originating in China, has rapidly spread worldwide. Early detection is crucial for effective treatment and preventive measures: physicians must examine infected patients and make timely decisions to isolate them. However, completing these processes is difficult given limited time, a shortage of expert radiologists, and the limitations of the reverse-transcription polymerase chain reaction (RT-PCR) method. Deep learning, an advanced machine learning technique, is highly effective at diagnosing diseases and classifying images from radiological imaging modalities. Previous COVID-19 classification research has limitations, including binary classification, flat class structures, a single feature modality, small public datasets, and reliance on CT-based diagnostic processes. This study aims to overcome these limitations by identifying pneumonia caused by COVID-19 and distinguishing it from other types of pneumonia and healthy lungs using chest X-ray (CXR) images together with related tabular medical data, and by showing the effectiveness of incorporating that tabular data. Additionally, since pneumonia classes naturally fall into a hierarchical structure, we leverage this structure within our approach to achieve improved classification outcomes. Pre-trained CNN models were employed to extract features, which were then combined using early fusion for the classification of eight distinct classes. Because datasets in this field are inherently imbalanced, several variants of Generative Adversarial Networks (GANs) were used to generate synthetic data. The proposed approach, tested on our private dataset of 4,523 patients, achieved a macro-average F1-score of 95.9% and an F1-score of 87.5% for COVID-19 identification using a ResNet-based structure. In conclusion, this study produced an accurate multi-modal deep learning model that diagnoses COVID-19 and differentiates it from other kinds of pneumonia and normal lungs, which will enhance the radiological diagnostic process.
Keywords: Artificial intelligence; COVID-19; CXR; Hierarchical; Deep learning; Multi-modal; Diagnosis; Image classification; Multi-class; Pneumonia
1. Introduction
The entire world's healthcare system is seriously endangered by a type of viral pneumonia known as COVID-19. Since the virus that causes COVID-19 has a high

Figure 1. Proposed hierarchical class structure of pneumonia (Pneumonia → Viral / Bacterial; Viral → SARSr-CoV-2, Influenza, respiratory syncytial virus (RSV), Adenoviruses).
Deep learning models are able to detect and learn significant details that radiologists find difficult to recognize with the naked eye [11], and they produce promising results for learning complex problems in radiology [12]. However, many of the previously reviewed studies [13] have employed deep learning models to diagnose and detect COVID-19 pneumonia from medical imaging in a theoretical manner that cannot be implemented clinically.
In several fields, especially medical diagnostic assistance, deep learning approaches have achieved significant advances in multi-modal structures by learning features from different sources of data [14]. This clearly suggests the effectiveness of adding various medical data to CXR images in the diagnostic process.
Hierarchical classification (HC) is a variant of the conventional flat classification problem. In a flat classification approach, cases are categorized into classes without following any predefined structure.
Proving that hierarchical classification is more effective than flat classification in this domain is not the purpose of this work, because this has already been demonstrated in the literature (Pereira et al., 2020). In this work, we investigate how clinical data affect COVID-19 classification using CXR images within a hierarchical classification framework, in order to detect different types of pneumonia caused by multiple pathogens and differentiate them from normal lungs. To achieve that, we collected a private, imbalanced dataset in which some types of pneumonia are much more common than others; we therefore applied variants of GAN models to balance the class distribution. We applied multi-modal hierarchical classification using a pure deep learning (end-to-end CNN) approach for predefined models in a hierarchy structure with a hybrid approach, first on the CXR images alone and then with the medical tabular data added through early fusion.
The paper is organized as follows: Section 2 covers the literature on related works. The details and characteristics of the dataset used in this paper and its analysis, as well as the techniques used to preprocess both the CXR images and the tabular dataset, are discussed in Section 3. Section 4 then details our proposed methodology and experimental setup. The obtained results and a discussion are summarized in Section 5. Finally, the conclusions of the current work and some possibilities for future work are described in the last section.
Figure 2. Proposed framework for multi-modal classification of CXR images.
Category                  Value
Number of patients        1217
Gender: Male              1070
Gender: Female            147
Diagnosis: SARSr-CoV-2    1217
Age range                 19–87
The dataset consists of CXR images with the corresponding medical tabular data for each patient, including demographic, vital-sign, clinical, and medication data; the attributes in the data record are described in Table 3. The tabular dataset includes 644 features with categorical and numerical data types. Some features were removed because they were not significant for diagnosing pneumonia. In addition, a total of 60 different nationalities are represented among the 4523 patients who make up the entire dataset.
The MEWS (Modified Early Warning Score) feature is a clinical tool used in healthcare settings to assess a patient's vital signs, and a growing number of hospitals use it to help track changes between each set of vitals [15]. The MEWS typically consists of several physiological parameters, including blood pressure, body temperature, pulse rate, respiratory rate, and the AVPU score (A = Awake, V = Verbal, P = Pain, U = Unresponsive), which is used to determine a patient's level of consciousness. A score is given to each parameter based on specified standards, and the overall MEWS is then determined by summing the scores of the parameters [16]. According to their MEWS, patients may be classified into risk categories: Normal (score 0–1), Low Risk (2–3), Moderate Risk (4–6), High Risk (7–8), and Critical (>8). The concern for clinical deterioration increases as the MEWS rises.
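For illustration, this banding can be expressed as a small lookup; the helper below is a sketch (the function name and structure are ours, only the thresholds come from the scoring scheme above):

```python
def mews_risk_category(mews_score: int) -> str:
    """Map a total MEWS to the risk bands described above (illustrative helper)."""
    if mews_score <= 1:
        return "Normal"         # score 0-1
    if mews_score <= 3:
        return "Low Risk"       # score 2-3
    if mews_score <= 6:
        return "Moderate Risk"  # score 4-6
    if mews_score <= 8:
        return "High Risk"      # score 7-8
    return "Critical"           # score > 8
```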
With the assistance of a knowledgeable pharmacist, medication prescriptions were also reduced to the most fundamental categories without doses, which decreased the enormous number of medications from 3,500 to 614.
Table 4. Patients' Medical Information Characteristics (*: data with statistical significance. a: chi-square test, b: Student's t test, c: Kruskal-Wallis H test). Rows cover gender, vital signs, lab tests (e.g., ALT: 77.44 ± 114.05 vs. 44.16 ± 100.25 vs. 45.66 ± 101.99, p < 0.001 *b), and the 614 medication categories (e.g., prescribed "Yes": 18,623 (1.5%), 23,306 (1.4%), 41,976 (1.3%)).
The distribution of the data was also examined using a number of visualizations. Because there are many medical features, Figure 4 shows the distribution of only a selection of continuous and categorical features.
Saudi citizens account for almost 59.5% of patients across all classes. The majority of patients are male, and the age values fall between 20 and 60. Most of the MEWS values are normal, although there are a few critical cases in all classes. C-reactive protein (CRP) and Vitamin D 25-OH both show a right-skewed distribution, while white blood cells (WBC), blood platelets (Plt), and the lymphocyte percentage (Lymph Auto #) show an almost normal distribution. The Pearson correlation coefficient was used to quantify the relationships between the continuous features, while Cramér's V was used to measure the association between the categorical features.
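As a sketch of how such associations can be computed, the helper below derives Cramér's V from a contingency table using SciPy's chi-square statistic; the function and the example column names are illustrative assumptions, not the study's code:

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x: pd.Series, y: pd.Series) -> float:
    """Cramér's V association between two categorical series (0 = none, 1 = perfect)."""
    table = pd.crosstab(x, y)           # contingency table of the two features
    chi2 = chi2_contingency(table)[0]   # Pearson chi-square statistic
    n = table.to_numpy().sum()          # total number of observations
    r, k = table.shape
    return float(np.sqrt((chi2 / n) / (min(r, k) - 1)))

# Example (hypothetical column names):
# cramers_v(df["NATIONALITY"], df["MEWS Score"])
```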
Figure 5. Correlation of the continuous features in the dataset.
Figure 5 shows the correlation of the continuous features; the heat map covers twenty-four continuous features. A correlation of 0.75 was recorded between creatinine and total CK, and a correlation of 0.72 was observed between total AST and ALT. Furthermore, Table 5 shows that there is no association between age and nationality, while there is a weak association between the MEWS score and both age and nationality.
Table 5. Cramér's V association between the categorical features.

Features                         Correlation
(AGE, NATIONALITY)               0.000000
(AGE, MEWS Score)                0.196708
(NATIONALITY, AGE)               0.000000
(NATIONALITY, MEWS Score)        0.187111
(MEWS Score, AGE)                0.196708
(MEWS Score, NATIONALITY)        0.187111
model's decision pathways. Once these predictions are made, the missing values in the original datasets are updated.
Next, a log transformation applied to the continuous columns makes the data more normally distributed, and standardization scales the numerical values to a mean of 0 and a standard deviation of 1, ensuring the data are appropriate for subsequent modeling steps. Principal component analysis (PCA) is used to extract features from the high-dimensional tabular data. Using the Maximum Likelihood Estimation (MLE) method to reduce the number of principal components from 644 to 218 significantly decreases the complexity of the data while retaining the essential variance, focusing on the most informative aspects of the data for modeling. The most important demographic and health-related features, in order, are age, BMI, MEWS score, nationality, and gender. The top lab tests, in order, are INR, Total CK, D-Dimer, CRP, LDH, Albumin Lvl, BUN, Vitamin D 25-OH, and WBC. The most important medications include Spironolactone, Granisetron, Colchicine, Sulfasalazine, Nifedipine, Gliclazide, Sodium Bicarbonate, Ferrous Sulfate, Metformin, and Pioglitazone, respectively.
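A minimal sketch of this tabular step with scikit-learn is shown below, assuming X is the imputed 644-feature matrix with non-negative continuous columns; the variable names are ours:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# X: imputed tabular matrix of shape (n_patients, 644); assumed non-negative
# so that the log transform is defined.
X_log = np.log1p(X)                            # log transform for skewed columns
X_std = StandardScaler().fit_transform(X_log)  # mean 0, standard deviation 1

pca = PCA(n_components="mle")                  # Minka's MLE chooses the dimensionality
X_reduced = pca.fit_transform(X_std)           # in the study this gave 644 -> 218
print(pca.n_components_, "components retained")
```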
g(x) = 255 \left( \frac{x}{255} \right)^{1/\gamma}  (1)
In addition, we apply image denoising using the total variation filter (TVF) method to remove noise from the images. Based on the literature, combining contrast enhancement (gamma correction) and image denoising (TVF) produces outstanding results on COVID-19 images (Sharma & Mishra, 2022). Moreover, transformations are used to preprocess the images before the training phase. Because the CXR images come from different sources at different sizes, they were standardized by resizing to 128×128 pixels, which gave better results in our experiments. The images are then converted to grayscale, their pixel values are normalized by dividing by 255 (the maximum pixel value for 8-bit images), and they are converted to tensors for integration with the TensorFlow framework.
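The sketch below illustrates this image pipeline (gamma correction per Equation (1), TVF denoising, resizing to 128×128, grayscale conversion, and [0, 1] normalization) with scikit-image; the γ value and function name are illustrative assumptions rather than the study's exact settings:

```python
import numpy as np
from skimage import color, io, transform
from skimage.restoration import denoise_tv_chambolle

def preprocess_cxr(path: str, gamma: float = 1.5) -> np.ndarray:
    """Gamma-correct, TV-denoise, resize, and normalize one CXR (sketch)."""
    img = io.imread(path)
    if img.ndim == 3:
        img = color.rgb2gray(img) * 255.0          # grayscale on a 0-255 scale
    g = 255.0 * (img / 255.0) ** (1.0 / gamma)     # gamma correction, Equation (1)
    den = denoise_tv_chambolle(g / 255.0, weight=0.1)  # total variation filter
    res = transform.resize(den, (128, 128), anti_aliasing=True)
    return res.astype("float32")[..., np.newaxis]  # 128x128x1 tensor in [0, 1]
```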
Figure 6. Rib-suppression pipeline: (A) original image; (B) lung mask; (C) Sobel edge detection; (D) dilation; (E) connected-component calculation; (F) shadow estimation and suppression; (G) rib suppression; (H) shadow subtraction.
Applying the previous approach to our private CXR images, as shown in Figure 6, delivers unacceptable results on most images: it removes many nodules from the lung area, which affects the prediction results. The effectiveness of shadow suppression depends on the characteristics of the images being processed, and we attribute the poor results to the limited clarity and, in many cases, the indistinct lung boundaries of the images, despite preprocessing them. Testing on a variety of images is often necessary for a robust shadow-removal model. For that reason, we decided not to apply rib elimination in this experiment.
Level #1
  Normal                           1273
  Pneumonia                        3270
Level #2
  Pneumonia\Bacterial              248
  Pneumonia\Viral                  3165
Level #3
  Pneumonia\Viral\Influenza        1281
  Pneumonia\Viral\RSV              21
  Pneumonia\Viral\Adenoviruses     15
4.1. Hierarchical Convolutional Neural Network Based on the VGG Architecture
VGG-based models were mainly used as the deep learning multi-modal models for the proposed method, in two architectures. The first architecture, called the VGG-Like multi-modal model, adapts the architectural principles of the VGG neural network, which utilizes repetitive blocks of convolutional layers followed by max-pooling layers to effectively extract features from CXR images. Our VGG-Like multi-modal model simplifies and tailors the original design for hierarchical decision-making in pneumonia classification from CXR images, as shown in Figure 8. The adaptations were designed to process single-channel (grayscale) CXR images by modifying the first convolutional layer to accept a single input channel. The depth of the network is also adjusted: the model includes fewer convolutional layers than some VGG models (e.g., VGG16, VGG19), making it effective for the targeted dataset. The model introduces branching points to make hierarchical decisions at different levels of pneumonia classification: normal vs. pneumonia, bacterial vs. viral pneumonia, and further subclassification of viral pneumonia. This hierarchical approach is novel and not present in the standard VGG architecture. After the initial shared convolutional layers, the network branches out to make specific decisions, with each branch having its own set of convolutional and fully connected layers tailored to its classification task. Considering the dataset size, the fully connected (ANN) layers in the branches are simplified compared to the dense layers in the original VGG models, reducing the model's complexity and the risk of overfitting on medical imaging datasets, which are typically smaller than ImageNet.
Figure 8. VGG-Like multi-modal architecture.
The input image size was set to 128×128 pixels. The initial CNN layer for the first-level decision consists of a 2D convolution with a 3×3 kernel and ReLU activation, followed by max pooling with a 2×2 kernel. The input to this initial layer has 1 channel, and the output has 32 channels. The flattened output of the initial CNN layer passes to a hidden layer of 128 neurons with ReLU activation, and the output (decision #1) represents the probabilities of the normal/pneumonia classes. The pneumonia CNN branch for the second-level decision consists of two convolutional layers with the same kernel size, activation, and max-pooling configuration; its output (decision #2) represents the probabilities of the viral/bacterial classes. The viral CNN branch for the third-level decision has the same description as the previous level, and its output (decision #3) represents the probabilities of the SARSr-CoV-2/Influenza/RSV/Adenovirus classes. The second architecture, the VGG-Backbone multi-modal model, adapts the original VGG architecture by utilizing the pre-trained network as a feature extractor, followed by three branches of fully connected layers (ANNs) for the hierarchical decision-making task. The first convolutional layer was modified to accept a single input channel, and the fully connected layers of the original architecture are replaced with custom layers designed to make hierarchical decisions specific to pneumonia classification, as shown in Figure 9. This adaptation allows the models to focus on the most relevant features for each decision level.
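A minimal Keras sketch of the VGG-Like branching layout described above is given below; the input size, kernel sizes, initial channel count, and 128-neuron hidden layers follow the text, while the branch filter counts are illustrative assumptions:

```python
import tensorflow as tf
from tensorflow.keras import Model, layers

def build_vgg_like() -> Model:
    inp = layers.Input(shape=(128, 128, 1))              # single-channel CXR

    # Shared stem: 3x3 convolution (1 -> 32 channels) + 2x2 max pooling.
    x = layers.Conv2D(32, 3, activation="relu", padding="same")(inp)
    x = layers.MaxPooling2D(2)(x)

    # Decision #1: normal vs. pneumonia.
    d1 = layers.Dense(128, activation="relu")(layers.Flatten()(x))
    out1 = layers.Dense(2, activation="softmax", name="decision_1")(d1)

    # Pneumonia branch, decision #2: bacterial vs. viral (two conv layers).
    p = layers.Conv2D(64, 3, activation="relu", padding="same")(x)
    p = layers.MaxPooling2D(2)(p)
    p = layers.Conv2D(64, 3, activation="relu", padding="same")(p)
    p = layers.MaxPooling2D(2)(p)
    d2 = layers.Dense(128, activation="relu")(layers.Flatten()(p))
    out2 = layers.Dense(2, activation="softmax", name="decision_2")(d2)

    # Viral branch, decision #3: SARSr-CoV-2 / influenza / RSV / adenovirus.
    v = layers.Conv2D(64, 3, activation="relu", padding="same")(p)
    v = layers.MaxPooling2D(2)(v)
    d3 = layers.Dense(128, activation="relu")(layers.Flatten()(v))
    out3 = layers.Dense(4, activation="softmax", name="decision_3")(d3)

    return Model(inp, [out1, out2, out3])
```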
4.2. Hierarchical Convolutional Neural Network Based on the ResNet Architecture
In addition to the VGG-based multi-modal models, ResNet-based models were also used, again with two architectures. The first, called the ResNet-Like multi-modal model, was inspired by the ResNet architecture and adapts it for the same hierarchical decision-making task in pneumonia classification, as shown in Figure 10. The key adaptations are as follows. The first convolutional layer is modified to accept grayscale images, reflecting the single-channel nature of CXR images. The model employs customized residual blocks that match the task's complexity and data characteristics: each block consists of convolutional layers with batch normalization and ReLU activation, similar to ResNet's design, but the number and configuration of blocks are tailored to the pneumonia classification task. Similar to the VGG-Like model, the ResNet-Like model incorporates branching points for hierarchical classification decisions. This structure leverages the deep feature-representation capability of ResNet while providing specialized decision paths for the different classification levels. The skip connections in each block ensure effective training and feature propagation, even with the model's depth. The model concludes with simplified fully connected layers in each branch for decision-specific classification.
By adapting ResNet's residual-learning principle, the model efficiently learns features from CXR images, which is crucial for medical imaging tasks where interpretability and accuracy are paramount.
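One such customized residual block could be sketched in Keras as follows; the conv/batch-norm/ReLU pattern and the skip connection follow the description above, while the filter counts and the 1×1 projection are illustrative assumptions:

```python
from tensorflow.keras import layers

def residual_block(x, filters: int):
    """One customized residual block: two conv/BN stages plus a skip path."""
    shortcut = x
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    if shortcut.shape[-1] != filters:            # match channels on the skip path
        shortcut = layers.Conv2D(filters, 1, padding="same")(shortcut)
    y = layers.Add()([y, shortcut])              # skip connection
    return layers.Activation("relu")(y)
```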
The training methodology adopted for the VGG-Like and ResNet-Like models involves a sequential, focused approach, targeting one decision point at a time within the hierarchical structure of the problem. This ensures that the models learn to classify accurately at each level of decision-making, from distinguishing between normal and pneumonia cases to identifying specific types of pneumonia. The training process begins with the first decision point, which distinguishes between normal and pneumonia cases. During this phase, the loss weight for the first decision is set to 1, while the loss weights for subsequent decisions are set to 0. This concentrates the model's learning on accurately classifying the initial coarse categories without being influenced by the more detailed classifications that follow. After the model achieves satisfactory performance on the first decision, training proceeds to the next decision point (bacterial vs. viral). For this phase, the model's weights from the previous training step are retained, ensuring continuity and leveraging the learned features. The loss weight for the current decision is now set to a higher value (e.g., 0.9 for decision 2), while the loss weight for the first decision may be reduced (e.g., 0.1) to maintain its knowledge, and the loss weight for the third decision remains 0. This process is repeated for each subsequent decision point, gradually shifting the model's focus down the hierarchy, as sketched below.
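A hedged sketch of this phase-wise schedule in Keras: the loss weights for phases 1 and 2 follow the values given above, while the phase-3 split, optimizer, and epoch counts are assumptions:

```python
# Loss weights per training phase for the three decision heads (head names
# assumed to be "decision_1"..."decision_3" as in the sketches above).
phases = [
    {"decision_1": 1.0, "decision_2": 0.0, "decision_3": 0.0},  # phase 1
    {"decision_1": 0.1, "decision_2": 0.9, "decision_3": 0.0},  # phase 2
    {"decision_1": 0.1, "decision_2": 0.1, "decision_3": 0.8},  # phase 3 (assumed)
]
for weights in phases:
    # Recompiling changes the loss weighting but keeps the learned weights,
    # so each phase continues from the previous one.
    model.compile(optimizer="adam",
                  loss={name: "categorical_crossentropy" for name in weights},
                  loss_weights=weights,
                  metrics=["accuracy"])
    model.fit(x_train,
              {"decision_1": y1, "decision_2": y2, "decision_3": y3},
              epochs=10, validation_split=0.1)
```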
Although the VGG-Backbone and ResNet-Backbone models are not hierarchical in architecture, their training incorporates hierarchical principles to align with the structured decision-making process of the problem. For these models, the training methodology mimics the sequential focus used for the hierarchical models, adapting the learning process to emphasize one level of classification at a time. This structured approach ensures that the backbone models, which are powerful feature extractors thanks to their pre-trained weights, are finely tuned to the specific requirements of each decision point in the classification task.
While cases in hierarchical classification have to follow a predefined hierarchy structure, local (top-down) and global (big-bang) are the two main approaches for addressing hierarchical classification (Silla & Freitas, 2011). In this work we implement both within a hybrid approach. From the local perspective, the prediction function proceeds top-down, starting with a broad classification (decision_1) and refining it through subsequent decisions (decision_2 and decision_3). From the global perspective, the function considers the entire hierarchy of classifications, as it defines the possible outcomes at each level and makes decisions based on the entire set of possibilities. Both local and global approaches are therefore implemented in this hybrid approach, as illustrated below.
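A sketch of such a hybrid prediction function is given below; the class labels follow Figure 1, while the function name and the argmax decision rule are illustrative assumptions:

```python
import numpy as np

LEVEL1 = ["Normal", "Pneumonia"]
LEVEL2 = ["Bacterial", "Viral"]
LEVEL3 = ["SARSr-CoV-2", "Influenza", "RSV", "Adenovirus"]

def hierarchical_predict(model, image: np.ndarray) -> str:
    """Top-down (local) traversal over the predefined (global) hierarchy."""
    p1, p2, p3 = model.predict(image[np.newaxis, ...])   # three decision heads
    if LEVEL1[int(np.argmax(p1))] == "Normal":           # decision_1
        return "Normal"
    if LEVEL2[int(np.argmax(p2))] == "Bacterial":        # decision_2
        return "Pneumonia/Bacterial"
    return "Pneumonia/Viral/" + LEVEL3[int(np.argmax(p3))]  # decision_3
```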
Macro-average accuracy: it measures the average per-class accuracy across all classes, as shown in Equation (3).

\text{Macro-average Accuracy} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i + TN_i}{TP_i + TN_i + FP_i + FN_i}  (3)

Macro-average precision: it measures the average ratio of true positives among all predicted positives across all classes, as shown in Equation (4).

\text{Macro-average Precision} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FP_i}  (4)

Macro-average sensitivity: it measures the average ability of the model to correctly identify true positives across all classes, as shown in Equation (5).

\text{Macro-average Sensitivity} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i}  (5)

Macro-average F1-score: it combines both precision and recall into a single metric, providing an overall measure of the model's performance in terms of correctly identifying true positives and minimizing false positives and negatives, as shown in Equation (6).

\text{Macro-average F1-score} = \frac{1}{C} \sum_{i=1}^{C} \frac{2 \times TP_i}{2 \times TP_i + FP_i + FN_i}  (6)

where:
C is the number of classes in the classification task;
TP_i is the number of true positives for class i;
TN_i is the number of true negatives for class i;
FP_i is the number of false positives for class i;
FN_i is the number of false negatives for class i.
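As a sketch, Equations (3)–(6) can be computed from a C×C multiclass confusion matrix using one-vs-rest counts per class; the helper name is ours:

```python
import numpy as np

def macro_metrics(cm: np.ndarray) -> dict:
    """Macro-averaged metrics (Equations (3)-(6)) from a CxC confusion matrix."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp          # predicted as class i, actually another class
    fn = cm.sum(axis=1) - tp          # actually class i, predicted as another class
    tn = cm.sum() - tp - fp - fn
    return {
        "macro_accuracy": float(np.mean((tp + tn) / (tp + tn + fp + fn))),   # Eq. (3)
        "macro_precision": float(np.mean(tp / (tp + fp))),                   # Eq. (4)
        "macro_sensitivity": float(np.mean(tp / (tp + fn))),                 # Eq. (5)
        "macro_f1": float(np.mean(2 * tp / (2 * tp + fp + fn))),             # Eq. (6)
    }
```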
Figure 12. Comparison of macro-average accuracy for all models with and without the second dataset (e.g., ResNet-Backbone: 82.06% on the first + second datasets vs. 75.85% on the first dataset only).
The third experiment was conducted for all models after integrating the synthetic CXR images with the original CXR images from all datasets. Table 8 shows the results of all decisions at each level for each hierarchical classification schema. We observed that the best results for decision #1 and decision #2 (both binary classifications) were obtained by the ResNet-Like model, while the best results for decision #3 (multi-class classification) were obtained by the ResNet-Backbone model. In addition, the results of the comparison of the COVID-19 class for each hierarchical classification schema are shown in Table 9. COVID-19 classification using the ResNet-Backbone model is higher than with the other models, with an F1-score of 85.82% and an accuracy of 92.88%. Table 10 presents the macro-average results achieved by all hierarchical classification models. Compared to the first and second experiments, the results clearly improved for all models after integrating the synthetic CXR images to balance the dataset. The ResNet-Like model achieved the best results among all models, with an F1-score of 92.65% and an accuracy of 92.61% for classifying all the classes.
Table 8. Results of decisions at each level for each hierarchical classification schema using only CXR images.
Table 9. Results comparison of the COVID-19 class for each hierarchical classification schema using only CXR images.
Table 10. Macro-average results comparison for each hierarchical classification schema using only CXR images.
Figure 13. Accuracy and F1-score training curves against the number of epochs across all decisions for each multi-modal model in the last experiments: (A) accuracy; (B) F1-score.
Table 11. Results of decisions at each level for each hierarchical classification schema using CXR images and tabular data.
Table 12. Results comparison of the COVID-19 class for each hierarchical classification schema using CXR images and tabular data.
Table 13. Macro-average results comparison for each hierarchical classification schema using CXR images and tabular data.
Figure 14. Confusion matrices for: (A) ResNet-Backbone; (B) ResNet-Like; (C) VGG-Backbone; (D) VGG-Like.
Figure 15. Macro-average ROC curve across all decisions for each multi-modal model in the last experiments.
The confusion-matrix plots for the four multi-modal models are depicted in Figure 14. The horizontal axes correspond to the predicted classes, and the vertical axes correspond to the true classes, which represent the actual classifications. The diagonal cells in each confusion matrix represent correct predictions (TP and TN), and the off-diagonal cells represent incorrect predictions (FP and FN). Judging from the values in the false-prediction cells, all the multi-modal models perform fairly well, as most of the higher values are concentrated along the diagonal (correct predictions), and the misclassification rate is very low, especially for the ResNet-Backbone multi-modal model. The macro-average ROC curves in Figure 15 show that the VGG-Like multi-modal model (AUC = 0.95) has the best overall performance across all classes, followed closely by the VGG-Backbone and ResNet-Backbone multi-modal models (AUC = 0.93 and 0.92, respectively). Taking the other performance metrics into consideration, the ResNet-Backbone multi-modal model achieved superior performance in the classification process.
6. Conclusions
This paper proposes a novel approach for classifying COVID-19 and distinguishing it from other types of pneumonia and normal lungs using CXR images and medical tabular data in four different hierarchical architectures based on ResNet and VGG pre-trained models. The study used a private dataset obtained from King Khalid University Hospital and Rashid Hospital, containing a total of 4,544 cases. The study aims to enhance the process of diagnosing COVID-19 and to show that combining CXR images with clinical data achieves significant improvements in the hierarchical classification process. Overall, the performance metrics of all the hierarchical deep learning models improved after combining the medical data with the CXR images. ResNet-Backbone achieved the highest performance, with an accuracy of 95.97%, a precision of 96.01%, and an F1-score of 95.98%. The proposed approach showed promising results, especially the hierarchical multi-modal deep learning models. Our findings could aid in the development of better diagnostic tools for upcoming respiratory-disease outbreaks. However, the study suffers from data imbalance due to the lack of patient medical data available for some classes. In future work, we plan to explore more datasets from different sources, including different classes of pneumonia and lung diseases.
Acknowledgments: The authors would like to thank the Deanship of Scientific Research (DSR), King Saud University, Riyadh, Saudi Arabia, as well as the Dubai Scientific Research Ethics Committee (DSREC), Dubai Health Authority, and Rashid Hospital, for their support of this study. In addition, special thanks to the editor and reviewers for spending their valuable time reviewing and polishing this article.
Data Availability: The data are not available due to ethical reasons.
Conflicts of Interest: On behalf of all authors, the corresponding author declares no conflicts of interest.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Institutional Review Board Statement: The study was conducted according to the declaration and guidelines of the Dubai Scientific Research Ethics Committee, DHA (DSREC-12/2021_01), and the King Saud University Institutional Review Board Committee (E-251-5939).
References
1. M. Pal, G. Berhanu, C. Desalegn, and V. Kandi, "Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2): An Update," Cureus, Mar. 2020, doi: 10.7759/cureus.7423.
2. "COVID-19 cases | WHO COVID-19 dashboard," datadot. Accessed: Mar. 5, 2024. [Online]. Available: https://data.who.int/dashboards/covid19/cases
3. M. E. H. Chowdhury et al., "Can AI Help in Screening Viral and COVID-19 Pneumonia?," IEEE Access, vol. 8, pp. 132665–132676, 2020, doi: 10.1109/ACCESS.2020.3010287.
4. N. Maharjan, N. Thapa, B. Pun Magar, M. Maharjan, and J. Tu, "COVID-19 Diagnosed by Real-Time Reverse Transcriptase-Polymerase Chain Reaction in Nasopharyngeal Specimens of Suspected Cases in a Tertiary Care Center: A Descriptive Cross-sectional Study," J. Nepal Med. Assoc., vol. 59, no. 237, May 2021, doi: 10.31729/jnma.5383.
5. H. Swapnarekha, H. S. Behera, J. Nayak, and B. Naik, "Role of intelligent computing in COVID-19 prognosis: A state-of-the-art review," Chaos Solitons Fractals, vol. 138, p. 109947, Sep. 2020, doi: 10.1016/j.chaos.2020.109947.
6. T. Yang, Y.-C. Wang, C.-F. Shen, and C.-M. Cheng, "Point-of-Care RNA-Based Diagnostic Device for COVID-19," Diagnostics, vol. 10, no. 3, p. 165, Mar. 2020, doi: 10.3390/diagnostics10030165.
7. Y. A. Helmy, M. Fawzy, A. Elaswad, A. Sobieh, S. P. Kenney, and A. A. Shehata, "The COVID-19 Pandemic: A Comprehensive Review of Taxonomy, Genetics, Epidemiology, Diagnosis, Treatment, and Control," J. Clin. Med., vol. 9, no. 4, p. 1225, Apr. 2020, doi: 10.3390/jcm9041225.
8. S. Candemir and S. Antani, "A review on lung boundary detection in chest X-rays," Int. J. Comput. Assist. Radiol. Surg., vol. 14, no. 4, pp. 563–576, Apr. 2019, doi: 10.1007/s11548-019-01917-1.
9. A. Jacobi, M. Chung, A. Bernheim, and C. Eber, "Portable chest X-ray in coronavirus disease-19 (COVID-19): A pictorial review," Clin. Imaging, vol. 64, pp. 35–42, Aug. 2020, doi: 10.1016/j.clinimag.2020.04.001.
10. "ICD-10 Version:2019." Accessed: Jan. 25, 2024. [Online]. Available: https://icd.who.int/browse10/2019/en#/
11. S. Kiryu et al., "Deep learning to differentiate parkinsonian disorders separately using single midsagittal MR imaging: a proof of concept study," Eur. Radiol., vol. 29, no. 12, pp. 6891–6899, Dec. 2019, doi: 10.1007/s00330-019-06327-0.
12. M. Roberts et al., "Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans," Nat. Mach. Intell., vol. 3, no. 3, pp. 199–217, Mar. 2021, doi: 10.1038/s42256-021-00307-0.
13. A. S. Althenayan, S. A. AlSalamah, S. Aly, T. Nouh, and A. A. Mirza, "Detection and Classification of COVID-19 by Radiological Imaging Modalities Using Deep Learning Techniques: A Literature Review," Appl. Sci., vol. 12, no. 20, p. 10535, Oct. 2022, doi: 10.3390/app122010535.
14. J. Gao, P. Li, Z. Chen, and J. Zhang, "A Survey on Deep Learning for Multimodal Data Fusion," Neural Comput., vol. 32, no. 5, pp. 829–864, May 2020, doi: 10.1162/neco_a_01273.
15. W. R. Barnett, M. Radhakrishnan, J. Macko, B. T. Hinch, N. Altorok, and R. Assaly, "Initial MEWS score to predict ICU admission or transfer of hospitalized patients with COVID-19: A retrospective study," J. Infect., vol. 82, no. 2, pp. 282–327, Feb. 2021, doi: 10.1016/j.jinf.2020.08.047.
16. J. Gardner-Thorpe, N. Love, J. Wrightson, S. Walsh, and N. Keeling, "The Value of Modified Early Warning Score (MEWS) in Surgical In-Patients: A Prospective Observational Study," Ann. R. Coll. Surg. Engl., vol. 88, no. 6, pp. 571–575, Oct. 2006, doi: 10.1308/003588406X130615.
17. C. Menéndez, J. B. Ordieres, and F. Ortega, "Importance of information pre-processing in the improvement of neural network results," Expert Syst., vol. 13, no. 2, pp. 95–103, May 1996, doi: 10.1111/j.1468-0394.1996.tb00182.x.
18. M. Liu et al., "Handling missing values in healthcare data: A systematic review of deep learning-based imputation techniques," Artif. Intell. Med., vol. 142, p. 102587, Aug. 2023, doi: 10.1016/j.artmed.2023.102587.
19. D. R. Varma, "Managing DICOM images: Tips and tricks for the radiologist," Indian J. Radiol. Imaging, vol. 22, no. 1, pp. 4–13, Jan. 2012, doi: 10.4103/0971-3026.95396.
20. Yu. Gordienko et al., "Deep Learning with Lung Segmentation and Bone Shadow Exclusion Techniques for Chest X-Ray Analysis of Lung Cancer," in Advances in Computer Science for Engineering and Education, Z. Hu, S. Petoukhov, I. Dychka, and M. He, Eds., Advances in Intelligent Systems and Computing, vol. 754, Cham: Springer International Publishing, 2019, pp. 638–647, doi: 10.1007/978-3-319-91008-6_63.
21. H. Oğul, B. B. Oğul, A. M. Ağıldere, T. Bayrak, and E. Sümer, "Eliminating rib shadows in chest radiographic images providing diagnostic assistance," Comput. Methods Programs Biomed., vol. 127, pp. 174–184, Apr. 2016, doi: 10.1016/j.cmpb.2015.12.006.
22. K. E. Bennin, J. Keung, A. Monden, Y. Kamei, and N. Ubayashi, "Investigating the Effects of Balanced Training and Testing Datasets on Effort-Aware Fault Prediction Models," in 2016 IEEE 40th Annual Computer Software and Applications Conference (COMPSAC), Atlanta, GA, USA: IEEE, Jun. 2016, pp. 154–163, doi: 10.1109/COMPSAC.2016.144.
23. M. Abedi, L. Hempel, S. Sadeghi, and T. Kirsten, "GAN-Based Approaches for Generating Structured Data in the Medical Domain," Appl. Sci., vol. 12, no. 14, p. 7075, Jul. 2022, doi: 10.3390/app12147075.
24. M. Hossin and M. N. Sulaiman, "A Review on Evaluation Metrics for Data Classification Evaluations," Int. J. Data Min. Knowl. Manag. Process, vol. 5, no. 2, pp. 1–11, Mar. 2015, doi: 10.5121/ijdkp.2015.5201.
Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.