Hindawi

BioMed Research International


Volume 2022, Article ID 3113119, 12 pages
https://doi.org/10.1155/2022/3113119

Research Article
Symptom-Based COVID-19 Prognosis through AI-Based IoT: A
Bioinformatics Approach

Madhumita Pal,1 Smita Parija,1 Ranjan K. Mohapatra,2 Snehasish Mishra,3 Ali A. Rabaan,4,5,6 Abbas Al Mutair,7,8,9 Saad Alhumaid,10 Jaffar A. Al-Tawfiq,11,12,13 and Kuldeep Dhama14

1 Electronics and Communication Engineering, CV Raman Global University, Bidyanagar, Mahura, Janla, Bhubaneswar, Odisha 752054, India
2 Department of Chemistry, Government College of Engineering, Keonjhar, Odisha 758002, India
3 Bioenergy Lab, School of Biotechnology, Campus-11, KIIT Deemed University, Bhubaneswar, Odisha 751024, India
4 Molecular Diagnostic Laboratory, Johns Hopkins Aramco Healthcare, Dhahran 31311, Saudi Arabia
5 College of Medicine, Alfaisal University, Riyadh 11533, Saudi Arabia
6 Department of Public Health and Nutrition, The University of Haripur, Haripur 22610, Pakistan
7 Research Center, Almoosa Specialist Hospital, Al-Ahsa 36342, Saudi Arabia
8 College of Nursing, Princess Norah Bint Abdulrahman University, Riyadh 11564, Saudi Arabia
9 School of Nursing, Wollongong University, Wollongong NSW 2522, Australia
10 Administration of Pharmaceutical Care, Al-Ahsa Health Cluster, Ministry of Health, Al-Ahsa 31982, Saudi Arabia
11 Specialty Internal Medicine and Quality Department, Johns Hopkins Aramco Healthcare, Dhahran 31311, Saudi Arabia
12 Indiana University School of Medicine, Indiana 46202, USA
13 School of Medicine, Johns Hopkins University, Baltimore, MD 21287, USA
14 Division of Pathology, ICAR-Indian Veterinary Research Institute, Izatnagar, Bareilly, 243122 Uttar Pradesh, India

Correspondence should be addressed to Smita Parija; [email protected] and Ranjan K. Mohapatra; [email protected]

Received 2 December 2021; Accepted 17 June 2022; Published 23 July 2022

Academic Editor: Bing Wang

Copyright © 2022 Madhumita Pal et al. This is an open access article distributed under the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Objective. The Internet of Things (IoT) integrates several technologies in which devices learn from one another's experience, thereby reducing errors introduced by human intervention. Modern technologies such as IoT and machine learning enable healthcare to move from a conventional to a patient-specific approach. In the conventional approach, the biggest challenges for healthcare professionals are predicting a disease from observed symptoms, monitoring patients in remote areas, and attending to patients continuously after hospitalisation. IoT provides real-time data, supports smarter decision-making, and offers better analytics, all of which help improve the quality of healthcare. The main objective of this work was to create an IoT-based automated system that uses machine learning models for symptom-based COVID-19 prognosis. Methods. A comparative analysis of the predictive microbiology of COVID-19 from case symptoms is reported using several machine learning classifiers: logistic regression, k-nearest neighbor, support vector machine, random forest, decision tree, Naïve Bayes, and gradient boosting. To validate and verify the models, the accuracy of each model was measured on data retrieved from cloud storage. Results. From the accuracy plot, k-NN was the most accurate (97.97%), followed by decision tree (97.79%), support vector machine (97.42%), logistic regression (96.50%), random forest (90.66%), gradient boosting classifier (87.77%), and Naïve Bayes (73.50%) in COVID-19 prognosis. Conclusion. The paper presents a health-monitoring IoT framework with high clinical significance for real-time and remote healthcare monitoring. The findings and lessons reported here should help healthcare systems worldwide counter not only the ongoing COVID-19 pandemic but also other global pandemics that humanity may face in the future.
1. Introduction

The ongoing COVID-19 pandemic is caused by a highly contagious novel virus, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). After it was officially reported from Wuhan, China, on 31 December 2019, the pathogen spread rapidly around the globe and emerged as a global pandemic [1–3]. At the time of drafting this report (7 July 2021), more than 3.9 million deaths had been registered globally, attributed mainly to human-to-human viral transmission [4]. This novel and rapidly mutating RNA virus has significantly affected not only health and medical systems but also the global economy, reshaping socioeconomic activities including the stock and financial markets [1, 2]. It has also affected cultural, social, festival, and knowledge-sharing activities and overall human behavioural patterns [1]. Human-to-human transmission occurs mainly through respiratory droplets/aerosols and the faecal-oral route [5]. Other means of transmission include air-borne transmission and direct/indirect contact (such as fomites) [6]. The disease manifests with typical [5, 7] and atypical [8, 9] symptoms. As per reports, the virus infects the upper and lower respiratory tract, heart, kidney, liver, gut, and the nervous system, ultimately causing multiorgan damage [10, 11]. It causes severe health problems in immunocompromised individuals with diabetes, obesity, hypertension, cardiovascular disorder, psychiatric disorder, etc. [12–14].

Numerous measures have been taken by health bodies and government agencies to combat SARS-CoV-2 transmission. Since the onset of the pandemic, healthcare professionals have gone the extra mile to help those in need. A major challenge they face is the shortage of testing kits and other medical equipment. As a result, the pandemic continues to challenge medical systems everywhere [15]. In such a scenario, early diagnosis of the disease may improve healthcare delivery. This research article focuses on predictive COVID-19 prognosis using machine learning (ML) algorithms. Machine learning is a subset of artificial intelligence (AI) that uses statistics to enable machines to improve with experience.

ML algorithms fall into three categories: supervised (task driven), unsupervised (clustering), and reinforcement learning. Supervised learning algorithms handle two types of problems, classification and regression, and take labelled samples (the training set) as input. Unsupervised learning algorithms work on unlabelled data. In reinforcement learning (RL), the agent learns to interact with the environment to achieve a reward. RL has promising applications for rational decision making in diverse fields, such as energy management, robotics, agriculture, and healthcare. Kumar et al. developed deep learning and reinforcement learning models to forecast COVID-19-infected individuals, losses, and cures [16]. Wang et al. also applied reinforcement learning to detect COVID-19 infection [17]. For real-time monitoring platforms based on IoT devices, Fang et al. [18] focused on energy harvesting in next-generation multiple access systems with the objective of data sensing and transmission using different multiple access networks. That study draws attention to achieving a low peak age of information (AoI) at low power consumption. Abd-Elmagid et al. [19] compared delay, throughput, and age of information, and explored the optimal sampling policy that combines wireless energy transfer with the objective of minimizing the long-term weighted sum-AoI.

In the current study, the authors applied different ML techniques to publicly available cloud-stored healthcare datasets to build a system that allows real-time and remote health monitoring, built on IoT and associated with cloud computing. Such a system can derive recommendations based on past and empirical data stored in the cloud. IoT is a progressive technology that is evolving and improving rapidly with advancements in information technology and allied technologies. The main objective of the study was to apply ML models to predict COVID-19 by observing the symptoms manifested by patients using real-time data. Applying ML to predict COVID-19 infection adds a new dimension to early disease diagnosis. It would help researchers as well as medical professionals predict rising COVID-19 cases from symptoms and help contain the pandemic through due precaution and prevention.

Recently, Pourhomayoun and Shakibi [20] proposed a model that integrated AI and machine learning to forecast the mortality rate in COVID-19 cases. They analysed data from more than 2,670,000 confirmed COVID-19 cases from 146 countries and reported 89.98% accuracy in predicting mortality in COVID-19 patients [20]. Muhammad et al. [21] compared five supervised machine learning models, logistic regression (LR), decision tree (DT), support vector machine (SVM), Naïve Bayes (NB), and artificial neural network (ANN), on a dataset from Mexico to predict COVID-19 infection. They obtained the highest prediction accuracy (94.99%) with the decision tree, the highest sensitivity (93.34%) with SVM, and the highest specificity (94%) with NB. Zeroual et al. [22] compared five deep learning models, recurrent neural network, long short-term memory (LSTM), bidirectional LSTM, gated recurrent units, and variational autoencoder, to predict COVID-19 prognosis in Italy, Spain, France, China, USA, and Australia and reported superior performance of the variational autoencoder. Zoabi et al. [23] established a machine learning approach trained on the data of 51,831 individuals from the Israeli Ministry of Health. The model achieved high accuracy with eight binary features: sex, age ≥ 60, known contact with an infected individual, and five initial clinical symptoms (cough, fever, sore throat, dyspnoea, and headache). Aljameel et al. [24] reported a prediction model for early identification of COVID-19 using 287 samples collected from the King Fahad University Hospital, Saudi Arabia. They analysed the data with three classification algorithms: random forest, logistic regression, and extreme gradient boosting.

2. Materials and Methods

Scheme 1: Proposed model for COVID-19 prognosis. [Block diagram; recoverable labels: patient with COVID symptoms, biosensor, controller, database, data analysis models, data analysis zone, computer, mobile, doctor, monitor and prescribe medicine.]

2.1. Proposed System. The IoT is a system in which everything is connected to the Internet. It bridges the gap
between man and machine. Using emerging technology, IoT has greatly impacted numerous fields of human endeavour, including the healthcare system. It could change the existing healthcare system merely by using advanced sensors and a cloud computing platform. IoT, an advanced automation system that uses the big data concept, makes it possible to connect every asset through the web and helps design a smart healthcare system. Because IoT generates big data, it is hard for healthcare professionals to handle and manage it. Medical professionals therefore require chronicled data to predict a disease. Although various machine learning algorithms have long been used to predict disease, the biggest challenge is tuning their parameters. Proper tuning of the parameters results in efficient prognosis and diagnosis of a disease.

2.2. Significance of the Proposed System. The present work proposes a framework for an e-healthcare system that uses artificial intelligence, machine learning, and statistics for disease prognosis. In the proposed system, the patient's data are collected by IoT sensors, stored in the cloud, and transmitted to the web server (mobile app) through the IoT agent. The cloud shares the data over social insurance frameworks, and various machine learning algorithms are executed to process the data. The response is sent to healthcare professionals to monitor and suggest proper actions. The block diagram of the proposed system is shown in Scheme 1.
In this proposed model, seven data prediction techniques are used and their performances are compared to provide a better and more reliable quality of service for the healthcare system. The techniques are k-nearest neighbor, support vector machine, decision tree, random forest, gradient boosting classifier, Naïve Bayes, and logistic regression.

2.3. Proposed Methodology. The main objective of this work was to forecast the probability of a patient suffering from COVID-19 infection using a computer-aided diagnosis/prognosis system. To this end, different ML techniques were implemented on the given dataset, which is analysed and described in this study. Applying machine learning to predict COVID-19 infection gives healthcare professionals a new and more reliable direction for early-stage disease diagnosis. It helps researchers predict rising COVID-19 cases at the symptom stage and also helps prevent the disease through due precautions.

2.4. Data Source. The dataset used for the work was accessed from the Kaggle site [25]. The dataset was obtained as a CSV file and loaded into a Jupyter notebook for analysis with Python. It contained a total of 5434 data samples and 19 features/parameters related to patient symptoms, as detailed in Table 1. Seven machine learning algorithms were implemented in this work to achieve COVID-19 prognosis with the maximum possible accuracy and to create an automated system for COVID-19 detection.

Table 1: Features of the dataset.

Sl. no.   Features                                   Description
1.        Breathing problem                          T = 67%; F = 33%
2.        Fever                                      T = 79%; F = 21%
3.        Sore throat                                T = 73%; F = 27%
4.        Dry cough                                  T = 79%; F = 21%
5.        Hyper tension                              T = 51%; F = 49%
6.        Abroad travel                              T = 54.9%; F = 45.1%
7.        Contact with COVID patient                 T = 50.2%; F = 49.8%
8.        Attended large gathering                   T = 53.8%; F = 46.2%
9.        Visited public exposed places              T = 51.9%; F = 48.1%
10.       Family working in public exposed places    T = 58.4%; F = 41.6%
T: true; F: false.

2.5. Data Preprocessing. The dataset contained a large number of null values and outliers, which might affect the accuracy of the model. To remove these noisy data, the dataset was preprocessed and the null values were removed to help increase the efficacy of the models. After cleaning, the data were transformed by smoothing and normalisation. The dataset was then split into training and testing sets, on which several machine learning models were run to compare accuracy scores. The machine learning algorithms used in this research are discussed below.

2.5.1. Logistic Regression. This classifier, used for classification and data analysis, is a supervised algorithm. It is a type of regression model in which the data are modelled with the sigmoid function [26].

\[ g(y) = \frac{1}{1 + e^{-y}}. \tag{1} \]

Here, the regression model is built to predict a probability and measure the learning rate; thus, it is also considered a probabilistic classifier. As it is a classification technique, the output or target variable takes only discrete values for the features/parameters given as input values.
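To make Eq. (1) concrete, the following minimal Python sketch evaluates the sigmoid and turns a weighted sum of binary symptom flags into a probability. The weights, bias, feature order, and the 0.5 decision threshold are hypothetical values chosen for illustration only; they are not the fitted parameters of the model reported in this study.

```python
import math


def sigmoid(y: float) -> float:
    """Eq. (1): g(y) = 1 / (1 + e^(-y)), written to avoid overflow for large |y|."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    z = math.exp(y)
    return z / (1.0 + z)


def predict_probability(features, weights, bias):
    """Linear score followed by the sigmoid, as in logistic regression."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical coefficients for four binary symptoms (assumed order:
# breathing problem, fever, sore throat, dry cough). Not fitted values.
weights = [1.2, 0.8, 0.5, 0.9]
bias = -1.5

patient = [1, 1, 0, 1]  # 1 = symptom present, 0 = absent
prob = predict_probability(patient, weights, bias)
label = 1 if prob >= 0.5 else 0  # assumed threshold of 0.5

print(f"probability = {prob:.3f}, predicted class = {label}")
```

The threshold step is what turns the continuous probability into the discrete class labels described above.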
2.5.2. Support Vector Machines (SVM). This classifier, used for both classification and regression analysis, is a supervised algorithm. It is a margin-based classifier: it separates the data with a hyperplane and a margin around it and thereby classifies the dataset into distinct classes. It can also handle text classification problems. It deals with two-group classification problems by fitting the model to labelled training data for each category. The hard-margin support vector optimisation problem can be solved using the Lagrange multiplier method.

2.5.3. Random Forest (RF) Model. This classifier is an ensemble learning classifier used for both classification and regression analysis. It consists of a set of trees in which each tree is capable of providing a set of predictor values [27]. Individually, decision trees are weak classifiers, and they are merged to form a random forest model. The random forest model does not require cross-validation, whereas other classifiers such as the decision tree and k-NN models do. In this classifier, a greater number of trees generally results in higher accuracy. The random forest classifier logic uses entropy, gain ratio, and the Gini index, where \pi_j denotes the proportion of samples belonging to class j:

\[ \mathrm{Entropy}(N) = -\sum_{j=1}^{n} \pi_j \log_2 \pi_j, \]
\[ \mathrm{Gini}(N) = 1 - \sum_{j=1}^{n} \pi_j^2, \tag{2} \]
\[ \mathrm{Gini}_A(N) = \frac{N_1}{N}\,\mathrm{Gini}(N_1) + \frac{N_2}{N}\,\mathrm{Gini}(N_2). \]

Figure 1: Count plot of patients who suffered from COVID-19 (yes) and those who did not (no). [Axes: COVID-19 (Yes/No) against count.]

2.5.4. Decision Tree (DT) Model. This classifier is a classification algorithm that works on both numerical and categorical data. It builds a tree-shaped graph while analysing the data. A decision tree has three kinds of nodes (root node, interior node, and leaf node). The algorithm selects the best attribute using information gain and the gain ratio, creates a decision node on that attribute, and splits the data into sub-datasets. It then builds the tree by repeating this process recursively on each subset.

\[ \mathrm{Information}(M) = -\sum_{i=1}^{n} \pi_i \log_2 \pi_i, \]
\[ \mathrm{Information}_A(M) = \sum_{j=1}^{m} \frac{M_j}{M} \times \mathrm{Information}(M_j), \tag{3} \]
\[ \mathrm{Split}_A(M) = -\sum_{j=1}^{m} \frac{M_j}{M} \log_2 \frac{M_j}{M}, \]
\[ \mathrm{Gain\ Ratio}(N) = \frac{\mathrm{Gain}(N)}{\mathrm{Split}_A(M)}. \]

2.5.5. k-Nearest Neighbor (k-NN). The k-nearest neighbor technique is a supervised algorithm based on the concept of nearest neighbouring data points. The nearest neighbours can be identified using different distance metrics. Although inefficient for high-dimensional
datasets, the k-NN technique is easy to implement. It is a nonparametric model used to solve classification and regression problems. An object is classified according to its nearest neighbours. The distance to the nearest neighbours is measured using the Euclidean distance.

\[ d(a, b)^2 = (b_1 - a_1)^2 + (b_2 - a_2)^2. \tag{4} \]

Here, the input to the model consists of the k closest training examples in the dataset. The classifier assumes that similar examples lie in close proximity. After loading the data and choosing the number of neighbours, the distance between the query and each stored example is calculated and the entries are sorted by distance [28].

2.5.6. Naïve Bayes (NB). This classifier is a supervised algorithm. It is a classification technique based on Bayes' theorem that assumes the attributes are not correlated with each other: all attributes contribute independently to the probability. The probability can be calculated by building a frequency table and a likelihood table. After training, the test phase uses the likelihood table. Bayes' theorem is

\[ P(B \mid A) = \frac{P(A \mid B)\,P(B)}{P(A)}, \tag{5} \]

where P(B | A) is the posterior probability, P(B) is the class prior probability, P(A) is the predictor prior probability, and P(A | B) is the predictor probability (likelihood).

2.5.7. Gradient Boosting Machine (GBM). This classifier is among the most popular boosting algorithms, in which each predictor corrects the error of its predecessor. Each predictor in the model is trained on the errors of the preceding predictors. The base learners are classification and regression trees [29]. The major parameter of this technique is the shrinkage: the prediction of each tree is shrunk by multiplying it by a learning rate \eta that ranges between 0 and 1. Once all trees are trained, the final prediction is given by

\[ x_{\mathrm{pred}} = x_1 + \eta r_1 + \eta r_2 + \cdots + \eta r_n. \tag{6} \]

The algorithm used for classification is the gradient boosting classifier, and its regression counterpart is called the gradient boosting regressor (GBR).

3. Results

The count plot shows that 4383 patients suffered from COVID-19 and 1051 did not (Figure 1). The pie plot shows that 80.7% of patients had COVID-19 infection and 19.3% did not (Figure 2).

Figure 2: Pie plot for the patients suffering from COVID-19. [Number of cases by COVID-19 status: Yes, 80.7%; No, 19.3%.]

Out of 5434 data samples, 3620 patients had breathing problems and 1814 did not. Similarly, 4273 patients suffered from fever and 1161 did not, 4307 patients had dry cough and 1127 did not, 3953 patients had sore throat and 1481 did not, and 2952 patients had running nose and 2482 did not (Figure 3).
Also, 2514 patients had asthma tendency and 2920 did not, 2565 patients had chronic lung disease and 2869 did not, 2736 patients had headache and 2698 did not, 2523 patients had heart disease and 2911 did not, and 2588 patients suffered from diabetes and 2846 did not. Patients with heart disease, diabetes, headache, asthma, hypertension, fatigue, gastrointestinal issues, or prior contact with a COVID-19 patient had a higher probability of suffering from COVID-19 infection than those who followed COVID-appropriate measures (such as wearing a mask and sanitising regularly) and had no associated health or sociological issues.
Figure 3: Probability of patients suffering from COVID-19 with relevant symptoms. [Histogram panels, count against feature value: Breathing problem, Fever, Dry cough, Sore throat, Running nose, Asthma, Chronic lung disease, Headache, Heart disease, Diabetes, Hyper tension, Fatigue, Gastrointestinal, Abroad travel, Contact with COVID patient, Attended large gathering, Visited public exposed places, Family working in public exposed places, Wearing masks, Sanitization from market, and Target.]

Pearson, Spearman, and Kendall tau correlation coefficients are presented in Table 2. Features such as wearing a mask and sanitisation from market were not considered because they contained null values. Because running nose, chronic lung disease, heart disease, and gastrointestinal issues were strongly correlated, these features were removed. The correlation matrix after this data cleaning is shown in Figure 4.

Table 2: Different correlation coefficients of the given dataset.

Types of correlation            Pearson   Spearman   Kendall tau
Highest positive correlation    0.503     0.503      0.503
Highest negative correlation    -0.016    -0.016     -0.016
Lowest correlation              0.002     0.002      0.002
Mean correlation                0.139     0.139      0.139
Figure 4: Obtained correlation matrix for the given dataset after data cleaning operation. [Heatmap over Breathing problem, Fever, Dry cough, Sore throat, Hyper tension, Abroad travel, Contact with COVID patient, Attended large gathering, Visited public exposed places, Family working in public exposed places, and COVID-19; colour scale from 0.0 to 1.0.]

3.1. Confusion Matrix. This table is used to visualise the performance of a classification model. It contains the positive and negative observations of the actual class against the positive and negative observations of the predicted class. The four outcomes are TP1 (true positive), FN1 (false negative), TN1 (true negative), and FP1 (false positive). The confusion matrix and the performance measurement parameters of the k-NN model are presented in Figure 5 and Table 3.
The receiver operating characteristic (ROC) curve is used to evaluate binary classification; it plots the true positive rate against the false positive rate. The area under the curve (AUC) measures how well the model distinguishes positive from negative observations.
The AUC obtained for the k-NN algorithm was 0.98 (Figure 6). This indicates that the k-NN model ranks a randomly chosen infected case above a randomly chosen non-infected case about 98% of the time, and thus discriminates COVID-19 infection reliably. The k-NN performance measures are presented in Table 4 and are used to calculate sensitivity, specificity, precision, and accuracy.

3.1.1. Sensitivity (Recall). This is the ratio of true positive predictions to the total number of actual positive observations. Recall represents the correctly predicted positive class. The best sensitivity rate is 1.0 and the worst rate is 0.

\[ \mathrm{Sensitivity} = \frac{TP1}{TP1 + FN1}. \tag{7} \]

3.1.2. Specificity. This is the ratio of true negative predictions to the total number of actual negative observations. The best specificity rate is 1.0 and the worst rate is 0.

\[ \mathrm{Specificity} = \frac{TN1}{TN1 + FP1}. \tag{8} \]

3.1.3. Precision. It represents the fraction of predicted positives that actually belong to the positive class.

\[ \mathrm{Precision} = \frac{TP1}{TP1 + FP1}. \tag{9} \]
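As a worked check of Eqs. (7) to (9), the sketch below plugs in the k-NN confusion-matrix counts reported in Table 3 (TP1 = 862, TN1 = 203, FP1 = 2, FN1 = 20). Accuracy is computed as the share of correct predictions among all predictions. The function names are illustrative; this is a hand calculation, not the authors' evaluation code.

```python
def sensitivity(tp, fn):
    """Eq. (7): true positives over all actual positives."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq. (8): true negatives over all actual negatives."""
    return tn / (tn + fp)


def precision(tp, fp):
    """Eq. (9): true positives over all predicted positives."""
    return tp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """Correct predictions over all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


# k-NN counts as reported in Table 3 of this paper.
TP1, TN1, FP1, FN1 = 862, 203, 2, 20

sens = sensitivity(TP1, FN1)
spec = specificity(TN1, FP1)
prec = precision(TP1, FP1)
acc = accuracy(TP1, TN1, FP1, FN1)

print(f"sensitivity = {sens:.4f}")
print(f"specificity = {spec:.4f}")
print(f"precision   = {prec:.4f}")
print(f"accuracy    = {acc:.4f}")
```

Rounded to two decimals, these values agree with the class-1 row of the k-NN classification report (precision 1.00, recall 0.98) and with its overall accuracy of 0.98.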
Figure 5: Confusion matrix of k-NN. [Rows: true class; columns: predicted class. True class without COVID: 203 predicted without COVID, 2 predicted COVID. True class COVID: 20 predicted without COVID, 862 predicted COVID.]

Table 3: Confusion matrix report of k-NN.

Performance parameter Description k-NN


TP1 Predicted and actual values are positive 862
TN1 Predicted and actual values are negative 203
FP1 Predicted value is positive but actual value is negative 2
FN1 Predicted value is negative but actual value is positive 20
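The AUC reported for the k-NN model can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes AUC directly from that pairwise definition. The labels and scores are a small made-up example, and `roc_auc` is an illustrative helper rather than a library function; the paper's own scores are not available here.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for four patients (1 = COVID, 0 = without COVID).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
print(f"AUC = {auc:.2f}")
```

In this toy case three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75; a value of 1.0 would mean every infected case is scored above every non-infected case.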

Figure 6: AUC plot of k-NN model. [ROC curve, true positive rate against false positive rate; k-NN AUC = 0.98.]

Table 4: Classification report of k-NN model.

Performance matrix   Precision   Recall   F1-score   Support
0                    0.91        0.99     0.95       205
1                    1.00        0.98     0.99       882
Accuracy             -           -        0.98       1087
Macro average        0.95        0.98     0.97       1087
Weighted average     0.98        0.98     0.98       1087

Table 5: Performance report of the various test models executed in the study.

Algorithm                      TP    TN    FP   FN    Accuracy   Sensitivity   Precision   F1-score
Logistic regression            852   208   9    18    96.50      0.97          0.98        0.98
Random forest                  821   200   54   51    90.66      0.94          0.93        0.93
Decision tree                  890   172   20   4     97.79      0.99          0.97        0.98
Linear SVM                     885   174   17   11    97.42      0.98          0.98        0.98
Naïve Bayes                    558   233   0    285   73.50      0.66          1.00        0.79
Gradient boosting classifier   814   213   55   88    87.77      0.90          0.93        0.91

Table 6: Accuracy score obtained by ML models.

ML models                      Accuracy score   Run time (seconds)
k-NN                           97.97            0.543
Decision tree                  97.79            0.024
Support vector machines        97.42            0.217
Logistic regression            96.50            0.053
Random forest                  90.66            5.423
Gradient boosting classifier   87.77            0.523
Naïve Bayes                    73.50            0.013

3.1.4. Accuracy. It is the ratio of true observations to the total number of observations. True observations are TP1 and TN1.

\[ \mathrm{Accuracy} = \frac{TP1 + TN1}{TP1 + TN1 + FP1 + FN1}. \tag{10} \]

3.1.5. F1-Score. It is the harmonic mean of precision and sensitivity. In its general form,

\[ F_\beta = \frac{(1 + \beta^2)\,\mathrm{Precision} \cdot \mathrm{Sensitivity}}{\beta^2\,\mathrm{Precision} + \mathrm{Sensitivity}}, \tag{11} \]
where β is a constant that is commonly 1, 2, or 0.5. For β = 1,

\[ F1 = \frac{TP1 + TP1}{TP1 + TP1 + FP1 + FN1} = \frac{2\,TP1}{2\,TP1 + FP1 + FN1}. \tag{12} \]

4. Discussion

This research work predicts (prognoses) whether or not a patient is likely to suffer from COVID-19 infection by observing the patient's symptoms. The research applied machine learning classification techniques: Naïve Bayes, decision tree, random forest, k-nearest neighbor, support vector machine, logistic regression, and gradient boosting. The dataset was collected from the Kaggle site and processed using the open-source Python software in a Jupyter notebook. The data were analysed and split into a training set and a test set. Different ML models were implemented on the dataset, and the performance of each model is described in terms of accuracy. The performance report of the various models tested in the study is given in Table 5. The accuracy scores are presented in Table 6, and the accuracy comparison of the models is depicted in Figure 7. From the accuracy plot, k-NN was the most accurate (97.97%), followed by decision tree (97.79%), support vector machine (97.42%), logistic regression (96.50%), random forest (90.66%), gradient boosting classifier (87.77%), and Naïve Bayes (73.50%) in COVID-19 prognosis based on the given dataset and the defined features/parameters.
Out of all the models compared for reliability, the k-NN model was found to be the best, with a prediction accuracy of about 98%, outperforming the other six algorithms. We have also compared the results of our study with some other reported models (Table 7), which suggests that our models are effective and give better results [30–34]. We used a 10-fold cross-validation method to improve the performance of our models. In future, this research may help healthcare professionals predict and diagnose COVID-19 at an early stage. This would be especially useful for patients in remote locations with limited access to immediate medical facilities. COVID-19 prognosis could also be done using other machine learning and deep learning approaches with
10 BioMed Research International

Accuracy of different classifier models


100 97.42% 97.97% 96.50% 97.79%
90.66% 87.77%
80 73.50%
% of accuracy
60

40

20

Logistic regression

Decision tree
Support vector machines

KNN

Random forest

Gradient boosting classifier


Naive bayes
Classifier models

Figure 7: Accuracy comparison plot of different ML models.

Table 7: Performance comparison of proposed work with other reported works.

Study                      Model for prediction            Accuracy  Specificity  Sensitivity  AUC
Brinati et al. [30]        Random forest                   82        —            —            84
Tschoellitsch et al. [31]  Random forest                   81        —            —            74
Tordjman et al. [32]       Logistic regression             —         80.3         88.9
Soltan et al. [33]         Extreme gradient boosting tree  —         94.8         77.4         99
Alakus and Turkoglu [34]   LSTM                            86.66     —            99.42        62.50
Proposed work              k-NN                            97.97     0.98         0.98         98
Proposed work              Random forest                   90.66     0.94         0.93         98
Proposed work              Logistic regression             96.50     0.97         0.98         93
Proposed work              SVM                             97.42     0.98         0.98         89
Proposed work              Decision tree                   97.79     0.99         0.97         95
Proposed work              Gradient boosting classifier    87.77     0.90         0.93         97
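The columns of Table 7 can be related to each other with a short sketch of how accuracy, specificity, sensitivity, and AUC are commonly computed from a binary classifier's output. The function names and the example labels and scores below are hypothetical and are not taken from the study. The functions return fractions on a 0 to 1 scale, which matches the proposed-work specificity and sensitivity entries, while the cited works in the table report percentages.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, sensitivity (recall), specificity, precision and F1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)

    def ratio(numerator, denominator):
        # Guard against empty classes instead of dividing by zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    sensitivity = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_from_scores(y_true, scores, positive=1):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels, hard predictions and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.6, 0.95, 0.05]

metrics = classification_metrics(y_true, y_pred)
auc = auc_from_scores(y_true, y_score)
print(metrics)
print("AUC:", auc)
```

Accuracy alone can look high when one class dominates, which is why sensitivity, specificity, and AUC are reported alongside it when comparing models.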

potentially better accuracy. This study should provide useful reference material for further development in this field on a global scale. However, more robust datasets are strongly recommended as inputs to achieve this.

5. Conclusion

Many countries, including India, are still struggling to fight this deadly coronavirus pandemic as cases rise daily. Each day brings a new challenge, with an ever larger quantity of COVID-19 cases and data. To address this, research to develop medicines to treat COVID-19 and vaccines to prevent it is being pursued on a global scale. This paper compares seven machine learning algorithms in terms of their accuracy in COVID-19 prognosis; the algorithms are implemented to predict COVID-19 infection in India and elsewhere. The AUC and performance metrics such as accuracy, precision, recall, and F1-score of the k-NN model are also discussed. The work provides a precursor to designing an automated COVID-19 prognosis system using IoT and machine learning algorithms. The risk rate was 65-80% with the four critical symptoms (fever, dry cough, breathing issue, and sore throat) out of the 10 parameters/features considered from the 19 total possible parameters/features. These four critical parameters could therefore be recommended as strong prognosis bioindicators.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors have no conflict of interest.

Authors’ Contributions

Conceptualisation and writing of the original draft were performed by RKM and MP. Software was the responsibility of MP. Literature search, data analysis, interpretation, and editing were performed by MP, SP, AAR, SM, AM, and SA. Writing, review, and editing were carried out by KD and JAT.

Acknowledgments

The authors are very grateful to the authorities of their respective institutions/universities for the cooperation and support extended.

References

[1] R. K. Mohapatra, S. Mishra, M. Azam, and K. Dhama, “COVID-19, WHO guidelines, pedagogy, and respite,” Open Medicine, vol. 16, no. 1, pp. 491–493, 2021.
[2] R. K. Mohapatra, L. Perekhoda, M. Azam et al., “Computational investigations of three main drugs and their comparison with synthesized compounds as potent inhibitors of SARS-CoV-2 main protease (Mpro): DFT, QSAR, molecular docking, and in silico toxicity analysis,” Journal of King Saud University–Science, vol. 33, no. 2, article 101315, 2021.
[3] R. K. Mohapatra, P. K. Das, and V. Kandi, “Challenges in controlling COVID-19 in migrants in Odisha, India,” Diabetes & Metabolic Syndrome: Clinical Research & Reviews, vol. 14, no. 6, pp. 1593–1594, 2020.
[4] WHO, “WHO Coronavirus (COVID-19) Dashboard,” 2021, https://covid19.who.int/.
[5] R. K. Mohapatra, L. Pintilie, V. Kandi et al., “The recent challenges of highly contagious COVID-19, causing respiratory infections: symptoms, diagnosis, transmission, possible vaccines, animal models, and immunotherapy,” Chemical Biology & Drug Design, vol. 96, no. 5, pp. 1187–1208, 2020.
[6] R. K. Mohapatra, P. K. Das, L. Pintilie, and K. Dhama, “Infection capability of SARS-CoV-2 on different surfaces,” Egyptian Journal of Basic and Applied Science, vol. 8, no. 1, pp. 75–80, 2021.
[7] C. Huang, Y. Wang, X. Li et al., “Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China,” The Lancet, vol. 395, no. 10223, pp. 497–506, 2020.
[8] N. Singhania, S. Bansal, and G. Singhania, “An atypical presentation of novel coronavirus disease 2019 (COVID-19),” The American Journal of Medicine, vol. 133, no. 7, pp. e365–e366, 2020.
[9] M. S. Ekbatani, S. A. Hassani, L. Tahernia et al., “Atypical and novel presentations of coronavirus disease 2019: a case series of three children,” British Journal of Biomedical Science, vol. 78, no. 1, pp. 47–52, 2021.
[10] R. K. Mohapatra, K. Dhama, A. A. El-Arabey et al., “Repurposing benzimidazole and benzothiazole derivatives as potential inhibitors of SARS-CoV-2: DFT, QSAR, molecular docking, molecular dynamics simulation, and in-silico pharmacokinetic and toxicity studies,” Journal of King Saud University–Science, vol. 33, no. 8, article 101637, 2021.
[11] R. K. Mohapatra, K. Dhama, S. Mishra et al., “The microbiota related coinfections in COVID-19 patients: a real challenge,” Beni-Suef University Journal of Basic and Applied Sciences, vol. 10, no. 1, p. 47, 2021.
[12] C. McCarthy, C. P. O’Donnell, N. E. W. Kelly, D. O’Shea, and A. E. Hogan, “COVID-19 severity and obesity: are MAIT cells a factor?,” The Lancet, vol. 9, no. 5, pp. 445–447, 2021.
[13] B. M. Popkin, S. Du, W. D. Green et al., “Individuals with obesity and COVID-19: a global perspective on the epidemiology and biological relationships,” Obesity Reviews, vol. 21, article e13128, 2020.
[14] I. Lega, L. Nisticò, L. Palmieri et al., “Psychiatric disorders among hospitalized patients deceased with COVID-19 in Italy,” EClinicalMedicine, vol. 35, article 100854, 2021.
[15] T. K. Suvvari, P. CharulataSree, S. Kuppili et al., “Consecutive Hits of COVID-19 in India: The Mystery of Plummeting Cases and Current Scenario,” Archives of Razi Institute, vol. 76, no. 5, pp. 1165–1174, 2021.
[16] R. L. Kumar, F. Khan, S. Din, S. S. Band, A. Mosavi, and E. Ibeke, “Recurrent neural network and reinforcement learning model for COVID-19 prediction,” Frontiers in Public Health, vol. 9, article 744100, 2021.
[17] B. Wang, Y. Sun, T. Q. Duong, L. D. Nguyen, and L. Hanzo, “Risk-aware identification of highly suspected COVID-19 cases in social IoT: a joint graph theory and reinforcement learning approach,” IEEE Access, vol. 8, pp. 115655–115661, 2020.
[18] Z. Fang, J. Wang, Y. Ren, Z. Han, H. V. Poor, and L. Hanzo, “Age of information in energy harvesting aided massive multiple access networks,” IEEE Journal on Selected Areas in Communications, vol. 40, no. 5, pp. 1441–1456, 2022.
[19] M. A. Abd-Elmagid, N. Pappas, and H. S. Dhillon, “On the role of age of information in the Internet of Things,” IEEE Communications Magazine, vol. 57, no. 12, pp. 72–77, 2019.
[20] M. Pourhomayoun and M. Shakibi, “Predicting mortality risk in patients with COVID-19 using machine learning to help medical decision-making,” Smart Health, vol. 20, article 100178, 2021.
[21] L. J. Muhammad, E. A. Algehyne, S. S. Usman, A. Ahmad, C. Chakraborty, and I. A. Mohammed, “Supervised machine learning models for prediction of COVID-19 infection using epidemiology dataset,” SN Computer Science, vol. 2, p. 11, 2021.
[22] A. Zeroual, F. Harrou, A. Dairi, and Y. Sun, “Deep learning methods for forecasting COVID-19 time-series data: a comparative study,” Chaos, Solitons & Fractals, vol. 140, article 110121, 2020.
[23] Y. Zoabi, S. Deri-Rozov, and N. Shomron, “Machine learning-based prediction of COVID-19 diagnosis based on symptoms,” npj Digital Medicine, vol. 4, p. 3, 2021.
[24] S. S. Aljameel, I. U. Khan, N. Aslam, M. Aljabri, and E. S. Alsulmi, “Machine learning-based model to predict the disease severity and outcome in COVID-19 patients,” Scientific Programming, vol. 2021, Article ID 5587188, 2021.
[25] https://www.kaggle.com/symptoms-and-covid-presence.
[26] C.-Y. J. Peng, K. L. Lee, and G. M. Ingersoll, “An introduction to logistic regression analysis and reporting,” The Journal of Educational Research, vol. 96, no. 1, pp. 3–14, 2002.
[27] G. Biau, “Analysis of a random forests model,” Journal of Machine Learning Research, vol. 13, pp. 1063–1095, 2012.

[28] D. Shah, S. Patel, and S. K. Bharti, “Heart disease prediction using machine learning techniques,” SN Computer Science, vol. 1, p. 345, 2020.
[29] O. González-Recio, J. A. Jiménez-Montero, and R. Alenda, “The gradient boosting algorithm and random boosting for genome-assisted evaluation in large data sets,” Journal of Dairy Science, vol. 96, no. 1, pp. 614–624, 2013.
[30] D. Brinati, A. Campagner, D. Ferrari, M. Locatelli, G. Banfi, and F. Cabitza, “Detection of COVID-19 infection from routine blood exams with machine learning: a feasibility study,” Journal of Medical Systems, vol. 44, no. 8, p. 135, 2020.
[31] T. Tschoellitsch, M. Dünser, C. Böck, K. Schwarzbauer, and J. Meier, “Machine learning prediction of SARS-CoV-2 polymerase chain reaction results with routine blood tests,” Laboratoriums Medizin, vol. 52, no. 2, pp. 146–149, 2021.
[32] M. Tordjman, A. Mekki, R. D. Mali et al., “Pre-test probability for SARS-CoV-2-related infection score: the PARIS score,” PLoS ONE, vol. 15, no. 12, article e0243342, 2020.
[33] A. A. Soltan, S. Kouchaki, T. Zhu et al., “Artificial intelligence driven assessment of routinely collected healthcare data is an effective screening test for COVID-19 in patients presenting to hospital,” medRxiv, 2020.
[34] T. Alakus and I. Turkoglu, “Comparison of deep learning approaches to predict COVID-19 infection,” Chaos, Solitons & Fractals, vol. 140, article 110120, 2020.
