IAES International Journal of Artificial Intelligence (IJ-AI)
Vol. 13, No. 3, September 2024, pp. 3072-3082
ISSN: 2252-8938, DOI: 10.11591/ijai.v13.i3.pp3072-3082
Age prediction from COVID-19 blood test for ensuring
robust artificial intelligence
Nunung Nurul Qomariyah1, Dimitar Kazakov2
1Department of Computer Science, School of Computing and Creative Arts, Bina Nusantara University, Jakarta, Indonesia
2Department of Computer Science, University of York, Heslington, United Kingdom
Article Info
Article history:
Received Oct 6, 2023
Revised Oct 25, 2023
Accepted Jan 6, 2024
Keywords:
Age prediction
Blood test
COVID-19
Machine learning
Medical record
Regression
ABSTRACT
With the recent advancement of artificial intelligence (AI), the world is experiencing conveniences in automating complex and tedious tasks, such as analysing large data and predicting the future by mimicking human expertise. AI has also shown promise for mitigating future crises, such as pandemics. Since the beginning of the COVID-19 pandemic, researchers have published several AI models to help healthcare providers cope with the situation. However, before deploying a model, one needs to ensure that it is robust and safe to learn from the real environment, especially in the medical domain, where uncertainty and incomplete information are not unusual. In an effort to provide robust AI, we propose to use patient age as a feasible feature for ensuring robust AI models built from electronic health records. We conducted several experiments with 28 blood test items and radiologist reports from 1,000 COVID-19 patients. Our results show that, with the predicted age as an additional feature in the mortality classification task, the model is significantly improved when compared to adding the actual age. We also report our findings regarding the predicted age in the dataset.
This is an open access article under the CC BY-SA license.
Corresponding Author:
Nunung Nurul Qomariyah
Department of Computer Science, School of Computing and Creative Arts, Bina Nusantara University
Jakarta 11480, Indonesia
Email: nunung.qomariyah@binus.edu
1. INTRODUCTION
Artificial intelligence (AI) has been helping humanity since the beginning of the neural network era, and it has continued to contribute since the deep learning (DL) approach became very popular in the 2010s. AI has become a rising field that produces revolutionary solutions for automating tasks in several domains, including healthcare. Scientists have developed promising models that have the potential to reshape the healthcare domain, especially in settings where resources are lacking, such as the coronavirus disease 2019 (COVID-19) pandemic. In such situations, health workers are often exhausted by the rapidly increasing number of patients. AI researchers have proposed many solutions, including automatic diagnosis and severity and mortality prediction. Several studies on how AI can help with COVID-19 have been published: a search of the Scopus database for the keyword “AI for COVID-19” returned 1,667 documents published from 2020 to 2022.
This advancement of AI technology has also encouraged its deployment in high-stakes settings, such as autonomous driving and power-grid management. Such applications have raised the need for robust AI [1]. Before deploying a model, one needs to ensure that it is robust and safe to learn from the real environment, where uncertainty and incomplete information exist. According to Dietterich [1], robust AI can be achieved in several ways, such as robust optimization, regularization in machine learning (ML), modifying the objectives to be risk-sensitive, robust inference, detecting model failures, causal models, ensemble methods, and expanding the model (such as the knowledge model employed by Google, which contains millions of objects and relationships).
Journal homepage: http://ijai.iaescore.com
Several AI models have been developed for COVID-19 since the disease was first detected in Wuhan, China, in 2019. For diagnosis, the standard test for detecting COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) test. The presence of the disease can also be confirmed from the patient’s radiology results, namely chest X-ray (CXR) and computerized tomography (CT) scans. RT-PCR is preferred because of its speed and accuracy [2]. However, the specificity and sensitivity of this method vary considerably from one test kit to another. According to the WHO, an ‘acceptable’ test method has sensitivity above 80% and specificity above 90%, while a ‘desired’ test method has sensitivity above 90% and specificity above 99% [3]. In addition, the RT-PCR test has several limitations for large-scale diagnosis, such as long turnaround times (over 2-3 hours) and the need for certified laboratories, trained healthcare staff, and expensive equipment and reagents, which may cause demand to exceed supply [4]. Besides radiology images, blood tests can also be used as an alternative method for initial patient screening. Compared to the gold-standard RT-PCR method, a routine blood test can usually be delivered more quickly, within 30 minutes to 2 hours using a hematology analyzer, as mentioned in [2]. Therefore, some medical researchers have also considered blood tests for COVID-19 diagnosis, such as [5]. Following this, researchers in the AI domain have developed several models to diagnose COVID-19 from blood test data, such as those in [2], [6]-[11]. In addition to diagnosis, blood tests have also been used to predict the severity of the disease [12] and mortality [13], [14].
As suggested by Dietterich [1], additional effort is needed to implement robust AI before models can be deployed in real-world settings. There are several ways to ensure the robustness of AI models in medical domains. One such effort is removing bias in radiology image classification [15]. The main contribution of this paper is a novel way to handle this problem by using “age” as an important feature to ensure the robustness and reliability of the model. The patient’s age was chosen as an additional feature that can improve and check the consistency of the results of AI models of blood tests for COVID-19 patients for the following reasons: i) age is a risk factor for almost all chronic diseases, ii) knowing the patient’s age is important for planning proper triage, iii) predicting age is a common method in forensic and anthropological investigations, and iv) age has been found to be one of the most significant contributors to predicting mortality and diagnosing COVID-19.
As stated in [16], the rates of disease differ between age groups. Age is a risk factor for almost all chronic diseases, including most cancers. In epidemiology, age is also an important factor when interpreting health statistics collected from communities. Epidemiologists refer to this as age-adjustment, which allows statisticians to give different weights to people from different age groups in order to remove age as a confounding factor.
Knowing the patient’s age is important for planning proper triage. It is also beneficial for establishing a diagnosis, as in the study conducted by Urban et al. [17]. They confirm that an age-specific cut-off point for D-Dimer can significantly improve the specificity of venous thromboembolism (VTE) diagnosis, particularly in older patients, when compared with the conventional D-Dimer cut-off point.
A similar study focusing on predicting patients’ age was carried out by Wang et al. [18]. They proposed a model that predicts the chronological age of patients from electronic medical records (EMR), which contain information about the patient’s physiological state: vital signs and lab tests. Their aim was to identify the discrepancy between the chronological and physiological age of patients, which is vital for preventative and personalized care. They trained a DL model and obtained a satisfactory result, with a standard deviation of the error of 7 years.
In forensic and anthropological investigations, predicting an individual’s age is not new. A recent study by Karargyris et al. [19] demonstrated a novel approach to predicting age automatically from medical images using a DL model. According to their study, various medical imaging modalities often contain individual visual features of a person. Their research can be used not only for forensic purposes but also for planning appropriate treatment, for example, for children whose bone age prediction indicates a growth disorder.
Several studies also confirm that age is one of the most significant contributors to predicting mortality in COVID-19. One such study was conducted by Ruan et al. [20]. In their study, age was found to be significant, along with the presence of comorbidities, secondary infection, and inflammation in the blood. A similar study in New York City by Vaid et al. [21] confirms the same result. They stated that, at 7 days, age was important for COVID-19 mortality, with the contribution of this feature rising rapidly with increasing age. A study by Brinati et al. [10] similarly found that age was one of the top-10 most important features in diagnosing COVID-19.
On the other hand, in this digital era, when every patient history record is stored electronically, it is also necessary to develop a system that can confirm the patient’s age based on their laboratory readings. This is especially useful when the patient’s age is missing due to human error or another oversight during data entry. A system to confirm the patient’s biological age is therefore needed to support appropriate triage and correct treatment decisions. During a pandemic, people find it useful to seek first aid from a healthcare practitioner through telemedicine, a digital technology that has become very popular. Such a system will be useful when patients need to send their blood test results and consult with doctors remotely through telemedicine.
This study builds on our previous findings on training AI models to predict patients’ age from CXR [22] and blood test data [23]. For the radiology images, the models were trained to predict age from general CXR cases, while for the blood tests, the models were trained on both COVID-19 and non-COVID-19 cases. However, since our findings in [23] show that, for blood test data, the COVID-19 dataset performed better than the other datasets (pneumonia and other diseases), this paper focuses on the COVID-19 blood test dataset.
This paper limits the scope of the research to patients with COVID-19, as people diagnosed with this disease have particular biological markers and a particular set of tests. The conclusions of this research will therefore be easier to apply. We conducted several experiments, which can be grouped into two scopes: predicting age from blood tests, and predicting mortality from blood tests with age as a feature. In the first scope, we examine the impact of the radiologist’s text report on predicting age from blood tests. In the second scope, we performed experiments combining the actual age and the predicted age with each of the blood test items to predict patient mortality.
2. RESEARCH METHODOLOGY
This section describes the research methodology used in this paper. The research workflow is shown in Figure 1. The process started with data collection, followed by preprocessing, dividing the data into training and testing sets, and then performing several experiments and evaluation. As explained in the previous section, this study comprises two main tasks, namely predicting age and predicting mortality. For each task, we conducted experiments and tuned two models. Hence, we ended up with four AI models. A summary of the four models is shown in Table 1.
2.1. Dataset and preprocessing
In Figure 1, the four AI models are highlighted in yellow boxes: i) a model for predicting age; ii) a model for predicting age with the radiologist report as an extra feature; iii) a model for predicting mortality with actual age; and iv) a model for predicting mortality with predicted age. As shown in Figure 1, the first step in this study was data collection. The data were collected from a COVID-19 referral hospital, Pasar Minggu Regional Hospital in Jakarta, Indonesia, during the first wave of the pandemic, March to December 2020. Records of 1,000 patients were collected as the sample for this study. Each patient underwent several blood test examinations during their hospital stay. Consequently, the data contain several rows of blood test results for each patient, giving 24,629 records in total. The data collected in this study include patient age, blood test results, mortality status, and the radiologist report (a sample is provided in Figure 2).
The radiologist reports were provided as extra information about the patients’ condition, in place of the radiology images, because hospital policy does not allow radiology images to be sent to external parties. We plan to conduct a deeper analysis of this additional feature in another paper. In this study, we use this extra feature to examine whether it improves performance compared to using only the blood test data.
Figure 1. Research flow diagram
Table 1. Summary of the four AI models
| Model | Task | Predicted value/label | Feature | Evaluation |
|---|---|---|---|---|
| 1 | regression | age | 28 blood test items | Normalised RMSE and R² |
| 2 | regression | age | 28 blood test items and radiologist report | Normalised RMSE and R² |
| 3 | classification | mortality | each blood test item and actual age | specificity, sensitivity, accuracy, F1-score |
| 4 | classification | mortality | each blood test item and predicted age | specificity, sensitivity, accuracy, F1-score |
Original report in Bahasa
Thorax PA
Perbandingan : foto thorax PA tanggal 07-01-2018
Cor sedikit membesar ke lateral kiri dengan apeks yang tertanam
pada diafragma, pinggang jantung normal (CTR 52%).
Sinuses dan diafragma normal.
Pulmo:
-Hili normal.
-Corakan bronkovaskuler normal.
-Tampak infiltrat di tengah sampai bawah kiri.
-Kranialisasi (-).
KESAN :
-Kardiomegali ringan.
-Infiltrat di tengah sampai bawah kiri dd/ bronkopneumonia.
(dibandingkan dengan foto thorax PA tanggal 07-01-2018 : stqa)
Translated report in English
Thorax PA
Comparison: PA thorax photo on 07-01-2018
The heart is slightly enlarged to the left lateral, with the apex embedded
in the diaphragm; the waist of the heart is normal (CTR 52%).
The sinuses and diaphragm are normal.
Pulmo:
-Normal hila.
-Normal bronchovascular pattern.
-Infiltrate visible in the middle to lower left.
-Cranialization (-).
IMPRESSION:
-Mild cardiomegaly.
-Infiltrate in the middle to lower left, DD/ bronchopneumonia.
(compared to PA thorax photo on 07-01-2018: status quo ante)
Figure 2. A sample of radiologist report dataset
Based on our interview with a pulmonologist from the hospital, and considering our previous findings in [14], we decided to use only 28 blood biomarkers. These biomarkers are shown in Table 2. Patient age ranged from 1.5 years for the youngest patient to 92 years for the oldest. The histogram of age, with the proportion of each patient outcome, is shown in Figure 3. It has two peaks, one between 36 and 41 years old and another between 56 and 61 years old, whereas the distribution of the dead outcome has only one peak, between 56 and 61 years old.
For the preprocessing step, we employed backward fill and k-nearest neighbour (KNN) imputation. The first method fills the missing values of a blood test with the latest exam result. KNN imputation is then used as the final step to clean the whole dataset; it works by finding similar records and using their values to estimate the missing one. To handle the class imbalance in mortality, where the dead status is much less frequent than the survived status, we used the SMOTE technique to upsample the minority class. We divided the dataset into training and testing sets in a 60:40 proportion, and ensured that records belonging to the same patient were never split across the two sets.
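The backward-fill step can be illustrated with a minimal sketch. It assumes that each record is a dictionary, that records are already ordered by patient and then by exam time, and that the key names (`patient_id`, `HB`) are hypothetical; the actual implementation used in this study is not specified and may differ.

```python
def backward_fill_by_patient(rows, value_key, id_key="patient_id"):
    """Fill missing (None) values with the next available value of the same patient.

    Assumes `rows` is sorted by patient and then by exam time.
    Returns a new list; the input rows are not modified.
    """
    filled = [dict(row) for row in rows]
    next_value = {}  # patient id -> nearest later value seen while scanning backwards
    for row in reversed(filled):
        pid = row[id_key]
        if row[value_key] is None:
            # Stays None when the patient has no later measurement.
            row[value_key] = next_value.get(pid)
        else:
            next_value[pid] = row[value_key]
    return filled


# Hypothetical records: two patients with gaps in the HB column.
records = [
    {"patient_id": 1, "HB": None},
    {"patient_id": 1, "HB": 13.0},
    {"patient_id": 1, "HB": None},
    {"patient_id": 2, "HB": None},
    {"patient_id": 2, "HB": 11.5},
]
filled_records = backward_fill_by_patient(records, "HB")
```

Gaps that have no later measurement for the same patient remain missing in this sketch; these are the cases a second imputation step, such as KNN, would have to handle.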
2.2. Text embedding technique
In this section, we explain the text embedding technique used to prepare the radiologist report data. In our experiment, we used only FastText [24], a library created by the Facebook Research team for efficient text classification and representation learning. The library provides pre-trained word vectors for the Indonesian language, trained on Common Crawl and Wikipedia. These word vectors were trained by Grave et al. [25] and published in 2018. Besides providing an Indonesian language model, FastText offers an advantage over other libraries such as word2vec, gensim or GloVe: it assumes that a word is formed by character n-grams. This helps in finding the vector representation of a rare word, and it can produce a vector representation even for a word that is not in the dictionary. Since our radiologist report dataset contains many rare words specific to the pulmonary domain, FastText is the most suitable choice. Several recent studies have shown that FastText performs better than other libraries, such as the one by Alghamdi and Assiri [26]. Another study showed that FastText is not only more accurate but also faster [27].
Table 2. Biomarkers used in blood test dataset
| Biomarker | Feature code | Normal level (adult) | Unit |
|---|---|---|---|
| HEMATOLOGY | | | |
| Hemoglobin | HB | 13.2 - 17.3 | g/dL |
| Hematocrit | HCT | 40 - 52 | % |
| Leukocytes | LEKO | 3.8 - 10.6 | 10^3/µL |
| Platelets | PLT | 150 - 440 | 10^3/µL |
| Erythrocytes | ERI | 4.40 - 5.90 | 10^6/µL |
| Red cell distribution width | RDW | 11.8 - 14.5 | % |
| AVERAGE ERYTHROCYTE VALUE | | | |
| Mean corpuscular volume | MCV | 80 - 100 | fL |
| Mean corpuscular hemoglobin | MCH | 27.5 - 33.2 | pg |
| Mean corpuscular hemoglobin concentration | MCHC | 32 - 36 | g/dL |
| COUNT TYPE | | | |
| Basophils | BASOFIL | 0.0 - 1.0 | % |
| Eosinophils | EOS | 1.0 - 5.0 | % |
| Stem neutrophils | NEUTB | 3.0 - 5.0 | % |
| Segmented neutrophils | SEGMEN | 50 - 70 | % |
| Lymphocytes | LIMFOSIT | 25 - 50 | % |
| Monocytes | MONOSIT | 2.0 - 8.0 | % |
| Neutrophil-lymphocyte ratio | NLR1 | <3.12 | |
| Erythrocyte sedimentation rate | LED | 0 - 20 | mm/hour |
| HEMOSTASIS | | | |
| D-Dimer | DDIMER | <0.5 | µg/mL |
| Prothrombin time | PTHSL | 10.80 - 14.40 | second |
| Activated partial thromboplastin time | APTTHSL | 25.00 - 35.00 | second |
| BLOOD CHEMISTRY | | | |
| Arterial blood gas analysis | | | |
| Partial pressure of oxygen | PO2 N | 71.0 - 104.0 | mmHg |
| Oxygen saturation | O2S N | 94.0 - 100.0 | % |
| Liver function | | | |
| Serum glutamic oxaloacetic transaminase | SGOT | <50 | U/L |
| Serum glutamic pyruvic transaminase | SGPT | <50 | U/L |
| Diabetes | | | |
| Random plasma glucose test | GDSFULL | 70 - 180 | mg/dL |
| Kidney function | | | |
| Urea | UREUM | <48 | mg/dL |
| Creatinine | CREAT | 0.70 - 1.30 | mg/dL |
| Cardiac enzymes | | | |
| Lactate dehydrogenase | LDH | 50 - 150 | U/L |
To transform the radiologist text report into a vector representation, so that it can be used as a feature when training the age prediction model, we used the FastText method get_sentence_vector(). It returns a vector of size 300 for each sentence, obtained by dividing each word vector or n-gram embedding by its L2 norm and then averaging the normalised vectors element-wise [28]. To obtain the sentence vector, we performed the following preprocessing steps for each row in the radiologist dataset:
– remove all special characters, including newlines,
– calculate the sentence vector representation using the method get_sentence_vector(),
– obtain a vector of length 300, and
– sum all 300 elements of the vector to get a single numerical representation for each record. For example, for the text shown in Figure 2, the resulting value is -0.74.
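The steps above can be sketched in plain Python. This is an illustration only: the toy `word_vectors` dictionary stands in for the pre-trained 300-dimensional model, the function names are hypothetical, and unknown words are simply skipped here, whereas FastText would build them from character n-grams. The cleaning rule (keep ASCII letters and digits) is also an assumption.

```python
import math
import re


def clean_report(text):
    """Replace special characters and newlines with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^0-9A-Za-z]+", " ", text).split())


def sentence_vector(words, word_vectors, dim):
    """Element-wise average of L2-normalised word vectors.

    Words without a vector, or with a zero-norm vector, are skipped.
    """
    total = [0.0] * dim
    count = 0
    for word in words:
        vec = word_vectors.get(word)
        if vec is None:
            continue
        norm = math.sqrt(sum(x * x for x in vec))
        if norm > 0:
            total = [t + x / norm for t, x in zip(total, vec)]
            count += 1
    return [t / count for t in total] if count else total


def report_feature(text, word_vectors, dim):
    """Collapse a report into one number by summing its sentence vector."""
    words = clean_report(text).split()
    return sum(sentence_vector(words, word_vectors, dim))


# Toy 2-dimensional vectors (hypothetical values, not taken from the real model).
toy_vectors = {"infiltrat": [3.0, 4.0], "normal": [0.0, 2.0]}
feature = report_feature("infiltrat:\n-normal.", toy_vectors, dim=2)
```

Summing the elements reduces the report to a single column, which keeps the feature table small, at the cost of discarding most of the information carried by the individual dimensions.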
2.3. Model training and parameter
For all four AI models, we used only the XGBoost algorithm [29], a tree-based ensemble ML algorithm. We chose this algorithm after observing in several experiments, such as the one we performed in [23], that our data works best with an XGBoost model. The only parameter tuned for both the age and mortality prediction tasks is the maximum depth, which was set to 20 (the default is 3); the remaining parameters were left at their default values. This maximum depth was chosen because, in our experiments, training performance did not improve beyond a depth of 20. With this depth, a single experiment took only about 15 seconds in Google Colab (training on 24,629 rows and 29 columns), which is still reasonable. Across several experiments, tuning the other parameters did not yield any significant improvement. Hence, we decided to focus on the maximum depth parameter.
Figure 3. Histogram of patient’s age and outcome
2.4. Evaluation technique
The dataset was split in a 60:40 proportion, with 60% for training and 40% for testing. The split is based on the patient ID to ensure that the blood test records of a single patient are never separated between the training and test data. This setting was chosen because the number of blood test records is not the same for all patients. We also tried other proportions, such as 70:30 and 80:20; however, in some rounds of experiments we were left with very few samples in the test set, which biased the calculation of the model’s performance.
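A patient-level split of this kind can be sketched as follows. The sketch assumes dictionary records with a hypothetical `patient_id` key and applies the 60:40 proportion to the set of patients rather than to individual rows; the exact procedure used in this study is not specified.

```python
import random


def split_by_patient(rows, train_fraction=0.6, seed=0, id_key="patient_id"):
    """Split records so that every patient appears in exactly one of the two sets."""
    patient_ids = sorted({row[id_key] for row in rows})
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(patient_ids)
    n_train = round(train_fraction * len(patient_ids))
    train_ids = set(patient_ids[:n_train])
    train = [row for row in rows if row[id_key] in train_ids]
    test = [row for row in rows if row[id_key] not in train_ids]
    return train, test


# Hypothetical data: 10 patients with 1 to 3 blood test records each.
rows = [{"patient_id": pid, "HB": 10.0 + k} for pid in range(10) for k in range(pid % 3 + 1)]
train_rows, test_rows = split_by_patient(rows)
```

Because patients have different numbers of records, the row-level proportion of such a split can deviate from 60:40 even when the patient-level proportion is exact.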
We performed two ML tasks in this study: regression (for age prediction) and classification (for mortality prediction). For the regression task, we used two metrics: normalised root mean squared error (NRMSE) and the coefficient of determination (R²). The NRMSE is the normalised version of the RMSE, which measures the difference between the predicted and observed values. We use the NRMSE instead of the RMSE because it makes models on different scales easier to compare. The NRMSE is often expressed as a percentage, where a lower value indicates better performance because the residuals are smaller.
The other metric is the coefficient of determination (R²). In an ML regression task, this metric measures the performance of the model as the proportion of the variance in the observed values that is predictable from the model. For the second task, mortality prediction as classification, we used four common metrics, namely sensitivity, specificity, accuracy, and F1-score.
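The metrics described above can be written out as a short sketch. Two choices here are assumptions rather than details given in this paper: the RMSE is normalised by the range of the observed values (normalising by the mean or standard deviation is also common), and the F1-score is computed for the positive class only. All function names are illustrative.

```python
import math


def nrmse(y_true, y_pred):
    """RMSE divided by the range of the observed values (assumed normalisation)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse) / (max(y_true) - min(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def classification_metrics(y_true, y_pred, positive=1):
    """Sensitivity, specificity, accuracy and positive-class F1 from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Small hypothetical examples.
ages_true, ages_pred = [20, 40, 60, 80], [30, 40, 50, 80]
age_nrmse = nrmse(ages_true, ages_pred)
age_r2 = r_squared(ages_true, ages_pred)
mortality = classification_metrics([1, 1, 0, 0, 0], [1, 0, 0, 0, 1])
```

With an imbalanced outcome, sensitivity and specificity are worth reporting separately, since accuracy alone can look high when most patients survive.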
3. RESULT AND DISCUSSION
3.1. Predicting age from blood test
In this part, we report the evaluation results of AI model 1 and AI model 2, as shown in Figure 1. The additional feature, the radiologist report, did not add any significant improvement to the model. We repeated the experiment 100 times, and the Mann-Whitney U test for two independent samples gave a p-value of 0.1 (greater than the alpha threshold of 0.05). The results of the two models were quite similar. We show the result of the model with the radiologist report in Figure 4. The R² shown in the figure indicates that about 33% of the variance in one variable can be explained by the other. The Pearson coefficient r (> 0.5) shows a positive correlation between the predicted age and the true age.
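For readers unfamiliar with the test, the following is a minimal sketch of a two-sided Mann-Whitney U test based on the normal approximation. It is a simplified illustration (average ranks for ties, no tie correction of the variance, no continuity correction), and the score lists in the usage example are hypothetical; in practice a statistics library would normally be used.

```python
import math


def mann_whitney_u(sample_a, sample_b):
    """Return (U statistic of sample_a, approximate two-sided p-value)."""
    n_a, n_b = len(sample_a), len(sample_b)
    pooled = sorted((value, idx) for idx, value in enumerate(list(sample_a) + list(sample_b)))
    ranks = [0.0] * (n_a + n_b)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[pooled[k][1]] = average_rank
        i = j + 1
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u_a, p_value


# Hypothetical scores from repeated runs of two models.
scores_without_report = [0.31, 0.33, 0.30, 0.34, 0.32]
scores_with_report = [0.32, 0.35, 0.31, 0.33, 0.36]
u_stat, p_val = mann_whitney_u(scores_without_report, scores_with_report)
```

A p-value above the chosen alpha, as reported above, means the two sets of scores cannot be distinguished at that significance level.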
However, when we examined the feature importance, the radiologist report was always among the top 5 most important features of the model. We plot the distributions of both the true age and the predicted age in Figure 5. It shows that our model predicts the same highest peak as the true age, between 56 and 61 years old, and that the distribution of the predicted age is quite similar to that of the true age. The overlapping bars (purple shaded color) show where both distributions have the same amount of data in a particular bin.
Figure 4. Scatter plot of true age and predicted age
Figure 5. Distribution of predicted age in comparison with true age (histogram plot of age and predicted age; x-axis: age value, y-axis: density)
3.2. Enhancing the blood test with age for mortality prediction
3.2.1. Correlation matrix
The correlation matrix of the top-15 features in the dataset is shown in Figure 6. A darker color indicates a higher correlation. Several blood items have a relatively high correlation with both actual and predicted age, such as SEGMEN, UREUM, MCH, MCV, and LED. UREUM has a higher correlation with the predicted age than with the actual age. The true age and the predicted age agree on their correlations with the other features; however, the predicted age has slightly higher correlation scores.
3.2.2. Mortality proportion in predicted age
We explored the proportion of patients who actually died in each age range. The result is shown in Figure 7, where the red bars show the dead status and the yellow bars show the survived status. The predicted age captures more deceased patients in the older range, i.e. older than 70 years. For example, in the 75-year age bin, only 35.86% of patients died when binned by actual age, whereas the predicted age captures 53.50%. This result suggests that the actual age alone cannot be used as a simple predictor of mortality. The proportion of dead outcomes under the predicted age (top figure) is higher than under the true age for ages above 70 years, with the exception of the 85-year age bin, where the percentage in the dataset is not high.
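The per-bin proportion discussed above can be computed as in the following sketch. The 5-year bin width and the function name are assumptions made for illustration, and the ages and outcomes in the usage example are hypothetical; the same function would be applied once to the true ages and once to the predicted ages.

```python
from collections import defaultdict


def mortality_rate_by_age_bin(ages, died, bin_width=5):
    """Return {bin start: fraction of patients in that bin who died}."""
    counts = defaultdict(lambda: [0, 0])  # bin start -> [deaths, total]
    for age, outcome in zip(ages, died):
        bin_start = int(age // bin_width) * bin_width
        counts[bin_start][0] += int(outcome)
        counts[bin_start][1] += 1
    return {start: deaths / total for start, (deaths, total) in sorted(counts.items())}


# Hypothetical patients: age in years and outcome (1 = died, 0 = survived).
example_ages = [72, 74, 76, 78, 79, 40]
example_died = [1, 0, 1, 1, 0, 0]
rates = mortality_rate_by_age_bin(example_ages, example_died)
```

Bins that contain only a few patients give unstable proportions, which is one reason a sparsely populated bin can deviate from the overall trend.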
3.2.3. Adding predicted age as additional feature
In this part, we report the comparison of AI model 3 and AI model 4, as shown in Figure 1. For each blood feature, we ran an experiment in which the predicted age was added as an additional feature to classify the mortality outcome of each patient. Our results show that the predicted age can increase the performance of the model for 18 blood items. We repeated the experiment 30 times and compared the results with those obtained by adding the true age. We measured accuracy, sensitivity, specificity and F1-score, and calculated the p-value to determine whether adding the predicted age is significantly better than adding the true age for predicting mortality. The blood items that show a significant result (p-value < 0.05) after adding the predicted age are listed in Table 3. The table shows that all the classification measures are very good except sensitivity. This is due to the limited number of mortality cases in our dataset, which is very low compared to the number of survived cases.
Figure 6. Correlation matrix of the top 15 features for mortality
Figure 7. Predicted age vs true age in predicting mortality
Table 3. List of blood items which show a significant difference after adding predicted age
| Blood Item | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|
| HCT | 0.82 | 0.30 | 0.87 | 0.84 |
| ERI | 0.79 | 0.30 | 0.84 | 0.82 |
| RDW | 0.80 | 0.28 | 0.85 | 0.83 |
| MCHC | 0.85 | 0.25 | 0.91 | 0.86 |
| MONOSIT | 0.86 | 0.35 | 0.91 | 0.87 |
| NLR1 | 0.83 | 0.44 | 0.86 | 0.85 |
| APTTHSL | 0.80 | 0.36 | 0.84 | 0.83 |
| PO2 N | 0.82 | 0.40 | 0.86 | 0.85 |
| O2S N | 0.80 | 0.32 | 0.84 | 0.83 |
| DDIMER | 0.82 | 0.30 | 0.87 | 0.84 |
| LEKO | 0.80 | 0.35 | 0.84 | 0.83 |
| MCH | 0.86 | 0.27 | 0.91 | 0.86 |
| SGOT | 0.85 | 0.48 | 0.88 | 0.87 |
| SGPT | 0.82 | 0.41 | 0.86 | 0.85 |
| PTHSL | 0.80 | 0.37 | 0.84 | 0.83 |
| LIMFOSIT | 0.84 | 0.43 | 0.88 | 0.86 |
| GDSFULL | 0.80 | 0.37 | 0.84 | 0.83 |
4. CONCLUSION
Studies on predicting patient age are still limited. We explored several reasons why age prediction is needed when dealing with AI models: it is not only part of an effort to produce a robust AI model, but it can also improve the performance of the model itself. In this study, we showed the importance of predicting age from patient laboratory results. We conducted several experiments and presented a pipeline in which ML predicts age from the given dataset. Our results show that, with the predicted age as an additional feature in the mortality classification task, the model is significantly improved when compared to adding the actual age. In the future, we want to explore combining other patient electronic health record data and to try other ML algorithms.
ETHICAL CLEARANCE
The patients’ medical records used in this study were collected by the data provider, including epi-
demiological, demographic, clinical, laboratory and mortality outcome information. This study has been ap-
proved by the Ethics Committee of the data provider, Pasar Minggu Regional Hospital Jakarta, Indonesia. The
requirement for patient consent was waived as this was a secondary analysis of anonymized data.
REFERENCES
[1] T. G. Dietterich, “Steps toward robust artificial intelligence,” AI Magazine, vol. 38, no. 3, pp. 3–24, 2017, doi:
10.1609/aimag.v38i3.2756.
[2] A. E. Öztaş, D. Boncukcu, E. Ozteke, M. Demir, A. Mirici, and P. Mutlu, “Covid-19 diagnosis: comparative approach between
chest x-ray and blood test data,” in 2021 6th International Conference on Computer Science and Engineering (UBMK), 2021, pp.
472–477, doi: 10.1109/UBMK52708.2021.9558969.
[3] T. Asai, “COVID-19: accurate interpretation of diagnostic tests—a statistical point of view,” Journal of Anesthesia, vol. 35, no. 3,
pp. 328–332, 2021, doi: 10.1007/s00540-020-02875-8.
[4] Z. Li et al., “Development and clinical application of a rapid IgM-IgG combined antibody test for SARS-CoV-2 infection diagnosis,”
Journal of Medical Virology, vol. 92, no. 9, pp. 1518–1524, 2020, doi: 10.1002/jmv.25727.
[5] D. Ferrari, A. Motta, M. Strollo, G. Banfi, and M. Locatelli, “Routine blood tests as a potential diagnostic tool for COVID-19,”
Clinical Chemistry and Laboratory Medicine, vol. 58, no. 7, pp. 1095–1099, 2020, doi: 10.1515/cclm-2020-0398.
[6] F. Bismadhika, N. N. Qomariyah, and A. A. Purwita, “Experiment on deep learning models for covid-19 detection from blood
testing,” in 2021 IEEE International Biomedical Instrumentation and Technology Conference (IBITeC), Oct. 2021, pp. 136–141,
doi: 10.1109/IBITeC53045.2021.9649254.
[7] F. Soares, “A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams,” medRxiv,
pp. 1–16, 2020, doi: 10.1101/2020.04.10.20061036.
[8] M. AlJame, I. Ahmad, A. Imtiaz, and A. Mohammed, “Ensemble learning model for diagnosing COVID-19 from routine blood
tests,” Informatics in Medicine Unlocked, vol. 21, 2020, doi: 10.1016/j.imu.2020.100449.
[9] F. Cabitza et al., “Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine
blood tests,” Clinical Chemistry and Laboratory Medicine, vol. 59, no. 2, pp. 421–431, 2021, doi: 10.1515/cclm-2020-1294.
[10] D. Brinati, A. Campagner, D. Ferrari, M. Locatelli, G. Banfi, and F. Cabitza, “Detection of covid-19 infection from routine blood
exams with machine learning: a feasibility study,” Journal of Medical Systems, vol. 44, no. 8, pp. 1–12, 2020.
[11] S. B. Rikan, A. S. Azar, A. Ghafari, J. B. Mohasefi, and H. Pirnejad, “COVID-19 diagnosis from routine blood tests using artificial
intelligence techniques,” Biomedical Signal Processing and Control, vol. 72, 2022, doi: 10.1016/j.bspc.2021.103263.
[12] J. Luo, L. Zhou, Y. Feng, B. Li, and S. Guo, “The selection of indicators from initial blood routine test results to improve the
accuracy of early prediction of COVID-19 severity,” PLoS ONE, vol. 16, 2021, doi: 10.1371/journal.pone.0253329.
[13] H. Ko et al., “An artificial intelligence model to predict the mortality of COVID-19 patients at hospital admission time using routine
blood samples: Development and validation of an ensemble model,” Journal of Medical Internet Research, vol. 22, no. 12, 2020,
doi: 10.2196/25442.
[14] N. N. Qomariyah, A. A. Purwita, S. D. A. Asri, and D. Kazakov, “A tree-based mortality prediction model of covid-19
from routine blood samples,” in 2021 International Conference on ICT for Smart Society (ICISS), 2021, pp. 1–7, doi:
10.1109/ICISS53185.2021.9533219.
[15] A. N. Ashadi, A. A. Purwita, and N. N. Qomariyah, “Combating bias in covid-19 disease detection using synthetic annotations
on chest x-ray images,” in 2021 IEEE International Biomedical Instrumentation and Technology Conference: The Improvement
of Healthcare Technology to Achieve Universal Health Coverage, IBITeC 2021, 2021, pp. 88–92, doi:
10.1109/IBITeC53045.2021.9649129.
[16] New York State, “Age-adjusted rates - statistics teaching tools.” Department for Health New York State, 1999. [Online]. Available:
https://www.health.ny.gov/diseases/chronic/ageadj.htm
[17] K. Urban, K. Kirley, and J. J. Stevermer, “PURLs: It’s time to use an age-based approach to D-dimer,” The Journal of Family
Practice, vol. 63, no. 3, pp. 155–156, 2014.
[18] Z. Wang, L. Li, B. S. Glicksberg, A. Israel, J. T. Dudley, and A. Ma’ayan, “Predicting age by mining electronic medical records
with deep learning characterizes differences between chronological and physiological age,” Journal of Biomedical Informatics, vol.
76, pp. 59–68, 2017, doi: 10.1016/j.jbi.2017.11.003.
[19] A. Karargyris, S. Kashyap, J. T. Wu, A. Sharma, M. Moradi, and T. Syeda-Mahmood, “Age prediction using a large chest x-ray dataset,”
in Medical Imaging 2019: Computer-Aided Diagnosis, 2019, doi: 10.1117/12.2512922.
[20] Q. Ruan, K. Yang, W. Wang, L. Jiang, and J. Song, “Clinical predictors of mortality due to COVID-19 based on an analysis of data
of 150 patients from Wuhan, China,” Intensive Care Medicine, vol. 46, no. 5, pp. 846–848, 2020, doi: 10.1007/s00134-020-05991-x.
[21] A. Vaid et al., “Machine learning to predict mortality and critical events in a cohort of patients with covid-19 in new york city:
model development and validation,” Journal of Medical Internet Research, vol. 22, no. 11, 2020, doi: 10.2196/24018.
[22] C. Solomou and D. Kazakov, “Utilizing chest x-rays for age prediction and gender classification,” in 2021 4th International
Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2021, pp. 356–361, doi:
10.1109/ISRITI54043.2021.9702796.
[23] N. N. Qomariyah, A. A. Purwita, M. S. Astriani, S. D. A. Asri, and D. Kazakov, “An XGBoost model for age prediction from
covid-19 blood test,” in 2021 4th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI
2021, 2021, pp. 446–452, doi: 10.1109/ISRITI54043.2021.9702867.
[24] “fastText.” fastText, 2024. [Online]. Available: https://fasttext.cc/index.html
[25] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” in LREC 2018 - 11th
International Conference on Language Resources and Evaluation, 2019, pp. 3483–3487.
[26] N. Alghamdi and F. Assiri, “A comparison of fasttext implementations using arabic text classification,” Advances in Intelligent
Systems and Computing, vol. 1038, pp. 306–311, 2020, doi: 10.1007/978-3-030-29513-4_21.
[27] Z. S. Ritu, N. Nowshin, M. M. H. Nahid, and S. Ismail, “Performance analysis of different word embedding models on Bangla
language,” in 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), 2018, pp. 1–5, doi:
10.1109/ICBSLP.2018.8554681.
[28] A. Joulin, E. Grave, P. Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” in 15th Conference of the
European Chapter of the Association for Computational Linguistics, EACL 2017, 2017, vol. 2, pp. 427–431, doi:
10.18653/v1/e17-2068.
[29] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794, doi: 10.1145/2939672.2939785.
BIOGRAPHIES OF AUTHORS
Dr. Nunung Nurul Qomariyah is an Assistant Professor at BINUS University International,
Jakarta, Indonesia. She is a former member of the Artificial Intelligence Research Group,
University of York, UK. Her Ph.D. topic is in the area of recommender systems, particularly
e-commerce user preference learning from pairwise comparisons. Her current research focuses on
developing an explainable AI model for COVID-19 patients, which is co-funded by Newton British
Council in collaboration with the Indonesian Ministry of Research and Education. She can be
contacted at email: nunung.qomariyah@binus.edu.
Dr. Dimitar Kazakov is a reader in Computer Science at the University of York, UK
and a member of his Departmental Artificial Intelligence Group. His research encompasses the
development of machine learning (ML) and evolutionary algorithms and their applications to natural
language processing, real-time systems, intelligent agents, function optimisation, and financial
forecasting. He has published over 120 peer-reviewed articles, supervised 7 and co-supervised another
3 Ph.D. students to completion. He is currently leading a research team of 6 Ph.D. students. He
is a former Vice-Chair of the UK Society for the Study of Artificial Intelligence and Simulation of
Behaviour (AISB). He can be contacted at email: dimitar.kazakov@york.ac.uk.

Age prediction from COVID-19 blood test for ensuring robust artificial intelligence

Corresponding Author:
Nunung Nurul Qomariyah
Department of Computer Science, School of Computing and Creative Arts, Bina Nusantara University
Jakarta 11480, Indonesia
Email: [email protected]
Journal homepage: http://ijai.iaescore.com

1. INTRODUCTION
Artificial intelligence (AI) has been helping humanity since the beginning of the neural network era, and it has continued to contribute since the deep learning (DL) approach became very popular in the 2010s. AI has become a rising field that produces revolutionary solutions for automating many tasks in several domains, including healthcare. Scientists have developed promising models that have the potential to reshape the healthcare domain, especially in settings where resources are lacking, such as the coronavirus disease 2019 (COVID-19) pandemic. In such situations, health workers often feel exhausted when dealing with the rapidly increasing number of patients. AI researchers have proposed many solutions to this problem, including automatic diagnosis, severity prediction, and mortality prediction. Several studies on how AI can help with COVID-19 have been published. In the Scopus indexed database, 1,667 documents were returned for the keyword “AI for COVID-19”, published from 2020 to 2022.

AI technology is also increasingly encouraged to be deployed in high-stakes settings, such as autonomous driving and power-grid management. Such applications have raised the need for robust AI [1]. Before deploying a model, one needs to ensure that it is robust and safe to learn from the real environment, where uncertainty and incomplete information exist. According to Dietterich [1], robust AI can be achieved in several ways, such as robust optimization, regularization in machine learning (ML), modifying the objectives to be risk-sensitive, robust inference, detecting model failures, causal models, ensemble methods, and expanding the model (such as the knowledge model employed by Google that contains millions of objects and relationships).

Several AI models have been developed for the COVID-19 problem since the disease was first detected in Wuhan, China, in 2019. For the diagnosis task, the standard test for detecting COVID-19 is the reverse transcription polymerase chain reaction (RT-PCR) test. The presence of the disease can also be confirmed from the patient’s radiology results, namely chest X-ray (CXR) and computerized tomography (CT) scans. The first method, RT-PCR, is preferable because of its speed and accuracy [2]. However, the specificity and sensitivity of this method vary considerably from one test kit to another. According to WHO, an ‘acceptable’ test method should have sensitivity above 80% and specificity above 90%, while a ‘desired’ test method should have sensitivity above 90% and specificity above 99% [3]. In addition, the RT-PCR test has several limitations for large-scale diagnosis, such as long turnaround times (over 2-3 hours) and the need for certified laboratories, trained healthcare staff, and expensive equipment and reagents, which may cause demand to exceed supply [4]. Besides radiology images, a blood test can also be used as an alternative method for initial patient screening. Compared to the gold standard RT-PCR method, a routine blood test can usually be delivered more quickly, with a hematology analyzer returning results within 30 minutes to 2 hours, as mentioned in [2].
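The WHO thresholds quoted above can be checked against a test kit's confusion counts. The following is a minimal sketch: the counts and the helper names (`sensitivity`, `specificity`, `who_grade`) are hypothetical and introduced only for illustration; only the 80%/90% and 90%/99% thresholds come from the text [3].

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: share of infected patients the test detects."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """True negative rate: share of uninfected patients the test clears."""
    return tn / (tn + fp)


def who_grade(sens: float, spec: float) -> str:
    """Classify a test against the WHO thresholds cited in [3]."""
    if sens > 0.90 and spec > 0.99:
        return "desired"
    if sens > 0.80 and spec > 0.90:
        return "acceptable"
    return "below acceptable"


# Hypothetical evaluation of one test kit (illustrative counts, not study data).
tp, fn, tn, fp = 85, 15, 940, 60
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}, grade={who_grade(sens, spec)}")
```

Keeping the two rates separate makes the gap between test kits visible: a kit can pass the sensitivity threshold and still fail on specificity.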
Therefore, within the medical domain itself, some researchers have also considered blood tests for COVID-19 diagnosis, such as [5]. Following this, researchers in the AI domain have developed several models to diagnose COVID-19 from blood test data, such as those in [2], [6]-[11]. In addition to the diagnosis task, blood tests are also used to predict the severity of the disease [12] and mortality [13], [14].

As suggested by Dietterich [1], additional effort is needed to implement robust AI before models can be deployed in real-world settings. There are several ways to ensure the robustness of AI models in medical domains. One such effort removes bias in radiology image classification [15]. The main contribution of this paper is a novel way to handle this problem by using “age” as an important feature to ensure the robustness and reliability of the model. The patient’s age, as an additional important feature that can improve and check the consistency of the results of AI models of blood tests for COVID-19 patients, was chosen for the following reasons: i) age is a risk factor for almost all chronic diseases, ii) knowing the patient’s age is important for planning proper triage, iii) predicting age is a common method in forensic and anthropological investigations, and iv) age was found to be one of the most significant contributors to predicting mortality and diagnosing COVID-19.

As stated in [16], the rates of disease differ between age groups. Age is a risk factor for almost all chronic diseases, including most cancers. In epidemiology, age is also an important factor when observing findings from health statistics collected in communities. Epidemiologists refer to this as age adjustment, which allows statisticians to give different weights to people from different age groups in order to remove the confounding factor. Knowing the patient’s age is important for planning proper triage.
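As a worked illustration of the age adjustment described above, the sketch below applies direct standardization, in which each age group's rate is weighted by that group's share of a standard population. The age groups, rates, and weights are made-up values for illustration; they do not come from [16] or from this study's data.

```python
def age_adjusted_rate(rates: dict, weights: dict) -> float:
    """Directly standardized rate: sum of age-specific rates times standard weights."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("standard population weights must sum to 1")
    return sum(rates[group] * weights[group] for group in rates)


# Hypothetical age-specific disease rates per 100,000 people.
rates = {"0-39": 10.0, "40-64": 50.0, "65+": 200.0}
# Hypothetical standard population shares for the same age groups.
standard_weights = {"0-39": 0.5, "40-64": 0.3, "65+": 0.2}

adjusted = age_adjusted_rate(rates, standard_weights)
print(f"{adjusted:.1f} per 100,000")
```

Applying the same standard weights to two communities lets their rates be compared without the difference in age structure acting as a confounder.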
Moreover, knowing the patient’s age is also beneficial in establishing a diagnosis, as in the study by Urban et al. [17]. They confirm that an age-specific D-dimer cut-off point can significantly improve the specificity of venous thromboembolism (VTE) diagnosis, particularly for older patients, compared with the conventional D-dimer cut-off point.

A similar study focusing on predicting patients’ age was carried out by Wang et al. [18]. They proposed a model that predicts the chronological age of patients from electronic medical records (EMR), which contain information about the patient’s physiological state: vital signs and lab tests. They aim to identify the discrepancy between the chronological and physiological age of patients, which is vital for preventative and personalized care. They trained a DL model and obtained a satisfactory result, with a standard deviation error of 7 years.

In forensic and anthropological investigations, predicting an individual’s age is not new. A recent study by Karargyris et al. [19] demonstrated a novel approach to predicting age automatically from medical images using a DL model. According to their study, various medical imaging modalities often contain individual visual features of a person. Their research can be used not only for forensic purposes but also for planning appropriate treatment, for example, for children who are found to have growth disorders from bone age prediction.

Several studies also confirm that age is one of the most significant contributors to predicting mortality in COVID-19. One such study was conducted by Ruan et al. [20]. In their trial, age was found to be significant along with the presence of comorbid disease, secondary infection, and inflammation in the blood. A similar study in New York City by Vaid et al. [21] confirms the same result. They stated that at 7 days, age was found to be important for COVID-19 mortality, with a rapid increase in the feature’s effect associated with increasing age. A study by Brinati et al. [10] reported a similar finding: age was one of the top-10 most important features in diagnosing COVID-19.

On the other hand, in this digital era, when every patient history record is stored electronically, it is also necessary to develop a system that can confirm the patient’s age based on their laboratory readings. This is especially useful when the patient’s age is missing due to human error or other unexpected oversight during data entry. Therefore, a system to confirm the patient’s biological age is needed to develop appropriate triage and make correct treatment decisions. In a pandemic situation like the present one, people find it useful to use telemedicine to seek first aid from a healthcare practitioner. Telemedicine is a digital technology that has become very popular during this pandemic. Such a system will be useful when patients need to send their blood test results and consult doctors remotely through telemedicine.

This study builds on our previous findings on training AI models to predict patients’ age from CXR [22] and blood test data [23]. For the radiology images, the models were trained to predict age from general CXR cases, while for the blood tests, the models were trained on both COVID-19 and non-COVID-19 cases.
However, since our findings in [23] show that, for blood test data, the COVID-19 dataset performed better than the other datasets (pneumonia and other diseases), in this paper we focus on the COVID-19 blood test dataset. The scope of the research is limited to patients with COVID-19, as people diagnosed with this disease have particular biological markers and a particular set of tests. Therefore, the conclusions of this research will be easier to apply. We conducted several experiments, which fall into two broad scopes: predicting age from blood tests, and predicting mortality from blood tests with age as a feature. In the first scope, we want to show the impact of the radiologist’s observation text report on predicting age from blood tests. In the second scope, we performed experiments using the actual age and the predicted age together with each of the blood test items to predict patient mortality.

2. RESEARCH METHODOLOGY
This section describes the research methodology used in the paper. The research workflow is shown in Figure 1. The process started with data collection, followed by preprocessing, splitting the data into training and testing sets, and then performing several experiments and evaluations. As explained in the previous section, this study comprises two main tasks, namely predicting age and predicting mortality. For each task, we conducted experiments and tuned two models. Hence, we ended up with four AI models. The four models are summarized in Table 1.

2.1. Dataset and preprocessing
In Figure 1, the four AI models are highlighted in yellow boxes. They are: i) a model for predicting age; ii) a model for predicting age with the radiologist report as an extra feature; iii) a model for predicting mortality with actual age; and iv) a model for predicting mortality with predicted age. As shown in Figure 1, the first step in this study was collecting data.
The data was collected from a COVID-19 referral hospital, Pasar Minggu Regional Hospital in Jakarta, Indonesia, during the first wave of the pandemic, from March to December 2020. Records from 1,000 patients were collected as the sample for this study. Each patient underwent several blood test examinations during their stay in the hospital. Consequently, the data contains several rows of blood test results per patient, for a total of 24,629 records in the dataset. The data collected in this study include patient age, blood test results, mortality status, and radiologist reports (a sample is provided in Figure 2). The radiologist reports were provided as extra information about the patients’ condition in place of the radiology images, because hospital policy does not allow radiology images to be sent to external parties. We plan to conduct a deeper analysis of this additional feature in another paper. For this study, we use this extra feature to examine whether performance improves compared with using only the blood test data.
Figure 1. Research flow diagram

Table 1. Summary of the four AI models
| Model | Task | Predicted value/label | Feature | Evaluation |
|---|---|---|---|---|
| 1 | regression | age | 28 blood test items | Normalised RMSE and R2 |
| 2 | regression | age | 28 blood test items and radiologist report | Normalised RMSE and R2 |
| 3 | classification | mortality | each blood test item and actual age | specificity, sensitivity, accuracy, F1-score |
| 4 | classification | mortality | each blood test item and predicted age | specificity, sensitivity, accuracy, F1-score |
Original report in Bahasa
Thorax PA
Perbandingan : foto thorax PA tanggal 07-01-2018
Cor sedikit membesar ke lateral kiri dengan apeks yang tertanam pada diafragma, pinggang jantung normal (CTR 52%).
Sinuses dan diafragma normal.
Pulmo:
-Hili normal.
-Corakan bronkovaskuler normal.
-Tampak infiltrat di tengah sampai bawah kiri.
-Kranialisasi (-).
KESAN :
-Kardiomegali ringan.
-Infiltrat di tengah sampai bawah kiri dd/ bronkopneumonia.
(dibandingkan dengan foto thorax PA tanggal 07-01-2018 : stqa)

Translated report in English
Thorax PA
Comparison: PA thorax photo on 07-01-2018
Heart slightly enlarged to the left lateral side with the apex embedded in the diaphragm; the waist of the heart is normal (CTR 52%).
The sinuses and diaphragm are normal.
Pulmo:
-Normal hila.
-Normal bronchovascular pattern.
-Infiltrate visible in the middle to lower left.
-Cranialization (-).
IMPRESSION:
-Mild cardiomegaly.
-Infiltrate in the middle to lower left, DD/ bronchopneumonia.
(compared to PA thorax photo on 07-01-2018: status quo ante)

Figure 2. A sample of radiologist report dataset

Based on our interview with a pulmonologist from the hospital and our previous findings in [14], we decided to use only 28 blood biomarkers. These biomarkers are shown in Table 2. The patients' ages range from 1.5 years old (the youngest) to 92 years old (the oldest). The histogram of age with the proportion of each patient outcome is shown in Figure 3. It has two peaks, one between 36 and 41 years old and another between 56 and 61 years old, while the distribution of the dead outcome has only one peak, between 56 and 61 years old.

For the preprocessing step, we employed backward fill and k-nearest neighbour (KNN) imputation techniques. The first method fills the missing values of the blood test with the latest exam result.
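The back-filling step described above can be sketched in plain Python. This is a minimal illustration rather than the authors' code: the function name `backward_fill`, the sample readings, and the assumption that one patient's exams are ordered oldest first are all hypothetical. It also assumes that "backward fill" has its usual meaning, where a gap takes the next available value in the sequence.

```python
from typing import List, Optional


def backward_fill(values: List[Optional[float]]) -> List[Optional[float]]:
    """Replace each None with the next non-missing value later in the list."""
    filled = list(values)
    next_seen: Optional[float] = None
    # Walk from the last exam to the first, carrying the next known value back.
    for i in range(len(filled) - 1, -1, -1):
        if filled[i] is None:
            filled[i] = next_seen
        else:
            next_seen = filled[i]
    return filled


# Hypothetical haemoglobin readings for ONE patient, oldest exam first.
hb_readings = [None, 13.1, None, None, 12.4, None]
hb_filled = backward_fill(hb_readings)
print(hb_filled)
```

In pandas the same operation is available as `bfill`, which would presumably be applied per patient so that values never leak from one patient to another. A trailing gap has no later value to copy and stays missing, so it would be left for a further imputation step.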
Then, the KNN technique is used as the final imputation step to clean the whole dataset. KNN works by finding records with similar values and using them to estimate the missing one. To handle the class imbalance in mortality, where the dead status is much less frequent than the survived one, we used the SMOTE technique to upsample the minority class. We divided the dataset into training and testing sets in a 60:40 proportion. We also ensured that, when splitting the dataset, no records belonging to the same patient were separated between the two sets.

2.2. Text embedding technique
In this section, we explain the text embedding technique used to prepare the radiologist report data. In our experiment, we used only FastText [24], a library created by the Facebook Research team for efficient text classification and representation learning. The library provides pre-trained word vectors for the Indonesian language, which were trained on Common Crawl and Wikipedia. These word vectors were trained by Grave et al. [25] and published in 2018. We chose this library not only because it provides an Indonesian language model, but also because it offers several benefits over alternatives such as word2vec, gensim, or GloVe; in particular, it assumes that a word is formed from character n-grams. This helps to find the vector representation of a rare word, and it can give a vector representation even for a word that does not exist in the dictionary. Since our radiologist report dataset contains many rare words that are specific to the pulmonary domain, the FastText library is well suited. Several recent studies have shown that FastText performs better than other libraries,
such as the one by Alghamdi and Assiri [26]. Another study showed that FastText not only performed better but was also faster [27].

Table 2. Biomarkers used in blood test dataset
| Biomarker | Feature code | Normal level (adult) | Unit |
|---|---|---|---|
| HEMATOLOGY | | | |
| Hemoglobin | HB | 13.2 - 17.3 | g/dL |
| Hematocrit | HCT | 40 - 52 | % |
| Leukocytes | LEKO | 3.8 - 10.6 | 10^3/µL |
| Platelets | PLT | 150 - 440 | 10^3/µL |
| Erythrocytes | ERI | 4.40 - 5.90 | 10^6/µL |
| Red cell distribution width | RDW | 11.8 - 14.5 | % |
| AVERAGE ERYTHROCYTE VALUE | | | |
| Mean corpuscular volume | MCV | 80 - 100 | fL |
| Mean corpuscular hemoglobin | MCH | 27.5 - 33.2 | pg |
| Mean corpuscular hemoglobin concentration | MCHC | 32 - 36 | g/dL |
| COUNT TYPE | | | |
| Basophils | BASOFIL | 0.0 - 1.0 | % |
| Eosinophils | EOS | 1.0 - 5.0 | % |
| Stem neutrophils | NEUTB | 3.0 - 5.0 | % |
| Segmented neutrophils | SEGMEN | 50 - 70 | % |
| Lymphocytes | LIMFOSIT | 25 - 50 | % |
| Monocytes | MONOSIT | 2.0 - 8.0 | % |
| Neutrophil-lymphocyte ratio | NLR1 | <3.12 | |
| Erythrocyte sedimentation rate | LED | 0 - 20 | mm/hour |
| HEMOSTASIS | | | |
| D-Dimer | DDIMER | <0.5 | µg/mL |
| Prothrombin time | PTHSL | 10.80 - 14.40 | second |
| Activated partial thromboplastin time | APTTHSL | 25.00 - 35.00 | second |
| BLOOD CHEMISTRY | | | |
| Arterial blood gas analysis | | | |
| Partial pressure of oxygen | PO2 N | 71.0 - 104.0 | mmHg |
| Oxygen saturation | O2S N | 94.0 - 100.0 | % |
| Liver function | | | |
| Serum glutamic oxaloacetic transaminase | SGOT | <50 | U/L |
| Serum glutamic pyruvic transaminase | SGPT | <50 | U/L |
| Diabetes | | | |
| Random plasma glucose test | GDSFULL | 70 - 180 | mg/dL |
| Kidney function | | | |
| Urea | UREUM | <48 | mg/dL |
| Creatinine | CREAT | 0.70 - 1.30 | mg/dL |
| Cardiac enzymes | | | |
| Lactate dehydrogenase | LDH | 50 - 150 | U/L |

To transform the radiologist's text report into a vector representation, so that it can be used as a feature when training the age prediction model, we used the method get_sentence_vector(). It returns a vector of size 300 for each sentence, obtained by averaging, element-wise, the L2-normalised word vectors or n-gram embeddings [28].
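The averaging described above can be illustrated without the library itself. The sketch below is a simplified stand-in, not the fastText implementation: `sentence_vector`, `toy_vectors`, and the 2-dimensional vectors are hypothetical, and unknown tokens are simply skipped, whereas fastText would build a vector for them from character n-grams. The final line mirrors the reduction to a single number that the paper applies in the steps that follow.

```python
import math
from typing import Dict, List

Vector = List[float]


def sentence_vector(tokens: List[str], word_vectors: Dict[str, Vector], dim: int) -> Vector:
    """Average the L2-normalised vectors of the tokens that have a usable vector."""
    total = [0.0] * dim
    used = 0
    for token in tokens:
        vec = word_vectors.get(token)
        if vec is None:
            continue  # toy lookup only; fastText would fall back to n-grams here
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0.0:
            continue  # a zero vector has no direction, so it is left out
        for i, x in enumerate(vec):
            total[i] += x / norm
        used += 1
    return [x / used for x in total] if used else total


# Hypothetical 2-dimensional vectors; the Indonesian model in the paper has 300 dimensions.
toy_vectors = {"infiltrat": [3.0, 4.0], "kiri": [0.0, 2.0]}
report_vector = sentence_vector(["infiltrat", "kiri"], toy_vectors, dim=2)

# The paper then collapses the vector into one number by summing its elements.
report_feature = sum(report_vector)
print(report_vector, report_feature)
```

Normalising each word vector before averaging keeps words with large-magnitude vectors from dominating the sentence representation. Summing the elements afterwards yields a compact single feature, at the cost of discarding most of the directional information in the embedding.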
To get the sentence vector, we performed the following preprocessing steps for each row in the radiologist report dataset:
– remove all special characters, including newlines,
– calculate the sentence vector representation using the method get_sentence_vector(),
– obtain a vector of length 300, and
– sum all 300 elements of the vector to get a single numerical representation for each record.
For example, for the text shown in Figure 2, the resulting value is -0.74.

2.3. Model training and parameter
For all four AI models, we use only the XGBoost algorithm [29], a tree-based ensemble ML algorithm. We chose this algorithm after observing in several experiments, such as the one we performed in [23], that our data work best with the XGBoost model. The only parameter tuned for both the age and mortality prediction tasks is the maximum depth, which is set to 20 (the default is 3); the remaining parameters were left at their default values. This maximum depth was chosen because, in our experiments, the training performance did not improve beyond a depth of 20. With this depth,
a single experiment took only about 15 seconds in Google Colab (24,629 rows and 29 columns in total were used for training), which is still reasonable. In several experiments, tuning the other parameters brought no significant improvement. Hence, we decided to focus on the maximum depth parameter.

Figure 3. Histogram of patient's age and outcome

2.4. Evaluation technique
The dataset was split in a 60:40 proportion, with 60% for training and 40% for testing. The split is based on the patient ID to ensure that no patient's blood test records are divided between the training and test data. This setting was chosen because the number of blood test records is not the same for all patients. We tried other proportions, such as 70:30 and 80:20; however, in some rounds of experiments, we were left with very few samples in the test set, which biased the estimate of the model's performance. We performed two ML tasks in this study: regression (for age prediction) and classification (for mortality prediction). For the regression task, we used two metrics, the normalised root mean squared error (NRMSE) and the coefficient of determination (R2). The NRMSE is the normalised version of the RMSE, which measures the differences between the predicted and observed values. We use NRMSE instead of RMSE because it makes models on different scales easier to compare. NRMSE is often expressed as a percentage, where a lower value indicates better performance because the residuals are smaller. The other metric is the coefficient of determination (R2). In an ML regression task, this metric measures the performance of the model as the proportion of the variance in the observed values that is predictable.
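The evaluation setup above, a patient-level 60:40 split followed by NRMSE and R2, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' code: all function names, patient IDs, and ages are hypothetical, and the RMSE is normalised by the range of the observed values, since the paper does not state which normalisation (range, mean, or standard deviation) was used.

```python
import math
import random
from typing import Sequence, Set, Tuple


def split_patient_ids(patient_ids: Sequence[str], train_fraction: float = 0.6,
                      seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Assign whole patients, not individual rows, to the train or test set."""
    unique_ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique_ids)
    cut = round(train_fraction * len(unique_ids))
    return set(unique_ids[:cut]), set(unique_ids[cut:])


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def nrmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE divided by the observed range (assumed normalisation)."""
    value_range = max(y_true) - min(y_true)
    if value_range == 0:
        raise ValueError("observed values have zero range")
    return rmse(y_true, y_pred) / value_range


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 minus the ratio of residual to total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical patient ID per blood test row; patients have unequal row counts.
row_patient_ids = ["p1", "p1", "p2", "p3", "p3", "p3", "p4", "p5"]
train_ids, test_ids = split_patient_ids(row_patient_ids)
train_rows = [i for i, pid in enumerate(row_patient_ids) if pid in train_ids]
test_rows = [i for i, pid in enumerate(row_patient_ids) if pid in test_ids]

# Made-up ages, used only to exercise the two metrics.
true_age = [30.0, 50.0, 70.0, 90.0]
predicted_age = [40.0, 50.0, 60.0, 80.0]
print(nrmse(true_age, predicted_age), r2_score(true_age, predicted_age))
```

Because the split is done over patient IDs, the row-level proportion only approximates 60:40 when patients contribute different numbers of rows, which is consistent with the paper's remark that larger training shares sometimes left very few test samples.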
For the second task, mortality prediction as classification, we used four common metrics, namely sensitivity, specificity, accuracy, and F1-score.

3. RESULT AND DISCUSSION
3.1. Predicting age from blood test
In this part, we report the evaluation results of AI model 1 and AI model 2, as shown in Figure 1. The additional feature, the radiologist report, did not significantly improve the model. We repeated the experiment 100 times, and the Mann-Whitney U test for two independent samples gave a p-value of 0.1 (greater than the alpha threshold of 0.05). The results of both models were quite similar. We show the result of the model with the radiologist report in Figure 4. The R2 shown in the figure reaches 33%, meaning that about a third of the variance in age is explained by the model. The Pearson coefficient r shows a positive correlation between the predicted age and the true age (> 0.5). However, when we examined the feature importance, the radiologist report was always among the top 5 most important features of the model. We show the plot of both the true age and predicted age distributions in
Figure 5. It shows that our model predicts the same highest peak as the true age, between 56 and 61 years old. The figure also shows that the distribution of the predicted age is quite similar to that of the true age. The overlapping bars (shaded purple) indicate the same amount of data in that particular bin.

Figure 4. Scatter plot of true age and predicted age

Figure 5. Distribution of predicted age in comparison with true age (histogram of predicted age and true age; x-axis: age value, y-axis: density)

3.2. Enhancing the blood test with age for mortality prediction
3.2.1. Correlation matrix
The correlation matrix of the top 15 features in the dataset is shown in Figure 6. A darker color indicates a higher correlation. Several blood items have a relatively high correlation with both actual and predicted age, such as SEGMEN, UREUM, MCH, MCV, and LED. UREUM has a higher correlation with the predicted age than with the actual age. The true age and the predicted age agree on their correlation with the other features; however, the predicted age has slightly higher correlation scores.

3.2.2. Mortality proportion in predicted age
We explore the proportion of patients who actually died in each age range. The result is shown in Figure 7. As each bar in the figure shows, the predicted age captures more of the patients who died in the older range, i.e. older than 70 years. This result suggests that age alone cannot be used as a simple predictor of mortality. With the actual age, only 35.86% of the patients in the age-75 bin actually died, while with our predicted age the proportion is 53.50%. The red bars show the dead status, and the yellow bars show the survived status.
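The per-bin proportion discussed above amounts to grouping patients by age range and dividing deaths by patients in each group. The sketch below shows one way to compute it; the function name, the 10-year bin width, and the records are hypothetical, since the paper does not state the bin edges used for Figure 7. The same function could be applied once with true ages and once with predicted ages to compare the two.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def mortality_share_by_bin(records: Iterable[Tuple[float, bool]],
                           bin_width: int = 10) -> Dict[int, float]:
    """Map each age bin (keyed by its lower edge) to the share of patients who died."""
    counts: Dict[int, List[int]] = defaultdict(lambda: [0, 0])  # [deaths, patients]
    for age, died in records:
        bin_start = int(age // bin_width) * bin_width
        counts[bin_start][0] += int(died)
        counts[bin_start][1] += 1
    return {b: deaths / total for b, (deaths, total) in sorted(counts.items())}


# Made-up (age, died) pairs, used only to exercise the function.
toy_records = [
    (72.0, True), (74.5, False), (78.0, True), (79.0, False),
    (45.0, False), (48.0, False), (41.0, True), (43.0, False),
]
shares = mortality_share_by_bin(toy_records)
print(shares)
```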
The proportion of dead outcomes with the predicted age (top figure) is higher than with the true age for ages above 70 years, with the exception of the 85-year age bin, where the percentage in the dataset is not high.

3.2.3. Adding predicted age as additional feature
In this part, we report the result of comparing AI model 3 and AI model 4, as shown in Figure 1. We ran an experiment using each blood feature and added the predicted age as an additional feature to classify the mortality outcome of each patient. Our results show that the predicted age can increase the performance of the model for 18 blood items. We repeated the experiment 30 times and compared the result with that of the model with the true age added. We measured accuracy, sensitivity, specificity, and F1-score and calculated the p-value to determine whether adding the predicted age is significantly better than adding the true age for predicting mortality. The blood items for which adding the predicted age gave a significant result (p-value < 0.05) are shown in Table 3. The table shows that all the classification measures are very good except sensitivity. This is due to the number of mortality cases in our dataset, which is very low compared with the number of survived cases.
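The four classification measures named above all follow from the confusion-matrix counts, and a small example makes clear why sensitivity can lag behind the others on an imbalanced dataset. The sketch below is illustrative only: the function name and the labels are hypothetical, 1 stands for a death and 0 for a survivor, and F1 is computed for the positive class, whereas the paper does not say which F1 averaging it reports.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, accuracy, and positive-class F1 from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        return num / den if den else 0.0  # avoid dividing by zero on empty classes

    sensitivity = ratio(tp, tp + fn)  # share of deaths that were detected
    specificity = ratio(tn, tn + fp)  # share of survivors correctly identified
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * tp, 2 * tp + fp + fn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": ratio(tp + tn, len(y_true)),
        "precision": precision,
        "f1": f1,
    }


# Made-up labels: 3 deaths among 10 patients, only one of them detected.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case accuracy and specificity stay fairly high because most patients are survivors, while sensitivity is low because two of the three deaths are missed, which is the same pattern the paper attributes to the small number of mortality cases.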
Figure 6. Correlation matrix of the top 15 features for mortality

Figure 7. Predicted age vs true age in predicting mortality

Table 3. List of blood items which show a significant difference after adding predicted age
| Blood item | Accuracy | Sensitivity | Specificity | F1 |
|---|---|---|---|---|
| HCT | 0.82 | 0.30 | 0.87 | 0.84 |
| ERI | 0.79 | 0.30 | 0.84 | 0.82 |
| RDW | 0.80 | 0.28 | 0.85 | 0.83 |
| MCHC | 0.85 | 0.25 | 0.91 | 0.86 |
| MONOSIT | 0.86 | 0.35 | 0.91 | 0.87 |
| NLR1 | 0.83 | 0.44 | 0.86 | 0.85 |
| APTTHSL | 0.80 | 0.36 | 0.84 | 0.83 |
| PO2 N | 0.82 | 0.40 | 0.86 | 0.85 |
| O2S N | 0.80 | 0.32 | 0.84 | 0.83 |
| DDIMER | 0.82 | 0.30 | 0.87 | 0.84 |
| LEKO | 0.80 | 0.35 | 0.84 | 0.83 |
| MCH | 0.86 | 0.27 | 0.91 | 0.86 |
| SGOT | 0.85 | 0.48 | 0.88 | 0.87 |
| SGPT | 0.82 | 0.41 | 0.86 | 0.85 |
| PTHSL | 0.80 | 0.37 | 0.84 | 0.83 |
| LIMFOSIT | 0.84 | 0.43 | 0.88 | 0.86 |
| GDSFULL | 0.80 | 0.37 | 0.84 | 0.83 |
4. CONCLUSION
Studies on predicting patient age are still limited. We explored several reasons why such prediction is needed when dealing with AI models: it is not only part of an effort to produce a robust AI model, but it can also improve the performance of the model itself. In this study, we showed the importance of predicting age from patient laboratory results. We conducted several experiments and showed the pipeline by which ML can predict age from the given dataset. Our results show that, with the predicted age as an additional feature in the mortality classification task, the model is significantly improved compared with adding the actual age. In the future, we want to explore combining other patient electronic health records and try other ML algorithms.

ETHICAL CLEARANCE
The patients' medical records used in this study were collected by the data provider, including epidemiological, demographic, clinical, laboratory and mortality outcome information. This study has been approved by the Ethics Committee of the data provider, Pasar Minggu Regional Hospital Jakarta, Indonesia. The requirement for patient consent was waived as this was a secondary analysis of anonymized data.

REFERENCES
[1] T. G. Dietterich, “Steps toward robust artificial intelligence,” AI Magazine, vol. 38, no. 3, pp. 3–24, 2017, doi: 10.1609/aimag.v38i3.2756.
[2] A. E. Öztaş, D. Boncukcu, E. Ozteke, M. Demir, A. Mirici, and P. Mutlu, “Covid-19 diagnosis: comparative approach between chest x-ray and blood test data,” in 2021 6th International Conference on Computer Science and Engineering (UBMK), 2021, pp. 472–477, doi: 10.1109/UBMK52708.2021.9558969.
[3] T. Asai, “COVID-19: accurate interpretation of diagnostic tests—a statistical point of view,” Journal of Anesthesia, vol. 35, no. 3, pp. 328–332, 2021, doi: 10.1007/s00540-020-02875-8.
[4] Z.
Li et al., “Development and clinical application of a rapid IgM-IgG combined antibody test for SARS-CoV-2 infection diagnosis,” Journal of Medical Virology, vol. 92, no. 9, pp. 1518–1524, 2020, doi: 10.1002/jmv.25727.
[5] D. Ferrari, A. Motta, M. Strollo, G. Banfi, and M. Locatelli, “Routine blood tests as a potential diagnostic tool for COVID-19,” Clinical Chemistry and Laboratory Medicine, vol. 58, no. 7, pp. 1095–1099, 2020, doi: 10.1515/cclm-2020-0398.
[6] F. Bismadhika, N. N. Qomariyah, and A. A. Purwita, “Experiment on deep learning models for covid-19 detection from blood testing,” in 2021 IEEE International Biomedical Instrumentation and Technology Conference (IBITeC), Oct. 2021, pp. 136–141, doi: 10.1109/IBITeC53045.2021.9649254.
[7] F. Soares, “A novel specific artificial intelligence-based method to identify COVID-19 cases using simple blood exams,” medRxiv, pp. 1–16, 2020, doi: 10.1101/2020.04.10.20061036.
[8] M. AlJame, I. Ahmad, A. Imtiaz, and A. Mohammed, “Ensemble learning model for diagnosing COVID-19 from routine blood tests,” Informatics in Medicine Unlocked, vol. 21, 2020, doi: 10.1016/j.imu.2020.100449.
[9] F. Cabitza et al., “Development, evaluation, and validation of machine learning models for COVID-19 detection based on routine blood tests,” Clinical Chemistry and Laboratory Medicine, vol. 59, no. 2, pp. 421–431, 2021, doi: 10.1515/cclm-2020-1294.
[10] D. Brinati, A. Campagner, D. Ferrari, M. Locatelli, G. Banfi, and F. Cabitza, “Detection of covid-19 infection from routine blood exams with machine learning: a feasibility study,” Journal of Medical Systems, vol. 44, no. 8, pp. 1–12, 2020.
[11] S. B. Rikan, A. S. Azar, A. Ghafari, J. B. Mohasefi, and H. Pirnejad, “COVID-19 diagnosis from routine blood tests using artificial intelligence techniques,” Biomedical Signal Processing and Control, vol. 72, 2022, doi: 10.1016/j.bspc.2021.103263.
[12] J. Luo, L. Zhou, Y. Feng, B. Li, and S.
Guo, “The selection of indicators from initial blood routine test results to improve the accuracy of early prediction of COVID-19 severity,” PLoS ONE, vol. 16, 2021, doi: 10.1371/journal.pone.0253329.
[13] H. Ko et al., “An artificial intelligence model to predict the mortality of COVID-19 patients at hospital admission time using routine blood samples: Development and validation of an ensemble model,” Journal of Medical Internet Research, vol. 22, no. 12, 2020, doi: 10.2196/25442.
[14] N. N. Qomariyah, A. A. Purwita, S. D. A. Asri, and D. Kazakov, “A tree-based mortality prediction model of covid-19 from routine blood samples,” in 2021 International Conference on ICT for Smart Society (ICISS), 2021, pp. 1–7, doi: 10.1109/ICISS53185.2021.9533219.
[15] A. N. Ashadi, A. A. Purwita, and N. N. Qomariyah, “Combating bias in covid-19 disease detection using synthetic annotations on chest x-ray images,” in 2021 IEEE International Biomedical Instrumentation and Technology Conference: The Improvement of Healthcare Technology to Achieve Universal Health Coverage, IBITeC 2021, 2021, pp. 88–92, doi: 10.1109/IBITeC53045.2021.9649129.
[16] New York State, “Age-adjusted rates - statistics teaching tools.” Department for Health New York State, 1999. [Online]. Available: https://www.health.ny.gov/diseases/chronic/ageadj.htm
[17] K. Urban, K. Kirley, and J. J. Stevermer, “PURLs: It’s time to use an age-based approach to D-dimer,” The Journal of Family Practice, vol. 63, no. 3, pp. 155–156, 2014.
[18] Z. Wang, L. Li, B. S. Glicksberg, A. Israel, J. T. Dudley, and A. Ma’ayan, “Predicting age by mining electronic medical records with deep learning characterizes differences between chronological and physiological age,” Journal of Biomedical Informatics, vol. 76, pp. 59–68, 2017, doi: 10.1016/j.jbi.2017.11.003.
[19] A. Karargyris, S. Kashyap, J. T. Wu, A. Sharma, M. Moradi, and T. S.
-Mahmood, “Age prediction using a large chest x-ray dataset,” in Medical Imaging 2019: Computer-Aided Diagnosis, 2019, doi: 10.1117/12.2512922.
[20] Q. Ruan, K. Yang, W. Wang, L. Jiang, and J. Song, “Clinical predictors of mortality due to COVID-19 based on an analysis of data of 150 patients from Wuhan, China,” Intensive Care Medicine, vol. 46, no. 5, pp. 846–848, 2020, doi: 10.1007/s00134-020-05991-x.
[21] A. Vaid et al., “Machine learning to predict mortality and critical events in a cohort of patients with covid-19 in new york city: model development and validation,” Journal of Medical Internet Research, vol. 22, no. 11, 2020, doi: 10.2196/24018.
[22] C. Solomou and D. Kazakov, “Utilizing chest x-rays for age prediction and gender classification,” in 2021 4th International Seminar on Research of Information Technology and Intelligent Systems (ISRITI), 2021, pp. 356–361, doi: 10.1109/ISRITI54043.2021.9702796.
[23] N. N. Qomariyah, A. A. Purwita, M. S. Astriani, S. D. A. Asri, and D. Kazakov, “An XGBoost model for age prediction from covid-19 blood test,” in 2021 4th International Seminar on Research of Information Technology and Intelligent Systems, ISRITI 2021, 2021, pp. 446–452, doi: 10.1109/ISRITI54043.2021.9702867.
[24] “fastText.” fastText, 2024. [Online]. Available: https://fasttext.cc/index.html
[25] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” in LREC 2018 - 11th International Conference on Language Resources and Evaluation, 2019, pp. 3483–3487.
[26] N. Alghamdi and F. Assiri, “A comparison of fasttext implementations using arabic text classification,” Advances in Intelligent Systems and Computing, vol. 1038, pp. 306–311, 2020, doi: 10.1007/978-3-030-29513-4_21.
[27] Z. S. Ritu, N. Nowshin, M. M. H. Nahid, and S. Ismail, “Performance analysis of different word embedding models on Bangla language,” in 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), 2018, pp. 1–5, doi: 10.1109/ICBSLP.2018.8554681.
[28] A. Joulin, E. Grave, P.
Bojanowski, and T. Mikolov, “Bag of tricks for efficient text classification,” in 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, 2017, vol. 2, pp. 427–431, doi: 10.18653/v1/e17-2068.
[29] T. Chen and C. Guestrin, “XGBoost: A scalable tree boosting system,” in Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016, pp. 785–794, doi: 10.1145/2939672.2939785.

BIOGRAPHIES OF AUTHORS
Dr. Nunung Nurul Qomariyah is an Assistant Professor at BINUS University International, Jakarta, Indonesia. She is a former member of the Artificial Intelligence Research Group, University of York, UK. Her Ph.D. topic is in the area of recommender systems, particularly in e-commerce user preference learning from pairwise comparisons. Her current research focuses on developing an explainable AI model for COVID-19 patients, which is co-funded by Newton British Council in collaboration with the Indonesian Ministry of Research and Education. She can be contacted at email: [email protected].

Dr. Dimitar Kazakov is a reader in Computer Science at the University of York, UK and a member of his Departmental Artificial Intelligence Group. His research encompasses the development of machine learning (ML) and evolutionary algorithms and their applications to natural language processing, real-time systems, intelligent agents, function optimisation, and financial forecasting. He has published over 120 peer-reviewed articles, supervised 7 and co-supervised another 3 Ph.D. students to completion. He is currently leading a research team of 6 Ph.D. students. He is a former Vice-Chair of the UK Society for the Study of Artificial Intelligence and Simulation of Behaviour (AISB). He can be contacted at email: [email protected].