*Correspondence: [email protected]
1 School of Computer Engineering, KIIT University, Patia, Bhubaneswar, Odisha 751024, India
2 Department of Environmental Health, Harvard T H Chan School of Public Health, 677 Harrington
Avenue, Boston, MA 02115, USA
3 Department of Computer Sciences, College of Computer and Information Sciences, Princess
Nourah bint Abdulrahman University, P.O. Box 84428, 11671 Riyadh, Saudi Arabia
4 Electrical Engineering Department, College of Engineering, King Khalid University, 61421
Abha, Saudi Arabia
5 Radiological Sciences Department, College of Applied Medical Sciences, King Khalid
University, 61421 Abha, Saudi Arabia
6 BioImaging Unit, Space Research Centre, University of Leicester, Michael Atiyah Building,
Leicester LE1 7RH, UK
7 PRINCE Laboratory Research, ISITcom, Hammam Sousse, University of Sousse, Sousse, Tunisia
Abstract
Stroke prediction remains a critical area of research in healthcare, aiming to enhance
early intervention and patient care strategies. This study investigates the efficacy of
machine learning techniques, particularly principal component analysis (PCA) and a stacking
ensemble method, for predicting stroke occurrences based on demographic, clinical, and
lifestyle factors. We systematically varied PCA components and implemented a stacking model
comprising random forest, decision tree, and K-nearest neighbors (KNN). Our findings
demonstrate that setting PCA components to 16 optimally enhanced predictive accuracy,
achieving a remarkable 98.6% accuracy in stroke prediction. Evaluation metrics underscored
the robustness of our approach in handling class imbalance and improving model performance,
also comparative analyses against traditional machine learning algorithms such as SVM,
logistic regression, and Naive Bayes highlighted the superiority of our proposed method.
Keywords: Stroke prediction, Machine learning, Principal component analysis (PCA),
Stacking ensemble, Healthcare analytics, Predictive modeling, Class imbalance, Feature
selection, Early intervention
Introduction
The global population’s growth has coincided with a concerning surge in cases of brain
strokes, leading to a notable increase in annual fatalities by 2023. With the number
of stroke-related deaths on the rise, the imperative to address this crisis has become
increasingly urgent. This alarming trend has propelled stroke research to the forefront of
medical exploration.
Machine learning algorithms have shown promise in revolutionizing stroke prediction
by analyzing extensive datasets encompassing demographic information, medical
histories, and physiological markers like age, blood pressure, and glucose levels [1, 2].
However, the deployment of these algorithms in clinical settings presents challenges that
must be addressed. One significant concern is the potential bias embedded within training
data, which can lead to skewed predictions and inequitable healthcare outcomes [3].
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits
use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original
author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third
party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the
material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or
exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit
http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Chakraborty et al. BMC Bioinformatics (2024) 25:329 Page 2 of 23
Motivation
We propose a pioneering approach to stroke prediction, leveraging advanced machine
learning techniques and introducing a novel stacking methodology. Our research stands
out for its innovative contribution in showcasing the robust performance of this stacking
technique across a spectrum of crucial healthcare metrics. We demonstrate the poten-
tial of our proposed approach, thereby enhancing patient outcomes and healthcare man-
agement strategies.
Literature survey
Stroke prediction research has witnessed significant advancements through the appli-
cation of machine learning (ML) techniques, contributing to improved accuracy and
timely interventions. This review synthesizes findings from recent studies focusing on
ML approaches for stroke prediction, emphasizing algorithmic performance, feature
selection methodologies, model interpretability, and key results.
In [15], an innovative stroke detection algorithm is presented, employing various
ML classifiers such as Naïve Bayes, logistic regression, XgBoost, and support vector
machines (SVM). Notably, the support vector machine algorithm outperformed other
models, achieving exceptional accuracy (98.6%) and precision (99.9%). However, the
paper lacks explicit discussions on feature selection and data preprocessing strategies.
In [16], researchers develop an ML-based stroke prediction algorithm utilizing readily
available data from patients’ hospital presentations and investigating the impact of social
determinants of health (SDoH) variables. The study reports high sensitivity and reason-
able specificity of the ML stroke prediction algorithm, with significant improvements
observed upon the inclusion of individual-level SDoH features. Importantly, experimen-
tal results demonstrate consistent outperformance of ML classifiers over logistic regres-
sion, with AUC improvements from 0.694 to 0.823 with the inclusion of SDoH features.
Moreover, [17] employs logistic regression (LR) with recursive feature elimination (RFE)
to predict stroke and Transient Ischemic Attack (TIA) diagnosis, highlighting the pre-
dictive utility of patient-reported symptoms. ML techniques achieve impressive per-
formance metrics, with AUC exceeding 0.94 for stroke outcome prediction and notable
enhancements upon incorporating follow-up data.
In [18], the stacking classification method emerges as a superior approach, showcas-
ing high performance across multiple metrics, including an impressive AUC of 98.9%
and an accuracy of 98%. The study underscores the efficacy of the stacking ensemble
method, comprising base classifiers such as naive Bayes and random forests, with a
logistic regression meta-classifier.
Additionally, [19] explores the interpretability of ML models for stroke prediction
using SHAP and LIME techniques. Notably, Random Forest emerges as the top-per-
forming algorithm with an accuracy score of 90.36%, followed closely by the XGB Classi-
fier with an accuracy score of 89.02% [20–22].
In [23], machine learning (ML) is applied to predict early signs of ischemic stroke in
emergency settings, although its predictive performance, measured by the area under the
receiver operating characteristic curve (AUC), remains limited. The study highlights the XGBoost-based
model’s superior predictive power for pre-screening ischemic stroke, particularly
Reference | Models | Reported performance | Key findings
[15] | Naïve Bayes, Logistic Regression, XgBoost, SVM | SVM: 98.6% | SVM achieved the highest accuracy and precision (99.9%), highlighting its robustness.
[16] | Various ML classifiers, Logistic Regression | AUC: 0.694 to 0.823 | Inclusion of SDoH features significantly improved AUC, showing the importance of these variables.
[17] | Logistic Regression with RFE | AUC: >0.94 | Recursive feature selection and follow-up data incorporation enhanced predictive utility.
[18] | Stacking (Naïve Bayes, Random Forests, LR) | AUC: 98.9%, Accuracy: 98% | Stacking method demonstrated superior performance across multiple metrics.
[19] | Random Forest, XGBoost | Random Forest: 90.36%, XGBoost: 89.02% | SHAP and LIME techniques enhanced interpretability, with Random Forest performing best.
[23] | XGBoost | Highest Accuracy | XGBoost showed superior predictive power for pre-screening ischemic stroke.
[24] | DeepSurv, Deep-Survival-Machines | Enhanced Predictive Accuracy | Deep learning models surpassed traditional survival models for predicting MACEs after AIS.
Aim
This research aims to develop a new approach to the predictive analysis of ischemic
brain stroke with machine learning techniques. Initially, the study focuses on utilizing
preference algorithms to discern the key traits using several machine learning techniques
such as logistic regression, support vector machine, decision tree, and K-nearest
neighbor. We utilized PCA to reduce the dimensionality of the dataset.
The contributions of our study are as follows:
The subsequent sections of this paper are organized as follows: in Sect. 2, we elaborate
on the feature selection method and classifier. Following that, in Sect. 3, we present
the experiments and results of our study, including a comparative analysis of our
proposed model with other state-of-the-art methods.
Methodology
Dataset
This dataset from Kaggle includes 5110 patients, with attributes such as gender, age,
presence of hypertension, history of heart disease, marital status, type of work, resi-
dence type, average glucose level, body mass index (BMI), smoking status, and stroke
occurrence. The gender attribute is categorical, the age is numerical, and hypertension
and heart disease are binary indicators (1 for yes, 0 for no). Marital status is recorded
as either married or not married, while work type categories include government job,
never worked, private, self-employed, and children. Residence type is categorized as
urban or rural. Average glucose level and BMI are continuous variables, and smoking
status is categorized as never smoked, formerly smoked, or smokes. The target
variable is stroke occurrence, also a binary indicator (1 for stroke, 0 for no stroke).
Table 2 gives a comprehensive explanation of every column.
To rectify dataset imbalances and bolster model accuracy, we implement oversam-
pling techniques. We aim to equalize representation across classes by increasing the
number of instances in the minority class (stroke) to match that of the majority class
(no stroke). Post-oversampling, both classes comprise 4861 cases each, ensuring a
balanced dataset for training and testing. Figure 1 shows the stroke class distribution
before and after oversampling.
We use the following features from the stroke prediction dataset, which is publicly
available on Kaggle. Table 2 provides a detailed description of each feature.
Fig. 1 Distribution of stroke and no stroke cases before and after oversampling
Table 2 Dataset features summary: feature, description, data type, and additional information
Feature Description Data type Additional information
Figures 2 and 3 depict the prevalence of heart disease and hypertension among par-
ticipants who have experienced a stroke. In both figures, a significant proportion of
participants who have had a stroke do not have a diagnosis of hypertension or heart
disease.
Figures 4 and 5 display the prevalence of residence type and work type among partici-
pants who have experienced a stroke. These figures highlight that a significant propor-
tion of participants who have had a stroke reside in urban areas and have a private work
type.
Figures 6 and 7 display the prevalence of glucose level and smoking level among par-
ticipants who have experienced a stroke.
Figure 8 displays the correlation among various features. The figure provides valuable
insights into the interplay and potential dependencies among these attributes, which are
crucial for understanding the underlying patterns and dynamics within the dataset.
Data pre‑processing
Ensuring the quality of raw data is crucial for the accuracy of our final predictions,
particularly in the presence of missing values and noisy data. Therefore, our research
emphasizes the necessity of data preprocessing to enhance the appropriateness of the
data for analysis. This preprocessing involves several steps, including the reduction of
redundant values, feature selection, and data discretization.
An integral part of our data preprocessing strategy is addressing class imbalance, a
common challenge in predictive modeling. To tackle this issue, we employ the Syn-
thetic Minority Over-sampling Technique (SMOTE) within our proposed framework.
By oversampling the minority class, specifically the ’stroke’ participants, we aim to
achieve a more balanced distribution, thereby preventing biases in the predictive
model. We addressed missing values within the BMI column by imputing them with
the median value. This method ensures that the dataset remains robust and complete
for subsequent analysis.
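The two steps described above, median imputation and SMOTE-style oversampling, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `impute_median` and `smote_oversample`, the toy rows, and the choice of `k` are all hypothetical, and in practice a library implementation such as the `SMOTE` class in imbalanced-learn would normally be used instead.

```python
import random
import statistics


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def smote_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority rows by squared Euclidean distance to the base row.
        others = [row for row in minority if row is not base]
        others.sort(key=lambda row: sum((a - b) ** 2 for a, b in zip(base, row)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical toy data: None marks a missing BMI value; rows are (age, bmi).
bmi = impute_median([28.1, None, 31.4, 24.0, None])
majority = [[30.0, 22.0], [41.0, 25.5], [35.0, 27.0],
            [52.0, 24.1], [47.0, 29.3], [29.0, 21.7]]
minority = [[67.0, 31.2], [72.0, 28.4], [80.0, 33.0]]
n_missing = len(majority) - len(minority)
minority_balanced = minority + smote_oversample(minority, n_missing, k=2)
```

Each synthetic row lies on the line segment between a minority row and one of its nearest minority neighbours, which is why the technique adds variety rather than exact duplicates.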
Figure 9 shows the end-to-end flowchart of the preprocessing.
Feature selection
This study explores the impact of varying the number of components in Principal Com-
ponent Analysis (PCA) on the accuracy of stroke prediction models. By systematically
adjusting the value of n from 1 to 16, we observed that the majority of models exhib-
ited the highest accuracy when n was set to 16. Building upon this observation, we pro-
ceeded to implement a stacking ensemble method. In this approach, we combined the
predictions from the three best-performing models: Random Forest and Decision Tree
as base estimators, and K-Nearest Neighbors (KNN) as the final estimator.
Upon applying the stacking ensemble technique, we achieved a remarkable accuracy
of 98.6%. This significant improvement underscores the efficacy of combining comple-
mentary strengths from multiple models to enhance predictive performance.
This research aims to compare the performance of various machine learning classi-
fiers in predicting stroke occurrences after dimensionality reduction using PCA. We uti-
lized PCA to reduce the dimensionality of the dataset and then trained several classifiers
including Random Forest, SVM, XGBoost, Naive Bayes, KNN, Logistic Regression, and
Decision Tree on the transformed data.
Before training the models, we conducted data preprocessing steps including han-
dling missing values (replacing them with the median value for BMI), feature scaling,
and splitting the data into training and testing sets. Each classifier was evaluated using
accuracy scores, F1 scores, precision, and recall which were computed by comparing the
model predictions with the actual labels in the test set.
The results of our analysis are presented in a data frame, showcasing the accuracy of
each classifier for different numbers of PCA components. Some key risk factors can be
identified as:
(a) Age: Older age significantly increases the risk of ischemic stroke.
(b) Hypertension: High blood pressure is a major risk factor.
(c) Diabetes: Diabetes mellitus is strongly associated with an increased risk.
(d) Smoking: Tobacco use contributes to the risk of stroke.
(e) Cholesterol levels: High levels of LDL cholesterol can lead to stroke.
(f) Cardiovascular diseases: Conditions like atrial fibrillation and heart failure are critical
predictors.
(g) Lifestyle factors: Physical inactivity, poor diet, and obesity are important considera-
tions.
(h) Genetic factors: Family history and specific genetic markers can also be significant.
These factors are typically integrated into machine learning models to enhance the
prediction accuracy of ischemic stroke outcomes.
The findings demonstrate that the performance of the classifiers varies with the num-
ber of PCA components, with certain classifiers exhibiting better accuracy than others.
This information can guide the selection of an appropriate classifier for stroke prediction
tasks based on the desired trade-off between computational complexity and predictive
accuracy.
Classification
In our research paper, we’ve employed cutting-edge classification techniques to predict
and mitigate the risk of stroke occurrences. Stacking, a sophisticated ensemble learning
method, has been at the forefront of our approach, allowing us to amalgamate insights
from base classifiers. This innovative fusion of classifiers has enabled us to discern intri-
cate patterns and relationships in patient data, enhancing the precision and reliability of
our predictive models.
Our methodology involved training a diverse ensemble of classifiers on a comprehensive
dataset. These classifiers, acting as the foundation, have collectively contributed to our
understanding of stroke risk factors and prediction accuracy. Through iterative refine-
ment and model aggregation facilitated by stacking, we’ve strived to push the bounda-
ries of stroke prediction, aiming for more personalized healthcare interventions and
improved patient outcomes.
Technical details
Principal component analysis (PCA): PCA was employed for dimensionality reduction. The
data were standardized, the covariance matrix was computed, and the data were projected
onto the principal components needed to retain 95% of the variance.
PCA assumes linear relationships among features and is best suited to approximately
Gaussian data, assumptions that may not always hold. This dimensionality reduction
technique offers several specific benefits in stroke prediction, which provide valuable
insights to medical professionals. They are listed below:
(1) Variance capture: PCA identifies and retains the components that explain the high-
est variance in the data, ensuring that the most informative aspects are prioritized.
(2) Noise reduction: By filtering out less significant components, PCA reduces noise,
which helps in making the prediction model more robust and accurate.
(3) Multicollinearity handling: PCA transforms correlated features into uncorrelated
principal components, addressing issues of multicollinearity that can affect model
performance.
(4) Simplification: It reduces the complexity of the dataset by lowering the number of
features, which simplifies the model and enhances computational efficiency.
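The PCA steps named above (standardization, covariance matrix, projection, retained variance) can be written out as follows. This is the textbook formulation, offered as a sketch: the symbols $N$ (number of samples), $p$ (number of features), $K$ (number of retained components), $\mu_j$ and $\sigma_j$ (mean and standard deviation of feature $j$) are notation introduced here, not taken from the paper.

```latex
\begin{aligned}
z_{ij} &= \frac{x_{ij} - \mu_j}{\sigma_j}
  && \text{standardize feature } j \\
\Sigma &= \frac{1}{N-1}\, Z^{\top} Z
  && \text{covariance matrix of the standardized data} \\
\Sigma v_k &= \lambda_k v_k, \qquad \lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p \ge 0
  && \text{eigenvectors are the principal directions} \\
T &= Z \,[\, v_1 \;\cdots\; v_K \,]
  && \text{project onto the first } K \text{ components} \\
\frac{\sum_{k=1}^{K} \lambda_k}{\sum_{k=1}^{p} \lambda_k} &\ge 0.95
  && \text{smallest } K \text{ retaining 95\% of the variance}
\end{aligned}
```

Because the eigenvectors $v_k$ are mutually orthogonal, the columns of $T$ are uncorrelated, which is the multicollinearity property listed in item (3).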
• Random forest: 500 trees, criterion = ‘entropy’, max depth = None, min samples
leaf = 1, min samples split = 5
• Decision tree: criterion = ‘entropy’, max depth = None, min samples leaf = 1, min
samples split = 5
• Base classifier training: Each base classifier was independently trained on the
training dataset.
• Level 1 data generation: Predictions from base classifiers were used to generate
a new dataset, serving as input for the meta-classifier. This involved performing
5-fold cross-validation on the training set to avoid overfitting.
• Meta-classifier (final estimator): K-nearest neighbors (KNN) with 5 neighbors and
Euclidean distance metric.
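The configuration listed above maps closely onto scikit-learn's `StackingClassifier`. The sketch below is an illustration under assumptions, not the authors' code: it uses a synthetic dataset from `make_classification` in place of the stroke data, fixes `random_state` for repeatability, and places standardization and a 16-component PCA in front of the stack as one plausible reading of the pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed stroke data (assumption, not the real dataset).
X, y = make_classification(n_samples=400, n_features=20, n_informative=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Base classifiers with the hyperparameters listed in the text.
base_estimators = [
    ("rf", RandomForestClassifier(n_estimators=500, criterion="entropy", max_depth=None,
                                  min_samples_leaf=1, min_samples_split=5, random_state=0)),
    ("dt", DecisionTreeClassifier(criterion="entropy", max_depth=None,
                                  min_samples_leaf=1, min_samples_split=5, random_state=0)),
]

stack = StackingClassifier(
    estimators=base_estimators,
    final_estimator=KNeighborsClassifier(n_neighbors=5, metric="euclidean"),
    cv=5,  # out-of-fold base predictions become the level-1 training data
)

# Assumed ordering: scale, reduce to 16 principal components, then stack.
model = make_pipeline(StandardScaler(), PCA(n_components=16), stack)
model.fit(X_train, y_train)
val_accuracy = model.score(X_val, y_val)
```

Passing `cv=5` makes the meta-classifier train on predictions the base models produced for folds they had not seen, which is the overfitting safeguard described in the level 1 data generation step.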
Training and evaluation: The dataset was split into 80% training and 20% validation
sets. Fivefold cross-validation was performed to tune hyperparameters and evaluate
each classifier’s performance. Metrics such as accuracy, precision, recall, and F1-score
were used to assess the stacking classifier’s effectiveness.
This comprehensive and detailed approach ensures robust and accurate stroke risk
predictions, paving the way for personalized healthcare interventions and improved
patient outcomes.
data visualization; TensorFlow 2.6.0 and Keras 2.6.0 for deep learning models.
Development environment: Jupyter Notebook and PyCharm. These specifications provide
a baseline for replicating our study and further developing the predictive model for
ischemic stroke.
Evaluation metrics
In our investigation into predicting ischemic stroke occurrences, we evaluated the per-
formance of our predictions by comparing them against actual data using predefined
metrics. The dataset encompasses diverse patient characteristics pertinent to stroke
prognosis.
Evaluation metrics are critical for analyzing the performance of classification models.
Accuracy is the proportion of correctly classified cases among all cases, providing a broad
measure of model performance. Precision is the fraction of true positives among all positive
predictions, indicating how reliable positive predictions are. Recall is the fraction of
true positives among all actual positive cases, demonstrating the model’s capacity to detect
positives. Specificity is the proportion of true negatives among all actual negative cases,
demonstrating the model’s ability to identify negatives correctly. The F1-score, the harmonic
mean of precision and recall, gives a balanced assessment that is especially useful when
class distributions are uneven. These measurements provide insights into a model’s strengths
and limitations, helping to maximize efficiency and choose suitable models for classification
tasks.
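As a worked illustration of these definitions, the metrics can be computed directly from the four confusion-matrix counts. The function name `classification_metrics` and the counts in the example are hypothetical; they are not the values behind Fig. 10.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical counts: 90 true positives, 10 false positives,
# 85 true negatives, 15 false negatives.
metrics = classification_metrics(tp=90, fp=10, tn=85, fn=15)
```

With these counts, accuracy is 175/200 = 0.875 and precision is 90/100 = 0.9, while recall is 90/105, so the F1-score sits between the two at 180/205.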
• Advantages
• Disadvantages
Fig. 10 Confusion matrix of predicted versus actual classes of our proposed method
TOPSIS analysis
The technique for order of preference by similarity to ideal solution (TOPSIS) is a
method used for ranking and selection of alternatives based on their closeness to the
ideal solution. The following subsections outline the steps involved in applying the TOP-
SIS method.
\[
r_{ij} = \frac{x_{ij}}{\sqrt{\sum_{i=1}^{m} x_{ij}^{2}}} \tag{1}
\]
where $r_{ij}$ is the normalized value, $x_{ij}$ is the original value, $i$ is the index of the alternative,
and $j$ is the index of the criterion.
Refer to Table 5 for the normalized decision matrix.
• Accuracy: 1
• Precision: 1
• Recall: 1
• F1-score: 1
• Accuracy: 0
• Precision: 0
• Recall: 0
• F1-score: 0
KNN 0.883
NN 0.709
RF 0.709
SVM-L 0.137
SVM-R 0.547
ADA 0.640
MNB 0.664
Proposed method 0.984
\[
D_i^{-} = \sqrt{\sum_{j=1}^{n} \left( r_{ij} - r_j^{-} \right)^{2}} \tag{3}
\]
where $D_i^{+}$ is the distance to the ideal solution, $D_i^{-}$ is the distance to the anti-ideal solution,
$r_{ij}$ is the normalized value of the $i$-th alternative and $j$-th criterion, $r_j^{+}$ is the ideal
value for the $j$-th criterion, and $r_j^{-}$ is the anti-ideal value for the $j$-th criterion.
Refer to Table 6 for the Euclidean distances.
\[
C_i = \frac{D_i^{-}}{D_i^{+} + D_i^{-}} \tag{4}
\]
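The TOPSIS steps can be sketched end to end in a few lines of plain Python. This is a minimal illustration under assumptions: the function name `topsis` and the decision matrix are hypothetical, every criterion is treated as a benefit criterion with equal weight, and the ideal and anti-ideal vectors default to the column maximum and minimum of the normalized matrix unless they are passed explicitly.

```python
import math


def topsis(matrix, ideal=None, anti_ideal=None):
    """Return the closeness coefficient C_i for each alternative (row)."""
    n_criteria = len(matrix[0])
    # Eq. (1): vector normalization of each criterion column.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_criteria)]
    r = [[row[j] / norms[j] for j in range(n_criteria)] for row in matrix]
    # Assumption: all criteria are benefit criteria, so larger is better.
    if ideal is None:
        ideal = [max(row[j] for row in r) for j in range(n_criteria)]
    if anti_ideal is None:
        anti_ideal = [min(row[j] for row in r) for j in range(n_criteria)]
    scores = []
    for row in r:
        # Euclidean distances to the ideal and anti-ideal solutions.
        d_plus = math.sqrt(sum((v - p) ** 2 for v, p in zip(row, ideal)))
        d_minus = math.sqrt(sum((v - q) ** 2 for v, q in zip(row, anti_ideal)))
        scores.append(d_minus / (d_plus + d_minus))  # Eq. (4)
    return scores


# Hypothetical decision matrix: rows are models, columns are
# accuracy, precision, recall, and F1-score.
decision_matrix = [
    [0.95, 0.94, 0.96, 0.95],
    [0.80, 0.78, 0.82, 0.80],
    [0.70, 0.65, 0.72, 0.68],
]
closeness = topsis(decision_matrix)
ranking = sorted(range(len(closeness)), key=lambda i: closeness[i], reverse=True)
```

An alternative that is best on every criterion coincides with the ideal solution and scores 1, while one that is worst on every criterion scores 0; the sketch does not guard against the degenerate case where all alternatives are identical.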
Future work
In future work we will incorporate diverse datasets, including genetic, lifestyle, and high-
tech imaging data, to strengthen the model’s predictive capabilities. Exploring deep learn-
ing techniques tailored for clinical interpretability and further advancements in ensemble
learning methodologies offer promising pathways for improvement. To ensure real-world
applicability, we propose a multi-phase clinical validation plan, starting with a pilot obser-
vational study in three hospitals, enrolling 200 patients. This study will assess the model’s
accuracy against established diagnostic methods. Our ultimate goal is comprehensive
clinical validation to enhance the model’s credibility and impact on patient care. We seek
collaborations with healthcare institutions and funding agencies to support this endeavor,
aiming to offer a robust tool for ischemic stroke prediction and patient management.
Acknowledgements
This research was financially supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project
number (PNURSP2024R321), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend
their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work
through Large Research Project under grant number RGP2/549/45.
Author contributions
PC, AB, PPS, AB, SM, and NA were involved in the conceptualization and design of this system, and they also sourced
funding for the project. MA and MSA conducted the data analysis and wrote the first draft of the manuscript. BOS was
responsible for project management, monitoring, and evaluation of the study. All authors reviewed the manuscript and
made significant contributions to its content.
Funding
This research was financially supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project
number (PNURSP2024R321), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia. The authors extend
their appreciation to the Deanship of Research and Graduate Studies at King Khalid University for funding this work
through Large Research Project under grant number RGP2/549/45.
Data availability
The dataset used during the current study is available here.
Declarations
Competing interests
The authors declare no competing interests.
References
1. Kogan E, Twyman K, Heap J, Milentijevic D, Lin JH, Alberts M. Assessing stroke severity using electronic health record
data: a machine learning approach. BMC Med Inf Decis Making. 2020;20:1–8.
2. Wang W, Rudd AG, Wang Y, Curcin V, Wolfe CD, Peek N, Bray B. Risk prediction of 30-day mortality after stroke using
machine learning: a nationwide registry-based cohort study. BMC Neurol. 2022;22(1):195.
3. Campagnini S, Arienti C, Patrini M, Liuzzi P, Mannini A, Carrozza MC. Machine learning methods for functional recov-
ery prediction and prognosis in post-stroke rehabilitation: a systematic review. J Neuroeng Rehabil. 2022;19(1):1–22.
4. Polikar R. Ensemble learning. Ensemble machine learning: methods and applications. Berlin: Springer; 2012. p. 1–34.
5. Sagi O, Rokach L. Ensemble learning: a survey. Wiley Interdiscip Rev Data Min Knowl Discov. 2018;8(4):1249.
6. Dong X, Yu Z, Cao W, Shi Y, Ma Q. A survey on ensemble learning. Front Comp Sci. 2020;14:241–58.
7. Firoozbakhsh KK, Kunkel CF, Scremin AE, Moneim MS. Isokinetic dynamometric technique for spasticity assessment.
Am J Phys Med Rehabil. 1993;72(6):379–85.
8. Wang L, Guo X, Fang P, Wei Y, Samuel OW, Huang P, Geng Y, Wang H, Li G. A new EMG-based index towards the
assessment of elbow spasticity for post-stroke patients. In: 2017 39th Annual International conference of the IEEE
engineering in medicine and biology society (EMBC); 2017. pp. 3640–3643.
9. Singh T, Ninkovic BM, Tasic MS, Stevanovic MN, Kolundzija BM. 3-d EM modeling of medical microwave imaging
scenarios with controllable accuracy. IEEE Trans Antennas Propag. 2022;71(2):1640–53.
10. Taylor RA, Sansing LH. Microglial responses after ischemic stroke and intracerebral hemorrhage. Clin Dev Immunol.
2013;2013:746068.
11. Schiff L, Hadker N, Weiser S, Rausch C. A literature review of the feasibility of glial fibrillary acidic protein as a bio-
marker for stroke and traumatic brain injury. Mol Diagn Therapy. 2012;16:79–92.
12. Frey S, Ertl T. Progressive direct volume-to-volume transformation. IEEE Trans Vis Comput Graph. 2016;23(1):921–30.
13. Vlachos M, Kollios G, Gunopulos D. Discovering similar multidimensional trajectories. In: Proceedings 18th interna-
tional conference on data engineering; 2002. pp. 673–684.
14. Dobkin BH. Rehabilitation after stroke. N Engl J Med. 2005;352(16):1677–84.
15. Mushtaq S, Saini KS, Bashir S. Machine learning for brain stroke prediction. In: 2023 International conference on
disruptive technologies (ICDT); 2023. pp. 401–408.
16. Chen M, Tan X, Padman R. A machine learning approach to support urgent stroke triage using administrative data
and social determinants of health at hospital presentation: retrospective study. J Med Internet Res. 2023;25:e36477.
https://doi.org/10.2196/36477.
17. Khatri I, Fraser H, Bacher I, Madsen T. Abstract tmp53: prediction of acute cerebrovascular events based on patient
reported symptoms. Stroke. 2023;54(1):53–53.
18. Dritsas E, Trigka M. Stroke risk prediction with machine learning techniques. Sensors. 2022;22(13):4670.
19. Mridha K, Ghimire S, Shin J, Aran A, Uddin MM, Mridha MF. Automated stroke prediction using machine learning: an
explainable and exploratory study with a web application for early intervention. IEEE Access. 2023;11:52288–308.
20. Abedi V, Avula V, Chaudhary D, Shahjouei S, Khan A, Griessenauer CJ, Li J, Zand R. Prediction of long-term stroke
recurrence using machine learning models. J Clin Med. 2021;10(6):1286.
21. Boukhennoufa I, Zhai X, Utti V, Jackson J, McDonald-Maier KD. A comprehensive evaluation of state-of-the-art time-
series deep learning models for activity-recognition in post-stroke rehabilitation assessment. In: 2021 43rd Annual
international conference of the IEEE engineering in medicine and biology society (EMBC); 2021. pp. 2242–2247.
22. Boukhennoufa I, Altai Z, Zhai X, Utti V, McDonald-Maier KD, Liew BX. Predicting the internal knee abduction impulse
during walking using deep learning. Front Bioeng Biotechnol. 2022;10:877347.
23. Zheng Y, Guo Z, Zhang Y, Shang J, Yu L, Fu P, Liu Y, Li X, Wang H, Ren L, et al. Rapid triage for ischemic stroke: a
machine learning-driven approach in the context of predictive, preventive and personalised medicine. EPMA J.
2022;13(2):285–98.
24. Kim D-Y, Choi K-H, Kim J-H, Hong J, Choi S-M, Park M-S, Cho K-H. Deep learning-based personalised outcome predic-
tion after acute ischaemic stroke. J Neurol Neurosurg Psychiatry. 2023;94(5):369–78.
25. Chun M, Clarke R, Cairns BJ, Clifton D, Bennett D, Chen Y, Guo Y, Pei P, Lv J, Yu C, et al. Stroke risk prediction using
machine learning: a prospective cohort study of 0.5 million Chinese adults. J Am Med Inf Assoc. 2021;28(8):1719–27.
26. Campagnini S, Arienti C, Patrini M, Liuzzi P, Mannini A, Carrozza MC. Machine learning methods for functional recov-
ery prediction and prognosis in post-stroke rehabilitation: a systematic review. J Neuroeng Rehabil. 2022;19(1):1–22.
27. Boukhennoufa I, Zhai X, Utti V, Jackson J, McDonald-Maier KD. Wearable sensors and machine learning in post-
stroke rehabilitation assessment: a systematic review. Biomed Signal Process Control. 2022;71:103197.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.