Developing A Predictive Model of Stroke Using Support Vector Machine
Abstract— Health is a fundamental human right of all Filipinos, as stated in the 1987 Philippine Constitution. Based on data published by the World Health Organization in 2018, an estimated 41 million deaths were caused by non-communicable diseases, a group that includes stroke and its complications. Given variables for the risk factors of stroke, a predictive model for the occurrence of stroke is developed from the medical records of patients. To ensure data quality, the patients' medical data underwent pre-processing, and principal component analysis was used for dimension reduction. The model is evaluated using accuracy, precision, recall, F1 score, and area under the curve. The study used a dataset of 1,500 patients from Cavite, Philippines. Of these data, 60 percent was used for training the model, 30 percent for testing, and 10 percent for validation. The Support Vector Machine (SVM) model achieved an accuracy of 99% on the training data, 98.89% on the testing data, and 97.33% on the validation data. These results show the potential of the model for stroke prediction, which makes it relevant for researchers and practitioners in the medical and health sciences.

Keywords—support vector machine, principal component analysis, stroke prediction, Philippines

Stroke is the top life-threatening disease in the world. It is the leading cause of cognitive disorder around the world [1]. To reduce the burden of stroke in the population, it is first necessary to identify the modifiable risk factors and to demonstrate the effectiveness of risk reduction efforts [2]. Accordingly, stroke prevention remains one of the essential targets in the fields of neurology, cardiology, vascular medicine, and geriatric medicine [3].

In 2016, there were an estimated 41 million deaths from non-communicable diseases. The largest share was due to cardiovascular disease, which accounted for 17.9 million deaths, equivalent to 44% of all non-communicable disease deaths [4]. In the Philippines, according to the Philippine Statistics Authority (PSA), stroke was the leading cause of death, with 74,134 deaths or 12.7 percent of the total [5].

However, the growing number of stroke incidents can be addressed through innovation and technology. The use of machine learning in knowledge discovery for disease prediction has been an interesting and relevant topic for researchers [6]. Because of the importance of disease prediction to the public, several studies have been conducted on modeling procedures for prediction [7]. This study incorporates machine learning and proposes a new predictive method using Principal Component Analysis (PCA) and a supervised machine learning algorithm. PCA is used for dimensionality reduction and to deal with the multicollinearity problem in the experimental data [8].

Support Vector Machine (SVM) is a technique suitable for disease prediction tasks [9]. Thus, SVM is chosen to predict stroke. An SVM-based approach with various kernel functions produced accurate results and showed the predictive power of SVM with a small set of input parameters [10].

The paper intends to develop a predictive model from the medical records of the patients. The records undergo dimension reduction through Principal Component Analysis, with continuous data reduced into ranges of values or categories, and are then processed using a Support Vector Machine. The model is evaluated using accuracy, precision, recall, F1 score, and area under the curve.

A. Overview of Stroke

Stroke is a prevalent disease that can affect patients and their families for many years. It is one of the world’s major causes of adult disability, and developing countries face this kind of non-communicable disease [11]. For this reason, an essential first step is knowing what a stroke is. A stroke is a “brain attack.” It can occur at any time and can affect anyone. It happens when blood flow to an area of the brain is cut off. When this occurs, brain cells die due to the absence of oxygen, and the abilities controlled by that brain region, such as memory and muscle control, are lost. The common signs of stroke are weakness or numbness of the face, arm, and leg on one side of the body, difficulty speaking, and trouble seeing in one or both eyes. A patient can also experience sudden severe dizziness, loss of balance, a severe headache, and, lastly, increasing drowsiness with possible loss of consciousness and confusion [12].

B. Support Vector Machine

Support Vector Machine is a machine learning method based on statistical learning theory. A separating hyperplane is constructed in the descriptor space of the training data, and samples are classified according to the side of the hyperplane on which they fall [13]. It is possible to use
Moreover, in terms of classification, prediction, and regression analysis, SVM is one of the supervised learning methods used [15].

In a related study, Xiang [22] applied and compared different categories of machine learning models with good interpretability, including generalized linear models, to build predictions for stroke and thromboembolism. The study used integrated machine learning approaches, including data curation, feature engineering, and supervised learning, to build the thromboembolism prediction model, and showed that the approach could achieve significantly better prediction performance.
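To make the hyperplane classification described under Support Vector Machine concrete, the standard kernel SVM decision function is shown below with the RBF kernel named in this study's results. This is the usual textbook form, not a formula reported by the study; the symbols α_i (dual coefficients, nonzero only for support vectors), b (bias), γ (RBF width), and labels y_i ∈ {−1, +1} are introduced here for illustration.

```latex
f(\mathbf{x}) = \operatorname{sign}\!\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \right),
\qquad
K(\mathbf{x}_i, \mathbf{x}) = \exp\!\left( -\gamma \, \lVert \mathbf{x}_i - \mathbf{x} \rVert^2 \right), \quad \gamma > 0
```

With a linear kernel, K(x_i, x) = x_i · x, the expression reduces to sign(w · x + b) with w = Σ α_i y_i x_i, so a sample is assigned to the positive or negative class according to which side of the hyperplane w · x + b = 0 it falls on.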
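The evaluation protocol described in the abstract (a 60/30/10 split of 1,500 records, scored with accuracy, precision, recall, and F1) can be sketched with the Python standard library alone. This is a hypothetical illustration, not the study's code: the names `split_sizes` and `Confusion` are invented here, the confusion-matrix counts are made up, AUC is left out, and the count check assumes the reported percentages were rounded to two decimals.

```python
"""Hypothetical sketch of the evaluation arithmetic (not the study's code)."""
from __future__ import annotations

from dataclasses import dataclass


def split_sizes(n_records: int, fractions: tuple[float, ...]) -> list[int]:
    """Number of records in each partition, e.g. train/test/validation."""
    sizes = [round(n_records * f) for f in fractions]
    sizes[0] += n_records - sum(sizes)  # give any rounding remainder to the first partition
    return sizes


@dataclass
class Confusion:
    """Binary confusion matrix with stroke as the positive class."""
    tp: int  # stroke cases predicted as stroke
    fp: int  # non-stroke cases predicted as stroke
    fn: int  # stroke cases predicted as non-stroke
    tn: int  # non-stroke cases predicted as non-stroke

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.fp + self.fn + self.tn)

    def precision(self) -> float:
        predicted_pos = self.tp + self.fp
        return self.tp / predicted_pos if predicted_pos else 0.0

    def recall(self) -> float:
        actual_pos = self.tp + self.fn
        return self.tp / actual_pos if actual_pos else 0.0

    def f1(self) -> float:
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    train, test, val = split_sizes(1500, (0.6, 0.3, 0.1))
    print(train, test, val)  # 900 450 150

    # Correct-prediction counts implied by the reported accuracies,
    # ASSUMING the percentages were rounded to two decimals.
    for name, size, pct in [("train", train, 99.00), ("test", test, 98.89), ("validation", val, 97.33)]:
        correct = round(size * pct / 100)
        print(f"{name}: {correct}/{size} = {100 * correct / size:.2f}%")

    # Made-up counts for a 150-record set, only to exercise the metric formulas.
    cm = Confusion(tp=40, fp=5, fn=10, tn=95)
    print(f"accuracy={cm.accuracy():.4f} precision={cm.precision():.4f} "
          f"recall={cm.recall():.4f} f1={cm.f1():.4f}")
```

Under the rounding assumption, 98.89% of 450 testing records and 97.33% of 150 validation records are consistent with 445 and 146 correct predictions, respectively. Precision and recall guard against empty denominators so that a model that never predicts stroke scores 0.0 rather than raising an error.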
The objective of this study is to develop a predictive model using SVM to predict the possibility of stroke among patients in Cavite, Philippines. Among the SVM kernels, the RBF kernel resulted in a high-performance classifier, with a score of 1.0. This can help doctors plan better stroke detection and medication in the near future. This study demonstrates the predictive capability of SVM with 1,500 patients and 10 attributes. The evaluation resulted in an accuracy of 99% on the training data and 98.89% on the testing data, with a validation result of 97.33%.

This study is not free from limitations; thus, some future work is recommended. The study could be used in the future for stroke prevention, since it could detect the early occurrence of stroke among patients in Cavite, Philippines. The results could also help in developing a control plan for those patients, since stroke otherwise cannot be detected beforehand. This study could also serve as a basis for developing other models to compare different machine learning algorithms.

[11] Subha PP, P. G. (2015). Pattern and risk factors of stroke in the young among stroke patients admitted in medical college hospital, Thiruvananthapuram. Ann Indian Acad Neurol, 18, 20-23.

[12] National Stroke Association. (2019). (American Heart Association Inc.) Retrieved May 28, 2019, from https://www.stroke.org/understand-stroke/what-is-stroke/

[13] S. Vijayarani, M. S. (2015). Data Mining Classification Algorithms for Kidney Disease Prediction. International Journal on Cybernetics and Informatics, 4(4), 13-25.

[14] Jean-Emmanuel Bibault, P. G. (2016). Big Data and machine learning in radiation oncology: State of the art and future prospects. Elsevier, 110-117.

[15] Cemil Colak, E. K. (2015). Application of knowledge discovery process on the prediction of stroke. Elsevier, 181-185.

[16] Raoof Gholami, N. F. (2017). Support Vector Machine: Principles, Parameters, and Applications. Elsevier, 515-533.

[17] Ng, A. (n.d.). Stanford Edu. Retrieved May 30, 2019, from cs229.stanford.edu/notes/cs229-notes3.pdf