
Prediction of Stroke Using Machine Learning

This document discusses using machine learning models to predict stroke. Random forest, KNN, and logistic regression algorithms are used to classify and predict strokes based on a hospital dataset. Exploratory data analysis is performed to understand the data and features. The models are trained on the dataset and show satisfactory results, indicating they could be useful for real-time stroke detection and classification. Wireless body area networks using sensors are also discussed as a method for health monitoring and risk detection that could predict strokes. EEG data from patient reports is analyzed to study different types of strokes using the machine learning approaches.

Uploaded by

Anonymous aasari

Prediction of Stroke using Machine learning

Ciddarth RM, Student, Dept. of Information Technology, Easwari Engineering College, Chennai, India, [email protected]

Gokul Ranjith V, Student, Dept. of Information Technology, Easwari Engineering College, Chennai, India, [email protected]

Mrs. B. Chandra, B.E., M.E., Asst. Professor, Dept. of Information Technology, Easwari Engineering College, Chennai, India, [email protected]

Abstract- Stroke is among the major causes of death and long-term disability worldwide. It is of great importance to predict the risk of stroke for better prevention and early treatment. This brief report presents an attempt to develop a machine learning (ML) model that accurately and quickly predicts whether or not a person has suffered a stroke, using the Random Forest, K-Nearest Neighbor (KNN), and Logistic Regression algorithms. Classification and regression are applied to the data, and the models are trained on a predefined hospital dataset. Exploratory Data Analysis (EDA) is used to investigate the dataset and discover outliers in order to achieve better classification. The main objective of this paper is to develop an effective stroke detection health care service that predicts stroke with better performance. In addition to providing a performance benchmark across various ML models, the report also examines which features are useful for stroke prediction.

Keywords- stroke prediction, random forest, regression, machine learning

I. INTRODUCTION

Stroke detection is a topical concern in human health care. A stroke occurs when the flow of blood to a particular part of the brain is disrupted or diminished as a result of a broken or blocked blood vessel. As a result, the cells that do not receive proper nutrients and oxygen die. It is an urgent condition that requires immediate medical attention. Early detection and proper attention are required to prevent further damage to the affected areas of the brain and complications in other parts of the body. According to the World Health Organization (WHO), the estimated number of deaths from stroke is fifteen million across the world. It is also estimated that at least two people die every 10 minutes around the world due to increased severity and a lack of provision for care. According to the BBC, stroke is the fifth leading cause of mortality in the United States, and 12% of the population reportedly suffers daily due to stroke. Stroke arises from environmental factors and certain body conditions. To overcome this situation, prediction algorithms have been implemented to understand the various relations and risk factors that lead to stroke. Machine learning algorithms can be used to improve supervised classification. To detect a stroke that has already occurred in a patient, data are drawn from hospital clinical reports and statistical data. From these data, regression and classification algorithms are applied to find the type, accuracy, and severity of the stroke, as well as the reason behind the stroke from the person's medical history. With the help of supervised machine learning concepts, the distinct data have been properly maintained and well processed. Overall, we have built a stroke dataset from various medical experts; the data were then explored through the EDA process and preprocessed. As the result of the model is satisfactory, it can be used in real-time stroke detection as well as classification.

II. LITERATURE SURVEY

[1] Among the papers studied, a wireless body area network (WBAN) is a health monitoring system used in remote health care services. These systems are composed of small, wearable, battery-powered sensors that are placed on the body of the patient. The sensors are wired to a microprocessor where the data are compressed and sent to a local smartphone. The smartphone transmits the compressed data over the internet to a remote hospital, where the data are monitored and analyzed. EEG (electroencephalogram) is an electrophysiological technique for detecting and monitoring the electrical activity of the brain, and it is the most common test for diagnosing epilepsy. Two EEG databases, the Bonn University and the CHB-MIT databases, are used in this study for evaluating the performance of the proposed univariate and multivariate seizure detection features. By applying the orthogonal matching pursuit (OMP) algorithm on the
compressed EEG data and computing the rate at which the energies of the partially reconstructed signals increase, the partial energy difference (PED) is obtained and then used for classifying seizure and non-seizure states. The proposed methods have the potential to be deployed for tele-monitoring of epilepsy patients using WBANs in personalized medicine to improve their quality of life. They could also run on a mobile device and give real-time feedback to the patient or doctor before transmission.

[2] Health monitoring using WBAN technologies with sensors will enhance quality of life from a medical standpoint. Implementing risk detection in wireless body sensor networks for health monitoring with deep learning involves large-volume datasets and needs more computational tools. The data collected from WBAN health monitoring sensors for risk detection are not labeled exactly. BPNN or WBAN models can be trained on some data and used for the prediction of new data. In the past, data processing was done on samples; nowadays, advances in technical instruments have allowed such models to be developed. In hybrid deep learning, the HDL model helps generate information about at-risk patients. At the final stage, the best parameters are found for better evaluation. Information about WBAN data is collected online using scraping tools, so a large number of data mining techniques are used. The data are then preprocessed, using available techniques, to increase the efficiency of prediction. The datasets are obtained by converting unwanted data into new, modified data. A MapReduce-based function built on HDL is used for preprocessing, data processing, and classification. The Python language is also used for polarization of the data. As WBAN services spread widely through everyday life, they help extend the lives of people in society.

[3] The paper is about detecting brain stroke using electroencephalography (EEG). Most hospitals use CT and MRI scans to detect stroke, but these are very costly, so a cost-effective method is needed. Stroke is caused by hypertension, diabetes, in some cases low blood pressure, and heavy smoking. In India the incidence of stroke is still high because of the unavailability of management facilities in government hospitals and poor public awareness. EEG is used for detecting brain disorders such as stroke and brain trauma because it is non-invasive and has better temporal resolution compared to other neuroimaging techniques like MRI and CT scan. The reports of 5 patients are analyzed first, followed by the stroke results obtained using EEG. The EEG recording was started as soon as possible after the patient was admitted to the emergency ward. CT and MRI scans were done before the EEG analysis, and the diagnosis was made by a qualified neurologist. For the 5 patients, the results from the test reports are noted. The results are comparably better in the case of ischemic stroke than hemorrhagic stroke. In this paper, the electrical activation of different regions in the patient for different types of stroke has been tested and studied, and the current density at each source has been estimated correctly. Because the CT and MRI scans used by most hospitals to detect stroke are very costly, this EEG-based approach offers a cost-effective and easy way to detect a stroke.

III. PROPOSED SYSTEM

The main objective of Prediction of Stroke using Machine Learning is to utilise the random forest concept, built on binary trees, together with other algorithms such as K-Nearest Neighbor (KNN) and Logistic Regression, with hyperparameters tuned for each method. Using these methods, the stroke data are analyzed and a stroke can be easily predicted. To improve performance, a benchmark across the various machine learning algorithms has been implemented and sorted.

IV. SYSTEM DESIGN

Fig. 1. System design
Fig. 1 presents the overall design and implementation of the entire system. The main objective of the prediction of stroke using machine learning is to predict stroke with the help of classification and regression, using algorithms such as logistic regression, random forest, and k-nearest neighbors (KNN). The dataset is taken as input from the user, and the input is stored as a list of characters in the dataset. The hospital dataset is collected, after which the Exploratory Data Analysis (EDA) process is carried out. Preprocessing of the data follows, and once preprocessing is complete, training on the dataset is the next step. The preprocessed data are passed to the random forest algorithm, and null values are replaced with zero to avoid null value exceptions. After null value handling and data extraction are finalized, regression and classification are performed. The algorithms are then compared, a prediction is made, and the output is delivered as a prediction of stroke or no stroke.

A. Data preprocessing

Our database contains a list of data and string values, which has been processed by Exploratory Data Analysis (EDA). String values are integer-encoded, the list of null values is extracted, and integers are converted into binary representation values. Some attribute values are missing and are represented as N/A; we replaced them with zero to avoid null value exceptions. After null value handling and data extraction are finalized, the dataset is used to train the machine learning algorithms.

Fig 2. Data pre-processing example

B. Decision Tree

A decision tree is a tree-structured algorithm used mainly for classification and also for regression. Among the many classification algorithms, such as the naive Bayes classifier, the decision tree is a tree-based classifier: it classifies the data based on a tree structure. It is a graphical representation of all the possible solutions to a decision based on certain conditions. It represents a function that takes as input a vector of attribute values and returns a decision as a single output value. The decision tree algorithm falls under the category of supervised learning for classification and regression. A decision tree reaches its decision by performing a sequence of tests. A highly complicated decision tree tends to have low bias, which makes it difficult for the model to work with new data. Decision trees are simple to understand, interpret, and visualize, and they can handle both numerical and categorical data. Overfitting occurs when the algorithm captures noise in the data, and the model can become unstable under small variations in the data.

C. Logistic Regression algorithm

Logistic regression is a machine learning model used for binary classification. It is mainly used for classification problems in which discrete values have to be predicted; for example, it can identify the different components present in an image and help categorize them. The response variable of logistic regression is categorical in nature. The model helps calculate the probability of a particular event taking place. Binary classification means that the output prediction can take only one of two possible values, for example 0 or 1, or true or false. Real-world applications of logistic regression include classifying whether an email is spam or not, whether a transaction is fraudulent or not, whether a person has a disease or not, or whether a tumor is malignant or not. In each of these examples the output prediction takes one of two values.

Fig 3. Logistic Regression algorithm example
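The preprocessing and logistic regression steps described in Section IV can be illustrated with a minimal sketch. It is not the authors' implementation: the eight patient rows, the column choice (age, smoking status, BMI), the scaling constants, and the helper names `to_number`, `preprocess`, and `train` are all hypothetical, and plain batch gradient descent stands in for whatever solver the paper used. It shows N/A values being replaced with zero, a string column being integer-encoded, and a probability being turned into a 0/1 prediction.

```python
import math

# Hypothetical records (age, smokes, bmi, stroke label); "N/A" marks a missing value.
RAW_ROWS = [
    ("67", "yes", "36.6", 1),
    ("80", "no", "N/A", 1),
    ("74", "yes", "27.4", 1),
    ("69", "no", "22.8", 1),
    ("31", "no", "22.4", 0),
    ("40", "yes", "N/A", 0),
    ("23", "no", "24.1", 0),
    ("38", "no", "30.2", 0),
]
SMOKES_CODE = {"no": 0, "yes": 1}  # integer encoding for the string column


def to_number(value):
    """Replace a missing value ("N/A") with zero, otherwise parse the number."""
    return 0.0 if value == "N/A" else float(value)


def preprocess(row):
    """Turn one raw row into (feature list, label)."""
    age, smokes, bmi, label = row
    features = [
        (to_number(age) - 50.0) / 50.0,  # illustrative centering and scaling
        float(SMOKES_CODE[smokes]),
        to_number(bmi) / 50.0,           # illustrative scaling
    ]
    return features, label


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, features):
    """Estimated probability of the positive class (stroke)."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, epochs=5000, learning_rate=0.5):
    """Fit weights by batch gradient descent on the mean log-loss."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    n = len(samples)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            error = predict_proba(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


samples = [preprocess(row) for row in RAW_ROWS]
weights, bias = train(samples)

# Binary decision: probability >= 0.5 is reported as stroke (1), otherwise 0.
predictions = [int(predict_proba(weights, bias, f) >= 0.5) for f, _ in samples]

# Score a new, hypothetical patient whose BMI is missing.
new_features, _ = preprocess(("72", "yes", "N/A", None))
new_patient_risk = predict_proba(weights, bias, new_features)
print(predictions, round(new_patient_risk, 3))
```

Replacing N/A with zero follows the paper's stated choice; on real data a zero BMI is not a plausible value, so the effect of this substitution on the fitted weights is worth checking.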
D. K-nearest neighbors algorithm

The k-nearest neighbors algorithm is a very simple way to classify data. It works well for non-linear problems and for classifying data points. In pattern recognition, the k-nearest neighbors algorithm is a non-parametric method used for classification and regression; in both cases the input consists of the k closest training examples in the feature space. It stores all the available cases and classifies new data or cases based on a similarity measure. The algorithm can use any distance measure, and its major disadvantage is that it takes more memory to store the cases and calculate the distances. The k-nearest neighbors algorithm is a type of instance-based learning, or lazy learning, in which the function is only approximated locally and all computation is deferred until classification. It is among the simplest of all machine learning algorithms.

Fig 4. K-nearest neighbors algorithm example

E. Random forest algorithm

Random forest is a supervised machine learning technique that constructs multiple decision trees. The final decision is made based on the outcome of the majority of the decision trees. The purpose of the random forest is to address the fact that a single decision tree suffers from low bias and high variance: the random forest introduces flexibility and converts high variance into low variance. It uses many trees and makes a prediction by averaging the predictions of the individual trees. The random forest model generally has much better predictive accuracy than a single decision tree and works well with default parameters. As for the assumptions of random forest, there should be some actual values in the feature variables of the dataset so that the classifier can predict accurate results rather than guessed ones, and the predictions from the individual trees must have very low correlations. Random forest predicts with high accuracy, runs efficiently even on large datasets, and maintains accuracy when a large proportion of the data is missing.

Fig 5. Random forest algorithm example

F. Support Vector Machine algorithm

The support vector machine (SVM) is a supervised machine learning algorithm used for both classification and regression problems. It works on the rule of thumb of identifying the right hyperplane: the one that best segregates the two classes and maximizes the distance between the hyperplane and the nearest data points of either class. This distance is called the margin, and margins are of two kinds: soft margin and hard margin. The main strength of support vector machines is that they work well even when the number of features is much larger than the number of instances. They can work on datasets with a huge feature space, as is the case in spam filtering, where a large number of words are potential signifiers of a message being spam. Even when the optimal decision boundary is a nonlinear curve, the SVM transforms the variables to create new dimensions such that the classifier is a linear function of those transformed dimensions. Data with a high number of dimensions can then be classified easily; for example, data mapped from 2D to 3D can be easily classified using a support vector machine.
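The 2D-to-3D idea in the SVM description can be made concrete with a small sketch. It is only an illustration under stated assumptions: the eight points, the feature map `lift`, and the hyperplane `(W, B)` are hypothetical and chosen by hand, whereas a real SVM solver would learn the maximum-margin hyperplane from the data. The inner points are surrounded by the outer ones, so no straight line separates them in 2D, but after adding the third coordinate x² + y² a flat plane does.

```python
import math

# Hypothetical 2D points: label -1 near the origin, label +1 farther out.
POINTS = [
    ((0.0, 0.0), -1), ((0.5, 0.2), -1), ((-0.3, 0.4), -1), ((0.2, -0.6), -1),
    ((2.0, 0.0), 1), ((0.0, -2.2), 1), ((-1.8, 1.0), 1), ((1.5, 1.6), 1),
]


def lift(point):
    """Map (x, y) to (x, y, x^2 + y^2), adding one new dimension."""
    x, y = point
    return (x, y, x * x + y * y)


# Hand-picked hyperplane w . v + b = 0 in the lifted 3D space (an assumption,
# not a trained model): it thresholds the third coordinate at 2.
W = (0.0, 0.0, 1.0)
B = -2.0


def decision(point):
    """Signed value of the linear function in the lifted space."""
    return sum(w * v for w, v in zip(W, lift(point))) + B


def classify(point):
    return 1 if decision(point) >= 0 else -1


def geometric_margin():
    """Smallest signed distance from any labelled point to the hyperplane."""
    norm = math.sqrt(sum(w * w for w in W))
    return min(label * decision(point) / norm for point, label in POINTS)


predicted = [classify(point) for point, _ in POINTS]
print(predicted)
print(geometric_margin())
```

A positive margin means every point lies on its correct side of the plane; an SVM would go further and search for the plane that makes this margin as large as possible.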

Fig 6. Support Vector Machine algorithm example

V. RESULT

Once the data given by the user have been processed, the input is checked through various methods. With the help of predefined classification and regression algorithms, the prediction of stroke has been implemented using the Flask library in Python, which is used here to generate a port ID. The input is submitted through the generated port ID, and the values are passed to our machine learning algorithms. After the process concludes, the final result is reported to the user or admin. The process continues until the input ends.

Fig 7. Output Diagram

VI. CONCLUSION

We have proposed a stroke prediction framework using machine learning, based on classification research in artificial intelligence and machine learning platforms. In this paper the Logistic Regression algorithm is used for regression, and other algorithms such as random forest and K-nearest neighbors are implemented for further prediction and classification. For example, a predefined dataset for stroke detection, obtained from a hospital, is processed to detect whether a person is affected by stroke or not; random user input is processed in the same way. The input given by the user is passed alongside our dataset, the classification and regression process is carried out, and the predicted result is shared with the user or admin.

REFERENCES

[1] Mohammad H. Aghababaei, Ghasem Azemi, John M. O'Toole, "Detection of epileptic seizures from compressively sensed EEG signals for wireless body area networks," Expert Systems with Applications, 2021.

[2] Anand Singh Rajawat, Kanishk Barhanpurkar, Rabindra Nath Shaw, and Ankush Ghosh, "Risk Detection in Wireless Body Sensor Networks for Health Monitoring Using Hybrid Deep Learning," Innovations in Electrical and Electronic Engineering, 2021.

[3] Pinanshu Garg, Prateek Kumar Sonker, Dheeraj Khurana, Shubhajit Roy Chowdhury, Kshitij Shakya, "Detection of Brain Stroke using Electroencephalography (EEG)," 2019 Thirteenth International Conference on Sensing Technology.

[4] Kaiyuan Shen, Zhewen Tian, Yong Hu, Xiaonan Zou, "Logistic Regression Model Optimization and Case Analysis," 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT).

[5] Jitendra Kumar Jaiswal, Rita Samikannu, "Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression," 2017 World Congress on Computing and Communication Technologies (WCCCT).

[6] Sourish Ghosh, Anasuya Dasgupta, Aleena Swetapadma, "A Study on Support Vector Machine based Linear and Non-Linear Pattern Classification," 2019 International Conference on Intelligent Sustainable Systems (ICISS).

[7] Kashvi Taunk, Sanjukta De, Srishti Verma, Aleena Swetapadma, "A Brief Review of Nearest Neighbor Algorithm for Learning and Classification," 2019 International Conference on Intelligent Computing and Control Systems (ICCS).

[8] L. Ding, W. Fang, H. Luo, P.E.D. Love, B. Zhong, X. Ouyang, "A deep hybrid learning model to detect unsafe behavior: Integrating convolution neural networks and long short-term memory," Automation in Construction 86, 118–124 (2018).

[9] Ali, S. El-Sappagh, S.M.R. Islam, D. Kwak, A. Ali, M. Imran, K.-S. Kwak, "A Smart Healthcare Monitoring System for Heart Disease Prediction Based On Ensemble Deep Learning and Feature Fusion," Information Fusion (2020).

[10] R. Negra, I. Jemili, A. Belghith, "Wireless body area networks: Applications and technologies," Procedia Computer Science 83, 1274–1281 (2016).
