The Causes Analysis of Ischemic Stroke Transformation Into Hemorrhagic Stroke Using PLS (Partial Least Square)-GA and Swarm Algorithm
Abstract—Ischemic stroke is known to convert readily into hemorrhagic stroke. It is crucial to identify high-risk patients with relevant diseases. This study uses the Taiwan National Health Insurance database to collate and analyze the diseases associated with the transformation from ischemic to hemorrhagic stroke. We propose several novel machine-learning-based algorithms and indexing methods for disease transformation prediction and accuracy enhancement: a modified swarm algorithm, a partial least squares (PLS) algorithm and genetic algorithms. From the petition application files accumulated from 2006 to 2013 within the National Health Insurance Database, 8,483 patients with ischemic stroke were collected. Among them, 1,145 patients with both ischemic stroke and hemorrhagic stroke were screened according to the ICD-9-CM diagnostic code. The disease history of each patient is vectorized and stacked into a matrix for analysis. The PLS/GA process is then applied to the disease history matrix to filter out the candidate diseases leading to such transformation. A total of 750 diseases were found to be associated with ischemic stroke and hemorrhagic stroke through the PLS/GA process. A modified PSO algorithm is developed to further weight these selected diseases. A quartile rule is then applied to filter these weighted diseases down to the ten most influential diseases, which are dizziness, constipation, chronic renal failure, hypertension, diabetes, hyperlipidemia, anxiety, muscle pain, prostatic hypertrophy, etc. In addition to diseases already associated with such conversion in the literature, such as hypertension and diabetes, we also discovered more potential diseases. The initial analysis accuracy of our proposed methods reaches 86% on average, while that of the traditional Neural Network algorithm utilizing the same training data was only 49.5%.

Keywords—Partial Least Square Algorithm, Genetic Algorithm, Particle Swarm Algorithm, ischemic stroke, hemorrhagic stroke.

I. INTRODUCTION

In the past four decades, stroke has become one of the top three causes of death in Taiwan. Stroke is divided into ischemic stroke and hemorrhagic stroke, and the two are not treated in the same way. Patients with ischemic stroke are treated by neurologists, whereas patients with hemorrhagic stroke are referred to neurosurgeons. Therefore, if a patient's ischemic stroke turns hemorrhagic without being noticed, the consequences can be disastrous. It is thus crucial to develop a predictive model to identify and track the high-risk diseases causing such transformation. Clinics can then offer early intervention and provide relevant health education and lifestyle guidance for different risk groups, thereby reducing follow-up medical costs and health insurance expenditures.

Stroke refers to a local occlusion or hemorrhage in a blood vessel of the brain, which damages the brain tissue supplied with blood and nutrients by that vessel, causing brain cell damage, necrosis, and local neurological dysfunction [13].

There are two causes of cerebral thrombosis: (1) abnormal blood coagulation increases blood viscosity and forms a thrombus; (2) cerebral vascular (arterial) atherosclerosis forms plaque that narrows the lumen of the artery, resulting in thrombosis. Blood circulation is blocked, resulting in hypoxic necrosis of the brain [15].

Cerebral embolism is caused by emboli (such as blood clots, sclerosing plaques, fat, air bubbles, etc.) which block the cerebral blood vessels and cause ischemic necrosis of the brain [14]. Hemorrhagic transformation (HT) can be a natural outcome in patients with cerebral infarction, and can also occur after stroke treatment. In the current research, the reported incidence of hemorrhagic transformation is inconsistent. However, severe hemorrhagic transformation is disabling and carries a high mortality rate, which draws considerable attention and precautionary effort from clinicians. The risk factors include anticoagulant indications, demographic data (gender, race and age), and medical factors (blood pressure, glycemic control, lipid distribution and renal function [2]) [6]. In this study we only explore medical factors and look for related diseases.

We first analyze the set of 8,483 patients with ischemic stroke. Among them, the 1,145 patients with both ischemic stroke and hemorrhagic stroke are organized into a training data feed. A GA/PLS algorithm is used to select, among 2770 diseases, all relevant diseases that lead to the maximum differentiation rate between the two classes of symptoms. Because the number of diseases selected by the GA/PLS process is still large, we developed a modified swarm (PSO) algorithm to optimize the weights of the selected diseases. Once an optimal weight distribution reaching an accuracy of 87% is found, a quartile rule is applied to further filter the 750 diseases down to 10. In addition to two regularly recognized diseases causing conversion, we also discovered eight more diseases worthy of attention.
Authorized licensed use limited to: REVA UNIVERSITY. Downloaded on September 13,2022 at 03:06:21 UTC from IEEE Xplore. Restrictions apply.
Section II describes previous studies of disease transformation, specifically ischemic stroke transformation. Section III describes the research methods, which include data preprocessing, Partial Least Squares as a dimension reduction method, the GA/PLS method as the machine learning technique adopted in this paper, and the swarm algorithm for further feature extraction. Section IV gives the experimental results on the large data set of stroke samples using the proposed machine learning methods. We compare the test data accuracy of the traditional neural network method and our GA/PLS methods with regard to the true positive and false negative rates. Finally, we list ten high-risk diseases most likely to lead to the transformation.

II. RELEVANT WORKS

Many related studies have analyzed the causes of stroke, ischemic stroke and hemorrhagic stroke, but most rely on statistical methods such as regression analysis, multivariate analysis and logistic regression, and the vast majority are directed at specific individual diseases or at physiological factors already known to have an effect. In this paper, we perform a big data analysis over all the diseases recorded for all patients. The goal is to find out whether other diseases may affect the transformation. The following is a comparison of some related studies.

Jie Zhang, Yi [3] and others conducted a correlation study on hemorrhagic transformation and its related inducing factors. The results showed that large area infarction, cortical infarction, atrial fibrillation and cerebral embolism, hyperglycemia, low TC and low-density lipoprotein cholesterol levels, low platelet counts, poor collateral vessels, thrombolytic therapy, globulin increase, early CT signs, HMCAS, etc. all have a high impact. The authors suggested being more cautious in the treatment of patients with these predictive factors and monitoring them closely.

Marsh EB et al. [2] created a hemorrhagic risk stratification (HeRS) score based on regression coefficients in multivariate modeling, primarily for effective predictors of ischemic stroke and anticoagulation index. The results show that deteriorating eGFR categories, greater infarct volume, high serum glucose, higher NIH Stroke Scale scores, TPA treatment, leukemia resection, and elevated white blood cell count were all associated with an increased risk of hemorrhagic transformation.

Chuang CS et al. [4] used the health insurance database to conduct a cohort study on the correlation between pneumoconiosis and cerebrovascular disease. The analysis showed that the pneumoconiosis group had a higher ischemic stroke rate than the control group. Therefore, the authors suggested that when patients with pneumoconiosis are clinically diagnosed, appropriate education and follow-up should be considered, and risk factors associated with stroke should be taken into account.

Adamson, J [6] and others studied whether stroke and disability were related. Logistic regression modeling was used to find that stroke is not the cause of disability but is highly associated with it when compared with other diseases, and it is considered to be the most common cause of complex disabilities.

Peter N. Lee et al. [7] studied whether non-smokers exposed to a smoking environment are at risk of stroke. The results did not prove such exposure to be influential, although other researchers have speculated that there is a cause-and-effect relationship.

In this paper, we propose to apply the PLS/GA and PSO algorithms to data mining on a health care database in Taiwan. PLS/GA is used for an initial reduction of the problem dimension from 2770 diseases to 750 diseases. The process includes adjusting the weights and removing noise within the training dataset. Because the resulting set is still large, we developed a modified PSO swarm algorithm to fine-tune these diseases, and we apply a basic statistical quartile rule to find a threshold value that filters out the most relevant diseases for final evaluation and diagnostic application in clinics.

III. RESEARCH METHODS

This study used a genetic algorithm plus PLS to find closely related diseases, and finally used PSO to find the most influential diseases. Fig. 1 depicts the workflow of the research methods. As shown in Fig. 1, the disease history of each patient is entered as input to the GA process for dimension reduction. The selected gene decides the set of diseases for principal component analysis (PCA) through the PLS process. Given the PC values of each sample point (each patient), the KNN process is exercised to derive the classification quality as a fitness function value. A threshold value is chosen as the criterion for deciding whether the whole process should stop. Otherwise the GA and PLS algorithm is repeated until an optimal set of diseases is selected that generates the best classification quality.

Fig. 1 Dataflow for GA/PLS/KNN
A. Data processing

First, we used the Statistical Analysis System (SAS) to extract the medical records of ischemic stroke and hemorrhagic stroke from the health care database. Hemorrhagic stroke is identified by ICD-9-CM codes 430, 431 and 432, and ischemic stroke by ICD-9-CM codes 433, 434, 435, 436 and 437. We then divide the data into two classes: one with ischemic stroke only, and the other with ischemic stroke transformed into hemorrhagic stroke. The disease history of each patient in the two classes is extracted to form a single data set, which serves as the feature exploration basis for our proposed machine learning algorithms. Each record in this data set is a list covering all diseases, in which an element is 1 if the patient has been diagnosed with the corresponding disease and 0 otherwise. The list has 2770 entries, one for every disease accounted for by the national health care database.

B. Partial Least Squares (SIMPLS)

PLS is a method of multivariate statistical analysis proposed by Herman Wold (1975), which has developed rapidly in theory, methods and applications in recent years. In building a model, PLS combines important statistical techniques such as multiple linear regression, principal component analysis and canonical correlation analysis. However, traditional PLS requires repeated iterative operations and data compression to obtain the correct weight vector and score vector, which is time consuming. Therefore, this paper uses the method proposed by De Jong (1993) [8], which obtains the score vector directly from the original data without repeated iterative operations. This method is SIMPLS. The weight vectors r and q are normalized, that is, $r^T r = 1$ and $q^T q = 1$. After the end of the first round of operation, $r_1$ is the first left singular vector of $S = X^T Y$ and $q_1$ is the first right singular vector of $S = X^T Y$, which is different from the score vector in which we take the weight vector as the largest variation. In the SIMPLS algorithm (Fig. 2), we set the weight vector r to be the first left singular vector of S. The linear model estimate of Y can be obtained from $Y = TC^T + F^*$ and $C^T = T^T Y$ [9], as in Fig. 2.

SIMPLS Algorithm
S = X0^T Y
for i = 1 to k
    if i = 1, [u,s,v] = svd(S)
    if i > 1, [u,s,v] = svd(S - P_{i-1} (P_{i-1}^T P_{i-1})^{-1} P_{i-1}^T S)
    r_i = u(:,1)
    t_i = X0 r_i
    p_i = X0^T t_i / (t_i^T t_i)
end
B_PLS = R_k (T_k^T T_k)^{-1} T_k^T Y0

Fig. 2 SIMPLS algorithm listing

C. Genetic Algorithm and Partial Least Square (GA/PLS)

The GA-PLS algorithm pseudo code is shown in Fig. 3. At first, the genetic algorithm randomly generates a number of binary strings whose length equals the length of the original training data. This is called the first generation, and each string is called a chromosome. Each chromosome represents a feasible solution (possibly containing the best solution). For a single chromosome, all data set to 1 are extracted and included in a subset ready for the PLS process. The original 2770 dimensions were then reduced to 750 dimensions by the GA in Fig. 3, and all of these dimensions were selected as the grouping standard. The training sample binary matrix is

$$X_0 = M = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1k} \\ \vdots & \ddots & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pk} \end{bmatrix}, \quad \alpha_{ij} \in \{0,1\},$$

where i = 1..p (patient number) and j = 1..k from the GA. Here k is the reduced disease number of 750 from the original total of 2770 diseases, and Y0 = [1..j].

Before gene manipulation, two procedures are needed according to Fig. 3: PC score calculation based on the KNN algorithm, and fitness function calculation. PC score calculation is based on general PCA (Principal Component Analysis), which needs $M_{ij}$ on the selected diseases' signal profile accumulated as eigenvectors. The PC score of each sample is derived from the first and second principal components of the PLS calculation in Fig. 3. Given the PC scores of all samples of the assigned classes, the KNN algorithm is exercised to calculate the data required by the fitness function of the GA. This is explained in more depth after the fitness function has been introduced. Eq. (1) gives the definition of the fitness function, which is composed of normalized class weights of SHC(s)/Kc and sample weights of SW(s). First, the number of nearest neighbors with the same class label as the sample point in question is computed as the sample hit count (SHC), 0 < SHC(s) < Kc, where Kc is the number of samples in class c. These numbers are fixed for each generation of gene mutation.

$$F(d) = \sum_{c} \sum_{s \in c} \frac{1}{K_c} \times SHC(s) \times SW(s) \tag{1}$$

$$CW(c) = 100 \times \frac{CW(c)}{\sum_{c} CW(c)} \tag{2}$$

$$SW(s) = CW(c) \times \frac{SW(s)}{\sum_{s \in c} SW(s)} \tag{3}$$

Each principal component plot generated for each feature subset is scored using the K-nearest neighbor (K-NN) classification algorithm. For a given data point for a patient, Euclidean distances are computed between it and every other
point in the principal component plot. These distances are sorted and a poll is then taken of the point's K nearest neighbors. For the most ideal classification situation, K equals the number of samples in the class to which the point belongs, Kc in this case.

Following the PC score and fitness function calculations, the GA takes over and performs gene manipulation to generate a new set of selected feature diseases, ready for the sample weight adjustment step. This step guarantees the successful operation of the GA, thereby helping to minimize the problem of convergence to a local optimum.

Sample weight adjustment is performed in three steps. First, the sample hit rate (SHR) is computed as the mean value of SHC/Kc over all feature subsets (ξ) produced in a particular generation g (see Eq. 4). The class hit rate (CHR), i.e., the average sample hit rate over all samples in a class, is then computed from SHRg (Eq. 5). CHR provides consistent information about the difficulty of classifying a particular sample type. Second, class and sample weights are adjusted during each generation via Eqs. 6 and 7 respectively, where CWg+1(c) is the class weight for c during generation g+1 and SWg+1(s) is the sample weight for s during generation g+1. Both are influenced by SHRg. Classes and samples with low hit rates are weighted more heavily, that is, they have more influence on the fitness calculation than classes or samples that score well. The purpose is to enhance the overall sample hit count, which is directly proportional to the quality of sample agglomeration of the final selected classes.

$$SHR_g(s) = \frac{1}{\xi} \sum_{i=1}^{\xi} \frac{SHC_i(s)}{K_c}, \quad \forall s \in c \tag{4}$$

$$CHR_g(c) = \frac{1}{K_c} \sum_{s \in c} SHR_g(s), \quad \forall s \in c \tag{5}$$

$$CW_{g+1}(c) = CW_g(c) + 1 - CHR_g(c) \tag{6}$$

$$SW_{g+1}(s) = SW_g(s) + 1 - SHR_g(s) \tag{7}$$
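The fitness function of Eq. (1) and the weight adjustments of Eqs. (2) to (7) can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `fitness` and `update_weights`, the use of NumPy arrays, and the order in which the normalisation of Eqs. (2) and (3) is applied after Eqs. (6) and (7) are all assumptions.

```python
import numpy as np

def fitness(shc, sw, labels):
    """Eq. (1): sum over classes c and samples s in c of SHC(s)/K_c * SW(s)."""
    total = 0.0
    for c in np.unique(labels):
        in_c = labels == c
        k_c = in_c.sum()                       # K_c: number of samples in class c
        total += np.sum(shc[in_c] / k_c * sw[in_c])
    return float(total)

def update_weights(shc_all, sw, cw, labels):
    """Eqs. (4)-(7), then the normalisation of Eqs. (2)-(3) (assumed order).

    shc_all: (n_subsets, n_samples) hit counts, one row per feature subset.
    sw: (n_samples,) sample weights; cw: dict mapping class id -> class weight.
    """
    classes = np.unique(labels)
    k = {c: (labels == c).sum() for c in classes}
    k_per_sample = np.array([k[l] for l in labels])
    shr = (shc_all / k_per_sample).mean(axis=0)              # Eq. (4)
    chr_ = {c: shr[labels == c].mean() for c in classes}     # Eq. (5)
    cw = {c: cw[c] + 1.0 - chr_[c] for c in classes}         # Eq. (6)
    sw = sw + 1.0 - shr                                      # Eq. (7), new array
    cw_sum = sum(cw.values())
    cw = {c: 100.0 * cw[c] / cw_sum for c in classes}        # Eq. (2)
    for c in classes:                                        # Eq. (3)
        in_c = labels == c
        sw[in_c] = cw[c] * sw[in_c] / sw[in_c].sum()
    return sw, cw
```

After the update, the sample weights within each class sum to that class's weight, so a class with a low hit rate contributes more to the next generation's fitness.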
Algorithm : GA+PLS+KNN
// Initialise generation 0:
Input: Training data set $M = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1m} \\ \vdots & \ddots & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pm} \end{bmatrix}$, $\alpha_{ij} \in \{0,1\}$, p = number of patients, m = total disease number; 0 stands for without the disease, while 1 stands for with such disease. Binary strings of
chromosomes, Gij (i=1..n, j=1..m), n := the number of individual chromosomes in the population;
k := 0;
χ := the fraction of the population to be replaced by crossover in each iteration, 0.5 in this application;
μ := the mutation rate;
Pk := a population of n randomly-generated individuals;
SHC(s): sample hit count for class c, s:=1..2, given a PC(principle component) from PLS calculation of
Ri, based on a random initial distribution of Gij
Kc(c): sample number in class c
SW(s): sample weight in class c, initially equal to 1/m for generation 0
// Evaluate Pk:
Compute fitness(i) for each i ∈ Pk;
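$F(d) = \mathrm{AVG}\left( \sum_{c=1}^{2} \sum_{s \in c} \frac{1}{K_c} \times SHC(s) \times SW(s) \right)$, where SW(s) is a sum over $i = 1..K_c$ of an exponential term in $d_i$ and $K_c$, and $d_i$ is the Euclidean distance to s in PLS space.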
do { // Create generation k + 1:
// 1. Copy:
Select (1 − χ) × n members of the sorted Pk and insert into Pk+1;
// 2. Crossover:
Select χ × n members of the sorted Pk based on Fk(d);
pair them up;
produce offspring;
insert the offspring into Pk+1;
// 3. Mutate:
Select μ × n members of Pk+1;
invert a randomly-selected bit in each;
// Evaluate Pk+1:
Compute fitnessk+1(i) for each chromosome i ∈ Pk+1 via eq.(1)~(7);
For( i= 1 to n)%for each chromosome
For ( j = 1 to s, size of training sample)
KNN(j) ← list of KNN neighbors of sample j, length =class size of sample j, Kc
CLASS_ID( j ) ← belonging classes IDs of each sample in KNN(j), size=Kc
SHC(i, j ) ← 0
For ( u = 1 to Kc)
If CLASS_ID( j , u ) = sample j’s belonging class ID
SHC( i, j )++
End
End
End
End
For ( j = 1 to s)% calculate new sample weight for every sample
% via a mean SHR for all chromosomes from the 1st to the nth
SHR(j) = mean(SHC( i, j )/Kc, i=1..n)
SW(j) = SW(j) + (1 - SHR(j))
End
For ( i = 1 to n)
$F(i,d)_{k+1} = \mathrm{AVG}\left( \sum_{c} \sum_{s \in c} \frac{1}{K_c} \times SHC(s) \times SW(s) \right)$, with SW(s) computed from the distances $d_i$ as in the evaluation of Pk
End
// Increment:
k := k + 1;
} while fitness of fittest individual in Pk is not high enough;
return the fittest individual from Pk;
Fig. 3 GA/PLS/KNN algorithm pseudo code
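The sample hit count loop in Fig. 3 can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the function name `sample_hit_counts` is hypothetical, the PC scores are assumed to be given as a two-column NumPy array, and a point is assumed not to count as its own neighbor.

```python
import numpy as np

def sample_hit_counts(scores, labels):
    """For each sample, count same-class points among its K_c nearest neighbours.

    scores: (n_samples, 2) PC scores; labels: (n_samples,) class ids.
    K is set per sample to K_c, the size of the sample's own class.
    """
    n = len(labels)
    diff = scores[:, None, :] - scores[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))     # pairwise Euclidean distances
    np.fill_diagonal(dist, np.inf)              # assumption: exclude the point itself
    shc = np.zeros(n, dtype=int)
    for j in range(n):
        k_c = int((labels == labels[j]).sum())
        k = min(k_c, n - 1)                     # at most n-1 other points exist
        nearest = np.argsort(dist[j])[:k]       # indices of the k closest points
        shc[j] = int((labels[nearest] == labels[j]).sum())
    return shc
```

Under these assumptions a sample can collect at most Kc - 1 hits, which agrees with the bound SHC(s) < Kc stated for Eq. (1).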
Disease weight distribution optimization using PSO particle swarm algorithm
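Input: $C = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1k} \\ \vdots & \ddots & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pk} \end{bmatrix}$, the binary matrix of all selected diseases for all patients after PLS/GA process, k being the
reduced number of 750 from the original total of 2770 diseases.
X0 = C with column j scaled by its weight $w_j$: $X_0 = \begin{bmatrix} w_1 \alpha_{11} & \cdots & w_k \alpha_{1k} \\ \vdots & \ddots & \vdots \\ w_1 \alpha_{p1} & \cdots & w_k \alpha_{pk} \end{bmatrix}$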
Output: [w1, w2,…wk] for selected diseases with max. F(d) cost with 1
While (~StopCondition())
    For (P ∈ population)
        If (Pwt(i,k) = Pg_best)
            Pvelocity ← 0
        Else
            Pvelocity ← v(t+1) = w x Pvelocity(t) + c1 x rand x (Pp_best - Pwt(t)) + c2 x rand x (Pg_best - Pwt(t))    eq.(10)
            Pwt(i,k) ← Pwt(t+1) = Pwt(t) + Pvelocity(t+1)
        End
        If (Cost(Pwt) ≥ Cost(Pp_best))
            Pp_best ← Pwt(i,k)
            If (Cost(Pp_best) ≥ Cost(Pg_best))
                Pg_best ← Pp_best
            End
        End
    End
End
Return(Pg_best)
Fig.4 Modified swarm (PSO) algorithm
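A runnable sketch of the loop in Fig. 4 is given below. It is illustrative only: the function name `pso_maximize`, the inertia and acceleration values (w, c1, c2), the iteration count, and the toy cost function used in the usage note are assumptions, whereas in the paper the cost is the F(d) value from the PLS/KNN calculation. Positions are not clipped to any range.

```python
import numpy as np

def pso_maximize(cost, dim, n_particles=10, iters=200, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximise cost(weights) over weight vectors of length dim."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_particles, dim))        # random initial weight distributions
    vel = np.zeros((n_particles, dim))          # one velocity value per disease weight
    p_best = pos.copy()
    p_cost = np.array([cost(p) for p in pos])
    g = int(np.argmax(p_cost))
    g_best, g_cost = p_best[g].copy(), p_cost[g]
    for _ in range(iters):
        for i in range(n_particles):
            if np.array_equal(pos[i], g_best):
                vel[i] = 0.0                    # Fig. 4: the global-best particle is held still
            else:
                r1, r2 = rng.random(dim), rng.random(dim)
                vel[i] = (w * vel[i] + c1 * r1 * (p_best[i] - pos[i])
                          + c2 * r2 * (g_best - pos[i]))
                pos[i] = pos[i] + vel[i]
            c = cost(pos[i])
            if c >= p_cost[i]:                  # update personal best
                p_best[i], p_cost[i] = pos[i].copy(), c
                if c >= g_cost:                 # update global best
                    g_best, g_cost = pos[i].copy(), c
    return g_best, g_cost
```

For example, with a hypothetical target vector `t` and `cost = lambda x: -np.sum((x - t) ** 2)`, calling `pso_maximize(cost, dim=len(t))` returns a weight vector close to `t`.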
During each generation, the selection, crossover, and mutation operators are applied to the chromosomes to develop new and potentially better solutions. A fraction of the population is then selected as per the selection pressure, which is usually set at 0.5. The top half of the ordered population is mated with the top half of the random population, guaranteeing that the best 50% are selected for reproduction, while ensuring that every string in the randomized copy has a uniform chance of being selected due to the randomized selection criterion imposed on the strings in this population.
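The selection step described above can be sketched as follows. This is a minimal sketch under stated assumptions: the population is a NumPy array of binary chromosomes, "the random population" is taken to mean a shuffled copy of the whole population, and the function name `select_pairs` is hypothetical.

```python
import numpy as np

def select_pairs(population, fitness, rng):
    """Pair the fittest half with the first half of a shuffled copy."""
    n = len(population)
    half = n // 2                               # selection pressure of 0.5
    order = np.argsort(fitness)[::-1]           # indices by descending fitness
    best = population[order[:half]]             # top half of the ordered population
    shuffled = population[rng.permutation(n)]   # randomised copy of the population
    mates = shuffled[:half]                     # top half of the random population
    return list(zip(best, mates))
```

Each returned pair would then be passed to the crossover operator to produce offspring.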
For each pair of strings selected for crossover, two new strings are generated using a variation of three-point crossover. As in the case of simple three-point crossover, the length of each new string or solution is the same as the dimensionality of the data. However, the crossover operator used by the pattern recognition GA is not compelled to preserve order among exchanged string fragments.

D. Weight Optimization using Swarm algorithm (PSO)

A modified PSO algorithm is listed above in Fig. 4. The goal is to add weights to the diseases selected by the previous GA/PLS process. This allows further filtering of the more relevant diseases from the total of 750 diseases after GA/PLS. The process starts by initializing a random weight distribution over the set of diseases. We apply ten particles to the swarm process, and each particle contains a distribution of weights for the selected diseases. A vector of velocity values is also assigned, one for each disease, giving each particle 750 degrees of freedom.

IV. EXPERIMENTS RESULT

This study used health care data from 2006 to 2013, and selected patients with ischemic stroke and hemorrhagic stroke. ICD-9-CM codes 430, 431 and 432 denote hemorrhagic stroke, and ICD-9-CM codes 433, 434, 435, 436 and 437 denote ischemic stroke. The total number of diseases reaches 2770 and serves as the search space for the GA/PLS algorithm.

A. GA-PLS unweighted KNN

We used two pre-organized groups of patients, one with ischemic stroke only and the other with ischemic stroke transformed into hemorrhagic stroke. In each of the two groups, 1000 patients were randomly selected for training. The unweighted GA-PLS method is used, iterating 500 times. It can be seen from the above two figures that although the two groups are separated, the separation is not very obvious and most of the data points overlap. Fig. 5 shows the optimization process for the sample distribution in both the first and second principal components of PLS.

Fig. 7 shows the classification results of feeding test data of a few hundred patients, some with both ischemic stroke and hemorrhagic stroke and some with ischemic stroke only. These patients' 750 diseases are from the optimal selection of the GA/PLS results. The weighted KNN is calculated for this set of data. The first and second principal components are calculated in PLS space and displayed in Fig. 7. The separation of the two classes is quite clear and the accuracy reaches 0.85 on average.

Fig. 6 PLS grouping data after GA, with KNN weighting
Table 1. Diseases filtered using the upper outlier threshold of 161

ICD code    Disease name
7804        Dizziness and giddiness
5640        Constipation
585         Chronic renal failure
401         *Essential hypertension
25090       Diabetes with unspecified complication, Type II [non-insulin dependent type][NIDDM type][adult-onset type] or unspecified type, not stated as uncontrolled
78009       Other alteration of consciousness
2722        Mixed hyperlipidemia
30000       Anxiety state, unspecified
7291        Myalgia and myositis, unspecified
6000        Hypertrophy (benign) of prostate
40210       Benign hypertensive heart disease without congestive heart

Fig. 8 Disease weighting distribution after PSO process

Fig. 9 Box and whisker plot of disease weights in Fig. 8

C. Modified PSO for disease weights optimization

To further refine the search results from the GA/PLS process, we perform a filtering process with the PSO algorithm to discover which diseases have a greater impact on the transformation. The idea is to add weights to the diseases selected by the previous GA-PLS process. Instead of the constant unit weight placed on each selected disease in the original GA/PLS process, we assign a different weight to each disease within the optimal pool of diseases. A set of 30 particles is used in the PSO algorithm to search for a combination of weights that achieves an optimal F(d) objective value in the PLS/KNN calculation. The best seed contains the best distribution of weights over the diseases. Fig. 4 describes the details of the process. The disease weights determine which diseases have a greater impact on the transformation from ischemic to hemorrhagic stroke. Given the optimal weight distribution of the diseases, we apply a quartile rule to these weights and calculate the outlier value. The rule first finds the lower quartile (Q1) and the upper quartile (Q3), and then calculates the interquartile range ΔQ = Q3 - Q1. Since we only want to know which diseases have the highest degree of influence, the outlier threshold is Q3 + 1.5 * ΔQ. Any disease whose weight exceeds this threshold is considered a candidate disease affecting the transformation from ischemic to hemorrhagic stroke. The optimal weight distribution is shown in Fig. 8, and the outlier threshold is found to be 161. A total of 10 diseases is filtered and shown in Table 1. The box and whisker plot of the quartile calculation is shown in Fig. 9.

D. Artificial Neural Network

The traditional Neural Network (NN) method is used in this section to see whether it performs better. The diseases obtained by the GA-PLS weighted method are used in the NN instead of all the diseases in the database, to ensure a common ground for comparison. The network type is back-propagation, the training function is Bayesian regularization (TRAINBR), and the adaptive learning function is gradient descent with momentum (LEARNGDM). The network has 2 layers (a hidden layer and an output layer). The hidden layer contains 10 neurons and uses the LOGSIG (log-sigmoid) transfer function, while the output layer contains 2 neurons and uses the PURELIN (linear) transfer function. Other preset parameters are left unchanged; only the maximum number of validation failures is changed from 6 to 100.

As shown in Fig. 10, the neural network does not show a superior result. Its accuracy is only about 50%, which is similar to tossing a coin.
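The quartile rule of subsection C can be sketched in a few lines of Python. This is a hedged illustration: the function names are hypothetical, and the quartiles are computed with NumPy's default linear interpolation, which may differ from the quartile convention used in the study.

```python
import numpy as np

def upper_outlier_threshold(weights):
    """Return Q3 + 1.5 * (Q3 - Q1) for the given disease weights."""
    q1, q3 = np.percentile(weights, [25, 75])
    return q3 + 1.5 * (q3 - q1)

def select_influential(weights):
    """Indices of diseases whose weight exceeds the upper outlier threshold."""
    weights = np.asarray(weights, dtype=float)
    threshold = upper_outlier_threshold(weights)
    return np.flatnonzero(weights > threshold), threshold
```

For the toy weights `[1, 2, 3, 4, 5, 6, 7, 8, 100]`, Q1 = 3 and Q3 = 7, so the threshold is 13 and only the last entry is selected.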
the ANN shows a lower accuracy on average in the prediction of both kinds of disease correlation, the prediction accuracy in predicting the occurrence of type 1 conversion reaches 77 percent.
V. CONCLUSION
Intelligent Systems (CYBER), 2016 IEEE International Conference
on. IEEE, 2016. p. 126-130.
[6] Adamson, Joy; Beswick, Andy; Ebrahim, Shah. Is stroke the most common cause of disability? Journal of Stroke and Cerebrovascular Diseases, 2004, 13.4: 171-177.
[7] Lee, Peter N., et al. Environmental Tobacco Smoke Exposure and Risk of Stroke in Never Smokers: An Updated Review with Meta-Analysis. Journal of Stroke and Cerebrovascular Diseases, 2017, 26.1: 204-216.
[8] De Jong, S. (1993). SIMPLS: an alternative approach to partial least squares regression. Chemometrics and Intelligent Laboratory Systems, 18: 251-263.
[9] Dong-Yan Huang, Zhengchen Zhang, and Shuzhi Sam Ge, "Speaker state classification based on fusion of asymmetric simple partial least squares (SIMPLS) and support vector machines," Computer Speech & Language, vol. 28, no. 2, pp. 392-419, 2014.
[10] Jae-woo Lee, Hyun-sun Lim, Dong-wook Kim, Soon-ae Shin, Jinkwon Kim, Bora Yoo, Kyung-hee Cho (2017), "The development and implementation of stroke risk prediction model in National Health Insurance Service's personal health record", Computer Methods and Programs in Biomedicine, Vol. 153, pp. 253-257.
[11] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm theory", Proc. Sixth International Symposium on Micro Machine and Human Science, Nagoya, Japan, 1995, pp. 39-43.
[12] J. Kennedy and R. C. Eberhart, "Particle swarm optimization", Proc. IEEE International Conference on Neural Networks (Perth, Australia), IEEE Service Center, Piscataway, NJ, 1995, pp. IV: 1942-1948.
[13] Bilic, I., G. Dzamonja, I. Lusic, M. Matijaca, and K. Caljkusic (2009), "Risk factors and outcome differences between ischemic and hemorrhagic stroke", Acta Clinica Croatica 48 (4): 399-403.
[14] K. Toyoda, Y. Okada, S. Kobayashi (2007), "Early recurrence of ischemic stroke in Japanese patients: the Japan standard stroke registry study", Cerebrovasc, pp. 289-295.
[15] Y.Y. Lee, K.L. Lin, H.S. Wang, M.L. Chou, P.C. Hung, M.Y. Hsieh, et al. (2008), "Risk factors and outcomes of childhood ischemic stroke in Taiwan", Brain, pp. 14-19.