2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC)

The causes analysis of Ischemic Stroke transformation into Hemorrhagic Stroke using PLS (partial least square)-GA and swarm algorithm

Chihhsiong Shih, You-Wei Chang, William Cheng-Chung Chu

Department of computer science
Department of computer science
Taichung, Taiwan
[email protected]

Abstract—Ischemic stroke is known to convert readily to hemorrhagic stroke, so it is crucial to identify high-risk patients with relevant diseases. This study uses the Taiwan National Health Insurance database to collate and analyze the relevant diseases leading from ischemic to hemorrhagic stroke. We propose several novel machine-learning-based algorithms and indexing methods for disease transformation prediction and accuracy enhancement: a modified swarm algorithm, a partial least squares (PLS) algorithm and genetic algorithms. From the claims files accumulated from 2006 to 2013 within the National Health Insurance Database, 8,483 patients with ischemic stroke were collected. Among them, 1,145 patients with both ischemic stroke and hemorrhagic stroke were screened according to the ICD-9-CM diagnostic codes. The disease history of each patient is vectorized and stacked into a matrix for analysis. The PLS/GA process is then applied to the disease history matrix to filter out the candidate diseases leading to such transformation. A total of 750 diseases were found to be associated with ischemic stroke and hemorrhagic stroke through the PLS/GA process. A modified PSO algorithm is developed to further weight these selected diseases. A quartile rule is then applied to filter these weighted diseases down to the ten most influential ones, including dizziness, constipation, chronic renal failure, hypertension, diabetes, hyperlipidemia, anxiety, muscle pain and prostatic hypertrophy. In addition to diseases already known in the literature to be linked to such conversion, such as hypertension and diabetes, we also discovered more potential diseases. The initial analysis accuracy of our proposed methods reaches 86% on average, while that of the traditional neural network algorithm using the same training data was only 49.5%.

Keywords—Partial Least Square Algorithm, Genetic Algorithm, Particle Swarm Algorithm, ischemic stroke, hemorrhagic stroke.

I. INTRODUCTION
In the past four decades, stroke has become one of the top three causes of death in Taiwan. Stroke is divided into ischemic stroke and hemorrhagic stroke, and the two are treated differently: patients with ischemic stroke are treated by neurologists, whereas patients with hemorrhagic stroke are referred to neurosurgeons. Therefore, when a patient's ischemic stroke turns hemorrhagic without being noticed, the consequences can be disastrous. It is thus crucial to develop a predictive model to identify and track high-risk diseases causing such transformation. Clinics can then offer early intervention and provide relevant health education and lifestyle guides for different risk groups, thereby reducing follow-up medical costs and health insurance expenditures.

Brain stroke refers to local occlusion or hemorrhage in the blood vessels of the brain, which damages the brain tissue supplied with blood and nutrients by the affected vessel, causing brain cell damage, necrosis, and local neurological dysfunction [13].

There are two causes of cerebral thrombosis: (1) abnormal blood coagulation increases blood viscosity and forms a thrombus; (2) cerebral vascular (arterial) atherosclerosis forms plaque that narrows the lumen of the artery, resulting in thrombosis. Blood circulation is blocked, resulting in hypoxic necrosis of the brain [15].

Cerebral embolism is caused by emboli (such as blood clots, sclerotic plaques, fat, air bubbles, etc.) which block the cerebral blood vessels and cause ischemic necrosis of the brain [14]. Hemorrhagic transformation (HT) can be a natural outcome in patients with cerebral infarction, and can also occur after stroke treatment. The reported incidence of hemorrhagic transformation is inconsistent across current research. However, severe hemorrhagic transformation is disabling and carries a high mortality rate, which draws close attention and precautionary efforts from clinicians. The risk factors include anticoagulant indications, demographic data (gender, race and age), and medical factors (blood pressure, glycemic control, lipid distribution and renal function [2]) [6]. In this study we only explore medical factors and look for related diseases.

We first analyze the set of 8,483 patients with ischemic stroke. Among them, the 1,145 patients with both ischemic stroke and hemorrhagic stroke are organized into a training data feed. A GA/PLS algorithm is used to select, among 2770 diseases, all possible relevant diseases that lead to the maximum differentiation rate between the two classes. Given the still large number of diseases selected by the GA/PLS process, we developed a modified swarm (PSO) algorithm for optimizing the weights of the selected diseases. Once an optimal weight distribution reaching an accuracy of 87% is found, a quartile rule is applied to further filter the 750 diseases down to 10. In addition to two regularly recognized diseases causing conversion, we also discovered eight more diseases worthy of attention.

978-1-7281-2607-4/19/$31.00 ©2019 IEEE 720

DOI 10.1109/COMPSAC.2019.00108

Authorized licensed use limited to: REVA UNIVERSITY. Downloaded on September 13,2022 at 03:06:21 UTC from IEEE Xplore. Restrictions apply.
Section II describes previous studies of disease transformation, specifically ischemic stroke transformation. Section III describes the research methods, which include data preprocessing, Partial Least Squares as a dimension reduction method, the GA/PLS method as the machine learning technique adopted by this paper, and the swarm algorithm for further feature extraction. Section IV gives the experiment results on the big data set of stroke samples for the proposed machine learning methods. We compare the test data accuracy of the traditional neural network method and our GA/PLS methods with regard to the true positive and false negative rates. Finally, we list ten high-risk diseases most likely leading to the transformation.

II. RELEVANT WORKS
Many related studies have analyzed the causes of stroke, ischemic stroke and hemorrhagic stroke, but most of them rely on statistical methods such as regression analysis, multivariate analysis and logistic regression, and the vast majority are directed at specific individual diseases or physiological factors already known to have an effect. In this paper, we perform a big data analysis on all the diseases that all patients have had. The goal is to find out if there are other diseases that may affect the transformation. The following is a comparison of some related studies.

Jie Zhang, Yi [3] and others conducted a correlation study on hemorrhagic transformation and its related inducing factors. The results showed that large-area infarction, cortical infarction, atrial fibrillation and cerebral embolism, hyperglycemia, low TC and low-density lipoprotein cholesterol levels, low platelet counts, poor collateral vessels, thrombolytic therapy, globulin increase, early CT signs, HMCAS, etc. all have high impact. The authors recommend more caution in the treatment of patients with these predictive factors, and close monitoring.

Marsh EB et al. [2] created a hemorrhagic risk stratification (HeRS) score based on regression coefficients in multivariate modeling, primarily for effective predictors of ischemic stroke and anticoagulation index. The results show that deteriorating eGFR categories, greater infarct volume, high serum glucose, higher NIH Stroke Scale scores, TPA treatment, leukemia resection, and elevated white blood cell count were all associated with an increased risk of hemorrhagic transformation.

Chuang CS et al. [4] used the health insurance database to conduct a cohort study on the correlation between pneumoconiosis and cerebrovascular disease. The analysis showed that the pneumoconiosis group had a higher ischemic stroke rate than the control group. The authors therefore suggested that when patients are clinically diagnosed with pneumoconiosis, appropriate education and follow-up should be considered, with attention to the risk factors associated with brain stroke.

Adamson, J [6] and others studied whether stroke and disability were related. Logistic regression modeling was used to find that stroke is not the cause of disability but is highly associated with it, and that, compared with other diseases, stroke is considered the most common cause of complex disabilities.

Peter N. Lee Ma [7] et al. studied whether non-smokers in a smoking environment are at risk of stroke. The results showed that such exposure did not prove to be influential, although other researchers speculated that there was a cause-and-effect relationship.

In this paper, we propose to apply the PLS/GA and PSO algorithms to data mining on a health care database in Taiwan. PLS/GA is used for an initial reduction of the problem dimension from 2770 diseases to 750 diseases. The process includes adjusting the weights and removing the noise within the training dataset. Given the still large set of diseases, we developed a modified PSO swarm algorithm to fine-tune these diseases and applied a basic quartile rule from statistics to find a threshold value that filters out the most relevant diseases for final evaluation and diagnostic application in clinics.

III. RESEARCH METHODS
This study used a genetic algorithm plus PLS to find closely related diseases, and finally used PSO to find the most influential diseases. Fig. 1 depicts the workflow of the research methods. As shown in Fig. 1, the disease history of each patient is entered as input to the GA process for dimension reduction. The selected gene decides the set of diseases for principal component analysis (PCA) through the PLS process. Given the PC values of each sample point (each patient), the KNN process is exercised to derive the classification quality as a fitness function value. A threshold value is chosen as the criterion for stopping the whole process; otherwise the GA and PLS algorithm is repeated until an optimal set of diseases is selected that generates the best classification quality.

Fig. 1 Dataflow for GA/PLS/KNN

A. Data processing
First, we used the Statistical Analysis System (SAS) to extract the medical records of ischemic stroke and hemorrhagic stroke from the health care database. Hemorrhagic stroke is identified by ICD-9-CM codes 430, 431 and 432, and ischemic stroke by ICD-9-CM codes 433, 434, 435, 436 and 437. We then divide the data into two classes: one with ischemic stroke only, and the other with ischemic stroke transforming to hemorrhagic stroke. The disease history of each patient of the two classes is extracted to form a single data set as the feature exploration basis for our proposed machine learning algorithms. Each record in this dataset is a list over all diseases, in which an element is 1 if the patient has been diagnosed with the corresponding disease and 0 otherwise. The total size of the list is 2770, covering all diseases accounted for by the national health care database.

B. Partial Least Squares (SIMPLS)
PLS is a method of multivariate statistical analysis proposed by Herman Wold (1975), which has developed rapidly in theory, methods and applications in recent years. In building the model, PLS combines important statistical techniques such as multiple linear regression, principal component analysis and canonical correlation analysis. However, traditional PLS requires repeated iterative operations and data compression to obtain the correct weight vector and score vector, which is time-consuming. Therefore, this paper uses the method proposed by De Jong, S. (1993) [8], which obtains the score vector directly from the original data without repeated iterative operations. This method is SIMPLS. The weight vectors r, q are normalized, that is, r′r = 1 and q′q = 1; r1 is the first left singular vector of S = X′Y after the end of the first round of operation, and q1 is the first right singular vector of S = X′Y, which is different from the score vector in which we take the weight vector as the largest variation. In the SIMPLS algorithm, Fig. 2, we set the weight vector r to be the first left singular vector of S. The linear model estimate of Y can be obtained from Y = TC′ + F* and C′ = T′Y [9], as in Fig. 2.

SIMPLS Algorithm
    S = X0^T Y
    for i = 1 to k
        if i = 1, [u,s,v] = svd(S)
        if i > 1, [u,s,v] = svd(S − P_{i−1} (P_{i−1}^T P_{i−1})^{−1} P_{i−1}^T S)
        r_i = u(:,1)
        t_i = X0 r_i
        p_i = X0^T t_i / (t_i^T t_i)
    end
    B_PLS = R_k (T_k^T T_k)^{−1} T_k^T Y0
Fig. 2 SIMPLS algorithm listing

C. Genetic Algorithm and Partial Least Square (GA/PLS)
The GA-PLS algorithm pseudo code is shown in Fig. 3. At first, the genetic algorithm randomly generates a number of binary strings whose length equals the length of the original training data. This is called the first-generation code, and each string is called a chromosome. Each set of codes represents a feasible solution (possibly containing the best solution). For a single chromosome, all data set to 1 are extracted and included in a subset ready for the PLS process. The original 2770 dimensions were then reduced to 750 dimensions by the GA in Fig. 3, and all of these dimensions were selected as the grouping standard. The training sample binary matrix is

$$X_0 = M = \begin{bmatrix} \alpha_{11} & \cdots & \alpha_{1k} \\ \vdots & \ddots & \vdots \\ \alpha_{p1} & \cdots & \alpha_{pk} \end{bmatrix}, \quad \alpha_{ij} \in \{0,1\},$$

where i = 1..p (patient number) and j = 1..k, from GA; k is the reduced disease number of 750 from the original total of 2770 diseases, and Y0 = [1..j].

Before gene manipulation, two procedures are needed according to Fig. 3: the PC score calculation used by the KNN algorithm, and the fitness function calculation. The PC score calculation is based on general PCA (Principal Component Analysis), which needs M_ij on the selected diseases' signal profile accumulated as eigenvectors. The PC score of each sample is derived from the first and second principal components of the PLS calculation in Fig. 3. Given the PC scores of all samples of the assigned classes, the KNN algorithm is exercised to calculate the data required by the fitness function for the GA algorithm. This is explained in more depth after the following description of the fitness function.

Eq. (1) gives the definition of the fitness function. The fitness function is composed of the normalized class weights SHC(s)/Kc and the sample weights SW(s). First, the number of nearest neighbors with the same class label as the sample point in question is computed as the sample hit count (SHC), 0 < SHC(s) < Kc, where Kc is the number of samples in class c. These numbers are fixed for each generation of gene mutation.

$$F(d) = \sum_{c} \sum_{s \in c} \frac{1}{K_c} \times SHC(s) \times SW(s) \tag{1}$$

$$CW(c) = 100 \times \frac{CW(c)}{\sum_{c} CW(c)} \tag{2}$$

$$SW(s) = CW(c) \times \frac{SW(s)}{\sum_{s \in c} SW(s)} \tag{3}$$

Each principal component plot generated for each feature subset is scored using the K-nearest neighbor (K-NN) classification algorithm. For a given data point for a patient, Euclidean distances are computed between it and every other point in the principal component plot. These distances are sorted and a poll is then taken of the point's K nearest neighbors. For the most ideal classification situation, K equals the number of samples in the class to which the point belongs, Kc in this case.

Following the PC score and fitness function calculation, the GA takes over and performs gene manipulation to generate a new set of feature disease selections ready for the sample weight adjustment step. This step guarantees the successful operation of the GA, thereby helping to minimize the problem of convergence to a local optimum.

Sample weight adjustment is performed in three steps. First, the sample hit rate (SHR) is computed as the mean value of SHC/Kc over all feature subsets (ξ) produced in a particular generation g (see Eq. 4). The class hit rate (i.e., the average sample hit rate for all samples in a class) is computed using SHRg (Eq. 5). CHR provides consistent information about the difficulty in classifying a particular sample type. Second, class and sample weights are adjusted during each generation via Eqs. 6 and 7 respectively, where CWg+1(c) is the class weight for c during generation g+1, and SWg+1(s) is the sample weight for s during generation g+1. Both are influenced by SHRg. Classes and samples with low hit rates will be weighted more heavily, that is, they will have more influence in the fitness calculation than classes or samples that score well. The purpose is to enhance the overall sample hit count, which is directly proportional to the quality of sample agglomeration of the final selected classes.

$$SHR_g(s) = \frac{1}{\xi} \sum_{i=1}^{\xi} \frac{SHC_i(s)}{K_c}, \quad \forall s \in c \tag{4}$$

$$CHR_g(c) = \frac{1}{\xi} \sum_{s \in c} SHR_g(s), \quad \forall s \in c \tag{5}$$

$$CW_{g+1}(c) = CW_g(c) + 1 - CHR_g(c) \tag{6}$$

$$SW_{g+1}(s) = SW_g(s) + 1 - SHR_g(s) \tag{7}$$
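As a minimal sketch of the scoring described above, the following Python/NumPy code computes the sample hit count SHC(s) with K = Kc neighbours, the fitness of Eq. (1), and the sample-weight update of Eqs. (4) and (7) for the special case of a single feature subset (ξ = 1). The function names, the toy score matrix and the uniform initial weights are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import numpy as np

def sample_hit_counts(scores, labels):
    """Return (SHC, Kc) per sample: same-class hits among the Kc nearest neighbours."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances in the (PC1, PC2) score plot.
    dist = np.linalg.norm(scores[:, None, :] - scores[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # assumption: a point is not its own neighbour
    shc = np.zeros(len(labels))
    kc = np.zeros(len(labels))
    for s, c in enumerate(labels):
        kc[s] = np.sum(labels == c)                  # Kc: size of the sample's class
        nearest = np.argsort(dist[s])[: int(kc[s])]  # poll of the Kc nearest points
        shc[s] = np.sum(labels[nearest] == c)        # neighbours sharing the label
    return shc, kc

def fitness(shc, kc, sw):
    """Eq. (1): sum over classes and samples of SHC(s) / Kc * SW(s)."""
    return float(np.sum(shc / kc * sw))

def update_sample_weights(sw, shc, kc):
    """Eqs. (4) and (7) with one feature subset: SW <- SW + 1 - SHC / Kc."""
    return sw + 1.0 - shc / kc

# Toy data (hypothetical): two tight clusters of three patients each.
scores = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels = np.array([0, 0, 0, 1, 1, 1])
sw0 = np.full(len(labels), 1.0 / len(labels))  # assumed uniform starting weights

shc, kc = sample_hit_counts(scores, labels)
f0 = fitness(shc, kc, sw0)
sw1 = update_sample_weights(sw0, shc, kc)
print(shc, f0, sw1)
```

Because each point is excluded from its own neighbour list in this sketch, the largest hit count a sample can reach is Kc − 1, so samples in perfectly separated classes still receive a small weight increase.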
Algorithm : GA+PLS+KNN

// Initialise generation 0:
Input: Training data set, M = [α11 ⋯ α1m; ⋮ ⋱ ⋮; αp1 ⋯ αpm], αij = {0,1}, p = number of patients, m = total disease number; 0 stands for without the disease, while 1 stands for with such disease. Binary strings of
chromosomes, Gij (i=1..n, j=1..m), n := the number of individual chromosomes in the population;

k := 0;
χ := the fraction of the population to be replaced by crossover in each iteration, 0.5 in this application;
μ := the mutation rate;
Pk := a population of n randomly-generated individuals;

Output: One string in Gij with the best fitness function

SHC(s): sample hit count for class c, s:=1..2, given a PC(principle component) from PLS calculation of
Ri, based on a random initial distribution of Gij
Kc(c): sample number in class c
SW(s): sample weight in class c, initially equal to 1/m for generation 0

// Evaluate Pk:
Compute fitness(i) for each i ∈ Pk;
F(d) = AVG( Σ_{c=1..2} Σ_{s∈c} (1/Kc) × SHC(s) × SW(s) ), with SW(s) a sum over i = 1..Kc of an exponential term in d_i/kc, where d_i is the Euclidean distance to s in PLS space.
do { // Create generation k + 1:
// 1. Copy:
Select (1 − χ) × n members of the sorted Pk and insert into Pk+1;
// 2. Crossover:

Select χ × n members of the sorted Pk based on Fk(d);
pair them up;
produce offspring;
insert the offspring into Pk+1;
// 3. Mutate:
Select μ × n members of Pk+1;
invert a randomly-selected bit in each;
// Evaluate Pk+1:
Compute fitnessk+1(i) for each chromosome i ∈ Pk+1 via eq.(1)~(7);
For( i= 1 to n) % for each chromosome
    For ( j = 1 to s, size of training sample)
        KNN(j) ← list of KNN neighbors of sample j, length = class size of sample j, Kc
        CLASS_ID( j ) ← belonging class IDs of each sample in KNN(j), size = Kc
        SHC(i, j ) ← 0
        For ( u = 1 to Kc)
            If CLASS_ID( j , u ) = sample j's belonging class ID
                SHC( i, j )++
            End
        End
    End
End
For ( u= 1 to m) % calculate new sample weight for every sample
    % via a mean SHR for all chromosomes from the 1st to the n-th
    SHR(u) = mean(SHC( i, u )/Kc, i=1..n)
    SW(u) = SW(u) + (1- SHR(u) )
End

% modify the class weight on next training generation

For(w = 1 to size of all training chromosome)
    CHR(w) = AVG(SHR(s), s = 1 to Kc)
    CW(w) = CW(w) + (1- CHR(w) )
End
% compound the sample weight with the class weight
For ( u= 1 to m)
    SW(u) = CW(c)*SW(u)/SUM(SW(s), s ∈ c)
End

For( i= 1 to n)
    F(i, d)_{k+1} = AVG( Σ_c Σ_{s∈c} (1/Kc) × SHC(s) × SW(s) ), with SW(s) a sum over i = 1..Kc of an exponential term in d_i/kc
End
// Increment:
k := k + 1;
} while fitness of fittest individual in Pk is not high enough;
return the fittest individual from Pk;
Fig. 3 GA/PLS/KNN algorithm pseudo code
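The PLS step that Fig. 3 relies on, i.e. the SIMPLS listing of Fig. 2, can be sketched in Python/NumPy as follows. This is an illustrative transcription under stated assumptions rather than the authors' code: the function name `simpls`, the random test data and the choice of three components are hypothetical, and no centering is applied because the listing does not mention it.

```python
import numpy as np

def simpls(X0, Y0, k):
    """Sketch of the Fig. 2 listing; returns weights R, scores T, loadings P and B_PLS."""
    X0 = np.asarray(X0, dtype=float)
    Y0 = np.asarray(Y0, dtype=float).reshape(X0.shape[0], -1)
    S = X0.T @ Y0                                  # S = X0^T Y
    R, T, P = [], [], []
    for i in range(k):
        if i == 0:
            S_i = S
        else:
            Pm = np.column_stack(P)                # loadings p_1 .. p_{i-1}
            # Remove from S its projection onto the span of the earlier loadings.
            S_i = S - Pm @ np.linalg.solve(Pm.T @ Pm, Pm.T @ S)
        u, _, _ = np.linalg.svd(S_i, full_matrices=False)
        r = u[:, 0]                                # weight: first left singular vector
        t = X0 @ r                                 # score vector t_i = X0 r_i
        p = X0.T @ t / (t @ t)                     # loading p_i = X0^T t_i / (t_i^T t_i)
        R.append(r)
        T.append(t)
        P.append(p)
    R, T, P = (np.column_stack(m) for m in (R, T, P))
    B = R @ np.linalg.solve(T.T @ T, T.T @ Y0)     # B_PLS = R (T^T T)^-1 T^T Y0
    return R, T, P, B

# Hypothetical data, not the study's disease matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)
R, T, P, B = simpls(X, y, k=3)
gram = T.T @ T   # off-diagonal entries should vanish: the scores are orthogonal
```

In the workflow of Fig. 1 only the first two score columns of `T` would be passed on to the KNN scoring step; using as many components as there are columns in `X`, as done here, makes `B` coincide with the ordinary least-squares solution, which is a convenient sanity check.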

Disease weight distribution optimization using PSO particle swarm algorithm

Input: C = [α11 ⋯ α1k; ⋮ ⋱ ⋮; αp1 ⋯ αpk], the binary matrix of all selected diseases for all patients after PLS/GA process, k being the reduced number of 750 from the original total of 2770 diseases.
Output: [w1, w2, …, wk] for selected diseases with max. F(d) cost, with X0 = C = [w1∗α11 ⋯ wk∗α1k; ⋮ ⋱ ⋮; w1∗αp1 ⋯ wk∗αpk], and Y0=[1..j] in the SIMPLS.

Population ← 0
Pg_best ← 0
c1=c2←2.0
rand←-1~1
P={P1, P2…P10}; %particle 1~particle 10
X={X1,X2…X10}; %Ten weight distributions of k diseases for particle 1~ 10
Pwt ={Xi1, Xi2…Xik}; %The weight distributions of k diseases for particle i
V={V1,V2…V10}; %particle velocity for each particle 1~10
Pvelocity={Vi1, Vi2…Vik}; %The changing velocity for weights, xij, in particle i
PopulationSize← 10
For (i = 1 to PopulationSize)
    Pvel ← Random Velocity()
    Pwt(i,k) ← Random Position(PopulationSize)
    Pp_best ← Pwt(i,k)
    % Cost(p) ← F(p) = Σ_{c=1..2} Σ_{s∈c} (1/Kc) × SHC(s) × SW(s), with SW(s) a sum over i = 1..Kc of an
    % exponential term in d_i/kc, where d_i is the Euclidean distance to s in PLS space.
    If (Cost(Pp_best) ≥ Cost(Pg_best))
        Pg_best ← Pp_best
    End
End

While (~StopCondition())
    For (P ∈ population)
        If (Pwt(i,k) = Pg_best)
            Pvelocity ← 0
        Else
            Pvelocity ← v(t+1) = w × Pvelocity(t) + c1 × rand × (Pp_best − Pwt(t)) + c2 × rand × (Pg_best − Pwt(t))    eq.(10)
            Pwt(i,k) ← Pwt(t+1) = Pwt(t) + Pvelocity(t+1)
        End
        If (Cost(Pwt) ≥ Cost(Pp_best))
            Pp_best ← Pwt(i,k)
            If (Cost(Pp_best) ≥ Cost(Pg_best))
                Pg_best ← Pp_best
            End
        End
    End
End
Return(Pg_best)
Fig.4 Modified swarm (PSO) algorithm
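The loop of Fig. 4 can be sketched in Python/NumPy as below. The sketch follows the listing's settings (c1 = c2 = 2.0, rand drawn from [−1, 1], zero velocity for a particle sitting on the global best), but several pieces are assumptions introduced only to make it runnable: the inertia weight `w = 0.7`, the velocity clamp `v_max`, the fixed iteration count used as the stop condition, and above all the toy cost function, which stands in for the PLS/KNN fitness F(p).

```python
import numpy as np

def pso_maximise(cost, dim, n_particles=10, n_iter=100,
                 w=0.7, c1=2.0, c2=2.0, v_max=0.5, seed=0):
    """Maximise cost(weights) with the update rule of eq. (10); returns best, its cost, start cost."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(0.0, 1.0, size=(n_particles, dim))    # one weight vector per particle
    v = rng.uniform(-1.0, 1.0, size=(n_particles, dim))   # one velocity per weight
    p_best = x.copy()
    p_cost = np.array([cost(p) for p in x])
    g = int(np.argmax(p_cost))
    g_best, g_cost = p_best[g].copy(), float(p_cost[g])
    start_cost = g_cost
    for _ in range(n_iter):                               # assumed stop condition
        for i in range(n_particles):
            if np.array_equal(x[i], g_best):
                v[i] = 0.0                                # particle on the global best rests
            else:
                r1, r2 = rng.uniform(-1.0, 1.0, size=2)   # rand in [-1, 1], as in the listing
                v[i] = (w * v[i]
                        + c1 * r1 * (p_best[i] - x[i])
                        + c2 * r2 * (g_best - x[i]))
                v[i] = np.clip(v[i], -v_max, v_max)       # assumption: keep steps bounded
                x[i] = x[i] + v[i]
            c = cost(x[i])
            if c >= p_cost[i]:                            # personal best
                p_best[i], p_cost[i] = x[i].copy(), c
                if c >= g_cost:                           # global best
                    g_best, g_cost = x[i].copy(), float(c)
    return g_best, g_cost, start_cost

# Toy cost (hypothetical): closeness to a fixed target weight vector.
target = np.array([0.2, 0.8, 0.5, 0.1, 0.9])

def toy_cost(weights):
    return -float(np.sum((weights - target) ** 2))

g_best, g_cost, start_cost = pso_maximise(toy_cost, dim=target.size)
```

In the paper's setting `dim` would be the 750 selected diseases and `cost` would re-run the weighted PLS/KNN scoring for each candidate weight vector, which is far more expensive than this stand-in.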

During each generation, the selection, crossover, and mutation operators are applied to the chromosomes to develop new and potentially better solutions. A fraction of the population is then selected as per the selection pressure, which is usually set at 0.5. The top half of the ordered population is mated with the top half of the random population, guaranteeing that the best 50% are selected for reproduction, while ensuring that every string in the randomized copy has a uniform chance of being selected due to the randomized selection criterion imposed on the strings in this population.
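The mating scheme just described can be sketched as follows, assuming a selection pressure of 0.5. The function name `select_pairs`, the random chromosomes and the fitness values are hypothetical and serve only to illustrate how the ranked half is paired with the first half of a shuffled copy of the population.

```python
import numpy as np

def select_pairs(population, fitness, pressure=0.5, seed=0):
    """Pair the best fraction of the ranked population with a shuffled copy."""
    rng = np.random.default_rng(seed)
    population = np.asarray(population)
    n_sel = int(len(population) * pressure)
    ranked = population[np.argsort(fitness)[::-1]]           # ordered copy, best first
    shuffled = population[rng.permutation(len(population))]  # randomized copy
    # Top of the ordered copy mates with the top of the randomized copy.
    return list(zip(ranked[:n_sel], shuffled[:n_sel]))

# Hypothetical population of six 8-bit chromosomes with made-up fitness values.
population = np.random.default_rng(1).integers(0, 2, size=(6, 8))
fitness = [0.1, 0.9, 0.4, 0.7, 0.2, 0.5]
pairs = select_pairs(population, fitness)
```

Every chromosome has the same chance of appearing as the second parent, while the first parent is always drawn from the best half, which is the balance the text attributes to this scheme.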

For each pair of strings selected for crossover, two new strings are generated using a variation of three-point crossover. As in the case of simple three-point crossover, the length of each new string or solution is the same as the dimensionality of the data. However, the crossover operator used by the pattern recognition GA is not compelled to preserve order among exchanged string fragments.

D. Weight Optimization using Swarm algorithm (PSO)
A modified PSO algorithm is listed above in Fig. 4. The goal is to add weights to the diseases selected by the previous GA/PLS process. This allows further filtering of the more relevant diseases from the total of 750 diseases after GA/PLS. The process starts by initializing a random weight distribution over the set of diseases. We apply ten particles to the swarm process, and each particle contains a distribution of weights for the selected diseases. A vector of velocity values is also assigned, one per disease, giving each particle 750 degrees of freedom.

The swarm process then performs the optimization by moving these particles around, trying to find a globally optimal fitness value in the PLS space. Each iteration of the particle search locates a locally optimal combination of weights leading to an optimal fitness value. The velocity vector of each particle is updated according to eq. (10) in Fig. 4. After around 100 iterations, the globally optimal distribution of weights is found, generating the best classification results.

IV. EXPERIMENTS RESULT
This study used health care data from 2006 to 2013 and selected patients with ischemic stroke and hemorrhagic stroke. ICD-9-CM codes 430, 431 and 432 denote hemorrhagic stroke, and ICD-9-CM codes 433, 434, 435, 436 and 437 denote ischemic stroke. The total number of diseases reaches 2770 and serves as the search space for the GA/PLS algorithm.

A. GA-PLS unweighted KNN
We used two pre-organized groups of patients, one with ischemic stroke only and the other with ischemic stroke transforming to hemorrhagic stroke. In each of the two groups, 1000 patients were randomly selected for training. The unweighted GA-PLS method is used, iterating 500 times. It can be seen from the two figures that although the two groups are separated, the separation is not very obvious and most of the data points overlap. Fig. 5 shows the optimization process for the sample distribution in PLS's first and second principal components.

Fig. 5 PLS grouping data after GA, without KNN weighting

B. GA/PLS weighted KNN
In this section, the weighted GA-PLS method is applied to the same set of training samples as in the previous subsection. The weighted KNN uses eq. 2 with the same number of iterations as before. Its classification results are better than those without weighting. The optimal selection of affecting diseases reaches a total of 750 from the original total of 2770. These diseases provide an optimal differentiation result between the two groups of the test populations. The final cost function value reaches around 0.85, as shown in Fig. 6.

Fig. 6 PLS grouping data after GA, with KNN weighting

Fig. 7 shows the classification results of feeding test data of a few hundred patients, containing both patients with ischemic and hemorrhagic stroke and patients with ischemic stroke only. These patients' 750 diseases come from the optimal selection of the GA/PLS results. The weighted KNN is calculated for this set of data. The first and second principal components are calculated in PLS space and displayed in Fig. 7. The separation of the two classes is quite clear and the accuracy reaches 0.85 on average.

Fig. 7 PLS classification result of test data (axes: 1st principle component, 2nd principle component)

C. Modified PSO for disease weights optimization
To further refine the search results from the GA/PLS process, we perform a filtering process by PSO algorithm to discover which diseases have a greater impact on the transformation. The idea is to add weights to the diseases selected by the previous GA-PLS process. Instead of the constant unit weight put on each selected disease in the original GA/PLS process, we assign a variety of weights to each disease within the optimal pool of diseases. A set of 30 particles is used in the PSO algorithm to search for a combination of weights that achieves an optimal F(d) objective value in the PLS/KNN calculation. The best seed contains the best distribution of the weights over the diseases. Fig. 4 describes the details of the process. The weights of the diseases determine which diseases have a greater impact on the transformation from ischemic to hemorrhagic stroke. Given the optimal weight distribution of the diseases, we apply a quartile rule to these weights and calculate the outlier value. The rule first finds the lower quartile (Q1) and the upper quartile (Q3), and then calculates the interquartile range ΔQ = Q3 − Q1. Since we only want to know which diseases have the highest degree of influence, the outlier threshold is Q3 + 1.5 × ΔQ. Any disease whose weight exceeds this threshold is considered a candidate disease affecting the transformation from ischemic to hemorrhagic stroke. The optimal weight distribution is shown in Fig. 8, and the outlier threshold is found to be 161. A total of 10 diseases is filtered and shown in Table 1. The box and whisker plot of the quartile calculation is shown in Fig. 9.

Table 1. Filtered disease using the upper outlier of 161
ICD code | Disease name
7804 | Dizziness and giddiness
5640 | Constipation
585 | Chronic renal failure
401 | *Essential hypertension
25090 | Diabetes with unspecified complication, Type II [non-insulin dependent type][NIDDM type][adult-onset type] or unspecified type, not stated as uncontrolled
78009 | Other alteration of consciousness
2722 | Mixed hyperlipidemia
30000 | Anxiety state, unspecified
7291 | Myalgia and myositis, unspecified
6000 | Hypertrophy (benign) of prostate
40210 | Benign hypertensive heart disease without congestive heart

Fig. 8 disease weighting distribution after PSO process

Fig. 9 Box and whisker plot of disease weights in Fig. 8

D. Artificial Neural Network
The traditional neural network (NN) method is used in this section to see if it performs better. The diseases obtained by the weighted GA-PLS method are used in the NN instead of all the diseases in the database, to ensure a common ground for comparison. Network type: backpropagation; training function: Bayesian regularization (TRAINBR); adaptive learning function: gradient descent with momentum (LEARNGDM). Network layers: 2 (hidden layer and output layer). The hidden layer contains 10 neurons and uses the LOGSIG (log-sigmoid) transfer function, while the output layer contains 2 neurons and uses the PURELIN (linear) transfer function. Other preset parameters are unchanged, except that the maximum number of validation failures is changed from 6 to 100.
As shown in Fig. 10, the neural network does not show a superior result. Its accuracy is only 50%, which is similar to tossing a coin.
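The quartile rule of Section IV.C (threshold = Q3 + 1.5 × ΔQ) can be illustrated with a short Python/NumPy sketch. The weights below are made up for the example and are unrelated to the PSO output of Fig. 8; `np.percentile` with its default linear interpolation is one common quartile convention, and the paper does not state which convention was used.

```python
import numpy as np

def upper_outlier_threshold(weights):
    """Return Q3 + 1.5 * (Q3 - Q1) for a vector of disease weights."""
    q1, q3 = np.percentile(weights, [25, 75])
    return q3 + 1.5 * (q3 - q1)

# Hypothetical weights: eight ordinary values and one clear outlier.
weights = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 100.0])
threshold = upper_outlier_threshold(weights)
selected = np.flatnonzero(weights > threshold)  # indices of the retained diseases
print(threshold, selected)
```

For these values Q1 = 3 and Q3 = 7, so the threshold is 13 and only the last entry is retained; applied to the 750 PSO weights, the same rule gives the paper's threshold of 161.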

Fig. 10 Artificial Neural Network parameters settings

E. Comparison of test data accuracy
The testing accuracies of the PLS/GA/KNN process under different combinations of noise removal and weighted KNN indexing are compared in Table 2, which lists P1, P2, and their average. P1 is the ratio of first-type data assigned to the correct class, that is, patients whose ischemic stroke transformed into hemorrhagic stroke. P2 is the ratio of second-type data assigned to the correct class, that is, patients with ischemic stroke only.
From these data, we can observe that, in the GA/PLS process, the weighted KNN coupled with noise removal produces the highest accuracy. It reaches 0.86 on average, while the neural network reaches only 0.495. This provides further evidence in favor of our proposed algorithms.
This result demonstrates not only the effectiveness of our KNN weighting model in enhancing disease selection, but also the effectiveness of noise removal. It is worth noting that, although the ANN shows a lower average accuracy across both kinds of disease correlation, its accuracy in predicting the occurrence of type 1 conversion reaches 77 percent.

Table 2 Testing accuracy comparison
Process tuning parameters                P1     P2     Average
GA-PLS (Unweighted, no noise removal)    0.54   0.58   0.56
GA-PLS (Weighted, no noise removal)      0.60   0.66   0.63
Neural network                           0.77   0.22   0.495
GA-PLS (Weighted, noise removal)         0.72   1.00   0.86

V. CONCLUSION

Our proposed PLS/GA/KNN algorithm works side by side with the modified PSO algorithm in filtering out the relevant diseases causing the transformation of ischemic stroke to hemorrhagic stroke. The PSO algorithm fine-tunes the disease weights for an optimal fitness value. The search space has 750 degrees of freedom, corresponding to the diseases selected by the PLS/GA process.

Diseases identified by PSO as affecting the transformation of ischemic stroke into hemorrhagic stroke include dizziness, constipation, chronic renal failure, hypertension, diabetes, hyperlipidemia, anxiety, muscle pain, prostatic hypertrophy, etc. In recent medical research [2,10], hypertension and hyperglycemia have been identified as risk factors for triggering the transformation. Those results are consistent with our findings, which lends credibility to them. In addition, our research discovered more diseases that could have led to the transformation. This evidence is reinforced by the high differentiation rate of our novel machine learning and data mining methods. This evidence cannot prove derivative (cause-and-effect) relationships among these conditions, and more work needs to be done to refine brain stroke research. However, if a patient already has an ischemic stroke and also has these relevant diseases, it is still necessary to pay closer attention to the patient in order to avoid deterioration of the condition.

ACKNOWLEDGMENT
The authors wish to express their appreciation for the kind support of the Ministry of Science and Technology, Taiwan, R.O.C., for this work under grant number “MOST 106-2221-E-029-006 -”.

REFERENCES
[1] Bilic, I., G. Dzamonja, I. Lusic, M. Matijaca, and K. Caljkusic. 2009. "Risk factors and outcome differences between ischemic and hemorrhagic stroke." Acta Clinica Croatica 48 (4): 399-403.
[2] Marsh EB, Llinas RH, Schneider AL, Hillis AE, Lawrence E, Dziedzic P, et al. Predicting hemorrhagic transformation of acute ischemic stroke: prospective validation of the HeRS Score. Medicine (Baltimore). 2016;95:e2430.
[3] Zhang J, Yang Y, Sun H, Xing Y. Hemorrhagic transformation after cerebral infarction: current concepts and challenges. Ann Transl Med. 2014;2:81.
[4] Chuang CS, Ho SC, Lin CL, Lin MC, Kao CH. "Risk of Cerebrovascular Events in Pneumoconiosis Patients: A Population-based Study, 1996-2011." Medicine (Baltimore). 2016 Mar; 95(9): e2944.
[5] Shi, Guangyi, et al. The human body characteristic parameters extraction and disease tendency prediction based on multi-sensing fusion algorithms. In: Cyber Technology in Automation, Control, and Intelligent Systems (CYBER), 2016 IEEE International Conference on. IEEE, 2016. p. 126-130.
[6] Adamson, Joy; Beswick, Andy; Ebrahim, Shah. Is stroke the most
common cause of disability? Journal of Stroke and Cerebrovascular
Diseases, 2004, 13.4: 171-177.
[7] Lee, Peter N., et al. Environmental Tobacco Smoke Exposure and
Risk of Stroke in Never Smokers: An Updated Review with Meta-
Analysis. Journal of Stroke and Cerebrovascular Diseases, 2017, 26.1:
204-216.
[8] De Jong, S. (1993). SIMPLS: an alternative approach to partial least
squares regression. Chemometrics and Intelligent Laboratory
Systems, 18:251-263.
[9] Dong-Yan Huang, Zhengchen Zhang, and Shuzhi Sam Ge, "Speaker
state classification based on fusion of asymmetric simple partial least
squares (simpls) and support vector machines," Computer Speech &
Language, vol. 28, no. 2, pp. 392-419, 2014.
[10] Jae-woo Lee, Hyun-sun Lim, Dong-wook Kim, Soon-ae Shin, Jinkwon
Kim, Bora Yoo, Kyung-hee Cho (2017), "The development and
implementation of stroke risk prediction model in National Health
Insurance Service's personal health record", Computer Methods and
Programs in Biomedicine, Vol. 153, pp. 253-257.
[11] R. C. Eberhart and J. Kennedy, "A new optimizer using particle swarm
theory", Proc. Sixth International Symposium on Micro Machine and
Human Science, Nagoya, Japan, 1995, pp. 39-43.
[12] J. Kennedy and R. C. Eberhart, "Particle swarm optimization", Proc.
IEEE International Conference on Neural Networks (Perth, Australia),
IEEE Service Center, Piscataway, NJ, 1995, pp. IV: 1942-1948.
[13] Bilic, I., G. Dzamonja, I. Lusic, M. Matijaca, and K.
Caljkusic (2009). "Risk factors and outcome differences between
ischemic and hemorrhagic stroke.", Acta Clinica Croatica 48 (4):
399-403.
[14] K. Toyoda, Y. Okada, S. Kobayashi (2007), "Early recurrence of
ischemic stroke in Japanese patients: the Japan standard stroke
registry study.", Cerebrovasc, pp. 289-295.
[15] Y.Y. Lee, K.L. Lin, H.S. Wang, M.L. Chou, P.C. Hung, M.Y. Hsieh,
et al. (2008), "Risk factors and outcomes of childhood ischemic stroke
in Taiwan.", Brain, pp. 14-19.
