Feature Selection for Support Vector Machines With RBF Kernel
Abstract Linear-kernel Support Vector Machine Recursive Feature Elimination (SVM-RFE) is known as an excellent feature selection algorithm. A nonlinear SVM, however, is a black-box classifier for which we do not know the mapping function explicitly, so the weight vector w cannot be computed explicitly. In this paper, we propose a feature selection algorithm utilizing a Support Vector Machine with RBF kernel based on Recursive Feature Elimination (SVM-RBF-RFE), which expands the nonlinear RBF kernel into its Maclaurin series and then computes the weight vector w from the series according to the contribution made to the classification hyperplane by each feature. Using $w_i^2$ as the ranking criterion, SVM-RBF-RFE starts with all the features and eliminates the feature with the least squared weight at each step, until all the features are ranked. We use SVM and KNN classifiers to evaluate the nested subsets of features selected by SVM-RBF-RFE. Experimental results based on 3 UCI and 3 microarray datasets show that SVM-RBF-RFE generally performs better than information gain and SVM-RFE.
Q. Liu (B) · Z. Hu
College of Mechanical and Electric Engineering, Northwest A & F University,
Yangling, 712100 Shaanxi Province, China
e-mail: [email protected]
Z. Hu
e-mail: [email protected]
C. Chen
Electrical & Computer Engineering, University of Massachusetts Dartmouth,
Dartmouth, MA 02747-2300, USA
e-mail: [email protected]
Y. Zhang
College of Information Engineering, Northwest A & F University,
Yangling, 712100 Shaanxi Province, China
e-mail: [email protected]
1 Introduction
Feature selection is a key technique that can eliminate irrelevant features, reduce data dimensionality, increase learning efficiency, and improve predictive performance. It has been an active research area in the pattern recognition, machine learning and data mining communities (Elalami 2009; Guyon and Elisseeff 2003; Huang et al. 2007; Sun 2007). Gene selection from microarray data and feature selection for text categorization are two typical application domains. In the gene selection domain, gene expression data usually have thousands or even tens of thousands of gene features but far fewer tissue samples (usually a few hundred). In order to find gene subsets that are strongly correlated with a disease (cancer), improve classification accuracy, and provide helpful information for doctors in diagnosing illness, several gene selection algorithms have been developed in the related research literature, such as Bontempi (2007), Niijima and Kuhara (2006), Li and Yang (2005), Silva et al. (2005), Zhang et al. (2006) and so on. In the text categorization domain, documents contain hundreds of thousands of words, so it is critical for many text data mining problems to select subsets of features that are useful for building a good predictor (Guyon and Elisseeff 2003; Youn and Jeong 2009; Brank et al. 2002).
Feature selection algorithms can be broadly divided into two categories: filter methods and wrapper methods (Kohavi and John 1997). Filter methods select a subset of features as a preprocessing step, ignoring the effect of the selected feature subset on the performance of the learning algorithm (Claeskens et al. 2008). In the machine learning community, information gain has proved to be a successful filter algorithm (Lee and Lee 2006), as has the mutual information algorithm (Estevez et al. 2009). Wrappers use the classification method itself to score subsets of variables. SVM-RFE is a typical successful wrapper algorithm (Li and Yang 2005; Zhang et al. 2006); it was originally proposed to perform gene selection for cancer classification problems (Guyon et al. 2002), and was further improved and applied to select informative genes from bioinformatics data (Duan and Rajapakse 2004a,b; Duan et al. 2005; Ding and Wilkins 2006; Tang et al. 2007). Wrapper methods are computationally more complex but consider the correlation among features and perform better than filter methods (Sun 2007).
For the SVM-RFE algorithm discussed above, the weight vector w is the normal to the hyperplane that separates the two classes of examples with maximum margin, the margin being the distance from the hyperplane to the closest positive (negative) example. Let there be d training samples $\{X_i, y_i\}$, $i = 1, 2, \ldots, d$, where $y_i$ denotes the class label of sample $X_i$. The weight vector can be computed as $w = \sum_{i=1}^{d} \alpha_i y_i \Phi(X_i)$, where $\alpha_i$ ($1 \le i \le d$) are the coefficients learned by the SVM and $\Phi$ is the mapping function. If a nonlinear kernel is used in the SVM, we do not know the mapping function explicitly, and so the weight vector cannot be computed explicitly.
To the best of our knowledge, feature selection for Support Vector Machines in the research community usually adopts a linear kernel. In this paper, we propose a wrapper algorithm, SVM-RBF-RFE, which expands the nonlinear RBF kernel into its Maclaurin series and computes the weight vector from the series according to the contribution made to the classification by each feature. Then, using $w_i^2$ as the ranking criterion, the algorithm starts with all the features and eliminates the feature with the least squared weight in the weight vector at each step, repeating this process until all features are ranked.
In order to validate the effectiveness of our algorithm, we compare SVM-RBF-RFE with the SVM-RFE and information gain algorithms on three UCI datasets with relatively many features and three microarray datasets, and use SVM with a linear kernel (SVML), SVM with an RBF kernel (SVM), a nearest-neighbor classifier with five neighbors (5NN), and a nearest-neighbor classifier with 10 neighbors (10NN) to evaluate the feature subsets selected by the three algorithms. Experimental results show that the proposed algorithm is very encouraging on the experimental datasets.
This paper is organized as follows. Section 2 introduces the state-of-the-art feature selection algorithm SVM-RFE. Our algorithm SVM-RBF-RFE is proposed in Sect. 3. Section 4 provides our experimental results on six datasets, and Sect. 5 concludes the paper.
2 SVM-RFE algorithm
SVM is a classification algorithm based on statistical learning theory (Vapnik 1998; Burges 1998). The SVM constructs a hyperplane with maximum margin in the feature space, which is mapped from the original input space by the mapping function $\Phi$. Let us use $X_i$ and $Z_i$ to denote a pair of corresponding vectors in the original input space and in the feature space, respectively; then $Z_i = \Phi(X_i)$.

A dataset with d samples can be represented as $\{X_i, y_i\}$, $i = 1, 2, \ldots, d$, with $X_i \in \{0, 1\}^m$ representing a sample and $y_i \in \{+1, -1\}$ representing the class label of this sample. For a testing sample X, the optimal hyperplane constructed by SVM in the feature space is:

$$\langle w, \Phi(X) \rangle + b = 0 \qquad (1)$$
The optimization problem must satisfy the following constraints (Cristianini and Taylor 2000):

$$y_i\,[\langle w, \Phi(X_i) \rangle + b] + \xi_i - 1 \ge 0, \quad \xi_i \ge 0, \quad i = 1, 2, \ldots, d \qquad (2)$$

$$\min_{w,\,\xi_i} \; \|w\|^2/2 + C \sum_{i=1}^{d} \xi_i \qquad (3)$$

It can be proved that the hyperplane which satisfies the above constraints is an optimal hyperplane. Here, C is a specified constant that controls the trade-off between maximizing the margin and minimizing the training error term.
The optimization problem is usually translated into its dual form by the Lagrangian; for the detailed process, please refer to Duan and Rajapakse (2004b). From the Lagrangian we can obtain the weight vector and the hyperplane function for this problem:

$$w = \sum_{i=1}^{d} \alpha_i y_i Z_i \qquad (4)$$

$$f(Z) = b + \sum_{i=1}^{d} \alpha_i y_i \langle Z_i, Z \rangle \qquad (5)$$
Here, $\langle \cdot,\cdot \rangle$ denotes the inner product of two vectors. In SVM, the kernel function $K(X_i, X_j)$ computes the inner product of two vectors in the feature space: $K(X_i, X_j) = \langle \Phi(X_i), \Phi(X_j) \rangle = \langle Z_i, Z_j \rangle$. If a nonlinear kernel, such as the RBF or sigmoid kernel, is applied to the SVM, the weight vector w cannot be computed directly according to Eq. (4) because we do not know the mapping function $\Phi$. A linear kernel is often used in the research community: $K(X_i, X_j) = \langle X_i, X_j \rangle$. For a linear kernel, the weight vector w can be represented as:

$$w = \sum_{i=1}^{d} \alpha_i y_i X_i \qquad (6)$$
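As a quick illustration of Eqs. (4)–(6), the sketch below (ours, not from the paper) trains a linear-kernel SVM with scikit-learn on synthetic data, recovers the weight vector from the dual coefficients $\alpha_i y_i$ and the support vectors, and checks it against the primal weights and the decision function.

```python
# Minimal sketch (not from the paper): recovering the weight vector of Eq. (6)
# from the dual coefficients of a linear-kernel SVM, using scikit-learn.
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=10, random_state=0)

clf = SVC(kernel='linear', C=1.0).fit(X, y)

# dual_coef_ holds alpha_i * y_i for the support vectors only.
alpha_y = clf.dual_coef_.ravel()              # shape (n_support_vectors,)
w = alpha_y @ clf.support_vectors_            # Eq. (6): w = sum_i alpha_i y_i X_i

print(np.allclose(w, clf.coef_.ravel()))      # matches the primal weights
# Eq. (5): f(X) = b + sum_i alpha_i y_i <X_i, X> equals decision_function.
f = X @ w + clf.intercept_
print(np.allclose(f, clf.decision_function(X)))
```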
If a linear kernel is used, the SVM-RFE algorithm starts with all the features and eliminates the feature with the least squared weight at each step, until all the features are ranked. In each iteration, $w_i^2$ is used as the feature ranking criterion.
In the SVM-RFE algorithm, the objective function is $J = \|w\|^2/2$, as employed in the OBD algorithm (LeCun et al. 1990), which approximates the change of J caused by removing the ith gene by expanding J in a Taylor series to second order:

$$\Delta J(i) = \frac{\partial J}{\partial w_i}\,\Delta w_i + \frac{\partial^2 J}{\partial w_i^2}\,(\Delta w_i)^2$$

In each iteration, eliminating the feature with the least squared weight causes the least change in J (Guyon et al. 2002; Duan and Rajapakse 2004a; Tang et al. 2007). Therefore, $w_i^2$ is adopted as the ranking criterion, and, in order to improve the efficiency of the algorithm, more than one feature can be eliminated at each step.
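The following is a minimal sketch of the linear SVM-RFE loop just described, using scikit-learn on synthetic data; the function name and data are ours, not the authors'. One feature is removed per iteration and $w_i^2$ is the ranking criterion.

```python
# Minimal linear SVM-RFE sketch: retrain, drop the feature with the smallest
# squared weight, repeat until all features are ranked (one per iteration).
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

def svm_rfe(X, y, C=1.0):
    remaining = list(range(X.shape[1]))
    ranking = []                               # filled from least to most important
    while remaining:
        clf = SVC(kernel='linear', C=C).fit(X[:, remaining], y)
        w = clf.coef_.ravel()
        worst = int(np.argmin(w ** 2))         # ranking criterion w_i^2
        ranking.append(remaining.pop(worst))
    return ranking[::-1]                       # most important feature first

X, y = make_classification(n_samples=100, n_features=12, n_informative=4, random_state=0)
print(svm_rfe(X, y))
```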
Like many machine learning algorithms, our method works on sample data with discrete attributes (features). Numerical attributes can be discretized into discrete attributes (Fayyad and Irani 1993). For a discrete attribute A with |A| possible values, we can use |A| Boolean literals to represent the attribute, each Boolean literal representing the occurrence or non-occurrence of the corresponding attribute value, as illustrated below.
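A small illustration of this Boolean-literal representation (the attribute values are invented for the example; scikit-learn's one-hot encoder is used purely for convenience):

```python
# Illustration of the Boolean-literal encoding: an attribute with |A| values
# becomes |A| 0/1 columns, one per possible value (values here are invented).
import numpy as np
from sklearn.preprocessing import OneHotEncoder

A = np.array([['sunny'], ['rain'], ['overcast'], ['rain']])   # one discrete attribute
enc = OneHotEncoder()
B = enc.fit_transform(A).toarray()        # each column = occurrence of one value
print(enc.categories_[0])                 # ['overcast' 'rain' 'sunny']
print(B.astype(int))
# [[0 0 1]
#  [0 1 0]
#  [1 0 0]
#  [0 1 0]]
```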
The three UCI datasets in our experiments are preprocessed according to the above discretization method. The attributes of the other three bioinformatics datasets are already discrete in the original samples: the expression value of each gene is represented as P, M, or A, indicating whether RNA for the gene is present, marginal, or absent, respectively, based upon the matched and mismatched probes for the gene.
Among the many kernel functions, the RBF kernel is the default and recommended kernel function for the SVM classifier. Suppose $U \in R^n$, $V \in R^n$, and $g \in R^+$, where g is a hyper-parameter of the RBF kernel. The RBF kernel is defined as:

$$K(U, V) = \exp(-g\,\|U - V\|^2) \qquad (7)$$

After preprocessing in the above discrete way, the input space of the sample data becomes $\{0, 1\}^n$. Suppose $U \in \{0, 1\}^n$, $V \in \{0, 1\}^n$, and $g \in R^+$; we have:

$$\|U - V\|^2 = \sum_{i=1}^{n}(U_i - V_i)^2 = \sum_{i=1}^{n}\bigl[U_i(1 - V_i) + V_i(1 - U_i)\bigr] \qquad (8)$$
The following mapping is used to map the vector U from a vector of length n to a vector of length 2n:

$$\psi(U) = (\bar U_1, \bar U_2, \bar U_3, \ldots, \bar U_n, U_1, U_2, U_3, \ldots, U_n) \qquad (9)$$

Here, $\bar U_i = 1 - U_i$, $1 \le i \le n$. In the rest of the paper, we will use $\psi(U)_i$ to represent the ith element of the vector $\psi(U)$. Similarly, we use the following mapping to map the vector V:

$$\tau(V) = (V_1, V_2, V_3, \ldots, V_n, \bar V_1, \bar V_2, \bar V_3, \ldots, \bar V_n) \qquad (10)$$

In the rest of the paper, we use $\tau(V)_i$ to represent the ith element of the vector $\tau(V)$. Hence, we get:

$$\|U - V\|^2 = \langle \psi(U), \tau(V) \rangle \qquad (11)$$

Here, $\langle \cdot,\cdot \rangle$ represents the inner product between the two vectors. Following the above mappings, the RBF kernel can be represented as:

$$K(U, V) = \exp(-g\,\langle \psi(U), \tau(V) \rangle) \qquad (12)$$
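The following quick numerical check (ours, assuming the reconstructed mappings of Eqs. (9) and (10)) verifies Eqs. (8), (11) and (12) on random binary vectors:

```python
# Numerical check of Eqs. (8), (11), (12): for binary vectors, the squared
# distance equals the inner product <psi(U), tau(V)>, so the RBF kernel can be
# written as exp(-g * <psi(U), tau(V)>).
import numpy as np

rng = np.random.default_rng(0)
n, g = 8, 0.1
U = rng.integers(0, 2, n)
V = rng.integers(0, 2, n)

psi = np.concatenate([1 - U, U])          # Eq. (9):  psi(U) = (U_bar, U)
tau = np.concatenate([V, 1 - V])          # Eq. (10): tau(V) = (V, V_bar)

lhs = np.sum((U - V) ** 2)                # ||U - V||^2
rhs = psi @ tau                           # <psi(U), tau(V)>
print(lhs == rhs)                         # Eq. (11)
print(np.isclose(np.exp(-g * lhs), np.exp(-g * rhs)))   # Eq. (12)
```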
For a testing sample X, the SVM predicts its class by the decision function

$$\mathrm{sgn}\Bigl(b + \sum_{i=1}^{d} \alpha_i y_i K(X_i, X)\Bigr) \qquad (13)$$

Here, $\alpha_i$ ($1 \le i \le d$) and b are the knowledge learned by the SVM, and sgn() represents the sign function. $K(X_i, X)$ represents a kernel function, which is an RBF kernel here. Let us consider the following function:

$$f(X) = b + \sum_{i=1}^{d} \alpha_i y_i K(X_i, X) \qquad (14)$$

Suppose $\lambda_i = \alpha_i y_i K(X_i, X) = \alpha_i y_i \exp(-g \langle \psi(X_i), \tau(X) \rangle)$; here, $K(X_i, X)$ is replaced with Eq. (12) and expanded into its Maclaurin series (Liu et al. 2007):

$$\begin{aligned}
\lambda_i &= \alpha_i y_i \Bigl[1 + \sum_{k=1}^{\infty} \frac{(-1)^k g^k}{k!}\, \langle \psi(X_i), \tau(X) \rangle^k\Bigr] \\
&= \alpha_i y_i \Bigl[1 - \frac{g^1}{1!}\langle \psi(X_i), \tau(X) \rangle + \frac{g^2}{2!}\langle \psi(X_i), \tau(X) \rangle^2 - \frac{g^3}{3!}\langle \psi(X_i), \tau(X) \rangle^3 + \cdots\Bigr]
\end{aligned} \qquad (15)$$
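As a sanity check of the expansion in Eq. (15), the snippet below (ours) compares $\exp(-g s)$ with its truncated Maclaurin series for the small nonnegative integer values $s = \langle \psi(X_i), \tau(X) \rangle$ that occur for binary data:

```python
# Check the Maclaurin expansion used in Eq. (15): exp(-g*s) is approximated by
# 1 + sum_k (-1)^k g^k s^k / k! for the nonnegative integer s = <psi(X_i), tau(X)>.
import math

def truncated_exp(s, g, K=20):
    return 1.0 + sum(((-1) ** k) * (g ** k) * (s ** k) / math.factorial(k)
                     for k in range(1, K + 1))

g = 0.1
for s in [0, 1, 3, 7]:                    # possible inner-product values (integers)
    print(s, math.exp(-g * s), truncated_exp(s, g))
```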
For the term $\langle \psi(X_i), \tau(X) \rangle^m$, $m \in N$, in Eq. (15), we use h(m, p) to represent the sum of the coefficients of terms with length p (p < n) in the expansion of $\langle \psi(X_i), \tau(X) \rangle^m$; h(m, p) can be computed by the following recursion:

$$h(m, 1) = 1$$
$$h(m, p) = p^m - \sum_{k=1}^{p-1} \binom{p}{k}\, h(m, k), \quad p \le m \qquad (16)$$
$$h(m, p) = 0, \quad p > m$$

Equation 16 can be proved by mathematical induction; the detailed proof is omitted here for lack of space.
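A short sketch (ours) of the recursion in Eq. (16), verified against a brute-force count; under the stated definition, h(m, p) counts the ways the m factors of the expanded inner-product power can be distributed onto exactly p distinct coordinates:

```python
# Eq. (16): h(m, p) computed recursively; it equals the number of length-m
# assignments that use exactly p distinct slots (surjections), checked by brute force.
import math
from functools import lru_cache
from itertools import product

@lru_cache(maxsize=None)
def h(m, p):
    if p > m:
        return 0
    if p == 1:
        return 1
    return p ** m - sum(math.comb(p, k) * h(m, k) for k in range(1, p))

def h_bruteforce(m, p):
    # count length-m tuples over p symbols that use every symbol at least once
    return sum(1 for t in product(range(p), repeat=m) if len(set(t)) == p)

for m in range(1, 7):
    for p in range(1, m + 1):
        assert h(m, p) == h_bruteforce(m, p)
print("recursion matches brute force for m <= 6")
```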
Then, for $\lambda_i$, the contribution of dimension $\tau(X)_{j_1}$ ($1 \le j_1 \le 2n$) to the classification can be represented as:

$$\frac{\partial \lambda_i}{\partial(\tau(X)_{j_1})} = \alpha_i y_i\, \psi(X_i)_{j_1} \sum_{k=1}^{\infty} \frac{(-1)^k g^k}{k!}\, h(k, 1) \qquad (17)$$

Here, we borrow terminology from the research on mining association rules. If we view the dimension $\tau(X)_{j_1}$ as an item, then Eq. (17) can be viewed as the classification weight of the 1-itemset $\{\tau(X)_{j_1}\}$ in $\lambda_i$. Therefore, the classification weight of the 1-itemset $\{\tau(X)_{j_1}\}$ for the classification decision hyperplane can be represented as:

$$\frac{\partial f(X)}{\partial(\tau(X)_{j_1})} = \sum_{i=1}^{d} \alpha_i y_i\, \psi(X_i)_{j_1} \sum_{k=1}^{\infty} \frac{(-1)^k g^k}{k!}\, h(k, 1) \qquad (18)$$
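The sketch below (ours, not the authors' implementation) computes the 1-itemset weights of Eq. (18) from an RBF-kernel SVM trained with scikit-learn on binary toy data: `dual_coef_` supplies $\alpha_i y_i$ for the support vectors, $\psi(X_i)$ is built as in Eq. (9), and the series over k is truncated at K terms.

```python
# Sketch (our illustration): 1-itemset weights of Eq. (18) for each of the 2n
# dimensions of tau(X), computed from an RBF SVM trained on binary data.
import math
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(60, 10)).astype(float)      # binary samples
y = np.where(X[:, 0] + X[:, 3] > 1, 1, -1)                # toy labels

g = 0.1
clf = SVC(kernel='rbf', gamma=g, C=1.0).fit(X, y)

alpha_y = clf.dual_coef_.ravel()                          # alpha_i * y_i
sv = clf.support_vectors_
psi_sv = np.hstack([1.0 - sv, sv])                        # psi(X_i), Eq. (9)

K = 20                                                    # truncation of the series over k
series = sum(((-1) ** k) * (g ** k) / math.factorial(k) for k in range(1, K + 1))
weights = (alpha_y @ psi_sv) * series                     # Eq. (18), using h(k, 1) = 1

# squared weights of the 2n dimensions, usable as the ranking criterion w_j^2
print(np.round(weights ** 2, 4))
```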
Algorithm 1 shows the details of our algorithm for feature selection by recursive feature elimination, which starts with all the features and removes the feature with the least squared weight at each step, until all the features are ranked. In order to improve the efficiency of the algorithm, it can remove a fraction percentToEliminate of the remaining attributes at each step until the number of remaining attributes falls to percentThreshold, and then switch to eliminating numToEliminate features at each step. Here, percentToEliminate, percentThreshold and numToEliminate are user-defined parameters; the detailed computation of the number of features eliminated at each step is given in step 7 of Algorithm 1. In our experiments, for the 3 microarray datasets, we set percentToEliminate = 10%, percentThreshold = 80 and numToEliminate = 1. That is, 10% of the remaining attributes are eliminated per iteration until the number of remaining features equals 80, after which the algorithm switches to removing one feature per iteration. For the 3 UCI datasets, the algorithm starts with all the features and eliminates one feature per iteration.

In step 3 of Algorithm 1, the features with the least squared weight are removed from the training samples. In step 5, an SVM with RBF kernel is trained on all the training samples, yielding the learned coefficients $\alpha_i$. The weight vector is computed in step 6.
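Since Algorithm 1 itself is not reproduced above, the following is our sketch of the elimination schedule it describes, reusing the Eq. (18) weights; how the two literal weights of each original feature are aggregated into one feature weight (here, the sum of their squares) is our assumption, not taken from the paper.

```python
# Our sketch of the SVM-RBF-RFE elimination schedule (not the authors' code).
import math
import numpy as np
from sklearn.svm import SVC

def feature_weights(X, y, g=0.1, C=1.0, K=20):
    """Squared Eq. (18) weights per original (binary) feature."""
    clf = SVC(kernel='rbf', gamma=g, C=C).fit(X, y)
    sv = clf.support_vectors_
    psi_sv = np.hstack([1.0 - sv, sv])                     # psi(X_i), Eq. (9)
    series = sum(((-1) ** k) * (g ** k) / math.factorial(k) for k in range(1, K + 1))
    w = (clf.dual_coef_.ravel() @ psi_sv) * series         # 2n literal weights, Eq. (18)
    n = X.shape[1]
    # Our aggregation choice: combine the two literals of a feature by summing squares.
    return w[:n] ** 2 + w[n:] ** 2

def svm_rbf_rfe(X, y, percentToEliminate=0.10, percentThreshold=80, numToEliminate=1):
    remaining = list(range(X.shape[1]))
    ranking = []                                           # least important first
    while remaining:
        w2 = feature_weights(X[:, remaining], y)
        if len(remaining) > percentThreshold:
            k = max(int(len(remaining) * percentToEliminate), 1)
        else:
            k = min(numToEliminate, len(remaining))
        drop = np.argsort(w2)[:k]                          # k smallest squared weights
        ranking.extend(remaining[i] for i in drop)
        remaining = [f for i, f in enumerate(remaining) if i not in set(drop)]
    return ranking[::-1]                                   # most important feature first
```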
4 Experimental results
In order to validate the effectiveness of our algorithm, we perform experiments on two test-beds: the first is composed of three UCI datasets with relatively many features, and the second contains three microarray datasets. We compare our algorithm with the information gain and SVM-RFE algorithms. The SVML, SVM, 5NN and 10NN classifiers are used to evaluate the feature subsets selected by our algorithm, information gain and SVM-RFE, respectively. The four classifiers are from the WEKA software, which is publicly available at http://www.cs.waikato.ac.nz/ml/weka/.
For datasets without a natural training and testing partition, 10-fold cross-validation is used and the average result over the 10 folds is reported as the final result.
Our experiments are performed on a PC with a Pentium 4 3.0 GHz CPU and 512 MB of memory. The algorithm was implemented in Java. In our experiments, we set the hyper-parameter of the RBF kernel to g = 0.1 and the regularization parameter of the SVM to C = 1.
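For illustration only (the paper's implementation used Java and WEKA classifiers), the evaluation protocol for a candidate feature subset can be sketched with scikit-learn as follows; the dataset and subset below are placeholders:

```python
# Sketch of the evaluation protocol: 10-fold cross-validation of an RBF SVM
# (g = 0.1, C = 1) on a candidate feature subset; data and subset are placeholders.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=40, random_state=0)
subset = [0, 3, 5, 8, 13]                                  # a hypothetical selected subset
scores = cross_val_score(SVC(kernel='rbf', gamma=0.1, C=1.0), X[:, subset], y, cv=10)
print(1.0 - scores.mean())                                 # average error rate over the 10 folds
```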
4.1 Datasets
The experimental data information is summarized in Table 1. The Sonar, Hypo and Horse datasets are available from the public UCI Machine Learning Repository (http://www.ics.uci.edu/mlearn/MLRepository.html). Column 4 gives the number of features after the 3 UCI datasets are discretized.
Table 1 Dataset summary of the three UCI datasets and three microarray datasets

The first microarray dataset is Leukemia (Golub et al. 1999). The training set consists of 38 samples (27 acute lymphoblastic leukemia, ALL, and 11 acute myeloid leukemia, AML) from bone marrow specimens, and the testing set contains 34 samples (20 ALL and 14 AML), which were prepared under different experimental conditions and include 24 bone marrow and 10 blood sample specimens; all samples contain 7,129 genes.
The second microarray dataset is the diffuse large B-cell lymphoma (DLBCL) dataset, which contains 77 samples (Shipp et al. 2002): 58 samples are diffuse large B-cell lymphomas and 19 are follicular lymphomas. All samples contain 7,129 genes.
The third microarray dataset is Prostate cancer (Singh et al. 2002). Its 102 samples comprise 52 tumor and 50 normal prostate specimens, and all samples contain 12,600 genes.
Leukemia, DLBCL and Prostate microarray data are publicly available at the Website
(http://www.broadinstitute.org/cgi-bin/cancer/datasets.cgi).
Figures 1, 2 and 3 show the experimental results on the Horse, Sonar and Hypo datasets, respectively. The x-axis denotes the size of the feature subset selected by the information gain, SVM-RFE and SVM-RBF-RFE algorithms, and its upper boundary is the number of features in the dataset. Each curve denotes the testing error rate on the feature subsets selected by information gain, SVM-RFE and SVM-RBF-RFE, respectively. Panels (a), (b), (c) and (d) of each figure show the classification error rates of the SVML, SVM, 5NN and 10NN classifiers on the different feature subsets.
For the Horse dataset with 78 features, in Fig. 1a the classification error rate of SVML on the original samples without feature selection is 0.1930. The lowest classification error rates of SVML on the feature subsets selected by the information gain, SVM-RFE and SVM-RBF-RFE algorithms are 0.1493, 0.1577 and 0.1548, respectively, and the sizes of the corresponding feature subsets are 53, 38 and 41. In Fig. 1b, the classification error rate of SVM on the original samples without feature selection is 0.1632. The lowest classification error rates of SVM on the feature subsets selected by information gain, SVM-RFE and SVM-RBF-RFE are 0.1415, 0.1332 and 0.1386, respectively, and the sizes of the corresponding feature subsets are 45, 26 and 16. From Fig. 1a and b, for the SVML and SVM classifiers, we can see that the classification error rate is lower on most feature subsets selected by SVM-RBF-RFE than on feature subsets of the same size selected by the information gain and SVM-RFE algorithms. In Fig. 1c, the classification error rate of 5NN on the original samples without feature selection is 0.1957; 5NN achieves its best performance on a feature subset selected by SVM-RBF-RFE, with the lowest classification error rate of 0.1251 when the size of the selected subset is 15. In Fig. 1d, the classification error rate of 10NN on the original samples without feature selection is 0.1874; when the size of the feature subset selected by SVM-RBF-RFE is 13, the lowest classification error rate of 10NN is 0.1386. From Fig. 1c and d, it is clear that SVM-RBF-RFE selects better feature subsets than the information gain and SVM-RFE feature selection algorithms.
Fig. 1 Comparison of the effectiveness of the information gain, SVM-RFE and SVM-RBF-RFE algorithms on the Horse dataset. a–d show the feature subsets selected by the three algorithms evaluated with the SVML, SVM, 5NN and 10NN classifiers, respectively
For the Sonar dataset with 42 features, in Fig. 2, when no feature selection is performed on the original samples, the classification error rates of SVML, SVM, 5NN and 10NN are 0.1481, 0.1383, 0.1769 and 0.1626, respectively. From Fig. 2, we can conclude that SVML, SVM, 5NN and 10NN achieve their lowest classification error rates on feature subsets selected by the SVM-RBF-RFE algorithm; the lowest classification error rates are 0.1290, 0.1240, 0.1531 and 0.1383, and the corresponding feature-subset sizes are 30, 34, 27 and 35, respectively. Moreover, it is clear that the classification error rate is lower on most feature subsets selected by SVM-RBF-RFE than on feature subsets of the same size selected by the information gain and SVM-RFE algorithms.
For the Hypo dataset with 57 features, from Fig. 3 we conclude that the information gain, SVM-RFE and SVM-RBF-RFE algorithms cannot improve the performance of the SVML and SVM classifiers. In Fig. 3c, the classification error rate of 5NN on the original samples without feature selection is 0.0130; 5NN achieves its best performance on feature subsets selected by SVM-RBF-RFE, with the lowest classification error rate of 0.0073 on feature subsets of sizes 4 to 20. In Fig. 3d, the classification error rate of 10NN on the original samples without feature selection is 0.0133; 10NN achieves its best performance on feature subsets selected by SVM-RBF-RFE, with the lowest classification error rate of 0.0073 on feature subsets of sizes 4 to 5. Moreover, for 5NN and 10NN, it is clear that the classification error rate is lower on most feature subsets selected by SVM-RBF-RFE than on feature subsets of the same size selected by the information gain and SVM-RFE algorithms.
For the microarray data, we need a threshold s, defined by biologists, to decide on the size of the selected gene subset. The gene subset with the best performance is selected from the s gene subsets with sizes from 1 to s. Generally speaking, it is difficult for biologists to decide on a precise threshold s. If a large number of genes is selected, it is very expensive and impractical for biologists to investigate the informative genes that are strongly related to cancer; if too few genes are selected, it is difficult to construct a gene regulatory network that uncovers cancer regulatory mechanisms. In this paper, for the Leukemia and DLBCL microarray data with 7,129 genes in all samples, we set s = 80; for the Prostate microarray data with 12,600 genes, we set s = 100.
Figure 4 shows the experimental results on selected gene subsets with sizes from 1 to 80 on Leukemia. Figure 4a and b show the classification results of the SVML and SVM classifiers, respectively. From Fig. 4a and b, we can conclude that the testing error rate reaches 0 on gene subsets selected by the SVM-RFE and SVM-RBF-RFE algorithms, and the lowest error rate remains stable over a range of subset sizes. For the information gain algorithm, however, the classification error rate reaches 0 only on the gene subset of size 22 in Fig. 4b. Figure 4c and d show the classification results of the 5NN and 10NN classifiers, respectively: the two classifiers perform better on some gene subsets selected by SVM-RBF-RFE than on gene subsets of the same size selected by information gain or SVM-RFE, and worse on others. Moreover, 5NN and 10NN can achieve their highest performance on gene subsets selected by all three algorithms.
Figure 5 shows the experimental results on gene subsets with sizes from 1 to 80 on the DLBCL dataset. Figure 5a, b, c and d show the classification error rates of the SVML, SVM, 5NN and 10NN classifiers, respectively. From Fig. 5a, c and d, we can see that the classification error rate is lower on almost all gene subsets selected by SVM-RBF-RFE than on gene subsets of the same size selected by the information gain and SVM-RFE algorithms. In Fig. 5b, when the size of the gene subset is less than 36, the curve for information gain is lower than the curve for SVM-RBF-RFE; but when the size is larger than 36, the classification error rate is lower on most gene subsets selected by SVM-RBF-RFE than on gene subsets of the same size selected by information gain and SVM-RFE. From Fig. 5, we can conclude that all four classifiers achieve their lowest classification error rates on gene subsets selected by the SVM-RBF-RFE algorithm; the lowest classification error rates are 0.0518, 0.0768, 0.0625 and 0.0768, and the corresponding gene-subset sizes are 48, 44, 14 and 14, respectively.
Figure 6 gives the experimental results on gene subsets with sizes from 1 to 100 on the Prostate data. Figure 6a and b show the classification error rates of SVML and SVM, respectively. From Fig. 6a, we can see that SVML performs worse on gene subsets selected by SVM-RBF-RFE than on gene subsets of the same size selected by information gain and SVM-RFE. However, in Fig. 6b, the classification error rate is lower on almost all gene subsets selected by SVM-RBF-RFE than on gene subsets of the same size selected by information gain and SVM-RFE. The reason may be that the selected genes overfit the classifier used by the gene selection algorithm (Niijima and Kuhara 2006; Deng et al. 2004). From Fig. 6c and d, which show the classification results of the 5NN and 10NN classifiers, respectively, we can conclude that the classification performance on almost all gene subsets selected by SVM-RBF-RFE is better than that obtained with information gain and SVM-RFE. Moreover, 5NN and 10NN achieve their lowest classification error rates on gene subsets selected by the SVM-RBF-RFE algorithm; the lowest classification error rates are 0.1482 and 0.1382, and the corresponding gene-subset sizes are 42 and 32, respectively.
In this section, taking the Leukemia dataset as an example, we show the most important genes selected by the SVM-RBF-RFE algorithm. Table 2 lists the rank, gene accession number (GAN) and description of the top 54 selected genes. Among the top 54 genes, 13 have been shown by Golub et al. (1999) to be most highly correlated with the ALL-AML class distinction: U05259, Y12670, M31211, M23197, M92287, L47738, M27891, Y08612, X95735, M84526, M63138, M80254 and M81695. X70297 and D49950 were found to have strong prediction power and were annotated by Onto-Express with significant biological processes and significant molecular functions; moreover, X70297 was annotated by Onto-Express with significant cellular components (Tang et al. 2007). D88422, M89957 and X03934 were identified as having strong discrimination between ALL and AML (Ando and Iba 2004; Albrecht 2006). Many other genes among the 54 have been reported as strong predictors in the related literature, such as U46499 (Ding and Peng 2005), L05148 and M11722 (Ho et al. 2006), M77142, M31166, D14874 and M54995 (Draminski et al. 2008), M30703 (Tong et al. 2009), X16665 (Schoch et al. 2002), L41870 (Zhang et al. 2009), X59871 (Wang et al. 2006), J04615 (Tong et al. 2009) and Y00339 (Tong et al. 2009).
5 Conclusions
For nonlinear SVM, feature ranking criterion is unknown and the weight vector cannot
be computed explicitly. In this paper, we proposed a wrapper feature selection algorithm
SVM-RBF-RFE, which expands nonlinear RBF kernel into its Maclaurin series in order to
compute the weight of each feature. For 10NN classifier with a weak classification perfor-
mance, SVM-RBF-RFE can improve the classification performance of 10NN used in our
experiment, the error rate is from 0.1874 to 0.1386 on Horse dataset, and the error rate is
from 0.1626 to 0.1383 on Sonar dataset, and the error rate is from 0.0133 to 0.0073 on
Hypo dataset. In three microarray data, 10NN classifier also gives better performance on
gene subset selected by SVM-RBF-RFE. SVM-RBF-RFE is compared with information
gain which is a filter feature selection algorithm, and SVM-RFE which is wrapper feature
selection algorithm. The results show SVM-RBF-RFE algorithm is very competitive over
experimental datasets. Moreover, from Table 2, we can see that SVM-RBF-RFE can identify
most significant genes which have been reported in the related research community. So we
conclude that SVM-RBF-RFE algorithm is very effective on feature selection for SVM with
RBF kernel.
Table 2 The most important genes selected on the leukemia data set
Table 2 continued
Rank GAN Description of gene
21 D14811 KIAA0110 gene
22 U41813 HOXA9 Homeo box A9
23 L00058 MYC V-myc avian myelocytomatosis viral oncogene homolog
24 D86983 KIAA0230 gene, partial cds
25 J03589 UBIQUITIN-LIKE PROTEIN GDX
26 S71043 Ig alpha 2=immunoglobulin A heavy chain allotype 2
27 M31166 PTX3 Pentaxin-related gene, rapidly induced by IL-1 beta
28 M83233 TCF12 Transcription factor 12
29 M95678 PLCB2 Phospholipase C, beta 2
30 Y08612 RABAPTIN-5 protein
31 U09578 MAPKAP kinase (3pK) mRNA
32 L41870 RB1 Retinoblastoma 1 (including osteosarcoma)
33 X95735 Zyxin
34 M19508 MPO from Human myeloperoxidase gene
35 D14874 ADM Adrenomedullin
36 M54995 PPBP Connective tissue activation peptide III
37 X03934 GB DEF = T-cell antigen receptor gene T3-delta
38 X59871 TCF7 Transcription factor 7 (T-cell specific)
39 U63825 Hepatitis delta antigen interacting protein A (dipA) mRNA
40 L07758 IEF SSP 9502 mRNA
41 U79285 GLYCYLPEPTIDE N-TETRADECANOYLTRANSFERASE
42 U04840 Onconeural ventral antigen-1 (Nova-1) mRNA
43 M84526 DF D component of complement (adipsin)
44 M63138 CTSD Cathepsin D (lysosomal aspartyl protease)
45 D13633 KIAA0008 gene
46 J04615 SNRPN Small nuclear ribonucleoprotein polypeptide N
47 D43948 KIAA0097 gene
48 M80254 PEPTIDYL-PROLYL CIS-TRANS ISOMERASE
49 D87469 KIAA0279 gene, partial cds
50 M89957 IGB Immunoglobulin-associated beta (B29)
51 L05148 Protein tyrosine kinase related mRNA sequence
52 M81695 ITGAX Integrin, alpha X
53 X82240 TCL1 gene (T cell leukemia)
54 Y00339 CA2 Carbonic anhydrase II
Acknowledgments This research is supported by the National Natural Science Foundation of China
(60873196) and Chinese Universities Scientific Fund (QN2009092).
References
Albrecht A (2006) Stochastic local search for the feature set problem, with applications to microarray data.
Appl Math Comput 183(2):1148–1164
Ando S, Iba H (2004) Classification of gene expression profile using combinatory method of evolutionary
computation and machine learning. Genet Program Evol Mach 5:1573–7632
Bontempi G (2007) A blocking strategy to improve gene selection for classification of gene expression data.
IEEE/ACM Trans Comput Biology Bioinform 4:293–300
Brank J, Grobelnik M, Milic-Frayling N, Mladenic D (2002) Feature selection using linear support vector
machines. Technical Report, MSR-TR-2002-63, Microsoft Research, Microsoft Corporation
Burges C (1998) A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discovery
2:121–167
Claeskens G, Croux C, Kerckhoven J (2008) An information criterion for variable selection in support vector
machines. J Mach Learn Res 9:541–558
Cristianini N, Taylor J (2000) An introduction to support vector machines. Cambridge University Press,
Cambridge
Deng L, Pei J, Ma J, Lee D (2004) A rank sum test method for informative gene discovery. In: Proceedings of
the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle,
pp 410–419
Ding C, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bio-
inform Comput Biology 3(2):185–205
Ding Y, Wilkins D (2006) Improving the performance of SVM-RFE to select genes in microarray data. BMC Bioinform 7(Suppl 2):S12. doi:10.1186/1471-2105-7-S2-S12
Draminski M, Rada-Iglesias A, Enroth S, Wadelius C, Koronacki J, Komorowski J (2008) Monte Carlo feature
selection for supervised classification. Bioinformatics 24(1):110–117
Duan K, Rajapakse J (2004a) SVM-RFE peak selection for cancer classification with mass spectrometry data.
In: Proceedings of the 3rd Asia-pacific bioinformatics conference, pp 191–200
Duan K, Rajapakse J (2004b) A variant of SVM-RFE for gene selection in cancer classification with expression
data. In: Proceedings of IEEE symposium computational intelligence in bioinformatics and computa-
tional biology, pp 49–55
Duan K, Rajapakse J, Wang H, Azuaje F (2005) Multiple SVM-RFE for gene selection in cancer classification
with expression data. IEEE Trans Nanobiosci 4(3):228–234
Elalami M (2009) A filter model for feature subset selection based on genetic algorithm. Knowledge-Based
Syst 22:356–362
Estevez P, Tesmer M, Perez C, Zurada J (2009) Normalized mutual information feature selection. IEEE Trans
Neural Netw 20:189–201
Fayyad U, Irani K (1993) Multi-interval discretization of continuous-valued attributes for classification learn-
ing. In: Proceedings of the 13th international joint conference on artificial intelligence, pp 1022–1027
Golub T et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expres-
sion monitoring. Science 286:531–537
Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46:389–422
Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
Ho S, Hsieh C, Chen H, Huang H (2006) Interpretable gene expression classifier with an accurate and compact
fuzzy rule base for microarray data analysis. BioSystems 85:165–176
Huang J, Cai Y, Xu X (2007) A hybrid genetic algorithm for feature selection wrapper based on mutual
information. Pattern Recogn Lett 28:1825–1844
Kohavi R, John G (1997) Wrappers for feature subset selection. Artif Intell 97(1-2):273–324
LeCun Y, Denker J, Solla S (1990) Optimal brain damage. Adv Neural Inform Process Syst II:598–605
Lee C, Lee G (2006) Information gain and divergence-based feature selection for machine learning-based text
categorization. Inform Process Manage 42(1):155–165
Li F, Yang Y (2005) Analysis of recursive gene selection approaches from microarray data. Bioinformatics
21(19):3741–3747
Liu Q, Zhang Y, Hu Z (2007) Extracting positive and negative association classification rules from RBF ker-
nel. In: 2007 International conference on convergence information technology. IEEE Computer Society,
pp 1285–1291
Niijima S, Kuhara S (2006) Gene subset selection in kernel-induced feature space. Pattern Recogn Lett
27:1884–1892
Schoch C, Kohlmann A, Schnittger S et al (2002) Acute myeloid leukemias with reciprocal rearrangements
can be distinguished by specific gene expression profiles. Proc Nat Acad Sci USA 99(15):10008–10013
Shipp M, Ross K, Tamayo P et al (2002) Diffuse large B-Cell lymphoma outcome prediction by gene-expres-
sion profiling and supervised machine learning. Nature Med 8(1):68–74
Silva P, Hashimoto R, Kim S et al (2005) Feature selection algorithms to find strong genes. Pattern Recogn
Lett 26:1444–1453
Singh D, Febbo P et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell
1:203–209
Sun Y (2007) Iterative RELIEF for feature weighting: algorithms, theories, and applications. IEEE Trans Pattern Anal Mach Intell 29(6):1035–1051
Tang Y, Zhang Y, Huang Z (2007) Development of two-stage SVM-RFE gene selection strategy for microarray
expression data analysis. IEEE/ACM Trans Comput Biol Bioinform 4(3):365–381
Tong D, Phalp K, Schierz A, Mintram R (2009) Innovative hybridisation of genetic algorithms and neural net-
works in detecting marker genes for leukaemia cancer. In: 4th IAPR international conference on pattern
recognition in bioinformatics, Sheffield, 7–9 September 2009
Vapnik V (1998) Statistical learning theory. Wiley, New York
Wang Z, Palade V, Xu Y (2006) Neuro-fuzzy ensemble approach for microarray cancer gene expression data
analysis. In: Proceedings of the second international symposium on evolving fuzzy system (EFS’06),
IEEE Computational Intelligence Society 2006 , pp 241–246
Youn E, Jeong M (2009) Class dependent feature scaling method using naive Bayes classifier for text data
mining. Pattern Recogn Lett 30:477–485
Zhang C, Lu X, Zhang X (2006) Significance of gene ranking for classification of microarray samples.
IEEE/ACM Trans Comput Biology Bioinform 3(3):312–320
Zhang H, Song X, Wang H, Zhang X (2009) MIClique: an algorithm to identify differentially coexpressed disease gene subset from microarray data. J Biomed Biotechnol 2009, Article ID 642524. doi:10.1155/2009/642524