A Survey of Support Vector Machines with Uncertainties
DOI 10.1007/s40745-014-0022-8
Abstract Support Vector Machines (SVM) are among the best known supervised learning algorithms. In recent years SVM have found wide application in many fields and have inspired many algorithmic and modeling variations. Basic SVM models deal with the situation where the exact values of the data points are known. This paper presents a survey of SVM when the data points are uncertain. When a direct model cannot guarantee a generally good performance on the uncertainty set, robust optimization is introduced to deal with the worst case scenario and still guarantee an optimal performance. The data uncertainty could be an additive noise bounded by a norm, in which case efficient linear programming models are available under certain conditions; it could be intervals with support and extremum values; or it could be the more general case of polyhedral uncertainty, for which formulations are also presented. Another branch of the uncertainty analysis is chance constrained SVM, which is used to ensure a small probability of misclassification for the uncertain data. The multivariate Chebyshev inequality and Bernstein bounding schemes have been used to transform the chance constraints through robust optimization. The Chebyshev based model employs moment information of the uncertain training points. The Bernstein bounds can be less conservative than the Chebyshev bounds since they employ both support and moment information, but they also make the strong assumption that all the elements in the data set are independent.
1 Introduction
As one of the best known supervised learning algorithms, Support Vector Machines (SVM) are gaining more and more attention. They were proposed by Vapnik [1,2] as maximum-margin classifiers, and tutorials on SVM can be found in [3–6]. In recent years, SVM have been applied to many fields and have acquired many algorithmic and modeling variations. In the biomedical field, SVM have been used to identify physical diseases [7–10] as well as psychological diseases [11]. Electroencephalography (EEG) signals can also be analyzed using SVM [12–14]. Besides these, SVM have also been applied to protein prediction [15–19] and medical images [20–22]. Computer vision includes many applications of SVM, such as person identification [23], hand gesture detection [24], face recognition [25] and background subtraction [26]. In geosciences, SVM have been applied to remote sensing analysis [27–29], land cover change [30–32], landslide susceptibility [33–36] and hydrology [37,38]. In power systems, SVM have been used for transient status prediction [39], power load forecasting [40], electricity consumption prediction [41] and wind power forecasting [42]. Stock price forecasting [43–45] and business administration [46] also use SVM. Other applications of SVM include agricultural plant disease detection [47], condition monitoring [48], network security [49] and electronics [50,51]. When basic SVM models cannot satisfy the application requirements, different modeling variations of SVM can be found in [52].
In this paper, a survey of SVM with uncertainties is presented. Basic SVM models deal with the situation where the exact values of the data points are known. When the data points are uncertain, different models have been proposed to formulate SVM with uncertainties. Bi and Zhang [53] assumed the data points are subject to an additive noise bounded by a norm and proposed a very direct model. However, this model cannot guarantee a generally good performance on the uncertainty set. To guarantee an optimal performance while the worst case scenario constraints are still satisfied, robust optimization is utilized. Trafalis et al. [54–58] proposed a robust optimization model when the perturbation of the uncertain data is bounded by a norm. Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals. Fan et al. [60] studied the more general case of polyhedral uncertainties. Robust optimization is also used when the constraint is a chance constraint, which ensures a small probability of misclassification for the uncertain data. The chance constraints are transformed by different bounding inequalities, for example the multivariate Chebyshev inequality [61,62] and Bernstein bounding schemes [63].
The organization of this paper is as follows: Sect. 2 gives an introduction to the basic SVM models. Section 3 presents SVM with uncertainties, covering both robust SVM with bounded uncertainty and chance constrained SVM through robust optimization. Section 4 presents concluding remarks and suggestions for further research.
2 Support Vector Machines

Support Vector Machines construct maximum-margin classifiers, such that small perturbations in the data are least likely to cause misclassification. Empirically, SVM work very well and are among the best known supervised learning algorithms, originally proposed by Vapnik [1,2].
Suppose we have a two-class dataset of m data points $\{x_i, y_i\}_{i=1}^m$ with n-dimensional features $x_i \in \mathbb{R}^n$ and respective class labels $y_i \in \{+1, -1\}$. For linearly separable data, the hard margin SVM finds the maximum-margin hyperplane by solving

$$\min_{w,b} \quad \frac{1}{2}\|w\|_2^2 \tag{1a}$$
$$\text{s.t.} \quad y_i(w^\top x_i + b) \ge 1, \quad i = 1, \ldots, m \tag{1b}$$

Introducing Lagrange multipliers $\alpha_i \ge 0$ for the constraints gives the saddle-point problem

$$\min_{w,b}\, \max_{\alpha \ge 0} \; L(w, b, \alpha) = \frac{1}{2}\|w\|_2^2 - \sum_{i=1}^m \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] \tag{2}$$

Setting the partial derivatives of $L$ with respect to $w$ and $b$ to zero yields

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^m \alpha_i y_i x_i \tag{3a}$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^m \alpha_i y_i = 0 \tag{3b}$$

Substituting these conditions back into $L$ eliminates $w$ and $b$ and leaves the dual function

$$L(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \tag{4}$$
Then the dual of the original SVM problem is also a convex quadratic problem:

$$\max_{\alpha} \quad \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \tag{5a}$$
$$\text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad \alpha_i \ge 0, \; i = 1, \ldots, m \tag{5b}$$
Since only the αi corresponding to support vectors can be nonzero, this dramatically
simplifies solving the dual problem.
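To make the dual concrete, the following is a minimal sketch (not from the paper) that solves (5) with the cvxpy modeling library on a small synthetic dataset and recovers w and b from the support vectors via (3a); the toy data and the numerical tolerance are illustrative choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
m = len(y)

alpha = cp.Variable(m)
# Dual objective (5a): sum_i alpha_i - 1/2 || sum_i alpha_i y_i x_i ||^2
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, y @ alpha == 0]          # (5b)
cp.Problem(objective, constraints).solve()

a = alpha.value
w = X.T @ (a * y)                                   # w = sum_i alpha_i y_i x_i, Eq. (3a)
sv = np.where(a > 1e-6)[0]                          # support vectors: alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))               # from y_i (w^T x_i + b) = 1 on the SVs
print("support vectors:", sv)
print("w =", w, "b =", b)
```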
The above holds when the two classes are linearly separable. When they are not, mislabeled samples need to be allowed, which is where soft margin SVM arises. Soft margin SVM introduces non-negative slack variables ξ_i that measure the distance of within-margin or misclassified data x_i to the hyperplane with the correct label, with ξ_i = max{0, 1 − y_i(w^⊤ x_i + b)}. When 0 < ξ_i < 1, the data point is within the margin but correctly classified; when ξ_i > 1, the data point is misclassified. The objective function then gains a term penalizing these slack variables, and the optimization is a trade-off between a large margin and a small error penalty. The soft margin SVM formulation with L1 regularization [64] is:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{6a}$$
$$\text{s.t.} \quad y_i(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i = 1, \ldots, m \tag{6b}$$
The corresponding saddle-point problem is

$$\min_{w,b,\xi}\, \max_{\alpha, \beta \ge 0} \; L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i - \sum_{i=1}^m \alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^m \beta_i \xi_i \tag{7}$$

The additional stationarity condition with respect to ξ_i is

$$\frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \beta_i = 0 \tag{8}$$
Together with conditions (3a) and (3b), the dual of the soft margin SVM is

$$\max_{\alpha} \quad \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j y_i y_j \, x_i^\top x_j \tag{9a}$$
$$\text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \; i=1,\ldots,m \tag{9b}$$
The only difference is that the dual variables αi now have upper bounds C. The
advantage of the L 1 regularization (linear penalty function) is that in the dual problem,
the slack variables ξi vanish and the constant C is just an additional constraint on the
Lagrange multipliers αi . Because of this nice property and its huge impact in practice,
L 1 is the most widely used regularization term.
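As an illustration of formulation (6), the following hedged sketch solves the soft margin primal directly in cvxpy; the value of C and the toy data are arbitrary choices, not values from the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (30, 2)), rng.normal(1, 1.0, (30, 2))])
y = np.hstack([-np.ones(30), np.ones(30)])
m, n = X.shape
C = 1.0

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)                 # slack variables xi_i >= 0
margin = cp.multiply(y, X @ w + b)               # y_i (w^T x_i + b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
                  [margin >= 1 - xi])            # constraint (6b)
prob.solve()
print("w =", w.value, "b =", b.value, "total slack =", float(xi.value.sum()))
```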
Besides the linear kernel k(x_i, x_j) = x_i^⊤ x_j, nonlinear kernels can also be introduced into SVM to create nonlinear classifiers. The maximum-margin hyperplane is then constructed in a high-dimensional transformed feature space; because the transformation may be nonlinear, the resulting classifier can be nonlinear in the original feature space. A widely used nonlinear kernel is the Gaussian radial basis function k(x_i, x_j) = exp(−γ‖x_i − x_j‖_2^2), which corresponds to a Hilbert space of infinite dimension.
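For completeness, a small sketch of computing the Gaussian RBF kernel matrix with NumPy is shown below; the value of γ and the sample points are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # k(x_i, x_j) = exp(-gamma * ||x_i - x_j||_2^2), computed via
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))   # clip tiny negatives from round-off

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(rbf_kernel(X))
```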
3 SVM with Uncertainty

Consider the soft margin SVM written in terms of training data points X_i:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{10a}$$
$$\text{s.t.} \quad y_i(w^\top X_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{10b}$$
When the training data points X i are random vectors, the model needs to be modified
to consider the uncertainties. The simplest model is to just employ the means of the
uncertain data points, μi = E[X i ]. The formulation would become:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{11a}$$
$$\text{s.t.} \quad y_i(w^\top \mu_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{11b}$$
The above model is equivalent to a soft margin SVM on data points fixed at the means and therefore does not take the uncertainties of the data into account. Bi and Zhang [53] assumed the data points are subject to an additive noise, X_i = x̄_i + Δx_i, where the noise is bounded by ‖Δx_i‖_2 ≤ δ_i. They proposed the model:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{12a}$$
$$\text{s.t.} \quad y_i(w^\top(\bar{x}_i + \Delta x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{12b}$$
$$\qquad \|\Delta x_i\|_2 \le \delta_i, \quad i=1,\ldots,m \tag{12c}$$
In this model, the uncertain data point X_i is free to lie anywhere in the ball centered at x̄_i with radius δ_i, i.e., X_i could move in any direction within the uncertainty set. A drawback of this model is that it cannot guarantee a generally good performance on the uncertainty set, since the direction in which the data points are perturbed is not constrained. It is quite possible that a data point whose perturbation moves it far away from the separating hyperplane is selected as a support vector; considering the original uncertainty set of this data point, most of it would then lie within the margin and the constraint would no longer be satisfied. To guarantee a better performance under most conditions, or with higher probability, robust optimization is introduced to solve the SVM with uncertainty.
To solve the robust SVM, the following subproblem needs to be solved first:
min yi w σ i (14a)
σi
s.t. σ i p ≤ ηi (14b)
Hölder's inequality says that for a pair of dual norms L_p and L_q with p, q ∈ [1, ∞] and 1/p + 1/q = 1, the following inequality holds:

$$|w^\top \sigma_i| \le \|w\|_q \, \|\sigma_i\|_p \tag{15}$$

Therefore

$$\min_{\|\sigma_i\|_p \le \eta_i} y_i w^\top \sigma_i \ge -\eta_i \|w\|_q \tag{16}$$

A lower bound of y_i w^⊤ σ_i is −η_i‖w‖_q; substituting it into the original problem gives the following formulation:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{17a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_q \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{17b}$$
When the perturbation is bounded in the L_2 norm (p = q = 2), the formulation becomes:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{18a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_2 \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{18b}$$
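Formulation (18) is a second order cone program and can be prototyped directly in a modeling language. The following sketch assumes, purely for brevity, a common perturbation radius η for all points; it is an illustration rather than the authors' implementation.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
Xbar = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = Xbar.shape
C = 1.0
eta = 0.3            # common L2 perturbation radius (a simplifying assumption)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (18b): y_i (w^T xbar_i + b) - eta ||w||_2 >= 1 - xi_i
cons = [cp.multiply(y, Xbar @ w + b) - eta * cp.norm(w, 2) >= 1 - xi]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```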
When the perturbation is bounded in the L_1 norm, the dual norm is L_∞; replacing the quadratic objective by the L_∞ norm of w yields

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_\infty + C\sum_{i=1}^m \xi_i \tag{19a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_\infty \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{19b}$$

which can be written as a linear program by introducing an auxiliary variable α:

$$\min_{w,b,\xi_i,\alpha} \quad \frac{1}{2}\alpha + C\sum_{i=1}^m \xi_i \tag{20a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\alpha \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{20b}$$
$$\qquad \alpha \ge -w_j, \quad \alpha \ge w_j, \quad j=1,\ldots,n \tag{20c}$$
When the L_∞ norm is chosen to express the perturbation, the formulation becomes:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_1 + C\sum_{i=1}^m \xi_i \tag{21a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_1 \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{21b}$$

which is equivalent to the linear program

$$\min_{w,b,\xi_i,\alpha} \quad \frac{1}{2}\sum_{j=1}^n \alpha_j + C\sum_{i=1}^m \xi_i \tag{22a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\sum_{j=1}^n \alpha_j \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{22b}$$
$$\qquad \alpha_j \ge -w_j, \quad \alpha_j \ge w_j, \quad j=1,\ldots,n \tag{22c}$$
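A sketch of the linear program (22) is given below; again the toy data, C, and η are illustrative, and any LP-capable solver available to cvxpy can be used.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
Xbar = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = Xbar.shape
C, eta = 1.0, 0.2

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
alpha = cp.Variable(n)                       # alpha_j >= |w_j|, constraint (22c)
cons = [alpha >= w, alpha >= -w,
        cp.multiply(y, Xbar @ w + b) - eta * cp.sum(alpha) >= 1 - xi]   # (22b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum(alpha) + C * cp.sum(xi)), cons)
prob.solve()                                 # every constraint is linear, so this is an LP
print("w =", w.value, "objective =", prob.value)
```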
Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals (also known as support or extremum values). Suppose the extremum values of the uncertain data points are known, l_ij ≤ X_ij ≤ u_ij. Then each training data point X_i lies in a hyper-rectangle with center c_i, where c_ij = (l_ij + u_ij)/2, and half-widths s_ij = (u_ij − l_ij)/2. Requiring the classification constraint to hold for every point of the hyper-rectangle gives the worst-case constraint

$$y_i(w^\top c_i + b) \ge 1 - \xi_i + \sum_{j=1}^n s_{ij} |w_j| \tag{23}$$

Then the SVM model with support information can be written, with S_i = diag(s_{i1}, …, s_{in}), as:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{24a}$$
$$\text{s.t.} \quad y_i(w^\top c_i + b) \ge 1 - \xi_i + \|S_i w\|_1, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{24b}$$
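The interval-uncertainty model (24) only needs the box centers and half-widths. A minimal cvxpy sketch follows; the box half-widths and toy data are illustrative placeholders.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
centers = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
halfwidth = 0.2 * np.ones_like(centers)       # s_ij = (u_ij - l_ij) / 2
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = centers.shape
C = 1.0

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (24b): y_i (w^T c_i + b) >= 1 - xi_i + sum_j s_ij |w_j|
cons = [cp.multiply(y, centers @ w + b) >= 1 - xi + halfwidth @ cp.abs(w)]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```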
Fan et al. [60] studied the more general case of polyhedral uncertainty, where each uncertain point lies in a polyhedron {x_i : D_i x_i ≤ d_i} with D_i ∈ ℝ^{q×n} and d_i ∈ ℝ^q. The robust formulation is

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{25a}$$
$$\text{s.t.} \quad \min_{\{x_i:\, D_i x_i \le d_i\}} y_i(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{25b}$$

To solve the inner minimization, consider the equivalent linear program

$$\max \quad -y_i w^\top x_i \tag{27a}$$
$$\text{s.t.} \quad D_i x_i \le d_i \tag{27b}$$

whose dual is min_{z_i ≥ 0} d_i^⊤ z_i subject to D_i^⊤ z_i + y_i w = 0.
Strong duality guarantees that the objective values of the dual and primal are equal. Therefore, the robust SVM with polyhedral uncertainty is equivalent to:

$$\min_{w,b,\xi_i,z} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{29a}$$
$$\text{s.t.} \quad d_i^\top z_i - y_i b \le -1 + \xi_i, \quad \xi_i \ge 0 \tag{29b}$$
$$\qquad D_i^\top z_i + y_i w = 0, \quad z_i = (z_{i1}, \ldots, z_{iq})^\top \tag{29c}$$
$$\qquad z_{ij} \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,q \tag{29d}$$
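A sketch of formulation (29) is shown below; for illustration each polyhedron is taken to be a box built as in (31), which is an assumption of this example rather than a restriction of the model.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
X0 = np.vstack([rng.normal(-2, 0.7, (15, 2)), rng.normal(2, 0.7, (15, 2))])
y = np.hstack([-np.ones(15), np.ones(15)])
m, n = X0.shape
delta, C = 0.2, 1.0
q = 2 * n                                          # a box gives q = 2n inequalities per point

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
Z = cp.Variable((m, q), nonneg=True)               # one dual vector z_i per point, (29d)
cons = []
for i in range(m):
    D = np.vstack([np.eye(n), -np.eye(n)])         # D_i
    d = np.hstack([X0[i] + delta, -X0[i] + delta]) # d_i, as in Eq. (31)
    cons += [d @ Z[i] - y[i] * b <= -1 + xi[i],    # (29b)
             D.T @ Z[i] + y[i] * w == 0]           # (29c)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```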
The authors also proved that for the hard margin SVM (i.e., when there are no ξ_i), the dual of the above formulation is:

$$\min_{\lambda,\mu} \quad \sum_{i=1}^m \lambda_i - \frac{1}{2}\sum_{k=1}^n \Big(\sum_{i=1}^m y_i \mu_{ik}\Big)^2 \tag{30a}$$
$$\text{s.t.} \quad \lambda_i d_{ij} + \sum_{k=1}^n \mu_{ik} D_{i,jk} = 0, \quad i=1,\ldots,m, \; j=1,\ldots,q \tag{30b}$$
$$\qquad \sum_{i=1}^m \lambda_i y_i = 0 \tag{30c}$$
$$\qquad \lambda_i \ge 0, \quad i=1,\ldots,m \tag{30d}$$
When the polyhedron is a hyper-rectangle centered at a nominal point x_i^0 with half-widths δ_i, one can take

$$D_i = \begin{bmatrix} I \\ -I \end{bmatrix}, \qquad d_i = \begin{bmatrix} x_i^0 + \delta_i \\ -x_i^0 + \delta_i \end{bmatrix} \tag{31}$$

so that {x_i : x_i ∈ [x_i^0 − δ_i, x_i^0 + δ_i]} and {x_i : D_i x_i ≤ d_i} are equivalent. The authors of [60] also proposed probabilistic bounds on constraint violation in this case.
Another way to model uncertainty is chance constrained SVM, which requires each classification constraint to hold with high probability:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{32a}$$
$$\text{s.t.} \quad \text{Prob}\{y_i(w^\top X_i + b) \ge 1 - \xi_i\} \ge 1 - \varepsilon, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{32b}$$
where 0 < ε ≤ 1 is a parameter given by the user and close to 0. This model ensures an upper bound on the misclassification probability, but chance constraints are typically non-convex, so the problem is very hard to solve.
The work so far deals with the chance constraints by transforming them through different bounding inequalities. When the mean and covariance matrix are known, the multivariate Chebyshev bound via robust optimization can be used to express the chance constraints above [61,62].
Markov's inequality states that if X is a nonnegative random variable and a > 0, then

$$\text{Prob}\{X \ge a\} \le \frac{E[X]}{a} \tag{33}$$

Consider the random variable (X − E[X])². Since Var(X) = E[(X − E[X])²], then

$$\text{Prob}\{(X - E[X])^2 \ge a^2\} \le \frac{\text{Var}(X)}{a^2} \tag{34}$$

that is,

$$\text{Prob}\{|X - E[X]| \ge a\} \le \frac{\text{Var}(X)}{a^2} \tag{35}$$
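As a quick numerical sanity check (not from the paper), the following snippet draws samples from an arbitrary distribution and verifies empirically that the tail probability in (35) never exceeds Var(X)/a².

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.exponential(scale=2.0, size=200_000)      # any distribution works for (35)
mu, var = X.mean(), X.var()
for a in [1.0, 2.0, 4.0]:
    empirical = np.mean(np.abs(X - mu) >= a)
    print(f"a={a}: empirical tail {empirical:.4f} <= Chebyshev bound {var / a**2:.4f}")
```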
Let x ∼ (μ, Σ) denote a random vector x with mean μ and covariance matrix Σ. The multivariate Chebyshev inequality [65,66] states that for an arbitrary closed convex set S, the supremum of the probability that x takes a value in S is

$$\sup_{x \sim (\mu, \Sigma)} \text{Prob}\{x \in S\} = \frac{1}{1 + d^2} \tag{36a}$$
$$d^2 = \inf_{x \in S} (x - \mu)^\top \Sigma^{-1} (x - \mu) \tag{36b}$$

Using this bound, requiring w^⊤ x + b ≥ 0 to hold with probability at least 1 − ε for x ∼ (μ, Σ) is equivalent to the second order cone constraint

$$w^\top \mu + b \ge \kappa_C \|\Sigma^{1/2} w\|_2 \tag{37}$$

where $\kappa_C = \sqrt{(1-\varepsilon)/\varepsilon}$.
Applying the above result to the chance constrained SVM, the Chebyshev based reformulation utilizing the mean μ_i and covariance matrix Σ_i of each uncertain training point X_i can be obtained as the following robust model [61,62]:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{38a}$$
$$\text{s.t.} \quad y_i(w^\top \mu_i + b) \ge 1 - \xi_i + \kappa_C \|\Sigma_i^{1/2} w\|_2, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{38b}$$
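A minimal cvxpy sketch of the Chebyshev based model (38) follows; the means, covariance factors Σ_i^{1/2}, ε, and C are illustrative placeholders rather than values from the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
mu = np.vstack([rng.normal(-2, 0.7, (20, 2)), rng.normal(2, 0.7, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
m, n = mu.shape
Sigma_half = [0.2 * np.eye(n) for _ in range(m)]   # Sigma_i^{1/2}; identical here for brevity
eps, C = 0.1, 1.0
kappa_C = np.sqrt((1 - eps) / eps)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (38b), one second order cone constraint per uncertain point
cons = [y[i] * (mu[i] @ w + b) >= 1 - xi[i] + kappa_C * cp.norm(Sigma_half[i] @ w, 2)
        for i in range(m)]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```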
Another approach to study SVM with chance constraints is to use Bernstein approximation schemes [67–69]. Ben-Tal et al. [63] employed Bernstein bounding schemes to relax the chance constrained program (CCP) into a convex second order cone program with robust set constraints, which guarantees satisfaction of the chance constraints and can be solved efficiently using interior point solvers.

The Bernstein based relaxation utilizes both the support (bounds, i.e. extremum values of the data points) and moment information (mean and variance). For a random data point X_i = [X_{i1}, …, X_{in}]^⊤ with label y_i, the support information is the bounds on the data points l_{ij} ≤ X_{ij} ≤ u_{ij}, i.e. X_i ∈ R_i = {x_i = [x_{i1}, …, x_{in}]^⊤ ∈ ℝ^n | l_{ij} ≤ x_{ij} ≤ u_{ij}, j = 1, …, n}; the first moment information is the bounds on the means of the data points μ_i^− = [μ_{i1}^−, …, μ_{in}^−]^⊤ ≤ μ_i = E[X_i] ≤ μ_i^+ = [μ_{i1}^+, …, μ_{in}^+]^⊤; and the second moment information is the bounds on the second moments of the data points 0 ≤ E[X_{ij}^2] ≤ σ_{ij}^2.
The Bernstein based relaxation derives convex constraints such that, when these convex constraints are satisfied, the chance constraints are guaranteed to be satisfied. The authors proved that, given the information on the independent random variables X_{ij}, i.e. support l_{ij} ≤ X_{ij} ≤ u_{ij}, bounds on the first moment μ_{ij}^− ≤ μ_{ij} = E[X_{ij}] ≤ μ_{ij}^+, and bounds on the second moment 0 ≤ E[X_{ij}^2] ≤ σ_{ij}^2, the chance constraint in SVM is satisfied if the following convex constraint holds:

$$1 - \xi_i - y_i b + \sum_j \max\left(-y_i \mu_{ij}^- w_j, \, -y_i \mu_{ij}^+ w_j\right) + \kappa_B \|\Lambda_i w\|_2 \le 0 \tag{39}$$

where $\kappa_B = \sqrt{2\log(1/\varepsilon)}$ and Λ_i is the diagonal matrix

$$\Lambda_i = \text{diag}\left(s_{i1}\,\nu(\mu_{i1}^-, \mu_{i1}^+, \sigma_{i1}), \ldots, s_{in}\,\nu(\mu_{in}^-, \mu_{in}^+, \sigma_{in})\right) \tag{40}$$
where s_{ij} = (u_{ij} − l_{ij})/2, and the function ν(μ_{ij}^−, μ_{ij}^+, σ_{ij}) is defined via the normalized variables X̂_{ij} = (X_{ij} − c_{ij})/s_{ij}, where c_{ij} = (l_{ij} + u_{ij})/2. Using the information on X_{ij}, one can easily compute the moment information of X̂_{ij}, denoted by μ̂_{ij}^− ≤ μ̂_{ij} = E[X̂_{ij}] ≤ μ̂_{ij}^+ and 0 ≤ E[X̂_{ij}^2] ≤ σ̂_{ij}^2. They proved that
$$E\left[\exp\{\tilde{t}\hat{X}_{ij}\}\right] \le g_{\hat{\mu}_{ij},\hat{\sigma}_{ij}}(\tilde{t}) = \begin{cases} \dfrac{(1-\hat{\mu}_{ij})^2 \exp\left\{\tilde{t}\,\dfrac{\hat{\mu}_{ij}-\hat{\sigma}_{ij}^2}{1-\hat{\mu}_{ij}}\right\} + (\hat{\sigma}_{ij}^2-\hat{\mu}_{ij}^2)\exp\{\tilde{t}\}}{1-2\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}, & \tilde{t} \ge 0 \\[3ex] \dfrac{(1+\hat{\mu}_{ij})^2 \exp\left\{\tilde{t}\,\dfrac{\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}{1+\hat{\mu}_{ij}}\right\} + (\hat{\sigma}_{ij}^2-\hat{\mu}_{ij}^2)\exp\{-\tilde{t}\}}{1+2\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}, & \tilde{t} \le 0 \end{cases} \tag{41}$$
They defined $h_{\hat\mu_{ij},\hat\sigma_{ij}}(\tilde{t}) = \log g_{\hat\mu_{ij},\hat\sigma_{ij}}(\tilde{t})$, and the function ν(μ^−, μ^+, σ) is defined as:
$$\nu(\mu^-, \mu^+, \sigma) = \min\left\{ k \ge 0 \;:\; h_{\hat\mu,\hat\sigma}(\tilde{t}) \le \max[\hat\mu^- \tilde{t}, \, \hat\mu^+ \tilde{t}] + \frac{k^2}{2}\tilde{t}^2, \;\; \forall \hat\mu \in [\hat\mu^-, \hat\mu^+], \; \forall \tilde{t} \right\} \tag{42}$$

This value can be calculated numerically. Under the condition that μ_{ij}^− ≤ c_{ij} ≤ μ_{ij}^+, it can be computed analytically as $\nu(\mu^-, \mu^+, \sigma) = \sqrt{1 - (\hat\mu_{\min})^2}$, where μ̂_min = min(−μ̂^−, μ̂^+).
Replacing the chance constraints in SVM by the convex constraint derived above, the problem is transformed into a convex second order cone program:

$$\min_{w,b,\xi_i,z_{ij}} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{43a}$$
$$\text{s.t.} \quad 1 - \xi_i - y_i b + \sum_j z_{ij} + \kappa_B\|\Lambda_i w\|_2 \le 0 \tag{43b}$$
$$\qquad z_{ij} \ge -y_i \mu_{ij}^- w_j, \quad z_{ij} \ge -y_i \mu_{ij}^+ w_j, \quad \xi_i \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,n$$

where the auxiliary variables z_{ij} linearize the max terms in (39).
The convex constraint (39) can equivalently be viewed as a robust constraint:

$$y_i(w^\top x + b) \ge 1 - \xi_i, \quad \forall x \in \bigcup_{\mu_i \in [\mu_i^-,\, \mu_i^+]} \mathcal{E}\big(\mu_i, \kappa_B \Lambda_i\big) \tag{44}$$

Therefore, this constraint defines an uncertainty set ∪_{μ_i ∈ [μ_i^−, μ_i^+]} E(μ_i, κ_B Λ_i) for each uncertain training data point X_i, where E(μ_i, κ_B Λ_i) denotes an ellipsoid centered at μ_i whose shape is given by κ_B Λ_i. If all the points in the uncertainty set satisfy y_i(w^⊤ x + b) ≥ 1 − ξ_i, then the chance constraint is guaranteed to be satisfied. This transforms the CCP into a robust optimization problem over the uncertainty set.
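A sketch of the Bernstein based SOCP (43) is given below, with the max terms of (39) linearized by auxiliary variables; the moment bounds and the diagonal matrices Λ_i are treated as given inputs and are illustrative placeholders here.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
mu_lo = np.vstack([rng.normal(-2, 0.7, (20, 2)), rng.normal(2, 0.7, (20, 2))]) - 0.1
mu_hi = mu_lo + 0.2                                   # bounds [mu_ij^-, mu_ij^+]
y = np.hstack([-np.ones(20), np.ones(20)])
m, n = mu_lo.shape
Lam = [0.15 * np.eye(n) for _ in range(m)]            # Lambda_i from Eq. (40), assumed given
eps, C = 0.1, 1.0
kappa_B = np.sqrt(2 * np.log(1 / eps))

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
Z = cp.Variable((m, n))        # z_ij >= max(-y_i mu_ij^- w_j, -y_i mu_ij^+ w_j)
cons = []
for i in range(m):
    cons += [Z[i] >= -y[i] * cp.multiply(mu_lo[i], w),
             Z[i] >= -y[i] * cp.multiply(mu_hi[i], w),
             1 - xi[i] - y[i] * b + cp.sum(Z[i])
             + kappa_B * cp.norm(Lam[i] @ w, 2) <= 0]  # constraint (43b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```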
Since the size of the uncertainty set depends on κ_B, and κ_B = √(2 log(1/ε)), the size of the uncertainty set increases as the upper bound ε on the misclassification probability decreases. When ε is very small, the uncertainty set becomes huge and the constraint becomes too conservative. Since the support information provides the bounding hyper-rectangle R_i in which the true training data point X_i always lies, a less conservative classifier can be obtained by taking the intersection of ∪_{μ_i ∈ [μ_i^−, μ_i^+]} E(μ_i, κ_B Λ_i) and R_i as the new uncertainty set.

The authors proved that when the uncertainty set is this intersection, i.e.,

$$y_i(w^\top x + b) \ge 1 - \xi_i, \quad \forall x \in \left(\bigcup_{\mu_i \in [\mu_i^-,\, \mu_i^+]} \mathcal{E}\big(\mu_i, \kappa_B \Lambda_i\big)\right) \cap R_i \tag{45}$$
the above constraint is satisfied if and only if the following convex constraint holds:

$$\sum_j \max\left(-l_{ij}(y_i w_j + a_{ij}), \, -u_{ij}(y_i w_j + a_{ij})\right) + \sum_j \max\left(\mu_{ij}^- a_{ij}, \, \mu_{ij}^+ a_{ij}\right) + 1 - \xi_i - y_i b + \kappa_B\|\Lambda_i a_i\|_2 \le 0 \tag{46}$$
Replacing the chance constraints in SVM by the robust but less conservative convex constraint above, the problem is transformed into the following SOCP:

$$\min_{w,b,\xi_i,z_{ij},\tilde{z}_{ij},a_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{47a}$$
$$\text{s.t.} \quad 1 - \xi_i - y_i b + \sum_j \tilde{z}_{ij} + \sum_j z_{ij} + \kappa_B\|\Lambda_i a_i\|_2 \le 0 \tag{47b}$$
$$\qquad \tilde{z}_{ij} \ge -l_{ij}(y_i w_j + a_{ij}), \quad \tilde{z}_{ij} \ge -u_{ij}(y_i w_j + a_{ij})$$
$$\qquad z_{ij} \ge \mu_{ij}^- a_{ij}, \quad z_{ij} \ge \mu_{ij}^+ a_{ij}, \quad \xi_i \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,n$$

where the auxiliary variables z̃_{ij} and z_{ij} linearize the max terms in (46).
The Bernstein based formulations (43) and (47) are robust to moment estimation errors in addition to the uncertainty in the data, since they use the bounds on the means [μ_{ij}^−, μ_{ij}^+] and the bounds on the second moments σ_{ij}^2 instead of the exact values of the moments, which are often unknown.
Comparing the two approaches for chance constrained SVM, both are robust to uncertainties in the data and make no assumptions about the underlying probability distribution. The Chebyshev based scheme employs only moment information of the uncertain training points, while the Bernstein bounds employ both support and moment information and can therefore be less conservative than the Chebyshev bounds. The resulting classifier from the Bernstein approach achieves larger classification margins and therefore better generalization ability according to the structural risk minimization principle of Vapnik [1]. A drawback of the Bernstein based formulation is that it assumes the elements X_{ij} are mutually independent, while the Chebyshev based formulation allows a full covariance matrix Σ_i for each uncertain training point X_i.
4 Concluding Remarks
This paper presented a survey on SVM with uncertainties. When a direct model cannot guarantee a generally good performance on the uncertainty set, robust optimization is utilized to obtain an optimal performance under the worst case scenario. The perturbation of the uncertain data can be bounded by a norm, or expressed as intervals or polyhedrons. When the constraint is a chance constraint, different bounding schemes such as the multivariate Chebyshev inequality and Bernstein bounding schemes are used to ensure a small probability of misclassification for the uncertain data.

The models in the literature generally address the linear SVM, while a large part of the power of SVM lies in the representational power of nonlinear kernels in SVM models.
References
21. Bauer S, Nolte LP, Reyes M (2011) Fully automatic segmentation of brain tumor images using support
vector machine classification in combination with hierarchical conditional random field regularization.
In: Fichtinger G, Martel A, Peters T (eds) Proceedings of Medical Image Computing and Computer-
Assisted Intervention-MICCAI 2011. Springer, Berlin, pp 354–361
22. Yao J, Dwyer A, Summers RM, Mollura DJ (2011) Computer-aided diagnosis of pulmonary infections
using texture analysis and support vector machine classification. Acad Radiol 18(3):306–314
23. Prosser B, Zheng WS, Gong S, Xiang T, Mary Q (2010) Person re-identification by support vector
ranking. BMVC 1:5
24. Dardas NH, Georganas ND (2011) Real-time hand gesture detection and recognition using bag-of-
features and support vector machine techniques. IEEE Trans Instrum Meas 60(11):3592–3607
25. Wei J, Jian-qi Z, Xiang Z (2011) Face recognition method based on support vector machine and particle
swarm optimization. Expert Syst Appl 38(4):4390–4393
26. Han B, Davis LS (2012) Density-based multifeature background subtraction with support vector
machine. IEEE Trans Pattern Anal Mach Intell 34(5):1017–1023
27. Waske B, van der Linden S, Benediktsson JA, Rabe A, Hostert P (2010) Sensitivity of support vector
machines to random feature selection in classification of hyperspectral data. IEEE Trans Geosci Remote
Sens 48(7):2880–2889
28. Mountrakis G, Im J, Ogole C (2011) Support vector machines in remote sensing: a review. ISPRS J
Photogramm Remote Sens 66(3):247–259
29. Li CH, Kuo BC, Lin CT, Huang CS (2012) A spatial-contextual support vector machine for remotely
sensed image classification. IEEE Trans Geosci Remote Sens 50(3):784–799
30. Otukei J, Blaschke T (2010) Land cover change assessment using decision trees, support vector
machines and maximum likelihood classification algorithms. Int J Appl Earth Obs Geoinf 12:S27–S31
31. Shao Y, Lunetta RS (2012) Comparison of support vector machine, neural network, and cart algorithms
for the land-cover classification using limited training data points. ISPRS J Photogramm Remote Sens
70:78–87
32. Volpi M, Tuia D, Bovolo F, Kanevski M, Bruzzone L (2013) Supervised change detection in VHR
images using contextual information and support vector machines. Int J Appl Earth Obs Geoinf 20:77–
85
33. Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey:
conditional probability, logistic regression, artificial neural networks, and support vector machine.
Environ Earth Sci 61(4):821–836
34. Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam
using support vector machines, decision tree, and naive Bayes models. Math Problems Eng 2012:
Article ID 974638
35. Xu C, Dai F, Xu X, Lee YH (2012) GIS-based support vector machine modeling of earthquake-triggered
landslide susceptibility in the Jianjiang River Watershed, China. Geomorphology 145:70–80
36. Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vec-
tor machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci
51:350–365
37. Kisi O, Cimen M (2011) A wavelet-support vector machine conjunction model for monthly streamflow
forecasting. J Hydrol 399(1):132–140
38. Yoon H, Jun SC, Hyun Y, Bae GO, Lee KK (2011) A comparative study of artificial neural networks and
support vector machines for predicting groundwater levels in a coastal aquifer. J Hydrol 396(1):128–
138
39. Gomez FR, Rajapakse AD, Annakkage UD, Fernando IT (2011) Support vector machine-based algo-
rithm for post-fault transient stability status prediction using synchronized measurements. IEEE Trans
Power Syst 26(3):1474–1483
40. Niu D, Wang Y, Wu DD (2010) Power load forecasting using support vector machine and ant colony
optimization. Expert Syst Appl 37(3):2531–2539
41. Kavaklioglu K (2011) Modeling and prediction of Turkey's electricity consumption using support vector
regression. Appl Energy 88(1):368–375
42. Zhou J, Shi J, Li G (2011) Fine tuning support vector machines for short-term wind speed forecasting.
Energy Convers Manag 52(4):1990–1998
43. Kara Y, Acar Boyacioglu M, Baykan ÖK (2011) Predicting direction of stock price index movement
using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange.
Expert Syst Appl 38(5):5311–5319
44. Yeh CY, Huang CW, Lee SJ (2011) A multiple-kernel support vector regression approach for stock
market price forecasting. Expert Syst Appl 38(3):2177–2186
45. Huang CF (2012) A hybrid stock selection model using genetic algorithms and support vector regres-
sion. Appl Soft Comput 12(2):807–818
46. Yang XS, Deb S, Fong S (2011) Accelerated particle swarm optimization and support vector machine
for business optimization and applications. In: Fong S (ed) Networked digital technologies. Springer,
Berlin, pp 53–66
47. Rumpf T, Mahlein AK, Steiner U, Oerke EC, Dehne HW, Plümer L (2010) Early detection and clas-
sification of plant diseases with support vector machines based on hyperspectral reflectance. Comput
Electron Agric 74(1):91–99
48. Konar P, Chattopadhyay P (2011) Bearing fault detection of induction motor using wavelet and support
vector machines (SVMs). Appl Soft Comput 11(6):4203–4211
49. Horng SJ, Su MY, Chen YH, Kao TW, Chen RJ, Lai JL, Perkasa CD (2011) A novel intrusion detection
system based on hierarchical clustering and support vector machines. Expert Syst Appl 38(1):306–313
50. Wong PK, Xu Q, Vong CM, Wong HC (2012) Rate-dependent hysteresis modeling and control of a
piezostage using online support vector machine and relevance vector machine. IEEE Trans Ind Electron
59(4):1988–2001
51. Cui J, Wang Y (2011) A novel approach of analog circuit fault diagnosis using support vector machines
classifier. Measurement 44(1):281–289
52. Tian Y, Shi Y, Liu X (2012) Recent advances on support vector machines research. Technol Econ Dev
Econ 18(1):5–33
53. Bi J, Zhang T (2004) Support vector classification with input data uncertainty. Adv Neural Inf Process
Syst 17:161–168
54. Trafalis TB, Gilbert RC (2006) Robust classification and regression using support vector machines.
Eur J Oper Res 173(3):893–909
55. Trafalis TB, Gilbert RC (2007) Robust support vector machines for classification and computational
issues. Optim Methods Softw 22(1):187–198
56. Trafalis TB, Alwazzi SA (2010) Support vector machine classification with noisy data: a second order
cone programming approach. Int J Gen Syst 39(7):757–781
57. Pant R, Trafalis TB, Barker K (2011) Support vector machine classification of uncertain and imbal-
anced data using robust optimization. In: Proceedings of the 15th WSEAS international conference on
computers, World Scientific and Engineering Academy and Society (WSEAS), pp 369–374
58. Xanthopoulos P, Pardalos PM, Trafalis TB (2012) Robust data mining. Springer, New York
59. Ghaoui LE, Lanckriet GR, Natsoulis G (2003) Robust classification with interval data. Technical report
UCB/CSD-03-1279, Computer Science Division, University of California, Berkeley
60. Fan N, Sadeghi E, Pardalos PM (2014) Robust support vector machines with polyhedral uncertainty of
the input data. In: Pardalos PM, Resende MGC, Vogiatzis C, Walteros JL (eds) Learning and intelligent
optimization. Springer, Berlin, pp 291–305
61. Bhattacharyya C, Grate LR, Jordan MI, El Ghaoui L, Mian IS (2004) Robust sparse hyperplane
classifiers: application to uncertain molecular profiling data. J Comput Biol 11(6):1073–1089
62. Shivaswamy PK, Bhattacharyya C, Smola AJ (2006) Second order cone programming approaches for
handling missing and uncertain data. J Mach Learn Res 7:1283–1314
63. Ben-Tal A, Bhadra S, Bhattacharyya C, Nath JS (2011) Chance constrained uncertain classification
via robust optimization. Math Program 127(1):145–173
64. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
65. Marshall AW, Olkin I (1960) Multivariate Chebyshev inequalities. Ann Math Stat 31(4):1001–1014
66. Bertsimas D, Popescu I (2005) Optimal inequalities in probability theory: a convex optimization
approach. Siam J Optim 15(3):780–804
67. Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, Prince-
ton
68. Ben-Tal A, Nemirovski A (2008) Selected topics in robust convex optimization. Math Program
112(1):125–158
69. Nemirovski A, Shapiro A (2006) Convex approximations of chance constrained programs. Siam J
Optim 17(4):969–996