A Survey of Support Vector Machines with Uncertainties
DOI 10.1007/s40745-014-0022-8
Abstract Support Vector Machines (SVM) are among the best known supervised learning algorithms. In recent years SVM have found wide application in many fields and have inspired many algorithmic and modeling variations. Basic SVM models deal with the situation where the exact values of the data points are known. This paper presents a survey of SVM when the data points are uncertain. When a direct model cannot guarantee a generally good performance on the uncertainty set, robust optimization is introduced to deal with the worst case scenario and still guarantee an optimal performance. The data uncertainty could be an additive noise bounded by a norm, in which case efficient linear programming models are available under certain conditions; it could be intervals with support and extremum values; or it could be the more general case of polyhedral uncertainty, for which formulations are also presented. Another branch of the uncertainty analysis is chance constrained SVM, which is used to ensure a small probability of misclassification for the uncertain data. The multivariate Chebyshev inequality and Bernstein bounding schemes have been used to transform the chance constraints through robust optimization. The Chebyshev based model employs moment information of the uncertain training points. The Bernstein bounds can be less conservative than the Chebyshev bounds since they employ both support and moment information, but they also make the strong assumption that all the elements in the data set are independent.
1 Introduction
As one of the best known supervised learning algorithms, Support Vector Machines (SVM) are gaining more and more attention. They were proposed by Vapnik [1,2] as maximum-margin classifiers, and tutorials on SVM can be found in [3–6]. In recent years, SVM have been applied to many fields and have acquired many algorithmic and modeling variations. In the biomedical field, SVM have been used to identify physical diseases [7–10] as well as psychological diseases [11]. Electroencephalography (EEG) signals can also be analyzed using SVM [12–14]. Besides these, SVM have also been applied to protein prediction [15–19] and medical images [20–22]. Computer vision includes many applications of SVM, such as person identification [23], hand gesture detection [24], face recognition [25] and background subtraction [26]. In geosciences, SVM have been applied to remote sensing analysis [27–29], land cover change [30–32], landslide susceptibility [33–36] and hydrology [37,38]. In power systems, SVM have been used for transient status prediction [39], power load forecasting [40], electricity consumption prediction [41] and wind power forecasting [42]. Stock price forecasting [43–45] and business administration [46] also use SVM. Other applications of SVM include agricultural plant disease detection [47], condition monitoring [48], network security [49] and electronics [50,51]. When basic SVM models cannot satisfy the application requirements, different modeling variations of SVM can be found in [52].
In this paper, a survey of SVM with uncertainties is presented. Basic SVM models deal with the situation where the exact values of the data points are known. When the data points are uncertain, different models have been proposed to formulate SVM with uncertainties. Bi and Zhang [53] assumed the data points are subject to an additive noise bounded by a norm and proposed a very direct model. However, this model cannot guarantee a generally good performance on the uncertainty set. To guarantee an optimal performance while the worst case scenario constraints are still satisfied, robust optimization is utilized. Trafalis et al. [54–58] proposed a robust optimization model when the perturbation of the uncertain data is bounded by a norm. Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals. Fan et al. [60] studied the more general case of polyhedral uncertainties. Robust optimization is also used when the constraint is a chance constraint, which ensures a small probability of misclassification for the uncertain data. The chance constraints are transformed by different bounding inequalities, for example the multivariate Chebyshev inequality [61,62] and Bernstein bounding schemes [63].
The organization of this paper is as follows: Sect. 2 gives an introduction to the basic SVM models. Section 3 presents SVM with uncertainties, covering both robust SVM with bounded uncertainty and chance constrained SVM through robust optimization. Section 4 presents concluding remarks and suggestions for further research.
2 Support Vector Machines

Support Vector Machines construct maximum-margin classifiers, such that small perturbations in the data are least likely to cause misclassification. Empirically, SVM work very well and are among the best known supervised learning algorithms, originally proposed by Vapnik [1,2].
Suppose we have a two-class dataset of m data points $\{x_i, y_i\}_{i=1}^m$ with n-dimensional features $x_i \in \mathbb{R}^n$ and respective class labels $y_i \in \{+1, -1\}$. For linearly separable data, the hard margin SVM finds the maximum-margin hyperplane by solving

$$\min_{w,b} \quad \frac{1}{2}\|w\|_2^2 \tag{1a}$$
$$\text{s.t.} \quad y_i(w^\top x_i + b) \ge 1, \quad i = 1, \ldots, m \tag{1b}$$

Introducing Lagrange multipliers $\alpha_i \ge 0$ for the constraints gives the saddle-point problem

$$\min_{w,b}\, \max_{\alpha \ge 0} \; L(w, b, \alpha) = \frac{1}{2}\|w\|_2^2 - \sum_{i=1}^m \alpha_i \left[ y_i (w^\top x_i + b) - 1 \right] \tag{2}$$

Setting the partial derivatives of $L$ with respect to $w$ and $b$ to zero yields

$$\frac{\partial L(w, b, \alpha)}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^m \alpha_i y_i x_i \tag{3a}$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^m \alpha_i y_i = 0 \tag{3b}$$

Substituting these conditions back into $L$ eliminates $w$ and $b$ and leaves the dual function

$$L(\alpha) = \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \tag{4}$$
Then the dual of the original SVM problem is also a convex quadratic problem:

$$\max_{\alpha} \quad \sum_{i=1}^m \alpha_i - \frac{1}{2} \sum_{i=1}^m \sum_{j=1}^m \alpha_i \alpha_j y_i y_j \, x_i^\top x_j \tag{5a}$$
$$\text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad \alpha_i \ge 0, \; i = 1, \ldots, m \tag{5b}$$
Since only the αi corresponding to support vectors can be nonzero, this dramatically
simplifies solving the dual problem.
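To make the dual concrete, the following is a minimal sketch (not from the paper) that solves (5) with the cvxpy modeling library on a small synthetic dataset and recovers w and b from the support vectors via (3a); the toy data and the numerical tolerance are illustrative choices.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 0.5, (20, 2)), rng.normal(2, 0.5, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
m = len(y)

alpha = cp.Variable(m)
# Dual objective (5a): sum_i alpha_i - 1/2 || sum_i alpha_i y_i x_i ||^2
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, y @ alpha == 0]          # (5b)
cp.Problem(objective, constraints).solve()

a = alpha.value
w = X.T @ (a * y)                                   # w = sum_i alpha_i y_i x_i, Eq. (3a)
sv = np.where(a > 1e-6)[0]                          # support vectors: alpha_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))               # from y_i (w^T x_i + b) = 1 on the SVs
print("support vectors:", sv)
print("w =", w, "b =", b)
```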
The above holds when the two classes are linearly separable. When they are not, mislabeled samples need to be allowed, which is where soft margin SVM arises. Soft margin SVM introduces non-negative slack variables ξ_i that measure the distance of within-margin or misclassified data x_i to the hyperplane with the correct label, with ξ_i = max{0, 1 − y_i(w^⊤ x_i + b)}. When 0 < ξ_i < 1, the data point is within the margin but correctly classified; when ξ_i > 1, the data point is misclassified. The objective function then gains a term penalizing these slack variables, and the optimization is a trade-off between a large margin and a small error penalty. The soft margin SVM formulation with L1 regularization [64] is:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{6a}$$
$$\text{s.t.} \quad y_i(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i = 1, \ldots, m \tag{6b}$$
The corresponding saddle-point problem is

$$\min_{w,b,\xi}\, \max_{\alpha, \beta \ge 0} \; L(w, b, \xi, \alpha, \beta) = \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i - \sum_{i=1}^m \alpha_i \left[ y_i (w^\top x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^m \beta_i \xi_i \tag{7}$$

The additional stationarity condition with respect to ξ_i is

$$\frac{\partial L(w, b, \xi, \alpha, \beta)}{\partial \xi_i} = 0 \;\Rightarrow\; C - \alpha_i - \beta_i = 0 \tag{8}$$
Together with conditions (3a) and (3b), the dual of the soft margin SVM is

$$\max_{\alpha} \quad \sum_{i=1}^m \alpha_i - \frac{1}{2}\sum_{i=1}^m\sum_{j=1}^m \alpha_i\alpha_j y_i y_j \, x_i^\top x_j \tag{9a}$$
$$\text{s.t.} \quad \sum_{i=1}^m \alpha_i y_i = 0, \quad 0 \le \alpha_i \le C, \; i=1,\ldots,m \tag{9b}$$
The only difference is that the dual variables αi now have upper bounds C. The
advantage of the L 1 regularization (linear penalty function) is that in the dual problem,
the slack variables ξi vanish and the constant C is just an additional constraint on the
Lagrange multipliers αi . Because of this nice property and its huge impact in practice,
L 1 is the most widely used regularization term.
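As an illustration of formulation (6), the following hedged sketch solves the soft margin primal directly in cvxpy; the value of C and the toy data are arbitrary choices, not values from the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 1.0, (30, 2)), rng.normal(1, 1.0, (30, 2))])
y = np.hstack([-np.ones(30), np.ones(30)])
m, n = X.shape
C = 1.0

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)                 # slack variables xi_i >= 0
margin = cp.multiply(y, X @ w + b)               # y_i (w^T x_i + b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)),
                  [margin >= 1 - xi])            # constraint (6b)
prob.solve()
print("w =", w.value, "b =", b.value, "total slack =", float(xi.value.sum()))
```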
Besides the linear kernel k(x_i, x_j) = x_i^⊤ x_j, nonlinear kernels can also be introduced into SVM to create nonlinear classifiers. The maximum-margin hyperplane is then constructed in a high-dimensional transformed feature space; because the transformation may be nonlinear, the resulting classifier can be nonlinear in the original feature space. A widely used nonlinear kernel is the Gaussian radial basis function k(x_i, x_j) = exp(−γ‖x_i − x_j‖_2^2), which corresponds to a Hilbert space of infinite dimension.
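For completeness, a small sketch of computing the Gaussian RBF kernel matrix with NumPy is shown below; the value of γ and the sample points are arbitrary illustrative choices.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    # k(x_i, x_j) = exp(-gamma * ||x_i - x_j||_2^2), computed via
    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))   # clip tiny negatives from round-off

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(rbf_kernel(X))
```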
3 SVM with Uncertainty

Consider the soft margin SVM written in terms of training data points X_i:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{10a}$$
$$\text{s.t.} \quad y_i(w^\top X_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{10b}$$
When the training data points X i are random vectors, the model needs to be modified
to consider the uncertainties. The simplest model is to just employ the means of the
uncertain data points, μi = E[X i ]. The formulation would become:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{11a}$$
$$\text{s.t.} \quad y_i(w^\top \mu_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{11b}$$
The above model is equivalent to a soft margin SVM on data points fixed at the means and therefore does not take the uncertainties of the data into account. Bi and Zhang [53] assumed the data points are subject to an additive noise, X_i = x̄_i + Δx_i, where the noise is bounded by ‖Δx_i‖_2 ≤ δ_i. They proposed the model:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{12a}$$
$$\text{s.t.} \quad y_i(w^\top(\bar{x}_i + \Delta x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{12b}$$
$$\qquad \|\Delta x_i\|_2 \le \delta_i, \quad i=1,\ldots,m \tag{12c}$$
In this model, the uncertain data point X_i is free to lie anywhere in the ball centered at x̄_i with radius δ_i, i.e., X_i could move in any direction within the uncertainty set. A drawback of this model is that it cannot guarantee a generally good performance on the uncertainty set, since the direction in which the data points are perturbed is not constrained. It is quite possible that a data point whose perturbation moves it far away from the separating hyperplane is selected as a support vector; considering the original uncertainty set of this data point, most of it would then lie within the margin and the constraint would no longer be satisfied. To guarantee a better performance under most conditions, or with higher probability, robust optimization is introduced to solve the SVM with uncertainty.
To solve the robust SVM, the following subproblem needs to be solved first:
min yi w σ i (14a)
σi
s.t. σ i p ≤ ηi (14b)
Hölder's inequality says that for a pair of dual norms L_p and L_q with p, q ∈ [1, ∞] and 1/p + 1/q = 1, the following inequality holds:

$$|w^\top \sigma_i| \le \|w\|_q \, \|\sigma_i\|_p \tag{15}$$

Therefore

$$\min_{\|\sigma_i\|_p \le \eta_i} y_i w^\top \sigma_i \ge -\eta_i \|w\|_q \tag{16}$$

A lower bound of y_i w^⊤ σ_i is −η_i‖w‖_q; substituting it into the original problem gives the following formulation:
$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{17a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_q \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{17b}$$
When the perturbation is bounded in the L_2 norm (p = q = 2), the formulation becomes:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{18a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_2 \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{18b}$$
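Formulation (18) is a second order cone program and can be prototyped directly in a modeling language. The following sketch assumes, purely for brevity, a common perturbation radius η for all points; it is an illustration rather than the authors' implementation.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(2)
Xbar = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = Xbar.shape
C = 1.0
eta = 0.3            # common L2 perturbation radius (a simplifying assumption)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (18b): y_i (w^T xbar_i + b) - eta ||w||_2 >= 1 - xi_i
cons = [cp.multiply(y, Xbar @ w + b) - eta * cp.norm(w, 2) >= 1 - xi]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```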
When the perturbation is bounded in the L_1 norm, the dual norm is L_∞; replacing the quadratic objective by the L_∞ norm of w yields

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_\infty + C\sum_{i=1}^m \xi_i \tag{19a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_\infty \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{19b}$$

which can be written as a linear program by introducing an auxiliary variable α:

$$\min_{w,b,\xi_i,\alpha} \quad \frac{1}{2}\alpha + C\sum_{i=1}^m \xi_i \tag{20a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\alpha \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{20b}$$
$$\qquad \alpha \ge -w_j, \quad \alpha \ge w_j, \quad j=1,\ldots,n \tag{20c}$$
When the L_∞ norm is chosen to express the perturbation, the formulation becomes:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_1 + C\sum_{i=1}^m \xi_i \tag{21a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\|w\|_1 \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{21b}$$

which is equivalent to the linear program

$$\min_{w,b,\xi_i,\alpha} \quad \frac{1}{2}\sum_{j=1}^n \alpha_j + C\sum_{i=1}^m \xi_i \tag{22a}$$
$$\text{s.t.} \quad y_i(w^\top \bar{x}_i + b) - \eta_i\sum_{j=1}^n \alpha_j \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{22b}$$
$$\qquad \alpha_j \ge -w_j, \quad \alpha_j \ge w_j, \quad j=1,\ldots,n \tag{22c}$$
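A sketch of the linear program (22) is given below; again the toy data, C, and η are illustrative, and any LP-capable solver available to cvxpy can be used.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(3)
Xbar = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = Xbar.shape
C, eta = 1.0, 0.2

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
alpha = cp.Variable(n)                       # alpha_j >= |w_j|, constraint (22c)
cons = [alpha >= w, alpha >= -w,
        cp.multiply(y, Xbar @ w + b) - eta * cp.sum(alpha) >= 1 - xi]   # (22b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum(alpha) + C * cp.sum(xi)), cons)
prob.solve()                                 # every constraint is linear, so this is an LP
print("w =", w.value, "objective =", prob.value)
```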
Ghaoui et al. [59] derived a robust model when the uncertainty is expressed as intervals (also known as support or extremum values). Suppose the extremum values of the uncertain data points are known, l_ij ≤ X_ij ≤ u_ij. Then each training data point X_i lies in a hyper-rectangle with center c_i, where c_ij = (l_ij + u_ij)/2, and half-widths s_ij = (u_ij − l_ij)/2. Requiring the classification constraint to hold for every point of the hyper-rectangle gives the worst-case constraint

$$y_i(w^\top c_i + b) \ge 1 - \xi_i + \sum_{j=1}^n s_{ij} |w_j| \tag{23}$$

Then the SVM model with support information can be written, with S_i = diag(s_{i1}, …, s_{in}), as:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{24a}$$
$$\text{s.t.} \quad y_i(w^\top c_i + b) \ge 1 - \xi_i + \|S_i w\|_1, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{24b}$$
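The interval-uncertainty model (24) only needs the box centers and half-widths. A minimal cvxpy sketch follows; the box half-widths and toy data are illustrative placeholders.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(4)
centers = np.vstack([rng.normal(-2, 0.7, (25, 2)), rng.normal(2, 0.7, (25, 2))])
halfwidth = 0.2 * np.ones_like(centers)       # s_ij = (u_ij - l_ij) / 2
y = np.hstack([-np.ones(25), np.ones(25)])
m, n = centers.shape
C = 1.0

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (24b): y_i (w^T c_i + b) >= 1 - xi_i + sum_j s_ij |w_j|
cons = [cp.multiply(y, centers @ w + b) >= 1 - xi + halfwidth @ cp.abs(w)]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```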
Fan et al. [60] studied the more general case of polyhedral uncertainty, where each uncertain point lies in a polyhedron {x_i : D_i x_i ≤ d_i} with D_i ∈ ℝ^{q×n} and d_i ∈ ℝ^q. The robust formulation is

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{25a}$$
$$\text{s.t.} \quad \min_{\{x_i:\, D_i x_i \le d_i\}} y_i(w^\top x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{25b}$$

To solve the inner minimization, consider the equivalent linear program

$$\max \quad -y_i w^\top x_i \tag{27a}$$
$$\text{s.t.} \quad D_i x_i \le d_i \tag{27b}$$

whose dual is min_{z_i ≥ 0} d_i^⊤ z_i subject to D_i^⊤ z_i + y_i w = 0.
Strong duality guarantees that the objective values of the dual and primal are equal. Therefore, the robust SVM with polyhedral uncertainty is equivalent to:

$$\min_{w,b,\xi_i,z} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{29a}$$
$$\text{s.t.} \quad d_i^\top z_i - y_i b \le -1 + \xi_i, \quad \xi_i \ge 0 \tag{29b}$$
$$\qquad D_i^\top z_i + y_i w = 0, \quad z_i = (z_{i1}, \ldots, z_{iq})^\top \tag{29c}$$
$$\qquad z_{ij} \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,q \tag{29d}$$
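A sketch of formulation (29) is shown below; for illustration each polyhedron is taken to be a box built as in (31), which is an assumption of this example rather than a restriction of the model.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(5)
X0 = np.vstack([rng.normal(-2, 0.7, (15, 2)), rng.normal(2, 0.7, (15, 2))])
y = np.hstack([-np.ones(15), np.ones(15)])
m, n = X0.shape
delta, C = 0.2, 1.0
q = 2 * n                                          # a box gives q = 2n inequalities per point

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
Z = cp.Variable((m, q), nonneg=True)               # one dual vector z_i per point, (29d)
cons = []
for i in range(m):
    D = np.vstack([np.eye(n), -np.eye(n)])         # D_i
    d = np.hstack([X0[i] + delta, -X0[i] + delta]) # d_i, as in Eq. (31)
    cons += [d @ Z[i] - y[i] * b <= -1 + xi[i],    # (29b)
             D.T @ Z[i] + y[i] * w == 0]           # (29c)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```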
The authors also proved that for the hard margin SVM (i.e., when there are no ξ_i), the dual of the above formulation is:

$$\min_{\lambda,\mu} \quad \sum_{i=1}^m \lambda_i - \frac{1}{2}\sum_{k=1}^n \Big(\sum_{i=1}^m y_i \mu_{ik}\Big)^2 \tag{30a}$$
$$\text{s.t.} \quad \lambda_i d_{ij} + \sum_{k=1}^n \mu_{ik} D_{i,jk} = 0, \quad i=1,\ldots,m, \; j=1,\ldots,q \tag{30b}$$
$$\qquad \sum_{i=1}^m \lambda_i y_i = 0 \tag{30c}$$
$$\qquad \lambda_i \ge 0, \quad i=1,\ldots,m \tag{30d}$$
When the polyhedron is a hyper-rectangle centered at a nominal point x_i^0 with half-widths δ_i, one can take

$$D_i = \begin{bmatrix} I \\ -I \end{bmatrix}, \qquad d_i = \begin{bmatrix} x_i^0 + \delta_i \\ -x_i^0 + \delta_i \end{bmatrix} \tag{31}$$

so that {x_i : x_i ∈ [x_i^0 − δ_i, x_i^0 + δ_i]} and {x_i : D_i x_i ≤ d_i} are equivalent. The authors of [60] also proposed probabilistic bounds on constraint violation in this case.
Another way to model uncertainty is chance constrained SVM, which requires each classification constraint to hold with high probability:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{32a}$$
$$\text{s.t.} \quad \text{Prob}\{y_i(w^\top X_i + b) \ge 1 - \xi_i\} \ge 1 - \varepsilon, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{32b}$$
where 0 < ε ≤ 1 is a parameter given by the user and close to 0. This model ensures an upper bound on the misclassification probability, but chance constraints are typically non-convex, so the problem is very hard to solve.
The work so far deals with the chance constraints by transforming them through different bounding inequalities. When the mean and covariance matrix are known, the multivariate Chebyshev bound via robust optimization can be used to express the chance constraints above [61,62].
Markov's inequality states that if X is a nonnegative random variable and a > 0, then

$$\text{Prob}\{X \ge a\} \le \frac{E[X]}{a} \tag{33}$$

Consider the random variable (X − E[X])². Since Var(X) = E[(X − E[X])²], then

$$\text{Prob}\{(X - E[X])^2 \ge a^2\} \le \frac{\text{Var}(X)}{a^2} \tag{34}$$

that is,

$$\text{Prob}\{|X - E[X]| \ge a\} \le \frac{\text{Var}(X)}{a^2} \tag{35}$$
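As a quick numerical sanity check (not from the paper), the following snippet draws samples from an arbitrary distribution and verifies empirically that the tail probability in (35) never exceeds Var(X)/a².

```python
import numpy as np

rng = np.random.default_rng(6)
X = rng.exponential(scale=2.0, size=200_000)      # any distribution works for (35)
mu, var = X.mean(), X.var()
for a in [1.0, 2.0, 4.0]:
    empirical = np.mean(np.abs(X - mu) >= a)
    print(f"a={a}: empirical tail {empirical:.4f} <= Chebyshev bound {var / a**2:.4f}")
```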
Let x ∼ (μ, Σ) denote a random vector x with mean μ and covariance matrix Σ. The multivariate Chebyshev inequality [65,66] states that for an arbitrary closed convex set S, the supremum of the probability that x takes a value in S is

$$\sup_{x \sim (\mu, \Sigma)} \text{Prob}\{x \in S\} = \frac{1}{1 + d^2} \tag{36a}$$
$$d^2 = \inf_{x \in S} (x - \mu)^\top \Sigma^{-1} (x - \mu) \tag{36b}$$

Using this bound, requiring w^⊤ x + b ≥ 0 to hold with probability at least 1 − ε for x ∼ (μ, Σ) is equivalent to the second order cone constraint

$$w^\top \mu + b \ge \kappa_C \|\Sigma^{1/2} w\|_2 \tag{37}$$

where $\kappa_C = \sqrt{(1-\varepsilon)/\varepsilon}$.
Applying the above result to the chance constrained SVM, the Chebyshev based reformulation utilizing the mean μ_i and covariance matrix Σ_i of each uncertain training point X_i can be obtained as the following robust model [61,62]:

$$\min_{w,b,\xi_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{38a}$$
$$\text{s.t.} \quad y_i(w^\top \mu_i + b) \ge 1 - \xi_i + \kappa_C \|\Sigma_i^{1/2} w\|_2, \quad \xi_i \ge 0, \; i=1,\ldots,m \tag{38b}$$
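A minimal cvxpy sketch of the Chebyshev based model (38) follows; the means, covariance factors Σ_i^{1/2}, ε, and C are illustrative placeholders rather than values from the paper.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(7)
mu = np.vstack([rng.normal(-2, 0.7, (20, 2)), rng.normal(2, 0.7, (20, 2))])
y = np.hstack([-np.ones(20), np.ones(20)])
m, n = mu.shape
Sigma_half = [0.2 * np.eye(n) for _ in range(m)]   # Sigma_i^{1/2}; identical here for brevity
eps, C = 0.1, 1.0
kappa_C = np.sqrt((1 - eps) / eps)

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
# Constraint (38b), one second order cone constraint per uncertain point
cons = [y[i] * (mu[i] @ w + b) >= 1 - xi[i] + kappa_C * cp.norm(Sigma_half[i] @ w, 2)
        for i in range(m)]
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```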
Another approach to study SVM with chance constraints is to use Bernstein approximation schemes [67–69]. Ben-Tal et al. [63] employed Bernstein bounding schemes to relax the chance constrained program (CCP) into a convex second order cone program with robust set constraints, which guarantees satisfaction of the chance constraints and can be solved efficiently using interior point solvers.

The Bernstein based relaxation utilizes both the support (bounds, i.e. extremum values of the data points) and moment information (mean and variance). For a random data point X_i = [X_{i1}, …, X_{in}]^⊤ with label y_i, the support information is the bounds on the data points l_{ij} ≤ X_{ij} ≤ u_{ij}, i.e. X_i ∈ R_i = {x_i = [x_{i1}, …, x_{in}]^⊤ ∈ ℝ^n | l_{ij} ≤ x_{ij} ≤ u_{ij}, j = 1, …, n}; the first moment information is the bounds on the means of the data points μ_i^− = [μ_{i1}^−, …, μ_{in}^−]^⊤ ≤ μ_i = E[X_i] ≤ μ_i^+ = [μ_{i1}^+, …, μ_{in}^+]^⊤; and the second moment information is the bounds on the second moments of the data points 0 ≤ E[X_{ij}^2] ≤ σ_{ij}^2.
The Bernstein based relaxation derives convex constraints such that, when these convex constraints are satisfied, the chance constraints are guaranteed to be satisfied. The authors proved that, given the information on the independent random variables X_{ij}, i.e. support l_{ij} ≤ X_{ij} ≤ u_{ij}, bounds on the first moment μ_{ij}^− ≤ μ_{ij} = E[X_{ij}] ≤ μ_{ij}^+, and bounds on the second moment 0 ≤ E[X_{ij}^2] ≤ σ_{ij}^2, the chance constraint in SVM is satisfied if the following convex constraint holds:

$$1 - \xi_i - y_i b + \sum_j \max\left(-y_i \mu_{ij}^- w_j, \, -y_i \mu_{ij}^+ w_j\right) + \kappa_B \|\Lambda_i w\|_2 \le 0 \tag{39}$$

where $\kappa_B = \sqrt{2\log(1/\varepsilon)}$ and Λ_i is the diagonal matrix

$$\Lambda_i = \text{diag}\left(s_{i1}\,\nu(\mu_{i1}^-, \mu_{i1}^+, \sigma_{i1}), \ldots, s_{in}\,\nu(\mu_{in}^-, \mu_{in}^+, \sigma_{in})\right) \tag{40}$$
where s_{ij} = (u_{ij} − l_{ij})/2, and the function ν(μ_{ij}^−, μ_{ij}^+, σ_{ij}) is defined via the normalized variables X̂_{ij} = (X_{ij} − c_{ij})/s_{ij}, where c_{ij} = (l_{ij} + u_{ij})/2. Using the information on X_{ij}, one can easily compute the moment information of X̂_{ij}, denoted by μ̂_{ij}^− ≤ μ̂_{ij} = E[X̂_{ij}] ≤ μ̂_{ij}^+ and 0 ≤ E[X̂_{ij}^2] ≤ σ̂_{ij}^2. They proved that
$$E\left[\exp\{\tilde{t}\hat{X}_{ij}\}\right] \le g_{\hat{\mu}_{ij},\hat{\sigma}_{ij}}(\tilde{t}) = \begin{cases} \dfrac{(1-\hat{\mu}_{ij})^2 \exp\left\{\tilde{t}\,\dfrac{\hat{\mu}_{ij}-\hat{\sigma}_{ij}^2}{1-\hat{\mu}_{ij}}\right\} + (\hat{\sigma}_{ij}^2-\hat{\mu}_{ij}^2)\exp\{\tilde{t}\}}{1-2\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}, & \tilde{t} \ge 0 \\[3ex] \dfrac{(1+\hat{\mu}_{ij})^2 \exp\left\{\tilde{t}\,\dfrac{\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}{1+\hat{\mu}_{ij}}\right\} + (\hat{\sigma}_{ij}^2-\hat{\mu}_{ij}^2)\exp\{-\tilde{t}\}}{1+2\hat{\mu}_{ij}+\hat{\sigma}_{ij}^2}, & \tilde{t} \le 0 \end{cases} \tag{41}$$
They defined $h_{\hat\mu_{ij},\hat\sigma_{ij}}(\tilde{t}) = \log g_{\hat\mu_{ij},\hat\sigma_{ij}}(\tilde{t})$, and the function ν(μ^−, μ^+, σ) is defined as:
$$\nu(\mu^-, \mu^+, \sigma) = \min\left\{ k \ge 0 \;:\; h_{\hat\mu,\hat\sigma}(\tilde{t}) \le \max[\hat\mu^- \tilde{t}, \, \hat\mu^+ \tilde{t}] + \frac{k^2}{2}\tilde{t}^2, \;\; \forall \hat\mu \in [\hat\mu^-, \hat\mu^+], \; \forall \tilde{t} \right\} \tag{42}$$

This value can be calculated numerically. Under the condition that μ_{ij}^− ≤ c_{ij} ≤ μ_{ij}^+, it can be computed analytically as $\nu(\mu^-, \mu^+, \sigma) = \sqrt{1 - (\hat\mu_{\min})^2}$, where μ̂_min = min(−μ̂^−, μ̂^+).
Replacing the chance constraints in SVM by the convex constraint derived above, the problem is transformed into a convex second order cone program:

$$\min_{w,b,\xi_i,z_{ij}} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{43a}$$
$$\text{s.t.} \quad 1 - \xi_i - y_i b + \sum_j z_{ij} + \kappa_B\|\Lambda_i w\|_2 \le 0 \tag{43b}$$
$$\qquad z_{ij} \ge -y_i \mu_{ij}^- w_j, \quad z_{ij} \ge -y_i \mu_{ij}^+ w_j, \quad \xi_i \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,n$$

where the auxiliary variables z_{ij} linearize the max terms in (39).
The convex constraint (39) can equivalently be viewed as a robust constraint:

$$y_i(w^\top x + b) \ge 1 - \xi_i, \quad \forall x \in \bigcup_{\mu_i \in [\mu_i^-,\, \mu_i^+]} \mathcal{E}\big(\mu_i, \kappa_B \Lambda_i\big) \tag{44}$$

Therefore, this constraint defines an uncertainty set ∪_{μ_i ∈ [μ_i^−, μ_i^+]} E(μ_i, κ_B Λ_i) for each uncertain training data point X_i, where E(μ_i, κ_B Λ_i) denotes an ellipsoid centered at μ_i whose shape is given by κ_B Λ_i. If all the points in the uncertainty set satisfy y_i(w^⊤ x + b) ≥ 1 − ξ_i, then the chance constraint is guaranteed to be satisfied. This transforms the CCP into a robust optimization problem over the uncertainty set.
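A sketch of the Bernstein based SOCP (43) is given below, with the max terms of (39) linearized by auxiliary variables; the moment bounds and the diagonal matrices Λ_i are treated as given inputs and are illustrative placeholders here.

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(8)
mu_lo = np.vstack([rng.normal(-2, 0.7, (20, 2)), rng.normal(2, 0.7, (20, 2))]) - 0.1
mu_hi = mu_lo + 0.2                                   # bounds [mu_ij^-, mu_ij^+]
y = np.hstack([-np.ones(20), np.ones(20)])
m, n = mu_lo.shape
Lam = [0.15 * np.eye(n) for _ in range(m)]            # Lambda_i from Eq. (40), assumed given
eps, C = 0.1, 1.0
kappa_B = np.sqrt(2 * np.log(1 / eps))

w, b = cp.Variable(n), cp.Variable()
xi = cp.Variable(m, nonneg=True)
Z = cp.Variable((m, n))        # z_ij >= max(-y_i mu_ij^- w_j, -y_i mu_ij^+ w_j)
cons = []
for i in range(m):
    cons += [Z[i] >= -y[i] * cp.multiply(mu_lo[i], w),
             Z[i] >= -y[i] * cp.multiply(mu_hi[i], w),
             1 - xi[i] - y[i] * b + cp.sum(Z[i])
             + kappa_B * cp.norm(Lam[i] @ w, 2) <= 0]  # constraint (43b)
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi)), cons)
prob.solve()
print("w =", w.value, "b =", b.value)
```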
Since the size of the uncertainty set depends on κ_B, and κ_B = √(2 log(1/ε)), the size of the uncertainty set increases as the upper bound ε on the misclassification probability decreases. When ε is very small, the uncertainty set becomes huge and the constraint becomes too conservative. Since the support information provides the bounding hyper-rectangle R_i in which the true training data point X_i always lies, a less conservative classifier can be obtained by taking the intersection of ∪_{μ_i ∈ [μ_i^−, μ_i^+]} E(μ_i, κ_B Λ_i) and R_i as the new uncertainty set.

The authors proved that when the uncertainty set is this intersection, i.e.,

$$y_i(w^\top x + b) \ge 1 - \xi_i, \quad \forall x \in \left(\bigcup_{\mu_i \in [\mu_i^-,\, \mu_i^+]} \mathcal{E}\big(\mu_i, \kappa_B \Lambda_i\big)\right) \cap R_i \tag{45}$$
the above constraint is satisfied if and only if the following convex constraint holds:

$$\sum_j \max\left(-l_{ij}(y_i w_j + a_{ij}), \, -u_{ij}(y_i w_j + a_{ij})\right) + \sum_j \max\left(\mu_{ij}^- a_{ij}, \, \mu_{ij}^+ a_{ij}\right) + 1 - \xi_i - y_i b + \kappa_B\|\Lambda_i a_i\|_2 \le 0 \tag{46}$$
Replacing the chance constraints in SVM by the robust but less conservative convex constraint above, the problem is transformed into the following SOCP:

$$\min_{w,b,\xi_i,z_{ij},\tilde{z}_{ij},a_i} \quad \frac{1}{2}\|w\|_2^2 + C\sum_{i=1}^m \xi_i \tag{47a}$$
$$\text{s.t.} \quad 1 - \xi_i - y_i b + \sum_j \tilde{z}_{ij} + \sum_j z_{ij} + \kappa_B\|\Lambda_i a_i\|_2 \le 0 \tag{47b}$$
$$\qquad \tilde{z}_{ij} \ge -l_{ij}(y_i w_j + a_{ij}), \quad \tilde{z}_{ij} \ge -u_{ij}(y_i w_j + a_{ij})$$
$$\qquad z_{ij} \ge \mu_{ij}^- a_{ij}, \quad z_{ij} \ge \mu_{ij}^+ a_{ij}, \quad \xi_i \ge 0, \quad i=1,\ldots,m, \; j=1,\ldots,n$$

where the auxiliary variables z̃_{ij} and z_{ij} linearize the max terms in (46).
The Bernstein based formulations (43) and (47) are robust to moment estimation errors in addition to the uncertainty in the data, since they use the bounds on the means [μ_{ij}^−, μ_{ij}^+] and the bounds on the second moments σ_{ij}^2 instead of the exact values of the moments, which are often unknown.
Comparing the two approaches for chance constrained SVM, both are robust to uncertainties in the data and make no assumptions about the underlying probability distribution. The Chebyshev based scheme employs only moment information of the uncertain training points, while the Bernstein bounds employ both support and moment information and can therefore be less conservative than the Chebyshev bounds. The resulting classifier from the Bernstein approach achieves larger classification margins and therefore better generalization ability according to the structural risk minimization principle of Vapnik [1]. A drawback of the Bernstein based formulation is that it assumes the elements X_{ij} are mutually independent, while the Chebyshev based formulation allows a full covariance matrix Σ_i for each uncertain training point X_i.
4 Concluding Remarks
This paper presented a survey on SVM with uncertainties. When a direct model cannot guarantee a generally good performance on the uncertainty set, robust optimization is utilized to obtain an optimal performance under the worst case scenario. The perturbation of the uncertain data can be bounded by a norm, or expressed as intervals or polyhedrons. When the constraint is a chance constraint, different bounding schemes such as the multivariate Chebyshev inequality and Bernstein bounding schemes are used to ensure a small probability of misclassification for the uncertain data.

The models in the literature generally address the linear SVM, while a large part of the power of SVM lies in the representational power of nonlinear kernels in SVM models.
References
21. Bauer S, Nolte LP, Reyes M (2011) Fully automatic segmentation of brain tumor images using support
vector machine classification in combination with hierarchical conditional random field regularization.
In: Fichtinger G, Martel A, Peters T (eds) Proceedings of Medical Image Computing and Computer-
Assisted Intervention-MICCAI 2011. Springer, Berlin, pp 354–361
22. Yao J, Dwyer A, Summers RM, Mollura DJ (2011) Computer-aided diagnosis of pulmonary infections
using texture analysis and support vector machine classification. Acad Radiol 18(3):306–314
23. Prosser B, Zheng WS, Gong S, Xiang T, Mary Q (2010) Person re-identification by support vector
ranking. BMVC 1:5
24. Dardas NH, Georganas ND (2011) Real-time hand gesture detection and recognition using bag-of-
features and support vector machine techniques. IEEE Trans Instrum Meas 60(11):3592–3607
25. Wei J, Jian-qi Z, Xiang Z (2011) Face recognition method based on support vector machine and particle
swarm optimization. Expert Syst Appl 38(4):4390–4393
26. Han B, Davis LS (2012) Density-based multifeature background subtraction with support vector
machine. IEEE Trans Pattern Anal Mach Intell 34(5):1017–1023
27. Waske B, van der Linden S, Benediktsson JA, Rabe A, Hostert P (2010) Sensitivity of support vector
machines to random feature selection in classification of hyperspectral data. IEEE Trans Geosci Remote
Sens 48(7):2880–2889
28. Mountrakis G, Im J, Ogole C (2011) Support vector machines in remote sensing: a review. ISPRS J
Photogramm Remote Sens 66(3):247–259
29. Li CH, Kuo BC, Lin CT, Huang CS (2012) A spatial-contextual support vector machine for remotely
sensed image classification. IEEE Trans Geosci Remote Sens 50(3):784–799
30. Otukei J, Blaschke T (2010) Land cover change assessment using decision trees, support vector
machines and maximum likelihood classification algorithms. Int J Appl Earth Obs Geoinf 12:S27–S31
31. Shao Y, Lunetta RS (2012) Comparison of support vector machine, neural network, and cart algorithms
for the land-cover classification using limited training data points. ISPRS J Photogramm Remote Sens
70:78–87
32. Volpi M, Tuia D, Bovolo F, Kanevski M, Bruzzone L (2013) Supervised change detection in VHR
images using contextual information and support vector machines. Int J Appl Earth Obs Geoinf 20:77–
85
33. Yilmaz I (2010) Comparison of landslide susceptibility mapping methodologies for Koyulhisar, Turkey:
conditional probability, logistic regression, artificial neural networks, and support vector machine.
Environ Earth Sci 61(4):821–836
34. Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in Vietnam
using support vector machines, decision tree, and naive Bayes models. Math Problems Eng 2012:
Article ID 974638
35. Xu C, Dai F, Xu X, Lee YH (2012) GIS-based support vector machine modeling of earthquake-triggered
landslide susceptibility in the Jianjiang River Watershed, China. Geomorphology 145:70–80
36. Pradhan B (2013) A comparative study on the predictive ability of the decision tree, support vec-
tor machine and neuro-fuzzy models in landslide susceptibility mapping using GIS. Comput Geosci
51:350–365
37. Kisi O, Cimen M (2011) A wavelet-support vector machine conjunction model for monthly streamflow
forecasting. J Hydrol 399(1):132–140
38. Yoon H, Jun SC, Hyun Y, Bae GO, Lee KK (2011) A comparative study of artificial neural networks and
support vector machines for predicting groundwater levels in a coastal aquifer. J Hydrol 396(1):128–
138
39. Gomez FR, Rajapakse AD, Annakkage UD, Fernando IT (2011) Support vector machine-based algo-
rithm for post-fault transient stability status prediction using synchronized measurements. IEEE Trans
Power Syst 26(3):1474–1483
40. Niu D, Wang Y, Wu DD (2010) Power load forecasting using support vector machine and ant colony
optimization. Expert Syst Appl 37(3):2531–2539
41. Kavaklioglu K (2011) Modeling and prediction of Turkey's electricity consumption using support vector
regression. Appl Energy 88(1):368–375
42. Zhou J, Shi J, Li G (2011) Fine tuning support vector machines for short-term wind speed forecasting.
Energy Convers Manag 52(4):1990–1998
43. Kara Y, Acar Boyacioglu M, Baykan ÖK (2011) Predicting direction of stock price index movement
using artificial neural networks and support vector machines: the sample of the Istanbul Stock Exchange.
Expert Syst Appl 38(5):5311–5319
44. Yeh CY, Huang CW, Lee SJ (2011) A multiple-kernel support vector regression approach for stock
market price forecasting. Expert Syst Appl 38(3):2177–2186
45. Huang CF (2012) A hybrid stock selection model using genetic algorithms and support vector regres-
sion. Appl Soft Comput 12(2):807–818
46. Yang XS, Deb S, Fong S (2011) Accelerated particle swarm optimization and support vector machine
for business optimization and applications. In: Fong S (ed) Networked digital technologies. Springer,
Berlin, pp 53–66
47. Rumpf T, Mahlein AK, Steiner U, Oerke EC, Dehne HW, Plümer L (2010) Early detection and clas-
sification of plant diseases with support vector machines based on hyperspectral reflectance. Comput
Electron Agric 74(1):91–99
48. Konar P, Chattopadhyay P (2011) Bearing fault detection of induction motor using wavelet and support
vector machines (SVMs). Appl Soft Comput 11(6):4203–4211
49. Horng SJ, Su MY, Chen YH, Kao TW, Chen RJ, Lai JL, Perkasa CD (2011) A novel intrusion detection
system based on hierarchical clustering and support vector machines. Expert Syst Appl 38(1):306–313
50. Wong PK, Xu Q, Vong CM, Wong HC (2012) Rate-dependent hysteresis modeling and control of a
piezostage using online support vector machine and relevance vector machine. IEEE Trans Ind Electron
59(4):1988–2001
51. Cui J, Wang Y (2011) A novel approach of analog circuit fault diagnosis using support vector machines
classifier. Measurement 44(1):281–289
52. Tian Y, Shi Y, Liu X (2012) Recent advances on support vector machines research. Technol Econ Dev
Econ 18(1):5–33
53. Bi J, Zhang T (2004) Support vector classification with input data uncertainty. Adv Neural Inf Process
Syst 17:161–168
54. Trafalis TB, Gilbert RC (2006) Robust classification and regression using support vector machines.
Eur J Oper Res 173(3):893–909
55. Trafalis TB, Gilbert RC (2007) Robust support vector machines for classification and computational
issues. Optim Methods Softw 22(1):187–198
56. Trafalis TB, Alwazzi SA (2010) Support vector machine classification with noisy data: a second order
cone programming approach. Int J Gen Syst 39(7):757–781
57. Pant R, Trafalis TB, Barker K (2011) Support vector machine classification of uncertain and imbal-
anced data using robust optimization. In: Proceedings of the 15th WSEAS international conference on
computers, World Scientific and Engineering Academy and Society (WSEAS), pp 369–374
58. Xanthopoulos P, Pardalos PM, Trafalis TB (2012) Robust data mining. Springer, New York
59. Ghaoui LE, Lanckriet GR, Natsoulis G (2003) Robust classification with interval data. Technical report
UCB/CSD-03-1279, Computer Science Division, University of California, Berkeley
60. Fan N, Sadeghi E, Pardalos PM (2014) Robust support vector machines with polyhedral uncertainty of
the input data. In: Pardalos PM, Resende MGC, Vogiatzis C, Walteros JL (eds) Learning and intelligent
optimization. Springer, Berlin, pp 291–305
61. Bhattacharyya C, Grate LR, Jordan MI, El Ghaoui L, Mian IS (2004) Robust sparse hyperplane
classifiers: application to uncertain molecular profiling data. J Comput Biol 11(6):1073–1089
62. Shivaswamy PK, Bhattacharyya C, Smola AJ (2006) Second order cone programming approaches for
handling missing and uncertain data. J Mach Learn Res 7:1283–1314
63. Ben-Tal A, Bhadra S, Bhattacharyya C, Nath JS (2011) Chance constrained uncertain classification
via robust optimization. Math Program 127(1):145–173
64. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20(3):273–297
65. Marshall AW, Olkin I (1960) Multivariate Chebyshev inequalities. Ann Math Stat 31(4):1001–1014
66. Bertsimas D, Popescu I (2005) Optimal inequalities in probability theory: a convex optimization
approach. Siam J Optim 15(3):780–804
67. Ben-Tal A, Ghaoui LE, Nemirovski A (2009) Robust optimization. Princeton University Press, Prince-
ton
68. Ben-Tal A, Nemirovski A (2008) Selected topics in robust convex optimization. Math Program
112(1):125–158
69. Nemirovski A, Shapiro A (2006) Convex approximations of chance constrained programs. Siam J
Optim 17(4):969–996