LIBSVM: A Library for Support Vector Machines

Chih-Chung Chang and Chih-Jen Lin
Department of Computer Science, National Taiwan University, Taipei 106, Taiwan (http://www.csie.ntu.edu.tw/~cjlin)
2.1 C-Support Vector Classification

Given training vectors $x_i \in R^n$, $i=1,\ldots,l$, in two classes, and a vector $y \in R^l$ such that $y_i \in \{1,-1\}$, C-SVC (Cortes and Vapnik, 1995) solves the following primal problem:

$\min_{w,b,\xi}\ \frac{1}{2}w^T w + C\sum_{i=1}^{l}\xi_i$  (2.1)
subject to $y_i(w^T\phi(x_i)+b) \ge 1-\xi_i$,
$\xi_i \ge 0,\ i=1,\ldots,l$.
Its dual is

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha - e^T\alpha$
subject to $y^T\alpha = 0$,  (2.2)
$0 \le \alpha_i \le C,\ i=1,\ldots,l$,
where $e$ is the vector of all ones, $C > 0$ is the upper bound, $Q$ is an $l$ by $l$ positive semidefinite matrix, $Q_{ij} \equiv y_i y_j K(x_i,x_j)$, and $K(x_i,x_j) \equiv \phi(x_i)^T\phi(x_j)$ is the kernel. Here training vectors $x_i$ are mapped into a higher (maybe infinite) dimensional space by the function $\phi$.
The decision function is

$\mathrm{sgn}\left(\sum_{i=1}^{l} y_i\alpha_i K(x_i,x) + b\right)$.
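As a quick illustration of evaluating this decision function, the following Python sketch (not LIBSVM code; the RBF kernel choice, the toy support vectors, and the coefficients are made up for the example) computes $\mathrm{sgn}(\sum_i y_i\alpha_i K(x_i,x)+b)$ directly from dual variables:

```python
import math

def rbf_kernel(u, v, gamma=1.0):
    # K(u, v) = exp(-gamma * ||u - v||^2)
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision(x, sv, y, alpha, b, gamma=1.0):
    # sgn( sum_i y_i * alpha_i * K(x_i, x) + b )
    s = sum(y[i] * alpha[i] * rbf_kernel(sv[i], x, gamma)
            for i in range(len(sv))) + b
    return 1 if s >= 0 else -1

# hypothetical toy model: one support vector per class
sv = [[0.0, 0.0], [2.0, 2.0]]
y = [1, -1]
alpha = [0.5, 0.5]
b = 0.0
```

A point near `sv[0]` is then classified as $+1$ and a point near `sv[1]` as $-1$.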
2.2 ν-Support Vector Classification

The ν-support vector classification (Schölkopf et al., 2000) uses a new parameter ν which controls the number of support vectors and training errors. The parameter $\nu \in (0,1]$ is an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors.
Given training vectors $x_i \in R^n$, $i=1,\ldots,l$, in two classes, and a vector $y \in R^l$ such that $y_i \in \{1,-1\}$, the primal form considered is:

$\min_{w,b,\xi,\rho}\ \frac{1}{2}w^T w - \nu\rho + \frac{1}{l}\sum_{i=1}^{l}\xi_i$
subject to $y_i(w^T\phi(x_i)+b) \ge \rho-\xi_i$,
$\xi_i \ge 0,\ i=1,\ldots,l,\ \rho \ge 0$.
The dual is:

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha$
subject to $0 \le \alpha_i \le 1/l,\ i=1,\ldots,l$,  (2.3)
$e^T\alpha \ge \nu$, $y^T\alpha = 0$,

where $Q_{ij} \equiv y_i y_j K(x_i,x_j)$.
The decision function is:

$\mathrm{sgn}\left(\sum_{i=1}^{l} y_i\alpha_i\left(K(x_i,x) + b\right)\right)$.
In (Crisp and Burges, 2000; Chang and Lin, 2001), it has been shown that $e^T\alpha \ge \nu$ can be replaced by $e^T\alpha = \nu$. With this property, in LIBSVM, we solve a scaled version of (2.3):

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha$
subject to $0 \le \alpha_i \le 1,\ i=1,\ldots,l$,
$e^T\alpha = \nu l$,
$y^T\alpha = 0$.
We output $\alpha/\rho$ so the computed decision function is:

$\mathrm{sgn}\left(\sum_{i=1}^{l} y_i(\alpha_i/\rho)\left(K(x_i,x) + b\right)\right)$

and then the two margins are

$y_i(w^T\phi(x_i)+b) = \pm 1$,

which are the same as those of C-SVC.
2.3 Distribution Estimation (One-class SVM)

One-class SVM was proposed by Schölkopf et al. (2001) for estimating the support of a high-dimensional distribution. Given training vectors $x_i \in R^n$, $i=1,\ldots,l$, without any class information, the primal form in (Schölkopf et al., 2001) is:
$\min_{w,\xi,\rho}\ \frac{1}{2}w^T w - \rho + \frac{1}{\nu l}\sum_{i=1}^{l}\xi_i$
subject to $w^T\phi(x_i) \ge \rho-\xi_i$,
$\xi_i \ge 0,\ i=1,\ldots,l$.
The dual is:

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha$
subject to $0 \le \alpha_i \le 1/(\nu l),\ i=1,\ldots,l$,  (2.4)
$e^T\alpha = 1$,

where $Q_{ij} = K(x_i,x_j) \equiv \phi(x_i)^T\phi(x_j)$.
In LIBSVM we solve a scaled version of (2.4):

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha$
subject to $0 \le \alpha_i \le 1,\ i=1,\ldots,l$,
$e^T\alpha = \nu l$.

The decision function is

$\mathrm{sgn}\left(\sum_{i=1}^{l}\alpha_i K(x_i,x) - \rho\right)$.
2.4 ε-Support Vector Regression (ε-SVR)

Given a set of data points $\{(x_1,z_1),\ldots,(x_l,z_l)\}$, such that $x_i \in R^n$ is an input and $z_i \in R^1$ is a target output, the standard form of support vector regression (Vapnik, 1998) is:
$\min_{w,b,\xi,\xi^*}\ \frac{1}{2}w^T w + C\sum_{i=1}^{l}\xi_i + C\sum_{i=1}^{l}\xi_i^*$
subject to $w^T\phi(x_i)+b-z_i \le \epsilon+\xi_i$,
$z_i-w^T\phi(x_i)-b \le \epsilon+\xi_i^*$,
$\xi_i,\xi_i^* \ge 0,\ i=1,\ldots,l$.
The dual is:

$\min_{\alpha,\alpha^*}\ \frac{1}{2}(\alpha-\alpha^*)^T Q(\alpha-\alpha^*) + \epsilon\sum_{i=1}^{l}(\alpha_i+\alpha_i^*) + \sum_{i=1}^{l}z_i(\alpha_i-\alpha_i^*)$
subject to $\sum_{i=1}^{l}(\alpha_i-\alpha_i^*) = 0$, $0 \le \alpha_i,\alpha_i^* \le C,\ i=1,\ldots,l$,  (2.5)
where $Q_{ij} = K(x_i,x_j) \equiv \phi(x_i)^T\phi(x_j)$.

The approximate function is:

$\sum_{i=1}^{l}(-\alpha_i+\alpha_i^*)K(x_i,x) + b$.
2.5 ν-Support Vector Regression (ν-SVR)

Similar to ν-SVC, for regression, Schölkopf et al. (2000) use a parameter ν to control the number of support vectors. However, unlike ν-SVC, where ν replaces C, here ν replaces the parameter ε of ε-SVR. The primal form is
$\min_{w,b,\xi,\xi^*,\epsilon}\ \frac{1}{2}w^T w + C\left(\nu\epsilon + \frac{1}{l}\sum_{i=1}^{l}(\xi_i+\xi_i^*)\right)$  (2.6)
subject to $(w^T\phi(x_i)+b)-z_i \le \epsilon+\xi_i$,
$z_i-(w^T\phi(x_i)+b) \le \epsilon+\xi_i^*$,
$\xi_i,\xi_i^* \ge 0,\ i=1,\ldots,l,\ \epsilon \ge 0$,
and the dual is

$\min_{\alpha,\alpha^*}\ \frac{1}{2}(\alpha-\alpha^*)^T Q(\alpha-\alpha^*) + z^T(\alpha-\alpha^*)$
subject to $e^T(\alpha-\alpha^*) = 0$, $e^T(\alpha+\alpha^*) \le C\nu$,
$0 \le \alpha_i,\alpha_i^* \le C/l,\ i=1,\ldots,l$.  (2.7)
Similarly, the inequality $e^T(\alpha+\alpha^*) \le C\nu$ can be replaced by an equality, so in LIBSVM we solve a scaled version of (2.7):

$\min_{\alpha,\alpha^*}\ \frac{1}{2}(\alpha-\alpha^*)^T Q(\alpha-\alpha^*) + z^T(\alpha-\alpha^*)$
subject to $e^T(\alpha-\alpha^*) = 0$, $e^T(\alpha+\alpha^*) = Cl\nu$,
$0 \le \alpha_i,\alpha_i^* \le C,\ i=1,\ldots,l$.  (2.8)
The decision function is

$\sum_{i=1}^{l}(-\alpha_i+\alpha_i^*)K(x_i,x) + b$,

which is the same as that of ε-SVR.
3 Solving the Quadratic Problems
3.1 The Decomposition Method for C-SVC, ε-SVR, and One-class SVM

We consider the following general form of C-SVC, ε-SVR, and one-class SVM:

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha + p^T\alpha$
subject to $y^T\alpha = \Delta$,  (3.1)
$0 \le \alpha_t \le C,\ t=1,\ldots,l$,

where $y_t = \pm 1$, $t=1,\ldots,l$. It can be clearly seen that C-SVC and one-class SVM are already in the form of (3.1). For ε-SVR, we consider the following reformulation of (2.5):
$\min_{\alpha,\alpha^*}\ \frac{1}{2}\left[\alpha^T,(\alpha^*)^T\right]\begin{bmatrix}Q & -Q\\ -Q & Q\end{bmatrix}\begin{bmatrix}\alpha\\ \alpha^*\end{bmatrix} + \left[\epsilon e^T + z^T,\ \epsilon e^T - z^T\right]\begin{bmatrix}\alpha\\ \alpha^*\end{bmatrix}$
subject to $y^T\begin{bmatrix}\alpha\\ \alpha^*\end{bmatrix} = 0$, $0 \le \alpha_t,\alpha_t^* \le C,\ t=1,\ldots,l$,  (3.2)

where $y$ is a $2l$ by $1$ vector with $y_t = 1$, $t=1,\ldots,l$ and $y_t = -1$, $t=l+1,\ldots,2l$.
The difficulty of solving (3.1) is the density of $Q$, because $Q_{ij}$ is in general not zero. In LIBSVM, we consider the decomposition method to conquer this difficulty. Some works on this method are, for example, (Osuna et al., 1997a; Joachims, 1998; Platt, 1998). This method modifies only a subset of $\alpha$ per iteration. This subset, denoted as the working set $B$, leads to a small sub-problem to be minimized in each iteration. An extreme case is the Sequential Minimal Optimization (SMO) (Platt, 1998), which restricts $B$ to have only two elements. Then in each iteration one solves a simple two-variable problem without needing optimization software. Here we consider an SMO-type decomposition method proposed in Fan et al. (2005).
Algorithm 1 (An SMO-type decomposition method in Fan et al. (2005))

1. Find $\alpha^1$ as the initial feasible solution. Set $k = 1$.

2. If $\alpha^k$ is a stationary point of (2.2), stop. Otherwise, find a two-element working set $B = \{i,j\}$ by WSS 1 (described in Section 3.2). Define $N \equiv \{1,\ldots,l\}\setminus B$, and $\alpha_B^k$ and $\alpha_N^k$ to be sub-vectors of $\alpha^k$ corresponding to $B$ and $N$, respectively.
3. If $a_{ij} \equiv K_{ii} + K_{jj} - 2K_{ij} > 0$,

solve the following sub-problem with the variable $\alpha_B$:

$\min_{\alpha_i,\alpha_j}\ \frac{1}{2}\left[\alpha_i\ \alpha_j\right]\begin{bmatrix}Q_{ii} & Q_{ij}\\ Q_{ij} & Q_{jj}\end{bmatrix}\begin{bmatrix}\alpha_i\\ \alpha_j\end{bmatrix} + (p_B + Q_{BN}\alpha_N^k)^T\begin{bmatrix}\alpha_i\\ \alpha_j\end{bmatrix}$
subject to $0 \le \alpha_i,\alpha_j \le C$,  (3.3)
$y_i\alpha_i + y_j\alpha_j = \Delta - y_N^T\alpha_N^k$,
else solve

$\min_{\alpha_i,\alpha_j}\ \frac{1}{2}\left[\alpha_i\ \alpha_j\right]\begin{bmatrix}Q_{ii} & Q_{ij}\\ Q_{ij} & Q_{jj}\end{bmatrix}\begin{bmatrix}\alpha_i\\ \alpha_j\end{bmatrix} + (p_B + Q_{BN}\alpha_N^k)^T\begin{bmatrix}\alpha_i\\ \alpha_j\end{bmatrix} + \frac{\tau - a_{ij}}{4}\left((\alpha_i-\alpha_i^k)^2 + (\alpha_j-\alpha_j^k)^2\right)$  (3.4)

subject to constraints of (3.3).
4. Set $\alpha_B^{k+1}$ to be the optimal solution of (3.3) and $\alpha_N^{k+1} \equiv \alpha_N^k$. Set $k \leftarrow k+1$ and go to Step 2.

Note that $B$ is updated at each iteration. To simplify the notation, we simply use $B$ instead of $B^k$. If $a_{ij} \le 0$, (3.3) is a concave problem. Hence we use a convex modification in (3.4).
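Algorithm 1 can be sketched compactly. The following toy Python solver (an illustration, not the LIBSVM implementation) uses the simple maximal-violating-pair selection instead of the second-order WSS 1, and a clipped step along the two-variable direction; `K` is a kernel matrix and $p = -e$ as in the C-SVC dual (2.2):

```python
TAU = 1e-12  # small positive tau, as in (3.11)

def smo(K, y, C, eps=1e-3, max_iter=1000):
    l = len(y)
    alpha = [0.0] * l
    G = [-1.0] * l                  # gradient of f(alpha): Q alpha + p, p = -e
    for _ in range(max_iter):
        # I_up / I_low as in (3.7)
        I_up = [t for t in range(l)
                if (alpha[t] < C and y[t] == 1) or (alpha[t] > 0 and y[t] == -1)]
        I_low = [t for t in range(l)
                 if (alpha[t] < C and y[t] == -1) or (alpha[t] > 0 and y[t] == 1)]
        i = max(I_up, key=lambda t: -y[t] * G[t])
        j = min(I_low, key=lambda t: -y[t] * G[t])
        if (-y[i] * G[i]) - (-y[j] * G[j]) < eps:   # stopping condition (3.9)
            break
        a_ij = K[i][i] + K[j][j] - 2 * K[i][j]
        if a_ij <= 0:               # convex modification, cf. (3.4)
            a_ij = TAU
        # unconstrained step along the feasible pair direction, then clip to [0, C]
        delta = ((-y[i] * G[i]) - (-y[j] * G[j])) / a_ij
        delta = min(delta,
                    (C - alpha[i]) if y[i] == 1 else alpha[i],
                    alpha[j] if y[j] == 1 else (C - alpha[j]))
        d_i, d_j = y[i] * delta, -y[j] * delta
        alpha[i] += d_i
        alpha[j] += d_j
        for t in range(l):          # gradient update, cf. (4.7)
            G[t] += y[t] * y[i] * K[t][i] * d_i + y[t] * y[j] * K[t][j] * d_j
    return alpha
```

On a two-point toy problem this reproduces the analytic dual solution; real implementations add caching, shrinking, and the exact working-set rule.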
3.2 Stopping Criteria and Working Set Selection for C-SVC, ε-SVR, and One-class SVM

The Karush-Kuhn-Tucker (KKT) optimality condition of (3.1) shows that a vector $\alpha$ is a stationary point of (3.1) if and only if there is a number $b$ and two nonnegative vectors $\lambda$ and $\mu$ such that

$\nabla f(\alpha) + by = \lambda - \mu$,
$\lambda_i\alpha_i = 0$, $\mu_i(C-\alpha_i) = 0$, $\lambda_i \ge 0$, $\mu_i \ge 0$, $i=1,\ldots,l$,

where $\nabla f(\alpha) \equiv Q\alpha + p$ is the gradient of $f(\alpha)$. This condition can be rewritten as

$\nabla f(\alpha)_i + by_i \ge 0$ if $\alpha_i < C$,  (3.5)
$\nabla f(\alpha)_i + by_i \le 0$ if $\alpha_i > 0$.  (3.6)
Since $y_i = \pm 1$, by defining

$I_{up}(\alpha) \equiv \{t \mid \alpha_t < C, y_t = 1 \text{ or } \alpha_t > 0, y_t = -1\}$, and
$I_{low}(\alpha) \equiv \{t \mid \alpha_t < C, y_t = -1 \text{ or } \alpha_t > 0, y_t = 1\}$,  (3.7)

a feasible $\alpha$ is a stationary point of (3.1) if and only if

$m(\alpha) \le M(\alpha)$,  (3.8)

where

$m(\alpha) \equiv \max_{i \in I_{up}(\alpha)} -y_i\nabla f(\alpha)_i$, and $M(\alpha) \equiv \min_{i \in I_{low}(\alpha)} -y_i\nabla f(\alpha)_i$.

From this we have the following stopping condition:

$m(\alpha^k) - M(\alpha^k) \le \epsilon$.  (3.9)
About the selection of the working set $B$, we consider the following procedure:
WSS 1

1. For all $t, s$, define

$a_{ts} \equiv K_{tt} + K_{ss} - 2K_{ts}$, $b_{ts} \equiv -y_t\nabla f(\alpha^k)_t + y_s\nabla f(\alpha^k)_s > 0$,  (3.10)

and

$\bar{a}_{ts} \equiv \begin{cases} a_{ts} & \text{if } a_{ts} > 0,\\ \tau & \text{otherwise},\end{cases}$  (3.11)

where $\tau$ is a small positive number. Select

$i \in \arg\max_t \{-y_t\nabla f(\alpha^k)_t \mid t \in I_{up}(\alpha^k)\}$,
$j \in \arg\min_t \left\{-\frac{b_{it}^2}{\bar{a}_{it}} \,\Big|\, t \in I_{low}(\alpha^k),\ -y_t\nabla f(\alpha^k)_t < -y_i\nabla f(\alpha^k)_i\right\}$.  (3.12)

2. Return $B = \{i,j\}$.
Details of how we choose this working set are in (Fan et al., 2005, Section II).
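WSS 1 itself can be sketched as follows (a Python illustration, not LIBSVM's C++; `G` stands for the gradient $\nabla f(\alpha^k)$ and is supplied by the caller):

```python
TAU = 1e-12  # small positive tau, as in (3.11)

def wss1(K, y, G, alpha, C):
    l = len(y)
    I_up = [t for t in range(l)
            if (alpha[t] < C and y[t] == 1) or (alpha[t] > 0 and y[t] == -1)]
    I_low = [t for t in range(l)
             if (alpha[t] < C and y[t] == -1) or (alpha[t] > 0 and y[t] == 1)]
    i = max(I_up, key=lambda t: -y[t] * G[t])     # first-order choice of i
    m_val = -y[i] * G[i]
    best_j, best_obj = -1, 0.0
    for t in I_low:
        if -y[t] * G[t] < m_val:                  # violating candidates only
            b_it = m_val - (-y[t] * G[t])         # b_it of (3.10), > 0
            a_it = K[i][i] + K[t][t] - 2 * K[i][t]
            a_bar = a_it if a_it > 0 else TAU     # (3.11)
            obj = -(b_it * b_it) / a_bar          # second-order gain, (3.12)
            if obj < best_obj:
                best_j, best_obj = t, obj
    return i, best_j
```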
3.3 Convergence of the Decomposition Method
See (Fan et al., 2005, Section III) or (Chen et al., 2006) for a detailed discussion of
the convergence of Algorithm 1.
3.4 The Decomposition Method for ν-SVC and ν-SVR

Both ν-SVC and ν-SVR can be considered as the following general form:

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha + p^T\alpha$
subject to $y^T\alpha = \Delta_1$,  (3.13)
$e^T\alpha = \Delta_2$,
$0 \le \alpha_t \le C,\ t=1,\ldots,l$.
The KKT condition of (3.13) shows

$\nabla f(\alpha)_i - \rho + by_i \begin{cases} = 0 & \text{if } 0 < \alpha_i < C,\\ \ge 0 & \text{if } \alpha_i = 0,\\ \le 0 & \text{if } \alpha_i = C.\end{cases}$

Define

$r_1 \equiv \rho - b$, $r_2 \equiv \rho + b$.

If $y_i = 1$, the KKT condition becomes

$\nabla f(\alpha)_i - r_1 \begin{cases} \ge 0 & \text{if } \alpha_i < C,\\ \le 0 & \text{if } \alpha_i > 0.\end{cases}$  (3.14)

On the other hand, if $y_i = -1$, it is

$\nabla f(\alpha)_i - r_2 \begin{cases} \ge 0 & \text{if } \alpha_i < C,\\ \le 0 & \text{if } \alpha_i > 0.\end{cases}$  (3.15)
Hence given a tolerance $\epsilon > 0$, the stopping condition is:

$\max\left(m_p(\alpha) - M_p(\alpha),\ m_n(\alpha) - M_n(\alpha)\right) < \epsilon$,  (3.16)

where

$m_p(\alpha) \equiv \max_{i \in I_{up}(\alpha), y_i=1} -y_i\nabla f(\alpha)_i$, $M_p(\alpha) \equiv \min_{i \in I_{low}(\alpha), y_i=1} -y_i\nabla f(\alpha)_i$, and
$m_n(\alpha) \equiv \max_{i \in I_{up}(\alpha), y_i=-1} -y_i\nabla f(\alpha)_i$, $M_n(\alpha) \equiv \min_{i \in I_{low}(\alpha), y_i=-1} -y_i\nabla f(\alpha)_i$.
The working set selection is by extending WSS 1 to the following:

WSS 2 (Extending WSS 1 to ν-SVM)

1. Find

$i_p \in \arg m_p(\alpha^k)$,
$j_p \in \arg\min_t \left\{-\frac{b_{i_pt}^2}{\bar{a}_{i_pt}} \,\Big|\, y_t = 1,\ t \in I_{low}(\alpha^k),\ -y_t\nabla f(\alpha^k)_t < -y_{i_p}\nabla f(\alpha^k)_{i_p}\right\}$.

2. Find

$i_n \in \arg m_n(\alpha^k)$,
$j_n \in \arg\min_t \left\{-\frac{b_{i_nt}^2}{\bar{a}_{i_nt}} \,\Big|\, y_t = -1,\ t \in I_{low}(\alpha^k),\ -y_t\nabla f(\alpha^k)_t < -y_{i_n}\nabla f(\alpha^k)_{i_n}\right\}$.

3. Return $\{i_p, j_p\}$ or $\{i_n, j_n\}$ depending on which one gives smaller $-b_{ij}^2/\bar{a}_{ij}$.
3.5 Analytical Solutions
Details are described in Section 5 in which we discuss the solution of a more general
sub-problem.
3.6 The Calculation of b or ρ

After the solution of the dual optimization problem is obtained, the variables $b$ or $\rho$ must be calculated, as they are used in the decision function. Here we simply describe the case of ν-SVC and ν-SVR where $b$ and $\rho$ both appear. Other formulations are simplified cases of them.

The KKT condition of (3.13) has been shown in (3.14) and (3.15). Now we consider the case of $y_i = 1$. If there are $\alpha_i$ which satisfy $0 < \alpha_i < C$, then $r_1 = \nabla f(\alpha)_i$. Practically, to avoid numerical errors, we average them:

$r_1 = \frac{\sum_{0<\alpha_i<C,\, y_i=1} \nabla f(\alpha)_i}{\sum_{0<\alpha_i<C,\, y_i=1} 1}$.

On the other hand, if there is no such $\alpha_i$, as $r_1$ must satisfy

$\max_{\alpha_i=C,\, y_i=1} \nabla f(\alpha)_i \le r_1 \le \min_{\alpha_i=0,\, y_i=1} \nabla f(\alpha)_i$,

we take $r_1$ to be the midpoint of the range.

For $y_i = -1$, we can calculate $r_2$ in a similar way.

After $r_1$ and $r_2$ are obtained,

$\rho = \frac{r_1+r_2}{2}$ and $b = -\frac{r_1-r_2}{2}$.
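The procedure above can be sketched as follows (Python illustration; `G` is the gradient $\nabla f(\alpha)$ at the dual solution, and for simplicity the no-free-variable branch assumes both bound types are present):

```python
def estimate_r(G, alpha, y, C, label):
    # r for one label: average gradient over free variables if any exist,
    # otherwise the midpoint of the feasible range
    free = [G[i] for i in range(len(y)) if y[i] == label and 0 < alpha[i] < C]
    if free:
        return sum(free) / len(free)
    ub = min(G[i] for i in range(len(y)) if y[i] == label and alpha[i] == 0)
    lb = max(G[i] for i in range(len(y)) if y[i] == label and alpha[i] == C)
    return (ub + lb) / 2.0

def rho_and_b(G, alpha, y, C):
    r1 = estimate_r(G, alpha, y, C, 1)
    r2 = estimate_r(G, alpha, y, C, -1)
    return (r1 + r2) / 2.0, -(r1 - r2) / 2.0   # rho, b
```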
4 Shrinking and Caching
4.1 Shrinking
Since for many problems the number of free support vectors (i.e., $0 < \alpha_i < C$) is small, the shrinking technique reduces the size of the working problem without considering some bounded variables (Joachims, 1998). Near the end of the iterative process, the decomposition method identifies a possible set $A$ where all final free $\alpha_i$ may reside. Indeed we can have the following theorem, which shows that at the final iterations of the decomposition method proposed in Section 3.2 only variables corresponding to a small set are still allowed to move:
Theorem 4.1 (Theorem IV in (Fan et al., 2005)) Assume $Q$ is positive semidefinite.

1. The following set is independent of any optimal solution $\bar{\alpha}$:

$I \equiv \{i \mid -y_i\nabla f(\bar{\alpha})_i > M(\bar{\alpha}) \text{ or } -y_i\nabla f(\bar{\alpha})_i < m(\bar{\alpha})\}$.  (4.1)

Problem (2.2) has unique and bounded optimal solutions at $\alpha_i$, $i \in I$.

2. Assume Algorithm 1 generates an infinite sequence $\{\alpha^k\}$. There is $\bar{k}$ such that after $k \ge \bar{k}$, every $\alpha_i^k$, $i \in I$ has reached the unique and bounded optimal solution. It remains the same in all subsequent iterations, and for all $k \ge \bar{k}$:

$i \notin \{t \mid M(\alpha^k) \le -y_t\nabla f(\alpha^k)_t \le m(\alpha^k)\}$.  (4.2)
Hence instead of solving the whole problem (2.2), the decomposition method works on a smaller problem:

$\min_{\alpha_A}\ \frac{1}{2}\alpha_A^T Q_{AA}\alpha_A + (p_A + Q_{AN}\alpha_N^k)^T\alpha_A$
subject to $0 \le (\alpha_A)_t \le C,\ t=1,\ldots,q$,  (4.3)
$y_A^T\alpha_A = -y_N^T\alpha_N^k$,

where $N = \{1,\ldots,l\}\setminus A$ is the set of shrunken variables and $q$ is the size of $A$.
Of course this heuristic may fail if the optimal solution of (4.3) is not the corresponding part of that of (2.2). When that happens, the whole problem (2.2) is reoptimized starting from a point $\alpha$ where $\alpha_A$ is an optimal solution of (4.3) and $\alpha_N$ are bounded variables identified before the shrinking process. Note that while solving the shrunken problem (4.3), we only know the gradient $Q_{AA}\alpha_A + Q_{AN}\alpha_N + p_A$ of (4.3). Hence when problem (2.2) is reoptimized we also have to reconstruct the whole gradient $\nabla f(\alpha)$, which is quite expensive.

Many implementations begin the shrinking procedure near the end of the iterative process; in LIBSVM, however, we start the shrinking process from the beginning. The procedure is as follows:
1. After every $\min(l, 1000)$ iterations, we try to shrink some variables. Note that during the iterative process

$m(\alpha^k) > M(\alpha^k)$,  (4.4)

as (3.9) is not satisfied yet. Following Theorem 4.1, we conjecture that variables in the following set can be shrunken:

$\{t \mid -y_t\nabla f(\alpha^k)_t > m(\alpha^k),\ t \in I_{low}(\alpha^k),\ \alpha_t^k \text{ is bounded}\}\ \cup$
$\{t \mid -y_t\nabla f(\alpha^k)_t < M(\alpha^k),\ t \in I_{up}(\alpha^k),\ \alpha_t^k \text{ is bounded}\}$
$= \{t \mid -y_t\nabla f(\alpha^k)_t > m(\alpha^k),\ \alpha_t^k = C, y_t = 1 \text{ or } \alpha_t^k = 0, y_t = -1\}\ \cup$
$\{t \mid -y_t\nabla f(\alpha^k)_t < M(\alpha^k),\ \alpha_t^k = 0, y_t = 1 \text{ or } \alpha_t^k = C, y_t = -1\}$.  (4.5)

Thus the set $A$ of activated variables is dynamically reduced every $\min(l, 1000)$ iterations.
2. Of course the above shrinking strategy may be too aggressive. Since the decomposition method has very slow convergence, and a large portion of iterations are spent on achieving the final digit of the required accuracy, we would not like those iterations to be wasted because of a wrongly shrunken problem (4.3). Hence when the decomposition method first achieves the tolerance

$m(\alpha^k) \le M(\alpha^k) + 10\epsilon$,

where $\epsilon$ is the specified stopping criterion, we reconstruct the whole gradient. Then we inactivate some variables based on the current set (4.5), and the decomposition method continues.
Therefore, in LIBSVM, the size of the set $A$ of (4.3) is dynamically reduced. To decrease the cost of reconstructing the gradient $\nabla f(\alpha)$, during the iterations we always keep

$\bar{G}_i = C\sum_{\alpha_j = C} Q_{ij},\ i=1,\ldots,l$.

Then for the gradient $\nabla f(\alpha)_i$, $i \notin A$, we have

$\nabla f(\alpha)_i = \sum_{j=1}^{l} Q_{ij}\alpha_j + p_i = \bar{G}_i + \sum_{0<\alpha_j<C} Q_{ij}\alpha_j + p_i$.
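This bookkeeping can be sketched as follows (Python illustration; for simplicity it recomputes $\bar{G}$ from scratch with a dense $Q$, whereas LIBSVM maintains it incrementally and never forms $Q$ fully):

```python
def reconstruct_gradient(Q, p, alpha, C):
    # G_bar_i = C * sum_{alpha_j = C} Q_ij; the full gradient is then
    # rebuilt from G_bar and the free variables only
    l = len(alpha)
    G_bar = [C * sum(Q[i][j] for j in range(l) if alpha[j] == C)
             for i in range(l)]
    return [G_bar[i]
            + sum(Q[i][j] * alpha[j] for j in range(l) if 0 < alpha[j] < C)
            + p[i]
            for i in range(l)]
```

The result agrees term by term with the direct computation $\nabla f(\alpha) = Q\alpha + p$, since bounded variables contribute either $0$ or $C\,Q_{ij}$.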
For ν-SVC and ν-SVR, as the stopping condition (3.16) is different from (3.9), the set (4.5) must be modified. For $y_t = 1$, we shrink elements in the following set:

$\{t \mid -y_t\nabla f(\alpha^k)_t > m_p(\alpha^k),\ \alpha_t^k = C,\ y_t = 1\} \cup \{t \mid -y_t\nabla f(\alpha^k)_t < M_p(\alpha^k),\ \alpha_t^k = 0,\ y_t = 1\}$.

For $y_t = -1$, we shrink the following set:

$\{t \mid -y_t\nabla f(\alpha^k)_t > m_n(\alpha^k),\ \alpha_t^k = 0,\ y_t = -1\} \cup \{t \mid -y_t\nabla f(\alpha^k)_t < M_n(\alpha^k),\ \alpha_t^k = C,\ y_t = -1\}$.
4.2 Caching
Another technique for reducing the computational time is caching. Since $Q$ is fully dense and may not be stored in the computer memory, elements $Q_{ij}$ are calculated as needed. Usually a special storage using the idea of a cache is employed to store recently used $Q_{ij}$ (Joachims, 1998), so the computational cost of later iterations can be reduced.

Theorem 4.1 also supports the use of the cache, as in final iterations only some columns of the matrix $Q$ are still needed. Thus if the cache can contain these columns, we can avoid most kernel evaluations in final iterations.

In LIBSVM, we implement a simple least-recently-used strategy for the cache. We dynamically cache only recently used columns of $Q_{AA}$ of (4.3).
4.3 Computational Complexity
The discussion in Section 3.3 is about the asymptotic convergence of the decomposi-
tion method. Here, we discuss the computational complexity.
The main operations are finding $Q_{BN}\alpha_N^k + p_B$ of (3.3) and updating $\nabla f(\alpha^k)$ to $\nabla f(\alpha^{k+1})$. Note that $\nabla f(\alpha)$ is used in the working set selection as well as the stopping condition. They can be considered together as

$Q_{BN}\alpha_N^k + p_B = \nabla f(\alpha^k)_B - Q_{BB}\alpha_B^k$,  (4.6)

and

$\nabla f(\alpha^{k+1}) = \nabla f(\alpha^k) + Q_{:,B}(\alpha_B^{k+1} - \alpha_B^k)$,  (4.7)

where $Q_{:,B}$ is the sub-matrix of $Q$ with columns in $B$. That is, at the $k$th iteration, as we already have $\nabla f(\alpha^k)$, the right-hand side of (4.6) is used to construct the sub-problem. After the sub-problem is solved, (4.7) is employed to obtain the next $\nabla f(\alpha^{k+1})$. As $B$ has only two elements and solving the sub-problem is easy, the main cost is $Q_{:,B}(\alpha_B^{k+1} - \alpha_B^k)$ of (4.7). The operation itself takes $O(2l)$, but if $Q_{:,B}$ is not available in the cache and each kernel evaluation costs $O(n)$, one column of $Q_{:,B}$ already needs $O(ln)$. Therefore, the complexity is:

1. #Iterations × O(l) if most columns of Q are cached during iterations.

2. #Iterations × O(nl) if most columns of Q are not cached and each kernel evaluation is O(n).

Note that if shrinking is incorporated, l will gradually decrease during iterations.
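The two identities (4.6) and (4.7) can be checked with a small sketch (Python illustration with a dense $Q$; LIBSVM, of course, never forms $Q$ fully):

```python
def linear_term(G, Q, alpha, B):
    # (Q_BN alpha_N + p_B) = grad f(alpha)_B - Q_BB alpha_B, cf. (4.6)
    return [G[b] - sum(Q[b][t] * alpha[t] for t in B) for b in B]

def update_gradient(G, Q, B, d_B):
    # grad f(alpha^{k+1}) = grad f(alpha^k) + Q[:, B] * (change in alpha_B),
    # cf. (4.7); only the columns of Q indexed by B are touched
    return [G[t] + sum(Q[t][b] * d for b, d in zip(B, d_B))
            for t in range(len(G))]
```

With $B$ equal to the whole index set and $N$ empty, `linear_term` must return $p_B$, and `update_gradient` must agree with recomputing $Q\alpha + p$ from scratch; both follow from $\nabla f(\alpha) = Q\alpha + p$.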
5 Unbalanced Data

For some classification problems, numbers of data in different classes are unbalanced. Hence some researchers (e.g., Osuna et al. (1997b)) have proposed to use different penalty parameters in the SVM formulation: for example, C-SVM becomes
$\min_{w,b,\xi}\ \frac{1}{2}w^T w + C_+\sum_{y_i=1}\xi_i + C_-\sum_{y_i=-1}\xi_i$
subject to $y_i(w^T\phi(x_i)+b) \ge 1-\xi_i$,
$\xi_i \ge 0,\ i=1,\ldots,l$.
Its dual is

$\min_{\alpha}\ \frac{1}{2}\alpha^T Q\alpha - e^T\alpha$
subject to $0 \le \alpha_i \le C_+$, if $y_i = 1$,  (5.1)
$0 \le \alpha_i \le C_-$, if $y_i = -1$,  (5.2)
$y^T\alpha = 0$.
Note that by replacing $C$ with different $C_i$, $i=1,\ldots,l$, most of the analysis earlier is still correct. Now using $C_+$ and $C_-$, the sub-problem (3.3) becomes:

$\min_{\alpha_i,\alpha_j}\ \frac{1}{2}\left[\alpha_i\ \alpha_j\right]\begin{bmatrix}Q_{ii} & Q_{ij}\\ Q_{ji} & Q_{jj}\end{bmatrix}\begin{bmatrix}\alpha_i\\ \alpha_j\end{bmatrix} + (Q_{i,N}\alpha_N - 1)\alpha_i + (Q_{j,N}\alpha_N - 1)\alpha_j$
subject to $y_i\alpha_i + y_j\alpha_j = -y_N^T\alpha_N^k$,  (5.3)
$0 \le \alpha_i \le C_i$, $0 \le \alpha_j \le C_j$,

where $C_i$ and $C_j$ can be $C_+$ or $C_-$ depending on $y_i$ and $y_j$.
Let $\alpha_i = \alpha_i^k + d_i$, $\alpha_j = \alpha_j^k + d_j$, and define $\hat{d}_i \equiv y_i d_i$, $\hat{d}_j \equiv y_j d_j$. Then (5.3) can be written as
$\min_{d_i,d_j}\ \frac{1}{2}\left[d_i\ d_j\right]\begin{bmatrix}Q_{ii} & Q_{ij}\\ Q_{ij} & Q_{jj}\end{bmatrix}\begin{bmatrix}d_i\\ d_j\end{bmatrix} + \left[\nabla f(\alpha^k)_i\ \nabla f(\alpha^k)_j\right]\begin{bmatrix}d_i\\ d_j\end{bmatrix}$
subject to $y_i d_i + y_j d_j = 0$,  (5.4)
$-\alpha_i^k \le d_i \le C_i - \alpha_i^k$, $-\alpha_j^k \le d_j \le C_j - \alpha_j^k$.
Define $a_{ij}$ and $b_{ij}$ as in (3.10). Note that if $a_{ij} \le 0$, a modification similar to (3.4) is needed. Using $\hat{d}_i = -\hat{d}_j$, the objective function can be written as

$\frac{1}{2}\bar{a}_{ij}\hat{d}_j^2 + b_{ij}\hat{d}_j$.

Thus,

$\alpha_i^{new} = \alpha_i^k + y_i b_{ij}/\bar{a}_{ij}$, $\alpha_j^{new} = \alpha_j^k - y_j b_{ij}/\bar{a}_{ij}$.  (5.5)
To modify them back to the feasible region, we first consider the case $y_i \ne y_j$ and write (5.5) as

$\alpha_i^{new} = \alpha_i^k + (-\nabla f(\alpha^k)_i - \nabla f(\alpha^k)_j)/\bar{a}_{ij}$,
$\alpha_j^{new} = \alpha_j^k + (-\nabla f(\alpha^k)_i - \nabla f(\alpha^k)_j)/\bar{a}_{ij}$.

If $\alpha^{new}$ is not feasible, $(\alpha_i^{new}, \alpha_j^{new})$ is in one of the following four regions:

[Figure: the $(\alpha_i, \alpha_j)$ plane with the feasible box $[0, C_i] \times [0, C_j]$; the infeasible regions are I ($\alpha_i > C_i$), II ($\alpha_j > C_j$), III ($\alpha_j < 0$), and IV ($\alpha_i < 0$).]

If it is in region I, $\alpha_i^{k+1}$ is set to be $C_i$ first and then

$\alpha_j^{k+1} = C_i - (\alpha_i^k - \alpha_j^k)$.

Of course we must check if it is in region I first. If so, we have

$\alpha_i^k - \alpha_j^k > C_i - C_j$ and $\alpha_i^{new} \ge C_i$.

Other cases are similar. Therefore, we have the following procedure to identify $(\alpha_i^{new}, \alpha_j^{new})$ in different regions and change it back to the feasible set.
if(y[i]!=y[j])
{
    double quad_coef = Q_i[i]+Q_j[j]+2*Q_i[j];
    if (quad_coef <= 0)
        quad_coef = TAU;
    double delta = (-G[i]-G[j])/quad_coef;
    double diff = alpha[i] - alpha[j];
    alpha[i] += delta;
    alpha[j] += delta;
    if(diff > 0)
    {
        if(alpha[j] < 0) // in region III
        {
            alpha[j] = 0;
            alpha[i] = diff;
        }
    }
    else
    {
        if(alpha[i] < 0) // in region IV
        {
            alpha[i] = 0;
            alpha[j] = -diff;
        }
    }
    if(diff > C_i - C_j)
    {
        if(alpha[i] > C_i) // in region I
        {
            alpha[i] = C_i;
            alpha[j] = C_i - diff;
        }
    }
    else
    {
        if(alpha[j] > C_j) // in region II
        {
            alpha[j] = C_j;
            alpha[i] = C_j + diff;
        }
    }
}
6 Multi-class Classification

We use the one-against-one approach (Knerr et al., 1990) in which $k(k-1)/2$ classifiers are constructed and each one trains data from two different classes. The first use of this strategy on SVM was in (Friedman, 1996; Kreßel, 1999). For training data from the $i$th and the $j$th classes, we solve the following two-class classification problem:

$\min_{w^{ij},b^{ij},\xi^{ij}}\ \frac{1}{2}(w^{ij})^T w^{ij} + C\sum_t(\xi^{ij})_t$
subject to $(w^{ij})^T\phi(x_t) + b^{ij} \ge 1 - \xi_t^{ij}$, if $x_t$ in the $i$th class,
$(w^{ij})^T\phi(x_t) + b^{ij} \le -1 + \xi_t^{ij}$, if $x_t$ in the $j$th class,
$\xi_t^{ij} \ge 0$.
In classification we use a voting strategy: each binary classifier casts one vote for every data point $x$, and in the end a point is designated to be in the class with the maximum number of votes. In case two classes have identical votes, though it may not be a good strategy, we simply select the one with the smallest index.
There are other methods for multi-class classication. Some reasons why we choose
this 1-against-1 approach and detailed comparisons are in Hsu and Lin (2002).
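The voting step can be sketched as follows (Python illustration; `pairwise_decision` is a hypothetical callback returning the winning class of each binary classifier):

```python
def vote(k, pairwise_decision):
    # pairwise_decision(i, j) -> i or j, for 0 <= i < j < k
    votes = [0] * k
    for i in range(k):
        for j in range(i + 1, k):
            votes[pairwise_decision(i, j)] += 1
    # maximum number of votes; ties go to the smallest class index
    return max(range(k), key=lambda c: (votes[c], -c))
```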
7 Parameter Selection

LIBSVM provides a parameter selection tool using the RBF kernel: cross validation via parallel grid search. While cross validation is available for both SVC and SVR, for the grid search, currently we support only C-SVC with two parameters $C$ and $\gamma$. They can be easily modified for other kernels such as linear and polynomial, or for SVR.
For medium-sized problems, cross validation might be the most reliable way for parameter selection. First, the training data is separated into several folds. Sequentially a fold is considered as the validation set and the rest are for training. The average accuracy on predicting the validation sets is the cross validation accuracy.

Our implementation is as follows. Users provide a possible interval of $C$ (or $\gamma$) with the grid space. Then, all grid points of $(C, \gamma)$ are tried to see which one gives the highest cross validation accuracy. Users then use the best parameter to train the whole training set and generate the final model.

For easy implementation, we consider each SVM with parameters $(C, \gamma)$ as an independent problem. As they are different jobs, we can easily solve them in parallel.
Figure 1: Contour plot of heart_scale included in the LIBSVM package
Currently, LIBSVM provides a very simple tool so that jobs are dispatched to a cluster of computers which share the same file system.
Note that now under the same $(C, \gamma)$, the one-against-one method is used for training multi-class data. Hence, in the final model, all $k(k-1)/2$ decision functions share the same $(C, \gamma)$.
LIBSVM also outputs the contour plot of cross validation accuracy. An example
is in Figure 1.
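The grid-search loop described above can be sketched as follows (sequential rather than parallel; `cv_accuracy` is a hypothetical callback standing in for one full cross-validation run, and the exponentially spaced grid values are illustrative):

```python
def grid_search(log2C_range, log2g_range, cv_accuracy):
    # try every (C, gamma) grid point; keep the one with the best
    # cross validation accuracy
    best = None
    for log2C in log2C_range:
        for log2g in log2g_range:
            C, gamma = 2.0 ** log2C, 2.0 ** log2g
            acc = cv_accuracy(C, gamma)
            if best is None or acc > best[0]:
                best = (acc, C, gamma)
    return best   # (accuracy, C, gamma)
```

Because every grid point is an independent job, the two loops parallelize trivially, which is exactly what the dispatching tool exploits.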
8 Probability Estimates

Support vector classification (regression) predicts only the class label (approximate target value) but not probability information. In the following we briefly describe how we extend SVM for probability estimates. More details are in Wu et al. (2004) for classification and in Lin and Weng (2004) for regression.

Given $k$ classes of data, for any $x$, the goal is to estimate

$p_i = p(y = i \mid x),\ i=1,\ldots,k$.

Following the setting of the one-against-one (i.e., pairwise) approach for multi-class classification, we first estimate pairwise class probabilities

$r_{ij} \approx p(y = i \mid y = i \text{ or } j,\ x)$

using an improved implementation (Lin et al., 2003) of (Platt, 2000):

$r_{ij} \approx \frac{1}{1 + e^{A\hat{f} + B}}$,  (8.1)

where $A$ and $B$ are estimated by minimizing the negative log-likelihood function using known training data and their decision values $\hat{f}$. Labels and decision values are required to be independent, so here we conduct five-fold cross-validation to obtain decision values.
Then the second approach in Wu et al. (2004) is used to obtain $p_i$ from all these $r_{ij}$'s. It solves the following optimization problem:

$\min_{p}\ \frac{1}{2}\sum_{i=1}^{k}\sum_{j:j\ne i}(r_{ji}p_i - r_{ij}p_j)^2$
subject to $\sum_{i=1}^{k}p_i = 1$, $p_i \ge 0,\ \forall i$.  (8.2)
The objective function comes from the equality

$p(y = j \mid y = i \text{ or } j,\ x)\cdot p(y = i \mid x) = p(y = i \mid y = i \text{ or } j,\ x)\cdot p(y = j \mid x)$

and can be reformulated as

$\min_{p}\ \frac{1}{2}p^T Qp$,  (8.3)

where

$Q_{ij} = \begin{cases}\sum_{s:s\ne i} r_{si}^2 & \text{if } i = j,\\ -r_{ji}r_{ij} & \text{if } i \ne j.\end{cases}$  (8.4)
This problem is convex, so the optimality conditions are that there is a scalar $b$ such that

$\begin{bmatrix}Q & e\\ e^T & 0\end{bmatrix}\begin{bmatrix}p\\ b\end{bmatrix} = \begin{bmatrix}0\\ 1\end{bmatrix}$.  (8.5)

Here $e$ is the $k \times 1$ vector of all ones, $0$ is the $k \times 1$ vector of all zeros, and $b$ is the Lagrangian multiplier of the equality constraint $\sum_{i=1}^{k}p_i = 1$. Instead of directly solving the linear system (8.5), we derive a simple iterative method in the following.
As

$p^T Qp = p^T Q(-bQ^{-1}e) = -bp^T e = -b$,

the solution $p$ satisfies

$Q_{tt}p_t + \sum_{j:j\ne t}Q_{tj}p_j - p^T Qp = 0,\ \text{for any } t$.  (8.6)
Using (8.6), we consider the following algorithm:

Algorithm 2

1. Start with some initial $p_i \ge 0,\ \forall i$, and $\sum_{i=1}^{k}p_i = 1$.

2. Repeat ($t = 1,\ldots,k,1,\ldots$)

$p_t \leftarrow \frac{1}{Q_{tt}}\left[-\sum_{j:j\ne t}Q_{tj}p_j + p^T Qp\right]$  (8.7)

normalize $p$  (8.8)

until (8.5) is satisfied.
This procedure guarantees to find a global optimum of (8.2). Using some tricks, we do not need to recalculate $p^T Qp$ in each iteration. Detailed implementation notes are in Appendix C of Wu et al. (2004). We consider a relative stopping condition for Algorithm 2:

$\|Qp - (p^T Qp)e\|_\infty = \max_t |(Qp)_t - p^T Qp| < 0.005/k$.

When $k$ is large, elements of $p$ will be closer to zero, so we decrease the tolerance by a factor of $k$.
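Algorithm 2 can be sketched as follows (a direct Python transcription that recomputes $p^T Qp$ at each step rather than using the incremental tricks mentioned above):

```python
def pairwise_coupling(r, max_iter=1000):
    # r[i][j] ~ p(y = i | y = i or j, x); build Q as in (8.4)
    k = len(r)
    Q = [[(sum(r[s][i] ** 2 for s in range(k) if s != i) if i == j
           else -r[j][i] * r[i][j])
          for j in range(k)] for i in range(k)]
    p = [1.0 / k] * k
    for _ in range(max_iter):
        Qp = [sum(Q[t][j] * p[j] for j in range(k)) for t in range(k)]
        pQp = sum(p[t] * Qp[t] for t in range(k))
        if max(abs(Qp[t] - pQp) for t in range(k)) < 0.005 / k:  # stop rule
            break
        for t in range(k):
            Qp_t = sum(Q[t][j] * p[j] for j in range(k))
            pQp = sum(p[s] * sum(Q[s][j] * p[j] for j in range(k))
                      for s in range(k))
            # (8.7)
            p[t] = (pQp - (Qp_t - Q[t][t] * p[t])) / Q[t][t]
            s = sum(p)                         # (8.8): normalize
            p = [v / s for v in p]
    return p
```

For two classes the pairwise consistency $r_{ji}p_i = r_{ij}p_j$ pins down the answer exactly, which makes an easy sanity check.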
Next, we discuss SVR probability inference. For a given set of training data $D = \{(x_i, y_i) \mid x_i \in R^n, y_i \in R, i=1,\ldots,l\}$, we suppose that the data are collected from the model:

$y_i = f(x_i) + \delta_i$,  (8.9)

where $f(x)$ is the underlying function and $\delta_i$ are independent and identically distributed random noises. Given a test data $x$, the distribution of $y$ given $x$ and $D$, $P(y \mid x, D)$, allows one to draw probabilistic inferences about $y$; for example, one can construct a predictive interval $I = I(x)$ such that $y \in I$ with a pre-specified probability. Denoting $\hat{f}$ as the estimated function based on $D$ using SVR, then $\zeta = \zeta(x) \equiv y - \hat{f}(x)$ is the out-of-sample residual (or prediction error), and $y \in I$ is equivalent to $\zeta \in I - \hat{f}(x)$. We propose to model the distribution of $\zeta$ based on a set of out-of-sample residuals $\{\zeta_i\}_{i=1}^{l}$ using training data $D$. The $\zeta_i$'s are generated by first conducting a $k$-fold cross validation to get $\hat{f}_j$, $j=1,\ldots,k$, and then setting $\zeta_i \equiv y_i - \hat{f}_j(x_i)$ for $(x_i, y_i)$ in the $j$th fold. It is conceptually clear that the distribution of the $\zeta_i$'s may resemble that of the prediction error $\zeta$.
Figure 2 illustrates $\zeta_i$'s from a real data set. Basically, a discretized distribution like a histogram can be used to model the data; however, it is complex because all $\zeta_i$'s must be retained. On the contrary, distributions like Gaussian and Laplace, commonly used as noise models, require only location and scale parameters. In Figure 2 we plot the fitted curves using these two families and the histogram of the $\zeta_i$'s. The figure shows that the distribution of the $\zeta_i$'s seems symmetric about zero and that both Gaussian and Laplace reasonably capture its shape. Thus, we propose to model $\zeta_i$ by zero-mean Gaussian and Laplace, or equivalently, model the conditional distribution of $y$ given $\hat{f}(x)$ by Gaussian and Laplace with mean $\hat{f}(x)$.

(Lin and Weng, 2004) discussed a method to judge whether a Laplace or Gaussian distribution should be used. Moreover, they experimentally show that in all cases they have tried, Laplace is better. Thus, here we consider the zero-mean Laplace with the density function:

$p(z) = \frac{1}{2\sigma}e^{-|z|/\sigma}$.  (8.10)
Assuming that the $\zeta_i$'s are independent, we can estimate the scale parameter $\sigma$ by maximizing the likelihood. For Laplace, the maximum likelihood estimate is

$\sigma = \frac{\sum_{i=1}^{l}|\zeta_i|}{l}$.  (8.11)

(Lin and Weng, 2004) pointed out that some very extreme $\zeta_i$'s may cause inaccurate estimation of $\sigma$. Thus, they propose to estimate the scale parameter after discarding $\zeta_i$'s which exceed $\pm 5\sigma$ (standard deviations of the $\zeta_i$'s). Thus, for any new data $x$, we consider that

$y = \hat{f}(x) + z$,

where $z$ is a random variable following the Laplace distribution with parameter $\sigma$.

In theory, the distribution of $\zeta$ may depend on the input $x$, but here we assume that it is free of $x$. This is similar to the model (8.1) for classification. Such an assumption works well in practice and leads to a simple model.
Figure 2: Histogram of $\zeta_i$'s from a data set and the modeling via Laplace and Gaussian distributions. The x-axis is $\zeta_i$ using five-fold CV and the y-axis is the normalized number of data in each bin of width 1.
Acknowledgments
This work was supported in part by the National Science Council of Taiwan via
the grants NSC 89-2213-E-002-013 and NSC 89-2213-E-002-106. The authors thank
Chih-Wei Hsu and Jen-Hao Lee for many helpful discussions and comments. We also
thank Ryszard Czerminski and Lily Tian for some useful comments.
References
C.-C. Chang and C.-J. Lin. Training ν-support vector classifiers: Theory and algorithms. Neural Computation, 13(9):2119-2147, 2001.

P.-H. Chen, R.-E. Fan, and C.-J. Lin. A study on SMO-type decomposition methods for support vector machines. IEEE Transactions on Neural Networks, 2006. URL http://www.csie.ntu.edu.tw/~cjlin/papers/generalSMO.pdf. To appear.

C. Cortes and V. Vapnik. Support-vector network. Machine Learning, 20:273-297, 1995.

D. J. Crisp and C. J. C. Burges. A geometric interpretation of ν-SVM classifiers. In S. Solla, T. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems, volume 12, Cambridge, MA, 2000. MIT Press.

R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research, 6:1889-1918, 2005. URL http://www.csie.ntu.edu.tw/~cjlin/papers/quadworkset.pdf.

J. Friedman. Another approach to polychotomous classification. Technical report, Department of Statistics, Stanford University, 1996. Available at http://www-stat.stanford.edu/reports/friedman/poly.ps.Z.

C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines. IEEE Transactions on Neural Networks, 13(2):415-425, 2002.

T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press.

S. Knerr, L. Personnaz, and G. Dreyfus. Single-layer learning revisited: a stepwise procedure for building and training a neural network. In J. Fogelman, editor, Neurocomputing: Algorithms, Architectures and Applications. Springer-Verlag, 1990.

U. Kreßel. Pairwise classification and support vector machines. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, pages 255-268, Cambridge, MA, 1999. MIT Press.

C.-J. Lin and R. C. Weng. Simple probabilistic predictions for support vector regression. Technical report, Department of Computer Science, National Taiwan University, 2004. URL http://www.csie.ntu.edu.tw/~cjlin/papers/svrprob.pdf.

H.-T. Lin, C.-J. Lin, and R. C. Weng. A note on Platt's probabilistic outputs for support vector machines. Technical report, Department of Computer Science, National Taiwan University, 2003. URL http://www.csie.ntu.edu.tw/~cjlin/papers/plattprob.ps.

E. Osuna, R. Freund, and F. Girosi. Training support vector machines: An application to face detection. In Proceedings of CVPR'97, pages 130-136, New York, NY, 1997a. IEEE.

E. Osuna, R. Freund, and F. Girosi. Support vector machines: Training and applications. AI Memo 1602, Massachusetts Institute of Technology, 1997b.

J. Platt. Probabilistic outputs for support vector machines and comparison to regularized likelihood methods. In A. Smola, P. Bartlett, B. Schölkopf, and D. Schuurmans, editors, Advances in Large Margin Classifiers, Cambridge, MA, 2000. MIT Press. URL citeseer.nj.nec.com/platt99probabilistic.html.

J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods - Support Vector Learning, Cambridge, MA, 1998. MIT Press.

B. Schölkopf, A. Smola, R. C. Williamson, and P. L. Bartlett. New support vector algorithms. Neural Computation, 12:1207-1245, 2000.

B. Schölkopf, J. C. Platt, J. Shawe-Taylor, A. J. Smola, and R. C. Williamson. Estimating the support of a high-dimensional distribution. Neural Computation, 13(7):1443-1471, 2001.

V. Vapnik. Statistical Learning Theory. Wiley, New York, NY, 1998.

T.-F. Wu, C.-J. Lin, and R. C. Weng. Probability estimates for multi-class classification by pairwise coupling. Journal of Machine Learning Research, 5:975-1005, 2004. URL http://www.csie.ntu.edu.tw/~cjlin/papers/svmprob/svmprob.pdf.