
SUPPORT VECTOR MACHINES

FOR CLASSIFICATION

A Seminar on Data Mining

Roshan P Koshy
210CS3058
National Institute of Technology,
Rourkela, Orissa



Introduction

Empirical Data Modelling:

1 Observations of the system are used for induction.
2 A model is built based on that induction.
3 The model is used to deduce the response of the system being modelled.

Quantity and quality of the data constrain the model:
1 Data is finite and sampled.
2 Sampling is typically not uniform.
3 The problem is high-dimensional in nature; hence the data forms a sparse
distribution of the domain.



Support Vector Machine

Invented by Vladimir Vapnik.


A set of related supervised learning methods that analyze data and
recognize patterns, used for classification and regression analysis.
The standard SVM is a non-probabilistic binary linear classifier, i.e.
it predicts, for each given input, which of two possible classes the
input is a member of.
An algorithm that builds a model that predicts whether a new
example falls into one category or the other.
Intuitively, an SVM model is a representation of the examples as
points in space, mapped so that the examples of the separate
categories are divided by a clear gap that is as wide as possible. New
examples are then mapped into that same space and predicted to
belong to a category based on which side of the gap they fall on.



Support Vector Machine - Motivation

Classifying data and predicting the class of an object are common tasks in
machine learning.
A data point is viewed as a p-dimensional vector (a list of p numbers),
and we want to know whether we can separate such points with a
(p−1)-dimensional hyperplane. There are many hyperplanes that might
classify the data.
One reasonable choice as the best hyperplane is the one that
represents the largest separation, or margin, between the two classes.
So we choose the hyperplane so that the distance from it to the
nearest data point on each side is maximized.
Maximum-Margin Hyperplane: the hyperplane that represents the largest
separation, or margin, between the two classes; the linear
classifier it defines is known as a maximum-margin
classifier.


Structural risk minimization (SRM)
Introduced by Vapnik and Chervonenkis in 1974.
An inductive principle for model selection used for learning from
finite training data sets. It describes a general model of capacity
control and provides a trade-off between hypothesis space
complexity (the VC dimension of approximating functions) and the
quality of fitting the training data (empirical error).
Working:
1 Divide the class of functions into a hierarchy of nested subsets in
order of increasing complexity, for example, polynomials of
increasing degree.
2 Perform empirical risk minimization on each subset (this is
essentially parameter selection).
3 Select the model in the series whose sum of empirical risk and VC
confidence is minimal (a sketch follows below).
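A minimal sketch of this selection loop in Python, assuming a 1-D polynomial fitting task and a simple placeholder penalty standing in for the VC confidence term (the exact term depends on the VC bound used; everything here is invented for illustration):

import numpy as np

# Toy 1-D regression data (assumed purely for illustration).
rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 30)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(30)

def empirical_risk(degree):
    """Mean squared error of a least-squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

def capacity_penalty(degree, n=len(x)):
    """Placeholder for the VC confidence term: grows with model complexity,
    shrinks with sample size. Not the exact bound from SRM theory."""
    h = degree + 1  # number of free parameters, used as a rough capacity measure
    return np.sqrt(h * (np.log(2 * n / h) + 1) / n)

# SRM-style selection: pick the degree minimizing empirical risk + capacity penalty.
scores = {d: empirical_risk(d) + capacity_penalty(d) for d in range(1, 10)}
best_degree = min(scores, key=scores.get)
print("selected degree:", best_degree)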



Support Vector Machine – Working – Linear Case
Let a 2-dimensional data set D be given as
(X1 , y1 ), (X2 , y2 ), . . . , (X|D| , y|D| ), where each Xi is a training tuple
with an associated class label yi . Each yi can take one of two values,
either +1 or −1 (i.e., yi ∈ {+1, −1}).
Let's consider an example based on two input attributes, A1 and A2 .
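For concreteness, such a data set might be represented as follows (the values are invented purely for illustration):

import numpy as np

# Hypothetical training tuples X_i = (x1, x2) over attributes A1, A2.
X = np.array([[2.5, 3.5],
              [3.0, 3.5],
              [3.5, 4.0],   # class +1 tuples
              [0.5, 1.0],
              [1.0, 0.5],
              [1.5, 1.5]])  # class -1 tuples
y = np.array([+1, +1, +1, -1, -1, -1])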



Support Vector Machine – Working – Linear Case

The hyperplane with the larger margin is expected to be more accurate at
classifying future data tuples than the hyperplane with the smaller
margin. Therefore the SVM searches for the hyperplane with the
largest margin, that is, the maximum marginal hyperplane (MMH). The
associated margin gives the largest separation between classes.
Margin: The shortest distance from a hyperplane to one side of its
margin is equal to the shortest distance from the
hyperplane to the other side of its margin, where the
"sides" of the margin are parallel to the hyperplane. When
dealing with the MMH, this distance is the shortest
distance from the MMH to the closest training tuple of
either class.



Support Vector Machine – Working – Linear Case
A separating hyperplane can be written as

W.X + b = 0 (1)

where W is a weight vector, namely, W = {w1 , w2 , . . . , wn }, where n is
the number of attributes, and b is a scalar, often referred to as a bias.
Training tuples are 2-D, e.g., X = (x1 , x2 ), where x1 and x2 are the values
of attributes A1 and A2 , respectively, for X. If we think of b as an
additional weight, w0 , we can rewrite the above separating hyperplane as

w0 + w1 x1 + w2 x2 = 0 (2)

Thus, any point that lies above the separating hyperplane satisfies

w0 + w1 x1 + w2 x2 > 0 (3)

Similarly, any point that lies below the separating hyperplane satisfies

w0 + w1 x1 + w2 x2 < 0 (4)
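A small sketch of Equations (2)–(4) in Python, with w0, w1, w2 chosen by hand purely for illustration (not learned from data):

w0, w1, w2 = -5.0, 1.0, 1.0   # hypothetical bias and weights

def side_of_hyperplane(x1, x2):
    """Evaluate w0 + w1*x1 + w2*x2; > 0 means above the hyperplane, < 0 below."""
    return w0 + w1 * x1 + w2 * x2

print(side_of_hyperplane(3.5, 4.0))  # positive: lies above the hyperplane
print(side_of_hyperplane(1.0, 0.5))  # negative: lies below the hyperplane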


Support Vector Machine – Working – Linear Case

The weights can be adjusted so that the hyperplanes defining the "sides"
of the margin can be written as

H1 : w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and (5)

H2 : w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1. (6)

That is, any tuple that falls on or above H1 belongs to class +1, and any
tuple that falls on or below H2 belongs to class −1. Combining the two
inequalities of Equations (5) and (6), we get

yi (w0 + w1 x1 + w2 x2 ) ≥ 1, ∀i. (7)
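The combined constraint of Equation (7) is easy to verify numerically. A small sketch, using hand-picked (not learned) weights and an invented toy data set:

import numpy as np

w0, w = -5.0, np.array([1.0, 1.0])       # hypothetical bias and weight vector
X = np.array([[2.5, 3.5], [3.5, 4.0],    # class +1 tuples
              [1.0, 0.5], [1.5, 2.5]])   # class -1 tuples
y = np.array([+1, +1, -1, -1])

# Equation (7): every training tuple must satisfy y_i * (w0 + W . X_i) >= 1.
margins = y * (w0 + X @ w)
print(margins)                 # tuples with value exactly 1 lie on H1 or H2
print(np.all(margins >= 1))    # True: these weights separate the data with margin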



Support Vector Machine – Working – Linear Case

Support Vector and Margin:


Any training tuples that fall on hyperplanes H1 or H2 satisfy Equation (7)
with equality and are called support vectors. They are equally close to
the MMH.
Now we can obtain a formula for the size of the maximal margin. The
distance from the separating hyperplane to any point on H1 is 1/||W||,
where ||W|| is the Euclidean norm of W, that is √(W.W).
If W = {w1 , w2 , . . . , wn }, then √(W.W) = √(w1² + w2² + · · · + wn²).
By definition, this is equal to the distance from any point on H2 to the
separating hyperplane. Therefore, the maximal margin is 2/||W||.
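The quantities 1/||W|| and 2/||W|| can be computed directly; a quick sketch with the same hypothetical weight vector used above:

import numpy as np

w = np.array([1.0, 1.0])        # hypothetical weight vector W
norm_w = np.sqrt(w @ w)         # Euclidean norm, sqrt(W . W)
print("distance from hyperplane to H1:", 1 / norm_w)
print("maximal margin 2/||W||:", 2 / norm_w)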



Support Vector Machine – Working – Linear Case

Support Vector and Margin:

(Figure: the maximum marginal hyperplane, its margin, and the support vectors.)


Support Vector Machine – Working – Linear Case

The next step is to rewrite Equation (7) so that it becomes what is
known as a constrained quadratic optimization problem, which can be
solved using a Lagrangian formulation together with the
Karush-Kuhn-Tucker (KKT) conditions. For linearly separable data, the
support vectors are a subset of the actual training tuples.
Once we have found the support vectors and the MMH, we have a trained
support vector machine.
The MMH can be rewritten as the decision boundary

d(XT) = Σ_{i=1..l} yi αi (Xi . XT) + b0 (8)

where yi is the class label of support vector Xi ; XT is a test tuple; αi and
b0 are numeric parameters that were determined automatically by the
optimization; the αi are known as Lagrangian multipliers; and l is the number
of support vectors.
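Equation (8) can be written out directly; a minimal sketch, assuming hypothetical support vectors, labels, Lagrangian multipliers αi, and bias b0 (in practice these come out of the quadratic optimization, not from hand-picked values like these):

import numpy as np

# Hypothetical support vectors, their labels, multipliers, and bias.
support_vectors = np.array([[2.5, 3.5], [1.5, 2.5]])
sv_labels = np.array([+1, -1])
alphas = np.array([0.5, 0.5])
b0 = -1.5

def decision(x_test):
    """d(XT) = sum_i y_i * alpha_i * (X_i . XT) + b0  (Equation 8)."""
    return np.sum(sv_labels * alphas * (support_vectors @ x_test)) + b0

print(decision(np.array([3.0, 4.0])))   # positive => predicted class +1
print(decision(np.array([1.0, 1.0])))   # negative => predicted class -1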


Support Vector Machine – Working – Linear Case

Test the model:

Given a test tuple, XT, we plug it into Equation (8) and then check
the sign of the result. This tells us on which side of the hyperplane
the test tuple falls. If the sign is positive, then XT falls on or above the
MMH, and so the SVM predicts that XT belongs to class +1. If the sign
is negative, then XT falls on or below the MMH and the class prediction
is −1.
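In code, this check is just the sign of the decision value; a minimal sketch with assumed decision values (for example, produced by Equation (8)):

import numpy as np

# Hypothetical decision values d(XT) for three test tuples.
d_values = np.array([2.0, -0.5, 0.0])
predictions = np.where(d_values >= 0, +1, -1)
print(predictions)   # [ 1 -1  1 ]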



Support Vector Machine – Working – Non-Linear Case
Two main steps to classify non-linear data:
1 Transform the original input data into a higher-dimensional space using a
nonlinear mapping. Several common nonlinear mappings can be used in
this step.
2 Search for a linear separating hyperplane in the new space. We again
end up with a quadratic optimization problem that can be solved using
the linear SVM formulation. The maximal marginal hyperplane found in
the new space corresponds to a nonlinear separating hypersurface in the
original space.
However, the above steps can be carried out using the "kernel trick":
the dot product of two mapped tuples, Φ(Xi ) . Φ(Xj ), can be replaced
by K(Xi , Xj ), where Φ(X) is simply the nonlinear mapping function
applied to transform the training tuples.

K (Xi , Xj ) = Φ(Xi ) . Φ(Xj )

After applying this trick, we can then proceed to find a maximal
separating hyperplane.
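The equivalence K(Xi, Xj) = Φ(Xi) . Φ(Xj) can be checked numerically for a simple case; a sketch assuming the degree-2 polynomial kernel and its explicit feature map on 2-D inputs (both chosen here only as an example):

import numpy as np

def phi(x):
    """Explicit feature map whose dot product equals the degree-2 polynomial kernel."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

def poly_kernel(xi, xj):
    """K(Xi, Xj) = (Xi . Xj + 1)^2, computed without mapping to the new space."""
    return (np.dot(xi, xj) + 1) ** 2

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(np.dot(phi(xi), phi(xj)))   # dot product in the transformed space
print(poly_kernel(xi, xj))        # same value (up to rounding), via the kernel trick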
Support Vector Machine – Working – Non-Linear Case

Three admissible kernel functions include:

1 Polynomial kernel of degree h : K(Xi , Xj ) = (Xi . Xj + 1)^h

2 Gaussian radial basis function kernel : K(Xi , Xj ) = e^(−||Xi − Xj ||² / 2σ²)

3 Sigmoid kernel : K(Xi , Xj ) = tanh(κ Xi . Xj − δ)
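Written as functions, with σ, κ, and δ treated as hyperparameters (the default values below are placeholders, not recommendations):

import numpy as np

def polynomial_kernel(xi, xj, h=3):
    """K(Xi, Xj) = (Xi . Xj + 1)^h"""
    return (np.dot(xi, xj) + 1) ** h

def gaussian_rbf_kernel(xi, xj, sigma=1.0):
    """K(Xi, Xj) = exp(-||Xi - Xj||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(xi, xj, kappa=0.5, delta=1.0):
    """K(Xi, Xj) = tanh(kappa * Xi . Xj - delta)"""
    return np.tanh(kappa * np.dot(xi, xj) - delta)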



Weka and Support Vector Machine

Weka provides two SVM implementations, SMO and LibSVM. Here we test
with SMO, using a linear kernel.
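Weka's SMO is run from the Weka GUI or its Java API; as a rough analogue for readers without Weka, the following hedged sketch trains a linear-kernel SVM with scikit-learn on a stand-in data set (assuming scikit-learn is installed; this is not the data set used in the seminar):

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Stand-in two-class data set, generated only for demonstration.
X, y = make_blobs(n_samples=100, centers=2, random_state=0)

model = SVC(kernel="linear")   # linear kernel, broadly comparable to SMO with a linear kernel
model.fit(X, y)
print("number of support vectors:", len(model.support_vectors_))
print("training accuracy:", model.score(X, y))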





References

Jiawei Han and Micheline Kamber, Data Mining: Concepts and
Techniques, Second Edition.
Structural Risk Minimization, http://www.svms.org/srm/
Steve R. Gunn, Support Vector Machines for Classification and
Regression.
Wikipedia, http://en.wikipedia.org/

