
Support Vector Machine

Support Vector Machine


• A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane.
• In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples.
SVM Classifier
• For a dataset consisting of a feature set and a label set, an SVM classifier builds a model to predict classes for new examples.
• It assigns each new example/data point to one of the classes.
• If there are only 2 classes, it is called a Binary SVM Classifier.
There are 2 kinds of SVM classifiers:
– Linear SVM Classifier
– Non-Linear SVM Classifier
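
A minimal sketch of both kinds of classifier, assuming scikit-learn is available; the toy dataset and parameter values below are illustrative, not part of the slides:

```python
# Sketch: binary SVM classification with a linear and a non-linear classifier.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical toy dataset: 2 features, 2 classes.
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
                           n_informative=2, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)   # Linear SVM Classifier
nonlinear_clf = SVC(kernel="rbf").fit(X, y)   # Non-Linear SVM Classifier (kernel trick)

# Assign a new example/data point to one of the classes.
print(linear_clf.predict([[0.5, -1.2]]))
print(nonlinear_clf.predict([[0.5, -1.2]]))
```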
SVM Linear Classifier:
• It predicts a straight (linear) hyperplane dividing the 2 classes.
• The primary focus while drawing the hyperplane is on maximizing the distance from the hyperplane to the nearest data point of either class.
• The drawn hyperplane is called a maximum-margin hyperplane.
SVM Non-Linear Classifier:
• For this, Vapnik suggested creating Non-Linear Classifiers by applying the kernel trick to maximum-margin hyperplanes.
• In Non-Linear SVM Classification, data points are mapped into a higher-dimensional space.
Examples of SVM boundaries

• Selecting the best hyperplane for our classification.
• We will show data from 2 classes.
• The classes are represented by triangles and circles.
Case 1:
• In SVM, we try to maximize the distance between the hyperplane and the nearest data point. This distance is known as the margin.
• Since the 1st decision boundary maximizes the distance between the classes on the left and right, our maximum-margin hyperplane will be the 1st one.
Case 2:
• Data is not evenly distributed on the left and right.
• SVM tries to find the maximum-margin hyperplane, but gives first priority to correct classification.
• The 1st decision boundary separates some of the triangles from the circles, but not all of them. It does not even show a good margin.
• The 2nd decision boundary separates the data points similarly to the 1st boundary, but here the margin between the boundary and the data points is larger than in the previous case.
• The 3rd decision boundary separates all the triangles from all the circles.
• So, SVM will select the 3rd hyperplane.
Case 3:
• Data is not evenly distributed on the left and right.
• In the real world, we may find a few values that correspond to extreme cases, i.e., exceptions. These exceptions are known as outliers.
• SVMs have the capability to detect and ignore outliers.
• While selecting the hyperplane, SVM will automatically ignore these outliers and select the best-performing hyperplane.
• The 1st and 2nd decision boundaries both separate the classes, but the 1st decision boundary shows the maximum margin between the boundary and the support vectors.
Case 4:
• Non-linear classifiers.
• The figure shows data that can't be separated by any straight line, i.e., the data is not linearly separable.
• SVM uses a Non-Linear classifier.
• The figure shows a decision boundary separating both classes. This decision boundary resembles a parabola.
Linear Support Vector Machine Classifier
• A data point is considered as a p-dimensional vector (a list of p numbers), and we separate points using a (p-1)-dimensional hyperplane.
• Many such hyperplanes are possible.
• The best hyperplane is considered to be the one which maximizes the margin, i.e., the distance between the hyperplane and the closest data point of either class.
• The maximum-margin hyperplane is determined by the data points that lie nearest to it. These data points which influence our hyperplane are known as support vectors.
Support Vectors

[Figure: training points of the two classes, one marker denoting +1 and the other denoting -1, with the margin resting against a few boundary points.]

• Support Vectors are those data points that the margin pushes up against.
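
With scikit-learn (an assumption about tooling, not part of the slides), the support vectors of a fitted linear SVM can be inspected directly; the points and labels below are made up for illustration:

```python
# Sketch: inspecting the support vectors of a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

# Hypothetical linearly separable points, labels +1 and -1.
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e5).fit(X, y)   # large C approximates a hard margin

# The data points that the margin "pushes up against".
print(clf.support_vectors_)
```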
Selecting the SVM Hyperplanes
Linearly Separable:
• Select two parallel hyperplanes that separate the two classes of data, so that the distance between them is maximum.
• The region between these two hyperplanes is known as the "margin".
• The maximum-margin hyperplane is the one that lies in the middle of them.
Linear SVM
• We are given a training dataset of n points of the form (x1, y1), ..., (xn, yn)
– where yi denotes the class and xi denotes the features.
– yi is either 1 or -1, indicating the class to which the point xi belongs. Each xi is a p-dimensional real vector.
• We want to find the "maximum-margin hyperplane" that divides the group of points xi for which yi = 1 from the group of points for which yi = -1, and which is defined so that the distance between the hyperplane and the nearest point xi from either group is maximized.
• Any hyperplane can be written as the set of points x satisfying
w.x - b = 0
Linear SVM [cont'd]
• As we also have to prevent data points from falling into the margin, we add the following constraint: for each i, either
w.xi - b >= 1 (if yi = 1), or
w.xi - b <= -1 (if yi = -1),
• where w is the normal vector to the hyperplane.
• The distance between the two parallel hyperplanes w.x - b = 1 and w.x - b = -1 is 2 / ||w||. To maximize this distance, the denominator should be minimized, i.e., ||w|| should be minimized.
Linear SVM [cont'd]
• For proper classification, the two constraints can be combined into a single equation:
yi (w.xi - b) >= 1, for all 1 <= i <= n.
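
A small numpy/scikit-learn sketch (the data and the large-C setting are illustrative assumptions) that recovers w and b from a fitted linear SVM, checks the combined constraint yi (w.xi - b) >= 1, and computes the margin 2 / ||w||:

```python
# Sketch: margin width and constraint check for a (near) hard-margin linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin

w = clf.coef_[0]            # normal vector w of the hyperplane
b = -clf.intercept_[0]      # sklearn's decision function is w.x + intercept; slides use w.x - b

# Combined constraint: yi * (w.xi - b) >= 1 for every training point (up to tolerance).
print(y * (X @ w - b) >= 1 - 1e-6)

# Distance between the two parallel hyperplanes: 2 / ||w||.
print(2.0 / np.linalg.norm(w))
```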
Non Linear SVM
• Vapnik proposed Non-Linear Classifiers in 1992.
• They are used for data points that are not linearly separable in a p-dimensional (finite) space.
• To solve this, it was proposed to map the p-dimensional space into a much higher-dimensional space.
• We can draw customized/non-linear hyperplanes using the kernel trick.
• Every kernel holds a non-linear kernel function.
• The kernel makes the calculation process faster and easier, especially when the feature vector is of very high dimension.
• A kernel is a shortcut that helps us do certain calculations faster which would otherwise involve computations in a higher-dimensional space.
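
As a sketch of the kernel trick in practice (scikit-learn is assumed; make_circles is simply a convenient non-linearly-separable toy set, not from the slides):

```python
# Sketch: kernel trick on data that is not linearly separable in the original space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)   # no straight line separates concentric circles
rbf_clf = SVC(kernel="rbf").fit(X, y)         # kernel implicitly maps into a higher-dimensional space

print("linear accuracy:", linear_clf.score(X, y))
print("rbf accuracy:   ", rbf_clf.score(X, y))
```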
Kernel Functions
Polynomial (homogeneous) Kernel:
• The polynomial kernel function can be represented by the expression
k(xi, xj) = (xi . xj)^d
• where k(xi, xj) is the kernel function, xi & xj are vectors of the feature space, and d is the degree of the polynomial function.
Kernel Functions
Polynomial (non-homogeneous) Kernel:
• Here a constant term is also added:
k(x, y) = (x . y + c)^d
• The constant term "c" is also known as a free parameter.
• It influences the combination of features.
• x & y are vectors of the feature space.
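
A numpy sketch of both polynomial kernel forms (the helper name and sample vectors are mine; setting c = 0 recovers the homogeneous case):

```python
# Sketch: polynomial kernel, k(x, y) = (x . y + c) ** d.
import numpy as np

def polynomial_kernel(x, y, d=2, c=0.0):
    """Homogeneous when c == 0, non-homogeneous when c > 0 (c is the free parameter)."""
    return (np.dot(x, y) + c) ** d

x = np.array([1.0, 2.0])
y = np.array([0.5, -1.0])

print(polynomial_kernel(x, y, d=2, c=0.0))   # homogeneous
print(polynomial_kernel(x, y, d=2, c=1.0))   # non-homogeneous
```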
Kernel Functions
Radial Basis Function Kernel:
• It is also known as the RBF kernel:
K(x, x') = exp(-γ ||x - x'||²)
• where x & x' are vectors of the feature space.
• γ is a free parameter.
• Selection of the parameter is a critical choice; a poorly chosen value can lead to overfitting the data.
• For the distance metric, the squared Euclidean distance is used here.
• It is used to draw completely non-linear hyperplanes.
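
A numpy sketch of the RBF kernel using the squared Euclidean distance (the helper name, gamma value, and sample vectors are assumptions for illustration):

```python
# Sketch: RBF kernel, K(x, x') = exp(-gamma * ||x - x'||^2).
import numpy as np

def rbf_kernel(x, x_prime, gamma=0.5):
    squared_distance = np.sum((x - x_prime) ** 2)   # squared Euclidean distance
    return np.exp(-gamma * squared_distance)

x = np.array([1.0, 2.0])
x_prime = np.array([2.0, 0.0])

print(rbf_kernel(x, x_prime, gamma=0.5))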
Non-Linearly Separable
• To build a classifier for data that is not linearly separable, we try to minimize
(1/n) Σi max(0, 1 - yi (w.xi - b)) + λ ||w||²
• Here, the max() term will be zero (0) if xi is on the correct side of the margin.
• For data that is on the wrong side of the margin, the function's value is proportional to the distance from the margin.
• The parameter λ determines the tradeoff between increasing the margin size and ensuring that each xi is on the correct side of the margin.
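
A numpy sketch of this soft-margin objective (the function name, lambda value, weights, and data points are placeholders for illustration):

```python
# Sketch: soft-margin SVM objective = average hinge loss + lambda * ||w||^2.
import numpy as np

def soft_margin_objective(w, b, X, y, lam=0.01):
    margins = y * (X @ w - b)                  # yi * (w.xi - b)
    hinge = np.maximum(0.0, 1.0 - margins)     # zero when the point is on the correct side
    return hinge.mean() + lam * np.dot(w, w)   # tradeoff controlled by lambda

# Hypothetical weights and data.
X = np.array([[1.0, 2.0], [2.0, 1.0], [-1.0, -1.5]])
y = np.array([1, 1, -1])
print(soft_margin_objective(np.array([0.5, 0.5]), 0.0, X, y))
```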
Advantages of SVM Classifier:
• SVMs are effective when the number of features is quite large.
• They work effectively even if the number of features is greater than the number of samples.
• Non-linear data can also be classified, using customized hyperplanes built with the kernel trick.
• It is a robust model for solving prediction problems, since it maximizes the margin.
Disadvantages of SVM Classifier:
• The biggest limitation of the Support Vector Machine is the choice of the kernel. The wrong choice of kernel can lead to an increase in error percentage.
• With a greater number of samples, it starts to give poor performance.
• SVMs have good generalization performance, but they can be extremely slow in the test phase.
• SVMs have high algorithmic complexity and extensive memory requirements due to the use of quadratic programming.
SVM Applications:

• Facial expression classification: SVMs can be used to classify facial expressions, combining statistical models of shape with SVMs.
• Speech recognition: SVMs are used to accept keywords and reject non-keywords, and thereby build a model to recognize speech.
• Handwritten digit recognition: Support vector classifiers can be applied to the recognition of isolated, optically scanned handwritten digits.
• Text categorization: In information retrieval, categorization of documents using labels can be done by SVMs.
