
Support Vector Machines

Machine Learning (CSO851)


Acknowledgement

Duda, Hart et al.


https://scikit-learn.org/stable/modules/svm.html
https://www.baeldung.com/cs/ml-support-vector-machines
https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
https://math.mit.edu/~edelman/publications/support_vector.pdf
Introduction to SVM
Support vector machines (SVMs) are supervised learning methods used for classification, regression and outlier detection.

The advantages of support vector machines are:


• Effective in high-dimensional spaces.
• Effective in cases where the number of dimensions is greater than the number of samples.
• Uses a subset of training points in the decision function (called support vectors), so it is also memory efficient.
• Versatile: different kernel functions can be specified for the decision function. Common kernels are provided, but it is also possible to specify custom kernels.
The disadvantages of support vector machines include:
• If the number of features is much greater than the number of samples, avoiding over-fitting when choosing kernel functions and the regularization term is crucial.
• SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation (see the sketch below).

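A minimal scikit-learn sketch of these points; the toy data, kernel and parameter values are illustrative only:

from sklearn.svm import SVC

# Toy 2-D training set with two classes (all values are illustrative only)
X = [[0.0, 0.2], [0.3, 0.1], [0.5, 0.4], [0.1, 0.6], [0.4, 0.5],
     [2.0, 1.8], [2.3, 2.1], [1.9, 2.4], [2.5, 2.2], [2.1, 1.9]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# probability=True triggers the extra cross-validation mentioned above,
# which is what makes probability estimates expensive to obtain
clf = SVC(kernel="rbf", C=1.0, probability=True)
clf.fit(X, y)

print(clf.support_vectors_)              # only a subset of the training points
print(clf.predict([[1.5, 1.5]]))         # predicted class label
print(clf.predict_proba([[1.5, 1.5]]))   # calibrated class probabilities
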
Basic Features of SVM

– Optimal hyperplane for linearly separable patterns
– Extend to patterns that are not linearly separable by transforming the original data to map it into a new space – the kernel function

How Does SVM Work?
• Let's imagine we have two tags, red and blue, and our data has two features, x and y.
• We want a classifier that, given a pair of (x, y) coordinates, outputs whether it is red or blue.
• We plot our already labeled training data on a plane.

How Does SVM Work?
• A support vector machine takes these data points and generates the hyperplane that best separates the tags.
• This line is the decision boundary: anything that falls on one side of it we classify as blue, and anything that falls on the other side as red (see the sketch below).

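A small sketch of this idea, assuming scikit-learn and NumPy are available; the generated data and all parameter values are illustrative:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Two illustrative 2-D "tags": blue clustered around (0, 0), red around (3, 3)
blue = rng.normal(loc=0.0, scale=0.7, size=(20, 2))
red = rng.normal(loc=3.0, scale=0.7, size=(20, 2))
X = np.vstack([blue, red])
y = np.array([0] * 20 + [1] * 20)

# A linear kernel gives a straight-line decision boundary in two dimensions
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The decision boundary is the line w . x + b = 0
w = clf.coef_[0]
b = clf.intercept_[0]
print(f"decision boundary: {w[0]:.2f}*x + {w[1]:.2f}*y + {b:.2f} = 0")
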
How Does SVM Work?

What exactly is the best hyperplane?

For SVM, it is the one that maximizes the margin from both tags. In other words: the hyperplane (remember, it is a line in this case) whose distance to the nearest element of each tag is the largest.

Linear SVM
If we have a dataset comprising N observations, these span a feature space V. Here, |x| denotes the dimensionality of the feature vector of a given observation x.

In this feature space V, an SVM identifies the hyperplane that maximizes the distance between itself and the closest points, or sets of points, belonging to distinct categories. If one such hyperplane exists, we can say that the observations are linearly separable in the feature space V.
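In symbols (standard notation, with w the normal vector to the hyperplane, b its offset, and labels y_i in {-1, +1}):

w^T x + b = 0                                     (the separating hyperplane)
y_i (w^T x_i + b) > 0   for all i = 1, ..., N     (linear separability)
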
Separating Hyperplane & Support Vectors

If any pair of observations belongs to two different classes, then the hyperplane lies somewhere between them.

These informative observations support the identification of the decision boundary by the SVM. For this reason, we call the feature vectors located in proximity to observations of the other class "support vectors".

Decision Boundary
How can we find the optimal separating hyperplane?

We know that the distance between the two parallel hyperplanes in (2.1) is d = 2/||w||. Therefore, given any N points belonging to two classes, we can formulate finding the optimal separating hyperplane as the following linearly constrained QP problem:
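In the standard hard-margin form (with labels y_i in {-1, +1}), this QP reads:

minimize over w, b:   (1/2) ||w||^2
subject to:           y_i (w^T x_i + b) >= 1,   i = 1, ..., N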

Decision Boundary
We derive the dual problem to the linearly constrained QP problem (2.9), which is the one that is actually computed in practice. We also give conditions for judging whether a solution is optimal.
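In the standard derivation, introducing a Lagrange multiplier alpha_i >= 0 for each constraint, the dual problem is:

maximize over alpha:   sum_i alpha_i  -  (1/2) sum_i sum_j alpha_i alpha_j y_i y_j (x_i^T x_j)
subject to:            alpha_i >= 0 for all i,   and   sum_i alpha_i y_i = 0

A solution is optimal when the KKT complementarity conditions hold, in particular alpha_i [ y_i (w^T x_i + b) - 1 ] = 0 for every i.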

Training: Parameter Estimation
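In the standard setting, once the dual solution alpha is known, the model parameters can be recovered as follows (the support vectors are exactly the training points with alpha_i > 0):

w = sum_i alpha_i y_i x_i
b = y_k - w^T x_k        for any support vector x_k (a point with alpha_k > 0)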

SVM With Hard Margin
When the data is linearly separable and we don't want to have any misclassifications, we use an SVM with a hard margin. However, when a linear boundary is not feasible, or we want to allow some misclassifications in the hope of achieving better generalization, we can opt for a soft margin for our classifier.

SVM With Hard Margin

Without allowing any misclassifications in the hard-margin SVM, we want to maximize the distance between the two hyperplanes. To find this distance, we can use the formula for the distance of a point from a plane. So the distances of the blue points and the red point from the black line would respectively be:
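The specific blue and red points appear in the slide's figure, but the underlying formula is the standard point-to-hyperplane distance:

distance(x_0, {x : w^T x + b = 0}) = |w^T x_0 + b| / ||w||

For support vectors lying on the margin hyperplanes w^T x + b = +1 and w^T x + b = -1, this distance is 1/||w||, so the total margin between the two hyperplanes is d = 2/||w||, as used above.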

SVM With Soft Margin
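The soft-margin formulation commonly takes the following standard form, introducing slack variables xi_i and a penalty parameter C (the symbols here are the usual ones):

minimize over w, b, xi:   (1/2) ||w||^2  +  C * sum_i xi_i
subject to:               y_i (w^T x_i + b) >= 1 - xi_i,   xi_i >= 0,   i = 1, ..., N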

Hard Margin vs Soft Margin
• The difference between a hard margin and a soft margin in SVMs lies in the separability of the data.
• If our data is linearly separable, we go for a hard margin. However, if this is not the case, it won't be feasible to do that. In the presence of data points that make it impossible to find a linear classifier, we have to be more lenient and let some of the data points be misclassified. In this case, a soft-margin SVM is appropriate.
• Sometimes the data is linearly separable, but the margin is so small that the model becomes prone to overfitting or being too sensitive to outliers.
• In this case too, we can opt for a larger margin by using a soft-margin SVM to help the model generalize better (see the sketch below).

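A small scikit-learn sketch of this trade-off, assuming scikit-learn and NumPy are available; the data and the two C values are illustrative only (in scikit-learn's SVC, a smaller C gives a softer, wider margin):

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Two overlapping classes, so a hard margin would not be feasible
class_a = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
class_b = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X = np.vstack([class_a, class_b])
y = np.array([0] * 50 + [1] * 50)

# Small C: soft margin, tolerates misclassifications, wider margin.
# Large C: approaches a hard margin, penalizes misclassifications heavily.
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])   # d = 2/||w||
    print(f"C={C}: margin width = {margin:.3f}, "
          f"support vectors = {len(clf.support_vectors_)}")
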
Lagrange Multiplier
• Suppose we are given a function f(x, y, z, ...) for which we want to find extrema, subject to the condition g(x, y, z, ...) = k.
• The idea used in the Lagrange multiplier method is that the gradient of the objective function f lines up either in a parallel or anti-parallel direction to the gradient of the constraint g at an optimal point.
• In such a case, one of the gradients should be some multiple of the other, as written below. Let's see this using an example.
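
Stated compactly (standard optimality condition, with lambda the Lagrange multiplier):

grad f(x, y, z, ...) = lambda * grad g(x, y, z, ...),   with   g(x, y, z, ...) = k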

Lagrange Multiplier

Using the Lagrange multiplier, we solve it in the following way:
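A minimal worked instance (the specific function here is chosen purely for illustration): maximize f(x, y) = x*y subject to g(x, y) = x + y = k.

grad f = (y, x),   grad g = (1, 1)
grad f = lambda * grad g   gives   y = lambda and x = lambda,   so   x = y
substituting into x + y = k   gives   x = y = k/2,   and the maximum value is   f = k^2 / 4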
