Machine Learning (CSO851) - Lecture 05
https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
https://math.mit.edu/~edelman/publications/support_vector.pdf
Introduction to SVM
Support vector machines (SVMs) are supervised learning methods used for classification, regression, and outlier detection.
How Does SVM Work?
• Let’s imagine we have two tags, red and blue, and our data has two features, x and y.
How Does SVM Work?
• A support vector machine takes these data points and generates a hyperplane that best separates the tags.
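As a minimal sketch of this idea (assuming scikit-learn is available; the data and variable names below are illustrative, not from the slides), we can fit a linear SVM to two-dimensional, two-class data and read off the separating hyperplane:

import numpy as np
from sklearn.svm import SVC

# Illustrative 2-D data: class "red" (label 0) and class "blue" (label 1)
rng = np.random.default_rng(0)
X_red = rng.normal([-2, -2], 0.8, (50, 2))
X_blue = rng.normal([2, 2], 0.8, (50, 2))
X = np.vstack([X_red, X_blue])
y = np.array([0] * 50 + [1] * 50)

# A linear SVM learns the hyperplane w . x + b = 0 that best separates the tags
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("hyperplane parameters w, b:", w, b)        # w . x + b = 0
print("prediction for a new point:", clf.predict([[1.5, 2.0]]))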
Linear SVM
If we have a dataset comprising observations, these span a feature space V. Here, |x| is the dimensionality of the vector of features for a given observation x.
In this feature space V, an SVM identifies the hyperplane that maximizes the distance between itself and the closest points (or sets of points) belonging to the distinct categories. If one such hyperplane exists, we say that the observations are linearly separable in the feature space V.
Separating Hyperplane & Support Vectors
If any pair of observations belongs to two different classes, then the hyperplane lies somewhere between them.
These informative observations support the identification of the decision boundary by the SVM. For this reason, we call the feature vectors located in proximity to observations of the other class “support vectors”.
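To make this concrete, here is a sketch of the standard notation (assumed here, and consistent with the distance formula d = 2/||w|| used in the next section): the decision boundary is the hyperplane w·x + b = 0, and the support vectors lie on the two parallel margin hyperplanes.

\[
w^\top x + b = 0 \quad \text{(separating hyperplane)}, \qquad
w^\top x + b = \pm 1 \quad \text{(margin hyperplanes through the support vectors)},
\]
\[
d = \frac{2}{\lVert w \rVert} \quad \text{(distance between the margin hyperplanes).}
\]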
Decision Boundary
How can we find the optimal separating hyperplane?
We know that the distance between the two parallel hyperplanes in (2.1) is d = 2/||w||. Therefore, given any N points belonging to two classes, we can formulate finding the optimal separating hyperplane as the following linearly constrained QP problem:
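What follows is a sketch of the standard hard-margin formulation, assumed here to correspond to the problem referenced below as (2.9): maximizing d = 2/||w|| is equivalent to minimizing ||w||²/2 under the separation constraints.

\[
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1, \qquad i = 1, \dots, N,
\]
where y_i ∈ {−1, +1} is the class label of observation x_i.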
Decision Boundary
We derive the dual problem of the linearly constrained QP problem (2.9), which is the one actually solved in practice. We also give conditions for judging whether a solution is optimal.
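A sketch of that dual (the standard Wolfe dual of the hard-margin problem, assumed here), obtained by introducing a Lagrange multiplier α_i ≥ 0 for each constraint:

\[
\max_{\alpha}\ \sum_{i=1}^{N} \alpha_i
- \frac{1}{2} \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j\, y_i y_j\, x_i^\top x_j
\quad \text{subject to} \quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0.
\]
At the optimum, α_i > 0 only for the support vectors, and the Karush–Kuhn–Tucker (KKT) conditions provide the optimality check mentioned above.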
Training: Parameter Estimation
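As a sketch of how the parameters are typically estimated from a dual solution α (assumed here to match the dual given above): once the optimal α is found, the primal parameters follow directly.

\[
w = \sum_{i=1}^{N} \alpha_i\, y_i\, x_i, \qquad
b = y_k - w^\top x_k \quad \text{for any support vector } x_k \ (\alpha_k > 0),
\]
and a new point is classified by f(x) = \operatorname{sign}(w^\top x + b).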
SVM With Hard Margin
When the data is linearly separable and we do not want any misclassifications, we use an SVM with a hard margin. However, when a linear boundary is not feasible, or we want to allow some misclassifications in the hope of achieving better generalization, we can opt for a soft margin for our classifier.
SVM With Soft Margin
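A sketch of the standard soft-margin formulation (assumed here): slack variables ξ_i ≥ 0 allow points to violate the margin, and a penalty parameter C controls how much such violations cost.

\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\quad \text{subject to} \quad
y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, N.
\]
A large C pushes the solution toward hard-margin behaviour, while a small C tolerates more margin violations.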
Hard Margin vs Soft Margin
• The difference between a hard margin and a soft margin in SVMs lies in the separability of the data.
• If our data is linearly separable, we go for a hard margin. However, if this is not the case, it won’t be feasible to do that. In the presence of data points that make it impossible to find a linear classifier, we would have to be more lenient and let some of the data points be misclassified. In this case, a soft-margin SVM is appropriate.
• Sometimes, the data is linearly separable, but the margin is so small that the model becomes prone to overfitting or to being too sensitive to outliers.
• Also, in this case, we can opt for a larger margin by using a soft-margin SVM in order to help the model generalize better, as sketched in the example below.
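A minimal sketch of this trade-off (assuming scikit-learn; the data and parameter values below are illustrative, not from the slides): the regularization parameter C of SVC controls the softness of the margin, and a very large C approximates a hard margin.

import numpy as np
from sklearn.svm import SVC

# Illustrative 2-D data with a small amount of class overlap
rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-1, -1], 1.0, (100, 2)),
               rng.normal([1, 1], 1.0, (100, 2))])
y = np.array([0] * 100 + [1] * 100)

# Very large C ~ hard margin: tries hard to classify every training point correctly
hard = SVC(kernel="linear", C=1e6).fit(X, y)

# Small C = soft margin: tolerates some misclassifications for a wider margin
soft = SVC(kernel="linear", C=0.01).fit(X, y)

for name, model in [("hard-ish (C=1e6)", hard), ("soft (C=0.01)", soft)]:
    margin = 2.0 / np.linalg.norm(model.coef_)      # d = 2 / ||w||
    print(name, "margin width:", margin,
          "number of support vectors:", model.n_support_.sum())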
Lagrange Multiplier
• Suppose we are given a function f(x, y, z, …) for which we want to find extrema, subject to the condition g(x, y, z, …) = k.
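A sketch of the standard method (assumed here): introduce a multiplier λ and look for stationary points of the Lagrangian.

\[
\mathcal{L}(x, y, z, \dots, \lambda) = f(x, y, z, \dots) - \lambda \bigl( g(x, y, z, \dots) - k \bigr),
\]
so the candidate extrema satisfy
\[
\nabla f = \lambda \nabla g, \qquad g(x, y, z, \dots) = k.
\]
In the SVM derivation, one multiplier α_i plays this role for each constraint y_i(w^\top x_i + b) ≥ 1, which is how the dual problem above is obtained.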