SVM_Presentation
Support Vector Machines (SVM)
Maximum Margin Hyperplane and Core Concepts
What is SVM?
A Support Vector Machine is a supervised learning method that classifies data by
finding the hyperplane that separates the classes with the largest margin.
SVM - Margins and Support Vectors
Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple with
its associated class label yi
There are infinitely many lines (hyperplanes) separating the two classes, but we want
to find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the Maximum
Margin Hyperplane (MMH)
Support Vectors and Hyperplane
SVM - Linearly Separable
A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, …, wn} is a weight vector and b is a scalar (bias)
For 2-D data (writing the bias b as w0) it can be written as
w0 + w1 x1 + w2 x2 = 0
The hyperplanes defining the sides of the margin:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ −1 for yi = −1
Any training tuples that fall on hyperplanes H1 or H2 (i.e., the
sides defining the margin) are support vectors
This becomes a constrained (convex) quadratic optimization
problem: a quadratic objective function with linear constraints,
solvable by Quadratic Programming (QP) using Lagrange multipliers
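To make the linearly separable case concrete, here is a minimal sketch (not from
the slides) that fits a linear SVM with scikit-learn on made-up toy data and reads
off W, b, and the support vectors; a large C approximates the hard-margin setting
described above.

import numpy as np
from sklearn.svm import SVC

# Toy 2-D data, assumed for illustration only
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 3.0],   # class +1
              [6.0, 5.0], [7.0, 8.0], [8.0, 6.0]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)  # large C ~ hard margin
clf.fit(X, y)

W = clf.coef_[0]        # weight vector W = {w1, ..., wn}
b = clf.intercept_[0]   # bias b
print("W =", W, " b =", b)
print("support vectors (tuples on H1/H2):")
print(clf.support_vectors_)
print("margin width = 2/||W|| =", 2 / np.linalg.norm(W))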
Mathematical Formulation
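The slide title stands alone here; in the slide's notation, the optimization
problem described above is usually stated as the following primal QP. Maximizing
the margin 2/‖W‖ between H1 and H2 is equivalent to minimizing ‖W‖²/2:

\[
\min_{W,\,b}\ \tfrac{1}{2}\lVert W\rVert^{2}
\quad \text{s.t.} \quad y_i\,(W \cdot X_i + b) \ge 1,\quad i = 1,\dots,|D|
\]

Introducing one Lagrange multiplier αi per training tuple gives the dual:

\[
\max_{\alpha}\ \sum_{i}\alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j\, y_i y_j\,(X_i \cdot X_j)
\quad \text{s.t.} \quad \alpha_i \ge 0,\ \ \sum_{i}\alpha_i y_i = 0
\]

The tuples with αi > 0 are exactly the support vectors, and the data enters only
through the dot products Xi · Xj, which is what the kernel trick below exploits.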
Non-linear Data and Kernel Trick
[Figure: data plotted on original attributes A1 and A2 is not linearly separable
there but becomes separable after mapping into a higher-dimensional space.]
Kernel Trick
The kernel trick is used in SVMs when the data is not
linearly separable in its original feature space.
It allows the algorithm to operate in a higher-dimensional
space without explicitly computing the coordinates of the
data in that space.
The goal is to transform the data into a higher-dimensional
space where a linear separation (hyperplane) is possible,
making it feasible to classify complex, non-linear data
distributions even when the original space offers no linear
separator.
Kernel Trick …
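As a minimal sketch of this idea (not from the slides, using scikit-learn's
make_circles as an assumed stand-in for non-linear data): a linear SVM fails in
the original 2-D space, an explicit mapping φ(x1, x2) = (x1, x2, x1² + x2²) makes
the data separable by a plane, and an RBF-kernel SVM reaches the same result
without ever computing φ.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)          # original 2-D space
print("linear kernel accuracy:   ", linear.score(X, y))   # poor

X3 = np.c_[X, (X ** 2).sum(axis=1)]              # explicit mapping phi
lifted = SVC(kernel="linear").fit(X3, y)
print("explicit mapping accuracy:", lifted.score(X3, y))  # near 1.0

rbf = SVC(kernel="rbf").fit(X, y)                # kernel trick: no phi needed
print("RBF kernel accuracy:      ", rbf.score(X, y))      # near 1.0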
SVM: Different Kernel Functions
Instead of computing the dot product on the transformed data, it
is mathematically equivalent to apply a kernel function K(Xi, Xj)
to the original data, i.e., K(Xi, Xj) = Φ(Xi) · Φ(Xj)
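A quick numeric check of this identity (a made-up example, not from the slides):
for the degree-2 polynomial kernel K(Xi, Xj) = (Xi · Xj)² on 2-D data, the
matching transformation is Φ(x) = (x1², √2·x1·x2, x2²), and both sides give the
same number.

import numpy as np

def K(xi, xj):                      # kernel applied to the original data
    return np.dot(xi, xj) ** 2

def phi(x):                         # explicit transformation, shown only to verify
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

xi, xj = np.array([1.0, 2.0]), np.array([3.0, 0.5])
print(K(xi, xj))                    # 16.0
print(np.dot(phi(xi), phi(xj)))     # 16.0 -- identical; phi is never needed in practice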
Typical Kernel Functions
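The list itself did not survive extraction; the kernels usually given at this
point are the polynomial, Gaussian radial basis, and sigmoid kernels:

\[
\begin{aligned}
\text{Polynomial of degree } h:&\quad K(X_i, X_j) = (X_i \cdot X_j + 1)^h \\
\text{Gaussian RBF}:&\quad K(X_i, X_j) = e^{-\lVert X_i - X_j\rVert^2 / 2\sigma^2} \\
\text{Sigmoid}:&\quad K(X_i, X_j) = \tanh(\kappa\, X_i \cdot X_j - \delta)
\end{aligned}
\]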
SVM can also be used for classifying multiple (> 2) classes (by combining binary
classifiers, e.g., one-vs-one or one-vs-rest) and for regression analysis (with
additional parameters)
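As a minimal illustration of both extensions (not from the slides; scikit-learn's
SVC handles multiclass internally via one-vs-one binary SVMs, and SVR adds an
epsilon parameter controlling the regression tube width):

from sklearn.datasets import load_iris
from sklearn.svm import SVC, SVR

X, y = load_iris(return_X_y=True)                 # 3 classes
clf = SVC(kernel="rbf").fit(X, y)                 # one-vs-one under the hood
print("multiclass accuracy:", clf.score(X, y))

reg = SVR(kernel="rbf", epsilon=0.1).fit(X, X[:, 0])  # toy regression target
print("regression R^2:", reg.score(X, X[:, 0]))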