6 Lec SVM Kernel
• Kernel method
• Dual formulation of SVM
• Inner products and kernels
Separating Hyperplane?
Support Vector Machine (SVM)
[Figure: a maximum-margin separating hyperplane; the labels "Maximize" (the margin) and "Support vectors" mark the circled boundary points]
• SVMs maximize the margin (the "street") around the separating hyperplane.
• The decision function is fully specified by a (usually very small) subset of the training samples, the support vectors; a sketch follows the figure below.
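A minimal sketch of that second point (assuming scikit-learn; the toy dataset and hyperparameters are illustrative, not from the slides): after fitting, only a small subset of the training points are retained as support vectors.

    import numpy as np
    from sklearn.datasets import make_blobs
    from sklearn.svm import SVC

    # Toy two-class data; parameters here are illustrative.
    X, y = make_blobs(n_samples=200, centers=2, random_state=0)
    clf = SVC(kernel="linear", C=1.0).fit(X, y)

    # The decision function depends only on these points.
    print(clf.support_vectors_.shape)  # (k, 2) with k << 200
    print(clf.n_support_)              # support-vector count per class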
Support Vectors
[Figure: two classes of training points (X's and circles) with three support vectors v1, v2, v3; d marks half of the street width]
There are three support vectors, v1, v2, v3 (the vectors themselves, not just the three circled points at their tail ends); d denotes half of the street 'width'.
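The slides do not derive d, but it follows from the standard point-to-hyperplane distance: a support vector $\mathbf{x}_s$ satisfies $|\mathbf{w}^\top \mathbf{x}_s + b| = 1$, so

    \[
      d \;=\; \frac{\lvert \mathbf{w}^\top \mathbf{x}_s + b \rvert}{\lVert \mathbf{w} \rVert}
        \;=\; \frac{1}{\lVert \mathbf{w} \rVert},
      \qquad
      \text{street width} \;=\; 2d \;=\; \frac{2}{\lVert \mathbf{w} \rVert}.
    \]

Maximizing the street width is therefore equivalent to minimizing $\lVert \mathbf{w} \rVert$.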
Optimal Separating Hyperplane
• The important training examples are the ones with algebraic margin 1; they are called support vectors.
• Hence, this algorithm is called the (hard-margin) Support Vector Machine (SVM).
Recall the soft-margin constraints:
$t_i(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1 - \xi_i \quad \forall i \in \{1, \dots, N\}$
$\xi_i \ge 0 \quad \forall i \in \{1, \dots, N\}$
• We would like to find the smallest slack variable $\xi_i$ that satisfies both $\xi_i \ge 1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b)$ and $\xi_i \ge 0$.
• Case 1: $1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b) \le 0$. The smallest non-negative $\xi_i$ satisfying the constraint is $\xi_i = 0$.
• Case 2: $1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b) > 0$. The smallest $\xi_i$ satisfying the constraint is $\xi_i = 1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b)$.
• Hence, $\xi_i = \max\{0,\; 1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b)\}$.
• Therefore, the slack penalty can be written as
$\sum_{i=1}^{N} \xi_i \;=\; \sum_{i=1}^{N} \max\{0,\; 1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b)\}$
From Margin Violation to Hinge Loss
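The expression $\xi_i = \max\{0,\; 1 - t_i(\mathbf{w}^\top \mathbf{x}_i + b)\}$ is exactly the hinge loss. A minimal numpy sketch (function name and test values are illustrative, not from the slides):

    import numpy as np

    def hinge_loss(w, b, X, t):
        # Total slack penalty: sum_i max(0, 1 - t_i (w^T x_i + b)),
        # with X of shape (N, d) and labels t in {-1, +1}.
        margins = t * (X @ w + b)      # algebraic margins t_i (w^T x_i + b)
        return np.maximum(0.0, 1.0 - margins).sum()

    # Points with margin >= 1 contribute zero; violations grow linearly.
    w, b = np.array([1.0, -1.0]), 0.0
    X = np.array([[2.0, 0.0], [0.5, 0.0]])
    t = np.array([1.0, 1.0])
    print(hinge_loss(w, b, X, t))      # margins [2.0, 0.5] -> 0 + 0.5 = 0.5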
Kernel Methods, or the Kernel Trick
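As a concrete illustration of the trick (a minimal numpy sketch; this degree-2 kernel and feature map are a standard textbook example, not taken from these slides): on R^2, the kernel k(x, z) = (x . z)^2 equals the inner product <phi(x), phi(z)> for an explicit feature map phi, yet is evaluated without ever constructing phi.

    import numpy as np

    def phi(x):
        # Explicit feature map for k(x, z) = (x . z)^2 on R^2.
        x1, x2 = x
        return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

    def k(x, z):
        # Kernel: an inner product in feature space, computed in input space.
        return np.dot(x, z) ** 2

    x = np.array([1.0, 2.0])
    z = np.array([3.0, 0.5])
    print(np.dot(phi(x), phi(z)))  # 16.0
    print(k(x, z))                 # 16.0 -- same value, no feature map built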
Nonlinear Decision Boundaries
• Solid line: decision boundary; dashed lines: the +1/−1 margins; purple: the Bayes-optimal boundary
• Solid dots: support vectors on the margin
Example: Degree-4 Polynomial Kernel SVM
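A minimal sketch of fitting such a model (assuming scikit-learn; the make_moons dataset and hyperparameters are illustrative, not from the slides):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
    # Degree-4 polynomial kernel: k(x, z) = (gamma * x.z + coef0)^4
    poly_svm = SVC(kernel="poly", degree=4, coef0=1.0, C=1.0).fit(X, y)
    print(poly_svm.score(X, y))  # training accuracy of the nonlinear boundary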
Example: Gaussian Kernel SVM
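A corresponding sketch for the Gaussian (RBF) kernel, k(x, z) = exp(-gamma * ||x - z||^2) (again assuming scikit-learn; dataset and gamma are illustrative):

    from sklearn.datasets import make_moons
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
    rbf_svm = SVC(kernel="rbf", gamma=1.0, C=1.0).fit(X, y)
    print(len(rbf_svm.support_))  # number of support vectors selected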