
Support vector machines (SVMs)

Dr. Saifullah Khalid


[email protected]
Slides Credit: Mostly based on UofT intro to machine learning course
Sequence

• Support vector machine (SVM)
• Optimal separating hyperplanes
• Non-separable data
• Kernel Method
• Dual formulation of SVM
• Inner products and kernels
Separating Hyperplane?
Support Vector Machine (SVM)

[Figure: maximum-margin separating hyperplane, with the support vectors circled on the margin boundaries]

• SVMs maximize the margin (or "street") around the separating hyperplane
• The decision function is fully specified by a (usually very small) subset of the training samples, the support vectors
Support Vectors

[Figure] Three support vectors v1, v2, v3, shown as vectors rather than just the three circled points at their tails; d denotes half of the street ‘width’.
Optimal Separating Hyperplane

• Optimal Separating Hyperplane: a hyperplane that separates the two classes and maximizes the distance to the closest point from either class, i.e., maximizes the margin of the classifier
• Intuitively, ensuring that the classifier is not too close to any data point leads to better generalization on the test data.
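To make "distance to the closest point" concrete: the signed distance of a point x to the hyperplane wᵀx + b = 0 is (wᵀx + b)/‖w‖, and the margin is the smallest such distance over the training points. A minimal sketch, with a made-up hyperplane and data (not taken from the slides):

```python
# Small illustration (not from the slides): distance from points to a
# hyperplane w^T x + b = 0, and the margin of the resulting classifier.
import numpy as np

w = np.array([2.0, 1.0])                        # hypothetical weight vector
b = -1.0                                        # hypothetical bias
X = np.array([[1.0, 2.0],
              [0.0, 3.0],
              [-1.0, -1.0]])                    # made-up training points
t = np.array([+1, +1, -1])                      # labels in {-1, +1}

signed_dist = (X @ w + b) / np.linalg.norm(w)   # signed distance to the hyperplane
margin = np.min(t * signed_dist)                # distance to the closest point
print(signed_dist, margin)
```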
Geometry of Points and Planes
Maximizing Margin as an Optimization Problem

Algebraic max-margin objective:

  min_{w,b} ½‖w‖²   s.t.   tᵢ(wᵀxᵢ + b) ≥ 1   ∀i ∈ N

• This is a Quadratic Program: quadratic objective + linear inequality constraints.
• The important training examples are the ones with algebraic margin 1; these are called support vectors.
• Hence, this algorithm is called the (hard) Support Vector Machine (SVM).
• SVM-like algorithms are often called max-margin or large-margin.

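A minimal sketch of this quadratic program, using cvxpy as the solver on made-up separable data (both the library and the data are my additions, not part of the slides):

```python
# Hard-margin QP: minimize (1/2)||w||^2  s.t.  t_i (w^T x_i + b) >= 1 for all i.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 1.0],            # made-up, linearly separable data
              [-2.0, -2.0], [-3.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

objective = cp.Minimize(0.5 * cp.sum_squares(w))     # quadratic objective
constraints = [cp.multiply(t, X @ w + b) >= 1]       # linear inequality constraints
cp.Problem(objective, constraints).solve()

margins = t * (X @ w.value + b.value)
print("w =", w.value, "b =", b.value)
print("support vectors:", X[np.isclose(margins, 1.0, atol=1e-4)])   # algebraic margin 1
```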

Non-Separable Data Points

• How can we apply the max-margin principle if the data are not linearly separable?
Maximizing Margin for Non-Separable Data Points

Main Idea:
• Allow some points to be within the margin or even be misclassified; we represent this with slack variables ξᵢ.
• But constrain or penalize the total amount of slack.
• Soft-margin SVM objective:

  min_{w,b,ξ} ½‖w‖² + γ Σᵢ ξᵢ   s.t.   tᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ,  ξᵢ ≥ 0,  ∀i ∈ N

• γ is a hyperparameter that trades off the margin with the amount of slack.
  ► For γ = 0, we'll get w = 0. (Why?)
  ► As γ → ∞ we get the hard-margin objective.
• Note: it is also possible to constrain Σᵢ ξᵢ instead of penalizing it.
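For intuition about this trade-off, a short sketch using scikit-learn (my choice of library, not part of the slides); SVC's C parameter plays the same role as γ here, weighting the total slack against the margin:

```python
# Illustrative sketch: small C tolerates a lot of slack (many margin violations,
# many support vectors); large C approaches the hard-margin solution.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),     # two overlapping blobs
               rng.normal(+1.0, 1.0, size=(50, 2))])
t = np.array([-1] * 50 + [+1] * 50)

for C in [0.01, 1.0, 100.0]:
    clf = SVC(kernel="linear", C=C).fit(X, t)
    print(f"C = {C:6}: {clf.n_support_.sum()} support vectors")
```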
From Margin Violation to Hinge Loss
Let's simplify the soft-margin constraints by eliminating ξᵢ.

Recall:  tᵢ(wᵀxᵢ + b) ≥ 1 − ξᵢ  ∀i ∈ N
         ξᵢ ≥ 0  ∀i ∈ N

• We would like to find the smallest slack variable ξᵢ that satisfies both ξᵢ ≥ 1 − tᵢ(wᵀxᵢ + b) and ξᵢ ≥ 0
• Case 1: 1 − tᵢ(wᵀxᵢ + b) ≤ 0. The smallest non-negative ξᵢ that satisfies the constraint is ξᵢ = 0
• Case 2: 1 − tᵢ(wᵀxᵢ + b) > 0. The smallest ξᵢ that satisfies the constraint is ξᵢ = 1 − tᵢ(wᵀxᵢ + b)
• Hence, ξᵢ = max{0, 1 − tᵢ(wᵀxᵢ + b)}
• Therefore, the slack penalty can be written as Σᵢ₌₁ᴺ ξᵢ = Σᵢ₌₁ᴺ max{0, 1 − tᵢ(wᵀxᵢ + b)}, i.e., the hinge loss summed over the training examples
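A quick numeric check of this identity; the weights and data below are placeholders I made up, not taken from the slides:

```python
# The optimal slack equals the hinge loss max(0, 1 - t_i (w^T x_i + b)).
import numpy as np

w, b = np.array([1.0, -2.0]), 0.5               # illustrative parameters
X = np.array([[1.0, 0.0], [0.2, 0.4], [-1.0, 1.0]])
t = np.array([+1, +1, -1])

xi = np.maximum(0.0, 1.0 - t * (X @ w + b))     # per-example slack / hinge loss
print("slacks:", xi, "total penalty:", xi.sum())
```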
Kernel Methods, or the Kernel Trick
Nonlinear Decision Boundaries

• SV classifier: a margin-maximizing linear classifier
• Linear models are restrictive
• Q: How can we get nonlinear decision boundaries?
• Feature mapping x → φ(x)
• Q: How do we find good features?
Feature Maps

• For a quadratic decision boundary, what feature mapping do we need?
• One possibility (ignore the √2 for now): for d = 2, φ(x) = (x₁², √2 x₁x₂, x₂²), and similarly for larger d
• We have dim φ(x) = O(d²); in high dimensions, the computation cost might be large
• Can we avoid the high computation cost? Let us take a closer look at the SVM (see the numeric check below)

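Previewing where the next slides go: for the quadratic feature map above, the feature-space inner product φ(x)ᵀφ(z) can be computed directly as (xᵀz)² without ever forming φ(x). A small numeric check (the vectors are made up):

```python
# For φ(x) = (x1^2, √2·x1·x2, x2^2), the feature-space inner product equals
# the squared input-space inner product: φ(x)·φ(z) = (x·z)^2.
import numpy as np

def phi(x):
    """Explicit quadratic feature map for 2-D inputs (illustrative)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x = np.array([1.0, 3.0])
z = np.array([2.0, -1.0])

lhs = phi(x) @ phi(z)          # O(d^2) features computed explicitly
rhs = (x @ z) ** 2             # kernel evaluation, O(d) work
print(lhs, rhs)                # both equal 1.0 here: (1*2 + 3*(-1))^2 = 1
```

This shortcut, evaluating inner products in feature space without computing the features, is what the kernelized dual formulation below exploits.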

From Primal to Dual Formulation of SVM

• Recall that the SVM is defined by the constrained optimization problem above (the soft-margin objective).
• We can instead solve a dual optimization problem to obtain w
  ► We do not derive it here in detail. The basic idea is to form the Lagrangian, find w as a function of α (and the other variables), and express the Lagrangian only in terms of the dual variables.
From Primal to Dual Formulation of SVM

• Primal optimization problem: the soft-margin objective above
• Dual optimization problem:

  max_α Σᵢ αᵢ − ½ Σᵢ Σⱼ αᵢ αⱼ tᵢ tⱼ xᵢᵀxⱼ   s.t.   0 ≤ αᵢ ≤ γ,  Σᵢ αᵢ tᵢ = 0

• The weights become w = Σᵢ αᵢ tᵢ xᵢ, which is a function of the dual variables αᵢ ∀i ∈ N

From Primal to Dual Formulation of SVM

• In the dual optimization problem above, the weights become w = Σᵢ αᵢ tᵢ xᵢ
• The non-zero dual variables αᵢ correspond to observations that satisfy tᵢ(wᵀxᵢ + b) = 1 − ξᵢ. These are the support vectors.
• Observation: the input data only appear in the form of inner products xᵢᵀxⱼ
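To make the role of the dual variables concrete, here is a small check using scikit-learn (my choice, not part of the slides) that a fitted linear SVM is determined by its support vectors alone, via w = Σᵢ αᵢ tᵢ xᵢ:

```python
# scikit-learn's dual_coef_ stores α_i * t_i for the support vectors, so
# w = Σ α_i t_i x_i is recoverable from the support vectors alone.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2.0, 1.0, size=(40, 2)),
               rng.normal(+2.0, 1.0, size=(40, 2))])
t = np.array([-1] * 40 + [+1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, t)

alpha_t = clf.dual_coef_.ravel()            # α_i * t_i (support vectors only)
w = alpha_t @ clf.support_vectors_          # w = Σ α_i t_i x_i
b = clf.intercept_[0]

x_new = np.array([0.5, -0.3])
print(w @ x_new + b)                        # manual decision value
print(clf.decision_function([x_new])[0])    # matches scikit-learn
```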
SVM in Feature Space
From Inner Products to Kernels
Kernels
Kernelizing SVM
Example: Linear SVM

• Solid line: decision boundary. Dashed lines: the ±1 margins. Purple: Bayes-optimal boundary
• Solid dots: support vectors on the margin
Example: Degree-4 Polynomial Kernel SVM
Example: Gaussian Kernel SVM
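A sketch of how this kind of comparison can be reproduced; scikit-learn and the make_moons toy data are my choices, not the dataset behind the slides' figures:

```python
# Fit SVMs with the three kernels shown in these examples on synthetic 2-D data.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, t = make_moons(n_samples=200, noise=0.2, random_state=0)

models = {
    "linear": SVC(kernel="linear", C=1.0),
    "degree-4 polynomial": SVC(kernel="poly", degree=4, coef0=1.0, C=1.0),
    "gaussian (RBF)": SVC(kernel="rbf", gamma=1.0, C=1.0),
}
for name, clf in models.items():
    clf.fit(X, t)
    print(f"{name:20s} train accuracy = {clf.score(X, t):.2f}, "
          f"support vectors = {clf.n_support_.sum()}")
```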
