The document discusses support vector machines (SVMs) for classification. It explains that SVMs find the optimal separating hyperplane with the maximum margin between classes. Soft margin classification allows some misclassified points by introducing slack variables. The kernel trick enables SVMs to handle non-linear separability by projecting data into a higher-dimensional feature space. Common kernels include polynomial and Gaussian radial basis function kernels.

Support Vector Machine

Neil Zhang

ECE 208/408 – The Art of Machine Learning


Linear SVM classification
SVM finds the hyperplane that separates the classes with the widest margin

(Figure from Géron figure 5-1)

Optimal margin classifier
Objective function

Derive on blackboard
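For reference, a standard statement of the hard-margin objective (alongside the blackboard derivation): given training pairs (x_i, y_i) with y_i ∈ {−1, +1},

\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,\qquad i = 1,\dots,n.

Maximizing the margin width 2/‖w‖ is equivalent to minimizing ‖w‖²/2, which makes the problem a convex quadratic program.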

(Figure from Support Vector Machine (SVM) basics and implementation in Python)
Sensitive to feature scales

(Figure from Géron figure 5-2)

Sensitive to outliers

(Figure from Géron figure 5-3)

Soft margin classification

Introduce a slack variable to allow margin violations
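A standard form of the soft-margin objective (for reference alongside the blackboard derivation), with one slack variable ξ_i per training instance:

\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1 - \xi_i,\qquad \xi_i \ge 0.

ξ_i measures how far instance i is allowed to violate the margin, and C trades margin width against the total amount of violation.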

Derive on blackboard

(Figure from Using Support Vector Machines for Survey Research | Published in Survey Practice)
Soft margin classification

More margin violations (smaller C): underfitting. Fewer margin violations (larger C): overfitting.

(Figure from Géron figure 5-4)

Summary of linear SVM classifier
Finds the hyperplane with the maximum margin between the two classes

Soft margin classification allows margin violations, controlled by a hyperparameter C

https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html
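A minimal usage sketch (my own illustration, not from the slides), assuming scikit-learn and its bundled iris dataset; the features are scaled first, since SVMs are sensitive to feature scales, and C sets the penalty on margin violations:

from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Keep two classes (setosa vs. versicolor) and two features (petal length, width).
X, y = load_iris(return_X_y=True)
X, y = X[y != 2][:, 2:], y[y != 2]

# Scale the features, then fit a linear soft-margin SVM.
clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0))
clf.fit(X, y)
print(clf.predict([[5.0, 1.6]]))  # class prediction for a new instance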

Non-linearly separable

(Figure from Géron figure 5-5)

Adding polynomial features

● Adding polynomial features works with many ML models


○ High polynomial degree creates a huge number of features
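A hedged sketch of this approach (assuming scikit-learn and its make_moons toy dataset): expand the inputs with explicit polynomial features, then train an ordinary linear SVM on the expanded features.

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# A toy dataset that is not linearly separable in its original two features.
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# A degree-3 expansion turns 2 features into 9 (plus a bias term),
# after which a linear SVM can separate the classes.
clf = make_pipeline(PolynomialFeatures(degree=3),
                    StandardScaler(),
                    LinearSVC(C=10, max_iter=5000))
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy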

How do we handle non-linearity?
Feature mapping

(Figure from The Kernel Trick in Support Vector Classification | by Drew Wilimitis | Towards Data Science)
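A simple example of such a mapping (an illustration, not necessarily the one in the figure): one-dimensional data whose positive class lies in |x| > 1 is not linearly separable on the line, but the map

\phi(x) = (x,\ x^{2})

sends it to a plane where the two classes are separated by the horizontal line x_2 = 1.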

Inner product is computationally expensive
To solve the optimization problem, we need to calculate the dot products of the
transformed features.

Kernel trick
Get the same result without adding the polynomial features.

We do not need to apply the underlying transformation explicitly, as long as the kernel reproduces the inner product of the transformed features.

An example kernel
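A representative example (possibly not the exact one on the slide, but consistent with the complexity claim on the next slide) is the squared dot-product kernel. For x, z ∈ R^d,

K(\mathbf{x},\mathbf{z}) = (\mathbf{x}^{\top}\mathbf{z})^{2}
= \sum_{i=1}^{d}\sum_{j=1}^{d}(x_i x_j)(z_i z_j)
= \phi(\mathbf{x})^{\top}\phi(\mathbf{z}),
\qquad \phi(\mathbf{x}) = \bigl(x_i x_j\bigr)_{i,j=1}^{d}.

The explicit feature map φ has d² components, but the kernel never constructs it.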

Why is applying the kernel trick better?
Reduce computational complexity

In this case, the cost of evaluating one inner product drops from O(d²) for the explicit feature map to O(d) for the kernel.
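A quick numerical sanity check of the example above (a sketch assuming NumPy): the kernel value (x·z)² equals the dot product of the explicit degree-2 feature maps, each of which has d² components.

import numpy as np

rng = np.random.default_rng(0)
x, z = rng.normal(size=5), rng.normal(size=5)

# Explicit degree-2 feature map: all d^2 pairwise products x_i * x_j.
phi = lambda v: np.outer(v, v).ravel()

# O(d) kernel evaluation agrees with the O(d^2) explicit inner product.
print(np.isclose((x @ z) ** 2, phi(x) @ phi(z)))  # True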

Testing whether a kernel is valid
One way: find the underlying transformation explicitly.

A better way (out of scope): check that the Gram matrix is positive semi-definite for every choice of input points.
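A sketch of this check (my own illustration, assuming NumPy): build the Gram matrix on a sample of points and verify that its eigenvalues are numerically non-negative. A true validity proof requires this for every possible set of inputs, so a sampled check can only reject a kernel, never certify it.

import numpy as np

def gram_is_psd(kernel, X, tol=1e-8):
    """True if the Gram matrix of `kernel` on the rows of X is positive semi-definite."""
    K = np.array([[kernel(a, b) for b in X] for a in X])
    return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

X = np.random.default_rng(1).normal(size=(20, 3))
print(gram_is_psd(lambda a, b: (a @ b) ** 2, X))    # valid kernel  -> True
print(gram_is_psd(lambda a, b: -np.abs(a @ b), X))  # invalid kernel -> False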

Polynomial kernel
degree-M polynomials
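The general form (constant conventions vary; scikit-learn's SVC uses kernel="poly" with parameters gamma, coef0, and degree):

K(\mathbf{x},\mathbf{z}) = \bigl(\gamma\,\mathbf{x}^{\top}\mathbf{z} + r\bigr)^{M}

With r > 0 this corresponds to a feature map containing all monomials of the inputs up to degree M, without ever constructing them explicitly.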

Gaussian Radial Basis Function (RBF) Kernel
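The standard form of the kernel, with hyperparameter γ > 0 (sometimes written as 1/(2σ²)):

K(\mathbf{x},\mathbf{z}) = \exp\!\bigl(-\gamma\,\lVert\mathbf{x}-\mathbf{z}\rVert^{2}\bigr)

It acts as a similarity measure: 1 when x = z, decaying toward 0 as the points move apart.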

Gaussian kernel has infinite dimensionality
Taylor series expansion
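A sketch of the argument in one dimension with γ = 1/2 (the general case is analogous):

K(x, z) = e^{-(x-z)^{2}/2}
= e^{-x^{2}/2}\, e^{-z^{2}/2}\, e^{xz}
= e^{-x^{2}/2}\, e^{-z^{2}/2} \sum_{k=0}^{\infty} \frac{x^{k} z^{k}}{k!}
= \phi(x)^{\top}\phi(z),
\qquad
\phi(x) = e^{-x^{2}/2}\Bigl(1,\ x,\ \tfrac{x^{2}}{\sqrt{2!}},\ \tfrac{x^{3}}{\sqrt{3!}},\ \dots\Bigr).

The feature map has infinitely many components, yet each kernel evaluation still costs only O(d).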

Gaussian RBF kernel trick
More regularized (smaller γ or C): underfitting. Less regularized (larger γ or C): overfitting.

(Figure from Géron figure 5-9)
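A usage sketch (assuming scikit-learn and make_moons, echoing the kind of example Géron's figure is based on rather than reproducing it): an RBF-kernel SVM whose gamma and C control how regularized the decision boundary is.

from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Smaller gamma / smaller C: smoother, more regularized boundary (may underfit).
# Larger gamma / larger C: wigglier, less regularized boundary (may overfit).
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=0.001))
clf.fit(X, y)
print(clf.score(X, y))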
Summary: Why SVM?
Optimal margin classifier

Kernel trick for non-linearly separable classes

Note: the kernel trick is not limited to SVMs; it can be applied to any algorithm that involves inner products.

https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html

Question
● Can an SVM classifier output a confidence score when it classifies an
instance? What about a probability?
● An SVM classifier can output the distance between the test instance
and the decision boundary, and you can use this as a confidence
score. However, this score cannot be directly converted into an
estimation of the class probability.
● If you set probability=True when creating an SVM in Scikit-Learn,
then after training it will calibrate the probabilities using Logistic
Regression on the SVM’s scores (trained by an additional five-fold
cross-validation on the training data).
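A short sketch of the behavior described above (assuming scikit-learn): decision_function returns a signed score based on the distance to the boundary, and with probability=True, predict_proba returns the cross-validated, calibrated probabilities.

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)
clf = SVC(kernel="rbf", probability=True, random_state=42).fit(X, y)

print(clf.decision_function(X[:2]))  # signed scores, usable as confidence
print(clf.predict_proba(X[:2]))      # calibrated class probabilities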

Some concepts we did not cover
Representer theorem

Lagrange duality

Karush–Kuhn–Tucker conditions

Dual form

Gram matrix

