SVM_Presentation

Support Vector Machines (SVM) are supervised learning algorithms used for classification and regression, focusing on finding a hyperplane that maximizes the margin between classes. Developed from statistical learning theory, SVM is effective in high-dimensional spaces and can handle both linear and nonlinear data through techniques like the Kernel Trick. The goal is to identify the Maximum Margin Hyperplane, defined by support vectors, to ensure robust decision boundaries.


SUPPORT VECTOR MACHINES (SVM)
Maximum Margin Hyperplane and Core Concepts
What is SVM?

 Support Vector Machine (SVM) is a supervised learning algorithm used for classification and regression.
 It focuses on finding a hyperplane that separates classes with maximum margin.
 Effective in high-dimensional spaces and with small/medium-sized datasets.
SVM - History and Applications
 Vapnik and colleagues (1992)—groundwork from Vapnik &
Chervonenkis’ statistical learning theory in the 1960s
 Features: training can be slow but accuracy is high owing to
their ability to model complex nonlinear decision boundaries
(margin maximization)
 Used for: classification and numeric prediction
 Applications:
 Handwritten digit recognition, object recognition, speaker
identification, benchmarking time-series prediction tests
Support Vector Machines
 Classification method for both linear and nonlinear data
 It uses a nonlinear mapping to transform the original
training data into a higher dimension
 With the new dimension, it searches for the linear optimal
separating hyperplane (i.e., “decision boundary”)
 With an appropriate nonlinear mapping to a sufficiently
high dimension, data from two classes can always be
separated by a hyperplane
 SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by the
support vectors)
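
A minimal sketch of these ideas in code, assuming scikit-learn (the slides do not name any library): fit a linear SVM on a tiny linearly separable dataset and read off the support vectors, the weight vector W, and the bias b.

```python
# Sketch: linear SVM on linearly separable data (assumes scikit-learn).
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable classes in 2-D.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates the hard-margin (maximum margin) case.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

# The "essential" training tuples that define the margin.
print("support vectors:\n", clf.support_vectors_)

# Separating hyperplane W · X + b = 0.
print("W =", clf.coef_[0], " b =", clf.intercept_[0])
```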
SVM – General Philosophy

[Figure: two candidate separating hyperplanes, one with a small margin and one with a large margin; the support vectors lie on the margin boundaries]
SVM - Margins and Support Vectors

[Figure: margins and support vectors (from Data Mining: Concepts and Techniques)]

SVM - When Data Is Linearly Separable

Let the data D be (X1, y1), …, (X|D|, y|D|), where each Xi is a training tuple and yi is its associated class label
There are infinitely many lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
SVM searches for the hyperplane with the largest margin, i.e., the Maximum Margin Hyperplane (MMH)
Support Vectors and Hyperplane

 Support Vectors: Critical points closest to the decision boundary that define the hyperplane.
 Hyperplane: The decision boundary separating different classes (a line, a plane, or a hyperplane in higher dimensions).
Maximum Margin Hyperplane
 The Maximum Margin Hyperplane is the decision boundary that maximizes the margin between classes.
 Margin: Distance between the hyperplane and the nearest data points (support vectors).
 SVM aims to maximize this margin, providing a robust decision boundary.
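
In the notation of the "SVM - Linearly Separable" slide below, the margin has a simple closed form; the following is a standard one-line derivation, sketched here for reference.

```latex
% Distance from a tuple X_i to the hyperplane W · X + b = 0, and the margin
% width when the support vectors are scaled to satisfy |W · X_i + b| = 1
% (the H1/H2 boundaries): maximizing the margin means minimizing ||W||.
\[
  d(\mathbf{X}_i) = \frac{\lvert \mathbf{W}\cdot\mathbf{X}_i + b \rvert}{\lVert \mathbf{W} \rVert},
  \qquad
  \text{margin} = \frac{2}{\lVert \mathbf{W} \rVert}.
\]
```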



SVM - Linearly Separable
 A separating hyperplane can be written as
W · X + b = 0
where W = {w1, w2, …, wn} is a weight vector and b a scalar (bias)
 For 2-D data it can be written as
w0 + w1 x1 + w2 x2 = 0
 The hyperplanes defining the sides of the margin are:
H1: w0 + w1 x1 + w2 x2 ≥ 1 for yi = +1, and
H2: w0 + w1 x1 + w2 x2 ≤ –1 for yi = –1
 Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
 This becomes a constrained (convex) quadratic optimization problem: quadratic objective function and linear constraints → Quadratic Programming (QP) → Lagrangian multipliers
Mathematical Formulation
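
A standard way to state the optimization behind the bullets above (a sketch of the usual hard-margin primal form, consistent with the H1/H2 constraints):

```latex
% Maximizing the margin 2/||W|| is equivalent to minimizing ||W||^2 / 2
% subject to every training tuple lying on the correct side of H1/H2.
\[
  \min_{\mathbf{W},\, b}\ \tfrac{1}{2}\lVert \mathbf{W} \rVert^{2}
  \quad \text{subject to} \quad
  y_i\,(\mathbf{W}\cdot\mathbf{X}_i + b) \ge 1,\qquad i = 1,\dots,|D|.
\]
% The Lagrangian-multiplier (dual) form depends on the data only through
% dot products X_i · X_j, which is exactly what the kernel trick exploits.
```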
Non-linear Data and Kernel Trick
(SVM - Linearly Inseparable)

[Figure: non-linearly separable data plotted against attributes A1 and A2]

 For non-linearly separable data, SVM uses the Kernel Trick to project data into higher dimensions.
 Common kernels: Polynomial, Radial Basis Function (RBF), and Sigmoid.
 This allows SVM to find a separating hyperplane in the higher-dimensional space.
Kernel Trick

 The kernel trick is used in SVMs when the data is not linearly separable in its original feature space.
 It allows the algorithm to operate in a higher-dimensional space without explicitly computing the coordinates of the data in that space.
 This makes it possible to find a separating hyperplane for complex, non-linear data distributions.
 The goal of the kernel trick is to transform the data into a higher-dimensional space where a linear separation (hyperplane) is possible, even if the data is not linearly separable in the original space.
Kernel Trick …
SVM: Different Kernel functions
 Instead of computing the dot product on the transformed data, it is mathematically equivalent to apply a kernel function K(Xi, Xj) to the original data, i.e., K(Xi, Xj) = Φ(Xi) · Φ(Xj)
 Typical kernel functions are shown below
 SVM can also be used for classifying multiple (> 2) classes and for regression analysis (with additional parameters)
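
The standard forms of the kernels named earlier (polynomial, RBF, sigmoid), sketched here for reference; h, σ, κ, and δ are user-chosen parameters.

```latex
% Typical kernel functions K(X_i, X_j) = Φ(X_i) · Φ(X_j):
\begin{align*}
  \text{Polynomial of degree } h:\quad & K(\mathbf{X}_i, \mathbf{X}_j) = (\mathbf{X}_i \cdot \mathbf{X}_j + 1)^{h} \\
  \text{Gaussian radial basis function:}\quad & K(\mathbf{X}_i, \mathbf{X}_j) = e^{-\lVert \mathbf{X}_i - \mathbf{X}_j \rVert^{2} / 2\sigma^{2}} \\
  \text{Sigmoid:}\quad & K(\mathbf{X}_i, \mathbf{X}_j) = \tanh(\kappa\, \mathbf{X}_i \cdot \mathbf{X}_j - \delta)
\end{align*}
```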

Kernel Trick …

[Figure: the original attribute space (A1, A2) and its mapping to a higher-dimensional space]

 Transform the original input data into a higher dimensional space
 Search for a linear separating hyperplane in the new space
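
A minimal end-to-end sketch of this idea, assuming scikit-learn (the slides do not name a library): an RBF-kernel SVM separates data that no linear hyperplane can separate in the original two-dimensional space.

```python
# Sketch: kernel trick on non-linearly separable data (assumes scikit-learn).
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Linear kernel: no mapping, so a single hyperplane cannot separate the circles.
linear_svm = SVC(kernel="linear").fit(X_train, y_train)

# RBF kernel: implicitly maps the data to a higher-dimensional space and
# searches for a linear separating hyperplane there.
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X_train, y_train)

print("linear kernel accuracy:", linear_svm.score(X_test, y_test))
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))
```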

