Basic Concept of SVM

This document summarizes the basic concept of support vector machines (SVMs). It explains that an SVM finds the optimal separating hyperplane that maximizes the margin between positive and negative examples. This hyperplane is determined by the support vectors, the data points closest to it. The document formulates the SVM optimization problem and shows that its dual formulation allows non-linearly separable problems to be solved efficiently by mapping the data to a higher-dimensional feature space.


Basic Concept of SVM:

o Which line will classify the unseen data well?
o The dotted line! It is the line with maximum margin!

Cont…

[Figure: the separating hyperplane W^T X + b = 0 with the margin boundaries W^T X + b = −1 and W^T X + b = +1; the support vectors are the points lying on these margin boundaries.]


Some definitions:
o Functional Margin:
1) w.r.t. individual examples: γ̂^(i) = Y^(i) (W^T X^(i) + b)
2) w.r.t. the example set S = {(X^(i), Y^(i)); i = 1,...,m}: γ̂ = min_{i=1,...,m} γ̂^(i)

o Geometric Margin:
1) w.r.t. individual examples: γ^(i) = Y^(i) ( (W/||W||)^T X^(i) + b/||W|| )
2) w.r.t. the example set S: γ = min_{i=1,...,m} γ^(i)

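As a quick illustration of these two definitions, here is a minimal NumPy sketch (the names W, b, X, y and the helper functions are illustrative, not from the slides):

import numpy as np

def functional_margin(W, b, X, y):
    # per-example functional margin: gamma_hat^(i) = y^(i) (W^T x^(i) + b)
    per_example = y * (X @ W + b)
    # the margin of the whole set S is the minimum over the examples
    return per_example.min()

def geometric_margin(W, b, X, y):
    # same quantity, but with W and b scaled by ||W||
    norm = np.linalg.norm(W)
    per_example = y * (X @ (W / norm) + b / norm)
    return per_example.min()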


Problem Formulation:

[Figure: data points separated by the hyperplane W^T X + b = 0, with the planes W^T X + b = −1 and W^T X + b = +1 bounding the margin.]
Cont..
o Distance of a point (u, v) from the line Ax + By + C = 0 is given by
|Au + Bv + C| / ||n||, where ||n|| is the norm of the normal vector n = (A, B)
o Distance of the hyperplane W^T X + b = 0 from the origin = |b| / ||W||
o Distance of point A (on W^T X + b = −1) from the origin = |b + 1| / ||W||
o Distance of point B (on W^T X + b = +1) from the origin = |b − 1| / ||W||
o Distance between points A and B (the Margin) = 2 / ||W||

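A quick numeric check of these distances, with illustrative values for W and b (not taken from the slides):

import numpy as np

W = np.array([3.0, 4.0])              # so ||W|| = 5
b = -5.0
norm_W = np.linalg.norm(W)

d_hyperplane = abs(b) / norm_W        # distance of W^T X + b = 0 from origin -> 1.0
d_A = abs(b + 1) / norm_W             # plane W^T X + b = -1 -> 0.8
d_B = abs(b - 1) / norm_W             # plane W^T X + b = +1 -> 1.2
margin = abs(d_B - d_A)               # 2 / ||W|| = 0.4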


Cont…
We have a data set {(X^(i), Y^(i))}, i = 1,...,m, with X ∈ R^d and Y ∈ R,
and a separating hyperplane
W^T X + b = 0
s.t.
W^T X^(i) + b > 0 if Y^(i) = +1
W^T X^(i) + b < 0 if Y^(i) = −1



Cont…
o Suppose the training data also satisfy the following constraints:
W^T X^(i) + b ≥ +1 for Y^(i) = +1
W^T X^(i) + b ≤ −1 for Y^(i) = −1
Combining these into one:
Y^(i) (W^T X^(i) + b) ≥ 1 for ∀i

o Our objective is to find the hyperplane (W, b) with maximal
separation between it and the closest data points while satisfying
the above constraints.



THE PROBLEM:
max_{W,b}  2 / ||W||

such that

Y^(i) (W^T X^(i) + b) ≥ 1 for ∀i

Also we know

||W|| = √(W^T W)



Cont..
So the Problem can be written as:
min_{W,b}  (1/2) W^T W

Such that

Y^(i) (W^T X^(i) + b) ≥ 1 for ∀i

Notice: W^T W = ||W||^2

It is just a convex quadratic optimization problem!


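Since it is a convex QP, it can be handed to an off-the-shelf solver. A minimal sketch with the cvxpy modelling library, on a toy linearly separable data set (the data and variable names are illustrative, not from the slides):

import numpy as np
import cvxpy as cp

# toy linearly separable data
X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

W = cp.Variable(2)
b = cp.Variable()

objective = cp.Minimize(0.5 * cp.sum_squares(W))        # (1/2) W^T W
constraints = [cp.multiply(y, X @ W + b) >= 1]          # Y^(i)(W^T X^(i) + b) >= 1
cp.Problem(objective, constraints).solve()

print(W.value, b.value)   # the maximal-margin hyperplane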
DUAL
o Solving the dual of our problem will let us apply the SVM to
nonlinearly separable data efficiently.
o It can be shown that
min (primal) = max_{α ≥ 0} ( min_{W,b} L(W, b, α) )
o Primal problem:
min_{W,b}  (1/2) W^T W

Such that

Y^(i) (W^T X^(i) + b) ≥ 1 for ∀i


Constructing Lagrangian
o Lagrangian for our problem:

L(W, b, α) = (1/2) ||W||^2 − Σ_{i=1}^{m} α_i [ Y^(i) (W^T X^(i) + b) − 1 ]

where the α_i are Lagrange multipliers and α_i ≥ 0.

o Now, minimizing it w.r.t. W and b:
We set the derivatives of the Lagrangian w.r.t. W and b to zero.
Cont…
o Setting the derivative w.r.t. W to zero gives:

W − Σ_{i=1}^{m} α_i Y^(i) X^(i) = 0

i.e.

W = Σ_{i=1}^{m} α_i Y^(i) X^(i)

o Setting the derivative w.r.t. b to zero gives:

Σ_{i=1}^{m} α_i Y^(i) = 0



Cont…
o Plugging these results into the Lagrangian gives

L(W, b, α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} Y^(i) Y^(j) α_i α_j (X^(i))^T X^(j)

o Call this D(α):

D(α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} Y^(i) Y^(j) α_i α_j (X^(i))^T X^(j)

o This is the result of our minimization w.r.t. W and b.



So The DUAL:
o Now the dual becomes:

max_α  D(α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} Y^(i) Y^(j) α_i α_j ⟨X^(i), X^(j)⟩

s.t.
α_i ≥ 0, i = 1,...,m
Σ_{i=1}^{m} α_i Y^(i) = 0

o Solving this optimization problem gives us the α_i.

o Also, the Karush-Kuhn-Tucker (KKT) condition is satisfied at this solution, i.e.

α_i [ Y^(i) (W^T X^(i) + b) − 1 ] = 0, for i = 1,...,m



Values of W and b:
o W can be found using

W = Σ_{i=1}^{m} α_i Y^(i) X^(i)

o b can be found using:

b* = − ( max_{i: Y^(i) = −1} W*^T X^(i) + min_{i: Y^(i) = +1} W*^T X^(i) ) / 2
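A sketch of solving this dual numerically and then recovering W and b exactly as above, again with cvxpy and illustrative toy data (the variable names are assumptions, not from the slides):

import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [2.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

Z = y[:, None] * X                      # row i is Y^(i) X^(i)
alpha = cp.Variable(m)
# D(alpha) = sum_i alpha_i - (1/2) ||Z^T alpha||^2, the dual objective above
objective = cp.Maximize(cp.sum(alpha) - 0.5 * cp.sum_squares(Z.T @ alpha))
constraints = [alpha >= 0, alpha @ y == 0]
cp.Problem(objective, constraints).solve()

a = alpha.value
W = ((a * y)[:, None] * X).sum(axis=0)  # W = sum_i alpha_i Y^(i) X^(i)
b = -(max(X[y == -1] @ W) + min(X[y == 1] @ W)) / 2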
What if data is nonlinearly separable?
o The maximal margin hyperplane can classify only linearly separable data.
o What if the data is linearly non-separable?
o Map your data to a higher-dimensional space where it is linearly separable, and use the maximal margin hyperplane there!



Taking it to higher dimension works!
Example: XOR (see the sketch below)

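A small NumPy sketch of the XOR case: the four points are not separable by any line in R^2, but adding one illustrative product feature x1·x2 makes them separable by a plane (this particular mapping and hyperplane are my choice, not necessarily the one on the slide):

import numpy as np

# XOR: not linearly separable in the original 2-D space
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, +1, +1, -1])

# map (x1, x2) -> (x1, x2, x1*x2), a 3-D feature space
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# in that space the hyperplane W^T phi(x) + b = 0 separates the classes
W, b = np.array([1.0, 1.0, -2.0]), -0.5
print(np.sign(phi @ W + b))   # [-1.  1.  1. -1.], matching y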


Doing it in higher dimensional space
o Let Φ: X → F be a nonlinear mapping from the input
space X (the original space) to a higher-dimensional feature space F.
o Then our inner (dot) product ⟨X^(i), X^(j)⟩ in the higher-dimensional
space is ⟨φ(X^(i)), φ(X^(j))⟩.

o Now, the problem becomes:

max_α  D(α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} Y^(i) Y^(j) α_i α_j ⟨φ(X^(i)), φ(X^(j))⟩

s.t.
α_i ≥ 0, i = 1,...,m
Σ_{i=1}^{m} α_i Y^(i) = 0
Kernel function:
o There exists a way to compute the inner product in feature
space as a function of the original input points – it is the kernel
function!
o Kernel function:

K(x, z) = ⟨φ(x), φ(z)⟩

o We need not know φ to compute K(x, z).



An example:
Let x, z ∈ R^n and K(x, z) = (x^T z)^2.

i.e.  K(x, z) = ( Σ_{i=1}^{n} x_i z_i ) ( Σ_{j=1}^{n} x_j z_j )
             = Σ_{i=1}^{n} Σ_{j=1}^{n} x_i x_j z_i z_j
             = Σ_{i,j=1}^{n} (x_i x_j)(z_i z_j)

For n = 3, the feature mapping φ is given as:

φ(x) = ( x_1 x_1, x_1 x_2, x_1 x_3, x_2 x_1, x_2 x_2, x_2 x_3, x_3 x_1, x_3 x_2, x_3 x_3 )^T

so that K(x, z) = ⟨φ(x), φ(z)⟩.
example cont…
o Here, for K(x, z) = (x^T z)^2 with

x = (1, 2)^T,  z = (3, 4)^T:

x^T z = 1·3 + 2·4 = 11,  so  K(x, z) = (x^T z)^2 = 121

φ(x) = ( x_1 x_1, x_1 x_2, x_2 x_1, x_2 x_2 )^T = (1, 2, 2, 4)^T
φ(z) = (9, 12, 12, 16)^T

φ(x)^T φ(z) = 1·9 + 2·12 + 2·12 + 4·16 = 121

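The same arithmetic as a short NumPy check (purely illustrative):

import numpy as np

x, z = np.array([1.0, 2.0]), np.array([3.0, 4.0])
phi = lambda v: np.outer(v, v).ravel()      # (v1*v1, v1*v2, v2*v1, v2*v2)
print((x @ z) ** 2, phi(x) @ phi(z))        # both print 121.0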


So our SVM for the non-linearly separable data:
o Optimization problem:

max_α  D(α) = Σ_{i=1}^{m} α_i − (1/2) Σ_{i,j=1}^{m} Y^(i) Y^(j) α_i α_j K(X^(i), X^(j))

s.t.
α_i ≥ 0, i = 1,...,m
Σ_{i=1}^{m} α_i Y^(i) = 0

o Decision function:

F(X) = Sign( Σ_{i=1}^{m} α_i Y^(i) K(X^(i), X) + b )

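A direct NumPy transcription of this decision function; the alphas and b are assumed to come from the dual solver, and the names here are illustrative:

import numpy as np

def decision(X_train, y_train, alpha, b, kernel, x_new):
    # F(X) = Sign( sum_i alpha_i Y^(i) K(X^(i), X) + b )
    k = np.array([kernel(x_i, x_new) for x_i in X_train])
    return np.sign(np.sum(alpha * y_train * k) + b)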


Some commonly used Kernel functions:

o Linear: K(X, Y) = X^T Y
o Polynomial of degree d: K(X, Y) = (X^T Y + 1)^d
o Gaussian Radial Basis Function (RBF): K(X, Y) = exp( −||X − Y||^2 / (2σ^2) )
o Tanh kernel: K(X, Y) = tanh( ρ (X^T Y) − δ )

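The same four kernels written out for a single pair of vectors X, Y (the parameter defaults d, sigma, rho, delta are illustrative):

import numpy as np

def linear(X, Y):
    return X @ Y

def polynomial(X, Y, d=3):
    return (X @ Y + 1) ** d

def rbf(X, Y, sigma=1.0):
    return np.exp(-np.linalg.norm(X - Y) ** 2 / (2 * sigma ** 2))

def tanh_kernel(X, Y, rho=1.0, delta=0.0):
    return np.tanh(rho * (X @ Y) - delta)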


Implementations:
Some Ready to use available SVM implementations:
1) LIBSVM: A library for SVM by Chih-Chung Chang and
Chih-Jen Lin
(at: http://www.csie.ntu.edu.tw/~cjlin/libsvm/)
2) SVMlight: An implementation in C by Thorsten
Joachims
(at: http://svmlight.joachims.org/)
3) Weka: A data mining software suite in Java by the University
of Waikato
(at: http://www.cs.waikato.ac.nz/ml/weka/)
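For Python users, scikit-learn's SVC estimator (which wraps LIBSVM) is another ready-to-use option; a minimal sketch on the XOR data from earlier (the parameter values are illustrative):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([-1, 1, 1, -1])                    # XOR labels

clf = SVC(kernel="rbf", C=10.0, gamma="scale")  # Gaussian RBF kernel
clf.fit(X, y)
print(clf.predict([[0.9, 0.1]]))                # expected [1], the class of the nearby corner (1, 0)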
Issues:
o Selecting a suitable kernel: it is most of the time trial
and error.
o Multiclass classification: one decision function for
each class (that class vs. the other l−1) and then finding the one with the maximum
value, i.e., if X belongs to class 1, then for this and the
other (l−1) classes the values of the decision functions are:
F_1(X) ≥ +1
F_2(X) ≤ −1
.
.
F_l(X) ≤ −1
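A small sketch of that multiclass decision rule, assuming the l per-class decision functions F_1, ..., F_l have already been trained (everything here is illustrative):

import numpy as np

def predict_multiclass(decision_functions, x_new):
    # decision_functions: list of callables F_k, one per class (class k vs. the rest)
    values = np.array([F(x_new) for F in decision_functions])
    return int(np.argmax(values)) + 1     # classes numbered 1..l as on the slide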
Cont….
o Sensitive to noise: mislabeled data can badly affect
the performance.
o Good performance for applications like:
1) Computational biology and medical applications
(protein and cancer classification problems)
2) Image classification
3) Hand-written character recognition
and many others…
o Use SVM for high-dimensional, linearly separable
data (its strength); for nonlinearly separable data, performance depends on the choice of
kernel.
Conclusion:
Support Vector Machines provide a very
simple method for linear classification, but
performance, in the case of nonlinearly separable
data, largely depends on the choice of kernel!





Thank You!

[email protected] ;
[email protected]

