
A Simple Introduction to Support Vector Machines

Adapted from various authors
by Mario Martin

Outline

• Large-margin linear classifier
  - Linearly separable case
  - Non-linearly separable case
• Creating nonlinear classifiers: the kernel trick
• Transduction
• Discussion on SVM
• Conclusion

History of SVM

• SVM is related to statistical learning theory [3]
• Introduced by Vapnik; first presented in 1992
• SVM became popular because of its success on a lot of classification problems

SVM: Large-margin linear classifier

Perceptron Revisited: Linear Separators

• Binary classification can be viewed as the task of separating classes in feature space:

    wᵀx + b = 0   (separating hyperplane)
    wᵀx + b > 0   (one class)
    wᵀx + b < 0   (the other class)

    f(x) = sign(wᵀx + b)

Linear Separators

• Which of the linear separators is optimal?

What is a good Decision Boundary?

• Consider a two-class, linearly separable classification problem
• Many decision boundaries are possible!
• The Perceptron algorithm can be used to find such a boundary
• Other algorithms have also been proposed
• Are all decision boundaries equally good?

Examples of Bad Decision Boundaries

• [Figures: separating hyperplanes that pass very close to the Class 1 or Class 2 points]

Maximum Margin Classification

• Maximizing the distance to the examples is good according to intuition and PAC theory.
• Examples closest to the hyperplane are support vectors.
• This implies that only a few vectors matter; the other training examples are ignorable.

Classification Margin

• Distance from an example x_i to the separator is r = |wᵀx_i + b| / ||w||
• Margin ρ of the separator is the distance between the support vectors of the two classes.

Large-margin Decision Boundary

• The decision boundary should be as far away from the data of both classes as possible: we should maximize the margin m
• We normalize the equations so that the decision function takes the values +1/−1 on the support vectors; the distance from a support vector to the boundary is then r = |wᵀx_i + b| / ||w|| = 1/||w||, so the margin is m = 2/||w||

Finding the Decision Boundary

• Let {x_1, ..., x_n} be our data set and let y_i ∈ {1, −1} be the class label of x_i
• The decision boundary should classify all points correctly
• Maximizing the margin while classifying all points correctly leads to the constraints

    y_i (wᵀx_i + b) ≥ 1   for all i

Finding the Decision Boundary

• Primal formulation:

    minimize   ½ wᵀw
    subject to y_i (wᵀx_i + b) ≥ 1,  i = 1, ..., n

• We can solve this problem using this formulation, or using the dual formulation…

[Recap of Constrained Optimization]

• Suppose we want to minimize f(x) subject to g(x) = 0
• A necessary condition for x_0 to be a solution:

    ∂/∂x [ f(x) + α g(x) ] = 0 at x = x_0,   α: the Lagrange multiplier

• For multiple constraints g_i(x) = 0, i = 1, …, m, we need a Lagrange multiplier α_i for each of the constraints
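As a concrete illustration of the primal formulation, here is a minimal sketch that solves the hard-margin problem on a toy 2-D data set with the cvxpy modelling library (cvxpy and the data are assumptions of this sketch, not part of the original slides):

# Hard-margin primal SVM as a small convex program (a sketch, assuming cvxpy).
import cvxpy as cp
import numpy as np

# Toy linearly separable data: x_i in R^2, labels y_i in {+1, -1}.
X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],   # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2]])  # class -1
y = np.array([1, 1, 1, -1, -1, -1])

n, d = X.shape
w = cp.Variable(d)
b = cp.Variable()

# minimize (1/2) w^T w  subject to  y_i (w^T x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, "b =", b.value)
print("margin width = 2/||w|| =", 2 / np.linalg.norm(w.value))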

[Recap of Constrained Optimization]

• The case of an inequality constraint g_i(x) ≤ 0 is similar, except that the Lagrange multiplier α_i should be positive
• If x_0 is a solution to the constrained optimization problem

    minimize f(x) subject to g_i(x) ≤ 0, i = 1, …, m

  there must exist α_i ≥ 0 for i = 1, …, m such that x_0 satisfies

    ∂/∂x [ f(x) + Σ_i α_i g_i(x) ] = 0 at x = x_0

• The function f(x) + Σ_i α_i g_i(x) is also known as the Lagrangian; we want to set its gradient to 0

Back to the Original Problem

• The Lagrangian of the primal problem is

    L = ½ wᵀw − Σ_i α_i [ y_i (wᵀx_i + b) − 1 ]

• Note that ||w||² = wᵀw
• Setting the gradient of L w.r.t. w and b to zero, we have

    w = Σ_i α_i y_i x_i   and   Σ_i α_i y_i = 0

The Dual Formulation

• If we substitute w = Σ_i α_i y_i x_i into L, we have

    W(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j

• Remember that Σ_i α_i y_i = 0 (the result of differentiating the original Lagrangian w.r.t. b)
• This is a function of the α_i only

The Dual Formulation

• This is known as the dual problem (the original problem is known as the primal problem): if we know w, we know all the α_i; if we know all the α_i, we know w
• The objective function of the dual problem needs to be maximized!
• The dual problem is therefore:

    maximize   W(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j
    subject to α_i ≥ 0           (property of the Lagrange multipliers)
               Σ_i α_i y_i = 0   (from differentiating the original Lagrangian w.r.t. b)

The Dual Problem

• This is a quadratic programming (QP) problem
  - A global maximum of the α_i can always be found
• w can be recovered by w = Σ_i α_i y_i x_i

A Geometrical Interpretation

• [Figure: only the points on the margin receive non-zero multipliers, e.g. α_1 = 0.8, α_6 = 1.4, α_8 = 0.6; all other points have α_i = 0 (α_2 = α_3 = α_4 = α_5 = α_7 = α_9 = α_10 = 0)]

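The following sketch solves this dual QP for the same toy data as in the primal sketch, again with cvxpy (an assumption), and then recovers w and b from the multipliers:

# Dual of the hard-margin SVM (a sketch, assuming cvxpy).
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1], dtype=float)
n = len(y)

alpha = cp.Variable(n)
# (1/2) sum_ij alpha_i alpha_j y_i y_j x_i.x_j  ==  (1/2) || sum_i alpha_i y_i x_i ||^2
objective = cp.Maximize(cp.sum(alpha)
                        - 0.5 * cp.sum_squares(X.T @ cp.multiply(alpha, y)))
constraints = [alpha >= 0, cp.sum(cp.multiply(y, alpha)) == 0]
cp.Problem(objective, constraints).solve()

a = alpha.value
w = (a * y) @ X                      # w = sum_i alpha_i y_i x_i
sv = int(np.argmax(a))               # index of one support vector (largest alpha)
b = y[sv] - w @ X[sv]                # b from y_sv (w.x_sv + b) = 1
print("support vector indices:", np.where(a > 1e-6)[0])
print("w =", w, "b =", b)
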
Characteristics of the Solution

• Many of the α_i are zero
  - w is a linear combination of a small number of data points
  - This "sparse" representation can be viewed as data compression
• x_i with non-zero α_i are called support vectors (SV)
  - The decision boundary is determined only by the SVs
  - Let t_j (j = 1, ..., s) be the indices of the s support vectors. We can write w = Σ_{j=1..s} α_{t_j} y_{t_j} x_{t_j}

Characteristics of the Solution

• For testing with a new data point z
  - Compute wᵀz + b = Σ_{j=1..s} α_{t_j} y_{t_j} (x_{t_j}ᵀ z) + b
  - Classify z as class 1 if the sum is positive, and class 2 otherwise
• Note: w need not be formed explicitly
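A small sketch of this test-time computation, reusing X, y and the multipliers a and offset b from the dual sketch above (illustrative code, not from the slides):

# Classify a new point using only the support vectors; w is never formed.
import numpy as np

def decision(z, X, y, a, b, tol=1e-6):
    """f(z) = sum over support vectors of alpha_j * y_j * <x_j, z> + b."""
    sv = a > tol                                  # mask of the support vectors
    return np.sum(a[sv] * y[sv] * (X[sv] @ z)) + b

z = np.array([1.5, 1.5])
print(1 if decision(z, X, y, a, b) > 0 else -1)   # class 1 / class 2 decision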

The Quadratic Programming Problem

• Many approaches have been proposed
  - LOQO, CPLEX, etc. (see http://www.numerical.rl.ac.uk/qp/qp.html)
• Most are "interior-point" methods
  - Start with an initial solution that can violate the constraints
  - Improve this solution by optimizing the objective function and/or reducing the amount of constraint violation
• For SVM, sequential minimal optimization (SMO) seems to be the most popular
  - A QP with two variables is trivial to solve
  - Each iteration of SMO picks a pair (α_i, α_j) and solves the QP with these two variables; repeat until convergence
• In practice, we can just regard the QP solver as a "black box" without bothering about how it works

SVM: Non-Separable Sets

• Sometimes, we do not want to separate perfectly.
• [Figure: the hyperplane is forced too close to one class by a single point that may not be so important]

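As an example of treating the QP/SMO solver as a black box, here is a sketch using scikit-learn's SVC, which wraps LIBSVM (scikit-learn and the data are assumptions of this sketch):

# Black-box training: LIBSVM/SMO hidden behind a library call.
from sklearn.svm import SVC
import numpy as np

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6)       # very large C ~ hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("alpha_i * y_i   :", clf.dual_coef_)   # non-zero only for the SVs
print("w =", clf.coef_, " b =", clf.intercept_)
print("prediction for [1.5, 1.5]:", clf.predict([[1.5, 1.5]]))
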
SVM: Non-Separable Sets

• Sometimes, we do not want to separate perfectly.
• [Figure: if we ignore the problematic point, the resulting hyperplane is nicer]

Soft Margin Classification

• What if the training set is not linearly separable?
• Slack variables ξ_i can be added to allow misclassification of difficult or noisy examples; the resulting margin is called a soft margin.

Non-linearly Separable Problems

• We allow an "error" ξ_i in classification; it is based on the output of the discriminant function wᵀx + b
• ξ_i approximates the number of misclassified samples
• [Figure: points inside the margin or on the wrong side of the boundary have non-zero slack ξ_i]

Soft Margin Hyperplane

• If we minimize Σ_i ξ_i, each ξ_i can be computed by

    ξ_i = max(0, 1 − y_i (wᵀx_i + b))

• The ξ_i are "slack variables" in the optimization
• Note that ξ_i = 0 if there is no error for x_i
• The number of slack points plus support vectors is an upper bound on the number of (leave-one-out) errors

Soft Margin Hyperplane

• We want to minimize

    ½ wᵀw + C Σ_i ξ_i

• C: trade-off parameter between error and margin
• The optimization problem becomes

    minimize   ½ wᵀw + C Σ_i ξ_i
    subject to y_i (wᵀx_i + b) ≥ 1 − ξ_i,  ξ_i ≥ 0
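A minimal sketch of this soft-margin primal, again in cvxpy (an assumption); one outlier makes the toy data non-separable:

# Soft-margin primal with explicit slack variables (a sketch, assuming cvxpy).
import cvxpy as cp
import numpy as np

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5], [0.4, 0.4],   # last +1 point is an outlier
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2]])
y = np.array([1, 1, 1, 1, -1, -1, -1])
n, d = X.shape
C = 1.0                                   # trade-off between margin and training error

w, b, xi = cp.Variable(d), cp.Variable(), cp.Variable(n)
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi, xi >= 0]
cp.Problem(objective, constraints).solve()

print("slacks xi:", np.round(xi.value, 3))   # non-zero only for margin violations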

The Optimization Problem

• The dual of this new constrained optimization problem is

    maximize   W(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j
    subject to C ≥ α_i ≥ 0,  Σ_i α_i y_i = 0

• w is recovered as w = Σ_i α_i y_i x_i
• This is very similar to the optimization problem in the linearly separable case, except that there is now an upper bound C on the α_i
• Once again, a QP solver can be used to find the α_i

Non-linearly Separable Problems

• [Figure: points outside the margin have α_i = 0 (e.g. α_1 = 0), points on the margin have α_i ≤ C (e.g. α_2 ≤ C), and points violating the margin have α_i = C (e.g. α_3 = C)]

SVM with KERNELS: Large-margin NON-linear classifiers

Extension to Non-linear Decision Boundary

• So far, we have only considered large-margin classifiers with a linear decision boundary
• How can we generalize them to become nonlinear?
• Key idea: transform x_i to a higher-dimensional space to "make life easier"
  - Input space: the space where the points x_i are located
  - Feature space: the space of the φ(x_i) after transformation
• Why transform?
  - A linear operation in the feature space is equivalent to a non-linear operation in the input space
  - Classification can become easier with a proper transformation. In the XOR problem, for example, adding a new feature x_1·x_2 makes the problem linearly separable (as sketched below)

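A minimal sketch of that XOR remark (plain numpy; the particular separating hyperplane below is an illustrative choice, not from the slides):

# XOR becomes linearly separable once the feature x1*x2 is added.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                  # XOR labels

phi = np.c_[X, X[:, 0] * X[:, 1]]             # phi(x) = (x1, x2, x1*x2)
w, b = np.array([1.0, 1.0, -2.0]), -0.5       # one hyperplane that separates the mapped points
print(np.sign(phi @ w + b))                   # matches y: [-1.  1.  1. -1.]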

Transforming the Data

• [Figure: the map φ(·) sends every point from the input space to the feature space]
• Note: the feature space is of higher dimension than the input space in practice
• Computation in the feature space can be costly because it is high-dimensional
  - The feature space is typically infinite-dimensional!
• The kernel trick comes to the rescue

Non-linear SVMs: Feature spaces

• General idea: the original feature space can always be mapped to some higher-dimensional feature space where the training set is separable:

    Φ: x → φ(x)

The Kernel Trick

• Recall the SVM optimization problem
• The data points only appear as inner products
• As long as we can calculate the inner product in the feature space, we do not need the mapping explicitly
• Many common geometric operations (angles, distances) can be expressed by inner products
• Define the kernel function K by K(x_i, x_j) = φ(x_i)ᵀφ(x_j)

SVMs with kernels

• Training:

    maximize   Σ_{i=1..l} α_i − ½ Σ_{i=1..l} Σ_{j=1..l} α_i α_j y_i y_j K(x_i, x_j)
    subject to Σ_{i=1..l} α_i y_i = 0  and  C ≥ α_i ≥ 0

• Classification of x:

    h(x) = sign( Σ_{i=1..l} α_i y_i K(x_i, x) + b )

An Example for φ(·) and K(·,·)

• Suppose φ(·) is given as follows (for x = (x_1, x_2)):

    φ(x) = (1, √2 x_1, √2 x_2, x_1², x_2², √2 x_1 x_2)

• An inner product in the feature space is

    φ(x)ᵀφ(z) = (1 + x_1 z_1 + x_2 z_2)²

• So, if we define the kernel function as K(x, z) = (1 + x_1 z_1 + x_2 z_2)², there is no need to carry out φ(·) explicitly
• This use of a kernel function to avoid carrying out φ(·) explicitly is known as the kernel trick

Kernel Functions

• Kernel (Gram) matrix:

    K = [ K(x_i, x_j) ],  i, j = 1, ..., l

• The matrix is obtained from the product K = Φ′Φ, where Φ is the matrix whose columns are the φ(x_i)

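A quick numeric check of the example above, assuming the degree-2 map written there (illustrative values, not from the slides):

# Verify that the kernel (1 + x.z)^2 equals the explicit inner product phi(x).phi(z).
import numpy as np

def phi(v):
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

def K(x, z):
    return (1.0 + x @ z) ** 2

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z), K(x, z))   # both print 4.0
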
Kernel Functions

• Any function K(x, z) that creates a symmetric, positive definite matrix K_ij = K(x_i, x_j) is a valid kernel (an inner product in some space)
• Why? Because any symmetric positive definite matrix M can be decomposed as M = N′N, so N can be seen as the projection to the feature space

Kernel Functions

• Another view: a kernel function, being an inner product, is really a similarity measure between the objects
• Not all similarity measures are allowed – they must satisfy the Mercer conditions
• Any distance measure can be translated into a kernel
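A quick sanity check of this property (a sketch, not a proof): a candidate kernel should give a symmetric positive semi-definite Gram matrix on any sample of points.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

def rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2 * sigma ** 2))

G = np.array([[rbf(a, b) for b in X] for a in X])
print(np.allclose(G, G.T))                    # symmetric
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # eigenvalues >= 0 (up to round-off)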

Examples of Kernel Functions

• Polynomial kernel with degree d:

    K(x, z) = (xᵀz + 1)^d

• Radial basis function (RBF) kernel with width σ:

    K(x, z) = exp(−||x − z||² / (2σ²))

  - Closely related to radial basis function neural networks
  - The feature space is infinite-dimensional
• Sigmoid kernel with parameters κ and θ:

    K(x, z) = tanh(κ xᵀz + θ)

  - It does not satisfy the Mercer condition for all κ and θ

Modification Due to Kernel Function

• Change all inner products to kernel functions
• For training:
  - Original:              maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j x_iᵀx_j
  - With kernel function:  maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)
  (in both cases subject to C ≥ α_i ≥ 0 and Σ_i α_i y_i = 0)

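A sketch of the three kernels listed above as plain functions, vectorised over two sets of points (the names and parameterisation are illustrative choices):

import numpy as np

def polynomial_kernel(X, Z, d=3):
    return (X @ Z.T + 1.0) ** d

def rbf_kernel(X, Z, sigma=1.0):
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

def sigmoid_kernel(X, Z, kappa=1.0, theta=0.0):
    # Not positive semi-definite for every (kappa, theta), as noted above.
    return np.tanh(kappa * (X @ Z.T) + theta)
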
Modification Due to Kernel Function

• For testing, the new data point z is classified as class 1 if f ≥ 0, and as class 2 if f < 0
  - Original:              f(z) = wᵀz + b = Σ_i α_i y_i x_iᵀz + b
  - With kernel function:  f(z) = Σ_i α_i y_i K(x_i, z) + b

More on Kernel Functions

• Since the training of an SVM only requires the values K(x_i, x_j), there is no restriction on the form of x_i and x_j
  - x_i can be a sequence or a tree, instead of a feature vector
• K(x_i, x_j) is just a similarity measure comparing x_i and x_j
• For a test object z, the discriminant function is essentially a weighted sum of the similarities between z and a pre-selected set of objects (the support vectors)
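As a concrete illustration of a kernelised, non-linear decision boundary, here is a sketch using the RBF kernel in scikit-learn (scikit-learn and the toy data set are assumptions; note scikit-learn parameterises the RBF width as gamma = 1/(2σ²)):

from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
clf = SVC(kernel="rbf", C=1.0, gamma=2.0).fit(X, y)
print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)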

More on Kernel Functions

• Not every similarity measure can be used as a kernel function, however
  - The kernel function needs to satisfy the Mercer condition, i.e., the function must be "positive definite"
  - This implies that the n by n kernel matrix, in which the (i,j)-th entry is K(x_i, x_j), is always positive definite
  - This also means that the QP is convex and can be solved in polynomial time

Choosing the Kernel Function

• Probably the trickiest part of using an SVM.
• The kernel function is important because it creates the kernel matrix, which summarizes all the data
• Many principles have been proposed (diffusion kernel, Fisher kernel, string kernel, …)
• There is even research on estimating the kernel matrix from available information
• In practice, a low-degree polynomial kernel or an RBF kernel with a reasonable width is a good initial try
• Note that an SVM with an RBF kernel is closely related to RBF neural networks, with the centers of the radial basis functions chosen automatically by the SVM

Other Aspects of SVM

• How to use SVM for multi-class classification?
  - One can change the QP formulation to become multi-class
  - More often, multiple binary classifiers are combined
  - One can train multiple one-versus-all classifiers, or combine multiple pairwise classifiers "intelligently"
• How to interpret the SVM discriminant function value as a probability?
  - By performing logistic regression on the SVM outputs for a set of data (a validation set) that is not used for training
• Some SVM software (like LIBSVM) has these features built in (see the sketch below)

Software

• A list of SVM implementations can be found at http://www.kernel-machines.org/software.html
• Some implementations (such as LIBSVM) can handle multi-class classification
• SVMLight is among the earliest implementations of SVM
• Several Matlab toolboxes for SVM are also available
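A sketch of both features via scikit-learn's SVC, which wraps LIBSVM (an assumption of this sketch): pairwise binary classifiers are combined internally for multi-class problems, and probability=True fits a logistic (Platt-style) model on internally held-out folds.

from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)         # 3 classes
clf = SVC(kernel="rbf", C=1.0, gamma="scale",
          decision_function_shape="ovr",  # combine pairwise votes into one-vs-rest scores
          probability=True, random_state=0).fit(X, y)

print(clf.predict(X[:3]))                 # class labels
print(clf.predict_proba(X[:3]))           # calibrated class probabilities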

Summary: Steps for Classification

• Prepare the pattern matrix
• Select the kernel function to use
• Select the parameters of the kernel function and the value of C
  - You can use the values suggested by the SVM software, or you can set apart a validation set to determine the values of the parameters
• Execute the training algorithm and obtain the α_i
• Unseen data can be classified using the α_i and the support vectors
• (An end-to-end sketch of these steps follows the next slide)

Strengths and Weaknesses of SVM

• Strengths
  - Training is relatively easy
  - No local optima, unlike in neural networks
  - It scales relatively well to high-dimensional data
  - The trade-off between classifier complexity and error can be controlled explicitly
  - Non-traditional data like strings and trees can be used as input to an SVM, instead of feature vectors
• Weaknesses
  - Need to choose a "good" kernel function.

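An end-to-end sketch of the steps listed above: choose a kernel, tune its parameter and C on held-out data, train, then classify unseen data (scikit-learn assumed; the data set and parameter grid are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Kernel-parameter and C selection via cross-validation on the training split
grid = GridSearchCV(SVC(kernel="rbf"),
                    param_grid={"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1]},
                    cv=5)
grid.fit(X_train, y_train)              # training (the alpha_i are found internally)

print("best parameters:", grid.best_params_)
print("test accuracy  :", grid.best_estimator_.score(X_test, y_test))
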
Other Types of Kernel Methods

• A lesson learnt from SVM: a linear algorithm in the feature space is equivalent to a non-linear algorithm in the input space
• Standard linear algorithms can be generalized to non-linear versions by going to the feature space
  - Kernel principal component analysis, kernel independent component analysis, kernel canonical correlation analysis, kernel k-means and 1-class SVM are some examples

Conclusion

• SVM is a useful alternative to neural networks
• Two key concepts of SVM: maximize the margin and the kernel trick
• Many SVM implementations are available on the web for you to try on your data set!

Examples: Toy Examples

• All examples have been run with the 2D graphic interface of LIBSVM (Chang and Lin, National Taiwan University)
• "LIBSVM is an integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR) and distribution estimation (one-class SVM). It supports multi-class classification. The basic algorithm is a simplification of both SMO by Platt and SVMLight by Joachims. It is also a simplification of the modification 2 of SMO by Keerthi et al. Our goal is to help users from other fields to easily use SVM as a tool. LIBSVM provides a simple interface where users can easily link it with their own programs…"
• Available from: www.csie.ntu.edu.tw/~cjlin/libsvm (it includes a Web-integrated demo tool)

Examples: Toy Examples (I)

• Linearly separable data set
• Linear SVM: maximal margin hyperplane
• [Figure] What happens if we add a blue training example close to the boundary?

Examples: Toy Examples (I)

• (still) Linearly separable data set
• Linear SVM, high value of the C parameter
• Maximal margin hyperplane
• [Figure] The added example is correctly classified

Examples: Toy Examples (I)

• (still) Linearly separable data set
• Linear SVM, low value of the C parameter
• Trade-off between margin and training error
• [Figure] The added example is now a bounded SV
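A rough sketch of the same experiment with scikit-learn instead of the LIBSVM demo GUI (the data set is an illustrative stand-in for the one in the figures): compare a high and a low value of C and inspect the margin width and whether the extra point near the boundary ends up as a support vector.

import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [2.5, 3.0], [3.0, 2.5],      # class +1
              [0.0, 0.0], [0.5, 1.0], [1.0, 0.2],      # class -1
              [1.3, 1.2]])                             # extra point added near the boundary
y = np.array([1, 1, 1, -1, -1, -1, -1])

for C in (100.0, 0.01):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: margin width = {2 / np.linalg.norm(clf.coef_):.2f}, "
          f"extra point is a support vector: {6 in clf.support_}")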

Examples: Toy Examples (I)

• [Figures only: four further toy-example slides]

Resources

• http://www.kernel-machines.org/
• http://www.support-vector.net/
• http://www.support-vector.net/icml-tutorial.pdf
• http://www.kernel-machines.org/papers/tutorial-nips.ps.gz
• http://www.clopinet.com/isabelle/Projects/SVM/applist.html

Transduction with SVMs

The Learning Problem

• Transduction:
  - We consider a phenomenon f that maps inputs (instances) x to outputs (labels) y = f(x), with y ∈ {−1, 1}
  - Given a set of labeled examples {(x_i, y_i) : i = 1, …, n} and a set of unlabeled examples x′_1, …, x′_m,
  - the goal is to find the labels y′_1, …, y′_m
• There is no need to construct a function f; the output of the transduction algorithm is a vector of labels.

Transduction Based on Margin Size

• Binary classification, linear parameterization, joint set of (training + working) samples
• Two objectives of transductive learning:
  - (TL1) separate the labeled training data using a large-margin hyperplane (as in the standard inductive SVM)
  - (TL2) separate (explain) the working data set using a large-margin hyperplane.

Transductive SVMs

• Transductive instead of inductive (Vapnik, 1998)
• TSVMs take into account a particular test set and try to minimize misclassifications of just those particular examples
• Formal setting:

    S_train = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}
    S_test  = {x*_1, x*_2, …, x*_k}   (normally k ≫ n)

• Goal of the transductive learner L: find a function h_L = L(S_train, S_test) so that the expected number of erroneous predictions on the test examples is minimized

Induction vs Transduction

• [Figure comparing the inductive and the transductive setting]

Optimization Formulation for SVM Transduction

• Given: the joint set of (training + working) samples
• Denote the slack variables ξ_i for the training samples and ξ*_j for the working samples
• Minimize

    R(w, b) = ½ (w·w) + C Σ_{i=1..n} ξ_i + C* Σ_{j=1..m} ξ*_j

  subject to

    y_i [(w·x_i) + b] ≥ 1 − ξ_i
    y*_j [(w·x_j) + b] ≥ 1 − ξ*_j
    ξ_i, ξ*_j ≥ 0,  i = 1, ..., n,  j = 1, ..., m
    where y*_j = sign((w·x_j) + b), j = 1, ..., m

• Solution (~ decision boundary): D(x) = (w*·x) + b*
• Unbalanced situation (small training set / large test set): all unlabeled samples may get assigned to one class
• Additional constraint:

    (1/n) Σ_{i=1..n} y_i = (1/m) Σ_{j=1..m} [(w·x_j) + b]

Optimization Formulation (cont'd)

• Hyperparameters C and C* control the trade-off between explanation and margin size
• The soft-margin inductive SVM is a special case of soft-margin transduction with zero slacks ξ*_j = 0
• A dual + kernel version of SVM transduction also exists
• Transductive SVM optimization is not convex (~ non-convexity of the loss for unlabeled data) – different optimization heuristics yield different solutions
• An exact solution (via exhaustive search) is possible only for a small number of test samples (m)

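A heavily hedged sketch of one simple heuristic in the spirit of transductive SVMs (not the exact algorithm of these slides): train on the labeled data, pseudo-label the working set, then retrain while gradually increasing the influence C* of the unlabeled points. scikit-learn is assumed; the function name, weighting scheme and schedule are illustrative choices.

import numpy as np
from sklearn.svm import SVC

def simple_tsvm(X_lab, y_lab, X_unl, C=1.0, C_star_max=1.0, steps=5):
    clf = SVC(kernel="linear", C=C).fit(X_lab, y_lab)
    y_unl = clf.predict(X_unl)                        # initial pseudo-labels for the working set
    X_all = np.vstack([X_lab, X_unl])
    for C_star in np.linspace(C_star_max / steps, C_star_max, steps):
        weights = np.r_[np.full(len(X_lab), 1.0),     # labeled points: full weight
                        np.full(len(X_unl), C_star / C)]  # unlabeled points: growing weight
        clf = SVC(kernel="linear", C=C).fit(X_all, np.r_[y_lab, y_unl],
                                            sample_weight=weights)
        y_unl = clf.predict(X_unl)                    # re-label the working set
    return clf, y_unl
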
Many Applications for Transduction

• Text categorization: classify word documents into a number of predetermined categories
• Email classification: spam vs non-spam
• Web page classification
• Image database classification
• All these applications have:
  - high-dimensional data
  - a small labeled training set (human-labeled)
  - a large unlabeled test set

Example Application

• Prediction of molecular bioactivity for drug discovery
  - Training data ~1,909 samples; test ~634 samples
  - Input space ~139,351-dimensional
• Prediction accuracy: SVM induction ~74.5%; transduction ~82.3%
• Ref: J. Weston et al., "KDD Cup 2001 data analysis: prediction of molecular bioactivity for drug design – binding to thrombin", Bioinformatics, 2003
