0% found this document useful (0 votes)

63 views

Support Vector Machines: Xiaojin Zhu

Support vector machines (SVMs) are a state-of-the-art classification algorithm that finds the optimal separating hyperplane between classes of data. SVMs maximize the margin between these classes to create an optimal linear decision boundary, even in cases where the classes are not linearly separable. This is done by mapping the data into a higher dimensional feature space and finding a linear separator in that space. SVMs can handle both binary and multiclass classification problems using kernels to efficiently compute the inner products between points in the higher dimensional feature space without explicitly computing the mapping.

Uploaded by

gunnersregister

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

63 views

Support Vector Machines: Xiaojin Zhu

Uploaded by

gunnersregister

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 41

Support Vector Machines

Xiaojin Zhu
[email protected]
Computer Sciences Department
University of Wisconsin, Madison

[ Based on slides from Andrew Moore http://www.cs.cmu.edu/~awm/tutorials]

slide 1

The no-math verison

Support vector machines

slide 2

Courtesy of Steve Chien of NASA/JPL

Lake Mendota, Madison, WI

Lake Mendota, Wisconsin

Identify areas of land cover (land, ice,

water, snow) in a scene
Three algorithms:

Scientist manually derived

Automatic best ratio

Support Vector Machine (SVM)

Classifier

Expert
Derived

Automated
Ratio

SVM

cloud

45.7%

43.7%

58.5%

ice

60.1%

34.3%

80.4%

land

93.6%

94.7%

94.0%

snow

63.5%

90.4%

71.6%

water

84.2%

74.3%

89.1%

unclassified

45.7%

Visible
Image

Expert
Labeled

Expert
Derived

Automated
Ratio

slide 3

SVM

Support vector machines

The state-of-the-art classifier

Class labels
denotes +1

denotes -1

How would you

classify this data?

slide 4

Linear classifier

If all you can do is to draw a straight line, the linear

decision boundary

slide 5

Linear classifier

Another OK decision boundary

slide 6

slide 7

Any of these
would be fine..
..but which is the
best?

slide 8

The margin

Margin: the width that the boundary can be increased

before hitting a data point

slide 9

SVM: maximize margin

The simplest SVM (linear SVM) is the linear classifier

with the maximum margin

Linear SVM
slide 10

SVM: linearly non-separable data

What if the data is not linearly separable?

slide 11

SVM: linearly non-separable data

Two solutions:
Allow a few points on the wrong side (slack
variables), and/or
Map data to a higher dimensional space, do linear
classification there (kernel)

slide 12

SVM: more than two classes

N class problem: Split the task into N binary tasks:

Class 1 vs. the rest (class 2N)
Class 2 vs. the rest (class 1, 3N)

Class N vs. the rest
Finally, pick the class that put the point furthest into
the positive region.

slide 13

SVM: get your hands on it

There are many implementations

http://www.support-vector.net/software.html
http://svmlight.joachims.org/

You dont have to know the rest of the class in order

to use SVM.

end of class
(But you want to know more, dont you?)

slide 14

The math verison

Support vector machines

slide 15

Vector

A vector W in d-dimensional space is a list of d

numbers, e.g. W=(-1,2)
A vector is a vertical column. is matrix transpose
Vector is a line segment, with arrow, in space
The norm of a vector ||W|| is its length
||W|| = sqrt(W W)
W
Inner product: XY = XiYi

What d-dimensional point X

makes W X = 0?

slide 16

Lines

W X is a scalar (single number): Xs projection onto W

WX=0 specifies the set of points X, which is the line
perpendicular to W, and intersects at (0,0)

WX=1, or WX-1=0

WX=0
WX=-1, or WX+1 = 0

1/||W||

WX=1 is the line parallel to WX=0, shifted by 1/||W||

What if the boundary doesnt go through origin?
slide 17

SVM boundary and margin

Want: find W, b (offset) such that

all positive training points (X,Y=1) in the red zone,
all negative ones (X, Y=-1) in the blue zone,
the margin M is maximized
Plus-Plane
Classifier Boundary
Minus-Plane

2/||W||

M=2/||W|| (why?)
How do we find such W,b?
slide 18

SVM as constrained optimization

Variables: W, b
Objective function: maximize the margin M=2/||W||
Equiv. to minimize ||W||, or ||W||2=WW, or WW

Assume N training points (Xi, Yi), Yi = 1 or -1

Subject to each training point on the correct side (the
constraint). How many constraints do we have?
Plus-Plane
Classifier Boundary
Minus-Plane

2/||w||

slide 19

SVM as constrained optimization

Variables: W, b
Objective function: maximize the margin M=2/||W||
Equiv. to minimize ||W||, or ||W||2=WW, or WW

Assume N data points (Xi, Yi), Yi = 1 or -1

Subject to each training point on the correct side (the
constraint). How many constraints do we have? N
WXi + b >= 1, if Yi=1
WXi + b <= -1, if Yi=-1
we can unify them: Yi(WXi+b) >= 1
We have a continuous constrained optimization
problem! What do we do?
slide 20

SVM as QP

minW,b WW
Subject to Yi (WXi + b) >= 1, for all i
Objective is convex, quadratic
Linear constraints
This problem is known as Quadratic Program (QP),
for which efficient global solution algorithms exist.
Plus-Plane
Classifier Boundary
Minus-Plane

2/||W||

slide 21

Non-separable case

What about this?

Can we insist on Yi (WXi + b) >= 1, for all i?

slide 22

Trick #1: slack variables

Relax the constraints allow a few bad apples

For a given linear boundary W, b, we can compute
how far off into the wrong side a bad point is

e11

Heres how we relax the constraints:

Yi (WXi + b) >= 1- ei
slide 23

Trick #1: SVM with slack variables

minW,b,e WW + C i ei
Subject to

Trade-off parameter

Yi (WXi + b) >= 1- ei for all i

ei>=0 for all i (why?)

e11

slide 24

Trick #1: SVM with slack variables

minW,b,e WW + C i ei
Subject to

Trade-off parameter

Yi (WXi + b) >= 1- ei for all i

ei>=0 for all i

Originally we optimize variables 11

e2
W (d-dimensional vector),
b
Now we optimize W, b, e1 eN

Now we have 2N constraints

Still a QP (soft-margin SVM)
slide 25

Another look at non-separable case

Heres another non-separable dataset, in 1dimensional space

We can of course use slack variables but heres
another trick!

x=0

slide 26

Trick #2:Map data to high dimensional space

We can map data x from 1-D to 2-D by x (x, x2)

x=0
slide 27

Trick #2:Map data to high dimensional space

We can map data x from 1-D to 2-D by

x (x)=(x, x2)
Now the data is
linearly separable in
the new space!
We can run SVM in
the new space
The linear boundary
in the new space
corresponds to a nonlinear boundary in the
old space
x=0
slide 28

Another example

( x1 , x2 ) ( x1 , x2 , x x )
2
1

2
2

slide 29

Trick #2:Map data to high dimensional space

In general we might want to map an already highdimensional X=(x1, x2, , xd) into some much higher,
even infinite dimensional space (x)
Problems:
How do you represent infinite dimensions?
We need to learn (amount other things) W, which
lives in the new space learning a large (or
infinite) number of variables in QP is not a good
idea.

x=0
slide 30

Trick #2: kernels

We will do several things:

Convert it into a equivalent QP problem, which
does not use W, or even (X) alone!
It only uses the inner product (X) (Y), where X and Y
are two training points. The solution also only uses such
inner product
It still seems infeasible to compute for high (infinite)
dimensions

But there are smart ways to compute such inner

product, known as kernel (a function on two var.)

kernel(X,Y) inner product (X) (Y)

Why should you care:

One kernel, one new (higher dimensional) space
You will impress friends at cocktail parties
slide 31

Prepare to bite the bullet

Heres the original QP formula

minW,b WW
Subject to Yi (WXi + b) >= 1, for all i (N constraints)
Remember Lagrange multipliers?
We have them

L = WW ai [Yi (WXi + b) 1]

here because
those are
inequality
constraints

Subject to ai >= 0, for all i

We want the gradient of L to vanish with respect to W,
b, a. Try this and youll get
W = ai Yi Xi

ai Yi = 0
Put them back into the Lagrangian L
slide 32

Biting the bullet

max{ai} ai i,jai aj Yi Yj Xi Xj
Subject to

ai >= 0, for all I

ai Yi = 0
This is an equivalent QP problem (the dual)
Before we optimize W (d variables), now we optimize
a (N variables): which is better?
X only appears in the inner product

slide 33

Biting the bullet

max{ai} ai i,jai aj Yi Yj (Xi) (Xj)
Subject to

ai >= 0, for all I

ai Yi = 0

If we map X to
new space
(X)

This is an equivalent QP problem

Before we optimize W (d variables), now we optimize
a (N variables): which is better?
X only appears in the inner product

slide 34

Biting the bullet

max{ai} ai i,jai aj Yi Yj K(Xi, Xj)
Subject to

ai >= 0, for all I

ai Yi = 0

If we map X to
new space
(X)

This is an equivalent QP problem

Before we optimize W (d variables), now we optimize
a (N variables): which is better?
X only appears in the inner product

Kernel K(Xi,Xj) = (Xi)

(Xj)

slide 35

Whats special about kernel

Say data is two dimensional: s=(s1,s2)

We decide to use a particular mapping into 6
dimensional space
(s)=(s12, s22, 2 s1s2, s1, s2, 1)
Let another point be t=(t1,t2).
(s) (t)=s12 t12 + s22 t22 + 2 s1s2t1t2 + s1t1 + s2t2 + 1

slide 36

Whats special about kernel

Say data is two dimensional: s=(s1,s2)

We decide to use a particular mapping into 6
dimensional space
(s)=(s12, s22, 2 s1s2, 2 s1, 2 s2, 1)
Let another point be t=(t1,t2).
(s) (t)=s12 t12 + s22 t22 + 2 s1s2t1t2 + 2 s1t1 + 2 s2t2 + 1

Let the kernel be K(s,t) = (st+1)2

Verify that they are the same. We saved
computation.

slide 37

Kernels

You ask: Is there such a good K for any I pick?

The inverse question: Given some K, is there a so
that K(X,Y) = (X) (Y)?
Mercers condition: the inverse question is true

This is positive semi-definiteness, if you must know

may be infinite dimensional; we may not be able to
explicitly write down

slide 38

Some frequently used kernels

Linear kernel: K(X,Y) = XY

Quadratic kernel: K(X,Y) = (XY+1)2
Polynomial kernel: K(X,Y) = (XY+1)n
Radial Basis Function kernel: K(X,Y) = exp(- ||X-Y||2/)
Many, many other kernels
Hacking with SVM: create various kernels, hope their
space is meaningful, plug them into SVM, pick the
one with good classification accuracy (equivalent to
feature engineering)
Kernel summary: QP of size N, nonlinear SVM in the
original space, new space possibly high/infinite dim,
efficient if K is easy to compute
Kernel can be combined with slack variables
slide 39

Why the name support vector machines

max{ai} ai i,jai aj Yi Yj K(Xi, Xj)
Subject to

ai >= 0, for all I

ai Yi = 0
The decision boundary is
f(Xnew)=W Xnew + b= ai Yi Xi Xnew + b
In practice, many as will be zero in the solution!
Those few X with a>0 lies on the margin (blue or
red lines), they are the support vectors

slide 40

What you should know

The intuition, where to find software

Vector, line, length
Margin
QP with linear constraints
How to handle non-separable data
Slack variables
Kernels new feature space

Ref: A Tutorial on Support Vector Machines for Pattern

Recognition (1998) Christopher J. C. Burges

slide 41

Pediatric Nursing Care (Case Study)
80% (46)
Pediatric Nursing Care (Case Study)
37 pages
Biomedical Applications of Composite Materials
50% (2)
Biomedical Applications of Composite Materials
30 pages
Introduction To: Support Vector Machines
No ratings yet
Introduction To: Support Vector Machines
53 pages
4 - SVM
No ratings yet
4 - SVM
58 pages
SVM
No ratings yet
SVM
57 pages
SVM PRESENTATION
No ratings yet
SVM PRESENTATION
34 pages
CS-13410 Introduction To Machine Learning
No ratings yet
CS-13410 Introduction To Machine Learning
33 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
MergedPDF Iml
No ratings yet
MergedPDF Iml
114 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
Support Vector Machine
No ratings yet
Support Vector Machine
35 pages
(Optimization) SVMs
No ratings yet
(Optimization) SVMs
19 pages
SVM Tutorial
No ratings yet
SVM Tutorial
34 pages
SVM
No ratings yet
SVM
40 pages
Introduction To Support Vector Machines: 1 Description
No ratings yet
Introduction To Support Vector Machines: 1 Description
15 pages
Support Vector Machines: Logisic Regression
No ratings yet
Support Vector Machines: Logisic Regression
10 pages
SVM Class
No ratings yet
SVM Class
33 pages
Support Vector Machine
No ratings yet
Support Vector Machine
45 pages
SVM-CDing2024 11 15
No ratings yet
SVM-CDing2024 11 15
54 pages
Lec5 Support vector machine
No ratings yet
Lec5 Support vector machine
28 pages
Lecture 7_SVM
No ratings yet
Lecture 7_SVM
125 pages
Support Vector Machines: More Generally Kernel Methods
No ratings yet
Support Vector Machines: More Generally Kernel Methods
58 pages
SVM Tutorial
No ratings yet
SVM Tutorial
31 pages
2024-SCU-ML-2-1-SVM
No ratings yet
2024-SCU-ML-2-1-SVM
36 pages
UNIT - 2
No ratings yet
UNIT - 2
15 pages
Lecture Slides-Week12
100% (1)
Lecture Slides-Week12
41 pages
Lec06 SVM
No ratings yet
Lec06 SVM
25 pages
cs221-lecture11
No ratings yet
cs221-lecture11
71 pages
Support Vector Machines: Constantin F. Aliferis & Ioannis Tsamardinos
No ratings yet
Support Vector Machines: Constantin F. Aliferis & Ioannis Tsamardinos
37 pages
Machine Learning
No ratings yet
Machine Learning
45 pages
Support Vector Machin, An Excellent Tool
No ratings yet
Support Vector Machin, An Excellent Tool
36 pages
Lecture 9 - SVMs
No ratings yet
Lecture 9 - SVMs
8 pages
ML Lec SVM Linear
No ratings yet
ML Lec SVM Linear
19 pages
Dis11 Sol
No ratings yet
Dis11 Sol
5 pages
SVM Consolidated
No ratings yet
SVM Consolidated
34 pages
SVM
No ratings yet
SVM
44 pages
SVM Explained PDF
No ratings yet
SVM Explained PDF
19 pages
10 SVM
No ratings yet
10 SVM
23 pages
This Is
No ratings yet
This Is
7 pages
SVM Notes
No ratings yet
SVM Notes
40 pages
Machine Learning - Open Elective - Part III
No ratings yet
Machine Learning - Open Elective - Part III
90 pages
SVM - Hype or Hallelujah
No ratings yet
SVM - Hype or Hallelujah
13 pages
Feature Selection For SVMS: by J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik
No ratings yet
Feature Selection For SVMS: by J. Weston, S. Mukherjee, O. Chapelle, M. Pontil, T. Poggio, V. Vapnik
19 pages
Another Introduction SVM
No ratings yet
Another Introduction SVM
4 pages
Support Vector Machines Ymod
No ratings yet
Support Vector Machines Ymod
4 pages
Pattern Recognition & Learning II: © UW CSE Vision Faculty
No ratings yet
Pattern Recognition & Learning II: © UW CSE Vision Faculty
47 pages
315 F19 14 SVM 1
No ratings yet
315 F19 14 SVM 1
33 pages
SVM & Image Classification.
No ratings yet
SVM & Image Classification.
22 pages
Support Vector Machine
No ratings yet
Support Vector Machine
38 pages
Support Vector Machines: (Vapnik, 1979)
No ratings yet
Support Vector Machines: (Vapnik, 1979)
34 pages
CH 5 SVM
No ratings yet
CH 5 SVM
25 pages
Support Vector Machines: Some Slides Adapted From
No ratings yet
Support Vector Machines: Some Slides Adapted From
54 pages
Support Vector Machines: Jeff Wu
No ratings yet
Support Vector Machines: Jeff Wu
35 pages
Time Series Forecasting by Using Wavelet Kernel SVM
No ratings yet
Time Series Forecasting by Using Wavelet Kernel SVM
52 pages
Support Vector Machine
No ratings yet
Support Vector Machine
50 pages
SVM Scribe Notes
No ratings yet
SVM Scribe Notes
16 pages
22-Kernel Tricks Shit
No ratings yet
22-Kernel Tricks Shit
43 pages
SML Unit 4
No ratings yet
SML Unit 4
61 pages
Chapter 7
No ratings yet
Chapter 7
64 pages
7 SVM for Scientists Annotated (1)
No ratings yet
7 SVM for Scientists Annotated (1)
76 pages
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
From Everand
Student's Solutions Manual and Supplementary Materials for Econometric Analysis of Cross Section and Panel Data, second edition
Jeffrey M. Wooldridge
No ratings yet
Generalized Fermat Equation
From Everand
Generalized Fermat Equation
Ran Van Vo
No ratings yet
Arithmetic Coding: I J I J J
No ratings yet
Arithmetic Coding: I J I J J
12 pages
Huffman Codes
No ratings yet
Huffman Codes
18 pages
Comp 5
No ratings yet
Comp 5
18 pages
Overlap and Add and Overlap and Save Methods
No ratings yet
Overlap and Add and Overlap and Save Methods
6 pages
Comp 1
No ratings yet
Comp 1
15 pages
DPCM With Quantisation Example: Transmitter
No ratings yet
DPCM With Quantisation Example: Transmitter
27 pages
Exam2 (2006 Spring)
No ratings yet
Exam2 (2006 Spring)
3 pages
Lect 26 PDF
No ratings yet
Lect 26 PDF
14 pages
An Invitation To 3-D Vision PDF
No ratings yet
An Invitation To 3-D Vision PDF
338 pages
Chemistry 8th Edition Steven S. Zumdahl download
100% (3)
Chemistry 8th Edition Steven S. Zumdahl download
50 pages
Preventive Maintenance Check List: March 1995
100% (2)
Preventive Maintenance Check List: March 1995
115 pages
Kicking in Soccer: Kicking: Art and Sports
No ratings yet
Kicking in Soccer: Kicking: Art and Sports
33 pages
Cladogram
No ratings yet
Cladogram
2 pages
Reproductive Health Bill (Philippines)
No ratings yet
Reproductive Health Bill (Philippines)
18 pages
Vocabulary For Argumentative Writing
No ratings yet
Vocabulary For Argumentative Writing
2 pages
Abs Book
No ratings yet
Abs Book
75 pages
Organisation Chart EZK 20 March 2023
No ratings yet
Organisation Chart EZK 20 March 2023
1 page
Product Information: Can-Switch
No ratings yet
Product Information: Can-Switch
3 pages
Bhaskararaya Makhin
No ratings yet
Bhaskararaya Makhin
8 pages
Revlec 3
No ratings yet
Revlec 3
30 pages
Data Kapal IKLAN
100% (1)
Data Kapal IKLAN
54 pages
Religious Symbols and Motifs in Crime and Punishment
No ratings yet
Religious Symbols and Motifs in Crime and Punishment
3 pages
Basic Electronics 2nd Semkjjj
No ratings yet
Basic Electronics 2nd Semkjjj
3 pages
Kaun Banega Crorepati Computer C++ Project
No ratings yet
Kaun Banega Crorepati Computer C++ Project
20 pages
Oscillations Class 11 Notes Physics: - Periodic Motion
No ratings yet
Oscillations Class 11 Notes Physics: - Periodic Motion
6 pages
BEDA Faq
No ratings yet
BEDA Faq
3 pages
Biomimicry of Palm Tree Leaves Form and Pattern On Building Form
No ratings yet
Biomimicry of Palm Tree Leaves Form and Pattern On Building Form
7 pages
International Law, World Order, and Critical Legal Studie
No ratings yet
International Law, World Order, and Critical Legal Studie
39 pages
Queue - Princípios de Termodinâmica para Engenharia - Moran &amp Shapiro - 7 Edição - Solucionário PDF
No ratings yet
Queue - Princípios de Termodinâmica para Engenharia - Moran &amp Shapiro - 7 Edição - Solucionário PDF
4 pages
2013-2018 Honda CRF110F Wiring Diagram
No ratings yet
2013-2018 Honda CRF110F Wiring Diagram
1 page
On Gains On Foreign Exchange Transactions
No ratings yet
On Gains On Foreign Exchange Transactions
1 page
Semantics and Pragmatics
100% (1)
Semantics and Pragmatics
38 pages
Chapter 3 (A) Professional Ethics Responsibilities
No ratings yet
Chapter 3 (A) Professional Ethics Responsibilities
48 pages
Notes Module 1
No ratings yet
Notes Module 1
2 pages
Portable Arduino Based Air Quality Monitoring System: Polytechnic University of The Philippines
No ratings yet
Portable Arduino Based Air Quality Monitoring System: Polytechnic University of The Philippines
52 pages
1 s2.0 S0045653521032677 Main
No ratings yet
1 s2.0 S0045653521032677 Main
16 pages
Quran The Sacred Book of Islam
100% (1)
Quran The Sacred Book of Islam
3 pages