Support Vector Machines
Xiaojin Zhu
[email protected]
Computer Sciences Department
University of Wisconsin, Madison
An application: classification accuracy by class

Class          Expert Derived   Automated Ratio   SVM
cloud          45.7%            43.7%             58.5%
ice            60.1%            34.3%             80.4%
land           93.6%            94.7%             94.0%
snow           63.5%            90.4%             71.6%
water          84.2%            74.3%             89.1%
unclassified   45.7%

[Figure: satellite image panels: Visible Image, Expert Labeled, Expert Derived, Automated Ratio]
SVM
[Figure: 2-D training points with class labels; one marker denotes +1, the other denotes -1]
Linear classifier
[Figure: the same data separated by several different candidate lines]
Any of these would be fine... but which is the best?
The margin
The margin of a linear classifier is the width by which the decision boundary could be widened before it hits a training point.
Linear SVM: the linear classifier with the maximum margin.
What if the data are not linearly separable? Two solutions:
- Allow a few points on the wrong side (slack variables), and/or
- Map the data to a higher-dimensional space and do linear classification there (kernel)
End of class. (But you want to know more, don't you?)
Vector
Lines
Plus-plane: WX + b = 1, i.e. WX + b - 1 = 0
Decision boundary: WX + b = 0
Minus-plane: WX + b = -1, i.e. WX + b + 1 = 0
Each of the two planes is at distance 1/||W|| from the boundary, so the margin is M = 2/||W|| (why?)
How do we find such W, b?
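One way to answer the "(why?)": a short derivation sketch in the slide's notation.

```latex
% Pick x^- on the minus-plane; the closest point on the plus-plane is
% x^+ = x^- + \lambda W, since W is normal to both planes.
W \cdot x^+ + b = 1, \quad W \cdot x^- + b = -1
\;\Rightarrow\; \lambda \,(W \cdot W) = 2
\;\Rightarrow\; \lambda = \frac{2}{W \cdot W}
\\
M = \| x^+ - x^- \| = \lambda \,\| W \| = \frac{2}{\| W \|}
```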
Variables: W, b
Objective function: maximize the margin M = 2/||W||
Equivalently, minimize ||W||, or ||W||² = W·W
SVM as QP
  min_{W,b}  W·W
  subject to  Yi (WXi + b) >= 1, for all i
The objective is convex and quadratic; the constraints are linear.
This problem is a quadratic program (QP), for which efficient global solution algorithms exist.
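To make the QP concrete, here is a minimal sketch that solves it on a toy separable dataset. The choice of cvxpy and the data are assumptions for illustration, not part of the slides.

```python
# Hard-margin linear SVM solved as a QP with cvxpy.
import numpy as np
import cvxpy as cp

# Toy separable data: Xi in R^2, Yi in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# minimize W·W  subject to  Yi (W·Xi + b) >= 1 for all i
objective = cp.Minimize(cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("W =", w.value, " b =", b.value, " margin =", 2 / np.linalg.norm(w.value))
```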
[Figure: plus-plane, classifier boundary, minus-plane; the gap between the plus- and minus-planes is 2/||W||]
Non-separable case
[Figure: non-separable data; points on the wrong side of the margin get slack variables ε2, ε7, ε11]
Soft-margin SVM: minimize W·W + C Σi εi, subject to Yi (WXi + b) >= 1 - εi and εi >= 0, for all i.
C is the trade-off parameter between a large margin and few margin violations.
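The slides do not tie the formulation to any particular solver; as one illustration of the trade-off (assuming scikit-learn is available), varying C on overlapping classes:

```python
# Effect of the trade-off parameter C on a soft-margin linear SVM.
# Larger C -> fewer margin violations tolerated; smaller C -> wider margin, more slack.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]
    print(f"C={C:6.2f}  margin width = {2 / np.linalg.norm(w):.3f}  "
          f"support vectors = {len(clf.support_)}")
```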
[Figure: a 1-D example with data on a line around x = 0]
Another example: map (x1, x2) → (x1², x2², x1x2).
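The point of the map is that data which is not linearly separable in the original 2-D space can become separable in the new space. A small illustration; scikit-learn and the ring-vs-disc data are assumptions, not from the slides:

```python
# The quadratic feature map (x1, x2) -> (x1^2, x2^2, x1*x2) from the slide.
# A disc-vs-ring dataset is not linearly separable in 2-D, but it is after the
# map, since x1^2 + x2^2 encodes the radius.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.uniform(0.0, 1.0, 100),    # inner disc, label -1
                    rng.uniform(2.0, 3.0, 100)])   # outer ring, label +1
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.array([-1] * 100 + [+1] * 100)

def phi(X):
    x1, x2 = X[:, 0], X[:, 1]
    return np.c_[x1 ** 2, x2 ** 2, x1 * x2]

for name, data in [("original 2-D", X), ("mapped 3-D", phi(X))]:
    acc = SVC(kernel="linear", C=1e3).fit(data, y).score(data, y)
    print(f"{name}: training accuracy = {acc:.2f}")
```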
In general we might want to map an already high-dimensional X = (x1, x2, ..., xd) into some much higher, even infinite-dimensional space Φ(X).
Problems:
- How do you represent infinite dimensions?
- We need to learn (among other things) W, which lives in the new space; learning a large (or infinite) number of variables in a QP is not a good idea.
The Lagrangian (using the equivalent objective ½ W·W):
  L = ½ W·W - Σi ai [Yi (WXi + b) - 1],  with ai >= 0
(the ai >= 0 are here because those are inequality constraints)
Setting the derivatives to zero:
  ∂L/∂W = 0  →  W = Σi ai Yi Xi
  ∂L/∂b = 0  →  Σi ai Yi = 0
Put them back into the Lagrangian L.
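Carrying out that substitution, a sketch of the step the slide refers to:

```latex
% Substitute W = \sum_i a_i Y_i X_i and \sum_i a_i Y_i = 0 into L:
\tfrac{1}{2} W \cdot W = \tfrac{1}{2}\sum_i \sum_j a_i a_j Y_i Y_j \,(X_i \cdot X_j)
\\
\sum_i a_i Y_i \,(W \cdot X_i) = \sum_i \sum_j a_i a_j Y_i Y_j \,(X_i \cdot X_j),
\qquad b \sum_i a_i Y_i = 0
\\
\Rightarrow\; L = \sum_i a_i \;-\; \tfrac{1}{2} \sum_i \sum_j a_i a_j Y_i Y_j \,(X_i \cdot X_j)
```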
The dual problem:
  max_a  Σi ai - ½ Σi Σj ai aj Yi Yj (Xi · Xj)
  subject to  ai >= 0 for all i, and Σi ai Yi = 0
This is an equivalent QP problem (the dual).
Before, we optimized W (d variables); now we optimize a (N variables). Which is better?
X only appears in the inner product.
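A sketch of the dual solved directly, with the data entering only through the Gram matrix of inner products; cvxpy and the toy data are assumptions:

```python
# Dual SVM QP: X appears only through K[i,j] = Xi · Xj.
import numpy as np
import cvxpy as cp

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = X @ X.T + 1e-9 * np.eye(len(y))   # Gram matrix (tiny ridge keeps it numerically PSD)

a = cp.Variable(len(y))
objective = cp.Maximize(cp.sum(a) - 0.5 * cp.quad_form(cp.multiply(a, y), K))
constraints = [a >= 0, cp.sum(cp.multiply(a, y)) == 0]
cp.Problem(objective, constraints).solve()

# Recover W = Σ ai Yi Xi, and b from a point with ai > 0 (a support vector).
W = (a.value * y) @ X
sv = int(np.argmax(a.value))
b = y[sv] - W @ X[sv]
print("a =", np.round(a.value, 3), " W =", np.round(W, 3), " b =", round(float(b), 3))
```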
If we map X to a new space Φ(X), the dual is unchanged except that Xi · Xj becomes Φ(Xi) · Φ(Xj):
  max_a  Σi ai - ½ Σi Σj ai aj Yi Yj (Φ(Xi) · Φ(Xj))
  subject to  ai >= 0 for all i, and Σi ai Yi = 0
Kernels
A kernel function K(Xi, Xj) computes the inner product Φ(Xi) · Φ(Xj) directly, without ever writing down Φ(X). This answers both problems above: we never represent the (possibly infinite-dimensional) Φ(X), and we never learn W in the new space; we only optimize a.
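For instance, the degree-2 polynomial kernel computes exactly the inner product under (a scaled version of) the quadratic map used earlier. A quick numerical check, assuming plain NumPy and random test vectors:

```python
# Kernel trick sketch: K(x, z) = (x·z)^2 equals Phi(x)·Phi(z) for
# Phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2), i.e. the earlier quadratic map
# up to a sqrt(2) scaling on the cross term.
import numpy as np

def phi(x):
    x1, x2 = x
    return np.array([x1 ** 2, x2 ** 2, np.sqrt(2) * x1 * x2])

def K(x, z):
    return float(np.dot(x, z)) ** 2

rng = np.random.default_rng(0)
x, z = rng.normal(size=2), rng.normal(size=2)
print(K(x, z), float(np.dot(phi(x), phi(z))))   # the two numbers agree
```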
The decision boundary is
  f(Xnew) = W · Xnew + b = Σi ai Yi (Xi · Xnew) + b
In practice, many ai will be zero in the solution!
The few Xi with ai > 0 lie exactly on the margin (the plus- and minus-planes); they are the support vectors.
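This can be seen directly in a trained model. A sketch with scikit-learn, which is an assumption here; its dual_coef_ stores ai·Yi for the support vectors, so the dual form of f(Xnew) can be recomputed and compared with the built-in decision function:

```python
# Support vectors and the dual form f(x) = Σ ai Yi (Xi · x) + b.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 1, (40, 2)), rng.normal(2, 1, (40, 2))])
y = np.array([-1] * 40 + [+1] * 40)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Most training points get ai = 0; only the support vectors remain.
print("support vectors:", len(clf.support_), "of", len(X))

# Recompute f(Xnew) from the dual coefficients and compare with decision_function.
Xnew = np.array([[0.5, -0.3]])
f_dual = clf.dual_coef_ @ (clf.support_vectors_ @ Xnew.T) + clf.intercept_
print(round(f_dual[0, 0], 6), round(clf.decision_function(Xnew)[0], 6))
```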