Machine Learning
1. Analytical Geometry
3. Lagrange Multipliers
5. Soft-margin
• Margin: the smallest distance between the decision boundary and any of the samples.
• The (signed) distance of a point x_n from the decision boundary is:
y(x_n) / ||w||
• For a correctly classified point, t_n·y(x_n) > 0, so its distance is:
t_n·y(x_n) / ||w||
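As a sketch, the distances t_n·y(x_n)/||w|| and the resulting margin can be computed directly; the weights and sample points below are made up for illustration:

```python
import math

# Hypothetical decision boundary y(x) = w.x + b (values chosen for illustration)
w = [2.0, 1.0]
b = -1.0

# Toy samples (x_n, t_n) with labels t_n in {-1, +1}
samples = [([1.0, 1.0], +1), ([2.0, 0.0], +1), ([-1.0, 0.0], -1)]

norm_w = math.sqrt(sum(wi * wi for wi in w))

def y(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Distance of each correctly classified point from the boundary: t_n * y(x_n) / ||w||
distances = [t * y(x) / norm_w for x, t in samples]

# The margin is the smallest of these distances
margin = min(distances)
```

All three toy points are correctly classified, so every distance is positive and the margin is the smallest one.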
• Maximum margin:
arg max_{w,b} { (1/||w||) · min_n [t_n·(w·x_n + b)] }
• Rescaling w and b so that the closest point satisfies t_n·(w·x_n + b) = 1 leaves the constraint:
t_n·(w·x_n + b) ≥ 1 for all n
• To be optimized:
arg min_{w,b} (1/2)·||w||²
with the constraint:
t_n·(w·x_n + b) ≥ 1 for all n
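As a rough numerical sanity check (not how SVMs are solved in practice), this quadratic program can be brute-forced on a tiny two-point dataset; the grid resolution and data below are invented for illustration:

```python
# Toy dataset: two points, one per class
data = [([1.0, 0.0], +1), ([-1.0, 0.0], -1)]

def feasible(w, b):
    # Constraint t_n * (w.x_n + b) >= 1 for all n
    return all(t * (w[0] * x[0] + w[1] * x[1] + b) >= 1 for x, t in data)

best = None  # (||w||^2, w, b)
grid = [i * 0.25 for i in range(-8, 9)]  # -2.0 .. 2.0 in steps of 0.25
for w1 in grid:
    for w2 in grid:
        for b in grid:
            if feasible((w1, w2), b):
                sq = w1 * w1 + w2 * w2
                if best is None or sq < best[0]:
                    best = (sq, (w1, w2), b)
```

For this dataset the constraints force w_1 ≥ 1 and b = 0, so the minimizer found on the grid is w = (1, 0), b = 0.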
• Problem:
arg max_x f(x)
such that:
g(x) = 0
• With the Lagrange function L(x, λ) = f(x) + λ·g(x), a stationary point satisfies:
∂L(x, λ)/∂x_n = ∂f(x)/∂x_n + λ·∂g(x)/∂x_n = 0
and
∂L(x, λ)/∂λ = g(x) = 0
• Example:
f(x) = 1 − u² − v²
with the constraint:
g(x) = u + v − 1 = 0
• Lagrange function:
L(x, λ) = 1 − u² − v² + λ·(u + v − 1)
Setting ∂L/∂u = −2u + λ = 0, ∂L/∂v = −2v + λ = 0 and u + v = 1 gives u = v = 1/2, λ = 1.
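A quick brute-force check of this example confirms the constrained maximum; the grid resolution is chosen arbitrarily:

```python
# Maximize f(u, v) = 1 - u^2 - v^2 subject to u + v - 1 = 0.
# Substitute v = 1 - u from the constraint and scan a grid over u.
best_u, best_f = None, float("-inf")
for i in range(0, 1001):
    u = i / 1000.0
    v = 1.0 - u
    f = 1.0 - u * u - v * v
    if f > best_f:
        best_u, best_f = u, f
```

The scan lands on u = v = 1/2 with f = 1/2, matching the Lagrange-multiplier solution.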
• Problem (inequality constraint):
arg max_x f(x)
such that:
g(x) ≥ 0
• A solution must satisfy the Karush–Kuhn–Tucker (KKT) conditions:
∂L(x, λ)/∂x_n = ∂f(x)/∂x_n + λ·∂g(x)/∂x_n = 0
g(x) ≥ 0
λ ≥ 0
λ·g(x) = 0
• To be optimized:
arg min_{w,b} (1/2)·||w||²
with the constraint:
t_n·(w·x_n + b) ≥ 1
• Lagrange function for the maximum margin classifier:
L(w, b, a) = (1/2)·||w||² − Σ_{n=1..N} a_n·(t_n·(w·x_n + b) − 1)
• KKT conditions:
t_n·(w·x_n + b) − 1 ≥ 0
a_n ≥ 0
a_n·(t_n·(w·x_n + b) − 1) = 0
Lecturer: Duc Dung Nguyen, PhD. Contact: [email protected] Machine Learning 19 / 33
Optimization using Lagrange multipliers
• Solution for w: setting ∂L(w, b, a)/∂w = 0 gives
w = Σ_{n=1..N} a_n·t_n·x_n
• Setting ∂L(w, b, a)/∂b = 0 gives
Σ_{n=1..N} a_n·t_n = 0
• Complementary slackness:
a_n·(t_n·(w·x_n + b) − 1) = 0
• Solution for b, averaging over the set S of support vectors:
b = (1/|S|) · Σ_{n∈S} ( t_n − Σ_{m∈S} a_m·t_m·x_m·x_n )
• Classification:
y(x) = w·x + b = Σ_{n=1..N} a_n·t_n·x_n·x + b
y(x) > 0 → class +1
y(x) < 0 → class −1
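To make these formulas concrete, here is a tiny hand-checkable case (the two training points are made up): for x_1 = (1, 0), t_1 = +1 and x_2 = (−1, 0), t_2 = −1, the condition Σ a_n·t_n = 0 gives a_1 = a_2, and the support-vector conditions yield a_n = 1/2, w = (1, 0), b = 0:

```python
# Toy problem: two support vectors, dual coefficients a_n = 1/2 each
X = [[1.0, 0.0], [-1.0, 0.0]]
T = [+1, -1]
A = [0.5, 0.5]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# w = sum_n a_n * t_n * x_n
w = [sum(a * t * x[j] for a, t, x in zip(A, T, X)) for j in range(2)]

# b from a support-vector condition t_n * (w.x_n + b) = 1, using n = 0:
b = 1.0 / T[0] - dot(w, X[0])

# Classification: y(x) = sum_n a_n * t_n * (x_n . x) + b
def y(x):
    return sum(a * t * dot(xn, x) for a, t, xn in zip(A, T, X)) + b

label = +1 if y([2.0, 3.0]) > 0 else -1
```

Note that the classifier never uses w explicitly: y(x) is evaluated through the inner products x_n·x, which is what makes the kernel substitution later possible.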
• Example 2:
• Original space: (u, v)
• New space: ((u² + v²)^{1/2}, arctan(v/u))
• The XOR problem is not linearly separable in the original input space:
In1  In2  t
0    0    0
0    1    1
1    0    1
1    1    0
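No line in (In1, In2) separates the XOR classes above, but a feature map can fix that. As a sketch, the map φ(u, v) = (u, v, u·v) is an illustrative choice (not from the slides), and a plain perceptron, used here only to certify linear separability, converges on the mapped data:

```python
# XOR with labels mapped to {-1, +1}
raw = [((0, 0), -1), ((0, 1), +1), ((1, 0), +1), ((1, 1), -1)]

# Illustrative feature map phi(u, v) = (u, v, u*v)
data = [((u, v, u * v), t) for (u, v), t in raw]

w, b = [0.0, 0.0, 0.0], 0.0
for _ in range(1000):  # enough epochs for convergence on separable data
    mistakes = 0
    for x, t in data:
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if t * score <= 0:  # misclassified (or on the boundary): update
            w = [wi + t * xi for wi, xi in zip(w, x)]
            b += t
            mistakes += 1
    if mistakes == 0:
        break

separated = all(t * (sum(wi * xi for wi, xi in zip(w, x)) + b) > 0 for x, t in data)
```

Since the perceptron converges only on linearly separable data, `separated` being true confirms that the mapped XOR points admit a separating hyperplane.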
• Computational complexity of φ(x_n)·φ(x) is high due to the high dimension of φ(·).
• Kernel trick:
φ(x_n)·φ(x_m) = K(x_n, x_m)
• Example, for u = (u_1, u_2, ..., u_d):
φ(u) = (1, √2·u_1, √2·u_2, ..., √2·u_d,
        √2·u_1·u_2, √2·u_1·u_3, ..., √2·u_{d−1}·u_d,
        u_1², u_2², ..., u_d²)
φ(u)·φ(v) = 1 + 2·Σ_{i=1..d} u_i·v_i + 2·Σ_{i=1..d−1} Σ_{j=i+1..d} u_i·v_i·u_j·v_j + Σ_{i=1..d} u_i²·v_i²
          = (1 + u·v)²
so φ(u)·φ(v) = K(u, v) can be evaluated in the original space, without ever computing φ.
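This identity can be checked numerically, say for d = 3 (the test vectors below are arbitrary):

```python
import math
import itertools

def phi(u):
    """Explicit feature map whose inner product equals (1 + u.v)^2."""
    s2 = math.sqrt(2.0)
    features = [1.0]
    features += [s2 * ui for ui in u]                        # sqrt(2)*u_i terms
    features += [s2 * u[i] * u[j]                            # sqrt(2)*u_i*u_j, i < j
                 for i, j in itertools.combinations(range(len(u)), 2)]
    features += [ui * ui for ui in u]                        # u_i^2 terms
    return features

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u = [0.5, -1.0, 2.0]
v = [1.5, 0.25, -0.5]

lhs = dot(phi(u), phi(v))          # inner product in the 10-dimensional feature space
rhs = (1.0 + dot(u, v)) ** 2       # kernel K(u, v) = (1 + u.v)^2 in the original space
```

The kernel side costs one d-dimensional inner product; the explicit side costs an inner product in O(d²) dimensions, which is the saving the slide refers to.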
• Is the data guaranteed to be linearly separable in the new space φ(x)?
• New constraints (with slack variables ξ_n):
t_n·(w·x_n + b) ≥ 1 − ξ_n
ξ_n ≥ 0
• To be minimized:
(1/2)·||w||² + C·Σ_{n=1..N} ξ_n
where C > 0 controls the trade-off between the margin and the slack-variable penalty.
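As a sketch (all numbers invented), the soft-margin objective can be evaluated for a candidate (w, b) by computing each slack as ξ_n = max(0, 1 − t_n·y(x_n)):

```python
# Toy data and a candidate separator; values chosen for illustration
data = [([2.0, 0.0], +1), ([0.5, 0.0], +1), ([-1.0, 0.0], -1)]
w, b = [1.0, 0.0], 0.0
C = 10.0

def y(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Slack xi_n = max(0, 1 - t_n * y(x_n)): zero for points outside the margin,
# positive for points inside the margin or misclassified
slacks = [max(0.0, 1.0 - t * y(x)) for x, t in data]

# Objective: (1/2)*||w||^2 + C * sum_n xi_n
objective = 0.5 * sum(wi * wi for wi in w) + C * sum(slacks)
```

Here only the point at (0.5, 0) violates the margin (ξ = 0.5); a larger C would penalize that violation more heavily, pushing the optimizer toward a boundary with smaller slacks at the cost of a smaller margin.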