
Machine Learning

Support Vector Machine

Lecturer: Duc Dung Nguyen, PhD.


Contact: [email protected]

Faculty of Computer Science and Engineering


Ho Chi Minh City University of Technology
Contents

1. Analytical Geometry

2. Maximum Margin Classifiers

3. Lagrange Multipliers

4. Non-linearly Separable Data

5. Soft-margin



Analytical Geometry
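
Editorial note: the slides in this section of the original deck are figure-only. The geometric fact they presumably illustrate, and which the margin formulas below rely on, is the distance from a point x_0 to the hyperplane w.x + b = 0:

\[
\operatorname{dist}\left(x_0,\ \{x : w \cdot x + b = 0\}\right) = \frac{|w \cdot x_0 + b|}{\lVert w \rVert},
\]

since w/||w|| is the unit normal vector of the hyperplane.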


Maximum Margin Classifiers
Maximum margin classifiers

• Assume that the data are linearly separable


• Decision boundary equation:
y(x) = w.x + b
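
A minimal sketch of this decision rule in code (an editorial addition, not from the slides; the weight vector and bias below are hypothetical): the sign of y(x) gives the predicted class.

import numpy as np

w = np.array([2.0, -1.0])   # hypothetical weight vector
b = 0.5                     # hypothetical bias

def y(x):
    # Decision boundary equation: y(x) = w.x + b
    return np.dot(w, x) + b

print(y(np.array([1.0, 1.0])))    #  1.5 -> predicted class +1
print(y(np.array([-1.0, 1.0])))   # -2.5 -> predicted class -1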



Maximum margin classifiers

• Margin: the smallest distance between the decision boundary and any of the samples.



Maximum margin classifiers

• Support vectors: samples at the two margins.



Maximum margin classifiers

• Scale w and b so that y(x_n) = +1 or −1 at the support vectors:





Maximum margin classifiers

• Signed distance between the decision boundary and a sample x_n:

    y(x_n) / ||w||

• Absolute distance between the decision boundary and a sample x_n:

    t_n.y(x_n) / ||w||

  where t_n = +1 iff y(x_n) > 0 and t_n = −1 iff y(x_n) < 0



Maximum margin classifiers

• Maximum margin:

    arg max_{w,b} { (1/||w||) . min_n [ t_n.(w.x_n + b) ] }

  with the constraint:

    t_n.(w.x_n + b) ≥ 1



Maximum margin classifiers

• To be optimized:

    arg min_{w,b} (1/2).||w||^2

  with the constraint:

    t_n.(w.x_n + b) ≥ 1
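
An implicit step between the previous slide and this one: rescaling (w, b) by a positive constant does not change the decision boundary, so the scale can be fixed by requiring min_n t_n.(w.x_n + b) = 1 (the support-vector scaling above). A sketch of the resulting equivalence:

\[
\arg\max_{w,b}\ \frac{1}{\lVert w\rVert}\min_n t_n (w\cdot x_n + b)
\;=\; \arg\max_{w,b}\ \frac{1}{\lVert w\rVert}
\;=\; \arg\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2}
\quad \text{subject to } t_n (w\cdot x_n + b) \ge 1 .
\]

The factor 1/2 and the square do not change the minimizer; they simply turn the problem into a convenient quadratic program.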



Lagrange Multipliers
Optimization using Lagrange multipliers

Joseph-Louis Lagrange (born 25 January 1736; died in Paris, 10 April 1813), also reported as Giuseppe Luigi Lagrange, was an Italian Enlightenment Era mathematician and astronomer. He made significant contributions to the fields of analysis, number theory, and both classical and celestial mechanics.



Optimization using Lagrange multipliers

• Problem:

    arg max_x f(x)

  with the constraint:

    g(x) = 0



Optimization using Lagrange multipliers

• Solution is the stationary point of the Lagrange function:

    L(x, λ) = f(x) + λ.g(x)

  such that:

    ∂L(x, λ)/∂x_n = ∂f(x)/∂x_n + λ.∂g(x)/∂x_n = 0

  and

    ∂L(x, λ)/∂λ = g(x) = 0



Optimization using Lagrange multipliers

• Example:

    f(x) = 1 − u^2 − v^2

  with the constraint:

    g(x) = u + v − 1 = 0



Optimization using Lagrange multipliers

• Lagrange function:

    L(x, λ) = f(x) + λ.g(x) = (1 − u^2 − v^2) + λ.(u + v − 1)

    ∂L(x, λ)/∂u = ∂f(x)/∂u + λ.∂g(x)/∂u = −2u + λ = 0
    ∂L(x, λ)/∂v = ∂f(x)/∂v + λ.∂g(x)/∂v = −2v + λ = 0
    ∂L(x, λ)/∂λ = g(x) = u + v − 1 = 0

• Solution: u = 1/2 and v = 1/2
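
A quick symbolic check of this example (an editorial sketch using SymPy, not part of the original slides):

import sympy as sp

u, v, lam = sp.symbols('u v lambda', real=True)
L = (1 - u**2 - v**2) + lam * (u + v - 1)   # Lagrange function of the example

# Stationary point: all partial derivatives of L vanish.
stationary = sp.solve([sp.diff(L, u), sp.diff(L, v), sp.diff(L, lam)], [u, v, lam])
print(stationary)   # expect u = 1/2, v = 1/2 (and lambda = 1)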





Optimization using Lagrange multipliers

• Problem:

    arg max_x f(x)

  with the inequality constraint:

    g(x) ≥ 0



Optimization using Lagrange multipliers

• Solution is the stationary point of the Lagrange function:

    L(x, λ) = f(x) + λ.g(x)

  such that:

    ∂L(x, λ)/∂x_n = ∂f(x)/∂x_n + λ.∂g(x)/∂x_n = 0

  and

    g(x) ≥ 0
    λ ≥ 0
    λ.g(x) = 0
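
These are the Karush–Kuhn–Tucker (KKT) conditions. A short sketch of why the last condition, λ.g(x) = 0, appears: at the solution the constraint is either inactive or active,

\[
\text{either } g(x) > 0 \text{ and } \lambda = 0 \ \ (\text{constraint inactive, } L \text{ reduces to } f),
\quad \text{or } g(x) = 0 \text{ and } \lambda \ge 0 \ \ (\text{solution on the constraint boundary}),
\]

and in both cases the product λ.g(x) vanishes.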



Optimization using Lagrange multipliers

• To be optimized:

    arg min_{w,b} (1/2).||w||^2

  with the constraint:

    t_n.(w.x_n + b) ≥ 1

• Lagrange function for maximum margin classifier:

    L(w, b, a) = (1/2).||w||^2 − Σ_{n=1..N} a_n.(t_n.(w.x_n + b) − 1)

  with the conditions:

    t_n.(w.x_n + b) − 1 ≥ 0
    a_n ≥ 0
    a_n.(t_n.(w.x_n + b) − 1) = 0
Optimization using Lagrange multipliers

• Lagrange function for maximum margin classifier:

    L(w, b, a) = (1/2).||w||^2 − Σ_{n=1..N} a_n.(t_n.(w.x_n + b) − 1)

• Solution for w:

    ∂L(w, b, a)/∂w = 0   ⇒   w = Σ_{n=1..N} a_n.t_n.x_n

    ∂L(w, b, a)/∂b = Σ_{n=1..N} a_n.t_n = 0



Optimization using Lagrange multipliers

• Lagrange function for maximum margin classifier:

    L(w, b, a) = (1/2).||w||^2 − Σ_{n=1..N} a_n.(t_n.(w.x_n + b) − 1)

• Solution for a: dual representation to be optimized

    L*(a) = Σ_{n=1..N} a_n − (1/2) Σ_{n=1..N} Σ_{m=1..N} a_n.a_m.t_n.t_m.(x_n.x_m)

  with the constraints:

    a_n ≥ 0
    Σ_{n=1..N} a_n.t_n = 0
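
A sketch of where this dual comes from: expand L(w, b, a) and substitute the stationarity conditions from the previous slide.

\[
L(w,b,a) = \tfrac{1}{2}\lVert w\rVert^{2} - \sum_n a_n t_n (w\cdot x_n) - b\sum_n a_n t_n + \sum_n a_n .
\]

With w = Σ_{n} a_n.t_n.x_n one has ||w||^2 = Σ_n Σ_m a_n.a_m.t_n.t_m.(x_n.x_m) = Σ_n a_n.t_n.(w.x_n), and Σ_n a_n.t_n = 0 removes the term involving b, leaving exactly L*(a) above.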



Optimization using Lagrange multipliers

• Lagrange function for maximum margin classifier:

    L(w, b, a) = (1/2).||w||^2 − Σ_{n=1..N} a_n.(t_n.(w.x_n + b) − 1)

• Solution for a: dual representation to be optimized

    L*(a) = Σ_{n=1..N} a_n − (1/2) Σ_{n=1..N} Σ_{m=1..N} a_n.a_m.t_n.t_m.(x_n.x_m)

Why optimization via dual representation?


• Sparsity: a_n = 0 if x_n is not a support vector.
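
A small illustration of this sparsity (an editorial sketch using scikit-learn, not part of the slides): after fitting, only the support vectors carry non-zero multipliers.

import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (hypothetical example).
X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]], dtype=float)
t = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6)   # a very large C approximates the hard margin
clf.fit(X, t)

print(clf.support_vectors_)   # only a few of the six training points
print(clf.dual_coef_)         # a_n.t_n for the support vectors; all other a_n are zero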



Optimization using Lagrange multipliers

• Lagrange function for maximum margin classifier:

    L(w, b, a) = (1/2).||w||^2 − Σ_{n=1..N} a_n.(t_n.(w.x_n + b) − 1)

    a_n.(t_n.(w.x_n + b) − 1) = 0

• Solution for b:

    b = (1/|S|) Σ_{n∈S} ( t_n − Σ_{m∈S} a_m.t_m.(x_m.x_n) )

  where S is the set of support vectors (a_n ≠ 0)
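
A sketch of where this expression for b comes from, using the condition a_n.(t_n.(w.x_n + b) − 1) = 0 above: for any support vector x_n we have t_n.(w.x_n + b) = 1; multiplying by t_n (with t_n^2 = 1) and substituting w = Σ_{m∈S} a_m.t_m.x_m gives

\[
b = t_n - \sum_{m\in S} a_m t_m (x_m\cdot x_n),
\]

and averaging this over all support vectors (numerically more stable than using a single one) gives the formula above.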



Optimization using Lagrange multipliers

• Classification:

    y(x) = w.x + b = Σ_{n=1..N} a_n.t_n.(x_n.x) + b

    y(x) > 0  →  +1
    y(x) < 0  →  −1
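
A minimal sketch of this dual-form classification rule in code (an editorial addition; the support vectors, targets, multipliers and bias below are hypothetical and would normally come from solving the dual problem). Only points with a_n ≠ 0 contribute.

import numpy as np

X_sv = np.array([[1.0, 1.0], [2.0, 2.0]])   # hypothetical support vectors
t_sv = np.array([-1.0, 1.0])                # their targets
a_sv = np.array([1.0, 1.0])                 # their Lagrange multipliers
b = -3.0                                    # hypothetical bias

def classify(x):
    y = np.sum(a_sv * t_sv * (X_sv @ x)) + b   # y(x) = sum_n a_n.t_n.(x_n.x) + b
    return +1 if y > 0 else -1

print(classify(np.array([3.0, 3.0])))   # +1
print(classify(np.array([0.0, 0.0])))   # -1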



Non-linearly Separable Data
Kernel trick for non-linearly separable data

• Mapping the data points into a high dimensional feature space.


• Example 1:
• Original space: (x)
• New space: (x, x^2)



Kernel trick for non-linearly separable data

• Example 2:
• Original space: (u, v)
• New space: ((u^2 + v^2)^{1/2}, arctan(v/u))



Kernel trick for non-linearly separable data

Example 3: XOR function

In1 In2 t
0 0 0
0 1 1
1 0 1
1 1 0



Kernel trick for non-linearly separable data

Example 3: XOR function

In1 In2 In3 Output


0 0 1 1
0 1 0 0
1 0 0 0
1 1 0 1





Kernel trick for non-linearly separable data

• Classification in the new space:

    y(x) = w.φ(x) + b = Σ_{n=1..N} a_n.t_n.(φ(x_n).φ(x)) + b

• Computational complexity of φ(x_n).φ(x) is high due to the high dimension of φ(.).

• Kernel trick:

    φ(x_n).φ(x_m) = K(x_n, x_m)
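
With this substitution, neither the dual problem nor prediction needs φ explicitly. As a note following directly from the classification rule above, the decision function becomes

\[
y(x) = \sum_{n=1..N} a_n t_n\, K(x_n, x) + b .
\]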



Kernel trick for non-linearly separable data

• A typical kernel function:

    K(u, v) = (1 + u.v)^2

    φ((u_1, u_2, ..., u_d)) = (1, √2.u_1, √2.u_2, ..., √2.u_d,
                               √2.u_1.u_2, √2.u_1.u_3, ..., √2.u_{d−1}.u_d,
                               u_1^2, u_2^2, ..., u_d^2)

    φ(u).φ(v) = 1 + 2 Σ_{i=1..d} u_i.v_i + 2 Σ_{i=1..d−1} Σ_{j=i+1..d} u_i.v_i.u_j.v_j + Σ_{i=1..d} u_i^2.v_i^2

    φ(u).φ(v) = K(u, v)     (checked numerically in the sketch after this list)

• Is φ(x) guaranteed to be separable?
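
A small numeric check (an editorial sketch, not part of the slides) that the explicit feature map above reproduces K(u, v) = (1 + u.v)^2:

import numpy as np

def phi(x):
    # Explicit degree-2 polynomial feature map for a d-dimensional input x.
    d = len(x)
    cross = [np.sqrt(2) * x[i] * x[j] for i in range(d) for j in range(i + 1, d)]
    return np.concatenate(([1.0], np.sqrt(2) * x, cross, x ** 2))

rng = np.random.default_rng(0)
u, v = rng.normal(size=3), rng.normal(size=3)

print(phi(u) @ phi(v))       # dot product in the feature space
print((1.0 + u @ v) ** 2)    # kernel evaluated in the original space -> same value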



Soft-margin
Soft margin SVM

• Soft-margin SVM: allows some of the training samples to be misclassified.


• Slack variable: ξ





Soft margin SVM

• New constraints:

    t_n.(w.x_n + b) ≥ 1 − ξ_n
    ξ_n ≥ 0

• To be minimized:

    (1/2).||w||^2 + C Σ_{n=1..N} ξ_n

  where C > 0 controls the trade-off between the margin and the slack variable penalty
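
A brief illustration (an editorial sketch using scikit-learn; the data are hypothetical) of how C controls this trade-off: a small C tolerates more margin violations, while a large C approaches the hard-margin behaviour.

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two slightly overlapping clusters.
X, t = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, t)
    # A smaller C typically allows more slack, hence more support vectors.
    print(C, len(clf.support_))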



Summary

• SVM is a sparse kernel method.


• Soft-margin SVM deals with data that remain non-linearly separable even after the kernel mapping.

