0% found this document useful (0 votes)

22 views

Math Behind SVM Part 1 (Support Vector Machine) - by MLMath - Io - Medium

Uploaded by

Amina Bashir

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

22 views

Math Behind SVM Part 1 (Support Vector Machine) - by MLMath - Io - Medium

Uploaded by

Amina Bashir

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

Open in app

Math behind SVM (Support Vector Machine)

MLMath.io · Follow
9 min read · Feb 10, 2019

Listen Share More

SVM is one of the most popular, versatile supervised machine learning algorithm.
It is used for both classification and regression task.But in this thread we will talk
about classification task. It is usually preferred for medium and small sized data-
set.

The main objective of SVM is to find the optimal hyperplane which linearly
separates the data points in two component by maximizing the margin .

dotted line is hyperplane, separating blue and pink classes balls.

Don’t panic with the word ‘hyperplane’ and ‘margin’ that you have seen above. It is
explained in details below.
1.1K 12
Points to be covered:
1. Basic linear algebra

2. Hyper-plane

3. What if data points is not linearly separable ?

4. Optimal Hyperplane

5. How to choose optimal Hyperplane ?

6. Mathematical Interpretation of Optimal Hyperplane

7. Basic optimization term

8.SVM Optimization
Basic Linear Algebra

Vectors
Vectors are mathematical quantity which has both magnitude and direction. A
point in the 2D plane can be represented as a vector between origin and the
point.

Length of Vectors
Length of vectors are also called as norms. It tells how far vectors are from the
origin.
Direction of vectors
Direction of vector .

Dot Product
Dot product between two vectors is a scalar quantity . It tells how to vectors are
related.

Hyper-plane
It is plane that linearly divide the n-dimensional data points in two component. In
case of 2D, hyperplane is line, in case of 3D it is plane.It is also called as n-
dimensional line. Fig.3 shows, a blue line(hyperplane) linearly separates the data
point in two components.
In the Fig.3, hyperplane is line divides data point into two classes(red & green),
written as

What if data points is not linearly separable ?

Look Fig.4 how can we separate the data-points linearly? This type of situation
comes very often in machine learning world as raw data are always non-linear
here.So, Is it do-able? yes!!. we will add one extra dimension to the data points to
make it separable.
FIg.5

so , the above process of making non-linearly separable data point to linearly

separable data point is also known as Kernel Trick, which we will cover later in
details.

Optimal Hyperplane

Fig.6
If you look above Fig there are numbers of hyperplane that can separate the data
points in two components. So optimal hyperplane is one which divides the data points
very well. So question is why it is needed to choose optimal hyper plane?

So if you choose sub-optimal hyperplane, no doubt after number of training iteration ,

training error will decrease but during testing when an unseen instance will come, it will
result in high test error. In that case it is must to choose an optimal plane to get good
accuracy.

How to choose Optimal Hyperplane ?

Fig.7

Margin and Support Vectors

Let’s assume that solid black line in above Fig.7 is optimal hyperplane and two
dotted line is some hyperplane, which is passing through nearest data points to
the optimal hyperplane. Then distance between hyperplane and optimal
hyperplane is know as margin, and the closest data-points are known as support
vectors. Margin is an area which do not contains any data points. There will be
some cases when we have data points in margin area but right now we stick to
margin as no data points lands.

So, when we are choosing optimal hyperplane we will choose one among set of
hyperplane which is highest distance from the closest data points. If optimal
hyperplane is very close to data points then margin will be very small and it will
generalize well for training data but when an unseen data will come it will fail to
generalize well as explained above. So our goal is to maximize the margin so that
our classifier is able to generalize well for unseen instances.

So, In SVM our goal is to choose an optimal hyperplane which maximizes the margin.

———————

Since covering entire concept about SVM in one story will be very confusing. So i
will be dividing the tutorial into three parts.

1. Linear separable data points.

2. Linear separable data points II.

3. Non-linear separable data-points.

So, In this story we will assume that data points(training data) are linearly
separable.Lets start,

Mathematical Interpretation of Optimal Hyperplane

we have l training examples where each example x are of D dimension and each
have labels of either y=+1 or y= -1 class, and our examples are linearly separable.
Then, our training data is form ,
We consider D=2 to keep explanation simple and data points are linearly
separable, The hyperplane w.x+b=0 can be described as :

Fig.8

Support vectors examples are closest to optimal hyperplane and the aim of the
SVM is to orientate this hyperplane as far as possible from the closest member of
the both classes.

From the above Fig , SVM problem can be formulated as,

From the Fig.8 we have two hyperplane H1 and H2 passing through the support
vectors of +1 and -1 class respectively. so

w.x+b=-1 :H1

w.x+b=1 :H2
And distance between H1 hyperplane and origin is (-1-b)/|w| and distance
between H2 hyperplane and origin is (1–b)/|w|. So, margin can be given as

M=(1-b)/|w|-(-1-b)/|w|

M=2/|w|

Where M is nothing but twice of the margin. So margin can be written as 1/|w|.
As, optimal hyperplane maximize the margin, then the SVM objective is boiled
down to fact of maximizing the term 1/|w|,

Basic optimization algorithm terms

Unconstrained optimization
Example will be more intuitive in explaining the concept,

So, it is same that we used to do in higher school in calculus for finding maxima
and minima of a function. Only difference is at that time we were calculating for
univariate variable, but now we are calculating for multivariate variables.

Constrained optimization
Again it will become clear with an example,
So basically we calculate the maxima and minima of a function by taking into the
consideration of given constraints on the variables.

Primal and Dual Concept

Shocked!! what the heck is this now???

Don’t worry, it ‘s theory only !!

Lets not get deep into these concept. Any optimization problem can be
formulated into two way, primal and dual problem. First we use primal
formulation for optimization algorithm, but if it does not yield any solution, we go
for dual optimization formulation, which is guaranteed to yield solution.

Optimization
This part will be more mathematical, some terms are very high level concept of
mathematics, but don’t worry i will try to explain each one by one in layman term.

To make you comfortable, Learning algorithms of SVM are explained with pseudo
code explain below. This is very abstract concept in SVM optimization. For below
code assume x is data point and y is its corresponding labels.
Just above you see learning process of SVM. But here is a catch, how do we update w and b ??

Gradient descent of course!!! — -BIG ‘NO’.

why not Gradient descent??

SVM optimization problem is a case of constrained optimization problem, and it

is always preferred to use dual optimization algorithm to solve such constrained
optimization problem. That’s why we don’t use gradient descent.

Since it is constrained optimization problem Lagrange multipliers are used to

solve it, which is described below, It looks like , will be more mathematical but it
is not, its just few steps of finding gradient. We will divide the complete
formulation into three parts.

1. In first we will formulate SVM optimization problem Mathematically

2. we will find gradient with respect to learning parameters.

3. we will find the value of parameters which minimizes ||w||

PART I — Problem formulation

The above equation is Primal optimization problem.Lagrange method is required
to convert constrained optimization problem into unconstrained optimization
problem. The goal of above equation to get the optimal value for w and b.

PART II — — — finding the gradient with respect to w ,b and lambda.

May be above equation is looking tricky? but it is not, its just high school math of
finding minima with respect to variable.

PART III — — — — we will get the value of w

As,from the above formulation we only able to find the optimal value of w and
that is to dependent on λ, so we need to find the optimal value of λ also. And
finding optimal value of b needs both w and λ. So finding the value of λ will be
the important for us.
so how do we find the value of λ???

Above formulation is itself a optimization algorithm, but it not helpful to find the
optimal value. It is primal optimization problem. As we read above that if Primal
optimization doesn’t result in solution, we should use dual optimization
formulation, which has guaranteed solution. Also when we move from primal to
dual formulation we switch minimizing to maximizing the loss function. Again
we will divide the complete formulation into three parts to easier to understand.

1. Problem formulation and substitution value from primal

2. Simplify the loss function equation after substitution

3. final optimization formulation to get the value of λ

PART I — — — — Formulation from primal and Substitution

Above equation is a dual optimization problem. The equation are looking scarier
because of substitution of value of w.

PART II — — — — — — — — — Simplification
It is simplified equation of above dual optimization problem.

PART III — — — — — -Final optimization

So this is the final optimization problem, to find the maximum value of λ. Here
one more term K is there, which is nothing but dot product of input variable x.
(This K will be very important in future when we will learn about kernel trick and
non-linear data points).

Now How do we Solve the above problem??

Above maximization operation can be solved with the SMO ( sequential

minimization optimization) algorithms . There are also the various library
support online for this optimization. Once we get the value of λ we can get w
from below equation

and using value of w , λ we will calculate b as following,

As we now have the value for both w and b , then optimal hyperplane that can
separates the data points can be written as,

w.x + b = 0

And a new example x_ can be classified as sign(w.x_

+b)
Thanks for reading, this is the PART1 of SVM, in PART 2 we will discuss about how
to deal with a case when data is not fully linearly separable.

Machine Learning Svm Deep Learning Optimization

The Hundred-Page Machine Learning Book - Andriy Burkov
No ratings yet
The Hundred-Page Machine Learning Book - Andriy Burkov
16 pages
Support Vector Machine
100% (2)
Support Vector Machine
11 pages
Unit 02 - Nonlinear Classification, Linear Regression, Collaborative Filtering - MD
No ratings yet
Unit 02 - Nonlinear Classification, Linear Regression, Collaborative Filtering - MD
14 pages
Lab 5
No ratings yet
Lab 5
9 pages
Math Behind SVM Part 2 (Support Vector Machine) - by MLMath - Io - Medium
No ratings yet
Math Behind SVM Part 2 (Support Vector Machine) - by MLMath - Io - Medium
6 pages
Math Behind SVM (Kernel Trick) - This Is PART III of SVM Series - by MLMath - Io - Medium
No ratings yet
Math Behind SVM (Kernel Trick) - This Is PART III of SVM Series - by MLMath - Io - Medium
6 pages
ML Questions 2021
100% (1)
ML Questions 2021
26 pages
SVM Part A
No ratings yet
SVM Part A
16 pages
Support Vector Machine
No ratings yet
Support Vector Machine
32 pages
EXP-14
No ratings yet
EXP-14
27 pages
WWW Analyticsvidhya Com Blog 2021 10 Support Vector Machines
No ratings yet
WWW Analyticsvidhya Com Blog 2021 10 Support Vector Machines
21 pages
Unit3-SVM
No ratings yet
Unit3-SVM
20 pages
SVM Tutorial Part1
No ratings yet
SVM Tutorial Part1
9 pages
Support Vector Machine: Suraj Kumar Das
No ratings yet
Support Vector Machine: Suraj Kumar Das
10 pages
Pca PDF
No ratings yet
Pca PDF
10 pages
SVM VS SVC
No ratings yet
SVM VS SVC
27 pages
Understaing Support Vector Machine Example Code
No ratings yet
Understaing Support Vector Machine Example Code
11 pages
SVM.pptx
No ratings yet
SVM.pptx
67 pages
Unit 2
No ratings yet
Unit 2
47 pages
Important Questions
No ratings yet
Important Questions
18 pages
What Is Support Vector Machine
No ratings yet
What Is Support Vector Machine
13 pages
Support Vector Machine
No ratings yet
Support Vector Machine
52 pages
Math Behind SVM (Kernel Trick) - This Is PART III of SVM Series - by MLMath - Io - Medium
No ratings yet
Math Behind SVM (Kernel Trick) - This Is PART III of SVM Series - by MLMath - Io - Medium
6 pages
Support Vector Machine - Explanation
No ratings yet
Support Vector Machine - Explanation
12 pages
Anuranan Das Summer of Sciences, 2019. Understanding and Implementing Machine Learning
No ratings yet
Anuranan Das Summer of Sciences, 2019. Understanding and Implementing Machine Learning
17 pages
Support Vector Machine
No ratings yet
Support Vector Machine
40 pages
Understanding Support Vector Machine Algorithm From Examples
No ratings yet
Understanding Support Vector Machine Algorithm From Examples
10 pages
SVM notes unit 4.docx
No ratings yet
SVM notes unit 4.docx
8 pages
SVM&Decision Tree
No ratings yet
SVM&Decision Tree
10 pages
Gradient Descent
No ratings yet
Gradient Descent
9 pages
Understanding Support Vector Machine Algorithm From Examples Along With Code
No ratings yet
Understanding Support Vector Machine Algorithm From Examples Along With Code
11 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
4 pages
Lecture Notes - SVM
No ratings yet
Lecture Notes - SVM
13 pages
DL UNIT 2
No ratings yet
DL UNIT 2
46 pages
SVM
No ratings yet
SVM
5 pages
Machine Learning SVM - Supervised
No ratings yet
Machine Learning SVM - Supervised
32 pages
IML-IITKGP - Assignment 5 Solution
No ratings yet
IML-IITKGP - Assignment 5 Solution
7 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
28 pages
Support Vector Machine
No ratings yet
Support Vector Machine
9 pages
PA Notes 2
No ratings yet
PA Notes 2
23 pages
Syllabus of Machine Learning
No ratings yet
Syllabus of Machine Learning
19 pages
Unit2 notes What is a Support Vector Machine
No ratings yet
Unit2 notes What is a Support Vector Machine
11 pages
Support Vector Machine Algorithm
No ratings yet
Support Vector Machine Algorithm
8 pages
SVM1
No ratings yet
SVM1
4 pages
SVM - Feb 15
No ratings yet
SVM - Feb 15
34 pages
Support Vector Machine
No ratings yet
Support Vector Machine
19 pages
Support Vactor Machine Final
No ratings yet
Support Vactor Machine Final
11 pages
Support Vector Machine
No ratings yet
Support Vector Machine
12 pages
Advanced Regression Assignment
No ratings yet
Advanced Regression Assignment
5 pages
SVMs
No ratings yet
SVMs
30 pages
ML Answerbank
No ratings yet
ML Answerbank
14 pages
Ann Unit III
No ratings yet
Ann Unit III
20 pages
4 - Lec 11 - ML - Support Vector Machine
No ratings yet
4 - Lec 11 - ML - Support Vector Machine
6 pages
Q No. 1 1.1machine Learning:: Machine Learning Is The Study of Computer Algorithms That Improve Automatically
No ratings yet
Q No. 1 1.1machine Learning:: Machine Learning Is The Study of Computer Algorithms That Improve Automatically
10 pages
UNIT2
No ratings yet
UNIT2
25 pages
Unit 2
No ratings yet
Unit 2
16 pages
SVM Tutorial
No ratings yet
SVM Tutorial
28 pages
Statistics Project
No ratings yet
Statistics Project
5 pages
Lab5 AI
No ratings yet
Lab5 AI
7 pages
Top Numerical Methods With Matlab For Beginners!
From Everand
Top Numerical Methods With Matlab For Beginners!
Andrei Besedin
No ratings yet
Project Final Doc(17)
No ratings yet
Project Final Doc(17)
78 pages
Artificial Intelligence in Tomato Leaf Disease Detection: A Comprehensive Review and Discussion
No ratings yet
Artificial Intelligence in Tomato Leaf Disease Detection: A Comprehensive Review and Discussion
20 pages
Machine
100% (1)
Machine
45 pages
Malaria Detection Using Image Processing and Machine Learning
No ratings yet
Malaria Detection Using Image Processing and Machine Learning
11 pages
AI From Basics To Advanced Levels
No ratings yet
AI From Basics To Advanced Levels
3 pages
2012_NOVATCHKOV_Machine learning methods for the automatic evaluation of exercises on sensor equipped weight training machines
No ratings yet
2012_NOVATCHKOV_Machine learning methods for the automatic evaluation of exercises on sensor equipped weight training machines
6 pages
Thesis On Gene Expression Analysis
No ratings yet
Thesis On Gene Expression Analysis
125 pages
energies-16-01480
No ratings yet
energies-16-01480
33 pages
SS ZC416 Revised Course Handout
No ratings yet
SS ZC416 Revised Course Handout
6 pages
AI Fundamentals Finals
No ratings yet
AI Fundamentals Finals
6 pages
2024 Paper Answers
No ratings yet
2024 Paper Answers
24 pages
Support Vector Machine (SVM)
No ratings yet
Support Vector Machine (SVM)
26 pages
Controller-Free Exploration of
No ratings yet
Controller-Free Exploration of
3 pages
Download Full (Ebook) Data Science and Analytics with Python by Jesus Rogel-Salazar ISBN 9781498742092, 1498742092 PDF All Chapters
100% (8)
Download Full (Ebook) Data Science and Analytics with Python by Jesus Rogel-Salazar ISBN 9781498742092, 1498742092 PDF All Chapters
65 pages
Project Plagiarism Report (Before Removing)
No ratings yet
Project Plagiarism Report (Before Removing)
21 pages
Data Mining and Analysis Fundamental Concepts and Algorithms Mohammed J. Zaki 2024 scribd download
No ratings yet
Data Mining and Analysis Fundamental Concepts and Algorithms Mohammed J. Zaki 2024 scribd download
55 pages
Crop Recommendation System Using Machine Learning
No ratings yet
Crop Recommendation System Using Machine Learning
7 pages
BDE Final Report
No ratings yet
BDE Final Report
53 pages
REVIEW. Artificial Intelligence and Machine Learning-Based Monitoring and Design of Biological Wastewater Treatment Systems
No ratings yet
REVIEW. Artificial Intelligence and Machine Learning-Based Monitoring and Design of Biological Wastewater Treatment Systems
13 pages
Diagnostics 13 03313 v2
No ratings yet
Diagnostics 13 03313 v2
25 pages
A New Local Pressure Loss Coefficient Model of A Duct Tee Junction Applied During Transient Simulation of A HVAC Air-Side System
No ratings yet
A New Local Pressure Loss Coefficient Model of A Duct Tee Junction Applied During Transient Simulation of A HVAC Air-Side System
17 pages
Detecting Clinical Signs of Anaemia Using Machine Learning Report
No ratings yet
Detecting Clinical Signs of Anaemia Using Machine Learning Report
72 pages
Khayyam Offline Persian Handwriting Dataset
No ratings yet
Khayyam Offline Persian Handwriting Dataset
15 pages
Machine Learning in The Era of Big Data 1
No ratings yet
Machine Learning in The Era of Big Data 1
14 pages
HW 3
No ratings yet
HW 3
5 pages
Machine Learning Techniques For Stock Price Predic
No ratings yet
Machine Learning Techniques For Stock Price Predic
10 pages
Multi-Task Pre-Training of Deep Neural Networks For Digital Pathology
No ratings yet
Multi-Task Pre-Training of Deep Neural Networks For Digital Pathology
10 pages
Wa0007
No ratings yet
Wa0007
6 pages
Rainfall prediction using ML
No ratings yet
Rainfall prediction using ML
5 pages

Math Behind SVM Part 1 (Support Vector Machine) - by MLMath - Io - Medium

Uploaded by

Math Behind SVM Part 1 (Support Vector Machine) - by MLMath - Io - Medium

Uploaded by

Open in app

Math behind SVM (Support Vector Machine)

Listen Share More

dotted line is hyperplane, separating blue and pink classes balls.

3. What if data points is not linearly separable ?

5. How to choose optimal Hyperplane ?

6. Mathematical Interpretation of Optimal Hyperplane

7. Basic optimization term

What if data points is not linearly separable ?

so , the above process of making non-linearly separable data point to linearly

So if you choose sub-optimal hyperplane, no doubt after number of training iteration ,

How to choose Optimal Hyperplane ?

Margin and Support Vectors

1. Linear separable data points.

2. Linear separable data points II.

3. Non-linear separable data-points.

Mathematical Interpretation of Optimal Hyperplane

From the above Fig , SVM problem can be formulated as,

Basic optimization algorithm terms

Primal and Dual Concept

Don’t worry, it ‘s theory only !!

Gradient descent of course!!! — -BIG ‘NO’.

why not Gradient descent??

SVM optimization problem is a case of constrained optimization problem, and it

Since it is constrained optimization problem Lagrange multipliers are used to

1. In first we will formulate SVM optimization problem Mathematically

2. we will find gradient with respect to learning parameters.

3. we will find the value of parameters which minimizes ||w||

PART I — Problem formulation

PART II — — — finding the gradient with respect to w ,b and lambda.

PART III — — — — we will get the value of w

1. Problem formulation and substitution value from primal

2. Simplify the loss function equation after substitution

3. final optimization formulation to get the value of λ

PART I — — — — Formulation from primal and Substitution

PART III — — — — — -Final optimization

Now How do we Solve the above problem??

Above maximization operation can be solved with the SMO ( sequential

and using value of w , λ we will calculate b as following,

And a new example x_ can be classified as sign(w.x_

Machine Learning Svm Deep Learning Optimization

You might also like