
Unit-3 SUPPORT VECTOR MACHINES

Dr. John Babu

Support Vector Machines


Introduction
Support Vector Machines (SVM) are powerful supervised learning algorithms used for classification
and regression tasks. They are particularly well-suited for tasks involving complex datasets where
traditional linear classifiers may fail to deliver satisfactory results. SVMs have become one of the
most popular methods in machine learning due to their effectiveness and robustness in various
applications.

History and Context


The SVM algorithm was developed by Vladimir Vapnik and his colleagues in the early 1990s. The
roots of SVM can be traced back to statistical learning theory, to which Vapnik contributed significantly. The original version of the SVM was designed to solve binary classification problems
by finding the optimal hyperplane that separates two classes of data points in a high-dimensional
space.
The introduction of the kernel trick in the late 1990s expanded the applicability of SVMs by
allowing them to handle non-linear classification problems. By using kernel functions, SVMs can
effectively map the input data into higher dimensions, enabling the algorithm to find a hyperplane
that can separate classes that are not linearly separable in the original input space.

Principle of Support Vector Machines


The main idea behind SVM is to find the optimal hyperplane that maximizes the margin between
two classes. Here are the key concepts:
Hyperplane: In an n-dimensional space, a hyperplane is a flat affine subspace of dimension
n-1. For example, in a 2D space, a hyperplane is a line, and in 3D, it is a plane.
Margin: The margin is defined as the distance between the hyperplane and the closest data
points from either class. The goal of SVM is to maximize this margin, ensuring that the hyperplane
is as far away from the nearest points of both classes as possible.
Support Vectors: These are the data points that lie closest to the hyperplane and are critical
for defining the position and orientation of the hyperplane. Only the support vectors influence
the location of the hyperplane, while the other data points do not affect it.
The SVM algorithm can be summarized in the following steps:
1. Transform the Data: If the data is not linearly separable, kernel functions can be used
to transform the data into a higher-dimensional space.
2. Find the Optimal Hyperplane: The SVM algorithm finds the hyperplane that maximizes the margin by solving a quadratic optimization problem.
3. Classification: Once the hyperplane is determined, new data points can be classified
based on which side of the hyperplane they fall on.
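As a concrete illustration of these three steps, here is a minimal sketch using scikit-learn (an assumed library choice; the dataset and parameters are illustrative). The kernel argument handles the implicit transformation, fit solves the quadratic optimization, and predict assigns new points to a side of the hyperplane.

# Minimal SVM workflow sketch (scikit-learn assumed; toy data stands in for any labelled dataset).
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: choose a kernel; the data transformation is done implicitly.
clf = SVC(kernel="rbf", C=1.0)

# Step 2: fitting solves the quadratic optimization for the optimal hyperplane.
clf.fit(X_train, y_train)

# Step 3: classify new points by which side of the hyperplane they fall on.
print(clf.predict(X_test[:5]))
print("accuracy:", clf.score(X_test, y_test))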

Applications of Support Vector Machines
SVMs have a wide range of applications across various fields due to their robustness and accuracy.
Some notable applications include:
- Image Classification: SVMs are widely used for classifying images in computer vision tasks,
such as face detection and object recognition.
- Text Classification: SVMs are effective in categorizing text documents, such as spam detection
in emails or sentiment analysis.
- Bioinformatics: SVMs are applied in gene expression classification and protein structure
prediction, helping in medical diagnosis and treatment planning.
- Finance: In finance, SVMs are used for credit scoring, risk assessment, and fraud detection
by analyzing historical transaction data.
- Handwriting Recognition: SVMs have been employed in recognizing handwritten characters
and digits, enhancing the accuracy of Optical Character Recognition (OCR) systems.
- Speech Recognition: SVMs are also used in speech recognition systems to classify spoken
words or phrases.
Support Vector Machines are a versatile and powerful tool in the machine learning toolkit,
known for their ability to handle both linear and non-linear classification problems effectively.
Their theoretical foundation and practical applications make them a key algorithm for researchers
and practitioners in various domains.

Optimal Separation
In the context of Support Vector Machines (SVM), optimal separation refers to the process of finding
the best hyperplane that divides two classes of data in such a way that maximizes the margin
between them. The margin is defined as the distance between the hyperplane and the nearest
data points from each class, which are known as support vectors.
The goal of optimal separation is to create a decision boundary that not only separates the
two classes but does so in a manner that minimizes the chance of misclassification when new data
points are introduced. The optimization problem can be expressed mathematically as:

minimize (1/2) ∥w∥²
subject to ti (w · xi + b) ≥ 1, ∀i
where w is the weight vector, b is the bias term, ti is the class label for data point i, and xi is the
feature vector of data point i.

Principles of Optimal Separation


1. Hyperplane: In a multidimensional space, a hyperplane is a flat affine subspace that separates
the space into two half spaces. In two dimensions, a hyperplane is simply a line, while in three
dimensions, it is a plane.
2. Margin: The margin is the gap between the hyperplane and the nearest points from either
class. The larger the margin, the better the separation. This is crucial for ensuring that the
classifier can generalize well to new, unseen data.
3. Support Vectors: These are the data points that lie closest to the hyperplane. The position
of the hyperplane is determined solely by these points. Other points do not affect its location.
The support vectors are essential in defining the optimal hyperplane.



Mathematical Formulation
To find the optimal hyperplane, we aim to solve the following optimization problem:

minimize (1/2) ∥w∥²
subject to ti (w · xi + b) ≥ 1, ∀i
In this formulation:
- w is the weight vector that defines the orientation of the hyperplane.
- b is the bias term that shifts the hyperplane away from the origin.
- ti is the class label for the training data points xi .

The condition ensures that points of one class yield positive results while points of the other
class yield negative results when plugged into the equation.

Solution By Quadratic Programming


Quadratic programming (QP) is a type of optimization problem in which the objective function is quadratic (it includes squared terms, such as x² or (1/2) ∥w∥²) and the constraints are linear.
QP is widely used in constrained optimization because it effectively handles situations where the
cost or performance metric is not just linearly related to variables but has a certain curvature,
reflecting diminishing returns, risk minimization, or other nonlinear behaviors in economic and
engineering applications.

Lagrange Multipliers
Lagrange multipliers are a method used in optimization to find the local maxima and minima of
a function subject to equality constraints. This technique is particularly useful when we need to
optimize a function while ensuring that certain conditions (constraints) are met.

Basic Concept of Lagrange Multipliers


- Objective Function: This is the function we want to maximize or minimize, denoted as f(x, y).

- Constraints: These are the conditions that must be satisfied, denoted as g(x, y) = 0.

- Lagrangian Function: The Lagrangian incorporates both the objective function and the constraint. It is defined as:

  L(x, y, λ) = f(x, y) + λ g(x, y)

  Here, λ is the Lagrange multiplier.

Steps to Use Lagrange Multipliers


1. Set up the Lagrangian.

2. Take partial derivatives of the Lagrangian with respect to each variable (including the multiplier).



3. Set the derivatives equal to zero to find critical points.
4. Solve the resulting system of equations to find the values of the variables and the multiplier.

Simple Example
Let’s illustrate this with a basic example.

Example Problem
Maximize the function f (x, y) = xy subject to the constraint g(x, y) = x + y − 10 = 0.

Step 1: Set up the Lagrangian


The Lagrangian function for this problem is:
L(x, y, λ) = xy + λ(10 − x − y)

Step 2: Take Partial Derivatives


Now, we take partial derivatives of the Lagrangian:
1. With respect to x:
   ∂L/∂x = y − λ = 0    (1)
2. With respect to y:
   ∂L/∂y = x − λ = 0    (2)
3. With respect to λ:
   ∂L/∂λ = 10 − x − y = 0    (3)

Step 3: Set Derivatives Equal to Zero


From equations (1) and (2), we can express λ:
- From (1): λ = y
- From (2): λ = x
Setting the two equal gives:
y = x    (4)

Step 4: Solve the Constraint


Substituting equation (4) into the constraint (3):
10 − x − y = 0  ⟹  10 − x − x = 0  ⟹  10 − 2x = 0  ⟹  2x = 10  ⟹  x = 5
Since y = x, we also have:
y = 5

Conclusion: Optimal Solution


The values that maximize the function f (x, y) = xy under the given constraint are:
x = 5, y=5
Maximum Value:
f (5, 5) = 5 × 5 = 25
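The same stationary conditions can be solved symbolically; a minimal sketch with SymPy (an assumed dependency, not part of the original notes) is shown below.

# Verifying the worked Lagrange-multiplier example with SymPy (assumed available).
import sympy as sp

x, y, lam = sp.symbols("x y lam", real=True)
L = x * y + lam * (10 - x - y)           # Lagrangian L(x, y, lambda)

# Stationarity: set all partial derivatives to zero and solve the system.
eqs = [sp.diff(L, v) for v in (x, y, lam)]
solution = sp.solve(eqs, (x, y, lam), dict=True)
print(solution)                           # [{x: 5, y: 5, lam: 5}]
print((x * y).subs(solution[0]))          # 25, the maximum of xy on x + y = 10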



Summary
In this example, we used Lagrange multipliers to find the maximum value of the function xy under
the constraint that x + y = 10. The method allowed us to transform a constrained optimization
problem into a system of equations that could be solved easily. This technique is powerful for more
complex optimization problems encountered in various fields, including economics, engineering,
and machine learning.

A Constrained Optimization Problem


In order to determine the effectiveness of a classifier, we can establish that fewer classification
errors indicate a better model. To mathematically express this, we define a set of constraints
where the classifier should make correct predictions. By assigning the target values for two classes
as ±1 instead of 0 and 1, we can write down the product of the target ti and the predicted
output yi . This product will be positive if the predicted class matches the target, and negative
otherwise. Therefore, we can formulate the classifier’s condition as ti (wT xi + b) ≥ 1, ensuring
correct classification.
The full optimization problem is then:

min (1/2) wT w   subject to   ti (wT xi + b) ≥ 1,   ∀i = 1, . . . , n.
This optimization problem involves minimizing the norm of the weight vector w, while ensuring
that each datapoint satisfies the given constraint.

Primal Problem in SVM


The Primal Problem is the initial formulation of the optimization problem in SVM. Its objective
is to find a hyperplane that maximally separates data from two classes.
The hyperplane is defined as:
f (x) = wT x + b
where w is the weight vector, and b is the bias term.
For linearly separable data, the SVM aims to:
- Maximize the margin (the distance between the separating hyperplane and the closest points from each class).
- Minimize classification error.

The primal problem is set up as follows:

Objective Function:

min_{w,b} (1/2) ∥w∥²

where ∥w∥² = wT w (the squared norm of the weight vector).

Constraints: For all data points (xi , yi ) where yi is the label (+1 or -1):

yi (wT xi + b) ≥ 1

These constraints ensure that each point is correctly classified and at least 1 unit away from
the decision boundary.



Quadratic programming efficiently solves problems like this in polynomial time. The advantage
is that convex problems, like this one, have a unique minimum. The Karush–Kuhn–Tucker
(KKT) conditions define the optimal solution as follows for all values of i:

λi (1 − ti (wT xi + b)) = 0
1 − ti (wT xi + b) ≤ 0
λi ≥ 0
Here, λi are Lagrange multipliers, which allow us to solve constrained optimization problems.
The first condition implies that if λi ≠ 0, then ti (w*T xi + b*) = 1, meaning that the constraint holds as an equality for support vectors. These support vectors lie on the boundary of the margin, and their constraints hold as equalities, reducing the number of datapoints that need to be considered.

Lagrangian Function
We define the Lagrangian for the problem as:

L(w, b, λ) = (1/2) wT w + Σ_{i=1}^{n} λi (1 − ti (wT xi + b)).

Differentiating this with respect to w and b, we obtain:

∂L/∂w = w − Σ_{i=1}^{n} λi ti xi,

∂L/∂b = − Σ_{i=1}^{n} λi ti.

Setting these derivatives equal to zero gives us the optimal values for w and b:

w = Σ_{i=1}^{n} λi ti xi

Σ_{i=1}^{n} λi ti = 0.

Substituting these values into the Lagrangian function yields the dual problem, where we aim to maximize the following with respect to λi:

L(w*, b*, λ) = Σ_{i=1}^{n} λi − (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} λi λj ti tj xiT xj,

subject to λi ≥ 0 and Σ_{i=1}^{n} λi ti = 0.
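To make the dual concrete, the sketch below solves it numerically for a tiny, linearly separable toy dataset using scipy.optimize.minimize (an assumption; a dedicated quadratic programming solver would normally be used).

# Solving the hard-margin dual numerically on toy data (scipy and numpy assumed).
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

K = X @ X.T                                          # Gram matrix of inner products xi^T xj

def neg_dual(lam):
    # Negative of the dual objective, since we minimize instead of maximize.
    return -(lam.sum() - 0.5 * (lam * t) @ K @ (lam * t))

cons = {"type": "eq", "fun": lambda lam: lam @ t}    # sum_i lambda_i t_i = 0
bounds = [(0, None)] * len(t)                        # lambda_i >= 0
res = minimize(neg_dual, x0=np.zeros(len(t)), bounds=bounds, constraints=[cons])

lam = res.x
w = (lam * t) @ X                                    # w = sum_i lambda_i t_i x_i
print("lambda:", np.round(lam, 3), "w:", np.round(w, 3))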

Derivation of b for SVM


To understand how the equation for b is formed, let’s break down the process step-by-step. This
derivation is performed in the context of Support Vector Machines (SVM), where we aim to find
the optimal separation boundary between two classes.



Support Vector Condition: For any support vector xj, the optimal separation condition is met exactly:

tj (wT xj + b) = 1

where:
- tj is the label of the support vector (either +1 or −1).
- wT xj + b represents the linear boundary applied to the support vector xj.
Expressing w Using Lagrange Multipliers: In terms of the support vectors, we can write w as:

w = Σ_{i=1}^{n} λi ti xi

where:
- λi are the Lagrange multipliers.
- xi are the training data points.
- Only points with λi > 0 are support vectors and contribute to w.
Substitute w in the Support Vector Condition: Substitute w into the support vector condition:

tj ( (Σ_{i=1}^{n} λi ti xi)T xj + b ) = 1

Expanding this gives:

tj ( Σ_{i=1}^{n} λi ti xiT xj + b ) = 1

Since tj is +1 or −1, dividing both sides by tj (equivalently, multiplying by tj) isolates b:

Σ_{i=1}^{n} λi ti xiT xj + b = tj

Rearranging, we obtain:

b = tj − Σ_{i=1}^{n} λi ti xiT xj
Averaging Over All Support Vectors: Since any support vector xj should satisfy this equation, we compute b for each support vector and then average to obtain a stable value. With Ns support vectors, this becomes:

b = (1/Ns) Σ_{support vectors j} ( tj − Σ_{i=1}^{n} λi ti xiT xj )

This averaging approach gives a more robust calculation of b across all support vectors, rather than relying on a single point.
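Continuing the numerical sketch above (reusing the hypothetical X, t, lam, and w computed there), b can be obtained by averaging over the support vectors exactly as in the formula:

# Averaging b over the support vectors (continuation of the earlier dual sketch).
import numpy as np

support = lam > 1e-6                         # points with lambda_i > 0 are support vectors
b_values = t[support] - (X[support] @ w)     # t_j - sum_i lambda_i t_i x_i^T x_j for each SV
b = b_values.mean()                          # average over the N_s support vectors
print("b =", b)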

Prediction for a New Data Point


For a new point z, the prediction can be made using:

wT z + b = Σ_{i=1}^{n} λi ti xiT z + b.

Thus, classification of a new point involves computing the inner product between the point and the support vectors.
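Using the same hypothetical quantities from the sketches above, a new point z is then classified by the sign of this decision value:

# Classifying a new point z from the support-vector expansion (sketch).
import numpy as np

z = np.array([1.0, 0.5])
decision = (lam * t) @ (X @ z) + b           # sum_i lambda_i t_i x_i^T z + b
label = 1 if decision >= 0 else -1
print("decision value:", decision, "predicted class:", label)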



Slack Variables for Non-Linearly Separable Problems
In a linearly separable dataset, there exists a hyperplane that perfectly separates the classes, and
every point satisfies the constraint:
ti (wT xi + b) ≥ 1
where ti is the class label (±1), w is the weight vector, and b is the bias.
However, in real-world applications, data is rarely linearly separable. To allow for some degree of
misclassification, we introduce slack variables ηi ≥ 0.
With slack variables, the constraints become:

ti (wT xi + b) ≥ 1 − ηi

where:
- ηi = 0 for correctly classified points on or beyond the margin.
- 0 < ηi ≤ 1 for points inside the margin but on the correct side of the hyperplane.
- ηi > 1 for points misclassified on the wrong side of the hyperplane.

Slack variables allow some points to violate the margin constraint, making the model more
flexible for non-linearly separable data.

Modified Objective Function for SVM with Slack Variables


Incorporating slack variables into the SVM optimization problem requires a trade-off between maximizing the margin and minimizing classification errors. This leads to the objective function:

L(w, η) = wT w + C Σ_{i=1}^{n} ηi

where:
- C is a regularization parameter that controls the trade-off between a wide margin and the penalty for misclassifications.
- wT w corresponds to maximizing the margin.
- C Σ_{i=1}^{n} ηi penalizes points violating the margin; ηi is the distance of a misclassified point from the correct boundary.

- If C is large: the optimization places higher priority on correctly classifying points, potentially reducing the margin. The penalty for misclassification is high.

- If C is small: the optimization favors a larger margin, allowing more misclassifications if necessary.
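The sketch below illustrates this trade-off by fitting soft-margin SVMs with different C values on noisy data (scikit-learn assumed; the dataset is illustrative) and comparing the number of support vectors:

# Illustrating the effect of C on a noisy two-class problem (scikit-learn assumed).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # More support vectors usually indicates a wider margin with more violations allowed.
    print(f"C={C:>6}: support vectors = {clf.n_support_.sum()}")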

KKT Conditions for SVM with Slack Variables


The Karush-Kuhn-Tucker (KKT) conditions for this modified problem define the optimal solution
for the soft-margin SVM. These conditions are adjusted to account for the slack variables ηi :



- Complementary slackness condition:

  λi (1 − ti (wT xi + b) − ηi) = 0

  - If λi > 0, then the point lies exactly on the margin or within the margin boundary.
  - If ηi > 0, then the point violates the margin constraint (it lies inside or beyond the margin).

- Condition for support vectors:

  (C − λi) ηi = 0

  - If 0 < λi < C, then ηi = 0, indicating the point is a support vector lying exactly on the margin boundary.
  - If λi = C and ηi > 1, the classifier misclassifies the point.
- Separation constraint:

  Σ_{i=1}^{n} λi ti = 0

  This constraint maintains the balance between the support vectors of both classes.
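In practice these conditions can be observed on a fitted model; the sketch below (scikit-learn assumed) inspects the dual coefficients, which store the products ti λi for the support vectors and are bounded in magnitude by C for a soft-margin fit.

# Inspecting support vectors and dual coefficients after a soft-margin fit (sketch).
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=1)
clf = SVC(kernel="linear", C=1.0).fit(X, y)

alphas = np.abs(clf.dual_coef_).ravel()      # |t_i * lambda_i| for the stored support vectors
print("number of support vectors:", len(clf.support_))
print("all multipliers satisfy 0 < lambda_i <= C:",
      bool(np.all((alphas > 0) & (alphas <= clf.C + 1e-12))))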

Kernel Trick in Machine Learning


The kernel trick is a method used in machine learning to enable algorithms, like Support Vector Machines (SVMs), to operate in a high-dimensional space without directly computing the coordinates in that space. This is useful for solving problems that are not linearly separable in their original feature space. When we cannot linearly separate data in the original feature space, modifying the features can help. This idea is similar to the XOR problem we encountered earlier. By
transforming the data into a higher-dimensional space, we might find a linear decision boundary
that separates the classes. To achieve this, we introduce new functions ϕ(x) based on the input
features.
The key idea is to transform the input xi into a new form ϕ(xi), while still being able to use the SVM algorithm. Specifically, the prediction equation derived earlier remains valid, but with xi replaced by ϕ(xi). For a new point z, the resulting prediction equation becomes:

wT ϕ(z) + b = Σ_{i=1}^{n} λi ti ϕ(xi)T ϕ(z) + b.

The choice of functions ϕ(x) is critical. For instance, if we use a basis consisting of polynomials up to degree 2, we can derive new features from the original input. A simple example for d = 3 dimensions would be:

Φ(x) = (1, √2 x1, √2 x2, √2 x3, x1², x2², x3², √2 x1x2, √2 x1x3, √2 x2x3).

This transformation increases the dimensionality, making it computationally expensive. However, there is a trick: we do not need to compute Φ(xi)T Φ(xj) directly. Instead, we use the kernel trick, which allows us to compute the dot product in the original space. For example, we can express Φ(x)T Φ(y) as:

Φ(x)T Φ(y) = (1 + xT y)².

This reduces the computational cost from O(d²) to O(d), and the same applies to higher-order polynomials.
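This identity is easy to check numerically; the sketch below compares the explicit d = 3 expansion given above with the kernel value (1 + xT y)² (numpy assumed, test vectors illustrative).

# Numerical check that Phi(x)^T Phi(y) equals (1 + x^T y)^2 for d = 3 (sketch).
import numpy as np

def phi(v):
    # Explicit degree-2 polynomial feature map for a 3-dimensional input.
    x1, x2, x3 = v
    s = np.sqrt(2.0)
    return np.array([1, s*x1, s*x2, s*x3, x1**2, x2**2, x3**2,
                     s*x1*x2, s*x1*x3, s*x2*x3])

x = np.array([1.0, 2.0, 3.0])
y = np.array([0.5, -1.0, 2.0])

lhs = phi(x) @ phi(y)            # explicit mapping: O(d^2) features
rhs = (1.0 + x @ y) ** 2         # kernel evaluation: O(d) work
print(lhs, rhs)                  # both print 30.25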



Why the Kernel Trick is Needed
In many machine learning tasks, such as classification, we want to find a hyperplane that best
separates the data points of different classes. However, in some cases, the data cannot be separated
with a straight line in the original feature space. The kernel trick helps by implicitly mapping
the data into a higher-dimensional space where it becomes easier to separate.

How the Kernel Trick Works


The kernel trick leverages the concept of a kernel function. Instead of computing the coordinates of the points in the high-dimensional space (which would be computationally expensive), the kernel function directly computes the inner product (dot product) of two points in that space.

Kernel Transformation We map the input space x to a higher-dimensional space ϕ(x) where
linear separation can be achieved:
ϕ : x → ϕ(x)
Instead of explicitly computing this mapping, SVMs use a kernel function K(xi , xj ) = ϕ(xi )T ϕ(xj )
to calculate the dot product directly in the transformed space.
Mathematically, a kernel function K(x, y) is defined as:

K(x, y) = ϕ(x) · ϕ(y)

where:
- x and y are data points in the original feature space.
- ϕ(x) is a mapping function that transforms x to a higher-dimensional space.

With the kernel trick, we don’t need to know or calculate ϕ(x) explicitly. Instead, we only
compute K(x, y), which gives us the inner product of the transformed vectors. This makes it
computationally feasible to work in higher dimensions without explicitly mapping the data.

Common Kernel Functions


Some commonly used kernel functions are:

1. Linear kernel: K(x, y) = x · y
   - Used when the data is already linearly separable.

2. Polynomial kernel: K(x, y) = (x · y + c)^d
   - c: a constant that trades off the influence of higher-order versus lower-order terms. Typically set to 1 or 0.
   - d: the degree of the polynomial. A higher d allows more complex decision boundaries.
   - Useful for creating polynomial decision boundaries, which can separate more complex data.

3. Gaussian (RBF) kernel: K(x, y) = exp(−∥x − y∥² / (2σ²))
   - ∥x − y∥: the Euclidean distance between x and y.
   - σ: the bandwidth parameter, controlling the spread of the Gaussian. Smaller σ values make the kernel more sensitive to changes in x and y.
   - Used for non-linear data, as it can map points to an infinitely high-dimensional space.

4. Sigmoid kernel: K(x, y) = tanh(α x · y + c)
   - α: a scaling parameter, controlling the slope of the sigmoid function.
   - c: a constant that shifts the kernel function along the x-axis, controlling the threshold of the sigmoid.
   - Similar to the activation function in neural networks, often used in neural network-based methods.
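These four kernels can be written directly as functions of the original input vectors, as in the sketch below (the parameter values are illustrative assumptions):

# The four kernels above, evaluated directly in the original input space (sketch).
import numpy as np

def linear_kernel(x, y):
    return x @ y

def polynomial_kernel(x, y, c=1.0, d=2):
    return (x @ y + c) ** d

def rbf_kernel(x, y, sigma=1.0):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def sigmoid_kernel(x, y, alpha=0.1, c=0.0):
    return np.tanh(alpha * (x @ y) + c)

x, y = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(linear_kernel(x, y), polynomial_kernel(x, y), rbf_kernel(x, y), sigmoid_kernel(x, y))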

Example
Imagine a dataset where the data points for two classes form concentric circles. This data is not
linearly separable in two-dimensional space, so a linear SVM would fail to classify it correctly.
However, with the kernel trick (using, for example, a Gaussian kernel), we can implicitly map
this data to a higher-dimensional space where it becomes linearly separable, allowing the SVM to
create a boundary that divides the two classes accurately.
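A sketch of this concentric-circles case, assuming scikit-learn's make_circles helper, shows the gap between a linear and an RBF kernel:

# Concentric circles: a linear kernel fails while an RBF kernel separates them (sketch).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel} kernel accuracy: {clf.score(X, y):.2f}")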

Significance of the Kernel Trick


The kernel trick is significant because:

- It allows complex decision boundaries in the original feature space by transforming data into a higher-dimensional space where linear separation is possible.

- It reduces computational complexity by avoiding the direct computation of high-dimensional coordinates, making algorithms efficient even with complex mappings.

- It enables algorithms to work with non-linear data and still achieve high accuracy, as in SVMs and other machine learning methods.

In essence, the kernel trick enables algorithms to solve complex problems without a massive
increase in computational cost, making it a cornerstone technique in machine learning for non-
linear classification and regression tasks.

Support Vector Machine (SVM) Regression


Interestingly, the Support Vector Machine (SVM) can also be used for regression tasks. The
main idea is to adapt the usual least-squares error function, which typically aims to minimize the
difference between the predicted values and the actual target values. This error function can be
represented as:
(1/2N) Σ_{i=1}^{N} (ti − yi)² + (λ/2) ∥w∥²,
where N is the number of data points, ti is the target value, yi is the predicted value, and w
represents the weights of the model.
To transform this into a form suitable for SVM regression, we utilize what is called an ϵ-insensitive error function, denoted as Eϵ. This function behaves as follows:
- It gives a value of 0 if the absolute difference between the target and the predicted output is less than ϵ.
- If the difference exceeds ϵ, it returns that difference minus ϵ, for consistency.



This transformation helps focus on the points that are not well predicted, which means we can
maintain a small number of support vectors.
The modified error function can be expressed as:

Σ_{i=1}^{N} Eϵ(ti − yi) + (λ/2) ∥w∥²,

where Eϵ(ti − yi) encapsulates the ϵ-insensitive behavior.

Understanding the Prediction Process


Instead of requiring the predictions to be exactly correct, we want them to fall within a tube
of radius ϵ around the ideal prediction line. To accommodate possible errors in prediction, we
introduce slack variables for each data point, denoted as ϵi for the ith data point.
The process involves using Lagrange multipliers, which help manage constraints, transforming our problem into a dual form, applying a kernel function, and ultimately solving it using a quadratic solver.
The final prediction for a test point z is given by the equation:

f(z) = Σ_{i=1}^{n} (µi − λi) K(xi, z) + b,

where µi and λi are the two sets of Lagrange multipliers arising from the constraints on either side of the ϵ-tube, and K(xi, z) represents the kernel function applied to the data points.
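A compact sketch of ϵ-insensitive regression, assuming scikit-learn's SVR implementation, shows the roles of ϵ, C, and the kernel (the data is an illustrative noisy sine curve):

# epsilon-insensitive SVM regression on a noisy sine curve (scikit-learn assumed).
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(80, 1)), axis=0)
t = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# Points predicted within the epsilon tube contribute zero error and are not support vectors.
reg = SVR(kernel="rbf", epsilon=0.1, C=10.0).fit(X, t)
print("support vectors used:", len(reg.support_))
print("prediction at x=2.5:", reg.predict([[2.5]])[0])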

Further Developments in SVM


There has been substantial advancement in kernel methods and SVMs, including optimization
techniques such as Sequential Minimal Optimization. Additionally, some methods aim to compute
posterior probabilities instead of making hard classification decisions, one such method being the
Relevance Vector Machine.
Various SVM implementations are available online that offer advanced features beyond those
discussed in the standard texts. While many of these implementations are written in C, some
provide interfaces for other programming languages, including Python. A simple internet search
can yield several options to explore, with popular choices including SVMLight, LIBSVM, and
scikit-learn.

