Unit 2 Notes: What is a Support Vector Machine?
1. Hyperplane:
In SVM, a hyperplane is a decision boundary that separates data into
two classes. For binary classification, the hyperplane is chosen to
maximize the margin between the classes.
2. Support Vectors:
Support vectors are the data points closest to the hyperplane. They
play a crucial role in determining the position and orientation of the
hyperplane.
3. Margin:
The margin is the distance between the hyperplane and the nearest
data points (support vectors). SVM aims to maximize this margin, as it
often leads to better generalization.
4. Kernel Trick:
SVM can handle nonlinear data by using a kernel function that
implicitly maps data into a higher-dimensional space, where a linear
hyperplane can be applied effectively.
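The four concepts above can be seen directly on a fitted model. The following is a minimal sketch (not part of the original notes), assuming scikit-learn and a small synthetic two-class dataset; it fits a linear SVM and reads back the hyperplane normal w, the offset b, the support vectors, and the margin width 2/||w||.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Small, roughly linearly separable two-class dataset (illustrative assumption).
X, y = make_blobs(n_samples=60, centers=2, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]                       # normal vector of the separating hyperplane
b = clf.intercept_[0]                  # offset of the hyperplane
margin_width = 2 / np.linalg.norm(w)   # distance between the two margin boundaries

print("w =", w, " b =", b)
print("support vectors:\n", clf.support_vectors_)
print("margin width =", margin_width)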
Types of SVM:
1. Linear SVM:
In linear SVM, a linear hyperplane is used to separate data. It works
well when data is linearly separable.
2. Non-linear SVM:
When data is not linearly separable, non-linear SVM uses kernel
functions (e.g., polynomial, radial basis function) to transform the data
into a higher-dimensional space where it can be linearly separated.
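To illustrate the difference between the two types, here is a short sketch (an assumption, using scikit-learn and a synthetic concentric-circles dataset) comparing a linear SVM with an RBF-kernel SVM on data that is not linearly separable.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: no straight line can separate the two classes.
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)

print("linear kernel accuracy:", linear_svm.score(X_te, y_te))  # poor: no separating line exists
print("RBF kernel accuracy   :", rbf_svm.score(X_te, y_te))     # high: separable after implicit mapping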
How SVM Works:
1. Data Preparation:
SVM requires labeled training data, where each data point is assigned
to one of two classes (binary classification).
2. Feature Scaling:
It's important to scale features to have similar ranges, as SVM is
sensitive to feature scales.
3. Hyperplane Selection:
SVM finds the hyperplane that maximizes the margin while correctly
classifying as many training points as possible.
4. Kernel Transformation (for non-linear data):
If the data is not linearly separable, a kernel function is applied to map
the data to a higher-dimensional space.
5. Margin Optimization:
SVM aims to maximize the margin while minimizing classification
errors. This is often formulated as a mathematical optimization
problem.
6. Support Vector Identification:
The support vectors are identified as the data points that lie closest to
the hyperplane.
7. Classification:
After training, the SVM can classify new, unlabeled data points based
on their position relative to the hyperplane.
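The steps above can be put together in one short pipeline. The sketch below is illustrative only, assuming scikit-learn, one of its built-in labeled datasets, and standardization for feature scaling.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)              # step 1: labeled binary data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = make_pipeline(
    StandardScaler(),                                    # step 2: feature scaling
    SVC(kernel="rbf", C=1.0, gamma="scale"),             # steps 3-6: kernel + margin optimization
)
model.fit(X_tr, y_tr)                                    # support vectors identified during fitting

print("test accuracy:", model.score(X_te, y_te))         # step 7: classify unseen points
print("support vectors per class:", model.named_steps["svc"].n_support_)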
Advantages of SVM:
Challenges of SVM:
Applications of SVM:
From the figure above, it is clear that there are multiple lines (our
hyperplane here is a line, because we are considering only two input
features x1 and x2) that segregate the data points, i.e., classify the
red and blue circles. So how do we choose the best line, or in general
the best hyperplane, that segregates our data points?
For this type of data, SVM still finds the maximum margin as it did for
the previous data sets, but in addition it adds a penalty each time a
point crosses the margin. The margins in such cases are called soft
margins.
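In scikit-learn this penalty is controlled by the C parameter of SVC. The dataset and C values in the sketch below are illustrative assumptions; it shows how a small C (cheap violations) typically keeps more support vectors than a large C (costly violations).

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping classes, so some points must end up inside or across the margin.
X, y = make_blobs(n_samples=200, centers=2, cluster_std=3.0, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Small C: violations are cheap, softer margin, typically more support vectors.
    # Large C: violations are costly, narrower margin, typically fewer support vectors.
    print(f"C={C:<7} support vectors={clf.n_support_.sum():<4} "
          f"training accuracy={clf.score(X, y):.2f}")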
Say our data is as shown in the figure above. SVM solves this by creating
a new variable using a kernel. For a point xi on the line, we create a
new variable yi as a function of its distance from the origin o. If we
plot this, we get something like what is shown below.
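A minimal sketch of this idea, with made-up 1-D points: the classes are not separable on the original line, but adding a new variable yi = xi^2 (a function of the distance from the origin) makes them linearly separable.

import numpy as np
from sklearn.svm import SVC

# 1-D points: the class depends only on the distance from the origin o,
# so no single threshold on x separates them.
x = np.array([-4.0, -3.0, -2.5, -1.0, -0.5, 0.0, 0.5, 1.0, 2.5, 3.0, 4.0])
labels = (np.abs(x) > 2).astype(int)

# New variable y_i = x_i**2; in the (x, x^2) plane the classes become
# linearly separable by a horizontal line.
X_new = np.column_stack([x, x ** 2])
clf = SVC(kernel="linear").fit(X_new, labels)
print("training accuracy with the new variable:", clf.score(X_new, labels))  # expect 1.0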
The vector w represents the normal vector to the hyperplane, i.e., the
direction perpendicular to the hyperplane. The parameter b in the
equation represents the offset, or distance of the hyperplane from the
origin along the normal vector w.
The distance between a data point x_i and the decision boundary can
be calculated as:
d_i = (w^T x_i + b) / ||w||
where ||w|| is the Euclidean norm of the normal vector w.
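As a quick check, the sketch below (assuming a linear SVM fitted with scikit-learn on a synthetic dataset) computes this distance for one training point from the learned w and b.

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

w = clf.coef_[0]
b = clf.intercept_[0]

x_i = X[0]
signed = (w @ x_i + b) / np.linalg.norm(w)   # sign tells which side of the boundary x_i is on
print("distance of x_i from the hyperplane:", abs(signed))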
Optimization:
For a hard margin linear SVM classifier, the goal is to minimize
(1/2)||w||^2 with respect to w and b, so that the margin is maximized
while every training instance is classified correctly.
The target variable or label for the ith training instance is denoted
by ti, where ti = -1 for negative instances (when yi = 0) and ti = 1 for
positive instances (when yi = 1). We therefore require a decision
boundary that satisfies the constraint:
ti (w^T xi + b) >= 1 for every training instance i
The dual form of this problem is to maximize, over the Lagrange
multipliers, the objective
∑ αi − (1/2) ∑i ∑j αi αj ti tj K(xi, xj)
subject to αi >= 0 and ∑ αi ti = 0, where:
αi is the Lagrange multiplier associated with the ith training sample.
K(xi, xj) is the kernel function that computes the similarity between
two samples xi and xj. It allows SVM to handle nonlinear
classification problems by implicitly mapping the samples into a
higher-dimensional feature space.
The term ∑ αi represents the sum of all Lagrange multipliers.
Once the dual problem has been solved and the optimal Lagrange
multipliers have been found, the SVM decision boundary can be described
in terms of these multipliers and the support vectors. The training
samples with αi > 0 are the support vectors, and the decision boundary
is given by:
w = ∑ αi ti xi, with b = tj − w^T xj for any support vector xj
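In scikit-learn, a fitted SVC exposes exactly these quantities: support_vectors_ holds the support vectors and dual_coef_ holds αi·ti for each of them. The sketch below (dataset and gamma value are illustrative assumptions) rebuilds the decision value of a new point from them and compares it with decision_function.

import numpy as np
from sklearn.datasets import make_circles
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
clf = SVC(kernel="rbf", gamma=1.0).fit(X, y)

# Decision value of a new point: sum over support vectors of (alpha_i * t_i) * K(x_i, x) + b.
x_new = np.array([[0.1, 0.2]])
K = rbf_kernel(clf.support_vectors_, x_new, gamma=1.0)   # K(x_i, x_new) for each support vector
manual = (clf.dual_coef_ @ K)[0, 0] + clf.intercept_[0]

print("manual decision value :", manual)
print("sklearn decision value:", clf.decision_function(x_new)[0])  # should agree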
Advantages of SVM
Effective in high-dimensional cases.
Memory efficient, because the decision function uses only a subset of
the training points, called support vectors.
Different kernel functions can be specified for the decision function,
and it is possible to specify custom kernels.
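A brief sketch of the last point, assuming scikit-learn: SVC accepts a custom kernel as a callable that returns the Gram matrix. The plain dot-product kernel below is only an illustration (it is equivalent to the built-in linear kernel).

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

def my_kernel(X, Y):
    """Custom kernel: K(x, y) = x . y, returned as the Gram matrix."""
    return X @ Y.T

X, y = make_blobs(n_samples=60, centers=2, random_state=0)
clf = SVC(kernel=my_kernel).fit(X, y)
print("training accuracy with the custom kernel:", clf.score(X, y))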