Unit 2 Notes: Support Vector Machines

What is a Support Vector Machine (SVM)?

 A Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks.
 SVM is particularly well-suited for classification problems, where it finds the best hyperplane to separate data points into distinct classes.

Key Concepts of SVM:

1. Hyperplane:
 In SVM, a hyperplane is a decision boundary that separates data into
two classes. For binary classification, the hyperplane is chosen to
maximize the margin between the classes.
2. Support Vectors:
 Support vectors are the data points closest to the hyperplane. They
play a crucial role in determining the position and orientation of the
hyperplane.
3. Margin:
 The margin is the distance between the hyperplane and the nearest data points (the support vectors). SVM aims to maximize this margin, as it often leads to better generalization (see the short code sketch after this list).
4. Kernel Trick:
 SVM can handle nonlinear data by using a kernel function that
implicitly maps data into a higher-dimensional space, where a linear
hyperplane can be applied effectively.
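
A minimal scikit-learn sketch of how these concepts look on a fitted linear SVM is given below; the toy dataset (make_blobs) and parameter values are illustrative assumptions, not part of these notes.

```python
# Illustrative sketch: inspect hyperplane, support vectors, and margin of a linear SVM.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two well-separated clusters -> (almost) linearly separable binary data (assumed toy data)
X, y = make_blobs(n_samples=60, centers=2, cluster_std=0.8, random_state=0)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]                         # normal vector of the separating hyperplane
b = clf.intercept_[0]                    # offset of the hyperplane
margin_width = 2.0 / np.linalg.norm(w)   # geometric margin width = 2 / ||w||

print("Support vectors (points closest to the hyperplane):\n", clf.support_vectors_)
print("Hyperplane: w.x + b = 0 with w =", w, "and b =", b)
print("Margin width:", margin_width)
```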

Types of SVM:

1. Linear SVM:
 In linear SVM, a linear hyperplane is used to separate data. It works
well when data is linearly separable.
2. Non-linear SVM:
 When data is not linearly separable, non-linear SVM uses kernel functions (e.g., polynomial, radial basis function) to transform the data into a higher-dimensional space where it can be linearly separated, as illustrated in the sketch after this list.
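
Below is a hedged sketch contrasting the two types on data that is not linearly separable; the dataset (make_moons) and settings are assumed for illustration only.

```python
# Illustrative comparison: linear SVM vs. kernelized (RBF) SVM on non-linearly separable data.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.2, random_state=42)   # assumed toy data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

linear_svm = SVC(kernel="linear").fit(X_tr, y_tr)              # straight-line boundary
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_tr, y_tr)     # non-linear boundary via the kernel trick

print("Linear SVM test accuracy:", linear_svm.score(X_te, y_te))
print("RBF SVM test accuracy:   ", rbf_svm.score(X_te, y_te))
```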

How SVM Works:

1. Data Preparation:
 SVM requires labeled training data, where each data point is assigned
to one of two classes (binary classification).
2. Feature Scaling:
 It's important to scale features to have similar ranges, as SVM is
sensitive to feature scales.
3. Hyperplane Selection:
 SVM finds the hyperplane that maximizes the margin while correctly
classifying as many training points as possible.
4. Kernel Transformation (for non-linear data):
 If the data is not linearly separable, a kernel function is applied to map
the data to a higher-dimensional space.
5. Margin Optimization:
 SVM aims to maximize the margin while minimizing classification
errors. This is often formulated as a mathematical optimization
problem.
6. Support Vector Identification:
 The support vectors are identified as the data points that lie closest to
the hyperplane.
7. Classification:
 After training, the SVM can classify new, unlabeled data points based on their position relative to the hyperplane. A minimal end-to-end sketch of this workflow is shown after this list.
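
The sketch below walks through this workflow with scikit-learn (labeled data, scaling, training, classification); the dataset and hyperparameters are assumptions chosen only for illustration.

```python
# Illustrative end-to-end SVM workflow: data -> scaling -> training -> prediction.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)       # binary classification data (assumed example)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Scaling is part of the pipeline because SVM is sensitive to feature scales
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_tr, y_tr)

print("Test accuracy:", model.score(X_te, y_te))
print("Predictions for the first 5 test points:", model.predict(X_te[:5]))
```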

Advantages of SVM:

1. Effective in High-Dimensional Spaces: SVM works well even when the number of features (dimensions) is much larger than the number of samples.
2. Robust to Overfitting: SVM tends to generalize well, thanks to its margin
maximization principle, which helps avoid overfitting.
3. Effective with Non-linear Data: SVM can handle complex, non-linear
relationships in data using kernel functions.
4. Fewer Hyperparameters: SVM has relatively few hyperparameters to tune
compared to some other algorithms.

Challenges of SVM:

1. Sensitivity to Scaling: SVM's performance can be affected by feature scaling, so careful preprocessing is needed.
2. Slower Training on Large Datasets: SVM can be computationally
intensive, especially with large datasets.
3. Kernel Selection: Choosing the right kernel function can be challenging, as
it affects the algorithm's performance.

Applications of SVM:

 SVM is widely used in various applications, including:


 Text classification and sentiment analysis.
 Image classification and face recognition.
 Bioinformatics for protein classification.
 Financial fraud detection.
 Medical diagnosis.

In summary, Support Vector Machines are powerful and versatile machine learning algorithms used for classification tasks. They are effective in both linear and non-linear scenarios, but proper feature scaling and kernel selection are essential for their success. SVMs are widely applied in many fields due to their ability to handle complex data separation problems.

Support Vector Machine (SVM) Algorithm
Support Vector Machine (SVM) is a powerful machine learning algorithm used for linear or nonlinear classification, regression, and even outlier detection tasks. SVMs can be used for a variety of tasks, such as text classification, image classification, spam detection, handwriting identification, gene expression analysis, face detection, and anomaly detection. SVMs are adaptable and efficient in a variety of applications because they can manage high-dimensional data and nonlinear relationships.
SVM algorithms are very effective because they try to find the maximum-margin separating hyperplane between the different classes available in the target feature.

Support Vector Machine


Support Vector Machine (SVM) is a supervised machine learning algorithm used for both classification and regression. Though it can handle regression problems as well, it is best suited for classification. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that separates the data points of different classes in the feature space. The hyperplane is chosen so that the margin between the closest points of the different classes is as large as possible. The dimension of the hyperplane depends upon the number of features. If the number of input features is two, then the hyperplane is just a line. If the number of input features is three, then the hyperplane becomes a 2-D plane. It becomes difficult to imagine when the number of features exceeds three.
Let's consider two independent variables x1 and x2, and one dependent variable which is either a blue circle or a red circle.

Linearly Separable Data points

From the figure above it’s very clear that there are multiple lines (our
hyperplane here is a line because we are considering only two input
features x1, x2) that segregate our data points or do a classification
between red and blue circles. So how do we choose the best line or in
general the best hyperplane that segregates our data points?

How does SVM work?

One reasonable choice for the best hyperplane is the one that represents the largest separation, or margin, between the two classes.

Multiple hyperplanes separate the data from two classes

So we choose the hyperplane whose distance to the nearest data point on each side is maximized. If such a hyperplane exists, it is known as the maximum-margin hyperplane (hard margin). So from the above figure, we choose L2. Let's consider a scenario like the one shown below.

Selecting hyperplane for data with outlier


Here we have one blue ball inside the boundary of the red balls. So how does SVM classify the data? It's simple! The blue ball within the region of the red ones is an outlier of the blue class. The SVM algorithm has the ability to ignore such outliers and still find the hyperplane that maximizes the margin, so SVM is robust to outliers.

Hyperplane which is the most optimized one

So for this type of data, what SVM does is find the maximum margin as with the previous data sets, and in addition it adds a penalty each time a point crosses the margin. The margins in these types of cases are called soft margins. When there is a soft margin in the data set, the SVM tries to minimize (1/margin + λ(∑penalty)). Hinge loss is a commonly used penalty: if there are no violations there is no hinge loss, and if there are violations the hinge loss is proportional to the distance of the violation.
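
A small numeric sketch of the hinge loss described above is given below; the labels and scores are made-up values for illustration only.

```python
# Illustrative hinge loss: zero for points classified correctly with margin >= 1,
# and proportional to the violation otherwise.
import numpy as np

def hinge_loss(t, score):
    # t is the label in {-1, +1}; score is w.x + b for the point (assumed values below)
    return np.maximum(0.0, 1.0 - t * score)

print(hinge_loss(+1, 2.5))   # 0.0 -> correctly classified, outside the margin
print(hinge_loss(+1, 0.4))   # 0.6 -> correct side but inside the margin
print(hinge_loss(-1, 0.7))   # 1.7 -> misclassified, loss grows with the violation
```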
Till now, we were talking about linearly separable data (a group of blue balls and red balls that are separable by a straight line). What should we do if the data are not linearly separable?

Original 1D dataset for classification

Say our data is as shown in the figure above. SVM solves this by creating a new variable using a kernel. We take a point xi on the line and create a new variable yi as a function of its distance from the origin O. If we plot this, we get something like what is shown below.

Mapping 1D data to 2D to become able to separate the two classes

In this case, the new variable y is created as a function of the distance from the origin. A non-linear function that creates a new variable is referred to as a kernel.
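
The following sketch illustrates this idea with assumed 1D data: adding a new feature based on the squared distance from the origin makes the classes separable by a straight line in the new 2D space.

```python
# Illustrative 1D -> 2D mapping: data that is not separable on a line becomes
# separable after adding a feature derived from the distance to the origin.
import numpy as np

x = np.array([-4.0, -3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, 4.0])    # assumed 1D points
labels = np.where(np.abs(x) > 2.0, 1, 0)   # outer points vs inner points: not separable on the line

# Kernel-style mapping: create a second coordinate from the distance to the origin
y_new = x ** 2
X_2d = np.column_stack([x, y_new])

# In the (x, x^2) plane, any horizontal line between y_new = 4 and y_new = 9 separates the classes
print(X_2d)
print(labels)
```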

Support Vector Machine Terminology


1. Hyperplane: The hyperplane is the decision boundary used to separate the data points of different classes in the feature space. In the case of linear classification, it is described by the linear equation w^T x + b = 0.
2. Support Vectors: Support vectors are the data points closest to the hyperplane; they play a critical role in determining the hyperplane and the margin.
3. Margin: The margin is the distance between the support vectors and the hyperplane. The main objective of the support vector machine algorithm is to maximize this margin, since a wider margin generally indicates better generalization.
4. Kernel: The kernel is the mathematical function used in SVM to map the original input data points into a high-dimensional feature space, so that a separating hyperplane can be found even if the data points are not linearly separable in the original input space. Some of the common kernel functions are linear, polynomial, radial basis function (RBF), and sigmoid.
5. Hard Margin: The maximum-margin hyperplane or the hard margin
hyperplane is a hyperplane that properly separates the data points
of different categories without any misclassifications.
6. Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique. The soft-margin SVM formulation introduces a slack variable for each data point, which relaxes the strict margin requirement and permits certain misclassifications or violations. It finds a compromise between increasing the margin and reducing the violations.
7. C: The regularisation parameter C balances margin maximisation against the penalty for misclassifications. It decides the penalty for crossing the margin or misclassifying data items: a larger value of C imposes a stricter penalty, which results in a smaller margin and perhaps fewer misclassifications (the sketch after this list illustrates this trade-off).
8. Hinge Loss: A typical loss function in SVMs is hinge loss. It
punishes incorrect classifications or margin violations. The objective
function in SVM is frequently formed by combining it with the
regularisation term.
9. Dual Problem: SVM can also be solved through a dual of the optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The dual formulation enables the use of kernel tricks and more efficient computation.
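
The sketch below illustrates the trade-off controlled by C on an assumed, overlapping toy dataset; the specific C values are arbitrary choices for illustration.

```python
# Illustrative effect of C: smaller C tolerates violations and gives a wider margin,
# larger C penalizes violations more strictly and gives a narrower margin.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=1)  # overlapping classes (assumed)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    margin = 2.0 / np.linalg.norm(clf.coef_[0])       # geometric margin width
    print(f"C={C:<6} margin width={margin:.3f} "
          f"support vectors={clf.n_support_.sum()} "
          f"training accuracy={clf.score(X, y):.3f}")
```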

Mathematical intuition of Support Vector Machine


Consider a binary classification problem with two classes, labeled as
+1 and -1. We have a training dataset consisting of input feature
vectors X and their corresponding class labels Y.
The equation for the linear hyperplane can be written as:

w^T x + b = 0
The vector w represents the normal vector to the hyperplane, i.e., the direction perpendicular to the hyperplane. The parameter b in the equation represents the offset or distance of the hyperplane from the origin along the normal vector w.
The distance between a data point x_i and the decision boundary can be calculated as:

d_i = (w^T x_i + b) / ||w||

where ||w|| represents the Euclidean norm of the weight vector w.

For a linear SVM classifier, the prediction for a new point x is:

y_hat = 1 if w^T x + b >= 0, and y_hat = 0 if w^T x + b < 0
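
A quick numeric check of the distance formula and the decision rule, using assumed values of w, b, and x_i:

```python
# Illustrative numeric check (assumed values) of the hyperplane equation,
# the point-to-hyperplane distance, and the linear decision rule.
import numpy as np

w = np.array([2.0, -1.0])    # normal vector of the hyperplane (assumed)
b = -1.0                     # offset (assumed)
x_i = np.array([3.0, 1.0])   # a data point (assumed)

score = np.dot(w, x_i) + b               # w^T x_i + b = 4.0
distance = score / np.linalg.norm(w)     # signed distance = 4 / sqrt(5) ~ 1.789
prediction = 1 if score >= 0 else 0      # linear SVM decision rule

print(score, distance, prediction)
```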
Optimization:
 For the hard-margin linear SVM classifier:

minimize over (w, b): (1/2)||w||^2, subject to t_i (w^T x_i + b) >= 1 for all i

The target variable or label for the i-th training instance is denoted by the symbol t_i. Here t_i = -1 for negative instances (when y_i = 0) and t_i = 1 for positive instances (when y_i = 1). This is because we require the decision boundary to satisfy the constraint:

t_i (w^T x_i + b) >= 1
 For the soft-margin linear SVM classifier:

minimize over (w, b, ζ): (1/2)||w||^2 + C ∑_i ζ_i, subject to t_i (w^T x_i + b) >= 1 − ζ_i and ζ_i >= 0 for all i

 Dual Problem: SVM can also be solved via the dual of the optimisation problem, which involves finding the Lagrange multipliers associated with the support vectors. The optimal Lagrange multipliers α_i maximize the following dual objective function:

maximize over α: ∑_i α_i − (1/2) ∑_i ∑_j α_i α_j t_i t_j K(x_i, x_j), subject to 0 <= α_i <= C and ∑_i α_i t_i = 0
where,
 αi is the Lagrange multiplier associated with the ith training sample.
 K(xi, xj) is the kernel function that computes the similarity between
two samples xi and xj. It allows SVM to handle nonlinear
classification problems by implicitly mapping the samples into a
higher-dimensional feature space.
 The term ∑αi represents the sum of all Lagrange multipliers.
The SVM decision boundary can be described in terms of these optimal Lagrange multipliers and the support vectors once the dual problem has been solved and the optimal Lagrange multipliers have been found. The training samples with α_i > 0 are the support vectors, and the decision function is given by:

y_hat(x) = sign( ∑_{i ∈ SV} α_i t_i K(x_i, x) + b )
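
As a hedged illustration of the dual form, the sketch below fits an RBF SVM with scikit-learn and rebuilds its decision function from the stored dual coefficients (α_i t_i) and support vectors; the dataset and gamma value are assumptions.

```python
# Illustrative check of the dual form: scikit-learn stores alpha_i * t_i in dual_coef_,
# and the decision function is the kernel-weighted sum over support vectors plus the intercept.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_blobs(n_samples=80, centers=2, random_state=0)   # assumed toy data
clf = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# Rebuild the decision function from the dual solution:
# f(x) = sum_i alpha_i * t_i * K(sv_i, x) + b
K = rbf_kernel(clf.support_vectors_, X, gamma=0.5)
manual_decision = clf.dual_coef_ @ K + clf.intercept_

print(np.allclose(manual_decision.ravel(), clf.decision_function(X)))  # True
```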

Types of Support Vector Machine

Based on the nature of the decision boundary, Support Vector Machines (SVM) can be divided into two main parts:
 Linear SVM: Linear SVMs use a linear decision boundary to
separate the data points of different classes. When the data can be
precisely linearly separated, linear SVMs are very suitable. This
means that a single straight line (in 2D) or a hyperplane (in higher
dimensions) can entirely divide the data points into their respective
classes. A hyperplane that maximizes the margin between the
classes is the decision boundary.
 Non-Linear SVM: Non-linear SVM can be used to classify data when it cannot be separated into two classes by a straight line (in the case of 2D). By using kernel functions, non-linear SVMs can handle non-linearly separable data. These kernel functions transform the original input data into a higher-dimensional feature space, where the data points can be linearly separated. A linear hyperplane is then found in this transformed space, which corresponds to a non-linear decision boundary in the original space.

Popular kernel functions in SVM

The SVM kernel is a function that takes a low-dimensional input space and transforms it into a higher-dimensional space, i.e., it converts non-separable problems into separable problems. It is mostly useful in non-linear separation problems. Simply put, the kernel performs some extremely complex data transformations and then finds the procedure to separate the data based on the labels or outputs defined.
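
As a small illustration, the sketch below computes the popular RBF kernel K(x, z) = exp(−gamma·||x − z||²) by hand for two assumed points and compares it with scikit-learn's helper.

```python
# Illustrative RBF kernel computation, by hand and with scikit-learn, for assumed points.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])   # assumed point
z = np.array([[2.0, 0.0]])   # assumed point
gamma = 0.5                  # assumed kernel width parameter

manual = np.exp(-gamma * np.sum((x - z) ** 2))    # exp(-0.5 * 5) ~ 0.0821
library = rbf_kernel(x, z, gamma=gamma)[0, 0]

print(manual, library)   # both ~ 0.0821
```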

Advantages of SVM
 Effective in high-dimensional cases.
 It is memory efficient because it uses only a subset of the training points, called support vectors, in the decision function.
 Different kernel functions can be specified for the decision function, and it is possible to specify custom kernels, as sketched below.
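
A hedged sketch of specifying a custom kernel with scikit-learn is given below; the kernel function my_kernel and the dataset are illustrative assumptions.

```python
# Illustrative custom kernel: scikit-learn's SVC accepts a callable that returns
# the Gram matrix between two sets of samples.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

def my_kernel(X, Y):
    # Gram matrix of a simple polynomial-style kernel: (1 + X.Y^T)^2 (assumed choice)
    return (1.0 + X @ Y.T) ** 2

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)   # assumed toy data
clf = SVC(kernel=my_kernel).fit(X, y)
print("Training accuracy with custom kernel:", clf.score(X, y))
```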
