ML-II UNIT-1

Support Vector Machine (SVM) is a supervised machine learning algorithm primarily used for classification tasks, which finds the optimal hyperplane to separate different classes while maximizing the margin between them. SVM employs kernel functions to handle non-linearly separable data and has advantages such as good generalization and effectiveness with high-dimensional data, but it also faces challenges like long training times and difficulty in model interpretation. The Vapnik-Chervonenkis (VC) dimension is a key concept in understanding the complexity and capacity of SVM models, influencing their performance and generalization capabilities.

Support Vector Machine (SVM) is a supervised machine learning algorithm used for classification and regression tasks. While it can handle regression problems, SVM is particularly well-suited for classification tasks.

SVM aims to find the optimal hyperplane in an N-dimensional space to separate data points into different classes. The algorithm maximizes the margin between the closest points of different classes.

Support Vector Machine (SVM) Terminology

• Hyperplane: A decision boundary separating different classes in feature space, represented by the equation w·x + b = 0 in linear classification.

• Support Vectors: The closest data points to the hyperplane, crucial for determining the hyperplane and margin in SVM.

• Margin: The distance between the hyperplane and the support vectors. SVM aims to maximize this margin for better classification performance.

• Kernel: A function that maps data to a higher-dimensional space, enabling SVM to handle non-linearly separable data.

• Hard Margin: A maximum-margin hyperplane that perfectly separates the data without misclassifications.

• Soft Margin: Allows some misclassifications by introducing slack variables, balancing margin maximization and misclassification penalties when data is not perfectly separable.

• C: A regularization term balancing margin maximization and misclassification penalties. A higher C value enforces a stricter penalty for misclassifications.

• Hinge Loss: A loss function penalizing misclassified points or margin violations, combined with regularization in SVM.

• Dual Problem: Involves solving for the Lagrange multipliers associated with the support vectors, facilitating the kernel trick and efficient computation.
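A minimal sketch of how these pieces appear in scikit-learn is shown below. The tiny two-cluster dataset is invented purely for illustration, and a very large C is used to approximate a hard margin; support_vectors_, coef_ and intercept_ are the standard scikit-learn attributes of the fitted model.

# Sketch: fitting a linear SVM and inspecting the terminology above.
import numpy as np
from sklearn.svm import SVC

# Two small, linearly separable clusters (class 0 near the origin, class 1 shifted).
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.5, 0.2],
              [3.0, 3.0], [4.0, 4.0], [3.5, 3.2]])
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates a hard margin; a smaller C gives a softer margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]            # hyperplane: w.x + b = 0
print("Support vectors:\n", clf.support_vectors_)
print("w =", w, "b =", b)
print("Margin width =", 2.0 / np.linalg.norm(w))  # distance between the two margins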

How does the Support Vector Machine Algorithm Work?


The key idea behind the SVM algorithm is to find the hyperplane that best separates two classes
by maximizing the margin between them. This margin is the distance from the hyperplane to the
nearest data points (support vectors) on each side.

The best hyperplane, also known as the “hard margin” hyperplane, is the one that maximizes the distance between itself and the nearest data points from both classes. This ensures a clear separation between the classes.

Such a perfectly separating hyperplane does not always exist, however.

When data is not linearly separable (i.e., it can’t be divided by a straight line), SVM uses a
technique called kernels to map the data into a higher-dimensional space where it becomes
separable. This transformation helps SVM find a decision boundary even for non-linear data.

A kernel is a function that maps data points into a higher-dimensional space without explicitly
computing the coordinates in that space. This allows SVM to work efficiently with non-linear
data by implicitly performing the mapping.
For example, consider data points that are not linearly separable. By applying a kernel function,
SVM transforms the data points into a higher-dimensional space where they become linearly
separable.
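As a quick numeric check of this idea (a sketch only: the explicit feature map phi below is one standard construction for the degree-2 polynomial kernel with 2-D inputs), the kernel value computed directly equals the dot product of the explicitly mapped vectors, even though the mapping itself is never computed during training:

import numpy as np

def phi(v):
    # Explicit feature map corresponding to k(x, z) = (x.z + 1)^2 in 2-D.
    x1, x2 = v
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

implicit = (x @ z + 1) ** 2   # kernel trick: no explicit mapping
explicit = phi(x) @ phi(z)    # explicit mapping, then dot product
print(implicit, explicit)     # both equal 4.0

Commonly used kernel functions include: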

• Linear Kernel: For linear separability.

• Polynomial Kernel: Maps data into a polynomial space.

• Radial Basis Function (RBF) Kernel: Transforms data into a space based on distances
between data points.
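A minimal sketch of these kernels in practice, assuming scikit-learn and its make_circles toy dataset (two concentric circles, which no straight line can separate), might look like the following; exact accuracies will vary with the data split:

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X_train, y_train)
    print(kernel, "test accuracy:", round(clf.score(X_test, y_test), 3))

# The linear kernel struggles here, while the RBF kernel typically
# separates the two circles almost perfectly.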

SVM Advantages & Disadvantages


SVM Advantages

• SVMs perform well even when we have little prior knowledge of the data.
• Works well even with unstructured and semi-structured data such as text, images and trees.
• The kernel trick is the real strength of SVM: with an appropriate kernel function, many complex problems can be handled.
• Unlike neural networks, SVM training is a convex optimization problem, so it does not get stuck in local optima.
• It scales relatively well to high-dimensional data.
• SVM models generalize well in practice, so the risk of over-fitting is comparatively low.
• SVMs are often compared with ANNs; on many problems, SVMs give results that are competitive with or better than ANN models.

SVM Disadvantages

• Choosing a “good” kernel function is not easy.
• Training time is long for large datasets.
• The final model, the variable weights and their individual impact are difficult to understand and interpret.
• Since the final model is not easy to inspect, small calibrations are hard to make, so it is tough to incorporate business logic into the model.
• The main SVM hyperparameters are the cost C and gamma. They are not easy to fine-tune and their impact is hard to visualize; a cross-validated grid search, such as the sketch after this list, is a common way to set them.
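The sketch below tunes C and gamma with scikit-learn's GridSearchCV on a synthetic dataset; the parameter grid is an arbitrary illustrative choice, not a recommendation:

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],          # misclassification penalty
    "gamma": [0.001, 0.01, 0.1, 1],  # RBF kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))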

SVM Application

• Protein Structure Prediction
• Intrusion Detection
• Handwriting Recognition
• Detecting Steganography in digital images
• Breast Cancer Diagnosis

Vapnik-Chervonenkis Dimension

The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a hypothesis set to fit
different data sets. It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1970s
and has become a fundamental concept in statistical learning theory. The VC dimension is a
measure of the complexity of a model, which can help us understand how well it can fit different
data sets.

The VC dimension of a hypothesis set H is the largest number of points that can be shattered by
H. A hypothesis set H shatters a set of points S if, for every possible labeling of the points in S,
there exists a hypothesis in H that correctly classifies the points. In other words, a hypothesis set
shatters a set of points if it can fit any possible labeling of those points.
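As a concrete example, linear classifiers in the plane have VC dimension 3: three non-collinear points can be shattered, while no set of four points can be. The sketch below checks the first part by trying every labeling of three hypothetically chosen points and fitting a linear SVM (standing in for the class of linear classifiers) to each:

from itertools import product
import numpy as np
from sklearn.svm import SVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # non-collinear

shattered = True
for labels in product([0, 1], repeat=3):
    if len(set(labels)) == 1:
        continue  # a single-class labeling is trivially realizable
    clf = SVC(kernel="linear", C=1e6).fit(points, labels)
    if clf.score(points, labels) < 1.0:
        shattered = False
        break

print("3 points shattered by linear classifiers:", shattered)  # True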

Bounds of VC – Dimension

The VC dimension provides both upper and lower bounds on the number of training examples required to achieve a given level of accuracy. Roughly speaking, both bounds grow linearly with the VC dimension: a hypothesis set with a larger VC dimension needs proportionally more training examples to reach the same accuracy and confidence.
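One commonly cited form of the upper bound (the exact constants differ between textbooks) states that, for a hypothesis set H with VC dimension d, error tolerance ε and confidence 1 − δ, it suffices to have

m ≥ (1/ε) · (4 log₂(2/δ) + 8 d log₂(13/ε))

training examples, while matching lower-bound results show that on the order of d/ε examples are also necessary.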

Applications of VC – Dimension

The VC dimension has a wide range of applications in machine learning and statistics. For
example, it is used to analyze the complexity of neural networks, support vector machines, and
decision trees. The VC dimension can also be used to design new learning algorithms that are
robust to noise and can generalize well to unseen data.

The VC dimension can be extended to more complex learning scenarios, such as multiclass
classification and regression. The concept of the VC dimension can also be applied to other areas
of computer science, such as computational geometry and graph theory.
