ML-II UNIT-1
Support Vector Machine (SVM)
Support Vector Machine (SVM) is a supervised machine learning algorithm used for both
classification and regression tasks. While it can handle regression problems, SVM is particularly
well-suited for classification tasks.
SVM aims to find the optimal hyperplane in an N-dimensional space to separate data points into
different classes. The algorithm maximizes the margin between the closest points of different
classes.
• Support Vectors: The closest data points to the hyperplane, crucial for determining the
hyperplane and margin in SVM.
• Margin: The distance between the hyperplane and the support vectors. SVM aims to
maximize this margin for better classification performance.
• Hard Margin: A maximum-margin hyperplane that perfectly separates the data without
misclassifications.
• Dual Problem: Reformulates the optimization in terms of Lagrange multipliers attached to the
training points; only the support vectors receive non-zero multipliers, which enables the kernel
trick and efficient computation (a standard form of the dual is sketched after this list).
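For reference, one standard textbook formulation of the hard-margin dual problem (not taken
verbatim from these notes) is:

\[
\max_{\alpha}\; \sum_{i=1}^{n} \alpha_i \;-\; \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j)
\quad \text{subject to} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{n} \alpha_i y_i = 0,
\]

where the \(\alpha_i\) are the Lagrange multipliers, \(y_i \in \{-1, +1\}\) are the class labels, and
\(K(x_i, x_j)\) is the kernel function (for a linear SVM, \(K(x_i, x_j) = x_i \cdot x_j\)). Only the
support vectors end up with \(\alpha_i > 0\); every other multiplier is zero.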
The best hyperplane, also known as the maximum-margin or "hard margin" hyperplane, is the one
that maximizes the distance between the hyperplane and the nearest data points from both classes.
This ensures a clear separation between the classes.
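A minimal sketch in scikit-learn (the toy data and the value of C are assumptions for illustration,
not from the notes): fit a linear SVM on separable data, read off the support vectors, and compute
the margin width 2 / ||w||.

import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters -- assumed toy data.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],   # class 0
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin SVM: misclassifications become
# prohibitively expensive, so the margin is maximized while classifying every
# training point correctly.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]                        # normal vector of the separating hyperplane
margin_width = 2.0 / np.linalg.norm(w)  # distance between the two margin boundaries

print("Support vectors:\n", clf.support_vectors_)
print("Margin width:", margin_width)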
When data is not linearly separable (i.e., it can’t be divided by a straight line), SVM uses a
technique called kernels to map the data into a higher-dimensional space where it becomes
separable. This transformation helps SVM find a decision boundary even for non-linear data.
A kernel is a function that maps data points into a higher-dimensional space without explicitly
computing the coordinates in that space. This allows SVM to work efficiently with non-linear
data by implicitly performing the mapping.
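As a concrete illustration of that claim, here is a small sketch (the vectors and the degree-2
polynomial kernel are assumed examples, not from the notes): the kernel value computed directly in
the original 2-D space equals the dot product under an explicit feature map into 3-D, so the
mapping never has to be carried out.

import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector: (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

def poly_kernel(x, z):
    """Implicit computation: K(x, z) = (x . z)^2, evaluated in the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(np.dot(phi(x), phi(z)))  # 121.0 -- dot product after the explicit mapping
print(poly_kernel(x, z))       # 121.0 -- same value, no explicit mapping needed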
For example, consider data points that are not linearly separable. By applying a kernel function,
SVM transforms the data points into a higher-dimensional space where they become linearly
separable.
• Radial Basis Function (RBF) Kernel: Maps data based on the distances between points, using
K(x, z) = exp(-γ ||x - z||²); it is a common default choice for non-linear data.
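A minimal sketch (the generated dataset and parameter values are assumptions for illustration):
two concentric circles that no straight line can separate. A linear kernel performs near chance
level, while the RBF kernel recovers the circular boundary.

from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=300, factor=0.4, noise=0.05, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

linear_svm = SVC(kernel="linear").fit(X_train, y_train)
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)

print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))  # roughly chance level
print("RBF kernel accuracy:   ", rbf_svm.score(X_test, y_test))     # close to 1.0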
SVM Disadvantages
• Training can be slow and memory-intensive on very large datasets.
• Performance depends heavily on the choice of kernel and hyperparameters (such as C and gamma).
• Less effective when classes overlap heavily or the data is very noisy.
• Does not directly provide probability estimates; these require additional calibration.
SVM Applications
• Text classification and spam filtering.
• Image classification and handwritten digit recognition.
• Bioinformatics tasks such as protein and gene classification.
• Face detection and other detection problems.
Vapnik-Chervonenkis Dimension
The Vapnik-Chervonenkis (VC) dimension is a measure of the capacity of a hypothesis set to fit
different data sets. It was introduced by Vladimir Vapnik and Alexey Chervonenkis in the 1970s
and has become a fundamental concept in statistical learning theory. As a measure of model
complexity, the VC dimension helps us understand how expressive a model is and therefore how
well it can fit different data sets.
The VC dimension of a hypothesis set H is the largest number of points that can be shattered by
H. A hypothesis set H shatters a set of points S if, for every possible labeling of the points in S,
there exists a hypothesis in H that correctly classifies the points. In other words, a hypothesis set
shatters a set of points if it can fit any possible labeling of those points.
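The classic example is that a straight-line classifier in the plane can shatter any 3 points in
general position but no set of 4 points, so its VC dimension is 3. The sketch below (the point
configuration and the use of a linear-kernel SVC are assumptions for illustration) checks the first
half of that claim by fitting a linear classifier to every possible labeling of 3 non-collinear
points.

from itertools import product

import numpy as np
from sklearn.svm import SVC

# Three non-collinear points in the plane -- assumed example configuration.
points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])

shattered = True
for labels in product([0, 1], repeat=len(points)):
    if len(set(labels)) == 1:
        # A single-class labeling is trivially realizable by a constant classifier.
        continue
    y = np.array(labels)
    clf = SVC(kernel="linear", C=1e6)  # large C approximates a hard margin
    clf.fit(points, y)
    if not np.array_equal(clf.predict(points), y):
        shattered = False
        break

print("Every labeling of the 3 points is realized by a line:", shattered)  # expected: True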
Bounds of the VC Dimension
The VC dimension provides both upper and lower bounds on the number of training examples
required to achieve a given level of accuracy. Both bounds grow essentially linearly with the VC
dimension (the upper bound carries extra logarithmic factors), so a richer hypothesis class needs
more data to generalize reliably.
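For reference, the standard PAC-learning sample-complexity bounds for a hypothesis class of VC
dimension \(d\) in the realizable setting (a textbook statement, not taken from these notes) are:

\[
m(\varepsilon, \delta) \;=\; O\!\left(\frac{1}{\varepsilon}\left(d \log\frac{1}{\varepsilon} + \log\frac{1}{\delta}\right)\right),
\qquad
m(\varepsilon, \delta) \;=\; \Omega\!\left(\frac{d + \log\frac{1}{\delta}}{\varepsilon}\right),
\]

where \(\varepsilon\) is the error tolerance and \(\delta\) the allowed failure probability, so the
number of training examples needed to reach accuracy \(\varepsilon\) with confidence \(1 - \delta\)
is sandwiched between these two expressions.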
Applications of the VC Dimension
The VC dimension has a wide range of applications in machine learning and statistics. For
example, it is used to analyze the complexity of neural networks, support vector machines, and
decision trees. The VC dimension can also be used to design new learning algorithms that are
robust to noise and can generalize well to unseen data.
The VC dimension can be extended to more complex learning scenarios, such as multiclass
classification and regression. The concept of the VC dimension can also be applied to other areas
of computer science, such as computational geometry and graph theory.