Classification SVM
Arvind Deshpande
11/25/2024 Arvind Deshpande (VJTI) 2
Support vectors
Kernel
• Linear Kernel
• A normal dot product of any 2 given observations.
$K(x, x_i) = \operatorname{sum}(x * x_i)$
• Polynomial kernel – A more generalized form of the linear kernel.
Can distinguish curved or nonlinear input spaces.
$K(x, x_i) = (1 + \operatorname{sum}(x * x_i))^d$
where d is the degree of the polynomial.
• Radial basis function kernel (Gaussian kernel) – Commonly
used in SVM classification. Can map an input space into an
infinite-dimensional space:
$K(x, x_i) = \exp(-\gamma \cdot \operatorname{sum}((x - x_i)^2))$
gamma is a positive parameter, typically chosen between 0 and 1. A higher
value fits the training dataset more closely, which can cause overfitting.
gamma = 0.1 is considered to be a good default value.
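The three kernels above can be sketched in plain Python. This is a minimal illustration rather than a library implementation: the function names (`linear_kernel`, `polynomial_kernel`, `rbf_kernel`) and the sample vectors are assumptions made for this example, and the polynomial kernel is written in the common form (1 + sum(x * x_i))^d.

```python
import math

def linear_kernel(x, xi):
    """Dot product of two observations."""
    return sum(a * b for a, b in zip(x, xi))

def polynomial_kernel(x, xi, d=2):
    """(1 + dot product) raised to the polynomial degree d."""
    return (1 + linear_kernel(x, xi)) ** d

def rbf_kernel(x, xi, gamma=0.1):
    """exp(-gamma * squared Euclidean distance between x and xi)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)

# Hypothetical sample observations, chosen only for illustration.
x = [1.0, 2.0]
xi = [3.0, 4.0]

print(linear_kernel(x, xi))         # 1*3 + 2*4 = 11.0
print(polynomial_kernel(x, xi, 2))  # (1 + 11)^2 = 144.0
print(rbf_kernel(x, xi, 0.1))       # exp(-0.1 * 8) = exp(-0.8)
print(rbf_kernel(x, xi, 1.0))       # exp(-8): larger gamma, similarity decays faster
```

With a larger gamma the RBF similarity between two distinct points falls off more quickly, so each training point influences only its immediate neighbourhood. This is one way to see why high gamma values tend to overfit.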
Advantages
• They are highly accurate, owing to their ability to model
complex nonlinear decision boundaries.
• They are generally less prone to overfitting than many other
methods. The support vectors found also provide a
compact description of the learned model.
• Effective in high-dimensional spaces.
• Suitable for cases where the number of dimensions is greater
than the number of samples.
• Memory efficient, as the decision function uses only a subset
of the training points (the support vectors).
• Versatile: different kernel functions can be specified for
the decision function.
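To make the memory-efficiency and versatility points concrete, the sketch below evaluates a kernel SVM decision function of the usual form f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, using only a stored list of support vectors. The support vectors, coefficients, and bias are hypothetical values picked by hand, not the result of training, and the names `SUPPORT_VECTORS`, `decision_function`, and `predict` are assumptions made for this example.

```python
import math

def rbf_kernel(x, xi, gamma=0.1):
    """exp(-gamma * squared Euclidean distance between x and xi)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-gamma * sq_dist)

def linear_kernel(x, xi):
    """Dot product of two observations."""
    return sum(a * b for a, b in zip(x, xi))

# Hypothetical model: (support vector x_i, label y_i, coefficient alpha_i).
# Only these points are stored; the rest of the training set is not needed.
SUPPORT_VECTORS = [
    ([1.0, 1.0], +1, 0.5),
    ([-1.0, -1.0], -1, 0.5),
]
BIAS = 0.0  # hypothetical bias term b

def decision_function(x, kernel=rbf_kernel):
    """Weighted sum of kernel values against the support vectors, plus bias."""
    return sum(alpha * y * kernel(sv, x) for sv, y, alpha in SUPPORT_VECTORS) + BIAS

def predict(x, kernel=rbf_kernel):
    """Class label is the sign of the decision function."""
    return 1 if decision_function(x, kernel) >= 0 else -1

print(predict([2.0, 1.5]))                        # closer to the +1 support vector
print(predict([-2.0, -0.5]))                      # closer to the -1 support vector
print(predict([2.0, 1.5], kernel=linear_kernel))  # same model, different kernel
```

Passing a different `kernel` argument changes the shape of the decision boundary without changing the rest of the code.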
Advantages
• Support vector machines are a computationally powerful
tool for supervised learning and are widely used for
classification problems.
• SVMs can be used for numeric prediction as well as
classification.
• They have been applied to a number of areas, including
handwritten digit recognition, object recognition, and
speaker identification, as well as benchmark time-series
prediction tests.
Disadvantages
• The training time of even the fastest SVMs can be
extremely slow.
• If the number of features is much greater than the number
of samples, avoiding overfitting through the choice of kernel
function and regularization term is crucial.
• SVMs do not directly provide probability estimates. These are
calculated using an expensive five-fold cross-validation.