CLASSIFICATION

Arvind Deshpande

Support Vector Machine (SVM)


• A method for the classification of both linear and nonlinear
data.
• It uses a nonlinear mapping to transform the original
training data into a higher dimension.
• Within this new dimension, it searches for the linear
optimal separating hyperplane (i.e., a “decision boundary”
separating the tuples of one class from another).
• With an appropriate nonlinear mapping to a sufficiently
high dimension, data from two classes can always be
separated by a hyperplane.
• The SVM finds this hyperplane using support vectors
(“essential” training tuples) and margins (defined by the
support vectors).
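As a concrete illustration of these ideas, here is a minimal sketch of training an SVM classifier. It assumes scikit-learn (the slides do not name a library), and the toy data and labels are hypothetical.

from sklearn import svm

# Toy 2-D training data: two classes arranged in an XOR-like pattern,
# which is not linearly separable in the original input space.
X = [[0, 0], [1, 1], [1, 0], [0, 1]]
y = [0, 0, 1, 1]                      # hypothetical class labels

clf = svm.SVC(kernel='rbf')           # nonlinear mapping via the RBF kernel
clf.fit(X, y)

print(clf.support_vectors_)           # the "essential" training tuples
print(clf.predict([[0.9, 0.1]]))      # classify a new tuple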

Linearly separable data



Maximum marginal hyperplane



Support vectors

Linearly inseparable data



Optimal Separating Hyperplane


The two subsets of vectors can be separated by the optimal hyperplane if they are separated without error and the distance between the hyperplane and the closest vector is maximal.

Support Vector Machine (SVM)


• A separating hyperplane can be written as
W · X + b = 0
where W = (w1, w2, ..., wn) is a weight vector and b is a scalar (the bias).
• In 2-D, writing the bias b as an additional weight w0, this becomes
w0 + w1·x1 + w2·x2 = 0
• The hyperplanes defining the sides of the margin are:
H1: w0 + w1·x1 + w2·x2 ≥ +1 for yi = +1
H2: w0 + w1·x1 + w2·x2 ≤ −1 for yi = −1
• Any training tuples that fall on hyperplanes H1 or H2 are
called support vectors.
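To make these equations concrete, here is a brief sketch (scikit-learn assumed, data hypothetical) that recovers W and b from a fitted linear SVM and evaluates W · X + b for each training tuple; the support vectors are the points whose score is approximately ±1.

import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 1.0], [4.0, 3.0]])
y = np.array([-1, -1, 1, 1])                  # hypothetical labels

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin
W, b = clf.coef_[0], clf.intercept_[0]

scores = X @ W + b                            # W . X + b for every tuple
print(W, b)
print(scores)                                 # ~ +/-1 on H1 and H2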

Support Vector Machine (SVM)


• The SVM algorithm is implemented using a kernel.
• A kernel transforms an input space into the required form.
• Kernel trick: a low-dimensional input space is transformed
into a higher-dimensional space, converting a non-separable
problem into a separable one by adding more dimensions.
This helps build a more accurate classifier.
• Useful in nonlinear separation problems, as the sketch
below illustrates.
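A short sketch of the kernel trick in action (scikit-learn assumed): concentric circles are not linearly separable in 2-D, but the RBF kernel separates them by implicitly working in a higher-dimensional space.

from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two classes of points on concentric circles.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel='linear').fit(X, y)
rbf = SVC(kernel='rbf').fit(X, y)

print('linear kernel accuracy:', linear.score(X, y))   # poor
print('RBF kernel accuracy:   ', rbf.score(X, y))      # near 1.0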

Kernel
• Linear kernel: the ordinary dot product of any two given observations:
K(x, xi) = sum(x * xi)
• Polynomial kernel: a more generalized form of the linear kernel
that can distinguish curved or nonlinear input spaces:
K(x, xi) = (1 + sum(x * xi))^d
where d is the degree of the polynomial.
• Radial basis function (RBF, or Gaussian) kernel: commonly used in
SVM classification; it can map an input space into an
infinite-dimensional space:
K(x, xi) = exp(−gamma * sum((x − xi)^2))
gamma is a parameter that typically ranges from 0 to 1. A higher value
fits the training dataset more closely, which can cause overfitting;
gamma = 0.1 is often considered a good default value.
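The three kernel formulas above can also be written out directly; the following NumPy sketch uses illustrative function names and default values.

import numpy as np

def linear_kernel(x, xi):
    return np.sum(x * xi)                      # K(x, xi) = sum(x * xi)

def polynomial_kernel(x, xi, d=3):
    return (1 + np.sum(x * xi)) ** d           # (1 + sum(x * xi))^d

def rbf_kernel(x, xi, gamma=0.1):
    return np.exp(-gamma * np.sum((x - xi) ** 2))

x, xi = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(x, xi), polynomial_kernel(x, xi), rbf_kernel(x, xi))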

Classification Performance Parameters


• To estimate classification performance, three distinct
parameters can be used, as follows:
• Accuracy (ACC): the ratio of accurately classified samples
to the total number of samples, expressed as
ACC = (TP + TN) / (TP + TN + FP + FN) × 100%
where TP, TN, FP, and FN are the total numbers of accurately
identified true positive, true negative, false positive, and
false negative samples, respectively.
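For example, with hypothetical confusion counts:

TP, TN, FP, FN = 40, 45, 5, 10               # hypothetical counts
acc = (TP + TN) / (TP + TN + FP + FN) * 100  # formula from above
print(f'ACC = {acc:.1f}%')                   # 85.0%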

Classification Performance Parameters


• Precision: the ability of the classifier not to label as
positive an instance that is actually negative. It is computed
as the ratio of accurately identified true positive samples to
the sum of true and false positives:
Precision = TP / (TP + FP) × 100%
• Recall: the ability of the classifier to find all positive
instances. It is computed as the ratio of accurately identified
true positive samples to the total number of actual positive
samples:
Recall = TP / (TP + FN) × 100%
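Continuing with the same hypothetical counts as in the accuracy example:

TP, FP, FN = 40, 5, 10                    # hypothetical counts
precision = TP / (TP + FP) * 100          # 40 / 45 = 88.9%
recall = TP / (TP + FN) * 100             # 40 / 50 = 80.0%
print(f'Precision = {precision:.1f}%, Recall = {recall:.1f}%')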

Classification Performance Parameters


• F1 score: the weighted harmonic mean of precision and
recall, such that the best score is 1.0 and the worst is 0.0.
• It is never higher than the arithmetic mean of precision and recall.
F (also written F1 or F-score) = (2 × precision × recall) / (precision + recall)
Fβ = ((1 + β²) × precision × recall) / (β² × precision + recall)
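A quick worked example using the precision and recall from above (as fractions, so the best score is 1.0):

precision, recall = 40 / 45, 40 / 50      # from the counts above
f1 = 2 * precision * recall / (precision + recall)

beta = 2                                  # beta > 1 weights recall more
fbeta = (1 + beta**2) * precision * recall / (beta**2 * precision + recall)
print(round(f1, 3), round(fbeta, 3))      # 0.842, 0.816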

Advantages
• They are highly accurate, owing to their ability to model
complex nonlinear decision boundaries.
• They are much less prone to overfitting than other
methods. The support vectors found also provide a
compact description of the learned model.
• Effective in high-dimensional spaces.
• Suitable for cases where the number of dimensions is greater
than the number of samples.
• Memory efficient, as the decision function uses only a subset
of the training points (the support vectors).
• Versatile: different kernel functions can be specified for
the decision function.

Advantages
• Support vector machines are computationally powerful tools
for supervised learning, widely used for classification
problems.
• SVMs can be used for numeric prediction as well as
classification.
• They have been applied to a number of areas, including
handwritten digit recognition, object recognition, and
speaker identification, as well as benchmark time-series
prediction tests.

Disadvantages
• The training time of even the fastest SVMs can be
extremely slow.
• If the number of features is much greater than the number
of samples, avoiding overfitting through the choice of kernel
function and regularization term is crucial.
• SVMs do not directly provide probability estimates; these are
calculated using an expensive five-fold cross-validation.
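In scikit-learn, for instance, probability estimates must be requested explicitly and are fitted with internal five-fold cross-validation (Platt scaling), which makes training noticeably slower. A brief sketch on generated data:

from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=100, random_state=0)

clf = SVC(probability=True).fit(X, y)     # slower: internal 5-fold CV
print(clf.predict_proba(X[:2]))           # per-class probability estimates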

Three species of Iris flower


The data set used consists of 50 samples from each of three
species of Iris flowers.
The three species are Iris setosa, Iris virginica, and Iris
versicolor.
Four features were measured from each sample: the length and
the width of the sepals and petals, in centimeters.
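Putting the pieces together on this dataset, a minimal end-to-end sketch (scikit-learn assumed; it ships the Iris data):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import classification_report

X, y = load_iris(return_X_y=True)             # 150 samples, 4 features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          random_state=0)

# gamma = 0.1 follows the default suggested earlier in these slides.
clf = SVC(kernel='rbf', gamma=0.1).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))  # precision, recall, F1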
